You are given the file points.mat with training data. There are three kinds of points.
The goal is to train with these data and classify the points in [0,1]×[0,1].
Modify the file netbpfull.m such that it works with the three categories of points.
• Load the data with load points.mat. This will result in an array x containing points
in 2D and an array labels containing labels.
• Modify the function cost such that it returns accuracy = (number of points classified correctly / total number of points) ∗ 100
and also returns the indices (in x) of training points that are not classified correctly.
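For illustration, here is a small Python/NumPy sketch of what the modified cost must return (the assignment itself is in MATLAB; the function and variable names here are hypothetical):

```python
import numpy as np

def accuracy_and_misclassified(outputs, labels):
    """outputs: (n_points, n_classes) network outputs;
    labels: (n_points,) integer class labels.
    Returns the accuracy in percent and the indices (in x)
    of training points that are not classified correctly."""
    predicted = np.argmax(outputs, axis=1)          # predicted class per point
    wrong = np.flatnonzero(predicted != labels)     # misclassified indices
    accuracy = (len(labels) - len(wrong)) / len(labels) * 100
    return accuracy, wrong

# Tiny illustrative example (3 points, 3 classes)
outputs = np.array([[0.9, 0.05, 0.05],
                    [0.2, 0.7, 0.1],
                    [0.3, 0.3, 0.4]])
labels = np.array([0, 1, 0])
acc, idx = accuracy_and_misclassified(outputs, labels)
```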
The following contains the diff of the original netbpfull.m and the modified version
I have made the following changes:
While loading points.mat, X is now normalized so that training can converge faster.
The hidden layers have been increased to 20 neurons each.
W4 and b4 are now initialized according to the number of classes in the dataset.
The weights are initialized with a smaller scale (multiplied by 0.01) to keep the initial distribution tight, reducing the risk of neuron saturation when a sigmoid activation function is used.
Hyperparameter tuning:
The initial learning rate has been reduced to 0.001 for more stable training.
Momentum (alpha) is set to 0.89 and is used when updating the weight changes. This helps the neural net escape local minima so that a better minimum can be found.
Added a batch size of 16 for mini-batch gradient descent, a balance between single-sample stochastic gradient descent and full-batch gradient descent.
Added learning rate decay, which reduces the learning rate by a factor of 0.99 every 10000 iterations, to help the neural net converge.
Training process:
Introduced batch-aware training, which updates the network from a batch of points instead of a single point, to help the neural net converge faster.
Updated the activation function to provide three options: sigmoid, relu, and leaky_relu. The latter two are used for the hidden layers, while sigmoid is used for the output layer.
Updated the backpropagation to compute Leaky ReLU derivatives for the hidden layers.
Accumulated the gradients over batches, and updated the weights with L2 regularization.
Finally, updated the cost function from a norm-based error (MSE) to cross-entropy loss, which is more suitable for classification problems.
Activation function: the Leaky ReLU activation function is explicitly defined, which helps mitigate the "dying ReLU" problem, where neurons can become inactive and output only zero.
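The activation change and the regularized momentum update described above can be sketched in Python/NumPy as follows (a hedged illustration, not the actual netbpfull.m code; the names and the L2 strength lam are assumptions):

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: a small negative slope keeps 'dead' neurons trainable."""
    return np.where(z > 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    """Derivative used in backpropagation: 1 for z > 0, alpha otherwise."""
    return np.where(z > 0, 1.0, alpha)

def momentum_update(W, vW, gradW, lr=0.001, mom=0.89, lam=1e-4):
    """Momentum step with the L2 penalty folded into the gradient."""
    vW = mom * vW - lr * (gradW + lam * W)   # velocity with L2 term
    return W + vW, vW

# Tiny example values
acts = leaky_relu(np.array([-2.0, 3.0]))        # [-0.02, 3.0]
grads = leaky_relu_grad(np.array([-2.0, 3.0]))  # [0.01, 1.0]
W_new, v_new = momentum_update(np.array([1.0]), np.array([0.0]), np.array([0.5]))
```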
The following contains graphs of the training process:
P2
Implement in Matlab the bisection and Newton’s method for finding roots of scalar equations.
Use your implementation of the bisection method to find a root of
a. 1/x − exp(2−x) in [0.1,1]
b. x∗sin(x)−1 in [0,2].
Use your implementation of Newton’s method and Matlab’s fsolve to find a root of
a. 1/x − exp(2−x) with initial guess x0 = 1
b. x∗sin(x)−1 with initial guess x0=2
For the bisection method, use tol = 10^(−10). For your Newton and fsolve, solve until |f(x_n)| ≤ 10^(−10).
If you are obtaining different roots, explain the differences. Also, discuss the number of iterations.
Solution
Bisection method
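The MATLAB listing is not reproduced here; a Python sketch of the same bisection iteration (returning the root and the iteration count) might look like this:

```python
import math

def bisection(f, a, b, tol=1e-10, max_iter=200):
    """Bisection: requires a sign change f(a)*f(b) < 0; halves the
    bracket until its half-width is below tol."""
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "f must change sign on [a, b]"
    it = 0
    while (b - a) / 2 > tol and it < max_iter:
        it += 1
        m = (a + b) / 2
        fm = f(m)
        if fm == 0:
            return m, it
        if fa * fm < 0:            # root is in the left half
            b, fb = m, fm
        else:                      # root is in the right half
            a, fa = m, fm
    return (a + b) / 2, it

# Example: root of x*sin(x) - 1 on [0, 2], where f changes sign
root, its = bisection(lambda x: x * math.sin(x) - 1, 0.0, 2.0)
```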
Newton method
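A corresponding Python sketch of Newton's method, stopping when |f(x)| ≤ tol to match the criterion stated above:

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=100):
    """Newton's method: iterate x <- x - f(x)/f'(x) until |f(x)| <= tol."""
    x = x0
    for it in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            return x, it
        x = x - fx / fprime(x)
    return x, max_iter

# Example: sqrt(2) as the root of x^2 - 2 from x0 = 1
root, its = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0)
```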
fsolve
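In Python, scipy.optimize.fsolve can stand in for MATLAB's fsolve; a minimal sketch (full_output=True also returns an info dict with, e.g., the number of function evaluations):

```python
import numpy as np
from scipy.optimize import fsolve

f = lambda x: x * np.sin(x) - 1.0

# Solve from the initial guess x0 = 2; which root is found depends
# on the solver's path, as discussed below.
root_arr, info, ier, msg = fsolve(f, 2.0, full_output=True)
root = root_arr[0]
resid = f(root)
```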
Table
For 1/x − exp(2−x):

method      root r     f(r)           num. iterations
bisection   0.2152     8.7809e-09     29
Newton      28.6942    -2.5223e-14    9
fsolve      28.5131    -3.7357e-04    12
For x*sin(x) − 1:

method      root r     f(r)           num. iterations
bisection   1.1142     4.3660e-10     30
Newton      -9.3172    -2.4834e-11    5
fsolve      1.1142     -1.9488e-08    3
Analysis
For 1/x − exp(2−x)
Bisection method: The bisection method found a root in the interval [0.1,1] as expected. This method guarantees convergence to a root when it exists within the interval and the function changes sign. However, it is generally slower, as indicated by the higher number of iterations.
Newton’s method: converged to a completely different root, outside the interval considered for the bisection method. This shows that Newton’s method is highly sensitive to the initial guess. It also converges faster (fewer iterations), but can end up at roots far from the initial guess if the function is complicated or the derivative does not behave well.
fsolve: similar to Newton’s method, fsolve also found a root far from the interval used for the bisection method. It likely uses a Newton-type (trust-region) iteration internally, which explains the similar behavior.
For x*sin(x) − 1
Bisection method: As with the first function, the bisection method finds a root within the specified interval. The method is reliable but slow, as seen from the number of iterations.
Newton’s method: converged to a negative root, far outside the interval [0,2]. The method moved far away from the initial guess because of the oscillatory behavior introduced by the trigonometric term combined with the polynomial factor.
Discussion:
Root Differences: The significant differences in roots, especially for Newton’s method and fsolve, highlight the sensitivity of these methods to initial guesses and the nature of the function. For complex functions, especially those with multiple roots, the choice of the initial guess can lead to convergence to entirely different roots.
Number of Iterations: Newton’s method and fsolve generally require fewer iterations than the bisection method, demonstrating their faster convergence rate. However, this comes at the cost of potentially finding different roots, as seen in the results.
P3
The annuity equation is A = (P/r)(1 − (1+r)^(−n))
where A is borrowed amount, P is the amount of each payment, r is the interest rate per period, and there are n equally spaced payments.
Write Newton’s method for finding r.
Implement the function function r = interest(A, n, P) which returns the annual interest rate. Your function must call fsolve. Ensure that fsolve uses the analytical form of the derivative. Report the values of interest(100000, 20*12, 1000), interest(100000, 20*12, 100). Interpret the results.
Solution
Newton’s function for finding r
Given A = (P/r)(1 − (1+r)^(−n))
We have f(r) = (P/r)(1 − (1+r)^(−n)) − A
Newton’s method says r_1 = r_0 − f(r_0)/f'(r_0), with f'(r) = −(P/r^2)(1 − (1+r)^(−n)) + (Pn/r)(1+r)^(−n−1)
From the computation, with initial guess r_0 = 0.05 (a 5% interest rate), the solver returns r = −0.0099 for P = 100 and r = 0.0500 for P = 1000.
For P=1000, the interest rate that satisfies the annuity equation is approximately 5%
For P=100, the interest rate required to satisfy the loan conditions has to differ from 5%.
The negative value (−0.0099) indicates that the rate satisfying the annuity equation under these conditions is below the initial guess of 5%; indeed it must be negative, since 240 payments of 100 repay only 24000 of the 100000 borrowed.
Note that the initial value strongly affects Newton’s approximation: changing it to 1% may produce a different value.
P4
Consider Newton’s method on x^5 − x^3 − 4x = 0
a. How do the computed approximations behave with x0 = 1?
b. Try your implementation with x0 = 1 and x0 = 1 + 10^(−14). Explain why this method behaves differently when started with x0 = 1 + 10^(−14), compared to when it is started with x0 = 1.
c. Solve also with fsolve. Comment on the results.
Solution
Given the Newton’s implementation
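Since the Newton listing itself is not reproduced here, the behavior can be checked with a short Python sketch (note that f(1) = −4 and f'(1) = −2, so the first step already leaves x = 1):

```python
def newton(f, fp, x0, tol=1e-10, max_iter=500):
    """Plain Newton iteration; returns the last iterate."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= tol:
            break
        x = x - fx / fp(x)
    return x

f  = lambda x: x**5 - x**3 - 4*x
fp = lambda x: 5*x**4 - 3*x**2 - 4

# x0 = 1: one step gives 1 - (-4)/(-2) = -1, and the next step
# maps -1 back to 1, so the iterates cycle and never converge.
x1 = 1 - f(1.0) / fp(1.0)     # -1.0
x2 = x1 - f(x1) / fp(x1)      # back to 1.0

# x0 = 1 + 1e-14: the repelling 2-cycle is broken and the
# iterates eventually settle on a genuine root.
root = newton(f, fp, 1 + 1e-14)
```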
a. With x0 = 1 the approximations do not converge: since f(1) = −4 and f'(1) = −2, the first step gives x1 = 1 − (−4)/(−2) = −1, and the next step maps −1 back to 1, so the iterates oscillate forever between 1 and −1 (a period-2 cycle). Note that x = 1 is not a root.
b.
Newton’s method with x0 = 1: the iterates cycle between 1 and −1 and never converge, as explained in part a.
Newton’s method with x0 = 1 + 10^(−14): the approximation converges to approximately x = 1.600485180440241, a genuine root. The tiny perturbation breaks the symmetric 2-cycle; since the cycle is repelling, rounding-level differences are amplified at every pass until the iterates escape it and then converge to a root. This highlights the method’s sensitivity to initial conditions.
c. Using fsolve
The fsolve result differs from the Newton results for the same initial guess: it returns the root x = 0. This is due to inherent differences in the algorithms.
fsolve in MATLAB uses a trust-region method by default (with Levenberg-Marquardt available as an option); it proceeds by decreasing the sum of squares of the residual and is considerably more robust than a plain Newton iteration.
P5
Implement Newton’s method for systems of equations.
Each of the following systems of nonlinear equations may present some difficulty in computing a solution. Use Matlab’s fsolve and your own implementation of Newton’s method to solve each of the systems from the given starting point.
In some cases, the nonlinear solver may fail to converge or may converge to a point other than a solution. When this happens, try to explain the reason for the observed behavior.
Report for fsolve and your implementation of Newton’s method and each of the systems below, the number of iterations needed to achieve accuracy of 10−6 (if achieved).
Solution
All of the following Newton’s method implementations follow the same framework:
equations defines the nonlinear system and jacobian its Jacobian; at each iteration the Newton step is obtained by solving a linear system with the Jacobian, the approximation x is updated, and convergence is checked.
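The framework just described can be sketched in Python (the actual implementation is in MATLAB; the circle/line system at the bottom is a hypothetical example, not one of the assignment's systems):

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-6, max_iter=100):
    """Newton's method for systems: solve J(x) dx = -F(x), update
    x <- x + dx, and stop when ||F(x)|| <= tol."""
    x = np.asarray(x0, dtype=float)
    for it in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx) <= tol:
            return x, it
        dx = np.linalg.solve(J(x), -Fx)   # Newton step from the Jacobian
        x = x + dx
    return x, max_iter

# Hypothetical example: intersection of the unit circle with y = x
F = lambda v: np.array([v[0]**2 + v[1]**2 - 1.0, v[0] - v[1]])
J = lambda v: np.array([[2*v[0], 2*v[1]], [1.0, -1.0]])
sol, its = newton_system(F, J, [1.0, 0.5])
```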
A.
yields
Newton’s method is locally convergent: if the starting point is close enough to the actual solution, it usually converges quickly, and in this case it did. This means the initial guess was sufficiently close to the true solution.
However, fsolve did not converge to a solution. fsolve uses a trust-region algorithm by default (Levenberg-Marquardt, a combination of the Gauss-Newton method and gradient descent, is available as an option), and such methods come with limitations:
They can be sensitive to the starting values. If the initial guess is not sufficiently close to the true solution, the algorithm may not converge (which is what we observed).
Local minima: the algorithm may converge to a local minimum of the residual norm instead of a root, especially if the function landscape is complex with multiple minima.
An exit flag of -2 means the algorithm stopped at a point that is not a root: the equation was not solved.
B
yields
Newton’s method takes steps based directly on the local derivative information, potentially taking large steps when far from the solution and smaller steps when closer.
fsolve, when using the Levenberg-Marquardt algorithm, combines aspects of the gradient descent method (which takes smaller, more cautious steps) with the Gauss-Newton method (which is more aggressive). This can lead to different paths through the solution space and convergence to different solutions
fsolve might have found a local minimum, which it mistook for a global minimum, while Newton’s method might have bypassed this due to its larger initial steps.
C
yields
Newton’s method fails to converge within 100 steps: it can diverge when the initial guess is not close to a root, especially in the presence of steep gradients or saddle points.
The Jacobian matrix may also become ill-conditioned at some point during the iteration, leading to large numerical errors when solving the linear system in each step.
fsolve converged here, so its trust-region strategy is evidently more robust at reaching a solution in this case.
D
yields
For Newton’s method:
Non-convergence: Newton’s method failing to converge could be due to several factors, such as a poor initial guess; in particular, since x1 = 0 is part of the solution, the iteration may run into a division by zero or a derivative that does not exist at some point.
Sensitive derivative: the function 10x1/(x1 + 0.1) has a derivative that becomes very large as x1 approaches −0.1, which can cause numerical issues such as overflow or large rounding errors and prevent convergence.
Flat regions: the method might get stuck in a flat region where the gradient is very small, producing steps too small to change the estimate of the solution significantly.
For fsolve, the same observations as in problem C apply regarding the robustness of its trust-region algorithm on this system of nonlinear equations.
P6
You are given the data file data.txt. Each row contains the 2D
coordinates (xi,yi) of an object at time ti. This object exhibits a periodic motion.
Implement the function function period = findPeriod(file_name)
that reads the data from a file and computes the period of the periodic motion.
The points in time where the object returns to the same position must be determined using fsolve. Report the value for the computed period.
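A possible Python sketch of this approach (the assignment asks for MATLAB): interpolate the sampled trajectory, then use fsolve to locate a time t > t0 at which the object returns to its initial position, filtering out times where only one coordinate matches. The data below are synthetic (circular motion with a known period of 2.5), since data.txt is not reproduced here.

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.optimize import fsolve

# Synthetic data standing in for data.txt: motion with period T = 2.5
t = np.linspace(0.0, 10.0, 400)
x = np.cos(2 * np.pi * t / 2.5 + 0.3)
y = np.sin(2 * np.pi * t / 2.5 + 0.3)

def find_period(t, x, y, guesses):
    """Interpolate the trajectory, solve x(t*) = x(t0) with fsolve from
    several guesses, and accept t* only if y also returns to its
    initial value (otherwise only one coordinate has recurred)."""
    sx, sy = CubicSpline(t, x), CubicSpline(t, y)
    f = lambda s: sx(s) - x[0]
    for g in guesses:
        t_star, = fsolve(f, g)
        if t_star > t[0] + 0.5 and abs(sy(t_star) - y[0]) < 1e-3:
            return t_star - t[0]
    return None

period = find_period(t, x, y, guesses=[2.0, 2.45, 3.0])
```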
Solution
This yields a computed period of 39.3870.
P7
Consider two bodies of masses μ = 0.012277471 and μ̂ = 1 − μ (Earth and Sun) in planar motion, and a third body of negligible mass (moon) moving in the same plane.
The motion is given by
u1'' = u1 + 2u2' − μ̂ (u1 + μ)/D1 − μ (u1 − μ̂)/D2
u2'' = u2 − 2u1' − μ̂ u2/D1 − μ u2/D2
with D1 = ((u1 + μ)^2 + u2^2)^(3/2) and D2 = ((u1 − μ̂)^2 + u2^2)^(3/2).
The initial values are
u1(0)=0.994, u1′(0)=0, u2(0)=0, u2′(0)=−2.001585106379082522420537862224.
Implement the classical Runge-Kutta method of order 4 and integrate this problem on [0,17.1] with uniform stepsize using 100, 1000, 10,000, and 20,000 steps.
Plot the orbits for each case. How many uniform steps are needed before the orbit appears to be qualitatively correct?
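A Python sketch of the experiment (the assignment asks for MATLAB). The right-hand side below is the first-order form of the standard restricted three-body (Arenstorf) equations matching the given μ and initial values; plotting the stored trajectories with matplotlib shows how the orbit improves as the step count grows.

```python
import numpy as np

def rk4(f, u0, t0, t1, n):
    """Classical 4th-order Runge-Kutta with n uniform steps on [t0, t1]."""
    h = (t1 - t0) / n
    t, u = t0, np.asarray(u0, dtype=float)
    trajectory = [u.copy()]
    for _ in range(n):
        k1 = f(t, u)
        k2 = f(t + h / 2, u + h / 2 * k1)
        k3 = f(t + h / 2, u + h / 2 * k2)
        k4 = f(t + h, u + h * k3)
        u = u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        trajectory.append(u.copy())
    return np.array(trajectory)

mu = 0.012277471
mu_hat = 1 - mu

def orbit_rhs(t, u):
    """First-order form of the equations of motion: u = (u1, u2, u1', u2')."""
    u1, u2, v1, v2 = u
    D1 = ((u1 + mu) ** 2 + u2 ** 2) ** 1.5
    D2 = ((u1 - mu_hat) ** 2 + u2 ** 2) ** 1.5
    a1 = u1 + 2 * v2 - mu_hat * (u1 + mu) / D1 - mu * (u1 - mu_hat) / D2
    a2 = u2 - 2 * v1 - mu_hat * u2 / D1 - mu * u2 / D2
    return np.array([v1, v2, a1, a2])

u0 = [0.994, 0.0, 0.0, -2.001585106379082522420537862224]
orbits = {n: rk4(orbit_rhs, u0, 0.0, 17.1, n)
          for n in (100, 1000, 10000, 20000)}
# plot orbits[n][:, 0] against orbits[n][:, 1] for each n

# sanity check of the integrator: y' = -y, y(0) = 1 on [0, 1]
decay = rk4(lambda t, u: -u, [1.0], 0.0, 1.0, 100)
```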