Problem 1 [10 points] You are given the file points.mat with training data. There are three kinds of points.
The goal is to train with these data and classify the points in $[0,1]×[0,1]$.
Modify the file netbpfull.m such that it works with the three categories of points.
• Load the data with load points.mat. This will result in an array $x$ containing points
in 2D and an array labels containing labels.
• Modify the function cost such that it returns $accuracy=total number of pointsnumber of points classified correctly ∗100$
and also returns the indices (in x) of training points that are not classified correctly.
The following contains the diff of the original netbpfull.m and the modified version
I have done the following changes
While loading points.mat, X is now normalized such that training can converge faster
The hidden layers has now been increased to 20 neurons each
W4 and b4 are now initialized with the number of classes in the dataset
The weights are initialized with a smaller scale (multiplied by 0.01), likely to maintain a tighter initial distribution, reducing the risk of saturation of neurons if a sigmoid activation function is used.
Hyperparameters tuning:
The initial learning rate has been reduced to 0.001 for more stable training
Momentum (alpha) is set to 0.89, which is used to update the weight changes. This is to help the neural net get out of local minima points so that a more important global minimum is found.
Added batch-size of 16 for mini-batch gradient descent, a balance between SGD and single gradient descent.
Added learning rate decay, which reduces the learning rate by a factor of 0.99 every 10000 iterations. This is to help the neural net converge to a global minimum.
Training process:
Introduce batch-aware training, which trains the neural net with a batch of points instead of a single point. This is to help the neural net converge faster.
Updated activation function to provide three options: sigmoid, relu, and leaky_relu. The latter two are used for the hidden layers, while the former is used for the output layer.
Update the backpropagation to compute LeaKy ReLU derivatives for the hidden layers.
Updated gradients over batches, and also update the weights with L2 regularization.
Finally, updated cost functions from norm-based error (MSE) to cross-entropy loss, which is more suitable for classification problems.
Activation function: The leaky ReLU activation function is explicitly defined, which helps to mitigate the “dying ReLU” problem where neurons can become inactive and only output zero.
The following contains graphs of the training process:
Problem 2 [8 points] Implement in Matlab the bisection and Newton’s method for finding roots of scalar equations.
Use your implementation of the bisection method to find a root of
a. $x1 −exp(2−x )$ in $[0.1,1]$
b. $x∗sin(x)−1$ in $[0,2]$.
Use your implementation of Newton’s method and Matlab’s fsolve to find a root of
a. $x1 −exp(2−x )$ with initial guess $x_{0}=1$
b. $x∗sin(x)−1$ with initial guess $x_{0}=2$
For the bisection method, use $tol=10∗−10$. For your Newton and fsolve, solve until $∣f(x_{n}n)∣≤10_{−10}$.
If you are obtaining different roots, explain the differences. Also, discuss the number of iterations.
Solution
Bisection method
Newton method
fsolve
Table
For $x1 −exp(2−x )$
method
root $r$
$f(r)$
num. iterations
bisection
0.2152
8.7809e-09
29
Newton
28.6942
-2.5223e-14
9
fsolve
28.5131
-3.7357e-04
12
For $x∗sin(x)−1$
method
root $r$
$f(r)$
num. iterations
bisection
1.1142
4.3660e-10
30
Newton
-9.3172
-2.4834e-11
5
fsolve
1.1142
-1.9488e-08
3
Analysis
For $x1 −exp(2−x )$
Bisection method: The bisection method found a root in the interval $[0.1,1]$ as expected. This method guarantees convergence to a root when it exists within the interval and the function changes sign. However, it is generally slower, as indicated by the higher number of iterations.
Newton’s method: converged to a completely different root, which is outside the interval considered for the bisection method. This shows that Newton’s method is highly sensitive to the initial guess. It also converges faster (fewer iterations) but can lead to roots that are far from the initial guess if the function is complex or if the derivative does not behave well.
fsolve: Similar to Newton’s method, fsolve also found a root far from the interval used for the bisection method. Likely uses a variant of Newton’s method or a similar approach, which explains the similar behavior.
For $x∗sin(x)−1$
Bisection method: As with the first function, the bisection method finds a root within the specified interval. The method is reliable but slow, as seen from the number of iterations.
Newton’s method: converged to a negative root, which is quite far from the interval $[0,2]$. This indicates that for this particular function, the method diverged significantly from the initial guess due to the function’s complex behavior, especially when considering trigonometric functions combined with polynomial terms.
Discussion:
Root Differences: The significant differences in roots, especially for Newton’s method and fsolve, highlight the sensitivity of these methods to initial guesses and the nature of the function. For complex functions, especially those with multiple roots, the choice of the initial guess can lead to convergence to entirely different roots.
Number of Iterations: Newton’s method and fsolve generally require fewer iterations than the bisection method, demonstrating their faster convergence rate. However, this comes at the cost of potentially finding different roots, as seen in the results.
Problem 3 [4 points] The annuity equation is $A=rP (1−(1+r)_{−n})$
where $A$ is borrowed amount, $P$ is the amount of each payment, $r$ is the interest rate per period, and there are $n$ equally spaced payments.
Write Newton’s method for finding $r$.
Implement the function function r = interest(A, n, P) which returns the annual interest rate. Your function must call fsolve. Ensure that fsolve uses the analytical form of the derivative. Report the values of interest(100000, 20*12, 1000), interest(100000, 20*12, 100). Interpret the results.
Solution
Newton’s function for finding $r$
Given $A=rP (1−(1+r)_{−n})$
We have $f(r)=rP (1−(1+r)_{−n})−A$
Newton’s methods says $r_{1}=r_{0}−f_{′}(r_{0})f(r_{0}) $, with $f_{_{′}}(r)=−r_{2}P (1−(1+r)_{−n})+rPn (1+r)_{−n−1}$
From calculation, f_100=-0.0099 and f_1000=0.0500. Given here that we use r_0=0.05 (or 5% interests rate)
For $P=1000$, the interest rate that satisfies the annuity equation is approximately 5%
For $P=100$, the interest rate required to satisfy the loan conditions would have to be different from 5%.
The negative value of the function (-0.0099) suggests that the actual interest rate required to meet the annuity equation under these conditions is lower than the initial guess of 5%.
Note that the initial value here affects Newton’s approximation vastly. If one changes to 1% one might observe different value
Problem 4 [3 points] Consider Newton’s method on $x_{5}−x_{3}−4x=0$
a. How do the computed approximations behave with $x_{0}=1$?
b. Try your implementation with $x_{0}=1$ and $x_{0}=1+10_{−14}$. Explain why this method behaves differently, when started with $x_{0}=1+10_{−14}$, compared to when it is started with $x_{0}=1$.
c. Solve also with fsolve. Comment on the results.
Solution
Given the Newton’s implementation
a. The approximation converges to $x=1.0$. This indicates that the method finds a root at $x=1$.
b.
Newton’s Method with $x_{0}=1$: The approximation converges to $x=1.0$. This indicates that the method finds a root at $x=1$.
Newton’s Method with $x_{0}=1+10_{−14}$: The approximation converges to a different value, approximately $x=1.600485180440241$. This suggests that a small change in the initial guess leads Newton’s method to converge to a different root, highlighting the method’s sensitivity to initial conditions.
c. Using fsolve
The fsolve result differs from the Newton’s method result for the same initial guess (yields 0). This could be due to the inherent differences in the algorithms used by fsolve.
fsolve in matlab uses Levenberg-Marquardt, which finds roots approximately by minimizing the sum of squares of the function and is quite robust, comparing to the heuristic implementation of the Newton’s implementation.
Problem 5 [8 points] Implement Newton’s method for systems of equations.
Each of the following systems of nonlinear equations may present some difficulty in computing a solution. Use Matlab’s fsolve and your own implementation of Newton’s method to solve each of the systems from the given starting point.
In some cases, the nonlinear solver may fail to converge or may converge to a point other than a solution. When this happens, try to explain the reason for the observed behavior.
Report for fsolve and your implementation of Newton’s method and each of the systems below, the number of iterations needed to achieve accuracy of $10_{−6}$ (if achieved).
Solution
For all of the following Newton’s method implementation, it will follow the following framework
Where equations is the Newton’s system of equations, jacobian is the Jacobian of the system, followed by x as the approximation and check for convergence.
A.
yields
Since Newton’s method is locally convergent, meaning that if the starting point is close enough to the actual solution, it will usually converge quickly, and in this case, it did. This means the initial guess was sufficiently close with the true solution.
However, fsolve did not converge to a solution. Since fsolve uses Levenberg-Marquardt algorithm (This algorithm is a trust-region type algorithm, which is a combination of the Gauss-Newton algorithm and the method of gradient descent), it does come with limitation:
The Levenberg-Marquardt algorithm can be sensitive to the starting values. If the initial guess is not sufficiently close to the true solution, the algorithm may not converge. (which we observed)
Local minima: The algorithm may converge to a local minimum instead of a global minimum, especially if the function landscape is complex with multiple minima.
exit flag of -2 means that the two consecutive steps taken by the algorithm were unable to decrease the residual norm, and the algorithm terminated prematurely.
B
yields
Newton’s method takes steps based directly on the local derivative information, potentially taking large steps when far from the solution and smaller steps when closer.
fsolve, when using the Levenberg-Marquardt algorithm, combines aspects of the gradient descent method (which takes smaller, more cautious steps) with the Gauss-Newton method (which is more aggressive). This can lead to different paths through the solution space and convergence to different solutions
fsolve might have found a local minimum, which it mistook for a global minimum, while Newton’s method might have bypassed this due to its larger initial steps.
C
yields
Newton’s method cannot find convergence after 100 steps because of divergence, if the initial guess is not close to the root, especially in the presence of steep gradients or saddle points.
The Jacobian matrix at some point during the iteration may become ill-conditioned, which would lead to large numerical errors in the computation of the inverse or the solution of the linear system in each iteration
fsolve converged here, meaning Levenberg-Marquardt is probably more robust in converging a local minima in this case.
D
yields
For Newton’s method:
Non-convergence: The fact that Newton’s method did not converge could be due to several factors such as a poor initial guess, especially since $x_{1}=0$ is one of the solutions, which may lead to a division by zero or a derivative that does not exist at some point during the iteration.
Sensitive Derivative: The function $(x_{1}+0.1)10x_{1} $ has a derivative that becomes very large as $x_{1}$ approaches -0.1, and this can cause numerical issues, such as overflow or large rounding errors, which can prevent convergence.
Flat Regions: The method might be getting stuck in a flat region of the function where the gradient is very small, leading to very small steps that do not significantly change the estimate of the solution.
For fsolve observation, same arguments can be made that of similar to problem C observation, with regards to robustness of Levenberg-Marquardt algorithm in solving this system of non-linear equations.
Problem 6 [8 points] You are given the data file data.txt. Each row contains the 2D
coordinates $(x_{i},y_{i})$ of an object at time $t_{i}$. This object exhibits a periodic motion.
Implement the function function period = findPeriod(file_name)
that reads the data from a file and computes the period of the periodic motion.
The points in time where the object returns to the same position must be determined using fsolve. Report the value for the computed period.
Solution
yields the computed period of 39.3870
Problem 7 [3 points] Consider two bodies of masses $μ=0.012277471$ and $μ^ =1−μ$ (Earth and Sun) in a planar motion, and a third body of negligible mass (moon) moving in the same plane.
The motion is given by
The initial values are
$u_{1}(0)=0.994$, $u_{1}(0)=0$, $u_{2}(0)=0$, $u_{2}(0)=−2.001585106379082522420537862224$.
Implement the classical Runge-Kutta method of order 4 and integrate this problem on $[0,17.1]$ with uniform stepsize using 100, 1000, 10,000, and 20,000 steps.
Plot the orbits for each case. How many uniform steps are needed before the orbit appears to be qualitatively correct?
Submit plots and discussion.
Solution
yields the following graph
100 steps
It appears that the plot for the first 100 steps of the three-body problem using the RK4 method in MATLAB shows a spiral pattern rather than the expected closed orbit. This divergence could be due to several factors:
Step Size: A step size of 100 may be too large to accurately capture the dynamics of the system, leading to significant numerical errors. The three-body problem is known for its sensitivity to initial conditions and step sizes, and thus requires a smaller step size for a more accurate solution.
Numerical Stability: The RK4 method, while fourth-order accurate for each step, is not guaranteed to be stable for all step sizes and problems.
1000 steps
The plot for the 1000 steps case shows a significant improvement over the 100 steps case. This plot demonstrates a more defined and coherent orbit, which suggests that the step size is more appropriate for capturing the dynamics of the system. The spiral pattern from the 100 steps case is less pronounced, and the orbit begins to resemble the expected closed path of the three-body problem.
However, there is still some noticeable deviation and distortion in the orbit, which indicates that while the solution is converging towards the correct behavior with a smaller step size, further refinement might be necessary. In practice, continuing to reduce the step size can help further improve the accuracy of the orbit.
10000 steps
The plot for 10,000 steps demonstrates a substantial improvement and now depicts a closed orbit, which is characteristic of the three-body problem when solved with sufficient numerical accuracy. This indicates that a step size small enough to capture the system’s dynamics accurately has been achieved, and the RK4 method is yielding a reliable approximation of the moon’s orbit.
The orbit is smooth and does not exhibit the distortions seen in the plots with fewer steps. This suggests that the numerical integration is now sufficiently resolving the trajectory over the time span of interest. With 10,000 steps, it appears that the orbit is qualitatively correct, showing the expected behavior of a third body under the gravitational influence of the other two massive bodies.
Sufficient Resolution: The step size of 10,000 steps seems to provide a high enough resolution for the RK4 method to produce a stable and accurate orbit.
Numerical Accuracy: The smaller step size has reduced the numerical errors to a level where they do not significantly affect the qualitative behavior of the solution.
Orbit Stability: The closed and stable orbit indicates that the solution is likely converging to the true physical behavior of the system.
20000 steps
The plot for 20,000 steps exhibits a very stable and well-defined orbit, which closely resembles the plot for 10,000 steps. This consistency between the two resolutions suggests that the numerical solution has converged, and increasing the step count further does not result in any significant changes to the orbit’s shape or accuracy.
NOTE: The above implementation manually implement RK4, since ode45 is an adaptive methods and not conform to the fixed-step RK4. The equivalent of Python’s implementation with RK45.
Problem 8 [5 points] The following system of ODEs, formulated by Lorenz, represents are crude model of atmospheric circulation:
Set $ω=10,b=38 ,r=28$, take initial values $y_{1}(0)=15,y_{2}(0)=15,andy_{3}(0)=36$, and integrate this ODE from $t=0tot=100$ using Matlab’s ode45.
Plot each component of the solution as a function of $t$. Plot also $(y_{1},y_{2})$, $(y_{1},y_{3})$, and $(y_{2},y_{3})$ (in separate plots).
Change the initial values by a tiny amount (e.g. $10_{−10}$) and integrate again. Compare the difference in the computed solutions.
Solution
The difference between the time series graph as shown
and diff between Lorenz are shown
The difference in the time series of $y_{1},y_{2},y_{3}$ indicates how sensitive the Lorenz system is to initial conditions. Even though the change in the initial conditions is extremely small (on the order of $10_{−10}$), the differences in the variables grow over time.
This divergence is a characteristic of chaotic systems and is known as sensitivity to initial conditions or the butterfly effect.
Difference in Phase Space Graphs, or the delta plots show that the discrepancies between the two sets of solutions with slightly different initial conditions also exhibit complex behavior. Initially, the differences are small, but as time progresses, they become more pronounced, indicating that the system’s trajectory has deviated significantly from the original path.
Problem 9 [8 points] Let A be an $n×n$ singular matrix.
Let $F(X)=I−AX$
where $I$ is the $n×n$ identity matrix. When $F(X)$ is the zero $n×n$ matrix, then $X=A_{−1}$.
We can use Newton’s method to find $A_{−1}$:
$X_{k+1}=X_{k}+A_{−1}(I−AX_{k})$
We replace $A_{−1}$ by $X_{k}$ to obtain the formula $X_{k+1}=X_{k}+X_{k}(I−AX_{k})$ (1)
a. Write a function to compute the inverse of a given matrix A using (1). You can use as an initial guess $X_{0}=∣A∣_{1}∣A∣_{∞}A_{T} $
Test your program on a few random matrices and report numerical experiments comparing its accuracy and efficiency with Matlab’s inverse function inv.
b. Does (1) converge quadratically? Provide sufficient detail supporting your claim.
Solution
a. The following entails the MATLAB solution (Python equivalent is inverse_newt.py)
yields
b. For quadratic convergence, we need $lim_{k→∞}∣e_{k}∣∣e_{k+1}∣ =C$
In this case, we need to check $E_{k}=X_{k}−A_{−1}$ as k increases.
$X_{k+1}=X_{k}+A_{−1}(I−AX_{k})$
Substitute $E_{k}$ we have $E_{k+1}=(I−X_{k}A)E_{k}$
Therefore, for quadratic convergence, we need $∣E_{k+1}∣≤C∣E_{k}∣_{2}$ for some constant $C$
From this, we have $∣E_{k+1}∣=∣(I−X_{k}A)E_{k}∣≤∣I−X_{k}A∣∣E_{k}∣$
The following modification of Python implementation is used to track errors
From the Python implementation, we calculated the ratio of the error at step $n+1$ to the square of the error at step $n$, and observed that these ratios seemed to stabilize around a constant value, rather than decreasing to zero. However, the ratios did not significantly deviate, indicating a consistent rate of convergence that could be quadratic.
To assert that the convergence is quadratic, we would expect the ratios to be bounded and for
$∣E_{k+1}∣$ to be significantly less than $∣E_{k}∣_{2}$ as $n$ increases. The results from Python showed that the error does decrease from one iteration to the next, which is consistent with convergence.
The discrepancy between the theoretical expectation of quadratic convergence and the observed stabilisation of the error ratios might suggest that while the Newton’s method for matrix inversion is converging, it may not exhibit pure quadratic convergence in the empirical test we conducted. There could be several reasons for this: - The matrix $A$ used in the test may not meet the conditions required for quadratic convergence throughout the iterations. - The numerical precision and floating-point representation in Python may affect the calculation of the error and its ratios.