# numerical analysis: root finding and others

a root finding algorithm is a numerical method for finding an x such that f(x) = 0 for a given function f. this particular value of x is called a root of f

there are many different methods of finding roots of a function

### Bisection Method

bisection method is a simple root finding method. it requires prior knowledge of the values of a continuous function f at the endpoints of a given interval [a,b] where the root exists

the function value at the endpoints must have different signs (i.e. f(a)*f(b)<0)

using the Intermediate Value Theorem, the bisection method iteratively halves the interval [a,b], discarding the half where no sign change occurs. it iterates until it converges to the root or the desired level of accuracy is reached


algorithm:
1. input level of accuracy ε
2. set interval a, b
3. compute m = (a + b) / 2
4. if |f(m)| ≤ ε , m is the root
5. if f(m) has the same sign as f(a), replace a with m
6. if f(m) has the same sign as f(b), replace b with m
7. repeat 3-6

the bisection method algorithm constructs a sequence cₙ that converges to a root r of the given function, that is lim cₙ = r as n → ∞


#!/usr/bin/python

# Bisection Method

from math import fabs

def F(x):                          # the function
    f = pow(x, 2) - 4
    return f

def bisection():
    a = float(input("Left Interval: "))
    b = float(input("Right Interval: "))
    N = int(input("How many iterations? "))

    Flft = F(a)                    # the left function value
    Frgt = F(b)                    # the right function value

    if Flft * Frgt < 0:            # a sign change occurs on [a, b]
        for i in range(0, N, 1):
            c = (a + b) / 2        # the midpoint of the interval
            print(c)
            Fmid = F(c)            # midpoint function value
            Flft = F(a)            # left endpoint value
            Frgt = F(b)            # right endpoint value

            if fabs(a - b) < 0.00001:
                print("root is:")
                print(c)
                break
            elif Flft * Fmid < 0:  # sign change happens on the left
                b = c
            elif Fmid * Frgt < 0:  # sign change happens on the right
                a = c
    else:
        print("no root")

bisection()


### Multiple Partitioning Method

if we can bisect the interval [a,b] on which f(x) is continuous, is it not better to do multiple partitions? to answer this question, one must first define ‘better’

let k = 2, 3, 4, … be the number of partitions. is it better to do k partitions in terms of the number of iterations required? in terms of work?

let n ≥ 1 be the iteration number and aₙ, bₙ, cₙ denote the nth values of a, b, c. each iteration keeps one subinterval of 1/k the current width:

$$b_{n+1} - a_{n+1} = \frac{b_n - a_n}{k}, \qquad b_n - a_n = \frac{b - a}{k^{n-1}}$$

∃ a root α in the retained subinterval, and cₙ is an endpoint of that subinterval. hence:

$$|\alpha - c_n| \le \frac{b_n - a_n}{k}$$

combining the two equations together:

$$|\alpha - c_n| \le \frac{(b - a)/k^{n-1}}{k} = \frac{b - a}{k^n}$$

since 1/kⁿ → 0 as n → ∞ (for k ≥ 2), cₙ → α

to guarantee |α - cₙ| ≤ ε it suffices to have

$$\frac{b - a}{k^n} \le \varepsilon$$

solving for n:

$$n \ge \frac{\log((b - a)/\varepsilon)}{\log(k)}$$

as k → ∞ the number of iterations required to find the root x = α decreases. bisection (k = 2) requires more iterations to find the root of a given function f(x) than trisection (k = 3)

which method is better with respect to the amount of work done (cost)?

each iteration requires (k - 1) new function evaluations, one at each interior partition point. multiplying this by the number of iterations gives the cost:

$$\text{cost} \approx (k - 1) \cdot \frac{\log((b - a)/\varepsilon)}{\log(k)}$$

the factor (k - 1)/log(k) grows with k, so as k → ∞ more computation is required to converge to a root
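as a quick sanity check on these two conclusions, the iteration count and cost can be tabulated for several values of k (a small sketch; the interval width and tolerance below are arbitrary illustrative choices):

```python
# tabulate iterations n >= log((b-a)/eps)/log(k) and cost (k-1)*n
# for partition counts k = 2..10
from math import log, ceil

width = 1.0          # b - a, an arbitrary example interval width
eps = 1e-5           # desired accuracy

results = []
for k in range(2, 11):
    n = ceil(log(width / eps) / log(k))   # iterations required
    cost = (k - 1) * n                    # (k - 1) evaluations per iteration
    results.append((k, n, cost))
    print("k =", k, " iterations =", n, " cost =", cost)
```

the table confirms the point above: iterations fall as k grows, but the total number of function evaluations rises.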

the following is the bisection method code modified into a multi-partition method:


%% to find the root of f(x) = 0 in the interval [a,b] using n partitions

clear all, close all, clc

f = @(x) x.^3 + 1 ;                           % given function
eps = 1e-5 ;                                  % tolerance

a = input ("left endpoint of the interval  : ") ;
b = input ("right endpoint of the interval : ") ;

ya = f(a) ; yb = f(b) ;                       % function values at the endpoints

if ya * yb > 0 , error ('no root in this interval') , end

n = input ("number of partitions : ") ;

nmax = abs (log (eps / (b - a)) / log (n)) ;  % number of iterations required

for j = 1 : ceil (nmax)
    x = linspace (a , b , n + 1) ;            % divide the interval into n equal sections
    for i = 1 : n
        if f (x(i)) * f (x(i + 1)) <= 0       % a sign change occurs here
            ta = x(i) ;                       % lower endpoint assignment
            tb = x(i + 1) ;                   % upper endpoint assignment
        end
    end
    a = ta ; b = tb ;
end

printf ("root is : %f\n", a) ;



the partition methods (i.e. bisection, trisection, etc…) are guaranteed to find a root of a given continuous function on an interval where a sign change occurs

the above root finding methods are also referred to as two-point methods. the following methods are one-point methods, requiring only one initial guess

### Fixed-Point Iteration Method

a fixed point x of a given function g(x) satisfies g(x) = x

given a continuous function g(x) with x ∈ [a,b] and g(x) ∈ [a,b] (the function maps its domain into itself, g: [a,b] → [a,b]), then by Brouwer's fixed-point theorem (a continuous map from a closed interval into itself has a fixed point) there is a fixed point on [a,b]

is the fixed point unique? if g'(x) exists on [a,b] with |g'(x)| ≤ k < 1, then the fixed point x is unique

Proof:
suppose that p and q are fixed points of g(x)   :   g(p) = p   and   g(q) = q

by the Mean Value theorem of derivative:

|p - q|   =   |g(p) - g(q)|   =   |g'(χ) * (p - q)|   ≤   |g'(χ)| * |p - q|   ≤   k * |p - q|

where χ ∈ [p,q] , 0 < k < 1

|p - q| ≤ k * |p - q| < |p - q| , 0 < k < 1

therefore p = q

QED

how do we find the fixed point?

start from an initial point p₀ and iterate:

p₁ = g(p₀), p₂ = g(p₁), … , pₙ₊₁ = g(pₙ)

this is the fixed point iteration

Proof:

let p = g(p) be the fixed point and p₀, p₁, p₂, … , pₙ the sequence of guesses

by the Mean Value Theorem, ( g(pₙ) - g(p) ) / ( pₙ - p ) = g'(χ) where χ lies between pₙ and p. hence:

|pₙ₊₁ - p| = |g(pₙ) - g(p)| = |g'(χ)| * |pₙ - p| ≤ k * |pₙ - p|

applying this bound repeatedly gives |pₙ - p| ≤ kⁿ * |p₀ - p| → 0, since 0 < k < 1

the fixed point is the solution of the equation x = g(x), i.e. the intersection of y = g(x) with the line y = x (for example, g(x) = cos(x) on [-π/2, π/2] has a single such intersection)

the fixed point iteration algorithm generates the sequence pₙ , n = 0..∞ , with

lim pₙ = lim g(pₙ₋₁) = g(p) = p as n → ∞


#!/usr/bin/python

# Fixed-Point Iteration Method

def F(x):                       # the iteration function g(x)
    f = 8 / (x + 2)
    return f

def fixedpoint():
    p = []
    p.insert (0, 4)             # initial condition p0
    for i in range (0, 19, 1):
        t = p[i]
        x = F(t)
        p.insert (i + 1, x)
        print (p[i])

fixedpoint()



to find the root of a given function f(x) = 0, the problem can be rewritten as a fixed point iteration for some g(x), so that a fixed point of g(x) is a root of f(x). a good choice of g(x) is very important

example: say you wish to find the root of f(x) = x^3 - 2*x - 5
solving f(x) = 0 for x in two different ways gives g₁(x) = (x^3 - 5)/2 or
g₂(x) = (2*x + 5)^(1/3)
using the code above with an initial condition of x₀ = 2, you will have the following output:

g₁(x) does not converge:

       2
1.5
-0.8125
-2.76818847656
-13.1061306749
-1128.12436634
-717861967.275
-1.84966397745e+26
-3.16408775755e+78


this selection of g₁(x) = (x^3-5)/2 is not a good choice

whereas g₂(x) will converge:

      2
2.08008382305
2.0923506778
2.09421699601
2.09450065219
2.09454375753
2.09455030781
2.09455130318
2.09455145444
2.09455147742
2.09455148092
2.09455148145
2.09455148153
2.09455148154

f(2.09455148154)=-2.596856063519226e-11
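the difference between the two choices follows from the convergence condition in the proof above: the iteration converges when |g'(x)| < 1 near the fixed point. a quick numerical check at the root found above (a sketch; the derivatives are estimated by a finite difference, not part of the original code):

```python
# estimate |g'(r)| for both choices of g at the fixed point r:
# fixed-point iteration converges when |g'(r)| < 1
def g1(x):
    return (x ** 3 - 5) / 2

def g2(x):
    return (2 * x + 5) ** (1 / 3)

def deriv(g, x, h=1e-6):
    # central finite-difference estimate of g'(x)
    return (g(x + h) - g(x - h)) / (2 * h)

r = 2.09455148154         # the root found above
print(abs(deriv(g1, r)))  # greater than 1: the g1 iteration diverges
print(abs(deriv(g2, r)))  # less than 1: the g2 iteration converges
```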

## Newton’s Method

given a function f(x) whose derivative f'(x) is continuous near a root r, Newton’s method can be used to produce a sequence pₖ that converges to r faster than the bisection method. Newton’s method is also a one-point method (i.e. it only requires one initial guess)

Newton's method uses the tangent line at one point, the initial guess x₀, on the curve y = f(x). the equation of the tangent at the initial guess is written as the following:

y - f(x₀) = f'(x₀) * (x - x₀)

#### Newton’s Method Theorem:

Assume that f(x) ∈ C²[a,b] and ∃ a root r ∈ [a,b] such that f(r)=0. The Newton’s Method sequence is defined by the following iteration:

$$x_{k+1}={x_k}-\frac{f(x_k)}{f'(x_k)} \quad \text{for } k \ge 0$$

Here x_{k+1} is the x-intercept of the tangent line above. For the next iteration, the initial guess is replaced by the x-intercept obtained in the previous calculation. This iteration continues until the root r is found or the user is satisfied with the level of accuracy.

The Newton’s method iteration, similar to the fixed point iteration, will converge to the root r for any initial guess sufficiently close to the root, provided f'(x) \neq 0 on [a,b].


#!/usr/bin/python

# Newton's Method

def F(x):                           # the function
    f = pow(x, 3) + 2 * x - 4
    return f

def F1(x):                          # the derivative
    f = 3 * pow(x, 2) + 2
    return f

def newtonmethod():
    p = []
    p.insert(0, 1)                  # initial condition
    for i in range(0, 10, 1):
        t = p[i]
        x = p[i] - (F(t) / F1(t))   # Newton's method formula
        p.insert(i + 1, x)
        print(p[i])

newtonmethod()


I will test this code to find sqrt(2) = 1.41421356237 with 20 iterations, using the function f(x) = x^2 - 2 and x₀ = 5. Here is the program output:

5
2.7
1.72037037037
1.44145536818
1.41447098137
1.4142135858
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237
1.41421356237


There are times when Newton’s method enters a cycle from its starting point.

Let’s take f(x) = x^3 - 2x + 2 and start it at x₀ = 0. Then x₁ = 0 - f(0)/f'(0) = 0 - 2/(-2) = 1, and the next step returns x₂ = 1 - f(1)/f'(1) = 1 - 1/1 = 0:

  0
1.0
0.0
1.0
0.0
1.0
0.0
1.0
0.0
1.0
0.0
1.0
0.0
1.0
0.0
1.0
0.0
1.0
0.0


The sequence will oscillate between the two without converging to a root.

## Vandermonde Matrix & Lagrange Polynomial


The quadratic equation example above is in fact a Vandermonde system with only 3 points. However, let's consider n+1 points. The polynomial can be expressed in this form:

$$P_n(x)=\sum_{k=0}^n{a_k x^k} \quad \text{and} \quad f(x_k)=P_n(x_k), \quad k=0,1,2, \cdots ,n$$

For many data points, usually the case in real-life problems (compared to the P_2(x) example above), we obtain a linear system of equations for the coefficients a_0, a_1, a_2, \cdots, a_n:

$$\begin{bmatrix} 1 & x_0 & x_0^2 & \cdots & x_0^n \\ 1 & x_1 & x_1^2 & \cdots & x_1^n \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^n \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}=\begin{bmatrix} f(x_0) \\ f(x_1) \\ \vdots \\ f(x_n) \end{bmatrix}$$

The coefficient matrix is known as the Vandermonde Matrix.

Matlab code:

clear

N = 5 ;
for j = 1 : N
    x(j) = (j - 1) * 2 * pi / (N - 1) ;  % set up grid
end

% alternatively use x = linspace(0, 2*pi, N)

f = cos(x') ;                            % set up function values

for j = 1 : N
    V(:,j) = x .^ (j - 1)                % build the Vandermonde matrix column by column
end

c = inv(V) * f ;                         % polynomial coefficients

% now make the polynomial
for j = 1 : N
    cp(N + 1 - j) = c(j)                 % rearrange order of the coefficients
end

y = x' ;
P = polyval(cp, y)


You can increase the number of points to achieve a more accurate approximation. However, the numerical solution of the linear system above for large n is very expensive and gives rise to stability problems. Using Matlab, you will start getting the following warning:

    Warning: Matrix is close to singular or badly scaled.
    Results may be inaccurate. RCOND = 9.377592e-48.


And the approximation (in blue) vs. the function (f(x) = cos(x) on [0, 2\pi], in red) would look like this:

The error of this interpolation is on the right, plotted against the number of points. As you increase the number of data points, the error increases tremendously. This is not good: you would expect the error to decrease. The figures above use just 50 data points. If more points are added, you can see the error even more vividly.

For this reason, a better option for polynomial interpolation is needed. Say you’re trying to find an interpolating polynomial that agrees with the function f(x) at two points, a P_1(x). The function can then be approximated by taking a linear combination of two polynomials of degree 1:

$$P_1(x)=L_{1,0}(x)f(x_0)+L_{1,1}(x)f(x_1), \qquad L_{1,0}(x)=\frac{x_1 -x}{x_1 -x_0}, \qquad L_{1,1}(x)=\frac{x -x_0}{x_1 -x_0}$$

Hence L_{1,0}(x_0)=1 and L_{1,0}(x_1)=0, while L_{1,1}(x_0)=0 and L_{1,1}(x_1)=1. This way, P_1(x)=L_{1,0}(x)f(x_0)+L_{1,1}(x)f(x_1) agrees with f at both points, and connecting successive pairs of points creates secant lines joined to each other.

The above equation can be extended for n+1 points as the following:

$$P_n(x)= \sum_{k=0}^{n} {L_{n,k}}(x) f(x_k)$$

The index n denotes the degree of the polynomial and k indicates that f(x_k) is the coefficient of L_{n,k}(x) in P_n(x).

$$L_{n,k}(x)=\prod_{j=0, j \neq k}^n \frac{x-x_j}{x_k-x_j}$$

The Lagrange polynomials satisfy:

$$L_{n,k}(x_j) = \begin{cases} 1 & j = k\\ 0 & j \neq k \end{cases}$$
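this delta property can be checked directly by evaluating each basis polynomial at every node (a small sketch; the sample nodes below are arbitrary):

```python
# evaluate L_{n,k} at every node x_j: the result should be 1 when j = k
# and 0 otherwise
def lagrange_basis(nodes, k, x):
    # L_{n,k}(x) = product over j != k of (x - x_j) / (x_k - x_j)
    result = 1.0
    for j, xj in enumerate(nodes):
        if j != k:
            result *= (x - xj) / (nodes[k] - xj)
    return result

nodes = [0.0, 0.5, 1.0, 2.0]     # arbitrary example nodes
for k in range(len(nodes)):
    row = [round(lagrange_basis(nodes, k, xj), 12) for xj in nodes]
    print(row)
```

the printed rows form an identity matrix, which is exactly the delta property above.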

The same thing happens with Lagrange interpolation that happened with the Vandermonde matrix. This problem, or misbehavior, is known as Runge’s phenomenon.

## Runge’s Phenomenon


It’s a problem that occurs during interpolation with polynomials of high degree on large data sets: the more data points used, the higher the polynomial degree. To investigate this phenomenon, I will use f(x)=\frac{1}{1+25x^2} on [-1,1].

Equally Spaced Nodes:

error plot of f(x) with 10 interpolation points

f(x) with 10 interpolation points

The error plot looks great. Anyone would be happy to get an error like that, almost machine precision (1e-16). However, the reconstruction of the function (in blue) is poor. With more points, we see Runge’s phenomenon, the oscillation toward the endpoints:

The error plot of this interpolation shows the impact of Runge’s phenomenon:

The error toward the endpoints, where the oscillations occur in the interpolation, is very large.

Chebyshev Points:

$$x_k = \cos \frac{k\pi}{n}\text{ for }0 \le k \le n$$

Chebyshev points are better for polynomial interpolation because they reduce the oscillation as n increases. Below is a plot of f(x) with just 16 points.
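this can be verified numerically by interpolating f(x) = 1/(1+25x^2) through equally spaced nodes and through Chebyshev nodes, then comparing the worst-case error on a fine grid (a pure-Python sketch using the Lagrange form; the node count and grid size are illustrative choices):

```python
# compare maximum interpolation error for equispaced vs. Chebyshev nodes
# on Runge's function f(x) = 1/(1+25x^2) over [-1, 1]
from math import cos, pi

def f(x):
    return 1.0 / (1.0 + 25.0 * x * x)

def lagrange_eval(nodes, values, x):
    # evaluate the Lagrange interpolating polynomial at x
    total = 0.0
    for k in range(len(nodes)):
        term = values[k]
        for j in range(len(nodes)):
            if j != k:
                term *= (x - nodes[j]) / (nodes[k] - nodes[j])
        total += term
    return total

def max_error(nodes):
    # worst-case error sampled on a fine grid over [-1, 1]
    values = [f(t) for t in nodes]
    grid = [-1.0 + 2.0 * i / 400 for i in range(401)]
    return max(abs(lagrange_eval(nodes, values, t) - f(t)) for t in grid)

n = 16
equispaced = [-1.0 + 2.0 * k / n for k in range(n + 1)]
chebyshev = [cos(k * pi / n) for k in range(n + 1)]

print("equally spaced max error:", max_error(equispaced))
print("Chebyshev max error     :", max_error(chebyshev))
```

the equispaced error is orders of magnitude larger, driven entirely by the oscillations near the endpoints.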

## Interpolation Using Splines

This method has advantages over the use of high degree polynomials, which have a tendency to oscillate between data values. It uses a series of third degree polynomials joining the points together. Only four unknown coefficients per polynomial need to be computed to construct the cubic spline.

Given n+1 data points, there will be n cubic polynomials s_k(x):

$$s_0 (x) = a_0 + b_{0}x + c_0 x^2 + d_0 x^3 \\ s_1 (x) = a_1 + b_{1}x + c_1 x^2 + d_1 x^3 \\ s_2 (x) = a_2 + b_{2}x + c_2 x^2 + d_2 x^3 \\ \vdots \\ s_{n-1} (x) = a_{n-1} + b_{n-1} x + c_{n-1} x^2 + d_{n-1} x^3$$

Each of these cubic polynomials must pass through the two points it joins: s_0(x) passes through (x_0, y_0) and (x_1, y_1), s_1(x) passes through (x_1, y_1) and (x_2, y_2), and so on.

Smoothness conditions are the most important part of the spline method. Therefore, continuity of the slope f'(x) and of the second derivative f''(x), which determines the concavity of the function f(x), must also hold between adjacent data points. These conditions keep the oscillations that usually happen in high order polynomials from appearing in the cubic spline.

Properties of Cubic Spline:

1. S(x)=s_k(x)=s_{k,0} + s_{k,1}(x-x_k) + s_{k,2}(x-x_k)^2 + s_{k,3}(x-x_k)^3 on [x_k, x_{k+1}] for k=0,1,2, \cdots, n-1
2. S(x_k)= f(x_k) for k=0,1,2, \cdots, n. The spline passes through each data point.
3. s_k(x_{k+1})=s_{k+1}(x_{k+1}) for k=0,1,2, \cdots, n-2. The spline forms a continuous function on the interval.
4. s'_k(x_{k+1})=s'_{k+1}(x_{k+1}) for k=0,1,2, \cdots, n-2. The spline forms a smooth function.
5. s''_k(x_{k+1})=s''_{k+1}(x_{k+1}) for k=0,1,2, \cdots, n-2. The second derivative is continuous.

The properties of cubic splines make sure that the Runge phenomenon does not occur in this interpolation method.

Octave code:

#!/usr/local/bin/octave -q

close all; clc; clear all

x = -cos(pi*(0:20)/20);     % 21 Chebyshev points on [-1,1]
% alternatively, x = linspace(-1,1,15); for equally spaced points
y = 1./(1+25*x.^2);         % function values at the nodes

xx = linspace(-1,1,100);    % fine grid for plotting
yy = 1./(1+25*xx.^2);       % exact function on the fine grid
ys = spline(x,y,xx);        % cubic spline evaluated on the fine grid

plot(xx,yy,'b')
hold on
plot(xx,ys,'r')

pause ()


The result of this code is the following:

10 equally spaced points

10 Chebyshev points

15 points

The visual result with Chebyshev points is a lot better than with equally spaced points (both with 10 points).

## Quadrature

Quadrature is the numerical evaluation of definite integrals of the following form:

$$F(x) = \int_{a}^{b} {f(x) dx}$$

This can be approximated by a weighted sum of this form:

$$F(x) \approx \sum_{k} w_k f(x_k)$$

There are many forms of quadrature available. Each of these methods has the same form as above; the only difference is the choice of the x_k, called the quadrature nodes, and the weights w_k.

The previous subject area was polynomial interpolation. The integral of the polynomial interpolant p_n(x) of f(x) may be used to derive a quadrature formula for F(x). Let this polynomial integral be denoted \hat{F}(x) from this point on. With p_n(x) \approx f(x) through n+1 points,

$$\hat{F}(x)=\int_a^b {p_n(x)}\,dx$$

Example using Lagrange form:

The polynomial interpolant in the Lagrange form is p_n(x) = \sum_{k=0}^n {L_{n,k}(x) f(x_k)}. The integral of a given function f(x) which is interpolated in Lagrange form can be approximated by:

$$\hat{F}(x)=\int_a^b{\left(\sum_{k=0}^n {L_{n,k}(x) f(x_k)}\right)}\,dx$$

Interchanging the order of summation and integration, we obtain:

$$\hat{F}(x)=\sum_{k=0}^n\left(\int_a^b{L_{n,k}(x)\, dx}\right)f(x_k)$$

$$\hat{F}(x) = \sum_{k=0}^n{w_k f(x_k)}$$

the weight is now w_k = \int_a^b {L_{n,k}(x) dx}. The most commonly used and simplest quadrature formulas are those with equally spaced nodes, known as the (n+1)-point Newton-Cotes formulas. Here is an example of the 2-point closed Newton-Cotes formula using the Lagrange form above:

set x_0 = a , x_1 = b and \Delta{x} = h = b - a. The interpolatory quadrature is then \hat{F}_1 (x) = w_0 f(x_0) + w_1 f(x_1), where

$$w_0 =\int_{x_0}^{x_1} L_{1,0}(x)\,dx \quad \text{and} \quad w_1 = \int_{x_0}^{x_1} L_{1,1}(x)\,dx$$

From my previous section on Lagrange interpolation,

$$w_0 = \int_{x_0}^{x_1} {\frac{x-x_1}{x_0-x_1}\,dx} = \frac{h}{2} \quad \text{and} \quad w_1 = \int_{x_0}^{x_1} {\frac{x-x_0}{x_1-x_0}\,dx} = \frac{h}{2}$$

so \hat{F}_1 (x) = \frac{h}{2} (f(x_0) + f(x_1)), which is the trapezoidal rule.
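these two weight integrals can be checked numerically: integrating each basis polynomial over [x_0, x_1] should give h/2 in both cases (a sketch; the endpoints below are arbitrary):

```python
# integrate the two Lagrange basis polynomials over [x0, x1] numerically;
# both weights should come out as h/2 (the trapezoidal rule)
def integrate(g, a, b, steps=10000):
    # composite midpoint rule; exact for linear g up to rounding
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

x0, x1 = 0.0, 2.0      # arbitrary example endpoints
h = x1 - x0

L10 = lambda x: (x - x1) / (x0 - x1)   # basis polynomial that is 1 at x0
L11 = lambda x: (x - x0) / (x1 - x0)   # basis polynomial that is 1 at x1

w0 = integrate(L10, x0, x1)
w1 = integrate(L11, x0, x1)
print(w0, w1, h / 2)
```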

function [sums] = sidrewquad (B, lend, rend)
    % integrate the interpolating polynomial with coefficients B
    % (ascending powers) over [lend, rend]
    syms x;
    for i = 1 : length(B)
        c(i) = B(i) * x^(i-1);
        ss(i) = int (c(i), x, lend, rend);
    end

    sums = sum (ss);
    sums = double (sums);
end


This function file takes the interpolating polynomial and integrates it. Using this code to integrate f(x) = sin(x) on [- \pi, \pi] (whose exact value is 0), the numerical result of the code is:

sidrewquad(B,lend,rend)
ans =
4.118443451920250e-11

This is with only 20 points.

Now trying f(x) = cos(x) on [- \pi, \pi]:

sidrewquad(B,lend,rend)
ans =
2.262400242652592e-10

With f(x) = \frac{1}{1-x^2} on [-1,1] the function file returns:

| Number of points | Integration of f(x) |
| --- | --- |
| 20 | 5.633474435012952 |
| 25 | -1.725386391741510e+03 |
| 30 | -1.123890178571379e+03 |
| 40 | -1.282148574121979e+04 |

No convergence should be expected here: f(x) has singularities at x = ±1, so the integral diverges.

This form can be extended to n+1 points for any polynomial-based interpolation:

$$\hat{F}_n (x)= \sum_{k=0}^n w_k f(x_k) = w_0 f(x_0) + w_1 f(x_1) + \cdots + w_{n-1} f(x_{n-1}) + w_{n} f(x_n)$$

The above equation can be used for any of the Riemann sum rules.

The above function file will work with any of the polynomial interpolation codes, but not with the cubic spline. So we wrote one to integrate the cubic spline:

function [sums] = sidrewquadspline (a, b, c, d, lend, rend)
    syms x;
    for i = 1 : length(b)
        C(i) = a(i) * x^(i-1);
        ss(i) = int (C(i), x, lend, rend);
    end

    sums = sum (ss);
    sums = double (sums);
end


The problem with this code is that we forgot that a cubic spline consists of a number of cubic polynomials, not just one, each valid only on its own subinterval. So this code is currently not working.
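the fix the paragraph above points toward is to integrate each cubic piece only over its own subinterval and sum the results. a minimal sketch of that idea (the breakpoints and coefficients below are made-up illustrations, not the output of a spline routine):

```python
# integrate a piecewise cubic: each piece s_k(t) = a + b*t + c*t^2 + d*t^3,
# with t = x - breaks[k], is integrated only over [breaks[k], breaks[k+1]]
def integrate_piecewise_cubic(breaks, coeffs):
    total = 0.0
    for k, (a, b, c, d) in enumerate(coeffs):
        h = breaks[k + 1] - breaks[k]
        # antiderivative of a + b t + c t^2 + d t^3, evaluated from 0 to h
        total += a * h + b * h**2 / 2 + c * h**3 / 3 + d * h**4 / 4
    return total

# two pieces representing x^3 on [0,1] and [1,2] in shifted form:
# on [1,2], x^3 = (1+t)^3 = 1 + 3t + 3t^2 + t^3 with t = x - 1
breaks = [0.0, 1.0, 2.0]
coeffs = [(0.0, 0.0, 0.0, 1.0), (1.0, 3.0, 3.0, 1.0)]
print(integrate_piecewise_cubic(breaks, coeffs))   # integral of x^3 on [0,2] = 4
```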

The Newton-Cotes formulas are based on interpolating polynomials. However, interpolation at equidistant points suffers from the Runge phenomenon even for well-behaved functions. For an explicitly given f(x), other choices of quadrature should be considered, such as Gaussian quadrature. Gauss-Chebyshev quadrature takes the form:

$$\int_{-1}^1 f(x)\, dx = \int_{-1}^{1} \frac{1}{\sqrt{1-x^2}} g(x)\, dx \approx \sum_{i=1}^n w_i g(x_i)$$

where the nodes are the Chebyshev points x_i = \cos\left(\frac{(2i - 1)\pi}{2n}\right),

$$g(x_i) = f(x_i) \sqrt{1-x_i^2} \quad \text{and the weight is} \quad w_i=\frac{\pi}{n} \quad \text{for } i=1,2, \cdots, n$$

Here is a Matlab code implementing Gauss-Chebyshev Quadrature:


function int = chebquad (f, n, a, b)
    x = cos (((2 * (1 : n) - 1) / (2 * n)) * pi) ;   % Chebyshev nodes on [-1,1]
    w = pi / n ;                                     % constant weight
    fx = ((b - a) / 2) * f (((b - a) / 2) * x + (b + a) / 2) .* sqrt (1 - x .^ 2) ;
    int = sum (w .* fx) ;
end
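for readers following along in Python, here is a rough port of the same quadrature (a sketch; the function name and loop structure mirror the Matlab code above):

```python
# Gauss-Chebyshev quadrature: approximate the integral of f over [a, b]
# using n Chebyshev nodes mapped onto the interval
from math import cos, pi, sqrt

def chebquad(f, n, a, b):
    total = 0.0
    w = pi / n                                   # constant Chebyshev weight
    for i in range(1, n + 1):
        x = cos((2 * i - 1) * pi / (2 * n))      # Chebyshev node on [-1, 1]
        t = ((b - a) / 2) * x + (b + a) / 2      # map the node to [a, b]
        total += w * ((b - a) / 2) * f(t) * sqrt(1 - x * x)
    return total

print(chebquad(lambda t: cos(t), 1000, 0, 1))    # approximately sin(1)
```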