systematic errors

numbers are represented in a computer with limited range and precision. let's say a computer keeps four decimal places. then

    1/3 = 0.3333 and 2/3 = 0.6667

now,

    2 * (1 / 3) – (2 / 3) = – 0.0001 ≠ 0
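the four-decimal-place computer above can be imitated in python. this is only a sketch: the helper keep4 is a hypothetical name introduced here, and it assumes the toy computer rounds half up after every operation.

```python
from decimal import Decimal, ROUND_HALF_UP

def keep4(x):
    """Round a Decimal to four decimal places, imitating the toy computer."""
    return x.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

one_third = keep4(Decimal(1) / Decimal(3))    # stored as 0.3333
two_thirds = keep4(Decimal(2) / Decimal(3))   # stored as 0.6667

# mathematically 2 * (1/3) - (2/3) is zero, but the stored values disagree
residual = keep4(2 * one_third - two_thirds)

print(one_third, two_thirds, residual)        # prints: 0.3333 0.6667 -0.0001
```

real doubles behave the same way, only with about 16 digits instead of 4.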

Properties of IEEE 754 Double Precision

1 part in 2^52 (approximately 16 decimal places)

range: from 4.9 × 10^–324 to 1.8 × 10^308

underflow: happens when the magnitude of a result is smaller than the smallest representable positive number. the result is usually converted to 0

overflow: happens when the magnitude of a result is larger than the largest representable number. the result is usually converted to Inf

0/0, 0×∞, ∞/∞, ∞-∞, etc. produce NaNs
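these properties can be checked directly in python, whose float type is an IEEE 754 double on common platforms (an assumption of this sketch). note that python raises ZeroDivisionError for 0.0 / 0.0 instead of returning NaN, so that case is left out.

```python
import math
import sys

eps = sys.float_info.epsilon       # spacing of doubles just above 1.0, equal to 2**-52
largest = sys.float_info.max       # about 1.8e308
smallest = 5e-324                  # parses to the smallest positive (subnormal) double

# a change smaller than half the spacing is lost when added to 1.0
lost = (1.0 + eps / 2.0) == 1.0

overflowed = largest * 2.0         # too large to represent: becomes inf
underflowed = smallest / 2.0       # too small to represent: becomes 0.0

inf = math.inf
nan_results = [inf - inf, inf / inf, 0.0 * inf]   # each of these is NaN

print(eps, largest, lost)
print(overflowed, underflowed, nan_results)
```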

subtractive cancellation

if you subtract two large, nearly equal numbers and end up with a small result, the fractional precision of the result will be much worse than the fractional precision of the terms

a = b – c

in the computer turns into

a_FP = b_FP – c_FP = b * (1 + ε_b) – c * (1 + ε_c)

where ε_b and ε_c are the relative errors of b and c. then

a_FP / a = 1 + ε_b * b / a – ε_c * c / a

but

a_FP / a = 1 + ε_a

by definition

if a is small then b ≈ c

and

ε_a ≈ (ε_b – ε_c) * b / a, so |ε_a| ~ max (|ε_b|, |ε_c|) * |b / a|

where b / a is a large number
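a small numerical illustration of this estimate, as a sketch only. the values 1.0 and 1e-15 are hypothetical, chosen so that b / a is about 1e15. here c is exact and the subtraction itself introduces no new error, so the whole error of the result comes from rounding b when it is stored.

```python
import sys

delta = 1e-15          # the small "true" result a (illustrative value)
c = 1.0
b = c + delta          # b is rounded to the nearest double here

a_fp = b - c           # computed difference
rel_err = abs(a_fp - delta) / delta

# order-of-magnitude estimate from the text: max(|eps_b|, |eps_c|) * |b / a|,
# with machine epsilon standing in for the relative error of b
estimate = sys.float_info.epsilon * abs(b / delta)

print("relative error of a:", rel_err)
print("estimate eps * b/a: ", estimate)
```

the terms are accurate to about 16 digits, yet the difference is accurate to roughly one digit.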

error amplification

beware of rapidly changing functions that map large numbers into small ones

let y = f (x)

in the computer this turns into

y_FP = f (x * (1 + ε_x))

using a first-order Taylor series at x,

y_FP ≈ f(x) + f’(x) * ε_x * x = y * (1 + f’(x) * ε_x * x / y)

then

ε_y = ε_x * f’(x) * x / y

so the error is amplified whenever the factor |f’(x) * x / y| is large
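the factor can be compared with a direct measurement. the sketch below uses f(x) = exp(–x) at x = 100 as an illustrative choice (it maps a large number into a small one), and the function name amplification is hypothetical. for this f the factor f’(x) * x / y equals –x.

```python
import math

def amplification(f, df, x):
    """Predicted error amplification factor f'(x) * x / f(x)."""
    return df(x) * x / f(x)

f = lambda x: math.exp(-x)
df = lambda x: -math.exp(-x)      # derivative of exp(-x)

x = 100.0
eps_x = 1e-8                      # relative error injected into x on purpose

y = f(x)
y_fp = f(x * (1.0 + eps_x))       # value computed from the perturbed input
eps_y = (y_fp - y) / y            # resulting relative error of y

predicted = amplification(f, df, x)
measured = eps_y / eps_x

print("predicted factor:", predicted)
print("measured factor: ", measured)
```

a relative error of 1e-8 in x becomes a relative error of about 1e-6 in y.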