numbers are represented in a computer with limited range and accuracy. let's say a computer keeps four decimal places. then
1/3 = 0.3333 and 2/3 = 0.6667
now,
2 * (1 / 3) – (2 / 3) = 2 * 0.3333 – 0.6667 = – 0.0001 ≠ 0
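the four-decimal-place computer above can be imitated with Python's decimal module. this is only a sketch: the helper name keep4 is made up for illustration, and rounding half up after every operation is an assumption about how such a machine would behave.

```python
from decimal import Decimal, ROUND_HALF_UP

def keep4(value):
    # hypothetical helper: round a Decimal to four decimal places,
    # imitating a machine that stores only that many digits
    return value.quantize(Decimal("0.0001"), rounding=ROUND_HALF_UP)

one_third = keep4(Decimal(1) / Decimal(3))    # stored as 0.3333
two_thirds = keep4(Decimal(2) / Decimal(3))   # stored as 0.6667

# 2 * (1 / 3) - (2 / 3), rounding after each operation
residual = keep4(keep4(2 * one_third) - two_thirds)

print(one_third, two_thirds, residual)
```

the residual is nonzero only because 1/3 and 2/3 were rounded before the arithmetic was done.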
double precision accuracy: 1 part in 2^52 (approximately 16 decimal places)
range: from 4.9 × 10^–324 to 1.8 × 10^308
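these limits can be checked from Python, assuming the interpreter's float is an IEEE 754 double (true on common platforms, but still an assumption). the variable names are chosen here for illustration.

```python
import sys

eps = sys.float_info.epsilon           # gap between 1.0 and the next double, 2**-52
largest = sys.float_info.max           # largest finite double, about 1.8e308
smallest = sys.float_info.min * eps    # smallest positive (subnormal) double, 2**-1074

print(eps, largest, smallest)

# a change of 1 part in 2**52 is still resolved, half of that is rounded away
print(1.0 + eps > 1.0)
print(1.0 + eps / 2 == 1.0)

# nothing positive exists below the smallest number: halving it gives zero
print(smallest / 2 == 0.0)
```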
underflow: happens when result magnitude is smaller than the smallest positive number. usually converted to 0
overflow: happens when result magnitude is larger than the largest positive number. usually converted to Inf
0/0, 0×∞, ∞/∞, ∞-∞, etc. produce NaNs
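a short sketch of the three cases in Python, again assuming IEEE 754 doubles. one caveat: Python itself raises ZeroDivisionError for 0.0 / 0.0 instead of returning NaN, so the NaN examples below use infinities.

```python
import math

big = 1.0e308
tiny = 1.0e-320                  # already below the normal range (subnormal)

overflowed = big * 10.0          # too large for a double: becomes inf
underflowed = tiny * 1.0e-10     # too small for a double: becomes 0.0

inf = math.inf
nans = [inf - inf, inf * 0.0, inf / inf]   # each of these is NaN

print(overflowed, underflowed, nans)

# NaN is the only value that is not equal to itself
print(nans[0] != nans[0])
```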
if you subtract two large numbers and end up with a small result, fractional precision of the result will be much worse than the fractional precision of the terms
a = b – c
in the computer turns into
a_FP = b_FP – c_FP = b * (1 + ε_b) – c * (1 + ε_c)
where ε_b and ε_c are the fractional errors of b and c. then
a_FP / a = 1 + ε_b * b / a – ε_c * c / a
but
a_FP / a = 1 + ε_a
by definition
if a is small then b ≈ c
and
ε_a ≈ (ε_b – ε_c) * b / a, so |ε_a| is of order max (|ε_b|, |ε_c|) * |b / a|
where b / a is a large number
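a numerical illustration of this amplification, as a sketch. the values b = 1 + 10^–15 and c = 1 are picked here only for the example. exact rational arithmetic (fractions.Fraction) is used as the reference, so the fractional errors can be measured without further rounding.

```python
from fractions import Fraction

# exact values (illustrative choice): b = 1 + 1e-15, c = 1, so a = 1e-15
b_exact = Fraction(1) + Fraction(1, 10**15)
c_exact = Fraction(1)
a_exact = b_exact - c_exact

# what the computer stores: b is rounded to the nearest double, c is exact
b_fp = float(b_exact)
c_fp = float(c_exact)
a_fp = b_fp - c_fp

# fractional errors measured against the exact values
eps_b = abs(Fraction(b_fp) / b_exact - 1)
eps_a = abs(Fraction(a_fp) / a_exact - 1)

print("fractional error of b:", float(eps_b))
print("fractional error of a:", float(eps_a))
print("amplification eps_a / eps_b:", float(eps_a / eps_b))
print("b / a:", float(b_exact / a_exact))
```

in this example c has no error and the subtraction itself is exact, so the error of a comes entirely from rounding b, and the amplification equals b / a.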
beware rapidly changing functions which map large numbers into small
let y = f (x)
in the computer this turns into
y_FP = f (x * (1 + ε_x))
using Taylor series at x,
y_FP ≈ f (x) + f'(x) * ε_x * x = y * (1 + f'(x) * ε_x * x / y)
then
ε_y ≈ ε_x * f'(x) * x / y
so the error amplification factor is large when f'(x) * x / y is large
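the amplification factor can be checked numerically. the sketch below uses f = sin at x = 10^8, which is just an illustrative choice of a function that changes rapidly relative to the size of its argument. x is perturbed by roughly one part in 2^52, and the measured change in y is compared with the first-order prediction.

```python
import math
import sys

def amplification(f, dfdx, x):
    # the factor f'(x) * x / y from the Taylor estimate, in magnitude
    return abs(dfdx(x) * x / f(x))

x = 1.0e8
x_perturbed = x * (1.0 + sys.float_info.epsilon)

# fractional error actually introduced in x (after rounding of the product)
eps_x = (x_perturbed - x) / x

y = math.sin(x)
y_perturbed = math.sin(x_perturbed)

eps_y_measured = abs(y_perturbed / y - 1.0)
eps_y_predicted = amplification(math.sin, math.cos, x) * eps_x

print("amplification factor:", amplification(math.sin, math.cos, x))
print("measured  eps_y:", eps_y_measured)
print("predicted eps_y:", eps_y_predicted)
```

a fractional change in x at the level of the last bit produces a fractional change in y that is larger by many orders of magnitude, in line with the estimate above.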