R - Why are these numbers not equal?
The following code is wrong. What's the problem?
i <- 0.1
i <- i + 0.05
i
## [1] 0.15
if(i==0.15) cat("i equals 0.15") else cat("i does not equal 0.15")
## i does not equal 0.15
General (language agnostic) reason
Since not all numbers can be represented exactly in IEEE floating point arithmetic (the standard that almost all computers use to represent decimal numbers and do math with them), you will not always get what you expected. This is especially true because some values which are simple, finite decimals (such as 0.1 and 0.05) are not represented exactly in the computer, and so the results of arithmetic on them may not give a result that is identical to a direct representation of the "known" answer.
This is a well-known limitation of computer arithmetic and is discussed in several places:
- The R FAQ has a question devoted to it: R FAQ 7.31
- The R Inferno by Patrick Burns devotes the first "Circle" to this problem (starting on page 9)
- David Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys 23, 1 (1991-03), 5-48 doi>10.1145/103162.103163 (revision also available)
- The Floating-Point Guide - What Every Programmer Should Know About Floating-Point Arithmetic
- 0.30000000000000004.com compares floating point arithmetic across programming languages
- Several Stack Overflow questions including
  - Why are floating point numbers inaccurate?
  - Why can't decimal numbers be represented exactly in binary?
  - Is floating point math broken?
  - Canonical duplicate for "floating point is inaccurate" (a meta discussion about a canonical answer for this issue)
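As a quick illustration of the point above, here is a minimal sketch in R that prints more digits than R shows by default. The variable names are only illustrative, and the exact digits assume the usual IEEE 754 double precision.

```r
# Values that look exact in decimal are stored as the nearest binary double.
x <- 0.1 + 0.05
y <- 0.15

# Default printing rounds to 7 significant digits, so both look the same.
print(x)
print(y)

# Asking for more digits reveals that the stored values differ.
print(sprintf("%.20f", c(x, y)))

# The difference is tiny but not zero, which is why == fails.
diff_xy <- x - y
print(diff_xy)
print(x == y)
```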
Comparing scalars
The standard solution to this in R is not to use ==, but rather the all.equal function. Or rather, since all.equal gives lots of detail about the differences if there are any, isTRUE(all.equal(...)).
if(isTRUE(all.equal(i,0.15))) cat("i equals 0.15") else cat("i does not equal 0.15")
yields
i equals 0.15
Some more examples of using all.equal instead of == (the last example is supposed to show that this will correctly show differences).
0.1+0.05==0.15
#[1] FALSE
isTRUE(all.equal(0.1+0.05, 0.15))
#[1] TRUE
1-0.1-0.1-0.1==0.7
#[1] FALSE
isTRUE(all.equal(1-0.1-0.1-0.1, 0.7))
#[1] TRUE
0.3/0.1 == 3
#[1] FALSE
isTRUE(all.equal(0.3/0.1, 3))
#[1] TRUE
0.1+0.1==0.15
#[1] FALSE
isTRUE(all.equal(0.1+0.1, 0.15))
#[1] FALSE
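To make the reason for the isTRUE() wrapper concrete, the following sketch (variable names are illustrative) looks at what all.equal returns in the matching and non-matching cases.

```r
# all.equal() returns TRUE on success, but a character description on failure.
ok  <- all.equal(0.1 + 0.05, 0.15)
bad <- all.equal(0.1 + 0.1, 0.15)

print(ok)          # TRUE
print(class(bad))  # "character": a message, not FALSE

# Using `bad` directly as an if() condition would raise an error, because a
# message string cannot be interpreted as a logical value.
# isTRUE() turns both outcomes into a plain TRUE/FALSE.
print(isTRUE(ok))
print(isTRUE(bad))
```

This is why all.equal should not be used directly inside if().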
Some more detail, directly copied from an answer to a similar question:
The problem you have encountered is that floating point cannot represent decimal fractions exactly in most cases, which means you will frequently find that exact matches fail.
while R lies slightly when you say:
1.1-0.2
#[1] 0.9
0.9
#[1] 0.9
You can find out what it really thinks in decimal:
sprintf("%.54f",1.1-0.2)
#[1] "0.900000000000000133226762955018784850835800170898437500"
sprintf("%.54f",0.9)
#[1] "0.900000000000000022204460492503130808472633361816406250"
You can see these numbers are different, but the representation is a bit unwieldy. If we look at them in binary (well, hex, which is equivalent) we get a clearer picture:
sprintf("%a",0.9)
#[1] "0x1.ccccccccccccdp-1"
sprintf("%a",1.1-0.2)
#[1] "0x1.ccccccccccccep-1"
sprintf("%a",1.1-0.2-0.9)
#[1] "0x1p-53"
You can see that they differ by 2^-53, which is important because this number is the smallest representable difference between two numbers whose value is close to 1, as this is.
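The spacing argument can be checked directly. The sketch below assumes IEEE 754 doubles (the same assumption behind the hex output above). The name gap is only illustrative.

```r
x <- 0.9
y <- 1.1 - 0.2
gap <- 2^-53   # spacing between neighbouring doubles in [0.5, 1)

# y is the very next double after x, so adding one gap to x gives y exactly.
print(x + gap == y)

# A change smaller than half the gap is lost to rounding: x is unchanged.
print(x + gap / 4 == x)
```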
We can find out for any given computer what this smallest representable difference is by looking in R's machine field (.Machine):
?.Machine
#....
#double.eps	 the smallest positive floating-point number x
#such that 1 + x != 1. It equals base^ulp.digits if either
#base is 2 or rounding is 0; otherwise, it is
#(base^ulp.digits) / 2. Normally 2.220446e-16.
#....
.Machine$double.eps
#[1] 2.220446e-16
sprintf("%a",.Machine$double.eps)
#[1] "0x1p-52"
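The definition quoted from the help page can be verified in a few lines. This is a small sketch that assumes the usual IEEE 754 double precision, where double.eps is 2^-52.

```r
eps <- .Machine$double.eps

print(eps == 2^-52)       # holds on IEEE 754 double hardware
print(1 + eps != 1)       # eps is large enough to change 1
print(1 + eps / 2 == 1)   # half of eps is rounded away
```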
You can use this fact to create a 'nearly equals' function which checks that the difference is close to the smallest representable number in floating point. In fact this already exists: all.equal.
?all.equal
#....
#all.equal(x,y) is a utility to compare R objects x and y testing ‘near equality’.
#....
#all.equal(target, current,
#      tolerance = .Machine$double.eps ^ 0.5,
#      scale = NULL, check.attributes = TRUE, ...)
#....
So the all.equal function is actually checking that the (relative) difference between the numbers is smaller than the square root of the smallest difference between two mantissas, which is about 1.5e-8.
This algorithm goes a bit funny near extremely small numbers called denormals, but you don't need to worry about that.
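The effect of the tolerance argument can be seen in a short sketch. The test values (1e-10, 1e-6, 1e10) are arbitrary choices for illustration.

```r
tol <- sqrt(.Machine$double.eps)   # default tolerance, about 1.5e-8
print(tol)

# Differences well below the tolerance count as equal.
print(isTRUE(all.equal(1, 1 + 1e-10)))

# Differences above it do not, unless a looser tolerance is requested.
print(isTRUE(all.equal(1, 1 + 1e-6)))
print(isTRUE(all.equal(1, 1 + 1e-6, tolerance = 1e-5)))

# The comparison is relative: an absolute gap of 1 is negligible next to 1e10.
print(isTRUE(all.equal(1e10, 1e10 + 1)))
```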
Comparing vectors
The above discussion assumed a comparison of two single values. In R, there are no scalars, just vectors, and implicit vectorization is a strength of the language. For comparing the values of vectors element-wise, the previous principles hold, but the implementation is slightly different. == is vectorized (does an element-wise comparison) while all.equal compares the whole vectors as a single entity.
Using the previous examples
a <- c(0.1+0.05, 1-0.1-0.1-0.1, 0.3/0.1, 0.1+0.1)
b <- c(0.15, 0.7, 3, 0.15)
== does not give the "expected" result and all.equal does not perform element-wise
a==b
#[1] FALSE FALSE FALSE FALSE
all.equal(a,b)
#[1] "Mean relative difference: 0.01234568"
isTRUE(all.equal(a,b))
#[1] FALSE
Rather, a version which loops over the two vectors must be used
mapply(function(x, y) {isTRUE(all.equal(x, y))}, a, b)
#[1] TRUE TRUE TRUE FALSE
If a functional version of this is desired, it can be written
elementwise.all.equal <- Vectorize(function(x, y) {isTRUE(all.equal(x, y))})
which can be called as just
elementwise.all.equal(a, b)
#[1] TRUE TRUE TRUE FALSE
Alternatively, instead of wrapping all.equal in even more function calls, you can just replicate the relevant internals of all.equal.numeric and use implicit vectorization:
tolerance = .Machine$double.eps^0.5
# this is the default tolerance used in all.equal,
# but you can pick a different tolerance to match your needs
abs(a - b) < tolerance
#[1] TRUE TRUE TRUE FALSE
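An absolute tolerance works for values of moderate size, but it may be too strict for large magnitudes. The sketch below uses a hypothetical helper, near_rel, which is not part of base R and does not reproduce all.equal.numeric exactly (that function scales by the mean magnitude of the whole target vector). Here each pair is scaled by its own larger magnitude.

```r
# Hypothetical helper (not part of base R): element-wise comparison with a
# relative tolerance, scaled by the larger magnitude of each pair.
near_rel <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) <= tol * pmax(abs(x), abs(y))
}

a <- c(0.1 + 0.05, 1 - 0.1 - 0.1 - 0.1, 0.3 / 0.1, 0.1 + 0.1)
b <- c(0.15, 0.7, 3, 0.15)
print(near_rel(a, b))

# For large values an absolute tolerance is too strict, a relative one is not.
big <- 1e10
print(abs(big * (1 + 1e-12) - big) < sqrt(.Machine$double.eps))
print(near_rel(big * (1 + 1e-12), big))
```

Whether an absolute or a relative tolerance is appropriate depends on the scale of the data being compared.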