Dealing with a precision error when generating a probability distribution in Python
In the context of a model I'm building, at one point I need to generate a probability distribution from an array of real numbers. Leaving out the details, I have a function (we'll call it "f" for now) that generates an array of n floats:
arr = [value_1, value_2, ..., value_n]
Now, these values are proportional to the probabilities that I next need to use in a multinomial sampling procedure, so the obvious approach is this:
result = np.random.multinomial(number_of_samples,arr/arr.sum())
But (sometimes) this doesn't work! The sum of arr/arr.sum() ends up being greater than 1. In principle this should be mathematically impossible, so I'm assuming it boils down to a floating-point precision issue. Here's a trivial example of how it can happen:
In [58]: arr = np.array([1/20.]*20)

In [59]: arr/arr.sum()
Out[59]:
array([ 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05,
        0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])

In [60]: (arr/arr.sum()).sum()
Out[60]: 1.0000000000000002
So, long story short, my question is how best to deal with this. I can cheat by adding a small number to the sum, i.e.:
probs = arr / (arr.sum()+0.000001)
But this is hackish, and I fear it may introduce further unwanted precision issues. Is there a better solution?
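To see how large the side effect of the epsilon trick is, here is a minimal sketch using the 20-element example from above. Dividing by `arr.sum() + 0.000001` makes the probabilities sum to roughly 1 - 1e-6 instead of 1. NumPy's documentation for `multinomial` states that the last element of `pvals` is assumed to account for the remaining probability, so (assuming that documented behavior) the missing mass is silently handed to the last outcome.

```python
import numpy as np

arr = np.array([1/20.] * 20)

# The "cheat": divide by a slightly inflated total.
probs = arr / (arr.sum() + 0.000001)

# How much probability mass is now missing from the distribution.
shortfall = 1.0 - probs.sum()
print(shortfall)  # roughly 1e-06
```

The bias is tiny here, but it is a systematic skew toward one category rather than a rounding error that averages out.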
Start by reading https://docs.python.org/2/tutorial/floatingpoint.html
In a nutshell, floating point can't represent 0.05 exactly. The effect is minute:
>>> repr(1/20.)
'0.05'
>>> repr(sum([1/20.]*20))
'1.0000000000000002'
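The `fractions` module can make this visible: converting a float to `Fraction` gives the exact binary value that is actually stored. The sketch below (an illustration, not part of the original answer) shows that the stored 0.05 is slightly larger than 1/20, and that `math.fsum`, which returns a correctly rounded sum, gives exactly 1.0 for this input. Note that since Python 3.12 the built-in `sum` of floats uses a more accurate algorithm, so the last `repr` above may differ on newer interpreters; the sketch therefore accumulates with an explicit loop.

```python
import math
from fractions import Fraction

stored = Fraction(0.05)                  # exact value of the double nearest to 0.05
print(stored > Fraction(1, 20))          # True: the stored value is slightly too big
print(float(stored - Fraction(1, 20)))   # the tiny representation error

values = [1/20.] * 20

# Plain left-to-right accumulation, rounding after every addition.
naive = 0.0
for v in values:
    naive += v

# math.fsum tracks the lost low-order bits and rounds only once at the end.
accurate = math.fsum(values)
print(repr(naive), repr(accurate))
```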
The correct solution is to define the desired precision of each mathematical operation, calculate the rounding error of each step, and round accordingly when necessary.
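As a small example of such an error assessment, take plain left-to-right summation of n floats. A standard textbook bound is |computed - exact| <= gamma_(n-1) * sum(|x_i|), where gamma_k = k*u / (1 - k*u) and u is the unit roundoff (half of `sys.float_info.epsilon` for IEEE doubles). The sketch below computes that bound and compares it with the actual error, measured exactly with `Fraction`; `naive_sum_error_bound` is a hypothetical helper introduced only for illustration.

```python
import sys
from fractions import Fraction

def naive_sum_error_bound(values):
    """Worst-case error of left-to-right float summation (gamma_{n-1} bound)."""
    n = len(values)
    u = sys.float_info.epsilon / 2  # unit roundoff for IEEE 754 doubles
    k = n - 1                       # number of additions that can round
    gamma = k * u / (1 - k * u)
    return gamma * sum(abs(v) for v in values)

values = [1/20.] * 20

# Left-to-right accumulation, as in a naive sum.
computed = 0.0
for v in values:
    computed += v

# Exact sum of the stored doubles (not of the decimal 0.05).
exact = sum(Fraction(v) for v in values)
actual_error = abs(Fraction(computed) - exact)

bound = naive_sum_error_bound(values)
print(float(actual_error), bound)
```

The bound (about 2e-15 here) tells you how far the sum may drift from its exact value, which is the kind of figure you need before choosing a rounding precision or a comparison tolerance.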
In your case, you can round to 5 digits, since you're only adding a few numbers.
>>> repr(round(sum([1/20.]*20), 5))
'1.0'
For more complex calculations, though, you need a proper error assessment to be sure the result is correct.
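For the multinomial case specifically, a minimal sketch that avoids the epsilon hack is to do the normalization in float64 and pass the result directly to NumPy's sampler. This assumes (the question doesn't say) that the occasional failure comes from low-precision input, for example `f` returning float32 values, where normalization errors are far larger than the 1e-16 level shown above. The helper name `to_probabilities` is hypothetical. NumPy checks that the first n-1 entries of `pvals` do not sum to more than 1 (with a small tolerance in current implementations, as far as I know), so double-precision rounding at the 1e-16 level should not trigger an error.

```python
import numpy as np

def to_probabilities(weights):
    """Hypothetical helper: normalize non-negative weights in float64."""
    # Upcast first so float32 input doesn't carry ~1e-7 rounding errors.
    w = np.asarray(weights, dtype=np.float64)
    if w.ndim != 1 or not np.isfinite(w).all() or np.any(w < 0):
        raise ValueError("weights must be a 1-D array of finite, non-negative numbers")
    total = w.sum()
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return w / total

rng = np.random.default_rng(0)

# Float32 weights, as a stand-in for whatever f returns.
weights = np.full(20, 1/20., dtype=np.float32)
probs = to_probabilities(weights)
counts = rng.multinomial(1000, probs)
print(counts.sum())  # 1000
```

The design choice is to keep the probabilities as close to exact as double precision allows and let the sampler absorb the last rounding bit, rather than deliberately shifting mass as the epsilon trick does.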