Rounding Floats So That They Sum To Precisely 1

December 27, 2023

I have a rather gnarly bit of code that must more-or-less randomly generate a bunch of percentages, stored as decimal floats. That is, it decides that material one makes up 13.307

Solution 1:

From your code it looks like you're randomly generating planet atmospheres, presumably for some kind of game or something. At any rate, the randomness of it is convincing me it doesn't need to be too accurate.

So I'd suggest you don't use floats; just use ints and go up to 100. Then you'll get exact sums. For any maths you want to use them in, just cast.

Is this not an option?
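
A minimal sketch of the integer approach, assuming you want n random non-negative shares. The name random_int_shares and the total parameter are made up for illustration; raising total to 100000 would give you three decimal places of a percent.

```python
import random

def random_int_shares(n, total=100):
    """Split `total` into n random non-negative integer shares."""
    # n - 1 random cut points on the integer line [0, total]
    cuts = sorted(random.randint(0, total) for _ in range(n - 1))
    bounds = [0] + cuts + [total]
    # The gaps between neighbouring cut points are the shares.
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]

shares = random_int_shares(5)
assert sum(shares) == 100                      # integer arithmetic is exact
as_fractions = [s / 100 for s in shares]       # cast only where a float is needed
```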

If you insist on using floats, then read on...

The problem you have using floats is as follows:

A floating-point number (in this case a double) is laid out like this:

[image: bit layout of a double: 1 sign bit, 11 exponent bits, 52 fraction bits]

which corresponds to a double of value:

[image: formula for the value, (-1)**sign * (1+M) * 2**(e-1023)]

So your number is (1+M) * 2**E (where E = e - offset).

1+M is always in the range [1, 2).
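
If you want to see M and E for a particular value, here is a small sketch using math.frexp. The helper name decompose is made up, and it assumes x is a positive, normal float.

```python
import math

def decompose(x):
    """Return (M, E) with x == (1 + M) * 2**E for a positive, normal float x."""
    m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= m < 1
    return 2.0 * m - 1.0, e - 1   # rescale so the mantissa lies in [1, 2)

M, E = decompose(0.75)
print(M, E)  # 0.5 -1, because 0.75 == 1.5 * 2**-1
```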

So we have equally spaced numbers between each pair of consecutive powers of two (for positive and negative exponents alike), and the spacing between the numbers doubles with each increase in the exponent, E.

Think about what this means: there is a constant spacing of representable numbers within each of the ranges (1,2), (2,4), (4,8), etc. This also applies to the negative powers of two, so:

0.5 - 1
0.25 - 0.5
0.125 - 0.25
0.0625 - 0.125
etc.

And each range contains the same number of representable values. This means that if you take a number in the range (0.25,0.5) and add it to a number in the range (0.5,1), then you have a 50% chance that the sum cannot be exactly represented.
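
You can look at the spacing directly with math.ulp, which returns the gap between a float and the next larger one. This is only a sketch to illustrate the doubling, and it assumes Python 3.9 or later (where math.ulp was added).

```python
import math

# Print the spacing of representable doubles around a few sample values.
for x in (0.1, 0.2, 0.3, 0.6, 1.5, 3.0):
    print(x, math.ulp(x))

# 0.3 and 0.4 share the range [0.25, 0.5), so their spacing is identical.
assert math.ulp(0.3) == math.ulp(0.4)
# 0.6 sits one range higher, so its spacing is twice as large.
assert math.ulp(0.6) == 2 * math.ulp(0.3)
```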

If you sum two floating-point numbers whose exponents differ by D, then the chance of the sum being exactly representable is 2**(-D).
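
To test whether one particular sum is exact, you can compare against rational arithmetic. A rough sketch; the helper name sum_is_exact is made up.

```python
from fractions import Fraction

def sum_is_exact(a, b):
    """True if the float a + b equals the mathematically exact sum."""
    # Fraction(float) converts the float's exact binary value to a rational.
    return Fraction(a) + Fraction(b) == Fraction(a + b)

print(sum_is_exact(0.5, 0.25))  # True
print(sum_is_exact(0.1, 0.2))   # False
```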

If you then want to represent the range 0-1, then you'll have to be very careful about which floats you use (i.e. force the last N bits of the fraction to be zero, where N is a function of E).

If you go down this route, then you'll end up with far more floats at the top of the range than the bottom.

The alternative is to decide how close to zero you want to be able to get. Let's say you want to get down to 0.0001.

0.0001 = (1+M) * 2**E

log₂(0.0001) = -13.28771...

So we'll use -14 as our minimum exponent.
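
The same arithmetic in Python, as a quick check. The variable names are made up for illustration.

```python
import math

smallest = 0.0001
# Round the base-2 logarithm down to get the exponent of the range holding it.
min_exponent = math.floor(math.log2(smallest))
print(min_exponent)  # -14

# The chosen power of two brackets the target value from below.
assert 2.0 ** min_exponent <= smallest < 2.0 ** (min_exponent + 1)
```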

And then to get up to 1, we just leave the exponent as -1.

So now we have 13 ranges, each with twice as many values as the lower one which we can sum without having to worry about precision.

This also means, though, that the top range has 2**13 times as many values we can use as the bottom one. This obviously isn't okay.

So, after picking a float, round it to the nearest allowable value. In this case, by "round" I just mean set the last 13 bits to zero. Put it all in a function, and apply it to your numbers immediately after you get them out of rand.

Something like this:

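from ctypes import POINTER, c_float, c_uint32, cast, pointer

def roundf(x, bitsToRound):
    # Reinterpret the bytes of a 32-bit float as an unsigned 32-bit integer.
    i = cast(pointer(c_float(x)), POINTER(c_uint32)).contents.value
    # Zero the lowest bitsToRound bits of the fraction.
    i &= ~((1 << bitsToRound) - 1)
    # Reinterpret the masked integer as a 32-bit float again.
    y = cast(pointer(c_uint32(i)), POINTER(c_float)).contents.value
    return y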
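
Note that the function above goes through a 32-bit c_float, while Python's own floats are doubles. If you would rather stay in 64 bits, here is a sketch of the same idea using the struct module. The name truncate_double is made up, and it assumes bits_to_clear is at most 52 so that only fraction bits are touched.

```python
import struct

def truncate_double(x, bits_to_clear):
    """Zero the lowest `bits_to_clear` bits of a double's 52-bit fraction."""
    (i,) = struct.unpack("<Q", struct.pack("<d", x))   # float bits -> integer
    i &= ~((1 << bits_to_clear) - 1)                    # clear the low bits
    (y,) = struct.unpack("<d", struct.pack("<Q", i))   # integer -> float bits
    return y

y = truncate_double(0.1, 13)
# Clearing low fraction bits can only move a positive value towards zero.
assert y <= 0.1 and 0.1 - y < 2.0 ** -43
```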

(images from Wikipedia)

Solution 2:

If you mean to find two values that add up to 1.0

I understand that you want to pick two floating-point numbers between 0.0 and 1.0 such that they add to 1.0.

Do this:

Pick the larger, L, of the two. It has to be between 0.5 and 1.0.
Define the smaller number S as 1.0 - L.

Then in floating-point, S + L is exactly 1.0.

If for some reason you obtain the smallest number S first in your algorithm, compute L = 1.0 - S and then S0 = 1.0 - L. Then L and S0 add up exactly to 1.0. Consider S0 the “rounded” version of S.
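
Both variants can be tried out like this. It is only a sketch; random.uniform stands in for whatever produces your numbers.

```python
import random

# Variant 1: the larger value comes first.
L = random.uniform(0.5, 1.0)
S = 1.0 - L                    # this subtraction is exact for L in [0.5, 1.0]
assert S + L == 1.0

# Variant 2: the smaller value comes first and gets slightly rounded.
S = random.uniform(0.0, 0.5)
L = 1.0 - S
S0 = 1.0 - L                   # the "rounded" version of S
assert L + S0 == 1.0
```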

If you mean several values X₁, X₂, …, X_N

Here is an alternative solution if you are adding N numbers, each between 0.0 and 1.0, and expect the operations X₁ + X₂ + … and 1.0 - X₁ … to behave like they do in math.

Each time you obtain a new number X_i, do: X_i ← 1.0 - (1.0 - X_i). Only use this new value of X_i from that point onwards. This assignment will slightly round X_i so that it behaves well in all sums whose intermediate results are between 0.0 and 1.0.

EDIT: after doing the above for values X₁, …, X_N-1, compute X_N as 1 - X₁ - … - X_N-1. This floating-point computation will be exact (despite involving floating-point), so that you will have X₁ + … + X_N = 1 exactly.
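
Put together, the procedure could look like the sketch below. The names snap and shares_summing_to_one are made up, and it assumes the first N-1 values come from normalising positive weights, so that they sum to at most 1.

```python
import random

def snap(x):
    """The X_i <- 1.0 - (1.0 - X_i) step described above."""
    return 1.0 - (1.0 - x)

def shares_summing_to_one(weights):
    """Turn positive weights into floats in [0, 1] that sum to exactly 1.0."""
    total = sum(weights)
    head = [snap(w / total) for w in weights[:-1]]
    last = 1.0
    for x in head:              # X_N = 1 - X_1 - ... - X_(N-1)
        last -= x
    return head + [last]

shares = shares_summing_to_one([random.uniform(0.1, 1.0) for _ in range(8)])
assert sum(shares) == 1.0
```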

Solution 3:

In the end, it turned out the simplest solution was to change the problem. Rounding the sum to 5 digits of precision with round(x,5) whenever it was checked gave adequate results.
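
For illustration, the check could look something like this. The helper name sums_to_one is made up.

```python
def sums_to_one(values, digits=5):
    """Check the sum after rounding it to `digits` decimal places."""
    total = 0.0
    for v in values:
        total += v              # plain float addition; may be off in the last bits
    return round(total, digits) == 1.0

assert sums_to_one([0.1] * 10)
assert not sums_to_one([0.3, 0.3, 0.3])
```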

Solution 4:

Since floats are stored in the machine in a binary representation, there are always numbers that cannot be represented precisely. If you need to work around this limitation, you must use a math library that provides its own custom data types.
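
One such data type ships with Python itself: decimal.Decimal from the standard library. Whether that is the kind of library meant here is my assumption, and apart from the 13.307% taken from the question, the values below are made up.

```python
from decimal import Decimal

# Build the values from strings so no binary rounding ever happens.
parts = [Decimal("0.13307"), Decimal("0.52113"), Decimal("0.34580")]
assert sum(parts) == Decimal("1")   # decimal arithmetic, exact for these values
```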

Solution 5:

Floats are represented by powers of two. From the Python docs: "Unfortunately, most decimal fractions cannot be represented exactly as binary fractions"

http://docs.python.org/2/tutorial/floatingpoint.html

EDIT: Maybe instead of actually trying to get to 1.0000000000000000000000, you should determine an acceptable level of error by cutting off anything after the third decimal place. You can then be relatively certain that the values added up to 1. Using this concept, you could accept any answer greater than 0.999 and less than 1.001.

This may not be perfect but it might be a good workaround to get you past your problem.
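
A quick sketch of that acceptance window. The function name and the tolerance parameter are made up for illustration.

```python
def close_enough_to_one(values, tolerance=0.001):
    """Accept any total strictly between 1 - tolerance and 1 + tolerance."""
    total = sum(values)
    return 1.0 - tolerance < total < 1.0 + tolerance

assert close_enough_to_one([0.1] * 10)
assert not close_enough_to_one([0.3, 0.3, 0.3])
```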
