Float to Decimal Float Conversion

(c) 2022 Robert Finch

I decided to write an article on float to decimal float conversion because I could not really find anything on the web about it. There is lots and lots about converting decimal numbers to float and float numbers to decimal, but nothing really on converting a binary float directly to a decimal float. This is a method I have come up with on my own, and it remains to be seen how well it works. I have posted it to try and get some feedback. So far I cannot get it to work very well, so maybe this approach is not very good.

There are other possible approaches, such as first shifting the number so that it has a zero exponent, converting it to decimal, then normalizing the result. The issue with this approach is that exponents are 15 bits wide for quad precision, which means a shift register of about 32k bits may be required. That is not practical for anything other than smaller formats.
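For comparison, the shift-to-zero-exponent route is easy to model in software with arbitrary-precision integers, and the model also shows how wide the intermediate value gets. This is a minimal Python sketch. It assumes the float has already been split into an integer significand `m` and an exponent `e` with value `m * 2**e`. The function name `exact_decimal` is hypothetical.

```python
def exact_decimal(m: int, e: int) -> tuple[int, int]:
    """Return (coef, q) with m * 2**e == coef * 10**q exactly.

    For e >= 0 the value is already an integer, so just shift left.
    For e < 0, use 2**e == 5**-e / 10**-e, so the decimal exponent is e.
    """
    if e >= 0:
        return m << e, 0
    return m * 5 ** (-e), e


# 1.5 == 3 * 2**-1, which becomes 15 * 10**-1
print(exact_decimal(3, -1))          # (15, -1)
# The largest binary128 exponent alone needs a 16384-bit integer.
print((1 << 16383).bit_length())     # 16384
```

Near the bottom of the exponent range the `5**-e` factor is similarly huge (tens of thousands of bits), which is the register-width problem described above.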

So, given a standard representation of a float number, how do we convert it to a decimal float number?

# Sign Bit

Start with the easy part. The sign bit remains the same between float and decimal float. Step one done.

Now comes the fun.

# Converting the Exponent

The exponent of a float is a power of two, but the exponent of a decimal float is a power of ten, so a translation needs to take place between the two. In summary, the power of ten equals 0.3010299957 (that is, log10 2) times the power of two. This comes from the following math:

$$
\begin{aligned}
2^{x} &= 10^{y} \\
\log_{10} 2^{x} &= \log_{10} 10^{y} \\
x \log_{10} 2 &= y \log_{10} 10 \\
x\,(0.301\ldots) &= y\,(1)
\end{aligned}
$$

Hence, to go from a power of two to a power of ten, multiply by the magic number 0.301…
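A quick floating-point check of that relationship is sketched below. `pow2_to_pow10` is a hypothetical helper name, and ordinary double precision is used only for illustration.

```python
import math

LOG10_2 = math.log10(2)  # 0.30102999566...


def pow2_to_pow10(x: int) -> float:
    """Power of ten y such that 2**x == 10**y, i.e. y = x * log10(2)."""
    return x * LOG10_2


y = pow2_to_pow10(10)
print(y)              # about 3.0103: a whole part 3 and a fractional part 0.0103
print(10 ** y)        # about 1024.0, up to float rounding
```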

Now if only it were that simple.

The issue is that the resulting power of ten is not usually a whole number. It has a whole part and a fractional part to go along with it. But to represent the number as a decimal float, the exponent must be a whole number. That means the significand has to be modified so that a whole-number exponent can be used. Scaling the significand by a constant has the same effect as adjusting the exponent: multiplying the significand by 10 to the power f makes up for removing f from the power of ten. So we can choose the scaling factor 10 to the power f, where f is the fractional part of the exponent that would otherwise be lost.
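Written out as an equation, the idea looks like this. Here m stands for the binary significand and x for the binary exponent; both symbols are introduced for illustration.

$$
m \cdot 2^{x} = m \cdot 10^{x \log_{10} 2} = m \cdot 10^{k + f} = \left(m \cdot 10^{f}\right) \cdot 10^{k},
\qquad k = \lfloor x \log_{10} 2 \rfloor,\quad 0 \le f < 1
$$

For example, 1.5 · 2^100 = (1.5 · 10^0.10300…) · 10^30 ≈ 1.9015 · 10^30. With 1 ≤ m < 2, the factor 10^f lies in [1, 10). The scaled significand can therefore reach up to 20 and may need one more normalization step.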

However, the easiest route is probably to modify the binary significand of the original float number being converted, because this avoids BCD multiplication. The catch is that the fractional part of the power of ten now needs to be reflected back to the original float number. So the power of ten is rounded up to the next whole number. The corresponding power of two is then calculated by multiplying by the reciprocal of 0.301…, which is 3.3219… (log2 10). The rounded power of ten is the exponent for the decimal float, so that part is done. Whew! What remains is to manipulate the significand.
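The sketch below models this step in Python, using doubles in place of the hardware's fixed-point constants. `choose_decimal_exponent` is a hypothetical name. Rounding the power of ten up makes the leftover power of two `r` zero or negative, so the binary significand only ever gets scaled down, by a factor in (0.1, 1].

```python
import math

LOG10_2 = math.log10(2)   # 0.30103...
LOG2_10 = math.log2(10)   # 3.32193..., the reciprocal of LOG10_2


def choose_decimal_exponent(x: int) -> tuple[int, float]:
    """Round the power of ten up, then map it back to a power of two.

    Returns (y, r): y is the whole decimal exponent and r = x - y*log2(10)
    is the leftover (non-positive) power of two that the binary significand
    must absorb, so that m * 2**x == (m * 2**r) * 10**y.
    """
    y = math.ceil(x * LOG10_2)
    r = x - y * LOG2_10
    return y, r


m, x = 1.5, 100               # example value 1.5 * 2**100
y, r = choose_decimal_exponent(x)
print(y, r)                   # 31 and roughly -2.98
print(m * 2 ** r * 10 ** y)   # about 1.9015e30, the same as 1.5 * 2**100
```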

# Manipulating the Significand

Multiplying the power of ten back into a power of two gives a power of two that again has a whole part and a fractional part. It seems we are no further ahead, but we are. How do we get rid of the fraction? By scaling the significand by the fractional power of two, so that what remains is a whole power of two. The scaling factor (2 raised to the fractional power) is awkward to calculate in hardware, but fortunately it can be calculated beforehand. A 512-entry table provides nine fractional bits of precision in the exponent. The table is formed using some more math, as the following code shows.
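Here is one way to model splitting the leftover power of two into a whole shift and a 9-bit table index, again in floating point. `split_residual` and `TABLE_BITS` are hypothetical names. Because the fraction is truncated to 9 bits, the factor is only known to within one table step of 2^(1/512) ≈ 1.00135 (about 0.135%). On its own, that may limit the scaled significand to roughly three significant decimal digits unless the remaining fraction is handled some other way.

```python
import math

TABLE_BITS = 9                    # 512-entry table -> 9 fractional exponent bits
LOG2_10 = math.log2(10)


def split_residual(x: int, y: int) -> tuple[int, int, float]:
    """Split the leftover power of two r = x - y*log2(10) into a whole
    shift n and a fraction g in [0, 1), and quantize g to a table index.

    2**x / 10**y == 2**n * 2**g, with idx = floor(g * 512).
    """
    r = x - y * LOG2_10
    n = math.floor(r)             # whole part: an ordinary shift
    g = r - n                     # fractional part: looked up in the table
    idx = math.floor(g * (1 << TABLE_BITS))
    return n, idx, g


print(split_residual(100, 31))    # n = -3, idx = 10, g roughly 0.0202
```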

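```systemverilog
// Table of 2**(g/512) for g = 0..511, scaled by 2**64 so it can be held
// in an integer register. Entries lie in [2**64, 2**65), hence 65 bits.
reg [64:0] tab1 [0:511];

genvar g3;
generate
  for (g3 = 0; g3 < 512; g3 = g3 + 1) begin : gTab1
    initial begin
      // $pow returns a real; assigning it to tab1 rounds to the nearest integer.
      tab1[g3] = $pow(2, g3 / 512.0) * $pow(2, 64);
    end
  end
endgenerate
```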
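The sketch below is a Python model of the same table. It computes each entry twice: once with the `decimal` module at 50 significant digits, and once in IEEE double precision. The double version assumes the simulator evaluates `$pow` as a SystemVerilog `real`, which is an IEEE double. The function names are hypothetical. Comparing the two versions shows that the double-precision entries carry only 53 significant bits, so their low 12 bits are zero.

```python
from decimal import Decimal, localcontext


def tab1_entry_exact(g: int) -> int:
    """round(2**(g/512) * 2**64) computed with 50 significant digits."""
    with localcontext() as ctx:
        ctx.prec = 50                 # far more than the ~20 digits an entry needs
        val = Decimal(2) ** (Decimal(g) / 512) * (Decimal(2) ** 64)
        return int(val.to_integral_value())


def tab1_entry_double(g: int) -> int:
    """The same entry computed in IEEE double precision, like a real-valued $pow."""
    return round(2 ** (g / 512) * 2 ** 64)


tab_exact = [tab1_entry_exact(g) for g in range(512)]
tab_double = [tab1_entry_double(g) for g in range(512)]

print(hex(tab_exact[256]))   # sqrt(2) * 2**64 to the nearest integer
print(hex(tab_double[256]))  # close to the same value, but the low 12 bits are zero
```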

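A few complications: the value being multiplied is a 113-bit significand, but to keep the table small each entry holds only 65 bits. Because `$pow` is evaluated as a `real` (IEEE double precision), only about 53 of those 65 bits are actually significant, and the low 12 bits of each entry come out as zero. The value from the table is padded when used in the multiplier. Multiplying by a power of two is really a form of shifting, and the great thing about multiplication is that it can shift by fractional amounts: multiplying by 2^(g/512) shifts the significand by g/512 of a bit position.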
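The fractional shift can be sketched in Python. The sketch assumes the table entry is `round(2**(g/512) * 2**64)`, as in the code above. It also assumes the significand is a 113-bit integer with the hidden bit at position 112. `tab1_entry` and `scale_significand` are hypothetical names. The entry is recomputed with Python's `decimal` module so the block stands alone. Since an entry carries 64 fractional bits, the scaled result is accurate to roughly 2^-65 relative. That is about 19 to 20 significant decimal digits, fewer than the 34 digits a decimal128 coefficient holds.

```python
from decimal import Decimal, localcontext


def tab1_entry(g: int) -> int:
    """2**(g/512) with 64 fractional bits, i.e. round(2**(g/512) * 2**64)."""
    with localcontext() as ctx:
        ctx.prec = 50
        return int((Decimal(2) ** (Decimal(g) / 512) * 2 ** 64).to_integral_value())


def scale_significand(sig: int, g: int) -> int:
    """Multiply an integer significand by 2**(g/512), a 'shift' by g/512 of a bit.

    The table entry carries 64 fractional bits, so the product is shifted
    right by 64 to drop them again (truncating the extra low-order bits).
    """
    return (sig * tab1_entry(g)) >> 64


one = 1 << 112                              # 1.0 as a 113-bit significand
print(scale_significand(one, 0) == one)     # True: 2**0 leaves it unchanged
print(hex(scale_significand(one, 256)))     # about sqrt(2) * 2**112
```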

Once the significand of the original float number has been scaled to absorb the fractional part of the exponent, it can be converted to BCD for the decimal float representation. We already have the sign and the decimal exponent; the BCD conversion is the last piece. With all the components of the decimal float number available, it can finally be packed into a 128-bit format.
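For an integer coefficient, the usual hardware-friendly binary-to-BCD method is shift-and-add-3 ("double dabble"). The sketch below is a Python model of that technique; `double_dabble` and `ndigits` are illustrative names. Note that 34 digits of raw BCD take 136 bits, more than 128. If the target is IEEE 754 decimal128, its interchange format compresses the coefficient with densely packed decimal (three digits in 10 bits) or a binary integer encoding.

```python
def double_dabble(n: int, ndigits: int) -> int:
    """Convert a non-negative integer to packed BCD (4 bits per decimal digit)
    with the shift-and-add-3 ("double dabble") algorithm used in hardware.

    ndigits must be large enough to hold the result.
    """
    bcd = 0
    for i in reversed(range(n.bit_length())):
        # Before each shift, add 3 to every BCD digit that is 5 or more,
        # so that doubling it carries correctly into the next digit.
        for d in range(ndigits):
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        bcd = (bcd << 1) | ((n >> i) & 1)   # shift in the next bit, MSB first
    return bcd


print(hex(double_dabble(1024, 4)))   # 0x1024
```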

# Some Further Notes

Exponents are fifteen bits for floats and fourteen bits for decimal floats. How do you multiply by a fractional number and get a fractional result using integer hardware? The constant 0.301… is scaled by a power of two, in this case 1024: 0.30103 × 1024 ≈ 308.25, which is rounded to 308. So the exponent is multiplied by 308, which gives a decimal float exponent that is 1024 times too large. This is easy to deal with: the extra ten low-order bits of the product are discarded.
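The scaled-constant trick can be modelled with integers alone. `pow10_from_pow2` and its `frac_bits` parameter are names introduced for illustration. The model also exposes a precision issue that may be worth checking against the hardware. With 308/1024 the constant is about 0.25/1024 low, and the error grows with the exponent. For the largest binary128 exponent, 16383, the result is 4927 instead of 4931. A wider constant, 32 fractional bits here, matches floor(x · log10 2) over the whole binary128 exponent range.

```python
import math


def pow10_from_pow2(x: int, frac_bits: int) -> int:
    """floor(x * log10(2)) using only an integer multiply and a shift.

    log10(2) is scaled by 2**frac_bits and rounded to an integer constant;
    the product is then shifted right by frac_bits, discarding the extra
    low-order bits (Python's >> floors, like an arithmetic shift).
    """
    c = round(math.log10(2) * (1 << frac_bits))   # 308 when frac_bits == 10
    return (x * c) >> frac_bits


x = 16383                                   # largest binary128 exponent
print(pow10_from_pow2(x, 10))               # 4927
print(math.floor(x * math.log10(2)))        # 4931
print(pow10_from_pow2(x, 32))               # 4931
```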

Exponents are stored biased rather than in two's complement, so in some places they need to be unbiased before arithmetic can be performed on them.
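On the binary side, unbiasing happens as the fields are pulled out of the bit pattern. The sketch below models this in Python for IEEE 754 binary128, whose exponent bias is 16383. `unpack_binary128` is a hypothetical helper, and infinities and NaNs are deliberately left out. If the target is IEEE 754 decimal128, its exponent is biased by 6176 instead.

```python
BIN128_BIAS = 16383          # IEEE 754 binary128 exponent bias
FRAC_BITS = 112              # stored fraction bits (113 with the hidden bit)


def unpack_binary128(bits: int) -> tuple[int, int, int]:
    """Split a binary128 bit pattern into (sign, e, sig) such that
    value == (-1)**sign * sig * 2**(e - 112), with e unbiased.

    Infinities and NaNs (exponent field all ones) are not handled here.
    """
    sign = bits >> 127
    exp_field = (bits >> FRAC_BITS) & 0x7FFF
    frac = bits & ((1 << FRAC_BITS) - 1)
    if exp_field == 0:                       # zero or subnormal: no hidden bit
        return sign, 1 - BIN128_BIAS, frac
    return sign, exp_field - BIN128_BIAS, frac | (1 << FRAC_BITS)


print(unpack_binary128(0x3FFF << 112))      # 1.0 -> (0, 0, 2**112)
```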

## BCD conversion

The bits making up the significand of a number are fractional bits: each bit has half the weight of the one before it. This is the opposite ordering from the usual integer-to-BCD conversion. So the order of the bits is reversed for the BCD conversion, and then the order of the converted BCD digits is reversed.
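For comparison, here is a different and standard technique for the decimal digits of a binary fraction: repeated multiplication by ten. It is offered as a software reference model, not as what FP2DFP128.sv does. `fraction_to_decimal_digits` is a hypothetical name. Taken literally, reversing the single bit of 0.1b gives the integer 1, and reversing its one BCD digit still gives 1 rather than the correct 5. A model like this may therefore be handy for checking the reversal step.

```python
def fraction_to_decimal_digits(frac: int, nbits: int, ndigits: int) -> list[int]:
    """Decimal digits of the binary fraction frac / 2**nbits, most significant first.

    Each step multiplies the remaining fraction by ten; whatever overflows
    above bit position nbits is the next decimal digit (0..9).
    An nbits-bit fraction is represented exactly by nbits decimal digits.
    """
    mask = (1 << nbits) - 1
    digits = []
    for _ in range(ndigits):
        frac *= 10
        digits.append(frac >> nbits)   # integer part of the product
        frac &= mask                   # keep only the fractional part
    return digits


print(fraction_to_decimal_digits(0b101, 3, 3))   # 0.101b = 0.625 -> [6, 2, 5]
```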

# Conclusion

Does it work? Not yet.

The source code is in `FP2DFP128.sv`.