Loads or stores a 32-bit floating point number. After loading, the value is
converted to the 64-bit floating point format; before storing, it is
converted from the 64-bit floating point format.
LDSF:
Register $X is set to the 64-bit floating point number corresponding to the 32-bit floating point number
represented by M4[$Y + $Z] or M4[$Y + Z]. No arithmetic exception occurs, not even if a signaling NaN is
loaded.
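The widening step of LDSF can be sketched in Python, whose floats are 64-bit. This is only an illustration of the value conversion for ordinary numbers: the helper name load_short_float is hypothetical, and the sketch does not model MMIX's handling of NaNs.

```python
import struct

def load_short_float(word: int) -> float:
    """Interpret a 32-bit pattern as a short float and widen it to 64 bits."""
    # Pack the integer as 4 big-endian bytes, then reinterpret them as a
    # 32-bit float; Python returns the value as a 64-bit float.
    return struct.unpack(">f", struct.pack(">I", word & 0xFFFFFFFF))[0]

# 0x3FC00000 is the 32-bit encoding of 1.5.
value = load_short_float(0x3FC00000)

# The same value in the 64-bit format: every short float is exactly
# representable there, so widening never rounds.
bits64 = struct.unpack(">Q", struct.pack(">d", value))[0]
print(value, hex(bits64))
```

Because the 64-bit format has a wider fraction and exponent range, even the smallest subnormal short float (bit pattern 1, value 2^-149) converts without loss.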
STSF:
The value obtained by rounding register $X to a 32-bit floating point number is placed in M4[$Y + $Z] or
M4[$Y + Z]. Rounding is done with the current rounding mode, in a manner exactly analogous to the
standard conventions for rounding 64-bit results, except that the precision and exponent range are limited.
In particular, floating overflow, underflow, and inexact exceptions might occur; a signaling NaN will trigger
an invalid exception and it will become quiet. The fraction part of a NaN is truncated if necessary to a
multiple of 2^-23, by ignoring the least significant 29 bits.
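The narrowing step of STSF can be approximated in Python as follows. This is a rough sketch under stated assumptions: it covers only round-to-nearest rather than the current rounding mode, it does not model MMIX's exception flags, and the helper names store_short_float, overflows_short_float, and truncate_nan_fraction are hypothetical.

```python
import struct

def store_short_float(x: float) -> int:
    """Round a 64-bit float to the 32-bit format and return its bit pattern."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def overflows_short_float(x: float) -> bool:
    """Report whether x is too large in magnitude for the 32-bit format."""
    try:
        struct.pack(">f", x)
    except OverflowError:
        return True
    return False

def truncate_nan_fraction(bits64: int) -> int:
    """Keep the top 23 of the 52 fraction bits, ignoring the low 29 bits."""
    fraction52 = bits64 & ((1 << 52) - 1)
    return fraction52 >> 29

# 0.1 is not representable in 32 bits, so the stored value is inexact.
print(hex(store_short_float(0.1)))
# 1e39 exceeds the largest short float (about 3.4e38).
print(overflows_short_float(1e39))
# A NaN fraction whose lowest bit is set loses that bit when truncated.
print(hex(truncate_nan_fraction(0x7FF8000000000001)))
```

The 52 - 23 = 29 discarded bits are exactly the ones that have no place in the shorter fraction field.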
If we load any two short floats and operate on them once with either FADD, FSUB, FMUL, FDIV, FREM, FSQRT,
or FINT, and if we then store the result as a short float, we obtain the results required by the IEEE standard
for single format arithmetic, because the double format can be shown to have enough precision to avoid any
problems of "double rounding". But programmers are usually better off sticking to 64-bit arithmetic unless
they have a strong reason to emulate the precise behavior of a 32-bit computer; 32 bits do not offer much
precision.
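One case of the "double rounding" claim is easy to check directly. Two short floats have 24-bit significands, so their product needs at most 48 bits and fits exactly in the 53-bit significand of the 64-bit format; storing the product therefore performs the only rounding. The sketch below verifies this for one pair of values. It illustrates multiplication only, not the full argument for the other operations, and the helper name to_short is hypothetical.

```python
import struct
from fractions import Fraction

def to_short(x: float) -> float:
    """Round x to the nearest 32-bit float, returned as a 64-bit float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

# Two values as they would appear after being loaded as short floats.
a = to_short(0.1)
b = to_short(0.7)

# Multiply in 64-bit arithmetic and compare with the exact rational product.
product64 = a * b
product_is_exact = Fraction(product64) == Fraction(a) * Fraction(b)

# Storing as a short float rounds once, from the exact product.
result32 = to_short(product64)
print(product_is_exact, result32)
```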