From: Anthony Liguori
In-Reply-To: <1367258703-6930-1-git-send-email-aliguori@us.ibm.com>
References: <1367258703-6930-1-git-send-email-aliguori@us.ibm.com>
Date: Mon, 29 Apr 2013 13:22:10 -0500
Message-ID: <87d2tdryct.fsf@codemonkey.ws>
Subject: Re: [Qemu-devel] [PATCH] softfloat: rebase to version 2a
To: qemu-devel@nongnu.org
Cc: Thiemo Seufer, Peter Maydell, Richard Henderson, Stefan Weil, Juan Quintela, Max Filippov, Richard Sandiford, Jocelyn Mayer, Blue Swirl, Christophe Lyon,
 Paul Brook, malc, Paolo Bonzini, Guan Xuetao, Andreas Färber, Aurelien Jarno, Avi Kivity

Anthony Liguori writes:

> N.B. If you are on CC, see after the '---' for a requested action!
>
> The license of SoftFloat-2b is claimed to be GPLv2 incompatible by
> the FSF due to an indemnification clause. The previous release,
> SoftFloat-2a, did not contain this clause. The only changes between
> these two versions as far as QEMU is concerned are the license change
> and a global modification of the comment structure. This patch rebases
> our softfloat code to SoftFloat-2a in order to have a GPLv2 compatible
> license.
>
> Please note, this is a comment-only change. The resulting binary should
> be the same.
>
> I created this patch using the following strategy:
>
> 1) Create a branch using the original import of softfloat code:
>    $ git checkout 158142c2c2df728cfa3b5320c65534921a764f26
>
> 2) Remove carriage returns from Softfloat-2b
>
> 3) Compare each of the softfloat files against Softfloat-2b using the
>    following mapping to generate Fabrice's original softfloat changes:
>
>    - fpu/softfloat.c -> softfloat/bits64/softfloat.c
>    - fpu/softfloat.h -> softfloat/bits64/386-Win32-gcc/softfloat.h
>    - fpu/softfloat-macros.h -> softfloat/bits64/softfloat-macros
>    - fpu/softfloat-specialize.h -> softfloat/bits64/386-Win32-gcc/softfloat-specialize
>
> 4) Replace our softfloat files with the corresponding files from Softfloat-2a
>
> 5) Apply the diffs from (3) to (4) and commit
>
> 6) Create a diff between (5) and 158142c2c2df728cfa3b5320c65534921a764f26
>    - This diff consists 100% of licensing change + comment reformatting
>
> 7) Checkout the latest master branch, apply the diff from (6)
>    - There were a lot of comment rejects, confirmed this was only comments
>      and then used an emacs macro to rewrite the comments to the Softfloat-2a
>      form.
>
> Cc: Andreas Färber
> Cc: Aurelien Jarno
> Cc: Avi Kivity
> Cc: Ben Taylor
> Cc: Blue Swirl
> Cc: Christophe Lyon
> Cc: Fabrice Bellard
> Cc: Guan Xuetao
> Cc: Jocelyn Mayer
> Cc: Juan Quintela
> Cc: malc
> Cc: Max Filippov
> Cc: Paolo Bonzini
> Cc: Paul Brook
> Cc: Peter Maydell
> Cc: Richard Henderson
> Cc: Richard Sandiford
> Cc: Stefan Weil
> Cc: Thiemo Seufer
> Signed-off-by: Anthony Liguori
> ---
> In order to make this change, we need to relicense all contributions
> from initial import of the SoftFloat code to match the license of
> SoftFloat-2a (instead of the implied SoftFloat-2b license).
>
> If you are on CC, it is because you have contributed to the softfloat
> code in QEMU. Please respond to this note with:
>
> Acked-by: Your Name
>
> To signify that you are able and willing to relicense your changes
> to the SoftFloat-1a license (or a GPL compatible license).

s/SoftFloat-1a/SoftFloat-2a/g. Sorry about that. Thanks to Peter for
spotting the typo.

Regards,

Anthony Liguori

>
> Please respond no later than May 6th, 2013. If we are unable to confirm
> relicense from an author, changes from that author will be reverted.
> ---
> For completeness, here is the full listing of contributions:
>
> Andreas Färber
>  be45f06 Silence softfloat warnings on OpenSolaris
>  5aea4c5 softfloat: Replace uint16 type with uint_fast16_t
>  94a49d8 softfloat: Replace int16 type with int_fast16_t
>  c969654 softfloat: Fix mixups of int and int16
>  38641f8 softfloat: Use uint16 consistently
>  87b8cc3 softfloat: Resolve type mismatches between declaration and implementation
>  8d725fa softfloat: Prepend QEMU-style header with derivation notice
>  9f8d2a0 softfloat: Use uint32 consistently
>  bb98fe4 softfloat: Drop [s]bits{8, 16, 32, 64} types in favor of [u]int{8, 16, 32, 64}_t
>
> Aurelien Jarno
>  1020160 softfloat: fix default-NaN mode
>  084d19b target-mips: Implement correct NaN propagation rules
>  196cfc8 softfloat: add a 1.0 constant for float32 and float64
>  1b2ad2e softfloat-native: fix *nan()
>  1f398e0 softfloat: use float{32,64,x80,128}_maybe_silence_nan()
>  211315f softfloat: rename float*_eq() into float*_eq_quiet()
>  2657d0f softfloat: rename float*_eq_signaling() into float*_eq()
>  30e7a22 Use float_relation_* constants
>  326b9e9 softfloat: fix float*_scalnb() corner cases
>  34d2386 softfloat: remove HPPA specific code
>  374dfc3 soft-float: add float32_log2() and float64_log2()
>  4cc5383 softfloat-native: add float*_is_any_nan() functions
>  587eabf softfloat: add float*_is_zero_or_denormal()
>  629bd74 softfloat-native: add float32_is_nan()
>  67b7861 softfloat: add float*_unordered_{,quiet}() functions
>  8229c99 softfloat: add float32_exp2()
>  85016c9 Assortment of soft-float fixes, by Aurelien Jarno.
>  8d6c92b softfloat-native: improve correctness of floatXX_is_neg()
>  93ae1c6 softfloat: fix float{32,64}_maybe_silence_nan() for MIPS
>  a167ba5 Add support for GNU/kFreeBSD
>  b3b4c7f softfloat: use GCC builtins to count the leading zeros
>  b4a0ef7 softfloat-native: add float*_unordered_quiet() functions
>  b689362 softfloat: move float*_eq and float*_eq_quiet
>  b76235e softfloat: fix floatx80_is_infinity()
>  bbc1ded softfloat: implement fused multiply-add NaN propagation for MIPS
>  be22a9a softfloat: always enable floatx80 and float128 support
>  c4b4c77 softfloat: add pi constants
>  c52ab6f fp: add floatXX_is_infinity(), floatXX_is_neg(), floatXX_is_zero()
>  cf67c6b softfloat-native: remove
>  d2b1027 softfloat-native: add a few constant values
>  d6882cf softfloat-native: fix float*_scalbn() functions
>  d735d69 softfloat: rename *IsNaN variables to *IsQuietNaN
>  dadd71a fp: fix float32_is_infinity()
>  de4af5f softfloat: fix floatx80_is_{quiet,signaling}_nan()
>  e024e88 target-ppc: Implement correct NaN propagation rules
>  e2f4220 softfloat: fix floatx80 handling of NaN
>  e872aa8 softfloat-native: fix type of float_rounding_mode
>  e908775 softfloat: SH4 has the sNaN bit set
>  f3218a8 softfloat: add floatx80 constants
>  f5a6425 softfloat: improve description of comparison functions
>  f6714d3 softfloat: add floatx80_compare*() functions
>  f6a7d92 softfloat: add float{x80,128}_maybe_silence_nan()
>
> Avi Kivity
>  3bf7e40 softfloat: fix for C99
>
> Ben Taylor
>  0475a5c Solaris 9/x86 support, by Ben Taylor.
>  c94655b Updated Solaris isinf support, by Juergen Keil and Ben Taylor.
>
> Blue Swirl
>  128ab2f Preliminary OpenBSD host support (based on OpenBSD patches by Todd T. Fries)
>  14d483e Fix OpenSolaris softfloat warnings
>  179a2c1 Rename _BSD to HOST_BSD so that it's more obvious that it's defined by configure
>  1d6198c Remove unnecessary trailing newlines
>  1f58732 128-bit float support for user mode
>  2734c70 Rename one more _BSD to HOST_BSD (spotted by Hasso Tepper)
>  3f4cb3d Fix OpenSolaris gcc4 warnings: iovec type mismatches, missing 'static'
>  70c1470 Sparse fixes: dubious mixing of bitwise and logical operations
>  7c2a9d0 Fix math warnings on OpenBSD -current
>  b1d8e52 Fix undeclared symbol warnings from sparse
>  b55266b Suppress gcc 4.x -Wpointer-sign (included in -Wall) warnings
>  cd8a253 Fix more typos in softloat code (Eduardo Felipe)
>  d07cca0 Add native softfloat fpu functions (Christoph Egger)
>  ed086f3 softfloat: remove dead assignments, spotted by clang
>
> Christophe Lyon
>  8559666 softfloat: move all default NaN definitions to softfloat.h.
>  bcd4d9a softfloat: Honour default_nan_mode for float-to-float conversions
>  c30fe7d softfloat: add _set_sign(), _infinity and _half for 32 and 64 bits floats.
>
> Fabrice Bellard
>  158142c soft float support
>  1b2b0af 64 bit fix
>  1d6bda3 added abs, chs and compare functions
>  38cfa06 Solaris port (Ben Taylor)
>  750afe9 avoid using char when it is not necessary
>  b109f9f more native FPU comparison functions - native FPU remainder
>  ec530c8 Solaris port (Ben Taylor)
>  fdbb469 Solaris/SPARC host port (Ben Taylor)
>
> Guan Xuetao
>  d2fbca9 unicore32: necessary modifications for other files to support unicore32
>
> Jocelyn Mayer
>  3430b0b Ooops... Typo.
>  75d62a5 Add missing softfloat helpers.
>
> Juan Quintela
>  0eb4fc8 softfloat: make USE_SOFTFLOAT_STRUCT_TYPES compile
>  71e72a1 rename HOST_BSD to CONFIG_BSD
>  75b5a69 rename NEEDS_LIBSUNMATH to CONFIG_NEEDS_LIBSUNMATH
>  dfe5fff change HOST_SOLARIS to CONFIG_SOLARIS{_VERSION}
>  e2542fe rename WORDS_BIGENDIAN to HOST_WORDS_BIGENDIAN
>
> malc
>  947f5fc Add static qualifier to local functions
>  e58ffeb Remove all traces of __powerpc__
>
> Max Filippov
>  6617680 softfloat: make float_muladd_negate_* flags independent
>  213ff4e softfloat: add NO_SIGNALING_NANS
>  b81fe82 target-xtensa: specialize softfloat NaN rules
>
> Paolo Bonzini
>  1de7afc misc: move include files to include/qemu/
>  6b4c305 fpu: move public header file to include/fpu
>  789ec7c softfloat: change default nan definitions to variables
>
> Paul Brook
>  6001149 ARM FP16 support
>  6939754 Correctly normalize values and handle zero inputs to scalbn functions.
>  3598ecb Remove missing include.
>  5c7908e Implement default-NaN mode.
>  7918bf4 Fix typo in BSD FP rounding mode names.
>  9027db8 Fix ARM default NaN.
>  9ee6e8b ARMv7 support.
>  a1b91bb Fix typo in softfloat code.
>  e6e5906 ColdFire target.
>  f090c9d Add strict checking mode for softfp code.
>  fe76d97 Implement flush-to-zero mode (denormal results are replaced with zero).
>
> Peter Maydell
>  1856987 softfloat: Rename float*_is_nan() functions to float*_is_quiet_nan()
>  760e141 softfloat: roundAndPackInt{32, 64}: Don't assume int32 is 32 bits
>  011da61 target-arm: Implement correct NaN propagation rules
>  21d6ebd softfloat: Add float*_is_any_nan() functions
>  274f1b0 softfloat: Add float*_min() and float*_max() functions
>  2ac8bd0 softfloat: Reinstate accidentally disabled target-specific NaN handling
>  2bed652 softfloat: Implement floatx80_is_any_nan() and float128_is_any_nan()
>  354f211 softfloat: abstract out target-specific NaN propagation rules
>  369be8f softfloat: Implement fused multiply-add
>  37d1866 softfloat: Implement flushing input denormals to zero
>  4be8eea fpu/softfloat.c: Remove pointless shift of always-zero value
>  600e30d softfloat: Fix single-to-half precision float conversions
>  6f3300a softfloat: Add float32_is_zero_or_denormal() function
>  b3a6a2e softfloat: float*_to_int32_round_to_zero: don't assume int32 is 32 bits
>  b408dbd softfloat: Add float*_maybe_silence_nan() functions
>  bb4d4bb softfloat: Add float16 type and float16 NaN handling functions
>  c29aca4 softfloat: Add setter function for tininess detection mode
>  cbcef45 softfloat: Add float/double to 16 bit integer conversion functions
>  d5138cf softfloat: Fix compilation failures with USE_SOFTFLOAT_STRUCT_TYPES
>  e3d142d fpu: Correct edgecase in float64_muladd
>  e6afc87 softfloat: Add new flag for when denormal result is flushed to zero
>  e744c06 fpu/softfloat.c: Return correctly signed values from uint64_to_float32
>  f591e1b softfloat: Correctly handle NaNs in float16_to_float32()
>
> Richard Henderson
>  17ed229 softfloat: Fix uint64_to_float64
>  1e397ea softfloat: Implement uint64_to_float128
>  8443eff target-alpha: Split up FPCR value into separate fields.
>  990b3e1 target-alpha: Enable softfloat.
>  ba0e276 target-alpha: Fixes for alpha-linux syscalls.
>
> Richard Sandiford
>  a6e7c18 softfloat: Handle float_muladd_negate_c when product is zero
>
> Stefan Weil
>  bc4347b arm host: fix compiler warning
>
> Thiemo Seufer
>  5a6932d Fix NaN handling for MIPS and HPPA.
>  5fafdf2 find -type f | xargs sed -i 's/[\t ]$//g' # on most files
>  63a654b trunc() for Solaris 9 / SPARC, by Juergen Keil.
>  924b2c0 Add proper float*_is_nan prototypes.
>  b645bb4 Fix softfloat NaN handling.
>  fc81ba5 Check that HOST_SOLARIS is defined before relying on its value. Spotted by Joachim Henke.
> ---
>  fpu/softfloat-macros.h     |  430 ++++----
>  fpu/softfloat-specialize.h |  494 +++++----
>  fpu/softfloat.c            | 2436 ++++++++++++++++++++++++--------------------
>  include/fpu/softfloat.h    |  242 +++--
>  4 files changed, 1981 insertions(+), 1621 deletions(-)
>
> diff --git a/fpu/softfloat-macros.h b/fpu/softfloat-macros.h
> index b5164af..2009315 100644
> --- a/fpu/softfloat-macros.h
> +++ b/fpu/softfloat-macros.h
> @@ -4,10 +4,11 @@
>   * Derived from SoftFloat.
>   */
>
> -/*============================================================================
> +/*
> +===============================================================================
>
>  This C source fragment is part of the SoftFloat IEC/IEEE Floating-point
> -Arithmetic Package, Release 2b.
> +Arithmetic Package, Release 2a.
>
>  Written by John R. Hauser. This work was made possible in part by the
>  International Computer Science Institute, located at Suite 600, 1947 Center
> @@ -16,28 +17,27 @@ National Science Foundation under grant MIP-9311980. The original version
>  of this code was written as part of a project to build a fixed-point vector
>  processor in collaboration with the University of California at Berkeley,
>  overseen by Profs. Nelson Morgan and John Wawrzynek. More information
> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
>  arithmetic/SoftFloat.html'.
>
> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
> -INSTITUTE (possibly via similar legal notice) AGAINST ALL LOSSES, COSTS, OR
> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>
>  Derivative works are acceptable, even for commercial purposes, so long as
> -(1) the source code for the derivative work includes prominent notice that
> -the work is derivative, and (2) the source code includes prominent notice with
> -these four paragraphs for those parts of this code that are retained.
> +(1) they include prominent notice that the work is derivative, and (2) they
> +include prominent notice akin to these four paragraphs for those parts of
> +this code that are retained.
>
>  =============================================================================*/
>
> -/*----------------------------------------------------------------------------
> -| This macro tests for minimum version of the GNU C compiler.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +This macro tests for minimum version of the GNU C compiler.
> +-------------------------------------------------------------------------------
> +*/
>  #if defined(__GNUC__) && defined(__GNUC_MINOR__)
>  # define SOFTFLOAT_GNUC_PREREQ(maj, min) \
>      ((__GNUC__ << 16) + __GNUC_MINOR__ >= ((maj) << 16) + (min))
> @@ -46,14 +46,16 @@ these four paragraphs for those parts of this code that are retained.
>  #endif
>
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 32, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 32, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
>  {
> @@ -72,14 +74,16 @@ INLINE void shift32RightJamming(uint32_t a, int_fast16_t count, uint32_t *zPtr)
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts `a' right by the number of bits given in `count'. If any nonzero
> -| bits are shifted off, they are ``jammed'' into the least significant bit of
> -| the result by setting the least significant bit to 1. The value of `count'
> -| can be arbitrarily large; in particular, if `count' is greater than 64, the
> -| result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> -| The result is stored in the location pointed to by `zPtr'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Shifts `a' right by the number of bits given in `count'. If any nonzero
> +bits are shifted off, they are ``jammed'' into the least significant bit of
> +the result by setting the least significant bit to 1. The value of `count'
> +can be arbitrarily large; in particular, if `count' is greater than 64, the
> +result will be either 0 or 1, depending on whether `a' is zero or nonzero.
> +The result is stored in the location pointed to by `zPtr'.
> +-------------------------------------------------------------------------------
> +*/
>
>  INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
>  {
> @@ -98,23 +102,24 @@ INLINE void shift64RightJamming(uint64_t a, int_fast16_t count, uint64_t *zPtr)
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> -| _plus_ the number of bits given in `count'. The shifted result is at most
> -| 64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> -| bits shifted off form a second 64-bit result as follows: The _last_ bit
> -| shifted off is the most-significant bit of the extra result, and the other
> -| 63 bits of the extra result are all zero if and only if _all_but_the_last_
> -| bits shifted off were all zero. This extra result is stored in the location
> -| pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> -|     (This routine makes more sense if `a0' and `a1' are considered to form
> -| a fixed-point value with binary point between `a0' and `a1'. This fixed-
> -| point value is shifted right by the number of bits given in `count', and
> -| the integer part of the result is returned at the location pointed to by
> -| `z0Ptr'. The fractional part of the result may be slightly corrupted as
> -| described above, and is returned at the location pointed to by `z1Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by 64
> +_plus_ the number of bits given in `count'. The shifted result is at most
> +64 nonzero bits; this is stored at the location pointed to by `z0Ptr'. The
> +bits shifted off form a second 64-bit result as follows: The _last_ bit
> +shifted off is the most-significant bit of the extra result, and the other
> +63 bits of the extra result are all zero if and only if _all_but_the_last_
> +bits shifted off were all zero. This extra result is stored in the location
> +pointed to by `z1Ptr'. The value of `count' can be arbitrarily large.
> +    (This routine makes more sense if `a0' and `a1' are considered to form a
> +fixed-point value with binary point between `a0' and `a1'. This fixed-point
> +value is shifted right by the number of bits given in `count', and the
> +integer part of the result is returned at the location pointed to by
> +`z0Ptr'. The fractional part of the result may be slightly corrupted as
> +described above, and is returned at the location pointed to by `z1Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift64ExtraRightJamming(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -144,14 +149,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' can be arbitrarily large; in particular, if `count' is greater
> -| than 128, the result will be 0. The result is broken into two 64-bit pieces
> -| which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' can be arbitrarily large; in particular, if `count' is greater
> +than 128, the result will be 0. The result is broken into two 64-bit pieces
> +which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift128Right(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -176,17 +182,18 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> -| number of bits given in `count'. If any nonzero bits are shifted off, they
> -| are ``jammed'' into the least significant bit of the result by setting the
> -| least significant bit to 1. The value of `count' can be arbitrarily large;
> -| in particular, if `count' is greater than 128, the result will be either
> -| 0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> -| nonzero. The result is broken into two 64-bit pieces which are stored at
> -| the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' right by the
> +number of bits given in `count'. If any nonzero bits are shifted off, they
> +are ``jammed'' into the least significant bit of the result by setting the
> +least significant bit to 1. The value of `count' can be arbitrarily large;
> +in particular, if `count' is greater than 128, the result will be either
> +0 or 1, depending on whether the concatenation of `a0' and `a1' is zero or
> +nonzero. The result is broken into two 64-bit pieces which are stored at
> +the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift128RightJamming(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -219,25 +226,26 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> -| by 64 _plus_ the number of bits given in `count'. The shifted result is
> -| at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> -| stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
> -| off form a third 64-bit result as follows: The _last_ bit shifted off is
> -| the most-significant bit of the extra result, and the other 63 bits of the
> -| extra result are all zero if and only if _all_but_the_last_ bits shifted off
> -| were all zero. This extra result is stored in the location pointed to by
> -| `z2Ptr'. The value of `count' can be arbitrarily large.
> -|     (This routine makes more sense if `a0', `a1', and `a2' are considered
> -| to form a fixed-point value with binary point between `a1' and `a2'. This
> -| fixed-point value is shifted right by the number of bits given in `count',
> -| and the integer part of the result is returned at the locations pointed to
> -| by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> -| corrupted as described above, and is returned at the location pointed to by
> -| `z2Ptr'.)
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' right
> +by 64 _plus_ the number of bits given in `count'. The shifted result is
> +at most 128 nonzero bits; these are broken into two 64-bit pieces which are
> +stored at the locations pointed to by `z0Ptr' and `z1Ptr'. The bits shifted
> +off form a third 64-bit result as follows: The _last_ bit shifted off is
> +the most-significant bit of the extra result, and the other 63 bits of the
> +extra result are all zero if and only if _all_but_the_last_ bits shifted off
> +were all zero. This extra result is stored in the location pointed to by
> +`z2Ptr'. The value of `count' can be arbitrarily large.
> +    (This routine makes more sense if `a0', `a1', and `a2' are considered
> +to form a fixed-point value with binary point between `a1' and `a2'. This
> +fixed-point value is shifted right by the number of bits given in `count',
> +and the integer part of the result is returned at the locations pointed to
> +by `z0Ptr' and `z1Ptr'. The fractional part of the result may be slightly
> +corrupted as described above, and is returned at the location pointed to by
> +`z2Ptr'.)
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shift128ExtraRightJamming(
>       uint64_t a0,
> @@ -289,13 +297,14 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> -| number of bits given in `count'. Any bits shifted off are lost. The value
> -| of `count' must be less than 64. The result is broken into two 64-bit
> -| pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 128-bit value formed by concatenating `a0' and `a1' left by the
> +number of bits given in `count'. Any bits shifted off are lost. The value
> +of `count' must be less than 64. The result is broken into two 64-bit
> +pieces which are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shortShift128Left(
>       uint64_t a0, uint64_t a1, int_fast16_t count, uint64_t *z0Ptr, uint64_t *z1Ptr)
> @@ -307,14 +316,15 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> -| by the number of bits given in `count'. Any bits shifted off are lost.
> -| The value of `count' must be less than 64. The result is broken into three
> -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> -| `z1Ptr', and `z2Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Shifts the 192-bit value formed by concatenating `a0', `a1', and `a2' left
> +by the number of bits given in `count'. Any bits shifted off are lost.
> +The value of `count' must be less than 64. The result is broken into three
> +64-bit pieces which are stored at the locations pointed to by `z0Ptr',
> +`z1Ptr', and `z2Ptr'.
> +-------------------------------------------------------------------------------
> +*/
>  INLINE void
>   shortShift192Left(
>       uint64_t a0,
> @@ -343,13 +353,14 @@ INLINE void
>
>  }
>
> -/*----------------------------------------------------------------------------
> -| Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-bit
> -| value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, so
> -| any carry out is lost. The result is broken into two 64-bit pieces which
> -| are stored at the locations pointed to by `z0Ptr' and `z1Ptr'.
> -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Adds the 128-bit value formed by concatenating `a0' and `a1' to the 128-= bit > +value formed by concatenating `b0' and `b1'. Addition is modulo 2^128, = so > +any carry out is lost. The result is broken into two 64-bit pieces which > +are stored at the locations pointed to by `z0Ptr' and `z1Ptr'. > +------------------------------------------------------------------------= ------- > +*/ > INLINE void > add128( > uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr= , uint64_t *z1Ptr ) > @@ -362,14 +373,15 @@ INLINE void >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to= the > -| 192-bit value formed by concatenating `b0', `b1', and `b2'. Addition = is > -| modulo 2^192, so any carry out is lost. The result is broken into thr= ee > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr', > -| `z1Ptr', and `z2Ptr'. > -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Adds the 192-bit value formed by concatenating `a0', `a1', and `a2' to t= he > +192-bit value formed by concatenating `b0', `b1', and `b2'. Addition is > +modulo 2^192, so any carry out is lost. The result is broken into three > +64-bit pieces which are stored at the locations pointed to by `z0Ptr', > +`z1Ptr', and `z2Ptr'. 
> +------------------------------------------------------------------------= ------- > +*/ > INLINE void > add192( > uint64_t a0, > @@ -400,14 +412,15 @@ INLINE void >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Subtracts the 128-bit value formed by concatenating `b0' and `b1' from= the > -| 128-bit value formed by concatenating `a0' and `a1'. Subtraction is m= odulo > -| 2^128, so any borrow out (carry out) is lost. The result is broken in= to two > -| 64-bit pieces which are stored at the locations pointed to by `z0Ptr' = and > -| `z1Ptr'. > -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Subtracts the 128-bit value formed by concatenating `b0' and `b1' from t= he > +128-bit value formed by concatenating `a0' and `a1'. Subtraction is mod= ulo > +2^128, so any borrow out (carry out) is lost. The result is broken into= two > +64-bit pieces which are stored at the locations pointed to by `z0Ptr' and > +`z1Ptr'. > +------------------------------------------------------------------------= ------- > +*/ > INLINE void > sub128( > uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1, uint64_t *z0Ptr= , uint64_t *z1Ptr ) > @@ -418,14 +431,15 @@ INLINE void >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b= 2' > -| from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > -| Subtraction is modulo 2^192, so any borrow out (carry out) is lost. T= he > -| result is broken into three 64-bit pieces which are stored at the loca= tions > -| pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. 
> -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Subtracts the 192-bit value formed by concatenating `b0', `b1', and `b2' > +from the 192-bit value formed by concatenating `a0', `a1', and `a2'. > +Subtraction is modulo 2^192, so any borrow out (carry out) is lost. The > +result is broken into three 64-bit pieces which are stored at the locati= ons > +pointed to by `z0Ptr', `z1Ptr', and `z2Ptr'. > +------------------------------------------------------------------------= ------- > +*/ > INLINE void > sub192( > uint64_t a0, > @@ -456,11 +470,13 @@ INLINE void >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Multiplies `a' by `b' to obtain a 128-bit product. The product is bro= ken > -| into two 64-bit pieces which are stored at the locations pointed to by > -| `z0Ptr' and `z1Ptr'. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Multiplies `a' by `b' to obtain a 128-bit product. The product is broken > +into two 64-bit pieces which are stored at the locations pointed to by > +`z0Ptr' and `z1Ptr'. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > INLINE void mul64To128( uint64_t a, uint64_t b, uint64_t *z0Ptr, uint64_= t *z1Ptr ) > { > @@ -485,13 +501,14 @@ INLINE void mul64To128( uint64_t a, uint64_t b, uin= t64_t *z0Ptr, uint64_t *z1Ptr >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > -| `b' to obtain a 192-bit product. The product is broken into three 64-= bit > -| pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr= ', and > -| `z2Ptr'. 
> -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' by > +`b' to obtain a 192-bit product. The product is broken into three 64-bit > +pieces which are stored at the locations pointed to by `z0Ptr', `z1Ptr',= and > +`z2Ptr'. > +------------------------------------------------------------------------= ------- > +*/ > INLINE void > mul128By64To192( > uint64_t a0, > @@ -513,13 +530,14 @@ INLINE void >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Multiplies the 128-bit value formed by concatenating `a0' and `a1' to = the > -| 128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > -| product. The product is broken into four 64-bit pieces which are stor= ed at > -| the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Multiplies the 128-bit value formed by concatenating `a0' and `a1' to the > +128-bit value formed by concatenating `b0' and `b1' to obtain a 256-bit > +product. The product is broken into four 64-bit pieces which are stored= at > +the locations pointed to by `z0Ptr', `z1Ptr', `z2Ptr', and `z3Ptr'. > +------------------------------------------------------------------------= ------- > +*/ > INLINE void > mul128To256( > uint64_t a0, > @@ -550,14 +568,16 @@ INLINE void >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns an approximation to the 64-bit integer quotient obtained by di= viding > -| `b' into the 128-bit value formed by concatenating `a0' and `a1'. The > -| divisor `b' must be at least 2^63. 
If q is the exact quotient truncat= ed > -| toward zero, the approximation returned lies between q and q + 2 inclu= sive. > -| If the exact quotient q is larger than 64 bits, the maximum positive 6= 4-bit > -| unsigned integer is returned. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns an approximation to the 64-bit integer quotient obtained by divi= ding > +`b' into the 128-bit value formed by concatenating `a0' and `a1'. The > +divisor `b' must be at least 2^63. If q is the exact quotient truncated > +toward zero, the approximation returned lies between q and q + 2 inclusi= ve. > +If the exact quotient q is larger than 64 bits, the maximum positive 64-= bit > +unsigned integer is returned. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > static uint64_t estimateDiv128To64( uint64_t a0, uint64_t a1, uint64_t b= ) > { > @@ -581,15 +601,17 @@ static uint64_t estimateDiv128To64( uint64_t a0, ui= nt64_t a1, uint64_t b ) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns an approximation to the square root of the 32-bit significand = given > -| by `a'. Considered as an integer, `a' must be at least 2^31. If bit = 0 of > -| `aExp' (the least significant bit) is 1, the integer returned approxim= ates > -| 2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of = `aExp' > -| is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > -| case, the approximation returned lies strictly within +/-2 of the exact > -| value. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns an approximation to the square root of the 32-bit significand gi= ven > +by `a'. 
Considered as an integer, `a' must be at least 2^31. If bit 0 = of > +`aExp' (the least significant bit) is 1, the integer returned approximat= es > +2^31*sqrt(`a'/2^31), where `a' is considered an integer. If bit 0 of `a= Exp' > +is 0, the integer returned approximates 2^31*sqrt(`a'/2^30). In either > +case, the approximation returned lies strictly within +/-2 of the exact > +value. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > static uint32_t estimateSqrt32(int_fast16_t aExp, uint32_t a) > { > @@ -620,10 +642,12 @@ static uint32_t estimateSqrt32(int_fast16_t aExp, u= int32_t a) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the number of leading 0 bits before the most-significant 1 bit= of > -| `a'. If `a' is zero, 32 is returned. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 32 is returned. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > static int8 countLeadingZeros32( uint32_t a ) > { > @@ -668,10 +692,12 @@ static int8 countLeadingZeros32( uint32_t a ) > #endif > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the number of leading 0 bits before the most-significant 1 bit= of > -| `a'. If `a' is zero, 64 is returned. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns the number of leading 0 bits before the most-significant 1 bit of > +`a'. If `a' is zero, 64 is returned. 
> +------------------------------------------------------------------------= ------- > +*/ >=20=20 > static int8 countLeadingZeros64( uint64_t a ) > { > @@ -696,11 +722,13 @@ static int8 countLeadingZeros64( uint64_t a ) > #endif > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > -| is equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' > +is equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > INLINE flag eq128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -709,11 +737,13 @@ INLINE flag eq128( uint64_t a0, uint64_t a1, uint64= _t b0, uint64_t b1 ) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' i= s less > -| than or equal to the 128-bit value formed by concatenating `b0' and `b= 1'. > -| Otherwise, returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is = less > +than or equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> +------------------------------------------------------------------------= ------- > +*/ >=20=20 > INLINE flag le128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -722,11 +752,13 @@ INLINE flag le128( uint64_t a0, uint64_t a1, uint64= _t b0, uint64_t b1 ) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' i= s less > -| than the 128-bit value formed by concatenating `b0' and `b1'. Otherwi= se, > -| returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is = less > +than the 128-bit value formed by concatenating `b0' and `b1'. Otherwise, > +returns 0. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > INLINE flag lt128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > @@ -735,11 +767,13 @@ INLINE flag lt128( uint64_t a0, uint64_t a1, uint64= _t b0, uint64_t b1 ) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > -| not equal to the 128-bit value formed by concatenating `b0' and `b1'. > -| Otherwise, returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the 128-bit value formed by concatenating `a0' and `a1' is > +not equal to the 128-bit value formed by concatenating `b0' and `b1'. > +Otherwise, returns 0. 
> +------------------------------------------------------------------------= ------- > +*/ >=20=20 > INLINE flag ne128( uint64_t a0, uint64_t a1, uint64_t b0, uint64_t b1 ) > { > diff --git a/fpu/softfloat-specialize.h b/fpu/softfloat-specialize.h > index 518f694..ba9bfeb 100644 > --- a/fpu/softfloat-specialize.h > +++ b/fpu/softfloat-specialize.h > @@ -4,10 +4,11 @@ > * Derived from SoftFloat. > */ >=20=20 > -/*=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D > +/* > +=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D >=20=20 > This C source fragment is part of the SoftFloat IEC/IEEE Floating-point > -Arithmetic Package, Release 2b. > +Arithmetic Package, Release 2a. >=20=20 > Written by John R. Hauser. This work was made possible in part by the > International Computer Science Institute, located at Suite 600, 1947 Cen= ter > @@ -16,22 +17,19 @@ National Science Foundation under grant MIP-9311980. = The original version > of this code was written as part of a project to build a fixed-point vec= tor > processor in collaboration with the University of California at Berkeley, > overseen by Profs. Nelson Morgan and John Wawrzynek. More information > -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/ > +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/ > arithmetic/SoftFloat.html'. >=20=20 > -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effor= t has > -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIM= ES > -RESULT IN INCORRECT BEHAVIOR. 
USE OF THIS SOFTWARE IS RESTRICTED TO PER= SONS > -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSS= ES, > -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHER= MORE > -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE > -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS= , OR > -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWA= RE. > +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort > +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT > +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED = TO > +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR = ANY > +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE. >=20=20 > Derivative works are acceptable, even for commercial purposes, so long as > -(1) the source code for the derivative work includes prominent notice th= at > -the work is derivative, and (2) the source code includes prominent notic= e with > -these four paragraphs for those parts of this code that are retained. > +(1) they include prominent notice that the work is derivative, and (2) t= hey > +include prominent notice akin to these four paragraphs for those parts of > +this code that are retained. >=20=20 > =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D*/ >=20=20 > @@ -48,9 +46,11 @@ these four paragraphs for those parts of this code tha= t are retained. > #define NO_SIGNALING_NANS 1 > #endif >=20=20 > -/*----------------------------------------------------------------------= ------ > -| The pattern for a default generated half-precision NaN. 
> -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +The pattern for a default generated half-precision NaN. > +------------------------------------------------------------------------= ------- > +*/ > #if defined(TARGET_ARM) > const float16 float16_default_nan =3D const_float16(0x7E00); > #elif SNAN_BIT_IS_ONE > @@ -59,9 +59,11 @@ const float16 float16_default_nan =3D const_float16(0x= 7DFF); > const float16 float16_default_nan =3D const_float16(0xFE00); > #endif >=20=20 > -/*----------------------------------------------------------------------= ------ > -| The pattern for a default generated single-precision NaN. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +The pattern for a default generated single-precision NaN. > +------------------------------------------------------------------------= ------- > +*/ > #if defined(TARGET_SPARC) > const float32 float32_default_nan =3D const_float32(0x7FFFFFFF); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA= ) || \ > @@ -73,9 +75,11 @@ const float32 float32_default_nan =3D const_float32(0x= 7FBFFFFF); > const float32 float32_default_nan =3D const_float32(0xFFC00000); > #endif >=20=20 > -/*----------------------------------------------------------------------= ------ > -| The pattern for a default generated double-precision NaN. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +The pattern for a default generated double-precision NaN. 
> +------------------------------------------------------------------------= ------- > +*/ > #if defined(TARGET_SPARC) > const float64 float64_default_nan =3D const_float64(LIT64( 0x7FFFFFFFFFF= FFFFF )); > #elif defined(TARGET_PPC) || defined(TARGET_ARM) || defined(TARGET_ALPHA) > @@ -86,9 +90,11 @@ const float64 float64_default_nan =3D const_float64(LI= T64( 0x7FF7FFFFFFFFFFFF )); > const float64 float64_default_nan =3D const_float64(LIT64( 0xFFF80000000= 00000 )); > #endif >=20=20 > -/*----------------------------------------------------------------------= ------ > -| The pattern for a default generated extended double-precision NaN. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +The pattern for a default generated extended double-precision NaN. > +------------------------------------------------------------------------= ------- > +*/ > #if SNAN_BIT_IS_ONE > #define floatx80_default_nan_high 0x7FFF > #define floatx80_default_nan_low LIT64( 0xBFFFFFFFFFFFFFFF ) > @@ -100,10 +106,12 @@ const float64 float64_default_nan =3D const_float64= (LIT64( 0xFFF8000000000000 )); > const floatx80 floatx80_default_nan > =3D make_floatx80_init(floatx80_default_nan_high, floatx80_default_n= an_low); >=20=20 > -/*----------------------------------------------------------------------= ------ > -| The pattern for a default generated quadruple-precision NaN. The `hig= h' and > -| `low' values hold the most- and least-significant bits, respectively. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +The pattern for a default generated quadruple-precision NaN. The `high'= and > +`low' values hold the most- and least-significant bits, respectively. 
> +------------------------------------------------------------------------= ------- > +*/ > #if SNAN_BIT_IS_ONE > #define float128_default_nan_high LIT64( 0x7FFF7FFFFFFFFFFF ) > #define float128_default_nan_low LIT64( 0xFFFFFFFFFFFFFFFF ) > @@ -115,21 +123,25 @@ const floatx80 floatx80_default_nan > const float128 float128_default_nan > =3D make_float128_init(float128_default_nan_high, float128_default_n= an_low); >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Raises the exceptions specified by `flags'. Floating-point traps can = be > -| defined here if desired. It is currently not possible for such a trap > -| to substitute a result value. If traps are not implemented, this rout= ine > -| should be simply `float_exception_flags |=3D flags;'. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Raises the exceptions specified by `flags'. Floating-point traps can be > +defined here if desired. It is currently not possible for such a trap > +to substitute a result value. If traps are not implemented, this routine > +should be simply `float_exception_flags |=3D flags;'. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > void float_raise( int8 flags STATUS_PARAM ) > { > STATUS(float_exception_flags) |=3D flags; > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Internal canonical NaN format. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Internal canonical NaN format. 
> +------------------------------------------------------------------------= ------- > +*/ > typedef struct { > flag sign; > uint64_t high, low; > @@ -146,10 +158,12 @@ int float16_is_signaling_nan(float16 a_) > return 0; > } > #else > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the half-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the half-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > int float16_is_quiet_nan(float16 a_) > { > @@ -161,10 +175,12 @@ int float16_is_quiet_nan(float16 a_) > #endif > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the half-precision floating-point value `a' is a signaling > -| NaN; otherwise returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the half-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > int float16_is_signaling_nan(float16 a_) > { > @@ -177,10 +193,12 @@ int float16_is_signaling_nan(float16 a_) > } > #endif >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns a quiet NaN if the half-precision floating point value `a' is a > -| signaling NaN; otherwise returns `a'. 
> -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns a quiet NaN if the half-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------= ------- > +*/ > float16 float16_maybe_silence_nan(float16 a_) > { > if (float16_is_signaling_nan(a_)) { > @@ -199,11 +217,13 @@ float16 float16_maybe_silence_nan(float16 a_) > return a_; > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of converting the half-precision floating-point NaN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the inval= id > -| exception is raised. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns the result of converting the half-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > static commonNaNT float16ToCommonNaN( float16 a STATUS_PARAM ) > { > @@ -216,10 +236,12 @@ static commonNaNT float16ToCommonNaN( float16 a STA= TUS_PARAM ) > return z; > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of converting the canonical NaN `a' to the half- > -| precision floating-point format. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns the result of converting the canonical NaN `a' to the half- > +precision floating-point format. 
> +------------------------------------------------------------------------= ------- > +*/ >=20=20 > static float16 commonNaNToFloat16(commonNaNT a STATUS_PARAM) > { > @@ -248,10 +270,12 @@ int float32_is_signaling_nan(float32 a_) > return 0; > } > #else > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the single-precision floating-point value `a' is a quiet > -| NaN; otherwise returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the single-precision floating-point value `a' is a quiet > +NaN; otherwise returns 0. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > int float32_is_quiet_nan( float32 a_ ) > { > @@ -263,10 +287,12 @@ int float32_is_quiet_nan( float32 a_ ) > #endif > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns 1 if the single-precision floating-point value `a' is a signal= ing > -| NaN; otherwise returns 0. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns 1 if the single-precision floating-point value `a' is a signaling > +NaN; otherwise returns 0. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > int float32_is_signaling_nan( float32 a_ ) > { > @@ -279,10 +305,12 @@ int float32_is_signaling_nan( float32 a_ ) > } > #endif >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns a quiet NaN if the single-precision floating point value `a' i= s a > -| signaling NaN; otherwise returns `a'. 
> -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns a quiet NaN if the single-precision floating point value `a' is a > +signaling NaN; otherwise returns `a'. > +------------------------------------------------------------------------= ------- > +*/ >=20=20 > float32 float32_maybe_silence_nan( float32 a_ ) > { > @@ -302,12 +330,13 @@ float32 float32_maybe_silence_nan( float32 a_ ) > return a_; > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of converting the single-precision floating-point N= aN > -| `a' to the canonical NaN format. If `a' is a signaling NaN, the inval= id > -| exception is raised. > -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Returns the result of converting the single-precision floating-point NaN > +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid > +exception is raised. > +------------------------------------------------------------------------= ------- > +*/ > static commonNaNT float32ToCommonNaN( float32 a STATUS_PARAM ) > { > commonNaNT z; > @@ -319,10 +348,12 @@ static commonNaNT float32ToCommonNaN( float32 a STA= TUS_PARAM ) > return z; > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of converting the canonical NaN `a' to the single- > -| precision floating-point format. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Returns the result of converting the canonical NaN `a' to the single- > +precision floating-point format. 
> +------------------------------------------------------------------------= ------- > +*/ >=20=20 > static float32 commonNaNToFloat32( commonNaNT a STATUS_PARAM) > { > @@ -339,22 +370,24 @@ static float32 commonNaNToFloat32( commonNaNT a STA= TUS_PARAM) > return float32_default_nan; > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Select which NaN to propagate for a two-input operation. > -| IEEE754 doesn't specify all the details of this, so the > -| algorithm is target-specific. > -| The routine is passed various bits of information about the > -| two NaNs and should return 0 to select NaN a and 1 for NaN b. > -| Note that signalling NaNs are always squashed to quiet NaNs > -| by the caller, by calling floatXX_maybe_silence_nan() before > -| returning them. > -| > -| aIsLargerSignificand is only valid if both a and b are NaNs > -| of some kind, and is true if a has the larger significand, > -| or if both a and b have the same significand but a is > -| positive but b is negative. It is only needed for the x87 > -| tie-break rule. > -*-----------------------------------------------------------------------= -----*/ > +/* > +------------------------------------------------------------------------= ------- > +Select which NaN to propagate for a two-input operation. > +IEEE754 doesn't specify all the details of this, so the > +algorithm is target-specific. > +The routine is passed various bits of information about the > +two NaNs and should return 0 to select NaN a and 1 for NaN b. > +Note that signalling NaNs are always squashed to quiet NaNs > +by the caller, by calling floatXX_maybe_silence_nan() before > +returning them. > + > +aIsLargerSignificand is only valid if both a and b are NaNs > +of some kind, and is true if a has the larger significand, > +or if both a and b have the same significand but a is > +positive but b is negative. It is only needed for the x87 > +tie-break rule. 
> +-------------------------------------------------------------------------------
> +*/
>  
> #if defined(TARGET_ARM)
> static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> @@ -451,12 +484,14 @@ static int pickNaN(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> }
> #endif
>  
> -/*----------------------------------------------------------------------------
> -| Select which NaN to propagate for a three-input operation.
> -| For the moment we assume that no CPU needs the 'larger significand'
> -| information.
> -| Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Select which NaN to propagate for a three-input operation.
> +For the moment we assume that no CPU needs the 'larger significand'
> +information.
> +Return values : 0 : a; 1 : b; 2 : c; 3 : default-NaN
> +-------------------------------------------------------------------------------
> +*/
> #if defined(TARGET_ARM)
> static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> flag cIsQNaN, flag cIsSNaN, flag infzero STATUS_PARAM)
> @@ -554,12 +589,13 @@ static int pickNaNMulAdd(flag aIsQNaN, flag aIsSNaN, flag bIsQNaN, flag bIsSNaN,
> }
> #endif
>  
> -/*----------------------------------------------------------------------------
> -| Takes two single-precision floating-point values `a' and `b', one of which
> -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> -| signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two single-precision floating-point values `a' and `b', one of which
> +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> +signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> @@ -594,14 +630,16 @@ static float32 propagateFloat32NaN( float32 a, float32 b STATUS_PARAM)
> }
> }
>  
> -/*----------------------------------------------------------------------------
> -| Takes three single-precision floating-point values `a', `b' and `c', one of
> -| which is a NaN, and returns the appropriate NaN result. If any of `a',
> -| `b' or `c' is a signaling NaN, the invalid exception is raised.
> -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> -| obviously c is a NaN, and whether to propagate c or some other NaN is
> -| implementation defined).
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes three single-precision floating-point values `a', `b' and `c', one of
> +which is a NaN, and returns the appropriate NaN result. If any of `a',
> +`b' or `c' is a signaling NaN, the invalid exception is raised.
> +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> +obviously c is a NaN, and whether to propagate c or some other NaN is
> +implementation defined).
> +-------------------------------------------------------------------------------
> +*/
>  
> static float32 propagateFloat32MulAddNaN(float32 a, float32 b,
> float32 c, flag infzero STATUS_PARAM)
> @@ -656,10 +694,12 @@ int float64_is_signaling_nan(float64 a_)
> return 0;
> }
> #else
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is a quiet
> -| NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is a quiet
> +NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>  
> int float64_is_quiet_nan( float64 a_ )
> {
> @@ -673,10 +713,12 @@ int float64_is_quiet_nan( float64 a_ )
> #endif
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is a signaling
> -| NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is a signaling
> +NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>  
> int float64_is_signaling_nan( float64 a_ )
> {
> @@ -691,10 +733,12 @@ int float64_is_signaling_nan( float64 a_ )
> }
> #endif
>  
> -/*----------------------------------------------------------------------------
> -| Returns a quiet NaN if the double-precision floating point value `a' is a
> -| signaling NaN; otherwise returns `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns a quiet NaN if the double-precision floating point value `a' is a
> +signaling NaN; otherwise returns `a'.
> +-------------------------------------------------------------------------------
> +*/
>  
> float64 float64_maybe_silence_nan( float64 a_ )
> {
> @@ -714,12 +758,13 @@ float64 float64_maybe_silence_nan( float64 a_ )
> return a_;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point NaN
> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> -| exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point NaN
> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> +exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM)
> {
> commonNaNT z;
> @@ -731,10 +776,12 @@ static commonNaNT float64ToCommonNaN( float64 a STATUS_PARAM)
> return z;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the canonical NaN `a' to the double-
> -| precision floating-point format.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the canonical NaN `a' to the double-
> +precision floating-point format.
> +-------------------------------------------------------------------------------
> +*/
>  
> static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM)
> {
> @@ -753,12 +800,13 @@ static float64 commonNaNToFloat64( commonNaNT a STATUS_PARAM)
> return float64_default_nan;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Takes two double-precision floating-point values `a' and `b', one of which
> -| is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> -| signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two double-precision floating-point values `a' and `b', one of which
> +is a NaN, and returns the appropriate NaN result. If either `a' or `b' is a
> +signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> @@ -793,14 +841,16 @@ static float64 propagateFloat64NaN( float64 a, float64 b STATUS_PARAM)
> }
> }
>  
> -/*----------------------------------------------------------------------------
> -| Takes three double-precision floating-point values `a', `b' and `c', one of
> -| which is a NaN, and returns the appropriate NaN result. If any of `a',
> -| `b' or `c' is a signaling NaN, the invalid exception is raised.
> -| The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> -| obviously c is a NaN, and whether to propagate c or some other NaN is
> -| implementation defined).
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes three double-precision floating-point values `a', `b' and `c', one of
> +which is a NaN, and returns the appropriate NaN result. If any of `a',
> +`b' or `c' is a signaling NaN, the invalid exception is raised.
> +The input infzero indicates whether a*b was 0*inf or inf*0 (in which case
> +obviously c is a NaN, and whether to propagate c or some other NaN is
> +implementation defined).
> +-------------------------------------------------------------------------------
> +*/
>  
> static float64 propagateFloat64MulAddNaN(float64 a, float64 b,
> float64 c, flag infzero STATUS_PARAM)
> @@ -855,11 +905,13 @@ int floatx80_is_signaling_nan(floatx80 a_)
> return 0;
> }
> #else
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is a
> -| quiet NaN; otherwise returns 0. This slightly differs from the same
> -| function for other types as floatx80 has an explicit bit.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is a
> +quiet NaN; otherwise returns 0. This slightly differs from the same
> +function for other types as floatx80 has an explicit bit.
> +-------------------------------------------------------------------------------
> +*/
>  
> int floatx80_is_quiet_nan( floatx80 a )
> {
> @@ -877,11 +929,13 @@ int floatx80_is_quiet_nan( floatx80 a )
> #endif
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is a
> -| signaling NaN; otherwise returns 0. This slightly differs from the same
> -| function for other types as floatx80 has an explicit bit.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is a
> +signaling NaN; otherwise returns 0. This slightly differs from the same
> +function for other types as floatx80 has an explicit bit.
> +-------------------------------------------------------------------------------
> +*/
>  
> int floatx80_is_signaling_nan( floatx80 a )
> {
> @@ -900,10 +954,12 @@ int floatx80_is_signaling_nan( floatx80 a )
> }
> #endif
>  
> -/*----------------------------------------------------------------------------
> -| Returns a quiet NaN if the extended double-precision floating point value
> -| `a' is a signaling NaN; otherwise returns `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns a quiet NaN if the extended double-precision floating point value
> +`a' is a signaling NaN; otherwise returns `a'.
> +-------------------------------------------------------------------------------
> +*/
>  
> floatx80 floatx80_maybe_silence_nan( floatx80 a )
> {
> @@ -923,12 +979,13 @@ floatx80 floatx80_maybe_silence_nan( floatx80 a )
> return a;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the
> -| invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point NaN `a' to the canonical NaN format. If `a' is a signaling NaN, the
> +invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM)
> {
> commonNaNT z;
> @@ -946,10 +1003,12 @@ static commonNaNT floatx80ToCommonNaN( floatx80 a STATUS_PARAM)
> return z;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the canonical NaN `a' to the extended
> -| double-precision floating-point format.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the canonical NaN `a' to the extended
> +double-precision floating-point format.
> +-------------------------------------------------------------------------------
> +*/
>  
> static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM)
> {
> @@ -972,12 +1031,13 @@ static floatx80 commonNaNToFloatx80( commonNaNT a STATUS_PARAM)
> return z;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Takes two extended double-precision floating-point values `a' and `b', one
> -| of which is a NaN, and returns the appropriate NaN result. If either `a' or
> -| `b' is a signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two extended double-precision floating-point values `a' and `b', one
> +of which is a NaN, and returns the appropriate NaN result. If either `a' or
> +`b' is a signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80 propagateFloatx80NaN( floatx80 a, floatx80 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> @@ -1023,10 +1083,12 @@ int float128_is_signaling_nan(float128 a_)
> return 0;
> }
> #else
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is a quiet
> -| NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is a quiet
> +NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>  
> int float128_is_quiet_nan( float128 a )
> {
> @@ -1041,10 +1103,12 @@ int float128_is_quiet_nan( float128 a )
> #endif
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is a
> -| signaling NaN; otherwise returns 0.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is a
> +signaling NaN; otherwise returns 0.
> +-------------------------------------------------------------------------------
> +*/
>  
> int float128_is_signaling_nan( float128 a )
> {
> @@ -1060,10 +1124,12 @@ int float128_is_signaling_nan( float128 a )
> }
> #endif
>  
> -/*----------------------------------------------------------------------------
> -| Returns a quiet NaN if the quadruple-precision floating point value `a' is
> -| a signaling NaN; otherwise returns `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns a quiet NaN if the quadruple-precision floating point value `a' is
> +a signaling NaN; otherwise returns `a'.
> +-------------------------------------------------------------------------------
> +*/
>  
> float128 float128_maybe_silence_nan( float128 a )
> {
> @@ -1083,12 +1149,13 @@ float128 float128_maybe_silence_nan( float128 a )
> return a;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point NaN
> -| `a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> -| exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point NaN
> +`a' to the canonical NaN format. If `a' is a signaling NaN, the invalid
> +exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM)
> {
> commonNaNT z;
> @@ -1099,10 +1166,12 @@ static commonNaNT float128ToCommonNaN( float128 a STATUS_PARAM)
> return z;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the canonical NaN `a' to the quadruple-
> -| precision floating-point format.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the canonical NaN `a' to the quadruple-
> +precision floating-point format.
> +-------------------------------------------------------------------------------
> +*/
>  
> static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM)
> {
> @@ -1119,12 +1188,13 @@ static float128 commonNaNToFloat128( commonNaNT a STATUS_PARAM)
> return z;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Takes two quadruple-precision floating-point values `a' and `b', one of
> -| which is a NaN, and returns the appropriate NaN result. If either `a' or
> -| `b' is a signaling NaN, the invalid exception is raised.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes two quadruple-precision floating-point values `a' and `b', one of
> +which is a NaN, and returns the appropriate NaN result. If either `a' or
> +`b' is a signaling NaN, the invalid exception is raised.
> +-------------------------------------------------------------------------------
> +*/
> static float128 propagateFloat128NaN( float128 a, float128 b STATUS_PARAM)
> {
> flag aIsQuietNaN, aIsSignalingNaN, bIsQuietNaN, bIsSignalingNaN;
> diff --git a/fpu/softfloat.c b/fpu/softfloat.c
> index 7ba51b6..9145582 100644
> --- a/fpu/softfloat.c
> +++ b/fpu/softfloat.c
> @@ -4,10 +4,11 @@
> * Derived from SoftFloat.
> */
>  
> -/*============================================================================
> +/*
> +===============================================================================
>  
> -This C source file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic
> -Package, Release 2b.
> +This C source file is part of the SoftFloat IEC/IEEE Floating-point
> +Arithmetic Package, Release 2a.
>  
> Written by John R. Hauser. This work was made possible in part by the
> International Computer Science Institute, located at Suite 600, 1947 Center
> @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version
> of this code was written as part of a project to build a fixed-point vector
> processor in collaboration with the University of California at Berkeley,
> overseen by Profs. Nelson Morgan and John Wawrzynek. More information
> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
> arithmetic/SoftFloat.html'.
>  
> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
> -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>  
> Derivative works are acceptable, even for commercial purposes, so long as
> -(1) the source code for the derivative work includes prominent notice that
> -the work is derivative, and (2) the source code includes prominent notice with
> -these four paragraphs for those parts of this code that are retained.
> +(1) they include prominent notice that the work is derivative, and (2) they
> +include prominent notice akin to these four paragraphs for those parts of
> +this code that are retained.
>  
> -=============================================================================*/
> +===============================================================================
> +*/
>  
> /* softfloat (and in particular the code in softfloat-specialize.h) is
> * target-dependent and needs the TARGET_* macros.
> @@ -42,21 +41,25 @@ these four paragraphs for those parts of this code that are retained.
>  
> #include "fpu/softfloat.h"
>  
> -/*----------------------------------------------------------------------------
> -| Primitive arithmetic functions, including multi-word arithmetic, and
> -| division and square root approximations. (Can be specialized to target if
> -| desired.)
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Primitive arithmetic functions, including multi-word arithmetic, and
> +division and square root approximations. (Can be specialized to target if
> +desired.)
> +-------------------------------------------------------------------------------
> +*/
> #include "softfloat-macros.h"
>  
> -/*----------------------------------------------------------------------------
> -| Functions and definitions to determine: (1) whether tininess for underflow
> -| is detected before or after rounding by default, (2) what (if anything)
> -| happens when exceptions are raised, (3) how signaling NaNs are distinguished
> -| from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs
> -| are propagated from function inputs to output. These details are target-
> -| specific.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Functions and definitions to determine: (1) whether tininess for underflow
> +is detected before or after rounding by default, (2) what (if anything)
> +happens when exceptions are raised, (3) how signaling NaNs are distinguished
> +from quiet NaNs, (4) the default generated quiet NaNs, and (5) how NaNs
> +are propagated from function inputs to output. These details are target-
> +specific.
> +-------------------------------------------------------------------------------
> +*/
> #include "softfloat-specialize.h"
>  
> void set_float_rounding_mode(int val STATUS_PARAM)
> @@ -74,43 +77,51 @@ void set_floatx80_rounding_precision(int val STATUS_PARAM)
> STATUS(floatx80_rounding_precision) = val;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the half-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the half-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>  
> INLINE uint32_t extractFloat16Frac(float16 a)
> {
> return float16_val(a) & 0x3ff;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the half-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the half-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>  
> INLINE int_fast16_t extractFloat16Exp(float16 a)
> {
> return (float16_val(a) >> 10) & 0x1f;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>  
> INLINE flag extractFloat16Sign(float16 a)
> {
> return float16_val(a)>>15;
> }
>  
> -/*----------------------------------------------------------------------------
> -| Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
> -| and 7, and returns the properly rounded 32-bit integer corresponding to the
> -| input. If `zSign' is 1, the input is negated before being converted to an
> -| integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
> -| is simply rounded to an integer, with the inexact exception raised if the
> -| input cannot be represented exactly as an integer. However, if the fixed-
> -| point input is too large, the invalid exception is raised and the largest
> -| positive or negative integer is returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes a 64-bit fixed-point value `absZ' with binary point between bits 6
> +and 7, and returns the properly rounded 32-bit integer corresponding to the
> +input. If `zSign' is 1, the input is negated before being converted to an
> +integer. Bit 63 of `absZ' must be zero. Ordinarily, the fixed-point input
> +is simply rounded to an integer, with the inexact exception raised if the
> +input cannot be represented exactly as an integer. However, if the fixed-
> +point input is too large, the invalid exception is raised and the largest
> +positive or negative integer is returned.
> +-------------------------------------------------------------------------------
> +*/
>  
> static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
> {
> @@ -150,17 +161,19 @@ static int32 roundAndPackInt32( flag zSign, uint64_t absZ STATUS_PARAM)
>  
> }
>  
> -/*----------------------------------------------------------------------------
> -| Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
> -| `absZ1', with binary point between bits 63 and 64 (between the input words),
> -| and returns the properly rounded 64-bit integer corresponding to the input.
> -| If `zSign' is 1, the input is negated before being converted to an integer.
> -| Ordinarily, the fixed-point input is simply rounded to an integer, with
> -| the inexact exception raised if the input cannot be represented exactly as
> -| an integer. However, if the fixed-point input is too large, the invalid
> -| exception is raised and the largest positive or negative integer is
> -| returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes the 128-bit fixed-point value formed by concatenating `absZ0' and
> +`absZ1', with binary point between bits 63 and 64 (between the input words),
> +and returns the properly rounded 64-bit integer corresponding to the input.
> +If `zSign' is 1, the input is negated before being converted to an integer.
> +Ordinarily, the fixed-point input is simply rounded to an integer, with
> +the inexact exception raised if the input cannot be represented exactly as
> +an integer. However, if the fixed-point input is too large, the invalid
> +exception is raised and the largest positive or negative integer is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
>  
> static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATUS_PARAM)
> {
> @@ -203,9 +216,11 @@ static int64 roundAndPackInt64( flag zSign, uint64_t absZ0, uint64_t absZ1 STATU
>  
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>  
> INLINE uint32_t extractFloat32Frac( float32 a )
> {
> @@ -214,9 +229,11 @@ INLINE uint32_t extractFloat32Frac( float32 a )
>  
> }
>  
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE int_fast16_t extractFloat32Exp(float32 a)
> {
> @@ -225,10 +242,11 @@ INLINE int_fast16_t extractFloat32Exp(float32 a)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the single-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the single-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloat32Sign( float32 a )
> {
>
> @@ -236,10 +254,12 @@ INLINE flag extractFloat32Sign( float32 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| If `a' is denormal and we are in flush-to-zero mode then set the
> -| input-denormal exception and return zero. Otherwise just return the value.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +If `a' is denormal and we are in flush-to-zero mode then set the
> +input-denormal exception and return zero. Otherwise just return the value.
> +-------------------------------------------------------------------------------
> +*/
> static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
> {
> if (STATUS(flush_inputs_to_zero)) {
> @@ -251,13 +271,14 @@ static float32 float32_squash_input_denormal(float32 a STATUS_PARAM)
> return a;
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal single-precision floating-point value represented
> -| by the denormalized significand `aSig'. The normalized exponent and
> -| significand are stored at the locations pointed to by `zExpPtr' and
> -| `zSigPtr', respectively.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal single-precision floating-point value represented
> +by the denormalized significand `aSig'. The normalized exponent and
> +significand are stored at the locations pointed to by `zExpPtr' and
> +`zSigPtr', respectively.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloat32Subnormal(uint32_t aSig, int_fast16_t *zExpPtr, uint32_t *zSigPtr)
> {
> @@ -269,16 +290,18 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> -| single-precision floating-point value, returning the result. After being
> -| shifted into the proper positions, the three fields are simply added
> -| together to form the result. This means that any integer portion of `zSig'
> -| will be added into the exponent. Since a properly normalized significand
> -| will have an integer portion equal to 1, the `zExp' input should be 1 less
> -| than the desired result exponent whenever `zSig' is a complete, normalized
> -| significand.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> +single-precision floating-point value, returning the result. After being
> +shifted into the proper positions, the three fields are simply added
> +together to form the result. This means that any integer portion of `zSig'
> +will be added into the exponent. Since a properly normalized significand
> +will have an integer portion equal to 1, the `zExp' input should be 1 less
> +than the desired result exponent whenever `zSig' is a complete, normalized
> +significand.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
> {
> @@ -288,27 +311,29 @@ INLINE float32 packFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper single-precision floating-
> -| point value corresponding to the abstract input. Ordinarily, the abstract
> -| value is simply rounded and packed into the single-precision format, with
> -| the inexact exception raised if the abstract input cannot be represented
> -| exactly. However, if the abstract value is too large, the overflow and
> -| inexact exceptions are raised and an infinity or maximal finite value is
> -| returned. If the abstract value is too small, the input value is rounded to
> -| a subnormal number, and the underflow and inexact exceptions are raised if
> -| the abstract input cannot be represented exactly as a subnormal single-
> -| precision floating-point number.
> -| The input significand `zSig' has its binary point between bits 30
> -| and 29, which is 7 bits to the left of the usual location. This shifted
> -| significand must be normalized or smaller. If `zSig' is not normalized,
> -| `zExp' must be 0; in that case, the result returned is a subnormal number,
> -| and it must not require rounding. In the usual case that `zSig' is
> -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> -| The handling of underflow and overflow follows the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper single-precision floating-
> +point value corresponding to the abstract input. Ordinarily, the abstract
> +value is simply rounded and packed into the single-precision format, with
> +the inexact exception raised if the abstract input cannot be represented
> +exactly. However, if the abstract value is too large, the overflow and
> +inexact exceptions are raised and an infinity or maximal finite value is
> +returned. If the abstract value is too small, the input value is rounded to
> +a subnormal number, and the underflow and inexact exceptions are raised if
> +the abstract input cannot be represented exactly as a subnormal single-
> +precision floating-point number.
> + The input significand `zSig' has its binary point between bits 30
> +and 29, which is 7 bits to the left of the usual location. This shifted
> +significand must be normalized or smaller. If `zSig' is not normalized,
> +`zExp' must be 0; in that case, the result returned is a subnormal number,
> +and it must not require rounding. In the usual case that `zSig' is
> +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> +The handling of underflow and overflow follows the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
> {
> @@ -366,15 +391,16 @@ static float32 roundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper single-precision floating-
> -| point value corresponding to the abstract input. This routine is just like
> -| `roundAndPackFloat32' except that `zSig' does not have to be normalized.
> -| Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> -| floating-point exponent.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper single-precision floating-
> +point value corresponding to the abstract input. This routine is just like
> +`roundAndPackFloat32' except that `zSig' does not have to be normalized.
> +Bit 31 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> +floating-point exponent.
> +-------------------------------------------------------------------------------
> +*/
> static float32
> normalizeRoundAndPackFloat32(flag zSign, int_fast16_t zExp, uint32_t zSig STATUS_PARAM)
> {
> @@ -385,9 +411,11 @@ static float32
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the double-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the double-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloat64Frac( float64 a )
> {
> @@ -396,9 +424,11 @@ INLINE uint64_t extractFloat64Frac( float64 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the double-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the double-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE int_fast16_t extractFloat64Exp(float64 a)
> {
> @@ -407,10 +437,11 @@ INLINE int_fast16_t extractFloat64Exp(float64 a)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the double-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the double-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloat64Sign( float64 a )
> {
>
> @@ -418,10 +449,12 @@ INLINE flag extractFloat64Sign( float64 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| If `a' is denormal and we are in flush-to-zero mode then set the
> -| input-denormal exception and return zero. Otherwise just return the value.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +If `a' is denormal and we are in flush-to-zero mode then set the
> +input-denormal exception and return zero. Otherwise just return the value.
> +-------------------------------------------------------------------------------
> +*/
> static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
> {
> if (STATUS(flush_inputs_to_zero)) {
> @@ -433,13 +466,14 @@ static float64 float64_squash_input_denormal(float64 a STATUS_PARAM)
> return a;
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal double-precision floating-point value represented
> -| by the denormalized significand `aSig'. The normalized exponent and
> -| significand are stored at the locations pointed to by `zExpPtr' and
> -| `zSigPtr', respectively.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal double-precision floating-point value represented
> +by the denormalized significand `aSig'. The normalized exponent and
> +significand are stored at the locations pointed to by `zExpPtr' and
> +`zSigPtr', respectively.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloat64Subnormal(uint64_t aSig, int_fast16_t *zExpPtr, uint64_t *zSigPtr)
> {
> @@ -451,16 +485,18 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> -| double-precision floating-point value, returning the result. After being
> -| shifted into the proper positions, the three fields are simply added
> -| together to form the result. This means that any integer portion of `zSig'
> -| will be added into the exponent. Since a properly normalized significand
> -| will have an integer portion equal to 1, the `zExp' input should be 1 less
> -| than the desired result exponent whenever `zSig' is a complete, normalized
> -| significand.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> +double-precision floating-point value, returning the result. After being
> +shifted into the proper positions, the three fields are simply added
> +together to form the result. This means that any integer portion of `zSig'
> +will be added into the exponent. Since a properly normalized significand
> +will have an integer portion equal to 1, the `zExp' input should be 1 less
> +than the desired result exponent whenever `zSig' is a complete, normalized
> +significand.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
> {
> @@ -470,27 +506,29 @@ INLINE float64 packFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper double-precision floating-
> -| point value corresponding to the abstract input. Ordinarily, the abstract
> -| value is simply rounded and packed into the double-precision format, with
> -| the inexact exception raised if the abstract input cannot be represented
> -| exactly. However, if the abstract value is too large, the overflow and
> -| inexact exceptions are raised and an infinity or maximal finite value is
> -| returned. If the abstract value is too small, the input value is rounded
> -| to a subnormal number, and the underflow and inexact exceptions are raised
> -| if the abstract input cannot be represented exactly as a subnormal double-
> -| precision floating-point number.
> -| The input significand `zSig' has its binary point between bits 62
> -| and 61, which is 10 bits to the left of the usual location. This shifted
> -| significand must be normalized or smaller. If `zSig' is not normalized,
> -| `zExp' must be 0; in that case, the result returned is a subnormal number,
> -| and it must not require rounding. In the usual case that `zSig' is
> -| normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> -| The handling of underflow and overflow follows the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper double-precision floating-
> +point value corresponding to the abstract input. Ordinarily, the abstract
> +value is simply rounded and packed into the double-precision format, with
> +the inexact exception raised if the abstract input cannot be represented
> +exactly. However, if the abstract value is too large, the overflow and
> +inexact exceptions are raised and an infinity or maximal finite value is
> +returned. If the abstract value is too small, the input value is rounded
> +to a subnormal number, and the underflow and inexact exceptions are raised
> +if the abstract input cannot be represented exactly as a subnormal double-
> +precision floating-point number.
> + The input significand `zSig' has its binary point between bits 62
> +and 61, which is 10 bits to the left of the usual location. This shifted
> +significand must be normalized or smaller. If `zSig' is not normalized,
> +`zExp' must be 0; in that case, the result returned is a subnormal number,
> +and it must not require rounding. In the usual case that `zSig' is
> +normalized, `zExp' must be 1 less than the ``true'' floating-point exponent.
> +The handling of underflow and overflow follows the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
> {
> @@ -548,15 +586,16 @@ static float64 roundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand `zSig', and returns the proper double-precision floating-
> -| point value corresponding to the abstract input. This routine is just like
> -| `roundAndPackFloat64' except that `zSig' does not have to be normalized.
> -| Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> -| floating-point exponent.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand `zSig', and returns the proper double-precision floating-
> +point value corresponding to the abstract input. This routine is just like
> +`roundAndPackFloat64' except that `zSig' does not have to be normalized.
> +Bit 63 of `zSig' must be zero, and `zExp' must be 1 less than the ``true''
> +floating-point exponent.
> +-------------------------------------------------------------------------------
> +*/
> static float64
> normalizeRoundAndPackFloat64(flag zSign, int_fast16_t zExp, uint64_t zSig STATUS_PARAM)
> {
> @@ -567,10 +606,12 @@ static float64
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the fraction bits of the extended double-precision floating-point
> -| value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the fraction bits of the extended double-precision floating-point
> +value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloatx80Frac( floatx80 a )
> {
> @@ -579,11 +620,12 @@ INLINE uint64_t extractFloatx80Frac( floatx80 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the extended double-precision floating-point
> -| value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the extended double-precision floating-point
> +value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE int32 extractFloatx80Exp( floatx80 a )
> {
>
> @@ -591,11 +633,12 @@ INLINE int32 extractFloatx80Exp( floatx80 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the extended double-precision floating-point value
> -| `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the extended double-precision floating-point value
> +`a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloatx80Sign( floatx80 a )
> {
>
> @@ -603,13 +646,14 @@ INLINE flag extractFloatx80Sign( floatx80 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal extended double-precision floating-point value
> -| represented by the denormalized significand `aSig'. The normalized exponent
> -| and significand are stored at the locations pointed to by `zExpPtr' and
> -| `zSigPtr', respectively.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal extended double-precision floating-point value
> +represented by the denormalized significand `aSig'. The normalized exponent
> +and significand are stored at the locations pointed to by `zExpPtr' and
> +`zSigPtr', respectively.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloatx80Subnormal( uint64_t aSig, int32 *zExpPtr, uint64_t *zSigPtr )
> {
> @@ -621,10 +665,12 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
> -| extended double-precision floating-point value, returning the result.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into an
> +extended double-precision floating-point value, returning the result.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
> {
> @@ -636,30 +682,31 @@ INLINE floatx80 packFloatx80( flag zSign, int32 zExp, uint64_t zSig )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and extended significand formed by the concatenation of `zSig0' and `zSig1',
> -| and returns the proper extended double-precision floating-point value
> -| corresponding to the abstract input. Ordinarily, the abstract value is
> -| rounded and packed into the extended double-precision format, with the
> -| inexact exception raised if the abstract input cannot be represented
> -| exactly. However, if the abstract value is too large, the overflow and
> -| inexact exceptions are raised and an infinity or maximal finite value is
> -| returned. If the abstract value is too small, the input value is rounded to
> -| a subnormal number, and the underflow and inexact exceptions are raised if
> -| the abstract input cannot be represented exactly as a subnormal extended
> -| double-precision floating-point number.
> -| If `roundingPrecision' is 32 or 64, the result is rounded to the same
> -| number of bits as single or double precision, respectively. Otherwise, the
> -| result is rounded to the full precision of the extended double-precision
> -| format.
> -| The input significand must be normalized or smaller. If the input
> -| significand is not normalized, `zExp' must be 0; in that case, the result
> -| returned is a subnormal number, and it must not require rounding. The
> -| handling of underflow and overflow follows the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and extended significand formed by the concatenation of `zSig0' and `zSig1',
> +and returns the proper extended double-precision floating-point value
> +corresponding to the abstract input. Ordinarily, the abstract value is
> +rounded and packed into the extended double-precision format, with the
> +inexact exception raised if the abstract input cannot be represented
> +exactly. However, if the abstract value is too large, the overflow and
> +inexact exceptions are raised and an infinity or maximal finite value is
> +returned. If the abstract value is too small, the input value is rounded to
> +a subnormal number, and the underflow and inexact exceptions are raised if
> +the abstract input cannot be represented exactly as a subnormal extended
> +double-precision floating-point number.
> + If `roundingPrecision' is 32 or 64, the result is rounded to the same
> +number of bits as single or double precision, respectively. Otherwise, the
> +result is rounded to the full precision of the extended double-precision
> +format.
> + The input significand must be normalized or smaller. If the input
> +significand is not normalized, `zExp' must be 0; in that case, the result
> +returned is a subnormal number, and it must not require rounding. The
> +handling of underflow and overflow follows the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80
> roundAndPackFloatx80(
> int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
> @@ -823,15 +870,16 @@ static floatx80
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent
> -| `zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
> -| and returns the proper extended double-precision floating-point value
> -| corresponding to the abstract input. This routine is just like
> -| `roundAndPackFloatx80' except that the input significand does not have to be
> -| normalized.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent
> +`zExp', and significand formed by the concatenation of `zSig0' and `zSig1',
> +and returns the proper extended double-precision floating-point value
> +corresponding to the abstract input. This routine is just like
> +`roundAndPackFloatx80' except that the input significand does not have to be
> +normalized.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80
> normalizeRoundAndPackFloatx80(
> int8 roundingPrecision, flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1
> @@ -852,10 +900,12 @@ static floatx80
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the least-significant 64 fraction bits of the quadruple-precision
> -| floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the least-significant 64 fraction bits of the quadruple-precision
> +floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloat128Frac1( float128 a )
> {
> @@ -864,10 +914,12 @@ INLINE uint64_t extractFloat128Frac1( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the most-significant 48 fraction bits of the quadruple-precision
> -| floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the most-significant 48 fraction bits of the quadruple-precision
> +floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
>
> INLINE uint64_t extractFloat128Frac0( float128 a )
> {
> @@ -876,11 +928,12 @@ INLINE uint64_t extractFloat128Frac0( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the exponent bits of the quadruple-precision floating-point value
> -| `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the exponent bits of the quadruple-precision floating-point value
> +`a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE int32 extractFloat128Exp( float128 a )
> {
>
> @@ -888,10 +941,11 @@ INLINE int32 extractFloat128Exp( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the sign bit of the quadruple-precision floating-point value `a'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the sign bit of the quadruple-precision floating-point value `a'.
> +-------------------------------------------------------------------------------
> +*/
> INLINE flag extractFloat128Sign( float128 a )
> {
>
> @@ -899,16 +953,17 @@ INLINE flag extractFloat128Sign( float128 a )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Normalizes the subnormal quadruple-precision floating-point value
> -| represented by the denormalized significand formed by the concatenation of
> -| `aSig0' and `aSig1'. The normalized exponent is stored at the location
> -| pointed to by `zExpPtr'. The most significant 49 bits of the normalized
> -| significand are stored at the location pointed to by `zSig0Ptr', and the
> -| least significant 64 bits of the normalized significand are stored at the
> -| location pointed to by `zSig1Ptr'.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Normalizes the subnormal quadruple-precision floating-point value
> +represented by the denormalized significand formed by the concatenation of
> +`aSig0' and `aSig1'. The normalized exponent is stored at the location
> +pointed to by `zExpPtr'. The most significant 49 bits of the normalized
> +significand are stored at the location pointed to by `zSig0Ptr', and the
> +least significant 64 bits of the normalized significand are stored at the
> +location pointed to by `zSig1Ptr'.
> +-------------------------------------------------------------------------------
> +*/
> static void
> normalizeFloat128Subnormal(
> uint64_t aSig0,
> @@ -940,19 +995,20 @@ static void
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', the exponent `zExp', and the significand formed
> -| by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
> -| floating-point value, returning the result. After being shifted into the
> -| proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
> -| added together to form the most significant 32 bits of the result. This
> -| means that any integer portion of `zSig0' will be added into the exponent.
> -| Since a properly normalized significand will have an integer portion equal
> -| to 1, the `zExp' input should be 1 less than the desired result exponent
> -| whenever `zSig0' and `zSig1' concatenated form a complete, normalized
> -| significand.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', the exponent `zExp', and the significand formed
> +by the concatenation of `zSig0' and `zSig1' into a quadruple-precision
> +floating-point value, returning the result. After being shifted into the
> +proper positions, the three fields `zSign', `zExp', and `zSig0' are simply
> +added together to form the most significant 32 bits of the result. This
> +means that any integer portion of `zSig0' will be added into the exponent.
> +Since a properly normalized significand will have an integer portion equal
> +to 1, the `zExp' input should be 1 less than the desired result exponent
> +whenever `zSig0' and `zSig1' concatenated form a complete, normalized
> +significand.
> +-------------------------------------------------------------------------------
> +*/
> INLINE float128
> packFloat128( flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 )
> {
> @@ -964,27 +1020,28 @@ INLINE float128
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and extended significand formed by the concatenation of `zSig0', `zSig1',
> -| and `zSig2', and returns the proper quadruple-precision floating-point value
> -| corresponding to the abstract input. Ordinarily, the abstract value is
> -| simply rounded and packed into the quadruple-precision format, with the
> -| inexact exception raised if the abstract input cannot be represented
> -| exactly. However, if the abstract value is too large, the overflow and
> -| inexact exceptions are raised and an infinity or maximal finite value is
> -| returned. If the abstract value is too small, the input value is rounded to
> -| a subnormal number, and the underflow and inexact exceptions are raised if
> -| the abstract input cannot be represented exactly as a subnormal quadruple-
> -| precision floating-point number.
> -| The input significand must be normalized or smaller. If the input
> -| significand is not normalized, `zExp' must be 0; in that case, the result
> -| returned is a subnormal number, and it must not require rounding. In the
> -| usual case that the input significand is normalized, `zExp' must be 1 less
> -| than the ``true'' floating-point exponent. The handling of underflow and
> -| overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and extended significand formed by the concatenation of `zSig0', `zSig1',
> +and `zSig2', and returns the proper quadruple-precision floating-point value
> +corresponding to the abstract input. Ordinarily, the abstract value is
> +simply rounded and packed into the quadruple-precision format, with the
> +inexact exception raised if the abstract input cannot be represented
> +exactly. However, if the abstract value is too large, the overflow and
> +inexact exceptions are raised and an infinity or maximal finite value is
> +returned. If the abstract value is too small, the input value is rounded to
> +a subnormal number, and the underflow and inexact exceptions are raised if
> +the abstract input cannot be represented exactly as a subnormal quadruple-
> +precision floating-point number.
> + The input significand must be normalized or smaller. If the input
> +significand is not normalized, `zExp' must be 0; in that case, the result
> +returned is a subnormal number, and it must not require rounding. In the
> +usual case that the input significand is normalized, `zExp' must be 1 less
> +than the ``true'' floating-point exponent. The handling of underflow and
> +overflow follows the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static float128
> roundAndPackFloat128(
> flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1, uint64_t zSig2 STATUS_PARAM)
> @@ -1079,16 +1136,17 @@ static float128
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> -| and significand formed by the concatenation of `zSig0' and `zSig1', and
> -| returns the proper quadruple-precision floating-point value corresponding
> -| to the abstract input. This routine is just like `roundAndPackFloat128'
> -| except that the input significand has fewer bits and does not have to be
> -| normalized. In all cases, `zExp' must be 1 less than the ``true'' floating-
> -| point exponent.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Takes an abstract floating-point value having sign `zSign', exponent `zExp',
> +and significand formed by the concatenation of `zSig0' and `zSig1', and
> +returns the proper quadruple-precision floating-point value corresponding
> +to the abstract input. This routine is just like `roundAndPackFloat128'
> +except that the input significand has fewer bits and does not have to be
> +normalized. In all cases, `zExp' must be 1 less than the ``true'' floating-
> +point exponent.
> +-------------------------------------------------------------------------------
> +*/
> static float128
> normalizeRoundAndPackFloat128(
> flag zSign, int32 zExp, uint64_t zSig0, uint64_t zSig1 STATUS_PARAM)
> @@ -1115,13 +1173,14 @@ static float128
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 32-bit two's complement integer `a'
> -| to the single-precision floating-point format. The conversion is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> -float32 int32_to_float32( int32 a STATUS_PARAM )
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 32-bit two's complement integer `a'
> +to the single-precision floating-point format. The conversion is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> +float32 int32_to_float32( int32 a STATUS_PARAM)
> {
> flag zSign;
>
> @@ -1132,13 +1191,14 @@ float32 int32_to_float32( int32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 32-bit two's complement integer `a'
> -| to the double-precision floating-point format. The conversion is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> -float64 int32_to_float64( int32 a STATUS_PARAM )
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 32-bit two's complement integer `a'
> +to the double-precision floating-point format. The conversion is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> +float64 int32_to_float64( int32 a STATUS_PARAM)
> {
> flag zSign;
> uint32 absA;
> @@ -1154,13 +1214,14 @@ float64 int32_to_float64( int32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 32-bit two's complement integer `a'
> -| to the extended double-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 32-bit two's complement integer `a'
> +to the extended double-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 int32_to_floatx80( int32 a STATUS_PARAM )
> {
> flag zSign;
> @@ -1177,12 +1238,13 @@ floatx80 int32_to_floatx80( int32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 32-bit two's complement integer `a' to
> -| the quadruple-precision floating-point format. The conversion is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 32-bit two's complement integer `a' to
> +the quadruple-precision floating-point format. The conversion is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 int32_to_float128( int32 a STATUS_PARAM )
> {
> flag zSign;
> @@ -1199,12 +1261,13 @@ float128 int32_to_float128( int32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 64-bit two's complement integer `a'
> -| to the single-precision floating-point format. The conversion is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 64-bit two's complement integer `a'
> +to the single-precision floating-point format. The conversion is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 int64_to_float32( int64 a STATUS_PARAM )
> {
> flag zSign;
> @@ -1252,12 +1315,13 @@ float32 uint64_to_float32( uint64 a STATUS_PARAM )
> }
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 64-bit two's complement integer `a'
> -| to the double-precision floating-point format. The conversion is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 64-bit two's complement integer `a'
> +to the double-precision floating-point format. The conversion is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 int64_to_float64( int64 a STATUS_PARAM )
> {
> flag zSign;
> @@ -1285,13 +1349,14 @@ float64 uint64_to_float64(uint64 a STATUS_PARAM)
> return normalizeRoundAndPackFloat64(0, exp, a STATUS_VAR);
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 64-bit two's complement integer `a'
> -| to the extended double-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 64-bit two's complement integer `a'
> +to the extended double-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 int64_to_floatx80( int64 a STATUS_PARAM )
> {
> flag zSign;
> @@ -1306,12 +1371,13 @@ floatx80 int64_to_floatx80( int64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the 64-bit two's complement integer `a' to
> -| the quadruple-precision floating-point format. The conversion is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the 64-bit two's complement integer `a' to
> +the quadruple-precision floating-point format. The conversion is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 int64_to_float128( int64 a STATUS_PARAM )
> {
> flag zSign;
> @@ -1347,16 +1413,17 @@ float128 uint64_to_float128(uint64 a STATUS_PARAM)
> return normalizeRoundAndPackFloat128(0, 0x406E, a, 0 STATUS_VAR);
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 32-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the 32-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 float32_to_int32( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -1378,16 +1445,17 @@ int32 float32_to_int32( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 32-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the 32-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -1421,15 +1489,17 @@ int32 float32_to_int32_round_to_zero( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 16-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the 16-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
>
> int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM)
> {
> @@ -1470,16 +1540,17 @@ int_fast16_t float32_to_int16_round_to_zero(float32 a STATUS_PARAM)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 64-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the 64-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 float32_to_int64( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -1507,16 +1578,17 @@ int64 float32_to_int64( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the 64-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero. If
> -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the
> -| conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the 64-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero. If
> +`a' is a NaN, the largest positive integer is returned. Otherwise, if the
> +conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -1554,13 +1626,14 @@ int64 float32_to_int64_round_to_zero( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the double-precision floating-point format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the double-precision floating-point format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float32_to_float64( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -1584,13 +1657,14 @@ float64 float32_to_float64( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the extended double-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the extended double-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 float32_to_floatx80( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -1614,13 +1688,14 @@ floatx80 float32_to_floatx80( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the single-precision floating-point value
> -| `a' to the double-precision floating-point format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the single-precision floating-point value
> +`a' to the double-precision floating-point format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float32_to_float128( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -1644,14 +1719,15 @@ float128 float32_to_float128( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Rounds the single-precision floating-point value `a' to an integer, and
> -| returns the result as a single-precision floating-point value. The
> -| operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> -float32 float32_round_to_int( float32 a STATUS_PARAM)
> +/*
> +-------------------------------------------------------------------------------
> +Rounds the single-precision floating-point value `a' to an integer, and
> +returns the result as a single-precision floating-point value. The
> +operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> +float32 float32_round_to_int( float32 a STATUS_PARAM )
> {
> flag aSign;
> int_fast16_t aExp;
> @@ -1704,15 +1780,16 @@ float32 float32_round_to_int( float32 a STATUS_PARAM)
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the absolute values of the single-precision
> -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> -| before being returned. `zSign' is ignored if the result is a NaN.
> -| The addition is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> -static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_PARAM)
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the absolute values of the single-precision
> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> +before being returned. `zSign' is ignored if the result is a NaN.
> +The addition is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +------------------------------------------------------------------------= ------- > +*/ > +static float32 addFloat32Sigs( float32 a, float32 b, flag zSign STATUS_P= ARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1783,15 +1860,16 @@ static float32 addFloat32Sigs( float32 a, float32= b, flag zSign STATUS_PARAM) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of subtracting the absolute values of the single- > -| precision floating-point values `a' and `b'. If `zSign' is 1, the > -| difference is negated before being returned. `zSign' is ignored if the > -| result is a NaN. The subtraction is performed according to the IEC/IE= EE > -| Standard for Binary Floating-Point Arithmetic. > -*-----------------------------------------------------------------------= -----*/ > - > -static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_P= ARAM) > +/* > +------------------------------------------------------------------------= ------- > +Returns the result of subtracting the absolute values of the single- > +precision floating-point values `a' and `b'. If `zSign' is 1, the > +difference is negated before being returned. `zSign' is ignored if the > +result is a NaN. The subtraction is performed according to the IEC/IEEE > +Standard for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------= ------- > +*/ > +static float32 subFloat32Sigs( float32 a, float32 b, flag zSign STATUS_P= ARAM ) > { > int_fast16_t aExp, bExp, zExp; > uint32_t aSig, bSig, zSig; > @@ -1858,12 +1936,13 @@ static float32 subFloat32Sigs( float32 a, float32= b, flag zSign STATUS_PARAM) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of adding the single-precision floating-point value= s `a' > -| and `b'. 
The operation is performed according to the IEC/IEEE Standar= d for > -| Binary Floating-Point Arithmetic. > -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Returns the result of adding the single-precision floating-point values = `a' > +and `b'. The operation is performed according to the IEC/IEEE Standard = for > +Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------= ------- > +*/ > float32 float32_add( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1881,12 +1960,13 @@ float32 float32_add( float32 a, float32 b STATUS_= PARAM ) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of subtracting the single-precision floating-point = values > -| `a' and `b'. The operation is performed according to the IEC/IEEE Sta= ndard > -| for Binary Floating-Point Arithmetic. > -*-----------------------------------------------------------------------= -----*/ > - > +/* > +------------------------------------------------------------------------= ------- > +Returns the result of subtracting the single-precision floating-point va= lues > +`a' and `b'. The operation is performed according to the IEC/IEEE Stand= ard > +for Binary Floating-Point Arithmetic. > +------------------------------------------------------------------------= ------- > +*/ > float32 float32_sub( float32 a, float32 b STATUS_PARAM ) > { > flag aSign, bSign; > @@ -1904,12 +1984,13 @@ float32 float32_sub( float32 a, float32 b STATUS_= PARAM ) >=20=20 > } >=20=20 > -/*----------------------------------------------------------------------= ------ > -| Returns the result of multiplying the single-precision floating-point = values > -| `a' and `b'. 
The operation is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the single-precision floating-point values
> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_mul( float32 a, float32 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -1967,12 +2048,13 @@ float32 float32_mul( float32 a, float32 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of dividing the single-precision floating-point value `a'
> -| by the corresponding value `b'. The operation is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of dividing the single-precision floating-point value `a'
> +by the corresponding value `b'. The operation is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_div( float32 a, float32 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -2031,12 +2113,13 @@ float32 float32_div( float32 a, float32 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the remainder of the single-precision floating-point value `a'
> -| with respect to the corresponding value `b'.
The operation is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the remainder of the single-precision floating-point value `a'
> +with respect to the corresponding value `b'. The operation is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_rem( float32 a, float32 b STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -2132,16 +2215,18 @@ float32 float32_rem( float32 a, float32 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the single-precision floating-point values
> -| `a' and `b' then adding 'c', with no intermediate rounding step after the
> -| multiplication. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic 754-2008.
> -| The flags argument allows the caller to select negation of the
> -| addend, the intermediate product, or the final result. (The difference
> -| between this and having the caller do a separate negation is that negating
> -| externally will flip the sign bit on NaNs.)
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the single-precision floating-point values
> +`a' and `b' then adding 'c', with no intermediate rounding step after the
> +multiplication. The operation is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic 754-2008.
> +The flags argument allows the caller to select negation of the
> +addend, the intermediate product, or the final result. (The difference
> +between this and having the caller do a separate negation is that negating
> +externally will flip the sign bit on NaNs.)
> +-------------------------------------------------------------------------------
> +*/
>
> float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
> {
> @@ -2339,12 +2424,13 @@ float32 float32_muladd(float32 a, float32 b, float32 c, int flags STATUS_PARAM)
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Returns the square root of the single-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the square root of the single-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_sqrt( float32 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2394,23 +2480,25 @@ float32 float32_sqrt( float32 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the binary exponential of the single-precision floating-point value
> -| `a'. The operation is performed according to the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -|
> -| Uses the following identities:
> -|
> -| 1. -------------------------------------------------------------------------
> -|      x    x*ln(2)
> -|     2  = e
> -|
> -| 2.
-------------------------------------------------------------------------
> -|                      2     3     4     5           n
> -|      x        x     x     x     x     x           x
> -|     e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
> -|               1!    2!    3!    4!    5!          n!
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the binary exponential of the single-precision floating-point value
> +`a'. The operation is performed according to the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +
> +Uses the following identities:
> +
> +1. -------------------------------------------------------------------------
> +     x    x*ln(2)
> +    2  = e
> +
> +2. -------------------------------------------------------------------------
> +                     2     3     4     5           n
> +     x        x     x     x     x     x           x
> +    e  = 1 + --- + --- + --- + --- + --- + ... + --- + ...
> +              1!    2!    3!    4!    5!          n!
> +-------------------------------------------------------------------------------
> +*/
>
> static const float64 float32_exp2_coefficients[15] =
> {
> @@ -2474,11 +2562,13 @@ float32 float32_exp2( float32 a STATUS_PARAM )
> return float64_to_float32(r, status);
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the binary log of the single-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the binary log of the single-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_log2( float32 a STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -2522,12 +2612,14 @@ float32 float32_log2( float32 a STATUS_PARAM )
> return normalizeRoundAndPackFloat32( zSign, 0x85, zSig STATUS_VAR );
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. Otherwise, the comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. Otherwise, the comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_eq( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2546,12 +2638,14 @@ int float32_eq( float32 a, float32 b STATUS_PARAM )
> return ( av == bv ) || ( (uint32_t) ( ( av | bv )<<1 ) == 0 );
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than
> -| or equal to the corresponding value `b', and 0 otherwise. The invalid
> -| exception is raised if either operand is a NaN. The comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than
> +or equal to the corresponding value `b', and 0 otherwise. The invalid
> +exception is raised if either operand is a NaN. The comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_le( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2575,12 +2669,14 @@ int float32_le( float32 a, float32 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. The comparison is performed according
> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. The comparison is performed according
> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_lt( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2604,12 +2700,14 @@ int float32_lt( float32 a, float32 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. The invalid exception is raised if either
> -| operand is a NaN. The comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. The invalid exception is raised if either
> +operand is a NaN. The comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_unordered( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2625,12 +2723,14 @@ int float32_unordered( float32 a, float32 b STATUS_PARAM )
> return 0;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception. The comparison is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. The comparison is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_eq_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2649,12 +2749,14 @@ int float32_eq_quiet( float32 a, float32 b STATUS_PARAM )
> ( (uint32_t) ( ( float32_val(a) | float32_val(b) )<<1 ) == 0 );
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than or
> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> -| cause an exception. Otherwise, the comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than or
> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> +cause an exception. Otherwise, the comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_le_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2680,12 +2782,14 @@ int float32_le_quiet( float32 a, float32 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception. Otherwise, the comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. Otherwise, the comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_lt_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2711,12 +2815,14 @@ int float32_lt_quiet( float32 a, float32 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the single-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> -| comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the single-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> +comparison is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM )
> {
> @@ -2734,16 +2840,17 @@ int float32_unordered_quiet( float32 a, float32 b STATUS_PARAM )
> return 0;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 32-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 32-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned.
Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 float64_to_int32( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2762,16 +2869,17 @@ int32 float64_to_int32( float64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 32-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 32-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2809,15 +2917,17 @@ int32 float64_to_int32_round_to_zero( float64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 16-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 16-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
>
> int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM)
> {
> @@ -2860,16 +2970,17 @@ int_fast16_t float64_to_int16_round_to_zero(float64 a STATUS_PARAM)
> return z;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 64-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 64-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 float64_to_int64( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2903,16 +3014,17 @@ int64 float64_to_int64( float64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the 64-bit two's complement integer format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the 64-bit two's complement integer format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2956,13 +3068,14 @@ int64 float64_to_int64_round_to_zero( float64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the single-precision floating-point format.
The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the single-precision floating-point format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 float64_to_float32( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -2989,16 +3102,18 @@ float32 float64_to_float32( float64 a STATUS_PARAM )
> }
>
>
> -/*----------------------------------------------------------------------------
> -| Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> -| half-precision floating-point value, returning the result. After being
> -| shifted into the proper positions, the three fields are simply added
> -| together to form the result. This means that any integer portion of `zSig'
> -| will be added into the exponent. Since a properly normalized significand
> -| will have an integer portion equal to 1, the `zExp' input should be 1 less
> -| than the desired result exponent whenever `zSig' is a complete, normalized
> -| significand.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Packs the sign `zSign', exponent `zExp', and significand `zSig' into a
> +half-precision floating-point value, returning the result. After being
> +shifted into the proper positions, the three fields are simply added
> +together to form the result. This means that any integer portion of `zSig'
> +will be added into the exponent.
Since a properly normalized significand
> +will have an integer portion equal to 1, the `zExp' input should be 1 less
> +than the desired result exponent whenever `zSig' is a complete, normalized
> +significand.
> +-------------------------------------------------------------------------------
> +*/
> static float16 packFloat16(flag zSign, int_fast16_t zExp, uint16_t zSig)
> {
> return make_float16(
> @@ -3132,13 +3247,14 @@ float16 float32_to_float16(float32 a, flag ieee STATUS_PARAM)
> return packFloat16(aSign, aExp + 14, aSig >> 13);
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the extended double-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the extended double-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 float64_to_floatx80( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -3163,13 +3279,14 @@ floatx80 float64_to_floatx80( float64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the double-precision floating-point value
> -| `a' to the quadruple-precision floating-point format. The conversion is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the double-precision floating-point value
> +`a' to the quadruple-precision floating-point format. The conversion is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float64_to_float128( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -3194,13 +3311,14 @@ float128 float64_to_float128( float64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Rounds the double-precision floating-point value `a' to an integer, and
> -| returns the result as a double-precision floating-point value. The
> -| operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Rounds the double-precision floating-point value `a' to an integer, and
> +returns the result as a double-precision floating-point value. The
> +operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_round_to_int( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -3267,14 +3385,15 @@ float64 float64_trunc_to_int( float64 a STATUS_PARAM)
> return res;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the absolute values of the double-precision
> -| floating-point values `a' and `b'.
If `zSign' is 1, the sum is negated
> -| before being returned. `zSign' is ignored if the result is a NaN.
> -| The addition is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the absolute values of the double-precision
> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> +before being returned. `zSign' is ignored if the result is a NaN.
> +The addition is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM )
> {
> int_fast16_t aExp, bExp, zExp;
> @@ -3346,14 +3465,15 @@ static float64 addFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the absolute values of the double-
> -| precision floating-point values `a' and `b'. If `zSign' is 1, the
> -| difference is negated before being returned. `zSign' is ignored if the
> -| result is a NaN. The subtraction is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the absolute values of the double-
> +precision floating-point values `a' and `b'. If `zSign' is 1, the
> +difference is negated before being returned. `zSign' is ignored if the
> +result is a NaN.
The subtraction is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM )
> {
> int_fast16_t aExp, bExp, zExp;
> @@ -3421,12 +3541,13 @@ static float64 subFloat64Sigs( float64 a, float64 b, flag zSign STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the double-precision floating-point values `a'
> -| and `b'. The operation is performed according to the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the double-precision floating-point values `a'
> +and `b'. The operation is performed according to the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_add( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -3444,12 +3565,13 @@ float64 float64_add( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the double-precision floating-point values
> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the double-precision floating-point values
> +`a' and `b'.
The operation is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_sub( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -3467,12 +3589,13 @@ float64 float64_sub( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the double-precision floating-point values
> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the double-precision floating-point values
> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_mul( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -3528,12 +3651,13 @@ float64 float64_mul( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of dividing the double-precision floating-point value `a'
> -| by the corresponding value `b'. The operation is performed according to
> -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of dividing the double-precision floating-point value `a'
> +by the corresponding value `b'.
The operation is performed according to
> +the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_div( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -3600,12 +3724,13 @@ float64 float64_div( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the remainder of the double-precision floating-point value `a'
> -| with respect to the corresponding value `b'. The operation is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the remainder of the double-precision floating-point value `a'
> +with respect to the corresponding value `b'. The operation is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_rem( float64 a, float64 b STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -3686,16 +3811,18 @@ float64 float64_rem( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the double-precision floating-point values
> -| `a' and `b' then adding 'c', with no intermediate rounding step after the
> -| multiplication. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic 754-2008.
> -| The flags argument allows the caller to select negation of the
> -| addend, the intermediate product, or the final result.
(The difference
> -| between this and having the caller do a separate negation is that negating
> -| externally will flip the sign bit on NaNs.)
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the double-precision floating-point values
> +`a' and `b' then adding 'c', with no intermediate rounding step after the
> +multiplication. The operation is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic 754-2008.
> +The flags argument allows the caller to select negation of the
> +addend, the intermediate product, or the final result. (The difference
> +between this and having the caller do a separate negation is that negating
> +externally will flip the sign bit on NaNs.)
> +-------------------------------------------------------------------------------
> +*/
>
> float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM)
> {
> @@ -3912,12 +4039,13 @@ float64 float64_muladd(float64 a, float64 b, float64 c, int flags STATUS_PARAM)
> }
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the square root of the double-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the square root of the double-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_sqrt( float64 a STATUS_PARAM )
> {
> flag aSign;
> @@ -3964,11 +4092,13 @@ float64 float64_sqrt( float64 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the binary log of the double-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns the binary log of the double-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_log2( float64 a STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -4011,12 +4141,14 @@ float64 float64_log2( float64 a STATUS_PARAM )
> return normalizeRoundAndPackFloat64( zSign, 0x408, zSig STATUS_VAR );
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is equal to the
> -| corresponding value `b', and 0 otherwise. The invalid exception is raised
> -| if either operand is a NaN. Otherwise, the comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is equal to the
> +corresponding value `b', and 0 otherwise. The invalid exception is raised
> +if either operand is a NaN.
Otherwise, the comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_eq( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4036,12 +4168,14 @@ int float64_eq( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is less than or
> -| equal to the corresponding value `b', and 0 otherwise. The invalid
> -| exception is raised if either operand is a NaN. The comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is less than or
> +equal to the corresponding value `b', and 0 otherwise. The invalid
> +exception is raised if either operand is a NaN. The comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_le( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4065,12 +4199,14 @@ int float64_le( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. The comparison is performed according
> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. The comparison is performed according
> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_lt( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4094,12 +4230,14 @@ int float64_lt( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. The invalid exception is raised if either
> -| operand is a NaN. The comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. The invalid exception is raised if either
> +operand is a NaN. The comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_unordered( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4115,12 +4253,14 @@ int float64_unordered( float64 a, float64 b STATUS_PARAM )
> return 0;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is equal to the
> -| corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception.The comparison is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is equal to the
> +corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception.The comparison is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_eq_quiet( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4142,12 +4282,14 @@ int float64_eq_quiet( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is less than or
> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> -| cause an exception. Otherwise, the comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is less than or
> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> +cause an exception. Otherwise, the comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_le_quiet( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4173,12 +4315,14 @@ int float64_le_quiet( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception. Otherwise, the comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. Otherwise, the comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_lt_quiet( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4204,12 +4348,14 @@ int float64_lt_quiet( float64 a, float64 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the double-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> -| comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the double-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> +comparison is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM )
> {
> @@ -4227,16 +4373,17 @@ int float64_unordered_quiet( float64 a, float64 b STATUS_PARAM )
> return 0;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point value `a' to the 32-bit two's complement integer format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic---which means in particular that the conversion
> -| is rounded according to the current rounding mode. If `a' is a NaN, the
> -| largest positive integer is returned. Otherwise, if the conversion
> -| overflows, the largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point value `a' to the 32-bit two's complement integer format. The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic---which means in particular that the conversion
> +is rounded according to the current rounding mode. If `a' is a NaN, the
> +largest positive integer is returned. Otherwise, if the conversion
> +overflows, the largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 floatx80_to_int32( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4254,16 +4401,17 @@ int32 floatx80_to_int32( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point value `a' to the 32-bit two's complement integer format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic, except that the conversion is always rounded
> -| toward zero. If `a' is a NaN, the largest positive integer is returned.
> -| Otherwise, if the conversion overflows, the largest integer with the same
> -| sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point value `a' to the 32-bit two's complement integer format.
The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic, except that the conversion is always rounded
> +toward zero. If `a' is a NaN, the largest positive integer is returned.
> +Otherwise, if the conversion overflows, the largest integer with the same
> +sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4299,16 +4447,17 @@ int32 floatx80_to_int32_round_to_zero( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point value `a' to the 64-bit two's complement integer format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic---which means in particular that the conversion
> -| is rounded according to the current rounding mode. If `a' is a NaN,
> -| the largest positive integer is returned. Otherwise, if the conversion
> -| overflows, the largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point value `a' to the 64-bit two's complement integer format. The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic---which means in particular that the conversion
> +is rounded according to the current rounding mode. If `a' is a NaN,
> +the largest positive integer is returned. Otherwise, if the conversion
> +overflows, the largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 floatx80_to_int64( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4339,16 +4488,17 @@ int64 floatx80_to_int64( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point value `a' to the 64-bit two's complement integer format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic, except that the conversion is always rounded
> -| toward zero. If `a' is a NaN, the largest positive integer is returned.
> -| Otherwise, if the conversion overflows, the largest integer with the same
> -| sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point value `a' to the 64-bit two's complement integer format. The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic, except that the conversion is always rounded
> +toward zero. If `a' is a NaN, the largest positive integer is returned.
> +Otherwise, if the conversion overflows, the largest integer with the same
> +sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4383,13 +4533,14 @@ int64 floatx80_to_int64_round_to_zero( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point value `a' to the single-precision floating-point format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point value `a' to the single-precision floating-point format. The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 floatx80_to_float32( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4411,13 +4562,14 @@ float32 floatx80_to_float32( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point value `a' to the double-precision floating-point format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point value `a' to the double-precision floating-point format.
The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 floatx80_to_float64( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4439,13 +4591,14 @@ float64 floatx80_to_float64( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the extended double-precision floating-
> -| point value `a' to the quadruple-precision floating-point format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the extended double-precision floating-
> +point value `a' to the quadruple-precision floating-point format. The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 floatx80_to_float128( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4463,13 +4616,14 @@ float128 floatx80_to_float128( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Rounds the extended double-precision floating-point value `a' to an integer,
> -| and returns the result as an extended quadruple-precision floating-point
> -| value. The operation is performed according to the IEC/IEEE Standard for
> -| Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Rounds the extended double-precision floating-point value `a' to an integer,
> +and returns the result as an extended quadruple-precision floating-point
> +value. The operation is performed according to the IEC/IEEE Standard for
> +Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -4536,14 +4690,15 @@ floatx80 floatx80_round_to_int( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the absolute values of the extended double-
> -| precision floating-point values `a' and `b'. If `zSign' is 1, the sum is
> -| negated before being returned. `zSign' is ignored if the result is a NaN.
> -| The addition is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the absolute values of the extended double-
> +precision floating-point values `a' and `b'. If `zSign' is 1, the sum is
> +negated before being returned. `zSign' is ignored if the result is a NaN.
> +The addition is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM)
> {
> int32 aExp, bExp, zExp;
> @@ -4602,14 +4757,15 @@ static floatx80 addFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the absolute values of the extended
> -| double-precision floating-point values `a' and `b'. If `zSign' is 1, the
> -| difference is negated before being returned. `zSign' is ignored if the
> -| result is a NaN. The subtraction is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the absolute values of the extended
> +double-precision floating-point values `a' and `b'. If `zSign' is 1, the
> +difference is negated before being returned. `zSign' is ignored if the
> +result is a NaN. The subtraction is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM )
> {
> int32 aExp, bExp, zExp;
> @@ -4670,12 +4826,13 @@ static floatx80 subFloatx80Sigs( floatx80 a, floatx80 b, flag zSign STATUS_PARAM
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the extended double-precision floating-point
> -| values `a' and `b'. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the extended double-precision floating-point
> +values `a' and `b'. The operation is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -4691,12 +4848,13 @@ floatx80 floatx80_add( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the extended double-precision floating-
> -| point values `a' and `b'. The operation is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the extended double-precision floating-
> +point values `a' and `b'. The operation is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -4712,12 +4870,13 @@ floatx80 floatx80_sub( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the extended double-precision floating-
> -| point values `a' and `b'. The operation is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the extended double-precision floating-
> +point values `a' and `b'. The operation is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -4771,12 +4930,13 @@ floatx80 floatx80_mul( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of dividing the extended double-precision floating-point
> -| value `a' by the corresponding value `b'. The operation is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of dividing the extended double-precision floating-point
> +value `a' by the corresponding value `b'. The operation is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -4851,12 +5011,13 @@ floatx80 floatx80_div( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the remainder of the extended double-precision floating-point value
> -| `a' with respect to the corresponding value `b'. The operation is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the remainder of the extended double-precision floating-point value
> +`a' with respect to the corresponding value `b'. The operation is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -4947,12 +5108,13 @@ floatx80 floatx80_rem( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the square root of the extended double-precision floating-point
> -| value `a'. The operation is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the square root of the extended double-precision floating-point
> +value `a'. The operation is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5017,12 +5179,14 @@ floatx80 floatx80_sqrt( floatx80 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is equal
> -| to the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. Otherwise, the comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is equal
> +to the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. Otherwise, the comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM )
> {
> @@ -5044,13 +5208,15 @@ int floatx80_eq( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is
> -| less than or equal to the corresponding value `b', and 0 otherwise. The
> -| invalid exception is raised if either operand is a NaN. The comparison is
> -| performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is
> +less than or equal to the corresponding value `b', and 0 otherwise. The
> +invalid exception is raised if either operand is a NaN. The comparison is
> +performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM )
> {
> @@ -5078,12 +5244,14 @@ int floatx80_le( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is
> -| less than the corresponding value `b', and 0 otherwise. The invalid
> -| exception is raised if either operand is a NaN. The comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is
> +less than the corresponding value `b', and 0 otherwise. The invalid
> +exception is raised if either operand is a NaN. The comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM )
> {
> @@ -5111,12 +5279,14 @@ int floatx80_lt( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point values `a' and `b'
> -| cannot be compared, and 0 otherwise. The invalid exception is raised if
> -| either operand is a NaN. The comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point values `a' and `b'
> +cannot be compared, and 0 otherwise. The invalid exception is raised if
> +either operand is a NaN. The comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM )
> {
> if ( ( ( extractFloatx80Exp( a ) == 0x7FFF )
> @@ -5130,12 +5300,14 @@ int floatx80_unordered( floatx80 a, floatx80 b STATUS_PARAM )
> return 0;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is
> -| equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> -| cause an exception. The comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is
> +equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> +cause an exception. The comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM )
> {
> @@ -5160,12 +5332,14 @@ int floatx80_eq_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is less
> -| than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs
> -| do not cause an exception. Otherwise, the comparison is performed according
> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is less
> +than or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs
> +do not cause an exception. Otherwise, the comparison is performed according
> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM )
> {
> @@ -5196,12 +5370,14 @@ int floatx80_le_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point value `a' is less
> -| than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
> -| an exception. Otherwise, the comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point value `a' is less
> +than the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause
> +an exception. Otherwise, the comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM )
> {
> @@ -5232,12 +5408,14 @@ int floatx80_lt_quiet( floatx80 a, floatx80 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the extended double-precision floating-point values `a' and `b'
> -| cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception.
> -| The comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the extended double-precision floating-point values `a' and `b'
> +cannot be compared, and 0 otherwise. Quiet NaNs do not cause an exception.
> +The comparison is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM )
> {
> if ( ( ( extractFloatx80Exp( a ) == 0x7FFF )
> @@ -5254,16 +5432,17 @@ int floatx80_unordered_quiet( floatx80 a, floatx80 b STATUS_PARAM )
> return 0;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 32-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 32-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 float128_to_int32( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5283,16 +5462,17 @@ int32 float128_to_int32( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 32-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero. If
> -| `a' is a NaN, the largest positive integer is returned. Otherwise, if the
> -| conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 32-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero. If
> +`a' is a NaN, the largest positive integer is returned. Otherwise, if the
> +conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
> int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5331,16 +5511,17 @@ int32 float128_to_int32_round_to_zero( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 64-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic---which means in particular that the conversion is rounded
> -| according to the current rounding mode. If `a' is a NaN, the largest
> -| positive integer is returned. Otherwise, if the conversion overflows, the
> -| largest integer with the same sign as `a' is returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 64-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic---which means in particular that the conversion is rounded
> +according to the current rounding mode. If `a' is a NaN, the largest
> +positive integer is returned. Otherwise, if the conversion overflows, the
> +largest integer with the same sign as `a' is returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 float128_to_int64( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5374,16 +5555,17 @@ int64 float128_to_int64( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the 64-bit two's complement integer format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic, except that the conversion is always rounded toward zero.
> -| If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> -| the conversion overflows, the largest integer with the same sign as `a' is
> -| returned.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the 64-bit two's complement integer format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic, except that the conversion is always rounded toward zero.
> +If `a' is a NaN, the largest positive integer is returned. Otherwise, if
> +the conversion overflows, the largest integer with the same sign as `a' is
> +returned.
> +-------------------------------------------------------------------------------
> +*/
> int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5435,13 +5617,14 @@ int64 float128_to_int64_round_to_zero( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the single-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the single-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float32 float128_to_float32( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5470,13 +5653,14 @@ float32 float128_to_float32( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the double-precision floating-point format. The conversion
> -| is performed according to the IEC/IEEE Standard for Binary Floating-Point
> -| Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the double-precision floating-point format. The conversion
> +is performed according to the IEC/IEEE Standard for Binary Floating-Point
> +Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float64 float128_to_float64( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5503,13 +5687,14 @@ float64 float128_to_float64( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of converting the quadruple-precision floating-point
> -| value `a' to the extended double-precision floating-point format. The
> -| conversion is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of converting the quadruple-precision floating-point
> +value `a' to the extended double-precision floating-point format. The
> +conversion is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 float128_to_floatx80( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5538,13 +5723,14 @@ floatx80 float128_to_floatx80( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Rounds the quadruple-precision floating-point value `a' to an integer, and
> -| returns the result as a quadruple-precision floating-point value. The
> -| operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Rounds the quadruple-precision floating-point value `a' to an integer, and
> +returns the result as a quadruple-precision floating-point value. The
> +operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_round_to_int( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -5641,14 +5827,15 @@ float128 float128_round_to_int( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the absolute values of the quadruple-precision
> -| floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> -| before being returned. `zSign' is ignored if the result is a NaN.
> -| The addition is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the absolute values of the quadruple-precision
> +floating-point values `a' and `b'. If `zSign' is 1, the sum is negated
> +before being returned. `zSign' is ignored if the result is a NaN.
> +The addition is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM)
> {
> int32 aExp, bExp, zExp;
> @@ -5727,14 +5914,15 @@ static float128 addFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the absolute values of the quadruple-
> -| precision floating-point values `a' and `b'. If `zSign' is 1, the
> -| difference is negated before being returned. `zSign' is ignored if the
> -| result is a NaN. The subtraction is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the absolute values of the quadruple-
> +precision floating-point values `a' and `b'. If `zSign' is 1, the
> +difference is negated before being returned. `zSign' is ignored if the
> +result is a NaN. The subtraction is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM)
> {
> int32 aExp, bExp, zExp;
> @@ -5811,12 +5999,13 @@ static float128 subFloat128Sigs( float128 a, float128 b, flag zSign STATUS_PARAM
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of adding the quadruple-precision floating-point values
> -| `a' and `b'. The operation is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of adding the quadruple-precision floating-point values
> +`a' and `b'. The operation is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_add( float128 a, float128 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -5832,12 +6021,13 @@ float128 float128_add( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of subtracting the quadruple-precision floating-point
> -| values `a' and `b'. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of subtracting the quadruple-precision floating-point
> +values `a' and `b'. The operation is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_sub( float128 a, float128 b STATUS_PARAM )
> {
> flag aSign, bSign;
> @@ -5853,12 +6043,13 @@ float128 float128_sub( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of multiplying the quadruple-precision floating-point
> -| values `a' and `b'. The operation is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of multiplying the quadruple-precision floating-point
> +values `a' and `b'. The operation is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_mul( float128 a, float128 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -5917,12 +6108,13 @@ float128 float128_mul( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the result of dividing the quadruple-precision floating-point value
> -| `a' by the corresponding value `b'. The operation is performed according to
> -| the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the result of dividing the quadruple-precision floating-point value
> +`a' by the corresponding value `b'. The operation is performed according to
> +the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_div( float128 a, float128 b STATUS_PARAM )
> {
> flag aSign, bSign, zSign;
> @@ -6001,12 +6193,13 @@ float128 float128_div( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the remainder of the quadruple-precision floating-point value `a'
> -| with respect to the corresponding value `b'. The operation is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the remainder of the quadruple-precision floating-point value `a'
> +with respect to the corresponding value `b'. The operation is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_rem( float128 a, float128 b STATUS_PARAM )
> {
> flag aSign, zSign;
> @@ -6110,12 +6303,13 @@ float128 float128_rem( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns the square root of the quadruple-precision floating-point value `a'.
> -| The operation is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> -
> +/*
> +-------------------------------------------------------------------------------
> +Returns the square root of the quadruple-precision floating-point value `a'.
> +The operation is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_sqrt( float128 a STATUS_PARAM )
> {
> flag aSign;
> @@ -6179,12 +6373,14 @@ float128 float128_sqrt( float128 a STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. Otherwise, the comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. Otherwise, the comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_eq( float128 a, float128 b STATUS_PARAM )
> {
> @@ -6206,12 +6402,14 @@ int float128_eq( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| or equal to the corresponding value `b', and 0 otherwise. The invalid
> -| exception is raised if either operand is a NaN. The comparison is performed
> -| according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +or equal to the corresponding value `b', and 0 otherwise. The invalid
> +exception is raised if either operand is a NaN. The comparison is performed
> +according to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_le( float128 a, float128 b STATUS_PARAM )
> {
> @@ -6239,12 +6437,14 @@ int float128_le( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. The invalid exception is
> -| raised if either operand is a NaN. The comparison is performed according
> -| to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. The invalid exception is
> +raised if either operand is a NaN. The comparison is performed according
> +to the IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_lt( float128 a, float128 b STATUS_PARAM )
> {
> @@ -6272,12 +6472,14 @@ int float128_lt( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. The invalid exception is raised if either
> -| operand is a NaN. The comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. The invalid exception is raised if either
> +operand is a NaN. The comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_unordered( float128 a, float128 b STATUS_PARAM )
> {
> @@ -6292,12 +6494,14 @@ int float128_unordered( float128 a, float128 b STATUS_PARAM )
> return 0;
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is equal to
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception. The comparison is performed according to the IEC/IEEE Standard
> -| for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is equal to
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. The comparison is performed according to the IEC/IEEE Standard
> +for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_eq_quiet( float128 a, float128 b STATUS_PARAM )
> {
> @@ -6322,12 +6526,14 @@ int float128_eq_quiet( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> -| cause an exception. Otherwise, the comparison is performed according to the
> -| IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +or equal to the corresponding value `b', and 0 otherwise. Quiet NaNs do not
> +cause an exception. Otherwise, the comparison is performed according to the
> +IEC/IEEE Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_le_quiet( float128 a, float128 b STATUS_PARAM )
> {
> @@ -6358,12 +6564,14 @@ int float128_le_quiet( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point value `a' is less than
> -| the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> -| exception. Otherwise, the comparison is performed according to the IEC/IEEE
> -| Standard for Binary Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point value `a' is less than
> +the corresponding value `b', and 0 otherwise. Quiet NaNs do not cause an
> +exception. Otherwise, the comparison is performed according to the IEC/IEEE
> +Standard for Binary Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_lt_quiet( float128 a, float128 b STATUS_PARAM )
> {
> @@ -6394,12 +6602,14 @@ int float128_lt_quiet( float128 a, float128 b STATUS_PARAM )
>
> }
>
> -/*----------------------------------------------------------------------------
> -| Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> -| be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> -| comparison is performed according to the IEC/IEEE Standard for Binary
> -| Floating-Point Arithmetic.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Returns 1 if the quadruple-precision floating-point values `a' and `b' cannot
> +be compared, and 0 otherwise. Quiet NaNs do not cause an exception. The
> +comparison is performed according to the IEC/IEEE Standard for Binary
> +Floating-Point Arithmetic.
> +-------------------------------------------------------------------------------
> +*/
>
> int float128_unordered_quiet( float128 a, float128 b STATUS_PARAM )
> {
> diff --git a/include/fpu/softfloat.h b/include/fpu/softfloat.h
> index f3927e2..b646621 100644
> --- a/include/fpu/softfloat.h
> +++ b/include/fpu/softfloat.h
> @@ -4,10 +4,11 @@
> * Derived from SoftFloat.
> */
>
> -/*============================================================================
> +/*
> +============================================================================
>
> -This C header file is part of the SoftFloat IEC/IEEE Floating-point Arithmetic
> -Package, Release 2b.
> +This C header file is part of the SoftFloat IEC/IEEE Floating-point
> +Arithmetic Package, Release 2a.
>
> Written by John R. Hauser. This work was made possible in part by the
> International Computer Science Institute, located at Suite 600, 1947 Center
> @@ -16,24 +17,22 @@ National Science Foundation under grant MIP-9311980. The original version
> of this code was written as part of a project to build a fixed-point vector
> processor in collaboration with the University of California at Berkeley,
> overseen by Profs. Nelson Morgan and John Wawrzynek. More information
> -is available through the Web page `http://www.cs.berkeley.edu/~jhauser/
> +is available through the Web page `http://HTTP.CS.Berkeley.EDU/~jhauser/
> arithmetic/SoftFloat.html'.
>
> -THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort has
> -been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT TIMES
> -RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO PERSONS
> -AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ALL LOSSES,
> -COSTS, OR OTHER PROBLEMS THEY INCUR DUE TO THE SOFTWARE, AND WHO FURTHERMORE
> -EFFECTIVELY INDEMNIFY JOHN HAUSER AND THE INTERNATIONAL COMPUTER SCIENCE
> -INSTITUTE (possibly via similar legal warning) AGAINST ALL LOSSES, COSTS, OR
> -OTHER PROBLEMS INCURRED BY THEIR CUSTOMERS AND CLIENTS DUE TO THE SOFTWARE.
> +THIS SOFTWARE IS DISTRIBUTED AS IS, FOR FREE. Although reasonable effort
> +has been made to avoid it, THIS SOFTWARE MAY CONTAIN FAULTS THAT WILL AT
> +TIMES RESULT IN INCORRECT BEHAVIOR. USE OF THIS SOFTWARE IS RESTRICTED TO
> +PERSONS AND ORGANIZATIONS WHO CAN AND WILL TAKE FULL RESPONSIBILITY FOR ANY
> +AND ALL LOSSES, COSTS, OR OTHER PROBLEMS ARISING FROM ITS USE.
>
> Derivative works are acceptable, even for commercial purposes, so long as
> -(1) the source code for the derivative work includes prominent notice that
> -the work is derivative, and (2) the source code includes prominent notice with
> -these four paragraphs for those parts of this code that are retained.
> +(1) they include prominent notice that the work is derivative, and (2) they
> +include prominent notice akin to these four paragraphs for those parts of
> +this code that are retained.
>
> -=============================================================================*/
> +===============================================================================
> +*/
>
> #ifndef SOFTFLOAT_H
> #define SOFTFLOAT_H
> @@ -46,14 +45,16 @@ these four paragraphs for those parts of this code that are retained.
> #include "config-host.h"
> #include "qemu/osdep.h"
>
> -/*----------------------------------------------------------------------------
> -| Each of the following `typedef's defines the most convenient type that holds
> -| integers of at least as many bits as specified. For example, `uint8' should
> -| be the most convenient type that can hold unsigned integers of as many as
> -| 8 bits. The `flag' type must be able to hold either a 0 or 1. For most
> -| implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed
> -| to the same as `int'.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Each of the following `typedef's defines the most convenient type that holds
> +integers of at least as many bits as specified. For example, `uint8' should
> +be the most convenient type that can hold unsigned integers of as many as
> +8 bits. The `flag' type must be able to hold either a 0 or 1. For most
> +implementations of C, `flag', `uint8', and `int8' should all be `typedef'ed
> +to the same as `int'.
> +-------------------------------------------------------------------------------
> +*/
> typedef uint8_t flag;
> typedef uint8_t uint8;
> typedef int8_t int8;
> @@ -69,9 +70,11 @@ typedef int64_t int64;
> #define STATUS(field) status->field
> #define STATUS_VAR , status
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE floating-point ordering relations
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE floating-point ordering relations
> +-------------------------------------------------------------------------------
> +*/
> enum {
> float_relation_less = -1,
> float_relation_equal = 0,
> @@ -79,9 +82,11 @@ enum {
> float_relation_unordered = 2
> };
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE floating-point types.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE floating-point types.
> +-------------------------------------------------------------------------------
> +*/
> /* Use structures for soft-float types. This prevents accidentally mixing
> them with native int/float types. A sufficiently clever compiler and
> sane ABI should be able to see though these structs. However
> @@ -137,17 +142,21 @@ typedef struct {
> #define make_float128(high_, low_) ((float128) { .high = high_, .low = low_ })
> #define make_float128_init(high_, low_) { .high = high_, .low = low_ }
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE floating-point underflow tininess-detection mode.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE floating-point underflow tininess-detection mode.
> +-------------------------------------------------------------------------------
> +*/
> enum {
> float_tininess_after_rounding = 0,
> float_tininess_before_rounding = 1
> };
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE floating-point rounding mode.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE floating-point rounding mode.
> +-------------------------------------------------------------------------------
> +*/
> enum {
> float_round_nearest_even = 0,
> float_round_down = 1,
> @@ -155,9 +164,11 @@ enum {
> float_round_to_zero = 3
> };
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE floating-point exception flags.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE floating-point exception flags.
> +-------------------------------------------------------------------------------
> +*/
> enum {
> float_flag_invalid = 1,
> float_flag_divbyzero = 4,
> @@ -167,7 +178,6 @@ enum {
> float_flag_input_denormal = 64,
> float_flag_output_denormal = 128
> };
> -
> typedef struct float_status {
> signed char float_detect_tininess;
> signed char float_rounding_mode;
> @@ -204,27 +214,33 @@ INLINE int get_float_exception_flags(float_status *status)
> }
> void set_floatx80_rounding_precision(int val STATUS_PARAM);
>
> -/*----------------------------------------------------------------------------
> -| Routine to raise any or all of the software IEC/IEEE floating-point
> -| exception flags.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Routine to raise any or all of the software IEC/IEEE floating-point
> +exception flags.
> +-------------------------------------------------------------------------------
> +*/
> void float_raise( int8 flags STATUS_PARAM);
>
> -/*----------------------------------------------------------------------------
> -| Options to indicate which negations to perform in float*_muladd()
> -| Using these differs from negating an input or output before calling
> -| the muladd function in that this means that a NaN doesn't have its
> -| sign bit inverted before it is propagated.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Options to indicate which negations to perform in float*_muladd()
> +Using these differs from negating an input or output before calling
> +the muladd function in that this means that a NaN doesn't have its
> +sign bit inverted before it is propagated.
> +-------------------------------------------------------------------------------
> +*/
> enum {
> float_muladd_negate_c = 1,
> float_muladd_negate_product = 2,
> float_muladd_negate_result = 4,
> };
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE integer-to-floating-point conversion routines.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE integer-to-floating-point conversion routines.
> +-------------------------------------------------------------------------------
> +*/
> float32 int32_to_float32( int32 STATUS_PARAM );
> float64 int32_to_float64( int32 STATUS_PARAM );
> float32 uint32_to_float32( uint32 STATUS_PARAM );
> @@ -239,15 +255,19 @@ floatx80 int64_to_floatx80( int64 STATUS_PARAM );
> float128 int64_to_float128( int64 STATUS_PARAM );
> float128 uint64_to_float128( uint64 STATUS_PARAM );
>
> -/*----------------------------------------------------------------------------
> -| Software half-precision conversion routines.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software half-precision conversion routines.
> +*----------------------------------------------------------------------------
> +*/
> float16 float32_to_float16( float32, flag STATUS_PARAM );
> float32 float16_to_float32( float16, flag STATUS_PARAM );
>
> -/*----------------------------------------------------------------------------
> -| Software half-precision operations.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software half-precision operations.
> +-------------------------------------------------------------------------------
> +*/
> int float16_is_quiet_nan( float16 );
> int float16_is_signaling_nan( float16 );
> float16 float16_maybe_silence_nan( float16 );
> @@ -257,14 +277,18 @@ INLINE int float16_is_any_nan(float16 a)
> return ((float16_val(a) & ~0x8000) > 0x7c00);
> }
>
> -/*----------------------------------------------------------------------------
> -| The pattern for a default generated half-precision NaN.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +The pattern for a default generated half-precision NaN.
> +-------------------------------------------------------------------------------
> +*/
> extern const float16 float16_default_nan;
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE single-precision conversion routines.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE single-precision conversion routines.
> +-------------------------------------------------------------------------------
> +*/
> int_fast16_t float32_to_int16_round_to_zero(float32 STATUS_PARAM);
> uint_fast16_t float32_to_uint16_round_to_zero(float32 STATUS_PARAM);
> int32 float32_to_int32( float32 STATUS_PARAM );
> @@ -277,9 +301,11 @@ float64 float32_to_float64( float32 STATUS_PARAM );
> floatx80 float32_to_floatx80( float32 STATUS_PARAM );
> float128 float32_to_float128( float32 STATUS_PARAM );
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE single-precision operations.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE single-precision operations.
> +-------------------------------------------------------------------------------
> +*/
> float32 float32_round_to_int( float32 STATUS_PARAM );
> float32 float32_add( float32, float32 STATUS_PARAM );
> float32 float32_sub( float32, float32 STATUS_PARAM );
> @@ -361,14 +387,18 @@ INLINE float32 float32_set_sign(float32 a, int sign)
> #define float32_infinity make_float32(0x7f800000)
>
>
> -/*----------------------------------------------------------------------------
> -| The pattern for a default generated single-precision NaN.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +The pattern for a default generated single-precision NaN.
> +-------------------------------------------------------------------------------
> +*/
> extern const float32 float32_default_nan;
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE double-precision conversion routines.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE double-precision conversion routines.
> +-------------------------------------------------------------------------------
> +*/
> int_fast16_t float64_to_int16_round_to_zero(float64 STATUS_PARAM);
> uint_fast16_t float64_to_uint16_round_to_zero(float64 STATUS_PARAM);
> int32 float64_to_int32( float64 STATUS_PARAM );
> @@ -383,9 +413,11 @@ float32 float64_to_float32( float64 STATUS_PARAM );
> floatx80 float64_to_floatx80( float64 STATUS_PARAM );
> float128 float64_to_float128( float64 STATUS_PARAM );
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE double-precision operations.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE double-precision operations.
> +-------------------------------------------------------------------------------
> +*/
> float64 float64_round_to_int( float64 STATUS_PARAM );
> float64 float64_trunc_to_int( float64 STATUS_PARAM );
> float64 float64_add( float64, float64 STATUS_PARAM );
> @@ -467,14 +499,18 @@ INLINE float64 float64_set_sign(float64 a, int sign)
> #define float64_half make_float64(0x3fe0000000000000LL)
> #define float64_infinity make_float64(0x7ff0000000000000LL)
>
> -/*----------------------------------------------------------------------------
> -| The pattern for a default generated double-precision NaN.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +The pattern for a default generated double-precision NaN.
> +-------------------------------------------------------------------------------
> +*/
> extern const float64 float64_default_nan;
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE extended double-precision conversion routines.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE extended double-precision conversion routines.
> +-------------------------------------------------------------------------------
> +*/
> int32 floatx80_to_int32( floatx80 STATUS_PARAM );
> int32 floatx80_to_int32_round_to_zero( floatx80 STATUS_PARAM );
> int64 floatx80_to_int64( floatx80 STATUS_PARAM );
> @@ -483,9 +519,11 @@ float32 floatx80_to_float32( floatx80 STATUS_PARAM );
> float64 floatx80_to_float64( floatx80 STATUS_PARAM );
> float128 floatx80_to_float128( floatx80 STATUS_PARAM );
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE extended double-precision operations.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE extended double-precision operations.
> +-------------------------------------------------------------------------------
> +*/
> floatx80 floatx80_round_to_int( floatx80 STATUS_PARAM );
> floatx80 floatx80_add( floatx80, floatx80 STATUS_PARAM );
> floatx80 floatx80_sub( floatx80, floatx80 STATUS_PARAM );
> @@ -552,14 +590,18 @@ INLINE int floatx80_is_any_nan(floatx80 a)
> #define floatx80_half make_floatx80(0x3ffe, 0x8000000000000000LL)
> #define floatx80_infinity make_floatx80(0x7fff, 0x8000000000000000LL)
>
> -/*----------------------------------------------------------------------------
> -| The pattern for a default generated extended double-precision NaN.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +The pattern for a default generated extended double-precision NaN.
> +-------------------------------------------------------------------------------
> +*/
> extern const floatx80 floatx80_default_nan;
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE quadruple-precision conversion routines.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE quadruple-precision conversion routines.
> +-------------------------------------------------------------------------------
> +*/
> int32 float128_to_int32( float128 STATUS_PARAM );
> int32 float128_to_int32_round_to_zero( float128 STATUS_PARAM );
> int64 float128_to_int64( float128 STATUS_PARAM );
> @@ -568,9 +610,11 @@ float32 float128_to_float32( float128 STATUS_PARAM );
> float64 float128_to_float64( float128 STATUS_PARAM );
> floatx80 float128_to_floatx80( float128 STATUS_PARAM );
>
> -/*----------------------------------------------------------------------------
> -| Software IEC/IEEE quadruple-precision operations.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +Software IEC/IEEE quadruple-precision operations.
> +-------------------------------------------------------------------------------
> +*/
> float128 float128_round_to_int( float128 STATUS_PARAM );
> float128 float128_add( float128, float128 STATUS_PARAM );
> float128 float128_sub( float128, float128 STATUS_PARAM );
> @@ -633,9 +677,11 @@ INLINE int float128_is_any_nan(float128 a)
>
> #define float128_zero make_float128(0, 0)
>
> -/*----------------------------------------------------------------------------
> -| The pattern for a default generated quadruple-precision NaN.
> -*----------------------------------------------------------------------------*/
> +/*
> +-------------------------------------------------------------------------------
> +The pattern for a default generated quadruple-precision NaN.
> +-------------------------------------------------------------------------------
> +*/
> extern const float128 float128_default_nan;
>
> #endif /* !SOFTFLOAT_H */
> -- 
> 1.8.0