From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55585)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <gauravs.2010@gmail.com>) id 1XGlqb-0001z2-Lj
	for qemu-devel@nongnu.org; Mon, 11 Aug 2014 05:24:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <gauravs.2010@gmail.com>) id 1XGlqa-0000uj-Ow
	for qemu-devel@nongnu.org; Mon, 11 Aug 2014 05:24:57 -0400
Received: from mail-vc0-x231.google.com ([2607:f8b0:400c:c03::231]:63786)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <gauravs.2010@gmail.com>) id 1XGlqa-0000ue-Bv
	for qemu-devel@nongnu.org; Mon, 11 Aug 2014 05:24:56 -0400
Received: by mail-vc0-f177.google.com with SMTP id hy4so11119523vcb.22
	for <qemu-devel@nongnu.org>; Mon, 11 Aug 2014 02:24:55 -0700 (PDT)
MIME-Version: 1.0
From: Gaurav Sharma <gauravs.2010@gmail.com>
Date: Mon, 11 Aug 2014 14:54:35 +0530
Message-ID: <CABiB5K5o1z3yPS6kd5bfsOfNtCvO4y=T0DaTsfmq1UyAZGdxcg@mail.gmail.com>
Content-Type: multipart/alternative; boundary=485b397dd5e5c4fc000500572041
Subject: [Qemu-devel] Issues in conversion to half precision number.
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: QEMU-DEVEL <qemu-devel@nongnu.org>

--485b397dd5e5c4fc000500572041
Content-Type: text/plain; charset=UTF-8

Hi,
While converting a single-precision float value to a half-precision value
for ARM, the code seems to generate incorrect values in some scenarios. The
function involved is:

"inline uint32_t perform_round16(iss_info *iss, uint32_t sign, int16_t exp,
uint32_t frac, FPRounding rounding)"

[Case 1]
1. From the ARM specs, the result of an overflow depends on
overflow_to_inf:
if N != 16 || fpcr.AHP == '0' then // Single, double or IEEE half precision
    if biased_exp >= 2^E - 1 then
      result = if overflow_to_inf then FPInfinity(sign) else FPMaxNormal(sign);
      FPProcessException(FPExc_Overflow, fpcr);
      error = 1.0; // Ensure that an Inexact exception occurs

In QEMU, we always return infinity:
>> return packFloat16(zSign, 0x1f, 0);
When overflow_to_inf is false, we need to return FPMaxNormal instead, which is:
>> return float_num16(sign, 0x1e, 0x3ff);
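
To make the expected behaviour concrete, here is a minimal sketch of the
overflow selection. It is not the actual QEMU code: pack_half() and
half_overflow_result() are hypothetical names used only for illustration,
and overflow_to_inf is assumed to have been computed already from the
rounding mode and sign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pack sign, 5-bit exponent and 10-bit fraction
 * into the IEEE half-precision bit layout. */
static uint16_t pack_half(uint32_t sign, uint32_t exp, uint32_t frac)
{
    assert(sign <= 1 && exp <= 0x1f && frac <= 0x3ff);
    return (uint16_t)((sign << 15) | (exp << 10) | frac);
}

/* Hypothetical helper: value to return once an overflow is detected.
 * overflow_to_inf is assumed to be supplied by the caller. */
static uint16_t half_overflow_result(uint32_t sign, bool overflow_to_inf)
{
    return overflow_to_inf
        ? pack_half(sign, 0x1f, 0)       /* FPInfinity(sign)  */
        : pack_half(sign, 0x1e, 0x3ff);  /* FPMaxNormal(sign) */
}
```

With this sketch, a positive overflow with overflow_to_inf false gives
0x7bff (largest normal half) rather than 0x7c00 (infinity).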

[Case 2]
1. From the ARM specs:
if round_up then
    int_mant = int_mant + 1;
    if int_mant == 2^F then // Rounded up from denormalized to normalized
        biased_exp = 1;
    if int_mant == 2^(F+1) then // Rounded up to next exponent
        biased_exp = biased_exp + 1; int_mant = int_mant DIV 2;

result = sign : biased_exp<N-F-2:0> : int_mant<F-1:0>;

[QEMU]
if (exp < -10) {
    return float_num16(sign, 0, 0);
}

The round-up increment seems to be lost in this scenario: zero is returned
even when rounding up should give the smallest denormal.
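
As an illustration of what the pseudocode implies for very small inputs,
here is a minimal sketch of its final rounding steps for half precision
(F = 10). It is not the QEMU implementation: half_finish_round() is a
hypothetical name, and round_up, biased_exp and int_mant are assumed to
have been derived earlier in the same way as in the ARM pseudocode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HALF_F 10  /* fraction bits in IEEE half precision */

/* Hypothetical helper mirroring the last steps of the ARM pseudocode:
 * apply the round-up increment, fix up the exponent, pack the result. */
static uint16_t half_finish_round(uint32_t sign, uint32_t biased_exp,
                                  uint32_t int_mant, bool round_up)
{
    /* Precondition: mantissa (with hidden bit for normals) fits F+1 bits. */
    assert(int_mant < (1u << (HALF_F + 1)));

    if (round_up) {
        int_mant += 1;
        if (int_mant == (1u << HALF_F)) {
            biased_exp = 1;               /* denormalized -> normalized */
        }
        if (int_mant == (1u << (HALF_F + 1))) {
            biased_exp += 1;              /* rounded up to next exponent */
            int_mant >>= 1;
        }
    }
    /* result = sign : biased_exp<4:0> : int_mant<9:0> */
    return (uint16_t)(((sign & 1u) << 15) |
                      ((biased_exp & 0x1fu) << 10) |
                      (int_mant & 0x3ffu));
}
```

For a value below the smallest denormal (biased_exp = 0, int_mant = 0),
round_up set gives 0x0001 here, whereas an early return of zero drops it.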

Please let me know if more data points are required.

Thanks,
Gaurav

--485b397dd5e5c4fc000500572041--