From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1HScsW-0004ZK-DG
	for qemu-devel@nongnu.org; Sat, 17 Mar 2007 13:39:40 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1HScsU-0004WX-KR
	for qemu-devel@nongnu.org; Sat, 17 Mar 2007 13:39:39 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1HScsU-0004WI-Gy
	for qemu-devel@nongnu.org; Sat, 17 Mar 2007 12:39:38 -0500
Received: from mtaout02-winn.ispmail.ntl.com ([81.103.221.48])
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <jseward@acm.org>) id 1HScrE-00020f-IV
	for qemu-devel@nongnu.org; Sat, 17 Mar 2007 13:38:20 -0400
Received: from aamtaout04-winn.ispmail.ntl.com ([81.103.221.35])
	by mtaout02-winn.ispmail.ntl.com with ESMTP id
	<20070317173817.FNQK3103.mtaout02-winn.ispmail.ntl.com@aamtaout04-winn.ispmail.ntl.com>
	for <qemu-devel@nongnu.org>; Sat, 17 Mar 2007 17:38:17 +0000
Received: from phoenix2.frop.org ([82.21.100.63])
	by aamtaout04-winn.ispmail.ntl.com with ESMTP id
	<20070317173817.FWIT29112.aamtaout04-winn.ispmail.ntl.com@phoenix2.frop.org>
	for <qemu-devel@nongnu.org>; Sat, 17 Mar 2007 17:38:17 +0000
From: Julian Seward <jseward@acm.org>
Date: Sat, 17 Mar 2007 17:35:38 +0000
MIME-Version: 1.0
Content-Type: Multipart/Mixed;
  boundary="Boundary-00=_rbC/FXpsTD9PwUM"
Message-Id: <200703171735.39045.jseward@acm.org>
Subject: [Qemu-devel] [PATCH] Fix guest x86/amd64 helper_fprem/helper_fprem1
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: qemu-devel@nongnu.org

--Boundary-00=_rbC/FXpsTD9PwUM
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline


The helpers for x86/amd64 fprem and fprem1 in target-i386/helper.c are
significantly borked and, for example, cause konqueror in RedHat8 (x86
guest) to go into an infinite loop when displaying http://news.bbc.co.uk.

helper_fprem has the following borkage:
- various Inf/NaN/zero inputs not handled correctly
- incorrect rounding when converting negative 'dblq' to 'q'
- incorrect order of assignment to C bits: (q2,q1,q0) must go to
  (C0,C3,C1), not (C0,C1,C3)
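
To illustrate the last point: in the x87 status word C0 is bit 8, C1 is
bit 9, C2 is bit 10 and C3 is bit 14, and fprem/fprem1 report the low
three quotient bits as (C0,C3,C1) <-- (q2,q1,q0). A minimal standalone
sketch of that mapping follows. The macro names and the function
set_quotient_bits are made up for illustration and are not part of the
patch or of QEMU.

```c
#include <stdint.h>

/* x87 status word condition-code bits (names are illustrative only) */
#define FPUS_C0 (1u << 8)
#define FPUS_C1 (1u << 9)
#define FPUS_C2 (1u << 10)
#define FPUS_C3 (1u << 14)

/*
 * Hypothetical helper: clear C3..C0 in the status word 'fpus', then
 * store the low three bits of the quotient magnitude 'q' as
 * (C0,C3,C1) <-- (q2,q1,q0).
 */
static uint16_t set_quotient_bits(uint16_t fpus, unsigned long long q)
{
    /* same mask as the patch: fpus &= ~0x4700 */
    fpus &= (uint16_t)~(FPUS_C3 | FPUS_C2 | FPUS_C1 | FPUS_C0);
    if (q & 0x4)
        fpus |= FPUS_C0;   /* C0 <-- q2 */
    if (q & 0x2)
        fpus |= FPUS_C3;   /* C3 <-- q1 */
    if (q & 0x1)
        fpus |= FPUS_C1;   /* C1 <-- q0 */
    return fpus;
}
```

With the old ordering, q1 landed in C1 and q0 in C3, so a guest that
reconstructs the quotient from the flags would see bits 0 and 1 swapped.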

helper_fprem1 has those problems and is also incorrect about the points
at which its rounding needs to differ from that of helper_fprem.
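
As a reminder of the intended difference: fprem forms the quotient by
truncating towards zero, while fprem1 rounds it to the nearest integer
with ties going to even (the IEEE 754 remainder). The sketch below shows
only that difference for small, finite operands. All four function names
are invented for illustration, the rounding is hand-rolled so the
example does not depend on libm, and none of the exponent-difference or
special-value handling from the patch is modelled.

```c
/* Truncate towards zero. Only valid while |x| fits in a long long. */
static double quot_trunc(double x)
{
    return (double)(long long)x;
}

/* Round to nearest integer, ties to even. Same range limit as above. */
static double quot_nearest_even(double x)
{
    double t = quot_trunc(x);                     /* integer part */
    double frac = x - t;                          /* same sign as x, |frac| < 1 */
    double away = (x < 0.0) ? t - 1.0 : t + 1.0;  /* neighbour away from zero */

    if (frac > 0.5 || frac < -0.5)
        return away;
    if (frac == 0.5 || frac == -0.5)              /* tie: pick the even one */
        return ((long long)t % 2 == 0) ? t : away;
    return t;
}

/* fprem-style partial remainder (quotient chopped) */
static double rem_trunc(double a, double b)
{
    return a - b * quot_trunc(a / b);
}

/* fprem1-style partial remainder (quotient rounded to nearest) */
static double rem_nearest(double a, double b)
{
    return a - b * quot_nearest_even(a / b);
}
```

For example, with a = 7.0 and b = 2.0 the chopped quotient is 3 and the
remainder is 1.0, whereas the rounded quotient is 4 and the remainder is
-1.0. The C library's fmod() and remainder() follow the same two
conventions.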

The patch below fixes all of these.  It brings the fprem and fprem1 behaviour
very much closer to the hardware -- not identical, but close.  Some
+0.0 results should really be -0.0 and there may still be other differences.

Anyway, konqueror no longer loops with the patch applied.

J

--Boundary-00=_rbC/FXpsTD9PwUM
Content-Type: text/x-diff;
  charset="us-ascii";
  name="x86_fprem.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
	filename="x86_fprem.diff"

--- ../Orig/qemu-0.9.0/target-i386/helper.c	2007-02-05 23:01:54.000000000 +0000
+++ target-i386/helper.c	2007-03-17 17:21:02.000000000 +0000
@@ -3097,30 +3097,51 @@
     CPU86_LDouble dblq, fpsrcop, fptemp;
     CPU86_LDoubleU fpsrcop1, fptemp1;
     int expdif;
-    int q;
+    signed long long int q;
+
+    if (isinf(ST0) || isnan(ST0) || isnan(ST1) || (ST1 == 0.0)) {
+       ST0 = 0.0 / 0.0; /* NaN */
+       env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+       return;
+    }
 
     fpsrcop = ST0;
     fptemp = ST1;
     fpsrcop1.d = fpsrcop;
     fptemp1.d = fptemp;
     expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
+
+    if (expdif < 0) {
+        /* optimisation? taken from the AMD docs */
+        env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+        /* ST0 is unchanged */
+	return;
+    }
+
     if (expdif < 53) {
         dblq = fpsrcop / fptemp;
-        dblq = (dblq < 0.0)? ceil(dblq): floor(dblq);
+	/* round dblq towards nearest integer */
+        dblq = rint(dblq);
         ST0 = fpsrcop - fptemp*dblq;
-        q = (int)dblq; /* cutting off top bits is assumed here */
+
+	/* convert dblq to q by truncating towards zero */
+	if (dblq < 0.0)
+           q = (signed long long int)(-dblq);
+        else
+           q = (signed long long int)dblq;
+
         env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
-				/* (C0,C1,C3) <-- (q2,q1,q0) */
-        env->fpus |= (q&0x4) << 6; /* (C0) <-- q2 */
-        env->fpus |= (q&0x2) << 8; /* (C1) <-- q1 */
-        env->fpus |= (q&0x1) << 14; /* (C3) <-- q0 */
+                                /* (C0,C3,C1) <-- (q2,q1,q0) */
+        env->fpus |= (q&0x4) << (8-2);  /* (C0) <-- q2 */
+        env->fpus |= (q&0x2) << (14-1); /* (C3) <-- q1 */
+        env->fpus |= (q&0x1) << (9-0);  /* (C1) <-- q0 */
     } else {
         env->fpus |= 0x400;  /* C2 <-- 1 */
         fptemp = pow(2.0, expdif-50);
         fpsrcop = (ST0 / ST1) / fptemp;
-        /* fpsrcop = integer obtained by rounding to the nearest */
-        fpsrcop = (fpsrcop-floor(fpsrcop) < ceil(fpsrcop)-fpsrcop)?
-            floor(fpsrcop): ceil(fpsrcop);
+        /* fpsrcop = integer obtained by chopping */
+        fpsrcop = (fpsrcop < 0.0)?
+            -(floor(fabs(fpsrcop))): floor(fpsrcop);
         ST0 -= (ST1 * fpsrcop * fptemp);
     }
 }
@@ -3130,26 +3151,48 @@
     CPU86_LDouble dblq, fpsrcop, fptemp;
     CPU86_LDoubleU fpsrcop1, fptemp1;
     int expdif;
-    int q;
-    
-    fpsrcop = ST0;
-    fptemp = ST1;
+    signed long long int q;
+
+    if (isinf(ST0) || isnan(ST0) || isnan(ST1) || (ST1 == 0.0)) {
+       ST0 = 0.0 / 0.0; /* NaN */
+       env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+       return;
+    }
+
+    fpsrcop = (CPU86_LDouble)ST0;
+    fptemp = (CPU86_LDouble)ST1;
     fpsrcop1.d = fpsrcop;
     fptemp1.d = fptemp;
     expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
+
+    if (expdif < 0) {
+        /* optimisation? taken from the AMD docs */
+        env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+	/* ST0 is unchanged */
+        return;
+    }
+
     if ( expdif < 53 ) {
-        dblq = fpsrcop / fptemp;
+        dblq = fpsrcop/*ST0*/ / fptemp/*ST1*/;
+	/* round dblq towards zero */
         dblq = (dblq < 0.0)? ceil(dblq): floor(dblq);
-        ST0 = fpsrcop - fptemp*dblq;
-        q = (int)dblq; /* cutting off top bits is assumed here */
+        ST0 = fpsrcop/*ST0*/ - fptemp*dblq;
+
+	/* convert dblq to q by truncating towards zero */
+	if (dblq < 0.0)
+           q = (signed long long int)(-dblq);
+        else
+           q = (signed long long int)dblq;
+
         env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
-				/* (C0,C1,C3) <-- (q2,q1,q0) */
-        env->fpus |= (q&0x4) << 6; /* (C0) <-- q2 */
-        env->fpus |= (q&0x2) << 8; /* (C1) <-- q1 */
-        env->fpus |= (q&0x1) << 14; /* (C3) <-- q0 */
+                                /* (C0,C3,C1) <-- (q2,q1,q0) */
+        env->fpus |= (q&0x4) << (8-2);  /* (C0) <-- q2 */
+        env->fpus |= (q&0x2) << (14-1); /* (C3) <-- q1 */
+        env->fpus |= (q&0x1) << (9-0);  /* (C1) <-- q0 */
     } else {
+        int N = 32 + (expdif % 32); /* as per AMD docs */
         env->fpus |= 0x400;  /* C2 <-- 1 */
-        fptemp = pow(2.0, expdif-50);
+        fptemp = pow(2.0, (double)(expdif-N));
         fpsrcop = (ST0 / ST1) / fptemp;
         /* fpsrcop = integer obtained by chopping */
         fpsrcop = (fpsrcop < 0.0)?

--Boundary-00=_rbC/FXpsTD9PwUM--