From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
    id 1HScsW-0004ZK-DG
    for qemu-devel@nongnu.org; Sat, 17 Mar 2007 13:39:40 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
    id 1HScsU-0004WX-KR
    for qemu-devel@nongnu.org; Sat, 17 Mar 2007 13:39:39 -0400
Received: from [199.232.76.173] (helo=monty-python.gnu.org)
    by lists.gnu.org with esmtp (Exim 4.43)
    id 1HScsU-0004WI-Gy
    for qemu-devel@nongnu.org; Sat, 17 Mar 2007 12:39:38 -0500
Received: from mtaout02-winn.ispmail.ntl.com ([81.103.221.48])
    by monty-python.gnu.org with esmtp (Exim 4.60)
    (envelope-from )
    id 1HScrE-00020f-IV
    for qemu-devel@nongnu.org; Sat, 17 Mar 2007 13:38:20 -0400
Received: from aamtaout04-winn.ispmail.ntl.com ([81.103.221.35])
    by mtaout02-winn.ispmail.ntl.com with ESMTP
    id <20070317173817.FNQK3103.mtaout02-winn.ispmail.ntl.com@aamtaout04-winn.ispmail.ntl.com>
    for ; Sat, 17 Mar 2007 17:38:17 +0000
Received: from phoenix2.frop.org ([82.21.100.63])
    by aamtaout04-winn.ispmail.ntl.com with ESMTP
    id <20070317173817.FWIT29112.aamtaout04-winn.ispmail.ntl.com@phoenix2.frop.org>
    for ; Sat, 17 Mar 2007 17:38:17 +0000
From: Julian Seward
Date: Sat, 17 Mar 2007 17:35:38 +0000
MIME-Version: 1.0
Content-Type: Multipart/Mixed; boundary="Boundary-00=_rbC/FXpsTD9PwUM"
Message-Id: <200703171735.39045.jseward@acm.org>
Subject: [Qemu-devel] [PATCH] Fix guest x86/amd64 helper_fprem/helper_fprem1
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: qemu-devel@nongnu.org

--Boundary-00=_rbC/FXpsTD9PwUM
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

The helpers for x86/amd64 fprem and fprem1 in target-i386/helper.c
are significantly borked and, for example, cause konqueror in RedHat8
(x86 guest) to go into an infinite loop when displaying
http://news.bbc.co.uk.

helper_fprem has the following borkage:

- various Inf/NaN/zero inputs not handled correctly

- incorrect rounding when converting negative 'dblq' to 'q'

- incorrect order of assignment to C bits (0,3,1 not 0,1,3)

helper_fprem1 has those problems and is also incorrect about the
points at which its rounding needs to differ from that of helper_fprem.

Patch below fixes all these.  It brings the fprem and fprem1 behaviour
very much closer to the hardware -- not identical, but close.
Some +0.0 results should really be -0.0 and there may still be other
differences.  Anyway konqueror no longer loops with the patch applied.

J

--Boundary-00=_rbC/FXpsTD9PwUM
Content-Type: text/x-diff; charset="us-ascii"; name="x86_fprem.diff"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="x86_fprem.diff"

--- ../Orig/qemu-0.9.0/target-i386/helper.c	2007-02-05 23:01:54.000000000 +0000
+++ target-i386/helper.c	2007-03-17 17:21:02.000000000 +0000
@@ -3097,30 +3097,51 @@
     CPU86_LDouble dblq, fpsrcop, fptemp;
     CPU86_LDoubleU fpsrcop1, fptemp1;
     int expdif;
-    int q;
+    signed long long int q;
+
+    if (isinf(ST0) || isnan(ST0) || isnan(ST1) || (ST1 == 0.0)) {
+        ST0 = 0.0 / 0.0; /* NaN */
+        env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+        return;
+    }
 
     fpsrcop = ST0;
     fptemp = ST1;
     fpsrcop1.d = fpsrcop;
     fptemp1.d = fptemp;
     expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
+
+    if (expdif < 0) {
+        /* optimisation? taken from the AMD docs */
+        env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+        /* ST0 is unchanged */
+        return;
+    }
+
     if (expdif < 53) {
         dblq = fpsrcop / fptemp;
-        dblq = (dblq < 0.0)? ceil(dblq): floor(dblq);
+        /* round dblq towards nearest integer */
+        dblq = rint(dblq);
         ST0 = fpsrcop - fptemp*dblq;
-        q = (int)dblq; /* cutting off top bits is assumed here */
+
+        /* convert dblq to q by truncating towards zero */
+        if (dblq < 0.0)
+            q = (signed long long int)(-dblq);
+        else
+            q = (signed long long int)dblq;
+
         env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
-        /* (C0,C1,C3) <-- (q2,q1,q0) */
-        env->fpus |= (q&0x4) << 6; /* (C0) <-- q2 */
-        env->fpus |= (q&0x2) << 8; /* (C1) <-- q1 */
-        env->fpus |= (q&0x1) << 14; /* (C3) <-- q0 */
+        /* (C0,C3,C1) <-- (q2,q1,q0) */
+        env->fpus |= (q&0x4) << (8-2);  /* (C0) <-- q2 */
+        env->fpus |= (q&0x2) << (14-1); /* (C3) <-- q1 */
+        env->fpus |= (q&0x1) << (9-0);  /* (C1) <-- q0 */
     } else {
         env->fpus |= 0x400;  /* C2 <-- 1 */
         fptemp = pow(2.0, expdif-50);
         fpsrcop = (ST0 / ST1) / fptemp;
-        /* fpsrcop = integer obtained by rounding to the nearest */
-        fpsrcop = (fpsrcop-floor(fpsrcop) < ceil(fpsrcop)-fpsrcop)?
-            floor(fpsrcop): ceil(fpsrcop);
+        /* fpsrcop = integer obtained by chopping */
+        fpsrcop = (fpsrcop < 0.0)?
+            -(floor(fabs(fpsrcop))): floor(fpsrcop);
         ST0 -= (ST1 * fpsrcop * fptemp);
     }
 }
@@ -3130,26 +3151,48 @@
     CPU86_LDouble dblq, fpsrcop, fptemp;
     CPU86_LDoubleU fpsrcop1, fptemp1;
     int expdif;
-    int q;
-
-    fpsrcop = ST0;
-    fptemp = ST1;
+    signed long long int q;
+
+    if (isinf(ST0) || isnan(ST0) || isnan(ST1) || (ST1 == 0.0)) {
+        ST0 = 0.0 / 0.0; /* NaN */
+        env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+        return;
+    }
+
+    fpsrcop = (CPU86_LDouble)ST0;
+    fptemp = (CPU86_LDouble)ST1;
     fpsrcop1.d = fpsrcop;
     fptemp1.d = fptemp;
     expdif = EXPD(fpsrcop1) - EXPD(fptemp1);
+
+    if (expdif < 0) {
+        /* optimisation? taken from the AMD docs */
+        env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
+        /* ST0 is unchanged */
+        return;
+    }
+
     if ( expdif < 53 ) {
-        dblq = fpsrcop / fptemp;
+        dblq = fpsrcop/*ST0*/ / fptemp/*ST1*/;
+        /* round dblq towards zero */
         dblq = (dblq < 0.0)? ceil(dblq): floor(dblq);
-        ST0 = fpsrcop - fptemp*dblq;
-        q = (int)dblq; /* cutting off top bits is assumed here */
+        ST0 = fpsrcop/*ST0*/ - fptemp*dblq;
+
+        /* convert dblq to q by truncating towards zero */
+        if (dblq < 0.0)
+            q = (signed long long int)(-dblq);
+        else
+            q = (signed long long int)dblq;
+
         env->fpus &= (~0x4700); /* (C3,C2,C1,C0) <-- 0000 */
-        /* (C0,C1,C3) <-- (q2,q1,q0) */
-        env->fpus |= (q&0x4) << 6; /* (C0) <-- q2 */
-        env->fpus |= (q&0x2) << 8; /* (C1) <-- q1 */
-        env->fpus |= (q&0x1) << 14; /* (C3) <-- q0 */
+        /* (C0,C3,C1) <-- (q2,q1,q0) */
+        env->fpus |= (q&0x4) << (8-2);  /* (C0) <-- q2 */
+        env->fpus |= (q&0x2) << (14-1); /* (C3) <-- q1 */
+        env->fpus |= (q&0x1) << (9-0);  /* (C1) <-- q0 */
     } else {
+        int N = 32 + (expdif % 32); /* as per AMD docs */
         env->fpus |= 0x400;  /* C2 <-- 1 */
-        fptemp = pow(2.0, expdif-50);
+        fptemp = pow(2.0, (double)(expdif-N));
         fpsrcop = (ST0 / ST1) / fptemp;
         /* fpsrcop = integer obtained by chopping */
         fpsrcop = (fpsrcop < 0.0)?

--Boundary-00=_rbC/FXpsTD9PwUM--
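
For anyone checking the C-bit ordering in the patch, here is a minimal standalone sketch of the (C0,C3,C1) <-- (q2,q1,q0) mapping. It assumes the usual x87 status-word layout (C0 = bit 8, C1 = bit 9, C2 = bit 10, C3 = bit 14). The function name set_quotient_bits and the bare uint16_t status word are hypothetical stand-ins for the helper code and env->fpus; this is not code from QEMU.

```c
#include <stdint.h>

/* Hypothetical helper, not part of QEMU: returns a copy of the x87 status
   word 'fpus' with C3,C2,C1,C0 cleared and the low three bits of the
   quotient magnitude 'q' stored as (C0,C3,C1) <-- (q2,q1,q0). */
static uint16_t set_quotient_bits(uint16_t fpus, unsigned long long q)
{
    fpus &= (uint16_t)~0x4700u;                /* (C3,C2,C1,C0) <-- 0000 */
    fpus |= (uint16_t)((q & 0x4) << (8 - 2));  /* C0 (bit 8)  <-- q2 */
    fpus |= (uint16_t)((q & 0x2) << (14 - 1)); /* C3 (bit 14) <-- q1 */
    fpus |= (uint16_t)((q & 0x1) << (9 - 0));  /* C1 (bit 9)  <-- q0 */
    return fpus;
}
```

With q = 5 (binary 101) this sets C0 and C1 and leaves C3 clear, whereas the old shifts (6, 8, 14) would have set C0 and C3 instead.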