From: Torbjorn Granlund
Sender: tg@gmplib.org
To: Paolo Bonzini
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Possible ppc comparision optimisation
Date: Wed, 08 May 2013 17:44:24 +0200
Message-ID: <86haidv5lz.fsf@shell.gmplib.org>
In-Reply-To: <518A072E.9070708@redhat.com> (Paolo Bonzini's message of "Wed, 08 May 2013 10:05:02 +0200")
References: <86bo8mcsax.fsf@shell.gmplib.org> <518A072E.9070708@redhat.com>

Paolo Bonzini writes:

  I think that would be faster on 32-bit hosts, truncs are cheap.

And slower perhaps on 64-bit hosts, at least for operations where
additional explicit truncation will be needed (such as before
comparisons and after right shifts).

  > There could be a disadvantage of this compared to the old code, since
  > this has a chained algebraic dependency, while the old code's many
  > instructions might have been more independent.

  What about these alternatives:

  setcond LT, t0, arg0, arg1
  setcond EQ, t1, arg0, arg1
  trunc s0, t0
  trunc s1, t1
  shli s0, s0, 1     ; s0 = (arg0 < arg1) ? 2 : 0
  subi s1, s1, 2     ; s1 = (arg0 != arg1) ? -2 : -1
  sub s0, s0, s1     ; <  4   ==  1   > 2
  shli s0, s0, 1     ; <  8   ==  2   > 4

  =======

  setcond LT, t0, arg0, arg1
  setcond NE, t1, arg0, arg1
  trunc s0, t0
  trunc s1, t1
  add s0, s0, s1     ; <  2   ==  0   > 1
  movi s1, 1
  add s0, s0, s1     ; <  3   ==  1   > 2
  shl s1, s1, s0     ; <  8   ==  2   > 4

Surely there are many alternative forms.  Is your aim to add
micro-parallelism?

(Your sequences look a bit curious.  Did you use a super-optimiser?)

-- 
Torbjörn
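
For readers who want to check the arithmetic in the two quoted sequences, here is a minimal C sketch that models each op as plain integer arithmetic. It is not QEMU code. The function names (cr_seq1, cr_seq2, cr_self_check) are hypothetical, the comparison is assumed to be signed, and the 64-bit argument and 32-bit result widths are assumptions chosen to mirror the trunc steps. The expected values 8, 4 and 2 for less-than, greater-than and equal are taken from the comments in the sequences themselves.

```c
#include <assert.h>
#include <stdint.h>

/* Result values named in the quoted comments: < gives 8, > gives 4, == gives 2. */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2 };

/* First sequence: setcond LT, setcond EQ, trunc, shli, subi, sub, shli.
 * Assumption: signed comparison of 64-bit values, 32-bit temporaries. */
int32_t cr_seq1(int64_t arg0, int64_t arg1)
{
    int32_t s0 = (int32_t)(arg0 < arg1);   /* setcond LT + trunc */
    int32_t s1 = (int32_t)(arg0 == arg1);  /* setcond EQ + trunc */

    s0 = s0 << 1;   /* shli: (arg0 < arg1) ? 2 : 0      */
    s1 = s1 - 2;    /* subi: (arg0 != arg1) ? -2 : -1   */
    s0 = s0 - s1;   /* sub:  < 4, == 1, > 2             */
    s0 = s0 << 1;   /* shli: < 8, == 2, > 4             */
    return s0;
}

/* Second sequence: setcond LT, setcond NE, trunc, add, movi, add, shl.
 * The final value ends up in s1, as in the quoted listing. */
int32_t cr_seq2(int64_t arg0, int64_t arg1)
{
    int32_t s0 = (int32_t)(arg0 < arg1);   /* setcond LT + trunc */
    int32_t s1 = (int32_t)(arg0 != arg1);  /* setcond NE + trunc */

    s0 = s0 + s1;   /* add:  < 2, == 0, > 1 */
    s1 = 1;         /* movi                 */
    s0 = s0 + s1;   /* add:  < 3, == 1, > 2 */
    s1 = s1 << s0;  /* shl:  < 8, == 2, > 4 */
    return s1;
}

/* Compare both sequences against a straightforward reference
 * over a few boundary values. Returns 1 if every pair agrees. */
int cr_self_check(void)
{
    static const int64_t v[] = { INT64_MIN, -1, 0, 1, INT64_MAX };
    const int n = (int)(sizeof v / sizeof v[0]);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            int32_t want = v[i] < v[j] ? CR_LT
                         : v[i] > v[j] ? CR_GT
                         : CR_EQ;
            assert(cr_seq1(v[i], v[j]) == want);
            assert(cr_seq2(v[i], v[j]) == want);
        }
    }
    return 1;
}
```

Both functions follow the listings op for op, so the dependency chains are visible: in the first sequence the shli and subi are independent of each other, while in the second the movi is independent of the first add. Whether that helps on a given host is the open question in the thread; the sketch only confirms that the two forms compute the same values.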