From mboxrd@z Thu Jan  1 00:00:00 1970
References: <86bo8mcsax.fsf@shell.gmplib.org> <518A072E.9070708@redhat.com>
From: Torbjorn Granlund <tg@gmplib.org>
Sender: tg@gmplib.org
Date: Wed, 08 May 2013 17:44:24 +0200
In-Reply-To: <518A072E.9070708@redhat.com> (Paolo Bonzini's message of "Wed\,
	08 May 2013 10\:05\:02 +0200")
Message-ID: <86haidv5lz.fsf@shell.gmplib.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] Possible ppc comparision optimisation
List-Id: <qemu-devel.nongnu.org>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org

Paolo Bonzini <pbonzini@redhat.com> writes:

  I think that would be faster on 32-bit hosts, truncs are cheap.

And slower perhaps on 64-bit hosts, at least for operations where
additional explicit truncation will be needed (such as before
comparisons and after right shifts).
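
To illustrate the kind of cost I mean, here is a minimal sketch.  It
assumes a 32-bit value carried in a 64-bit host register whose upper
half is not kept normalised; the function names are made up for the
illustration and are not qemu code.  Addition needs no fix-up, while
comparison and right shift only give the 32-bit answer once the
operand has been truncated.

```c
#include <stdint.h>

/* Hypothetical model: a 32-bit guest value carried in a 64-bit host
   register whose upper 32 bits may hold leftover junk. */

/* Addition: the low 32 bits of the result never depend on the junk. */
static uint32_t add32(uint64_t a, uint64_t b)
{
    return (uint32_t)(a + b);
}

/* Unsigned less-than on the raw registers: the junk takes part. */
static int ltu_raw(uint64_t a, uint64_t b)
{
    return a < b;
}

/* Same comparison with the explicit truncation done first. */
static int ltu_trunc(uint64_t a, uint64_t b)
{
    return (uint32_t)a < (uint32_t)b;
}

/* Right shift on the raw register pulls junk into the low half. */
static uint32_t shr_raw(uint64_t a, unsigned n)
{
    return (uint32_t)(a >> n);
}

/* Right shift of the truncated value (n must be below 32). */
static uint32_t shr_trunc(uint64_t a, unsigned n)
{
    return (uint32_t)a >> n;
}
```

With a = 0xdeadbeef00000001 and b = 2, the raw comparison and raw
shift both disagree with their truncated counterparts, whereas add32
is unaffected by the upper half.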

  > There could be a disadvantage of this compared to the old code, since
  > this has a chained algebraic dependency, while the old code's many
  > instructions might have been more independent.

  What about these alternatives:

  setcond LT, t0, arg0, arg1
  setcond EQ, t1, arg0, arg1
  trunc  s0, t0
  trunc  s1, t1
  shli   s0, s0, 1                ; s0 = (arg0 < arg1) ? 2 : 0
  subi   s1, s1, 2                ; s1 = (arg0 != arg1) ? -2 : -1
  sub    s0, s0, s1               ; < 4       == 1      > 2
  shli   s0, s0, 1                ; < 8       == 2      > 4

  =======

  setcond LT, t0, arg0, arg1
  setcond NE, t1, arg0, arg1
  trunc   s0, t0
  trunc   s1, t1
  add     s0, s0, s1              ; < 2       == 0      > 1
  movi    s1, 1
  add     s0, s0, s1              ; < 3       == 1      > 2
  shl     s1, s1, s0              ; < 8       == 2      > 4

Surely there are many alternative forms.
Is your aim to add micro-parallelism?

(Your sequences look a bit curious.  Did you use a super-optimiser?)
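
For anyone who wants to check the two quoted sequences, a small C
model might look like the sketch below.  It assumes a signed compare,
models the truncated temporaries as int32_t, and takes the target
encoding (less-than 8, greater-than 4, equal 2) from the comments in
the sequences.  All function names are hypothetical.

```c
#include <stdint.h>

/* Target encoding, read off the comments in the quoted sequences:
   less-than -> 8, greater-than -> 4, equal -> 2. */
static uint32_t crf_ref(int64_t a, int64_t b)
{
    return a < b ? 8u : (a > b ? 4u : 2u);
}

/* Model of the first quoted sequence. */
static uint32_t crf_seq1(int64_t arg0, int64_t arg1)
{
    int32_t s0 = (int32_t)(arg0 < arg1);   /* setcond LT, trunc */
    int32_t s1 = (int32_t)(arg0 == arg1);  /* setcond EQ, trunc */
    s0 = s0 << 1;   /* shli: 2 or 0 */
    s1 = s1 - 2;    /* subi: -1 if equal, else -2 */
    s0 = s0 - s1;   /* sub:  < 4, == 1, > 2 */
    s0 = s0 << 1;   /* shli: < 8, == 2, > 4 */
    return (uint32_t)s0;
}

/* Model of the second quoted sequence. */
static uint32_t crf_seq2(int64_t arg0, int64_t arg1)
{
    int32_t s0 = (int32_t)(arg0 < arg1);   /* setcond LT, trunc */
    int32_t s1 = (int32_t)(arg0 != arg1);  /* setcond NE, trunc */
    s0 = s0 + s1;   /* add:  < 2, == 0, > 1 */
    s1 = 1;         /* movi */
    s0 = s0 + s1;   /* add:  < 3, == 1, > 2 */
    s1 = s1 << s0;  /* shl:  < 8, == 2, > 4 */
    return (uint32_t)s1;
}

/* Count disagreements with the reference over a few sample operands. */
static int crf_mismatches(void)
{
    static const int64_t v[] = { INT64_MIN, -2, -1, 0, 1, 2, INT64_MAX };
    const int n = (int)(sizeof v / sizeof v[0]);
    int bad = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            uint32_t want = crf_ref(v[i], v[j]);
            bad += crf_seq1(v[i], v[j]) != want;
            bad += crf_seq2(v[i], v[j]) != want;
        }
    }
    return bad;
}
```

Calling crf_mismatches() from a test program and checking that it
returns 0 exercises both sequences on all pairs of the sample values.
This says nothing about which form is faster on a given host, only
that the two agree with the intended encoding.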

-- 
Torbjörn