Sender: Paolo Bonzini <paolo.bonzini@gmail.com>
Message-ID: <518A7A6B.6050404@redhat.com>
Date: Wed, 08 May 2013 18:16:43 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <86bo8mcsax.fsf@shell.gmplib.org> <518A072E.9070708@redhat.com>
	<86haidv5lz.fsf@shell.gmplib.org>
In-Reply-To: <86haidv5lz.fsf@shell.gmplib.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Possible ppc comparision optimisation
To: Torbjorn Granlund <tg@gmplib.org>
Cc: qemu-devel@nongnu.org

On 08/05/2013 17:44, Torbjorn Granlund wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
>   I think that would be faster on 32-bit hosts, truncs are cheap.
>   
> And perhaps slower on 64-bit hosts, at least for operations where
> additional explicit truncation will be needed (such as before
> comparisons and after right shifts).
> 
>   > There could be a disadvantage of this compared to the old code, since
>   > this has a chained algebraic dependency, while the old code's many
>   > instructions might have been more independent.
>   
>   What about these alternatives:
>   
>   setcond LT, t0, arg0, arg1
>   setcond EQ, t1, arg0, arg1
>   trunc  s0, t0
>   trunc  s1, t1
>   shli   s0, s0, 1                ; s0 = (arg0 < arg1) ? 2 : 0
>   subi   s1, s1, 2                ; s1 = (arg0 != arg1) ? -2 : -1
>   sub    s0, s0, s1               ; < 4       == 1      > 2
>   shli   s0, s0, 1                ; < 8       == 2      > 4
>   
>   =======
>   
>   setcond LT, t0, arg0, arg1
>   setcond NE, t1, arg0, arg1
>   trunc   s0, t0
>   trunc   s1, t1
>   add     s0, s0, s1              ; < 2       == 0      > 1
>   movi    s1, 1
>   add     s0, s0, s1              ; < 3       == 1      > 2
>   shl     s1, s1, s0              ; < 8       == 2      > 4
>   
> Surely there are many alternative forms.
> Is your aim to add micro-parallelism?

Yes, and in this respect I think the first one is better.  The
second could be three instructions on machines that have a set-nth-bit
instruction _and_ a zero register, but I'm not sure such machines exist...

> (Your sequences look a bit curious.  Did you use a super-optimiser?)

No, but I am attracted to these curious sequences from my previous life
working on compilers. :)  I know your superoptimizer and, in fact, we
both worked on some parts of GCC (optimization of conditional
branches/stores), just 20 years apart.

The second is actually not too curious once you look at it for a while:
it is a variant of the usual (x > y) + (x >= y) trick used to generate a
0/1/2 result.  The first I found by trial and error based on yours; it
is basically (x < y) * 2 - (x == y) + 2, with some reordering to get
parallelism and to avoid the need for subfi-like (immediate minus
register) instructions.

Paolo