From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:59469)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1Ua737-0004vA-Sw for qemu-devel@nongnu.org;
 Wed, 08 May 2013 12:17:03 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1Ua736-0005vB-OC for qemu-devel@nongnu.org;
 Wed, 08 May 2013 12:17:01 -0400
Received: from mail-ea0-x22f.google.com ([2a00:1450:4013:c01::22f]:53136)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1Ua736-0005uy-HV for qemu-devel@nongnu.org;
 Wed, 08 May 2013 12:17:00 -0400
Received: by mail-ea0-f175.google.com with SMTP id q10so1045828eaj.34
 for ; Wed, 08 May 2013 09:16:59 -0700 (PDT)
Sender: Paolo Bonzini
Message-ID: <518A7A6B.6050404@redhat.com>
Date: Wed, 08 May 2013 18:16:43 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <86bo8mcsax.fsf@shell.gmplib.org> <518A072E.9070708@redhat.com> <86haidv5lz.fsf@shell.gmplib.org>
In-Reply-To: <86haidv5lz.fsf@shell.gmplib.org>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Possible ppc comparision optimisation
To: Torbjorn Granlund
Cc: qemu-devel@nongnu.org

On 08/05/2013 17:44, Torbjorn Granlund wrote:
> Paolo Bonzini writes:
>
>   I think that would be faster on 32-bit hosts, truncs are cheap.
>
> And slower perhaps on 64-bit hosts, at least for operations where
> additional explicit truncation will be needed (such as before
> comparisons and after right shifts).
>
>   > There could be a disadvantage of this compared to the old code, since
>   > this has a chained algebraic dependency, while the old code's many
>   > instructions might have been more independent.
>
>   What about these alternatives:
>
>   setcond LT, t0, arg0, arg1
>   setcond EQ, t1, arg0, arg1
>   trunc   s0, t0
>   trunc   s1, t1
>   shli    s0, s0, 1    ; s0 = (arg0 < arg1) ? 2 : 0
>   subi    s1, s1, 2    ; s1 = (arg0 != arg1) ? -2 : -1
>   sub     s0, s0, s1   ; < 4   == 1   > 2
>   shli    s0, s0, 1    ; < 8   == 2   > 4
>
>   =======
>
>   setcond LT, t0, arg0, arg1
>   setcond NE, t1, arg0, arg1
>   trunc   s0, t0
>   trunc   s1, t1
>   add     s0, s0, s1   ; < 2   == 0   > 1
>   movi    s1, 1
>   add     s0, s0, s1   ; < 3   == 1   > 2
>   shl     s1, s1, s0   ; < 8   == 2   > 4
>
> Surely there are many alternative forms.
> Is your aim to add micro-parallelism?

Yes. In this respect I think the first one is better. The second
could be three instructions on machines that have a set-nth-bit
instruction _and_ a zero register, but I'm not sure they exist...

> (Your sequences look a bit curious. Did you use a super-optimiser?)

No, but I am attracted to these curious sequences from my previous life
working on compilers. :) I know your superoptimizer and, in fact, we
both worked on some parts of GCC (optimization of conditional
branches/stores), just 20 years apart.

The second is actually not too curious after you look at it for a while:
it is a variant of the usual (x > y) + (x >= y) trick used to generate
a 0/1/2 result.

The first I found by trial and error based on yours; it is basically
(x < y) * 2 - (x == y) + 2, shifted left by one at the end, with some
reordering to get parallelism and avoid the need for subfi-like
instructions.

Paolo
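
The two sequences quoted above can be checked with a small model. What follows is only a sketch in Python, not QEMU code: the function names (seq_first, seq_second, reference, check) and the MASK32 constant are made up for illustration, Python integers stand in for the TCG temporaries, and the comparison is assumed to be signed. The expected values 8 / 2 / 4 for less-than / equal / greater-than are taken from the comments in the sequences themselves.

```python
MASK32 = 0xFFFFFFFF  # assumption: s0/s1 are 32-bit temporaries after trunc


def seq_first(arg0, arg1):
    """Model of the first quoted sequence (setcond LT + setcond EQ)."""
    t0 = int(arg0 < arg1)        # setcond LT, t0, arg0, arg1
    t1 = int(arg0 == arg1)       # setcond EQ, t1, arg0, arg1
    s0 = t0 & MASK32             # trunc s0, t0
    s1 = t1 & MASK32             # trunc s1, t1
    s0 = (s0 << 1) & MASK32      # shli s0, s0, 1   -> 2 or 0
    s1 = (s1 - 2) & MASK32       # subi s1, s1, 2   -> -1 or -2 (wrapped)
    s0 = (s0 - s1) & MASK32      # sub  s0, s0, s1  -> 4, 1 or 2
    s0 = (s0 << 1) & MASK32      # shli s0, s0, 1   -> 8, 2 or 4
    return s0


def seq_second(arg0, arg1):
    """Model of the second quoted sequence (setcond LT + setcond NE)."""
    t0 = int(arg0 < arg1)        # setcond LT, t0, arg0, arg1
    t1 = int(arg0 != arg1)       # setcond NE, t1, arg0, arg1
    s0 = t0 & MASK32             # trunc s0, t0
    s1 = t1 & MASK32             # trunc s1, t1
    s0 = (s0 + s1) & MASK32      # add  s0, s0, s1  -> 2, 0 or 1
    s1 = 1                       # movi s1, 1
    s0 = (s0 + s1) & MASK32      # add  s0, s0, s1  -> 3, 1 or 2
    s1 = (s1 << s0) & MASK32     # shl  s1, s1, s0  -> 8, 2 or 4
    return s1                    # note: the result ends up in s1, not s0


def reference(arg0, arg1):
    """Straightforward encoding: 8 for <, 4 for >, 2 for ==."""
    if arg0 < arg1:
        return 8
    if arg0 > arg1:
        return 4
    return 2


def check(samples):
    """True if both sequences agree with the reference on all sample pairs."""
    return all(
        seq_first(a, b) == seq_second(a, b) == reference(a, b)
        for a in samples
        for b in samples
    )
```

Calling check([-2**31, -1, 0, 1, 2**31 - 1]) exercises all three orderings, including the extreme signed 32-bit values. The model says nothing about host instruction counts or parallelism, which is the actual trade-off discussed in the thread.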