From: Paolo Bonzini
Date: Wed, 08 May 2013 10:05:02 +0200
Message-ID: <518A072E.9070708@redhat.com>
In-Reply-To: <86bo8mcsax.fsf@shell.gmplib.org>
References: <86bo8mcsax.fsf@shell.gmplib.org>
Subject: Re: [Qemu-devel] Possible ppc comparision optimisation
To: Torbjorn Granlund
Cc: qemu-devel@nongnu.org

On 08/05/2013 00:56, Torbjorn Granlund wrote:
> The current ppc gen_op_cmp generates a long sequence of instructions,
> using a plain series of three disjoint compares.
>
> It is possible to compute the 3 result bits more cleverly. Below is a
> possible replacement gen_op_cmp. (It is tested by booting GNU/Linux
> ppc64, but not much more than that.)
>
> Surely this should be faster than the old code? OK, it is less
> readable, but cmp is pretty critical and should be made fast.
>
> Should one truncate things using tcg_gen_trunc_tl_i32 and do the add,
> xori, addi as i32 variants? (Why?)

I think that would be faster on 32-bit hosts; truncs are cheap.
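A minimal sketch of why the narrowing is safe, written as plain C rather than TCG ops: a setcond result is only ever 0 or 1, so truncating before the add, xori, addi gives the same value as truncating afterwards. The function names are hypothetical and only model the arithmetic of the quoted gen_op_cmp for the signed case; this is not QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: add, xori, addi at 64-bit width, truncate last. */
static inline uint32_t cmp_bits_wide(int64_t a, int64_t b)
{
    uint64_t t0 = (uint64_t)(a <= b);   /* setcond LE: 0 or 1 */
    uint64_t t1 = (uint64_t)(a < b);    /* setcond LT: 0 or 1 */
    t0 = ((t0 + t1) ^ 1) + 1;           /* add, xori, addi on 64-bit values */
    return (uint32_t)t0 << 1;           /* trunc to 32 bits, then shli */
}

/* Hypothetical model: truncate the setcond results first, then 32-bit ops. */
static inline uint32_t cmp_bits_narrow(int64_t a, int64_t b)
{
    uint32_t s0 = (uint32_t)(a <= b);   /* trunc of setcond LE */
    uint32_t s1 = (uint32_t)(a < b);    /* trunc of setcond LT */
    s0 = ((s0 + s1) ^ 1) + 1;           /* add, xori, addi on 32-bit values */
    return s0 << 1;                     /* shli */
}

/* Asserts that both orders of operations agree for <, == and >. */
static inline void check_narrowing(void)
{
    assert(cmp_bits_wide(-1, 2) == cmp_bits_narrow(-1, 2));
    assert(cmp_bits_wide(5, 5) == cmp_bits_narrow(5, 5));
    assert(cmp_bits_wide(9, 3) == cmp_bits_narrow(9, 3));
}
```

On a 32-bit host each 64-bit op in the first model would typically need a register pair, which is the cost the narrow variant avoids.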

> There could be a disadvantage of this compared to the old code, since
> this has a chained algebraic dependency, while the old code's many
> instructions might have been more independent.

What about these alternatives:

    setcond LT, t0, arg0, arg1
    setcond EQ, t1, arg0, arg1
    trunc s0, t0
    trunc s1, t1
    shli s0, s0, 1     ; s0 = (arg0 < arg1) ? 2 : 0
    subi s1, s1, 2     ; s1 = (arg0 != arg1) ? -2 : -1
    sub s0, s0, s1     ; < 4  == 1  > 2
    shli s0, s0, 1     ; < 8  == 2  > 4

=======

    setcond LT, t0, arg0, arg1
    setcond NE, t1, arg0, arg1
    trunc s0, t0
    trunc s1, t1
    add s0, s0, s1     ; < 2  == 0  > 1
    movi s1, 1
    add s0, s0, s1     ; < 3  == 1  > 2
    shl s1, s1, s0     ; < 8  == 2  > 4

Paolo

> static inline void gen_op_cmp(TCGv arg0, TCGv arg1, int s, int crf)
> {
>     TCGv t0 = tcg_temp_new();
>     TCGv t1 = tcg_temp_new();
>     TCGv_i32 s0 = tcg_temp_new_i32();
>
>     tcg_gen_trunc_tl_i32(cpu_crf[crf], cpu_so);
>
>     tcg_gen_setcond_tl((s ? TCG_COND_LE: TCG_COND_LEU), t0, arg0, arg1);
>     tcg_gen_setcond_tl((s ? TCG_COND_LT: TCG_COND_LTU), t1, arg0, arg1);
>     tcg_gen_add_tl(t0, t0, t1);
>     tcg_gen_xori_tl(t0, t0, 1);
>     tcg_gen_addi_tl(t0, t0, 1);
>     tcg_gen_trunc_tl_i32(s0, t0);
>     tcg_gen_shli_i32(s0, s0, 1);
>     tcg_gen_or_i32(cpu_crf[crf], cpu_crf[crf], s0);
>
>     tcg_temp_free(t0);
>     tcg_temp_free(t1);
>     tcg_temp_free_i32(s0);
> }
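The value tables in the comments above can be checked with a small plain-C model. This is only a sketch of the arithmetic, not TCG code: the function names are hypothetical, only the signed comparison is modelled, and the SO bit that gen_op_cmp ORs in is left out. The expected encoding is the one the comments use (less-than gives 8, equal gives 2, greater-than gives 4).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the quoted gen_op_cmp: LE + LT, xori, addi, shli. */
static inline uint32_t crf_chained(int64_t a, int64_t b)
{
    uint32_t s0 = (uint32_t)(a <= b) + (uint32_t)(a < b); /* < 2  == 1  > 0 */
    s0 ^= 1;                                              /* < 3  == 0  > 1 */
    s0 += 1;                                              /* < 4  == 1  > 2 */
    return s0 << 1;                                       /* < 8  == 2  > 4 */
}

/* Hypothetical model of the first alternative: LT/EQ, shli, subi, sub, shli. */
static inline uint32_t crf_alt_sub(int64_t a, int64_t b)
{
    int32_t s0 = (a < b);
    int32_t s1 = (a == b);
    s0 <<= 1;                  /* (a < b) ? 2 : 0 */
    s1 -= 2;                   /* (a != b) ? -2 : -1 */
    s0 -= s1;                  /* < 4  == 1  > 2 */
    return (uint32_t)s0 << 1;  /* < 8  == 2  > 4 */
}

/* Hypothetical model of the second alternative: LT/NE, add, movi, add, shl. */
static inline uint32_t crf_alt_shift(int64_t a, int64_t b)
{
    uint32_t s0 = (uint32_t)(a < b) + (uint32_t)(a != b); /* < 2  == 0  > 1 */
    uint32_t s1 = 1;                                      /* movi s1, 1 */
    s0 += s1;                                             /* < 3  == 1  > 2 */
    return s1 << s0;                                      /* < 8  == 2  > 4 */
}

/* Asserts that all three models produce the same bits for <, == and >. */
static inline void crf_self_check(void)
{
    static const int64_t pairs[3][2] = { { -5, 3 }, { 7, 7 }, { 9, -2 } };
    static const uint32_t want[3] = { 8, 2, 4 };

    for (int i = 0; i < 3; i++) {
        int64_t a = pairs[i][0], b = pairs[i][1];
        assert(crf_chained(a, b) == want[i]);
        assert(crf_alt_sub(a, b) == want[i]);
        assert(crf_alt_shift(a, b) == want[i]);
    }
}
```

In both alternatives the two setconds are independent of each other, so only the final three or four ops form a dependency chain.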