Sender: Richard Henderson
From: Richard Henderson
Date: Tue, 2 Oct 2012 11:32:20 -0700
Message-Id: <1349202750-16815-1-git-send-email-rth@twiddle.net>
Subject: [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements
To: qemu-devel@nongnu.org
Cc: Aurelien Jarno

Changes v1->v2:

  * Patch 1 changes the exact swap condition.  This helps add2 for e.g.

        add2 tmp4,tmp5,tmp4,tmp5,c1,c2

    where tmp5, c1, and c2 are all input constants.  Since tmp4 is
    variable, we cannot constant fold this.  But the existing swap
    condition would give

        add2 tmp4,tmp5,tmp4,c2,c1,tmp5

    While not incorrect, we do want to prefer "adc $c2,tmp5" on i686.

  * Patch 2 drops the partial constant folding for add2/sub2.  It only
    does the operand ordering for add2.

  * Patch 4 is new.  When writing the code for brcond2 et al, it did
    seem silly to do all the gen_args[N] = args[N] copying by hand.
    I think the patch makes the code more readable.

  * Patch 5 has the operand typo fixed that Aurelien noticed.

  * Patch 8 is new, adding the extra nop into the opcode stream that
    was suggested on the list.  With this we fully constant fold
    add2/sub2.

  * Patch 9 is new.
    While looking at dumps from x86_64 BIOS boot, I noticed that
    sequences of push/pop insns leave the high part of %rsp dead.  The
    same is true in general of any 32-bit addition in which the high
    part isn't "consumed" by cc_dst.

  * Patch 10 is new, treating mulu2 similarly to add2.  It triggers
    frequently during the boot of SeaBIOS, and should not be expensive.


r~


Richard Henderson (10):
  tcg: Split out swap_commutative as a subroutine
  tcg: Canonicalize add2 operand ordering
  tcg: Swap commutative double-word comparisons
  tcg: Use common code when failing to optimize
  tcg: Optimize double-word comparisons against zero
  tcg: Split out subroutines from do_constant_folding_cond
  tcg: Do constant folding on double-word comparisons
  tcg: Constant fold add2 and sub2
  tcg: Optimize half-dead add2/sub2
  tcg: Optimize mulu2

 tcg/optimize.c | 465 ++++++++++++++++++++++++++++++++++++++-------------------
 tcg/tcg-op.h   |  11 ++
 tcg/tcg.c      |  53 ++++++-
 3 files changed, 377 insertions(+), 152 deletions(-)

-- 
1.7.11.4
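
For readers less familiar with the double-word ops, the arithmetic that
gets folded for add2, sub2 and mulu2 when all inputs are constant can be
sketched roughly as below.  This is a standalone illustration, not code
from the series: the pair32 type and the join32/split64/fold_* names are
hypothetical, and the operand order assumed is the one used in the add2
example above (low half before high half).

```c
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value held as two 32-bit halves. */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} pair32;

/* Combine low and high halves into one 64-bit value. */
static uint64_t join32(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Split a 64-bit value back into its low and high halves. */
static pair32 split64(uint64_t v)
{
    pair32 r = { (uint32_t)v, (uint32_t)(v >> 32) };
    return r;
}

/* add2 rl,rh,al,ah,bl,bh with all four inputs constant (assumed order). */
static pair32 fold_add2(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    return split64(join32(al, ah) + join32(bl, bh));
}

/* sub2 rl,rh,al,ah,bl,bh with all four inputs constant (assumed order). */
static pair32 fold_sub2(uint32_t al, uint32_t ah, uint32_t bl, uint32_t bh)
{
    return split64(join32(al, ah) - join32(bl, bh));
}

/* mulu2 rl,rh,a,b: full 64-bit product of two unsigned 32-bit inputs. */
static pair32 fold_mulu2(uint32_t a, uint32_t b)
{
    return split64((uint64_t)a * b);
}
```

In the half-dead case only the lo member of the result is needed, and it
depends solely on the low input halves, so the op degenerates to an
ordinary 32-bit add or sub.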