Re: [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Richard Henderson <rth@twiddle.net>
To: Aurelien Jarno <aurelien@aurel32.net>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine
Date: Tue, 09 Oct 2012 09:40:18 -0700	[thread overview]
Message-ID: <50745372.9090704@twiddle.net> (raw)
In-Reply-To: <20121009153139.GB9230@hall.aurel32.net>

On 10/09/2012 08:31 AM, Aurelien Jarno wrote:
> I am not talking about the code generated by TCG, but rather by the code
> generated by GCC. Does using sum += and sum -= brings a gain to compare
> to the equivalent if function?

It's hard to tell.  My guess is that it's about a wash.  Adding an
artificial __attribute__((noinline)) to make it easier to see:

SUM version

0000000000000190 <swap_commutative>:
     190:       48 83 ec 18             sub    $0x18,%rsp
     194:       4c 8b 06                mov    (%rsi),%r8
     197:       48 8b 0a                mov    (%rdx),%rcx
     19a:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
     1a1:       00 00 
     1a3:       48 89 44 24 08          mov    %rax,0x8(%rsp)
     1a8:       31 c0                   xor    %eax,%eax
     1aa:       4c 89 c0                mov    %r8,%rax
     1ad:       49 89 c9                mov    %rcx,%r9
     1b0:       48 c1 e0 04             shl    $0x4,%rax
     1b4:       83 b8 00 00 00 00 01    cmpl   $0x1,0x0(%rax)
                        1b6: R_X86_64_32S       .bss
     1bb:       0f 94 c0                sete   %al
     1be:       49 c1 e1 04             shl    $0x4,%r9
     1c2:       41 83 b9 00 00 00 00    cmpl   $0x1,0x0(%r9)
     1c9:       01 
                        1c5: R_X86_64_32S       .bss
     1ca:       0f b6 c0                movzbl %al,%eax
     1cd:       41 0f 94 c1             sete   %r9b
     1d1:       45 0f b6 c9             movzbl %r9b,%r9d
     1d5:       44 29 c8                sub    %r9d,%eax
     1d8:       83 f8 01                cmp    $0x1,%eax
     1db:       75 23                   jne    200 <swap_commutative+0x70>
     1dd:       48 89 0e                mov    %rcx,(%rsi)
     1e0:       b8 01 00 00 00          mov    $0x1,%eax
     1e5:       4c 89 02                mov    %r8,(%rdx)
     1e8:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
     1ed:       64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
     1f4:       00 00 
     1f6:       75 15                   jne    20d <swap_commutative+0x7d>
     1f8:       48 83 c4 18             add    $0x18,%rsp
     1fc:       c3                      retq   
     1fd:       0f 1f 00                nopl   (%rax)
     200:       48 39 cf                cmp    %rcx,%rdi
     203:       75 04                   jne    209 <swap_commutative+0x79>
     205:       85 c0                   test   %eax,%eax
     207:       74 d4                   je     1dd <swap_commutative+0x4d>
     209:       31 c0                   xor    %eax,%eax
     20b:       eb db                   jmp    1e8 <swap_commutative+0x58>
     20d:       0f 1f 00                nopl   (%rax)
     210:       e8 00 00 00 00          callq  215 <swap_commutative+0x85>
                        211: R_X86_64_PC32      __stack_chk_fail-0x4

=======

    if ((temps[a1].state == TCG_TEMP_CONST
         && temps[a2].state != TCG_TEMP_CONST)
        || (dest == a2
            && ((temps[a1].state == TCG_TEMP_CONST
                 && temps[a2].state == TCG_TEMP_CONST)
                || (temps[a1].state != TCG_TEMP_CONST
                     && temps[a2].state != TCG_TEMP_CONST)))) {

0000000000000190 <swap_commutative>:
     190:       48 83 ec 18             sub    $0x18,%rsp
     194:       4c 8b 02                mov    (%rdx),%r8
     197:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
     19e:       00 00 
     1a0:       48 89 44 24 08          mov    %rax,0x8(%rsp)
     1a5:       31 c0                   xor    %eax,%eax
     1a7:       48 8b 06                mov    (%rsi),%rax
     1aa:       48 89 c1                mov    %rax,%rcx
     1ad:       48 c1 e1 04             shl    $0x4,%rcx
     1b1:       83 b9 00 00 00 00 01    cmpl   $0x1,0x0(%rcx)
                        1b3: R_X86_64_32S       .bss
     1b8:       74 0e                   je     1c8 <swap_commutative+0x38>
     1ba:       4c 39 c7                cmp    %r8,%rdi
     1bd:       74 39                   je     1f8 <swap_commutative+0x68>
     1bf:       31 c0                   xor    %eax,%eax
     1c1:       eb 20                   jmp    1e3 <swap_commutative+0x53>
     1c3:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
     1c8:       4c 89 c1                mov    %r8,%rcx
     1cb:       48 c1 e1 04             shl    $0x4,%rcx
     1cf:       83 b9 00 00 00 00 01    cmpl   $0x1,0x0(%rcx)
                        1d1: R_X86_64_32S       .bss
     1d6:       74 36                   je     20e <swap_commutative+0x7e>
     1d8:       4c 89 06                mov    %r8,(%rsi)
     1db:       48 89 02                mov    %rax,(%rdx)
     1de:       b8 01 00 00 00          mov    $0x1,%eax
     1e3:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
     1e8:       64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
     1ef:       00 00 
     1f1:       75 16                   jne    209 <swap_commutative+0x79>
     1f3:       48 83 c4 18             add    $0x18,%rsp
     1f7:       c3                      retq   
     1f8:       48 c1 e7 04             shl    $0x4,%rdi
     1fc:       83 bf 00 00 00 00 01    cmpl   $0x1,0x0(%rdi)
                        1fe: R_X86_64_32S       .bss
     203:       75 d3                   jne    1d8 <swap_commutative+0x48>
     205:       31 c0                   xor    %eax,%eax
     207:       eb da                   jmp    1e3 <swap_commutative+0x53>
     209:       e8 00 00 00 00          callq  20e <swap_commutative+0x7e>
                        20a: R_X86_64_PC32      __stack_chk_fail-0x4
     20e:       4c 39 c7                cmp    %r8,%rdi
     211:       75 ac                   jne    1bf <swap_commutative+0x2f>
     213:       eb c3                   jmp    1d8 <swap_commutative+0x48>


r~

next prev parent reply	other threads:[~2012-10-09 16:40 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-10-02 18:32 [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Richard Henderson
2012-10-02 18:32 ` [Qemu-devel] [PATCH 01/10] tcg: Split out swap_commutative as a subroutine Richard Henderson
2012-10-09 15:13   ` Aurelien Jarno
2012-10-09 15:23     ` Richard Henderson
2012-10-09 15:31       ` Aurelien Jarno
2012-10-09 16:40         ` Richard Henderson [this message]
2012-10-02 18:32 ` [Qemu-devel] [PATCH 02/10] tcg: Canonicalize add2 operand ordering Richard Henderson
2012-10-09 15:14   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 03/10] tcg: Swap commutative double-word comparisons Richard Henderson
2012-10-09 15:16   ` Aurelien Jarno
2012-10-09 15:31     ` Richard Henderson
2012-10-09 15:48       ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 04/10] tcg: Use common code when failing to optimize Richard Henderson
2012-10-09 15:25   ` Aurelien Jarno
2012-10-09 15:33     ` Richard Henderson
2012-10-02 18:32 ` [Qemu-devel] [PATCH 05/10] tcg: Optimize double-word comparisons against zero Richard Henderson
2012-10-09 16:32   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 06/10] tcg: Split out subroutines from do_constant_folding_cond Richard Henderson
2012-10-09 16:33   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 07/10] tcg: Do constant folding on double-word comparisons Richard Henderson
2012-10-10  9:45   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 08/10] tcg: Constant fold add2 and sub2 Richard Henderson
2012-10-10  9:52   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 09/10] tcg: Optimize half-dead add2/sub2 Richard Henderson
2012-10-16 23:25   ` Aurelien Jarno
2012-10-02 18:32 ` [Qemu-devel] [PATCH 10/10] tcg: Optimize mulu2 Richard Henderson
2012-10-16 23:25   ` Aurelien Jarno
2012-10-17  1:09     ` Richard Henderson
2012-10-17 10:58       ` Avi Kivity
2012-10-17 16:41 ` [Qemu-devel] [PATCH v2 00/10] Double-word tcg/optimize improvements Aurelien Jarno

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=50745372.9090704@twiddle.net \
    --to=rth@twiddle.net \
    --cc=aurelien@aurel32.net \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.