Re: [Qemu-devel] [PATCH 7/7] tcg-i386: Perform tail call to qemu_ret_ld*_mmu

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Richard Henderson <rth@twiddle.net>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: qemu-devel@nongnu.org, aurelien@aurel32.net
Subject: Re: [Qemu-devel] [PATCH 7/7] tcg-i386: Perform tail call to qemu_ret_ld*_mmu
Date: Thu, 29 Aug 2013 09:08:20 -0700	[thread overview]
Message-ID: <521F71F4.5080605@twiddle.net> (raw)
In-Reply-To: <521F6942.4060904@redhat.com>

On 08/29/2013 08:31 AM, Paolo Bonzini wrote:
> Il 27/08/2013 23:46, Richard Henderson ha scritto:
>> This does require the fast path always load to the function return
>> value register, but apparently the loaded value usually needs to be
>> spilled back to its memory slot anyway so the change in register
>> does not really change much.
> 
> Even for something like
> 
>     mov (%rdi), %rax
>     add (%r8), %rax
> 
> ?  Memory operands should avoid the need to spill anything.

No, not that kind of spilling.  The kind in which subsequent operations
in the TB perform something that requires the register backing store
within env be up to date.  Thus a "spill" from register to the memory slot.

I could have used better verbage i guess...

In theory, having the load go to a call-saved register instead of the
call-clobbered return value register would mean that (even with updating the
memory backing store) if we had a future read-only use of the register within
the basic block we could re-use the value in the register.

It's just that I didn't see that ever happen in the couple of MB of dumps that
I saw.  Thus my conclusion that it's rare enough not to worry about it.

> Is this change really an advantage considering the additional icache
> footprint of the new helpers?

We *are* getting fractionally smaller TBs, for the low low price of three
functions like

> 00000000005f2df0 <helper_ret_ldsw_mmu>:
>   5f2df0:       48 83 ec 18             sub    $0x18,%rsp
>   5f2df4:       64 48 8b 04 25 28 00    mov    %fs:0x28,%rax
>   5f2dfb:       00 00 
>   5f2dfd:       48 89 44 24 08          mov    %rax,0x8(%rsp)
>   5f2e02:       31 c0                   xor    %eax,%eax
>   5f2e04:       e8 27 fe ff ff          callq  5f2c30 <helper_ret_lduw_mmu>
>   5f2e09:       48 8b 54 24 08          mov    0x8(%rsp),%rdx
>   5f2e0e:       64 48 33 14 25 28 00    xor    %fs:0x28,%rdx
>   5f2e15:       00 00 
>   5f2e17:       48 0f bf c0             movswq %ax,%rax
>   5f2e1b:       75 05                   jne    5f2e22 <helper_ret_ldsw_mmu+0x32>
>   5f2e1d:       48 83 c4 18             add    $0x18,%rsp
>   5f2e21:       c3                      retq   
>   5f2e22:       e8 79 a7 e1 ff          callq  40d5a0 <__stack_chk_fail@plt>
>   5f2e27:       66 0f 1f 84 00 00 00    nopw   0x0(%rax,%rax,1)
>   5f2e2e:       00 00 

Spend 64 * 3 bytes on new helpers, and save 10 * zillion bytes on the TBs.  In
all likelyhood we're saving icache rather than using additional.

Where I do think there's cause for treading carefully is wrt Aurelien's
statement "it's the slow path exception, the call-return stack doesn't matter".
 Alternately, given that it *is* the slow path, who cares if the return from
the helper immediately hits a branch, rather than tail-calling back into the
fast path, if the benefit is that the call-return stack is still valid above
the code_gen_buffer after a simple tlb miss?

As an aside, why why o why do we default to -fstack-protector-all?  Do we
really need checks in every single function, as opposed to those that actually
do something with arrays?  Switch to plain -fstack-protector so we have

> 00000000005a1fd0 <helper_ret_ldsw_mmu>:
>   5a1fd0:       48 83 ec 08             sub    $0x8,%rsp
>   5a1fd4:       e8 57 fe ff ff          callq  5a1e30 <helper_ret_lduw_mmu>
>   5a1fd9:       48 83 c4 08             add    $0x8,%rsp
>   5a1fdd:       48 0f bf c0             movswq %ax,%rax
>   5a1fe1:       c3                      retq   
>   5a1fe2:       66 66 66 66 66 2e 0f    data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
>   5a1fe9:       1f 84 00 00 00 00 00 

and then lets talk about icache savings...

r~

next prev parent reply	other threads:[~2013-08-29 16:08 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-27 21:46 [Qemu-devel] [PATCH 0/7] Further tcg ldst improvements Richard Henderson
2013-08-27 21:46 ` [Qemu-devel] [PATCH 1/7] exec: Reorganize the GETRA/GETPC macros Richard Henderson
2013-08-27 21:46 ` [Qemu-devel] [PATCH 2/7] tcg-i386: Don't perform GETPC adjustment in TCG code Richard Henderson
2013-08-27 21:46 ` [Qemu-devel] [PATCH 3/7] exec: Rename USUFFIX to LSUFFIX Richard Henderson
2013-08-27 21:46 ` [Qemu-devel] [PATCH 4/7] target: Include softmmu_exec.h where forgotten Richard Henderson
2013-08-27 21:46 ` [Qemu-devel] [PATCH 5/7] exec: Split softmmu_defs.h Richard Henderson
2013-08-27 21:46 ` [Qemu-devel] [PATCH 6/7] tcg: Introduce zero and sign-extended versions of load helpers Richard Henderson
2013-08-27 21:46 ` [Qemu-devel] [PATCH 7/7] tcg-i386: Perform tail call to qemu_ret_ld*_mmu Richard Henderson
2013-08-28 16:34   ` Richard Henderson
2013-08-29 15:31   ` Paolo Bonzini
2013-08-29 16:08     ` Richard Henderson [this message]
2013-08-29 16:36       ` Paolo Bonzini
2013-08-29 17:06   ` Aurelien Jarno
2013-08-29 17:43     ` Richard Henderson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521F71F4.5080605@twiddle.net \
    --to=rth@twiddle.net \
    --cc=aurelien@aurel32.net \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.