From: Paolo Bonzini <pbonzini@redhat.com>
To: Richard Henderson <rth@twiddle.net>
Cc: qemu-ppc@nongnu.org, qemu-devel@nongnu.org, aurelien@aurel32.net
Subject: Re: [Qemu-devel] [PATCH 2/2] tcg-ppc: use new return-argument ld/st helpers
Date: Thu, 05 Sep 2013 17:41:32 +0200
Message-ID: <5228A62C.90507@redhat.com>
In-Reply-To: <5228A08C.5040106@twiddle.net>

On 05/09/2013 17:17, Richard Henderson wrote:
> On 09/05/2013 01:22 AM, Paolo Bonzini wrote:
>> These use a 32-bit load-of-immediate to save a mflr+addi+mtlr sequence.
>> Tested with a Windows 98 guest (pretty much the most recent thing I
>> could run on my PPC machine) and kvm-unit-tests's sieve.flat.  The
>> speedup for sieve.flat is as high as 10% for qemu-system-i386, 25%
>> (no kidding) for qemu-system-x86_64 on my PowerBook G4.
> 
> See also the series beginning at
> 
> http://lists.nongnu.org/archive/html/qemu-devel/2013-09/msg00025.html
> 
> The major difference is that I use a conditional call out of the fast
> path, which lets me later just use one mflr to pass the parameter.  I
> also, perhaps foolishly, got rid of the trampolines.  E.g.
> 
> 0xf57a1838:  rlwinm  r3,r15,24,20,27
> 0xf57a183c:  rlwinm  r0,r15,0,30,19
> 0xf57a1840:  add     r3,r3,r27
> 0xf57a1844:  lwz     r4,6436(r3)
> 0xf57a1848:  cmpw    cr7,r0,r4
> 0xf57a184c:  lwz     r3,6444(r3)
> 0xf57a1850:  bnel-   cr7,0xf57a1910
> 0xf57a1854:  stwx    r16,r3,r15
> ...
> 0xf57a1910:  mr      r3,r27
> 0xf57a1914:  mr      r4,r15
> 0xf57a1918:  mr      r5,r16
> 0xf57a191c:  li      r6,1
> 0xf57a1920:  mflr    r7
> 0xf57a1924:  lis     r0,4120
> 0xf57a1928:  ori     r0,r0,45040
> 0xf57a192c:  mtctr   r0
> 0xf57a1930:  bctrl
> 0xf57a1934:  b       0xf57a1858
> 
> I don't see anything technically wrong with your patch.  But I'd be
> interested to compare vs mine.

Sure, I'll give it a try tomorrow or in the weekend.

The G4 in my computer must simply hate the mflr/addi/mtlr sequence in the
trampoline; I can't think of any other explanation for such a large
performance improvement.  So even though I suspect there won't be much
difference between our patches, it's worth checking which one is faster,
in case your sequences trigger something equally bad.  The bnel/mflr
trick to save one instruction is nice, though!

As for removing the trampolines, the extra icache cost should be a wash
now that they are half the size, but I'd still prefer that change to go
in a separate patch.

Paolo

Thread overview: 6+ messages
2013-09-05  8:22 [Qemu-devel] [PATCH 0/2] tcg-ppc: use new return-argument ld/st helpers Paolo Bonzini
2013-09-05  8:22 ` [Qemu-devel] [PATCH 1/2] tcg-ppc: fix qemu_ld/qemu_st for AIX ABI Paolo Bonzini
2013-09-05  8:22 ` [Qemu-devel] [PATCH 2/2] tcg-ppc: use new return-argument ld/st helpers Paolo Bonzini
2013-09-05 15:17   ` Richard Henderson
2013-09-05 15:41     ` Paolo Bonzini [this message]
2013-09-05  9:46 ` [Qemu-devel] [Qemu-ppc] [PATCH 0/2] " Alexander Graf
