Re: [Qemu-devel] [RFC PATCH] tcg/ppc: Improve unaligned load/store handling on 64-bit backend

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	qemu-ppc@nongnu.org, Alexander Graf <agraf@suse.de>,
	Aurelien Jarno <aurelien@aurel32.net>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC PATCH] tcg/ppc: Improve unaligned load/store handling on 64-bit backend
Date: Mon, 20 Jul 2015 12:47:01 +1000	[thread overview]
Message-ID: <1437360421.28088.133.camel@kernel.crashing.org> (raw)
In-Reply-To: <1437357287.28088.132.camel@kernel.crashing.org>

On Mon, 2015-07-20 at 11:54 +1000, Benjamin Herrenschmidt wrote:
> Currently, we get to the slow path for any unaligned access in the
> backend, because we effectively preserve the bottom address bits
> below the alignment requirement when comparing with the TLB entry,
> so any non-0 bit there will cause the compare to fail.
> 
> For the same number of instructions, we can instead add the access
> size - 1 to the address and stick to clearing all the bottom bits.
> 
> That means that normal unaligned accesses will not fallback (the HW
> will handle them fine). Only when crossing a page boundary well we
> end up having a mismatch because we'll end up pointing to the next
> page which cannot possibly be in that same TLB entry.

Ignore that version of the patch, it's the wrong one, right one
on the way...

> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> 
> Note: I have verified things still work by booting an x86_64 ubuntu
> installer on ppc64. I haven't noticed a large performance difference,
> going to the full xubuntu installer took 5:45 instead of 5:51 on the
> test machine I used, but I felt this is still worthwhile in case one
> hits a worst-case scenario with a lot of unaligned accesses.
> 
> Note2: It would be nice to be able to pass larger load/stores to the
> backend... it means we would need to use a higher bit in the TLB entry
> for "invalid" and a bunch more macros in the front-end, but it could
> be quite helpful speeding up things like memcpy which on ppc64 use
> vector load/stores, or speeding up the new ppc lq/stq instructions.
> 
> Anybody already working on that ?
> 
> Note3: Hacking TCG is very new to me, so I apologize in advance for
> any stupid oversight. I also assume other backends can probably use
> the same trick if not already...
> 
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index 2b6eafa..59864bf 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -1426,13 +1426,18 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
>      if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
>          tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
>                      (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
> -    } else if (!s_bits) {
> -        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo,
> -                    0, 63 - TARGET_PAGE_BITS);
>      } else {
> -        tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
> -                    64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - s_bits);
> -        tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
> +       /* Alignment check trick: We add the access_size-1 to the address
> +        * before masking the low bits. That will make the address overflow
> +        * to the next page if we cross a page boundary which will then
> +        * force a mismatch of the TLB compare since the next page cannot
> +        * possibly be in the same TLB index.
> +        */
> +        if (s_bits) {
> +            tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
> +        }
> +        tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
> +                    0, 63 - TARGET_PAGE_BITS);
>      }
>  
>      if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
>

     prev parent reply	other threads:[~2015-07-20  2:47 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-07-20  1:54 [Qemu-devel] [RFC PATCH] tcg/ppc: Improve unaligned load/store handling on 64-bit backend Benjamin Herrenschmidt
2015-07-20  2:47 ` Benjamin Herrenschmidt [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1437360421.28088.133.camel@kernel.crashing.org \
    --to=benh@kernel.crashing.org \
    --cc=agraf@suse.de \
    --cc=aurelien@aurel32.net \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.