From: Aurelien Jarno <aurelien@aurel32.net>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexander Graf <agraf@suse.de>,
Paolo Bonzini <pbonzini@redhat.com>,
qemu-ppc@nongnu.org, qemu-devel@nongnu.org,
Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [RFC PATCH v2] tcg/ppc: Improve unaligned load/store handling on 64-bit backend
Date: Mon, 20 Jul 2015 07:16:28 +0200
Message-ID: <20150720051628.GA22052@aurel32.net>
In-Reply-To: <1437360604.28088.134.camel@kernel.crashing.org>
On 2015-07-20 12:50, Benjamin Herrenschmidt wrote:
> Currently, we get to the slow path for any unaligned access in the
> backend, because we effectively preserve the bottom address bits
> below the alignment requirement when comparing with the TLB entry,
> so any non-0 bit there will cause the compare to fail.
>
> For the same number of instructions, we can instead add the access
> size - 1 to the address and stick to clearing all the bottom bits.
>
> That means that normal unaligned accesses will not fall back (the HW
> will handle them fine). Only when crossing a page boundary will we
> end up with a mismatch, because we'll be pointing to the next
> page, which cannot possibly be in that same TLB entry.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
>
> v2. This is the correct version of the patch, the one that
> actually works :-)
>
> Note: I have verified things still work by booting an x86_64 ubuntu
> installer on ppc64. I haven't noticed a large performance difference,
> going to the full xubuntu installer took 5:45 instead of 5:51 on the
> test machine I used, but I felt this is still worthwhile in case one
> hits a worst-case scenario with a lot of unaligned accesses.
>
> Note2: It would be nice to be able to pass larger load/stores to the
> backend... it means we would need to use a higher bit in the TLB entry
> for "invalid" and a bunch more macros in the front-end, but it could
> be quite helpful speeding up things like memcpy which on ppc64 use
> vector load/stores, or speeding up the new ppc lq/stq instructions.
>
> Anybody already working on that ?
>
> Note3: Hacking TCG is very new to me, so I apologize in advance for
> any stupid oversight. I also assume other backends can probably use
> the same trick if not already...
>
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index 2b6eafa..5ed8b58 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -1426,13 +1426,19 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
> if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
> tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
> (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
> - } else if (!s_bits) {
> - tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo,
> + } else if (s_bits) {
> + /* Alignment check trick: We add the access_size-1 to the address
> + * before masking the low bits. That will make the address overflow
> + * to the next page if we cross a page boundary which will then
> + * force a mismatch of the TLB compare since the next page cannot
> + * possibly be in the same TLB index.
> + */
> + tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
> + tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
> 0, 63 - TARGET_PAGE_BITS);
> } else {
> - tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
> - 64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - s_bits);
> - tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
> + tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo,
> + 0, 63 - TARGET_PAGE_BITS);
> }
>
> if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
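To make the trick in the quoted patch concrete, here is a minimal
standalone C model of the two compare values. It is only a sketch and
not QEMU code: the 4 KiB page size and the names SKETCH_PAGE_BITS,
compare_old() and compare_new() are assumptions made for illustration.

```c
#include <stdint.h>

/* Assumed page size for this illustration only (4 KiB). */
#define SKETCH_PAGE_BITS 12
#define SKETCH_PAGE_MASK (~((UINT64_C(1) << SKETCH_PAGE_BITS) - 1))

/* Old scheme: keep the page bits and the low s_bits of the address.
 * Any non-zero low bit then differs from the page-aligned TLB tag. */
static inline uint64_t compare_old(uint64_t addr, unsigned s_bits)
{
    uint64_t align_mask = (UINT64_C(1) << s_bits) - 1;
    return addr & (SKETCH_PAGE_MASK | align_mask);
}

/* New scheme: add (access size - 1), then clear every bit below the
 * page. The result only leaves the page when the access crosses it. */
static inline uint64_t compare_new(uint64_t addr, unsigned s_bits)
{
    uint64_t size_minus_1 = (UINT64_C(1) << s_bits) - 1;
    return (addr + size_minus_1) & SKETCH_PAGE_MASK;
}
```

For a 4-byte access (s_bits = 2) against a TLB tag of 0x1000, address
0x1001 mismatches under the old scheme but matches under the new one,
while 0x1ffe still mismatches because its last byte lies in the next
page.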
Your patch uses the same trick I posted for x86 [1]. At first glance it
seems fine, but you need to check whether the access must be aligned,
by checking MO_AMASK. Some emulated architectures only support aligned
accesses, so on those targets unaligned accesses must still go through
the slow path to be trapped.
[1] https://lists.gnu.org/archive/html/qemu-devel/2015-07/msg02492.html
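One possible shape for that check, as a standalone sketch rather than
QEMU code: the must_align flag below is a hypothetical stand-in for the
result of testing MO_AMASK on the memory operation, and the page size
and the name tlb_compare_value() are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed page size for this illustration only (4 KiB). */
#define MODEL_PAGE_BITS 12

/* Value to compare against the page-aligned TLB tag.
 * must_align is a hypothetical flag standing in for the MO_AMASK test. */
static inline uint64_t tlb_compare_value(uint64_t addr, unsigned s_bits,
                                         bool must_align)
{
    uint64_t page_mask = ~((UINT64_C(1) << MODEL_PAGE_BITS) - 1);
    uint64_t size_minus_1 = (UINT64_C(1) << s_bits) - 1;

    if (must_align) {
        /* Keep the low bits: a misaligned address can never equal the
         * tag, so the access falls to the slow path and can trap. */
        return addr & (page_mask | size_minus_1);
    }
    /* Otherwise only a page-crossing access changes the page bits. */
    return (addr + size_minus_1) & page_mask;
}
```

With must_align set, a 4-byte access at 0x1002 keeps its low bits and
cannot match the tag 0x1000; without it, the same access matches and
only a page-crossing one such as 0x1ffe is sent to the slow path.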
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net