Message-ID: <1437360421.28088.133.camel@kernel.crashing.org>
From: Benjamin Herrenschmidt
Date: Mon, 20 Jul 2015 12:47:01 +1000
In-Reply-To: <1437357287.28088.132.camel@kernel.crashing.org>
References: <1437357287.28088.132.camel@kernel.crashing.org>
Subject: Re: [Qemu-devel] [RFC PATCH] tcg/ppc: Improve unaligned load/store handling on 64-bit backend
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, qemu-ppc@nongnu.org, Alexander Graf, Aurelien Jarno,
 Richard Henderson

On Mon, 2015-07-20 at 11:54 +1000, Benjamin Herrenschmidt wrote:
> Currently, we get to the slow path for any unaligned access in the
> backend, because we effectively preserve the bottom address bits
> below the alignment requirement when comparing with the TLB entry,
> so any non-zero bit there will cause the compare to fail.
>
> For the same number of instructions, we can instead add the access
> size - 1 to the address and stick to clearing all the bottom bits.
>
> That means that normal unaligned accesses will not fall back (the HW
> will handle them fine). Only when crossing a page boundary will we
> end up with a mismatch, because we'll end up pointing to the next
> page, which cannot possibly be in the same TLB entry.

Ignore that version of the patch, it's the wrong one; the right one is
on the way...

> Signed-off-by: Benjamin Herrenschmidt
> ---
>
> Note: I have verified things still work by booting an x86_64 Ubuntu
> installer on ppc64. I haven't noticed a large performance difference:
> getting to the full Xubuntu installer took 5:45 instead of 5:51 on the
> test machine I used, but I feel this is still worthwhile in case one
> hits a worst-case scenario with a lot of unaligned accesses.
>
> Note2: It would be nice to be able to pass larger loads/stores to the
> backend... it means we would need to use a higher bit in the TLB entry
> for "invalid" and a bunch more macros in the front-end, but it could
> be quite helpful for speeding up things like memcpy, which on ppc64
> uses vector loads/stores, or for speeding up the new ppc lq/stq
> instructions.
>
> Anybody already working on that?
>
> Note3: Hacking TCG is very new to me, so I apologize in advance for
> any stupid oversight. I also assume other backends can probably use
> the same trick, if not already...
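To make the trick above concrete, here is a minimal, self-contained C
sketch of the value that ends up being compared against the TLB tag
(illustrative only, not part of the patch; tlb_compare_value and
PAGE_BITS are made-up names standing in for the real macros): an access
that stays within one page masks down to the same page as its address,
while one that crosses a page boundary masks down to the next page and
therefore cannot match.

#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12                       /* stand-in for TARGET_PAGE_BITS */
#define PAGE_MASK (~((uint64_t)(1u << PAGE_BITS) - 1))

/* Value compared against the TLB entry's tag for an access of size
 * (1 << s_bits) at address addr, using the "add size - 1 then clear
 * the in-page bits" scheme described above.
 */
static uint64_t tlb_compare_value(uint64_t addr, unsigned s_bits)
{
    return (addr + (1u << s_bits) - 1) & PAGE_MASK;
}

int main(void)
{
    /* 4-byte load entirely inside one page: same page as addr -> can hit. */
    printf("%d\n", tlb_compare_value(0x1ff8, 2) == (0x1ff8 & PAGE_MASK));
    /* 4-byte load straddling the boundary: next page -> forced miss. */
    printf("%d\n", tlb_compare_value(0x1ffe, 2) == (0x1ffe & PAGE_MASK));
    return 0;
}

Running it prints 1 for the in-page access and 0 for the page-crossing
one, which is exactly the forced slow-path case the patch relies on.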
>
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index 2b6eafa..59864bf 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -1426,13 +1426,18 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
>      if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
>          tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
>                      (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
> -    } else if (!s_bits) {
> -        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo,
> -                    0, 63 - TARGET_PAGE_BITS);
>      } else {
> -        tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
> -                    64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - s_bits);
> -        tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
> +        /* Alignment check trick: We add the access_size-1 to the address
> +         * before masking the low bits. That will make the address overflow
> +         * to the next page if we cross a page boundary which will then
> +         * force a mismatch of the TLB compare since the next page cannot
> +         * possibly be in the same TLB index.
> +         */
> +        if (s_bits) {
> +            tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
> +        }
> +        tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
> +                    0, 63 - TARGET_PAGE_BITS);
>      }
> 
>      if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
>
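For readers less used to the PowerPC rotate-and-mask encodings, here is
a rough C model of the rldicr operation the new path emits (a sketch of
the architected semantics as I understand them, not code taken from
QEMU): with sh = 0 and me = 63 - TARGET_PAGE_BITS it clears exactly the
in-page offset bits of its source register, which is how the quoted
hunk masks the size-adjusted address down to a page boundary.

#include <stdint.h>

/* Model of "rldicr rd, rs, sh, me" in IBM bit numbering (bit 0 = MSB):
 * rotate rs left by sh, then clear every bit to the right of bit me.
 */
static uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me)
{
    uint64_t rotated = sh ? (rs << sh) | (rs >> (64 - sh)) : rs;
    uint64_t mask = ~0ULL << (63 - me);  /* keep bits 0..me, clear the rest */
    return rotated & mask;
}

/* rldicr(addr, 0, 63 - TARGET_PAGE_BITS) is then
 * addr & ~((1ULL << TARGET_PAGE_BITS) - 1), i.e. the page-aligned address.
 */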