Message-ID: <1437360421.28088.133.camel@kernel.crashing.org>
From: Benjamin Herrenschmidt
Date: Mon, 20 Jul 2015 12:47:01 +1000
In-Reply-To: <1437357287.28088.132.camel@kernel.crashing.org>
References: <1437357287.28088.132.camel@kernel.crashing.org>
Subject: Re: [Qemu-devel] [RFC PATCH] tcg/ppc: Improve unaligned load/store handling on 64-bit backend
To: qemu-devel@nongnu.org
Cc: Paolo Bonzini, qemu-ppc@nongnu.org, Alexander Graf, Aurelien Jarno,
 Richard Henderson

On Mon, 2015-07-20 at 11:54 +1000, Benjamin Herrenschmidt wrote:
> Currently, we get to the slow path for any unaligned access in the
> backend, because we effectively preserve the bottom address bits
> below the alignment requirement when comparing with the TLB entry,
> so any non-zero bit there will cause the compare to fail.
>
> For the same number of instructions, we can instead add the access
> size - 1 to the address and stick to clearing all the bottom bits.
>
> That means that normal unaligned accesses will not fall back (the HW
> will handle them fine). Only when crossing a page boundary will we
> end up with a mismatch, because we'll end up pointing to the next
> page, which cannot possibly be in the same TLB entry.

Ignore that version of the patch, it's the wrong one; the right one is
on the way...

> Signed-off-by: Benjamin Herrenschmidt
> ---
>
> Note: I have verified things still work by booting an x86_64 Ubuntu
> installer on ppc64. I haven't noticed a large performance difference:
> getting to the full Xubuntu installer took 5:45 instead of 5:51 on the
> test machine I used, but I feel this is still worthwhile in case one
> hits a worst-case scenario with a lot of unaligned accesses.
>
> Note2: It would be nice to be able to pass larger loads/stores to the
> backend... it means we would need to use a higher bit in the TLB entry
> for "invalid" and a bunch more macros in the front-end, but it could
> be quite helpful for speeding up things like memcpy, which on ppc64
> uses vector loads/stores, or for speeding up the new ppc lq/stq
> instructions.
>
> Anybody already working on that?
>
> Note3: Hacking TCG is very new to me, so I apologize in advance for
> any stupid oversight. I also assume other backends can probably use
> the same trick, if not already...
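To make the trick above concrete, here is a minimal, self-contained C
sketch of the value that ends up being compared against the TLB tag
(illustrative only, not part of the patch; tlb_compare_value and
PAGE_BITS are made-up names standing in for the real macros): an access
that stays within one page masks down to the same page as its address,
while one that crosses a page boundary masks down to the next page and
therefore cannot match.

#include <stdint.h>
#include <stdio.h>

#define PAGE_BITS 12                       /* stand-in for TARGET_PAGE_BITS */
#define PAGE_MASK (~((uint64_t)(1u << PAGE_BITS) - 1))

/* Value compared against the TLB entry's tag for an access of size
 * (1 << s_bits) at address addr, using the "add size - 1 then clear
 * the in-page bits" scheme described above.
 */
static uint64_t tlb_compare_value(uint64_t addr, unsigned s_bits)
{
    return (addr + (1u << s_bits) - 1) & PAGE_MASK;
}

int main(void)
{
    /* 4-byte load entirely inside one page: same page as addr -> can hit. */
    printf("%d\n", tlb_compare_value(0x1ff8, 2) == (0x1ff8 & PAGE_MASK));
    /* 4-byte load straddling the boundary: next page -> forced miss. */
    printf("%d\n", tlb_compare_value(0x1ffe, 2) == (0x1ffe & PAGE_MASK));
    return 0;
}

Running it prints 1 for the in-page access and 0 for the page-crossing
one, which is exactly the forced slow-path case the patch relies on.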
>
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index 2b6eafa..59864bf 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -1426,13 +1426,18 @@ static TCGReg tcg_out_tlb_read(TCGContext *s, TCGMemOp s_bits,
>      if (TCG_TARGET_REG_BITS == 32 || TARGET_LONG_BITS == 32) {
>          tcg_out_rlw(s, RLWINM, TCG_REG_R0, addrlo, 0,
>                      (32 - s_bits) & 31, 31 - TARGET_PAGE_BITS);
> -    } else if (!s_bits) {
> -        tcg_out_rld(s, RLDICR, TCG_REG_R0, addrlo,
> -                    0, 63 - TARGET_PAGE_BITS);
>      } else {
> -        tcg_out_rld(s, RLDICL, TCG_REG_R0, addrlo,
> -                    64 - TARGET_PAGE_BITS, TARGET_PAGE_BITS - s_bits);
> -        tcg_out_rld(s, RLDICL, TCG_REG_R0, TCG_REG_R0, TARGET_PAGE_BITS, 0);
> +        /* Alignment check trick: We add the access_size-1 to the address
> +         * before masking the low bits. That will make the address overflow
> +         * to the next page if we cross a page boundary which will then
> +         * force a mismatch of the TLB compare since the next page cannot
> +         * possibly be in the same TLB index.
> +         */
> +        if (s_bits) {
> +            tcg_out32(s, ADDI | TAI(TCG_REG_R0, addrlo, (1 << s_bits) - 1));
> +        }
> +        tcg_out_rld(s, RLDICR, TCG_REG_R0, TCG_REG_R0,
> +                    0, 63 - TARGET_PAGE_BITS);
>      }
> 
>      if (TCG_TARGET_REG_BITS < TARGET_LONG_BITS) {
>
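For readers less used to the PowerPC rotate-and-mask encodings, here is
a rough C model of the rldicr operation the new path emits (a sketch of
the architected semantics as I understand them, not code taken from
QEMU): with sh = 0 and me = 63 - TARGET_PAGE_BITS it clears exactly the
in-page offset bits of its source register, which is how the quoted
hunk masks the size-adjusted address down to a page boundary.

#include <stdint.h>

/* Model of "rldicr rd, rs, sh, me" in IBM bit numbering (bit 0 = MSB):
 * rotate rs left by sh, then clear every bit to the right of bit me.
 */
static uint64_t rldicr(uint64_t rs, unsigned sh, unsigned me)
{
    uint64_t rotated = sh ? (rs << sh) | (rs >> (64 - sh)) : rs;
    uint64_t mask = ~0ULL << (63 - me);  /* keep bits 0..me, clear the rest */
    return rotated & mask;
}

/* rldicr(addr, 0, 63 - TARGET_PAGE_BITS) is then
 * addr & ~((1ULL << TARGET_PAGE_BITS) - 1), i.e. the page-aligned address.
 */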