All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <rth@twiddle.net>
Cc: mttcg@greensocs.com, Peter Maydell <peter.maydell@linaro.org>,
	qemu-devel@nongnu.org,
	Alvise Rigo <a.rigo@virtualopensystems.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	alex.bennee@linaro.org,
	Frederic Konrad <fred.konrad@greensocs.com>
Subject: Re: [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry
Date: Sun, 10 May 2015 04:07:48 -0400	[thread overview]
Message-ID: <20150510080748.GA8629@flamenco> (raw)
In-Reply-To: <554D2FFE.20204@twiddle.net>

On Fri, May 08, 2015 at 14:51:58 -0700, Richard Henderson wrote:
> Ouch.  24 of 64 wasted bytes for 64-bit?
> 
> I wonder if there's a better way we can encode this to avoid 3 copies of the
> virtual address for read/write/code.  Or if we're better off using more than
> one insn to multiply by a non-power-of-two.

Adding one more instruction works well. Perf tests
(# time ./selftest.sh for [1]) show no appreciable difference.

Patch (i386-only) appended.

[1] http://wiki.qemu.org/download/ppc-virtexml507-linux-2_6_34.tgz

> Or if the hardware multiplier is
> fast enough just multiply by the proper constant.

This option should be slower (3-cycle latency for IMUL
on Ivy Bridge/Haswell, probably slower on others),
but would make the code very simple.
Unfortunately I couldn't write a patch to do it due to
my poor grasp of TCG backend code. If someone provides a
patch I'd be glad to test it.

If the appended ends up being the preferred option I
can extend it to support all the TCG targets. I cannot
however test all of them--only have access to x86 and ppc
hardware at the moment.

Thanks,

		Emilio

[PATCH] i386-only: remove sizeof(CPUTLBEntry)=pow2 constraint

Breaks all non-i386 TCG backends! Do not apply.

Signed-off-by: Emilio G. Cota <cota@braap.org>
---
 include/exec/cpu-defs.h | 23 +++++++++++++++++------
 tcg/i386/tcg-target.c   | 12 +++++++-----
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 8891f16..716052a 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -96,14 +96,25 @@ typedef struct CPUTLBEntry {
     /* Addend to virtual address to get host address.  IO accesses
        use the corresponding iotlb value.  */
     uintptr_t addend;
-    /* padding to get a power of two size */
-    uint8_t dummy[(1 << CPU_TLB_ENTRY_BITS) -
-                  (sizeof(target_ulong) * 4 +
-                   ((-sizeof(target_ulong) * 4) & (sizeof(uintptr_t) - 1)) +
-                   sizeof(uintptr_t))];
 } CPUTLBEntry;
 
-QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != (1 << CPU_TLB_ENTRY_BITS));
+/*
+ * Fast TLB hits are essential for softmmu performance. Since we hand-code in
+ * assembly the TCG check for TLB hits, we define here a pair of constants that
+ * allow us to use only shifts to obtain a TLBEntry address given its index.
+ * NOTE: The constants below should thus be updated every time changes are
+ * made to the CPUTLBEntry struct. See the compile-time consistency check below.
+ */
+#if TARGET_LONG_BITS == 64
+#define CPU_TLB_ENTRY_OUT_SHIFT 3
+#define CPU_TLB_ENTRY_IN_SHIFT  2
+#else
+#define CPU_TLB_ENTRY_OUT_SHIFT (HOST_LONG_BITS == 32 ? 2 : 3)
+#define CPU_TLB_ENTRY_IN_SHIFT  (HOST_LONG_BITS == 32 ? 2 : 1)
+#endif
+
+QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != \
+                (1 + BIT(CPU_TLB_ENTRY_IN_SHIFT)) << CPU_TLB_ENTRY_OUT_SHIFT);
 
 /* The IOTLB is not accessed directly inline by generated TCG code,
  * so the CPUIOTLBEntry layout is not as critical as that of the
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index ab63823..54250f5 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1195,15 +1195,17 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg addrlo, TCGReg addrhi,
     tcg_out_mov(s, htype, r0, addrlo);
     tcg_out_mov(s, ttype, r1, addrlo);
 
-    tcg_out_shifti(s, SHIFT_SHR + hrexw, r0,
-                   TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS);
+    tcg_out_shifti(s, SHIFT_SHR + hrexw, r0, TARGET_PAGE_BITS);
 
     tgen_arithi(s, ARITH_AND + trexw, r1,
                 TARGET_PAGE_MASK | ((1 << s_bits) - 1), 0);
-    tgen_arithi(s, ARITH_AND + hrexw, r0,
-                (CPU_TLB_SIZE - 1) << CPU_TLB_ENTRY_BITS, 0);
+    tgen_arithi(s, ARITH_AND + hrexw, r0, CPU_TLB_SIZE - 1, 0);
 
-    tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, TCG_AREG0, r0, 0,
+    tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, r0, r0,
+                             CPU_TLB_ENTRY_IN_SHIFT, 0);
+
+    tcg_out_modrm_sib_offset(s, OPC_LEA + hrexw, r0, TCG_AREG0, r0,
+                             CPU_TLB_ENTRY_OUT_SHIFT,
                              offsetof(CPUArchState, tlb_table[mem_index][0])
                              + which);
 
-- 
1.8.3

  reply	other threads:[~2015-05-10  8:07 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-05-08 21:02 [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 1/8] cputlb: add physical address to CPUTLBEntry Emilio G. Cota
2015-05-08 21:51   ` Richard Henderson
2015-05-10  8:07     ` Emilio G. Cota [this message]
2015-05-08 21:02 ` [Qemu-devel] [RFC 2/8] softmmu: add helpers to get ld/st physical addresses Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 3/8] tiny_set: add module to test for membership in a tiny set of pointers Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 4/8] radix-tree: add generic lockless radix tree module Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 5/8] aie: add module for Atomic Instruction Emulation Emilio G. Cota
2015-05-08 22:41   ` Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 6/8] aie: add target helpers Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 7/8] target-arm: emulate atomic instructions using AIE Emilio G. Cota
2015-05-08 21:02 ` [Qemu-devel] [RFC 8/8] target-i386: " Emilio G. Cota
2015-05-11 16:01 ` [Qemu-devel] [RFC 0/8] Helper-based Atomic Instruction Emulation (AIE) Frederic Konrad

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150510080748.GA8629@flamenco \
    --to=cota@braap.org \
    --cc=a.rigo@virtualopensystems.com \
    --cc=alex.bennee@linaro.org \
    --cc=fred.konrad@greensocs.com \
    --cc=mttcg@greensocs.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.