[Qemu-devel] [RFC PATCH] tcg/softmmu: Increase size of TLB cache
From: Pranith Kumar @ 2017-07-24 21:03 UTC
To: alex.bennee; +Cc: qemu-devel, rth, pbonzini
This patch increases the number of entries we allow in the softmmu TLB. I went
over a few host architectures to see whether increasing it is problematic. Only
armv6 seems to have a limitation that only 8 bits can be used for indexing
these entries. For the other architectures, I raised CPU_TLB_BITS_MAX to 12,
i.e. up to 4096 TLB entries per MMU mode.
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
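For reference, here is a minimal sketch of how the effective TLB size falls out
of the CPU_TLB_BITS macro with this change. The helper name and the concrete
backend values used below (31 displacement bits, 3 mode-index bits) are
illustrative assumptions, not taken from the tree:

    #include <stdio.h>

    #define MIN(a, b) ((a) < (b) ? (a) : (b))

    /* Mirrors the shape of CPU_TLB_BITS in include/exec/cpu-defs.h: the
     * index width is capped both by the per-backend CPU_TLB_BITS_MAX and
     * by how far the backend's TLB addressing sequence can reach into env. */
    static int effective_tlb_bits(int tlb_bits_max, int displacement_bits,
                                  int entry_bits, int mmu_mode_bits)
    {
        return MIN(tlb_bits_max,
                   displacement_bits - entry_bits - mmu_mode_bits);
    }

    int main(void)
    {
        /* Illustrative x86_64-host numbers: 32-byte entries (entry_bits = 5)
         * and roughly log2(NB_MMU_MODES) bits reserved for the mode index. */
        int bits = effective_tlb_bits(12 /* new CPU_TLB_BITS_MAX */, 31, 5, 3);
        printf("TLB bits: %d -> %d entries per MMU mode\n", bits, 1 << bits);
        return 0;
    }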
include/exec/cpu-defs.h | 5 ++++-
tcg/aarch64/tcg-target.h | 1 +
tcg/i386/tcg-target.h | 2 ++
tcg/mips/tcg-target.h | 1 +
tcg/s390/tcg-target.h | 1 +
tcg/sparc/tcg-target.h | 1 +
6 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 29b3c2ada8..cb81232b83 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -64,6 +64,9 @@ typedef uint64_t target_ulong;
#define CPU_TLB_ENTRY_BITS 5
#endif
+#ifndef CPU_TLB_BITS_MAX
+# define CPU_TLB_BITS_MAX 8
+#endif
/* TCG_TARGET_TLB_DISPLACEMENT_BITS is used in CPU_TLB_BITS to ensure that
* the TLB is not unnecessarily small, but still small enough for the
* TLB lookup instruction sequence used by the TCG target.
@@ -87,7 +90,7 @@ typedef uint64_t target_ulong;
* of tlb_table inside env (which is non-trivial but not huge).
*/
#define CPU_TLB_BITS \
- MIN(8, \
+ MIN(CPU_TLB_BITS_MAX, \
TCG_TARGET_TLB_DISPLACEMENT_BITS - CPU_TLB_ENTRY_BITS - \
(NB_MMU_MODES <= 1 ? 0 : \
NB_MMU_MODES <= 2 ? 1 : \
diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 55a46ac825..f428e09c98 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -15,6 +15,7 @@
#define TCG_TARGET_INSN_UNIT_SIZE 4
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 24
+#define CPU_TLB_BITS_MAX 12
#undef TCG_TARGET_STACK_GROWSUP
typedef enum {
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 73a15f7e80..35c27a977b 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -162,6 +162,8 @@ extern bool have_popcnt;
# define TCG_AREG0 TCG_REG_EBP
#endif
+#define CPU_TLB_BITS_MAX 12
+
static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
{
}
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index d75cb63ed3..fd9046b7ad 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -37,6 +37,7 @@
#define TCG_TARGET_INSN_UNIT_SIZE 4
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 16
+#define CPU_TLB_BITS_MAX 12
#define TCG_TARGET_NB_REGS 32
typedef enum {
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 957f0c0afe..218be322ad 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -27,6 +27,7 @@
#define TCG_TARGET_INSN_UNIT_SIZE 2
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 19
+#define CPU_TLB_BITS_MAX 12
typedef enum TCGReg {
TCG_REG_R0 = 0,
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index 854a0afd70..9fd59a64f2 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -29,6 +29,7 @@
#define TCG_TARGET_INSN_UNIT_SIZE 4
#define TCG_TARGET_TLB_DISPLACEMENT_BITS 32
+#define CPU_TLB_BITS_MAX 12
#define TCG_TARGET_NB_REGS 32
typedef enum {
--
2.13.0
Re: [Qemu-devel] [RFC PATCH] tcg/softmmu: Increase size of TLB cache
From: Paolo Bonzini @ 2017-07-24 21:19 UTC
To: Pranith Kumar, alex.bennee; +Cc: qemu-devel, rth
On 24/07/2017 23:03, Pranith Kumar wrote:
> This patch increases the number of entries we allow in the softmmu TLB. I went
> over a few host architectures to see whether increasing it is problematic. Only
> armv6 seems to have a limitation that only 8 bits can be used for indexing
> these entries. For the other architectures, I raised CPU_TLB_BITS_MAX to 12,
> i.e. up to 4096 TLB entries per MMU mode.
>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
How did you benchmark this, and can you plot (at least for x86 hosts)
the results as CPU_TLB_BITS_MAX grows from 8 to 12?
Thanks,
Paolo
Re: [Qemu-devel] [RFC PATCH] tcg/softmmu: Increase size of TLB cache
From: Richard Henderson @ 2017-07-24 21:45 UTC
To: Pranith Kumar, alex.bennee; +Cc: qemu-devel, pbonzini
On 07/24/2017 02:03 PM, Pranith Kumar wrote:
>
> +#ifndef CPU_TLB_BITS_MAX
> +# define CPU_TLB_BITS_MAX 8
You should simply require each backend to define this.
> +++ b/tcg/i386/tcg-target.h
> @@ -162,6 +162,8 @@ extern bool have_popcnt;
> # define TCG_AREG0 TCG_REG_EBP
> #endif
>
> +#define CPU_TLB_BITS_MAX 12
This is probably too much.
Exemplars:

  NB_MMU_MODES = 1   moxie
  NB_MMU_MODES = 2   m68k
  NB_MMU_MODES = 3   alpha
  NB_MMU_MODES = 7   arm
  NB_MMU_MODES = 8   ppc64

sizeof(CPUArchState):

  tlb bits \ modes        1        2        3        7        8
         8            13856    25840    38952    92024   182576
        12           198176   394480   591912  1382264  1657136
Having 1.5MB of TLB data seems excessive.
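A rough back-of-the-envelope for where the TLB portion of those numbers comes
from (a sketch only; the entry sizes are assumed for a 64-bit target, where a
padded CPUTLBEntry is 32 bytes and a CPUIOTLBEntry roughly 16 bytes, and the
rest of CPUArchState is ignored):

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        const size_t tlb_entry_sz   = 32;  /* 1 << CPU_TLB_ENTRY_BITS (assumed) */
        const size_t iotlb_entry_sz = 16;  /* assumed sizeof(CPUIOTLBEntry) */
        const int modes[] = { 1, 2, 3, 7, 8 };

        for (int bits = 8; bits <= 12; bits += 4) {
            for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
                size_t entries = (size_t)1 << bits;
                size_t tlb_bytes = modes[i] * entries
                                   * (tlb_entry_sz + iotlb_entry_sz);
                printf("tlb bits %2d, %d modes: ~%zu bytes of TLB data\n",
                       bits, modes[i], tlb_bytes);
            }
        }
        return 0;
    }

For 12 bits and 8 MMU modes this works out to roughly 1.5MB of TLB data alone,
consistent with the table above.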
Please let's get some performance numbers for various tlb bit sizes.
How much improvement do you get if you increase the size of the victim tlb cache?
r~
Re: [Qemu-devel] [RFC PATCH] tcg/softmmu: Increase size of TLB cache
From: Alex Bennée @ 2017-07-28 8:17 UTC
To: Paolo Bonzini; +Cc: Pranith Kumar, qemu-devel, rth
Paolo Bonzini <pbonzini@redhat.com> writes:
> On 24/07/2017 23:03, Pranith Kumar wrote:
>> This patch increases the number of entries we allow in the softmmu TLB. I went
>> over a few host architectures to see whether increasing it is problematic. Only
>> armv6 seems to have a limitation that only 8 bits can be used for indexing
>> these entries. For the other architectures, I raised CPU_TLB_BITS_MAX to 12,
>> i.e. up to 4096 TLB entries per MMU mode.
>>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>
> How did you benchmark this, and can you plot (at least for x86 hosts)
> the results as CPU_TLB_BITS_MAX grows from 8 to 12?
Pranith has some numbers, but what we were seeing is the TLB re-fill path
creeping up the perf profiles. Because it is so expensive to re-compute the
entries, pushing up the TLB size does ameliorate the problem.
That said, I don't think increasing the TLB size is our only solution.
What I've asked for is some idea of the pattern of evictions from the TLB
and of the performance of the victim cache. It may be that tweaking the
locality of that cache would be enough.
One idea I had: with an 8-bit TLB you could afford to have 256 dynamically
grown arrays in the victim path, one per TLB slot. Then at flush time you
could simply count up the number of victims in the array for each slot.
That would give you a good idea of whether some regions are hotter than
others.
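A very rough sketch of that kind of instrumentation (all names here are made
up for illustration; nothing like this exists in the tree, and a real patch
would hook the record/flush calls into the softmmu victim path):

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define TLB_SLOTS 256           /* 8-bit TLB index */

    typedef struct VictimLog {
        uint64_t *tags;             /* page tags evicted from this slot */
        size_t count;
        size_t capacity;
    } VictimLog;

    static VictimLog victim_log[TLB_SLOTS];

    /* Record an eviction from TLB slot 'idx' (would be called where the
     * entry is pushed into the victim cache). */
    static void victim_log_record(unsigned idx, uint64_t page_tag)
    {
        VictimLog *log = &victim_log[idx & (TLB_SLOTS - 1)];
        if (log->count == log->capacity) {
            log->capacity = log->capacity ? log->capacity * 2 : 16;
            log->tags = realloc(log->tags, log->capacity * sizeof(*log->tags));
            if (!log->tags) {
                abort();
            }
        }
        log->tags[log->count++] = page_tag;
    }

    /* Dump and reset the per-slot counts (would be called from the TLB
     * flush path) to show which slots run hot. */
    static void victim_log_flush(void)
    {
        for (unsigned i = 0; i < TLB_SLOTS; i++) {
            if (victim_log[i].count) {
                printf("slot %3u: %zu evictions\n", i, victim_log[i].count);
            }
            victim_log[i].count = 0;
        }
    }

    int main(void)
    {
        /* Toy usage: log a few synthetic evictions and dump the counts. */
        victim_log_record(3, 0x1000);
        victim_log_record(3, 0x2000);
        victim_log_record(200, 0x3000);
        victim_log_flush();
        return 0;
    }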
--
Alex Bennée