From: Richard Henderson <richard.henderson@linaro.org>
To: qemu-devel@nongnu.org
Cc: peter.maydell@linaro.org, "Emilio G. Cota" <cota@braap.org>
Subject: [Qemu-devel] [PULL v2 21/21] cputlb: read CPUTLBEntry.addr_write atomically
Date: Thu, 18 Oct 2018 23:06:56 -0700 [thread overview]
Message-ID: <20181019060656.7968-22-richard.henderson@linaro.org> (raw)
In-Reply-To: <20181019060656.7968-1-richard.henderson@linaro.org>
From: "Emilio G. Cota" <cota@braap.org>
Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).
This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.
1. aarch64 bootup+shutdown test:
- Before:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):
7487.087786 task-clock (msec) # 0.998 CPUs utilized ( +- 0.12% )
31,574,905,303 cycles # 4.217 GHz ( +- 0.12% )
57,097,908,812 instructions # 1.81 insns per cycle ( +- 0.08% )
10,255,415,367 branches # 1369.747 M/sec ( +- 0.08% )
173,278,962 branch-misses # 1.69% of all branches ( +- 0.18% )
7.504481349 seconds time elapsed ( +- 0.14% )
- After:
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):
7462.441328 task-clock (msec) # 0.998 CPUs utilized ( +- 0.07% )
31,478,476,520 cycles # 4.218 GHz ( +- 0.07% )
57,017,330,084 instructions # 1.81 insns per cycle ( +- 0.05% )
10,251,929,667 branches # 1373.804 M/sec ( +- 0.05% )
173,023,787 branch-misses # 1.69% of all branches ( +- 0.11% )
7.474970463 seconds time elapsed ( +- 0.07% )
2. SPEC06int:
SPEC06int (test set)
[Y axis: Speedup over master]
1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+
| |
1.1 +-+.................................+++.............................+ tlb-lock-v2 (m+++x) +-+
| +++ | +++ tlb-lock-v3 (spinl|ck) |
| +++ | | +++ +++ | | |
1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+
| ### ++#| # |# |# ***### +++### +++#+# | +++ | #|# ### |
1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+
| *+* # #++# *** # #### *** # * *++# ****+# *| * # ****|# |# # #|# #+# # # |
0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+
| * * # # # *|* # # # *|* # * * # *++* # * * # * * # * |* # ++# # # # *** # |
| * * # ++# # *+* # # # *|* # * * # * * # * * # * * # *++* # **** # ++# # * * # |
0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+
| * * # *** # * * # |# # *+* # * * # * * # * * # * * # * * # *++* # |# # * * # |
0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+
| * * # *+* # * * # *|* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # |
| * * # * * # * * # *+* # * * # * * # * * # * * # * * # * * # * * # * |* # * * # |
0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+
| * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # * * # |
0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+
400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean
png: https://imgur.com/a/BHzpPTW
Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
a spinlock and a single lock acquisition in tlb_set_page_with_attrs.
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181016153840.25877-1-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/softmmu_template.h | 12 ++++++------
include/exec/cpu_ldst.h | 11 ++++++++++-
include/exec/cpu_ldst_template.h | 2 +-
accel/tcg/cputlb.c | 19 +++++++++++++------
4 files changed, 30 insertions(+), 14 deletions(-)
diff --git a/accel/tcg/softmmu_template.h b/accel/tcg/softmmu_template.h
index 09538b5349..b0adea045e 100644
--- a/accel/tcg/softmmu_template.h
+++ b/accel/tcg/softmmu_template.h
@@ -280,7 +280,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
uintptr_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
- target_ulong tlb_addr = entry->addr_write;
+ target_ulong tlb_addr = tlb_addr_write(entry);
unsigned a_bits = get_alignment_bits(get_memop(oi));
uintptr_t haddr;
@@ -295,7 +295,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE,
mmu_idx, retaddr);
}
- tlb_addr = entry->addr_write & ~TLB_INVALID_MASK;
+ tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK;
}
/* Handle an IO access. */
@@ -325,7 +325,7 @@ void helper_le_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
cannot evict the first. */
page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
entry2 = tlb_entry(env, mmu_idx, page2);
- if (!tlb_hit_page(entry2->addr_write, page2)
+ if (!tlb_hit_page(tlb_addr_write(entry2), page2)
&& !VICTIM_TLB_HIT(addr_write, page2)) {
tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE,
mmu_idx, retaddr);
@@ -358,7 +358,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
uintptr_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
- target_ulong tlb_addr = entry->addr_write;
+ target_ulong tlb_addr = tlb_addr_write(entry);
unsigned a_bits = get_alignment_bits(get_memop(oi));
uintptr_t haddr;
@@ -373,7 +373,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
tlb_fill(ENV_GET_CPU(env), addr, DATA_SIZE, MMU_DATA_STORE,
mmu_idx, retaddr);
}
- tlb_addr = entry->addr_write & ~TLB_INVALID_MASK;
+ tlb_addr = tlb_addr_write(entry) & ~TLB_INVALID_MASK;
}
/* Handle an IO access. */
@@ -403,7 +403,7 @@ void helper_be_st_name(CPUArchState *env, target_ulong addr, DATA_TYPE val,
cannot evict the first. */
page2 = (addr + DATA_SIZE) & TARGET_PAGE_MASK;
entry2 = tlb_entry(env, mmu_idx, page2);
- if (!tlb_hit_page(entry2->addr_write, page2)
+ if (!tlb_hit_page(tlb_addr_write(entry2), page2)
&& !VICTIM_TLB_HIT(addr_write, page2)) {
tlb_fill(ENV_GET_CPU(env), page2, DATA_SIZE, MMU_DATA_STORE,
mmu_idx, retaddr);
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index f54d91ff68..959068495a 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -126,6 +126,15 @@ extern __thread uintptr_t helper_retaddr;
/* The memory helpers for tcg-generated code need tcg_target_long etc. */
#include "tcg.h"
+static inline target_ulong tlb_addr_write(const CPUTLBEntry *entry)
+{
+#if TCG_OVERSIZED_GUEST
+ return entry->addr_write;
+#else
+ return atomic_read(&entry->addr_write);
+#endif
+}
+
/* Find the TLB index corresponding to the mmu_idx + address pair. */
static inline uintptr_t tlb_index(CPUArchState *env, uintptr_t mmu_idx,
target_ulong addr)
@@ -439,7 +448,7 @@ static inline void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
tlb_addr = tlbentry->addr_read;
break;
case 1:
- tlb_addr = tlbentry->addr_write;
+ tlb_addr = tlb_addr_write(tlbentry);
break;
case 2:
tlb_addr = tlbentry->addr_code;
diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
index d21a0b59bf..0f061d47ef 100644
--- a/include/exec/cpu_ldst_template.h
+++ b/include/exec/cpu_ldst_template.h
@@ -177,7 +177,7 @@ glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
addr = ptr;
mmu_idx = CPU_MMU_INDEX;
entry = tlb_entry(env, mmu_idx, addr);
- if (unlikely(entry->addr_write !=
+ if (unlikely(tlb_addr_write(entry) !=
(addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
oi = make_memop_idx(SHIFT, mmu_idx);
glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(env, addr, v, oi,
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 28b770a404..af57aca5e4 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -258,7 +258,7 @@ static inline bool tlb_hit_page_anyprot(CPUTLBEntry *tlb_entry,
target_ulong page)
{
return tlb_hit_page(tlb_entry->addr_read, page) ||
- tlb_hit_page(tlb_entry->addr_write, page) ||
+ tlb_hit_page(tlb_addr_write(tlb_entry), page) ||
tlb_hit_page(tlb_entry->addr_code, page);
}
@@ -855,7 +855,7 @@ static void io_writex(CPUArchState *env, CPUIOTLBEntry *iotlbentry,
tlb_fill(cpu, addr, size, MMU_DATA_STORE, mmu_idx, retaddr);
entry = tlb_entry(env, mmu_idx, addr);
- tlb_addr = entry->addr_write;
+ tlb_addr = tlb_addr_write(entry);
if (!(tlb_addr & ~(TARGET_PAGE_MASK | TLB_RECHECK))) {
/* RAM access */
uintptr_t haddr = addr + entry->addend;
@@ -904,7 +904,14 @@ static bool victim_tlb_hit(CPUArchState *env, size_t mmu_idx, size_t index,
assert_cpu_is_self(ENV_GET_CPU(env));
for (vidx = 0; vidx < CPU_VTLB_SIZE; ++vidx) {
CPUTLBEntry *vtlb = &env->tlb_v_table[mmu_idx][vidx];
- target_ulong cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
+ target_ulong cmp;
+
+ /* elt_ofs might correspond to .addr_write, so use atomic_read */
+#if TCG_OVERSIZED_GUEST
+ cmp = *(target_ulong *)((uintptr_t)vtlb + elt_ofs);
+#else
+ cmp = atomic_read((target_ulong *)((uintptr_t)vtlb + elt_ofs));
+#endif
if (cmp == page) {
/* Found entry in victim tlb, swap tlb and iotlb. */
@@ -977,7 +984,7 @@ void probe_write(CPUArchState *env, target_ulong addr, int size, int mmu_idx,
uintptr_t index = tlb_index(env, mmu_idx, addr);
CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
- if (!tlb_hit(entry->addr_write, addr)) {
+ if (!tlb_hit(tlb_addr_write(entry), addr)) {
/* TLB entry is for a different page */
if (!VICTIM_TLB_HIT(addr_write, addr)) {
tlb_fill(ENV_GET_CPU(env), addr, size, MMU_DATA_STORE,
@@ -995,7 +1002,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
size_t mmu_idx = get_mmuidx(oi);
uintptr_t index = tlb_index(env, mmu_idx, addr);
CPUTLBEntry *tlbe = tlb_entry(env, mmu_idx, addr);
- target_ulong tlb_addr = tlbe->addr_write;
+ target_ulong tlb_addr = tlb_addr_write(tlbe);
TCGMemOp mop = get_memop(oi);
int a_bits = get_alignment_bits(mop);
int s_bits = mop & MO_SIZE;
@@ -1026,7 +1033,7 @@ static void *atomic_mmu_lookup(CPUArchState *env, target_ulong addr,
tlb_fill(ENV_GET_CPU(env), addr, 1 << s_bits, MMU_DATA_STORE,
mmu_idx, retaddr);
}
- tlb_addr = tlbe->addr_write & ~TLB_INVALID_MASK;
+ tlb_addr = tlb_addr_write(tlbe) & ~TLB_INVALID_MASK;
}
/* Notice an IO access or a needs-MMU-lookup access */
--
2.17.2
next prev parent reply other threads:[~2018-10-19 6:07 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-10-19 6:06 [Qemu-devel] [PULL v2 00/21] tcg patch queue Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 01/21] tcg: Implement CPU_LOG_TB_NOCHAIN during expansion Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 02/21] tcg: access cpu->icount_decr.u16.high with atomics Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 03/21] tcg: fix use of uninitialized variable under CONFIG_PROFILER Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 04/21] tcg: plug holes in struct TCGProfile Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 05/21] tcg: distribute tcg_time into TCG contexts Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 06/21] target/alpha: remove tlb_flush from alpha_cpu_initfn Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 07/21] target/unicore32: remove tlb_flush from uc32_init_fn Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 08/21] exec: introduce tlb_init Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 09/21] cputlb: fix assert_cpu_is_self macro Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 10/21] cputlb: serialize tlb updates with env->tlb_lock Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 11/21] tcg: Add tlb_index and tlb_entry helpers Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 12/21] tcg: Split CONFIG_ATOMIC128 Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 13/21] target/i386: Convert to HAVE_CMPXCHG128 Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 14/21] target/arm: " Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 15/21] target/arm: Check HAVE_CMPXCHG128 at translate time Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 16/21] target/ppc: Convert to HAVE_CMPXCHG128 and HAVE_ATOMIC128 Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 17/21] target/s390x: " Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 18/21] target/s390x: Split do_cdsg, do_lpq, do_stpq Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 19/21] target/s390x: Skip wout, cout helpers if op helper does not return Richard Henderson
2018-10-19 6:06 ` [Qemu-devel] [PULL v2 20/21] target/s390x: Check HAVE_ATOMIC128 and HAVE_CMPXCHG128 at translate Richard Henderson
2018-10-19 6:06 ` Richard Henderson [this message]
2018-10-19 18:01 ` [Qemu-devel] [PULL v2 00/21] tcg patch queue Peter Maydell
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20181019060656.7968-22-richard.henderson@linaro.org \
--to=richard.henderson@linaro.org \
--cc=cota@braap.org \
--cc=peter.maydell@linaro.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).