From: "Emilio G. Cota" <cota@braap.org>
To: qemu-devel@nongnu.org
Cc: Richard Henderson <rth@twiddle.net>
Subject: [Qemu-devel] [PATCH v3 09/43] tcg: consolidate TB lookups in tb_lookup__cpu_state
Date: Wed, 19 Jul 2017 23:08:55 -0400
Message-ID: <1500520169-23367-10-git-send-email-cota@braap.org>
In-Reply-To: <1500520169-23367-1-git-send-email-cota@braap.org>
This avoids duplicating code. cpu_exec_step will also use the
new common function once we integrate parallel_cpus into tb->cflags.
This commit also fixes a race that Richard Henderson described during
review. Consider the following interleaving of threads A and B:
   (A) Looks up the TB in tb_htable without holding tb_lock; the lookup succeeds
   (B) Sets the TB's tb->invalid flag
   (B) Removes the TB from tb_htable
   (B) Clears all CPUs' tb_jmp_cache
   (A) Stores the TB into its local tb_jmp_cache
Given that order of events, (A) will keep executing that invalid TB until
its tb_jmp_cache is flushed again, which in theory might never happen.
We can fix this by checking the tb->invalid flag every time we look up a TB
from tb_jmp_cache. In the above scenario, the next tb_jmp_cache lookup then
rejects the stale entry and falls back to tb_htable, from which the TB has
already been removed.
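The interleaving above can be replayed in a single thread to see why the
extra check matters. The sketch below is a simplified model, not QEMU code:
the TB and CPU structs, hash_pc, slow_lookup, invalidate and replay_race are
hypothetical names, the 16-slot tables are an arbitrary choice, and C11
atomics stand in for QEMU's atomic_* helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define JMP_CACHE_SIZE 16 /* hypothetical size, for illustration only */

typedef struct TB {
    uint64_t pc;
    atomic_bool invalid;
} TB;

typedef struct CPU {
    _Atomic(TB *) jmp_cache[JMP_CACHE_SIZE]; /* per-CPU fast cache */
} CPU;

/* Hypothetical stand-in for the global hash table: one slot per hash. */
static _Atomic(TB *) htable[JMP_CACHE_SIZE];

static unsigned hash_pc(uint64_t pc)
{
    return (unsigned)(pc % JMP_CACHE_SIZE);
}

/* Slow path: consult the global table only. */
static TB *slow_lookup(uint64_t pc)
{
    TB *tb = atomic_load(&htable[hash_pc(pc)]);
    return (tb && tb->pc == pc) ? tb : NULL;
}

/* Fast path: a cached entry is accepted only if it is still valid. */
static TB *lookup(CPU *cpu, uint64_t pc)
{
    unsigned h = hash_pc(pc);
    TB *tb = atomic_load(&cpu->jmp_cache[h]);

    if (tb && tb->pc == pc && !atomic_load(&tb->invalid)) {
        return tb;
    }
    tb = slow_lookup(pc);
    if (tb == NULL) {
        return NULL;
    }
    atomic_store(&cpu->jmp_cache[h], tb);
    return tb;
}

/* Thread B's three steps: mark invalid, unhash, clear the cache. */
static void invalidate(CPU *cpu, TB *tb)
{
    unsigned h = hash_pc(tb->pc);

    atomic_store(&tb->invalid, true);
    atomic_store(&htable[h], (TB *)NULL);
    atomic_store(&cpu->jmp_cache[h], (TB *)NULL);
}

/* Replays the A/B interleaving; returns true if the stale TB is rejected. */
static bool replay_race(void)
{
    static TB tb = { .pc = 0x1000 };
    static CPU cpu;
    unsigned h = hash_pc(tb.pc);

    atomic_store(&htable[h], &tb);
    TB *found = slow_lookup(tb.pc);         /* (A) lookup succeeds */
    invalidate(&cpu, &tb);                  /* (B) all three steps */
    atomic_store(&cpu.jmp_cache[h], found); /* (A) late store of stale TB */

    return lookup(&cpu, tb.pc) == NULL;
}
```

In this model, dropping the invalid check from lookup() would make the
final call return the stale TB, which is the behaviour the commit message
describes.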
Performance-wise, I measured a small improvement when booting debian-arm.
Note that inlining pays off:
Performance counter stats for 'taskset -c 0 qemu-system-arm \
-machine type=virt -nographic -smp 1 -m 4096 \
-netdev user,id=unet,hostfwd=tcp::2222-:22 \
-device virtio-net-device,netdev=unet \
-drive file=jessie.qcow2,id=myblock,index=0,if=none \
-device virtio-blk-device,drive=myblock \
-kernel kernel.img -append console=ttyAMA0 root=/dev/vda1 \
-name arm,debug-threads=on -smp 1' (10 runs):
Before:
      18714.917392 task-clock              # 0.952 CPUs utilized          ( +- 0.95% )
            23,142 context-switches        # 0.001 M/sec                  ( +- 0.50% )
                 1 CPU-migrations          # 0.000 M/sec
            10,558 page-faults             # 0.001 M/sec                  ( +- 0.95% )
    53,957,727,252 cycles                  # 2.883 GHz                    ( +- 0.91% ) [83.33%]
    24,440,599,852 stalled-cycles-frontend # 45.30% frontend cycles idle  ( +- 1.20% ) [83.33%]
    16,495,714,424 stalled-cycles-backend  # 30.57% backend cycles idle   ( +- 0.95% ) [66.66%]
    76,267,572,582 instructions            # 1.41 insns per cycle
                                           # 0.32 stalled cycles per insn ( +- 0.87% ) [83.34%]
    12,692,186,323 branches                # 678.186 M/sec                ( +- 0.92% ) [83.35%]
       263,486,879 branch-misses           # 2.08% of all branches        ( +- 0.73% ) [83.34%]

      19.648474449 seconds time elapsed                                   ( +- 0.82% )
After, w/ inline (this patch):
      18471.376627 task-clock              # 0.955 CPUs utilized          ( +- 0.96% )
            23,048 context-switches        # 0.001 M/sec                  ( +- 0.48% )
                 1 CPU-migrations          # 0.000 M/sec
            10,708 page-faults             # 0.001 M/sec                  ( +- 0.81% )
    53,208,990,796 cycles                  # 2.881 GHz                    ( +- 0.98% ) [83.34%]
    23,941,071,673 stalled-cycles-frontend # 44.99% frontend cycles idle  ( +- 0.95% ) [83.34%]
    16,161,773,848 stalled-cycles-backend  # 30.37% backend cycles idle   ( +- 0.76% ) [66.67%]
    75,786,269,766 instructions            # 1.42 insns per cycle
                                           # 0.32 stalled cycles per insn ( +- 1.24% ) [83.34%]
    12,573,617,143 branches                # 680.708 M/sec                ( +- 1.34% ) [83.33%]
       260,235,550 branch-misses           # 2.07% of all branches        ( +- 0.66% ) [83.33%]

      19.340502161 seconds time elapsed                                   ( +- 0.56% )
After, w/o inline:
      18791.253967 task-clock              # 0.954 CPUs utilized          ( +- 0.78% )
            23,230 context-switches        # 0.001 M/sec                  ( +- 0.42% )
                 1 CPU-migrations          # 0.000 M/sec
            10,563 page-faults             # 0.001 M/sec                  ( +- 1.27% )
    54,168,674,622 cycles                  # 2.883 GHz                    ( +- 0.80% ) [83.34%]
    24,244,712,629 stalled-cycles-frontend # 44.76% frontend cycles idle  ( +- 1.37% ) [83.33%]
    16,288,648,572 stalled-cycles-backend  # 30.07% backend cycles idle   ( +- 0.95% ) [66.66%]
    77,659,755,503 instructions            # 1.43 insns per cycle
                                           # 0.31 stalled cycles per insn ( +- 0.97% ) [83.34%]
    12,922,780,045 branches                # 687.702 M/sec                ( +- 1.06% ) [83.34%]
       261,962,386 branch-misses           # 2.03% of all branches        ( +- 0.71% ) [83.35%]

      19.700174670 seconds time elapsed                                   ( +- 0.56% )
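To put the elapsed times above in relative terms, the deltas can be computed
against the "Before" run. This is only a small arithmetic sketch; pct_change
and print_elapsed_deltas are hypothetical helper names, and the inputs are
the "seconds time elapsed" values quoted above.

```c
#include <stdio.h>

/* Relative change of `after` with respect to `before`, in percent. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Prints the elapsed-time deltas for the two "After" configurations. */
static void print_elapsed_deltas(void)
{
    const double before = 19.648474449;         /* Before */
    const double with_inline = 19.340502161;    /* After, w/ inline */
    const double without_inline = 19.700174670; /* After, w/o inline */

    printf("w/ inline:  %+.2f%%\n", pct_change(before, with_inline));
    printf("w/o inline: %+.2f%%\n", pct_change(before, without_inline));
}
```

This works out to roughly -1.6% with inlining and +0.3% without it. Both
should be read against the reported run-to-run variation of 0.56% to 0.82%
on the elapsed time.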
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Emilio G. Cota <cota@braap.org>
---
include/exec/tb-lookup.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++++
accel/tcg/cpu-exec.c | 47 ++++++++++++++++++----------------------------
tcg/tcg-runtime.c | 24 ++++++------------------
3 files changed, 73 insertions(+), 47 deletions(-)
create mode 100644 include/exec/tb-lookup.h
diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
new file mode 100644
index 0000000..9d32cb0
--- /dev/null
+++ b/include/exec/tb-lookup.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright (C) 2017, Emilio G. Cota <cota@braap.org>
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+#ifndef EXEC_TB_LOOKUP_H
+#define EXEC_TB_LOOKUP_H
+
+#include "qemu/osdep.h"
+
+#ifdef NEED_CPU_H
+#include "cpu.h"
+#else
+#include "exec/poison.h"
+#endif
+
+#include "exec/exec-all.h"
+#include "exec/tb-hash.h"
+
+/* Might cause an exception, so have a longjmp destination ready */
+static inline TranslationBlock *
+tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
+                     uint32_t *flags)
+{
+    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
+    TranslationBlock *tb;
+    uint32_t hash;
+
+    cpu_get_tb_cpu_state(env, pc, cs_base, flags);
+    hash = tb_jmp_cache_hash_func(*pc);
+    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
+    if (likely(tb &&
+               tb->pc == *pc &&
+               tb->cs_base == *cs_base &&
+               tb->flags == *flags &&
+               tb->trace_vcpu_dstate == *cpu->trace_dstate &&
+               !atomic_read(&tb->invalid))) {
+        return tb;
+    }
+    tb = tb_htable_lookup(cpu, *pc, *cs_base, *flags);
+    if (tb == NULL) {
+        return NULL;
+    }
+    atomic_set(&cpu->tb_jmp_cache[hash], tb);
+    return tb;
+}
+
+#endif /* EXEC_TB_LOOKUP_H */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c4c289b..5d2ee5b 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -28,6 +28,7 @@
#include "exec/address-spaces.h"
#include "qemu/rcu.h"
#include "exec/tb-hash.h"
+#include "exec/tb-lookup.h"
#include "exec/log.h"
#include "qemu/main-loop.h"
#if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
@@ -333,43 +334,31 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
                                         TranslationBlock *last_tb,
                                         int tb_exit)
 {
-    CPUArchState *env = (CPUArchState *)cpu->env_ptr;
     TranslationBlock *tb;
     target_ulong cs_base, pc;
     uint32_t flags;
     bool acquired_tb_lock = false;
 
-    /* we record a subset of the CPU state. It will
-       always be the same before a given translated block
-       is executed. */
-    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    tb = atomic_rcu_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
-    if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
-                 tb->flags != flags ||
-                 tb->trace_vcpu_dstate != *cpu->trace_dstate)) {
-        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-        if (!tb) {
-
-            /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
-             * taken outside tb_lock. As system emulation is currently
-             * single threaded the locks are NOPs.
-             */
-            mmap_lock();
-            tb_lock();
-            acquired_tb_lock = true;
-
-            /* There's a chance that our desired tb has been translated while
-             * taking the locks so we check again inside the lock.
-             */
-            tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-            if (!tb) {
-                /* if no translated code available, then translate it now */
-                tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
-            }
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    if (tb == NULL) {
+        /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+         * taken outside tb_lock. As system emulation is currently
+         * single threaded the locks are NOPs.
+         */
+        mmap_lock();
+        tb_lock();
+        acquired_tb_lock = true;
 
-            mmap_unlock();
+        /* There's a chance that our desired tb has been translated while
+         * taking the locks so we check again inside the lock.
+         */
+        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
+        if (likely(tb == NULL)) {
+            /* if no translated code available, then translate it now */
+            tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
         }
 
+        mmap_unlock();
         /* We add the TB in the virtual pc hash table for the fast lookup */
         atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
     }
diff --git a/tcg/tcg-runtime.c b/tcg/tcg-runtime.c
index e85a042..7100339 100644
--- a/tcg/tcg-runtime.c
+++ b/tcg/tcg-runtime.c
@@ -27,7 +27,7 @@
#include "exec/helper-proto.h"
#include "exec/cpu_ldst.h"
#include "exec/exec-all.h"
-#include "exec/tb-hash.h"
+#include "exec/tb-lookup.h"
#include "disas/disas.h"
#include "exec/log.h"
@@ -149,24 +149,12 @@ void *HELPER(lookup_tb_ptr)(CPUArchState *env)
     CPUState *cpu = ENV_GET_CPU(env);
     TranslationBlock *tb;
     target_ulong cs_base, pc;
-    uint32_t flags, hash;
-
-    cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
-    hash = tb_jmp_cache_hash_func(pc);
-    tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
-
-    if (unlikely(!(tb
-                   && tb->pc == pc
-                   && tb->cs_base == cs_base
-                   && tb->flags == flags
-                   && tb->trace_vcpu_dstate == *cpu->trace_dstate))) {
-        tb = tb_htable_lookup(cpu, pc, cs_base, flags);
-        if (!tb) {
-            return tcg_ctx.code_gen_epilogue;
-        }
-        atomic_set(&cpu->tb_jmp_cache[hash], tb);
-    }
+    uint32_t flags;
 
+    tb = tb_lookup__cpu_state(cpu, &pc, &cs_base, &flags);
+    if (tb == NULL) {
+        return tcg_ctx.code_gen_epilogue;
+    }
     qemu_log_mask_and_addr(CPU_LOG_EXEC, pc,
                            "Chain %p [%d: " TARGET_FMT_lx "] %s\n",
                            tb->tc_ptr, cpu->cpu_index, pc,
--
2.7.4