* [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path
@ 2016-07-15 17:58 Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 01/12] util/qht: Document memory ordering assumptions Sergey Fedorov
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée
From: Sergey Fedorov <serge.fdrv@gmail.com>
Hi,
This is a respin of this series [1].
Here I used a modified version of Paolo's patch to document memory
ordering assumptions for certain QHT operations.
The last patch is a suggestion for renaming tb_find_physical().
This series can be fetched from the public git repository:
https://github.com/sergefdrv/qemu.git lockless-tb-lookup-v4
[1] http://thread.gmane.org/gmane.comp.emulators.qemu/426341
Kind regards,
Sergey
Summary of changes in v4:
- Modified version of Paolo's patch is used to document memory ordering
assumptions for certain QHT operations
- Intermediate compilation errors fixed
- Atomic access to TB CPU state
- tb_find_physical() renamed
Summary of changes in v3:
- QHT memory ordering assumptions documented
- 'tb_jmp_cache' reset in tb_flush() made atomic
- explicit memory barriers removed around 'tb_jmp_cache' access
- safe access to 'tb_flushed' out of 'tb_lock' prepared
- TBs marked with invalid CPU state early on invalidation
- Alex's tb_find_{fast,slow}() roll-up related patches dropped
- bouncing of tb_lock between tb_gen_code() and tb_add_jump() avoided
with local variable 'have_tb_lock'
- tb_find_{fast,slow}() merged
Alex Bennée (2):
tcg: set up tb->page_addr before insertion
tcg: cpu-exec: remove tb_lock from the hot-path
Paolo Bonzini (1):
util/qht: Document memory ordering assumptions
Sergey Fedorov (9):
tcg: Pass last_tb by value to tb_find_fast()
tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
tcg: Prepare safe access to tb_flushed out of tb_lock
target-i386: Remove redundant HF_SOFTMMU_MASK
tcg: Introduce tb_mark_invalid() and tb_is_invalid()
tcg: Prepare TB invalidation for lockless TB lookup
tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
tcg: Merge tb_find_slow() and tb_find_fast()
tcg: rename tb_find_physical()
cpu-exec.c | 117 +++++++++++++++++++++--------------------------
include/exec/exec-all.h | 16 +++++++
include/qemu/qht.h | 5 ++
target-alpha/cpu.h | 14 ++++++
target-arm/cpu.h | 14 ++++++
target-cris/cpu.h | 14 ++++++
target-i386/cpu.c | 3 --
target-i386/cpu.h | 20 ++++++--
target-i386/translate.c | 12 ++---
target-lm32/cpu.h | 14 ++++++
target-m68k/cpu.h | 14 ++++++
target-microblaze/cpu.h | 14 ++++++
target-mips/cpu.h | 14 ++++++
target-moxie/cpu.h | 14 ++++++
target-openrisc/cpu.h | 14 ++++++
target-ppc/cpu.h | 14 ++++++
target-s390x/cpu.h | 14 ++++++
target-sh4/cpu.h | 14 ++++++
target-sparc/cpu.h | 14 ++++++
target-sparc/translate.c | 1 +
target-tilegx/cpu.h | 14 ++++++
target-tricore/cpu.h | 14 ++++++
target-unicore32/cpu.h | 14 ++++++
target-xtensa/cpu.h | 14 ++++++
translate-all.c | 29 ++++++------
util/qht.c | 7 ++-
26 files changed, 352 insertions(+), 96 deletions(-)
--
2.9.1
* [Qemu-devel] [PATCH v4 01/12] util/qht: Document memory ordering assumptions
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 02/12] tcg: Pass last_tb by value to tb_find_fast() Sergey Fedorov
` (11 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée
From: Paolo Bonzini <pbonzini@redhat.com>
It is naturally expected that some memory ordering should be provided
around qht_insert() and qht_lookup(). Document these assumptions in the
header file and put some comments in the source to denote how those
memory ordering requirements are fulfilled.
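As an illustrative sketch (not part of this patch; 'struct obj',
'obj_cmp' and 'use' are hypothetical), the publish/consume pattern these
barriers support is:

  /* Writer: initialize the object completely before publishing it. */
  struct obj *o = g_new0(struct obj, 1);
  o->data = data;            /* plain stores ... */
  qht_insert(ht, o, hash);   /* ... ordered by the implied smp_wmb() */

  /* Reader, inside an RCU read-side critical section: */
  struct obj *found = qht_lookup(ht, obj_cmp, &key, hash);
  if (found) {
      /* The implied smp_read_barrier_depends() guarantees the read
       * below observes the writer's initializing stores. */
      use(found->data);
  }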
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[Sergey Fedorov: commit title and message provided;
comment on qht_remove() elided]
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
---
Changes in v4:
- Modified version of Paolo's patch is used
---
include/qemu/qht.h | 5 +++++
util/qht.c | 7 ++++++-
2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/include/qemu/qht.h b/include/qemu/qht.h
index 70bfc68b8d67..311139b85a9a 100644
--- a/include/qemu/qht.h
+++ b/include/qemu/qht.h
@@ -69,6 +69,9 @@ void qht_destroy(struct qht *ht);
* Attempting to insert a NULL @p is a bug.
* Inserting the same pointer @p with different @hash values is a bug.
*
+ * In case of successful operation, smp_wmb() is implied before the pointer is
+ * inserted into the hash table.
+ *
* Returns true on success.
* Returns false if the @p-@hash pair already exists in the hash table.
*/
@@ -83,6 +86,8 @@ bool qht_insert(struct qht *ht, void *p, uint32_t hash);
*
* Needs to be called under an RCU read-critical section.
*
+ * smp_read_barrier_depends() is implied before the call to @func.
+ *
* The user-provided @func compares pointers in QHT against @userp.
* If the function returns true, a match has been found.
*
diff --git a/util/qht.c b/util/qht.c
index 40d6e218f759..28ce289245a7 100644
--- a/util/qht.c
+++ b/util/qht.c
@@ -445,7 +445,11 @@ void *qht_do_lookup(struct qht_bucket *head, qht_lookup_func_t func,
do {
for (i = 0; i < QHT_BUCKET_ENTRIES; i++) {
if (b->hashes[i] == hash) {
- void *p = atomic_read(&b->pointers[i]);
+ /* The pointer is dereferenced before seqlock_read_retry,
+ * so (unlike qht_insert__locked) we need to use
+ * atomic_rcu_read here.
+ */
+ void *p = atomic_rcu_read(&b->pointers[i]);
if (likely(p) && likely(func(p, userp))) {
return p;
@@ -535,6 +539,7 @@ static bool qht_insert__locked(struct qht *ht, struct qht_map *map,
atomic_rcu_set(&prev->next, b);
}
b->hashes[i] = hash;
+ /* smp_wmb() implicit in seqlock_write_begin. */
atomic_set(&b->pointers[i], p);
seqlock_write_end(&head->sequence);
return true;
--
2.9.1
* [Qemu-devel] [PATCH v4 02/12] tcg: Pass last_tb by value to tb_find_fast()
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 01/12] util/qht: Document memory ordering assumptions Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 03/12] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Sergey Fedorov
` (10 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Sergey Fedorov <serge.fdrv@gmail.com>
This is a small clean-up. tb_find_fast() is the final consumer of this
variable, so there is no need to pass it by reference. 'last_tb' is
always updated by the subsequent cpu_loop_exec_tb() in cpu_exec().
This change also simplifies calling cpu_exec_nocache() in
cpu_handle_exception().
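For instance, the call site in cpu_handle_exception() no longer needs a
dummy local variable just to pass a reference (the exact change from the
diff below):

  /* before: */
  TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
  cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);

  /* after: */
  cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);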
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
Changes in v4:
- Compile error fixed (missed conversion)
---
cpu-exec.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index b840e1d2dd41..974de6aa27ee 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -320,7 +320,7 @@ found:
}
static inline TranslationBlock *tb_find_fast(CPUState *cpu,
- TranslationBlock **last_tb,
+ TranslationBlock *last_tb,
int tb_exit)
{
CPUArchState *env = (CPUArchState *)cpu->env_ptr;
@@ -342,7 +342,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
/* Ensure that no TB jump will be modified as the
* translation buffer has been flushed.
*/
- *last_tb = NULL;
+ last_tb = NULL;
cpu->tb_flushed = false;
}
#ifndef CONFIG_USER_ONLY
@@ -351,12 +351,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
* spanning two pages because the mapping for the second page can change.
*/
if (tb->page_addr[1] != -1) {
- *last_tb = NULL;
+ last_tb = NULL;
}
#endif
/* See if we can patch the calling TB. */
- if (*last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
- tb_add_jump(*last_tb, tb_exit, tb);
+ if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+ tb_add_jump(last_tb, tb_exit, tb);
}
tb_unlock();
return tb;
@@ -437,8 +437,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
} else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
/* try to cause an exception pending in the log */
- TranslationBlock *last_tb = NULL; /* Avoid chaining TBs */
- cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, &last_tb, 0), true);
+ cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
*ret = -1;
return true;
#endif
@@ -622,7 +621,7 @@ int cpu_exec(CPUState *cpu)
cpu->tb_flushed = false; /* reset before first TB lookup */
for(;;) {
cpu_handle_interrupt(cpu, &last_tb);
- tb = tb_find_fast(cpu, &last_tb, tb_exit);
+ tb = tb_find_fast(cpu, last_tb, tb_exit);
cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
/* Try to align the host and virtual clocks
if the guest is in advance */
--
2.9.1
* [Qemu-devel] [PATCH v4 03/12] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 01/12] util/qht: Document memory ordering assumptions Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 02/12] tcg: Pass last_tb by value to tb_find_fast() Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 04/12] tcg: Prepare safe access to tb_flushed " Sergey Fedorov
` (9 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Sergey Fedorov <serge.fdrv@gmail.com>
Ensure atomicity of the CPU's 'tb_jmp_cache' access for future
translation block lookup out of 'tb_lock'.
Note that this patch does *not* make CPU's TLB invalidation safe if it
is done from some other thread while the CPU is in its execution loop.
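As a minimal sketch of the pattern being prepared (both halves appear in
the diff below), all accesses go through the atomic accessors so a
concurrent reader can never observe a torn pointer:

  /* writer, e.g. publishing a TB found on the slow path: */
  atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);

  /* reader, potentially racing with the writer above: */
  tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);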
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
[AJB: fixed missing atomic set, tweak title]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
[Sergey Fedorov: removed explicit memory barriers;
removed unnecessary atomic_read();
tweaked commit title and message]
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
Changes in v3:
- explicit memory barriers removed
- memset() on 'tb_jmp_cache' replaced with a loop on atomic_set()
Changes in v2:
- fix spelling s/con't/can't/
- add atomic_read while clearing tb_jmp_cache
- add r-b tags
Changes in v1 (AJB):
- tweak title
- fixed missing set of tb_jmp_cache
---
cpu-exec.c | 4 ++--
translate-all.c | 10 +++++++---
2 files changed, 9 insertions(+), 5 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index 974de6aa27ee..2fd1875a7317 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -315,7 +315,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
found:
/* we add the TB in the virtual pc hash table */
- cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)] = tb;
+ atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
return tb;
}
@@ -333,7 +333,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
is executed. */
cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
tb_lock();
- tb = cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)];
+ tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
tb->flags != flags)) {
tb = tb_find_slow(cpu, pc, cs_base, flags);
diff --git a/translate-all.c b/translate-all.c
index 0d47c1c0cf82..fdf520a86d68 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -848,7 +848,11 @@ void tb_flush(CPUState *cpu)
tcg_ctx.tb_ctx.nb_tbs = 0;
CPU_FOREACH(cpu) {
- memset(cpu->tb_jmp_cache, 0, sizeof(cpu->tb_jmp_cache));
+ int i;
+
+ for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
+ atomic_set(&cpu->tb_jmp_cache[i], NULL);
+ }
cpu->tb_flushed = true;
}
@@ -1007,8 +1011,8 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
/* remove the TB from the hash list */
h = tb_jmp_cache_hash_func(tb->pc);
CPU_FOREACH(cpu) {
- if (cpu->tb_jmp_cache[h] == tb) {
- cpu->tb_jmp_cache[h] = NULL;
+ if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
+ atomic_set(&cpu->tb_jmp_cache[h], NULL);
}
}
--
2.9.1
* [Qemu-devel] [PATCH v4 04/12] tcg: Prepare safe access to tb_flushed out of tb_lock
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (2 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 03/12] tcg: Prepare safe tb_jmp_cache lookup out of tb_lock Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 05/12] target-i386: Remove redundant HF_SOFTMMU_MASK Sergey Fedorov
` (8 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Sergey Fedorov <serge.fdrv@gmail.com>
Ensure atomicity and ordering of the CPU's 'tb_flushed' access for
future translation block lookup out of 'tb_lock'.
This field can only be touched from another thread by tb_flush() in user
mode emulation. So the only accesses that need to be sequentially atomic
are:
* the single write in tb_flush();
* reads/writes performed out of 'tb_lock'.
In the future, before MTTCG can be enabled in system mode, tb_flush()
must be made thread-safe, at which point this field becomes unnecessary.
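Sketched, the pairing this patch sets up is:

  /* tb_flush(), which may run in another thread in user mode: */
  atomic_mb_set(&cpu->tb_flushed, true);

  /* CPU loop, consuming the flag while holding tb_lock: */
  if (cpu->tb_flushed) {
      cpu->tb_flushed = false;   /* plain access is fine under tb_lock */
  } else {
      tb_add_jump(last_tb, tb_exit, tb);
  }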
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
Changes in v4:
- Commit message tweaked
---
cpu-exec.c | 16 +++++++---------
translate-all.c | 4 ++--
2 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index 2fd1875a7317..c973e3b85922 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -338,13 +338,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
tb->flags != flags)) {
tb = tb_find_slow(cpu, pc, cs_base, flags);
}
- if (cpu->tb_flushed) {
- /* Ensure that no TB jump will be modified as the
- * translation buffer has been flushed.
- */
- last_tb = NULL;
- cpu->tb_flushed = false;
- }
#ifndef CONFIG_USER_ONLY
/* We don't take care of direct jumps when address mapping changes in
* system emulation. So it's not safe to make a direct jump to a TB
@@ -356,7 +349,12 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
#endif
/* See if we can patch the calling TB. */
if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
- tb_add_jump(last_tb, tb_exit, tb);
+ /* Check if translation buffer has been flushed */
+ if (cpu->tb_flushed) {
+ cpu->tb_flushed = false;
+ } else {
+ tb_add_jump(last_tb, tb_exit, tb);
+ }
}
tb_unlock();
return tb;
@@ -618,7 +616,7 @@ int cpu_exec(CPUState *cpu)
}
last_tb = NULL; /* forget the last executed TB after exception */
- cpu->tb_flushed = false; /* reset before first TB lookup */
+ atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
for(;;) {
cpu_handle_interrupt(cpu, &last_tb);
tb = tb_find_fast(cpu, last_tb, tb_exit);
diff --git a/translate-all.c b/translate-all.c
index fdf520a86d68..788fed1e0765 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -845,7 +845,6 @@ void tb_flush(CPUState *cpu)
> tcg_ctx.code_gen_buffer_size) {
cpu_abort(cpu, "Internal error: code buffer overflow\n");
}
- tcg_ctx.tb_ctx.nb_tbs = 0;
CPU_FOREACH(cpu) {
int i;
@@ -853,9 +852,10 @@ void tb_flush(CPUState *cpu)
for (i = 0; i < TB_JMP_CACHE_SIZE; ++i) {
atomic_set(&cpu->tb_jmp_cache[i], NULL);
}
- cpu->tb_flushed = true;
+ atomic_mb_set(&cpu->tb_flushed, true);
}
+ tcg_ctx.tb_ctx.nb_tbs = 0;
qht_reset_size(&tcg_ctx.tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
page_flush_tb();
--
2.9.1
* [Qemu-devel] [PATCH v4 05/12] target-i386: Remove redundant HF_SOFTMMU_MASK
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (3 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 04/12] tcg: Prepare safe access to tb_flushed " Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 06/12] tcg: Introduce tb_mark_invalid() and tb_is_invalid() Sergey Fedorov
` (7 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Eduardo Habkost
From: Sergey Fedorov <serge.fdrv@gmail.com>
'HF_SOFTMMU_MASK' is only set when 'CONFIG_SOFTMMU' is defined. So
there's no need for this flag: test 'CONFIG_SOFTMMU' directly instead.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
target-i386/cpu.c | 3 ---
target-i386/cpu.h | 3 ---
target-i386/translate.c | 12 ++++--------
3 files changed, 4 insertions(+), 14 deletions(-)
diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index fc209ee1cb8a..6e49e4ca8282 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -2725,9 +2725,6 @@ static void x86_cpu_reset(CPUState *s)
/* init to reset state */
-#ifdef CONFIG_SOFTMMU
- env->hflags |= HF_SOFTMMU_MASK;
-#endif
env->hflags2 |= HF2_GIF_MASK;
cpu_x86_update_cr0(env, 0x60000010);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 776efe630ea3..5b14a72baa6f 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -130,8 +130,6 @@
positions to ease oring with eflags. */
/* current cpl */
#define HF_CPL_SHIFT 0
-/* true if soft mmu is being used */
-#define HF_SOFTMMU_SHIFT 2
/* true if hardware interrupts must be disabled for next instruction */
#define HF_INHIBIT_IRQ_SHIFT 3
/* 16 or 32 segments */
@@ -161,7 +159,6 @@
#define HF_MPX_IU_SHIFT 26 /* BND registers in-use */
#define HF_CPL_MASK (3 << HF_CPL_SHIFT)
-#define HF_SOFTMMU_MASK (1 << HF_SOFTMMU_SHIFT)
#define HF_INHIBIT_IRQ_MASK (1 << HF_INHIBIT_IRQ_SHIFT)
#define HF_CS32_MASK (1 << HF_CS32_SHIFT)
#define HF_SS32_MASK (1 << HF_SS32_SHIFT)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 7dea18bd6345..e81fce7bc2b5 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8224,9 +8224,9 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
dc->popl_esp_hack = 0;
/* select memory access functions */
dc->mem_index = 0;
- if (flags & HF_SOFTMMU_MASK) {
- dc->mem_index = cpu_mmu_index(env, false);
- }
+#ifdef CONFIG_SOFTMMU
+ dc->mem_index = cpu_mmu_index(env, false);
+#endif
dc->cpuid_features = env->features[FEAT_1_EDX];
dc->cpuid_ext_features = env->features[FEAT_1_ECX];
dc->cpuid_ext2_features = env->features[FEAT_8000_0001_EDX];
@@ -8239,11 +8239,7 @@ void gen_intermediate_code(CPUX86State *env, TranslationBlock *tb)
#endif
dc->flags = flags;
dc->jmp_opt = !(dc->tf || cs->singlestep_enabled ||
- (flags & HF_INHIBIT_IRQ_MASK)
-#ifndef CONFIG_SOFTMMU
- || (flags & HF_SOFTMMU_MASK)
-#endif
- );
+ (flags & HF_INHIBIT_IRQ_MASK));
/* Do not optimize repz jumps at all in icount mode, because
rep movsS instructions are executed with different paths
in !repz_opt and repz_opt modes. The first one was used
--
2.9.1
* [Qemu-devel] [PATCH v4 06/12] tcg: Introduce tb_mark_invalid() and tb_is_invalid()
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (4 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 05/12] target-i386: Remove redundant HF_SOFTMMU_MASK Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 07/12] tcg: Prepare TB invalidation for lockless TB lookup Sergey Fedorov
` (6 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite, Edgar E. Iglesias, Eduardo Habkost,
Michael Walle, Aurelien Jarno, Leon Alrae, Anthony Green, Jia Liu,
David Gibson, Alexander Graf, Mark Cave-Ayland, Artyom Tarasenko,
Bastian Koppelmann, Guan Xuetao, Max Filippov, qemu-arm, qemu-ppc
From: Sergey Fedorov <serge.fdrv@gmail.com>
These functions will be used to make translation block invalidation safe
with concurrent lockless lookup in the global hash table.
Most targets don't use 'cs_base', so marking a TB as invalid is as
simple as assigning -1 to 'cs_base'. The SPARC target stores the next
program counter into 'cs_base', and -1 is a fine invalid value since the
PC must be a multiple of 4 on SPARC. The only odd target is i386, for
which a special flag is introduced in place of the removed
'HF_SOFTMMU_MASK'.
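The intended usage (a sketch; the actual call sites arrive in a later
patch of this series) is:

  /* invalidation side: store a CPU state no valid TB can carry */
  tb_mark_invalid(tb);

  /* lookup side: refuse to chain to a TB that has gone invalid */
  if (!tb_is_invalid(tb)) {
      tb_add_jump(last_tb, tb_exit, tb);
  }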
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
include/exec/exec-all.h | 10 ++++++++++
target-alpha/cpu.h | 14 ++++++++++++++
target-arm/cpu.h | 14 ++++++++++++++
target-cris/cpu.h | 14 ++++++++++++++
target-i386/cpu.h | 17 +++++++++++++++++
target-lm32/cpu.h | 14 ++++++++++++++
target-m68k/cpu.h | 14 ++++++++++++++
target-microblaze/cpu.h | 14 ++++++++++++++
target-mips/cpu.h | 14 ++++++++++++++
target-moxie/cpu.h | 14 ++++++++++++++
target-openrisc/cpu.h | 14 ++++++++++++++
target-ppc/cpu.h | 14 ++++++++++++++
target-s390x/cpu.h | 14 ++++++++++++++
target-sh4/cpu.h | 14 ++++++++++++++
target-sparc/cpu.h | 14 ++++++++++++++
target-sparc/translate.c | 1 +
target-tilegx/cpu.h | 14 ++++++++++++++
target-tricore/cpu.h | 14 ++++++++++++++
target-unicore32/cpu.h | 14 ++++++++++++++
target-xtensa/cpu.h | 14 ++++++++++++++
20 files changed, 266 insertions(+)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index acda7b613d53..a499c7c56eef 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -256,6 +256,16 @@ void tb_free(TranslationBlock *tb);
void tb_flush(CPUState *cpu);
void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
+static inline void tb_mark_invalid(TranslationBlock *tb)
+{
+ cpu_get_invalid_tb_cpu_state(&tb->pc, &tb->cs_base, &tb->flags);
+}
+
+static inline bool tb_is_invalid(TranslationBlock *tb)
+{
+ return cpu_tb_cpu_state_is_invalidated(tb->pc, tb->cs_base, tb->flags);
+}
+
#if defined(USE_DIRECT_JUMP)
#if defined(CONFIG_TCG_INTERPRETER)
diff --git a/target-alpha/cpu.h b/target-alpha/cpu.h
index ac5e801fb43b..f4ecabeb5b68 100644
--- a/target-alpha/cpu.h
+++ b/target-alpha/cpu.h
@@ -524,4 +524,18 @@ static inline void cpu_get_tb_cpu_state(CPUAlphaState *env, target_ulong *pc,
*pflags = flags;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#endif /* ALPHA_CPU_H */
diff --git a/target-arm/cpu.h b/target-arm/cpu.h
index 76d824d315f7..068f58d6a278 100644
--- a/target-arm/cpu.h
+++ b/target-arm/cpu.h
@@ -2371,6 +2371,20 @@ static inline void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
*cs_base = 0;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
enum {
QEMU_PSCI_CONDUIT_DISABLED = 0,
QEMU_PSCI_CONDUIT_SMC = 1,
diff --git a/target-cris/cpu.h b/target-cris/cpu.h
index 7d7fe6eb1cf4..a20154e06b31 100644
--- a/target-cris/cpu.h
+++ b/target-cris/cpu.h
@@ -296,6 +296,20 @@ static inline void cpu_get_tb_cpu_state(CPUCRISState *env, target_ulong *pc,
| X_FLAG | PFIX_FLAG));
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#define cpu_list cris_cpu_list
void cris_cpu_list(FILE *f, fprintf_function cpu_fprintf);
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 5b14a72baa6f..1e430ae07915 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -130,6 +130,8 @@
positions to ease oring with eflags. */
/* current cpl */
#define HF_CPL_SHIFT 0
+/* used to mark invalidated translation blocks */
+#define HF_INVALID_SHIFT 2
/* true if hardware interrupts must be disabled for next instruction */
#define HF_INHIBIT_IRQ_SHIFT 3
/* 16 or 32 segments */
@@ -159,6 +161,7 @@
#define HF_MPX_IU_SHIFT 26 /* BND registers in-use */
#define HF_CPL_MASK (3 << HF_CPL_SHIFT)
+#define HF_INVALID_MASK (1 << HF_INVALID_SHIFT)
#define HF_INHIBIT_IRQ_MASK (1 << HF_INHIBIT_IRQ_SHIFT)
#define HF_CS32_MASK (1 << HF_CS32_SHIFT)
#define HF_SS32_MASK (1 << HF_SS32_SHIFT)
@@ -1490,6 +1493,20 @@ static inline void cpu_get_tb_cpu_state(CPUX86State *env, target_ulong *pc,
(env->eflags & (IOPL_MASK | TF_MASK | RF_MASK | VM_MASK | AC_MASK));
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *flags = HF_INVALID_MASK;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return flags == HF_INVALID_MASK;
+}
+
void do_cpu_init(X86CPU *cpu);
void do_cpu_sipi(X86CPU *cpu);
diff --git a/target-lm32/cpu.h b/target-lm32/cpu.h
index d8a3515244ea..a94c6bd36bd3 100644
--- a/target-lm32/cpu.h
+++ b/target-lm32/cpu.h
@@ -271,4 +271,18 @@ static inline void cpu_get_tb_cpu_state(CPULM32State *env, target_ulong *pc,
*flags = 0;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#endif
diff --git a/target-m68k/cpu.h b/target-m68k/cpu.h
index b2faa6b60567..549b0eb23a87 100644
--- a/target-m68k/cpu.h
+++ b/target-m68k/cpu.h
@@ -270,4 +270,18 @@ static inline void cpu_get_tb_cpu_state(CPUM68KState *env, target_ulong *pc,
| ((env->macsr >> 4) & 0xf); /* Bits 0-3 */
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#endif
diff --git a/target-microblaze/cpu.h b/target-microblaze/cpu.h
index beb75ffd26d5..2228e74b2f14 100644
--- a/target-microblaze/cpu.h
+++ b/target-microblaze/cpu.h
@@ -372,6 +372,20 @@ static inline void cpu_get_tb_cpu_state(CPUMBState *env, target_ulong *pc,
(env->sregs[SR_MSR] & (MSR_UM | MSR_VM | MSR_EE));
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#if !defined(CONFIG_USER_ONLY)
void mb_cpu_unassigned_access(CPUState *cpu, hwaddr addr,
bool is_write, bool is_exec, int is_asi,
diff --git a/target-mips/cpu.h b/target-mips/cpu.h
index 5182dc74ffa3..e47e2e320f51 100644
--- a/target-mips/cpu.h
+++ b/target-mips/cpu.h
@@ -901,6 +901,20 @@ static inline void cpu_get_tb_cpu_state(CPUMIPSState *env, target_ulong *pc,
MIPS_HFLAG_HWRENA_ULR);
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
static inline int mips_vpe_active(CPUMIPSState *env)
{
int active = 1;
diff --git a/target-moxie/cpu.h b/target-moxie/cpu.h
index 3e880facf482..fba7276d72f8 100644
--- a/target-moxie/cpu.h
+++ b/target-moxie/cpu.h
@@ -137,6 +137,20 @@ static inline void cpu_get_tb_cpu_state(CPUMoxieState *env, target_ulong *pc,
*flags = 0;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
int moxie_cpu_handle_mmu_fault(CPUState *cpu, vaddr address,
int rw, int mmu_idx);
diff --git a/target-openrisc/cpu.h b/target-openrisc/cpu.h
index aaf153579a9a..b6069a32e2b9 100644
--- a/target-openrisc/cpu.h
+++ b/target-openrisc/cpu.h
@@ -398,6 +398,20 @@ static inline void cpu_get_tb_cpu_state(CPUOpenRISCState *env,
*flags = (env->flags & D_FLAG);
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
static inline int cpu_mmu_index(CPUOpenRISCState *env, bool ifetch)
{
if (!(env->sr & SR_IME)) {
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 5fce1ffa251a..f94483691133 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -2295,6 +2295,20 @@ static inline void cpu_get_tb_cpu_state(CPUPPCState *env, target_ulong *pc,
*flags = env->hflags;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#if !defined(CONFIG_USER_ONLY)
static inline int booke206_tlbm_id(CPUPPCState *env, ppcmas_tlb_t *tlbm)
{
diff --git a/target-s390x/cpu.h b/target-s390x/cpu.h
index c216bdacef4f..113490ec8eb4 100644
--- a/target-s390x/cpu.h
+++ b/target-s390x/cpu.h
@@ -394,6 +394,20 @@ static inline void cpu_get_tb_cpu_state(CPUS390XState* env, target_ulong *pc,
((env->psw.mask & PSW_MASK_32) ? FLAG_MASK_32 : 0);
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
/* While the PoO talks about ILC (a number between 1-3) what is actually
stored in LowCore is shifted left one bit (an even between 2-6). As
this is the actual length of the insn and therefore more useful, that
diff --git a/target-sh4/cpu.h b/target-sh4/cpu.h
index 478ab558681b..6128d3890bda 100644
--- a/target-sh4/cpu.h
+++ b/target-sh4/cpu.h
@@ -388,4 +388,18 @@ static inline void cpu_get_tb_cpu_state(CPUSH4State *env, target_ulong *pc,
| (env->movcal_backup ? TB_FLAG_PENDING_MOVCA : 0); /* Bit 4 */
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#endif /* SH4_CPU_H */
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index a3d64a4e5299..e327a35f78c1 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -749,6 +749,20 @@ static inline void cpu_get_tb_cpu_state(CPUSPARCState *env, target_ulong *pc,
*pflags = flags;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1; /* npc must be a multiple of 4 */
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
static inline bool tb_fpu_enabled(int tb_flags)
{
#if defined(CONFIG_USER_ONLY)
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index e7691e44587d..81442ef813ae 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -39,6 +39,7 @@
#define DYNAMIC_PC 1 /* dynamic pc value */
#define JUMP_PC 2 /* dynamic pc value which takes only two values
according to jump_pc[T2] */
+/* NOTE: -1 is reserved for cpu_get_invalid_tb_cpu_state() */
/* global register indexes */
static TCGv_env cpu_env;
diff --git a/target-tilegx/cpu.h b/target-tilegx/cpu.h
index 17354272337d..863c06171841 100644
--- a/target-tilegx/cpu.h
+++ b/target-tilegx/cpu.h
@@ -175,4 +175,18 @@ static inline void cpu_get_tb_cpu_state(CPUTLGState *env, target_ulong *pc,
*flags = 0;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#endif
diff --git a/target-tricore/cpu.h b/target-tricore/cpu.h
index a3493a123c35..980b821b6b9f 100644
--- a/target-tricore/cpu.h
+++ b/target-tricore/cpu.h
@@ -411,6 +411,20 @@ static inline void cpu_get_tb_cpu_state(CPUTriCoreState *env, target_ulong *pc,
*flags = 0;
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
TriCoreCPU *cpu_tricore_init(const char *cpu_model);
#define cpu_init(cpu_model) CPU(cpu_tricore_init(cpu_model))
diff --git a/target-unicore32/cpu.h b/target-unicore32/cpu.h
index 7b5b405e79cd..01bf1e8288ed 100644
--- a/target-unicore32/cpu.h
+++ b/target-unicore32/cpu.h
@@ -180,6 +180,20 @@ static inline void cpu_get_tb_cpu_state(CPUUniCore32State *env, target_ulong *pc
}
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
int uc32_cpu_handle_mmu_fault(CPUState *cpu, vaddr address, int rw,
int mmu_idx);
void uc32_translate_init(void);
diff --git a/target-xtensa/cpu.h b/target-xtensa/cpu.h
index 7fe82a37af42..239588740f3c 100644
--- a/target-xtensa/cpu.h
+++ b/target-xtensa/cpu.h
@@ -582,6 +582,20 @@ static inline void cpu_get_tb_cpu_state(CPUXtensaState *env, target_ulong *pc,
}
}
+static inline void cpu_get_invalid_tb_cpu_state(target_ulong *pc,
+ target_ulong *cs_base,
+ uint32_t *flags)
+{
+ *cs_base = -1;
+}
+
+static inline bool cpu_tb_cpu_state_is_invalidated(target_ulong pc,
+ target_ulong cs_base,
+ uint32_t flags)
+{
+ return cs_base == -1;
+}
+
#include "exec/cpu-all.h"
#endif
--
2.9.1
* [Qemu-devel] [PATCH v4 07/12] tcg: Prepare TB invalidation for lockless TB lookup
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (5 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 06/12] tcg: Introduce tb_mark_invalid() and tb_is_invalid() Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 08/12] tcg: set up tb->page_addr before insertion Sergey Fedorov
` (5 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Sergey Fedorov <serge.fdrv@gmail.com>
When invalidating a translation block, set an invalid CPU state into the
TranslationBlock structure first.
As soon as the TB is marked with an invalid CPU state, there is no need
to remove it from the CPU's 'tb_jmp_cache'. However, it will be
necessary to recheck whether the target TB is still valid after
acquiring 'tb_lock' but before calling tb_add_jump(), since TB lookup is
to be performed out of 'tb_lock' in the future. Note that we don't have
to check 'last_tb': it is safe to patch an already invalidated TB
because it will not be executed anyway.
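Sketched, the resulting invalidation ordering in tb_phys_invalidate()
is:

  /* publish the invalid CPU state first ... */
  tb_mark_invalid(tb);
  /* ... then unhash the TB: a racing lockless lookup either misses
   * the TB in the hash table or sees it already marked invalid. */
  qht_remove(&tcg_ctx.tb_ctx.htable, tb, h);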
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
Changes in v4:
- smp_wmb() removed after tb_mark_invalid()
- atomic access to TB CPU state
---
cpu-exec.c | 7 ++++---
include/exec/exec-all.h | 8 +++++++-
translate-all.c | 11 ++---------
3 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index c973e3b85922..e16df762f50a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -334,8 +334,9 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
tb_lock();
tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
- if (unlikely(!tb || tb->pc != pc || tb->cs_base != cs_base ||
- tb->flags != flags)) {
+ if (unlikely(!tb || atomic_read(&tb->pc) != pc ||
+ atomic_read(&tb->cs_base) != cs_base ||
+ atomic_read(&tb->flags) != flags)) {
tb = tb_find_slow(cpu, pc, cs_base, flags);
}
#ifndef CONFIG_USER_ONLY
@@ -352,7 +353,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
/* Check if translation buffer has been flushed */
if (cpu->tb_flushed) {
cpu->tb_flushed = false;
- } else {
+ } else if (!tb_is_invalid(tb)) {
tb_add_jump(last_tb, tb_exit, tb);
}
}
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a499c7c56eef..8f0afcdbd62a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -258,7 +258,13 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr);
static inline void tb_mark_invalid(TranslationBlock *tb)
{
- cpu_get_invalid_tb_cpu_state(&tb->pc, &tb->cs_base, &tb->flags);
+ target_ulong pc = 0, cs_base = 0;
+ uint32_t flags = 0;
+
+ cpu_get_invalid_tb_cpu_state(&pc, &cs_base, &flags);
+ atomic_set(&tb->pc, pc);
+ atomic_set(&tb->cs_base, cs_base);
+ atomic_set(&tb->flags, flags);
}
static inline bool tb_is_invalid(TranslationBlock *tb)
diff --git a/translate-all.c b/translate-all.c
index 788fed1e0765..9db72e8982b1 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -986,11 +986,12 @@ static inline void tb_jmp_unlink(TranslationBlock *tb)
/* invalidate one TB */
void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
{
- CPUState *cpu;
PageDesc *p;
uint32_t h;
tb_page_addr_t phys_pc;
+ tb_mark_invalid(tb);
+
/* remove the TB from the hash list */
phys_pc = tb->page_addr[0] + (tb->pc & ~TARGET_PAGE_MASK);
h = tb_hash_func(phys_pc, tb->pc, tb->flags);
@@ -1008,14 +1009,6 @@ void tb_phys_invalidate(TranslationBlock *tb, tb_page_addr_t page_addr)
invalidate_page_bitmap(p);
}
- /* remove the TB from the hash list */
- h = tb_jmp_cache_hash_func(tb->pc);
- CPU_FOREACH(cpu) {
- if (atomic_read(&cpu->tb_jmp_cache[h]) == tb) {
- atomic_set(&cpu->tb_jmp_cache[h], NULL);
- }
- }
-
/* suppress this TB from the two jump lists */
tb_remove_from_jmp_list(tb, 0);
tb_remove_from_jmp_list(tb, 1);
--
2.9.1
* [Qemu-devel] [PATCH v4 08/12] tcg: set up tb->page_addr before insertion
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (6 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 07/12] tcg: Prepare TB invalidation for lockless TB lookup Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 09/12] tcg: cpu-exec: remove tb_lock from the hot-path Sergey Fedorov
` (4 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Alex Bennée <alex.bennee@linaro.org>
This ensures that if we find the TB on the slow path, tb->page_addr
is correctly set before being tested.
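In other words, the qht insertion is the publication point, so every TB
field that a lockless looker may inspect must be written beforehand
(a simplified sketch of tb_link_page() after this change):

  /* set up the page list and tb->page_addr[] first ... */
  tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
  tb->page_addr[1] = phys_page2;   /* or -1 if single-page */
  /* ... then publish the TB for lockless lookup */
  h = tb_hash_func(phys_pc, tb->pc, tb->flags);
  qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);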
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
translate-all.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/translate-all.c b/translate-all.c
index 9db72e8982b1..6156bdcbef42 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1118,10 +1118,6 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
{
uint32_t h;
- /* add in the hash table */
- h = tb_hash_func(phys_pc, tb->pc, tb->flags);
- qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
-
/* add in the page list */
tb_alloc_page(tb, 0, phys_pc & TARGET_PAGE_MASK);
if (phys_page2 != -1) {
@@ -1130,6 +1126,10 @@ static void tb_link_page(TranslationBlock *tb, tb_page_addr_t phys_pc,
tb->page_addr[1] = -1;
}
+ /* add in the hash table */
+ h = tb_hash_func(phys_pc, tb->pc, tb->flags);
+ qht_insert(&tcg_ctx.tb_ctx.htable, tb, h);
+
#ifdef DEBUG_TB_CHECK
tb_page_check();
#endif
--
2.9.1
* [Qemu-devel] [PATCH v4 09/12] tcg: cpu-exec: remove tb_lock from the hot-path
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (7 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 08/12] tcg: set up tb->page_addr before insertion Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 10/12] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Sergey Fedorov
` (3 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Alex Bennée <alex.bennee@linaro.org>
Lock contention in the hot path of moving between existing patched
TranslationBlocks is the main drag on multithreaded performance. This
patch pushes the tb_lock() usage down to the two places that really need
it:
- code generation (tb_gen_code)
- jump patching (tb_add_jump)
The rest of the code doesn't really need to hold a lock, as it either
uses per-CPU structures, is atomically updated, or is designed for
concurrent read situations (qht_lookup).
To keep things simple I removed the #ifdef CONFIG_USER_ONLY stuff as the
locks become NOPs anyway until the MTTCG work is completed.
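The slow path thus takes the classic check/lock/re-check shape (a
condensed sketch of the code below):

  tb = tb_find_physical(cpu, pc, cs_base, flags);  /* lockless lookup */
  if (!tb) {
      mmap_lock();
      tb_lock();
      /* someone may have translated it while we took the locks */
      tb = tb_find_physical(cpu, pc, cs_base, flags);
      if (!tb) {
          /* if no translated code available, then translate it now */
          tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
      }
      tb_unlock();
      mmap_unlock();
  }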
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
v2 (hot path)
- Add r-b tags
v1 (hot path, split from base-patches series)
- revert name tweaking
- drop test jmp_list_next outside lock
- mention lock NOPs in comments
v3 (base-patches)
- fix merge conflicts with Sergey's patch
---
cpu-exec.c | 48 +++++++++++++++++++++---------------------------
1 file changed, 21 insertions(+), 27 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index e16df762f50a..bbaed5bb1978 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -286,35 +286,29 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
TranslationBlock *tb;
tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (tb) {
- goto found;
- }
+ if (!tb) {
-#ifdef CONFIG_USER_ONLY
- /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
- * taken outside tb_lock. Since we're momentarily dropping
- * tb_lock, there's a chance that our desired tb has been
- * translated.
- */
- tb_unlock();
- mmap_lock();
- tb_lock();
- tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (tb) {
- mmap_unlock();
- goto found;
- }
-#endif
+ /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+ * taken outside tb_lock. As system emulation is currently
+ * single threaded the locks are NOPs.
+ */
+ mmap_lock();
+ tb_lock();
- /* if no translated code available, then translate it now */
- tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+ /* There's a chance that our desired tb has been translated while
+ * taking the locks so we check again inside the lock.
+ */
+ tb = tb_find_physical(cpu, pc, cs_base, flags);
+ if (!tb) {
+ /* if no translated code available, then translate it now */
+ tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+ }
-#ifdef CONFIG_USER_ONLY
- mmap_unlock();
-#endif
+ tb_unlock();
+ mmap_unlock();
+ }
-found:
- /* we add the TB in the virtual pc hash table */
+ /* We add the TB in the virtual pc hash table for the fast lookup */
atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
return tb;
}
@@ -332,7 +326,6 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
always be the same before a given translated block
is executed. */
cpu_get_tb_cpu_state(env, &pc, &cs_base, &flags);
- tb_lock();
tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || atomic_read(&tb->pc) != pc ||
atomic_read(&tb->cs_base) != cs_base ||
@@ -350,14 +343,15 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
#endif
/* See if we can patch the calling TB. */
if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
+ tb_lock();
/* Check if translation buffer has been flushed */
if (cpu->tb_flushed) {
cpu->tb_flushed = false;
} else if (!tb_is_invalid(tb)) {
tb_add_jump(last_tb, tb_exit, tb);
}
+ tb_unlock();
}
- tb_unlock();
return tb;
}
--
2.9.1
* [Qemu-devel] [PATCH v4 10/12] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (8 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 09/12] tcg: cpu-exec: remove tb_lock from the hot-path Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 11/12] tcg: Merge tb_find_slow() and tb_find_fast() Sergey Fedorov
` (2 subsequent siblings)
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Sergey Fedorov <serge.fdrv@gmail.com>
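When tb_find_slow() has already taken 'tb_lock' for code generation,
keep holding it instead of dropping it only for tb_add_jump() to
re-acquire it immediately. A local 'have_tb_lock' flag tracks lock
ownership; a condensed sketch of the resulting flow:

  bool have_tb_lock = false;

  tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
  /* ... */
  if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
      if (!have_tb_lock) {
          tb_lock();
          have_tb_lock = true;
      }
      /* ... patch the jump ... */
  }
  if (have_tb_lock) {
      tb_unlock();
  }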
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
cpu-exec.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index bbaed5bb1978..073d783398f3 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -281,7 +281,8 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
static TranslationBlock *tb_find_slow(CPUState *cpu,
target_ulong pc,
target_ulong cs_base,
- uint32_t flags)
+ uint32_t flags,
+ bool *have_tb_lock)
{
TranslationBlock *tb;
@@ -294,6 +295,7 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
*/
mmap_lock();
tb_lock();
+ *have_tb_lock = true;
/* There's a chance that our desired tb has been translated while
* taking the locks so we check again inside the lock.
@@ -304,7 +306,6 @@ static TranslationBlock *tb_find_slow(CPUState *cpu,
tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
}
- tb_unlock();
mmap_unlock();
}
@@ -321,6 +322,7 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
TranslationBlock *tb;
target_ulong cs_base, pc;
uint32_t flags;
+ bool have_tb_lock = false;
/* we record a subset of the CPU state. It will
always be the same before a given translated block
@@ -329,8 +331,8 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
tb = atomic_read(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)]);
if (unlikely(!tb || atomic_read(&tb->pc) != pc ||
atomic_read(&tb->cs_base) != cs_base ||
atomic_read(&tb->flags) != flags)) {
- tb = tb_find_slow(cpu, pc, cs_base, flags);
+ tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
}
#ifndef CONFIG_USER_ONLY
/* We don't take care of direct jumps when address mapping changes in
@@ -343,13 +345,18 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
#endif
/* See if we can patch the calling TB. */
if (last_tb && !qemu_loglevel_mask(CPU_LOG_TB_NOCHAIN)) {
- tb_lock();
+ if (!have_tb_lock) {
+ tb_lock();
+ have_tb_lock = true;
+ }
/* Check if translation buffer has been flushed */
if (cpu->tb_flushed) {
cpu->tb_flushed = false;
} else if (!tb_is_invalid(tb)) {
tb_add_jump(last_tb, tb_exit, tb);
}
+ }
+ if (have_tb_lock) {
tb_unlock();
}
return tb;
--
2.9.1
* [Qemu-devel] [PATCH v4 11/12] tcg: Merge tb_find_slow() and tb_find_fast()
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (9 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 10/12] tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump() Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 12/12] tcg: rename tb_find_physical() Sergey Fedorov
2016-07-16 13:51 ` [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Paolo Bonzini
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Sergey Fedorov <serge.fdrv@gmail.com>
These functions are not too big and can be merged together. This makes
the locking scheme clearer and easier to follow.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
---
cpu-exec.c | 72 ++++++++++++++++++++++++++------------------------------------
1 file changed, 30 insertions(+), 42 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index 073d783398f3..ff138809046c 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -278,45 +278,9 @@ static TranslationBlock *tb_find_physical(CPUState *cpu,
return qht_lookup(&tcg_ctx.tb_ctx.htable, tb_cmp, &desc, h);
}
-static TranslationBlock *tb_find_slow(CPUState *cpu,
- target_ulong pc,
- target_ulong cs_base,
- uint32_t flags,
- bool *have_tb_lock)
-{
- TranslationBlock *tb;
-
- tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (!tb) {
-
- /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
- * taken outside tb_lock. As system emulation is currently
- * single threaded the locks are NOPs.
- */
- mmap_lock();
- tb_lock();
- *have_tb_lock = true;
-
- /* There's a chance that our desired tb has been translated while
- * taking the locks so we check again inside the lock.
- */
- tb = tb_find_physical(cpu, pc, cs_base, flags);
- if (!tb) {
- /* if no translated code available, then translate it now */
- tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
- }
-
- mmap_unlock();
- }
-
- /* We add the TB in the virtual pc hash table for the fast lookup */
- atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
- return tb;
-}
-
-static inline TranslationBlock *tb_find_fast(CPUState *cpu,
- TranslationBlock *last_tb,
- int tb_exit)
+static inline TranslationBlock *tb_find(CPUState *cpu,
+ TranslationBlock *last_tb,
+ int tb_exit)
{
CPUArchState *env = (CPUArchState *)cpu->env_ptr;
TranslationBlock *tb;
@@ -332,7 +296,31 @@ static inline TranslationBlock *tb_find_fast(CPUState *cpu,
if (unlikely(!tb || atomic_read(&tb->pc) != pc ||
atomic_read(&tb->cs_base) != cs_base ||
atomic_read(&tb->flags) != flags)) {
- tb = tb_find_slow(cpu, pc, cs_base, flags, &have_tb_lock);
+ tb = tb_find_physical(cpu, pc, cs_base, flags);
+ if (!tb) {
+
+ /* mmap_lock is needed by tb_gen_code, and mmap_lock must be
+ * taken outside tb_lock. As system emulation is currently
+ * single threaded the locks are NOPs.
+ */
+ mmap_lock();
+ tb_lock();
+ have_tb_lock = true;
+
+ /* There's a chance that our desired tb has been translated while
+ * taking the locks so we check again inside the lock.
+ */
+ tb = tb_find_physical(cpu, pc, cs_base, flags);
+ if (!tb) {
+ /* if no translated code available, then translate it now */
+ tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
+ }
+
+ mmap_unlock();
+ }
+
+ /* We add the TB in the virtual pc hash table for the fast lookup */
+ atomic_set(&cpu->tb_jmp_cache[tb_jmp_cache_hash_func(pc)], tb);
}
#ifndef CONFIG_USER_ONLY
/* We don't take care of direct jumps when address mapping changes in
@@ -437,7 +425,7 @@ static inline bool cpu_handle_exception(CPUState *cpu, int *ret)
} else if (replay_has_exception()
&& cpu->icount_decr.u16.low + cpu->icount_extra == 0) {
/* try to cause an exception pending in the log */
- cpu_exec_nocache(cpu, 1, tb_find_fast(cpu, NULL, 0), true);
+ cpu_exec_nocache(cpu, 1, tb_find(cpu, NULL, 0), true);
*ret = -1;
return true;
#endif
@@ -621,7 +609,7 @@ int cpu_exec(CPUState *cpu)
atomic_mb_set(&cpu->tb_flushed, false); /* reset before first TB lookup */
for(;;) {
cpu_handle_interrupt(cpu, &last_tb);
- tb = tb_find_fast(cpu, last_tb, tb_exit);
+ tb = tb_find(cpu, last_tb, tb_exit);
cpu_loop_exec_tb(cpu, tb, &last_tb, &tb_exit, &sc);
/* Try to align the host and virtual clocks
if the guest is in advance */
--
2.9.1
* [Qemu-devel] [PATCH v4 12/12] tcg: rename tb_find_physical()
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (10 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 11/12] tcg: Merge tb_find_slow() and tb_find_fast() Sergey Fedorov
@ 2016-07-15 17:58 ` Sergey Fedorov
2016-07-16 13:51 ` [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Paolo Bonzini
12 siblings, 0 replies; 14+ messages in thread
From: Sergey Fedorov @ 2016-07-15 17:58 UTC (permalink / raw)
To: qemu-devel
Cc: patches, Sergey Fedorov, mttcg, fred.konrad, a.rigo, cota,
bobby.prani, rth, mark.burton, pbonzini, jan.kiszka,
peter.maydell, claudio.fontana, Alex Bennée, Sergey Fedorov,
Peter Crosthwaite
From: Sergey Fedorov <serge.fdrv@gmail.com>
In fact, this function does not exactly perform a lookup by physical
address as described in the comment on get_page_addr_code(). Thus it may
be a bit confusing to have "physical" in its name. So rename it to
tb_htable_lookup() to better reflect its actual functionality.
Signed-off-by: Sergey Fedorov <serge.fdrv@gmail.com>
Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
---
cpu-exec.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/cpu-exec.c b/cpu-exec.c
index ff138809046c..735541e753fb 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -259,7 +259,7 @@ static bool tb_cmp(const void *p, const void *d)
return false;
}
-static TranslationBlock *tb_find_physical(CPUState *cpu,
+static TranslationBlock *tb_htable_lookup(CPUState *cpu,
target_ulong pc,
target_ulong cs_base,
uint32_t flags)
@@ -296,7 +296,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
if (unlikely(!tb || atomic_read(&tb->pc) != pc ||
atomic_read(&tb->cs_base) != cs_base ||
atomic_read(&tb->flags) != flags)) {
- tb = tb_find_physical(cpu, pc, cs_base, flags);
+ tb = tb_htable_lookup(cpu, pc, cs_base, flags);
if (!tb) {
/* mmap_lock is needed by tb_gen_code, and mmap_lock must be
@@ -310,7 +310,7 @@ static inline TranslationBlock *tb_find(CPUState *cpu,
/* There's a chance that our desired tb has been translated while
* taking the locks so we check again inside the lock.
*/
- tb = tb_find_physical(cpu, pc, cs_base, flags);
+ tb = tb_htable_lookup(cpu, pc, cs_base, flags);
if (!tb) {
/* if no translated code available, then translate it now */
tb = tb_gen_code(cpu, pc, cs_base, flags, 0);
--
2.9.1
* Re: [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path
2016-07-15 17:58 [Qemu-devel] [PATCH v4 00/12] Reduce lock contention on TCG hot-path Sergey Fedorov
` (11 preceding siblings ...)
2016-07-15 17:58 ` [Qemu-devel] [PATCH v4 12/12] tcg: rename tb_find_physical() Sergey Fedorov
@ 2016-07-16 13:51 ` Paolo Bonzini
12 siblings, 0 replies; 14+ messages in thread
From: Paolo Bonzini @ 2016-07-16 13:51 UTC (permalink / raw)
To: Sergey Fedorov, qemu-devel
Cc: mttcg, peter.maydell, claudio.fontana, patches, jan.kiszka,
mark.burton, a.rigo, cota, Sergey Fedorov, bobby.prani, rth,
Alex Bennée, fred.konrad
On 15/07/2016 19:58, Sergey Fedorov wrote:
> From: Sergey Fedorov <serge.fdrv@gmail.com>
>
> Hi,
>
> This is a respin of this series [1].
>
> Here I used a modified version of Paolo's patch to document memory
> ordering assumptions for certain QHT operations.
>
> The last patch is a suggestion for renaming tb_find_physical().
>
> This series can be fetched from the public git repository:
>
> https://github.com/sergefdrv/qemu.git lockless-tb-lookup-v4
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/426341
Queued all for 2.7, thanks.
Paolo
> Kind regards,
> Sergey
>
> Summary of changes in v4:
> - Modified version of Paolo's patch is used to document memory ordering
> assumptions for certain QHT operations
> - Intermediate compilation errors fixed
> - Atomic access to TB CPU state
> - tb_find_physical() renamed
> Summary of changes in v3:
> - QHT memory ordering assumptions documented
> - 'tb_jmp_cache' reset in tb_flush() made atomic
> - explicit memory barriers removed around 'tb_jmp_cache' access
> - safe access to 'tb_flushed' out of 'tb_lock' prepared
> - TBs marked with invalid CPU state early on invalidation
> - Alex's tb_find_{fast,slow}() roll-up related patches dropped
> - bouncing of tb_lock between tb_gen_code() and tb_add_jump() avoided
> with local variable 'have_tb_lock'
> - tb_find_{fast,slow}() merged
>
> Alex Bennée (2):
> tcg: set up tb->page_addr before insertion
> tcg: cpu-exec: remove tb_lock from the hot-path
>
> Paolo Bonzini (1):
> util/qht: Document memory ordering assumptions
>
> Sergey Fedorov (9):
> tcg: Pass last_tb by value to tb_find_fast()
> tcg: Prepare safe tb_jmp_cache lookup out of tb_lock
> tcg: Prepare safe access to tb_flushed out of tb_lock
> target-i386: Remove redundant HF_SOFTMMU_MASK
> tcg: Introduce tb_mark_invalid() and tb_is_invalid()
> tcg: Prepare TB invalidation for lockless TB lookup
> tcg: Avoid bouncing tb_lock between tb_gen_code() and tb_add_jump()
> tcg: Merge tb_find_slow() and tb_find_fast()
> tcg: rename tb_find_physical()
>
> cpu-exec.c | 117 +++++++++++++++++++++--------------------------
> include/exec/exec-all.h | 16 +++++++
> include/qemu/qht.h | 5 ++
> target-alpha/cpu.h | 14 ++++++
> target-arm/cpu.h | 14 ++++++
> target-cris/cpu.h | 14 ++++++
> target-i386/cpu.c | 3 --
> target-i386/cpu.h | 20 ++++++--
> target-i386/translate.c | 12 ++---
> target-lm32/cpu.h | 14 ++++++
> target-m68k/cpu.h | 14 ++++++
> target-microblaze/cpu.h | 14 ++++++
> target-mips/cpu.h | 14 ++++++
> target-moxie/cpu.h | 14 ++++++
> target-openrisc/cpu.h | 14 ++++++
> target-ppc/cpu.h | 14 ++++++
> target-s390x/cpu.h | 14 ++++++
> target-sh4/cpu.h | 14 ++++++
> target-sparc/cpu.h | 14 ++++++
> target-sparc/translate.c | 1 +
> target-tilegx/cpu.h | 14 ++++++
> target-tricore/cpu.h | 14 ++++++
> target-unicore32/cpu.h | 14 ++++++
> target-xtensa/cpu.h | 14 ++++++
> translate-all.c | 29 ++++++------
> util/qht.c | 7 ++-
> 26 files changed, 352 insertions(+), 96 deletions(-)
>