From: "Emilio G. Cota" <cota@braap.org>
To: Richard Henderson <richard.henderson@linaro.org>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH for-4.0 00/17] tcg: Move softmmu out-of-line
Date: Thu, 15 Nov 2018 20:13:38 -0500
Message-ID: <20181116011338.GB17566@flamenco>
In-Reply-To: <06e66024-1abb-e5b7-591c-3633b5cb3e31@linaro.org>
On Thu, Nov 15, 2018 at 23:04:50 +0100, Richard Henderson wrote:
> On 11/15/18 7:48 PM, Emilio G. Cota wrote:
> > - Segfault in code_gen_buffer. This one I don't have a fix for,
> > but it's *much* easier to reproduce when -tb-size is very small,
> > e.g. "-tb-size 5 -smp 2" (BTW it crashes with x86_64 guests too.)
> > So at first I thought the code cache flushing was the problem,
> > but I don't see how that could be, at least from a TCGContext
> > viewpoint -- I agree that clearing the hash table in
> > tcg_region_assign is a good place to do so.
>
> Ho hum.
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 639f0b2728..115ea186e5 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1831,10 +1831,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
> existing_tb = tb_link_page(tb, phys_pc, phys_page2);
> /* if the TB already exists, discard what we just translated */
> if (unlikely(existing_tb != tb)) {
> - uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
> -
> - orig_aligned -= ROUND_UP(sizeof(*tb), qemu_icache_linesize);
> - atomic_set(&tcg_ctx->code_gen_ptr, (void *)orig_aligned);
> return existing_tb;
> }
> tcg_tb_insert(tb);
>
> We can't easily undo the hash table insert, and for a relatively rare
> occurrence it's not worth the effort.
Nice catch! Everything works now =D
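As I understand it, the hazard with the removed rewind can be sketched
in isolation. This is only a toy model, not QEMU code: struct toy_buf,
toy_emit and toy_gen_tb_with_thunk are made-up names, and a single
remembered pointer stands in for the OOL thunk hash table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy code buffer with a bump pointer, standing in for code_gen_buffer. */
struct toy_buf {
    unsigned char mem[64];
    size_t ptr;                 /* next free byte, like code_gen_ptr */
    const unsigned char *thunk; /* one remembered thunk address */
};

/* Emit len bytes of value fill at the bump pointer; return their address. */
static unsigned char *toy_emit(struct toy_buf *b, unsigned char fill,
                               size_t len)
{
    unsigned char *dest = b->mem + b->ptr;

    assert(b->ptr + len <= sizeof(b->mem));
    memset(dest, fill, len);
    b->ptr += len;
    return dest;
}

/* Generate a "TB" followed by a thunk whose address is remembered. */
static size_t toy_gen_tb_with_thunk(struct toy_buf *b)
{
    size_t start = b->ptr;

    toy_emit(b, 0xAA, 8);             /* TB body */
    b->thunk = toy_emit(b, 0xCC, 4);  /* thunk bytes, remembered globally */
    return start;
}
```

If the caller resets ptr to the returned start offset and then emits
the next TB, the remembered thunk address points at bytes that belong
to unrelated code, which is the situation the rewind could create.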
In the bootup+shutdown aarch64 test with -smp 12, we end up
discarding ~2500 TBs; that's ~439K of code space that we do
not waste. Note that I'm assuming 180 host bytes per TB,
which is the average reported by "info jit".
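The estimate above is just a multiplication; a minimal sketch, where
the helper names are made up and the 180-byte average is the
assumption stated above:

```c
#include <stddef.h>

/* Assumed average host code size per TB, as reported by "info jit". */
enum { AVG_HOST_BYTES_PER_TB = 180 };

/* Estimated code-cache bytes reclaimed by discarding n_tbs TBs. */
static size_t reclaimed_bytes(size_t n_tbs)
{
    return n_tbs * AVG_HOST_BYTES_PER_TB;
}

/* Same estimate in KiB, rounded down. */
static size_t reclaimed_kib(size_t n_tbs)
{
    return reclaimed_bytes(n_tbs) / 1024;
}
```

For 2500 TBs this gives 450000 bytes, i.e. 439 KiB after rounding down.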
We can still discard most of these by increasing a counter every
time we insert a new element into the OOL table, and checking
this counter before/after tcg_gen_code. (Note that checking
g_hash_table_size before/after is not enough, because we might
have replaced an existing item from the table.)
Then, we discard a TB iff no OOL thunk was generated for it. (Diff below.)
This still allows us to discard most TBs: in the example above,
only ~70 TBs cannot be discarded, i.e. we end up keeping
only 70/2500 = 2.8% of the TBs that we'd discard without OOL.
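To illustrate why the counter is needed rather than the table size,
here is a toy model in plain C. It is not the GLib hash table and not
the actual patch: struct toy_ctx, toy_replace and can_discard are
made-up names, and only the field name n_ool_thunks mirrors the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 16

/* Toy stand-in for the OOL thunk table: key -> thunk address. */
struct toy_ctx {
    unsigned keys[TOY_MAX];
    const void *thunks[TOY_MAX];
    size_t size;          /* number of distinct keys in the table */
    size_t n_ool_thunks;  /* bumped on every insert or replace */
};

/* Insert a thunk, replacing any existing entry with the same key. */
static void toy_replace(struct toy_ctx *s, unsigned key, const void *thunk)
{
    size_t i;

    for (i = 0; i < s->size; i++) {
        if (s->keys[i] == key) {
            break;
        }
    }
    if (i == s->size) {
        assert(s->size < TOY_MAX);
        s->keys[i] = key;
        s->size++;
    }
    s->thunks[i] = thunk;
    s->n_ool_thunks++;
}

/* A duplicate TB may be discarded only if no thunk was emitted for it. */
static bool can_discard(const struct toy_ctx *s, size_t n_before)
{
    return s->n_ool_thunks == n_before;
}
```

Replacing an existing key leaves size unchanged but still bumps
n_ool_thunks, so comparing the counter before and after code
generation catches the case that a size comparison would miss.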
Performance-wise it doesn't make a difference for -smp 1:
Host: Intel(R) Xeon(R) CPU E5-2643 0 @ 3.30GHz
Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (5 runs):
- Before (3.1.0-rc1):
14351.436177 task-clock (msec) # 0.998 CPUs utilized ( +- 0.24% )
49,963,260,126 cycles # 3.481 GHz ( +- 0.22% ) (83.32%)
26,047,650,654 stalled-cycles-frontend # 52.13% frontend cycles idle ( +- 0.29% ) (83.34%)
19,717,480,482 stalled-cycles-backend # 39.46% backend cycles idle ( +- 0.27% ) (66.67%)
59,278,011,067 instructions # 1.19 insns per cycle
# 0.44 stalled cycles per insn ( +- 0.17% ) (83.34%)
10,632,601,608 branches # 740.874 M/sec ( +- 0.17% ) (83.34%)
236,153,469 branch-misses # 2.22% of all branches ( +- 0.16% ) (83.35%)
14.382847823 seconds time elapsed ( +- 0.25% )
- After this series (with the fixes we've discussed):
13256.198927 task-clock (msec) # 0.998 CPUs utilized ( +- 0.04% )
46,146,457,353 cycles # 3.481 GHz ( +- 0.08% ) (83.34%)
22,632,342,565 stalled-cycles-frontend # 49.04% frontend cycles idle ( +- 0.12% ) (83.35%)
16,534,690,741 stalled-cycles-backend # 35.83% backend cycles idle ( +- 0.15% ) (66.67%)
58,047,832,548 instructions # 1.26 insns per cycle
# 0.39 stalled cycles per insn ( +- 0.18% ) (83.34%)
11,031,634,880 branches # 832.187 M/sec ( +- 0.12% ) (83.33%)
210,593,929 branch-misses # 1.91% of all branches ( +- 0.30% ) (83.33%)
13.285023783 seconds time elapsed ( +- 0.05% )
- After the fixup below:
13240.889734 task-clock (msec) # 0.998 CPUs utilized ( +- 0.19% )
46,074,292,775 cycles # 3.480 GHz ( +- 0.12% ) (83.35%)
22,670,132,770 stalled-cycles-frontend # 49.20% frontend cycles idle ( +- 0.17% ) (83.35%)
16,598,822,504 stalled-cycles-backend # 36.03% backend cycles idle ( +- 0.26% ) (66.66%)
57,796,083,344 instructions # 1.25 insns per cycle
# 0.39 stalled cycles per insn ( +- 0.16% ) (83.34%)
11,002,340,174 branches # 830.937 M/sec ( +- 0.11% ) (83.35%)
211,023,549 branch-misses # 1.92% of all branches ( +- 0.22% ) (83.32%)
13.264499034 seconds time elapsed ( +- 0.19% )
I'll now generate some more perf numbers that we could include in the
commit logs.
Thanks,
Emilio
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 115ea18..15f7d4e 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -1678,6 +1678,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
target_ulong virt_page2;
tcg_insn_unit *gen_code_buf;
int gen_code_size, search_size;
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+ size_t n_ool_thunks;
+#endif
#ifdef CONFIG_PROFILER
TCGProfile *prof = &tcg_ctx->prof;
int64_t ti;
@@ -1744,6 +1747,10 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
ti = profile_getclock();
#endif
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+ n_ool_thunks = tcg_ctx->n_ool_thunks;
+#endif
+
/* ??? Overflow could be handled better here. In particular, we
don't need to re-do gen_intermediate_code, nor should we re-do
the tcg optimization currently hidden inside tcg_gen_code. All
@@ -1831,6 +1838,18 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
existing_tb = tb_link_page(tb, phys_pc, phys_page2);
/* if the TB already exists, discard what we just translated */
if (unlikely(existing_tb != tb)) {
+ bool discard = true;
+
+#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
+ /* only discard the TB if we didn't generate an OOL thunk */
+ discard = tcg_ctx->n_ool_thunks == n_ool_thunks;
+#endif
+ if (discard) {
+ uintptr_t orig_aligned = (uintptr_t)gen_code_buf;
+
+ orig_aligned -= ROUND_UP(sizeof(*tb), qemu_icache_linesize);
+ atomic_set(&tcg_ctx->code_gen_ptr, (void *)orig_aligned);
+ }
return existing_tb;
}
tcg_tb_insert(tb);
diff --git a/tcg/tcg-ldst-ool.inc.c b/tcg/tcg-ldst-ool.inc.c
index 8fb6550..61da060 100644
--- a/tcg/tcg-ldst-ool.inc.c
+++ b/tcg/tcg-ldst-ool.inc.c
@@ -69,6 +69,7 @@ static bool tcg_out_ldst_ool_finalize(TCGContext *s)
/* Remember the thunk for next time. */
g_hash_table_replace(s->ldst_ool_thunks, key, dest);
+ s->n_ool_thunks++;
/* The new thunk must be in range. */
ok = patch_reloc(lb->label, lb->reloc, (intptr_t)dest, lb->addend);
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 1255d2a..d4f07a6 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -709,6 +709,7 @@ struct TCGContext {
#ifdef TCG_TARGET_NEED_LDST_OOL_LABELS
QSIMPLEQ_HEAD(ldst_labels, TCGLabelQemuLdstOol) ldst_ool_labels;
GHashTable *ldst_ool_thunks;
+ size_t n_ool_thunks;
#endif
#ifdef TCG_TARGET_NEED_POOL_LABELS
struct TCGLabelPoolData *pool_labels;