* [PATCH v11 01/14] accel/tcg: introduce TBStatistics structure
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-05-03 8:12 ` Richard Henderson
2023-04-21 13:24 ` [PATCH v11 02/14] accel: collecting TB execution count Fei Wu
` (13 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
To store statistics for each TB, we created a TBStatistics structure
which is linked with the TBs. TBStatistics can stay alive after
tb_flush and be relinked to a regenerated TB. So the statistics can
be accumulated even through flushes.
The goal is to have all present and future qemu/tcg statistics and
meta-data stored in this new structure.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-2-vandersonmr2@gmail.com>
[AJB: fix git author, review comments]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/meson.build | 1 +
accel/tcg/tb-context.h | 1 +
accel/tcg/tb-hash.h | 7 +++++
accel/tcg/tb-maint.c | 19 ++++++++++++
accel/tcg/tb-stats.c | 58 +++++++++++++++++++++++++++++++++++++
accel/tcg/translate-all.c | 43 +++++++++++++++++++++++++++
include/exec/exec-all.h | 3 ++
include/exec/tb-stats.h | 61 +++++++++++++++++++++++++++++++++++++++
8 files changed, 193 insertions(+)
create mode 100644 accel/tcg/tb-stats.c
create mode 100644 include/exec/tb-stats.h
diff --git a/accel/tcg/meson.build b/accel/tcg/meson.build
index aeb20a6ef0..9263bdde11 100644
--- a/accel/tcg/meson.build
+++ b/accel/tcg/meson.build
@@ -4,6 +4,7 @@ tcg_ss.add(files(
'cpu-exec-common.c',
'cpu-exec.c',
'tb-maint.c',
+ 'tb-stats.c',
'tcg-runtime-gvec.c',
'tcg-runtime.c',
'translate-all.c',
diff --git a/accel/tcg/tb-context.h b/accel/tcg/tb-context.h
index cac62d9749..d7910d586b 100644
--- a/accel/tcg/tb-context.h
+++ b/accel/tcg/tb-context.h
@@ -35,6 +35,7 @@ struct TBContext {
/* statistics */
unsigned tb_flush_count;
unsigned tb_phys_invalidate_count;
+ struct qht tb_stats;
};
extern TBContext tb_ctx;
diff --git a/accel/tcg/tb-hash.h b/accel/tcg/tb-hash.h
index 83dc610e4c..87d657a1c6 100644
--- a/accel/tcg/tb-hash.h
+++ b/accel/tcg/tb-hash.h
@@ -67,4 +67,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
}
+static inline
+uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
+ uint32_t flags)
+{
+ return qemu_xxhash5(phys_pc, pc, flags);
+}
+
#endif
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index 7246c1c46b..ba1635aa4b 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -23,6 +23,7 @@
#include "exec/log.h"
#include "exec/exec-all.h"
#include "exec/tb-flush.h"
+#include "exec/tb-stats.h"
#include "exec/translate-all.h"
#include "sysemu/tcg.h"
#include "tcg/tcg.h"
@@ -40,6 +41,23 @@
#define TB_FOR_EACH_JMP(head_tb, tb, n) \
TB_FOR_EACH_TAGGED((head_tb)->jmp_list_head, tb, n, jmp_list_next)
+/*
+ * This is the more or less the same compare as tb_cmp(), but the
+ * data persists over tb_flush. We also aggregate the various
+ * variations of cflags under one record and ignore the details of
+ * page overlap (although we can count it).
+ */
+bool tb_stats_cmp(const void *ap, const void *bp)
+{
+ const TBStatistics *a = ap;
+ const TBStatistics *b = bp;
+
+ return a->phys_pc == b->phys_pc &&
+ a->pc == b->pc &&
+ a->cs_base == b->cs_base &&
+ a->flags == b->flags;
+}
+
static bool tb_cmp(const void *ap, const void *bp)
{
const TranslationBlock *a = ap;
@@ -59,6 +77,7 @@ void tb_htable_init(void)
unsigned int mode = QHT_MODE_AUTO_RESIZE;
qht_init(&tb_ctx.htable, tb_cmp, CODE_GEN_HTABLE_SIZE, mode);
+ init_tb_stats_htable();
}
typedef struct PageDesc PageDesc;
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
new file mode 100644
index 0000000000..f78a21f522
--- /dev/null
+++ b/accel/tcg/tb-stats.c
@@ -0,0 +1,58 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+
+#include "disas/disas.h"
+
+#include "exec/tb-stats.h"
+#include "tb-context.h"
+
+/* TBStatistic collection controls */
+enum TBStatsStatus {
+ TB_STATS_DISABLED = 0,
+ TB_STATS_RUNNING,
+ TB_STATS_PAUSED,
+ TB_STATS_STOPPED
+};
+
+static enum TBStatsStatus tcg_collect_tb_stats;
+
+void init_tb_stats_htable(void)
+{
+ if (!tb_ctx.tb_stats.map && tb_stats_collection_enabled()) {
+ qht_init(&tb_ctx.tb_stats, tb_stats_cmp,
+ CODE_GEN_HTABLE_SIZE, QHT_MODE_AUTO_RESIZE);
+ }
+}
+
+void enable_collect_tb_stats(void)
+{
+ tcg_collect_tb_stats = TB_STATS_RUNNING;
+ init_tb_stats_htable();
+}
+
+void disable_collect_tb_stats(void)
+{
+ tcg_collect_tb_stats = TB_STATS_PAUSED;
+}
+
+void pause_collect_tb_stats(void)
+{
+ tcg_collect_tb_stats = TB_STATS_STOPPED;
+}
+
+bool tb_stats_collection_enabled(void)
+{
+ return tcg_collect_tb_stats == TB_STATS_RUNNING;
+}
+
+bool tb_stats_collection_paused(void)
+{
+ return tcg_collect_tb_stats == TB_STATS_PAUSED;
+}
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 74deb18bd0..0fcde2e84a 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -54,6 +54,7 @@
#include "qemu/cacheinfo.h"
#include "qemu/timer.h"
#include "exec/log.h"
+#include "exec/tb-stats.h"
#include "sysemu/cpus.h"
#include "sysemu/cpu-timers.h"
#include "sysemu/tcg.h"
@@ -297,6 +298,37 @@ static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
return tcg_gen_code(tcg_ctx, tb, pc);
}
+static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
+ target_ulong cs_base, uint32_t flags)
+{
+ TBStatistics *new_stats = g_new0(TBStatistics, 1);
+ uint32_t hash = tb_stats_hash_func(phys_pc, pc, flags);
+ void *existing_stats = NULL;
+ new_stats->phys_pc = phys_pc;
+ new_stats->pc = pc;
+ new_stats->cs_base = cs_base;
+ new_stats->flags = flags;
+
+ /*
+ * All initialisation must be complete before we insert into qht
+ * table otherwise another thread might get a partially created
+ * structure.
+ */
+ qht_insert(&tb_ctx.tb_stats, new_stats, hash, &existing_stats);
+
+ if (unlikely(existing_stats)) {
+ /*
+ * If there is already a TBStatistic for this TB from a previous flush
+ * then just make the new TB point to the older TBStatistic
+ */
+ g_free(new_stats);
+ return existing_stats;
+ } else {
+ return new_stats;
+ }
+}
+
+
/* Called with mmap_lock held for user mode emulation. */
TranslationBlock *tb_gen_code(CPUState *cpu,
target_ulong pc, target_ulong cs_base,
@@ -362,6 +394,17 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
trace_translate_block(tb, pc, tb->tc.ptr);
+ /*
+ * We want to fetch the stats structure before we start code
+ * generation so we can count interesting things about this
+ * generation.
+ */
+ if (tb_stats_collection_enabled()) {
+ tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
+ } else {
+ tb->tb_stats = NULL;
+ }
+
gen_code_size = setjmp_gen_code(env, tb, pc, host_pc, &max_insns, &ti);
if (unlikely(gen_code_size < 0)) {
switch (gen_code_size) {
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index ad9eb6067b..25534baf33 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -611,6 +611,9 @@ struct TranslationBlock {
uintptr_t jmp_list_head;
uintptr_t jmp_list_next[2];
uintptr_t jmp_dest[2];
+
+ /* Pointer to a struct where statistics from the TB is stored */
+ struct TBStatistics *tb_stats;
};
/* Hide the qatomic_read to make code a little easier on the eyes */
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
new file mode 100644
index 0000000000..606f643723
--- /dev/null
+++ b/include/exec/tb-stats.h
@@ -0,0 +1,61 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef TB_STATS_H
+
+#define TB_STATS_H
+
+#include "exec/cpu-common.h"
+#include "exec/exec-all.h"
+#include "tcg/tcg.h"
+
+typedef struct TBStatistics TBStatistics;
+
+/*
+ * This struct stores statistics such as execution count of the
+ * TranslationBlocks. Each sets of TBs for a given phys_pc/pc/flags
+ * has its own TBStatistics which will persist over tb_flush.
+ *
+ * We include additional counters to track number of translations as
+ * well as variants for compile flags.
+ */
+struct TBStatistics {
+ tb_page_addr_t phys_pc;
+ target_ulong pc;
+ uint32_t flags;
+ /* cs_base isn't included in the hash but we do check for matches */
+ target_ulong cs_base;
+};
+
+bool tb_stats_cmp(const void *ap, const void *bp);
+
+void init_tb_stats_htable(void);
+
+void enable_collect_tb_stats(void);
+void disable_collect_tb_stats(void);
+void pause_collect_tb_stats(void);
+bool tb_stats_collection_enabled(void);
+bool tb_stats_collection_paused(void);
+
+#endif
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v11 01/14] accel/tcg: introduce TBStatistics structure
2023-04-21 13:24 ` [PATCH v11 01/14] accel/tcg: introduce TBStatistics structure Fei Wu
@ 2023-05-03 8:12 ` Richard Henderson
2023-05-08 9:52 ` Wu, Fei
0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2023-05-03 8:12 UTC (permalink / raw)
To: Fei Wu, alex.bennee, qemu-devel; +Cc: Vanderson M. do Rosario, Paolo Bonzini
On 4/21/23 14:24, Fei Wu wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>
> To store statistics for each TB, we created a TBStatistics structure
> which is linked with the TBs. TBStatistics can stay alive after
> tb_flush and be relinked to a regenerated TB. So the statistics can
> be accumulated even through flushes.
>
> The goal is to have all present and future qemu/tcg statistics and
> meta-data stored in this new structure.
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-2-vandersonmr2@gmail.com>
> [AJB: fix git author, review comments]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> accel/tcg/meson.build | 1 +
> accel/tcg/tb-context.h | 1 +
> accel/tcg/tb-hash.h | 7 +++++
> accel/tcg/tb-maint.c | 19 ++++++++++++
> accel/tcg/tb-stats.c | 58 +++++++++++++++++++++++++++++++++++++
> accel/tcg/translate-all.c | 43 +++++++++++++++++++++++++++
> include/exec/exec-all.h | 3 ++
> include/exec/tb-stats.h | 61 +++++++++++++++++++++++++++++++++++++++
> 8 files changed, 193 insertions(+)
> create mode 100644 accel/tcg/tb-stats.c
> create mode 100644 include/exec/tb-stats.h
>
> diff --git a/accel/tcg/meson.build b/accel/tcg/meson.build
> index aeb20a6ef0..9263bdde11 100644
> --- a/accel/tcg/meson.build
> +++ b/accel/tcg/meson.build
> @@ -4,6 +4,7 @@ tcg_ss.add(files(
> 'cpu-exec-common.c',
> 'cpu-exec.c',
> 'tb-maint.c',
> + 'tb-stats.c',
> 'tcg-runtime-gvec.c',
> 'tcg-runtime.c',
> 'translate-all.c',
> diff --git a/accel/tcg/tb-context.h b/accel/tcg/tb-context.h
> index cac62d9749..d7910d586b 100644
> --- a/accel/tcg/tb-context.h
> +++ b/accel/tcg/tb-context.h
> @@ -35,6 +35,7 @@ struct TBContext {
> /* statistics */
> unsigned tb_flush_count;
> unsigned tb_phys_invalidate_count;
> + struct qht tb_stats;
> };
>
> extern TBContext tb_ctx;
> diff --git a/accel/tcg/tb-hash.h b/accel/tcg/tb-hash.h
> index 83dc610e4c..87d657a1c6 100644
> --- a/accel/tcg/tb-hash.h
> +++ b/accel/tcg/tb-hash.h
> @@ -67,4 +67,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
> return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
> }
>
> +static inline
> +uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
> + uint32_t flags)
> +{
> + return qemu_xxhash5(phys_pc, pc, flags);
> +}
> +
Why are you avoiding the hash of cs_base?
It certainly changes the comparison for a number of guests.
> +/*
> + * This is the more or less the same compare as tb_cmp(), but the
> + * data persists over tb_flush. We also aggregate the various
> + * variations of cflags under one record and ignore the details of
> + * page overlap (although we can count it).
> + */
> +bool tb_stats_cmp(const void *ap, const void *bp)
> +{
> + const TBStatistics *a = ap;
> + const TBStatistics *b = bp;
> +
> + return a->phys_pc == b->phys_pc &&
> + a->pc == b->pc &&
> + a->cs_base == b->cs_base &&
> + a->flags == b->flags;
> +}
So, comparing cs_base, but not hashing it?
> +void disable_collect_tb_stats(void)
> +{
> + tcg_collect_tb_stats = TB_STATS_PAUSED;
> +}
> +
> +void pause_collect_tb_stats(void)
> +{
> + tcg_collect_tb_stats = TB_STATS_STOPPED;
> +}
These two seem swapped.
r~
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 01/14] accel/tcg: introduce TBStatistics structure
2023-05-03 8:12 ` Richard Henderson
@ 2023-05-08 9:52 ` Wu, Fei
0 siblings, 0 replies; 27+ messages in thread
From: Wu, Fei @ 2023-05-08 9:52 UTC (permalink / raw)
To: Richard Henderson, alex.bennee, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
On 5/3/2023 4:12 PM, Richard Henderson wrote:
> On 4/21/23 14:24, Fei Wu wrote:
>> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>>
>> To store statistics for each TB, we created a TBStatistics structure
>> which is linked with the TBs. TBStatistics can stay alive after
>> tb_flush and be relinked to a regenerated TB. So the statistics can
>> be accumulated even through flushes.
>>
>> The goal is to have all present and future qemu/tcg statistics and
>> meta-data stored in this new structure.
>>
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> Message-Id: <20190829173437.5926-2-vandersonmr2@gmail.com>
>> [AJB: fix git author, review comments]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>> accel/tcg/meson.build | 1 +
>> accel/tcg/tb-context.h | 1 +
>> accel/tcg/tb-hash.h | 7 +++++
>> accel/tcg/tb-maint.c | 19 ++++++++++++
>> accel/tcg/tb-stats.c | 58 +++++++++++++++++++++++++++++++++++++
>> accel/tcg/translate-all.c | 43 +++++++++++++++++++++++++++
>> include/exec/exec-all.h | 3 ++
>> include/exec/tb-stats.h | 61 +++++++++++++++++++++++++++++++++++++++
>> 8 files changed, 193 insertions(+)
>> create mode 100644 accel/tcg/tb-stats.c
>> create mode 100644 include/exec/tb-stats.h
>>
>> diff --git a/accel/tcg/meson.build b/accel/tcg/meson.build
>> index aeb20a6ef0..9263bdde11 100644
>> --- a/accel/tcg/meson.build
>> +++ b/accel/tcg/meson.build
>> @@ -4,6 +4,7 @@ tcg_ss.add(files(
>> 'cpu-exec-common.c',
>> 'cpu-exec.c',
>> 'tb-maint.c',
>> + 'tb-stats.c',
>> 'tcg-runtime-gvec.c',
>> 'tcg-runtime.c',
>> 'translate-all.c',
>> diff --git a/accel/tcg/tb-context.h b/accel/tcg/tb-context.h
>> index cac62d9749..d7910d586b 100644
>> --- a/accel/tcg/tb-context.h
>> +++ b/accel/tcg/tb-context.h
>> @@ -35,6 +35,7 @@ struct TBContext {
>> /* statistics */
>> unsigned tb_flush_count;
>> unsigned tb_phys_invalidate_count;
>> + struct qht tb_stats;
>> };
>> extern TBContext tb_ctx;
>> diff --git a/accel/tcg/tb-hash.h b/accel/tcg/tb-hash.h
>> index 83dc610e4c..87d657a1c6 100644
>> --- a/accel/tcg/tb-hash.h
>> +++ b/accel/tcg/tb-hash.h
>> @@ -67,4 +67,11 @@ uint32_t tb_hash_func(tb_page_addr_t phys_pc,
>> target_ulong pc, uint32_t flags,
>> return qemu_xxhash7(phys_pc, pc, flags, cf_mask,
>> trace_vcpu_dstate);
>> }
>> +static inline
>> +uint32_t tb_stats_hash_func(tb_page_addr_t phys_pc, target_ulong pc,
>> + uint32_t flags)
>> +{
>> + return qemu_xxhash5(phys_pc, pc, flags);
>> +}
>> +
>
> Why are you avoiding the hash of cs_base?
> It certainly changes the comparison for a number of guests.
>
Just as you mentioned below, it's checked during compare. There is a
comment in TBStatistics definition.
>> +/*
>> + * This is the more or less the same compare as tb_cmp(), but the
>> + * data persists over tb_flush. We also aggregate the various
>> + * variations of cflags under one record and ignore the details of
>> + * page overlap (although we can count it).
>> + */
>> +bool tb_stats_cmp(const void *ap, const void *bp)
>> +{
>> + const TBStatistics *a = ap;
>> + const TBStatistics *b = bp;
>> +
>> + return a->phys_pc == b->phys_pc &&
>> + a->pc == b->pc &&
>> + a->cs_base == b->cs_base &&
>> + a->flags == b->flags;
>> +}
>
> So, comparing cs_base, but not hashing it?
>
Yes.
>> +void disable_collect_tb_stats(void)
>> +{
>> + tcg_collect_tb_stats = TB_STATS_PAUSED;
>> +}
>> +
>> +void pause_collect_tb_stats(void)
>> +{
>> + tcg_collect_tb_stats = TB_STATS_STOPPED;
>> +}
>
> These two seem swapped.
>
Yes, it seems so, I will update it.
Thanks,
Fei.
>
> r~
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v11 02/14] accel: collecting TB execution count
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
2023-04-21 13:24 ` [PATCH v11 01/14] accel/tcg: introduce TBStatistics structure Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-05-03 8:28 ` Richard Henderson
2023-04-21 13:24 ` [PATCH v11 03/14] accel: collecting JIT statistics Fei Wu
` (12 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
enabled, then we instrument the start code of this TB
to atomically count the number of times it is executed.
We count both the number of "normal" executions and atomic
executions of a TB.
The execution count of the TB is stored in its respective
TBS.
All TBStatistics are created by default with the flags from
default_tbstats_flag.
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-3-vandersonmr2@gmail.com>
[AJB: Fix author]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/cpu-exec.c | 6 ++++++
accel/tcg/tb-stats.c | 6 ++++++
accel/tcg/tcg-runtime.c | 8 ++++++++
accel/tcg/tcg-runtime.h | 2 ++
accel/tcg/translate-all.c | 7 +++++--
accel/tcg/translator.c | 10 ++++++++++
include/exec/gen-icount.h | 1 +
include/exec/tb-stats.h | 18 ++++++++++++++++++
8 files changed, 56 insertions(+), 2 deletions(-)
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index c815f2dbfd..d89f9fe493 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -25,6 +25,7 @@
#include "trace.h"
#include "disas/disas.h"
#include "exec/exec-all.h"
+#include "exec/tb-stats.h"
#include "tcg/tcg.h"
#include "qemu/atomic.h"
#include "qemu/rcu.h"
@@ -564,7 +565,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
mmap_unlock();
}
+ if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+ tb->tb_stats->executions.atomic++;
+ }
+
cpu_exec_enter(cpu);
+
/* execute the generated code */
trace_exec_tb(tb, pc);
cpu_tb_exec(cpu, tb, &tb_exit);
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index f78a21f522..b1d4341b6f 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -22,6 +22,7 @@ enum TBStatsStatus {
};
static enum TBStatsStatus tcg_collect_tb_stats;
+static uint32_t default_tbstats_flag;
void init_tb_stats_htable(void)
{
@@ -56,3 +57,8 @@ bool tb_stats_collection_paused(void)
{
return tcg_collect_tb_stats == TB_STATS_PAUSED;
}
+
+uint32_t get_default_tbstats_flag(void)
+{
+ return default_tbstats_flag;
+}
diff --git a/accel/tcg/tcg-runtime.c b/accel/tcg/tcg-runtime.c
index e4e030043f..44a349b334 100644
--- a/accel/tcg/tcg-runtime.c
+++ b/accel/tcg/tcg-runtime.c
@@ -27,6 +27,7 @@
#include "exec/helper-proto.h"
#include "exec/cpu_ldst.h"
#include "exec/exec-all.h"
+#include "exec/tb-stats.h"
#include "disas/disas.h"
#include "exec/log.h"
#include "tcg/tcg.h"
@@ -148,3 +149,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
{
cpu_loop_exit_atomic(env_cpu(env), GETPC());
}
+
+void HELPER(inc_exec_freq)(void *ptr)
+{
+ TBStatistics *stats = (TBStatistics *) ptr;
+ tcg_debug_assert(stats);
+ qatomic_inc(&stats->executions.normal);
+}
diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index e141a6ab24..66954aa210 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -39,6 +39,8 @@ DEF_HELPER_FLAGS_1(exit_atomic, TCG_CALL_NO_WG, noreturn, env)
DEF_HELPER_FLAGS_3(memset, TCG_CALL_NO_RWG, ptr, ptr, int, ptr)
#endif /* IN_HELPER_PROTO */
+DEF_HELPER_FLAGS_1(inc_exec_freq, TCG_CALL_NO_RWG, void, ptr)
+
DEF_HELPER_FLAGS_5(atomic_cmpxchgb, TCG_CALL_NO_WG,
i32, env, tl, i32, i32, i32)
DEF_HELPER_FLAGS_5(atomic_cmpxchgw_be, TCG_CALL_NO_WG,
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 0fcde2e84a..afc89d5692 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -308,6 +308,7 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
new_stats->pc = pc;
new_stats->cs_base = cs_base;
new_stats->flags = flags;
+ new_stats->stats_enabled = get_default_tbstats_flag();
/*
* All initialisation must be complete before we insert into qht
@@ -397,9 +398,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
/*
* We want to fetch the stats structure before we start code
* generation so we can count interesting things about this
- * generation.
+ * generation. If dfilter is in effect we will only collect stats
+ * for the specified range.
*/
- if (tb_stats_collection_enabled()) {
+ if (tb_stats_collection_enabled() &&
+ qemu_log_in_addr_range(tb->pc)) {
tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
} else {
tb->tb_stats = NULL;
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 7bda43ff61..165072c8c3 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -18,6 +18,15 @@
#include "exec/plugin-gen.h"
#include "exec/replay-core.h"
+static inline void gen_tb_exec_count(TranslationBlock *tb)
+{
+ if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
+ TCGv_ptr ptr = tcg_temp_new_ptr();
+ tcg_gen_movi_ptr(ptr, (intptr_t)tb->tb_stats);
+ gen_helper_inc_exec_freq(ptr);
+ }
+}
+
bool translator_use_goto_tb(DisasContextBase *db, target_ulong dest)
{
/* Suppress goto_tb if requested. */
@@ -56,6 +65,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
/* Start translating. */
gen_tb_start(db->tb);
+ gen_tb_exec_count(tb);
ops->tb_start(db, cpu);
tcg_debug_assert(db->is_jmp == DISAS_NEXT); /* no early exit */
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index f6de79a6b4..20e7835ff0 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -2,6 +2,7 @@
#define GEN_ICOUNT_H
#include "exec/exec-all.h"
+#include "exec/tb-stats.h"
/* Helpers for instruction counting code generation. */
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 606f643723..27a233b7f8 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -30,6 +30,9 @@
#include "exec/exec-all.h"
#include "tcg/tcg.h"
+#define tb_stats_enabled(tb, JIT_STATS) \
+ (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
+
typedef struct TBStatistics TBStatistics;
/*
@@ -46,16 +49,31 @@ struct TBStatistics {
uint32_t flags;
/* cs_base isn't included in the hash but we do check for matches */
target_ulong cs_base;
+
+ /* which stats are enabled for this TBStats */
+ uint32_t stats_enabled;
+
+ /* Execution stats */
+ struct {
+ unsigned long normal;
+ unsigned long atomic;
+ } executions;
+
};
bool tb_stats_cmp(const void *ap, const void *bp);
void init_tb_stats_htable(void);
+#define TB_NOTHING (1 << 0)
+#define TB_EXEC_STATS (1 << 1)
+
void enable_collect_tb_stats(void);
void disable_collect_tb_stats(void);
void pause_collect_tb_stats(void);
bool tb_stats_collection_enabled(void);
bool tb_stats_collection_paused(void);
+uint32_t get_default_tbstats_flag(void);
+
#endif
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v11 02/14] accel: collecting TB execution count
2023-04-21 13:24 ` [PATCH v11 02/14] accel: collecting TB execution count Fei Wu
@ 2023-05-03 8:28 ` Richard Henderson
2023-05-08 10:02 ` Wu, Fei
0 siblings, 1 reply; 27+ messages in thread
From: Richard Henderson @ 2023-05-03 8:28 UTC (permalink / raw)
To: Fei Wu, alex.bennee, qemu-devel; +Cc: Vanderson M. do Rosario, Paolo Bonzini
On 4/21/23 14:24, Fei Wu wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>
> If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
> enabled, then we instrument the start code of this TB
> to atomically count the number of times it is executed.
> We count both the number of "normal" executions and atomic
> executions of a TB.
>
> The execution count of the TB is stored in its respective
> TBS.
>
> All TBStatistics are created by default with the flags from
> default_tbstats_flag.
>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-3-vandersonmr2@gmail.com>
> [AJB: Fix author]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> accel/tcg/cpu-exec.c | 6 ++++++
> accel/tcg/tb-stats.c | 6 ++++++
> accel/tcg/tcg-runtime.c | 8 ++++++++
> accel/tcg/tcg-runtime.h | 2 ++
> accel/tcg/translate-all.c | 7 +++++--
> accel/tcg/translator.c | 10 ++++++++++
> include/exec/gen-icount.h | 1 +
> include/exec/tb-stats.h | 18 ++++++++++++++++++
> 8 files changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index c815f2dbfd..d89f9fe493 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -25,6 +25,7 @@
> #include "trace.h"
> #include "disas/disas.h"
> #include "exec/exec-all.h"
> +#include "exec/tb-stats.h"
> #include "tcg/tcg.h"
> #include "qemu/atomic.h"
> #include "qemu/rcu.h"
> @@ -564,7 +565,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
> mmap_unlock();
> }
>
> + if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
> + tb->tb_stats->executions.atomic++;
> + }
The write is protected by the exclusive lock, but the read might be accessible from the
monitor, iiuc. Which means you should use atomic_set(), for non-tearable write after
non-atomic increment.
> @@ -148,3 +149,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
> {
> cpu_loop_exit_atomic(env_cpu(env), GETPC());
> }
> +
> +void HELPER(inc_exec_freq)(void *ptr)
> +{
> + TBStatistics *stats = (TBStatistics *) ptr;
> + tcg_debug_assert(stats);
> + qatomic_inc(&stats->executions.normal);
> +}
Ug. Do we really need an atomic update?
If we have multiple threads executing through the same TB, we'll get significant slow-down
at the cost of not missing increments. If we allow a non-atomic update, we'll get much
less slow-down at the cost of missing a few increments. But this is statistical only, so
how much does it really matter?
r~
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 02/14] accel: collecting TB execution count
2023-05-03 8:28 ` Richard Henderson
@ 2023-05-08 10:02 ` Wu, Fei
0 siblings, 0 replies; 27+ messages in thread
From: Wu, Fei @ 2023-05-08 10:02 UTC (permalink / raw)
To: Richard Henderson, alex.bennee, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
On 5/3/2023 4:28 PM, Richard Henderson wrote:
> On 4/21/23 14:24, Fei Wu wrote:
>> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>>
>> If a TB has a TBS (TBStatistics) with the TB_EXEC_STATS
>> enabled, then we instrument the start code of this TB
>> to atomically count the number of times it is executed.
>> We count both the number of "normal" executions and atomic
>> executions of a TB.
>>
>> The execution count of the TB is stored in its respective
>> TBS.
>>
>> All TBStatistics are created by default with the flags from
>> default_tbstats_flag.
>>
>> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
>> Message-Id: <20190829173437.5926-3-vandersonmr2@gmail.com>
>> [AJB: Fix author]
>> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
>> ---
>> accel/tcg/cpu-exec.c | 6 ++++++
>> accel/tcg/tb-stats.c | 6 ++++++
>> accel/tcg/tcg-runtime.c | 8 ++++++++
>> accel/tcg/tcg-runtime.h | 2 ++
>> accel/tcg/translate-all.c | 7 +++++--
>> accel/tcg/translator.c | 10 ++++++++++
>> include/exec/gen-icount.h | 1 +
>> include/exec/tb-stats.h | 18 ++++++++++++++++++
>> 8 files changed, 56 insertions(+), 2 deletions(-)
>>
>> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
>> index c815f2dbfd..d89f9fe493 100644
>> --- a/accel/tcg/cpu-exec.c
>> +++ b/accel/tcg/cpu-exec.c
>> @@ -25,6 +25,7 @@
>> #include "trace.h"
>> #include "disas/disas.h"
>> #include "exec/exec-all.h"
>> +#include "exec/tb-stats.h"
>> #include "tcg/tcg.h"
>> #include "qemu/atomic.h"
>> #include "qemu/rcu.h"
>> @@ -564,7 +565,12 @@ void cpu_exec_step_atomic(CPUState *cpu)
>> mmap_unlock();
>> }
>> + if (tb_stats_enabled(tb, TB_EXEC_STATS)) {
>> + tb->tb_stats->executions.atomic++;
>> + }
>
> The write is protected by the exclusive lock, but the read might be
> accessible from the monitor, iiuc. Which means you should use
> atomic_set(), for non-tearable write after non-atomic increment.
>
The writes are serialized, 'atomic' is an aligned integer (unsigned
long), the read in parallel with write should not be a problem? It
returns the value either before increment or after, not part of.
>> @@ -148,3 +149,10 @@ void HELPER(exit_atomic)(CPUArchState *env)
>> {
>> cpu_loop_exit_atomic(env_cpu(env), GETPC());
>> }
>> +
>> +void HELPER(inc_exec_freq)(void *ptr)
>> +{
>> + TBStatistics *stats = (TBStatistics *) ptr;
>> + tcg_debug_assert(stats);
>> + qatomic_inc(&stats->executions.normal);
>> +}
>
> Ug. Do we really need an atomic update?
>
> If we have multiple threads executing through the same TB, we'll get
> significant slow-down at the cost of not missing increments. If we
> allow a non-atomic update, we'll get much less slow-down at the cost of
> missing a few increments. But this is statistical only, so how much
> does it really matter?
>
This sounds reasonable to me. Alex, what's your point here?
Richard, could you please review all this series? I just saw your
reviews on patch 01 and 02.
Thanks,
Fei.
>
> r~
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v11 03/14] accel: collecting JIT statistics
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
2023-04-21 13:24 ` [PATCH v11 01/14] accel/tcg: introduce TBStatistics structure Fei Wu
2023-04-21 13:24 ` [PATCH v11 02/14] accel: collecting TB execution count Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 04/14] accel: replacing part of CONFIG_PROFILER with TBStats Fei Wu
` (11 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
If a TB has a TBS (TBStatistics) with the TB_JIT_STATS enabled then we
collect statistics of its translation processes and code translation.
To help with collection we include the TCGProfile structure
unconditionally. It will have further alterations in future commits.
Collecting the number of host instructions seems to be not simple as
it would imply in having to modify several target source files. So,
for now, we are only collecting the size of the host gen code.
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-4-vandersonmr2@gmail.com>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/translate-all.c | 29 +++++++++++++++++++++++++++--
accel/tcg/translator.c | 3 +++
include/exec/tb-stats.h | 24 ++++++++++++++++++++++++
include/tcg/tcg.h | 22 ++++++++++++++++++++--
tcg/tcg.c | 9 +++++++--
5 files changed, 81 insertions(+), 6 deletions(-)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index afc89d5692..1cd051ca7c 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -309,6 +309,8 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
new_stats->cs_base = cs_base;
new_stats->flags = flags;
new_stats->stats_enabled = get_default_tbstats_flag();
+ new_stats->tbs = g_ptr_array_sized_new(4);
+ qemu_mutex_init(&new_stats->jit_stats_lock);
/*
* All initialisation must be complete before we insert into qht
@@ -322,6 +324,7 @@ static TBStatistics *tb_get_stats(tb_page_addr_t phys_pc, target_ulong pc,
* If there is already a TBStatistic for this TB from a previous flush
* then just make the new TB point to the older TBStatistic
*/
+ g_ptr_array_free(new_stats->tbs, true);
g_free(new_stats);
return existing_stats;
} else {
@@ -340,9 +343,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
tb_page_addr_t phys_pc;
tcg_insn_unit *gen_code_buf;
int gen_code_size, search_size, max_insns;
-#ifdef CONFIG_PROFILER
TCGProfile *prof = &tcg_ctx->prof;
-#endif
int64_t ti;
void *host_pc;
@@ -571,6 +572,30 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
return tb;
}
+ /*
+ * Collect JIT stats when enabled. We batch them all up here to
+ * avoid spamming the cache with atomic accesses
+ */
+ if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+ TBStatistics *ts = tb->tb_stats;
+ qemu_mutex_lock(&ts->jit_stats_lock);
+
+ ts->code.num_guest_inst += prof->translation.nb_guest_insns;
+ ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
+ ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
+ ts->code.spills += prof->translation.nb_spills;
+ ts->code.out_len += tb->tc.size;
+
+ ts->translations.total++;
+ if (tb_page_addr1(tb) != -1) {
+ ts->translations.spanning++;
+ }
+
+ g_ptr_array_add(ts->tbs, tb);
+
+ qemu_mutex_unlock(&ts->jit_stats_lock);
+ }
+
/*
* Insert TB into the corresponding region tree before publishing it
* through QHT. Otherwise rewinding happened in the TB might fail to
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 165072c8c3..40a56962b1 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -132,6 +132,9 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
tb->size = db->pc_next - db->pc_first;
tb->icount = db->num_insns;
+ /* Save number of guest instructions for TB_JIT_STATS */
+ tcg_ctx->prof.translation.nb_guest_insns = db->num_insns;
+
#ifdef DEBUG_DISAS
if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
&& qemu_log_in_addr_range(db->pc_first)) {
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 27a233b7f8..3506c8d4f7 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -59,6 +59,29 @@ struct TBStatistics {
unsigned long atomic;
} executions;
+ /* JIT Stats - protected by lock */
+ QemuMutex jit_stats_lock;
+
+ /* Sum of all operations for all translations */
+ struct {
+ unsigned num_guest_inst;
+ unsigned num_tcg_ops;
+ unsigned num_tcg_ops_opt;
+ unsigned spills;
+ unsigned out_len;
+ } code;
+
+ struct {
+ unsigned long total;
+ unsigned long uncached;
+ unsigned long spanning;
+ } translations;
+
+ /*
+ * All persistent (cached) TranslationBlocks using
+ * this TBStats structure. Has to be reset on a tb_flush.
+ */
+ GPtrArray *tbs;
};
bool tb_stats_cmp(const void *ap, const void *bp);
@@ -67,6 +90,7 @@ void init_tb_stats_htable(void);
#define TB_NOTHING (1 << 0)
#define TB_EXEC_STATS (1 << 1)
+#define TB_JIT_STATS (1 << 2)
void enable_collect_tb_stats(void);
void disable_collect_tb_stats(void);
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 5cfaa53938..0fdd62bf4a 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -529,7 +529,26 @@ static inline TCGRegSet output_pref(const TCGOp *op, unsigned i)
return i < ARRAY_SIZE(op->output_pref) ? op->output_pref[i] : 0;
}
+/*
+ * The TCGProfile structure holds data for analysing the quality of
+ * the code generation. The data is split between stuff that is valid
+ * for the lifetime of a single translation and things that are valid
+ * for the lifetime of the translator. As the former is reset for each
+ * new translation so it should be copied elsewhere if you want to
+ * keep it.
+ *
+ * The structure is safe to access within the context of translation
+ * but accessing the data from elsewhere should be done with safe
+ * work.
+ */
typedef struct TCGProfile {
+
+ struct {
+ int nb_guest_insns;
+ int nb_spills;
+ int nb_ops_pre_opt;
+ } translation;
+
int64_t cpu_exec_time;
int64_t tb_count1;
int64_t tb_count;
@@ -569,9 +588,7 @@ struct TCGContext {
tcg_insn_unit *code_buf; /* pointer for start of tb */
tcg_insn_unit *code_ptr; /* pointer for running end of tb */
-#ifdef CONFIG_PROFILER
TCGProfile prof;
-#endif
#ifdef CONFIG_DEBUG_TCG
int goto_tb_issue_mask;
@@ -633,6 +650,7 @@ struct TCGContext {
/* Exit to translator on overflow. */
sigjmp_buf jmp_trans;
+ TranslationBlock *current_tb;
};
static inline bool temp_readonly(TCGTemp *ts)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index bb52bc060b..f2d0243384 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1137,6 +1137,7 @@ void tcg_func_start(TCGContext *s)
s->nb_labels = 0;
s->current_frame_offset = s->frame_start;
+ s->prof.translation.nb_spills = 0;
#ifdef CONFIG_DEBUG_TCG
s->goto_tb_issue_mask = 0;
#endif
@@ -3750,6 +3751,7 @@ static TCGReg tcg_reg_alloc(TCGContext *s, TCGRegSet required_regs,
}
/* We must spill something. */
+ s->prof.translation.nb_spills++;
for (j = f; j < 2; j++) {
TCGRegSet set = reg_ct[j];
@@ -4924,12 +4926,12 @@ int64_t tcg_cpu_exec_time(void)
int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
{
-#ifdef CONFIG_PROFILER
TCGProfile *prof = &s->prof;
-#endif
int i, num_insns;
TCGOp *op;
+ s->current_tb = tb;
+
#ifdef CONFIG_PROFILER
{
int n = 0;
@@ -4963,6 +4965,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
}
#endif
+ /* save pre-optimisation op count */
+ prof->translation.nb_ops_pre_opt = s->nb_ops;
+
#ifdef CONFIG_DEBUG_TCG
/* Ensure all labels referenced have been emitted. */
{
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 04/14] accel: replacing part of CONFIG_PROFILER with TBStats
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (2 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 03/14] accel: collecting JIT statistics Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 05/14] accel/tcg: move profiler dev_time to tb_stats Fei Wu
` (10 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
We add some of the statistics collected in the TCGProfiler into the
TBStats, having the statistics not only for the whole emulation but
for each TB. Then, we removed these stats from TCGProfiler and
reconstruct the information for the "info jit" using the sum of all
TBStats statistics.
The goal is to have one unique and better way of collecting emulation
statistics. Moreover, checking dynamiclly if the profiling is enabled
showed to have an insignificant impact on the performance:
https://wiki.qemu.org/Internships/ProjectIdeas/TCGCodeQuality#Overheads.
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-5-vandersonmr2@gmail.com>
[AJB: fix authorship, use prof structure]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/tb-stats.c | 99 +++++++++++++++++++++++++++++++++
accel/tcg/translate-all.c | 10 ++--
include/exec/tb-stats.h | 11 ++++
include/tcg/tcg.h | 9 +--
tcg/tcg.c | 112 ++++----------------------------------
5 files changed, 129 insertions(+), 112 deletions(-)
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index b1d4341b6f..434b235edb 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -9,6 +9,10 @@
#include "qemu/osdep.h"
#include "disas/disas.h"
+#include "exec/exec-all.h"
+#include "tcg/tcg.h"
+
+#include "qemu/qemu-print.h"
#include "exec/tb-stats.h"
#include "tb-context.h"
@@ -24,6 +28,101 @@ enum TBStatsStatus {
static enum TBStatsStatus tcg_collect_tb_stats;
static uint32_t default_tbstats_flag;
+struct jit_profile_info {
+ uint64_t translations;
+ uint64_t aborted;
+ uint64_t ops;
+ unsigned ops_max;
+ uint64_t del_ops;
+ uint64_t temps;
+ unsigned temps_max;
+ uint64_t host;
+ uint64_t guest;
+ uint64_t search_data;
+};
+
+/* accumulate the statistics from all TBs */
+static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
+{
+ struct jit_profile_info *jpi = userp;
+ TBStatistics *tbs = p;
+
+ jpi->translations += tbs->translations.total;
+ jpi->ops += tbs->code.num_tcg_ops;
+ if (stat_per_translation(tbs, code.num_tcg_ops) > jpi->ops_max) {
+ jpi->ops_max = stat_per_translation(tbs, code.num_tcg_ops);
+ }
+ jpi->del_ops += tbs->code.deleted_ops;
+ jpi->temps += tbs->code.temps;
+ if (stat_per_translation(tbs, code.temps) > jpi->temps_max) {
+ jpi->temps_max = stat_per_translation(tbs, code.temps);
+ }
+ jpi->host += tbs->code.out_len;
+ jpi->guest += tbs->code.in_len;
+ jpi->search_data += tbs->code.search_out_len;
+}
+
+/* dump JIT statisticis using TCGProfile and TBStats */
+void dump_jit_profile_info(TCGProfile *s, GString *buf)
+{
+ if (!tb_stats_collection_enabled()) {
+ return;
+ }
+
+ struct jit_profile_info *jpi = g_new0(struct jit_profile_info, 1);
+
+ qht_iter(&tb_ctx.tb_stats, collect_jit_profile_info, jpi);
+
+ if (jpi->translations) {
+ g_string_append_printf(buf, "translated TBs %" PRId64 "\n",
+ jpi->translations);
+ g_string_append_printf(buf, "avg ops/TB %0.1f max=%d\n",
+ jpi->ops / (double) jpi->translations, jpi->ops_max);
+ g_string_append_printf(buf, "deleted ops/TB %0.2f\n",
+ jpi->del_ops / (double) jpi->translations);
+ g_string_append_printf(buf, "avg temps/TB %0.2f max=%d\n",
+ jpi->temps / (double) jpi->translations, jpi->temps_max);
+ g_string_append_printf(buf, "avg host code/TB %0.1f\n",
+ jpi->host / (double) jpi->translations);
+ g_string_append_printf(buf, "avg search data/TB %0.1f\n",
+ jpi->search_data / (double) jpi->translations);
+
+ if (s) {
+ int64_t tot = s->interm_time + s->code_time;
+ g_string_append_printf(buf, "JIT cycles %" PRId64
+ " (%0.3f s at 2.4 GHz)\n",
+ tot, tot / 2.4e9);
+ g_string_append_printf(buf, "cycles/op %0.1f\n",
+ jpi->ops ? (double)tot / jpi->ops : 0);
+ g_string_append_printf(buf, "cycles/in byte %0.1f\n",
+ jpi->guest ? (double)tot / jpi->guest : 0);
+ g_string_append_printf(buf, "cycles/out byte %0.1f\n",
+ jpi->host ? (double)tot / jpi->host : 0);
+ g_string_append_printf(buf, "cycles/search byte %0.1f\n",
+ jpi->search_data ? (double)tot / jpi->search_data : 0);
+ if (tot == 0) {
+ tot = 1;
+ }
+ g_string_append_printf(buf, " gen_interm time %0.1f%%\n",
+ (double)s->interm_time / tot * 100.0);
+ g_string_append_printf(buf, " gen_code time %0.1f%%\n",
+ (double)s->code_time / tot * 100.0);
+ g_string_append_printf(buf, "optim./code time %0.1f%%\n",
+ (double)s->opt_time / (s->code_time ? s->code_time : 1)
+ * 100.0);
+ g_string_append_printf(buf, "liveness/code time %0.1f%%\n",
+ (double)s->la_time / (s->code_time ? s->code_time : 1)
+ * 100.0);
+ g_string_append_printf(buf, "cpu_restore count %" PRId64 "\n",
+ s->restore_count);
+ g_string_append_printf(buf, " avg cycles %0.1f\n",
+ s->restore_count ?
+ (double)s->restore_time / s->restore_count : 0);
+ }
+ }
+ g_free(jpi);
+}
+
void init_tb_stats_htable(void)
{
if (!tb_ctx.tb_stats.map && tb_stats_collection_enabled()) {
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 1cd051ca7c..8a51c291d8 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -289,7 +289,6 @@ static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
*max_insns = tb->icount;
#ifdef CONFIG_PROFILER
- qatomic_set(&tcg_ctx->prof.tb_count, tcg_ctx->prof.tb_count + 1);
qatomic_set(&tcg_ctx->prof.interm_time,
tcg_ctx->prof.interm_time + profile_getclock() - *ti);
*ti = profile_getclock();
@@ -389,8 +388,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
tb_overflow:
#ifdef CONFIG_PROFILER
- /* includes aborted translations because of exceptions */
- qatomic_set(&prof->tb_count1, prof->tb_count1 + 1);
ti = profile_getclock();
#endif
@@ -463,9 +460,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
#ifdef CONFIG_PROFILER
qatomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
- qatomic_set(&prof->code_in_len, prof->code_in_len + tb->size);
- qatomic_set(&prof->code_out_len, prof->code_out_len + gen_code_size);
- qatomic_set(&prof->search_out_len, prof->search_out_len + search_size);
#endif
#ifdef DEBUG_DISAS
@@ -584,7 +578,11 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
ts->code.spills += prof->translation.nb_spills;
+ ts->code.temps += prof->translation.temp_count;
+ ts->code.deleted_ops += prof->translation.del_op_count;
+ ts->code.in_len += tb->size;
ts->code.out_len += tb->tc.size;
+ ts->code.search_out_len += search_size;
ts->translations.total++;
if (tb_page_addr1(tb) != -1) {
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 3506c8d4f7..0ccf025ae4 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -33,6 +33,9 @@
#define tb_stats_enabled(tb, JIT_STATS) \
(tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
+#define stat_per_translation(stat, name) \
+ (stat->translations.total ? stat->name / stat->translations.total : 0)
+
typedef struct TBStatistics TBStatistics;
/*
@@ -68,7 +71,13 @@ struct TBStatistics {
unsigned num_tcg_ops;
unsigned num_tcg_ops_opt;
unsigned spills;
+
+ /* CONFIG_PROFILE */
+ unsigned temps;
+ unsigned deleted_ops;
+ unsigned in_len;
unsigned out_len;
+ unsigned search_out_len;
} code;
struct {
@@ -88,6 +97,8 @@ bool tb_stats_cmp(const void *ap, const void *bp);
void init_tb_stats_htable(void);
+void dump_jit_profile_info(TCGProfile *s, GString *buf);
+
#define TB_NOTHING (1 << 0)
#define TB_EXEC_STATS (1 << 1)
#define TB_JIT_STATS (1 << 2)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 0fdd62bf4a..645dbaa563 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -547,16 +547,13 @@ typedef struct TCGProfile {
int nb_guest_insns;
int nb_spills;
int nb_ops_pre_opt;
+
+ int del_op_count;
+ int temp_count;
} translation;
int64_t cpu_exec_time;
- int64_t tb_count1;
- int64_t tb_count;
int64_t op_count; /* total insn count */
- int op_count_max; /* max insn per TB */
- int temp_count_max;
- int64_t temp_count;
- int64_t del_op_count;
int64_t code_in_len;
int64_t code_out_len;
int64_t search_out_len;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index f2d0243384..6f46c87dc1 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -44,6 +44,7 @@
#define NO_CPU_IO_DEFS
#include "exec/exec-all.h"
+#include "exec/tb-stats.h"
#include "tcg/tcg-op.h"
#if UINTPTR_MAX == UINT32_MAX
@@ -1138,6 +1139,8 @@ void tcg_func_start(TCGContext *s)
s->current_frame_offset = s->frame_start;
s->prof.translation.nb_spills = 0;
+ s->prof.translation.del_op_count = 0;
+ s->prof.translation.temp_count = 0;
#ifdef CONFIG_DEBUG_TCG
s->goto_tb_issue_mask = 0;
#endif
@@ -2582,10 +2585,8 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
QTAILQ_REMOVE(&s->ops, op, link);
QTAILQ_INSERT_TAIL(&s->free_ops, op, link);
s->nb_ops--;
-
-#ifdef CONFIG_PROFILER
- qatomic_set(&s->prof.del_op_count, s->prof.del_op_count + 1);
-#endif
+ /* ? won't this end up op_opt - op = del_op_count ? */
+ s->prof.translation.del_op_count++;
}
void tcg_remove_ops_after(TCGOp *op)
@@ -4844,16 +4845,6 @@ void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
if (counters) {
PROF_ADD(prof, orig, cpu_exec_time);
- PROF_ADD(prof, orig, tb_count1);
- PROF_ADD(prof, orig, tb_count);
- PROF_ADD(prof, orig, op_count);
- PROF_MAX(prof, orig, op_count_max);
- PROF_ADD(prof, orig, temp_count);
- PROF_MAX(prof, orig, temp_count_max);
- PROF_ADD(prof, orig, del_op_count);
- PROF_ADD(prof, orig, code_in_len);
- PROF_ADD(prof, orig, code_out_len);
- PROF_ADD(prof, orig, search_out_len);
PROF_ADD(prof, orig, interm_time);
PROF_ADD(prof, orig, code_time);
PROF_ADD(prof, orig, la_time);
@@ -4931,26 +4922,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
TCGOp *op;
s->current_tb = tb;
-
-#ifdef CONFIG_PROFILER
- {
- int n = 0;
-
- QTAILQ_FOREACH(op, &s->ops, link) {
- n++;
- }
- qatomic_set(&prof->op_count, prof->op_count + n);
- if (n > prof->op_count_max) {
- qatomic_set(&prof->op_count_max, n);
- }
-
- n = s->nb_temps;
- qatomic_set(&prof->temp_count, prof->temp_count + n);
- if (n > prof->temp_count_max) {
- qatomic_set(&prof->temp_count_max, n);
- }
- }
-#endif
+ /* save pre-optimisation op count */
+ prof->translation.nb_ops_pre_opt = s->nb_ops;
+ prof->translation.temp_count = s->nb_temps;
#ifdef DEBUG_DISAS
if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP)
@@ -4965,8 +4939,6 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
}
#endif
- /* save pre-optimisation op count */
- prof->translation.nb_ops_pre_opt = s->nb_ops;
#ifdef CONFIG_DEBUG_TCG
/* Ensure all labels referenced have been emitted. */
@@ -5169,76 +5141,16 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
return tcg_current_code_size(s);
}
-#ifdef CONFIG_PROFILER
void tcg_dump_info(GString *buf)
{
+ TCGProfile *s = NULL;
+#ifdef CONFIG_PROFILER
TCGProfile prof = {};
- const TCGProfile *s;
- int64_t tb_count;
- int64_t tb_div_count;
- int64_t tot;
-
tcg_profile_snapshot_counters(&prof);
s = &prof;
- tb_count = s->tb_count;
- tb_div_count = tb_count ? tb_count : 1;
- tot = s->interm_time + s->code_time;
-
- g_string_append_printf(buf, "JIT cycles %" PRId64
- " (%0.3f s at 2.4 GHz)\n",
- tot, tot / 2.4e9);
- g_string_append_printf(buf, "translated TBs %" PRId64
- " (aborted=%" PRId64 " %0.1f%%)\n",
- tb_count, s->tb_count1 - tb_count,
- (double)(s->tb_count1 - s->tb_count)
- / (s->tb_count1 ? s->tb_count1 : 1) * 100.0);
- g_string_append_printf(buf, "avg ops/TB %0.1f max=%d\n",
- (double)s->op_count / tb_div_count, s->op_count_max);
- g_string_append_printf(buf, "deleted ops/TB %0.2f\n",
- (double)s->del_op_count / tb_div_count);
- g_string_append_printf(buf, "avg temps/TB %0.2f max=%d\n",
- (double)s->temp_count / tb_div_count,
- s->temp_count_max);
- g_string_append_printf(buf, "avg host code/TB %0.1f\n",
- (double)s->code_out_len / tb_div_count);
- g_string_append_printf(buf, "avg search data/TB %0.1f\n",
- (double)s->search_out_len / tb_div_count);
-
- g_string_append_printf(buf, "cycles/op %0.1f\n",
- s->op_count ? (double)tot / s->op_count : 0);
- g_string_append_printf(buf, "cycles/in byte %0.1f\n",
- s->code_in_len ? (double)tot / s->code_in_len : 0);
- g_string_append_printf(buf, "cycles/out byte %0.1f\n",
- s->code_out_len ? (double)tot / s->code_out_len : 0);
- g_string_append_printf(buf, "cycles/search byte %0.1f\n",
- s->search_out_len ?
- (double)tot / s->search_out_len : 0);
- if (tot == 0) {
- tot = 1;
- }
- g_string_append_printf(buf, " gen_interm time %0.1f%%\n",
- (double)s->interm_time / tot * 100.0);
- g_string_append_printf(buf, " gen_code time %0.1f%%\n",
- (double)s->code_time / tot * 100.0);
- g_string_append_printf(buf, "optim./code time %0.1f%%\n",
- (double)s->opt_time / (s->code_time ?
- s->code_time : 1)
- * 100.0);
- g_string_append_printf(buf, "liveness/code time %0.1f%%\n",
- (double)s->la_time / (s->code_time ?
- s->code_time : 1) * 100.0);
- g_string_append_printf(buf, "cpu_restore count %" PRId64 "\n",
- s->restore_count);
- g_string_append_printf(buf, " avg cycles %0.1f\n",
- s->restore_count ?
- (double)s->restore_time / s->restore_count : 0);
-}
-#else
-void tcg_dump_info(GString *buf)
-{
- g_string_append_printf(buf, "[TCG profiler not compiled]\n");
-}
#endif
+ dump_jit_profile_info(s, buf);
+}
#ifdef ELF_HOST_MACHINE
/* In order to use this feature, the backend needs to do three things:
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 05/14] accel/tcg: move profiler dev_time to tb_stats
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (3 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 04/14] accel: replacing part of CONFIG_PROFILER with TBStats Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 06/14] accel/tcg: convert profiling of restore operations to TBStats Fei Wu
` (9 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel; +Cc: Paolo Bonzini
From: Alex Bennée <alex.bennee@linaro.org>
This shouldn't live in the monitor code anyway. While we are at it
make it an uint64_t as we won't be dealing in negative numbers.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/monitor.c | 2 --
accel/tcg/tb-stats.c | 2 ++
include/qemu/timer.h | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/accel/tcg/monitor.c b/accel/tcg/monitor.c
index 1450e160e9..c9eec426ff 100644
--- a/accel/tcg/monitor.c
+++ b/accel/tcg/monitor.c
@@ -68,8 +68,6 @@ HumanReadableText *qmp_x_query_opcount(Error **errp)
#ifdef CONFIG_PROFILER
-int64_t dev_time;
-
HumanReadableText *qmp_x_query_profile(Error **errp)
{
g_autoptr(GString) buf = g_string_new("");
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 434b235edb..a2438a1f51 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -28,6 +28,8 @@ enum TBStatsStatus {
static enum TBStatsStatus tcg_collect_tb_stats;
static uint32_t default_tbstats_flag;
+uint64_t dev_time;
+
struct jit_profile_info {
uint64_t translations;
uint64_t aborted;
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index ee071e07d1..d86fc73a17 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -995,7 +995,7 @@ static inline int64_t profile_getclock(void)
return get_clock();
}
-extern int64_t dev_time;
+extern uint64_t dev_time;
#endif
#endif
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 06/14] accel/tcg: convert profiling of restore operations to TBStats
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (4 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 05/14] accel/tcg: move profiler dev_time to tb_stats Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 07/14] accel/tcg: convert profiling of code generation " Fei Wu
` (8 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel; +Cc: Paolo Bonzini
From: Alex Bennée <alex.bennee@linaro.org>
This starts the conversion of CONFIG_PROFILER data collection to under
the TBStats system. We introduce a new flag TB_JIT_TIME and start
tracking how much time is spent restoring execution state from a given
TB.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/translate-all.c | 23 ++++++++++++++---------
include/exec/tb-stats.h | 6 ++++++
include/qemu/timer.h | 3 ---
3 files changed, 20 insertions(+), 12 deletions(-)
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index 8a51c291d8..bf10987450 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -200,10 +200,12 @@ void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
uintptr_t host_pc)
{
uint64_t data[TARGET_INSN_START_WORDS];
-#ifdef CONFIG_PROFILER
- TCGProfile *prof = &tcg_ctx->prof;
- int64_t ti = profile_getclock();
-#endif
+ uint64_t ti = 0;
+
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ ti = profile_getclock();
+ }
+
int insns_left = cpu_unwind_data_from_tb(tb, host_pc, data);
if (insns_left < 0) {
@@ -221,11 +223,14 @@ void cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
cpu->cc->tcg_ops->restore_state_to_opc(cpu, tb, data);
-#ifdef CONFIG_PROFILER
- qatomic_set(&prof->restore_time,
- prof->restore_time + profile_getclock() - ti);
- qatomic_set(&prof->restore_count, prof->restore_count + 1);
-#endif
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ TBStatistics *ts = tb->tb_stats;
+ uint64_t elapsed = profile_getclock() - ti;
+ qemu_mutex_lock(&ts->jit_stats_lock);
+ ts->tb_restore_time += elapsed;
+ ts->tb_restore_count++;
+ qemu_mutex_unlock(&ts->jit_stats_lock);
+ }
}
bool cpu_restore_state(CPUState *cpu, uintptr_t host_pc)
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 0ccf025ae4..80314c50f9 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -91,6 +91,11 @@ struct TBStatistics {
* this TBStats structure. Has to be reset on a tb_flush.
*/
GPtrArray *tbs;
+
+ /* Recover state from TB */
+ uint64_t tb_restore_time;
+ uint64_t tb_restore_count;
+
};
bool tb_stats_cmp(const void *ap, const void *bp);
@@ -102,6 +107,7 @@ void dump_jit_profile_info(TCGProfile *s, GString *buf);
#define TB_NOTHING (1 << 0)
#define TB_EXEC_STATS (1 << 1)
#define TB_JIT_STATS (1 << 2)
+#define TB_JIT_TIME (1 << 3)
void enable_collect_tb_stats(void);
void disable_collect_tb_stats(void);
diff --git a/include/qemu/timer.h b/include/qemu/timer.h
index d86fc73a17..ad0da18a5f 100644
--- a/include/qemu/timer.h
+++ b/include/qemu/timer.h
@@ -989,7 +989,6 @@ static inline int64_t cpu_get_host_ticks(void)
}
#endif
-#ifdef CONFIG_PROFILER
static inline int64_t profile_getclock(void)
{
return get_clock();
@@ -997,5 +996,3 @@ static inline int64_t profile_getclock(void)
extern uint64_t dev_time;
#endif
-
-#endif
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 07/14] accel/tcg: convert profiling of code generation to TBStats
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (5 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 06/14] accel/tcg: convert profiling of restore operations to TBStats Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 08/14] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER Fei Wu
` (7 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel; +Cc: Paolo Bonzini
From: Alex Bennée <alex.bennee@linaro.org>
We continue the conversion of CONFIG_PROFILER data to TBStats by
measuring the time it takes to generate code. Instead of calculating
elapsed time as we do we simply store key timestamps in the profiler
structure and then calculate the totals and add them to TBStats under
lock.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/tb-stats.c | 33 -------------------
accel/tcg/translate-all.c | 69 ++++++++++++++++++++++-----------------
include/exec/tb-stats.h | 7 ++++
include/tcg/tcg.h | 14 ++++----
tcg/tcg.c | 17 ++++------
5 files changed, 59 insertions(+), 81 deletions(-)
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index a2438a1f51..01adbac2a0 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -88,39 +88,6 @@ void dump_jit_profile_info(TCGProfile *s, GString *buf)
jpi->host / (double) jpi->translations);
g_string_append_printf(buf, "avg search data/TB %0.1f\n",
jpi->search_data / (double) jpi->translations);
-
- if (s) {
- int64_t tot = s->interm_time + s->code_time;
- g_string_append_printf(buf, "JIT cycles %" PRId64
- " (%0.3f s at 2.4 GHz)\n",
- tot, tot / 2.4e9);
- g_string_append_printf(buf, "cycles/op %0.1f\n",
- jpi->ops ? (double)tot / jpi->ops : 0);
- g_string_append_printf(buf, "cycles/in byte %0.1f\n",
- jpi->guest ? (double)tot / jpi->guest : 0);
- g_string_append_printf(buf, "cycles/out byte %0.1f\n",
- jpi->host ? (double)tot / jpi->host : 0);
- g_string_append_printf(buf, "cycles/search byte %0.1f\n",
- jpi->search_data ? (double)tot / jpi->search_data : 0);
- if (tot == 0) {
- tot = 1;
- }
- g_string_append_printf(buf, " gen_interm time %0.1f%%\n",
- (double)s->interm_time / tot * 100.0);
- g_string_append_printf(buf, " gen_code time %0.1f%%\n",
- (double)s->code_time / tot * 100.0);
- g_string_append_printf(buf, "optim./code time %0.1f%%\n",
- (double)s->opt_time / (s->code_time ? s->code_time : 1)
- * 100.0);
- g_string_append_printf(buf, "liveness/code time %0.1f%%\n",
- (double)s->la_time / (s->code_time ? s->code_time : 1)
- * 100.0);
- g_string_append_printf(buf, "cpu_restore count %" PRId64 "\n",
- s->restore_count);
- g_string_append_printf(buf, " avg cycles %0.1f\n",
- s->restore_count ?
- (double)s->restore_time / s->restore_count : 0);
- }
}
g_free(jpi);
}
diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
index bf10987450..92285d0add 100644
--- a/accel/tcg/translate-all.c
+++ b/accel/tcg/translate-all.c
@@ -278,8 +278,9 @@ void page_init(void)
*/
static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
target_ulong pc, void *host_pc,
- int *max_insns, int64_t *ti)
+ int *max_insns)
{
+ TCGProfile *prof = &tcg_ctx->prof;
int ret = sigsetjmp(tcg_ctx->jmp_trans, 0);
if (unlikely(ret != 0)) {
return ret;
@@ -293,11 +294,9 @@ static int setjmp_gen_code(CPUArchState *env, TranslationBlock *tb,
tcg_ctx->cpu = NULL;
*max_insns = tb->icount;
-#ifdef CONFIG_PROFILER
- qatomic_set(&tcg_ctx->prof.interm_time,
- tcg_ctx->prof.interm_time + profile_getclock() - *ti);
- *ti = profile_getclock();
-#endif
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ prof->gen_ir_done_time = profile_getclock();
+ }
return tcg_gen_code(tcg_ctx, tb, pc);
}
@@ -348,7 +347,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
tcg_insn_unit *gen_code_buf;
int gen_code_size, search_size, max_insns;
TCGProfile *prof = &tcg_ctx->prof;
- int64_t ti;
void *host_pc;
assert_memory_lock();
@@ -392,10 +390,6 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
tcg_ctx->gen_tb = tb;
tb_overflow:
-#ifdef CONFIG_PROFILER
- ti = profile_getclock();
-#endif
-
trace_translate_block(tb, pc, tb->tc.ptr);
/*
@@ -407,11 +401,14 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
if (tb_stats_collection_enabled() &&
qemu_log_in_addr_range(tb->pc)) {
tb->tb_stats = tb_get_stats(phys_pc, pc, cs_base, flags);
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ prof->gen_start_time = profile_getclock();
+ }
} else {
tb->tb_stats = NULL;
}
- gen_code_size = setjmp_gen_code(env, tb, pc, host_pc, &max_insns, &ti);
+ gen_code_size = setjmp_gen_code(env, tb, pc, host_pc, &max_insns);
if (unlikely(gen_code_size < 0)) {
switch (gen_code_size) {
case -1:
@@ -463,9 +460,9 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
*/
perf_report_code(pc, tb, tcg_splitwx_to_rx(gen_code_buf));
-#ifdef CONFIG_PROFILER
- qatomic_set(&prof->code_time, prof->code_time + profile_getclock() - ti);
-#endif
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ prof->gen_code_done_time = profile_getclock();
+ }
#ifdef DEBUG_DISAS
if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
@@ -575,26 +572,38 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
* Collect JIT stats when enabled. We batch them all up here to
* avoid spamming the cache with atomic accesses
*/
- if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+ if (tb_stats_enabled(tb, (TB_JIT_STATS | TB_JIT_TIME))) {
TBStatistics *ts = tb->tb_stats;
qemu_mutex_lock(&ts->jit_stats_lock);
- ts->code.num_guest_inst += prof->translation.nb_guest_insns;
- ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
- ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
- ts->code.spills += prof->translation.nb_spills;
- ts->code.temps += prof->translation.temp_count;
- ts->code.deleted_ops += prof->translation.del_op_count;
- ts->code.in_len += tb->size;
- ts->code.out_len += tb->tc.size;
- ts->code.search_out_len += search_size;
-
- ts->translations.total++;
- if (tb_page_addr1(tb) != -1) {
- ts->translations.spanning++;
+ if (tb_stats_enabled(tb, TB_JIT_STATS)) {
+ ts->code.num_guest_inst += prof->translation.nb_guest_insns;
+ ts->code.num_tcg_ops += prof->translation.nb_ops_pre_opt;
+ ts->code.num_tcg_ops_opt += tcg_ctx->nb_ops;
+ ts->code.spills += prof->translation.nb_spills;
+ ts->code.temps += prof->translation.temp_count;
+ ts->code.deleted_ops += prof->translation.del_op_count;
+ ts->code.in_len += tb->size;
+ ts->code.out_len += tb->tc.size;
+ ts->code.search_out_len += search_size;
+
+ ts->translations.total++;
+ if (tb_page_addr1(tb) != -1) {
+ ts->translations.spanning++;
+ }
+
+ g_ptr_array_add(ts->tbs, tb);
}
- g_ptr_array_add(ts->tbs, tb);
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ ts->gen_times.ir += prof->gen_ir_done_time - prof->gen_start_time;
+ ts->gen_times.ir_opt +=
+ prof->gen_opt_done_time - prof->gen_ir_done_time;
+ ts->gen_times.la +=
+ prof->gen_la_done_time - prof->gen_opt_done_time;
+ ts->gen_times.code +=
+ prof->gen_code_done_time - prof->gen_la_done_time;
+ }
qemu_mutex_unlock(&ts->jit_stats_lock);
}
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 80314c50f9..a23b6320bd 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -96,6 +96,12 @@ struct TBStatistics {
uint64_t tb_restore_time;
uint64_t tb_restore_count;
+ struct {
+ uint64_t ir;
+ uint64_t ir_opt;
+ uint64_t la;
+ uint64_t code;
+ } gen_times;
};
bool tb_stats_cmp(const void *ap, const void *bp);
@@ -103,6 +109,7 @@ bool tb_stats_cmp(const void *ap, const void *bp);
void init_tb_stats_htable(void);
void dump_jit_profile_info(TCGProfile *s, GString *buf);
+void dump_jit_exec_time_info(uint64_t dev_time);
#define TB_NOTHING (1 << 0)
#define TB_EXEC_STATS (1 << 1)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 645dbaa563..abad5d6a70 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -557,13 +557,13 @@ typedef struct TCGProfile {
int64_t code_in_len;
int64_t code_out_len;
int64_t search_out_len;
- int64_t interm_time;
- int64_t code_time;
- int64_t la_time;
- int64_t opt_time;
- int64_t restore_count;
- int64_t restore_time;
- int64_t table_op_count[NB_OPS];
+
+ /* Timestamps during translation */
+ uint64_t gen_start_time;
+ uint64_t gen_ir_done_time;
+ uint64_t gen_opt_done_time;
+ uint64_t gen_la_done_time;
+ uint64_t gen_code_done_time;
} TCGProfile;
struct TCGContext {
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 6f46c87dc1..716afbd980 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4957,18 +4957,13 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
}
#endif
-#ifdef CONFIG_PROFILER
- qatomic_set(&prof->opt_time, prof->opt_time - profile_getclock());
-#endif
-
#ifdef USE_TCG_OPTIMIZATIONS
tcg_optimize(s);
#endif
-#ifdef CONFIG_PROFILER
- qatomic_set(&prof->opt_time, prof->opt_time + profile_getclock());
- qatomic_set(&prof->la_time, prof->la_time - profile_getclock());
-#endif
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ prof->gen_opt_done_time = profile_getclock();
+ }
reachable_code_pass(s);
liveness_pass_0(s);
@@ -4994,9 +4989,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
}
}
-#ifdef CONFIG_PROFILER
- qatomic_set(&prof->la_time, prof->la_time + profile_getclock());
-#endif
+ if (tb_stats_enabled(tb, TB_JIT_TIME)) {
+ prof->gen_la_done_time = profile_getclock();
+ }
#ifdef DEBUG_DISAS
if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_OPT)
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 08/14] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (6 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 07/14] accel/tcg: convert profiling of code generation " Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 09/14] debug: add -d tb_stats to control TBStatistics collection: Fei Wu
` (6 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
Replace all others CONFIG_PROFILER statistics and migrate it to
TBStatistics system. However, TCGProfiler still exists and can
be use to store global statistics and times. All TB related
statistics goes to TBStatistics.
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-6-vandersonmr2@gmail.com>
[AJB: fix authorship]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/monitor.c | 29 +++--------
accel/tcg/tb-stats.c | 78 +++++++++++++++++++++++++++++
accel/tcg/tcg-accel-ops.c | 15 +++---
include/tcg/tcg.h | 5 +-
softmmu/runstate.c | 8 +--
tcg/tcg.c | 100 ++++++++++----------------------------
6 files changed, 123 insertions(+), 112 deletions(-)
diff --git a/accel/tcg/monitor.c b/accel/tcg/monitor.c
index c9eec426ff..5f9dba56e3 100644
--- a/accel/tcg/monitor.c
+++ b/accel/tcg/monitor.c
@@ -14,6 +14,7 @@
#include "sysemu/cpus.h"
#include "sysemu/cpu-timers.h"
#include "sysemu/tcg.h"
+#include "exec/tb-stats.h"
#include "internal.h"
@@ -55,6 +56,11 @@ HumanReadableText *qmp_x_query_opcount(Error **errp)
{
g_autoptr(GString) buf = g_string_new("");
+ if (!tb_stats_collection_enabled()) {
+ error_report("TB information not being recorded.");
+ return NULL;
+ }
+
if (!tcg_enabled()) {
error_setg(errp,
"Opcode count information is only available with accel=tcg");
@@ -66,34 +72,15 @@ HumanReadableText *qmp_x_query_opcount(Error **errp)
return human_readable_text_from_str(buf);
}
-#ifdef CONFIG_PROFILER
-
HumanReadableText *qmp_x_query_profile(Error **errp)
{
g_autoptr(GString) buf = g_string_new("");
- static int64_t last_cpu_exec_time;
- int64_t cpu_exec_time;
- int64_t delta;
-
- cpu_exec_time = tcg_cpu_exec_time();
- delta = cpu_exec_time - last_cpu_exec_time;
-
- g_string_append_printf(buf, "async time %" PRId64 " (%0.3f)\n",
- dev_time, dev_time / (double)NANOSECONDS_PER_SECOND);
- g_string_append_printf(buf, "qemu time %" PRId64 " (%0.3f)\n",
- delta, delta / (double)NANOSECONDS_PER_SECOND);
- last_cpu_exec_time = cpu_exec_time;
+
+ dump_jit_exec_time_info(dev_time);
dev_time = 0;
return human_readable_text_from_str(buf);
}
-#else
-HumanReadableText *qmp_x_query_profile(Error **errp)
-{
- error_setg(errp, "Internal profiler not compiled");
- return NULL;
-}
-#endif
static void hmp_tcg_register(void)
{
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 01adbac2a0..3cd7f6fc06 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -13,6 +13,7 @@
#include "tcg/tcg.h"
#include "qemu/qemu-print.h"
+#include "qemu/timer.h"
#include "exec/tb-stats.h"
#include "tb-context.h"
@@ -41,6 +42,13 @@ struct jit_profile_info {
uint64_t host;
uint64_t guest;
uint64_t search_data;
+
+ uint64_t interm_time;
+ uint64_t code_time;
+ uint64_t restore_count;
+ uint64_t restore_time;
+ uint64_t opt_time;
+ uint64_t la_time;
};
/* accumulate the statistics from all TBs */
@@ -62,6 +70,35 @@ static void collect_jit_profile_info(void *p, uint32_t hash, void *userp)
jpi->host += tbs->code.out_len;
jpi->guest += tbs->code.in_len;
jpi->search_data += tbs->code.search_out_len;
+
+ jpi->interm_time += stat_per_translation(tbs, gen_times.ir);
+ jpi->opt_time += stat_per_translation(tbs, gen_times.ir_opt);
+ jpi->la_time += stat_per_translation(tbs, gen_times.la);
+ jpi->code_time += stat_per_translation(tbs, gen_times.code);
+
+ /*
+ * The restore time covers how long we have spent restoring state
+ * from a given TB (e.g. recovering from a fault). It is therefor
+ * not related to the number of translations we have done.
+ */
+ jpi->restore_time += tbs->tb_restore_time;
+ jpi->restore_count += tbs->tb_restore_count;
+}
+
+void dump_jit_exec_time_info(uint64_t dev_time)
+{
+ static uint64_t last_cpu_exec_time;
+ uint64_t cpu_exec_time;
+ uint64_t delta;
+
+ cpu_exec_time = tcg_cpu_exec_time();
+ delta = cpu_exec_time - last_cpu_exec_time;
+
+ qemu_printf("async time %" PRId64 " (%0.3f)\n",
+ dev_time, dev_time / (double) NANOSECONDS_PER_SECOND);
+ qemu_printf("qemu time %" PRId64 " (%0.3f)\n",
+ delta, delta / (double) NANOSECONDS_PER_SECOND);
+ last_cpu_exec_time = cpu_exec_time;
}
/* dump JIT statisticis using TCGProfile and TBStats */
@@ -88,6 +125,47 @@ void dump_jit_profile_info(TCGProfile *s, GString *buf)
jpi->host / (double) jpi->translations);
g_string_append_printf(buf, "avg search data/TB %0.1f\n",
jpi->search_data / (double) jpi->translations);
+
+ uint64_t tot = jpi->interm_time + jpi->code_time;
+
+ g_string_append_printf(buf, "JIT cycles %" PRId64
+ " (%0.3fs at 2.4 GHz)\n",
+ tot, tot / 2.4e9);
+ g_string_append_printf(buf, " cycles/op %0.1f\n",
+ jpi->ops ? (double)tot / jpi->ops : 0);
+ g_string_append_printf(buf, " cycles/in byte %0.1f\n",
+ jpi->guest ? (double)tot / jpi->guest : 0);
+ g_string_append_printf(buf, " cycles/out byte %0.1f\n",
+ jpi->host ? (double)tot / jpi->host : 0);
+ g_string_append_printf(buf, " cycles/search byte %0.1f\n",
+ jpi->search_data ? (double)tot / jpi->search_data : 0);
+ if (tot == 0) {
+ tot = 1;
+ }
+
+ g_string_append_printf(buf, " gen_interm time %0.1f%%\n",
+ (double)jpi->interm_time / tot * 100.0);
+ g_string_append_printf(buf, " gen_code time %0.1f%%\n",
+ (double)jpi->code_time / tot * 100.0);
+
+ g_string_append_printf(buf, " optim./code time %0.1f%%\n",
+ (double)jpi->opt_time / (jpi->code_time ? jpi->code_time : 1)
+ * 100.0);
+ g_string_append_printf(buf, " liveness/code time %0.1f%%\n",
+ (double)jpi->la_time / (jpi->code_time ? jpi->code_time : 1)
+ * 100.0);
+
+ g_string_append_printf(buf, "cpu_restore count %" PRId64 "\n",
+ jpi->restore_count);
+ g_string_append_printf(buf, " avg cycles %0.1f\n",
+ jpi->restore_count ?
+ (double)jpi->restore_time / jpi->restore_count : 0);
+
+ if (s) {
+ g_string_append_printf(buf, "cpu exec time %" PRId64 " (%0.3fs)\n",
+ s->cpu_exec_time,
+ s->cpu_exec_time / (double) NANOSECONDS_PER_SECOND);
+ }
}
g_free(jpi);
}
diff --git a/accel/tcg/tcg-accel-ops.c b/accel/tcg/tcg-accel-ops.c
index af35e0d092..fb33ebe310 100644
--- a/accel/tcg/tcg-accel-ops.c
+++ b/accel/tcg/tcg-accel-ops.c
@@ -70,20 +70,17 @@ void tcg_cpus_destroy(CPUState *cpu)
int tcg_cpus_exec(CPUState *cpu)
{
int ret;
-#ifdef CONFIG_PROFILER
- int64_t ti;
-#endif
+ uint64_t ti;
+
assert(tcg_enabled());
-#ifdef CONFIG_PROFILER
ti = profile_getclock();
-#endif
+
cpu_exec_start(cpu);
ret = cpu_exec(cpu);
cpu_exec_end(cpu);
-#ifdef CONFIG_PROFILER
- qatomic_set(&tcg_ctx->prof.cpu_exec_time,
- tcg_ctx->prof.cpu_exec_time + profile_getclock() - ti);
-#endif
+
+ qatomic_add(&tcg_ctx->prof.cpu_exec_time, profile_getclock() - ti);
+
return ret;
}
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index abad5d6a70..2a47e75ee5 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -564,6 +564,9 @@ typedef struct TCGProfile {
uint64_t gen_opt_done_time;
uint64_t gen_la_done_time;
uint64_t gen_code_done_time;
+
+ /* Lifetime count of TCGOps per TCGContext */
+ uint64_t table_op_count[NB_OPS];
} TCGProfile;
struct TCGContext {
@@ -925,7 +928,7 @@ static inline TCGv_ptr tcg_temp_new_ptr(void)
return temp_tcgv_ptr(t);
}
-int64_t tcg_cpu_exec_time(void);
+uint64_t tcg_cpu_exec_time(void);
void tcg_dump_info(GString *buf);
void tcg_dump_op_count(GString *buf);
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index d1e04586db..48f46283f1 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -720,18 +720,12 @@ static bool main_loop_should_exit(int *status)
int qemu_main_loop(void)
{
int status = EXIT_SUCCESS;
-#ifdef CONFIG_PROFILER
- int64_t ti;
-#endif
+ uint64_t ti;
while (!main_loop_should_exit(&status)) {
-#ifdef CONFIG_PROFILER
ti = profile_getclock();
-#endif
main_loop_wait(false);
-#ifdef CONFIG_PROFILER
dev_time += profile_getclock() - ti;
-#endif
}
return status;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 716afbd980..337fc0dcab 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -4816,25 +4816,13 @@ static void tcg_reg_alloc_call(TCGContext *s, TCGOp *op)
}
}
-#ifdef CONFIG_PROFILER
-
/* avoid copy/paste errors */
#define PROF_ADD(to, from, field) \
do { \
(to)->field += qatomic_read(&((from)->field)); \
} while (0)
-#define PROF_MAX(to, from, field) \
- do { \
- typeof((from)->field) val__ = qatomic_read(&((from)->field)); \
- if (val__ > (to)->field) { \
- (to)->field = val__; \
- } \
- } while (0)
-
-/* Pass in a zero'ed @prof */
-static inline
-void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
+static void collect_tcg_profiler(TCGProfile *prof)
{
unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
unsigned int i;
@@ -4843,55 +4831,19 @@ void tcg_profile_snapshot(TCGProfile *prof, bool counters, bool table)
TCGContext *s = qatomic_read(&tcg_ctxs[i]);
const TCGProfile *orig = &s->prof;
- if (counters) {
- PROF_ADD(prof, orig, cpu_exec_time);
- PROF_ADD(prof, orig, interm_time);
- PROF_ADD(prof, orig, code_time);
- PROF_ADD(prof, orig, la_time);
- PROF_ADD(prof, orig, opt_time);
- PROF_ADD(prof, orig, restore_count);
- PROF_ADD(prof, orig, restore_time);
- }
- if (table) {
- int i;
+ PROF_ADD(prof, orig, cpu_exec_time);
- for (i = 0; i < NB_OPS; i++) {
- PROF_ADD(prof, orig, table_op_count[i]);
- }
+ for (i = 0; i < NB_OPS; i++) {
+ PROF_ADD(prof, orig, table_op_count[i]);
}
}
}
-#undef PROF_ADD
-#undef PROF_MAX
-
-static void tcg_profile_snapshot_counters(TCGProfile *prof)
-{
- tcg_profile_snapshot(prof, true, false);
-}
-
-static void tcg_profile_snapshot_table(TCGProfile *prof)
-{
- tcg_profile_snapshot(prof, false, true);
-}
-
-void tcg_dump_op_count(GString *buf)
-{
- TCGProfile prof = {};
- int i;
-
- tcg_profile_snapshot_table(&prof);
- for (i = 0; i < NB_OPS; i++) {
- g_string_append_printf(buf, "%s %" PRId64 "\n", tcg_op_defs[i].name,
- prof.table_op_count[i]);
- }
-}
-
-int64_t tcg_cpu_exec_time(void)
+uint64_t tcg_cpu_exec_time(void)
{
unsigned int n_ctxs = qatomic_read(&tcg_cur_ctxs);
unsigned int i;
- int64_t ret = 0;
+ uint64_t ret = 0;
for (i = 0; i < n_ctxs; i++) {
const TCGContext *s = qatomic_read(&tcg_ctxs[i]);
@@ -4901,19 +4853,6 @@ int64_t tcg_cpu_exec_time(void)
}
return ret;
}
-#else
-void tcg_dump_op_count(GString *buf)
-{
- g_string_append_printf(buf, "[TCG profiler not compiled]\n");
-}
-
-int64_t tcg_cpu_exec_time(void)
-{
- error_report("%s: TCG profiler not compiled", __func__);
- exit(EXIT_FAILURE);
-}
-#endif
-
int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
{
@@ -5029,14 +4968,17 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
s->pool_labels = NULL;
#endif
+ if (tb_stats_collection_enabled()) {
+ QTAILQ_FOREACH(op, &s->ops, link) {
+ TCGOpcode opc = op->opc;
+ s->prof.table_op_count[opc]++;
+ }
+ }
+
num_insns = -1;
QTAILQ_FOREACH(op, &s->ops, link) {
TCGOpcode opc = op->opc;
-#ifdef CONFIG_PROFILER
- qatomic_set(&prof->table_op_count[opc], prof->table_op_count[opc] + 1);
-#endif
-
switch (opc) {
case INDEX_op_mov_i32:
case INDEX_op_mov_i64:
@@ -5136,14 +5078,24 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, target_ulong pc_start)
return tcg_current_code_size(s);
}
+void tcg_dump_op_count(GString *buf)
+{
+ TCGProfile prof = {};
+ int i;
+
+ collect_tcg_profiler(&prof);
+ for (i = 0; i < NB_OPS; i++) {
+ g_string_append_printf(buf, "%s %" PRId64 "\n",
+ tcg_op_defs[i].name, prof.table_op_count[i]);
+ }
+}
+
void tcg_dump_info(GString *buf)
{
TCGProfile *s = NULL;
-#ifdef CONFIG_PROFILER
TCGProfile prof = {};
- tcg_profile_snapshot_counters(&prof);
s = &prof;
-#endif
+ collect_tcg_profiler(s);
dump_jit_profile_info(s, buf);
}
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 09/14] debug: add -d tb_stats to control TBStatistics collection:
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (7 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 08/14] accel: adding TB_JIT_TIME and full replacing CONFIG_PROFILER Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 10/14] monitor: adding tb_stats hmp command Fei Wu
` (5 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
-d tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]
"dump_limit" is used to limit the number of dumped TBStats in
linux-user mode.
[all+jit+exec+time] control the profilling level used
by the TBStats. Can be used as follow:
-d tb_stats
-d tb_stats,level=jit+time
-d tb_stats,dump_limit=15
...
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-7-vandersonmr2@gmail.com>
[AJB: fix authorship, reword title]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/tb-stats.c | 5 +++++
include/exec/gen-icount.h | 1 +
include/exec/tb-stats-flags.h | 30 ++++++++++++++++++++++++++++
include/exec/tb-stats.h | 15 +++-----------
include/qemu/log.h | 1 +
stubs/meson.build | 1 +
stubs/tb-stats.c | 27 +++++++++++++++++++++++++
util/log.c | 37 +++++++++++++++++++++++++++++++++++
8 files changed, 105 insertions(+), 12 deletions(-)
create mode 100644 include/exec/tb-stats-flags.h
create mode 100644 stubs/tb-stats.c
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 3cd7f6fc06..4d24ac2472 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -208,3 +208,8 @@ uint32_t get_default_tbstats_flag(void)
{
return default_tbstats_flag;
}
+
+void set_default_tbstats_flag(uint32_t flags)
+{
+ default_tbstats_flag = flags;
+}
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index 20e7835ff0..4372817951 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -3,6 +3,7 @@
#include "exec/exec-all.h"
#include "exec/tb-stats.h"
+#include "tb-stats-flags.h"
/* Helpers for instruction counting code generation. */
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
new file mode 100644
index 0000000000..559d819b6b
--- /dev/null
+++ b/include/exec/tb-stats-flags.h
@@ -0,0 +1,30 @@
+/*
+ * QEMU System Emulator, Code Quality Monitor System
+ *
+ * We define the flags and control bits here to avoid complications of
+ * including TCG/CPU information in common code.
+ *
+ * Copyright (c) 2019 Vanderson M. do Rosario <vandersonmr2@gmail.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef TB_STATS_FLAGS
+#define TB_STATS_FLAGS
+
+#define TB_NOTHING (1 << 0)
+#define TB_EXEC_STATS (1 << 1)
+#define TB_JIT_STATS (1 << 2)
+#define TB_JIT_TIME (1 << 3)
+#define TB_ALL_STATS (TB_EXEC_STATS | TB_JIT_STATS | TB_JIT_TIME)
+
+/* TBStatistic collection controls */
+void enable_collect_tb_stats(void);
+void disable_collect_tb_stats(void);
+void pause_collect_tb_stats(void);
+bool tb_stats_collection_enabled(void);
+bool tb_stats_collection_paused(void);
+
+uint32_t get_default_tbstats_flag(void);
+void set_default_tbstats_flag(uint32_t);
+
+#endif
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index a23b6320bd..ad60662f78 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -30,6 +30,8 @@
#include "exec/exec-all.h"
#include "tcg/tcg.h"
+#include "exec/tb-stats-flags.h"
+
#define tb_stats_enabled(tb, JIT_STATS) \
(tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
@@ -111,17 +113,6 @@ void init_tb_stats_htable(void);
void dump_jit_profile_info(TCGProfile *s, GString *buf);
void dump_jit_exec_time_info(uint64_t dev_time);
-#define TB_NOTHING (1 << 0)
-#define TB_EXEC_STATS (1 << 1)
-#define TB_JIT_STATS (1 << 2)
-#define TB_JIT_TIME (1 << 3)
-
-void enable_collect_tb_stats(void);
-void disable_collect_tb_stats(void);
-void pause_collect_tb_stats(void);
-bool tb_stats_collection_enabled(void);
-bool tb_stats_collection_paused(void);
-
-uint32_t get_default_tbstats_flag(void);
+void set_tbstats_flags(uint32_t flags);
#endif
diff --git a/include/qemu/log.h b/include/qemu/log.h
index c5643d8dd5..6f3b8091cd 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -35,6 +35,7 @@ bool qemu_log_separate(void);
/* LOG_STRACE is used for user-mode strace logging. */
#define LOG_STRACE (1 << 19)
#define LOG_PER_THREAD (1 << 20)
+#define CPU_LOG_TB_STATS (1 << 21)
/* Lock/unlock output. */
diff --git a/stubs/meson.build b/stubs/meson.build
index b2b5956d97..ecce77a01f 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -63,3 +63,4 @@ else
endif
stub_ss.add(files('semihost-all.c'))
stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
+stub_ss.add(files('tb-stats.c'))
diff --git a/stubs/tb-stats.c b/stubs/tb-stats.c
new file mode 100644
index 0000000000..d212c2a1fa
--- /dev/null
+++ b/stubs/tb-stats.c
@@ -0,0 +1,27 @@
+/*
+ * TB Stats Stubs
+ *
+ * Copyright (c) 2019
+ * Written by Alex Bennée <alex.bennee@linaro.org>
+ *
+ * This code is licensed under the GNU GPL v2, or later.
+ */
+
+
+#include "qemu/osdep.h"
+#include "exec/tb-stats-flags.h"
+
+void enable_collect_tb_stats(void)
+{
+ return;
+}
+
+bool tb_stats_collection_enabled(void)
+{
+ return false;
+}
+
+void set_default_tbstats_flag(uint32_t flags)
+{
+ return;
+}
diff --git a/util/log.c b/util/log.c
index 53b4f6c58e..7ae471d97c 100644
--- a/util/log.c
+++ b/util/log.c
@@ -19,6 +19,7 @@
#include "qemu/osdep.h"
#include "qemu/log.h"
+#include "qemu/qemu-print.h"
#include "qemu/range.h"
#include "qemu/error-report.h"
#include "qapi/error.h"
@@ -27,6 +28,7 @@
#include "qemu/thread.h"
#include "qemu/lockable.h"
#include "qemu/rcu.h"
+#include "exec/tb-stats-flags.h"
#ifdef CONFIG_LINUX
#include <sys/syscall.h>
#endif
@@ -47,6 +49,7 @@ static __thread Notifier qemu_log_thread_cleanup_notifier;
int qemu_loglevel;
static bool log_per_thread;
static GArray *debug_regions;
+int32_t max_num_hot_tbs_to_dump;
/* Returns true if qemu_log() will really write somewhere. */
bool qemu_log_enabled(void)
@@ -495,6 +498,10 @@ const QEMULogItem qemu_log_items[] = {
"log every user-mode syscall, its input, and its result" },
{ LOG_PER_THREAD, "tid",
"open a separate log file per thread; filename must contain '%d'" },
+ { CPU_LOG_TB_STATS,
+ "tb_stats[[,level=(+all+jit+exec+time)][,dump_limit=<number>]]",
+ "enable collection of TBs statistics"
+ "(and dump until given a limit if in user mode).\n" },
{ 0, NULL, NULL },
};
@@ -516,6 +523,36 @@ int qemu_str_to_log_mask(const char *str)
trace_enable_events((*tmp) + 6);
mask |= LOG_TRACE;
#endif
+ } else if (g_str_has_prefix(*tmp, "tb_stats")) {
+ mask |= CPU_LOG_TB_STATS;
+ set_default_tbstats_flag(TB_ALL_STATS);
+ enable_collect_tb_stats();
+ } else if (tb_stats_collection_enabled() &&
+ g_str_has_prefix(*tmp, "dump_limit=")) {
+ max_num_hot_tbs_to_dump = atoi((*tmp) + 11);
+ } else if (tb_stats_collection_enabled() &&
+ g_str_has_prefix(*tmp, "level=")) {
+ uint32_t flags = 0;
+ char **level_parts = g_strsplit(*tmp + 6, "+", 0);
+ char **level_tmp;
+ for (level_tmp = level_parts; level_tmp && *level_tmp;
+ level_tmp++) {
+ if (g_str_equal(*level_tmp, "jit")) {
+ flags |= TB_JIT_STATS;
+ } else if (g_str_equal(*level_tmp, "exec")) {
+ flags |= TB_EXEC_STATS;
+ } else if (g_str_equal(*level_tmp, "time")) {
+ flags |= TB_JIT_TIME;
+ } else if (g_str_equal(*level_tmp, "all")) {
+ flags |= TB_ALL_STATS;
+ } else {
+ /* FIXME: set errp */
+ fprintf(stderr, "no option level=%s, valid options are:"
+ "all, jit, exec or/and time\n", *level_tmp);
+ exit(1);
+ }
+ set_default_tbstats_flag(flags);
+ }
} else {
for (item = qemu_log_items; item->mask != 0; item++) {
if (g_str_equal(*tmp, item->name)) {
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 10/14] monitor: adding tb_stats hmp command
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (8 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 09/14] debug: add -d tb_stats to control TBStatistics collection: Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 11/14] tb-stats: reset the tracked TBs on a tb_flush Fei Wu
` (4 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Dr . David Alan Gilbert, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
Adding tb_stats [start|pause|stop|filter] command to hmp.
This allows controlling the collection of statistics.
It is also possible to set the level of collection:
all, jit, or exec.
tb_stats filter allow to only collect statistics for the TB
in the last_search list.
The goal of this command is to allow the dynamic exploration
of the TCG behavior and quality. Therefore, for now, a
corresponding QMP command is not worthwhile.
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-8-vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-9-vandersonmr2@gmail.com>
[AJB: fix authorship]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/monitor.c | 51 ++++++++++++++++
accel/tcg/tb-stats.c | 112 ++++++++++++++++++++++++++++++++++
hmp-commands.hx | 16 +++++
include/exec/tb-stats-flags.h | 1 +
include/exec/tb-stats.h | 10 +++
include/monitor/hmp.h | 1 +
softmmu/runstate.c | 6 ++
7 files changed, 197 insertions(+)
diff --git a/accel/tcg/monitor.c b/accel/tcg/monitor.c
index 5f9dba56e3..b342727262 100644
--- a/accel/tcg/monitor.c
+++ b/accel/tcg/monitor.c
@@ -10,7 +10,9 @@
#include "qapi/error.h"
#include "qapi/type-helpers.h"
#include "qapi/qapi-commands-machine.h"
+#include "qapi/qmp/qdict.h"
#include "monitor/monitor.h"
+#include "monitor/hmp.h"
#include "sysemu/cpus.h"
#include "sysemu/cpu-timers.h"
#include "sysemu/tcg.h"
@@ -72,12 +74,61 @@ HumanReadableText *qmp_x_query_opcount(Error **errp)
return human_readable_text_from_str(buf);
}
+#ifdef CONFIG_TCG
+void hmp_tbstats(Monitor *mon, const QDict *qdict)
+{
+ if (!tcg_enabled()) {
+ error_report("TB information is only available with accel=tcg");
+ return;
+ }
+
+ char *cmd = (char *) qdict_get_try_str(qdict, "command");
+ enum TbstatsCmd icmd = -1;
+
+ if (strcmp(cmd, "start") == 0) {
+ icmd = START;
+ } else if (strcmp(cmd, "pause") == 0) {
+ icmd = PAUSE;
+ } else if (strcmp(cmd, "stop") == 0) {
+ icmd = STOP;
+ } else if (strcmp(cmd, "filter") == 0) {
+ icmd = FILTER;
+ } else {
+ error_report("invalid command!");
+ return;
+ }
+
+ char *slevel = (char *) qdict_get_try_str(qdict, "level");
+ uint32_t level = TB_EXEC_STATS | TB_JIT_STATS | TB_JIT_TIME;
+ if (slevel) {
+ if (strcmp(slevel, "jit") == 0) {
+ level = TB_JIT_STATS;
+ } else if (strcmp(slevel, "exec") == 0) {
+ level = TB_EXEC_STATS;
+ } else if (strcmp(slevel, "time") == 0) {
+ level = TB_JIT_TIME;
+ }
+ }
+
+ struct TbstatsCommand *tbscommand = g_new0(struct TbstatsCommand, 1);
+ tbscommand->cmd = icmd;
+ tbscommand->level = level;
+ async_safe_run_on_cpu(first_cpu, do_hmp_tbstats_safe,
+ RUN_ON_CPU_HOST_PTR(tbscommand));
+
+}
+#endif
+
HumanReadableText *qmp_x_query_profile(Error **errp)
{
g_autoptr(GString) buf = g_string_new("");
+#ifdef CONFIG_TCG
dump_jit_exec_time_info(dev_time);
dev_time = 0;
+#else
+ error_report("TCG should be enabled!");
+#endif
return human_readable_text_from_str(buf);
}
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 4d24ac2472..61bfbe96fc 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -16,6 +16,7 @@
#include "qemu/timer.h"
#include "exec/tb-stats.h"
+#include "exec/tb-flush.h"
#include "tb-context.h"
/* TBStatistic collection controls */
@@ -28,6 +29,8 @@ enum TBStatsStatus {
static enum TBStatsStatus tcg_collect_tb_stats;
static uint32_t default_tbstats_flag;
+/* only accessed in safe work */
+static GList *last_search;
uint64_t dev_time;
@@ -170,6 +173,102 @@ void dump_jit_profile_info(TCGProfile *s, GString *buf)
g_free(jpi);
}
+static void free_tbstats(void *p, uint32_t hash, void *userp)
+{
+ g_free(p);
+}
+
+static void clean_tbstats(void)
+{
+ /* remove all tb_stats */
+ qht_iter(&tb_ctx.tb_stats, free_tbstats, NULL);
+ qht_destroy(&tb_ctx.tb_stats);
+}
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
+{
+ struct TbstatsCommand *cmdinfo = icmd.host_ptr;
+ int cmd = cmdinfo->cmd;
+ uint32_t level = cmdinfo->level;
+
+ switch (cmd) {
+ case START:
+ if (tb_stats_collection_paused()) {
+ set_tbstats_flags(level);
+ } else {
+ if (tb_stats_collection_enabled()) {
+ qemu_printf("TB information already being recorded");
+ return;
+ }
+ qht_init(&tb_ctx.tb_stats, tb_stats_cmp, CODE_GEN_HTABLE_SIZE,
+ QHT_MODE_AUTO_RESIZE);
+ }
+
+ set_default_tbstats_flag(level);
+ enable_collect_tb_stats();
+ tb_flush(cpu);
+ break;
+ case PAUSE:
+ if (!tb_stats_collection_enabled()) {
+ qemu_printf("TB information not being recorded");
+ return;
+ }
+
+ /*
+ * Continue to create TBStatistic structures but stop collecting
+ * statistics
+ */
+ pause_collect_tb_stats();
+ set_default_tbstats_flag(TB_NOTHING);
+ set_tbstats_flags(TB_PAUSED);
+ tb_flush(cpu);
+ break;
+ case STOP:
+ if (!tb_stats_collection_enabled()) {
+ qemu_printf("TB information not being recorded");
+ return;
+ }
+
+ /* Dissalloc all TBStatistics structures and stop creating new ones */
+ disable_collect_tb_stats();
+ clean_tbstats();
+ tb_flush(cpu);
+ break;
+ case FILTER:
+ if (!tb_stats_collection_enabled()) {
+ qemu_printf("TB information not being recorded");
+ return;
+ }
+ if (!last_search) {
+ qemu_printf(
+ "no search on record! execute info tbs before filtering!");
+ return;
+ }
+
+ set_default_tbstats_flag(TB_NOTHING);
+
+ /*
+ * Set all tbstats as paused, then return only the ones from last_search
+ */
+ pause_collect_tb_stats();
+ set_tbstats_flags(TB_PAUSED);
+
+ for (GList *iter = last_search; iter; iter = g_list_next(iter)) {
+ TBStatistics *tbs = iter->data;
+ tbs->stats_enabled = level;
+ }
+
+ tb_flush(cpu);
+
+ break;
+ default: /* INVALID */
+ g_assert_not_reached();
+ break;
+ }
+
+ g_free(cmdinfo);
+}
+
void init_tb_stats_htable(void)
{
if (!tb_ctx.tb_stats.map && tb_stats_collection_enabled()) {
@@ -204,6 +303,19 @@ bool tb_stats_collection_paused(void)
return tcg_collect_tb_stats == TB_STATS_PAUSED;
}
+static void reset_tbstats_flag(void *p, uint32_t hash, void *userp)
+{
+ uint32_t flag = *((int *)userp);
+ TBStatistics *tbs = p;
+ tbs->stats_enabled = flag;
+}
+
+void set_tbstats_flags(uint32_t flag)
+{
+ /* iterate over tbstats setting their flag as TB_NOTHING */
+ qht_iter(&tb_ctx.tb_stats, reset_tbstats_flag, &flag);
+}
+
uint32_t get_default_tbstats_flag(void)
{
return default_tbstats_flag;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index bb85ee1d26..473544ed3d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1651,6 +1651,22 @@ SRST
Executes a qemu-io command on the given block device.
ERST
+#if defined(CONFIG_TCG)
+ {
+ .name = "tb_stats",
+ .args_type = "command:s,level:s?",
+ .params = "command [stats_level]",
+ .help = "Control tb statistics collection:"
+ "tb_stats (start|pause|stop|filter) [all|jit_stats|exec_stats]",
+ .cmd = hmp_tbstats,
+ },
+#endif
+
+SRST
+``tb_stats`` *command* *stats_level*
+ Control recording tb statistics
+ERST
+
{
.name = "qom-list",
.args_type = "path:s?",
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
index 559d819b6b..e64fe8caf1 100644
--- a/include/exec/tb-stats-flags.h
+++ b/include/exec/tb-stats-flags.h
@@ -16,6 +16,7 @@
#define TB_JIT_STATS (1 << 2)
#define TB_JIT_TIME (1 << 3)
#define TB_ALL_STATS (TB_EXEC_STATS | TB_JIT_STATS | TB_JIT_TIME)
+#define TB_PAUSED (1 << 4)
/* TBStatistic collection controls */
void enable_collect_tb_stats(void);
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index ad60662f78..33eed8d385 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -32,6 +32,9 @@
#include "exec/tb-stats-flags.h"
+enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
+enum TbstatsCmd { START, PAUSE, STOP, FILTER };
+
#define tb_stats_enabled(tb, JIT_STATS) \
(tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
@@ -115,4 +118,11 @@ void dump_jit_exec_time_info(uint64_t dev_time);
void set_tbstats_flags(uint32_t flags);
+struct TbstatsCommand {
+ enum TbstatsCmd cmd;
+ uint32_t level;
+};
+
+void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
+
#endif
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index fdb69b7f9c..207ec19b34 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -181,5 +181,6 @@ void hmp_ioport_write(Monitor *mon, const QDict *qdict);
void hmp_boot_set(Monitor *mon, const QDict *qdict);
void hmp_info_mtree(Monitor *mon, const QDict *qdict);
void hmp_info_cryptodev(Monitor *mon, const QDict *qdict);
+void hmp_tbstats(Monitor *mon, const QDict *qdict);
#endif
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 48f46283f1..d168b82469 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -720,12 +720,18 @@ static bool main_loop_should_exit(int *status)
int qemu_main_loop(void)
{
int status = EXIT_SUCCESS;
+#ifdef CONFIG_TCG
uint64_t ti;
+#endif
while (!main_loop_should_exit(&status)) {
+#ifdef CONFIG_TCG
ti = profile_getclock();
+#endif
main_loop_wait(false);
+#ifdef CONFIG_TCG
dev_time += profile_getclock() - ti;
+#endif
}
return status;
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 11/14] tb-stats: reset the tracked TBs on a tb_flush
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (9 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 10/14] monitor: adding tb_stats hmp command Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 12/14] Adding info [tb-list|tb] commands to HMP (WIP) Fei Wu
` (3 subsequent siblings)
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel; +Cc: Paolo Bonzini
From: Alex Bennée <alex.bennee@linaro.org>
We keep track of translations but can only do so up until the
translation cache is flushed. At that point we really have no idea if
we can re-create a translation because all the active tracking
information has been reset.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/tb-maint.c | 1 +
accel/tcg/tb-stats.c | 19 +++++++++++++++++++
include/exec/tb-stats.h | 8 ++++++++
3 files changed, 28 insertions(+)
diff --git a/accel/tcg/tb-maint.c b/accel/tcg/tb-maint.c
index ba1635aa4b..5f946e0285 100644
--- a/accel/tcg/tb-maint.c
+++ b/accel/tcg/tb-maint.c
@@ -762,6 +762,7 @@ static void do_tb_flush(CPUState *cpu, run_on_cpu_data tb_flush_count)
qht_reset_size(&tb_ctx.htable, CODE_GEN_HTABLE_SIZE);
tb_remove_all();
+ tbstats_reset_tbs();
tcg_region_reset_all();
/* XXX: flush processor icache at this point if cache flush is expensive */
qatomic_mb_set(&tb_ctx.tb_flush_count, tb_ctx.tb_flush_count + 1);
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 61bfbe96fc..56e944b225 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -269,6 +269,25 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd)
g_free(cmdinfo);
}
+/*
+ * We have to reset the tbs array on a tb_flush as those
+ * TranslationBlocks no longer exist and we no loner know if the
+ * current mapping is still valid.
+ */
+
+static void reset_tbs_array(void *p, uint32_t hash, void *userp)
+{
+ TBStatistics *tbs = p;
+ g_ptr_array_set_size(tbs->tbs, 0);
+}
+
+void tbstats_reset_tbs(void)
+{
+ if (tb_ctx.tb_stats.map) {
+ qht_iter(&tb_ctx.tb_stats, reset_tbs_array, NULL);
+ }
+}
+
void init_tb_stats_htable(void)
{
if (!tb_ctx.tb_stats.map && tb_stats_collection_enabled()) {
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index 33eed8d385..ec47cbecc2 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -125,4 +125,12 @@ struct TbstatsCommand {
void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
+/**
+ * tbstats_reset_tbs: reset the linked array of TBs
+ *
+ * Reset the list of tbs for a given array. Should be called from
+ * safe work during tb_flush.
+ */
+void tbstats_reset_tbs(void);
+
#endif
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 12/14] Adding info [tb-list|tb] commands to HMP (WIP)
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (10 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 11/14] tb-stats: reset the tracked TBs on a tb_flush Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 14:43 ` Wu, Fei
2023-04-21 13:24 ` [PATCH v11 13/14] tb-stats: dump hot TBs at the end of the execution Fei Wu
` (2 subsequent siblings)
14 siblings, 1 reply; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Dr . David Alan Gilbert, Paolo Bonzini
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
These commands allow the exploration of TBs generated by the TCG.
Understand which one hotter, with more guest/host instructions... and
examine their guest, host and IR code.
The goal of this command is to allow the dynamic exploration of TCG
behavior and code quality. Therefore, for now, a corresponding QMP
command is not worthwhile.
[AJB: WIP - we still can't be safely sure a translation will succeed]
Example of output:
TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
| exec:4828932/0 guest inst cov:16.38%
| trans:1 ints: g:3 op:82 op_opt:34 spills:3
| h/g (host bytes / guest insts): 90.666664
| time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
| targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)
TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
| exec:4825842/0 guest inst cov:21.82%
| trans:1 ints: g:4 op:80 op_opt:38 spills:2
| h/g (host bytes / guest insts): 84.000000
| time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
| exec:6956495/0 guest inst cov:21.82%
| trans:2 ints: g:2 op:40 op_opt:19 spills:1
| h/g (host bytes / guest insts): 84.000000
| time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
----------------
IN:
0x00034d0d: 89 de movl %ebx, %esi
0x00034d0f: 26 8b 0e movl %es:(%esi), %ecx
0x00034d12: 26 f6 46 08 80 testb $0x80, %es:8(%esi)
0x00034d17: 75 3b jne 0x34d54
------------------------------
TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
| exec:5202686/0 guest inst cov:11.28%
| trans:1 ints: g:3 op:82 op_opt:34 spills:3
| h/g (host bytes / guest insts): 90.666664
| time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
| targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)
TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
| exec:5199468/0 guest inst cov:15.03%
| trans:1 ints: g:4 op:80 op_opt:38 spills:2
| h/g (host bytes / guest insts): 84.000000
| time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
| targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)
------------------------------
2 TBs to reach 25% of guest inst exec coverage
Total of guest insts exec: 138346727
------------------------------
Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-10-vandersonmr2@gmail.com>
[AJB: fix authorship, dropped coverset]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/monitor.c | 55 ++++++
accel/tcg/tb-stats.c | 322 +++++++++++++++++++++++++++++++++++
disas.c | 31 +++-
hmp-commands-info.hx | 16 ++
include/exec/tb-stats.h | 33 +++-
include/monitor/hmp.h | 2 +
include/qemu/log-for-trace.h | 6 +-
include/qemu/log.h | 2 +
util/log.c | 66 +++++--
9 files changed, 512 insertions(+), 21 deletions(-)
diff --git a/accel/tcg/monitor.c b/accel/tcg/monitor.c
index b342727262..6bfefd24f6 100644
--- a/accel/tcg/monitor.c
+++ b/accel/tcg/monitor.c
@@ -7,6 +7,7 @@
*/
#include "qemu/osdep.h"
+#include "qemu/log.h"
#include "qapi/error.h"
#include "qapi/type-helpers.h"
#include "qapi/qapi-commands-machine.h"
@@ -17,6 +18,7 @@
#include "sysemu/cpu-timers.h"
#include "sysemu/tcg.h"
#include "exec/tb-stats.h"
+#include "tb-context.h"
#include "internal.h"
@@ -117,6 +119,59 @@ void hmp_tbstats(Monitor *mon, const QDict *qdict)
RUN_ON_CPU_HOST_PTR(tbscommand));
}
+
+void hmp_info_tblist(Monitor *mon, const QDict *qdict)
+{
+ int number_int;
+ const char *sortedby_str = NULL;
+ if (!tcg_enabled()) {
+ error_report("TB information is only available with accel=tcg");
+ return;
+ }
+ if (!tb_ctx.tb_stats.map) {
+ error_report("no TB information recorded");
+ return;
+ }
+
+ number_int = qdict_get_try_int(qdict, "number", 10);
+ sortedby_str = qdict_get_try_str(qdict, "sortedby");
+
+ int sortedby = SORT_BY_HOTNESS;
+ if (sortedby_str == NULL || strcmp(sortedby_str, "hotness") == 0) {
+ sortedby = SORT_BY_HOTNESS;
+ } else if (strcmp(sortedby_str, "hg") == 0) {
+ sortedby = SORT_BY_HG;
+ } else if (strcmp(sortedby_str, "spills") == 0) {
+ sortedby = SORT_BY_SPILLS;
+ } else {
+ error_report("valid sort options are: hotness hg spills");
+ return;
+ }
+
+ dump_tbs_info(number_int, sortedby, true);
+}
+
+void hmp_info_tb(Monitor *mon, const QDict *qdict)
+{
+ const int id = qdict_get_int(qdict, "id");
+ const char *flags = qdict_get_try_str(qdict, "flags");
+ int mask;
+
+ if (!tcg_enabled()) {
+ error_report("TB information is only available with accel=tcg");
+ return;
+ }
+
+ mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
+
+ if (!mask) {
+ error_report("Unable to parse log flags, see 'help log'");
+ return;
+ }
+
+ dump_tb_info(id, mask, true);
+}
+
#endif
HumanReadableText *qmp_x_query_profile(Error **errp)
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 56e944b225..69ac3d25cf 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -11,14 +11,18 @@
#include "disas/disas.h"
#include "exec/exec-all.h"
#include "tcg/tcg.h"
+#include "qapi/error.h"
#include "qemu/qemu-print.h"
#include "qemu/timer.h"
+#include "qemu/log.h"
#include "exec/tb-stats.h"
#include "exec/tb-flush.h"
#include "tb-context.h"
+#include "internal.h"
+
/* TBStatistic collection controls */
enum TBStatsStatus {
TB_STATS_DISABLED = 0,
@@ -32,8 +36,23 @@ static uint32_t default_tbstats_flag;
/* only accessed in safe work */
static GList *last_search;
+static int id = 1; /* display_id increment counter */
uint64_t dev_time;
+static TBStatistics *get_tbstats_by_id(int id)
+{
+ GList *iter;
+
+ for (iter = last_search; iter; iter = g_list_next(iter)) {
+ TBStatistics *tbs = iter->data;
+ if (tbs && tbs->display_id == id) {
+ return tbs;
+ break;
+ }
+ }
+ return NULL;
+}
+
struct jit_profile_info {
uint64_t translations;
uint64_t aborted;
@@ -296,6 +315,309 @@ void init_tb_stats_htable(void)
}
}
+static void collect_tb_stats(void *p, uint32_t hash, void *userp)
+{
+ last_search = g_list_prepend(last_search, p);
+}
+
+static void count_invalid_tbs(gpointer data, gpointer user_data)
+{
+ TranslationBlock *tb = (TranslationBlock *) data;
+ unsigned *counter = (unsigned *) user_data;
+ if (tb->cflags & CF_INVALID) {
+ *counter = *counter + 1;
+ }
+}
+
+static int dump_tb_header(TBStatistics *tbs)
+{
+ unsigned g = stat_per_translation(tbs, code.num_guest_inst);
+ unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
+ unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);
+ unsigned spills = stat_per_translation(tbs, code.spills);
+ unsigned h = stat_per_translation(tbs, code.out_len);
+ unsigned act = tbs->tbs->len;
+ unsigned invalid = 0;
+
+ float guest_host_prop = g ? ((float) h / g) : 0;
+
+ g_ptr_array_foreach(tbs->tbs, &count_invalid_tbs, &invalid);
+
+ qemu_log("TB id:%d | phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
+ " flags:0x%08x %d inv/%d\n",
+ tbs->display_id, tbs->phys_pc, tbs->pc, tbs->flags,
+ invalid, act);
+
+ if (tbs_stats_enabled(tbs, TB_EXEC_STATS)) {
+ qemu_log("\t| exec:%lu/%lu guest inst cov:%.2f%%\n",
+ tbs->executions.normal,
+ tbs->executions.atomic, tbs->executions.coverage / 100.0f);
+ }
+
+ if (tbs_stats_enabled(tbs, TB_JIT_STATS)) {
+ qemu_log("\t| trans:%lu ints: g:%u op:%u op_opt:%u spills:%d"
+ "\n\t| h/g (host bytes / guest insts): %f\n",
+ tbs->translations.total, g, ops, ops_opt, spills, guest_host_prop);
+ }
+
+ if (tbs_stats_enabled(tbs, TB_JIT_TIME)) {
+ qemu_log("\t| time to gen at 2.4GHz => code:%0.2lf(ns) IR:%0.2lf(ns)\n",
+ tbs->gen_times.code / 2.4, tbs->gen_times.ir / 2.4);
+ }
+
+ qemu_log("\n");
+
+ return act - invalid;
+}
+
+static gint
+inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
+{
+ const TBStatistics *tbs1 = (TBStatistics *) p1;
+ const TBStatistics *tbs2 = (TBStatistics *) p2;
+ int sort_by = *((enum SortBy *) psort_by);
+ unsigned long c1 = 0;
+ unsigned long c2 = 0;
+
+ if (sort_by == SORT_BY_SPILLS) {
+ c1 = stat_per_translation(tbs1, code.spills);
+ c2 = stat_per_translation(tbs2, code.spills);
+ } else if (sort_by == SORT_BY_HOTNESS) {
+ c1 = stat_per_translation(tbs1, executions.normal);
+ c2 = stat_per_translation(tbs2, executions.normal);
+ } else if (sort_by == SORT_BY_HG) {
+ if (tbs1->code.num_guest_inst == 0) {
+ return -1;
+ }
+ if (tbs2->code.num_guest_inst == 0) {
+ return 1;
+ }
+
+ c1 = tbs1->code.out_len / tbs1->code.num_guest_inst;
+ c2 = tbs2->code.out_len / tbs2->code.num_guest_inst;
+ }
+ return c2 - c1;
+}
+
+static void dump_last_search_headers(int count)
+{
+ if (!last_search) {
+ qemu_log("No data collected yet\n");
+ return;
+ }
+
+ GList *l = last_search;
+ while (l != NULL && count--) {
+ TBStatistics *tbs = (TBStatistics *) l->data;
+ GList *next = l->next;
+ dump_tb_header(tbs);
+ l = next;
+ }
+}
+
+static uint64_t calculate_last_search_coverages(void)
+{
+ uint64_t total_exec_count = 0;
+ GList *i;
+
+ /* Compute total execution count for all tbs */
+ for (i = last_search; i; i = i->next) {
+ TBStatistics *tbs = (TBStatistics *) i->data;
+ total_exec_count +=
+ (tbs->executions.atomic + tbs->executions.normal)
+ * tbs->code.num_guest_inst;
+ }
+
+ for (i = last_search; i; i = i->next) {
+ TBStatistics *tbs = (TBStatistics *) i->data;
+ uint64_t tb_total_execs =
+ (tbs->executions.atomic + tbs->executions.normal)
+ * tbs->code.num_guest_inst;
+ tbs->executions.coverage =
+ (10000 * tb_total_execs) / (total_exec_count + 1);
+ }
+
+ return total_exec_count;
+}
+
+static void do_dump_tbs_info(int total, int sort_by)
+{
+ id = 1;
+ GList *i;
+ int count = total;
+
+ g_list_free(last_search);
+ last_search = NULL;
+
+ qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
+
+ last_search = g_list_sort_with_data(last_search, inverse_sort_tbs,
+ &sort_by);
+
+ if (!last_search) {
+ qemu_printf("No data collected yet!\n");
+ return;
+ }
+
+ calculate_last_search_coverages();
+
+ for (i = last_search; i && count--; i = i->next) {
+ TBStatistics *tbs = (TBStatistics *) i->data;
+ tbs->display_id = id++;
+ }
+
+ /* free the unused bits */
+ if (i) {
+ if (i->next) {
+ i->next->prev = NULL;
+ }
+ g_list_free(i->next);
+ i->next = NULL;
+ }
+
+ dump_last_search_headers(total);
+}
+
+struct tbs_dump_info {
+ int count;
+ int sort_by;
+};
+
+static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
+{
+ struct tbs_dump_info *info = tbdi.host_ptr;
+ qemu_log_to_monitor(true);
+ do_dump_tbs_info(info->count, info->sort_by);
+ qemu_log_to_monitor(false);
+ g_free(info);
+}
+
+/*
+ * When we dump_tbs_info on a live system via the HMP we want to
+ * ensure the system is quiessent before we start outputting stuff.
+ * Otherwise we could pollute the output with other logging output.
+ */
+
+void dump_tbs_info(int count, int sort_by, bool use_monitor)
+{
+ if (use_monitor) {
+ struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
+ tbdi->count = count;
+ tbdi->sort_by = sort_by;
+ async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
+ RUN_ON_CPU_HOST_PTR(tbdi));
+ } else {
+ do_dump_tbs_info(count, sort_by);
+ }
+}
+
+/*
+ * We cannot always re-generate the code even if we know there are
+ * valid translations still in the cache. The reason being the guest
+ * may have un-mapped the page code. In real life this would be
+ * un-reachable as the jump cache is cleared and the following QHT
+ * lookup will do a get_page_addr_code() and fault.
+ *
+ * TODO: can we do this safely? We need to
+ * a) somehow recover the mmu_idx for this translation
+ * b) probe MMU_INST_FETCH to know it will succeed
+ */
+static GString *get_code_string(TBStatistics *tbs, int log_flags)
+{
+ int old_log_flags = qemu_loglevel;
+
+ CPUState *cpu = first_cpu;
+ uint32_t cflags = curr_cflags(cpu);
+ TranslationBlock *tb = NULL;
+
+ GString *code_s = g_string_new(NULL);
+ qemu_log_to_string(true, code_s);
+
+ Error *err = NULL;
+ if (!qemu_set_log(log_flags, &err)) {
+ g_string_append_printf(code_s, "%s", error_get_pretty(err));
+ error_free(err);
+ }
+
+ if (sigsetjmp(cpu->jmp_env, 0) == 0) {
+ mmap_lock();
+ tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
+ tb_phys_invalidate(tb, -1);
+ mmap_unlock();
+ } else {
+ /*
+ * The mmap_lock is dropped by tb_gen_code if it runs out of
+ * memory.
+ */
+ qemu_log("\ncould not gen code for this TB (no longer mapped?)\n");
+ assert_no_pages_locked();
+ }
+
+ qemu_set_log(old_log_flags, &err);
+ qemu_log_to_string(false, NULL);
+
+ return code_s;
+}
+
+static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
+{
+ g_autoptr(GString) code_s = NULL;
+
+ qemu_log("\n------------------------------\n\n");
+
+ if (dump_tb_header(tbs) > 0) {
+ code_s = get_code_string(tbs, log_flags);
+ } else {
+ code_s = g_string_new("cannot re-translate non-active translation");
+ }
+ qemu_log("%s", code_s->str);
+ qemu_log("------------------------------\n");
+}
+
+struct tb_dump_info {
+ int id;
+ int log_flags;
+ bool use_monitor;
+};
+
+static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
+{
+ struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
+
+ if (!last_search) {
+ qemu_log("no search on record\n");
+ return;
+ }
+
+ qemu_log_to_monitor(tbdi->use_monitor);
+
+ TBStatistics *tbs = get_tbstats_by_id(tbdi->id);
+ if (tbs) {
+ do_tb_dump_with_statistics(tbs, tbdi->log_flags);
+ } else {
+ qemu_log("no TB statitics found with id %d\n", tbdi->id);
+ }
+
+ qemu_log_to_monitor(false);
+
+ g_free(tbdi);
+}
+
+void dump_tb_info(int id, int log_mask, bool use_monitor)
+{
+ struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
+
+ tbdi->id = id;
+ tbdi->log_flags = log_mask;
+ tbdi->use_monitor = use_monitor;
+
+ async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
+ RUN_ON_CPU_HOST_PTR(tbdi));
+
+ /* tbdi free'd by do_dump_tb_info_safe */
+}
+
+
void enable_collect_tb_stats(void)
{
tcg_collect_tb_stats = TB_STATS_RUNNING;
diff --git a/disas.c b/disas.c
index b087c12c47..326d1a5208 100644
--- a/disas.c
+++ b/disas.c
@@ -7,6 +7,8 @@
#include "disas/disas.h"
#include "disas/capstone.h"
+#include "qemu/log-for-trace.h"
+
typedef struct CPUDebug {
struct disassemble_info info;
CPUState *cpu;
@@ -203,6 +205,24 @@ static void initialize_debug_host(CPUDebug *s)
#endif
}
+static int
+__attribute__((format(printf, 2, 3)))
+fprintf_log(struct _IO_FILE *a, const char *b, ...)
+{
+ va_list ap;
+ va_start(ap, b);
+
+ if (!to_string) {
+ vfprintf(a, b, ap);
+ } else {
+ qemu_vlog(b, ap);
+ }
+
+ va_end(ap);
+
+ return 1;
+}
+
/* Disassemble this for me please... (debugging). */
void target_disas(FILE *out, CPUState *cpu, target_ulong code,
target_ulong size)
@@ -226,11 +246,12 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
}
for (pc = code; size > 0; pc += count, size -= count) {
- fprintf(out, "0x" TARGET_FMT_lx ": ", pc);
- count = s.info.print_insn(pc, &s.info);
- fprintf(out, "\n");
- if (count < 0)
- break;
+ fprintf_log(out, "0x" TARGET_FMT_lx ": ", pc);
+ count = s.info.print_insn(pc, &s.info);
+ fprintf_log(out, "\n");
+ if (count < 0) {
+ break;
+ }
if (size < count) {
fprintf(out,
"Disassembler disagrees with translator over instruction "
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 47d63d26db..01d9658aea 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -262,6 +262,22 @@ ERST
.params = "",
.help = "show dynamic compiler info",
},
+ {
+ .name = "tb-list",
+ .args_type = "number:i?,sortedby:s?",
+ .params = "[number sortedby]",
+ .help = "show a [number] translated blocks sorted by [sortedby]"
+ "sortedby opts: hotness hg spills",
+ .cmd = hmp_info_tblist,
+ },
+ {
+ .name = "tb",
+ .args_type = "id:i,flags:s?",
+ .params = "id [flag1,flag2,...]",
+ .help = "show information about one translated block by id."
+ "dump flags can be used to set dump code level: out_asm in_asm op",
+ .cmd = hmp_info_tb,
+ },
#endif
SRST
diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
index ec47cbecc2..3d2b167b0e 100644
--- a/include/exec/tb-stats.h
+++ b/include/exec/tb-stats.h
@@ -35,8 +35,11 @@
enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
enum TbstatsCmd { START, PAUSE, STOP, FILTER };
+#define tbs_stats_enabled(tbs, JIT_STATS) \
+ (tbs && (tbs->stats_enabled & JIT_STATS))
+
#define tb_stats_enabled(tb, JIT_STATS) \
- (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
+ (tb && tb->tb_stats && tbs_stats_enabled(tb->tb_stats, JIT_STATS))
#define stat_per_translation(stat, name) \
(stat->translations.total ? stat->name / stat->translations.total : 0)
@@ -65,6 +68,8 @@ struct TBStatistics {
struct {
unsigned long normal;
unsigned long atomic;
+ /* filled only when dumping x% cover set */
+ uint16_t coverage;
} executions;
/* JIT Stats - protected by lock */
@@ -87,7 +92,6 @@ struct TBStatistics {
struct {
unsigned long total;
- unsigned long uncached;
unsigned long spanning;
} translations;
@@ -107,6 +111,9 @@ struct TBStatistics {
uint64_t la;
uint64_t code;
} gen_times;
+
+ /* HMP information - used for referring to previous search */
+ int display_id;
};
bool tb_stats_cmp(const void *ap, const void *bp);
@@ -133,4 +140,26 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
*/
void tbstats_reset_tbs(void);
+/**
+ * dump_tbs_info: report the hottest blocks
+ *
+ * @count: the limit of hotblocks
+ * @sort_by: property in which the dump will be sorted
+ * @use_monitor: redirect output to monitor
+ *
+ * Report the hottest blocks to either the log or monitor
+ */
+void dump_tbs_info(int count, int sort_by, bool use_monitor);
+
+/**
+ * dump_tb_info: dump information about one TB
+ *
+ * @id: the display id of the block (from previous search)
+ * @mask: the temporary logging mask
+ * @Use_monitor: redirect output to monitor
+ *
+ * Re-run a translation of a block at addr for the purposes of debug output
+ */
+void dump_tb_info(int id, int log_mask, bool use_monitor);
+
#endif
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index 207ec19b34..69f3a75fef 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -182,5 +182,7 @@ void hmp_boot_set(Monitor *mon, const QDict *qdict);
void hmp_info_mtree(Monitor *mon, const QDict *qdict);
void hmp_info_cryptodev(Monitor *mon, const QDict *qdict);
void hmp_tbstats(Monitor *mon, const QDict *qdict);
+void hmp_info_tblist(Monitor *mon, const QDict *qdict);
+void hmp_info_tb(Monitor *mon, const QDict *qdict);
#endif
diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
index d47c9cd446..5d0afc56c7 100644
--- a/include/qemu/log-for-trace.h
+++ b/include/qemu/log-for-trace.h
@@ -20,6 +20,9 @@
/* Private global variable, don't use */
extern int qemu_loglevel;
+extern bool to_string;
+
+extern int32_t max_num_hot_tbs_to_dump;
#define LOG_TRACE (1 << 15)
@@ -30,6 +33,7 @@ static inline bool qemu_loglevel_mask(int mask)
}
/* main logging function */
-void G_GNUC_PRINTF(1, 2) qemu_log(const char *fmt, ...);
+int G_GNUC_PRINTF(1, 2) qemu_log(const char *fmt, ...);
+int G_GNUC_PRINTF(1, 0) qemu_vlog(const char *fmt, va_list va);
#endif
diff --git a/include/qemu/log.h b/include/qemu/log.h
index 6f3b8091cd..26b33151f3 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -85,6 +85,8 @@ extern const QEMULogItem qemu_log_items[];
bool qemu_set_log(int log_flags, Error **errp);
bool qemu_set_log_filename(const char *filename, Error **errp);
bool qemu_set_log_filename_flags(const char *name, int flags, Error **errp);
+void qemu_log_to_monitor(bool enable);
+void qemu_log_to_string(bool enable, GString *s);
void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
bool qemu_log_in_addr_range(uint64_t addr);
int qemu_str_to_log_mask(const char *str);
diff --git a/util/log.c b/util/log.c
index 7ae471d97c..6477eb5a5f 100644
--- a/util/log.c
+++ b/util/log.c
@@ -50,6 +50,8 @@ int qemu_loglevel;
static bool log_per_thread;
static GArray *debug_regions;
int32_t max_num_hot_tbs_to_dump;
+static bool to_monitor;
+bool to_string;
/* Returns true if qemu_log() will really write somewhere. */
bool qemu_log_enabled(void)
@@ -146,19 +148,6 @@ void qemu_log_unlock(FILE *logfile)
}
}
-void qemu_log(const char *fmt, ...)
-{
- FILE *f = qemu_log_trylock();
- if (f) {
- va_list ap;
-
- va_start(ap, fmt);
- vfprintf(f, fmt, ap);
- va_end(ap);
- qemu_log_unlock(f);
- }
-}
-
static void __attribute__((__constructor__)) startup(void)
{
qemu_mutex_init(&global_mutex);
@@ -206,6 +195,55 @@ valid_filename_template(const char *filename, bool per_thread, Error **errp)
return filename ? vft_strdup : vft_stderr;
}
+GString *string;
+
+int qemu_vlog(const char *fmt, va_list va)
+{
+ int ret = 0;
+
+ if (to_string && string) {
+ g_string_append_vprintf(string, fmt, va);
+ } else if (to_monitor) {
+ ret = qemu_vprintf(fmt, va);
+ } else {
+ FILE *f = qemu_log_trylock();
+ if (f) {
+ ret = vfprintf(f, fmt, va);
+ }
+ qemu_log_unlock(f);
+ }
+
+ /* Don't pass back error results. */
+ if (ret < 0) {
+ ret = 0;
+ }
+ return ret;
+}
+
+/* Return the number of characters emitted. */
+int qemu_log(const char *fmt, ...)
+{
+ int ret = 0;
+ va_list ap;
+
+ va_start(ap, fmt);
+ ret = qemu_vlog(fmt, ap);
+ va_end(ap);
+
+ return ret;
+}
+
+void qemu_log_to_monitor(bool enable)
+{
+ to_monitor = enable;
+}
+
+void qemu_log_to_string(bool enable, GString *s)
+{
+ to_string = enable;
+ string = s;
+}
+
/* enable or disable low levels log */
static bool qemu_set_log_internal(const char *filename, bool changed_name,
int log_flags, Error **errp)
@@ -523,6 +561,7 @@ int qemu_str_to_log_mask(const char *str)
trace_enable_events((*tmp) + 6);
mask |= LOG_TRACE;
#endif
+#ifdef CONFIG_TCG
} else if (g_str_has_prefix(*tmp, "tb_stats")) {
mask |= CPU_LOG_TB_STATS;
set_default_tbstats_flag(TB_ALL_STATS);
@@ -553,6 +592,7 @@ int qemu_str_to_log_mask(const char *str)
}
set_default_tbstats_flag(flags);
}
+#endif
} else {
for (item = qemu_log_items; item->mask != 0; item++) {
if (g_str_equal(*tmp, item->name)) {
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v11 12/14] Adding info [tb-list|tb] commands to HMP (WIP)
2023-04-21 13:24 ` [PATCH v11 12/14] Adding info [tb-list|tb] commands to HMP (WIP) Fei Wu
@ 2023-04-21 14:43 ` Wu, Fei
0 siblings, 0 replies; 27+ messages in thread
From: Wu, Fei @ 2023-04-21 14:43 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Dr . David Alan Gilbert, Paolo Bonzini
On 4/21/2023 9:24 PM, Fei Wu wrote:
> From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
>
> These commands allow the exploration of TBs generated by the TCG.
> Understand which one hotter, with more guest/host instructions... and
> examine their guest, host and IR code.
>
> The goal of this command is to allow the dynamic exploration of TCG
> behavior and code quality. Therefore, for now, a corresponding QMP
> command is not worthwhile.
>
> [AJB: WIP - we still can't be safely sure a translation will succeed]
>
> Example of output:
>
> TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
> | exec:4828932/0 guest inst cov:16.38%
> | trans:1 ints: g:3 op:82 op_opt:34 spills:3
> | h/g (host bytes / guest insts): 90.666664
> | time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
> | targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)
>
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> | exec:4825842/0 guest inst cov:21.82%
> | trans:1 ints: g:4 op:80 op_opt:38 spills:2
> | h/g (host bytes / guest insts): 84.000000
> | time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
> | targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
>
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> | exec:6956495/0 guest inst cov:21.82%
> | trans:2 ints: g:2 op:40 op_opt:19 spills:1
> | h/g (host bytes / guest insts): 84.000000
> | time to gen at 2.4GHz => code:3130.83(ns) IR:722.50(ns)
> | targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
>
> ----------------
> IN:
> 0x00034d0d: 89 de movl %ebx, %esi
> 0x00034d0f: 26 8b 0e movl %es:(%esi), %ecx
> 0x00034d12: 26 f6 46 08 80 testb $0x80, %es:8(%esi)
> 0x00034d17: 75 3b jne 0x34d54
>
> ------------------------------
>
> TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
> | exec:5202686/0 guest inst cov:11.28%
> | trans:1 ints: g:3 op:82 op_opt:34 spills:3
> | h/g (host bytes / guest insts): 90.666664
> | time to gen at 2.4GHz => code:2793.75(ns) IR:614.58(ns)
> | targets: 0x0000000000034d5e (id:3), 0x0000000000034d0d (id:2)
>
> TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
> | exec:5199468/0 guest inst cov:15.03%
> | trans:1 ints: g:4 op:80 op_opt:38 spills:2
> | h/g (host bytes / guest insts): 84.000000
> | time to gen at 2.4GHz => code:2958.75(ns) IR:719.58(ns)
> | targets: 0x0000000000034d19 (id:4), 0x0000000000034d54 (id:1)
>
> ------------------------------
> 2 TBs to reach 25% of guest inst exec coverage
> Total of guest insts exec: 138346727
>
> ------------------------------
>
> Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
> Message-Id: <20190829173437.5926-10-vandersonmr2@gmail.com>
> [AJB: fix authorship, dropped coverset]
> Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
> ---
> accel/tcg/monitor.c | 55 ++++++
> accel/tcg/tb-stats.c | 322 +++++++++++++++++++++++++++++++++++
> disas.c | 31 +++-
> hmp-commands-info.hx | 16 ++
> include/exec/tb-stats.h | 33 +++-
> include/monitor/hmp.h | 2 +
> include/qemu/log-for-trace.h | 6 +-
> include/qemu/log.h | 2 +
> util/log.c | 66 +++++--
> 9 files changed, 512 insertions(+), 21 deletions(-)
>
> diff --git a/accel/tcg/monitor.c b/accel/tcg/monitor.c
> index b342727262..6bfefd24f6 100644
> --- a/accel/tcg/monitor.c
> +++ b/accel/tcg/monitor.c
> @@ -7,6 +7,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include "qemu/log.h"
> #include "qapi/error.h"
> #include "qapi/type-helpers.h"
> #include "qapi/qapi-commands-machine.h"
> @@ -17,6 +18,7 @@
> #include "sysemu/cpu-timers.h"
> #include "sysemu/tcg.h"
> #include "exec/tb-stats.h"
> +#include "tb-context.h"
> #include "internal.h"
>
>
> @@ -117,6 +119,59 @@ void hmp_tbstats(Monitor *mon, const QDict *qdict)
> RUN_ON_CPU_HOST_PTR(tbscommand));
>
> }
> +
> +void hmp_info_tblist(Monitor *mon, const QDict *qdict)
> +{
> + int number_int;
> + const char *sortedby_str = NULL;
> + if (!tcg_enabled()) {
> + error_report("TB information is only available with accel=tcg");
> + return;
> + }
> + if (!tb_ctx.tb_stats.map) {
> + error_report("no TB information recorded");
> + return;
> + }
> +
> + number_int = qdict_get_try_int(qdict, "number", 10);
> + sortedby_str = qdict_get_try_str(qdict, "sortedby");
> +
> + int sortedby = SORT_BY_HOTNESS;
> + if (sortedby_str == NULL || strcmp(sortedby_str, "hotness") == 0) {
> + sortedby = SORT_BY_HOTNESS;
> + } else if (strcmp(sortedby_str, "hg") == 0) {
> + sortedby = SORT_BY_HG;
> + } else if (strcmp(sortedby_str, "spills") == 0) {
> + sortedby = SORT_BY_SPILLS;
> + } else {
> + error_report("valid sort options are: hotness hg spills");
> + return;
> + }
> +
> + dump_tbs_info(number_int, sortedby, true);
> +}
> +
> +void hmp_info_tb(Monitor *mon, const QDict *qdict)
> +{
> + const int id = qdict_get_int(qdict, "id");
> + const char *flags = qdict_get_try_str(qdict, "flags");
> + int mask;
> +
> + if (!tcg_enabled()) {
> + error_report("TB information is only available with accel=tcg");
> + return;
> + }
> +
> + mask = flags ? qemu_str_to_log_mask(flags) : CPU_LOG_TB_IN_ASM;
> +
> + if (!mask) {
> + error_report("Unable to parse log flags, see 'help log'");
> + return;
> + }
> +
> + dump_tb_info(id, mask, true);
> +}
> +
> #endif
>
> HumanReadableText *qmp_x_query_profile(Error **errp)
> diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
> index 56e944b225..69ac3d25cf 100644
> --- a/accel/tcg/tb-stats.c
> +++ b/accel/tcg/tb-stats.c
> @@ -11,14 +11,18 @@
> #include "disas/disas.h"
> #include "exec/exec-all.h"
> #include "tcg/tcg.h"
> +#include "qapi/error.h"
>
> #include "qemu/qemu-print.h"
> #include "qemu/timer.h"
> +#include "qemu/log.h"
>
> #include "exec/tb-stats.h"
> #include "exec/tb-flush.h"
> #include "tb-context.h"
>
> +#include "internal.h"
> +
> /* TBStatistic collection controls */
> enum TBStatsStatus {
> TB_STATS_DISABLED = 0,
> @@ -32,8 +36,23 @@ static uint32_t default_tbstats_flag;
> /* only accessed in safe work */
> static GList *last_search;
>
> +static int id = 1; /* display_id increment counter */
> uint64_t dev_time;
>
> +static TBStatistics *get_tbstats_by_id(int id)
> +{
> + GList *iter;
> +
> + for (iter = last_search; iter; iter = g_list_next(iter)) {
> + TBStatistics *tbs = iter->data;
> + if (tbs && tbs->display_id == id) {
> + return tbs;
> + break;
> + }
> + }
> + return NULL;
> +}
> +
> struct jit_profile_info {
> uint64_t translations;
> uint64_t aborted;
> @@ -296,6 +315,309 @@ void init_tb_stats_htable(void)
> }
> }
>
> +static void collect_tb_stats(void *p, uint32_t hash, void *userp)
> +{
> + last_search = g_list_prepend(last_search, p);
> +}
> +
> +static void count_invalid_tbs(gpointer data, gpointer user_data)
> +{
> + TranslationBlock *tb = (TranslationBlock *) data;
> + unsigned *counter = (unsigned *) user_data;
> + if (tb->cflags & CF_INVALID) {
> + *counter = *counter + 1;
> + }
> +}
> +
> +static int dump_tb_header(TBStatistics *tbs)
> +{
> + unsigned g = stat_per_translation(tbs, code.num_guest_inst);
> + unsigned ops = stat_per_translation(tbs, code.num_tcg_ops);
> + unsigned ops_opt = stat_per_translation(tbs, code.num_tcg_ops_opt);
> + unsigned spills = stat_per_translation(tbs, code.spills);
> + unsigned h = stat_per_translation(tbs, code.out_len);
> + unsigned act = tbs->tbs->len;
> + unsigned invalid = 0;
> +
> + float guest_host_prop = g ? ((float) h / g) : 0;
> +
> + g_ptr_array_foreach(tbs->tbs, &count_invalid_tbs, &invalid);
> +
> + qemu_log("TB id:%d | phys:0x"TB_PAGE_ADDR_FMT" virt:0x"TARGET_FMT_lx
> + " flags:0x%08x %d inv/%d\n",
> + tbs->display_id, tbs->phys_pc, tbs->pc, tbs->flags,
> + invalid, act);
> +
> + if (tbs_stats_enabled(tbs, TB_EXEC_STATS)) {
> + qemu_log("\t| exec:%lu/%lu guest inst cov:%.2f%%\n",
> + tbs->executions.normal,
> + tbs->executions.atomic, tbs->executions.coverage / 100.0f);
> + }
> +
> + if (tbs_stats_enabled(tbs, TB_JIT_STATS)) {
> + qemu_log("\t| trans:%lu ints: g:%u op:%u op_opt:%u spills:%d"
> + "\n\t| h/g (host bytes / guest insts): %f\n",
> + tbs->translations.total, g, ops, ops_opt, spills, guest_host_prop);
> + }
> +
> + if (tbs_stats_enabled(tbs, TB_JIT_TIME)) {
> + qemu_log("\t| time to gen at 2.4GHz => code:%0.2lf(ns) IR:%0.2lf(ns)\n",
> + tbs->gen_times.code / 2.4, tbs->gen_times.ir / 2.4);
> + }
> +
> + qemu_log("\n");
> +
> + return act - invalid;
> +}
> +
> +static gint
> +inverse_sort_tbs(gconstpointer p1, gconstpointer p2, gpointer psort_by)
> +{
> + const TBStatistics *tbs1 = (TBStatistics *) p1;
> + const TBStatistics *tbs2 = (TBStatistics *) p2;
> + int sort_by = *((enum SortBy *) psort_by);
> + unsigned long c1 = 0;
> + unsigned long c2 = 0;
> +
> + if (sort_by == SORT_BY_SPILLS) {
> + c1 = stat_per_translation(tbs1, code.spills);
> + c2 = stat_per_translation(tbs2, code.spills);
> + } else if (sort_by == SORT_BY_HOTNESS) {
> + c1 = stat_per_translation(tbs1, executions.normal);
> + c2 = stat_per_translation(tbs2, executions.normal);
> + } else if (sort_by == SORT_BY_HG) {
> + if (tbs1->code.num_guest_inst == 0) {
> + return -1;
> + }
> + if (tbs2->code.num_guest_inst == 0) {
> + return 1;
> + }
> +
> + c1 = tbs1->code.out_len / tbs1->code.num_guest_inst;
> + c2 = tbs2->code.out_len / tbs2->code.num_guest_inst;
> + }
> + return c2 - c1;
I take the type as int for granted, will fix it.
Thanks,
Fei.
> +}
> +
> +static void dump_last_search_headers(int count)
> +{
> + if (!last_search) {
> + qemu_log("No data collected yet\n");
> + return;
> + }
> +
> + GList *l = last_search;
> + while (l != NULL && count--) {
> + TBStatistics *tbs = (TBStatistics *) l->data;
> + GList *next = l->next;
> + dump_tb_header(tbs);
> + l = next;
> + }
> +}
> +
> +static uint64_t calculate_last_search_coverages(void)
> +{
> + uint64_t total_exec_count = 0;
> + GList *i;
> +
> + /* Compute total execution count for all tbs */
> + for (i = last_search; i; i = i->next) {
> + TBStatistics *tbs = (TBStatistics *) i->data;
> + total_exec_count +=
> + (tbs->executions.atomic + tbs->executions.normal)
> + * tbs->code.num_guest_inst;
> + }
> +
> + for (i = last_search; i; i = i->next) {
> + TBStatistics *tbs = (TBStatistics *) i->data;
> + uint64_t tb_total_execs =
> + (tbs->executions.atomic + tbs->executions.normal)
> + * tbs->code.num_guest_inst;
> + tbs->executions.coverage =
> + (10000 * tb_total_execs) / (total_exec_count + 1);
> + }
> +
> + return total_exec_count;
> +}
> +
> +static void do_dump_tbs_info(int total, int sort_by)
> +{
> + id = 1;
> + GList *i;
> + int count = total;
> +
> + g_list_free(last_search);
> + last_search = NULL;
> +
> + qht_iter(&tb_ctx.tb_stats, collect_tb_stats, NULL);
> +
> + last_search = g_list_sort_with_data(last_search, inverse_sort_tbs,
> + &sort_by);
> +
> + if (!last_search) {
> + qemu_printf("No data collected yet!\n");
> + return;
> + }
> +
> + calculate_last_search_coverages();
> +
> + for (i = last_search; i && count--; i = i->next) {
> + TBStatistics *tbs = (TBStatistics *) i->data;
> + tbs->display_id = id++;
> + }
> +
> + /* free the unused bits */
> + if (i) {
> + if (i->next) {
> + i->next->prev = NULL;
> + }
> + g_list_free(i->next);
> + i->next = NULL;
> + }
> +
> + dump_last_search_headers(total);
> +}
> +
> +struct tbs_dump_info {
> + int count;
> + int sort_by;
> +};
> +
> +static void do_dump_tbs_info_safe(CPUState *cpu, run_on_cpu_data tbdi)
> +{
> + struct tbs_dump_info *info = tbdi.host_ptr;
> + qemu_log_to_monitor(true);
> + do_dump_tbs_info(info->count, info->sort_by);
> + qemu_log_to_monitor(false);
> + g_free(info);
> +}
> +
> +/*
> + * When we dump_tbs_info on a live system via the HMP we want to
> + * ensure the system is quiessent before we start outputting stuff.
> + * Otherwise we could pollute the output with other logging output.
> + */
> +
> +void dump_tbs_info(int count, int sort_by, bool use_monitor)
> +{
> + if (use_monitor) {
> + struct tbs_dump_info *tbdi = g_new(struct tbs_dump_info, 1);
> + tbdi->count = count;
> + tbdi->sort_by = sort_by;
> + async_safe_run_on_cpu(first_cpu, do_dump_tbs_info_safe,
> + RUN_ON_CPU_HOST_PTR(tbdi));
> + } else {
> + do_dump_tbs_info(count, sort_by);
> + }
> +}
> +
> +/*
> + * We cannot always re-generate the code even if we know there are
> + * valid translations still in the cache. The reason being the guest
> + * may have un-mapped the page code. In real life this would be
> + * un-reachable as the jump cache is cleared and the following QHT
> + * lookup will do a get_page_addr_code() and fault.
> + *
> + * TODO: can we do this safely? We need to
> + * a) somehow recover the mmu_idx for this translation
> + * b) probe MMU_INST_FETCH to know it will succeed
> + */
> +static GString *get_code_string(TBStatistics *tbs, int log_flags)
> +{
> + int old_log_flags = qemu_loglevel;
> +
> + CPUState *cpu = first_cpu;
> + uint32_t cflags = curr_cflags(cpu);
> + TranslationBlock *tb = NULL;
> +
> + GString *code_s = g_string_new(NULL);
> + qemu_log_to_string(true, code_s);
> +
> + Error *err = NULL;
> + if (!qemu_set_log(log_flags, &err)) {
> + g_string_append_printf(code_s, "%s", error_get_pretty(err));
> + error_free(err);
> + }
> +
> + if (sigsetjmp(cpu->jmp_env, 0) == 0) {
> + mmap_lock();
> + tb = tb_gen_code(cpu, tbs->pc, tbs->cs_base, tbs->flags, cflags);
> + tb_phys_invalidate(tb, -1);
> + mmap_unlock();
> + } else {
> + /*
> + * The mmap_lock is dropped by tb_gen_code if it runs out of
> + * memory.
> + */
> + qemu_log("\ncould not gen code for this TB (no longer mapped?)\n");
> + assert_no_pages_locked();
> + }
> +
> + qemu_set_log(old_log_flags, &err);
> + qemu_log_to_string(false, NULL);
> +
> + return code_s;
> +}
> +
> +static void do_tb_dump_with_statistics(TBStatistics *tbs, int log_flags)
> +{
> + g_autoptr(GString) code_s = NULL;
> +
> + qemu_log("\n------------------------------\n\n");
> +
> + if (dump_tb_header(tbs) > 0) {
> + code_s = get_code_string(tbs, log_flags);
> + } else {
> + code_s = g_string_new("cannot re-translate non-active translation");
> + }
> + qemu_log("%s", code_s->str);
> + qemu_log("------------------------------\n");
> +}
> +
> +struct tb_dump_info {
> + int id;
> + int log_flags;
> + bool use_monitor;
> +};
> +
> +static void do_dump_tb_info_safe(CPUState *cpu, run_on_cpu_data info)
> +{
> + struct tb_dump_info *tbdi = (struct tb_dump_info *) info.host_ptr;
> +
> + if (!last_search) {
> + qemu_log("no search on record\n");
> + return;
> + }
> +
> + qemu_log_to_monitor(tbdi->use_monitor);
> +
> + TBStatistics *tbs = get_tbstats_by_id(tbdi->id);
> + if (tbs) {
> + do_tb_dump_with_statistics(tbs, tbdi->log_flags);
> + } else {
> + qemu_log("no TB statitics found with id %d\n", tbdi->id);
> + }
> +
> + qemu_log_to_monitor(false);
> +
> + g_free(tbdi);
> +}
> +
> +void dump_tb_info(int id, int log_mask, bool use_monitor)
> +{
> + struct tb_dump_info *tbdi = g_new(struct tb_dump_info, 1);
> +
> + tbdi->id = id;
> + tbdi->log_flags = log_mask;
> + tbdi->use_monitor = use_monitor;
> +
> + async_safe_run_on_cpu(first_cpu, do_dump_tb_info_safe,
> + RUN_ON_CPU_HOST_PTR(tbdi));
> +
> + /* tbdi free'd by do_dump_tb_info_safe */
> +}
> +
> +
> void enable_collect_tb_stats(void)
> {
> tcg_collect_tb_stats = TB_STATS_RUNNING;
> diff --git a/disas.c b/disas.c
> index b087c12c47..326d1a5208 100644
> --- a/disas.c
> +++ b/disas.c
> @@ -7,6 +7,8 @@
> #include "disas/disas.h"
> #include "disas/capstone.h"
>
> +#include "qemu/log-for-trace.h"
> +
> typedef struct CPUDebug {
> struct disassemble_info info;
> CPUState *cpu;
> @@ -203,6 +205,24 @@ static void initialize_debug_host(CPUDebug *s)
> #endif
> }
>
> +static int
> +__attribute__((format(printf, 2, 3)))
> +fprintf_log(struct _IO_FILE *a, const char *b, ...)
> +{
> + va_list ap;
> + va_start(ap, b);
> +
> + if (!to_string) {
> + vfprintf(a, b, ap);
> + } else {
> + qemu_vlog(b, ap);
> + }
> +
> + va_end(ap);
> +
> + return 1;
> +}
> +
> /* Disassemble this for me please... (debugging). */
> void target_disas(FILE *out, CPUState *cpu, target_ulong code,
> target_ulong size)
> @@ -226,11 +246,12 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong code,
> }
>
> for (pc = code; size > 0; pc += count, size -= count) {
> - fprintf(out, "0x" TARGET_FMT_lx ": ", pc);
> - count = s.info.print_insn(pc, &s.info);
> - fprintf(out, "\n");
> - if (count < 0)
> - break;
> + fprintf_log(out, "0x" TARGET_FMT_lx ": ", pc);
> + count = s.info.print_insn(pc, &s.info);
> + fprintf_log(out, "\n");
> + if (count < 0) {
> + break;
> + }
> if (size < count) {
> fprintf(out,
> "Disassembler disagrees with translator over instruction "
> diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
> index 47d63d26db..01d9658aea 100644
> --- a/hmp-commands-info.hx
> +++ b/hmp-commands-info.hx
> @@ -262,6 +262,22 @@ ERST
> .params = "",
> .help = "show dynamic compiler info",
> },
> + {
> + .name = "tb-list",
> + .args_type = "number:i?,sortedby:s?",
> + .params = "[number sortedby]",
> + .help = "show a [number] translated blocks sorted by [sortedby]"
> + "sortedby opts: hotness hg spills",
> + .cmd = hmp_info_tblist,
> + },
> + {
> + .name = "tb",
> + .args_type = "id:i,flags:s?",
> + .params = "id [flag1,flag2,...]",
> + .help = "show information about one translated block by id."
> + "dump flags can be used to set dump code level: out_asm in_asm op",
> + .cmd = hmp_info_tb,
> + },
> #endif
>
> SRST
> diff --git a/include/exec/tb-stats.h b/include/exec/tb-stats.h
> index ec47cbecc2..3d2b167b0e 100644
> --- a/include/exec/tb-stats.h
> +++ b/include/exec/tb-stats.h
> @@ -35,8 +35,11 @@
> enum SortBy { SORT_BY_HOTNESS, SORT_BY_HG /* Host/Guest */, SORT_BY_SPILLS };
> enum TbstatsCmd { START, PAUSE, STOP, FILTER };
>
> +#define tbs_stats_enabled(tbs, JIT_STATS) \
> + (tbs && (tbs->stats_enabled & JIT_STATS))
> +
> #define tb_stats_enabled(tb, JIT_STATS) \
> - (tb && tb->tb_stats && (tb->tb_stats->stats_enabled & JIT_STATS))
> + (tb && tb->tb_stats && tbs_stats_enabled(tb->tb_stats, JIT_STATS))
>
> #define stat_per_translation(stat, name) \
> (stat->translations.total ? stat->name / stat->translations.total : 0)
> @@ -65,6 +68,8 @@ struct TBStatistics {
> struct {
> unsigned long normal;
> unsigned long atomic;
> + /* filled only when dumping x% cover set */
> + uint16_t coverage;
> } executions;
>
> /* JIT Stats - protected by lock */
> @@ -87,7 +92,6 @@ struct TBStatistics {
>
> struct {
> unsigned long total;
> - unsigned long uncached;
> unsigned long spanning;
> } translations;
>
> @@ -107,6 +111,9 @@ struct TBStatistics {
> uint64_t la;
> uint64_t code;
> } gen_times;
> +
> + /* HMP information - used for referring to previous search */
> + int display_id;
> };
>
> bool tb_stats_cmp(const void *ap, const void *bp);
> @@ -133,4 +140,26 @@ void do_hmp_tbstats_safe(CPUState *cpu, run_on_cpu_data icmd);
> */
> void tbstats_reset_tbs(void);
>
> +/**
> + * dump_tbs_info: report the hottest blocks
> + *
> + * @count: the limit of hotblocks
> + * @sort_by: property in which the dump will be sorted
> + * @use_monitor: redirect output to monitor
> + *
> + * Report the hottest blocks to either the log or monitor
> + */
> +void dump_tbs_info(int count, int sort_by, bool use_monitor);
> +
> +/**
> + * dump_tb_info: dump information about one TB
> + *
> + * @id: the display id of the block (from previous search)
> + * @mask: the temporary logging mask
> + * @Use_monitor: redirect output to monitor
> + *
> + * Re-run a translation of a block at addr for the purposes of debug output
> + */
> +void dump_tb_info(int id, int log_mask, bool use_monitor);
> +
> #endif
> diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
> index 207ec19b34..69f3a75fef 100644
> --- a/include/monitor/hmp.h
> +++ b/include/monitor/hmp.h
> @@ -182,5 +182,7 @@ void hmp_boot_set(Monitor *mon, const QDict *qdict);
> void hmp_info_mtree(Monitor *mon, const QDict *qdict);
> void hmp_info_cryptodev(Monitor *mon, const QDict *qdict);
> void hmp_tbstats(Monitor *mon, const QDict *qdict);
> +void hmp_info_tblist(Monitor *mon, const QDict *qdict);
> +void hmp_info_tb(Monitor *mon, const QDict *qdict);
>
> #endif
> diff --git a/include/qemu/log-for-trace.h b/include/qemu/log-for-trace.h
> index d47c9cd446..5d0afc56c7 100644
> --- a/include/qemu/log-for-trace.h
> +++ b/include/qemu/log-for-trace.h
> @@ -20,6 +20,9 @@
>
> /* Private global variable, don't use */
> extern int qemu_loglevel;
> +extern bool to_string;
> +
> +extern int32_t max_num_hot_tbs_to_dump;
>
> #define LOG_TRACE (1 << 15)
>
> @@ -30,6 +33,7 @@ static inline bool qemu_loglevel_mask(int mask)
> }
>
> /* main logging function */
> -void G_GNUC_PRINTF(1, 2) qemu_log(const char *fmt, ...);
> +int G_GNUC_PRINTF(1, 2) qemu_log(const char *fmt, ...);
> +int G_GNUC_PRINTF(1, 0) qemu_vlog(const char *fmt, va_list va);
>
> #endif
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index 6f3b8091cd..26b33151f3 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -85,6 +85,8 @@ extern const QEMULogItem qemu_log_items[];
> bool qemu_set_log(int log_flags, Error **errp);
> bool qemu_set_log_filename(const char *filename, Error **errp);
> bool qemu_set_log_filename_flags(const char *name, int flags, Error **errp);
> +void qemu_log_to_monitor(bool enable);
> +void qemu_log_to_string(bool enable, GString *s);
> void qemu_set_dfilter_ranges(const char *ranges, Error **errp);
> bool qemu_log_in_addr_range(uint64_t addr);
> int qemu_str_to_log_mask(const char *str);
> diff --git a/util/log.c b/util/log.c
> index 7ae471d97c..6477eb5a5f 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -50,6 +50,8 @@ int qemu_loglevel;
> static bool log_per_thread;
> static GArray *debug_regions;
> int32_t max_num_hot_tbs_to_dump;
> +static bool to_monitor;
> +bool to_string;
>
> /* Returns true if qemu_log() will really write somewhere. */
> bool qemu_log_enabled(void)
> @@ -146,19 +148,6 @@ void qemu_log_unlock(FILE *logfile)
> }
> }
>
> -void qemu_log(const char *fmt, ...)
> -{
> - FILE *f = qemu_log_trylock();
> - if (f) {
> - va_list ap;
> -
> - va_start(ap, fmt);
> - vfprintf(f, fmt, ap);
> - va_end(ap);
> - qemu_log_unlock(f);
> - }
> -}
> -
> static void __attribute__((__constructor__)) startup(void)
> {
> qemu_mutex_init(&global_mutex);
> @@ -206,6 +195,55 @@ valid_filename_template(const char *filename, bool per_thread, Error **errp)
> return filename ? vft_strdup : vft_stderr;
> }
>
> +GString *string;
> +
> +int qemu_vlog(const char *fmt, va_list va)
> +{
> + int ret = 0;
> +
> + if (to_string && string) {
> + g_string_append_vprintf(string, fmt, va);
> + } else if (to_monitor) {
> + ret = qemu_vprintf(fmt, va);
> + } else {
> + FILE *f = qemu_log_trylock();
> + if (f) {
> + ret = vfprintf(f, fmt, va);
> + }
> + qemu_log_unlock(f);
> + }
> +
> + /* Don't pass back error results. */
> + if (ret < 0) {
> + ret = 0;
> + }
> + return ret;
> +}
> +
> +/* Return the number of characters emitted. */
> +int qemu_log(const char *fmt, ...)
> +{
> + int ret = 0;
> + va_list ap;
> +
> + va_start(ap, fmt);
> + ret = qemu_vlog(fmt, ap);
> + va_end(ap);
> +
> + return ret;
> +}
> +
> +void qemu_log_to_monitor(bool enable)
> +{
> + to_monitor = enable;
> +}
> +
> +void qemu_log_to_string(bool enable, GString *s)
> +{
> + to_string = enable;
> + string = s;
> +}
> +
> /* enable or disable low levels log */
> static bool qemu_set_log_internal(const char *filename, bool changed_name,
> int log_flags, Error **errp)
> @@ -523,6 +561,7 @@ int qemu_str_to_log_mask(const char *str)
> trace_enable_events((*tmp) + 6);
> mask |= LOG_TRACE;
> #endif
> +#ifdef CONFIG_TCG
> } else if (g_str_has_prefix(*tmp, "tb_stats")) {
> mask |= CPU_LOG_TB_STATS;
> set_default_tbstats_flag(TB_ALL_STATS);
> @@ -553,6 +592,7 @@ int qemu_str_to_log_mask(const char *str)
> }
> set_default_tbstats_flag(flags);
> }
> +#endif
> } else {
> for (item = qemu_log_items; item->mask != 0; item++) {
> if (g_str_equal(*tmp, item->name)) {
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v11 13/14] tb-stats: dump hot TBs at the end of the execution
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (11 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 12/14] Adding info [tb-list|tb] commands to HMP (WIP) Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 13:24 ` [PATCH v11 14/14] configure: remove the final bits of --profiler support Fei Wu
2023-04-21 16:42 ` [PATCH v11 00/14] TCG code quality tracking Alex Bennée
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Vanderson M. do Rosario, Paolo Bonzini, Laurent Vivier
From: "Vanderson M. do Rosario" <vandersonmr2@gmail.com>
Dump the hottest TBs if -d tb_stats,dump_limit=N is used.
Example of output for the 3 hottest TBs:
TB id:1 | phys:0x34d54 virt:0x0000000000034d54 flags:0x0000f0
| exec:4828932/0 guest inst cov:16.38%
| trans:1 ints: g:3 op:82 op_opt:34 spills:3
| h/g (host bytes / guest insts): 90.666664
| time to gen at 2.4GHz => code:3150.83(ns) IR:712.08(ns)
| targets: 0x0000000000034d5e (id:11), 0x0000000000034d0d (id:2)
TB id:2 | phys:0x34d0d virt:0x0000000000034d0d flags:0x0000f0
| exec:4825842/0 guest inst cov:21.82%
| trans:1 ints: g:4 op:80 op_opt:38 spills:2
| h/g (host bytes / guest insts): 84.000000
| time to gen at 2.4GHz => code:3362.92(ns) IR:793.75(ns)
| targets: 0x0000000000034d19 (id:12), 0x0000000000034d54 (id:1)
TB id:3 | phys:0xec1c1 virt:0x00000000000ec1c1 flags:0x0000b0
| exec:872032/0 guest inst cov:1.97%
| trans:1 ints: g:2 op:56 op_opt:26 spills:1
| h/g (host bytes / guest insts): 68.000000
| time to gen at 2.4GHz => code:1692.08(ns) IR:473.75(ns)
| targets: 0x00000000000ec1c5 (id:4), 0x00000000000ec1cb (id:13)
Signed-off-by: Vanderson M. do Rosario <vandersonmr2@gmail.com>
Message-Id: <20190829173437.5926-12-vandersonmr2@gmail.com>
[AJB: fix authorship, ad softmmu support]
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
accel/tcg/tb-stats.c | 21 +++++++++++++++++++++
include/exec/tb-stats-dump.h | 21 +++++++++++++++++++++
include/exec/tb-stats-flags.h | 1 +
linux-user/exit.c | 2 ++
softmmu/runstate.c | 2 ++
stubs/tb-stats.c | 5 +++++
util/log.c | 4 ++--
7 files changed, 54 insertions(+), 2 deletions(-)
create mode 100644 include/exec/tb-stats-dump.h
diff --git a/accel/tcg/tb-stats.c b/accel/tcg/tb-stats.c
index 69ac3d25cf..da85f397fd 100644
--- a/accel/tcg/tb-stats.c
+++ b/accel/tcg/tb-stats.c
@@ -19,6 +19,7 @@
#include "exec/tb-stats.h"
#include "exec/tb-flush.h"
+#include "exec/tb-stats-dump.h"
#include "tb-context.h"
#include "internal.h"
@@ -33,6 +34,7 @@ enum TBStatsStatus {
static enum TBStatsStatus tcg_collect_tb_stats;
static uint32_t default_tbstats_flag;
+static int max_dump_tbs;
/* only accessed in safe work */
static GList *last_search;
@@ -618,6 +620,20 @@ void dump_tb_info(int id, int log_mask, bool use_monitor)
}
+/*
+ * Dump the final stats
+ */
+void tb_stats_dump(void)
+{
+ if (!tb_stats_collection_enabled()) {
+ return;
+ }
+
+ dump_tbs_info(max_dump_tbs, SORT_BY_HOTNESS, false);
+}
+
+/* TBStatistic collection controls */
+
void enable_collect_tb_stats(void)
{
tcg_collect_tb_stats = TB_STATS_RUNNING;
@@ -666,3 +682,8 @@ void set_default_tbstats_flag(uint32_t flags)
{
default_tbstats_flag = flags;
}
+
+void set_tbstats_max_tbs(int max)
+{
+ max_dump_tbs = max;
+}
diff --git a/include/exec/tb-stats-dump.h b/include/exec/tb-stats-dump.h
new file mode 100644
index 0000000000..197c6148e9
--- /dev/null
+++ b/include/exec/tb-stats-dump.h
@@ -0,0 +1,21 @@
+/*
+ * TB Stats common dump functions across sysemu/linux-user
+ *
+ * Copyright (c) 2019 Linaro
+ *
+ * SPDX-License-Identifier: GPL-3.0-or-later
+ */
+
+#ifndef _TB_STATS_DUMP_H_
+#define _TB_STATS_DUMP_H_
+
+/**
+ * tb_stats_dump: dump final tb_stats at end of execution
+ */
+#ifdef CONFIG_TCG
+void tb_stats_dump(void);
+#else
+static inline void tb_stats_dump(void) { /* do nothing */ };
+#endif
+
+#endif /* _TB_STATS_DUMP_H_ */
diff --git a/include/exec/tb-stats-flags.h b/include/exec/tb-stats-flags.h
index e64fe8caf1..6fd1f8ce33 100644
--- a/include/exec/tb-stats-flags.h
+++ b/include/exec/tb-stats-flags.h
@@ -25,6 +25,7 @@ void pause_collect_tb_stats(void);
bool tb_stats_collection_enabled(void);
bool tb_stats_collection_paused(void);
+void set_tbstats_max_tbs(int max);
uint32_t get_default_tbstats_flag(void);
void set_default_tbstats_flag(uint32_t);
diff --git a/linux-user/exit.c b/linux-user/exit.c
index 3017d28a3c..4fd23bcf60 100644
--- a/linux-user/exit.c
+++ b/linux-user/exit.c
@@ -25,6 +25,7 @@
#ifdef CONFIG_GPROF
#include <sys/gmon.h>
#endif
+#include "exec/tb-stats-dump.h"
#ifdef CONFIG_GCOV
extern void __gcov_dump(void);
@@ -41,4 +42,5 @@ void preexit_cleanup(CPUArchState *env, int code)
gdb_exit(code);
qemu_plugin_user_exit();
perf_exit();
+ tb_stats_dump();
}
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index d168b82469..6c18b19d2b 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -30,6 +30,7 @@
#include "crypto/cipher.h"
#include "crypto/init.h"
#include "exec/cpu-common.h"
+#include "exec/tb-stats-dump.h"
#include "gdbstub/syscalls.h"
#include "hw/boards.h"
#include "migration/misc.h"
@@ -819,6 +820,7 @@ void qemu_cleanup(void)
vm_shutdown();
replay_finish();
+ tb_stats_dump();
job_cancel_sync_all();
bdrv_close_all();
diff --git a/stubs/tb-stats.c b/stubs/tb-stats.c
index d212c2a1fa..a3e1406b88 100644
--- a/stubs/tb-stats.c
+++ b/stubs/tb-stats.c
@@ -21,6 +21,11 @@ bool tb_stats_collection_enabled(void)
return false;
}
+void set_tbstats_max_tbs(int max)
+{
+ return;
+}
+
void set_default_tbstats_flag(uint32_t flags)
{
return;
diff --git a/util/log.c b/util/log.c
index 6477eb5a5f..d159ca6916 100644
--- a/util/log.c
+++ b/util/log.c
@@ -49,7 +49,6 @@ static __thread Notifier qemu_log_thread_cleanup_notifier;
int qemu_loglevel;
static bool log_per_thread;
static GArray *debug_regions;
-int32_t max_num_hot_tbs_to_dump;
static bool to_monitor;
bool to_string;
@@ -568,7 +567,8 @@ int qemu_str_to_log_mask(const char *str)
enable_collect_tb_stats();
} else if (tb_stats_collection_enabled() &&
g_str_has_prefix(*tmp, "dump_limit=")) {
- max_num_hot_tbs_to_dump = atoi((*tmp) + 11);
+ int hot_tbs = atoi((*tmp) + 11);
+ set_tbstats_max_tbs(hot_tbs);
} else if (tb_stats_collection_enabled() &&
g_str_has_prefix(*tmp, "level=")) {
uint32_t flags = 0;
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* [PATCH v11 14/14] configure: remove the final bits of --profiler support
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (12 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 13/14] tb-stats: dump hot TBs at the end of the execution Fei Wu
@ 2023-04-21 13:24 ` Fei Wu
2023-04-21 16:42 ` [PATCH v11 00/14] TCG code quality tracking Alex Bennée
14 siblings, 0 replies; 27+ messages in thread
From: Fei Wu @ 2023-04-21 13:24 UTC (permalink / raw)
To: alex.bennee, richard.henderson, qemu-devel
Cc: Paolo Bonzini, Marc-André Lureau, Daniel P. Berrangé,
Thomas Huth, Philippe Mathieu-Daudé
From: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
---
meson.build | 2 --
meson_options.txt | 2 --
scripts/meson-buildoptions.sh | 3 ---
3 files changed, 7 deletions(-)
diff --git a/meson.build b/meson.build
index 29f8644d6d..27627199d7 100644
--- a/meson.build
+++ b/meson.build
@@ -1872,7 +1872,6 @@ if numa.found()
dependencies: numa))
endif
config_host_data.set('CONFIG_OPENGL', opengl.found())
-config_host_data.set('CONFIG_PROFILER', get_option('profiler'))
config_host_data.set('CONFIG_RBD', rbd.found())
config_host_data.set('CONFIG_RDMA', rdma.found())
config_host_data.set('CONFIG_SDL', sdl.found())
@@ -3823,7 +3822,6 @@ if 'objc' in all_languages
summary_info += {'QEMU_OBJCFLAGS': ' '.join(qemu_objcflags)}
endif
summary_info += {'QEMU_LDFLAGS': ' '.join(qemu_ldflags)}
-summary_info += {'profiler': get_option('profiler')}
summary_info += {'link-time optimization (LTO)': get_option('b_lto')}
summary_info += {'PIE': get_option('b_pie')}
summary_info += {'static build': config_host.has_key('CONFIG_STATIC')}
diff --git a/meson_options.txt b/meson_options.txt
index fc9447d267..163233fda6 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -320,8 +320,6 @@ option('qom_cast_debug', type: 'boolean', value: false,
option('gprof', type: 'boolean', value: false,
description: 'QEMU profiling with gprof',
deprecated: true)
-option('profiler', type: 'boolean', value: false,
- description: 'profiler support')
option('slirp_smbd', type : 'feature', value : 'auto',
description: 'use smbd (at path --smbd=*) in slirp networking')
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 009fab1515..8a6d61ed90 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -34,7 +34,6 @@ meson_options_help() {
printf "%s\n" ' jemalloc/system/tcmalloc)'
printf "%s\n" ' --enable-module-upgrades try to load modules from alternate paths for'
printf "%s\n" ' upgrades'
- printf "%s\n" ' --enable-profiler profiler support'
printf "%s\n" ' --enable-qom-cast-debug cast debugging support'
printf "%s\n" ' --enable-rng-none dummy RNG, avoid using /dev/(u)random and'
printf "%s\n" ' getrandom()'
@@ -373,8 +372,6 @@ _meson_option_parse() {
--with-pkgversion=*) quote_sh "-Dpkgversion=$2" ;;
--enable-png) printf "%s" -Dpng=enabled ;;
--disable-png) printf "%s" -Dpng=disabled ;;
- --enable-profiler) printf "%s" -Dprofiler=true ;;
- --disable-profiler) printf "%s" -Dprofiler=false ;;
--enable-pvrdma) printf "%s" -Dpvrdma=enabled ;;
--disable-pvrdma) printf "%s" -Dpvrdma=disabled ;;
--enable-qcow1) printf "%s" -Dqcow1=enabled ;;
--
2.25.1
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v11 00/14] TCG code quality tracking
2023-04-21 13:24 [PATCH v11 00/14] TCG code quality tracking Fei Wu
` (13 preceding siblings ...)
2023-04-21 13:24 ` [PATCH v11 14/14] configure: remove the final bits of --profiler support Fei Wu
@ 2023-04-21 16:42 ` Alex Bennée
2023-05-12 8:07 ` Wu, Fei
2023-05-19 1:16 ` Wu, Fei
14 siblings, 2 replies; 27+ messages in thread
From: Alex Bennée @ 2023-04-21 16:42 UTC (permalink / raw)
To: Fei Wu; +Cc: richard.henderson, qemu-devel
Fei Wu <fei2.wu@intel.com> writes:
> This patch series were done by Vanderson and Alex originally in 2019, I
> (Fei Wu) rebased them on latest upstream from:
> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
> and send out this review per Alex's request, I will continue to address
> any future review comments here. As it's been a very long time and there
> are lots of conflicts during rebase, it's my fault if I introduce any
> problems during the process.
Hi Fei,
Thanks for picking this up. I can confirm that this applies cleanly to
master and I have kicked the tyres and things still seem to work. I'm
not sure if I can provide much review on code I wrote but a few things
to point out:
- there are a number of CI failures, mainly qatomic on 32 bit guests
see https://gitlab.com/stsquad/qemu/-/pipelines/844857279/failures
maybe we just disable time accounting for 32 bit hosts?
- we need a proper solution to the invalidation of TBs so we can
exclude them from lists (or at least not do the attempt
translation/fail dance). Alternatively we could page out a copy of
the TB data to a disk file when we hit a certain hotness? How would
this interact with the jitperf support already?
- we should add some documentation to the manual so users don't have
to figure it all out by trail and error at the HMP command line.
- there may be some exit cases missed because I saw some weird TB's
with very long IR generation times.
TB id:5 | phys:0xb5f21d00 virt:0xffffcf2f17721d00 flags:0x00000051 1 inv/2
| exec:1889055/0 guest inst cov:1.05%
| trans:2 ints: g:4 op:32 op_opt:26 spills:0
| h/g (host bytes / guest insts): 56.000000
| time to gen at 2.4GHz => code:6723.33(ns) IR:2378539.17(ns)
- code motion in 9/14 should be folded into the first patch
Even if we can't find a solution for safely dumping TBs I think the
series without "tb-list" is still an improvement for getting rid of the
--enable-profiler and making info JIT useful by default.
Richard,
What do you think?
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 00/14] TCG code quality tracking
2023-04-21 16:42 ` [PATCH v11 00/14] TCG code quality tracking Alex Bennée
@ 2023-05-12 8:07 ` Wu, Fei
2023-05-12 8:42 ` Alex Bennée
2023-05-19 1:16 ` Wu, Fei
1 sibling, 1 reply; 27+ messages in thread
From: Wu, Fei @ 2023-05-12 8:07 UTC (permalink / raw)
To: Alex Bennée; +Cc: richard.henderson, qemu-devel
On 4/22/2023 12:42 AM, Alex Bennée wrote:
>
> Fei Wu <fei2.wu@intel.com> writes:
>
>> This patch series were done by Vanderson and Alex originally in 2019, I
>> (Fei Wu) rebased them on latest upstream from:
>> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
>> and send out this review per Alex's request, I will continue to address
>> any future review comments here. As it's been a very long time and there
>> are lots of conflicts during rebase, it's my fault if I introduce any
>> problems during the process.
>
> Hi Fei,
>
> Thanks for picking this up. I can confirm that this applies cleanly to
> master and I have kicked the tyres and things still seem to work. I'm
> not sure if I can provide much review on code I wrote but a few things
> to point out:
>
Hi Alex,
There are several new files added, should I put your name as their
maintainer? Also, should I signed-off these patches or not, definitely
the original signed-offs will be kept.
Thanks,
Fei.
> - there are a number of CI failures, mainly qatomic on 32 bit guests
> see https://gitlab.com/stsquad/qemu/-/pipelines/844857279/failures
> maybe we just disable time accounting for 32 bit hosts?
>
> - we need a proper solution to the invalidation of TBs so we can
> exclude them from lists (or at least not do the attempt
> translation/fail dance). Alternatively we could page out a copy of
> the TB data to a disk file when we hit a certain hotness? How would
> this interact with the jitperf support already?
>
> - we should add some documentation to the manual so users don't have
> to figure it all out by trail and error at the HMP command line.
>
> - there may be some exit cases missed because I saw some weird TB's
> with very long IR generation times.
>
> TB id:5 | phys:0xb5f21d00 virt:0xffffcf2f17721d00 flags:0x00000051 1 inv/2
> | exec:1889055/0 guest inst cov:1.05%
> | trans:2 ints: g:4 op:32 op_opt:26 spills:0
> | h/g (host bytes / guest insts): 56.000000
> | time to gen at 2.4GHz => code:6723.33(ns) IR:2378539.17(ns)
>
> - code motion in 9/14 should be folded into the first patch
>
> Even if we can't find a solution for safely dumping TBs I think the
> series without "tb-list" is still an improvement for getting rid of the
> --enable-profiler and making info JIT useful by default.
>
> Richard,
>
> What do you think?
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 00/14] TCG code quality tracking
2023-05-12 8:07 ` Wu, Fei
@ 2023-05-12 8:42 ` Alex Bennée
2023-05-12 8:58 ` Wu, Fei
0 siblings, 1 reply; 27+ messages in thread
From: Alex Bennée @ 2023-05-12 8:42 UTC (permalink / raw)
To: Wu, Fei; +Cc: richard.henderson, qemu-devel
"Wu, Fei" <fei2.wu@intel.com> writes:
> On 4/22/2023 12:42 AM, Alex Bennée wrote:
>>
>> Fei Wu <fei2.wu@intel.com> writes:
>>
>>> This patch series were done by Vanderson and Alex originally in 2019, I
>>> (Fei Wu) rebased them on latest upstream from:
>>> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
>>> and send out this review per Alex's request, I will continue to address
>>> any future review comments here. As it's been a very long time and there
>>> are lots of conflicts during rebase, it's my fault if I introduce any
>>> problems during the process.
>>
>> Hi Fei,
>>
>> Thanks for picking this up. I can confirm that this applies cleanly to
>> master and I have kicked the tyres and things still seem to work. I'm
>> not sure if I can provide much review on code I wrote but a few things
>> to point out:
>>
> Hi Alex,
>
> There are several new files added, should I put your name as their
> maintainer? Also, should I signed-off these patches or not, definitely
> the original signed-offs will be kept.
I assume they would just become part of the accel/tcg stuff rather than
be maintained as a separate subsystem.
For sign-offs you should keep the original authors and add your own.
Each s-o-b is a statement by the person working with the code that they
are "legally okay to contribute this and happy for it to go into QEMU".
So it is totally normal for work that goes through several trees before
being merged to have multiple sign-offs. When the maintainer creates
their pull request they will add theirs as well.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 00/14] TCG code quality tracking
2023-05-12 8:42 ` Alex Bennée
@ 2023-05-12 8:58 ` Wu, Fei
2023-05-12 9:29 ` Alex Bennée
0 siblings, 1 reply; 27+ messages in thread
From: Wu, Fei @ 2023-05-12 8:58 UTC (permalink / raw)
To: Alex Bennée; +Cc: richard.henderson, qemu-devel
On 5/12/2023 4:42 PM, Alex Bennée wrote:
>
> "Wu, Fei" <fei2.wu@intel.com> writes:
>
>> On 4/22/2023 12:42 AM, Alex Bennée wrote:
>>>
>>> Fei Wu <fei2.wu@intel.com> writes:
>>>
>>>> This patch series were done by Vanderson and Alex originally in 2019, I
>>>> (Fei Wu) rebased them on latest upstream from:
>>>> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
>>>> and send out this review per Alex's request, I will continue to address
>>>> any future review comments here. As it's been a very long time and there
>>>> are lots of conflicts during rebase, it's my fault if I introduce any
>>>> problems during the process.
>>>
>>> Hi Fei,
>>>
>>> Thanks for picking this up. I can confirm that this applies cleanly to
>>> master and I have kicked the tyres and things still seem to work. I'm
>>> not sure if I can provide much review on code I wrote but a few things
>>> to point out:
>>>
>> Hi Alex,
>>
>> There are several new files added, should I put your name as their
>> maintainer? Also, should I signed-off these patches or not, definitely
>> the original signed-offs will be kept.
>
> I assume they would just become part of the accel/tcg stuff rather than
> be maintained as a separate subsystem.
>
ok. I see ./scripts/checkpatch.pl reports some warnings as follows:
WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
> For sign-offs you should keep the original authors and add your own.
> Each s-o-b is a statement by the person working with the code that they
> are "legally okay to contribute this and happy for it to go into QEMU".
> So it is totally normal for work that goes through several trees before
> being merged to have multiple sign-offs. When the maintainer creates
> their pull request they will add theirs as well.
>
Got it, thank you.
Fei.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 00/14] TCG code quality tracking
2023-05-12 8:58 ` Wu, Fei
@ 2023-05-12 9:29 ` Alex Bennée
0 siblings, 0 replies; 27+ messages in thread
From: Alex Bennée @ 2023-05-12 9:29 UTC (permalink / raw)
To: Wu, Fei; +Cc: richard.henderson, qemu-devel
"Wu, Fei" <fei2.wu@intel.com> writes:
> On 5/12/2023 4:42 PM, Alex Bennée wrote:
>>
>> "Wu, Fei" <fei2.wu@intel.com> writes:
>>
>>> On 4/22/2023 12:42 AM, Alex Bennée wrote:
>>>>
>>>> Fei Wu <fei2.wu@intel.com> writes:
>>>>
>>>>> This patch series were done by Vanderson and Alex originally in 2019, I
>>>>> (Fei Wu) rebased them on latest upstream from:
>>>>> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
>>>>> and send out this review per Alex's request, I will continue to address
>>>>> any future review comments here. As it's been a very long time and there
>>>>> are lots of conflicts during rebase, it's my fault if I introduce any
>>>>> problems during the process.
>>>>
>>>> Hi Fei,
>>>>
>>>> Thanks for picking this up. I can confirm that this applies cleanly to
>>>> master and I have kicked the tyres and things still seem to work. I'm
>>>> not sure if I can provide much review on code I wrote but a few things
>>>> to point out:
>>>>
>>> Hi Alex,
>>>
>>> There are several new files added, should I put your name as their
>>> maintainer? Also, should I signed-off these patches or not, definitely
>>> the original signed-offs will be kept.
>>
>> I assume they would just become part of the accel/tcg stuff rather than
>> be maintained as a separate subsystem.
>>
> ok. I see ./scripts/checkpatch.pl reports some warnings as follows:
>
> WARNING: added, moved or deleted file(s), does MAINTAINERS need
> updating?
All the stuff under accel/tcg should already be caught by:
Overall TCG CPUs
...
F: accel/tcg/
but I suspect the new headers will need explicit entries in the
MAINTAINERS file adding.
>
>> For sign-offs you should keep the original authors and add your own.
>> Each s-o-b is a statement by the person working with the code that they
>> are "legally okay to contribute this and happy for it to go into QEMU".
>> So it is totally normal for work that goes through several trees before
>> being merged to have multiple sign-offs. When the maintainer creates
>> their pull request they will add theirs as well.
>>
> Got it, thank you.
>
> Fei.
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 00/14] TCG code quality tracking
2023-04-21 16:42 ` [PATCH v11 00/14] TCG code quality tracking Alex Bennée
2023-05-12 8:07 ` Wu, Fei
@ 2023-05-19 1:16 ` Wu, Fei
2023-05-19 22:01 ` Richard Henderson
1 sibling, 1 reply; 27+ messages in thread
From: Wu, Fei @ 2023-05-19 1:16 UTC (permalink / raw)
To: Alex Bennée; +Cc: richard.henderson, qemu-devel
On 4/22/2023 12:42 AM, Alex Bennée wrote:
>
> Fei Wu <fei2.wu@intel.com> writes:
>
>> This patch series were done by Vanderson and Alex originally in 2019, I
>> (Fei Wu) rebased them on latest upstream from:
>> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
>> and send out this review per Alex's request, I will continue to address
>> any future review comments here. As it's been a very long time and there
>> are lots of conflicts during rebase, it's my fault if I introduce any
>> problems during the process.
>
> Hi Fei,
>
> Thanks for picking this up. I can confirm that this applies cleanly to
> master and I have kicked the tyres and things still seem to work. I'm
> not sure if I can provide much review on code I wrote but a few things
> to point out:
>
> - there are a number of CI failures, mainly qatomic on 32 bit guests
> see https://gitlab.com/stsquad/qemu/-/pipelines/844857279/failures
> maybe we just disable time accounting for 32 bit hosts?
>
I sent out v12 series which fixes some CI failures. qatomic is not
touched yet, the current code with CONFIG_PROFILER should have the same
issue, what's the policy of 32 bit guests support on qemu?
Besides time, there are some other counters with uint64_t using qatomic
such as TCGProfile.table_op_count, we might switch to size_t instead?
> - we need a proper solution to the invalidation of TBs so we can
> exclude them from lists (or at least not do the attempt
> translation/fail dance). Alternatively we could page out a copy of
> the TB data to a disk file when we hit a certain hotness? How would
> this interact with the jitperf support already?
>
> - we should add some documentation to the manual so users don't have
> to figure it all out by trail and error at the HMP command line.
>
added one in docs/tb-stats.txt. Some extra bits could be added to
explain the fields of the output.
> - there may be some exit cases missed because I saw some weird TB's
> with very long IR generation times.
>
> TB id:5 | phys:0xb5f21d00 virt:0xffffcf2f17721d00 flags:0x00000051 1 inv/2
> | exec:1889055/0 guest inst cov:1.05%
> | trans:2 ints: g:4 op:32 op_opt:26 spills:0
> | h/g (host bytes / guest insts): 56.000000
> | time to gen at 2.4GHz => code:6723.33(ns) IR:2378539.17(ns)
>
Is it reproducible on your system? I didn't see it on my system, is it
possible the system events cause this?
> - code motion in 9/14 should be folded into the first patch
>
done.
btw, I also added a few comments on v12 series, could you please check
if they make sense?
Thanks,
Fei.
> Even if we can't find a solution for safely dumping TBs I think the
> series without "tb-list" is still an improvement for getting rid of the
> --enable-profiler and making info JIT useful by default.
>
> Richard,
>
> What do you think?
>
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v11 00/14] TCG code quality tracking
2023-05-19 1:16 ` Wu, Fei
@ 2023-05-19 22:01 ` Richard Henderson
0 siblings, 0 replies; 27+ messages in thread
From: Richard Henderson @ 2023-05-19 22:01 UTC (permalink / raw)
To: Wu, Fei, Alex Bennée; +Cc: qemu-devel
On 5/18/23 18:16, Wu, Fei wrote:
> On 4/22/2023 12:42 AM, Alex Bennée wrote:
>>
>> Fei Wu <fei2.wu@intel.com> writes:
>>
>>> This patch series were done by Vanderson and Alex originally in 2019, I
>>> (Fei Wu) rebased them on latest upstream from:
>>> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
>>> and send out this review per Alex's request, I will continue to address
>>> any future review comments here. As it's been a very long time and there
>>> are lots of conflicts during rebase, it's my fault if I introduce any
>>> problems during the process.
>>
>> Hi Fei,
>>
>> Thanks for picking this up. I can confirm that this applies cleanly to
>> master and I have kicked the tyres and things still seem to work. I'm
>> not sure if I can provide much review on code I wrote but a few things
>> to point out:
>>
>> - there are a number of CI failures, mainly qatomic on 32 bit guests
>> see https://gitlab.com/stsquad/qemu/-/pipelines/844857279/failures
>> maybe we just disable time accounting for 32 bit hosts?
>>
> I sent out v12 series which fixes some CI failures. qatomic is not
> touched yet, the current code with CONFIG_PROFILER should have the same
> issue, what's the policy of 32 bit guests support on qemu?
They should work.
> Besides time, there are some other counters with uint64_t using qatomic
> such as TCGProfile.table_op_count, we might switch to size_t instead?
Probably. You probably don't need to represent times as uint64_t (or time64_t), but as
differentials for elapsed time.
We could accumulate into 'float' if you were concerned about overflowing 2^32 units. This
is statistics after all; we don't really need exact numbers, we need magnitude and 2-3
digits of precision.
r~
^ permalink raw reply [flat|nested] 27+ messages in thread