* [PATCH v2 00/14] TCG Plugin inline operation enhancement
@ 2024-01-18 3:23 Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index Pierrick Bouvier
` (14 more replies)
0 siblings, 15 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
This series adds a new thread-safe API for declaring inline operations
inside plugins. It also removes the existing non-thread-safe API and
migrates all existing plugins to the new one.
Tested on Linux (user, system) for i386, x86_64 and aarch64.
To give some context, this is part of a long-term effort around plugins,
with the goal of doing basic operations in a more performant and
accurate way. This will mean adding more inline operations and
conditional callbacks.
One end goal of this work is a plugin implementing the icount=auto
feature, allowing QEMU to run at a given "frequency" based on the number
of instructions executed, without QEMU itself needing to keep track of
this.
Another end goal is to detect control-flow changes in an efficient and
elegant way, by combining inline operations and conditional callbacks.
v2
--
Implement a scoreboard API (per-vcpu local storage), so plugins don't
have to deal with how many vcpus are used.
Since the plugins have been modified again, I didn't carry any
Reviewed-by tags over to those commits.
Branches:
base: upstream/staging
topic: plugin_inline_per_vcpu
Pierrick Bouvier (14):
plugins: implement inline operation relative to cpu_index
plugins: scoreboard API
docs/devel: plugins can trigger a tb flush
plugins: add inline operation per vcpu
tests/plugin: add test plugin for inline operations
tests/plugin/mem: migrate to new per_vcpu API
tests/plugin/insn: migrate to new per_vcpu API
tests/plugin/bb: migrate to new per_vcpu API
contrib/plugins/hotblocks: migrate to new per_vcpu API
contrib/plugins/howvec: migrate to new per_vcpu API
plugins: remove non per_vcpu inline operation from API
plugins: register inline op with a qemu_plugin_u64_t
MAINTAINERS: Add myself as reviewer for TCG Plugins
contrib/plugins/execlog: fix new warnings
MAINTAINERS | 1 +
accel/tcg/plugin-gen.c | 64 +++++++++--
contrib/plugins/execlog.c | 6 +-
contrib/plugins/hotblocks.c | 46 ++++----
contrib/plugins/howvec.c | 50 ++++++---
docs/devel/multi-thread-tcg.rst | 1 +
include/qemu/plugin.h | 9 ++
include/qemu/qemu-plugin.h | 135 ++++++++++++++++++-----
plugins/api.c | 74 ++++++++++---
plugins/core.c | 112 +++++++++++++++++++-
plugins/plugin.h | 13 ++-
plugins/qemu-plugins.symbols | 9 ++
tests/plugin/bb.c | 63 +++++------
tests/plugin/inline.c | 182 ++++++++++++++++++++++++++++++++
tests/plugin/insn.c | 106 +++++++++----------
tests/plugin/mem.c | 39 ++++---
tests/plugin/meson.build | 2 +-
17 files changed, 704 insertions(+), 208 deletions(-)
create mode 100644 tests/plugin/inline.c
--
2.43.0
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-26 12:07 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 02/14] plugins: scoreboard API Pierrick Bouvier
` (13 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Instead of working on a fixed memory location, allow addressing it based
on cpu_index, an element size and a given offset.
Result address: ptr + offset + cpu_index * element_size.
With this, we can target a member in an array of structs from a base
pointer.
The current semantics are unchanged, so existing inline operations still
always target the same memory location.
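To illustrate, a minimal sketch of the resulting address computation
(the helper name is illustrative and not part of the patch):

#include <stddef.h>
#include <stdint.h>

/* Illustrative only: compute the per-vcpu target of an inline op. */
static uint64_t *inline_op_addr(void *ptr, size_t offset,
                                size_t element_size,
                                unsigned int cpu_index)
{
    char *base = (char *) ptr;
    return (uint64_t *) (base + offset + cpu_index * element_size);
}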
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
accel/tcg/plugin-gen.c | 67 +++++++++++++++++++++++++++++++++++-------
include/qemu/plugin.h | 3 ++
plugins/api.c | 7 +++--
plugins/core.c | 18 +++++++++---
plugins/plugin.h | 6 ++--
5 files changed, 81 insertions(+), 20 deletions(-)
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index b37ce7683e6..1a2375d7779 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -132,16 +132,28 @@ static void gen_empty_udata_cb_no_rwg(void)
*/
static void gen_empty_inline_cb(void)
{
+ TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
+ TCGv_ptr cpu_index_as_ptr = tcg_temp_ebb_new_ptr();
TCGv_i64 val = tcg_temp_ebb_new_i64();
TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
+ tcg_gen_ld_i32(cpu_index, tcg_env,
+ -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
+ /* pass an immediate != 0 so that it doesn't get optimized away */
+ tcg_gen_muli_i32(cpu_index, cpu_index, 0xdeadbeef);
+ tcg_gen_ext_i32_ptr(cpu_index_as_ptr, cpu_index);
+
tcg_gen_movi_ptr(ptr, 0);
+ tcg_gen_add_ptr(ptr, ptr, cpu_index_as_ptr);
tcg_gen_ld_i64(val, ptr, 0);
/* pass an immediate != 0 so that it doesn't get optimized away */
tcg_gen_addi_i64(val, val, 0xdeadface);
+
tcg_gen_st_i64(val, ptr, 0);
tcg_temp_free_ptr(ptr);
tcg_temp_free_i64(val);
+ tcg_temp_free_ptr(cpu_index_as_ptr);
+ tcg_temp_free_i32(cpu_index);
}
static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)
@@ -289,12 +301,37 @@ static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
return op;
}
+static TCGOp *copy_ld_i32(TCGOp **begin_op, TCGOp *op)
+{
+ return copy_op(begin_op, op, INDEX_op_ld_i32);
+}
+
+static TCGOp *copy_ext_i32_ptr(TCGOp **begin_op, TCGOp *op)
+{
+ if (UINTPTR_MAX == UINT32_MAX) {
+ op = copy_op(begin_op, op, INDEX_op_mov_i32);
+ } else {
+ op = copy_op(begin_op, op, INDEX_op_ext_i32_i64);
+ }
+ return op;
+}
+
+static TCGOp *copy_add_ptr(TCGOp **begin_op, TCGOp *op)
+{
+ if (UINTPTR_MAX == UINT32_MAX) {
+ op = copy_op(begin_op, op, INDEX_op_add_i32);
+ } else {
+ op = copy_op(begin_op, op, INDEX_op_add_i64);
+ }
+ return op;
+}
+
static TCGOp *copy_ld_i64(TCGOp **begin_op, TCGOp *op)
{
if (TCG_TARGET_REG_BITS == 32) {
/* 2x ld_i32 */
- op = copy_op(begin_op, op, INDEX_op_ld_i32);
- op = copy_op(begin_op, op, INDEX_op_ld_i32);
+ op = copy_ld_i32(begin_op, op);
+ op = copy_ld_i32(begin_op, op);
} else {
/* ld_i64 */
op = copy_op(begin_op, op, INDEX_op_ld_i64);
@@ -330,6 +367,13 @@ static TCGOp *copy_add_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
return op;
}
+static TCGOp *copy_mul_i32(TCGOp **begin_op, TCGOp *op, uint32_t v)
+{
+ op = copy_op(begin_op, op, INDEX_op_mul_i32);
+ op->args[2] = tcgv_i32_arg(tcg_constant_i32(v));
+ return op;
+}
+
static TCGOp *copy_st_ptr(TCGOp **begin_op, TCGOp *op)
{
if (UINTPTR_MAX == UINT32_MAX) {
@@ -395,18 +439,19 @@ static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
TCGOp *begin_op, TCGOp *op,
int *unused)
{
- /* const_ptr */
- op = copy_const_ptr(&begin_op, op, cb->userp);
-
- /* ld_i64 */
+ char *ptr = cb->userp;
+ if (!cb->inline_direct_ptr) {
+ /* dereference userp once to get access to memory location */
+ ptr = *(char **)cb->userp;
+ }
+ op = copy_ld_i32(&begin_op, op);
+ op = copy_mul_i32(&begin_op, op, cb->inline_element_size);
+ op = copy_ext_i32_ptr(&begin_op, op);
+ op = copy_const_ptr(&begin_op, op, ptr + cb->inline_offset);
+ op = copy_add_ptr(&begin_op, op);
op = copy_ld_i64(&begin_op, op);
-
- /* add_i64 */
op = copy_add_i64(&begin_op, op, cb->inline_insn.imm);
-
- /* st_i64 */
op = copy_st_i64(&begin_op, op);
-
return op;
}
diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index b0c5ac68293..9346249145d 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -86,6 +86,9 @@ enum plugin_dyn_cb_subtype {
struct qemu_plugin_dyn_cb {
union qemu_plugin_cb_sig f;
void *userp;
+ size_t inline_offset;
+ size_t inline_element_size;
+ bool inline_direct_ptr;
enum plugin_dyn_cb_subtype type;
/* @rw applies to mem callbacks only (both regular and inline) */
enum qemu_plugin_mem_rw rw;
diff --git a/plugins/api.c b/plugins/api.c
index 8d5cca53295..e777eb4e9fc 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -106,7 +106,8 @@ void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
void *ptr, uint64_t imm)
{
if (!tb->mem_only) {
- plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
+ plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
+ 0, op, ptr, 0, sizeof(uint64_t), true, imm);
}
}
@@ -131,7 +132,7 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
{
if (!insn->mem_only) {
plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
- 0, op, ptr, imm);
+ 0, op, ptr, 0, sizeof(uint64_t), true, imm);
}
}
@@ -156,7 +157,7 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
uint64_t imm)
{
plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
- rw, op, ptr, imm);
+ rw, op, ptr, 0, sizeof(uint64_t), true, imm);
}
void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
diff --git a/plugins/core.c b/plugins/core.c
index 49588285dd0..e07b9cdf229 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -280,13 +280,18 @@ static struct qemu_plugin_dyn_cb *plugin_get_dyn_cb(GArray **arr)
void plugin_register_inline_op(GArray **arr,
enum qemu_plugin_mem_rw rw,
- enum qemu_plugin_op op, void *ptr,
+ enum qemu_plugin_op op,
+ void *ptr, size_t offset, size_t element_size,
+ bool direct_ptr,
uint64_t imm)
{
struct qemu_plugin_dyn_cb *dyn_cb;
dyn_cb = plugin_get_dyn_cb(arr);
dyn_cb->userp = ptr;
+ dyn_cb->inline_element_size = element_size;
+ dyn_cb->inline_offset = offset;
+ dyn_cb->inline_direct_ptr = direct_ptr;
dyn_cb->type = PLUGIN_CB_INLINE;
dyn_cb->rw = rw;
dyn_cb->inline_insn.op = op;
@@ -431,9 +436,14 @@ void qemu_plugin_flush_cb(void)
plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
}
-void exec_inline_op(struct qemu_plugin_dyn_cb *cb)
+void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index)
{
- uint64_t *val = cb->userp;
+ char *ptr = cb->userp;
+ if (!cb->inline_direct_ptr) {
+ ptr = *(char **) cb->userp;
+ }
+ ptr += cb->inline_offset;
+ uint64_t *val = (uint64_t *)(ptr + cpu_index * cb->inline_element_size);
switch (cb->inline_insn.op) {
case QEMU_PLUGIN_INLINE_ADD_U64:
@@ -466,7 +476,7 @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
vaddr, cb->userp);
break;
case PLUGIN_CB_INLINE:
- exec_inline_op(cb);
+ exec_inline_op(cb, cpu->cpu_index);
break;
default:
g_assert_not_reached();
diff --git a/plugins/plugin.h b/plugins/plugin.h
index 5eb2fdbc85e..2c278379b70 100644
--- a/plugins/plugin.h
+++ b/plugins/plugin.h
@@ -66,7 +66,9 @@ struct qemu_plugin_ctx *plugin_id_to_ctx_locked(qemu_plugin_id_t id);
void plugin_register_inline_op(GArray **arr,
enum qemu_plugin_mem_rw rw,
- enum qemu_plugin_op op, void *ptr,
+ enum qemu_plugin_op op,
+ void *ptr, size_t offset, size_t element_size,
+ bool direct_ptr,
uint64_t imm);
void plugin_reset_uninstall(qemu_plugin_id_t id,
@@ -95,6 +97,6 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
enum qemu_plugin_mem_rw rw,
void *udata);
-void exec_inline_op(struct qemu_plugin_dyn_cb *cb);
+void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index);
#endif /* PLUGIN_H */
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 02/14] plugins: scoreboard API
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-26 15:14 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 03/14] docs/devel: plugins can trigger a tb flush Pierrick Bouvier
` (12 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
We introduce per-vcpu local storage (a scoreboard), automatically
managed and extended by QEMU itself. A plugin allocates a scoreboard and
does not have to deal with how many vcpus are launched.
This API will be used by the new inline functions, but callbacks can
benefit from it as well: simple operations can then be performed without
a global lock.
New functions:
- qemu_plugin_scoreboard_free
- qemu_plugin_scoreboard_get
- qemu_plugin_scoreboard_new
- qemu_plugin_scoreboard_size
In addition, we define qemu_plugin_u64_t, a simple struct holding a
pointer to a scoreboard and a given offset.
This allows a scoreboard to contain structs, without having to pass the
offset to every operation on a specific field.
Since most plugins simply collect a sum of per-cpu values,
qemu_plugin_u64_t supports this operation directly as well.
New functions:
- qemu_plugin_u64_get
- qemu_plugin_u64_sum
New macros:
- qemu_plugin_u64
- qemu_plugin_u64_struct
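A minimal usage sketch, assuming the API above (the struct, variable and
callback names are illustrative and mirror the test plugins later in
this series):

#include <inttypes.h>
#include <glib.h>
#include <qemu-plugin.h>

typedef struct {
    uint64_t insn_count;
} CPUCount;

static struct qemu_plugin_scoreboard *counts;
static qemu_plugin_u64_t insn_count;

static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
{
    /* each vcpu only writes its own entry, so no lock is needed */
    (*qemu_plugin_u64_get(insn_count, cpu_index))++;
}

static void plugin_exit(qemu_plugin_id_t id, void *p)
{
    /* aggregate over all vcpus once execution is done */
    g_autoptr(GString) out = g_string_new("");
    g_string_printf(out, "insns: %" PRIu64 "\n",
                    qemu_plugin_u64_sum(insn_count));
    qemu_plugin_outs(out->str);
    qemu_plugin_scoreboard_free(counts);
}

static void plugin_init(void) /* called from qemu_plugin_install() */
{
    counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
    insn_count = qemu_plugin_u64_struct(counts, CPUCount, insn_count);
}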
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
include/qemu/plugin.h | 7 +++
include/qemu/qemu-plugin.h | 75 ++++++++++++++++++++++++++++
plugins/api.c | 39 +++++++++++++++
plugins/core.c | 97 ++++++++++++++++++++++++++++++++++++
plugins/plugin.h | 8 +++
plugins/qemu-plugins.symbols | 6 +++
6 files changed, 232 insertions(+)
diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index 9346249145d..5f340192e56 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -115,6 +115,13 @@ struct qemu_plugin_insn {
bool mem_only;
};
+/* A scoreboard is an array of values, indexed by vcpu_index */
+struct qemu_plugin_scoreboard {
+ GArray *data;
+ size_t size;
+ size_t element_size;
+};
+
/*
* qemu_plugin_insn allocate and cleanup functions. We don't expect to
* cleanup many of these structures. They are reused for each fresh
diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 2c1930e7e45..934059d64c2 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -220,6 +220,23 @@ void qemu_plugin_register_vcpu_resume_cb(qemu_plugin_id_t id,
struct qemu_plugin_tb;
/** struct qemu_plugin_insn - Opaque handle for a translated instruction */
struct qemu_plugin_insn;
+/**
+ * struct qemu_plugin_scoreboard - Opaque handle for a scoreboard
+ *
+ * A scoreboard is an array of data, indexed by vcpu_index.
+ **/
+struct qemu_plugin_scoreboard;
+
+/**
+ * qemu_plugin_u64_t - uint64_t member of an entry in a scoreboard
+ *
+ * This type gives access to a specific uint64_t member in a given entry,
+ * located at a specified offset. Inline operations expect this as their entry.
+ */
+typedef struct qemu_plugin_u64 {
+ struct qemu_plugin_scoreboard *score;
+ size_t offset;
+} qemu_plugin_u64_t;
/**
* enum qemu_plugin_cb_flags - type of callback
@@ -754,5 +771,63 @@ int qemu_plugin_read_register(unsigned int vcpu,
struct qemu_plugin_register *handle,
GByteArray *buf);
+/**
+ * qemu_plugin_scoreboard_new() - alloc a new scoreboard
+ *
+ * Returns a pointer to a new scoreboard. It must be freed using
+ * qemu_plugin_scoreboard_free.
+ */
+QEMU_PLUGIN_API
+struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size);
+
+/**
+ * qemu_plugin_scoreboard_free() - free a scoreboard
+ * @score: scoreboard to free
+ */
+QEMU_PLUGIN_API
+void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
+
+/**
+ * qemu_plugin_scoreboard_size() - return size of a scoreboard
+ * @score: scoreboard to query
+ */
+QEMU_PLUGIN_API
+size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard *score);
+
+/**
+ * qemu_plugin_scoreboard_get() - access value from a scoreboard
+ * @score: scoreboard to query
+ * @vcpu_index: entry index
+ *
+ * Returns the address of the scoreboard entry matching a given vcpu_index.
+ * This address may change if the scoreboard is later resized.
+ */
+QEMU_PLUGIN_API
+void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
+ unsigned int vcpu_index);
+
+/* Macros to define a qemu_plugin_u64_t */
+#define qemu_plugin_u64(score) \
+ (qemu_plugin_u64_t){score, 0}
+#define qemu_plugin_u64_struct(score, type, member) \
+ (qemu_plugin_u64_t){score, offsetof(type, member)}
+
+/**
+ * qemu_plugin_u64_get() - access specific uint64_t in a scoreboard
+ * @entry: entry to query
+ * @vcpu_index: entry index
+ *
+ * Returns address of a specific member in a scoreboard entry, matching a given
+ * vcpu_index.
+ */
+QEMU_PLUGIN_API
+uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry, unsigned int vcpu_index);
+
+/**
+ * qemu_plugin_u64_sum() - return sum of all values in a scoreboard
+ * @entry: entry to sum
+ */
+QEMU_PLUGIN_API
+uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry);
#endif /* QEMU_QEMU_PLUGIN_H */
diff --git a/plugins/api.c b/plugins/api.c
index e777eb4e9fc..4de94e798c6 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -547,3 +547,42 @@ static void __attribute__((__constructor__)) qemu_api_init(void)
qemu_mutex_init(®_handle_lock);
}
+
+struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size)
+{
+ return plugin_scoreboard_new(element_size);
+}
+
+void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
+{
+ plugin_scoreboard_free(score);
+}
+
+size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard *score)
+{
+ return score->size;
+}
+
+void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
+ unsigned int vcpu_index)
+{
+ g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
+ char *ptr = score->data->data;
+ return ptr + vcpu_index * score->element_size;
+}
+
+uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry,
+ unsigned int vcpu_index)
+{
+ char *ptr = (char *) qemu_plugin_scoreboard_get(entry.score, vcpu_index);
+ return (uint64_t *)(ptr + entry.offset);
+}
+
+uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry)
+{
+ uint64_t total = 0;
+ for (int i = 0; i < qemu_plugin_scoreboard_size(entry.score); ++i) {
+ total += *qemu_plugin_u64_get(entry, i);
+ }
+ return total;
+}
diff --git a/plugins/core.c b/plugins/core.c
index e07b9cdf229..0286a127810 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -209,6 +209,71 @@ plugin_register_cb_udata(qemu_plugin_id_t id, enum qemu_plugin_event ev,
do_plugin_register_cb(id, ev, func, udata);
}
+struct resize_scoreboard_args {
+ size_t new_alloc_size;
+ size_t new_size;
+};
+
+static void plugin_resize_one_scoreboard(gpointer key,
+ gpointer value,
+ gpointer user_data)
+{
+ struct qemu_plugin_scoreboard *score =
+ (struct qemu_plugin_scoreboard *) value;
+ struct resize_scoreboard_args *args =
+ (struct resize_scoreboard_args *) user_data;
+ if (score->data->len != args->new_alloc_size) {
+ g_array_set_size(score->data, args->new_alloc_size);
+ }
+ score->size = args->new_size;
+}
+
+static void plugin_grow_scoreboards__locked(CPUState *cpu)
+{
+ if (cpu->cpu_index < plugin.scoreboard_size) {
+ return;
+ }
+
+ bool need_realloc = FALSE;
+ while (cpu->cpu_index >= plugin.scoreboard_alloc_size) {
+ plugin.scoreboard_alloc_size *= 2;
+ need_realloc = TRUE;
+ }
+ plugin.scoreboard_size = cpu->cpu_index + 1;
+ g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
+
+ if (g_hash_table_size(plugin.scoreboards) == 0) {
+ /* nothing to do, we just updated sizes for future scoreboards */
+ return;
+ }
+
+ if (need_realloc) {
+#ifdef CONFIG_USER_ONLY
+ /**
+ * cpus must be stopped, as some tb might still use an existing
+ * scoreboard.
+ */
+ start_exclusive();
+#endif
+ }
+
+ struct resize_scoreboard_args args = {
+ .new_alloc_size = plugin.scoreboard_alloc_size,
+ .new_size = plugin.scoreboard_size
+ };
+ g_hash_table_foreach(plugin.scoreboards,
+ &plugin_resize_one_scoreboard,
+ &args);
+
+ if (need_realloc) {
+ /* force all tb to be flushed, as scoreboard pointers were changed. */
+ tb_flush(cpu);
+#ifdef CONFIG_USER_ONLY
+ end_exclusive();
+#endif
+ }
+}
+
void qemu_plugin_vcpu_init_hook(CPUState *cpu)
{
bool success;
@@ -218,6 +283,7 @@ void qemu_plugin_vcpu_init_hook(CPUState *cpu)
success = g_hash_table_insert(plugin.cpu_ht, &cpu->cpu_index,
&cpu->cpu_index);
g_assert(success);
+ plugin_grow_scoreboards__locked(cpu);
qemu_rec_mutex_unlock(&plugin.lock);
plugin_vcpu_cb__simple(cpu, QEMU_PLUGIN_EV_VCPU_INIT);
@@ -576,8 +642,39 @@ static void __attribute__((__constructor__)) plugin_init(void)
qemu_rec_mutex_init(&plugin.lock);
plugin.id_ht = g_hash_table_new(g_int64_hash, g_int64_equal);
plugin.cpu_ht = g_hash_table_new(g_int_hash, g_int_equal);
+ plugin.scoreboards = g_hash_table_new(g_int64_hash, g_int64_equal);
+ plugin.scoreboard_size = 1;
+ plugin.scoreboard_alloc_size = 16; /* avoid frequent reallocation */
QTAILQ_INIT(&plugin.ctxs);
qht_init(&plugin.dyn_cb_arr_ht, plugin_dyn_cb_arr_cmp, 16,
QHT_MODE_AUTO_RESIZE);
atexit(qemu_plugin_atexit_cb);
}
+
+struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size)
+{
+ struct qemu_plugin_scoreboard *score =
+ g_malloc0(sizeof(struct qemu_plugin_scoreboard));
+ score->data = g_array_new(FALSE, TRUE, element_size);
+ g_array_set_size(score->data, plugin.scoreboard_alloc_size);
+ score->size = plugin.scoreboard_size;
+ score->element_size = element_size;
+
+ qemu_rec_mutex_lock(&plugin.lock);
+ bool inserted = g_hash_table_insert(plugin.scoreboards, score, score);
+ g_assert(inserted);
+ qemu_rec_mutex_unlock(&plugin.lock);
+
+ return score;
+}
+
+void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
+{
+ qemu_rec_mutex_lock(&plugin.lock);
+ bool removed = g_hash_table_remove(plugin.scoreboards, score);
+ g_assert(removed);
+ qemu_rec_mutex_unlock(&plugin.lock);
+
+ g_array_free(score->data, TRUE);
+ g_free(score);
+}
diff --git a/plugins/plugin.h b/plugins/plugin.h
index 2c278379b70..99829c40886 100644
--- a/plugins/plugin.h
+++ b/plugins/plugin.h
@@ -31,6 +31,10 @@ struct qemu_plugin_state {
* but with the HT we avoid adding a field to CPUState.
*/
GHashTable *cpu_ht;
+ /* Scoreboards, indexed by their addresses. */
+ GHashTable *scoreboards;
+ size_t scoreboard_size;
+ size_t scoreboard_alloc_size;
DECLARE_BITMAP(mask, QEMU_PLUGIN_EV_MAX);
/*
* @lock protects the struct as well as ctx->uninstalling.
@@ -99,4 +103,8 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index);
+struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size);
+
+void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
+
#endif /* PLUGIN_H */
diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
index 6963585c1ea..93866d1b5f2 100644
--- a/plugins/qemu-plugins.symbols
+++ b/plugins/qemu-plugins.symbols
@@ -38,10 +38,16 @@
qemu_plugin_register_vcpu_tb_exec_inline;
qemu_plugin_register_vcpu_tb_trans_cb;
qemu_plugin_reset;
+ qemu_plugin_scoreboard_free;
+ qemu_plugin_scoreboard_get;
+ qemu_plugin_scoreboard_new;
+ qemu_plugin_scoreboard_size;
qemu_plugin_start_code;
qemu_plugin_tb_get_insn;
qemu_plugin_tb_n_insns;
qemu_plugin_tb_vaddr;
+ qemu_plugin_u64_get;
+ qemu_plugin_u64_sum;
qemu_plugin_uninstall;
qemu_plugin_vcpu_for_each;
};
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 03/14] docs/devel: plugins can trigger a tb flush
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 02/14] plugins: scoreboard API Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 04/14] plugins: add inline operation per vcpu Pierrick Bouvier
` (11 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
This happens when scoreboards need to be reallocated.
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
docs/devel/multi-thread-tcg.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/docs/devel/multi-thread-tcg.rst b/docs/devel/multi-thread-tcg.rst
index 7302c3bf534..1420789fff3 100644
--- a/docs/devel/multi-thread-tcg.rst
+++ b/docs/devel/multi-thread-tcg.rst
@@ -109,6 +109,7 @@ including:
- debugging operations (breakpoint insertion/removal)
- some CPU helper functions
- linux-user spawning its first thread
+ - operations related to TCG Plugins
This is done with the async_safe_run_on_cpu() mechanism to ensure all
vCPUs are quiescent when changes are being made to shared global
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 04/14] plugins: add inline operation per vcpu
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (2 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 03/14] docs/devel: plugins can trigger a tb flush Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-26 15:17 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 05/14] tests/plugin: add test plugin for inline operations Pierrick Bouvier
` (10 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Extend the API with three new functions:
qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline_per_vcpu().
These functions take a qemu_plugin_u64_t as input.
This provides a thread-safe and type-safe version of inline
operations.
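A short registration sketch (assuming bb_count, insn_count and mem_count
are qemu_plugin_u64_t entries created as in the scoreboard patch; names
are illustrative):

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    size_t n_insns = qemu_plugin_tb_n_insns(tb);

    /* bump this vcpu's tb counter each time the block executes */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64, bb_count, 1);

    for (size_t i = 0; i < n_insns; i++) {
        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
        /* per-vcpu instruction count */
        qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
            insn, QEMU_PLUGIN_INLINE_ADD_U64, insn_count, 1);
        /* per-vcpu memory access count, reads and writes */
        qemu_plugin_register_vcpu_mem_inline_per_vcpu(
            insn, QEMU_PLUGIN_MEM_RW, QEMU_PLUGIN_INLINE_ADD_U64,
            mem_count, 1);
    }
}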
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
include/qemu/qemu-plugin.h | 51 +++++++++++++++++++++++++++++++++++-
plugins/api.c | 43 +++++++++++++++++++++++++++++-
plugins/qemu-plugins.symbols | 3 +++
3 files changed, 95 insertions(+), 2 deletions(-)
diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 934059d64c2..55f918db1b0 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -330,6 +330,22 @@ void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
enum qemu_plugin_op op,
void *ptr, uint64_t imm);
+/**
+ * qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu() - execution inline op
+ * @tb: the opaque qemu_plugin_tb handle for the translation
+ * @op: the type of qemu_plugin_op (e.g. ADD_U64)
+ * @entry: entry to run op
+ * @imm: the op data (e.g. 1)
+ *
+ * Insert an inline op on a given scoreboard entry.
+ */
+QEMU_PLUGIN_API
+void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+ struct qemu_plugin_tb *tb,
+ enum qemu_plugin_op op,
+ qemu_plugin_u64_t entry,
+ uint64_t imm);
+
/**
* qemu_plugin_register_vcpu_insn_exec_cb() - register insn execution cb
* @insn: the opaque qemu_plugin_insn handle for an instruction
@@ -360,6 +376,22 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
enum qemu_plugin_op op,
void *ptr, uint64_t imm);
+/**
+ * qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu() - insn exec inline op
+ * @insn: the opaque qemu_plugin_insn handle for an instruction
+ * @op: the type of qemu_plugin_op (e.g. ADD_U64)
+ * @entry: entry to run op
+ * @imm: the op data (e.g. 1)
+ *
+ * Insert an inline op to run every time an instruction executes.
+ */
+QEMU_PLUGIN_API
+void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+ struct qemu_plugin_insn *insn,
+ enum qemu_plugin_op op,
+ qemu_plugin_u64_t entry,
+ uint64_t imm);
+
/**
* qemu_plugin_tb_n_insns() - query helper for number of insns in TB
* @tb: opaque handle to TB passed to callback
@@ -585,7 +617,24 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
enum qemu_plugin_op op, void *ptr,
uint64_t imm);
-
+/**
+ * qemu_plugin_register_vcpu_mem_inline_per_vcpu() - inline op for mem access
+ * @insn: handle for instruction to instrument
+ * @rw: apply to reads, writes or both
+ * @op: the op, of type qemu_plugin_op
+ * @entry: entry to run op
+ * @imm: immediate data for @op
+ *
+ * This registers an inline op for every memory access generated by the
+ * instruction.
+ */
+QEMU_PLUGIN_API
+void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+ struct qemu_plugin_insn *insn,
+ enum qemu_plugin_mem_rw rw,
+ enum qemu_plugin_op op,
+ qemu_plugin_u64_t entry,
+ uint64_t imm);
typedef void
(*qemu_plugin_vcpu_syscall_cb_t)(qemu_plugin_id_t id, unsigned int vcpu_index,
diff --git a/plugins/api.c b/plugins/api.c
index 4de94e798c6..132d5e0bec1 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -111,6 +111,20 @@ void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
}
}
+void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+ struct qemu_plugin_tb *tb,
+ enum qemu_plugin_op op,
+ qemu_plugin_u64_t entry,
+ uint64_t imm)
+{
+ if (!tb->mem_only) {
+ plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
+ 0, op, entry.score->data,
+ entry.offset, entry.score->element_size,
+ false, imm);
+ }
+}
+
void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
qemu_plugin_vcpu_udata_cb_t cb,
enum qemu_plugin_cb_flags flags,
@@ -136,6 +150,20 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
}
}
+void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+ struct qemu_plugin_insn *insn,
+ enum qemu_plugin_op op,
+ qemu_plugin_u64_t entry,
+ uint64_t imm)
+{
+ if (!insn->mem_only) {
+ plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
+ 0, op, entry.score->data,
+ entry.offset, entry.score->element_size,
+ false, imm);
+ }
+}
+
/*
* We always plant memory instrumentation because they don't finalise until
@@ -148,7 +176,7 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
void *udata)
{
plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
- cb, flags, rw, udata);
+ cb, flags, rw, udata);
}
void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
@@ -160,6 +188,19 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
rw, op, ptr, 0, sizeof(uint64_t), true, imm);
}
+void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+ struct qemu_plugin_insn *insn,
+ enum qemu_plugin_mem_rw rw,
+ enum qemu_plugin_op op,
+ qemu_plugin_u64_t entry,
+ uint64_t imm)
+{
+ plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
+ rw, op, entry.score->data,
+ entry.offset, entry.score->element_size,
+ false, imm);
+}
+
void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
qemu_plugin_vcpu_tb_trans_cb_t cb)
{
diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
index 93866d1b5f2..a499cee06d5 100644
--- a/plugins/qemu-plugins.symbols
+++ b/plugins/qemu-plugins.symbols
@@ -29,13 +29,16 @@
qemu_plugin_register_vcpu_init_cb;
qemu_plugin_register_vcpu_insn_exec_cb;
qemu_plugin_register_vcpu_insn_exec_inline;
+ qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu;
qemu_plugin_register_vcpu_mem_cb;
qemu_plugin_register_vcpu_mem_inline;
+ qemu_plugin_register_vcpu_mem_inline_per_vcpu;
qemu_plugin_register_vcpu_resume_cb;
qemu_plugin_register_vcpu_syscall_cb;
qemu_plugin_register_vcpu_syscall_ret_cb;
qemu_plugin_register_vcpu_tb_exec_cb;
qemu_plugin_register_vcpu_tb_exec_inline;
+ qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu;
qemu_plugin_register_vcpu_tb_trans_cb;
qemu_plugin_reset;
qemu_plugin_scoreboard_free;
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 05/14] tests/plugin: add test plugin for inline operations
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (3 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 04/14] plugins: add inline operation per vcpu Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-26 16:05 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 06/14] tests/plugin/mem: migrate to new per_vcpu API Pierrick Bouvier
` (9 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
For now, it simply counts instructions, basic blocks and memory
accesses, and ensures that the inline and callback versions produce the
same result. Later, we'll extend it when new inline operations are
added.
Using the existing plugins to test that everything works is cumbersome,
as different events are handled by different plugins; hence this new
one.
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
tests/plugin/inline.c | 182 +++++++++++++++++++++++++++++++++++++++
tests/plugin/meson.build | 2 +-
2 files changed, 183 insertions(+), 1 deletion(-)
create mode 100644 tests/plugin/inline.c
diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
new file mode 100644
index 00000000000..28d1c3b1e48
--- /dev/null
+++ b/tests/plugin/inline.c
@@ -0,0 +1,182 @@
+/*
+ * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
+ *
+ * Demonstrates and tests usage of inline ops.
+ *
+ * License: GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#include <qemu-plugin.h>
+
+typedef struct {
+ uint64_t count_tb;
+ uint64_t count_tb_inline;
+ uint64_t count_insn;
+ uint64_t count_insn_inline;
+ uint64_t count_mem;
+ uint64_t count_mem_inline;
+} CPUCount;
+
+static struct qemu_plugin_scoreboard *counts;
+static qemu_plugin_u64_t count_tb;
+static qemu_plugin_u64_t count_tb_inline;
+static qemu_plugin_u64_t count_insn;
+static qemu_plugin_u64_t count_insn_inline;
+static qemu_plugin_u64_t count_mem;
+static qemu_plugin_u64_t count_mem_inline;
+
+static uint64_t global_count_tb;
+static uint64_t global_count_insn;
+static uint64_t global_count_mem;
+static unsigned int max_cpu_index;
+static GMutex tb_lock;
+static GMutex insn_lock;
+static GMutex mem_lock;
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+static void stats_insn(void)
+{
+ const uint64_t expected = global_count_insn;
+ const uint64_t per_vcpu = qemu_plugin_u64_sum(count_insn);
+ const uint64_t inl_per_vcpu =
+ qemu_plugin_u64_sum(count_insn_inline);
+ printf("insn: %" PRIu64 "\n", expected);
+ printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
+ printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+ g_assert(expected > 0);
+ g_assert(per_vcpu == expected);
+ g_assert(inl_per_vcpu == expected);
+}
+
+static void stats_tb(void)
+{
+ const uint64_t expected = global_count_tb;
+ const uint64_t per_vcpu = qemu_plugin_u64_sum(count_tb);
+ const uint64_t inl_per_vcpu =
+ qemu_plugin_u64_sum(count_tb_inline);
+ printf("tb: %" PRIu64 "\n", expected);
+ printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
+ printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+ g_assert(expected > 0);
+ g_assert(per_vcpu == expected);
+ g_assert(inl_per_vcpu == expected);
+}
+
+static void stats_mem(void)
+{
+ const uint64_t expected = global_count_mem;
+ const uint64_t per_vcpu = qemu_plugin_u64_sum(count_mem);
+ const uint64_t inl_per_vcpu =
+ qemu_plugin_u64_sum(count_mem_inline);
+ printf("mem: %" PRIu64 "\n", expected);
+ printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
+ printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+ g_assert(expected > 0);
+ g_assert(per_vcpu == expected);
+ g_assert(inl_per_vcpu == expected);
+}
+
+static void plugin_exit(qemu_plugin_id_t id, void *udata)
+{
+ const unsigned int num_cpus = qemu_plugin_scoreboard_size(counts);
+ g_assert(num_cpus == max_cpu_index + 1);
+
+ for (int i = 0; i < num_cpus ; ++i) {
+ const uint64_t tb = *qemu_plugin_u64_get(count_tb, i);
+ const uint64_t tb_inline = *qemu_plugin_u64_get(count_tb_inline, i);
+ const uint64_t insn = *qemu_plugin_u64_get(count_insn, i);
+ const uint64_t insn_inline = *qemu_plugin_u64_get(count_insn_inline, i);
+ const uint64_t mem = *qemu_plugin_u64_get(count_mem, i);
+ const uint64_t mem_inline = *qemu_plugin_u64_get(count_mem_inline, i);
+ printf("cpu %d: tb (%" PRIu64 ", %" PRIu64 ") | "
+ "insn (%" PRIu64 ", %" PRIu64 ") | "
+ "mem (%" PRIu64 ", %" PRIu64 ")"
+ "\n",
+ i, tb, tb_inline, insn, insn_inline, mem, mem_inline);
+ g_assert(tb == tb_inline);
+ g_assert(insn == insn_inline);
+ g_assert(mem == mem_inline);
+ }
+
+ stats_tb();
+ stats_insn();
+ stats_mem();
+
+ qemu_plugin_scoreboard_free(counts);
+}
+
+static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
+{
+ (*qemu_plugin_u64_get(count_tb, cpu_index))++;
+ g_mutex_lock(&tb_lock);
+ max_cpu_index = MAX(max_cpu_index, cpu_index);
+ global_count_tb++;
+ g_mutex_unlock(&tb_lock);
+}
+
+static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
+{
+ (*qemu_plugin_u64_get(count_insn, cpu_index))++;
+ g_mutex_lock(&insn_lock);
+ global_count_insn++;
+ g_mutex_unlock(&insn_lock);
+}
+
+static void vcpu_mem_access(unsigned int cpu_index,
+ qemu_plugin_meminfo_t info,
+ uint64_t vaddr,
+ void *userdata)
+{
+ (*qemu_plugin_u64_get(count_mem, cpu_index))++;
+ g_mutex_lock(&mem_lock);
+ global_count_mem++;
+ g_mutex_unlock(&mem_lock);
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+ qemu_plugin_register_vcpu_tb_exec_cb(
+ tb, vcpu_tb_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
+ qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+ tb, QEMU_PLUGIN_INLINE_ADD_U64, count_tb_inline, 1);
+
+ for (int idx = 0; idx < qemu_plugin_tb_n_insns(tb); ++idx) {
+ struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
+ qemu_plugin_register_vcpu_insn_exec_cb(
+ insn, vcpu_insn_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
+ qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+ insn, QEMU_PLUGIN_INLINE_ADD_U64, count_insn_inline, 1);
+ qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
+ QEMU_PLUGIN_CB_NO_REGS,
+ QEMU_PLUGIN_MEM_RW, 0);
+ qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+ insn, QEMU_PLUGIN_MEM_RW,
+ QEMU_PLUGIN_INLINE_ADD_U64,
+ count_mem_inline, 1);
+ }
+}
+
+QEMU_PLUGIN_EXPORT
+int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
+ int argc, char **argv)
+{
+ counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
+ count_tb = qemu_plugin_u64_struct(counts, CPUCount, count_tb);
+ count_insn = qemu_plugin_u64_struct(counts, CPUCount, count_insn);
+ count_mem = qemu_plugin_u64_struct(counts, CPUCount, count_mem);
+ count_tb_inline = qemu_plugin_u64_struct(counts, CPUCount, count_tb_inline);
+ count_insn_inline =
+ qemu_plugin_u64_struct(counts, CPUCount, count_insn_inline);
+ count_mem_inline =
+ qemu_plugin_u64_struct(counts, CPUCount, count_mem_inline);
+ qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+ qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
+
+ return 0;
+}
diff --git a/tests/plugin/meson.build b/tests/plugin/meson.build
index e18183aaeda..9eece5bab51 100644
--- a/tests/plugin/meson.build
+++ b/tests/plugin/meson.build
@@ -1,6 +1,6 @@
t = []
if get_option('plugins')
- foreach i : ['bb', 'empty', 'insn', 'mem', 'syscall']
+ foreach i : ['bb', 'empty', 'inline', 'insn', 'mem', 'syscall']
if host_os == 'windows'
t += shared_module(i, files(i + '.c') + '../../contrib/plugins/win32_linker.c',
include_directories: '../../include/qemu',
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 06/14] tests/plugin/mem: migrate to new per_vcpu API
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (4 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 05/14] tests/plugin: add test plugin for inline operations Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 07/14] tests/plugin/insn: " Pierrick Bouvier
` (8 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
tests/plugin/mem.c | 39 ++++++++++++++++++++++++---------------
1 file changed, 24 insertions(+), 15 deletions(-)
diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
index 44e91065ba7..83b7fa1f187 100644
--- a/tests/plugin/mem.c
+++ b/tests/plugin/mem.c
@@ -16,9 +16,14 @@
QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
-static uint64_t inline_mem_count;
-static uint64_t cb_mem_count;
-static uint64_t io_count;
+typedef struct {
+ uint64_t mem_count;
+ uint64_t io_count;
+} CPUCount;
+
+static struct qemu_plugin_scoreboard *counts;
+static qemu_plugin_u64_t mem_count;
+static qemu_plugin_u64_t io_count;
static bool do_inline, do_callback;
static bool do_haddr;
static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
@@ -27,16 +32,16 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
{
g_autoptr(GString) out = g_string_new("");
- if (do_inline) {
- g_string_printf(out, "inline mem accesses: %" PRIu64 "\n", inline_mem_count);
- }
- if (do_callback) {
- g_string_append_printf(out, "callback mem accesses: %" PRIu64 "\n", cb_mem_count);
+ if (do_inline || do_callback) {
+ g_string_printf(out, "mem accesses: %" PRIu64 "\n",
+ qemu_plugin_u64_sum(mem_count));
}
if (do_haddr) {
- g_string_append_printf(out, "io accesses: %" PRIu64 "\n", io_count);
+ g_string_append_printf(out, "io accesses: %" PRIu64 "\n",
+ qemu_plugin_u64_sum(io_count));
}
qemu_plugin_outs(out->str);
+ qemu_plugin_scoreboard_free(counts);
}
static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
@@ -46,12 +51,12 @@ static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
struct qemu_plugin_hwaddr *hwaddr;
hwaddr = qemu_plugin_get_hwaddr(meminfo, vaddr);
if (qemu_plugin_hwaddr_is_io(hwaddr)) {
- io_count++;
+ (*qemu_plugin_u64_get(io_count, cpu_index))++;
} else {
- cb_mem_count++;
+ (*qemu_plugin_u64_get(mem_count, cpu_index))++;
}
} else {
- cb_mem_count++;
+ (*qemu_plugin_u64_get(mem_count, cpu_index))++;
}
}
@@ -64,9 +69,10 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
if (do_inline) {
- qemu_plugin_register_vcpu_mem_inline(insn, rw,
- QEMU_PLUGIN_INLINE_ADD_U64,
- &inline_mem_count, 1);
+ qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+ insn, rw,
+ QEMU_PLUGIN_INLINE_ADD_U64,
+ mem_count, 1);
}
if (do_callback) {
qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
@@ -117,6 +123,9 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
}
}
+ counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
+ mem_count = qemu_plugin_u64_struct(counts, CPUCount, mem_count);
+ io_count = qemu_plugin_u64_struct(counts, CPUCount, io_count);
qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
return 0;
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 07/14] tests/plugin/insn: migrate to new per_vcpu API
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (5 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 06/14] tests/plugin/mem: migrate to new per_vcpu API Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 08/14] tests/plugin/bb: " Pierrick Bouvier
` (7 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
tests/plugin/insn.c | 106 +++++++++++++++++++++-----------------------
1 file changed, 50 insertions(+), 56 deletions(-)
diff --git a/tests/plugin/insn.c b/tests/plugin/insn.c
index 5fd3017c2b3..6fff51121d4 100644
--- a/tests/plugin/insn.c
+++ b/tests/plugin/insn.c
@@ -16,25 +16,21 @@
QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
-#define MAX_CPUS 8 /* lets not go nuts */
-
-typedef struct {
- uint64_t insn_count;
-} InstructionCount;
-
-static InstructionCount counts[MAX_CPUS];
-static uint64_t inline_insn_count;
+static qemu_plugin_u64_t insn_count;
static bool do_inline;
static bool do_size;
static GArray *sizes;
+typedef struct {
+ uint64_t hits;
+ uint64_t last_hit;
+ uint64_t total_delta;
+} MatchCount;
+
typedef struct {
char *match_string;
- uint64_t hits[MAX_CPUS];
- uint64_t last_hit[MAX_CPUS];
- uint64_t total_delta[MAX_CPUS];
- GPtrArray *history[MAX_CPUS];
+ struct qemu_plugin_scoreboard *counts; /* MatchCount */
} Match;
static GArray *matches;
@@ -48,41 +44,40 @@ typedef struct {
static void vcpu_insn_exec_before(unsigned int cpu_index, void *udata)
{
- unsigned int i = cpu_index % MAX_CPUS;
- InstructionCount *c = &counts[i];
-
- c->insn_count++;
+ (*qemu_plugin_u64_get(insn_count, cpu_index))++;
}
static void vcpu_insn_matched_exec_before(unsigned int cpu_index, void *udata)
{
- unsigned int i = cpu_index % MAX_CPUS;
Instruction *insn = (Instruction *) udata;
- Match *match = insn->match;
+ Match *insn_match = insn->match;
+ MatchCount *match = qemu_plugin_scoreboard_get(insn_match->counts,
+ cpu_index);
+
g_autoptr(GString) ts = g_string_new("");
insn->hits++;
g_string_append_printf(ts, "0x%" PRIx64 ", '%s', %"PRId64 " hits",
insn->vaddr, insn->disas, insn->hits);
- uint64_t icount = counts[i].insn_count;
- uint64_t delta = icount - match->last_hit[i];
+ uint64_t icount = *qemu_plugin_u64_get(insn_count, cpu_index);
+ uint64_t delta = icount - match->last_hit;
- match->hits[i]++;
- match->total_delta[i] += delta;
+ match->hits++;
+ match->total_delta += delta;
g_string_append_printf(ts,
- ", %"PRId64" match hits, "
- "Δ+%"PRId64 " since last match,"
+ " , cpu %u,"
+ " %"PRId64" match hits,"
+ " Δ+%"PRId64 " since last match,"
" %"PRId64 " avg insns/match\n",
- match->hits[i], delta,
- match->total_delta[i] / match->hits[i]);
+ cpu_index,
+ match->hits, delta,
+ match->total_delta / match->hits);
- match->last_hit[i] = icount;
+ match->last_hit = icount;
qemu_plugin_outs(ts->str);
-
- g_ptr_array_add(match->history[i], insn);
}
static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
@@ -94,8 +89,8 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
if (do_inline) {
- qemu_plugin_register_vcpu_insn_exec_inline(
- insn, QEMU_PLUGIN_INLINE_ADD_U64, &inline_insn_count, 1);
+ qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+ insn, QEMU_PLUGIN_INLINE_ADD_U64, insn_count, 1);
} else {
uint64_t vaddr = qemu_plugin_insn_vaddr(insn);
qemu_plugin_register_vcpu_insn_exec_cb(
@@ -117,10 +112,9 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
* information about the instruction which we also need to
* save if there is a hit.
*/
- if (matches) {
+ if (matches->len) {
char *insn_disas = qemu_plugin_insn_disas(insn);
- int j;
- for (j = 0; j < matches->len; j++) {
+ for (int j = 0; j < matches->len; j++) {
Match *m = &g_array_index(matches, Match, j);
if (g_str_has_prefix(insn_disas, m->match_string)) {
Instruction *rec = g_new0(Instruction, 1);
@@ -150,36 +144,34 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
"len %d bytes: %ld insns\n", i, *cnt);
}
}
- } else if (do_inline) {
- g_string_append_printf(out, "insns: %" PRIu64 "\n", inline_insn_count);
} else {
- uint64_t total_insns = 0;
- for (i = 0; i < MAX_CPUS; i++) {
- InstructionCount *c = &counts[i];
- if (c->insn_count) {
- g_string_append_printf(out, "cpu %d insns: %" PRIu64 "\n",
- i, c->insn_count);
- total_insns += c->insn_count;
- }
+ unsigned int num_cpus = qemu_plugin_scoreboard_size(insn_count.score);
+ for (i = 0; i < num_cpus; i++) {
+ g_string_append_printf(out, "cpu %d insns: %" PRIu64 "\n",
+ i, *qemu_plugin_u64_get(insn_count, i));
}
g_string_append_printf(out, "total insns: %" PRIu64 "\n",
- total_insns);
+ qemu_plugin_u64_sum(insn_count));
}
qemu_plugin_outs(out->str);
+
+ qemu_plugin_scoreboard_free(insn_count.score);
+ for (i = 0; i < matches->len; ++i) {
+ Match *m = &g_array_index(matches, Match, i);
+ g_free(m->match_string);
+ qemu_plugin_scoreboard_free(m->counts);
+ }
+ g_array_free(matches, TRUE);
+ g_array_free(sizes, TRUE);
}
/* Add a match to the array of matches */
static void parse_match(char *match)
{
- Match new_match = { .match_string = match };
- int i;
- for (i = 0; i < MAX_CPUS; i++) {
- new_match.history[i] = g_ptr_array_new();
- }
- if (!matches) {
- matches = g_array_new(false, true, sizeof(Match));
- }
+ Match new_match = {
+ .match_string = g_strdup(match),
+ .counts = qemu_plugin_scoreboard_new(sizeof(MatchCount)) };
g_array_append_val(matches, new_match);
}
@@ -187,6 +179,10 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
const qemu_info_t *info,
int argc, char **argv)
{
+ matches = g_array_new(false, true, sizeof(Match));
+ /* null terminated so 0 is not a special case */
+ sizes = g_array_new(true, true, sizeof(unsigned long));
+
for (int i = 0; i < argc; i++) {
char *opt = argv[i];
g_auto(GStrv) tokens = g_strsplit(opt, "=", 2);
@@ -208,9 +204,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
}
}
- if (do_size) {
- sizes = g_array_new(true, true, sizeof(unsigned long));
- }
+ insn_count = qemu_plugin_u64(qemu_plugin_scoreboard_new(sizeof(uint64_t)));
qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 08/14] tests/plugin/bb: migrate to new per_vcpu API
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (6 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 07/14] tests/plugin/insn: " Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 09/14] contrib/plugins/hotblocks: " Pierrick Bouvier
` (6 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
tests/plugin/bb.c | 63 +++++++++++++++++++----------------------------
1 file changed, 26 insertions(+), 37 deletions(-)
diff --git a/tests/plugin/bb.c b/tests/plugin/bb.c
index df50d1fd3bc..3f0c25883de 100644
--- a/tests/plugin/bb.c
+++ b/tests/plugin/bb.c
@@ -17,49 +17,51 @@
QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
typedef struct {
- GMutex lock;
- int index;
uint64_t bb_count;
uint64_t insn_count;
} CPUCount;
-/* Used by the inline & linux-user counts */
-static bool do_inline;
-static CPUCount inline_count;
+static struct qemu_plugin_scoreboard *counts;
+static qemu_plugin_u64_t bb_count;
+static qemu_plugin_u64_t insn_count;
+static bool do_inline;
/* Dump running CPU total on idle? */
static bool idle_report;
-static GPtrArray *counts;
-static int max_cpus;
-static void gen_one_cpu_report(CPUCount *count, GString *report)
+static void gen_one_cpu_report(CPUCount *count, GString *report,
+ unsigned int cpu_index)
{
if (count->bb_count) {
g_string_append_printf(report, "CPU%d: "
"bb's: %" PRIu64", insns: %" PRIu64 "\n",
- count->index,
+ cpu_index,
count->bb_count, count->insn_count);
}
}
static void plugin_exit(qemu_plugin_id_t id, void *p)
{
+ const unsigned int num_cpus = qemu_plugin_scoreboard_size(counts);
g_autoptr(GString) report = g_string_new("");
- if (do_inline || !max_cpus) {
- g_string_printf(report, "bb's: %" PRIu64", insns: %" PRIu64 "\n",
- inline_count.bb_count, inline_count.insn_count);
- } else {
- g_ptr_array_foreach(counts, (GFunc) gen_one_cpu_report, report);
+ for (int i = 0; i < num_cpus; ++i) {
+ CPUCount *count = qemu_plugin_scoreboard_get(counts, i);
+ gen_one_cpu_report(count, report, i);
}
+ g_string_append_printf(report, "Total: "
+ "bb's: %" PRIu64", insns: %" PRIu64 "\n",
+ qemu_plugin_u64_sum(bb_count),
+ qemu_plugin_u64_sum(insn_count));
qemu_plugin_outs(report->str);
+ qemu_plugin_scoreboard_free(counts);
}
static void vcpu_idle(qemu_plugin_id_t id, unsigned int cpu_index)
{
- CPUCount *count = g_ptr_array_index(counts, cpu_index);
+ CPUCount *count = qemu_plugin_scoreboard_get(counts, cpu_index);
g_autoptr(GString) report = g_string_new("");
- gen_one_cpu_report(count, report);
+ gen_one_cpu_report(count, report, cpu_index);
if (report->len > 0) {
g_string_prepend(report, "Idling ");
@@ -69,14 +71,11 @@ static void vcpu_idle(qemu_plugin_id_t id, unsigned int cpu_index)
static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
{
- CPUCount *count = max_cpus ?
- g_ptr_array_index(counts, cpu_index) : &inline_count;
+ CPUCount *count = qemu_plugin_scoreboard_get(counts, cpu_index);
uintptr_t n_insns = (uintptr_t)udata;
- g_mutex_lock(&count->lock);
count->insn_count += n_insns;
count->bb_count++;
- g_mutex_unlock(&count->lock);
}
static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
@@ -84,11 +83,10 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
size_t n_insns = qemu_plugin_tb_n_insns(tb);
if (do_inline) {
- qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
- &inline_count.bb_count, 1);
- qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
- &inline_count.insn_count,
- n_insns);
+ qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+ tb, QEMU_PLUGIN_INLINE_ADD_U64, bb_count, 1);
+ qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+ tb, QEMU_PLUGIN_INLINE_ADD_U64, insn_count, n_insns);
} else {
qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
QEMU_PLUGIN_CB_NO_REGS,
@@ -121,18 +119,9 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
}
}
- if (info->system_emulation && !do_inline) {
- max_cpus = info->system.max_vcpus;
- counts = g_ptr_array_new();
- for (i = 0; i < max_cpus; i++) {
- CPUCount *count = g_new0(CPUCount, 1);
- g_mutex_init(&count->lock);
- count->index = i;
- g_ptr_array_add(counts, count);
- }
- } else if (!do_inline) {
- g_mutex_init(&inline_count.lock);
- }
+ counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
+ bb_count = qemu_plugin_u64_struct(counts, CPUCount, bb_count);
+ insn_count = qemu_plugin_u64_struct(counts, CPUCount, insn_count);
if (idle_report) {
qemu_plugin_register_vcpu_idle_cb(id, vcpu_idle);
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 09/14] contrib/plugins/hotblocks: migrate to new per_vcpu API
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (7 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 08/14] tests/plugin/bb: " Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 10/14] contrib/plugins/howvec: " Pierrick Bouvier
` (5 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
contrib/plugins/hotblocks.c | 46 +++++++++++++++++++++----------------
1 file changed, 26 insertions(+), 20 deletions(-)
diff --git a/contrib/plugins/hotblocks.c b/contrib/plugins/hotblocks.c
index 4de1b134944..fa1e2f02178 100644
--- a/contrib/plugins/hotblocks.c
+++ b/contrib/plugins/hotblocks.c
@@ -34,8 +34,8 @@ static guint64 limit = 20;
*/
typedef struct {
uint64_t start_addr;
- uint64_t exec_count;
- int trans_count;
+ struct qemu_plugin_scoreboard *exec_count;
+ int trans_count;
unsigned long insns;
} ExecCount;
@@ -43,7 +43,15 @@ static gint cmp_exec_count(gconstpointer a, gconstpointer b)
{
ExecCount *ea = (ExecCount *) a;
ExecCount *eb = (ExecCount *) b;
- return ea->exec_count > eb->exec_count ? -1 : 1;
+ uint64_t count_a = qemu_plugin_u64_sum(qemu_plugin_u64(ea->exec_count));
+ uint64_t count_b = qemu_plugin_u64_sum(qemu_plugin_u64(eb->exec_count));
+ return count_a > count_b ? -1 : 1;
+}
+
+static void exec_count_free(gpointer key, gpointer value, gpointer user_data)
+{
+ ExecCount *cnt = value;
+ qemu_plugin_scoreboard_free(cnt->exec_count);
}
static void plugin_exit(qemu_plugin_id_t id, void *p)
@@ -52,7 +60,6 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
GList *counts, *it;
int i;
- g_mutex_lock(&lock);
g_string_append_printf(report, "%d entries in the hash table\n",
g_hash_table_size(hotblocks));
counts = g_hash_table_get_values(hotblocks);
@@ -63,16 +70,20 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
for (i = 0; i < limit && it->next; i++, it = it->next) {
ExecCount *rec = (ExecCount *) it->data;
- g_string_append_printf(report, "0x%016"PRIx64", %d, %ld, %"PRId64"\n",
- rec->start_addr, rec->trans_count,
- rec->insns, rec->exec_count);
+ g_string_append_printf(
+ report, "0x%016"PRIx64", %d, %ld, %"PRId64"\n",
+ rec->start_addr, rec->trans_count,
+ rec->insns,
+ qemu_plugin_u64_sum(qemu_plugin_u64(rec->exec_count)));
}
g_list_free(it);
}
- g_mutex_unlock(&lock);
qemu_plugin_outs(report->str);
+
+ g_hash_table_foreach(hotblocks, exec_count_free, NULL);
+ g_hash_table_destroy(hotblocks);
}
static void plugin_init(void)
@@ -82,15 +93,8 @@ static void plugin_init(void)
static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
{
- ExecCount *cnt;
- uint64_t hash = (uint64_t) udata;
-
- g_mutex_lock(&lock);
- cnt = (ExecCount *) g_hash_table_lookup(hotblocks, (gconstpointer) hash);
- /* should always succeed */
- g_assert(cnt);
- cnt->exec_count++;
- g_mutex_unlock(&lock);
+ ExecCount *cnt = (ExecCount *)udata;
+ (*qemu_plugin_u64_get(qemu_plugin_u64(cnt->exec_count), cpu_index))++;
}
/*
@@ -114,18 +118,20 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
cnt->start_addr = pc;
cnt->trans_count = 1;
cnt->insns = insns;
+ cnt->exec_count = qemu_plugin_scoreboard_new(sizeof(uint64_t));
g_hash_table_insert(hotblocks, (gpointer) hash, (gpointer) cnt);
}
g_mutex_unlock(&lock);
if (do_inline) {
- qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
- &cnt->exec_count, 1);
+ qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+ tb, QEMU_PLUGIN_INLINE_ADD_U64,
+ qemu_plugin_u64(cnt->exec_count), 1);
} else {
qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
QEMU_PLUGIN_CB_NO_REGS,
- (void *)hash);
+ (void *)cnt);
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 10/14] contrib/plugins/howvec: migrate to new per_vcpu API
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (8 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 09/14] contrib/plugins/hotblocks: " Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 11/14] plugins: remove non per_vcpu inline operation from API Pierrick Bouvier
` (4 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
contrib/plugins/howvec.c | 50 ++++++++++++++++++++++++++++------------
1 file changed, 35 insertions(+), 15 deletions(-)
diff --git a/contrib/plugins/howvec.c b/contrib/plugins/howvec.c
index 644a7856bb2..497b86fcd6b 100644
--- a/contrib/plugins/howvec.c
+++ b/contrib/plugins/howvec.c
@@ -43,13 +43,13 @@ typedef struct {
uint32_t mask;
uint32_t pattern;
CountType what;
- uint64_t count;
+ qemu_plugin_u64_t count;
} InsnClassExecCount;
typedef struct {
char *insn;
uint32_t opcode;
- uint64_t count;
+ qemu_plugin_u64_t count;
InsnClassExecCount *class;
} InsnExecCount;
@@ -159,7 +159,9 @@ static gint cmp_exec_count(gconstpointer a, gconstpointer b)
{
InsnExecCount *ea = (InsnExecCount *) a;
InsnExecCount *eb = (InsnExecCount *) b;
- return ea->count > eb->count ? -1 : 1;
+ uint64_t count_a = qemu_plugin_u64_sum(ea->count);
+ uint64_t count_b = qemu_plugin_u64_sum(eb->count);
+ return count_a > count_b ? -1 : 1;
}
static void free_record(gpointer data)
@@ -167,12 +169,14 @@ static void free_record(gpointer data)
InsnExecCount *rec = (InsnExecCount *) data;
g_free(rec->insn);
g_free(rec);
+ qemu_plugin_scoreboard_free(rec->count.score);
}
static void plugin_exit(qemu_plugin_id_t id, void *p)
{
g_autoptr(GString) report = g_string_new("Instruction Classes:\n");
int i;
+ uint64_t total_count;
GList *counts;
InsnClassExecCount *class = NULL;
@@ -180,11 +184,12 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
class = &class_table[i];
switch (class->what) {
case COUNT_CLASS:
- if (class->count || verbose) {
+ total_count = qemu_plugin_u64_sum(class->count);
+ if (total_count || verbose) {
g_string_append_printf(report,
"Class: %-24s\t(%" PRId64 " hits)\n",
class->class,
- class->count);
+ total_count);
}
break;
case COUNT_INDIVIDUAL:
@@ -212,7 +217,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
"Instr: %-24s\t(%" PRId64 " hits)"
"\t(op=0x%08x/%s)\n",
rec->insn,
- rec->count,
+ qemu_plugin_u64_sum(rec->count),
rec->opcode,
rec->class ?
rec->class->class : "un-categorised");
@@ -221,6 +226,12 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
}
g_hash_table_destroy(insns);
+ for (i = 0; i < ARRAY_SIZE(class_tables); i++) {
+ for (int j = 0; j < class_tables[i].table_sz; ++j) {
+ qemu_plugin_scoreboard_free(class_tables[i].table[j].count.score);
+ }
+ }
+
qemu_plugin_outs(report->str);
}
@@ -232,11 +243,12 @@ static void plugin_init(void)
static void vcpu_insn_exec_before(unsigned int cpu_index, void *udata)
{
- uint64_t *count = (uint64_t *) udata;
- (*count)++;
+ struct qemu_plugin_scoreboard *score = udata;
+ (*qemu_plugin_u64_get(qemu_plugin_u64(score), cpu_index))++;
}
-static uint64_t *find_counter(struct qemu_plugin_insn *insn)
+static struct qemu_plugin_scoreboard *find_counter(
+ struct qemu_plugin_insn *insn)
{
int i;
uint64_t *cnt = NULL;
@@ -265,7 +277,7 @@ static uint64_t *find_counter(struct qemu_plugin_insn *insn)
case COUNT_NONE:
return NULL;
case COUNT_CLASS:
- return &class->count;
+ return class->count.score;
case COUNT_INDIVIDUAL:
{
InsnExecCount *icount;
@@ -279,13 +291,15 @@ static uint64_t *find_counter(struct qemu_plugin_insn *insn)
icount->opcode = opcode;
icount->insn = qemu_plugin_insn_disas(insn);
icount->class = class;
+ icount->count =
+ qemu_plugin_u64(qemu_plugin_scoreboard_new(sizeof(uint64_t)));
g_hash_table_insert(insns, GUINT_TO_POINTER(opcode),
(gpointer) icount);
}
g_mutex_unlock(&lock);
- return &icount->count;
+ return icount->count.score;
}
default:
g_assert_not_reached();
@@ -300,14 +314,13 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
size_t i;
for (i = 0; i < n; i++) {
- uint64_t *cnt;
struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
- cnt = find_counter(insn);
+ struct qemu_plugin_scoreboard *cnt = find_counter(insn);
if (cnt) {
if (do_inline) {
- qemu_plugin_register_vcpu_insn_exec_inline(
- insn, QEMU_PLUGIN_INLINE_ADD_U64, cnt, 1);
+ qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+ insn, QEMU_PLUGIN_INLINE_ADD_U64, qemu_plugin_u64(cnt), 1);
} else {
qemu_plugin_register_vcpu_insn_exec_cb(
insn, vcpu_insn_exec_before, QEMU_PLUGIN_CB_NO_REGS, cnt);
@@ -322,6 +335,13 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
{
int i;
+ for (i = 0; i < ARRAY_SIZE(class_tables); i++) {
+ for (int j = 0; j < class_tables[i].table_sz; ++j) {
+ class_tables[i].table[j].count =
+ qemu_plugin_u64(qemu_plugin_scoreboard_new(sizeof(uint64_t)));
+ }
+ }
+
/* Select a class table appropriate to the guest architecture */
for (i = 0; i < ARRAY_SIZE(class_tables); i++) {
ClassSelector *entry = &class_tables[i];
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 11/14] plugins: remove non per_vcpu inline operation from API
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (9 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 10/14] contrib/plugins/howvec: " Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-26 16:26 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 12/14] plugins: register inline op with a qemu_plugin_u64_t Pierrick Bouvier
` (3 subsequent siblings)
14 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Now that we have a thread-safe equivalent of the inline operations, and all
plugins have been migrated to use it, there is no point in keeping the old API.
In addition, removing it will help when we implement more functionality
(conditional callbacks), as we can then assume that every inline operation
works on a scoreboard.
Bump the API version, as this is a breaking change for existing plugins.
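For illustration, a minimal sketch of the replacement call from a plugin's
point of view, assuming a scoreboard holding one plain uint64_t per vCPU (the
same pattern as the hotblocks migration earlier in this series; the names are
only examples):

    static struct qemu_plugin_scoreboard *counts;

    /* in qemu_plugin_install(): one uint64_t slot per vCPU */
    counts = qemu_plugin_scoreboard_new(sizeof(uint64_t));

    static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        /* each vCPU adds to its own slot, so no lock or atomic is needed */
        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
            tb, QEMU_PLUGIN_INLINE_ADD_U64, qemu_plugin_u64(counts), 1);
    }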
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
include/qemu/qemu-plugin.h | 59 ++++----------------------------------
plugins/api.c | 29 -------------------
2 files changed, 6 insertions(+), 82 deletions(-)
diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 55f918db1b0..3ee514f79cf 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -51,11 +51,16 @@ typedef uint64_t qemu_plugin_id_t;
*
* The plugins export the API they were built against by exposing the
* symbol qemu_plugin_version which can be checked.
+ *
+ * Version 2:
+ * Remove qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline.
+ * Those functions are replaced by *_per_vcpu variants, which guarantee
+ * thread-safety for operations.
*/
extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
-#define QEMU_PLUGIN_VERSION 1
+#define QEMU_PLUGIN_VERSION 2
/**
* struct qemu_info_t - system information for plugins
@@ -311,25 +316,6 @@ enum qemu_plugin_op {
QEMU_PLUGIN_INLINE_ADD_U64,
};
-/**
- * qemu_plugin_register_vcpu_tb_exec_inline() - execution inline op
- * @tb: the opaque qemu_plugin_tb handle for the translation
- * @op: the type of qemu_plugin_op (e.g. ADD_U64)
- * @ptr: the target memory location for the op
- * @imm: the op data (e.g. 1)
- *
- * Insert an inline op to every time a translated unit executes.
- * Useful if you just want to increment a single counter somewhere in
- * memory.
- *
- * Note: ops are not atomic so in multi-threaded/multi-smp situations
- * you will get inexact results.
- */
-QEMU_PLUGIN_API
-void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
- enum qemu_plugin_op op,
- void *ptr, uint64_t imm);
-
/**
* qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu() - execution inline op
* @tb: the opaque qemu_plugin_tb handle for the translation
@@ -361,21 +347,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
enum qemu_plugin_cb_flags flags,
void *userdata);
-/**
- * qemu_plugin_register_vcpu_insn_exec_inline() - insn execution inline op
- * @insn: the opaque qemu_plugin_insn handle for an instruction
- * @op: the type of qemu_plugin_op (e.g. ADD_U64)
- * @ptr: the target memory location for the op
- * @imm: the op data (e.g. 1)
- *
- * Insert an inline op to every time an instruction executes. Useful
- * if you just want to increment a single counter somewhere in memory.
- */
-QEMU_PLUGIN_API
-void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
- enum qemu_plugin_op op,
- void *ptr, uint64_t imm);
-
/**
* qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu() - insn exec inline op
* @insn: the opaque qemu_plugin_insn handle for an instruction
@@ -599,24 +570,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
enum qemu_plugin_mem_rw rw,
void *userdata);
-/**
- * qemu_plugin_register_vcpu_mem_inline() - register an inline op to any memory access
- * @insn: handle for instruction to instrument
- * @rw: apply to reads, writes or both
- * @op: the op, of type qemu_plugin_op
- * @ptr: pointer memory for the op
- * @imm: immediate data for @op
- *
- * This registers a inline op every memory access generated by the
- * instruction. This provides for a lightweight but not thread-safe
- * way of counting the number of operations done.
- */
-QEMU_PLUGIN_API
-void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
- enum qemu_plugin_mem_rw rw,
- enum qemu_plugin_op op, void *ptr,
- uint64_t imm);
-
/**
* qemu_plugin_register_vcpu_mem_inline_per_vcpu() - inline op for mem access
* @insn: handle for instruction to instrument
diff --git a/plugins/api.c b/plugins/api.c
index 132d5e0bec1..29915d3c142 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -101,16 +101,6 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
}
}
-void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
- enum qemu_plugin_op op,
- void *ptr, uint64_t imm)
-{
- if (!tb->mem_only) {
- plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
- 0, op, ptr, 0, sizeof(uint64_t), true, imm);
- }
-}
-
void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
struct qemu_plugin_tb *tb,
enum qemu_plugin_op op,
@@ -140,16 +130,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
}
}
-void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
- enum qemu_plugin_op op,
- void *ptr, uint64_t imm)
-{
- if (!insn->mem_only) {
- plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
- 0, op, ptr, 0, sizeof(uint64_t), true, imm);
- }
-}
-
void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
struct qemu_plugin_insn *insn,
enum qemu_plugin_op op,
@@ -179,15 +159,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
cb, flags, rw, udata);
}
-void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
- enum qemu_plugin_mem_rw rw,
- enum qemu_plugin_op op, void *ptr,
- uint64_t imm)
-{
- plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
- rw, op, ptr, 0, sizeof(uint64_t), true, imm);
-}
-
void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
struct qemu_plugin_insn *insn,
enum qemu_plugin_mem_rw rw,
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 12/14] plugins: register inline op with a qemu_plugin_u64_t
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (10 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 11/14] plugins: remove non per_vcpu inline operation from API Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 13/14] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
` (2 subsequent siblings)
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
Now that the previous inline API has been removed, we can clean up the
associated code path.
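As a reminder of the plugin-facing shape this relies on, a qemu_plugin_u64_t
names one uint64_t member inside each per-vCPU scoreboard entry. A sketch
mirroring the bb.c migration earlier in the series (CPUCount and the member
names are just examples):

    typedef struct { uint64_t bb_count; uint64_t insn_count; } CPUCount;

    static struct qemu_plugin_scoreboard *counts;
    static qemu_plugin_u64_t insn_count;

    /* in qemu_plugin_install(): one CPUCount entry per vCPU, plus a handle
       to the insn_count member inside each entry */
    counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
    insn_count = qemu_plugin_u64_struct(counts, CPUCount, insn_count);

    /* at translation time, the handle is registered directly */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64, insn_count, qemu_plugin_tb_n_insns(tb));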
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
accel/tcg/plugin-gen.c | 7 ++-----
include/qemu/plugin.h | 1 -
plugins/api.c | 12 +++---------
plugins/core.c | 17 ++++++-----------
plugins/plugin.h | 3 +--
5 files changed, 12 insertions(+), 28 deletions(-)
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 1a2375d7779..5e4938805e8 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -439,11 +439,8 @@ static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
TCGOp *begin_op, TCGOp *op,
int *unused)
{
- char *ptr = cb->userp;
- if (!cb->inline_direct_ptr) {
- /* dereference userp once to get access to memory location */
- ptr = *(char **)cb->userp;
- }
+ /* always dereference userp for inline operations */
+ char *ptr = *(char **)cb->userp;
op = copy_ld_i32(&begin_op, op);
op = copy_mul_i32(&begin_op, op, cb->inline_element_size);
op = copy_ext_i32_ptr(&begin_op, op);
diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index 5f340192e56..b63631207cd 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -88,7 +88,6 @@ struct qemu_plugin_dyn_cb {
void *userp;
size_t inline_offset;
size_t inline_element_size;
- bool inline_direct_ptr;
enum plugin_dyn_cb_subtype type;
/* @rw applies to mem callbacks only (both regular and inline) */
enum qemu_plugin_mem_rw rw;
diff --git a/plugins/api.c b/plugins/api.c
index 29915d3c142..d12fed6118e 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -109,9 +109,7 @@ void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
{
if (!tb->mem_only) {
plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
- 0, op, entry.score->data,
- entry.offset, entry.score->element_size,
- false, imm);
+ 0, op, entry, imm);
}
}
@@ -138,9 +136,7 @@ void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
{
if (!insn->mem_only) {
plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
- 0, op, entry.score->data,
- entry.offset, entry.score->element_size,
- false, imm);
+ 0, op, entry, imm);
}
}
@@ -167,9 +163,7 @@ void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
uint64_t imm)
{
plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
- rw, op, entry.score->data,
- entry.offset, entry.score->element_size,
- false, imm);
+ rw, op, entry, imm);
}
void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
diff --git a/plugins/core.c b/plugins/core.c
index 0286a127810..46175d557f7 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -347,17 +347,15 @@ static struct qemu_plugin_dyn_cb *plugin_get_dyn_cb(GArray **arr)
void plugin_register_inline_op(GArray **arr,
enum qemu_plugin_mem_rw rw,
enum qemu_plugin_op op,
- void *ptr, size_t offset, size_t element_size,
- bool direct_ptr,
+ qemu_plugin_u64_t entry,
uint64_t imm)
{
struct qemu_plugin_dyn_cb *dyn_cb;
dyn_cb = plugin_get_dyn_cb(arr);
- dyn_cb->userp = ptr;
- dyn_cb->inline_element_size = element_size;
- dyn_cb->inline_offset = offset;
- dyn_cb->inline_direct_ptr = direct_ptr;
+ dyn_cb->userp = entry.score->data;
+ dyn_cb->inline_element_size = entry.score->element_size;
+ dyn_cb->inline_offset = entry.offset;
dyn_cb->type = PLUGIN_CB_INLINE;
dyn_cb->rw = rw;
dyn_cb->inline_insn.op = op;
@@ -504,11 +502,8 @@ void qemu_plugin_flush_cb(void)
void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index)
{
- char *ptr = cb->userp;
- if (!cb->inline_direct_ptr) {
- ptr = *(char **) cb->userp;
- }
- ptr += cb->inline_offset;
+ /* always dereference userp for inline operations */
+ char *ptr = (*(char **) cb->userp) + cb->inline_offset;
uint64_t *val = (uint64_t *)(ptr + cpu_index * cb->inline_element_size);
switch (cb->inline_insn.op) {
diff --git a/plugins/plugin.h b/plugins/plugin.h
index 99829c40886..4c8f2ca3728 100644
--- a/plugins/plugin.h
+++ b/plugins/plugin.h
@@ -71,8 +71,7 @@ struct qemu_plugin_ctx *plugin_id_to_ctx_locked(qemu_plugin_id_t id);
void plugin_register_inline_op(GArray **arr,
enum qemu_plugin_mem_rw rw,
enum qemu_plugin_op op,
- void *ptr, size_t offset, size_t element_size,
- bool direct_ptr,
+ qemu_plugin_u64_t entry,
uint64_t imm);
void plugin_reset_uninstall(qemu_plugin_id_t id,
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 13/14] MAINTAINERS: Add myself as reviewer for TCG Plugins
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (11 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 12/14] plugins: register inline op with a qemu_plugin_u64_t Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-26 16:27 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 14/14] contrib/plugins/execlog: fix new warnings Pierrick Bouvier
2024-01-26 9:21 ` [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
14 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss,
Philippe Mathieu-Daudé
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
MAINTAINERS | 1 +
1 file changed, 1 insertion(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index b406fb20c05..206d813ea5e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3675,6 +3675,7 @@ TCG Plugins
M: Alex Bennée <alex.bennee@linaro.org>
R: Alexandre Iooss <erdnaxe@crans.org>
R: Mahmoud Mandour <ma.mandourr@gmail.com>
+R: Pierrick Bouvier <pierrick.bouvier@linaro.org>
S: Maintained
F: docs/devel/tcg-plugins.rst
F: plugins/
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v2 14/14] contrib/plugins/execlog: fix new warnings
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (12 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 13/14] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
@ 2024-01-18 3:23 ` Pierrick Bouvier
2024-01-26 16:31 ` Alex Bennée
2024-01-26 9:21 ` [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
14 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-18 3:23 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Pierrick Bouvier,
Richard Henderson, Alex Bennée, Alexandre Iooss
This fixes two new warnings:
‘g_pattern_match_string’ is deprecated, use 'g_pattern_spec_match_string'
instead.
passing argument 2 of ‘g_ptr_array_add’ discards ‘const’ qualifier from
pointer target type
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
contrib/plugins/execlog.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/contrib/plugins/execlog.c b/contrib/plugins/execlog.c
index 5a4de1c93be..d12137ce5c0 100644
--- a/contrib/plugins/execlog.c
+++ b/contrib/plugins/execlog.c
@@ -336,8 +336,8 @@ static void registers_init(int vcpu_index)
for (int p = 0; p < rmatches->len; p++) {
g_autoptr(GPatternSpec) pat = g_pattern_spec_new(rmatches->pdata[p]);
g_autofree gchar *rd_lower = g_utf8_strdown(rd->name, -1);
- if (g_pattern_match_string(pat, rd->name) ||
- g_pattern_match_string(pat, rd_lower)) {
+ if (g_pattern_spec_match_string(pat, rd->name) ||
+ g_pattern_spec_match_string(pat, rd_lower)) {
Register *reg = init_vcpu_register(vcpu_index, rd);
g_ptr_array_add(registers, reg);
@@ -345,7 +345,7 @@ static void registers_init(int vcpu_index)
if (disas_assist) {
g_mutex_lock(&add_reg_name_lock);
if (!g_ptr_array_find(all_reg_names, reg->name, NULL)) {
- g_ptr_array_add(all_reg_names, reg->name);
+ g_ptr_array_add(all_reg_names, (gpointer)reg->name);
}
g_mutex_unlock(&add_reg_name_lock);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v2 00/14] TCG Plugin inline operation enhancement
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
` (13 preceding siblings ...)
2024-01-18 3:23 ` [PATCH v2 14/14] contrib/plugins/execlog: fix new warnings Pierrick Bouvier
@ 2024-01-26 9:21 ` Pierrick Bouvier
14 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-26 9:21 UTC (permalink / raw)
To: qemu-devel
Cc: Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alex Bennée, Alexandre Iooss
A polite ping to request review on this series, especially for
scoreboard [02/14] which is the main addition compared to v1.
On 1/18/24 07:23, Pierrick Bouvier wrote:
> This series adds a new thread-safe API to declare inline operation
> inside plugins. As well, it removes the existing non thread-safe API,
> and migrates all existing plugins to use it.
>
> Tested on Linux (user, system) for i386, x86_64 and aarch64.
>
> To give some context, this a long term series of work around plugins,
> with the goal to be able to do basic operations in a more performant and
> accurate way. This will mean to add more inline operations and
> conditional callbacks.
>
> One final target of this work is to implement a plugin that implements
> the icount=auto feature, and allow QEMU to run at a given "frequency"
> based on number of instructions executed, without QEMU needing to keep
> track of this.
>
> Another final target is to be able to detect control flow changes in an
> efficient and elegant way, by combining inline operation and conditional
> callbacks.
>
> v2
> --
>
> Implement scoreboard API (cpu local storage), so plugins don't have to deal
> with how many cpus are used.
>
> Since plugins have been modified again, I didn't transfer any reviewed-by on
> those commits.
>
> Branches:
> base: upstream/staging
> topic: plugin_inline_per_vcpu
>
> Pierrick Bouvier (15):
> [TO_REMOVE] commit support
> plugins: implement inline operation relative to cpu_index
> plugins: scoreboard API
> docs/devel: plugins can trigger a tb flush
> plugins: add inline operation per vcpu
> tests/plugin: add test plugin for inline operations
> tests/plugin/mem: migrate to new per_vcpu API
> tests/plugin/insn: migrate to new per_vcpu API
> tests/plugin/bb: migrate to new per_vcpu API
> contrib/plugins/hotblocks: migrate to new per_vcpu API
> contrib/plugins/howvec: migrate to new per_vcpu API
> plugins: remove non per_vcpu inline operation from API
> plugins: register inline op with a qemu_plugin_u64_t
> MAINTAINERS: Add myself as reviewer for TCG Plugins
> contrib/plugins/execlog: fix new warnings
>
> .gitignore | 1 +
> Dockerfile | 23 +++++
> MAINTAINERS | 1 +
> accel/tcg/plugin-gen.c | 64 +++++++++++---
> build.sh | 18 ++++
> contrib/plugins/execlog.c | 6 +-
> contrib/plugins/hotblocks.c | 46 +++++-----
> contrib/plugins/howvec.c | 50 +++++++----
> docs/devel/multi-thread-tcg.rst | 1 +
> include/qemu/plugin.h | 9 ++
> include/qemu/qemu-plugin.h | 135 ++++++++++++++++++++++-------
> mt.cpp | 44 ++++++++++
> plugins/api.c | 74 ++++++++++++----
> plugins/core.c | 112 +++++++++++++++++++++++--
> plugins/plugin.h | 13 ++-
> plugins/qemu-plugins.symbols | 9 ++
> run.sh | 22 +++++
> test.sh | 34 ++++++++
> tests/plugin/bb.c | 63 ++++++--------
> tests/plugin/inline.c | 182 ++++++++++++++++++++++++++++++++++++++++
> tests/plugin/insn.c | 106 +++++++++++------------
> tests/plugin/mem.c | 39 +++++----
> tests/plugin/meson.build | 2 +-
> 23 files changed, 846 insertions(+), 208 deletions(-)
>
> Pierrick Bouvier (14):
> plugins: implement inline operation relative to cpu_index
> plugins: scoreboard API
> docs/devel: plugins can trigger a tb flush
> plugins: add inline operation per vcpu
> tests/plugin: add test plugin for inline operations
> tests/plugin/mem: migrate to new per_vcpu API
> tests/plugin/insn: migrate to new per_vcpu API
> tests/plugin/bb: migrate to new per_vcpu API
> contrib/plugins/hotblocks: migrate to new per_vcpu API
> contrib/plugins/howvec: migrate to new per_vcpu API
> plugins: remove non per_vcpu inline operation from API
> plugins: register inline op with a qemu_plugin_u64_t
> MAINTAINERS: Add myself as reviewer for TCG Plugins
> contrib/plugins/execlog: fix new warnings
>
> MAINTAINERS | 1 +
> accel/tcg/plugin-gen.c | 64 +++++++++--
> contrib/plugins/execlog.c | 6 +-
> contrib/plugins/hotblocks.c | 46 ++++----
> contrib/plugins/howvec.c | 50 ++++++---
> docs/devel/multi-thread-tcg.rst | 1 +
> include/qemu/plugin.h | 9 ++
> include/qemu/qemu-plugin.h | 135 ++++++++++++++++++-----
> plugins/api.c | 74 ++++++++++---
> plugins/core.c | 112 +++++++++++++++++++-
> plugins/plugin.h | 13 ++-
> plugins/qemu-plugins.symbols | 9 ++
> tests/plugin/bb.c | 63 +++++------
> tests/plugin/inline.c | 182 ++++++++++++++++++++++++++++++++
> tests/plugin/insn.c | 106 +++++++++----------
> tests/plugin/mem.c | 39 ++++---
> tests/plugin/meson.build | 2 +-
> 17 files changed, 704 insertions(+), 208 deletions(-)
> create mode 100644 tests/plugin/inline.c
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index
2024-01-18 3:23 ` [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index Pierrick Bouvier
@ 2024-01-26 12:07 ` Alex Bennée
2024-01-29 10:10 ` Pierrick Bouvier
0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2024-01-26 12:07 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> Instead of working on a fixed memory location, allow to address it based
> on cpu_index, an element size and a given offset.
> Result address: ptr + offset + cpu_index * element_size.
>
> With this, we can target a member in a struct array from a base pointer.
>
> Current semantic is not modified, thus inline operation still targets
> always the same memory location.
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
> accel/tcg/plugin-gen.c | 67 +++++++++++++++++++++++++++++++++++-------
> include/qemu/plugin.h | 3 ++
> plugins/api.c | 7 +++--
> plugins/core.c | 18 +++++++++---
> plugins/plugin.h | 6 ++--
> 5 files changed, 81 insertions(+), 20 deletions(-)
>
> diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
> index b37ce7683e6..1a2375d7779 100644
> --- a/accel/tcg/plugin-gen.c
> +++ b/accel/tcg/plugin-gen.c
> @@ -132,16 +132,28 @@ static void gen_empty_udata_cb_no_rwg(void)
> */
> static void gen_empty_inline_cb(void)
> {
> + TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
> + TCGv_ptr cpu_index_as_ptr = tcg_temp_ebb_new_ptr();
> TCGv_i64 val = tcg_temp_ebb_new_i64();
> TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
>
> + tcg_gen_ld_i32(cpu_index, tcg_env,
> + -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
> + /* pass an immediate != 0 so that it doesn't get optimized away */
> + tcg_gen_muli_i32(cpu_index, cpu_index, 0xdeadbeef);
> + tcg_gen_ext_i32_ptr(cpu_index_as_ptr, cpu_index);
> +
> tcg_gen_movi_ptr(ptr, 0);
> + tcg_gen_add_ptr(ptr, ptr, cpu_index_as_ptr);
> tcg_gen_ld_i64(val, ptr, 0);
> /* pass an immediate != 0 so that it doesn't get optimized away */
> tcg_gen_addi_i64(val, val, 0xdeadface);
> +
> tcg_gen_st_i64(val, ptr, 0);
> tcg_temp_free_ptr(ptr);
> tcg_temp_free_i64(val);
> + tcg_temp_free_ptr(cpu_index_as_ptr);
> + tcg_temp_free_i32(cpu_index);
> }
>
<snip>
> if (UINTPTR_MAX == UINT32_MAX) {
> @@ -395,18 +439,19 @@ static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
> TCGOp *begin_op, TCGOp *op,
> int *unused)
> {
> - /* const_ptr */
> - op = copy_const_ptr(&begin_op, op, cb->userp);
> -
> - /* ld_i64 */
> + char *ptr = cb->userp;
> + if (!cb->inline_direct_ptr) {
> + /* dereference userp once to get access to memory location */
> + ptr = *(char **)cb->userp;
> + }
I'm confused here, probably because inline_direct_ptr gets removed later
on?
> + op = copy_ld_i32(&begin_op, op);
> + op = copy_mul_i32(&begin_op, op, cb->inline_element_size);
> + op = copy_ext_i32_ptr(&begin_op, op);
> + op = copy_const_ptr(&begin_op, op, ptr + cb->inline_offset);
> + op = copy_add_ptr(&begin_op, op);
> op = copy_ld_i64(&begin_op, op);
> -
> - /* add_i64 */
> op = copy_add_i64(&begin_op, op, cb->inline_insn.imm);
> -
> - /* st_i64 */
> op = copy_st_i64(&begin_op, op);
> -
> return op;
> }
>
> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> index b0c5ac68293..9346249145d 100644
> --- a/include/qemu/plugin.h
> +++ b/include/qemu/plugin.h
> @@ -86,6 +86,9 @@ enum plugin_dyn_cb_subtype {
> struct qemu_plugin_dyn_cb {
> union qemu_plugin_cb_sig f;
> void *userp;
> + size_t inline_offset;
> + size_t inline_element_size;
> + bool inline_direct_ptr;
Ahh here it is.
(I seem to recall there is a setting for git to order headers first that
helps with this).
We could do with some documentation here. I can guess the offset and
element size values but what inline_direct_ptr means is not clear.
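For what it is worth, one possible shape for that documentation, based only on
how the fields are used in this patch (a sketch, not text from the series):

    struct qemu_plugin_dyn_cb {
        union qemu_plugin_cb_sig f;
        void *userp;
        /* byte offset of the counter inside one per-vCPU element */
        size_t inline_offset;
        /* stride between two vCPUs' elements */
        size_t inline_element_size;
        /*
         * true: userp is the base pointer of the per-vCPU array itself;
         * false: userp holds the address of that base pointer and is
         * dereferenced once first (the way the later scoreboard-backed
         * variants in this series use it).
         */
        bool inline_direct_ptr;
        /* ... remaining fields unchanged ... */
    };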
> enum plugin_dyn_cb_subtype type;
> /* @rw applies to mem callbacks only (both regular and inline) */
> enum qemu_plugin_mem_rw rw;
> diff --git a/plugins/api.c b/plugins/api.c
> index 8d5cca53295..e777eb4e9fc 100644
> --- a/plugins/api.c
> +++ b/plugins/api.c
> @@ -106,7 +106,8 @@ void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
> void *ptr, uint64_t imm)
> {
> if (!tb->mem_only) {
> - plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
> + plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
> + 0, op, ptr, 0, sizeof(uint64_t), true, imm);
> }
> }
>
> @@ -131,7 +132,7 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
> {
> if (!insn->mem_only) {
> plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> - 0, op, ptr, imm);
> + 0, op, ptr, 0, sizeof(uint64_t), true, imm);
> }
> }
>
> @@ -156,7 +157,7 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> uint64_t imm)
> {
> plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> - rw, op, ptr, imm);
> + rw, op, ptr, 0, sizeof(uint64_t), true, imm);
> }
>
> void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
> diff --git a/plugins/core.c b/plugins/core.c
> index 49588285dd0..e07b9cdf229 100644
> --- a/plugins/core.c
> +++ b/plugins/core.c
> @@ -280,13 +280,18 @@ static struct qemu_plugin_dyn_cb *plugin_get_dyn_cb(GArray **arr)
>
> void plugin_register_inline_op(GArray **arr,
> enum qemu_plugin_mem_rw rw,
> - enum qemu_plugin_op op, void *ptr,
> + enum qemu_plugin_op op,
> + void *ptr, size_t offset, size_t element_size,
> + bool direct_ptr,
> uint64_t imm)
> {
> struct qemu_plugin_dyn_cb *dyn_cb;
>
> dyn_cb = plugin_get_dyn_cb(arr);
> dyn_cb->userp = ptr;
> + dyn_cb->inline_element_size = element_size;
> + dyn_cb->inline_offset = offset;
> + dyn_cb->inline_direct_ptr = direct_ptr;
> dyn_cb->type = PLUGIN_CB_INLINE;
> dyn_cb->rw = rw;
> dyn_cb->inline_insn.op = op;
> @@ -431,9 +436,14 @@ void qemu_plugin_flush_cb(void)
> plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
> }
>
> -void exec_inline_op(struct qemu_plugin_dyn_cb *cb)
> +void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index)
> {
> - uint64_t *val = cb->userp;
> + char *ptr = cb->userp;
> + if (!cb->inline_direct_ptr) {
> + ptr = *(char **) cb->userp;
> + }
> + ptr += cb->inline_offset;
> + uint64_t *val = (uint64_t *)(ptr + cpu_index * cb->inline_element_size);
>
> switch (cb->inline_insn.op) {
> case QEMU_PLUGIN_INLINE_ADD_U64:
> @@ -466,7 +476,7 @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
> vaddr, cb->userp);
> break;
> case PLUGIN_CB_INLINE:
> - exec_inline_op(cb);
> + exec_inline_op(cb, cpu->cpu_index);
> break;
> default:
> g_assert_not_reached();
> diff --git a/plugins/plugin.h b/plugins/plugin.h
> index 5eb2fdbc85e..2c278379b70 100644
> --- a/plugins/plugin.h
> +++ b/plugins/plugin.h
> @@ -66,7 +66,9 @@ struct qemu_plugin_ctx *plugin_id_to_ctx_locked(qemu_plugin_id_t id);
>
> void plugin_register_inline_op(GArray **arr,
> enum qemu_plugin_mem_rw rw,
> - enum qemu_plugin_op op, void *ptr,
> + enum qemu_plugin_op op,
> + void *ptr, size_t offset, size_t element_size,
> + bool direct_ptr,
> uint64_t imm);
>
> void plugin_reset_uninstall(qemu_plugin_id_t id,
> @@ -95,6 +97,6 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
> enum qemu_plugin_mem_rw rw,
> void *udata);
>
> -void exec_inline_op(struct qemu_plugin_dyn_cb *cb);
> +void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index);
>
> #endif /* PLUGIN_H */
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 02/14] plugins: scoreboard API
2024-01-18 3:23 ` [PATCH v2 02/14] plugins: scoreboard API Pierrick Bouvier
@ 2024-01-26 15:14 ` Alex Bennée
2024-01-30 7:37 ` Pierrick Bouvier
2024-01-31 7:44 ` Pierrick Bouvier
0 siblings, 2 replies; 34+ messages in thread
From: Alex Bennée @ 2024-01-26 15:14 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> We introduce a cpu local storage, automatically managed (and extended)
> by QEMU itself. Plugin allocate a scoreboard, and don't have to deal
> with how many cpus are launched.
>
> This API will be used by new inline functions but callbacks can benefit
> from this as well. This way, they can operate without a global lock for
> simple operations.
>
> New functions:
> - qemu_plugin_scoreboard_free
> - qemu_plugin_scoreboard_get
> - qemu_plugin_scoreboard_new
> - qemu_plugin_scoreboard_size
>
> In more, we define a qemu_plugin_u64_t, which is a simple struct holding
> a pointer to a scoreboard, and a given offset.
> This allows to have a scoreboard containing structs, without having to
> bring offset for all operations on a specific field.
>
> Since most of the plugins are simply collecting a sum of per-cpu values,
> qemu_plugin_u64_t directly support this operation as well.
>
> New functions:
> - qemu_plugin_u64_get
> - qemu_plugin_u64_sum
> New macros:
> - qemu_plugin_u64
> - qemu_plugin_u64_struct
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
> include/qemu/plugin.h | 7 +++
> include/qemu/qemu-plugin.h | 75 ++++++++++++++++++++++++++++
> plugins/api.c | 39 +++++++++++++++
> plugins/core.c | 97 ++++++++++++++++++++++++++++++++++++
> plugins/plugin.h | 8 +++
> plugins/qemu-plugins.symbols | 6 +++
> 6 files changed, 232 insertions(+)
>
> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
> index 9346249145d..5f340192e56 100644
> --- a/include/qemu/plugin.h
> +++ b/include/qemu/plugin.h
> @@ -115,6 +115,13 @@ struct qemu_plugin_insn {
> bool mem_only;
> };
>
> +/* A scoreboard is an array of values, indexed by vcpu_index */
> +struct qemu_plugin_scoreboard {
> + GArray *data;
> + size_t size;
Can we get size from GArray->len itself?
> + size_t element_size;
> +};
> +
> /*
> * qemu_plugin_insn allocate and cleanup functions. We don't expect to
> * cleanup many of these structures. They are reused for each fresh
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 2c1930e7e45..934059d64c2 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -220,6 +220,23 @@ void qemu_plugin_register_vcpu_resume_cb(qemu_plugin_id_t id,
> struct qemu_plugin_tb;
> /** struct qemu_plugin_insn - Opaque handle for a translated instruction */
> struct qemu_plugin_insn;
> +/**
> + * struct qemu_plugin_scoreboard - Opaque handle for a scoreboard
> + *
> + * A scoreboard is an array of data, indexed by vcpu_index.
> + **/
stray *'s - I think this is what trips up kdoc.
> +struct qemu_plugin_scoreboard;
> +
> +/**
> + * qemu_plugin_u64_t - uint64_t member of an entry in a scoreboard
We generally reserve lower_case_with_underscores_ending_with_a_t for
scalar types. It's also a little generic. Maybe qemu_plugin_scoreboard_ref?
> + *
> + * This field allows to access a specific uint64_t member in one given entry,
> + * located at a specified offset. Inline operations expect this as entry.
> + */
> +typedef struct qemu_plugin_u64 {
I don't think you need the forward declaration here (which clashes later
on).
> + struct qemu_plugin_scoreboard *score;
> + size_t offset;
> +} qemu_plugin_u64_t;
>
> /**
> * enum qemu_plugin_cb_flags - type of callback
> @@ -754,5 +771,63 @@ int qemu_plugin_read_register(unsigned int vcpu,
> struct qemu_plugin_register *handle,
> GByteArray *buf);
>
> +/**
> + * qemu_plugin_scoreboard_new() - alloc a new scoreboard
> + *
> + * Returns a pointer to a new scoreboard. It must be freed using
> + * qemu_plugin_scoreboard_free.
> + */
> +QEMU_PLUGIN_API
> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size);
> +
> +/**
> + * qemu_plugin_scoreboard_free() - free a scoreboard
> + * @score: scoreboard to free
> + */
> +QEMU_PLUGIN_API
> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
> +
> +/**
> + * qemu_plugin_scoreboard_size() - return size of a scoreboard
> + * @score: scoreboard to query
> + */
> +QEMU_PLUGIN_API
> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard
> *score);
Is this actually host memory size related? The use case in the plugins
seems to be a proxy for num_cpus and I'm not sure we want it used that
way. We are making the contract that it will always be big enough for
the number of vcpus created - maybe that is the helper we need?
> +
> +/**
> + * qemu_plugin_scoreboard_get() - access value from a scoreboard
> + * @score: scoreboard to query
> + * @vcpu_index: entry index
> + *
> + * Returns address of entry of a scoreboard matching a given vcpu_index. This
> + * address can be modified later if scoreboard is resized.
> + */
> +QEMU_PLUGIN_API
> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
> + unsigned int vcpu_index);
> +
> +/* Macros to define a qemu_plugin_u64_t */
> +#define qemu_plugin_u64(score) \
> + (qemu_plugin_u64_t){score, 0}
> +#define qemu_plugin_u64_struct(score, type, member) \
> + (qemu_plugin_u64_t){score, offsetof(type, member)}
qemu_plugin_val_ref and qemu_plugin_struct_ref?
> +
> +/**
> + * qemu_plugin_u64_get() - access specific uint64_t in a scoreboard
> + * @entry: entry to query
> + * @vcpu_index: entry index
> + *
> + * Returns address of a specific member in a scoreboard entry, matching a given
> + * vcpu_index.
> + */
> +QEMU_PLUGIN_API
> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry, unsigned int vcpu_index);
> +
> +/**
> + * qemu_plugin_u64_sum() - return sum of all values in a scoreboard
> + * @entry: entry to sum
> + */
> +QEMU_PLUGIN_API
> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry);
>
> #endif /* QEMU_QEMU_PLUGIN_H */
> diff --git a/plugins/api.c b/plugins/api.c
> index e777eb4e9fc..4de94e798c6 100644
> --- a/plugins/api.c
> +++ b/plugins/api.c
> @@ -547,3 +547,42 @@ static void __attribute__((__constructor__)) qemu_api_init(void)
> qemu_mutex_init(&reg_handle_lock);
>
> }
> +
> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size)
> +{
> + return plugin_scoreboard_new(element_size);
> +}
> +
> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
> +{
> + plugin_scoreboard_free(score);
> +}
> +
> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard *score)
> +{
> + return score->size;
So this would be score->data->len
> +}
> +
> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
> + unsigned int vcpu_index)
> +{
> + g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
> + char *ptr = score->data->data;
> + return ptr + vcpu_index * score->element_size;
GArray has accessor functions for this:
return &g_array_index(score->data,
g_array_get_element_size(score->data),
vcpu_index);
> +}
> +
> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry,
> + unsigned int vcpu_index)
> +{
> + char *ptr = (char *) qemu_plugin_scoreboard_get(entry.score, vcpu_index);
> + return (uint64_t *)(ptr + entry.offset);
Not sure what's going on with the casting here, but I wonder if we are giving
the user too much rope. Why not return the value itself? In fact, why
do we treat things as an offset rather than an index of uint64_t?
The qemu_plugin_scoreboard_get can return uint64_t * and the individual
get becomes:
uint64_t *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
unsigned int vcpu_index)
{
g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
return &g_array_index(score->data, score->element_size, vcpu_index);
}
uint64_t qemu_plugin_u64_get(struct qemu_plugin_scoreboard *score,
unsigned int vcpu_index, unsigned int score_index)
{
g_assert(score_index < score->num_elements);
uint64_t *sb = qemu_plugin_scoreboard_get(score, vcpu_index);
return sb[score_index];
}
> +}
> +
> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry)
> +{
> + uint64_t total = 0;
> + for (int i = 0; i < qemu_plugin_scoreboard_size(entry.score); ++i) {
> + total += *qemu_plugin_u64_get(entry, i);
> + }
> + return total;
> +}
> diff --git a/plugins/core.c b/plugins/core.c
> index e07b9cdf229..0286a127810 100644
> --- a/plugins/core.c
> +++ b/plugins/core.c
> @@ -209,6 +209,71 @@ plugin_register_cb_udata(qemu_plugin_id_t id, enum qemu_plugin_event ev,
> do_plugin_register_cb(id, ev, func, udata);
> }
>
> +struct resize_scoreboard_args {
> + size_t new_alloc_size;
> + size_t new_size;
> +};
> +
> +static void plugin_resize_one_scoreboard(gpointer key,
> + gpointer value,
> + gpointer user_data)
> +{
> + struct qemu_plugin_scoreboard *score =
> + (struct qemu_plugin_scoreboard *) value;
> + struct resize_scoreboard_args *args =
> + (struct resize_scoreboard_args *) user_data;
> + if (score->data->len != args->new_alloc_size) {
> + g_array_set_size(score->data, args->new_alloc_size);
> + }
> + score->size = args->new_size;
I think you have this in len so if you skip the extra field you can just
do:
new_size = GPOINTER_TO_UINT(user_data);
> +}
> +
> +static void plugin_grow_scoreboards__locked(CPUState *cpu)
> +{
> + if (cpu->cpu_index < plugin.scoreboard_size) {
> + return;
> + }
> +
> + bool need_realloc = FALSE;
> + while (cpu->cpu_index >= plugin.scoreboard_alloc_size) {
> + plugin.scoreboard_alloc_size *= 2;
I suspect for USER this makes sense; if we ever end up doing system
expansion then it would just be +1 at a time. In fact, given glib does:
gsize want_alloc = g_nearest_pow (g_array_elt_len (array, want_len));
maybe we just do a simple increment and leave it up to glib to handle the
exponential growth?
> + need_realloc = TRUE;
> + }
> + plugin.scoreboard_size = cpu->cpu_index + 1;
> + g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
> +
> + if (g_hash_table_size(plugin.scoreboards) == 0) {
> + /* nothing to do, we just updated sizes for future scoreboards */
> + return;
> + }
> +
> + if (need_realloc) {
> +#ifdef CONFIG_USER_ONLY
> + /**
> + * cpus must be stopped, as some tb might still use an existing
> + * scoreboard.
> + */
> + start_exclusive();
> +#endif
Hmm, this seems wrong to be USER_ONLY. While we don't expect to resize in
system mode, if we did we would certainly want to do it during exclusive
periods.
> + }
> +
> + struct resize_scoreboard_args args = {
> + .new_alloc_size = plugin.scoreboard_alloc_size,
> + .new_size = plugin.scoreboard_size
> + };
> + g_hash_table_foreach(plugin.scoreboards,
> + &plugin_resize_one_scoreboard,
> + &args);
This can just be g_hash_table_foreach(plugin.scoreboards,
&plugin_resize_one_scoreboard,
GUINT_TO_POINTER(plugin.scoreboard_alloc_size))
> +
> + if (need_realloc) {
> + /* force all tb to be flushed, as scoreboard pointers were changed. */
> + tb_flush(cpu);
> +#ifdef CONFIG_USER_ONLY
> + end_exclusive();
> +#endif
> + }
> +}
> +
> void qemu_plugin_vcpu_init_hook(CPUState *cpu)
> {
> bool success;
> @@ -218,6 +283,7 @@ void qemu_plugin_vcpu_init_hook(CPUState *cpu)
> success = g_hash_table_insert(plugin.cpu_ht, &cpu->cpu_index,
> &cpu->cpu_index);
> g_assert(success);
> + plugin_grow_scoreboards__locked(cpu);
> qemu_rec_mutex_unlock(&plugin.lock);
>
> plugin_vcpu_cb__simple(cpu, QEMU_PLUGIN_EV_VCPU_INIT);
> @@ -576,8 +642,39 @@ static void __attribute__((__constructor__)) plugin_init(void)
> qemu_rec_mutex_init(&plugin.lock);
> plugin.id_ht = g_hash_table_new(g_int64_hash, g_int64_equal);
> plugin.cpu_ht = g_hash_table_new(g_int_hash, g_int_equal);
> + plugin.scoreboards = g_hash_table_new(g_int64_hash, g_int64_equal);
> + plugin.scoreboard_size = 1;
> + plugin.scoreboard_alloc_size = 16; /* avoid frequent reallocation */
> QTAILQ_INIT(&plugin.ctxs);
> qht_init(&plugin.dyn_cb_arr_ht, plugin_dyn_cb_arr_cmp, 16,
> QHT_MODE_AUTO_RESIZE);
> atexit(qemu_plugin_atexit_cb);
> }
> +
> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size)
> +{
> + struct qemu_plugin_scoreboard *score =
> + g_malloc0(sizeof(struct qemu_plugin_scoreboard));
> + score->data = g_array_new(FALSE, TRUE, element_size);
> + g_array_set_size(score->data, plugin.scoreboard_alloc_size);
> + score->size = plugin.scoreboard_size;
> + score->element_size = element_size;
> +
> + qemu_rec_mutex_lock(&plugin.lock);
> + bool inserted = g_hash_table_insert(plugin.scoreboards, score, score);
> + g_assert(inserted);
> + qemu_rec_mutex_unlock(&plugin.lock);
> +
> + return score;
> +}
> +
> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
> +{
> + qemu_rec_mutex_lock(&plugin.lock);
> + bool removed = g_hash_table_remove(plugin.scoreboards, score);
> + g_assert(removed);
> + qemu_rec_mutex_unlock(&plugin.lock);
> +
> + g_array_free(score->data, TRUE);
> + g_free(score);
> +}
> diff --git a/plugins/plugin.h b/plugins/plugin.h
> index 2c278379b70..99829c40886 100644
> --- a/plugins/plugin.h
> +++ b/plugins/plugin.h
> @@ -31,6 +31,10 @@ struct qemu_plugin_state {
> * but with the HT we avoid adding a field to CPUState.
> */
> GHashTable *cpu_ht;
> + /* Scoreboards, indexed by their addresses. */
> + GHashTable *scoreboards;
> + size_t scoreboard_size;
> + size_t scoreboard_alloc_size;
> DECLARE_BITMAP(mask, QEMU_PLUGIN_EV_MAX);
> /*
> * @lock protects the struct as well as ctx->uninstalling.
> @@ -99,4 +103,8 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
>
> void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index);
>
> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size);
> +
> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
> +
> #endif /* PLUGIN_H */
> diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
> index 6963585c1ea..93866d1b5f2 100644
> --- a/plugins/qemu-plugins.symbols
> +++ b/plugins/qemu-plugins.symbols
> @@ -38,10 +38,16 @@
> qemu_plugin_register_vcpu_tb_exec_inline;
> qemu_plugin_register_vcpu_tb_trans_cb;
> qemu_plugin_reset;
> + qemu_plugin_scoreboard_free;
> + qemu_plugin_scoreboard_get;
> + qemu_plugin_scoreboard_new;
> + qemu_plugin_scoreboard_size;
> qemu_plugin_start_code;
> qemu_plugin_tb_get_insn;
> qemu_plugin_tb_n_insns;
> qemu_plugin_tb_vaddr;
> + qemu_plugin_u64_get;
> + qemu_plugin_u64_sum;
> qemu_plugin_uninstall;
> qemu_plugin_vcpu_for_each;
> };
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 04/14] plugins: add inline operation per vcpu
2024-01-18 3:23 ` [PATCH v2 04/14] plugins: add inline operation per vcpu Pierrick Bouvier
@ 2024-01-26 15:17 ` Alex Bennée
0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2024-01-26 15:17 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> Extends API with three new functions:
> qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline_per_vcpu().
>
> Those functions takes a qemu_plugin_u64_t as input.
>
> This allows to have a thread-safe and type-safe version of inline
> operations.
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 05/14] tests/plugin: add test plugin for inline operations
2024-01-18 3:23 ` [PATCH v2 05/14] tests/plugin: add test plugin for inline operations Pierrick Bouvier
@ 2024-01-26 16:05 ` Alex Bennée
2024-01-30 7:49 ` Pierrick Bouvier
0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2024-01-26 16:05 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> For now, it simply performs instruction, bb and mem count, and ensure
> that inline vs callback versions have the same result. Later, we'll
> extend it when new inline operations are added.
>
> Use existing plugins to test everything works is a bit cumbersome, as
> different events are treated in different plugins. Thus, this new one.
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
> tests/plugin/inline.c | 182 +++++++++++++++++++++++++++++++++++++++
> tests/plugin/meson.build | 2 +-
> 2 files changed, 183 insertions(+), 1 deletion(-)
> create mode 100644 tests/plugin/inline.c
>
> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
> new file mode 100644
> index 00000000000..28d1c3b1e48
> --- /dev/null
> +++ b/tests/plugin/inline.c
> @@ -0,0 +1,182 @@
> +/*
> + * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
> + *
> + * Demonstrates and tests usage of inline ops.
> + *
> + * License: GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#include <glib.h>
> +#include <stdint.h>
> +#include <stdio.h>
> +
> +#include <qemu-plugin.h>
> +
> +typedef struct {
> + uint64_t count_tb;
> + uint64_t count_tb_inline;
> + uint64_t count_insn;
> + uint64_t count_insn_inline;
> + uint64_t count_mem;
> + uint64_t count_mem_inline;
> +} CPUCount;
I wonder if there is any way to enforce the structures being an array of
64 bit counts? I do worry the compiler might one day decide to do
something silly but legal, leading to confusion.
I guess qemu_plugin_scoreboard_new could:
g_assert((element_size % sizeof(uint64_t)) == 0)
?
> +static qemu_plugin_u64_t count_tb;
> +static qemu_plugin_u64_t count_tb_inline;
> +static qemu_plugin_u64_t count_insn;
> +static qemu_plugin_u64_t count_insn_inline;
> +static qemu_plugin_u64_t count_mem;
> +static qemu_plugin_u64_t count_mem_inline;
Can't this just be a non-scoreboard instance of CPUCount?
> +
> +static uint64_t global_count_tb;
> +static uint64_t global_count_insn;
> +static uint64_t global_count_mem;
> +static unsigned int max_cpu_index;
> +static GMutex tb_lock;
> +static GMutex insn_lock;
> +static GMutex mem_lock;
> +
> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> +
> +static void stats_insn(void)
> +{
> + const uint64_t expected = global_count_insn;
> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_insn);
> + const uint64_t inl_per_vcpu =
> + qemu_plugin_u64_sum(count_insn_inline);
> + printf("insn: %" PRIu64 "\n", expected);
> + printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
> + printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
> + g_assert(expected > 0);
> + g_assert(per_vcpu == expected);
> + g_assert(inl_per_vcpu == expected);
> +}
> +
> +static void stats_tb(void)
> +{
> + const uint64_t expected = global_count_tb;
> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_tb);
> + const uint64_t inl_per_vcpu =
> + qemu_plugin_u64_sum(count_tb_inline);
> + printf("tb: %" PRIu64 "\n", expected);
> + printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
> + printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
> + g_assert(expected > 0);
> + g_assert(per_vcpu == expected);
> + g_assert(inl_per_vcpu == expected);
> +}
> +
> +static void stats_mem(void)
> +{
> + const uint64_t expected = global_count_mem;
> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_mem);
> + const uint64_t inl_per_vcpu =
> + qemu_plugin_u64_sum(count_mem_inline);
> + printf("mem: %" PRIu64 "\n", expected);
> + printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
> + printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
> + g_assert(expected > 0);
> + g_assert(per_vcpu == expected);
> + g_assert(inl_per_vcpu == expected);
> +}
> +
> +static void plugin_exit(qemu_plugin_id_t id, void *udata)
> +{
> + const unsigned int num_cpus = qemu_plugin_scoreboard_size(counts);
> + g_assert(num_cpus == max_cpu_index + 1);
> +
> + for (int i = 0; i < num_cpus ; ++i) {
> + const uint64_t tb = *qemu_plugin_u64_get(count_tb, i);
> + const uint64_t tb_inline = *qemu_plugin_u64_get(count_tb_inline, i);
> + const uint64_t insn = *qemu_plugin_u64_get(count_insn, i);
> + const uint64_t insn_inline = *qemu_plugin_u64_get(count_insn_inline, i);
> + const uint64_t mem = *qemu_plugin_u64_get(count_mem, i);
> + const uint64_t mem_inline = *qemu_plugin_u64_get(count_mem_inline, i);
> + printf("cpu %d: tb (%" PRIu64 ", %" PRIu64 ") | "
> + "insn (%" PRIu64 ", %" PRIu64 ") | "
> + "mem (%" PRIu64 ", %" PRIu64 ")"
> + "\n",
> + i, tb, tb_inline, insn, insn_inline, mem, mem_inline);
> + g_assert(tb == tb_inline);
> + g_assert(insn == insn_inline);
> + g_assert(mem == mem_inline);
> + }
> +
> + stats_tb();
> + stats_insn();
> + stats_mem();
> +
> + qemu_plugin_scoreboard_free(counts);
> +}
> +
> +static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
> +{
> + (*qemu_plugin_u64_get(count_tb, cpu_index))++;
> + g_mutex_lock(&tb_lock);
> + max_cpu_index = MAX(max_cpu_index, cpu_index);
> + global_count_tb++;
> + g_mutex_unlock(&tb_lock);
> +}
> +
> +static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
> +{
> + (*qemu_plugin_u64_get(count_insn, cpu_index))++;
> + g_mutex_lock(&insn_lock);
> + global_count_insn++;
> + g_mutex_unlock(&insn_lock);
> +}
> +
> +static void vcpu_mem_access(unsigned int cpu_index,
> + qemu_plugin_meminfo_t info,
> + uint64_t vaddr,
> + void *userdata)
> +{
> + (*qemu_plugin_u64_get(count_mem, cpu_index))++;
> + g_mutex_lock(&mem_lock);
> + global_count_mem++;
> + g_mutex_unlock(&mem_lock);
> +}
> +
> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> +{
> + qemu_plugin_register_vcpu_tb_exec_cb(
> + tb, vcpu_tb_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
> + qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
> + tb, QEMU_PLUGIN_INLINE_ADD_U64, count_tb_inline, 1);
> +
> + for (int idx = 0; idx < qemu_plugin_tb_n_insns(tb); ++idx) {
> + struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
> + qemu_plugin_register_vcpu_insn_exec_cb(
> + insn, vcpu_insn_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
> + qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
> + insn, QEMU_PLUGIN_INLINE_ADD_U64, count_insn_inline, 1);
> + qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
> + QEMU_PLUGIN_CB_NO_REGS,
> + QEMU_PLUGIN_MEM_RW, 0);
> + qemu_plugin_register_vcpu_mem_inline_per_vcpu(
> + insn, QEMU_PLUGIN_MEM_RW,
> + QEMU_PLUGIN_INLINE_ADD_U64,
> + count_mem_inline, 1);
> + }
> +}
> +
> +QEMU_PLUGIN_EXPORT
> +int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
> + int argc, char **argv)
> +{
> + counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
> + count_tb = qemu_plugin_u64_struct(counts, CPUCount, count_tb);
> + count_insn = qemu_plugin_u64_struct(counts, CPUCount, count_insn);
> + count_mem = qemu_plugin_u64_struct(counts, CPUCount, count_mem);
> + count_tb_inline = qemu_plugin_u64_struct(counts, CPUCount, count_tb_inline);
> + count_insn_inline =
> + qemu_plugin_u64_struct(counts, CPUCount, count_insn_inline);
> + count_mem_inline =
> + qemu_plugin_u64_struct(counts, CPUCount, count_mem_inline);
> + qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
> + qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
> +
> + return 0;
> +}
> diff --git a/tests/plugin/meson.build b/tests/plugin/meson.build
> index e18183aaeda..9eece5bab51 100644
> --- a/tests/plugin/meson.build
> +++ b/tests/plugin/meson.build
> @@ -1,6 +1,6 @@
> t = []
> if get_option('plugins')
> - foreach i : ['bb', 'empty', 'insn', 'mem', 'syscall']
> + foreach i : ['bb', 'empty', 'inline', 'insn', 'mem', 'syscall']
> if host_os == 'windows'
> t += shared_module(i, files(i + '.c') + '../../contrib/plugins/win32_linker.c',
> include_directories: '../../include/qemu',
Otherwise:
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH v2 11/14] plugins: remove non per_vcpu inline operation from API
2024-01-18 3:23 ` [PATCH v2 11/14] plugins: remove non per_vcpu inline operation from API Pierrick Bouvier
@ 2024-01-26 16:26 ` Alex Bennée
2024-01-30 7:53 ` Pierrick Bouvier
0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2024-01-26 16:26 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> Now that we have a thread-safe equivalent of the inline operations, and
> all plugins have been changed to use it, there is no point in keeping the
> old API.
>
> Moreover, it will help when we implement more functionality (conditional
> callbacks), as we can assume that we operate on a scoreboard.
>
> Bump API version as it's a breaking change for existing plugins.
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
> include/qemu/qemu-plugin.h | 59 ++++----------------------------------
> plugins/api.c | 29 -------------------
> 2 files changed, 6 insertions(+), 82 deletions(-)
>
> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
> index 55f918db1b0..3ee514f79cf 100644
> --- a/include/qemu/qemu-plugin.h
> +++ b/include/qemu/qemu-plugin.h
> @@ -51,11 +51,16 @@ typedef uint64_t qemu_plugin_id_t;
> *
> * The plugins export the API they were built against by exposing the
> * symbol qemu_plugin_version which can be checked.
> + *
> + * Version 2:
> + * Remove qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline.
> + * Those functions are replaced by *_per_vcpu variants, which guarantees
> + * thread-safety for operations.
> */
>
> extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
>
> -#define QEMU_PLUGIN_VERSION 1
> +#define QEMU_PLUGIN_VERSION 2
I think technically adding the new API bumps this, and deprecating the
old version bumps:
QEMU_PLUGIN_MIN_VERSION
to the same.
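i.e. the end state would be something along these lines (just a sketch,
wherever the MIN_VERSION define actually lives):

#define QEMU_PLUGIN_VERSION 2

/* plugins built against the removed inline API can no longer load */
#define QEMU_PLUGIN_MIN_VERSION 2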
>
> /**
> * struct qemu_info_t - system information for plugins
> @@ -311,25 +316,6 @@ enum qemu_plugin_op {
> QEMU_PLUGIN_INLINE_ADD_U64,
> };
>
> -/**
> - * qemu_plugin_register_vcpu_tb_exec_inline() - execution inline op
> - * @tb: the opaque qemu_plugin_tb handle for the translation
> - * @op: the type of qemu_plugin_op (e.g. ADD_U64)
> - * @ptr: the target memory location for the op
> - * @imm: the op data (e.g. 1)
> - *
> - * Insert an inline op to every time a translated unit executes.
> - * Useful if you just want to increment a single counter somewhere in
> - * memory.
> - *
> - * Note: ops are not atomic so in multi-threaded/multi-smp situations
> - * you will get inexact results.
> - */
> -QEMU_PLUGIN_API
> -void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
> - enum qemu_plugin_op op,
> - void *ptr, uint64_t imm);
> -
> /**
> * qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu() - execution inline op
> * @tb: the opaque qemu_plugin_tb handle for the translation
> @@ -361,21 +347,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> enum qemu_plugin_cb_flags flags,
> void *userdata);
>
> -/**
> - * qemu_plugin_register_vcpu_insn_exec_inline() - insn execution inline op
> - * @insn: the opaque qemu_plugin_insn handle for an instruction
> - * @op: the type of qemu_plugin_op (e.g. ADD_U64)
> - * @ptr: the target memory location for the op
> - * @imm: the op data (e.g. 1)
> - *
> - * Insert an inline op to every time an instruction executes. Useful
> - * if you just want to increment a single counter somewhere in memory.
> - */
> -QEMU_PLUGIN_API
> -void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
> - enum qemu_plugin_op op,
> - void *ptr, uint64_t imm);
> -
> /**
> * qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu() - insn exec inline op
> * @insn: the opaque qemu_plugin_insn handle for an instruction
> @@ -599,24 +570,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
> enum qemu_plugin_mem_rw rw,
> void *userdata);
>
> -/**
> - * qemu_plugin_register_vcpu_mem_inline() - register an inline op to any memory access
> - * @insn: handle for instruction to instrument
> - * @rw: apply to reads, writes or both
> - * @op: the op, of type qemu_plugin_op
> - * @ptr: pointer memory for the op
> - * @imm: immediate data for @op
> - *
> - * This registers a inline op every memory access generated by the
> - * instruction. This provides for a lightweight but not thread-safe
> - * way of counting the number of operations done.
> - */
> -QEMU_PLUGIN_API
> -void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> - enum qemu_plugin_mem_rw rw,
> - enum qemu_plugin_op op, void *ptr,
> - uint64_t imm);
> -
> /**
> * qemu_plugin_register_vcpu_mem_inline_per_vcpu() - inline op for mem access
> * @insn: handle for instruction to instrument
> diff --git a/plugins/api.c b/plugins/api.c
> index 132d5e0bec1..29915d3c142 100644
> --- a/plugins/api.c
> +++ b/plugins/api.c
> @@ -101,16 +101,6 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
> }
> }
>
> -void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
> - enum qemu_plugin_op op,
> - void *ptr, uint64_t imm)
> -{
> - if (!tb->mem_only) {
> - plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
> - 0, op, ptr, 0, sizeof(uint64_t), true, imm);
> - }
> -}
> -
> void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
> struct qemu_plugin_tb *tb,
> enum qemu_plugin_op op,
> @@ -140,16 +130,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
> }
> }
>
> -void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
> - enum qemu_plugin_op op,
> - void *ptr, uint64_t imm)
> -{
> - if (!insn->mem_only) {
> - plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
> - 0, op, ptr, 0, sizeof(uint64_t), true, imm);
> - }
> -}
> -
> void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
> struct qemu_plugin_insn *insn,
> enum qemu_plugin_op op,
> @@ -179,15 +159,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
> cb, flags, rw, udata);
> }
>
> -void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
> - enum qemu_plugin_mem_rw rw,
> - enum qemu_plugin_op op, void *ptr,
> - uint64_t imm)
> -{
> - plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
> - rw, op, ptr, 0, sizeof(uint64_t), true, imm);
> -}
> -
> void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
> struct qemu_plugin_insn *insn,
> enum qemu_plugin_mem_rw rw,
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH v2 13/14] MAINTAINERS: Add myself as reviewer for TCG Plugins
2024-01-18 3:23 ` [PATCH v2 13/14] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
@ 2024-01-26 16:27 ` Alex Bennée
0 siblings, 0 replies; 34+ messages in thread
From: Alex Bennée @ 2024-01-26 16:27 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss, Philippe Mathieu-Daudé
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH v2 14/14] contrib/plugins/execlog: fix new warnings
2024-01-18 3:23 ` [PATCH v2 14/14] contrib/plugins/execlog: fix new warnings Pierrick Bouvier
@ 2024-01-26 16:31 ` Alex Bennée
2024-01-30 7:51 ` Pierrick Bouvier
0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2024-01-26 16:31 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> ‘g_pattern_match_string’ is deprecated,
> Use 'g_pattern_spec_match_string' instead.
Unfortunately this isn't enough as we can still build on older glibs:
/* Ask for warnings for anything that was marked deprecated in
* the defined version, or before. It is a candidate for rewrite.
*/
#define GLIB_VERSION_MIN_REQUIRED GLIB_VERSION_2_56
You can do something like:
/*
* g_pattern_match_string has been deprecated in Glib since 2.70 and
* will complain about it if you try to use it. Fortunately the
* signature of both functions is the same making it easy to work
* around.
*/
static inline
gboolean g_pattern_spec_match_string_qemu(GPatternSpec *pspec,
const gchar *string)
{
#if GLIB_CHECK_VERSION(2, 70, 0)
return g_pattern_spec_match_string(pspec, string);
#else
return g_pattern_match_string(pspec, string);
#endif
};
#define g_pattern_spec_match_string(p, s) g_pattern_spec_match_string_qemu(p, s)
in glib-compat.h but I was wondering if it would be valid to add that
dependency to plugins. We might get away with it as it doesn't include
anything from QEMU itself.
>
> passing argument 2 of ‘g_ptr_array_add’ discards ‘const’ qualifier from
> pointer target type
>
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
> contrib/plugins/execlog.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/contrib/plugins/execlog.c b/contrib/plugins/execlog.c
> index 5a4de1c93be..d12137ce5c0 100644
> --- a/contrib/plugins/execlog.c
> +++ b/contrib/plugins/execlog.c
> @@ -336,8 +336,8 @@ static void registers_init(int vcpu_index)
> for (int p = 0; p < rmatches->len; p++) {
> g_autoptr(GPatternSpec) pat = g_pattern_spec_new(rmatches->pdata[p]);
> g_autofree gchar *rd_lower = g_utf8_strdown(rd->name, -1);
> - if (g_pattern_match_string(pat, rd->name) ||
> - g_pattern_match_string(pat, rd_lower)) {
> + if (g_pattern_spec_match_string(pat, rd->name) ||
> + g_pattern_spec_match_string(pat, rd_lower)) {
> Register *reg = init_vcpu_register(vcpu_index, rd);
> g_ptr_array_add(registers, reg);
>
> @@ -345,7 +345,7 @@ static void registers_init(int vcpu_index)
> if (disas_assist) {
> g_mutex_lock(&add_reg_name_lock);
> if (!g_ptr_array_find(all_reg_names, reg->name, NULL)) {
> - g_ptr_array_add(all_reg_names, reg->name);
> + g_ptr_array_add(all_reg_names, (gpointer)reg->name);
> }
> g_mutex_unlock(&add_reg_name_lock);
> }
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
* Re: [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index
2024-01-26 12:07 ` Alex Bennée
@ 2024-01-29 10:10 ` Pierrick Bouvier
0 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-29 10:10 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/26/24 16:07, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> Instead of working on a fixed memory location, allow to address it based
>> on cpu_index, an element size and a given offset.
>> Result address: ptr + offset + cpu_index * element_size.
>>
>> With this, we can target a member in a struct array from a base pointer.
>>
>> Current semantic is not modified, thus inline operation still targets
>> always the same memory location.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> accel/tcg/plugin-gen.c | 67 +++++++++++++++++++++++++++++++++++-------
>> include/qemu/plugin.h | 3 ++
>> plugins/api.c | 7 +++--
>> plugins/core.c | 18 +++++++++---
>> plugins/plugin.h | 6 ++--
>> 5 files changed, 81 insertions(+), 20 deletions(-)
>>
>> diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
>> index b37ce7683e6..1a2375d7779 100644
>> --- a/accel/tcg/plugin-gen.c
>> +++ b/accel/tcg/plugin-gen.c
>> @@ -132,16 +132,28 @@ static void gen_empty_udata_cb_no_rwg(void)
>> */
>> static void gen_empty_inline_cb(void)
>> {
>> + TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
>> + TCGv_ptr cpu_index_as_ptr = tcg_temp_ebb_new_ptr();
>> TCGv_i64 val = tcg_temp_ebb_new_i64();
>> TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
>>
>> + tcg_gen_ld_i32(cpu_index, tcg_env,
>> + -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
>> + /* pass an immediate != 0 so that it doesn't get optimized away */
>> + tcg_gen_muli_i32(cpu_index, cpu_index, 0xdeadbeef);
>> + tcg_gen_ext_i32_ptr(cpu_index_as_ptr, cpu_index);
>> +
>> tcg_gen_movi_ptr(ptr, 0);
>> + tcg_gen_add_ptr(ptr, ptr, cpu_index_as_ptr);
>> tcg_gen_ld_i64(val, ptr, 0);
>> /* pass an immediate != 0 so that it doesn't get optimized away */
>> tcg_gen_addi_i64(val, val, 0xdeadface);
>> +
>> tcg_gen_st_i64(val, ptr, 0);
>> tcg_temp_free_ptr(ptr);
>> tcg_temp_free_i64(val);
>> + tcg_temp_free_ptr(cpu_index_as_ptr);
>> + tcg_temp_free_i32(cpu_index);
>> }
>>
> <snip>
>> if (UINTPTR_MAX == UINT32_MAX) {
>> @@ -395,18 +439,19 @@ static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
>> TCGOp *begin_op, TCGOp *op,
>> int *unused)
>> {
>> - /* const_ptr */
>> - op = copy_const_ptr(&begin_op, op, cb->userp);
>> -
>> - /* ld_i64 */
>> + char *ptr = cb->userp;
>> + if (!cb->inline_direct_ptr) {
>> + /* dereference userp once to get access to memory location */
>> + ptr = *(char **)cb->userp;
>> + }
>
> I'm confused here, probably because inline_direct_ptr gets removed later
> on?
>
Yes, we temporarily need two code paths for this patch series. Otherwise,
plugins would have to be updated at the same time as we make all the
changes, so that nothing breaks.
>> + op = copy_ld_i32(&begin_op, op);
>> + op = copy_mul_i32(&begin_op, op, cb->inline_element_size);
>> + op = copy_ext_i32_ptr(&begin_op, op);
>> + op = copy_const_ptr(&begin_op, op, ptr + cb->inline_offset);
>> + op = copy_add_ptr(&begin_op, op);
>> op = copy_ld_i64(&begin_op, op);
>> -
>> - /* add_i64 */
>> op = copy_add_i64(&begin_op, op, cb->inline_insn.imm);
>> -
>> - /* st_i64 */
>> op = copy_st_i64(&begin_op, op);
>> -
>> return op;
>> }
>>
>> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
>> index b0c5ac68293..9346249145d 100644
>> --- a/include/qemu/plugin.h
>> +++ b/include/qemu/plugin.h
>> @@ -86,6 +86,9 @@ enum plugin_dyn_cb_subtype {
>> struct qemu_plugin_dyn_cb {
>> union qemu_plugin_cb_sig f;
>> void *userp;
>> + size_t inline_offset;
>> + size_t inline_element_size;
>> + bool inline_direct_ptr;
>
> Ahh here it is.
>
> (I seem to recall there is a setting for git to order headers first that
> helps with this).
>
Indeed, found it (thanks):
https://gitlab.com/qemu-project/qemu/-/blob/master/scripts/git.orderfile
> We could do with some documentation here. I can guess the offset and
> element size values but what inline_direct_ptr means is not clear.
>
It represents whether userp is a pointer to the data, or a pointer to a
pointer to the data. Any suggestion for a name that carries this detail?
I have another patch that replaces all this with a qemu_plugin_u64_t (from
scoreboard), which I'll integrate in a v3 and which is much clearer. But
there will still be one commit needed with this.
>> enum plugin_dyn_cb_subtype type;
>> /* @rw applies to mem callbacks only (both regular and inline) */
>> enum qemu_plugin_mem_rw rw;
>> diff --git a/plugins/api.c b/plugins/api.c
>> index 8d5cca53295..e777eb4e9fc 100644
>> --- a/plugins/api.c
>> +++ b/plugins/api.c
>> @@ -106,7 +106,8 @@ void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
>> void *ptr, uint64_t imm)
>> {
>> if (!tb->mem_only) {
>> - plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
>> + plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
>> + 0, op, ptr, 0, sizeof(uint64_t), true, imm);
>> }
>> }
>>
>> @@ -131,7 +132,7 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
>> {
>> if (!insn->mem_only) {
>> plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
>> - 0, op, ptr, imm);
>> + 0, op, ptr, 0, sizeof(uint64_t), true, imm);
>> }
>> }
>>
>> @@ -156,7 +157,7 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>> uint64_t imm)
>> {
>> plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
>> - rw, op, ptr, imm);
>> + rw, op, ptr, 0, sizeof(uint64_t), true, imm);
>> }
>>
>> void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
>> diff --git a/plugins/core.c b/plugins/core.c
>> index 49588285dd0..e07b9cdf229 100644
>> --- a/plugins/core.c
>> +++ b/plugins/core.c
>> @@ -280,13 +280,18 @@ static struct qemu_plugin_dyn_cb *plugin_get_dyn_cb(GArray **arr)
>>
>> void plugin_register_inline_op(GArray **arr,
>> enum qemu_plugin_mem_rw rw,
>> - enum qemu_plugin_op op, void *ptr,
>> + enum qemu_plugin_op op,
>> + void *ptr, size_t offset, size_t element_size,
>> + bool direct_ptr,
>> uint64_t imm)
>> {
>> struct qemu_plugin_dyn_cb *dyn_cb;
>>
>> dyn_cb = plugin_get_dyn_cb(arr);
>> dyn_cb->userp = ptr;
>> + dyn_cb->inline_element_size = element_size;
>> + dyn_cb->inline_offset = offset;
>> + dyn_cb->inline_direct_ptr = direct_ptr;
>> dyn_cb->type = PLUGIN_CB_INLINE;
>> dyn_cb->rw = rw;
>> dyn_cb->inline_insn.op = op;
>> @@ -431,9 +436,14 @@ void qemu_plugin_flush_cb(void)
>> plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
>> }
>>
>> -void exec_inline_op(struct qemu_plugin_dyn_cb *cb)
>> +void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index)
>> {
>> - uint64_t *val = cb->userp;
>> + char *ptr = cb->userp;
>> + if (!cb->inline_direct_ptr) {
>> + ptr = *(char **) cb->userp;
>> + }
>> + ptr += cb->inline_offset;
>> + uint64_t *val = (uint64_t *)(ptr + cpu_index * cb->inline_element_size);
>>
>> switch (cb->inline_insn.op) {
>> case QEMU_PLUGIN_INLINE_ADD_U64:
>> @@ -466,7 +476,7 @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
>> vaddr, cb->userp);
>> break;
>> case PLUGIN_CB_INLINE:
>> - exec_inline_op(cb);
>> + exec_inline_op(cb, cpu->cpu_index);
>> break;
>> default:
>> g_assert_not_reached();
>> diff --git a/plugins/plugin.h b/plugins/plugin.h
>> index 5eb2fdbc85e..2c278379b70 100644
>> --- a/plugins/plugin.h
>> +++ b/plugins/plugin.h
>> @@ -66,7 +66,9 @@ struct qemu_plugin_ctx *plugin_id_to_ctx_locked(qemu_plugin_id_t id);
>>
>> void plugin_register_inline_op(GArray **arr,
>> enum qemu_plugin_mem_rw rw,
>> - enum qemu_plugin_op op, void *ptr,
>> + enum qemu_plugin_op op,
>> + void *ptr, size_t offset, size_t element_size,
>> + bool direct_ptr,
>> uint64_t imm);
>>
>> void plugin_reset_uninstall(qemu_plugin_id_t id,
>> @@ -95,6 +97,6 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
>> enum qemu_plugin_mem_rw rw,
>> void *udata);
>>
>> -void exec_inline_op(struct qemu_plugin_dyn_cb *cb);
>> +void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index);
>>
>> #endif /* PLUGIN_H */
>
* Re: [PATCH v2 02/14] plugins: scoreboard API
2024-01-26 15:14 ` Alex Bennée
@ 2024-01-30 7:37 ` Pierrick Bouvier
2024-01-30 10:23 ` Alex Bennée
2024-01-31 7:44 ` Pierrick Bouvier
1 sibling, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-30 7:37 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/26/24 19:14, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> We introduce a cpu local storage, automatically managed (and extended)
>> by QEMU itself. Plugins allocate a scoreboard, and don't have to deal
>> with how many cpus are launched.
>>
>> This API will be used by new inline functions but callbacks can benefit
>> from this as well. This way, they can operate without a global lock for
>> simple operations.
>>
>> New functions:
>> - qemu_plugin_scoreboard_free
>> - qemu_plugin_scoreboard_get
>> - qemu_plugin_scoreboard_new
>> - qemu_plugin_scoreboard_size
>>
>> Additionally, we define a qemu_plugin_u64_t, which is a simple struct
>> holding a pointer to a scoreboard, and a given offset.
>> This allows having a scoreboard containing structs, without having to
>> pass an offset for every operation on a specific field.
>>
>> Since most of the plugins are simply collecting a sum of per-cpu values,
>> qemu_plugin_u64_t directly supports this operation as well.
>>
>> New functions:
>> - qemu_plugin_u64_get
>> - qemu_plugin_u64_sum
>> New macros:
>> - qemu_plugin_u64
>> - qemu_plugin_u64_struct
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> include/qemu/plugin.h | 7 +++
>> include/qemu/qemu-plugin.h | 75 ++++++++++++++++++++++++++++
>> plugins/api.c | 39 +++++++++++++++
>> plugins/core.c | 97 ++++++++++++++++++++++++++++++++++++
>> plugins/plugin.h | 8 +++
>> plugins/qemu-plugins.symbols | 6 +++
>> 6 files changed, 232 insertions(+)
>>
>> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
>> index 9346249145d..5f340192e56 100644
>> --- a/include/qemu/plugin.h
>> +++ b/include/qemu/plugin.h
>> @@ -115,6 +115,13 @@ struct qemu_plugin_insn {
>> bool mem_only;
>> };
>>
>> +/* A scoreboard is an array of values, indexed by vcpu_index */
>> +struct qemu_plugin_scoreboard {
>> + GArray *data;
>> + size_t size;
>
> Can we get size from GArray->len itself?
>
GArray->len matches the allocated size, which is different from the
"semantic" size. I'll explain why it was implemented this way in a
later comment.
>> + size_t element_size;
>> +};
>> +
>> /*
>> * qemu_plugin_insn allocate and cleanup functions. We don't expect to
>> * cleanup many of these structures. They are reused for each fresh
>> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
>> index 2c1930e7e45..934059d64c2 100644
>> --- a/include/qemu/qemu-plugin.h
>> +++ b/include/qemu/qemu-plugin.h
>> @@ -220,6 +220,23 @@ void qemu_plugin_register_vcpu_resume_cb(qemu_plugin_id_t id,
>> struct qemu_plugin_tb;
>> /** struct qemu_plugin_insn - Opaque handle for a translated instruction */
>> struct qemu_plugin_insn;
>> +/**
>> + * struct qemu_plugin_scoreboard - Opaque handle for a scoreboard
>> + *
>> + * A scoreboard is an array of data, indexed by vcpu_index.
>> + **/
>
> stray *'s - I think this is what trips up kdoc.
>
Thanks, I fixed that in a new version. I now build with documentation
enabled, so it should not happen again, sorry.
>> +struct qemu_plugin_scoreboard;
>> +
>> +/**
>> + * qemu_plugin_u64_t - uint64_t member of an entry in a scoreboard
>
> We generally reserve lower_case_with_underscores_ending_with_a_t for
> scalar types. It's also a little generic. Maybe qemu_plugin_scoreboard_ref?
>
For the _t suffix, I hesitated on this, and followed the qemu_info_t struct
example in the same header.
Initially I picked qemu_plugin_scoreboard_u64, but realized the name was
far too long. qemu_plugin_u64 seemed like the right balance between length
and meaning.
I would like to defend the u64 part of the name, because the type was
explicitly made to express a u64, rather than a generic scoreboard
entry/ref.
Moreover, based on other comments made, maybe it was not clear that a
qemu_plugin_u64 does not necessarily point to a "whole" entry in a
scoreboard; it can also target a (u64) struct member in a given entry
(i.e. scoreboard[i].member). I'll give a more detailed answer below in
this message.
This way, a scoreboard can be composed of struct values, and a
qemu_plugin_u64 lets us work on a specific member of that struct without
having to manipulate offsets everywhere. Hence the get and sum operations
on this type.
Does it make more sense to you given this?
>> + *
>> + * This field allows to access a specific uint64_t member in one given entry,
>> + * located at a specified offset. Inline operations expect this as entry.
>> + */
>> +typedef struct qemu_plugin_u64 {
>
> I don't think you need the forward declaration here (which clashes later
> on).
>
I initially wrote this without a typedef, and left it in afterwards. Will
remove it, thanks!
>> + struct qemu_plugin_scoreboard *score;
>> + size_t offset;
>> +} qemu_plugin_u64_t;
>>
>> /**
>> * enum qemu_plugin_cb_flags - type of callback
>> @@ -754,5 +771,63 @@ int qemu_plugin_read_register(unsigned int vcpu,
>> struct qemu_plugin_register *handle,
>> GByteArray *buf);
>>
>> +/**
>> + * qemu_plugin_scoreboard_new() - alloc a new scoreboard
>> + *
>> + * Returns a pointer to a new scoreboard. It must be freed using
>> + * qemu_plugin_scoreboard_free.
>> + */
>> +QEMU_PLUGIN_API
>> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size);
>> +
>> +/**
>> + * qemu_plugin_scoreboard_free() - free a scoreboard
>> + * @score: scoreboard to free
>> + */
>> +QEMU_PLUGIN_API
>> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
>> +
>> +/**
>> + * qemu_plugin_scoreboard_size() - return size of a scoreboard
>> + * @score: scoreboard to query
>> + */
>> +QEMU_PLUGIN_API
>> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard
>> *score);
>
> Is this actually host memory size related? The use case in the plugins
> seems to be a proxy for num_cpus and I'm not sure we want it used that
> way. We are making the contract that it will always be big enough for
> the number of vcpus created - maybe that is the helper we need?
>
The current API has a limitation in user mode, where you can't know how
many cpus were run overall. The point of the scoreboard was to abstract
this away, removing the need for plugins to track it (and to use locks,
manual memory allocation, and glue code in general).
I thought it was clearer to write:
for (int i = 0; i < qemu_plugin_scoreboard_size(s); ++i) {
void *val = qemu_plugin_scoreboard_get(s, i);
}
than:
for (int i = 0; i < qemu_plugin_max_cpus(s); ++i) {
void *val = qemu_plugin_scoreboard_get(s, i);
}
We can always replace qemu_plugin_scoreboard_size with
qemu_plugin_max_cpus (or another name you prefer), which would return
the maximum number of cpus that were used during the whole execution, if
that works for you.
>> +
>> +/**
>> + * qemu_plugin_scoreboard_get() - access value from a scoreboard
>> + * @score: scoreboard to query
>> + * @vcpu_index: entry index
>> + *
>> + * Returns address of entry of a scoreboard matching a given vcpu_index. This
>> + * address can be modified later if scoreboard is resized.
>> + */
>> +QEMU_PLUGIN_API
>> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
>> + unsigned int vcpu_index);
>> +
>> +/* Macros to define a qemu_plugin_u64_t */
>> +#define qemu_plugin_u64(score) \
>> + (qemu_plugin_u64_t){score, 0}
>> +#define qemu_plugin_u64_struct(score, type, member) \
>> + (qemu_plugin_u64_t){score, offsetof(type, member)}
>
> qemu_plugin_val_ref and qemu_plugin_struct_ref?
>
Same comment as before: it would be clearer to keep the u64 in the name
if possible, so we know which data type we manipulate.
>> +
>> +/**
>> + * qemu_plugin_u64_get() - access specific uint64_t in a scoreboard
>> + * @entry: entry to query
>> + * @vcpu_index: entry index
>> + *
>> + * Returns address of a specific member in a scoreboard entry, matching a given
>> + * vcpu_index.
>> + */
>> +QEMU_PLUGIN_API
>> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry, unsigned int vcpu_index);
>> +
>> +/**
>> + * qemu_plugin_u64_sum() - return sum of all values in a scoreboard
>> + * @entry: entry to sum
>> + */
>> +QEMU_PLUGIN_API
>> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry);
>>
>> #endif /* QEMU_QEMU_PLUGIN_H */
>> diff --git a/plugins/api.c b/plugins/api.c
>> index e777eb4e9fc..4de94e798c6 100644
>> --- a/plugins/api.c
>> +++ b/plugins/api.c
>> @@ -547,3 +547,42 @@ static void __attribute__((__constructor__)) qemu_api_init(void)
>> qemu_mutex_init(®_handle_lock);
>>
>> }
>> +
>> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size)
>> +{
>> + return plugin_scoreboard_new(element_size);
>> +}
>> +
>> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
>> +{
>> + plugin_scoreboard_free(score);
>> +}
>> +
>> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard *score)
>> +{
>> + return score->size;
>
> So this would be score->data->len
>
>> +}
>> +
>> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
>> + unsigned int vcpu_index)
>> +{
>> + g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
>> + char *ptr = score->data->data;
>> + return ptr + vcpu_index * score->element_size;
>
> GArray has accessor functions for this:
>
> return &g_array_index(score->data,
> g_array_get_element_size(score->data),
> vcpu_index);
>
Yes, I'll update this.
>> +}
>> +
>> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry,
>> + unsigned int vcpu_index)
>> +{
>> + char *ptr = (char *) qemu_plugin_scoreboard_get(entry.score, vcpu_index);
>> + return (uint64_t *)(ptr + entry.offset);
>
> Not sure what's going on with the casting here, but I wonder if we are
> giving the user too much rope. Why not return the value itself? In fact,
> why do we treat things as an offset rather than an index of uint64_t?
>
> The qemu_plugin_scoreboard_get can return uint64_t * and the individual
> get becomes:
>
>
> uint64_t *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
> unsigned int vcpu_index)
> {
> g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
> return &g_array_index(score->data, score->element_size, vcpu_index);
> }
>
> uint64_t qemu_plugin_u64_get(struct qemu_plugin_scoreboard *score,
> unsigned int vcpu_index, unsigned int score_index)
> {
> g_assert(score_index < score->num_elements);
> uint64_t *sb = qemu_plugin_scoreboard_get(score, vcpu_index);
> return sb[score_index];
> }
>
As said above, maybe it was not clear in the commit (though it is expressed
in the commit message) what the distinction is between a scoreboard and a
plugin_u64.
The latter is an abstraction of a specific member in one scoreboard
entry, while a scoreboard can hold any type.
Let me write a concrete example, which will be clearer:

typedef struct {
    char data1;
    bool data2;
    void *data3;
    enum ...;
    uint64_t counter;
} CPUData;

score = qemu_plugin_scoreboard_new(sizeof(CPUData));
qemu_plugin_u64_t counter =
    qemu_plugin_u64_struct(score, CPUData, counter);

From there:
qemu_plugin_u64_get(counter, 2) returns &score[2].counter.
qemu_plugin_u64_sum(counter) returns
sum(i in [0, max_cpus), score[i].counter).
With this, a plugin can simply have an abstract representation of a
cpu-local value, which can be read or set (since we return a pointer with
get) and easily summed, which is what most of the plugins want to do in
the end (at least from what I observed in existing ones).
If we stick to a void* scoreboard, we end up writing the sum function in
every plugin. Moreover, every "inline" API function operating on a
scoreboard would need the offsetof(CPUData, counter) passed as a
parameter, which is error prone and repetitive.
With this explanation, do you see more value in having a scoreboard
*and* a plugin_u64 struct to access it?
>> +}
>> +
>> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry)
>> +{
>> + uint64_t total = 0;
>> + for (int i = 0; i < qemu_plugin_scoreboard_size(entry.score); ++i) {
>> + total += *qemu_plugin_u64_get(entry, i);
>> + }
>> + return total;
>> +}
>> diff --git a/plugins/core.c b/plugins/core.c
>> index e07b9cdf229..0286a127810 100644
>> --- a/plugins/core.c
>> +++ b/plugins/core.c
>> @@ -209,6 +209,71 @@ plugin_register_cb_udata(qemu_plugin_id_t id, enum qemu_plugin_event ev,
>> do_plugin_register_cb(id, ev, func, udata);
>> }
>>
>> +struct resize_scoreboard_args {
>> + size_t new_alloc_size;
>> + size_t new_size;
>> +};
>> +
>> +static void plugin_resize_one_scoreboard(gpointer key,
>> + gpointer value,
>> + gpointer user_data)
>> +{
>> + struct qemu_plugin_scoreboard *score =
>> + (struct qemu_plugin_scoreboard *) value;
>> + struct resize_scoreboard_args *args =
>> + (struct resize_scoreboard_args *) user_data;
>> + if (score->data->len != args->new_alloc_size) {
>> + g_array_set_size(score->data, args->new_alloc_size);
>> + }
>> + score->size = args->new_size;
>
> I think you have this in len so if you skip the extra field you can just
> do:
>
> new_size = GPOINTER_TO_UINT(user_data);
>
If we can avoid managing allocated vs real size, yes.
>> +}
>> +
>> +static void plugin_grow_scoreboards__locked(CPUState *cpu)
>> +{
>> + if (cpu->cpu_index < plugin.scoreboard_size) {
>> + return;
>> + }
>> +
>> + bool need_realloc = FALSE;
>> + while (cpu->cpu_index >= plugin.scoreboard_alloc_size) {
>> + plugin.scoreboard_alloc_size *= 2;
>
> I suspect for USER this makes sense; if we ever end up doing system
> expansion then it would just be +1 at a time. In fact, given glib does:
>
> gsize want_alloc = g_nearest_pow (g_array_elt_len (array, want_len));
>
> maybe we just do a simple increment and leave it up glib to handle the
> exponential growth?
>
>
Initially, I wanted to go this way, but it seems like glib does not
expose the allocated size of a GArray. I looked for this in the
documentation and may have missed it.
If you have an idea to detect this, I would love to just use it; it
would make the code much simpler, avoiding the need to keep track of
two sizes.
The problem we are trying to solve is to detect when the pointer will be
reallocated, in which case all TBs must be flushed and retranslated to
use the new one instead.
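One thing that could work, as a rough sketch (assuming comparing the
GArray data pointer across the resize is reliable enough), is:

static void plugin_resize_one_scoreboard(gpointer key,
                                         gpointer value,
                                         gpointer user_data)
{
    struct qemu_plugin_scoreboard *score = value;
    size_t new_size = GPOINTER_TO_SIZE(user_data);
    char *old_data = score->data->data;

    g_array_set_size(score->data, new_size);

    if (score->data->data != old_data) {
        /* buffer moved: translated code still points at the old one,
         * so the caller has to tb_flush() under start/end_exclusive() */
    }
}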
>> + need_realloc = TRUE;
>> + }
>> + plugin.scoreboard_size = cpu->cpu_index + 1;
>> + g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
>> +
>> + if (g_hash_table_size(plugin.scoreboards) == 0) {
>> + /* nothing to do, we just updated sizes for future scoreboards */
>> + return;
>> + }
>> +
>> + if (need_realloc) {
>> +#ifdef CONFIG_USER_ONLY
>> + /**
>> + * cpus must be stopped, as some tb might still use an existing
>> + * scoreboard.
>> + */
>> + start_exclusive();
>> +#endif
>
> Hmm, this seems wrong to be USER_ONLY. While we don't expect to resize in
> system mode, if we did we'd certainly want to do it during exclusive
> periods.
>
I agree. I initially did this for both (system and user), but in system
mode we hit a segfault:
$ ./build/qemu-system-x86_64 -plugin ./build/tests/plugin/libinline.so
-smp 17 # > 16 will trigger a reallocation of scoreboard
../cpu-common.c:196:20: runtime error: member access within null pointer
of type 'struct CPUState'
It seems like something is not set yet at this point (the current_cpu TLS
var), which results in this problem. Maybe this code simply reveals an
existing problem, what do you think?
>> + }
>> +
>> + struct resize_scoreboard_args args = {
>> + .new_alloc_size = plugin.scoreboard_alloc_size,
>> + .new_size = plugin.scoreboard_size
>> + };
>> + g_hash_table_foreach(plugin.scoreboards,
>> + &plugin_resize_one_scoreboard,
>> + &args);
>
> This can just be g_hash_table_foreach(plugin.scoreboards,
> &plugin_resize_one_scoreboard,
> GUINT_TO_POINTER(plugin.scoreboard_alloc_size))
>
If we use a single size, yes.
>> +
>> + if (need_realloc) {
>> + /* force all tb to be flushed, as scoreboard pointers were changed. */
>> + tb_flush(cpu);
>> +#ifdef CONFIG_USER_ONLY
>> + end_exclusive();
>> +#endif
>> + }
>> +}
>> +
>> void qemu_plugin_vcpu_init_hook(CPUState *cpu)
>> {
>> bool success;
>> @@ -218,6 +283,7 @@ void qemu_plugin_vcpu_init_hook(CPUState *cpu)
>> success = g_hash_table_insert(plugin.cpu_ht, &cpu->cpu_index,
>> &cpu->cpu_index);
>> g_assert(success);
>> + plugin_grow_scoreboards__locked(cpu);
>> qemu_rec_mutex_unlock(&plugin.lock);
>>
>> plugin_vcpu_cb__simple(cpu, QEMU_PLUGIN_EV_VCPU_INIT);
>> @@ -576,8 +642,39 @@ static void __attribute__((__constructor__)) plugin_init(void)
>> qemu_rec_mutex_init(&plugin.lock);
>> plugin.id_ht = g_hash_table_new(g_int64_hash, g_int64_equal);
>> plugin.cpu_ht = g_hash_table_new(g_int_hash, g_int_equal);
>> + plugin.scoreboards = g_hash_table_new(g_int64_hash, g_int64_equal);
>> + plugin.scoreboard_size = 1;
>> + plugin.scoreboard_alloc_size = 16; /* avoid frequent reallocation */
>> QTAILQ_INIT(&plugin.ctxs);
>> qht_init(&plugin.dyn_cb_arr_ht, plugin_dyn_cb_arr_cmp, 16,
>> QHT_MODE_AUTO_RESIZE);
>> atexit(qemu_plugin_atexit_cb);
>> }
>> +
>> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size)
>> +{
>> + struct qemu_plugin_scoreboard *score =
>> + g_malloc0(sizeof(struct qemu_plugin_scoreboard));
>> + score->data = g_array_new(FALSE, TRUE, element_size);
>> + g_array_set_size(score->data, plugin.scoreboard_alloc_size);
>> + score->size = plugin.scoreboard_size;
>> + score->element_size = element_size;
>> +
>> + qemu_rec_mutex_lock(&plugin.lock);
>> + bool inserted = g_hash_table_insert(plugin.scoreboards, score, score);
>> + g_assert(inserted);
>> + qemu_rec_mutex_unlock(&plugin.lock);
>> +
>> + return score;
>> +}
>> +
>> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
>> +{
>> + qemu_rec_mutex_lock(&plugin.lock);
>> + bool removed = g_hash_table_remove(plugin.scoreboards, score);
>> + g_assert(removed);
>> + qemu_rec_mutex_unlock(&plugin.lock);
>> +
>> + g_array_free(score->data, TRUE);
>> + g_free(score);
>> +}
>> diff --git a/plugins/plugin.h b/plugins/plugin.h
>> index 2c278379b70..99829c40886 100644
>> --- a/plugins/plugin.h
>> +++ b/plugins/plugin.h
>> @@ -31,6 +31,10 @@ struct qemu_plugin_state {
>> * but with the HT we avoid adding a field to CPUState.
>> */
>> GHashTable *cpu_ht;
>> + /* Scoreboards, indexed by their addresses. */
>> + GHashTable *scoreboards;
>> + size_t scoreboard_size;
>> + size_t scoreboard_alloc_size;
>> DECLARE_BITMAP(mask, QEMU_PLUGIN_EV_MAX);
>> /*
>> * @lock protects the struct as well as ctx->uninstalling.
>> @@ -99,4 +103,8 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
>>
>> void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index);
>>
>> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size);
>> +
>> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
>> +
>> #endif /* PLUGIN_H */
>> diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
>> index 6963585c1ea..93866d1b5f2 100644
>> --- a/plugins/qemu-plugins.symbols
>> +++ b/plugins/qemu-plugins.symbols
>> @@ -38,10 +38,16 @@
>> qemu_plugin_register_vcpu_tb_exec_inline;
>> qemu_plugin_register_vcpu_tb_trans_cb;
>> qemu_plugin_reset;
>> + qemu_plugin_scoreboard_free;
>> + qemu_plugin_scoreboard_get;
>> + qemu_plugin_scoreboard_new;
>> + qemu_plugin_scoreboard_size;
>> qemu_plugin_start_code;
>> qemu_plugin_tb_get_insn;
>> qemu_plugin_tb_n_insns;
>> qemu_plugin_tb_vaddr;
>> + qemu_plugin_u64_get;
>> + qemu_plugin_u64_sum;
>> qemu_plugin_uninstall;
>> qemu_plugin_vcpu_for_each;
>> };
>
Thanks for your complete and detailed review!
* Re: [PATCH v2 05/14] tests/plugin: add test plugin for inline operations
2024-01-26 16:05 ` Alex Bennée
@ 2024-01-30 7:49 ` Pierrick Bouvier
2024-01-30 14:52 ` Alex Bennée
0 siblings, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-30 7:49 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/26/24 20:05, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> For now, it simply performs instruction, bb and mem counts, and ensures
>> that inline vs callback versions have the same result. Later, we'll
>> extend it when new inline operations are added.
>>
>> Using existing plugins to test everything works is a bit cumbersome, as
>> different events are treated in different plugins. Thus, this new one.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> tests/plugin/inline.c | 182 +++++++++++++++++++++++++++++++++++++++
>> tests/plugin/meson.build | 2 +-
>> 2 files changed, 183 insertions(+), 1 deletion(-)
>> create mode 100644 tests/plugin/inline.c
>>
>> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
>> new file mode 100644
>> index 00000000000..28d1c3b1e48
>> --- /dev/null
>> +++ b/tests/plugin/inline.c
>> @@ -0,0 +1,182 @@
>> +/*
>> + * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> + *
>> + * Demonstrates and tests usage of inline ops.
>> + *
>> + * License: GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + */
>> +
>> +#include <glib.h>
>> +#include <stdint.h>
>> +#include <stdio.h>
>> +
>> +#include <qemu-plugin.h>
>> +
>> +typedef struct {
>> + uint64_t count_tb;
>> + uint64_t count_tb_inline;
>> + uint64_t count_insn;
>> + uint64_t count_insn_inline;
>> + uint64_t count_mem;
>> + uint64_t count_mem_inline;
>> +} CPUCount;
>
> I wonder if there is any way to enforce the structure being an array of
> 64-bit counts? I do worry the compiler might one day decide to do
> something silly but legal, leading to confusion.
>
> I guess qemu_plugin_scoreboard_new could:
>
> g_assert((element_size % sizeof(uint64_t)) == 0)
>
Given the explanation on patch [02/14], do you now see that CPUCount could
hold any type, and that qemu_plugin_u64 allows targeting a specific member
in it?
In general, the qemu plugin runtime is simply given an offset and the total
size of the struct, so a compiled plugin can have any optimization/padding
on this struct without it affecting the result.
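Concretely, the plugin only ever hands the runtime a scoreboard pointer
plus an offset computed by its own compiler, e.g. from the test plugin:

    count_tb = qemu_plugin_u64_struct(counts, CPUCount, count_tb);
    /* which expands to:
     * (qemu_plugin_u64_t){counts, offsetof(CPUCount, count_tb)}
     * so any padding the plugin's compiler added is already baked in */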
> ?
>
>> +static qemu_plugin_u64_t count_tb;
>> +static qemu_plugin_u64_t count_tb_inline;
>> +static qemu_plugin_u64_t count_insn;
>> +static qemu_plugin_u64_t count_insn_inline;
>> +static qemu_plugin_u64_t count_mem;
>> +static qemu_plugin_u64_t count_mem_inline;
>
> Can't this just be a non scoreboard instance of CPUCount?
>
We could always use:
CPUCount* count = qemu_plugin_scoreboard_get(score, i);
count->count_tb++;
However, the part where we sum all values would now need some glue code,
which is longer and more error prone than having those definitions here
(and the declarations in the install function).
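Roughly something like this, once per counter field (just a sketch):

static uint64_t sum_count_tb(void)
{
    uint64_t total = 0;
    for (int i = 0; i < qemu_plugin_scoreboard_size(score); ++i) {
        CPUCount *count = qemu_plugin_scoreboard_get(score, i);
        total += count->count_tb;
    }
    return total;
}

whereas qemu_plugin_u64_sum(count_tb) gives the same result in one call.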
>> +
>> +static uint64_t global_count_tb;
>> +static uint64_t global_count_insn;
>> +static uint64_t global_count_mem;
>> +static unsigned int max_cpu_index;
>> +static GMutex tb_lock;
>> +static GMutex insn_lock;
>> +static GMutex mem_lock;
>> +
>> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>> +
>> +static void stats_insn(void)
>> +{
>> + const uint64_t expected = global_count_insn;
>> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_insn);
>> + const uint64_t inl_per_vcpu =
>> + qemu_plugin_u64_sum(count_insn_inline);
>> + printf("insn: %" PRIu64 "\n", expected);
>> + printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
>> + printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>> + g_assert(expected > 0);
>> + g_assert(per_vcpu == expected);
>> + g_assert(inl_per_vcpu == expected);
>> +}
>> +
>> +static void stats_tb(void)
>> +{
>> + const uint64_t expected = global_count_tb;
>> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_tb);
>> + const uint64_t inl_per_vcpu =
>> + qemu_plugin_u64_sum(count_tb_inline);
>> + printf("tb: %" PRIu64 "\n", expected);
>> + printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
>> + printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>> + g_assert(expected > 0);
>> + g_assert(per_vcpu == expected);
>> + g_assert(inl_per_vcpu == expected);
>> +}
>> +
>> +static void stats_mem(void)
>> +{
>> + const uint64_t expected = global_count_mem;
>> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_mem);
>> + const uint64_t inl_per_vcpu =
>> + qemu_plugin_u64_sum(count_mem_inline);
>> + printf("mem: %" PRIu64 "\n", expected);
>> + printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
>> + printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>> + g_assert(expected > 0);
>> + g_assert(per_vcpu == expected);
>> + g_assert(inl_per_vcpu == expected);
>> +}
>> +
>> +static void plugin_exit(qemu_plugin_id_t id, void *udata)
>> +{
>> + const unsigned int num_cpus = qemu_plugin_scoreboard_size(counts);
>> + g_assert(num_cpus == max_cpu_index + 1);
>> +
>> + for (int i = 0; i < num_cpus ; ++i) {
>> + const uint64_t tb = *qemu_plugin_u64_get(count_tb, i);
>> + const uint64_t tb_inline = *qemu_plugin_u64_get(count_tb_inline, i);
>> + const uint64_t insn = *qemu_plugin_u64_get(count_insn, i);
>> + const uint64_t insn_inline = *qemu_plugin_u64_get(count_insn_inline, i);
>> + const uint64_t mem = *qemu_plugin_u64_get(count_mem, i);
>> + const uint64_t mem_inline = *qemu_plugin_u64_get(count_mem_inline, i);
>> + printf("cpu %d: tb (%" PRIu64 ", %" PRIu64 ") | "
>> + "insn (%" PRIu64 ", %" PRIu64 ") | "
>> + "mem (%" PRIu64 ", %" PRIu64 ")"
>> + "\n",
>> + i, tb, tb_inline, insn, insn_inline, mem, mem_inline);
>> + g_assert(tb == tb_inline);
>> + g_assert(insn == insn_inline);
>> + g_assert(mem == mem_inline);
>> + }
>> +
>> + stats_tb();
>> + stats_insn();
>> + stats_mem();
>> +
>> + qemu_plugin_scoreboard_free(counts);
>> +}
>> +
>> +static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
>> +{
>> + (*qemu_plugin_u64_get(count_tb, cpu_index))++;
>> + g_mutex_lock(&tb_lock);
>> + max_cpu_index = MAX(max_cpu_index, cpu_index);
>> + global_count_tb++;
>> + g_mutex_unlock(&tb_lock);
>> +}
>> +
>> +static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
>> +{
>> + (*qemu_plugin_u64_get(count_insn, cpu_index))++;
>> + g_mutex_lock(&insn_lock);
>> + global_count_insn++;
>> + g_mutex_unlock(&insn_lock);
>> +}
>> +
>> +static void vcpu_mem_access(unsigned int cpu_index,
>> + qemu_plugin_meminfo_t info,
>> + uint64_t vaddr,
>> + void *userdata)
>> +{
>> + (*qemu_plugin_u64_get(count_mem, cpu_index))++;
>> + g_mutex_lock(&mem_lock);
>> + global_count_mem++;
>> + g_mutex_unlock(&mem_lock);
>> +}
>> +
>> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> +{
>> + qemu_plugin_register_vcpu_tb_exec_cb(
>> + tb, vcpu_tb_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
>> + qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>> + tb, QEMU_PLUGIN_INLINE_ADD_U64, count_tb_inline, 1);
>> +
>> + for (int idx = 0; idx < qemu_plugin_tb_n_insns(tb); ++idx) {
>> + struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
>> + qemu_plugin_register_vcpu_insn_exec_cb(
>> + insn, vcpu_insn_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
>> + qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
>> + insn, QEMU_PLUGIN_INLINE_ADD_U64, count_insn_inline, 1);
>> + qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
>> + QEMU_PLUGIN_CB_NO_REGS,
>> + QEMU_PLUGIN_MEM_RW, 0);
>> + qemu_plugin_register_vcpu_mem_inline_per_vcpu(
>> + insn, QEMU_PLUGIN_MEM_RW,
>> + QEMU_PLUGIN_INLINE_ADD_U64,
>> + count_mem_inline, 1);
>> + }
>> +}
>> +
>> +QEMU_PLUGIN_EXPORT
>> +int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
>> + int argc, char **argv)
>> +{
>> + counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
>> + count_tb = qemu_plugin_u64_struct(counts, CPUCount, count_tb);
>> + count_insn = qemu_plugin_u64_struct(counts, CPUCount, count_insn);
>> + count_mem = qemu_plugin_u64_struct(counts, CPUCount, count_mem);
>> + count_tb_inline = qemu_plugin_u64_struct(counts, CPUCount, count_tb_inline);
>> + count_insn_inline =
>> + qemu_plugin_u64_struct(counts, CPUCount, count_insn_inline);
>> + count_mem_inline =
>> + qemu_plugin_u64_struct(counts, CPUCount, count_mem_inline);
>> + qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>> + qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
>> +
>> + return 0;
>> +}
>> diff --git a/tests/plugin/meson.build b/tests/plugin/meson.build
>> index e18183aaeda..9eece5bab51 100644
>> --- a/tests/plugin/meson.build
>> +++ b/tests/plugin/meson.build
>> @@ -1,6 +1,6 @@
>> t = []
>> if get_option('plugins')
>> - foreach i : ['bb', 'empty', 'insn', 'mem', 'syscall']
>> + foreach i : ['bb', 'empty', 'inline', 'insn', 'mem', 'syscall']
>> if host_os == 'windows'
>> t += shared_module(i, files(i + '.c') + '../../contrib/plugins/win32_linker.c',
>> include_directories: '../../include/qemu',
>
> Otherwise:
>
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
* Re: [PATCH v2 14/14] contrib/plugins/execlog: fix new warnings
2024-01-26 16:31 ` Alex Bennée
@ 2024-01-30 7:51 ` Pierrick Bouvier
0 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-30 7:51 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/26/24 20:31, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> ‘g_pattern_match_string’ is deprecated,
>> Use 'g_pattern_spec_match_string' instead.
>
> Unfortunately this isn't enough as we can still build on older glibs:
>
> /* Ask for warnings for anything that was marked deprecated in
> * the defined version, or before. It is a candidate for rewrite.
> */
> #define GLIB_VERSION_MIN_REQUIRED GLIB_VERSION_2_56
>
> You can do something like:
>
> /*
> * g_pattern_match_string has been deprecated in Glib since 2.70 and
> * will complain about it if you try to use it. Fortunately the
> * signature of both functions is the same making it easy to work
> * around.
> */
> static inline
> gboolean g_pattern_spec_match_string_qemu(GPatternSpec *pspec,
> const gchar *string)
> {
> #if GLIB_CHECK_VERSION(2, 70, 0)
> return g_pattern_spec_match_string(pspec, string);
> #else
> return g_pattern_match_string(pspec, string);
> #endif
> };
> #define g_pattern_spec_match_string(p, s) g_pattern_spec_match_string_qemu(p, s)
>
> in glib-compat.h but I was wondering if it would be valid to add that
> dependency to plugins. We might get away with it as it doesn't include
> anything from QEMU itself.
>
Oh I see.
Since it's the only plugin using this so far, and it's a "contrib"
plugin, I'll simply drop this patch for now. We can always discuss this
again in the future.
I think you are right, and it's not worth adding this to glib-compat.h.
>>
>> passing argument 2 of ‘g_ptr_array_add’ discards ‘const’ qualifier from
>> pointer target type
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/execlog.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/contrib/plugins/execlog.c b/contrib/plugins/execlog.c
>> index 5a4de1c93be..d12137ce5c0 100644
>> --- a/contrib/plugins/execlog.c
>> +++ b/contrib/plugins/execlog.c
>> @@ -336,8 +336,8 @@ static void registers_init(int vcpu_index)
>> for (int p = 0; p < rmatches->len; p++) {
>> g_autoptr(GPatternSpec) pat = g_pattern_spec_new(rmatches->pdata[p]);
>> g_autofree gchar *rd_lower = g_utf8_strdown(rd->name, -1);
>> - if (g_pattern_match_string(pat, rd->name) ||
>> - g_pattern_match_string(pat, rd_lower)) {
>> + if (g_pattern_spec_match_string(pat, rd->name) ||
>> + g_pattern_spec_match_string(pat, rd_lower)) {
>> Register *reg = init_vcpu_register(vcpu_index, rd);
>> g_ptr_array_add(registers, reg);
>>
>> @@ -345,7 +345,7 @@ static void registers_init(int vcpu_index)
>> if (disas_assist) {
>> g_mutex_lock(&add_reg_name_lock);
>> if (!g_ptr_array_find(all_reg_names, reg->name, NULL)) {
>> - g_ptr_array_add(all_reg_names, reg->name);
>> + g_ptr_array_add(all_reg_names, (gpointer)reg->name);
>> }
>> g_mutex_unlock(&add_reg_name_lock);
>> }
>
* Re: [PATCH v2 11/14] plugins: remove non per_vcpu inline operation from API
2024-01-26 16:26 ` Alex Bennée
@ 2024-01-30 7:53 ` Pierrick Bouvier
0 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-30 7:53 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/26/24 20:26, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> Now that we have a thread-safe equivalent of the inline operations, and
>> all plugins have been changed to use it, there is no point in keeping the
>> old API.
>>
>> Moreover, it will help when we implement more functionality (conditional
>> callbacks), as we can assume that we operate on a scoreboard.
>>
>> Bump API version as it's a breaking change for existing plugins.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> include/qemu/qemu-plugin.h | 59 ++++----------------------------------
>> plugins/api.c | 29 -------------------
>> 2 files changed, 6 insertions(+), 82 deletions(-)
>>
>> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
>> index 55f918db1b0..3ee514f79cf 100644
>> --- a/include/qemu/qemu-plugin.h
>> +++ b/include/qemu/qemu-plugin.h
>> @@ -51,11 +51,16 @@ typedef uint64_t qemu_plugin_id_t;
>> *
>> * The plugins export the API they were built against by exposing the
>> * symbol qemu_plugin_version which can be checked.
>> + *
>> + * Version 2:
>> + * Remove qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline.
>> + * Those functions are replaced by *_per_vcpu variants, which guarantees
>> + * thread-safety for operations.
>> */
>>
>> extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
>>
>> -#define QEMU_PLUGIN_VERSION 1
>> +#define QEMU_PLUGIN_VERSION 2
>
> I think technically adding the new API bumps this, and deprecating the
> old version bumps:
>
> QEMU_PLUGIN_MIN_VERSION
>
> to the same.
>
Yes, you're right, it would prevent plugins that use the removed functions
from working. I'll update MIN_VERSION too.
>>
>> /**
>> * struct qemu_info_t - system information for plugins
>> @@ -311,25 +316,6 @@ enum qemu_plugin_op {
>> QEMU_PLUGIN_INLINE_ADD_U64,
>> };
>>
>> -/**
>> - * qemu_plugin_register_vcpu_tb_exec_inline() - execution inline op
>> - * @tb: the opaque qemu_plugin_tb handle for the translation
>> - * @op: the type of qemu_plugin_op (e.g. ADD_U64)
>> - * @ptr: the target memory location for the op
>> - * @imm: the op data (e.g. 1)
>> - *
>> - * Insert an inline op to every time a translated unit executes.
>> - * Useful if you just want to increment a single counter somewhere in
>> - * memory.
>> - *
>> - * Note: ops are not atomic so in multi-threaded/multi-smp situations
>> - * you will get inexact results.
>> - */
>> -QEMU_PLUGIN_API
>> -void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
>> - enum qemu_plugin_op op,
>> - void *ptr, uint64_t imm);
>> -
>> /**
>> * qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu() - execution inline op
>> * @tb: the opaque qemu_plugin_tb handle for the translation
>> @@ -361,21 +347,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
>> enum qemu_plugin_cb_flags flags,
>> void *userdata);
>>
>> -/**
>> - * qemu_plugin_register_vcpu_insn_exec_inline() - insn execution inline op
>> - * @insn: the opaque qemu_plugin_insn handle for an instruction
>> - * @op: the type of qemu_plugin_op (e.g. ADD_U64)
>> - * @ptr: the target memory location for the op
>> - * @imm: the op data (e.g. 1)
>> - *
>> - * Insert an inline op to every time an instruction executes. Useful
>> - * if you just want to increment a single counter somewhere in memory.
>> - */
>> -QEMU_PLUGIN_API
>> -void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
>> - enum qemu_plugin_op op,
>> - void *ptr, uint64_t imm);
>> -
>> /**
>> * qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu() - insn exec inline op
>> * @insn: the opaque qemu_plugin_insn handle for an instruction
>> @@ -599,24 +570,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
>> enum qemu_plugin_mem_rw rw,
>> void *userdata);
>>
>> -/**
>> - * qemu_plugin_register_vcpu_mem_inline() - register an inline op to any memory access
>> - * @insn: handle for instruction to instrument
>> - * @rw: apply to reads, writes or both
>> - * @op: the op, of type qemu_plugin_op
>> - * @ptr: pointer memory for the op
>> - * @imm: immediate data for @op
>> - *
>> - * This registers a inline op every memory access generated by the
>> - * instruction. This provides for a lightweight but not thread-safe
>> - * way of counting the number of operations done.
>> - */
>> -QEMU_PLUGIN_API
>> -void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>> - enum qemu_plugin_mem_rw rw,
>> - enum qemu_plugin_op op, void *ptr,
>> - uint64_t imm);
>> -
>> /**
>> * qemu_plugin_register_vcpu_mem_inline_per_vcpu() - inline op for mem access
>> * @insn: handle for instruction to instrument
>> diff --git a/plugins/api.c b/plugins/api.c
>> index 132d5e0bec1..29915d3c142 100644
>> --- a/plugins/api.c
>> +++ b/plugins/api.c
>> @@ -101,16 +101,6 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
>> }
>> }
>>
>> -void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
>> - enum qemu_plugin_op op,
>> - void *ptr, uint64_t imm)
>> -{
>> - if (!tb->mem_only) {
>> - plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
>> - 0, op, ptr, 0, sizeof(uint64_t), true, imm);
>> - }
>> -}
>> -
>> void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>> struct qemu_plugin_tb *tb,
>> enum qemu_plugin_op op,
>> @@ -140,16 +130,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
>> }
>> }
>>
>> -void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
>> - enum qemu_plugin_op op,
>> - void *ptr, uint64_t imm)
>> -{
>> - if (!insn->mem_only) {
>> - plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
>> - 0, op, ptr, 0, sizeof(uint64_t), true, imm);
>> - }
>> -}
>> -
>> void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
>> struct qemu_plugin_insn *insn,
>> enum qemu_plugin_op op,
>> @@ -179,15 +159,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
>> cb, flags, rw, udata);
>> }
>>
>> -void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
>> - enum qemu_plugin_mem_rw rw,
>> - enum qemu_plugin_op op, void *ptr,
>> - uint64_t imm)
>> -{
>> - plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
>> - rw, op, ptr, 0, sizeof(uint64_t), true, imm);
>> -}
>> -
>> void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
>> struct qemu_plugin_insn *insn,
>> enum qemu_plugin_mem_rw rw,
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 02/14] plugins: scoreboard API
2024-01-30 7:37 ` Pierrick Bouvier
@ 2024-01-30 10:23 ` Alex Bennée
2024-01-30 11:10 ` Pierrick Bouvier
0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2024-01-30 10:23 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> On 1/26/24 19:14, Alex Bennée wrote:
>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>
>>> We introduce a cpu local storage, automatically managed (and extended)
>>> by QEMU itself. Plugin allocate a scoreboard, and don't have to deal
>>> with how many cpus are launched.
>>>
>>> This API will be used by new inline functions but callbacks can benefit
>>> from this as well. This way, they can operate without a global lock for
>>> simple operations.
>>>
>>> New functions:
>>> - qemu_plugin_scoreboard_free
>>> - qemu_plugin_scoreboard_get
>>> - qemu_plugin_scoreboard_new
>>> - qemu_plugin_scoreboard_size
>>>
>>> In more, we define a qemu_plugin_u64_t, which is a simple struct holding
>>> a pointer to a scoreboard, and a given offset.
>>> This allows to have a scoreboard containing structs, without having to
>>> bring offset for all operations on a specific field.
>>>
>>> Since most of the plugins are simply collecting a sum of per-cpu values,
>>> qemu_plugin_u64_t directly support this operation as well.
>>>
>>> New functions:
>>> - qemu_plugin_u64_get
>>> - qemu_plugin_u64_sum
>>> New macros:
>>> - qemu_plugin_u64
>>> - qemu_plugin_u64_struct
>>>
>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> ---
>>> include/qemu/plugin.h | 7 +++
>>> include/qemu/qemu-plugin.h | 75 ++++++++++++++++++++++++++++
>>> plugins/api.c | 39 +++++++++++++++
>>> plugins/core.c | 97 ++++++++++++++++++++++++++++++++++++
>>> plugins/plugin.h | 8 +++
>>> plugins/qemu-plugins.symbols | 6 +++
>>> 6 files changed, 232 insertions(+)
>>>
>>> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
>>> index 9346249145d..5f340192e56 100644
>>> --- a/include/qemu/plugin.h
>>> +++ b/include/qemu/plugin.h
>>> @@ -115,6 +115,13 @@ struct qemu_plugin_insn {
>>> bool mem_only;
>>> };
>>> +/* A scoreboard is an array of values, indexed by vcpu_index */
>>> +struct qemu_plugin_scoreboard {
>>> + GArray *data;
>>> + size_t size;
>> Can we get size from GArray->len itself?
>>
>
> GArray->len matches the allocated size, which is different from
> "semantic" size. I'll answer to why it was implemented this way in a
> next comment.
Does it? The documentation states:
guint len;
the number of elements in the GArray not including the possible terminating zero element.
I think the actual allocated size is hidden from you by glib as an
implementation detail.
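For reference, the public GArray struct only exposes the data pointer
and len, roughly:

    typedef struct {
        gchar *data;   /* element storage */
        guint  len;    /* number of elements, not the allocation size */
    } GArray;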
>>> + size_t element_size;
>>> +};
>>> +
>>> /*
>>> * qemu_plugin_insn allocate and cleanup functions. We don't expect to
>>> * cleanup many of these structures. They are reused for each fresh
>>> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
>>> index 2c1930e7e45..934059d64c2 100644
>>> --- a/include/qemu/qemu-plugin.h
>>> +++ b/include/qemu/qemu-plugin.h
>>> @@ -220,6 +220,23 @@ void qemu_plugin_register_vcpu_resume_cb(qemu_plugin_id_t id,
>>> struct qemu_plugin_tb;
>>> /** struct qemu_plugin_insn - Opaque handle for a translated instruction */
>>> struct qemu_plugin_insn;
>>> +/**
>>> + * struct qemu_plugin_scoreboard - Opaque handle for a scoreboard
>>> + *
>>> + * A scoreboard is an array of data, indexed by vcpu_index.
>>> + **/
>> stray *'s - I think this is what trips up kdoc.
>>
>
> Thanks, fixed that in a new version. I now build with documentation
> enabled, so should not happen again, sorry.
>
>>> +struct qemu_plugin_scoreboard;
>>> +
>>> +/**
>>> + * qemu_plugin_u64_t - uint64_t member of an entry in a scoreboard
>> We generally reserve lower_case_with_underscores_ending_with_a_t for
>> scalar types. Its also a little generic. Maybe qemu_plugin_scoreboard_ref?
>>
>
> For _t suffix, I hesitated on this, and followed the qemu_info_t
> struct example in the same header.
We are nothing if not inconsistent in the code base ;-) There are a
number of _t types that in retrospect shouldn't have been.
> In the beginning, I picked qemu_plugin_scoreboard_u64, but realized
> the name was far too long. qemu_plugin_u64 seemed like the right spot
> between length/meaning.
> I would like to defend the u64 meaning, because it was explicitely
> made to express u64 type, instead of a generic scoreboard entry/ref.
>
> In more, based on other comments made, maybe it was not clear that a
> qemu_plugin_u64 does not necessarily point to a "whole" entry in
> scoreboard, but it can target a struct member (u64) in a given entry
> as well. (i.e. scoreboard[i].member). I'll give a more detailed answer
> below in this message.
If we just return values rather than pointers we can stick with simpler
types.
> This way, scoreboard can be composed of struct values, and a
> qemu_plugin_u64 allows to work on a specific member of that struct
> without having to manipulate offset everywhere. Thus the get and sum
> operation on this type.
I would have to see an example of why giving this indirection to the
plugin makes things easier.
>
> Does it makes more sense to you given this?
>
>>> + *
>>> + * This field allows to access a specific uint64_t member in one given entry,
>>> + * located at a specified offset. Inline operations expect this as entry.
>>> + */
>>> +typedef struct qemu_plugin_u64 {
>> > I don't think you need the forward declaration here (which clashes later
>> on).
>>
>
> Initially wrote this without a typedef, and left this afterwards. Will
> remove it, thanks!
>
>>> + struct qemu_plugin_scoreboard *score;
>>> + size_t offset;
>>> +} qemu_plugin_u64_t;
>>> /**
>>> * enum qemu_plugin_cb_flags - type of callback
>>> @@ -754,5 +771,63 @@ int qemu_plugin_read_register(unsigned int vcpu,
>>> struct qemu_plugin_register *handle,
>>> GByteArray *buf);
>>> +/**
>>> + * qemu_plugin_scoreboard_new() - alloc a new scoreboard
>>> + *
>>> + * Returns a pointer to a new scoreboard. It must be freed using
>>> + * qemu_plugin_scoreboard_free.
>>> + */
>>> +QEMU_PLUGIN_API
>>> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size);
>>> +
>>> +/**
>>> + * qemu_plugin_scoreboard_free() - free a scoreboard
>>> + * @score: scoreboard to free
>>> + */
>>> +QEMU_PLUGIN_API
>>> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
>>> +
>>> +/**
>>> + * qemu_plugin_scoreboard_size() - return size of a scoreboard
>>> + * @score: scoreboard to query
>>> + */
>>> +QEMU_PLUGIN_API
>>> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard
>>> *score);
>> Is this actually host memory size related? The use case in the
>> plugins
>> seems to be a proxy for num_cpus and I'm note sure we want it used that
>> way. We are making the contract that it will always be big enough for
>> the number of vcpus created - maybe that is the helper we need?
>>
>
> The current API has a limitation in user mode, where you can't know
> how many cpus were run overall. The point of scoreboard was to
> abstract this away, by removing the need for plugins to track this
> (and use lock, manual memory allocation, and glue code in general).
Agreed.
> I thought it was more clear to write:
> for (int i = 0; i < qemu_plugin_scoreboard_size(s); ++i) {
> void *val = qemu_plugin_scoreboard_get(s, i);
> }
> than:
> for (int i = 0; i < qemu_plugin_max_cpus(s); ++i) {
> void *val = qemu_plugin_scoreboard_get(s, i);
> }
>
> We can always replace qemu_plugin_scoreboard_size by
> qemu_plugin_max_cpus (or another name you prefer), which would return
> the max number of cpus that were used during all the execution, if
> that works for you.
I think that works. Do we know it even if we haven't allocated any
scoreboards?
>
>>> +
>>> +/**
>>> + * qemu_plugin_scoreboard_get() - access value from a scoreboard
>>> + * @score: scoreboard to query
>>> + * @vcpu_index: entry index
>>> + *
>>> + * Returns address of entry of a scoreboard matching a given vcpu_index. This
>>> + * address can be modified later if scoreboard is resized.
>>> + */
>>> +QEMU_PLUGIN_API
>>> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
>>> + unsigned int vcpu_index);
>>> +
>>> +/* Macros to define a qemu_plugin_u64_t */
>>> +#define qemu_plugin_u64(score) \
>>> + (qemu_plugin_u64_t){score, 0}
>>> +#define qemu_plugin_u64_struct(score, type, member) \
>>> + (qemu_plugin_u64_t){score, offsetof(type, member)}
>> qemu_plugin_val_ref and qemu_plugin_struct_ref?
>>
>
> Same comment as before, it would be more clear to keep the u64 in the
> name if possible, so we know which data type we manipulate.
>
>>> +
>>> +/**
>>> + * qemu_plugin_u64_get() - access specific uint64_t in a scoreboard
>>> + * @entry: entry to query
>>> + * @vcpu_index: entry index
>>> + *
>>> + * Returns address of a specific member in a scoreboard entry, matching a given
>>> + * vcpu_index.
>>> + */
>>> +QEMU_PLUGIN_API
>>> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry, unsigned int vcpu_index);
>>> +
>>> +/**
>>> + * qemu_plugin_u64_sum() - return sum of all values in a scoreboard
>>> + * @entry: entry to sum
>>> + */
>>> +QEMU_PLUGIN_API
>>> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry);
>>> #endif /* QEMU_QEMU_PLUGIN_H */
>>> diff --git a/plugins/api.c b/plugins/api.c
>>> index e777eb4e9fc..4de94e798c6 100644
>>> --- a/plugins/api.c
>>> +++ b/plugins/api.c
>>> @@ -547,3 +547,42 @@ static void __attribute__((__constructor__)) qemu_api_init(void)
>>>      qemu_mutex_init(&reg_handle_lock);
>>> }
>>> +
>>> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size)
>>> +{
>>> + return plugin_scoreboard_new(element_size);
>>> +}
>>> +
>>> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
>>> +{
>>> + plugin_scoreboard_free(score);
>>> +}
>>> +
>>> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard *score)
>>> +{
>>> + return score->size;
>> So this would be score->data->len
>>
>>> +}
>>> +
>>> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
>>> + unsigned int vcpu_index)
>>> +{
>>> + g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
>>> + char *ptr = score->data->data;
>>> + return ptr + vcpu_index * score->element_size;
>> GArray has accesses functions for this:
>> return &g_array_index(score->data,
>> g_array_get_element_size(score->data),
>> vcpu_index);
>>
>
> Yes, I'll update this.
>
>>> +}
>>> +
>>> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry,
>>> + unsigned int vcpu_index)
>>> +{
>>> + char *ptr = (char *) qemu_plugin_scoreboard_get(entry.score, vcpu_index);
>>> + return (uint64_t *)(ptr + entry.offset);
>> Not sure whats going with the casting here but I wonder if we are
>> giving
>> the user too much rope. Why not to return the value itself? In fact why
>> do we treat things as an offset rather than an index of uint64_t.
>> The qemu_plugin_scoreboard_get can return uint64_t * and the
>> individual
>> get becomes:
>> uint64_t *qemu_plugin_scoreboard_get(struct
>> qemu_plugin_scoreboard *score,
>> unsigned int vcpu_index)
>> {
>> g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
>> return &g_array_index(score->data, score->element_size, vcpu_index);
>> }
>> uint64_t qemu_plugin_u64_get(struct qemu_plugin_scoreboard
>> *score,
>> unsigned int vcpu_index, unsigned int score_index)
>> {
>> g_assert(score_index < score->num_elements);
>> uint64_t *sb = qemu_plugin_scoreboard_get(score, vcpu_index);
>> return sb[score_index];
>> }
>>
>
> As said above, maybe it was not clear in the commit (though expressed
> in commit message) what was a distinction between a scoreboard, and a
> plugin_u64.
>
> The latter is an abstraction of a specific member in one scoreboard
> entry, while a scoreboard holds any type.
>
> Let me write a concrete example, which will be clearer:
>
> typedef struct {
> char data1;
> bool data2;
> void* data3;
> enum ...;
> uint64_t counter;
> } CPUData;
>
> score = qemu_plugin_scoreboard_new(sizeof(CPUData));
> qemu_plugin_u64_t counter =
> qemu_plugin_u64_struct(score, CPUData, counter));
>
> From there:
> qemu_plugin_u64_get(counter, 2) returns &score[2].counter.
> qemu_plugin_u64_sum(counter) returns
> sum(i: [0:max_cpus[, score[i].counter).
>
> With this, a plugin can simply have an abstract representation of cpu
> local value, than can be get or set (since we return a pointer with
> get) and easily summed, which is what most of the plugins want to do
> in the end (at least from what I observed in existing ones).
>
> If we stick to a void* scoreboard, we end up writing the sum function
> in every plugin. In more, every API function "inline" operating on a
> scoreboard would need the offsetof(CPUData, counter) passed in
> parameter, which is error prone and repetitive.
>
> With this explaination, do you see more the value to have a scoreboard
> *and* a plugin_u64 struct to access it?
>
>>> +}
>>> +
>>> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry)
>>> +{
>>> + uint64_t total = 0;
>>> + for (int i = 0; i < qemu_plugin_scoreboard_size(entry.score); ++i) {
>>> + total += *qemu_plugin_u64_get(entry, i);
>>> + }
>>> + return total;
>>> +}
>>> diff --git a/plugins/core.c b/plugins/core.c
>>> index e07b9cdf229..0286a127810 100644
>>> --- a/plugins/core.c
>>> +++ b/plugins/core.c
>>> @@ -209,6 +209,71 @@ plugin_register_cb_udata(qemu_plugin_id_t id, enum qemu_plugin_event ev,
>>> do_plugin_register_cb(id, ev, func, udata);
>>> }
>>> +struct resize_scoreboard_args {
>>> + size_t new_alloc_size;
>>> + size_t new_size;
>>> +};
>>> +
>>> +static void plugin_resize_one_scoreboard(gpointer key,
>>> + gpointer value,
>>> + gpointer user_data)
>>> +{
>>> + struct qemu_plugin_scoreboard *score =
>>> + (struct qemu_plugin_scoreboard *) value;
>>> + struct resize_scoreboard_args *args =
>>> + (struct resize_scoreboard_args *) user_data;
>>> + if (score->data->len != args->new_alloc_size) {
>>> + g_array_set_size(score->data, args->new_alloc_size);
>>> + }
>>> + score->size = args->new_size;
>> I think you have this in len so if you skip the extra field you can
>> just
>> do:
>> new_size = GPOINTER_TO_UINT(user_data);
>>
>
> If we can avoid managing allocated vs real size, yes.
>
>>> +}
>>> +
>>> +static void plugin_grow_scoreboards__locked(CPUState *cpu)
>>> +{
>>> + if (cpu->cpu_index < plugin.scoreboard_size) {
>>> + return;
>>> + }
>>> +
>>> + bool need_realloc = FALSE;
>>> + while (cpu->cpu_index >= plugin.scoreboard_alloc_size) {
>>> + plugin.scoreboard_alloc_size *= 2;
>> I suspect for USER this makes sense, if we every end up doing system
>> expansion then it would just be +1 at a time. In fact given glib does:
>> gsize want_alloc = g_nearest_pow (g_array_elt_len (array,
>> want_len));
>> maybe we just do a simple increment and leave it up glib to handle
>> the
>> exponential growth?
>>
>
> Initially, I wanted to go this way, but it seems like glib does not
> offer alloc_size information about a GArray. I looked for this in
> documentation and doc, and maybe missed it.
>
> If you have an idea to detect this, I would love to just use it, and
> it would made the code much more simple, avoiding the need to keep
> track of two sizes.
>
> The problem we try to solve is to detect when the pointer will be
> reallocated, with the consequence that all tb must be flushed and
> retranslated to use the new one instead.
>
>>> + need_realloc = TRUE;
>>> + }
>>> + plugin.scoreboard_size = cpu->cpu_index + 1;
>>> + g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
>>> +
>>> + if (g_hash_table_size(plugin.scoreboards) == 0) {
>>> + /* nothing to do, we just updated sizes for future scoreboards */
>>> + return;
>>> + }
>>> +
>>> + if (need_realloc) {
>>> +#ifdef CONFIG_USER_ONLY
>>> + /**
>>> + * cpus must be stopped, as some tb might still use an existing
>>> + * scoreboard.
>>> + */
>>> + start_exclusive();
>>> +#endif
>> Hmm this seems wrong to be USER_ONLY. While we don't expect to
>> resize in
>> system mode if we did we certainly want to do it during exclusive
>> periods.
>>
>
> I agree. Initially did this for both (system and user), but in system,
> we hit a segfault.
>
> $ ./build/qemu-system-x86_64 -plugin ./build/tests/plugin/libinline.so
> -smp 17 # > 16 will trigger a reallocation of scoreboard
> ../cpu-common.c:196:20: runtime error: member access within null
> pointer of type 'struct CPUState'
>
> It seems like something is not set yet at this time (current_cpu TLS
> var), which result in this problem. Maybe this code simply reveals an
> existing problem, what do you think?
>
>>> + }
>>> +
>>> + struct resize_scoreboard_args args = {
>>> + .new_alloc_size = plugin.scoreboard_alloc_size,
>>> + .new_size = plugin.scoreboard_size
>>> + };
>>> + g_hash_table_foreach(plugin.scoreboards,
>>> + &plugin_resize_one_scoreboard,
>>> + &args);
>> This can just be g_hash_table_foreach(plugin.scoreboards,
>> &plugin_resize_one_scoreboard,
>> GUINT_TO_POINTER(plugin.scoreboard_alloc_size))
>>
>
> If we use a single size, yes.
>
>>> +
>>> + if (need_realloc) {
>>> + /* force all tb to be flushed, as scoreboard pointers were changed. */
>>> + tb_flush(cpu);
>>> +#ifdef CONFIG_USER_ONLY
>>> + end_exclusive();
>>> +#endif
>>> + }
>>> +}
>>> +
>>> void qemu_plugin_vcpu_init_hook(CPUState *cpu)
>>> {
>>> bool success;
>>> @@ -218,6 +283,7 @@ void qemu_plugin_vcpu_init_hook(CPUState *cpu)
>>> success = g_hash_table_insert(plugin.cpu_ht, &cpu->cpu_index,
>>> &cpu->cpu_index);
>>> g_assert(success);
>>> + plugin_grow_scoreboards__locked(cpu);
>>> qemu_rec_mutex_unlock(&plugin.lock);
>>> plugin_vcpu_cb__simple(cpu, QEMU_PLUGIN_EV_VCPU_INIT);
>>> @@ -576,8 +642,39 @@ static void __attribute__((__constructor__)) plugin_init(void)
>>> qemu_rec_mutex_init(&plugin.lock);
>>> plugin.id_ht = g_hash_table_new(g_int64_hash, g_int64_equal);
>>> plugin.cpu_ht = g_hash_table_new(g_int_hash, g_int_equal);
>>> + plugin.scoreboards = g_hash_table_new(g_int64_hash, g_int64_equal);
>>> + plugin.scoreboard_size = 1;
>>> + plugin.scoreboard_alloc_size = 16; /* avoid frequent reallocation */
>>> QTAILQ_INIT(&plugin.ctxs);
>>> qht_init(&plugin.dyn_cb_arr_ht, plugin_dyn_cb_arr_cmp, 16,
>>> QHT_MODE_AUTO_RESIZE);
>>> atexit(qemu_plugin_atexit_cb);
>>> }
>>> +
>>> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size)
>>> +{
>>> + struct qemu_plugin_scoreboard *score =
>>> + g_malloc0(sizeof(struct qemu_plugin_scoreboard));
>>> + score->data = g_array_new(FALSE, TRUE, element_size);
>>> + g_array_set_size(score->data, plugin.scoreboard_alloc_size);
>>> + score->size = plugin.scoreboard_size;
>>> + score->element_size = element_size;
>>> +
>>> + qemu_rec_mutex_lock(&plugin.lock);
>>> + bool inserted = g_hash_table_insert(plugin.scoreboards, score, score);
>>> + g_assert(inserted);
>>> + qemu_rec_mutex_unlock(&plugin.lock);
>>> +
>>> + return score;
>>> +}
>>> +
>>> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
>>> +{
>>> + qemu_rec_mutex_lock(&plugin.lock);
>>> + bool removed = g_hash_table_remove(plugin.scoreboards, score);
>>> + g_assert(removed);
>>> + qemu_rec_mutex_unlock(&plugin.lock);
>>> +
>>> + g_array_free(score->data, TRUE);
>>> + g_free(score);
>>> +}
>>> diff --git a/plugins/plugin.h b/plugins/plugin.h
>>> index 2c278379b70..99829c40886 100644
>>> --- a/plugins/plugin.h
>>> +++ b/plugins/plugin.h
>>> @@ -31,6 +31,10 @@ struct qemu_plugin_state {
>>> * but with the HT we avoid adding a field to CPUState.
>>> */
>>> GHashTable *cpu_ht;
>>> + /* Scoreboards, indexed by their addresses. */
>>> + GHashTable *scoreboards;
>>> + size_t scoreboard_size;
>>> + size_t scoreboard_alloc_size;
>>> DECLARE_BITMAP(mask, QEMU_PLUGIN_EV_MAX);
>>> /*
>>> * @lock protects the struct as well as ctx->uninstalling.
>>> @@ -99,4 +103,8 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
>>> void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int
>>> cpu_index);
>>> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t
>>> element_size);
>>> +
>>> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
>>> +
>>> #endif /* PLUGIN_H */
>>> diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
>>> index 6963585c1ea..93866d1b5f2 100644
>>> --- a/plugins/qemu-plugins.symbols
>>> +++ b/plugins/qemu-plugins.symbols
>>> @@ -38,10 +38,16 @@
>>> qemu_plugin_register_vcpu_tb_exec_inline;
>>> qemu_plugin_register_vcpu_tb_trans_cb;
>>> qemu_plugin_reset;
>>> + qemu_plugin_scoreboard_free;
>>> + qemu_plugin_scoreboard_get;
>>> + qemu_plugin_scoreboard_new;
>>> + qemu_plugin_scoreboard_size;
>>> qemu_plugin_start_code;
>>> qemu_plugin_tb_get_insn;
>>> qemu_plugin_tb_n_insns;
>>> qemu_plugin_tb_vaddr;
>>> + qemu_plugin_u64_get;
>>> + qemu_plugin_u64_sum;
>>> qemu_plugin_uninstall;
>>> qemu_plugin_vcpu_for_each;
>>> };
>>
>
> Thanks for your complete and detailed review!
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 02/14] plugins: scoreboard API
2024-01-30 10:23 ` Alex Bennée
@ 2024-01-30 11:10 ` Pierrick Bouvier
0 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-30 11:10 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/30/24 14:23, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> On 1/26/24 19:14, Alex Bennée wrote:
>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>
>>>> We introduce a cpu local storage, automatically managed (and extended)
>>>> by QEMU itself. Plugin allocate a scoreboard, and don't have to deal
>>>> with how many cpus are launched.
>>>>
>>>> This API will be used by new inline functions but callbacks can benefit
>>>> from this as well. This way, they can operate without a global lock for
>>>> simple operations.
>>>>
>>>> New functions:
>>>> - qemu_plugin_scoreboard_free
>>>> - qemu_plugin_scoreboard_get
>>>> - qemu_plugin_scoreboard_new
>>>> - qemu_plugin_scoreboard_size
>>>>
>>>> In more, we define a qemu_plugin_u64_t, which is a simple struct holding
>>>> a pointer to a scoreboard, and a given offset.
>>>> This allows to have a scoreboard containing structs, without having to
>>>> bring offset for all operations on a specific field.
>>>>
>>>> Since most of the plugins are simply collecting a sum of per-cpu values,
>>>> qemu_plugin_u64_t directly support this operation as well.
>>>>
>>>> New functions:
>>>> - qemu_plugin_u64_get
>>>> - qemu_plugin_u64_sum
>>>> New macros:
>>>> - qemu_plugin_u64
>>>> - qemu_plugin_u64_struct
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> ---
>>>> include/qemu/plugin.h | 7 +++
>>>> include/qemu/qemu-plugin.h | 75 ++++++++++++++++++++++++++++
>>>> plugins/api.c | 39 +++++++++++++++
>>>> plugins/core.c | 97 ++++++++++++++++++++++++++++++++++++
>>>> plugins/plugin.h | 8 +++
>>>> plugins/qemu-plugins.symbols | 6 +++
>>>> 6 files changed, 232 insertions(+)
>>>>
>>>> diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
>>>> index 9346249145d..5f340192e56 100644
>>>> --- a/include/qemu/plugin.h
>>>> +++ b/include/qemu/plugin.h
>>>> @@ -115,6 +115,13 @@ struct qemu_plugin_insn {
>>>> bool mem_only;
>>>> };
>>>> +/* A scoreboard is an array of values, indexed by vcpu_index */
>>>> +struct qemu_plugin_scoreboard {
>>>> + GArray *data;
>>>> + size_t size;
>>> Can we get size from GArray->len itself?
>>>
>>
>> GArray->len matches the allocated size, which is different from
>> "semantic" size. I'll answer to why it was implemented this way in a
>> next comment.
>
> Does it? The documentation states:
>
> guint len;
>
> the number of elements in the GArray not including the possible terminating zero element.
>
> I think the actual allocated size is hidden from you by glib as an
> implementation detail.
>
Sorry, I meant it in the context of a scoreboard, where it matches the
"allocated" size (given our exponential growth strategy). Indeed, in
glib the currently allocated size is a detail private to the
implementation.
Ideally, the internal "g_array_maybe_expand" function would be exposed
in the public GArray API, but that's not the case today.
https://github.com/GNOME/glib/blob/5f345a265373deff10bd118b7205fb162268eab2/glib/garray.c#L119
So beyond tracking both sizes ourselves, I don't see another solution.
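To illustrate, the bookkeeping we end up doing ourselves is roughly
(simplified from plugin_grow_scoreboards__locked in this patch):

    bool need_realloc = false;
    while (cpu->cpu_index >= plugin.scoreboard_alloc_size) {
        /* mirror the exponential growth GArray does internally */
        plugin.scoreboard_alloc_size *= 2;
        need_realloc = true;
    }
    plugin.scoreboard_size = cpu->cpu_index + 1;
    if (need_realloc) {
        /* the GArray buffer moved: translated code holds stale
         * pointers, so all tb must be flushed */
    }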
>>>> + size_t element_size;
>>>> +};
>>>> +
>>>> /*
>>>> * qemu_plugin_insn allocate and cleanup functions. We don't expect to
>>>> * cleanup many of these structures. They are reused for each fresh
>>>> diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
>>>> index 2c1930e7e45..934059d64c2 100644
>>>> --- a/include/qemu/qemu-plugin.h
>>>> +++ b/include/qemu/qemu-plugin.h
>>>> @@ -220,6 +220,23 @@ void qemu_plugin_register_vcpu_resume_cb(qemu_plugin_id_t id,
>>>> struct qemu_plugin_tb;
>>>> /** struct qemu_plugin_insn - Opaque handle for a translated instruction */
>>>> struct qemu_plugin_insn;
>>>> +/**
>>>> + * struct qemu_plugin_scoreboard - Opaque handle for a scoreboard
>>>> + *
>>>> + * A scoreboard is an array of data, indexed by vcpu_index.
>>>> + **/
>>> stray *'s - I think this is what trips up kdoc.
>>>
>>
>> Thanks, fixed that in a new version. I now build with documentation
>> enabled, so should not happen again, sorry.
>>
>>>> +struct qemu_plugin_scoreboard;
>>>> +
>>>> +/**
>>>> + * qemu_plugin_u64_t - uint64_t member of an entry in a scoreboard
>>> We generally reserve lower_case_with_underscores_ending_with_a_t for
>>> scalar types. Its also a little generic. Maybe qemu_plugin_scoreboard_ref?
>>>
>>
>> For _t suffix, I hesitated on this, and followed the qemu_info_t
>> struct example in the same header.
>
> We are nothing if not inconsistent in the code base ;-) There are a
> number of _t types that in retrospect shouldn't have been.
>
Sure, no problem, I'll remove the _t.
>> In the beginning, I picked qemu_plugin_scoreboard_u64, but realized
>> the name was far too long. qemu_plugin_u64 seemed like the right spot
>> between length/meaning.
>> I would like to defend the u64 meaning, because it was explicitely
>> made to express u64 type, instead of a generic scoreboard entry/ref.
>>
>> In more, based on other comments made, maybe it was not clear that a
>> qemu_plugin_u64 does not necessarily point to a "whole" entry in
>> scoreboard, but it can target a struct member (u64) in a given entry
>> as well. (i.e. scoreboard[i].member). I'll give a more detailed answer
>> below in this message.
>
> If we just return values rather than pointers we can stick with simpler
> types.
>
Initially, I did this. The problem is that you then need more entry
points when working on a value:
- qemu_plugin_u64_set
- probably a qemu_plugin_u64_inc(uint64_t val), because
  qemu_plugin_u64_set(qemu_plugin_u64_get(counter) + 1) is a bit verbose
  for me.
Moreover, it's inconsistent to have qemu_plugin_scoreboard_get return a
pointer while qemu_plugin_u64_get returns a value. I tried to follow the
GArray semantics for the get equivalent, which gives address-based
access instead of a value.
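To make the comparison concrete, a rough sketch (qemu_plugin_u64_set and
qemu_plugin_u64_inc are hypothetical names, not part of this series):

    /* pointer-based access, as proposed */
    (*qemu_plugin_u64_get(counter, cpu_index))++;

    /* value-based alternative */
    qemu_plugin_u64_set(counter, cpu_index,
                        qemu_plugin_u64_get(counter, cpu_index) + 1);
    /* or, with a dedicated increment helper */
    qemu_plugin_u64_inc(counter, cpu_index, 1);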
>> This way, scoreboard can be composed of struct values, and a
>> qemu_plugin_u64 allows to work on a specific member of that struct
>> without having to manipulate offset everywhere. Thus the get and sum
>> operation on this type.
>
> I would have to see an example of why giving this indirection to the
> plugin makes things easier.
>
The first version I wrote had only a scoreboard interface, and several
things quickly turned out to be verbose and/or missing.
For concrete examples, I would suggest looking at the plugins rewritten
with this in the series, [08/14] bb.c for instance.
Among the things that pushed me towards this, if it helps to convince:
- having a scoreboard of structs is really useful, and lets plugins skip
allocation/locking problems while manipulating data more complex than
an array of integers. I hope we agree on this.
- since qemu_plugin_scoreboard_get has to return a void *, you need to
declare a (typed) variable or write a cast every time you just want to
access a value.
- the sum is still a problem you have to solve; after typing the same
function three times across plugins, it looks like something that should
be part of the interface. Having a sum(ptr, offset, element_size)
interface is really error prone.
- when registering inline ops, you would need to pass an extra
offsetof(MyStruct, member) parameter, which leaves more room for
mistakes compared to having a single place where you define a
qemu_plugin_u64_t and manipulate it everywhere.
- the inline op registration functions already take a lot of parameters;
adding a new "offset" makes them heavy, especially when it's just 0.
- it leaves the possibility to extend this to types other than u64 in
the future without going through the "void *" strategy.
- it favors a strongly typed design where possible.
Finally, I would add that it's not mandatory to have a global static
variable of this type. You can simply register an inline op using
qemu_plugin_u64(score). It's useful when you want to manipulate a
precise value and get its sum over all cpus.
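For instance (sketch using the API from this series):

    /* scoreboard of plain uint64_t, no wrapping struct needed */
    struct qemu_plugin_scoreboard *score =
        qemu_plugin_scoreboard_new(sizeof(uint64_t));

    /* at translation time */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64, qemu_plugin_u64(score), 1);

    /* at exit */
    uint64_t total = qemu_plugin_u64_sum(qemu_plugin_u64(score));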
>>
>> Does it makes more sense to you given this?
>>
>>>> + *
>>>> + * This field allows to access a specific uint64_t member in one given entry,
>>>> + * located at a specified offset. Inline operations expect this as entry.
>>>> + */
>>>> +typedef struct qemu_plugin_u64 {
>>> > I don't think you need the forward declaration here (which clashes later
>>> on).
>>>
>>
>> Initially wrote this without a typedef, and left this afterwards. Will
>> remove it, thanks!
>>
>>>> + struct qemu_plugin_scoreboard *score;
>>>> + size_t offset;
>>>> +} qemu_plugin_u64_t;
>>>> /**
>>>> * enum qemu_plugin_cb_flags - type of callback
>>>> @@ -754,5 +771,63 @@ int qemu_plugin_read_register(unsigned int vcpu,
>>>> struct qemu_plugin_register *handle,
>>>> GByteArray *buf);
>>>> +/**
>>>> + * qemu_plugin_scoreboard_new() - alloc a new scoreboard
>>>> + *
>>>> + * Returns a pointer to a new scoreboard. It must be freed using
>>>> + * qemu_plugin_scoreboard_free.
>>>> + */
>>>> +QEMU_PLUGIN_API
>>>> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size);
>>>> +
>>>> +/**
>>>> + * qemu_plugin_scoreboard_free() - free a scoreboard
>>>> + * @score: scoreboard to free
>>>> + */
>>>> +QEMU_PLUGIN_API
>>>> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
>>>> +
>>>> +/**
>>>> + * qemu_plugin_scoreboard_size() - return size of a scoreboard
>>>> + * @score: scoreboard to query
>>>> + */
>>>> +QEMU_PLUGIN_API
>>>> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard
>>>> *score);
>>> Is this actually host memory size related? The use case in the
>>> plugins
>>> seems to be a proxy for num_cpus and I'm note sure we want it used that
>>> way. We are making the contract that it will always be big enough for
>>> the number of vcpus created - maybe that is the helper we need?
>>>
>>
>> The current API has a limitation in user mode, where you can't know
>> how many cpus were run overall. The point of scoreboard was to
>> abstract this away, by removing the need for plugins to track this
>> (and use lock, manual memory allocation, and glue code in general).
>
> Agreed.
>
>> I thought it was more clear to write:
>> for (int i = 0; i < qemu_plugin_scoreboard_size(s); ++i) {
>> void *val = qemu_plugin_scoreboard_get(s, i);
>> }
>> than:
>> for (int i = 0; i < qemu_plugin_max_cpus(s); ++i) {
>> void *val = qemu_plugin_scoreboard_get(s, i);
>> }
>>
>> We can always replace qemu_plugin_scoreboard_size by
>> qemu_plugin_max_cpus (or another name you prefer), which would return
>> the max number of cpus that were used during all the execution, if
>> that works for you.
>
> I think that works, do we know it even if we haven't allocated any
> scoreboards?
>
Excellent point; indeed, in that case you don't have access to this
information (scoreboard_size takes a scoreboard as parameter). That
would be a good argument for having a max_cpus function instead. Would
that name work for you?
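i.e. something like (hypothetical prototype, name still open):

    /**
     * qemu_plugin_max_cpus() - max number of vcpus used during execution
     */
    QEMU_PLUGIN_API
    unsigned int qemu_plugin_max_cpus(void);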
>
>>
>>>> +
>>>> +/**
>>>> + * qemu_plugin_scoreboard_get() - access value from a scoreboard
>>>> + * @score: scoreboard to query
>>>> + * @vcpu_index: entry index
>>>> + *
>>>> + * Returns address of entry of a scoreboard matching a given vcpu_index. This
>>>> + * address can be modified later if scoreboard is resized.
>>>> + */
>>>> +QEMU_PLUGIN_API
>>>> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
>>>> + unsigned int vcpu_index);
>>>> +
>>>> +/* Macros to define a qemu_plugin_u64_t */
>>>> +#define qemu_plugin_u64(score) \
>>>> + (qemu_plugin_u64_t){score, 0}
>>>> +#define qemu_plugin_u64_struct(score, type, member) \
>>>> + (qemu_plugin_u64_t){score, offsetof(type, member)}
>>> qemu_plugin_val_ref and qemu_plugin_struct_ref?
>>>
>>
>> Same comment as before, it would be more clear to keep the u64 in the
>> name if possible, so we know which data type we manipulate.
>>
>>>> +
>>>> +/**
>>>> + * qemu_plugin_u64_get() - access specific uint64_t in a scoreboard
>>>> + * @entry: entry to query
>>>> + * @vcpu_index: entry index
>>>> + *
>>>> + * Returns address of a specific member in a scoreboard entry, matching a given
>>>> + * vcpu_index.
>>>> + */
>>>> +QEMU_PLUGIN_API
>>>> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry, unsigned int vcpu_index);
>>>> +
>>>> +/**
>>>> + * qemu_plugin_u64_sum() - return sum of all values in a scoreboard
>>>> + * @entry: entry to sum
>>>> + */
>>>> +QEMU_PLUGIN_API
>>>> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry);
>>>> #endif /* QEMU_QEMU_PLUGIN_H */
>>>> diff --git a/plugins/api.c b/plugins/api.c
>>>> index e777eb4e9fc..4de94e798c6 100644
>>>> --- a/plugins/api.c
>>>> +++ b/plugins/api.c
>>>> @@ -547,3 +547,42 @@ static void __attribute__((__constructor__)) qemu_api_init(void)
>>>>      qemu_mutex_init(&reg_handle_lock);
>>>> }
>>>> +
>>>> +struct qemu_plugin_scoreboard *qemu_plugin_scoreboard_new(size_t element_size)
>>>> +{
>>>> + return plugin_scoreboard_new(element_size);
>>>> +}
>>>> +
>>>> +void qemu_plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
>>>> +{
>>>> + plugin_scoreboard_free(score);
>>>> +}
>>>> +
>>>> +size_t qemu_plugin_scoreboard_size(struct qemu_plugin_scoreboard *score)
>>>> +{
>>>> + return score->size;
>>> So this would be score->data->len
>>>
>>>> +}
>>>> +
>>>> +void *qemu_plugin_scoreboard_get(struct qemu_plugin_scoreboard *score,
>>>> + unsigned int vcpu_index)
>>>> +{
>>>> + g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
>>>> + char *ptr = score->data->data;
>>>> + return ptr + vcpu_index * score->element_size;
>>> GArray has accesses functions for this:
>>> return &g_array_index(score->data,
>>> g_array_get_element_size(score->data),
>>> vcpu_index);
>>>
>>
>> Yes, I'll update this.
>>
>>>> +}
>>>> +
>>>> +uint64_t *qemu_plugin_u64_get(qemu_plugin_u64_t entry,
>>>> + unsigned int vcpu_index)
>>>> +{
>>>> + char *ptr = (char *) qemu_plugin_scoreboard_get(entry.score, vcpu_index);
>>>> + return (uint64_t *)(ptr + entry.offset);
>>> Not sure whats going with the casting here but I wonder if we are
>>> giving
>>> the user too much rope. Why not to return the value itself? In fact why
>>> do we treat things as an offset rather than an index of uint64_t.
>>> The qemu_plugin_scoreboard_get can return uint64_t * and the
>>> individual
>>> get becomes:
>>> uint64_t *qemu_plugin_scoreboard_get(struct
>>> qemu_plugin_scoreboard *score,
>>> unsigned int vcpu_index)
>>> {
>>> g_assert(vcpu_index < qemu_plugin_scoreboard_size(score));
>>> return &g_array_index(score->data, score->element_size, vcpu_index);
>>> }
>>> uint64_t qemu_plugin_u64_get(struct qemu_plugin_scoreboard
>>> *score,
>>> unsigned int vcpu_index, unsigned int score_index)
>>> {
>>> g_assert(score_index < score->num_elements);
>>> uint64_t *sb = qemu_plugin_scoreboard_get(score, vcpu_index);
>>> return sb[score_index];
>>> }
>>>
>>
>> As said above, maybe it was not clear in the commit (though expressed
>> in commit message) what was a distinction between a scoreboard, and a
>> plugin_u64.
>>
>> The latter is an abstraction of a specific member in one scoreboard
>> entry, while a scoreboard holds any type.
>>
>> Let me write a concrete example, which will be clearer:
>>
>> typedef struct {
>> char data1;
>> bool data2;
>> void* data3;
>> enum ...;
>> uint64_t counter;
>> } CPUData;
>>
>> score = qemu_plugin_scoreboard_new(sizeof(CPUData));
>> qemu_plugin_u64_t counter =
>> qemu_plugin_u64_struct(score, CPUData, counter));
>>
>> From there:
>> qemu_plugin_u64_get(counter, 2) returns &score[2].counter.
>> qemu_plugin_u64_sum(counter) returns
>> sum(i: [0:max_cpus[, score[i].counter).
>>
>> With this, a plugin can simply have an abstract representation of cpu
>> local value, than can be get or set (since we return a pointer with
>> get) and easily summed, which is what most of the plugins want to do
>> in the end (at least from what I observed in existing ones).
>>
>> If we stick to a void* scoreboard, we end up writing the sum function
>> in every plugin. In more, every API function "inline" operating on a
>> scoreboard would need the offsetof(CPUData, counter) passed in
>> parameter, which is error prone and repetitive.
>>
>> With this explaination, do you see more the value to have a scoreboard
>> *and* a plugin_u64 struct to access it?
>>
>>>> +}
>>>> +
>>>> +uint64_t qemu_plugin_u64_sum(qemu_plugin_u64_t entry)
>>>> +{
>>>> + uint64_t total = 0;
>>>> + for (int i = 0; i < qemu_plugin_scoreboard_size(entry.score); ++i) {
>>>> + total += *qemu_plugin_u64_get(entry, i);
>>>> + }
>>>> + return total;
>>>> +}
>>>> diff --git a/plugins/core.c b/plugins/core.c
>>>> index e07b9cdf229..0286a127810 100644
>>>> --- a/plugins/core.c
>>>> +++ b/plugins/core.c
>>>> @@ -209,6 +209,71 @@ plugin_register_cb_udata(qemu_plugin_id_t id, enum qemu_plugin_event ev,
>>>> do_plugin_register_cb(id, ev, func, udata);
>>>> }
>>>> +struct resize_scoreboard_args {
>>>> + size_t new_alloc_size;
>>>> + size_t new_size;
>>>> +};
>>>> +
>>>> +static void plugin_resize_one_scoreboard(gpointer key,
>>>> + gpointer value,
>>>> + gpointer user_data)
>>>> +{
>>>> + struct qemu_plugin_scoreboard *score =
>>>> + (struct qemu_plugin_scoreboard *) value;
>>>> + struct resize_scoreboard_args *args =
>>>> + (struct resize_scoreboard_args *) user_data;
>>>> + if (score->data->len != args->new_alloc_size) {
>>>> + g_array_set_size(score->data, args->new_alloc_size);
>>>> + }
>>>> + score->size = args->new_size;
>>> I think you have this in len so if you skip the extra field you can
>>> just
>>> do:
>>> new_size = GPOINTER_TO_UINT(user_data);
>>>
>>
>> If we can avoid managing allocated vs real size, yes.
>>
>>>> +}
>>>> +
>>>> +static void plugin_grow_scoreboards__locked(CPUState *cpu)
>>>> +{
>>>> + if (cpu->cpu_index < plugin.scoreboard_size) {
>>>> + return;
>>>> + }
>>>> +
>>>> + bool need_realloc = FALSE;
>>>> + while (cpu->cpu_index >= plugin.scoreboard_alloc_size) {
>>>> + plugin.scoreboard_alloc_size *= 2;
>>> I suspect for USER this makes sense, if we every end up doing system
>>> expansion then it would just be +1 at a time. In fact given glib does:
>>> gsize want_alloc = g_nearest_pow (g_array_elt_len (array,
>>> want_len));
>>> maybe we just do a simple increment and leave it up glib to handle
>>> the
>>> exponential growth?
>>>
>>
>> Initially, I wanted to go this way, but it seems like glib does not
>> offer alloc_size information about a GArray. I looked for this in
>> documentation and doc, and maybe missed it.
>>
>> If you have an idea to detect this, I would love to just use it, and
>> it would made the code much more simple, avoiding the need to keep
>> track of two sizes.
>>
>> The problem we try to solve is to detect when the pointer will be
>> reallocated, with the consequence that all tb must be flushed and
>> retranslated to use the new one instead.
>>
>>>> + need_realloc = TRUE;
>>>> + }
>>>> + plugin.scoreboard_size = cpu->cpu_index + 1;
>>>> + g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
>>>> +
>>>> + if (g_hash_table_size(plugin.scoreboards) == 0) {
>>>> + /* nothing to do, we just updated sizes for future scoreboards */
>>>> + return;
>>>> + }
>>>> +
>>>> + if (need_realloc) {
>>>> +#ifdef CONFIG_USER_ONLY
>>>> + /**
>>>> + * cpus must be stopped, as some tb might still use an existing
>>>> + * scoreboard.
>>>> + */
>>>> + start_exclusive();
>>>> +#endif
>>> Hmm this seems wrong to be USER_ONLY. While we don't expect to
>>> resize in
>>> system mode if we did we certainly want to do it during exclusive
>>> periods.
>>>
>>
>> I agree. Initially did this for both (system and user), but in system,
>> we hit a segfault.
>>
>> $ ./build/qemu-system-x86_64 -plugin ./build/tests/plugin/libinline.so
>> -smp 17 # > 16 will trigger a reallocation of scoreboard
>> ../cpu-common.c:196:20: runtime error: member access within null
>> pointer of type 'struct CPUState'
>>
>> It seems like something is not set yet at this time (current_cpu TLS
>> var), which result in this problem. Maybe this code simply reveals an
>> existing problem, what do you think?
>>
>>>> + }
>>>> +
>>>> + struct resize_scoreboard_args args = {
>>>> + .new_alloc_size = plugin.scoreboard_alloc_size,
>>>> + .new_size = plugin.scoreboard_size
>>>> + };
>>>> + g_hash_table_foreach(plugin.scoreboards,
>>>> + &plugin_resize_one_scoreboard,
>>>> + &args);
>>> This can just be g_hash_table_foreach(plugin.scoreboards,
>>> &plugin_resize_one_scoreboard,
>>> GUINT_TO_POINTER(plugin.scoreboard_alloc_size))
>>>
>>
>> If we use a single size, yes.
>>
>>>> +
>>>> + if (need_realloc) {
>>>> + /* force all tb to be flushed, as scoreboard pointers were changed. */
>>>> + tb_flush(cpu);
>>>> +#ifdef CONFIG_USER_ONLY
>>>> + end_exclusive();
>>>> +#endif
>>>> + }
>>>> +}
>>>> +
>>>> void qemu_plugin_vcpu_init_hook(CPUState *cpu)
>>>> {
>>>> bool success;
>>>> @@ -218,6 +283,7 @@ void qemu_plugin_vcpu_init_hook(CPUState *cpu)
>>>> success = g_hash_table_insert(plugin.cpu_ht, &cpu->cpu_index,
>>>> &cpu->cpu_index);
>>>> g_assert(success);
>>>> + plugin_grow_scoreboards__locked(cpu);
>>>> qemu_rec_mutex_unlock(&plugin.lock);
>>>> plugin_vcpu_cb__simple(cpu, QEMU_PLUGIN_EV_VCPU_INIT);
>>>> @@ -576,8 +642,39 @@ static void __attribute__((__constructor__)) plugin_init(void)
>>>> qemu_rec_mutex_init(&plugin.lock);
>>>> plugin.id_ht = g_hash_table_new(g_int64_hash, g_int64_equal);
>>>> plugin.cpu_ht = g_hash_table_new(g_int_hash, g_int_equal);
>>>> + plugin.scoreboards = g_hash_table_new(g_int64_hash, g_int64_equal);
>>>> + plugin.scoreboard_size = 1;
>>>> + plugin.scoreboard_alloc_size = 16; /* avoid frequent reallocation */
>>>> QTAILQ_INIT(&plugin.ctxs);
>>>> qht_init(&plugin.dyn_cb_arr_ht, plugin_dyn_cb_arr_cmp, 16,
>>>> QHT_MODE_AUTO_RESIZE);
>>>> atexit(qemu_plugin_atexit_cb);
>>>> }
>>>> +
>>>> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t element_size)
>>>> +{
>>>> + struct qemu_plugin_scoreboard *score =
>>>> + g_malloc0(sizeof(struct qemu_plugin_scoreboard));
>>>> + score->data = g_array_new(FALSE, TRUE, element_size);
>>>> + g_array_set_size(score->data, plugin.scoreboard_alloc_size);
>>>> + score->size = plugin.scoreboard_size;
>>>> + score->element_size = element_size;
>>>> +
>>>> + qemu_rec_mutex_lock(&plugin.lock);
>>>> + bool inserted = g_hash_table_insert(plugin.scoreboards, score, score);
>>>> + g_assert(inserted);
>>>> + qemu_rec_mutex_unlock(&plugin.lock);
>>>> +
>>>> + return score;
>>>> +}
>>>> +
>>>> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score)
>>>> +{
>>>> + qemu_rec_mutex_lock(&plugin.lock);
>>>> + bool removed = g_hash_table_remove(plugin.scoreboards, score);
>>>> + g_assert(removed);
>>>> + qemu_rec_mutex_unlock(&plugin.lock);
>>>> +
>>>> + g_array_free(score->data, TRUE);
>>>> + g_free(score);
>>>> +}
>>>> diff --git a/plugins/plugin.h b/plugins/plugin.h
>>>> index 2c278379b70..99829c40886 100644
>>>> --- a/plugins/plugin.h
>>>> +++ b/plugins/plugin.h
>>>> @@ -31,6 +31,10 @@ struct qemu_plugin_state {
>>>> * but with the HT we avoid adding a field to CPUState.
>>>> */
>>>> GHashTable *cpu_ht;
>>>> + /* Scoreboards, indexed by their addresses. */
>>>> + GHashTable *scoreboards;
>>>> + size_t scoreboard_size;
>>>> + size_t scoreboard_alloc_size;
>>>> DECLARE_BITMAP(mask, QEMU_PLUGIN_EV_MAX);
>>>> /*
>>>> * @lock protects the struct as well as ctx->uninstalling.
>>>> @@ -99,4 +103,8 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
>>>> void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int
>>>> cpu_index);
>>>> +struct qemu_plugin_scoreboard *plugin_scoreboard_new(size_t
>>>> element_size);
>>>> +
>>>> +void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
>>>> +
>>>> #endif /* PLUGIN_H */
>>>> diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
>>>> index 6963585c1ea..93866d1b5f2 100644
>>>> --- a/plugins/qemu-plugins.symbols
>>>> +++ b/plugins/qemu-plugins.symbols
>>>> @@ -38,10 +38,16 @@
>>>> qemu_plugin_register_vcpu_tb_exec_inline;
>>>> qemu_plugin_register_vcpu_tb_trans_cb;
>>>> qemu_plugin_reset;
>>>> + qemu_plugin_scoreboard_free;
>>>> + qemu_plugin_scoreboard_get;
>>>> + qemu_plugin_scoreboard_new;
>>>> + qemu_plugin_scoreboard_size;
>>>> qemu_plugin_start_code;
>>>> qemu_plugin_tb_get_insn;
>>>> qemu_plugin_tb_n_insns;
>>>> qemu_plugin_tb_vaddr;
>>>> + qemu_plugin_u64_get;
>>>> + qemu_plugin_u64_sum;
>>>> qemu_plugin_uninstall;
>>>> qemu_plugin_vcpu_for_each;
>>>> };
>>>
>>
>> Thanks for your complete and detailed review!
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 05/14] tests/plugin: add test plugin for inline operations
2024-01-30 7:49 ` Pierrick Bouvier
@ 2024-01-30 14:52 ` Alex Bennée
2024-01-30 16:40 ` Pierrick Bouvier
0 siblings, 1 reply; 34+ messages in thread
From: Alex Bennée @ 2024-01-30 14:52 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> On 1/26/24 20:05, Alex Bennée wrote:
>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>
>>> For now, it simply performs instruction, bb and mem count, and ensure
>>> that inline vs callback versions have the same result. Later, we'll
>>> extend it when new inline operations are added.
>>>
>>> Use existing plugins to test everything works is a bit cumbersome, as
>>> different events are treated in different plugins. Thus, this new one.
>>>
>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> ---
>>> tests/plugin/inline.c | 182 +++++++++++++++++++++++++++++++++++++++
>>> tests/plugin/meson.build | 2 +-
>>> 2 files changed, 183 insertions(+), 1 deletion(-)
>>> create mode 100644 tests/plugin/inline.c
>>>
>>> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
>>> new file mode 100644
>>> index 00000000000..28d1c3b1e48
>>> --- /dev/null
>>> +++ b/tests/plugin/inline.c
>>> @@ -0,0 +1,182 @@
>>> +/*
>>> + * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> + *
>>> + * Demonstrates and tests usage of inline ops.
>>> + *
>>> + * License: GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>> + */
>>> +
>>> +#include <glib.h>
>>> +#include <stdint.h>
>>> +#include <stdio.h>
>>> +
>>> +#include <qemu-plugin.h>
>>> +
>>> +typedef struct {
>>> + uint64_t count_tb;
>>> + uint64_t count_tb_inline;
>>> + uint64_t count_insn;
>>> + uint64_t count_insn_inline;
>>> + uint64_t count_mem;
>>> + uint64_t count_mem_inline;
>>> +} CPUCount;
>> I wonder if there is any way to enforce the structures being an
>> array of
>> 64 bit counts? I do worry the compiler might want day decide to do
>> something silly but legal leading to confusion.
>> I guess qemu_plugin_scoreboard_new could:
>> g_assert((element_size % sizeof(uint64_t)) == 0)
>>
>
> Given explaination on patch [02/14], do you see more that CPUCount
> could hold any type, and qemu_plugin_u64 allows to target a specific
> member in it?
>
> In general, qemu plugin runtime simply is given an offset and total
> size of the struct, so a compiled plugin can have any
> optimization/padding on this struct without this affecting the result.
>
>> ?
>>
>>> +static qemu_plugin_u64_t count_tb;
>>> +static qemu_plugin_u64_t count_tb_inline;
>>> +static qemu_plugin_u64_t count_insn;
>>> +static qemu_plugin_u64_t count_insn_inline;
>>> +static qemu_plugin_u64_t count_mem;
>>> +static qemu_plugin_u64_t count_mem_inline;
>> Can't this just be a non scoreboard instance of CPUCount?
>>
>
> We could always use:
> CPUCount* count = qemu_plugin_scoreboard_get(score, i);
> count->count_tb++;
I thought these were globals protected by a lock? I wasn't suggesting
hiding the abstraction behind the scoreboard API.
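i.e. something like this (sketch, only showing the global counter side):

    static CPUCount global_count;
    static GMutex lock;

    static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
    {
        g_mutex_lock(&lock);
        global_count.count_tb++;
        g_mutex_unlock(&lock);
    }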
>
> However, the part where we sum all values would now need some glue
> code, which is longer and more error prone than having those
> definition here (and declaration in install function).
>
>>> +
>>> +static uint64_t global_count_tb;
>>> +static uint64_t global_count_insn;
>>> +static uint64_t global_count_mem;
>>> +static unsigned int max_cpu_index;
>>> +static GMutex tb_lock;
>>> +static GMutex insn_lock;
>>> +static GMutex mem_lock;
>>> +
>>> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>> +
>>> +static void stats_insn(void)
>>> +{
>>> + const uint64_t expected = global_count_insn;
>>> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_insn);
>>> + const uint64_t inl_per_vcpu =
>>> + qemu_plugin_u64_sum(count_insn_inline);
>>> + printf("insn: %" PRIu64 "\n", expected);
>>> + printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>> + printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>> + g_assert(expected > 0);
>>> + g_assert(per_vcpu == expected);
>>> + g_assert(inl_per_vcpu == expected);
>>> +}
>>> +
>>> +static void stats_tb(void)
>>> +{
>>> + const uint64_t expected = global_count_tb;
>>> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_tb);
>>> + const uint64_t inl_per_vcpu =
>>> + qemu_plugin_u64_sum(count_tb_inline);
>>> + printf("tb: %" PRIu64 "\n", expected);
>>> + printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>> + printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>> + g_assert(expected > 0);
>>> + g_assert(per_vcpu == expected);
>>> + g_assert(inl_per_vcpu == expected);
>>> +}
>>> +
>>> +static void stats_mem(void)
>>> +{
>>> + const uint64_t expected = global_count_mem;
>>> + const uint64_t per_vcpu = qemu_plugin_u64_sum(count_mem);
>>> + const uint64_t inl_per_vcpu =
>>> + qemu_plugin_u64_sum(count_mem_inline);
>>> + printf("mem: %" PRIu64 "\n", expected);
>>> + printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>> + printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>> + g_assert(expected > 0);
>>> + g_assert(per_vcpu == expected);
>>> + g_assert(inl_per_vcpu == expected);
>>> +}
>>> +
>>> +static void plugin_exit(qemu_plugin_id_t id, void *udata)
>>> +{
>>> + const unsigned int num_cpus = qemu_plugin_scoreboard_size(counts);
>>> + g_assert(num_cpus == max_cpu_index + 1);
>>> +
>>> + for (int i = 0; i < num_cpus ; ++i) {
>>> + const uint64_t tb = *qemu_plugin_u64_get(count_tb, i);
>>> + const uint64_t tb_inline = *qemu_plugin_u64_get(count_tb_inline, i);
>>> + const uint64_t insn = *qemu_plugin_u64_get(count_insn, i);
>>> + const uint64_t insn_inline = *qemu_plugin_u64_get(count_insn_inline, i);
>>> + const uint64_t mem = *qemu_plugin_u64_get(count_mem, i);
>>> + const uint64_t mem_inline = *qemu_plugin_u64_get(count_mem_inline, i);
>>> + printf("cpu %d: tb (%" PRIu64 ", %" PRIu64 ") | "
>>> + "insn (%" PRIu64 ", %" PRIu64 ") | "
>>> + "mem (%" PRIu64 ", %" PRIu64 ")"
>>> + "\n",
>>> + i, tb, tb_inline, insn, insn_inline, mem, mem_inline);
>>> + g_assert(tb == tb_inline);
>>> + g_assert(insn == insn_inline);
>>> + g_assert(mem == mem_inline);
>>> + }
>>> +
>>> + stats_tb();
>>> + stats_insn();
>>> + stats_mem();
>>> +
>>> + qemu_plugin_scoreboard_free(counts);
>>> +}
>>> +
>>> +static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
>>> +{
>>> + (*qemu_plugin_u64_get(count_tb, cpu_index))++;
>>> + g_mutex_lock(&tb_lock);
>>> + max_cpu_index = MAX(max_cpu_index, cpu_index);
>>> + global_count_tb++;
>>> + g_mutex_unlock(&tb_lock);
>>> +}
>>> +
>>> +static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
>>> +{
>>> + (*qemu_plugin_u64_get(count_insn, cpu_index))++;
>>> + g_mutex_lock(&insn_lock);
>>> + global_count_insn++;
>>> + g_mutex_unlock(&insn_lock);
>>> +}
>>> +
>>> +static void vcpu_mem_access(unsigned int cpu_index,
>>> + qemu_plugin_meminfo_t info,
>>> + uint64_t vaddr,
>>> + void *userdata)
>>> +{
>>> + (*qemu_plugin_u64_get(count_mem, cpu_index))++;
>>> + g_mutex_lock(&mem_lock);
>>> + global_count_mem++;
>>> + g_mutex_unlock(&mem_lock);
>>> +}
>>> +
>>> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>>> +{
>>> + qemu_plugin_register_vcpu_tb_exec_cb(
>>> + tb, vcpu_tb_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
>>> + qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>>> + tb, QEMU_PLUGIN_INLINE_ADD_U64, count_tb_inline, 1);
>>> +
>>> + for (int idx = 0; idx < qemu_plugin_tb_n_insns(tb); ++idx) {
>>> + struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
>>> + qemu_plugin_register_vcpu_insn_exec_cb(
>>> + insn, vcpu_insn_exec, QEMU_PLUGIN_CB_NO_REGS, 0);
>>> + qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
>>> + insn, QEMU_PLUGIN_INLINE_ADD_U64, count_insn_inline, 1);
>>> + qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
>>> + QEMU_PLUGIN_CB_NO_REGS,
>>> + QEMU_PLUGIN_MEM_RW, 0);
>>> + qemu_plugin_register_vcpu_mem_inline_per_vcpu(
>>> + insn, QEMU_PLUGIN_MEM_RW,
>>> + QEMU_PLUGIN_INLINE_ADD_U64,
>>> + count_mem_inline, 1);
>>> + }
>>> +}
>>> +
>>> +QEMU_PLUGIN_EXPORT
>>> +int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
>>> + int argc, char **argv)
>>> +{
>>> + counts = qemu_plugin_scoreboard_new(sizeof(CPUCount));
>>> + count_tb = qemu_plugin_u64_struct(counts, CPUCount, count_tb);
>>> + count_insn = qemu_plugin_u64_struct(counts, CPUCount, count_insn);
>>> + count_mem = qemu_plugin_u64_struct(counts, CPUCount, count_mem);
>>> + count_tb_inline = qemu_plugin_u64_struct(counts, CPUCount, count_tb_inline);
>>> + count_insn_inline =
>>> + qemu_plugin_u64_struct(counts, CPUCount, count_insn_inline);
>>> + count_mem_inline =
>>> + qemu_plugin_u64_struct(counts, CPUCount, count_mem_inline);
>>> + qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>>> + qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
>>> +
>>> + return 0;
>>> +}
>>> diff --git a/tests/plugin/meson.build b/tests/plugin/meson.build
>>> index e18183aaeda..9eece5bab51 100644
>>> --- a/tests/plugin/meson.build
>>> +++ b/tests/plugin/meson.build
>>> @@ -1,6 +1,6 @@
>>> t = []
>>> if get_option('plugins')
>>> - foreach i : ['bb', 'empty', 'insn', 'mem', 'syscall']
>>> + foreach i : ['bb', 'empty', 'inline', 'insn', 'mem', 'syscall']
>>> if host_os == 'windows'
>>> t += shared_module(i, files(i + '.c') + '../../contrib/plugins/win32_linker.c',
>>> include_directories: '../../include/qemu',
>> Otherwise:
>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 05/14] tests/plugin: add test plugin for inline operations
2024-01-30 14:52 ` Alex Bennée
@ 2024-01-30 16:40 ` Pierrick Bouvier
0 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-30 16:40 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/30/24 18:52, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> On 1/26/24 20:05, Alex Bennée wrote:
>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>
>>>> For now, it simply performs instruction, bb and mem counts, and ensures
>>>> that inline vs callback versions have the same result. Later, we'll
>>>> extend it when new inline operations are added.
>>>>
>>>> Using existing plugins to test that everything works is a bit cumbersome, as
>>>> different events are handled in different plugins. Hence this new one.
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> ---
>>>> tests/plugin/inline.c | 182 +++++++++++++++++++++++++++++++++++++++
>>>> tests/plugin/meson.build | 2 +-
>>>> 2 files changed, 183 insertions(+), 1 deletion(-)
>>>> create mode 100644 tests/plugin/inline.c
>>>>
>>>> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
>>>> new file mode 100644
>>>> index 00000000000..28d1c3b1e48
>>>> --- /dev/null
>>>> +++ b/tests/plugin/inline.c
>>>> @@ -0,0 +1,182 @@
>>>> +/*
>>>> + * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> + *
>>>> + * Demonstrates and tests usage of inline ops.
>>>> + *
>>>> + * License: GNU GPL, version 2 or later.
>>>> + * See the COPYING file in the top-level directory.
>>>> + */
>>>> +
>>>> +#include <glib.h>
>>>> +#include <stdint.h>
>>>> +#include <stdio.h>
>>>> +
>>>> +#include <qemu-plugin.h>
>>>> +
>>>> +typedef struct {
>>>> + uint64_t count_tb;
>>>> + uint64_t count_tb_inline;
>>>> + uint64_t count_insn;
>>>> + uint64_t count_insn_inline;
>>>> + uint64_t count_mem;
>>>> + uint64_t count_mem_inline;
>>>> +} CPUCount;
>>> I wonder if there is any way to enforce the structures being an array of
>>> 64-bit counts? I do worry the compiler might one day decide to do
>>> something silly but legal, leading to confusion.
>>> I guess qemu_plugin_scoreboard_new could:
>>> g_assert((element_size % sizeof(uint64_t)) == 0)
>>>
>>
>> Given the explanation on patch [02/14], do you see better now that CPUCount
>> can hold any type, and that qemu_plugin_u64 allows targeting a specific
>> member in it?
>>
>> In general, the qemu plugin runtime is simply given an offset and the total
>> size of the struct, so a compiled plugin can have any
>> optimization/padding on this struct without this affecting the result.
>>
>>> ?
>>>
>>>> +static qemu_plugin_u64_t count_tb;
>>>> +static qemu_plugin_u64_t count_tb_inline;
>>>> +static qemu_plugin_u64_t count_insn;
>>>> +static qemu_plugin_u64_t count_insn_inline;
>>>> +static qemu_plugin_u64_t count_mem;
>>>> +static qemu_plugin_u64_t count_mem_inline;
>>> Can't this just be a non scoreboard instance of CPUCount?
>>>
>>
>> We could always use:
>> CPUCount* count = qemu_plugin_scoreboard_get(score, i);
>> count->count_tb++;
>
> I thought these were globals protected by a lock? I wasn't suggesting
> hiding the abstraction behind the scoreboard API.
>
The global_count_* variables are protected by a lock (and serve as reference
values to test the inline operations).
The scoreboard is used for both inline and callback (per-vcpu) operations.
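To illustrate what the runtime actually needs (rough sketch only; the helper
below is hypothetical, not the real implementation): resolving one counter for
a given vcpu only takes the scoreboard base, the element size passed to
qemu_plugin_scoreboard_new(), and the member offset captured by
qemu_plugin_u64_struct():

#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: resolve a per-vcpu slot from an element size and a
 * member offset. The real runtime keeps these in its own structures; this
 * only shows why layout/padding inside the compiled plugin does not matter. */
static inline uint64_t *u64_slot(void *score_base, size_t elem_size,
                                 size_t member_offset, unsigned int cpu_index)
{
    char *elem = (char *)score_base + (size_t)cpu_index * elem_size;
    return (uint64_t *)(elem + member_offset);
}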
>>
>> However, the part where we sum all values would now need some glue
>> code, which is longer and more error-prone than having those
>> definitions here (and their initialization in the install function).
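To make that concrete, the glue for a single counter would look something like
this (sketch only, reusing the accessors from this series), and it would have
to be repeated for each of the six members:

/* Sketch: summing one member by hand via direct scoreboard access,
 * versus a single qemu_plugin_u64_sum(count_tb) call. */
static uint64_t sum_count_tb(struct qemu_plugin_scoreboard *score)
{
    uint64_t total = 0;
    for (unsigned int i = 0; i < qemu_plugin_scoreboard_size(score); i++) {
        CPUCount *count = qemu_plugin_scoreboard_get(score, i);
        total += count->count_tb;
    }
    return total;
}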
>>
>>>> +
>>>> [... rest of patch and meson.build hunk quoted unchanged ...]
>>> Otherwise:
>>> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
>
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 02/14] plugins: scoreboard API
2024-01-26 15:14 ` Alex Bennée
2024-01-30 7:37 ` Pierrick Bouvier
@ 2024-01-31 7:44 ` Pierrick Bouvier
2024-02-01 5:28 ` Pierrick Bouvier
1 sibling, 1 reply; 34+ messages in thread
From: Pierrick Bouvier @ 2024-01-31 7:44 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/26/24 19:14, Alex Bennée wrote:
>> + need_realloc = TRUE;
>> + }
>> + plugin.scoreboard_size = cpu->cpu_index + 1;
>> + g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
>> +
>> + if (g_hash_table_size(plugin.scoreboards) == 0) {
>> + /* nothing to do, we just updated sizes for future scoreboards */
>> + return;
>> + }
>> +
>> + if (need_realloc) {
>> +#ifdef CONFIG_USER_ONLY
>> + /**
>> + * cpus must be stopped, as some tb might still use an existing
>> + * scoreboard.
>> + */
>> + start_exclusive();
>> +#endif
>
> Hmm, this seems wrong to be USER_ONLY. While we don't expect to resize in
> system mode, if we did we would certainly want to do it during exclusive
> periods.
>
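In principle, dropping the guard would mean something along these lines
(sketch only, simplified):

if (need_realloc) {
    /* Resize during an exclusive period in both user and system mode,
     * instead of guarding start_exclusive() with CONFIG_USER_ONLY. */
    start_exclusive();
    /* ... reallocate every registered scoreboard ... */
    end_exclusive();
}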
After investigation, the current_cpu TLS variable is not set in cpus-common.c
at this point.
Indeed, we are not on any cpu_exec path, but in cpu_realize_fn when this is
called (through qemu_plugin_vcpu_init_hook).
One obvious fix is to check if it's NULL or not, like:
--- a/cpu-common.c
+++ b/cpu-common.c
@@ -193,7 +193,7 @@ void start_exclusive(void)
     CPUState *other_cpu;
     int running_cpus;
 
-    if (current_cpu->exclusive_context_count) {
+    if (current_cpu && current_cpu->exclusive_context_count) {
         current_cpu->exclusive_context_count++;
         return;
     }
Can anyone suggest another possible fix? (like defining current_cpu
somewhere, or moving the qemu_plugin_vcpu_init_hook call).
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v2 02/14] plugins: scoreboard API
2024-01-31 7:44 ` Pierrick Bouvier
@ 2024-02-01 5:28 ` Pierrick Bouvier
0 siblings, 0 replies; 34+ messages in thread
From: Pierrick Bouvier @ 2024-02-01 5:28 UTC (permalink / raw)
To: Alex Bennée
Cc: qemu-devel, Mahmoud Mandour, Paolo Bonzini, Richard Henderson,
Alexandre Iooss
On 1/31/24 11:44, Pierrick Bouvier wrote:
> On 1/26/24 19:14, Alex Bennée wrote:
>>> + need_realloc = TRUE;
>>> + }
>>> + plugin.scoreboard_size = cpu->cpu_index + 1;
>>> + g_assert(plugin.scoreboard_size <= plugin.scoreboard_alloc_size);
>>> +
>>> + if (g_hash_table_size(plugin.scoreboards) == 0) {
>>> + /* nothing to do, we just updated sizes for future scoreboards */
>>> + return;
>>> + }
>>> +
>>> + if (need_realloc) {
>>> +#ifdef CONFIG_USER_ONLY
>>> + /**
>>> + * cpus must be stopped, as some tb might still use an existing
>>> + * scoreboard.
>>> + */
>>> + start_exclusive();
>>> +#endif
>>
>> Hmm, this seems wrong to be USER_ONLY. While we don't expect to resize in
>> system mode, if we did we would certainly want to do it during exclusive
>> periods.
>>
>
> After investigation, the current_cpu TLS variable is not set in cpus-common.c
> at this point.
>
> Indeed, we are not on any cpu_exec path, but in cpu_realize_fn when this is
> called (through qemu_plugin_vcpu_init_hook).
>
> One obvious fix is to check if it's NULL or not, like:
> --- a/cpu-common.c
> +++ b/cpu-common.c
> @@ -193,7 +193,7 @@ void start_exclusive(void)
>      CPUState *other_cpu;
>      int running_cpus;
> 
> -    if (current_cpu->exclusive_context_count) {
> +    if (current_cpu && current_cpu->exclusive_context_count) {
>          current_cpu->exclusive_context_count++;
>          return;
>      }
>
> Can anyone suggest another possible fix? (like defining current_cpu
> somewhere, or moving the qemu_plugin_vcpu_init_hook call).
Running the init hook asynchronously on the vcpu works and solves the problem,
without any need to modify the start/end exclusive code.
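For reference, the shape of that fix (sketch only; the exact wrapper name and
call site may differ in the next revision):

/* Sketch: defer the plugin vcpu init hook so it runs as queued work on the
 * vcpu itself (with current_cpu set), instead of being called directly from
 * the realize path. */
static void plugin_vcpu_init__async(CPUState *cpu, run_on_cpu_data unused)
{
    qemu_plugin_vcpu_init_hook(cpu);
}

/* ... and where the hook used to be called directly: */
async_run_on_cpu(cpu, plugin_vcpu_init__async, RUN_ON_CPU_NULL);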
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads: [~2024-02-01 5:29 UTC | newest]
Thread overview: 34+ messages
2024-01-18 3:23 [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 01/14] plugins: implement inline operation relative to cpu_index Pierrick Bouvier
2024-01-26 12:07 ` Alex Bennée
2024-01-29 10:10 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 02/14] plugins: scoreboard API Pierrick Bouvier
2024-01-26 15:14 ` Alex Bennée
2024-01-30 7:37 ` Pierrick Bouvier
2024-01-30 10:23 ` Alex Bennée
2024-01-30 11:10 ` Pierrick Bouvier
2024-01-31 7:44 ` Pierrick Bouvier
2024-02-01 5:28 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 03/14] docs/devel: plugins can trigger a tb flush Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 04/14] plugins: add inline operation per vcpu Pierrick Bouvier
2024-01-26 15:17 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 05/14] tests/plugin: add test plugin for inline operations Pierrick Bouvier
2024-01-26 16:05 ` Alex Bennée
2024-01-30 7:49 ` Pierrick Bouvier
2024-01-30 14:52 ` Alex Bennée
2024-01-30 16:40 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 06/14] tests/plugin/mem: migrate to new per_vcpu API Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 07/14] tests/plugin/insn: " Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 08/14] tests/plugin/bb: " Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 09/14] contrib/plugins/hotblocks: " Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 10/14] contrib/plugins/howvec: " Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 11/14] plugins: remove non per_vcpu inline operation from API Pierrick Bouvier
2024-01-26 16:26 ` Alex Bennée
2024-01-30 7:53 ` Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 12/14] plugins: register inline op with a qemu_plugin_u64_t Pierrick Bouvier
2024-01-18 3:23 ` [PATCH v2 13/14] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
2024-01-26 16:27 ` Alex Bennée
2024-01-18 3:23 ` [PATCH v2 14/14] contrib/plugins/execlog: fix new warnings Pierrick Bouvier
2024-01-26 16:31 ` Alex Bennée
2024-01-30 7:51 ` Pierrick Bouvier
2024-01-26 9:21 ` [PATCH v2 00/14] TCG Plugin inline operation enhancement Pierrick Bouvier