qemu-devel.nongnu.org archive mirror
* [PATCH 00/12] TCG Plugin inline operation enhancement
@ 2024-01-11 14:23 Pierrick Bouvier
  2024-01-11 14:23 ` [PATCH 01/12] plugins: implement inline operation with cpu_index offset Pierrick Bouvier
                   ` (11 more replies)
  0 siblings, 12 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

This series adds a new thread-safe API to declare inline operations
inside plugins. It also migrates all existing plugins to the new API
and removes the existing non-thread-safe one.

Tested on Linux (user, system) for i386, x86_64 and aarch64.

To give some context, this is part of a long-term series of work around
plugins, with the goal of performing basic operations in a more performant
and accurate way. This will mean adding more inline operations and
conditional callbacks.

One end goal of this work is a plugin implementing the icount=auto
feature, allowing QEMU to run at a given "frequency" based on the number
of instructions executed, without QEMU itself needing to keep track of
this.

Another end goal is to be able to detect control-flow changes in an
efficient and elegant way, by combining inline operations and conditional
callbacks.
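
As a rough sketch of what the new API looks like (modelled on the test
plugin added in patch 03; MAX_CPUS and the counter name are illustrative),
counting translated block executions per vcpu becomes:

    #define MAX_CPUS 8
    static uint64_t bb_count[MAX_CPUS];   /* one counter slot per vcpu */

    static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        /* args: base pointer, stride between per-vcpu slots, immediate */
        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
            tb, QEMU_PLUGIN_INLINE_ADD_U64,
            bb_count, sizeof(uint64_t), 1);
    }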

Pierrick Bouvier (12):
  plugins: implement inline operation with cpu_index offset
  plugins: add inline operation per vcpu
  tests/plugin: add test plugin for inline operations
  tests/plugin/inline: migrate to new per_vcpu API
  tests/plugin/mem: fix race condition with callbacks
  tests/plugin/mem: migrate to new per_vcpu API
  tests/plugin/insn: migrate to new per_vcpu API
  tests/plugin/bb: migrate to new per_vcpu API
  contrib/plugins/hotblocks: migrate to new per_vcpu API
  contrib/plugins/howvec: migrate to new per_vcpu API
  plugins: remove non per_vcpu inline operation from API
  MAINTAINERS: Add myself as reviewer for TCG Plugins

 MAINTAINERS                  |   1 +
 accel/tcg/plugin-gen.c       |  60 ++++++++++---
 contrib/plugins/hotblocks.c  |  25 ++++--
 contrib/plugins/howvec.c     |  33 ++++---
 include/qemu/plugin.h        |   1 +
 include/qemu/qemu-plugin.h   |  65 ++++++++------
 plugins/api.c                |  35 +++++---
 plugins/core.c               |  11 ++-
 plugins/plugin.h             |   5 +-
 plugins/qemu-plugins.symbols |   3 +
 tests/plugin/bb.c            |  54 ++++++------
 tests/plugin/inline.c        | 166 +++++++++++++++++++++++++++++++++++
 tests/plugin/insn.c          |  14 ++-
 tests/plugin/mem.c           |  21 +++--
 tests/plugin/meson.build     |   2 +-
 15 files changed, 383 insertions(+), 113 deletions(-)
 create mode 100644 tests/plugin/inline.c

-- 
2.43.0




* [PATCH 01/12] plugins: implement inline operation with cpu_index offset
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 22:04   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 02/12] plugins: add inline operation per vcpu Pierrick Bouvier
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Instead of working on a fixed memory location, allow indexing it based
on cpu_index and a given offset (ptr + cpu_index * offset).
The current semantics are not modified: we use a 0 offset, so an inline
operation still always targets the same memory location.
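
In pseudo-C, the location an inline op updates is computed as follows
(mirroring exec_inline_op below; the names are illustrative):

    /* slot for this vcpu: base pointer plus cpu_index * offset bytes */
    uint64_t *slot = (uint64_t *)((char *)ptr + cpu_index * offset);
    *slot += imm;   /* QEMU_PLUGIN_INLINE_ADD_U64 */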

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 accel/tcg/plugin-gen.c | 60 +++++++++++++++++++++++++++++++++++-------
 include/qemu/plugin.h  |  1 +
 plugins/api.c          |  7 ++---
 plugins/core.c         | 11 +++++---
 plugins/plugin.h       |  5 ++--
 5 files changed, 65 insertions(+), 19 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 78b331b2510..fc9d3ee23bc 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -118,16 +118,28 @@ static void gen_empty_udata_cb(void)
  */
 static void gen_empty_inline_cb(void)
 {
+    TCGv_i32 cpu_index = tcg_temp_ebb_new_i32();
+    TCGv_ptr cpu_index_as_ptr = tcg_temp_ebb_new_ptr();
     TCGv_i64 val = tcg_temp_ebb_new_i64();
     TCGv_ptr ptr = tcg_temp_ebb_new_ptr();
 
+    tcg_gen_ld_i32(cpu_index, tcg_env,
+                   -offsetof(ArchCPU, env) + offsetof(CPUState, cpu_index));
+    /* pass an immediate != 0 so that it doesn't get optimized away */
+    tcg_gen_muli_i32(cpu_index, cpu_index, 0xdeadbeef);
+    tcg_gen_ext_i32_ptr(cpu_index_as_ptr, cpu_index);
+
     tcg_gen_movi_ptr(ptr, 0);
+    tcg_gen_add_ptr(ptr, ptr, cpu_index_as_ptr);
     tcg_gen_ld_i64(val, ptr, 0);
     /* pass an immediate != 0 so that it doesn't get optimized away */
     tcg_gen_addi_i64(val, val, 0xdeadface);
+
     tcg_gen_st_i64(val, ptr, 0);
     tcg_temp_free_ptr(ptr);
     tcg_temp_free_i64(val);
+    tcg_temp_free_ptr(cpu_index_as_ptr);
+    tcg_temp_free_i32(cpu_index);
 }
 
 static void gen_empty_mem_cb(TCGv_i64 addr, uint32_t info)
@@ -274,12 +286,37 @@ static TCGOp *copy_const_ptr(TCGOp **begin_op, TCGOp *op, void *ptr)
     return op;
 }
 
+static TCGOp *copy_ld_i32(TCGOp **begin_op, TCGOp *op)
+{
+    return copy_op(begin_op, op, INDEX_op_ld_i32);
+}
+
+static TCGOp *copy_ext_i32_ptr(TCGOp **begin_op, TCGOp *op)
+{
+    if (UINTPTR_MAX == UINT32_MAX) {
+        op = copy_op(begin_op, op, INDEX_op_mov_i32);
+    } else {
+        op = copy_op(begin_op, op, INDEX_op_ext_i32_i64);
+    }
+    return op;
+}
+
+static TCGOp *copy_add_ptr(TCGOp **begin_op, TCGOp *op)
+{
+    if (UINTPTR_MAX == UINT32_MAX) {
+        op = copy_op(begin_op, op, INDEX_op_add_i32);
+    } else {
+        op = copy_op(begin_op, op, INDEX_op_add_i64);
+    }
+    return op;
+}
+
 static TCGOp *copy_ld_i64(TCGOp **begin_op, TCGOp *op)
 {
     if (TCG_TARGET_REG_BITS == 32) {
         /* 2x ld_i32 */
-        op = copy_op(begin_op, op, INDEX_op_ld_i32);
-        op = copy_op(begin_op, op, INDEX_op_ld_i32);
+        op = copy_ld_i32(begin_op, op);
+        op = copy_ld_i32(begin_op, op);
     } else {
         /* ld_i64 */
         op = copy_op(begin_op, op, INDEX_op_ld_i64);
@@ -315,6 +352,13 @@ static TCGOp *copy_add_i64(TCGOp **begin_op, TCGOp *op, uint64_t v)
     return op;
 }
 
+static TCGOp *copy_mul_i32(TCGOp **begin_op, TCGOp *op, uint32_t v)
+{
+    op = copy_op(begin_op, op, INDEX_op_mul_i32);
+    op->args[2] = tcgv_i32_arg(tcg_constant_i32(v));
+    return op;
+}
+
 static TCGOp *copy_st_ptr(TCGOp **begin_op, TCGOp *op)
 {
     if (UINTPTR_MAX == UINT32_MAX) {
@@ -380,18 +424,14 @@ static TCGOp *append_inline_cb(const struct qemu_plugin_dyn_cb *cb,
                                TCGOp *begin_op, TCGOp *op,
                                int *unused)
 {
-    /* const_ptr */
+    op = copy_ld_i32(&begin_op, op);
+    op = copy_mul_i32(&begin_op, op, cb->userp_offset);
+    op = copy_ext_i32_ptr(&begin_op, op);
     op = copy_const_ptr(&begin_op, op, cb->userp);
-
-    /* ld_i64 */
+    op = copy_add_ptr(&begin_op, op);
     op = copy_ld_i64(&begin_op, op);
-
-    /* add_i64 */
     op = copy_add_i64(&begin_op, op, cb->inline_insn.imm);
-
-    /* st_i64 */
     op = copy_st_i64(&begin_op, op);
-
     return op;
 }
 
diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index 7fdc3a4849f..4548affc295 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -85,6 +85,7 @@ enum plugin_dyn_cb_subtype {
 struct qemu_plugin_dyn_cb {
     union qemu_plugin_cb_sig f;
     void *userp;
+    size_t userp_offset;
     enum plugin_dyn_cb_subtype type;
     /* @rw applies to mem callbacks only (both regular and inline) */
     enum qemu_plugin_mem_rw rw;
diff --git a/plugins/api.c b/plugins/api.c
index 5521b0ad36c..0fcce825680 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -99,7 +99,8 @@ void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
                                               void *ptr, uint64_t imm)
 {
     if (!tb->mem_only) {
-        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE], 0, op, ptr, imm);
+        plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
+                                  0, op, ptr, 0, imm);
     }
 }
 
@@ -120,7 +121,7 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
 {
     if (!insn->mem_only) {
         plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
-                                  0, op, ptr, imm);
+                                  0, op, ptr, 0, imm);
     }
 }
 
@@ -145,7 +146,7 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
                                           uint64_t imm)
 {
     plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
-                              rw, op, ptr, imm);
+                              rw, op, ptr, 0, imm);
 }
 
 void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
diff --git a/plugins/core.c b/plugins/core.c
index 49588285dd0..cc6d7720b1f 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -280,13 +280,15 @@ static struct qemu_plugin_dyn_cb *plugin_get_dyn_cb(GArray **arr)
 
 void plugin_register_inline_op(GArray **arr,
                                enum qemu_plugin_mem_rw rw,
-                               enum qemu_plugin_op op, void *ptr,
+                               enum qemu_plugin_op op,
+                               void *ptr, size_t offset,
                                uint64_t imm)
 {
     struct qemu_plugin_dyn_cb *dyn_cb;
 
     dyn_cb = plugin_get_dyn_cb(arr);
     dyn_cb->userp = ptr;
+    dyn_cb->userp_offset = offset;
     dyn_cb->type = PLUGIN_CB_INLINE;
     dyn_cb->rw = rw;
     dyn_cb->inline_insn.op = op;
@@ -431,9 +433,10 @@ void qemu_plugin_flush_cb(void)
     plugin_cb__simple(QEMU_PLUGIN_EV_FLUSH);
 }
 
-void exec_inline_op(struct qemu_plugin_dyn_cb *cb)
+void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index)
 {
-    uint64_t *val = cb->userp;
+    const size_t offset = cpu_index * cb->userp_offset;
+    uint64_t *val = (uint64_t *)((char *) cb->userp + offset);
 
     switch (cb->inline_insn.op) {
     case QEMU_PLUGIN_INLINE_ADD_U64:
@@ -466,7 +469,7 @@ void qemu_plugin_vcpu_mem_cb(CPUState *cpu, uint64_t vaddr,
                            vaddr, cb->userp);
             break;
         case PLUGIN_CB_INLINE:
-            exec_inline_op(cb);
+            exec_inline_op(cb, cpu->cpu_index);
             break;
         default:
             g_assert_not_reached();
diff --git a/plugins/plugin.h b/plugins/plugin.h
index 5eb2fdbc85e..e597ef3c30e 100644
--- a/plugins/plugin.h
+++ b/plugins/plugin.h
@@ -66,7 +66,8 @@ struct qemu_plugin_ctx *plugin_id_to_ctx_locked(qemu_plugin_id_t id);
 
 void plugin_register_inline_op(GArray **arr,
                                enum qemu_plugin_mem_rw rw,
-                               enum qemu_plugin_op op, void *ptr,
+                               enum qemu_plugin_op op,
+                               void *ptr, size_t offset,
                                uint64_t imm);
 
 void plugin_reset_uninstall(qemu_plugin_id_t id,
@@ -95,6 +96,6 @@ void plugin_register_vcpu_mem_cb(GArray **arr,
                                  enum qemu_plugin_mem_rw rw,
                                  void *udata);
 
-void exec_inline_op(struct qemu_plugin_dyn_cb *cb);
+void exec_inline_op(struct qemu_plugin_dyn_cb *cb, int cpu_index);
 
 #endif /* PLUGIN_H */
-- 
2.43.0




* [PATCH 02/12] plugins: add inline operation per vcpu
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
  2024-01-11 14:23 ` [PATCH 01/12] plugins: implement inline operation with cpu_index offset Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 22:08   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 03/12] tests/plugin: add test plugin for inline operations Pierrick Bouvier
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Extends API with three new functions:
qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline_per_vcpu().

Compared to the non per_vcpu versions, ptr is now a base address, and the
current cpu_index and an offset are used to compute the memory location
the operation applies to (ptr + cpu_index * offset).

This provides a thread-safe version of inline operations.

Having a flexible offset is useful when a user wants to target a memory
location embedded in a struct. In that case, the offset between two memory
locations will be larger than sizeof(uint64_t).
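
A sketch of that struct case (a simplified version of the layout that
tests/plugin/bb.c adopts later in this series; MAX_CPUS is illustrative):

    typedef struct {
        uint64_t bb_count;
        uint64_t insn_count;
    } CPUCount;

    static CPUCount counts[MAX_CPUS];

    /* stride is sizeof(CPUCount), so vcpu i updates counts[i].bb_count */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64,
        &counts[0].bb_count, sizeof(CPUCount), 1);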

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 include/qemu/qemu-plugin.h   | 56 +++++++++++++++++++++++++++++++++++-
 plugins/api.c                | 36 ++++++++++++++++++++---
 plugins/qemu-plugins.symbols |  3 ++
 3 files changed, 90 insertions(+), 5 deletions(-)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 4daab6efd29..8a0691a760e 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -312,6 +312,25 @@ void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
                                               enum qemu_plugin_op op,
                                               void *ptr, uint64_t imm);
 
+/**
+ * qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu() - execution inline op
+ * @tb: the opaque qemu_plugin_tb handle for the translation
+ * @op: the type of qemu_plugin_op (e.g. ADD_U64)
+ * @ptr: pointer to the memory location for the op
+ * @offset: offset between two memory locations
+ * @imm: the op data (e.g. 1)
+ *
+ * Insert an inline op, on a memory location associated with a given
+ * vcpu (whose address is ptr + offset * cpu_index),
+ * every time a translated unit executes.
+ */
+QEMU_PLUGIN_API
+void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+    struct qemu_plugin_tb *tb,
+    enum qemu_plugin_op op,
+    void *ptr, size_t offset,
+    uint64_t imm);
+
 /**
  * qemu_plugin_register_vcpu_insn_exec_cb() - register insn execution cb
  * @insn: the opaque qemu_plugin_insn handle for an instruction
@@ -342,6 +361,23 @@ void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
                                                 enum qemu_plugin_op op,
                                                 void *ptr, uint64_t imm);
 
+/**
+ * qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu() - insn exec inline op
+ * @insn: the opaque qemu_plugin_insn handle for an instruction
+ * @op: the type of qemu_plugin_op (e.g. ADD_U64)
+ * @ptr: pointer to an array of memory locations for the op
+ * @offset: offset between two memory locations
+ * @imm: the op data (e.g. 1)
+ *
+ * Insert an inline op every time an instruction executes.
+ */
+QEMU_PLUGIN_API
+void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+    struct qemu_plugin_insn *insn,
+    enum qemu_plugin_op op,
+    void *ptr, size_t offset,
+    uint64_t imm);
+
 /**
  * qemu_plugin_tb_n_insns() - query helper for number of insns in TB
  * @tb: opaque handle to TB passed to callback
@@ -567,7 +603,25 @@ void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
                                           enum qemu_plugin_op op, void *ptr,
                                           uint64_t imm);
 
-
+/**
+ * qemu_plugin_register_vcpu_mem_inline_per_vcpu() - inline op for mem access
+ * @insn: handle for instruction to instrument
+ * @rw: apply to reads, writes or both
+ * @op: the op, of type qemu_plugin_op
+ * @ptr: pointer to an array of memory locations for the op
+ * @offset: offset between two memory locations
+ * @imm: immediate data for @op
+ *
+ * This registers an inline op for every memory access generated by the
+ * instruction.
+ */
+QEMU_PLUGIN_API
+void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+    struct qemu_plugin_insn *insn,
+    enum qemu_plugin_mem_rw rw,
+    enum qemu_plugin_op op,
+    void *ptr, size_t offset,
+    uint64_t imm);
 
 typedef void
 (*qemu_plugin_vcpu_syscall_cb_t)(qemu_plugin_id_t id, unsigned int vcpu_index,
diff --git a/plugins/api.c b/plugins/api.c
index 0fcce825680..fd6ce678501 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -97,10 +97,19 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
 void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
                                               enum qemu_plugin_op op,
                                               void *ptr, uint64_t imm)
+{
+    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(tb, op, ptr, 0, imm);
+}
+
+void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+    struct qemu_plugin_tb *tb,
+    enum qemu_plugin_op op,
+    void *ptr, size_t offset,
+    uint64_t imm)
 {
     if (!tb->mem_only) {
         plugin_register_inline_op(&tb->cbs[PLUGIN_CB_INLINE],
-                                  0, op, ptr, 0, imm);
+                                  0, op, ptr, offset, imm);
     }
 }
 
@@ -118,10 +127,19 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
 void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
                                                 enum qemu_plugin_op op,
                                                 void *ptr, uint64_t imm)
+{
+    qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(insn, op, ptr, 0, imm);
+}
+
+void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+    struct qemu_plugin_insn *insn,
+    enum qemu_plugin_op op,
+    void *ptr, size_t offset,
+    uint64_t imm)
 {
     if (!insn->mem_only) {
         plugin_register_inline_op(&insn->cbs[PLUGIN_CB_INSN][PLUGIN_CB_INLINE],
-                                  0, op, ptr, 0, imm);
+                                  0, op, ptr, offset, imm);
     }
 }
 
@@ -137,16 +155,26 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
                                       void *udata)
 {
     plugin_register_vcpu_mem_cb(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_REGULAR],
-                                    cb, flags, rw, udata);
+                                cb, flags, rw, udata);
 }
 
 void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
                                           enum qemu_plugin_mem_rw rw,
                                           enum qemu_plugin_op op, void *ptr,
                                           uint64_t imm)
+{
+    qemu_plugin_register_vcpu_mem_inline_per_vcpu(insn, rw, op, ptr, 0, imm);
+}
+
+void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+    struct qemu_plugin_insn *insn,
+    enum qemu_plugin_mem_rw rw,
+    enum qemu_plugin_op op,
+    void *ptr, size_t offset,
+    uint64_t imm)
 {
     plugin_register_inline_op(&insn->cbs[PLUGIN_CB_MEM][PLUGIN_CB_INLINE],
-                              rw, op, ptr, 0, imm);
+                              rw, op, ptr, offset, imm);
 }
 
 void qemu_plugin_register_vcpu_tb_trans_cb(qemu_plugin_id_t id,
diff --git a/plugins/qemu-plugins.symbols b/plugins/qemu-plugins.symbols
index 71f6c90549d..56ba30e5a81 100644
--- a/plugins/qemu-plugins.symbols
+++ b/plugins/qemu-plugins.symbols
@@ -27,13 +27,16 @@
   qemu_plugin_register_vcpu_init_cb;
   qemu_plugin_register_vcpu_insn_exec_cb;
   qemu_plugin_register_vcpu_insn_exec_inline;
+  qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu;
   qemu_plugin_register_vcpu_mem_cb;
   qemu_plugin_register_vcpu_mem_inline;
+  qemu_plugin_register_vcpu_mem_inline_per_vcpu;
   qemu_plugin_register_vcpu_resume_cb;
   qemu_plugin_register_vcpu_syscall_cb;
   qemu_plugin_register_vcpu_syscall_ret_cb;
   qemu_plugin_register_vcpu_tb_exec_cb;
   qemu_plugin_register_vcpu_tb_exec_inline;
+  qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu;
   qemu_plugin_register_vcpu_tb_trans_cb;
   qemu_plugin_reset;
   qemu_plugin_start_code;
-- 
2.43.0




* [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
  2024-01-11 14:23 ` [PATCH 01/12] plugins: implement inline operation with cpu_index offset Pierrick Bouvier
  2024-01-11 14:23 ` [PATCH 02/12] plugins: add inline operation per vcpu Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 15:57   ` Philippe Mathieu-Daudé
  2024-01-11 14:23 ` [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API Pierrick Bouvier
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

For now, it simply performs instruction, bb and mem counts, and ensures
that the inline and callback versions produce the same result. We'll
extend it later when new inline operations are added.

Using existing plugins to test that everything works is a bit cumbersome,
as different events are handled by different plugins. Hence this new one.
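
A sketch of how it would typically be exercised (the exact library path
depends on the build tree): run a guest binary with
-plugin ./tests/plugin/libinline.so on the QEMU command line; at exit the
plugin prints the per-vcpu and total counts and asserts that the callback
and inline counts match.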

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
 tests/plugin/meson.build |   2 +-
 2 files changed, 184 insertions(+), 1 deletion(-)
 create mode 100644 tests/plugin/inline.c

diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
new file mode 100644
index 00000000000..6114ebca545
--- /dev/null
+++ b/tests/plugin/inline.c
@@ -0,0 +1,183 @@
+/*
+ * Copyright (C) 2023, Pierrick Bouvier <pierrick.bouvier@linaro.org>
+ *
+ * Demonstrates and tests usage of inline ops.
+ *
+ * License: GNU GPL, version 2 or later.
+ *   See the COPYING file in the top-level directory.
+ */
+
+#include <glib.h>
+#include <stdint.h>
+#include <stdio.h>
+
+#include <qemu-plugin.h>
+
+#define MAX_CPUS 8
+
+static uint64_t count_tb;
+static uint64_t count_tb_per_vcpu[MAX_CPUS];
+static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_tb_inline_racy;
+static uint64_t count_insn;
+static uint64_t count_insn_per_vcpu[MAX_CPUS];
+static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_insn_inline_racy;
+static uint64_t count_mem;
+static uint64_t count_mem_per_vcpu[MAX_CPUS];
+static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
+static uint64_t count_mem_inline_racy;
+static GMutex tb_lock;
+static GMutex insn_lock;
+static GMutex mem_lock;
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+static uint64_t collect_per_vcpu(uint64_t *values)
+{
+    uint64_t count = 0;
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        count += values[i];
+    }
+    return count;
+}
+
+static void stats_insn(void)
+{
+    const uint64_t expected = count_insn;
+    const uint64_t per_vcpu = collect_per_vcpu(count_insn_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_insn_inline_per_vcpu);
+    printf("insn: %" PRIu64 "\n", expected);
+    printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_insn_inline_racy <= expected);
+}
+
+static void stats_tb(void)
+{
+    const uint64_t expected = count_tb;
+    const uint64_t per_vcpu = collect_per_vcpu(count_tb_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_tb_inline_per_vcpu);
+    printf("tb: %" PRIu64 "\n", expected);
+    printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_tb_inline_racy <= expected);
+}
+
+static void stats_mem(void)
+{
+    const uint64_t expected = count_mem;
+    const uint64_t per_vcpu = collect_per_vcpu(count_mem_per_vcpu);
+    const uint64_t inl_per_vcpu = collect_per_vcpu(count_mem_inline_per_vcpu);
+    printf("mem: %" PRIu64 "\n", expected);
+    printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
+    printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
+    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
+    g_assert(expected > 0);
+    g_assert(per_vcpu == expected);
+    g_assert(inl_per_vcpu == expected);
+    g_assert(count_mem_inline_racy <= expected);
+}
+
+static void plugin_exit(qemu_plugin_id_t id, void *udata)
+{
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        const uint64_t tb = count_tb_per_vcpu[i];
+        const uint64_t tb_inline = count_tb_inline_per_vcpu[i];
+        const uint64_t insn = count_insn_per_vcpu[i];
+        const uint64_t insn_inline = count_insn_inline_per_vcpu[i];
+        const uint64_t mem = count_mem_per_vcpu[i];
+        const uint64_t mem_inline = count_mem_inline_per_vcpu[i];
+        printf("cpu %d: tb (%" PRIu64 ", %" PRIu64 ") | "
+               "insn (%" PRIu64 ", %" PRIu64 ") | "
+               "mem (%" PRIu64 ", %" PRIu64 ")"
+               "\n",
+               i, tb, tb_inline, insn, insn_inline, mem, mem_inline);
+        g_assert(tb == tb_inline);
+        g_assert(insn == insn_inline);
+        g_assert(mem == mem_inline);
+    }
+
+    stats_tb();
+    stats_insn();
+    stats_mem();
+}
+
+static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
+{
+    count_tb_per_vcpu[cpu_index]++;
+    g_mutex_lock(&tb_lock);
+    count_tb++;
+    g_mutex_unlock(&tb_lock);
+}
+
+static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
+{
+    count_insn_per_vcpu[cpu_index]++;
+    g_mutex_lock(&insn_lock);
+    count_insn++;
+    g_mutex_unlock(&insn_lock);
+}
+
+static void vcpu_mem_access(unsigned int cpu_index,
+                            qemu_plugin_meminfo_t info,
+                            uint64_t vaddr,
+                            void *userdata)
+{
+    count_mem_per_vcpu[cpu_index]++;
+    g_mutex_lock(&mem_lock);
+    count_mem++;
+    g_mutex_unlock(&mem_lock);
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+    qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
+                                         QEMU_PLUGIN_CB_NO_REGS, 0);
+    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
+                                             &count_tb_inline_racy, 1);
+    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+        tb, QEMU_PLUGIN_INLINE_ADD_U64,
+        count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
+
+    for (int idx = 0; idx < qemu_plugin_tb_n_insns(tb); ++idx) {
+        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
+        qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
+                                               QEMU_PLUGIN_CB_NO_REGS, 0);
+        qemu_plugin_register_vcpu_insn_exec_inline(
+            insn, QEMU_PLUGIN_INLINE_ADD_U64,
+            &count_insn_inline_racy, 1);
+        qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+            insn, QEMU_PLUGIN_INLINE_ADD_U64,
+            count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
+        qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
+                                         QEMU_PLUGIN_CB_NO_REGS,
+                                         QEMU_PLUGIN_MEM_RW, 0);
+        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
+                                             QEMU_PLUGIN_INLINE_ADD_U64,
+                                             &count_mem_inline_racy, 1);
+        qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+            insn, QEMU_PLUGIN_MEM_RW,
+            QEMU_PLUGIN_INLINE_ADD_U64,
+            count_mem_inline_per_vcpu, sizeof(uint64_t), 1);
+    }
+}
+
+QEMU_PLUGIN_EXPORT
+int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
+                        int argc, char **argv)
+{
+    g_assert(info->system.smp_vcpus <= MAX_CPUS);
+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
+
+    return 0;
+}
diff --git a/tests/plugin/meson.build b/tests/plugin/meson.build
index e18183aaeda..9eece5bab51 100644
--- a/tests/plugin/meson.build
+++ b/tests/plugin/meson.build
@@ -1,6 +1,6 @@
 t = []
 if get_option('plugins')
-  foreach i : ['bb', 'empty', 'insn', 'mem', 'syscall']
+  foreach i : ['bb', 'empty', 'inline', 'insn', 'mem', 'syscall']
     if host_os == 'windows'
       t += shared_module(i, files(i + '.c') + '../../contrib/plugins/win32_linker.c',
                         include_directories: '../../include/qemu',
-- 
2.43.0




* [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (2 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 03/12] tests/plugin: add test plugin for inline operations Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 22:10   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 05/12] tests/plugin/mem: fix race condition with callbacks Pierrick Bouvier
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/inline.c | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
index 6114ebca545..ae59f7af7a7 100644
--- a/tests/plugin/inline.c
+++ b/tests/plugin/inline.c
@@ -18,15 +18,12 @@
 static uint64_t count_tb;
 static uint64_t count_tb_per_vcpu[MAX_CPUS];
 static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
-static uint64_t count_tb_inline_racy;
 static uint64_t count_insn;
 static uint64_t count_insn_per_vcpu[MAX_CPUS];
 static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
-static uint64_t count_insn_inline_racy;
 static uint64_t count_mem;
 static uint64_t count_mem_per_vcpu[MAX_CPUS];
 static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
-static uint64_t count_mem_inline_racy;
 static GMutex tb_lock;
 static GMutex insn_lock;
 static GMutex mem_lock;
@@ -50,11 +47,9 @@ static void stats_insn(void)
     printf("insn: %" PRIu64 "\n", expected);
     printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
     printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
-    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
     g_assert(expected > 0);
     g_assert(per_vcpu == expected);
     g_assert(inl_per_vcpu == expected);
-    g_assert(count_insn_inline_racy <= expected);
 }
 
 static void stats_tb(void)
@@ -65,11 +60,9 @@ static void stats_tb(void)
     printf("tb: %" PRIu64 "\n", expected);
     printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
     printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
-    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
     g_assert(expected > 0);
     g_assert(per_vcpu == expected);
     g_assert(inl_per_vcpu == expected);
-    g_assert(count_tb_inline_racy <= expected);
 }
 
 static void stats_mem(void)
@@ -80,11 +73,9 @@ static void stats_mem(void)
     printf("mem: %" PRIu64 "\n", expected);
     printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
     printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
-    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
     g_assert(expected > 0);
     g_assert(per_vcpu == expected);
     g_assert(inl_per_vcpu == expected);
-    g_assert(count_mem_inline_racy <= expected);
 }
 
 static void plugin_exit(qemu_plugin_id_t id, void *udata)
@@ -142,8 +133,6 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
 {
     qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
                                          QEMU_PLUGIN_CB_NO_REGS, 0);
-    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
-                                             &count_tb_inline_racy, 1);
     qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
         tb, QEMU_PLUGIN_INLINE_ADD_U64,
         count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
@@ -152,18 +141,12 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
         struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
         qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
                                                QEMU_PLUGIN_CB_NO_REGS, 0);
-        qemu_plugin_register_vcpu_insn_exec_inline(
-            insn, QEMU_PLUGIN_INLINE_ADD_U64,
-            &count_insn_inline_racy, 1);
         qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
             insn, QEMU_PLUGIN_INLINE_ADD_U64,
             count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
         qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
                                          QEMU_PLUGIN_CB_NO_REGS,
                                          QEMU_PLUGIN_MEM_RW, 0);
-        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
-                                             QEMU_PLUGIN_INLINE_ADD_U64,
-                                             &count_mem_inline_racy, 1);
         qemu_plugin_register_vcpu_mem_inline_per_vcpu(
             insn, QEMU_PLUGIN_MEM_RW,
             QEMU_PLUGIN_INLINE_ADD_U64,
-- 
2.43.0




* [PATCH 05/12] tests/plugin/mem: fix race condition with callbacks
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (3 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 22:12   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 06/12] tests/plugin/mem: migrate to new per_vcpu API Pierrick Bouvier
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Introduce a lock so the global count is correct.
The race was found by comparing against the new per_vcpu inline op.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/mem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
index 44e91065ba7..beca8232342 100644
--- a/tests/plugin/mem.c
+++ b/tests/plugin/mem.c
@@ -22,6 +22,7 @@ static uint64_t io_count;
 static bool do_inline, do_callback;
 static bool do_haddr;
 static enum qemu_plugin_mem_rw rw = QEMU_PLUGIN_MEM_RW;
+static GMutex lock;
 
 static void plugin_exit(qemu_plugin_id_t id, void *p)
 {
@@ -42,6 +43,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
 static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
                      uint64_t vaddr, void *udata)
 {
+    g_mutex_lock(&lock);
     if (do_haddr) {
         struct qemu_plugin_hwaddr *hwaddr;
         hwaddr = qemu_plugin_get_hwaddr(meminfo, vaddr);
@@ -53,6 +55,7 @@ static void vcpu_mem(unsigned int cpu_index, qemu_plugin_meminfo_t meminfo,
     } else {
         cb_mem_count++;
     }
+    g_mutex_unlock(&lock);
 }
 
 static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
-- 
2.43.0




* [PATCH 06/12] tests/plugin/mem: migrate to new per_vcpu API
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (4 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 05/12] tests/plugin/mem: fix race condition with callbacks Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 14:23 ` [PATCH 07/12] tests/plugin/insn: " Pierrick Bouvier
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/mem.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/tests/plugin/mem.c b/tests/plugin/mem.c
index beca8232342..06f2d80d0ec 100644
--- a/tests/plugin/mem.c
+++ b/tests/plugin/mem.c
@@ -16,7 +16,9 @@
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
-static uint64_t inline_mem_count;
+#define MAX_CPUS 8
+
+static uint64_t inline_mem_count[MAX_CPUS];
 static uint64_t cb_mem_count;
 static uint64_t io_count;
 static bool do_inline, do_callback;
@@ -29,7 +31,11 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
     g_autoptr(GString) out = g_string_new("");
 
     if (do_inline) {
-        g_string_printf(out, "inline mem accesses: %" PRIu64 "\n", inline_mem_count);
+        uint64_t total = 0;
+        for (int i = 0; i < MAX_CPUS; ++i) {
+            total += inline_mem_count[i];
+        }
+        g_string_printf(out, "inline mem accesses: %" PRIu64 "\n", total);
     }
     if (do_callback) {
         g_string_append_printf(out, "callback mem accesses: %" PRIu64 "\n", cb_mem_count);
@@ -67,9 +73,10 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
         struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
 
         if (do_inline) {
-            qemu_plugin_register_vcpu_mem_inline(insn, rw,
-                                                 QEMU_PLUGIN_INLINE_ADD_U64,
-                                                 &inline_mem_count, 1);
+            qemu_plugin_register_vcpu_mem_inline_per_vcpu(
+                insn, rw,
+                QEMU_PLUGIN_INLINE_ADD_U64,
+                inline_mem_count, sizeof(uint64_t), 1);
         }
         if (do_callback) {
             qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem,
@@ -109,6 +116,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                 fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
                 return -1;
             }
+            g_assert(info->system.smp_vcpus <= MAX_CPUS);
         } else if (g_strcmp0(tokens[0], "callback") == 0) {
             if (!qemu_plugin_bool_parse(tokens[0], tokens[1], &do_callback)) {
                 fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
-- 
2.43.0




* [PATCH 07/12] tests/plugin/insn: migrate to new per_vcpu API
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (5 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 06/12] tests/plugin/mem: migrate to new per_vcpu API Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 22:14   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 08/12] tests/plugin/bb: " Pierrick Bouvier
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/insn.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/plugin/insn.c b/tests/plugin/insn.c
index 5fd3017c2b3..be79c5080fd 100644
--- a/tests/plugin/insn.c
+++ b/tests/plugin/insn.c
@@ -23,7 +23,7 @@ typedef struct {
 } InstructionCount;
 
 static InstructionCount counts[MAX_CPUS];
-static uint64_t inline_insn_count;
+static uint64_t inline_insn_count[MAX_CPUS];
 
 static bool do_inline;
 static bool do_size;
@@ -94,8 +94,9 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
         struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
 
         if (do_inline) {
-            qemu_plugin_register_vcpu_insn_exec_inline(
-                insn, QEMU_PLUGIN_INLINE_ADD_U64, &inline_insn_count, 1);
+            qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+                insn, QEMU_PLUGIN_INLINE_ADD_U64,
+                inline_insn_count, sizeof(uint64_t), 1);
         } else {
             uint64_t vaddr = qemu_plugin_insn_vaddr(insn);
             qemu_plugin_register_vcpu_insn_exec_cb(
@@ -151,7 +152,11 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
             }
         }
     } else if (do_inline) {
-        g_string_append_printf(out, "insns: %" PRIu64 "\n", inline_insn_count);
+        uint64_t total = 0;
+        for (i = 0; i < MAX_CPUS; ++i) {
+            total += inline_insn_count[i];
+        }
+        g_string_append_printf(out, "insns: %" PRIu64 "\n", total);
     } else {
         uint64_t total_insns = 0;
         for (i = 0; i < MAX_CPUS; i++) {
@@ -195,6 +200,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                 fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
                 return -1;
             }
+            g_assert(info->system.smp_vcpus <= MAX_CPUS);
         } else if (g_strcmp0(tokens[0], "sizes") == 0) {
             if (!qemu_plugin_bool_parse(tokens[0], tokens[1], &do_size)) {
                 fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
-- 
2.43.0




* [PATCH 08/12] tests/plugin/bb: migrate to new per_vcpu API
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (6 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 07/12] tests/plugin/insn: " Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 22:15   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 09/12] contrib/plugins/hotblocks: " Pierrick Bouvier
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Since we need a fixed offset between per-vcpu count memory locations, we
now use a contiguous array of CPUCount (instead of an array of pointers).

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 tests/plugin/bb.c | 54 +++++++++++++++++++++++------------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/tests/plugin/bb.c b/tests/plugin/bb.c
index df50d1fd3bc..644a36f3128 100644
--- a/tests/plugin/bb.c
+++ b/tests/plugin/bb.c
@@ -16,6 +16,8 @@
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
+#define MAX_CPUS 8
+
 typedef struct {
     GMutex lock;
     int index;
@@ -23,14 +25,10 @@ typedef struct {
     uint64_t insn_count;
 } CPUCount;
 
-/* Used by the inline & linux-user counts */
 static bool do_inline;
-static CPUCount inline_count;
-
 /* Dump running CPU total on idle? */
 static bool idle_report;
-static GPtrArray *counts;
-static int max_cpus;
+static CPUCount counts[MAX_CPUS];
 
 static void gen_one_cpu_report(CPUCount *count, GString *report)
 {
@@ -46,18 +44,26 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
 {
     g_autoptr(GString) report = g_string_new("");
 
-    if (do_inline || !max_cpus) {
+    if (do_inline) {
+        uint64_t total_bb = 0;
+        uint64_t total_insn = 0;
+        for (int i = 0; i < MAX_CPUS; ++i) {
+            total_bb += counts[i].bb_count;
+            total_insn += counts[i].insn_count;
+        }
         g_string_printf(report, "bb's: %" PRIu64", insns: %" PRIu64 "\n",
-                        inline_count.bb_count, inline_count.insn_count);
+                        total_bb, total_insn);
     } else {
-        g_ptr_array_foreach(counts, (GFunc) gen_one_cpu_report, report);
+        for (int i = 0; i < MAX_CPUS; ++i) {
+            gen_one_cpu_report(&counts[i], report);
+        }
     }
     qemu_plugin_outs(report->str);
 }
 
 static void vcpu_idle(qemu_plugin_id_t id, unsigned int cpu_index)
 {
-    CPUCount *count = g_ptr_array_index(counts, cpu_index);
+    CPUCount *count = &counts[cpu_index];
     g_autoptr(GString) report = g_string_new("");
     gen_one_cpu_report(count, report);
 
@@ -69,8 +75,7 @@ static void vcpu_idle(qemu_plugin_id_t id, unsigned int cpu_index)
 
 static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
 {
-    CPUCount *count = max_cpus ?
-        g_ptr_array_index(counts, cpu_index) : &inline_count;
+    CPUCount *count = &counts[cpu_index];
 
     uintptr_t n_insns = (uintptr_t)udata;
     g_mutex_lock(&count->lock);
@@ -84,11 +89,13 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
     size_t n_insns = qemu_plugin_tb_n_insns(tb);
 
     if (do_inline) {
-        qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
-                                                 &inline_count.bb_count, 1);
-        qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
-                                                 &inline_count.insn_count,
-                                                 n_insns);
+        CPUCount *first_count = &counts[0];
+        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+            tb, QEMU_PLUGIN_INLINE_ADD_U64,
+            &first_count->bb_count, sizeof(CPUCount), 1);
+        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+            tb, QEMU_PLUGIN_INLINE_ADD_U64,
+            &first_count->insn_count, sizeof(CPUCount), n_insns);
     } else {
         qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
                                              QEMU_PLUGIN_CB_NO_REGS,
@@ -121,17 +128,10 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
         }
     }
 
-    if (info->system_emulation && !do_inline) {
-        max_cpus = info->system.max_vcpus;
-        counts = g_ptr_array_new();
-        for (i = 0; i < max_cpus; i++) {
-            CPUCount *count = g_new0(CPUCount, 1);
-            g_mutex_init(&count->lock);
-            count->index = i;
-            g_ptr_array_add(counts, count);
-        }
-    } else if (!do_inline) {
-        g_mutex_init(&inline_count.lock);
+    g_assert(info->system.smp_vcpus <= MAX_CPUS);
+    for (i = 0; i < MAX_CPUS; i++) {
+        CPUCount *count = &counts[i];
+        count->index = i;
     }
 
     if (idle_report) {
-- 
2.43.0




* [PATCH 09/12] contrib/plugins/hotblocks: migrate to new per_vcpu API
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (7 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 08/12] tests/plugin/bb: " Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-12  8:42   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 10/12] contrib/plugins/howvec: " Pierrick Bouvier
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/hotblocks.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/contrib/plugins/hotblocks.c b/contrib/plugins/hotblocks.c
index 4de1b134944..1d95ac53143 100644
--- a/contrib/plugins/hotblocks.c
+++ b/contrib/plugins/hotblocks.c
@@ -17,6 +17,8 @@
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
+#define MAX_CPUS 8
+
 static bool do_inline;
 
 /* Plugins need to take care of their own locking */
@@ -34,16 +36,25 @@ static guint64 limit = 20;
  */
 typedef struct {
     uint64_t start_addr;
-    uint64_t exec_count;
+    uint64_t exec_count[MAX_CPUS];
     int      trans_count;
     unsigned long insns;
 } ExecCount;
 
+static uint64_t exec_count(ExecCount *e)
+{
+    uint64_t res = 0;
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        res += e->exec_count[i];
+    }
+    return res;
+}
+
 static gint cmp_exec_count(gconstpointer a, gconstpointer b)
 {
     ExecCount *ea = (ExecCount *) a;
     ExecCount *eb = (ExecCount *) b;
-    return ea->exec_count > eb->exec_count ? -1 : 1;
+    return exec_count(ea) > exec_count(eb) ? -1 : 1;
 }
 
 static void plugin_exit(qemu_plugin_id_t id, void *p)
@@ -65,7 +76,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
             ExecCount *rec = (ExecCount *) it->data;
             g_string_append_printf(report, "0x%016"PRIx64", %d, %ld, %"PRId64"\n",
                                    rec->start_addr, rec->trans_count,
-                                   rec->insns, rec->exec_count);
+                                   rec->insns, exec_count(rec));
         }
 
         g_list_free(it);
@@ -89,7 +100,7 @@ static void vcpu_tb_exec(unsigned int cpu_index, void *udata)
     cnt = (ExecCount *) g_hash_table_lookup(hotblocks, (gconstpointer) hash);
     /* should always succeed */
     g_assert(cnt);
-    cnt->exec_count++;
+    cnt->exec_count[cpu_index]++;
     g_mutex_unlock(&lock);
 }
 
@@ -120,8 +131,9 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
     g_mutex_unlock(&lock);
 
     if (do_inline) {
-        qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
-                                                 &cnt->exec_count, 1);
+        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
+            tb, QEMU_PLUGIN_INLINE_ADD_U64,
+            cnt->exec_count, sizeof(uint64_t), 1);
     } else {
         qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
                                              QEMU_PLUGIN_CB_NO_REGS,
@@ -149,6 +161,7 @@ int qemu_plugin_install(qemu_plugin_id_t id, const qemu_info_t *info,
 
     plugin_init();
 
+    g_assert(info->system.smp_vcpus <= MAX_CPUS);
     qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
     qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
     return 0;
-- 
2.43.0




* [PATCH 10/12] contrib/plugins/howvec: migrate to new per_vcpu API
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (8 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 09/12] contrib/plugins/hotblocks: " Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-12  8:44   ` Richard Henderson
  2024-01-11 14:23 ` [PATCH 11/12] plugins: remove non per_vcpu inline operation from API Pierrick Bouvier
  2024-01-11 14:23 ` [PATCH 12/12] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/howvec.c | 33 ++++++++++++++++++++++-----------
 1 file changed, 22 insertions(+), 11 deletions(-)

diff --git a/contrib/plugins/howvec.c b/contrib/plugins/howvec.c
index 644a7856bb2..46230a43927 100644
--- a/contrib/plugins/howvec.c
+++ b/contrib/plugins/howvec.c
@@ -22,8 +22,18 @@
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
+#define MAX_CPUS 8
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 
+static uint64_t count(uint64_t *arr)
+{
+    uint64_t res = 0;
+    for (int i = 0; i < MAX_CPUS; ++i) {
+        res += arr[i];
+    }
+    return res;
+}
+
 typedef enum {
     COUNT_CLASS,
     COUNT_INDIVIDUAL,
@@ -43,13 +53,13 @@ typedef struct {
     uint32_t mask;
     uint32_t pattern;
     CountType what;
-    uint64_t count;
+    uint64_t count[MAX_CPUS];
 } InsnClassExecCount;
 
 typedef struct {
     char *insn;
     uint32_t opcode;
-    uint64_t count;
+    uint64_t count[MAX_CPUS];
     InsnClassExecCount *class;
 } InsnExecCount;
 
@@ -159,7 +169,7 @@ static gint cmp_exec_count(gconstpointer a, gconstpointer b)
 {
     InsnExecCount *ea = (InsnExecCount *) a;
     InsnExecCount *eb = (InsnExecCount *) b;
-    return ea->count > eb->count ? -1 : 1;
+    return count(ea->count) > count(eb->count) ? -1 : 1;
 }
 
 static void free_record(gpointer data)
@@ -180,11 +190,11 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
         class = &class_table[i];
         switch (class->what) {
         case COUNT_CLASS:
-            if (class->count || verbose) {
+            if (count(class->count) || verbose) {
                 g_string_append_printf(report,
                                        "Class: %-24s\t(%" PRId64 " hits)\n",
                                        class->class,
-                                       class->count);
+                                       count(class->count));
             }
             break;
         case COUNT_INDIVIDUAL:
@@ -212,7 +222,7 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
                                    "Instr: %-24s\t(%" PRId64 " hits)"
                                    "\t(op=0x%08x/%s)\n",
                                    rec->insn,
-                                   rec->count,
+                                   count(rec->count),
                                    rec->opcode,
                                    rec->class ?
                                    rec->class->class : "un-categorised");
@@ -233,7 +243,7 @@ static void plugin_init(void)
 static void vcpu_insn_exec_before(unsigned int cpu_index, void *udata)
 {
     uint64_t *count = (uint64_t *) udata;
-    (*count)++;
+    count[cpu_index]++;
 }
 
 static uint64_t *find_counter(struct qemu_plugin_insn *insn)
@@ -265,7 +275,7 @@ static uint64_t *find_counter(struct qemu_plugin_insn *insn)
     case COUNT_NONE:
         return NULL;
     case COUNT_CLASS:
-        return &class->count;
+        return class->count;
     case COUNT_INDIVIDUAL:
     {
         InsnExecCount *icount;
@@ -285,7 +295,7 @@ static uint64_t *find_counter(struct qemu_plugin_insn *insn)
         }
         g_mutex_unlock(&lock);
 
-        return &icount->count;
+        return icount->count;
     }
     default:
         g_assert_not_reached();
@@ -306,8 +316,9 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
 
         if (cnt) {
             if (do_inline) {
-                qemu_plugin_register_vcpu_insn_exec_inline(
-                    insn, QEMU_PLUGIN_INLINE_ADD_U64, cnt, 1);
+                qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
+                    insn, QEMU_PLUGIN_INLINE_ADD_U64,
+                    cnt, sizeof(uint64_t), 1);
             } else {
                 qemu_plugin_register_vcpu_insn_exec_cb(
                     insn, vcpu_insn_exec_before, QEMU_PLUGIN_CB_NO_REGS, cnt);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 11/12] plugins: remove non per_vcpu inline operation from API
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (9 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 10/12] contrib/plugins/howvec: " Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-11 14:23 ` [PATCH 12/12] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
  11 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Now that we have a thread-safe equivalent of the inline operation, and all
plugins have been changed to use it, there is no point in keeping the old API.

Moreover, it will help when we implement more functionality (conditional
callbacks), as we can now assume that all inline operations have an
associated offset.

In case a user still wants the old behaviour (targeting a single memory
location), it is still possible to define an inline operation with a 0
offset.
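
For example (a minimal sketch, not part of this patch), a plugin that used
to bump a single global counter keeps exactly the same behaviour with the
per_vcpu variant by passing a 0 offset:

    static uint64_t counter;

    static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        /* offset 0: every vcpu targets the same memory location, as before */
        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
            tb, QEMU_PLUGIN_INLINE_ADD_U64, &counter, 0, 1);
    }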

Bump API version as it's a breaking change for existing plugins.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 include/qemu/qemu-plugin.h | 59 ++++----------------------------------
 plugins/api.c              | 22 --------------
 2 files changed, 6 insertions(+), 75 deletions(-)

diff --git a/include/qemu/qemu-plugin.h b/include/qemu/qemu-plugin.h
index 8a0691a760e..65e770f5c56 100644
--- a/include/qemu/qemu-plugin.h
+++ b/include/qemu/qemu-plugin.h
@@ -50,11 +50,16 @@ typedef uint64_t qemu_plugin_id_t;
  *
  * The plugins export the API they were built against by exposing the
  * symbol qemu_plugin_version which can be checked.
+ *
+ * Version 2:
+ * Remove qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline.
+ * Those functions are replaced by *_per_vcpu variants, which guarantee
+ * thread-safety for operations.
  */
 
 extern QEMU_PLUGIN_EXPORT int qemu_plugin_version;
 
-#define QEMU_PLUGIN_VERSION 1
+#define QEMU_PLUGIN_VERSION 2
 
 /**
  * struct qemu_info_t - system information for plugins
@@ -293,25 +298,6 @@ enum qemu_plugin_op {
     QEMU_PLUGIN_INLINE_ADD_U64,
 };
 
-/**
- * qemu_plugin_register_vcpu_tb_exec_inline() - execution inline op
- * @tb: the opaque qemu_plugin_tb handle for the translation
- * @op: the type of qemu_plugin_op (e.g. ADD_U64)
- * @ptr: the target memory location for the op
- * @imm: the op data (e.g. 1)
- *
- * Insert an inline op to every time a translated unit executes.
- * Useful if you just want to increment a single counter somewhere in
- * memory.
- *
- * Note: ops are not atomic so in multi-threaded/multi-smp situations
- * you will get inexact results.
- */
-QEMU_PLUGIN_API
-void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
-                                              enum qemu_plugin_op op,
-                                              void *ptr, uint64_t imm);
-
 /**
  * qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu() - execution inline op
  * @tb: the opaque qemu_plugin_tb handle for the translation
@@ -346,21 +332,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
                                             enum qemu_plugin_cb_flags flags,
                                             void *userdata);
 
-/**
- * qemu_plugin_register_vcpu_insn_exec_inline() - insn execution inline op
- * @insn: the opaque qemu_plugin_insn handle for an instruction
- * @op: the type of qemu_plugin_op (e.g. ADD_U64)
- * @ptr: the target memory location for the op
- * @imm: the op data (e.g. 1)
- *
- * Insert an inline op to every time an instruction executes. Useful
- * if you just want to increment a single counter somewhere in memory.
- */
-QEMU_PLUGIN_API
-void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
-                                                enum qemu_plugin_op op,
-                                                void *ptr, uint64_t imm);
-
 /**
  * qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu() - insn exec inline op
  * @insn: the opaque qemu_plugin_insn handle for an instruction
@@ -585,24 +556,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
                                       enum qemu_plugin_mem_rw rw,
                                       void *userdata);
 
-/**
- * qemu_plugin_register_vcpu_mem_inline() - register an inline op to any memory access
- * @insn: handle for instruction to instrument
- * @rw: apply to reads, writes or both
- * @op: the op, of type qemu_plugin_op
- * @ptr: pointer memory for the op
- * @imm: immediate data for @op
- *
- * This registers a inline op every memory access generated by the
- * instruction. This provides for a lightweight but not thread-safe
- * way of counting the number of operations done.
- */
-QEMU_PLUGIN_API
-void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
-                                          enum qemu_plugin_mem_rw rw,
-                                          enum qemu_plugin_op op, void *ptr,
-                                          uint64_t imm);
-
 /**
  * qemu_plugin_register_vcpu_mem_inline_per_vcpu() - inline op for mem access
  * @insn: handle for instruction to instrument
diff --git a/plugins/api.c b/plugins/api.c
index fd6ce678501..a51585b3e53 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -94,13 +94,6 @@ void qemu_plugin_register_vcpu_tb_exec_cb(struct qemu_plugin_tb *tb,
     }
 }
 
-void qemu_plugin_register_vcpu_tb_exec_inline(struct qemu_plugin_tb *tb,
-                                              enum qemu_plugin_op op,
-                                              void *ptr, uint64_t imm)
-{
-    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(tb, op, ptr, 0, imm);
-}
-
 void qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
     struct qemu_plugin_tb *tb,
     enum qemu_plugin_op op,
@@ -124,13 +117,6 @@ void qemu_plugin_register_vcpu_insn_exec_cb(struct qemu_plugin_insn *insn,
     }
 }
 
-void qemu_plugin_register_vcpu_insn_exec_inline(struct qemu_plugin_insn *insn,
-                                                enum qemu_plugin_op op,
-                                                void *ptr, uint64_t imm)
-{
-    qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(insn, op, ptr, 0, imm);
-}
-
 void qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
     struct qemu_plugin_insn *insn,
     enum qemu_plugin_op op,
@@ -158,14 +144,6 @@ void qemu_plugin_register_vcpu_mem_cb(struct qemu_plugin_insn *insn,
                                 cb, flags, rw, udata);
 }
 
-void qemu_plugin_register_vcpu_mem_inline(struct qemu_plugin_insn *insn,
-                                          enum qemu_plugin_mem_rw rw,
-                                          enum qemu_plugin_op op, void *ptr,
-                                          uint64_t imm)
-{
-    qemu_plugin_register_vcpu_mem_inline_per_vcpu(insn, rw, op, ptr, 0, imm);
-}
-
 void qemu_plugin_register_vcpu_mem_inline_per_vcpu(
     struct qemu_plugin_insn *insn,
     enum qemu_plugin_mem_rw rw,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 12/12] MAINTAINERS: Add myself as reviewer for TCG Plugins
  2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
                   ` (10 preceding siblings ...)
  2024-01-11 14:23 ` [PATCH 11/12] plugins: remove non per_vcpu inline operation from API Pierrick Bouvier
@ 2024-01-11 14:23 ` Pierrick Bouvier
  2024-01-12 15:53   ` Philippe Mathieu-Daudé
  11 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 14:23 UTC (permalink / raw)
  To: qemu-devel
  Cc: Pierrick Bouvier, Alex Bennée, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 00ec1f7ecaf..4a621aca978 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3660,6 +3660,7 @@ TCG Plugins
 M: Alex Bennée <alex.bennee@linaro.org>
 R: Alexandre Iooss <erdnaxe@crans.org>
 R: Mahmoud Mandour <ma.mandourr@gmail.com>
+R: Pierrick Bouvier <pierrick.bouvier@linaro.org>
 S: Maintained
 F: docs/devel/tcg-plugins.rst
 F: plugins/
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-11 14:23 ` [PATCH 03/12] tests/plugin: add test plugin for inline operations Pierrick Bouvier
@ 2024-01-11 15:57   ` Philippe Mathieu-Daudé
  2024-01-11 17:20     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-01-11 15:57 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Richard Henderson,
	Paolo Bonzini, Alexandre Iooss

Hi Pierrick,

On 11/1/24 15:23, Pierrick Bouvier wrote:
> For now, it simply performs instruction, bb and mem count, and ensure
> that inline vs callback versions have the same result. Later, we'll
> extend it when new inline operations are added.
> 
> Use existing plugins to test everything works is a bit cumbersome, as
> different events are treated in different plugins. Thus, this new one.
> 
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>   tests/plugin/meson.build |   2 +-
>   2 files changed, 184 insertions(+), 1 deletion(-)
>   create mode 100644 tests/plugin/inline.c

> +#define MAX_CPUS 8

Where does this value come from?

Should the plugin API provide a helper to ask TCG how many
vCPUs are created?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-11 15:57   ` Philippe Mathieu-Daudé
@ 2024-01-11 17:20     ` Pierrick Bouvier
  2024-01-12 17:20       ` Alex Bennée
  0 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-11 17:20 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Richard Henderson,
	Paolo Bonzini, Alexandre Iooss

On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
> Hi Pierrick,
> 
> On 11/1/24 15:23, Pierrick Bouvier wrote:
>> For now, it simply performs instruction, bb and mem count, and ensure
>> that inline vs callback versions have the same result. Later, we'll
>> extend it when new inline operations are added.
>>
>> Use existing plugins to test everything works is a bit cumbersome, as
>> different events are treated in different plugins. Thus, this new one.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>>    tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>>    tests/plugin/meson.build |   2 +-
>>    2 files changed, 184 insertions(+), 1 deletion(-)
>>    create mode 100644 tests/plugin/inline.c
> 
>> +#define MAX_CPUS 8
> 
> Where does this value come from?
> 

The plugin tests/plugin/insn.c had this constant, so I picked it up from
there.

> Should the pluggin API provide a helper to ask TCG how many
> vCPUs are created?

In user mode, we can't know in advance how many simultaneous threads (and
thus vcpus) will be triggered. I'm not sure whether additional cpus can
be added in system mode.

One problem, though, is that once you have registered an inline op against a
dynamically allocated array, you can't change the pointer afterwards when you
resize it (on detecting a new vcpu). So you need statically sized storage
somewhere.

Your question is a good one, and maybe we should define a MAX constant that
plugins can rely on, instead of an arbitrary amount.
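
To make the constraint concrete, here is a minimal sketch of the
statically-sized storage pattern this series uses in the test plugins
(names are illustrative only):

    #define MAX_CPUS 8
    /* must be statically sized: the base pointer registered with the
     * inline op cannot be changed afterwards */
    static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];

    static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
    {
        for (size_t i = 0; i < qemu_plugin_tb_n_insns(tb); i++) {
            struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
            qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
                insn, QEMU_PLUGIN_INLINE_ADD_U64,
                count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
        }
    }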

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/12] plugins: implement inline operation with cpu_index offset
  2024-01-11 14:23 ` [PATCH 01/12] plugins: implement inline operation with cpu_index offset Pierrick Bouvier
@ 2024-01-11 22:04   ` Richard Henderson
  2024-01-12 14:27     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2024-01-11 22:04 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> Instead of working on a fixed memory location, allow to index it based
> on cpu_index and a given offset (ptr + cpu_index * offset).
> Current semantic is not modified as we use a 0 offset, thus inline
> operation still targets always the same memory location.
> 
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   accel/tcg/plugin-gen.c | 60 +++++++++++++++++++++++++++++++++++-------
>   include/qemu/plugin.h  |  1 +
>   plugins/api.c          |  7 ++---
>   plugins/core.c         | 11 +++++---
>   plugins/plugin.h       |  5 ++--
>   5 files changed, 65 insertions(+), 19 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

For the to-do list: add mul -> shl strength reduction in fold_mul().


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 02/12] plugins: add inline operation per vcpu
  2024-01-11 14:23 ` [PATCH 02/12] plugins: add inline operation per vcpu Pierrick Bouvier
@ 2024-01-11 22:08   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2024-01-11 22:08 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> Extends API with three new functions:
> qemu_plugin_register_vcpu_{tb, insn, mem}_exec_inline_per_vcpu().
> 
> Compared to non per_vcpu versions, ptr is now a base, and current
> cpu_index and an offset are used to compute memory location on which
> operation happens (ptr + cpu_index * offset).
> 
> This allows to have a thread-safe version of inline operations.
> 
> Having a flexible offset is useful in case a user wants to target a
> memory location embedded into a struct. In this case, the offset between
> two memory locations will be bigger than sizeof(uint64_t).
> 
> Signed-off-by: Pierrick Bouvier<pierrick.bouvier@linaro.org>
> ---
>   include/qemu/qemu-plugin.h   | 56 +++++++++++++++++++++++++++++++++++-
>   plugins/api.c                | 36 ++++++++++++++++++++---
>   plugins/qemu-plugins.symbols |  3 ++
>   3 files changed, 90 insertions(+), 5 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API
  2024-01-11 14:23 ` [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API Pierrick Bouvier
@ 2024-01-11 22:10   ` Richard Henderson
  2024-01-12  3:51     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2024-01-11 22:10 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   tests/plugin/inline.c | 17 -----------------
>   1 file changed, 17 deletions(-)

Was this supposed to be together with patch 6?

r~

> 
> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
> index 6114ebca545..ae59f7af7a7 100644
> --- a/tests/plugin/inline.c
> +++ b/tests/plugin/inline.c
> @@ -18,15 +18,12 @@
>   static uint64_t count_tb;
>   static uint64_t count_tb_per_vcpu[MAX_CPUS];
>   static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
> -static uint64_t count_tb_inline_racy;
>   static uint64_t count_insn;
>   static uint64_t count_insn_per_vcpu[MAX_CPUS];
>   static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
> -static uint64_t count_insn_inline_racy;
>   static uint64_t count_mem;
>   static uint64_t count_mem_per_vcpu[MAX_CPUS];
>   static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
> -static uint64_t count_mem_inline_racy;
>   static GMutex tb_lock;
>   static GMutex insn_lock;
>   static GMutex mem_lock;
> @@ -50,11 +47,9 @@ static void stats_insn(void)
>       printf("insn: %" PRIu64 "\n", expected);
>       printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
>       printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
> -    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
>       g_assert(expected > 0);
>       g_assert(per_vcpu == expected);
>       g_assert(inl_per_vcpu == expected);
> -    g_assert(count_insn_inline_racy <= expected);
>   }
>   
>   static void stats_tb(void)
> @@ -65,11 +60,9 @@ static void stats_tb(void)
>       printf("tb: %" PRIu64 "\n", expected);
>       printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
>       printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
> -    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
>       g_assert(expected > 0);
>       g_assert(per_vcpu == expected);
>       g_assert(inl_per_vcpu == expected);
> -    g_assert(count_tb_inline_racy <= expected);
>   }
>   
>   static void stats_mem(void)
> @@ -80,11 +73,9 @@ static void stats_mem(void)
>       printf("mem: %" PRIu64 "\n", expected);
>       printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
>       printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
> -    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
>       g_assert(expected > 0);
>       g_assert(per_vcpu == expected);
>       g_assert(inl_per_vcpu == expected);
> -    g_assert(count_mem_inline_racy <= expected);
>   }
>   
>   static void plugin_exit(qemu_plugin_id_t id, void *udata)
> @@ -142,8 +133,6 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>   {
>       qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
>                                            QEMU_PLUGIN_CB_NO_REGS, 0);
> -    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
> -                                             &count_tb_inline_racy, 1);
>       qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>           tb, QEMU_PLUGIN_INLINE_ADD_U64,
>           count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
> @@ -152,18 +141,12 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>           struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
>           qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
>                                                  QEMU_PLUGIN_CB_NO_REGS, 0);
> -        qemu_plugin_register_vcpu_insn_exec_inline(
> -            insn, QEMU_PLUGIN_INLINE_ADD_U64,
> -            &count_insn_inline_racy, 1);
>           qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
>               insn, QEMU_PLUGIN_INLINE_ADD_U64,
>               count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
>           qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
>                                            QEMU_PLUGIN_CB_NO_REGS,
>                                            QEMU_PLUGIN_MEM_RW, 0);
> -        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
> -                                             QEMU_PLUGIN_INLINE_ADD_U64,
> -                                             &count_mem_inline_racy, 1);
>           qemu_plugin_register_vcpu_mem_inline_per_vcpu(
>               insn, QEMU_PLUGIN_MEM_RW,
>               QEMU_PLUGIN_INLINE_ADD_U64,



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 05/12] tests/plugin/mem: fix race condition with callbacks
  2024-01-11 14:23 ` [PATCH 05/12] tests/plugin/mem: fix race condition with callbacks Pierrick Bouvier
@ 2024-01-11 22:12   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2024-01-11 22:12 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> Introduce a lock so global count is correct.
> This was found by comparing with new inline per_vcpu inline op.
> 
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   tests/plugin/mem.c | 3 +++
>   1 file changed, 3 insertions(+)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 07/12] tests/plugin/insn: migrate to new per_vcpu API
  2024-01-11 14:23 ` [PATCH 07/12] tests/plugin/insn: " Pierrick Bouvier
@ 2024-01-11 22:14   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2024-01-11 22:14 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   tests/plugin/insn.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

> @@ -195,6 +200,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>                   fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
>                   return -1;
>               }
> +            g_assert(info->system.smp_vcpus <= MAX_CPUS);

Alex, can we do better than aborting here?


r~



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 08/12] tests/plugin/bb: migrate to new per_vcpu API
  2024-01-11 14:23 ` [PATCH 08/12] tests/plugin/bb: " Pierrick Bouvier
@ 2024-01-11 22:15   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2024-01-11 22:15 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> Since we need a fixed offset between count memory location, we now need
> a contiguous array of CPUCount (instead of array of pointers).
> 
> Signed-off-by: Pierrick Bouvier<pierrick.bouvier@linaro.org>
> ---
>   tests/plugin/bb.c | 54 +++++++++++++++++++++++------------------------
>   1 file changed, 27 insertions(+), 27 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API
  2024-01-11 22:10   ` Richard Henderson
@ 2024-01-12  3:51     ` Pierrick Bouvier
  2024-01-12  8:40       ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-12  3:51 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 02:10, Richard Henderson wrote:
> On 1/12/24 01:23, Pierrick Bouvier wrote:
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>>    tests/plugin/inline.c | 17 -----------------
>>    1 file changed, 17 deletions(-)
> 
> Was this supposed to be together with patch 6?
> 

My goal was to have a version that still uses the original API.
If you prefer this to be squashed, no problem doing that.

> r~
> 
>>
>> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
>> index 6114ebca545..ae59f7af7a7 100644
>> --- a/tests/plugin/inline.c
>> +++ b/tests/plugin/inline.c
>> @@ -18,15 +18,12 @@
>>    static uint64_t count_tb;
>>    static uint64_t count_tb_per_vcpu[MAX_CPUS];
>>    static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
>> -static uint64_t count_tb_inline_racy;
>>    static uint64_t count_insn;
>>    static uint64_t count_insn_per_vcpu[MAX_CPUS];
>>    static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
>> -static uint64_t count_insn_inline_racy;
>>    static uint64_t count_mem;
>>    static uint64_t count_mem_per_vcpu[MAX_CPUS];
>>    static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
>> -static uint64_t count_mem_inline_racy;
>>    static GMutex tb_lock;
>>    static GMutex insn_lock;
>>    static GMutex mem_lock;
>> @@ -50,11 +47,9 @@ static void stats_insn(void)
>>        printf("insn: %" PRIu64 "\n", expected);
>>        printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>        printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>> -    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
>>        g_assert(expected > 0);
>>        g_assert(per_vcpu == expected);
>>        g_assert(inl_per_vcpu == expected);
>> -    g_assert(count_insn_inline_racy <= expected);
>>    }
>>    
>>    static void stats_tb(void)
>> @@ -65,11 +60,9 @@ static void stats_tb(void)
>>        printf("tb: %" PRIu64 "\n", expected);
>>        printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>        printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>> -    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
>>        g_assert(expected > 0);
>>        g_assert(per_vcpu == expected);
>>        g_assert(inl_per_vcpu == expected);
>> -    g_assert(count_tb_inline_racy <= expected);
>>    }
>>    
>>    static void stats_mem(void)
>> @@ -80,11 +73,9 @@ static void stats_mem(void)
>>        printf("mem: %" PRIu64 "\n", expected);
>>        printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>        printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>> -    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
>>        g_assert(expected > 0);
>>        g_assert(per_vcpu == expected);
>>        g_assert(inl_per_vcpu == expected);
>> -    g_assert(count_mem_inline_racy <= expected);
>>    }
>>    
>>    static void plugin_exit(qemu_plugin_id_t id, void *udata)
>> @@ -142,8 +133,6 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>>    {
>>        qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
>>                                             QEMU_PLUGIN_CB_NO_REGS, 0);
>> -    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
>> -                                             &count_tb_inline_racy, 1);
>>        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>>            tb, QEMU_PLUGIN_INLINE_ADD_U64,
>>            count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
>> @@ -152,18 +141,12 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>>            struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
>>            qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
>>                                                   QEMU_PLUGIN_CB_NO_REGS, 0);
>> -        qemu_plugin_register_vcpu_insn_exec_inline(
>> -            insn, QEMU_PLUGIN_INLINE_ADD_U64,
>> -            &count_insn_inline_racy, 1);
>>            qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
>>                insn, QEMU_PLUGIN_INLINE_ADD_U64,
>>                count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
>>            qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
>>                                             QEMU_PLUGIN_CB_NO_REGS,
>>                                             QEMU_PLUGIN_MEM_RW, 0);
>> -        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
>> -                                             QEMU_PLUGIN_INLINE_ADD_U64,
>> -                                             &count_mem_inline_racy, 1);
>>            qemu_plugin_register_vcpu_mem_inline_per_vcpu(
>>                insn, QEMU_PLUGIN_MEM_RW,
>>                QEMU_PLUGIN_INLINE_ADD_U64,
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API
  2024-01-12  3:51     ` Pierrick Bouvier
@ 2024-01-12  8:40       ` Richard Henderson
  2024-01-12  8:58         ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2024-01-12  8:40 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 14:51, Pierrick Bouvier wrote:
> On 1/12/24 02:10, Richard Henderson wrote:
>> On 1/12/24 01:23, Pierrick Bouvier wrote:
>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> ---
>>>    tests/plugin/inline.c | 17 -----------------
>>>    1 file changed, 17 deletions(-)
>>
>> Was this supposed to be together with patch 6?
>>
> 
> My goal was to have a version that still uses original API.
> If you prefer this to be squashed, no problem to do it.

My confusion is that this patch does not "migrate" anything -- it only removes code.  Is
it just that the description is inaccurate?  But it appears that the combination of 4+6
would "migrate" to the new API.


r~

> 
>> r~
>>
>>>
>>> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
>>> index 6114ebca545..ae59f7af7a7 100644
>>> --- a/tests/plugin/inline.c
>>> +++ b/tests/plugin/inline.c
>>> @@ -18,15 +18,12 @@
>>>    static uint64_t count_tb;
>>>    static uint64_t count_tb_per_vcpu[MAX_CPUS];
>>>    static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
>>> -static uint64_t count_tb_inline_racy;
>>>    static uint64_t count_insn;
>>>    static uint64_t count_insn_per_vcpu[MAX_CPUS];
>>>    static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
>>> -static uint64_t count_insn_inline_racy;
>>>    static uint64_t count_mem;
>>>    static uint64_t count_mem_per_vcpu[MAX_CPUS];
>>>    static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
>>> -static uint64_t count_mem_inline_racy;
>>>    static GMutex tb_lock;
>>>    static GMutex insn_lock;
>>>    static GMutex mem_lock;
>>> @@ -50,11 +47,9 @@ static void stats_insn(void)
>>>        printf("insn: %" PRIu64 "\n", expected);
>>>        printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>>        printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>> -    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
>>>        g_assert(expected > 0);
>>>        g_assert(per_vcpu == expected);
>>>        g_assert(inl_per_vcpu == expected);
>>> -    g_assert(count_insn_inline_racy <= expected);
>>>    }
>>>    static void stats_tb(void)
>>> @@ -65,11 +60,9 @@ static void stats_tb(void)
>>>        printf("tb: %" PRIu64 "\n", expected);
>>>        printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>>        printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>> -    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
>>>        g_assert(expected > 0);
>>>        g_assert(per_vcpu == expected);
>>>        g_assert(inl_per_vcpu == expected);
>>> -    g_assert(count_tb_inline_racy <= expected);
>>>    }
>>>    static void stats_mem(void)
>>> @@ -80,11 +73,9 @@ static void stats_mem(void)
>>>        printf("mem: %" PRIu64 "\n", expected);
>>>        printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>>        printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>> -    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
>>>        g_assert(expected > 0);
>>>        g_assert(per_vcpu == expected);
>>>        g_assert(inl_per_vcpu == expected);
>>> -    g_assert(count_mem_inline_racy <= expected);
>>>    }
>>>    static void plugin_exit(qemu_plugin_id_t id, void *udata)
>>> @@ -142,8 +133,6 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct 
>>> qemu_plugin_tb *tb)
>>>    {
>>>        qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
>>>                                             QEMU_PLUGIN_CB_NO_REGS, 0);
>>> -    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
>>> -                                             &count_tb_inline_racy, 1);
>>>        qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>>>            tb, QEMU_PLUGIN_INLINE_ADD_U64,
>>>            count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
>>> @@ -152,18 +141,12 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct 
>>> qemu_plugin_tb *tb)
>>>            struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
>>>            qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
>>>                                                   QEMU_PLUGIN_CB_NO_REGS, 0);
>>> -        qemu_plugin_register_vcpu_insn_exec_inline(
>>> -            insn, QEMU_PLUGIN_INLINE_ADD_U64,
>>> -            &count_insn_inline_racy, 1);
>>>            qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
>>>                insn, QEMU_PLUGIN_INLINE_ADD_U64,
>>>                count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
>>>            qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
>>>                                             QEMU_PLUGIN_CB_NO_REGS,
>>>                                             QEMU_PLUGIN_MEM_RW, 0);
>>> -        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
>>> -                                             QEMU_PLUGIN_INLINE_ADD_U64,
>>> -                                             &count_mem_inline_racy, 1);
>>>            qemu_plugin_register_vcpu_mem_inline_per_vcpu(
>>>                insn, QEMU_PLUGIN_MEM_RW,
>>>                QEMU_PLUGIN_INLINE_ADD_U64,
>>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 09/12] contrib/plugins/hotblocks: migrate to new per_vcpu API
  2024-01-11 14:23 ` [PATCH 09/12] contrib/plugins/hotblocks: " Pierrick Bouvier
@ 2024-01-12  8:42   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2024-01-12  8:42 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> Signed-off-by: Pierrick Bouvier<pierrick.bouvier@linaro.org>
> ---
>   contrib/plugins/hotblocks.c | 25 +++++++++++++++++++------
>   1 file changed, 19 insertions(+), 6 deletions(-)

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 10/12] contrib/plugins/howvec: migrate to new per_vcpu API
  2024-01-11 14:23 ` [PATCH 10/12] contrib/plugins/howvec: " Pierrick Bouvier
@ 2024-01-12  8:44   ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2024-01-12  8:44 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 01:23, Pierrick Bouvier wrote:
> @@ -180,11 +190,11 @@ static void plugin_exit(qemu_plugin_id_t id, void *p)
>           class = &class_table[i];
>           switch (class->what) {
>           case COUNT_CLASS:
> -            if (class->count || verbose) {
> +            if (count(class->count) || verbose) {
>                   g_string_append_printf(report,
>                                          "Class: %-24s\t(%" PRId64 " hits)\n",
>                                          class->class,
> -                                       class->count);
> +                                       count(class->count));
>               }

Compute count once.  Otherwise,

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API
  2024-01-12  8:40       ` Richard Henderson
@ 2024-01-12  8:58         ` Pierrick Bouvier
  0 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-12  8:58 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 12:40, Richard Henderson wrote:
> On 1/12/24 14:51, Pierrick Bouvier wrote:
>> On 1/12/24 02:10, Richard Henderson wrote:
>>> On 1/12/24 01:23, Pierrick Bouvier wrote:
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> ---
>>>>     tests/plugin/inline.c | 17 -----------------
>>>>     1 file changed, 17 deletions(-)
>>>
>>> Was this supposed to be together with patch 6?
>>>
>>
>> My goal was to have a version that still uses original API.
>> If you prefer this to be squashed, no problem to do it.
> 
> My confusion is that this patch does not "migrate" anything -- it only removes code.  Is
> the just that the description is inaccurate?  But it appears that the combination of 4+6
> would "migrate" to the new API.
> 

You're right, the commit message is incorrect, as it is just removing
the use of the old API. Well, I think having this as a split commit does not
add any value to this series, so I'll simply squash it into the previous one.

> 
> r~
> 
>>
>>> r~
>>>
>>>>
>>>> diff --git a/tests/plugin/inline.c b/tests/plugin/inline.c
>>>> index 6114ebca545..ae59f7af7a7 100644
>>>> --- a/tests/plugin/inline.c
>>>> +++ b/tests/plugin/inline.c
>>>> @@ -18,15 +18,12 @@
>>>>     static uint64_t count_tb;
>>>>     static uint64_t count_tb_per_vcpu[MAX_CPUS];
>>>>     static uint64_t count_tb_inline_per_vcpu[MAX_CPUS];
>>>> -static uint64_t count_tb_inline_racy;
>>>>     static uint64_t count_insn;
>>>>     static uint64_t count_insn_per_vcpu[MAX_CPUS];
>>>>     static uint64_t count_insn_inline_per_vcpu[MAX_CPUS];
>>>> -static uint64_t count_insn_inline_racy;
>>>>     static uint64_t count_mem;
>>>>     static uint64_t count_mem_per_vcpu[MAX_CPUS];
>>>>     static uint64_t count_mem_inline_per_vcpu[MAX_CPUS];
>>>> -static uint64_t count_mem_inline_racy;
>>>>     static GMutex tb_lock;
>>>>     static GMutex insn_lock;
>>>>     static GMutex mem_lock;
>>>> @@ -50,11 +47,9 @@ static void stats_insn(void)
>>>>         printf("insn: %" PRIu64 "\n", expected);
>>>>         printf("insn: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>>>         printf("insn: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>>> -    printf("insn: %" PRIu64 " (inline racy)\n", count_insn_inline_racy);
>>>>         g_assert(expected > 0);
>>>>         g_assert(per_vcpu == expected);
>>>>         g_assert(inl_per_vcpu == expected);
>>>> -    g_assert(count_insn_inline_racy <= expected);
>>>>     }
>>>>     static void stats_tb(void)
>>>> @@ -65,11 +60,9 @@ static void stats_tb(void)
>>>>         printf("tb: %" PRIu64 "\n", expected);
>>>>         printf("tb: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>>>         printf("tb: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>>> -    printf("tb: %" PRIu64 " (inline racy)\n", count_tb_inline_racy);
>>>>         g_assert(expected > 0);
>>>>         g_assert(per_vcpu == expected);
>>>>         g_assert(inl_per_vcpu == expected);
>>>> -    g_assert(count_tb_inline_racy <= expected);
>>>>     }
>>>>     static void stats_mem(void)
>>>> @@ -80,11 +73,9 @@ static void stats_mem(void)
>>>>         printf("mem: %" PRIu64 "\n", expected);
>>>>         printf("mem: %" PRIu64 " (per vcpu)\n", per_vcpu);
>>>>         printf("mem: %" PRIu64 " (per vcpu inline)\n", inl_per_vcpu);
>>>> -    printf("mem: %" PRIu64 " (inline racy)\n", count_mem_inline_racy);
>>>>         g_assert(expected > 0);
>>>>         g_assert(per_vcpu == expected);
>>>>         g_assert(inl_per_vcpu == expected);
>>>> -    g_assert(count_mem_inline_racy <= expected);
>>>>     }
>>>>     static void plugin_exit(qemu_plugin_id_t id, void *udata)
>>>> @@ -142,8 +133,6 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct
>>>> qemu_plugin_tb *tb)
>>>>     {
>>>>         qemu_plugin_register_vcpu_tb_exec_cb(tb, vcpu_tb_exec,
>>>>                                              QEMU_PLUGIN_CB_NO_REGS, 0);
>>>> -    qemu_plugin_register_vcpu_tb_exec_inline(tb, QEMU_PLUGIN_INLINE_ADD_U64,
>>>> -                                             &count_tb_inline_racy, 1);
>>>>         qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>>>>             tb, QEMU_PLUGIN_INLINE_ADD_U64,
>>>>             count_tb_inline_per_vcpu, sizeof(uint64_t), 1);
>>>> @@ -152,18 +141,12 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct
>>>> qemu_plugin_tb *tb)
>>>>             struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, idx);
>>>>             qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec,
>>>>                                                    QEMU_PLUGIN_CB_NO_REGS, 0);
>>>> -        qemu_plugin_register_vcpu_insn_exec_inline(
>>>> -            insn, QEMU_PLUGIN_INLINE_ADD_U64,
>>>> -            &count_insn_inline_racy, 1);
>>>>             qemu_plugin_register_vcpu_insn_exec_inline_per_vcpu(
>>>>                 insn, QEMU_PLUGIN_INLINE_ADD_U64,
>>>>                 count_insn_inline_per_vcpu, sizeof(uint64_t), 1);
>>>>             qemu_plugin_register_vcpu_mem_cb(insn, &vcpu_mem_access,
>>>>                                              QEMU_PLUGIN_CB_NO_REGS,
>>>>                                              QEMU_PLUGIN_MEM_RW, 0);
>>>> -        qemu_plugin_register_vcpu_mem_inline(insn, QEMU_PLUGIN_MEM_RW,
>>>> -                                             QEMU_PLUGIN_INLINE_ADD_U64,
>>>> -                                             &count_mem_inline_racy, 1);
>>>>             qemu_plugin_register_vcpu_mem_inline_per_vcpu(
>>>>                 insn, QEMU_PLUGIN_MEM_RW,
>>>>                 QEMU_PLUGIN_INLINE_ADD_U64,
>>>
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/12] plugins: implement inline operation with cpu_index offset
  2024-01-11 22:04   ` Richard Henderson
@ 2024-01-12 14:27     ` Pierrick Bouvier
  2024-01-12 22:22       ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-12 14:27 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/12/24 02:04, Richard Henderson wrote:
> On 1/12/24 01:23, Pierrick Bouvier wrote:
>> Instead of working on a fixed memory location, allow to index it based
>> on cpu_index and a given offset (ptr + cpu_index * offset).
>> Current semantic is not modified as we use a 0 offset, thus inline
>> operation still targets always the same memory location.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>>    accel/tcg/plugin-gen.c | 60 +++++++++++++++++++++++++++++++++++-------
>>    include/qemu/plugin.h  |  1 +
>>    plugins/api.c          |  7 ++---
>>    plugins/core.c         | 11 +++++---
>>    plugins/plugin.h       |  5 ++--
>>    5 files changed, 65 insertions(+), 19 deletions(-)
> 
> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
> 
> For the to-do list: add mul -> shl strength reduction in fold_mul().
>

Would you like me to add a todo somewhere about it? Or is it a reminder 
for follow-up work?

> 
> r~


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 12/12] MAINTAINERS: Add myself as reviewer for TCG Plugins
  2024-01-11 14:23 ` [PATCH 12/12] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
@ 2024-01-12 15:53   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 35+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-01-12 15:53 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Richard Henderson,
	Paolo Bonzini, Alexandre Iooss

On 11/1/24 15:23, Pierrick Bouvier wrote:
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   MAINTAINERS | 1 +
>   1 file changed, 1 insertion(+)

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-11 17:20     ` Pierrick Bouvier
@ 2024-01-12 17:20       ` Alex Bennée
  2024-01-13  5:16         ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Bennée @ 2024-01-12 17:20 UTC (permalink / raw)
  To: Pierrick Bouvier
  Cc: Philippe Mathieu-Daudé, qemu-devel, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>> Hi Pierrick,
>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>> For now, it simply performs instruction, bb and mem count, and ensure
>>> that inline vs callback versions have the same result. Later, we'll
>>> extend it when new inline operations are added.
>>>
>>> Use existing plugins to test everything works is a bit cumbersome, as
>>> different events are treated in different plugins. Thus, this new one.
>>>
>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> ---
>>>    tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>>>    tests/plugin/meson.build |   2 +-
>>>    2 files changed, 184 insertions(+), 1 deletion(-)
>>>    create mode 100644 tests/plugin/inline.c
>> 
>>> +#define MAX_CPUS 8
>> Where does this value come from?
>> 
>
> The plugin tests/plugin/insn.c had this constant, so I picked it up
> from here.
>
>> Should the pluggin API provide a helper to ask TCG how many
>> vCPUs are created?
>
> In user mode, we can't know how many simultaneous threads (and thus
> vcpu) will be triggered by advance. I'm not sure if additional cpus
> can be added in system mode.
>
> One problem though, is that when you register an inline op with a
> dynamic array, when you resize it (when detecting a new vcpu), you
> can't change it afterwards. So, you need a storage statically sized
> somewhere.
>
> Your question is good, and maybe we should define a MAX constant that
> plugins should rely on, instead of a random amount.

For user-mode it can be infinite. The existing plugins do this by
ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
scoreboard as well? Of course that does introduce a trap for those using
user-mode...
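
A rough sketch of that modulo approach, for illustration only:

    #define MAX_CPUS 8
    static uint64_t counts[MAX_CPUS];

    static void vcpu_insn_exec(unsigned int cpu_index, void *udata)
    {
        /* wrap the index so user-mode can never write past the array */
        counts[cpu_index % MAX_CPUS]++;
    }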

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 01/12] plugins: implement inline operation with cpu_index offset
  2024-01-12 14:27     ` Pierrick Bouvier
@ 2024-01-12 22:22       ` Richard Henderson
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2024-01-12 22:22 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Mahmoud Mandour, Paolo Bonzini, Alexandre Iooss

On 1/13/24 01:27, Pierrick Bouvier wrote:
> On 1/12/24 02:04, Richard Henderson wrote:
>> On 1/12/24 01:23, Pierrick Bouvier wrote:
>>> Instead of working on a fixed memory location, allow to index it based
>>> on cpu_index and a given offset (ptr + cpu_index * offset).
>>> Current semantic is not modified as we use a 0 offset, thus inline
>>> operation still targets always the same memory location.
>>>
>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>> ---
>>>    accel/tcg/plugin-gen.c | 60 +++++++++++++++++++++++++++++++++++-------
>>>    include/qemu/plugin.h  |  1 +
>>>    plugins/api.c          |  7 ++---
>>>    plugins/core.c         | 11 +++++---
>>>    plugins/plugin.h       |  5 ++--
>>>    5 files changed, 65 insertions(+), 19 deletions(-)
>>
>> Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>>
>> For the to-do list: add mul -> shl strength reduction in fold_mul().
>>
> 
> Would you like me to add a todo somewhere about it? Or is it a reminder for follow-up work?

It's a reminder to myself for follow-up work.


r~



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-12 17:20       ` Alex Bennée
@ 2024-01-13  5:16         ` Pierrick Bouvier
  2024-01-13 17:16           ` Alex Bennée
  0 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-13  5:16 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Philippe Mathieu-Daudé, qemu-devel, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

On 1/12/24 21:20, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> 
>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>> Hi Pierrick,
>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>> that inline vs callback versions have the same result. Later, we'll
>>>> extend it when new inline operations are added.
>>>>
>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>> different events are treated in different plugins. Thus, this new one.
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> ---
>>>>     tests/plugin/inline.c    | 183 +++++++++++++++++++++++++++++++++++++++
>>>>     tests/plugin/meson.build |   2 +-
>>>>     2 files changed, 184 insertions(+), 1 deletion(-)
>>>>     create mode 100644 tests/plugin/inline.c
>>>
>>>> +#define MAX_CPUS 8
>>> Where does this value come from?
>>>
>>
>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>> from here.
>>
>>> Should the pluggin API provide a helper to ask TCG how many
>>> vCPUs are created?
>>
>> In user mode, we can't know how many simultaneous threads (and thus
>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>> can be added in system mode.
>>
>> One problem though, is that when you register an inline op with a
>> dynamic array, when you resize it (when detecting a new vcpu), you
>> can't change it afterwards. So, you need a storage statically sized
>> somewhere.
>>
>> Your question is good, and maybe we should define a MAX constant that
>> plugins should rely on, instead of a random amount.
> 
> For user-mode it can be infinite. The existing plugins do this by
> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
> scoreboard as well? Of course that does introduce a trap for those using
> user-mode...
> 

The problem with vcpu_index % max_vcpu is that it reintroduces a race
condition, though it's probably less frequent than with a single variable.
IMHO, yes, it solves the memory error, but it does not solve the initial
problem itself.

The simplest solution would be to have a size "big enough" for most 
cases, and abort when it's reached.

Another solution, much more complicated but correct, would be to move
memory management of the plugin scoreboard into the plugin runtime, and add a
level of indirection to access it. Every time a new vcpu is added, we
can grow the storage dynamically. This way, the array can grow, and ultimately
the plugin can poke at its content/size. I'm not sure this complexity is what
we want, though.
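
To give an idea of the shape (hypothetical names, purely illustrative), the
indirection could look roughly like:

    /* owned and resized by the plugin runtime, e.g. under
     * start/end_exclusive(); plugins never hold a raw base pointer */
    struct plugin_scoreboard {
        void *data;        /* reallocated when a new vcpu appears */
        size_t size;       /* current number of per-vcpu slots */
        size_t elem_size;  /* size of one per-vcpu element */
    };

    static inline void *scoreboard_slot(struct plugin_scoreboard *sb,
                                        unsigned int cpu_index)
    {
        return (char *)sb->data + cpu_index * sb->elem_size;
    }

Generated code would then load the current base pointer through the
scoreboard instead of baking a fixed address into the TB.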

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-13  5:16         ` Pierrick Bouvier
@ 2024-01-13 17:16           ` Alex Bennée
  2024-01-15  7:06             ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Bennée @ 2024-01-13 17:16 UTC (permalink / raw)
  To: Pierrick Bouvier
  Cc: Philippe Mathieu-Daudé, qemu-devel, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/12/24 21:20, Alex Bennée wrote:
>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>> 
>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>> Hi Pierrick,
>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>> extend it when new inline operations are added.
>>>>>
>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>
<snip>
>>>>> +#define MAX_CPUS 8
>>>> Where does this value come from?
>>>>
>>>
>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>> from here.
>>>
>>>> Should the pluggin API provide a helper to ask TCG how many
>>>> vCPUs are created?
>>>
>>> In user mode, we can't know how many simultaneous threads (and thus
>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>> can be added in system mode.
>>>
>>> One problem though, is that when you register an inline op with a
>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>> can't change it afterwards. So, you need a storage statically sized
>>> somewhere.
>>>
>>> Your question is good, and maybe we should define a MAX constant that
>>> plugins should rely on, instead of a random amount.
>> For user-mode it can be infinite. The existing plugins do this by
>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>> scoreboard as well? Of course that does introduce a trap for those using
>> user-mode...
>> 
>
> The problem with vcpu-index % max_vcpu is that it reintroduces race
> condition, though it's probably less frequent than on a single
> variable. IMHO, yes it solves memory error, but does not solve the
> initial problem itself.
>
> The simplest solution would be to have a size "big enough" for most
> cases, and abort when it's reached.

Well that is simple enough for system emulation as max_vcpus is a bounded
number.

> Another solution, much more complicated, but correct, would be to move
> memory management of plugin scoreboard to plugin runtime, and add a
> level of indirection to access it.

That certainly gives us the most control and safety. We can then ensure
we'll never be writing past the bounds of the buffer. The plugin would
have to use an access function to get the pointer to read at the time it
cared about, and of course inline checks should be pretty simple.

> Every time a new vcpu is added, we
> can grow dynamically. This way, the array can grow, and ultimately,
> plugin can poke its content/size. I'm not sure this complexity is what
> we want though.

It doesn't seem too bad. We have a start/end_exclusive in *-user do_fork
where we could update pointers. If we are smart about growing the size
of the arrays we could avoid too much re-translation.

Do we want a limit of one scoreboard per thread? Can we store structures
in there?

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-13 17:16           ` Alex Bennée
@ 2024-01-15  7:06             ` Pierrick Bouvier
  2024-01-15  9:04               ` Alex Bennée
  0 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-15  7:06 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Philippe Mathieu-Daudé, qemu-devel, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

On 1/13/24 21:16, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> 
>> On 1/12/24 21:20, Alex Bennée wrote:
>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>
>>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>>> Hi Pierrick,
>>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>>> extend it when new inline operations are added.
>>>>>>
>>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>>
> <snip>
>>>>>> +#define MAX_CPUS 8
>>>>> Where does this value come from?
>>>>>
>>>>
>>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>>> from here.
>>>>
>>>>> Should the pluggin API provide a helper to ask TCG how many
>>>>> vCPUs are created?
>>>>
>>>> In user mode, we can't know how many simultaneous threads (and thus
>>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>>> can be added in system mode.
>>>>
>>>> One problem though, is that when you register an inline op with a
>>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>>> can't change it afterwards. So, you need a storage statically sized
>>>> somewhere.
>>>>
>>>> Your question is good, and maybe we should define a MAX constant that
>>>> plugins should rely on, instead of a random amount.
>>> For user-mode it can be infinite. The existing plugins do this by
>>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>>> scoreboard as well? Of course that does introduce a trap for those using
>>> user-mode...
>>>
>>
>> The problem with vcpu-index % max_vcpu is that it reintroduces race
>> condition, though it's probably less frequent than on a single
>> variable. IMHO, yes it solves memory error, but does not solve the
>> initial problem itself.
>>
>> The simplest solution would be to have a size "big enough" for most
>> cases, and abort when it's reached.
> 
> Well that is simple enough for system emulation as max_vcpus is a bounded
> number.
> 
>> Another solution, much more complicated, but correct, would be to move
>> memory management of plugin scoreboard to plugin runtime, and add a
>> level of indirection to access it.
> 
> That certainly gives us the most control and safety. We can then ensure
> we'll never write past the bounds of the buffer. The plugin would
> have to use an access function to get the pointer at the time it cares
> about reading, and of course inline checks should be pretty simple.
> 
>> Every time a new vcpu is added, we
>> can grow dynamically. This way, the array can grow, and ultimately,
>> plugin can poke its content/size. I'm not sure this complexity is what
>> we want though.
> 
> It doesn't seem too bad. We have a start/end_exclusive in *-user do_fork
> where we could update pointers. If we are smart about growing the size
> of the arrays, we could avoid too much re-translation.
>

I was concerned about a potential race when the scoreboard updates this 
pointer while other cpus are executing TBs that use it. But this concern 
is not valid, since start_exclusive ensures all other cpus are stopped.

The vcpu_init_hook function in plugins/core.c seems like a good location 
to add this logic. We would check if an update is needed, then 
start_exclusive(), update the scoreboard, and exit the exclusive section.
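
Roughly, as a sketch (plugin_scoreboards_capacity() and
plugin_scoreboards_grow() are made-up helpers standing in for whatever
the plugin runtime would actually provide):

/* Sketch, intended to live in plugins/core.c next to vcpu_init_hook(),
 * where start_exclusive()/end_exclusive() are already available. */
static void plugin_maybe_grow_scoreboards(unsigned int vcpu_index)
{
    if (vcpu_index < plugin_scoreboards_capacity()) {
        return;                          /* common case: nothing to resize */
    }
    start_exclusive();                   /* every other vcpu is stopped here */
    plugin_scoreboards_grow(vcpu_index + 1);
    end_exclusive();
}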

Do you think it's worth trying to inline the scoreboard pointer (and flush 
all TBs when it is updated), instead of simply adding an indirection to it? 
With this, we could avoid any need to re-translate anything.

> Do we want a limit of one scoreboard per thread? Can we store structures
> in there?
> 

From the current plugins' use cases, it seems that several scoreboards 
are needed.
Allowing structure storage seems a bit more tricky to me, because the 
memory may be reallocated, so users won't be allowed to point directly 
into it. I would be in favor of avoiding this (comments are welcome).
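
To make the constraint concrete: with the hypothetical scoreboard_entry()
accessor from earlier in the thread, plugin code would have to re-fetch
the entry on every use instead of caching the pointer, because the
backing buffer can move when a new vcpu appears. For example:

#include <stdint.h>

struct counts {
    uint64_t insns;
    uint64_t mem_accesses;
};

static scoreboard counts_sb;   /* hypothetical per-vcpu array of struct counts */

static void vcpu_insn_exec(unsigned int vcpu_index, void *udata)
{
    /* fetched fresh on every callback, never stored across callbacks */
    struct counts *c = scoreboard_entry(&counts_sb, vcpu_index);
    c->insns++;
}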

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-15  7:06             ` Pierrick Bouvier
@ 2024-01-15  9:04               ` Alex Bennée
  2024-01-16  7:46                 ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Alex Bennée @ 2024-01-15  9:04 UTC (permalink / raw)
  To: Pierrick Bouvier
  Cc: Philippe Mathieu-Daudé, qemu-devel, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:

> On 1/13/24 21:16, Alex Bennée wrote:
>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>> 
>>> On 1/12/24 21:20, Alex Bennée wrote:
>>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>>
>>>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>>>> Hi Pierrick,
>>>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>>>> extend it when new inline operations are added.
>>>>>>>
>>>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>>>
>> <snip>
>>>>>>> +#define MAX_CPUS 8
>>>>>> Where does this value come from?
>>>>>>
>>>>>
>>>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>>>> from here.
>>>>>
>>>>>> Should the pluggin API provide a helper to ask TCG how many
>>>>>> vCPUs are created?
>>>>>
>>>>> In user mode, we can't know how many simultaneous threads (and thus
>>>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>>>> can be added in system mode.
>>>>>
>>>>> One problem though, is that when you register an inline op with a
>>>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>>>> can't change it afterwards. So, you need a storage statically sized
>>>>> somewhere.
>>>>>
>>>>> Your question is good, and maybe we should define a MAX constant that
>>>>> plugins should rely on, instead of a random amount.
>>>> For user-mode it can be infinite. The existing plugins do this by
>>>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>>>> scoreboard as well? Of course that does introduce a trap for those using
>>>> user-mode...
>>>>
>>>
>>> The problem with vcpu-index % max_vcpu is that it reintroduces race
>>> condition, though it's probably less frequent than on a single
>>> variable. IMHO, yes it solves memory error, but does not solve the
>>> initial problem itself.
>>>
>>> The simplest solution would be to have a size "big enough" for most
>>> cases, and abort when it's reached.
>> Well that is simple enough for system emulation as max_vcpus is a
>> bounded
>> number.
>> 
>>> Another solution, much more complicated, but correct, would be to move
>>> memory management of plugin scoreboard to plugin runtime, and add a
>>> level of indirection to access it.
>> That certainly gives us the most control and safety. We can then
>> ensure we'll never write past the bounds of the buffer. The plugin would
>> have to use an access function to get the pointer at the time it cares
>> about reading, and of course inline checks should be pretty simple.
>> 
>>> Every time a new vcpu is added, we
>>> can grow dynamically. This way, the array can grow, and ultimately,
>>> plugin can poke its content/size. I'm not sure this complexity is what
>>> we want though.
>> It doesn't seem too bad. We have a start/end_exclusive in *-user
>> do_fork where we could update pointers. If we are smart about growing
>> the size of the arrays, we could avoid too much re-translation.
>>
>
> I was concerned about a potential race when the scoreboard updates
> this pointer while other cpus are executing TBs that use it. But this
> concern is not valid, since start_exclusive ensures all other cpus are
> stopped.
>
> The vcpu_init_hook function in plugins/core.c seems like a good location
> to add this logic. We would check if an update is needed, then
> start_exclusive(), update the scoreboard, and exit the exclusive section.

It might already be in the exclusive section. We should try and re-use
an existing exclusive region rather than adding a new one. It's
expensive so best avoided if we can.

> Do you think it's worth trying to inline the scoreboard pointer (and flush
> all TBs when it is updated), instead of simply adding an indirection to it?
> With this, we could avoid any need to re-translate anything.

Depends on the cost/complexity of the indirection, I guess.
Re-translation isn't super expensive if we, say, double the size of the
scoreboard each time we need to.

>> Do we want a limit of one scoreboard per thread? Can we store structures
>> in there?
>> 
>
> From the current plugins' use cases, it seems that several scoreboards
> are needed.
> Allowing structure storage seems a bit more tricky to me, because the
> memory may be reallocated, so users won't be allowed to point directly
> into it. I would be in favor of avoiding this (comments are welcome).

The init function can take a sizeof(entry) field and the inline op can
have an offset field (which for the simple 0 case can be folded away by
TCG).
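
A hypothetical shape for that, with made-up signatures just to illustrate
the suggestion (this is not what qemu-plugin.h declares at this point):

#include <stddef.h>
#include <stdint.h>

struct counts {
    uint64_t bbs;
    uint64_t insns;
};

/* scoreboard created with the per-vcpu element size, e.g. in the
 * install function: sb = qemu_plugin_scoreboard_new(sizeof(struct counts)); */
static qemu_plugin_scoreboard *sb;

static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
{
    /* the inline op selects a member of the element by byte offset;
     * offsetof(struct counts, bbs) == 0, which TCG can fold away */
    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
        tb, QEMU_PLUGIN_INLINE_ADD_U64,
        sb, offsetof(struct counts, bbs), 1);
}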

-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 03/12] tests/plugin: add test plugin for inline operations
  2024-01-15  9:04               ` Alex Bennée
@ 2024-01-16  7:46                 ` Pierrick Bouvier
  0 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2024-01-16  7:46 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Philippe Mathieu-Daudé, qemu-devel, Mahmoud Mandour,
	Richard Henderson, Paolo Bonzini, Alexandre Iooss

On 1/15/24 13:04, Alex Bennée wrote:
> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>
>> On 1/13/24 21:16, Alex Bennée wrote:
>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>
>>>> On 1/12/24 21:20, Alex Bennée wrote:
>>>>> Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
>>>>>
>>>>>> On 1/11/24 19:57, Philippe Mathieu-Daudé wrote:
>>>>>>> Hi Pierrick,
>>>>>>> On 11/1/24 15:23, Pierrick Bouvier wrote:
>>>>>>>> For now, it simply performs instruction, bb and mem count, and ensure
>>>>>>>> that inline vs callback versions have the same result. Later, we'll
>>>>>>>> extend it when new inline operations are added.
>>>>>>>>
>>>>>>>> Use existing plugins to test everything works is a bit cumbersome, as
>>>>>>>> different events are treated in different plugins. Thus, this new one.
>>>>>>>>
>>> <snip>
>>>>>>>> +#define MAX_CPUS 8
>>>>>>> Where does this value come from?
>>>>>>>
>>>>>>
>>>>>> The plugin tests/plugin/insn.c had this constant, so I picked it up
>>>>>> from here.
>>>>>>
>>>>>>> Should the pluggin API provide a helper to ask TCG how many
>>>>>>> vCPUs are created?
>>>>>>
>>>>>> In user mode, we can't know how many simultaneous threads (and thus
>>>>>> vcpu) will be triggered by advance. I'm not sure if additional cpus
>>>>>> can be added in system mode.
>>>>>>
>>>>>> One problem though, is that when you register an inline op with a
>>>>>> dynamic array, when you resize it (when detecting a new vcpu), you
>>>>>> can't change it afterwards. So, you need a storage statically sized
>>>>>> somewhere.
>>>>>>
>>>>>> Your question is good, and maybe we should define a MAX constant that
>>>>>> plugins should rely on, instead of a random amount.
>>>>> For user-mode it can be infinite. The existing plugins do this by
>>>>> ensuring vcpu_index % max_vcpu. Perhaps we just ensure that for the
>>>>> scoreboard as well? Of course that does introduce a trap for those using
>>>>> user-mode...
>>>>>
>>>>
>>>> The problem with vcpu-index % max_vcpu is that it reintroduces race
>>>> condition, though it's probably less frequent than on a single
>>>> variable. IMHO, yes it solves memory error, but does not solve the
>>>> initial problem itself.
>>>>
>>>> The simplest solution would be to have a size "big enough" for most
>>>> cases, and abort when it's reached.
>>> Well that is simple enough for system emulation as max_vcpus is a
>>> bounded
>>> number.
>>>
>>>> Another solution, much more complicated, but correct, would be to move
>>>> memory management of plugin scoreboard to plugin runtime, and add a
>>>> level of indirection to access it.
>>> That certainly gives us the most control and safety. We can then
>>> ensure we'll never write past the bounds of the buffer. The plugin would
>>> have to use an access function to get the pointer at the time it cares
>>> about reading, and of course inline checks should be pretty simple.
>>>
>>>> Every time a new vcpu is added, we
>>>> can grow dynamically. This way, the array can grow, and ultimately,
>>>> plugin can poke its content/size. I'm not sure this complexity is what
>>>> we want though.
>>> It doesn't seem too bad. We have a start/end_exclusive in *-user
>>> do_fork where we could update pointers. If we are smart about growing
>>> the size of the arrays, we could avoid too much re-translation.
>>>
>>
>> I was concerned about a potential race when the scoreboard updates
>> this pointer while other cpus are executing TBs that use it. But this
>> concern is not valid, since start_exclusive ensures all other cpus are
>> stopped.
>>
>> The vcpu_init_hook function in plugins/core.c seems like a good location
>> to add this logic. We would check if an update is needed, then
>> start_exclusive(), update the scoreboard, and exit the exclusive section.
> 
> It might already be in the exclusive section. We should try and re-use
> an existing exclusive region rather than adding a new one. It's
> expensive so best avoided if we can.
> 
>> Do you think it's worth trying to inline the scoreboard pointer (and flush
>> all TBs when it is updated), instead of simply adding an indirection to it?
>> With this, we could avoid any need to re-translate anything.
> 
> Depends on the cost/complexity of the indirection, I guess.
> Re-translation isn't super expensive if we, say, double the size of the
> scoreboard each time we need to.
>
>>> Do we want a limit of one scoreboard per thread? Can we store structures
>>> in there?
>>>
>>
>> From the current plugins' use cases, it seems that several scoreboards
>> are needed.
>> Allowing structure storage seems a bit more tricky to me, because the
>> memory may be reallocated, so users won't be allowed to point directly
>> into it. I would be in favor of avoiding this (comments are welcome).
> 
> The init function can take a sizeof(entry) field and the inline op can
> have an offset field (which for the simple 0 case can be folded away by
> TCG).
> 

Thanks for all your comments and guidance on this.

I implemented a new version, working with a scoreboard that gets resized 
automatically and that allows storing structs as well. The result is 
pretty satisfying: there is almost no more need to manually keep track 
of how many cpus have been used, while offering thread-safety by default.

I'll re-spin the series once I have cleaned up the commits and ported the 
existing plugins to it.
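
As a rough sketch of what this could look like on the plugin side
(hypothetical helper names; the exact API will be in the respun series),
a counting plugin no longer needs a MAX_CPUS constant or any manual
tracking of how many vcpus were seen:

static struct qemu_plugin_scoreboard *insn_count;   /* one uint64_t slot per vcpu */

static void vcpu_insn_exec(unsigned int vcpu_index, void *udata)
{
    uint64_t *c = qemu_plugin_scoreboard_find(insn_count, vcpu_index);
    (*c)++;   /* the slot is private to this vcpu, so a plain increment is safe */
}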

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2024-01-16  7:47 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-11 14:23 [PATCH 00/12] TCG Plugin inline operation enhancement Pierrick Bouvier
2024-01-11 14:23 ` [PATCH 01/12] plugins: implement inline operation with cpu_index offset Pierrick Bouvier
2024-01-11 22:04   ` Richard Henderson
2024-01-12 14:27     ` Pierrick Bouvier
2024-01-12 22:22       ` Richard Henderson
2024-01-11 14:23 ` [PATCH 02/12] plugins: add inline operation per vcpu Pierrick Bouvier
2024-01-11 22:08   ` Richard Henderson
2024-01-11 14:23 ` [PATCH 03/12] tests/plugin: add test plugin for inline operations Pierrick Bouvier
2024-01-11 15:57   ` Philippe Mathieu-Daudé
2024-01-11 17:20     ` Pierrick Bouvier
2024-01-12 17:20       ` Alex Bennée
2024-01-13  5:16         ` Pierrick Bouvier
2024-01-13 17:16           ` Alex Bennée
2024-01-15  7:06             ` Pierrick Bouvier
2024-01-15  9:04               ` Alex Bennée
2024-01-16  7:46                 ` Pierrick Bouvier
2024-01-11 14:23 ` [PATCH 04/12] tests/plugin/inline: migrate to new per_vcpu API Pierrick Bouvier
2024-01-11 22:10   ` Richard Henderson
2024-01-12  3:51     ` Pierrick Bouvier
2024-01-12  8:40       ` Richard Henderson
2024-01-12  8:58         ` Pierrick Bouvier
2024-01-11 14:23 ` [PATCH 05/12] tests/plugin/mem: fix race condition with callbacks Pierrick Bouvier
2024-01-11 22:12   ` Richard Henderson
2024-01-11 14:23 ` [PATCH 06/12] tests/plugin/mem: migrate to new per_vcpu API Pierrick Bouvier
2024-01-11 14:23 ` [PATCH 07/12] tests/plugin/insn: " Pierrick Bouvier
2024-01-11 22:14   ` Richard Henderson
2024-01-11 14:23 ` [PATCH 08/12] tests/plugin/bb: " Pierrick Bouvier
2024-01-11 22:15   ` Richard Henderson
2024-01-11 14:23 ` [PATCH 09/12] contrib/plugins/hotblocks: " Pierrick Bouvier
2024-01-12  8:42   ` Richard Henderson
2024-01-11 14:23 ` [PATCH 10/12] contrib/plugins/howvec: " Pierrick Bouvier
2024-01-12  8:44   ` Richard Henderson
2024-01-11 14:23 ` [PATCH 11/12] plugins: remove non per_vcpu inline operation from API Pierrick Bouvier
2024-01-11 14:23 ` [PATCH 12/12] MAINTAINERS: Add myself as reviewer for TCG Plugins Pierrick Bouvier
2024-01-12 15:53   ` Philippe Mathieu-Daudé
