qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v5 0/9] contrib/plugins: uftrace
@ 2025-08-08  2:06 Pierrick Bouvier
  2025-08-08  2:06 ` [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file Pierrick Bouvier
                   ` (9 more replies)
  0 siblings, 10 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:06 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

This plugin generates a binary trace compatible with the excellent uftrace:
https://github.com/namhyung/uftrace

In short, it tracks all function calls performed during execution, based on
frame pointer analysis. A big advantage over "uftrace record" is that it works
in system mode, allowing to trace a full system execution, which was the
original goal. It works as well in user mode, but uftrace itself already does
this. It's implemented for aarch64 and x86_64.

Let's start with concrete examples of the result.

First, in system mode, booting a stack using TF-A + U-boot + Linux:
- Two first stages of boot sequence in Arm Trusted Firmware (EL3 and S-EL1)
https://fileserver.linaro.org/s/kkxBS552W7nYESX/preview
- Stat and open syscalls in kernel
https://fileserver.linaro.org/s/dXe4MfraKg2F476/preview
- Poweroff sequence (from kernel back to firmware, NS-EL2 to EL3)
https://fileserver.linaro.org/s/oR2PtyGKJrqnfRf/preview

Full trace is available here:
https://fileserver.linaro.org/s/WsemLboPEzo24nw/download/aarch64_boot.json.gz
You can download and open it on https://ui.perfetto.dev/ to explore it.

Second, in user mode, tracing qemu-aarch64 (itself) running git --help:
- Loading program and its interpreter
https://fileserver.linaro.org/s/fie8JgX76yyL5cq/preview
- TB creation
https://fileserver.linaro.org/s/GXY6NKMw5EeRCew/preview

Full trace is available here:
https://fileserver.linaro.org/s/N8X8fnZ5yGRZLsT/download/qemu_aarch64_git_help.json.gz

If you had curiosity and now you're ready to give some attention, most of the
details you want to read are included in the documentation patch (final one).

Overhead is around x5-x15, and long traces can be directly filtered with uftrace
if needed.

I hope this plugin can help people trying to understand what happens out of the
user space, and get a better grasp of how firmwares, bootloader, and kernel
interact behind the curtain.

v2
--

- trace active stacks on exit
- do not erase map generated in system_emulation
- add documentation to generate restricted visual traces around specific events
  of execution

v3
--

- fix missing include unistd.h (build failed on MacOS only)

v4
--

- add support for x64

v5
--

- addressed Alex comments
- split plugin implementation in several commits
- removed instruction based timestamps (only use time based timestamps)
- removed sampling implementation

Pierrick Bouvier (9):
  contrib/plugins/uftrace: skeleton file
  contrib/plugins/uftrace: define cpu operations and implement aarch64
  contrib/plugins/uftrace: track callstack
  contrib/plugins/uftrace: implement tracing
  contrib/plugins/uftrace: implement privilege level tracing
  contrib/plugins/uftrace: generate additional files for uftrace
  contrib/plugins/uftrace: implement x64 support
  contrib/plugins/uftrace_symbols.py
  contrib/plugins/uftrace: add documentation

 docs/about/emulation.rst           | 197 +++++++
 contrib/plugins/uftrace.c          | 871 +++++++++++++++++++++++++++++
 contrib/plugins/meson.build        |   3 +-
 contrib/plugins/uftrace_symbols.py | 152 +++++
 4 files changed, 1222 insertions(+), 1 deletion(-)
 create mode 100644 contrib/plugins/uftrace.c
 create mode 100755 contrib/plugins/uftrace_symbols.py

-- 
2.47.2



^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
@ 2025-08-08  2:06 ` Pierrick Bouvier
  2025-08-08  8:14   ` Manos Pitsidianakis
  2025-08-08  2:06 ` [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64 Pierrick Bouvier
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:06 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

We define a scoreboard that will hold our data per cpu. As well, we
define a buffer per cpu that will be used to read registers and memories
in a thread-safe way.

For now, we just instrument all instructions with an empty callback.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace.c   | 74 +++++++++++++++++++++++++++++++++++++
 contrib/plugins/meson.build |  3 +-
 2 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 contrib/plugins/uftrace.c

diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
new file mode 100644
index 00000000000..d60c1077496
--- /dev/null
+++ b/contrib/plugins/uftrace.c
@@ -0,0 +1,74 @@
+/*
+ * Copyright (C) 2025, Pierrick Bouvier <pierrick.bouvier@linaro.org>
+ *
+ * Generates a trace compatible with uftrace (similar to uftrace record).
+ * https://github.com/namhyung/uftrace
+ *
+ * See docs/about/emulation.rst|Uftrace for details and examples.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <qemu-plugin.h>
+#include <glib.h>
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+typedef struct Cpu {
+    GByteArray *buf;
+} Cpu;
+
+static struct qemu_plugin_scoreboard *score;
+
+static void track_callstack(unsigned int cpu_index, void *udata)
+{
+}
+
+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+    size_t n_insns = qemu_plugin_tb_n_insns(tb);
+
+    for (int i = 0; i < n_insns; i++) {
+        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
+
+        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
+        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
+                QEMU_PLUGIN_CB_R_REGS,
+                (void *) pc);
+
+    }
+}
+
+static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
+{
+    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
+    cpu->buf = g_byte_array_new();
+}
+
+static void vcpu_end(unsigned int vcpu_index)
+{
+    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
+    g_byte_array_free(cpu->buf, true);
+    memset(cpu, 0, sizeof(Cpu));
+}
+
+static void at_exit(qemu_plugin_id_t id, void *data)
+{
+    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
+        vcpu_end(i);
+    }
+
+    qemu_plugin_scoreboard_free(score);
+}
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
+                                           const qemu_info_t *info,
+                                           int argc, char **argv)
+{
+    score = qemu_plugin_scoreboard_new(sizeof(Cpu));
+    qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
+    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
+
+    return 0;
+}
diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build
index 1876bc78438..7eb3629c95d 100644
--- a/contrib/plugins/meson.build
+++ b/contrib/plugins/meson.build
@@ -1,5 +1,6 @@
 contrib_plugins = ['bbv', 'cache', 'cflow', 'drcov', 'execlog', 'hotblocks',
-                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger']
+                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger',
+                   'uftrace']
 if host_os != 'windows'
   # lockstep uses socket.h
   contrib_plugins += 'lockstep'
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
  2025-08-08  2:06 ` [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file Pierrick Bouvier
@ 2025-08-08  2:06 ` Pierrick Bouvier
  2025-08-08  8:28   ` Manos Pitsidianakis
  2025-08-08  2:06 ` [PATCH v5 3/9] contrib/plugins/uftrace: track callstack Pierrick Bouvier
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:06 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

We define a new CpuOps structure that will be used to implement tracking
independently of guest architecture.

As well, we now instrument only instructions following ones that might
have touch the frame pointer.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace.c | 112 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 108 insertions(+), 4 deletions(-)

diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
index d60c1077496..4b1a2f38143 100644
--- a/contrib/plugins/uftrace.c
+++ b/contrib/plugins/uftrace.c
@@ -11,14 +11,94 @@
 
 #include <qemu-plugin.h>
 #include <glib.h>
+#include <stdio.h>
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
+typedef struct Cpu Cpu;
+
+typedef struct {
+    void (*init)(Cpu *cpu);
+    void (*end)(Cpu *cpu);
+    uint64_t (*get_frame_pointer)(Cpu *cpu);
+    bool (*does_insn_modify_frame_pointer)(const char *disas);
+} CpuOps;
+
 typedef struct Cpu {
     GByteArray *buf;
+    CpuOps ops;
+    void *arch;
 } Cpu;
 
+typedef struct {
+    struct qemu_plugin_register *reg_fp;
+} Aarch64Cpu;
+
 static struct qemu_plugin_scoreboard *score;
+static CpuOps arch_ops;
+
+static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
+{
+    GByteArray *buf = cpu->buf;
+    g_byte_array_set_size(buf, 0);
+    size_t sz = qemu_plugin_read_register(reg, buf);
+    g_assert(sz == 8);
+    g_assert(buf->len == 8);
+    return *((uint64_t *) buf->data);
+}
+
+static struct qemu_plugin_register *plugin_find_register(const char *name)
+{
+    g_autoptr(GArray) regs = qemu_plugin_get_registers();
+    for (int i = 0; i < regs->len; ++i) {
+        qemu_plugin_reg_descriptor *reg;
+        reg = &g_array_index(regs, qemu_plugin_reg_descriptor, i);
+        if (!strcmp(reg->name, name)) {
+            return reg->handle;
+        }
+    }
+    return NULL;
+}
+
+static uint64_t aarch64_get_frame_pointer(Cpu *cpu_)
+{
+    Aarch64Cpu *cpu = cpu_->arch;
+    return cpu_read_register64(cpu_, cpu->reg_fp);
+}
+
+static void aarch64_init(Cpu *cpu_)
+{
+    Aarch64Cpu *cpu = g_new0(Aarch64Cpu, 1);
+    cpu_->arch = cpu;
+    cpu->reg_fp = plugin_find_register("x29");
+    if (!cpu->reg_fp) {
+        fprintf(stderr, "uftrace plugin: frame pointer register (x29) is not "
+                        "available. Please use an AArch64 cpu (or -cpu max).\n");
+        g_abort();
+    }
+}
+
+static void aarch64_end(Cpu *cpu)
+{
+    g_free(cpu->arch);
+}
+
+static bool aarch64_does_insn_modify_frame_pointer(const char *disas)
+{
+    /*
+     * Check if current instruction concerns fp register "x29".
+     * We add a prefix space to make sure we don't match addresses dump
+     * in disassembly.
+     */
+    return strstr(disas, " x29");
+}
+
+static CpuOps aarch64_ops = {
+    .init = aarch64_init,
+    .end = aarch64_end,
+    .get_frame_pointer = aarch64_get_frame_pointer,
+    .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
+};
 
 static void track_callstack(unsigned int cpu_index, void *udata)
 {
@@ -28,20 +108,36 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
 {
     size_t n_insns = qemu_plugin_tb_n_insns(tb);
 
+    /*
+     * We instrument all instructions following one that might have updated
+     * the frame pointer. We always instrument first instruction in block, as
+     * last executed instruction, in previous tb, may have modified it.
+     */
+    bool instrument_insn = true;
     for (int i = 0; i < n_insns; i++) {
         struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
 
-        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
-        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
-                QEMU_PLUGIN_CB_R_REGS,
-                (void *) pc);
+        if (instrument_insn) {
+            uintptr_t pc = qemu_plugin_insn_vaddr(insn);
+            qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
+                                                   QEMU_PLUGIN_CB_R_REGS,
+                                                   (void *) pc);
+            instrument_insn = false;
+        }
 
+        char *disas = qemu_plugin_insn_disas(insn);
+        if (arch_ops.does_insn_modify_frame_pointer(disas)) {
+            instrument_insn = true;
+        }
     }
 }
 
 static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
 {
     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
+    cpu->ops = arch_ops;
+
+    cpu->ops.init(cpu);
     cpu->buf = g_byte_array_new();
 }
 
@@ -65,6 +161,14 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                            const qemu_info_t *info,
                                            int argc, char **argv)
 {
+    if (!strcmp(info->target_name, "aarch64")) {
+        arch_ops = aarch64_ops;
+    } else {
+        fprintf(stderr, "plugin uftrace: %s target is not supported\n",
+                info->target_name);
+        return 1;
+    }
+
     score = qemu_plugin_scoreboard_new(sizeof(Cpu));
     qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
     qemu_plugin_register_atexit_cb(id, at_exit, NULL);
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 3/9] contrib/plugins/uftrace: track callstack
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
  2025-08-08  2:06 ` [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file Pierrick Bouvier
  2025-08-08  2:06 ` [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64 Pierrick Bouvier
@ 2025-08-08  2:06 ` Pierrick Bouvier
  2025-08-08  8:49   ` Manos Pitsidianakis
  2025-08-08  2:06 ` [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing Pierrick Bouvier
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:06 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

We now track callstack, based on frame pointer analysis. We can detect
function calls, returns, and discontinuities.
We implement a frame pointer based unwinding that is used for
discontinuities.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace.c | 160 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)

diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
index 4b1a2f38143..d51faceb344 100644
--- a/contrib/plugins/uftrace.c
+++ b/contrib/plugins/uftrace.c
@@ -15,6 +15,15 @@
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
+typedef struct {
+    GArray *s;
+} Callstack;
+
+typedef struct {
+    uint64_t pc;
+    uint64_t frame_pointer;
+} CallstackEntry;
+
 typedef struct Cpu Cpu;
 
 typedef struct {
@@ -25,6 +34,7 @@ typedef struct {
 } CpuOps;
 
 typedef struct Cpu {
+    Callstack *cs;
     GByteArray *buf;
     CpuOps ops;
     void *arch;
@@ -37,6 +47,71 @@ typedef struct {
 static struct qemu_plugin_scoreboard *score;
 static CpuOps arch_ops;
 
+static Callstack *callstack_new(void)
+{
+    Callstack *cs = g_new0(Callstack, 1);
+    cs->s = g_array_new(false, false, sizeof(CallstackEntry));
+    return cs;
+}
+
+static void callstack_free(Callstack *cs)
+{
+    g_array_free(cs->s, true);
+    cs->s = NULL;
+    g_free(cs);
+}
+
+static size_t callstack_depth(const Callstack *cs)
+{
+    return cs->s->len;
+}
+
+static size_t callstack_empty(const Callstack *cs)
+{
+    return callstack_depth(cs) == 0;
+}
+
+static void callstack_clear(Callstack *cs)
+{
+    g_array_set_size(cs->s, 0);
+}
+
+static const CallstackEntry *callstack_at(const Callstack *cs, size_t depth)
+{
+    g_assert(depth > 0);
+    g_assert(depth <= callstack_depth(cs));
+    return &g_array_index(cs->s, CallstackEntry, depth - 1);
+}
+
+static CallstackEntry callstack_top(const Callstack *cs)
+{
+    if (callstack_depth(cs) >= 1) {
+        return *callstack_at(cs, callstack_depth(cs));
+    }
+    return (CallstackEntry){};
+}
+
+static CallstackEntry callstack_caller(const Callstack *cs)
+{
+    if (callstack_depth(cs) >= 2) {
+        return *callstack_at(cs, callstack_depth(cs) - 1);
+    }
+    return (CallstackEntry){};
+}
+
+static void callstack_push(Callstack *cs, CallstackEntry e)
+{
+    g_array_append_val(cs->s, e);
+}
+
+static CallstackEntry callstack_pop(Callstack *cs)
+{
+    g_assert(!callstack_empty(cs));
+    CallstackEntry e = callstack_top(cs);
+    g_array_set_size(cs->s, callstack_depth(cs) - 1);
+    return e;
+}
+
 static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
 {
     GByteArray *buf = cpu->buf;
@@ -47,6 +122,50 @@ static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
     return *((uint64_t *) buf->data);
 }
 
+static uint64_t cpu_read_memory64(Cpu *cpu, uint64_t addr)
+{
+    g_assert(addr);
+    GByteArray *buf = cpu->buf;
+    g_byte_array_set_size(buf, 0);
+    bool read = qemu_plugin_read_memory_vaddr(addr, buf, 8);
+    if (!read) {
+        return 0;
+    }
+    g_assert(buf->len == 8);
+    return *((uint64_t *) buf->data);
+}
+
+static void cpu_unwind_stack(Cpu *cpu, uint64_t frame_pointer, uint64_t pc)
+{
+    g_assert(callstack_empty(cpu->cs));
+
+    #define UNWIND_STACK_MAX_DEPTH 1024
+    CallstackEntry unwind[UNWIND_STACK_MAX_DEPTH];
+    size_t depth = 0;
+    do {
+        /* check we don't have an infinite stack */
+        for (size_t i = 0; i < depth; ++i) {
+            if (frame_pointer == unwind[i].frame_pointer) {
+                break;
+            }
+        }
+        CallstackEntry e = {.frame_pointer = frame_pointer, .pc = pc};
+        unwind[depth] = e;
+        depth++;
+        if (frame_pointer) {
+            frame_pointer = cpu_read_memory64(cpu, frame_pointer);
+        }
+        pc = cpu_read_memory64(cpu, frame_pointer + 8); /* read previous lr */
+    } while (frame_pointer && pc && depth < UNWIND_STACK_MAX_DEPTH);
+    #undef UNWIND_STACK_MAX_DEPTH
+
+    /* push it from bottom to top */
+    while (depth) {
+        callstack_push(cpu->cs, unwind[depth - 1]);
+        --depth;
+    }
+}
+
 static struct qemu_plugin_register *plugin_find_register(const char *name)
 {
     g_autoptr(GArray) regs = qemu_plugin_get_registers();
@@ -102,6 +221,43 @@ static CpuOps aarch64_ops = {
 
 static void track_callstack(unsigned int cpu_index, void *udata)
 {
+    uint64_t pc = (uintptr_t) udata;
+    Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
+    Callstack *cs = cpu->cs;
+
+    uint64_t fp = cpu->ops.get_frame_pointer(cpu);
+    if (!fp && callstack_empty(cs)) {
+        /*
+         * We simply push current pc. Note that we won't detect symbol change as
+         * long as a proper call does not happen.
+         */
+        callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
+        return;
+    }
+
+    CallstackEntry top = callstack_top(cs);
+    if (fp == top.frame_pointer) {
+        /* same function */
+        return;
+    }
+
+    CallstackEntry caller = callstack_caller(cs);
+    if (fp == caller.frame_pointer) {
+        /* return */
+        callstack_pop(cs);
+        return;
+    }
+
+    uint64_t caller_fp = fp ? cpu_read_memory64(cpu, fp) : 0;
+    if (caller_fp == top.frame_pointer) {
+        /* call */
+        callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
+        return;
+    }
+
+    /* discontinuity, exit current stack and unwind new one */
+    callstack_clear(cs);
+    cpu_unwind_stack(cpu, fp, pc);
 }
 
 static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
@@ -139,12 +295,16 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
 
     cpu->ops.init(cpu);
     cpu->buf = g_byte_array_new();
+
+    cpu->cs = callstack_new();
 }
 
 static void vcpu_end(unsigned int vcpu_index)
 {
     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
     g_byte_array_free(cpu->buf, true);
+
+    callstack_free(cpu->cs);
     memset(cpu, 0, sizeof(Cpu));
 }
 
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
                   ` (2 preceding siblings ...)
  2025-08-08  2:06 ` [PATCH v5 3/9] contrib/plugins/uftrace: track callstack Pierrick Bouvier
@ 2025-08-08  2:06 ` Pierrick Bouvier
  2025-08-08  9:11   ` Manos Pitsidianakis
  2025-08-08  2:06 ` [PATCH v5 5/9] contrib/plugins/uftrace: implement privilege level tracing Pierrick Bouvier
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:06 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

We implement tracing, following uftrace format.
Trace is flushed every 32 MB, so file operations don't impact
performance at runtime.

A different trace is generated per cpu, and we ensure they have a unique
name, based on vcpu_index, while keeping room for privilege level coming
in next commit.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 148 insertions(+), 1 deletion(-)

diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
index d51faceb344..402a242433e 100644
--- a/contrib/plugins/uftrace.c
+++ b/contrib/plugins/uftrace.c
@@ -12,6 +12,13 @@
 #include <qemu-plugin.h>
 #include <glib.h>
 #include <stdio.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <time.h>
+#include <unistd.h>
+
+#define MiB     (INT64_C(1) << 20)
+#define NANOSECONDS_PER_SECOND 1000000000LL
 
 QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
 
@@ -24,6 +31,13 @@ typedef struct {
     uint64_t frame_pointer;
 } CallstackEntry;
 
+typedef struct {
+    GArray *t;
+    GString *path;
+    GString *name;
+    uint32_t id;
+} Trace;
+
 typedef struct Cpu Cpu;
 
 typedef struct {
@@ -34,6 +48,7 @@ typedef struct {
 } CpuOps;
 
 typedef struct Cpu {
+    Trace *trace;
     Callstack *cs;
     GByteArray *buf;
     CpuOps ops;
@@ -44,9 +59,41 @@ typedef struct {
     struct qemu_plugin_register *reg_fp;
 } Aarch64Cpu;
 
+typedef struct {
+    uint64_t timestamp;
+    uint64_t data;
+} UftraceEntry;
+
+typedef enum {
+    UFTRACE_ENTRY,
+    UFTRACE_EXIT,
+    UFTRACE_LOST,
+    UFTRACE_EVENT
+} UftraceRecordType;
+
 static struct qemu_plugin_scoreboard *score;
 static CpuOps arch_ops;
 
+static uint64_t gettime_ns(void)
+{
+#ifdef _WIN32
+    /*
+     * On Windows, timespec_get is available only with UCRT, but not with
+     * MinGW64 environment. Simplify by using only gettimeofday on this
+     * platform. This may result in a precision loss.
+     */
+    struct timeval tv;
+    gettimeofday(&tv, NULL);
+    uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * 1000;
+#else
+    /* We need nanosecond precision for short lived functions. */
+    struct timespec ts;
+    timespec_get(&ts, TIME_UTC);
+    uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
+#endif
+    return now_ns;
+}
+
 static Callstack *callstack_new(void)
 {
     Callstack *cs = g_new0(Callstack, 1);
@@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs)
     return e;
 }
 
+static Trace *trace_new(uint32_t id, GString *name)
+{
+    Trace *t = g_new0(Trace, 1);
+    t->t = g_array_new(false, false, sizeof(UftraceEntry));
+    t->path = g_string_new(NULL);
+    g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id);
+    t->name = g_string_new(name->str);
+    t->id = id;
+    return t;
+}
+
+static void trace_free(Trace *t)
+{
+    g_assert(t->t->len == 0);
+    g_array_free(t->t, true);
+    t->t = NULL;
+    g_string_free(t->path, true);
+    t->path = NULL;
+    g_string_free(t->name, true);
+    t->name = NULL;
+    g_free(t);
+}
+
+static void trace_flush(Trace *t, bool append)
+{
+    int create_dir = g_mkdir_with_parents("./uftrace.data",
+                                          S_IRWXU | S_IRWXG | S_IRWXO);
+    g_assert(create_dir == 0);
+    FILE *dat = fopen(t->path->str, append ? "a" : "w");
+    g_assert(dat);
+    GArray *data = t->t;
+    if (data->len) {
+        fwrite(data->data, data->len, sizeof(UftraceEntry), dat);
+    }
+    fclose(dat);
+    g_array_set_size(data, 0);
+}
+
+static void trace_add_entry(Trace *t, uint64_t timestamp, uint64_t pc,
+                            size_t depth, UftraceRecordType type)
+{
+    /* libmcount/record.c:record_event */
+    const uint64_t record_magic = 0x5;
+    uint64_t data = type | record_magic << 3;
+    data += depth << 6;
+    data += pc << 16;
+    UftraceEntry e = {.timestamp = timestamp, .data = data};
+    g_array_append_val(t->t, e);
+    if (t->t->len * sizeof(UftraceEntry) > 32 * MiB) {
+        trace_flush(t, true);
+    }
+}
+
+static void trace_enter_function(Trace *t, uint64_t timestamp,
+                                 uint64_t pc, size_t depth)
+{
+    trace_add_entry(t, timestamp, pc, depth, UFTRACE_ENTRY);
+}
+
+static void trace_exit_function(Trace *t, uint64_t timestamp,
+                                uint64_t pc, size_t depth)
+{
+    trace_add_entry(t, timestamp, pc, depth, UFTRACE_EXIT);
+}
+
+static void trace_enter_stack(Trace *t, Callstack *cs, uint64_t timestamp)
+{
+    for (size_t depth = 1; depth <= callstack_depth(cs); ++depth) {
+        trace_enter_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
+    }
+}
+
+static void trace_exit_stack(Trace *t, Callstack *cs, uint64_t timestamp)
+{
+    for (size_t depth = callstack_depth(cs); depth > 0; --depth) {
+        trace_exit_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
+    }
+}
+
 static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
 {
     GByteArray *buf = cpu->buf;
@@ -223,7 +349,9 @@ static void track_callstack(unsigned int cpu_index, void *udata)
 {
     uint64_t pc = (uintptr_t) udata;
     Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
+    uint64_t timestamp = gettime_ns();
     Callstack *cs = cpu->cs;
+    Trace *t = cpu->trace;
 
     uint64_t fp = cpu->ops.get_frame_pointer(cpu);
     if (!fp && callstack_empty(cs)) {
@@ -232,6 +360,7 @@ static void track_callstack(unsigned int cpu_index, void *udata)
          * long as a proper call does not happen.
          */
         callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
+        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
         return;
     }
 
@@ -244,7 +373,8 @@ static void track_callstack(unsigned int cpu_index, void *udata)
     CallstackEntry caller = callstack_caller(cs);
     if (fp == caller.frame_pointer) {
         /* return */
-        callstack_pop(cs);
+        CallstackEntry e = callstack_pop(cs);
+        trace_exit_function(t, timestamp, e.pc, callstack_depth(cs));
         return;
     }
 
@@ -252,12 +382,16 @@ static void track_callstack(unsigned int cpu_index, void *udata)
     if (caller_fp == top.frame_pointer) {
         /* call */
         callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
+        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
         return;
     }
 
     /* discontinuity, exit current stack and unwind new one */
+    trace_exit_stack(t, cs, timestamp);
     callstack_clear(cs);
+
     cpu_unwind_stack(cpu, fp, pc);
+    trace_enter_stack(t, cs, timestamp);
 }
 
 static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
@@ -296,6 +430,16 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
     cpu->ops.init(cpu);
     cpu->buf = g_byte_array_new();
 
+    g_assert(vcpu_index < UINT32_MAX / 100);
+    /* trace_id is: cpu_number * 100 */
+    uint32_t trace_id = (vcpu_index + 1) * 100;
+
+    g_autoptr(GString) trace_name = g_string_new(NULL);
+    g_string_append_printf(trace_name, "cpu%u", vcpu_index);
+    cpu->trace = trace_new(trace_id, trace_name);
+    /* create/truncate trace file */
+    trace_flush(cpu->trace, false);
+
     cpu->cs = callstack_new();
 }
 
@@ -304,6 +448,7 @@ static void vcpu_end(unsigned int vcpu_index)
     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
     g_byte_array_free(cpu->buf, true);
 
+    trace_free(cpu->trace);
     callstack_free(cpu->cs);
     memset(cpu, 0, sizeof(Cpu));
 }
@@ -311,6 +456,8 @@ static void vcpu_end(unsigned int vcpu_index)
 static void at_exit(qemu_plugin_id_t id, void *data)
 {
     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
+        Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
+        trace_flush(cpu->trace, true);
         vcpu_end(i);
     }
 
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 5/9] contrib/plugins/uftrace: implement privilege level tracing
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
                   ` (3 preceding siblings ...)
  2025-08-08  2:06 ` [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing Pierrick Bouvier
@ 2025-08-08  2:06 ` Pierrick Bouvier
  2025-08-08  9:26   ` Manos Pitsidianakis
  2025-08-08  2:06 ` [PATCH v5 6/9] contrib/plugins/uftrace: generate additional files for uftrace Pierrick Bouvier
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:06 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

We add new option trace-privilege-level=bool, which will create a
separate trace for each privilege level.
This allows to follow changes of privilege during execution.

We implement aarch64 operations to track current privilege level
accordingly.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace.c | 189 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 181 insertions(+), 8 deletions(-)

diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
index 402a242433e..7737626da2f 100644
--- a/contrib/plugins/uftrace.c
+++ b/contrib/plugins/uftrace.c
@@ -44,19 +44,39 @@ typedef struct {
     void (*init)(Cpu *cpu);
     void (*end)(Cpu *cpu);
     uint64_t (*get_frame_pointer)(Cpu *cpu);
+    uint8_t (*get_privilege_level)(Cpu *cpu);
+    uint8_t (*num_privilege_levels)(void);
+    const char *(*get_privilege_level_name)(uint8_t pl);
     bool (*does_insn_modify_frame_pointer)(const char *disas);
 } CpuOps;
 
 typedef struct Cpu {
     Trace *trace;
     Callstack *cs;
+    uint8_t privilege_level;
+    GArray *traces; /* Trace *traces [] */
     GByteArray *buf;
     CpuOps ops;
     void *arch;
 } Cpu;
 
+typedef enum {
+    AARCH64_EL0_SECURE,
+    AARCH64_EL0_NONSECURE,
+    AARCH64_EL0_REALM,
+    AARCH64_EL1_SECURE,
+    AARCH64_EL1_NONSECURE,
+    AARCH64_EL1_REALM,
+    AARCH64_EL2_SECURE,
+    AARCH64_EL2_NONSECURE,
+    AARCH64_EL2_REALM,
+    AARCH64_EL3,
+} Aarch64PrivilegeLevel;
+
 typedef struct {
     struct qemu_plugin_register *reg_fp;
+    struct qemu_plugin_register *reg_cpsr;
+    struct qemu_plugin_register *reg_scr_el3;
 } Aarch64Cpu;
 
 typedef struct {
@@ -72,6 +92,7 @@ typedef enum {
 } UftraceRecordType;
 
 static struct qemu_plugin_scoreboard *score;
+static bool trace_privilege_level;
 static CpuOps arch_ops;
 
 static uint64_t gettime_ns(void)
@@ -248,6 +269,16 @@ static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
     return *((uint64_t *) buf->data);
 }
 
+static uint32_t cpu_read_register32(Cpu *cpu, struct qemu_plugin_register *reg)
+{
+    GByteArray *buf = cpu->buf;
+    g_byte_array_set_size(buf, 0);
+    size_t sz = qemu_plugin_read_register(reg, buf);
+    g_assert(sz == 4);
+    g_assert(buf->len == 4);
+    return *((uint32_t *) buf->data);
+}
+
 static uint64_t cpu_read_memory64(Cpu *cpu, uint64_t addr)
 {
     g_assert(addr);
@@ -305,6 +336,68 @@ static struct qemu_plugin_register *plugin_find_register(const char *name)
     return NULL;
 }
 
+static uint8_t aarch64_num_privilege_levels(void)
+{
+    return AARCH64_EL3 + 1;
+}
+
+static const char *aarch64_get_privilege_level_name(uint8_t pl)
+{
+    switch (pl) {
+    case AARCH64_EL0_SECURE: return "S-EL0";
+    case AARCH64_EL0_NONSECURE: return "NS-EL0";
+    case AARCH64_EL0_REALM: return "R-EL0";
+    case AARCH64_EL1_SECURE: return "S-EL1";
+    case AARCH64_EL1_NONSECURE: return "NS-EL1";
+    case AARCH64_EL1_REALM: return "R-EL1";
+    case AARCH64_EL2_SECURE: return "S-EL2";
+    case AARCH64_EL2_NONSECURE: return "NS-EL2";
+    case AARCH64_EL2_REALM: return "R-EL2";
+    case AARCH64_EL3: return "EL3";
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint8_t aarch64_get_privilege_level(Cpu *cpu_)
+{
+    Aarch64Cpu *cpu = cpu_->arch;
+    /*
+     * QEMU gdbstub does not provide access to CurrentEL,
+     * so we use CPSR instead.
+     */
+    uint8_t el = cpu_read_register32(cpu_, cpu->reg_cpsr) >> 2 & 0b11;
+
+    if (el == 3) {
+        return AARCH64_EL3;
+    }
+
+    uint8_t ss = AARCH64_EL0_SECURE;
+    if (!cpu->reg_scr_el3) {
+        ss = AARCH64_EL0_NONSECURE;
+    }
+    uint64_t scr_el3 = cpu_read_register64(cpu_, cpu->reg_scr_el3);
+    uint64_t ns = (scr_el3 >> 0) & 0b1;
+    uint64_t nse = (scr_el3 >> 62) & 0b1;
+    switch (nse << 1 | ns) {
+    case 0b00:
+        ss = AARCH64_EL0_SECURE;
+        break;
+    case 0b01:
+        ss = AARCH64_EL0_NONSECURE;
+        break;
+    case 0b11:
+        ss = AARCH64_EL0_REALM;
+        break;
+    default:
+        g_assert_not_reached();
+    }
+
+    const uint8_t num_ss = 3;
+    Aarch64PrivilegeLevel pl = el * num_ss + ss;
+    return pl;
+}
+
 static uint64_t aarch64_get_frame_pointer(Cpu *cpu_)
 {
     Aarch64Cpu *cpu = cpu_->arch;
@@ -321,6 +414,10 @@ static void aarch64_init(Cpu *cpu_)
                         "available. Please use an AArch64 cpu (or -cpu max).\n");
         g_abort();
     }
+    cpu->reg_cpsr = plugin_find_register("cpsr");
+    g_assert(cpu->reg_cpsr);
+    cpu->reg_scr_el3 = plugin_find_register("SCR_EL3");
+    /* scr_el3 is optional */
 }
 
 static void aarch64_end(Cpu *cpu)
@@ -342,9 +439,34 @@ static CpuOps aarch64_ops = {
     .init = aarch64_init,
     .end = aarch64_end,
     .get_frame_pointer = aarch64_get_frame_pointer,
+    .get_privilege_level = aarch64_get_privilege_level,
+    .num_privilege_levels = aarch64_num_privilege_levels,
+    .get_privilege_level_name = aarch64_get_privilege_level_name,
     .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
 };
 
+static void track_privilege_change(unsigned int cpu_index, void *udata)
+{
+    Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
+    uint8_t new_pl = cpu->ops.get_privilege_level(cpu);
+
+    if (new_pl == cpu->privilege_level) {
+        return;
+    }
+
+    uint64_t pc = (uintptr_t) udata;
+    uint64_t timestamp = gettime_ns();
+
+    trace_exit_stack(cpu->trace, cpu->cs, timestamp);
+    callstack_clear(cpu->cs);
+
+    cpu->privilege_level = new_pl;
+    cpu->trace = g_array_index(cpu->traces, Trace*, new_pl);
+
+    cpu_unwind_stack(cpu, cpu->ops.get_frame_pointer(cpu), pc);
+    trace_enter_stack(cpu->trace, cpu->cs, timestamp);
+}
+
 static void track_callstack(unsigned int cpu_index, void *udata)
 {
     uint64_t pc = (uintptr_t) udata;
@@ -397,6 +519,13 @@ static void track_callstack(unsigned int cpu_index, void *udata)
 static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
 {
     size_t n_insns = qemu_plugin_tb_n_insns(tb);
+    uintptr_t tb_pc = qemu_plugin_tb_vaddr(tb);
+
+    if (trace_privilege_level) {
+        qemu_plugin_register_vcpu_tb_exec_cb(tb, track_privilege_change,
+                                             QEMU_PLUGIN_CB_R_REGS,
+                                             (void *) tb_pc);
+    }
 
     /*
      * We instrument all instructions following one that might have updated
@@ -429,18 +558,36 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
 
     cpu->ops.init(cpu);
     cpu->buf = g_byte_array_new();
+    cpu->traces = g_array_new(0, 0, sizeof(Trace *));
 
     g_assert(vcpu_index < UINT32_MAX / 100);
-    /* trace_id is: cpu_number * 100 */
+    g_assert(cpu->ops.num_privilege_levels() < 100);
+    /* trace_id is: cpu_number * 100 + privilege_level */
     uint32_t trace_id = (vcpu_index + 1) * 100;
 
-    g_autoptr(GString) trace_name = g_string_new(NULL);
-    g_string_append_printf(trace_name, "cpu%u", vcpu_index);
-    cpu->trace = trace_new(trace_id, trace_name);
-    /* create/truncate trace file */
-    trace_flush(cpu->trace, false);
+    if (trace_privilege_level) {
+        for (uint8_t pl = 0; pl < cpu->ops.num_privilege_levels(); ++pl) {
+            g_autoptr(GString) trace_name = g_string_new(NULL);
+            g_string_append_printf(trace_name, "cpu%u %s", vcpu_index,
+                                   cpu->ops.get_privilege_level_name(pl));
+            Trace *t = trace_new(trace_id + pl, trace_name);
+            g_array_append_val(cpu->traces, t);
+        }
+    } else {
+        g_autoptr(GString) trace_name = g_string_new(NULL);
+        g_string_append_printf(trace_name, "cpu%u", vcpu_index);
+        Trace *t = trace_new(trace_id, trace_name);
+        g_array_append_val(cpu->traces, t);
+    }
+
+    for (size_t i = 0; i < cpu->traces->len; ++i) {
+        /* create/truncate trace files */
+        Trace *t = g_array_index(cpu->traces, Trace*, i);
+        trace_flush(t, false);
+    }
 
     cpu->cs = callstack_new();
+    cpu->trace = g_array_index(cpu->traces, Trace*, cpu->privilege_level);
 }
 
 static void vcpu_end(unsigned int vcpu_index)
@@ -448,7 +595,12 @@ static void vcpu_end(unsigned int vcpu_index)
     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
     g_byte_array_free(cpu->buf, true);
 
-    trace_free(cpu->trace);
+    for (size_t i = 0; i < cpu->traces->len; ++i) {
+        Trace *t = g_array_index(cpu->traces, Trace*, i);
+        trace_free(t);
+    }
+
+    g_array_free(cpu->traces, true);
     callstack_free(cpu->cs);
     memset(cpu, 0, sizeof(Cpu));
 }
@@ -457,7 +609,13 @@ static void at_exit(qemu_plugin_id_t id, void *data)
 {
     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
         Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
-        trace_flush(cpu->trace, true);
+        for (size_t j = 0; j < cpu->traces->len; ++j) {
+            Trace *t = g_array_index(cpu->traces, Trace*, j);
+            trace_flush(t, true);
+        }
+    }
+
+    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
         vcpu_end(i);
     }
 
@@ -468,6 +626,21 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
                                            const qemu_info_t *info,
                                            int argc, char **argv)
 {
+    for (int i = 0; i < argc; i++) {
+        char *opt = argv[i];
+        g_auto(GStrv) tokens = g_strsplit(opt, "=", 2);
+        if (g_strcmp0(tokens[0], "trace-privilege-level") == 0) {
+            if (!qemu_plugin_bool_parse(tokens[0], tokens[1],
+                                        &trace_privilege_level)) {
+                fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
+                return -1;
+            }
+        } else {
+            fprintf(stderr, "option parsing failed: %s\n", opt);
+            return -1;
+        }
+    }
+
     if (!strcmp(info->target_name, "aarch64")) {
         arch_ops = aarch64_ops;
     } else {
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 6/9] contrib/plugins/uftrace: generate additional files for uftrace
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
                   ` (4 preceding siblings ...)
  2025-08-08  2:06 ` [PATCH v5 5/9] contrib/plugins/uftrace: implement privilege level tracing Pierrick Bouvier
@ 2025-08-08  2:06 ` Pierrick Bouvier
  2025-08-08  9:37   ` Manos Pitsidianakis
  2025-08-08  2:07 ` [PATCH v5 7/9] contrib/plugins/uftrace: implement x64 support Pierrick Bouvier
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:06 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

Beyond traces per cpu, uftrace expect to find some specific files.
- info: contains information about machine/program run
  those values are not impacting uftrace behaviour, and we simply copied
  a random example to keep things simple.
- memory mapping: how every binary is mapped in memory. For system mode,
  we generate an empty mapping (uftrace_symbols.py, coming in future
  commit, will take care of that). For user mode, we copy current
  /proc/self/maps. We don't need to do any special filtering, as
  reported addresses will necessarily concern guest program, and not
  QEMU and its libraries.
- task: list of tasks. We present every vcpu/privilege level as a
  separate process, as it's the best view we can have when generating a
  (visual) chrome trace. Using threads is less convenient in terms of
  UI.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace.c | 130 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 129 insertions(+), 1 deletion(-)

diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
index 7737626da2f..6628b4256fd 100644
--- a/contrib/plugins/uftrace.c
+++ b/contrib/plugins/uftrace.c
@@ -115,6 +115,126 @@ static uint64_t gettime_ns(void)
     return now_ns;
 }
 
+static void uftrace_write_map(bool system_emulation)
+{
+    const char *path = "./uftrace.data/sid-0.map";
+
+    if (system_emulation && access(path, F_OK) == 0) {
+        /* do not erase existing map in system emulation, as a custom one might
+         * already have been generated by uftrace_symbols.py */
+        return;
+    }
+
+    FILE *sid_map = fopen(path, "w");
+    g_assert(sid_map);
+
+    if (system_emulation) {
+        fprintf(sid_map,
+                "# map stack on highest address possible, to prevent uftrace\n"
+                "# from considering any kernel address\n");
+        fprintf(sid_map,
+          "ffffffffffff-ffffffffffff rw-p 00000000 00:00 0 [stack]\n");
+    } else {
+        /* in user mode, copy /proc/self/maps instead */
+        FILE *self_map = fopen("/proc/self/maps", "r");
+        g_assert(self_map);
+        for (;;) {
+            int c = fgetc(self_map);
+            if (c == EOF) {
+                break;
+            }
+            fputc(c, sid_map);
+        }
+        fclose(self_map);
+    }
+    fclose(sid_map);
+}
+
+static void uftrace_write_task(const GArray *traces)
+{
+    FILE *task = fopen("./uftrace.data/task.txt", "w");
+    g_assert(task);
+    for (int i = 0; i < traces->len; ++i) {
+        Trace *t = g_array_index(traces, Trace*, i);
+        fprintf(task, "SESS timestamp=0.0 pid=%"PRIu32" sid=0 exename=\"%s\"\n",
+                t->id, t->name->str);
+        fprintf(task, "TASK timestamp=0.0 tid=%"PRIu32" pid=%"PRIu32"\n",
+                t->id, t->id);
+    }
+    fclose(task);
+}
+
+static void uftrace_write_info(const GArray *traces)
+{
+    g_autoptr(GString) taskinfo_tids = g_string_new("taskinfo:tids=");
+    for (int i = 0; i < traces->len; ++i) {
+        Trace *t = g_array_index(traces, Trace*, i);
+        const char *delim = i > 0 ? "," : "";
+        g_string_append_printf(taskinfo_tids, "%s%"PRIu32, delim, t->id);
+    }
+
+    g_autoptr(GString) taskinfo_nr_tid = g_string_new("taskinfo:nr_tid=");
+    g_string_append_printf(taskinfo_nr_tid, "%d", traces->len);
+
+    FILE *info = fopen("./uftrace.data/info", "w");
+    g_assert(info);
+    /*
+     * $ uftrace dump --debug
+     * uftrace file header: magic         = 4674726163652100
+     * uftrace file header: version       = 4
+     * uftrace file header: header size   = 40
+     * uftrace file header: endian        = 1 (little)
+     * uftrace file header: class          = 2 (64 bit)
+     * uftrace file header: features      = 0x1263 (PLTHOOK | ...
+     * uftrace file header: info          = 0x7bff (EXE_NAME | ...
+     *  <0000000000000000>: 46 74 72 61 63 65 21 00  04 00 00 00 28 00 01 02
+     *  <0000000000000010>: 63 12 00 00 00 00 00 00  ff 7b 00 00 00 00 00 00
+     *  <0000000000000020>: 00 04 00 00 00 00 00 00
+     */
+    const uint8_t header[] = {0x46, 0x74, 0x72, 0x61, 0x63, 0x65, 0x21, 0x00,
+                              0x04, 0x00, 0x00, 0x00, 0x28, 0x00, 0x01, 0x02,
+                              0x63, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+                              0xff, 0x7b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+                              0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
+    fwrite(header, sizeof(header), 1, info);
+    const char *info_data[] = {
+        "exename:from_qemu",
+        "build_id:0123456789abcdef0123456789abcdef01234567",
+        "exit_status:0",
+        "cmdline:uftrace record qemu",
+        "cpuinfo:lines=2",
+        "cpuinfo:nr_cpus=1 / 1 (online/possible)",
+        "cpuinfo:desc=Intel 8086",
+        "meminfo:1.0 / 1.0 GB (free / total)",
+        "osinfo:lines=3",
+        "osinfo:kernel=Linux 6.12.33",
+        "osinfo:hostname=pc",
+        "osinfo:distro=\"Debian GNU/Linux 13 (trixie)\"",
+        "taskinfo:lines=2",
+        taskinfo_nr_tid->str,
+        taskinfo_tids->str,
+        "usageinfo:lines=6",
+        "usageinfo:systime=0.000000",
+        "usageinfo:usrtime=0.000000",
+        "usageinfo:ctxsw=0 / 0 (voluntary / involuntary)",
+        "usageinfo:maxrss=8016",
+        "usageinfo:pagefault=0 / 0 (major / minor)",
+        "usageinfo:iops=0 / 0 (read / write)",
+        "loadinfo:0.0 / 0.0 / 0.0",
+        "record_date:Mon Jan  1 00:00:00 2025",
+        "elapsed_time:1000000000000.0 sec",
+        "pattern_type:regex",
+        "uftrace_version:v0.17 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )",
+        "utc_offset:1751552954",
+        0};
+    const char **info_data_it = info_data;
+    while (*(info_data_it)) {
+        fprintf(info, "%s\n", *info_data_it);
+        ++info_data_it;
+    }
+    fclose(info);
+}
+
 static Callstack *callstack_new(void)
 {
     Callstack *cs = g_new0(Callstack, 1);
@@ -607,14 +727,22 @@ static void vcpu_end(unsigned int vcpu_index)
 
 static void at_exit(qemu_plugin_id_t id, void *data)
 {
+    bool system_emulation = (bool) data;
+    g_autoptr(GArray) traces = g_array_new(0, 0, sizeof(Trace *));
+
     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
         Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
         for (size_t j = 0; j < cpu->traces->len; ++j) {
             Trace *t = g_array_index(cpu->traces, Trace*, j);
             trace_flush(t, true);
+            g_array_append_val(traces, t);
         }
     }
 
+    uftrace_write_map(system_emulation);
+    uftrace_write_info(traces);
+    uftrace_write_task(traces);
+
     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
         vcpu_end(i);
     }
@@ -651,7 +779,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
 
     score = qemu_plugin_scoreboard_new(sizeof(Cpu));
     qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
-    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
+    qemu_plugin_register_atexit_cb(id, at_exit, (void *) info->system_emulation);
     qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
 
     return 0;
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 7/9] contrib/plugins/uftrace: implement x64 support
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
                   ` (5 preceding siblings ...)
  2025-08-08  2:06 ` [PATCH v5 6/9] contrib/plugins/uftrace: generate additional files for uftrace Pierrick Bouvier
@ 2025-08-08  2:07 ` Pierrick Bouvier
  2025-08-08  9:42   ` Manos Pitsidianakis
  2025-08-08  2:07 ` [PATCH v5 8/9] contrib/plugins/uftrace_symbols.py Pierrick Bouvier
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:07 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

It's trivial to implement x64 support, as it's the same stack layout
than aarch64.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace.c | 85 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
index 6628b4256fd..f10172eed10 100644
--- a/contrib/plugins/uftrace.c
+++ b/contrib/plugins/uftrace.c
@@ -79,6 +79,20 @@ typedef struct {
     struct qemu_plugin_register *reg_scr_el3;
 } Aarch64Cpu;
 
+typedef enum {
+    X64_RING0,
+    X64_RING1,
+    X64_RING2,
+    X64_RING3,
+    X64_REAL_MODE,
+} X64PrivilegeLevel;
+
+typedef struct {
+    struct qemu_plugin_register *reg_rbp;
+    struct qemu_plugin_register *reg_cs;
+    struct qemu_plugin_register *reg_cr0;
+} X64Cpu;
+
 typedef struct {
     uint64_t timestamp;
     uint64_t data;
@@ -565,6 +579,75 @@ static CpuOps aarch64_ops = {
     .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
 };
 
+static uint8_t x64_num_privilege_levels(void)
+{
+    return X64_REAL_MODE + 1;
+}
+
+static const char *x64_get_privilege_level_name(uint8_t pl)
+{
+    switch (pl) {
+    case X64_RING0: return "Ring0";
+    case X64_RING1: return "Ring1";
+    case X64_RING2: return "Ring2";
+    case X64_RING3: return "Ring3";
+    case X64_REAL_MODE: return "RealMode";
+    default:
+        g_assert_not_reached();
+    }
+}
+
+static uint8_t x64_get_privilege_level(Cpu *cpu_)
+{
+    X64Cpu *cpu = cpu_->arch;
+    uint64_t cr0 = cpu_read_register64(cpu_, cpu->reg_cr0);
+    uint64_t protected_mode = (cr0 >> 0) & 0b1;
+    if (!protected_mode) {
+        return X64_REAL_MODE;
+    }
+    uint32_t cs = cpu_read_register32(cpu_, cpu->reg_cs);
+    uint32_t ring_level = (cs >> 0) & 0b11;
+    return ring_level;
+}
+
+static uint64_t x64_get_frame_pointer(Cpu *cpu_)
+{
+    X64Cpu *cpu = cpu_->arch;
+    return cpu_read_register64(cpu_, cpu->reg_rbp);
+}
+
+static void x64_init(Cpu *cpu_)
+{
+    X64Cpu *cpu = g_new0(X64Cpu, 1);
+    cpu_->arch = cpu;
+    cpu->reg_rbp = plugin_find_register("rbp");
+    g_assert(cpu->reg_rbp);
+    cpu->reg_cs = plugin_find_register("cs");
+    g_assert(cpu->reg_cs);
+    cpu->reg_cr0 = plugin_find_register("cr0");
+    g_assert(cpu->reg_cr0);
+}
+
+static void x64_end(Cpu *cpu)
+{
+    g_free(cpu->arch);
+}
+
+static bool x64_does_insn_modify_frame_pointer(const char *disas)
+{
+    return strstr(disas, "rbp");
+}
+
+static CpuOps x64_ops = {
+    .init = x64_init,
+    .end = x64_end,
+    .get_frame_pointer = x64_get_frame_pointer,
+    .get_privilege_level = x64_get_privilege_level,
+    .num_privilege_levels = x64_num_privilege_levels,
+    .get_privilege_level_name = x64_get_privilege_level_name,
+    .does_insn_modify_frame_pointer = x64_does_insn_modify_frame_pointer,
+};
+
 static void track_privilege_change(unsigned int cpu_index, void *udata)
 {
     Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
@@ -771,6 +854,8 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
 
     if (!strcmp(info->target_name, "aarch64")) {
         arch_ops = aarch64_ops;
+    } else if (!strcmp(info->target_name, "x86_64")) {
+        arch_ops = x64_ops;
     } else {
         fprintf(stderr, "plugin uftrace: %s target is not supported\n",
                 info->target_name);
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 8/9] contrib/plugins/uftrace_symbols.py
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
                   ` (6 preceding siblings ...)
  2025-08-08  2:07 ` [PATCH v5 7/9] contrib/plugins/uftrace: implement x64 support Pierrick Bouvier
@ 2025-08-08  2:07 ` Pierrick Bouvier
  2025-08-08  2:07 ` [PATCH v5 9/9] contrib/plugins/uftrace: add documentation Pierrick Bouvier
  2025-08-08 20:45 ` [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
  9 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:07 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

usage:  contrib/plugins/uftrace_symbols.py \
        --prefix-symbols \
        arm-trusted-firmware/build/qemu/debug/bl1/bl1.elf \
        arm-trusted-firmware/build/qemu/debug/bl2/bl2.elf \
        arm-trusted-firmware/build/qemu/debug/bl31/bl31.elf \
        u-boot/u-boot:0x60000000 \
        u-boot/u-boot.relocated:0x000000023f6b6000 \
        linux/vmlinux

Will generate symbols and memory mapping files for uftrace, allowing to
have an enhanced trace, instead of raw addresses.

It takes a collection of elf files, and automatically find all their
symbols, and generate an ordered memory map based on that.

This script uses the python (native) pyelftools module.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 contrib/plugins/uftrace_symbols.py | 152 +++++++++++++++++++++++++++++
 1 file changed, 152 insertions(+)
 create mode 100755 contrib/plugins/uftrace_symbols.py

diff --git a/contrib/plugins/uftrace_symbols.py b/contrib/plugins/uftrace_symbols.py
new file mode 100755
index 00000000000..b49e03203c8
--- /dev/null
+++ b/contrib/plugins/uftrace_symbols.py
@@ -0,0 +1,152 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+#
+# Create symbols and mapping files for uftrace.
+#
+# Copyright 2025 Linaro Ltd
+# Author: Pierrick Bouvier <pierrick.bouvier@linaro.org>
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import argparse
+import elftools # pip install pyelftools
+import os
+
+from elftools.elf.elffile import ELFFile
+from elftools.elf.sections import SymbolTableSection
+
+def elf_func_symbols(elf):
+    symbol_tables = [(idx, s) for idx, s in enumerate(elf.iter_sections())
+                  if isinstance(s, SymbolTableSection)]
+    symbols = []
+    for _, section in symbol_tables:
+        for _, symbol in enumerate(section.iter_symbols()):
+            if symbol_size(symbol) == 0:
+                continue
+            type = symbol['st_info']['type']
+            if type == 'STT_FUNC' or type == 'STT_NOTYPE':
+                symbols.append(symbol)
+    symbols.sort(key = lambda x: symbol_addr(x))
+    return symbols
+
+def symbol_size(symbol):
+    return symbol['st_size']
+
+def symbol_addr(symbol):
+    addr = symbol['st_value']
+    # clamp addr to 48 bits, like uftrace entries
+    return addr & 0xffffffffffff
+
+def symbol_name(symbol):
+    return symbol.name
+
+class BinaryFile:
+    def __init__(self, path, map_offset):
+        self.fullpath = os.path.realpath(path)
+        self.map_offset = map_offset
+        with open(path, 'rb') as f:
+            self.elf = ELFFile(f)
+            self.symbols = elf_func_symbols(self.elf)
+
+    def path(self):
+        return self.fullpath
+
+    def addr_start(self):
+        return self.map_offset
+
+    def addr_end(self):
+        last_sym = self.symbols[-1]
+        return symbol_addr(last_sym) + symbol_size(last_sym) + self.map_offset
+
+    def generate_symbol_file(self, prefix_symbols):
+        binary_name = os.path.basename(self.fullpath)
+        sym_file_path = f'./uftrace.data/{binary_name}.sym'
+        print(f'{sym_file_path} ({len(self.symbols)} symbols)')
+        with open(sym_file_path, 'w') as sym_file:
+            # print hexadecimal addresses on 48 bits
+            addrx = "0>12x"
+            for s in self.symbols:
+                addr = symbol_addr(s)
+                addr = f'{addr:{addrx}}'
+                size = f'{symbol_size(s):{addrx}}'
+                name = symbol_name(s)
+                if prefix_symbols:
+                    name = f'{binary_name}:{name}'
+                print(addr, size, 'T', name, file=sym_file)
+
+def parse_parameter(p):
+    s = p.split(":")
+    path = s[0]
+    if len(s) == 1:
+        return path, 0
+    if len(s) > 2:
+        raise ValueError('only one offset can be set')
+    offset = s[1]
+    if not offset.startswith('0x'):
+        err = f'offset "{offset}" is not an hexadecimal constant. '
+        err += 'It should starts with "0x".'
+        raise ValueError(err)
+    offset = int(offset, 16)
+    return path, offset
+
+def is_from_user_mode(map_file_path):
+    if os.path.exists(map_file_path):
+        with open(map_file_path, 'r') as map_file:
+            if not map_file.readline().startswith('# map stack on'):
+                return True
+    return False
+
+def generate_map(binaries):
+    map_file_path = './uftrace.data/sid-0.map'
+
+    if is_from_user_mode(map_file_path):
+        print(f'do not overwrite {map_file_path} generated from qemu-user')
+        return
+
+    mappings = []
+
+    # print hexadecimal addresses on 48 bits
+    addrx = "0>12x"
+
+    mappings += ['# map stack on highest address possible, to prevent uftrace']
+    mappings += ['# from considering any kernel address']
+    mappings += ['ffffffffffff-ffffffffffff rw-p 00000000 00:00 0 [stack]']
+
+    for b in binaries:
+        m = f'{b.addr_start():{addrx}}-{b.addr_end():{addrx}}'
+        m += f' r--p 00000000 00:00 0 {b.path()}'
+        mappings.append(m)
+
+    with open(map_file_path, 'w') as map_file:
+        print('\n'.join(mappings), file=map_file)
+    print(f'{map_file_path}')
+    print('\n'.join(mappings))
+
+def main():
+    parser = argparse.ArgumentParser(description=
+                                     'generate symbol files for uftrace')
+    parser.add_argument('elf_file', nargs='+',
+                        help='path to an ELF file. '
+                        'Use /path/to/file:0xdeadbeef to add a mapping offset.')
+    parser.add_argument('--prefix-symbols',
+                        help='prepend binary name to symbols',
+                        action=argparse.BooleanOptionalAction)
+    args = parser.parse_args()
+
+    if not os.path.exists('./uftrace.data'):
+        os.mkdir('./uftrace.data')
+
+    binaries = []
+    for file in args.elf_file:
+        path, offset = parse_parameter(file)
+        b = BinaryFile(path, offset)
+        binaries.append(b)
+    binaries.sort(key = lambda b: b.addr_end());
+
+    for b in binaries:
+        b.generate_symbol_file(args.prefix_symbols)
+
+    generate_map(binaries)
+
+if __name__ == '__main__':
+    main()
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH v5 9/9] contrib/plugins/uftrace: add documentation
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
                   ` (7 preceding siblings ...)
  2025-08-08  2:07 ` [PATCH v5 8/9] contrib/plugins/uftrace_symbols.py Pierrick Bouvier
@ 2025-08-08  2:07 ` Pierrick Bouvier
  2025-08-08  9:46   ` Manos Pitsidianakis
  2025-08-08 20:45 ` [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
  9 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08  2:07 UTC (permalink / raw)
  To: pierrick.bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

This documentation summarizes how to use the plugin, and present two
examples of the possibilities offered by it, in system and user mode.

As well, it explains how to rebuild and reproduce those examples.

Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 docs/about/emulation.rst | 197 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)

diff --git a/docs/about/emulation.rst b/docs/about/emulation.rst
index 456d01d5b08..9ce47ac2712 100644
--- a/docs/about/emulation.rst
+++ b/docs/about/emulation.rst
@@ -816,6 +816,203 @@ This plugin can limit the number of Instructions Per Second that are executed::
       The lower the number the more accurate time will be, but the less efficient the plugin.
       Defaults to ips/10
 
+Uftrace
+.......
+
+``contrib/plugins/uftrace.c``
+
+This plugin generates a binary trace compatible with
+`uftrace <https://github.com/namhyung/uftrace>`_.
+
+Plugin supports aarch64 and x64, and works in user and system mode, allowing to
+trace a system boot, which is not something possible usually.
+
+In user mode, the memory mapping is directly copied from ``/proc/self/maps`` at
+the end of execution. Uftrace should be able to retrieve symbols by itself,
+without any additional step.
+In system mode, the default memory mapping is empty, and you can generate
+one (and associated symbols) using ``contrib/plugins/uftrace_symbols.py``.
+Symbols must be present in ELF binaries.
+
+It tracks the call stack (based on frame pointer analysis). Thus, your program
+and its dependencies must be compiled using ``-fno-omit-frame-pointer
+-mno-omit-leaf-frame-pointer``. In 2024, `Ubuntu and Fedora enabled it by
+default again on x64
+<https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>`_.
+On aarch64, this is less of a problem, as they are usually part of the ABI,
+except for leaf functions. That's true for user space applications, but not
+necessarily for bare metal code. You can read this `section
+<uftrace_build_system_example>` to easily build a system with frame pointers.
+
+When tracing long scenarios (> 1 min), the generated trace can become very long,
+making it hard to extract data from it. In this case, a simple solution is to
+trace execution while generating a timestamped output log using
+``qemu-system-aarch64 ... | ts "%s"``. Then, ``uftrace --time-range=start~end``
+can be used to reduce trace for only this part of execution.
+
+Performance wise, overhead compared to normal tcg execution is around x5-x15.
+
+.. list-table:: Uftrace plugin arguments
+  :widths: 20 80
+  :header-rows: 1
+
+  * - Option
+    - Description
+  * - trace-privilege-level=[on|off]
+    - Generate separate traces for each privilege level (Exception Level +
+      Security State on aarch64, Rings on x64).
+
+.. list-table:: uftrace_symbols.py arguments
+  :widths: 20 80
+  :header-rows: 1
+
+  * - Option
+    - Description
+  * - elf_file [elf_file ...]
+    - path to an ELF file. Use /path/to/file:0xdeadbeef to add a mapping offset.
+  * - --prefix-symbols
+    - prepend binary name to symbols
+
+Example user trace
+++++++++++++++++++
+
+As an example, we can trace qemu itself running git::
+
+    $ ./build/qemu-aarch64 -plugin \
+      build/contrib/plugins/libuftrace.so \
+      ./build/qemu-aarch64 /usr/bin/git --help
+
+    # and generate a chrome trace directly
+    $ uftrace dump --chrome | gzip > ~/qemu_aarch64_git_help.json.gz
+
+For convenience, you can download this trace `qemu_aarch64_git_help.json.gz
+<https://fileserver.linaro.org/s/N8X8fnZ5yGRZLsT/download/qemu_aarch64_git_help.json.gz>`_.
+Download it and open this trace on https://ui.perfetto.dev/. You can zoom in/out
+using w,a,s,d keys. Some sequences taken from this trace:
+
+- Loading program and its interpreter
+
+.. image:: https://fileserver.linaro.org/s/fie8JgX76yyL5cq/preview
+   :height: 200px
+
+- open syscall
+
+.. image:: https://fileserver.linaro.org/s/rsXPTeZZPza4PcE/preview
+   :height: 200px
+
+- TB creation
+
+.. image:: https://fileserver.linaro.org/s/GXY6NKMw5EeRCew/preview
+   :height: 200px
+
+It's usually better to use ``uftrace record`` directly. However, tracing
+binaries through qemu-user can be convenient when you don't want to recompile
+them (``uftrace record`` requires instrumentation), as long as symbols are
+present.
+
+Example system trace
+++++++++++++++++++++
+
+A full trace example (chrome trace, from instructions below) generated from a
+system boot can be found `here
+<https://fileserver.linaro.org/s/WsemLboPEzo24nw/download/aarch64_boot.json.gz>`_.
+Download it and open this trace on https://ui.perfetto.dev/. You can see code
+executed for all privilege levels, and zoom in/out using w,a,s,d keys. You can
+find below some sequences taken from this trace:
+
+- Two first stages of boot sequence in Arm Trusted Firmware (EL3 and S-EL1)
+
+.. image:: https://fileserver.linaro.org/s/kkxBS552W7nYESX/preview
+   :height: 200px
+
+- U-boot initialization (until code relocation, after which we can't track it)
+
+.. image:: https://fileserver.linaro.org/s/LKTgsXNZFi5GFNC/preview
+   :height: 200px
+
+- Stat and open syscalls in kernel
+
+.. image:: https://fileserver.linaro.org/s/dXe4MfraKg2F476/preview
+   :height: 200px
+
+- Timer interrupt
+
+.. image:: https://fileserver.linaro.org/s/TM5yobYzJtP7P3C/preview
+   :height: 200px
+
+- Poweroff sequence (from kernel back to firmware, NS-EL2 to EL3)
+
+.. image:: https://fileserver.linaro.org/s/oR2PtyGKJrqnfRf/preview
+   :height: 200px
+
+Build and run system example
+++++++++++++++++++++++++++++
+
+.. _uftrace_build_system_example:
+
+Building a full system image with frame pointers is not trivial.
+
+We provide a `simple way <https://github.com/pbo-linaro/qemu-linux-stack>`_ to
+build an aarch64 system, combining Arm Trusted firmware, U-boot, Linux kernel
+and debian userland. It's based on containers (``podman`` only) and
+``qemu-user-static (binfmt)`` to make sure it's easily reproducible and does not depend
+on machine where you build it.
+
+You can follow the exact same instructions for a x64 system, combining edk2,
+Linux, and Ubuntu, simply by switching to
+`x86_64 <https://github.com/pbo-linaro/qemu-linux-stack/tree/x86_64>`_ branch.
+
+To build the system::
+
+    # Install dependencies
+    $ sudo apt install -y podman qemu-user-static
+
+    $ git clone https://github.com/pbo-linaro/qemu-linux-stack
+    $ cd qemu-linux-stack
+    $ ./build.sh
+
+    # system can be started using:
+    $ ./run.sh /path/to/qemu-system-aarch64
+
+To generate a uftrace for a system boot from that::
+
+    # run true and poweroff the system
+    $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
+      -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on
+
+    # generate symbols and memory mapping
+    $ path/to/contrib/plugins/uftrace_symbols.py \
+      --prefix-symbols \
+      arm-trusted-firmware/build/qemu/debug/bl1/bl1.elf \
+      arm-trusted-firmware/build/qemu/debug/bl2/bl2.elf \
+      arm-trusted-firmware/build/qemu/debug/bl31/bl31.elf \
+      u-boot/u-boot:0x60000000 \
+      linux/vmlinux
+
+    # inspect trace with
+    $ uftrace replay
+
+Uftrace allows to filter the trace, and dump flamegraphs, or a chrome trace.
+This last one is very interesting to see visually the boot process::
+
+    $ uftrace dump --chrome > boot.json
+    # Open your browser, and load boot.json on https://ui.perfetto.dev/.
+
+Long visual chrome traces can't be easily opened, thus, it might be
+interesting to generate them around a particular point of execution::
+
+    # execute qemu and timestamp output log
+    $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
+      -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on |&
+      ts "%s" | tee exec.log
+
+    $ cat exec.log  | grep 'Run /init'
+      1753122320 [   11.834391] Run /init as init process
+      # init was launched at 1753122320
+
+    # generate trace around init execution (2 seconds):
+    $ uftrace dump --chrome --time-range=1753122320~1753122322 > init.json
+
 Other emulation features
 ------------------------
 
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file
  2025-08-08  2:06 ` [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file Pierrick Bouvier
@ 2025-08-08  8:14   ` Manos Pitsidianakis
  2025-08-08 16:19     ` Pierrick Bouvier
  2025-08-08 17:28     ` Pierrick Bouvier
  0 siblings, 2 replies; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  8:14 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>We define a scoreboard that will hold our data per cpu. As well, we
>define a buffer per cpu that will be used to read registers and memories
>in a thread-safe way.
>
>For now, we just instrument all instructions with an empty callback.
>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> contrib/plugins/uftrace.c   | 74 +++++++++++++++++++++++++++++++++++++
> contrib/plugins/meson.build |  3 +-
> 2 files changed, 76 insertions(+), 1 deletion(-)
> create mode 100644 contrib/plugins/uftrace.c
>
>diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>new file mode 100644
>index 00000000000..d60c1077496
>--- /dev/null
>+++ b/contrib/plugins/uftrace.c
>@@ -0,0 +1,74 @@
>+/*
>+ * Copyright (C) 2025, Pierrick Bouvier <pierrick.bouvier@linaro.org>
>+ *
>+ * Generates a trace compatible with uftrace (similar to uftrace record).
>+ * https://github.com/namhyung/uftrace
>+ *
>+ * See docs/about/emulation.rst|Uftrace for details and examples.
>+ *
>+ * SPDX-License-Identifier: GPL-2.0-or-later
>+ */
>+
>+#include <qemu-plugin.h>
>+#include <glib.h>
>+
>+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>+
>+typedef struct Cpu {
>+    GByteArray *buf;
>+} Cpu;
>+
>+static struct qemu_plugin_scoreboard *score;
>+
>+static void track_callstack(unsigned int cpu_index, void *udata)
>+{
>+}
>+
>+static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>+{
>+    size_t n_insns = qemu_plugin_tb_n_insns(tb);
>+
>+    for (int i = 0; i < n_insns; i++) {

s/int i/size_t i/

>+        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);

This can return NULL,

>+
>+        uintptr_t pc = qemu_plugin_insn_vaddr(insn);

And this would lead to a NULL dereference (it performs insn->vaddr 
access)

>+        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
>+                QEMU_PLUGIN_CB_R_REGS,
>+                (void *) pc);

Hm indentation got broken here, should be 


+        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
+                                               QEMU_PLUGIN_CB_R_REGS,
+                                               (void *)pc);

>+
>+    }
>+}
>+
>+static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>+{
>+    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>+    cpu->buf = g_byte_array_new();
>+}
>+
>+static void vcpu_end(unsigned int vcpu_index)
>+{
>+    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>+    g_byte_array_free(cpu->buf, true);
>+    memset(cpu, 0, sizeof(Cpu));

Nitpick, cpu->buf = NULL; is easier to understand (suggestion)

>+}
>+
>+static void at_exit(qemu_plugin_id_t id, void *data)
>+{
>+    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>+        vcpu_end(i);
>+    }
>+
>+    qemu_plugin_scoreboard_free(score);
>+}
>+
>+QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>+                                           const qemu_info_t *info,
>+                                           int argc, char **argv)
>+{
>+    score = qemu_plugin_scoreboard_new(sizeof(Cpu));
>+    qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>+    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
>+    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>+
>+    return 0;
>+}
>diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build
>index 1876bc78438..7eb3629c95d 100644
>--- a/contrib/plugins/meson.build
>+++ b/contrib/plugins/meson.build
>@@ -1,5 +1,6 @@
> contrib_plugins = ['bbv', 'cache', 'cflow', 'drcov', 'execlog', 'hotblocks',
>-                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger']
>+                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger',
>+                   'uftrace']
> if host_os != 'windows'
>   # lockstep uses socket.h
>   contrib_plugins += 'lockstep'
>-- 
>2.47.2
>

If no other comments rise for this patch, you can add my r-b after 
fixing the NULL check:

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64
  2025-08-08  2:06 ` [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64 Pierrick Bouvier
@ 2025-08-08  8:28   ` Manos Pitsidianakis
  2025-08-08 16:27     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  8:28 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>We define a new CpuOps structure that will be used to implement tracking
>independently of guest architecture.
>
>As well, we now instrument only instructions following ones that might
>have touch the frame pointer.

s/touch/touched

>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> contrib/plugins/uftrace.c | 112 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 108 insertions(+), 4 deletions(-)
>
>diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>index d60c1077496..4b1a2f38143 100644
>--- a/contrib/plugins/uftrace.c
>+++ b/contrib/plugins/uftrace.c
>@@ -11,14 +11,94 @@
> 
> #include <qemu-plugin.h>
> #include <glib.h>
>+#include <stdio.h>
> 
> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> 
>+typedef struct Cpu Cpu;
>+
>+typedef struct {
>+    void (*init)(Cpu *cpu);
>+    void (*end)(Cpu *cpu);
>+    uint64_t (*get_frame_pointer)(Cpu *cpu);
>+    bool (*does_insn_modify_frame_pointer)(const char *disas);
>+} CpuOps;
>+
> typedef struct Cpu {
>     GByteArray *buf;
>+    CpuOps ops;
>+    void *arch;
> } Cpu;
> 
>+typedef struct {
>+    struct qemu_plugin_register *reg_fp;
>+} Aarch64Cpu;
>+
> static struct qemu_plugin_scoreboard *score;
>+static CpuOps arch_ops;
>+
>+static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>+{
>+    GByteArray *buf = cpu->buf;
>+    g_byte_array_set_size(buf, 0);
>+    size_t sz = qemu_plugin_read_register(reg, buf);
>+    g_assert(sz == 8);
>+    g_assert(buf->len == 8);
>+    return *((uint64_t *) buf->data);
>+}
>+
>+static struct qemu_plugin_register *plugin_find_register(const char *name)
>+{
>+    g_autoptr(GArray) regs = qemu_plugin_get_registers();

Question about the plugin API and not this patch per se, if the cpu is 
in a32/thumb mode does it still return the aarch64 registers?

>+    for (int i = 0; i < regs->len; ++i) {
>+        qemu_plugin_reg_descriptor *reg;
>+        reg = &g_array_index(regs, qemu_plugin_reg_descriptor, i);
>+        if (!strcmp(reg->name, name)) {
>+            return reg->handle;
>+        }
>+    }
>+    return NULL;
>+}
>+
>+static uint64_t aarch64_get_frame_pointer(Cpu *cpu_)
>+{
>+    Aarch64Cpu *cpu = cpu_->arch;
>+    return cpu_read_register64(cpu_, cpu->reg_fp);
>+}
>+
>+static void aarch64_init(Cpu *cpu_)
>+{
>+    Aarch64Cpu *cpu = g_new0(Aarch64Cpu, 1);
>+    cpu_->arch = cpu;
>+    cpu->reg_fp = plugin_find_register("x29");
>+    if (!cpu->reg_fp) {
>+        fprintf(stderr, "uftrace plugin: frame pointer register (x29) is not "
>+                        "available. Please use an AArch64 cpu (or -cpu max).\n");
>+        g_abort();
>+    }
>+}
>+
>+static void aarch64_end(Cpu *cpu)
>+{
>+    g_free(cpu->arch);
>+}
>+
>+static bool aarch64_does_insn_modify_frame_pointer(const char *disas)
>+{
>+    /*
>+     * Check if current instruction concerns fp register "x29".
>+     * We add a prefix space to make sure we don't match addresses dump
>+     * in disassembly.
>+     */
>+    return strstr(disas, " x29");

Hm is the whitespace before x29 guaranteed? Neat trick otherwise.

>+}
>+
>+static CpuOps aarch64_ops = {
>+    .init = aarch64_init,
>+    .end = aarch64_end,
>+    .get_frame_pointer = aarch64_get_frame_pointer,
>+    .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
>+};
> 
> static void track_callstack(unsigned int cpu_index, void *udata)
> {
>@@ -28,20 +108,36 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> {
>     size_t n_insns = qemu_plugin_tb_n_insns(tb);
> 
>+    /*
>+     * We instrument all instructions following one that might have updated
>+     * the frame pointer. We always instrument first instruction in block, as
>+     * last executed instruction, in previous tb, may have modified it.

Modified it how?

>+     */
>+    bool instrument_insn = true;
>     for (int i = 0; i < n_insns; i++) {
>         struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> 
>-        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
>-        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
>-                QEMU_PLUGIN_CB_R_REGS,
>-                (void *) pc);
>+        if (instrument_insn) {
>+            uintptr_t pc = qemu_plugin_insn_vaddr(insn);
>+            qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
>+                                                   QEMU_PLUGIN_CB_R_REGS,
>+                                                   (void *) pc);
>+            instrument_insn = false;
>+        }
> 
>+        char *disas = qemu_plugin_insn_disas(insn);
>+        if (arch_ops.does_insn_modify_frame_pointer(disas)) {
>+            instrument_insn = true;
>+        }

So if I understand correctly you check if an instruction needs to be 
instrumented and then do it in the next forloop. This means if the last 
insn needs to be instrumented too it is not done, is that ok?

>     }
> }
> 
> static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
> {
>     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>+    cpu->ops = arch_ops;
>+
>+    cpu->ops.init(cpu);
>     cpu->buf = g_byte_array_new();
> }
> 
>@@ -65,6 +161,14 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>                                            const qemu_info_t *info,
>                                            int argc, char **argv)
> {
>+    if (!strcmp(info->target_name, "aarch64")) {
>+        arch_ops = aarch64_ops;
>+    } else {
>+        fprintf(stderr, "plugin uftrace: %s target is not supported\n",
>+                info->target_name);
>+        return 1;
>+    }
>+
>     score = qemu_plugin_scoreboard_new(sizeof(Cpu));
>     qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>     qemu_plugin_register_atexit_cb(id, at_exit, NULL);
>-- 
>2.47.2
>

LGTM overall, it makes sense.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 3/9] contrib/plugins/uftrace: track callstack
  2025-08-08  2:06 ` [PATCH v5 3/9] contrib/plugins/uftrace: track callstack Pierrick Bouvier
@ 2025-08-08  8:49   ` Manos Pitsidianakis
  2025-08-08 16:36     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  8:49 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>We now track callstack, based on frame pointer analysis. We can detect
>function calls, returns, and discontinuities.
>We implement a frame pointer based unwinding that is used for
>discontinuities.


Nit: Never heard of the "discontinuity" term for program execution 
before :D Maybe "async control flow (signals, interrupts)"?

>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> contrib/plugins/uftrace.c | 160 ++++++++++++++++++++++++++++++++++++++
> 1 file changed, 160 insertions(+)
>
>diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>index 4b1a2f38143..d51faceb344 100644
>--- a/contrib/plugins/uftrace.c
>+++ b/contrib/plugins/uftrace.c
>@@ -15,6 +15,15 @@
> 
> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> 
>+typedef struct {
>+    GArray *s;
>+} Callstack;
>+
>+typedef struct {
>+    uint64_t pc;
>+    uint64_t frame_pointer;
>+} CallstackEntry;
>+
> typedef struct Cpu Cpu;
> 
> typedef struct {
>@@ -25,6 +34,7 @@ typedef struct {
> } CpuOps;
> 
> typedef struct Cpu {
>+    Callstack *cs;
>     GByteArray *buf;
>     CpuOps ops;
>     void *arch;
>@@ -37,6 +47,71 @@ typedef struct {
> static struct qemu_plugin_scoreboard *score;
> static CpuOps arch_ops;
> 
>+static Callstack *callstack_new(void)
>+{
>+    Callstack *cs = g_new0(Callstack, 1);
>+    cs->s = g_array_new(false, false, sizeof(CallstackEntry));
>+    return cs;
>+}
>+
>+static void callstack_free(Callstack *cs)
>+{
>+    g_array_free(cs->s, true);
>+    cs->s = NULL;
>+    g_free(cs);
>+}
>+
>+static size_t callstack_depth(const Callstack *cs)
>+{
>+    return cs->s->len;
>+}
>+
>+static size_t callstack_empty(const Callstack *cs)
>+{
>+    return callstack_depth(cs) == 0;
>+}
>+
>+static void callstack_clear(Callstack *cs)
>+{
>+    g_array_set_size(cs->s, 0);
>+}
>+
>+static const CallstackEntry *callstack_at(const Callstack *cs, size_t depth)
>+{
>+    g_assert(depth > 0);
>+    g_assert(depth <= callstack_depth(cs));
>+    return &g_array_index(cs->s, CallstackEntry, depth - 1);
>+}
>+
>+static CallstackEntry callstack_top(const Callstack *cs)
>+{
>+    if (callstack_depth(cs) >= 1) {
>+        return *callstack_at(cs, callstack_depth(cs));
>+    }
>+    return (CallstackEntry){};
>+}
>+
>+static CallstackEntry callstack_caller(const Callstack *cs)
>+{
>+    if (callstack_depth(cs) >= 2) {
>+        return *callstack_at(cs, callstack_depth(cs) - 1);
>+    }
>+    return (CallstackEntry){};
>+}
>+
>+static void callstack_push(Callstack *cs, CallstackEntry e)
>+{
>+    g_array_append_val(cs->s, e);
>+}
>+
>+static CallstackEntry callstack_pop(Callstack *cs)
>+{
>+    g_assert(!callstack_empty(cs));
>+    CallstackEntry e = callstack_top(cs);
>+    g_array_set_size(cs->s, callstack_depth(cs) - 1);
>+    return e;
>+}
>+
> static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
> {
>     GByteArray *buf = cpu->buf;
>@@ -47,6 +122,50 @@ static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>     return *((uint64_t *) buf->data);
> }
> 
>+static uint64_t cpu_read_memory64(Cpu *cpu, uint64_t addr)
>+{
>+    g_assert(addr);
>+    GByteArray *buf = cpu->buf;
>+    g_byte_array_set_size(buf, 0);
>+    bool read = qemu_plugin_read_memory_vaddr(addr, buf, 8);
>+    if (!read) {
>+        return 0;
>+    }
>+    g_assert(buf->len == 8);
>+    return *((uint64_t *) buf->data);
>+}
>+
>+static void cpu_unwind_stack(Cpu *cpu, uint64_t frame_pointer, uint64_t pc)
>+{
>+    g_assert(callstack_empty(cpu->cs));
>+
>+    #define UNWIND_STACK_MAX_DEPTH 1024
>+    CallstackEntry unwind[UNWIND_STACK_MAX_DEPTH];
>+    size_t depth = 0;
>+    do {
>+        /* check we don't have an infinite stack */
>+        for (size_t i = 0; i < depth; ++i) {
>+            if (frame_pointer == unwind[i].frame_pointer) {
>+                break;
>+            }
>+        }
>+        CallstackEntry e = {.frame_pointer = frame_pointer, .pc = pc};
>+        unwind[depth] = e;
>+        depth++;
>+        if (frame_pointer) {
>+            frame_pointer = cpu_read_memory64(cpu, frame_pointer);
>+        }
>+        pc = cpu_read_memory64(cpu, frame_pointer + 8); /* read previous lr */
>+    } while (frame_pointer && pc && depth < UNWIND_STACK_MAX_DEPTH);
>+    #undef UNWIND_STACK_MAX_DEPTH
>+
>+    /* push it from bottom to top */
>+    while (depth) {
>+        callstack_push(cpu->cs, unwind[depth - 1]);
>+        --depth;
>+    }
>+}

Nice.

>+
> static struct qemu_plugin_register *plugin_find_register(const char *name)
> {
>     g_autoptr(GArray) regs = qemu_plugin_get_registers();
>@@ -102,6 +221,43 @@ static CpuOps aarch64_ops = {
> 
> static void track_callstack(unsigned int cpu_index, void *udata)
> {
>+    uint64_t pc = (uintptr_t) udata;
>+    Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>+    Callstack *cs = cpu->cs;
>+
>+    uint64_t fp = cpu->ops.get_frame_pointer(cpu);
>+    if (!fp && callstack_empty(cs)) {
>+        /*
>+         * We simply push current pc. Note that we won't detect symbol change as
>+         * long as a proper call does not happen.
>+         */
>+        callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>+        return;
>+    }
>+
>+    CallstackEntry top = callstack_top(cs);
>+    if (fp == top.frame_pointer) {
>+        /* same function */
>+        return;
>+    }
>+
>+    CallstackEntry caller = callstack_caller(cs);
>+    if (fp == caller.frame_pointer) {
>+        /* return */
>+        callstack_pop(cs);
>+        return;
>+    }
>+
>+    uint64_t caller_fp = fp ? cpu_read_memory64(cpu, fp) : 0;
>+    if (caller_fp == top.frame_pointer) {
>+        /* call */
>+        callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>+        return;
>+    }
>+
>+    /* discontinuity, exit current stack and unwind new one */
>+    callstack_clear(cs);
>+    cpu_unwind_stack(cpu, fp, pc);
> }
> 
> static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>@@ -139,12 +295,16 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
> 
>     cpu->ops.init(cpu);
>     cpu->buf = g_byte_array_new();
>+
>+    cpu->cs = callstack_new();
> }
> 
> static void vcpu_end(unsigned int vcpu_index)
> {
>     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>     g_byte_array_free(cpu->buf, true);
>+
>+    callstack_free(cpu->cs);
>     memset(cpu, 0, sizeof(Cpu));
> }
> 
>-- 
>2.47.2
>
Looks good I think,

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08  2:06 ` [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing Pierrick Bouvier
@ 2025-08-08  9:11   ` Manos Pitsidianakis
  2025-08-08 16:38     ` Pierrick Bouvier
                       ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  9:11 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>We implement tracing, following uftrace format.
>Trace is flushed every 32 MB, so file operations don't impact
>performance at runtime.
>
>A different trace is generated per cpu, and we ensure they have a unique
>name, based on vcpu_index, while keeping room for privilege level coming
>in next commit.

Suggestion (not a request): put some kind of documentation about the 
format this patch implements, maybe a commit sha & URL to a header file 
from upstream uftrace.

>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 148 insertions(+), 1 deletion(-)
>
>diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>index d51faceb344..402a242433e 100644
>--- a/contrib/plugins/uftrace.c
>+++ b/contrib/plugins/uftrace.c
>@@ -12,6 +12,13 @@
> #include <qemu-plugin.h>
> #include <glib.h>
> #include <stdio.h>
>+#include <sys/stat.h>
>+#include <sys/time.h>
>+#include <time.h>
>+#include <unistd.h>
>+
>+#define MiB     (INT64_C(1) << 20)
>+#define NANOSECONDS_PER_SECOND 1000000000LL
> 
> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> 
>@@ -24,6 +31,13 @@ typedef struct {
>     uint64_t frame_pointer;
> } CallstackEntry;
> 
>+typedef struct {
>+    GArray *t;
>+    GString *path;
>+    GString *name;
>+    uint32_t id;
>+} Trace;
>+
> typedef struct Cpu Cpu;
> 
> typedef struct {
>@@ -34,6 +48,7 @@ typedef struct {
> } CpuOps;
> 
> typedef struct Cpu {
>+    Trace *trace;
>     Callstack *cs;
>     GByteArray *buf;
>     CpuOps ops;
>@@ -44,9 +59,41 @@ typedef struct {
>     struct qemu_plugin_register *reg_fp;
> } Aarch64Cpu;
> 
>+typedef struct {
>+    uint64_t timestamp;
>+    uint64_t data;
>+} UftraceEntry;
>+
>+typedef enum {
>+    UFTRACE_ENTRY,
>+    UFTRACE_EXIT,
>+    UFTRACE_LOST,
>+    UFTRACE_EVENT
>+} UftraceRecordType;
>+
> static struct qemu_plugin_scoreboard *score;
> static CpuOps arch_ops;
> 
>+static uint64_t gettime_ns(void)
>+{
>+#ifdef _WIN32
>+    /*
>+     * On Windows, timespec_get is available only with UCRT, but not with
>+     * MinGW64 environment. Simplify by using only gettimeofday on this
>+     * platform. This may result in a precision loss.
>+     */
>+    struct timeval tv;
>+    gettimeofday(&tv, NULL);
>+    uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * 1000;
>+#else
>+    /* We need nanosecond precision for short lived functions. */
>+    struct timespec ts;
>+    timespec_get(&ts, TIME_UTC);
>+    uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
>+#endif
>+    return now_ns;
>+}
>+
> static Callstack *callstack_new(void)
> {
>     Callstack *cs = g_new0(Callstack, 1);
>@@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs)
>     return e;
> }
> 
>+static Trace *trace_new(uint32_t id, GString *name)
>+{
>+    Trace *t = g_new0(Trace, 1);
>+    t->t = g_array_new(false, false, sizeof(UftraceEntry));
>+    t->path = g_string_new(NULL);
>+    g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id);
>+    t->name = g_string_new(name->str);
>+    t->id = id;
>+    return t;
>+}
>+
>+static void trace_free(Trace *t)
>+{
>+    g_assert(t->t->len == 0);
>+    g_array_free(t->t, true);
>+    t->t = NULL;
>+    g_string_free(t->path, true);
>+    t->path = NULL;
>+    g_string_free(t->name, true);
>+    t->name = NULL;
>+    g_free(t);
>+}
>+
>+static void trace_flush(Trace *t, bool append)
>+{
>+    int create_dir = g_mkdir_with_parents("./uftrace.data",
>+                                          S_IRWXU | S_IRWXG | S_IRWXO);
>+    g_assert(create_dir == 0);
>+    FILE *dat = fopen(t->path->str, append ? "a" : "w");
>+    g_assert(dat);
>+    GArray *data = t->t;
>+    if (data->len) {
>+        fwrite(data->data, data->len, sizeof(UftraceEntry), dat);

fwrite might not write all bytes, how about using the 
g_file_set_contents() wrapper?

>+    }
>+    fclose(dat);
>+    g_array_set_size(data, 0);
>+}
>+
>+static void trace_add_entry(Trace *t, uint64_t timestamp, uint64_t pc,
>+                            size_t depth, UftraceRecordType type)
>+{
>+    /* libmcount/record.c:record_event */
>+    const uint64_t record_magic = 0x5;
>+    uint64_t data = type | record_magic << 3;
>+    data += depth << 6;
>+    data += pc << 16;
>+    UftraceEntry e = {.timestamp = timestamp, .data = data};
>+    g_array_append_val(t->t, e);
>+    if (t->t->len * sizeof(UftraceEntry) > 32 * MiB) {
>+        trace_flush(t, true);
>+    }
>+}
>+
>+static void trace_enter_function(Trace *t, uint64_t timestamp,
>+                                 uint64_t pc, size_t depth)
>+{
>+    trace_add_entry(t, timestamp, pc, depth, UFTRACE_ENTRY);
>+}
>+
>+static void trace_exit_function(Trace *t, uint64_t timestamp,
>+                                uint64_t pc, size_t depth)
>+{
>+    trace_add_entry(t, timestamp, pc, depth, UFTRACE_EXIT);
>+}
>+
>+static void trace_enter_stack(Trace *t, Callstack *cs, uint64_t timestamp)
>+{
>+    for (size_t depth = 1; depth <= callstack_depth(cs); ++depth) {
>+        trace_enter_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
>+    }
>+}
>+
>+static void trace_exit_stack(Trace *t, Callstack *cs, uint64_t timestamp)
>+{
>+    for (size_t depth = callstack_depth(cs); depth > 0; --depth) {
>+        trace_exit_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
>+    }
>+}
>+
> static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
> {
>     GByteArray *buf = cpu->buf;
>@@ -223,7 +349,9 @@ static void track_callstack(unsigned int cpu_index, void *udata)
> {
>     uint64_t pc = (uintptr_t) udata;
>     Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>+    uint64_t timestamp = gettime_ns();
>     Callstack *cs = cpu->cs;
>+    Trace *t = cpu->trace;
> 
>     uint64_t fp = cpu->ops.get_frame_pointer(cpu);
>     if (!fp && callstack_empty(cs)) {
>@@ -232,6 +360,7 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>          * long as a proper call does not happen.
>          */
>         callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>+        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
>         return;
>     }
> 
>@@ -244,7 +373,8 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>     CallstackEntry caller = callstack_caller(cs);
>     if (fp == caller.frame_pointer) {
>         /* return */
>-        callstack_pop(cs);
>+        CallstackEntry e = callstack_pop(cs);
>+        trace_exit_function(t, timestamp, e.pc, callstack_depth(cs));
>         return;
>     }
> 
>@@ -252,12 +382,16 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>     if (caller_fp == top.frame_pointer) {
>         /* call */
>         callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>+        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
>         return;
>     }
> 
>     /* discontinuity, exit current stack and unwind new one */
>+    trace_exit_stack(t, cs, timestamp);
>     callstack_clear(cs);
>+
>     cpu_unwind_stack(cpu, fp, pc);
>+    trace_enter_stack(t, cs, timestamp);
> }
> 
> static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>@@ -296,6 +430,16 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>     cpu->ops.init(cpu);
>     cpu->buf = g_byte_array_new();
> 
>+    g_assert(vcpu_index < UINT32_MAX / 100);
>+    /* trace_id is: cpu_number * 100 */
>+    uint32_t trace_id = (vcpu_index + 1) * 100;
>+
>+    g_autoptr(GString) trace_name = g_string_new(NULL);
>+    g_string_append_printf(trace_name, "cpu%u", vcpu_index);
>+    cpu->trace = trace_new(trace_id, trace_name);
>+    /* create/truncate trace file */
>+    trace_flush(cpu->trace, false);
>+
>     cpu->cs = callstack_new();
> }
> 
>@@ -304,6 +448,7 @@ static void vcpu_end(unsigned int vcpu_index)
>     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>     g_byte_array_free(cpu->buf, true);
> 
>+    trace_free(cpu->trace);
>     callstack_free(cpu->cs);
>     memset(cpu, 0, sizeof(Cpu));
> }
>@@ -311,6 +456,8 @@ static void vcpu_end(unsigned int vcpu_index)
> static void at_exit(qemu_plugin_id_t id, void *data)
> {
>     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>+        Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
>+        trace_flush(cpu->trace, true);
>         vcpu_end(i);
>     }
> 
>-- 
>2.47.2
>

Looks good overall.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 5/9] contrib/plugins/uftrace: implement privilege level tracing
  2025-08-08  2:06 ` [PATCH v5 5/9] contrib/plugins/uftrace: implement privilege level tracing Pierrick Bouvier
@ 2025-08-08  9:26   ` Manos Pitsidianakis
  2025-08-08 16:41     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  9:26 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>We add new option trace-privilege-level=bool, which will create a
>separate trace for each privilege level.
>This allows to follow changes of privilege during execution.
>
>We implement aarch64 operations to track current privilege level
>accordingly.
>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> contrib/plugins/uftrace.c | 189 ++++++++++++++++++++++++++++++++++++--
> 1 file changed, 181 insertions(+), 8 deletions(-)
>
>diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>index 402a242433e..7737626da2f 100644
>--- a/contrib/plugins/uftrace.c
>+++ b/contrib/plugins/uftrace.c
>@@ -44,19 +44,39 @@ typedef struct {
>     void (*init)(Cpu *cpu);
>     void (*end)(Cpu *cpu);
>     uint64_t (*get_frame_pointer)(Cpu *cpu);
>+    uint8_t (*get_privilege_level)(Cpu *cpu);
>+    uint8_t (*num_privilege_levels)(void);
>+    const char *(*get_privilege_level_name)(uint8_t pl);
>     bool (*does_insn_modify_frame_pointer)(const char *disas);
> } CpuOps;
> 
> typedef struct Cpu {
>     Trace *trace;
>     Callstack *cs;
>+    uint8_t privilege_level;
>+    GArray *traces; /* Trace *traces [] */
>     GByteArray *buf;
>     CpuOps ops;
>     void *arch;
> } Cpu;
> 
>+typedef enum {
>+    AARCH64_EL0_SECURE,
>+    AARCH64_EL0_NONSECURE,
>+    AARCH64_EL0_REALM,
>+    AARCH64_EL1_SECURE,
>+    AARCH64_EL1_NONSECURE,
>+    AARCH64_EL1_REALM,
>+    AARCH64_EL2_SECURE,
>+    AARCH64_EL2_NONSECURE,
>+    AARCH64_EL2_REALM,
>+    AARCH64_EL3,
>+} Aarch64PrivilegeLevel;
>+
> typedef struct {
>     struct qemu_plugin_register *reg_fp;
>+    struct qemu_plugin_register *reg_cpsr;
>+    struct qemu_plugin_register *reg_scr_el3;
> } Aarch64Cpu;
> 
> typedef struct {
>@@ -72,6 +92,7 @@ typedef enum {
> } UftraceRecordType;
> 
> static struct qemu_plugin_scoreboard *score;
>+static bool trace_privilege_level;
> static CpuOps arch_ops;
> 
> static uint64_t gettime_ns(void)
>@@ -248,6 +269,16 @@ static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>     return *((uint64_t *) buf->data);
> }
> 
>+static uint32_t cpu_read_register32(Cpu *cpu, struct qemu_plugin_register *reg)
>+{
>+    GByteArray *buf = cpu->buf;
>+    g_byte_array_set_size(buf, 0);
>+    size_t sz = qemu_plugin_read_register(reg, buf);
>+    g_assert(sz == 4);
>+    g_assert(buf->len == 4);
>+    return *((uint32_t *) buf->data);
>+}
>+
> static uint64_t cpu_read_memory64(Cpu *cpu, uint64_t addr)
> {
>     g_assert(addr);
>@@ -305,6 +336,68 @@ static struct qemu_plugin_register *plugin_find_register(const char *name)
>     return NULL;
> }
> 
>+static uint8_t aarch64_num_privilege_levels(void)
>+{
>+    return AARCH64_EL3 + 1;

Nit: A common idiom is defining an _MAX enum variant that always stays 
on the bottom of the enum definition and equals the max value, thus 
ensuring you always refer to the correct maximum variant. Though I'd be 
surprised if they ever add an EL4 :^)

>+}
>+
>+static const char *aarch64_get_privilege_level_name(uint8_t pl)
>+{
>+    switch (pl) {
>+    case AARCH64_EL0_SECURE: return "S-EL0";
>+    case AARCH64_EL0_NONSECURE: return "NS-EL0";
>+    case AARCH64_EL0_REALM: return "R-EL0";
>+    case AARCH64_EL1_SECURE: return "S-EL1";
>+    case AARCH64_EL1_NONSECURE: return "NS-EL1";
>+    case AARCH64_EL1_REALM: return "R-EL1";
>+    case AARCH64_EL2_SECURE: return "S-EL2";
>+    case AARCH64_EL2_NONSECURE: return "NS-EL2";
>+    case AARCH64_EL2_REALM: return "R-EL2";
>+    case AARCH64_EL3: return "EL3";
>+    default:
>+        g_assert_not_reached();
>+    }
>+}
>+
>+static uint8_t aarch64_get_privilege_level(Cpu *cpu_)
>+{
>+    Aarch64Cpu *cpu = cpu_->arch;
>+    /*
>+     * QEMU gdbstub does not provide access to CurrentEL,

We should fix that... I wanted to do that for a while.

>+     * so we use CPSR instead.
>+     */
>+    uint8_t el = cpu_read_register32(cpu_, cpu->reg_cpsr) >> 2 & 0b11;
>+
>+    if (el == 3) {
>+        return AARCH64_EL3;
>+    }
>+
>+    uint8_t ss = AARCH64_EL0_SECURE;
>+    if (!cpu->reg_scr_el3) {
>+        ss = AARCH64_EL0_NONSECURE;
>+    }
>+    uint64_t scr_el3 = cpu_read_register64(cpu_, cpu->reg_scr_el3);
>+    uint64_t ns = (scr_el3 >> 0) & 0b1;
>+    uint64_t nse = (scr_el3 >> 62) & 0b1;
>+    switch (nse << 1 | ns) {
>+    case 0b00:
>+        ss = AARCH64_EL0_SECURE;
>+        break;
>+    case 0b01:
>+        ss = AARCH64_EL0_NONSECURE;
>+        break;
>+    case 0b11:
>+        ss = AARCH64_EL0_REALM;
>+        break;
>+    default:
>+        g_assert_not_reached();
>+    }
>+
>+    const uint8_t num_ss = 3;
>+    Aarch64PrivilegeLevel pl = el * num_ss + ss;

That's a bit blunt but I understand where you're coming from

>+    return pl;
>+}
>+
> static uint64_t aarch64_get_frame_pointer(Cpu *cpu_)
> {
>     Aarch64Cpu *cpu = cpu_->arch;
>@@ -321,6 +414,10 @@ static void aarch64_init(Cpu *cpu_)
>                         "available. Please use an AArch64 cpu (or -cpu max).\n");
>         g_abort();
>     }
>+    cpu->reg_cpsr = plugin_find_register("cpsr");
>+    g_assert(cpu->reg_cpsr);
>+    cpu->reg_scr_el3 = plugin_find_register("SCR_EL3");
>+    /* scr_el3 is optional */
> }
> 
> static void aarch64_end(Cpu *cpu)
>@@ -342,9 +439,34 @@ static CpuOps aarch64_ops = {
>     .init = aarch64_init,
>     .end = aarch64_end,
>     .get_frame_pointer = aarch64_get_frame_pointer,
>+    .get_privilege_level = aarch64_get_privilege_level,
>+    .num_privilege_levels = aarch64_num_privilege_levels,
>+    .get_privilege_level_name = aarch64_get_privilege_level_name,
>     .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
> };
> 
>+static void track_privilege_change(unsigned int cpu_index, void *udata)
>+{
>+    Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>+    uint8_t new_pl = cpu->ops.get_privilege_level(cpu);
>+
>+    if (new_pl == cpu->privilege_level) {
>+        return;
>+    }
>+
>+    uint64_t pc = (uintptr_t) udata;
>+    uint64_t timestamp = gettime_ns();
>+
>+    trace_exit_stack(cpu->trace, cpu->cs, timestamp);
>+    callstack_clear(cpu->cs);
>+
>+    cpu->privilege_level = new_pl;
>+    cpu->trace = g_array_index(cpu->traces, Trace*, new_pl);
>+
>+    cpu_unwind_stack(cpu, cpu->ops.get_frame_pointer(cpu), pc);
>+    trace_enter_stack(cpu->trace, cpu->cs, timestamp);
>+}
>+
> static void track_callstack(unsigned int cpu_index, void *udata)
> {
>     uint64_t pc = (uintptr_t) udata;
>@@ -397,6 +519,13 @@ static void track_callstack(unsigned int cpu_index, void *udata)
> static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> {
>     size_t n_insns = qemu_plugin_tb_n_insns(tb);
>+    uintptr_t tb_pc = qemu_plugin_tb_vaddr(tb);
>+
>+    if (trace_privilege_level) {
>+        qemu_plugin_register_vcpu_tb_exec_cb(tb, track_privilege_change,
>+                                             QEMU_PLUGIN_CB_R_REGS,
>+                                             (void *) tb_pc);
>+    }
> 
>     /*
>      * We instrument all instructions following one that might have updated
>@@ -429,18 +558,36 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
> 
>     cpu->ops.init(cpu);
>     cpu->buf = g_byte_array_new();
>+    cpu->traces = g_array_new(0, 0, sizeof(Trace *));
> 
>     g_assert(vcpu_index < UINT32_MAX / 100);
>-    /* trace_id is: cpu_number * 100 */
>+    g_assert(cpu->ops.num_privilege_levels() < 100);
>+    /* trace_id is: cpu_number * 100 + privilege_level */
>     uint32_t trace_id = (vcpu_index + 1) * 100;
> 
>-    g_autoptr(GString) trace_name = g_string_new(NULL);
>-    g_string_append_printf(trace_name, "cpu%u", vcpu_index);
>-    cpu->trace = trace_new(trace_id, trace_name);
>-    /* create/truncate trace file */
>-    trace_flush(cpu->trace, false);
>+    if (trace_privilege_level) {
>+        for (uint8_t pl = 0; pl < cpu->ops.num_privilege_levels(); ++pl) {
>+            g_autoptr(GString) trace_name = g_string_new(NULL);
>+            g_string_append_printf(trace_name, "cpu%u %s", vcpu_index,
>+                                   cpu->ops.get_privilege_level_name(pl));
>+            Trace *t = trace_new(trace_id + pl, trace_name);
>+            g_array_append_val(cpu->traces, t);
>+        }
>+    } else {
>+        g_autoptr(GString) trace_name = g_string_new(NULL);
>+        g_string_append_printf(trace_name, "cpu%u", vcpu_index);
>+        Trace *t = trace_new(trace_id, trace_name);
>+        g_array_append_val(cpu->traces, t);
>+    }
>+
>+    for (size_t i = 0; i < cpu->traces->len; ++i) {
>+        /* create/truncate trace files */
>+        Trace *t = g_array_index(cpu->traces, Trace*, i);
>+        trace_flush(t, false);
>+    }
> 
>     cpu->cs = callstack_new();
>+    cpu->trace = g_array_index(cpu->traces, Trace*, cpu->privilege_level);
> }
> 
> static void vcpu_end(unsigned int vcpu_index)
>@@ -448,7 +595,12 @@ static void vcpu_end(unsigned int vcpu_index)
>     Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>     g_byte_array_free(cpu->buf, true);
> 
>-    trace_free(cpu->trace);
>+    for (size_t i = 0; i < cpu->traces->len; ++i) {
>+        Trace *t = g_array_index(cpu->traces, Trace*, i);
>+        trace_free(t);
>+    }
>+
>+    g_array_free(cpu->traces, true);
>     callstack_free(cpu->cs);
>     memset(cpu, 0, sizeof(Cpu));
> }
>@@ -457,7 +609,13 @@ static void at_exit(qemu_plugin_id_t id, void *data)
> {
>     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>         Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
>-        trace_flush(cpu->trace, true);
>+        for (size_t j = 0; j < cpu->traces->len; ++j) {
>+            Trace *t = g_array_index(cpu->traces, Trace*, j);
>+            trace_flush(t, true);
>+        }
>+    }
>+
>+    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>         vcpu_end(i);
>     }
> 
>@@ -468,6 +626,21 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>                                            const qemu_info_t *info,
>                                            int argc, char **argv)
> {
>+    for (int i = 0; i < argc; i++) {
>+        char *opt = argv[i];
>+        g_auto(GStrv) tokens = g_strsplit(opt, "=", 2);
>+        if (g_strcmp0(tokens[0], "trace-privilege-level") == 0) {
>+            if (!qemu_plugin_bool_parse(tokens[0], tokens[1],
>+                                        &trace_privilege_level)) {
>+                fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
>+                return -1;
>+            }
>+        } else {
>+            fprintf(stderr, "option parsing failed: %s\n", opt);
>+            return -1;
>+        }
>+    }
>+
>     if (!strcmp(info->target_name, "aarch64")) {
>         arch_ops = aarch64_ops;
>     } else {
>-- 
>2.47.2
>

LGTM,

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/9] contrib/plugins/uftrace: generate additional files for uftrace
  2025-08-08  2:06 ` [PATCH v5 6/9] contrib/plugins/uftrace: generate additional files for uftrace Pierrick Bouvier
@ 2025-08-08  9:37   ` Manos Pitsidianakis
  2025-08-08 16:42     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  9:37 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>Beyond traces per cpu, uftrace expect to find some specific files.
>- info: contains information about machine/program run
>  those values are not impacting uftrace behaviour, and we simply copied
>  a random example to keep things simple.
>- memory mapping: how every binary is mapped in memory. For system mode,
>  we generate an empty mapping (uftrace_symbols.py, coming in future
>  commit, will take care of that). For user mode, we copy current
>  /proc/self/maps. We don't need to do any special filtering, as
>  reported addresses will necessarily concern guest program, and not
>  QEMU and its libraries.
>- task: list of tasks. We present every vcpu/privilege level as a
>  separate process, as it's the best view we can have when generating a
>  (visual) chrome trace. Using threads is less convenient in terms of
>  UI.
>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> contrib/plugins/uftrace.c | 130 +++++++++++++++++++++++++++++++++++++-
> 1 file changed, 129 insertions(+), 1 deletion(-)
>
>diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>index 7737626da2f..6628b4256fd 100644
>--- a/contrib/plugins/uftrace.c
>+++ b/contrib/plugins/uftrace.c
>@@ -115,6 +115,126 @@ static uint64_t gettime_ns(void)
>     return now_ns;
> }
> 
>+static void uftrace_write_map(bool system_emulation)
>+{
>+    const char *path = "./uftrace.data/sid-0.map";
>+
>+    if (system_emulation && access(path, F_OK) == 0) {
>+        /* do not erase existing map in system emulation, as a custom one might
>+         * already have been generated by uftrace_symbols.py */
>+        return;
>+    }
>+
>+    FILE *sid_map = fopen(path, "w");
>+    g_assert(sid_map);
>+
>+    if (system_emulation) {
>+        fprintf(sid_map,
>+                "# map stack on highest address possible, to prevent uftrace\n"
>+                "# from considering any kernel address\n");
>+        fprintf(sid_map,
>+          "ffffffffffff-ffffffffffff rw-p 00000000 00:00 0 [stack]\n");
>+    } else {
>+        /* in user mode, copy /proc/self/maps instead */
>+        FILE *self_map = fopen("/proc/self/maps", "r");
>+        g_assert(self_map);
>+        for (;;) {
>+            int c = fgetc(self_map);
>+            if (c == EOF) {
>+                break;
>+            }
>+            fputc(c, sid_map);
>+        }
>+        fclose(self_map);
>+    }
>+    fclose(sid_map);
>+}
>+
>+static void uftrace_write_task(const GArray *traces)
>+{
>+    FILE *task = fopen("./uftrace.data/task.txt", "w");
>+    g_assert(task);
>+    for (int i = 0; i < traces->len; ++i) {
>+        Trace *t = g_array_index(traces, Trace*, i);
>+        fprintf(task, "SESS timestamp=0.0 pid=%"PRIu32" sid=0 exename=\"%s\"\n",
>+                t->id, t->name->str);
>+        fprintf(task, "TASK timestamp=0.0 tid=%"PRIu32" pid=%"PRIu32"\n",
>+                t->id, t->id);
>+    }
>+    fclose(task);
>+}
>+
>+static void uftrace_write_info(const GArray *traces)
>+{
>+    g_autoptr(GString) taskinfo_tids = g_string_new("taskinfo:tids=");
>+    for (int i = 0; i < traces->len; ++i) {
>+        Trace *t = g_array_index(traces, Trace*, i);
>+        const char *delim = i > 0 ? "," : "";
>+        g_string_append_printf(taskinfo_tids, "%s%"PRIu32, delim, t->id);
>+    }
>+
>+    g_autoptr(GString) taskinfo_nr_tid = g_string_new("taskinfo:nr_tid=");
>+    g_string_append_printf(taskinfo_nr_tid, "%d", traces->len);
>+
>+    FILE *info = fopen("./uftrace.data/info", "w");
>+    g_assert(info);
>+    /*
>+     * $ uftrace dump --debug
>+     * uftrace file header: magic         = 4674726163652100
>+     * uftrace file header: version       = 4
>+     * uftrace file header: header size   = 40
>+     * uftrace file header: endian        = 1 (little)
>+     * uftrace file header: class          = 2 (64 bit)
>+     * uftrace file header: features      = 0x1263 (PLTHOOK | ...
>+     * uftrace file header: info          = 0x7bff (EXE_NAME | ...
>+     *  <0000000000000000>: 46 74 72 61 63 65 21 00  04 00 00 00 28 00 01 02
>+     *  <0000000000000010>: 63 12 00 00 00 00 00 00  ff 7b 00 00 00 00 00 00
>+     *  <0000000000000020>: 00 04 00 00 00 00 00 00
>+     */
>+    const uint8_t header[] = {0x46, 0x74, 0x72, 0x61, 0x63, 0x65, 0x21, 0x00,
>+                              0x04, 0x00, 0x00, 0x00, 0x28, 0x00, 0x01, 0x02,
>+                              0x63, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>+                              0xff, 0x7b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>+                              0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
>+    fwrite(header, sizeof(header), 1, info);
>+    const char *info_data[] = {
>+        "exename:from_qemu",
>+        "build_id:0123456789abcdef0123456789abcdef01234567",
>+        "exit_status:0",
>+        "cmdline:uftrace record qemu",
>+        "cpuinfo:lines=2",
>+        "cpuinfo:nr_cpus=1 / 1 (online/possible)",
>+        "cpuinfo:desc=Intel 8086",

Haha what :D

>+        "meminfo:1.0 / 1.0 GB (free / total)",
>+        "osinfo:lines=3",
>+        "osinfo:kernel=Linux 6.12.33",
>+        "osinfo:hostname=pc",
>+        "osinfo:distro=\"Debian GNU/Linux 13 (trixie)\"",

So I assume these strings can be anything, why not make them blank?

>+        "taskinfo:lines=2",
>+        taskinfo_nr_tid->str,
>+        taskinfo_tids->str,
>+        "usageinfo:lines=6",
>+        "usageinfo:systime=0.000000",
>+        "usageinfo:usrtime=0.000000",
>+        "usageinfo:ctxsw=0 / 0 (voluntary / involuntary)",
>+        "usageinfo:maxrss=8016",
>+        "usageinfo:pagefault=0 / 0 (major / minor)",
>+        "usageinfo:iops=0 / 0 (read / write)",
>+        "loadinfo:0.0 / 0.0 / 0.0",
>+        "record_date:Mon Jan  1 00:00:00 2025",
>+        "elapsed_time:1000000000000.0 sec",
>+        "pattern_type:regex",
>+        "uftrace_version:v0.17 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )",
>+        "utc_offset:1751552954",
>+        0};
>+    const char **info_data_it = info_data;
>+    while (*(info_data_it)) {
>+        fprintf(info, "%s\n", *info_data_it);
>+        ++info_data_it;
>+    }
>+    fclose(info);
>+}
>+
> static Callstack *callstack_new(void)
> {
>     Callstack *cs = g_new0(Callstack, 1);
>@@ -607,14 +727,22 @@ static void vcpu_end(unsigned int vcpu_index)
> 
> static void at_exit(qemu_plugin_id_t id, void *data)
> {
>+    bool system_emulation = (bool) data;
>+    g_autoptr(GArray) traces = g_array_new(0, 0, sizeof(Trace *));
>+
>     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>         Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
>         for (size_t j = 0; j < cpu->traces->len; ++j) {
>             Trace *t = g_array_index(cpu->traces, Trace*, j);
>             trace_flush(t, true);
>+            g_array_append_val(traces, t);
>         }
>     }
> 
>+    uftrace_write_map(system_emulation);
>+    uftrace_write_info(traces);
>+    uftrace_write_task(traces);
>+
>     for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>         vcpu_end(i);
>     }
>@@ -651,7 +779,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
> 
>     score = qemu_plugin_scoreboard_new(sizeof(Cpu));
>     qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>-    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
>+    qemu_plugin_register_atexit_cb(id, at_exit, (void *) info->system_emulation);
>     qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
> 
>     return 0;
>-- 
>2.47.2
>

LGTM,

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 7/9] contrib/plugins/uftrace: implement x64 support
  2025-08-08  2:07 ` [PATCH v5 7/9] contrib/plugins/uftrace: implement x64 support Pierrick Bouvier
@ 2025-08-08  9:42   ` Manos Pitsidianakis
  2025-08-08 16:52     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  9:42 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:07, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>It's trivial to implement x64 support, as it's the same stack layout
>than aarch64.

s/than/as

>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> contrib/plugins/uftrace.c | 85 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 85 insertions(+)
>
>diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>index 6628b4256fd..f10172eed10 100644
>--- a/contrib/plugins/uftrace.c
>+++ b/contrib/plugins/uftrace.c
>@@ -79,6 +79,20 @@ typedef struct {
>     struct qemu_plugin_register *reg_scr_el3;
> } Aarch64Cpu;
> 
>+typedef enum {
>+    X64_RING0,
>+    X64_RING1,
>+    X64_RING2,
>+    X64_RING3,
>+    X64_REAL_MODE,
>+} X64PrivilegeLevel;
>+
>+typedef struct {
>+    struct qemu_plugin_register *reg_rbp;
>+    struct qemu_plugin_register *reg_cs;
>+    struct qemu_plugin_register *reg_cr0;
>+} X64Cpu;
>+
> typedef struct {
>     uint64_t timestamp;
>     uint64_t data;
>@@ -565,6 +579,75 @@ static CpuOps aarch64_ops = {
>     .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
> };
> 
>+static uint8_t x64_num_privilege_levels(void)
>+{
>+    return X64_REAL_MODE + 1;
>+}
>+
>+static const char *x64_get_privilege_level_name(uint8_t pl)
>+{
>+    switch (pl) {
>+    case X64_RING0: return "Ring0";
>+    case X64_RING1: return "Ring1";
>+    case X64_RING2: return "Ring2";
>+    case X64_RING3: return "Ring3";
>+    case X64_REAL_MODE: return "RealMode";
>+    default:
>+        g_assert_not_reached();
>+    }
>+}
>+
>+static uint8_t x64_get_privilege_level(Cpu *cpu_)
>+{
>+    X64Cpu *cpu = cpu_->arch;
>+    uint64_t cr0 = cpu_read_register64(cpu_, cpu->reg_cr0);
>+    uint64_t protected_mode = (cr0 >> 0) & 0b1;
>+    if (!protected_mode) {
>+        return X64_REAL_MODE;
>+    }
>+    uint32_t cs = cpu_read_register32(cpu_, cpu->reg_cs);
>+    uint32_t ring_level = (cs >> 0) & 0b11;
>+    return ring_level;
>+}
>+
>+static uint64_t x64_get_frame_pointer(Cpu *cpu_)
>+{
>+    X64Cpu *cpu = cpu_->arch;
>+    return cpu_read_register64(cpu_, cpu->reg_rbp);
>+}
>+
>+static void x64_init(Cpu *cpu_)
>+{
>+    X64Cpu *cpu = g_new0(X64Cpu, 1);
>+    cpu_->arch = cpu;
>+    cpu->reg_rbp = plugin_find_register("rbp");
>+    g_assert(cpu->reg_rbp);
>+    cpu->reg_cs = plugin_find_register("cs");
>+    g_assert(cpu->reg_cs);
>+    cpu->reg_cr0 = plugin_find_register("cr0");
>+    g_assert(cpu->reg_cr0);
>+}
>+
>+static void x64_end(Cpu *cpu)
>+{
>+    g_free(cpu->arch);
>+}
>+
>+static bool x64_does_insn_modify_frame_pointer(const char *disas)
>+{
>+    return strstr(disas, "rbp");
>+}
>+
>+static CpuOps x64_ops = {
>+    .init = x64_init,
>+    .end = x64_end,
>+    .get_frame_pointer = x64_get_frame_pointer,
>+    .get_privilege_level = x64_get_privilege_level,
>+    .num_privilege_levels = x64_num_privilege_levels,
>+    .get_privilege_level_name = x64_get_privilege_level_name,
>+    .does_insn_modify_frame_pointer = x64_does_insn_modify_frame_pointer,
>+};
>+
> static void track_privilege_change(unsigned int cpu_index, void *udata)
> {
>     Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>@@ -771,6 +854,8 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
> 
>     if (!strcmp(info->target_name, "aarch64")) {
>         arch_ops = aarch64_ops;
>+    } else if (!strcmp(info->target_name, "x86_64")) {
>+        arch_ops = x64_ops;
>     } else {
>         fprintf(stderr, "plugin uftrace: %s target is not supported\n",
>                 info->target_name);
>-- 
>2.47.2
>

No idea about x86 assembly tbh but this looks correct to me.

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 9/9] contrib/plugins/uftrace: add documentation
  2025-08-08  2:07 ` [PATCH v5 9/9] contrib/plugins/uftrace: add documentation Pierrick Bouvier
@ 2025-08-08  9:46   ` Manos Pitsidianakis
  2025-08-08 16:59     ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08  9:46 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé , rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On Fri, 08 Aug 2025 05:07, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>This documentation summarizes how to use the plugin, and present two
>examples of the possibilities offered by it, in system and user mode.
>
>As well, it explains how to rebuild and reproduce those examples.
>
>Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>---
> docs/about/emulation.rst | 197 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 197 insertions(+)
>
>diff --git a/docs/about/emulation.rst b/docs/about/emulation.rst
>index 456d01d5b08..9ce47ac2712 100644
>--- a/docs/about/emulation.rst
>+++ b/docs/about/emulation.rst
>@@ -816,6 +816,203 @@ This plugin can limit the number of Instructions Per Second that are executed::
>       The lower the number the more accurate time will be, but the less efficient the plugin.
>       Defaults to ips/10
> 
>+Uftrace
>+.......
>+
>+``contrib/plugins/uftrace.c``
>+
>+This plugin generates a binary trace compatible with
>+`uftrace <https://github.com/namhyung/uftrace>`_.
>+
>+Plugin supports aarch64 and x64, and works in user and system mode, allowing to
>+trace a system boot, which is not something possible usually.

Now it is!

>+
>+In user mode, the memory mapping is directly copied from ``/proc/self/maps`` at
>+the end of execution. Uftrace should be able to retrieve symbols by itself,
>+without any additional step.
>+In system mode, the default memory mapping is empty, and you can generate
>+one (and associated symbols) using ``contrib/plugins/uftrace_symbols.py``.
>+Symbols must be present in ELF binaries.
>+
>+It tracks the call stack (based on frame pointer analysis). Thus, your program
>+and its dependencies must be compiled using ``-fno-omit-frame-pointer
>+-mno-omit-leaf-frame-pointer``. In 2024, `Ubuntu and Fedora enabled it by
>+default again on x64
>+<https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>`_.
>+On aarch64, this is less of a problem, as they are usually part of the ABI,
>+except for leaf functions. That's true for user space applications, but not
>+necessarily for bare metal code. You can read this `section
>+<uftrace_build_system_example>` to easily build a system with frame pointers.
>+
>+When tracing long scenarios (> 1 min), the generated trace can become very long,
>+making it hard to extract data from it. In this case, a simple solution is to
>+trace execution while generating a timestamped output log using
>+``qemu-system-aarch64 ... | ts "%s"``. Then, ``uftrace --time-range=start~end``
>+can be used to reduce trace for only this part of execution.
>+
>+Performance wise, overhead compared to normal tcg execution is around x5-x15.
>+
>+.. list-table:: Uftrace plugin arguments
>+  :widths: 20 80
>+  :header-rows: 1
>+
>+  * - Option
>+    - Description
>+  * - trace-privilege-level=[on|off]
>+    - Generate separate traces for each privilege level (Exception Level +
>+      Security State on aarch64, Rings on x64).
>+
>+.. list-table:: uftrace_symbols.py arguments
>+  :widths: 20 80
>+  :header-rows: 1
>+
>+  * - Option
>+    - Description
>+  * - elf_file [elf_file ...]
>+    - path to an ELF file. Use /path/to/file:0xdeadbeef to add a mapping offset.
>+  * - --prefix-symbols
>+    - prepend binary name to symbols
>+
>+Example user trace
>+++++++++++++++++++
>+
>+As an example, we can trace qemu itself running git::
>+
>+    $ ./build/qemu-aarch64 -plugin \
>+      build/contrib/plugins/libuftrace.so \
>+      ./build/qemu-aarch64 /usr/bin/git --help
>+
>+    # and generate a chrome trace directly
>+    $ uftrace dump --chrome | gzip > ~/qemu_aarch64_git_help.json.gz
>+
>+For convenience, you can download this trace `qemu_aarch64_git_help.json.gz
>+<https://fileserver.linaro.org/s/N8X8fnZ5yGRZLsT/download/qemu_aarch64_git_help.json.gz>`_.

We should be able to add static files in the docs/ folder that sphinx 
html can link to for images and json. WDYT?

>+Download it and open this trace on https://ui.perfetto.dev/. You can zoom in/out
>+using w,a,s,d keys. Some sequences taken from this trace:

You can use :kbd:`W` etc for nice key formatting

>+
>+- Loading program and its interpreter
>+
>+.. image:: https://fileserver.linaro.org/s/fie8JgX76yyL5cq/preview
>+   :height: 200px
>+
>+- open syscall
>+
>+.. image:: https://fileserver.linaro.org/s/rsXPTeZZPza4PcE/preview
>+   :height: 200px
>+
>+- TB creation
>+
>+.. image:: https://fileserver.linaro.org/s/GXY6NKMw5EeRCew/preview
>+   :height: 200px
>+
>+It's usually better to use ``uftrace record`` directly. However, tracing
>+binaries through qemu-user can be convenient when you don't want to recompile
>+them (``uftrace record`` requires instrumentation), as long as symbols are
>+present.
>+
>+Example system trace
>+++++++++++++++++++++
>+
>+A full trace example (chrome trace, from instructions below) generated from a
>+system boot can be found `here
>+<https://fileserver.linaro.org/s/WsemLboPEzo24nw/download/aarch64_boot.json.gz>`_.
>+Download it and open this trace on https://ui.perfetto.dev/. You can see code
>+executed for all privilege levels, and zoom in/out using w,a,s,d keys. You can
>+find below some sequences taken from this trace:
>+
>+- Two first stages of boot sequence in Arm Trusted Firmware (EL3 and S-EL1)
>+
>+.. image:: https://fileserver.linaro.org/s/kkxBS552W7nYESX/preview
>+   :height: 200px
>+
>+- U-boot initialization (until code relocation, after which we can't track it)
>+
>+.. image:: https://fileserver.linaro.org/s/LKTgsXNZFi5GFNC/preview
>+   :height: 200px
>+
>+- Stat and open syscalls in kernel
>+
>+.. image:: https://fileserver.linaro.org/s/dXe4MfraKg2F476/preview
>+   :height: 200px
>+
>+- Timer interrupt
>+
>+.. image:: https://fileserver.linaro.org/s/TM5yobYzJtP7P3C/preview
>+   :height: 200px
>+
>+- Poweroff sequence (from kernel back to firmware, NS-EL2 to EL3)
>+
>+.. image:: https://fileserver.linaro.org/s/oR2PtyGKJrqnfRf/preview
>+   :height: 200px
>+
>+Build and run system example
>+++++++++++++++++++++++++++++
>+
>+.. _uftrace_build_system_example:
>+
>+Building a full system image with frame pointers is not trivial.
>+
>+We provide a `simple way <https://github.com/pbo-linaro/qemu-linux-stack>`_ to
>+build an aarch64 system, combining Arm Trusted firmware, U-boot, Linux kernel
>+and debian userland. It's based on containers (``podman`` only) and
>+``qemu-user-static (binfmt)`` to make sure it's easily reproducible and does not depend
>+on machine where you build it.
>+
>+You can follow the exact same instructions for a x64 system, combining edk2,
>+Linux, and Ubuntu, simply by switching to
>+`x86_64 <https://github.com/pbo-linaro/qemu-linux-stack/tree/x86_64>`_ branch.
>+
>+To build the system::
>+
>+    # Install dependencies
>+    $ sudo apt install -y podman qemu-user-static
>+
>+    $ git clone https://github.com/pbo-linaro/qemu-linux-stack
>+    $ cd qemu-linux-stack
>+    $ ./build.sh
>+
>+    # system can be started using:
>+    $ ./run.sh /path/to/qemu-system-aarch64
>+
>+To generate a uftrace for a system boot from that::
>+
>+    # run true and poweroff the system
>+    $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
>+      -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on
>+
>+    # generate symbols and memory mapping
>+    $ path/to/contrib/plugins/uftrace_symbols.py \
>+      --prefix-symbols \
>+      arm-trusted-firmware/build/qemu/debug/bl1/bl1.elf \
>+      arm-trusted-firmware/build/qemu/debug/bl2/bl2.elf \
>+      arm-trusted-firmware/build/qemu/debug/bl31/bl31.elf \
>+      u-boot/u-boot:0x60000000 \
>+      linux/vmlinux
>+
>+    # inspect trace with
>+    $ uftrace replay
>+
>+Uftrace allows to filter the trace, and dump flamegraphs, or a chrome trace.
>+This last one is very interesting to see visually the boot process::
>+
>+    $ uftrace dump --chrome > boot.json
>+    # Open your browser, and load boot.json on https://ui.perfetto.dev/.
>+
>+Long visual chrome traces can't be easily opened, thus, it might be
>+interesting to generate them around a particular point of execution::
>+
>+    # execute qemu and timestamp output log
>+    $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
>+      -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on |&
>+      ts "%s" | tee exec.log
>+
>+    $ cat exec.log  | grep 'Run /init'
>+      1753122320 [   11.834391] Run /init as init process
>+      # init was launched at 1753122320
>+
>+    # generate trace around init execution (2 seconds):
>+    $ uftrace dump --chrome --time-range=1753122320~1753122322 > init.json
>+
> Other emulation features
> ------------------------
> 
>-- 
>2.47.2
>


Sounds comprehensive all in all. I will try to follow the instructions 
and post a Tested-by

For the doc text:

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file
  2025-08-08  8:14   ` Manos Pitsidianakis
@ 2025-08-08 16:19     ` Pierrick Bouvier
  2025-08-08 16:34       ` Manos Pitsidianakis
  2025-08-08 17:28     ` Pierrick Bouvier
  1 sibling, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:19 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 1:14 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We define a scoreboard that will hold our data per cpu. As well, we
>> define a buffer per cpu that will be used to read registers and memories
>> in a thread-safe way.
>>
>> For now, we just instrument all instructions with an empty callback.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c   | 74 +++++++++++++++++++++++++++++++++++++
>> contrib/plugins/meson.build |  3 +-
>> 2 files changed, 76 insertions(+), 1 deletion(-)
>> create mode 100644 contrib/plugins/uftrace.c
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> new file mode 100644
>> index 00000000000..d60c1077496
>> --- /dev/null
>> +++ b/contrib/plugins/uftrace.c
>> @@ -0,0 +1,74 @@
>> +/*
>> + * Copyright (C) 2025, Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> + *
>> + * Generates a trace compatible with uftrace (similar to uftrace record).
>> + * https://github.com/namhyung/uftrace
>> + *
>> + * See docs/about/emulation.rst|Uftrace for details and examples.
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include <qemu-plugin.h>
>> +#include <glib.h>
>> +
>> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>> +
>> +typedef struct Cpu {
>> +    GByteArray *buf;
>> +} Cpu;
>> +
>> +static struct qemu_plugin_scoreboard *score;
>> +
>> +static void track_callstack(unsigned int cpu_index, void *udata)
>> +{
>> +}
>> +
>> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> +{
>> +    size_t n_insns = qemu_plugin_tb_n_insns(tb);
>> +
>> +    for (int i = 0; i < n_insns; i++) {
> 
> s/int i/size_t i/
>

Yep, that's better.

>> +        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> 
> This can return NULL,
>

It will return NULL only if i is out of the tb range, which will never 
happen here, because i < n_insns.
As you can see in all other plugins we have, there is never a NULL check 
for the return of qemu_plugin_tb_get_insn.
It points a good thing in the API though, that maybe we should simply 
assert i is in the range, because there is no reason for a user to use a 
random index that may fall out of the tb range.

>> +
>> +        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
> 
> And this would lead to a NULL dereference (it performs insn->vaddr
> access)
> 
>> +        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
>> +                QEMU_PLUGIN_CB_R_REGS,
>> +                (void *) pc);
> 
> Hm indentation got broken here, should be
> 

Thanks, probably when I created the intermediate series of patches.

> 
> +        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
> +                                               QEMU_PLUGIN_CB_R_REGS,
> +                                               (void *)pc);
> 
>> +
>> +    }
>> +}
>> +
>> +static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>> +{
>> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>> +    cpu->buf = g_byte_array_new();
>> +}
>> +
>> +static void vcpu_end(unsigned int vcpu_index)
>> +{
>> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>> +    g_byte_array_free(cpu->buf, true);
>> +    memset(cpu, 0, sizeof(Cpu));
> 
> Nitpick, cpu->buf = NULL; is easier to understand (suggestion)
>

Yes, it does not hurt, I'll add it.

>> +}
>> +
>> +static void at_exit(qemu_plugin_id_t id, void *data)
>> +{
>> +    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>> +        vcpu_end(i);
>> +    }
>> +
>> +    qemu_plugin_scoreboard_free(score);
>> +}
>> +
>> +QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>> +                                           const qemu_info_t *info,
>> +                                           int argc, char **argv)
>> +{
>> +    score = qemu_plugin_scoreboard_new(sizeof(Cpu));
>> +    qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>> +    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
>> +    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>> +
>> +    return 0;
>> +}
>> diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build
>> index 1876bc78438..7eb3629c95d 100644
>> --- a/contrib/plugins/meson.build
>> +++ b/contrib/plugins/meson.build
>> @@ -1,5 +1,6 @@
>> contrib_plugins = ['bbv', 'cache', 'cflow', 'drcov', 'execlog', 'hotblocks',
>> -                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger']
>> +                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger',
>> +                   'uftrace']
>> if host_os != 'windows'
>>    # lockstep uses socket.h
>>    contrib_plugins += 'lockstep'
>> -- 
>> 2.47.2
>>
> 
> If no other comments rise for this patch, you can add my r-b after
> fixing the NULL check:
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64
  2025-08-08  8:28   ` Manos Pitsidianakis
@ 2025-08-08 16:27     ` Pierrick Bouvier
  2025-08-08 16:36       ` Manos Pitsidianakis
  0 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:27 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 1:28 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We define a new CpuOps structure that will be used to implement tracking
>> independently of guest architecture.
>>
>> As well, we now instrument only instructions following ones that might
>> have touch the frame pointer.
> 
> s/touch/touched
> 
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c | 112 ++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 108 insertions(+), 4 deletions(-)
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> index d60c1077496..4b1a2f38143 100644
>> --- a/contrib/plugins/uftrace.c
>> +++ b/contrib/plugins/uftrace.c
>> @@ -11,14 +11,94 @@
>>
>> #include <qemu-plugin.h>
>> #include <glib.h>
>> +#include <stdio.h>
>>
>> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>
>> +typedef struct Cpu Cpu;
>> +
>> +typedef struct {
>> +    void (*init)(Cpu *cpu);
>> +    void (*end)(Cpu *cpu);
>> +    uint64_t (*get_frame_pointer)(Cpu *cpu);
>> +    bool (*does_insn_modify_frame_pointer)(const char *disas);
>> +} CpuOps;
>> +
>> typedef struct Cpu {
>>      GByteArray *buf;
>> +    CpuOps ops;
>> +    void *arch;
>> } Cpu;
>>
>> +typedef struct {
>> +    struct qemu_plugin_register *reg_fp;
>> +} Aarch64Cpu;
>> +
>> static struct qemu_plugin_scoreboard *score;
>> +static CpuOps arch_ops;
>> +
>> +static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>> +{
>> +    GByteArray *buf = cpu->buf;
>> +    g_byte_array_set_size(buf, 0);
>> +    size_t sz = qemu_plugin_read_register(reg, buf);
>> +    g_assert(sz == 8);
>> +    g_assert(buf->len == 8);
>> +    return *((uint64_t *) buf->data);
>> +}
>> +
>> +static struct qemu_plugin_register *plugin_find_register(const char *name)
>> +{
>> +    g_autoptr(GArray) regs = qemu_plugin_get_registers();
> 
> Question about the plugin API and not this patch per se, if the cpu is
> in a32/thumb mode does it still return the aarch64 registers?
>

I didn't check this part, and it's a good question though.
It would be a massive headache from plugins point of view if registers 
list could vary along execution.

>> +    for (int i = 0; i < regs->len; ++i) {
>> +        qemu_plugin_reg_descriptor *reg;
>> +        reg = &g_array_index(regs, qemu_plugin_reg_descriptor, i);
>> +        if (!strcmp(reg->name, name)) {
>> +            return reg->handle;
>> +        }
>> +    }
>> +    return NULL;
>> +}
>> +
>> +static uint64_t aarch64_get_frame_pointer(Cpu *cpu_)
>> +{
>> +    Aarch64Cpu *cpu = cpu_->arch;
>> +    return cpu_read_register64(cpu_, cpu->reg_fp);
>> +}
>> +
>> +static void aarch64_init(Cpu *cpu_)
>> +{
>> +    Aarch64Cpu *cpu = g_new0(Aarch64Cpu, 1);
>> +    cpu_->arch = cpu;
>> +    cpu->reg_fp = plugin_find_register("x29");
>> +    if (!cpu->reg_fp) {
>> +        fprintf(stderr, "uftrace plugin: frame pointer register (x29) is not "
>> +                        "available. Please use an AArch64 cpu (or -cpu max).\n");
>> +        g_abort();
>> +    }
>> +}
>> +
>> +static void aarch64_end(Cpu *cpu)
>> +{
>> +    g_free(cpu->arch);
>> +}
>> +
>> +static bool aarch64_does_insn_modify_frame_pointer(const char *disas)
>> +{
>> +    /*
>> +     * Check if current instruction concerns fp register "x29".
>> +     * We add a prefix space to make sure we don't match addresses dump
>> +     * in disassembly.
>> +     */
>> +    return strstr(disas, " x29");
> 
> Hm is the whitespace before x29 guaranteed? Neat trick otherwise.
>

At least for aarch64 disassembler, yes, from what I saw.
Either it's the first operand, and then there is a whitespace between 
instruction name and it. Or it's another operand, and we always have a 
whitespace after ','.
I don't think we'll change disassembler soon, but in case this looks too 
fragile, we can always simply return yes if x29 is fine. In worst case, 
we match some additional instructions, but it should not have a huge 
performance hit.

>> +}
>> +
>> +static CpuOps aarch64_ops = {
>> +    .init = aarch64_init,
>> +    .end = aarch64_end,
>> +    .get_frame_pointer = aarch64_get_frame_pointer,
>> +    .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
>> +};
>>
>> static void track_callstack(unsigned int cpu_index, void *udata)
>> {
>> @@ -28,20 +108,36 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> {
>>      size_t n_insns = qemu_plugin_tb_n_insns(tb);
>>
>> +    /*
>> +     * We instrument all instructions following one that might have updated
>> +     * the frame pointer. We always instrument first instruction in block, as
>> +     * last executed instruction, in previous tb, may have modified it.
> 
> Modified it how?
>

I ran an assert that latest instruction of a block never contained " 
x29", and it was triggered quickly. I don't remember exactly which 
instruction triggered the assert.

>> +     */
>> +    bool instrument_insn = true;
>>      for (int i = 0; i < n_insns; i++) {
>>          struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
>>
>> -        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
>> -        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
>> -                QEMU_PLUGIN_CB_R_REGS,
>> -                (void *) pc);
>> +        if (instrument_insn) {
>> +            uintptr_t pc = qemu_plugin_insn_vaddr(insn);
>> +            qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
>> +                                                   QEMU_PLUGIN_CB_R_REGS,
>> +                                                   (void *) pc);
>> +            instrument_insn = false;
>> +        }
>>
>> +        char *disas = qemu_plugin_insn_disas(insn);
>> +        if (arch_ops.does_insn_modify_frame_pointer(disas)) {
>> +            instrument_insn = true;
>> +        }
> 
> So if I understand correctly you check if an instruction needs to be
> instrumented and then do it in the next forloop. This means if the last
> insn needs to be instrumented too it is not done, is that ok?
>

Yes, that is why we always instrument the first, to catch this.
There is no (current) way to instrument *after* instruction, and 
probably hard to implement, considering an instruction might generate a 
fault. So the only safe choice left is to instrument the first one of 
any tb.

>>      }
>> }
>>
>> static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>> {
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>> +    cpu->ops = arch_ops;
>> +
>> +    cpu->ops.init(cpu);
>>      cpu->buf = g_byte_array_new();
>> }
>>
>> @@ -65,6 +161,14 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>>                                             const qemu_info_t *info,
>>                                             int argc, char **argv)
>> {
>> +    if (!strcmp(info->target_name, "aarch64")) {
>> +        arch_ops = aarch64_ops;
>> +    } else {
>> +        fprintf(stderr, "plugin uftrace: %s target is not supported\n",
>> +                info->target_name);
>> +        return 1;
>> +    }
>> +
>>      score = qemu_plugin_scoreboard_new(sizeof(Cpu));
>>      qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>>      qemu_plugin_register_atexit_cb(id, at_exit, NULL);
>> -- 
>> 2.47.2
>>
> 
> LGTM overall, it makes sense.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file
  2025-08-08 16:19     ` Pierrick Bouvier
@ 2025-08-08 16:34       ` Manos Pitsidianakis
  0 siblings, 0 replies; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08 16:34 UTC (permalink / raw)
  To: Pierrick Bouvier
  Cc: qemu-devel, Mahmoud Mandour, Philippe Mathieu-Daud é,
	rowan Hart, Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On Fri, Aug 8, 2025 at 7:19 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 8/8/25 1:14 AM, Manos Pitsidianakis wrote:
> > On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
> >> We define a scoreboard that will hold our data per cpu. As well, we
> >> define a buffer per cpu that will be used to read registers and memories
> >> in a thread-safe way.
> >>
> >> For now, we just instrument all instructions with an empty callback.
> >>
> >> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> >> ---
> >> contrib/plugins/uftrace.c   | 74 +++++++++++++++++++++++++++++++++++++
> >> contrib/plugins/meson.build |  3 +-
> >> 2 files changed, 76 insertions(+), 1 deletion(-)
> >> create mode 100644 contrib/plugins/uftrace.c
> >>
> >> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
> >> new file mode 100644
> >> index 00000000000..d60c1077496
> >> --- /dev/null
> >> +++ b/contrib/plugins/uftrace.c
> >> @@ -0,0 +1,74 @@
> >> +/*
> >> + * Copyright (C) 2025, Pierrick Bouvier <pierrick.bouvier@linaro.org>
> >> + *
> >> + * Generates a trace compatible with uftrace (similar to uftrace record).
> >> + * https://github.com/namhyung/uftrace
> >> + *
> >> + * See docs/about/emulation.rst|Uftrace for details and examples.
> >> + *
> >> + * SPDX-License-Identifier: GPL-2.0-or-later
> >> + */
> >> +
> >> +#include <qemu-plugin.h>
> >> +#include <glib.h>
> >> +
> >> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> >> +
> >> +typedef struct Cpu {
> >> +    GByteArray *buf;
> >> +} Cpu;
> >> +
> >> +static struct qemu_plugin_scoreboard *score;
> >> +
> >> +static void track_callstack(unsigned int cpu_index, void *udata)
> >> +{
> >> +}
> >> +
> >> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> >> +{
> >> +    size_t n_insns = qemu_plugin_tb_n_insns(tb);
> >> +
> >> +    for (int i = 0; i < n_insns; i++) {
> >
> > s/int i/size_t i/
> >
>
> Yep, that's better.
>
> >> +        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> >
> > This can return NULL,
> >
>
> It will return NULL only if i is out of the tb range, which will never
> happen here, because i < n_insns.
> As you can see in all other plugins we have, there is never a NULL check
> for the return of qemu_plugin_tb_get_insn.
> It points a good thing in the API though, that maybe we should simply
> assert i is in the range, because there is no reason for a user to use a
> random index that may fall out of the tb range.

Ah thanks for pointing that out. Keep my r-b

>
> >> +
> >> +        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
> >
> > And this would lead to a NULL dereference (it performs insn->vaddr
> > access)
> >
> >> +        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
> >> +                QEMU_PLUGIN_CB_R_REGS,
> >> +                (void *) pc);
> >
> > Hm indentation got broken here, should be
> >
>
> Thanks, probably when I created the intermediate series of patches.
>
> >
> > +        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
> > +                                               QEMU_PLUGIN_CB_R_REGS,
> > +                                               (void *)pc);
> >
> >> +
> >> +    }
> >> +}
> >> +
> >> +static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
> >> +{
> >> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
> >> +    cpu->buf = g_byte_array_new();
> >> +}
> >> +
> >> +static void vcpu_end(unsigned int vcpu_index)
> >> +{
> >> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
> >> +    g_byte_array_free(cpu->buf, true);
> >> +    memset(cpu, 0, sizeof(Cpu));
> >
> > Nitpick, cpu->buf = NULL; is easier to understand (suggestion)
> >
>
> Yes, it does not hurt, I'll add it.
>
> >> +}
> >> +
> >> +static void at_exit(qemu_plugin_id_t id, void *data)
> >> +{
> >> +    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
> >> +        vcpu_end(i);
> >> +    }
> >> +
> >> +    qemu_plugin_scoreboard_free(score);
> >> +}
> >> +
> >> +QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
> >> +                                           const qemu_info_t *info,
> >> +                                           int argc, char **argv)
> >> +{
> >> +    score = qemu_plugin_scoreboard_new(sizeof(Cpu));
> >> +    qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
> >> +    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
> >> +    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
> >> +
> >> +    return 0;
> >> +}
> >> diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build
> >> index 1876bc78438..7eb3629c95d 100644
> >> --- a/contrib/plugins/meson.build
> >> +++ b/contrib/plugins/meson.build
> >> @@ -1,5 +1,6 @@
> >> contrib_plugins = ['bbv', 'cache', 'cflow', 'drcov', 'execlog', 'hotblocks',
> >> -                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger']
> >> +                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger',
> >> +                   'uftrace']
> >> if host_os != 'windows'
> >>    # lockstep uses socket.h
> >>    contrib_plugins += 'lockstep'
> >> --
> >> 2.47.2
> >>
> >
> > If no other comments rise for this patch, you can add my r-b after
> > fixing the NULL check:
> >
> > Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64
  2025-08-08 16:27     ` Pierrick Bouvier
@ 2025-08-08 16:36       ` Manos Pitsidianakis
  0 siblings, 0 replies; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08 16:36 UTC (permalink / raw)
  To: Pierrick Bouvier
  Cc: qemu-devel, Mahmoud Mandour, Philippe Mathieu-Daud é,
	rowan Hart, Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On Fri, Aug 8, 2025 at 7:27 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 8/8/25 1:28 AM, Manos Pitsidianakis wrote:
> > On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
> >> We define a new CpuOps structure that will be used to implement tracking
> >> independently of guest architecture.
> >>
> >> As well, we now instrument only instructions following ones that might
> >> have touch the frame pointer.
> >
> > s/touch/touched
> >
> >>
> >> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> >> ---
> >> contrib/plugins/uftrace.c | 112 ++++++++++++++++++++++++++++++++++++--
> >> 1 file changed, 108 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
> >> index d60c1077496..4b1a2f38143 100644
> >> --- a/contrib/plugins/uftrace.c
> >> +++ b/contrib/plugins/uftrace.c
> >> @@ -11,14 +11,94 @@
> >>
> >> #include <qemu-plugin.h>
> >> #include <glib.h>
> >> +#include <stdio.h>
> >>
> >> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> >>
> >> +typedef struct Cpu Cpu;
> >> +
> >> +typedef struct {
> >> +    void (*init)(Cpu *cpu);
> >> +    void (*end)(Cpu *cpu);
> >> +    uint64_t (*get_frame_pointer)(Cpu *cpu);
> >> +    bool (*does_insn_modify_frame_pointer)(const char *disas);
> >> +} CpuOps;
> >> +
> >> typedef struct Cpu {
> >>      GByteArray *buf;
> >> +    CpuOps ops;
> >> +    void *arch;
> >> } Cpu;
> >>
> >> +typedef struct {
> >> +    struct qemu_plugin_register *reg_fp;
> >> +} Aarch64Cpu;
> >> +
> >> static struct qemu_plugin_scoreboard *score;
> >> +static CpuOps arch_ops;
> >> +
> >> +static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
> >> +{
> >> +    GByteArray *buf = cpu->buf;
> >> +    g_byte_array_set_size(buf, 0);
> >> +    size_t sz = qemu_plugin_read_register(reg, buf);
> >> +    g_assert(sz == 8);
> >> +    g_assert(buf->len == 8);
> >> +    return *((uint64_t *) buf->data);
> >> +}
> >> +
> >> +static struct qemu_plugin_register *plugin_find_register(const char *name)
> >> +{
> >> +    g_autoptr(GArray) regs = qemu_plugin_get_registers();
> >
> > Question about the plugin API and not this patch per se, if the cpu is
> > in a32/thumb mode does it still return the aarch64 registers?
> >
>
> I didn't check this part, and it's a good question though.
> It would be a massive headache from plugins point of view if registers
> list could vary along execution.
>
> >> +    for (int i = 0; i < regs->len; ++i) {
> >> +        qemu_plugin_reg_descriptor *reg;
> >> +        reg = &g_array_index(regs, qemu_plugin_reg_descriptor, i);
> >> +        if (!strcmp(reg->name, name)) {
> >> +            return reg->handle;
> >> +        }
> >> +    }
> >> +    return NULL;
> >> +}
> >> +
> >> +static uint64_t aarch64_get_frame_pointer(Cpu *cpu_)
> >> +{
> >> +    Aarch64Cpu *cpu = cpu_->arch;
> >> +    return cpu_read_register64(cpu_, cpu->reg_fp);
> >> +}
> >> +
> >> +static void aarch64_init(Cpu *cpu_)
> >> +{
> >> +    Aarch64Cpu *cpu = g_new0(Aarch64Cpu, 1);
> >> +    cpu_->arch = cpu;
> >> +    cpu->reg_fp = plugin_find_register("x29");
> >> +    if (!cpu->reg_fp) {
> >> +        fprintf(stderr, "uftrace plugin: frame pointer register (x29) is not "
> >> +                        "available. Please use an AArch64 cpu (or -cpu max).\n");
> >> +        g_abort();
> >> +    }
> >> +}
> >> +
> >> +static void aarch64_end(Cpu *cpu)
> >> +{
> >> +    g_free(cpu->arch);
> >> +}
> >> +
> >> +static bool aarch64_does_insn_modify_frame_pointer(const char *disas)
> >> +{
> >> +    /*
> >> +     * Check if current instruction concerns fp register "x29".
> >> +     * We add a prefix space to make sure we don't match addresses dump
> >> +     * in disassembly.
> >> +     */
> >> +    return strstr(disas, " x29");
> >
> > Hm is the whitespace before x29 guaranteed? Neat trick otherwise.
> >
>
> At least for aarch64 disassembler, yes, from what I saw.
> Either it's the first operand, and then there is a whitespace between
> instruction name and it. Or it's another operand, and we always have a
> whitespace after ','.
> I don't think we'll change disassembler soon, but in case this looks too
> fragile, we can always simply return yes if x29 is fine. In worst case,
> we match some additional instructions, but it should not have a huge
> performance hit.
>
> >> +}
> >> +
> >> +static CpuOps aarch64_ops = {
> >> +    .init = aarch64_init,
> >> +    .end = aarch64_end,
> >> +    .get_frame_pointer = aarch64_get_frame_pointer,
> >> +    .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
> >> +};
> >>
> >> static void track_callstack(unsigned int cpu_index, void *udata)
> >> {
> >> @@ -28,20 +108,36 @@ static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
> >> {
> >>      size_t n_insns = qemu_plugin_tb_n_insns(tb);
> >>
> >> +    /*
> >> +     * We instrument all instructions following one that might have updated
> >> +     * the frame pointer. We always instrument first instruction in block, as
> >> +     * last executed instruction, in previous tb, may have modified it.
> >
> > Modified it how?
> >
>
> I ran an assert that latest instruction of a block never contained "
> x29", and it was triggered quickly. I don't remember exactly which
> instruction triggered the assert.
>
> >> +     */
> >> +    bool instrument_insn = true;
> >>      for (int i = 0; i < n_insns; i++) {
> >>          struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> >>
> >> -        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
> >> -        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
> >> -                QEMU_PLUGIN_CB_R_REGS,
> >> -                (void *) pc);
> >> +        if (instrument_insn) {
> >> +            uintptr_t pc = qemu_plugin_insn_vaddr(insn);
> >> +            qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
> >> +                                                   QEMU_PLUGIN_CB_R_REGS,
> >> +                                                   (void *) pc);
> >> +            instrument_insn = false;
> >> +        }
> >>
> >> +        char *disas = qemu_plugin_insn_disas(insn);
> >> +        if (arch_ops.does_insn_modify_frame_pointer(disas)) {
> >> +            instrument_insn = true;
> >> +        }
> >
> > So if I understand correctly you check if an instruction needs to be
> > instrumented and then do it in the next forloop. This means if the last
> > insn needs to be instrumented too it is not done, is that ok?
> >
>
> Yes, that is why we always instrument the first, to catch this.
> There is no (current) way to instrument *after* instruction, and
> probably hard to implement, considering an instruction might generate a
> fault. So the only safe choice left is to instrument the first one of
> any tb.
>

Fair enough, thanks for the explanation :)

Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>

> >>      }
> >> }
> >>
> >> static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
> >> {
> >>      Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
> >> +    cpu->ops = arch_ops;
> >> +
> >> +    cpu->ops.init(cpu);
> >>      cpu->buf = g_byte_array_new();
> >> }
> >>
> >> @@ -65,6 +161,14 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
> >>                                             const qemu_info_t *info,
> >>                                             int argc, char **argv)
> >> {
> >> +    if (!strcmp(info->target_name, "aarch64")) {
> >> +        arch_ops = aarch64_ops;
> >> +    } else {
> >> +        fprintf(stderr, "plugin uftrace: %s target is not supported\n",
> >> +                info->target_name);
> >> +        return 1;
> >> +    }
> >> +
> >>      score = qemu_plugin_scoreboard_new(sizeof(Cpu));
> >>      qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
> >>      qemu_plugin_register_atexit_cb(id, at_exit, NULL);
> >> --
> >> 2.47.2
> >>
> >
> > LGTM overall, it makes sense.
>


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 3/9] contrib/plugins/uftrace: track callstack
  2025-08-08  8:49   ` Manos Pitsidianakis
@ 2025-08-08 16:36     ` Pierrick Bouvier
  0 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:36 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 1:49 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We now track callstack, based on frame pointer analysis. We can detect
>> function calls, returns, and discontinuities.
>> We implement a frame pointer based unwinding that is used for
>> discontinuities.
> 
> 
> Nit: Never heard of the "discontinuity" term for program execution
> before :D Maybe "async control flow (signals, interrupts)"?
>

Someone posted a series to detect discontinuities in program execution.
I asked the same question as you:
[1] 
https://lore.kernel.org/qemu-devel/51ac04eea17a6c5b59a240d3c57ce54a851e4989@nut.email/

Since we'll probably have this series merged in the future, I though it 
was nice to reuse it.
As well, it's not only async control flow, a setjmp/longjmp will 
generate a discontinuity too. That's why I picked this name compared to 
"exceptional/async control flow".
That said, I don't have strong opinion, and as the name appears only in 
comments and commit message, it can be changed to anything.

>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c | 160 ++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 160 insertions(+)
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> index 4b1a2f38143..d51faceb344 100644
>> --- a/contrib/plugins/uftrace.c
>> +++ b/contrib/plugins/uftrace.c
>> @@ -15,6 +15,15 @@
>>
>> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>
>> +typedef struct {
>> +    GArray *s;
>> +} Callstack;
>> +
>> +typedef struct {
>> +    uint64_t pc;
>> +    uint64_t frame_pointer;
>> +} CallstackEntry;
>> +
>> typedef struct Cpu Cpu;
>>
>> typedef struct {
>> @@ -25,6 +34,7 @@ typedef struct {
>> } CpuOps;
>>
>> typedef struct Cpu {
>> +    Callstack *cs;
>>      GByteArray *buf;
>>      CpuOps ops;
>>      void *arch;
>> @@ -37,6 +47,71 @@ typedef struct {
>> static struct qemu_plugin_scoreboard *score;
>> static CpuOps arch_ops;
>>
>> +static Callstack *callstack_new(void)
>> +{
>> +    Callstack *cs = g_new0(Callstack, 1);
>> +    cs->s = g_array_new(false, false, sizeof(CallstackEntry));
>> +    return cs;
>> +}
>> +
>> +static void callstack_free(Callstack *cs)
>> +{
>> +    g_array_free(cs->s, true);
>> +    cs->s = NULL;
>> +    g_free(cs);
>> +}
>> +
>> +static size_t callstack_depth(const Callstack *cs)
>> +{
>> +    return cs->s->len;
>> +}
>> +
>> +static size_t callstack_empty(const Callstack *cs)
>> +{
>> +    return callstack_depth(cs) == 0;
>> +}
>> +
>> +static void callstack_clear(Callstack *cs)
>> +{
>> +    g_array_set_size(cs->s, 0);
>> +}
>> +
>> +static const CallstackEntry *callstack_at(const Callstack *cs, size_t depth)
>> +{
>> +    g_assert(depth > 0);
>> +    g_assert(depth <= callstack_depth(cs));
>> +    return &g_array_index(cs->s, CallstackEntry, depth - 1);
>> +}
>> +
>> +static CallstackEntry callstack_top(const Callstack *cs)
>> +{
>> +    if (callstack_depth(cs) >= 1) {
>> +        return *callstack_at(cs, callstack_depth(cs));
>> +    }
>> +    return (CallstackEntry){};
>> +}
>> +
>> +static CallstackEntry callstack_caller(const Callstack *cs)
>> +{
>> +    if (callstack_depth(cs) >= 2) {
>> +        return *callstack_at(cs, callstack_depth(cs) - 1);
>> +    }
>> +    return (CallstackEntry){};
>> +}
>> +
>> +static void callstack_push(Callstack *cs, CallstackEntry e)
>> +{
>> +    g_array_append_val(cs->s, e);
>> +}
>> +
>> +static CallstackEntry callstack_pop(Callstack *cs)
>> +{
>> +    g_assert(!callstack_empty(cs));
>> +    CallstackEntry e = callstack_top(cs);
>> +    g_array_set_size(cs->s, callstack_depth(cs) - 1);
>> +    return e;
>> +}
>> +
>> static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>> {
>>      GByteArray *buf = cpu->buf;
>> @@ -47,6 +122,50 @@ static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>>      return *((uint64_t *) buf->data);
>> }
>>
>> +static uint64_t cpu_read_memory64(Cpu *cpu, uint64_t addr)
>> +{
>> +    g_assert(addr);
>> +    GByteArray *buf = cpu->buf;
>> +    g_byte_array_set_size(buf, 0);
>> +    bool read = qemu_plugin_read_memory_vaddr(addr, buf, 8);
>> +    if (!read) {
>> +        return 0;
>> +    }
>> +    g_assert(buf->len == 8);
>> +    return *((uint64_t *) buf->data);
>> +}
>> +
>> +static void cpu_unwind_stack(Cpu *cpu, uint64_t frame_pointer, uint64_t pc)
>> +{
>> +    g_assert(callstack_empty(cpu->cs));
>> +
>> +    #define UNWIND_STACK_MAX_DEPTH 1024
>> +    CallstackEntry unwind[UNWIND_STACK_MAX_DEPTH];
>> +    size_t depth = 0;
>> +    do {
>> +        /* check we don't have an infinite stack */
>> +        for (size_t i = 0; i < depth; ++i) {
>> +            if (frame_pointer == unwind[i].frame_pointer) {
>> +                break;
>> +            }
>> +        }
>> +        CallstackEntry e = {.frame_pointer = frame_pointer, .pc = pc};
>> +        unwind[depth] = e;
>> +        depth++;
>> +        if (frame_pointer) {
>> +            frame_pointer = cpu_read_memory64(cpu, frame_pointer);
>> +        }
>> +        pc = cpu_read_memory64(cpu, frame_pointer + 8); /* read previous lr */
>> +    } while (frame_pointer && pc && depth < UNWIND_STACK_MAX_DEPTH);
>> +    #undef UNWIND_STACK_MAX_DEPTH
>> +
>> +    /* push it from bottom to top */
>> +    while (depth) {
>> +        callstack_push(cpu->cs, unwind[depth - 1]);
>> +        --depth;
>> +    }
>> +}
> 
> Nice.
>

I noticed at some point, during the boot sequence, we had a chain of 
frame pointers that looked like this:
A, B, A, B, A, B...
Thus the need to check we don't have an infinite stack.
It was in code without symbols, but I wonder why we had such a situation 
happening.

>> +
>> static struct qemu_plugin_register *plugin_find_register(const char *name)
>> {
>>      g_autoptr(GArray) regs = qemu_plugin_get_registers();
>> @@ -102,6 +221,43 @@ static CpuOps aarch64_ops = {
>>
>> static void track_callstack(unsigned int cpu_index, void *udata)
>> {
>> +    uint64_t pc = (uintptr_t) udata;
>> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>> +    Callstack *cs = cpu->cs;
>> +
>> +    uint64_t fp = cpu->ops.get_frame_pointer(cpu);
>> +    if (!fp && callstack_empty(cs)) {
>> +        /*
>> +         * We simply push current pc. Note that we won't detect symbol change as
>> +         * long as a proper call does not happen.
>> +         */
>> +        callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>> +        return;
>> +    }
>> +
>> +    CallstackEntry top = callstack_top(cs);
>> +    if (fp == top.frame_pointer) {
>> +        /* same function */
>> +        return;
>> +    }
>> +
>> +    CallstackEntry caller = callstack_caller(cs);
>> +    if (fp == caller.frame_pointer) {
>> +        /* return */
>> +        callstack_pop(cs);
>> +        return;
>> +    }
>> +
>> +    uint64_t caller_fp = fp ? cpu_read_memory64(cpu, fp) : 0;
>> +    if (caller_fp == top.frame_pointer) {
>> +        /* call */
>> +        callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>> +        return;
>> +    }
>> +
>> +    /* discontinuity, exit current stack and unwind new one */
>> +    callstack_clear(cs);
>> +    cpu_unwind_stack(cpu, fp, pc);
>> }
>>
>> static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> @@ -139,12 +295,16 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>>
>>      cpu->ops.init(cpu);
>>      cpu->buf = g_byte_array_new();
>> +
>> +    cpu->cs = callstack_new();
>> }
>>
>> static void vcpu_end(unsigned int vcpu_index)
>> {
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>>      g_byte_array_free(cpu->buf, true);
>> +
>> +    callstack_free(cpu->cs);
>>      memset(cpu, 0, sizeof(Cpu));
>> }
>>
>> -- 
>> 2.47.2
>>
> Looks good I think,
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08  9:11   ` Manos Pitsidianakis
@ 2025-08-08 16:38     ` Pierrick Bouvier
  2025-08-08 18:03     ` Pierrick Bouvier
  2025-08-08 18:24     ` Pierrick Bouvier
  2 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:38 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 2:11 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We implement tracing, following uftrace format.
>> Trace is flushed every 32 MB, so file operations don't impact
>> performance at runtime.
>>
>> A different trace is generated per cpu, and we ensure they have a unique
>> name, based on vcpu_index, while keeping room for privilege level coming
>> in next commit.
> 
> Suggestion (not a request): put some kind of documentation about the
> format this patch implements, maybe a commit sha & URL to a header file
> from upstream uftrace.
> 

This is the comment /* libmcount/record.c:record_event */ in 
trace_add_entry() function.
I can add this to commit message also, so maybe it's easier to find.

There is no official documentation for that on uftrace side, but source 
code is quite readable, and uftrace provides a convenient "dump --debug" 
command.

>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 148 insertions(+), 1 deletion(-)
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> index d51faceb344..402a242433e 100644
>> --- a/contrib/plugins/uftrace.c
>> +++ b/contrib/plugins/uftrace.c
>> @@ -12,6 +12,13 @@
>> #include <qemu-plugin.h>
>> #include <glib.h>
>> #include <stdio.h>
>> +#include <sys/stat.h>
>> +#include <sys/time.h>
>> +#include <time.h>
>> +#include <unistd.h>
>> +
>> +#define MiB     (INT64_C(1) << 20)
>> +#define NANOSECONDS_PER_SECOND 1000000000LL
>>
>> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>
>> @@ -24,6 +31,13 @@ typedef struct {
>>      uint64_t frame_pointer;
>> } CallstackEntry;
>>
>> +typedef struct {
>> +    GArray *t;
>> +    GString *path;
>> +    GString *name;
>> +    uint32_t id;
>> +} Trace;
>> +
>> typedef struct Cpu Cpu;
>>
>> typedef struct {
>> @@ -34,6 +48,7 @@ typedef struct {
>> } CpuOps;
>>
>> typedef struct Cpu {
>> +    Trace *trace;
>>      Callstack *cs;
>>      GByteArray *buf;
>>      CpuOps ops;
>> @@ -44,9 +59,41 @@ typedef struct {
>>      struct qemu_plugin_register *reg_fp;
>> } Aarch64Cpu;
>>
>> +typedef struct {
>> +    uint64_t timestamp;
>> +    uint64_t data;
>> +} UftraceEntry;
>> +
>> +typedef enum {
>> +    UFTRACE_ENTRY,
>> +    UFTRACE_EXIT,
>> +    UFTRACE_LOST,
>> +    UFTRACE_EVENT
>> +} UftraceRecordType;
>> +
>> static struct qemu_plugin_scoreboard *score;
>> static CpuOps arch_ops;
>>
>> +static uint64_t gettime_ns(void)
>> +{
>> +#ifdef _WIN32
>> +    /*
>> +     * On Windows, timespec_get is available only with UCRT, but not with
>> +     * MinGW64 environment. Simplify by using only gettimeofday on this
>> +     * platform. This may result in a precision loss.
>> +     */
>> +    struct timeval tv;
>> +    gettimeofday(&tv, NULL);
>> +    uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * 1000;
>> +#else
>> +    /* We need nanosecond precision for short lived functions. */
>> +    struct timespec ts;
>> +    timespec_get(&ts, TIME_UTC);
>> +    uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
>> +#endif
>> +    return now_ns;
>> +}
>> +
>> static Callstack *callstack_new(void)
>> {
>>      Callstack *cs = g_new0(Callstack, 1);
>> @@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs)
>>      return e;
>> }
>>
>> +static Trace *trace_new(uint32_t id, GString *name)
>> +{
>> +    Trace *t = g_new0(Trace, 1);
>> +    t->t = g_array_new(false, false, sizeof(UftraceEntry));
>> +    t->path = g_string_new(NULL);
>> +    g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id);
>> +    t->name = g_string_new(name->str);
>> +    t->id = id;
>> +    return t;
>> +}
>> +
>> +static void trace_free(Trace *t)
>> +{
>> +    g_assert(t->t->len == 0);
>> +    g_array_free(t->t, true);
>> +    t->t = NULL;
>> +    g_string_free(t->path, true);
>> +    t->path = NULL;
>> +    g_string_free(t->name, true);
>> +    t->name = NULL;
>> +    g_free(t);
>> +}
>> +
>> +static void trace_flush(Trace *t, bool append)
>> +{
>> +    int create_dir = g_mkdir_with_parents("./uftrace.data",
>> +                                          S_IRWXU | S_IRWXG | S_IRWXO);
>> +    g_assert(create_dir == 0);
>> +    FILE *dat = fopen(t->path->str, append ? "a" : "w");
>> +    g_assert(dat);
>> +    GArray *data = t->t;
>> +    if (data->len) {
>> +        fwrite(data->data, data->len, sizeof(UftraceEntry), dat);
> 
> fwrite might not write all bytes, how about using the
> g_file_set_contents() wrapper?
> 
>> +    }
>> +    fclose(dat);
>> +    g_array_set_size(data, 0);
>> +}
>> +
>> +static void trace_add_entry(Trace *t, uint64_t timestamp, uint64_t pc,
>> +                            size_t depth, UftraceRecordType type)
>> +{
>> +    /* libmcount/record.c:record_event */
>> +    const uint64_t record_magic = 0x5;
>> +    uint64_t data = type | record_magic << 3;
>> +    data += depth << 6;
>> +    data += pc << 16;
>> +    UftraceEntry e = {.timestamp = timestamp, .data = data};
>> +    g_array_append_val(t->t, e);
>> +    if (t->t->len * sizeof(UftraceEntry) > 32 * MiB) {
>> +        trace_flush(t, true);
>> +    }
>> +}
>> +
>> +static void trace_enter_function(Trace *t, uint64_t timestamp,
>> +                                 uint64_t pc, size_t depth)
>> +{
>> +    trace_add_entry(t, timestamp, pc, depth, UFTRACE_ENTRY);
>> +}
>> +
>> +static void trace_exit_function(Trace *t, uint64_t timestamp,
>> +                                uint64_t pc, size_t depth)
>> +{
>> +    trace_add_entry(t, timestamp, pc, depth, UFTRACE_EXIT);
>> +}
>> +
>> +static void trace_enter_stack(Trace *t, Callstack *cs, uint64_t timestamp)
>> +{
>> +    for (size_t depth = 1; depth <= callstack_depth(cs); ++depth) {
>> +        trace_enter_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
>> +    }
>> +}
>> +
>> +static void trace_exit_stack(Trace *t, Callstack *cs, uint64_t timestamp)
>> +{
>> +    for (size_t depth = callstack_depth(cs); depth > 0; --depth) {
>> +        trace_exit_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
>> +    }
>> +}
>> +
>> static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>> {
>>      GByteArray *buf = cpu->buf;
>> @@ -223,7 +349,9 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>> {
>>      uint64_t pc = (uintptr_t) udata;
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>> +    uint64_t timestamp = gettime_ns();
>>      Callstack *cs = cpu->cs;
>> +    Trace *t = cpu->trace;
>>
>>      uint64_t fp = cpu->ops.get_frame_pointer(cpu);
>>      if (!fp && callstack_empty(cs)) {
>> @@ -232,6 +360,7 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>>           * long as a proper call does not happen.
>>           */
>>          callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>> +        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
>>          return;
>>      }
>>
>> @@ -244,7 +373,8 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>>      CallstackEntry caller = callstack_caller(cs);
>>      if (fp == caller.frame_pointer) {
>>          /* return */
>> -        callstack_pop(cs);
>> +        CallstackEntry e = callstack_pop(cs);
>> +        trace_exit_function(t, timestamp, e.pc, callstack_depth(cs));
>>          return;
>>      }
>>
>> @@ -252,12 +382,16 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>>      if (caller_fp == top.frame_pointer) {
>>          /* call */
>>          callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>> +        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
>>          return;
>>      }
>>
>>      /* discontinuity, exit current stack and unwind new one */
>> +    trace_exit_stack(t, cs, timestamp);
>>      callstack_clear(cs);
>> +
>>      cpu_unwind_stack(cpu, fp, pc);
>> +    trace_enter_stack(t, cs, timestamp);
>> }
>>
>> static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> @@ -296,6 +430,16 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>>      cpu->ops.init(cpu);
>>      cpu->buf = g_byte_array_new();
>>
>> +    g_assert(vcpu_index < UINT32_MAX / 100);
>> +    /* trace_id is: cpu_number * 100 */
>> +    uint32_t trace_id = (vcpu_index + 1) * 100;
>> +
>> +    g_autoptr(GString) trace_name = g_string_new(NULL);
>> +    g_string_append_printf(trace_name, "cpu%u", vcpu_index);
>> +    cpu->trace = trace_new(trace_id, trace_name);
>> +    /* create/truncate trace file */
>> +    trace_flush(cpu->trace, false);
>> +
>>      cpu->cs = callstack_new();
>> }
>>
>> @@ -304,6 +448,7 @@ static void vcpu_end(unsigned int vcpu_index)
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>>      g_byte_array_free(cpu->buf, true);
>>
>> +    trace_free(cpu->trace);
>>      callstack_free(cpu->cs);
>>      memset(cpu, 0, sizeof(Cpu));
>> }
>> @@ -311,6 +456,8 @@ static void vcpu_end(unsigned int vcpu_index)
>> static void at_exit(qemu_plugin_id_t id, void *data)
>> {
>>      for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>> +        Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
>> +        trace_flush(cpu->trace, true);
>>          vcpu_end(i);
>>      }
>>
>> -- 
>> 2.47.2
>>
> 
> Looks good overall.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 5/9] contrib/plugins/uftrace: implement privilege level tracing
  2025-08-08  9:26   ` Manos Pitsidianakis
@ 2025-08-08 16:41     ` Pierrick Bouvier
  0 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:41 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 2:26 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We add new option trace-privilege-level=bool, which will create a
>> separate trace for each privilege level.
>> This allows to follow changes of privilege during execution.
>>
>> We implement aarch64 operations to track current privilege level
>> accordingly.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c | 189 ++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 181 insertions(+), 8 deletions(-)
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> index 402a242433e..7737626da2f 100644
>> --- a/contrib/plugins/uftrace.c
>> +++ b/contrib/plugins/uftrace.c
>> @@ -44,19 +44,39 @@ typedef struct {
>>      void (*init)(Cpu *cpu);
>>      void (*end)(Cpu *cpu);
>>      uint64_t (*get_frame_pointer)(Cpu *cpu);
>> +    uint8_t (*get_privilege_level)(Cpu *cpu);
>> +    uint8_t (*num_privilege_levels)(void);
>> +    const char *(*get_privilege_level_name)(uint8_t pl);
>>      bool (*does_insn_modify_frame_pointer)(const char *disas);
>> } CpuOps;
>>
>> typedef struct Cpu {
>>      Trace *trace;
>>      Callstack *cs;
>> +    uint8_t privilege_level;
>> +    GArray *traces; /* Trace *traces [] */
>>      GByteArray *buf;
>>      CpuOps ops;
>>      void *arch;
>> } Cpu;
>>
>> +typedef enum {
>> +    AARCH64_EL0_SECURE,
>> +    AARCH64_EL0_NONSECURE,
>> +    AARCH64_EL0_REALM,
>> +    AARCH64_EL1_SECURE,
>> +    AARCH64_EL1_NONSECURE,
>> +    AARCH64_EL1_REALM,
>> +    AARCH64_EL2_SECURE,
>> +    AARCH64_EL2_NONSECURE,
>> +    AARCH64_EL2_REALM,
>> +    AARCH64_EL3,
>> +} Aarch64PrivilegeLevel;
>> +
>> typedef struct {
>>      struct qemu_plugin_register *reg_fp;
>> +    struct qemu_plugin_register *reg_cpsr;
>> +    struct qemu_plugin_register *reg_scr_el3;
>> } Aarch64Cpu;
>>
>> typedef struct {
>> @@ -72,6 +92,7 @@ typedef enum {
>> } UftraceRecordType;
>>
>> static struct qemu_plugin_scoreboard *score;
>> +static bool trace_privilege_level;
>> static CpuOps arch_ops;
>>
>> static uint64_t gettime_ns(void)
>> @@ -248,6 +269,16 @@ static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>>      return *((uint64_t *) buf->data);
>> }
>>
>> +static uint32_t cpu_read_register32(Cpu *cpu, struct qemu_plugin_register *reg)
>> +{
>> +    GByteArray *buf = cpu->buf;
>> +    g_byte_array_set_size(buf, 0);
>> +    size_t sz = qemu_plugin_read_register(reg, buf);
>> +    g_assert(sz == 4);
>> +    g_assert(buf->len == 4);
>> +    return *((uint32_t *) buf->data);
>> +}
>> +
>> static uint64_t cpu_read_memory64(Cpu *cpu, uint64_t addr)
>> {
>>      g_assert(addr);
>> @@ -305,6 +336,68 @@ static struct qemu_plugin_register *plugin_find_register(const char *name)
>>      return NULL;
>> }
>>
>> +static uint8_t aarch64_num_privilege_levels(void)
>> +{
>> +    return AARCH64_EL3 + 1;
> 
> Nit: A common idiom is defining an _MAX enum variant that always stays
> on the bottom of the enum definition and equals the max value, thus
> ensuring you always refer to the correct maximum variant. Though I'd be
> surprised if they ever add an EL4 :^)
>

I hesitated, so your nit is a good hint I should do it this way to not 
surprise the reader.

>> +}
>> +
>> +static const char *aarch64_get_privilege_level_name(uint8_t pl)
>> +{
>> +    switch (pl) {
>> +    case AARCH64_EL0_SECURE: return "S-EL0";
>> +    case AARCH64_EL0_NONSECURE: return "NS-EL0";
>> +    case AARCH64_EL0_REALM: return "R-EL0";
>> +    case AARCH64_EL1_SECURE: return "S-EL1";
>> +    case AARCH64_EL1_NONSECURE: return "NS-EL1";
>> +    case AARCH64_EL1_REALM: return "R-EL1";
>> +    case AARCH64_EL2_SECURE: return "S-EL2";
>> +    case AARCH64_EL2_NONSECURE: return "NS-EL2";
>> +    case AARCH64_EL2_REALM: return "R-EL2";
>> +    case AARCH64_EL3: return "EL3";
>> +    default:
>> +        g_assert_not_reached();
>> +    }
>> +}
>> +
>> +static uint8_t aarch64_get_privilege_level(Cpu *cpu_)
>> +{
>> +    Aarch64Cpu *cpu = cpu_->arch;
>> +    /*
>> +     * QEMU gdbstub does not provide access to CurrentEL,
> 
> We should fix that... I wanted to do that for a while.
> 
>> +     * so we use CPSR instead.
>> +     */
>> +    uint8_t el = cpu_read_register32(cpu_, cpu->reg_cpsr) >> 2 & 0b11;
>> +
>> +    if (el == 3) {
>> +        return AARCH64_EL3;
>> +    }
>> +
>> +    uint8_t ss = AARCH64_EL0_SECURE;
>> +    if (!cpu->reg_scr_el3) {
>> +        ss = AARCH64_EL0_NONSECURE;
>> +    }
>> +    uint64_t scr_el3 = cpu_read_register64(cpu_, cpu->reg_scr_el3);
>> +    uint64_t ns = (scr_el3 >> 0) & 0b1;
>> +    uint64_t nse = (scr_el3 >> 62) & 0b1;
>> +    switch (nse << 1 | ns) {
>> +    case 0b00:
>> +        ss = AARCH64_EL0_SECURE;
>> +        break;
>> +    case 0b01:
>> +        ss = AARCH64_EL0_NONSECURE;
>> +        break;
>> +    case 0b11:
>> +        ss = AARCH64_EL0_REALM;
>> +        break;
>> +    default:
>> +        g_assert_not_reached();
>> +    }
>> +
>> +    const uint8_t num_ss = 3;
>> +    Aarch64PrivilegeLevel pl = el * num_ss + ss;
> 
> That's a bit blunt but I understand where you're coming from
> 
>> +    return pl;
>> +}
>> +
>> static uint64_t aarch64_get_frame_pointer(Cpu *cpu_)
>> {
>>      Aarch64Cpu *cpu = cpu_->arch;
>> @@ -321,6 +414,10 @@ static void aarch64_init(Cpu *cpu_)
>>                          "available. Please use an AArch64 cpu (or -cpu max).\n");
>>          g_abort();
>>      }
>> +    cpu->reg_cpsr = plugin_find_register("cpsr");
>> +    g_assert(cpu->reg_cpsr);
>> +    cpu->reg_scr_el3 = plugin_find_register("SCR_EL3");
>> +    /* scr_el3 is optional */
>> }
>>
>> static void aarch64_end(Cpu *cpu)
>> @@ -342,9 +439,34 @@ static CpuOps aarch64_ops = {
>>      .init = aarch64_init,
>>      .end = aarch64_end,
>>      .get_frame_pointer = aarch64_get_frame_pointer,
>> +    .get_privilege_level = aarch64_get_privilege_level,
>> +    .num_privilege_levels = aarch64_num_privilege_levels,
>> +    .get_privilege_level_name = aarch64_get_privilege_level_name,
>>      .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
>> };
>>
>> +static void track_privilege_change(unsigned int cpu_index, void *udata)
>> +{
>> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>> +    uint8_t new_pl = cpu->ops.get_privilege_level(cpu);
>> +
>> +    if (new_pl == cpu->privilege_level) {
>> +        return;
>> +    }
>> +
>> +    uint64_t pc = (uintptr_t) udata;
>> +    uint64_t timestamp = gettime_ns();
>> +
>> +    trace_exit_stack(cpu->trace, cpu->cs, timestamp);
>> +    callstack_clear(cpu->cs);
>> +
>> +    cpu->privilege_level = new_pl;
>> +    cpu->trace = g_array_index(cpu->traces, Trace*, new_pl);
>> +
>> +    cpu_unwind_stack(cpu, cpu->ops.get_frame_pointer(cpu), pc);
>> +    trace_enter_stack(cpu->trace, cpu->cs, timestamp);
>> +}
>> +
>> static void track_callstack(unsigned int cpu_index, void *udata)
>> {
>>      uint64_t pc = (uintptr_t) udata;
>> @@ -397,6 +519,13 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>> static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> {
>>      size_t n_insns = qemu_plugin_tb_n_insns(tb);
>> +    uintptr_t tb_pc = qemu_plugin_tb_vaddr(tb);
>> +
>> +    if (trace_privilege_level) {
>> +        qemu_plugin_register_vcpu_tb_exec_cb(tb, track_privilege_change,
>> +                                             QEMU_PLUGIN_CB_R_REGS,
>> +                                             (void *) tb_pc);
>> +    }
>>
>>      /*
>>       * We instrument all instructions following one that might have updated
>> @@ -429,18 +558,36 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>>
>>      cpu->ops.init(cpu);
>>      cpu->buf = g_byte_array_new();
>> +    cpu->traces = g_array_new(0, 0, sizeof(Trace *));
>>
>>      g_assert(vcpu_index < UINT32_MAX / 100);
>> -    /* trace_id is: cpu_number * 100 */
>> +    g_assert(cpu->ops.num_privilege_levels() < 100);
>> +    /* trace_id is: cpu_number * 100 + privilege_level */
>>      uint32_t trace_id = (vcpu_index + 1) * 100;
>>
>> -    g_autoptr(GString) trace_name = g_string_new(NULL);
>> -    g_string_append_printf(trace_name, "cpu%u", vcpu_index);
>> -    cpu->trace = trace_new(trace_id, trace_name);
>> -    /* create/truncate trace file */
>> -    trace_flush(cpu->trace, false);
>> +    if (trace_privilege_level) {
>> +        for (uint8_t pl = 0; pl < cpu->ops.num_privilege_levels(); ++pl) {
>> +            g_autoptr(GString) trace_name = g_string_new(NULL);
>> +            g_string_append_printf(trace_name, "cpu%u %s", vcpu_index,
>> +                                   cpu->ops.get_privilege_level_name(pl));
>> +            Trace *t = trace_new(trace_id + pl, trace_name);
>> +            g_array_append_val(cpu->traces, t);
>> +        }
>> +    } else {
>> +        g_autoptr(GString) trace_name = g_string_new(NULL);
>> +        g_string_append_printf(trace_name, "cpu%u", vcpu_index);
>> +        Trace *t = trace_new(trace_id, trace_name);
>> +        g_array_append_val(cpu->traces, t);
>> +    }
>> +
>> +    for (size_t i = 0; i < cpu->traces->len; ++i) {
>> +        /* create/truncate trace files */
>> +        Trace *t = g_array_index(cpu->traces, Trace*, i);
>> +        trace_flush(t, false);
>> +    }
>>
>>      cpu->cs = callstack_new();
>> +    cpu->trace = g_array_index(cpu->traces, Trace*, cpu->privilege_level);
>> }
>>
>> static void vcpu_end(unsigned int vcpu_index)
>> @@ -448,7 +595,12 @@ static void vcpu_end(unsigned int vcpu_index)
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>>      g_byte_array_free(cpu->buf, true);
>>
>> -    trace_free(cpu->trace);
>> +    for (size_t i = 0; i < cpu->traces->len; ++i) {
>> +        Trace *t = g_array_index(cpu->traces, Trace*, i);
>> +        trace_free(t);
>> +    }
>> +
>> +    g_array_free(cpu->traces, true);
>>      callstack_free(cpu->cs);
>>      memset(cpu, 0, sizeof(Cpu));
>> }
>> @@ -457,7 +609,13 @@ static void at_exit(qemu_plugin_id_t id, void *data)
>> {
>>      for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>>          Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
>> -        trace_flush(cpu->trace, true);
>> +        for (size_t j = 0; j < cpu->traces->len; ++j) {
>> +            Trace *t = g_array_index(cpu->traces, Trace*, j);
>> +            trace_flush(t, true);
>> +        }
>> +    }
>> +
>> +    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>>          vcpu_end(i);
>>      }
>>
>> @@ -468,6 +626,21 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>>                                             const qemu_info_t *info,
>>                                             int argc, char **argv)
>> {
>> +    for (int i = 0; i < argc; i++) {
>> +        char *opt = argv[i];
>> +        g_auto(GStrv) tokens = g_strsplit(opt, "=", 2);
>> +        if (g_strcmp0(tokens[0], "trace-privilege-level") == 0) {
>> +            if (!qemu_plugin_bool_parse(tokens[0], tokens[1],
>> +                                        &trace_privilege_level)) {
>> +                fprintf(stderr, "boolean argument parsing failed: %s\n", opt);
>> +                return -1;
>> +            }
>> +        } else {
>> +            fprintf(stderr, "option parsing failed: %s\n", opt);
>> +            return -1;
>> +        }
>> +    }
>> +
>>      if (!strcmp(info->target_name, "aarch64")) {
>>          arch_ops = aarch64_ops;
>>      } else {
>> -- 
>> 2.47.2
>>
> 
> LGTM,
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 6/9] contrib/plugins/uftrace: generate additional files for uftrace
  2025-08-08  9:37   ` Manos Pitsidianakis
@ 2025-08-08 16:42     ` Pierrick Bouvier
  0 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:42 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 2:37 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> Beyond traces per cpu, uftrace expect to find some specific files.
>> - info: contains information about machine/program run
>>   those values are not impacting uftrace behaviour, and we simply copied
>>   a random example to keep things simple.
>> - memory mapping: how every binary is mapped in memory. For system mode,
>>   we generate an empty mapping (uftrace_symbols.py, coming in future
>>   commit, will take care of that). For user mode, we copy current
>>   /proc/self/maps. We don't need to do any special filtering, as
>>   reported addresses will necessarily concern guest program, and not
>>   QEMU and its libraries.
>> - task: list of tasks. We present every vcpu/privilege level as a
>>   separate process, as it's the best view we can have when generating a
>>   (visual) chrome trace. Using threads is less convenient in terms of
>>   UI.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c | 130 +++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 129 insertions(+), 1 deletion(-)
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> index 7737626da2f..6628b4256fd 100644
>> --- a/contrib/plugins/uftrace.c
>> +++ b/contrib/plugins/uftrace.c
>> @@ -115,6 +115,126 @@ static uint64_t gettime_ns(void)
>>      return now_ns;
>> }
>>
>> +static void uftrace_write_map(bool system_emulation)
>> +{
>> +    const char *path = "./uftrace.data/sid-0.map";
>> +
>> +    if (system_emulation && access(path, F_OK) == 0) {
>> +        /* do not erase existing map in system emulation, as a custom one might
>> +         * already have been generated by uftrace_symbols.py */
>> +        return;
>> +    }
>> +
>> +    FILE *sid_map = fopen(path, "w");
>> +    g_assert(sid_map);
>> +
>> +    if (system_emulation) {
>> +        fprintf(sid_map,
>> +                "# map stack on highest address possible, to prevent uftrace\n"
>> +                "# from considering any kernel address\n");
>> +        fprintf(sid_map,
>> +          "ffffffffffff-ffffffffffff rw-p 00000000 00:00 0 [stack]\n");
>> +    } else {
>> +        /* in user mode, copy /proc/self/maps instead */
>> +        FILE *self_map = fopen("/proc/self/maps", "r");
>> +        g_assert(self_map);
>> +        for (;;) {
>> +            int c = fgetc(self_map);
>> +            if (c == EOF) {
>> +                break;
>> +            }
>> +            fputc(c, sid_map);
>> +        }
>> +        fclose(self_map);
>> +    }
>> +    fclose(sid_map);
>> +}
>> +
>> +static void uftrace_write_task(const GArray *traces)
>> +{
>> +    FILE *task = fopen("./uftrace.data/task.txt", "w");
>> +    g_assert(task);
>> +    for (int i = 0; i < traces->len; ++i) {
>> +        Trace *t = g_array_index(traces, Trace*, i);
>> +        fprintf(task, "SESS timestamp=0.0 pid=%"PRIu32" sid=0 exename=\"%s\"\n",
>> +                t->id, t->name->str);
>> +        fprintf(task, "TASK timestamp=0.0 tid=%"PRIu32" pid=%"PRIu32"\n",
>> +                t->id, t->id);
>> +    }
>> +    fclose(task);
>> +}
>> +
>> +static void uftrace_write_info(const GArray *traces)
>> +{
>> +    g_autoptr(GString) taskinfo_tids = g_string_new("taskinfo:tids=");
>> +    for (int i = 0; i < traces->len; ++i) {
>> +        Trace *t = g_array_index(traces, Trace*, i);
>> +        const char *delim = i > 0 ? "," : "";
>> +        g_string_append_printf(taskinfo_tids, "%s%"PRIu32, delim, t->id);
>> +    }
>> +
>> +    g_autoptr(GString) taskinfo_nr_tid = g_string_new("taskinfo:nr_tid=");
>> +    g_string_append_printf(taskinfo_nr_tid, "%d", traces->len);
>> +
>> +    FILE *info = fopen("./uftrace.data/info", "w");
>> +    g_assert(info);
>> +    /*
>> +     * $ uftrace dump --debug
>> +     * uftrace file header: magic         = 4674726163652100
>> +     * uftrace file header: version       = 4
>> +     * uftrace file header: header size   = 40
>> +     * uftrace file header: endian        = 1 (little)
>> +     * uftrace file header: class          = 2 (64 bit)
>> +     * uftrace file header: features      = 0x1263 (PLTHOOK | ...
>> +     * uftrace file header: info          = 0x7bff (EXE_NAME | ...
>> +     *  <0000000000000000>: 46 74 72 61 63 65 21 00  04 00 00 00 28 00 01 02
>> +     *  <0000000000000010>: 63 12 00 00 00 00 00 00  ff 7b 00 00 00 00 00 00
>> +     *  <0000000000000020>: 00 04 00 00 00 00 00 00
>> +     */
>> +    const uint8_t header[] = {0x46, 0x74, 0x72, 0x61, 0x63, 0x65, 0x21, 0x00,
>> +                              0x04, 0x00, 0x00, 0x00, 0x28, 0x00, 0x01, 0x02,
>> +                              0x63, 0x12, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> +                              0xff, 0x7b, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>> +                              0x00, 0x04, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00};
>> +    fwrite(header, sizeof(header), 1, info);
>> +    const char *info_data[] = {
>> +        "exename:from_qemu",
>> +        "build_id:0123456789abcdef0123456789abcdef01234567",
>> +        "exit_status:0",
>> +        "cmdline:uftrace record qemu",
>> +        "cpuinfo:lines=2",
>> +        "cpuinfo:nr_cpus=1 / 1 (online/possible)",
>> +        "cpuinfo:desc=Intel 8086",
> 
> Haha what :D

I'm happy at least one person got the joke.

> 
>> +        "meminfo:1.0 / 1.0 GB (free / total)",
>> +        "osinfo:lines=3",
>> +        "osinfo:kernel=Linux 6.12.33",
>> +        "osinfo:hostname=pc",
>> +        "osinfo:distro=\"Debian GNU/Linux 13 (trixie)\"",
> 
> So I assume these strings can be anything, why not make them blank?
>

That's a good idea. I will see how uftrace reacts to this.

>> +        "taskinfo:lines=2",
>> +        taskinfo_nr_tid->str,
>> +        taskinfo_tids->str,
>> +        "usageinfo:lines=6",
>> +        "usageinfo:systime=0.000000",
>> +        "usageinfo:usrtime=0.000000",
>> +        "usageinfo:ctxsw=0 / 0 (voluntary / involuntary)",
>> +        "usageinfo:maxrss=8016",
>> +        "usageinfo:pagefault=0 / 0 (major / minor)",
>> +        "usageinfo:iops=0 / 0 (read / write)",
>> +        "loadinfo:0.0 / 0.0 / 0.0",
>> +        "record_date:Mon Jan  1 00:00:00 2025",
>> +        "elapsed_time:1000000000000.0 sec",
>> +        "pattern_type:regex",
>> +        "uftrace_version:v0.17 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )",
>> +        "utc_offset:1751552954",
>> +        0};
>> +    const char **info_data_it = info_data;
>> +    while (*(info_data_it)) {
>> +        fprintf(info, "%s\n", *info_data_it);
>> +        ++info_data_it;
>> +    }
>> +    fclose(info);
>> +}
>> +
>> static Callstack *callstack_new(void)
>> {
>>      Callstack *cs = g_new0(Callstack, 1);
>> @@ -607,14 +727,22 @@ static void vcpu_end(unsigned int vcpu_index)
>>
>> static void at_exit(qemu_plugin_id_t id, void *data)
>> {
>> +    bool system_emulation = (bool) data;
>> +    g_autoptr(GArray) traces = g_array_new(0, 0, sizeof(Trace *));
>> +
>>      for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>>          Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
>>          for (size_t j = 0; j < cpu->traces->len; ++j) {
>>              Trace *t = g_array_index(cpu->traces, Trace*, j);
>>              trace_flush(t, true);
>> +            g_array_append_val(traces, t);
>>          }
>>      }
>>
>> +    uftrace_write_map(system_emulation);
>> +    uftrace_write_info(traces);
>> +    uftrace_write_task(traces);
>> +
>>      for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>>          vcpu_end(i);
>>      }
>> @@ -651,7 +779,7 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>>
>>      score = qemu_plugin_scoreboard_new(sizeof(Cpu));
>>      qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>> -    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
>> +    qemu_plugin_register_atexit_cb(id, at_exit, (void *) info->system_emulation);
>>      qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>>
>>      return 0;
>> -- 
>> 2.47.2
>>
> 
> LGTM,
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 7/9] contrib/plugins/uftrace: implement x64 support
  2025-08-08  9:42   ` Manos Pitsidianakis
@ 2025-08-08 16:52     ` Pierrick Bouvier
  0 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:52 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 2:42 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:07, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> It's trivial to implement x64 support, as it's the same stack layout
>> than aarch64.
> 
> s/than/as
> 
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c | 85 +++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 85 insertions(+)
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> index 6628b4256fd..f10172eed10 100644
>> --- a/contrib/plugins/uftrace.c
>> +++ b/contrib/plugins/uftrace.c
>> @@ -79,6 +79,20 @@ typedef struct {
>>      struct qemu_plugin_register *reg_scr_el3;
>> } Aarch64Cpu;
>>
>> +typedef enum {
>> +    X64_RING0,
>> +    X64_RING1,
>> +    X64_RING2,
>> +    X64_RING3,
>> +    X64_REAL_MODE,
>> +} X64PrivilegeLevel;
>> +
>> +typedef struct {
>> +    struct qemu_plugin_register *reg_rbp;
>> +    struct qemu_plugin_register *reg_cs;
>> +    struct qemu_plugin_register *reg_cr0;
>> +} X64Cpu;
>> +
>> typedef struct {
>>      uint64_t timestamp;
>>      uint64_t data;
>> @@ -565,6 +579,75 @@ static CpuOps aarch64_ops = {
>>      .does_insn_modify_frame_pointer = aarch64_does_insn_modify_frame_pointer,
>> };
>>
>> +static uint8_t x64_num_privilege_levels(void)
>> +{
>> +    return X64_REAL_MODE + 1;
>> +}
>> +
>> +static const char *x64_get_privilege_level_name(uint8_t pl)
>> +{
>> +    switch (pl) {
>> +    case X64_RING0: return "Ring0";
>> +    case X64_RING1: return "Ring1";
>> +    case X64_RING2: return "Ring2";
>> +    case X64_RING3: return "Ring3";
>> +    case X64_REAL_MODE: return "RealMode";
>> +    default:
>> +        g_assert_not_reached();
>> +    }
>> +}
>> +
>> +static uint8_t x64_get_privilege_level(Cpu *cpu_)
>> +{
>> +    X64Cpu *cpu = cpu_->arch;
>> +    uint64_t cr0 = cpu_read_register64(cpu_, cpu->reg_cr0);
>> +    uint64_t protected_mode = (cr0 >> 0) & 0b1;
>> +    if (!protected_mode) {
>> +        return X64_REAL_MODE;
>> +    }
>> +    uint32_t cs = cpu_read_register32(cpu_, cpu->reg_cs);
>> +    uint32_t ring_level = (cs >> 0) & 0b11;
>> +    return ring_level;
>> +}
>> +
>> +static uint64_t x64_get_frame_pointer(Cpu *cpu_)
>> +{
>> +    X64Cpu *cpu = cpu_->arch;
>> +    return cpu_read_register64(cpu_, cpu->reg_rbp);
>> +}
>> +
>> +static void x64_init(Cpu *cpu_)
>> +{
>> +    X64Cpu *cpu = g_new0(X64Cpu, 1);
>> +    cpu_->arch = cpu;
>> +    cpu->reg_rbp = plugin_find_register("rbp");
>> +    g_assert(cpu->reg_rbp);
>> +    cpu->reg_cs = plugin_find_register("cs");
>> +    g_assert(cpu->reg_cs);
>> +    cpu->reg_cr0 = plugin_find_register("cr0");
>> +    g_assert(cpu->reg_cr0);
>> +}
>> +
>> +static void x64_end(Cpu *cpu)
>> +{
>> +    g_free(cpu->arch);
>> +}
>> +
>> +static bool x64_does_insn_modify_frame_pointer(const char *disas)
>> +{
>> +    return strstr(disas, "rbp");
>> +}
>> +
>> +static CpuOps x64_ops = {
>> +    .init = x64_init,
>> +    .end = x64_end,
>> +    .get_frame_pointer = x64_get_frame_pointer,
>> +    .get_privilege_level = x64_get_privilege_level,
>> +    .num_privilege_levels = x64_num_privilege_levels,
>> +    .get_privilege_level_name = x64_get_privilege_level_name,
>> +    .does_insn_modify_frame_pointer = x64_does_insn_modify_frame_pointer,
>> +};
>> +
>> static void track_privilege_change(unsigned int cpu_index, void *udata)
>> {
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>> @@ -771,6 +854,8 @@ QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>>
>>      if (!strcmp(info->target_name, "aarch64")) {
>>          arch_ops = aarch64_ops;
>> +    } else if (!strcmp(info->target_name, "x86_64")) {
>> +        arch_ops = x64_ops;
>>      } else {
>>          fprintf(stderr, "plugin uftrace: %s target is not supported\n",
>>                  info->target_name);
>> -- 
>> 2.47.2
>>
> 
> No idea about x86 assembly tbh but this looks correct to me.
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>

Funny enough, I tried to implement that for risv64 also, just to realize 
the utter mess this architecture did with frame pointer, hiding it on 
the other side of the stack frame.
Thus, it's not possible to easily walk frame pointers on this arch, and 
you need to disassemble prologue to find how big your stack frame is. 
It's crazy architects didn't learn their lesson from the past issues on 
other arch regarding frame pointers.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 9/9] contrib/plugins/uftrace: add documentation
  2025-08-08  9:46   ` Manos Pitsidianakis
@ 2025-08-08 16:59     ` Pierrick Bouvier
  0 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 16:59 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 2:46 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:07, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> This documentation summarizes how to use the plugin, and present two
>> examples of the possibilities offered by it, in system and user mode.
>>
>> As well, it explains how to rebuild and reproduce those examples.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> docs/about/emulation.rst | 197 +++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 197 insertions(+)
>>
>> diff --git a/docs/about/emulation.rst b/docs/about/emulation.rst
>> index 456d01d5b08..9ce47ac2712 100644
>> --- a/docs/about/emulation.rst
>> +++ b/docs/about/emulation.rst
>> @@ -816,6 +816,203 @@ This plugin can limit the number of Instructions Per Second that are executed::
>>        The lower the number the more accurate time will be, but the less efficient the plugin.
>>        Defaults to ips/10
>>
>> +Uftrace
>> +.......
>> +
>> +``contrib/plugins/uftrace.c``
>> +
>> +This plugin generates a binary trace compatible with
>> +`uftrace <https://github.com/namhyung/uftrace>`_.
>> +
>> +Plugin supports aarch64 and x64, and works in user and system mode, allowing to
>> +trace a system boot, which is not something possible usually.
> 
> Now it is!
> 
>> +
>> +In user mode, the memory mapping is directly copied from ``/proc/self/maps`` at
>> +the end of execution. Uftrace should be able to retrieve symbols by itself,
>> +without any additional step.
>> +In system mode, the default memory mapping is empty, and you can generate
>> +one (and associated symbols) using ``contrib/plugins/uftrace_symbols.py``.
>> +Symbols must be present in ELF binaries.
>> +
>> +It tracks the call stack (based on frame pointer analysis). Thus, your program
>> +and its dependencies must be compiled using ``-fno-omit-frame-pointer
>> +-mno-omit-leaf-frame-pointer``. In 2024, `Ubuntu and Fedora enabled it by
>> +default again on x64
>> +<https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html>`_.
>> +On aarch64, this is less of a problem, as they are usually part of the ABI,
>> +except for leaf functions. That's true for user space applications, but not
>> +necessarily for bare metal code. You can read this `section
>> +<uftrace_build_system_example>` to easily build a system with frame pointers.
>> +
>> +When tracing long scenarios (> 1 min), the generated trace can become very long,
>> +making it hard to extract data from it. In this case, a simple solution is to
>> +trace execution while generating a timestamped output log using
>> +``qemu-system-aarch64 ... | ts "%s"``. Then, ``uftrace --time-range=start~end``
>> +can be used to reduce trace for only this part of execution.
>> +
>> +Performance wise, overhead compared to normal tcg execution is around x5-x15.
>> +
>> +.. list-table:: Uftrace plugin arguments
>> +  :widths: 20 80
>> +  :header-rows: 1
>> +
>> +  * - Option
>> +    - Description
>> +  * - trace-privilege-level=[on|off]
>> +    - Generate separate traces for each privilege level (Exception Level +
>> +      Security State on aarch64, Rings on x64).
>> +
>> +.. list-table:: uftrace_symbols.py arguments
>> +  :widths: 20 80
>> +  :header-rows: 1
>> +
>> +  * - Option
>> +    - Description
>> +  * - elf_file [elf_file ...]
>> +    - path to an ELF file. Use /path/to/file:0xdeadbeef to add a mapping offset.
>> +  * - --prefix-symbols
>> +    - prepend binary name to symbols
>> +
>> +Example user trace
>> +++++++++++++++++++
>> +
>> +As an example, we can trace qemu itself running git::
>> +
>> +    $ ./build/qemu-aarch64 -plugin \
>> +      build/contrib/plugins/libuftrace.so \
>> +      ./build/qemu-aarch64 /usr/bin/git --help
>> +
>> +    # and generate a chrome trace directly
>> +    $ uftrace dump --chrome | gzip > ~/qemu_aarch64_git_help.json.gz
>> +
>> +For convenience, you can download this trace `qemu_aarch64_git_help.json.gz
>> +<https://fileserver.linaro.org/s/N8X8fnZ5yGRZLsT/download/qemu_aarch64_git_help.json.gz>`_.
> 
> We should be able to add static files in the docs/ folder that sphinx
> html can link to for images and json. WDYT?
>

Json is around 100MB, so I don't think we want to version that.
For images, it's more debatable, but I anticipated (maybe wrongly) that 
community would not like to see +1MB on qemu source code for doc images.

My git-publish client was a bit worried too, despite my encouragements, 
to try pushing a patch with a binary blob that will break half of the 
people "custom" setups, and freeze the mailing list.

So, since the json have to be hosted externally, I thought it was not 
the worst idea to keep images with it.

>> +Download it and open this trace on https://ui.perfetto.dev/. You can zoom in/out
>> +using w,a,s,d keys. Some sequences taken from this trace:
> 
> You can use :kbd:`W` etc for nice key formatting
> 
>> +
>> +- Loading program and its interpreter
>> +
>> +.. image:: https://fileserver.linaro.org/s/fie8JgX76yyL5cq/preview
>> +   :height: 200px
>> +
>> +- open syscall
>> +
>> +.. image:: https://fileserver.linaro.org/s/rsXPTeZZPza4PcE/preview
>> +   :height: 200px
>> +
>> +- TB creation
>> +
>> +.. image:: https://fileserver.linaro.org/s/GXY6NKMw5EeRCew/preview
>> +   :height: 200px
>> +
>> +It's usually better to use ``uftrace record`` directly. However, tracing
>> +binaries through qemu-user can be convenient when you don't want to recompile
>> +them (``uftrace record`` requires instrumentation), as long as symbols are
>> +present.
>> +
>> +Example system trace
>> +++++++++++++++++++++
>> +
>> +A full trace example (chrome trace, from instructions below) generated from a
>> +system boot can be found `here
>> +<https://fileserver.linaro.org/s/WsemLboPEzo24nw/download/aarch64_boot.json.gz>`_.
>> +Download it and open this trace on https://ui.perfetto.dev/. You can see code
>> +executed for all privilege levels, and zoom in/out using w,a,s,d keys. You can
>> +find below some sequences taken from this trace:
>> +
>> +- Two first stages of boot sequence in Arm Trusted Firmware (EL3 and S-EL1)
>> +
>> +.. image:: https://fileserver.linaro.org/s/kkxBS552W7nYESX/preview
>> +   :height: 200px
>> +
>> +- U-boot initialization (until code relocation, after which we can't track it)
>> +
>> +.. image:: https://fileserver.linaro.org/s/LKTgsXNZFi5GFNC/preview
>> +   :height: 200px
>> +
>> +- Stat and open syscalls in kernel
>> +
>> +.. image:: https://fileserver.linaro.org/s/dXe4MfraKg2F476/preview
>> +   :height: 200px
>> +
>> +- Timer interrupt
>> +
>> +.. image:: https://fileserver.linaro.org/s/TM5yobYzJtP7P3C/preview
>> +   :height: 200px
>> +
>> +- Poweroff sequence (from kernel back to firmware, NS-EL2 to EL3)
>> +
>> +.. image:: https://fileserver.linaro.org/s/oR2PtyGKJrqnfRf/preview
>> +   :height: 200px
>> +
>> +Build and run system example
>> +++++++++++++++++++++++++++++
>> +
>> +.. _uftrace_build_system_example:
>> +
>> +Building a full system image with frame pointers is not trivial.
>> +
>> +We provide a `simple way <https://github.com/pbo-linaro/qemu-linux-stack>`_ to
>> +build an aarch64 system, combining Arm Trusted firmware, U-boot, Linux kernel
>> +and debian userland. It's based on containers (``podman`` only) and
>> +``qemu-user-static (binfmt)`` to make sure it's easily reproducible and does not depend
>> +on machine where you build it.
>> +
>> +You can follow the exact same instructions for a x64 system, combining edk2,
>> +Linux, and Ubuntu, simply by switching to
>> +`x86_64 <https://github.com/pbo-linaro/qemu-linux-stack/tree/x86_64>`_ branch.
>> +
>> +To build the system::
>> +
>> +    # Install dependencies
>> +    $ sudo apt install -y podman qemu-user-static
>> +
>> +    $ git clone https://github.com/pbo-linaro/qemu-linux-stack
>> +    $ cd qemu-linux-stack
>> +    $ ./build.sh
>> +
>> +    # system can be started using:
>> +    $ ./run.sh /path/to/qemu-system-aarch64
>> +
>> +To generate a uftrace for a system boot from that::
>> +
>> +    # run true and poweroff the system
>> +    $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
>> +      -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on
>> +
>> +    # generate symbols and memory mapping
>> +    $ path/to/contrib/plugins/uftrace_symbols.py \
>> +      --prefix-symbols \
>> +      arm-trusted-firmware/build/qemu/debug/bl1/bl1.elf \
>> +      arm-trusted-firmware/build/qemu/debug/bl2/bl2.elf \
>> +      arm-trusted-firmware/build/qemu/debug/bl31/bl31.elf \
>> +      u-boot/u-boot:0x60000000 \
>> +      linux/vmlinux
>> +
>> +    # inspect trace with
>> +    $ uftrace replay
>> +
>> +Uftrace allows to filter the trace, and dump flamegraphs, or a chrome trace.
>> +This last one is very interesting to see visually the boot process::
>> +
>> +    $ uftrace dump --chrome > boot.json
>> +    # Open your browser, and load boot.json on https://ui.perfetto.dev/.
>> +
>> +Long visual chrome traces can't be easily opened, thus, it might be
>> +interesting to generate them around a particular point of execution::
>> +
>> +    # execute qemu and timestamp output log
>> +    $ env INIT=true ./run.sh path/to/qemu-system-aarch64 \
>> +      -plugin path/to/contrib/plugins/libuftrace.so,trace-privilege-level=on |&
>> +      ts "%s" | tee exec.log
>> +
>> +    $ cat exec.log  | grep 'Run /init'
>> +      1753122320 [   11.834391] Run /init as init process
>> +      # init was launched at 1753122320
>> +
>> +    # generate trace around init execution (2 seconds):
>> +    $ uftrace dump --chrome --time-range=1753122320~1753122322 > init.json
>> +
>> Other emulation features
>> ------------------------
>>
>> -- 
>> 2.47.2
>>
> 
> 
> Sounds comprehensive all in all. I will try to follow the instructions
> and post a Tested-by
>

Thanks.
You can try on our lab vm, as it's quite network intensive, between the 
container creation, and sources download.

> For the doc text:
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file
  2025-08-08  8:14   ` Manos Pitsidianakis
  2025-08-08 16:19     ` Pierrick Bouvier
@ 2025-08-08 17:28     ` Pierrick Bouvier
  1 sibling, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 17:28 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 1:14 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We define a scoreboard that will hold our data per cpu. As well, we
>> define a buffer per cpu that will be used to read registers and memories
>> in a thread-safe way.
>>
>> For now, we just instrument all instructions with an empty callback.
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c   | 74 +++++++++++++++++++++++++++++++++++++
>> contrib/plugins/meson.build |  3 +-
>> 2 files changed, 76 insertions(+), 1 deletion(-)
>> create mode 100644 contrib/plugins/uftrace.c
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> new file mode 100644
>> index 00000000000..d60c1077496
>> --- /dev/null
>> +++ b/contrib/plugins/uftrace.c
>> @@ -0,0 +1,74 @@
>> +/*
>> + * Copyright (C) 2025, Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> + *
>> + * Generates a trace compatible with uftrace (similar to uftrace record).
>> + * https://github.com/namhyung/uftrace
>> + *
>> + * See docs/about/emulation.rst|Uftrace for details and examples.
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include <qemu-plugin.h>
>> +#include <glib.h>
>> +
>> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>> +
>> +typedef struct Cpu {
>> +    GByteArray *buf;
>> +} Cpu;
>> +
>> +static struct qemu_plugin_scoreboard *score;
>> +
>> +static void track_callstack(unsigned int cpu_index, void *udata)
>> +{
>> +}
>> +
>> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> +{
>> +    size_t n_insns = qemu_plugin_tb_n_insns(tb);
>> +
>> +    for (int i = 0; i < n_insns; i++) {
> 
> s/int i/size_t i/
> 
>> +        struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
> 
> This can return NULL,
> 
>> +
>> +        uintptr_t pc = qemu_plugin_insn_vaddr(insn);
> 
> And this would lead to a NULL dereference (it performs insn->vaddr
> access)
> 
>> +        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
>> +                QEMU_PLUGIN_CB_R_REGS,
>> +                (void *) pc);
> 
> Hm indentation got broken here, should be
> 
> 
> +        qemu_plugin_register_vcpu_insn_exec_cb(insn, track_callstack,
> +                                               QEMU_PLUGIN_CB_R_REGS,
> +                                               (void *)pc);
> 
>> +
>> +    }
>> +}
>> +
>> +static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>> +{
>> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>> +    cpu->buf = g_byte_array_new();
>> +}
>> +
>> +static void vcpu_end(unsigned int vcpu_index)
>> +{
>> +    Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>> +    g_byte_array_free(cpu->buf, true);
>> +    memset(cpu, 0, sizeof(Cpu));
> 
> Nitpick, cpu->buf = NULL; is easier to understand (suggestion)
>

Looking at this twice, it does not scale with other commits, which will 
free other members. That's why there was a memset in the first place.
Sorry, I rebuilt intermediate commits by reverting various parts, so it 
may result in suck quirks.

>> +}
>> +
>> +static void at_exit(qemu_plugin_id_t id, void *data)
>> +{
>> +    for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>> +        vcpu_end(i);
>> +    }
>> +
>> +    qemu_plugin_scoreboard_free(score);
>> +}
>> +
>> +QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>> +                                           const qemu_info_t *info,
>> +                                           int argc, char **argv)
>> +{
>> +    score = qemu_plugin_scoreboard_new(sizeof(Cpu));
>> +    qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>> +    qemu_plugin_register_atexit_cb(id, at_exit, NULL);
>> +    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>> +
>> +    return 0;
>> +}
>> diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build
>> index 1876bc78438..7eb3629c95d 100644
>> --- a/contrib/plugins/meson.build
>> +++ b/contrib/plugins/meson.build
>> @@ -1,5 +1,6 @@
>> contrib_plugins = ['bbv', 'cache', 'cflow', 'drcov', 'execlog', 'hotblocks',
>> -                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger']
>> +                   'hotpages', 'howvec', 'hwprofile', 'ips', 'stoptrigger',
>> +                   'uftrace']
>> if host_os != 'windows'
>>    # lockstep uses socket.h
>>    contrib_plugins += 'lockstep'
>> -- 
>> 2.47.2
>>
> 
> If no other comments rise for this patch, you can add my r-b after
> fixing the NULL check:
> 
> Reviewed-by: Manos Pitsidianakis <manos.pitsidianakis@linaro.org>



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08  9:11   ` Manos Pitsidianakis
  2025-08-08 16:38     ` Pierrick Bouvier
@ 2025-08-08 18:03     ` Pierrick Bouvier
  2025-08-08 18:13       ` Manos Pitsidianakis
  2025-08-08 18:24     ` Pierrick Bouvier
  2 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 18:03 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 2:11 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We implement tracing, following uftrace format.
>> Trace is flushed every 32 MB, so file operations don't impact
>> performance at runtime.
>>
>> A different trace is generated per cpu, and we ensure they have a unique
>> name, based on vcpu_index, while keeping room for privilege level coming
>> in next commit.
> 
> Suggestion (not a request): put some kind of documentation about the
> format this patch implements, maybe a commit sha & URL to a header file
> from upstream uftrace.
> 
>>
>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>> ---
>> contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 148 insertions(+), 1 deletion(-)
>>
>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>> index d51faceb344..402a242433e 100644
>> --- a/contrib/plugins/uftrace.c
>> +++ b/contrib/plugins/uftrace.c
>> @@ -12,6 +12,13 @@
>> #include <qemu-plugin.h>
>> #include <glib.h>
>> #include <stdio.h>
>> +#include <sys/stat.h>
>> +#include <sys/time.h>
>> +#include <time.h>
>> +#include <unistd.h>
>> +
>> +#define MiB     (INT64_C(1) << 20)
>> +#define NANOSECONDS_PER_SECOND 1000000000LL
>>
>> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>
>> @@ -24,6 +31,13 @@ typedef struct {
>>      uint64_t frame_pointer;
>> } CallstackEntry;
>>
>> +typedef struct {
>> +    GArray *t;
>> +    GString *path;
>> +    GString *name;
>> +    uint32_t id;
>> +} Trace;
>> +
>> typedef struct Cpu Cpu;
>>
>> typedef struct {
>> @@ -34,6 +48,7 @@ typedef struct {
>> } CpuOps;
>>
>> typedef struct Cpu {
>> +    Trace *trace;
>>      Callstack *cs;
>>      GByteArray *buf;
>>      CpuOps ops;
>> @@ -44,9 +59,41 @@ typedef struct {
>>      struct qemu_plugin_register *reg_fp;
>> } Aarch64Cpu;
>>
>> +typedef struct {
>> +    uint64_t timestamp;
>> +    uint64_t data;
>> +} UftraceEntry;
>> +
>> +typedef enum {
>> +    UFTRACE_ENTRY,
>> +    UFTRACE_EXIT,
>> +    UFTRACE_LOST,
>> +    UFTRACE_EVENT
>> +} UftraceRecordType;
>> +
>> static struct qemu_plugin_scoreboard *score;
>> static CpuOps arch_ops;
>>
>> +static uint64_t gettime_ns(void)
>> +{
>> +#ifdef _WIN32
>> +    /*
>> +     * On Windows, timespec_get is available only with UCRT, but not with
>> +     * MinGW64 environment. Simplify by using only gettimeofday on this
>> +     * platform. This may result in a precision loss.
>> +     */
>> +    struct timeval tv;
>> +    gettimeofday(&tv, NULL);
>> +    uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * 1000;
>> +#else
>> +    /* We need nanosecond precision for short lived functions. */
>> +    struct timespec ts;
>> +    timespec_get(&ts, TIME_UTC);
>> +    uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
>> +#endif
>> +    return now_ns;
>> +}
>> +
>> static Callstack *callstack_new(void)
>> {
>>      Callstack *cs = g_new0(Callstack, 1);
>> @@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs)
>>      return e;
>> }
>>
>> +static Trace *trace_new(uint32_t id, GString *name)
>> +{
>> +    Trace *t = g_new0(Trace, 1);
>> +    t->t = g_array_new(false, false, sizeof(UftraceEntry));
>> +    t->path = g_string_new(NULL);
>> +    g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id);
>> +    t->name = g_string_new(name->str);
>> +    t->id = id;
>> +    return t;
>> +}
>> +
>> +static void trace_free(Trace *t)
>> +{
>> +    g_assert(t->t->len == 0);
>> +    g_array_free(t->t, true);
>> +    t->t = NULL;
>> +    g_string_free(t->path, true);
>> +    t->path = NULL;
>> +    g_string_free(t->name, true);
>> +    t->name = NULL;
>> +    g_free(t);
>> +}
>> +
>> +static void trace_flush(Trace *t, bool append)
>> +{
>> +    int create_dir = g_mkdir_with_parents("./uftrace.data",
>> +                                          S_IRWXU | S_IRWXG | S_IRWXO);
>> +    g_assert(create_dir == 0);
>> +    FILE *dat = fopen(t->path->str, append ? "a" : "w");
>> +    g_assert(dat);
>> +    GArray *data = t->t;
>> +    if (data->len) {
>> +        fwrite(data->data, data->len, sizeof(UftraceEntry), dat);
> 
> fwrite might not write all bytes, how about using the
> g_file_set_contents() wrapper?
>

If I see correctly, g_file_set_contents does not allow to append data, 
which is what we need to do here (that's the point of flushing every 
32MB). I can add an assert on fwrite return to make sure we wrote 
everything.

>> +    }
>> +    fclose(dat);
>> +    g_array_set_size(data, 0);
>> +}
>> +
>> +static void trace_add_entry(Trace *t, uint64_t timestamp, uint64_t pc,
>> +                            size_t depth, UftraceRecordType type)
>> +{
>> +    /* libmcount/record.c:record_event */
>> +    const uint64_t record_magic = 0x5;
>> +    uint64_t data = type | record_magic << 3;
>> +    data += depth << 6;
>> +    data += pc << 16;
>> +    UftraceEntry e = {.timestamp = timestamp, .data = data};
>> +    g_array_append_val(t->t, e);
>> +    if (t->t->len * sizeof(UftraceEntry) > 32 * MiB) {
>> +        trace_flush(t, true);
>> +    }
>> +}
>> +
>> +static void trace_enter_function(Trace *t, uint64_t timestamp,
>> +                                 uint64_t pc, size_t depth)
>> +{
>> +    trace_add_entry(t, timestamp, pc, depth, UFTRACE_ENTRY);
>> +}
>> +
>> +static void trace_exit_function(Trace *t, uint64_t timestamp,
>> +                                uint64_t pc, size_t depth)
>> +{
>> +    trace_add_entry(t, timestamp, pc, depth, UFTRACE_EXIT);
>> +}
>> +
>> +static void trace_enter_stack(Trace *t, Callstack *cs, uint64_t timestamp)
>> +{
>> +    for (size_t depth = 1; depth <= callstack_depth(cs); ++depth) {
>> +        trace_enter_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
>> +    }
>> +}
>> +
>> +static void trace_exit_stack(Trace *t, Callstack *cs, uint64_t timestamp)
>> +{
>> +    for (size_t depth = callstack_depth(cs); depth > 0; --depth) {
>> +        trace_exit_function(t, timestamp, callstack_at(cs, depth)->pc, depth);
>> +    }
>> +}
>> +
>> static uint64_t cpu_read_register64(Cpu *cpu, struct qemu_plugin_register *reg)
>> {
>>      GByteArray *buf = cpu->buf;
>> @@ -223,7 +349,9 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>> {
>>      uint64_t pc = (uintptr_t) udata;
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, cpu_index);
>> +    uint64_t timestamp = gettime_ns();
>>      Callstack *cs = cpu->cs;
>> +    Trace *t = cpu->trace;
>>
>>      uint64_t fp = cpu->ops.get_frame_pointer(cpu);
>>      if (!fp && callstack_empty(cs)) {
>> @@ -232,6 +360,7 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>>           * long as a proper call does not happen.
>>           */
>>          callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>> +        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
>>          return;
>>      }
>>
>> @@ -244,7 +373,8 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>>      CallstackEntry caller = callstack_caller(cs);
>>      if (fp == caller.frame_pointer) {
>>          /* return */
>> -        callstack_pop(cs);
>> +        CallstackEntry e = callstack_pop(cs);
>> +        trace_exit_function(t, timestamp, e.pc, callstack_depth(cs));
>>          return;
>>      }
>>
>> @@ -252,12 +382,16 @@ static void track_callstack(unsigned int cpu_index, void *udata)
>>      if (caller_fp == top.frame_pointer) {
>>          /* call */
>>          callstack_push(cs, (CallstackEntry){.frame_pointer = fp, .pc = pc});
>> +        trace_enter_function(t, timestamp, pc, callstack_depth(cs));
>>          return;
>>      }
>>
>>      /* discontinuity, exit current stack and unwind new one */
>> +    trace_exit_stack(t, cs, timestamp);
>>      callstack_clear(cs);
>> +
>>      cpu_unwind_stack(cpu, fp, pc);
>> +    trace_enter_stack(t, cs, timestamp);
>> }
>>
>> static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>> @@ -296,6 +430,16 @@ static void vcpu_init(qemu_plugin_id_t id, unsigned int vcpu_index)
>>      cpu->ops.init(cpu);
>>      cpu->buf = g_byte_array_new();
>>
>> +    g_assert(vcpu_index < UINT32_MAX / 100);
>> +    /* trace_id is: cpu_number * 100 */
>> +    uint32_t trace_id = (vcpu_index + 1) * 100;
>> +
>> +    g_autoptr(GString) trace_name = g_string_new(NULL);
>> +    g_string_append_printf(trace_name, "cpu%u", vcpu_index);
>> +    cpu->trace = trace_new(trace_id, trace_name);
>> +    /* create/truncate trace file */
>> +    trace_flush(cpu->trace, false);
>> +
>>      cpu->cs = callstack_new();
>> }
>>
>> @@ -304,6 +448,7 @@ static void vcpu_end(unsigned int vcpu_index)
>>      Cpu *cpu = qemu_plugin_scoreboard_find(score, vcpu_index);
>>      g_byte_array_free(cpu->buf, true);
>>
>> +    trace_free(cpu->trace);
>>      callstack_free(cpu->cs);
>>      memset(cpu, 0, sizeof(Cpu));
>> }
>> @@ -311,6 +456,8 @@ static void vcpu_end(unsigned int vcpu_index)
>> static void at_exit(qemu_plugin_id_t id, void *data)
>> {
>>      for (size_t i = 0; i < qemu_plugin_num_vcpus(); ++i) {
>> +        Cpu *cpu = qemu_plugin_scoreboard_find(score, i);
>> +        trace_flush(cpu->trace, true);
>>          vcpu_end(i);
>>      }
>>
>> -- 
>> 2.47.2
>>
> 
> Looks good overall.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08 18:03     ` Pierrick Bouvier
@ 2025-08-08 18:13       ` Manos Pitsidianakis
  2025-08-08 18:23         ` Pierrick Bouvier
  0 siblings, 1 reply; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08 18:13 UTC (permalink / raw)
  To: Pierrick Bouvier
  Cc: qemu-devel, Mahmoud Mandour, Philippe Mathieu-Daud é,
	rowan Hart, Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On Fri, Aug 8, 2025 at 9:03 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 8/8/25 2:11 AM, Manos Pitsidianakis wrote:
> > On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
> >> We implement tracing, following uftrace format.
> >> Trace is flushed every 32 MB, so file operations don't impact
> >> performance at runtime.
> >>
> >> A different trace is generated per cpu, and we ensure they have a unique
> >> name, based on vcpu_index, while keeping room for privilege level coming
> >> in next commit.
> >
> > Suggestion (not a request): put some kind of documentation about the
> > format this patch implements, maybe a commit sha & URL to a header file
> > from upstream uftrace.
> >
> >>
> >> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> >> ---
> >> contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++-
> >> 1 file changed, 148 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
> >> index d51faceb344..402a242433e 100644
> >> --- a/contrib/plugins/uftrace.c
> >> +++ b/contrib/plugins/uftrace.c
> >> @@ -12,6 +12,13 @@
> >> #include <qemu-plugin.h>
> >> #include <glib.h>
> >> #include <stdio.h>
> >> +#include <sys/stat.h>
> >> +#include <sys/time.h>
> >> +#include <time.h>
> >> +#include <unistd.h>
> >> +
> >> +#define MiB     (INT64_C(1) << 20)
> >> +#define NANOSECONDS_PER_SECOND 1000000000LL
> >>
> >> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> >>
> >> @@ -24,6 +31,13 @@ typedef struct {
> >>      uint64_t frame_pointer;
> >> } CallstackEntry;
> >>
> >> +typedef struct {
> >> +    GArray *t;
> >> +    GString *path;
> >> +    GString *name;
> >> +    uint32_t id;
> >> +} Trace;
> >> +
> >> typedef struct Cpu Cpu;
> >>
> >> typedef struct {
> >> @@ -34,6 +48,7 @@ typedef struct {
> >> } CpuOps;
> >>
> >> typedef struct Cpu {
> >> +    Trace *trace;
> >>      Callstack *cs;
> >>      GByteArray *buf;
> >>      CpuOps ops;
> >> @@ -44,9 +59,41 @@ typedef struct {
> >>      struct qemu_plugin_register *reg_fp;
> >> } Aarch64Cpu;
> >>
> >> +typedef struct {
> >> +    uint64_t timestamp;
> >> +    uint64_t data;
> >> +} UftraceEntry;
> >> +
> >> +typedef enum {
> >> +    UFTRACE_ENTRY,
> >> +    UFTRACE_EXIT,
> >> +    UFTRACE_LOST,
> >> +    UFTRACE_EVENT
> >> +} UftraceRecordType;
> >> +
> >> static struct qemu_plugin_scoreboard *score;
> >> static CpuOps arch_ops;
> >>
> >> +static uint64_t gettime_ns(void)
> >> +{
> >> +#ifdef _WIN32
> >> +    /*
> >> +     * On Windows, timespec_get is available only with UCRT, but not with
> >> +     * MinGW64 environment. Simplify by using only gettimeofday on this
> >> +     * platform. This may result in a precision loss.
> >> +     */
> >> +    struct timeval tv;
> >> +    gettimeofday(&tv, NULL);
> >> +    uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * 1000;
> >> +#else
> >> +    /* We need nanosecond precision for short lived functions. */
> >> +    struct timespec ts;
> >> +    timespec_get(&ts, TIME_UTC);
> >> +    uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
> >> +#endif
> >> +    return now_ns;
> >> +}
> >> +
> >> static Callstack *callstack_new(void)
> >> {
> >>      Callstack *cs = g_new0(Callstack, 1);
> >> @@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs)
> >>      return e;
> >> }
> >>
> >> +static Trace *trace_new(uint32_t id, GString *name)
> >> +{
> >> +    Trace *t = g_new0(Trace, 1);
> >> +    t->t = g_array_new(false, false, sizeof(UftraceEntry));
> >> +    t->path = g_string_new(NULL);
> >> +    g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id);
> >> +    t->name = g_string_new(name->str);
> >> +    t->id = id;
> >> +    return t;
> >> +}
> >> +
> >> +static void trace_free(Trace *t)
> >> +{
> >> +    g_assert(t->t->len == 0);
> >> +    g_array_free(t->t, true);
> >> +    t->t = NULL;
> >> +    g_string_free(t->path, true);
> >> +    t->path = NULL;
> >> +    g_string_free(t->name, true);
> >> +    t->name = NULL;
> >> +    g_free(t);
> >> +}
> >> +
> >> +static void trace_flush(Trace *t, bool append)
> >> +{
> >> +    int create_dir = g_mkdir_with_parents("./uftrace.data",
> >> +                                          S_IRWXU | S_IRWXG | S_IRWXO);
> >> +    g_assert(create_dir == 0);
> >> +    FILE *dat = fopen(t->path->str, append ? "a" : "w");
> >> +    g_assert(dat);
> >> +    GArray *data = t->t;
> >> +    if (data->len) {
> >> +        fwrite(data->data, data->len, sizeof(UftraceEntry), dat);
> >
> > fwrite might not write all bytes, how about using the
> > g_file_set_contents() wrapper?
> >
>
> If I see correctly, g_file_set_contents does not allow to append data,
> which is what we need to do here (that's the point of flushing every
> 32MB).

Ah you're right, my bad. It might be because it renames a temporary
file to the destination filename in order to make the operation
atomic.

> I can add an assert on fwrite return to make sure we wrote
> everything.

Or a while loop with a bytes_written counter.

Or keep it all in memory and write it on exit? Your call


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08 18:13       ` Manos Pitsidianakis
@ 2025-08-08 18:23         ` Pierrick Bouvier
  2025-08-08 18:29           ` Manos Pitsidianakis
  0 siblings, 1 reply; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 18:23 UTC (permalink / raw)
  To: Manos Pitsidianakis
  Cc: qemu-devel, Mahmoud Mandour, Philippe Mathieu-Daud é,
	rowan Hart, Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 11:13 AM, Manos Pitsidianakis wrote:
> On Fri, Aug 8, 2025 at 9:03 PM Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> On 8/8/25 2:11 AM, Manos Pitsidianakis wrote:
>>> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>>>> We implement tracing, following uftrace format.
>>>> Trace is flushed every 32 MB, so file operations don't impact
>>>> performance at runtime.
>>>>
>>>> A different trace is generated per cpu, and we ensure they have a unique
>>>> name, based on vcpu_index, while keeping room for privilege level coming
>>>> in next commit.
>>>
>>> Suggestion (not a request): put some kind of documentation about the
>>> format this patch implements, maybe a commit sha & URL to a header file
>>> from upstream uftrace.
>>>
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> ---
>>>> contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++-
>>>> 1 file changed, 148 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
>>>> index d51faceb344..402a242433e 100644
>>>> --- a/contrib/plugins/uftrace.c
>>>> +++ b/contrib/plugins/uftrace.c
>>>> @@ -12,6 +12,13 @@
>>>> #include <qemu-plugin.h>
>>>> #include <glib.h>
>>>> #include <stdio.h>
>>>> +#include <sys/stat.h>
>>>> +#include <sys/time.h>
>>>> +#include <time.h>
>>>> +#include <unistd.h>
>>>> +
>>>> +#define MiB     (INT64_C(1) << 20)
>>>> +#define NANOSECONDS_PER_SECOND 1000000000LL
>>>>
>>>> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>>>
>>>> @@ -24,6 +31,13 @@ typedef struct {
>>>>       uint64_t frame_pointer;
>>>> } CallstackEntry;
>>>>
>>>> +typedef struct {
>>>> +    GArray *t;
>>>> +    GString *path;
>>>> +    GString *name;
>>>> +    uint32_t id;
>>>> +} Trace;
>>>> +
>>>> typedef struct Cpu Cpu;
>>>>
>>>> typedef struct {
>>>> @@ -34,6 +48,7 @@ typedef struct {
>>>> } CpuOps;
>>>>
>>>> typedef struct Cpu {
>>>> +    Trace *trace;
>>>>       Callstack *cs;
>>>>       GByteArray *buf;
>>>>       CpuOps ops;
>>>> @@ -44,9 +59,41 @@ typedef struct {
>>>>       struct qemu_plugin_register *reg_fp;
>>>> } Aarch64Cpu;
>>>>
>>>> +typedef struct {
>>>> +    uint64_t timestamp;
>>>> +    uint64_t data;
>>>> +} UftraceEntry;
>>>> +
>>>> +typedef enum {
>>>> +    UFTRACE_ENTRY,
>>>> +    UFTRACE_EXIT,
>>>> +    UFTRACE_LOST,
>>>> +    UFTRACE_EVENT
>>>> +} UftraceRecordType;
>>>> +
>>>> static struct qemu_plugin_scoreboard *score;
>>>> static CpuOps arch_ops;
>>>>
>>>> +static uint64_t gettime_ns(void)
>>>> +{
>>>> +#ifdef _WIN32
>>>> +    /*
>>>> +     * On Windows, timespec_get is available only with UCRT, but not with
>>>> +     * MinGW64 environment. Simplify by using only gettimeofday on this
>>>> +     * platform. This may result in a precision loss.
>>>> +     */
>>>> +    struct timeval tv;
>>>> +    gettimeofday(&tv, NULL);
>>>> +    uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * 1000;
>>>> +#else
>>>> +    /* We need nanosecond precision for short lived functions. */
>>>> +    struct timespec ts;
>>>> +    timespec_get(&ts, TIME_UTC);
>>>> +    uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
>>>> +#endif
>>>> +    return now_ns;
>>>> +}
>>>> +
>>>> static Callstack *callstack_new(void)
>>>> {
>>>>       Callstack *cs = g_new0(Callstack, 1);
>>>> @@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs)
>>>>       return e;
>>>> }
>>>>
>>>> +static Trace *trace_new(uint32_t id, GString *name)
>>>> +{
>>>> +    Trace *t = g_new0(Trace, 1);
>>>> +    t->t = g_array_new(false, false, sizeof(UftraceEntry));
>>>> +    t->path = g_string_new(NULL);
>>>> +    g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id);
>>>> +    t->name = g_string_new(name->str);
>>>> +    t->id = id;
>>>> +    return t;
>>>> +}
>>>> +
>>>> +static void trace_free(Trace *t)
>>>> +{
>>>> +    g_assert(t->t->len == 0);
>>>> +    g_array_free(t->t, true);
>>>> +    t->t = NULL;
>>>> +    g_string_free(t->path, true);
>>>> +    t->path = NULL;
>>>> +    g_string_free(t->name, true);
>>>> +    t->name = NULL;
>>>> +    g_free(t);
>>>> +}
>>>> +
>>>> +static void trace_flush(Trace *t, bool append)
>>>> +{
>>>> +    int create_dir = g_mkdir_with_parents("./uftrace.data",
>>>> +                                          S_IRWXU | S_IRWXG | S_IRWXO);
>>>> +    g_assert(create_dir == 0);
>>>> +    FILE *dat = fopen(t->path->str, append ? "a" : "w");
>>>> +    g_assert(dat);
>>>> +    GArray *data = t->t;
>>>> +    if (data->len) {
>>>> +        fwrite(data->data, data->len, sizeof(UftraceEntry), dat);
>>>
>>> fwrite might not write all bytes, how about using the
>>> g_file_set_contents() wrapper?
>>>
>>
>> If I see correctly, g_file_set_contents does not allow to append data,
>> which is what we need to do here (that's the point of flushing every
>> 32MB).
> 
> Ah you're right, my bad. It might be because it renames a temporary
> file to the destination filename in order to make the operation
> atomic.
>
>> I can add an assert on fwrite return to make sure we wrote
>> everything.
> 
> Or a while loop with a bytes_written counter.
> 

fwrite guarantees all data will be written (to the opposite or raw write 
calls), including if syscall is interrupted, doing a proper write loop 
until all data is written. You can check glibc source if in doubt.
Thus, the only situation where it might fail is if disk space is full. 
In this case, I think that an assert (similar to a panic!) is good 
enough because we can't really do anything to solve the problem.

> Or keep it all in memory and write it on exit? Your call

That was the first version, but it does not scale on long executions, 
where traces can reach several gb.

By the way, adding the assert caught a bug, size and num_elems were 
reversed, so definitely thanks for pointing this out.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08  9:11   ` Manos Pitsidianakis
  2025-08-08 16:38     ` Pierrick Bouvier
  2025-08-08 18:03     ` Pierrick Bouvier
@ 2025-08-08 18:24     ` Pierrick Bouvier
  2 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 18:24 UTC (permalink / raw)
  To: Manos Pitsidianakis, qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daud é, rowan Hart,
	Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On 8/8/25 2:11 AM, Manos Pitsidianakis wrote:
> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
>> We implement tracing, following uftrace format.
>> Trace is flushed every 32 MB, so file operations don't impact
>> performance at runtime.
>>
>> A different trace is generated per cpu, and we ensure they have a unique
>> name, based on vcpu_index, while keeping room for privilege level coming
>> in next commit.
> 
> Suggestion (not a request): put some kind of documentation about the
> format this patch implements, maybe a commit sha & URL to a header file
> from upstream uftrace.
> 

I added a link to uftrace source:
https://github.com/namhyung/uftrace/blob/v0.18/libmcount/record.c#L909

As well, I updated the comment with this URL.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing
  2025-08-08 18:23         ` Pierrick Bouvier
@ 2025-08-08 18:29           ` Manos Pitsidianakis
  0 siblings, 0 replies; 35+ messages in thread
From: Manos Pitsidianakis @ 2025-08-08 18:29 UTC (permalink / raw)
  To: Pierrick Bouvier
  Cc: qemu-devel, Mahmoud Mandour, Philippe Mathieu-Daud é,
	rowan Hart, Gustavo Romero, Alex Benn é e, Alexandre Iooss,
	Peter Maydell, Richard Henderson

On Fri, Aug 8, 2025 at 9:23 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 8/8/25 11:13 AM, Manos Pitsidianakis wrote:
> > On Fri, Aug 8, 2025 at 9:03 PM Pierrick Bouvier
> > <pierrick.bouvier@linaro.org> wrote:
> >>
> >> On 8/8/25 2:11 AM, Manos Pitsidianakis wrote:
> >>> On Fri, 08 Aug 2025 05:06, Pierrick Bouvier <pierrick.bouvier@linaro.org> wrote:
> >>>> We implement tracing, following uftrace format.
> >>>> Trace is flushed every 32 MB, so file operations don't impact
> >>>> performance at runtime.
> >>>>
> >>>> A different trace is generated per cpu, and we ensure they have a unique
> >>>> name, based on vcpu_index, while keeping room for privilege level coming
> >>>> in next commit.
> >>>
> >>> Suggestion (not a request): put some kind of documentation about the
> >>> format this patch implements, maybe a commit sha & URL to a header file
> >>> from upstream uftrace.
> >>>
> >>>>
> >>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> >>>> ---
> >>>> contrib/plugins/uftrace.c | 149 +++++++++++++++++++++++++++++++++++++-
> >>>> 1 file changed, 148 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/contrib/plugins/uftrace.c b/contrib/plugins/uftrace.c
> >>>> index d51faceb344..402a242433e 100644
> >>>> --- a/contrib/plugins/uftrace.c
> >>>> +++ b/contrib/plugins/uftrace.c
> >>>> @@ -12,6 +12,13 @@
> >>>> #include <qemu-plugin.h>
> >>>> #include <glib.h>
> >>>> #include <stdio.h>
> >>>> +#include <sys/stat.h>
> >>>> +#include <sys/time.h>
> >>>> +#include <time.h>
> >>>> +#include <unistd.h>
> >>>> +
> >>>> +#define MiB     (INT64_C(1) << 20)
> >>>> +#define NANOSECONDS_PER_SECOND 1000000000LL
> >>>>
> >>>> QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> >>>>
> >>>> @@ -24,6 +31,13 @@ typedef struct {
> >>>>       uint64_t frame_pointer;
> >>>> } CallstackEntry;
> >>>>
> >>>> +typedef struct {
> >>>> +    GArray *t;
> >>>> +    GString *path;
> >>>> +    GString *name;
> >>>> +    uint32_t id;
> >>>> +} Trace;
> >>>> +
> >>>> typedef struct Cpu Cpu;
> >>>>
> >>>> typedef struct {
> >>>> @@ -34,6 +48,7 @@ typedef struct {
> >>>> } CpuOps;
> >>>>
> >>>> typedef struct Cpu {
> >>>> +    Trace *trace;
> >>>>       Callstack *cs;
> >>>>       GByteArray *buf;
> >>>>       CpuOps ops;
> >>>> @@ -44,9 +59,41 @@ typedef struct {
> >>>>       struct qemu_plugin_register *reg_fp;
> >>>> } Aarch64Cpu;
> >>>>
> >>>> +typedef struct {
> >>>> +    uint64_t timestamp;
> >>>> +    uint64_t data;
> >>>> +} UftraceEntry;
> >>>> +
> >>>> +typedef enum {
> >>>> +    UFTRACE_ENTRY,
> >>>> +    UFTRACE_EXIT,
> >>>> +    UFTRACE_LOST,
> >>>> +    UFTRACE_EVENT
> >>>> +} UftraceRecordType;
> >>>> +
> >>>> static struct qemu_plugin_scoreboard *score;
> >>>> static CpuOps arch_ops;
> >>>>
> >>>> +static uint64_t gettime_ns(void)
> >>>> +{
> >>>> +#ifdef _WIN32
> >>>> +    /*
> >>>> +     * On Windows, timespec_get is available only with UCRT, but not with
> >>>> +     * MinGW64 environment. Simplify by using only gettimeofday on this
> >>>> +     * platform. This may result in a precision loss.
> >>>> +     */
> >>>> +    struct timeval tv;
> >>>> +    gettimeofday(&tv, NULL);
> >>>> +    uint64_t now_ns = tv.tv_sec * NANOSECONDS_PER_SECOND + tv.tv_usec * 1000;
> >>>> +#else
> >>>> +    /* We need nanosecond precision for short lived functions. */
> >>>> +    struct timespec ts;
> >>>> +    timespec_get(&ts, TIME_UTC);
> >>>> +    uint64_t now_ns = ts.tv_sec * NANOSECONDS_PER_SECOND + ts.tv_nsec;
> >>>> +#endif
> >>>> +    return now_ns;
> >>>> +}
> >>>> +
> >>>> static Callstack *callstack_new(void)
> >>>> {
> >>>>       Callstack *cs = g_new0(Callstack, 1);
> >>>> @@ -112,6 +159,85 @@ static CallstackEntry callstack_pop(Callstack *cs)
> >>>>       return e;
> >>>> }
> >>>>
> >>>> +static Trace *trace_new(uint32_t id, GString *name)
> >>>> +{
> >>>> +    Trace *t = g_new0(Trace, 1);
> >>>> +    t->t = g_array_new(false, false, sizeof(UftraceEntry));
> >>>> +    t->path = g_string_new(NULL);
> >>>> +    g_string_append_printf(t->path, "./uftrace.data/%"PRIu32".dat", id);
> >>>> +    t->name = g_string_new(name->str);
> >>>> +    t->id = id;
> >>>> +    return t;
> >>>> +}
> >>>> +
> >>>> +static void trace_free(Trace *t)
> >>>> +{
> >>>> +    g_assert(t->t->len == 0);
> >>>> +    g_array_free(t->t, true);
> >>>> +    t->t = NULL;
> >>>> +    g_string_free(t->path, true);
> >>>> +    t->path = NULL;
> >>>> +    g_string_free(t->name, true);
> >>>> +    t->name = NULL;
> >>>> +    g_free(t);
> >>>> +}
> >>>> +
> >>>> +static void trace_flush(Trace *t, bool append)
> >>>> +{
> >>>> +    int create_dir = g_mkdir_with_parents("./uftrace.data",
> >>>> +                                          S_IRWXU | S_IRWXG | S_IRWXO);
> >>>> +    g_assert(create_dir == 0);
> >>>> +    FILE *dat = fopen(t->path->str, append ? "a" : "w");
> >>>> +    g_assert(dat);
> >>>> +    GArray *data = t->t;
> >>>> +    if (data->len) {
> >>>> +        fwrite(data->data, data->len, sizeof(UftraceEntry), dat);
> >>>
> >>> fwrite might not write all bytes, how about using the
> >>> g_file_set_contents() wrapper?
> >>>
> >>
> >> If I see correctly, g_file_set_contents does not allow to append data,
> >> which is what we need to do here (that's the point of flushing every
> >> 32MB).
> >
> > Ah you're right, my bad. It might be because it renames a temporary
> > file to the destination filename in order to make the operation
> > atomic.
> >
> >> I can add an assert on fwrite return to make sure we wrote
> >> everything.
> >
> > Or a while loop with a bytes_written counter.
> >
>
> fwrite guarantees all data will be written (to the opposite or raw write
> calls), including if syscall is interrupted, doing a proper write loop
> until all data is written.

Right, as the manpage says:

> RETURN VALUE
>      On success, fread() and fwrite() return the number of items read or written.  This number equals the number of bytes transferred only when size is 1.  If an error occurs, or the
>      end of the file is reached, the return value is a short item count (or zero).

I thought it meant that "on error" included EAGAIN/EINTR, that's why I
suggested a loop. But the manpage doesn't list error values or even
mention whether it sets errno at all so I was mistaken.

> You can check glibc source if in doubt.
> Thus, the only situation where it might fail is if disk space is full.
> In this case, I think that an assert (similar to a panic!) is good
> enough because we can't really do anything to solve the problem.
>
> > Or keep it all in memory and write it on exit? Your call
>
> That was the first version, but it does not scale on long executions,
> where traces can reach several gb.
>
> By the way, adding the assert caught a bug, size and num_elems were
> reversed, so definitely thanks for pointing this out.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH v5 0/9] contrib/plugins: uftrace
  2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
                   ` (8 preceding siblings ...)
  2025-08-08  2:07 ` [PATCH v5 9/9] contrib/plugins/uftrace: add documentation Pierrick Bouvier
@ 2025-08-08 20:45 ` Pierrick Bouvier
  9 siblings, 0 replies; 35+ messages in thread
From: Pierrick Bouvier @ 2025-08-08 20:45 UTC (permalink / raw)
  To: qemu-devel
  Cc: Mahmoud Mandour, Philippe Mathieu-Daudé, rowan Hart,
	Gustavo Romero, Manos Pitsidianakis, Alex Bennée,
	Alexandre Iooss, Peter Maydell, Richard Henderson

On 8/7/25 7:06 PM, Pierrick Bouvier wrote:
> This plugin generates a binary trace compatible with the excellent uftrace:
> https://github.com/namhyung/uftrace
> 
> In short, it tracks all function calls performed during execution, based on
> frame pointer analysis. A big advantage over "uftrace record" is that it works
> in system mode, allowing to trace a full system execution, which was the
> original goal. It works as well in user mode, but uftrace itself already does
> this. It's implemented for aarch64 and x86_64.
> 
> Let's start with concrete examples of the result.
> 
> First, in system mode, booting a stack using TF-A + U-boot + Linux:
> - Two first stages of boot sequence in Arm Trusted Firmware (EL3 and S-EL1)
> https://fileserver.linaro.org/s/kkxBS552W7nYESX/preview
> - Stat and open syscalls in kernel
> https://fileserver.linaro.org/s/dXe4MfraKg2F476/preview
> - Poweroff sequence (from kernel back to firmware, NS-EL2 to EL3)
> https://fileserver.linaro.org/s/oR2PtyGKJrqnfRf/preview
> 
> Full trace is available here:
> https://fileserver.linaro.org/s/WsemLboPEzo24nw/download/aarch64_boot.json.gz
> You can download and open it on https://ui.perfetto.dev/ to explore it.
> 
> Second, in user mode, tracing qemu-aarch64 (itself) running git --help:
> - Loading program and its interpreter
> https://fileserver.linaro.org/s/fie8JgX76yyL5cq/preview
> - TB creation
> https://fileserver.linaro.org/s/GXY6NKMw5EeRCew/preview
> 
> Full trace is available here:
> https://fileserver.linaro.org/s/N8X8fnZ5yGRZLsT/download/qemu_aarch64_git_help.json.gz
> 
> If you had curiosity and now you're ready to give some attention, most of the
> details you want to read are included in the documentation patch (final one).
> 
> Overhead is around x5-x15, and long traces can be directly filtered with uftrace
> if needed.
> 
> I hope this plugin can help people trying to understand what happens out of the
> user space, and get a better grasp of how firmwares, bootloader, and kernel
> interact behind the curtain.
> 
> v2
> --
> 
> - trace active stacks on exit
> - do not erase map generated in system_emulation
> - add documentation to generate restricted visual traces around specific events
>    of execution
> 
> v3
> --
> 
> - fix missing include unistd.h (build failed on MacOS only)
> 
> v4
> --
> 
> - add support for x64
> 
> v5
> --
> 
> - addressed Alex comments
> - split plugin implementation in several commits
> - removed instruction based timestamps (only use time based timestamps)
> - removed sampling implementation
> 
> Pierrick Bouvier (9):
>    contrib/plugins/uftrace: skeleton file
>    contrib/plugins/uftrace: define cpu operations and implement aarch64
>    contrib/plugins/uftrace: track callstack
>    contrib/plugins/uftrace: implement tracing
>    contrib/plugins/uftrace: implement privilege level tracing
>    contrib/plugins/uftrace: generate additional files for uftrace
>    contrib/plugins/uftrace: implement x64 support
>    contrib/plugins/uftrace_symbols.py
>    contrib/plugins/uftrace: add documentation
> 
>   docs/about/emulation.rst           | 197 +++++++
>   contrib/plugins/uftrace.c          | 871 +++++++++++++++++++++++++++++
>   contrib/plugins/meson.build        |   3 +-
>   contrib/plugins/uftrace_symbols.py | 152 +++++
>   4 files changed, 1222 insertions(+), 1 deletion(-)
>   create mode 100644 contrib/plugins/uftrace.c
>   create mode 100755 contrib/plugins/uftrace_symbols.py
> 

Sent v6:
https://lore.kernel.org/qemu-devel/20250808204156.659715-1-pierrick.bouvier@linaro.org/


^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2025-08-08 20:45 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-08  2:06 [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier
2025-08-08  2:06 ` [PATCH v5 1/9] contrib/plugins/uftrace: skeleton file Pierrick Bouvier
2025-08-08  8:14   ` Manos Pitsidianakis
2025-08-08 16:19     ` Pierrick Bouvier
2025-08-08 16:34       ` Manos Pitsidianakis
2025-08-08 17:28     ` Pierrick Bouvier
2025-08-08  2:06 ` [PATCH v5 2/9] contrib/plugins/uftrace: define cpu operations and implement aarch64 Pierrick Bouvier
2025-08-08  8:28   ` Manos Pitsidianakis
2025-08-08 16:27     ` Pierrick Bouvier
2025-08-08 16:36       ` Manos Pitsidianakis
2025-08-08  2:06 ` [PATCH v5 3/9] contrib/plugins/uftrace: track callstack Pierrick Bouvier
2025-08-08  8:49   ` Manos Pitsidianakis
2025-08-08 16:36     ` Pierrick Bouvier
2025-08-08  2:06 ` [PATCH v5 4/9] contrib/plugins/uftrace: implement tracing Pierrick Bouvier
2025-08-08  9:11   ` Manos Pitsidianakis
2025-08-08 16:38     ` Pierrick Bouvier
2025-08-08 18:03     ` Pierrick Bouvier
2025-08-08 18:13       ` Manos Pitsidianakis
2025-08-08 18:23         ` Pierrick Bouvier
2025-08-08 18:29           ` Manos Pitsidianakis
2025-08-08 18:24     ` Pierrick Bouvier
2025-08-08  2:06 ` [PATCH v5 5/9] contrib/plugins/uftrace: implement privilege level tracing Pierrick Bouvier
2025-08-08  9:26   ` Manos Pitsidianakis
2025-08-08 16:41     ` Pierrick Bouvier
2025-08-08  2:06 ` [PATCH v5 6/9] contrib/plugins/uftrace: generate additional files for uftrace Pierrick Bouvier
2025-08-08  9:37   ` Manos Pitsidianakis
2025-08-08 16:42     ` Pierrick Bouvier
2025-08-08  2:07 ` [PATCH v5 7/9] contrib/plugins/uftrace: implement x64 support Pierrick Bouvier
2025-08-08  9:42   ` Manos Pitsidianakis
2025-08-08 16:52     ` Pierrick Bouvier
2025-08-08  2:07 ` [PATCH v5 8/9] contrib/plugins/uftrace_symbols.py Pierrick Bouvier
2025-08-08  2:07 ` [PATCH v5 9/9] contrib/plugins/uftrace: add documentation Pierrick Bouvier
2025-08-08  9:46   ` Manos Pitsidianakis
2025-08-08 16:59     ` Pierrick Bouvier
2025-08-08 20:45 ` [PATCH v5 0/9] contrib/plugins: uftrace Pierrick Bouvier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).