* [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
@ 2026-03-18 10:46 Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 1/9] target/arm: Add API for dynamic exception injection Ruslan Ruslichenko
` (9 more replies)
0 siblings, 10 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
This patch series is submitted as an RFC to gather early feedback on a Fault Injection (FI) framework built on top of the QEMU TCG plugin subsystem.
Motivation
Testing guest operating systems, hypervisors (like Xen), and low-level drivers against unexpected hardware failures can be difficult.
This series provides an interface to inject faults dynamically without altering QEMU's core emulation source code for every test case.
Architecture & Key Features
The series introduces the core API extensions and implements a fault injection plugin (contrib/plugins/fault_injection.c) targeting AArch64.
The plugin can be controlled statically via an XML configuration at boot, or dynamically at runtime via a UNIX socket (enabling integration with automated testing frameworks via Python or GDB).
New Plugin API Capabilities:
MMIO Interception: Allows plugins to hook into memory_region_dispatch_read/write to modify hardware register reads or drop writes.
Asynchronous Timers: Exposes QEMU_CLOCK_VIRTUAL to plugins, allowing callbacks to be scheduled based on guest virtual time.
TB Cache Flushing: Exposes qemu_plugin_flush_tb_cache() so plugins can force re-translation when applying dynamic PC-based hooks.
Interrupt & Exception Injection: Exposes APIs to raise/pulse hardware IRQs on the primary INTC and inject CPU exceptions (e.g., SErrors).
Custom Device Faults: Introduces a registry where device models (e.g., SMMUv3) can expose specific fault handlers (like CMDQ errors) to be triggered externally by plugins.
Patch Summary
Patch 1 (target/arm): Adds support for asynchronous CPU exception injection.
Patches 2-3 (plugins/api): Expose virtual clock timers and TB cache flushing through the public plugin API.
Patch 4 (plugins): Introduces the core fault injection subsystem, IRQ/Exception routing, and the Custom Fault registry.
Patch 5 (system/memory): Adds the MMIO override hooks into the memory dispatch path.
Patch 6 (hw/intc): Registers the ARM GIC (v2/v3) with the plugin subsystem to enable direct hardware IRQ injection.
Patch 7 (hw/arm): Registers the SMMUv3 with the custom fault registry to demonstrate how device models can expose specific errors (like CMDQ faults) to plugins.
Patch 8 (contrib/plugins): Implements the actual fault_injection plugin using the new APIs.
Patch 9 (docs): Adds documentation and usage examples for the plugin.
Request for Comments & Feedback
Any suggestions on improvements, potential edge cases, or issues with the current design are highly welcome.
Ruslan Ruslichenko (9):
target/arm: Add API for dynamic exception injection
plugins/api: Expose virtual clock timers to plugins
plugins: Expose Transaction Block cache flush API to plugins
plugins: Introduce fault injection API and core subsystem
system/memory: Add plugin callbacks to intercept MMIO accesses
hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
contrib/plugins: Add fault injection plugin
docs: Add description of fault-injection plugin and subsystem
contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
contrib/plugins/meson.build | 1 +
docs/fault-injection.txt | 111 +++++
hw/arm/smmuv3.c | 54 +++
hw/intc/arm_gic.c | 28 ++
hw/intc/arm_gicv3.c | 28 ++
include/plugins/qemu-plugin.h | 28 ++
include/qemu/plugin.h | 39 ++
plugins/api.c | 62 +++
plugins/core.c | 11 +
plugins/fault.c | 116 +++++
plugins/meson.build | 1 +
plugins/plugin.h | 2 +
system/memory.c | 8 +
target/arm/cpu.h | 4 +
target/arm/helper.c | 55 +++
16 files changed, 1320 insertions(+)
create mode 100644 contrib/plugins/fault_injection.c
create mode 100644 docs/fault-injection.txt
create mode 100644 plugins/fault.c
--
2.43.0
* [RFC PATCH 1/9] target/arm: Add API for dynamic exception injection
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 2/9] plugins/api: Expose virtual clock timers to plugins Ruslan Ruslichenko
` (8 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
Implement arm_cpu_inject_exception() to allow external clients,
such as QEMU plugins or asynchronous timers, to inject exceptions
into the ARM guest.
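
The two call paths described above (direct setup on a vCPU thread, deferred setup otherwise) can be modelled outside QEMU. The sketch below is a standalone, single-threaded stand-in and not the code from this patch: `ModelCPU`, `PendingExcp`, `model_inject()` and `model_run_pending()` are hypothetical names, a NULL `current` pointer plays the role of `current_cpu == NULL`, and the final `cpu_loop_exit()` of the real direct path is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-CPU exception state. */
typedef struct {
    int exception_index;
    uint32_t syndrome;
} ModelCPU;

/* Heap-allocated payload, like FIExcpAsync in the patch. */
typedef struct PendingExcp {
    int excp_index;
    uint32_t syndrome;
    struct PendingExcp *next;
} PendingExcp;

static PendingExcp *pending_head, *pending_tail;

static void model_setup_exception(ModelCPU *cpu, int excp_index,
                                  uint32_t syndrome)
{
    cpu->exception_index = excp_index;
    cpu->syndrome = syndrome;
}

/* current == NULL models a call from outside any vCPU thread. */
static void model_inject(ModelCPU *current, int excp_index, uint32_t syndrome)
{
    if (current) {
        model_setup_exception(current, excp_index, syndrome);
        return;
    }

    /* Deferred path: queue the request in FIFO order. */
    PendingExcp *p = malloc(sizeof(*p));
    assert(p);
    p->excp_index = excp_index;
    p->syndrome = syndrome;
    p->next = NULL;
    if (pending_tail) {
        pending_tail->next = p;
    } else {
        pending_head = p;
    }
    pending_tail = p;
}

/* Runs "on the vCPU": apply and free every queued request. */
static void model_run_pending(ModelCPU *cpu)
{
    while (pending_head) {
        PendingExcp *p = pending_head;
        pending_head = p->next;
        if (!pending_head) {
            pending_tail = NULL;
        }
        model_setup_exception(cpu, p->excp_index, p->syndrome);
        free(p);
    }
}
```

In the patch itself the deferred path always targets CPU 0 (`qemu_get_cpu(0)`); the model has only one CPU, so that choice is not visible here.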
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
target/arm/cpu.h | 4 ++++
target/arm/helper.c | 55 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 59 insertions(+)
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 657ff4ab20..f1d2d6e240 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2680,4 +2680,8 @@ extern const uint64_t pred_esz_masks[5];
#define LOG2_TAG_GRANULE 4
#define TAG_GRANULE (1 << LOG2_TAG_GRANULE)
+#ifndef CONFIG_USER_ONLY
+void arm_cpu_inject_exception(int excp_index, uint32_t syndrome);
+#endif
+
#endif
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 8c5769477c..73df3a9e6e 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10241,4 +10241,59 @@ ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env)
return ARMSS_NonSecure;
}
}
+
+typedef struct {
+ int excp_index;
+ uint32_t syndrome;
+} FIExcpAsync;
+
+static void fi_setup_exception(CPUState *cs, int excp_index, uint32_t syndrome)
+{
+ CPUARMState *env = cpu_env(cs);
+
+ cs->exception_index = excp_index;
+ env->exception.syndrome = syndrome;
+ env->exception.vaddress = env->pc;
+
+ if (excp_index == EXCP_VSERR) {
+ /* Serror syndrome constructed from vsesr_el2 */
+ env->cp15.vsesr_el2 = syndrome;
+ }
+
+ env->exception.target_el = arm_current_el(env);
+}
+
+static void arm_cpu_inject_exception_async(CPUState *cs, run_on_cpu_data data)
+{
+ FIExcpAsync *excp_data = (FIExcpAsync *)data.host_ptr;
+
+ fi_setup_exception(cs, excp_data->excp_index, excp_data->syndrome);
+
+ g_free(excp_data);
+}
+
+void arm_cpu_inject_exception(int excp_index, uint32_t syndrome)
+{
+ CPUState *cs = current_cpu;
+
+ if (!cs) {
+ /* If called outside a CPU thread (timer callback, etc.), schedule async */
+ run_on_cpu_data async_data;
+ CPUState *cs0 = qemu_get_cpu(0);
+
+ FIExcpAsync *excp_data = g_new0(FIExcpAsync, 1);
+
+ excp_data->excp_index = excp_index;
+ excp_data->syndrome = syndrome;
+
+ async_data.host_ptr = excp_data;
+
+ async_run_on_cpu(cs0, arm_cpu_inject_exception_async, async_data);
+ return;
+ }
+
+ fi_setup_exception(cs, excp_index, syndrome);
+
+ cpu_loop_exit(cs);
+}
#endif /* !CONFIG_USER_ONLY */
--
2.43.0
* [RFC PATCH 2/9] plugins/api: Expose virtual clock timers to plugins
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 1/9] target/arm: Add API for dynamic exception injection Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 3/9] plugins: Expose Transaction Block cache flush API " Ruslan Ruslichenko
` (7 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
This patch extends the QEMU plugin API to allow setting timers
on the guest's virtual clock (QEMU_CLOCK_VIRTUAL).
It introduces qemu_plugin_timer_virt_ns(), which schedules
a one-shot callback.
The patch also declares qemu_plugin_get_virtual_clock_ns(),
which can be used to query the current virtual time (its
definition is added to plugins/api.c in patch 4 of this series).
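
Because the implementation in this patch passes `time` straight to timer_mod(), the argument is an absolute expiry on the virtual clock, so a relative delay has to be written as "now + delay". The sketch below models that one-shot behaviour without QEMU. It is an illustration under stated assumptions: `OneShot`, `model_timer_virt_ns()`, `model_clock_advance()` and `count_fire()` are hypothetical stand-ins, and `virt_now_ns` replaces the real clock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*timer_cb_fn)(void *opaque);

typedef struct {
    uint64_t deadline_ns;   /* absolute virtual time */
    timer_cb_fn cb;
    void *opaque;
    bool armed;
} OneShot;

#define MAX_TIMERS 8
static OneShot timers[MAX_TIMERS];
static uint64_t virt_now_ns;    /* stand-in for the guest virtual clock */

/* Stand-in for qemu_plugin_timer_virt_ns(): arm a one-shot timer. */
static bool model_timer_virt_ns(uint64_t deadline_ns, timer_cb_fn cb,
                                void *opaque)
{
    assert(cb);
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        if (!timers[i].armed) {
            timers[i] = (OneShot){ deadline_ns, cb, opaque, true };
            return true;
        }
    }
    return false;   /* no free slot */
}

/* Advance the model clock and fire each expired timer exactly once. */
static void model_clock_advance(uint64_t delta_ns)
{
    virt_now_ns += delta_ns;
    for (size_t i = 0; i < MAX_TIMERS; i++) {
        if (timers[i].armed && timers[i].deadline_ns <= virt_now_ns) {
            timers[i].armed = false;    /* disarm before the callback runs */
            timers[i].cb(timers[i].opaque);
        }
    }
}

/* Example callback: count how often it was invoked. */
static void count_fire(void *opaque)
{
    ++*(int *)opaque;
}
```

With the real API, the equivalent of `model_timer_virt_ns(virt_now_ns + 1000, ...)` would presumably be a call that adds the delay to the value returned by qemu_plugin_get_virtual_clock_ns().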
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
include/plugins/qemu-plugin.h | 6 ++++++
plugins/api.c | 29 +++++++++++++++++++++++++++++
2 files changed, 35 insertions(+)
diff --git a/include/plugins/qemu-plugin.h b/include/plugins/qemu-plugin.h
index 17a834dca9..bbd21e79c5 100644
--- a/include/plugins/qemu-plugin.h
+++ b/include/plugins/qemu-plugin.h
@@ -1246,6 +1246,12 @@ void qemu_plugin_u64_set(qemu_plugin_u64 entry, unsigned int vcpu_index,
QEMU_PLUGIN_API
uint64_t qemu_plugin_u64_sum(qemu_plugin_u64 entry);
+QEMU_PLUGIN_API
+uint64_t qemu_plugin_get_virtual_clock_ns(void);
+
+QEMU_PLUGIN_API
+void qemu_plugin_timer_virt_ns(uint64_t time, void (*cb)(void*), void *opaque);
+
#ifdef __cplusplus
} /* extern "C" */
#endif
diff --git a/plugins/api.c b/plugins/api.c
index 04ca7da7f1..609ea69293 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -39,6 +39,7 @@
#include "qemu/main-loop.h"
#include "qemu/plugin.h"
#include "qemu/log.h"
+#include "qemu/timer.h"
#include "system/memory.h"
#include "tcg/tcg.h"
#include "exec/gdbstub.h"
@@ -652,3 +653,31 @@ uint64_t qemu_plugin_u64_sum(qemu_plugin_u64 entry)
return total;
}
+typedef struct {
+ void (*cb)(void *opaque);
+ void* opaque;
+ QEMUTimer *timer;
+} qemu_plugin_timer_data;
+
+static void timer_cb(void* opaque)
+{
+ qemu_plugin_timer_data *data = (qemu_plugin_timer_data*)opaque;
+
+ data->cb(data->opaque);
+
+ timer_free(data->timer);
+ g_free(data);
+}
+
+QEMU_PLUGIN_API
+void qemu_plugin_timer_virt_ns(uint64_t time, void (*cb)(void*), void *opaque)
+{
+ qemu_plugin_timer_data* data = g_new0(qemu_plugin_timer_data, 1);
+
+ data->cb = cb;
+ data->opaque = opaque;
+
+ data->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, timer_cb, data);
+
+ timer_mod(data->timer, time);
+}
--
2.43.0
* [RFC PATCH 3/9] plugins: Expose Transaction Block cache flush API to plugins
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 1/9] target/arm: Add API for dynamic exception injection Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 2/9] plugins/api: Expose virtual clock timers to plugins Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 4/9] plugins: Introduce fault injection API and core subsystem Ruslan Ruslichenko
` (6 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
This patch adds qemu_plugin_flush_tb_cache() to the plugin API,
allowing plugins to invalidate QEMU's translated code cache.
If a plugin dynamically registers a new instruction or memory
callback, the new hooks may not be triggered for code blocks that
QEMU has already translated and cached. This API lets QEMU
re-translate those TBs so that newly applied hooks take effect.
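
The staleness problem described above can be shown with a toy translation cache. The sketch below is a standalone model, not QEMU code: `ModelTB`, `model_tb_lookup()`, `model_tb_flush()` and `model_set_hook()` are hypothetical names, and the decision "does this block carry a hook" is made once, at translation time, which is the property that makes a flush necessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t pc;
    bool valid;
    bool has_hook;      /* decided once, when the block is translated */
} ModelTB;

#define TB_SLOTS 16
static ModelTB tb_cache[TB_SLOTS];
static uint64_t hooked_pc;      /* a single hook address, for brevity */
static bool hook_enabled;

/* Return the cached block for pc, translating it on a miss. */
static const ModelTB *model_tb_lookup(uint64_t pc)
{
    ModelTB *tb = &tb_cache[(pc >> 2) % TB_SLOTS];

    if (!tb->valid || tb->pc != pc) {
        tb->pc = pc;
        tb->valid = true;
        tb->has_hook = hook_enabled && pc == hooked_pc;
    }
    return tb;
}

/* Stand-in for a full flush: every block must be translated again. */
static void model_tb_flush(void)
{
    for (size_t i = 0; i < TB_SLOTS; i++) {
        tb_cache[i].valid = false;
    }
}

/* Register a hook after some code may already have been translated. */
static void model_set_hook(uint64_t pc)
{
    hooked_pc = pc;
    hook_enabled = true;
}
```

Looking up an address before and after `model_set_hook()` returns the same hook-less block until `model_tb_flush()` is called, which mirrors why a plugin applying a PC-based hook at runtime would want to request a flush.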
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
include/plugins/qemu-plugin.h | 3 +++
plugins/api.c | 6 ++++++
plugins/core.c | 11 +++++++++++
plugins/plugin.h | 2 ++
4 files changed, 22 insertions(+)
diff --git a/include/plugins/qemu-plugin.h b/include/plugins/qemu-plugin.h
index bbd21e79c5..a68427536f 100644
--- a/include/plugins/qemu-plugin.h
+++ b/include/plugins/qemu-plugin.h
@@ -1246,6 +1246,9 @@ void qemu_plugin_u64_set(qemu_plugin_u64 entry, unsigned int vcpu_index,
QEMU_PLUGIN_API
uint64_t qemu_plugin_u64_sum(qemu_plugin_u64 entry);
+QEMU_PLUGIN_API
+void qemu_plugin_flush_tb_cache(void);
+
QEMU_PLUGIN_API
uint64_t qemu_plugin_get_virtual_clock_ns(void);
diff --git a/plugins/api.c b/plugins/api.c
index 609ea69293..fa650e1219 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -653,6 +653,12 @@ uint64_t qemu_plugin_u64_sum(qemu_plugin_u64 entry)
return total;
}
+QEMU_PLUGIN_API
+void qemu_plugin_flush_tb_cache(void)
+{
+ plugin_flush_tb_cache();
+}
+
typedef struct {
void (*cb)(void *opaque);
void* opaque;
diff --git a/plugins/core.c b/plugins/core.c
index 42fd986593..462f4bae81 100644
--- a/plugins/core.c
+++ b/plugins/core.c
@@ -21,6 +21,7 @@
#include "qemu/rcu.h"
#include "exec/tb-flush.h"
#include "tcg/tcg-op-common.h"
+#include "qemu/main-loop.h"
#include "plugin.h"
struct qemu_plugin_cb {
@@ -888,3 +889,13 @@ enum qemu_plugin_cb_flags tcg_call_to_qemu_plugin_cb_flags(int flags)
return QEMU_PLUGIN_CB_RW_REGS;
}
}
+
+void plugin_flush_tb_cache(void)
+{
+ CPUState *cpu = qemu_get_cpu(0);
+ if (cpu) {
+ queue_tb_flush(cpu);
+
+ qemu_cpu_kick(cpu);
+ }
+}
diff --git a/plugins/plugin.h b/plugins/plugin.h
index 6fbc443b96..0bf819536b 100644
--- a/plugins/plugin.h
+++ b/plugins/plugin.h
@@ -125,4 +125,6 @@ void plugin_scoreboard_free(struct qemu_plugin_scoreboard *score);
*/
void qemu_plugin_fillin_mode_info(qemu_info_t *info);
+void plugin_flush_tb_cache(void);
+
#endif /* PLUGIN_H */
--
2.43.0
* [RFC PATCH 4/9] plugins: Introduce fault injection API and core subsystem
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
` (2 preceding siblings ...)
2026-03-18 10:46 ` [RFC PATCH 3/9] plugins: Expose Transaction Block cache flush API " Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 5/9] system/memory: Add plugin callbacks to intercept MMIO accesses Ruslan Ruslichenko
` (5 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
This patch adds the fault injection API infrastructure to plugins.
The following capabilities are introduced:
- MMIO overrides: Allows plugins to register callbacks that intercept
MMIO accesses.
- IRQ injection: Provides a mechanism to raise or pulse hardware IRQs
directly on the primary interrupt controller.
- CPU exception injection: Provides an API for injecting CPU exceptions.
Currently only ARM targets are supported.
- Custom Fault Registry: Implements a registry allowing QEMU device
models to expose custom, device-specific fault handlers. Plugins
can trigger these dynamically by name.
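
The name-based registry in the last item can be sketched without GLib. The following is a simplified standalone model, not the code in plugins/fault.c: `custom_fault_cb` mirrors the shape of `plugin_custom_fault_cb`, while `fault_register()`, `fault_trigger()` and `demo_cmdq_fault()` are hypothetical. Unlike the patch, the model uses a fixed-size table, stores the name pointer without copying it, and reports through its return value whether a handler was found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Same shape as plugin_custom_fault_cb in this patch (stand-in type). */
typedef void (*custom_fault_cb)(void *target_data, void *fault_data);

#define MAX_FAULTS 16
static struct {
    const char *name;   /* must outlive the table entry */
    custom_fault_cb cb;
} fault_table[MAX_FAULTS];
static size_t n_faults;

/* Device-model side: publish a handler under a name. */
static bool fault_register(const char *name, custom_fault_cb cb)
{
    assert(name && cb);
    if (n_faults == MAX_FAULTS) {
        return false;
    }
    fault_table[n_faults].name = name;
    fault_table[n_faults].cb = cb;
    n_faults++;
    return true;
}

/* Plugin side: look the handler up by name and call it if present. */
static bool fault_trigger(const char *name, void *target_data,
                          void *fault_data)
{
    for (size_t i = 0; i < n_faults; i++) {
        if (strcmp(fault_table[i].name, name) == 0) {
            fault_table[i].cb(target_data, fault_data);
            return true;
        }
    }
    return false;   /* unknown name: nothing happens */
}

/* Hypothetical handler that records the error code it was given. */
static int last_injected_error;

static void demo_cmdq_fault(void *target_data, void *fault_data)
{
    (void)target_data;
    last_injected_error = *(int *)fault_data;
}
```

Both data arguments are untyped pointers, so the caller and the handler have to agree on their layout out of band; the model inherits that property from the API in this patch.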
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
include/plugins/qemu-plugin.h | 19 ++++++
include/qemu/plugin.h | 39 ++++++++++++
plugins/api.c | 27 ++++++++
plugins/fault.c | 116 ++++++++++++++++++++++++++++++++++
plugins/meson.build | 1 +
5 files changed, 202 insertions(+)
create mode 100644 plugins/fault.c
diff --git a/include/plugins/qemu-plugin.h b/include/plugins/qemu-plugin.h
index a68427536f..96e2787788 100644
--- a/include/plugins/qemu-plugin.h
+++ b/include/plugins/qemu-plugin.h
@@ -1255,6 +1255,25 @@ uint64_t qemu_plugin_get_virtual_clock_ns(void);
QEMU_PLUGIN_API
void qemu_plugin_timer_virt_ns(uint64_t time, void (*cb)(void*), void *opaque);
+typedef bool (*qemu_plugin_mmio_override_cb_t)(uint64_t hwaddr,
+ unsigned size,
+ bool is_write,
+ uint64_t *value);
+
+QEMU_PLUGIN_API
+void qemu_plugin_register_mmio_override_cb(qemu_plugin_id_t id,
+ qemu_plugin_mmio_override_cb_t cb);
+
+QEMU_PLUGIN_API
+void qemu_plugin_inject_irq(int irq_num, int cpu, bool pulse);
+
+QEMU_PLUGIN_API
+void qemu_plugin_inject_exception(int excp_index, uint32_t data);
+
+QEMU_PLUGIN_API
+void qemu_plugin_trigger_custom_fault(const char *fault_name, void *target_data,
+ void *fault_data);
+
#ifdef __cplusplus
} /* extern "C" */
#endif
diff --git a/include/qemu/plugin.h b/include/qemu/plugin.h
index ddd77bd82c..4cb01b2125 100644
--- a/include/qemu/plugin.h
+++ b/include/qemu/plugin.h
@@ -43,6 +43,11 @@ get_plugin_meminfo_rw(qemu_plugin_meminfo_t i)
return i >> 16;
}
+typedef void (*plugin_irq_inject_cb) (void *opaque, int irq,
+ int cpu, bool pulse);
+
+typedef void (*plugin_custom_fault_cb)(void *target_data, void *fault_data);
+
#ifdef CONFIG_PLUGIN
extern QemuOptsList qemu_plugin_opts;
@@ -234,6 +239,27 @@ static inline enum qemu_plugin_cb_flags qemu_plugin_get_cb_flags(void)
return current_cpu->neg.plugin_cb_flags;
}
+void plugin_register_mmio_override_cb(qemu_plugin_id_t id,
+ qemu_plugin_mmio_override_cb_t cb);
+
+bool plugin_mmio_override_cb_invoke(uint64_t hwaddr,
+ uint64_t size,
+ bool is_write,
+ uint64_t* data);
+
+void plugin_register_intc(void *opaque, plugin_irq_inject_cb cb);
+
+void plugin_inject_irq(int irq_num, int cpu, bool pulse);
+
+void plugin_inject_exception(int excp_index, uint32_t data);
+
+void plugin_register_custom_fault(const char *fault_name,
+ plugin_custom_fault_cb cb);
+
+void plugin_trigger_custom_fault(const char* fault_name, void *target_data,
+ void *fault_data);
+
+
#else /* !CONFIG_PLUGIN */
static inline void qemu_plugin_add_opts(void)
@@ -324,6 +350,19 @@ static inline void qemu_plugin_user_prefork_lock(void)
static inline void qemu_plugin_user_postfork(bool is_child)
{ }
+static inline bool plugin_mmio_override_cb_invoke(uint64_t hwaddr,
+ uint64_t size,
+ bool is_write,
+ uint64_t *data)
+{ return false; }
+
+static inline void plugin_register_intc(void *opaque, plugin_irq_inject_cb cb)
+{ }
+
+static inline void plugin_register_custom_fault(const char *fault_name,
+ plugin_custom_fault_cb cb)
+{ }
+
#endif /* !CONFIG_PLUGIN */
#endif /* QEMU_PLUGIN_H */
diff --git a/plugins/api.c b/plugins/api.c
index fa650e1219..0adeaa0bc3 100644
--- a/plugins/api.c
+++ b/plugins/api.c
@@ -687,3 +687,30 @@ void qemu_plugin_timer_virt_ns(uint64_t time, void (*cb)(void*), void *opaque)
timer_mod(data->timer, time);
}
+
+uint64_t qemu_plugin_get_virtual_clock_ns(void)
+{
+ return qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
+}
+
+void qemu_plugin_register_mmio_override_cb(qemu_plugin_id_t id,
+ qemu_plugin_mmio_override_cb_t cb)
+{
+ plugin_register_mmio_override_cb(id, cb);
+}
+
+void qemu_plugin_inject_irq(int irq_num, int cpu, bool pulse)
+{
+ plugin_inject_irq(irq_num, cpu, pulse);
+}
+
+void qemu_plugin_inject_exception(int excp_index, uint32_t data)
+{
+ plugin_inject_exception(excp_index, data);
+}
+
+void qemu_plugin_trigger_custom_fault(const char *fault_name,
+ void *target_data, void *fault_data)
+{
+ plugin_trigger_custom_fault(fault_name, target_data, fault_data);
+}
diff --git a/plugins/fault.c b/plugins/fault.c
new file mode 100644
index 0000000000..8f7c1e1333
--- /dev/null
+++ b/plugins/fault.c
@@ -0,0 +1,116 @@
+/*
+ * Fault Injection Core Subsystem
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "cpu.h"
+#include "qemu/main-loop.h"
+#include "hw/core/irq.h"
+#include "qemu/plugin.h"
+
+typedef struct {
+ qemu_plugin_id_t id;
+ qemu_plugin_mmio_override_cb_t cb;
+} MMIOOverrideEntry;
+
+static GArray *mmio_callbacks = NULL;
+
+void *intc_opaque;
+static plugin_irq_inject_cb irq_inject_cb = NULL;
+
+static GHashTable *fault_registry = NULL;
+
+void plugin_register_mmio_override_cb(qemu_plugin_id_t id,
+ qemu_plugin_mmio_override_cb_t cb)
+{
+ if (!mmio_callbacks) {
+ mmio_callbacks = g_array_new(FALSE, FALSE,
+ sizeof(MMIOOverrideEntry));
+ }
+
+ MMIOOverrideEntry entry = { .id = id, .cb = cb };
+ g_array_append_val(mmio_callbacks, entry);
+}
+
+bool plugin_mmio_override_cb_invoke(uint64_t hwaddr,
+ uint64_t size,
+ bool is_write,
+ uint64_t* data)
+{
+ if (!mmio_callbacks) {
+ return false;
+ }
+
+ for (int i = 0; i < mmio_callbacks->len; ++i) {
+ MMIOOverrideEntry *entry = &g_array_index(mmio_callbacks,
+ MMIOOverrideEntry, i);
+ if (entry->cb(hwaddr, size, is_write, data)) {
+ /* Stop on first match */
+ return true;
+ }
+ }
+
+ return false;
+}
+
+void plugin_register_intc(void *opaque, plugin_irq_inject_cb cb)
+{
+ intc_opaque = opaque;
+ irq_inject_cb = cb;
+}
+
+void plugin_inject_irq(int irq_num, int cpu, bool pulse)
+{
+ if (!irq_inject_cb) {
+ return;
+ }
+
+ bool locked = bql_locked();
+
+ if (!locked) {
+ bql_lock();
+ }
+
+ irq_inject_cb(intc_opaque, irq_num, cpu, pulse);
+
+ if (!locked) {
+ bql_unlock();
+ }
+}
+
+void plugin_inject_exception(int excp_index, uint32_t data)
+{
+#if defined (TARGET_ARM)
+ arm_cpu_inject_exception(excp_index, data);
+#else
+ qemu_log_mask(LOG_UNIMP,
+ "FI: Injecting exception is not supported for this target\n");
+#endif
+}
+
+void plugin_register_custom_fault(const char *fault_name,
+ plugin_custom_fault_cb cb){
+ if (!fault_registry) {
+ fault_registry = g_hash_table_new_full(g_str_hash, g_str_equal,
+ g_free, NULL);
+ }
+
+ g_hash_table_insert(fault_registry, g_strdup(fault_name), cb);
+}
+
+void plugin_trigger_custom_fault(const char* fault_name, void *target_data,
+ void *fault_data)
+{
+ plugin_custom_fault_cb cb = NULL;
+
+ if (fault_registry) {
+ cb = g_hash_table_lookup(fault_registry, fault_name);
+ }
+
+ if (cb) {
+ cb(target_data, fault_data);
+ }
+}
diff --git a/plugins/meson.build b/plugins/meson.build
index 9899f166ee..8995ce5977 100644
--- a/plugins/meson.build
+++ b/plugins/meson.build
@@ -86,3 +86,4 @@ system_ss.add(files('api.c', 'core.c'))
common_ss.add(files('loader.c'))
+specific_ss.add(files('fault.c'))
--
2.43.0
* [RFC PATCH 5/9] system/memory: Add plugin callbacks to intercept MMIO accesses
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
` (3 preceding siblings ...)
2026-03-18 10:46 ` [RFC PATCH 4/9] plugins: Introduce fault injection API and core subsystem Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 6/9] hw/intc/arm_gic: Register primary GIC for plugin IRQ injection Ruslan Ruslichenko
` (4 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
Add plugin callbacks to memory_region_dispatch_read/write,
allowing plugins to intercept MMIO operations before they reach
device models. This makes it possible to spoof read values and
drop write accesses.
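
The interception order can be modelled outside QEMU as a chain of callbacks in which the first one to return true claims the access. The sketch below is a standalone stand-in rather than the series' code: `mmio_override_cb_t` mirrors the shape of `qemu_plugin_mmio_override_cb_t` from patch 4, while `register_override()`, `override_invoke()`, `spoof_status_reg()` and the address `FAKE_REG_ADDR` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as qemu_plugin_mmio_override_cb_t in patch 4 (stand-in type). */
typedef bool (*mmio_override_cb_t)(uint64_t hwaddr, unsigned size,
                                   bool is_write, uint64_t *value);

#define MAX_OVERRIDES 8
static mmio_override_cb_t overrides[MAX_OVERRIDES];
static size_t n_overrides;

/* Hypothetical helper: append a callback; false when the table is full. */
static bool register_override(mmio_override_cb_t cb)
{
    assert(cb != NULL);
    if (n_overrides == MAX_OVERRIDES) {
        return false;
    }
    overrides[n_overrides++] = cb;
    return true;
}

/* Walk callbacks in registration order; the first "true" claims the access. */
static bool override_invoke(uint64_t hwaddr, unsigned size,
                            bool is_write, uint64_t *value)
{
    for (size_t i = 0; i < n_overrides; i++) {
        if (overrides[i](hwaddr, size, is_write, value)) {
            return true;    /* the device model would be skipped */
        }
    }
    return false;           /* fall through to the device model */
}

#define FAKE_REG_ADDR 0x09000018ULL     /* hypothetical register address */

/* Example callback: spoof reads of one register and drop writes to it. */
static bool spoof_status_reg(uint64_t hwaddr, unsigned size,
                             bool is_write, uint64_t *value)
{
    (void)size;
    if (hwaddr != FAKE_REG_ADDR) {
        return false;
    }
    if (!is_write) {
        *value = 0xdeadbeef;    /* value the guest would observe */
    }
    return true;    /* for writes: claimed, so the device never sees it */
}
```

A callback that returns true for a write without touching `*value` is how a "dropped write" is expressed in this scheme: the dispatcher reports success and the device model is never called.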
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
system/memory.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/system/memory.c b/system/memory.c
index c51d0798a8..67a59f6e0a 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -35,6 +35,7 @@
#include "hw/core/boards.h"
#include "migration/vmstate.h"
#include "system/address-spaces.h"
+#include "qemu/plugin.h"
#include "memory-internal.h"
@@ -1448,6 +1449,10 @@ static MemTxResult memory_region_dispatch_read1(MemoryRegion *mr,
{
*pval = 0;
+
+ if (plugin_mmio_override_cb_invoke(mr->addr + addr, size, false, pval))
+ return MEMTX_OK;
+
if (mr->ops->read) {
return access_with_adjusted_size(addr, pval, size,
mr->ops->impl.min_access_size,
@@ -1533,6 +1538,9 @@ MemTxResult memory_region_dispatch_write(MemoryRegion *mr,
adjust_endianness(mr, &data, op);
+ if (plugin_mmio_override_cb_invoke(mr->addr + addr, size, true, &data))
+ return MEMTX_OK;
+
/*
* FIXME: it's not clear why under KVM the write would be processed
* directly, instead of going through eventfd. This probably should
--
2.43.0
* [RFC PATCH 6/9] hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
` (4 preceding siblings ...)
2026-03-18 10:46 ` [RFC PATCH 5/9] system/memory: Add plugin callbacks to intercept MMIO accesses Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 7/9] hw/arm/smmuv3: Add plugin fault handler for CMDQ errors Ruslan Ruslichenko
` (3 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
Call plugin_register_intc() at the end of realization
for both ARM GICv2 and GICv3.
This links the system's primary interrupt controller to the
plugin subsystem, so that plugins can inject hardware IRQs
using the generic qemu_plugin_inject_irq() API.
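
The injection helpers in this patch translate an interrupt number into a GPIO input index before raising the line. The arithmetic can be checked in isolation with the sketch below. `gic_gpio_index()` is a hypothetical helper written for this illustration; the formula follows gic_plugin_irq_inject() in the diff, and `GIC_INTERNAL` is taken to be 32 as in QEMU's GIC model.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_INTERNAL 32     /* private interrupts (SGIs + PPIs) per CPU */

/*
 * Map an interrupt ID to a GPIO input index using the same arithmetic
 * as gic_plugin_irq_inject() in this patch.
 */
static uint32_t gic_gpio_index(uint32_t num_irq, uint32_t num_cpu,
                               uint32_t irq, uint32_t cpu)
{
    if (irq >= GIC_INTERNAL) {
        /* Shared interrupt: SPI lines start at index 0. */
        assert(irq < num_irq);
        return irq - GIC_INTERNAL;
    }

    /* Private interrupt: per-CPU banks follow the SPI lines. */
    assert(cpu < num_cpu);
    return (num_irq - GIC_INTERNAL) + cpu * GIC_INTERNAL + irq;
}
```

For a controller with `num_irq = 96` (64 shared lines) and two CPUs, interrupt 33 maps to index 1, while private interrupt 27 maps to 64 + 27 on CPU 0 and to 64 + 32 + 27 on CPU 1.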
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
hw/intc/arm_gic.c | 28 ++++++++++++++++++++++++++++
hw/intc/arm_gicv3.c | 28 ++++++++++++++++++++++++++++
2 files changed, 56 insertions(+)
diff --git a/hw/intc/arm_gic.c b/hw/intc/arm_gic.c
index 4d4b79e6f3..aef39b3ef7 100644
--- a/hw/intc/arm_gic.c
+++ b/hw/intc/arm_gic.c
@@ -29,6 +29,8 @@
#include "trace.h"
#include "system/kvm.h"
#include "system/qtest.h"
+#include "qemu/plugin.h"
+
/* #define DEBUG_GIC */
@@ -2096,6 +2098,31 @@ static const MemoryRegionOps gic_viface_ops = {
.endianness = DEVICE_LITTLE_ENDIAN,
};
+static void gic_plugin_irq_inject(void *opaque, int irq, int cpu, bool pulse)
+{
+ DeviceState *dev = opaque;
+ GICState *s = ARM_GIC(dev);
+
+ qemu_irq gic_irq;
+
+ if (irq >= GIC_INTERNAL) {
+ assert(irq < s->num_irq);
+
+ gic_irq = qdev_get_gpio_in(dev, irq - GIC_INTERNAL);
+ } else {
+ assert(cpu < s->num_cpu);
+
+ uint32_t offset = s->num_irq - GIC_INTERNAL + (cpu * GIC_INTERNAL) + irq;
+ gic_irq = qdev_get_gpio_in(dev, offset);
+ }
+
+ if (pulse) {
+ qemu_irq_pulse(gic_irq);
+ } else {
+ qemu_irq_raise(gic_irq);
+ }
+}
+
static void arm_gic_realize(DeviceState *dev, Error **errp)
{
/* Device instance realize function for the GIC sysbus device */
@@ -2160,6 +2187,7 @@ static void arm_gic_realize(DeviceState *dev, Error **errp)
}
}
+ plugin_register_intc(dev, gic_plugin_irq_inject);
}
static void arm_gic_class_init(ObjectClass *klass, const void *data)
diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index 542f81ea49..1bae8c9f17 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -20,6 +20,8 @@
#include "qemu/module.h"
#include "hw/intc/arm_gicv3.h"
#include "gicv3_internal.h"
+#include "hw/core/irq.h"
+#include "qemu/plugin.h"
static bool irqbetter(GICv3CPUState *cs, int irq, uint8_t prio, bool nmi)
{
@@ -434,6 +436,31 @@ static const MemoryRegionOps gic_ops[] = {
}
};
+static void gicv3_plugin_irq_inject(void *opaque, int irq, int cpu, bool pulse)
+{
+ DeviceState *dev = opaque;
+ GICv3State *s = ARM_GICV3(dev);
+
+ qemu_irq gic_irq;
+
+ if (irq >= GIC_INTERNAL) {
+ assert(irq < s->num_irq);
+
+ gic_irq = qdev_get_gpio_in(dev, irq - GIC_INTERNAL);
+ } else {
+ assert(cpu < s->num_cpu);
+
+ uint32_t offset = s->num_irq - GIC_INTERNAL + (cpu * GIC_INTERNAL) + irq;
+ gic_irq = qdev_get_gpio_in(dev, offset);
+ }
+
+ if (pulse) {
+ qemu_irq_pulse(gic_irq);
+ } else {
+ qemu_irq_raise(gic_irq);
+ }
+}
+
static void arm_gic_realize(DeviceState *dev, Error **errp)
{
/* Device instance realize function for the GIC sysbus device */
@@ -450,6 +477,7 @@ static void arm_gic_realize(DeviceState *dev, Error **errp)
gicv3_init_irqs_and_mmio(s, gicv3_set_irq, gic_ops);
gicv3_init_cpuif(s);
+ plugin_register_intc(dev, gicv3_plugin_irq_inject);
}
static void arm_gicv3_class_init(ObjectClass *klass, const void *data)
--
2.43.0
* [RFC PATCH 7/9] hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
` (5 preceding siblings ...)
2026-03-18 10:46 ` [RFC PATCH 6/9] hw/intc/arm_gic: Register primary GIC for plugin IRQ injection Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 8/9] contrib/plugins: Add fault injection plugin Ruslan Ruslichenko
` (2 subsequent siblings)
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
Register a custom 'smmu_gerror_cmdq' handler with the plugin subsystem.
This enables external plugins to dynamically inject Command Queue
errors and trigger GERROR interrupts.
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
hw/arm/smmuv3.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index c08d58c579..e80b80e843 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -27,6 +27,8 @@
#include "hw/pci/pci.h"
#include "cpu.h"
#include "exec/target_page.h"
+#include "qemu/plugin.h"
+#include "qom/object.h"
#include "trace.h"
#include "qemu/log.h"
#include "qemu/error-report.h"
@@ -42,6 +44,55 @@
((ptw_info).stage == SMMU_STAGE_2 && \
(cfg)->s2cfg.record_faults))
+static void smmuv3_trigger_irq(SMMUv3State *s, SMMUIrq irq,
+ uint32_t gerror_mask);
+
+typedef struct {
+ uint64_t base_addr;
+ Object *found_obj;
+} SMMUSearchArgs;
+
+static int smmu_match_addr_cb(Object *obj, void *opaque)
+{
+ SMMUSearchArgs *args = (SMMUSearchArgs *)opaque;
+
+ if (object_dynamic_cast(obj, TYPE_ARM_SMMUV3)) {
+ SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
+
+ if (sbd->mmio[0].addr == args->base_addr) {
+ args->found_obj = obj;
+ return 1;
+ }
+ }
+
+ return 0;
+}
+
+static void smmu_inject_gerror_cmdq(void *target_data, void *fault_data)
+{
+ uint64_t base_address = *(uint64_t *)target_data;
+ SMMUCmdError cmd_error = *(SMMUCmdError*)fault_data;
+ Object *obj = NULL;
+
+ if (base_address) {
+ SMMUSearchArgs args = { .base_addr = base_address, .found_obj = NULL };
+ object_child_foreach_recursive(object_get_root(), smmu_match_addr_cb, &args);
+
+ obj = args.found_obj;
+ } else {
+ obj = object_resolve_path_type("", TYPE_ARM_SMMUV3, NULL);
+ }
+
+ if (!obj) {
+ return;
+ }
+
+ SMMUv3State *s = ARM_SMMUV3(obj);
+
+ smmu_write_cmdq_err(s, cmd_error);
+ smmuv3_trigger_irq(s, SMMU_IRQ_GERROR, R_GERROR_CMDQ_ERR_MASK);
+}
+
/**
* smmuv3_trigger_irq - pulse @irq if enabled and update
* GERROR register in case of GERROR interrupt
@@ -2130,6 +2181,9 @@ static void smmuv3_class_init(ObjectClass *klass, const void *data)
dc->hotpluggable = false;
dc->user_creatable = true;
+ plugin_register_custom_fault("smmu_gerror_cmdq",
+ smmu_inject_gerror_cmdq);
+
object_class_property_set_description(klass, "accel",
"Enable SMMUv3 accelerator support. Allows host SMMUv3 to be "
"configured in nested mode for vfio-pci dev assignment");
--
2.43.0
* [RFC PATCH 8/9] contrib/plugins: Add fault injection plugin
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
` (6 preceding siblings ...)
2026-03-18 10:46 ` [RFC PATCH 7/9] hw/arm/smmuv3: Add plugin fault handler for CMDQ errors Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 9/9] docs: Add description of fault-injection plugin and subsystem Ruslan Ruslichenko
2026-03-18 17:16 ` [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Pierrick Bouvier
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
Introduce a fault injection plugin for AArch64 targets. The
plugin provides a framework for testing a guest OS by injecting
hardware-level faults during execution.
The plugin can be configured either statically via an XML file
at boot or dynamically at runtime via a UNIX socket.
Supported triggers:
- PC: Triggers on instruction execution at a specific address.
- SYS_REG: Intercepts system register reads (e.g. mrs) and
replaces the read result with the configured value.
- RAM: Triggers on memory accesses at a specific virtual address.
- MMIO: Intercepts memory-mapped I/O reads.
- Timer: Triggers at a specific guest virtual clock time (ns).
Supported targets (injected faults):
- CPU_REG: Corrupts general-purpose CPU registers.
- RAM/MMIO: Modifies the result of memory reads.
- IRQ: Injects hardware IRQs on the primary INTC.
- EXCP: Injects CPU exceptions (e.g., SError).
- Custom: Triggers device-specific fault handlers registered by
device models.
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
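Editor's note (below the "---" marker, so not part of the commit message):
handle_sysreg_fault() recovers the destination register by parsing the
disassembly string with sscanf(). As a hedged alternative sketch, the same
field can be read from the instruction word itself, because Rt occupies
bits [4:0] of an A64 MRS encoding. The helper names fi_is_mrs() and
fi_mrs_rt() are hypothetical and do not exist in the patch; the opcode and
mask values are the ones the plugin already defines.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as MRS_OPCODE / MRS_OPCODE_MASK in the plugin. */
#define FI_MRS_OPCODE      0xD5300000u
#define FI_MRS_OPCODE_MASK 0xFFF00000u

/* Hypothetical helper: true if the 32-bit A64 instruction word is an MRS. */
static bool fi_is_mrs(uint32_t opcode)
{
    return (opcode & FI_MRS_OPCODE_MASK) == FI_MRS_OPCODE;
}

/* Hypothetical helper: destination register index Rt, taken from bits [4:0]. */
static unsigned fi_mrs_rt(uint32_t opcode)
{
    return opcode & 0x1fu;
}
```

For the word 0xD53BE040 (mrs x0, cntvct_el0), fi_is_mrs() is true and
fi_mrs_rt() yields 0. An Rt value of 31 encodes xzr, which has no slot in
gp_registers[], so a caller would have to skip that case. The system register
name would still need to come from the disassembly or from a table, since the
plugin keys its lookups by name.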
contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
contrib/plugins/meson.build | 1 +
2 files changed, 773 insertions(+)
create mode 100644 contrib/plugins/fault_injection.c
diff --git a/contrib/plugins/fault_injection.c b/contrib/plugins/fault_injection.c
new file mode 100644
index 0000000000..6fa09fd359
--- /dev/null
+++ b/contrib/plugins/fault_injection.c
@@ -0,0 +1,772 @@
+/*
+ * Fault Injection Plugin
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include <ctype.h>
+#include <pthread.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/param.h>
+#include <sys/socket.h>
+#include <unistd.h>
+
+#include "qemu/osdep.h"
+#include <qemu-plugin.h>
+
+#include "glib/gmarkup.h"
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+typedef enum {
+ TRIGGER_ON_PC = 0,
+ TRIGGER_ON_SYSREG,
+ TRIGGER_ON_RAM,
+ TRIGGER_ON_MMIO,
+ TRIGGER_ON_TIMER
+} FaultTrigger;
+
+typedef enum {
+ TARGET_EMPTY = 0,
+ TARGET_CPU_REG,
+ TARGET_RAM,
+ TARGET_MMIO,
+ TARGET_IRQ,
+ TARGET_EXCP,
+ TARGET_CUSTOM
+} FaultTarget;
+
+typedef struct {
+ FaultTarget target;
+ uint64_t target_data;
+
+ FaultTrigger trigger;
+ uint64_t trigger_condition;
+ gchar *trigger_condition_str;
+
+ uint64_t fault_data;
+ gchar *fault_name;
+
+ uint8_t size;
+ uint8_t cpu;
+ gchar *irq_type;
+} FaultConfig;
+
+typedef struct {
+ uint64_t hwaddr;
+ uint64_t value;
+ uint8_t size;
+} MmioOverrideConfig;
+
+#define FI_LOG(...) do { \
+ g_autofree gchar *__msg = g_strdup_printf(__VA_ARGS__); \
+ qemu_plugin_outs(__msg); \
+} while (0)
+
+static bool plugin_is_shutting_down = false;
+static int socket_fd = -1;
+
+static GRWLock trigger_lock;
+
+GHashTable *pc_faults;
+GHashTable *mem_faults;
+GHashTable *sys_reg_faults;
+
+static GRWLock mmio_override_lock;
+static GRWLock sysreg_lock;
+
+GHashTable *mmio_override;
+
+static struct qemu_plugin_register *gp_registers[31];
+
+static void register_pc_trigger(FaultConfig* fc);
+static void register_mmio_override(FaultConfig *fc);
+
+static void fc_free(FaultConfig *fc);
+
+static bool apply_mmio_override(uint64_t hwaddr, unsigned size, bool is_write,
+ uint64_t *value)
+{
+ g_rw_lock_reader_lock(&mmio_override_lock);
+
+ MmioOverrideConfig *conf = g_hash_table_lookup(mmio_override, &hwaddr);
+ if (!conf) {
+ g_rw_lock_reader_unlock(&mmio_override_lock);
+ return false;
+ }
+
+ *value = conf->value;
+
+ g_rw_lock_reader_unlock(&mmio_override_lock);
+
+ return true;
+}
+
+static bool mmio_override_cb(uint64_t hwaddr, unsigned size, bool is_write,
+ uint64_t *value)
+{
+ if (is_write) {
+ return false;
+ }
+
+ return apply_mmio_override(hwaddr, size, is_write, value);
+}
+
+static void cpu_write_reg(int reg_id, uint64_t value)
+{
+ g_assert(reg_id >= 0 && reg_id <= 30);
+
+ g_autoptr(GByteArray) buf = g_byte_array_new();
+
+ g_byte_array_set_size(buf, 8);
+
+ memcpy(buf->data, &value, 8);
+
+ bool success = qemu_plugin_write_register(gp_registers[reg_id], buf);
+ if (!success) {
+ FI_LOG("FI: Failed to write register\n");
+ }
+}
+
+static void cpu_write_mem(uint64_t addr, uint64_t data, uint8_t size)
+{
+ g_autoptr(GByteArray) buf = g_byte_array_new();
+
+ g_byte_array_set_size(buf, size);
+
+ memcpy(buf->data, &data, size);
+
+ bool success = qemu_plugin_write_memory_vaddr(addr, buf);
+ if (!success) {
+ FI_LOG("FI: Failed to write memory\n");
+ }
+}
+
+static void inject_irq(FaultConfig *fc)
+{
+ int irq_num = fc->target_data;
+
+ if (!fc->irq_type || !g_strcmp0(fc->irq_type, "SPI")) {
+ irq_num += 32; /* SPIs start at INTID 32 */
+ } else if (!g_strcmp0(fc->irq_type, "PPI")) {
+ irq_num += 16; /* PPIs start at INTID 16 */
+ } else if (!g_strcmp0(fc->irq_type, "SGI")) {
+ /* SGIs start at INTID 0, no offset needed */
+ } else {
+ FI_LOG("FI: Unknown IRQ type: %s\n", fc->irq_type);
+ return;
+ }
+
+ qemu_plugin_inject_irq(irq_num, fc->cpu, fc->fault_data);
+}
+
+static void inject_fault(FaultConfig* fc)
+{
+ switch (fc->target) {
+ case TARGET_CPU_REG:
+ cpu_write_reg(fc->target_data, fc->fault_data);
+ break;
+ case TARGET_RAM:
+ cpu_write_mem(fc->target_data, fc->fault_data, fc->size);
+ break;
+ case TARGET_MMIO:
+ register_mmio_override(fc);
+ break;
+ case TARGET_IRQ:
+ inject_irq(fc);
+ break;
+ case TARGET_EXCP:
+ qemu_plugin_inject_exception(fc->target_data, fc->fault_data);
+ break;
+ case TARGET_CUSTOM:
+ qemu_plugin_trigger_custom_fault(fc->fault_name,
+ &fc->target_data, &fc->fault_data);
+ break;
+ default:
+ FI_LOG("FI: Unsupported fault type\n");
+ break;
+ }
+}
+
+static void timed_fault_timer_cb(void* data)
+{
+ FaultConfig* fc = (FaultConfig*)data;
+
+ inject_fault(fc);
+
+ fc_free(fc);
+}
+
+static void vcpu_mem_cb(unsigned int vcpu_index,
+ qemu_plugin_meminfo_t info,
+ uint64_t vaddr, void *userdata)
+{
+ GSList *fault_list;
+
+ g_rw_lock_reader_lock(&trigger_lock);
+
+ fault_list = g_hash_table_lookup(mem_faults, &vaddr);
+ for (GSList *entry = fault_list; entry != NULL; entry = entry->next) {
+ FaultConfig *fc = (FaultConfig *)entry->data;
+
+ inject_fault(fc);
+ }
+
+ g_rw_lock_reader_unlock(&trigger_lock);
+}
+
+static void vcpu_insn_exec_cb(unsigned int vcpu_index, void *data)
+{
+ uint64_t insn_vaddr = (uint64_t)data;
+ GSList *fault_list;
+
+ g_rw_lock_reader_lock(&trigger_lock);
+
+ fault_list = g_hash_table_lookup(pc_faults,
+ &insn_vaddr);
+
+ for (GSList *l = fault_list; l != NULL; l = l->next) {
+ FaultConfig *fc = (FaultConfig *)l->data;
+
+ inject_fault(fc);
+ }
+
+ g_rw_lock_reader_unlock(&trigger_lock);
+}
+
+#define MRS_OPCODE 0xD5300000
+#define MRS_OPCODE_MASK 0xFFF00000
+
+static void handle_sysreg_fault(struct qemu_plugin_insn *insn, uint64_t insn_vaddr)
+{
+ FaultConfig *fc;
+ uint32_t raw_opcode;
+ size_t data_size = qemu_plugin_insn_data(insn, &raw_opcode, sizeof(raw_opcode));
+ if (data_size < sizeof(raw_opcode)) {
+ return;
+ }
+
+ uint32_t opcode = GUINT32_FROM_LE(raw_opcode);
+
+ if ((opcode & MRS_OPCODE_MASK) != MRS_OPCODE) {
+ return;
+ }
+
+ char *disas = qemu_plugin_insn_disas(insn);
+ if (!disas) {
+ return;
+ }
+
+ int dest_reg;
+ char sysreg_name[32] = { 0 };
+
+ if (sscanf(disas, "mrs x%d, %31s", &dest_reg, sysreg_name) == 2) {
+ uint64_t fault_data;
+ bool found = false;
+
+ g_rw_lock_reader_lock(&sysreg_lock);
+
+ fc = g_hash_table_lookup(sys_reg_faults, sysreg_name);
+ if (fc) {
+ fault_data = fc->fault_data;
+ found = true;
+ }
+
+ g_rw_lock_reader_unlock(&sysreg_lock);
+
+ if (found) {
+ /*
+ * Workaround: overwrite the destination GP register when the
+ * next instruction (PC + 4) executes.
+ */
+ FaultConfig *dyn_pc_fault = g_new0(FaultConfig, 1);
+
+ dyn_pc_fault->trigger = TRIGGER_ON_PC;
+ dyn_pc_fault->trigger_condition = insn_vaddr + 4;
+ dyn_pc_fault->target = TARGET_CPU_REG;
+ dyn_pc_fault->target_data = dest_reg;
+ dyn_pc_fault->fault_data = fault_data;
+
+ register_pc_trigger(dyn_pc_fault);
+ }
+ }
+
+ g_free(disas);
+}
+
+static void vcpu_tb_trans_cb(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
+{
+ for (size_t i = 0; i < qemu_plugin_tb_n_insns(tb); i++) {
+ struct qemu_plugin_insn *insn = qemu_plugin_tb_get_insn(tb, i);
+ uint64_t insn_vaddr = qemu_plugin_insn_vaddr(insn);
+ GSList *fault_list;
+
+ qemu_plugin_register_vcpu_mem_cb(insn, vcpu_mem_cb,
+ QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_MEM_RW, NULL);
+
+ handle_sysreg_fault(insn, insn_vaddr);
+
+ g_rw_lock_reader_lock(&trigger_lock);
+
+ fault_list = g_hash_table_lookup(pc_faults,
+ &insn_vaddr);
+
+ if (fault_list) {
+ qemu_plugin_register_vcpu_insn_exec_cb(insn, vcpu_insn_exec_cb,
+ QEMU_PLUGIN_CB_RW_REGS,
+ (void *)insn_vaddr);
+ }
+
+ g_rw_lock_reader_unlock(&trigger_lock);
+ }
+}
+
+static void vcpu_init_cb(qemu_plugin_id_t id, unsigned int vcpu_index)
+{
+ if (vcpu_index) {
+ /* Cache the GP register handles only once, using CPU 0 */
+ return;
+ }
+
+ g_autoptr(GArray) reg_list = qemu_plugin_get_registers();
+
+ for (int i = 0; i < reg_list->len; ++i) {
+ qemu_plugin_reg_descriptor *rd = &g_array_index(reg_list,
+ qemu_plugin_reg_descriptor, i);
+
+ if (rd->name[0] == 'x' && isdigit(rd->name[1])) {
+ int reg_ind = atoi(&rd->name[1]);
+
+ if (reg_ind >= 0 && reg_ind <= 30) {
+ gp_registers[reg_ind] = rd->handle;
+ }
+ }
+ }
+}
+
+static void register_mmio_override(FaultConfig *fc)
+{
+ g_rw_lock_writer_lock(&mmio_override_lock);
+
+ MmioOverrideConfig *curr_conf = g_hash_table_lookup(mmio_override,
+ &fc->target_data);
+ if (curr_conf) {
+ curr_conf->value = fc->fault_data;
+ curr_conf->size = fc->size;
+ } else {
+ MmioOverrideConfig *new_conf = g_new0(MmioOverrideConfig, 1);
+
+ new_conf->hwaddr = fc->target_data;
+ new_conf->value = fc->fault_data;
+ new_conf->size = fc->size;
+
+ g_hash_table_insert(mmio_override, &new_conf->hwaddr,
+ new_conf);
+ }
+
+ g_rw_lock_writer_unlock(&mmio_override_lock);
+}
+
+static void register_sysreg_override(FaultConfig *fc)
+{
+ g_rw_lock_writer_lock(&sysreg_lock);
+
+ FaultConfig *old_fc = g_hash_table_lookup(sys_reg_faults,
+ fc->trigger_condition_str);
+ g_hash_table_replace(sys_reg_faults,
+ fc->trigger_condition_str,
+ fc);
+
+ if (old_fc) {
+ fc_free(old_fc);
+ }
+
+ g_rw_lock_writer_unlock(&sysreg_lock);
+}
+
+static void register_ram_trigger(FaultConfig* fc)
+{
+
+ g_rw_lock_writer_lock(&trigger_lock);
+
+ GSList *mem_list = g_hash_table_lookup(mem_faults, &fc->trigger_condition);
+
+ mem_list = g_slist_append(mem_list, fc);
+ g_hash_table_insert(mem_faults,
+ &fc->trigger_condition, mem_list);
+
+ g_rw_lock_writer_unlock(&trigger_lock);
+
+}
+
+static void register_pc_trigger(FaultConfig* fc)
+{
+ g_rw_lock_writer_lock(&trigger_lock);
+
+ bool duplicate = false;
+ GSList *pc_list = g_hash_table_lookup(pc_faults,
+ &fc->trigger_condition);
+
+ for (GSList *l = pc_list; l != NULL; l = l->next) {
+ FaultConfig *existing = (FaultConfig *)l->data;
+
+ if (existing->target == fc->target &&
+ existing->target_data == fc->target_data &&
+ existing->fault_data == fc->fault_data) {
+ duplicate = true;
+ break;
+ }
+ }
+
+ if (!duplicate) {
+ pc_list = g_slist_append(pc_list, fc);
+ g_hash_table_insert(pc_faults, &fc->trigger_condition,
+ pc_list);
+ } else {
+ fc_free(fc);
+ }
+
+ g_rw_lock_writer_unlock(&trigger_lock);
+
+}
+
+static bool register_fault(FaultConfig *fc)
+{
+ FaultTrigger trigger_type = fc->trigger;
+
+ if (fc->target == TARGET_CUSTOM && !fc->fault_name) {
+ FI_LOG("FI: fault_name needed for custom targets\n");
+ return false;
+ }
+
+ if (!fc->size) {
+ fc->size = sizeof(fc->fault_data);
+ }
+
+ switch (fc->trigger) {
+ case TRIGGER_ON_PC:
+ register_pc_trigger(fc);
+ break;
+ case TRIGGER_ON_SYSREG:
+ if (fc->target != TARGET_EMPTY) {
+ FI_LOG("FI: SYS_REG triggers do not support a target\n");
+ return false;
+ }
+
+ register_sysreg_override(fc);
+ break;
+ case TRIGGER_ON_RAM:
+ if (fc->target == TARGET_EMPTY) {
+ /* Allow short form for RAM triggers to override same memory */
+ fc->target = TARGET_RAM;
+ fc->target_data = fc->trigger_condition;
+ }
+
+ register_ram_trigger(fc);
+ break;
+ case TRIGGER_ON_MMIO:
+ if (fc->target != TARGET_EMPTY) {
+ FI_LOG("FI: No target support for MMIO trigger for now\n");
+ return false;
+ }
+
+ register_mmio_override(fc);
+ fc_free(fc);
+ break;
+ case TRIGGER_ON_TIMER:
+ if (fc->target == TARGET_CPU_REG) {
+ FI_LOG("FI: CPU_REG is invalid for TIMER trigger\n");
+ return false;
+ }
+ qemu_plugin_timer_virt_ns(fc->trigger_condition,
+ timed_fault_timer_cb, fc);
+ break;
+ default:
+ /* skip */
+ break;
+ }
+
+ if (trigger_type == TRIGGER_ON_PC || trigger_type == TRIGGER_ON_SYSREG) {
+ qemu_plugin_flush_tb_cache();
+ }
+
+ return true;
+}
+
+static void fc_free(FaultConfig *fc)
+{
+ if (!fc) {
+ return;
+ }
+
+ g_free(fc->trigger_condition_str);
+ g_free(fc->fault_name);
+ g_free(fc->irq_type);
+
+ g_free(fc);
+}
+
+static void xml_start_elem(GMarkupParseContext *context,
+ const gchar *element_name,
+ const gchar **attribute_names,
+ const gchar **attribute_values,
+ gpointer user_data,
+ GError **error)
+{
+ if (!g_strcmp0(element_name, "Fault")) {
+ FaultConfig *fc = g_new0(FaultConfig, 1);
+
+ for (int i = 0; attribute_names[i] != NULL; i++) {
+ const char *key = attribute_names[i];
+ const char *value = attribute_values[i];
+
+ if (!g_strcmp0(key, "target")) {
+ if (!g_strcmp0(value, "CPU_REG")) {
+ fc->target = TARGET_CPU_REG;
+ } else if (!g_strcmp0(value, "RAM")) {
+ fc->target = TARGET_RAM;
+ } else if (!g_strcmp0(value, "MMIO")) {
+ fc->target = TARGET_MMIO;
+ } else if (!g_strcmp0(value, "IRQ")) {
+ fc->target = TARGET_IRQ;
+ } else if (!g_strcmp0(value, "EXCP")) {
+ fc->target = TARGET_EXCP;
+ } else if (!g_strcmp0(value, "CUSTOM")) {
+ fc->target = TARGET_CUSTOM;
+ } else {
+ g_set_error(error, G_MARKUP_ERROR,
+ G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE,
+ "FI: Unknown target type '%s'", value);
+ fc_free(fc);
+ return;
+ }
+ } else if (!g_strcmp0(key, "trigger")) {
+ if (!g_strcmp0(value, "PC")) {
+ fc->trigger = TRIGGER_ON_PC;
+ } else if (!g_strcmp0(value, "SYS_REG")) {
+ fc->trigger = TRIGGER_ON_SYSREG;
+ } else if (!g_strcmp0(value, "RAM")) {
+ fc->trigger = TRIGGER_ON_RAM;
+ } else if (!g_strcmp0(value, "MMIO")) {
+ fc->trigger = TRIGGER_ON_MMIO;
+ } else if (!g_strcmp0(value, "TIMER")) {
+ fc->trigger = TRIGGER_ON_TIMER;
+ } else {
+ g_set_error(error, G_MARKUP_ERROR,
+ G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE,
+ "FI: Unknown trigger type: '%s'", value);
+ fc_free(fc);
+ return;
+ }
+ } else if (!g_strcmp0(key, "target_data")) {
+ fc->target_data = strtoull(value, NULL, 0);
+ } else if (!g_strcmp0(key, "trigger_condition")) {
+ fc->trigger_condition_str = g_strdup(value);
+ fc->trigger_condition = strtoull(value, NULL, 0);
+ } else if (!g_strcmp0(key, "fault_data")) {
+ fc->fault_data = strtoull(value, NULL, 0);
+ } else if (!g_strcmp0(key, "size")) {
+ fc->size = strtoull(value, NULL, 0);
+ } else if (!g_strcmp0(key, "cpu")) {
+ fc->cpu = strtoull(value, NULL, 0);
+ } else if (!g_strcmp0(key, "irq_type")) {
+ fc->irq_type = g_strdup(value);
+ } else if (!g_strcmp0(key, "fault_name")) {
+ fc->fault_name = g_strdup(value);
+ }
+ }
+
+ if (!register_fault(fc)) {
+ g_set_error(error, G_MARKUP_ERROR,
+ G_MARKUP_ERROR_UNKNOWN_ATTRIBUTE,
+ "FI: Failed to register fault");
+ fc_free(fc);
+ return;
+ }
+ }
+}
+
+static GMarkupParser parser = {
+ .start_element = xml_start_elem,
+};
+
+static void *ipc_listener_thread(void *arg)
+{
+ char *sock_path = (char *)arg;
+ struct sockaddr_un addr;
+ int client_fd;
+ char buf[1024];
+
+ socket_fd = socket(AF_UNIX, SOCK_STREAM, 0);
+ if (socket_fd < 0) {
+ FI_LOG("Failed to create socket, err = %s\n",
+ strerror(errno));
+ return NULL;
+ }
+
+ memset(&addr, 0, sizeof(addr));
+
+ addr.sun_family = AF_UNIX;
+ g_strlcpy(addr.sun_path, sock_path, sizeof(addr.sun_path) - 1);
+
+ unlink(sock_path);
+
+ if (bind(socket_fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
+ FI_LOG("Failed to bind socket, err = %s\n",
+ strerror(errno));
+ close(socket_fd);
+ return NULL;
+ }
+
+ if (listen(socket_fd, 1)) {
+ FI_LOG("Listen socket failed, err = %s\n",
+ strerror(errno));
+ close(socket_fd);
+ return NULL;
+ }
+
+ while (true) {
+ client_fd = accept(socket_fd, NULL, NULL);
+
+ if (client_fd < 0) {
+ if (plugin_is_shutting_down) {
+ break;
+ }
+ continue;
+ }
+
+ GString *xml_payload = g_string_new(NULL);
+
+ memset(buf, 0, sizeof(buf));
+
+ while (true) {
+ ssize_t bytes_read = read(client_fd, buf, sizeof(buf) - 1);
+
+ if (bytes_read > 0) {
+ g_string_append_len(xml_payload, buf, bytes_read);
+ } else if (bytes_read == 0) {
+ break;
+ } else {
+ if (errno == EINTR) {
+ continue;
+ }
+
+ break;
+ }
+ }
+
+ if (xml_payload->len > 0) {
+ GError *err = NULL;
+
+ GMarkupParseContext *ctx = g_markup_parse_context_new(&parser,
+ 0, NULL, NULL);
+
+ if (!g_markup_parse_context_parse(ctx, xml_payload->str,
+ xml_payload->len, &err)) {
+ FI_LOG("FI Error: Failed to parse dynamic XML: %s\n",
+ err->message);
+ g_error_free(err);
+ }
+
+ g_markup_parse_context_free(ctx);
+ }
+
+ g_string_free(xml_payload, TRUE);
+ close(client_fd);
+ }
+
+ unlink(sock_path);
+ g_free(sock_path);
+
+ return NULL;
+}
+
+static void plugin_exit_cb(qemu_plugin_id_t id, void *userdata)
+{
+ plugin_is_shutting_down = true;
+
+ if (socket_fd >= 0) {
+ close(socket_fd);
+ socket_fd = -1;
+ }
+}
+
+QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
+ const qemu_info_t *info,
+ int argc, char **argv)
+{
+ const char *config_path = NULL;
+ const char *socket_path = NULL;
+ gchar *config;
+ gsize length;
+ GError *err = NULL;
+ bool success;
+
+ if (strcmp(info->target_name, "aarch64")) {
+ FI_LOG("FI: Target %s is not supported\n", info->target_name);
+ return 1;
+ }
+
+ for (int i = 0; i < argc; ++i) {
+ if (g_str_has_prefix(argv[i], "config=")) {
+ config_path = argv[i] + strlen("config=");
+ } else if (g_str_has_prefix(argv[i], "socket=")) {
+ socket_path = g_strdup(argv[i] + strlen("socket="));
+ }
+ }
+
+ if (!config_path && !socket_path) {
+ FI_LOG("FI: either config or socket path required\n");
+ return 1;
+ }
+
+ pc_faults = g_hash_table_new(g_int64_hash, g_int64_equal);
+ mem_faults = g_hash_table_new(g_int64_hash, g_int64_equal);
+ sys_reg_faults = g_hash_table_new(g_str_hash, g_str_equal);
+ mmio_override = g_hash_table_new(g_int64_hash, g_int64_equal);
+
+ g_rw_lock_init(&trigger_lock);
+ g_rw_lock_init(&mmio_override_lock);
+ g_rw_lock_init(&sysreg_lock);
+
+ if (config_path) {
+ if (access(config_path, R_OK)) {
+ FI_LOG("FI: can't access config file, err = %s\n",
+ strerror(errno));
+ return 1;
+ }
+
+ success = g_file_get_contents(config_path, &config,
+ &length, &err);
+ if (success) {
+ GMarkupParseContext *ctx = g_markup_parse_context_new(&parser,
+ 0, NULL, NULL);
+ success = g_markup_parse_context_parse(ctx, config, length, &err);
+ g_markup_parse_context_free(ctx);
+ g_free(config);
+ }
+ if (!success) {
+ FI_LOG("FI: failed to load config file: %s\n", err->message);
+ return 1;
+ }
+ }
+
+ if (socket_path) {
+ pthread_t thread_id;
+
+ pthread_create(&thread_id, NULL, ipc_listener_thread,
+ (void*)socket_path);
+ pthread_detach(thread_id);
+ }
+
+ qemu_plugin_register_vcpu_init_cb(id, vcpu_init_cb);
+ qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans_cb);
+ qemu_plugin_register_mmio_override_cb(id, mmio_override_cb);
+
+ qemu_plugin_register_atexit_cb(id, plugin_exit_cb, NULL);
+
+ return 0;
+}
\ No newline at end of file
diff --git a/contrib/plugins/meson.build b/contrib/plugins/meson.build
index 099319e7a1..df4d4c5177 100644
--- a/contrib/plugins/meson.build
+++ b/contrib/plugins/meson.build
@@ -12,6 +12,7 @@ contrib_plugins = [
'stoptrigger.c',
'traps.c',
'uftrace.c',
+'fault_injection.c',
]
if host_os != 'windows'
--
2.43.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [RFC PATCH 9/9] docs: Add description of fault-injection plugin and subsystem
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
` (7 preceding siblings ...)
2026-03-18 10:46 ` [RFC PATCH 8/9] contrib/plugins: Add fault injection plugin Ruslan Ruslichenko
@ 2026-03-18 10:46 ` Ruslan Ruslichenko
2026-03-18 17:16 ` [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Pierrick Bouvier
9 siblings, 0 replies; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-18 10:46 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, pierrick.bouvier, philmd, Ruslan_Ruslichenko
From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
This patch introduces documentation for the newly added Fault Injection
plugin and subsystem.
Signed-off-by: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
---
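Editor's note (below the "---" marker, so not part of the commit message):
the documentation says an XML string can be sent to the socket. The following
is a minimal client sketch in C, assuming the listener behaves as in patch 8,
where it reads until end-of-file and only then parses the payload. The
function name fi_send_xml() and the socket path in the usage note are
illustrative and not part of the series.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one XML payload to the plugin's UNIX socket.
 * Returns 0 on success and -1 on failure, with errno describing the error.
 */
static int fi_send_xml(const char *sock_path, const char *xml)
{
    struct sockaddr_un addr;
    size_t len = strlen(xml);
    size_t off = 0;
    int fd;

    /* Reject paths that do not fit in sun_path (including the NUL). */
    if (strlen(sock_path) >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, sock_path);   /* length checked above */

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }

    /* write() may be partial, so loop until the whole payload is sent. */
    while (off < len) {
        ssize_t n = write(fd, xml + off, len - off);

        if (n < 0) {
            int saved = errno;

            if (saved == EINTR) {
                continue;
            }
            close(fd);
            errno = saved;
            return -1;
        }
        off += (size_t)n;
    }

    /* Closing the socket signals EOF, which lets the listener start parsing. */
    return close(fd);
}
```

A test harness might call fi_send_xml("/tmp/fi_socket.sock", xml), where the
path matches the socket= argument on the QEMU command line and xml holds a
<Faults> block such as the ones in the EXAMPLES section of the new document.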
docs/fault-injection.txt | 111 +++++++++++++++++++++++++++++++++++++++
1 file changed, 111 insertions(+)
create mode 100644 docs/fault-injection.txt
diff --git a/docs/fault-injection.txt b/docs/fault-injection.txt
new file mode 100644
index 0000000000..05cbd48136
--- /dev/null
+++ b/docs/fault-injection.txt
@@ -0,0 +1,111 @@
+QEMU FAULT INJECTION PLUGIN DOCUMENTATION
+=========================================
+
+OVERVIEW
+--------
+The Fault Injection (FI) plugin is a testing tool for guest operating systems running in QEMU. It allows you to test how a guest OS or driver handles hardware-level errors. Currently, only AArch64 (ARM64) guest systems are supported.
+
+Errors (faults) can be injected in two ways:
+1. Statically using a configuration file when QEMU starts.
+2. Dynamically using a UNIX socket while the system is running.
+
+
+USAGE
+-----
+To use the plugin, add the "-plugin" option to the QEMU command.
+
+Command Line Examples:
+
+1. Using a static XML config file:
+qemu-system-aarch64 -machine virt -cpu cortex-a57 -plugin ./contrib/plugins/libfault_injection.so,config=faults.xml
+
+2. Using a dynamic UNIX socket:
+qemu-system-aarch64 -machine virt -cpu cortex-a57 -plugin ./contrib/plugins/libfault_injection.so,socket=/tmp/fi_socket.sock
+
+3. Using both at the same time:
+qemu-system-aarch64 -machine virt -cpu cortex-a57 -plugin ./contrib/plugins/libfault_injection.so,config=faults.xml,socket=/tmp/fi_socket.sock
+
+To inject a fault while QEMU is running, connect to the UNIX socket, send an XML string, and close the connection. The plugin parses the payload only after the client closes its end.
+
+
+CORE CONCEPTS
+-------------
+A fault configuration has two main parts:
+- Trigger: When should the fault happen? (Example: when the CPU reaches a specific address).
+- Target: What should be corrupted or injected? (Example: change a CPU register).
+
+SUPPORTED TRIGGERS:
+- PC : Triggers when the CPU executes an instruction at a specific Virtual Address.
+- SYS_REG : Triggers when the guest reads a specific System Register (like cntvct_el0).
+- RAM : Triggers when the guest accesses a specific Virtual Address in memory.
+- MMIO : Triggers when the guest reads from a hardware device at a Physical Address.
+- TIMER : Triggers at a specific guest virtual time (in nanoseconds).
+
+SUPPORTED TARGETS:
+- CPU_REG : Changes a CPU register (x0 to x30).
+- RAM : Overwrites guest memory at a virtual address with a fake value.
+- MMIO : Replaces the result of a hardware device read with a fake value.
+- IRQ : Injects a hardware interrupt into the primary INTC.
+- EXCP : Injects a CPU exception (like an SError).
+- CUSTOM : Triggers a custom device error (custom handler registered by device model).
+
+
+XML CONFIGURATION FORMAT
+------------------------
+The plugin uses a simple XML format. Each fault is defined by a <Fault /> tag. Multiple fault tags can be added inside one file by wrapping them in a <Faults> block.
+
+The following attributes can be used in the tag:
+- trigger : The event that starts the fault (PC, TIMER, etc.).
+- trigger_condition : The value needed to activate the trigger (Address, Time, or System Register Name).
+- target : The system part to corrupt (CPU_REG, IRQ, etc.). This is optional for RAM and MMIO triggers.
+- target_data : The specific ID or address of the target.
+- fault_data : The corrupted value to inject.
+- size : (Optional) Size in bytes for memory operations. Default is 8.
+- cpu : (Optional) CPU index for IRQs. Default is 0.
+- irq_type : (Optional) For IRQs. Can be SPI, PPI, or SGI. Default is SPI.
+- fault_name : (Optional) Required only for CUSTOM targets (string with the name of the custom fault).
+
+
+EXAMPLES
+--------
+
+Example 1: Corrupt a CPU Register on a Specific Instruction
+This changes register x1 to 0 when the CPU executes the instruction at virtual address 0xa00002e7714.
+
+<Faults>
+ <Fault trigger="PC" trigger_condition="0xa00002e7714" target="CPU_REG" target_data="1" fault_data="0" />
+</Faults>
+
+
+Example 2: Modify an MMIO Read
+When the guest OS tries to read a hardware device at physical address 0x0800FFE8, the plugin ignores the real hardware and returns the fake value 0x0.
+
+<Faults>
+ <Fault trigger="MMIO" trigger_condition="0x0800FFE8" fault_data="0" />
+</Faults>
+
+
+Example 3: Inject a Hardware Interrupt using a Timer
+This injects SPI interrupt number 77 into CPU 0 after 10s of virtual guest time and modifies the results of MMIO reads starting at this time.
+
+<Faults>
+ <Fault trigger="TIMER" trigger_condition="10000000000" target="IRQ" target_data="77" fault_data="0" />
+ <Fault trigger="TIMER" trigger_condition="10000000000" target="MMIO" target_data="0x09050060" fault_data="0x180" />
+ <Fault trigger="TIMER" trigger_condition="10000000000" target="MMIO" target_data="0x09050064" fault_data="0x0" />
+</Faults>
+
+
+Example 4: Trigger a Custom SMMUv3 Command Queue Error
+After 10s of guest virtual time, this injects a custom SMMUv3 Command Queue error into the SMMU device located at 0x09050000.
+
+<Faults>
+ <Fault trigger="TIMER" trigger_condition="10000000000" target="CUSTOM" fault_name="smmu_gerror_cmdq" target_data="0x09050000" fault_data="1" />
+</Faults>
+
+
+Example 5: Inject a CPU Exception (SError)
+This injects a Virtual SError (Exception Index 24) when the CPU executes the instruction at 0xffff8000802dfed0. The syndrome register is set to the value 0xbf000002.
+
+<Faults>
+ <Fault trigger="PC" trigger_condition="0xffff8000802dfed0" target="EXCP" target_data="24" fault_data="0xbf000002" />
+</Faults>
--
2.43.0
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
` (8 preceding siblings ...)
2026-03-18 10:46 ` [RFC PATCH 9/9] docs: Add description of fault-injection plugin and subsystem Ruslan Ruslichenko
@ 2026-03-18 17:16 ` Pierrick Bouvier
2026-03-19 18:20 ` Ruslan Ruslichenko
9 siblings, 1 reply; 18+ messages in thread
From: Pierrick Bouvier @ 2026-03-18 17:16 UTC (permalink / raw)
To: Ruslan Ruslichenko, qemu-devel
Cc: qemu-arm, artem_mygaiev, volodymyr_babchuk, alex.bennee,
peter.maydell, philmd, Ruslan_Ruslichenko
Hi Ruslan,
On 3/18/26 3:46 AM, Ruslan Ruslichenko wrote:
> From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
>
> This patch series is submitted as an RFC to gather early feedback on a Fault Injection (FI) framework built on top of the QEMU TCG plugin subsystem.
>
> Motivation
>
> Testing guest operating systems, hypervisors (like Xen), and low-level drivers against unexpected hardware failures can be difficult.
> This series provides an interface to inject faults dynamically without altering QEMU's core emulation source code for every test case.
>
> Architecture & Key Features
>
> The series introduces the core API extensions and implements a fault injection plugin (contrib/plugins/fault_injection.c) targeting AArch64.
> The plugin can be controlled statically via XML configurations on boot, or dynamically at runtime via a UNIX socket (enabling integration with automated testing frameworks via Python or GDB).
>
> New Plugin API Capabilities:
>
> MMIO Interception: Allows plugins to hook into memory_region_dispatch_read/write to modify hardware register reads or drop writes.
> Asynchronous Timers: Exposes QEMU_CLOCK_VIRTUAL to plugins, allowing callbacks to be scheduled based on guest virtual time.
> TB Cache Flushing: Exposes qemu_plugin_flush_tb_cache() so plugins can force re-translation when applying dynamic PC-based hooks.
> Interrupt & Exception Injection: Exposes APIs to raise/pulse hardware IRQs on the primary INTC and inject CPU exceptions (e.g., SErrors).
> Custom Device Faults: Introduces a registry where device models (e.g., SMMUv3) can expose specific fault handlers (like CMDQ errors) to be triggered externally by plugins.
>
> Patch Summary
> Patch 1 (target/arm): Adds support for asynchronous CPU exception injection.
> Patch 2-3 (plugins/api): Exposes virtual clock timers and TB cache flushing to the public plugin API.
> Patch 4 (plugins): Introduces the core fault injection subsystem, IRQ/Exception routing, and the Custom Fault registry.
> Patch 5 (system/memory): Adds the MMIO override hooks into the memory dispatch path.
> Patch 6 (hw/intc): Registers the ARM GIC (v2/v3) with the plugin subsystem to enable direct hardware IRQ injection.
> Patch 7 (hw/arm): Registers the SMMUv3 with the custom fault registry to demonstrate how device models can expose specific errors (like CMDQ faults) to plugins.
> Patch 8 (contrib/plugins): Implements the actual fault_injection plugin using the new APIs.
> Patch 9 (docs): Adds documentation and usage examples for the plugin.
>
> Request for Comments & Feedback
>
> Any suggestions on improvements, potential edge cases, or issues with the current design are highly welcome.
>
> Ruslan Ruslichenko (9):
> target/arm: Add API for dynamic exception injection
> plugins/api: Expose virtual clock timers to plugins
> plugins: Expose Transaction Block cache flush API to plugins
> plugins: Introduce fault injection API and core subsystem
> system/memory: Add plugin callbacks to intercept MMIO accesses
> hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
> hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
> contrib/plugins: Add fault injection plugin
> docs: Add description of fault-injection plugin and subsystem
>
> contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
> contrib/plugins/meson.build | 1 +
> docs/fault-injection.txt | 111 +++++
> hw/arm/smmuv3.c | 54 +++
> hw/intc/arm_gic.c | 28 ++
> hw/intc/arm_gicv3.c | 28 ++
> include/plugins/qemu-plugin.h | 28 ++
> include/qemu/plugin.h | 39 ++
> plugins/api.c | 62 +++
> plugins/core.c | 11 +
> plugins/fault.c | 116 +++++
> plugins/meson.build | 1 +
> plugins/plugin.h | 2 +
> system/memory.c | 8 +
> target/arm/cpu.h | 4 +
> target/arm/helper.c | 55 +++
> 16 files changed, 1320 insertions(+)
> create mode 100644 contrib/plugins/fault_injection.c
> create mode 100644 docs/fault-injection.txt
> create mode 100644 plugins/fault.c
>
First, thanks for posting your series!
About the general approach:
As you noticed, this is exposing a lot of QEMU internals, which is
something we tend to avoid. It is also very architecture
specific, which is another pattern we try to avoid.
For some of your needs (especially IRQ injection and timer injection),
did you consider writing a custom ad-hoc device and timer generating those?
There is nothing preventing you from writing a plugin that can
communicate with this specific device (through a socket for instance),
to request specific injections. I feel that it would scale better than
exposing all this to QEMU plugins API.
For SMMU, this is trickier. Tao recently added (6ce361b02c82) an iommu test
device, associated with qtest, to unit test the smmu implementation. We
could maybe leverage that on a full machine, associated with the
communication method mentioned above, to generate specific operations at
runtime, all triggered via a plugin.
Exposing qemu_plugin_flush_tb_cache is a hint we are missing something
on the QEMU side. Better to fix it than expose this very internal function.
The associated TRIGGER_ON_PC is very similar to existing inline
operations. They could be enhanced to support writing to a given
register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
more complex, but we might enhance inline operations also to support
hooks on specific register writes.
For MMIO override, the current approach you have is good, and it's
definitely something we could integrate.
What are your thoughts about this (especially the device-based approach,
in case you tried it first)?
Regards,
Pierrick
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-18 17:16 ` [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Pierrick Bouvier
@ 2026-03-19 18:20 ` Ruslan Ruslichenko
2026-03-19 19:04 ` Pierrick Bouvier
0 siblings, 1 reply; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-19 18:20 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, qemu-arm, artem_mygaiev, volodymyr_babchuk,
alex.bennee, peter.maydell, philmd, Ruslan_Ruslichenko
Hi Pierrick,
Thank you for the feedback and review!
Our current plan is to put this plugin through our internal workflows to gather
more data on its limitations and performance.
Based on results, we may consider extending or refining the implementation
in the future.
Any further feedback on potential issues is highly appreciated.
On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> Hi Ruslan,
>
> First, thanks for posting your series!
>
> About the general approach.
> As you noticed, this exposes a lot of QEMU internals, which is
> something we tend to avoid. It is also very architecture-specific,
> which is another pattern we try to avoid.
>
> For some of your needs (especially IRQ injection and timer injection),
> did you consider writing a custom ad-hoc device and timer generating those?
> There is nothing preventing you from writing a plugin that can
> communicate with this specific device (through a socket for instance),
> to request specific injections. I feel that it would scale better than
> exposing all this to QEMU plugins API.
>
> For SMMU, this is trickier. Tao recently added (6ce361b02c82) an iommu test
> device, associated with qtest to unit test the smmu implementation. We
> could maybe leverage that on a full machine, combined with the
> communication method mentioned above, to generate specific operations at
> runtime, all triggered via a plugin.
>
> Exposing qemu_plugin_flush_tb_cache is a hint that we are missing something
> on the QEMU side. Better to fix it than expose this very internal function.
The reason this was needed is that the plugin may receive a PC trigger
configuration dynamically and need to register an instruction callback
at runtime. If the TB for that PC is already translated and cached, our
newly registered callback might not be executed.
If there is a better way to force QEMU to re-translate a specific TB or
attach a callback to a cached TB, it would be great to reduce the
complexity here.
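To make the caching issue concrete, here is a minimal toy model in Python.
It is not QEMU code: ToyTranslator, register_pc_hook and flush_tb_cache are
hypothetical names that only mimic the behaviour described above, where a
hook registered after translation is missed until the cached TB is dropped.

```python
# Toy model of a translation-block (TB) cache. NOT QEMU code: all names
# are hypothetical and only mirror the behaviour described in this thread.

class ToyTranslator:
    def __init__(self):
        self.tb_cache = {}     # pc -> callbacks baked in at translation time
        self.hooks = {}        # pc -> callback requested by the "plugin"
        self.translations = 0  # how many times a PC was (re)translated

    def register_pc_hook(self, pc, callback):
        """Record a hook; it only takes effect when the PC is (re)translated."""
        self.hooks[pc] = callback

    def flush_tb_cache(self):
        """Drop all cached translations so the next execution re-translates."""
        self.tb_cache.clear()

    def execute(self, pc):
        if pc not in self.tb_cache:
            # Translation time: the set of callbacks is fixed here.
            self.translations += 1
            hook = self.hooks.get(pc)
            self.tb_cache[pc] = [hook] if hook else []
        for cb in self.tb_cache[pc]:
            cb(pc)


fired = []
cpu = ToyTranslator()

cpu.execute(0x1000)                         # translated and cached, no hook yet
cpu.register_pc_hook(0x1000, fired.append)  # hook arrives at runtime
cpu.execute(0x1000)                         # cached TB runs, hook is missed
missed = list(fired)

cpu.flush_tb_cache()                        # force re-translation
cpu.execute(0x1000)                         # hook is now part of the TB
```

In this model the second execution does not call the hook, and only the
execution after the flush does, which is the reason the series exposes a
flush call at all.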
> The associated TRIGGER_ON_PC is very similar to existing inline
> operations. They could be enhanced to support writing to a given
> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
> more complex, but we might enhance inline operations also to support
> hooks on specific register writes.
TRIGGER_ON_PC may also be used to generate other faults. For example,
one use case is to trigger CPU exceptions on specific instructions.
Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
really interesting direction to explore.
>
> For MMIO override, the current approach you have is good, and it's
> definitely something we could integrate.
>
> What are your thoughts about this (especially the device-based approach,
> in case you tried it first)?
I agree such an approach can work well for IRQs and timers, and would be
a cleaner way to implement this.
However, for SMMU and similar cases, triggering internal state errors is not
easy and requires accessing internal logic. So for those specific cases,
a different approach may be needed.
>
> Regards,
> Pierrick
BR,
Ruslan
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-19 18:20 ` Ruslan Ruslichenko
@ 2026-03-19 19:04 ` Pierrick Bouvier
2026-03-19 22:29 ` Ruslan Ruslichenko
0 siblings, 1 reply; 18+ messages in thread
From: Pierrick Bouvier @ 2026-03-19 19:04 UTC (permalink / raw)
To: Ruslan Ruslichenko
Cc: qemu-devel, qemu-arm, artem_mygaiev, volodymyr_babchuk,
alex.bennee, peter.maydell, philmd, Ruslan_Ruslichenko
On 3/19/26 11:20 AM, Ruslan Ruslichenko wrote:
> Hi Pierrick,
>
> Thank you for the feedback and review!
>
> Our current plan is to put this plugin through our internal workflows to gather
> more data on its limitations and performance.
> Based on results, we may consider extending or refining the implementation
> in the future.
>
> Any further feedback on potential issues is highly appreciated.
>
By design, the approach of modifying QEMU internals to inject an IRQ,
set a timer, or trigger SMMU faults has very little chance of being
integrated as is. At the very least, it should be discussed with the
relevant maintainers to see whether they would be open to it.
It's not wrong in itself, if you want a downstream solution, but it does
not scale upstream if we have to consider and accept everyone's needs.
The plugin API in itself can accept the burden for such things, but it's
harder to justify for internal stuff.
I believe it would be better to rely on ad hoc devices generating this,
with the advantage that even if they don't get accepted upstream, they
will be easier for you to maintain downstream than more
intrusive patches.
> On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> Hi Ruslan,
>>
>> First, thanks for posting your series!
>>
>> About the general approach.
>> As you noticed, this exposes a lot of QEMU internals, which is
>> something we tend to avoid. It is also very architecture-specific,
>> which is another pattern we try to avoid.
>>
>> For some of your needs (especially IRQ injection and timer injection),
>> did you consider writing a custom ad-hoc device and timer generating those?
>> There is nothing preventing you from writing a plugin that can
>> communicate with this specific device (through a socket for instance),
>> to request specific injections. I feel that it would scale better than
>> exposing all this to QEMU plugins API.
>>
>> For SMMU, this is trickier. Tao recently added (6ce361b02c82) an iommu test
>> device, associated with qtest to unit test the smmu implementation. We
>> could maybe leverage that on a full machine, combined with the
>> communication method mentioned above, to generate specific operations at
>> runtime, all triggered via a plugin.
>>
>> Exposing qemu_plugin_flush_tb_cache is a hint that we are missing something
>> on the QEMU side. Better to fix it than expose this very internal function.
>
> The reason this was needed is that the plugin may receive a PC trigger
> configuration dynamically and need to register an instruction callback
> at runtime. If the TB for that PC is already translated and cached, our
> newly registered callback might not be executed.
>
> If there is a better way to force QEMU to re-translate a specific TB or
> attach a callback to a cached TB, it would be great to reduce the
> complexity here.
>
I understand better now. The current QEMU plugin implementation is too
limited for this, and everything has to be done/known at translation time.
What is your use case for receiving a PC trigger after translation? Do you
have some mechanism to communicate with the plugin for this?
>> The associated TRIGGER_ON_PC is very similar to existing inline
>> operations. They could be enhanced to support writing to a given
>> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
>> more complex, but we might enhance inline operations also to support
>> hooks on specific register writes.
>
> TRIGGER_ON_PC may also be used to generate other faults. For example,
> one use case is to trigger CPU exceptions on specific instructions.
> Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
> really interesting direction to explore.
>
In general, having inline operations support on register reads/writes
would be a very nice thing to have (though it might be tricky to implement
correctly), and more efficient than the existing approach, which requires
checking their value every time.
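A toy comparison of the two approaches, as a sketch only: ToyCPU and its
on_write hook are hypothetical and are not part of the QEMU plugin API. The
first loop checks the register after every instruction, the second is only
notified when the register is actually written.

```python
# Toy comparison: polling a register after every instruction versus being
# notified only on writes. ToyCPU and on_write are hypothetical names.

class ToyCPU:
    def __init__(self):
        self.regs = {"x0": 0}
        self.write_hooks = {}  # register name -> callback(old, new)
        self.hook_calls = 0

    def on_write(self, reg, callback):
        self.write_hooks[reg] = callback

    def write(self, reg, value):
        old = self.regs[reg]
        self.regs[reg] = value
        hook = self.write_hooks.get(reg)
        if hook:
            self.hook_calls += 1
            hook(old, value)


# Each entry is one instruction: None leaves x0 alone, a number writes x0.
program = [None, None, 5, None, None, None, 7, None]

# Approach 1: compare the register value after every single instruction.
cpu = ToyCPU()
polled_changes, checks = [], 0
last = cpu.regs["x0"]
for insn in program:
    if insn is not None:
        cpu.write("x0", insn)
    checks += 1
    if cpu.regs["x0"] != last:
        polled_changes.append(cpu.regs["x0"])
        last = cpu.regs["x0"]

# Approach 2: a write hook runs only when x0 is written.
cpu = ToyCPU()
hooked_changes = []
cpu.on_write("x0", lambda old, new: hooked_changes.append(new))
for insn in program:
    if insn is not None:
        cpu.write("x0", insn)
```

Both approaches observe the same two writes, but the polling loop pays for
one check per instruction while the hook only runs once per write.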
>>
>> For MMIO override, the current approach you have is good, and it's
>> definitely something we could integrate.
>>
>> What are your thoughts about this (especially the device-based approach,
>> in case you tried it first)?
>
> I agree such an approach can work well for IRQs and timers, and would be
> a cleaner way to implement this.
>
> However, for SMMU and similar cases, triggering internal state errors is not
> easy and requires accessing internal logic. So for those specific cases,
> a different approach may be needed.
>
Hence the iommu-testdev I mentioned, which could be extended to support this.
>>
>> Regards,
>> Pierrick
>
> BR,
> Ruslan
Regards,
Pierrick
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-19 19:04 ` Pierrick Bouvier
@ 2026-03-19 22:29 ` Ruslan Ruslichenko
2026-03-20 18:08 ` Pierrick Bouvier
0 siblings, 1 reply; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-19 22:29 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, qemu-arm, artem_mygaiev, volodymyr_babchuk,
alex.bennee, peter.maydell, philmd, Ruslan_Ruslichenko
On Thu, Mar 19, 2026 at 8:04 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 3/19/26 11:20 AM, Ruslan Ruslichenko wrote:
> > Hi Pierrick,
> >
> > Thank you for the feedback and review!
> >
> > Our current plan is to put this plugin through our internal workflows to gather
> > more data on its limitations and performance.
> > Based on results, we may consider extending or refining the implementation
> > in the future.
> >
> > Any further feedback on potential issues is highly appreciated.
> >
>
> By design, the approach of modifying QEMU internals to inject an IRQ,
> set a timer, or trigger SMMU faults has very little chance of being
> integrated as is. At the very least, it should be discussed with the
> relevant maintainers to see whether they would be open to it.
>
> It's not wrong in itself, if you want a downstream solution, but it does
> not scale upstream if we have to consider and accept everyone's needs.
> The plugin API in itself can accept the burden for such things, but it's
> harder to justify for internal stuff.
>
> I believe it would be better to rely on ad hoc devices generating this,
> with the advantage that even if they don't get accepted upstream, they
> will be easier for you to maintain downstream than more
> intrusive patches.
>
> > On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
> > <pierrick.bouvier@linaro.org> wrote:
> >>
> >> Hi Ruslan,
> >>
> >> First, thanks for posting your series!
> >>
> >> About the general approach.
> >> As you noticed, this exposes a lot of QEMU internals, which is
> >> something we tend to avoid. It is also very architecture-specific,
> >> which is another pattern we try to avoid.
> >>
> >> For some of your needs (especially IRQ injection and timer injection),
> >> did you consider writing a custom ad-hoc device and timer generating those?
> >> There is nothing preventing you from writing a plugin that can
> >> communicate with this specific device (through a socket for instance),
> >> to request specific injections. I feel that it would scale better than
> >> exposing all this to QEMU plugins API.
> >>
> >> For SMMU, this is trickier. Tao recently added (6ce361b02c82) an iommu test
> >> device, associated with qtest to unit test the smmu implementation. We
> >> could maybe leverage that on a full machine, combined with the
> >> communication method mentioned above, to generate specific operations at
> >> runtime, all triggered via a plugin.
> >>
> >> Exposing qemu_plugin_flush_tb_cache is a hint that we are missing something
> >> on the QEMU side. Better to fix it than expose this very internal function.
> >
> > The reason this was needed is that the plugin may receive a PC trigger
> > configuration dynamically and need to register an instruction callback
> > at runtime. If the TB for that PC is already translated and cached, our
> > newly registered callback might not be executed.
> >
> > If there is a better way to force QEMU to re-translate a specific TB or
> > attach a callback to a cached TB, it would be great to reduce the
> > complexity here.
> >
>
> I understand better now. The current QEMU plugin implementation is too
> limited for this, and everything has to be done/known at translation time.
> What is your use case for receiving a PC trigger after translation? Do you
> have some mechanism to communicate with the plugin for this?
Yes, exactly. If the guest has already executed the target code, the newly
added trigger will be ignored, as the TB is cached.
For runtime configuration, the plugin spawns a background thread that listens
on a socket. An external Python test script connects to this socket to send
dynamically generated XML fault descriptions.
There are several scenarios where this might be needed, mainly for faults that
are difficult to define statically at boot time.
Examples include injecting faults after a specific chain of events, or freezing
or overriding system register values at specific execution points (since this
is currently implemented via PC triggers). Supporting environments with KASLR
enabled might be one more case.
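As a rough sketch of this flow (not the actual plugin protocol): the XML
element and attribute names, the socket path and the helper names below are
all made up for illustration, and a small in-process thread stands in for
the plugin's listener so that the example is self-contained.

```python
import os
import socket
import tempfile
import threading
import xml.etree.ElementTree as ET


def build_fault_xml(trigger, pc, action):
    """Build one fault description. Element/attribute names are hypothetical."""
    fault = ET.Element("fault",
                       {"trigger": trigger, "pc": hex(pc), "action": action})
    return ET.tostring(fault)


def send_fault(sock_path, payload):
    """Test-script side: connect to the UNIX socket and send one XML document."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(payload)


def listen_once(server, received):
    """Stand-in for the plugin's background thread: accept one client, parse XML."""
    conn, _ = server.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # client closed the connection
                break
            chunks.append(data)
    received.append(ET.fromstring(b"".join(chunks)))


# Set up a temporary UNIX socket (the path is an arbitrary placeholder).
tmpdir = tempfile.mkdtemp()
sock_path = os.path.join(tmpdir, "fi.sock")
server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(sock_path)
server.listen(1)

received = []
listener = threading.Thread(target=listen_once, args=(server, received))
listener.start()

# Send a PC-triggered fault whose address is only known at runtime.
send_fault(sock_path, build_fault_xml("pc", 0xffff000010080000, "serror"))

listener.join()
server.close()
os.unlink(sock_path)
os.rmdir(tmpdir)
```

The point of the sketch is only that the address is computed by the test
script at runtime (for example after resolving a KASLR offset) rather than
being fixed in a boot-time configuration.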
>
> >> The associated TRIGGER_ON_PC is very similar to existing inline
> >> operations. They could be enhanced to support writing to a given
> >> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
> >> more complex, but we might enhance inline operations also to support
> >> hooks on specific register writes.
> >
> > TRIGGER_ON_PC may also be used to generate other faults. For example,
> > one use case is to trigger CPU exceptions on specific instructions.
> > Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
> > really interesting direction to explore.
> >
>
> In general, having inline operations support on register reads/writes
> would be a very nice thing to have (though it might be tricky to implement
> correctly), and more efficient than the existing approach, which requires
> checking their value every time.
>
> >>
> >> For MMIO override, the current approach you have is good, and it's
> >> definitely something we could integrate.
> >>
> >> What are your thoughts about this (especially the device-based approach,
> >> in case you tried it first)?
> >
> > I agree such an approach can work well for IRQs and timers, and would be
> > a cleaner way to implement this.
> >
> > However, for SMMU and similar cases, triggering internal state errors is not
> > easy and requires accessing internal logic. So for those specific cases,
> > a different approach may be needed.
> >
>
> Hence the iommu-testdev I mentioned, which could be extended to support this.
>
> >>
> >> Regards,
> >> Pierrick
> >
> > BR,
> > Ruslan
>
> Regards,
> Pierrick
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-19 22:29 ` Ruslan Ruslichenko
@ 2026-03-20 18:08 ` Pierrick Bouvier
2026-03-25 23:39 ` Ruslan Ruslichenko
0 siblings, 1 reply; 18+ messages in thread
From: Pierrick Bouvier @ 2026-03-20 18:08 UTC (permalink / raw)
To: Ruslan Ruslichenko
Cc: qemu-devel, qemu-arm, artem_mygaiev, volodymyr_babchuk,
alex.bennee, peter.maydell, philmd, Ruslan_Ruslichenko
On 3/19/26 3:29 PM, Ruslan Ruslichenko wrote:
> On Thu, Mar 19, 2026 at 8:04 PM Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> On 3/19/26 11:20 AM, Ruslan Ruslichenko wrote:
>>> Hi Pierrick,
>>>
>>> Thank you for the feedback and review!
>>>
>>> Our current plan is to put this plugin through our internal workflows to gather
>>> more data on its limitations and performance.
>>> Based on results, we may consider extending or refining the implementation
>>> in the future.
>>>
>>> Any further feedback on potential issues is highly appreciated.
>>>
>>
>> By design, the approach of modifying QEMU internals to inject an IRQ,
>> set a timer, or trigger SMMU faults has very little chance of being
>> integrated as is. At the very least, it should be discussed with the
>> relevant maintainers to see whether they would be open to it.
>>
>> It's not wrong in itself, if you want a downstream solution, but it does
>> not scale upstream if we have to consider and accept everyone's needs.
>> The plugin API in itself can accept the burden for such things, but it's
>> harder to justify for internal stuff.
>>
>> I believe it would be better to rely on ad hoc devices generating this,
>> with the advantage that even if they don't get accepted upstream, they
>> will be easier for you to maintain downstream than more
>> intrusive patches.
>>
>>> On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
>>> <pierrick.bouvier@linaro.org> wrote:
>>>>
>>>> Hi Ruslan,
>>>>
>>>> first, thanks for posting your series!
>>>>
>>>> About the general approach.
>>>> As you noticed, this is exposing a lot of QEMU internals, and it's
>>>> something we tend to avoid to do. As well, it's very architecture
>>>> specific, which is another pattern we try to avoid.
>>>>
>>>> For some of your needs (especially IRQ injection and timer injection),
>>>> did you consider writing a custom ad-hoc device and timer generating those?
>>>> There is nothing preventing you from writing a plugin that can
>>>> communicate with this specific device (through a socket for instance),
>>>> to request specific injections. I feel that it would scale better than
>>>> exposing all this to QEMU plugins API.
>>>>
>>>> For SMMU, this is trickier. Tao recently (6ce361b02c82) an iommu test
>>>> device, associated to qtest to unit test the smmu implementation. We
>>>> could maybe see to leverage that on a full machine, associated with the
>>>> communication method mentioned above, to generate specific operations at
>>>> runtime, all triggered via a plugin.
>>>>
>>>> Exposing qemu_plugin_flush_tb_cache is a hint we are missing something
>>>> on QEMU side. Better to fix it than expose this very internal function.
>>>
>>> The reason this was needed is that the plugin may receive PC trigger
>>> configuration
>>> dynamically and need to register instruction callback at runtime.
>>> If the TB for that PC is already translated and cached, our newly registered
>>> callback might not be executed.
>>>
>>> If there is a more proper way to force QEMU to re-translate a specific
>>> TB or attach
>>> a callback to cached TB it would be great to reduce the complexity here.
>>>
>>
>> I understand better. QEMU plugin current implementation is too limited
>> for this, and everything has to be done/known at translation time.
>> What is your use case for receiving PC trigger after translation? Do you
>> have some mechanism to communicate with the plugin for this?
>
> Yes, exactly. If the guest has already executed the target code, the newly
> added trigger will be ignored, as the TB is cached.
>
> For runtime configuration, the plugin spawns a background thread that listens
> on a socket. External Python test script connects to this socket to send
> dynamically generated XML faults.
>
Ok.

Internally, we have tb_invalidate_phys_range, which invalidates the TBs
in a given range. It is called when writing to memory at an address
holding code.

Thus, if your plugin writes to the pc address with
qemu_plugin_write_memory_vaddr, it should trigger a re-translation of
this TB. You'll need to read 1 byte and write it back. It should also be
more efficient, since you will only invalidate this TB.

Give it a try and let us know if it works for your needs.
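
To make the difference between a full flush and a targeted invalidation
concrete, here is a toy model in C. It is only a sketch: ToyTB,
toy_invalidate_range and toy_flush_all are made-up names for
illustration, and the model does not reflect QEMU's real data structures
or how tb_invalidate_phys_range is implemented.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy cached block covering guest addresses [start, start + size). */
typedef struct {
    uint64_t start;
    uint32_t size;
    bool valid;
} ToyTB;

/*
 * Drop only the blocks overlapping [addr, addr + len).
 * Returns how many blocks were invalidated.
 */
size_t toy_invalidate_range(ToyTB *tbs, size_t n, uint64_t addr, uint64_t len)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        bool overlaps = tbs[i].start < addr + len &&
                        addr < tbs[i].start + tbs[i].size;
        if (tbs[i].valid && overlaps) {
            tbs[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}

/* Drop every valid block, as a global flush would. */
size_t toy_flush_all(ToyTB *tbs, size_t n)
{
    size_t dropped = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbs[i].valid) {
            tbs[i].valid = false;
            dropped++;
        }
    }
    return dropped;
}
```

In this model, a one-byte write inside a block invalidates just that
block, whereas a flush forces every other block to be translated again
as well.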
> There are several scenarios where this might be needed, mainly for faults that
> are difficult to define statically at boot time.
> Examples include injecting faults after specific chain of events, freezing or
> overriding system registers values at specific execution points (since this
> is currently implemented via PC triggers). Supporting environments with KASLR
> enabled might be one more case.
>
For system registers, you can (heavy, but it would work) unconditionally
instrument all instructions that touch those registers, so there
would be no need to flush anything. System registers are not accessed
by every instruction, so hopefully it should not impact execution time
too much.

With both solutions, there should be no need to expose tb_flush
through the plugin API.
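
As a rough illustration of the "instrument every system register
access" idea, the sketch below recognizes the A64 MRS and MSR (register)
encodings in a raw 32-bit instruction word. SysregAccess and
decode_sysreg_access are hypothetical names, not part of the series or
of the plugin API. How the plugin obtains the instruction word and
registers its callback is left out; it is assumed to happen in a
translation-time callback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of an A64 system register access. */
typedef struct {
    bool is_read;   /* true for MRS, false for MSR (register) */
    uint8_t op0, op1, crn, crm, op2;
    uint8_t rt;     /* general-purpose register number */
} SysregAccess;

/*
 * Returns true and fills *out if insn is MRS or MSR (register).
 * Both share bits [31:22] = 0b1101010100 and bit 20 = 1;
 * bit 21 (L) distinguishes reads from writes.
 */
bool decode_sysreg_access(uint32_t insn, SysregAccess *out)
{
    if ((insn & 0xFFD00000u) != 0xD5100000u) {
        return false;
    }
    out->is_read = (insn >> 21) & 1;
    out->op0 = 2 + ((insn >> 19) & 1);
    out->op1 = (insn >> 16) & 0x7;
    out->crn = (insn >> 12) & 0xF;
    out->crm = (insn >> 8) & 0xF;
    out->op2 = (insn >> 5) & 0x7;
    out->rt = insn & 0x1F;
    return true;
}
```

A translation callback could run this check on each instruction and
attach an execution callback only to the matches, so the set of
instrumented instructions never depends on which triggers arrive later.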
>>
>>>> The associated TRIGGER_ON_PC is very similar to existing inline
>>>> operations. They could be enhanced to support writing to a given
>>>> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
>>>> more complex, but we might enhance inline operations also to support
>>>> hooks on specific register writes.
>>>
>>> TRIGGER_ON_PC may also be used for generating other faults too. For example,
>>> one use-case is to trigger CPU exceptions on specific instructions.
>>> Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
>>> really interesting
>>> direction to explore.
>>>
>>
>> In general, having inline operations support on register read/writes
>> would be a very nice thing to have (though might be tricky to implement
>> correctly), and more efficient that the existing approach that requires
>> to check their value everytime.
>>
>>>>
>>>> For MMIO override, the current approach you have is good, and it's
>>>> definitely something we could integrate.
>>>>
>>>> What are you toughts about this? (especially the device based approach
>>>> in case that you maybe tried first).
>>>
>>> I agree such an approach can work well for IRQ's and Timers, and would be
>>> more clean way to implement this.
>>>
>>> However, for SMMU and similar cases, triggering internal state errors is not
>>> easy and requires accessing internal logic. So for those specific cases,
>>> a different approach may be needed.
>>>
>>
>> Thus the iommu-testdev I mentioned, that could be extended to support this.
>>
>>>>
>>>> Regards,
>>>> Pierrick
>>>
>>> BR,
>>> Ruslan
>>
>> Regards,
>> Pierrick
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-20 18:08 ` Pierrick Bouvier
@ 2026-03-25 23:39 ` Ruslan Ruslichenko
2026-03-26 0:17 ` Pierrick Bouvier
0 siblings, 1 reply; 18+ messages in thread
From: Ruslan Ruslichenko @ 2026-03-25 23:39 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: qemu-devel, qemu-arm, artem_mygaiev, volodymyr_babchuk,
alex.bennee, peter.maydell, philmd, Ruslan_Ruslichenko
On Fri, Mar 20, 2026 at 7:08 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 3/19/26 3:29 PM, Ruslan Ruslichenko wrote:
> > On Thu, Mar 19, 2026 at 8:04 PM Pierrick Bouvier
> > <pierrick.bouvier@linaro.org> wrote:
> >>
> >> On 3/19/26 11:20 AM, Ruslan Ruslichenko wrote:
> >>> Hi Pierrick,
> >>>
> >>> Thank you for the feedback and review!
> >>>
> >>> Our current plan is to put this plugin through our internal workflows to gather
> >>> more data on its limitations and performance.
> >>> Based on results, we may consider extending or refining the implementation
> >>> in the future.
> >>>
> >>> Any further feedback on potential issues is highly appreciated.
> >>>
> >>
> >> By design, the approach of modifying QEMU internals to allow to inject
> >> IRQ, set a timer, or trigger SMMU has very few chances to be integrated
> >> as it is. At least, it should be discussed with the concerned
> >> maintainers, and see if they would be open to it or not.
> >>
> >> It's not wrong in itself, if you want a downstream solution, but it does
> >> not scale upstream if we have to consider and accept everyone's needs.
> >> The plugin API in itself can accept the burden for such things, but it's
> >> harder to justify for internal stuff.
> >>
> >> I believe it would be better to rely on ad hoc devices generating this,
> >> with the advantage that even if they don't get accepted upstream, it
> >> will be more easy for you to maintain them downstream compared to more
> >> intrusive patches.
> >>
> >>> On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
> >>> <pierrick.bouvier@linaro.org> wrote:
> >>>>
> >>>> Hi Ruslan,
> >>>>
> >>>> On 3/18/26 3:46 AM, Ruslan Ruslichenko wrote:
> >>>>> From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
> >>>>>
> >>>>> This patch series is submitted as an RFC to gather early feedback on a Fault Injection (FI) framework built on top of the QEMU TCG plugin subsystem.
> >>>>>
> >>>>> Motivation
> >>>>>
> >>>>> Testing guest operating systems, hypervisors (like Xen), and low-level drivers against unexpected hardware failures can be difficult.
> >>>>> This series provides an interface to inject faults dynamically without altering QEMU's core emulation source code for every test case.
> >>>>>
> >>>>> Architecture & Key Features
> >>>>>
> >>>>> The series introduces the core API extensions and implements a fault injection plugin (contrib/plugins/fault_injection.c) targeting AArch64.
> >>>>> The plugin can be controlled statically via XML configurations on boot, or dynamically at runtime via a UNIX socket (enabling integration with automated testing frameworks via Python or GDB).
> >>>>>
> >>>>> New Plugin API Capabilities:
> >>>>>
> >>>>> MMIO Interception: Allows plugins to hook into memory_region_dispatch_read/write to modify hardware register reads or drop writes.
> >>>>> Asynchronous Timers: Exposes QEMU_CLOCK_VIRTUAL to plugins, allowing callbacks to be scheduled based on guest virtual time.
> >>>>> TB Cache Flushing: Exposes qemu_plugin_flush_tb_cache() so plugins can force re-translation when applying dynamic PC-based hooks.
> >>>>> Interrupt & Exception Injection: Exposes APIs to raise/pulse hardware IRQs on the primary INTC and inject CPU exceptions (e.g., SErrors).
> >>>>> Custom Device Faults: Introduces a registry where device models (e.g., SMMUv3) can expose specific fault handlers (like CMDQ errors) to be triggered externally by plugins.
> >>>>>
> >>>>> Patch Summary
> >>>>> Patch 1 (target/arm): Adds support for asynchronous CPU exception injection.
> >>>>> Patch 2-3 (plugins/api): Exposes virtual clock timers and TB cache flushing to the public plugin API.
> >>>>> Patch 4 (plugins): Introduces the core fault injection subsystem, IRQ/Exception routing, and the Custom Fault registry.
> >>>>> Patch 5 (system/memory): Adds the MMIO override hooks into the memory dispatch path.
> >>>>> Patch 6 (hw/intc): Registers the ARM GIC (v2/v3) with the plugin subsystem to enable direct hardware IRQ injection.
> >>>>> Patch 7 (hw/arm): Registers the SMMUv3 with the custom fault registry to demonstrate how device models can expose specific errors (like CMDQ faults) to plugins.
> >>>>> Patch 8 (contrib/plugins): Implements the actual fault_injection plugin using the new APIs.
> >>>>> Patch 9 (docs): Adds documentation and usage examples for the plugin.
> >>>>>
> >>>>> Request for Comments & Feedback
> >>>>>
> >>>>> Any suggestions on improvements, potential edge cases, or issues with the current design are highly welcome.
> >>>>>
> >>>>> Ruslan Ruslichenko (9):
> >>>>> target/arm: Add API for dynamic exception injection
> >>>>> plugins/api: Expose virtual clock timers to plugins
> >>>>> plugins: Expose Transaction Block cache flush API to plugins
> >>>>> plugins: Introduce fault injection API and core subsystem
> >>>>> system/memory: Add plugin callbacks to intercept MMIO accesses
> >>>>> hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
> >>>>> hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
> >>>>> contrib/plugins: Add fault injection plugin
> >>>>> docs: Add description of fault-injection plugin and subsystem
> >>>>>
> >>>>> contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
> >>>>> contrib/plugins/meson.build | 1 +
> >>>>> docs/fault-injection.txt | 111 +++++
> >>>>> hw/arm/smmuv3.c | 54 +++
> >>>>> hw/intc/arm_gic.c | 28 ++
> >>>>> hw/intc/arm_gicv3.c | 28 ++
> >>>>> include/plugins/qemu-plugin.h | 28 ++
> >>>>> include/qemu/plugin.h | 39 ++
> >>>>> plugins/api.c | 62 +++
> >>>>> plugins/core.c | 11 +
> >>>>> plugins/fault.c | 116 +++++
> >>>>> plugins/meson.build | 1 +
> >>>>> plugins/plugin.h | 2 +
> >>>>> system/memory.c | 8 +
> >>>>> target/arm/cpu.h | 4 +
> >>>>> target/arm/helper.c | 55 +++
> >>>>> 16 files changed, 1320 insertions(+)
> >>>>> create mode 100644 contrib/plugins/fault_injection.c
> >>>>> create mode 100644 docs/fault-injection.txt
> >>>>> create mode 100644 plugins/fault.c
> >>>>>
> >>>>
> >>>> first, thanks for posting your series!
> >>>>
> >>>> About the general approach.
> >>>> As you noticed, this is exposing a lot of QEMU internals, and it's
> >>>> something we tend to avoid to do. As well, it's very architecture
> >>>> specific, which is another pattern we try to avoid.
> >>>>
> >>>> For some of your needs (especially IRQ injection and timer injection),
> >>>> did you consider writing a custom ad-hoc device and timer generating those?
> >>>> There is nothing preventing you from writing a plugin that can
> >>>> communicate with this specific device (through a socket for instance),
> >>>> to request specific injections. I feel that it would scale better than
> >>>> exposing all this to QEMU plugins API.
> >>>>
> >>>> For SMMU, this is trickier. Tao recently (6ce361b02c82) an iommu test
> >>>> device, associated to qtest to unit test the smmu implementation. We
> >>>> could maybe see to leverage that on a full machine, associated with the
> >>>> communication method mentioned above, to generate specific operations at
> >>>> runtime, all triggered via a plugin.
> >>>>
> >>>> Exposing qemu_plugin_flush_tb_cache is a hint we are missing something
> >>>> on QEMU side. Better to fix it than expose this very internal function.
> >>>
> >>> The reason this was needed is that the plugin may receive PC trigger
> >>> configuration
> >>> dynamically and need to register instruction callback at runtime.
> >>> If the TB for that PC is already translated and cached, our newly registered
> >>> callback might not be executed.
> >>>
> >>> If there is a more proper way to force QEMU to re-translate a specific
> >>> TB or attach
> >>> a callback to cached TB it would be great to reduce the complexity here.
> >>>
> >>
> >> I understand better. QEMU plugin current implementation is too limited
> >> for this, and everything has to be done/known at translation time.
> >> What is your use case for receiving PC trigger after translation? Do you
> >> have some mechanism to communicate with the plugin for this?
> >
> > Yes, exactly. If the guest has already executed the target code, the newly
> > added trigger will be ignored, as the TB is cached.
> >
> > For runtime configuration, the plugin spawns a background thread that listens
> > on a socket. External Python test script connects to this socket to send
> > dynamically generated XML faults.
> >
>
> Ok.
>
> Internally, we have tb_invalidate_phys_range that will invalidate a
> given range of tb. This is called when writing to memory for a given
> address holding code.
>
> Thus from your plugin, if you write to pc address with
> qemu_plugin_write_memory_vaddr, it should trigger a re-translation of
> this tb. You'll need to read 1 byte, and write it back. As well, it
> should be more efficient, since you will only invalidate this tb.
>
> Give it a try and let us know if it works for your need.
>
Thank you for your suggestion. This is really useful information about
the internals of TB processing.

I set up a test to simulate a scenario where a TB flush is needed
and used the described mechanism. However, there is a threading limitation:
qemu_plugin_write_memory_vaddr() must be called from a CPU thread.
In our current implementation, dynamic faults are received and processed
by a background thread listening on a socket, so we cannot directly
use the API from that context to trigger invalidation.
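
One possible way to bridge the two contexts, sketched here under
assumptions and not something the series currently does: the socket
thread only records the requested address, and code running on a CPU
thread picks it up later. PendingQueue, pending_push and pending_drain
are hypothetical names used for illustration.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PENDING_MAX 16

/* Addresses waiting to be handled on a CPU thread. */
typedef struct {
    pthread_mutex_t lock;
    uint64_t addr[PENDING_MAX];
    size_t count;
} PendingQueue;

/* Socket thread side: record the request, touch nothing else. */
bool pending_push(PendingQueue *q, uint64_t vaddr)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->count < PENDING_MAX) {
        q->addr[q->count++] = vaddr;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/*
 * CPU thread side: move up to cap pending addresses into out[],
 * oldest first, and keep the rest queued. Returns the number moved.
 */
size_t pending_drain(PendingQueue *q, uint64_t *out, size_t cap)
{
    size_t n = 0;

    pthread_mutex_lock(&q->lock);
    while (n < cap && n < q->count) {
        out[n] = q->addr[n];
        n++;
    }
    size_t left = q->count - n;
    for (size_t i = 0; i < left; i++) {
        q->addr[i] = q->addr[n + i];
    }
    q->count = left;
    pthread_mutex_unlock(&q->lock);
    return n;
}
```

The drained addresses would then be handled from a callback that already
runs in CPU context, which is where the memory write is allowed.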
> > There are several scenarios where this might be needed, mainly for faults that
> > are difficult to define statically at boot time.
> > Examples include injecting faults after specific chain of events, freezing or
> > overriding system registers values at specific execution points (since this
> > is currently implemented via PC triggers). Supporting environments with KASLR
> > enabled might be one more case.
> >
>
> For system registers, you can (heavy but would work) instrument
> inconditionally all instructions that touch those registers, so there
> would be no need to flush anything. System registers are not accessed
> for every instruction, so hopefully, it should not impact too much
> execution time.
>
Agreed, this is a good optimization, and it indeed simplifies dynamic
fault handling for system register reads.
Thank you for the recommendation!
> With both solutions, it should remove the need to expose tb_flush
> through plugin API.
>
> >>
> >>>> The associated TRIGGER_ON_PC is very similar to existing inline
> >>>> operations. They could be enhanced to support writing to a given
> >>>> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
> >>>> more complex, but we might enhance inline operations also to support
> >>>> hooks on specific register writes.
> >>>
> >>> TRIGGER_ON_PC may also be used for generating other faults too. For example,
> >>> one use-case is to trigger CPU exceptions on specific instructions.
> >>> Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
> >>> really interesting
> >>> direction to explore.
> >>>
> >>
> >> In general, having inline operations support on register read/writes
> >> would be a very nice thing to have (though might be tricky to implement
> >> correctly), and more efficient that the existing approach that requires
> >> to check their value everytime.
> >>
> >>>>
> >>>> For MMIO override, the current approach you have is good, and it's
> >>>> definitely something we could integrate.
> >>>>
> >>>> What are you toughts about this? (especially the device based approach
> >>>> in case that you maybe tried first).
> >>>
> >>> I agree such an approach can work well for IRQ's and Timers, and would be
> >>> more clean way to implement this.
> >>>
> >>> However, for SMMU and similar cases, triggering internal state errors is not
> >>> easy and requires accessing internal logic. So for those specific cases,
> >>> a different approach may be needed.
> >>>
> >>
> >> Thus the iommu-testdev I mentioned, that could be extended to support this.
> >>
> >>>>
> >>>> Regards,
> >>>> Pierrick
> >>>
> >>> BR,
> >>> Ruslan
> >>
> >> Regards,
> >> Pierrick
>
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-25 23:39 ` Ruslan Ruslichenko
@ 2026-03-26 0:17 ` Pierrick Bouvier
2026-03-26 11:45 ` Alex Bennée
0 siblings, 1 reply; 18+ messages in thread
From: Pierrick Bouvier @ 2026-03-26 0:17 UTC (permalink / raw)
To: Ruslan Ruslichenko
Cc: qemu-devel, qemu-arm, artem_mygaiev, volodymyr_babchuk,
alex.bennee, peter.maydell, philmd, Ruslan_Ruslichenko
On 3/25/26 4:39 PM, Ruslan Ruslichenko wrote:
> On Fri, Mar 20, 2026 at 7:08 PM Pierrick Bouvier
> <pierrick.bouvier@linaro.org> wrote:
>>
>> On 3/19/26 3:29 PM, Ruslan Ruslichenko wrote:
>>> On Thu, Mar 19, 2026 at 8:04 PM Pierrick Bouvier
>>> <pierrick.bouvier@linaro.org> wrote:
>>>>
>>>> On 3/19/26 11:20 AM, Ruslan Ruslichenko wrote:
>>>>> Hi Pierrick,
>>>>>
>>>>> Thank you for the feedback and review!
>>>>>
>>>>> Our current plan is to put this plugin through our internal workflows to gather
>>>>> more data on its limitations and performance.
>>>>> Based on results, we may consider extending or refining the implementation
>>>>> in the future.
>>>>>
>>>>> Any further feedback on potential issues is highly appreciated.
>>>>>
>>>>
>>>> By design, the approach of modifying QEMU internals to allow to inject
>>>> IRQ, set a timer, or trigger SMMU has very few chances to be integrated
>>>> as it is. At least, it should be discussed with the concerned
>>>> maintainers, and see if they would be open to it or not.
>>>>
>>>> It's not wrong in itself, if you want a downstream solution, but it does
>>>> not scale upstream if we have to consider and accept everyone's needs.
>>>> The plugin API in itself can accept the burden for such things, but it's
>>>> harder to justify for internal stuff.
>>>>
>>>> I believe it would be better to rely on ad hoc devices generating this,
>>>> with the advantage that even if they don't get accepted upstream, it
>>>> will be more easy for you to maintain them downstream compared to more
>>>> intrusive patches.
>>>>
>>>>> On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
>>>>> <pierrick.bouvier@linaro.org> wrote:
>>>>>>
>>>>>> Hi Ruslan,
>>>>>>
>>>>>> On 3/18/26 3:46 AM, Ruslan Ruslichenko wrote:
>>>>>>> From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
>>>>>>>
>>>>>>> This patch series is submitted as an RFC to gather early feedback on a Fault Injection (FI) framework built on top of the QEMU TCG plugin subsystem.
>>>>>>>
>>>>>>> Motivation
>>>>>>>
>>>>>>> Testing guest operating systems, hypervisors (like Xen), and low-level drivers against unexpected hardware failures can be difficult.
>>>>>>> This series provides an interface to inject faults dynamically without altering QEMU's core emulation source code for every test case.
>>>>>>>
>>>>>>> Architecture & Key Features
>>>>>>>
>>>>>>> The series introduces the core API extensions and implements a fault injection plugin (contrib/plugins/fault_injection.c) targeting AArch64.
>>>>>>> The plugin can be controlled statically via XML configurations on boot, or dynamically at runtime via a UNIX socket (enabling integration with automated testing frameworks via Python or GDB).
>>>>>>>
>>>>>>> New Plugin API Capabilities:
>>>>>>>
>>>>>>> MMIO Interception: Allows plugins to hook into memory_region_dispatch_read/write to modify hardware register reads or drop writes.
>>>>>>> Asynchronous Timers: Exposes QEMU_CLOCK_VIRTUAL to plugins, allowing callbacks to be scheduled based on guest virtual time.
>>>>>>> TB Cache Flushing: Exposes qemu_plugin_flush_tb_cache() so plugins can force re-translation when applying dynamic PC-based hooks.
>>>>>>> Interrupt & Exception Injection: Exposes APIs to raise/pulse hardware IRQs on the primary INTC and inject CPU exceptions (e.g., SErrors).
>>>>>>> Custom Device Faults: Introduces a registry where device models (e.g., SMMUv3) can expose specific fault handlers (like CMDQ errors) to be triggered externally by plugins.
>>>>>>>
>>>>>>> Patch Summary
>>>>>>> Patch 1 (target/arm): Adds support for asynchronous CPU exception injection.
>>>>>>> Patch 2-3 (plugins/api): Exposes virtual clock timers and TB cache flushing to the public plugin API.
>>>>>>> Patch 4 (plugins): Introduces the core fault injection subsystem, IRQ/Exception routing, and the Custom Fault registry.
>>>>>>> Patch 5 (system/memory): Adds the MMIO override hooks into the memory dispatch path.
>>>>>>> Patch 6 (hw/intc): Registers the ARM GIC (v2/v3) with the plugin subsystem to enable direct hardware IRQ injection.
>>>>>>> Patch 7 (hw/arm): Registers the SMMUv3 with the custom fault registry to demonstrate how device models can expose specific errors (like CMDQ faults) to plugins.
>>>>>>> Patch 8 (contrib/plugins): Implements the actual fault_injection plugin using the new APIs.
>>>>>>> Patch 9 (docs): Adds documentation and usage examples for the plugin.
>>>>>>>
>>>>>>> Request for Comments & Feedback
>>>>>>>
>>>>>>> Any suggestions on improvements, potential edge cases, or issues with the current design are highly welcome.
>>>>>>>
>>>>>>> Ruslan Ruslichenko (9):
>>>>>>> target/arm: Add API for dynamic exception injection
>>>>>>> plugins/api: Expose virtual clock timers to plugins
>>>>>>> plugins: Expose Transaction Block cache flush API to plugins
>>>>>>> plugins: Introduce fault injection API and core subsystem
>>>>>>> system/memory: Add plugin callbacks to intercept MMIO accesses
>>>>>>> hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
>>>>>>> hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
>>>>>>> contrib/plugins: Add fault injection plugin
>>>>>>> docs: Add description of fault-injection plugin and subsystem
>>>>>>>
>>>>>>> contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
>>>>>>> contrib/plugins/meson.build | 1 +
>>>>>>> docs/fault-injection.txt | 111 +++++
>>>>>>> hw/arm/smmuv3.c | 54 +++
>>>>>>> hw/intc/arm_gic.c | 28 ++
>>>>>>> hw/intc/arm_gicv3.c | 28 ++
>>>>>>> include/plugins/qemu-plugin.h | 28 ++
>>>>>>> include/qemu/plugin.h | 39 ++
>>>>>>> plugins/api.c | 62 +++
>>>>>>> plugins/core.c | 11 +
>>>>>>> plugins/fault.c | 116 +++++
>>>>>>> plugins/meson.build | 1 +
>>>>>>> plugins/plugin.h | 2 +
>>>>>>> system/memory.c | 8 +
>>>>>>> target/arm/cpu.h | 4 +
>>>>>>> target/arm/helper.c | 55 +++
>>>>>>> 16 files changed, 1320 insertions(+)
>>>>>>> create mode 100644 contrib/plugins/fault_injection.c
>>>>>>> create mode 100644 docs/fault-injection.txt
>>>>>>> create mode 100644 plugins/fault.c
>>>>>>>
>>>>>>
>>>>>> first, thanks for posting your series!
>>>>>>
>>>>>> About the general approach.
>>>>>> As you noticed, this is exposing a lot of QEMU internals, and it's
>>>>>> something we tend to avoid to do. As well, it's very architecture
>>>>>> specific, which is another pattern we try to avoid.
>>>>>>
>>>>>> For some of your needs (especially IRQ injection and timer injection),
>>>>>> did you consider writing a custom ad-hoc device and timer generating those?
>>>>>> There is nothing preventing you from writing a plugin that can
>>>>>> communicate with this specific device (through a socket for instance),
>>>>>> to request specific injections. I feel that it would scale better than
>>>>>> exposing all this to QEMU plugins API.
>>>>>>
>>>>>> For SMMU, this is trickier. Tao recently (6ce361b02c82) an iommu test
>>>>>> device, associated to qtest to unit test the smmu implementation. We
>>>>>> could maybe see to leverage that on a full machine, associated with the
>>>>>> communication method mentioned above, to generate specific operations at
>>>>>> runtime, all triggered via a plugin.
>>>>>>
>>>>>> Exposing qemu_plugin_flush_tb_cache is a hint we are missing something
>>>>>> on QEMU side. Better to fix it than expose this very internal function.
>>>>>
>>>>> The reason this was needed is that the plugin may receive PC trigger
>>>>> configuration
>>>>> dynamically and need to register instruction callback at runtime.
>>>>> If the TB for that PC is already translated and cached, our newly registered
>>>>> callback might not be executed.
>>>>>
>>>>> If there is a more proper way to force QEMU to re-translate a specific
>>>>> TB or attach
>>>>> a callback to cached TB it would be great to reduce the complexity here.
>>>>>
>>>>
>>>> I understand better. QEMU plugin current implementation is too limited
>>>> for this, and everything has to be done/known at translation time.
>>>> What is your use case for receiving PC trigger after translation? Do you
>>>> have some mechanism to communicate with the plugin for this?
>>>
>>> Yes, exactly. If the guest has already executed the target code, the newly
>>> added trigger will be ignored, as the TB is cached.
>>>
>>> For runtime configuration, the plugin spawns a background thread that listens
>>> on a socket. External Python test script connects to this socket to send
>>> dynamically generated XML faults.
>>>
>>
>> Ok.
>>
>> Internally, we have tb_invalidate_phys_range that will invalidate a
>> given range of tb. This is called when writing to memory for a given
>> address holding code.
>>
>> Thus from your plugin, if you write to pc address with
>> qemu_plugin_write_memory_vaddr, it should trigger a re-translation of
>> this tb. You'll need to read 1 byte, and write it back. As well, it
>> should be more efficient, since you will only invalidate this tb.
>>
>> Give it a try and let us know if it works for your need.
>>
>
> Thank you for your suggestion. This is really useful information regarding
> internals of tb processing.
>
> I set up a test to simulate a scenario where a TB flush is needed
> and used the described mechanism. However, there is a threading limitation:
> qemu_plugin_write_memory_vaddr() must be called from a CPU thread.
> In our current implementation dynamic faults are received and processed
> by a background thread listening on a socket, so we cannot directly
> use API from that context to trigger invalidation.
>
Indeed, when writing to a virtual address, we need to know the current
execution context and page table setup to translate it. I have two ideas:

- Register a callback per TB. When hitting a TB containing the address
  where the fault should be injected, perform the read/write described
  above. You always instrument, and selectively "poke" the code to
  trigger a new translation.
- Simulate a given number of CPU watchpoints (N) by using N conditional
  callbacks on every instruction, comparing the current pc to N
  addresses. I'm afraid it will be too slow.
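
The two checks behind these ideas can be sketched as plain C helpers.
This is only an illustration under assumptions: tb_contains_trigger and
pc_matches_watchpoint are hypothetical names, and the block start and
size are assumed to be known to the callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Idea 1: run once per executed block. Reports whether any trigger
 * address falls inside [tb_start, tb_start + tb_size) and, if so,
 * stores the first match in *hit so the caller can poke that address.
 */
bool tb_contains_trigger(uint64_t tb_start, uint64_t tb_size,
                         const uint64_t *triggers, size_t n, uint64_t *hit)
{
    for (size_t i = 0; i < n; i++) {
        if (triggers[i] >= tb_start && triggers[i] - tb_start < tb_size) {
            *hit = triggers[i];
            return true;
        }
    }
    return false;
}

/*
 * Idea 2: run on every executed instruction. Compares the current pc
 * against each of the N emulated watchpoints.
 */
bool pc_matches_watchpoint(uint64_t pc, const uint64_t *wps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (wps[i] == pc) {
            return true;
        }
    }
    return false;
}
```

The first check costs one pass over the trigger list per block, while
the second pays the same pass for every instruction, which is the
slowdown mentioned above.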

One thing that could be considered on the API side is adding a way
to invalidate a specific physical address (not all TBs), based on
tb_invalidate_phys_range. The problem is that the plugin then needs to
keep track of all physical addresses matching the virtual ones you want
to invalidate, which is not convenient.

Otherwise, the easiest way to solve all this is to expose tb_flush, like
you did, but keep this patch downstream for now.
If your final plugin stays downstream (which I expect, given that it has
its own protocol for injecting faults and no source for it), it's really
the cheapest solution.

The current design is built around the assumption that instrumentation
is done at translation time (and not later). Changing that by
instrumenting after translation brings new constraints we can't solve at
the moment without exposing internal details.
>>> There are several scenarios where this might be needed, mainly for faults that
>>> are difficult to define statically at boot time.
>>> Examples include injecting faults after specific chain of events, freezing or
>>> overriding system registers values at specific execution points (since this
>>> is currently implemented via PC triggers). Supporting environments with KASLR
>>> enabled might be one more case.
>>>
>>
>> For system registers, you can (heavy but would work) instrument
>> inconditionally all instructions that touch those registers, so there
>> would be no need to flush anything. System registers are not accessed
>> for every instruction, so hopefully, it should not impact too much
>> execution time.
>>
>
> Agree, this is a good optimization and indeed simplifies dynamic faults
> handling for system register reads.
> Thank you for the recommendation!
>
>> With both solutions, it should remove the need to expose tb_flush
>> through plugin API.
>>
>>>>
>>>>>> The associated TRIGGER_ON_PC is very similar to existing inline
>>>>>> operations. They could be enhanced to support writing to a given
>>>>>> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
>>>>>> more complex, but we might enhance inline operations also to support
>>>>>> hooks on specific register writes.
>>>>>
>>>>> TRIGGER_ON_PC may also be used for generating other faults. For example,
>>>>> one use case is to trigger CPU exceptions on specific instructions.
>>>>> Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
>>>>> really interesting direction to explore.
>>>>>
>>>>
>>>> In general, having inline operations support on register read/writes
>>>> would be a very nice thing to have (though might be tricky to implement
>>>> correctly), and more efficient than the existing approach that requires
>>>> checking their value every time.
>>>>
>>>>>>
>>>>>> For MMIO override, the current approach you have is good, and it's
>>>>>> definitely something we could integrate.
>>>>>>
>>>>>> What are your thoughts about this? (especially the device-based approach,
>>>>>> in case you tried it first).
>>>>>
>>>>> I agree such an approach can work well for IRQs and timers, and would be
>>>>> a cleaner way to implement this.
>>>>>
>>>>> However, for SMMU and similar cases, triggering internal state errors is not
>>>>> easy and requires accessing internal logic. So for those specific cases,
>>>>> a different approach may be needed.
>>>>>
>>>>
>>>> Thus the iommu-testdev I mentioned, that could be extended to support this.
>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Pierrick
>>>>>
>>>>> BR,
>>>>> Ruslan
>>>>
>>>> Regards,
>>>> Pierrick
>>
Regards,
Pierrick
* Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
2026-03-26 0:17 ` Pierrick Bouvier
@ 2026-03-26 11:45 ` Alex Bennée
0 siblings, 0 replies; 18+ messages in thread
From: Alex Bennée @ 2026-03-26 11:45 UTC (permalink / raw)
To: Pierrick Bouvier
Cc: Ruslan Ruslichenko, qemu-devel, qemu-arm, artem_mygaiev,
volodymyr_babchuk, peter.maydell, philmd, Ruslan_Ruslichenko,
Richard Henderson
(adding Richard to Cc)
Pierrick Bouvier <pierrick.bouvier@linaro.org> writes:
> On 3/25/26 4:39 PM, Ruslan Ruslichenko wrote:
>> On Fri, Mar 20, 2026 at 7:08 PM Pierrick Bouvier
>> <pierrick.bouvier@linaro.org> wrote:
>>>
>>> On 3/19/26 3:29 PM, Ruslan Ruslichenko wrote:
>>>> On Thu, Mar 19, 2026 at 8:04 PM Pierrick Bouvier
>>>> <pierrick.bouvier@linaro.org> wrote:
>>>>>
>>>>> On 3/19/26 11:20 AM, Ruslan Ruslichenko wrote:
>>>>>> Hi Pierrick,
>>>>>>
>>>>>> Thank you for the feedback and review!
>>>>>>
>>>>>> Our current plan is to put this plugin through our internal workflows to gather
>>>>>> more data on its limitations and performance.
>>>>>> Based on results, we may consider extending or refining the implementation
>>>>>> in the future.
>>>>>>
>>>>>> Any further feedback on potential issues is highly appreciated.
>>>>>>
>>>>>
>>>>> By design, the approach of modifying QEMU internals to inject an IRQ,
>>>>> set a timer, or trigger SMMU faults has very little chance of being
>>>>> integrated as is. At the least, it should be discussed with the
>>>>> relevant maintainers to see whether they would be open to it.
>>>>>
>>>>> It's not wrong in itself, if you want a downstream solution, but it does
>>>>> not scale upstream if we have to consider and accept everyone's needs.
>>>>> The plugin API in itself can accept the burden for such things, but it's
>>>>> harder to justify for internal stuff.
>>>>>
>>>>> I believe it would be better to rely on ad hoc devices generating this,
>>>>> with the advantage that even if they don't get accepted upstream, it
>>>>> will be easier for you to maintain them downstream compared to more
>>>>> intrusive patches.
>>>>>
>>>>>> On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
>>>>>> <pierrick.bouvier@linaro.org> wrote:
>>>>>>>
>>>>>>> Hi Ruslan,
>>>>>>>
>>>>>>> On 3/18/26 3:46 AM, Ruslan Ruslichenko wrote:
>>>>>>>> From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
>>>>>>>>
>>>>>>>> This patch series is submitted as an RFC to gather early feedback on a Fault Injection (FI) framework built on top of the QEMU TCG plugin subsystem.
>>>>>>>>
>>>>>>>> Motivation
>>>>>>>>
>>>>>>>> Testing guest operating systems, hypervisors (like Xen), and low-level drivers against unexpected hardware failures can be difficult.
>>>>>>>> This series provides an interface to inject faults dynamically without altering QEMU's core emulation source code for every test case.
>>>>>>>>
>>>>>>>> Architecture & Key Features
>>>>>>>>
>>>>>>>> The series introduces the core API extensions and implements a fault injection plugin (contrib/plugins/fault_injection.c) targeting AArch64.
>>>>>>>> The plugin can be controlled statically via XML configurations on boot, or dynamically at runtime via a UNIX socket (enabling integration with automated testing frameworks via Python or GDB).
>>>>>>>>
>>>>>>>> New Plugin API Capabilities:
>>>>>>>>
>>>>>>>> MMIO Interception: Allows plugins to hook into memory_region_dispatch_read/write to modify hardware register reads or drop writes.
>>>>>>>> Asynchronous Timers: Exposes QEMU_CLOCK_VIRTUAL to plugins, allowing callbacks to be scheduled based on guest virtual time.
>>>>>>>> TB Cache Flushing: Exposes qemu_plugin_flush_tb_cache() so plugins can force re-translation when applying dynamic PC-based hooks.
>>>>>>>> Interrupt & Exception Injection: Exposes APIs to raise/pulse hardware IRQs on the primary INTC and inject CPU exceptions (e.g., SErrors).
>>>>>>>> Custom Device Faults: Introduces a registry where device models (e.g., SMMUv3) can expose specific fault handlers (like CMDQ errors) to be triggered externally by plugins.
>>>>>>>>
>>>>>>>> Patch Summary
>>>>>>>> Patch 1 (target/arm): Adds support for asynchronous CPU exception injection.
>>>>>>>> Patch 2-3 (plugins/api): Exposes virtual clock timers and TB cache flushing to the public plugin API.
>>>>>>>> Patch 4 (plugins): Introduces the core fault injection subsystem, IRQ/Exception routing, and the Custom Fault registry.
>>>>>>>> Patch 5 (system/memory): Adds the MMIO override hooks into the memory dispatch path.
>>>>>>>> Patch 6 (hw/intc): Registers the ARM GIC (v2/v3) with the plugin subsystem to enable direct hardware IRQ injection.
>>>>>>>> Patch 7 (hw/arm): Registers the SMMUv3 with the custom fault registry to demonstrate how device models can expose specific errors (like CMDQ faults) to plugins.
>>>>>>>> Patch 8 (contrib/plugins): Implements the actual fault_injection plugin using the new APIs.
>>>>>>>> Patch 9 (docs): Adds documentation and usage examples for the plugin.
>>>>>>>>
>>>>>>>> Request for Comments & Feedback
>>>>>>>>
>>>>>>>> Any suggestions on improvements, potential edge cases, or issues with the current design are highly welcome.
>>>>>>>>
>>>>>>>> Ruslan Ruslichenko (9):
>>>>>>>> target/arm: Add API for dynamic exception injection
>>>>>>>> plugins/api: Expose virtual clock timers to plugins
>>>>>>>> plugins: Expose Transaction Block cache flush API to plugins
>>>>>>>> plugins: Introduce fault injection API and core subsystem
>>>>>>>> system/memory: Add plugin callbacks to intercept MMIO accesses
>>>>>>>> hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
>>>>>>>> hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
>>>>>>>> contrib/plugins: Add fault injection plugin
>>>>>>>> docs: Add description of fault-injection plugin and subsystem
>>>>>>>>
>>>>>>>> contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
>>>>>>>> contrib/plugins/meson.build | 1 +
>>>>>>>> docs/fault-injection.txt | 111 +++++
>>>>>>>> hw/arm/smmuv3.c | 54 +++
>>>>>>>> hw/intc/arm_gic.c | 28 ++
>>>>>>>> hw/intc/arm_gicv3.c | 28 ++
>>>>>>>> include/plugins/qemu-plugin.h | 28 ++
>>>>>>>> include/qemu/plugin.h | 39 ++
>>>>>>>> plugins/api.c | 62 +++
>>>>>>>> plugins/core.c | 11 +
>>>>>>>> plugins/fault.c | 116 +++++
>>>>>>>> plugins/meson.build | 1 +
>>>>>>>> plugins/plugin.h | 2 +
>>>>>>>> system/memory.c | 8 +
>>>>>>>> target/arm/cpu.h | 4 +
>>>>>>>> target/arm/helper.c | 55 +++
>>>>>>>> 16 files changed, 1320 insertions(+)
>>>>>>>> create mode 100644 contrib/plugins/fault_injection.c
>>>>>>>> create mode 100644 docs/fault-injection.txt
>>>>>>>> create mode 100644 plugins/fault.c
>>>>>>>>
>>>>>>>
>>>>>>> first, thanks for posting your series!
>>>>>>>
>>>>>>> About the general approach.
>>>>>>> As you noticed, this is exposing a lot of QEMU internals, and it's
>>>>>>> something we tend to avoid. It is also very architecture
>>>>>>> specific, which is another pattern we try to avoid.
>>>>>>>
>>>>>>> For some of your needs (especially IRQ injection and timer injection),
>>>>>>> did you consider writing a custom ad-hoc device and timer generating those?
>>>>>>> There is nothing preventing you from writing a plugin that can
>>>>>>> communicate with this specific device (through a socket for instance),
>>>>>>> to request specific injections. I feel that it would scale better than
>>>>>>> exposing all this to QEMU plugins API.
>>>>>>>
>>>>>>> For SMMU, this is trickier. Tao recently added (6ce361b02c82) an iommu
>>>>>>> test device, used with qtest to unit test the SMMU implementation. We
>>>>>>> could maybe leverage that on a full machine, combined with the
>>>>>>> communication method mentioned above, to generate specific operations at
>>>>>>> runtime, all triggered via a plugin.
>>>>>>>
>>>>>>> Exposing qemu_plugin_flush_tb_cache is a hint we are missing something
>>>>>>> on QEMU side. Better to fix it than expose this very internal function.
>>>>>>
>>>>>> The reason this was needed is that the plugin may receive PC trigger
>>>>>> configuration dynamically and need to register an instruction callback
>>>>>> at runtime.
>>>>>> If the TB for that PC is already translated and cached, our newly registered
>>>>>> callback might not be executed.
>>>>>>
>>>>>> If there is a more proper way to force QEMU to re-translate a specific
>>>>>> TB or attach a callback to a cached TB, it would be great to reduce the
>>>>>> complexity here.
>>>>>>
>>>>>
>>>>> I understand better. The current QEMU plugin implementation is too limited
>>>>> for this, and everything has to be done/known at translation time.
>>>>> What is your use case for receiving PC trigger after translation? Do you
>>>>> have some mechanism to communicate with the plugin for this?
>>>>
>>>> Yes, exactly. If the guest has already executed the target code, the newly
>>>> added trigger will be ignored, as the TB is cached.
>>>>
>>>> For runtime configuration, the plugin spawns a background thread that listens
>>>> on a socket. An external Python test script connects to this socket to send
>>>> dynamically generated XML faults.
>>>>
>>>
>>> Ok.
>>>
>>> Internally, we have tb_invalidate_phys_range that will invalidate a
>>> given range of tb. This is called when writing to memory for a given
>>> address holding code.
>>>
>>> Thus from your plugin, if you write to pc address with
>>> qemu_plugin_write_memory_vaddr, it should trigger a re-translation of
>>> this tb. You'll need to read 1 byte, and write it back. As well, it
>>> should be more efficient, since you will only invalidate this tb.
>>>
>>> Give it a try and let us know if it works for your need.
>>>
>> Thank you for your suggestion. This is really useful information
>> regarding the internals of TB processing.
>> I set up a test to simulate a scenario where a TB flush is needed
>> and used the described mechanism. However, there is a threading limitation:
>> qemu_plugin_write_memory_vaddr() must be called from a CPU thread.
>> In our current implementation dynamic faults are received and processed
>> by a background thread listening on a socket, so we cannot directly
>> use the API from that context to trigger invalidation.
>>
>
> Indeed, when writing to a virtual address, we need to know the current
> execution context and page table setup to translate it. I have two
> ideas:
> - Register a callback per tb. When hitting a tb containing the address
> at which to inject the fault, perform the read/write described above.
You could use a conditional callback with a scoreboard (or possibly
introduce a map feature similar to eBPF). You would track the address
ranges and latch the scoreboard when you want to look at something more
closely.
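A rough standalone model of the latch idea, under stated assumptions: the Scoreboard struct, sb_latch, slow_callback and run_insn below are hypothetical stand-ins written for this sketch and are not the plugin API. In a real plugin the per-vCPU slots would come from the plugin scoreboard facility and the comparison would be done by a conditional callback; here both are simulated in plain C so that the gating is visible.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_VCPUS 4

/* Hypothetical model: one latch slot per vCPU plus a call counter. */
typedef struct {
    uint64_t armed[MAX_VCPUS];
    unsigned long slow_calls;   /* how often the expensive callback ran */
} Scoreboard;

/* Arm (non-zero) or disarm (zero) the latch on every vCPU. */
static void sb_latch(Scoreboard *sb, uint64_t value)
{
    for (unsigned i = 0; i < MAX_VCPUS; i++) {
        sb->armed[i] = value;
    }
}

/* Stand-in for the expensive callback that inspects the PC closely. */
static void slow_callback(Scoreboard *sb, unsigned vcpu, uint64_t pc)
{
    (void)vcpu;
    (void)pc;
    sb->slow_calls++;
}

/*
 * Models executing one instruction: the only per-instruction cost is a
 * compare against the vCPU's slot; the callback runs only when latched.
 */
static void run_insn(Scoreboard *sb, unsigned vcpu, uint64_t pc)
{
    assert(vcpu < MAX_VCPUS);
    if (sb->armed[vcpu] != 0) {
        slow_callback(sb, vcpu, pc);
    }
}
```

The model is single-threaded. Latching from a socket-listener thread, as in the plugin discussed in this thread, would additionally need atomic stores and loads on the slots, which the sketch leaves out.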
I wonder if allowing the TB itself to be invalidated conditionally would
be ok? We do try really hard to avoid exposing internal implementation
details to plugins but the concept of a block of instructions is kinda
already baked in. However we want to avoid plugins having to track a lot
of translation state to be useful.
> You always instrument, and selectively "poke" the code to trigger a
> new translation.
> - Simulate a given number of cpu watchpoints (N) by using N
> conditional callbacks on every instruction, comparing current pc to N
> addresses. I'm afraid it will be too slow.
I think you want at most one conditional check per instruction and then
take the slow path to check.
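One way to picture "a single cheap check, then the slow path" is a coarse filter in front of the exact trigger list. The following is only a sketch under assumptions: TriggerSet, trigger_add and trigger_check are hypothetical names, the 4 KiB page granularity and table sizes are arbitrary choices for the example, and nothing here is QEMU API. The fast path tests one bit derived from the PC; the N-way comparison runs only if that bit is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12      /* assumed 4 KiB granularity for the filter */
#define FILTER_BITS  1024u   /* must be a power of two */
#define MAX_TRIGGERS 16

/* Hypothetical model: bitmap filter plus an exact list of trigger PCs. */
typedef struct {
    uint64_t filter[FILTER_BITS / 64];
    uint64_t pcs[MAX_TRIGGERS];
    size_t count;
    unsigned long slow_path_hits;   /* times the exact scan was needed */
} TriggerSet;

/* Map a PC to a filter bit using its page number. */
static unsigned filter_slot(uint64_t pc)
{
    return (unsigned)((pc >> PAGE_SHIFT) & (FILTER_BITS - 1));
}

/* Register a trigger PC and mark its page in the filter. */
static void trigger_add(TriggerSet *ts, uint64_t pc)
{
    assert(ts->count < MAX_TRIGGERS);
    unsigned slot = filter_slot(pc);
    ts->pcs[ts->count++] = pc;
    ts->filter[slot / 64] |= UINT64_C(1) << (slot % 64);
}

/* Fast path: one bit test. Slow path: exact comparison against all PCs. */
static bool trigger_check(TriggerSet *ts, uint64_t pc)
{
    unsigned slot = filter_slot(pc);
    if (!(ts->filter[slot / 64] & (UINT64_C(1) << (slot % 64)))) {
        return false;
    }
    ts->slow_path_hits++;
    for (size_t i = 0; i < ts->count; i++) {
        if (ts->pcs[i] == pc) {
            return true;
        }
    }
    return false;
}
```

The filter can report false positives (another PC on the same page, or a page that aliases to the same bit), but never false negatives, so the exact list remains the source of truth and the cost of the scan is paid only for the few PCs that pass the bit test.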
>
> One thing that could be considered on API side is to add a possibility
> to invalidate a specific hardware address (not all tb), based on
> tb_invalidate_phys_range. The problem is that the plugin now needs to keep
> track of all physical addresses matching virtual ones you want to
> invalidate, which is not convenient.
>
> Otherwise, the easiest way to solve all this is to expose tb_flush, like
> you did, but keep this patch downstream for now.
> If your final plugin will stay downstream (which I expect, given it
> has its own protocol for injecting faults and no source for it), it's
> really the cheapest solution.
>
> The current design is built around the assumption that instrumentation
> is made at translation time (and not later). So changing it by
> instrumenting after translation brings new constraints we can't solve
> at the moment without exposing internal details.
We should certainly consider automatically triggering tb_flush() on each
qemu_plugin_register_vcpu_tb_trans_cb() so at least the case of
dynamically loading a plugin doesn't miss previous translations.
>
>>>> There are several scenarios where this might be needed, mainly for faults that
>>>> are difficult to define statically at boot time.
>>>> Examples include injecting faults after a specific chain of events, freezing or
>>>> overriding system register values at specific execution points (since this
>>>> is currently implemented via PC triggers). Supporting environments with KASLR
>>>> enabled might be one more case.
>>>>
>>>
>>> For system registers, you can (heavy but would work) unconditionally
>>> instrument all instructions that touch those registers, so there
>>> would be no need to flush anything. System registers are not accessed
>>> for every instruction, so hopefully, it should not impact too much
>>> execution time.
>>>
>> Agree, this is a good optimization and indeed simplifies dynamic fault
>> handling for system register reads.
>> Thank you for the recommendation!
>>
>>> With both solutions, it should remove the need to expose tb_flush
>>> through plugin API.
>>>
>>>>>
>>>>>>> The associated TRIGGER_ON_PC is very similar to existing inline
>>>>>>> operations. They could be enhanced to support writing to a given
>>>>>>> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
>>>>>>> more complex, but we might enhance inline operations also to support
>>>>>>> hooks on specific register writes.
>>>>>>
>>>>>> TRIGGER_ON_PC may also be used for generating other faults. For example,
>>>>>> one use case is to trigger CPU exceptions on specific instructions.
>>>>>> Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
>>>>>> really interesting direction to explore.
>>>>>>
>>>>>
>>>>> In general, having inline operations support on register read/writes
>>>>> would be a very nice thing to have (though might be tricky to implement
>>>>> correctly), and more efficient than the existing approach that requires
>>>>> checking their value every time.
>>>>>
>>>>>>>
>>>>>>> For MMIO override, the current approach you have is good, and it's
>>>>>>> definitely something we could integrate.
>>>>>>>
>>>>>>> What are your thoughts about this? (especially the device-based approach,
>>>>>>> in case you tried it first).
>>>>>>
>>>>>> I agree such an approach can work well for IRQs and timers, and would be
>>>>>> a cleaner way to implement this.
>>>>>>
>>>>>> However, for SMMU and similar cases, triggering internal state errors is not
>>>>>> easy and requires accessing internal logic. So for those specific cases,
>>>>>> a different approach may be needed.
>>>>>>
>>>>>
>>>>> Thus the iommu-testdev I mentioned, that could be extended to support this.
>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>> Pierrick
>>>>>>
>>>>>> BR,
>>>>>> Ruslan
>>>>>
>>>>> Regards,
>>>>> Pierrick
>>>
>
> Regards,
> Pierrick
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
Thread overview: 18+ messages
2026-03-18 10:46 [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 1/9] target/arm: Add API for dynamic exception injection Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 2/9] plugins/api: Expose virtual clock timers to plugins Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 3/9] plugins: Expose Transaction Block cache flush API " Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 4/9] plugins: Introduce fault injection API and core subsystem Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 5/9] system/memory: Add plugin callbacks to intercept MMIO accesses Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 6/9] hw/intc/arm_gic: Register primary GIC for plugin IRQ injection Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 7/9] hw/arm/smmuv3: Add plugin fault handler for CMDQ errors Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 8/9] contrib/plugins: Add fault injection plugin Ruslan Ruslichenko
2026-03-18 10:46 ` [RFC PATCH 9/9] docs: Add description of fault-injection plugin and subsystem Ruslan Ruslichenko
2026-03-18 17:16 ` [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions Pierrick Bouvier
2026-03-19 18:20 ` Ruslan Ruslichenko
2026-03-19 19:04 ` Pierrick Bouvier
2026-03-19 22:29 ` Ruslan Ruslichenko
2026-03-20 18:08 ` Pierrick Bouvier
2026-03-25 23:39 ` Ruslan Ruslichenko
2026-03-26 0:17 ` Pierrick Bouvier
2026-03-26 11:45 ` Alex Bennée