Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions


From: Ruslan Ruslichenko <ruslichenko.r@gmail.com>
To: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Cc: qemu-devel@nongnu.org, qemu-arm@nongnu.org,
	artem_mygaiev@epam.com,  volodymyr_babchuk@epam.com,
	alex.bennee@linaro.org, peter.maydell@linaro.org,
	 philmd@linaro.org, Ruslan_Ruslichenko@epam.com
Subject: Re: [RFC PATCH 0/9] plugins: Introduce Fault Injection framework and API extensions
Date: Thu, 26 Mar 2026 00:39:19 +0100
Message-ID: <CAN-aV1HRZPxuS_U4mHYhnobbKkq2G5T9P2cJ2=LahT3TzGsC6A@mail.gmail.com>
In-Reply-To: <79fca2cf-7618-4a35-a915-e2bf1811a851@linaro.org>

On Fri, Mar 20, 2026 at 7:08 PM Pierrick Bouvier
<pierrick.bouvier@linaro.org> wrote:
>
> On 3/19/26 3:29 PM, Ruslan Ruslichenko wrote:
> > On Thu, Mar 19, 2026 at 8:04 PM Pierrick Bouvier
> > <pierrick.bouvier@linaro.org> wrote:
> >>
> >> On 3/19/26 11:20 AM, Ruslan Ruslichenko wrote:
> >>> Hi Pierrick,
> >>>
> >>> Thank you for the feedback and review!
> >>>
> >>> Our current plan is to put this plugin through our internal workflows to gather
> >>> more data on its limitations and performance.
> >>> Based on results, we may consider extending or refining the implementation
> >>> in the future.
> >>>
> >>> Any further feedback on potential issues is highly appreciated.
> >>>
> >>
> >> By design, the approach of modifying QEMU internals to allow injecting
> >> IRQs, setting a timer, or triggering SMMU faults has very little chance
> >> of being integrated as is. At least, it should be discussed with the
> >> concerned maintainers, to see whether they would be open to it.
> >>
> >> It's not wrong in itself, if you want a downstream solution, but it does
> >> not scale upstream if we have to consider and accept everyone's needs.
> >> The plugin API in itself can accept the burden for such things, but it's
> >> harder to justify for internal stuff.
> >>
> >> I believe it would be better to rely on ad hoc devices generating this,
> >> with the advantage that even if they don't get accepted upstream, it
> >> will be easier for you to maintain them downstream compared to more
> >> intrusive patches.
> >>
> >>> On Wed, Mar 18, 2026 at 6:16 PM Pierrick Bouvier
> >>> <pierrick.bouvier@linaro.org> wrote:
> >>>>
> >>>> Hi Ruslan,
> >>>>
> >>>> On 3/18/26 3:46 AM, Ruslan Ruslichenko wrote:
> >>>>> From: Ruslan Ruslichenko <Ruslan_Ruslichenko@epam.com>
> >>>>>
> >>>>> This patch series is submitted as an RFC to gather early feedback on a Fault Injection (FI) framework built on top of the QEMU TCG plugin subsystem.
> >>>>>
> >>>>> Motivation
> >>>>>
> >>>>> Testing guest operating systems, hypervisors (like Xen), and low-level drivers against unexpected hardware failures can be difficult.
> >>>>> This series provides an interface to inject faults dynamically without altering QEMU's core emulation source code for every test case.
> >>>>>
> >>>>> Architecture & Key Features
> >>>>>
> >>>>> The series introduces the core API extensions and implements a fault injection plugin (contrib/plugins/fault_injection.c) targeting AArch64.
> >>>>> The plugin can be controlled statically via XML configurations on boot, or dynamically at runtime via a UNIX socket (enabling integration with automated testing frameworks via Python or GDB).
> >>>>>
> >>>>> New Plugin API Capabilities:
> >>>>>
> >>>>> MMIO Interception: Allows plugins to hook into memory_region_dispatch_read/write to modify hardware register reads or drop writes.
> >>>>> Asynchronous Timers: Exposes QEMU_CLOCK_VIRTUAL to plugins, allowing callbacks to be scheduled based on guest virtual time.
> >>>>> TB Cache Flushing: Exposes qemu_plugin_flush_tb_cache() so plugins can force re-translation when applying dynamic PC-based hooks.
> >>>>> Interrupt & Exception Injection: Exposes APIs to raise/pulse hardware IRQs on the primary INTC and inject CPU exceptions (e.g., SErrors).
> >>>>> Custom Device Faults: Introduces a registry where device models (e.g., SMMUv3) can expose specific fault handlers (like CMDQ errors) to be triggered externally by plugins.
> >>>>>
> >>>>> Patch Summary
> >>>>> Patch 1 (target/arm): Adds support for asynchronous CPU exception injection.
> >>>>> Patch 2-3 (plugins/api): Exposes virtual clock timers and TB cache flushing to the public plugin API.
> >>>>> Patch 4 (plugins): Introduces the core fault injection subsystem, IRQ/Exception routing, and the Custom Fault registry.
> >>>>> Patch 5 (system/memory): Adds the MMIO override hooks into the memory dispatch path.
> >>>>> Patch 6 (hw/intc): Registers the ARM GIC (v2/v3) with the plugin subsystem to enable direct hardware IRQ injection.
> >>>>> Patch 7 (hw/arm): Registers the SMMUv3 with the custom fault registry to demonstrate how device models can expose specific errors (like CMDQ faults) to plugins.
> >>>>> Patch 8 (contrib/plugins): Implements the actual fault_injection plugin using the new APIs.
> >>>>> Patch 9 (docs): Adds documentation and usage examples for the plugin.
> >>>>>
> >>>>> Request for Comments & Feedback
> >>>>>
> >>>>> Any suggestions on improvements, potential edge cases, or issues with the current design are highly welcome.
> >>>>>
> >>>>> Ruslan Ruslichenko (9):
> >>>>>      target/arm: Add API for dynamic exception injection
> >>>>>      plugins/api: Expose virtual clock timers to plugins
> >>>>>      plugins: Expose Transaction Block cache flush API to plugins
> >>>>>      plugins: Introduce fault injection API and core subsystem
> >>>>>      system/memory: Add plugin callbacks to intercept MMIO accesses
> >>>>>      hw/intc/arm_gic: Register primary GIC for plugin IRQ injection
> >>>>>      hw/arm/smmuv3: Add plugin fault handler for CMDQ errors
> >>>>>      contrib/plugins: Add fault injection plugin
> >>>>>      docs: Add description of fault-injection plugin and subsystem
> >>>>>
> >>>>>     contrib/plugins/fault_injection.c | 772 ++++++++++++++++++++++++++++++
> >>>>>     contrib/plugins/meson.build       |   1 +
> >>>>>     docs/fault-injection.txt          | 111 +++++
> >>>>>     hw/arm/smmuv3.c                   |  54 +++
> >>>>>     hw/intc/arm_gic.c                 |  28 ++
> >>>>>     hw/intc/arm_gicv3.c               |  28 ++
> >>>>>     include/plugins/qemu-plugin.h     |  28 ++
> >>>>>     include/qemu/plugin.h             |  39 ++
> >>>>>     plugins/api.c                     |  62 +++
> >>>>>     plugins/core.c                    |  11 +
> >>>>>     plugins/fault.c                   | 116 +++++
> >>>>>     plugins/meson.build               |   1 +
> >>>>>     plugins/plugin.h                  |   2 +
> >>>>>     system/memory.c                   |   8 +
> >>>>>     target/arm/cpu.h                  |   4 +
> >>>>>     target/arm/helper.c               |  55 +++
> >>>>>     16 files changed, 1320 insertions(+)
> >>>>>     create mode 100644 contrib/plugins/fault_injection.c
> >>>>>     create mode 100644 docs/fault-injection.txt
> >>>>>     create mode 100644 plugins/fault.c
> >>>>>
> >>>>
> >>>> first, thanks for posting your series!
> >>>>
> >>>> About the general approach.
> >>>> As you noticed, this exposes a lot of QEMU internals, which is
> >>>> something we tend to avoid. It is also very architecture
> >>>> specific, which is another pattern we try to avoid.
> >>>>
> >>>> For some of your needs (especially IRQ injection and timer injection),
> >>>> did you consider writing a custom ad-hoc device and timer generating those?
> >>>> There is nothing preventing you from writing a plugin that can
> >>>> communicate with this specific device (through a socket for instance),
> >>>> to request specific injections. I feel that it would scale better than
> >>>> exposing all this to QEMU plugins API.
> >>>>
> >>>> For SMMU, this is trickier. Tao recently added (6ce361b02c82) an iommu
> >>>> test device, used with qtest to unit test the smmu implementation. We
> >>>> could maybe leverage that on a full machine, combined with the
> >>>> communication method mentioned above, to generate specific operations at
> >>>> runtime, all triggered via a plugin.
> >>>>
> >>>> Exposing qemu_plugin_flush_tb_cache is a hint we are missing something
> >>>> on QEMU side. Better to fix it than expose this very internal function.
> >>>
> >>> The reason this was needed is that the plugin may receive PC trigger
> >>> configuration dynamically and need to register an instruction callback
> >>> at runtime. If the TB for that PC is already translated and cached, our
> >>> newly registered callback might not be executed.
> >>>
> >>> If there is a more proper way to force QEMU to re-translate a specific
> >>> TB or attach a callback to a cached TB, it would be great to reduce the
> >>> complexity here.
> >>>
> >>
> >> I understand better now. The current QEMU plugin implementation is too limited
> >> for this, and everything has to be done/known at translation time.
> >> What is your use case for receiving PC trigger after translation? Do you
> >> have some mechanism to communicate with the plugin for this?
> >
> > Yes, exactly. If the guest has already executed the target code, the newly
> > added trigger will be ignored, as the TB is cached.
> >
> > For runtime configuration, the plugin spawns a background thread that listens
> > on a socket. An external Python test script connects to this socket to send
> > dynamically generated XML faults.
> >
>
> Ok.
>
> Internally, we have tb_invalidate_phys_range that will invalidate a
> given range of tb. This is called when writing to memory for a given
> address holding code.
>
> Thus from your plugin, if you write to pc address with
> qemu_plugin_write_memory_vaddr, it should trigger a re-translation of
> this tb. You'll need to read 1 byte, and write it back. As well, it
> should be more efficient, since you will only invalidate this tb.
>
> Give it a try and let us know if it works for your need.
>

Thank you for your suggestion. This is really useful information regarding
the internals of TB processing.

I set up a test to simulate a scenario where a TB flush is needed
and used the described mechanism. However, there is a threading limitation:
qemu_plugin_write_memory_vaddr() must be called from a CPU thread.
In our current implementation, dynamic faults are received and processed
by a background thread listening on a socket, so we cannot directly
use the API from that context to trigger invalidation.
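
One possible way around this, as a minimal untested sketch rather than what
the plugin does today: the socket thread only queues the target PC, and a
callback that already runs on a vCPU thread drains the queue and performs the
invalidation there. PendingQueue, pending_push(), pending_drain() and
invalidate_stub() are hypothetical names. The stub stands in for reading one
byte at the PC and writing it back with qemu_plugin_write_memory_vaddr().
Which vCPU-side callback would do the draining is left open here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PENDING_MAX 64

/* Guest PCs whose TBs still need invalidating (hypothetical helper type). */
typedef struct {
    pthread_mutex_t lock;
    uint64_t pc[PENDING_MAX];
    size_t count;
} PendingQueue;

static void pending_init(PendingQueue *q)
{
    assert(q != NULL);
    pthread_mutex_init(&q->lock, NULL);
    q->count = 0;
}

/* Called from the socket thread: records the PC only, touches no guest state. */
static bool pending_push(PendingQueue *q, uint64_t pc)
{
    bool ok = false;

    pthread_mutex_lock(&q->lock);
    if (q->count < PENDING_MAX) {
        q->pc[q->count++] = pc;
        ok = true;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/*
 * Called from a callback running on a vCPU thread. Copies the queued PCs out
 * under the lock, then calls 'invalidate' for each one outside the lock so
 * the socket thread never waits on guest memory accesses.
 * Returns the number of PCs handled.
 */
static size_t pending_drain(PendingQueue *q, void (*invalidate)(uint64_t pc))
{
    uint64_t batch[PENDING_MAX];
    size_t n;

    pthread_mutex_lock(&q->lock);
    n = q->count;
    for (size_t i = 0; i < n; i++) {
        batch[i] = q->pc[i];
    }
    q->count = 0;
    pthread_mutex_unlock(&q->lock);

    for (size_t i = 0; i < n; i++) {
        invalidate(batch[i]);
    }
    return n;
}

/* Placeholder for the real read-one-byte / write-it-back step (assumption). */
static uint64_t last_invalidated;

static void invalidate_stub(uint64_t pc)
{
    last_invalidated = pc;
}
```

The limitation of this pattern is latency: a queued PC is only invalidated
the next time some vCPU reaches the draining callback.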

> > There are several scenarios where this might be needed, mainly for faults that
> > are difficult to define statically at boot time.
> > Examples include injecting faults after a specific chain of events, freezing or
> > overriding system register values at specific execution points (since this
> > is currently implemented via PC triggers). Supporting environments with KASLR
> > enabled might be one more case.
> >
>
> For system registers, you can (heavy but would work) unconditionally
> instrument all instructions that touch those registers, so there
> would be no need to flush anything. System registers are not accessed
> by every instruction, so hopefully it should not impact execution time
> too much.
>

Agreed, this is a good optimization and indeed simplifies dynamic fault
handling for system register reads.
Thank you for the recommendation!
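
To make the idea concrete, here is a minimal sketch, assuming the A64
MRS/MSR (register) encoding: the translation-time filter picks out every
system register access, and the per-instruction callback then checks a table
of triggers that can be changed at runtime without any flush. All function
and variable names below are hypothetical, and the lookup is kept
deliberately simple.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRIGGERS 16

/*
 * A64 MRS and MSR (register) share bits [31:22] and have bit 20 set;
 * bit 21 (L) distinguishes read (1) from write (0), so it is masked out.
 */
static bool is_sysreg_access(uint32_t insn)
{
    return (insn & 0xFFD00000u) == 0xD5100000u;
}

/* True for MRS (system register read). */
static bool is_sysreg_read(uint32_t insn)
{
    return (insn >> 21) & 1u;
}

/* op0:op1:CRn:CRm:op2 live in bits [20:5]; use them as a 16-bit key. */
static uint16_t sysreg_key(uint32_t insn)
{
    return (uint16_t)((insn >> 5) & 0xFFFFu);
}

/* Runtime-configurable trigger table, shared with the socket thread. */
static pthread_mutex_t trig_lock = PTHREAD_MUTEX_INITIALIZER;
static uint16_t triggers[MAX_TRIGGERS];
static size_t n_triggers;

/* Called from the socket thread when a new sysreg fault is configured. */
static bool trigger_add(uint16_t key)
{
    bool ok = false;

    pthread_mutex_lock(&trig_lock);
    assert(n_triggers <= MAX_TRIGGERS);
    if (n_triggers < MAX_TRIGGERS) {
        triggers[n_triggers++] = key;
        ok = true;
    }
    pthread_mutex_unlock(&trig_lock);
    return ok;
}

/* Called from the instruction callback attached to every MRS/MSR. */
static bool trigger_matches(uint32_t insn)
{
    bool hit = false;

    if (!is_sysreg_access(insn)) {
        return false;
    }
    uint16_t key = sysreg_key(insn);

    pthread_mutex_lock(&trig_lock);
    for (size_t i = 0; i < n_triggers; i++) {
        if (triggers[i] == key) {
            hit = true;
            break;
        }
    }
    pthread_mutex_unlock(&trig_lock);
    return hit;
}
```

For example, 0xD5381001 (mrs x1, sctlr_el1) and 0xD5181000
(msr sctlr_el1, x0) both map to the key 0xC080, so a single table entry
covers reads and writes of that register; is_sysreg_read() can be used to
tell them apart.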

> With both solutions, it should remove the need to expose tb_flush
> through plugin API.
>
> >>
> >>>> The associated TRIGGER_ON_PC is very similar to existing inline
> >>>> operations. They could be enhanced to support writing to a given
> >>>> register, all the bricks are there. For TRIGGER_ON_SYSREG it's a bit
> >>>> more complex, but we might enhance inline operations also to support
> >>>> hooks on specific register writes.
> >>>
> >>> TRIGGER_ON_PC may also be used for generating other faults. For example,
> >>> one use case is to trigger CPU exceptions on specific instructions.
> >>> Supporting TRIGGER_ON_SYSREG as an inline operation sounds like a
> >>> really interesting direction to explore.
> >>>
> >>
> >> In general, having inline operations support on register read/writes
> >> would be a very nice thing to have (though might be tricky to implement
> >> correctly), and more efficient than the existing approach that requires
> >> checking their value every time.
> >>
> >>>>
> >>>> For MMIO override, the current approach you have is good, and it's
> >>>> definitely something we could integrate.
> >>>>
> >>>> What are your thoughts about this? (especially the device based approach,
> >>>> in case you tried it first).
> >>>
> >>> I agree such an approach can work well for IRQs and timers, and would be
> >>> a cleaner way to implement this.
> >>>
> >>> However, for SMMU and similar cases, triggering internal state errors is not
> >>> easy and requires accessing internal logic. So for those specific cases,
> >>> a different approach may be needed.
> >>>
> >>
> >> Thus the iommu-testdev I mentioned, that could be extended to support this.
> >>
> >>>>
> >>>> Regards,
> >>>> Pierrick
> >>>
> >>> BR,
> >>> Ruslan
> >>
> >> Regards,
> >> Pierrick
>
