qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] system/physmem: fix use-after-free with dispatch
@ 2025-07-24 16:11 Pierrick Bouvier
  2025-07-24 16:14 ` Pierrick Bouvier
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Pierrick Bouvier @ 2025-07-24 16:11 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Gustavo Romero, Michael Tokarev,
	Philippe Mathieu-Daudé, Peter Maydell, David Hildenbrand,
	Manos Pitsidianakis, Paolo Bonzini, Mark Cave-Ayland,
	Richard Henderson, Peter Xu, Mark Cave-Ayland, Pierrick Bouvier

A use-after-free bug was reported when booting a Linux kernel during the
pci setup phase. It's quite hard to reproduce (needs smp, and favored by
having several pci devices with BAR and specific Linux config, which
is Debian default one in this case).

After investigation (see the associated bug ticket), it appears that,
under specific conditions, we might access a cached AddressSpaceDispatch
that was reclaimed by RCU thread meanwhile.
In the Linux boot scenario, during the pci phase, memory region are
destroyed/recreated, resulting in exposition of the bug.

The core of the issue is that we cache the dispatch associated to
current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
tcg_commit, which runs asynchronously on a given cpu.
At some point, we leave the rcu critial section, and the RCU thread
starts reclaiming it, but tcg_commit is not yet invoked, resulting in
the use-after-free.

It's not the first problem around this area, and this patch [1] already
tried to address it. It did a good job, but it seems that we found a
specific situation where it's not enough.

This patch takes a simple approach: remove the cached value creating the
issue, and make sure we always get the current mapping for address
space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
This is not really costly, we just need two dereferences,
including one atomic (rcu) read, which is negligible considering we are
already on mmu slow path anyway.

Note that tcg_commit is still needed, as it's taking care of flushing
TLB, removing previously mapped entries.

Another solution would be to cache directly values under the dispatch
(dispatch themselves are not ref counted), keep an active reference on
associated memory section, and release it when appropriate (tricky).
Given the time already spent debugging this area now and previously, I
strongly prefer eliminating the root of the issue, instead of adding
more complexity for a hypothetical performance gain. RCU is precisely
used to ensure good performance when reading data, so caching is not as
beneficial as it might seem IMHO.

[1] https://gitlab.com/qemu-project/qemu/-/commit/0d58c660689f6da1e3feff8a997014003d928b3b

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
---
 system/physmem.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/system/physmem.c b/system/physmem.c
index 130c148ffb5..e5dd760e0bc 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -165,13 +165,11 @@ static bool ram_is_cpr_compatible(RAMBlock *rb);
  * CPUAddressSpace: all the information a CPU needs about an AddressSpace
  * @cpu: the CPU whose AddressSpace this is
  * @as: the AddressSpace itself
- * @memory_dispatch: its dispatch pointer (cached, RCU protected)
  * @tcg_as_listener: listener for tracking changes to the AddressSpace
  */
 typedef struct CPUAddressSpace {
     CPUState *cpu;
     AddressSpace *as;
-    struct AddressSpaceDispatch *memory_dispatch;
     MemoryListener tcg_as_listener;
 } CPUAddressSpace;
 
@@ -692,7 +690,7 @@ address_space_translate_for_iotlb(CPUState *cpu, int asidx, hwaddr orig_addr,
     IOMMUTLBEntry iotlb;
     int iommu_idx;
     hwaddr addr = orig_addr;
-    AddressSpaceDispatch *d = cpu->cpu_ases[asidx].memory_dispatch;
+    AddressSpaceDispatch *d = address_space_to_dispatch(cpu->cpu_ases[asidx].as);
 
     for (;;) {
         section = address_space_translate_internal(d, addr, &addr, plen, false);
@@ -753,7 +751,7 @@ MemoryRegionSection *iotlb_to_section(CPUState *cpu,
 {
     int asidx = cpu_asidx_from_attrs(cpu, attrs);
     CPUAddressSpace *cpuas = &cpu->cpu_ases[asidx];
-    AddressSpaceDispatch *d = cpuas->memory_dispatch;
+    AddressSpaceDispatch *d = address_space_to_dispatch(cpuas->as);
     int section_index = index & ~TARGET_PAGE_MASK;
     MemoryRegionSection *ret;
 
@@ -2780,9 +2778,6 @@ static void tcg_log_global_after_sync(MemoryListener *listener)
 
 static void tcg_commit_cpu(CPUState *cpu, run_on_cpu_data data)
 {
-    CPUAddressSpace *cpuas = data.host_ptr;
-
-    cpuas->memory_dispatch = address_space_to_dispatch(cpuas->as);
     tlb_flush(cpu);
 }
 
@@ -2798,11 +2793,7 @@ static void tcg_commit(MemoryListener *listener)
     cpu = cpuas->cpu;
 
     /*
-     * Defer changes to as->memory_dispatch until the cpu is quiescent.
-     * Otherwise we race between (1) other cpu threads and (2) ongoing
-     * i/o for the current cpu thread, with data cached by mmu_lookup().
-     *
-     * In addition, queueing the work function will kick the cpu back to
+     * Queueing the work function will kick the cpu back to
      * the main loop, which will end the RCU critical section and reclaim
      * the memory data structures.
      *
-- 
2.47.2



^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] system/physmem: fix use-after-free with dispatch
  2025-07-24 16:11 [PATCH] system/physmem: fix use-after-free with dispatch Pierrick Bouvier
@ 2025-07-24 16:14 ` Pierrick Bouvier
  2025-07-24 16:33 ` Richard Henderson
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Pierrick Bouvier @ 2025-07-24 16:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: Alex Bennée, Gustavo Romero, Michael Tokarev,
	Philippe Mathieu-Daudé, Peter Maydell, David Hildenbrand,
	Manos Pitsidianakis, Paolo Bonzini, Mark Cave-Ayland,
	Richard Henderson, Peter Xu, Mark Cave-Ayland

On 7/24/25 9:11 AM, Pierrick Bouvier wrote:
> A use-after-free bug was reported when booting a Linux kernel during the
> pci setup phase. It's quite hard to reproduce (needs smp, and favored by
> having several pci devices with BAR and specific Linux config, which
> is Debian default one in this case).
> 
> After investigation (see the associated bug ticket), it appears that,
> under specific conditions, we might access a cached AddressSpaceDispatch
> that was reclaimed by RCU thread meanwhile.
> In the Linux boot scenario, during the pci phase, memory region are
> destroyed/recreated, resulting in exposition of the bug.
> 
> The core of the issue is that we cache the dispatch associated to
> current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
> tcg_commit, which runs asynchronously on a given cpu.
> At some point, we leave the rcu critial section, and the RCU thread
> starts reclaiming it, but tcg_commit is not yet invoked, resulting in
> the use-after-free.
> 
> It's not the first problem around this area, and this patch [1] already
> tried to address it. It did a good job, but it seems that we found a
> specific situation where it's not enough.
> 
> This patch takes a simple approach: remove the cached value creating the
> issue, and make sure we always get the current mapping for address
> space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
> It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
> This is not really costly, we just need two dereferences,
> including one atomic (rcu) read, which is negligible considering we are
> already on mmu slow path anyway.
> 
> Note that tcg_commit is still needed, as it's taking care of flushing
> TLB, removing previously mapped entries.
> 
> Another solution would be to cache directly values under the dispatch
> (dispatch themselves are not ref counted), keep an active reference on
> associated memory section, and release it when appropriate (tricky).
> Given the time already spent debugging this area now and previously, I
> strongly prefer eliminating the root of the issue, instead of adding
> more complexity for a hypothetical performance gain. RCU is precisely
> used to ensure good performance when reading data, so caching is not as
> beneficial as it might seem IMHO.
> 
> [1] https://gitlab.com/qemu-project/qemu/-/commit/0d58c660689f6da1e3feff8a997014003d928b3b
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   system/physmem.c | 15 +++------------
>   1 file changed, 3 insertions(+), 12 deletions(-)
>

This was tested with reproduced in bug report, doing more than 5'000 
boot without any failure.
As well, all QEMU tests were ran on aarch64 and x64 linux hosts.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] system/physmem: fix use-after-free with dispatch
  2025-07-24 16:11 [PATCH] system/physmem: fix use-after-free with dispatch Pierrick Bouvier
  2025-07-24 16:14 ` Pierrick Bouvier
@ 2025-07-24 16:33 ` Richard Henderson
  2025-07-24 19:01 ` Michael Tokarev
  2025-07-28 15:45 ` Philippe Mathieu-Daudé
  3 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2025-07-24 16:33 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Gustavo Romero, Michael Tokarev,
	Philippe Mathieu-Daudé, Peter Maydell, David Hildenbrand,
	Manos Pitsidianakis, Paolo Bonzini, Mark Cave-Ayland, Peter Xu,
	Mark Cave-Ayland

On 7/24/25 09:11, Pierrick Bouvier wrote:
> This patch takes a simple approach: remove the cached value creating the
> issue, and make sure we always get the current mapping for address
> space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
> It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
> This is not really costly, we just need two dereferences,
> including one atomic (rcu) read, which is negligible considering we are
> already on mmu slow path anyway.

We're not just on the slow path, we're on the tlb fill path, which is even rarer.  Once 
upon a time, memory_dispatch was used along the mmio path, but even then I think it must 
have been premature optimization.

This looks like an excellent solution.
Thanks for all the detective work.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>


r~


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] system/physmem: fix use-after-free with dispatch
  2025-07-24 16:11 [PATCH] system/physmem: fix use-after-free with dispatch Pierrick Bouvier
  2025-07-24 16:14 ` Pierrick Bouvier
  2025-07-24 16:33 ` Richard Henderson
@ 2025-07-24 19:01 ` Michael Tokarev
  2025-07-28 15:45 ` Philippe Mathieu-Daudé
  3 siblings, 0 replies; 5+ messages in thread
From: Michael Tokarev @ 2025-07-24 19:01 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Gustavo Romero, Philippe Mathieu-Daudé,
	Peter Maydell, David Hildenbrand, Manos Pitsidianakis,
	Paolo Bonzini, Mark Cave-Ayland, Richard Henderson, Peter Xu,
	Mark Cave-Ayland

On 24.07.2025 19:11, Pierrick Bouvier wrote:
> A use-after-free bug was reported when booting a Linux kernel during the
> pci setup phase. It's quite hard to reproduce (needs smp, and favored by
> having several pci devices with BAR and specific Linux config, which
> is Debian default one in this case).
> 
> After investigation (see the associated bug ticket), it appears that,
> under specific conditions, we might access a cached AddressSpaceDispatch
> that was reclaimed by RCU thread meanwhile.
> In the Linux boot scenario, during the pci phase, memory region are
> destroyed/recreated, resulting in exposition of the bug.
> 
> The core of the issue is that we cache the dispatch associated to
> current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
> tcg_commit, which runs asynchronously on a given cpu.
> At some point, we leave the rcu critial section, and the RCU thread
> starts reclaiming it, but tcg_commit is not yet invoked, resulting in
> the use-after-free.
> 
> It's not the first problem around this area, and this patch [1] already
> tried to address it. It did a good job, but it seems that we found a
> specific situation where it's not enough.
> 
> This patch takes a simple approach: remove the cached value creating the
> issue, and make sure we always get the current mapping for address
> space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).
> It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
> This is not really costly, we just need two dereferences,
> including one atomic (rcu) read, which is negligible considering we are
> already on mmu slow path anyway.
> 
> Note that tcg_commit is still needed, as it's taking care of flushing
> TLB, removing previously mapped entries.
> 
> Another solution would be to cache directly values under the dispatch
> (dispatch themselves are not ref counted), keep an active reference on
> associated memory section, and release it when appropriate (tricky).
> Given the time already spent debugging this area now and previously, I
> strongly prefer eliminating the root of the issue, instead of adding
> more complexity for a hypothetical performance gain. RCU is precisely
> used to ensure good performance when reading data, so caching is not as
> beneficial as it might seem IMHO.
> 
> [1] https://gitlab.com/qemu-project/qemu/-/commit/0d58c660689f6da1e3feff8a997014003d928b3b
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>

Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>

Thank you for this work!

/mjt


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] system/physmem: fix use-after-free with dispatch
  2025-07-24 16:11 [PATCH] system/physmem: fix use-after-free with dispatch Pierrick Bouvier
                   ` (2 preceding siblings ...)
  2025-07-24 19:01 ` Michael Tokarev
@ 2025-07-28 15:45 ` Philippe Mathieu-Daudé
  3 siblings, 0 replies; 5+ messages in thread
From: Philippe Mathieu-Daudé @ 2025-07-28 15:45 UTC (permalink / raw)
  To: Pierrick Bouvier, qemu-devel
  Cc: Alex Bennée, Gustavo Romero, Michael Tokarev, Peter Maydell,
	David Hildenbrand, Manos Pitsidianakis, Paolo Bonzini,
	Mark Cave-Ayland, Richard Henderson, Peter Xu, Mark Cave-Ayland

On 24/7/25 18:11, Pierrick Bouvier wrote:
> A use-after-free bug was reported when booting a Linux kernel during the
> pci setup phase. It's quite hard to reproduce (needs smp, and favored by
> having several pci devices with BAR and specific Linux config, which
> is Debian default one in this case).
> 
> After investigation (see the associated bug ticket), it appears that,
> under specific conditions, we might access a cached AddressSpaceDispatch
> that was reclaimed by RCU thread meanwhile.
> In the Linux boot scenario, during the pci phase, memory region are
> destroyed/recreated, resulting in exposition of the bug.
> 
> The core of the issue is that we cache the dispatch associated to
> current cpu in cpu->cpu_ases[asidx].memory_dispatch. It is updated with
> tcg_commit, which runs asynchronously on a given cpu.
> At some point, we leave the rcu critial section, and the RCU thread
> starts reclaiming it, but tcg_commit is not yet invoked, resulting in
> the use-after-free.
> 
> It's not the first problem around this area, and this patch [1] already
> tried to address it. It did a good job, but it seems that we found a
> specific situation where it's not enough.
> 
> This patch takes a simple approach: remove the cached value creating the
> issue, and make sure we always get the current mapping for address
> space, using address_space_to_dispatch(cpu->cpu_ases[asidx].as).

Very nice.

> It's equivalent to qatomic_rcu_read(&as->current_map)->dispatch;
> This is not really costly, we just need two dereferences,
> including one atomic (rcu) read, which is negligible considering we are
> already on mmu slow path anyway.
> 
> Note that tcg_commit is still needed, as it's taking care of flushing
> TLB, removing previously mapped entries.
> 
> Another solution would be to cache directly values under the dispatch
> (dispatch themselves are not ref counted), keep an active reference on
> associated memory section, and release it when appropriate (tricky).
> Given the time already spent debugging this area now and previously, I
> strongly prefer eliminating the root of the issue, instead of adding
> more complexity for a hypothetical performance gain. RCU is precisely
> used to ensure good performance when reading data, so caching is not as
> beneficial as it might seem IMHO.
> 
> [1] https://gitlab.com/qemu-project/qemu/-/commit/0d58c660689f6da1e3feff8a997014003d928b3b
> 
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3040
> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
> ---
>   system/physmem.c | 15 +++------------
>   1 file changed, 3 insertions(+), 12 deletions(-)

Patch queued, thanks!


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-07-28 15:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-24 16:11 [PATCH] system/physmem: fix use-after-free with dispatch Pierrick Bouvier
2025-07-24 16:14 ` Pierrick Bouvier
2025-07-24 16:33 ` Richard Henderson
2025-07-24 19:01 ` Michael Tokarev
2025-07-28 15:45 ` Philippe Mathieu-Daudé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).