* [PATCH] memory: Make FlatView root references weak
@ 2025-10-27 5:56 Akihiko Odaki
2025-10-28 22:09 ` Peter Xu
0 siblings, 1 reply; 14+ messages in thread
From: Akihiko Odaki @ 2025-10-27 5:56 UTC (permalink / raw)
To: qemu-devel
Cc: Paolo Bonzini, Peter Xu, David Hildenbrand,
Philippe Mathieu-Daudé, Akihiko Odaki
docs/devel/memory.rst says "memory_region_ref and memory_region_unref
are never called on aliases", but these functions are called for
FlatView roots, which can be aliases.
This causes object overwrite hazard in pci-bridge. Specifically,
pci_bridge_region_init() expects that there are no references to
w->alias_io after object_unparent() is called, allowing it to reuse the
associated storage. However, if a parent bus still holds a reference to
the existing object as a FlatView's root, the storage is still in use,
leading to an overwrite. This hazard can be confirmed by adding the
following code to pci_bridge_region_init():
PCIDevice *parent_dev = parent->parent_dev;
assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w->alias_io);
This assertion fails when running:
meson test -C build qtest-x86_64/bios-tables-test \
'--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
Make the references of FlatView roots "weak" (i.e., remove the
reference to a root automatically removed when it is finalized) to
avoid calling memory_region_ref and memory_region_unref and fix the
hazard with pci-bridge.
Alternative solutions (like removing the "never called on aliases"
statement or detailing the exception) were rejected because the alias
invariant is still relied upon in several parts of the codebase, and
updating existing code to align with a new condition is non-trivial.
Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
---
system/memory.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/system/memory.c b/system/memory.c
index 8b84661ae36c..08fe5e791224 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
view = g_new0(FlatView, 1);
view->ref = 1;
view->root = mr_root;
- memory_region_ref(mr_root);
trace_flatview_new(view, mr_root);
return view;
@@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
memory_region_unref(view->ranges[i].mr);
}
g_free(view->ranges);
- memory_region_unref(view->root);
g_free(view);
}
@@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
static void memory_region_finalize(Object *obj)
{
MemoryRegion *mr = MEMORY_REGION(obj);
+ gpointer key;
+ gpointer value;
+
+ if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
+ ((FlatView *)value)->root = NULL;
+ }
/*
* Each memory region (that can be freed) must have an owner, and it
---
base-commit: 36076d24f04ea9dc3357c0fbe7bb14917375819c
change-id: 20251024-root-d431450fcbbf
Best regards,
--
Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-10-27 5:56 [PATCH] memory: Make FlatView root references weak Akihiko Odaki
@ 2025-10-28 22:09 ` Peter Xu
2025-10-29 4:06 ` Akihiko Odaki
0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-10-28 22:09 UTC (permalink / raw)
To: Akihiko Odaki
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On Mon, Oct 27, 2025 at 02:56:53PM +0900, Akihiko Odaki wrote:
> docs/devel/memory.rst says "memory_region_ref and memory_region_unref
> are never called on aliases", but these functions are called for
> FlatView roots, which can be aliases.
IMHO the quoted doc was in a special context, where it was talking about
the example of address_space_map() holding adhoc refcounts of a MR, in
which case "memory_region_ref and memory_region_unref are never called on
aliases" was correct..
The full context:
...
If you break this rule, the following situation can happen:
- the memory region's owner had a reference taken via memory_region_ref
(for example by address_space_map)
- the region is unparented, and has no owner anymore
- when address_space_unmap is called, the reference to the memory region's
owner is leaked.
There is an exception to the above rule: it is okay to call
object_unparent at any time for an alias or a container region. It is
therefore also okay to create or destroy alias and container regions
dynamically during a device's lifetime.
This exceptional usage is valid because aliases and containers only help
QEMU building the guest's memory map; they are never accessed directly.
memory_region_ref and memory_region_unref are never called on aliases
or containers, and the above situation then cannot happen. Exploiting
this exception is rarely necessary, and therefore it is discouraged,
but nevertheless it is used in a few places.
...
So I can't say the doc is wrong, but maybe it can be at least be clearer on
the scope of that sentence.. indeed.
>
> This causes object overwrite hazard in pci-bridge. Specifically,
> pci_bridge_region_init() expects that there are no references to
> w->alias_io after object_unparent() is called, allowing it to reuse the
> associated storage. However, if a parent bus still holds a reference to
> the existing object as a FlatView's root, the storage is still in use,
> leading to an overwrite. This hazard can be confirmed by adding the
> following code to pci_bridge_region_init():
>
> PCIDevice *parent_dev = parent->parent_dev;
> assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
> PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w->alias_io);
What's interesting is I found PCIBridge.as_io / PCIBridge.as_mem are not
used anywhere.. because it looks like the bridge code uses MRs to operate
rather than address spaces.
Does it mean we can drop the two ASes? Then if they're the only holder of
the refcounts of these problematic MRs, does it solve the problem too in an
easier way? Maybe there're other issues you want to fix too with this patch?
>
> This assertion fails when running:
> meson test -C build qtest-x86_64/bios-tables-test \
> '--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
>
> Make the references of FlatView roots "weak" (i.e., remove the
> reference to a root automatically removed when it is finalized) to
> avoid calling memory_region_ref and memory_region_unref and fix the
> hazard with pci-bridge.
>
> Alternative solutions (like removing the "never called on aliases"
> statement or detailing the exception) were rejected because the alias
> invariant is still relied upon in several parts of the codebase, and
> updating existing code to align with a new condition is non-trivial.
>
> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> ---
> system/memory.c | 8 ++++++--
> 1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/system/memory.c b/system/memory.c
> index 8b84661ae36c..08fe5e791224 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
> view = g_new0(FlatView, 1);
> view->ref = 1;
> view->root = mr_root;
> - memory_region_ref(mr_root);
> trace_flatview_new(view, mr_root);
>
> return view;
> @@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
> memory_region_unref(view->ranges[i].mr);
> }
> g_free(view->ranges);
> - memory_region_unref(view->root);
> g_free(view);
> }
>
> @@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
> static void memory_region_finalize(Object *obj)
> {
> MemoryRegion *mr = MEMORY_REGION(obj);
> + gpointer key;
> + gpointer value;
> +
> + if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
> + ((FlatView *)value)->root = NULL;
> + }
This is definitely very tricky.. The translation path (from
AddressSpaceDispatch) indeed looks ok as of now, which doesn't looks at
view->root.. however at least I saw this:
void flatview_unref(FlatView *view)
{
if (qatomic_fetch_dec(&view->ref) == 1) {
trace_flatview_destroy_rcu(view, view->root);
assert(view->root); <-------------------
call_rcu(view, flatview_destroy, rcu);
}
}
I wonder how it didn't already crash.
The other stupid but working solution is we can always make the 6 aliases
to not be reused, IOW we can always use dynamic MRs considering
pci_bridge_update_mappings() should be rare?
In short, I wished we can fix this with something cleaner, and it's hard to
justify view->root==NULL is fine to be concurrently reset.. So IMHO we
should be very careful justifying this.
Thanks,
>
> /*
> * Each memory region (that can be freed) must have an owner, and it
>
> ---
> base-commit: 36076d24f04ea9dc3357c0fbe7bb14917375819c
> change-id: 20251024-root-d431450fcbbf
>
> Best regards,
> --
> Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-10-28 22:09 ` Peter Xu
@ 2025-10-29 4:06 ` Akihiko Odaki
2025-10-29 4:35 ` Akihiko Odaki
2025-10-29 15:21 ` Peter Xu
0 siblings, 2 replies; 14+ messages in thread
From: Akihiko Odaki @ 2025-10-29 4:06 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On 2025/10/29 7:09, Peter Xu wrote:
> On Mon, Oct 27, 2025 at 02:56:53PM +0900, Akihiko Odaki wrote:
>> docs/devel/memory.rst says "memory_region_ref and memory_region_unref
>> are never called on aliases", but these functions are called for
>> FlatView roots, which can be aliases.
>
> IMHO the quoted doc was in a special context, where it was talking about
> the example of address_space_map() holding adhoc refcounts of a MR, in
> which case "memory_region_ref and memory_region_unref are never called on
> aliases" was correct..
>
> The full context:
>
> ...
> If you break this rule, the following situation can happen:
>
> - the memory region's owner had a reference taken via memory_region_ref
> (for example by address_space_map)
>
> - the region is unparented, and has no owner anymore
>
> - when address_space_unmap is called, the reference to the memory region's
> owner is leaked.
>
> There is an exception to the above rule: it is okay to call
> object_unparent at any time for an alias or a container region. It is
> therefore also okay to create or destroy alias and container regions
> dynamically during a device's lifetime.
>
> This exceptional usage is valid because aliases and containers only help
> QEMU building the guest's memory map; they are never accessed directly.
> memory_region_ref and memory_region_unref are never called on aliases
> or containers, and the above situation then cannot happen. Exploiting
> this exception is rarely necessary, and therefore it is discouraged,
> but nevertheless it is used in a few places.
> ...
>
> So I can't say the doc is wrong, but maybe it can be at least be clearer on
> the scope of that sentence.. indeed.
I think statement "it is okay to call object_unparent at any time for an
alias or a container region" can be corrected. Practically, developers
will want call object_unparent() only when:
- the memory region is not added to a container and
- there is no manual references created with memory_region_ref().
These two conditions can be satisfied by auditing the device code that
owns the memory region instead of multiple devices.
>
>>
>> This causes object overwrite hazard in pci-bridge. Specifically,
>> pci_bridge_region_init() expects that there are no references to
>> w->alias_io after object_unparent() is called, allowing it to reuse the
>> associated storage. However, if a parent bus still holds a reference to
>> the existing object as a FlatView's root, the storage is still in use,
>> leading to an overwrite. This hazard can be confirmed by adding the
>> following code to pci_bridge_region_init():
>>
>> PCIDevice *parent_dev = parent->parent_dev;
>> assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
>> PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w->alias_io);
>
> What's interesting is I found PCIBridge.as_io / PCIBridge.as_mem are not
> used anywhere.. because it looks like the bridge code uses MRs to operate
> rather than address spaces.
>
> Does it mean we can drop the two ASes? Then if they're the only holder of
> the refcounts of these problematic MRs, does it solve the problem too in an
> easier way? Maybe there're other issues you want to fix too with this patch?
Apparently we cannot drop the ASes. See commit 55fa4be6f76a
("virtio-pci: fix memory_region_find for VirtIOPCIRegion's MR"), which
introduced them.
I don't know any other existing devices affected by this FlatView
behavior, but it is also difficult to show that they are *not* affected
because it requires traversing MemoryRegion graphs that span across
several devices.
We will also need to update the documentation for future devices, but it
is not trivial either as the condition where aliases are referenced from
FlatView is complex.
Considering that, I think this patch is a pragmatic solution that
ensures correctness of object_unparent() on aliases.
>
>>
>> This assertion fails when running:
>> meson test -C build qtest-x86_64/bios-tables-test \
>> '--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
>>
>> Make the references of FlatView roots "weak" (i.e., remove the
>> reference to a root automatically removed when it is finalized) to
>> avoid calling memory_region_ref and memory_region_unref and fix the
>> hazard with pci-bridge.
>>
>> Alternative solutions (like removing the "never called on aliases"
>> statement or detailing the exception) were rejected because the alias
>> invariant is still relied upon in several parts of the codebase, and
>> updating existing code to align with a new condition is non-trivial.
>>
>> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>> ---
>> system/memory.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/system/memory.c b/system/memory.c
>> index 8b84661ae36c..08fe5e791224 100644
>> --- a/system/memory.c
>> +++ b/system/memory.c
>> @@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
>> view = g_new0(FlatView, 1);
>> view->ref = 1;
>> view->root = mr_root;
>> - memory_region_ref(mr_root);
>> trace_flatview_new(view, mr_root);
>>
>> return view;
>> @@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
>> memory_region_unref(view->ranges[i].mr);
>> }
>> g_free(view->ranges);
>> - memory_region_unref(view->root);
>> g_free(view);
>> }
>>
>> @@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
>> static void memory_region_finalize(Object *obj)
>> {
>> MemoryRegion *mr = MEMORY_REGION(obj);
>> + gpointer key;
>> + gpointer value;
>> +
>> + if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
>> + ((FlatView *)value)->root = NULL;
>> + }
>
> This is definitely very tricky.. The translation path (from
> AddressSpaceDispatch) indeed looks ok as of now, which doesn't looks at
> view->root.. however at least I saw this:
>
> void flatview_unref(FlatView *view)
> {
> if (qatomic_fetch_dec(&view->ref) == 1) {
> trace_flatview_destroy_rcu(view, view->root);
> assert(view->root); <-------------------
> call_rcu(view, flatview_destroy, rcu);
> }
> }
>
> I wonder how it didn't already crash.
In case of pci-bridge, I guess flatview_unref() is synchronously called,
but memory_region_unref(view->root) is not because of flatview_destroy()
is delayed with RCU.
>
> The other stupid but working solution is we can always make the 6 aliases
> to not be reused, IOW we can always use dynamic MRs considering
> pci_bridge_update_mappings() should be rare?
Perhaps we may introduce memory_region_new_alias() (that calls
object_new()) and allow calling object_unparent() only for aliases
created with the function.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-10-29 4:06 ` Akihiko Odaki
@ 2025-10-29 4:35 ` Akihiko Odaki
2025-10-29 15:21 ` Peter Xu
1 sibling, 0 replies; 14+ messages in thread
From: Akihiko Odaki @ 2025-10-29 4:35 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On 2025/10/29 13:06, Akihiko Odaki wrote:
> On 2025/10/29 7:09, Peter Xu wrote:
>> On Mon, Oct 27, 2025 at 02:56:53PM +0900, Akihiko Odaki wrote:
>>> docs/devel/memory.rst says "memory_region_ref and memory_region_unref
>>> are never called on aliases", but these functions are called for
>>> FlatView roots, which can be aliases.
>>
>> IMHO the quoted doc was in a special context, where it was talking about
>> the example of address_space_map() holding adhoc refcounts of a MR, in
>> which case "memory_region_ref and memory_region_unref are never called on
>> aliases" was correct..
>>
>> The full context:
>>
>> ...
>> If you break this rule, the following situation can happen:
>> - the memory region's owner had a reference taken via
>> memory_region_ref
>> (for example by address_space_map)
>> - the region is unparented, and has no owner anymore
>> - when address_space_unmap is called, the reference to the memory
>> region's
>> owner is leaked.
>> There is an exception to the above rule: it is okay to call
>> object_unparent at any time for an alias or a container region. It is
>> therefore also okay to create or destroy alias and container regions
>> dynamically during a device's lifetime.
>> This exceptional usage is valid because aliases and containers only
>> help
>> QEMU building the guest's memory map; they are never accessed
>> directly.
>> memory_region_ref and memory_region_unref are never called on aliases
>> or containers, and the above situation then cannot happen. Exploiting
>> this exception is rarely necessary, and therefore it is discouraged,
>> but nevertheless it is used in a few places.
>> ...
>>
>> So I can't say the doc is wrong, but maybe it can be at least be
>> clearer on
>> the scope of that sentence.. indeed.
>
> I think statement "it is okay to call object_unparent at any time for an
> alias or a container region" can be corrected. Practically, developers
> will want call object_unparent() only when:
> - the memory region is not added to a container and
> - there is no manual references created with memory_region_ref().
>
> These two conditions can be satisfied by auditing the device code that
> owns the memory region instead of multiple devices.
>
>>
>>>
>>> This causes object overwrite hazard in pci-bridge. Specifically,
>>> pci_bridge_region_init() expects that there are no references to
>>> w->alias_io after object_unparent() is called, allowing it to reuse the
>>> associated storage. However, if a parent bus still holds a reference to
>>> the existing object as a FlatView's root, the storage is still in use,
>>> leading to an overwrite. This hazard can be confirmed by adding the
>>> following code to pci_bridge_region_init():
>>>
>>> PCIDevice *parent_dev = parent->parent_dev;
>>> assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
>>> PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w-
>>> >alias_io);
>>
>> What's interesting is I found PCIBridge.as_io / PCIBridge.as_mem are not
>> used anywhere.. because it looks like the bridge code uses MRs to
>> operate
>> rather than address spaces.
>>
>> Does it mean we can drop the two ASes? Then if they're the only
>> holder of
>> the refcounts of these problematic MRs, does it solve the problem too
>> in an
>> easier way? Maybe there're other issues you want to fix too with this
>> patch?
>
> Apparently we cannot drop the ASes. See commit 55fa4be6f76a ("virtio-
> pci: fix memory_region_find for VirtIOPCIRegion's MR"), which introduced
> them.
>
> I don't know any other existing devices affected by this FlatView
> behavior, but it is also difficult to show that they are *not* affected
> because it requires traversing MemoryRegion graphs that span across
> several devices.
>
> We will also need to update the documentation for future devices, but it
> is not trivial either as the condition where aliases are referenced from
> FlatView is complex.
>
> Considering that, I think this patch is a pragmatic solution that
> ensures correctness of object_unparent() on aliases.
>
>>
>>>
>>> This assertion fails when running:
>>> meson test -C build qtest-x86_64/bios-tables-test \
>>> '--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
>>>
>>> Make the references of FlatView roots "weak" (i.e., remove the
>>> reference to a root automatically removed when it is finalized) to
>>> avoid calling memory_region_ref and memory_region_unref and fix the
>>> hazard with pci-bridge.
>>>
>>> Alternative solutions (like removing the "never called on aliases"
>>> statement or detailing the exception) were rejected because the alias
>>> invariant is still relied upon in several parts of the codebase, and
>>> updating existing code to align with a new condition is non-trivial.
>>>
>>> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>> ---
>>> system/memory.c | 8 ++++++--
>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/system/memory.c b/system/memory.c
>>> index 8b84661ae36c..08fe5e791224 100644
>>> --- a/system/memory.c
>>> +++ b/system/memory.c
>>> @@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
>>> view = g_new0(FlatView, 1);
>>> view->ref = 1;
>>> view->root = mr_root;
>>> - memory_region_ref(mr_root);
>>> trace_flatview_new(view, mr_root);
>>> return view;
>>> @@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
>>> memory_region_unref(view->ranges[i].mr);
>>> }
>>> g_free(view->ranges);
>>> - memory_region_unref(view->root);
>>> g_free(view);
>>> }
>>> @@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
>>> static void memory_region_finalize(Object *obj)
>>> {
>>> MemoryRegion *mr = MEMORY_REGION(obj);
>>> + gpointer key;
>>> + gpointer value;
>>> +
>>> + if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
>>> + ((FlatView *)value)->root = NULL;
>>> + }
>>
>> This is definitely very tricky.. The translation path (from
>> AddressSpaceDispatch) indeed looks ok as of now, which doesn't looks at
>> view->root.. however at least I saw this:
>>
>> void flatview_unref(FlatView *view)
>> {
>> if (qatomic_fetch_dec(&view->ref) == 1) {
>> trace_flatview_destroy_rcu(view, view->root);
>> assert(view->root);
>> <-------------------
>> call_rcu(view, flatview_destroy, rcu);
>> }
>> }
>>
>> I wonder how it didn't already crash.
>
> In case of pci-bridge, I guess flatview_unref() is synchronously called,
> but memory_region_unref(view->root) is not because of flatview_destroy()
> is delayed with RCU.
>
>>
>> The other stupid but working solution is we can always make the 6 aliases
>> to not be reused, IOW we can always use dynamic MRs considering
>> pci_bridge_update_mappings() should be rare?
>
> Perhaps we may introduce memory_region_new_alias() (that calls
> object_new()) and allow calling object_unparent() only for aliases
> created with the function.
Speaking of alternative solutions, I think the best solution in long
term is to use functions that dynamically update aliases such as
memory_region_set_alias_offset() instead of dynamically calling
object_unparent().
But that also requires considerable amount of effort, so I think this
patch still makes sense as an immediate solution. Perhaps we may also
expand the documentation to tell developers to use the dynamic update
functions so that the number of object_unparent() calls on aliases will
decrease over time and we can eventually remove the weak reference trick.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-10-29 4:06 ` Akihiko Odaki
2025-10-29 4:35 ` Akihiko Odaki
@ 2025-10-29 15:21 ` Peter Xu
2025-11-03 11:18 ` Akihiko Odaki
1 sibling, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-10-29 15:21 UTC (permalink / raw)
To: Akihiko Odaki
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On Wed, Oct 29, 2025 at 01:06:47PM +0900, Akihiko Odaki wrote:
> On 2025/10/29 7:09, Peter Xu wrote:
> > On Mon, Oct 27, 2025 at 02:56:53PM +0900, Akihiko Odaki wrote:
> > > docs/devel/memory.rst says "memory_region_ref and memory_region_unref
> > > are never called on aliases", but these functions are called for
> > > FlatView roots, which can be aliases.
> >
> > IMHO the quoted doc was in a special context, where it was talking about
> > the example of address_space_map() holding adhoc refcounts of a MR, in
> > which case "memory_region_ref and memory_region_unref are never called on
> > aliases" was correct..
> >
> > The full context:
> >
> > ...
> > If you break this rule, the following situation can happen:
> > - the memory region's owner had a reference taken via memory_region_ref
> > (for example by address_space_map)
> > - the region is unparented, and has no owner anymore
> > - when address_space_unmap is called, the reference to the memory region's
> > owner is leaked.
> > There is an exception to the above rule: it is okay to call
> > object_unparent at any time for an alias or a container region. It is
> > therefore also okay to create or destroy alias and container regions
> > dynamically during a device's lifetime.
> > This exceptional usage is valid because aliases and containers only help
> > QEMU building the guest's memory map; they are never accessed directly.
> > memory_region_ref and memory_region_unref are never called on aliases
> > or containers, and the above situation then cannot happen. Exploiting
> > this exception is rarely necessary, and therefore it is discouraged,
> > but nevertheless it is used in a few places.
> > ...
> >
> > So I can't say the doc is wrong, but maybe it can be at least be clearer on
> > the scope of that sentence.. indeed.
>
> I think statement "it is okay to call object_unparent at any time for an
> alias or a container region" can be corrected. Practically, developers will
> want call object_unparent() only when:
> - the memory region is not added to a container and
> - there is no manual references created with memory_region_ref().
>
> These two conditions can be satisfied by auditing the device code that owns
> the memory region instead of multiple devices.
Yes. I think there're other ways to implicitly taking mr refcounts though
(e.g. directly used as root address space when address_space_init()). From
that POV maybe the 1st requirement isn't as special.
>
> >
> > >
> > > This causes object overwrite hazard in pci-bridge. Specifically,
> > > pci_bridge_region_init() expects that there are no references to
> > > w->alias_io after object_unparent() is called, allowing it to reuse the
> > > associated storage. However, if a parent bus still holds a reference to
> > > the existing object as a FlatView's root, the storage is still in use,
> > > leading to an overwrite. This hazard can be confirmed by adding the
> > > following code to pci_bridge_region_init():
> > >
> > > PCIDevice *parent_dev = parent->parent_dev;
> > > assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
> > > PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w->alias_io);
> >
> > What's interesting is I found PCIBridge.as_io / PCIBridge.as_mem are not
> > used anywhere.. because it looks like the bridge code uses MRs to operate
> > rather than address spaces.
> >
> > Does it mean we can drop the two ASes? Then if they're the only holder of
> > the refcounts of these problematic MRs, does it solve the problem too in an
> > easier way? Maybe there're other issues you want to fix too with this patch?
>
> Apparently we cannot drop the ASes. See commit 55fa4be6f76a ("virtio-pci:
> fix memory_region_find for VirtIOPCIRegion's MR"), which introduced them.
Ah, this is definitely obscure.. at least it should have some comments
explaining why the ASes are there.
Now reading a bit into the problem, I'm not even sure if this is the right
thing to do, starting from ffa8a3e3b2 where it starts to introduce
memory_region_find() for virtio_address_space_lookup().
I don't know the piece of code well enough to say, but IMHO logically it
shouldn't need to depend on global address space information for the
lookup.
>
> I don't know any other existing devices affected by this FlatView behavior,
> but it is also difficult to show that they are *not* affected because it
> requires traversing MemoryRegion graphs that span across several devices.
>
> We will also need to update the documentation for future devices, but it is
> not trivial either as the condition where aliases are referenced from
> FlatView is complex.
>
> Considering that, I think this patch is a pragmatic solution that ensures
> correctness of object_unparent() on aliases.
I think this patch should still be the last resort, let's still try to
discuss if there's other options.
For example, afaiu RCU readers at least do not rely on view->root to be
present, can we already release the refcount for the view->root within the
BQL section? I mean something like this:
===8<===
diff --git a/system/memory.c b/system/memory.c
index 8b84661ae3..ceb774530f 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -301,7 +301,6 @@ static void flatview_destroy(FlatView *view)
memory_region_unref(view->ranges[i].mr);
}
g_free(view->ranges);
- memory_region_unref(view->root);
g_free(view);
}
@@ -314,7 +313,16 @@ void flatview_unref(FlatView *view)
{
if (qatomic_fetch_dec(&view->ref) == 1) {
trace_flatview_destroy_rcu(view, view->root);
+ /* Root pointer must exist until now */
assert(view->root);
+ /*
+ * Release the root pointer first without waiting for a grace
+ * period, as the root is not used by RCU readers. Early releasing
+ * of root MR helps stablizing alias MR refcounts in use cases like
+ * pci_bridge_region_init(), where the caller might want to reuse
+ * the same MR right away.
+ */
+ g_clear_pointer(&view->root, memory_region_unref);
call_rcu(view, flatview_destroy, rcu);
}
}
===8<===
That at least do not introduce weak-refcount concepts.
>
> >
> > >
> > > This assertion fails when running:
> > > meson test -C build qtest-x86_64/bios-tables-test \
> > > '--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
> > >
> > > Make the references of FlatView roots "weak" (i.e., remove the
> > > reference to a root automatically removed when it is finalized) to
> > > avoid calling memory_region_ref and memory_region_unref and fix the
> > > hazard with pci-bridge.
> > >
> > > Alternative solutions (like removing the "never called on aliases"
> > > statement or detailing the exception) were rejected because the alias
> > > invariant is still relied upon in several parts of the codebase, and
> > > updating existing code to align with a new condition is non-trivial.
> > >
> > > Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> > > ---
> > > system/memory.c | 8 ++++++--
> > > 1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/system/memory.c b/system/memory.c
> > > index 8b84661ae36c..08fe5e791224 100644
> > > --- a/system/memory.c
> > > +++ b/system/memory.c
> > > @@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
> > > view = g_new0(FlatView, 1);
> > > view->ref = 1;
> > > view->root = mr_root;
> > > - memory_region_ref(mr_root);
> > > trace_flatview_new(view, mr_root);
> > > return view;
> > > @@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
> > > memory_region_unref(view->ranges[i].mr);
> > > }
> > > g_free(view->ranges);
> > > - memory_region_unref(view->root);
> > > g_free(view);
> > > }
> > > @@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
> > > static void memory_region_finalize(Object *obj)
> > > {
> > > MemoryRegion *mr = MEMORY_REGION(obj);
> > > + gpointer key;
> > > + gpointer value;
> > > +
> > > + if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
> > > + ((FlatView *)value)->root = NULL;
> > > + }
> >
> > This is definitely very tricky.. The translation path (from
> > AddressSpaceDispatch) indeed looks ok as of now, which doesn't looks at
> > view->root.. however at least I saw this:
> >
> > void flatview_unref(FlatView *view)
> > {
> > if (qatomic_fetch_dec(&view->ref) == 1) {
> > trace_flatview_destroy_rcu(view, view->root);
> > assert(view->root); <-------------------
> > call_rcu(view, flatview_destroy, rcu);
> > }
> > }
> >
> > I wonder how it didn't already crash.
>
> In case of pci-bridge, I guess flatview_unref() is synchronously called, but
> memory_region_unref(view->root) is not because of flatview_destroy() is
> delayed with RCU.
True.
>
> >
> > The other stupid but working solution is we can always make the 6 aliases
> > to not be reused, IOW we can always use dynamic MRs considering
> > pci_bridge_update_mappings() should be rare?
>
> Perhaps we may introduce memory_region_new_alias() (that calls object_new())
> and allow calling object_unparent() only for aliases created with the
> function.
IMHO we can see feasibility of above "early unref view->root" idea, then
this one, before the original solution.
Thanks,
--
Peter Xu
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-10-29 15:21 ` Peter Xu
@ 2025-11-03 11:18 ` Akihiko Odaki
2025-11-05 20:07 ` Peter Xu
0 siblings, 1 reply; 14+ messages in thread
From: Akihiko Odaki @ 2025-11-03 11:18 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On 2025/10/30 0:21, Peter Xu wrote:
> On Wed, Oct 29, 2025 at 01:06:47PM +0900, Akihiko Odaki wrote:
>> On 2025/10/29 7:09, Peter Xu wrote:
>>> On Mon, Oct 27, 2025 at 02:56:53PM +0900, Akihiko Odaki wrote:
>>>> docs/devel/memory.rst says "memory_region_ref and memory_region_unref
>>>> are never called on aliases", but these functions are called for
>>>> FlatView roots, which can be aliases.
>>>
>>> IMHO the quoted doc was in a special context, where it was talking about
>>> the example of address_space_map() holding adhoc refcounts of a MR, in
>>> which case "memory_region_ref and memory_region_unref are never called on
>>> aliases" was correct..
>>>
>>> The full context:
>>>
>>> ...
>>> If you break this rule, the following situation can happen:
>>> - the memory region's owner had a reference taken via memory_region_ref
>>> (for example by address_space_map)
>>> - the region is unparented, and has no owner anymore
>>> - when address_space_unmap is called, the reference to the memory region's
>>> owner is leaked.
>>> There is an exception to the above rule: it is okay to call
>>> object_unparent at any time for an alias or a container region. It is
>>> therefore also okay to create or destroy alias and container regions
>>> dynamically during a device's lifetime.
>>> This exceptional usage is valid because aliases and containers only help
>>> QEMU building the guest's memory map; they are never accessed directly.
>>> memory_region_ref and memory_region_unref are never called on aliases
>>> or containers, and the above situation then cannot happen. Exploiting
>>> this exception is rarely necessary, and therefore it is discouraged,
>>> but nevertheless it is used in a few places.
>>> ...
>>>
>>> So I can't say the doc is wrong, but maybe it can be at least be clearer on
>>> the scope of that sentence.. indeed.
>>
>> I think statement "it is okay to call object_unparent at any time for an
>> alias or a container region" can be corrected. Practically, developers will
>> want call object_unparent() only when:
>> - the memory region is not added to a container and
>> - there is no manual references created with memory_region_ref().
>>
>> These two conditions can be satisfied by auditing the device code that owns
>> the memory region instead of multiple devices.
>
> Yes. I think there're other ways to implicitly taking mr refcounts though
> (e.g. directly used as root address space when address_space_init()). From
> that POV maybe the 1st requirement isn't as special.
>
>>
>>>
>>>>
>>>> This causes object overwrite hazard in pci-bridge. Specifically,
>>>> pci_bridge_region_init() expects that there are no references to
>>>> w->alias_io after object_unparent() is called, allowing it to reuse the
>>>> associated storage. However, if a parent bus still holds a reference to
>>>> the existing object as a FlatView's root, the storage is still in use,
>>>> leading to an overwrite. This hazard can be confirmed by adding the
>>>> following code to pci_bridge_region_init():
>>>>
>>>> PCIDevice *parent_dev = parent->parent_dev;
>>>> assert(!object_dynamic_cast(OBJECT(parent_dev), TYPE_PCI_BRIDGE) ||
>>>> PCI_BRIDGE(parent_dev)->as_io.current_map->root != &w->alias_io);
>>>
>>> What's interesting is I found PCIBridge.as_io / PCIBridge.as_mem are not
>>> used anywhere.. because it looks like the bridge code uses MRs to operate
>>> rather than address spaces.
>>>
>>> Does it mean we can drop the two ASes? Then if they're the only holder of
>>> the refcounts of these problematic MRs, does it solve the problem too in an
>>> easier way? Maybe there're other issues you want to fix too with this patch?
>>
>> Apparently we cannot drop the ASes. See commit 55fa4be6f76a ("virtio-pci:
>> fix memory_region_find for VirtIOPCIRegion's MR"), which introduced them.
>
> Ah, this is definitely obscure.. at least it should have some comments
> explaining why the ASes are there.
>
> Now reading a bit into the problem, I'm not even sure if this is the right
> thing to do, starting from ffa8a3e3b2 where it starts to introduce
> memory_region_find() for virtio_address_space_lookup().
>
> I don't know the piece of code well enough to say, but IMHO logically it
> shouldn't need to depend on global address space information for the
> lookup.
I understand the FlatView is useful for memory_region_find(), and I
think it also makes sense to assume that any guest-visible MemoryRegion
is contained in an AddressSpace.
But, looking at the code again, I now wonder why the MemoryRegion was
not contained in any AddressSpace before commit 55fa4be6f76a
("virtio-pci: fix memory_region_find for VirtIOPCIRegion's MR"). I
dropped the ASes in pci-bridge and ran the reproducer[1] but it still
works. Perhaps the ASes may no longer be needed.
[1] https://gitlab.com/qemu-project/qemu/-/issues/2576
>
>>
>> I don't know any other existing devices affected by this FlatView behavior,
>> but it is also difficult to show that they are *not* affected because it
>> requires traversing MemoryRegion graphs that span across several devices.
>>
>> We will also need to update the documentation for future devices, but it is
>> not trivial either as the condition where aliases are referenced from
>> FlatView is complex.
>>
>> Considering that, I think this patch is a pragmatic solution that ensures
>> correctness of object_unparent() on aliases.
>
> I think this patch should still be the last resort, let's still try to
> discuss if there's other options.
>
> For example, afaiu RCU readers at least do not rely on view->root to be
> present, can we already release the refcount for the view->root within the
> BQL section? I mean something like this:
>
> ===8<===
> diff --git a/system/memory.c b/system/memory.c
> index 8b84661ae3..ceb774530f 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -301,7 +301,6 @@ static void flatview_destroy(FlatView *view)
> memory_region_unref(view->ranges[i].mr);
> }
> g_free(view->ranges);
> - memory_region_unref(view->root);
> g_free(view);
> }
>
> @@ -314,7 +313,16 @@ void flatview_unref(FlatView *view)
> {
> if (qatomic_fetch_dec(&view->ref) == 1) {
> trace_flatview_destroy_rcu(view, view->root);
> + /* Root pointer must exist until now */
> assert(view->root);
> + /*
> + * Release the root pointer first without waiting for a grace
> + * period, as the root is not used by RCU readers. Early releasing
> + * of root MR helps stablizing alias MR refcounts in use cases like
> + * pci_bridge_region_init(), where the caller might want to reuse
> + * the same MR right away.
> + */
> + g_clear_pointer(&view->root, memory_region_unref);
> call_rcu(view, flatview_destroy, rcu);
> }
> }
> ===8<===
>
> That at least do not introduce weak-refcount concepts.
It unfortunately does not work for pci-bridge. It has the following
function:
void pci_bridge_update_mappings(PCIBridge *br)
{
PCIBridgeWindows *w = &br->windows;
/* Make updates atomic to: handle the case of one VCPU updating the
bridge
* while another accesses an unaffected region. */
memory_region_transaction_begin();
pci_bridge_region_del(br, w);
pci_bridge_region_cleanup(br, w);
pci_bridge_region_init(br);
memory_region_transaction_commit();
}
object_unparent() happens in pci_bridge_region_cleanup().
pci_bridge_region_init() reuses the storage.
memory_region_transaction_commit() triggers flatview_unref(), but it
needs to happen before pci_bridge_region_init().
memory_region_transaction_commit() also has an undesirable
characteristic that its effect may be delayed due to nesting. To make
sure flatview_unref() happens with a particular call of
memory_region_transaction_commit(), you need to traverse the possible
call graph that lead to the function.
So I'm afraid but I don't think there is a better way to ensure
correctness without a codebase-wide audit.
>
>>
>>>
>>>>
>>>> This assertion fails when running:
>>>> meson test -C build qtest-x86_64/bios-tables-test \
>>>> '--test-args=-p /x86_64/acpi/piix4/pci-hotplug/no_root_hotplug'
>>>>
>>>> Make the references of FlatView roots "weak" (i.e., remove the
>>>> reference to a root automatically removed when it is finalized) to
>>>> avoid calling memory_region_ref and memory_region_unref and fix the
>>>> hazard with pci-bridge.
>>>>
>>>> Alternative solutions (like removing the "never called on aliases"
>>>> statement or detailing the exception) were rejected because the alias
>>>> invariant is still relied upon in several parts of the codebase, and
>>>> updating existing code to align with a new condition is non-trivial.
>>>>
>>>> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>>>> ---
>>>> system/memory.c | 8 ++++++--
>>>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/system/memory.c b/system/memory.c
>>>> index 8b84661ae36c..08fe5e791224 100644
>>>> --- a/system/memory.c
>>>> +++ b/system/memory.c
>>>> @@ -266,7 +266,6 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
>>>> view = g_new0(FlatView, 1);
>>>> view->ref = 1;
>>>> view->root = mr_root;
>>>> - memory_region_ref(mr_root);
>>>> trace_flatview_new(view, mr_root);
>>>> return view;
>>>> @@ -301,7 +300,6 @@ static void flatview_destroy(FlatView *view)
>>>> memory_region_unref(view->ranges[i].mr);
>>>> }
>>>> g_free(view->ranges);
>>>> - memory_region_unref(view->root);
>>>> g_free(view);
>>>> }
>>>> @@ -1796,6 +1794,12 @@ void memory_region_init_iommu(void *_iommu_mr,
>>>> static void memory_region_finalize(Object *obj)
>>>> {
>>>> MemoryRegion *mr = MEMORY_REGION(obj);
>>>> + gpointer key;
>>>> + gpointer value;
>>>> +
>>>> + if (g_hash_table_steal_extended(flat_views, mr, &key, &value)) {
>>>> + ((FlatView *)value)->root = NULL;
>>>> + }
>>>
>>> This is definitely very tricky.. The translation path (from
>>> AddressSpaceDispatch) indeed looks ok as of now, which doesn't looks at
>>> view->root.. however at least I saw this:
>>>
>>> void flatview_unref(FlatView *view)
>>> {
>>> if (qatomic_fetch_dec(&view->ref) == 1) {
>>> trace_flatview_destroy_rcu(view, view->root);
>>> assert(view->root); <-------------------
>>> call_rcu(view, flatview_destroy, rcu);
>>> }
>>> }
>>>
>>> I wonder how it didn't already crash.
>>
>> In case of pci-bridge, I guess flatview_unref() is synchronously called, but
>> memory_region_unref(view->root) is not because of flatview_destroy() is
>> delayed with RCU.
>
> True.
>
>>
>>>
>>> The other stupid but working solution is we can always make the 6 aliases
>>> to not be reused, IOW we can always use dynamic MRs considering
>>> pci_bridge_update_mappings() should be rare?
>>
>> Perhaps we may introduce memory_region_new_alias() (that calls object_new())
>> and allow calling object_unparent() only for aliases created with the
>> function.
>
> IMHO we can see feasibility of above "early unref view->root" idea, then
> this one, before the original solution.
>
> Thanks,
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-03 11:18 ` Akihiko Odaki
@ 2025-11-05 20:07 ` Peter Xu
2025-11-06 2:23 ` Akihiko Odaki
0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-05 20:07 UTC (permalink / raw)
To: Akihiko Odaki
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On Mon, Nov 03, 2025 at 08:18:00PM +0900, Akihiko Odaki wrote:
> It unfortunately does not work for pci-bridge. It has the following
> function:
>
> void pci_bridge_update_mappings(PCIBridge *br)
> {
> PCIBridgeWindows *w = &br->windows;
>
> /* Make updates atomic to: handle the case of one VCPU updating the
> bridge
> * while another accesses an unaffected region. */
> memory_region_transaction_begin();
> pci_bridge_region_del(br, w);
> pci_bridge_region_cleanup(br, w);
> pci_bridge_region_init(br);
> memory_region_transaction_commit();
> }
>
> object_unparent() happens in pci_bridge_region_cleanup().
> pci_bridge_region_init() reuses the storage.
> memory_region_transaction_commit() triggers flatview_unref(), but it needs
> to happen before pci_bridge_region_init().
>
> memory_region_transaction_commit() also has an undesirable characteristic
> that its effect may be delayed due to nesting. To make sure flatview_unref()
> happens with a particular call of memory_region_transaction_commit(), you
> need to traverse the possible call graph that lead to the function.
>
> So I'm afraid but I don't think there is a better way to ensure correctness
> without a codebase-wide audit.
Ah indeed, I missed that. :(
One way to work this around is providing a helper (abstraction from the
current memory_region_transaction_commit) to enforce a flatview reset
before reusing. However I feel like it's an overkill too, but at least
that would also avoid weak-refs.
I think in practise I'd vote we fix pci-bridge only, either with your other
proposal to dynamically allocate the alias MRs, or something like you
posted previously:
https://lore.kernel.org/all/20250906-use-v1-3-c51caafd1eb7@rsg.ci.i.u-tokyo.ac.jp/#t
Personally, I don't mind fixing pci-bridge only even if we don't audit the
whole code base. The audit work is time consuming, and I'd simply trust
the tests from all the QEMU users covering whatever devices are still being
used. We will always get an issue report when something was wrong.
What do you think?
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-05 20:07 ` Peter Xu
@ 2025-11-06 2:23 ` Akihiko Odaki
2025-11-06 17:50 ` Peter Xu
0 siblings, 1 reply; 14+ messages in thread
From: Akihiko Odaki @ 2025-11-06 2:23 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On 2025/11/06 5:07, Peter Xu wrote:
> On Mon, Nov 03, 2025 at 08:18:00PM +0900, Akihiko Odaki wrote:
>> It unfortunately does not work for pci-bridge. It has the following
>> function:
>>
>> void pci_bridge_update_mappings(PCIBridge *br)
>> {
>> PCIBridgeWindows *w = &br->windows;
>>
>> /* Make updates atomic to: handle the case of one VCPU updating the
>> bridge
>> * while another accesses an unaffected region. */
>> memory_region_transaction_begin();
>> pci_bridge_region_del(br, w);
>> pci_bridge_region_cleanup(br, w);
>> pci_bridge_region_init(br);
>> memory_region_transaction_commit();
>> }
>>
>> object_unparent() happens in pci_bridge_region_cleanup().
>> pci_bridge_region_init() reuses the storage.
>> memory_region_transaction_commit() triggers flatview_unref(), but it needs
>> to happen before pci_bridge_region_init().
>>
>> memory_region_transaction_commit() also has an undesirable characteristic
>> that its effect may be delayed due to nesting. To make sure flatview_unref()
>> happens with a particular call of memory_region_transaction_commit(), you
>> need to traverse the possible call graph that lead to the function.
>>
>> So I'm afraid but I don't think there is a better way to ensure correctness
>> without a codebase-wide audit.
>
> Ah indeed, I missed that. :(
>
> One way to work this around is providing a helper (abstraction from the
> current memory_region_transaction_commit) to enforce a flatview reset
> before reusing. However I feel like it's an overkill too, but at least
> that would also avoid weak-refs.
Enforcing a FlatView reset for *one* memory_region_transaction_commit()
call is incompatible with nesting, which require delaying it until all
memory_region_transaction_commit() calls to finish.
>
> I think in practise I'd vote we fix pci-bridge only, either with your other
> proposal to dynamically allocate the alias MRs, or something like you
> posted previously:
>
> https://lore.kernel.org/all/20250906-use-v1-3-c51caafd1eb7@rsg.ci.i.u-tokyo.ac.jp/#t
>
> Personally, I don't mind fixing pci-bridge only even if we don't audit the
> whole code base. The audit work is time consuming, and I'd simply trust
> the tests from all the QEMU users covering whatever devices are still being
> used. We will always get an issue report when something was wrong.
>
> What do you think?
Generally speaking, we will not necessarily "always" get an issue report
when things went wrong with memory management. A bug in memory
management may not cause an immediate crash but corrupt the memory state
which you will find only later. The end result of memory corruption may
look random and result in a hard-to-debug issue report. A user may not
even bother writing an issue report at all; this is especially true for
this kind of corner cases that rarely happen.
There should have been no such a hazard of memory corruption if the code
did exactly what the documentation said in the first place. The
consistency of the code and the documentation is essential, especially
for this kind of complex and fundamental code.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-06 2:23 ` Akihiko Odaki
@ 2025-11-06 17:50 ` Peter Xu
2025-11-07 2:16 ` Akihiko Odaki
0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-06 17:50 UTC (permalink / raw)
To: Akihiko Odaki
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On Thu, Nov 06, 2025 at 11:23:32AM +0900, Akihiko Odaki wrote:
> Generally speaking, we will not necessarily "always" get an issue report
> when things went wrong with memory management. A bug in memory management
> may not cause an immediate crash but corrupt the memory state which you will
> find only later. The end result of memory corruption may look random and
> result in a hard-to-debug issue report. A user may not even bother writing
> an issue report at all; this is especially true for this kind of corner
> cases that rarely happen.
>
> There should have been no such a hazard of memory corruption if the code did
> exactly what the documentation said in the first place. The consistency of
> the code and the documentation is essential, especially for this kind of
> complex and fundamental code.
Do you have encountered such case before?
I wasn't expecting that, because what you were saying looks more like what
Linux kernel would have a bug in mm. QEMU is still special as it has the
default unassigned MR trapping everything by default, meanwhile normally
what is moving is MMIO / alias regions rather than real ramblocks. It
means when things go wrong we have much higher chance to trap them
properly.
I also confess though that I'm pretty conservative on fixing things with
hypothetical issues. In general, I prefer fixing things with explicit
problems, so we know how to measure and justify a fix (depending on how
aggressive the fix is and how much maintanence burden it will bring to
QEMU). Without a real problem, it's harder to quantify that even if such
evaluation will also normally be subjective too.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-06 17:50 ` Peter Xu
@ 2025-11-07 2:16 ` Akihiko Odaki
2025-11-07 16:40 ` Peter Xu
0 siblings, 1 reply; 14+ messages in thread
From: Akihiko Odaki @ 2025-11-07 2:16 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On 2025/11/07 2:50, Peter Xu wrote:
> On Thu, Nov 06, 2025 at 11:23:32AM +0900, Akihiko Odaki wrote:
>> Generally speaking, we will not necessarily "always" get an issue report
>> when things went wrong with memory management. A bug in memory management
>> may not cause an immediate crash but corrupt the memory state which you will
>> find only later. The end result of memory corruption may look random and
>> result in a hard-to-debug issue report. A user may not even bother writing
>> an issue report at all; this is especially true for this kind of corner
>> cases that rarely happen.
>>
>> There should have been no such a hazard of memory corruption if the code did
>> exactly what the documentation said in the first place. The consistency of
>> the code and the documentation is essential, especially for this kind of
>> complex and fundamental code.
>
> Do you have encountered such case before?
>
> I wasn't expecting that, because what you were saying looks more like what
> Linux kernel would have a bug in mm. QEMU is still special as it has the
> default unassigned MR trapping everything by default, meanwhile normally
> what is moving is MMIO / alias regions rather than real ramblocks. It
> means when things go wrong we have much higher chance to trap them
> properly.
When I said "memory management" I meant the methods we use to allocate
and free memory (the Linux equivalents would be kmalloc()/free()/kref),
not the MM tracking or unassigned MR trapping behavior you mentioned.
The unassigned MR trap and MMIO/alias movement are a separate concern
and don’t change the underlying risk.
Concrete example: imagine an alias is allocated with g_new() and freed
immediately after object_unparent(). If that alias accidentally becomes
the FlatView root, destroying the FlatView later will call
memory_region_unref() and produce a use-after-free. We cannot predict
what memory_region_unref() will read or write in that scenario — the
result can be arbitrary memory corruption that surfaces much later as a
hard-to-debug, intermittent problem. Users often won’t file an issue for
these rare corner cases.
>
> I also confess though that I'm pretty conservative on fixing things with
> hypothetical issues. In general, I prefer fixing things with explicit
> problems, so we know how to measure and justify a fix (depending on how
> aggressive the fix is and how much maintanence burden it will bring to
> QEMU). Without a real problem, it's harder to quantify that even if such
> evaluation will also normally be subjective too.
Regarding your preference to fix only explicit problems: I understand
the conservatism, but here are the facts we need to weigh:
- The documentation claims we may free aliases because
memory_region_ref() is never called, yet there is code that does call
memory_region_ref().
- The patch adds code to align behavior with the documentation.
The significance of both potential impacts (the behavioral divergence
for devices other than pci-bridge, and the added complexity needed for
consistency) may be subjective and hypothetical, but that applies
equally to both sides.
In this case, the long-term reliability and maintainability of QEMU
depend on having the code behave as documented. Correctness should take
precedence over simplicity.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-07 2:16 ` Akihiko Odaki
@ 2025-11-07 16:40 ` Peter Xu
2025-11-08 2:07 ` Akihiko Odaki
0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-07 16:40 UTC (permalink / raw)
To: Akihiko Odaki
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On Fri, Nov 07, 2025 at 11:16:56AM +0900, Akihiko Odaki wrote:
> On 2025/11/07 2:50, Peter Xu wrote:
> > On Thu, Nov 06, 2025 at 11:23:32AM +0900, Akihiko Odaki wrote:
> > > Generally speaking, we will not necessarily "always" get an issue report
> > > when things went wrong with memory management. A bug in memory management
> > > may not cause an immediate crash but corrupt the memory state which you will
> > > find only later. The end result of memory corruption may look random and
> > > result in a hard-to-debug issue report. A user may not even bother writing
> > > an issue report at all; this is especially true for this kind of corner
> > > cases that rarely happen.
> > >
> > > There should have been no such a hazard of memory corruption if the code did
> > > exactly what the documentation said in the first place. The consistency of
> > > the code and the documentation is essential, especially for this kind of
> > > complex and fundamental code.
> >
> > Do you have encountered such case before?
> >
> > I wasn't expecting that, because what you were saying looks more like what
> > Linux kernel would have a bug in mm. QEMU is still special as it has the
> > default unassigned MR trapping everything by default, meanwhile normally
> > what is moving is MMIO / alias regions rather than real ramblocks. It
> > means when things go wrong we have much higher chance to trap them
> > properly.
>
> When I said "memory management" I meant the methods we use to allocate and
> free memory (the Linux equivalents would be kmalloc()/free()/kref), not the
> MM tracking or unassigned MR trapping behavior you mentioned. The unassigned
> MR trap and MMIO/alias movement are a separate concern and don’t change the
> underlying risk.
>
> Concrete example: imagine an alias is allocated with g_new() and freed
> immediately after object_unparent(). If that alias accidentally becomes the
> FlatView root, destroying the FlatView later will call memory_region_unref()
> and produce a use-after-free. We cannot predict what memory_region_unref()
> will read or write in that scenario — the result can be arbitrary memory
> corruption that surfaces much later as a hard-to-debug, intermittent
> problem. Users often won’t file an issue for these rare corner cases.
OK I see what you meant now. Yes it's a valid concern.
>
> >
> > I also confess though that I'm pretty conservative on fixing things with
> > hypothetical issues. In general, I prefer fixing things with explicit
> > problems, so we know how to measure and justify a fix (depending on how
> > aggressive the fix is and how much maintanence burden it will bring to
> > QEMU). Without a real problem, it's harder to quantify that even if such
> > evaluation will also normally be subjective too.
>
> Regarding your preference to fix only explicit problems: I understand the
> conservatism, but here are the facts we need to weigh:
>
> - The documentation claims we may free aliases because
> memory_region_ref() is never called, yet there is code that does call
> memory_region_ref().
> - The patch adds code to align behavior with the documentation.
>
> The significance of both potential impacts (the behavioral divergence for
> devices other than pci-bridge, and the added complexity needed for
> consistency) may be subjective and hypothetical, but that applies equally to
> both sides.
>
> In this case, the long-term reliability and maintainability of QEMU depend
> on having the code behave as documented. Correctness should take precedence
> over simplicity.
Fair enough.
Let's then still try to see whether we can brainstorm something that can
still at least avoid the "let's clear a remote pointer in a finalize(),
because it looks safe" approach.. I'm not sure if I'm the only one that
feels nervous with it.
Fundamentally, if you can remotely clear a pointer, it means it's not used
at all. In practise, that's correct because as I also discussed before I
don't think RCU readers use flatview->root at all. It was almost only
because we have some very light references on flatview->root. The major
"hidden" reference is actually the key in flat_views hash, however I don't
think it will have any stale root MR VA as key, as long as after a proper
commit() completes.
In short, I want to discuss with you on whether we can completely remove
the flatview->root reference, instead of resetting it in finalize().
Since that'll need to be justified, I prepared some sample patches myself.
It survives all my smoke tests. Once again, please treat them as comments
only. Would you think that's slightly better? Attached at the end.
IMHO removing view->root makes sure nothing will surprise us in the future
v.s. remote resets.
Thanks,
===8<===
From 4b495d935dfb145ed7ff57b3f6e4d6b9cb5ce038 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 7 Nov 2025 10:47:40 -0500
Subject: [PATCH 1/3] memory: Remove flatview root reference in
mtree_print_dispatch()
This is so far only used to dump an extra "[ROOT]" tag when the termination
MR is the root MR. This is so far the only real necessary reference to
flatview's root. Remove it. We lose this tag, but the hope is with the
MR's name still being available it's still clear on what it represents.
This paves way for a complete removal of root MR reference on flatviews.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
system/memory-internal.h | 3 +--
system/memory.c | 4 ++--
system/physmem.c | 5 ++---
3 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/system/memory-internal.h b/system/memory-internal.h
index 46f758fa7e..c5841a603c 100644
--- a/system/memory-internal.h
+++ b/system/memory-internal.h
@@ -35,8 +35,7 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv);
void address_space_dispatch_compact(AddressSpaceDispatch *d);
void address_space_dispatch_free(AddressSpaceDispatch *d);
-void mtree_print_dispatch(struct AddressSpaceDispatch *d,
- MemoryRegion *root);
+void mtree_print_dispatch(struct AddressSpaceDispatch *d);
/* returns true if end is big endian. */
static inline bool devend_big_endian(enum device_endian end)
diff --git a/system/memory.c b/system/memory.c
index 8b84661ae3..0d14e60d26 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -3573,8 +3573,8 @@ static void mtree_print_flatview(gpointer key, gpointer value,
}
#if !defined(CONFIG_USER_ONLY)
- if (fvi->dispatch_tree && view->root) {
- mtree_print_dispatch(view->dispatch, view->root);
+ if (fvi->dispatch_tree) {
+ mtree_print_dispatch(view->dispatch);
}
#endif
diff --git a/system/physmem.c b/system/physmem.c
index c9869e4049..26ae6e3acd 100644
--- a/system/physmem.c
+++ b/system/physmem.c
@@ -4257,7 +4257,7 @@ static void mtree_print_phys_entries(int start, int end, int skip, int ptr)
#define MR_SIZE(size) (int128_nz(size) ? (hwaddr)int128_get64( \
int128_sub((size), int128_one())) : 0)
-void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
+void mtree_print_dispatch(AddressSpaceDispatch *d)
{
int i;
@@ -4270,13 +4270,12 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
" [ROM]", " [watch]" };
qemu_printf(" #%d @" HWADDR_FMT_plx ".." HWADDR_FMT_plx
- " %s%s%s%s%s",
+ " %s%s%s%s",
i,
s->offset_within_address_space,
s->offset_within_address_space + MR_SIZE(s->size),
s->mr->name ? s->mr->name : "(noname)",
i < ARRAY_SIZE(names) ? names[i] : "",
- s->mr == root ? " [ROOT]" : "",
s == d->mru_section ? " [MRU]" : "",
s->mr->is_iommu ? " [iommu]" : "");
--
2.50.1
From f96e31623f6d13e27b835f06b3c131e91590b13f Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 7 Nov 2025 10:59:15 -0500
Subject: [PATCH 2/3] memory: Introduce flatview's root name
One other reason to reference a Flatview's root MR is to fetch a name.
Remember that inside Flatview itself so as to drop this reference to
Flatview's root MR.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/system/memory.h | 1 +
system/memory.c | 5 +++--
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/system/memory.h b/include/system/memory.h
index 3bd5ffa5e0..301cda1a8f 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -1202,6 +1202,7 @@ struct FlatView {
unsigned nr_allocated;
struct AddressSpaceDispatch *dispatch;
MemoryRegion *root;
+ char *root_name;
};
static inline FlatView *address_space_to_flatview(AddressSpace *as)
diff --git a/system/memory.c b/system/memory.c
index 0d14e60d26..86891c50b4 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -266,6 +266,7 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
view = g_new0(FlatView, 1);
view->ref = 1;
view->root = mr_root;
+ view->root_name = g_strdup(mr_root ? memory_region_name(mr_root) : "(none)");
memory_region_ref(mr_root);
trace_flatview_new(view, mr_root);
@@ -301,6 +302,7 @@ static void flatview_destroy(FlatView *view)
memory_region_unref(view->ranges[i].mr);
}
g_free(view->ranges);
+ g_free(view->root_name);
memory_region_unref(view->root);
g_free(view);
}
@@ -3522,8 +3524,7 @@ static void mtree_print_flatview(gpointer key, gpointer value,
qemu_printf("\n");
}
- qemu_printf(" Root memory region: %s\n",
- view->root ? memory_region_name(view->root) : "(none)");
+ qemu_printf(" Root memory region: %s\n", view->root_name);
if (n <= 0) {
qemu_printf(MTREE_INDENT "No rendered FlatView\n\n");
--
2.50.1
From 19f4451a1a7ea2ea3c0c41849f9c118f2b909293 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Fri, 7 Nov 2025 11:10:19 -0500
Subject: [PATCH 3/3] memory: Remove Flatview->root reference completely
memory.rst has this paragraph, which is currently inaccurate:
This exceptional usage is valid because aliases and containers only help
QEMU building the guest's memory map; they are never accessed directly.
memory_region_ref and memory_region_unref are never called on aliases or
containers, and the above situation then cannot happen. Exploiting this
exception is rarely necessary, and therefore it is discouraged, but
nevertheless it is used in a few places.
It was inaccurate because flatviews so far can take a refcount on an alias
MR, which may cause an object_unparent() on top of alias MR unsafe too.
However, Flatview logically doesn't need to take refcount on its root MR
that was used to generate the topology.
When generating the topology, the root MR is used within the whole process.
Said that, it doesn't need the refcount because it keeps holding BQL
without releasing, so the root MR cannot have chance to be detached or
freed.
There's another very implicit use of the root MR, which is as the index of
the flat_views hash. However that is fine too without refcount, because
for each memory whole transaction, it will guarantee that after memory
region commit()ed and returned, the flat_views hash will not have any key
that points to an invalid root MR. It's because the role is the MR needs
to be fully detached first before object_unparent(), then it guarantees it
won't appear in any of the Flatviews.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
include/system/memory.h | 1 -
system/memory.c | 13 +++++++------
system/trace-events | 4 ++--
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/system/memory.h b/include/system/memory.h
index 301cda1a8f..d383081155 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -1201,7 +1201,6 @@ struct FlatView {
unsigned nr;
unsigned nr_allocated;
struct AddressSpaceDispatch *dispatch;
- MemoryRegion *root;
char *root_name;
};
diff --git a/system/memory.c b/system/memory.c
index 86891c50b4..6576e673d4 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -265,9 +265,7 @@ static FlatView *flatview_new(MemoryRegion *mr_root)
view = g_new0(FlatView, 1);
view->ref = 1;
- view->root = mr_root;
view->root_name = g_strdup(mr_root ? memory_region_name(mr_root) : "(none)");
- memory_region_ref(mr_root);
trace_flatview_new(view, mr_root);
return view;
@@ -294,7 +292,7 @@ static void flatview_destroy(FlatView *view)
{
int i;
- trace_flatview_destroy(view, view->root);
+ trace_flatview_destroy(view, view->root_name);
if (view->dispatch) {
address_space_dispatch_free(view->dispatch);
}
@@ -303,7 +301,6 @@ static void flatview_destroy(FlatView *view)
}
g_free(view->ranges);
g_free(view->root_name);
- memory_region_unref(view->root);
g_free(view);
}
@@ -315,8 +312,7 @@ static bool flatview_ref(FlatView *view)
void flatview_unref(FlatView *view)
{
if (qatomic_fetch_dec(&view->ref) == 1) {
- trace_flatview_destroy_rcu(view, view->root);
- assert(view->root);
+ trace_flatview_destroy_rcu(view, view->root_name);
call_rcu(view, flatview_destroy, rcu);
}
}
@@ -751,6 +747,11 @@ static FlatView *generate_memory_topology(MemoryRegion *mr)
int i;
FlatView *view;
+ /*
+ * Holding BQL makes sure @mr (if non-NULL) will not be gone from under
+ * us, hence we do not need a refcount while using it.
+ */
+ assert(bql_locked());
view = flatview_new(mr);
if (mr) {
diff --git a/system/trace-events b/system/trace-events
index 82856e44f2..21efd8047e 100644
--- a/system/trace-events
+++ b/system/trace-events
@@ -24,8 +24,8 @@ memory_region_ram_device_read(int cpu_index, void *mr, uint64_t addr, uint64_t v
memory_region_ram_device_write(int cpu_index, void *mr, uint64_t addr, uint64_t value, unsigned size) "cpu %d mr %p addr 0x%"PRIx64" value 0x%"PRIx64" size %u"
memory_region_sync_dirty(const char *mr, const char *listener, int global) "mr '%s' listener '%s' synced (global=%d)"
flatview_new(void *view, void *root) "%p (root %p)"
-flatview_destroy(void *view, void *root) "%p (root %p)"
-flatview_destroy_rcu(void *view, void *root) "%p (root %p)"
+flatview_destroy(void *view, char *root) "%p (root %p)"
+flatview_destroy_rcu(void *view, char *root) "%p (root %p)"
global_dirty_changed(unsigned int bitmask) "bitmask 0x%"PRIx32
# physmem.c
--
2.50.1
--
Peter Xu
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-07 16:40 ` Peter Xu
@ 2025-11-08 2:07 ` Akihiko Odaki
2025-11-12 22:39 ` Peter Xu
0 siblings, 1 reply; 14+ messages in thread
From: Akihiko Odaki @ 2025-11-08 2:07 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On 2025/11/08 1:40, Peter Xu wrote:
> On Fri, Nov 07, 2025 at 11:16:56AM +0900, Akihiko Odaki wrote:
>> On 2025/11/07 2:50, Peter Xu wrote:
>>> On Thu, Nov 06, 2025 at 11:23:32AM +0900, Akihiko Odaki wrote:
>>>> Generally speaking, we will not necessarily "always" get an issue report
>>>> when things went wrong with memory management. A bug in memory management
>>>> may not cause an immediate crash but corrupt the memory state which you will
>>>> find only later. The end result of memory corruption may look random and
>>>> result in a hard-to-debug issue report. A user may not even bother writing
>>>> an issue report at all; this is especially true for this kind of corner
>>>> cases that rarely happen.
>>>>
>>>> There should have been no such a hazard of memory corruption if the code did
>>>> exactly what the documentation said in the first place. The consistency of
>>>> the code and the documentation is essential, especially for this kind of
>>>> complex and fundamental code.
>>>
>>> Do you have encountered such case before?
>>>
>>> I wasn't expecting that, because what you were saying looks more like what
>>> Linux kernel would have a bug in mm. QEMU is still special as it has the
>>> default unassigned MR trapping everything by default, meanwhile normally
>>> what is moving is MMIO / alias regions rather than real ramblocks. It
>>> means when things go wrong we have much higher chance to trap them
>>> properly.
>>
>> When I said "memory management" I meant the methods we use to allocate and
>> free memory (the Linux equivalents would be kmalloc()/free()/kref), not the
>> MM tracking or unassigned MR trapping behavior you mentioned. The unassigned
>> MR trap and MMIO/alias movement are a separate concern and don’t change the
>> underlying risk.
>>
>> Concrete example: imagine an alias is allocated with g_new() and freed
>> immediately after object_unparent(). If that alias accidentally becomes the
>> FlatView root, destroying the FlatView later will call memory_region_unref()
>> and produce a use-after-free. We cannot predict what memory_region_unref()
>> will read or write in that scenario — the result can be arbitrary memory
>> corruption that surfaces much later as a hard-to-debug, intermittent
>> problem. Users often won’t file an issue for these rare corner cases.
>
> OK I see what you meant now. Yes it's a valid concern.
>
>>
>>>
>>> I also confess though that I'm pretty conservative on fixing things with
>>> hypothetical issues. In general, I prefer fixing things with explicit
>>> problems, so we know how to measure and justify a fix (depending on how
>>> aggressive the fix is and how much maintanence burden it will bring to
>>> QEMU). Without a real problem, it's harder to quantify that even if such
>>> evaluation will also normally be subjective too.
>>
>> Regarding your preference to fix only explicit problems: I understand the
>> conservatism, but here are the facts we need to weigh:
>>
>> - The documentation claims we may free aliases because
>> memory_region_ref() is never called, yet there is code that does call
>> memory_region_ref().
>> - The patch adds code to align behavior with the documentation.
>>
>> The significance of both potential impacts (the behavioral divergence for
>> devices other than pci-bridge, and the added complexity needed for
>> consistency) may be subjective and hypothetical, but that applies equally to
>> both sides.
>>
>> In this case, the long-term reliability and maintainability of QEMU depend
>> on having the code behave as documented. Correctness should take precedence
>> over simplicity.
>
> Fair enough.
>
> Let's then still try to see whether we can brainstorm something that can
> still at least avoid the "let's clear a remote pointer in a finalize(),
> because it looks safe" approach.. I'm not sure if I'm the only one that
> feels nervous with it.
>
> Fundamentally, if you can remotely clear a pointer, it means it's not used
> at all. In practise, that's correct because as I also discussed before I
> don't think RCU readers use flatview->root at all. It was almost only
> because we have some very light references on flatview->root. The major
> "hidden" reference is actually the key in flat_views hash, however I don't
> think it will have any stale root MR VA as key, as long as after a proper
> commit() completes.
"As long as after a proper commit() completes" is what we are trying to
avoid. Think of the following sequence:
memory_region_transaction_begin()
object_unparent(mr)
g_free(mr)
mr = g_new0(MemoryRegion, 1) // reuse the storage for another MR
address_space_init(as, mr, "as")
memory_region_transaction_commit()
address_space_init() will use the value keyed with the dangling pointer
in flat_views.
Removing a dangling reference is safer than leaving it. Some code that
assumes the reference will not be gone may get surprised and cause a
NULL dereference, but that almost always result in SIGSEGV instead of
memory corruptions. Such segfault is as easy to debug as SIGABRT
triggered by assertions.
>
> In short, I want to discuss with you on whether we can completely remove
> the flatview->root reference, instead of resetting it in finalize().
>
> Since that'll need to be justified, I prepared some sample patches myself.
> It survives all my smoke tests. Once again, please treat them as comments
> only. Would you think that's slightly better? Attached at the end.
>
> IMHO removing view->root makes sure nothing will surprise us in the future
> v.s. remote resets.
>
> Thanks,
>
> ===8<===
>
> From 4b495d935dfb145ed7ff57b3f6e4d6b9cb5ce038 Mon Sep 17 00:00:00 2001
> From: Peter Xu <peterx@redhat.com>
> Date: Fri, 7 Nov 2025 10:47:40 -0500
> Subject: [PATCH 1/3] memory: Remove flatview root reference in
> mtree_print_dispatch()
>
> This is so far only used to dump an extra "[ROOT]" tag when the termination
> MR is the root MR. This is so far the only real necessary reference to
> flatview's root. Remove it. We lose this tag, but the hope is with the
> MR's name still being available it's still clear on what it represents.
>
> This paves way for a complete removal of root MR reference on flatviews.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> system/memory-internal.h | 3 +--
> system/memory.c | 4 ++--
> system/physmem.c | 5 ++---
> 3 files changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/system/memory-internal.h b/system/memory-internal.h
> index 46f758fa7e..c5841a603c 100644
> --- a/system/memory-internal.h
> +++ b/system/memory-internal.h
> @@ -35,8 +35,7 @@ AddressSpaceDispatch *address_space_dispatch_new(FlatView *fv);
> void address_space_dispatch_compact(AddressSpaceDispatch *d);
> void address_space_dispatch_free(AddressSpaceDispatch *d);
>
> -void mtree_print_dispatch(struct AddressSpaceDispatch *d,
> - MemoryRegion *root);
> +void mtree_print_dispatch(struct AddressSpaceDispatch *d);
>
> /* returns true if end is big endian. */
> static inline bool devend_big_endian(enum device_endian end)
> diff --git a/system/memory.c b/system/memory.c
> index 8b84661ae3..0d14e60d26 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -3573,8 +3573,8 @@ static void mtree_print_flatview(gpointer key, gpointer value,
> }
>
> #if !defined(CONFIG_USER_ONLY)
> - if (fvi->dispatch_tree && view->root) {
> - mtree_print_dispatch(view->dispatch, view->root);
> + if (fvi->dispatch_tree) {
> + mtree_print_dispatch(view->dispatch);
> }
> #endif
>
> diff --git a/system/physmem.c b/system/physmem.c
> index c9869e4049..26ae6e3acd 100644
> --- a/system/physmem.c
> +++ b/system/physmem.c
> @@ -4257,7 +4257,7 @@ static void mtree_print_phys_entries(int start, int end, int skip, int ptr)
> #define MR_SIZE(size) (int128_nz(size) ? (hwaddr)int128_get64( \
> int128_sub((size), int128_one())) : 0)
>
> -void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
> +void mtree_print_dispatch(AddressSpaceDispatch *d)
> {
> int i;
>
> @@ -4270,13 +4270,12 @@ void mtree_print_dispatch(AddressSpaceDispatch *d, MemoryRegion *root)
> " [ROM]", " [watch]" };
>
> qemu_printf(" #%d @" HWADDR_FMT_plx ".." HWADDR_FMT_plx
> - " %s%s%s%s%s",
> + " %s%s%s%s",
> i,
> s->offset_within_address_space,
> s->offset_within_address_space + MR_SIZE(s->size),
> s->mr->name ? s->mr->name : "(noname)",
> i < ARRAY_SIZE(names) ? names[i] : "",
> - s->mr == root ? " [ROOT]" : "",
> s == d->mru_section ? " [MRU]" : "",
> s->mr->is_iommu ? " [iommu]" : "");
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-08 2:07 ` Akihiko Odaki
@ 2025-11-12 22:39 ` Peter Xu
2025-11-13 1:58 ` Akihiko Odaki
0 siblings, 1 reply; 14+ messages in thread
From: Peter Xu @ 2025-11-12 22:39 UTC (permalink / raw)
To: Akihiko Odaki
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On Sat, Nov 08, 2025 at 11:07:46AM +0900, Akihiko Odaki wrote:
> On 2025/11/08 1:40, Peter Xu wrote:
> > On Fri, Nov 07, 2025 at 11:16:56AM +0900, Akihiko Odaki wrote:
> > > On 2025/11/07 2:50, Peter Xu wrote:
> > > > On Thu, Nov 06, 2025 at 11:23:32AM +0900, Akihiko Odaki wrote:
> > > > > Generally speaking, we will not necessarily "always" get an issue report
> > > > > when things went wrong with memory management. A bug in memory management
> > > > > may not cause an immediate crash but corrupt the memory state which you will
> > > > > find only later. The end result of memory corruption may look random and
> > > > > result in a hard-to-debug issue report. A user may not even bother writing
> > > > > an issue report at all; this is especially true for this kind of corner
> > > > > cases that rarely happen.
> > > > >
> > > > > There should have been no such a hazard of memory corruption if the code did
> > > > > exactly what the documentation said in the first place. The consistency of
> > > > > the code and the documentation is essential, especially for this kind of
> > > > > complex and fundamental code.
> > > >
> > > > Do you have encountered such case before?
> > > >
> > > > I wasn't expecting that, because what you were saying looks more like what
> > > > Linux kernel would have a bug in mm. QEMU is still special as it has the
> > > > default unassigned MR trapping everything by default, meanwhile normally
> > > > what is moving is MMIO / alias regions rather than real ramblocks. It
> > > > means when things go wrong we have much higher chance to trap them
> > > > properly.
> > >
> > > When I said "memory management" I meant the methods we use to allocate and
> > > free memory (the Linux equivalents would be kmalloc()/free()/kref), not the
> > > MM tracking or unassigned MR trapping behavior you mentioned. The unassigned
> > > MR trap and MMIO/alias movement are a separate concern and don’t change the
> > > underlying risk.
> > >
> > > Concrete example: imagine an alias is allocated with g_new() and freed
> > > immediately after object_unparent(). If that alias accidentally becomes the
> > > FlatView root, destroying the FlatView later will call memory_region_unref()
> > > and produce a use-after-free. We cannot predict what memory_region_unref()
> > > will read or write in that scenario — the result can be arbitrary memory
> > > corruption that surfaces much later as a hard-to-debug, intermittent
> > > problem. Users often won’t file an issue for these rare corner cases.
> >
> > OK I see what you meant now. Yes it's a valid concern.
> >
> > >
> > > >
> > > > I also confess though that I'm pretty conservative on fixing things with
> > > > hypothetical issues. In general, I prefer fixing things with explicit
> > > > problems, so we know how to measure and justify a fix (depending on how
> > > > aggressive the fix is and how much maintanence burden it will bring to
> > > > QEMU). Without a real problem, it's harder to quantify that even if such
> > > > evaluation will also normally be subjective too.
> > >
> > > Regarding your preference to fix only explicit problems: I understand the
> > > conservatism, but here are the facts we need to weigh:
> > >
> > > - The documentation claims we may free aliases because
> > > memory_region_ref() is never called, yet there is code that does call
> > > memory_region_ref().
> > > - The patch adds code to align behavior with the documentation.
> > >
> > > The significance of both potential impacts (the behavioral divergence for
> > > devices other than pci-bridge, and the added complexity needed for
> > > consistency) may be subjective and hypothetical, but that applies equally to
> > > both sides.
> > >
> > > In this case, the long-term reliability and maintainability of QEMU depend
> > > on having the code behave as documented. Correctness should take precedence
> > > over simplicity.
> >
> > Fair enough.
> >
> > Let's then still try to see whether we can brainstorm something that can
> > still at least avoid the "let's clear a remote pointer in a finalize(),
> > because it looks safe" approach.. I'm not sure if I'm the only one that
> > feels nervous with it.
> >
> > Fundamentally, if you can remotely clear a pointer, it means it's not used
> > at all. In practise, that's correct because as I also discussed before I
> > don't think RCU readers use flatview->root at all. It was almost only
> > because we have some very light references on flatview->root. The major
> > "hidden" reference is actually the key in flat_views hash, however I don't
> > think it will have any stale root MR VA as key, as long as after a proper
> > commit() completes.
>
> "As long as after a proper commit() completes" is what we are trying to
> avoid. Think of the following sequence:
>
> memory_region_transaction_begin()
> object_unparent(mr)
> g_free(mr)
> mr = g_new0(MemoryRegion, 1) // reuse the storage for another MR
> address_space_init(as, mr, "as")
> memory_region_transaction_commit()
>
> address_space_init() will use the value keyed with the dangling pointer in
> flat_views.
I agree with your analysis, however I don't think such use case exists in
practise..
I added below and qemu's qtests run all smooth:
void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
{
+ assert(memory_region_transaction_depth == 0 &&
+ memory_region_update_pending == false);
memory_region_ref(root);
as->root = root;
as->current_map = NULL;
We can either keep the assertion (until it'll trigger any time, but I doubt
it..), or we can also solve this by other ways, I can think of two right
now:
(1) Make sure address_space_init() skips the last two steps to update
topology and ioeventfds if memory_region_transaction_depth>0.
(2) Introduce one more field for MemoryRegion, which is to allocate a
globally unique KEY for the flat_views hash, only needed if the MR
can be root candidates.
(2) will need more work, as we may need to have some way to make sure the
allocator will not generate duplicate KEYs (hence u64 counter may not be
ideal, as it may provide duplicates when wraps over u64.. even though I
also don't know if people will hit it at all..).
So I wonder if we can just keep the solution (0) above, which is to assert
it.
>
> Removing a dangling reference is safer than leaving it. Some code that
> assumes the reference will not be gone may get surprised and cause a NULL
> dereference, but that almost always result in SIGSEGV instead of memory
> corruptions. Such segfault is as easy to debug as SIGABRT triggered by
> assertions.
Yes it'll be easier to debug indeed. However IMHO this is still a bad plan
to wait for it whenever overlooked. It'll still be a disaster if a
customer crashed on the NULL pointer reference somewhere, so IMHO we should
avoid the first crash from design.
I confess I also added some assertions above. However since we don't have
a huge lot of object_unparent(mr) use cases, we can walk all of them and
make sure the assertion will never happen. IMHO it's still slightly better
than remote pointer resets.
I also wonder if you have good ideas on above, hence a better solution than
proposal (0,1,2) to avoid what you raised as problems.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] memory: Make FlatView root references weak
2025-11-12 22:39 ` Peter Xu
@ 2025-11-13 1:58 ` Akihiko Odaki
0 siblings, 0 replies; 14+ messages in thread
From: Akihiko Odaki @ 2025-11-13 1:58 UTC (permalink / raw)
To: Peter Xu
Cc: qemu-devel, Paolo Bonzini, David Hildenbrand,
Philippe Mathieu-Daudé
On 2025/11/13 7:39, Peter Xu wrote:
> On Sat, Nov 08, 2025 at 11:07:46AM +0900, Akihiko Odaki wrote:
>> On 2025/11/08 1:40, Peter Xu wrote:
>>> On Fri, Nov 07, 2025 at 11:16:56AM +0900, Akihiko Odaki wrote:
>>>> On 2025/11/07 2:50, Peter Xu wrote:
>>>>> On Thu, Nov 06, 2025 at 11:23:32AM +0900, Akihiko Odaki wrote:
>>>>>> Generally speaking, we will not necessarily "always" get an issue report
>>>>>> when things went wrong with memory management. A bug in memory management
>>>>>> may not cause an immediate crash but corrupt the memory state which you will
>>>>>> find only later. The end result of memory corruption may look random and
>>>>>> result in a hard-to-debug issue report. A user may not even bother writing
>>>>>> an issue report at all; this is especially true for this kind of corner
>>>>>> cases that rarely happen.
>>>>>>
>>>>>> There should have been no such a hazard of memory corruption if the code did
>>>>>> exactly what the documentation said in the first place. The consistency of
>>>>>> the code and the documentation is essential, especially for this kind of
>>>>>> complex and fundamental code.
>>>>>
>>>>> Do you have encountered such case before?
>>>>>
>>>>> I wasn't expecting that, because what you were saying looks more like what
>>>>> Linux kernel would have a bug in mm. QEMU is still special as it has the
>>>>> default unassigned MR trapping everything by default, meanwhile normally
>>>>> what is moving is MMIO / alias regions rather than real ramblocks. It
>>>>> means when things go wrong we have much higher chance to trap them
>>>>> properly.
>>>>
>>>> When I said "memory management" I meant the methods we use to allocate and
>>>> free memory (the Linux equivalents would be kmalloc()/free()/kref), not the
>>>> MM tracking or unassigned MR trapping behavior you mentioned. The unassigned
>>>> MR trap and MMIO/alias movement are a separate concern and don’t change the
>>>> underlying risk.
>>>>
>>>> Concrete example: imagine an alias is allocated with g_new() and freed
>>>> immediately after object_unparent(). If that alias accidentally becomes the
>>>> FlatView root, destroying the FlatView later will call memory_region_unref()
>>>> and produce a use-after-free. We cannot predict what memory_region_unref()
>>>> will read or write in that scenario — the result can be arbitrary memory
>>>> corruption that surfaces much later as a hard-to-debug, intermittent
>>>> problem. Users often won’t file an issue for these rare corner cases.
>>>
>>> OK I see what you meant now. Yes it's a valid concern.
>>>
>>>>
>>>>>
>>>>> I also confess though that I'm pretty conservative on fixing things with
>>>>> hypothetical issues. In general, I prefer fixing things with explicit
>>>>> problems, so we know how to measure and justify a fix (depending on how
>>>>> aggressive the fix is and how much maintanence burden it will bring to
>>>>> QEMU). Without a real problem, it's harder to quantify that even if such
>>>>> evaluation will also normally be subjective too.
>>>>
>>>> Regarding your preference to fix only explicit problems: I understand the
>>>> conservatism, but here are the facts we need to weigh:
>>>>
>>>> - The documentation claims we may free aliases because
>>>> memory_region_ref() is never called, yet there is code that does call
>>>> memory_region_ref().
>>>> - The patch adds code to align behavior with the documentation.
>>>>
>>>> The significance of both potential impacts (the behavioral divergence for
>>>> devices other than pci-bridge, and the added complexity needed for
>>>> consistency) may be subjective and hypothetical, but that applies equally to
>>>> both sides.
>>>>
>>>> In this case, the long-term reliability and maintainability of QEMU depend
>>>> on having the code behave as documented. Correctness should take precedence
>>>> over simplicity.
>>>
>>> Fair enough.
>>>
>>> Let's then still try to see whether we can brainstorm something that can
>>> still at least avoid the "let's clear a remote pointer in a finalize(),
>>> because it looks safe" approach.. I'm not sure if I'm the only one that
>>> feels nervous with it.
>>>
>>> Fundamentally, if you can remotely clear a pointer, it means it's not used
>>> at all. In practise, that's correct because as I also discussed before I
>>> don't think RCU readers use flatview->root at all. It was almost only
>>> because we have some very light references on flatview->root. The major
>>> "hidden" reference is actually the key in flat_views hash, however I don't
>>> think it will have any stale root MR VA as key, as long as after a proper
>>> commit() completes.
>>
>> "As long as after a proper commit() completes" is what we are trying to
>> avoid. Think of the following sequence:
>>
>> memory_region_transaction_begin()
>> object_unparent(mr)
>> g_free(mr)
>> mr = g_new0(MemoryRegion, 1) // reuse the storage for another MR
>> address_space_init(as, mr, "as")
>> memory_region_transaction_commit()
>>
>> address_space_init() will use the value keyed with the dangling pointer in
>> flat_views.
>
> I agree with your analysis, however I don't think such use case exists in
> practise..
>
> I added below and qemu's qtests run all smooth:
>
> void address_space_init(AddressSpace *as, MemoryRegion *root, const char *name)
> {
> + assert(memory_region_transaction_depth == 0 &&
> + memory_region_update_pending == false);
> memory_region_ref(root);
> as->root = root;
> as->current_map = NULL;
>
> We can either keep the assertion (until it'll trigger any time, but I doubt
> it..), or we can also solve this by other ways, I can think of two right
> now:
It is a convoluted way to express that flat_views may lose a valid
reference of a memory region at some point. This patch explicitly states
that.
>
> (1) Make sure address_space_init() skips the last two steps to update
> topology and ioeventfds if memory_region_transaction_depth>0.
>
> (2) Introduce one more field for MemoryRegion, which is to allocate a
> globally unique KEY for the flat_views hash, only needed if the MR
> can be root candidates.
>
> (2) will need more work, as we may need to have some way to make sure the
> allocator will not generate duplicate KEYs (hence u64 counter may not be
> ideal, as it may provide duplicates when wraps over u64.. even though I
> also don't know if people will hit it at all..).
>
> So I wonder if we can just keep the solution (0) above, which is to assert
> it.
>
>>
>> Removing a dangling reference is safer than leaving it. Some code that
>> assumes the reference will not be gone may get surprised and cause a NULL
>> dereference, but that almost always result in SIGSEGV instead of memory
>> corruptions. Such segfault is as easy to debug as SIGABRT triggered by
>> assertions.
>
> Yes it'll be easier to debug indeed. However IMHO this is still a bad plan
> to wait for it whenever overlooked. It'll still be a disaster if a
> customer crashed on the NULL pointer reference somewhere, so IMHO we should
> avoid the first crash from design.
>
> I confess I also added some assertions above. However since we don't have
> a huge lot of object_unparent(mr) use cases, we can walk all of them and
> make sure the assertion will never happen. IMHO it's still slightly better
> than remote pointer resets.
Pointer resetting doesn't even require walking all of object_unparent(),
so in this sense pointer resetting is better.
Regards,
Akihiko Odaki
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-11-13 1:59 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-27 5:56 [PATCH] memory: Make FlatView root references weak Akihiko Odaki
2025-10-28 22:09 ` Peter Xu
2025-10-29 4:06 ` Akihiko Odaki
2025-10-29 4:35 ` Akihiko Odaki
2025-10-29 15:21 ` Peter Xu
2025-11-03 11:18 ` Akihiko Odaki
2025-11-05 20:07 ` Peter Xu
2025-11-06 2:23 ` Akihiko Odaki
2025-11-06 17:50 ` Peter Xu
2025-11-07 2:16 ` Akihiko Odaki
2025-11-07 16:40 ` Peter Xu
2025-11-08 2:07 ` Akihiko Odaki
2025-11-12 22:39 ` Peter Xu
2025-11-13 1:58 ` Akihiko Odaki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).