qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: Peter Xu <peterx@redhat.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"John Snow" <jsnow@redhat.com>,
	"BALATON Zoltan" <balaton@eik.bme.hu>,
	"Jiaxun Yang" <jiaxun.yang@flygoat.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Harsh Prateek Bora" <harshpb@linux.ibm.com>,
	"Alexey Kardashevskiy" <aik@ozlabs.ru>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-ppc@nongnu.org
Subject: Re: [PATCH v4 6/7] memory: Do not create circular reference with subregion
Date: Wed, 28 Aug 2024 23:02:06 +0900	[thread overview]
Message-ID: <eb79a40f-f9c9-47b3-bfd0-0198e6036714@daynix.com> (raw)
In-Reply-To: <Zs8hcLPU62Hj8x-W@x1n>

On 2024/08/28 22:09, Peter Xu wrote:
> On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:
>> On 2024/08/28 1:11, Peter Xu wrote:
>>> On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
>>>> On 2024/08/27 4:42, Peter Xu wrote:
>>>>> On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
>>>>>> On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
>>>>>>>
>>>>>>> On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
>>>>>>>> memory_region_update_container_subregions() used to call
>>>>>>>> memory_region_ref(), which creates a reference to the owner of the
>>>>>>>> subregion, on behalf of the owner of the container. This results in a
>>>>>>>> circular reference if the subregion and container have the same owner.
>>>>>>>>
>>>>>>>> memory_region_ref() creates a reference to the owner instead of the
>>>>>>>> memory region to match the lifetime of the owner and memory region. We
>>>>>>>> do not need such a hack if the subregion and container have the same
>>>>>>>> owner because the owner will be alive as long as the container is.
>>>>>>>> Therefore, create a reference to the subregion itself instead ot its
>>>>>>>> owner in such a case; the reference to the subregion is still necessary
>>>>>>>> to ensure that the subregion gets finalized after the container.
>>>>>>>>
>>>>>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>>>>>> ---
>>>>>>>>     system/memory.c | 8 ++++++--
>>>>>>>>     1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/system/memory.c b/system/memory.c
>>>>>>>> index 5e6eb459d5de..e4d3e9d1f427 100644
>>>>>>>> --- a/system/memory.c
>>>>>>>> +++ b/system/memory.c
>>>>>>>> @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
>>>>>>>>
>>>>>>>>         memory_region_transaction_begin();
>>>>>>>>
>>>>>>>> -    memory_region_ref(subregion);
>>>>>>>> +    object_ref(mr->owner == subregion->owner ?
>>>>>>>> +               OBJECT(subregion) : subregion->owner);
>>>>>>>
>>>>>>> The only place that mr->refcount is used so far is the owner with the
>>>>>>> object property attached to the mr, am I right (ignoring name-less MRs)?
>>>>>>>
>>>>>>> I worry this will further complicate refcounting, now we're actively using
>>>>>>> two refcounts for MRs..
>>>>
>>>> The actor of object_ref() is the owner of the memory region also in this
>>>> case. We are calling object_ref() on behalf of mr->owner so we use
>>>> mr->refcount iff mr->owner == subregion->owner. In this sense there is only
>>>> one user of mr->refcount even after this change.
>>>
>>> Yes it's still one user, but it's not that straightforward to see, also
>>> it's still an extension to how we use mr->refcount right now.  Currently
>>> it's about "true / false" just to describe, now it's a real counter.
>>>
>>> I wished that counter doesn't even exist if we'd like to stick with device
>>> / owner's counter.  Adding this can definitely also make further effort
>>> harder if we want to remove mr->refcount.
>>
>> I don't think it will make removing mr->refcount harder. With this change,
>> mr->refcount will count the parent and container. If we remove mr->refcount,
>> we need to trigger object_finalize() in a way other than checking
>> mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
>> && mr->container.
>>
>>>
>>>>
>>>>>>>
>>>>>>> Continue discussion there:
>>>>>>>
>>>>>>> https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
>>>>>>>
>>>>>>> What I don't see is how mr->subregions differs from mr->container, so we
>>>>>>> allow subregions to be attached but not the container when finalize()
>>>>>>> (which is, afaict, the other way round).
>>>>>>>
>>>>>>> It seems easier to me that we allow both container and subregions to exist
>>>>>>> as long as within the owner itself, rather than start heavier use of
>>>>>>> mr->refcount.
>>>>>>
>>>>>> I don't think just "same owner" necessarily will be workable --
>>>>>> you can have a setup like:
>>>>>>      * device A has a container C_A
>>>>>>      * device A has a child-device B
>>>>>>      * device B has a memory region R_B
>>>>>>      * device A's realize method puts R_B into C_A
>>>>>>
>>>>>> R_B's owner is B, and the container's owner is A,
>>>>>> but we still want to be able to get rid of A (in the process
>>>>>> getting rid of B because it gets unparented and unreffed,
>>>>>> and R_B and C_A also).
>>>>>
>>>>> For cross-device references, should we rely on an explicit call to
>>>>> memory_region_del_subregion(), so as to detach the link between C_A and
>>>>> R_B?
>>>>
>>>> Yes, I agree.
>>>>
>>>>>
>>>>> My understanding so far: logically when MR finalize() it should guarantee
>>>>> both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
>>>>> commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
>>>>> beginning we don't assert on ->container==NULL yet).  It requires all
>>>>> device emulations to do proper unrealize() to unlink all the MRs.
>>>>>
>>>>> However what I'm guessing is QEMU probably used to have lots of devices
>>>>> that are not following the rules and leaking these links.  Hence we have
>>>>> had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
>>>>> it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
>>>>>
>>>>> What I was thinking is this comment seems to apply too to mr->container, so
>>>>> that it should be safe too to unlink ->container the same way as its own
>>>>> subregions. >
>>>>> IIUC that means for device-internal MR links we should be fine leaving
>>>>> whatever link between MRs owned by such device; the device->refcount
>>>>> guarantees none of them will be visible in any AS.  But then we need to
>>>>> always properly unlink the MRs when the link is across >1 device owners,
>>>>> otherwise it's prone to leak.
>>>>
>>>> There is one principle we must satisfy in general: keep a reference to a
>>>> memory region if it is visible to the guest.
>>>>
>>>> It is safe to call memory_region_del_subregion() and to trigger the
>>>> finalization of subregions when the container is not referenced because they
>>>> are no longer visible. This is not true for the other way around; even when
>>>> subregions are not referenced by anyone else, they are still visible to the
>>>> guest as long as the container is visible to the guest. It is not safe to
>>>> unref and finalize them in such a case.
>>>>
>>>> A memory region and its owner will leak if a memory region kept visible for
>>>> a too long period whether the chain of reference contains a
>>>> container/subregion relationship or not.
>>>
>>> Could you elaborate why it's still visible to the guest if
>>> owner->refcount==0 && mr->container!=NULL?
>>>
>>> Firstly, mr->container != NULL means the MR has an user indeed.  It's the
>>> matter of who's using it.  If that came from outside this device, it should
>>> require memory_region_ref(mr) before hand when adding the subregion, and
>>> that will hold one reference on the owner->refcount.
>>>
>>> Here owner->refcount==0 means there's no such reference, so it seems to me
>>> it's guaranteed to not be visible to anything outside of this device / owner.
>>> Then from that POV it's safe to unlink when the owner is finalizing just
>>> like what we do with mr->subregions, no?
>>
>> An object is alive during instance_finalize even though its refcount == 0.
>> We can't assume all memory regions are dead even if owner->refcount == 0
>> because of that.
> 
> When you referred to "an object", do you mean the MR being finalized here?
> 
> IIUC when the MR reaches its finalize(), it should mean it's not live
> anymore.
> 
> We have two forms of MR usages right now: either embeded in another Object
> / Device, or dynamically, like VFIOQuirk.
> 
> When used embeded, the MR is only finalized when being removed from the
> object's property list, that should only happen when the object / device
> triggered finalize().  Since the MR will use the owner->refcount so I
> suppose it means the MR is not live anymore.
> 
> When used dynamically, object_unparent() is needed but that should only
> happen when the object / owner is during finalize(), per document:
> 
>          If however the memory region is part of a dynamically allocated
>          data structure, you should call object_unparent() to destroy the
>          memory region before the data structure is freed.  For an example
>          see VFIOMSIXInfo and VFIOQuirk in hw/vfio/pci.c.
> 
> Then the MR is also not live.
> 
>> In particular, docs/devel/memory.rst says you can call object_unparent()
>> in the instance_finalize of the owner. This assumes a memory region will
>> not vanish during the execution of the function unless object_unparent()
>> is already called for the memory region.
> 
> Yes, the MR will not vanish during finalize() of the owner object. However
> I don't think it's "live"?  Again, it's based on my definition of
> "liveness" as "taking one refcount of its owner", here the owner refcount
> is zero.  IOW, I don't expect it to be accessible anywhere from any address
> space (e.g. address_space_map()), because they'll all use
> memory_region_ref() and that'll ultimately stops the owner from being
> finalized.

I am calling the fact that embedded memory regions are accessible in 
instance_finalize() "live". A device can perform operations on its 
memory regions during instance_finalize() and we should be aware of that.

object_unparent() is such an example. instance_finalize() of a device 
can call object_unparent() for a subregion and for its container. If we 
automatically finalize the container when calling object_unparent() for 
the subregion, calling object_unparent() for its container will result 
in the second finalization, which is not good.

Regards,
Akihiko Odaki


  reply	other threads:[~2024-08-28 14:03 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-23  6:13 [PATCH v4 0/7] Fix check-qtest-ppc64 sanitizer errors Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 1/7] migration: Free removed SaveStateEntry Akihiko Odaki
2024-08-23 12:30   ` Fabiano Rosas
2024-08-23  6:13 ` [PATCH v4 2/7] memory: Do not refer to "memory region's reference count" Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 3/7] memory: Refer to docs/devel/memory.rst for "owner" Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 4/7] memory: Clarify that owner may be missing Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 5/7] memory: Clarify owner must not call memory_region_ref() Akihiko Odaki
2024-08-26 15:31   ` Peter Xu
2024-08-23  6:13 ` [PATCH v4 6/7] memory: Do not create circular reference with subregion Akihiko Odaki
2024-08-26 15:21   ` Peter Xu
2024-08-26 17:10     ` Peter Maydell
2024-08-26 19:42       ` Peter Xu
2024-08-27  4:14         ` Akihiko Odaki
2024-08-27 16:11           ` Peter Xu
2024-08-28  5:33             ` Akihiko Odaki
2024-08-28 13:09               ` Peter Xu
2024-08-28 14:02                 ` Akihiko Odaki [this message]
2024-08-28 15:56                   ` Peter Xu
2024-08-29  4:39                     ` Akihiko Odaki
2024-08-29 19:48                       ` Peter Xu
2024-08-30  6:11                         ` Akihiko Odaki
2024-08-30 15:05                           ` Peter Xu
2024-08-31  4:03                             ` Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 7/7] tests/qtest: Delete previous boot file Akihiko Odaki
2024-08-26 15:26   ` Peter Xu
2024-08-27 13:20     ` Thomas Huth
2024-08-27 14:03       ` Fabiano Rosas
2024-08-27 14:46         ` Peter Maydell
2024-08-27 15:54           ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=eb79a40f-f9c9-47b3-bfd0-0198e6036714@daynix.com \
    --to=akihiko.odaki@daynix.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.bennee@linaro.org \
    --cc=balaton@eik.bme.hu \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=harshpb@linux.ibm.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=jsnow@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).