qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Akihiko Odaki <akihiko.odaki@daynix.com>
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"John Snow" <jsnow@redhat.com>,
	"BALATON Zoltan" <balaton@eik.bme.hu>,
	"Jiaxun Yang" <jiaxun.yang@flygoat.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Harsh Prateek Bora" <harshpb@linux.ibm.com>,
	"Alexey Kardashevskiy" <aik@ozlabs.ru>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-ppc@nongnu.org
Subject: Re: [PATCH v4 6/7] memory: Do not create circular reference with subregion
Date: Wed, 28 Aug 2024 09:09:04 -0400	[thread overview]
Message-ID: <Zs8hcLPU62Hj8x-W@x1n> (raw)
In-Reply-To: <de2229bc-876e-47b2-8a59-18fe7ffe3936@daynix.com>

On Wed, Aug 28, 2024 at 02:33:59PM +0900, Akihiko Odaki wrote:
> On 2024/08/28 1:11, Peter Xu wrote:
> > On Tue, Aug 27, 2024 at 01:14:51PM +0900, Akihiko Odaki wrote:
> > > On 2024/08/27 4:42, Peter Xu wrote:
> > > > On Mon, Aug 26, 2024 at 06:10:25PM +0100, Peter Maydell wrote:
> > > > > On Mon, 26 Aug 2024 at 16:22, Peter Xu <peterx@redhat.com> wrote:
> > > > > > 
> > > > > > On Fri, Aug 23, 2024 at 03:13:11PM +0900, Akihiko Odaki wrote:
> > > > > > > memory_region_update_container_subregions() used to call
> > > > > > > memory_region_ref(), which creates a reference to the owner of the
> > > > > > > subregion, on behalf of the owner of the container. This results in a
> > > > > > > circular reference if the subregion and container have the same owner.
> > > > > > > 
> > > > > > > memory_region_ref() creates a reference to the owner instead of the
> > > > > > > memory region to match the lifetime of the owner and memory region. We
> > > > > > > do not need such a hack if the subregion and container have the same
> > > > > > > owner because the owner will be alive as long as the container is.
> > > > > > > Therefore, create a reference to the subregion itself instead ot its
> > > > > > > owner in such a case; the reference to the subregion is still necessary
> > > > > > > to ensure that the subregion gets finalized after the container.
> > > > > > > 
> > > > > > > Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
> > > > > > > ---
> > > > > > >    system/memory.c | 8 ++++++--
> > > > > > >    1 file changed, 6 insertions(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/system/memory.c b/system/memory.c
> > > > > > > index 5e6eb459d5de..e4d3e9d1f427 100644
> > > > > > > --- a/system/memory.c
> > > > > > > +++ b/system/memory.c
> > > > > > > @@ -2612,7 +2612,9 @@ static void memory_region_update_container_subregions(MemoryRegion *subregion)
> > > > > > > 
> > > > > > >        memory_region_transaction_begin();
> > > > > > > 
> > > > > > > -    memory_region_ref(subregion);
> > > > > > > +    object_ref(mr->owner == subregion->owner ?
> > > > > > > +               OBJECT(subregion) : subregion->owner);
> > > > > > 
> > > > > > The only place that mr->refcount is used so far is the owner with the
> > > > > > object property attached to the mr, am I right (ignoring name-less MRs)?
> > > > > > 
> > > > > > I worry this will further complicate refcounting, now we're actively using
> > > > > > two refcounts for MRs..
> > > 
> > > The actor of object_ref() is the owner of the memory region also in this
> > > case. We are calling object_ref() on behalf of mr->owner so we use
> > > mr->refcount iff mr->owner == subregion->owner. In this sense there is only
> > > one user of mr->refcount even after this change.
> > 
> > Yes it's still one user, but it's not that straightforward to see, also
> > it's still an extension to how we use mr->refcount right now.  Currently
> > it's about "true / false" just to describe, now it's a real counter.
> > 
> > I wished that counter doesn't even exist if we'd like to stick with device
> > / owner's counter.  Adding this can definitely also make further effort
> > harder if we want to remove mr->refcount.
> 
> I don't think it will make removing mr->refcount harder. With this change,
> mr->refcount will count the parent and container. If we remove mr->refcount,
> we need to trigger object_finalize() in a way other than checking
> mr->refcount, which can be achieved by simply evaluating OBJECT(mr)->parent
> && mr->container.
> 
> > 
> > > 
> > > > > > 
> > > > > > Continue discussion there:
> > > > > > 
> > > > > > https://lore.kernel.org/r/067b17a4-cdfc-4f7e-b7e4-28c38e1c10f0@daynix.com
> > > > > > 
> > > > > > What I don't see is how mr->subregions differs from mr->container, so we
> > > > > > allow subregions to be attached but not the container when finalize()
> > > > > > (which is, afaict, the other way round).
> > > > > > 
> > > > > > It seems easier to me that we allow both container and subregions to exist
> > > > > > as long as within the owner itself, rather than start heavier use of
> > > > > > mr->refcount.
> > > > > 
> > > > > I don't think just "same owner" necessarily will be workable --
> > > > > you can have a setup like:
> > > > >     * device A has a container C_A
> > > > >     * device A has a child-device B
> > > > >     * device B has a memory region R_B
> > > > >     * device A's realize method puts R_B into C_A
> > > > > 
> > > > > R_B's owner is B, and the container's owner is A,
> > > > > but we still want to be able to get rid of A (in the process
> > > > > getting rid of B because it gets unparented and unreffed,
> > > > > and R_B and C_A also).
> > > > 
> > > > For cross-device references, should we rely on an explicit call to
> > > > memory_region_del_subregion(), so as to detach the link between C_A and
> > > > R_B?
> > > 
> > > Yes, I agree.
> > > 
> > > > 
> > > > My understanding so far: logically when MR finalize() it should guarantee
> > > > both (1) mr->container==NULL, and (2) mr->subregions empty.  That's before
> > > > commit 2e2b8eb70fdb7dfb and could be the ideal world (though at the very
> > > > beginning we don't assert on ->container==NULL yet).  It requires all
> > > > device emulations to do proper unrealize() to unlink all the MRs.
> > > > 
> > > > However what I'm guessing is QEMU probably used to have lots of devices
> > > > that are not following the rules and leaking these links.  Hence we have
> > > > had 2e2b8eb70fdb7dfb, allowing that to happen as long as it's safe, and
> > > > it's justified by comment in 2e2b8eb70fdb7dfb on why it's safe.
> > > > 
> > > > What I was thinking is this comment seems to apply too to mr->container, so
> > > > that it should be safe too to unlink ->container the same way as its own
> > > > subregions. >
> > > > IIUC that means for device-internal MR links we should be fine leaving
> > > > whatever link between MRs owned by such device; the device->refcount
> > > > guarantees none of them will be visible in any AS.  But then we need to
> > > > always properly unlink the MRs when the link is across >1 device owners,
> > > > otherwise it's prone to leak.
> > > 
> > > There is one principle we must satisfy in general: keep a reference to a
> > > memory region if it is visible to the guest.
> > > 
> > > It is safe to call memory_region_del_subregion() and to trigger the
> > > finalization of subregions when the container is not referenced because they
> > > are no longer visible. This is not true for the other way around; even when
> > > subregions are not referenced by anyone else, they are still visible to the
> > > guest as long as the container is visible to the guest. It is not safe to
> > > unref and finalize them in such a case.
> > > 
> > > A memory region and its owner will leak if a memory region kept visible for
> > > a too long period whether the chain of reference contains a
> > > container/subregion relationship or not.
> > 
> > Could you elaborate why it's still visible to the guest if
> > owner->refcount==0 && mr->container!=NULL?
> > 
> > Firstly, mr->container != NULL means the MR has an user indeed.  It's the
> > matter of who's using it.  If that came from outside this device, it should
> > require memory_region_ref(mr) before hand when adding the subregion, and
> > that will hold one reference on the owner->refcount.
> > 
> > Here owner->refcount==0 means there's no such reference, so it seems to me
> > it's guaranteed to not be visible to anything outside of this device / owner.
> > Then from that POV it's safe to unlink when the owner is finalizing just
> > like what we do with mr->subregions, no?
> 
> An object is alive during instance_finalize even though its refcount == 0.
> We can't assume all memory regions are dead even if owner->refcount == 0
> because of that.

When you referred to "an object", do you mean the MR being finalized here?

IIUC when the MR reaches its finalize(), it should mean it's not live
anymore.

We have two forms of MR usages right now: either embeded in another Object
/ Device, or dynamically, like VFIOQuirk.

When used embeded, the MR is only finalized when being removed from the
object's property list, that should only happen when the object / device
triggered finalize().  Since the MR will use the owner->refcount so I
suppose it means the MR is not live anymore.

When used dynamically, object_unparent() is needed but that should only
happen when the object / owner is during finalize(), per document:

        If however the memory region is part of a dynamically allocated
        data structure, you should call object_unparent() to destroy the
        memory region before the data structure is freed.  For an example
        see VFIOMSIXInfo and VFIOQuirk in hw/vfio/pci.c.

Then the MR is also not live.

> In particular, docs/devel/memory.rst says you can call object_unparent()
> in the instance_finalize of the owner. This assumes a memory region will
> not vanish during the execution of the function unless object_unparent()
> is already called for the memory region.

Yes, the MR will not vanish during finalize() of the owner object. However
I don't think it's "live"?  Again, it's based on my definition of
"liveness" as "taking one refcount of its owner", here the owner refcount
is zero.  IOW, I don't expect it to be accessible anywhere from any address
space (e.g. address_space_map()), because they'll all use
memory_region_ref() and that'll ultimately stops the owner from being
finalized.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2024-08-28 13:09 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-23  6:13 [PATCH v4 0/7] Fix check-qtest-ppc64 sanitizer errors Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 1/7] migration: Free removed SaveStateEntry Akihiko Odaki
2024-08-23 12:30   ` Fabiano Rosas
2024-08-23  6:13 ` [PATCH v4 2/7] memory: Do not refer to "memory region's reference count" Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 3/7] memory: Refer to docs/devel/memory.rst for "owner" Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 4/7] memory: Clarify that owner may be missing Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 5/7] memory: Clarify owner must not call memory_region_ref() Akihiko Odaki
2024-08-26 15:31   ` Peter Xu
2024-08-23  6:13 ` [PATCH v4 6/7] memory: Do not create circular reference with subregion Akihiko Odaki
2024-08-26 15:21   ` Peter Xu
2024-08-26 17:10     ` Peter Maydell
2024-08-26 19:42       ` Peter Xu
2024-08-27  4:14         ` Akihiko Odaki
2024-08-27 16:11           ` Peter Xu
2024-08-28  5:33             ` Akihiko Odaki
2024-08-28 13:09               ` Peter Xu [this message]
2024-08-28 14:02                 ` Akihiko Odaki
2024-08-28 15:56                   ` Peter Xu
2024-08-29  4:39                     ` Akihiko Odaki
2024-08-29 19:48                       ` Peter Xu
2024-08-30  6:11                         ` Akihiko Odaki
2024-08-30 15:05                           ` Peter Xu
2024-08-31  4:03                             ` Akihiko Odaki
2024-08-23  6:13 ` [PATCH v4 7/7] tests/qtest: Delete previous boot file Akihiko Odaki
2024-08-26 15:26   ` Peter Xu
2024-08-27 13:20     ` Thomas Huth
2024-08-27 14:03       ` Fabiano Rosas
2024-08-27 14:46         ` Peter Maydell
2024-08-27 15:54           ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zs8hcLPU62Hj8x-W@x1n \
    --to=peterx@redhat.com \
    --cc=aik@ozlabs.ru \
    --cc=akihiko.odaki@daynix.com \
    --cc=alex.bennee@linaro.org \
    --cc=balaton@eik.bme.hu \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=harshpb@linux.ibm.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=jsnow@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).