qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Akihiko Odaki <akihiko.odaki@daynix.com>
To: "Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Peter Xu" <peterx@redhat.com>
Cc: "Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Yanan Wang" <wangyanan55@huawei.com>,
	"John Snow" <jsnow@redhat.com>,
	"BALATON Zoltan" <balaton@eik.bme.hu>,
	"Jiaxun Yang" <jiaxun.yang@flygoat.com>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	"David Gibson" <david@gibson.dropbear.id.au>,
	"Harsh Prateek Bora" <harshpb@linux.ibm.com>,
	"Alexey Kardashevskiy" <aik@ozlabs.ru>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Fabiano Rosas" <farosas@suse.de>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Laurent Vivier" <lvivier@redhat.com>,
	qemu-devel@nongnu.org, qemu-block@nongnu.org,
	qemu-ppc@nongnu.org
Subject: Re: [PATCH v2 09/15] memory: Do not create circular reference with subregion
Date: Mon, 8 Jul 2024 17:41:24 +0900	[thread overview]
Message-ID: <8ac3dd7a-13fb-438b-a45e-afd7c4ec370b@daynix.com> (raw)
In-Reply-To: <f52c6660-3a1c-449e-93c0-07e4cefebef3@linaro.org>

On 2024/07/08 17:06, Philippe Mathieu-Daudé wrote:
> On 6/7/24 13:59, Akihiko Odaki wrote:
>> On 2024/07/03 2:44, Peter Xu wrote:
>>> On Thu, Jun 27, 2024 at 10:37:52PM +0900, Akihiko Odaki wrote:
>>>> A memory region does not use their own reference counters, but instead
>>>> piggybacks on another QOM object, "owner" (unless the owner is not the
>>>> memory region itself). When creating a subregion, a new reference to 
>>>> the
>>>> owner of the container must be created. However, if the subregion is
>>>> owned by the same QOM object, this result in a self-reference, and make
>>>> the owner immortal. Avoid such a self-reference.
>>>>
>>>> Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
>>>> ---
>>>>   system/memory.c | 11 +++++++++--
>>>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/system/memory.c b/system/memory.c
>>>> index 74cd73ebc78b..949f5016a68d 100644
>>>> --- a/system/memory.c
>>>> +++ b/system/memory.c
>>>> @@ -2638,7 +2638,10 @@ static void 
>>>> memory_region_update_container_subregions(MemoryRegion *subregion)
>>>>       memory_region_transaction_begin();
>>>> -    memory_region_ref(subregion);
>>>> +    if (mr->owner != subregion->owner) {
>>>> +        memory_region_ref(subregion);
>>>> +    }
>>>> +
>>>>       QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
>>>>           if (subregion->priority >= other->priority) {
>>>>               QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
>>>> @@ -2696,7 +2699,11 @@ void memory_region_del_subregion(MemoryRegion 
>>>> *mr,
>>>>           assert(alias->mapped_via_alias >= 0);
>>>>       }
>>>>       QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
>>>> -    memory_region_unref(subregion);
>>>> +
>>>> +    if (mr->owner != subregion->owner) {
>>>> +        memory_region_unref(subregion);
>>>> +    }
>>>> +
>>>>       memory_region_update_pending |= mr->enabled && 
>>>> subregion->enabled;
>>>>       memory_region_transaction_commit();
>>>>   }
>>>
>>> This does look like a real issue.. the patch looks reasonable to me, 
>>> but I
>>> wonder whether we should start to add some good comments in code to 
>>> reflect
>>> that complexity starting from this one.  The MR refcount isn't easy to
>>> understand to me.
>>>
>>> It also lets me start to wonder how MR refcount went through until it 
>>> looks
>>> like today..  It's definitely not extremely intuitive to use 
>>> mr->owner as
>>> the object to do refcounting if mr itself does has its own QObject,
>>> meanwhile it has other tricks around.
>>>
>>> E.g. the first thing I stumbled over when looking was the optimization
>>> where we will avoid refcounting the mr when there's no owner, and 
>>> IIUC it
>>> was for the case when the "guest memory" (which will never be freed) 
>>> used
>>> to have no owner so we can speedup DMA if we know it won't go away.
>>>
>>> https://lore.kernel.org/qemu-devel/1450263601-2828-5-git-send-email-pbonzini@redhat.com/
>>>
>>> commit 612263cf33062f7441a5d0e3b37c65991fdc3210
>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>> Date:   Wed Dec 9 11:44:25 2015 +0100
>>>
>>>      memory: avoid unnecessary object_ref/unref
>>>      For the common case of DMA into non-hotplugged RAM, it is 
>>> unnecessary
>>>      but expensive to do object_ref/unref.  Add back an owner field to
>>>      MemoryRegion, so that these memory regions can skip the reference
>>>      counting.
>>>
>>> If so, it looks like it will stop working with memory-backends get
>>> involved?  As I think those MRs will have owner set always, and I wonder
>>> whether memory-backends should be the major way to specify guest 
>>> memory now
>>> and in the future.  So I'm not sure how important that optimization 
>>> is as
>>> of now, and whether we could "simplify" it back to always do the 
>>> refcount
>>> if the major scenarios will not adopt it.
>>>
>>> The other issue is we used owner refcount from the start of
>>> memory_region_ref() got introduced, since:
>>>
>>> commit 46637be269aaaceb9867ffdf176e906401138fff
>>> Author: Paolo Bonzini <pbonzini@redhat.com>
>>> Date:   Tue May 7 09:06:00 2013 +0200
>>>
>>>      memory: add ref/unref
>>>
>>> And we still have that in our document, even though I don't think 
>>> it's true
>>> anymore:
>>>
>>>   * ...  MemoryRegions actually do not have their
>>>   * own reference count; they piggyback on a QOM object, their "owner".
>>>   * This function adds a reference to the owner.
>>>
>>> It looks like what happened is when introduced the change, MR is not 
>>> a QOM
>>> object yet.  But it later is..
>>>
>>> I mentioned all these only because I found that _if_ we can keep mr
>>> refcounting as simple as other objects:
>>>
>>> memory_region_ref(mr)
>>> {
>>>      object_ref(OBJECT(mr));
>>> }
>>>
>>> Then looks like this "recursive refcount" problem can also go away.  I'm
>>> curious whether you or anyone tried to explore that path, or whether 
>>> above
>>> doesn't make sense at all.
>>
>> It unfortunately does not solve the problem.
>>
>> The underlying problem is that the whole device must be kept alive 
>> while its memory region are. Indeed MemoryRegions do have refcounts, 
>> but incrementing them do not extend the lifetime of the devices (i.e., 
>> the owners). The refcount of the owners must be incremented for 
>> correctness.
>>
>> Referencing a subregion MemoryRegion from its container MemoryRegion 
>> owned by the same device is an exceptional case. Incrementing the 
>> refcount of the owner extends the owner's lifetime to forever.
> 
> Is it really an exceptional case?
> 
> What I'm seeing are a lot of devices creating MR and never bother
> to destroy them, so indeed owner (device) refcount never reaches 0.
> 
> Most of the time when we create MR in .instance_init/.realize,
> we neglect to implement the undo path (.instance_finalize or
> .unrealize).

I meant memory_region_update_container_subregions() is the only code 
that creates such a reference. The function itself is called in many 
code paths.


  reply	other threads:[~2024-07-08  8:42 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-27 13:37 [PATCH v2 00/15] Fix check-qtest-ppc64 sanitizer errors Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 01/15] cpu: Free cpu_ases Akihiko Odaki
2024-06-28 15:12   ` Peter Maydell
2024-06-27 13:37 ` [PATCH v2 02/15] hw/ide: Convert macio ide_irq into GPIO line Akihiko Odaki
2024-06-28  7:19   ` Mark Cave-Ayland
2024-06-27 13:37 ` [PATCH v2 03/15] hw/ide: Remove internal DMA qemu_irq Akihiko Odaki
2024-06-28  7:23   ` Mark Cave-Ayland
2024-06-27 13:37 ` [PATCH v2 04/15] hw/isa/vt82c686: Define a GPIO line between vt82c686 and i8259 Akihiko Odaki
2024-06-28  7:27   ` Mark Cave-Ayland
2024-06-29  7:38     ` Akihiko Odaki
2024-06-29  8:04       ` Akihiko Odaki
2024-06-29 13:08   ` BALATON Zoltan
2024-07-01 10:32     ` Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 05/15] spapr: Free stdout path Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 06/15] ppc/vof: Fix unaligned FDT property access Akihiko Odaki
2024-06-28 15:20   ` Peter Maydell
2024-06-29  3:16     ` David Gibson
2024-07-04 11:54       ` Nicholas Piggin
2024-07-04 12:15       ` Peter Maydell
2024-07-05  1:18         ` Nicholas Piggin
2024-07-05  1:41           ` David Gibson
2024-07-05  4:40             ` Nicholas Piggin
2024-07-05  5:12               ` David Gibson
2024-07-05  7:50                 ` Nicholas Piggin
2024-07-06  9:07                   ` Akihiko Odaki
2024-07-06 10:37                 ` Peter Maydell
2024-07-06 23:46                   ` David Gibson
2024-07-08  7:49                     ` Nicholas Piggin
2024-07-08 15:59                       ` Peter Maydell
2024-07-09  7:46                         ` David Gibson
2024-07-09  7:41                       ` David Gibson
2024-07-05  1:33         ` David Gibson
2024-06-27 13:37 ` [PATCH v2 07/15] hw/virtio: Free vqs after vhost_dev_cleanup() Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 08/15] migration: Free removed SaveStateEntry Akihiko Odaki
2024-08-02 12:47   ` (subset) " Fabiano Rosas
2024-06-27 13:37 ` [PATCH v2 09/15] memory: Do not create circular reference with subregion Akihiko Odaki
2024-07-02 17:44   ` Peter Xu
2024-07-06 11:59     ` Akihiko Odaki
2024-07-08  8:06       ` Philippe Mathieu-Daudé
2024-07-08  8:41         ` Akihiko Odaki [this message]
2024-07-08 16:40         ` Peter Xu
2024-07-11  8:38           ` Akihiko Odaki
2024-08-22 17:10   ` Peter Maydell
2024-08-22 21:01     ` Peter Xu
2024-08-23  6:17       ` Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 10/15] tests/qtest: Use qtest_add_data_func_full() Akihiko Odaki
2024-07-02  6:14   ` Thomas Huth
2024-06-27 13:37 ` [PATCH v2 11/15] tests/qtest: Free unused QMP response Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 12/15] tests/qtest: Free old machine variable name Akihiko Odaki
2024-06-28 15:21   ` Peter Maydell
2024-06-27 13:37 ` [PATCH v2 13/15] tests/qtest: Delete previous boot file Akihiko Odaki
2024-07-02  7:31   ` Thomas Huth
2024-07-04 11:41     ` Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 14/15] tests/qtest: Free paths Akihiko Odaki
2024-06-27 13:37 ` [PATCH v2 15/15] tests/qtest: Free GThread Akihiko Odaki
2024-06-28 15:24   ` Peter Maydell
2024-07-01 20:11 ` [PATCH v2 00/15] Fix check-qtest-ppc64 sanitizer errors Michael S. Tsirkin
2024-07-01 22:23   ` BALATON Zoltan
2024-07-02  6:23     ` Thomas Huth
2024-07-04 11:37       ` Nicholas Piggin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8ac3dd7a-13fb-438b-a45e-afd7c4ec370b@daynix.com \
    --to=akihiko.odaki@daynix.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.bennee@linaro.org \
    --cc=balaton@eik.bme.hu \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=david@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=farosas@suse.de \
    --cc=harshpb@linux.ibm.com \
    --cc=jiaxun.yang@flygoat.com \
    --cc=jsnow@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=thuth@redhat.com \
    --cc=wangyanan55@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).