Re: [PATCH] rcu: Unify force quiescent state

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Dmitry Osipenko <dmitry.osipenko@collabora.com>
To: "Akihiko Odaki" <odaki@rsg.ci.i.u-tokyo.ac.jp>,
	qemu-devel@nongnu.org, "Alex Bennée" <alex.bennee@linaro.org>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH] rcu: Unify force quiescent state
Date: Thu, 23 Oct 2025 12:50:27 +0300	[thread overview]
Message-ID: <07ecefc3-f5a1-490b-84fe-8454d3883296@collabora.com> (raw)
In-Reply-To: <ef54339e-c78c-43a0-94af-9908478095d6@rsg.ci.i.u-tokyo.ac.jp>

On 10/23/25 08:50, Akihiko Odaki wrote:
> On 2025/10/22 12:30, Dmitry Osipenko wrote:
>> On 10/17/25 03:40, Akihiko Odaki wrote:
>>> On 2025/10/17 8:43, Akihiko Odaki wrote:
>>>> On 2025/10/17 4:33, Dmitry Osipenko wrote:
>>>>> On 10/16/25 09:34, Akihiko Odaki wrote:
>>>>>> -        /* Wait for one thread to report a quiescent state and try
>>>>>> again.
>>>>>> +        /*
>>>>>> +         * Sleep for a while and try again.
>>>>>>             * Release rcu_registry_lock, so rcu_(un)register_thread()
>>>>>> doesn't
>>>>>>             * wait too much time.
>>>>>>             *
>>>>>> @@ -133,7 +150,20 @@ static void wait_for_readers(void)
>>>>>>             * rcu_registry_lock is released.
>>>>>>             */
>>>>>>            qemu_mutex_unlock(&rcu_registry_lock);
>>>>>> -        qemu_event_wait(&rcu_gp_event);
>>>>>> +
>>>>>> +        if (forced) {
>>>>>> +            qemu_event_wait(&rcu_gp_event);
>>>>>> +
>>>>>> +            /*
>>>>>> +             * We want to be notified of changes made to
>>>>>> rcu_gp_ongoing
>>>>>> +             * while we walk the list.
>>>>>> +             */
>>>>>> +            qemu_event_reset(&rcu_gp_event);
>>>>>> +        } else {
>>>>>> +            g_usleep(10000);
>>>>>> +            sleeps++;
>>>>>
>>>>> Thanks a lot for this RCU improvement. It indeed removes the hard
>>>>> stalls
>>>>> with unmapping of virtio-gpu blobs.
>>>>>
>>>>> Am I understanding correctly that potentially we will be hitting this
>>>>> g_usleep(10000) and stall virtio-gpu for the first ~10ms? I.e. the
>>>>> MemoryRegion patches from Alex [1] are still needed to avoid stalls
>>>>> entirely.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/qemu-devel/20251014111234.3190346-6-
>>>>> alex.bennee@linaro.org/
>>>>
>>>> That is right, but "avoiding stalls entirely" also causes use-after-
>>>> free.
>>>>
>>>> The problem with virtio-gpu on TCG is that TCG keeps using the old
>>>> memory map until force_rcu is triggered. So, without force_rcu, the
>>>> following pseudo-code on a guest will result in use-after-free:
>>>>
>>>> address = blob_map(resource_id);
>>>> blob_unmap(resource_id);
>>>>
>>>> for (i = 0; i < some_big_number; i++)
>>>>     *(uint8_t *)address = 0;
>>>>
>>>> *(uint8_t *)address will dereference the blob until force_rcu is
>>>> triggered, so finalizing MemoryRegion before force_rcu results in use-
>>>> after-free.
>>>>
>>>> The best option to eliminate the delay entirely I have in mind is to
>>>> call drain_call_rcu(), but I'm not for such a change (for now).
>>>> drain_call_rcu() eliminates the delay if the FlatView protected by RCU
>>>> is the only referrer of the MemoryRegion, but that is not guaranteed.
>>>>
>>>> Performance should not be a concern anyway in this situation. The
>>>> guest should not waste CPU time by polling in the first place if you
>>>> really care performance; since it's a para-virtualized device and not
>>>> a real hardware, CPU time may be shared between the guest and the
>>>> device, and thus polling on the guest has an inherent risk of slowing
>>>> down the device. For performance-sensitive workloads, the guest should:
>>>>
>>>> - avoid polling and
>>>> - accumulate commands instead of waiting for each
>>>>
>>>> The delay will be less problematic if the guest does so, and I think
>>>> at least Linux does avoid polling.
>>>>
>>>> That said, stalling the guest forever in this situation is "wrong" (!=
>>>> "bad performance"). I wrote this patch to guarantee forward progress,
>>>> which is mandatory for semantic correctness.
>>>>
>>>> Perhaps drain_call_rcu() may make sense also in other, performance-
>>>> sensitive scenarios, but it should be added after benchmark or we will
>>>> have a immature optimization.
>>>
>>> I first thought just adding drain_call_rcu() would work but apparently
>>> it is not that simple. Adding drain_call_rcu() has a few problems:
>>>
>>> - It drops the BQL, which should be avoided. Problems caused by
>>> run_on_cpu(), which drops the BQL, was discussed on the list for a few
>>> times and drain_call_rcu() may also suffer from them.
>>>
>>> - It is less effective if the RCU thread enters g_usleep() before
>>> drain_call_rcu() is called.
>>>
>>> - It slows down readers due to the nature of drain_call_rcu().
>>>
>>> So, if you know some workload that may suffer from the delay, it may be
>>> a good idea to try them with the patches from Alex first, and then think
>>> of a clean solution if it improves performance.
>>
>> What's not clear to me is whether this use-after-free problem affects
>> only TCG or KVM too.
>>
>> Can we unmap blob immediately using Alex' method when using KVM? Would
>> be great if we could optimize unmapping for KVM. TCG performance is a
>> much lesser concern.
> 
> Unfortunately no, the use-after-free can occur whatever acceleration is
> used. It was discussed thoroughly in the past. In short, the patch
> breaks reference counting (which also breaks RCU as it is indirectly
> tracked with reference counting) so use-after-free can be triggered with
> a DMA operation.
> 
> The best summary I can find is the following:
> https://lore.kernel.org/qemu-devel/6c4b9d6a-d86e-44cf-be48-
> f919b81eac15@rsg.ci.i.u-tokyo.ac.jp/
> 
> The breakage of reference counting can be checked with the code in the
> following email:
> https://lore.kernel.org/qemu-devel/e9285843-0487-4754-
> a348-34f1a852b24c@daynix.com/
> 
> I suggested adding an RCU API to request the force quiescent
> state for virtio-gpu as a solution that works both on TCG and KVM in
> another thread, but I think it is also the easiest solution for KVM that
> avoids a regression. The thread can be found at:
> https://lore.kernel.org/qemu-devel/
> f5d55625-10e0-496f-9b3e-2010fe0c741f@rsg.ci.i.u-tokyo.ac.jp/
> 
> I think it is straightforward enough, but I'm worried that it may not be
> implemented before the feature freeze. I am now just a hobbyist that has
> little time to spend for QEMU, and RCU and the memory region management
> are too important to make a change in haste.
Thanks. The RCU polling improvement already is a very good start and
makes native context usable with QEMU.

You're likely the best person to work on the RCU API addition. I may
look into it too.

-- 
Best regards,
Dmitry

next prev parent reply	other threads:[~2025-10-23  9:51 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-16  6:34 [PATCH] rcu: Unify force quiescent state Akihiko Odaki
2025-10-16 12:59 ` Paolo Bonzini
2025-10-16 19:33 ` Dmitry Osipenko
2025-10-16 23:43   ` Akihiko Odaki
2025-10-17  0:40     ` Akihiko Odaki
2025-10-19 20:23       ` Dmitry Osipenko
2025-10-20  1:14         ` Akihiko Odaki
2025-10-20 20:41           ` Dmitry Osipenko
2025-10-22  3:30       ` Dmitry Osipenko
2025-10-23  5:50         ` Akihiko Odaki
2025-10-23  9:50           ` Dmitry Osipenko [this message]
2025-10-20  6:38   ` Paolo Bonzini
2025-10-21  4:14     ` Akihiko Odaki
2025-10-19 20:25 ` Dmitry Osipenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=07ecefc3-f5a1-490b-84fe-8454d3883296@collabora.com \
    --to=dmitry.osipenko@collabora.com \
    --cc=alex.bennee@linaro.org \
    --cc=armbru@redhat.com \
    --cc=david@redhat.com \
    --cc=eblake@redhat.com \
    --cc=marcandre.lureau@redhat.com \
    --cc=mst@redhat.com \
    --cc=odaki@rsg.ci.i.u-tokyo.ac.jp \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).