From: Paolo Bonzini <pbonzini@redhat.com>
To: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>, qemu-devel@nongnu.org
Cc: "David Hildenbrand" <david@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Dmitry Osipenko" <dmitry.osipenko@collabora.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH] rcu: Unify force quiescent state
Date: Thu, 16 Oct 2025 14:59:48 +0200
Message-ID: <e0e263c3-980c-47ed-9a93-048de6e47e8b@redhat.com>
In-Reply-To: <20251016-force-v1-1-919a82112498@rsg.ci.i.u-tokyo.ac.jp>
On 10/16/25 08:34, Akihiko Odaki wrote:
> Borrow the concept of force quiescent state from Linux to ensure readers
> remain fast during normal operation and to avoid stalls.
Very nice solution! The code of the call RCU thread is simpler, too. I will
follow up by extracting
    node = try_dequeue();
    while (!node) {
        bql_unlock();
        qemu_event_reset(&rcu_call_ready_event);
        node = try_dequeue();
        if (!node) {
            qemu_event_wait(&rcu_call_ready_event);
            node = try_dequeue();
        }
        bql_lock();
    }
to its own function dequeue().
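Something like the sketch below; the BQL and event primitives are replaced
with no-op stubs and try_dequeue() with a trivial list pop so it stands
alone, so this only shows the intended shape, not the real util/rcu.c code:

```c
#include <assert.h>
#include <stddef.h>

struct rcu_head { struct rcu_head *next; };

static struct rcu_head *queue_head;

/* Trivial stand-in for the wfqueue-based try_dequeue(): pop the head of a
 * singly linked list, or return NULL if the queue is empty. */
static struct rcu_head *try_dequeue(void)
{
    struct rcu_head *node = queue_head;
    if (node) {
        queue_head = node->next;
    }
    return node;
}

/* No-op stubs standing in for bql_lock()/bql_unlock() and
 * qemu_event_reset()/qemu_event_wait() on rcu_call_ready_event. */
static void bql_lock(void) {}
static void bql_unlock(void) {}
static void ready_event_reset(void) {}
static void ready_event_wait(void) {}

/* The extracted helper: loop until a callback is available, dropping the
 * BQL while waiting so other threads can make progress.  The second
 * try_dequeue() after the reset closes the race where a producer enqueues
 * between the first attempt and the event reset. */
static struct rcu_head *dequeue(void)
{
    struct rcu_head *node = try_dequeue();

    while (!node) {
        bql_unlock();
        ready_event_reset();
        node = try_dequeue();
        if (!node) {
            ready_event_wait();
            node = try_dequeue();
        }
        bql_lock();
    }
    return node;
}
```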
Paolo
> Background
> ==========
>
> The previous implementation had four steps to begin reclamation.
>
> 1. call_rcu_thread() would wait for the first callback.
>
> 2. call_rcu_thread() would periodically poll until a decent number of
> callbacks piled up or it timed out.
>
> 3. synchronize_rcu() would start a grace period (GP).
>
> 4. wait_for_readers() would wait for the GP to end. It would also
> trigger the force_rcu notifier to break busy loops in a read-side
> critical section if drain_call_rcu() had been called.
>
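As background for the discussion below, the GP bookkeeping can be modeled in
a few lines; this is a deliberately single-threaded simplification of the
rcu_gp_ctr/rcu_reader scheme in util/rcu.c, not the real data structures:

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long gp_ctr = 1;  /* current GP; the writer bumps this */
static unsigned long reader_ctr; /* 0 when quiescent, else GP seen at lock */

/* A reader snapshots the current GP on entry and clears it on exit. */
static void read_lock(void)   { reader_ctr = gp_ctr; }
static void read_unlock(void) { reader_ctr = 0; }

/* The writer must wait for this reader iff it is inside a critical
 * section that began before the current GP started. */
static bool gp_ongoing(void)
{
    return reader_ctr != 0 && reader_ctr != gp_ctr;
}

/* Starting a new GP makes all pre-existing readers "stale". */
static void start_gp(void) { gp_ctr += 2; }
```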
> Problem
> =======
>
> The separation of waiting logic across these steps led to suboptimal
> behavior:
>
> The GP was delayed until call_rcu_thread() stopped polling.
>
> force_rcu was not consistently triggered when call_rcu_thread() detected
> a high number of pending callbacks or a timeout. This inconsistency
> sometimes led to stalls, as reported in a virtio-gpu issue where memory
> unmapping was blocked[1].
>
> wait_for_readers() imposed unnecessary overhead in non-urgent cases by
> unconditionally executing qatomic_set(&index->waiting, true) and
> qemu_event_reset(&rcu_gp_event), which are necessary only for expedited
> synchronization.
>
> Solution
> ========
>
> Move the polling in call_rcu_thread() to wait_for_readers() to prevent
> the delay of the GP. Additionally, reorganize wait_for_readers() to
> distinguish between two states:
>
> Normal State: It relies exclusively on periodic polling to detect
> the end of the GP and maintains the read-side fast path.
>
> Force Quiescent State: Whenever expediting synchronization, it always
> triggers force_rcu and executes both qatomic_set(&index->waiting, true)
> and qemu_event_reset(&rcu_gp_event). This avoids stalls while confining
> the read-side overhead to this state.
>
> This unified approach, inspired by Linux RCU, ensures consistent and
> efficient grace period handling, and testing confirms it resolves the
> virtio-gpu issue.
>
> [1] https://lore.kernel.org/qemu-devel/20251014111234.3190346-9-alex.bennee@linaro.org/
>
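To make the heuristic concrete, the force-quiescent decision described
above boils down to the condition below, lifted out of wait_for_readers()
as a pure function purely for illustration (the patch tests it inline):

```c
#include <assert.h>
#include <stdbool.h>

#define RCU_CALL_MIN_SIZE 30

/* Enter the force quiescent state when any of the following holds:
 * - a decent number of callbacks piled up,
 * - polling timed out (five 10 ms sleeps),
 * - a drain_call_rcu() call is pending. */
static bool should_force(int call_count, int sleeps, int in_drain)
{
    return call_count >= RCU_CALL_MIN_SIZE || sleeps >= 5 || in_drain;
}
```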
> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> ---
> util/rcu.c | 79 ++++++++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 51 insertions(+), 28 deletions(-)
>
> diff --git a/util/rcu.c b/util/rcu.c
> index b703c86f15a3..acac9446ea98 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -43,10 +43,14 @@
> #define RCU_GP_LOCKED (1UL << 0)
> #define RCU_GP_CTR (1UL << 1)
>
> +
> +#define RCU_CALL_MIN_SIZE 30
> +
> unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
>
> QemuEvent rcu_gp_event;
> static int in_drain_call_rcu;
> +static int rcu_call_count;
> static QemuMutex rcu_registry_lock;
> static QemuMutex rcu_sync_lock;
>
> @@ -76,15 +80,29 @@ static void wait_for_readers(void)
> {
> ThreadList qsreaders = QLIST_HEAD_INITIALIZER(qsreaders);
> struct rcu_reader_data *index, *tmp;
> + int sleeps = 0;
> + bool forced = false;
>
> for (;;) {
> - /* We want to be notified of changes made to rcu_gp_ongoing
> - * while we walk the list.
> + /*
> + * Force the grace period to end and wait for it if any of the
> + * following heuristical conditions are satisfied:
> + * - A decent number of callbacks piled up.
> + * - It timed out.
> + * - It is in a drain_call_rcu() call.
> + *
> + * Otherwise, periodically poll the grace period, hoping it ends
> + * promptly.
> */
> - qemu_event_reset(&rcu_gp_event);
> + if (!forced &&
> + (qatomic_read(&rcu_call_count) >= RCU_CALL_MIN_SIZE ||
> + sleeps >= 5 || qatomic_read(&in_drain_call_rcu))) {
> + forced = true;
>
> - QLIST_FOREACH(index, &registry, node) {
> - qatomic_set(&index->waiting, true);
> + QLIST_FOREACH(index, &registry, node) {
> + notifier_list_notify(&index->force_rcu, NULL);
> + qatomic_set(&index->waiting, true);
> + }
> }
>
> /* Here, order the stores to index->waiting before the loads of
> @@ -106,8 +124,6 @@ static void wait_for_readers(void)
> * get some extra futex wakeups.
> */
> qatomic_set(&index->waiting, false);
> - } else if (qatomic_read(&in_drain_call_rcu)) {
> - notifier_list_notify(&index->force_rcu, NULL);
> }
> }
>
> @@ -115,7 +131,8 @@ static void wait_for_readers(void)
> break;
> }
>
> - /* Wait for one thread to report a quiescent state and try again.
> + /*
> + * Sleep for a while and try again.
> * Release rcu_registry_lock, so rcu_(un)register_thread() doesn't
> * wait too much time.
> *
> @@ -133,7 +150,20 @@ static void wait_for_readers(void)
> * rcu_registry_lock is released.
> */
> qemu_mutex_unlock(&rcu_registry_lock);
> - qemu_event_wait(&rcu_gp_event);
> +
> + if (forced) {
> + qemu_event_wait(&rcu_gp_event);
> +
> + /*
> + * We want to be notified of changes made to rcu_gp_ongoing
> + * while we walk the list.
> + */
> + qemu_event_reset(&rcu_gp_event);
> + } else {
> + g_usleep(10000);
> + sleeps++;
> + }
> +
> qemu_mutex_lock(&rcu_registry_lock);
> }
>
> @@ -173,15 +203,11 @@ void synchronize_rcu(void)
> }
> }
>
> -
> -#define RCU_CALL_MIN_SIZE 30
> -
> /* Multi-producer, single-consumer queue based on urcu/static/wfqueue.h
> * from liburcu. Note that head is only used by the consumer.
> */
> static struct rcu_head dummy;
> static struct rcu_head *head = &dummy, **tail = &dummy.next;
> -static int rcu_call_count;
> static QemuEvent rcu_call_ready_event;
>
> static void enqueue(struct rcu_head *node)
> @@ -259,30 +285,27 @@ static void *call_rcu_thread(void *opaque)
> rcu_register_thread();
>
> for (;;) {
> - int tries = 0;
> - int n = qatomic_read(&rcu_call_count);
> + int n;
>
> - /* Heuristically wait for a decent number of callbacks to pile up.
> + /*
> * Fetch rcu_call_count now, we only must process elements that were
> * added before synchronize_rcu() starts.
> */
> - while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
> - g_usleep(10000);
> - if (n == 0) {
> - qemu_event_reset(&rcu_call_ready_event);
> - n = qatomic_read(&rcu_call_count);
> - if (n == 0) {
> + for (;;) {
> + qemu_event_reset(&rcu_call_ready_event);
> + n = qatomic_read(&rcu_call_count);
> + if (n) {
> + break;
> + }
> +
> #if defined(CONFIG_MALLOC_TRIM)
> - malloc_trim(4 * 1024 * 1024);
> + malloc_trim(4 * 1024 * 1024);
> #endif
> - qemu_event_wait(&rcu_call_ready_event);
> - }
> - }
> - n = qatomic_read(&rcu_call_count);
> + qemu_event_wait(&rcu_call_ready_event);
> }
>
> - qatomic_sub(&rcu_call_count, n);
> synchronize_rcu();
> + qatomic_sub(&rcu_call_count, n);
> bql_lock();
> while (n > 0) {
> node = try_dequeue();
>
> ---
> base-commit: 0dc905ac306c68649e05cdaf8434123c8f917b41
> change-id: 20251015-force-c4e03a9ba719
>
> Best regards,
> --
> Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>
>
>