From: Paolo Bonzini <pbonzini@redhat.com>
To: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>, qemu-devel@nongnu.org
Cc: "David Hildenbrand" <david@redhat.com>,
"Eric Blake" <eblake@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Dmitry Osipenko" <dmitry.osipenko@collabora.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH] rcu: Unify force quiescent state
Date: Thu, 16 Oct 2025 14:59:48 +0200
Message-ID: <e0e263c3-980c-47ed-9a93-048de6e47e8b@redhat.com>
In-Reply-To: <20251016-force-v1-1-919a82112498@rsg.ci.i.u-tokyo.ac.jp>
On 10/16/25 08:34, Akihiko Odaki wrote:
> Borrow the concept of force quiescent state from Linux to ensure readers
> remain fast during normal operation and to avoid stalls.
Very nice solution! The code of the call RCU thread is simpler, too. I will
follow up by extracting
    node = try_dequeue();
    while (!node) {
        bql_unlock();
        qemu_event_reset(&rcu_call_ready_event);
        node = try_dequeue();
        if (!node) {
            qemu_event_wait(&rcu_call_ready_event);
            node = try_dequeue();
        }
        bql_lock();
    }
to its own function dequeue().
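Something like the sketch below; the BQL and event primitives are replaced
with no-op stubs and try_dequeue() with a trivial list pop so it stands
alone, so this only shows the intended shape, not the real util/rcu.c code:

```c
#include <assert.h>
#include <stddef.h>

struct rcu_head { struct rcu_head *next; };

static struct rcu_head *queue_head;

/* Trivial stand-in for the wfqueue-based try_dequeue(): pop the head of a
 * singly linked list, or return NULL if the queue is empty. */
static struct rcu_head *try_dequeue(void)
{
    struct rcu_head *node = queue_head;
    if (node) {
        queue_head = node->next;
    }
    return node;
}

/* No-op stubs standing in for bql_lock()/bql_unlock() and
 * qemu_event_reset()/qemu_event_wait() on rcu_call_ready_event. */
static void bql_lock(void) {}
static void bql_unlock(void) {}
static void ready_event_reset(void) {}
static void ready_event_wait(void) {}

/* The extracted helper: loop until a callback is available, dropping the
 * BQL while waiting so other threads can make progress.  The second
 * try_dequeue() after the reset closes the race where a producer enqueues
 * between the first attempt and the event reset. */
static struct rcu_head *dequeue(void)
{
    struct rcu_head *node = try_dequeue();

    while (!node) {
        bql_unlock();
        ready_event_reset();
        node = try_dequeue();
        if (!node) {
            ready_event_wait();
            node = try_dequeue();
        }
        bql_lock();
    }
    return node;
}
```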
Paolo
> Background
> ==========
>
> The previous implementation had four steps to begin reclamation.
>
> 1. call_rcu_thread() would wait for the first callback.
>
> 2. call_rcu_thread() would periodically poll until a decent number of
> callbacks piled up or it timed out.
>
> 3. synchronize_rcu() would start a grace period (GP).
>
> 4. wait_for_readers() would wait for the GP to end. It would also
> trigger the force_rcu notifier to break busy loops in a read-side
> critical section if drain_call_rcu() had been called.
>
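As background for the discussion below, the GP bookkeeping can be modeled in
a few lines; this is a deliberately single-threaded simplification of the
rcu_gp_ctr/rcu_reader scheme in util/rcu.c, not the real data structures:

```c
#include <assert.h>
#include <stdbool.h>

static unsigned long gp_ctr = 1;  /* current GP; the writer bumps this */
static unsigned long reader_ctr; /* 0 when quiescent, else GP seen at lock */

/* A reader snapshots the current GP on entry and clears it on exit. */
static void read_lock(void)   { reader_ctr = gp_ctr; }
static void read_unlock(void) { reader_ctr = 0; }

/* The writer must wait for this reader iff it is inside a critical
 * section that began before the current GP started. */
static bool gp_ongoing(void)
{
    return reader_ctr != 0 && reader_ctr != gp_ctr;
}

/* Starting a new GP makes all pre-existing readers "stale". */
static void start_gp(void) { gp_ctr += 2; }
```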
> Problem
> =======
>
> The separation of waiting logic across these steps led to suboptimal
> behavior:
>
> The GP was delayed until call_rcu_thread() stopped polling.
>
> force_rcu was not consistently triggered when call_rcu_thread() detected
> a high number of pending callbacks or a timeout. This inconsistency
> sometimes led to stalls, as reported in a virtio-gpu issue where memory
> unmapping was blocked[1].
>
> wait_for_readers() imposed unnecessary overhead in non-urgent cases by
> unconditionally executing qatomic_set(&index->waiting, true) and
> qemu_event_reset(&rcu_gp_event), which are necessary only for expedited
> synchronization.
>
> Solution
> ========
>
> Move the polling in call_rcu_thread() to wait_for_readers() to prevent
> the delay of the GP. Additionally, reorganize wait_for_readers() to
> distinguish between two states:
>
> Normal State: It relies exclusively on periodic polling to detect
> the end of the GP and maintains the read-side fast path.
>
> Force Quiescent State: Whenever expediting synchronization, it always
> triggers force_rcu and executes both qatomic_set(&index->waiting, true)
> and qemu_event_reset(&rcu_gp_event). This avoids stalls while confining
> the read-side overhead to this state.
>
> This unified approach, inspired by Linux RCU, ensures consistent and
> efficient grace period handling, and testing confirms it resolves the
> virtio-gpu issue.
>
> [1] https://lore.kernel.org/qemu-devel/20251014111234.3190346-9-alex.bennee@linaro.org/
>
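To make the heuristic concrete, the force-quiescent decision described
above boils down to the condition below, lifted out of wait_for_readers()
as a pure function purely for illustration (the patch tests it inline):

```c
#include <assert.h>
#include <stdbool.h>

#define RCU_CALL_MIN_SIZE 30

/* Enter the force quiescent state when any of the following holds:
 * - a decent number of callbacks piled up,
 * - polling timed out (five 10 ms sleeps),
 * - a drain_call_rcu() call is pending. */
static bool should_force(int call_count, int sleeps, int in_drain)
{
    return call_count >= RCU_CALL_MIN_SIZE || sleeps >= 5 || in_drain;
}
```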
> Signed-off-by: Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
> ---
> util/rcu.c | 79 ++++++++++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 51 insertions(+), 28 deletions(-)
>
> diff --git a/util/rcu.c b/util/rcu.c
> index b703c86f15a3..acac9446ea98 100644
> --- a/util/rcu.c
> +++ b/util/rcu.c
> @@ -43,10 +43,14 @@
> #define RCU_GP_LOCKED (1UL << 0)
> #define RCU_GP_CTR (1UL << 1)
>
> +
> +#define RCU_CALL_MIN_SIZE 30
> +
> unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
>
> QemuEvent rcu_gp_event;
> static int in_drain_call_rcu;
> +static int rcu_call_count;
> static QemuMutex rcu_registry_lock;
> static QemuMutex rcu_sync_lock;
>
> @@ -76,15 +80,29 @@ static void wait_for_readers(void)
> {
> ThreadList qsreaders = QLIST_HEAD_INITIALIZER(qsreaders);
> struct rcu_reader_data *index, *tmp;
> + int sleeps = 0;
> + bool forced = false;
>
> for (;;) {
> - /* We want to be notified of changes made to rcu_gp_ongoing
> - * while we walk the list.
> + /*
> + * Force the grace period to end and wait for it if any of the
> + * following heuristical conditions are satisfied:
> + * - A decent number of callbacks piled up.
> + * - It timed out.
> + * - It is in a drain_call_rcu() call.
> + *
> + * Otherwise, periodically poll the grace period, hoping it ends
> + * promptly.
> */
> - qemu_event_reset(&rcu_gp_event);
> + if (!forced &&
> + (qatomic_read(&rcu_call_count) >= RCU_CALL_MIN_SIZE ||
> + sleeps >= 5 || qatomic_read(&in_drain_call_rcu))) {
> + forced = true;
>
> - QLIST_FOREACH(index, &registry, node) {
> - qatomic_set(&index->waiting, true);
> + QLIST_FOREACH(index, &registry, node) {
> + notifier_list_notify(&index->force_rcu, NULL);
> + qatomic_set(&index->waiting, true);
> + }
> }
>
> /* Here, order the stores to index->waiting before the loads of
> @@ -106,8 +124,6 @@ static void wait_for_readers(void)
> * get some extra futex wakeups.
> */
> qatomic_set(&index->waiting, false);
> - } else if (qatomic_read(&in_drain_call_rcu)) {
> - notifier_list_notify(&index->force_rcu, NULL);
> }
> }
>
> @@ -115,7 +131,8 @@ static void wait_for_readers(void)
> break;
> }
>
> - /* Wait for one thread to report a quiescent state and try again.
> + /*
> + * Sleep for a while and try again.
> * Release rcu_registry_lock, so rcu_(un)register_thread() doesn't
> * wait too much time.
> *
> @@ -133,7 +150,20 @@ static void wait_for_readers(void)
> * rcu_registry_lock is released.
> */
> qemu_mutex_unlock(&rcu_registry_lock);
> - qemu_event_wait(&rcu_gp_event);
> +
> + if (forced) {
> + qemu_event_wait(&rcu_gp_event);
> +
> + /*
> + * We want to be notified of changes made to rcu_gp_ongoing
> + * while we walk the list.
> + */
> + qemu_event_reset(&rcu_gp_event);
> + } else {
> + g_usleep(10000);
> + sleeps++;
> + }
> +
> qemu_mutex_lock(&rcu_registry_lock);
> }
>
> @@ -173,15 +203,11 @@ void synchronize_rcu(void)
> }
> }
>
> -
> -#define RCU_CALL_MIN_SIZE 30
> -
> /* Multi-producer, single-consumer queue based on urcu/static/wfqueue.h
> * from liburcu. Note that head is only used by the consumer.
> */
> static struct rcu_head dummy;
> static struct rcu_head *head = &dummy, **tail = &dummy.next;
> -static int rcu_call_count;
> static QemuEvent rcu_call_ready_event;
>
> static void enqueue(struct rcu_head *node)
> @@ -259,30 +285,27 @@ static void *call_rcu_thread(void *opaque)
> rcu_register_thread();
>
> for (;;) {
> - int tries = 0;
> - int n = qatomic_read(&rcu_call_count);
> + int n;
>
> - /* Heuristically wait for a decent number of callbacks to pile up.
> + /*
> * Fetch rcu_call_count now, we only must process elements that were
> * added before synchronize_rcu() starts.
> */
> - while (n == 0 || (n < RCU_CALL_MIN_SIZE && ++tries <= 5)) {
> - g_usleep(10000);
> - if (n == 0) {
> - qemu_event_reset(&rcu_call_ready_event);
> - n = qatomic_read(&rcu_call_count);
> - if (n == 0) {
> + for (;;) {
> + qemu_event_reset(&rcu_call_ready_event);
> + n = qatomic_read(&rcu_call_count);
> + if (n) {
> + break;
> + }
> +
> #if defined(CONFIG_MALLOC_TRIM)
> - malloc_trim(4 * 1024 * 1024);
> + malloc_trim(4 * 1024 * 1024);
> #endif
> - qemu_event_wait(&rcu_call_ready_event);
> - }
> - }
> - n = qatomic_read(&rcu_call_count);
> + qemu_event_wait(&rcu_call_ready_event);
> }
>
> - qatomic_sub(&rcu_call_count, n);
> synchronize_rcu();
> + qatomic_sub(&rcu_call_count, n);
> bql_lock();
> while (n > 0) {
> node = try_dequeue();
>
> ---
> base-commit: 0dc905ac306c68649e05cdaf8434123c8f917b41
> change-id: 20251015-force-c4e03a9ba719
>
> Best regards,
> --
> Akihiko Odaki <odaki@rsg.ci.i.u-tokyo.ac.jp>
>
>
>