From: Uladzislau Rezki <urezki@gmail.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Uladzislau Rezki <urezki@gmail.com>, RCU <rcu@vger.kernel.org>,
Neeraj upadhyay <Neeraj.Upadhyay@amd.com>,
Boqun Feng <boqun.feng@gmail.com>,
Hillf Danton <hdanton@sina.com>,
Joel Fernandes <joel@joelfernandes.org>,
LKML <linux-kernel@vger.kernel.org>,
Oleksiy Avramchenko <oleksiy.avramchenko@sony.com>,
Frederic Weisbecker <frederic@kernel.org>
Subject: Re: [PATCH v3 4/7] rcu: Improve handling of synchronize_rcu() users
Date: Wed, 3 Jan 2024 20:02:00 +0100 [thread overview]
Message-ID: <ZZWvKLy9mIOaaay4@pc636> (raw)
In-Reply-To: <d4635fdf-8ed0-452d-8bc8-0fe0e7fb1994@paulmck-laptop>
On Wed, Jan 03, 2024 at 09:56:42AM -0800, Paul E. McKenney wrote:
> On Wed, Jan 03, 2024 at 06:35:20PM +0100, Uladzislau Rezki wrote:
> > On Wed, Jan 03, 2024 at 06:47:30AM -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 03, 2024 at 02:16:00PM +0100, Uladzislau Rezki wrote:
> > > > On Tue, Jan 02, 2024 at 11:25:13AM -0800, Paul E. McKenney wrote:
> > > > > On Tue, Jan 02, 2024 at 01:52:26PM +0100, Uladzislau Rezki wrote:
> > > > > > Hello, Paul!
> > > > > >
> > > > > > Sorry for late answer, it is because of holidays :)
> > > > > >
> > > > > > > > > > The problem is that, we are limited in number of "wait-heads" which we
> > > > > > > > > > add as a marker node for this/current grace period. If there are more clients
> > > > > > > > > > and there is no a wait-head available it means that a system, the deferred
> > > > > > > > > > kworker, is slow in processing callbacks, thus all wait-nodes are in use.
> > > > > > > > > >
> > > > > > > > > > That is why we need an extra grace period. Basically to repeat our try one
> > > > > > > > > > more time, i.e. it might be that a current grace period is not able to handle
> > > > > > > > > > users due to the fact that a system is doing really slow, but this is rather
> > > > > > > > > > a corner case and is not a problem.
> > > > > > > > >
> > > > > > > > > But in that case, the real issue is not the need for an extra grace
> > > > > > > > > period, but rather the need for the wakeup processing to happen, correct?
> > > > > > > > > Or am I missing something subtle here?
> > > > > > > > >
> > > > > > > > Basically, yes. If we had a spare dummy-node we could process the users
> > > > > > > > by the current GP(no need in extra). Why we may not have it - it is because
> > > > > > > > like you pointed:
> > > > > > > >
> > > > > > > > - wake-up issue, i.e. wake-up time + when we are on_cpu;
> > > > > > > > - slow list process. For example priority. The kworker is not
> > > > > > > > given enough CPU time to do the progress, thus "dummy-nodes"
> > > > > > > > are not released in time for reuse.
> > > > > > > >
> > > > > > > > Therefore, en extra GP is requested if there is a high flow of
> > > > > > > > synchronize_rcu() users and kworker is not able to do a progress
> > > > > > > > in time.
> > > > > > > >
> > > > > > > > For example 60K+ parallel synchronize_rcu() users will trigger it.
> > > > > > >
> > > > > > > OK, but what bad thing would happen if that was moved to precede the
> > > > > > > rcu_seq_start(&rcu_state.gp_seq)? That way, the requested grace period
> > > > > > > would be the same as the one that is just now starting.
> > > > > > >
> > > > > > > Something like this?
> > > > > > >
> > > > > > > start_new_poll = rcu_sr_normal_gp_init();
> > > > > > >
> > > > > > > /* Record GP times before starting GP, hence rcu_seq_start(). */
> > > > > > > rcu_seq_start(&rcu_state.gp_seq);
> > > > > > > ASSERT_EXCLUSIVE_WRITER(rcu_state.gp_seq);
> > > > > > >
> > > > > > I had a concern about the case when rcu_sr_normal_gp_init() handles what
> > > > > > we currently have, in terms of requests. Right after that there is/are
> > > > > > extra sync requests which invoke the start_poll_synchronize_rcu() but
> > > > > > since a GP has been requested before it will not request an extra one. So
> > > > > > "last" incoming users might not be processed.
> > > > > >
> > > > > > That is why i have placed the rcu_sr_normal_gp_init() after a gp_seq is
> > > > > > updated.
> > > > > >
> > > > > > I can miss something, so please comment. Apart of that we can move it
> > > > > > as you proposed.
> > > > >
> > > > > Couldn't that possibility be handled by a check in rcu_gp_cleanup()?
> > > > >
> > > > It is controlled by the caller anyway, i.e. if a new GP is needed.
> > > >
> > > > I am not 100% sure it is as straightforward as it could look like to
> > > > handle it in the rcu_sr_normal_gp_cleaup() function. At least i see
> > > > that we need to access to the first element of llist and find out if
> > > > it is a wait-dummy-head or not. If not we know there are extra incoming
> > > > calls.
> > > >
> > > > So that way requires extra calling of start_poll_synchronize_rcu().
> > >
> > > If this is invoked early enough in rcu_gp_cleanup(), all that needs to
> > > happen is to set the need_gp flag. Plus you can count the number of
> > > requests, and snapshot that number at rcu_gp_init() time and check to
> > > see if it changed at rcu_gp_cleanup() time. Later on, this could be
> > > used to reduce the number of wakeups, correct?
> > >
> > You mean instead of waking-up a gp-kthread just continue processing of
> > new users if they are exist? If so, i think, we can implement it as separate
> > patches.
>
> Agreed, this is an optimization, and thus should be a separate patch.
>
> > > > I can add a comment about your concern and we can find the best approach
> > > > later, if it is OK with you!
> > >
> > > I agree that this should be added via a later patch, though I have not
> > > yet given up on the possibility that this patch might be simple enough
> > > to be later in this same series.
> > >
> > Maybe there is a small misunderstanding. Please note, the rcu_sr_normal_gp_init()
> > function does not request any new gp, i.e. our approach does not do any extra GP
> > requests. It happens only if there are no any dummy-wait-head available as we
> > discussed it earlier.
>
> The start_poll_synchronize_rcu() added by your patch 4/7 will request
> an additional grace period because it is invoked after rcu_seq_start()
> is called, correct? Or am I missing something subtle here?
>
<snip>
+ // New poll request after rnp unlock
+ if (start_new_poll)
+ (void) start_poll_synchronize_rcu();
+
<snip>
The "start_new_poll" is set to "true" only when _this_ GP is not able
to handle anything and there are outstanding users. It happens when the
rcu_sr_normal_gp_init() function was not able to insert a dummy separator
to the llist, because there were no left dummy-nodes(fixed number of them)
due to the fact that all of them are "in-use". The reason why there are no
dummy-nodes is because of slow progress because it is done by dedicated
kworker.
I can trigger it, i mean when we need an addition GP, start_new_pool is 1,
only when i run 20 000 processes concurrently in a tight loop:
<snip>
while (1)
synchronize_rcu();
<snip>
in that scenario we start to ask for an addition GP because we are not up
to speed, i.e. a system is slow in processing callbacks and we need some
time until wait-node/nodes is/are released for reuse.
We need a next GP to move it forward, i.e. to repeat a try of attaching
a dummy-node.
--
Uladzislau Rezki
next prev parent reply other threads:[~2024-01-03 19:02 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-11-28 8:00 [PATCH v3 0/7] Reduce synchronize_rcu() latency(V3) Uladzislau Rezki (Sony)
2023-11-28 8:00 ` [PATCH v3 1/7] rcu: Reduce synchronize_rcu() latency Uladzislau Rezki (Sony)
2023-11-28 8:00 ` [PATCH v3 2/7] rcu: Add a trace event for synchronize_rcu_normal() Uladzislau Rezki (Sony)
2023-11-28 8:00 ` [PATCH v3 3/7] doc: Add rcutree.rcu_normal_wake_from_gp to kernel-parameters.txt Uladzislau Rezki (Sony)
2023-12-20 1:17 ` Paul E. McKenney
2023-12-21 10:28 ` Uladzislau Rezki
2023-11-28 8:00 ` [PATCH v3 4/7] rcu: Improve handling of synchronize_rcu() users Uladzislau Rezki (Sony)
2023-12-20 1:37 ` Paul E. McKenney
2023-12-21 10:52 ` Uladzislau Rezki
2023-12-21 18:40 ` Paul E. McKenney
2023-12-22 9:27 ` Uladzislau Rezki
2023-12-22 18:58 ` Paul E. McKenney
2024-01-02 12:52 ` Uladzislau Rezki
2024-01-02 19:25 ` Paul E. McKenney
2024-01-03 13:16 ` Uladzislau Rezki
2024-01-03 14:47 ` Paul E. McKenney
2024-01-03 17:35 ` Uladzislau Rezki
2024-01-03 17:56 ` Paul E. McKenney
2024-01-03 19:02 ` Uladzislau Rezki [this message]
2024-01-03 19:03 ` Uladzislau Rezki
2024-01-03 19:33 ` Paul E. McKenney
2024-01-04 11:17 ` Uladzislau Rezki
2023-11-28 8:00 ` [PATCH v3 5/7] rcu: Support direct wake-up " Uladzislau Rezki (Sony)
2023-12-20 1:46 ` Paul E. McKenney
2023-12-21 11:22 ` Uladzislau Rezki
2023-11-28 8:00 ` [PATCH v3 6/7] rcu: Move sync related data to rcu_state structure Uladzislau Rezki (Sony)
2023-12-20 1:47 ` Paul E. McKenney
2023-12-21 10:56 ` Uladzislau Rezki
2023-11-28 8:00 ` [PATCH v3 7/7] rcu: Add CONFIG_RCU_SR_NORMAL_DEBUG_GP Uladzislau Rezki (Sony)
2023-12-20 1:14 ` Paul E. McKenney
2023-12-21 10:27 ` Uladzislau Rezki
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZZWvKLy9mIOaaay4@pc636 \
--to=urezki@gmail.com \
--cc=Neeraj.Upadhyay@amd.com \
--cc=boqun.feng@gmail.com \
--cc=frederic@kernel.org \
--cc=hdanton@sina.com \
--cc=joel@joelfernandes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=oleksiy.avramchenko@sony.com \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.