From: Frederic Weisbecker <frederic@kernel.org>
To: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: rcu@vger.kernel.org, linux-kernel@vger.kernel.org,
rushikesh.s.kadam@intel.com, urezki@gmail.com,
neeraj.iitr10@gmail.com, paulmck@kernel.org, rostedt@goodmis.org,
vineeth@bitbyteword.org, boqun.feng@gmail.com
Subject: Re: [PATCH v5 06/18] rcu: Introduce call_rcu_lazy() API implementation
Date: Fri, 2 Sep 2022 17:21:32 +0200 [thread overview]
Message-ID: <20220902152132.GA115525@lothringen> (raw)
In-Reply-To: <20220901221720.1105021-7-joel@joelfernandes.org>
On Thu, Sep 01, 2022 at 10:17:08PM +0000, Joel Fernandes (Google) wrote:
> Implement timer-based RCU lazy callback batching. The batch is flushed
> whenever a certain amount of time has passed, or the batch on a
> particular CPU grows too big. Also memory pressure will flush it in a
> future patch.
>
> To handle several corner cases automagically (such as rcu_barrier() and
> hotplug), we re-use bypass lists to handle lazy CBs. The bypass list
> length has the lazy CB length included in it. A separate lazy CB length
> counter is also introduced to keep track of the number of lazy CBs.
>
> Suggested-by: Paul McKenney <paulmck@kernel.org>
> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
> ---
> include/linux/rcupdate.h | 6 ++
> kernel/rcu/Kconfig | 8 ++
> kernel/rcu/rcu.h | 11 +++
> kernel/rcu/rcu_segcblist.c | 2 +-
> kernel/rcu/tree.c | 130 +++++++++++++++---------
> kernel/rcu/tree.h | 13 ++-
> kernel/rcu/tree_nocb.h | 198 ++++++++++++++++++++++++++++++-------
> 7 files changed, 280 insertions(+), 88 deletions(-)
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 08605ce7379d..82e8a07e0856 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -108,6 +108,12 @@ static inline int rcu_preempt_depth(void)
>
> #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
>
> +#ifdef CONFIG_RCU_LAZY
> +void call_rcu_lazy(struct rcu_head *head, rcu_callback_t func);
> +#else
> +#define call_rcu_lazy(head, func) call_rcu(head, func)
Why about an inline instead?
> +#endif
> +
> /* Internal to kernel */
> void rcu_init(void);
> extern int rcu_scheduler_active;
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index d471d22a5e21..3128d01427cb 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -311,4 +311,12 @@ config TASKS_TRACE_RCU_READ_MB
> Say N here if you hate read-side memory barriers.
> Take the default if you are unsure.
>
> +config RCU_LAZY
> + bool "RCU callback lazy invocation functionality"
> + depends on RCU_NOCB_CPU
> + default n
> + help
> + To save power, batch RCU callbacks and flush after delay, memory
> + pressure or callback list growing too big.
> +
> endmenu # "RCU Subsystem"
> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> index be5979da07f5..94675f14efe8 100644
> --- a/kernel/rcu/rcu.h
> +++ b/kernel/rcu/rcu.h
> @@ -474,6 +474,14 @@ enum rcutorture_type {
> INVALID_RCU_FLAVOR
> };
>
> +#if defined(CONFIG_RCU_LAZY)
> +unsigned long rcu_lazy_get_jiffies_till_flush(void);
> +void rcu_lazy_set_jiffies_till_flush(unsigned long j);
> +#else
> +static inline unsigned long rcu_lazy_get_jiffies_till_flush(void) { return 0; }
> +static inline void rcu_lazy_set_jiffies_till_flush(unsigned long j) { }
> +#endif
> +
> #if defined(CONFIG_TREE_RCU)
> void rcutorture_get_gp_data(enum rcutorture_type test_type, int *flags,
> unsigned long *gp_seq);
> @@ -483,6 +491,8 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
> unsigned long c_old,
> unsigned long c);
> void rcu_gp_set_torture_wait(int duration);
> +void rcu_force_call_rcu_to_lazy(bool force);
> +
> #else
> static inline void rcutorture_get_gp_data(enum rcutorture_type test_type,
> int *flags, unsigned long *gp_seq)
> @@ -501,6 +511,7 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
> do { } while (0)
> #endif
> static inline void rcu_gp_set_torture_wait(int duration) { }
> +static inline void rcu_force_call_rcu_to_lazy(bool force) { }
> #endif
>
> #if IS_ENABLED(CONFIG_RCU_TORTURE_TEST) || IS_MODULE(CONFIG_RCU_TORTURE_TEST)
> diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c
> index c54ea2b6a36b..55b50e592986 100644
> --- a/kernel/rcu/rcu_segcblist.c
> +++ b/kernel/rcu/rcu_segcblist.c
> @@ -38,7 +38,7 @@ void rcu_cblist_enqueue(struct rcu_cblist *rclp, struct rcu_head *rhp)
> * element of the second rcu_cblist structure, but ensuring that the second
> * rcu_cblist structure, if initially non-empty, always appears non-empty
> * throughout the process. If rdp is NULL, the second rcu_cblist structure
> - * is instead initialized to empty.
> + * is instead initialized to empty. Also account for lazy_len for lazy CBs.
Leftover?
> @@ -3904,7 +3943,8 @@ static void rcu_barrier_entrain(struct rcu_data *rdp)
> rdp->barrier_head.func = rcu_barrier_callback;
> debug_rcu_head_queue(&rdp->barrier_head);
> rcu_nocb_lock(rdp);
> - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> + WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false,
> + /* wake gp thread */ true));
It's a bad sign when you need to start commenting your boolean parameters :)
Perhaps use a single two-bit flag instead of two booleans, for readability?
Also that's a subtle change which purpose isn't explained. It means that
rcu_barrier_entrain() used to wait for the bypass timer in the worst case
but now we force rcuog into it immediately. Should that be a separate change?
> diff --git a/kernel/rcu/tree_nocb.h b/kernel/rcu/tree_nocb.h
> index 31068dd31315..7e97a7b6e046 100644
> --- a/kernel/rcu/tree_nocb.h
> +++ b/kernel/rcu/tree_nocb.h
> @@ -256,6 +256,31 @@ static bool wake_nocb_gp(struct rcu_data *rdp, bool force)
> return __wake_nocb_gp(rdp_gp, rdp, force, flags);
> }
>
> +/*
> + * LAZY_FLUSH_JIFFIES decides the maximum amount of time that
> + * can elapse before lazy callbacks are flushed. Lazy callbacks
> + * could be flushed much earlier for a number of other reasons
> + * however, LAZY_FLUSH_JIFFIES will ensure no lazy callbacks are
> + * left unsubmitted to RCU after those many jiffies.
> + */
> +#define LAZY_FLUSH_JIFFIES (10 * HZ)
> +unsigned long jiffies_till_flush = LAZY_FLUSH_JIFFIES;
It seems this can be made static?
> @@ -265,23 +290,39 @@ static void wake_nocb_gp_defer(struct rcu_data *rdp, int waketype,
> {
> unsigned long flags;
> struct rcu_data *rdp_gp = rdp->nocb_gp_rdp;
> + unsigned long mod_jif = 0;
>
> raw_spin_lock_irqsave(&rdp_gp->nocb_gp_lock, flags);
>
> /*
> - * Bypass wakeup overrides previous deferments. In case
> - * of callback storm, no need to wake up too early.
> + * Bypass and lazy wakeup overrides previous deferments. In case of
> + * callback storm, no need to wake up too early.
> */
> - if (waketype == RCU_NOCB_WAKE_BYPASS) {
> - mod_timer(&rdp_gp->nocb_timer, jiffies + 2);
> - WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> - } else {
> + switch (waketype) {
> + case RCU_NOCB_WAKE_LAZY:
> + if (rdp->nocb_defer_wakeup != RCU_NOCB_WAKE_LAZY)
> + mod_jif = jiffies_till_flush;
> + break;
Ah so this timer override all the others, and it's a several seconds
timers. Then I'll have a related concern on nocb_gp_wait().
> +
> + case RCU_NOCB_WAKE_BYPASS:
> + mod_jif = 2;
> + break;
> +
> + case RCU_NOCB_WAKE:
> + case RCU_NOCB_WAKE_FORCE:
> + // For these, make it wake up the soonest if we
> + // were in a bypass or lazy sleep before.
> if (rdp_gp->nocb_defer_wakeup < RCU_NOCB_WAKE)
> - mod_timer(&rdp_gp->nocb_timer, jiffies + 1);
> - if (rdp_gp->nocb_defer_wakeup < waketype)
> - WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
> + mod_jif = 1;
> + break;
> }
>
> + if (mod_jif)
> + mod_timer(&rdp_gp->nocb_timer, jiffies + mod_jif);
> +
> + if (rdp_gp->nocb_defer_wakeup < waketype)
> + WRITE_ONCE(rdp_gp->nocb_defer_wakeup, waketype);
So RCU_NOCB_WAKE_BYPASS and RCU_NOCB_WAKE_LAZY don't override the timer state
anymore? Looks like something is missing.
> +
> raw_spin_unlock_irqrestore(&rdp_gp->nocb_gp_lock, flags);
>
> trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, reason);
[...]
> @@ -705,12 +816,21 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> my_rdp->nocb_gp_gp = needwait_gp;
> my_rdp->nocb_gp_seq = needwait_gp ? wait_gp_seq : 0;
>
> - if (bypass && !rcu_nocb_poll) {
> - // At least one child with non-empty ->nocb_bypass, so set
> - // timer in order to avoid stranding its callbacks.
> - wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
> - TPS("WakeBypassIsDeferred"));
> + // At least one child with non-empty ->nocb_bypass, so set
> + // timer in order to avoid stranding its callbacks.
> + if (!rcu_nocb_poll) {
> + // If bypass list only has lazy CBs. Add a deferred
> + // lazy wake up.
> + if (lazy && !bypass) {
> + wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_LAZY,
> + TPS("WakeLazyIsDeferred"));
What if:
1) rdp(1) has lazy callbacks
2) all other rdp's have no callback at all
3) nocb_gp_wait() runs through all rdp's, everything is handled, except for
these lazy callbacks
4) It reaches the above path, ready to arm the RCU_NOCB_WAKE_LAZY timer,
but it hasn't yet called wake_nocb_gp_defer()
5) Oh but rdp(2) queues a non-lazy callback. interrupts are disabled so it defers
the wake up to nocb_gp_wait() with arming the timer in RCU_NOCB_WAKE.
6) nocb_gp_wait() finally calls wake_nocb_gp_defer() and override the timeout
to several seconds ahead.
7) No more callbacks queued, the non-lazy callback will have to wait several
seconds to complete.
Or did I miss something? Note that the race exists with RCU_NOCB_WAKE_BYPASS
but it's only about one jiffy delay, not seconds.
(further review later).
Thanks.
> + // Otherwise add a deferred bypass wake up.
> + } else if (bypass) {
> + wake_nocb_gp_defer(my_rdp, RCU_NOCB_WAKE_BYPASS,
> + TPS("WakeBypassIsDeferred"));
> + }
> }
> +
> if (rcu_nocb_poll) {
> /* Polling, so trace if first poll in the series. */
> if (gotcbs)
> @@ -1036,7 +1156,7 @@ static long rcu_nocb_rdp_deoffload(void *arg)
> * return false, which means that future calls to rcu_nocb_try_bypass()
> * will refuse to put anything into the bypass.
> */
> - WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies));
> + WARN_ON_ONCE(!rcu_nocb_flush_bypass(rdp, NULL, jiffies, false, false));
> /*
> * Start with invoking rcu_core() early. This way if the current thread
> * happens to preempt an ongoing call to rcu_core() in the middle,
> @@ -1290,6 +1410,7 @@ static void __init rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp)
> raw_spin_lock_init(&rdp->nocb_gp_lock);
> timer_setup(&rdp->nocb_timer, do_nocb_deferred_wakeup_timer, 0);
> rcu_cblist_init(&rdp->nocb_bypass);
> + WRITE_ONCE(rdp->lazy_len, 0);
> mutex_init(&rdp->nocb_gp_kthread_mutex);
> }
>
> @@ -1571,13 +1692,14 @@ static void rcu_init_one_nocb(struct rcu_node *rnp)
> }
>
> static bool rcu_nocb_flush_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> - unsigned long j)
> + unsigned long j, bool lazy, bool wakegp)
> {
> return true;
> }
>
> static bool rcu_nocb_try_bypass(struct rcu_data *rdp, struct rcu_head *rhp,
> - bool *was_alldone, unsigned long flags)
> + bool *was_alldone, unsigned long flags,
> + bool lazy)
> {
> return false;
> }
> --
> 2.37.2.789.g6183377224-goog
>
next prev parent reply other threads:[~2022-09-02 15:36 UTC|newest]
Thread overview: 84+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-01 22:17 [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 01/18] mm/slub: perform free consistency checks before call_rcu Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 02/18] mm/sl[au]b: rearrange struct slab fields to allow larger rcu_head Joel Fernandes (Google)
2022-09-02 9:26 ` Vlastimil Babka
2022-09-02 9:30 ` Vlastimil Babka
2022-09-02 15:09 ` Joel Fernandes
2022-09-03 13:53 ` Paul E. McKenney
2022-09-01 22:17 ` [PATCH v5 03/18] rcu/tree: Use READ_ONCE() for lockless read of rnp->qsmask Joel Fernandes (Google)
2022-09-06 22:26 ` Boqun Feng
2022-09-06 22:31 ` Joel Fernandes
2022-09-01 22:17 ` [PATCH v5 04/18] rcu: Fix late wakeup when flush of bypass cblist happens Joel Fernandes (Google)
2022-09-02 11:35 ` Frederic Weisbecker
2022-09-02 23:58 ` Joel Fernandes
2022-09-03 15:10 ` Paul E. McKenney
2022-09-04 21:13 ` Frederic Weisbecker
2022-09-03 14:04 ` Paul E. McKenney
2022-09-03 14:05 ` Joel Fernandes
2022-09-06 3:07 ` Joel Fernandes
2022-09-06 9:48 ` Frederic Weisbecker
2022-09-07 2:43 ` Joel Fernandes
2022-09-01 22:17 ` [PATCH v5 05/18] rcu: Move trace_rcu_callback() before bypassing Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 06/18] rcu: Introduce call_rcu_lazy() API implementation Joel Fernandes (Google)
2022-09-02 15:21 ` Frederic Weisbecker [this message]
2022-09-02 23:09 ` Joel Fernandes
2022-09-05 12:59 ` Frederic Weisbecker
2022-09-05 20:18 ` Joel Fernandes
2022-09-05 20:32 ` Joel Fernandes
2022-09-06 8:55 ` Paul E. McKenney
2022-09-06 16:16 ` Joel Fernandes
2022-09-06 17:05 ` Paul E. McKenney
2022-09-03 22:00 ` Joel Fernandes
2022-09-04 21:01 ` Frederic Weisbecker
2022-09-05 20:20 ` Joel Fernandes
2022-09-06 3:05 ` Joel Fernandes
2022-09-06 15:17 ` Frederic Weisbecker
2022-09-06 16:15 ` Joel Fernandes
2022-09-06 16:31 ` Joel Fernandes
2022-09-06 16:38 ` Joel Fernandes
2022-09-06 16:43 ` Joel Fernandes
2022-09-06 19:11 ` Frederic Weisbecker
2022-09-07 2:56 ` Joel Fernandes
2022-09-07 9:56 ` Frederic Weisbecker
2022-09-07 10:03 ` Frederic Weisbecker
2022-09-07 14:01 ` Joel Fernandes
2022-09-07 0:06 ` Joel Fernandes
2022-09-07 9:40 ` Frederic Weisbecker
2022-09-07 13:44 ` Joel Fernandes
2022-09-07 15:38 ` Frederic Weisbecker
2022-09-07 15:39 ` Joel Fernandes
2022-09-21 23:52 ` Joel Fernandes
2022-09-22 14:39 ` Joel Fernandes
2022-09-06 18:16 ` Uladzislau Rezki
2022-09-06 18:21 ` Joel Fernandes
2022-09-07 8:52 ` Uladzislau Rezki
2022-09-07 15:23 ` Joel Fernandes
2022-09-03 14:03 ` Paul E. McKenney
2022-09-03 14:05 ` Joel Fernandes
2022-09-01 22:17 ` [PATCH v5 07/18] rcu: shrinker for lazy rcu Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 08/18] rcu: Add per-CB tracing for queuing, flush and invocation Joel Fernandes (Google)
2022-09-02 16:48 ` kernel test robot
2022-09-03 12:39 ` Paul E. McKenney
2022-09-03 12:39 ` Paul E. McKenney
2022-09-03 14:07 ` Joel Fernandes
2022-09-03 14:07 ` Joel Fernandes
2022-09-02 19:01 ` kernel test robot
2022-09-01 22:17 ` [PATCH v5 09/18] rcuscale: Add laziness and kfree tests Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 10/18] rcutorture: Add test code for call_rcu_lazy() Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 11/18] fs: Move call_rcu() to call_rcu_lazy() in some paths Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 12/18] cred: Move call_rcu() to call_rcu_lazy() Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 13/18] security: " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 14/18] net/core: " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 15/18] kernel: Move various core kernel usages " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 16/18] lib: Move call_rcu() " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 17/18] i915: " Joel Fernandes (Google)
2022-09-01 22:17 ` [PATCH v5 18/18] fork: Move thread_stack_free_rcu() " Joel Fernandes (Google)
2022-09-03 15:22 ` [PATCH v5 00/18] Implement call_rcu_lazy() and miscellaneous fixes Paul E. McKenney
[not found] ` <CAEXW_YTxRWvUN3YuwdJ9WPmHH-v+Auozfc_OugpV_hpxC5K+4Q@mail.gmail.com>
[not found] ` <20220914093423.GS246308@paulmck-ThinkPad-P17-Gen-1>
[not found] ` <20220914102319.GA1936@lothringen>
[not found] ` <CA+KHdyUb122cCaeLutzK8Ti-dk4D=iXCtQau=ZXkPRO-1cnD1A@mail.gmail.com>
[not found] ` <20220914112251.GV246308@paulmck-ThinkPad-P17-Gen-1>
[not found] ` <YygjfJkoUzxy9xd/@pc636>
[not found] ` <20220919131152.GF4196@paulmck-ThinkPad-P17-Gen-1>
[not found] ` <CAEXW_YQGPwkTQjxAb5p4OOhJkMA-L93JNTsa-WaXs0Ts7=Oakw@mail.gmail.com>
[not found] ` <20220919223211.GJ4196@paulmck-ThinkPad-P17-Gen-1>
[not found] ` <f061b798-e6f6-1523-52c3-12fe7966f809@joelfernandes.org>
[not found] ` <a2882e80-1723-9004-f67a-cd8332aa701d@joelfernandes.org>
[not found] ` <d72808e1-fc27-f158-37b7-ef23c23e74c1@joelfernandes.org>
2022-09-20 11:40 ` Joel Fernandes
2022-09-20 12:32 ` Paul E. McKenney
2022-09-20 17:55 ` Joel Fernandes
2022-09-20 19:03 ` Paul E. McKenney
2022-09-20 22:27 ` Joel Fernandes
2022-09-20 22:35 ` Paul E. McKenney
2022-09-20 22:40 ` Joel Fernandes
2022-09-21 2:21 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220902152132.GA115525@lothringen \
--to=frederic@kernel.org \
--cc=boqun.feng@gmail.com \
--cc=joel@joelfernandes.org \
--cc=linux-kernel@vger.kernel.org \
--cc=neeraj.iitr10@gmail.com \
--cc=paulmck@kernel.org \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=rushikesh.s.kadam@intel.com \
--cc=urezki@gmail.com \
--cc=vineeth@bitbyteword.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.