* [PATCH rcu 01/10] rcu: Fix rcu_read_unlock_strict() strict QS reporting
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
@ 2022-08-31 18:07 ` Paul E. McKenney
2022-08-31 18:07 ` [PATCH rcu 02/10] rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels Paul E. McKenney
` (8 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:07 UTC (permalink / raw)
To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney
From: Zqiang <qiang1.zhang@intel.com>
Kernels built with CONFIG_PREEMPT=n and CONFIG_RCU_STRICT_GRACE_PERIOD=y
report the quiescent state directly from the outermost rcu_read_unlock().
However, the current CPU's rcu_data structure's ->cpu_no_qs.b.norm
might still be set, in which case rcu_report_qs_rdp() will exit early,
thus failing to report the quiescent state.
This commit therefore causes rcu_read_unlock_strict() to clear
the current CPU's rcu_data structure's ->cpu_no_qs.b.norm field before invoking
rcu_report_qs_rdp().
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree_plugin.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 438ecae6bd7e7..86772c95ed0ae 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -824,6 +824,7 @@ void rcu_read_unlock_strict(void)
if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
return;
rdp = this_cpu_ptr(&rcu_data);
+ rdp->cpu_no_qs.b.norm = false;
rcu_report_qs_rdp(rdp);
udelay(rcu_unlock_delay);
}
--
2.31.1.189.g2e36527f23
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH rcu 02/10] rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
2022-08-31 18:07 ` [PATCH rcu 01/10] rcu: Fix rcu_read_unlock_strict() strict QS reporting Paul E. McKenney
@ 2022-08-31 18:07 ` Paul E. McKenney
2022-08-31 18:07 ` [PATCH rcu 03/10] rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels Paul E. McKenney
` (7 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:07 UTC (permalink / raw)
To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney
From: Zqiang <qiang1.zhang@intel.com>
In non-preemptible kernels, tasks never do context switches within
RCU read-side critical sections. Therefore, in such kernels, each
leaf rcu_node structure's ->blkd_tasks list will always be empty.
The comment on the non-preemptible version of rcu_preempt_deferred_qs()
obscures this point, so this commit fixes it.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree_plugin.h | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 86772c95ed0ae..4152816dd29f6 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -932,10 +932,13 @@ static notrace bool rcu_preempt_need_deferred_qs(struct task_struct *t)
return false;
}
-// Except that we do need to respond to a request by an expedited grace
-// period for a quiescent state from this CPU. Note that requests from
-// tasks are handled when removing the task from the blocked-tasks list
-// below.
+// Except that we do need to respond to a request by an expedited
+// grace period for a quiescent state from this CPU. Note that in
+// non-preemptible kernels, there can be no context switches within RCU
+// read-side critical sections, which in turn means that the leaf rcu_node
// structure's blocked-tasks list is always empty. There is therefore no need to
+// actually check it. Instead, a quiescent state from this CPU suffices,
+// and this function is only called from such a quiescent state.
notrace void rcu_preempt_deferred_qs(struct task_struct *t)
{
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
--
2.31.1.189.g2e36527f23
* [PATCH rcu 03/10] rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
2022-08-31 18:07 ` [PATCH rcu 01/10] rcu: Fix rcu_read_unlock_strict() strict QS reporting Paul E. McKenney
2022-08-31 18:07 ` [PATCH rcu 02/10] rcu: Update rcu_preempt_deferred_qs() comments for !PREEMPT kernels Paul E. McKenney
@ 2022-08-31 18:07 ` Paul E. McKenney
2022-09-07 12:10 ` Frederic Weisbecker
2022-08-31 18:07 ` [PATCH rcu 04/10] rcu: Make tiny RCU support leak callbacks for debug-object errors Paul E. McKenney
` (6 subsequent siblings)
9 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:07 UTC (permalink / raw)
To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney
From: Zqiang <qiang1.zhang@intel.com>
Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain
preempt_count() state. Because such kernels map __rcu_read_lock()
and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU
version of the rcu_exp_handler() IPI handler function to use
preempt_count() to detect quiescent states.
This preempt_count() usage might seem to risk failures due to
use of implicit RCU readers in portions of the kernel under #ifndef
CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit
RCU readers. The moral of this story is that you must use explicit
read-side markings such as rcu_read_lock() or preempt_disable() even if
the code knows that this kernel does not support preemption.
This commit therefore adds a preempt_count()-based check for a quiescent
state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler()
function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an
immediate quiescent state when the interrupted code had both preemption
and softirqs enabled.
This change results in about a 2% reduction in expedited grace-period
latency in kernels built with both CONFIG_PREEMPT_RCU=n and
CONFIG_PREEMPT_COUNT=y.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/
---
kernel/rcu/tree_exp.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index be667583a5547..b07998159d1fa 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -828,11 +828,13 @@ static void rcu_exp_handler(void *unused)
{
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;
+ bool preempt_bh_enabled = !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
__this_cpu_read(rcu_data.cpu_no_qs.b.exp))
return;
- if (rcu_is_cpu_rrupt_from_idle()) {
+ if (rcu_is_cpu_rrupt_from_idle() ||
+ (IS_ENABLED(CONFIG_PREEMPT_COUNT) && preempt_bh_enabled)) {
rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
return;
}
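The new predicate can be sanity-checked in user space. This sketch assumes the usual preempt_count() bit layout from include/linux/preempt.h (8 preempt bits, then 8 softirq bits, with hardirq nesting counted above them); the counts below are hand-constructed values, not a real preempt_count().

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, per include/linux/preempt.h. */
#define PREEMPT_MASK	0x000000ffUL	/* preempt_disable() nesting */
#define SOFTIRQ_MASK	0x0000ff00UL	/* softirq nesting / BH disable */
#define HARDIRQ_OFFSET	0x00010000UL	/* one hardirq nesting level */

/* The test added by the patch: preemption and softirqs both enabled. */
static bool preempt_bh_enabled(unsigned long cnt)
{
	return !(cnt & (PREEMPT_MASK | SOFTIRQ_MASK));
}
```

With these values, an IPI that interrupted fully preemptible code sees only the hardirq bits set, so the predicate holds; any preempt_disable() or BH-disable nesting in the interrupted code makes it fail.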
--
2.31.1.189.g2e36527f23
* Re: [PATCH rcu 03/10] rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels
2022-08-31 18:07 ` [PATCH rcu 03/10] rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels Paul E. McKenney
@ 2022-09-07 12:10 ` Frederic Weisbecker
2022-09-07 14:57 ` Paul E. McKenney
0 siblings, 1 reply; 14+ messages in thread
From: Frederic Weisbecker @ 2022-09-07 12:10 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: rcu, linux-kernel, kernel-team, rostedt, Zqiang
On Wed, Aug 31, 2022 at 11:07:58AM -0700, Paul E. McKenney wrote:
> From: Zqiang <qiang1.zhang@intel.com>
>
> Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain
> preempt_count() state. Because such kernels map __rcu_read_lock()
> and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
> respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU
> version of the rcu_exp_handler() IPI handler function to use
> preempt_count() to detect quiescent states.
>
> This preempt_count() usage might seem to risk failures due to
> use of implicit RCU readers in portions of the kernel under #ifndef
> CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit
> RCU readers. The moral of this story is that you must use explicit
> read-side markings such as rcu_read_lock() or preempt_disable() even if
> the code knows that this kernel does not support preemption.
>
> This commit therefore adds a preempt_count()-based check for a quiescent
> state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler()
> function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an
> immediate quiescent state when the interrupted code had both preemption
> and softirqs enabled.
>
> This change results in about a 2% reduction in expedited grace-period
> latency in kernels built with both CONFIG_PREEMPT_RCU=n and
> CONFIG_PREEMPT_COUNT=y.
>
> Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/
> ---
> kernel/rcu/tree_exp.h | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> index be667583a5547..b07998159d1fa 100644
> --- a/kernel/rcu/tree_exp.h
> +++ b/kernel/rcu/tree_exp.h
> @@ -828,11 +828,13 @@ static void rcu_exp_handler(void *unused)
> {
> struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> struct rcu_node *rnp = rdp->mynode;
> + bool preempt_bh_enabled = !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
I don't know if nested hardirqs still exist. I only heard old rumours
about broken drivers. Should we take care of them?
Also are we sure that all callers of flush_smp_call_function_queue()
are QS?
Let's see: we know that rcu_exp_handler() can either be executed from:
* hardirqs
Or from process context, expected to be RCU QS states at least in idle
as the comment above flush_smp_call_function_queue() in idle says
(but I'd rather check all the in-process callers before stating all
of them are in QS)
* idle (in which case preemption is disabled unfortunately so the current
test won't help)
* stop_machine
_ When CPU is dead and out of RCU (rcutree_dead_cpu() called)
so that should be a QS.
_ When CPU is migrating (is it a QS then?)
If we check further that all non-IRQ callers of flush_smp_call_function_queue()
are always quiescent states then we could deduce that !in_hardirq() means we are in
a quiescent state, whether preemption is disabled or not.
In any case for the current patch, perhaps a more robust test against nested
hardirqs would be:
unsigned long cnt = preempt_count();
bool preempt_bh_enabled = (!cnt || cnt == HARDIRQ_OFFSET);
Thanks.
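The difference between the two tests shows up only for nested hardirqs, which can be checked with hand-built counts (same assumed bit layout as include/linux/preempt.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count() bit layout, per include/linux/preempt.h. */
#define PREEMPT_MASK	0x000000ffUL
#define SOFTIRQ_MASK	0x0000ff00UL
#define HARDIRQ_OFFSET	0x00010000UL

/* The patch's test: ignores hardirq nesting entirely. */
static bool patch_test(unsigned long cnt)
{
	return !(cnt & (PREEMPT_MASK | SOFTIRQ_MASK));
}

/* The stricter test proposed above: at most one hardirq level, nothing else. */
static bool strict_test(unsigned long cnt)
{
	return !cnt || cnt == HARDIRQ_OFFSET;
}
```

Both accept a single hardirq that interrupted preemptible code; only the stricter one rejects a nested hardirq (two HARDIRQ_OFFSET levels), which matches the "worst case is a QS not reported" observation later in the thread.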
>
> if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
> return;
> - if (rcu_is_cpu_rrupt_from_idle()) {
> + if (rcu_is_cpu_rrupt_from_idle() ||
> + (IS_ENABLED(CONFIG_PREEMPT_COUNT) && preempt_bh_enabled)) {
> rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> return;
> }
> --
> 2.31.1.189.g2e36527f23
>
* Re: [PATCH rcu 03/10] rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels
2022-09-07 12:10 ` Frederic Weisbecker
@ 2022-09-07 14:57 ` Paul E. McKenney
2022-09-07 15:14 ` Frederic Weisbecker
0 siblings, 1 reply; 14+ messages in thread
From: Paul E. McKenney @ 2022-09-07 14:57 UTC (permalink / raw)
To: Frederic Weisbecker; +Cc: rcu, linux-kernel, kernel-team, rostedt, Zqiang
On Wed, Sep 07, 2022 at 02:10:10PM +0200, Frederic Weisbecker wrote:
> On Wed, Aug 31, 2022 at 11:07:58AM -0700, Paul E. McKenney wrote:
> > From: Zqiang <qiang1.zhang@intel.com>
> >
> > Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain
> > preempt_count() state. Because such kernels map __rcu_read_lock()
> > and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
> > respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU
> > version of the rcu_exp_handler() IPI handler function to use
> > preempt_count() to detect quiescent states.
> >
> > This preempt_count() usage might seem to risk failures due to
> > use of implicit RCU readers in portions of the kernel under #ifndef
> > CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit
> > RCU readers. The moral of this story is that you must use explicit
> > read-side markings such as rcu_read_lock() or preempt_disable() even if
> > the code knows that this kernel does not support preemption.
> >
> > This commit therefore adds a preempt_count()-based check for a quiescent
> > state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler()
> > function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an
> > immediate quiescent state when the interrupted code had both preemption
> > and softirqs enabled.
> >
> > This change results in about a 2% reduction in expedited grace-period
> > latency in kernels built with both CONFIG_PREEMPT_RCU=n and
> > CONFIG_PREEMPT_COUNT=y.
> >
> > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/
> > ---
> > kernel/rcu/tree_exp.h | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index be667583a5547..b07998159d1fa 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -828,11 +828,13 @@ static void rcu_exp_handler(void *unused)
> > {
> > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > struct rcu_node *rnp = rdp->mynode;
> > + bool preempt_bh_enabled = !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
>
> I don't know if nested hardirqs still exist. I only heard old rumours
> about broken drivers. Should we take care of them?
Last I checked, certain tracing scenarios from irq handlers looked
to RCU like nested irq handlers. Given that, does your more robust
approach below work correctly?
Thanx, Paul
> Also are we sure that all callers of flush_smp_call_function_queue()
> are QS?
>
> Let's see we know that rcu_exp_handler() can either be executed from:
>
> * hardirqs
>
> Or from process context, expected to be RCU QS states at least in idle
> as the comment above flush_smp_call_function_queue() in idle says
> (but I'd rather check all the in-process callers before stating all
> of them are in QS)
>
> * idle (in which case preemption is disabled unfortunately so the current
> test won't help)
> * stop_machine
> _ When CPU is dead and out of RCU (rcutree_dead_cpu() called)
> so that should be a QS.
> _ When CPU is migrating (is it a QS then?)
>
> If we check further that all non-IRQ callers of flush_smp_call_function_queue()
> are always quiescent states then we could deduce that !in_hardirq() means we are in
> a quiescent state, whether preemption is disabled or not.
>
> In any case for the current patch, perhaps a more robust test against nested
> hardirqs would be:
>
> unsigned long cnt = preempt_count();
> bool preempt_bh_enabled = (!cnt || cnt == HARDIRQ_OFFSET)
>
> Thanks.
>
> >
> > if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> > __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
> > return;
> > - if (rcu_is_cpu_rrupt_from_idle()) {
> > + if (rcu_is_cpu_rrupt_from_idle() ||
> > + (IS_ENABLED(CONFIG_PREEMPT_COUNT) && preempt_bh_enabled)) {
> > rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> > return;
> > }
> > --
> > 2.31.1.189.g2e36527f23
> >
* Re: [PATCH rcu 03/10] rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels
2022-09-07 14:57 ` Paul E. McKenney
@ 2022-09-07 15:14 ` Frederic Weisbecker
0 siblings, 0 replies; 14+ messages in thread
From: Frederic Weisbecker @ 2022-09-07 15:14 UTC (permalink / raw)
To: Paul E. McKenney; +Cc: rcu, linux-kernel, kernel-team, rostedt, Zqiang
On Wed, Sep 07, 2022 at 07:57:59AM -0700, Paul E. McKenney wrote:
> On Wed, Sep 07, 2022 at 02:10:10PM +0200, Frederic Weisbecker wrote:
> > On Wed, Aug 31, 2022 at 11:07:58AM -0700, Paul E. McKenney wrote:
> > > From: Zqiang <qiang1.zhang@intel.com>
> > >
> > > Kernels built with CONFIG_PREEMPTION=n and CONFIG_PREEMPT_COUNT=y maintain
> > > preempt_count() state. Because such kernels map __rcu_read_lock()
> > > and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
> > > respectively, this allows the expedited grace period's !CONFIG_PREEMPT_RCU
> > > version of the rcu_exp_handler() IPI handler function to use
> > > preempt_count() to detect quiescent states.
> > >
> > > This preempt_count() usage might seem to risk failures due to
> > > use of implicit RCU readers in portions of the kernel under #ifndef
> > > CONFIG_PREEMPTION, except that rcu_core() already disallows such implicit
> > > RCU readers. The moral of this story is that you must use explicit
> > > read-side markings such as rcu_read_lock() or preempt_disable() even if
> > > the code knows that this kernel does not support preemption.
> > >
> > > This commit therefore adds a preempt_count()-based check for a quiescent
> > > state in the !CONFIG_PREEMPT_RCU version of the rcu_exp_handler()
> > > function for kernels built with CONFIG_PREEMPT_COUNT=y, reporting an
> > > immediate quiescent state when the interrupted code had both preemption
> > > and softirqs enabled.
> > >
> > > This change results in about a 2% reduction in expedited grace-period
> > > latency in kernels built with both CONFIG_PREEMPT_RCU=n and
> > > CONFIG_PREEMPT_COUNT=y.
> > >
> > > Signed-off-by: Zqiang <qiang1.zhang@intel.com>
> > > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > > Link: https://lore.kernel.org/all/20220622103549.2840087-1-qiang1.zhang@intel.com/
> > > ---
> > > kernel/rcu/tree_exp.h | 4 +++-
> > > 1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index be667583a5547..b07998159d1fa 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -828,11 +828,13 @@ static void rcu_exp_handler(void *unused)
> > > {
> > > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > > struct rcu_node *rnp = rdp->mynode;
> > > + bool preempt_bh_enabled = !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> >
> > I don't know if nested hardirqs still exist. I only heard old rumours
> > about broken drivers. Should we take care of them?
>
> Last I checked, certain tracing scenarios from irq handlers looked
> to RCU like nested irq handlers. Given that, does your more robust
> approach below work correctly?
I haven't observed that but in any case, the check I propose
is more strict than the one in this patch. So in the worst case it's
a QS not reported if a nested interrupt is detected.
Thanks.
* [PATCH rcu 04/10] rcu: Make tiny RCU support leak callbacks for debug-object errors
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
` (2 preceding siblings ...)
2022-08-31 18:07 ` [PATCH rcu 03/10] rcu: Add QS check in rcu_exp_handler() for non-preemptible kernels Paul E. McKenney
@ 2022-08-31 18:07 ` Paul E. McKenney
2022-08-31 18:08 ` [PATCH rcu 05/10] rcu: Document reason for rcu_all_qs() call to preempt_disable() Paul E. McKenney
` (5 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:07 UTC (permalink / raw)
To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney
From: Zqiang <qiang1.zhang@intel.com>
Currently, only Tree RCU leaks the callback when it detects a
duplicate call_rcu(). This commit causes Tiny RCU to also leak
the callback in this situation.
Because this is Tiny RCU, kernel size is important:
1. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=n
(Production kernel)
Original:
text data bss dec hex filename
26290663 20159823 15212544 61663030 3ace736 vmlinux
With this commit:
text data bss dec hex filename
26290663 20159823 15212544 61663030 3ace736 vmlinux
2. CONFIG_TINY_RCU=y and CONFIG_DEBUG_OBJECTS_RCU_HEAD=y
(Debugging kernel)
Original:
text data bss dec hex filename
26291319 20160143 15212544 61664006 3aceb06 vmlinux
With this commit:
text data bss dec hex filename
26291319 20160431 15212544 61664294 3acec26 vmlinux
These results show that the kernel size is unchanged for production
kernels, as desired.
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tiny.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tiny.c b/kernel/rcu/tiny.c
index f0561ee16b9c2..943d431b908f6 100644
--- a/kernel/rcu/tiny.c
+++ b/kernel/rcu/tiny.c
@@ -158,6 +158,10 @@ void synchronize_rcu(void)
}
EXPORT_SYMBOL_GPL(synchronize_rcu);
+static void tiny_rcu_leak_callback(struct rcu_head *rhp)
+{
+}
+
/*
* Post an RCU callback to be invoked after the end of an RCU grace
* period. But since we have but one CPU, that would be after any
@@ -165,9 +169,20 @@ EXPORT_SYMBOL_GPL(synchronize_rcu);
*/
void call_rcu(struct rcu_head *head, rcu_callback_t func)
{
+ static atomic_t doublefrees;
unsigned long flags;
- debug_rcu_head_queue(head);
+ if (debug_rcu_head_queue(head)) {
+ if (atomic_inc_return(&doublefrees) < 4) {
+ pr_err("%s(): Double-freed CB %p->%pS()!!! ", __func__, head, head->func);
+ mem_dump_obj(head);
+ }
+
+ if (!__is_kvfree_rcu_offset((unsigned long)head->func))
+ WRITE_ONCE(head->func, tiny_rcu_leak_callback);
+ return;
+ }
+
head->func = func;
head->next = NULL;
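The duplicate-handling flow above can be mocked in user space. The names below (head_mock, debug_head_queue(), call_rcu_mock()) are stand-ins for debug_rcu_head_queue() and the patched call_rcu(); the point is that a duplicate queue attempt swaps in a do-nothing callback, leaking the head rather than invoking it twice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*cb_t)(void *);

struct head_mock {
	bool queued;	/* stand-in for the debug-objects tracking state */
	cb_t func;
};

static void leak_callback(void *p) { }	/* like tiny_rcu_leak_callback() */
static void real_callback(void *p) { }

/* Returns nonzero on a duplicate queue, like debug_rcu_head_queue(). */
static int debug_head_queue(struct head_mock *h)
{
	if (h->queued)
		return 1;
	h->queued = true;
	return 0;
}

/* Mimics the patched call_rcu(): on a duplicate, leak instead of queueing. */
static void call_rcu_mock(struct head_mock *h, cb_t func)
{
	if (debug_head_queue(h)) {
		h->func = leak_callback;	/* leak rather than double-invoke */
		return;
	}
	h->func = func;
}
```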
--
2.31.1.189.g2e36527f23
* [PATCH rcu 05/10] rcu: Document reason for rcu_all_qs() call to preempt_disable()
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
` (3 preceding siblings ...)
2022-08-31 18:07 ` [PATCH rcu 04/10] rcu: Make tiny RCU support leak callbacks for debug-object errors Paul E. McKenney
@ 2022-08-31 18:08 ` Paul E. McKenney
2022-08-31 18:08 ` [PATCH rcu 06/10] rcu: Update rcu_access_pointer() header for rcu_dereference_protected() Paul E. McKenney
` (4 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:08 UTC (permalink / raw)
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Neeraj Upadhyay, Boqun Feng, Frederic Weisbecker
Given that rcu_all_qs() is in non-preemptible kernels, why on earth should
it invoke preempt_disable()? This commit adds the reason, which is to
work nicely with debugging enabled in CONFIG_PREEMPT_COUNT=y kernels.
Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com>
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Reported-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree_plugin.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 4152816dd29f6..c46b3c74dad1f 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -870,7 +870,7 @@ void rcu_all_qs(void)
if (!raw_cpu_read(rcu_data.rcu_urgent_qs))
return;
- preempt_disable();
+ preempt_disable(); // For CONFIG_PREEMPT_COUNT=y kernels
/* Load rcu_urgent_qs before other flags. */
if (!smp_load_acquire(this_cpu_ptr(&rcu_data.rcu_urgent_qs))) {
preempt_enable();
--
2.31.1.189.g2e36527f23
* [PATCH rcu 06/10] rcu: Update rcu_access_pointer() header for rcu_dereference_protected()
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
` (4 preceding siblings ...)
2022-08-31 18:08 ` [PATCH rcu 05/10] rcu: Document reason for rcu_all_qs() call to preempt_disable() Paul E. McKenney
@ 2022-08-31 18:08 ` Paul E. McKenney
2022-08-31 18:08 ` [PATCH rcu 07/10] sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task() Paul E. McKenney
` (3 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:08 UTC (permalink / raw)
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
Maxim Mikityanskiy
The rcu_access_pointer() docbook header correctly notes that it may be
used during post-grace-period teardown. However, it is usually better to
use rcu_dereference_protected() for this purpose. This commit therefore
calls out this preferred usage.
Reported-by: Maxim Mikityanskiy <maximmi@nvidia.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
include/linux/rcupdate.h | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index f527f27e64387..61a1a85c720c3 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -496,13 +496,21 @@ do { \
* against NULL. Although rcu_access_pointer() may also be used in cases
* where update-side locks prevent the value of the pointer from changing,
* you should instead use rcu_dereference_protected() for this use case.
+ * Within an RCU read-side critical section, there is little reason to
+ * use rcu_access_pointer().
+ *
+ * It is usually best to test the rcu_access_pointer() return value
+ * directly in order to avoid accidental dereferences being introduced
+ * by later inattentive changes. In other words, assigning the
+ * rcu_access_pointer() return value to a local variable results in an
+ * accident waiting to happen.
*
* It is also permissible to use rcu_access_pointer() when read-side
- * access to the pointer was removed at least one grace period ago, as
- * is the case in the context of the RCU callback that is freeing up
- * the data, or after a synchronize_rcu() returns. This can be useful
- * when tearing down multi-linked structures after a grace period
- * has elapsed.
+ * access to the pointer was removed at least one grace period ago, as is
+ * the case in the context of the RCU callback that is freeing up the data,
+ * or after a synchronize_rcu() returns. This can be useful when tearing
+ * down multi-linked structures after a grace period has elapsed. However,
+ * rcu_dereference_protected() is normally preferred for this use case.
*/
#define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu)
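The "test the return value directly" advice can be illustrated with a user-space mock; rcu_access_pointer_mock() below merely stands in for the real macro, which additionally carries sparse __rcu checking.

```c
#include <assert.h>
#include <stddef.h>

/* Mock only: the real rcu_access_pointer() also strips __rcu annotation. */
#define rcu_access_pointer_mock(p) (p)

struct widget { int id; };

static struct widget *gp;	/* would be __rcu-annotated in the kernel */

/*
 * Preferred style: NULL-test the result directly and never dereference
 * it, rather than parking it in a local that a later change might
 * accidentally dereference.
 */
static int widget_is_published(void)
{
	return rcu_access_pointer_mock(gp) != NULL;
}
```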
--
2.31.1.189.g2e36527f23
* [PATCH rcu 07/10] sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task()
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
` (5 preceding siblings ...)
2022-08-31 18:08 ` [PATCH rcu 06/10] rcu: Update rcu_access_pointer() header for rcu_dereference_protected() Paul E. McKenney
@ 2022-08-31 18:08 ` Paul E. McKenney
2022-08-31 18:08 ` [PATCH rcu 08/10] sched/debug: Show the registers of 'current' " Paul E. McKenney
` (2 subsequent siblings)
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:08 UTC (permalink / raw)
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Zhen Lei, Paul E . McKenney,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, Valentin Schneider
From: Zhen Lei <thunder.leizhen@huawei.com>
The trigger_single_cpu_backtrace() function attempts to send an NMI to the
target CPU, which usually provides much better stack traces than the
dump_cpu_task() function's approach of dumping that stack from some other
CPU. So much so that most calls to dump_cpu_task() only happen after
a call to trigger_single_cpu_backtrace() has failed. And the exception to
this rule really should attempt to use trigger_single_cpu_backtrace() first.
Therefore, move the trigger_single_cpu_backtrace() invocation into
dump_cpu_task().
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
---
kernel/rcu/tree_stall.h | 5 ++---
kernel/sched/core.c | 3 +++
kernel/smp.c | 3 +--
3 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index c3fbbcc09327f..5653560573e22 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -368,7 +368,7 @@ static void rcu_dump_cpu_stacks(void)
if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu)) {
if (cpu_is_offline(cpu))
pr_err("Offline CPU %d blocking current GP.\n", cpu);
- else if (!trigger_single_cpu_backtrace(cpu))
+ else
dump_cpu_task(cpu);
}
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
@@ -511,8 +511,7 @@ static void rcu_check_gp_kthread_starvation(void)
pr_err("RCU GP kthread last ran on offline CPU %d.\n", cpu);
} else {
pr_err("Stack dump where RCU GP kthread last ran:\n");
- if (!trigger_single_cpu_backtrace(cpu))
- dump_cpu_task(cpu);
+ dump_cpu_task(cpu);
}
}
wake_up_process(gpk);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee28253c9ac0c..e15b6a7f34f47 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11183,6 +11183,9 @@ struct cgroup_subsys cpu_cgrp_subsys = {
void dump_cpu_task(int cpu)
{
+ if (trigger_single_cpu_backtrace(cpu))
+ return;
+
pr_info("Task dump for CPU %d:\n", cpu);
sched_show_task(cpu_curr(cpu));
}
diff --git a/kernel/smp.c b/kernel/smp.c
index 650810a6f29b3..e8cdc025a046f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -370,8 +370,7 @@ static bool csd_lock_wait_toolong(struct __call_single_data *csd, u64 ts0, u64 *
if (cpu >= 0) {
if (static_branch_unlikely(&csdlock_debug_extended))
csd_lock_print_extended(csd, cpu);
- if (!trigger_single_cpu_backtrace(cpu))
- dump_cpu_task(cpu);
+ dump_cpu_task(cpu);
if (!cpu_cur_csd) {
pr_alert("csd: Re-sending CSD lock (#%d) IPI from CPU#%02d to CPU#%02d\n", *bug_id, raw_smp_processor_id(), cpu);
arch_send_call_function_single_ipi(cpu);
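The shape of the refactoring is easy to mock: after the patch, the NMI-backtrace attempt lives inside dump_cpu_task() itself, so callers just call dump_cpu_task() and the fallback runs only when the NMI path is unavailable. All names below are mocks of the kernel functions under discussion.

```c
#include <assert.h>
#include <stdbool.h>

static bool nmi_supported;	/* pretend arch capability flag */
static int nmi_dumps, task_dumps;

/* Mimics trigger_single_cpu_backtrace(): false if NMIs are unsupported. */
static bool trigger_single_cpu_backtrace_mock(int cpu)
{
	if (!nmi_supported)
		return false;
	nmi_dumps++;
	return true;
}

/* After the patch: try the NMI backtrace first, then fall back. */
static void dump_cpu_task_mock(int cpu)
{
	if (trigger_single_cpu_backtrace_mock(cpu))
		return;
	task_dumps++;	/* fallback: sched_show_task()-style dump */
}
```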
--
2.31.1.189.g2e36527f23
* [PATCH rcu 08/10] sched/debug: Show the registers of 'current' in dump_cpu_task()
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
` (6 preceding siblings ...)
2022-08-31 18:08 ` [PATCH rcu 07/10] sched/debug: Try trigger_single_cpu_backtrace(cpu) in dump_cpu_task() Paul E. McKenney
@ 2022-08-31 18:08 ` Paul E. McKenney
2022-08-31 18:08 ` [PATCH rcu 09/10] rcu: Avoid triggering strict-GP irq-work when RCU is idle Paul E. McKenney
2022-08-31 18:08 ` [PATCH rcu 10/10] rcu: Exclude outgoing CPU when it is the last to leave Paul E. McKenney
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:08 UTC (permalink / raw)
To: rcu
Cc: linux-kernel, kernel-team, rostedt, Zhen Lei, Paul E . McKenney,
Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Ben Segall, Mel Gorman,
Daniel Bristot de Oliveira, Valentin Schneider
From: Zhen Lei <thunder.leizhen@huawei.com>
The dump_cpu_task() function does not print registers on architectures
that do not support NMIs. However, registers can be useful for
debugging. Fortunately, in the case where dump_cpu_task() is invoked
from an interrupt handler and is dumping the current CPU's stack, the
get_irq_regs() function can be used to get the registers.
Therefore, this commit makes dump_cpu_task() check to see if it is being
asked to dump the current CPU's stack from within an interrupt handler,
and, if so, it uses the get_irq_regs() function to obtain the registers.
On systems that do support NMIs, this commit has the further advantage
of avoiding a self-NMI in this case.
This is an example of an RCU self-detected stall on arm64, which does not
support NMIs:
[ 27.501721] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 27.502238] rcu: 0-....: (1250 ticks this GP) idle=4f7/1/0x4000000000000000 softirq=2594/2594 fqs=619
[ 27.502632] (t=1251 jiffies g=2989 q=29 ncpus=4)
[ 27.503845] CPU: 0 PID: 306 Comm: test0 Not tainted 5.19.0-rc7-00009-g1c1a6c29ff99-dirty #46
[ 27.504732] Hardware name: linux,dummy-virt (DT)
[ 27.504947] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 27.504998] pc : arch_counter_read+0x18/0x24
[ 27.505301] lr : arch_counter_read+0x18/0x24
[ 27.505328] sp : ffff80000b29bdf0
[ 27.505345] x29: ffff80000b29bdf0 x28: 0000000000000000 x27: 0000000000000000
[ 27.505475] x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
[ 27.505553] x23: 0000000000001f40 x22: ffff800009849c48 x21: 000000065f871ae0
[ 27.505627] x20: 00000000000025ec x19: ffff80000a6eb300 x18: ffffffffffffffff
[ 27.505654] x17: 0000000000000001 x16: 0000000000000000 x15: ffff80000a6d0296
[ 27.505681] x14: ffffffffffffffff x13: ffff80000a29bc18 x12: 0000000000000426
[ 27.505709] x11: 0000000000000162 x10: ffff80000a2f3c18 x9 : ffff80000a29bc18
[ 27.505736] x8 : 00000000ffffefff x7 : ffff80000a2f3c18 x6 : 00000000759bd013
[ 27.505761] x5 : 01ffffffffffffff x4 : 0002dc6c00000000 x3 : 0000000000000017
[ 27.505787] x2 : 00000000000025ec x1 : ffff80000b29bdf0 x0 : 0000000075a30653
[ 27.505937] Call trace:
[ 27.506002] arch_counter_read+0x18/0x24
[ 27.506171] ktime_get+0x48/0xa0
[ 27.506207] test_task+0x70/0xf0
[ 27.506227] kthread+0x10c/0x110
[ 27.506243] ret_from_fork+0x10/0x20
This is a marked improvement over the old output:
[ 27.944550] rcu: INFO: rcu_preempt self-detected stall on CPU
[ 27.944980] rcu: 0-....: (1249 ticks this GP) idle=cbb/1/0x4000000000000000 softirq=2610/2610 fqs=614
[ 27.945407] (t=1251 jiffies g=2681 q=28 ncpus=4)
[ 27.945731] Task dump for CPU 0:
[ 27.945844] task:test0 state:R running task stack: 0 pid: 306 ppid: 2 flags:0x0000000a
[ 27.946073] Call trace:
[ 27.946151] dump_backtrace.part.0+0xc8/0xd4
[ 27.946378] show_stack+0x18/0x70
[ 27.946405] sched_show_task+0x150/0x180
[ 27.946427] dump_cpu_task+0x44/0x54
[ 27.947193] rcu_dump_cpu_stacks+0xec/0x130
[ 27.947212] rcu_sched_clock_irq+0xb18/0xef0
[ 27.947231] update_process_times+0x68/0xac
[ 27.947248] tick_sched_handle+0x34/0x60
[ 27.947266] tick_sched_timer+0x4c/0xa4
[ 27.947281] __hrtimer_run_queues+0x178/0x360
[ 27.947295] hrtimer_interrupt+0xe8/0x244
[ 27.947309] arch_timer_handler_virt+0x38/0x4c
[ 27.947326] handle_percpu_devid_irq+0x88/0x230
[ 27.947342] generic_handle_domain_irq+0x2c/0x44
[ 27.947357] gic_handle_irq+0x44/0xc4
[ 27.947376] call_on_irq_stack+0x2c/0x54
[ 27.947415] do_interrupt_handler+0x80/0x94
[ 27.947431] el1_interrupt+0x34/0x70
[ 27.947447] el1h_64_irq_handler+0x18/0x24
[ 27.947462] el1h_64_irq+0x64/0x68 <--- the above backtrace is worthless
[ 27.947474] arch_counter_read+0x18/0x24
[ 27.947487] ktime_get+0x48/0xa0
[ 27.947501] test_task+0x70/0xf0
[ 27.947520] kthread+0x10c/0x110
[ 27.947538] ret_from_fork+0x10/0x20
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Valentin Schneider <vschneid@redhat.com>
---
kernel/sched/core.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e15b6a7f34f47..60fdc0faf1c9d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -73,6 +73,7 @@
#include <uapi/linux/sched/types.h>
+#include <asm/irq_regs.h>
#include <asm/switch_to.h>
#include <asm/tlb.h>
@@ -11183,6 +11184,16 @@ struct cgroup_subsys cpu_cgrp_subsys = {
void dump_cpu_task(int cpu)
{
+ if (cpu == smp_processor_id() && in_hardirq()) {
+ struct pt_regs *regs;
+
+ regs = get_irq_regs();
+ if (regs) {
+ show_regs(regs);
+ return;
+ }
+ }
+
if (trigger_single_cpu_backtrace(cpu))
return;
--
2.31.1.189.g2e36527f23
* [PATCH rcu 09/10] rcu: Avoid triggering strict-GP irq-work when RCU is idle
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
` (7 preceding siblings ...)
2022-08-31 18:08 ` [PATCH rcu 08/10] sched/debug: Show the registers of 'current' " Paul E. McKenney
@ 2022-08-31 18:08 ` Paul E. McKenney
2022-08-31 18:08 ` [PATCH rcu 10/10] rcu: Exclude outgoing CPU when it is the last to leave Paul E. McKenney
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:08 UTC (permalink / raw)
To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Zqiang, Paul E . McKenney
From: Zqiang <qiang1.zhang@intel.com>
Kernels built with CONFIG_PREEMPT_RCU=y and CONFIG_RCU_STRICT_GRACE_PERIOD=y
trigger irq-work from rcu_read_unlock(), and the resulting irq-work handler
invokes rcu_preempt_deferred_qs_handler(). The point of this triggering
is to force grace periods to end quickly in order to give tools like KASAN
a better chance of detecting RCU usage bugs such as leaking RCU-protected
pointers out of an RCU read-side critical section.
However, this irq-work triggering is unconditional. This works, but
there is no point in doing this irq-work unless the current grace period
is waiting on the running CPU or task, which is not the common case.
After all, in the common case there are many rcu_read_unlock() calls
per CPU per grace period.
This commit therefore triggers the irq-work only when the current grace
period is waiting on the running CPU or task.
This change was tested as follows on a four-CPU system:
echo rcu_preempt_deferred_qs_handler > /sys/kernel/debug/tracing/set_ftrace_filter
echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
insmod rcutorture.ko
sleep 20
rmmod rcutorture.ko
echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
echo > /sys/kernel/debug/tracing/set_ftrace_filter
This procedure produces results in this per-CPU set of files:
/sys/kernel/debug/tracing/trace_stat/function*
Sample output from one of these files is as follows:
Function Hit Time Avg s^2
-------- --- ---- --- ---
rcu_preempt_deferred_qs_handle 838746 182650.3 us 0.217 us 0.004 us
The baseline sum of the "Hit" values (the number of calls to this
function) was 3,319,015. With this commit, that sum was 1,140,359,
for a 2.9x reduction. The worst-case variance across the CPUs was less
than 25%, so this large effect size is statistically significant.
The raw data is available in the Link: URL.
Link: https://lore.kernel.org/all/20220808022626.12825-1-qiang1.zhang@intel.com/
Signed-off-by: Zqiang <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree_plugin.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c46b3c74dad1f..207617f69aa56 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -641,7 +641,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
expboost = (t->rcu_blocked_node && READ_ONCE(t->rcu_blocked_node->exp_tasks)) ||
(rdp->grpmask & READ_ONCE(rnp->expmask)) ||
- IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ||
+ (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) &&
+ ((rdp->grpmask & READ_ONCE(rnp->qsmask)) || t->rcu_blocked_node)) ||
(IS_ENABLED(CONFIG_RCU_BOOST) && irqs_were_disabled &&
t->rcu_blocked_node);
// Need to defer quiescent state until everything is enabled.
--
2.31.1.189.g2e36527f23
* [PATCH rcu 10/10] rcu: Exclude outgoing CPU when it is the last to leave
2022-08-31 18:07 [PATCH rcu 0/7] Miscellaneous fixes for v6.1 Paul E. McKenney
` (8 preceding siblings ...)
2022-08-31 18:08 ` [PATCH rcu 09/10] rcu: Avoid triggering strict-GP irq-work when RCU is idle Paul E. McKenney
@ 2022-08-31 18:08 ` Paul E. McKenney
9 siblings, 0 replies; 14+ messages in thread
From: Paul E. McKenney @ 2022-08-31 18:08 UTC (permalink / raw)
To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney
The rcu_boost_kthread_setaffinity() function removes the outgoing CPU
from the set_cpus_allowed() mask for the corresponding leaf rcu_node
structure's rcub priority-boosting kthread. Except that if the outgoing
CPU will leave that structure without any online CPUs, the mask is set
to the housekeeping CPU mask from housekeeping_cpumask(). Which is fine
unless the outgoing CPU happens to be a housekeeping CPU.
This commit therefore removes the outgoing CPU from the housekeeping mask.
This would of course be problematic if the outgoing CPU was the last
online housekeeping CPU, but in that case you are in a world of hurt
anyway. If someone comes up with a valid use case for a system needing
all the housekeeping CPUs to be offline, further adjustments can be made.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
kernel/rcu/tree_plugin.h | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 207617f69aa56..32b424b571bd9 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -1243,8 +1243,11 @@ static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu)
cpu != outgoingcpu)
cpumask_set_cpu(cpu, cm);
cpumask_and(cm, cm, housekeeping_cpumask(HK_TYPE_RCU));
- if (cpumask_empty(cm))
+ if (cpumask_empty(cm)) {
cpumask_copy(cm, housekeeping_cpumask(HK_TYPE_RCU));
+ if (outgoingcpu >= 0)
+ cpumask_clear_cpu(outgoingcpu, cm);
+ }
set_cpus_allowed_ptr(t, cm);
mutex_unlock(&rnp->boost_kthread_mutex);
free_cpumask_var(cm);
--
2.31.1.189.g2e36527f23