public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] RCU changes for PREEMPT_LAZY
@ 2024-11-06 20:17 Ankur Arora
  2024-11-06 20:17 ` [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs() Ankur Arora
                   ` (6 more replies)
  0 siblings, 7 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, ankur.a.arora, efault, sshegde,
	boris.ostrovsky

This series adds RCU and some leftover scheduler bits for lazy
preemption.

The main problem addressed in the RCU-related patches is that before
PREEMPT_LAZY, PREEMPTION=y implied PREEMPT_RCU=y. With PREEMPT_LAZY,
that's no longer true.

That's because PREEMPT_RCU makes some trade-offs to optimize for
latency as opposed to throughput, and configurations with limited
preemption might prefer the stronger forward-progress guarantees of
PREEMPT_RCU=n.

Accordingly, with standalone PREEMPT_LAZY (much like PREEMPT_NONE,
PREEMPT_VOLUNTARY) we want to use PREEMPT_RCU=n. And, when used in
conjunction with PREEMPT_DYNAMIC, we continue to use PREEMPT_RCU=y.

Patches 1 and 2 are cleanup patches:
  "rcu: fix header guard for rcu_all_qs()"
  "rcu: rename PREEMPT_AUTO to PREEMPT_LAZY"

Patch 3, "rcu: limit PREEMPT_RCU configurations", explicitly limits
PREEMPT_RCU=y to the PREEMPT_DYNAMIC or the latency oriented models.

Patches 4 and 5,
  "rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y"
  "osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y"

handle quiescent states for the (PREEMPT_LAZY=y, PREEMPT_RCU=n)
configuration.

And finally, patch 6,
  "sched: warn for high latency with TIF_NEED_RESCHED_LAZY"
adds a high-latency warning for TIF_NEED_RESCHED_LAZY.

Goes on top of PeterZ's tree:

 git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/core

Changelog:
  - fixed up incorrect usage of tif_need_resched_lazy() (comment from
    Sebastian Andrzej Siewior)
  - massaged the commit messages a bit
  - dropped the powerpc support for PREEMPT_LAZY as that was orthogonal
    to this series (Shrikanth will send that out separately.)

Please review.

Ankur Arora (6):
  rcu: fix header guard for rcu_all_qs()
  rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
  rcu: limit PREEMPT_RCU configurations
  rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
  osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  sched: warn for high latency with TIF_NEED_RESCHED_LAZY

 include/linux/rcutree.h      |  2 +-
 include/linux/srcutiny.h     |  2 +-
 kernel/rcu/Kconfig           |  4 ++--
 kernel/rcu/srcutiny.c        | 14 +++++++-------
 kernel/rcu/tree_plugin.h     | 11 +++++++----
 kernel/sched/core.c          |  3 ++-
 kernel/sched/debug.c         |  7 +++++--
 kernel/trace/trace_osnoise.c | 22 ++++++++++++----------
 8 files changed, 37 insertions(+), 28 deletions(-)

-- 
2.43.5


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs()
  2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
@ 2024-11-06 20:17 ` Ankur Arora
  2024-11-13 14:50   ` Frederic Weisbecker
  2024-11-14  7:06   ` Sebastian Andrzej Siewior
  2024-11-06 20:17 ` [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY Ankur Arora
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, ankur.a.arora, efault, sshegde,
	boris.ostrovsky

rcu_all_qs() is defined for !CONFIG_PREEMPT_RCU but the declaration
is conditioned on CONFIG_PREEMPTION.

With CONFIG_PREEMPT_LAZY, CONFIG_PREEMPTION=y does not imply
CONFIG_PREEMPT_RCU=y.

Decouple the two.

Cc: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 include/linux/rcutree.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h
index 90a684f94776..ae8b5cb475a3 100644
--- a/include/linux/rcutree.h
+++ b/include/linux/rcutree.h
@@ -104,7 +104,7 @@ extern int rcu_scheduler_active;
 void rcu_end_inkernel_boot(void);
 bool rcu_inkernel_boot_has_ended(void);
 bool rcu_is_watching(void);
-#ifndef CONFIG_PREEMPTION
+#ifndef CONFIG_PREEMPT_RCU
 void rcu_all_qs(void);
 #endif
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
  2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
  2024-11-06 20:17 ` [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs() Ankur Arora
@ 2024-11-06 20:17 ` Ankur Arora
  2024-11-13 14:59   ` Frederic Weisbecker
  2024-11-14  7:07   ` Sebastian Andrzej Siewior
  2024-11-06 20:17 ` [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations Ankur Arora
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, ankur.a.arora, efault, sshegde,
	boris.ostrovsky

Replace mentions of PREEMPT_AUTO with PREEMPT_LAZY.

Also, since PREEMPT_LAZY implies PREEMPTION, we can just reduce the
TASKS_RCU selection criteria from:

  NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO)
to:
  NEED_TASKS_RCU && PREEMPTION

CC: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 include/linux/srcutiny.h |  2 +-
 kernel/rcu/Kconfig       |  2 +-
 kernel/rcu/srcutiny.c    | 14 +++++++-------
 3 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h
index 4d96bbdb45f0..1635c5e2662f 100644
--- a/include/linux/srcutiny.h
+++ b/include/linux/srcutiny.h
@@ -64,7 +64,7 @@ static inline int __srcu_read_lock(struct srcu_struct *ssp)
 {
 	int idx;
 
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	idx = ((READ_ONCE(ssp->srcu_idx) + 1) & 0x2) >> 1;
 	WRITE_ONCE(ssp->srcu_lock_nesting[idx], READ_ONCE(ssp->srcu_lock_nesting[idx]) + 1);
 	preempt_enable();
diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 3e079de0f5b4..5a7ff5e1cdcb 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -91,7 +91,7 @@ config NEED_TASKS_RCU
 
 config TASKS_RCU
 	bool
-	default NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO)
+	default NEED_TASKS_RCU && PREEMPTION
 	select IRQ_WORK
 
 config FORCE_TASKS_RUDE_RCU
diff --git a/kernel/rcu/srcutiny.c b/kernel/rcu/srcutiny.c
index 549c03336ee9..8a662d911abd 100644
--- a/kernel/rcu/srcutiny.c
+++ b/kernel/rcu/srcutiny.c
@@ -98,7 +98,7 @@ void __srcu_read_unlock(struct srcu_struct *ssp, int idx)
 {
 	int newval;
 
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	newval = READ_ONCE(ssp->srcu_lock_nesting[idx]) - 1;
 	WRITE_ONCE(ssp->srcu_lock_nesting[idx], newval);
 	preempt_enable();
@@ -120,7 +120,7 @@ void srcu_drive_gp(struct work_struct *wp)
 	struct srcu_struct *ssp;
 
 	ssp = container_of(wp, struct srcu_struct, srcu_work);
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	if (ssp->srcu_gp_running || ULONG_CMP_GE(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max))) {
 		preempt_enable();
 		return; /* Already running or nothing to do. */
@@ -138,7 +138,7 @@ void srcu_drive_gp(struct work_struct *wp)
 	WRITE_ONCE(ssp->srcu_gp_waiting, true);  /* srcu_read_unlock() wakes! */
 	preempt_enable();
 	swait_event_exclusive(ssp->srcu_wq, !READ_ONCE(ssp->srcu_lock_nesting[idx]));
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	WRITE_ONCE(ssp->srcu_gp_waiting, false); /* srcu_read_unlock() cheap. */
 	WRITE_ONCE(ssp->srcu_idx, ssp->srcu_idx + 1);
 	preempt_enable();
@@ -159,7 +159,7 @@ void srcu_drive_gp(struct work_struct *wp)
 	 * at interrupt level, but the ->srcu_gp_running checks will
 	 * straighten that out.
 	 */
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	WRITE_ONCE(ssp->srcu_gp_running, false);
 	idx = ULONG_CMP_LT(ssp->srcu_idx, READ_ONCE(ssp->srcu_idx_max));
 	preempt_enable();
@@ -172,7 +172,7 @@ static void srcu_gp_start_if_needed(struct srcu_struct *ssp)
 {
 	unsigned long cookie;
 
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	cookie = get_state_synchronize_srcu(ssp);
 	if (ULONG_CMP_GE(READ_ONCE(ssp->srcu_idx_max), cookie)) {
 		preempt_enable();
@@ -199,7 +199,7 @@ void call_srcu(struct srcu_struct *ssp, struct rcu_head *rhp,
 
 	rhp->func = func;
 	rhp->next = NULL;
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	local_irq_save(flags);
 	*ssp->srcu_cb_tail = rhp;
 	ssp->srcu_cb_tail = &rhp->next;
@@ -261,7 +261,7 @@ unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
 {
 	unsigned long ret;
 
-	preempt_disable();  // Needed for PREEMPT_AUTO
+	preempt_disable();  // Needed for PREEMPT_LAZY
 	ret = get_state_synchronize_srcu(ssp);
 	srcu_gp_start_if_needed(ssp);
 	preempt_enable();
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
  2024-11-06 20:17 ` [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs() Ankur Arora
  2024-11-06 20:17 ` [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY Ankur Arora
@ 2024-11-06 20:17 ` Ankur Arora
  2024-11-13 15:31   ` Frederic Weisbecker
  2024-11-06 20:17 ` [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, ankur.a.arora, efault, sshegde,
	boris.ostrovsky

PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC,
which allows for dynamic switching of preemption models.

The choice of PREEMPT_RCU or not, however, is fixed at compile time.

Given that PREEMPT_RCU makes some trade-offs to optimize for latency
as opposed to throughput, configurations with limited preemption
might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.

Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
model PREEMPT_DYNAMIC.

This means the throughput oriented models, PREEMPT_NONE,
PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.

Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/rcu/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
index 5a7ff5e1cdcb..9d52f87fac27 100644
--- a/kernel/rcu/Kconfig
+++ b/kernel/rcu/Kconfig
@@ -18,7 +18,7 @@ config TREE_RCU
 
 config PREEMPT_RCU
 	bool
-	default y if PREEMPTION
+	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
 	select TREE_RCU
 	help
 	  This option selects the RCU implementation that is
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
  2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
                   ` (2 preceding siblings ...)
  2024-11-06 20:17 ` [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations Ankur Arora
@ 2024-11-06 20:17 ` Ankur Arora
  2024-11-14  8:50   ` Sebastian Andrzej Siewior
  2024-11-28 13:36   ` Frederic Weisbecker
  2024-11-06 20:17 ` [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, ankur.a.arora, efault, sshegde,
	boris.ostrovsky

With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
states for read-side critical sections via rcu_all_qs().
One reason this was needed: lacking a preempt count, the tick
handler has no way of knowing whether it is executing in a read-side
critical section or not.

With PREEMPT_LAZY=y, there can be configurations with PREEMPT_COUNT=y,
PREEMPT_RCU=n, where cond_resched() is a stub that does not provide
quiescent states via rcu_all_qs().

So, use the availability of preempt_count() to report quiescent states
in rcu_flavor_sched_clock_irq().

Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/rcu/tree_plugin.h | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1c7cbd145d5e..da324d66034b 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -974,13 +974,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
  */
 static void rcu_flavor_sched_clock_irq(int user)
 {
-	if (user || rcu_is_cpu_rrupt_from_idle()) {
+	if (user || rcu_is_cpu_rrupt_from_idle() ||
+	     (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
+	      !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {
 
 		/*
 		 * Get here if this CPU took its interrupt from user
-		 * mode or from the idle loop, and if this is not a
-		 * nested interrupt.  In this case, the CPU is in
-		 * a quiescent state, so note it.
+		 * mode, from the idle loop without this being a nested
+		 * interrupt, or while not holding a preempt count (but
+		 * with PREEMPT_COUNT=y). In this case, the CPU is in a
+		 * quiescent state, so note it.
 		 *
 		 * No memory barrier is required here because rcu_qs()
 		 * references only CPU-local variables that other CPUs
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
                   ` (3 preceding siblings ...)
  2024-11-06 20:17 ` [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
@ 2024-11-06 20:17 ` Ankur Arora
  2024-11-14  9:22   ` Sebastian Andrzej Siewior
  2024-11-28 14:33   ` Frederic Weisbecker
  2024-11-06 20:17 ` [PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY Ankur Arora
  2024-11-14 10:01 ` [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Sebastian Andrzej Siewior
  6 siblings, 2 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, ankur.a.arora, efault, sshegde,
	boris.ostrovsky, Daniel Bristot de Oliveira

To reduce RCU noise for nohz_full configurations, osnoise depends
on cond_resched() providing quiescent states for PREEMPT_RCU=n
configurations. For PREEMPT_RCU=y configurations, it does this by
directly calling rcu_momentary_eqs().

With PREEMPT_LAZY=y, however, we can have configurations with
(PREEMPTION=y, PREEMPT_RCU=n), which means neither of the above
can help.

Handle that by falling back to the explicit quiescent states via
rcu_momentary_eqs().

Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Suggested-by: Paul E. McKenney <paulmck@kernel.org>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/trace/trace_osnoise.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
index a50ed23bee77..15e9600d231d 100644
--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -1538,18 +1538,20 @@ static int run_osnoise(void)
 		/*
 		 * In some cases, notably when running on a nohz_full CPU with
 		 * a stopped tick PREEMPT_RCU has no way to account for QSs.
-		 * This will eventually cause unwarranted noise as PREEMPT_RCU
-		 * will force preemption as the means of ending the current
-		 * grace period. We avoid this problem by calling
-		 * rcu_momentary_eqs(), which performs a zero duration
-		 * EQS allowing PREEMPT_RCU to end the current grace period.
-		 * This call shouldn't be wrapped inside an RCU critical
-		 * section.
+		 * This will eventually cause unwarranted noise as RCU forces
+		 * preemption as the means of ending the current grace period.
+		 * We avoid this by calling rcu_momentary_eqs(), which performs
+		 * a zero duration EQS allowing RCU to end the current grace
+		 * period. This call shouldn't be wrapped inside an RCU
+		 * critical section.
 		 *
-		 * Note that in non PREEMPT_RCU kernels QSs are handled through
-		 * cond_resched()
+		 * For non-PREEMPT_RCU kernels with cond_resched() (non-
+		 * PREEMPT_LAZY configurations), QSs are handled through
+		 * cond_resched(). For PREEMPT_LAZY kernels, we fallback to
+		 * the zero duration QS via rcu_momentary_eqs().
 		 */
-		if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
+		if (IS_ENABLED(CONFIG_PREEMPT_RCU) ||
+		    (!IS_ENABLED(CONFIG_PREEMPT_RCU) && IS_ENABLED(CONFIG_PREEMPTION))) {
 			if (!disable_irq)
 				local_irq_disable();
 
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY
  2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
                   ` (4 preceding siblings ...)
  2024-11-06 20:17 ` [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
@ 2024-11-06 20:17 ` Ankur Arora
  2024-11-14  9:16   ` Sebastian Andrzej Siewior
  2024-11-14 10:01 ` [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Sebastian Andrzej Siewior
  6 siblings, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-06 20:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, ankur.a.arora, efault, sshegde,
	boris.ostrovsky, Ingo Molnar

Add support for warning if the TIF_NEED_RESCHED_LAZY bit is set
without rescheduling for more than the latency_warn_ms period.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
---
 kernel/sched/core.c  | 3 ++-
 kernel/sched/debug.c | 7 +++++--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5c47d70f4204..077ea42a17f1 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5579,7 +5579,8 @@ static u64 cpu_resched_latency(struct rq *rq)
 	if (sysctl_resched_latency_warn_once && warned_once)
 		return 0;
 
-	if (!need_resched() || !latency_warn_ms)
+	if ((!need_resched() && !tif_test_bit(TIF_NEED_RESCHED_LAZY)) ||
+	    !latency_warn_ms)
 		return 0;
 
 	if (system_state == SYSTEM_BOOTING)
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index a48b2a701ec2..6c1a5305a1b3 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1293,9 +1293,12 @@ void proc_sched_set_task(struct task_struct *p)
 void resched_latency_warn(int cpu, u64 latency)
 {
 	static DEFINE_RATELIMIT_STATE(latency_check_ratelimit, 60 * 60 * HZ, 1);
+	char *nr;
+
+	nr = tif_need_resched() ? "need_resched" : "need_resched_lazy";
 
 	WARN(__ratelimit(&latency_check_ratelimit),
-	     "sched: CPU %d need_resched set for > %llu ns (%d ticks) "
+	     "sched: CPU %d %s set for > %llu ns (%d ticks) "
 	     "without schedule\n",
-	     cpu, latency, cpu_rq(cpu)->ticks_without_resched);
+	     cpu, nr, latency, cpu_rq(cpu)->ticks_without_resched);
 }
-- 
2.43.5


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs()
  2024-11-06 20:17 ` [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs() Ankur Arora
@ 2024-11-13 14:50   ` Frederic Weisbecker
  2024-11-14  7:06   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-13 14:50 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky

On Wed, Nov 06, 2024 at 12:17:53PM -0800, Ankur Arora wrote:
> rcu_all_qs() is defined for !CONFIG_PREEMPT_RCU but the declaration
> is conditioned on CONFIG_PREEMPTION.
> 
> With CONFIG_PREEMPT_LAZY, CONFIG_PREEMPTION=y does not imply
> CONFIG_PREEMPT_RCU=y.
> 
> Decouple the two.
> 
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
  2024-11-06 20:17 ` [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY Ankur Arora
@ 2024-11-13 14:59   ` Frederic Weisbecker
  2024-11-13 23:51     ` Ankur Arora
  2024-11-14  7:07   ` Sebastian Andrzej Siewior
  1 sibling, 1 reply; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-13 14:59 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky

On Wed, Nov 06, 2024 at 12:17:54PM -0800, Ankur Arora wrote:
> Replace mentions of PREEMPT_AUTO with PREEMPT_LAZY.
> 
> Also, since PREEMPT_LAZY implies PREEMPTION, we can just reduce the
> TASKS_RCU selection criteria from:
> 
>   NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO)
> to:
>   NEED_TASKS_RCU && PREEMPTION
> 
> CC: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-06 20:17 ` [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations Ankur Arora
@ 2024-11-13 15:31   ` Frederic Weisbecker
  2024-11-14  0:23     ` Ankur Arora
  2024-11-25 21:40     ` Ankur Arora
  0 siblings, 2 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-13 15:31 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky

On Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora wrote:
> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
> which allows for dynamic switching of preemption models.
> 
> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
> 
> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
> as opposed to throughput, configurations with limited preemption
> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
> 
> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
> model PREEMPT_DYNAMIC.
> 
> This means the throughput oriented models, PREEMPT_NONE,
> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
> 
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  kernel/rcu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> index 5a7ff5e1cdcb..9d52f87fac27 100644
> --- a/kernel/rcu/Kconfig
> +++ b/kernel/rcu/Kconfig
> @@ -18,7 +18,7 @@ config TREE_RCU
>  
>  config PREEMPT_RCU
>  	bool
> -	default y if PREEMPTION
> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
>  	select TREE_RCU
>  	help
>  	  This option selects the RCU implementation that is

Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
some issues now that the code can be preemptible. Well I think
it has always been preemptible but PREEMPTION && !PREEMPT_RCU
has seldom been exercised (or was it even possible?).

For example rcu_read_unlock_strict() can be called with preemption
enabled so we need the following otherwise the rdp is unstable, the
norm value becomes racy (though automagically fixed in rcu_report_qs_rdp())
and rcu_report_qs_rdp() might warn.

diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 58d84c59f3dd..368f00267d4e 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
 
 static inline void __rcu_read_unlock(void)
 {
-	preempt_enable();
 	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
 		rcu_read_unlock_strict();
+	preempt_enable();
 }
 
 static inline int rcu_preempt_depth(void)


Let me audit further if we missed something else...

Thanks.

> -- 
> 2.43.5
> 

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
  2024-11-13 14:59   ` Frederic Weisbecker
@ 2024-11-13 23:51     ` Ankur Arora
  0 siblings, 0 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-13 23:51 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky


Frederic Weisbecker <frederic@kernel.org> writes:

> On Wed, Nov 06, 2024 at 12:17:54PM -0800, Ankur Arora wrote:
>> Replace mentions of PREEMPT_AUTO with PREEMPT_LAZY.
>>
>> Also, since PREEMPT_LAZY implies PREEMPTION, we can just reduce the
>> TASKS_RCU selection criteria from:
>>
>>   NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO)
>> to:
>>   NEED_TASKS_RCU && PREEMPTION
>>
>> CC: Paul E. McKenney <paulmck@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!

--
ankur

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-13 15:31   ` Frederic Weisbecker
@ 2024-11-14  0:23     ` Ankur Arora
  2024-11-14  8:25       ` Sebastian Andrzej Siewior
  2024-11-25 21:40     ` Ankur Arora
  1 sibling, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-14  0:23 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky


Frederic Weisbecker <frederic@kernel.org> writes:

> On Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora wrote:
>> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
>> which allows for dynamic switching of preemption models.
>>
>> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
>>
>> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
>> as opposed to throughput, configurations with limited preemption
>> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
>>
>> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
>> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
>> model PREEMPT_DYNAMIC.
>>
>> This means the throughput oriented models, PREEMPT_NONE,
>> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
>>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>  kernel/rcu/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>> index 5a7ff5e1cdcb..9d52f87fac27 100644
>> --- a/kernel/rcu/Kconfig
>> +++ b/kernel/rcu/Kconfig
>> @@ -18,7 +18,7 @@ config TREE_RCU
>>
>>  config PREEMPT_RCU
>>  	bool
>> -	default y if PREEMPTION
>> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
>>  	select TREE_RCU
>>  	help
>>  	  This option selects the RCU implementation that is
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>

Thanks!

> But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
> some issues now that the code can be preemptible. Well I think
> it has always been preemptible but PREEMPTION && !PREEMPT_RCU
> has seldom been exercised (or was it even possible?).
>
> For example rcu_read_unlock_strict() can be called with preemption
> enabled so we need the following otherwise the rdp is unstable, the
> norm value becomes racy

I think I see your point about rdp being racy, but given that
rcu_read_unlock_strict() would always be called with a non-zero
preemption count (with CONFIG_PREEMPTION), wouldn't the preempt_count()
check defeat any calls to rcu_read_unlock_strict()?

    void rcu_read_unlock_strict(void)
    {
            struct rcu_data *rdp;

            if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
                    return;

Or am I missing something?

Ankur

> (though automagically fixed in rcu_report_qs_rdp())
> and rcu_report_qs_rdp() might warn.
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 58d84c59f3dd..368f00267d4e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
>
>  static inline void __rcu_read_unlock(void)
>  {
> -	preempt_enable();
>  	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
>  		rcu_read_unlock_strict();
> +	preempt_enable();
>  }
>
>  static inline int rcu_preempt_depth(void)
>
>
> Let me audit further if we missed something else...
>
> Thanks.
>
>> --
>> 2.43.5
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs()
  2024-11-06 20:17 ` [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs() Ankur Arora
  2024-11-13 14:50   ` Frederic Weisbecker
@ 2024-11-14  7:06   ` Sebastian Andrzej Siewior
  2024-11-15  4:55     ` Ankur Arora
  1 sibling, 1 reply; 36+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-11-14  7:06 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky

On 2024-11-06 12:17:53 [-0800], Ankur Arora wrote:
> rcu_all_qs() is defined for !CONFIG_PREEMPT_RCU but the declaration
> is conditioned on CONFIG_PREEMPTION.
> 
> With CONFIG_PREEMPT_LAZY, CONFIG_PREEMPTION=y does not imply
> CONFIG_PREEMPT_RCU=y.
> 
> Decouple the two.
> 
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY
  2024-11-06 20:17 ` [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY Ankur Arora
  2024-11-13 14:59   ` Frederic Weisbecker
@ 2024-11-14  7:07   ` Sebastian Andrzej Siewior
  1 sibling, 0 replies; 36+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-11-14  7:07 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky

On 2024-11-06 12:17:54 [-0800], Ankur Arora wrote:
> Replace mentions of PREEMPT_AUTO with PREEMPT_LAZY.
> 
> Also, since PREEMPT_LAZY implies PREEMPTION, we can just reduce the
> TASKS_RCU selection criteria from:
> 
>   NEED_TASKS_RCU && (PREEMPTION || PREEMPT_AUTO)
> to:
>   NEED_TASKS_RCU && PREEMPTION
> 
> CC: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Sebastian


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-14  0:23     ` Ankur Arora
@ 2024-11-14  8:25       ` Sebastian Andrzej Siewior
  2024-11-14 11:36         ` Frederic Weisbecker
  0 siblings, 1 reply; 36+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-11-14  8:25 UTC (permalink / raw)
  To: Ankur Arora
  Cc: Frederic Weisbecker, linux-kernel, peterz, tglx, paulmck, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, efault, sshegde, boris.ostrovsky

On 2024-11-13 16:23:03 [-0800], Ankur Arora wrote:
> > But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
> > some issues now that the code can be preemptible. Well I think
> > it has always been preemptible but PREEMPTION && !PREEMPT_RCU
> > has seldom been exercised (or was it even possible?).
> >
> > For example rcu_read_unlock_strict() can be called with preemption
> > enabled so we need the following otherwise the rdp is unstable, the
> > norm value becomes racy
> 
> I think I see your point about rdp being racy, but given that
> rcu_read_unlock_strict() would always be called with a non-zero
> preemption count (with CONFIG_PREEMPTION), wouldn't the preempt_count()
> check defeat any calls to rcu_read_unlock_strict()?
> 
>     void rcu_read_unlock_strict(void)
>     {
>             struct rcu_data *rdp;
> 
>             if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
>                     return;
> 
> Or am I missing something?

This is indeed broken. Moving preempt_enable() as Frederic suggested
makes rcu_read_unlock_strict() a NOP. Keeping this as-is results
in splats due to this_cpu_ptr() in preemptible regions. Looking further
we have "rdp->cpu != smp_processor_id()" as the next candidate.

The preempt_disable() should instead go into rcu_read_unlock_strict(),
after the early-return check.

> Ankur

Sebastian


* Re: [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
  2024-11-06 20:17 ` [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
@ 2024-11-14  8:50   ` Sebastian Andrzej Siewior
  2024-11-15  4:58     ` Ankur Arora
  2024-11-28 13:36   ` Frederic Weisbecker
  1 sibling, 1 reply; 36+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-11-14  8:50 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky

On 2024-11-06 12:17:56 [-0800], Ankur Arora wrote:
> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
> states for read-side critical sections via rcu_all_qs().
> One reason why this was needed: lacking a preempt count, the tick
> handler has no way of knowing whether it is executing in a read-side
> critical section or not.
> 
> With PREEMPT_LAZY=y, there can be configurations with PREEMPT_COUNT=y,
> PREEMPT_RCU=n, where cond_resched() is a stub that does not provide
> quiescent states via rcu_all_qs().

With PREEMPT_LAZY=y && PREEMPT_DYNAMIC=n we get PREEMPT_COUNT=y and
PREEMPT_RCU=n. In this configuration cond_resched() is an empty stub and
does not provide quiescent states via rcu_all_qs(). PREEMPT_RCU=y
provides this information via rcu_read_unlock() and its nesting counter.

> So, use the availability of preempt_count() to report quiescent states
> in rcu_flavor_sched_clock_irq().

Okay. You might also want to update the cond_resched() comment,
	s@In preemptible kernels, ->rcu_read_lock_nesting@
	  In PREEMPT_RCU kernels, ->rcu_read_lock_nesting@

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Sebastian


* Re: [PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY
  2024-11-06 20:17 ` [PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY Ankur Arora
@ 2024-11-14  9:16   ` Sebastian Andrzej Siewior
  0 siblings, 0 replies; 36+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-11-14  9:16 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky, Ingo Molnar

On 2024-11-06 12:17:58 [-0800], Ankur Arora wrote:
> Add support for warning if the TIF_NEED_RESCHED_LAZY bit is set
> without rescheduling for more than the latency_warn_ms period.

You fail to explain _why_ it is also required to check that
TIF_NEED_RESCHED_LAZY is not set.

The problem with NEED_RESCHED set but no scheduling within 100ms is a
long preempt-off or IRQ-off region.
The problem with NEED_RESCHED_LAZY set but no scheduling within 100ms is
a missing timer tick in that time. The previously mentioned causes may
also apply.
And the code acting on NEED_RESCHED_LAZY is invoked before this check
is.

Sebastian


* Re: [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  2024-11-06 20:17 ` [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
@ 2024-11-14  9:22   ` Sebastian Andrzej Siewior
  2024-11-15  4:59     ` Ankur Arora
  2024-11-28 14:33   ` Frederic Weisbecker
  1 sibling, 1 reply; 36+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-11-14  9:22 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky,
	Daniel Bristot de Oliveira

On 2024-11-06 12:17:57 [-0800], Ankur Arora wrote:
> To reduce RCU noise for nohz_full configurations, osnoise depends
> on cond_resched() providing quiescent states for PREEMPT_RCU=n
> configurations. And, for PREEMPT_RCU=y configurations, it does this
> by directly calling rcu_momentary_eqs().
> 
> With PREEMPT_LAZY=y, however, we can have configurations with
> (PREEMPTION=y, PREEMPT_RCU=n), which means neither of the above
> can help.

The problem is, as you say, CONFIG_PREEMPT_RCU=n + CONFIG_PREEMPTION=y.
You can't select either of those two directly but get there via
 PREEMPT_LAZY=y + PREEMPT_DYNAMIC=n.

Please spell it out to make it obvious. It is not a large group of
configurations, it is exactly this combo.

 With PREEMPT_LAZY=y + PREEMPT_DYNAMIC=n however we get PREEMPT_RCU=n
 which means no direct rcu_momentary_eqs() invocations and
 cond_resched() is an empty stub.

> Handle that by falling back to explicit quiescent states via
> rcu_momentary_eqs().

> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>

Sebastian


* Re: [PATCH v2 0/6] RCU changes for PREEMPT_LAZY
  2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
                   ` (5 preceding siblings ...)
  2024-11-06 20:17 ` [PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY Ankur Arora
@ 2024-11-14 10:01 ` Sebastian Andrzej Siewior
  2024-11-15  5:20   ` Ankur Arora
  6 siblings, 1 reply; 36+ messages in thread
From: Sebastian Andrzej Siewior @ 2024-11-14 10:01 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky

On 2024-11-06 12:17:52 [-0800], Ankur Arora wrote:
> This series adds RCU and some leftover scheduler bits for lazy
> preemption.

This is not critical for the current implementation. The way I
understand it, you make a change in 3/6 and then all the other patches
in this series are required to deal with it.

For bisection reasons it would make sense to have 3/6 last in the series
and do the "fixes" first, before the code is enabled. I mean if you
apply 3/6 first then you get build failures without 1/6. But with 3/6
before 5/6 you should get runtime errors, right?

> The main problem addressed in the RCU related patches is that before
> PREEMPT_LAZY, PREEMPTION=y implied PREEMPT_RCU=y. With PREEMPT_LAZY,
> that's no longer true. 

No, you want to make PREEMPTION=y + PREEMPT_RCU=n + PREEMPT_LAZY=y
possible. This is different. Your wording makes it sound like there _is_
an actual problem.

> That's because PREEMPT_RCU makes some trade-offs to optimize for
> latency as opposed to throughput, and configurations with limited
> preemption might prefer the stronger forward-progress guarantees of
> PREEMPT_RCU=n.
> 
> Accordingly, with standalone PREEMPT_LAZY (much like PREEMPT_NONE,
> PREEMPT_VOLUNTARY) we want to use PREEMPT_RCU=n. And, when used in
> conjunction with PREEMPT_DYNAMIC, we continue to use PREEMPT_RCU=y.
> 
> Patches 1 and 2 are cleanup patches:
>   "rcu: fix header guard for rcu_all_qs()"
>   "rcu: rename PREEMPT_AUTO to PREEMPT_LAZY"
> 
> Patch 3, "rcu: limit PREEMPT_RCU configurations", explicitly limits
> PREEMPT_RCU=y to the PREEMPT_DYNAMIC or the latency oriented models.
> 
> Patches 4 and 5,
>   "rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y"
>   "osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y"
> 
> handle quiescent states for the (PREEMPT_LAZY=y, PREEMPT_RCU=n)
> configuration.

I was briefly thinking about 

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5646,8 +5646,11 @@ void sched_tick(void)
 	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
 	update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
 
-	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY))
+	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY)) {
 		resched_curr(rq);
+		if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
+			rcu_all_qs();
+	}
 
 	donor->sched_class->task_tick(rq, donor, 0);
 	if (sched_feat(LATENCY_WARN))

which should make #4 + #5 obsolete. But I think it is nicer to have the
change in #4 since it extends the check to cover all cases. And then
we would do it twice just for osnoise.

Sebastian


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-14  8:25       ` Sebastian Andrzej Siewior
@ 2024-11-14 11:36         ` Frederic Weisbecker
  0 siblings, 0 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-14 11:36 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, efault, sshegde, boris.ostrovsky

Le Thu, Nov 14, 2024 at 09:25:34AM +0100, Sebastian Andrzej Siewior a écrit :
> On 2024-11-13 16:23:03 [-0800], Ankur Arora wrote:
> > > But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
> > > some issues now that the code can be preemptible. Well I think
> > > it has always been preemptible but PREEMPTION && !PREEMPT_RCU
> > > has seldom been exercised (or was it even possible?).
> > >
> > > For example rcu_read_unlock_strict() can be called with preemption
> > > enabled so we need the following otherwise the rdp is unstable, the
> > > norm value becomes racy
> > 
> > I think I see your point about rdp being racy, but given that
> > rcu_read_unlock_strict() would always be called with a non-zero
> > preemption count (with CONFIG_PREEMPTION), wouldn't the preempt_count()
> > check defeat any calls to rcu_read_unlock_strict()?
> > 
> >     void rcu_read_unlock_strict(void)
> >     {
> >             struct rcu_data *rdp;
> > 
> >             if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> >                     return;
> > 
> > Or am I missing something?

Haha, right!

> 
> This is indeed broken. Moving preempt_enable() as Frederic suggested
> makes rcu_read_unlock_strict() a NOP. Keeping this as-is results
> in splats due to this_cpu_ptr() in preemptible regions. Looking further
> we have "rdp->cpu != smp_processor_id()" as the next candidate.
> 
> The preempt_disable() should instead go into rcu_read_unlock_strict(),
> after the early-return check.

Yeah that looks better ;-)

> 
> > Ankur
> 
> Sebastian


* Re: [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs()
  2024-11-14  7:06   ` Sebastian Andrzej Siewior
@ 2024-11-15  4:55     ` Ankur Arora
  0 siblings, 0 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-15  4:55 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky


Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2024-11-06 12:17:53 [-0800], Ankur Arora wrote:
>> rcu_all_qs() is defined for !CONFIG_PREEMPT_RCU but the declaration
>> is conditioned on CONFIG_PREEMPTION.
>>
>> With CONFIG_PREEMPT_LAZY, CONFIG_PREEMPTION=y does not imply
>> CONFIG_PREEMPT_RCU=y.
>>
>> Decouple the two.
>>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>
> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Thanks!

--
ankur


* Re: [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
  2024-11-14  8:50   ` Sebastian Andrzej Siewior
@ 2024-11-15  4:58     ` Ankur Arora
  0 siblings, 0 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-15  4:58 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, frederic, efault, sshegde, boris.ostrovsky


Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2024-11-06 12:17:56 [-0800], Ankur Arora wrote:
>> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
>> states for read-side critical sections via rcu_all_qs().
>> One reason why this was needed: lacking a preempt count, the tick
>> handler has no way of knowing whether it is executing in a read-side
>> critical section or not.
>>
>> With PREEMPT_LAZY=y, there can be configurations with PREEMPT_COUNT=y,
>> PREEMPT_RCU=n, where cond_resched() is a stub that does not provide
>> quiescent states via rcu_all_qs().
>
> With PREEMPT_LAZY=y && PREEMPT_DYNAMIC=n we get PREEMPT_COUNT=y and
> PREEMPT_RCU=n. In this configuration cond_resched() is an empty stub and
> does not provide quiescent states via rcu_all_qs(). PREEMPT_RCU=y
> provides this information via rcu_read_unlock() and its nesting counter.
>
>> So, use the availability of preempt_count() to report quiescent states
>> in rcu_flavor_sched_clock_irq().
>
> Okay. You might also want to update the cond_resched() comment,
> 	s@In preemptible kernels, ->rcu_read_lock_nesting@
> 	  In PREEMPT_RCU kernels, ->rcu_read_lock_nesting@

Good point. Will add.

> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Thanks!

Ankur

>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>
> Sebastian


--
ankur


* Re: [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  2024-11-14  9:22   ` Sebastian Andrzej Siewior
@ 2024-11-15  4:59     ` Ankur Arora
  0 siblings, 0 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-15  4:59 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, frederic, efault, sshegde, boris.ostrovsky,
	Daniel Bristot de Oliveira


Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2024-11-06 12:17:57 [-0800], Ankur Arora wrote:
>> To reduce RCU noise for nohz_full configurations, osnoise depends
>> on cond_resched() providing quiescent states for PREEMPT_RCU=n
>> configurations. And, for PREEMPT_RCU=y configurations, it does this
>> by directly calling rcu_momentary_eqs().
>>
>> With PREEMPT_LAZY=y, however, we can have configurations with
>> (PREEMPTION=y, PREEMPT_RCU=n), which means neither of the above
>> can help.
>
> The problem is, as you say, CONFIG_PREEMPT_RCU=n + CONFIG_PREEMPTION=y.
> You can't select either of those two directly but get there via
>  PREEMPT_LAZY=y + PREEMPT_DYNAMIC=n.
>
> Please spell it out to make it obvious. It is not a large group of
> configurations, it is exactly this combo.

Makes sense.

Will do. Thanks.

Ankur

>  With PREEMPT_LAZY=y + PREEMPT_DYNAMIC=n however we get PREEMPT_RCU=n
>  which means no direct rcu_momentary_eqs() invocations and
>  cond_resched() is an empty stub.
>
>> Handle that by falling back to explicit quiescent states via
>> rcu_momentary_eqs().
>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>
> Sebastian


--
ankur


* Re: [PATCH v2 0/6] RCU changes for PREEMPT_LAZY
  2024-11-14 10:01 ` [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Sebastian Andrzej Siewior
@ 2024-11-15  5:20   ` Ankur Arora
  0 siblings, 0 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-15  5:20 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, frederic, efault, sshegde, boris.ostrovsky


Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2024-11-06 12:17:52 [-0800], Ankur Arora wrote:
>> This series adds RCU and some leftover scheduler bits for lazy
>> preemption.
>
> This is not critical for the current implementation. The way I
> understand it, you make a change in 3/6 and then all the other patches
> in this series are required to deal with it.
>
> For bisection reasons it would make sense to have 3/6 last in the series
> and do the "fixes" first, before the code is enabled. I mean if you
> apply 3/6 first then you get build failures without 1/6. But with 3/6
> before 5/6 you should get runtime errors, right?

That's a good point. Will reorder.

>> The main problem addressed in the RCU related patches is that before
>> PREEMPT_LAZY, PREEMPTION=y implied PREEMPT_RCU=y. With PREEMPT_LAZY,
>> that's no longer true.
>
> No, you want to make PREEMPTION=y + PREEMPT_RCU=n + PREEMPT_LAZY=y
> possible. This is different. Your wording makes it sound like there _is_
> an actual problem.

That's too literal a reading. It's just the problem ("matter or
situation that is unwelcome" to quote from a dictionary) addressed in
the patches.

>> That's because PREEMPT_RCU makes some trade-offs to optimize for
>> latency as opposed to throughput, and configurations with limited
>> preemption might prefer the stronger forward-progress guarantees of
>> PREEMPT_RCU=n.
>>
>> Accordingly, with standalone PREEMPT_LAZY (much like PREEMPT_NONE,
>> PREEMPT_VOLUNTARY) we want to use PREEMPT_RCU=n. And, when used in
>> conjunction with PREEMPT_DYNAMIC, we continue to use PREEMPT_RCU=y.
>>
>> Patches 1 and 2 are cleanup patches:
>>   "rcu: fix header guard for rcu_all_qs()"
>>   "rcu: rename PREEMPT_AUTO to PREEMPT_LAZY"
>>
>> Patch 3, "rcu: limit PREEMPT_RCU configurations", explicitly limits
>> PREEMPT_RCU=y to the PREEMPT_DYNAMIC or the latency oriented models.
>>
>> Patches 4 and 5,
>>   "rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y"
>>   "osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y"
>>
>> handle quiescent states for the (PREEMPT_LAZY=y, PREEMPT_RCU=n)
>> configuration.
>
> I was briefly thinking about
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5646,8 +5646,11 @@ void sched_tick(void)
>  	hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
>  	update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure);
>
> -	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY))
> +	if (dynamic_preempt_lazy() && tif_test_bit(TIF_NEED_RESCHED_LAZY)) {
>  		resched_curr(rq);
> +		if (!IS_ENABLED(CONFIG_PREEMPT_RCU))
> +			rcu_all_qs();
> +	}
>
>  	donor->sched_class->task_tick(rq, donor, 0);
>  	if (sched_feat(LATENCY_WARN))
>
> which should make #4 + #5 obsolete. But I think it is nicer to have the
> change in #4 since it extends the check to cover all cases. And then
> we would do it twice just for osnoise.

Yeah, exactly. The check here only deals with this specific case
while the one in rcu_flavor_sched_clock_irq() can handle that more
generally.


Thanks.

--
ankur


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-13 15:31   ` Frederic Weisbecker
  2024-11-14  0:23     ` Ankur Arora
@ 2024-11-25 21:40     ` Ankur Arora
  2024-11-26 14:49       ` Frederic Weisbecker
  1 sibling, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-25 21:40 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo, bigeasy,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, efault, sshegde, boris.ostrovsky


Frederic Weisbecker <frederic@kernel.org> writes:

> Le Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora a écrit :
>> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
>> which allows for dynamic switching of preemption models.
>>
>> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
>>
>> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
>> as opposed to throughput, configurations with limited preemption
>> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
>>
>> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
>> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
>> model PREEMPT_DYNAMIC.
>>
>> This means the throughput oriented models, PREEMPT_NONE,
>> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
>>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>  kernel/rcu/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>> index 5a7ff5e1cdcb..9d52f87fac27 100644
>> --- a/kernel/rcu/Kconfig
>> +++ b/kernel/rcu/Kconfig
>> @@ -18,7 +18,7 @@ config TREE_RCU
>>
>>  config PREEMPT_RCU
>>  	bool
>> -	default y if PREEMPTION
>> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
>>  	select TREE_RCU
>>  	help
>>  	  This option selects the RCU implementation that is
>
> Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>
> But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
> some issues now that the code can be preemptible. Well I think
> it has always been preemptible but PREEMPTION && !PREEMPT_RCU
> > has seldom been exercised (or was it even possible?).
>
> For example rcu_read_unlock_strict() can be called with preemption
> enabled so we need the following otherwise the rdp is unstable, the
> norm value becomes racy (though automagically fixed in rcu_report_qs_rdp())
> and rcu_report_qs_rdp() might warn.
>
> diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> index 58d84c59f3dd..368f00267d4e 100644
> --- a/include/linux/rcupdate.h
> +++ b/include/linux/rcupdate.h
> @@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
>
>  static inline void __rcu_read_unlock(void)
>  {
> -	preempt_enable();
>  	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
>  		rcu_read_unlock_strict();
> +	preempt_enable();
>  }
>
>  static inline int rcu_preempt_depth(void)

Based on the discussion on the thread, how about keeping this and
changing the preempt_count check in rcu_read_unlock_strict() instead?

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 1c7cbd145d5e..8fc67639d3a7 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -831,8 +831,15 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
 void rcu_read_unlock_strict(void)
 {
        struct rcu_data *rdp;
+       int pc = ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT);

-       if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
+       /*
> +        * rcu_report_qs_rdp() can only be invoked with a stable rdp
> +        * and from the local CPU.
+        * With CONFIG_PREEMPTION=y, do this while holding the last
+        * preempt_count which gets dropped after __rcu_read_unlock().
+        */
+       if (irqs_disabled() || pc > 1 || !rcu_state.gp_kthread)
                return;
        rdp = this_cpu_ptr(&rcu_data);
        rdp->cpu_no_qs.b.norm = false;


Thanks

--
ankur


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-25 21:40     ` Ankur Arora
@ 2024-11-26 14:49       ` Frederic Weisbecker
  2024-11-27  5:35         ` Ankur Arora
  0 siblings, 1 reply; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-26 14:49 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky

Le Mon, Nov 25, 2024 at 01:40:39PM -0800, Ankur Arora a écrit :
> 
> Frederic Weisbecker <frederic@kernel.org> writes:
> 
> > Le Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora a écrit :
> >> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
> >> which allows for dynamic switching of preemption models.
> >>
> >> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
> >>
> >> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
> >> as opposed to throughput, configurations with limited preemption
> >> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
> >>
> >> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
> >> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
> >> model PREEMPT_DYNAMIC.
> >>
> >> This means the throughput oriented models, PREEMPT_NONE,
> >> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
> >>
> >> Cc: Paul E. McKenney <paulmck@kernel.org>
> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> >> ---
> >>  kernel/rcu/Kconfig | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> >> index 5a7ff5e1cdcb..9d52f87fac27 100644
> >> --- a/kernel/rcu/Kconfig
> >> +++ b/kernel/rcu/Kconfig
> >> @@ -18,7 +18,7 @@ config TREE_RCU
> >>
> >>  config PREEMPT_RCU
> >>  	bool
> >> -	default y if PREEMPTION
> >> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
> >>  	select TREE_RCU
> >>  	help
> >>  	  This option selects the RCU implementation that is
> >
> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >
> > But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
> > some issues now that the code can be preemptible. Well I think
> > it has always been preemptible but PREEMPTION && !PREEMPT_RCU
> > > has seldom been exercised (or was it even possible?).
> >
> > For example rcu_read_unlock_strict() can be called with preemption
> > enabled so we need the following otherwise the rdp is unstable, the
> > norm value becomes racy (though automagically fixed in rcu_report_qs_rdp())
> > and rcu_report_qs_rdp() might warn.
> >
> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> > index 58d84c59f3dd..368f00267d4e 100644
> > --- a/include/linux/rcupdate.h
> > +++ b/include/linux/rcupdate.h
> > @@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
> >
> >  static inline void __rcu_read_unlock(void)
> >  {
> > -	preempt_enable();
> >  	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> >  		rcu_read_unlock_strict();
> > +	preempt_enable();
> >  }
> >
> >  static inline int rcu_preempt_depth(void)
> 
> Based on the discussion on the thread, how about keeping this and
> changing the preempt_count check in rcu_read_unlock_strict() instead?
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1c7cbd145d5e..8fc67639d3a7 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -831,8 +831,15 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
>  void rcu_read_unlock_strict(void)
>  {
>         struct rcu_data *rdp;
> +       int pc = ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT);

This should be in_atomic_preempt_off(), otherwise softirqs and IRQs are
spuriously accounted as quiescent states.

Thanks.

> 
> -       if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> +       /*
> > +        * rcu_report_qs_rdp() can only be invoked with a stable rdp
> > +        * and from the local CPU.
> +        * With CONFIG_PREEMPTION=y, do this while holding the last
> +        * preempt_count which gets dropped after __rcu_read_unlock().
> +        */
> +       if (irqs_disabled() || pc > 1 || !rcu_state.gp_kthread)
>                 return;
>         rdp = this_cpu_ptr(&rcu_data);
>         rdp->cpu_no_qs.b.norm = false;
> 
> 
> Thanks
> 
> --
> ankur


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-26 14:49       ` Frederic Weisbecker
@ 2024-11-27  5:35         ` Ankur Arora
  2024-11-27  6:19           ` Ankur Arora
  0 siblings, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-27  5:35 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo, bigeasy,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, efault, sshegde, boris.ostrovsky


Frederic Weisbecker <frederic@kernel.org> writes:

> Le Mon, Nov 25, 2024 at 01:40:39PM -0800, Ankur Arora a écrit :
>>
>> Frederic Weisbecker <frederic@kernel.org> writes:
>>
>> > Le Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora a écrit :
>> >> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
>> >> which allows for dynamic switching of preemption models.
>> >>
>> >> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
>> >>
>> >> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
>> >> as opposed to throughput, configurations with limited preemption
>> >> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
>> >>
>> >> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
>> >> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
>> >> model PREEMPT_DYNAMIC.
>> >>
>> >> This means the throughput oriented models, PREEMPT_NONE,
>> >> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
>> >>
>> >> Cc: Paul E. McKenney <paulmck@kernel.org>
>> >> Cc: Peter Zijlstra <peterz@infradead.org>
>> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> >> ---
>> >>  kernel/rcu/Kconfig | 2 +-
>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>
>> >> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>> >> index 5a7ff5e1cdcb..9d52f87fac27 100644
>> >> --- a/kernel/rcu/Kconfig
>> >> +++ b/kernel/rcu/Kconfig
>> >> @@ -18,7 +18,7 @@ config TREE_RCU
>> >>
>> >>  config PREEMPT_RCU
>> >>  	bool
>> >> -	default y if PREEMPTION
>> >> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
>> >>  	select TREE_RCU
>> >>  	help
>> >>  	  This option selects the RCU implementation that is
>> >
>> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>> >
>> > But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
>> > some issues now that the code can be preemptible. Well I think
>> > it has always been preemptible but PREEMPTION && !PREEMPT_RCU
>> > has seldom been exercised (or was it even possible?).
>> >
>> > For example rcu_read_unlock_strict() can be called with preemption
>> > enabled so we need the following otherwise the rdp is unstable, the
>> > norm value becomes racy (though automagically fixed in rcu_report_qs_rdp())
>> > and rcu_report_qs_rdp() might warn.
>> >
>> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> > index 58d84c59f3dd..368f00267d4e 100644
>> > --- a/include/linux/rcupdate.h
>> > +++ b/include/linux/rcupdate.h
>> > @@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
>> >
>> >  static inline void __rcu_read_unlock(void)
>> >  {
>> > -	preempt_enable();
>> >  	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
>> >  		rcu_read_unlock_strict();
>> > +	preempt_enable();
>> >  }
>> >
>> >  static inline int rcu_preempt_depth(void)
>>
>> Based on the discussion on the thread, how about keeping this and
>> changing the preempt_count check in rcu_read_unlock_strict() instead?
>>
>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> index 1c7cbd145d5e..8fc67639d3a7 100644
>> --- a/kernel/rcu/tree_plugin.h
>> +++ b/kernel/rcu/tree_plugin.h
>> @@ -831,8 +831,15 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
>>  void rcu_read_unlock_strict(void)
>>  {
>>         struct rcu_data *rdp;
>> +       int pc = ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT);
>
> This should be in_atomic_preempt_off(), otherwise softirqs and IRQs are
> spuriously accounted as quiescent states.

Not sure I got that. Won't ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT)
give us task only preempt count?

And, given that the preempt_count is at least 1, the (pc > 1) check below
would ensure we have a stable rdp and call rcu_report_qs_rdp() before
dropping the last preempt-count.

>>
>> -       if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
>> +       /*
>> +        * rcu_report_qs_rdp() can only be invoked with a stable rdp and
>> +        * from the local CPU.
>> +        * With CONFIG_PREEMPTION=y, do this while holding the last
>> +        * preempt_count which gets dropped after __rcu_read_unlock().
>> +        */
>> +       if (irqs_disabled() || pc > 1 || !rcu_state.gp_kthread)
>>                 return;
>>         rdp = this_cpu_ptr(&rcu_data);
>>         rdp->cpu_no_qs.b.norm = false;

Thanks

--
ankur


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-27  5:35         ` Ankur Arora
@ 2024-11-27  6:19           ` Ankur Arora
  2024-11-28 12:33             ` Frederic Weisbecker
  0 siblings, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-27  6:19 UTC (permalink / raw)
  To: Ankur Arora
  Cc: Frederic Weisbecker, linux-kernel, peterz, tglx, paulmck, mingo,
	bigeasy, juri.lelli, vincent.guittot, dietmar.eggemann, rostedt,
	bsegall, mgorman, vschneid, efault, sshegde, boris.ostrovsky


Ankur Arora <ankur.a.arora@oracle.com> writes:

> Frederic Weisbecker <frederic@kernel.org> writes:
>
>> Le Mon, Nov 25, 2024 at 01:40:39PM -0800, Ankur Arora a écrit :
>>>
>>> Frederic Weisbecker <frederic@kernel.org> writes:
>>>
>>> > Le Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora a écrit :
>>> >> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
>>> >> which allows for dynamic switching of preemption models.
>>> >>
>>> >> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
>>> >>
>>> >> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
>>> >> as opposed to throughput, configurations with limited preemption
>>> >> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
>>> >>
>>> >> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
>>> >> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
>>> >> model PREEMPT_DYNAMIC.
>>> >>
>>> >> This means the throughput oriented models, PREEMPT_NONE,
>>> >> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
>>> >>
>>> >> Cc: Paul E. McKenney <paulmck@kernel.org>
>>> >> Cc: Peter Zijlstra <peterz@infradead.org>
>>> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>>> >> ---
>>> >>  kernel/rcu/Kconfig | 2 +-
>>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>>> >>
>>> >> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>>> >> index 5a7ff5e1cdcb..9d52f87fac27 100644
>>> >> --- a/kernel/rcu/Kconfig
>>> >> +++ b/kernel/rcu/Kconfig
>>> >> @@ -18,7 +18,7 @@ config TREE_RCU
>>> >>
>>> >>  config PREEMPT_RCU
>>> >>  	bool
>>> >> -	default y if PREEMPTION
>>> >> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
>>> >>  	select TREE_RCU
>>> >>  	help
>>> >>  	  This option selects the RCU implementation that is
>>> >
>>> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>>> >
>>> > But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
>>> > some issues now that the code can be preemptible. Well I think
>>> > it has always been preemptible but PREEMPTION && !PREEMPT_RCU
>>> > has seldom been exercised (or was it even possible?).
>>> >
>>> > For example rcu_read_unlock_strict() can be called with preemption
>>> > enabled so we need the following otherwise the rdp is unstable, the
>>> > norm value becomes racy (though automagically fixed in rcu_report_qs_rdp())
>>> > and rcu_report_qs_rdp() might warn.
>>> >
>>> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>>> > index 58d84c59f3dd..368f00267d4e 100644
>>> > --- a/include/linux/rcupdate.h
>>> > +++ b/include/linux/rcupdate.h
>>> > @@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
>>> >
>>> >  static inline void __rcu_read_unlock(void)
>>> >  {
>>> > -	preempt_enable();
>>> >  	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
>>> >  		rcu_read_unlock_strict();
>>> > +	preempt_enable();
>>> >  }
>>> >
>>> >  static inline int rcu_preempt_depth(void)
>>>
>>> Based on the discussion on the thread, how about keeping this and
>>> changing the preempt_count check in rcu_read_unlock_strict() instead?
>>>
>>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>>> index 1c7cbd145d5e..8fc67639d3a7 100644
>>> --- a/kernel/rcu/tree_plugin.h
>>> +++ b/kernel/rcu/tree_plugin.h
>>> @@ -831,8 +831,15 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
>>>  void rcu_read_unlock_strict(void)
>>>  {
>>>         struct rcu_data *rdp;
>>> +       int pc = ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT);
>>
>> This should be in_atomic_preempt_off(), otherwise softirqs and IRQs are
>> spuriously accounted as quiescent states.
>
> Not sure I got that. Won't ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT)
> give us task only preempt count?

Oh wait. I see your point now. My check is too narrow.

Great. This'll work:

-       if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
+       if (irqs_disabled() || in_atomic_preempt_off() || !rcu_state.gp_kthread)

Thanks

Ankur

> And, given that the preempt_count is at least 1, the (pc > 1) check below
> would ensure we have a stable rdp and call rcu_report_qs_rdp() before
> dropping the last preempt-count.
>
>>>
>>> -       if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
>>> +       /*
>>> +        * rcu_report_qs_rdp() can only be invoked with a stable rdp and
>>> +        * from the local CPU.
>>> +        * With CONFIG_PREEMPTION=y, do this while holding the last
>>> +        * preempt_count which gets dropped after __rcu_read_unlock().
>>> +        */
>>> +       if (irqs_disabled() || pc > 1 || !rcu_state.gp_kthread)
>>>                 return;
>>>         rdp = this_cpu_ptr(&rcu_data);
>>>         rdp->cpu_no_qs.b.norm = false;
>
> Thanks


--
ankur


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-27  6:19           ` Ankur Arora
@ 2024-11-28 12:33             ` Frederic Weisbecker
  2024-11-29  4:39               ` Ankur Arora
  0 siblings, 1 reply; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-28 12:33 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky

Le Tue, Nov 26, 2024 at 10:19:05PM -0800, Ankur Arora a écrit :
> 
> Ankur Arora <ankur.a.arora@oracle.com> writes:
> 
> > Frederic Weisbecker <frederic@kernel.org> writes:
> >
> >> Le Mon, Nov 25, 2024 at 01:40:39PM -0800, Ankur Arora a écrit :
> >>>
> >>> Frederic Weisbecker <frederic@kernel.org> writes:
> >>>
> >>> > Le Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora a écrit :
> >>> >> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
> >>> >> which allows for dynamic switching of preemption models.
> >>> >>
> >>> >> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
> >>> >>
> >>> >> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
> >>> >> as opposed to throughput, configurations with limited preemption
> >>> >> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
> >>> >>
> >>> >> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
> >>> >> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
> >>> >> model PREEMPT_DYNAMIC.
> >>> >>
> >>> >> This means the throughput oriented models, PREEMPT_NONE,
> >>> >> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
> >>> >>
> >>> >> Cc: Paul E. McKenney <paulmck@kernel.org>
> >>> >> Cc: Peter Zijlstra <peterz@infradead.org>
> >>> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> >>> >> ---
> >>> >>  kernel/rcu/Kconfig | 2 +-
> >>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>> >>
> >>> >> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
> >>> >> index 5a7ff5e1cdcb..9d52f87fac27 100644
> >>> >> --- a/kernel/rcu/Kconfig
> >>> >> +++ b/kernel/rcu/Kconfig
> >>> >> @@ -18,7 +18,7 @@ config TREE_RCU
> >>> >>
> >>> >>  config PREEMPT_RCU
> >>> >>  	bool
> >>> >> -	default y if PREEMPTION
> >>> >> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
> >>> >>  	select TREE_RCU
> >>> >>  	help
> >>> >>  	  This option selects the RCU implementation that is
> >>> >
> >>> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
> >>> >
> >>> > But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
> >>> > some issues now that the code can be preemptible. Well I think
> >>> > it has always been preemptible but PREEMPTION && !PREEMPT_RCU
> >>> > has seldom been exercised (or was it even possible?).
> >>> >
> >>> > For example rcu_read_unlock_strict() can be called with preemption
> >>> > enabled so we need the following otherwise the rdp is unstable, the
> >>> > norm value becomes racy (though automagically fixed in rcu_report_qs_rdp())
> >>> > and rcu_report_qs_rdp() might warn.
> >>> >
> >>> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
> >>> > index 58d84c59f3dd..368f00267d4e 100644
> >>> > --- a/include/linux/rcupdate.h
> >>> > +++ b/include/linux/rcupdate.h
> >>> > @@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
> >>> >
> >>> >  static inline void __rcu_read_unlock(void)
> >>> >  {
> >>> > -	preempt_enable();
> >>> >  	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
> >>> >  		rcu_read_unlock_strict();
> >>> > +	preempt_enable();
> >>> >  }
> >>> >
> >>> >  static inline int rcu_preempt_depth(void)
> >>>
> >>> Based on the discussion on the thread, how about keeping this and
> >>> changing the preempt_count check in rcu_read_unlock_strict() instead?
> >>>
> >>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> >>> index 1c7cbd145d5e..8fc67639d3a7 100644
> >>> --- a/kernel/rcu/tree_plugin.h
> >>> +++ b/kernel/rcu/tree_plugin.h
> >>> @@ -831,8 +831,15 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
> >>>  void rcu_read_unlock_strict(void)
> >>>  {
> >>>         struct rcu_data *rdp;
> >>> +       int pc = ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT);
> >>
> >> This should be in_atomic_preempt_off(), otherwise softirqs and IRQs are
> >> spuriously accounted as quiescent states.
> >
> > Not sure I got that. Won't ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT)
> > give us task only preempt count?
> 
> Oh wait. I see your point now. My check is too narrow.
> 
> Great. This'll work:
> 
> -       if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
> +       if (irqs_disabled() || in_atomic_preempt_off() || !rcu_state.gp_kthread)
> 
> Thanks

Do you plan to integrate this in a further version of your set? Or should I send
a patch?

Thanks.


* Re: [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y
  2024-11-06 20:17 ` [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
  2024-11-14  8:50   ` Sebastian Andrzej Siewior
@ 2024-11-28 13:36   ` Frederic Weisbecker
  1 sibling, 0 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-28 13:36 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky

Le Wed, Nov 06, 2024 at 12:17:56PM -0800, Ankur Arora a écrit :
> With PREEMPT_RCU=n, cond_resched() provides urgently needed quiescent
> states for read-side critical sections via rcu_all_qs().
> One reason why this was needed: lacking a preempt count, the tick
> handler has no way of knowing whether it is executing in a read-side
> critical section or not.
> 
> With PREEMPT_LAZY=y, there can be configurations with PREEMPT_COUNT=y,
> PREEMPT_RCU=n, where cond_resched() is a stub that does not provide
> quiescent states via rcu_all_qs().
> 
> So, use the availability of preempt_count() to report quiescent states
> in rcu_flavor_sched_clock_irq().
> 
> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  kernel/rcu/tree_plugin.h | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> index 1c7cbd145d5e..da324d66034b 100644
> --- a/kernel/rcu/tree_plugin.h
> +++ b/kernel/rcu/tree_plugin.h
> @@ -974,13 +974,16 @@ static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp)
>   */
>  static void rcu_flavor_sched_clock_irq(int user)
>  {
> -	if (user || rcu_is_cpu_rrupt_from_idle()) {
> +	if (user || rcu_is_cpu_rrupt_from_idle() ||
> +	     (IS_ENABLED(CONFIG_PREEMPT_COUNT) &&
> +	      !(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK)))) {

I'm never sure if nested hardirqs are still possible but just in case,

    preempt_count() == HARDIRQ_OFFSET

might be a more robust check. And that also applies to the PREEMPT_RCU
implementation.

Thanks.

>  
>  		/*
>  		 * Get here if this CPU took its interrupt from user
> -		 * mode or from the idle loop, and if this is not a
> -		 * nested interrupt.  In this case, the CPU is in
> -		 * a quiescent state, so note it.
> +		 * mode, from the idle loop without this being a nested
> +		 * interrupt, or while not holding a preempt count (but
> +		 * with PREEMPT_COUNT=y). In this case, the CPU is in a
> +		 * quiescent state, so note it.
>  		 *
>  		 * No memory barrier is required here because rcu_qs()
>  		 * references only CPU-local variables that other CPUs
> -- 
> 2.43.5
> 


* Re: [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  2024-11-06 20:17 ` [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
  2024-11-14  9:22   ` Sebastian Andrzej Siewior
@ 2024-11-28 14:33   ` Frederic Weisbecker
  2024-11-29  5:03     ` Ankur Arora
  1 sibling, 1 reply; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-28 14:33 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky,
	Daniel Bristot de Oliveira

Le Wed, Nov 06, 2024 at 12:17:57PM -0800, Ankur Arora a écrit :
> To reduce RCU noise for nohz_full configurations, osnoise depends
> on cond_resched() providing quiescent states for PREEMPT_RCU=n
> configurations. And, for PREEMPT_RCU=y configurations, it does this
> by directly calling rcu_momentary_eqs().
> 
> With PREEMPT_LAZY=y, however, we can have configurations with
> (PREEMPTION=y, PREEMPT_RCU=n), which means neither of the above
> can help.
> 
> Handle that by falling back to the explicit quiescent states via
> rcu_momentary_eqs().
> 
> Cc: Paul E. McKenney <paulmck@kernel.org>
> Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> ---
>  kernel/trace/trace_osnoise.c | 22 ++++++++++++----------
>  1 file changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> index a50ed23bee77..15e9600d231d 100644
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -1538,18 +1538,20 @@ static int run_osnoise(void)
>  		/*
>  		 * In some cases, notably when running on a nohz_full CPU with
>  		 * a stopped tick PREEMPT_RCU has no way to account for QSs.
> -		 * This will eventually cause unwarranted noise as PREEMPT_RCU
> -		 * will force preemption as the means of ending the current
> -		 * grace period. We avoid this problem by calling
> -		 * rcu_momentary_eqs(), which performs a zero duration
> -		 * EQS allowing PREEMPT_RCU to end the current grace period.
> -		 * This call shouldn't be wrapped inside an RCU critical
> -		 * section.
> +		 * This will eventually cause unwarranted noise as RCU forces
> +		 * preemption as the means of ending the current grace period.
> +		 * We avoid this by calling rcu_momentary_eqs(), which performs
> +		 * a zero duration EQS allowing RCU to end the current grace
> +		 * period. This call shouldn't be wrapped inside an RCU
> +		 * critical section.
>  		 *
> -		 * Note that in non PREEMPT_RCU kernels QSs are handled through
> -		 * cond_resched()
> +		 * For non-PREEMPT_RCU kernels with cond_resched() (non-
> +		 * PREEMPT_LAZY configurations), QSs are handled through
> +		 * cond_resched(). For PREEMPT_LAZY kernels, we fallback to
> +		 * the zero duration QS via rcu_momentary_eqs().
>  		 */
> -		if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
> +		if (IS_ENABLED(CONFIG_PREEMPT_RCU) ||
> +		    (!IS_ENABLED(CONFIG_PREEMPT_RCU) && IS_ENABLED(CONFIG_PREEMPTION))) {
>  			if (!disable_irq)
>  				local_irq_disable();

How about making this unconditional so it works everywhere and doesn't
rely on cond_resched() Kconfig/preempt-dynamic mood?

Thanks.


>  
> -- 
> 2.43.5
> 


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-28 12:33             ` Frederic Weisbecker
@ 2024-11-29  4:39               ` Ankur Arora
  2024-11-29 12:49                 ` Frederic Weisbecker
  0 siblings, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-29  4:39 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo, bigeasy,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, efault, sshegde, boris.ostrovsky


Frederic Weisbecker <frederic@kernel.org> writes:

> Le Tue, Nov 26, 2024 at 10:19:05PM -0800, Ankur Arora a écrit :
>>
>> Ankur Arora <ankur.a.arora@oracle.com> writes:
>>
>> > Frederic Weisbecker <frederic@kernel.org> writes:
>> >
>> >> Le Mon, Nov 25, 2024 at 01:40:39PM -0800, Ankur Arora a écrit :
>> >>>
>> >>> Frederic Weisbecker <frederic@kernel.org> writes:
>> >>>
>> >>> > Le Wed, Nov 06, 2024 at 12:17:55PM -0800, Ankur Arora a écrit :
>> >>> >> PREEMPT_LAZY can be enabled stand-alone or alongside PREEMPT_DYNAMIC
>> >>> >> which allows for dynamic switching of preemption models.
>> >>> >>
>> >>> >> The choice of PREEMPT_RCU or not, however, is fixed at compile time.
>> >>> >>
>> >>> >> Given that PREEMPT_RCU makes some trade-offs to optimize for latency
>> >>> >> as opposed to throughput, configurations with limited preemption
>> >>> >> might prefer the stronger forward-progress guarantees of PREEMPT_RCU=n.
>> >>> >>
>> >>> >> Accordingly, explicitly limit PREEMPT_RCU=y to the latency oriented
>> >>> >> preemption models: PREEMPT, PREEMPT_RT, and the runtime configurable
>> >>> >> model PREEMPT_DYNAMIC.
>> >>> >>
>> >>> >> This means the throughput oriented models, PREEMPT_NONE,
>> >>> >> PREEMPT_VOLUNTARY and PREEMPT_LAZY will run with PREEMPT_RCU=n.
>> >>> >>
>> >>> >> Cc: Paul E. McKenney <paulmck@kernel.org>
>> >>> >> Cc: Peter Zijlstra <peterz@infradead.org>
>> >>> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> >>> >> ---
>> >>> >>  kernel/rcu/Kconfig | 2 +-
>> >>> >>  1 file changed, 1 insertion(+), 1 deletion(-)
>> >>> >>
>> >>> >> diff --git a/kernel/rcu/Kconfig b/kernel/rcu/Kconfig
>> >>> >> index 5a7ff5e1cdcb..9d52f87fac27 100644
>> >>> >> --- a/kernel/rcu/Kconfig
>> >>> >> +++ b/kernel/rcu/Kconfig
>> >>> >> @@ -18,7 +18,7 @@ config TREE_RCU
>> >>> >>
>> >>> >>  config PREEMPT_RCU
>> >>> >>  	bool
>> >>> >> -	default y if PREEMPTION
>> >>> >> +	default y if (PREEMPT || PREEMPT_RT || PREEMPT_DYNAMIC)
>> >>> >>  	select TREE_RCU
>> >>> >>  	help
>> >>> >>  	  This option selects the RCU implementation that is
>> >>> >
>> >>> > Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
>> >>> >
>> >>> > But looking at !CONFIG_PREEMPT_RCU code on tree_plugin.h, I see
>> >>> > some issues now that the code can be preemptible. Well I think
>> >>> > it has always been preemptible but PREEMPTION && !PREEMPT_RCU
>> >>> > has seldom been exercised (or was it even possible?).
>> >>> >
>> >>> > For example rcu_read_unlock_strict() can be called with preemption
>> >>> > enabled so we need the following otherwise the rdp is unstable, the
>> >>> > norm value becomes racy (though automagically fixed in rcu_report_qs_rdp())
>> >>> > and rcu_report_qs_rdp() might warn.
>> >>> >
>> >>> > diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
>> >>> > index 58d84c59f3dd..368f00267d4e 100644
>> >>> > --- a/include/linux/rcupdate.h
>> >>> > +++ b/include/linux/rcupdate.h
>> >>> > @@ -95,9 +95,9 @@ static inline void __rcu_read_lock(void)
>> >>> >
>> >>> >  static inline void __rcu_read_unlock(void)
>> >>> >  {
>> >>> > -	preempt_enable();
>> >>> >  	if (IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD))
>> >>> >  		rcu_read_unlock_strict();
>> >>> > +	preempt_enable();
>> >>> >  }
>> >>> >
>> >>> >  static inline int rcu_preempt_depth(void)
>> >>>
>> >>> Based on the discussion on the thread, how about keeping this and
>> >>> changing the preempt_count check in rcu_read_unlock_strict() instead?
>> >>>
>> >>> diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
>> >>> index 1c7cbd145d5e..8fc67639d3a7 100644
>> >>> --- a/kernel/rcu/tree_plugin.h
>> >>> +++ b/kernel/rcu/tree_plugin.h
>> >>> @@ -831,8 +831,15 @@ dump_blkd_tasks(struct rcu_node *rnp, int ncheck)
>> >>>  void rcu_read_unlock_strict(void)
>> >>>  {
>> >>>         struct rcu_data *rdp;
>> >>> +       int pc = ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT);
>> >>
>> >> This should be in_atomic_preempt_off(), otherwise softirqs and IRQs are
>> >> spuriously accounted as quiescent states.
>> >
>> > Not sure I got that. Won't ((preempt_count() & PREEMPT_MASK) >> PREEMPT_SHIFT)
>> > give us task only preempt count?
>>
>> Oh wait. I see your point now. My check is too narrow.
>>
>> Great. This'll work:
>>
>> -       if (irqs_disabled() || preempt_count() || !rcu_state.gp_kthread)
>> +       if (irqs_disabled() || in_atomic_preempt_off() || !rcu_state.gp_kthread)
>>
>> Thanks
>
> Do you plan to integrate this in a further version of your set? Or should I send
> a patch?

Sure. I can add it to v3. Okay if I add your Suggested-by?

--
ankur


* Re: [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  2024-11-28 14:33   ` Frederic Weisbecker
@ 2024-11-29  5:03     ` Ankur Arora
  2024-11-29 14:22       ` Frederic Weisbecker
  0 siblings, 1 reply; 36+ messages in thread
From: Ankur Arora @ 2024-11-29  5:03 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo, bigeasy,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, efault, sshegde, boris.ostrovsky,
	Daniel Bristot de Oliveira


Frederic Weisbecker <frederic@kernel.org> writes:

> Le Wed, Nov 06, 2024 at 12:17:57PM -0800, Ankur Arora a écrit :
>> To reduce RCU noise for nohz_full configurations, osnoise depends
>> on cond_resched() providing quiescent states for PREEMPT_RCU=n
>> configurations. And, for PREEMPT_RCU=y configurations, it does this
>> by directly calling rcu_momentary_eqs().
>>
>> With PREEMPT_LAZY=y, however, we can have configurations with
>> (PREEMPTION=y, PREEMPT_RCU=n), which means neither of the above
>> can help.
>>
>> Handle that by falling back to the explicit quiescent states via
>> rcu_momentary_eqs().
>>
>> Cc: Paul E. McKenney <paulmck@kernel.org>
>> Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
>> Cc: Steven Rostedt <rostedt@goodmis.org>
>> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>>  kernel/trace/trace_osnoise.c | 22 ++++++++++++----------
>>  1 file changed, 12 insertions(+), 10 deletions(-)
>>
>> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> index a50ed23bee77..15e9600d231d 100644
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -1538,18 +1538,20 @@ static int run_osnoise(void)
>>  		/*
>>  		 * In some cases, notably when running on a nohz_full CPU with
>>  		 * a stopped tick PREEMPT_RCU has no way to account for QSs.
>> -		 * This will eventually cause unwarranted noise as PREEMPT_RCU
>> -		 * will force preemption as the means of ending the current
>> -		 * grace period. We avoid this problem by calling
>> -		 * rcu_momentary_eqs(), which performs a zero duration
>> -		 * EQS allowing PREEMPT_RCU to end the current grace period.
>> -		 * This call shouldn't be wrapped inside an RCU critical
>> -		 * section.
>> +		 * This will eventually cause unwarranted noise as RCU forces
>> +		 * preemption as the means of ending the current grace period.
>> +		 * We avoid this by calling rcu_momentary_eqs(), which performs
>> +		 * a zero duration EQS allowing RCU to end the current grace
>> +		 * period. This call shouldn't be wrapped inside an RCU
>> +		 * critical section.
>>  		 *
>> -		 * Note that in non PREEMPT_RCU kernels QSs are handled through
>> -		 * cond_resched()
>> +		 * For non-PREEMPT_RCU kernels with cond_resched() (non-
>> +		 * PREEMPT_LAZY configurations), QSs are handled through
>> +		 * cond_resched(). For PREEMPT_LAZY kernels, we fallback to
>> +		 * the zero duration QS via rcu_momentary_eqs().
>>  		 */
>> -		if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
>> +		if (IS_ENABLED(CONFIG_PREEMPT_RCU) ||
>> +		    (!IS_ENABLED(CONFIG_PREEMPT_RCU) && IS_ENABLED(CONFIG_PREEMPTION))) {
>>  			if (!disable_irq)
>>  				local_irq_disable();
>
> How about making this unconditional so it works everywhere and doesn't
> rely on cond_resched() Kconfig/preempt-dynamic mood?

I think it's a minor matter given that this isn't a hot path, but
we don't really need it for the !PREEMPT_RCU configuration.

Still, given that both of those clauses imply CONFIG_PREEMPTION, we
can just simplify this to (with an appropriately adjusted comment):

--- a/kernel/trace/trace_osnoise.c
+++ b/kernel/trace/trace_osnoise.c
@@ -1543,7 +1543,7 @@ static int run_osnoise(void)
                 * Note that in non PREEMPT_RCU kernels QSs are handled through
                 * cond_resched()
                 */
-               if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
+               if (IS_ENABLED(CONFIG_PREEMPTION)) {
                        if (!disable_irq)
                                local_irq_disable();

--
ankur


* Re: [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations
  2024-11-29  4:39               ` Ankur Arora
@ 2024-11-29 12:49                 ` Frederic Weisbecker
  0 siblings, 0 replies; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-29 12:49 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky

Le Thu, Nov 28, 2024 at 08:39:01PM -0800, Ankur Arora a écrit :
> > Do you plan to integrate this in a further version of your set? Or should I send
> > a patch?
> 
> Sure. I can add it to v3. Okay if I add your Suggested-by?

Sure! Thanks.


* Re: [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  2024-11-29  5:03     ` Ankur Arora
@ 2024-11-29 14:22       ` Frederic Weisbecker
  2024-11-29 19:21         ` Ankur Arora
  0 siblings, 1 reply; 36+ messages in thread
From: Frederic Weisbecker @ 2024-11-29 14:22 UTC (permalink / raw)
  To: Ankur Arora
  Cc: linux-kernel, peterz, tglx, paulmck, mingo, bigeasy, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, efault, sshegde, boris.ostrovsky,
	Daniel Bristot de Oliveira

Le Thu, Nov 28, 2024 at 09:03:56PM -0800, Ankur Arora a écrit :
> 
> Frederic Weisbecker <frederic@kernel.org> writes:
> 
> > Le Wed, Nov 06, 2024 at 12:17:57PM -0800, Ankur Arora a écrit :
> >> To reduce RCU noise for nohz_full configurations, osnoise depends
> >> on cond_resched() providing quiescent states for PREEMPT_RCU=n
> >> configurations. And, for PREEMPT_RCU=y configurations, it does this
> >> by directly calling rcu_momentary_eqs().
> >>
> >> With PREEMPT_LAZY=y, however, we can have configurations with
> >> (PREEMPTION=y, PREEMPT_RCU=n), which means neither of the above
> >> can help.
> >>
> >> Handle that by falling back to explicit quiescent states via
> >> rcu_momentary_eqs().
> >>
> >> Cc: Paul E. McKenney <paulmck@kernel.org>
> >> Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
> >> Cc: Steven Rostedt <rostedt@goodmis.org>
> >> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
> >> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
> >> ---
> >>  kernel/trace/trace_osnoise.c | 22 ++++++++++++----------
> >>  1 file changed, 12 insertions(+), 10 deletions(-)
> >>
> >> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
> >> index a50ed23bee77..15e9600d231d 100644
> >> --- a/kernel/trace/trace_osnoise.c
> >> +++ b/kernel/trace/trace_osnoise.c
> >> @@ -1538,18 +1538,20 @@ static int run_osnoise(void)
> >>  		/*
> >>  		 * In some cases, notably when running on a nohz_full CPU with
> >>  		 * a stopped tick PREEMPT_RCU has no way to account for QSs.
> >> -		 * This will eventually cause unwarranted noise as PREEMPT_RCU
> >> -		 * will force preemption as the means of ending the current
> >> -		 * grace period. We avoid this problem by calling
> >> -		 * rcu_momentary_eqs(), which performs a zero duration
> >> -		 * EQS allowing PREEMPT_RCU to end the current grace period.
> >> -		 * This call shouldn't be wrapped inside an RCU critical
> >> -		 * section.
> >> +		 * This will eventually cause unwarranted noise as RCU forces
> >> +		 * preemption as the means of ending the current grace period.
> >> +		 * We avoid this by calling rcu_momentary_eqs(), which performs
> >> +		 * a zero duration EQS allowing RCU to end the current grace
> >> +		 * period. This call shouldn't be wrapped inside an RCU
> >> +		 * critical section.
> >>  		 *
> >> -		 * Note that in non PREEMPT_RCU kernels QSs are handled through
> >> -		 * cond_resched()
> >> +		 * For non-PREEMPT_RCU kernels with cond_resched() (non-
> >> +		 * PREEMPT_LAZY configurations), QSs are handled through
> >> +		 * cond_resched(). For PREEMPT_LAZY kernels, we fallback to
> >> +		 * the zero duration QS via rcu_momentary_eqs().
> >>  		 */
> >> -		if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
> >> +		if (IS_ENABLED(CONFIG_PREEMPT_RCU) ||
> >> +		    (!IS_ENABLED(CONFIG_PREEMPT_RCU) && IS_ENABLED(CONFIG_PREEMPTION))) {
> >>  			if (!disable_irq)
> >>  				local_irq_disable();
> >
> > How about making this unconditional so it works everywhere and doesn't
> > rely on cond_resched() Kconfig/preempt-dynamic mood?
> 
> I think it's a minor matter given that this isn't a hot path, but
> we don't really need it for the !PREEMPT_RCU configuration.

Well, if you make it unconditional, cond_resched() / rcu_all_qs() won't do its
own rcu_momentary_eqs(), because rcu_data.rcu_urgent_qs should
be false. So that essentially unifies the behaviour for all configurations.

Thanks.

> 
> Still, given that both of those clauses imply CONFIG_PREEMPTION, we
> can just simplify this to (with an appropriately adjusted comment):
> 
> --- a/kernel/trace/trace_osnoise.c
> +++ b/kernel/trace/trace_osnoise.c
> @@ -1543,7 +1543,7 @@ static int run_osnoise(void)
>                  * Note that in non PREEMPT_RCU kernels QSs are handled through
>                  * cond_resched()
>                  */
> -               if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
> +               if (IS_ENABLED(CONFIG_PREEMPTION)) {
>                         if (!disable_irq)
>                                 local_irq_disable();
> 
> --
> ankur

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y
  2024-11-29 14:22       ` Frederic Weisbecker
@ 2024-11-29 19:21         ` Ankur Arora
  0 siblings, 0 replies; 36+ messages in thread
From: Ankur Arora @ 2024-11-29 19:21 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ankur Arora, linux-kernel, peterz, tglx, paulmck, mingo, bigeasy,
	juri.lelli, vincent.guittot, dietmar.eggemann, rostedt, bsegall,
	mgorman, vschneid, efault, sshegde, boris.ostrovsky,
	Daniel Bristot de Oliveira


Frederic Weisbecker <frederic@kernel.org> writes:

> On Thu, Nov 28, 2024 at 09:03:56PM -0800, Ankur Arora wrote:
>>
>> Frederic Weisbecker <frederic@kernel.org> writes:
>>
>> > On Wed, Nov 06, 2024 at 12:17:57PM -0800, Ankur Arora wrote:
>> >> To reduce RCU noise for nohz_full configurations, osnoise depends
>> >> on cond_resched() providing quiescent states for PREEMPT_RCU=n
>> >> configurations. For PREEMPT_RCU=y configurations, it does this
>> >> by directly calling rcu_momentary_eqs().
>> >>
>> >> With PREEMPT_LAZY=y, however, we can have configurations with
>> >> (PREEMPTION=y, PREEMPT_RCU=n), which means neither of the above
>> >> can help.
>> >>
>> >> Handle that by falling back to explicit quiescent states via
>> >> rcu_momentary_eqs().
>> >>
>> >> Cc: Paul E. McKenney <paulmck@kernel.org>
>> >> Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
>> >> Cc: Steven Rostedt <rostedt@goodmis.org>
>> >> Suggested-by: Paul E. McKenney <paulmck@kernel.org>
>> >> Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
>> >> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> >> ---
>> >>  kernel/trace/trace_osnoise.c | 22 ++++++++++++----------
>> >>  1 file changed, 12 insertions(+), 10 deletions(-)
>> >>
>> >> diff --git a/kernel/trace/trace_osnoise.c b/kernel/trace/trace_osnoise.c
>> >> index a50ed23bee77..15e9600d231d 100644
>> >> --- a/kernel/trace/trace_osnoise.c
>> >> +++ b/kernel/trace/trace_osnoise.c
>> >> @@ -1538,18 +1538,20 @@ static int run_osnoise(void)
>> >>  		/*
>> >>  		 * In some cases, notably when running on a nohz_full CPU with
>> >>  		 * a stopped tick PREEMPT_RCU has no way to account for QSs.
>> >> -		 * This will eventually cause unwarranted noise as PREEMPT_RCU
>> >> -		 * will force preemption as the means of ending the current
>> >> -		 * grace period. We avoid this problem by calling
>> >> -		 * rcu_momentary_eqs(), which performs a zero duration
>> >> -		 * EQS allowing PREEMPT_RCU to end the current grace period.
>> >> -		 * This call shouldn't be wrapped inside an RCU critical
>> >> -		 * section.
>> >> +		 * This will eventually cause unwarranted noise as RCU forces
>> >> +		 * preemption as the means of ending the current grace period.
>> >> +		 * We avoid this by calling rcu_momentary_eqs(), which performs
>> >> +		 * a zero duration EQS allowing RCU to end the current grace
>> >> +		 * period. This call shouldn't be wrapped inside an RCU
>> >> +		 * critical section.
>> >>  		 *
>> >> -		 * Note that in non PREEMPT_RCU kernels QSs are handled through
>> >> -		 * cond_resched()
>> >> +		 * For non-PREEMPT_RCU kernels with cond_resched() (non-
>> >> +		 * PREEMPT_LAZY configurations), QSs are handled through
>> >> +		 * cond_resched(). For PREEMPT_LAZY kernels, we fallback to
>> >> +		 * the zero duration QS via rcu_momentary_eqs().
>> >>  		 */
>> >> -		if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
>> >> +		if (IS_ENABLED(CONFIG_PREEMPT_RCU) ||
>> >> +		    (!IS_ENABLED(CONFIG_PREEMPT_RCU) && IS_ENABLED(CONFIG_PREEMPTION))) {
>> >>  			if (!disable_irq)
>> >>  				local_irq_disable();
>> >
>> > How about making this unconditional so it works everywhere and doesn't
>> > rely on cond_resched() Kconfig/preempt-dynamic mood?
>>
>> I think it's a minor matter given that this isn't a hot path, but
>> we don't really need it for the !PREEMPT_RCU configuration.
>
> Well, if you make it unconditional, cond_resched() / rcu_all_qs() won't do its
> own rcu_momentary_eqs(), because rcu_data.rcu_urgent_qs should
> be false. So that essentially unifies the behaviour for all configurations.

Ah, yes. That makes sense.

Ankur

>>
>> Still, given that both of those clauses imply CONFIG_PREEMPTION, we
>> can just simplify this to (with an appropriately adjusted comment):
>>
>> --- a/kernel/trace/trace_osnoise.c
>> +++ b/kernel/trace/trace_osnoise.c
>> @@ -1543,7 +1543,7 @@ static int run_osnoise(void)
>>                  * Note that in non PREEMPT_RCU kernels QSs are handled through
>>                  * cond_resched()
>>                  */
>> -               if (IS_ENABLED(CONFIG_PREEMPT_RCU)) {
>> +               if (IS_ENABLED(CONFIG_PREEMPTION)) {
>>                         if (!disable_irq)
>>                                 local_irq_disable();
>>
>> --
>> ankur


--
ankur

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2024-11-29 19:22 UTC | newest]

Thread overview: 36+ messages
-- links below jump to the message on this page --
2024-11-06 20:17 [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Ankur Arora
2024-11-06 20:17 ` [PATCH v2 1/6] rcu: fix header guard for rcu_all_qs() Ankur Arora
2024-11-13 14:50   ` Frederic Weisbecker
2024-11-14  7:06   ` Sebastian Andrzej Siewior
2024-11-15  4:55     ` Ankur Arora
2024-11-06 20:17 ` [PATCH v2 2/6] rcu: rename PREEMPT_AUTO to PREEMPT_LAZY Ankur Arora
2024-11-13 14:59   ` Frederic Weisbecker
2024-11-13 23:51     ` Ankur Arora
2024-11-14  7:07   ` Sebastian Andrzej Siewior
2024-11-06 20:17 ` [PATCH v2 3/6] rcu: limit PREEMPT_RCU configurations Ankur Arora
2024-11-13 15:31   ` Frederic Weisbecker
2024-11-14  0:23     ` Ankur Arora
2024-11-14  8:25       ` Sebastian Andrzej Siewior
2024-11-14 11:36         ` Frederic Weisbecker
2024-11-25 21:40     ` Ankur Arora
2024-11-26 14:49       ` Frederic Weisbecker
2024-11-27  5:35         ` Ankur Arora
2024-11-27  6:19           ` Ankur Arora
2024-11-28 12:33             ` Frederic Weisbecker
2024-11-29  4:39               ` Ankur Arora
2024-11-29 12:49                 ` Frederic Weisbecker
2024-11-06 20:17 ` [PATCH v2 4/6] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT_COUNT=y Ankur Arora
2024-11-14  8:50   ` Sebastian Andrzej Siewior
2024-11-15  4:58     ` Ankur Arora
2024-11-28 13:36   ` Frederic Weisbecker
2024-11-06 20:17 ` [PATCH v2 5/6] osnoise: handle quiescent states for PREEMPT_RCU=n, PREEMPTION=y Ankur Arora
2024-11-14  9:22   ` Sebastian Andrzej Siewior
2024-11-15  4:59     ` Ankur Arora
2024-11-28 14:33   ` Frederic Weisbecker
2024-11-29  5:03     ` Ankur Arora
2024-11-29 14:22       ` Frederic Weisbecker
2024-11-29 19:21         ` Ankur Arora
2024-11-06 20:17 ` [PATCH v2 6/6] sched: warn for high latency with TIF_NEED_RESCHED_LAZY Ankur Arora
2024-11-14  9:16   ` Sebastian Andrzej Siewior
2024-11-14 10:01 ` [PATCH v2 0/6] RCU changes for PREEMPT_LAZY Sebastian Andrzej Siewior
2024-11-15  5:20   ` Ankur Arora
