* [PATCH 0/3] Make rcutorture safe(r) for arm64
@ 2025-05-08 23:42 Paul E. McKenney
  2025-05-08 23:42 ` [PATCH 1/3] torture: Check for "Call trace:" as well as "Call Trace:" Paul E. McKenney
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-05-08 23:42 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt

Hello!

This series makes a few small updates to make rcutorture run better
on arm64 servers.  Remaining issues include TREE07 .config issues
that are addressed by Mark Rutland's porting of PREEMPT_LAZY to arm64
and by upcoming work to handle the fact that arm64 kernels cannot be
built with CONFIG_SMP=n.  In the meantime, the CONFIG_SMP=n issue can
be worked around by explicitly specifying the TREE01, TREE02, TREE03,
TREE04, TREE05, TREE07, SRCU-L, SRCU-N, SRCU-P, TASKS01, TASKS03, RUDE01,
TRACE01, and TRACE02 scenarios, preferably in a script.  (But if you
want typing practice, don't let me stand in your way!)
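
For anyone who would rather not type, the workaround above might be
scripted roughly as follows.  This is only a sketch: the scenario list
is the one given above, kvm.sh and its --configs and --duration
arguments are the existing rcutorture ones, and the 10-minute duration
and the DRYRUN variable are illustrative assumptions.  It is meant to
be run from the top of the kernel source tree.

```shell
#!/bin/sh
# Sketch: run only the scenarios that build with CONFIG_SMP=y on arm64.
# The duration and the DRYRUN knob are assumptions, not part of the series.

scenarios="TREE01 TREE02 TREE03 TREE04 TREE05 TREE07 SRCU-L SRCU-N SRCU-P TASKS01 TASKS03 RUDE01 TRACE01 TRACE02"

kvm=tools/testing/selftests/rcutorture/bin/kvm.sh

# Print the command by default; set DRYRUN=0 to actually run it.
if test "${DRYRUN:-1}" -eq 0
then
	"$kvm" --configs "$scenarios" --duration 10
else
	echo "$kvm --configs \"$scenarios\" --duration 10"
fi
```

With DRYRUN unset, the script just prints the kvm.sh command line so
that it can be inspected before committing machine time to it.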

1.	Check for "Call trace:" as well as "Call Trace:".

2.	Reduce TREE01 CPU overcommit.

3.	Remove MAXSMP and CPUMASK_OFFSTACK from TREE01.

						Thanx, Paul

------------------------------------------------------------------------

 bin/console-badness.sh  |    2 +-
 bin/parse-console.sh    |    2 +-
 configs/rcu/TREE01      |    2 --
 configs/rcu/TREE01.boot |    2 +-
 4 files changed, 3 insertions(+), 5 deletions(-)


* [PATCH 1/3] torture: Check for "Call trace:" as well as "Call Trace:"
  2025-05-08 23:42 [PATCH 0/3] Make rcutorture safe(r) for arm64 Paul E. McKenney
@ 2025-05-08 23:42 ` Paul E. McKenney
  2025-05-08 23:42 ` [PATCH 2/3] rcutorture: Reduce TREE01 CPU overcommit Paul E. McKenney
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-05-08 23:42 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney,
	Joel Fernandes

Different architectures capitalize their splats differently.  Who knew?

This commit therefore checks for both arm64 "Call trace:" and x86
"Call Trace:".
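
The effect can be seen with a small self-contained sketch.  The console
lines below are made-up samples rather than real kernel output, and the
temporary file stands in for a console log; only the two grep patterns
are taken from parse-console.sh before and after this change.

```shell
#!/bin/sh
# Sketch: count stack-trace headers with the old and new patterns.
# The sample console lines are invented for illustration.

console=$(mktemp)
cat > "$console" << '__EOF__'
[   12.345678] Call Trace:
[   23.456789] Call trace:
[   34.567890] some unrelated console line
__EOF__

# Pattern used before this change: matches only the x86 spelling.
n_old=$(grep -c 'Call Trace:' "$console")
# Pattern used after this change: matches both spellings.
n_calltrace=$(grep -Ec 'Call Trace:|Call trace:' "$console")

echo "old pattern: $n_old, new pattern: $n_calltrace"
rm -f "$console"
```

A bracket expression such as 'Call [Tt]race:' would match the same two
spellings; the explicit alternation simply mirrors what the patch does.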

Reported-by: Joel Fernandes <joelagnelf@nvidia.com>
Closes: https://lore.kernel.org/all/553c33d8-2b51-4772-8aef-97b0163bc78e@nvidia.com/
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 tools/testing/selftests/rcutorture/bin/console-badness.sh | 2 +-
 tools/testing/selftests/rcutorture/bin/parse-console.sh   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/bin/console-badness.sh b/tools/testing/selftests/rcutorture/bin/console-badness.sh
index aad51e7c0183d..991fb11306eb6 100755
--- a/tools/testing/selftests/rcutorture/bin/console-badness.sh
+++ b/tools/testing/selftests/rcutorture/bin/console-badness.sh
@@ -10,7 +10,7 @@
 #
 # Authors: Paul E. McKenney <paulmck@kernel.org>
 
-grep -E 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
+grep -E 'Badness|WARNING:|Warn|BUG|===========|BUG: KCSAN:|Call Trace:|Call trace:|Oops:|detected stalls on CPUs/tasks:|self-detected stall on CPU|Stall ended before state dump start|\?\?\? Writer stall state|rcu_.*kthread starved for|!!!' |
 grep -v 'ODEBUG: ' |
 grep -v 'This means that this is a DEBUG kernel and it is' |
 grep -v 'Warning: unable to open an initial console' |
diff --git a/tools/testing/selftests/rcutorture/bin/parse-console.sh b/tools/testing/selftests/rcutorture/bin/parse-console.sh
index b07c11cf6929d..21e6ba3615f6a 100755
--- a/tools/testing/selftests/rcutorture/bin/parse-console.sh
+++ b/tools/testing/selftests/rcutorture/bin/parse-console.sh
@@ -148,7 +148,7 @@ then
 			summary="$summary  KCSAN: $n_kcsan"
 		fi
 	fi
-	n_calltrace=`grep -c 'Call Trace:' $file`
+	n_calltrace=`grep -Ec 'Call Trace:|Call trace:' $file`
 	if test "$n_calltrace" -ne 0
 	then
 		summary="$summary  Call Traces: $n_calltrace"
-- 
2.40.1



* [PATCH 2/3] rcutorture: Reduce TREE01 CPU overcommit
  2025-05-08 23:42 [PATCH 0/3] Make rcutorture safe(r) for arm64 Paul E. McKenney
  2025-05-08 23:42 ` [PATCH 1/3] torture: Check for "Call trace:" as well as "Call Trace:" Paul E. McKenney
@ 2025-05-08 23:42 ` Paul E. McKenney
  2025-05-08 23:42 ` [PATCH 3/3] rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01 Paul E. McKenney
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-05-08 23:42 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

The TREE01.boot nr_cpus kernel boot parameter has been set to 43 for
more than seven years, but it can cause RCU CPU stall warnings on arm64,
most of the time involving the stop-machine subsystem.  This should
not be too surprising, given that this causes 43 vCPUs to spin with
interrupts disabled when there are only eight physical CPUs.

The point of this CPU overcommit is to test the ability of expedited RCU
grace period initialization to handle races with incoming CPUs that have
never previously been online.  But limiting to 17 CPUs instead of 43
allows time for this code to be exercised, and eliminates (or at least
greatly reduces) the incidence of RCU CPU stall warnings on arm64.

This commit therefore sets nr_cpus=17 in TREE01.boot.
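
As a back-of-the-envelope illustration of the overcommit described
above, the following sketch computes the vCPU-to-physical-CPU ratio
for both settings.  The counts (43, 17, and eight) come from the text
above; the helper name is made up, and the ratio is truncated rather
than rounded because the shell only does integer arithmetic.

```shell
#!/bin/sh
# Sketch: overcommit ratio before and after the nr_cpus change.
# overcommit_ratio is a hypothetical helper, not part of rcutorture.

overcommit_ratio () {
	# $1 = vCPUs, $2 = physical CPUs; prints the ratio truncated
	# to one decimal place.
	tenths=$(( $1 * 10 / $2 ))
	echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}

echo "nr_cpus=43: $(overcommit_ratio 43 8)x"
echo "nr_cpus=17: $(overcommit_ratio 17 8)x"
```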

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot
index 40af3df0f397f..1cc5b47dde282 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01.boot
@@ -1,4 +1,4 @@
-maxcpus=8 nr_cpus=43
+maxcpus=8 nr_cpus=17
 rcutree.gp_preinit_delay=3
 rcutree.gp_init_delay=3
 rcutree.gp_cleanup_delay=3
-- 
2.40.1



* [PATCH 3/3] rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01
  2025-05-08 23:42 [PATCH 0/3] Make rcutorture safe(r) for arm64 Paul E. McKenney
  2025-05-08 23:42 ` [PATCH 1/3] torture: Check for "Call trace:" as well as "Call Trace:" Paul E. McKenney
  2025-05-08 23:42 ` [PATCH 2/3] rcutorture: Reduce TREE01 CPU overcommit Paul E. McKenney
@ 2025-05-08 23:42 ` Paul E. McKenney
  2025-05-09 13:18 ` [PATCH 0/3] Make rcutorture safe(r) for arm64 Joel Fernandes
  2025-05-09 17:57 ` Joel Fernandes
  4 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-05-08 23:42 UTC (permalink / raw)
  To: rcu; +Cc: linux-kernel, kernel-team, rostedt, Paul E. McKenney

Back in the day, rcutorture was about the only thing that tested off-stack
CPU masks, but now any arm64 system with more than 256 CPUs tests them
full time.  In fact, it is necessary to hack the kernel to prevent such
a system from testing off-stack CPU masks.  This means that there is
no longer much point in rcutorture going out of its way to test this.
And given the differences in how CPUMASK_OFFSTACK is enabled in x86 and
arm64, rcutorture would need to go out of its way.

This commit therefore removes CONFIG_CPUMASK_OFFSTACK=y (and the
CONFIG_MAXSMP=y required to enable it on x86) from TREE01.
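
To see whether a given kernel build still exercises off-stack CPU
masks, one could check its .config along the following lines.  This is
only a sketch: the helper name and the sample file contents are
illustrative, and a real check would point the helper at the .config
in the build directory.

```shell
#!/bin/sh
# Sketch: report whether a kernel .config enables off-stack CPU masks.
# has_offstack and the sample file are illustrative assumptions.

has_offstack () {
	grep -q '^CONFIG_CPUMASK_OFFSTACK=y$' "$1"
}

sample=$(mktemp)
printf '%s\n' 'CONFIG_NR_CPUS=512' 'CONFIG_CPUMASK_OFFSTACK=y' > "$sample"

if has_offstack "$sample"
then
	echo "off-stack CPU masks enabled"
else
	echo "off-stack CPU masks not enabled"
fi
rm -f "$sample"
```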

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 tools/testing/selftests/rcutorture/configs/rcu/TREE01 | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE01 b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
index 8ae41d5f81a3e..54b1600c7eb5f 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE01
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE01
@@ -8,8 +8,6 @@ CONFIG_NO_HZ_IDLE=y
 CONFIG_NO_HZ_FULL=n
 CONFIG_RCU_TRACE=y
 CONFIG_HOTPLUG_CPU=y
-CONFIG_MAXSMP=y
-CONFIG_CPUMASK_OFFSTACK=y
 CONFIG_RCU_NOCB_CPU=y
 CONFIG_DEBUG_LOCK_ALLOC=n
 CONFIG_RCU_BOOST=n
-- 
2.40.1



* Re: [PATCH 0/3] Make rcutorture safe(r) for arm64
  2025-05-08 23:42 [PATCH 0/3] Make rcutorture safe(r) for arm64 Paul E. McKenney
                   ` (2 preceding siblings ...)
  2025-05-08 23:42 ` [PATCH 3/3] rcutorture: Remove MAXSMP and CPUMASK_OFFSTACK from TREE01 Paul E. McKenney
@ 2025-05-09 13:18 ` Joel Fernandes
  2025-05-09 14:32   ` Paul E. McKenney
  2025-05-09 17:57 ` Joel Fernandes
  4 siblings, 1 reply; 7+ messages in thread
From: Joel Fernandes @ 2025-05-09 13:18 UTC (permalink / raw)
  To: paulmck, rcu; +Cc: linux-kernel, kernel-team, rostedt



On 5/8/2025 7:42 PM, Paul E. McKenney wrote:
> Hello!
> 
> This series makes a few small updates to make rcutorture run better
> on arm64 servers.  Remaining issues include TREE07 .config issues
> that are addressed by Mark Rutland's porting of PREEMPT_LAZY to arm64
> and by upcoming work to handle the fact that arm64 kernels cannot be
> built with CONFIG_SMP=n.  In the meantime, the CONFIG_SMP=n issue can
> be worked around by explicitly specifying the TREE01, TREE02, TREE03,
> TREE04, TREE05, TREE07, SRCU-L, SRCU-N, SRCU-P, TASKS01, TASKS03, RUDE01,
> TRACE01, and TRACE02 scenarios, preferably in a script.  (But if you
> want typing practice, don't let me stand in your way!)
> 
> 1.	Check for "Call trace:" as well as "Call Trace:".
> 
> 2.	Reduce TREE01 CPU overcommit.
> 
> 3.	Remove MAXSMP and CPUMASK_OFFSTACK from TREE01.
> 

These I will take for 6.16 and run some tests, since we're seeing these issues
on ARM. But let me know if you want to delay to 6.17. Thanks!


* Re: [PATCH 0/3] Make rcutorture safe(r) for arm64
  2025-05-09 13:18 ` [PATCH 0/3] Make rcutorture safe(r) for arm64 Joel Fernandes
@ 2025-05-09 14:32   ` Paul E. McKenney
  0 siblings, 0 replies; 7+ messages in thread
From: Paul E. McKenney @ 2025-05-09 14:32 UTC (permalink / raw)
  To: Joel Fernandes; +Cc: rcu, linux-kernel, kernel-team, rostedt

On Fri, May 09, 2025 at 09:18:00AM -0400, Joel Fernandes wrote:
> 
> 
> On 5/8/2025 7:42 PM, Paul E. McKenney wrote:
> > Hello!
> > 
> > This series makes a few small updates to make rcutorture run better
> > on arm64 servers.  Remaining issues include TREE07 .config issues
> > that are addressed by Mark Rutland's porting of PREEMPT_LAZY to arm64
> > and by upcoming work to handle the fact that arm64 kernels cannot be
> > built with CONFIG_SMP=n.  In the meantime, the CONFIG_SMP=n issue can
> > be worked around by explicitly specifying the TREE01, TREE02, TREE03,
> > TREE04, TREE05, TREE07, SRCU-L, SRCU-N, SRCU-P, TASKS01, TASKS03, RUDE01,
> > TRACE01, and TRACE02 scenarios, preferably in a script.  (But if you
> > want typing practice, don't let me stand in your way!)
> > 
> > 1.	Check for "Call trace:" as well as "Call Trace:".
> > 
> > 2.	Reduce TREE01 CPU overcommit.
> > 
> > 3.	Remove MAXSMP and CPUMASK_OFFSTACK from TREE01.
> > 
> 
> These I will take for 6.16 and run some tests, since we're seeing these issues
> on ARM. But let me know if you want to delay to 6.17. Thanks!

Your decision on both sets makes a lot of sense to me, v6.16 for the
simple ARM-related ones and v6.17 for the less-important and more-complex
series.

							Thanx, Paul


* Re: [PATCH 0/3] Make rcutorture safe(r) for arm64
  2025-05-08 23:42 [PATCH 0/3] Make rcutorture safe(r) for arm64 Paul E. McKenney
                   ` (3 preceding siblings ...)
  2025-05-09 13:18 ` [PATCH 0/3] Make rcutorture safe(r) for arm64 Joel Fernandes
@ 2025-05-09 17:57 ` Joel Fernandes
  4 siblings, 0 replies; 7+ messages in thread
From: Joel Fernandes @ 2025-05-09 17:57 UTC (permalink / raw)
  To: paulmck, rcu; +Cc: linux-kernel, kernel-team, rostedt



On 5/8/2025 7:42 PM, Paul E. McKenney wrote:
> Hello!
> 
> This series makes a few small updates to make rcutorture run better
> on arm64 servers.  Remaining issues include TREE07 .config issues
> that are addressed by Mark Rutland's porting of PREEMPT_LAZY to arm64
> and by upcoming work to handle the fact that arm64 kernels cannot be
> built with CONFIG_SMP=n.  In the meantime, the CONFIG_SMP=n issue can
> be worked around by explicitly specifying the TREE01, TREE02, TREE03,
> TREE04, TREE05, TREE07, SRCU-L, SRCU-N, SRCU-P, TASKS01, TASKS03, RUDE01,
> TRACE01, and TRACE02 scenarios, preferably in a script.  (But if you
> want typing practice, don't let me stand in your way!)
> 
> 1.	Check for "Call trace:" as well as "Call Trace:".
> 
> 2.	Reduce TREE01 CPU overcommit.
> 
> 3.	Remove MAXSMP and CPUMASK_OFFSTACK from TREE01.
> 
Tested-by: Joel Fernandes <joelagnelf@nvidia.com>

Applied for 6.16, thanks!

 - Joel


