* [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters
@ 2025-01-23 18:58 Uladzislau Rezki (Sony)
2025-01-23 18:58 ` [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration Uladzislau Rezki (Sony)
` (3 more replies)
0 siblings, 4 replies; 32+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-01-23 18:58 UTC (permalink / raw)
To: Paul E . McKenney, Boqun Feng
Cc: RCU, LKML, Frederic Weisbecker, Cheung Wall, Neeraj upadhyay,
Joel Fernandes, Uladzislau Rezki, Oleksiy Avramchenko
Currently the "nfakewriters" parameter can be set to any value, but
there is no way to adjust it automatically based on the number of
CPUs in the system on which a test is run.
To address this, if "nfakewriters" is set to a negative value, it is
adjusted to num_possible_cpus() during torture initialization.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
kernel/rcu/rcutorture.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index d26fb1d33ed9..6bc161e1e8ac 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -4050,6 +4050,10 @@ rcu_torture_init(void)
writer_task);
if (torture_init_error(firsterr))
goto unwind;
+
+ if (nfakewriters < 0)
+ nfakewriters = (int) num_possible_cpus();
+
if (nfakewriters > 0) {
fakewriter_tasks = kcalloc(nfakewriters,
sizeof(fakewriter_tasks[0]),
--
2.39.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
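The rule added by this patch can be sketched in user-space C. This is only an
illustration under stated assumptions: resolve_nfakewriters() is a hypothetical
helper that does not exist in the kernel, and the CPU count is passed in by the
caller as a stand-in for num_possible_cpus().

```c
/*
 * Hypothetical user-space sketch of the nfakewriters rule from this patch.
 * resolve_nfakewriters() is not a kernel function; ncpus stands in for the
 * value that num_possible_cpus() would return.
 */
static int resolve_nfakewriters(int requested, int ncpus)
{
	/* A negative request means "one fake writer per possible CPU". */
	if (requested < 0)
		return ncpus;

	/* Zero (no fake writers) and positive counts are used as given. */
	return requested;
}
```

Under this sketch, a request of -1 on an 8-CPU system resolves to 8 fake
writers, while 0 and positive values pass through unchanged.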
* [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-23 18:58 [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki (Sony)
@ 2025-01-23 18:58 ` Uladzislau Rezki (Sony)
2025-01-23 20:29 ` Paul E. McKenney
2025-01-23 18:58 ` [PATCH 3/4] rcu: Update TREE05.boot to test normal synchronize_rcu() Uladzislau Rezki (Sony)
` (2 subsequent siblings)
3 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-01-23 18:58 UTC (permalink / raw)
To: Paul E . McKenney, Boqun Feng
Cc: RCU, LKML, Frederic Weisbecker, Cheung Wall, Neeraj upadhyay,
Joel Fernandes, Uladzislau Rezki, Oleksiy Avramchenko
This configuration option specifies the maximum number of CPUs, which
is set to 8. The problem is that it cannot be overridden with
something higher.
Remove that option from TREE05, so that it is possible to run
the torture test on as many CPUs as the system has.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
tools/testing/selftests/rcutorture/configs/rcu/TREE05 | 1 -
1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05 b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
index 9f48c73709ec..d6fbb82e3e6d 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
@@ -1,5 +1,4 @@
CONFIG_SMP=y
-CONFIG_NR_CPUS=8
CONFIG_PREEMPT_NONE=y
CONFIG_PREEMPT_VOLUNTARY=n
CONFIG_PREEMPT=n
--
2.39.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 3/4] rcu: Update TREE05.boot to test normal synchronize_rcu()
2025-01-23 18:58 [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki (Sony)
2025-01-23 18:58 ` [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration Uladzislau Rezki (Sony)
@ 2025-01-23 18:58 ` Uladzislau Rezki (Sony)
2025-01-23 20:30 ` Paul E. McKenney
2025-01-23 18:58 ` [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu() Uladzislau Rezki (Sony)
2025-01-28 20:55 ` [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki
3 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-01-23 18:58 UTC (permalink / raw)
To: Paul E . McKenney, Boqun Feng
Cc: RCU, LKML, Frederic Weisbecker, Cheung Wall, Neeraj upadhyay,
Joel Fernandes, Uladzislau Rezki, Oleksiy Avramchenko
Add extra parameters for the rcutorture module. One is "nfakewriters",
which is set to -1. This creates a number of test kthreads equal to
the number of CPUs in the test system. Those threads randomly
invoke synchronize_rcu().
Apart from that, "rcu_normal" is set to 1, because this is specifically
for testing normal synchronize_rcu(). The newly added parameter
"rcu_normal_wake_from_gp" is also set to 1, which prevents interaction
with other callbacks in the system.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot
index c419cac233ee..54f5c9053474 100644
--- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot
+++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot
@@ -2,3 +2,9 @@ rcutree.gp_preinit_delay=3
rcutree.gp_init_delay=3
rcutree.gp_cleanup_delay=3
rcupdate.rcu_self_test=1
+
+# This part is for synchronize_rcu() testing
+rcutorture.nfakewriters=-1
+rcutorture.gp_sync=1
+rcupdate.rcu_normal=1
+rcutree.rcu_normal_wake_from_gp=1
--
2.39.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu()
2025-01-23 18:58 [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki (Sony)
2025-01-23 18:58 ` [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration Uladzislau Rezki (Sony)
2025-01-23 18:58 ` [PATCH 3/4] rcu: Update TREE05.boot to test normal synchronize_rcu() Uladzislau Rezki (Sony)
@ 2025-01-23 18:58 ` Uladzislau Rezki (Sony)
2025-01-23 21:52 ` Paul E. McKenney
2025-01-28 20:55 ` [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki
3 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki (Sony) @ 2025-01-23 18:58 UTC (permalink / raw)
To: Paul E . McKenney, Boqun Feng
Cc: RCU, LKML, Frederic Weisbecker, Cheung Wall, Neeraj upadhyay,
Joel Fernandes, Uladzislau Rezki, Oleksiy Avramchenko
Switch to using the get_state_synchronize_rcu_full() and
poll_state_synchronize_rcu_full() pair to debug a normal
synchronize_rcu() call.
Using just the non-"full" APIs to identify whether a grace period
has passed might lead to a false kernel splat.
Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
---
include/linux/rcupdate_wait.h | 4 ++++
kernel/rcu/tree.c | 8 +++-----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
index f9bed3d3f78d..a16fc2a9a7d7 100644
--- a/include/linux/rcupdate_wait.h
+++ b/include/linux/rcupdate_wait.h
@@ -16,6 +16,10 @@
struct rcu_synchronize {
struct rcu_head head;
struct completion completion;
+#ifdef CONFIG_PROVE_RCU
+ /* This is for testing. */
+ struct rcu_gp_oldstate oldstate;
+#endif
};
void wakeme_after_rcu(struct rcu_head *head);
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2795d6b5109c..0ae90089ef09 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1612,12 +1612,10 @@ static void rcu_sr_normal_complete(struct llist_node *node)
{
struct rcu_synchronize *rs = container_of(
(struct rcu_head *) node, struct rcu_synchronize, head);
- unsigned long oldstate = (unsigned long) rs->head.func;
WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
- !poll_state_synchronize_rcu(oldstate),
- "A full grace period is not passed yet: %lu",
- rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
+ !poll_state_synchronize_rcu_full(&rs->oldstate),
+ "A full grace period is not passed yet!\n");
/* Finally. */
complete(&rs->completion);
@@ -3214,7 +3212,7 @@ static void synchronize_rcu_normal(void)
* snapshot before adding a request.
*/
if (IS_ENABLED(CONFIG_PROVE_RCU))
- rs.head.func = (void *) get_state_synchronize_rcu();
+ get_state_synchronize_rcu_full(&rs.oldstate);
rcu_sr_normal_add_req(&rs);
--
2.39.5
^ permalink raw reply related [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-23 18:58 ` [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration Uladzislau Rezki (Sony)
@ 2025-01-23 20:29 ` Paul E. McKenney
2025-01-24 11:41 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-23 20:29 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> This configuration specifies the maximum number of CPUs which
> is set to 8. The problem is that it can not be overwritten for
> something higher.
>
> Remove that configuration for TREE05, so it is possible to run
> the torture test on as many CPUs as many system has.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
You should be able to override this on the kvm.sh command line by
specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
For example, see the torture.sh querying the system's number of CPUs
and then specifying it to a number of tests.
Or am I missing something here?
Thanx, Paul
> ---
> tools/testing/selftests/rcutorture/configs/rcu/TREE05 | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05 b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
> index 9f48c73709ec..d6fbb82e3e6d 100644
> --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05
> +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05
> @@ -1,5 +1,4 @@
> CONFIG_SMP=y
> -CONFIG_NR_CPUS=8
> CONFIG_PREEMPT_NONE=y
> CONFIG_PREEMPT_VOLUNTARY=n
> CONFIG_PREEMPT=n
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 3/4] rcu: Update TREE05.boot to test normal synchronize_rcu()
2025-01-23 18:58 ` [PATCH 3/4] rcu: Update TREE05.boot to test normal synchronize_rcu() Uladzislau Rezki (Sony)
@ 2025-01-23 20:30 ` Paul E. McKenney
0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-23 20:30 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Thu, Jan 23, 2025 at 07:58:27PM +0100, Uladzislau Rezki (Sony) wrote:
> Add extra parameters for rcutorture module. One is the "nfakewriters"
> which is set -1. There will be created number of test-kthreads which
> correspond to number of CPUs in a test system. Those threads randomly
> invoke synchronize_rcu() call.
>
> Apart of that "rcu_normal" is set to 1, because it is specifically for
> a normal synchronize_rcu() testing, also a newly added parameter which
> is "rcu_normal_wake_from_gp" is set to 1 also. That prevents interaction
> with other callbacks in a system.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
This one looks fine to me.
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Thanx, Paul
> ---
> tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot b/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot
> index c419cac233ee..54f5c9053474 100644
> --- a/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot
> +++ b/tools/testing/selftests/rcutorture/configs/rcu/TREE05.boot
> @@ -2,3 +2,9 @@ rcutree.gp_preinit_delay=3
> rcutree.gp_init_delay=3
> rcutree.gp_cleanup_delay=3
> rcupdate.rcu_self_test=1
> +
> +# This part is for synchronize_rcu() testing
> +rcutorture.nfakewriters=-1
> +rcutorture.gp_sync=1
> +rcupdate.rcu_normal=1
> +rcutree.rcu_normal_wake_from_gp=1
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu()
2025-01-23 18:58 ` [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu() Uladzislau Rezki (Sony)
@ 2025-01-23 21:52 ` Paul E. McKenney
2025-01-24 11:48 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-23 21:52 UTC (permalink / raw)
To: Uladzislau Rezki (Sony)
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Thu, Jan 23, 2025 at 07:58:28PM +0100, Uladzislau Rezki (Sony) wrote:
> Switch for using of get_state_synchronize_rcu_full() and
> poll_state_synchronize_rcu_full() pair for debug a normal
> synchronize_rcu() call.
>
> Just using "not" full APIs to identify if a grace period
> is passed or not might lead to a false kernel splat.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> include/linux/rcupdate_wait.h | 4 ++++
> kernel/rcu/tree.c | 8 +++-----
> 2 files changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
> index f9bed3d3f78d..a16fc2a9a7d7 100644
> --- a/include/linux/rcupdate_wait.h
> +++ b/include/linux/rcupdate_wait.h
> @@ -16,6 +16,10 @@
> struct rcu_synchronize {
> struct rcu_head head;
> struct completion completion;
> +#ifdef CONFIG_PROVE_RCU
> + /* This is for testing. */
> + struct rcu_gp_oldstate oldstate;
> +#endif
> };
> void wakeme_after_rcu(struct rcu_head *head);
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 2795d6b5109c..0ae90089ef09 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -1612,12 +1612,10 @@ static void rcu_sr_normal_complete(struct llist_node *node)
> {
> struct rcu_synchronize *rs = container_of(
> (struct rcu_head *) node, struct rcu_synchronize, head);
> - unsigned long oldstate = (unsigned long) rs->head.func;
>
> WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> - !poll_state_synchronize_rcu(oldstate),
> - "A full grace period is not passed yet: %lu",
> - rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
> + !poll_state_synchronize_rcu_full(&rs->oldstate),
> + "A full grace period is not passed yet!\n");
Looks good, but why not also continue printing out the required
grace-period sequence number? Yes, there would need to be helper
sprintf()-style functions to paper over the difference between Tiny RCU
and Tree RCU. ;-)
Thanx, Paul
> /* Finally. */
> complete(&rs->completion);
> @@ -3214,7 +3212,7 @@ static void synchronize_rcu_normal(void)
> * snapshot before adding a request.
> */
> if (IS_ENABLED(CONFIG_PROVE_RCU))
> - rs.head.func = (void *) get_state_synchronize_rcu();
> + get_state_synchronize_rcu_full(&rs.oldstate);
>
> rcu_sr_normal_add_req(&rs);
>
> --
> 2.39.5
>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-23 20:29 ` Paul E. McKenney
@ 2025-01-24 11:41 ` Uladzislau Rezki
2025-01-24 15:45 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 11:41 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki (Sony), Boqun Feng, RCU, LKML,
Frederic Weisbecker, Cheung Wall, Neeraj upadhyay, Joel Fernandes,
Oleksiy Avramchenko
On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > This configuration specifies the maximum number of CPUs which
> > is set to 8. The problem is that it can not be overwritten for
> > something higher.
> >
> > Remove that configuration for TREE05, so it is possible to run
> > the torture test on as many CPUs as many system has.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> You should be able to override this on the kvm.sh command line by
> specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> For example, see the torture.sh querying the system's number of CPUs
> and then specifying it to a number of tests.
>
> Or am I missing something here?
>
It took me a while to understand what was happening. Apparently there is
this 8-CPU limitation. Yes, I can do it manually by passing --kconfig, but
you need to know about that. I did not expect that.
Therefore I removed it from the configuration, because I have not found
a good explanation of why we need it. It is confusing instead :)
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu()
2025-01-23 21:52 ` Paul E. McKenney
@ 2025-01-24 11:48 ` Uladzislau Rezki
2025-01-24 15:49 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 11:48 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki (Sony), Boqun Feng, RCU, LKML,
Frederic Weisbecker, Cheung Wall, Neeraj upadhyay, Joel Fernandes,
Oleksiy Avramchenko
On Thu, Jan 23, 2025 at 01:52:57PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 23, 2025 at 07:58:28PM +0100, Uladzislau Rezki (Sony) wrote:
> > Switch for using of get_state_synchronize_rcu_full() and
> > poll_state_synchronize_rcu_full() pair for debug a normal
> > synchronize_rcu() call.
> >
> > Just using "not" full APIs to identify if a grace period
> > is passed or not might lead to a false kernel splat.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> > include/linux/rcupdate_wait.h | 4 ++++
> > kernel/rcu/tree.c | 8 +++-----
> > 2 files changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
> > index f9bed3d3f78d..a16fc2a9a7d7 100644
> > --- a/include/linux/rcupdate_wait.h
> > +++ b/include/linux/rcupdate_wait.h
> > @@ -16,6 +16,10 @@
> > struct rcu_synchronize {
> > struct rcu_head head;
> > struct completion completion;
> > +#ifdef CONFIG_PROVE_RCU
> > + /* This is for testing. */
> > + struct rcu_gp_oldstate oldstate;
> > +#endif
> > };
> > void wakeme_after_rcu(struct rcu_head *head);
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 2795d6b5109c..0ae90089ef09 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1612,12 +1612,10 @@ static void rcu_sr_normal_complete(struct llist_node *node)
> > {
> > struct rcu_synchronize *rs = container_of(
> > (struct rcu_head *) node, struct rcu_synchronize, head);
> > - unsigned long oldstate = (unsigned long) rs->head.func;
> >
> > WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> > - !poll_state_synchronize_rcu(oldstate),
> > - "A full grace period is not passed yet: %lu",
> > - rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
> > + !poll_state_synchronize_rcu_full(&rs->oldstate),
> > + "A full grace period is not passed yet!\n");
>
> Looks good, but why not also continue printing out the required
> grace-period sequence number? Yes, there would need to be helper
> sprintf()-style functions to paper over the difference between Tiny RCU
> and Tree RCU. ;-)
>
Uhh :) Do we have rcu_seq_diff() for the _full() API? Looks like not :)
It contains both rgos_norm and rgos_exp! Take a delta of both?
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-24 11:41 ` Uladzislau Rezki
@ 2025-01-24 15:45 ` Paul E. McKenney
2025-01-24 17:21 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-24 15:45 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > This configuration specifies the maximum number of CPUs which
> > > is set to 8. The problem is that it can not be overwritten for
> > > something higher.
> > >
> > > Remove that configuration for TREE05, so it is possible to run
> > > the torture test on as many CPUs as many system has.
> > >
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> >
> > You should be able to override this on the kvm.sh command line by
> > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > For example, see the torture.sh querying the system's number of CPUs
> > and then specifying it to a number of tests.
> >
> > Or am I missing something here?
> >
> It took me a while to understand what happens. Apparently there is this
> 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> you need to know about that. I have not expected that.
>
> Therefore i removed it from the configuration because i have not found
> a good explanation why we need. It is confusing instead :)
Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
make use of 20 systems with 80 CPUs each. If you remove that line from
TREE05, won't each instance of TREE05 consume a full system, for a total
of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
command line, but that would affect all the scenarios, not just TREE05.
Including (say) TINY01, where I believe that it would cause kvm.sh
to complain about a Kconfig conflict.
Hence me not being in favor of this change. ;-)
Is there another way to make things work for both situations?
Thanx, Paul
^ permalink raw reply [flat|nested] 32+ messages in thread
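The packing concern in the message above comes down to simple arithmetic, which
can be sketched as follows. This is a deliberately simplified model, not the
kvm.sh batching logic: guests_per_host() and hosts_needed() are hypothetical
names, a guest with no CPU limit is assumed to claim the whole host, and only
one scenario is considered at a time.

```c
/*
 * Simplified, hypothetical model of guest packing. A guest_cpus value of 0
 * (or anything larger than the host) models a scenario without a
 * CONFIG_NR_CPUS limit, which is assumed here to claim the whole host.
 * host_cpus must be positive.
 */
static int guests_per_host(int host_cpus, int guest_cpus)
{
	if (guest_cpus <= 0 || guest_cpus > host_cpus)
		guest_cpus = host_cpus;

	return host_cpus / guest_cpus;
}

/* Number of hosts needed to run nguests guests concurrently (rounded up). */
static int hosts_needed(int nguests, int host_cpus, int guest_cpus)
{
	int per_host = guests_per_host(host_cpus, guest_cpus);

	return (nguests + per_host - 1) / per_host;
}
```

In this model, an 80-CPU host fits ten 8-CPU guests, so 14 such guests need
two hosts. Without the limit, each guest takes a full host and the same 14
guests need 14 hosts.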
* Re: [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu()
2025-01-24 11:48 ` Uladzislau Rezki
@ 2025-01-24 15:49 ` Paul E. McKenney
0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-24 15:49 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Fri, Jan 24, 2025 at 12:48:12PM +0100, Uladzislau Rezki wrote:
> On Thu, Jan 23, 2025 at 01:52:57PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 23, 2025 at 07:58:28PM +0100, Uladzislau Rezki (Sony) wrote:
> > > Switch for using of get_state_synchronize_rcu_full() and
> > > poll_state_synchronize_rcu_full() pair for debug a normal
> > > synchronize_rcu() call.
> > >
> > > Just using "not" full APIs to identify if a grace period
> > > is passed or not might lead to a false kernel splat.
> > >
> > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > ---
> > > include/linux/rcupdate_wait.h | 4 ++++
> > > kernel/rcu/tree.c | 8 +++-----
> > > 2 files changed, 7 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/include/linux/rcupdate_wait.h b/include/linux/rcupdate_wait.h
> > > index f9bed3d3f78d..a16fc2a9a7d7 100644
> > > --- a/include/linux/rcupdate_wait.h
> > > +++ b/include/linux/rcupdate_wait.h
> > > @@ -16,6 +16,10 @@
> > > struct rcu_synchronize {
> > > struct rcu_head head;
> > > struct completion completion;
> > > +#ifdef CONFIG_PROVE_RCU
> > > + /* This is for testing. */
> > > + struct rcu_gp_oldstate oldstate;
> > > +#endif
> > > };
> > > void wakeme_after_rcu(struct rcu_head *head);
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 2795d6b5109c..0ae90089ef09 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -1612,12 +1612,10 @@ static void rcu_sr_normal_complete(struct llist_node *node)
> > > {
> > > struct rcu_synchronize *rs = container_of(
> > > (struct rcu_head *) node, struct rcu_synchronize, head);
> > > - unsigned long oldstate = (unsigned long) rs->head.func;
> > >
> > > WARN_ONCE(IS_ENABLED(CONFIG_PROVE_RCU) &&
> > > - !poll_state_synchronize_rcu(oldstate),
> > > - "A full grace period is not passed yet: %lu",
> > > - rcu_seq_diff(get_state_synchronize_rcu(), oldstate));
> > > + !poll_state_synchronize_rcu_full(&rs->oldstate),
> > > + "A full grace period is not passed yet!\n");
> >
> > Looks good, but why not also continue printing out the required
> > grace-period sequence number? Yes, there would need to be helper
> > sprintf()-style functions to paper over the difference between Tiny RCU
> > and Tree RCU. ;-)
> >
> Uhh :) Do we have rcu_seq_diff() for a _full() API? Looks like not :)
>
> It contains both, rgos_norm and rgos_exp! Take a delta of both?
Why not? Maybe separate the two differences with a colon.
Or maybe make a variant of poll_state_synchronize_rcu_full() that takes
a char* argument, which uses the same value for the check and the string
to be output.
Thanx, Paul
^ permalink raw reply [flat|nested] 32+ messages in thread
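The "two differences separated by a colon" idea from the message above might
look roughly like the user-space sketch below. Everything here is an
assumption made for illustration: struct gp_oldstate_sketch only mimics the
rgos_norm and rgos_exp fields named in the thread, gp_oldstate_diff_str() is a
hypothetical helper, and plain unsigned subtraction stands in for the
kernel's rcu_seq_diff(), which is more involved.

```c
#include <stdio.h>

/* Hypothetical stand-in for the two-counter "full" grace-period snapshot. */
struct gp_oldstate_sketch {
	unsigned long rgos_norm;	/* normal grace-period sequence */
	unsigned long rgos_exp;		/* expedited grace-period sequence */
};

/*
 * Format the normal and expedited deltas as "norm:exp" into buf.
 * Returns the snprintf() result. Unsigned subtraction wraps on overflow;
 * this is a simplification of what rcu_seq_diff() does in the kernel.
 */
static int gp_oldstate_diff_str(char *buf, size_t len,
				const struct gp_oldstate_sketch *cur,
				const struct gp_oldstate_sketch *old)
{
	return snprintf(buf, len, "%lu:%lu",
			cur->rgos_norm - old->rgos_norm,
			cur->rgos_exp - old->rgos_exp);
}
```

A warning message could then append the resulting string, keeping the
sequence information that the pre-patch "%lu" format printed.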
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-24 15:45 ` Paul E. McKenney
@ 2025-01-24 17:21 ` Uladzislau Rezki
2025-01-24 17:36 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 17:21 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > This configuration specifies the maximum number of CPUs which
> > > > is set to 8. The problem is that it can not be overwritten for
> > > > something higher.
> > > >
> > > > Remove that configuration for TREE05, so it is possible to run
> > > > the torture test on as many CPUs as many system has.
> > > >
> > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > >
> > > You should be able to override this on the kvm.sh command line by
> > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > For example, see the torture.sh querying the system's number of CPUs
> > > and then specifying it to a number of tests.
> > >
> > > Or am I missing something here?
> > >
> > It took me a while to understand what happens. Apparently there is this
> > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > you need to know about that. I have not expected that.
> >
> > Therefore i removed it from the configuration because i have not found
> > a good explanation why we need. It is confusing instead :)
>
> Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> make use of 20 systems with 80 CPUs each. If you remove that line from
> TREE05, won't each instance of TREE05 consume a full system, for a total
> of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> command line, but that would affect all the scenarios, not just TREE05.
> Including (say) TINY01, where I believe that it would cause kvm.sh
> to complain about a Kconfig conflict.
>
> Hence me not being in favor of this change. ;-)
>
> Is there another way to make things work for both situations?
>
OK, I see. Well, I will just go with --kconfig CONFIG_NR_CPUS=foo if I
need more CPUs for TREE05.
I will not resist; let's just drop this patch :)
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-24 17:21 ` Uladzislau Rezki
@ 2025-01-24 17:36 ` Paul E. McKenney
2025-01-24 17:48 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-24 17:36 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > This configuration specifies the maximum number of CPUs which
> > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > something higher.
> > > > >
> > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > the torture test on as many CPUs as many system has.
> > > > >
> > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > >
> > > > You should be able to override this on the kvm.sh command line by
> > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > For example, see the torture.sh querying the system's number of CPUs
> > > > and then specifying it to a number of tests.
> > > >
> > > > Or am I missing something here?
> > > >
> > > It took me a while to understand what happens. Apparently there is this
> > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > you need to know about that. I have not expected that.
> > >
> > > Therefore i removed it from the configuration because i have not found
> > > a good explanation why we need. It is confusing instead :)
> >
> > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > make use of 20 systems with 80 CPUs each. If you remove that line from
> > TREE05, won't each instance of TREE05 consume a full system, for a total
> > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > command line, but that would affect all the scenarios, not just TREE05.
> > Including (say) TINY01, where I believe that it would cause kvm.sh
> > to complain about a Kconfig conflict.
> >
> > Hence me not being in favor of this change. ;-)
> >
> > Is there another way to make things work for both situations?
> >
> OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> need more CPUs for TREE05.
>
> I will not resist, we just drop this patch :)
Thank you!
The bug you are chasing happens when a given synchronize_rcu() interacts
with RCU readers, correct?
In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
interacts with rcu_torture_reader(). So my guess is that running
many small TREE05 guest OSes would reproduce this bug more quickly.
So instead of this:
--kconfig CONFIG_NR_CPUS=128
Do this:
--configs "16*TREE05"
Or maybe even this:
--configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
Thoughts?
Thanx, Paul
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-24 17:36 ` Paul E. McKenney
@ 2025-01-24 17:48 ` Uladzislau Rezki
2025-01-24 19:34 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-24 17:48 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > something higher.
> > > > > >
> > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > the torture test on as many CPUs as many system has.
> > > > > >
> > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > >
> > > > > You should be able to override this on the kvm.sh command line by
> > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > and then specifying it to a number of tests.
> > > > >
> > > > > Or am I missing something here?
> > > > >
> > > > It took me a while to understand what happens. Apparently there is this
> > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > you need to know about that. I have not expected that.
> > > >
> > > > Therefore i removed it from the configuration because i have not found
> > > > a good explanation why we need. It is confusing instead :)
> > >
> > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > command line, but that would affect all the scenarios, not just TREE05.
> > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > to complain about a Kconfig conflict.
> > >
> > > Hence me not being in favor of this change. ;-)
> > >
> > > Is there another way to make things work for both situations?
> > >
> > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > need more CPUs for TREE05.
> >
> > I will not resist, we just drop this patch :)
>
> Thank you!
>
> The bug you are chasing happens when a given synchronize_rcu() interacts
> with RCU readers, correct?
>
The one below:
<snip>
/*
* RCU torture fake writer kthread. Repeatedly calls sync, with a random
* delay between calls.
*/
static int
rcu_torture_fakewriter(void *arg)
{
...
<snip>
> In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> interacts with rcu_torture_reader(). So my guess is that running
> many small TREE05 guest OSes would reproduce this bug more quickly.
> So instead of this:
>
> --kconfig CONFIG_NR_CPUS=128
>
> Do this:
>
> --configs "16*TREE05"
>
> Or maybe even this:
>
> --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
Thanks for the input.
>
> Thoughts?
>
If you mean the splat below:
<snip>
[ 32.107748] =============================
[ 32.108512] WARNING: suspicious RCU usage
[ 32.109232] 6.12.0-rc4-dirty #66 Not tainted
[ 32.110058] -----------------------------
[ 32.110817] kernel/events/core.c:13962 RCU-list traversed in non-reader section!!
[ 32.111221] kworker/u34:2 (251) used greatest stack depth: 12112 bytes left
[ 32.112125]
[ 32.112125] other info that might help us debug this:
[ 32.112125]
[ 32.112130]
[ 32.112130] rcu_scheduler_active = 2, debug_locks = 1
[ 32.116039] 3 locks held by cpuhp/1/20:
[ 32.116758] #0: ffffffff93a6a750 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x50/0x220
[ 32.118410] #1: ffffffff93a6ce00 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x50/0x220
[ 32.120091] #2: ffffffff93b7eb68 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context+0x32/0x2d0
[ 32.121723]
[ 32.121723] stack backtrace:
[ 32.122413] CPU: 1 UID: 0 PID: 20 Comm: cpuhp/1 Not tainted 6.12.0-rc4-dirty #66
[ 32.123666] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[ 32.125302] Call Trace:
[ 32.125769] <TASK>
[ 32.126148] dump_stack_lvl+0x83/0xa0
[ 32.126823] lockdep_rcu_suspicious+0x113/0x180
[ 32.127652] perf_event_exit_cpu_context+0x2c4/0x2d0
[ 32.128593] ? __pfx_perf_event_exit_cpu+0x10/0x10
[ 32.129489] perf_event_exit_cpu+0x9/0x10
[ 32.130243] cpuhp_invoke_callback+0x187/0x6e0
[ 32.131065] ? cpuhp_thread_fun+0x50/0x220
[ 32.131800] cpuhp_thread_fun+0x185/0x220
[ 32.132560] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 32.133394] smpboot_thread_fn+0xd8/0x1d0
[ 32.134050] kthread+0xd0/0x100
[ 32.134592] ? __pfx_kthread+0x10/0x10
[ 32.135270] ret_from_fork+0x2f/0x50
[ 32.135896] ? __pfx_kthread+0x10/0x10
[ 32.136610] ret_from_fork_asm+0x1a/0x30
[ 32.137356] </TASK>
[ 32.140997] smpboot: CPU 1 is now offline
<snip>
I reproduced that using:
+rcutorture.nfakewriters=128
+rcutorture.gp_sync=1
+rcupdate.rcu_expedited=0
+rcupdate.rcu_normal=1
+rcutree.rcu_normal_wake_from_gp=1
<snip>
The test script:
for (( i=0; i<$LOOPS; i++ )); do
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 64 --configs \
'100*TREE05' --memory 20G --bootargs 'rcutorture.fwd_progress=1'
echo "Done $i"
done
i.e. with more nfakewriters.
If you mean the one that was recently reported, i am not able to
reproduce it at all :)
--
Uladzislau Rezki
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-24 17:48 ` Uladzislau Rezki
@ 2025-01-24 19:34 ` Paul E. McKenney
2025-01-27 13:27 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-24 19:34 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > something higher.
> > > > > > >
> > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > the torture test on as many CPUs as many system has.
> > > > > > >
> > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > >
> > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > and then specifying it to a number of tests.
> > > > > >
> > > > > > Or am I missing something here?
> > > > > >
> > > > > It took me a while to understand what happens. Apparently there is this
> > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > you need to know about that. I have not expected that.
> > > > >
> > > > > Therefore i removed it from the configuration because i have not found
> > > > > a good explanation why we need. It is confusing instead :)
> > > >
> > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > to complain about a Kconfig conflict.
> > > >
> > > > Hence me not being in favor of this change. ;-)
> > > >
> > > > Is there another way to make things work for both situations?
> > > >
> > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > need more CPUs for TREE05.
> > >
> > > I will not resist, we just drop this patch :)
> >
> > Thank you!
> >
> > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > with RCU readers, correct?
> >
> Below one:
>
> <snip>
> /*
> * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> * delay between calls.
> */
> static int
> rcu_torture_fakewriter(void *arg)
> {
> ...
> <snip>
>
> > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > interacts with rcu_torture_reader(). So my guess is that running
> > many small TREE05 guest OSes would reproduce this bug more quickly.
> > So instead of this:
> >
> > --kconfig CONFIG_NR_CPUS=128
> >
> > Do this:
> >
> > --configs "16*TREE05"
> >
> > Or maybe even this:
> >
> > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> Thanks for input.
>
> >
> > Thoughts?
> >
> If you mean below splat:
No, instead the one reported by cheung wall <zzqq0103.hey@gmail.com>.
> <snip>
> [ 32.107748] =============================
> [ 32.108512] WARNING: suspicious RCU usage
> [ 32.109232] 6.12.0-rc4-dirty #66 Not tainted
> [ 32.110058] -----------------------------
> [ 32.110817] kernel/events/core.c:13962 RCU-list traversed in non-reader section!!
> [ 32.111221] kworker/u34:2 (251) used greatest stack depth: 12112 bytes left
> [ 32.112125]
> [ 32.112125] other info that might help us debug this:
> [ 32.112125]
> [ 32.112130]
> [ 32.112130] rcu_scheduler_active = 2, debug_locks = 1
> [ 32.116039] 3 locks held by cpuhp/1/20:
> [ 32.116758] #0: ffffffff93a6a750 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x50/0x220
> [ 32.118410] #1: ffffffff93a6ce00 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0x50/0x220
> [ 32.120091] #2: ffffffff93b7eb68 (pmus_lock){+.+.}-{3:3}, at: perf_event_exit_cpu_context+0x32/0x2d0
> [ 32.121723]
> [ 32.121723] stack backtrace:
> [ 32.122413] CPU: 1 UID: 0 PID: 20 Comm: cpuhp/1 Not tainted 6.12.0-rc4-dirty #66
> [ 32.123666] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.13.0-1ubuntu1.1 04/01/2014
> [ 32.125302] Call Trace:
> [ 32.125769] <TASK>
> [ 32.126148] dump_stack_lvl+0x83/0xa0
> [ 32.126823] lockdep_rcu_suspicious+0x113/0x180
> [ 32.127652] perf_event_exit_cpu_context+0x2c4/0x2d0
> [ 32.128593] ? __pfx_perf_event_exit_cpu+0x10/0x10
> [ 32.129489] perf_event_exit_cpu+0x9/0x10
> [ 32.130243] cpuhp_invoke_callback+0x187/0x6e0
> [ 32.131065] ? cpuhp_thread_fun+0x50/0x220
> [ 32.131800] cpuhp_thread_fun+0x185/0x220
> [ 32.132560] ? __pfx_smpboot_thread_fn+0x10/0x10
> [ 32.133394] smpboot_thread_fn+0xd8/0x1d0
> [ 32.134050] kthread+0xd0/0x100
> [ 32.134592] ? __pfx_kthread+0x10/0x10
> [ 32.135270] ret_from_fork+0x2f/0x50
> [ 32.135896] ? __pfx_kthread+0x10/0x10
> [ 32.136610] ret_from_fork_asm+0x1a/0x30
> [ 32.137356] </TASK>
> [ 32.140997] smpboot: CPU 1 is now offline
> <snip>
>
> I reproduced that using:
>
> +rcutorture.nfakewriters=128
> +rcutorture.gp_sync=1
> +rcupdate.rcu_expedited=0
> +rcupdate.rcu_normal=1
> +rcutree.rcu_normal_wake_from_gp=1
> <snip>
>
> The test script:
>
> for (( i=0; i<$LOOPS; i++ )); do
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 64 --configs \
> '100*TREE05' --memory 20G --bootargs 'rcutorture.fwd_progress=1'
> echo "Done $i"
> done
>
> i.e. with more nfakewriters.
Right, and large nfakewriters would help push the synchronize_rcu()
wakeups off of the grace-period kthread.
> If you mean the one that has recently reported, i am not able to
> reproduce it anyhow :)
Using larger numbers of smaller rcutorture guest OSes might help to
reproduce it. Maybe as small as three CPUs each. ;-)
Thanx, Paul
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-24 19:34 ` Paul E. McKenney
@ 2025-01-27 13:27 ` Uladzislau Rezki
2025-01-27 14:51 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-27 13:27 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > something higher.
> > > > > > > >
> > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > >
> > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > >
> > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > and then specifying it to a number of tests.
> > > > > > >
> > > > > > > Or am I missing something here?
> > > > > > >
> > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > you need to know about that. I have not expected that.
> > > > > >
> > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > a good explanation why we need. It is confusing instead :)
> > > > >
> > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > to complain about a Kconfig conflict.
> > > > >
> > > > > Hence me not being in favor of this change. ;-)
> > > > >
> > > > > Is there another way to make things work for both situations?
> > > > >
> > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > need more CPUs for TREE05.
> > > >
> > > > I will not resist, we just drop this patch :)
> > >
> > > Thank you!
> > >
> > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > with RCU readers, correct?
> > >
> > Below one:
> >
> > <snip>
> > /*
> > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > * delay between calls.
> > */
> > static int
> > rcu_torture_fakewriter(void *arg)
> > {
> > ...
> > <snip>
> >
> > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > interacts with rcu_torture_reader(). So my guess is that running
> > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > So instead of this:
> > >
> > > --kconfig CONFIG_NR_CPUS=128
> > >
> > > Do this:
> > >
> > > --configs "16*TREE05"
> > >
> > > Or maybe even this:
> > >
> > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > Thanks for input.
> >
> > >
> > > Thoughts?
> > >
> > If you mean below splat:
>
> >
> > i.e. with more nfakewriters.
>
> Right, and large nfakewriters would help push the synchronize_rcu()
> wakeups off of the grace-period kthread.
>
> > If you mean the one that has recently reported, i am not able to
> > reproduce it anyhow :)
>
> Using larger numbers of smaller rcutorture guest OSes might help to
> reproduce it. Maybe as small as three CPUs each. ;-)
>
OK. I will give this a try:
for (( i=0; i<$LOOPS; i++ )); do
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
'16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
echo "Done $i"
done
--
Uladzislau Rezki
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 13:27 ` Uladzislau Rezki
@ 2025-01-27 14:51 ` Paul E. McKenney
2025-01-27 15:42 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-27 14:51 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 02:27:51PM +0100, Uladzislau Rezki wrote:
> On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> > On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > > something higher.
> > > > > > > > >
> > > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > >
> > > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > > and then specifying it to a number of tests.
> > > > > > > >
> > > > > > > > Or am I missing something here?
> > > > > > > >
> > > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > > you need to know about that. I have not expected that.
> > > > > > >
> > > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > > a good explanation why we need. It is confusing instead :)
> > > > > >
> > > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > > to complain about a Kconfig conflict.
> > > > > >
> > > > > > Hence me not being in favor of this change. ;-)
> > > > > >
> > > > > > Is there another way to make things work for both situations?
> > > > > >
> > > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > > need more CPUs for TREE05.
> > > > >
> > > > > I will not resist, we just drop this patch :)
> > > >
> > > > Thank you!
> > > >
> > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > with RCU readers, correct?
> > > >
> > > Below one:
> > >
> > > <snip>
> > > /*
> > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > * delay between calls.
> > > */
> > > static int
> > > rcu_torture_fakewriter(void *arg)
> > > {
> > > ...
> > > <snip>
> > >
> > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > So instead of this:
> > > >
> > > > --kconfig CONFIG_NR_CPUS=128
> > > >
> > > > Do this:
> > > >
> > > > --configs "16*TREE05"
> > > >
> > > > Or maybe even this:
> > > >
> > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > Thanks for input.
> > >
> > > >
> > > > Thoughts?
> > > >
> > > If you mean below splat:
> >
> > >
> > > i.e. with more nfakewriters.
> >
> > Right, and large nfakewriters would help push the synchronize_rcu()
> > wakeups off of the grace-period kthread.
> >
> > > If you mean the one that has recently reported, i am not able to
> > > reproduce it anyhow :)
> >
> > Using larger numbers of smaller rcutorture guest OSes might help to
> > reproduce it. Maybe as small as three CPUs each. ;-)
> >
> OK. I will give a try this:
>
> for (( i=0; i<$LOOPS; i++ )); do
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> echo "Done $i"
> done
Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
whatever) as well, perhaps also increasing the "16*TREE05".
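
Putting the two suggestions together, the loop might look like the sketch
below. It only reuses kvm.sh flags already shown in this thread (--cpus,
--memory, --configs, --kconfig, --bootargs). The LOOPS, NGUESTS, GUEST_CPUS
and DRY_RUN variables and their values are illustrative assumptions, not
tested settings.

```shell
#!/bin/sh
# Sketch only: many small TREE05 guests instead of a few large ones.
# LOOPS, NGUESTS, GUEST_CPUS and DRY_RUN are hypothetical knobs for this example.
LOOPS=${LOOPS:-1}
NGUESTS=${NGUESTS:-32}        # more instances than the 16*TREE05 above
GUEST_CPUS=${GUEST_CPUS:-4}   # smaller than TREE05's default of 8
DRY_RUN=${DRY_RUN:-1}         # 1: print the command instead of running it
KVM=tools/testing/selftests/rcutorture/bin/kvm.sh

run_once() {
	# The positional parameters hold the kvm.sh arguments, one word each.
	set -- --cpus 64 --memory 10G \
		--configs "${NGUESTS}*TREE05" \
		--kconfig "CONFIG_NR_CPUS=${GUEST_CPUS}" \
		--bootargs 'rcutorture.fwd_progress=1'
	if [ "$DRY_RUN" = 1 ]; then
		echo "$KVM $*"
	else
		"$KVM" "$@"
	fi
}

i=0
while [ "$i" -lt "$LOOPS" ]; do
	run_once
	echo "Done $i"
	i=$((i + 1))
done
```

With DRY_RUN=1 the script only prints the command line, which makes it easy
to check the --configs and --kconfig values before committing host time.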
Thanx, Paul
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 14:51 ` Paul E. McKenney
@ 2025-01-27 15:42 ` Uladzislau Rezki
2025-01-27 16:51 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-27 15:42 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 06:51:44AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 27, 2025 at 02:27:51PM +0100, Uladzislau Rezki wrote:
> > On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> > > On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > > > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > > > something higher.
> > > > > > > > > >
> > > > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > >
> > > > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > > > and then specifying it to a number of tests.
> > > > > > > > >
> > > > > > > > > Or am I missing something here?
> > > > > > > > >
> > > > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > > > you need to know about that. I have not expected that.
> > > > > > > >
> > > > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > > > a good explanation why we need. It is confusing instead :)
> > > > > > >
> > > > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > > > to complain about a Kconfig conflict.
> > > > > > >
> > > > > > > Hence me not being in favor of this change. ;-)
> > > > > > >
> > > > > > > Is there another way to make things work for both situations?
> > > > > > >
> > > > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > > > need more CPUs for TREE05.
> > > > > >
> > > > > > I will not resist, we just drop this patch :)
> > > > >
> > > > > Thank you!
> > > > >
> > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > with RCU readers, correct?
> > > > >
> > > > Below one:
> > > >
> > > > <snip>
> > > > /*
> > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > * delay between calls.
> > > > */
> > > > static int
> > > > rcu_torture_fakewriter(void *arg)
> > > > {
> > > > ...
> > > > <snip>
> > > >
> > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > So instead of this:
> > > > >
> > > > > --kconfig CONFIG_NR_CPUS=128
> > > > >
> > > > > Do this:
> > > > >
> > > > > --configs "16*TREE05"
> > > > >
> > > > > Or maybe even this:
> > > > >
> > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > Thanks for input.
> > > >
> > > > >
> > > > > Thoughts?
> > > > >
> > > > If you mean below splat:
> > >
> > > >
> > > > i.e. with more nfakewriters.
> > >
> > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > wakeups off of the grace-period kthread.
> > >
> > > > If you mean the one that has recently reported, i am not able to
> > > > reproduce it anyhow :)
> > >
> > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > reproduce it. Maybe as small as three CPUs each. ;-)
> > >
> > OK. I will give a try this:
> >
> > for (( i=0; i<$LOOPS; i++ )); do
> > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > echo "Done $i"
> > done
>
> Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> whatever) as well, perhaps also increasing the "16*TREE05".
>
By default we have NR_CPUS=8, as we discussed. Passing "--cpus 5" to
kvm.sh will just set the number of CPUs for a VM to 5:
<snip>
...
[ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
...
<snip>
so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
Am i missing something? :)
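
The boot log above is consistent with the guest CPU count being the smaller
of the scenario's CONFIG_NR_CPUS and the value given to --cpus. A minimal
sketch of that rule, assuming a simple clamp (the function name is made up
for illustration, and this is not a copy of the kvm.sh logic):

```shell
#!/bin/sh
# Hypothetical helper: guest CPU count as min(CONFIG_NR_CPUS, --cpus).
guest_cpus() {
	nr_cpus=$1     # CONFIG_NR_CPUS from the scenario, e.g. 8 for TREE05
	allotted=$2    # value passed to kvm.sh --cpus
	if [ "$nr_cpus" -gt "$allotted" ]; then
		echo "$allotted"
	else
		echo "$nr_cpus"
	fi
}

guest_cpus 8 5    # prints 5, matching "CPUs=5" in the log above
guest_cpus 4 64   # prints 4
```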
--
Uladzislau Rezki
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 15:42 ` Uladzislau Rezki
@ 2025-01-27 16:51 ` Paul E. McKenney
2025-01-27 17:26 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-27 16:51 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 04:42:58PM +0100, Uladzislau Rezki wrote:
> On Mon, Jan 27, 2025 at 06:51:44AM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 27, 2025 at 02:27:51PM +0100, Uladzislau Rezki wrote:
> > > On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> > > > On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > > > > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > > > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > > > > something higher.
> > > > > > > > > > >
> > > > > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > > >
> > > > > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > > > > and then specifying it to a number of tests.
> > > > > > > > > >
> > > > > > > > > > Or am I missing something here?
> > > > > > > > > >
> > > > > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > > > > you need to know about that. I have not expected that.
> > > > > > > > >
> > > > > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > > > > a good explanation why we need. It is confusing instead :)
> > > > > > > >
> > > > > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > > > > to complain about a Kconfig conflict.
> > > > > > > >
> > > > > > > > Hence me not being in favor of this change. ;-)
> > > > > > > >
> > > > > > > > Is there another way to make things work for both situations?
> > > > > > > >
> > > > > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > > > > need more CPUs for TREE05.
> > > > > > >
> > > > > > > I will not resist, we just drop this patch :)
> > > > > >
> > > > > > Thank you!
> > > > > >
> > > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > > with RCU readers, correct?
> > > > > >
> > > > > Below one:
> > > > >
> > > > > <snip>
> > > > > /*
> > > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > > * delay between calls.
> > > > > */
> > > > > static int
> > > > > rcu_torture_fakewriter(void *arg)
> > > > > {
> > > > > ...
> > > > > <snip>
> > > > >
> > > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > > So instead of this:
> > > > > >
> > > > > > --kconfig CONFIG_NR_CPUS=128
> > > > > >
> > > > > > Do this:
> > > > > >
> > > > > > --configs "16*TREE05"
> > > > > >
> > > > > > Or maybe even this:
> > > > > >
> > > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > > Thanks for input.
> > > > >
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > If you mean below splat:
> > > >
> > > > >
> > > > > i.e. with more nfakewriters.
> > > >
> > > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > > wakeups off of the grace-period kthread.
> > > >
> > > > > If you mean the one that has recently reported, i am not able to
> > > > > reproduce it anyhow :)
> > > >
> > > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > > reproduce it. Maybe as small as three CPUs each. ;-)
> > > >
> > > OK. I will give a try this:
> > >
> > > for (( i=0; i<$LOOPS; i++ )); do
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > > echo "Done $i"
> > > done
> >
> > Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> > whatever) as well, perhaps also increasing the "16*TREE05".
> >
>
> By default we have NR_CPUS=8, as we discussed. Passing "--cpus 5" to
> kvm.sh will just set the number of CPUs for a VM to 5:
>
> <snip>
> ...
> [ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
> ...
> <snip>
>
> so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
>
> Am i missing something? :)
Because that gets you more guest OSes running on your system, each with
one RCU-update kthread that is being checked by RCU reader kthreads.
Therefore, it might double the rate at which you are able to reproduce
this issue.
Thanx, Paul
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 16:51 ` Paul E. McKenney
@ 2025-01-27 17:26 ` Uladzislau Rezki
2025-01-27 18:15 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-27 17:26 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 08:51:01AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 27, 2025 at 04:42:58PM +0100, Uladzislau Rezki wrote:
> > On Mon, Jan 27, 2025 at 06:51:44AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 27, 2025 at 02:27:51PM +0100, Uladzislau Rezki wrote:
> > > > On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> > > > > On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > > > > > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > > > > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > > > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > > > > > something higher.
> > > > > > > > > > > >
> > > > > > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > > > >
> > > > > > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > > > > > and then specifying it to a number of tests.
> > > > > > > > > > >
> > > > > > > > > > > Or am I missing something here?
> > > > > > > > > > >
> > > > > > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > > > > > you need to know about that. I have not expected that.
> > > > > > > > > >
> > > > > > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > > > > > a good explanation why we need. It is confusing instead :)
> > > > > > > > >
> > > > > > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > > > > > to complain about a Kconfig conflict.
> > > > > > > > >
> > > > > > > > > Hence me not being in favor of this change. ;-)
> > > > > > > > >
> > > > > > > > > Is there another way to make things work for both situations?
> > > > > > > > >
> > > > > > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > > > > > need more CPUs for TREE05.
> > > > > > > >
> > > > > > > > I will not resist, we just drop this patch :)
> > > > > > >
> > > > > > > Thank you!
> > > > > > >
> > > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > > > with RCU readers, correct?
> > > > > > >
> > > > > > Below one:
> > > > > >
> > > > > > <snip>
> > > > > > /*
> > > > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > > > * delay between calls.
> > > > > > */
> > > > > > static int
> > > > > > rcu_torture_fakewriter(void *arg)
> > > > > > {
> > > > > > ...
> > > > > > <snip>
> > > > > >
> > > > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > > > So instead of this:
> > > > > > >
> > > > > > > --kconfig CONFIG_NR_CPUS=128
> > > > > > >
> > > > > > > Do this:
> > > > > > >
> > > > > > > --configs "16*TREE05"
> > > > > > >
> > > > > > > Or maybe even this:
> > > > > > >
> > > > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > > > Thanks for input.
> > > > > >
> > > > > > >
> > > > > > > Thoughts?
> > > > > > >
> > > > > > If you mean below splat:
> > > > >
> > > > > >
> > > > > > i.e. with more nfakewriters.
> > > > >
> > > > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > > > wakeups off of the grace-period kthread.
> > > > >
> > > > > > If you mean the one that has recently reported, i am not able to
> > > > > > reproduce it anyhow :)
> > > > >
> > > > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > > > reproduce it. Maybe as small as three CPUs each. ;-)
> > > > >
> > > > OK. I will give a try this:
> > > >
> > > > for (( i=0; i<$LOOPS; i++ )); do
> > > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > > > echo "Done $i"
> > > > done
> > >
> > > Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> > > whatever) as well, perhaps also increasing the "16*TREE05".
> > >
> >
> > By default we have NR_CPUS=8, as we discussed. Providing to kvm "--cpus 5"
> > parameter will just set number of CPUs for a VM to 5:
> >
> > <snip>
> > ...
> > [ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
> > ...
> > <snip>
> >
> > so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
> >
> > Am i missing something? :)
>
> Because that gets you more guest OSes running on your system, each with
> one RCU-update kthread that is being checked by RCU reader kthreads.
> Therefore, it might double the rate at which you are able to reproduce
> this issue.
>
You mean that setting --kconfig CONFIG_NR_CPUS=4 and 16*TREE05 will run
4 separate KVM instances?
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 17:26 ` Uladzislau Rezki
@ 2025-01-27 18:15 ` Paul E. McKenney
2025-01-27 18:31 ` Uladzislau Rezki
2025-01-27 19:24 ` Uladzislau Rezki
0 siblings, 2 replies; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-27 18:15 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 06:26:59PM +0100, Uladzislau Rezki wrote:
> On Mon, Jan 27, 2025 at 08:51:01AM -0800, Paul E. McKenney wrote:
> > On Mon, Jan 27, 2025 at 04:42:58PM +0100, Uladzislau Rezki wrote:
> > > On Mon, Jan 27, 2025 at 06:51:44AM -0800, Paul E. McKenney wrote:
> > > > On Mon, Jan 27, 2025 at 02:27:51PM +0100, Uladzislau Rezki wrote:
> > > > > On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> > > > > > On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > > > > > > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > > > > > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > > > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > > > > > > something higher.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > > > > >
> > > > > > > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > > > > > > and then specifying it to a number of tests.
> > > > > > > > > > > >
> > > > > > > > > > > > Or am I missing something here?
> > > > > > > > > > > >
> > > > > > > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > > > > > > you need to know about that. I have not expected that.
> > > > > > > > > > >
> > > > > > > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > > > > > > a good explanation why we need. It is confusing instead :)
> > > > > > > > > >
> > > > > > > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > > > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > > > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > > > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > > > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > > > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > > > > > > to complain about a Kconfig conflict.
> > > > > > > > > >
> > > > > > > > > > Hence me not being in favor of this change. ;-)
> > > > > > > > > >
> > > > > > > > > > Is there another way to make things work for both situations?
> > > > > > > > > >
> > > > > > > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > > > > > > need more CPUs for TREE05.
> > > > > > > > >
> > > > > > > > > I will not resist, we just drop this patch :)
> > > > > > > >
> > > > > > > > Thank you!
> > > > > > > >
> > > > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > > > > with RCU readers, correct?
> > > > > > > >
> > > > > > > Below one:
> > > > > > >
> > > > > > > <snip>
> > > > > > > /*
> > > > > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > > > > * delay between calls.
> > > > > > > */
> > > > > > > static int
> > > > > > > rcu_torture_fakewriter(void *arg)
> > > > > > > {
> > > > > > > ...
> > > > > > > <snip>
> > > > > > >
> > > > > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > > > > So instead of this:
> > > > > > > >
> > > > > > > > --kconfig CONFIG_NR_CPUS=128
> > > > > > > >
> > > > > > > > Do this:
> > > > > > > >
> > > > > > > > --configs "16*TREE05"
> > > > > > > >
> > > > > > > > Or maybe even this:
> > > > > > > >
> > > > > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > > > > Thanks for input.
> > > > > > >
> > > > > > > >
> > > > > > > > Thoughts?
> > > > > > > >
> > > > > > > If you mean below splat:
> > > > > >
> > > > > > >
> > > > > > > i.e. with more nfakewriters.
> > > > > >
> > > > > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > > > > wakeups off of the grace-period kthread.
> > > > > >
> > > > > > > If you mean the one that has recently reported, i am not able to
> > > > > > > reproduce it anyhow :)
> > > > > >
> > > > > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > > > > reproduce it. Maybe as small as three CPUs each. ;-)
> > > > > >
> > > > > OK. I will give a try this:
> > > > >
> > > > > for (( i=0; i<$LOOPS; i++ )); do
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > > > > echo "Done $i"
> > > > > done
> > > >
> > > > Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> > > > whatever) as well, perhaps also increasing the "16*TREE05".
> > > >
> > >
> > > By default we have NR_CPUS=8, as we discussed. Providing to kvm "--cpus 5"
> > > parameter will just set number of CPUs for a VM to 5:
> > >
> > > <snip>
> > > ...
> > > [ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
> > > ...
> > > <snip>
> > >
> > > so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
> > >
> > > Am i missing something? :)
> >
> > Because that gets you more guest OSes running on your system, each with
> > one RCU-update kthread that is being checked by RCU reader kthreads.
> > Therefore, it might double the rate at which you are able to reproduce
> > this issue.
> >
> You mean that setting --kconfig CONFIG_NR_CPUS=4 and 16*TREE05 will run
> 4 separate KVM instances?
Almost but not quite.

I am assuming that you have a system with a multiple of eight CPUs.

If so, and assuming that Cheung's bug is an interaction between a fast
synchronize_rcu() grace period and a reader task that this grace period
is waiting on, having more and smaller guest OSes might make the problem
happen faster. So instead of your:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
'16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'

You might be able to double the number of reproductions of the bug
per unit time by instead using:

tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
'32*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
--kconfig "CONFIG_NR_CPUS=4"

Does that seem reasonable to you?

Thanx, Paul
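
The arithmetic behind "more and smaller guest OSes" can be sketched as
follows. This is only an illustration: guests_that_fit is a hypothetical
helper, not part of kvm.sh, and the 64-CPU host is an assumed value that
merely satisfies the "multiple of eight CPUs" condition above.

```shell
#!/bin/bash
# Hypothetical helper (not part of kvm.sh): how many guests of a given
# size fit on the host at the same time, using integer division.
guests_that_fit() {
	local host_cpus=$1 cpus_per_guest=$2
	echo $(( host_cpus / cpus_per_guest ))
}

host_cpus=64	# assumed host size; substitute "$(nproc)" on a real system
echo "CONFIG_NR_CPUS=8: $(guests_that_fit "$host_cpus" 8) concurrent guests"
echo "CONFIG_NR_CPUS=4: $(guests_that_fit "$host_cpus" 4) concurrent guests"
```

Halving the guest size doubles the number of guests that fit, and each
guest brings its own RCU-update kthread checked by reader kthreads, which
is where the hoped-for doubling of the reproduction rate would come from.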
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 18:15 ` Paul E. McKenney
@ 2025-01-27 18:31 ` Uladzislau Rezki
2025-01-27 19:24 ` Uladzislau Rezki
1 sibling, 0 replies; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-27 18:31 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 10:15:21AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 27, 2025 at 06:26:59PM +0100, Uladzislau Rezki wrote:
> > On Mon, Jan 27, 2025 at 08:51:01AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 27, 2025 at 04:42:58PM +0100, Uladzislau Rezki wrote:
> > > > On Mon, Jan 27, 2025 at 06:51:44AM -0800, Paul E. McKenney wrote:
> > > > > On Mon, Jan 27, 2025 at 02:27:51PM +0100, Uladzislau Rezki wrote:
> > > > > > On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> > > > > > > On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > > > > > > > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > > > > > > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > > > > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > > > > > > > something higher.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > > > > > > > and then specifying it to a number of tests.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Or am I missing something here?
> > > > > > > > > > > > >
> > > > > > > > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > > > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > > > > > > > you need to know about that. I have not expected that.
> > > > > > > > > > > >
> > > > > > > > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > > > > > > > a good explanation why we need. It is confusing instead :)
> > > > > > > > > > >
> > > > > > > > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > > > > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > > > > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > > > > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > > > > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > > > > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > > > > > > > to complain about a Kconfig conflict.
> > > > > > > > > > >
> > > > > > > > > > > Hence me not being in favor of this change. ;-)
> > > > > > > > > > >
> > > > > > > > > > > Is there another way to make things work for both situations?
> > > > > > > > > > >
> > > > > > > > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > > > > > > > need more CPUs for TREE05.
> > > > > > > > > >
> > > > > > > > > > I will not resist, we just drop this patch :)
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > > > > > with RCU readers, correct?
> > > > > > > > >
> > > > > > > > Below one:
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > > /*
> > > > > > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > > > > > * delay between calls.
> > > > > > > > */
> > > > > > > > static int
> > > > > > > > rcu_torture_fakewriter(void *arg)
> > > > > > > > {
> > > > > > > > ...
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > > > > > So instead of this:
> > > > > > > > >
> > > > > > > > > --kconfig CONFIG_NR_CPUS=128
> > > > > > > > >
> > > > > > > > > Do this:
> > > > > > > > >
> > > > > > > > > --configs "16*TREE05"
> > > > > > > > >
> > > > > > > > > Or maybe even this:
> > > > > > > > >
> > > > > > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > > > > > Thanks for input.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > If you mean below splat:
> > > > > > >
> > > > > > > >
> > > > > > > > i.e. with more nfakewriters.
> > > > > > >
> > > > > > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > > > > > wakeups off of the grace-period kthread.
> > > > > > >
> > > > > > > > If you mean the one that has recently reported, i am not able to
> > > > > > > > reproduce it anyhow :)
> > > > > > >
> > > > > > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > > > > > reproduce it. Maybe as small as three CPUs each. ;-)
> > > > > > >
> > > > > > OK. I will give a try this:
> > > > > >
> > > > > > for (( i=0; i<$LOOPS; i++ )); do
> > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > > > > > echo "Done $i"
> > > > > > done
> > > > >
> > > > > Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> > > > > whatever) as well, perhaps also increasing the "16*TREE05".
> > > > >
> > > >
> > > > By default we have NR_CPUS=8, as we discussed. Providing to kvm "--cpus 5"
> > > > parameter will just set number of CPUs for a VM to 5:
> > > >
> > > > <snip>
> > > > ...
> > > > [ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
> > > > ...
> > > > <snip>
> > > >
> > > > so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
> > > >
> > > > Am i missing something? :)
> > >
> > > Because that gets you more guest OSes running on your system, each with
> > > one RCU-update kthread that is being checked by RCU reader kthreads.
> > > Therefore, it might double the rate at which you are able to reproduce
> > > this issue.
> > >
> > You mean that setting --kconfig CONFIG_NR_CPUS=4 and 16*TREE05 will run
> > 4 separate KVM instances?
>
> Almost but not quite.
>
> I am assuming that you have a system with a multiple of eight CPUs.
>
> If so, and assuming that Cheung's bug is an interaction between a fast
> synchronize_rcu() grace period and a reader task that this grace period
> is waiting on, having more and smaller guest OSes might make the problem
> happen faster. So instead of your:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
>
> You might be able to double the number of reproductions of the bug
> per unit time by instead using:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> '32*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> --kconfig "CONFIG_NR_CPUS=4"
>
> Does that seem reasonable to you?
>
I was confused about how CONFIG_NR_CPUS can influence the number of
instances that kvm.sh runs.

It is obvious that the more parallel setups you run, the faster you can
reproduce it. Provided, of course, that the system running the test has
enough resources.
--
Uladzislau Rezki
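
One way to make the "enough resources" condition concrete is a quick
estimate of guest memory. The sketch below is hypothetical and not part
of kvm.sh; it assumes that the --memory value applies to each guest
individually, so the total grows with the number of guests that run at
the same time.

```shell
#!/bin/bash
# Hypothetical pre-flight estimate (not part of kvm.sh), assuming that
# the --memory value is applied to each guest individually.
guest_memory_needed_gb() {
	local concurrent_guests=$1 gb_per_guest=$2
	echo $(( concurrent_guests * gb_per_guest ))
}

echo "16 guests at 10G each: $(guest_memory_needed_gb 16 10)G"
echo "32 guests at 10G each: $(guest_memory_needed_gb 32 10)G"
```

Under that assumption, doubling the number of concurrent guests also
doubles the memory they need, so a smaller --memory value might be
required on hosts with less RAM.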
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 18:15 ` Paul E. McKenney
2025-01-27 18:31 ` Uladzislau Rezki
@ 2025-01-27 19:24 ` Uladzislau Rezki
2025-01-27 20:37 ` Uladzislau Rezki
1 sibling, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-27 19:24 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 10:15:21AM -0800, Paul E. McKenney wrote:
> On Mon, Jan 27, 2025 at 06:26:59PM +0100, Uladzislau Rezki wrote:
> > On Mon, Jan 27, 2025 at 08:51:01AM -0800, Paul E. McKenney wrote:
> > > On Mon, Jan 27, 2025 at 04:42:58PM +0100, Uladzislau Rezki wrote:
> > > > On Mon, Jan 27, 2025 at 06:51:44AM -0800, Paul E. McKenney wrote:
> > > > > On Mon, Jan 27, 2025 at 02:27:51PM +0100, Uladzislau Rezki wrote:
> > > > > > On Fri, Jan 24, 2025 at 11:34:03AM -0800, Paul E. McKenney wrote:
> > > > > > > On Fri, Jan 24, 2025 at 06:48:40PM +0100, Uladzislau Rezki wrote:
> > > > > > > > On Fri, Jan 24, 2025 at 09:36:07AM -0800, Paul E. McKenney wrote:
> > > > > > > > > On Fri, Jan 24, 2025 at 06:21:30PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > > On Fri, Jan 24, 2025 at 07:45:23AM -0800, Paul E. McKenney wrote:
> > > > > > > > > > > On Fri, Jan 24, 2025 at 12:41:38PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > > > > On Thu, Jan 23, 2025 at 12:29:45PM -0800, Paul E. McKenney wrote:
> > > > > > > > > > > > > On Thu, Jan 23, 2025 at 07:58:26PM +0100, Uladzislau Rezki (Sony) wrote:
> > > > > > > > > > > > > > This configuration specifies the maximum number of CPUs which
> > > > > > > > > > > > > > is set to 8. The problem is that it can not be overwritten for
> > > > > > > > > > > > > > something higher.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Remove that configuration for TREE05, so it is possible to run
> > > > > > > > > > > > > > the torture test on as many CPUs as many system has.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > You should be able to override this on the kvm.sh command line by
> > > > > > > > > > > > > specifying "--kconfig CONFIG_NR_CPUS=128" or whatever number you wish.
> > > > > > > > > > > > > For example, see the torture.sh querying the system's number of CPUs
> > > > > > > > > > > > > and then specifying it to a number of tests.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Or am I missing something here?
> > > > > > > > > > > > >
> > > > > > > > > > > > It took me a while to understand what happens. Apparently there is this
> > > > > > > > > > > > 8 CPUs limitation. Yes, i can do it manually by passing --kconfig but
> > > > > > > > > > > > you need to know about that. I have not expected that.
> > > > > > > > > > > >
> > > > > > > > > > > > Therefore i removed it from the configuration because i have not found
> > > > > > > > > > > > a good explanation why we need. It is confusing instead :)
> > > > > > > > > > >
> > > > > > > > > > > Right now, if I do a run with --configs "TREE10 14*CFLIST", this will
> > > > > > > > > > > make use of 20 systems with 80 CPUs each. If you remove that line from
> > > > > > > > > > > TREE05, won't each instance of TREE05 consume a full system, for a total
> > > > > > > > > > > of 33 systems? Yes, I could use "--kconfig CONFIG_NR_CPUS=8" on the
> > > > > > > > > > > command line, but that would affect all the scenarios, not just TREE05.
> > > > > > > > > > > Including (say) TINY01, where I believe that it would cause kvm.sh
> > > > > > > > > > > to complain about a Kconfig conflict.
> > > > > > > > > > >
> > > > > > > > > > > Hence me not being in favor of this change. ;-)
> > > > > > > > > > >
> > > > > > > > > > > Is there another way to make things work for both situations?
> > > > > > > > > > >
> > > > > > > > > > OK, i see. Well. I will just go with --kconfig CONFIG_NR_CPUS=foo if i
> > > > > > > > > > need more CPUs for TREE05.
> > > > > > > > > >
> > > > > > > > > > I will not resist, we just drop this patch :)
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > > > > > with RCU readers, correct?
> > > > > > > > >
> > > > > > > > Below one:
> > > > > > > >
> > > > > > > > <snip>
> > > > > > > > /*
> > > > > > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > > > > > * delay between calls.
> > > > > > > > */
> > > > > > > > static int
> > > > > > > > rcu_torture_fakewriter(void *arg)
> > > > > > > > {
> > > > > > > > ...
> > > > > > > > <snip>
> > > > > > > >
> > > > > > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > > > > > So instead of this:
> > > > > > > > >
> > > > > > > > > --kconfig CONFIG_NR_CPUS=128
> > > > > > > > >
> > > > > > > > > Do this:
> > > > > > > > >
> > > > > > > > > --configs "16*TREE05"
> > > > > > > > >
> > > > > > > > > Or maybe even this:
> > > > > > > > >
> > > > > > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > > > > > Thanks for input.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Thoughts?
> > > > > > > > >
> > > > > > > > If you mean below splat:
> > > > > > >
> > > > > > > >
> > > > > > > > i.e. with more nfakewriters.
> > > > > > >
> > > > > > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > > > > > wakeups off of the grace-period kthread.
> > > > > > >
> > > > > > > > If you mean the one that has recently reported, i am not able to
> > > > > > > > reproduce it anyhow :)
> > > > > > >
> > > > > > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > > > > > reproduce it. Maybe as small as three CPUs each. ;-)
> > > > > > >
> > > > > > OK. I will give a try this:
> > > > > >
> > > > > > for (( i=0; i<$LOOPS; i++ )); do
> > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > > > > > echo "Done $i"
> > > > > > done
> > > > >
> > > > > Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> > > > > whatever) as well, perhaps also increasing the "16*TREE05".
> > > > >
> > > >
> > > > By default we have NR_CPUS=8, as we discussed. Providing to kvm "--cpus 5"
> > > > parameter will just set number of CPUs for a VM to 5:
> > > >
> > > > <snip>
> > > > ...
> > > > [ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
> > > > ...
> > > > <snip>
> > > >
> > > > so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
> > > >
> > > > Am i missing something? :)
> > >
> > > Because that gets you more guest OSes running on your system, each with
> > > one RCU-update kthread that is being checked by RCU reader kthreads.
> > > Therefore, it might double the rate at which you are able to reproduce
> > > this issue.
> > >
> > You mean that setting --kconfig CONFIG_NR_CPUS=4 and 16*TREE05 will run
> > 4 separate KVM instances?
>
> Almost but not quite.
>
> I am assuming that you have a system with a multiple of eight CPUs.
>
> If so, and assuming that Cheung's bug is an interaction between a fast
> synchronize_rcu() grace period and a reader task that this grace period
> is waiting on, having more and smaller guest OSes might make the problem
> happen faster. So instead of your:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
>
> You might be able to double the number of reproductions of the bug
> per unit time by instead using:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> '32*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> --kconfig "CONFIG_NR_CPUS=4"
>
> Does that seem reasonable to you?
>
It only runs one instance for me:
tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs 32*TREE05 --memory 10G --bootargs rcutorture.fwd_progress=1 --kconfig CONFIG_NR_CPUS=4
----Start batch 1: Mon Jan 27 08:20:17 PM CET 2025
TREE05 4: Starting build. Mon Jan 27 08:20:17 PM CET 2025
TREE05 4: Waiting for build to complete. Mon Jan 27 08:20:17 PM CET 2025
TREE05 4: Build complete. Mon Jan 27 08:21:26 PM CET 2025
---- TREE05 4: Kernel present. Mon Jan 27 08:21:26 PM CET 2025
---- Starting kernels. Mon Jan 27 08:21:26 PM CET 2025
with 4 CPUs inside VM :)
--
Uladzislau Rezki
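
The single entry under "Start batch 1" in the log above is consistent
with kvm.sh packing scenarios into batches whose total CPU count stays
within the --cpus budget. That batching behaviour is an assumption here,
and the helper names below are hypothetical. A minimal model of it:

```shell
#!/bin/bash
# Hypothetical model of batching, assuming kvm.sh packs scenarios into
# batches whose total CPU count stays within the --cpus budget.
guests_per_batch() {
	local cpu_budget=$1 cpus_per_guest=$2
	echo $(( cpu_budget / cpus_per_guest ))
}

batches_needed() {
	local nguests=$1 per_batch=$2
	# Round up: a partly filled final batch still counts as a batch.
	echo $(( (nguests + per_batch - 1) / per_batch ))
}

per_batch=$(guests_per_batch 5 4)	# --cpus 5, CONFIG_NR_CPUS=4
echo "guests per batch: $per_batch"
echo "batches for 32*TREE05: $(batches_needed 32 "$per_batch")"
```

In this model a budget of 5 CPUs leaves room for only one 4-CPU guest at
a time, so the 32 instances would run one after another rather than in
parallel. Raising --cpus toward the host's CPU count would be the knob to
try if concurrent guests are wanted.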
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 19:24 ` Uladzislau Rezki
@ 2025-01-27 20:37 ` Uladzislau Rezki
2025-01-28 0:14 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-27 20:37 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Paul E. McKenney, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
> > > > > > > > > > > need more CPUs for TREE05.
> > > > > > > > > > >
> > > > > > > > > > > I will not resist, we just drop this patch :)
> > > > > > > > > >
> > > > > > > > > > Thank you!
> > > > > > > > > >
> > > > > > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > > > > > > with RCU readers, correct?
> > > > > > > > > >
> > > > > > > > > Below one:
> > > > > > > > >
> > > > > > > > > <snip>
> > > > > > > > > /*
> > > > > > > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > > > > > > * delay between calls.
> > > > > > > > > */
> > > > > > > > > static int
> > > > > > > > > rcu_torture_fakewriter(void *arg)
> > > > > > > > > {
> > > > > > > > > ...
> > > > > > > > > <snip>
> > > > > > > > >
> > > > > > > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > > > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > > > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > > > > > > So instead of this:
> > > > > > > > > >
> > > > > > > > > > --kconfig CONFIG_NR_CPUS=128
> > > > > > > > > >
> > > > > > > > > > Do this:
> > > > > > > > > >
> > > > > > > > > > --configs "16*TREE05"
> > > > > > > > > >
> > > > > > > > > > Or maybe even this:
> > > > > > > > > >
> > > > > > > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > > > > > > Thanks for input.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Thoughts?
> > > > > > > > > >
> > > > > > > > > If you mean below splat:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > i.e. with more nfakewriters.
> > > > > > > >
> > > > > > > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > > > > > > wakeups off of the grace-period kthread.
> > > > > > > >
> > > > > > > > > If you mean the one that has recently reported, i am not able to
> > > > > > > > > reproduce it anyhow :)
> > > > > > > >
> > > > > > > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > > > > > > reproduce it. Maybe as small as three CPUs each. ;-)
> > > > > > > >
> > > > > > > OK. I will give a try this:
> > > > > > >
> > > > > > > for (( i=0; i<$LOOPS; i++ )); do
> > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > > > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > > > > > > echo "Done $i"
> > > > > > > done
> > > > > >
> > > > > > Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> > > > > > whatever) as well, perhaps also increasing the "16*TREE05".
> > > > > >
> > > > >
> > > > > By default we have NR_CPUS=8, as we discussed. Providing to kvm "--cpus 5"
> > > > > parameter will just set number of CPUs for a VM to 5:
> > > > >
> > > > > <snip>
> > > > > ...
> > > > > [ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
> > > > > ...
> > > > > <snip>
> > > > >
> > > > > so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
> > > > >
> > > > > Am i missing something? :)
> > > >
> > > > Because that gets you more guest OSes running on your system, each with
> > > > one RCU-update kthread that is being checked by RCU reader kthreads.
> > > > Therefore, it might double the rate at which you are able to reproduce
> > > > this issue.
> > > >
> > > You mean that setting --kconfig CONFIG_NR_CPUS=4 and 16*TREE05 will run
> > > 4 separate KVM instances?
> >
> > Almost but not quite.
> >
> > I am assuming that you have a system with a multiple of eight CPUs.
> >
> > If so, and assuming that Cheung's bug is an interaction between a fast
> > synchronize_rcu() grace period and a reader task that this grace period
> > is waiting on, having more and smaller guest OSes might make the problem
> > happen faster. So instead of your:
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> >
> > You might be able to double the number of reproductions of the bug
> > per unit time by instead using:
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > '32*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> > --kconfig "CONFIG_NR_CPUS=4"
> >
> > Does that seem reasonable to you?
> >
> It only runs one instance for me:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs 32*TREE05 --memory 10G --bootargs rcutorture.fwd_progress=1 --kconfig CONFIG_NR_CPUS=4
> ----Start batch 1: Mon Jan 27 08:20:17 PM CET 2025
> TREE05 4: Starting build. Mon Jan 27 08:20:17 PM CET 2025
> TREE05 4: Waiting for build to complete. Mon Jan 27 08:20:17 PM CET 2025
> TREE05 4: Build complete. Mon Jan 27 08:21:26 PM CET 2025
> ---- TREE05 4: Kernel present. Mon Jan 27 08:21:26 PM CET 2025
> ---- Starting kernels. Mon Jan 27 08:21:26 PM CET 2025
>
> with 4 CPUs inside VM :)
>
And when running 16 instances with 4 CPUs each, I can reproduce the
splat that has been reported:
tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs \
'16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
--kconfig "CONFIG_NR_CPUS=4"
<snip>
...
[ 0.595251] ------------[ cut here ]------------
[ 0.595867] A full grace period is not passed yet: 0
[ 0.595875] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree.c:1617 rcu_sr_normal_complete+0xa9/0xc0
[ 0.598248] Modules linked in:
[ 0.598649] CPU: 1 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.13.0-02530-g8950af6a11ff #261
[ 0.599248] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 0.600248] RIP: 0010:rcu_sr_normal_complete+0xa9/0xc0
[ 0.600913] Code: 48 29 c2 48 8d 04 0a ba 03 00 00 00 48 39 c2 79 0c 48 83 e8 04 48 c1 e8 02 48 8d 70 02 48 c7 c7 20 e9 33 b5 e8 d8 03 f4 ff 90 <0f> 0b 90 90 48 8d 7b 10 5b e9 f9 38 fb ff 66 0f 1f 84 00 00 00 00
[ 0.603249] RSP: 0018:ffffadad0008be60 EFLAGS: 00010282
[ 0.603925] RAX: 0000000000000000 RBX: ffffadad00013d10 RCX: 00000000ffffdfff
[ 0.605247] RDX: 0000000000000000 RSI: ffffadad0008bd10 RDI: 0000000000000001
[ 0.606247] RBP: 0000000000000000 R08: 0000000000009ffb R09: 00000000ffffdfff
[ 0.607248] R10: 00000000ffffdfff R11: ffffffffb56789a0 R12: 0000000000000005
[ 0.608247] R13: 0000000000031a40 R14: fffffffffffffb74 R15: 0000000000000000
[ 0.609250] FS: 0000000000000000(0000) GS:ffff9081f5c80000(0000) knlGS:0000000000000000
[ 0.610249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.611248] CR2: 0000000000000000 CR3: 00000002f024a000 CR4: 00000000000006f0
[ 0.612249] Call Trace:
[ 0.612574] <TASK>
[ 0.612854] ? __warn+0x8c/0x190
[ 0.613248] ? rcu_sr_normal_complete+0xa9/0xc0
[ 0.613840] ? report_bug+0x164/0x190
[ 0.614248] ? handle_bug+0x54/0x90
[ 0.614705] ? exc_invalid_op+0x17/0x70
[ 0.615248] ? asm_exc_invalid_op+0x1a/0x20
[ 0.615797] ? rcu_sr_normal_complete+0xa9/0xc0
[ 0.616248] rcu_gp_cleanup+0x403/0x5a0
[ 0.616248] ? __pfx_rcu_gp_kthread+0x10/0x10
[ 0.616818] rcu_gp_kthread+0x136/0x1c0
[ 0.617249] kthread+0xec/0x1f0
[ 0.617664] ? __pfx_kthread+0x10/0x10
[ 0.618156] ret_from_fork+0x2f/0x50
[ 0.618728] ? __pfx_kthread+0x10/0x10
[ 0.619216] ret_from_fork_asm+0x1a/0x30
[ 0.620251] </TASK>
...
<snip>
Linus tip-tree, HEAD is c4b9570cfb63501638db720f3bee9f6dfd044b82
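
For reference, the difference between the two runs above (one guest at a
time with "--cpus 5", sixteen with "--allcpus") is consistent with kvm.sh
packing guests into batches against a CPU budget. Below is a rough sketch
of that arithmetic. It is a simplified model and an assumption, not the
actual kvm.sh logic; guests_per_batch() and batches_needed() are
hypothetical helpers that do not exist in the tree.

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of kvm.sh: number of guest OSes that fit
 * into one batch when each guest needs cpus_per_guest CPUs and the run is
 * allowed cpu_budget CPUs in total ("--cpus N", or the host CPU count
 * with "--allcpus").
 *
 * Assumption of this model: integer division, and a guest that does not
 * fit still runs alone, so the result is never below one.
 */
int guests_per_batch(int cpu_budget, int cpus_per_guest)
{
	int n;

	assert(cpu_budget > 0 && cpus_per_guest > 0);
	n = cpu_budget / cpus_per_guest;
	return n > 0 ? n : 1;
}

/* Number of batches needed to run nr_guests guests, rounding up. */
int batches_needed(int nr_guests, int cpu_budget, int cpus_per_guest)
{
	int per_batch = guests_per_batch(cpu_budget, cpus_per_guest);

	return (nr_guests + per_batch - 1) / per_batch;
}
```

Under this model, a budget of 5 CPUs with 4-CPU guests leaves room for
only one guest per batch, so '32*TREE05' turns into 32 sequential
batches. A host with at least 64 CPUs (an illustrative figure, the host
size is not stated here) would fit all 16 guests into a single batch
with "--allcpus".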
--
Uladzislau Rezki
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-27 20:37 ` Uladzislau Rezki
@ 2025-01-28 0:14 ` Paul E. McKenney
2025-01-28 12:17 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-28 0:14 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Mon, Jan 27, 2025 at 09:37:12PM +0100, Uladzislau Rezki wrote:
> > > > > > > > > > > > need more CPUs for TREE05.
> > > > > > > > > > > >
> > > > > > > > > > > > I will not resist, we just drop this patch :)
> > > > > > > > > > >
> > > > > > > > > > > Thank you!
> > > > > > > > > > >
> > > > > > > > > > > The bug you are chasing happens when a given synchronize_rcu() interacts
> > > > > > > > > > > with RCU readers, correct?
> > > > > > > > > > >
> > > > > > > > > > Below one:
> > > > > > > > > >
> > > > > > > > > > <snip>
> > > > > > > > > > /*
> > > > > > > > > > * RCU torture fake writer kthread. Repeatedly calls sync, with a random
> > > > > > > > > > * delay between calls.
> > > > > > > > > > */
> > > > > > > > > > static int
> > > > > > > > > > rcu_torture_fakewriter(void *arg)
> > > > > > > > > > {
> > > > > > > > > > ...
> > > > > > > > > > <snip>
> > > > > > > > > >
> > > > > > > > > > > In rcutorture, only the rcu_torture_writer() call to synchronize_rcu()
> > > > > > > > > > > interacts with rcu_torture_reader(). So my guess is that running
> > > > > > > > > > > many small TREE05 guest OSes would reproduce this bug more quickly.
> > > > > > > > > > > So instead of this:
> > > > > > > > > > >
> > > > > > > > > > > --kconfig CONFIG_NR_CPUS=128
> > > > > > > > > > >
> > > > > > > > > > > Do this:
> > > > > > > > > > >
> > > > > > > > > > > --configs "16*TREE05"
> > > > > > > > > > >
> > > > > > > > > > > Or maybe even this:
> > > > > > > > > > >
> > > > > > > > > > > --configs "16*TREE05" --kconfig CONFIG_NR_CPUS=4
> > > > > > > > > > Thanks for input.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thoughts?
> > > > > > > > > > >
> > > > > > > > > > If you mean below splat:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > i.e. with more nfakewriters.
> > > > > > > > >
> > > > > > > > > Right, and large nfakewriters would help push the synchronize_rcu()
> > > > > > > > > wakeups off of the grace-period kthread.
> > > > > > > > >
> > > > > > > > > > If you mean the one that has recently reported, i am not able to
> > > > > > > > > > reproduce it anyhow :)
> > > > > > > > >
> > > > > > > > > Using larger numbers of smaller rcutorture guest OSes might help to
> > > > > > > > > reproduce it. Maybe as small as three CPUs each. ;-)
> > > > > > > > >
> > > > > > > > OK. I will give a try this:
> > > > > > > >
> > > > > > > > for (( i=0; i<$LOOPS; i++ )); do
> > > > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > > > > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > > > > > > > echo "Done $i"
> > > > > > > > done
> > > > > > >
> > > > > > > Making each guest OS smaller needs "--kconfig CONFIG_NR_CPUS=4" (or
> > > > > > > whatever) as well, perhaps also increasing the "16*TREE05".
> > > > > > >
> > > > > >
> > > > > > By default we have NR_CPUS=8, as we discussed. Providing to kvm "--cpus 5"
> > > > > > parameter will just set number of CPUs for a VM to 5:
> > > > > >
> > > > > > <snip>
> > > > > > ...
> > > > > > [ 0.060672] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=5, Nodes=1
> > > > > > ...
> > > > > > <snip>
> > > > > >
> > > > > > so, for my test i do not see why i need to set --kconfig CONFIG_NR_CPUS=4.
> > > > > >
> > > > > > Am i missing something? :)
> > > > >
> > > > > Because that gets you more guest OSes running on your system, each with
> > > > > one RCU-update kthread that is being checked by RCU reader kthreads.
> > > > > Therefore, it might double the rate at which you are able to reproduce
> > > > > this issue.
> > > > >
> > > > You mean that setting --kconfig CONFIG_NR_CPUS=4 and 16*TREE05 will run
> > > > 4 separate KVM instances?
> > >
> > > Almost but not quite.
> > >
> > > I am assuming that you have a system with a multiple of eight CPUs.
> > >
> > > If so, and assuming that Cheung's bug is an interaction between a fast
> > > synchronize_rcu() grace period and a reader task that this grace period
> > > is waiting on, having more and smaller guest OSes might make the problem
> > > happen faster. So instead of your:
> > >
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1'
> > >
> > > You might be able to double the number of reproductions of the bug
> > > per unit time by instead using:
> > >
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs \
> > > '32*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> > > --kconfig "CONFIG_NR_CPUS=4"
> > >
> > > Does that seem reasonable to you?
> > >
> > It only runs one instance for me:
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --cpus 5 --configs 32*TREE05 --memory 10G --bootargs rcutorture.fwd_progress=1 --kconfig CONFIG_NR_CPUS=4
> > ----Start batch 1: Mon Jan 27 08:20:17 PM CET 2025
> > TREE05 4: Starting build. Mon Jan 27 08:20:17 PM CET 2025
> > TREE05 4: Waiting for build to complete. Mon Jan 27 08:20:17 PM CET 2025
> > TREE05 4: Build complete. Mon Jan 27 08:21:26 PM CET 2025
> > ---- TREE05 4: Kernel present. Mon Jan 27 08:21:26 PM CET 2025
> > ---- Starting kernels. Mon Jan 27 08:21:26 PM CET 2025
> >
> > with 4 CPUs inside VM :)
> >
> And when running 16 instances with 4 CPUs each i can reproduce the
> splat which has been reported:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs \
> '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> --kconfig "CONFIG_NR_CPUS=4"
>
> <snip>
> ...
> [ 0.595251] ------------[ cut here ]------------
> [ 0.595867] A full grace period is not passed yet: 0
> [ 0.595875] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree.c:1617 rcu_sr_normal_complete+0xa9/0xc0
> [ 0.598248] Modules linked in:
> [ 0.598649] CPU: 1 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.13.0-02530-g8950af6a11ff #261
> [ 0.599248] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [ 0.600248] RIP: 0010:rcu_sr_normal_complete+0xa9/0xc0
> [ 0.600913] Code: 48 29 c2 48 8d 04 0a ba 03 00 00 00 48 39 c2 79 0c 48 83 e8 04 48 c1 e8 02 48 8d 70 02 48 c7 c7 20 e9 33 b5 e8 d8 03 f4 ff 90 <0f> 0b 90 90 48 8d 7b 10 5b e9 f9 38 fb ff 66 0f 1f 84 00 00 00 00
> [ 0.603249] RSP: 0018:ffffadad0008be60 EFLAGS: 00010282
> [ 0.603925] RAX: 0000000000000000 RBX: ffffadad00013d10 RCX: 00000000ffffdfff
> [ 0.605247] RDX: 0000000000000000 RSI: ffffadad0008bd10 RDI: 0000000000000001
> [ 0.606247] RBP: 0000000000000000 R08: 0000000000009ffb R09: 00000000ffffdfff
> [ 0.607248] R10: 00000000ffffdfff R11: ffffffffb56789a0 R12: 0000000000000005
> [ 0.608247] R13: 0000000000031a40 R14: fffffffffffffb74 R15: 0000000000000000
> [ 0.609250] FS: 0000000000000000(0000) GS:ffff9081f5c80000(0000) knlGS:0000000000000000
> [ 0.610249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 0.611248] CR2: 0000000000000000 CR3: 00000002f024a000 CR4: 00000000000006f0
> [ 0.612249] Call Trace:
> [ 0.612574] <TASK>
> [ 0.612854] ? __warn+0x8c/0x190
> [ 0.613248] ? rcu_sr_normal_complete+0xa9/0xc0
> [ 0.613840] ? report_bug+0x164/0x190
> [ 0.614248] ? handle_bug+0x54/0x90
> [ 0.614705] ? exc_invalid_op+0x17/0x70
> [ 0.615248] ? asm_exc_invalid_op+0x1a/0x20
> [ 0.615797] ? rcu_sr_normal_complete+0xa9/0xc0
> [ 0.616248] rcu_gp_cleanup+0x403/0x5a0
> [ 0.616248] ? __pfx_rcu_gp_kthread+0x10/0x10
> [ 0.616818] rcu_gp_kthread+0x136/0x1c0
> [ 0.617249] kthread+0xec/0x1f0
> [ 0.617664] ? __pfx_kthread+0x10/0x10
> [ 0.618156] ret_from_fork+0x2f/0x50
> [ 0.618728] ? __pfx_kthread+0x10/0x10
> [ 0.619216] ret_from_fork_asm+0x1a/0x30
> [ 0.620251] </TASK>
> ...
> <snip>
>
> Linus tip-tree, HEAD is c4b9570cfb63501638db720f3bee9f6dfd044b82
Very good! And of course, the next question is "does going to _full()
make the problem go away?" ;-)
Thanx, Paul
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-28 0:14 ` Paul E. McKenney
@ 2025-01-28 12:17 ` Uladzislau Rezki
2025-01-28 12:41 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-28 12:17 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
> > > with 4 CPUs inside VM :)
> > >
> > And when running 16 instances with 4 CPUs each i can reproduce the
> > splat which has been reported:
> >
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs \
> > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> > --kconfig "CONFIG_NR_CPUS=4"
> >
> > <snip>
> > ...
> > [ 0.595251] ------------[ cut here ]------------
> > [ 0.595867] A full grace period is not passed yet: 0
> > [ 0.595875] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree.c:1617 rcu_sr_normal_complete+0xa9/0xc0
> > [ 0.598248] Modules linked in:
> > [ 0.598649] CPU: 1 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.13.0-02530-g8950af6a11ff #261
> > [ 0.599248] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > [ 0.600248] RIP: 0010:rcu_sr_normal_complete+0xa9/0xc0
> > [ 0.600913] Code: 48 29 c2 48 8d 04 0a ba 03 00 00 00 48 39 c2 79 0c 48 83 e8 04 48 c1 e8 02 48 8d 70 02 48 c7 c7 20 e9 33 b5 e8 d8 03 f4 ff 90 <0f> 0b 90 90 48 8d 7b 10 5b e9 f9 38 fb ff 66 0f 1f 84 00 00 00 00
> > [ 0.603249] RSP: 0018:ffffadad0008be60 EFLAGS: 00010282
> > [ 0.603925] RAX: 0000000000000000 RBX: ffffadad00013d10 RCX: 00000000ffffdfff
> > [ 0.605247] RDX: 0000000000000000 RSI: ffffadad0008bd10 RDI: 0000000000000001
> > [ 0.606247] RBP: 0000000000000000 R08: 0000000000009ffb R09: 00000000ffffdfff
> > [ 0.607248] R10: 00000000ffffdfff R11: ffffffffb56789a0 R12: 0000000000000005
> > [ 0.608247] R13: 0000000000031a40 R14: fffffffffffffb74 R15: 0000000000000000
> > [ 0.609250] FS: 0000000000000000(0000) GS:ffff9081f5c80000(0000) knlGS:0000000000000000
> > [ 0.610249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 0.611248] CR2: 0000000000000000 CR3: 00000002f024a000 CR4: 00000000000006f0
> > [ 0.612249] Call Trace:
> > [ 0.612574] <TASK>
> > [ 0.612854] ? __warn+0x8c/0x190
> > [ 0.613248] ? rcu_sr_normal_complete+0xa9/0xc0
> > [ 0.613840] ? report_bug+0x164/0x190
> > [ 0.614248] ? handle_bug+0x54/0x90
> > [ 0.614705] ? exc_invalid_op+0x17/0x70
> > [ 0.615248] ? asm_exc_invalid_op+0x1a/0x20
> > [ 0.615797] ? rcu_sr_normal_complete+0xa9/0xc0
> > [ 0.616248] rcu_gp_cleanup+0x403/0x5a0
> > [ 0.616248] ? __pfx_rcu_gp_kthread+0x10/0x10
> > [ 0.616818] rcu_gp_kthread+0x136/0x1c0
> > [ 0.617249] kthread+0xec/0x1f0
> > [ 0.617664] ? __pfx_kthread+0x10/0x10
> > [ 0.618156] ret_from_fork+0x2f/0x50
> > [ 0.618728] ? __pfx_kthread+0x10/0x10
> > [ 0.619216] ret_from_fork_asm+0x1a/0x30
> > [ 0.620251] </TASK>
> > ...
> > <snip>
> >
> > Linus tip-tree, HEAD is c4b9570cfb63501638db720f3bee9f6dfd044b82
>
> Very good! And of course, the next question is "does going to _full()
> make the problem go away?" ;-)
>
Yes, it does its job if I apply:
https://lore.kernel.org/rcu/00900afe-ac4e-4362-a3f9-d65f2c9dcd9a@paulmck-laptop/T/#m5d9263f3825d3170c044beedbae741717702d4aa
After that I am not able to reproduce the warning anymore. Tested
overnight. Without it, I can reproduce it pretty easily :)
--
Uladzislau Rezki
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-28 12:17 ` Uladzislau Rezki
@ 2025-01-28 12:41 ` Paul E. McKenney
2025-01-28 14:34 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-28 12:41 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Tue, Jan 28, 2025 at 01:17:34PM +0100, Uladzislau Rezki wrote:
> > > > with 4 CPUs inside VM :)
> > > >
> > > And when running 16 instances with 4 CPUs each i can reproduce the
> > > splat which has been reported:
> > >
> > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs \
> > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> > > --kconfig "CONFIG_NR_CPUS=4"
> > >
> > > <snip>
> > > ...
> > > [ 0.595251] ------------[ cut here ]------------
> > > [ 0.595867] A full grace period is not passed yet: 0
> > > [ 0.595875] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree.c:1617 rcu_sr_normal_complete+0xa9/0xc0
> > > [ 0.598248] Modules linked in:
> > > [ 0.598649] CPU: 1 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.13.0-02530-g8950af6a11ff #261
> > > [ 0.599248] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > [ 0.600248] RIP: 0010:rcu_sr_normal_complete+0xa9/0xc0
> > > [ 0.600913] Code: 48 29 c2 48 8d 04 0a ba 03 00 00 00 48 39 c2 79 0c 48 83 e8 04 48 c1 e8 02 48 8d 70 02 48 c7 c7 20 e9 33 b5 e8 d8 03 f4 ff 90 <0f> 0b 90 90 48 8d 7b 10 5b e9 f9 38 fb ff 66 0f 1f 84 00 00 00 00
> > > [ 0.603249] RSP: 0018:ffffadad0008be60 EFLAGS: 00010282
> > > [ 0.603925] RAX: 0000000000000000 RBX: ffffadad00013d10 RCX: 00000000ffffdfff
> > > [ 0.605247] RDX: 0000000000000000 RSI: ffffadad0008bd10 RDI: 0000000000000001
> > > [ 0.606247] RBP: 0000000000000000 R08: 0000000000009ffb R09: 00000000ffffdfff
> > > [ 0.607248] R10: 00000000ffffdfff R11: ffffffffb56789a0 R12: 0000000000000005
> > > [ 0.608247] R13: 0000000000031a40 R14: fffffffffffffb74 R15: 0000000000000000
> > > [ 0.609250] FS: 0000000000000000(0000) GS:ffff9081f5c80000(0000) knlGS:0000000000000000
> > > [ 0.610249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 0.611248] CR2: 0000000000000000 CR3: 00000002f024a000 CR4: 00000000000006f0
> > > [ 0.612249] Call Trace:
> > > [ 0.612574] <TASK>
> > > [ 0.612854] ? __warn+0x8c/0x190
> > > [ 0.613248] ? rcu_sr_normal_complete+0xa9/0xc0
> > > [ 0.613840] ? report_bug+0x164/0x190
> > > [ 0.614248] ? handle_bug+0x54/0x90
> > > [ 0.614705] ? exc_invalid_op+0x17/0x70
> > > [ 0.615248] ? asm_exc_invalid_op+0x1a/0x20
> > > [ 0.615797] ? rcu_sr_normal_complete+0xa9/0xc0
> > > [ 0.616248] rcu_gp_cleanup+0x403/0x5a0
> > > [ 0.616248] ? __pfx_rcu_gp_kthread+0x10/0x10
> > > [ 0.616818] rcu_gp_kthread+0x136/0x1c0
> > > [ 0.617249] kthread+0xec/0x1f0
> > > [ 0.617664] ? __pfx_kthread+0x10/0x10
> > > [ 0.618156] ret_from_fork+0x2f/0x50
> > > [ 0.618728] ? __pfx_kthread+0x10/0x10
> > > [ 0.619216] ret_from_fork_asm+0x1a/0x30
> > > [ 0.620251] </TASK>
> > > ...
> > > <snip>
> > >
> > > Linus tip-tree, HEAD is c4b9570cfb63501638db720f3bee9f6dfd044b82
> >
> > Very good! And of course, the next question is "does going to _full()
> > make the problem go away?" ;-)
> >
> Yes does its job if i apply:
>
> https://lore.kernel.org/rcu/00900afe-ac4e-4362-a3f9-d65f2c9dcd9a@paulmck-laptop/T/#m5d9263f3825d3170c044beedbae741717702d4aa
>
> after that i am not able to reproduce the warning anymore. Tested over
> night. Without it, i can reproduce it pretty easy :)
Thank you, and good to hear!!!
May I add your Tested-by to that patch?
Thanx, Paul
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-28 12:41 ` Paul E. McKenney
@ 2025-01-28 14:34 ` Uladzislau Rezki
2025-01-28 18:43 ` Paul E. McKenney
0 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-28 14:34 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Tue, Jan 28, 2025 at 04:41:16AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 28, 2025 at 01:17:34PM +0100, Uladzislau Rezki wrote:
> > > > > with 4 CPUs inside VM :)
> > > > >
> > > > And when running 16 instances with 4 CPUs each i can reproduce the
> > > > splat which has been reported:
> > > >
> > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs \
> > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> > > > --kconfig "CONFIG_NR_CPUS=4"
> > > >
> > > > <snip>
> > > > ...
> > > > [ 0.595251] ------------[ cut here ]------------
> > > > [ 0.595867] A full grace period is not passed yet: 0
> > > > [ 0.595875] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree.c:1617 rcu_sr_normal_complete+0xa9/0xc0
> > > > [ 0.598248] Modules linked in:
> > > > [ 0.598649] CPU: 1 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.13.0-02530-g8950af6a11ff #261
> > > > [ 0.599248] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > [ 0.600248] RIP: 0010:rcu_sr_normal_complete+0xa9/0xc0
> > > > [ 0.600913] Code: 48 29 c2 48 8d 04 0a ba 03 00 00 00 48 39 c2 79 0c 48 83 e8 04 48 c1 e8 02 48 8d 70 02 48 c7 c7 20 e9 33 b5 e8 d8 03 f4 ff 90 <0f> 0b 90 90 48 8d 7b 10 5b e9 f9 38 fb ff 66 0f 1f 84 00 00 00 00
> > > > [ 0.603249] RSP: 0018:ffffadad0008be60 EFLAGS: 00010282
> > > > [ 0.603925] RAX: 0000000000000000 RBX: ffffadad00013d10 RCX: 00000000ffffdfff
> > > > [ 0.605247] RDX: 0000000000000000 RSI: ffffadad0008bd10 RDI: 0000000000000001
> > > > [ 0.606247] RBP: 0000000000000000 R08: 0000000000009ffb R09: 00000000ffffdfff
> > > > [ 0.607248] R10: 00000000ffffdfff R11: ffffffffb56789a0 R12: 0000000000000005
> > > > [ 0.608247] R13: 0000000000031a40 R14: fffffffffffffb74 R15: 0000000000000000
> > > > [ 0.609250] FS: 0000000000000000(0000) GS:ffff9081f5c80000(0000) knlGS:0000000000000000
> > > > [ 0.610249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 0.611248] CR2: 0000000000000000 CR3: 00000002f024a000 CR4: 00000000000006f0
> > > > [ 0.612249] Call Trace:
> > > > [ 0.612574] <TASK>
> > > > [ 0.612854] ? __warn+0x8c/0x190
> > > > [ 0.613248] ? rcu_sr_normal_complete+0xa9/0xc0
> > > > [ 0.613840] ? report_bug+0x164/0x190
> > > > [ 0.614248] ? handle_bug+0x54/0x90
> > > > [ 0.614705] ? exc_invalid_op+0x17/0x70
> > > > [ 0.615248] ? asm_exc_invalid_op+0x1a/0x20
> > > > [ 0.615797] ? rcu_sr_normal_complete+0xa9/0xc0
> > > > [ 0.616248] rcu_gp_cleanup+0x403/0x5a0
> > > > [ 0.616248] ? __pfx_rcu_gp_kthread+0x10/0x10
> > > > [ 0.616818] rcu_gp_kthread+0x136/0x1c0
> > > > [ 0.617249] kthread+0xec/0x1f0
> > > > [ 0.617664] ? __pfx_kthread+0x10/0x10
> > > > [ 0.618156] ret_from_fork+0x2f/0x50
> > > > [ 0.618728] ? __pfx_kthread+0x10/0x10
> > > > [ 0.619216] ret_from_fork_asm+0x1a/0x30
> > > > [ 0.620251] </TASK>
> > > > ...
> > > > <snip>
> > > >
> > > > Linus tip-tree, HEAD is c4b9570cfb63501638db720f3bee9f6dfd044b82
> > >
> > > Very good! And of course, the next question is "does going to _full()
> > > make the problem go away?" ;-)
> > >
> > Yes does its job if i apply:
> >
> > https://lore.kernel.org/rcu/00900afe-ac4e-4362-a3f9-d65f2c9dcd9a@paulmck-laptop/T/#m5d9263f3825d3170c044beedbae741717702d4aa
> >
> > after that i am not able to reproduce the warning anymore. Tested over
> > night. Without it, i can reproduce it pretty easy :)
>
> Thank you, and good to hear!!!
>
> May I add your Tested-by to that patch?
>
Sure.
Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
One question: we discussed that it is worth printing the seq-delta
in case of a warning, whereas the new patch does not do that and just
emits plain text.
Should I send a separate patch, or modify this one?
--
Uladzislau Rezki
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-28 14:34 ` Uladzislau Rezki
@ 2025-01-28 18:43 ` Paul E. McKenney
2025-01-28 20:57 ` Uladzislau Rezki
0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-28 18:43 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Tue, Jan 28, 2025 at 03:34:50PM +0100, Uladzislau Rezki wrote:
> On Tue, Jan 28, 2025 at 04:41:16AM -0800, Paul E. McKenney wrote:
> > On Tue, Jan 28, 2025 at 01:17:34PM +0100, Uladzislau Rezki wrote:
> > > > > > with 4 CPUs inside VM :)
> > > > > >
> > > > > And when running 16 instances with 4 CPUs each i can reproduce the
> > > > > splat which has been reported:
> > > > >
> > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs \
> > > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> > > > > --kconfig "CONFIG_NR_CPUS=4"
> > > > >
> > > > > <snip>
> > > > > ...
> > > > > [ 0.595251] ------------[ cut here ]------------
> > > > > [ 0.595867] A full grace period is not passed yet: 0
> > > > > [ 0.595875] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree.c:1617 rcu_sr_normal_complete+0xa9/0xc0
> > > > > [ 0.598248] Modules linked in:
> > > > > [ 0.598649] CPU: 1 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.13.0-02530-g8950af6a11ff #261
> > > > > [ 0.599248] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > [ 0.600248] RIP: 0010:rcu_sr_normal_complete+0xa9/0xc0
> > > > > [ 0.600913] Code: 48 29 c2 48 8d 04 0a ba 03 00 00 00 48 39 c2 79 0c 48 83 e8 04 48 c1 e8 02 48 8d 70 02 48 c7 c7 20 e9 33 b5 e8 d8 03 f4 ff 90 <0f> 0b 90 90 48 8d 7b 10 5b e9 f9 38 fb ff 66 0f 1f 84 00 00 00 00
> > > > > [ 0.603249] RSP: 0018:ffffadad0008be60 EFLAGS: 00010282
> > > > > [ 0.603925] RAX: 0000000000000000 RBX: ffffadad00013d10 RCX: 00000000ffffdfff
> > > > > [ 0.605247] RDX: 0000000000000000 RSI: ffffadad0008bd10 RDI: 0000000000000001
> > > > > [ 0.606247] RBP: 0000000000000000 R08: 0000000000009ffb R09: 00000000ffffdfff
> > > > > [ 0.607248] R10: 00000000ffffdfff R11: ffffffffb56789a0 R12: 0000000000000005
> > > > > [ 0.608247] R13: 0000000000031a40 R14: fffffffffffffb74 R15: 0000000000000000
> > > > > [ 0.609250] FS: 0000000000000000(0000) GS:ffff9081f5c80000(0000) knlGS:0000000000000000
> > > > > [ 0.610249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > [ 0.611248] CR2: 0000000000000000 CR3: 00000002f024a000 CR4: 00000000000006f0
> > > > > [ 0.612249] Call Trace:
> > > > > [ 0.612574] <TASK>
> > > > > [ 0.612854] ? __warn+0x8c/0x190
> > > > > [ 0.613248] ? rcu_sr_normal_complete+0xa9/0xc0
> > > > > [ 0.613840] ? report_bug+0x164/0x190
> > > > > [ 0.614248] ? handle_bug+0x54/0x90
> > > > > [ 0.614705] ? exc_invalid_op+0x17/0x70
> > > > > [ 0.615248] ? asm_exc_invalid_op+0x1a/0x20
> > > > > [ 0.615797] ? rcu_sr_normal_complete+0xa9/0xc0
> > > > > [ 0.616248] rcu_gp_cleanup+0x403/0x5a0
> > > > > [ 0.616248] ? __pfx_rcu_gp_kthread+0x10/0x10
> > > > > [ 0.616818] rcu_gp_kthread+0x136/0x1c0
> > > > > [ 0.617249] kthread+0xec/0x1f0
> > > > > [ 0.617664] ? __pfx_kthread+0x10/0x10
> > > > > [ 0.618156] ret_from_fork+0x2f/0x50
> > > > > [ 0.618728] ? __pfx_kthread+0x10/0x10
> > > > > [ 0.619216] ret_from_fork_asm+0x1a/0x30
> > > > > [ 0.620251] </TASK>
> > > > > ...
> > > > > <snip>
> > > > >
> > > > > Linus tip-tree, HEAD is c4b9570cfb63501638db720f3bee9f6dfd044b82
> > > >
> > > > Very good! And of course, the next question is "does going to _full()
> > > > make the problem go away?" ;-)
> > > >
> > > Yes does its job if i apply:
> > >
> > > https://lore.kernel.org/rcu/00900afe-ac4e-4362-a3f9-d65f2c9dcd9a@paulmck-laptop/T/#m5d9263f3825d3170c044beedbae741717702d4aa
> > >
> > > after that i am not able to reproduce the warning anymore. Tested over
> > > night. Without it, i can reproduce it pretty easy :)
> >
> > Thank you, and good to hear!!!
> >
> > May I add your Tested-by to that patch?
> >
> Sure.
>
> Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
Thank you! I will apply this on my next rebase.
> One question, we discussed that it is worth to print seq-delta
> in case of warning. Whereas a newly patch does do it and just
> emits a plain text.
>
> I can send a separate patch or modify this one?
A separate patch would be best.
If it helps, one possible set of functions to model this on is
rcutorture_format_gp_seqs() on the "dev" branch of -rcu:
9357e5aecb63 ("rcutorture: Include grace-period sequence numbers in failure/close-call")
This has the needed #ifdefs and the different implementations for Tree
and Tiny RCU.
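
As background for the seq-delta idea: grace-period sequence numbers are
unsigned long counters that may wrap, so both the ordering test and the
delta are usually done in modular arithmetic. Below is a minimal
user-space sketch, assuming plain counters with no state bits in the low
bits (the real ->gp_seq encoding is more involved). seq_at_or_after() and
seq_delta() are hypothetical names; the comparison has the same form as
the kernel's ULONG_CMP_GE().

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical helper: nonzero if a is at or after b in modular order.
 * Same form as the kernel's ULONG_CMP_GE(a, b).
 */
int seq_at_or_after(unsigned long a, unsigned long b)
{
	return ULONG_MAX / 2 >= a - b;
}

/*
 * Hypothetical helper: how far cur has advanced past snap. Unsigned
 * subtraction gives the right answer across a counter wrap, provided
 * that cur is not behind snap.
 */
unsigned long seq_delta(unsigned long cur, unsigned long snap)
{
	assert(seq_at_or_after(cur, snap));
	return cur - snap;
}
```

A warning that prints such a delta shows how far the counter moved
between the snapshot and the check, which a plain text message cannot.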
Thanx, Paul
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters
2025-01-23 18:58 [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki (Sony)
` (2 preceding siblings ...)
2025-01-23 18:58 ` [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu() Uladzislau Rezki (Sony)
@ 2025-01-28 20:55 ` Uladzislau Rezki
2025-01-28 21:19 ` Paul E. McKenney
3 siblings, 1 reply; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-28 20:55 UTC (permalink / raw)
To: Paul E . McKenney
Cc: Paul E . McKenney, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
Hello, Paul!
> Currently "nfakewriters" parameter can be set to any value but
> there is no possibility to adjust it automatically based on how
> many CPUs a system has where a test is run on.
>
> To address this, if the "nfakewriters" is set to negative it will
> be adjusted to num_possible_cpus() during torture initialization.
>
> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> ---
> kernel/rcu/rcutorture.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> index d26fb1d33ed9..6bc161e1e8ac 100644
> --- a/kernel/rcu/rcutorture.c
> +++ b/kernel/rcu/rcutorture.c
> @@ -4050,6 +4050,10 @@ rcu_torture_init(void)
> writer_task);
> if (torture_init_error(firsterr))
> goto unwind;
> +
> + if (nfakewriters < 0)
> + nfakewriters = (int) num_possible_cpus();
> +
> if (nfakewriters > 0) {
> fakewriter_tasks = kcalloc(nfakewriters,
> sizeof(fakewriter_tasks[0]),
> --
> 2.39.5
>
Would you mind taking this one as well? It is needed for:

rcu: Update TREE05.boot to test normal synchronize_rcu()
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration
2025-01-28 18:43 ` Paul E. McKenney
@ 2025-01-28 20:57 ` Uladzislau Rezki
0 siblings, 0 replies; 32+ messages in thread
From: Uladzislau Rezki @ 2025-01-28 20:57 UTC (permalink / raw)
To: Paul E. McKenney
Cc: Uladzislau Rezki, Boqun Feng, RCU, LKML, Frederic Weisbecker,
Cheung Wall, Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Tue, Jan 28, 2025 at 10:43:39AM -0800, Paul E. McKenney wrote:
> On Tue, Jan 28, 2025 at 03:34:50PM +0100, Uladzislau Rezki wrote:
> > On Tue, Jan 28, 2025 at 04:41:16AM -0800, Paul E. McKenney wrote:
> > > On Tue, Jan 28, 2025 at 01:17:34PM +0100, Uladzislau Rezki wrote:
> > > > > > > with 4 CPUs inside VM :)
> > > > > > >
> > > > > > And when running 16 instances with 4 CPUs each i can reproduce the
> > > > > > splat which has been reported:
> > > > > >
> > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --configs \
> > > > > > '16*TREE05' --memory 10G --bootargs 'rcutorture.fwd_progress=1' \
> > > > > > --kconfig "CONFIG_NR_CPUS=4"
> > > > > >
> > > > > > <snip>
> > > > > > ...
> > > > > > [ 0.595251] ------------[ cut here ]------------
> > > > > > [ 0.595867] A full grace period is not passed yet: 0
> > > > > > [ 0.595875] WARNING: CPU: 1 PID: 16 at kernel/rcu/tree.c:1617 rcu_sr_normal_complete+0xa9/0xc0
> > > > > > [ 0.598248] Modules linked in:
> > > > > > [ 0.598649] CPU: 1 UID: 0 PID: 16 Comm: rcu_preempt Not tainted 6.13.0-02530-g8950af6a11ff #261
> > > > > > [ 0.599248] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> > > > > > [ 0.600248] RIP: 0010:rcu_sr_normal_complete+0xa9/0xc0
> > > > > > [ 0.600913] Code: 48 29 c2 48 8d 04 0a ba 03 00 00 00 48 39 c2 79 0c 48 83 e8 04 48 c1 e8 02 48 8d 70 02 48 c7 c7 20 e9 33 b5 e8 d8 03 f4 ff 90 <0f> 0b 90 90 48 8d 7b 10 5b e9 f9 38 fb ff 66 0f 1f 84 00 00 00 00
> > > > > > [ 0.603249] RSP: 0018:ffffadad0008be60 EFLAGS: 00010282
> > > > > > [ 0.603925] RAX: 0000000000000000 RBX: ffffadad00013d10 RCX: 00000000ffffdfff
> > > > > > [ 0.605247] RDX: 0000000000000000 RSI: ffffadad0008bd10 RDI: 0000000000000001
> > > > > > [ 0.606247] RBP: 0000000000000000 R08: 0000000000009ffb R09: 00000000ffffdfff
> > > > > > [ 0.607248] R10: 00000000ffffdfff R11: ffffffffb56789a0 R12: 0000000000000005
> > > > > > [ 0.608247] R13: 0000000000031a40 R14: fffffffffffffb74 R15: 0000000000000000
> > > > > > [ 0.609250] FS: 0000000000000000(0000) GS:ffff9081f5c80000(0000) knlGS:0000000000000000
> > > > > > [ 0.610249] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > [ 0.611248] CR2: 0000000000000000 CR3: 00000002f024a000 CR4: 00000000000006f0
> > > > > > [ 0.612249] Call Trace:
> > > > > > [ 0.612574] <TASK>
> > > > > > [ 0.612854] ? __warn+0x8c/0x190
> > > > > > [ 0.613248] ? rcu_sr_normal_complete+0xa9/0xc0
> > > > > > [ 0.613840] ? report_bug+0x164/0x190
> > > > > > [ 0.614248] ? handle_bug+0x54/0x90
> > > > > > [ 0.614705] ? exc_invalid_op+0x17/0x70
> > > > > > [ 0.615248] ? asm_exc_invalid_op+0x1a/0x20
> > > > > > [ 0.615797] ? rcu_sr_normal_complete+0xa9/0xc0
> > > > > > [ 0.616248] rcu_gp_cleanup+0x403/0x5a0
> > > > > > [ 0.616248] ? __pfx_rcu_gp_kthread+0x10/0x10
> > > > > > [ 0.616818] rcu_gp_kthread+0x136/0x1c0
> > > > > > [ 0.617249] kthread+0xec/0x1f0
> > > > > > [ 0.617664] ? __pfx_kthread+0x10/0x10
> > > > > > [ 0.618156] ret_from_fork+0x2f/0x50
> > > > > > [ 0.618728] ? __pfx_kthread+0x10/0x10
> > > > > > [ 0.619216] ret_from_fork_asm+0x1a/0x30
> > > > > > [ 0.620251] </TASK>
> > > > > > ...
> > > > > > <snip>
> > > > > >
> > > > > > Linus tip-tree, HEAD is c4b9570cfb63501638db720f3bee9f6dfd044b82
> > > > >
> > > > > Very good! And of course, the next question is "does going to _full()
> > > > > make the problem go away?" ;-)
> > > > >
> > > > Yes, it does its job if I apply:
> > > >
> > > > https://lore.kernel.org/rcu/00900afe-ac4e-4362-a3f9-d65f2c9dcd9a@paulmck-laptop/T/#m5d9263f3825d3170c044beedbae741717702d4aa
> > > >
> > > > After that I am not able to reproduce the warning anymore. Tested
> > > > overnight. Without it, I can reproduce it pretty easily :)
> > >
> > > Thank you, and good to hear!!!
> > >
> > > May I add your Tested-by to that patch?
> > >
> > Sure.
> >
> > Tested-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
>
> Thank you! I will apply this on my next rebase.
>
> > One question: we discussed that it is worth printing the seq-delta
> > in case of a warning, whereas the new patch does not do that and just
> > emits plain text.
> >
> > I can send a separate patch or modify this one?
>
> A separate patch would be best.
>
Sounds good :)
> If it helps, one possible set of functions to model this on is
> rcutorture_format_gp_seqs() on the "dev" branch of -rcu:
>
> 9357e5aecb63 ("rcutorture: Include grace-period sequence numbers in failure/close-call")
>
> This has the needed #ifdefs and the different implementations for Tree
> and Tiny RCU.
>
I will have a look.
--
Uladzislau Rezki
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters
2025-01-28 20:55 ` [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki
@ 2025-01-28 21:19 ` Paul E. McKenney
0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2025-01-28 21:19 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Boqun Feng, RCU, LKML, Frederic Weisbecker, Cheung Wall,
Neeraj upadhyay, Joel Fernandes, Oleksiy Avramchenko
On Tue, Jan 28, 2025 at 09:55:19PM +0100, Uladzislau Rezki wrote:
> Hello, Paul!
>
> > Currently "nfakewriters" parameter can be set to any value but
> > there is no possibility to adjust it automatically based on how
> > many CPUs a system has where a test is run on.
> >
> > To address this, if the "nfakewriters" is set to negative it will
> > be adjusted to num_possible_cpus() during torture initialization.
> >
> > Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com>
> > ---
> > kernel/rcu/rcutorture.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index d26fb1d33ed9..6bc161e1e8ac 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -4050,6 +4050,10 @@ rcu_torture_init(void)
> > writer_task);
> > if (torture_init_error(firsterr))
> > goto unwind;
> > +
> > + if (nfakewriters < 0)
> > + nfakewriters = (int) num_possible_cpus();
> > +
> > if (nfakewriters > 0) {
> > fakewriter_tasks = kcalloc(nfakewriters,
> > sizeof(fakewriter_tasks[0]),
> > --
> > 2.39.5
> >
>
> Would you mind taking this one as well? It is needed for:
>
> rcu: Update TREE05.boot to test normal synchronize_rcu()

I would, but could you please set something up like we have for
nreaders (the module parameter) and nrealreaders (the value actually
used throughout)? I freely admit that nrealfakewriters sounds a bit
strange, so please feel free to either embrace the strangeness or propose
an alternative. ;-)

The reason for this is so that, on a system with 128 CPUs, the user can
distinguish between having specified (say) nfakewriters=128 on the one
hand or nfakewriters=-1 on the other.

Thanx, Paul
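
[Editorial sketch, not part of the original message.] The split requested
above, between the parameter as the user gave it and the value actually
used, might be sketched in userspace C as follows. This is a hedged
illustration, not the patch that was eventually applied: the name
nrealfakewriters, both helper functions, and the ncpus argument (standing
in for the kernel's num_possible_cpus()) are assumptions made for the
example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Userspace sketch, not kernel code.  "requested" plays the role of the
 * nfakewriters module parameter and is never overwritten; the return
 * value plays the role of the count actually used (called
 * nrealfakewriters here, a hypothetical name).
 */
static int resolve_nfakewriters(int requested, int ncpus)
{
	assert(ncpus > 0);
	/* A negative request means "one fake writer per possible CPU". */
	return requested < 0 ? ncpus : requested;
}

/* Report both values so that -1 and an explicit 128 stay distinguishable. */
static int format_fakewriter_params(char *buf, size_t len,
				    int requested, int ncpus)
{
	return snprintf(buf, len, "nfakewriters=%d nrealfakewriters=%d",
			requested, resolve_nfakewriters(requested, ncpus));
}
```

On a 128-CPU system, a request of -1 and a request of 128 both resolve to
128 fake writers, but the formatted line still shows which one the user
asked for, which is the distinction the message above is after.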
^ permalink raw reply [flat|nested] 32+ messages in thread
Thread overview: 32+ messages
2025-01-23 18:58 [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki (Sony)
2025-01-23 18:58 ` [PATCH 2/4] torture: Remove CONFIG_NR_CPUS configuration Uladzislau Rezki (Sony)
2025-01-23 20:29 ` Paul E. McKenney
2025-01-24 11:41 ` Uladzislau Rezki
2025-01-24 15:45 ` Paul E. McKenney
2025-01-24 17:21 ` Uladzislau Rezki
2025-01-24 17:36 ` Paul E. McKenney
2025-01-24 17:48 ` Uladzislau Rezki
2025-01-24 19:34 ` Paul E. McKenney
2025-01-27 13:27 ` Uladzislau Rezki
2025-01-27 14:51 ` Paul E. McKenney
2025-01-27 15:42 ` Uladzislau Rezki
2025-01-27 16:51 ` Paul E. McKenney
2025-01-27 17:26 ` Uladzislau Rezki
2025-01-27 18:15 ` Paul E. McKenney
2025-01-27 18:31 ` Uladzislau Rezki
2025-01-27 19:24 ` Uladzislau Rezki
2025-01-27 20:37 ` Uladzislau Rezki
2025-01-28 0:14 ` Paul E. McKenney
2025-01-28 12:17 ` Uladzislau Rezki
2025-01-28 12:41 ` Paul E. McKenney
2025-01-28 14:34 ` Uladzislau Rezki
2025-01-28 18:43 ` Paul E. McKenney
2025-01-28 20:57 ` Uladzislau Rezki
2025-01-23 18:58 ` [PATCH 3/4] rcu: Update TREE05.boot to test normal synchronize_rcu() Uladzislau Rezki (Sony)
2025-01-23 20:30 ` Paul E. McKenney
2025-01-23 18:58 ` [PATCH 4/4] rcu: Use _full() API to debug synchronize_rcu() Uladzislau Rezki (Sony)
2025-01-23 21:52 ` Paul E. McKenney
2025-01-24 11:48 ` Uladzislau Rezki
2025-01-24 15:49 ` Paul E. McKenney
2025-01-28 20:55 ` [PATCH 1/4] rcutorture: Allow a negative value for nfakewriters Uladzislau Rezki
2025-01-28 21:19 ` Paul E. McKenney