From: Joel Fernandes <joelagnelf@nvidia.com>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Joel Fernandes <joelagnelf@nvidia.com>,
linux-kernel <linux-kernel@vger.kernel.org>,
Frederic Weisbecker <frederic@kernel.org>,
Neeraj Upadhyay <neeraj.upadhyay@kernel.org>,
Josh Triplett <josh@joshtriplett.org>,
Boqun Feng <boqun.feng@gmail.com>,
Uladzislau Rezki <urezki@gmail.com>,
Steven Rostedt <rostedt@goodmis.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Lai Jiangshan <jiangshanlai@gmail.com>,
Zqiang <qiang.zhang1211@gmail.com>,
Davidlohr Bueso <dave@stgolabs.net>, rcu <rcu@vger.kernel.org>
Subject: Re: rcutorture: Perform more frequent testing of ->gpwrap
Date: Sat, 05 Apr 2025 12:30:58 -0000 [thread overview]
Message-ID: <174385625883.111.5710608020798879272@patchwork.local> (raw)
In-Reply-To: <c9c19caa-68c1-4012-bd82-38a24c9edb60@paulmck-laptop>
Hello, Paul,
On Sat, 5 Apr 2025 12:26:12 GMT, "Paul E. McKenney" wrote:
> On Sat, Mar 29, 2025 at 07:01:42PM -0400, Joel Fernandes wrote:
> > Currently, the ->gpwrap is not tested (at all per my testing) due to the
> > requirement of a large delta between a CPU's rdp->gp_seq and its node's
> > rnp->gpseq.
> >
> > This results in no testing of ->gpwrap being set. This patch by default
> > adds 5 minutes of testing with ->gpwrap forced by lowering the delta
> > between rdp->gp_seq and rnp->gp_seq to just 8 GPs. All of this is
> > configurable, including the active time for the setting and a full
> > testing cycle.
> >
> > By default, the first 25 minutes of a test will have the _default_
> > behavior there is right now (ULONG_MAX / 4) delta. Then for 5 minutes,
> > we switch to a smaller delta causing 1-2 wraps in 5 minutes. I believe
> > this is reasonable since we at least add a little bit of testing for
> > usecases where ->gpwrap is set.
> >
> > Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
>
> I ran this as follows:
>
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 10m --configs "TREE01" --bootargs "rcutorture.ovf_cycle_mins=7" --trust-make
>
> Once I actually applied your patch, I did get this:
>
> [ 601.891042] gpwraps: 13745
>
> Which seems to indicate some testing. ;-)
Thanks a lot for running it. I am wondering if I should check in tree.c (only in
testing mode), if the wraps are too many and restrict testing if so. Otherwise,
it is hard to come up with a constant that ensures the wraps are under control.
On the other hand, since this is only for 5 minutes every 30 minutes, we can leave
it as is and avoid the complexity.
>
> Additional comments inline.
>
> Thanx, Paul
>
> > ---
> > kernel/rcu/rcu.h | 4 +++
> > kernel/rcu/rcutorture.c | 64 +++++++++++++++++++++++++++++++++++++++++
> > kernel/rcu/tree.c | 34 ++++++++++++++++++++--
> > kernel/rcu/tree.h | 1 +
> > 4 files changed, 101 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
> > index eed2951a4962..9a15e9701e02 100644
> > --- a/kernel/rcu/rcu.h
> > +++ b/kernel/rcu/rcu.h
> > @@ -572,6 +572,8 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
> > unsigned long c_old,
> > unsigned long c);
> > void rcu_gp_set_torture_wait(int duration);
> > +void rcu_set_torture_ovf_lag(unsigned long lag);
> > +int rcu_get_gpwrap_count(int cpu);
> > #else
> > static inline void rcutorture_get_gp_data(int *flags, unsigned long *gp_seq)
> > {
> > @@ -589,6 +591,8 @@ void do_trace_rcu_torture_read(const char *rcutorturename,
> > do { } while (0)
> > #endif
> > static inline void rcu_gp_set_torture_wait(int duration) { }
> > +static inline void rcu_set_torture_ovf_lag(unsigned long lag) { }
> > +static inline int rcu_get_gpwrap_count(int cpu) { return 0; }
>
> Very good, you did remember CONFIG_SMP=n. And yes, I did try it. ;-)
>
> But shouldn't these be function pointers in rcu_torture_ops? That way if
> some other flavor of RCU starts doing wrap protection for its grace-period
> sequence numbers, this testing can apply directly to that flavor as well.
These are here because 'rdp' is not accessible AFAIK from rcutorture.c.
I could add wrappers to these and include them as pointers the a struct as well.
But I think these will still stay to access rdp.
>
> Then the pointers can simply be NULL in kernels built with CONFIG_SMP=n.
>
> > #endif
> > unsigned long long rcutorture_gather_gp_seqs(void);
> > void rcutorture_format_gp_seqs(unsigned long long seqs, char *cp, size_t len);
> > diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
> > index 895a27545ae1..79a72e70913e 100644
> > --- a/kernel/rcu/rcutorture.c
> > +++ b/kernel/rcu/rcutorture.c
> > @@ -118,6 +118,9 @@ torture_param(int, nreaders, -1, "Number of RCU reader threads");
> > torture_param(int, object_debug, 0, "Enable debug-object double call_rcu() testing");
> > torture_param(int, onoff_holdoff, 0, "Time after boot before CPU hotplugs (s)");
> > torture_param(int, onoff_interval, 0, "Time between CPU hotplugs (jiffies), 0=disable");
> > +torture_param(int, ovf_cycle_mins, 30, "Total cycle duration for ovf lag testing (in minutes)");
> > +torture_param(int, ovf_active_mins, 5, "Duration for which ovf lag is active within each cycle (in minutes)");
> > +torture_param(int, ovf_lag_gps, 8, "Value to set for set_torture_ovf_lag during an active testing period.");
>
> Given that "ovf" means just "overflow", would it make sense to get a "gp"
> in there? Maybe just "gpwrap_..."?
>
> "What is in a name?" ;-)
Sure, makes sense I will rename.
>
> I could argue with the defaults, but I run long tests often enough that
> I am not worried about coverage. As long as we remember to either run
> long tests or specify appropriate rcutorture.ovf_cycle_mins when messing
> with ->gpwrap code.
>
> > torture_param(int, nocbs_nthreads, 0, "Number of NOCB toggle threads, 0 to disable");
> > torture_param(int, nocbs_toggle, 1000, "Time between toggling nocb state (ms)");
> > torture_param(int, preempt_duration, 0, "Preemption duration (ms), zero to disable");
> > @@ -2629,6 +2632,7 @@ rcu_torture_stats_print(void)
> > int i;
> > long pipesummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
> > long batchsummary[RCU_TORTURE_PIPE_LEN + 1] = { 0 };
> > + long n_gpwraps = 0;
> > struct rcu_torture *rtcp;
> > static unsigned long rtcv_snap = ULONG_MAX;
> > static bool splatted;
> > @@ -2639,6 +2643,7 @@ rcu_torture_stats_print(void)
> > pipesummary[i] += READ_ONCE(per_cpu(rcu_torture_count, cpu)[i]);
> > batchsummary[i] += READ_ONCE(per_cpu(rcu_torture_batch, cpu)[i]);
> > }
> > + n_gpwraps += rcu_get_gpwrap_count(cpu);
> > }
> > for (i = RCU_TORTURE_PIPE_LEN; i >= 0; i--) {
> > if (pipesummary[i] != 0)
> > @@ -2672,6 +2677,7 @@ rcu_torture_stats_print(void)
> > pr_cont("read-exits: %ld ", data_race(n_read_exits)); // Statistic.
> > pr_cont("nocb-toggles: %ld:%ld\n",
>
> The "\n" on the above line needs to be deleted.
Ok.
> > atomic_long_read(&n_nocb_offload), atomic_long_read(&n_nocb_deoffload));
> > + pr_cont("gpwraps: %ld\n", n_gpwraps);
> >
> > pr_alert("%s%s ", torture_type, TORTURE_FLAG);
> > if (atomic_read(&n_rcu_torture_mberror) ||
> > @@ -3842,6 +3848,58 @@ static int rcu_torture_preempt(void *unused)
> >
> > static enum cpuhp_state rcutor_hp;
> >
> > +static struct hrtimer ovf_lag_timer;
> > +static bool ovf_lag_active;
>
> Same "ovf" naming complaint as before.
Ok.
> > +}
> > +
> > +static int rcu_torture_ovf_lag_init(void)
> > +{
> > + if (ovf_cycle_mins <= 0 || ovf_active_mins <= 0) {
> > + pr_alert("rcu-torture: lag timing parameters must be positive\n");
> > + return -EINVAL;
> > + }
>
> Why not refuse to start this portion of the test when testing CONFIG_SMP=n
> or something other than vanilla RCU? No need to fail the test, just
> print something saying that this testing won't be happening.
Got it, will do.
> > +
> > +static void rcu_torture_ovf_lag_cleanup(void)
> > +{
> > + hrtimer_cancel(&ovf_lag_timer);
> > +
> > + if (ovf_lag_active) {
> > + rcu_set_torture_ovf_lag(0);
> > + ovf_lag_active = false;
> > + }
> > +}
>
> Did you try the modprobe/rmmod testing to verify that this
> cleans up appropriately? You could use the drgn tool to check.
> See tools/rcu//rcu-cbs.py for an example drgn script that digs into the
> rcu_data structures.
Nice, will check!
Will work on this and provide v2.
thanks,
- Joel
next prev parent reply other threads:[~2025-04-05 12:31 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-29 23:01 [PATCH] rcutorture: Perform more frequent testing of ->gpwrap Joel Fernandes
2025-04-02 1:16 ` Paul E. McKenney
2025-04-05 12:30 ` Joel Fernandes [this message]
2025-04-05 17:22 ` Paul E. McKenney
2025-04-05 17:54 ` Joel Fernandes
2025-04-05 19:09 ` Paul E. McKenney
2025-04-10 13:48 ` Joel Fernandes
2025-04-10 13:48 ` Joel Fernandes
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=174385625883.111.5710608020798879272@patchwork.local \
--to=joelagnelf@nvidia.com \
--cc=boqun.feng@gmail.com \
--cc=dave@stgolabs.net \
--cc=frederic@kernel.org \
--cc=jiangshanlai@gmail.com \
--cc=josh@joshtriplett.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=neeraj.upadhyay@kernel.org \
--cc=paulmck@kernel.org \
--cc=qiang.zhang1211@gmail.com \
--cc=rcu@vger.kernel.org \
--cc=rostedt@goodmis.org \
--cc=urezki@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox