* [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() @ 2025-01-28 0:07 Joel Fernandes (Google) 2025-01-28 0:09 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes (Google) @ 2025-01-28 0:07 UTC (permalink / raw) To: linux-kernel, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang Cc: rcu The rcu_seq_done() API has a large "false-negative" windows of size ULONG_MAX/2, where after wrap around, it is possible that it will think that a GP has not completed if a wrap around happens and the delta is large. rcu_seq_done_exact() is more accurate avoiding this wrap around issue, by making the window of false-negativity by only 3 GPs. Use this logic for rcu_seq_done() which is a nice negative code delta and could potentially avoid issues in the future where rcu_seq_done() was reporting false-negatives for too long. rcutorture runs of all scenarios for 15 minutes passed. Code inspection was done of all users to convince the change would work. Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> --- kernel/rcu/rcu.h | 13 ++----------- kernel/rcu/tree.c | 6 +++--- 2 files changed, 5 insertions(+), 14 deletions(-) diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h index eed2951a4962..c2ca196907cb 100644 --- a/kernel/rcu/rcu.h +++ b/kernel/rcu/rcu.h @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) /* * Given a snapshot from rcu_seq_snap(), determine whether or not a - * full update-side operation has occurred. + * full update-side operation has occurred while also handling + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. */ static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) -{ - return ULONG_CMP_GE(READ_ONCE(*sp), s); -} - -/* - * Given a snapshot from rcu_seq_snap(), determine whether or not a - * full update-side operation has occurred, but do not allow the - * (ULONG_MAX / 2) safety-factor/guard-band. - */ -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) { unsigned long cur_s = READ_ONCE(*sp); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index b77ccc55557b..835600cec9ba 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); bool poll_state_synchronize_rcu(unsigned long oldstate) { if (oldstate == RCU_GET_STATE_COMPLETED || - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { smp_mb(); /* Ensure GP ends before subsequent accesses. */ return true; } @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) smp_mb(); // Order against root rcu_node structure grace-period cleanup. if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { smp_mb(); /* Ensure GP ends before subsequent accesses. */ return true; } -- 2.34.1 ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-28 0:07 [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() Joel Fernandes (Google) @ 2025-01-28 0:09 ` Joel Fernandes 2025-01-29 1:22 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-01-28 0:09 UTC (permalink / raw) To: linux-kernel, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang Cc: rcu On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) <joel@joelfernandes.org> wrote: > > The rcu_seq_done() API has a large "false-negative" windows of size > ULONG_MAX/2, where after wrap around, it is possible that it will think > that a GP has not completed if a wrap around happens and the delta is > large. > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > by making the window of false-negativity by only 3 GPs. Use this logic > for rcu_seq_done() which is a nice negative code delta and could > potentially avoid issues in the future where rcu_seq_done() was > reporting false-negatives for too long. > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > was done of all users to convince the change would work. > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> I am leaving a 60 minute overnight run of all scenarios on my personal server for further testing. thanks, - Joel > --- > kernel/rcu/rcu.h | 13 ++----------- > kernel/rcu/tree.c | 6 +++--- > 2 files changed, 5 insertions(+), 14 deletions(-) > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > index eed2951a4962..c2ca196907cb 100644 > --- a/kernel/rcu/rcu.h > +++ b/kernel/rcu/rcu.h > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > /* > * Given a snapshot from rcu_seq_snap(), determine whether or not a > - * full update-side operation has occurred. > + * full update-side operation has occurred while also handling > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > */ > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > -{ > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > -} > - > -/* > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > - * full update-side operation has occurred, but do not allow the > - * (ULONG_MAX / 2) safety-factor/guard-band. > - */ > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > { > unsigned long cur_s = READ_ONCE(*sp); > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > index b77ccc55557b..835600cec9ba 100644 > --- a/kernel/rcu/tree.c > +++ b/kernel/rcu/tree.c > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > bool poll_state_synchronize_rcu(unsigned long oldstate) > { > if (oldstate == RCU_GET_STATE_COMPLETED || > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > return true; > } > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > return true; > } > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-28 0:09 ` Joel Fernandes @ 2025-01-29 1:22 ` Joel Fernandes 2025-01-29 1:33 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-01-29 1:22 UTC (permalink / raw) To: linux-kernel, Paul E. McKenney, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang Cc: rcu On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > <joel@joelfernandes.org> wrote: > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > that a GP has not completed if a wrap around happens and the delta is > > large. > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > by making the window of false-negativity by only 3 GPs. Use this logic > > for rcu_seq_done() which is a nice negative code delta and could > > potentially avoid issues in the future where rcu_seq_done() was > > reporting false-negatives for too long. > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > was done of all users to convince the change would work. > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > I am leaving a 60 minute overnight run of all scenarios on my personal > server for further testing. The run passed, details below: --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: Results directory: tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 thanks, - Joel > > thanks, > > - Joel > > > > > > --- > > kernel/rcu/rcu.h | 13 ++----------- > > kernel/rcu/tree.c | 6 +++--- > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > index eed2951a4962..c2ca196907cb 100644 > > --- a/kernel/rcu/rcu.h > > +++ b/kernel/rcu/rcu.h > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > /* > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > - * full update-side operation has occurred. > > + * full update-side operation has occurred while also handling > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > */ > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > -{ > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > -} > > - > > -/* > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > - * full update-side operation has occurred, but do not allow the > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > - */ > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > { > > unsigned long cur_s = READ_ONCE(*sp); > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > index b77ccc55557b..835600cec9ba 100644 > > --- a/kernel/rcu/tree.c > > +++ b/kernel/rcu/tree.c > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > { > > if (oldstate == RCU_GET_STATE_COMPLETED || > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > return true; > > } > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > return true; > > } > > -- > > 2.34.1 > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 1:22 ` Joel Fernandes @ 2025-01-29 1:33 ` Paul E. McKenney 2025-01-29 1:38 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2025-01-29 1:33 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > <joel@joelfernandes.org> wrote: > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > that a GP has not completed if a wrap around happens and the delta is > > > large. > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > for rcu_seq_done() which is a nice negative code delta and could > > > potentially avoid issues in the future where rcu_seq_done() was > > > reporting false-negatives for too long. > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > was done of all users to convince the change would work. > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > server for further testing. > > The run passed, details below: > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > Results directory: > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 Very good! How would you go about analyzing whether this is really safe vs. getting just getting lucky and not having provoked an overflow? Thanx, Paul > thanks, > > - Joel > > > > > > thanks, > > > > - Joel > > > > > > > > > > > --- > > > kernel/rcu/rcu.h | 13 ++----------- > > > kernel/rcu/tree.c | 6 +++--- > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > index eed2951a4962..c2ca196907cb 100644 > > > --- a/kernel/rcu/rcu.h > > > +++ b/kernel/rcu/rcu.h > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > /* > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > - * full update-side operation has occurred. > > > + * full update-side operation has occurred while also handling > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > */ > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > -{ > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > -} > > > - > > > -/* > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > - * full update-side operation has occurred, but do not allow the > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > - */ > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > { > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > index b77ccc55557b..835600cec9ba 100644 > > > --- a/kernel/rcu/tree.c > > > +++ b/kernel/rcu/tree.c > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > { > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > return true; > > > } > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > return true; > > > } > > > -- > > > 2.34.1 > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 1:33 ` Paul E. McKenney @ 2025-01-29 1:38 ` Joel Fernandes 2025-01-29 1:47 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-01-29 1:38 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > <joel@joelfernandes.org> wrote: > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > that a GP has not completed if a wrap around happens and the delta is > > > > large. > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > reporting false-negatives for too long. > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > was done of all users to convince the change would work. > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > server for further testing. > > > > The run passed, details below: > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > Results directory: > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > Very good! > > How would you go about analyzing whether this is really safe vs. getting > just getting lucky and not having provoked an overflow? I would probably add a more specific test case stressing the API, or even a unit test of just the API by passing a range of sequences.. I should go ahead and do that but it sounds like you feel there is an issue with the patch? :) thanks, - Joel > > Thanx, Paul > > > thanks, > > > > - Joel > > > > > > > > > > thanks, > > > > > > - Joel > > > > > > > > > > > > > > > > --- > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > kernel/rcu/tree.c | 6 +++--- > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > index eed2951a4962..c2ca196907cb 100644 > > > > --- a/kernel/rcu/rcu.h > > > > +++ b/kernel/rcu/rcu.h > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > /* > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > - * full update-side operation has occurred. > > > > + * full update-side operation has occurred while also handling > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > */ > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > -{ > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > -} > > > > - > > > > -/* > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > - * full update-side operation has occurred, but do not allow the > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > - */ > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > { > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > index b77ccc55557b..835600cec9ba 100644 > > > > --- a/kernel/rcu/tree.c > > > > +++ b/kernel/rcu/tree.c > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > { > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > return true; > > > > } > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > return true; > > > > } > > > > -- > > > > 2.34.1 > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 1:38 ` Joel Fernandes @ 2025-01-29 1:47 ` Paul E. McKenney 2025-01-29 2:11 ` Joel Fernandes 2025-01-29 2:13 ` Joel Fernandes 0 siblings, 2 replies; 20+ messages in thread From: Paul E. McKenney @ 2025-01-29 1:47 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > large. > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > reporting false-negatives for too long. > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > was done of all users to convince the change would work. > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > server for further testing. > > > > > > The run passed, details below: > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > Results directory: > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > Very good! > > > > How would you go about analyzing whether this is really safe vs. getting > > just getting lucky and not having provoked an overflow? > > I would probably add a more specific test case stressing the API, or > even a unit test of just the API by passing a range of sequences.. I > should go ahead and do that but it sounds like you feel there is an > issue with the patch? :) 2^31 (let alone 2^63) is a very large number of grace periods, and so it is hard to test grace-period sequence-number wrap. Not impossible, though... Thanx, Paul > thanks, > > - Joel > > > > > > Thanx, Paul > > > > > thanks, > > > > > > - Joel > > > > > > > > > > > > > > thanks, > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > --- > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > --- a/kernel/rcu/rcu.h > > > > > +++ b/kernel/rcu/rcu.h > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > /* > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > - * full update-side operation has occurred. > > > > > + * full update-side operation has occurred while also handling > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > */ > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > -{ > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > -} > > > > > - > > > > > -/* > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > - * full update-side operation has occurred, but do not allow the > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > - */ > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > { > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > --- a/kernel/rcu/tree.c > > > > > +++ b/kernel/rcu/tree.c > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > { > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > return true; > > > > > } > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > return true; > > > > > } > > > > > -- > > > > > 2.34.1 > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 1:47 ` Paul E. McKenney @ 2025-01-29 2:11 ` Joel Fernandes 2025-01-29 2:13 ` Joel Fernandes 1 sibling, 0 replies; 20+ messages in thread From: Joel Fernandes @ 2025-01-29 2:11 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > > large. > > > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > > reporting false-negatives for too long. > > > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > > was done of all users to convince the change would work. > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > > server for further testing. > > > > > > > > The run passed, details below: > > > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > > Results directory: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > > > Very good! > > > > > > How would you go about analyzing whether this is really safe vs. getting > > > just getting lucky and not having provoked an overflow? > > > > I would probably add a more specific test case stressing the API, or > > even a unit test of just the API by passing a range of sequences.. I > > should go ahead and do that but it sounds like you feel there is an > > issue with the patch? :) > > 2^31 (let alone 2^63) is a very large number of grace periods, and > so it is hard to test grace-period sequence-number wrap. > > Not impossible, though... I realized with some simple tests this can break because of some ambiguity around whether a wraparound really happened: say *sp is 100, and s is 200. This could either be a full 32/64-bit wrap around, or it could be that *sp is not yet caught up. I guess there is no way to distinguish between the two cases. For this example, ULONG_CMP_GE will be false and ULONG_CMP_LT will be true making the new rcu_seq_done() version to be true. But if *sp is not yet caught up, that means the new rcu_seq_done() should be false, not true! Sigh, I look like a fool now. But I am also wondering what makes rcu_seq_done_exact() correct then - because it cannot really tell when a wrap around actually occurred. Hmmm... Confused, - Joel > > Thanx, Paul > > > thanks, > > > > - Joel > > > > > > > > > > Thanx, Paul > > > > > > > thanks, > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > > --- a/kernel/rcu/rcu.h > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > > > /* > > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred. > > > > > > + * full update-side operation has occurred while also handling > > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > */ > > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > > -{ > > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > > -} > > > > > > - > > > > > > -/* > > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred, but do not allow the > > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > - */ > > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > > { > > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > > --- a/kernel/rcu/tree.c > > > > > > +++ b/kernel/rcu/tree.c > > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > > { > > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > -- > > > > > > 2.34.1 > > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 1:47 ` Paul E. McKenney 2025-01-29 2:11 ` Joel Fernandes @ 2025-01-29 2:13 ` Joel Fernandes 2025-01-29 2:21 ` Paul E. McKenney 1 sibling, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-01-29 2:13 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > > large. > > > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > > reporting false-negatives for too long. > > > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > > was done of all users to convince the change would work. > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > > server for further testing. > > > > > > > > The run passed, details below: > > > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > > Results directory: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > > > Very good! > > > > > > How would you go about analyzing whether this is really safe vs. getting > > > just getting lucky and not having provoked an overflow? > > > > I would probably add a more specific test case stressing the API, or > > even a unit test of just the API by passing a range of sequences.. I > > should go ahead and do that but it sounds like you feel there is an > > issue with the patch? :) > > 2^31 (let alone 2^63) is a very large number of grace periods, and > so it is hard to test grace-period sequence-number wrap. > > Not impossible, though... > We could test a decent number of candidate sequences to cover different cases. Not ideal like bruteforcing, but... Another idea is to hardcode/assume ULONG_MAX as 16-bit in a unit test. thanks, - Joel > > > > > > > > Thanx, Paul > > > > > > > thanks, > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > > --- a/kernel/rcu/rcu.h > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > > > /* > > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred. > > > > > > + * full update-side operation has occurred while also handling > > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > */ > > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > > -{ > > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > > -} > > > > > > - > > > > > > -/* > > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > - * full update-side operation has occurred, but do not allow the > > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > - */ > > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > > { > > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > > --- a/kernel/rcu/tree.c > > > > > > +++ b/kernel/rcu/tree.c > > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > > { > > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > return true; > > > > > > } > > > > > > -- > > > > > > 2.34.1 > > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 2:13 ` Joel Fernandes @ 2025-01-29 2:21 ` Paul E. McKenney 2025-01-29 11:34 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2025-01-29 2:21 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > > On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > > > On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > > > > On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > > > > <joel@joelfernandes.org> wrote: > > > > > > > > > > > > > > The rcu_seq_done() API has a large "false-negative" windows of size > > > > > > > ULONG_MAX/2, where after wrap around, it is possible that it will think > > > > > > > that a GP has not completed if a wrap around happens and the delta is > > > > > > > large. > > > > > > > > > > > > > > rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > > > > > by making the window of false-negativity by only 3 GPs. Use this logic > > > > > > > for rcu_seq_done() which is a nice negative code delta and could > > > > > > > potentially avoid issues in the future where rcu_seq_done() was > > > > > > > reporting false-negatives for too long. > > > > > > > > > > > > > > rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > > > > > was done of all users to convince the change would work. > > > > > > > > > > > > > > Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > > > > > > > > > > I am leaving a 60 minute overnight run of all scenarios on my personal > > > > > > server for further testing. > > > > > > > > > > The run passed, details below: > > > > > > > > > > --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > > > Results directory: > > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > > > RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > > > SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > > > SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > > > SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > > > SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > > > SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > > > SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > > > TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > > > TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > > > TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > > > TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > > > TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > > > TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > > > TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > > > TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > > > TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > > > TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > > > TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > > > TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > > > TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > > > TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > > > --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > > > > > > Very good! > > > > > > > > How would you go about analyzing whether this is really safe vs. getting > > > > just getting lucky and not having provoked an overflow? > > > > > > I would probably add a more specific test case stressing the API, or > > > even a unit test of just the API by passing a range of sequences.. I > > > should go ahead and do that but it sounds like you feel there is an > > > issue with the patch? :) > > > > 2^31 (let alone 2^63) is a very large number of grace periods, and > > so it is hard to test grace-period sequence-number wrap. > > > > Not impossible, though... > > We could test a decent number of candidate sequences to cover > different cases. Not ideal like bruteforcing, but... Another idea is > to hardcode/assume ULONG_MAX as 16-bit in a unit test. Or put the various sequence numbers into an unsigned short or even an unsigned char. One set of use cases checks to see if a given CPU's ->gp_seq has fallen too far behind the current grace period, and sets a flag to alert that CPU. Others rely on a false negative being functionally OK. Or so I believe. ;-) Thanx, Paul > thanks, > > - Joel > > > > > > > > > > > > > > Thanx, Paul > > > > > > > > > thanks, > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > thanks, > > > > > > > > > > > > - Joel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > kernel/rcu/rcu.h | 13 ++----------- > > > > > > > kernel/rcu/tree.c | 6 +++--- > > > > > > > 2 files changed, 5 insertions(+), 14 deletions(-) > > > > > > > > > > > > > > diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > > > > > > > index eed2951a4962..c2ca196907cb 100644 > > > > > > > --- a/kernel/rcu/rcu.h > > > > > > > +++ b/kernel/rcu/rcu.h > > > > > > > @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > > > > > > > > > > > > > > /* > > > > > > > * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > > - * full update-side operation has occurred. > > > > > > > + * full update-side operation has occurred while also handling > > > > > > > + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > > */ > > > > > > > static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > > > > > > > -{ > > > > > > > - return ULONG_CMP_GE(READ_ONCE(*sp), s); > > > > > > > -} > > > > > > > - > > > > > > > -/* > > > > > > > - * Given a snapshot from rcu_seq_snap(), determine whether or not a > > > > > > > - * full update-side operation has occurred, but do not allow the > > > > > > > - * (ULONG_MAX / 2) safety-factor/guard-band. > > > > > > > - */ > > > > > > > -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > > > > > > > { > > > > > > > unsigned long cur_s = READ_ONCE(*sp); > > > > > > > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > > > > > > > index b77ccc55557b..835600cec9ba 100644 > > > > > > > --- a/kernel/rcu/tree.c > > > > > > > +++ b/kernel/rcu/tree.c > > > > > > > @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > > > > > > > bool poll_state_synchronize_rcu(unsigned long oldstate) > > > > > > > { > > > > > > > if (oldstate == RCU_GET_STATE_COMPLETED || > > > > > > > - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > > + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > > return true; > > > > > > > } > > > > > > > @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > > > > > > > > > > > > > > smp_mb(); // Order against root rcu_node structure grace-period cleanup. > > > > > > > if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > > > > > > > - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > > + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > > > > > > > rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > > > > > > > - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > > + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > > > > > > > smp_mb(); /* Ensure GP ends before subsequent accesses. */ > > > > > > > return true; > > > > > > > } > > > > > > > -- > > > > > > > 2.34.1 > > > > > > > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 2:21 ` Paul E. McKenney @ 2025-01-29 11:34 ` Joel Fernandes 2025-01-29 17:25 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-01-29 11:34 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu > On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: >>> >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: >>>>> >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) >>>>>>> <joel@joelfernandes.org> wrote: >>>>>>>> >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think >>>>>>>> that a GP has not completed if a wrap around happens and the delta is >>>>>>>> large. >>>>>>>> >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was >>>>>>>> reporting false-negatives for too long. >>>>>>>> >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection >>>>>>>> was done of all users to convince the change would work. >>>>>>>> >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> >>>>>>> >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal >>>>>>> server for further testing. >>>>>> >>>>>> The run passed, details below: >>>>>> >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: >>>>>> Results directory: >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 >>>>> >>>>> Very good! >>>>> >>>>> How would you go about analyzing whether this is really safe vs. getting >>>>> just getting lucky and not having provoked an overflow? >>>> >>>> I would probably add a more specific test case stressing the API, or >>>> even a unit test of just the API by passing a range of sequences.. I >>>> should go ahead and do that but it sounds like you feel there is an >>>> issue with the patch? :) >>> >>> 2^31 (let alone 2^63) is a very large number of grace periods, and >>> so it is hard to test grace-period sequence-number wrap. >>> >>> Not impossible, though... >> >> We could test a decent number of candidate sequences to cover >> different cases. Not ideal like bruteforcing, but... Another idea is >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > Or put the various sequence numbers into an unsigned short or even > an unsigned char. > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > too far behind the current grace period, and sets a flag to alert > that CPU. Others rely on a false negative being functionally OK. > > Or so I believe. ;-) Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ Thanks, - Joel > > Thanx, Paul > >> thanks, >> >> - Joel >> >> >> >>>> >>>>> >>>>> Thanx, Paul >>>>> >>>>>> thanks, >>>>>> >>>>>> - Joel >>>>>> >>>>>> >>>>>>> >>>>>>> thanks, >>>>>>> >>>>>>> - Joel >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> --- >>>>>>>> kernel/rcu/rcu.h | 13 ++----------- >>>>>>>> kernel/rcu/tree.c | 6 +++--- >>>>>>>> 2 files changed, 5 insertions(+), 14 deletions(-) >>>>>>>> >>>>>>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h >>>>>>>> index eed2951a4962..c2ca196907cb 100644 >>>>>>>> --- a/kernel/rcu/rcu.h >>>>>>>> +++ b/kernel/rcu/rcu.h >>>>>>>> @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) >>>>>>>> >>>>>>>> /* >>>>>>>> * Given a snapshot from rcu_seq_snap(), determine whether or not a >>>>>>>> - * full update-side operation has occurred. >>>>>>>> + * full update-side operation has occurred while also handling >>>>>>>> + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. >>>>>>>> */ >>>>>>>> static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) >>>>>>>> -{ >>>>>>>> - return ULONG_CMP_GE(READ_ONCE(*sp), s); >>>>>>>> -} >>>>>>>> - >>>>>>>> -/* >>>>>>>> - * Given a snapshot from rcu_seq_snap(), determine whether or not a >>>>>>>> - * full update-side operation has occurred, but do not allow the >>>>>>>> - * (ULONG_MAX / 2) safety-factor/guard-band. >>>>>>>> - */ >>>>>>>> -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) >>>>>>>> { >>>>>>>> unsigned long cur_s = READ_ONCE(*sp); >>>>>>>> >>>>>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c >>>>>>>> index b77ccc55557b..835600cec9ba 100644 >>>>>>>> --- a/kernel/rcu/tree.c >>>>>>>> +++ b/kernel/rcu/tree.c >>>>>>>> @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); >>>>>>>> bool poll_state_synchronize_rcu(unsigned long oldstate) >>>>>>>> { >>>>>>>> if (oldstate == RCU_GET_STATE_COMPLETED || >>>>>>>> - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { >>>>>>>> + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ >>>>>>>> return true; >>>>>>>> } >>>>>>>> @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) >>>>>>>> >>>>>>>> smp_mb(); // Order against root rcu_node structure grace-period cleanup. >>>>>>>> if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || >>>>>>>> - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || >>>>>>>> + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || >>>>>>>> rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || >>>>>>>> - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { >>>>>>>> + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ >>>>>>>> return true; >>>>>>>> } >>>>>>>> -- >>>>>>>> 2.34.1 >>>>>>>> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 11:34 ` Joel Fernandes @ 2025-01-29 17:25 ` Paul E. McKenney 2025-01-29 23:10 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2025-01-29 17:25 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Wed, Jan 29, 2025 at 06:34:53AM -0500, Joel Fernandes wrote: > > > > On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > >>> > >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > >>>>> > >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > >>>>>>> <joel@joelfernandes.org> wrote: > >>>>>>>> > >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size > >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think > >>>>>>>> that a GP has not completed if a wrap around happens and the delta is > >>>>>>>> large. > >>>>>>>> > >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic > >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could > >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was > >>>>>>>> reporting false-negatives for too long. > >>>>>>>> > >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection > >>>>>>>> was done of all users to convince the change would work. > >>>>>>>> > >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > >>>>>>> > >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal > >>>>>>> server for further testing. > >>>>>> > >>>>>> The run passed, details below: > >>>>>> > >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > >>>>>> Results directory: > >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > >>>>> > >>>>> Very good! > >>>>> > >>>>> How would you go about analyzing whether this is really safe vs. getting > >>>>> just getting lucky and not having provoked an overflow? > >>>> > >>>> I would probably add a more specific test case stressing the API, or > >>>> even a unit test of just the API by passing a range of sequences.. I > >>>> should go ahead and do that but it sounds like you feel there is an > >>>> issue with the patch? :) > >>> > >>> 2^31 (let alone 2^63) is a very large number of grace periods, and > >>> so it is hard to test grace-period sequence-number wrap. > >>> > >>> Not impossible, though... > >> > >> We could test a decent number of candidate sequences to cover > >> different cases. Not ideal like bruteforcing, but... Another idea is > >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > > > Or put the various sequence numbers into an unsigned short or even > > an unsigned char. > > > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > > too far behind the current grace period, and sets a flag to alert > > that CPU. Others rely on a false negative being functionally OK. > > > > Or so I believe. ;-) > > Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ The big questions are "under what conditions does it need to distinguish, and what are the consequences of failing to get this right?" Also, "what is the purpose of ->gpwrap?". Thanx, Paul > Thanks, > > - Joel > > > > > > Thanx, Paul > > > >> thanks, > >> > >> - Joel > >> > >> > >> > >>>> > >>>>> > >>>>> Thanx, Paul > >>>>> > >>>>>> thanks, > >>>>>> > >>>>>> - Joel > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> thanks, > >>>>>>> > >>>>>>> - Joel > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> --- > >>>>>>>> kernel/rcu/rcu.h | 13 ++----------- > >>>>>>>> kernel/rcu/tree.c | 6 +++--- > >>>>>>>> 2 files changed, 5 insertions(+), 14 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h > >>>>>>>> index eed2951a4962..c2ca196907cb 100644 > >>>>>>>> --- a/kernel/rcu/rcu.h > >>>>>>>> +++ b/kernel/rcu/rcu.h > >>>>>>>> @@ -146,19 +146,10 @@ static inline bool rcu_seq_started(unsigned long *sp, unsigned long s) > >>>>>>>> > >>>>>>>> /* > >>>>>>>> * Given a snapshot from rcu_seq_snap(), determine whether or not a > >>>>>>>> - * full update-side operation has occurred. > >>>>>>>> + * full update-side operation has occurred while also handling > >>>>>>>> + * wraparounds that exceed the (ULONG_MAX / 2) safety-factor/guard-band. > >>>>>>>> */ > >>>>>>>> static inline bool rcu_seq_done(unsigned long *sp, unsigned long s) > >>>>>>>> -{ > >>>>>>>> - return ULONG_CMP_GE(READ_ONCE(*sp), s); > >>>>>>>> -} > >>>>>>>> - > >>>>>>>> -/* > >>>>>>>> - * Given a snapshot from rcu_seq_snap(), determine whether or not a > >>>>>>>> - * full update-side operation has occurred, but do not allow the > >>>>>>>> - * (ULONG_MAX / 2) safety-factor/guard-band. > >>>>>>>> - */ > >>>>>>>> -static inline bool rcu_seq_done_exact(unsigned long *sp, unsigned long s) > >>>>>>>> { > >>>>>>>> unsigned long cur_s = READ_ONCE(*sp); > >>>>>>>> > >>>>>>>> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c > >>>>>>>> index b77ccc55557b..835600cec9ba 100644 > >>>>>>>> --- a/kernel/rcu/tree.c > >>>>>>>> +++ b/kernel/rcu/tree.c > >>>>>>>> @@ -4300,7 +4300,7 @@ EXPORT_SYMBOL_GPL(start_poll_synchronize_rcu_full); > >>>>>>>> bool poll_state_synchronize_rcu(unsigned long oldstate) > >>>>>>>> { > >>>>>>>> if (oldstate == RCU_GET_STATE_COMPLETED || > >>>>>>>> - rcu_seq_done_exact(&rcu_state.gp_seq_polled, oldstate)) { > >>>>>>>> + rcu_seq_done(&rcu_state.gp_seq_polled, oldstate)) { > >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ > >>>>>>>> return true; > >>>>>>>> } > >>>>>>>> @@ -4347,9 +4347,9 @@ bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) > >>>>>>>> > >>>>>>>> smp_mb(); // Order against root rcu_node structure grace-period cleanup. > >>>>>>>> if (rgosp->rgos_norm == RCU_GET_STATE_COMPLETED || > >>>>>>>> - rcu_seq_done_exact(&rnp->gp_seq, rgosp->rgos_norm) || > >>>>>>>> + rcu_seq_done(&rnp->gp_seq, rgosp->rgos_norm) || > >>>>>>>> rgosp->rgos_exp == RCU_GET_STATE_COMPLETED || > >>>>>>>> - rcu_seq_done_exact(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > >>>>>>>> + rcu_seq_done(&rcu_state.expedited_sequence, rgosp->rgos_exp)) { > >>>>>>>> smp_mb(); /* Ensure GP ends before subsequent accesses. */ > >>>>>>>> return true; > >>>>>>>> } > >>>>>>>> -- > >>>>>>>> 2.34.1 > >>>>>>>> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 17:25 ` Paul E. McKenney @ 2025-01-29 23:10 ` Joel Fernandes 2025-01-30 5:56 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-01-29 23:10 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Wed, Jan 29, 2025 at 12:25 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Wed, Jan 29, 2025 at 06:34:53AM -0500, Joel Fernandes wrote: > > > > > > > On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > > >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > >>> > > >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > >>>>> > > >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > >>>>>>> <joel@joelfernandes.org> wrote: > > >>>>>>>> > > >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size > > >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think > > >>>>>>>> that a GP has not completed if a wrap around happens and the delta is > > >>>>>>>> large. > > >>>>>>>> > > >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic > > >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could > > >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was > > >>>>>>>> reporting false-negatives for too long. > > >>>>>>>> > > >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > >>>>>>>> was done of all users to convince the change would work. > > >>>>>>>> > > >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > >>>>>>> > > >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal > > >>>>>>> server for further testing. > > >>>>>> > > >>>>>> The run passed, details below: > > >>>>>> > > >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > >>>>>> Results directory: > > >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > >>>>> > > >>>>> Very good! > > >>>>> > > >>>>> How would you go about analyzing whether this is really safe vs. getting > > >>>>> just getting lucky and not having provoked an overflow? > > >>>> > > >>>> I would probably add a more specific test case stressing the API, or > > >>>> even a unit test of just the API by passing a range of sequences.. I > > >>>> should go ahead and do that but it sounds like you feel there is an > > >>>> issue with the patch? :) > > >>> > > >>> 2^31 (let alone 2^63) is a very large number of grace periods, and > > >>> so it is hard to test grace-period sequence-number wrap. > > >>> > > >>> Not impossible, though... > > >> > > >> We could test a decent number of candidate sequences to cover > > >> different cases. Not ideal like bruteforcing, but... Another idea is > > >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > > > > > Or put the various sequence numbers into an unsigned short or even > > > an unsigned char. > > > > > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > > > too far behind the current grace period, and sets a flag to alert > > > that CPU. Others rely on a false negative being functionally OK. > > > > > > Or so I believe. ;-) > > > > Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ > > The big questions are "under what conditions does it need to distinguish, > and what are the consequences of failing to get this right?" Also, "what > is the purpose of ->gpwrap?". My understanding is we take a snapshot of a sequence and then check at later time if it was reached. Ok I shall explore these questions. Meanwhile I created the following visualization using 5-bit numbers: https://drive.google.com/file/d/1w2sgJga7B5dye4iH0oPZ79obaKO-e_Jn/view?usp=sharing I find as we expect, that for rcu_seq_done(), as long as the distance between the "running sequence" and the "target" is ULONG_MAX/2, it returns correct results. OTHERWISE, it can return false results. So for target value of 28 (in 5-bit world), only initial running values from 12 would return FALSE. Before number 12, everything is TRUE. This can cause both false-positives and false-negatives depending on the input, however it is possible that are no users who use it causing false-positives! So I guess it really depends on user. False positives: If the initial value is ULONG_MAX/2 away from the target, then rcu_seq_done() can return TRUE when *no wraparound* happened. False negatives: The initial value of the snapshot was *within* the ULONG_MAX/2 distance from target. A full *wrap around then happened*, and we ended where we were initially, so rcu_seq_done() should return TRUE but it returns FALSE because it doesn't know about the wrap. Now we move rcu_seq_done_exact. It has the exact same behavior except that instead of ULONG_MAX/2, the above situations are shrunk to about 10 counts from the target. So if target is 28, then the initial sequence should have been at least 18 to avoid false-positive, but again it is possible and I certain see in the code that it cannot be used this way.. (at least so far).. So all we are left with is the false-negative band of ~2.5 GPs.. About "what are the consequences of failing to get this right". I believe false-positive is unacceptable unless it is maybe debug code - that can break functionality in code, too short GPs and all that. However false-negative is OK as long as the usecase can retry later and can afford to wait. Did I get that much correct? Thanks, - Joel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-29 23:10 ` Joel Fernandes @ 2025-01-30 5:56 ` Paul E. McKenney 2025-02-04 15:44 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2025-01-30 5:56 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Wed, Jan 29, 2025 at 06:10:10PM -0500, Joel Fernandes wrote: > On Wed, Jan 29, 2025 at 12:25 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Wed, Jan 29, 2025 at 06:34:53AM -0500, Joel Fernandes wrote: > > > > > > > > > > On Jan 28, 2025, at 9:21 PM, Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > On Tue, Jan 28, 2025 at 09:13:45PM -0500, Joel Fernandes wrote: > > > >>> On Tue, Jan 28, 2025 at 8:47 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > >>> > > > >>> On Tue, Jan 28, 2025 at 08:38:57PM -0500, Joel Fernandes wrote: > > > >>>> On Tue, Jan 28, 2025 at 8:33 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > >>>>> > > > >>>>> On Tue, Jan 28, 2025 at 08:22:48PM -0500, Joel Fernandes wrote: > > > >>>>>> On Mon, Jan 27, 2025 at 07:09:34PM -0500, Joel Fernandes wrote: > > > >>>>>>> On Mon, Jan 27, 2025 at 7:07 PM Joel Fernandes (Google) > > > >>>>>>> <joel@joelfernandes.org> wrote: > > > >>>>>>>> > > > >>>>>>>> The rcu_seq_done() API has a large "false-negative" windows of size > > > >>>>>>>> ULONG_MAX/2, where after wrap around, it is possible that it will think > > > >>>>>>>> that a GP has not completed if a wrap around happens and the delta is > > > >>>>>>>> large. > > > >>>>>>>> > > > >>>>>>>> rcu_seq_done_exact() is more accurate avoiding this wrap around issue, > > > >>>>>>>> by making the window of false-negativity by only 3 GPs. Use this logic > > > >>>>>>>> for rcu_seq_done() which is a nice negative code delta and could > > > >>>>>>>> potentially avoid issues in the future where rcu_seq_done() was > > > >>>>>>>> reporting false-negatives for too long. > > > >>>>>>>> > > > >>>>>>>> rcutorture runs of all scenarios for 15 minutes passed. Code inspection > > > >>>>>>>> was done of all users to convince the change would work. > > > >>>>>>>> > > > >>>>>>>> Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org> > > > >>>>>>> > > > >>>>>>> I am leaving a 60 minute overnight run of all scenarios on my personal > > > >>>>>>> server for further testing. > > > >>>>>> > > > >>>>>> The run passed, details below: > > > >>>>>> > > > >>>>>> --- Mon Jan 27 11:49:49 PM EST 2025 Test summary: > > > >>>>>> Results directory: > > > >>>>>> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 60 > > > >>>>>> RUDE01 ------- 14309 GPs (3.97472/s) [tasks-rude: g57884 f0x0 total-gps=57880] n_max_cbs: 0 > > > >>>>>> SRCU-L ------- 34121 GPs (9.47806/s) [srcu: g316564 f0x0 total-gps=79242] n_max_cbs: 150000 > > > >>>>>> SRCU-N ------- 151316 GPs (42.0322/s) [srcu: g1840064 f0x0 total-gps=460117] n_max_cbs: 150000 > > > >>>>>> SRCU-P ------- 35189 GPs (9.77472/s) [srcud: g320792 f0x0 total-gps=80299] n_max_cbs: 150000 > > > >>>>>> SRCU-T ------- 389034 GPs (108.065/s) [srcu: g4142406 f0x0 total-gps=1035602] n_max_cbs: 50000 > > > >>>>>> SRCU-U ------- 376267 GPs (104.519/s) [srcud: g3953834 f0x0 total-gps=988459] n_max_cbs: 50000 > > > >>>>>> SRCU-V ------- 407633 GPs (113.231/s) [srcud: g4371704 f0x0 total-gps=1092927] n_max_cbs: 1000 > > > >>>>>> TASKS01 ------- 11007 GPs (3.0575/s) [tasks: g57816 f0x0 total-gps=57808] > > > >>>>>> TASKS02 ------- 10539 GPs (2.9275/s) [tasks: g57936 f0x0 total-gps=57936] > > > >>>>>> TASKS03 ------- 10453 GPs (2.90361/s) [tasks: g57508 f0x0 total-gps=57508] > > > >>>>>> TINY01 ------- 511634 GPs (142.121/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 57078 > > > >>>>>> TINY02 ------- 541799 GPs (150.5/s) [rcu: g0 f0x0 total-gps=0] n_max_cbs: 2619 > > > >>>>>> TRACE01 ------- 7299 GPs (2.0275/s) [tasks-tracing: g45844 f0x0 total-gps=45844] n_max_cbs: 50000 > > > >>>>>> TRACE02 ------- 101265 GPs (28.1292/s) [tasks-tracing: g305464 f0x0 total-gps=305456] n_max_cbs: 100000 > > > >>>>>> TREE01 ------- 97989 GPs (27.2192/s) [rcu: g479473 f0x0 total-gps=120151] > > > >>>>>> TREE02 ------- 202908 GPs (56.3633/s) [rcu: g1459509 f0x0 total-gps=365162] n_max_cbs: 1139244 > > > >>>>>> TREE03 ------- 168901 GPs (46.9169/s) [rcu: g1764445 f0x0 total-gps=441393] n_max_cbs: 1341765 > > > >>>>>> TREE04 ------- 148876 GPs (41.3544/s) [rcu: g951744 f0x0 total-gps=238225] n_max_cbs: 236765 > > > >>>>>> TREE05 ------- 220092 GPs (61.1367/s) [rcu: g1234385 f0x0 total-gps=308880] n_max_cbs: 82801 > > > >>>>>> TREE07 ------- 34678 GPs (9.63278/s) [rcu: g207257 f0x0 total-gps=52094] > > > >>>>>> TREE09 ------- 341706 GPs (94.9183/s) [rcu: g7693569 f0x0 total-gps=1923688] n_max_cbs: 1845334 > > > >>>>>> --- Done at Mon Jan 27 11:49:55 PM EST 2025 (4:41:24) exitcode 0 > > > >>>>> > > > >>>>> Very good! > > > >>>>> > > > >>>>> How would you go about analyzing whether this is really safe vs. getting > > > >>>>> just getting lucky and not having provoked an overflow? > > > >>>> > > > >>>> I would probably add a more specific test case stressing the API, or > > > >>>> even a unit test of just the API by passing a range of sequences.. I > > > >>>> should go ahead and do that but it sounds like you feel there is an > > > >>>> issue with the patch? :) > > > >>> > > > >>> 2^31 (let alone 2^63) is a very large number of grace periods, and > > > >>> so it is hard to test grace-period sequence-number wrap. > > > >>> > > > >>> Not impossible, though... > > > >> > > > >> We could test a decent number of candidate sequences to cover > > > >> different cases. Not ideal like bruteforcing, but... Another idea is > > > >> to hardcode/assume ULONG_MAX as 16-bit in a unit test. > > > > > > > > Or put the various sequence numbers into an unsigned short or even > > > > an unsigned char. > > > > > > > > One set of use cases checks to see if a given CPU's ->gp_seq has fallen > > > > too far behind the current grace period, and sets a flag to alert > > > > that CPU. Others rely on a false negative being functionally OK. > > > > > > > > Or so I believe. ;-) > > > > > > Thanks, I am itching to create a visualization of all eight bit combinations and the output of both API, which will be a fun exercise however I’m missing something fundamental because as I mentioned in that 100 and 200 example, the API itself cannot distinguish between a wraparound and a legitimate delay in comparison between start and a delayed end. I need to understand this better and go through the code more. ;-/ > > > > The big questions are "under what conditions does it need to distinguish, > > and what are the consequences of failing to get this right?" Also, "what > > is the purpose of ->gpwrap?". > > My understanding is we take a snapshot of a sequence and then check at > later time if it was reached. Ok I shall explore these questions. > Meanwhile I created the following visualization using 5-bit numbers: > > https://drive.google.com/file/d/1w2sgJga7B5dye4iH0oPZ79obaKO-e_Jn/view?usp=sharing > > I find as we expect, that for rcu_seq_done(), as long as the distance > between the "running sequence" and the "target" is ULONG_MAX/2, it > returns correct results. OTHERWISE, it can return false results. So > for target value of 28 (in 5-bit world), only initial running values > from 12 would return FALSE. Before number 12, everything is TRUE. This > can cause both false-positives and false-negatives depending on the > input, however it is possible that are no users who use it causing > false-positives! So I guess it really depends on user. Your last sentence is extremely important. You need to evaluate the consequences of false positives and false negatives on a use-case-by-use-case basis. Which first requires enumerating the use cases. > False positives: > If the initial value is ULONG_MAX/2 away from the target, then > rcu_seq_done() can return TRUE when *no wraparound* happened. What exactly is the initial value and the target? My guess is that the initial value is what was returned from rcu_seq_snap(), but that is just a guess. My guess is that the target value is the value of the underlying sequence number at the time that rcu_seq_done() is invoked, but again, that is just a guess. Similar question for "away from the target", and similar guess. This might sound picky, but if you think that *I* am picky, just try the actual RCU code. ;-) > False negatives: > The initial value of the snapshot was *within* the ULONG_MAX/2 > distance from target. A full *wrap around then happened*, and we ended > where we were initially, so rcu_seq_done() should return TRUE but it > returns FALSE because it doesn't know about the wrap. By "where we were initially", one might reasonably guess that you meant that the initial value and the target are one and the same. But I suspect that you are thinking of a third value, the value of the underlying grace-period sequence number at the time of the call to rcu_seq_snap(). But you might be thinking of something else entirely. > Now we move rcu_seq_done_exact. It has the exact same behavior except > that instead of ULONG_MAX/2, the above situations are shrunk to about > 10 counts from the target. So if target is 28, then the initial > sequence should have been at least 18 to avoid false-positive, but > again it is possible and I certain see in the code that it cannot be > used this way.. (at least so far).. So all we are left with is the > false-negative band of ~2.5 GPs.. Here, I have the same questions. As you no doubt guessed. > About "what are the consequences of failing to get this right". I > believe false-positive is unacceptable unless it is maybe debug code - > that can break functionality in code, too short GPs and all that. > However false-negative is OK as long as the usecase can retry later > and can afford to wait. Did I get that much correct? Maybe? Please look at this on a per-use-case basis. Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-01-30 5:56 ` Paul E. McKenney @ 2025-02-04 15:44 ` Joel Fernandes 2025-02-04 20:22 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-02-04 15:44 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > By "where we were initially", one might reasonably guess that you meant > that the initial value and the target are one and the same. But I suspect > that you are thinking of a third value, the value of the underlying > grace-period sequence number at the time of the call to rcu_seq_snap(). > But you might be thinking of something else entirely. > > > Now we move rcu_seq_done_exact. It has the exact same behavior except > > that instead of ULONG_MAX/2, the above situations are shrunk to about > > 10 counts from the target. So if target is 28, then the initial > > sequence should have been at least 18 to avoid false-positive, but > > again it is possible and I certain see in the code that it cannot be > > used this way.. (at least so far).. So all we are left with is the > > false-negative band of ~2.5 GPs.. > > Here, I have the same questions. As you no doubt guessed. > > > About "what are the consequences of failing to get this right". I > > believe false-positive is unacceptable unless it is maybe debug code - > > that can break functionality in code, too short GPs and all that. > > However false-negative is OK as long as the usecase can retry later > > and can afford to wait. Did I get that much correct? > > Maybe? > > Please look at this on a per-use-case basis. Sure. FWIW, I started a world-editable document here. I am planning to work on this a bit more before our meeting this week. If others knowledgeable like Frederic and others could make edits/comments, that'd be welcomed. But I basically summarized this whole thread here: https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing My thought is (AFAICS), this patch is still valid and rcu_seq_done_exact is a potentially better replacement to rcu_seq_done. But famous last words... thanks, - Joel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-02-04 15:44 ` Joel Fernandes @ 2025-02-04 20:22 ` Paul E. McKenney 2025-02-05 1:28 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2025-02-04 20:22 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Feb 04, 2025 at 10:44:45AM -0500, Joel Fernandes wrote: > On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > By "where we were initially", one might reasonably guess that you meant > > that the initial value and the target are one and the same. But I suspect > > that you are thinking of a third value, the value of the underlying > > grace-period sequence number at the time of the call to rcu_seq_snap(). > > But you might be thinking of something else entirely. > > > > > Now we move rcu_seq_done_exact. It has the exact same behavior except > > > that instead of ULONG_MAX/2, the above situations are shrunk to about > > > 10 counts from the target. So if target is 28, then the initial > > > sequence should have been at least 18 to avoid false-positive, but > > > again it is possible and I certain see in the code that it cannot be > > > used this way.. (at least so far).. So all we are left with is the > > > false-negative band of ~2.5 GPs.. > > > > Here, I have the same questions. As you no doubt guessed. > > > > > About "what are the consequences of failing to get this right". I > > > believe false-positive is unacceptable unless it is maybe debug code - > > > that can break functionality in code, too short GPs and all that. > > > However false-negative is OK as long as the usecase can retry later > > > and can afford to wait. Did I get that much correct? > > > > Maybe? > > > > Please look at this on a per-use-case basis. > > Sure. FWIW, I started a world-editable document here. I am planning to > work on this a bit more before our meeting this week. If others > knowledgeable like Frederic and others could make edits/comments, > that'd be welcomed. But I basically summarized this whole thread here: > https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing > > My thought is (AFAICS), this patch is still valid and > rcu_seq_done_exact is a potentially better replacement to > rcu_seq_done. But famous last words... Let's start with the use case where the sequence number is being used by rcu_barrier(). What happens? Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-02-04 20:22 ` Paul E. McKenney @ 2025-02-05 1:28 ` Joel Fernandes 2025-02-05 10:28 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-02-05 1:28 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Feb 4, 2025 at 3:22 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Feb 04, 2025 at 10:44:45AM -0500, Joel Fernandes wrote: > > On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > By "where we were initially", one might reasonably guess that you meant > > > that the initial value and the target are one and the same. But I suspect > > > that you are thinking of a third value, the value of the underlying > > > grace-period sequence number at the time of the call to rcu_seq_snap(). > > > But you might be thinking of something else entirely. > > > > > > > Now we move rcu_seq_done_exact. It has the exact same behavior except > > > > that instead of ULONG_MAX/2, the above situations are shrunk to about > > > > 10 counts from the target. So if target is 28, then the initial > > > > sequence should have been at least 18 to avoid false-positive, but > > > > again it is possible and I certain see in the code that it cannot be > > > > used this way.. (at least so far).. So all we are left with is the > > > > false-negative band of ~2.5 GPs.. > > > > > > Here, I have the same questions. As you no doubt guessed. > > > > > > > About "what are the consequences of failing to get this right". I > > > > believe false-positive is unacceptable unless it is maybe debug code - > > > > that can break functionality in code, too short GPs and all that. > > > > However false-negative is OK as long as the usecase can retry later > > > > and can afford to wait. Did I get that much correct? > > > > > > Maybe? > > > > > > Please look at this on a per-use-case basis. > > > > Sure. FWIW, I started a world-editable document here. I am planning to > > work on this a bit more before our meeting this week. If others > > knowledgeable like Frederic and others could make edits/comments, > > that'd be welcomed. But I basically summarized this whole thread here: > > https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing > > > > My thought is (AFAICS), this patch is still valid and > > rcu_seq_done_exact is a potentially better replacement to > > rcu_seq_done. But famous last words... > > Let's start with the use case where the sequence number is being used > by rcu_barrier(). What happens? The rcu_state.barrier_sequence is used to keep track of barrier in progress, as there can be concurrent rcu_barrier() calls happening so we optimize by batching multiples of these calls. It also handles the case where the entrain IPI came through but no rcu_barrier() is currently in-flight (which would be weird but maybe it happens somehow). It is also used to make sure we don't need to entrain on a CPU again if we already did an entraining for a certain rcu_barrier() invocation already. This is tracked by rdp->barrier_seq_snap which copies the value of rcu_state.barrier_sequence after the entraining. Al though I am currently unable to envision why the IPI handler would be called unnecessarily (hardware bug?) since we hold the rcu_barrier mutex throughout the rcu_barrier() function, including while waiting on CPUs. Going back to my first point, for the rcu_seq_done() in rcu_barrier(), we are checking if someone else did our work: mutex_lock(&rcu_state.barrier_mutex); /* Did someone else do our work for us? */ if (rcu_seq_done(&rcu_state.barrier_sequence, s)) Here I think rcu_seq_done_exact() is better, because if too many rcu_barrier() calls concurrently made at the same time, the mutex may be contended. If enough time passes and we have a wrap around, we might end up in false-negative land which is much larger with rcu_seq_done() than ..done_exact(). However, that does seem to be a bug to begin with if the mutex contended for that long. OTOH, doing an rcu_barrier() again when we didn't need to doesn't seem like the end of the world especially if we contended on the mutex for that long. But at least it shows that rcu_seq_done_exact() cannot seem to hurt here. Hmmm, - Joel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-02-05 1:28 ` Joel Fernandes @ 2025-02-05 10:28 ` Paul E. McKenney 2025-02-05 15:45 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2025-02-05 10:28 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Tue, Feb 04, 2025 at 08:28:29PM -0500, Joel Fernandes wrote: > On Tue, Feb 4, 2025 at 3:22 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Feb 04, 2025 at 10:44:45AM -0500, Joel Fernandes wrote: > > > On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > By "where we were initially", one might reasonably guess that you meant > > > > that the initial value and the target are one and the same. But I suspect > > > > that you are thinking of a third value, the value of the underlying > > > > grace-period sequence number at the time of the call to rcu_seq_snap(). > > > > But you might be thinking of something else entirely. > > > > > > > > > Now we move rcu_seq_done_exact. It has the exact same behavior except > > > > > that instead of ULONG_MAX/2, the above situations are shrunk to about > > > > > 10 counts from the target. So if target is 28, then the initial > > > > > sequence should have been at least 18 to avoid false-positive, but > > > > > again it is possible and I certain see in the code that it cannot be > > > > > used this way.. (at least so far).. So all we are left with is the > > > > > false-negative band of ~2.5 GPs.. > > > > > > > > Here, I have the same questions. As you no doubt guessed. > > > > > > > > > About "what are the consequences of failing to get this right". I > > > > > believe false-positive is unacceptable unless it is maybe debug code - > > > > > that can break functionality in code, too short GPs and all that. > > > > > However false-negative is OK as long as the usecase can retry later > > > > > and can afford to wait. Did I get that much correct? > > > > > > > > Maybe? > > > > > > > > Please look at this on a per-use-case basis. > > > > > > Sure. FWIW, I started a world-editable document here. I am planning to > > > work on this a bit more before our meeting this week. If others > > > knowledgeable like Frederic and others could make edits/comments, > > > that'd be welcomed. But I basically summarized this whole thread here: > > > https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing > > > > > > My thought is (AFAICS), this patch is still valid and > > > rcu_seq_done_exact is a potentially better replacement to > > > rcu_seq_done. But famous last words... > > > > Let's start with the use case where the sequence number is being used > > by rcu_barrier(). What happens? > > The rcu_state.barrier_sequence is used to keep track of barrier in > progress, as there can be concurrent rcu_barrier() calls happening so > we optimize by batching multiples of these calls. > > It also handles the case where the entrain IPI came through but no > rcu_barrier() is currently in-flight (which would be weird but maybe > it happens somehow). > > It is also used to make sure we don't need to entrain on a CPU again > if we already did an entraining for a certain rcu_barrier() invocation > already. This is tracked by rdp->barrier_seq_snap which copies the > value of rcu_state.barrier_sequence after the entraining. Al though I > am currently unable to envision why the IPI handler would be called > unnecessarily (hardware bug?) since we hold the rcu_barrier mutex > throughout the rcu_barrier() function, including while waiting on > CPUs. > > Going back to my first point, for the rcu_seq_done() in rcu_barrier(), > we are checking if someone else did our work: > mutex_lock(&rcu_state.barrier_mutex); > /* Did someone else do our work for us? */ > if (rcu_seq_done(&rcu_state.barrier_sequence, s)) > > Here I think rcu_seq_done_exact() is better, because if too many > rcu_barrier() calls concurrently made at the same time, the mutex may > be contended. If enough time passes and we have a wrap around, we > might end up in false-negative land which is much larger with > rcu_seq_done() than ..done_exact(). However, that does seem to be a > bug to begin with if the mutex contended for that long. OTOH, doing an > rcu_barrier() again when we didn't need to doesn't seem like the end > of the world especially if we contended on the mutex for that long. > But at least it shows that rcu_seq_done_exact() cannot seem to hurt > here. > > Hmmm, So... Do we really care about the difference between rcu_seq_done() and rcu_seq_done_exact() in the rcu_barrier() case? Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-02-05 10:28 ` Paul E. McKenney @ 2025-02-05 15:45 ` Joel Fernandes 2025-02-05 15:56 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Joel Fernandes @ 2025-02-05 15:45 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Wed, Feb 5, 2025 at 5:28 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Tue, Feb 04, 2025 at 08:28:29PM -0500, Joel Fernandes wrote: > > On Tue, Feb 4, 2025 at 3:22 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Feb 04, 2025 at 10:44:45AM -0500, Joel Fernandes wrote: > > > > On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > By "where we were initially", one might reasonably guess that you meant > > > > > that the initial value and the target are one and the same. But I suspect > > > > > that you are thinking of a third value, the value of the underlying > > > > > grace-period sequence number at the time of the call to rcu_seq_snap(). > > > > > But you might be thinking of something else entirely. > > > > > > > > > > > Now we move rcu_seq_done_exact. It has the exact same behavior except > > > > > > that instead of ULONG_MAX/2, the above situations are shrunk to about > > > > > > 10 counts from the target. So if target is 28, then the initial > > > > > > sequence should have been at least 18 to avoid false-positive, but > > > > > > again it is possible and I certain see in the code that it cannot be > > > > > > used this way.. (at least so far).. So all we are left with is the > > > > > > false-negative band of ~2.5 GPs.. > > > > > > > > > > Here, I have the same questions. As you no doubt guessed. > > > > > > > > > > > About "what are the consequences of failing to get this right". I > > > > > > believe false-positive is unacceptable unless it is maybe debug code - > > > > > > that can break functionality in code, too short GPs and all that. > > > > > > However false-negative is OK as long as the usecase can retry later > > > > > > and can afford to wait. Did I get that much correct? > > > > > > > > > > Maybe? > > > > > > > > > > Please look at this on a per-use-case basis. > > > > > > > > Sure. FWIW, I started a world-editable document here. I am planning to > > > > work on this a bit more before our meeting this week. If others > > > > knowledgeable like Frederic and others could make edits/comments, > > > > that'd be welcomed. But I basically summarized this whole thread here: > > > > https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing > > > > > > > > My thought is (AFAICS), this patch is still valid and > > > > rcu_seq_done_exact is a potentially better replacement to > > > > rcu_seq_done. But famous last words... > > > > > > Let's start with the use case where the sequence number is being used > > > by rcu_barrier(). What happens? > > > > The rcu_state.barrier_sequence is used to keep track of barrier in > > progress, as there can be concurrent rcu_barrier() calls happening so > > we optimize by batching multiples of these calls. > > > > It also handles the case where the entrain IPI came through but no > > rcu_barrier() is currently in-flight (which would be weird but maybe > > it happens somehow). > > > > It is also used to make sure we don't need to entrain on a CPU again > > if we already did an entraining for a certain rcu_barrier() invocation > > already. This is tracked by rdp->barrier_seq_snap which copies the > > value of rcu_state.barrier_sequence after the entraining. Al though I > > am currently unable to envision why the IPI handler would be called > > unnecessarily (hardware bug?) since we hold the rcu_barrier mutex > > throughout the rcu_barrier() function, including while waiting on > > CPUs. > > > > Going back to my first point, for the rcu_seq_done() in rcu_barrier(), > > we are checking if someone else did our work: > > mutex_lock(&rcu_state.barrier_mutex); > > /* Did someone else do our work for us? */ > > if (rcu_seq_done(&rcu_state.barrier_sequence, s)) > > > > Here I think rcu_seq_done_exact() is better, because if too many > > rcu_barrier() calls concurrently made at the same time, the mutex may > > be contended. If enough time passes and we have a wrap around, we > > might end up in false-negative land which is much larger with > > rcu_seq_done() than ..done_exact(). However, that does seem to be a > > bug to begin with if the mutex contended for that long. OTOH, doing an > > rcu_barrier() again when we didn't need to doesn't seem like the end > > of the world especially if we contended on the mutex for that long. > > But at least it shows that rcu_seq_done_exact() cannot seem to hurt > > here. > > > > Hmmm, > > So... Do we really care about the difference between rcu_seq_done() > and rcu_seq_done_exact() in the rcu_barrier() case? In so far as to get rid of rcu_seq_done() API in favor of using rcu_seq_done_exact() but should not cause any functional or robustness changes to rcu_barrier() at least, AFAICS. thanks, - Joel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-02-05 15:45 ` Joel Fernandes @ 2025-02-05 15:56 ` Paul E. McKenney 2025-02-05 15:59 ` Joel Fernandes 0 siblings, 1 reply; 20+ messages in thread From: Paul E. McKenney @ 2025-02-05 15:56 UTC (permalink / raw) To: Joel Fernandes Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Wed, Feb 05, 2025 at 10:45:08AM -0500, Joel Fernandes wrote: > On Wed, Feb 5, 2025 at 5:28 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > On Tue, Feb 04, 2025 at 08:28:29PM -0500, Joel Fernandes wrote: > > > On Tue, Feb 4, 2025 at 3:22 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > On Tue, Feb 04, 2025 at 10:44:45AM -0500, Joel Fernandes wrote: > > > > > On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > By "where we were initially", one might reasonably guess that you meant > > > > > > that the initial value and the target are one and the same. But I suspect > > > > > > that you are thinking of a third value, the value of the underlying > > > > > > grace-period sequence number at the time of the call to rcu_seq_snap(). > > > > > > But you might be thinking of something else entirely. > > > > > > > > > > > > > Now we move rcu_seq_done_exact. It has the exact same behavior except > > > > > > > that instead of ULONG_MAX/2, the above situations are shrunk to about > > > > > > > 10 counts from the target. So if target is 28, then the initial > > > > > > > sequence should have been at least 18 to avoid false-positive, but > > > > > > > again it is possible and I certain see in the code that it cannot be > > > > > > > used this way.. (at least so far).. So all we are left with is the > > > > > > > false-negative band of ~2.5 GPs.. > > > > > > > > > > > > Here, I have the same questions. As you no doubt guessed. > > > > > > > > > > > > > About "what are the consequences of failing to get this right". I > > > > > > > believe false-positive is unacceptable unless it is maybe debug code - > > > > > > > that can break functionality in code, too short GPs and all that. > > > > > > > However false-negative is OK as long as the usecase can retry later > > > > > > > and can afford to wait. Did I get that much correct? > > > > > > > > > > > > Maybe? > > > > > > > > > > > > Please look at this on a per-use-case basis. > > > > > > > > > > Sure. FWIW, I started a world-editable document here. I am planning to > > > > > work on this a bit more before our meeting this week. If others > > > > > knowledgeable like Frederic and others could make edits/comments, > > > > > that'd be welcomed. But I basically summarized this whole thread here: > > > > > https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing > > > > > > > > > > My thought is (AFAICS), this patch is still valid and > > > > > rcu_seq_done_exact is a potentially better replacement to > > > > > rcu_seq_done. But famous last words... > > > > > > > > Let's start with the use case where the sequence number is being used > > > > by rcu_barrier(). What happens? > > > > > > The rcu_state.barrier_sequence is used to keep track of barrier in > > > progress, as there can be concurrent rcu_barrier() calls happening so > > > we optimize by batching multiples of these calls. > > > > > > It also handles the case where the entrain IPI came through but no > > > rcu_barrier() is currently in-flight (which would be weird but maybe > > > it happens somehow). > > > > > > It is also used to make sure we don't need to entrain on a CPU again > > > if we already did an entraining for a certain rcu_barrier() invocation > > > already. This is tracked by rdp->barrier_seq_snap which copies the > > > value of rcu_state.barrier_sequence after the entraining. Al though I > > > am currently unable to envision why the IPI handler would be called > > > unnecessarily (hardware bug?) since we hold the rcu_barrier mutex > > > throughout the rcu_barrier() function, including while waiting on > > > CPUs. > > > > > > Going back to my first point, for the rcu_seq_done() in rcu_barrier(), > > > we are checking if someone else did our work: > > > mutex_lock(&rcu_state.barrier_mutex); > > > /* Did someone else do our work for us? */ > > > if (rcu_seq_done(&rcu_state.barrier_sequence, s)) > > > > > > Here I think rcu_seq_done_exact() is better, because if too many > > > rcu_barrier() calls concurrently made at the same time, the mutex may > > > be contended. If enough time passes and we have a wrap around, we > > > might end up in false-negative land which is much larger with > > > rcu_seq_done() than ..done_exact(). However, that does seem to be a > > > bug to begin with if the mutex contended for that long. OTOH, doing an > > > rcu_barrier() again when we didn't need to doesn't seem like the end > > > of the world especially if we contended on the mutex for that long. > > > But at least it shows that rcu_seq_done_exact() cannot seem to hurt > > > here. > > > > > > Hmmm, > > > > So... Do we really care about the difference between rcu_seq_done() > > and rcu_seq_done_exact() in the rcu_barrier() case? > > In so far as to get rid of rcu_seq_done() API in favor of using > rcu_seq_done_exact() but should not cause any functional or robustness > changes to rcu_barrier() at least, AFAICS. OK, *that* I might be able to get behind. ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() 2025-02-05 15:56 ` Paul E. McKenney @ 2025-02-05 15:59 ` Joel Fernandes 0 siblings, 0 replies; 20+ messages in thread From: Joel Fernandes @ 2025-02-05 15:59 UTC (permalink / raw) To: paulmck Cc: linux-kernel, Frederic Weisbecker, Neeraj Upadhyay, Josh Triplett, Boqun Feng, Uladzislau Rezki, Steven Rostedt, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu On Wed, Feb 5, 2025 at 10:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > On Wed, Feb 05, 2025 at 10:45:08AM -0500, Joel Fernandes wrote: > > On Wed, Feb 5, 2025 at 5:28 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > On Tue, Feb 04, 2025 at 08:28:29PM -0500, Joel Fernandes wrote: > > > > On Tue, Feb 4, 2025 at 3:22 PM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > > > > On Tue, Feb 04, 2025 at 10:44:45AM -0500, Joel Fernandes wrote: > > > > > > On Thu, Jan 30, 2025 at 12:56 AM Paul E. McKenney <paulmck@kernel.org> wrote: > > > > > > > By "where we were initially", one might reasonably guess that you meant > > > > > > > that the initial value and the target are one and the same. But I suspect > > > > > > > that you are thinking of a third value, the value of the underlying > > > > > > > grace-period sequence number at the time of the call to rcu_seq_snap(). > > > > > > > But you might be thinking of something else entirely. > > > > > > > > > > > > > > > Now we move rcu_seq_done_exact. It has the exact same behavior except > > > > > > > > that instead of ULONG_MAX/2, the above situations are shrunk to about > > > > > > > > 10 counts from the target. So if target is 28, then the initial > > > > > > > > sequence should have been at least 18 to avoid false-positive, but > > > > > > > > again it is possible and I certain see in the code that it cannot be > > > > > > > > used this way.. (at least so far).. So all we are left with is the > > > > > > > > false-negative band of ~2.5 GPs.. > > > > > > > > > > > > > > Here, I have the same questions. As you no doubt guessed. > > > > > > > > > > > > > > > About "what are the consequences of failing to get this right". I > > > > > > > > believe false-positive is unacceptable unless it is maybe debug code - > > > > > > > > that can break functionality in code, too short GPs and all that. > > > > > > > > However false-negative is OK as long as the usecase can retry later > > > > > > > > and can afford to wait. Did I get that much correct? > > > > > > > > > > > > > > Maybe? > > > > > > > > > > > > > > Please look at this on a per-use-case basis. > > > > > > > > > > > > Sure. FWIW, I started a world-editable document here. I am planning to > > > > > > work on this a bit more before our meeting this week. If others > > > > > > knowledgeable like Frederic and others could make edits/comments, > > > > > > that'd be welcomed. But I basically summarized this whole thread here: > > > > > > https://docs.google.com/document/d/1sFNuGH4U6DFCE8sunbdWMjFhJhb1aG4goOJAnS6c6gQ/edit?usp=sharing > > > > > > > > > > > > My thought is (AFAICS), this patch is still valid and > > > > > > rcu_seq_done_exact is a potentially better replacement to > > > > > > rcu_seq_done. But famous last words... > > > > > > > > > > Let's start with the use case where the sequence number is being used > > > > > by rcu_barrier(). What happens? > > > > > > > > The rcu_state.barrier_sequence is used to keep track of barrier in > > > > progress, as there can be concurrent rcu_barrier() calls happening so > > > > we optimize by batching multiples of these calls. > > > > > > > > It also handles the case where the entrain IPI came through but no > > > > rcu_barrier() is currently in-flight (which would be weird but maybe > > > > it happens somehow). > > > > > > > > It is also used to make sure we don't need to entrain on a CPU again > > > > if we already did an entraining for a certain rcu_barrier() invocation > > > > already. This is tracked by rdp->barrier_seq_snap which copies the > > > > value of rcu_state.barrier_sequence after the entraining. Al though I > > > > am currently unable to envision why the IPI handler would be called > > > > unnecessarily (hardware bug?) since we hold the rcu_barrier mutex > > > > throughout the rcu_barrier() function, including while waiting on > > > > CPUs. > > > > > > > > Going back to my first point, for the rcu_seq_done() in rcu_barrier(), > > > > we are checking if someone else did our work: > > > > mutex_lock(&rcu_state.barrier_mutex); > > > > /* Did someone else do our work for us? */ > > > > if (rcu_seq_done(&rcu_state.barrier_sequence, s)) > > > > > > > > Here I think rcu_seq_done_exact() is better, because if too many > > > > rcu_barrier() calls concurrently made at the same time, the mutex may > > > > be contended. If enough time passes and we have a wrap around, we > > > > might end up in false-negative land which is much larger with > > > > rcu_seq_done() than ..done_exact(). However, that does seem to be a > > > > bug to begin with if the mutex contended for that long. OTOH, doing an > > > > rcu_barrier() again when we didn't need to doesn't seem like the end > > > > of the world especially if we contended on the mutex for that long. > > > > But at least it shows that rcu_seq_done_exact() cannot seem to hurt > > > > here. > > > > > > > > Hmmm, > > > > > > So... Do we really care about the difference between rcu_seq_done() > > > and rcu_seq_done_exact() in the rcu_barrier() case? > > > > In so far as to get rid of rcu_seq_done() API in favor of using > > rcu_seq_done_exact() but should not cause any functional or robustness > > changes to rcu_barrier() at least, AFAICS. > > OK, *that* I might be able to get behind. ;-) Great! So I guess I will continue down the rabbit hole then with the other users and cross my T's / dot my I's for this patch. Very least, I am (re-)learning about all these use cases so thank you! ;-) - Joel ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2025-02-05 15:59 UTC | newest] Thread overview: 20+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-28 0:07 [PATCH] rcu: Merge rcu_seq_done_exact() logic into rcu_seq_done() Joel Fernandes (Google) 2025-01-28 0:09 ` Joel Fernandes 2025-01-29 1:22 ` Joel Fernandes 2025-01-29 1:33 ` Paul E. McKenney 2025-01-29 1:38 ` Joel Fernandes 2025-01-29 1:47 ` Paul E. McKenney 2025-01-29 2:11 ` Joel Fernandes 2025-01-29 2:13 ` Joel Fernandes 2025-01-29 2:21 ` Paul E. McKenney 2025-01-29 11:34 ` Joel Fernandes 2025-01-29 17:25 ` Paul E. McKenney 2025-01-29 23:10 ` Joel Fernandes 2025-01-30 5:56 ` Paul E. McKenney 2025-02-04 15:44 ` Joel Fernandes 2025-02-04 20:22 ` Paul E. McKenney 2025-02-05 1:28 ` Joel Fernandes 2025-02-05 10:28 ` Paul E. McKenney 2025-02-05 15:45 ` Joel Fernandes 2025-02-05 15:56 ` Paul E. McKenney 2025-02-05 15:59 ` Joel Fernandes
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.