Linux RCU subsystem development
* [RFC] jiffies_till_first_fqs off by 1
@ 2025-12-23 17:38 Joel Fernandes
  2025-12-23 23:53 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-12-23 17:38 UTC (permalink / raw)
  To: rcu
  Cc: Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang, rcu

Hello,

While studying some synchronize_rcu() latencies, I found that the
jiffies_till_first_fqs value passed to the timer tick subsystem is always
off by one. This is a natural consequence of calc_index() rounding up.

For example, jiffies_till_first_fqs=3 means the "jiffies till first FQS" delay
is actually 4ms. The same holds for the next FQS. In fact, testing shows it can
never be 3ms for HZ=1000, and in rare cases it goes to 5ms, probably due to
interrupts.

Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
before passing it to the wait APIs.

But before sending a patch, I wanted to get everyone's thoughts.
Consider this the RFC.

The other place I found this is when call_rcu_hurry() is called and the GP
thread takes a tick to wake up. This isn't related to the timer per se; it
is just that we don't want to wake the GP thread too often, so we wait for
the next tick to notice callbacks before doing a wakeup.

Heh, and this means synchronize_rcu() latencies will multiply when HZ < 1000. I
wonder if this is also what caused Uladzislau to investigate it for mobile devices.
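
A simplified userspace model of the rounding and the HZ scaling described
above may help. It is only a sketch under stated assumptions: all function
names are hypothetical, and the model assumes a bucket with one-jiffy
granularity whose expiry is rounded up by one tick so that the timer can
never fire early. It is not the kernel's calc_index().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical model, not kernel code: a timer armed for `expires`
 * (in jiffies) is placed in the bucket for the following tick, so it
 * cannot fire before the requested time has fully elapsed.
 */
unsigned long model_bucket_expiry(unsigned long expires)
{
	return expires + 1;
}

/*
 * Ticks that elapse between arming a timer at tick `now` with a
 * requested delay of `delay` jiffies and the tick on which it fires.
 */
unsigned long model_wait_ticks(unsigned long now, unsigned long delay)
{
	return model_bucket_expiry(now + delay) - now;
}

/* Convert a number of ticks to milliseconds for a given HZ. */
unsigned long model_ticks_to_ms(unsigned long ticks, unsigned int hz)
{
	assert(hz > 0);
	return ticks * 1000UL / hz;
}

/* Print the modeled delay for one requested delay and one HZ value. */
void model_report(unsigned long delay, unsigned int hz)
{
	unsigned long ticks = model_wait_ticks(0, delay);

	printf("HZ=%u: requested %lu jiffies, fires after %lu ticks (%lu ms)\n",
	       hz, delay, ticks, model_ticks_to_ms(ticks, hz));
}
```

In this model a request of 3 jiffies fires after 4 ticks, which is 4 ms at
HZ=1000, 16 ms at HZ=250, and 40 ms at HZ=100. Calling model_report(3, 1000)
from a main() prints the HZ=1000 case.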

 - Joel



* Re: [RFC] jiffies_till_first_fqs off by 1
  2025-12-23 17:38 [RFC] jiffies_till_first_fqs off by 1 Joel Fernandes
@ 2025-12-23 23:53 ` Paul E. McKenney
  2025-12-24  2:06   ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2025-12-23 23:53 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
> Hello,
> 
> During studying some synchronize_rcu() latencies, I found that the
> jiffies_till_first_fqs value passed to the timer tick subsystem does is always
> off by one. This is natural due to calc_index() rounding up.
> 
> For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
> is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
> never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
> to interrupts.
> 
> Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
> before passing it to the wait APIs.
> 
> But before I wanted to send a patch, I wanted to get everyone's thoughts.
> Considering this the RFC.

Inadvertent passing of the value zero?

> The other place I found this was when call_rcu_hurry() is called, but the GP
> thread takes a tick to wake up, but this isn't related to the timer per-se, it
> is just that we don't want to wake the GP thread too often. So we just wait for
> the next tick to notice callbacks before doing a wakeup.
> 
> Heh, and this means synchronize_rcu() latencies will multiply when HZ < 1000. I
> wonder if this is also what caused Uladzislau to investigate it for mobile devices.

Quite possibly!

Back in the day, the theory was that lower HZ tended to imply less-capable
CPUs, and thus a need to lighten the load.  So there might need to be
some adjustment for present-day hardware.

							Thanx, Paul


* Re: [RFC] jiffies_till_first_fqs off by 1
  2025-12-23 23:53 ` Paul E. McKenney
@ 2025-12-24  2:06   ` Joel Fernandes
  2025-12-25 18:54     ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-12-24  2:06 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang

Hi Paul,

On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
> On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
> > During studying some synchronize_rcu() latencies, I found that the
> > jiffies_till_first_fqs value passed to the timer tick subsystem does is always
> > off by one. This is natural due to calc_index() rounding up.
> > 
> > For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
> > is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
> > never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
> > to interrupts.
> > 
> > Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
> > before passing it to the wait APIs.
> > 
> > But before I wanted to send a patch, I wanted to get everyone's thoughts.
> > Considering this the RFC.
> 
> Inadvertent passing of the value zero?

This should not be an issue because at the moment, even a value of
jiffies_till_first_fqs == 0 waits for ~1 jiffy due to schedule_timeout(0).

But you raise a good point: we should clamp the fqs parameters to a minimum
of 1 so that we don't pass negative values to schedule_timeout() when/if we
do the reduce-by-one approach.
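
For illustration, the reduce-by-one idea with a lower bound could look
roughly like the following. The helper name fqs_adjusted_wait() is
hypothetical, and this is a sketch of the clamping only, not a proposed
patch.

```c
#include <assert.h>

/*
 * Hypothetical helper: clamp the configured FQS delay to at least one
 * jiffy, then subtract one to compensate for the timer rounding up.
 * The result is never negative, so it is safe to hand to a wait API
 * that takes a timeout in jiffies.
 */
long fqs_adjusted_wait(long configured_jiffies)
{
	long j = configured_jiffies < 1 ? 1 : configured_jiffies;

	assert(j >= 1);
	return j - 1;
}
```

With this clamping, configured values of 0 and 1 both yield a timeout of 0,
and a configured value of 3 yields 2.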

thanks,

 - Joel



* Re: [RFC] jiffies_till_first_fqs off by 1
  2025-12-24  2:06   ` Joel Fernandes
@ 2025-12-25 18:54     ` Paul E. McKenney
  2025-12-26  2:15       ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2025-12-25 18:54 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Tue, Dec 23, 2025 at 09:06:19PM -0500, Joel Fernandes wrote:
> Hi Paul,
> 
> On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
> > > During studying some synchronize_rcu() latencies, I found that the
> > > jiffies_till_first_fqs value passed to the timer tick subsystem does is always
> > > off by one. This is natural due to calc_index() rounding up.
> > > 
> > > For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
> > > is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
> > > never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
> > > to interrupts.
> > > 
> > > Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
> > > before passing it to the wait APIs.
> > > 
> > > But before I wanted to send a patch, I wanted to get everyone's thoughts.
> > > Considering this the RFC.
> > 
> > Inadvertent passing of the value zero?
> 
> This should not be an issue because at the moment, even a value of
> jiffies_till_first_fqs == 0 waits for ~1 jiffie due to schedule_timeout(0).
> 
> But you raise a good point, we should cap the minimum allowed jiffie value
> for the fqs parameters to 1 so that we don't pass schedule_timeout() with
> negative values when/if we do the reduce-by-one approach.

There is a potential use case for jiffies_till_first_fqs=0 and no wait,
which would be systems that want to scan for idle CPUs immediately after
the grace period has been initialized.  Note the word "potential".  ;-)

If we want to support this, then perhaps we would need to avoid that
schedule_timeout(0).  Or rcu_gp_fqs_check_wake(), as the case may be.

							Thanx, Paul


* Re: [RFC] jiffies_till_first_fqs off by 1
  2025-12-25 18:54     ` Paul E. McKenney
@ 2025-12-26  2:15       ` Joel Fernandes
  2026-01-01 22:24         ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2025-12-26  2:15 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: rcu, Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Thu, Dec 25, 2025 at 10:54:20AM -0800, Paul E. McKenney wrote:
> On Tue, Dec 23, 2025 at 09:06:19PM -0500, Joel Fernandes wrote:
> > Hi Paul,
> > 
> > On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
> > > On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
> > > > During studying some synchronize_rcu() latencies, I found that the
> > > > jiffies_till_first_fqs value passed to the timer tick subsystem does is always
> > > > off by one. This is natural due to calc_index() rounding up.
> > > > 
> > > > For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
> > > > is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
> > > > never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
> > > > to interrupts.
> > > > 
> > > > Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
> > > > before passing it to the wait APIs.
> > > > 
> > > > But before I wanted to send a patch, I wanted to get everyone's thoughts.
> > > > Considering this the RFC.
> > > 
> > > Inadvertent passing of the value zero?
> > 
> > This should not be an issue because at the moment, even a value of
> > jiffies_till_first_fqs == 0 waits for ~1 jiffie due to schedule_timeout(0).
> > 
> > But you raise a good point, we should cap the minimum allowed jiffie value
> > for the fqs parameters to 1 so that we don't pass schedule_timeout() with
> > negative values when/if we do the reduce-by-one approach.
> 
> There is a potential use case for jiffies_till_first_fqs=0 and no wait,
> which would be systems that want to scan for idle CPUs immediately after
> the grace period has been initialized.  Note the word "potential".  ;-)

Sure, we could add support for that, but it would be new behavior that is
not in the existing code.

So I think jiffies_till_first_fqs=0 is not 'working as intended' today,
because it always waits.

So should we fix that too? Or maybe it can be a patch separate from this one
(that I can work on). I see no harm in allowing that mode; at least it
will be more in line with the expected outcome.

> 
> If we want to support this, then perhaps we would need to avoid that
> schedule_timeout(0).  Or rcu_gp_fqs_check_wake(), as the case may be.

True.

thanks,

 - Joel



* Re: [RFC] jiffies_till_first_fqs off by 1
  2025-12-26  2:15       ` Joel Fernandes
@ 2026-01-01 22:24         ` Paul E. McKenney
  2026-01-02  2:59           ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2026-01-01 22:24 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Thu, Dec 25, 2025 at 09:15:59PM -0500, Joel Fernandes wrote:
> On Thu, Dec 25, 2025 at 10:54:20AM -0800, Paul E. McKenney wrote:
> > On Tue, Dec 23, 2025 at 09:06:19PM -0500, Joel Fernandes wrote:
> > > Hi Paul,
> > > 
> > > On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
> > > > On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
> > > > > During studying some synchronize_rcu() latencies, I found that the
> > > > > jiffies_till_first_fqs value passed to the timer tick subsystem does is always
> > > > > off by one. This is natural due to calc_index() rounding up.
> > > > > 
> > > > > For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
> > > > > is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
> > > > > never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
> > > > > to interrupts.
> > > > > 
> > > > > Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
> > > > > before passing it to the wait APIs.
> > > > > 
> > > > > But before I wanted to send a patch, I wanted to get everyone's thoughts.
> > > > > Considering this the RFC.
> > > > 
> > > > Inadvertent passing of the value zero?
> > > 
> > > This should not be an issue because at the moment, even a value of
> > > jiffies_till_first_fqs == 0 waits for ~1 jiffie due to schedule_timeout(0).
> > > 
> > > But you raise a good point, we should cap the minimum allowed jiffie value
> > > for the fqs parameters to 1 so that we don't pass schedule_timeout() with
> > > negative values when/if we do the reduce-by-one approach.
> > 
> > There is a potential use case for jiffies_till_first_fqs=0 and no wait,
> > which would be systems that want to scan for idle CPUs immediately after
> > the grace period has been initialized.  Note the word "potential".  ;-)
> 
> Sure, we could add support for that but that would be new behavior that is
> not in the existing code.
> 
> So jiffies_till_first_fqs=0 today, I think it is not 'working as intended'
> because it will never not wait I think.

Agreed.

> So we should fix that too? Or maybe it can be a patch separate from this
> (that I can work on). I think no harming in allowing that mode, at least it
> will be more in line with the expected outcome.

Makes sense!  However, given that no one has complained, care is required.
Someone might be relying on the old behavior.  (In which case an easy
fix would be to make -1 be no waiting, though one might hope for a
better fix.)

							Thanx, Paul

> > If we want to support this, then perhaps we would need to avoid that
> > schedule_timeout(0).  Or rcu_gp_fqs_check_wake(), as the case may be.
> 
> True.
> 
> thanks,
> 
>  - Joel
> 


* Re: [RFC] jiffies_till_first_fqs off by 1
  2026-01-01 22:24         ` Paul E. McKenney
@ 2026-01-02  2:59           ` Joel Fernandes
  2026-01-02  3:41             ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2026-01-02  2:59 UTC (permalink / raw)
  To: paulmck
  Cc: rcu, Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang



On 1/1/2026 5:24 PM, Paul E. McKenney wrote:
> On Thu, Dec 25, 2025 at 09:15:59PM -0500, Joel Fernandes wrote:
>> On Thu, Dec 25, 2025 at 10:54:20AM -0800, Paul E. McKenney wrote:
>>> On Tue, Dec 23, 2025 at 09:06:19PM -0500, Joel Fernandes wrote:
>>>> Hi Paul,
>>>>
>>>> On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
>>>>> On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
>>>>>> During studying some synchronize_rcu() latencies, I found that the
>>>>>> jiffies_till_first_fqs value passed to the timer tick subsystem does is always
>>>>>> off by one. This is natural due to calc_index() rounding up.
>>>>>>
>>>>>> For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
>>>>>> is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
>>>>>> never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
>>>>>> to interrupts.
>>>>>>
>>>>>> Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
>>>>>> before passing it to the wait APIs.
>>>>>>
>>>>>> But before I wanted to send a patch, I wanted to get everyone's thoughts.
>>>>>> Considering this the RFC.
>>>>>
>>>>> Inadvertent passing of the value zero?
>>>>
>>>> This should not be an issue because at the moment, even a value of
>>>> jiffies_till_first_fqs == 0 waits for ~1 jiffie due to schedule_timeout(0).
>>>>
>>>> But you raise a good point, we should cap the minimum allowed jiffie value
>>>> for the fqs parameters to 1 so that we don't pass schedule_timeout() with
>>>> negative values when/if we do the reduce-by-one approach.
>>>
>>> There is a potential use case for jiffies_till_first_fqs=0 and no wait,
>>> which would be systems that want to scan for idle CPUs immediately after
>>> the grace period has been initialized.  Note the word "potential".  ;-)
>>
>> Sure, we could add support for that but that would be new behavior that is
>> not in the existing code.
>>
>> So jiffies_till_first_fqs=0 today, I think it is not 'working as intended'
>> because it will never not wait I think.
> 
> Agreed.
>
>> So we should fix that too? Or maybe it can be a patch separate from this
>> (that I can work on). I think no harming in allowing that mode, at least it
>> will be more in line with the expected outcome.
> 
> Makes sense!  However, given that no one has complained, care is required.
> Someone might be relying on the old behavior.  (In which case an easy
> fix would be to make -1 be no waiting, though one might hope for a
> better fix.)

Some further investigation revealed that the "1 jiffy error" is actually the
worst case. In the best case, it could still be closer to a jiffy. This is just
the nature of the timer wheel: since it snaps to a numerical TICK_NS boundary,
the rounding error that is intentionally added depends on how far into the tick
the timer for the wait was enqueued. If we took probability distributions, we
should land at a 1/2 jiffy error on average, though in practice I've seen about
a 3/4 jiffy error on average.

Given this, it would probably not make sense for us to do the -1 to adjust for
the error (since we don't clearly have bounds on the minimum error). We just
have to accept that we'd lose 1-2 extra jiffies per FQS loop iteration wait,
which is amplified if a grace period is already in progress. I've seen this add
up to 4 jiffies to back-to-back synchronize_rcu() latency even when there are no
readers in progress.
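
The 1/2 jiffy expectation can be illustrated with a small model. This is a
sketch, assuming the timer is armed at a uniformly distributed offset within
a tick and fires on the boundary one full tick past the requested expiry;
the function names are hypothetical, and the model does not explain the 3/4
jiffy figure seen in practice.

```c
#include <assert.h>

/*
 * Hypothetical model: a timer is armed `offset_ns` nanoseconds after
 * a tick boundary and fires on the boundary one full tick past the
 * requested expiry. The extra delay beyond the request is then the
 * remainder of the current tick.
 */
long model_extra_delay_ns(long offset_ns, long tick_ns)
{
	assert(tick_ns > 0 && offset_ns >= 0 && offset_ns < tick_ns);
	return tick_ns - offset_ns;
}

/*
 * Average extra delay over `samples` arming offsets spread evenly
 * across one tick (offset = i * tick_ns / samples).
 */
long model_mean_extra_delay_ns(long tick_ns, long samples)
{
	long long sum = 0;
	long i;

	assert(samples > 0);
	for (i = 0; i < samples; i++)
		sum += model_extra_delay_ns(i * tick_ns / samples, tick_ns);
	return (long)(sum / samples);
}
```

For a 1 ms tick (1000000 ns) sampled at 1000 evenly spaced offsets, the
modeled extra delay ranges from just above 0 to a full tick and averages
about half a tick.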

But I had to go down the rabbit hole and check... ;-)

thanks,

 - Joel



* Re: [RFC] jiffies_till_first_fqs off by 1
  2026-01-02  2:59           ` Joel Fernandes
@ 2026-01-02  3:41             ` Paul E. McKenney
  2026-01-02 17:58               ` Joel Fernandes
  0 siblings, 1 reply; 10+ messages in thread
From: Paul E. McKenney @ 2026-01-02  3:41 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: rcu, Steven Rostedt, linux-kernel, Davidlohr Bueso, Josh Triplett,
	Frederic Weisbecker, Neeraj Upadhyay, Boqun Feng,
	Uladzislau Rezki, Mathieu Desnoyers, Lai Jiangshan, Zqiang

On Thu, Jan 01, 2026 at 09:59:27PM -0500, Joel Fernandes wrote:
> 
> 
> On 1/1/2026 5:24 PM, Paul E. McKenney wrote:
> > On Thu, Dec 25, 2025 at 09:15:59PM -0500, Joel Fernandes wrote:
> >> On Thu, Dec 25, 2025 at 10:54:20AM -0800, Paul E. McKenney wrote:
> >>> On Tue, Dec 23, 2025 at 09:06:19PM -0500, Joel Fernandes wrote:
> >>>> Hi Paul,
> >>>>
> >>>> On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
> >>>>> On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
> >>>>>> During studying some synchronize_rcu() latencies, I found that the
> >>>>>> jiffies_till_first_fqs value passed to the timer tick subsystem does is always
> >>>>>> off by one. This is natural due to calc_index() rounding up.
> >>>>>>
> >>>>>> For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
> >>>>>> is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
> >>>>>> never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
> >>>>>> to interrupts.
> >>>>>>
> >>>>>> Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
> >>>>>> before passing it to the wait APIs.
> >>>>>>
> >>>>>> But before I wanted to send a patch, I wanted to get everyone's thoughts.
> >>>>>> Considering this the RFC.
> >>>>>
> >>>>> Inadvertent passing of the value zero?
> >>>>
> >>>> This should not be an issue because at the moment, even a value of
> >>>> jiffies_till_first_fqs == 0 waits for ~1 jiffie due to schedule_timeout(0).
> >>>>
> >>>> But you raise a good point, we should cap the minimum allowed jiffie value
> >>>> for the fqs parameters to 1 so that we don't pass schedule_timeout() with
> >>>> negative values when/if we do the reduce-by-one approach.
> >>>
> >>> There is a potential use case for jiffies_till_first_fqs=0 and no wait,
> >>> which would be systems that want to scan for idle CPUs immediately after
> >>> the grace period has been initialized.  Note the word "potential".  ;-)
> >>
> >> Sure, we could add support for that but that would be new behavior that is
> >> not in the existing code.
> >>
> >> So jiffies_till_first_fqs=0 today, I think it is not 'working as intended'
> >> because it will never not wait I think.
> > 
> > Agreed.
> >
> >> So we should fix that too? Or maybe it can be a patch separate from this
> >> (that I can work on). I think no harming in allowing that mode, at least it
> >> will be more in line with the expected outcome.
> > 
> > Makes sense!  However, given that no one has complained, care is required.
> > Someone might be relying on the old behavior.  (In which case an easy
> > fix would be to make -1 be no waiting, though one might hope for a
> > better fix.)
> Some further investigations revealed that the "1 jiffie error" is actually worst
> case. In the best case, it could still be closer to a jiffie. It is just the
> nature of the timer wheel, since it snaps to numerical TICK_NS boundary, the
> rounding error is intentionally added depending on how far along in the boundary
> was the timer for the wait enqueued. If we took probability distributions, we
> should be landing with a 1/2 jiffie error, though in practice I've seen it to be
> 3/4 jiffie error on average.
> 
> Given this, it would probably not make sense for us to do the -1 to adjust for
> the error (since we don't clearly have bounds on the minimum error). We just
> have to accept that we'd lose 1-2 extra jiffie per FQS loop iteration wait,
> which is amplified if a grace period is already in progress. I've seen this add
> upto 4 jiffies to back-to-back synchronize_rcu() latency even when there are no
> readers in progress.
>
> But I had to go down the rabbit hole and check... ;-)

I was thinking in terms of special-casing -1 to skip the sleep, but I
guess that there are as many ways to skin a rabbit as a cat.  ;-)
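
A rough sketch of what special-casing -1 might look like follows. The
sentinel FQS_NO_WAIT and the helper fqs_should_sleep() are hypothetical
names used only for illustration; the point is that 0 keeps its current
meaning (a timed wait is still issued) and only -1 skips the sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sentinel: -1 requests no wait before the first FQS scan. */
#define FQS_NO_WAIT (-1L)

/*
 * Hypothetical helper: decide whether to sleep and, if so, for how
 * many jiffies. Returns false only for the sentinel, in which case
 * *timeout is left untouched. Other negative values are treated as 0.
 */
bool fqs_should_sleep(long configured_jiffies, long *timeout)
{
	assert(timeout != NULL);

	if (configured_jiffies == FQS_NO_WAIT)
		return false;

	*timeout = configured_jiffies < 0 ? 0 : configured_jiffies;
	return true;
}
```

Keeping 0 unchanged means anyone who relies on the existing behavior sees no
difference, while -1 becomes an explicit opt-in.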

							Thanx, Paul



* Re: [RFC] jiffies_till_first_fqs off by 1
  2026-01-02  3:41             ` Paul E. McKenney
@ 2026-01-02 17:58               ` Joel Fernandes
  2026-01-02 19:50                 ` Paul E. McKenney
  0 siblings, 1 reply; 10+ messages in thread
From: Joel Fernandes @ 2026-01-02 17:58 UTC (permalink / raw)
  To: paulmck
  Cc: Joel Fernandes, rcu, Steven Rostedt, linux-kernel,
	Davidlohr Bueso, Josh Triplett, Frederic Weisbecker,
	Neeraj Upadhyay, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang



> On Jan 1, 2026, at 10:41 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> 
> On Thu, Jan 01, 2026 at 09:59:27PM -0500, Joel Fernandes wrote:
>> 
>> 
>>> On 1/1/2026 5:24 PM, Paul E. McKenney wrote:
>>> On Thu, Dec 25, 2025 at 09:15:59PM -0500, Joel Fernandes wrote:
>>>> On Thu, Dec 25, 2025 at 10:54:20AM -0800, Paul E. McKenney wrote:
>>>>> On Tue, Dec 23, 2025 at 09:06:19PM -0500, Joel Fernandes wrote:
>>>>>> Hi Paul,
>>>>>> 
>>>>>> On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
>>>>>>> On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
>>>>>>>> During studying some synchronize_rcu() latencies, I found that the
>>>>>>>> jiffies_till_first_fqs value passed to the timer tick subsystem does is always
>>>>>>>> off by one. This is natural due to calc_index() rounding up.
>>>>>>>> 
>>>>>>>> For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
>>>>>>>> is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
>>>>>>>> never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
>>>>>>>> to interrupts.
>>>>>>>> 
>>>>>>>> Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
>>>>>>>> before passing it to the wait APIs.
>>>>>>>> 
>>>>>>>> But before I wanted to send a patch, I wanted to get everyone's thoughts.
>>>>>>>> Considering this the RFC.
>>>>>>> 
>>>>>>> Inadvertent passing of the value zero?
>>>>>> 
>>>>>> This should not be an issue because at the moment, even a value of
>>>>>> jiffies_till_first_fqs == 0 waits for ~1 jiffie due to schedule_timeout(0).
>>>>>> 
>>>>>> But you raise a good point, we should cap the minimum allowed jiffie value
>>>>>> for the fqs parameters to 1 so that we don't pass schedule_timeout() with
>>>>>> negative values when/if we do the reduce-by-one approach.
>>>>> 
>>>>> There is a potential use case for jiffies_till_first_fqs=0 and no wait,
>>>>> which would be systems that want to scan for idle CPUs immediately after
>>>>> the grace period has been initialized.  Note the word "potential".  ;-)
>>>> 
>>>> Sure, we could add support for that but that would be new behavior that is
>>>> not in the existing code.
>>>> 
>>>> So jiffies_till_first_fqs=0 today, I think it is not 'working as intended'
>>>> because it will never not wait I think.
>>> 
>>> Agreed.
>>>
>>>> So we should fix that too? Or maybe it can be a patch separate from this
>>>> (that I can work on). I think no harming in allowing that mode, at least it
>>>> will be more in line with the expected outcome.
>>> 
>>> Makes sense!  However, given that no one has complained, care is required.
>>> Someone might be relying on the old behavior.  (In which case an easy
>>> fix would be to make -1 be no waiting, though one might hope for a
>>> better fix.)
>> Some further investigations revealed that the "1 jiffie error" is actually worst
>> case. In the best case, it could still be closer to a jiffie. It is just the
>> nature of the timer wheel, since it snaps to numerical TICK_NS boundary, the
>> rounding error is intentionally added depending on how far along in the boundary
>> was the timer for the wait enqueued. If we took probability distributions, we
>> should be landing with a 1/2 jiffie error, though in practice I've seen it to be
>> 3/4 jiffie error on average.
>> 
>> Given this, it would probably not make sense for us to do the -1 to adjust for
>> the error (since we don't clearly have bounds on the minimum error). We just
>> have to accept that we'd lose 1-2 extra jiffie per FQS loop iteration wait,
>> which is amplified if a grace period is already in progress. I've seen this add
>> upto 4 jiffies to back-to-back synchronize_rcu() latency even when there are no
>> readers in progress.
>
>> But I had to go down the rabbit hole and check... ;-)
> 
> I was thinking in terms of special-casing -1 to skip the sleep, but I
> guess that there are as many ways to skin a rabbit as a cat.  ;-)

Sure, I am happy to do that. One of my fears, though, is that no one will
know to use it that way, making it not that useful.

Do let me know if anyone sets it to 0, though. Perhaps for testing, to make
the GP cycle shorter?

 - Joel


> 
>                            Thanx, Paul
> 
> 


* Re: [RFC] jiffies_till_first_fqs off by 1
  2026-01-02 17:58               ` Joel Fernandes
@ 2026-01-02 19:50                 ` Paul E. McKenney
  0 siblings, 0 replies; 10+ messages in thread
From: Paul E. McKenney @ 2026-01-02 19:50 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Joel Fernandes, rcu, Steven Rostedt, linux-kernel,
	Davidlohr Bueso, Josh Triplett, Frederic Weisbecker,
	Neeraj Upadhyay, Boqun Feng, Uladzislau Rezki, Mathieu Desnoyers,
	Lai Jiangshan, Zqiang

On Fri, Jan 02, 2026 at 12:58:08PM -0500, Joel Fernandes wrote:
> 
> 
> > On Jan 1, 2026, at 10:41 PM, Paul E. McKenney <paulmck@kernel.org> wrote:
> > 
> > On Thu, Jan 01, 2026 at 09:59:27PM -0500, Joel Fernandes wrote:
> >> 
> >> 
> >>> On 1/1/2026 5:24 PM, Paul E. McKenney wrote:
> >>> On Thu, Dec 25, 2025 at 09:15:59PM -0500, Joel Fernandes wrote:
> >>>> On Thu, Dec 25, 2025 at 10:54:20AM -0800, Paul E. McKenney wrote:
> >>>>> On Tue, Dec 23, 2025 at 09:06:19PM -0500, Joel Fernandes wrote:
> >>>>>> Hi Paul,
> >>>>>> 
> >>>>>> On Tue, Dec 23, 2025 at 03:53:23PM -0800, Paul E. McKenney wrote:
> >>>>>>> On Tue, Dec 23, 2025 at 12:38:19PM -0500, Joel Fernandes wrote:
> >>>>>>>> During studying some synchronize_rcu() latencies, I found that the
> >>>>>>>> jiffies_till_first_fqs value passed to the timer tick subsystem does is always
> >>>>>>>> off by one. This is natural due to calc_index() rounding up.
> >>>>>>>> 
> >>>>>>>> For example, jiffies_till_first_fqs=3 means the "Jiffies till first FQS" delay
> >>>>>>>> is actually 4ms. And same for the next FQS. In fact, in testing it shows it can
> >>>>>>>> never ever be 3ms for HZ=1000. And in rare cases, it will go to 5ms probably due
> >>>>>>>> to interrupts.
> >>>>>>>> 
> >>>>>>>> Considering this, I think it is better to reduce the jiffies_till_first_fqs by 1
> >>>>>>>> before passing it to the wait APIs.
> >>>>>>>> 
> >>>>>>>> But before I wanted to send a patch, I wanted to get everyone's thoughts.
> >>>>>>>> Considering this the RFC.
> >>>>>>> 
> >>>>>>> Inadvertent passing of the value zero?
> >>>>>> 
> >>>>>> This should not be an issue because at the moment, even a value of
> >>>>>> jiffies_till_first_fqs == 0 waits for ~1 jiffie due to schedule_timeout(0).
> >>>>>> 
> >>>>>> But you raise a good point, we should cap the minimum allowed jiffie value
> >>>>>> for the fqs parameters to 1 so that we don't pass schedule_timeout() with
> >>>>>> negative values when/if we do the reduce-by-one approach.
> >>>>> 
> >>>>> There is a potential use case for jiffies_till_first_fqs=0 and no wait,
> >>>>> which would be systems that want to scan for idle CPUs immediately after
> >>>>> the grace period has been initialized.  Note the word "potential".  ;-)
> >>>> 
> >>>> Sure, we could add support for that but that would be new behavior that is
> >>>> not in the existing code.
> >>>> 
> >>>> So jiffies_till_first_fqs=0 today, I think it is not 'working as intended'
> >>>> because it will never not wait I think.
> >>> 
> >>> Agreed.
> >>>
> >>>> So we should fix that too? Or maybe it can be a patch separate from this
> >>>> (that I can work on). I think no harming in allowing that mode, at least it
> >>>> will be more in line with the expected outcome.
> >>> 
> >>> Makes sense!  However, given that no one has complained, care is required.
> >>> Someone might be relying on the old behavior.  (In which case an easy
> >>> fix would be to make -1 be no waiting, though one might hope for a
> >>> better fix.)
> >> Some further investigations revealed that the "1 jiffie error" is actually worst
> >> case. In the best case, it could still be closer to a jiffie. It is just the
> >> nature of the timer wheel, since it snaps to numerical TICK_NS boundary, the
> >> rounding error is intentionally added depending on how far along in the boundary
> >> was the timer for the wait enqueued. If we took probability distributions, we
> >> should be landing with a 1/2 jiffie error, though in practice I've seen it to be
> >> 3/4 jiffie error on average.
> >> 
> >> Given this, it would probably not make sense for us to do the -1 to adjust for
> >> the error (since we don't clearly have bounds on the minimum error). We just
> >> have to accept that we'd lose 1-2 extra jiffie per FQS loop iteration wait,
> >> which is amplified if a grace period is already in progress. I've seen this add
> >> upto 4 jiffies to back-to-back synchronize_rcu() latency even when there are no
> >> readers in progress.
> >
> >> But I had to go down the rabbit hole and check... ;-)
> > 
> > I was thinking in terms of special-casing -1 to skip the sleep, but I
> > guess that there are as many ways to skin a rabbit as a cat.  ;-)
> 
> Sure I am happy to do that. One of my fears though is no one will know to use it that way making it not that useful.
> 
> Do let me know if anyone sets it to 0 though. Perhaps for testing even to make the GP cycle shorter?

I do not know of anyone doing that, hence the non-urgency.  The "-1"
would be just in case someone actually is setting it to zero, and
complains about us breaking userspace.  :-/

							Thanx, Paul


end of thread, other threads:[~2026-01-02 19:50 UTC | newest]

Thread overview: 10+ messages
2025-12-23 17:38 [RFC] jiffies_till_first_fqs off by 1 Joel Fernandes
2025-12-23 23:53 ` Paul E. McKenney
2025-12-24  2:06   ` Joel Fernandes
2025-12-25 18:54     ` Paul E. McKenney
2025-12-26  2:15       ` Joel Fernandes
2026-01-01 22:24         ` Paul E. McKenney
2026-01-02  2:59           ` Joel Fernandes
2026-01-02  3:41             ` Paul E. McKenney
2026-01-02 17:58               ` Joel Fernandes
2026-01-02 19:50                 ` Paul E. McKenney
