* [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window
@ 2026-02-27 7:32 Roger Pau Monne
2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne
2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne
0 siblings, 2 replies; 9+ messages in thread
From: Roger Pau Monne @ 2026-02-27 7:32 UTC (permalink / raw)
To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich
Hello,
Since a month or so I've noticed that my only text box that supports
P-states would seem to eventually get all its CPUs stuck in the lower
frequency P-state, and no amount of load would result in a P-state
change.
The issue has always been intermittent, and not easy to reproduce.
Looking at the on-demand governor I've found the two issues fixed on
this series. With those fixed I haven't been able to reproduce the
issue, but as said I don't have a deterministic way to trigger it.
Anyway, the fixes look legit, so regardless of whether there's something
else causing my issue, we should take those. I'm a bit surprised no-one
has noticed wonky performance with the on-demand governor before; the
changes that introduce those issue have been there since the governor
was introduced. Neither I know why I've only noticed the low frequency
stickiness issue ~last month, if it was indeed caused by the issues
here.
Thanks, Roger.
Roger Pau Monne (2):
xen/cpufreq: fix adjusting of sampling window on early exit
xen/cpufreq: fix usages of align_timer() in the on-demand governor
xen/drivers/cpufreq/cpufreq_ondemand.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--
2.51.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit
2026-02-27 7:32 [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window Roger Pau Monne
@ 2026-02-27 7:32 ` Roger Pau Monne
2026-03-02 9:46 ` Jan Beulich
2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne
1 sibling, 1 reply; 9+ messages in thread
From: Roger Pau Monne @ 2026-02-27 7:32 UTC (permalink / raw)
To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich
The logic in dbs_check_cpu() resets the sampling window even when the
sampling period is considered too small. This leads to further calls
finding an imbalance between the total window time and the idle time, as
the total window time is possibly shorter than the idle time.
Fix by resetting the sampling window start time in the same block where the
current idle time is stored. While there also prevent a duplicated call to
NOW() and instead re-use the previously fetched value.
Fixes: d6f001cb91ac ("x86: Implement cpufreq ondemand policy")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/drivers/cpufreq/cpufreq_ondemand.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c
index 0126a3f5d9b4..537695eaab19 100644
--- a/xen/drivers/cpufreq/cpufreq_ondemand.c
+++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
@@ -117,11 +117,12 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info)
cur_ns = NOW();
total_ns = cur_ns - this_dbs_info->prev_cpu_wall;
- this_dbs_info->prev_cpu_wall = NOW();
if (total_ns < MIN_DBS_INTERVAL)
return;
+ this_dbs_info->prev_cpu_wall = cur_ns;
+
/* Get Idle Time */
for_each_cpu(j, policy->cpus) {
uint64_t idle_ns, total_idle_ns;
--
2.51.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor
2026-02-27 7:32 [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window Roger Pau Monne
2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne
@ 2026-02-27 7:32 ` Roger Pau Monne
2026-03-02 9:57 ` Jan Beulich
2026-06-15 17:44 ` Jason Andryuk
1 sibling, 2 replies; 9+ messages in thread
From: Roger Pau Monne @ 2026-02-27 7:32 UTC (permalink / raw)
To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich
The first parameter passed to align_timer() is the timer expiration, not
the current time. Adjust the calls to align_timer() in the on-demand
governor to pass the expected timer expiration as the first parameter.
Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency")
Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle")
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c
index 537695eaab19..0d94c0e464a6 100644
--- a/xen/drivers/cpufreq/cpufreq_ondemand.c
+++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
@@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs)
dbs_check_cpu(dbs_info);
set_timer(&per_cpu(dbs_timer, dbs_info->cpu),
- align_timer(NOW() , dbs_tuners_ins.sampling_rate));
+ align_timer(NOW() + dbs_tuners_ins.sampling_rate,
+ dbs_tuners_ins.sampling_rate));
}
static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
@@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void)
(void)cmpxchg(stoppable, -1, 1);
}
else
- set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate));
+ set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate));
}
}
--
2.51.0
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit
2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne
@ 2026-03-02 9:46 ` Jan Beulich
0 siblings, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2026-03-02 9:46 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: xen-devel
On 27.02.2026 08:32, Roger Pau Monne wrote:
> The logic in dbs_check_cpu() resets the sampling window even when the
> sampling period is considered too small. This leads to further calls
> finding an imbalance between the total window time and the idle time, as
> the total window time is possibly shorter than the idle time.
>
> Fix by resetting the sampling window start time in the same block where the
> current idle time is stored. While there also prevent a duplicated call to
> NOW() and instead re-use the previously fetched value.
>
> Fixes: d6f001cb91ac ("x86: Implement cpufreq ondemand policy")
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor
2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne
@ 2026-03-02 9:57 ` Jan Beulich
2026-06-15 17:44 ` Jason Andryuk
1 sibling, 0 replies; 9+ messages in thread
From: Jan Beulich @ 2026-03-02 9:57 UTC (permalink / raw)
To: Roger Pau Monne; +Cc: xen-devel
On 27.02.2026 08:32, Roger Pau Monne wrote:
> The first parameter passed to align_timer() is the timer expiration, not
> the current time. Adjust the calls to align_timer() in the on-demand
> governor to pass the expected timer expiration as the first parameter.
>
> Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency")
> Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle")
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor
2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne
2026-03-02 9:57 ` Jan Beulich
@ 2026-06-15 17:44 ` Jason Andryuk
2026-06-15 18:33 ` Roger Pau Monné
2026-06-15 19:22 ` Roger Pau Monné
1 sibling, 2 replies; 9+ messages in thread
From: Jason Andryuk @ 2026-06-15 17:44 UTC (permalink / raw)
To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich
On 2026-02-27 02:32, Roger Pau Monne wrote:
> The first parameter passed to align_timer() is the timer expiration, not
> the current time. Adjust the calls to align_timer() in the on-demand
> governor to pass the expected timer expiration as the first parameter.
Internally, we have a report of a benchmark regressing ~6% with this
change on 4.20.
s_time_t align_timer(s_time_t firsttick, uint64_t period)
{
if ( !period )
return firsttick;
return firsttick + (period - 1) - ((firsttick - 1) % period);
}
The code rounds firsttick up to the next period:
align_timer(0, period) -> 0
align_timer(1, period) -> period
align_timer(period - 1, period) -> period
align_timer(period, period) -> period
align_timer(period + 1, period) -> 2 * period
With the change of this patch adding the period before calling
align_timer(), the timer is set for two periods in the future. The only
exception is when firsttick % period == 0. I think that is unlikely to
happen since NOW() will always be a little after the period. Even if it
did happen, the timer would fire immediately, but the next timer would
be set for 1 period later.
So I think we want to revert?
Regards,
Jason
>
> Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency")
> Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle")
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> ---
> xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++--
> 1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c
> index 537695eaab19..0d94c0e464a6 100644
> --- a/xen/drivers/cpufreq/cpufreq_ondemand.c
> +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
> @@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs)
> dbs_check_cpu(dbs_info);
>
> set_timer(&per_cpu(dbs_timer, dbs_info->cpu),
> - align_timer(NOW() , dbs_tuners_ins.sampling_rate));
> + align_timer(NOW() + dbs_tuners_ins.sampling_rate,
> + dbs_tuners_ins.sampling_rate));
> }
>
> static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
> @@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void)
> (void)cmpxchg(stoppable, -1, 1);
> }
> else
> - set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate));
> + set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate));
> }
> }
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor
2026-06-15 17:44 ` Jason Andryuk
@ 2026-06-15 18:33 ` Roger Pau Monné
2026-06-15 19:31 ` Jason Andryuk
2026-06-15 19:22 ` Roger Pau Monné
1 sibling, 1 reply; 9+ messages in thread
From: Roger Pau Monné @ 2026-06-15 18:33 UTC (permalink / raw)
To: Jason Andryuk; +Cc: xen-devel, Jan Beulich
On Mon, Jun 15, 2026 at 01:44:54PM -0400, Jason Andryuk wrote:
> On 2026-02-27 02:32, Roger Pau Monne wrote:
> > The first parameter passed to align_timer() is the timer expiration, not
> > the current time. Adjust the calls to align_timer() in the on-demand
> > governor to pass the expected timer expiration as the first parameter.
>
> Internally, we have a report of a benchmark regressing ~6% with this change
> on 4.20.
>
> s_time_t align_timer(s_time_t firsttick, uint64_t period)
> {
> if ( !period )
> return firsttick;
>
> return firsttick + (period - 1) - ((firsttick - 1) % period);
> }
>
> The code rounds firsttick up to the next period:
>
> align_timer(0, period) -> 0
> align_timer(1, period) -> period
> align_timer(period - 1, period) -> period
> align_timer(period, period) -> period
> align_timer(period + 1, period) -> 2 * period
>
> With the change of this patch adding the period before calling
> align_timer(), the timer is set for two periods in the future. The only
> exception is when firsttick % period == 0. I think that is unlikely to
> happen since NOW() will always be a little after the period. Even if it did
> happen, the timer would fire immediately, but the next timer would be set
> for 1 period later.
Hm, I see. So this is explicitly done to never exceed one period
between sampling, even if that implies using a smaller period and
over-sampling. That's kind of different from how the other caller
uses align_timer(), where it's expected the timer to fire after the
period has expired, not before. I think create_periodic_time() is
fine, because it's only the first tick that might be delayed,
afterwards the next tick should be aligned to the period already.
> So I think we want to revert?
I think we want to revert the first chunk...
> Regards,
> Jason
>
> >
> > Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency")
> > Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle")
> > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> > ---
> > xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c
> > index 537695eaab19..0d94c0e464a6 100644
> > --- a/xen/drivers/cpufreq/cpufreq_ondemand.c
> > +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
> > @@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs)
> > dbs_check_cpu(dbs_info);
> > set_timer(&per_cpu(dbs_timer, dbs_info->cpu),
> > - align_timer(NOW() , dbs_tuners_ins.sampling_rate));
> > + align_timer(NOW() + dbs_tuners_ins.sampling_rate,
> > + dbs_tuners_ins.sampling_rate));
> > }
> > static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
> > @@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void)
> > (void)cmpxchg(stoppable, -1, 1);
> > }
> > else
> > - set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate));
> > + set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate));
... but possibly keep this as-is? Thinking about it, t->expires
should already be aligned, and hence we could drop the align_timer()
call here? It should be equivalent to aligning NOW() to the next
period boundary, so yes, we could revert this chunk also and
timer expiry should be the same.
And maybe we want to add an extra align_timer() in dbs_timer_init() to
align the first call also in a separate patch.
Do you want to send the revert, or should I?
Regards, Roger.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor
2026-06-15 17:44 ` Jason Andryuk
2026-06-15 18:33 ` Roger Pau Monné
@ 2026-06-15 19:22 ` Roger Pau Monné
1 sibling, 0 replies; 9+ messages in thread
From: Roger Pau Monné @ 2026-06-15 19:22 UTC (permalink / raw)
To: Jason Andryuk; +Cc: xen-devel, Jan Beulich
On Mon, Jun 15, 2026 at 01:44:54PM -0400, Jason Andryuk wrote:
> On 2026-02-27 02:32, Roger Pau Monne wrote:
> > The first parameter passed to align_timer() is the timer expiration, not
> > the current time. Adjust the calls to align_timer() in the on-demand
> > governor to pass the expected timer expiration as the first parameter.
>
> Internally, we have a report of a benchmark regressing ~6% with this change
> on 4.20.
>
> s_time_t align_timer(s_time_t firsttick, uint64_t period)
> {
> if ( !period )
> return firsttick;
>
> return firsttick + (period - 1) - ((firsttick - 1) % period);
> }
>
> The code rounds firsttick up to the next period:
>
> align_timer(0, period) -> 0
> align_timer(1, period) -> period
> align_timer(period - 1, period) -> period
> align_timer(period, period) -> period
> align_timer(period + 1, period) -> 2 * period
>
> With the change of this patch adding the period before calling
> align_timer(), the timer is set for two periods in the future. The only
> exception is when firsttick % period == 0. I think that is unlikely to
> happen since NOW() will always be a little after the period. Even if it did
> happen, the timer would fire immediately, but the next timer would be set
> for 1 period later.
>
> So I think we want to revert?
Forgot to mention in the first reply, as I went straight into the
technical side: thanks for finding and reporting this, we can
hopefully get it sorted before the release.
Roger.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor
2026-06-15 18:33 ` Roger Pau Monné
@ 2026-06-15 19:31 ` Jason Andryuk
0 siblings, 0 replies; 9+ messages in thread
From: Jason Andryuk @ 2026-06-15 19:31 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich
On 2026-06-15 14:33, Roger Pau Monné wrote:
> On Mon, Jun 15, 2026 at 01:44:54PM -0400, Jason Andryuk wrote:
>> On 2026-02-27 02:32, Roger Pau Monne wrote:
>>> The first parameter passed to align_timer() is the timer expiration, not
>>> the current time. Adjust the calls to align_timer() in the on-demand
>>> governor to pass the expected timer expiration as the first parameter.
>>
>> Internally, we have a report of a benchmark regressing ~6% with this change
>> on 4.20.
>>
>> s_time_t align_timer(s_time_t firsttick, uint64_t period)
>> {
>> if ( !period )
>> return firsttick;
>>
>> return firsttick + (period - 1) - ((firsttick - 1) % period);
>> }
>>
>> The code rounds firsttick up to the next period:
>>
>> align_timer(0, period) -> 0
>> align_timer(1, period) -> period
>> align_timer(period - 1, period) -> period
>> align_timer(period, period) -> period
>> align_timer(period + 1, period) -> 2 * period
>>
>> With the change of this patch adding the period before calling
>> align_timer(), the timer is set for two periods in the future. The only
>> exception is when firsttick % period == 0. I think that is unlikely to
>> happen since NOW() will always be a little after the period. Even if it did
>> happen, the timer would fire immediately, but the next timer would be set
>> for 1 period later.
>
> Hm, I see. So this is explicitly done to never exceed one period
> between sampling, even if that implies using a smaller period and
> over-sampling. That's kind of different from how the other caller
> uses align_timer(), where it's expected the timer to fire after the
> period has expired, not before. I think create_periodic_time() is
> fine, because it's only the first tick that might be delayed,
> afterwards the next tick should be aligned to the period already.
>
>> So I think we want to revert?
>
> I think we want to revert the first chunk...
>
>> Regards,
>> Jason
>>
>>>
>>> Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency")
>>> Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle")
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> ---
>>> xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c
>>> index 537695eaab19..0d94c0e464a6 100644
>>> --- a/xen/drivers/cpufreq/cpufreq_ondemand.c
>>> +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c
>>> @@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs)
>>> dbs_check_cpu(dbs_info);
>>> set_timer(&per_cpu(dbs_timer, dbs_info->cpu),
>>> - align_timer(NOW() , dbs_tuners_ins.sampling_rate));
>>> + align_timer(NOW() + dbs_tuners_ins.sampling_rate,
>>> + dbs_tuners_ins.sampling_rate));
>>> }
>>> static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info)
>>> @@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void)
>>> (void)cmpxchg(stoppable, -1, 1);
>>> }
>>> else
>>> - set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate));
>>> + set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate));
>
> ... but possibly keep this as-is? Thinking about it, t->expires
> should already be aligned, and hence we could drop the align_timer()
> call here? It should be equivalent to aligning NOW() to the next
> period boundary, so yes, we could revert this chunk also and
> timer expiry should be the same.
>
> And maybe we want to add an extra align_timer() in dbs_timer_init() to
> align the first call also in a separate patch.
>
> Do you want to send the revert, or should I?
I'll send out a straight revert soon. Additional changes can go on top
(though probably not for 4.22?). Thanks for the feedback.
Regards,
Jason
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-15 19:32 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-27 7:32 [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window Roger Pau Monne
2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne
2026-03-02 9:46 ` Jan Beulich
2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne
2026-03-02 9:57 ` Jan Beulich
2026-06-15 17:44 ` Jason Andryuk
2026-06-15 18:33 ` Roger Pau Monné
2026-06-15 19:31 ` Jason Andryuk
2026-06-15 19:22 ` Roger Pau Monné
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.