* [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window @ 2026-02-27 7:32 Roger Pau Monne 2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne 2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne 0 siblings, 2 replies; 9+ messages in thread From: Roger Pau Monne @ 2026-02-27 7:32 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich Hello, Since a month or so I've noticed that my only text box that supports P-states would seem to eventually get all its CPUs stuck in the lower frequency P-state, and no amount of load would result in a P-state change. The issue has always been intermittent, and not easy to reproduce. Looking at the on-demand governor I've found the two issues fixed on this series. With those fixed I haven't been able to reproduce the issue, but as said I don't have a deterministic way to trigger it. Anyway, the fixes look legit, so regardless of whether there's something else causing my issue, we should take those. I'm a bit surprised no-one has noticed wonky performance with the on-demand governor before; the changes that introduce those issue have been there since the governor was introduced. Neither I know why I've only noticed the low frequency stickiness issue ~last month, if it was indeed caused by the issues here. Thanks, Roger. Roger Pau Monne (2): xen/cpufreq: fix adjusting of sampling window on early exit xen/cpufreq: fix usages of align_timer() in the on-demand governor xen/drivers/cpufreq/cpufreq_ondemand.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) -- 2.51.0 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit 2026-02-27 7:32 [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window Roger Pau Monne @ 2026-02-27 7:32 ` Roger Pau Monne 2026-03-02 9:46 ` Jan Beulich 2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne 1 sibling, 1 reply; 9+ messages in thread From: Roger Pau Monne @ 2026-02-27 7:32 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich The logic in dbs_check_cpu() resets the sampling window even when the sampling period is considered too small. This leads to further calls finding an imbalance between the total window time and the idle time, as the total window time is possibly shorter than the idle time. Fix by resetting the sampling window start time in the same block where the current idle time is stored. While there also prevent a duplicated call to NOW() and instead re-use the previously fetched value. Fixes: d6f001cb91ac ("x86: Implement cpufreq ondemand policy") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/drivers/cpufreq/cpufreq_ondemand.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c index 0126a3f5d9b4..537695eaab19 100644 --- a/xen/drivers/cpufreq/cpufreq_ondemand.c +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c @@ -117,11 +117,12 @@ static void dbs_check_cpu(struct cpu_dbs_info_s *this_dbs_info) cur_ns = NOW(); total_ns = cur_ns - this_dbs_info->prev_cpu_wall; - this_dbs_info->prev_cpu_wall = NOW(); if (total_ns < MIN_DBS_INTERVAL) return; + this_dbs_info->prev_cpu_wall = cur_ns; + /* Get Idle Time */ for_each_cpu(j, policy->cpus) { uint64_t idle_ns, total_idle_ns; -- 2.51.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit 2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne @ 2026-03-02 9:46 ` Jan Beulich 0 siblings, 0 replies; 9+ messages in thread From: Jan Beulich @ 2026-03-02 9:46 UTC (permalink / raw) To: Roger Pau Monne; +Cc: xen-devel On 27.02.2026 08:32, Roger Pau Monne wrote: > The logic in dbs_check_cpu() resets the sampling window even when the > sampling period is considered too small. This leads to further calls > finding an imbalance between the total window time and the idle time, as > the total window time is possibly shorter than the idle time. > > Fix by resetting the sampling window start time in the same block where the > current idle time is stored. While there also prevent a duplicated call to > NOW() and instead re-use the previously fetched value. > > Fixes: d6f001cb91ac ("x86: Implement cpufreq ondemand policy") > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor 2026-02-27 7:32 [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window Roger Pau Monne 2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne @ 2026-02-27 7:32 ` Roger Pau Monne 2026-03-02 9:57 ` Jan Beulich 2026-06-15 17:44 ` Jason Andryuk 1 sibling, 2 replies; 9+ messages in thread From: Roger Pau Monne @ 2026-02-27 7:32 UTC (permalink / raw) To: xen-devel; +Cc: Roger Pau Monne, Jan Beulich The first parameter passed to align_timer() is the timer expiration, not the current time. Adjust the calls to align_timer() in the on-demand governor to pass the expected timer expiration as the first parameter. Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency") Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle") Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c index 537695eaab19..0d94c0e464a6 100644 --- a/xen/drivers/cpufreq/cpufreq_ondemand.c +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c @@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs) dbs_check_cpu(dbs_info); set_timer(&per_cpu(dbs_timer, dbs_info->cpu), - align_timer(NOW() , dbs_tuners_ins.sampling_rate)); + align_timer(NOW() + dbs_tuners_ins.sampling_rate, + dbs_tuners_ins.sampling_rate)); } static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info) @@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void) (void)cmpxchg(stoppable, -1, 1); } else - set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate)); + set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate)); } } -- 2.51.0 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor 2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne @ 2026-03-02 9:57 ` Jan Beulich 2026-06-15 17:44 ` Jason Andryuk 1 sibling, 0 replies; 9+ messages in thread From: Jan Beulich @ 2026-03-02 9:57 UTC (permalink / raw) To: Roger Pau Monne; +Cc: xen-devel On 27.02.2026 08:32, Roger Pau Monne wrote: > The first parameter passed to align_timer() is the timer expiration, not > the current time. Adjust the calls to align_timer() in the on-demand > governor to pass the expected timer expiration as the first parameter. > > Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency") > Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle") > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor 2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne 2026-03-02 9:57 ` Jan Beulich @ 2026-06-15 17:44 ` Jason Andryuk 2026-06-15 18:33 ` Roger Pau Monné 2026-06-15 19:22 ` Roger Pau Monné 1 sibling, 2 replies; 9+ messages in thread From: Jason Andryuk @ 2026-06-15 17:44 UTC (permalink / raw) To: Roger Pau Monne, xen-devel; +Cc: Jan Beulich On 2026-02-27 02:32, Roger Pau Monne wrote: > The first parameter passed to align_timer() is the timer expiration, not > the current time. Adjust the calls to align_timer() in the on-demand > governor to pass the expected timer expiration as the first parameter. Internally, we have a report of a benchmark regressing ~6% with this change on 4.20. s_time_t align_timer(s_time_t firsttick, uint64_t period) { if ( !period ) return firsttick; return firsttick + (period - 1) - ((firsttick - 1) % period); } The code rounds firsttick up to the next period: align_timer(0, period) -> 0 align_timer(1, period) -> period align_timer(period - 1, period) -> period align_timer(period, period) -> period align_timer(period + 1, period) -> 2 * period With the change of this patch adding the period before calling align_timer(), the timer is set for two periods in the future. The only exception is when firsttick % period == 0. I think that is unlikely to happen since NOW() will always be a little after the period. Even if it did happen, the timer would fire immediately, but the next timer would be set for 1 period later. So I think we want to revert? Regards, Jason > > Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency") > Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle") > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c > index 537695eaab19..0d94c0e464a6 100644 > --- a/xen/drivers/cpufreq/cpufreq_ondemand.c > +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c > @@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs) > dbs_check_cpu(dbs_info); > > set_timer(&per_cpu(dbs_timer, dbs_info->cpu), > - align_timer(NOW() , dbs_tuners_ins.sampling_rate)); > + align_timer(NOW() + dbs_tuners_ins.sampling_rate, > + dbs_tuners_ins.sampling_rate)); > } > > static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info) > @@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void) > (void)cmpxchg(stoppable, -1, 1); > } > else > - set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate)); > + set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate)); > } > } ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor 2026-06-15 17:44 ` Jason Andryuk @ 2026-06-15 18:33 ` Roger Pau Monné 2026-06-15 19:31 ` Jason Andryuk 2026-06-15 19:22 ` Roger Pau Monné 1 sibling, 1 reply; 9+ messages in thread From: Roger Pau Monné @ 2026-06-15 18:33 UTC (permalink / raw) To: Jason Andryuk; +Cc: xen-devel, Jan Beulich On Mon, Jun 15, 2026 at 01:44:54PM -0400, Jason Andryuk wrote: > On 2026-02-27 02:32, Roger Pau Monne wrote: > > The first parameter passed to align_timer() is the timer expiration, not > > the current time. Adjust the calls to align_timer() in the on-demand > > governor to pass the expected timer expiration as the first parameter. > > Internally, we have a report of a benchmark regressing ~6% with this change > on 4.20. > > s_time_t align_timer(s_time_t firsttick, uint64_t period) > { > if ( !period ) > return firsttick; > > return firsttick + (period - 1) - ((firsttick - 1) % period); > } > > The code rounds firsttick up to the next period: > > align_timer(0, period) -> 0 > align_timer(1, period) -> period > align_timer(period - 1, period) -> period > align_timer(period, period) -> period > align_timer(period + 1, period) -> 2 * period > > With the change of this patch adding the period before calling > align_timer(), the timer is set for two periods in the future. The only > exception is when firsttick % period == 0. I think that is unlikely to > happen since NOW() will always be a little after the period. Even if it did > happen, the timer would fire immediately, but the next timer would be set > for 1 period later. Hm, I see. So this is explicitly done to never exceed one period between sampling, even if that implies using a smaller period and over-sampling. That's kind of different from how the other caller uses align_timer(), where it's expected the timer to fire after the period has expired, not before. I think create_periodic_time() is fine, because it's only the first tick that might be delayed, afterwards the next tick should be aligned to the period already. > So I think we want to revert? I think we want to revert the first chunk... > Regards, > Jason > > > > > Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency") > > Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle") > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++-- > > 1 file changed, 3 insertions(+), 2 deletions(-) > > > > diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c > > index 537695eaab19..0d94c0e464a6 100644 > > --- a/xen/drivers/cpufreq/cpufreq_ondemand.c > > +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c > > @@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs) > > dbs_check_cpu(dbs_info); > > set_timer(&per_cpu(dbs_timer, dbs_info->cpu), > > - align_timer(NOW() , dbs_tuners_ins.sampling_rate)); > > + align_timer(NOW() + dbs_tuners_ins.sampling_rate, > > + dbs_tuners_ins.sampling_rate)); > > } > > static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info) > > @@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void) > > (void)cmpxchg(stoppable, -1, 1); > > } > > else > > - set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate)); > > + set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate)); ... but possibly keep this as-is? Thinking about it, t->expires should already be aligned, and hence we could drop the align_timer() call here? It should be equivalent to aligning NOW() to the next period boundary, so yes, we could revert this chunk also and timer expiry should be the same. And maybe we want to add an extra align_timer() in dbs_timer_init() to align the first call also in a separate patch. Do you want to send the revert, or should I? Regards, Roger. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor 2026-06-15 18:33 ` Roger Pau Monné @ 2026-06-15 19:31 ` Jason Andryuk 0 siblings, 0 replies; 9+ messages in thread From: Jason Andryuk @ 2026-06-15 19:31 UTC (permalink / raw) To: Roger Pau Monné; +Cc: xen-devel, Jan Beulich On 2026-06-15 14:33, Roger Pau Monné wrote: > On Mon, Jun 15, 2026 at 01:44:54PM -0400, Jason Andryuk wrote: >> On 2026-02-27 02:32, Roger Pau Monne wrote: >>> The first parameter passed to align_timer() is the timer expiration, not >>> the current time. Adjust the calls to align_timer() in the on-demand >>> governor to pass the expected timer expiration as the first parameter. >> >> Internally, we have a report of a benchmark regressing ~6% with this change >> on 4.20. >> >> s_time_t align_timer(s_time_t firsttick, uint64_t period) >> { >> if ( !period ) >> return firsttick; >> >> return firsttick + (period - 1) - ((firsttick - 1) % period); >> } >> >> The code rounds firsttick up to the next period: >> >> align_timer(0, period) -> 0 >> align_timer(1, period) -> period >> align_timer(period - 1, period) -> period >> align_timer(period, period) -> period >> align_timer(period + 1, period) -> 2 * period >> >> With the change of this patch adding the period before calling >> align_timer(), the timer is set for two periods in the future. The only >> exception is when firsttick % period == 0. I think that is unlikely to >> happen since NOW() will always be a little after the period. Even if it did >> happen, the timer would fire immediately, but the next timer would be set >> for 1 period later. > > Hm, I see. So this is explicitly done to never exceed one period > between sampling, even if that implies using a smaller period and > over-sampling. That's kind of different from how the other caller > uses align_timer(), where it's expected the timer to fire after the > period has expired, not before. I think create_periodic_time() is > fine, because it's only the first tick that might be delayed, > afterwards the next tick should be aligned to the period already. > >> So I think we want to revert? > > I think we want to revert the first chunk... > >> Regards, >> Jason >> >>> >>> Fixes: af74e3a15a83 ("cpufreq: align dbs timer for better package C state residency") >>> Fixes: 382b95f627a9 ("Fix cpufreq HW-ALL coordination handle") >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>> --- >>> xen/drivers/cpufreq/cpufreq_ondemand.c | 5 +++-- >>> 1 file changed, 3 insertions(+), 2 deletions(-) >>> >>> diff --git a/xen/drivers/cpufreq/cpufreq_ondemand.c b/xen/drivers/cpufreq/cpufreq_ondemand.c >>> index 537695eaab19..0d94c0e464a6 100644 >>> --- a/xen/drivers/cpufreq/cpufreq_ondemand.c >>> +++ b/xen/drivers/cpufreq/cpufreq_ondemand.c >>> @@ -185,7 +185,8 @@ static void cf_check do_dbs_timer(void *dbs) >>> dbs_check_cpu(dbs_info); >>> set_timer(&per_cpu(dbs_timer, dbs_info->cpu), >>> - align_timer(NOW() , dbs_tuners_ins.sampling_rate)); >>> + align_timer(NOW() + dbs_tuners_ins.sampling_rate, >>> + dbs_tuners_ins.sampling_rate)); >>> } >>> static void dbs_timer_init(struct cpu_dbs_info_s *dbs_info) >>> @@ -400,6 +401,6 @@ void cpufreq_dbs_timer_resume(void) >>> (void)cmpxchg(stoppable, -1, 1); >>> } >>> else >>> - set_timer(t, align_timer(now, dbs_tuners_ins.sampling_rate)); >>> + set_timer(t, align_timer(t->expires, dbs_tuners_ins.sampling_rate)); > > ... but possibly keep this as-is? Thinking about it, t->expires > should already be aligned, and hence we could drop the align_timer() > call here? It should be equivalent to aligning NOW() to the next > period boundary, so yes, we could revert this chunk also and > timer expiry should be the same. > > And maybe we want to add an extra align_timer() in dbs_timer_init() to > align the first call also in a separate patch. > > Do you want to send the revert, or should I? I'll send out a straight revert soon. Additional changes can go on top (though probably not for 4.22?). Thanks for the feedback. Regards, Jason ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor 2026-06-15 17:44 ` Jason Andryuk 2026-06-15 18:33 ` Roger Pau Monné @ 2026-06-15 19:22 ` Roger Pau Monné 1 sibling, 0 replies; 9+ messages in thread From: Roger Pau Monné @ 2026-06-15 19:22 UTC (permalink / raw) To: Jason Andryuk; +Cc: xen-devel, Jan Beulich On Mon, Jun 15, 2026 at 01:44:54PM -0400, Jason Andryuk wrote: > On 2026-02-27 02:32, Roger Pau Monne wrote: > > The first parameter passed to align_timer() is the timer expiration, not > > the current time. Adjust the calls to align_timer() in the on-demand > > governor to pass the expected timer expiration as the first parameter. > > Internally, we have a report of a benchmark regressing ~6% with this change > on 4.20. > > s_time_t align_timer(s_time_t firsttick, uint64_t period) > { > if ( !period ) > return firsttick; > > return firsttick + (period - 1) - ((firsttick - 1) % period); > } > > The code rounds firsttick up to the next period: > > align_timer(0, period) -> 0 > align_timer(1, period) -> period > align_timer(period - 1, period) -> period > align_timer(period, period) -> period > align_timer(period + 1, period) -> 2 * period > > With the change of this patch adding the period before calling > align_timer(), the timer is set for two periods in the future. The only > exception is when firsttick % period == 0. I think that is unlikely to > happen since NOW() will always be a little after the period. Even if it did > happen, the timer would fire immediately, but the next timer would be set > for 1 period later. > > So I think we want to revert? Forgot to mention in the first reply, as I went straight into the technical side: thanks for finding and reporting this, we can hopefully get it sorted before the release. Roger. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2026-06-15 19:32 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-02-27 7:32 [PATCH 0/2] xen/cpufreq: a couple of fixes for the sampling window Roger Pau Monne 2026-02-27 7:32 ` [PATCH 1/2] xen/cpufreq: fix adjusting of sampling window on early exit Roger Pau Monne 2026-03-02 9:46 ` Jan Beulich 2026-02-27 7:32 ` [PATCH 2/2] xen/cpufreq: fix usages of align_timer() in the on-demand governor Roger Pau Monne 2026-03-02 9:57 ` Jan Beulich 2026-06-15 17:44 ` Jason Andryuk 2026-06-15 18:33 ` Roger Pau Monné 2026-06-15 19:31 ` Jason Andryuk 2026-06-15 19:22 ` Roger Pau Monné
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.