* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
[not found] ` <20201022120213.GG2611@hirez.programming.kicks-ass.net>
@ 2020-10-22 12:19 ` Rafael J. Wysocki
2020-10-22 12:29 ` Peter Zijlstra
2020-10-22 16:23 ` [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate Rafael J. Wysocki
1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-10-22 12:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Viresh Kumar, Julia Lawall, Mel Gorman, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
[CC linux-pm and Len]
On Thursday, October 22, 2020 2:02:13 PM CEST Peter Zijlstra wrote:
> On Thu, Oct 22, 2020 at 01:45:25PM +0200, Rafael J. Wysocki wrote:
> > On Thursday, October 22, 2020 12:47:03 PM CEST Viresh Kumar wrote:
> > > On 22-10-20, 09:11, Peter Zijlstra wrote:
> > > > Well, but we need to do something to force people onto schedutil,
> > > > otherwise we'll get more crap like this thread.
> > > >
> > > > Can we take the choice away? Only let Kconfig select which governors are
> > > > available and then set the default ourselves? I mean, the end goal being
> > > > to not have selectable governors at all, this seems like a good step
> > > > anyway.
> > >
> > > Just to clarify and complete the point a bit here, the users can still
> > > pass the default governor from cmdline using
> > > cpufreq.default_governor=, which will take precedence over the one the
> > > below code is playing with. And later once the kernel is up, they can
> > > still choose a different governor from userspace.
> >
> > Right.
> >
> > Also some people simply set "performance" as the default governor and then
> > don't touch cpufreq otherwise (the idea is to get everything to the max
> > freq right away and stay in that mode forever). This still needs to be
> > possible IMO.
>
> Performance/powersave make sense to keep.
>
> However I do want to retire ondemand, conservative and also very much
> intel_pstate/active mode.
I agree in general, but IMO it would not be prudent to do that without making
schedutil provide the same level of performance in all of the relevant use
cases.
> I also have very little sympathy for userspace.
That I completely agree with.
> We should start by making it hard to use them and eventually just delete
> them outright.
Right, but see above: IMO step 0 should be to ensure that schedutil is a viable
replacement for all users.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 12:19 ` default cpufreq gov, was: [PATCH] sched/fair: check for idle core Rafael J. Wysocki
@ 2020-10-22 12:29 ` Peter Zijlstra
2020-10-22 14:52 ` Mel Gorman
2020-10-22 15:45 ` A L
0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2020-10-22 12:29 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Viresh Kumar, Julia Lawall, Mel Gorman, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 02:19:29PM +0200, Rafael J. Wysocki wrote:
> > However I do want to retire ondemand, conservative and also very much
> > intel_pstate/active mode.
>
> I agree in general, but IMO it would not be prudent to do that without making
> schedutil provide the same level of performance in all of the relevant use
> cases.
Agreed; I thought I had understood that we were there already.
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 12:29 ` Peter Zijlstra
@ 2020-10-22 14:52 ` Mel Gorman
2020-10-22 14:58 ` Colin Ian King
2020-10-22 15:25 ` Peter Zijlstra
2020-10-22 15:45 ` A L
1 sibling, 2 replies; 27+ messages in thread
From: Mel Gorman @ 2020-10-22 14:52 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Giovanni Gherdovich, Viresh Kumar,
Julia Lawall, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 02:29:49PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 22, 2020 at 02:19:29PM +0200, Rafael J. Wysocki wrote:
> > > However I do want to retire ondemand, conservative and also very much
> > > intel_pstate/active mode.
> >
> > I agree in general, but IMO it would not be prudent to do that without making
> > schedutil provide the same level of performance in all of the relevant use
> > cases.
>
> Agreed; I thought I had understood that we were there already.
AFAIK, not quite (added Giovanni as he has been paying more attention).
Schedutil has improved since it was merged but not to the extent where
it is a drop-in replacement. The standard it needs to meet is that
it is at least equivalent to powersave (in intel_pstate language)
or ondemand (acpi_cpufreq) and within a reasonable percentage of the
performance governor. Defaulting to performance is a) giving up and b)
the performance governor is not a universal win. There are some questions
currently on whether schedutil is good enough when HWP is not available.
There was some evidence (I don't have the data, Giovanni was looking into
it) that HWP was a requirement to make schedutil work well. That is a
hazard in itself because someone could test on the latest gen Intel CPU
and conclude everything is fine and miss that Intel-specific technology
is needed to make it work well while throwing everyone else under a bus.
Giovanni knows a lot more than I do about this, I could be wrong or
forgetting things.
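The bar described above can be sketched as a simple check. This is only an illustration: the function name `meets_bar`, the 5% default for `tolerance`, and the assumption that scores are "higher is better" are all hypothetical and would have to be chosen per workload.

```python
def meets_bar(schedutil, legacy, performance, tolerance=0.05):
    """Return True if a schedutil score clears the bar sketched above.

    All scores are assumed to be throughput-like (higher is better).
    `legacy` is the score with powersave (intel_pstate) or ondemand
    (acpi_cpufreq); `performance` is the score with the performance
    governor. `tolerance` is a hypothetical "reasonable percentage".
    """
    # At least equivalent to the legacy default governor.
    at_least_legacy = schedutil >= legacy
    # Within `tolerance` of the performance governor.
    near_performance = schedutil >= performance * (1.0 - tolerance)
    return at_least_legacy and near_performance
```

A distro would run this per benchmark and per machine generation, with and without HWP, rather than on a single aggregate number.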
For distros, switching to schedutil by default would be nice because
frequency selection state would follow the task instead of being per-cpu
and we could stop worrying about different HWP implementations but it's
not at the point where the switch is advisable. I would expect hard data
before switching the default and still would strongly advise having a
period of time where we can fall back when someone inevitably finds a
new corner case or exception.
For reference, SLUB had the same problem for years. It was switched
on by default in the kernel config but it was a long time before
SLUB was generally equivalent to SLAB in terms of performance. Block
multiqueue also had vaguely similar issues before the default changed,
and there was a period of time before it was removed (example whinging mail
https://lore.kernel.org/lkml/20170803085115.r2jfz2lofy5spfdb@techsingularity.net/)
It's schedutil's turn :P
--
Mel Gorman
SUSE Labs
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 14:52 ` Mel Gorman
@ 2020-10-22 14:58 ` Colin Ian King
2020-10-22 15:12 ` Phil Auld
2020-10-22 15:25 ` Peter Zijlstra
1 sibling, 1 reply; 27+ messages in thread
From: Colin Ian King @ 2020-10-22 14:58 UTC (permalink / raw)
To: Mel Gorman, Peter Zijlstra
Cc: Rafael J. Wysocki, Giovanni Gherdovich, Viresh Kumar,
Julia Lawall, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown
On 22/10/2020 15:52, Mel Gorman wrote:
> On Thu, Oct 22, 2020 at 02:29:49PM +0200, Peter Zijlstra wrote:
>> On Thu, Oct 22, 2020 at 02:19:29PM +0200, Rafael J. Wysocki wrote:
>>>> However I do want to retire ondemand, conservative and also very much
>>>> intel_pstate/active mode.
>>>
>>> I agree in general, but IMO it would not be prudent to do that without making
>>> schedutil provide the same level of performance in all of the relevant use
>>> cases.
>>
>> Agreed; I thought I had understood that we were there already.
>
> AFAIK, not quite (added Giovanni as he has been paying more attention).
> Schedutil has improved since it was merged but not to the extent where
> it is a drop-in replacement. The standard it needs to meet is that
> it is at least equivalent to powersave (in intel_pstate language)
> or ondemand (acpi_cpufreq) and within a reasonable percentage of the
> performance governor. Defaulting to performance is a) giving up and b)
> the performance governor is not a universal win. There are some questions
> currently on whether schedutil is good enough when HWP is not available.
> There was some evidence (I don't have the data, Giovanni was looking into
> it) that HWP was a requirement to make schedutil work well. That is a
> hazard in itself because someone could test on the latest gen Intel CPU
> and conclude everything is fine and miss that Intel-specific technology
> is needed to make it work well while throwing everyone else under a bus.
> Giovanni knows a lot more than I do about this, I could be wrong or
> forgetting things.
>
> For distros, switching to schedutil by default would be nice because
> frequency selection state would follow the task instead of being per-cpu
> and we could stop worrying about different HWP implementations but it's
> not at the point where the switch is advisable. I would expect hard data
> before switching the default and still would strongly advise having a
> period of time where we can fall back when someone inevitably finds a
> new corner case or exception.
..and it would be really useful for distros to know when the hard data
is available so that they can make an informed decision when to move to
schedutil.
>
> For reference, SLUB had the same problem for years. It was switched
> on by default in the kernel config but it was a long time before
> SLUB was generally equivalent to SLAB in terms of performance. Block
> multiqueue also had vaguely similar issues before the default changed,
> and there was a period of time before it was removed (example whinging mail
> https://lore.kernel.org/lkml/20170803085115.r2jfz2lofy5spfdb@techsingularity.net/)
> It's schedutil's turn :P
>
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 14:58 ` Colin Ian King
@ 2020-10-22 15:12 ` Phil Auld
2020-10-22 16:35 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Phil Auld @ 2020-10-22 15:12 UTC (permalink / raw)
To: Colin Ian King
Cc: Mel Gorman, Peter Zijlstra, Rafael J. Wysocki,
Giovanni Gherdovich, Viresh Kumar, Julia Lawall, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 03:58:13PM +0100 Colin Ian King wrote:
> On 22/10/2020 15:52, Mel Gorman wrote:
> > On Thu, Oct 22, 2020 at 02:29:49PM +0200, Peter Zijlstra wrote:
> >> On Thu, Oct 22, 2020 at 02:19:29PM +0200, Rafael J. Wysocki wrote:
> >>>> However I do want to retire ondemand, conservative and also very much
> >>>> intel_pstate/active mode.
> >>>
> >>> I agree in general, but IMO it would not be prudent to do that without making
> >>> schedutil provide the same level of performance in all of the relevant use
> >>> cases.
> >>
> >> Agreed; I thought I had understood that we were there already.
> >
> > AFAIK, not quite (added Giovanni as he has been paying more attention).
> > Schedutil has improved since it was merged but not to the extent where
> > it is a drop-in replacement. The standard it needs to meet is that
> > it is at least equivalent to powersave (in intel_pstate language)
> > or ondemand (acpi_cpufreq) and within a reasonable percentage of the
> > performance governor. Defaulting to performance is a) giving up and b)
> > the performance governor is not a universal win. There are some questions
> > currently on whether schedutil is good enough when HWP is not available.
> > There was some evidence (I don't have the data, Giovanni was looking into
> > it) that HWP was a requirement to make schedutil work well. That is a
> > hazard in itself because someone could test on the latest gen Intel CPU
> > and conclude everything is fine and miss that Intel-specific technology
> > is needed to make it work well while throwing everyone else under a bus.
> > Giovanni knows a lot more than I do about this, I could be wrong or
> > forgetting things.
> >
> > For distros, switching to schedutil by default would be nice because
> > frequency selection state would follow the task instead of being per-cpu
> > and we could stop worrying about different HWP implementations but it's
> > not at the point where the switch is advisable. I would expect hard data
> > before switching the default and still would strongly advise having a
> > period of time where we can fall back when someone inevitably finds a
> > new corner case or exception.
>
> ..and it would be really useful for distros to know when the hard data
> is available so that they can make an informed decision when to move to
> schedutil.
>
I think distros are on the hook to generate that hard data themselves
with which to make such a decision. I don't expect it to be done by
someone else.
> >
> > For reference, SLUB had the same problem for years. It was switched
> > on by default in the kernel config but it was a long time before
> > SLUB was generally equivalent to SLAB in terms of performance. Block
> > multiqueue also had vaguely similar issues before the default changed,
> > and there was a period of time before it was removed (example whinging mail
> > https://lore.kernel.org/lkml/20170803085115.r2jfz2lofy5spfdb@techsingularity.net/)
> > It's schedutil's turn :P
> >
>
Agreed. I'd like the option to switch back if we make the default change.
It's on the table and I'd like to be able to go that way.
Cheers,
Phil
--
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 14:52 ` Mel Gorman
2020-10-22 14:58 ` Colin Ian King
@ 2020-10-22 15:25 ` Peter Zijlstra
2020-10-22 15:55 ` Rafael J. Wysocki
` (2 more replies)
1 sibling, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2020-10-22 15:25 UTC (permalink / raw)
To: Mel Gorman
Cc: Rafael J. Wysocki, Giovanni Gherdovich, Viresh Kumar,
Julia Lawall, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 03:52:50PM +0100, Mel Gorman wrote:
> There are some questions
> currently on whether schedutil is good enough when HWP is not available.
Srinivas and Rafael will know better, but Intel does run a lot of tests
and IIRC it was found that schedutil was on-par for !HWP. That was the
basis for commit:
33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
But now it turns out that commit results in running intel_pstate-passive
on ondemand, which is quite horrible.
> There was some evidence (I don't have the data, Giovanni was looking into
> it) that HWP was a requirement to make schedutil work well.
That seems to be the question; Rafael just said the opposite.
> For distros, switching to schedutil by default would be nice because
> frequency selection state would follow the task instead of being per-cpu
> and we could stop worrying about different HWP implementations but it's
s/HWP/cpufreq-governors/ ? But yes.
> not at the point where the switch is advisable. I would expect hard data
> before switching the default and still would strongly advise having a
> period of time where we can fall back when someone inevitably finds a
> new corner case or exception.
Which is why I advocated to make it 'difficult' to use the old ones and
only later remove them.
> For reference, SLUB had the same problem for years. It was switched
> on by default in the kernel config but it was a long time before
> SLUB was generally equivalent to SLAB in terms of performance.
I remember :-)
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 12:29 ` Peter Zijlstra
2020-10-22 14:52 ` Mel Gorman
@ 2020-10-22 15:45 ` A L
2020-10-22 15:55 ` Vincent Guittot
1 sibling, 1 reply; 27+ messages in thread
From: A L @ 2020-10-22 15:45 UTC (permalink / raw)
To: Peter Zijlstra, Rafael J. Wysocki
Cc: Viresh Kumar, Julia Lawall, Mel Gorman, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
---- From: Peter Zijlstra <peterz@infradead.org> -- Sent: 2020-10-22 - 14:29 ----
> On Thu, Oct 22, 2020 at 02:19:29PM +0200, Rafael J. Wysocki wrote:
>> > However I do want to retire ondemand, conservative and also very much
>> > intel_pstate/active mode.
>>
>> I agree in general, but IMO it would not be prudent to do that without making
>> schedutil provide the same level of performance in all of the relevant use
>> cases.
>
> Agreed; I thought I had understood that we were there already.
Hi,
Currently schedutil does not populate all stats like ondemand does, which can be a problem for some monitoring software.
On my AMD 3000G CPU with kernel-5.9.1:
grep . /sys/devices/system/cpu/cpufreq/policy0/stats/*
With ondemand:
time_in_state:3900000 145179
time_in_state:1600000 9588482
total_trans:177565
trans_table: From : To
trans_table: : 3900000 1600000
trans_table: 3900000: 0 88783
trans_table: 1600000: 88782 0
With schedutil only two files exist:
reset:<empty>
total_trans:216609
I'd really like to have these stats populated with schedutil, if that's possible.
Thanks.
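For monitoring software that consumes these files, a minimal Python sketch of parsing the `grep .`-style lines shown above might look like this. The names `parse_time_in_state`, `residency` and `SAMPLE` are illustrative assumptions, not part of any kernel tool; the sample values are copied from the ondemand output above.

```python
def parse_time_in_state(lines):
    """Map frequency (kHz) to its time counter.

    Expects lines in the 'time_in_state:<freq> <time>' form that
    `grep . stats/*` prints; all other lines are ignored.
    """
    result = {}
    for line in lines:
        name, _, value = line.partition(":")
        if name.strip() != "time_in_state":
            continue
        freq, counter = value.split()
        result[int(freq)] = int(counter)
    return result


def residency(times):
    """Return the fraction of accounted time spent at each frequency."""
    total = sum(times.values())
    if total == 0:
        return {}
    return {freq: counter / total for freq, counter in times.items()}


# Sample taken from the ondemand output quoted above.
SAMPLE = [
    "time_in_state:3900000 145179",
    "time_in_state:1600000 9588482",
    "total_trans:177565",
    "trans_table:   From  :    To",
]

if __name__ == "__main__":
    for freq, share in sorted(residency(parse_time_in_state(SAMPLE)).items()):
        print(f"{freq} kHz: {share:.2%}")
```

A tool written this way returns an empty mapping when `time_in_state` is missing, which is the schedutil case reported above, so the caller can tell "no data" apart from "zero residency".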
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 15:45 ` A L
@ 2020-10-22 15:55 ` Vincent Guittot
2020-10-23 5:11 ` Viresh Kumar
0 siblings, 1 reply; 27+ messages in thread
From: Vincent Guittot @ 2020-10-22 15:55 UTC (permalink / raw)
To: A L
Cc: Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar, Julia Lawall,
Mel Gorman, Ingo Molnar, kernel-janitors, Juri Lelli,
Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, Srinivas Pandruvada, Linux PM, Len Brown
On Thu, 22 Oct 2020 at 17:45, A L <mail@lechevalier.se> wrote:
>
>
>
> ---- From: Peter Zijlstra <peterz@infradead.org> -- Sent: 2020-10-22 - 14:29 ----
>
> > On Thu, Oct 22, 2020 at 02:19:29PM +0200, Rafael J. Wysocki wrote:
> >> > However I do want to retire ondemand, conservative and also very much
> >> > intel_pstate/active mode.
> >>
> >> I agree in general, but IMO it would not be prudent to do that without making
> >> schedutil provide the same level of performance in all of the relevant use
> >> cases.
> >
> > Agreed; I thought I had understood that we were there already.
>
> Hi,
>
>
> Currently schedutil does not populate all stats like ondemand does, which can be a problem for some monitoring software.
>
> On my AMD 3000G CPU with kernel-5.9.1:
>
>
> grep . /sys/devices/system/cpu/cpufreq/policy0/stats/*
>
> With ondemand:
> time_in_state:3900000 145179
> time_in_state:1600000 9588482
> total_trans:177565
> trans_table: From : To
> trans_table: : 3900000 1600000
> trans_table: 3900000: 0 88783
> trans_table: 1600000: 88782 0
>
> With schedutil only two files exist:
> reset:<empty>
> total_trans:216609
>
>
> I'd really like to have these stats populated with schedutil, if that's possible.
Your problem might have been fixed with
commit 96f60cddf7a1 ("cpufreq: stats: Enable stats for fast-switch as well")
>
> Thanks.
>
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 15:25 ` Peter Zijlstra
@ 2020-10-22 15:55 ` Rafael J. Wysocki
2020-10-22 16:29 ` Mel Gorman
2020-10-22 20:10 ` Giovanni Gherdovich
2 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-10-22 15:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Mel Gorman, Rafael J. Wysocki, Giovanni Gherdovich, Viresh Kumar,
Julia Lawall, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, Linux Kernel Mailing List,
Valentin Schneider, Gilles Muller, Srinivas Pandruvada, Linux PM,
Len Brown
On Thu, Oct 22, 2020 at 5:25 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Oct 22, 2020 at 03:52:50PM +0100, Mel Gorman wrote:
>
> > There are some questions
> > currently on whether schedutil is good enough when HWP is not available.
>
> Srinivas and Rafael will know better, but Intel does run a lot of tests
> and IIRC it was found that schedutil was on-par for !HWP. That was the
> basis for commit:
>
> 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
>
> But now it turns out that commit results in running intel_pstate-passive
> on ondemand, which is quite horrible.
It doesn't in general. AFAICS this happens only if "ondemand" was
selected as the default governor in the old kernel config, which
should not be the common case.
But I do agree that this needs to be avoided.
> > There was some evidence (I don't have the data, Giovanni was looking into
> > it) that HWP was a requirement to make schedutil work well.
>
> That seems to be the question; Rafael just said the opposite.
I'm not aware of any data like that.
HWP should not be required and it should always be possible to make an
HWP system run without HWP (except for those with exotic BIOS
configs). However, schedutil should work without HWP as well as (or
better than) the "ondemand" and "conservative" governors on top of the
same driver (whatever it is) and it should work as well as (or better
than) "raw" HWP (so to speak) on top of intel_pstate in the passive
mode with HWP enabled (before 5.9 it couldn't work in that
configuration at all and now it can do that, which I guess may be
regarded as an improvement).
> > For distros, switching to schedutil by default would be nice because
> > frequency selection state would follow the task instead of being per-cpu
> > and we could stop worrying about different HWP implementations but it's
>
> s/HWP/cpufreq-governors/ ? But yes.
Well, different HWP implementations in different processor generations
may be a concern as well in general.
> > not at the point where the switch is advisable. I would expect hard data
> > before switching the default and still would strongly advise having a
> > period of time where we can fall back when someone inevitably finds a
> > new corner case or exception.
>
> Which is why I advocated to make it 'difficult' to use the old ones and
> only later remove them.
Slightly less convenient may be sufficient IMV.
> > For reference, SLUB had the same problem for years. It was switched
> > on by default in the kernel config but it was a long time before
> > SLUB was generally equivalent to SLAB in terms of performance.
>
> I remember :-)
* [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate
[not found] ` <20201022120213.GG2611@hirez.programming.kicks-ass.net>
2020-10-22 12:19 ` default cpufreq gov, was: [PATCH] sched/fair: check for idle core Rafael J. Wysocki
@ 2020-10-22 16:23 ` Rafael J. Wysocki
2020-10-23 6:17 ` Viresh Kumar
2020-10-23 15:15 ` [PATCH v2] " Rafael J. Wysocki
1 sibling, 2 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-10-22 16:23 UTC (permalink / raw)
To: Peter Zijlstra, Viresh Kumar, Julia Lawall
Cc: Mel Gorman, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate
Commit 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by
default without HWP") was meant to cause intel_pstate without HWP
to be used in the passive mode with the schedutil governor on top of
it by default, but it missed the case in which either "ondemand" or
"conservative" was selected as the default governor in the existing
kernel config, in which case the previous old governor configuration
would be used, causing the default legacy governor to be used on top
of intel_pstate instead of schedutil.
Address this by preventing "ondemand" and "conservative" from being
configured as the default cpufreq governor in the case when schedutil
is the default choice for the default governor setting.
[Note that the default cpufreq governor can still be set via the
kernel command line if need be and that choice is not limited,
so if anyone really wants to use one of the legacy governors by
default, it can be achieved this way.]
Fixes: 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
drivers/cpufreq/Kconfig | 2 ++
1 file changed, 2 insertions(+)
Index: linux-pm/drivers/cpufreq/Kconfig
===================================================================
--- linux-pm.orig/drivers/cpufreq/Kconfig
+++ linux-pm/drivers/cpufreq/Kconfig
@@ -71,6 +71,7 @@ config CPU_FREQ_DEFAULT_GOV_USERSPACE
config CPU_FREQ_DEFAULT_GOV_ONDEMAND
bool "ondemand"
+ depends on !SMP || !X86_INTEL_PSTATE
select CPU_FREQ_GOV_ONDEMAND
select CPU_FREQ_GOV_PERFORMANCE
help
@@ -83,6 +84,7 @@ config CPU_FREQ_DEFAULT_GOV_ONDEMAND
config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
bool "conservative"
+ depends on !SMP || !X86_INTEL_PSTATE
select CPU_FREQ_GOV_CONSERVATIVE
select CPU_FREQ_GOV_PERFORMANCE
help
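As the changelog notes, the Kconfig restriction does not limit the `cpufreq.default_governor=` command-line option. A small Python sketch of how a test script might check which governor a given command line requests is shown here. The function name `default_governor_from_cmdline` is hypothetical, and the "last occurrence wins" behaviour is an assumption of this sketch rather than something stated in the thread.

```python
def default_governor_from_cmdline(cmdline):
    """Return the governor requested via cpufreq.default_governor=, or None.

    `cmdline` is the kernel command line as a single string, for example
    the contents of /proc/cmdline.
    """
    governor = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "cpufreq.default_governor":
            # Assumption: a later occurrence overrides an earlier one.
            governor = value
    return governor


if __name__ == "__main__":
    example = "root=/dev/sda1 cpufreq.default_governor=ondemand quiet"
    print(default_governor_from_cmdline(example))
```

With the patch applied, a kernel built with intel_pstate can still end up on a legacy governor by default when this option is present, so a benchmark harness may want to record its value alongside the results.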
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 15:25 ` Peter Zijlstra
2020-10-22 15:55 ` Rafael J. Wysocki
@ 2020-10-22 16:29 ` Mel Gorman
2020-10-22 20:10 ` Giovanni Gherdovich
2 siblings, 0 replies; 27+ messages in thread
From: Mel Gorman @ 2020-10-22 16:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Rafael J. Wysocki, Giovanni Gherdovich, Viresh Kumar,
Julia Lawall, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 05:25:14PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 22, 2020 at 03:52:50PM +0100, Mel Gorman wrote:
>
> > There are some questions
> > currently on whether schedutil is good enough when HWP is not available.
>
> Srinivas and Rafael will know better, but Intel does run a lot of tests
> and IIRC it was found that schedutil was on-par for !HWP. That was the
> basis for commit:
>
> 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
>
> But now it turns out that commit results in running intel_pstate-passive
> on ondemand, which is quite horrible.
>
I know Intel ran a lot of tests, no question about it and no fingers are
being pointed. I know I've had enough patches tested with a battery
of tests on various machines and still ended up with bug reports :)
> > There was some evidence (I don't have the data, Giovanni was looking into
> > it) that HWP was a requirement to make schedutil work well.
>
> That seems to be the question; Rafael just said the opposite.
>
> > For distros, switching to schedutil by default would be nice because
> > frequency selection state would follow the task instead of being per-cpu
> > and we could stop worrying about different HWP implementations but it's
>
> s/HWP/cpufreq-governors/ ? But yes.
>
I've seen cases where HWP had variable behaviour between CPU
generations. It was hard to quantify and/or figure out because HWP is a
black box.
> > not at the point where the switch is advisable. I would expect hard data
> > before switching the default and still would strongly advise having a
> > period of time where we can fall back when someone inevitably finds a
> > new corner case or exception.
>
> Which is why I advocated to make it 'difficult' to use the old ones and
> only later remove them.
>
That's fair.
--
Mel Gorman
SUSE Labs
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 15:12 ` Phil Auld
@ 2020-10-22 16:35 ` Mel Gorman
2020-10-22 17:59 ` Rafael J. Wysocki
0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2020-10-22 16:35 UTC (permalink / raw)
To: Phil Auld
Cc: Colin Ian King, Peter Zijlstra, Rafael J. Wysocki,
Giovanni Gherdovich, Viresh Kumar, Julia Lawall, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 11:12:00AM -0400, Phil Auld wrote:
> > > AFAIK, not quite (added Giovanni as he has been paying more attention).
> > > Schedutil has improved since it was merged but not to the extent where
> > > it is a drop-in replacement. The standard it needs to meet is that
> > > it is at least equivalent to powersave (in intel_pstate language)
> > > or ondemand (acpi_cpufreq) and within a reasonable percentage of the
> > > performance governor. Defaulting to performance is a) giving up and b)
> > > the performance governor is not a universal win. There are some questions
> > > currently on whether schedutil is good enough when HWP is not available.
> > > There was some evidence (I don't have the data, Giovanni was looking into
> > > it) that HWP was a requirement to make schedutil work well. That is a
> > > hazard in itself because someone could test on the latest gen Intel CPU
> > > and conclude everything is fine and miss that Intel-specific technology
> > > is needed to make it work well while throwing everyone else under a bus.
> > > Giovanni knows a lot more than I do about this, I could be wrong or
> > > forgetting things.
> > >
> > > For distros, switching to schedutil by default would be nice because
> > > frequency selection state would follow the task instead of being per-cpu
> > > and we could stop worrying about different HWP implementations but it's
> > > not at the point where the switch is advisable. I would expect hard data
> > > before switching the default and still would strongly advise having a
> > > period of time where we can fall back when someone inevitably finds a
> > > new corner case or exception.
> >
> > ..and it would be really useful for distros to know when the hard data
> > is available so that they can make an informed decision when to move to
> > schedutil.
> >
>
> I think distros are on the hook to generate that hard data themselves
> with which to make such a decision. I don't expect it to be done by
> someone else.
>
Yep, distros are on the hook. When I said "I would expect hard data",
it was in the knowledge that for openSUSE/SLE, we (as in SUSE) would be
generating said data and making a call based on it. I'd be surprised if
Phil was not thinking along the same lines.
> > > For reference, SLUB had the same problem for years. It was switched
> > > on by default in the kernel config but it was a long time before
> > > SLUB was generally equivalent to SLAB in terms of performance. Block
> > > multiqueue also had vaguely similar issues before the default changed,
> > > and there was a period of time before it was removed (example whinging mail
> > > https://lore.kernel.org/lkml/20170803085115.r2jfz2lofy5spfdb@techsingularity.net/)
> > > It's schedutil's turn :P
> > >
> >
>
> Agreed. I'd like the option to switch back if we make the default change.
> It's on the table and I'd like to be able to go that way.
>
Yep. It sounds chicken, but it's a useful safety net and a reasonable
way to deprecate a feature. It's also useful for bug creation -- User X
running whatever found that schedutil is worse than the old governor and
had to temporarily switch back. Repeat until complaining stops and then
tear out the old stuff.
When/if there is a patch setting schedutil as the default, cc suitable
distro people (Giovanni and myself for openSUSE). Other distros assuming
they're watching can nominate their own victim.
--
Mel Gorman
SUSE Labs
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 16:35 ` Mel Gorman
@ 2020-10-22 17:59 ` Rafael J. Wysocki
2020-10-22 20:32 ` Mel Gorman
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-10-22 17:59 UTC (permalink / raw)
To: Mel Gorman
Cc: Phil Auld, Colin Ian King, Peter Zijlstra, Rafael J. Wysocki,
Giovanni Gherdovich, Viresh Kumar, Julia Lawall, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
Linux Kernel Mailing List, Valentin Schneider, Gilles Muller,
Srinivas Pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 6:35 PM Mel Gorman <mgorman@suse.de> wrote:
>
> On Thu, Oct 22, 2020 at 11:12:00AM -0400, Phil Auld wrote:
> > > > AFAIK, not quite (added Giovanni as he has been paying more attention).
> > > > Schedutil has improved since it was merged but not to the extent where
> > > > it is a drop-in replacement. The standard it needs to meet is that
> > > > it is at least equivalent to powersave (in intel_pstate language)
> > > > or ondemand (acpi_cpufreq) and within a reasonable percentage of the
> > > > performance governor. Defaulting to performance is a) giving up and b)
> > > > the performance governor is not a universal win. There are some questions
> > > > currently on whether schedutil is good enough when HWP is not available.
> > > > There was some evidence (I don't have the data, Giovanni was looking into
> > > > it) that HWP was a requirement to make schedutil work well. That is a
> > > > hazard in itself because someone could test on the latest gen Intel CPU
> > > > and conclude everything is fine and miss that Intel-specific technology
> > > > is needed to make it work well while throwing everyone else under a bus.
> > > > Giovanni knows a lot more than I do about this, I could be wrong or
> > > > forgetting things.
> > > >
> > > > For distros, switching to schedutil by default would be nice because
> > > > frequency selection state would follow the task instead of being per-cpu
> > > > and we could stop worrying about different HWP implementations but it's
> > > > not at the point where the switch is advisable. I would expect hard data
> > > > before switching the default and still would strongly advise having a
> > > > period of time where we can fall back when someone inevitably finds a
> > > > new corner case or exception.
> > >
> > > ..and it would be really useful for distros to know when the hard data
> > > is available so that they can make an informed decision when to move to
> > > schedutil.
> > >
> >
> > I think distros are on the hook to generate that hard data themselves
> > with which to make such a decision. I don't expect it to be done by
> > someone else.
> >
>
> Yep, distros are on the hook. When I said "I would expect hard data",
> it was in the knowledge that for openSUSE/SLE, we (as in SUSE) would be
> generating said data and making a call based on it. I'd be surprised if
> Phil was not thinking along the same lines.
>
> > > > For reference, SLUB had the same problem for years. It was switched
> > > > on by default in the kernel config but it was a long time before
> > > > SLUB was generally equivalent to SLAB in terms of performance. Block
> > > > multiqueue also had vaguely similar issues before the default changes
> > > > and a period of time before it was removed (example whinging mail
> > > > https://lore.kernel.org/lkml/20170803085115.r2jfz2lofy5spfdb@techsingularity.net/)
> > > > It's schedutil's turn :P
> > > >
> > >
> >
> > Agreed. I'd like the option to switch back if we make the default change.
> > It's on the table and I'd like to be able to go that way.
> >
>
> Yep. It sounds chicken, but it's a useful safety net and a reasonable
> way to deprecate a feature. It's also useful for bug creation -- User X
> running whatever found that schedutil is worse than the old governor and
> had to temporarily switch back. Repeat until complaining stops and then
> tear out the old stuff.
>
> When/if there is a patch setting schedutil as the default, cc suitable
> distro people (Giovanni and myself for openSUSE).
So for the record, Giovanni was on the CC list of the "cpufreq:
intel_pstate: Use passive mode by default without HWP" patch that this
discussion resulted from (and which kind of belongs to the above
category).
> Other distros assuming they're watching can nominate their own victim.
But no other victims had been nominated at that time.
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 15:25 ` Peter Zijlstra
2020-10-22 15:55 ` Rafael J. Wysocki
2020-10-22 16:29 ` Mel Gorman
@ 2020-10-22 20:10 ` Giovanni Gherdovich
2020-10-22 20:16 ` Giovanni Gherdovich
2020-10-23 7:03 ` Peter Zijlstra
2 siblings, 2 replies; 27+ messages in thread
From: Giovanni Gherdovich @ 2020-10-22 20:10 UTC (permalink / raw)
To: Peter Zijlstra, Rafael J. Wysocki
Cc: Mel Gorman, Viresh Kumar, Julia Lawall, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
Hello Peter, Rafael,
back in August I tested a v5.8 kernel adding Rafael's patches from v5.9 that
make schedutil and HWP work together, i.e. f6ebbcf08f37 ("cpufreq: intel_pstate:
Implement passive mode with HWP enabled").
The main point I took from the exercise is that tbench (network benchmark
in localhost) is problematic for schedutil and only with HWP (thanks to
Rafael's patch above) it reaches the throughput of the other governors.
When HWP isn't available, the penalty is 5-10% and I need to understand if
the cause is something that can affect other applications too (or just a
quirk of this test).
I ran this campaign this summer when Rafael CC'ed me on f6ebbcf08f37
("cpufreq: intel_pstate: Implement passive mode with HWP enabled").
I didn't reply as the patch was a win anyway (my bad, I should have posted
the positive results). The regression of tbench with schedutil w/o HWP,
which went unnoticed for a long time, got most of my attention.
Other remarks
* on gitsource (running the git unit test suite, measures elapsed time)
schedutil is a lot better than Intel's powersave but not as good as the
performance governor.
* for the AMD EPYC machines we haven't yet implemented frequency invariant
accounting, which might explain why schedutil loses to ondemand on all
the benchmarks.
* on dbench (filesystem, measures latency) and kernbench (kernel compilation),
sugov is as good as the Intel performance governor. You can add or remove
HWP (to either sugov or perfgov), it doesn't make a difference. Intel's
powersave in general trails behind.
* generally my main concern is performance, not power efficiency, but I was
a little disappointed to see schedutil being just as efficient as
perfgov (the performance-per-watt ratios): there are even a few cases
where (on tbench) the performance governor is both faster and more
efficient. From previous conversations with Rafael I recall that
switching frequency has an energy cost, so it could be that schedutil
switches too often to amortize it. I haven't checked.
To read the tables:
Tilde (~) means the result is the same as baseline (or, the ratio is close
to 1). The double asterisk (**) is a visual aid and means the result is
worse than baseline (higher or lower depending on the case).
For an overview of the possible configurations (intel_pstate passive,
active, HWP on/off etc) I made the diagram at
https://beta.suse.com/private/ggherdovich/cpufreq/x86-cpufreq.png
1) INTEL, HWP-CAPABLE MACHINES
2) INTEL, NON-HWP-CAPABLE MACHINES
3) AMD EPYC
1) INTEL, HWP-CAPABLE MACHINES:
64x_SKYLAKE_NUMA: Intel Skylake SP, 32 cores / 64 threads, NUMA, SATA SSD storage
------------------------------------------------------------------------------
            sugov-HWP   sugov-no-HWP   powersave-HWP   perfgov-HWP   better if
------------------------------------------------------------------------------
PERFORMANCE RATIOS
tbench      1.00        0.68           ~               1.03**        higher
dbench      1.00        ~              1.03            ~             lower
kernbench   1.00        ~              1.11            ~             lower
gitsource   1.00        1.03           2.26            0.82**        lower
------------------------------------------------------------------------------
PERFORMANCE-PER-WATT RATIOS
tbench      1.00        0.74           ~               ~             higher
dbench      1.00        ~              ~               ~             higher
kernbench   1.00        ~              0.96            ~             higher
gitsource   1.00        0.96           0.45            1.15**        higher
8x_SKYLAKE_UMA: Intel Skylake (client), 4 cores / 8 threads, UMA, SATA SSD storage
------------------------------------------------------------------------------
            sugov-HWP   sugov-no-HWP   powersave-HWP   perfgov-HWP   better if
------------------------------------------------------------------------------
PERFORMANCE RATIOS
tbench      1.00        0.91           ~               ~             higher
dbench      1.00        ~              ~               ~             lower
kernbench   1.00        ~              ~               ~             lower
gitsource   1.00        1.04           1.77            ~             lower
------------------------------------------------------------------------------
PERFORMANCE-PER-WATT RATIOS
tbench      1.00        0.95           ~               ~             higher
dbench      1.00        ~              ~               ~             higher
kernbench   1.00        ~              ~               ~             higher
gitsource   1.00        ~              0.74            ~             higher
8x_COFFEELAKE_UMA: Intel Coffee Lake, 4 cores / 8 threads, UMA, NVMe SSD storage
---------------------------------------------------------------
            sugov-HWP   powersave-HWP   perfgov-HWP   better if
---------------------------------------------------------------
PERFORMANCE RATIOS
tbench      1.00        ~               ~             higher
dbench      1.00        1.12            ~             lower
kernbench   1.00        ~               ~             lower
gitsource   1.00        2.05            ~             lower
---------------------------------------------------------------
PERFORMANCE-PER-WATT RATIOS
tbench      1.00        ~               ~             higher
dbench      1.00        1.80**          ~             higher
kernbench   1.00        ~               ~             higher
gitsource   1.00        1.52**          ~             higher
2) INTEL, NON-HWP-CAPABLE MACHINES:
80x_BROADWELL_NUMA: Intel Broadwell EP, 40 cores / 80 threads, NUMA, SATA SSD storage
---------------------------------------------------------------
            sugov       powersave       perfgov       better if
---------------------------------------------------------------
PERFORMANCE RATIOS
tbench      1.00        1.11**          1.10**        higher
dbench      1.00        1.10            ~             lower
kernbench   1.00        1.10            ~             lower
gitsource   1.00        2.27            0.95**        lower
---------------------------------------------------------------
PERFORMANCE-PER-WATT RATIOS
tbench      1.00        1.05**          1.04**        higher
dbench      1.00        1.24**          0.95          higher
kernbench   1.00        ~               ~             higher
gitsource   1.00        0.86            1.04**        higher
48x_HASWELL_NUMA: Intel Haswell EP, 24 cores / 48 threads, NUMA, HDD storage
---------------------------------------------------------------
            sugov       powersave       perfgov       better if
---------------------------------------------------------------
PERFORMANCE RATIOS
tbench      1.00        1.25**          1.27**        higher
dbench      1.00        1.17            ~             lower
kernbench   1.00        1.04            ~             lower
gitsource   1.00        1.54            0.79**        lower
---------------------------------------------------------------
PERFORMANCE-PER-WATT RATIOS
tbench      1.00        1.18**          1.11**        higher
dbench      1.00        1.25**          ~             higher
kernbench   1.00        1.04**          0.97          higher
gitsource   1.00        0.77            ~             higher
3) AMD EPYC:
256x_ROME_NUMA: AMD Rome, 128 cores / 256 threads, NUMA, SATA SSD storage
---------------------------------------------------------------
            sugov       ondemand        perfgov       better if
---------------------------------------------------------------
PERFORMANCE RATIOS
tbench      1.00        1.11**          1.58**        higher
dbench      1.00        0.44**          0.40**        lower
kernbench   1.00        ~               0.91**        lower
gitsource   1.00        0.96**          0.65**        lower
128x_NAPLES_NUMA: AMD Naples, 64 cores / 128 threads, NUMA, SATA SSD storage
---------------------------------------------------------------
            sugov       ondemand        perfgov       better if
---------------------------------------------------------------
PERFORMANCE RATIOS
tbench      1.00        1.10**          1.19**        higher
dbench      1.00        1.05            0.95**        lower
kernbench   1.00        ~               0.95**        lower
gitsource   1.00        0.93**          0.55**        lower
Giovanni
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 20:10 ` Giovanni Gherdovich
@ 2020-10-22 20:16 ` Giovanni Gherdovich
2020-10-23 7:03 ` Peter Zijlstra
1 sibling, 0 replies; 27+ messages in thread
From: Giovanni Gherdovich @ 2020-10-22 20:16 UTC (permalink / raw)
To: Peter Zijlstra, Rafael J. Wysocki
Cc: Mel Gorman, Viresh Kumar, Julia Lawall, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
On Thu, 2020-10-22 at 22:10 +0200, Giovanni Gherdovich wrote:
> [...]
> To read the tables:
>
> Tilde (~) means the result is the same as baseline (or, the ratio is close
> to 1). The double asterisk (**) is a visual aid and means the result is
> worse than baseline (higher or lower depending on the case).
Ouch, the opposite. Double asterisk (**) is where the result is better
than baseline, and schedutil needs improvement.
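With that correction applied, the marking rule can be written down explicitly. What follows is only a minimal sketch: the function name and the 2% tolerance used for the tilde are assumptions made for illustration, since the threshold actually used to build the tables is not stated in this thread.

```python
def mark(ratio, better_if, tolerance=0.02):
    """Classify a ratio against the schedutil baseline (1.00).

    better_if is "higher" or "lower", as in the last table column.
    tolerance is a hypothetical cut-off for "same as baseline" (~).
    """
    if abs(ratio - 1.0) <= tolerance:
        return "~"
    beats_baseline = ratio > 1.0 if better_if == "higher" else ratio < 1.0
    # "**" flags a governor that beats schedutil, i.e. a case where
    # schedutil needs improvement; no mark means schedutil is ahead.
    return "**" if beats_baseline else ""

print(mark(1.27, "higher"))  # tbench, perfgov on 48x_HASWELL_NUMA
print(mark(0.79, "lower"))   # gitsource, perfgov on 48x_HASWELL_NUMA
print(mark(1.17, "lower"))   # dbench, powersave on 48x_HASWELL_NUMA
```

The first two calls correspond to entries carrying a double asterisk in the tables; the third corresponds to an unmarked entry where schedutil does better than the other governor.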
Giovanni
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 17:59 ` Rafael J. Wysocki
@ 2020-10-22 20:32 ` Mel Gorman
2020-10-22 20:39 ` Phil Auld
0 siblings, 1 reply; 27+ messages in thread
From: Mel Gorman @ 2020-10-22 20:32 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Phil Auld, Colin Ian King, Peter Zijlstra, Rafael J. Wysocki,
Giovanni Gherdovich, Viresh Kumar, Julia Lawall, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
Linux Kernel Mailing List, Valentin Schneider, Gilles Muller,
Srinivas Pandruvada, Linux PM, Len Brown
On Thu, Oct 22, 2020 at 07:59:43PM +0200, Rafael J. Wysocki wrote:
> > > Agreed. I'd like the option to switch back if we make the default change.
> > > It's on the table and I'd like to be able to go that way.
> > >
> >
> > Yep. It sounds chicken, but it's a useful safety net and a reasonable
> > way to deprecate a feature. It's also useful for bug creation -- User X
> > running whatever found that schedutil is worse than the old governor and
> > had to temporarily switch back. Repeat until complaining stops and then
> > tear out the old stuff.
> >
> > When/if there is a patch setting schedutil as the default, cc suitable
> > distro people (Giovanni and myself for openSUSE).
>
> So for the record, Giovanni was on the CC list of the "cpufreq:
> intel_pstate: Use passive mode by default without HWP" patch that this
> discussion resulted from (and which kind of belongs to the above
> category).
>
Oh I know, I did not mean to suggest that you did not. He made people
aware that this was going to be coming down the line and has been looking
into the "what if schedutil was the default" question. AFAIK, it's still
a work-in-progress and I don't know all the specifics but he knows more
than I do on the topic. I only know enough that if we flipped the switch
tomorrow that we could be plagued with google searches suggesting it be
turned off again just like there is still broken advice out there about
disabling intel_pstate for usually the wrong reasons.
The passive patch was a clear flag that the intent is that schedutil will
be the default at some unknown point in the future. That point is now a
bit closer and this thread could have encouraged a premature change of
the default resulting in unfair finger pointing at one company's test
team. If at least two distros check it out and it still goes wrong, at
least there will be shared blame :/
> > Other distros assuming they're watching can nominate their own victim.
>
> But no other victims had been nominated at that time.
We have one, possibly two if Phil agrees. That's better than zero or
unfairly placing the full responsibility on the Intel guys that have been
testing it out.
--
Mel Gorman
SUSE Labs
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 20:32 ` Mel Gorman
@ 2020-10-22 20:39 ` Phil Auld
0 siblings, 0 replies; 27+ messages in thread
From: Phil Auld @ 2020-10-22 20:39 UTC (permalink / raw)
To: Mel Gorman
Cc: Rafael J. Wysocki, Colin Ian King, Peter Zijlstra,
Rafael J. Wysocki, Giovanni Gherdovich, Viresh Kumar,
Julia Lawall, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, Linux Kernel Mailing List,
Valentin Schneider, Gilles Muller, Srinivas Pandruvada, Linux PM,
Len Brown
On Thu, Oct 22, 2020 at 09:32:55PM +0100 Mel Gorman wrote:
> On Thu, Oct 22, 2020 at 07:59:43PM +0200, Rafael J. Wysocki wrote:
> > > > Agreed. I'd like the option to switch back if we make the default change.
> > > > It's on the table and I'd like to be able to go that way.
> > > >
> > >
> > > Yep. It sounds chicken, but it's a useful safety net and a reasonable
> > > way to deprecate a feature. It's also useful for bug creation -- User X
> > > running whatever found that schedutil is worse than the old governor and
> > > had to temporarily switch back. Repeat until complaining stops and then
> > > tear out the old stuff.
> > >
> > > When/if there is a patch setting schedutil as the default, cc suitable
> > > distro people (Giovanni and myself for openSUSE).
> >
> > So for the record, Giovanni was on the CC list of the "cpufreq:
> > intel_pstate: Use passive mode by default without HWP" patch that this
> > discussion resulted from (and which kind of belongs to the above
> > category).
> >
>
> Oh I know, I did not mean to suggest that you did not. He made people
> aware that this was going to be coming down the line and has been looking
> into the "what if schedutil was the default" question. AFAIK, it's still
> a work-in-progress and I don't know all the specifics but he knows more
> than I do on the topic. I only know enough that if we flipped the switch
> tomorrow that we could be plagued with google searches suggesting it be
> turned off again just like there is still broken advice out there about
> disabling intel_pstate for usually the wrong reasons.
>
> The passive patch was a clear flag that the intent is that schedutil will
> be the default at some unknown point in the future. That point is now a
> bit closer and this thread could have encouraged a premature change of
> the default resulting in unfair finger pointing at one company's test
> team. If at least two distros check it out and it still goes wrong, at
> least there will be shared blame :/
>
> > > Other distros assuming they're watching can nominate their own victim.
> >
> > But no other victims had been nominated at that time.
>
> We have one, possibly two if Phil agrees. That's better than zero or
> unfairly placing the full responsibility on the Intel guys that have been
> testing it out.
>
Yes. I agree and we (RHEL) are planning to test this soon. I'll try to get
to it. You can certainly CC me, please, although I also try to watch for this
sort of thing on list.
Cheers,
Phil
> --
> Mel Gorman
> SUSE Labs
>
--
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 15:55 ` Vincent Guittot
@ 2020-10-23 5:11 ` Viresh Kumar
0 siblings, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2020-10-23 5:11 UTC (permalink / raw)
To: Vincent Guittot
Cc: A L, Peter Zijlstra, Rafael J. Wysocki, Julia Lawall, Mel Gorman,
Ingo Molnar, kernel-janitors, Juri Lelli, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
Srinivas Pandruvada, Linux PM, Len Brown
On 22-10-20, 17:55, Vincent Guittot wrote:
> On Thu, 22 Oct 2020 at 17:45, A L <mail@lechevalier.se> wrote:
> >
> >
> >
> > ---- From: Peter Zijlstra <peterz@infradead.org> -- Sent: 2020-10-22 - 14:29 ----
> >
> > > On Thu, Oct 22, 2020 at 02:19:29PM +0200, Rafael J. Wysocki wrote:
> > >> > However I do want to retire ondemand, conservative and also very much
> > >> > intel_pstate/active mode.
> > >>
> > >> I agree in general, but IMO it would not be prudent to do that without making
> > >> schedutil provide the same level of performance in all of the relevant use
> > >> cases.
> > >
> > > Agreed; I thought I understood that we were there already.
> >
> > Hi,
> >
> >
> > Currently schedutil does not populate all stats like ondemand does, which can be a problem for some monitoring software.
> >
> > On my AMD 3000G CPU with kernel-5.9.1:
> >
> >
> > grep . /sys/devices/system/cpu/cpufreq/policy0/stats/*
> >
> > With ondemand:
> > time_in_state:3900000 145179
> > time_in_state:1600000 9588482
> > total_trans:177565
> > trans_table: From : To
> > trans_table: : 3900000 1600000
> > trans_table: 3900000: 0 88783
> > trans_table: 1600000: 88782 0
> >
> > With schedutil only two files exist:
> > reset:<empty>
> > total_trans:216609
> >
> >
> > I'd really like to have these stats populated with schedutil, if that's possible.
>
> Your problem might have been fixed with
> commit 96f60cddf7a1 ("cpufreq: stats: Enable stats for fast-switch as well")
Thanks Vincent. Right, I have already fixed that for everyone.
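For monitoring tools that consume these files, here is a minimal sketch of turning time_in_state into per-frequency residency shares, using the ondemand sample quoted above. The helper name is hypothetical, and the input is assumed to be the `file:value` form printed by grep rather than the raw sysfs file.

```python
def residency(grep_output):
    """Return {frequency_khz: share of total time} from time_in_state lines."""
    ticks = {}
    for line in grep_output.splitlines():
        name, _, value = line.partition(":")
        # grep may print a full path; only the file name matters here.
        if name.strip().rsplit("/", 1)[-1] != "time_in_state":
            continue  # skip total_trans, trans_table, reset
        freq, time = value.split()
        ticks[int(freq)] = int(time)
    total = sum(ticks.values())
    return {freq: t / total for freq, t in ticks.items()} if total else {}

sample = """\
time_in_state:3900000 145179
time_in_state:1600000 9588482
total_trans:177565
"""
for freq, share in residency(sample).items():
    print(f"{freq} kHz: {share:.1%}")
```

With only `reset` and `total_trans` present, as in the schedutil case reported above, the function returns an empty dictionary, which is the condition a monitoring tool would have to handle.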
--
viresh
* Re: [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate
2020-10-22 16:23 ` [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate Rafael J. Wysocki
@ 2020-10-23 6:17 ` Viresh Kumar
2020-10-23 11:59 ` Rafael J. Wysocki
2020-10-23 15:15 ` [PATCH v2] " Rafael J. Wysocki
1 sibling, 1 reply; 27+ messages in thread
From: Viresh Kumar @ 2020-10-23 6:17 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Peter Zijlstra, Julia Lawall, Mel Gorman, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
On 22-10-20, 18:23, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate
>
> Commit 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by
> default without HWP") was meant to cause intel_pstate without HWP
> to be used in the passive mode with the schedutil governor on top of
> it by default, but it missed the case in which either "ondemand" or
> "conservative" was selected as the default governor in the existing
> kernel config, in which case the previous old governor configuration
> would be used, causing the default legacy governor to be used on top
> of intel_pstate instead of schedutil.
>
> Address this by preventing "ondemand" and "conservative" from being
> configured as the default cpufreq governor in the case when schedutil
> is the default choice for the default governor setting.
>
> [Note that the default cpufreq governor can still be set via the
> kernel command line if need be and that choice is not limited,
> so if anyone really wants to use one of the legacy governors by
> default, it can be achieved this way.]
>
> Fixes: 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
> Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> drivers/cpufreq/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
>
> Index: linux-pm/drivers/cpufreq/Kconfig
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/Kconfig
> +++ linux-pm/drivers/cpufreq/Kconfig
> @@ -71,6 +71,7 @@ config CPU_FREQ_DEFAULT_GOV_USERSPACE
>
> config CPU_FREQ_DEFAULT_GOV_ONDEMAND
> bool "ondemand"
> + depends on !SMP || !X86_INTEL_PSTATE
> select CPU_FREQ_GOV_ONDEMAND
> select CPU_FREQ_GOV_PERFORMANCE
> help
> @@ -83,6 +84,7 @@ config CPU_FREQ_DEFAULT_GOV_ONDEMAND
>
> config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
> bool "conservative"
> + depends on !SMP || !X86_INTEL_PSTATE
On first reading, this looked like an SMP-specific problem (which
surprised me), and then I understood what you are doing.
I wonder if rewriting it this way would make it more readable, with the
same result.
depends on !(X86_INTEL_PSTATE && SMP)
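The two spellings are equivalent by De Morgan's law. A quick sketch that enumerates all four combinations, with plain booleans standing in for the two Kconfig symbols (the function names are made up for this example):

```python
from itertools import product

def v1(smp, intel_pstate):
    # depends on !SMP || !X86_INTEL_PSTATE
    return (not smp) or (not intel_pstate)

def v2(smp, intel_pstate):
    # depends on !(X86_INTEL_PSTATE && SMP)
    return not (intel_pstate and smp)

for smp, intel_pstate in product([False, True], repeat=2):
    # Both forms must agree for every combination of the two symbols.
    assert v1(smp, intel_pstate) == v2(smp, intel_pstate)
    print(f"SMP={smp!s:5} X86_INTEL_PSTATE={intel_pstate!s:5} "
          f"selectable={v2(smp, intel_pstate)}")
```

The only combination that hides the option is the one with both symbols enabled, which is the case the patch is aimed at.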
--
viresh
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-22 20:10 ` Giovanni Gherdovich
2020-10-22 20:16 ` Giovanni Gherdovich
@ 2020-10-23 7:03 ` Peter Zijlstra
2020-10-23 17:46 ` Tom Lendacky
1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2020-10-23 7:03 UTC (permalink / raw)
To: Giovanni Gherdovich
Cc: Rafael J. Wysocki, Mel Gorman, Viresh Kumar, Julia Lawall,
Ingo Molnar, kernel-janitors, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown,
thomas.lendacky, puwen, yazen.ghannam, kim.phillips,
suravee.suthikulpanit
On Thu, Oct 22, 2020 at 10:10:35PM +0200, Giovanni Gherdovich wrote:
> * for the AMD EPYC machines we haven't yet implemented frequency invariant
> accounting, which might explain why schedutil loses to ondemand on all
> the benchmarks.
Right, I poked the AMD people on that a few times, but nothing seems to
be forthcoming :/ Tom, any way you could perhaps expedite the matter?
In particular we're looking for some X86_VENDOR_AMD/HYGON code to run in
arch/x86/kernel/smpboot.c:init_freq_invariance()
The main issue is finding a 'max' frequency that is not the absolute max
turbo boost (this could result in not reaching it very often) but also
not too low such that we're always clipping.
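To illustrate the trade-off, here is a deliberately simplified model of a frequency-invariance scale factor. It is not the code in smpboot.c; the function name and all frequencies are made-up numbers for a hypothetical part with a 2.0 GHz base clock, 3.0 GHz all-core turbo and 3.4 GHz single-core turbo.

```python
SCHED_CAPACITY_SCALE = 1024  # fixed-point value meaning "100%"

def freq_scale(cur_khz, ref_max_khz):
    """Current frequency relative to the chosen 'max', clipped to 100%."""
    return min(SCHED_CAPACITY_SCALE,
               cur_khz * SCHED_CAPACITY_SCALE // ref_max_khz)

# Reference = absolute max turbo: even sustained all-core turbo never
# reads as 100%, so the CPU always looks like it has headroom left.
print(freq_scale(3_000_000, 3_400_000))

# Reference = base frequency: everything above base clips to 100%, so
# 2.5 GHz and 3.0 GHz become indistinguishable.
print(freq_scale(2_500_000, 2_000_000))
print(freq_scale(3_000_000, 2_000_000))
```

A usable reference sits somewhere between those two extremes, which is the value being asked for here.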
And while we're here, IIUC AMD is still using acpi_cpufreq, but AFAIK
the chips have a CPPC interface which could be used instead. Is there
any progress on that?
* Re: [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate
2020-10-23 6:17 ` Viresh Kumar
@ 2020-10-23 11:59 ` Rafael J. Wysocki
0 siblings, 0 replies; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-10-23 11:59 UTC (permalink / raw)
To: Viresh Kumar
Cc: Rafael J. Wysocki, Peter Zijlstra, Julia Lawall, Mel Gorman,
Ingo Molnar, kernel-janitors, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, Linux Kernel Mailing List,
Valentin Schneider, Gilles Muller, Srinivas Pandruvada, Linux PM,
Len Brown
On Fri, Oct 23, 2020 at 8:17 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 22-10-20, 18:23, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Subject: [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate
> >
> > Commit 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by
> > default without HWP") was meant to cause intel_pstate without HWP
> > to be used in the passive mode with the schedutil governor on top of
> > it by default, but it missed the case in which either "ondemand" or
> > "conservative" was selected as the default governor in the existing
> > kernel config, in which case the previous old governor configuration
> > would be used, causing the default legacy governor to be used on top
> > of intel_pstate instead of schedutil.
> >
> > Address this by preventing "ondemand" and "conservative" from being
> > configured as the default cpufreq governor in the case when schedutil
> > is the default choice for the default governor setting.
> >
> > [Note that the default cpufreq governor can still be set via the
> > kernel command line if need be and that choice is not limited,
> > so if anyone really wants to use one of the legacy governors by
> > default, it can be achieved this way.]
> >
> > Fixes: 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
> > Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> > drivers/cpufreq/Kconfig | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > Index: linux-pm/drivers/cpufreq/Kconfig
> > ===================================================================
> > --- linux-pm.orig/drivers/cpufreq/Kconfig
> > +++ linux-pm/drivers/cpufreq/Kconfig
> > @@ -71,6 +71,7 @@ config CPU_FREQ_DEFAULT_GOV_USERSPACE
> >
> > config CPU_FREQ_DEFAULT_GOV_ONDEMAND
> > bool "ondemand"
> > + depends on !SMP || !X86_INTEL_PSTATE
> > select CPU_FREQ_GOV_ONDEMAND
> > select CPU_FREQ_GOV_PERFORMANCE
> > help
> > @@ -83,6 +84,7 @@ config CPU_FREQ_DEFAULT_GOV_ONDEMAND
> >
> > config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
> > bool "conservative"
> > + depends on !SMP || !X86_INTEL_PSTATE
>
> On first reading, this looked like an SMP-specific problem (which
> surprised me), and then I understood what you are doing.
>
> I wonder if rewriting it this way would make it more readable, with the
> same result.
>
> depends on !(X86_INTEL_PSTATE && SMP)
Agreed, will update.
Thanks!
* Re: [PATCH] sched/fair: check for idle core
[not found] ` <20201023061246.irzbrl62baoawmqv@vireshk-i7>
@ 2020-10-23 15:06 ` Rafael J. Wysocki
2020-10-27 3:01 ` Viresh Kumar
0 siblings, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-10-23 15:06 UTC (permalink / raw)
To: Viresh Kumar
Cc: Peter Zijlstra, Julia Lawall, Mel Gorman, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, linux-pm
On Friday, October 23, 2020 8:12:46 AM CEST Viresh Kumar wrote:
> On 22-10-20, 13:45, Rafael J. Wysocki wrote:
> > On Thursday, October 22, 2020 12:47:03 PM CEST Viresh Kumar wrote:
> > > And I am not really sure why we always wanted this backup performance
> > > governor to be there unless the said governors are built as module.
> >
> > Apparently, some old drivers had problems with switching frequencies fast enough
> > for ondemand to be used with them and the fallback was for those cases. AFAICS.
>
> Do we still need this ?
For the reasonably modern hardware, I don't think so.
> Or better ask those platforms to individually
> enable both of them.
But who knows what they are? :-)
* [PATCH v2] cpufreq: Avoid configuring old governors as default with intel_pstate
2020-10-22 16:23 ` [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate Rafael J. Wysocki
2020-10-23 6:17 ` Viresh Kumar
@ 2020-10-23 15:15 ` Rafael J. Wysocki
2020-10-27 3:01 ` Viresh Kumar
1 sibling, 1 reply; 27+ messages in thread
From: Rafael J. Wysocki @ 2020-10-23 15:15 UTC (permalink / raw)
To: Peter Zijlstra, Viresh Kumar, Julia Lawall
Cc: Mel Gorman, Ingo Molnar, kernel-janitors, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Commit 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by
default without HWP") was meant to cause intel_pstate to be used
in the passive mode with the schedutil governor on top of it, but
it missed the case in which either "ondemand" or "conservative"
was selected as the default governor in the existing kernel config,
in which case the previous old governor configuration would be used,
causing the default legacy governor to be used on top of intel_pstate
instead of schedutil.

Address this by preventing "ondemand" and "conservative" from being
configured as the default cpufreq governor in the case when schedutil
is the default choice for the default governor setting.

[Note that the default cpufreq governor can still be set via the
kernel command line if need be and that choice is not limited,
so if anyone really wants to use one of the legacy governors by
default, it can be achieved this way.]

Fixes: 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

The v2 addresses a review comment from Viresh regarding the expression format
and adds a missing Reported-by for Julia.

---
drivers/cpufreq/Kconfig | 2 ++
1 file changed, 2 insertions(+)
Index: linux-pm/drivers/cpufreq/Kconfig
===================================================================
--- linux-pm.orig/drivers/cpufreq/Kconfig
+++ linux-pm/drivers/cpufreq/Kconfig
@@ -71,6 +71,7 @@ config CPU_FREQ_DEFAULT_GOV_USERSPACE

 config CPU_FREQ_DEFAULT_GOV_ONDEMAND
 	bool "ondemand"
+	depends on !(X86_INTEL_PSTATE && SMP)
 	select CPU_FREQ_GOV_ONDEMAND
 	select CPU_FREQ_GOV_PERFORMANCE
 	help
@@ -83,6 +84,7 @@ config CPU_FREQ_DEFAULT_GOV_ONDEMAND

 config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
 	bool "conservative"
+	depends on !(X86_INTEL_PSTATE && SMP)
 	select CPU_FREQ_GOV_CONSERVATIVE
 	select CPU_FREQ_GOV_PERFORMANCE
 	help
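
[Illustration only, not part of the patch. The added "depends on" line hides
the "ondemand" and "conservative" default choices only when both
X86_INTEL_PSTATE and SMP are enabled. The sketch below models that as a plain
boolean truth table and ignores Kconfig tristate values. The names
legacy_default_selectable and effective_default are hypothetical, and the
second helper only restates the note in the changelog that a governor given
on the kernel command line still wins over the configured default.]

```python
from itertools import product
from typing import Optional


def legacy_default_selectable(x86_intel_pstate: bool, smp: bool) -> bool:
    """Model of `depends on !(X86_INTEL_PSTATE && SMP)` using plain booleans."""
    return not (x86_intel_pstate and smp)


def effective_default(kconfig_default: str, cmdline: Optional[str] = None) -> str:
    """Hypothetical helper: a cpufreq.default_governor= value from the kernel
    command line, when present, takes precedence over the Kconfig default."""
    return cmdline if cmdline else kconfig_default


# Print the truth table for the two Kconfig symbols.
for pstate, smp in product((False, True), repeat=2):
    selectable = legacy_default_selectable(pstate, smp)
    print(f"X86_INTEL_PSTATE={pstate!s:5} SMP={smp!s:5} -> selectable={selectable}")
```

Only the X86_INTEL_PSTATE=y, SMP=y row makes the legacy defaults unselectable;
every other configuration keeps the existing choices.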
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-23 7:03 ` Peter Zijlstra
@ 2020-10-23 17:46 ` Tom Lendacky
2020-10-26 19:52 ` Fontenot, Nathan
0 siblings, 1 reply; 27+ messages in thread
From: Tom Lendacky @ 2020-10-23 17:46 UTC (permalink / raw)
To: Peter Zijlstra, Giovanni Gherdovich
Cc: Rafael J. Wysocki, Mel Gorman, Viresh Kumar, Julia Lawall,
Ingo Molnar, kernel-janitors, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown, puwen,
yazen.ghannam, kim.phillips, suravee.suthikulpanit,
Fontenot, Nathan
On 10/23/20 2:03 AM, Peter Zijlstra wrote:
> On Thu, Oct 22, 2020 at 10:10:35PM +0200, Giovanni Gherdovich wrote:
>> * for the AMD EPYC machines we haven't yet implemented frequency invariant
>> accounting, which might explain why schedutil loses to ondemand on all
>> the benchmarks.
>
> Right, I poked the AMD people on that a few times, but nothing seems to
> be forthcoming :/ Tom, any way you could perhaps expedite the matter?
Adding Nathan to the thread to help out here.
Thanks,
Tom
>
> In particular we're looking for some X86_VENDOR_AMD/HYGON code to run in
>
> arch/x86/kernel/smpboot.c:init_freq_invariance()
>
> The main issue is finding a 'max' frequency that is not the absolute max
> turbo boost (this could result in not reaching it very often) but also
> not too low such that we're always clipping.
>
> And while we're here, IIUC AMD is still using acpi_cpufreq, but AFAIK
> the chips have a CPPC interface which could be used instead. Is there
> any progress on that?
>
* Re: default cpufreq gov, was: [PATCH] sched/fair: check for idle core
2020-10-23 17:46 ` Tom Lendacky
@ 2020-10-26 19:52 ` Fontenot, Nathan
0 siblings, 0 replies; 27+ messages in thread
From: Fontenot, Nathan @ 2020-10-26 19:52 UTC (permalink / raw)
To: Tom Lendacky, Peter Zijlstra, Giovanni Gherdovich
Cc: Rafael J. Wysocki, Mel Gorman, Viresh Kumar, Julia Lawall,
Ingo Molnar, kernel-janitors, Juri Lelli, Vincent Guittot,
Dietmar Eggemann, Steven Rostedt, Ben Segall,
Daniel Bristot de Oliveira, linux-kernel, Valentin Schneider,
Gilles Muller, srinivas.pandruvada, Linux PM, Len Brown, puwen,
yazen.ghannam, kim.phillips, suravee.suthikulpanit
On 10/23/2020 12:46 PM, Tom Lendacky wrote:
> On 10/23/20 2:03 AM, Peter Zijlstra wrote:
>> On Thu, Oct 22, 2020 at 10:10:35PM +0200, Giovanni Gherdovich wrote:
>>> * for the AMD EPYC machines we haven't yet implemented frequency invariant
>>> accounting, which might explain why schedutil loses to ondemand on all
>>> the benchmarks.
>>
>> Right, I poked the AMD people on that a few times, but nothing seems to
>> be forthcoming :/ Tom, any way you could perhaps expedite the matter?
>
> Adding Nathan to the thread to help out here.
>
> Thanks,
> Tom
Thanks Tom, diving in...
>
>>
>> In particular we're looking for some X86_VENDOR_AMD/HYGON code to run in
>>
>> arch/x86/kernel/smpboot.c:init_freq_invariance()
>>
>> The main issue is finding a 'max' frequency that is not the absolute max
>> turbo boost (this could result in not reaching it very often) but also
>> not too low such that we're always clipping.
I've started looking into this and have a lead but need to confirm that the
frequency value I'm getting is not an absolute max.
>>
>> And while we're here, IIUC AMD is still using acpi_cpufreq, but AFAIK
>> the chips have a CPPC interface which could be used instead. Is there
>> any progress on that?
>>
Correct, AMD uses acpi_cpufreq. The newer AMD chips do have a CPPC interface
(not sure how far back 'newer' covers). I'll take a look at schedutil and
cppc_cpufreq and the possibility of transitioning to them for AMD.
-Nathan
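
[Illustration only, not from the thread. The trade-off Peter describes when
picking a 'max' frequency can be sketched with a simplified model of
frequency-invariant scaling: utilization is multiplied by the ratio of the
current frequency to a reference maximum, clipped at full capacity. This is
not the kernel's implementation. The function names, the capacity value of
1024 used as a fixed-point unit, and all frequencies in the tests are
assumptions made for the example.]

```python
CAPACITY = 1024  # assumed fixed-point unit meaning "100%"


def freq_scale(cur_khz: int, ref_max_khz: int) -> int:
    """Ratio of current to reference frequency, clipped at CAPACITY."""
    return min(CAPACITY, cur_khz * CAPACITY // ref_max_khz)


def invariant_util(raw_util: int, cur_khz: int, ref_max_khz: int) -> int:
    """Scale a raw utilization value by the frequency ratio."""
    return raw_util * freq_scale(cur_khz, ref_max_khz) // CAPACITY


# Hypothetical part: 2.0 GHz base, 3.4 GHz absolute boost, 3.0 GHz sustained.
BASE_KHZ, BOOST_KHZ, SUSTAINED_KHZ = 2_000_000, 3_400_000, 3_000_000

# Reference set to the absolute boost: a fully busy CPU running at the
# sustained frequency never reports full utilization.
print(invariant_util(CAPACITY, SUSTAINED_KHZ, BOOST_KHZ))

# Reference set too low (base): everything above base clips to CAPACITY,
# so the scale no longer distinguishes base from boost.
print(freq_scale(SUSTAINED_KHZ, BASE_KHZ), freq_scale(BOOST_KHZ, BASE_KHZ))
```

In this model, a reference that is too high leaves the CPU looking
permanently underutilized, while one that is too low saturates the scale.
That is why an intermediate value is being sought.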
* Re: [PATCH] sched/fair: check for idle core
2020-10-23 15:06 ` [PATCH] sched/fair: check for idle core Rafael J. Wysocki
@ 2020-10-27 3:01 ` Viresh Kumar
0 siblings, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2020-10-27 3:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Peter Zijlstra, Julia Lawall, Mel Gorman, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, linux-pm
On 23-10-20, 17:06, Rafael J. Wysocki wrote:
> On Friday, October 23, 2020 8:12:46 AM CEST Viresh Kumar wrote:
> > On 22-10-20, 13:45, Rafael J. Wysocki wrote:
> > > On Thursday, October 22, 2020 12:47:03 PM CEST Viresh Kumar wrote:
> > > > And I am not really sure why we always wanted this backup performance
> > > > governor to be there unless the said governors are built as modules.
> > >
> > > Apparently, some old drivers had problems with switching frequencies fast enough
> > > for ondemand to be used with them and the fallback was for those cases. AFAICS.
> >
> > Do we still need this?
>
> For reasonably modern hardware, I don't think so.
>
> > Or better ask those platforms to individually
> > enable both of them.
>
> But who knows what they are? :-)
I was planning to break them and let them complain :)
--
viresh
* Re: [PATCH v2] cpufreq: Avoid configuring old governors as default with intel_pstate
2020-10-23 15:15 ` [PATCH v2] " Rafael J. Wysocki
@ 2020-10-27 3:01 ` Viresh Kumar
0 siblings, 0 replies; 27+ messages in thread
From: Viresh Kumar @ 2020-10-27 3:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Peter Zijlstra, Julia Lawall, Mel Gorman, Ingo Molnar,
kernel-janitors, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
Steven Rostedt, Ben Segall, Daniel Bristot de Oliveira,
linux-kernel, Valentin Schneider, Gilles Muller,
srinivas.pandruvada, Linux PM, Len Brown
On 23-10-20, 17:15, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Commit 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by
> default without HWP") was meant to cause intel_pstate to be used
> in the passive mode with the schedutil governor on top of it, but
> it missed the case in which either "ondemand" or "conservative"
> was selected as the default governor in the existing kernel config,
> in which case the previous old governor configuration would be used,
> causing the default legacy governor to be used on top of intel_pstate
> instead of schedutil.
>
> Address this by preventing "ondemand" and "conservative" from being
> configured as the default cpufreq governor in the case when schedutil
> is the default choice for the default governor setting.
>
> [Note that the default cpufreq governor can still be set via the
> kernel command line if need be and that choice is not limited,
> so if anyone really wants to use one of the legacy governors by
> default, it can be achieved this way.]
>
> Fixes: 33aa46f252c7 ("cpufreq: intel_pstate: Use passive mode by default without HWP")
> Reported-by: Julia Lawall <julia.lawall@inria.fr>
> Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> The v2 addresses a review comment from Viresh regarding the expression format
> and adds a missing Reported-by for Julia.
>
> ---
> drivers/cpufreq/Kconfig | 2 ++
> 1 file changed, 2 insertions(+)
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
--
viresh
Thread overview: 27+ messages
[not found] <1603211879-1064-1-git-send-email-Julia.Lawall@inria.fr>
[not found] ` <34115486.YmRjPRKJaA@kreacher>
[not found] ` <20201022120213.GG2611@hirez.programming.kicks-ass.net>
2020-10-22 12:19 ` default cpufreq gov, was: [PATCH] sched/fair: check for idle core Rafael J. Wysocki
2020-10-22 12:29 ` Peter Zijlstra
2020-10-22 14:52 ` Mel Gorman
2020-10-22 14:58 ` Colin Ian King
2020-10-22 15:12 ` Phil Auld
2020-10-22 16:35 ` Mel Gorman
2020-10-22 17:59 ` Rafael J. Wysocki
2020-10-22 20:32 ` Mel Gorman
2020-10-22 20:39 ` Phil Auld
2020-10-22 15:25 ` Peter Zijlstra
2020-10-22 15:55 ` Rafael J. Wysocki
2020-10-22 16:29 ` Mel Gorman
2020-10-22 20:10 ` Giovanni Gherdovich
2020-10-22 20:16 ` Giovanni Gherdovich
2020-10-23 7:03 ` Peter Zijlstra
2020-10-23 17:46 ` Tom Lendacky
2020-10-26 19:52 ` Fontenot, Nathan
2020-10-22 15:45 ` A L
2020-10-22 15:55 ` Vincent Guittot
2020-10-23 5:11 ` Viresh Kumar
2020-10-22 16:23 ` [PATCH] cpufreq: Avoid configuring old governors as default with intel_pstate Rafael J. Wysocki
2020-10-23 6:17 ` Viresh Kumar
2020-10-23 11:59 ` Rafael J. Wysocki
2020-10-23 15:15 ` [PATCH v2] " Rafael J. Wysocki
2020-10-27 3:01 ` Viresh Kumar
[not found] ` <20201023061246.irzbrl62baoawmqv@vireshk-i7>
2020-10-23 15:06 ` [PATCH] sched/fair: check for idle core Rafael J. Wysocki
2020-10-27 3:01 ` Viresh Kumar