Linux Power Management development
 help / color / mirror / Atom feed
* Re: [PATCHSET v6 0/4] Split iowait into two states
       [not found] <20240819154259.215504-1-axboe@kernel.dk>
@ 2024-09-04 14:28 ` Peter Zijlstra
  2024-09-04 14:41   ` Jens Axboe
  2024-09-04 14:42   ` Rafael J. Wysocki
  0 siblings, 2 replies; 12+ messages in thread
From: Peter Zijlstra @ 2024-09-04 14:28 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, tglx, rafael, daniel.lezcano, linux-pm

On Mon, Aug 19, 2024 at 09:39:45AM -0600, Jens Axboe wrote:
> Hi,
> 
> This is v6 of the patchset where the current in_iowait state is split
> into two parts:
> 
> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
>    in terms of sleep and wakeup latencies.
> 2) The above, and also accounted as such in the iowait stats.
> 
> The current ->in_iowait covers both, this series splits it into two types
> of state so that each can be controlled seperately.

Yeah, but *WHY* !?!? I have some vague memories from last time around,
but patches should really keep this information.

> Patches 1..3 are prep patches, changing the type of
> task_struct->nr_iowait and adding helpers to manipulate the iowait counts.
> 
> Patch 4 does the actual splitting.
> 
> This has been sitting for a while, would be nice to get this queued up
> for 6.12. Comments welcome!

Ufff, and all this because menu-governor does something insane :-(

Rafael, why can't we simply remove this from menu? All the nr_iowait*()
users are basically broken and I would much rather fix broken rather
than work around broken like this.

That is, from where I'm sitting this all makes the io-wait situation far
worse instead of better.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-04 14:28 ` [PATCHSET v6 0/4] Split iowait into two states Peter Zijlstra
@ 2024-09-04 14:41   ` Jens Axboe
  2024-09-04 14:49     ` Jens Axboe
  2024-09-05  9:51     ` Peter Zijlstra
  2024-09-04 14:42   ` Rafael J. Wysocki
  1 sibling, 2 replies; 12+ messages in thread
From: Jens Axboe @ 2024-09-04 14:41 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, tglx, rafael, daniel.lezcano, linux-pm

On 9/4/24 8:28 AM, Peter Zijlstra wrote:
> On Mon, Aug 19, 2024 at 09:39:45AM -0600, Jens Axboe wrote:
>> Hi,
>>
>> This is v6 of the patchset where the current in_iowait state is split
>> into two parts:
>>
>> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
>>    in terms of sleep and wakeup latencies.
>> 2) The above, and also accounted as such in the iowait stats.
>>
>> The current ->in_iowait covers both, this series splits it into two types
>> of state so that each can be controlled seperately.
> 
> Yeah, but *WHY* !?!? I have some vague memories from last time around,
> but patches should really keep this information.

To decouple the frequency boost on short waits from the accounting side,
as lots of tooling equates iowait time with busy time and reports it as
such. Yeah that's garbage and a reporting issue, but decades of
education hasn't really improved on that. We should've dumped iowait
once we moved away from 1-2 processor system or had preemptible kernels,
but alas we did not and here we are in 2024.

>> Patches 1..3 are prep patches, changing the type of
>> task_struct->nr_iowait and adding helpers to manipulate the iowait counts.
>>
>> Patch 4 does the actual splitting.
>>
>> This has been sitting for a while, would be nice to get this queued up
>> for 6.12. Comments welcome!
> 
> Ufff, and all this because menu-governor does something insane :-(
> 
> Rafael, why can't we simply remove this from menu? All the nr_iowait*()
> users are basically broken and I would much rather fix broken rather
> than work around broken like this.
> 
> That is, from where I'm sitting this all makes the io-wait situation far
> worse instead of better.

IMHO what we need is a way to propagate expected wait times for a
sleeper. Right now iowait serves this purpose in a very crude way, in
that it doesn't really tell you the expected wait, just that it's a
short one.

If we simply remove iowait frequency boosting, then we'll have big
regressions particularly for low/sync storage IO.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-04 14:28 ` [PATCHSET v6 0/4] Split iowait into two states Peter Zijlstra
  2024-09-04 14:41   ` Jens Axboe
@ 2024-09-04 14:42   ` Rafael J. Wysocki
  2024-09-04 15:18     ` Rafael J. Wysocki
  1 sibling, 1 reply; 12+ messages in thread
From: Rafael J. Wysocki @ 2024-09-04 14:42 UTC (permalink / raw)
  To: Peter Zijlstra, Jens Axboe; +Cc: linux-kernel, tglx, daniel.lezcano, linux-pm

On Wed, Sep 4, 2024 at 4:28 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Aug 19, 2024 at 09:39:45AM -0600, Jens Axboe wrote:
> > Hi,
> >
> > This is v6 of the patchset where the current in_iowait state is split
> > into two parts:
> >
> > 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
> >    in terms of sleep and wakeup latencies.
> > 2) The above, and also accounted as such in the iowait stats.
> >
> > The current ->in_iowait covers both, this series splits it into two types
> > of state so that each can be controlled seperately.
>
> Yeah, but *WHY* !?!? I have some vague memories from last time around,
> but patches should really keep this information.
>
> > Patches 1..3 are prep patches, changing the type of
> > task_struct->nr_iowait and adding helpers to manipulate the iowait counts.
> >
> > Patch 4 does the actual splitting.
> >
> > This has been sitting for a while, would be nice to get this queued up
> > for 6.12. Comments welcome!
>
> Ufff, and all this because menu-governor does something insane :-(
>
> Rafael, why can't we simply remove this from menu?

Same reason as before: people use it and refuse to stop.

But this is mostly about the schedutil cpufreq governor that uses
iowait boosting.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-04 14:41   ` Jens Axboe
@ 2024-09-04 14:49     ` Jens Axboe
  2024-09-05  9:51     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2024-09-04 14:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, tglx, rafael, daniel.lezcano, linux-pm

On 9/4/24 8:41 AM, Jens Axboe wrote:
> On 9/4/24 8:28 AM, Peter Zijlstra wrote:
>> On Mon, Aug 19, 2024 at 09:39:45AM -0600, Jens Axboe wrote:
>>> Hi,
>>>
>>> This is v6 of the patchset where the current in_iowait state is split
>>> into two parts:
>>>
>>> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
>>>    in terms of sleep and wakeup latencies.
>>> 2) The above, and also accounted as such in the iowait stats.
>>>
>>> The current ->in_iowait covers both, this series splits it into two types
>>> of state so that each can be controlled seperately.
>>
>> Yeah, but *WHY* !?!? I have some vague memories from last time around,
>> but patches should really keep this information.
> 
> To decouple the frequency boost on short waits from the accounting side,
> as lots of tooling equates iowait time with busy time and reports it as
> such. Yeah that's garbage and a reporting issue, but decades of
> education hasn't really improved on that. We should've dumped iowait
> once we moved away from 1-2 processor system or had preemptible kernels,
> but alas we did not and here we are in 2024.

Forgot to mention, it's not *just* an educational thing - lots services
of services do mixed network and disk IO, obviously, and they do have
some interest in retaining iowait metrics on the disk side.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-04 14:42   ` Rafael J. Wysocki
@ 2024-09-04 15:18     ` Rafael J. Wysocki
  2024-09-05  9:29       ` Christian Loehle
  2024-09-05  9:36       ` Peter Zijlstra
  0 siblings, 2 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2024-09-04 15:18 UTC (permalink / raw)
  To: Peter Zijlstra, Jens Axboe
  Cc: linux-kernel, tglx, daniel.lezcano, linux-pm, Christian Loehle

On Wed, Sep 4, 2024 at 4:42 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Wed, Sep 4, 2024 at 4:28 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Aug 19, 2024 at 09:39:45AM -0600, Jens Axboe wrote:
> > > Hi,
> > >
> > > This is v6 of the patchset where the current in_iowait state is split
> > > into two parts:
> > >
> > > 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
> > >    in terms of sleep and wakeup latencies.
> > > 2) The above, and also accounted as such in the iowait stats.
> > >
> > > The current ->in_iowait covers both, this series splits it into two types
> > > of state so that each can be controlled seperately.
> >
> > Yeah, but *WHY* !?!? I have some vague memories from last time around,
> > but patches should really keep this information.
> >
> > > Patches 1..3 are prep patches, changing the type of
> > > task_struct->nr_iowait and adding helpers to manipulate the iowait counts.
> > >
> > > Patch 4 does the actual splitting.
> > >
> > > This has been sitting for a while, would be nice to get this queued up
> > > for 6.12. Comments welcome!
> >
> > Ufff, and all this because menu-governor does something insane :-(
> >
> > Rafael, why can't we simply remove this from menu?
>
> Same reason as before: people use it and refuse to stop.
>
> But this is mostly about the schedutil cpufreq governor that uses
> iowait boosting.

To be more precise, there are two different uses of "iowait" in PM.

One is the nr_iowait_cpu() call in menu_select() and the result of it
is used for two purposes: (1) select different sets of statistics
depending on whether or not this number is zero and (2) set a limit
for the idle state's exit latency that depends on this number (but
note that it only takes effect when the "iowait" statistics are used
in the first place).  Both of these are arguably questionable and it
is unclear to me whether or not they actually help and how much.

The other use is boosting CPU frequency in schedutil and intel_pstate
if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
p->in_iowait value in enqueue_task_fair().

AFAICS, the latter makes a major difference.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-04 15:18     ` Rafael J. Wysocki
@ 2024-09-05  9:29       ` Christian Loehle
  2024-09-05 10:40         ` Rafael J. Wysocki
  2024-09-05  9:36       ` Peter Zijlstra
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Loehle @ 2024-09-05  9:29 UTC (permalink / raw)
  To: Rafael J. Wysocki, Peter Zijlstra, Jens Axboe
  Cc: linux-kernel, tglx, daniel.lezcano, linux-pm

On 9/4/24 16:18, Rafael J. Wysocki wrote:
> On Wed, Sep 4, 2024 at 4:42 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>> On Wed, Sep 4, 2024 at 4:28 PM Peter Zijlstra <peterz@infradead.org> wrote:
>>>
>>> On Mon, Aug 19, 2024 at 09:39:45AM -0600, Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> This is v6 of the patchset where the current in_iowait state is split
>>>> into two parts:
>>>>
>>>> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
>>>>    in terms of sleep and wakeup latencies.
>>>> 2) The above, and also accounted as such in the iowait stats.
>>>>
>>>> The current ->in_iowait covers both, this series splits it into two types
>>>> of state so that each can be controlled seperately.
>>>
>>> Yeah, but *WHY* !?!? I have some vague memories from last time around,
>>> but patches should really keep this information.
>>>
>>>> Patches 1..3 are prep patches, changing the type of
>>>> task_struct->nr_iowait and adding helpers to manipulate the iowait counts.
>>>>
>>>> Patch 4 does the actual splitting.
>>>>
>>>> This has been sitting for a while, would be nice to get this queued up
>>>> for 6.12. Comments welcome!
>>>
>>> Ufff, and all this because menu-governor does something insane :-(
>>>
>>> Rafael, why can't we simply remove this from menu?
>>
>> Same reason as before: people use it and refuse to stop.
>>
>> But this is mostly about the schedutil cpufreq governor that uses
>> iowait boosting.
> 
> To be more precise, there are two different uses of "iowait" in PM.
> 
> One is the nr_iowait_cpu() call in menu_select() and the result of it
> is used for two purposes: (1) select different sets of statistics
> depending on whether or not this number is zero and (2) set a limit
> for the idle state's exit latency that depends on this number (but
> note that it only takes effect when the "iowait" statistics are used
> in the first place).  Both of these are arguably questionable and it
> is unclear to me whether or not they actually help and how much.

So from my perspective it doesn't, not significantly to justify it's
existence anyway. Either it doesn't actually matter for menu, or teo
is able to compete / outperform without relying on it.
Some caution is advised though this really depends on:
- Which idle states are available for the kernel to select.
- How accurate the kernel's view of the idle states is.

Both varies wildly.

> 
> The other use is boosting CPU frequency in schedutil and intel_pstate
> if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
> p->in_iowait value in enqueue_task_fair().
> 
> AFAICS, the latter makes a major difference.


Indeed, fortunately the impact is quite limited here.
But please, Rafael, Jens and Peter, feel free to share your comments
over here too:

https://lore.kernel.org/lkml/20240905092645.2885200-1-christian.loehle@arm.com/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-04 15:18     ` Rafael J. Wysocki
  2024-09-05  9:29       ` Christian Loehle
@ 2024-09-05  9:36       ` Peter Zijlstra
  2024-09-05 10:31         ` Christian Loehle
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2024-09-05  9:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jens Axboe, linux-kernel, tglx, daniel.lezcano, linux-pm,
	Christian Loehle

On Wed, Sep 04, 2024 at 05:18:57PM +0200, Rafael J. Wysocki wrote:

> To be more precise, there are two different uses of "iowait" in PM.
> 
> One is the nr_iowait_cpu() call in menu_select() and the result of it
> is used for two purposes: (1) select different sets of statistics
> depending on whether or not this number is zero and (2) set a limit
> for the idle state's exit latency that depends on this number (but
> note that it only takes effect when the "iowait" statistics are used
> in the first place).  Both of these are arguably questionable and it
> is unclear to me whether or not they actually help and how much.

So this one is very dubious, it relies on tasks getting back on the CPU
they went to sleep on -- not guaranteed at all.

> The other use is boosting CPU frequency in schedutil and intel_pstate
> if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
> p->in_iowait value in enqueue_task_fair().

This one is fine and makes sense. At this point we know that p is going
to run and where it is going to run.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-04 14:41   ` Jens Axboe
  2024-09-04 14:49     ` Jens Axboe
@ 2024-09-05  9:51     ` Peter Zijlstra
  1 sibling, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2024-09-05  9:51 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, tglx, rafael, daniel.lezcano, linux-pm

On Wed, Sep 04, 2024 at 08:41:23AM -0600, Jens Axboe wrote:

> > Yeah, but *WHY* !?!? I have some vague memories from last time around,
> > but patches should really keep this information.
> 
> To decouple the frequency boost on short waits from the accounting side,
> as lots of tooling equates iowait time with busy time and reports it as
> such. Yeah that's garbage and a reporting issue, but decades of
> education hasn't really improved on that. We should've dumped iowait
> once we moved away from 1-2 processor system or had preemptible kernels,
> but alas we did not and here we are in 2024.

There's 'WAIT' in the name, what broken piece of garbage reports it as
busy time? That has *NEVER* been right. Even on UP systems where IO-wait
is actually a sensible number, it is explicitly the time it *could* have
been busy, if only the IO were faster.

And are we really going to make the whole kernel situation worse just
because there's a bunch of broken userspace?

> >> Patches 1..3 are prep patches, changing the type of
> >> task_struct->nr_iowait and adding helpers to manipulate the iowait counts.
> >>
> >> Patch 4 does the actual splitting.
> >>
> >> This has been sitting for a while, would be nice to get this queued up
> >> for 6.12. Comments welcome!
> > 
> > Ufff, and all this because menu-governor does something insane :-(
> > 
> > Rafael, why can't we simply remove this from menu? All the nr_iowait*()
> > users are basically broken and I would much rather fix broken rather
> > than work around broken like this.
> > 
> > That is, from where I'm sitting this all makes the io-wait situation far
> > worse instead of better.
> 
> IMHO what we need is a way to propagate expected wait times for a
> sleeper. Right now iowait serves this purpose in a very crude way, in
> that it doesn't really tell you the expected wait, just that it's a
> short one.

Expected wait time is one thing, but you then *still* have no clue what
CPU it will get back on. Very typically it will be another CPU in the
same cache cluster. One that had no consideration of it when it went to
sleep.

A sleeping task is not associated with a CPU. There is a fundamental
mismatch there.

Using io-wait for idle state selection is very tricky because of this.

> If we simply remove iowait frequency boosting, then we'll have big
> regressions particularly for low/sync storage IO.

The frequency boosting thing I don't object to. That happend on wakeup
after we know that and where a task is going to run.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-05  9:36       ` Peter Zijlstra
@ 2024-09-05 10:31         ` Christian Loehle
  2024-09-05 11:00           ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Christian Loehle @ 2024-09-05 10:31 UTC (permalink / raw)
  To: Peter Zijlstra, Rafael J. Wysocki
  Cc: Jens Axboe, linux-kernel, tglx, daniel.lezcano, linux-pm

On 9/5/24 10:36, Peter Zijlstra wrote:
> On Wed, Sep 04, 2024 at 05:18:57PM +0200, Rafael J. Wysocki wrote:
> 
>> To be more precise, there are two different uses of "iowait" in PM.
>>
>> One is the nr_iowait_cpu() call in menu_select() and the result of it
>> is used for two purposes: (1) select different sets of statistics
>> depending on whether or not this number is zero and (2) set a limit
>> for the idle state's exit latency that depends on this number (but
>> note that it only takes effect when the "iowait" statistics are used
>> in the first place).  Both of these are arguably questionable and it
>> is unclear to me whether or not they actually help and how much.
> 
> So this one is very dubious, it relies on tasks getting back on the CPU
> they went to sleep on -- not guaranteed at all.
> 
>> The other use is boosting CPU frequency in schedutil and intel_pstate
>> if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
>> p->in_iowait value in enqueue_task_fair().
> 
> This one is fine and makes sense. At this point we know that p is going
> to run and where it is going to run.

On any even remotely realistic scenario and hardware though the boost
isn't effective until the next enqueue-dequeue-cycle, so if your above
objection is based on that, I would object here too, using your argument.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-05  9:29       ` Christian Loehle
@ 2024-09-05 10:40         ` Rafael J. Wysocki
  0 siblings, 0 replies; 12+ messages in thread
From: Rafael J. Wysocki @ 2024-09-05 10:40 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Rafael J. Wysocki, Peter Zijlstra, Jens Axboe, linux-kernel, tglx,
	daniel.lezcano, linux-pm

On Thu, Sep 5, 2024 at 11:29 AM Christian Loehle
<christian.loehle@arm.com> wrote:
>
> On 9/4/24 16:18, Rafael J. Wysocki wrote:
> > On Wed, Sep 4, 2024 at 4:42 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >>
> >> On Wed, Sep 4, 2024 at 4:28 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >>>
> >>> On Mon, Aug 19, 2024 at 09:39:45AM -0600, Jens Axboe wrote:
> >>>> Hi,
> >>>>
> >>>> This is v6 of the patchset where the current in_iowait state is split
> >>>> into two parts:
> >>>>
> >>>> 1) The "task is sleeping waiting on IO", and would like cpufreq goodness
> >>>>    in terms of sleep and wakeup latencies.
> >>>> 2) The above, and also accounted as such in the iowait stats.
> >>>>
> >>>> The current ->in_iowait covers both, this series splits it into two types
> >>>> of state so that each can be controlled seperately.
> >>>
> >>> Yeah, but *WHY* !?!? I have some vague memories from last time around,
> >>> but patches should really keep this information.
> >>>
> >>>> Patches 1..3 are prep patches, changing the type of
> >>>> task_struct->nr_iowait and adding helpers to manipulate the iowait counts.
> >>>>
> >>>> Patch 4 does the actual splitting.
> >>>>
> >>>> This has been sitting for a while, would be nice to get this queued up
> >>>> for 6.12. Comments welcome!
> >>>
> >>> Ufff, and all this because menu-governor does something insane :-(
> >>>
> >>> Rafael, why can't we simply remove this from menu?
> >>
> >> Same reason as before: people use it and refuse to stop.
> >>
> >> But this is mostly about the schedutil cpufreq governor that uses
> >> iowait boosting.
> >
> > To be more precise, there are two different uses of "iowait" in PM.
> >
> > One is the nr_iowait_cpu() call in menu_select() and the result of it
> > is used for two purposes: (1) select different sets of statistics
> > depending on whether or not this number is zero and (2) set a limit
> > for the idle state's exit latency that depends on this number (but
> > note that it only takes effect when the "iowait" statistics are used
> > in the first place).  Both of these are arguably questionable and it
> > is unclear to me whether or not they actually help and how much.
>
> So from my perspective it doesn't, not significantly to justify it's
> existence anyway. Either it doesn't actually matter for menu, or teo
> is able to compete / outperform without relying on it.

Thanks for this feedback!

I'm actually going to try to remove that stuff from menu and see if
anyone cries bloody murder.

> Some caution is advised though this really depends on:
> - Which idle states are available for the kernel to select.
> - How accurate the kernel's view of the idle states is.
>
> Both varies wildly.

True, but let's see what the feedback is.

> > The other use is boosting CPU frequency in schedutil and intel_pstate
> > if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
> > p->in_iowait value in enqueue_task_fair().
> >
> > AFAICS, the latter makes a major difference.
>
>
> Indeed, fortunately the impact is quite limited here.
> But please, Rafael, Jens and Peter, feel free to share your comments
> over here too:
>
> https://lore.kernel.org/lkml/20240905092645.2885200-1-christian.loehle@arm.com/

I will.

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-05 10:31         ` Christian Loehle
@ 2024-09-05 11:00           ` Peter Zijlstra
  2024-09-05 11:09             ` Christian Loehle
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2024-09-05 11:00 UTC (permalink / raw)
  To: Christian Loehle
  Cc: Rafael J. Wysocki, Jens Axboe, linux-kernel, tglx, daniel.lezcano,
	linux-pm

On Thu, Sep 05, 2024 at 11:31:09AM +0100, Christian Loehle wrote:
> On 9/5/24 10:36, Peter Zijlstra wrote:
> > On Wed, Sep 04, 2024 at 05:18:57PM +0200, Rafael J. Wysocki wrote:
> > 
> >> To be more precise, there are two different uses of "iowait" in PM.
> >>
> >> One is the nr_iowait_cpu() call in menu_select() and the result of it
> >> is used for two purposes: (1) select different sets of statistics
> >> depending on whether or not this number is zero and (2) set a limit
> >> for the idle state's exit latency that depends on this number (but
> >> note that it only takes effect when the "iowait" statistics are used
> >> in the first place).  Both of these are arguably questionable and it
> >> is unclear to me whether or not they actually help and how much.
> > 
> > So this one is very dubious, it relies on tasks getting back on the CPU
> > they went to sleep on -- not guaranteed at all.
> > 
> >> The other use is boosting CPU frequency in schedutil and intel_pstate
> >> if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
> >> p->in_iowait value in enqueue_task_fair().
> > 
> > This one is fine and makes sense. At this point we know that p is going
> > to run and where it is going to run.
> 
> On any even remotely realistic scenario and hardware though the boost
> isn't effective until the next enqueue-dequeue-cycle, so if your above
> objection is based on that, I would object here too, using your argument.

That is a quality of implementation issue with schedutil no?

The whole notion that the wait was for feeding external hardware, and
thus the normal utilization metric doesn't work right thing is still
valid.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHSET v6 0/4] Split iowait into two states
  2024-09-05 11:00           ` Peter Zijlstra
@ 2024-09-05 11:09             ` Christian Loehle
  0 siblings, 0 replies; 12+ messages in thread
From: Christian Loehle @ 2024-09-05 11:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Rafael J. Wysocki, Jens Axboe, linux-kernel, tglx, daniel.lezcano,
	linux-pm

On 9/5/24 12:00, Peter Zijlstra wrote:
> On Thu, Sep 05, 2024 at 11:31:09AM +0100, Christian Loehle wrote:
>> On 9/5/24 10:36, Peter Zijlstra wrote:
>>> On Wed, Sep 04, 2024 at 05:18:57PM +0200, Rafael J. Wysocki wrote:
>>>
>>>> To be more precise, there are two different uses of "iowait" in PM.
>>>>
>>>> One is the nr_iowait_cpu() call in menu_select() and the result of it
>>>> is used for two purposes: (1) select different sets of statistics
>>>> depending on whether or not this number is zero and (2) set a limit
>>>> for the idle state's exit latency that depends on this number (but
>>>> note that it only takes effect when the "iowait" statistics are used
>>>> in the first place).  Both of these are arguably questionable and it
>>>> is unclear to me whether or not they actually help and how much.
>>>
>>> So this one is very dubious, it relies on tasks getting back on the CPU
>>> they went to sleep on -- not guaranteed at all.
>>>
>>>> The other use is boosting CPU frequency in schedutil and intel_pstate
>>>> if SCHED_CPUFREQ_IOWAIT is passed to them which in turn depends on the
>>>> p->in_iowait value in enqueue_task_fair().
>>>
>>> This one is fine and makes sense. At this point we know that p is going
>>> to run and where it is going to run.
>>
>> On any even remotely realistic scenario and hardware though the boost
>> isn't effective until the next enqueue-dequeue-cycle, so if your above
>> objection is based on that, I would object here too, using your argument.
> 
> That is a quality of implementation issue with schedutil no?

Is it? So there is a latency from requesting a new frequency and actually
running on it, for both x86 and arm platforms out there that should still
be a few usecs at least during which the task is running. The task will
dequeue quite soon (otherwise it will build up utilization and then it's
not one we consider problematic wrt to this io utilization problem anyway).
Just to be clear, I'm assuming fast_switch here and then I think schedutil's
implementation isn't the problem, rather the premise of the underlying
problem is.
I have tried to elaborate on that in the RFC I've posted and linked though.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-09-05 11:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20240819154259.215504-1-axboe@kernel.dk>
2024-09-04 14:28 ` [PATCHSET v6 0/4] Split iowait into two states Peter Zijlstra
2024-09-04 14:41   ` Jens Axboe
2024-09-04 14:49     ` Jens Axboe
2024-09-05  9:51     ` Peter Zijlstra
2024-09-04 14:42   ` Rafael J. Wysocki
2024-09-04 15:18     ` Rafael J. Wysocki
2024-09-05  9:29       ` Christian Loehle
2024-09-05 10:40         ` Rafael J. Wysocki
2024-09-05  9:36       ` Peter Zijlstra
2024-09-05 10:31         ` Christian Loehle
2024-09-05 11:00           ` Peter Zijlstra
2024-09-05 11:09             ` Christian Loehle

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox