* sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization
@ 2020-09-18 17:27 Tejun Heo
2020-09-24 11:50 ` Peter Zijlstra
0 siblings, 1 reply; 5+ messages in thread
From: Tejun Heo @ 2020-09-18 17:27 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, Rik van Riel
Hello,
Peter, I noticed /proc/stat::procs_blocked going U64_MAX transiently once in
the blue moon without any other persistent issues. After looking at the code
with Rik for a bit, the culprit seems to be c6e7bd7afaeb ("sched/core:
Optimize ttwu() spinning on p->on_cpu") - it changed where ttwu dec's
nr_iowait and it looks like that can happen before the target task gets to
inc.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization 2020-09-18 17:27 sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization Tejun Heo @ 2020-09-24 11:50 ` Peter Zijlstra 2020-09-24 14:27 ` Tejun Heo 0 siblings, 1 reply; 5+ messages in thread From: Peter Zijlstra @ 2020-09-24 11:50 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, Rik van Riel On Fri, Sep 18, 2020 at 01:27:59PM -0400, Tejun Heo wrote: > Hello, > > Peter, I noticed /proc/stat::procs_blocked going U64_MAX transiently once in > the blue moon without any other persistent issues. After looking at the code > with Rik for a bit, the culprit seems to be c6e7bd7afaeb ("sched/core: > Optimize ttwu() spinning on p->on_cpu") - it changed where ttwu dec's > nr_iowait and it looks like that can happen before the target task gets to > inc. Hurmph.. I suppose you're right :/ And this is an actual problem? I think the below should cure that, but blergh, not nice. If you could confirm, I'll try and think of something nicer. diff --git a/kernel/sched/core.c b/kernel/sched/core.c index ebb90572e10d..259a4ae8ab8e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -2505,7 +2505,12 @@ ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags, #ifdef CONFIG_SMP if (wake_flags & WF_MIGRATED) en_flags |= ENQUEUE_MIGRATED; + else #endif + if (p->in_iowait) { + delayacct_blkio_end(p); + atomic_dec(&task_rq(p)->nr_iowait); + } activate_task(rq, p, en_flags); ttwu_do_wakeup(rq, p, wake_flags, rf); @@ -2892,11 +2897,6 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) if (READ_ONCE(p->on_rq) && ttwu_runnable(p, wake_flags)) goto unlock; - if (p->in_iowait) { - delayacct_blkio_end(p); - atomic_dec(&task_rq(p)->nr_iowait); - } - #ifdef CONFIG_SMP /* * Ensure we load p->on_cpu _after_ p->on_rq, otherwise it would be @@ -2967,6 +2967,11 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags) cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags); if (task_cpu(p) != cpu) { + if (p->in_iowait) { + delayacct_blkio_end(p); + atomic_dec(&task_rq(p)->nr_iowait); + } + wake_flags |= WF_MIGRATED; psi_ttwu_dequeue(p); set_task_cpu(p, cpu); ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization 2020-09-24 11:50 ` Peter Zijlstra @ 2020-09-24 14:27 ` Tejun Heo 2020-09-24 14:50 ` Peter Zijlstra 0 siblings, 1 reply; 5+ messages in thread From: Tejun Heo @ 2020-09-24 14:27 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Rik van Riel Hello, On Thu, Sep 24, 2020 at 01:50:42PM +0200, Peter Zijlstra wrote: > Hurmph.. I suppose you're right :/ And this is an actual problem? Yeah, this got exposed to userspace as a full 64bit number which overflowed u32 conversion in the rust procfs library which aborted a program I was working on multiple times over several months. On a more theoretical side, it might also surprise nr_iowait_cpu() users. However, a real problem that may be. > I think the below should cure that, but blergh, not nice. If you could > confirm, I'll try and think of something nicer. Rik suggested that it'd be sufficient to return 0 on underflow especially given that 0 is actually the right number to describe the state. So, maybe that can be a nicer code-wise? Thanks. -- tejun ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization 2020-09-24 14:27 ` Tejun Heo @ 2020-09-24 14:50 ` Peter Zijlstra 2020-09-24 14:57 ` Tejun Heo 0 siblings, 1 reply; 5+ messages in thread From: Peter Zijlstra @ 2020-09-24 14:50 UTC (permalink / raw) To: Tejun Heo; +Cc: linux-kernel, Rik van Riel On Thu, Sep 24, 2020 at 10:27:51AM -0400, Tejun Heo wrote: > Hello, > > On Thu, Sep 24, 2020 at 01:50:42PM +0200, Peter Zijlstra wrote: > > Hurmph.. I suppose you're right :/ And this is an actual problem? > > Yeah, this got exposed to userspace as a full 64bit number which overflowed > u32 conversion in the rust procfs library which aborted a program I was > working on multiple times over several months. > > On a more theoretical side, it might also surprise nr_iowait_cpu() users. > However, a real problem that may be. > > > I think the below should cure that, but blergh, not nice. If you could > > confirm, I'll try and think of something nicer. > > Rik suggested that it'd be sufficient to return 0 on underflow especially > given that 0 is actually the right number to describe the state. So, maybe > that can be a nicer code-wise? I worry about things where one CPU has a positive value and one or more (other) CPUs have a temporary negative value. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization 2020-09-24 14:50 ` Peter Zijlstra @ 2020-09-24 14:57 ` Tejun Heo 0 siblings, 0 replies; 5+ messages in thread From: Tejun Heo @ 2020-09-24 14:57 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Rik van Riel On Thu, Sep 24, 2020 at 04:50:41PM +0200, Peter Zijlstra wrote: > > Rik suggested that it'd be sufficient to return 0 on underflow especially > > given that 0 is actually the right number to describe the state. So, maybe > > that can be a nicer code-wise? > > I worry about things where one CPU has a positive value and one or more > (other) CPUs have a temporary negative value. I think it'd make more sense to max'ing them per-cpu as that's the right per-cpu state. nr_iowait_cpu() needs that anyway, so maybe the summing function can just use nr_iowait_cpu()? Thanks. -- tejun ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-09-24 14:58 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2020-09-18 17:27 sched: rq->nr_iowait transiently going negative after the recent p->on_cpu optimization Tejun Heo 2020-09-24 11:50 ` Peter Zijlstra 2020-09-24 14:27 ` Tejun Heo 2020-09-24 14:50 ` Peter Zijlstra 2020-09-24 14:57 ` Tejun Heo
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.