From: Mel Gorman <mgorman@techsingularity.net>
To: Dave Chinner <david@fromorbit.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
Phil Auld <pauld@redhat.com>, Ming Lei <ming.lei@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched, fair: Allow a per-cpu kthread waking a task to stack on the same CPU
Date: Thu, 30 Jan 2020 00:50:02 +0000 [thread overview]
Message-ID: <20200130005002.GG3466@techsingularity.net> (raw)
In-Reply-To: <20200129220021.GN18610@dread.disaster.area>
On Thu, Jan 30, 2020 at 09:00:21AM +1100, Dave Chinner wrote:
> On Wed, Jan 29, 2020 at 06:38:52PM +0100, Peter Zijlstra wrote:
> > On Tue, Jan 28, 2020 at 09:10:12AM +0000, Mel Gorman wrote:
> > > Peter, Ingo and Vincent -- I know the timing is bad due to the merge
> > > window but do you have any thoughts on allowing select_idle_sibling to
> > > stack a wakee task on the same CPU as a waker in this specific case?
> >
> > I sort of see, but *groan*...
> >
> > so if the kworker unlocks a contended mutex/rwsem/completion...
> >
> > I suppose the fact that it limits it to tasks that were running on the
> > same CPU limits the impact if we do get it wrong.
> >
> > Elsewhere you write:
> >
> > > I would prefer the wakeup code did not have to signal that it's a
> > > synchronous wakeup. Sync wakeups so exist but callers got it wrong many
> > > times where stacking was allowed and then the waker did not go to sleep.
> > > While the chain of events are related, they are not related in a very
> > > obvious way. I think it's much safer to keep this as a scheduler
> > > heuristic instead of depending on callers to have sufficient knowledge
> > > of the scheduler implementation.
> >
> > That is true; the existing WF_SYNC has caused many issues for maybe
> > being too strong.
> >
> > But what if we create a new hint that combines both these ideas? Say
> > WF_COMPLETE and subject that to these same criteria. This way we can
> > eliminate wakeups from locks and such (they won't have this set).
> >
> > Or am I just making things complicated again?
>
> I suspect this is making it complicated again, because it requires
> the people who maintain the code that is using workqueues to
> understand when they might need to use a special wakeup interface in
> the work function. And that includes code that currently calls
> wake_up_all() because there can be hundreds of independent tasks
> waiting on the IO completion (e.g all the wait queues in the XFS
> journal code can (and do) have multiple threads waiting on them).
>
> IOWs, requiring a special flag just to optimise this specific case
> (i.e. single dependent waiter on same CPU as the kworker) when the
> adverse behaviour is both hardware and workload dependent means it
> just won't get used correctly or reliably.
>
I agree. Pick any of Rusty's rules from "-2 Read the implementation
and you'll get it wrong" all the way down to "-10 It's impossible to
get right.".
> Hence I'd much prefer the kernel detects and dynamically handles
> this situation at runtime, because this pattern of workqueue usage
> is already quite common and will only become more widespread as we
> progress towards async processing of syscalls.
>
To be fair, as Peter says, the kernel patch may not detect this
properly. There are corner cases where it will get it wrong. My thinking is
that *at the moment* when the heuristic is wrong, it's almost certainly
because the machine was so over-saturated such that multiple related
tasks are stacking anyway.
Depending on how async syscalls proceeds, this might get turn out to the
the wrong heuristic and an API change will be required. At least if that
happens, we'll have a few use cases to help guide what the API change
should look like so we do not end up in WF_SYNC hell again.
--
Mel Gorman
SUSE Labs
next prev parent reply other threads:[~2020-01-30 0:50 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-27 14:36 [PATCH] sched, fair: Allow a per-cpu kthread waking a task to stack on the same CPU Mel Gorman
2020-01-27 22:32 ` Dave Chinner
2020-01-28 1:19 ` Mel Gorman
2020-01-28 9:10 ` Mel Gorman
2020-01-29 17:38 ` Peter Zijlstra
2020-01-29 22:00 ` Dave Chinner
2020-01-30 0:50 ` Mel Gorman [this message]
2020-01-30 0:43 ` Mel Gorman
2020-01-30 8:06 ` Peter Zijlstra
2020-01-30 8:55 ` Mel Gorman
2020-01-28 14:24 ` Mel Gorman
2020-01-28 22:21 ` Dave Chinner
2020-01-29 10:53 ` Mel Gorman
[not found] <20200128100643.3016-1-hdanton@sina.com>
2020-01-28 10:32 ` Mel Gorman
[not found] ` <20200128130837.11136-1-hdanton@sina.com>
2020-01-28 13:41 ` Mel Gorman
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200130005002.GG3466@techsingularity.net \
--to=mgorman@techsingularity.net \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=mingo@redhat.com \
--cc=pauld@redhat.com \
--cc=peterz@infradead.org \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).