linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>,
	Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
	Phil Auld <pauld@redhat.com>, Ming Lei <ming.lei@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched, fair: Allow a per-cpu kthread waking a task to stack on the same CPU
Date: Thu, 30 Jan 2020 00:43:34 +0000	[thread overview]
Message-ID: <20200130004334.GF3466@techsingularity.net> (raw)
In-Reply-To: <20200129173852.GP14914@hirez.programming.kicks-ass.net>

On Wed, Jan 29, 2020 at 06:38:52PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 28, 2020 at 09:10:12AM +0000, Mel Gorman wrote:
> > Peter, Ingo and Vincent -- I know the timing is bad due to the merge
> > window but do you have any thoughts on allowing select_idle_sibling to
> > stack a wakee task on the same CPU as a waker in this specific case?
> 
> I sort of see, but *groan*...
> 

This is the reaction I kinda expected. Sync wakeups and select_idle_sibling
probably caused someone PTSD at some point before my time in kernel/sched/.

> so if the kworker unlocks a contended mutex/rwsem/completion...
> 
> I suppose the fact that it limits it to tasks that were running on the
> same CPU limits the impact if we do get it wrong.
> 

And it's limited to no other task currently running on the
CPU. Now, potentially multiple sleepers are on that CPU waiting for
a mutex/rwsem/completion but it's very unlikely and mostly likely due
to the machine being saturated in which case searching for an idle CPU
will probably fail. It would also be bound by a small window after the
first wakeup before the task becomes runnable before the nr_running check
mitigages the problem. Besides, if the sleeping task is waiting on the
lock, it *is* related to the kworker which is probably finished.

In other words, even this patches worst-case behaviour does not seem
that bad.

> Elsewhere you write:
> 
> > I would prefer the wakeup code did not have to signal that it's a
> > synchronous wakeup. Sync wakeups so exist but callers got it wrong many
> > times where stacking was allowed and then the waker did not go to sleep.
> > While the chain of events are related, they are not related in a very
> > obvious way. I think it's much safer to keep this as a scheduler
> > heuristic instead of depending on callers to have sufficient knowledge
> > of the scheduler implementation.
> 
> That is true; the existing WF_SYNC has caused many issues for maybe
> being too strong.
> 

Exactly. It ended up being almost ignored. It basically just means that
the waker CPU may be used as the target for wake_affine because the
users were not obeying the contract.

> But what if we create a new hint that combines both these ideas? Say
> WF_COMPLETE and subject that to these same criteria. This way we can
> eliminate wakeups from locks and such (they won't have this set).
> 

I think that'll end up with three consequences. First, it falls foul
of Rusty's Rules of API Design[1] because even if people read the
implementation and the documentation, they might still get it wrong like
what happened with WF_SYNC.  Second, some other subsystem will think it's
special and use the flag because it happens to work for one benchmark or
worse, they happened to copy/paste the code for some reason. Finally,
the workqueue implementation may change in some way that renders the
use of the flag incorrect. With this patch, if workqueues change design,
it's more likely the patch becomes a no-op.

> Or am I just making things complicated again?

I think so but I also wrote the patch so I'm biased. I think the callers
would be forced into an API change if it's a common pattern where multiple
unbound tasks can sleep on the same CPU waiting on a single kworker and
I struggle to think of such an example.

The length of time it took this issue to be detected and patched is
indicative that not everyone is familiar with kernel/sched/fair.c and its
consequences.  If they were, chances are they would have implemented some
mental hack like binding a task to a single CPU until the IO completes.

[1] http://sweng.the-davies.net/Home/rustys-api-design-manifesto

-- 
Mel Gorman
SUSE Labs

  parent reply	other threads:[~2020-01-30  0:43 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-27 14:36 [PATCH] sched, fair: Allow a per-cpu kthread waking a task to stack on the same CPU Mel Gorman
2020-01-27 22:32 ` Dave Chinner
2020-01-28  1:19   ` Mel Gorman
2020-01-28  9:10     ` Mel Gorman
2020-01-29 17:38       ` Peter Zijlstra
2020-01-29 22:00         ` Dave Chinner
2020-01-30  0:50           ` Mel Gorman
2020-01-30  0:43         ` Mel Gorman [this message]
2020-01-30  8:06           ` Peter Zijlstra
2020-01-30  8:55             ` Mel Gorman
2020-01-28 14:24   ` Mel Gorman
2020-01-28 22:21     ` Dave Chinner
2020-01-29 10:53       ` Mel Gorman
     [not found] <20200128100643.3016-1-hdanton@sina.com>
2020-01-28 10:32 ` Mel Gorman
     [not found] ` <20200128130837.11136-1-hdanton@sina.com>
2020-01-28 13:41   ` Mel Gorman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200130004334.GF3466@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=mingo@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).