From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: heiko.carstens@de.ibm.com, Tejun Heo <tj@kernel.org>,
Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
Date: Wed, 14 Oct 2015 17:00:23 +0200 [thread overview]
Message-ID: <20151014150023.GT3604@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20151012173402.GA29647@redhat.com>
On Mon, Oct 12, 2015 at 07:34:02PM +0200, Oleg Nesterov wrote:
> On 10/12, Peter Zijlstra wrote:
> >
> > On Sat, Oct 10, 2015 at 08:53:09PM +0200, Oleg Nesterov wrote:
> > > I do not understand the cpu_active() check in select_fallback_rq().
> > > x86 doesn't need it, and the recent commit dd9d3843755d "sched: Fix
> > > cpu_active_mask/cpu_online_mask race" documents the fact that on any
> > > architecture we can ignore !active starting from CPU_ONLINE stage.
> > >
> > > But any possible reason why do we need this check in "fallback" must
> > > equally apply to select_task_rq().
> >
> > So the reason, from vague memory, is that we want to allow per-cpu
> > threads to start/stop before/after active.
>
> I simply can't understand... To me it looks as if we can simply remove
> the cpu_active() check in select_fallback_rq().
>
> If we race with cpu_down(), cpu_active() is cleared by sched_cpu_inactive()
> which is CPU_PRI_SCHED_INACTIVE = INT_MIN + 1 priority, so it seems that
> only cpuset_cpu_inactive() can be called after that and this check is
> obviously racy anyway.
Racy yes; however select_task_rq() is called with locks held, therefore
preemption disabled. This serializes us against the sync_sched() in
cpu_down().
And note that smpboot_park_threads() runs until after the sync_sched().
So you cannot add cpu_active() to select_task_rq() for it must allow
per-cpu tasks to remain on the cpu.
Also, I think smpboot_park_threads() is called too early, we can still
run random tasks at that point which might rely on having the per-cpu
tasks around. But we cannot run it later because once the stopper thread
runs, the per-cpu threads cannot schedule to park themselves :/
Lets keep an eye on Thomas' rework to see if this gets sorted.
> So why we can't simply remove select_fallback_rq()->cpu_active() ?
It would allow migration onto a CPU that's known to be going away. That
doesn't make sense.
> > active 'should' really only govern load-balancer bits or so.
>
> OK, I don't understand the usage of cpu_active_mask in kernel/sched/,
> and of course I could easily miss something else. But I doubt very
> much this check can save us from something bad simply because it is
> racy.
But safely so; if we observe active, it must stay 'active' until we
enable preemption again.
The whole point of active is to limit the load-balancer; such that we
can rebuild the sched-domains after the fact. We used to have to rebuild
to the sched-domains early, which got into trouble (forgot details,
again).
And we had to have a new mask, because online was too late, there was
(forgot details) a state where we were still online but new are not
welcome.
Also, it makes sense to stop accepting new tasks before you take out the
old ones.
> Yes, we also have synchronize_sched() after CPU_DOWN_PREPARE, but
> the only thing we do before stop_machine() is smpboot_park_threads().
> So this can help (say) set_cpus_allowed_ptr() which uses active_mask,
> but I don't see how this can connect to ttwu paths.
Say you have a task A running on CPU4, it goes to sleep. We unplug CPU4.
We then commence unplugging CPU3, concurrently we wake A. A finds that
its old CPU (4) is no longer online and ends up in fallback looking for
a new one.
Without the cpu_active() test in fallback, we could place A on CPU3,
which is on its way down, just not quite dead.
Even if this is not strictly buggy its a daft thing to do and tempting
fate.
> And again. If there is some reason we can not do this, say, because
> ipi to non-active CPU can trigger a warning, or something else, then
> we can hit the same problem because select_task_rq() does not check
> cpu_active(). The kernel threads like stoppers/workers are probably
> fine in any case, but userspace can trigger the race with cpu_up/down.
So the only 'race' is observing active while we're stuck in
sync_sched(), which is meant to be safe. It also provides us with a
known spot after which we're sure no new tasks will come onto our dying
CPU.
next prev parent reply other threads:[~2015-10-14 15:00 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-10-10 18:52 [PATCH 0/3] (Was: sched: start stopper early) Oleg Nesterov
2015-10-10 18:53 ` [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Oleg Nesterov
2015-10-11 18:04 ` Oleg Nesterov
2015-10-11 18:57 ` Thomas Gleixner
2015-10-12 12:16 ` Peter Zijlstra
2015-10-12 17:34 ` Oleg Nesterov
2015-10-14 15:00 ` Peter Zijlstra [this message]
2015-10-14 20:05 ` Oleg Nesterov
2015-10-14 20:35 ` Peter Zijlstra
2015-10-10 18:53 ` [PATCH 2/3] sched: change select_fallback_rq() to use for_each_cpu_and() Oleg Nesterov
2015-10-10 18:53 ` [PATCH 3/3] sched: don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS Oleg Nesterov
2015-10-20 9:35 ` [tip:sched/core] sched: Don't scan all-offline -> cpus_allowed " tip-bot for Oleg Nesterov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20151014150023.GT3604@twins.programming.kicks-ass.net \
--to=peterz@infradead.org \
--cc=heiko.carstens@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=riel@redhat.com \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
--cc=vkuznets@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.