All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oleg Nesterov <oleg@redhat.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: heiko.carstens@de.ibm.com, Tejun Heo <tj@kernel.org>,
	Ingo Molnar <mingo@kernel.org>, Rik van Riel <riel@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq()
Date: Wed, 14 Oct 2015 22:05:16 +0200	[thread overview]
Message-ID: <20151014200516.GA11157@redhat.com> (raw)
In-Reply-To: <20151014150023.GT3604@twins.programming.kicks-ass.net>

On 10/14, Peter Zijlstra wrote:
>
> On Mon, Oct 12, 2015 at 07:34:02PM +0200, Oleg Nesterov wrote:
> >
> > I simply can't understand... To me it looks as if we can simply remove
> > the cpu_active() check in select_fallback_rq().
> >
> > If we race with cpu_down(), cpu_active() is cleared by sched_cpu_inactive()
> > which is CPU_PRI_SCHED_INACTIVE  = INT_MIN + 1 priority, so it seems that
> > only cpuset_cpu_inactive() can be called after that and this check is
> > obviously racy anyway.
>
> Racy yes; however select_task_rq() is called with locks held, therefore
> preemption disabled. This serializes us against the sync_sched() in
> cpu_down().

Yes, I even mentioned this below. And this only means that if you see
cpu_active() == T you know that smpboot_park_threads() can't be called
until preempt_disable().

cpu_active() can become false right after the check, preempt_disable()
can't protect from __cpu_notify(CPU_DOWN_PREPARE).

> And note that smpboot_park_threads() runs until after the sync_sched().
> So you cannot add cpu_active() to select_task_rq() for it must allow
> per-cpu tasks to remain on the cpu.

Yes, yes, this is why this patch triggers BUG_ON() before ->park() in
smpboot_thread_fn().

> > So why we can't simply remove select_fallback_rq()->cpu_active() ?
>
> It would allow migration onto a CPU that's known to be going away. That
> doesn't make sense.

But this is not that bad? This is very unlikely and CPU_DYING will
migrate the woken task again.

> > OK, I don't understand the usage of cpu_active_mask in kernel/sched/,
> > and of course I could easily miss something else. But I doubt very
> > much this check can save us from something bad simply because it is
> > racy.
>
> But safely so; if we observe active, it must stay 'active' until we
> enable preemption again.

I don't undertand what "stay active" actually means here. cpu_active()
is not stable. But probably this doesn't matter.

> > Yes, we also have synchronize_sched() after CPU_DOWN_PREPARE, but
> > the only thing we do before stop_machine() is smpboot_park_threads().
> > So this can help (say) set_cpus_allowed_ptr() which uses active_mask,
> > but I don't see how this can connect to ttwu paths.
>
> Say you have a task A running on CPU4, it goes to sleep. We unplug CPU4.
> We then commence unplugging CPU3, concurrently we wake A. A finds that
> its old CPU (4) is no longer online and ends up in fallback looking for
> a new one.
>
> Without the cpu_active() test in fallback, we could place A on CPU3,
> which is on its way down, just not quite dead.

The same can happen with the cpu_active() check we currently have.
And note again that CPU_PRI_SCHED_INACTIVE == INT_MIN + 1. When
sched_cpu_inactive() clears cpu_active() all other callbacks (except
cpuset_cpu_inactive() were already fired.

> Even if this is not strictly buggy its a daft thing to do and tempting
> fate.

See above...

And if we remove this check from select_fallback_rq() we can revert
dd9d3843755da95f6 "sched: Fix cpu_active_mask/cpu_online_mask race".

And revert 875ebe94 "powerpc/smp: Wait until secondaries are active & online".
And a1307bba "s390/smp: wait until secondaries are active & online".

> > And again. If there is some reason we can not do this, say, because
> > ipi to non-active CPU can trigger a warning, or something else, then
> > we can hit the same problem because select_task_rq() does not check
> > cpu_active(). The kernel threads like stoppers/workers are probably
> > fine in any case, but userspace can trigger the race with cpu_up/down.
>
> So the only 'race' is observing active while we're stuck in
> sync_sched(),

Why? select_task_rq() will see cpu_online() == T after sync_sched().

> which is meant to be safe.

Yes, because that damn cpu_active() check doesn't look strictly necessary ;)
Or I misunderstood.

> It also provides us with a
> known spot after which we're sure no new tasks will come onto our dying
> CPU.

You mean that ttwu(task) can't migrate this task to the dying CPU
after synchronize_rcu() and before stop_machine(), this is true.

OK, lets keep this check if you dislike the idea to remove it. But to me
the very fact that select_task_rq() and select_fallback_rq() use different
checks looks just wrong. Even if everything happens to work.

Oleg.


  reply	other threads:[~2015-10-14 20:08 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-10-10 18:52 [PATCH 0/3] (Was: sched: start stopper early) Oleg Nesterov
2015-10-10 18:53 ` [PATCH 1/3] sched: select_task_rq() should check cpu_active() like select_fallback_rq() Oleg Nesterov
2015-10-11 18:04   ` Oleg Nesterov
2015-10-11 18:57     ` Thomas Gleixner
2015-10-12 12:16   ` Peter Zijlstra
2015-10-12 17:34     ` Oleg Nesterov
2015-10-14 15:00       ` Peter Zijlstra
2015-10-14 20:05         ` Oleg Nesterov [this message]
2015-10-14 20:35           ` Peter Zijlstra
2015-10-10 18:53 ` [PATCH 2/3] sched: change select_fallback_rq() to use for_each_cpu_and() Oleg Nesterov
2015-10-10 18:53 ` [PATCH 3/3] sched: don't scan all-offline ->cpus_allowed twice if !CONFIG_CPUSETS Oleg Nesterov
2015-10-20  9:35   ` [tip:sched/core] sched: Don't scan all-offline -> cpus_allowed " tip-bot for Oleg Nesterov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151014200516.GA11157@redhat.com \
    --to=oleg@redhat.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=vkuznets@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.