From: Vikram Mulukutla <markivx@codeaurora.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>, Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
LKML <linux-kernel@vger.kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Sebastian Sewior <bigeasy@linutronix.de>,
linux-kernel-owner@vger.kernel.org
Subject: Re: [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
Date: Fri, 07 Jul 2017 00:47:29 -0700 [thread overview]
Message-ID: <b283140da44f4e0d5352807786f61576@codeaurora.org> (raw)
In-Reply-To: <alpine.DEB.2.20.1707042218020.2131@nanos>
On 2017-07-04 13:20, Thomas Gleixner wrote:
> Vikram reported the following backtrace:
>
> BUG: scheduling while atomic: swapper/7/0/0x00000002
> CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
> schedule
> schedule_hrtimeout_range_clock
> schedule_hrtimeout
> wait_task_inactive
> __kthread_bind_mask
> __kthread_bind
> __kthread_unpark
> kthread_unpark
> cpuhp_online_idle
> cpu_startup_entry
> secondary_start_kernel
>
> He analyzed correctly that a parked cpu hotplug thread of an offlined
> CPU
> was still on the runqueue when the CPU came back online and tried to
> unpark
> it. This causes the thread which invoked kthread_unpark() to call
> wait_task_inactive() and subsequently schedule() with preemption
> disabled.
> His proposed workaround was to "make sure" that a parked thread has
> scheduled out when the CPU goes offline, so the situation cannot
> happen.
>
> But that's still wrong because the root cause is not the fact that the
> percpu thread is still on the runqueue and neither that preemption is
> disabled, which could be simply solved by enabling preemption before
> calling kthread_unpark().
>
> The real issue is that the calling thread is the idle task of the
> upcoming
> CPU, which is not supposed to call anything which might sleep. The
> moron,
> who wrote that code, missed completely that kthread_unpark() might end
> up
> in schedule().
>
Agreed. Presumably we are still OK with the cpu hotplug thread being
migrated off to random CPUs and its unfinished kthread_parkme racing
with
a subsequent unpark? The cpuhp/X thread ends up on a random rq if it
can't
do schedule() in time because migration/X yanks it off of the dying CPU
X.
Apart from slightly disconcerting traces showing cpuhp/X momentarily
executing
on CPU Y, there's no problem that I can see of course.
Probably worth mentioning that the following patch from Junaid Shahid
seems
to indicate that such a race doesn't always work out as desired:
https://lkml.org/lkml/2017/4/28/755
> The solution is simpler than expected. The thread which controls the
> hotplug operation is waiting for the CPU to call complete() on the
> hotplug
> state completion. So the idle task of the upcoming CPU can set its
> state to
> CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the
> control
> task on a different CPU, which then can safely do the unpark and kick
> the
> now unparked hotplug thread of the upcoming CPU to complete the bringup
> to
> the final target state.
>
> Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully
> up")
> Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Sebastian Siewior <bigeasy@linutronix.de>
> Cc: rusty@rustcorp.com.au
> Cc: tj@kernel.org
> Cc: akpm@linux-foundation.org
> Cc: stable@vger.kernel.org
>
> ---
> kernel/cpu.c | 30 ++++++++++++++++--------------
> 1 file changed, 16 insertions(+), 14 deletions(-)
>
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -271,11 +271,25 @@ void cpu_hotplug_enable(void)
> EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
> #endif /* CONFIG_HOTPLUG_CPU */
>
> +static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
> +
> static int bringup_wait_for_ap(unsigned int cpu)
> {
> struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>
> + /* Wait for the CPU to reach IDLE_ONLINE */
> wait_for_completion(&st->done);
> + BUG_ON(!cpu_online(cpu));
> +
> + /* Unpark the stopper thread and the hotplug thread of the target cpu
> */
> + stop_machine_unpark(cpu);
> + kthread_unpark(st->thread);
> +
> + /* Should we go further up ? */
> + if (st->target > CPUHP_AP_ONLINE_IDLE) {
> + __cpuhp_kick_ap_work(st);
> + wait_for_completion(&st->done);
> + }
> return st->result;
> }
>
> @@ -296,9 +310,7 @@ static int bringup_cpu(unsigned int cpu)
> irq_unlock_sparse();
> if (ret)
> return ret;
> - ret = bringup_wait_for_ap(cpu);
> - BUG_ON(!cpu_online(cpu));
> - return ret;
> + return bringup_wait_for_ap(cpu);
> }
>
> /*
> @@ -775,23 +787,13 @@ void notify_cpu_starting(unsigned int cp
> void cpuhp_online_idle(enum cpuhp_state state)
> {
> struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
> - unsigned int cpu = smp_processor_id();
>
> /* Happens for the boot cpu */
> if (state != CPUHP_AP_ONLINE_IDLE)
> return;
>
> st->state = CPUHP_AP_ONLINE_IDLE;
> -
> - /* Unpark the stopper thread and the hotplug thread of this cpu */
> - stop_machine_unpark(cpu);
> - kthread_unpark(st->thread);
> -
> - /* Should we go further up ? */
> - if (st->target > CPUHP_AP_ONLINE_IDLE)
> - __cpuhp_kick_ap_work(st);
> - else
> - complete(&st->done);
> + complete(&st->done);
> }
>
> /* Requires cpu_add_remove_lock to be held */
Nice and clean otherwise. Channagoud was instrumental in collecting
data, theorizing with me and testing your fix, so if the concern I've
raised above doesn't matter, please add:
Tested-by: Channagoud Kadabi <ckadabi@codeaurora.org>
Thanks,
Vikram
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
next prev parent reply other threads:[~2017-07-07 7:47 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-07-04 20:20 [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU Thomas Gleixner
2017-07-05 9:04 ` Peter Zijlstra
2017-07-05 9:07 ` Thomas Gleixner
2017-07-05 9:17 ` Peter Zijlstra
2017-07-06 8:57 ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
2017-07-07 7:47 ` Vikram Mulukutla [this message]
2017-07-07 7:53 ` [PATCH] " Vikram Mulukutla
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b283140da44f4e0d5352807786f61576@codeaurora.org \
--to=markivx@codeaurora.org \
--cc=akpm@linux-foundation.org \
--cc=bigeasy@linutronix.de \
--cc=linux-kernel-owner@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peterz@infradead.org \
--cc=rusty@rustcorp.com.au \
--cc=tglx@linutronix.de \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox