All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vikram Mulukutla <markivx@codeaurora.org>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>, Tejun Heo <tj@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Sebastian Sewior <bigeasy@linutronix.de>,
	linux-kernel-owner@vger.kernel.org
Subject: Re: [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU
Date: Fri, 07 Jul 2017 00:53:16 -0700	[thread overview]
Message-ID: <858ae39b96a9e56006c26eb91bce1923@codeaurora.org> (raw)
In-Reply-To: <b283140da44f4e0d5352807786f61576@codeaurora.org>

On 2017-07-07 00:47, Vikram Mulukutla wrote:
> On 2017-07-04 13:20, Thomas Gleixner wrote:
>> Vikram reported the following backtrace:
>> 
>>    BUG: scheduling while atomic: swapper/7/0/0x00000002
>>    CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
>>    schedule
>>    schedule_hrtimeout_range_clock
>>    schedule_hrtimeout
>>    wait_task_inactive
>>    __kthread_bind_mask
>>    __kthread_bind
>>    __kthread_unpark
>>    kthread_unpark
>>    cpuhp_online_idle
>>    cpu_startup_entry
>>    secondary_start_kernel
>> 
>> He analyzed correctly that a parked cpu hotplug thread of an offlined 
>> CPU
>> was still on the runqueue when the CPU came back online and tried to 
>> unpark
>> it. This causes the thread which invoked kthread_unpark() to call
>> wait_task_inactive() and subsequently schedule() with preemption 
>> disabled.
>> His proposed workaround was to "make sure" that a parked thread has
>> scheduled out when the CPU goes offline, so the situation cannot 
>> happen.
>> 
>> But that's still wrong because the root cause is not the fact that the
>> percpu thread is still on the runqueue and neither that preemption is
>> disabled, which could be simply solved by enabling preemption before
>> calling kthread_unpark().
>> 
>> The real issue is that the calling thread is the idle task of the 
>> upcoming
>> CPU, which is not supposed to call anything which might sleep.  The 
>> moron,
>> who wrote that code, missed completely that kthread_unpark() might end 
>> up
>> in schedule().
>> 
> 
> Agreed. Presumably we are still OK with the cpu hotplug thread being
> migrated off to random CPUs and its unfinished kthread_parkme racing 
> with
> a subsequent unpark? The cpuhp/X thread ends up on a random rq if it 
> can't
> do schedule() in time because migration/X yanks it off of the dying CPU 
> X.
> Apart from slightly disconcerting traces showing cpuhp/X momentarily 
> executing
> on CPU Y, there's no problem that I can see of course.
> 
> Probably worth mentioning that the following patch from Junaid Shahid 
> seems
> to indicate that such a race doesn't always work out as desired:
> https://lkml.org/lkml/2017/4/28/755

Ah, Junaid's problem/patch wouldn't be relevant in the hotplug case 
because of the
completion I think.

> 
>> The solution is simpler than expected. The thread which controls the
>> hotplug operation is waiting for the CPU to call complete() on the 
>> hotplug
>> state completion. So the idle task of the upcoming CPU can set its 
>> state to
>> CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the 
>> control
>> task on a different CPU, which then can safely do the unpark and kick 
>> the
>> now unparked hotplug thread of the upcoming CPU to complete the 
>> bringup to
>> the final target state.
>> 
>> Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully 
>> up")
>> Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
>> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Sebastian Siewior <bigeasy@linutronix.de>
>> Cc: rusty@rustcorp.com.au
>> Cc: tj@kernel.org
>> Cc: akpm@linux-foundation.org
>> Cc: stable@vger.kernel.org
>> 
>> ---
>>  kernel/cpu.c |   30 ++++++++++++++++--------------
>>  1 file changed, 16 insertions(+), 14 deletions(-)
>> 
>> --- a/kernel/cpu.c
>> +++ b/kernel/cpu.c
>> @@ -271,11 +271,25 @@ void cpu_hotplug_enable(void)
>>  EXPORT_SYMBOL_GPL(cpu_hotplug_enable);
>>  #endif	/* CONFIG_HOTPLUG_CPU */
>> 
>> +static void __cpuhp_kick_ap_work(struct cpuhp_cpu_state *st);
>> +
>>  static int bringup_wait_for_ap(unsigned int cpu)
>>  {
>>  	struct cpuhp_cpu_state *st = per_cpu_ptr(&cpuhp_state, cpu);
>> 
>> +	/* Wait for the CPU to reach IDLE_ONLINE */
>>  	wait_for_completion(&st->done);
>> +	BUG_ON(!cpu_online(cpu));
>> +
>> +	/* Unpark the stopper thread and the hotplug thread of the target 
>> cpu */
>> +	stop_machine_unpark(cpu);
>> +	kthread_unpark(st->thread);
>> +
>> +	/* Should we go further up ? */
>> +	if (st->target > CPUHP_AP_ONLINE_IDLE) {
>> +		__cpuhp_kick_ap_work(st);
>> +		wait_for_completion(&st->done);
>> +	}
>>  	return st->result;
>>  }
>> 
>> @@ -296,9 +310,7 @@ static int bringup_cpu(unsigned int cpu)
>>  	irq_unlock_sparse();
>>  	if (ret)
>>  		return ret;
>> -	ret = bringup_wait_for_ap(cpu);
>> -	BUG_ON(!cpu_online(cpu));
>> -	return ret;
>> +	return bringup_wait_for_ap(cpu);
>>  }
>> 
>>  /*
>> @@ -775,23 +787,13 @@ void notify_cpu_starting(unsigned int cp
>>  void cpuhp_online_idle(enum cpuhp_state state)
>>  {
>>  	struct cpuhp_cpu_state *st = this_cpu_ptr(&cpuhp_state);
>> -	unsigned int cpu = smp_processor_id();
>> 
>>  	/* Happens for the boot cpu */
>>  	if (state != CPUHP_AP_ONLINE_IDLE)
>>  		return;
>> 
>>  	st->state = CPUHP_AP_ONLINE_IDLE;
>> -
>> -	/* Unpark the stopper thread and the hotplug thread of this cpu */
>> -	stop_machine_unpark(cpu);
>> -	kthread_unpark(st->thread);
>> -
>> -	/* Should we go further up ? */
>> -	if (st->target > CPUHP_AP_ONLINE_IDLE)
>> -		__cpuhp_kick_ap_work(st);
>> -	else
>> -		complete(&st->done);
>> +	complete(&st->done);
>>  }
>> 
>>  /* Requires cpu_add_remove_lock to be held */
> 
> Nice and clean otherwise. Channagoud was instrumental in collecting
> data, theorizing with me and testing your fix, so if the concern I've
> raised above doesn't matter, please add:
> 
> Tested-by: Channagoud Kadabi <ckadabi@codeaurora.org>
> 
> Thanks,
> Vikram

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

      reply	other threads:[~2017-07-07  7:53 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-04 20:20 [PATCH] smp/hotplug: Move unparking of percpu threads to the control CPU Thomas Gleixner
2017-07-05  9:04 ` Peter Zijlstra
2017-07-05  9:07   ` Thomas Gleixner
2017-07-05  9:17     ` Peter Zijlstra
2017-07-06  8:57 ` [tip:smp/urgent] " tip-bot for Thomas Gleixner
2017-07-07  7:47 ` [PATCH] " Vikram Mulukutla
2017-07-07  7:53   ` Vikram Mulukutla [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=858ae39b96a9e56006c26eb91bce1923@codeaurora.org \
    --to=markivx@codeaurora.org \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=linux-kernel-owner@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.