public inbox for linux-kernel@vger.kernel.org
From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Jiri Slaby <jirislaby@gmail.com>
Cc: peterz@infradead.org, rjw@sisk.pl, akpm@linux-foundation.org,
	rusty@rustcorp.com.au, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock
Date: Fri, 11 Sep 2009 14:09:55 +0800
Message-ID: <4AA9E9B3.8060901@cn.fujitsu.com>
In-Reply-To: <1252496510-11898-1-git-send-email-jirislaby@gmail.com>

Jiri Slaby wrote:
> Jiri Slaby wrote:
>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
>>
>> Well, I don't know why, but when the kthread over there runs under
>> suspend conditions and gets rescheduled (e.g. by the might_sleep()
>> inside) it never returns. pick_next_task always returns the idle task
>> from the idle queue. State of the thread is TASK_RUNNING.
>>
>> Why is it not enqueued into some queue? I tried also
>> sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did
>> it wrong, it seems like a global scheduler problem?
> 
> Actually not, it definitely seems like a cpu_down problem.
>  
>> Ingo, any ideas?
> 
> Apparently not, but never mind :). What about the patch below?
> 
> --
> 
> After a cpu is taken down in __stop_machine, the kcpu_thread still may be
> rescheduled to that cpu, but in fact the cpu is not running at that
> moment.
> 
> This causes kcpu_thread to never run again, because it's enqueued on another
> runqueue, hence pick_next_task never selects it on the set of newly
> running cpus.
> 
> We do set_cpus_allowed_ptr in _cpu_down_thread, but cpu_active_mask is
> updated to not contain the cpu which goes down even after the thread finishes
> (and _cpu_down returns).
> 
> For me this triggers mostly while suspending an SMP machine with
> FAIR_GROUP_SCHED enabled and
> cpu_hotplug-dont-affect-current-tasks-affinity patch applied. The patch
> adds kthread to the cpu_down pipeline.
> 
> Fix this issue by removing the to-be-killed cpu from a local copy of
> cpu_active_mask.
> 
> Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/cpu.c |   12 ++++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index be9c5ad..17a3635 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -196,6 +196,14 @@ static int __ref _cpu_down_thread(void *_param)
>  	unsigned long mod = param->mod;
>  	unsigned int cpu = param->cpu;
>  	void *hcpu = (void *)(long)cpu;
> +	cpumask_var_t active_mask;
> +
> +	if (!alloc_cpumask_var(&active_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	/* make sure we are not running on the cpu which goes down,
> +	   cpu_active_mask is altered even after we return! */
> +	cpumask_andnot(active_mask, cpu_active_mask, cpumask_of(cpu));
>  
>  	cpu_hotplug_begin();
>  	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
> @@ -211,7 +219,7 @@ static int __ref _cpu_down_thread(void *_param)
>  	}
>  
>  	/* Ensure that we are not runnable on dying cpu */
> -	set_cpus_allowed_ptr(current, cpu_active_mask);
> +	set_cpus_allowed_ptr(current, active_mask);
>  
>  	err = __stop_machine(take_cpu_down, param, cpumask_of(cpu));
>  	if (err) {
> @@ -237,9 +245,9 @@ static int __ref _cpu_down_thread(void *_param)
>  		BUG();
>  
>  	check_for_tasks(cpu);
> -
>  out_release:
>  	cpu_hotplug_done();
> +	free_cpumask_var(active_mask);
>  	if (!err) {
>  		if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
>  					    hcpu) == NOTIFY_BAD)



Hi, Jiri Slaby

Does this bug occur when a cpu is being offlined, when the system
is being suspended, or both?

Lai
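
For reference, the mask computation in the quoted patch can be sketched
in user space. This is only a rough illustration, assuming a 64-bit word
stands in for a cpumask. All toy_* names are made up for this sketch and
are not kernel API; only cpumask_andnot(), cpumask_of() and
cpu_active_mask come from the patch itself.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means cpu n is in the mask. */
typedef uint64_t toy_cpumask;

/* Single-cpu mask, like cpumask_of(cpu). Assumes cpu < 64. */
static toy_cpumask toy_cpumask_of(unsigned int cpu)
{
	return (toy_cpumask)1 << cpu;
}

/* Same idea as cpumask_andnot(): keep the bits of src1 that are not in src2. */
static toy_cpumask toy_cpumask_andnot(toy_cpumask src1, toy_cpumask src2)
{
	return src1 & ~src2;
}

/* Returns 1 if cpu is set in mask, 0 otherwise. Assumes cpu < 64. */
static int toy_cpumask_test(toy_cpumask mask, unsigned int cpu)
{
	return (int)((mask >> cpu) & 1);
}

/*
 * Mask the hotplug thread may run on while `dying` is taken down:
 * the active cpus minus the dying one, even if the active mask
 * itself still lists that cpu.
 */
static toy_cpumask toy_allowed_mask(toy_cpumask active, unsigned int dying)
{
	return toy_cpumask_andnot(active, toy_cpumask_of(dying));
}
```

With cpus 0-3 active (0xf) and cpu 3 going down, toy_allowed_mask(0xf, 3)
yields 0x7, so the thread can no longer be placed on cpu 3 although the
active mask still contains it.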


Thread overview: 22+ messages
2009-08-11  8:41 [PATCH 1/1] Power: fix suspend vt regression Jiri Slaby
2009-08-11 17:00 ` Greg KH
2009-08-11 21:19   ` Jiri Slaby
2009-08-11 21:20     ` Jiri Slaby
2009-08-31  9:47     ` suspend race -next regression [Was: Power: fix suspend vt regression] Jiri Slaby
2009-08-31 19:32       ` Rafael J. Wysocki
2009-09-04 11:49         ` suspend race -mm " Jiri Slaby
2009-09-04 22:30           ` Jiri Slaby
2009-09-04 22:36             ` Jiri Slaby
2009-09-05 12:39               ` [-mm] warning during suspend [was: suspend race -mm regression] Jiri Slaby
2009-09-05 14:41                 ` Xiao Guangrong
2009-09-10 20:57                   ` Andrew Morton
2009-09-11  0:00                     ` Suresh Siddha
2009-09-11  7:55                       ` Xiao Guangrong
2009-09-09 11:41           ` [PATCH 1/1] sched: fix cpu_down deadlock Jiri Slaby
2009-09-09 11:53             ` Peter Zijlstra
2009-09-09 12:23               ` Jiri Slaby
2009-09-09 12:37                 ` Peter Zijlstra
2009-09-09 13:46                 ` Oleg Nesterov
2009-09-11  6:09             ` Lai Jiangshan [this message]
2009-09-11  6:28               ` Jiri Slaby
2009-09-11  7:38                 ` Lai Jiangshan
