From: Lai Jiangshan <laijs@cn.fujitsu.com>
To: Jiri Slaby <jirislaby@gmail.com>
Cc: peterz@infradead.org, rjw@sisk.pl, akpm@linux-foundation.org,
	rusty@rustcorp.com.au, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [PATCH 1/1] sched: fix cpu_down deadlock
Date: Fri, 11 Sep 2009 14:09:55 +0800
Message-ID: <4AA9E9B3.8060901@cn.fujitsu.com>
In-Reply-To: <1252496510-11898-1-git-send-email-jirislaby@gmail.com>

Jiri Slaby wrote:
> Jiri Slaby wrote:
>> Thanks, in the end I found it manually. Goddammit! It's an -mm thing:
>> cpu_hotplug-dont-affect-current-tasks-affinity.patch
>>
>> Well, I don't know why, but when the kthread over there runs under
>> suspend conditions and gets rescheduled (e.g. by the might_sleep()
>> inside), it never returns. pick_next_task always returns the idle task
>> from the idle queue. The state of the thread is TASK_RUNNING.
>>
>> Why is it not enqueued into some queue? I tried also
>> sched_setscheduler(current, FIFO, 99) in the thread itself. Unless I did
>> it wrong, it seems like a global scheduler problem?
> 
> Actually not, it definitely seems like a cpu_down problem.
>  
>> Ingo, any ideas?
> 
> Apparently not, but never mind :). What about the patch below?
> 
> --
> 
> After a cpu is taken down in __stop_machine, the kcpu_thread still may be
> rescheduled to that cpu, but in fact the cpu is not running at that
> moment.
> 
> This causes kcpu_thread to never run again, because it is enqueued on another
> runqueue, hence pick_next_task never selects it on the set of newly
> running cpus.
> 
> We do set_cpus_allowed_ptr in _cpu_down_thread, but cpu_active_mask is
> only updated to exclude the cpu which goes down after the thread finishes
> (and _cpu_down returns).
> 
> For me this triggers mostly while suspending an SMP machine with
> FAIR_GROUP_SCHED enabled and the
> cpu_hotplug-dont-affect-current-tasks-affinity patch applied. The patch
> adds a kthread to the cpu_down pipeline.
> 
> Fix this issue by removing the to-be-killed cpu from a local copy of
> cpu_active_mask.
> 
> Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
> Cc: Ingo Molnar <mingo@elte.hu>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
>  kernel/cpu.c |   12 ++++++++++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index be9c5ad..17a3635 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -196,6 +196,14 @@ static int __ref _cpu_down_thread(void *_param)
>  	unsigned long mod = param->mod;
>  	unsigned int cpu = param->cpu;
>  	void *hcpu = (void *)(long)cpu;
> +	cpumask_var_t active_mask;
> +
> +	if (!alloc_cpumask_var(&active_mask, GFP_KERNEL))
> +		return -ENOMEM;
> +
> +	/* make sure we are not running on the cpu which goes down,
> +	   cpu_active_mask is altered even after we return! */
> +	cpumask_andnot(active_mask, cpu_active_mask, cpumask_of(cpu));
>  
>  	cpu_hotplug_begin();
>  	err = __raw_notifier_call_chain(&cpu_chain, CPU_DOWN_PREPARE | mod,
> @@ -211,7 +219,7 @@ static int __ref _cpu_down_thread(void *_param)
>  	}
>  
>  	/* Ensure that we are not runnable on dying cpu */
> -	set_cpus_allowed_ptr(current, cpu_active_mask);
> +	set_cpus_allowed_ptr(current, active_mask);
>  
>  	err = __stop_machine(take_cpu_down, param, cpumask_of(cpu));
>  	if (err) {
> @@ -237,9 +245,9 @@ static int __ref _cpu_down_thread(void *_param)
>  		BUG();
>  
>  	check_for_tasks(cpu);
> -
>  out_release:
>  	cpu_hotplug_done();
> +	free_cpumask_var(active_mask);
>  	if (!err) {
>  		if (raw_notifier_call_chain(&cpu_chain, CPU_POST_DEAD | mod,
>  					    hcpu) == NOTIFY_BAD)
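
To illustrate the mask arithmetic in the quoted patch, here is a minimal
userspace sketch. It is not kernel code: toy_cpumask, toy_mask_of,
toy_andnot, toy_test_cpu and toy_allowed_mask are hypothetical stand-ins
for cpumask_var_t, cpumask_of and cpumask_andnot, and a plain 64-bit
integer is assumed to be wide enough for the CPUs involved.

```c
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit n set means CPU n is usable. */
typedef uint64_t toy_cpumask;

/* Rough analogue of cpumask_of(cpu): a mask with only `cpu` set. */
static toy_cpumask toy_mask_of(unsigned int cpu)
{
	return (toy_cpumask)1 << cpu;
}

/* Rough analogue of cpumask_andnot(): keep the bits of a that are not in b. */
static toy_cpumask toy_andnot(toy_cpumask a, toy_cpumask b)
{
	return a & ~b;
}

/* Returns nonzero if `cpu` is set in `mask`. */
static int toy_test_cpu(unsigned int cpu, toy_cpumask mask)
{
	return (int)((mask >> cpu) & 1);
}

/*
 * Build the local affinity mask the patch passes to set_cpus_allowed_ptr():
 * a snapshot of the active mask with the dying CPU cleared, so the result
 * does not depend on when the global mask itself gets updated.
 */
static toy_cpumask toy_allowed_mask(toy_cpumask global_active,
				    unsigned int dying_cpu)
{
	return toy_andnot(global_active, toy_mask_of(dying_cpu));
}
```

With CPUs 0-3 active (mask 0xF) and CPU 3 going down,
toy_allowed_mask(0xF, 3) yields 0x7, so the thread cannot be placed on
CPU 3 even though the global mask still lists it.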



Hi, Jiri Slaby

Does this bug occur when a cpu is being offlined, when the system
is being suspended, or both?

Lai

