All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
To: Paul Mackerras <paulus@samba.org>
Cc: linux-kernel@vger.kernel.org,
	Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Ingo Molnar <mingo@kernel.org>, Oleg Nesterov <oleg@redhat.com>,
	Tejun Heo <tj@kernel.org>, Michel Lespinasse <walken@google.com>,
	ego@linux.vnet.ibm.com,
	"rusty@rustcorp.com.au" <rusty@rustcorp.com.au>,
	Thomas Gleixner <tglx@linutronix.de>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Subject: Re: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock
Date: Wed, 22 Jan 2014 14:46:09 +0530	[thread overview]
Message-ID: <52DF8C59.1090702@linux.vnet.ibm.com> (raw)
In-Reply-To: <52DF81B0.7020700@linux.vnet.ibm.com>

On 01/22/2014 02:00 PM, Srivatsa S. Bhat wrote:
> Hi Paul,
> 
> On 01/22/2014 11:22 AM, Paul Mackerras wrote:
>> This arises out of a report from a tester that offlining a CPU never
>> finished on a system they were testing.  This was on a POWER8 running
>> a 3.10.x kernel, but the issue is still present in mainline AFAICS.
>>
>> What I found when I looked at the system was this:
>>
>> * There was a ppc64_cpu process stuck inside cpu_hotplug_begin(),
>>   called from _cpu_down(), from cpu_down().  This process was holding
>>   the cpu_add_remove_lock mutex, since cpu_down() calls
>>   cpu_maps_update_begin() before calling _cpu_down().  It was stuck
>>   there because cpu_hotplug.refcount == 1.
>>
>> * There was a mdadm process trying to acquire the cpu_add_remove_lock
>>   mutex inside register_cpu_notifier(), called from
>>   raid5_alloc_percpu() in drivers/md/raid5.c.  That process had
>>   previously called get_online_cpus, which is why cpu_hotplug.refcount
>>   was 1.
>>
>> Result: deadlock.
>>
>> Thus it seems that the following code is not safe:
>>
>> 	get_online_cpus();
>> 	register_cpu_notifier(&...);
>> 	put_online_cpus();
>>
> 
> Yes, this is a known problem, and I had proposed an elaborate solution
> some time ago: https://lkml.org/lkml/2012/3/1/39
> But that won't work for all cases, so that solution is a no-go.
> 

Wait a min, that _will_ actually work for all cases because I have provided
an option to invoke _any_ arbitrary function as the "setup" routine.

So, taking the example of raid5 that you mentioned below, instead of doing
this:

static int raid5_alloc_percpu()
{
	...
	get_online_cpus();
	for_each_present_cpu(cpu) {
	  ...
	}
	...
	register_cpu_notifier();
	put_online_cpus();
}

We can do this:

void func()
{
	for_each_present_cpu(cpu) {
		...
	}
	...
}

static int raid5_alloc_percpu()
{
	...
	register_allcpu_notifier(..., true, func);
}


The other, simpler alternative fix is to use cpu_hotplug_disable/enable()
in place of get/put_online_cpus() around the callback registration code.
Something like this:

cpu_hotplug_disable();
register_cpu_notifier();
cpu_hotplug_enable();

But the problem with this is that a hotplug operation that tries to run
concurrently with this will get a -EBUSY, which is kinda undesirable.
Also, this will only synchronize with hotplug operations initiated via
calls to cpu_up/down() (such as those that are initiated by writing to the
online file in sysfs). It won't synchronize with the hotplug operations
invoked by disable/enable_nonboot_cpus(), which by-pass cpu_up/down() and
directly call _cpu_up/down() by ignoring the cpu_hotplug_disabled flag.

The latter is a more controlled environment though, since its mostly used
by the suspend/hibernate code, in a state where the entire userspace is
frozen. So it might not be that bad.

Regards,
Srivatsa S. Bhat

> If we forget the CPU_POST_DEAD stage for a moment, we can just replace the
> calls to cpu_maps_update_begin/done() with get/put_online_cpus() in both
> register_cpu_notifier() as well as unregister_cpu_notifier(). After all,
> the callback registration code needs to synchronize only with the actual
> hotplug operations, and not the update of cpu-maps. So they don't really
> need to acquire the cpu_add_remove_lock.
> 
> However, CPU_POST_DEAD notifications run with the hotplug lock dropped.
> So we can't simply replace cpu_add_remove_lock with hotplug lock in the
> registration routines, because notifier invocations and notifier registration
> needs to be synchronized.
> 
> Hmm...
>  
>> There are a few different places that do that sort of thing; besides
>> drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu,
>> arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c,
>> drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c.
>>
>> My question is this: is it reasonable to call register_cpu_notifier
>> inside a get/put_online_cpus block?
> 
> Ideally, we would want that to work. Because there is no other race-free
> way of registering a notifier.
> 
>>  If so, the deadlock needs to be
>> fixed; if not, the callers need to be fixed, and the restriction
>> should be documented.
> 
> Fixing the callers is a last resort. I'm thinking of ways to fix the
> deadlock itself, and allow the callers to call register_cpu_notifier
> within a get/put_online_cpus() block...
> 


  reply	other threads:[~2014-01-22  9:21 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-22  5:52 Deadlock between cpu_hotplug_begin and cpu_add_remove_lock Paul Mackerras
2014-01-22  8:30 ` Srivatsa S. Bhat
2014-01-22  9:16   ` Srivatsa S. Bhat [this message]
2014-01-22 19:18     ` Oleg Nesterov
2014-01-22 19:58       ` Srivatsa S. Bhat
2014-01-23 17:02         ` Oleg Nesterov
2014-01-28 14:32           ` Srivatsa S. Bhat
2014-01-23  2:29     ` Rusty Russell
2014-01-23  5:36       ` Srivatsa S. Bhat
2014-01-23 23:01         ` Rusty Russell
2014-01-28 14:36           ` Srivatsa S. Bhat

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52DF8C59.1090702@linux.vnet.ibm.com \
    --to=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=ego@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=rusty@rustcorp.com.au \
    --cc=tglx@linutronix.de \
    --cc=tj@kernel.org \
    --cc=walken@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.