linux-kernel.vger.kernel.org archive mirror
* Deadlock between cpu_hotplug_begin and cpu_add_remove_lock
@ 2014-01-22  5:52 Paul Mackerras
  2014-01-22  8:30 ` Srivatsa S. Bhat
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Mackerras @ 2014-01-22  5:52 UTC
  To: Srivatsa S. Bhat; +Cc: linux-kernel

This arises out of a report from a tester that offlining a CPU never
finished on a system they were testing.  This was on a POWER8 running
a 3.10.x kernel, but the issue is still present in mainline AFAICS.

What I found when I looked at the system was this:

* There was a ppc64_cpu process stuck inside cpu_hotplug_begin(),
  called from _cpu_down(), from cpu_down().  This process was holding
  the cpu_add_remove_lock mutex, since cpu_down() calls
  cpu_maps_update_begin() before calling _cpu_down().  It was stuck
  there because cpu_hotplug.refcount == 1, and cpu_hotplug_begin()
  waits for the refcount to drop to zero.

* There was an mdadm process trying to acquire the cpu_add_remove_lock
  mutex inside register_cpu_notifier(), called from
  raid5_alloc_percpu() in drivers/md/raid5.c.  That process had
  previously called get_online_cpus(), which is why
  cpu_hotplug.refcount was 1.

Result: deadlock.
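
To make the interleaving concrete, here is a minimal user-space sketch
of the two paths described above.  It is not kernel code: struct model,
the model_* helpers and the task enum are hypothetical names invented
for illustration.  The mutex is modelled as an owner field and
cpu_hotplug.refcount as a plain int, so the replay runs in one thread
and never actually hangs.

```c
#include <stdbool.h>

/* Hypothetical user-space model of the report; not kernel code. */
enum task { NOBODY, PPC64_CPU, MDADM };

struct model {
	enum task add_remove_owner;	/* stands in for cpu_add_remove_lock */
	int refcount;			/* stands in for cpu_hotplug.refcount */
};

/* Models get_online_cpus(): a reader pins the set of online CPUs. */
static void model_get_online_cpus(struct model *m)
{
	m->refcount++;
}

/* Models taking the mutex; returns false where the caller would sleep. */
static bool model_try_lock(struct model *m, enum task t)
{
	if (m->add_remove_owner != NOBODY)
		return false;
	m->add_remove_owner = t;
	return true;
}

/* Models the wait in cpu_hotplug_begin(): proceed only at refcount 0. */
static bool model_hotplug_can_begin(const struct model *m)
{
	return m->refcount == 0;
}

/* Replays the reported interleaving; true if both tasks end up blocked. */
static bool model_replay_deadlock(void)
{
	struct model m = { NOBODY, 0 };

	/* mdadm: get_online_cpus() in raid5_alloc_percpu() */
	model_get_online_cpus(&m);

	/* ppc64_cpu: cpu_down() -> cpu_maps_update_begin() */
	bool writer_has_lock = model_try_lock(&m, PPC64_CPU);

	/* ppc64_cpu: _cpu_down() -> cpu_hotplug_begin() waits on refcount */
	bool writer_blocked = writer_has_lock && !model_hotplug_can_begin(&m);

	/* mdadm: register_cpu_notifier() wants cpu_add_remove_lock */
	bool reader_blocked = !model_try_lock(&m, MDADM);

	return writer_blocked && reader_blocked;
}
```

In this model, model_replay_deadlock() returns true: the hotplug
writer holds the mutex while waiting for the refcount to drop, and the
reader holds the refcount while waiting for the mutex.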

Thus it seems that the following code is not safe:

	get_online_cpus();
	register_cpu_notifier(&...);
	put_online_cpus();

There are a few different places that do that sort of thing; besides
drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu,
arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c,
drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c.

My question is this: is it reasonable to call register_cpu_notifier()
inside a get_online_cpus()/put_online_cpus() block?  If so, the
deadlock needs to be fixed; if not, the callers need to be fixed, and
the restriction should be documented.

Regards,
Paul.


Thread overview: 11+ messages (newest: 2014-01-28 14:41 UTC)
2014-01-22  5:52 Deadlock between cpu_hotplug_begin and cpu_add_remove_lock Paul Mackerras
2014-01-22  8:30 ` Srivatsa S. Bhat
2014-01-22  9:16   ` Srivatsa S. Bhat
2014-01-22 19:18     ` Oleg Nesterov
2014-01-22 19:58       ` Srivatsa S. Bhat
2014-01-23 17:02         ` Oleg Nesterov
2014-01-28 14:32           ` Srivatsa S. Bhat
2014-01-23  2:29     ` Rusty Russell
2014-01-23  5:36       ` Srivatsa S. Bhat
2014-01-23 23:01         ` Rusty Russell
2014-01-28 14:36           ` Srivatsa S. Bhat
