From: Paul Mackerras <paulus@samba.org>
To: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
Cc: linux-kernel@vger.kernel.org
Subject: Deadlock between cpu_hotplug_begin and cpu_add_remove_lock
Date: Wed, 22 Jan 2014 16:52:39 +1100
Message-ID: <20140122055239.GA29418@iris.ozlabs.ibm.com>

This arises out of a report from a tester that offlining a CPU never
finished on a system they were testing.  This was on a POWER8 running
a 3.10.x kernel, but the issue is still present in mainline AFAICS.

What I found when I looked at the system was this:

* There was a ppc64_cpu process stuck inside cpu_hotplug_begin(),
  called from _cpu_down(), from cpu_down().  This process was holding
  the cpu_add_remove_lock mutex, since cpu_down() calls
  cpu_maps_update_begin() before calling _cpu_down().  It was stuck
  there because cpu_hotplug.refcount == 1, and cpu_hotplug_begin()
  waits for the refcount to drop to zero.

* There was an mdadm process trying to acquire the cpu_add_remove_lock
  mutex inside register_cpu_notifier(), called from
  raid5_alloc_percpu() in drivers/md/raid5.c.  That process had
  previously called get_online_cpus(), which is why
  cpu_hotplug.refcount was 1.

Result: deadlock.
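
To make the interleaving concrete, here is a small single-threaded
userspace model of it.  This is only a sketch, not kernel code: the
struct, the enum and all the try_*/model_* helpers are hypothetical
names made up for illustration, and the model ignores
cpu_hotplug.lock, which the real get_online_cpus() takes briefly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the state involved; hypothetical names, not kernel code. */
enum task { NONE, HOTPLUG_WRITER, NOTIFIER_READER };

struct model {
	enum task add_remove_owner;	/* who holds cpu_add_remove_lock */
	int refcount;			/* models cpu_hotplug.refcount */
};

/* Models cpu_maps_update_begin(): succeeds only if the mutex is free. */
static bool try_maps_update_begin(struct model *m, enum task t)
{
	if (m->add_remove_owner != NONE)
		return false;
	m->add_remove_owner = t;
	return true;
}

/* Models get_online_cpus(): a reader just bumps the refcount. */
static void model_get_online_cpus(struct model *m)
{
	m->refcount++;
}

/* Models cpu_hotplug_begin(): the writer proceeds only at refcount 0. */
static bool try_hotplug_begin(const struct model *m)
{
	return m->refcount == 0;
}

/* Models register_cpu_notifier(): it needs cpu_add_remove_lock. */
static bool try_register_notifier(struct model *m, enum task t)
{
	return try_maps_update_begin(m, t);
}

/* Replay the reported interleaving; true means both tasks are stuck. */
bool replay_deadlock(void)
{
	struct model m = { NONE, 0 };
	bool got_lock, writer_stuck, reader_stuck;

	/* mdadm: get_online_cpus() takes the refcount from 0 to 1. */
	model_get_online_cpus(&m);
	assert(m.refcount == 1);

	/* ppc64_cpu: cpu_down() takes cpu_add_remove_lock first ... */
	got_lock = try_maps_update_begin(&m, HOTPLUG_WRITER);
	assert(got_lock);

	/* ... then _cpu_down() waits in cpu_hotplug_begin(). */
	writer_stuck = !try_hotplug_begin(&m);

	/* mdadm: register_cpu_notifier() now wants the held mutex. */
	reader_stuck = !try_register_notifier(&m, NOTIFIER_READER);

	return writer_stuck && reader_stuck;
}
```

Calling replay_deadlock() returns true: the writer cannot get past
the refcount check until the reader calls put_online_cpus(), and the
reader cannot reach put_online_cpus() until the writer releases
cpu_add_remove_lock.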

Thus it seems that the following code is not safe:

	get_online_cpus();
	register_cpu_notifier(&...);
	put_online_cpus();

There are a few different places that do that sort of thing; besides
drivers/md/raid5.c, there are instances in arch/x86/kernel/cpu,
arch/x86/oprofile, drivers/cpufreq/acpi-cpufreq.c,
drivers/oprofile/nmi_timer_int.c and kernel/trace/ring_buffer.c.

My question is this: is it reasonable to call register_cpu_notifier()
inside a get_online_cpus()/put_online_cpus() block?  If so, the
deadlock needs to be fixed; if not, the callers need to be fixed, and
the restriction should be documented.

Regards,
Paul.

