From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org,
Greg KH <greg@kroah.com>
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
"Theodore Ts'o" <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, NeilBrown <neilb@suse.de>,
Dan Williams <dan.j.williams@intel.com>
Subject: [patch 78/88] md: fix deadlock when stopping arrays
Date: Thu, 30 Apr 2009 09:57:07 -0700
Message-ID: <20090430165749.877345784@mini.kroah.org>
In-Reply-To: <20090430170122.GA16015@kroah.com>
[-- Attachment #1: md-fix-deadlock-when-stopping-arrays.patch --]
[-- Type: text/plain, Size: 4927 bytes --]
2.6.28-stable review patch. If anyone has any objections, please let us know.
------------------
From: Dan Williams <dan.j.williams@intel.com>
[backport of 5fd3a17ed456637a224cf4ca82b9ad9d005bc8d4]
Resolve a deadlock when stopping redundant arrays, i.e. ones that
require a call to sysfs_remove_group when shut down. The deadlock is
summarized below:
Thread1                         Thread2
-------                         -------
read sysfs attribute            stop array
                                take mddev lock
                                sysfs_remove_group
sysfs_get_active
wait for mddev lock
                                wait for active
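To make the cycle concrete, here is a minimal single-threaded model of the two paths. It is a sketch, not kernel code: `enum op`, `run()`, `model_unpatched()` and `model_patched()` are hypothetical names, LOCK stands in for mddev_lock, and "active" stands in for the sysfs active-reference count that sysfs_remove_group waits to drain. Under a round-robin schedule the unpatched ordering reaches a state where no thread can move, while moving the wait into a separate worker that does not hold the lock (what the patch does with schedule_work) lets every thread finish.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the deadlock summary, not kernel code.
 * LOCK/UNLOCK model mddev_lock; GET_ACTIVE/PUT_ACTIVE model the sysfs
 * active reference; WAIT_INACTIVE models sysfs_remove_group() waiting
 * for all active references to drop.
 */
enum op { GET_ACTIVE, PUT_ACTIVE, LOCK, UNLOCK, WAIT_INACTIVE, DONE };

struct world {
	int lock_owner; /* index of the holding thread, -1 if free */
	int active;     /* outstanding active references */
};

/* Execute the next op of thread 'id'; return false if it must block. */
static bool step(struct world *w, const enum op *ops, int *pc, int id)
{
	switch (ops[*pc]) {
	case GET_ACTIVE:
		w->active++;
		break;
	case PUT_ACTIVE:
		w->active--;
		break;
	case LOCK:
		if (w->lock_owner != -1)
			return false;   /* another thread holds the lock */
		w->lock_owner = id;
		break;
	case UNLOCK:
		assert(w->lock_owner == id);
		w->lock_owner = -1;
		break;
	case WAIT_INACTIVE:
		if (w->active != 0)
			return false;   /* readers still hold active refs */
		break;
	case DONE:
		return false;
	}
	(*pc)++;
	return true;
}

/* Round-robin scheduler: 1 if every thread finishes, 0 on deadlock. */
static int run(const enum op *const progs[], int n)
{
	struct world w = { -1, 0 };
	int pc[4] = { 0 };

	assert(n <= 4);
	for (;;) {
		bool progress = false, finished = true;

		for (int i = 0; i < n; i++) {
			if (progs[i][pc[i]] == DONE)
				continue;
			finished = false;
			if (step(&w, progs[i], &pc[i], i))
				progress = true;
		}
		if (finished)
			return 1;
		if (!progress)
			return 0;       /* nobody can move: deadlock */
	}
}

/* Thread1: sysfs_read_file -> get active ref -> md_attr_show -> mddev_lock */
static const enum op reader[] = { GET_ACTIVE, LOCK, UNLOCK, PUT_ACTIVE, DONE };

/* Thread2 before the patch: waits for group removal while holding the lock. */
int model_unpatched(void)
{
	static const enum op stopper[] = { LOCK, WAIT_INACTIVE, UNLOCK, DONE };
	const enum op *const progs[] = { reader, stopper };

	return run(progs, 2);
}

/* After the patch: a worker waits for removal without holding the lock. */
int model_patched(void)
{
	static const enum op stopper[] = { LOCK, UNLOCK, DONE };
	static const enum op worker[] = { WAIT_INACTIVE, DONE };
	const enum op *const progs[] = { reader, stopper, worker };

	return run(progs, 3);
}
```

Here `model_unpatched()` returns 0 (both threads blocked) and `model_patched()` returns 1. The model explores only one interleaving, so it illustrates the wait cycle rather than proving the fix correct for every schedule.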
Sysrq-w:
--------
mdmon S 00000017 2212 4163 1
f1982ea8 00000046 2dcf6b85 00000017 c0b23100 f2f83ed0 c0b23100 f2f8413c
c0b23100 c0b23100 c0b1fb98 f2f8413c 00000000 f2f8413c c0b23100 f2291ecc
00000002 c0b23100 00000000 00000017 f2f83ed0 f1982eac 00000046 c044d9dd
Call Trace:
[<c044d9dd>] ? debug_mutex_add_waiter+0x1d/0x58
[<c06ef451>] __mutex_lock_common+0x1d9/0x338
[<c06ef451>] ? __mutex_lock_common+0x1d9/0x338
[<c06ef5e3>] mutex_lock_interruptible_nested+0x33/0x3a
[<c0634553>] ? mddev_lock+0x14/0x16
[<c0634553>] mddev_lock+0x14/0x16
[<c0634eda>] md_attr_show+0x2a/0x49
[<c04e9997>] sysfs_read_file+0x93/0xf9
mdadm D 00000017 2812 4177 1
f0401d78 00000046 430456f8 00000017 f0401d58 f0401d20 c0b23100 f2da2c4c
c0b23100 c0b23100 c0b1fb98 f2da2c4c 0a10fc36 00000000 c0b23100 f0401d70
00000003 c0b23100 00000000 00000017 f2da29e0 00000001 00000002 00000000
Call Trace:
[<c06eed1b>] schedule_timeout+0x1b/0x95
[<c06eed1b>] ? schedule_timeout+0x1b/0x95
[<c06eeb97>] ? wait_for_common+0x34/0xdc
[<c044fa8a>] ? trace_hardirqs_on_caller+0x18/0x145
[<c044fbc2>] ? trace_hardirqs_on+0xb/0xd
[<c06eec03>] wait_for_common+0xa0/0xdc
[<c0428c7c>] ? default_wake_function+0x0/0x12
[<c06eeccc>] wait_for_completion+0x17/0x19
[<c04ea620>] sysfs_addrm_finish+0x19f/0x1d1
[<c04e920e>] sysfs_hash_and_remove+0x42/0x55
[<c04eb4db>] sysfs_remove_group+0x57/0x86
[<c0638086>] do_md_stop+0x13a/0x499
This deadlock has existed for a while, but it is easier to trigger now that
mdmon closely watches sysfs.
Cc: Neil Brown <neilb@suse.de>
Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
 drivers/md/md.c           |   27 ++++++++++++++++++++++++---
 include/linux/raid/md_k.h |    2 ++
 2 files changed, 26 insertions(+), 3 deletions(-)
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3694,6 +3694,10 @@ static int do_md_run(mddev_t * mddev)
 		return err;
 	}
 	if (mddev->pers->sync_request) {
+		/* wait for any previously scheduled redundancy groups
+		 * to be removed
+		 */
+		flush_scheduled_work();
 		if (sysfs_create_group(&mddev->kobj, &md_redundancy_group))
 			printk(KERN_WARNING
 			       "md: cannot register extra attributes for %s\n",
@@ -3824,6 +3828,14 @@ static void restore_bitmap_write_access(
 	spin_unlock(&inode->i_lock);
 }
 
+
+static void sysfs_delayed_rm(struct work_struct *ws)
+{
+	mddev_t *mddev = container_of(ws, mddev_t, del_work);
+
+	sysfs_remove_group(&mddev->kobj, &md_redundancy_group);
+}
+
 /* mode:
  * 0 - completely stop and dis-assemble array
  * 1 - switch to readonly
@@ -3833,6 +3845,7 @@ static int do_md_stop(mddev_t * mddev, i
 {
 	int err = 0;
 	struct gendisk *disk = mddev->gendisk;
+	int remove_group = 0;
 
 	if (atomic_read(&mddev->openers) > is_open) {
 		printk("md: %s still in use.\n",mdname(mddev));
@@ -3868,10 +3881,9 @@ static int do_md_stop(mddev_t * mddev, i
 			mddev->queue->merge_bvec_fn = NULL;
 			mddev->queue->unplug_fn = NULL;
 			mddev->queue->backing_dev_info.congested_fn = NULL;
-			if (mddev->pers->sync_request)
-				sysfs_remove_group(&mddev->kobj, &md_redundancy_group);
-
 			module_put(mddev->pers->owner);
+			if (mddev->pers->sync_request)
+				remove_group = 1;
 			mddev->pers = NULL;
 			/* tell userspace to handle 'inactive' */
 			sysfs_notify_dirent(mddev->sysfs_state);
@@ -3919,6 +3931,15 @@ static int do_md_stop(mddev_t * mddev, i
 		/* make sure all md_delayed_delete calls have finished */
 		flush_scheduled_work();
 
+		/* we can't wait for group removal under mddev_lock as
+		 * threads holding the group 'active' need to acquire
+		 * mddev_lock before going inactive
+		 */
+		if (remove_group) {
+			INIT_WORK(&mddev->del_work, sysfs_delayed_rm);
+			schedule_work(&mddev->del_work);
+		}
+
 		export_array(mddev);
 
 		mddev->array_sectors = 0;
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -245,6 +245,8 @@ struct mddev_s
 	 * file in sysfs.
 	 */
 
+	struct work_struct del_work;	/* used for delayed sysfs removal */
+
 	spinlock_t write_lock;
 	wait_queue_head_t sb_wait; /* for waiting on superblock updates */
 	atomic_t pending_writes; /* number of active superblock writes */
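The fix relies on a common kernel pattern: embed a `struct work_struct` in the owning object, queue it, and let the handler recover the owner with `container_of()`, as sysfs_delayed_rm does with `del_work`. The userspace sketch below imitates that pattern with a hypothetical, single-threaded queue; `schedule_work_model()`, `flush_work_model()`, `init_work()` and `struct mddev_model` are illustrative stand-ins, not kernel APIs. It also suggests why do_md_run now calls flush_scheduled_work() before sysfs_create_group(): in this simplified model, a removal still queued from a previous stop would otherwise delete the freshly created group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace container_of: step back from a member to its enclosing struct. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the kernel's work_struct. */
struct work_struct {
	void (*func)(struct work_struct *);
	struct work_struct *next;
};

static struct work_struct *queue_head;  /* pending work (LIFO for brevity) */

static void init_work(struct work_struct *w, void (*fn)(struct work_struct *))
{
	w->func = fn;
	w->next = NULL;
}

static void schedule_work_model(struct work_struct *w)
{
	w->next = queue_head;
	queue_head = w;
}

/* Run every pending item, like a (single-threaded) flush. */
static void flush_work_model(void)
{
	while (queue_head) {
		struct work_struct *w = queue_head;

		queue_head = w->next;
		w->func(w);
	}
}

/* Hypothetical slice of mddev_t: only what this example needs. */
struct mddev_model {
	bool redundancy_group;          /* is the sysfs group present? */
	struct work_struct del_work;    /* embedded work item, as in the patch */
};

/* Mirrors sysfs_delayed_rm(): recover the owner from the embedded item. */
static void delayed_rm(struct work_struct *ws)
{
	struct mddev_model *mddev = container_of(ws, struct mddev_model, del_work);

	mddev->redundancy_group = false;
}

/* do_md_stop(): queue the removal instead of waiting for it under the lock. */
static void stop_array(struct mddev_model *mddev)
{
	init_work(&mddev->del_work, delayed_rm);
	schedule_work_model(&mddev->del_work);
}

/* do_md_run(): optionally flush pending removals before re-creating. */
static void run_array(struct mddev_model *mddev, bool flush_first)
{
	if (flush_first)
		flush_work_model();
	mddev->redundancy_group = true;
}

/* Stop, then immediately restart; report whether the group survives. */
bool group_present_after_restart(bool flush_first)
{
	struct mddev_model mddev = { .redundancy_group = true };

	stop_array(&mddev);
	run_array(&mddev, flush_first);
	flush_work_model();     /* let any still-pending work run */
	return mddev.redundancy_group;
}
```

With the flush, `group_present_after_restart(true)` returns true; without it, `group_present_after_restart(false)` returns false because the stale removal runs after the group was re-created.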