From: Greg KH <gregkh@suse.de>
To: linux-kernel@vger.kernel.org, stable@kernel.org,
Greg KH <greg@kroah.com>
Cc: Justin Forbes <jmforbes@linuxtx.org>,
Zwane Mwaikambo <zwane@arm.linux.org.uk>,
"Theodore Ts'o" <tytso@mit.edu>,
Randy Dunlap <rdunlap@xenotime.net>,
Dave Jones <davej@redhat.com>,
Chuck Wolber <chuckw@quantumlinux.com>,
Chris Wedgwood <reviews@ml.cw.f00f.org>,
Michael Krufky <mkrufky@linuxtv.org>,
Chuck Ebbert <cebbert@redhat.com>,
Domenico Andreoli <cavokz@gmail.com>, Willy Tarreau <w@1wt.eu>,
Rodrigo Rubira Branco <rbranco@la.checkpoint.com>,
Jake Edge <jake@lwn.net>, Eugene Teo <eteo@redhat.com>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
alan@lxorguk.ukuu.org.uk, NeilBrown <neilb@suse.de>,
Dan Williams <dan.j.williams@intel.com>
Subject: [patch 78/88] md: fix deadlock when stopping arrays
Date: Thu, 30 Apr 2009 09:57:07 -0700 [thread overview]
Message-ID: <20090430165749.877345784@mini.kroah.org> (raw)
In-Reply-To: <20090430170122.GA16015@kroah.com>
[-- Attachment #1: md-fix-deadlock-when-stopping-arrays.patch --]
[-- Type: text/plain, Size: 4927 bytes --]
2.6.28-stable review patch. If anyone has any objections, please let us know.
------------------
From: Dan Williams <dan.j.williams@intel.com>
[backport of 5fd3a17ed456637a224cf4ca82b9ad9d005bc8d4]
Resolve a deadlock when stopping redundant arrays, i.e. ones that
require a call to sysfs_remove_group when shutdown. The deadlock is
summarized below:
Thread1 Thread2
------- -------
read sysfs attribute stop array
take mddev lock
sysfs_remove_group
sysfs_get_active
wait for mddev lock
wait for active
Sysrq-w:
--------
mdmon S 00000017 2212 4163 1
f1982ea8 00000046 2dcf6b85 00000017 c0b23100 f2f83ed0 c0b23100 f2f8413c
c0b23100 c0b23100 c0b1fb98 f2f8413c 00000000 f2f8413c c0b23100 f2291ecc
00000002 c0b23100 00000000 00000017 f2f83ed0 f1982eac 00000046 c044d9dd
Call Trace:
[<c044d9dd>] ? debug_mutex_add_waiter+0x1d/0x58
[<c06ef451>] __mutex_lock_common+0x1d9/0x338
[<c06ef451>] ? __mutex_lock_common+0x1d9/0x338
[<c06ef5e3>] mutex_lock_interruptible_nested+0x33/0x3a
[<c0634553>] ? mddev_lock+0x14/0x16
[<c0634553>] mddev_lock+0x14/0x16
[<c0634eda>] md_attr_show+0x2a/0x49
[<c04e9997>] sysfs_read_file+0x93/0xf9
mdadm D 00000017 2812 4177 1
f0401d78 00000046 430456f8 00000017 f0401d58 f0401d20 c0b23100 f2da2c4c
c0b23100 c0b23100 c0b1fb98 f2da2c4c 0a10fc36 00000000 c0b23100 f0401d70
00000003 c0b23100 00000000 00000017 f2da29e0 00000001 00000002 00000000
Call Trace:
[<c06eed1b>] schedule_timeout+0x1b/0x95
[<c06eed1b>] ? schedule_timeout+0x1b/0x95
[<c06eeb97>] ? wait_for_common+0x34/0xdc
[<c044fa8a>] ? trace_hardirqs_on_caller+0x18/0x145
[<c044fbc2>] ? trace_hardirqs_on+0xb/0xd
[<c06eec03>] wait_for_common+0xa0/0xdc
[<c0428c7c>] ? default_wake_function+0x0/0x12
[<c06eeccc>] wait_for_completion+0x17/0x19
[<c04ea620>] sysfs_addrm_finish+0x19f/0x1d1
[<c04e920e>] sysfs_hash_and_remove+0x42/0x55
[<c04eb4db>] sysfs_remove_group+0x57/0x86
[<c0638086>] do_md_stop+0x13a/0x499
This has been there for a while, but is easier to trigger now that mdmon
is closely watching sysfs.
Cc: Neil Brown <neilb@suse.de>
Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
---
drivers/md/md.c | 27 ++++++++++++++++++++++++---
include/linux/raid/md_k.h | 2 ++
2 files changed, 26 insertions(+), 3 deletions(-)
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -3694,6 +3694,10 @@ static int do_md_run(mddev_t * mddev)
return err;
}
if (mddev->pers->sync_request) {
+ /* wait for any previously scheduled redundancy groups
+ * to be removed
+ */
+ flush_scheduled_work();
if (sysfs_create_group(&mddev->kobj, &md_redundancy_group))
printk(KERN_WARNING
"md: cannot register extra attributes for %s\n",
@@ -3824,6 +3828,14 @@ static void restore_bitmap_write_access(
spin_unlock(&inode->i_lock);
}
+
+static void sysfs_delayed_rm(struct work_struct *ws)
+{
+ mddev_t *mddev = container_of(ws, mddev_t, del_work);
+
+ sysfs_remove_group(&mddev->kobj, &md_redundancy_group);
+}
+
/* mode:
* 0 - completely stop and dis-assemble array
* 1 - switch to readonly
@@ -3833,6 +3845,7 @@ static int do_md_stop(mddev_t * mddev, i
{
int err = 0;
struct gendisk *disk = mddev->gendisk;
+ int remove_group = 0;
if (atomic_read(&mddev->openers) > is_open) {
printk("md: %s still in use.\n",mdname(mddev));
@@ -3868,10 +3881,9 @@ static int do_md_stop(mddev_t * mddev, i
mddev->queue->merge_bvec_fn = NULL;
mddev->queue->unplug_fn = NULL;
mddev->queue->backing_dev_info.congested_fn = NULL;
- if (mddev->pers->sync_request)
- sysfs_remove_group(&mddev->kobj, &md_redundancy_group);
-
module_put(mddev->pers->owner);
+ if (mddev->pers->sync_request)
+ remove_group = 1;
mddev->pers = NULL;
/* tell userspace to handle 'inactive' */
sysfs_notify_dirent(mddev->sysfs_state);
@@ -3919,6 +3931,15 @@ static int do_md_stop(mddev_t * mddev, i
/* make sure all md_delayed_delete calls have finished */
flush_scheduled_work();
+ /* we can't wait for group removal under mddev_lock as
+ * threads holding the group 'active' need to acquire
+ * mddev_lock before going inactive
+ */
+ if (remove_group) {
+ INIT_WORK(&mddev->del_work, sysfs_delayed_rm);
+ schedule_work(&mddev->del_work);
+ }
+
export_array(mddev);
mddev->array_sectors = 0;
--- a/include/linux/raid/md_k.h
+++ b/include/linux/raid/md_k.h
@@ -245,6 +245,8 @@ struct mddev_s
* file in sysfs.
*/
+ struct work_struct del_work; /* used for delayed sysfs removal */
+
spinlock_t write_lock;
wait_queue_head_t sb_wait; /* for waiting on superblock updates */
atomic_t pending_writes; /* number of active superblock writes */
next prev parent reply other threads:[~2009-04-30 17:43 UTC|newest]
Thread overview: 100+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20090430165549.117010404@mini.kroah.org>
[not found] ` <20090430170122.GA16015-U8xfFu+wG4EAvxtiuMwx3w@public.gmane.org>
2009-04-30 16:56 ` [patch 35/88] sched: do not count frozen tasks toward load Greg KH
2009-04-30 16:56 ` Greg KH
2009-04-30 17:01 ` [patch 00/88] 2.6.28.10-stable review Greg KH
2009-04-30 16:55 ` [patch 01/88] bonding: Fix updating of speed/duplex changes Greg KH
2009-04-30 16:55 ` [patch 02/88] net: fix sctp breakage Greg KH
2009-04-30 16:55 ` [patch 03/88] ipv6: dont use tw net when accounting for recycled tw Greg KH
2009-04-30 16:55 ` [patch 04/88] ipv6: Plug sk_buff leak in ipv6_rcv (net/ipv6/ip6_input.c) Greg KH
2009-04-30 16:55 ` [patch 05/88] netfilter: nf_conntrack_tcp: fix unaligned memory access in tcp_sack Greg KH
2009-04-30 16:55 ` [patch 06/88] xfrm: spin_lock() should be spin_unlock() in xfrm_state.c Greg KH
2009-04-30 16:55 ` [patch 07/88] bridge: bad error handling when adding invalid ether address Greg KH
2009-04-30 16:55 ` [patch 08/88] bas_gigaset: correctly allocate USB interrupt transfer buffer Greg KH
2009-04-30 16:55 ` [patch 09/88] USB: EHCI: add software retry for transaction errors Greg KH
2009-04-30 16:55 ` [patch 10/88] USB: fix USB_STORAGE_CYPRESS_ATACB Greg KH
2009-04-30 16:56 ` [patch 11/88] USB: usb-storage: increase max_sectors for tape drives Greg KH
2009-04-30 16:56 ` [patch 12/88] USB: gadget: fix rndis regression Greg KH
2009-04-30 16:56 ` [patch 13/88] USB: add quirk to avoid config and interface strings Greg KH
2009-04-30 16:56 ` [patch 14/88] cifs: fix buffer format byte on NT Rename/hardlink Greg KH
2009-04-30 16:56 ` [patch 15/88] b43: fix b43_plcp_get_bitrate_idx_ofdm return type Greg KH
2009-04-30 16:56 ` [patch 16/88] CIFS: Fix memory overwrite when saving nativeFileSystem field during mount Greg KH
2009-04-30 16:56 ` [patch 17/88] Add a missing unlock_kernel() in raw_open() Greg KH
2009-04-30 16:56 ` [patch 18/88] x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot Greg KH
2009-04-30 16:56 ` [patch 19/88] x86: mtrr: dont modify RdDram/WrDram bits of fixed MTRRs Greg KH
2009-04-30 16:56 ` [patch 20/88] security/smack: fix oops when setting a size 0 SMACK64 xattr Greg KH
2009-04-30 16:56 ` [patch 21/88] x86, setup: mark %esi as clobbered in E820 BIOS call Greg KH
2009-04-30 16:56 ` [patch 22/88] dock: fix dereference after kfree() Greg KH
2009-04-30 16:56 ` [patch 23/88] mm: define a UNIQUE value for AS_UNEVICTABLE flag Greg KH
2009-04-30 16:56 ` [patch 24/88] mm: do_xip_mapping_read: fix length calculation Greg KH
2009-04-30 16:56 ` [patch 25/88] vfs: skip I_CLEAR state inodes Greg KH
2009-04-30 16:56 ` [patch 26/88] af_rose/x25: Sanity check the maximum user frame size Greg KH
2009-04-30 16:56 ` [patch 27/88] net/netrom: Fix socket locking Greg KH
2009-04-30 16:56 ` [patch 28/88] kprobes: Fix locking imbalance in kretprobes Greg KH
2009-04-30 16:56 ` [patch 29/88] netfilter: {ip, ip6, arp}_tables: fix incorrect loop detection Greg KH
2009-04-30 16:56 ` [patch 30/88] splice: fix deadlock in splicing to file Greg KH
2009-04-30 16:56 ` [patch 31/88] ALSA: hda - add missing comma in ad1884_slave_vols Greg KH
2009-04-30 16:56 ` [patch 32/88] SCSI: libiscsi: fix iscsi pool error path Greg KH
2009-04-30 16:56 ` [patch 33/88] SCSI: libiscsi: fix iscsi pool error path again Greg KH
2009-04-30 16:56 ` [patch 34/88] posixtimers, sched: Fix posix clock monotonicity Greg KH
2009-04-30 16:56 ` [patch 35/88] sched: do not count frozen tasks toward load Greg KH
2009-04-30 16:56 ` [patch 36/88] add some long-missing capabilities to fs_mask Greg KH
2009-04-30 16:56 ` [patch 37/88] spi: spi_write_then_read() bugfixes Greg KH
2009-04-30 16:56 ` [patch 38/88] powerpc: Fix data-corrupting bug in __futex_atomic_op Greg KH
2009-04-30 16:56 ` [patch 39/88] hpt366: fix HPT370 DMA timeouts Greg KH
2009-04-30 16:56 ` [patch 40/88] pata_hpt37x: " Greg KH
2009-04-30 16:56 ` [patch 41/88] mm: pass correct mm when growing stack Greg KH
2009-04-30 16:56 ` [patch 42/88] SCSI: sg: fix races during device removal Greg KH
2009-04-30 16:56 ` [patch 43/88] SCSI: sg: fix races with ioctl(SG_IO) Greg KH
2009-04-30 16:56 ` [patch 44/88] SCSI: sg: avoid blk_put_request/blk_rq_unmap_user in interrupt Greg KH
2009-04-30 16:56 ` [patch 45/88] usb gadget: fix ethernet link reports to ethtool Greg KH
2009-04-30 16:56 ` [patch 46/88] USB: ftdi_sio: add vendor/project id for JETI specbos 1201 spectrometer Greg KH
2009-04-30 16:56 ` [patch 47/88] USB: fix oops in cdc-wdm in case of malformed descriptors Greg KH
2009-04-30 16:56 ` [patch 48/88] USB: usb-storage: augment unusual_devs entry for Simple Tech/Datafab Greg KH
2009-04-30 16:56 ` [patch 49/88] Input: gameport - fix attach driver code Greg KH
2009-04-30 16:56 ` [patch 50/88] r8169: Reset IntrStatus after chip reset Greg KH
2009-04-30 16:56 ` [patch 51/88] agp: zero pages before sending to userspace Greg KH
2009-04-30 16:56 ` [patch 52/88] hugetlbfs: return negative error code for bad mount option Greg KH
2009-04-30 16:56 ` [patch 53/88] block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb Greg KH
2009-04-30 16:56 ` [patch 54/88] anon_inodes: use fops->owner for module refcount Greg KH
2009-04-30 16:56 ` [patch 55/88] KVM: x86: Reset pending/inject NMI state on CPU reset Greg KH
2009-04-30 16:56 ` [patch 56/88] KVM: call kvm_arch_vcpu_reset() instead of the kvm_x86_ops callback Greg KH
2009-04-30 16:56 ` [patch 57/88] KVM: MMU: Extend kvm_mmu_page->slot_bitmap size Greg KH
2009-04-30 16:56 ` [patch 58/88] KVM: VMX: Move private memory slot position Greg KH
2009-04-30 16:56 ` [patch 59/88] KVM: SVM: Set the g bit of the cs selector for cross-vendor migration Greg KH
2009-04-30 16:56 ` [patch 60/88] KVM: SVM: Set the busy flag of the TR selector Greg KH
2009-04-30 16:56 ` [patch 61/88] KVM: MMU: Fix aliased gfns treated as unaliased Greg KH
2009-04-30 16:56 ` [patch 62/88] KVM: Fix cpuid leaf 0xb loop termination Greg KH
2009-04-30 16:56 ` [patch 63/88] KVM: Fix cpuid iteration on multiple leaves per eac Greg KH
2009-04-30 16:56 ` [patch 64/88] KVM: Prevent trace call into unloaded module text Greg KH
2009-04-30 16:56 ` [patch 65/88] KVM: Really remove a slot when a user ask us so Greg KH
2009-04-30 16:56 ` [patch 66/88] KVM: x86 emulator: Fix handling of VMMCALL instruction Greg KH
2009-04-30 16:56 ` [patch 67/88] KVM: set owner of cpu and vm file operations Greg KH
2009-04-30 16:56 ` [patch 68/88] KVM: Advertise the bug in memory region destruction as fixed Greg KH
2009-04-30 16:56 ` [patch 69/88] KVM: MMU: check for present pdptr shadow page in walk_shadow Greg KH
2009-04-30 16:56 ` [patch 70/88] KVM: MMU: handle large host sptes on invlpg/resync Greg KH
2009-04-30 16:57 ` [patch 71/88] KVM: mmu_notifiers release method Greg KH
2009-04-30 16:57 ` [patch 72/88] KVM: PIT: fix i8254 pending count read Greg KH
2009-04-30 16:57 ` [patch 73/88] KVM: x86: disable kvmclock on non constant TSC hosts Greg KH
2009-04-30 16:57 ` [patch 74/88] KVM: x86: fix LAPIC pending count calculation Greg KH
2009-04-30 16:57 ` [patch 75/88] KVM: VMX: Flush volatile msrs before emulating rdmsr Greg KH
2009-04-30 16:57 ` [ath9k-devel] [patch 76/88] ath9k: implement IO serialization Greg KH
2009-04-30 16:57 ` Greg KH
2009-04-30 16:57 ` Greg KH
2009-04-30 16:57 ` [ath9k-devel] [patch 77/88] ath9k: AR9280 PCI devices must serialize IO as well Greg KH
2009-04-30 16:57 ` Greg KH
2009-04-30 16:57 ` Greg KH
2009-04-30 16:57 ` Greg KH [this message]
2009-04-30 16:57 ` [patch 79/88] block: include empty disks in /proc/diskstats Greg KH
2009-04-30 16:57 ` [patch 80/88] powerpc: Sanitize stack pointer in signal handling code Greg KH
2009-04-30 16:57 ` [patch 81/88] fs core fixes Greg KH
2009-04-30 16:57 ` [patch 82/88] fix ptrace slowness Greg KH
2009-04-30 16:57 ` [patch 83/88] crypto: ixp4xx - Fix handling of chained sg buffers Greg KH
2009-04-30 16:57 ` [patch 84/88] PCI: fix incorrect mask of PM No_Soft_Reset bit Greg KH
2009-04-30 16:57 ` [patch 85/88] exit_notify: kill the wrong capable(CAP_KILL) check (CVE-2009-1337) Greg KH
2009-04-30 16:57 ` [patch 86/88] b44: Use kernel DMA addresses for the kernel DMA API Greg KH
2009-04-30 16:57 ` [patch 87/88] thinkpad-acpi: fix LED blinking through timer trigger Greg KH
2009-04-30 16:57 ` [patch 88/88] unreached code in selinux_ip_postroute_iptables_compat() (CVE-2009-1184) Greg KH
2009-04-30 21:44 ` [patch 00/88] 2.6.28.10-stable review Henrique de Moraes Holschuh
2009-04-30 21:54 ` Willy Tarreau
2009-05-02 15:38 ` Henrique de Moraes Holschuh
2009-04-30 22:32 ` Greg KH
2009-05-01 0:07 ` Henrique de Moraes Holschuh
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090430165749.877345784@mini.kroah.org \
--to=gregkh@suse.de \
--cc=akpm@linux-foundation.org \
--cc=alan@lxorguk.ukuu.org.uk \
--cc=cavokz@gmail.com \
--cc=cebbert@redhat.com \
--cc=chuckw@quantumlinux.com \
--cc=dan.j.williams@intel.com \
--cc=davej@redhat.com \
--cc=eteo@redhat.com \
--cc=greg@kroah.com \
--cc=jake@lwn.net \
--cc=jmforbes@linuxtx.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkrufky@linuxtv.org \
--cc=neilb@suse.de \
--cc=rbranco@la.checkpoint.com \
--cc=rdunlap@xenotime.net \
--cc=reviews@ml.cw.f00f.org \
--cc=stable@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=tytso@mit.edu \
--cc=w@1wt.eu \
--cc=zwane@arm.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.