From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
alan@lxorguk.ukuu.org.uk, "Theodore Tso" <tytso@mit.edu>
Subject: [ 067/120] ext4: fix potential deadlock in ext4_nonda_switch()
Date: Thu, 11 Oct 2012 10:00:19 +0900
Message-ID: <20121011005838.357274078@linuxfoundation.org>
In-Reply-To: <20121011005825.364610894@linuxfoundation.org>
3.4-stable review patch. If anyone has any objections, please let me know.
------------------
From: Theodore Ts'o <tytso@mit.edu>
commit 00d4e7362ed01987183e9528295de3213031309c upstream.
In ext4_nonda_switch(), if the file system is getting full we used to
call writeback_inodes_sb_if_idle(). The problem is that we can be
holding i_mutex already, and this causes a potential deadlock when
writeback_inodes_sb_if_idle() tries to take s_umount. (See
lockdep output below).
As it turns out we don't need to hold s_umount; the fact that we
are in the middle of the write(2) system call will keep the superblock
pinned. Unfortunately writeback_inodes_sb() checks to make sure
s_umount is taken, and the VFS uses a different mechanism for making
sure the file system doesn't get unmounted out from under us. The
simplest way of dealing with this is to just simply grab s_umount
using a trylock, and skip kicking the writeback flusher thread in the
very unlikely case that we can't take a read lock on s_umount without
blocking.
Also, we now check the criteria for kicking the writeback thread
before we decide whether to fall back to non-delayed writeback, so
if there are any outstanding delayed allocation writes, we try to get
them resolved as soon as possible.
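The two thresholds involved can be modelled as plain predicates to see how they interact. This is a simplified sketch, not the kernel function: the signed 64-bit types, the helper names, and the value of DEMO_WATERMARK are assumptions (the real code uses EXT4_FREECLUSTERS_WATERMARK, whose value does not appear in this patch).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed placeholder; stands in for EXT4_FREECLUSTERS_WATERMARK. */
#define DEMO_WATERMARK 1024LL

/* Both counters are read as non-negative values in the patch. */
static void check_counts(long long free_blocks, long long dirty_blocks)
{
	assert(free_blocks >= 0 && dirty_blocks >= 0);
}

/*
 * First check in the patched function: kick writeback when there is dirty
 * data, more than half of the free blocks are dirty, and no writeback is
 * already running.
 */
static bool should_kick_writeback(long long free_blocks,
				  long long dirty_blocks,
				  bool writeback_running)
{
	check_counts(free_blocks, dirty_blocks);
	return dirty_blocks != 0 &&
	       free_blocks < 2 * dirty_blocks &&
	       !writeback_running;
}

/*
 * Second check: fall back from delayed allocation when free blocks are
 * below 150% of dirty blocks, or below dirty blocks plus the watermark.
 */
static bool should_switch_to_nonda(long long free_blocks,
				   long long dirty_blocks)
{
	check_counts(free_blocks, dirty_blocks);
	return 2 * free_blocks < 3 * dirty_blocks ||
	       free_blocks < dirty_blocks + DEMO_WATERMARK;
}
```

With free_blocks = 10000 and dirty_blocks = 6000, for example, the first predicate holds (10000 < 12000) while the second does not (20000 is not below 18000, and 10000 is not below 7024), so in this model writeback is kicked but delayed allocation stays enabled.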
[ INFO: possible circular locking dependency detected ]
3.6.0-rc1-00042-gce894ca #367 Not tainted
-------------------------------------------------------
dd/8298 is trying to acquire lock:
(&type->s_umount_key#18){++++..}, at: [<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
but task is already holding lock:
(&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3
which lock already depends on the new lock.
2 locks held by dd/8298:
#0: (sb_writers#2){.+.+.+}, at: [<c01ddcc5>] generic_file_aio_write+0x56/0xd3
#1: (&sb->s_type->i_mutex_key#8){+.+...}, at: [<c01ddcce>] generic_file_aio_write+0x5f/0xd3
stack backtrace:
Pid: 8298, comm: dd Not tainted 3.6.0-rc1-00042-gce894ca #367
Call Trace:
[<c015b79c>] ? console_unlock+0x345/0x372
[<c06d62a1>] print_circular_bug+0x190/0x19d
[<c019906c>] __lock_acquire+0x86d/0xb6c
[<c01999db>] ? mark_held_locks+0x5c/0x7b
[<c0199724>] lock_acquire+0x66/0xb9
[<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
[<c06db935>] down_read+0x28/0x58
[<c02277d4>] ? writeback_inodes_sb_if_idle+0x28/0x46
[<c02277d4>] writeback_inodes_sb_if_idle+0x28/0x46
[<c026f3b2>] ext4_nonda_switch+0xe1/0xf4
[<c0271ece>] ext4_da_write_begin+0x27/0x193
[<c01dcdb0>] generic_file_buffered_write+0xc8/0x1bb
[<c01ddc47>] __generic_file_aio_write+0x1dd/0x205
[<c01ddce7>] generic_file_aio_write+0x78/0xd3
[<c026d336>] ext4_file_write+0x480/0x4a6
[<c0198c1d>] ? __lock_acquire+0x41e/0xb6c
[<c0180944>] ? sched_clock_cpu+0x11a/0x13e
[<c01967e9>] ? trace_hardirqs_off+0xb/0xd
[<c018099f>] ? local_clock+0x37/0x4e
[<c0209f2c>] do_sync_write+0x67/0x9d
[<c0209ec5>] ? wait_on_retry_sync_kiocb+0x44/0x44
[<c020a7b9>] vfs_write+0x7b/0xe6
[<c020a9a6>] sys_write+0x3b/0x64
[<c06dd4bd>] syscall_call+0x7/0xb
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
fs/ext4/inode.c | 17 ++++++++++-------
fs/fs-writeback.c | 1 +
2 files changed, 11 insertions(+), 7 deletions(-)
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2386,6 +2386,16 @@ static int ext4_nonda_switch(struct supe
 	free_blocks = EXT4_C2B(sbi,
 		percpu_counter_read_positive(&sbi->s_freeclusters_counter));
 	dirty_blocks = percpu_counter_read_positive(&sbi->s_dirtyclusters_counter);
+	/*
+	 * Start pushing delalloc when 1/2 of free blocks are dirty.
+	 */
+	if (dirty_blocks && (free_blocks < 2 * dirty_blocks) &&
+	    !writeback_in_progress(sb->s_bdi) &&
+	    down_read_trylock(&sb->s_umount)) {
+		writeback_inodes_sb(sb, WB_REASON_FS_FREE_SPACE);
+		up_read(&sb->s_umount);
+	}
+
 	if (2 * free_blocks < 3 * dirty_blocks ||
 	    free_blocks < (dirty_blocks + EXT4_FREECLUSTERS_WATERMARK)) {
 		/*
@@ -2394,13 +2404,6 @@ static int ext4_nonda_switch(struct supe
 		 */
 		return 1;
 	}
-	/*
-	 * Even if we don't switch but are nearing capacity,
-	 * start pushing delalloc when 1/2 of free blocks are dirty.
-	 */
-	if (free_blocks < 2 * dirty_blocks)
-		writeback_inodes_sb_if_idle(sb, WB_REASON_FS_FREE_SPACE);
-
 	return 0;
 }
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -68,6 +68,7 @@ int writeback_in_progress(struct backing
 {
 	return test_bit(BDI_writeback_running, &bdi->state);
 }
+EXPORT_SYMBOL(writeback_in_progress);
 static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
 {