From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Douglas Anderson <dianders@chromium.org>,
Guenter Roeck <groeck@chromium.org>,
Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
Sasha Levin <sashal@kernel.org>,
linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 4.9 66/90] bdev: Reduce time holding bd_mutex in sync in blkdev_close()
Date: Thu, 17 Sep 2020 22:14:31 -0400 [thread overview]
Message-ID: <20200918021455.2067301-66-sashal@kernel.org> (raw)
In-Reply-To: <20200918021455.2067301-1-sashal@kernel.org>
From: Douglas Anderson <dianders@chromium.org>
[ Upstream commit b849dd84b6ccfe32622988b79b7b073861fcf9f7 ]
While trying to "dd" to the block device for a USB stick, I
encountered a hung task warning (blocked for > 120 seconds). I
managed to come up with an easy way to reproduce this on my system
(where /dev/sdb is the block device for my USB stick) with:
while true; do dd if=/dev/zero of=/dev/sdb bs=4M; done
With my reproduction here are the relevant bits from the hung task
detector:
INFO: task udevd:294 blocked for more than 122 seconds.
...
udevd D 0 294 1 0x00400008
Call trace:
...
mutex_lock_nested+0x40/0x50
__blkdev_get+0x7c/0x3d4
blkdev_get+0x118/0x138
blkdev_open+0x94/0xa8
do_dentry_open+0x268/0x3a0
vfs_open+0x34/0x40
path_openat+0x39c/0xdf4
do_filp_open+0x90/0x10c
do_sys_open+0x150/0x3c8
...
...
Showing all locks held in the system:
...
1 lock held by dd/2798:
#0: ffffff814ac1a3b8 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x50/0x204
...
dd D 0 2798 2764 0x00400208
Call trace:
...
schedule+0x8c/0xbc
io_schedule+0x1c/0x40
wait_on_page_bit_common+0x238/0x338
__lock_page+0x5c/0x68
write_cache_pages+0x194/0x500
generic_writepages+0x64/0xa4
blkdev_writepages+0x24/0x30
do_writepages+0x48/0xa8
__filemap_fdatawrite_range+0xac/0xd8
filemap_write_and_wait+0x30/0x84
__blkdev_put+0x88/0x204
blkdev_put+0xc4/0xe4
blkdev_close+0x28/0x38
__fput+0xe0/0x238
____fput+0x1c/0x28
task_work_run+0xb0/0xe4
do_notify_resume+0xfc0/0x14bc
work_pending+0x8/0x14
The problem appears related to the fact that my USB disk is terribly
slow and that I have a lot of RAM in my system to cache things.
Specifically my writes seem to be happening at ~15 MB/s and I've got
~4 GB of RAM in my system that can be used for buffering. To write 4
GB of buffer to disk thus takes ~4000 MB / ~15 MB/s = ~267 seconds.
The 267 second number is a problem because in __blkdev_put() we call
sync_blockdev() while holding the bd_mutex. Any other callers who
want the bd_mutex will be blocked for the whole time.
The problem is made worse because I believe blkdev_put() specifically
tells other tasks (namely udev) to go try to access the device at right
around the same time we're going to hold the mutex for a long time.
Putting some traces around this (after disabling the hung task detector),
I could confirm:
dd: 437.608600: __blkdev_put() right before sync_blockdev() for sdb
udevd: 437.623901: blkdev_open() right before blkdev_get() for sdb
dd: 661.468451: __blkdev_put() right after sync_blockdev() for sdb
udevd: 663.820426: blkdev_open() right after blkdev_get() for sdb
A simple fix for this is to realize that sync_blockdev() works fine if
you're not holding the mutex. Also, it's not the end of the world if
you sync a little early (though it can have performance impacts).
Thus we can make a guess that we're going to need to do the sync and
then do it without holding the mutex. We still do one last sync with
the mutex but it should be much, much faster.
With this, my hung task warnings for my test case are gone.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
fs/block_dev.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 06f7cbe201326..98b37e77683d3 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1586,6 +1586,16 @@ static void __blkdev_put(struct block_device *bdev, fmode_t mode, int for_part)
struct gendisk *disk = bdev->bd_disk;
struct block_device *victim = NULL;
+ /*
+ * Sync early if it looks like we're the last one. If someone else
+ * opens the block device between now and the decrement of bd_openers
+ * then we did a sync that we didn't need to, but that's not the end
+ * of the world and we want to avoid long (could be several minute)
+ * syncs while holding the mutex.
+ */
+ if (bdev->bd_openers == 1)
+ sync_blockdev(bdev);
+
mutex_lock_nested(&bdev->bd_mutex, for_part);
if (for_part)
bdev->bd_part_count--;
--
2.25.1
next prev parent reply other threads:[~2020-09-18 2:25 UTC|newest]
Thread overview: 95+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-09-18 2:13 [PATCH AUTOSEL 4.9 01/90] scsi: aacraid: fix illegal IO beyond last LBA Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 02/90] m68k: q40: Fix info-leak in rtc_ioctl Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 03/90] gma/gma500: fix a memory disclosure bug due to uninitialized bytes Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 04/90] ASoC: kirkwood: fix IRQ error handling Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 05/90] ata: sata_mv, avoid trigerrable BUG_ON Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 06/90] PM / devfreq: tegra30: Fix integer overflow on CPU's freq max out Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 07/90] clk/ti/adpll: allocate room for terminating null Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 08/90] mtd: cfi_cmdset_0002: don't free cfi->cfiq in error path of cfi_amdstd_setup() Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 09/90] mfd: mfd-core: Protect against NULL call-back function pointer Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 10/90] tracing: Adding NULL checks for trace_array descriptor pointer Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 11/90] bcache: fix a lost wake-up problem caused by mca_cannibalize_lock Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 12/90] RDMA/i40iw: Fix potential use after free Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 13/90] xfs: fix attr leaf header freemap.size underflow Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 14/90] RDMA/iw_cgxb4: Fix an error handling path in 'c4iw_connect()' Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 15/90] debugfs: Fix !DEBUG_FS debugfs_create_automount Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 16/90] CIFS: Properly process SMB3 lease breaks Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 17/90] kernel/sys.c: avoid copying possible padding bytes in copy_to_user Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 18/90] neigh_stat_seq_next() should increase position index Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 19/90] rt_cpu_seq_next " Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 20/90] seqlock: Require WRITE_ONCE surrounding raw_seqcount_barrier Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 21/90] media: ti-vpe: cal: Restrict DMA to avoid memory corruption Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 22/90] ACPI: EC: Reference count query handlers under lock Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 23/90] efi/arm: Defer probe of PCIe backed efifb on DT systems Sasha Levin
2020-09-18 6:25 ` Ard Biesheuvel
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 24/90] dmaengine: zynqmp_dma: fix burst length configuration Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 25/90] tracing: Set kernel_stack's caller size properly Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 26/90] ext4: make dioread_nolock the default Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 27/90] ar5523: Add USB ID of SMCWUSBT-G2 wireless adapter Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 28/90] Bluetooth: Fix refcount use-after-free issue Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 29/90] mm: pagewalk: fix termination condition in walk_pte_range() Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 30/90] Bluetooth: prefetch channel before killing sock Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 31/90] KVM: fix overflow of zero page refcount with ksm running Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 32/90] ALSA: hda: Clear RIRB status before reading WP Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 33/90] skbuff: fix a data race in skb_queue_len() Sasha Levin
2020-09-18 2:13 ` [PATCH AUTOSEL 4.9 34/90] audit: CONFIG_CHANGE don't log internal bookkeeping as an event Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 35/90] selinux: sel_avc_get_stat_idx should increase position index Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 36/90] scsi: lpfc: Fix RQ buffer leakage when no IOCBs available Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 37/90] scsi: lpfc: Fix coverity errors in fmdi attribute handling Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 38/90] drm/omap: fix possible object reference leak Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 39/90] RDMA/rxe: Fix configuration of atomic queue pair attributes Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 40/90] KVM: x86: fix incorrect comparison in trace event Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 41/90] x86/pkeys: Add check for pkey "overflow" Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 42/90] bpf: Remove recursion prevention from rcu free callback Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 43/90] dmaengine: tegra-apb: Prevent race conditions on channel's freeing Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 44/90] media: go7007: Fix URB type for interrupt handling Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 45/90] Bluetooth: guard against controllers sending zero'd events Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 46/90] timekeeping: Prevent 32bit truncation in scale64_check_overflow() Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 47/90] drm/amdgpu: increase atombios cmd timeout Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 48/90] Bluetooth: L2CAP: handle l2cap config request during open state Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 49/90] media: tda10071: fix unsigned sign extension overflow Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 50/90] xfs: don't ever return a stale pointer from __xfs_dir3_free_read Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 51/90] tpm: ibmvtpm: Wait for buffer to be set before proceeding Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 52/90] tracing: Use address-of operator on section symbols Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 53/90] serial: 8250_port: Don't service RX FIFO if throttled Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 54/90] serial: 8250_omap: Fix sleeping function called from invalid context during probe Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 55/90] serial: 8250: 8250_omap: Terminate DMA before pushing data on RX timeout Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 56/90] cpufreq: powernv: Fix frame-size-overflow in powernv_cpufreq_work_fn Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 57/90] tools: gpio-hammer: Avoid potential overflow in main Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 58/90] SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()' Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 59/90] svcrdma: Fix leak of transport addresses Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 60/90] ubifs: Fix out-of-bounds memory access caused by abnormal value of node_len Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 61/90] ALSA: usb-audio: Fix case when USB MIDI interface has more than one extra endpoint descriptor Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 62/90] mm/filemap.c: clear page error before actual read Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 63/90] mm/mmap.c: initialize align_offset explicitly for vm_unmapped_area Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 64/90] serial: uartps: Wait for tx_empty in console setup Sasha Levin
2020-09-28 20:11 ` Naresh Kamboju
2020-09-28 20:13 ` Naresh Kamboju
2020-09-29 6:59 ` Greg Kroah-Hartman
2020-09-29 17:39 ` Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 65/90] KVM: Remove CREATE_IRQCHIP/SET_PIT2 race Sasha Levin
2020-09-18 2:14 ` Sasha Levin [this message]
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 67/90] drivers: char: tlclk.c: Avoid data race between init and interrupt handler Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 68/90] dt-bindings: sound: wm8994: Correct required supplies based on actual implementaion Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 69/90] atm: fix a memory leak of vcc->user_back Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 70/90] phy: samsung: s5pv210-usb2: Add delay after reset Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 71/90] Bluetooth: Handle Inquiry Cancel error after Inquiry Complete Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 72/90] USB: EHCI: ehci-mv: fix error handling in mv_ehci_probe() Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 73/90] tty: serial: samsung: Correct clock selection logic Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 74/90] ALSA: hda: Fix potential race in unsol event handler Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 75/90] fuse: don't check refcount after stealing page Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 76/90] USB: EHCI: ehci-mv: fix less than zero comparison of an unsigned int Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 77/90] e1000: Do not perform reset in reset_task if we are already down Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 78/90] printk: handle blank console arguments passed in Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 79/90] btrfs: don't force read-only after error in drop snapshot Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 80/90] vfio/pci: fix memory leaks of eventfd ctx Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 81/90] perf util: Fix memory leak of prefix_if_not_in Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 82/90] perf kcore_copy: Fix module map when there are no modules loaded Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 83/90] mtd: rawnand: omap_elm: Fix runtime PM imbalance on error Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 84/90] ceph: fix potential race in ceph_check_caps Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 85/90] mtd: parser: cmdline: Support MTD names containing one or more colons Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 86/90] x86/speculation/mds: Mark mds_user_clear_cpu_buffers() __always_inline Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 87/90] vfio/pci: Clear error and request eventfd ctx after releasing Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 88/90] cifs: Fix double add page to memcg when cifs_readpages Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 89/90] selftests/x86/syscall_nt: Clear weird flags after each test Sasha Levin
2020-09-18 2:14 ` [PATCH AUTOSEL 4.9 90/90] vfio/pci: fix racy on error and request eventfd ctx Sasha Levin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200918021455.2067301-66-sashal@kernel.org \
--to=sashal@kernel.org \
--cc=axboe@kernel.dk \
--cc=dianders@chromium.org \
--cc=groeck@chromium.org \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox