From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Peter Mann <peter.mann@sh.cz>,
Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.15 53/76] io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
Date: Tue, 12 Nov 2024 11:21:18 +0100 [thread overview]
Message-ID: <20241112101841.799632310@linuxfoundation.org> (raw)
In-Reply-To: <20241112101839.777512218@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe <axboe@kernel.dk>
Commit 1d60d74e852647255bd8e76f5a22dc42531e4389 upstream.
When io_uring starts a write, it'll call kiocb_start_write() to bump the
super block rwsem, preventing any freezes from happening while that
write is in-flight. The freeze side will grab that rwsem for writing,
excluding any new writers from happening and waiting for existing writes
to finish. But io_uring unconditionally uses kiocb_start_write(), which
will block if someone is currently attempting to freeze the mount point.
This causes a deadlock where freeze is waiting for previous writes to
complete, but the previous writes cannot complete, as the task that is
supposed to complete them is blocked waiting on starting a new write.
This results in the following stuck trace showing that dependency with
the write blocked starting a new write:
task:fio state:D stack:0 pid:886 tgid:886 ppid:876
Call trace:
__switch_to+0x1d8/0x348
__schedule+0x8e8/0x2248
schedule+0x110/0x3f0
percpu_rwsem_wait+0x1e8/0x3f8
__percpu_down_read+0xe8/0x500
io_write+0xbb8/0xff8
io_issue_sqe+0x10c/0x1020
io_submit_sqes+0x614/0x2110
__arm64_sys_io_uring_enter+0x524/0x1038
invoke_syscall+0x74/0x268
el0_svc_common.constprop.0+0x160/0x238
do_el0_svc+0x44/0x60
el0_svc+0x44/0xb0
el0t_64_sync_handler+0x118/0x128
el0t_64_sync+0x168/0x170
INFO: task fsfreeze:7364 blocked for more than 15 seconds.
Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963
with the attempting freezer stuck trying to grab the rwsem:
task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995
Call trace:
__switch_to+0x1d8/0x348
__schedule+0x8e8/0x2248
schedule+0x110/0x3f0
percpu_down_write+0x2b0/0x680
freeze_super+0x248/0x8a8
do_vfs_ioctl+0x149c/0x1b18
__arm64_sys_ioctl+0xd0/0x1a0
invoke_syscall+0x74/0x268
el0_svc_common.constprop.0+0x160/0x238
do_el0_svc+0x44/0x60
el0_svc+0x44/0xb0
el0t_64_sync_handler+0x118/0x128
el0t_64_sync+0x168/0x170
Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a
blocking grab of the super block rwsem if it isn't set. For normal issue
where IOCB_NOWAIT would always be set, this returns -EAGAIN which will
have io_uring core issue a blocking attempt of the write. That will in
turn also get completions run, ensuring forward progress.
Since freezing requires CAP_SYS_ADMIN in the first place, this isn't
something that can be triggered by a regular user.
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Peter Mann <peter.mann@sh.cz>
Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
io_uring/io_uring.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 718ab6a418425..b53099b595cc7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3724,6 +3724,25 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
return io_prep_rw(req, sqe, WRITE);
}
+static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb)
+{
+ struct inode *inode;
+ bool ret;
+
+ if (!(req->flags & REQ_F_ISREG))
+ return true;
+ if (!(kiocb->ki_flags & IOCB_NOWAIT)) {
+ kiocb_start_write(kiocb);
+ return true;
+ }
+
+ inode = file_inode(kiocb->ki_filp);
+ ret = sb_start_write_trylock(inode->i_sb);
+ if (ret)
+ __sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+ return ret;
+}
+
static int io_write(struct io_kiocb *req, unsigned int issue_flags)
{
struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
@@ -3770,8 +3789,8 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret))
goto out_free;
- if (req->flags & REQ_F_ISREG)
- kiocb_start_write(kiocb);
+ if (unlikely(!io_kiocb_start_write(req, kiocb)))
+ goto copy_iov;
kiocb->ki_flags |= IOCB_WRITE;
if (req->file->f_op->write_iter)
--
2.43.0
next prev parent reply other threads:[~2024-11-12 10:25 UTC|newest]
Thread overview: 85+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-12 10:20 [PATCH 5.15 00/76] 5.15.172-rc1 review Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 01/76] arm64: dts: rockchip: Fix rt5651 compatible value on rk3399-sapphire-excavator Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 02/76] arm64: dts: rockchip: Remove hdmis 2nd interrupt on rk3328 Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 03/76] arm64: dts: rockchip: Fix bluetooth properties on Rock960 boards Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 04/76] arm64: dts: rockchip: Remove #cooling-cells from fan on Theobroma lion Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 05/76] arm64: dts: rockchip: Fix LED triggers on rk3308-roc-cc Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 06/76] arm64: dts: imx8mp: correct sdhc ipg clk Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 07/76] ARM: dts: rockchip: fix rk3036 acodec node Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 08/76] ARM: dts: rockchip: drop grf reference from rk3036 hdmi Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 09/76] ARM: dts: rockchip: Fix the spi controller on rk3036 Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 10/76] ARM: dts: rockchip: Fix the realtek audio codec on rk3036-kylin Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 11/76] HID: core: zero-initialize the report buffer Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 12/76] NFSv3: only use NFS timeout for MOUNT when protocols are compatible Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 13/76] NFS: Add a tracepoint to show the results of nfs_set_cache_invalid() Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 14/76] NFSv3: handle out-of-order write replies Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 15/76] nfs: avoid i_lock contention in nfs_clear_invalid_mapping Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 16/76] security/keys: fix slab-out-of-bounds in key_task_permission Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 17/76] net: enetc: set MAC address to the VF net_device Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 18/76] sctp: properly validate chunk size in sctp_sf_ootb() Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 19/76] can: c_can: fix {rx,tx}_errors statistics Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 20/76] i40e: fix race condition by adding filters intermediate sync state Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 21/76] net: hns3: fix kernel crash when uninstalling driver Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 22/76] net: phy: ti: add PHY_RST_AFTER_CLK_EN flag Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 23/76] net: stmmac: Fix unbalanced IRQ wake disable warning on single irq case Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 24/76] net: arc: fix the device for dma_map_single/dma_unmap_single Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 25/76] Revert "ALSA: hda/conexant: Mute speakers at suspend / shutdown" Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 26/76] media: stb0899_algo: initialize cfr before using it Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 27/76] media: dvbdev: prevent the risk of out of memory access Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 28/76] media: dvb_frontend: dont play tricks with underflow values Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 29/76] media: adv7604: prevent underflow condition when reporting colorspace Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 30/76] scsi: sd_zbc: Use kvzalloc() to allocate REPORT ZONES buffer Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 31/76] ALSA: firewire-lib: fix return value on fail in amdtp_tscm_init() Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 32/76] ASoC: stm32: spdifrx: fix dma channel release in stm32_spdifrx_remove Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 33/76] media: s5p-jpeg: prevent buffer overflows Greg Kroah-Hartman
2024-11-12 10:20 ` [PATCH 5.15 34/76] media: cx24116: prevent overflows on SNR calculus Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 35/76] media: pulse8-cec: fix data timestamp at pulse8_setup() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 36/76] media: v4l2-tpg: prevent the risk of a division by zero Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 37/76] media: v4l2-ctrls-api: fix error handling for v4l2_g_ctrl() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 38/76] pwm: imx-tpm: Use correct MODULO value for EPWM mode Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 39/76] drm/amdgpu: Adjust debugfs eviction and IB access permissions Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 40/76] drm/amdgpu: add missing size check in amdgpu_debugfs_gprwave_read() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 41/76] drm/amdgpu: prevent NULL pointer dereference if ATIF is not supported Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 42/76] thermal/drivers/qcom/lmh: Remove false lockdep backtrace Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 43/76] dm cache: correct the number of origin blocks to match the target length Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 44/76] dm cache: fix out-of-bounds access to the dirty bitset when resizing Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 45/76] dm cache: optimize dirty bit checking with find_next_bit " Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 46/76] dm cache: fix potential out-of-bounds access on the first resume Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 47/76] dm-unstriped: cast an operand to sector_t to prevent potential uint32_t overflow Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 48/76] ALSA: usb-audio: Add quirk for HP 320 FHD Webcam Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 49/76] posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 50/76] io_uring: rename kiocb_end_write() local helper Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 51/76] fs: create kiocb_{start,end}_write() helpers Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 52/76] io_uring: use " Greg Kroah-Hartman
2024-11-12 10:21 ` Greg Kroah-Hartman [this message]
2024-11-12 10:21 ` [PATCH 5.15 54/76] nfs: Fix KMSAN warning in decode_getfattr_attrs() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 55/76] btrfs: reinitialize delayed ref list after deleting it from the list Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 56/76] net: bridge: xmit: make sure we have at least eth header len bytes Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 57/76] ice: Add a per-VF limit on number of FDIR filters Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 58/76] net: do not delay dst_entries_add() in dst_release() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 59/76] media: uvcvideo: Skip parsing frames of type UVC_VS_UNDEFINED in uvc_parse_format Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 60/76] fs/proc: fix compile warning about variable vmcore_mmap_ops Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 61/76] usb: musb: sunxi: Fix accessing an released usb phy Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 62/76] usb: dwc3: fix fault at system suspend if device was already runtime suspended Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 63/76] usb: typec: fix potential out of bounds in ucsi_ccg_update_set_new_cam_cmd() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 64/76] USB: serial: io_edgeport: fix use after free in debug printk Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 65/76] USB: serial: qcserial: add support for Sierra Wireless EM86xx Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 66/76] USB: serial: option: add Fibocom FG132 0x0112 composition Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 67/76] USB: serial: option: add Quectel RG650V Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 68/76] irqchip/gic-v3: Force propagation of the active state with a read-back Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 69/76] ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 70/76] ucounts: fix counter leak in inc_rlimit_get_ucounts() Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 71/76] ALSA: usb-audio: Support jack detection on Dell dock Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 72/76] ALSA: usb-audio: Add quirks for Dell WD19 dock Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 73/76] ACPI: PRM: Clean up guid type in struct prm_handler_info Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 74/76] hv_sock: Initializing vsk->trans to NULL to prevent a dangling pointer Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 75/76] vsock/virtio: Initialization of the dangling pointer occurring in vsk->trans Greg Kroah-Hartman
2024-11-12 10:21 ` [PATCH 5.15 76/76] ALSA: usb-audio: Add endianness annotations Greg Kroah-Hartman
2024-11-12 16:05 ` [PATCH 5.15 00/76] 5.15.172-rc1 review Harshit Mogalapalli
2024-11-12 23:22 ` Shuah Khan
2024-11-13 0:45 ` Ron Economos
2024-11-13 0:56 ` Florian Fainelli
2024-11-13 11:28 ` Naresh Kamboju
2024-11-13 13:30 ` Mark Brown
2024-11-13 19:58 ` Jon Hunter
2024-11-14 10:49 ` [PATCH 5.15] " Hardik Garg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241112101841.799632310@linuxfoundation.org \
--to=gregkh@linuxfoundation.org \
--cc=axboe@kernel.dk \
--cc=patches@lists.linux.dev \
--cc=peter.mann@sh.cz \
--cc=sashal@kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.