From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: stable@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
patches@lists.linux.dev, Peter Mann <peter.mann@sh.cz>,
Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.15 53/76] io_uring/rw: fix missing NOWAIT check for O_DIRECT start write
Date: Tue, 12 Nov 2024 11:21:18 +0100
Message-ID: <20241112101841.799632310@linuxfoundation.org>
In-Reply-To: <20241112101839.777512218@linuxfoundation.org>
5.15-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe <axboe@kernel.dk>
Commit 1d60d74e852647255bd8e76f5a22dc42531e4389 upstream.
When io_uring starts a write, it'll call kiocb_start_write() to bump the
super block rwsem, preventing any freezes from happening while that
write is in-flight. The freeze side will grab that rwsem for writing,
excluding any new writers from happening and waiting for existing writes
to finish. But io_uring unconditionally uses kiocb_start_write(), which
will block if someone is currently attempting to freeze the mount point.
This causes a deadlock where freeze is waiting for previous writes to
complete, but the previous writes cannot complete, as the task that is
supposed to complete them is blocked waiting on starting a new write.
This results in the following stuck trace showing that dependency with
the write blocked starting a new write:
task:fio state:D stack:0 pid:886 tgid:886 ppid:876
Call trace:
__switch_to+0x1d8/0x348
__schedule+0x8e8/0x2248
schedule+0x110/0x3f0
percpu_rwsem_wait+0x1e8/0x3f8
__percpu_down_read+0xe8/0x500
io_write+0xbb8/0xff8
io_issue_sqe+0x10c/0x1020
io_submit_sqes+0x614/0x2110
__arm64_sys_io_uring_enter+0x524/0x1038
invoke_syscall+0x74/0x268
el0_svc_common.constprop.0+0x160/0x238
do_el0_svc+0x44/0x60
el0_svc+0x44/0xb0
el0t_64_sync_handler+0x118/0x128
el0t_64_sync+0x168/0x170
INFO: task fsfreeze:7364 blocked for more than 15 seconds.
Not tainted 6.12.0-rc5-00063-g76aaf945701c #7963
with the attempting freezer stuck trying to grab the rwsem:
task:fsfreeze state:D stack:0 pid:7364 tgid:7364 ppid:995
Call trace:
__switch_to+0x1d8/0x348
__schedule+0x8e8/0x2248
schedule+0x110/0x3f0
percpu_down_write+0x2b0/0x680
freeze_super+0x248/0x8a8
do_vfs_ioctl+0x149c/0x1b18
__arm64_sys_ioctl+0xd0/0x1a0
invoke_syscall+0x74/0x268
el0_svc_common.constprop.0+0x160/0x238
do_el0_svc+0x44/0x60
el0_svc+0x44/0xb0
el0t_64_sync_handler+0x118/0x128
el0t_64_sync+0x168/0x170
Fix this by having the io_uring side honor IOCB_NOWAIT, and only attempt a
blocking grab of the super block rwsem if it isn't set. For normal issue
where IOCB_NOWAIT would always be set, this returns -EAGAIN which will
have io_uring core issue a blocking attempt of the write. That will in
turn also get completions run, ensuring forward progress.
Since freezing requires CAP_SYS_ADMIN in the first place, this isn't
something that can be triggered by a regular user.
Cc: stable@vger.kernel.org # 5.10+
Reported-by: Peter Mann <peter.mann@sh.cz>
Link: https://lore.kernel.org/io-uring/38c94aec-81c9-4f62-b44e-1d87f5597644@sh.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
io_uring/io_uring.c | 23 +++++++++++++++++++++--
1 file changed, 21 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 718ab6a418425..b53099b595cc7 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -3724,6 +3724,25 @@ static int io_write_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
 	return io_prep_rw(req, sqe, WRITE);
 }
 
+static bool io_kiocb_start_write(struct io_kiocb *req, struct kiocb *kiocb)
+{
+	struct inode *inode;
+	bool ret;
+
+	if (!(req->flags & REQ_F_ISREG))
+		return true;
+	if (!(kiocb->ki_flags & IOCB_NOWAIT)) {
+		kiocb_start_write(kiocb);
+		return true;
+	}
+
+	inode = file_inode(kiocb->ki_filp);
+	ret = sb_start_write_trylock(inode->i_sb);
+	if (ret)
+		__sb_writers_release(inode->i_sb, SB_FREEZE_WRITE);
+	return ret;
+}
+
 static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 {
 	struct iovec inline_vecs[UIO_FASTIOV], *iovec = inline_vecs;
@@ -3770,8 +3789,8 @@ static int io_write(struct io_kiocb *req, unsigned int issue_flags)
 	if (unlikely(ret))
 		goto out_free;
 
-	if (req->flags & REQ_F_ISREG)
-		kiocb_start_write(kiocb);
+	if (unlikely(!io_kiocb_start_write(req, kiocb)))
+		goto copy_iov;
 	kiocb->ki_flags |= IOCB_WRITE;
 
 	if (req->file->f_op->write_iter)
--
2.43.0