From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>,
	Jens Axboe <axboe@kernel.dk>, Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.5 34/65] io_uring: fix poll_list race for SETUP_IOPOLL|SETUP_SQPOLL
Date: Thu, 19 Mar 2020 14:04:16 +0100
Message-ID: <20200319123937.204774554@linuxfoundation.org>
In-Reply-To: <20200319123926.466988514@linuxfoundation.org>

From: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

[ Upstream commit bdcd3eab2a9ae0ac93f27275b6895dd95e5bf360 ]

After making ext4 support the iopoll method:
  let ext4_file_operations's iopoll method be iomap_dio_iopoll(),
we found that fio can easily hang in fio_ioring_getevents() with the
fio job below:
    rm -f testfile; sync;
    sudo fio -name=fiotest -filename=testfile -iodepth=128 -thread
-rw=write -ioengine=io_uring  -hipri=1 -sqthread_poll=1 -direct=1
-bs=4k -size=10G -numjobs=8 -runtime=2000 -group_reporting
with IORING_SETUP_SQPOLL and IORING_SETUP_IOPOLL enabled.

There are two issues that result in this hang. One is that when
IORING_SETUP_SQPOLL and IORING_SETUP_IOPOLL are enabled, fio does not
use io_uring_enter to get completed events; it relies on the kernel
io_sq_thread to poll for completed events.

The other is a race: when io_submit_sqes() in io_sq_thread() submits
a batch of sqes, the variable 'inflight' records the number of
submitted reqs, and io_sq_thread then polls for the reqs which have
been added to poll_list. But note that if some previous reqs have been
punted to an io worker, these reqs won't be added to poll_list in time.
io_sq_thread() will poll for only part of the previously submitted
reqs, then find poll_list empty and reset 'inflight' to zero. If the
app just waits for these deferred reqs and does not wake up
io_sq_thread again, a hang happens.
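
The race above can be sketched as a small single-threaded model. This
is not kernel code: struct model and every helper below are
hypothetical stand-ins, the idle timeout is ignored, and all reqs on
poll_list are assumed to complete in one poll pass.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the bookkeeping, not the kernel structures. */
struct model {
	unsigned inflight;	/* old counter of submitted reqs */
	unsigned poll_list;	/* reqs currently on poll_list */
	unsigned deferred;	/* reqs punted to an io worker, not listed yet */
};

/* Submit n reqs; 'punted' of them go to an io worker first. */
static void submit(struct model *m, unsigned n, unsigned punted)
{
	assert(punted <= n);
	m->inflight += n;
	m->poll_list += n - punted;
	m->deferred += punted;
}

/* One pass of the old polling step driven by 'inflight'. */
static void old_poll_step(struct model *m)
{
	if (!m->inflight)
		return;
	if (m->poll_list) {
		/* Assumption: everything on the list completes at once. */
		unsigned nr_events = m->poll_list;

		m->poll_list = 0;
		m->inflight -= nr_events;
	} else {
		/* Empty list: the old code gave up on the remainder. */
		m->inflight = 0;
	}
}

/* The io worker finally issues the deferred reqs. */
static void worker_issue(struct model *m)
{
	m->poll_list += m->deferred;
	m->deferred = 0;
}

/* Old condition for continuing to poll (idle timeout ignored). */
static bool old_would_keep_polling(const struct model *m)
{
	return m->inflight != 0;
}

/* Condition after this patch: look at poll_list itself. */
static bool new_would_keep_polling(const struct model *m)
{
	return m->poll_list != 0;
}
```

With 8 reqs submitted and 3 of them punted, two calls to
old_poll_step() leave 'inflight' at zero; once worker_issue() runs,
the old check sees nothing to poll while the poll_list check still
does.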

For apps that rely entirely on io_sq_thread to poll completed requests,
let io_iopoll_req_issued() wake up io_sq_thread properly when adding a
new element to poll_list, and when io_sq_thread prepares to sleep, check
again whether poll_list is empty; if it is not, continue to poll.
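
As a rough userspace analogue of this wake-up and re-check pattern, a
minimal pthread sketch follows. It is not the kernel implementation:
struct poll_ctx, req_issued(), poller() and run_demo() are
hypothetical names, a condition variable stands in for sqo_wait, and
a counter stands in for poll_list.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ctx->poll_list and ctx->sqo_wait. */
struct poll_ctx {
	pthread_mutex_t lock;
	pthread_cond_t wait;	/* plays the role of sqo_wait */
	int pending;		/* non-zero means poll_list is not empty */
	int reaped;		/* completions the poller has collected */
	bool stop;
};

/* Analogue of io_iopoll_req_issued(): queue a req, then wake the poller. */
static void req_issued(struct poll_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	ctx->pending++;
	pthread_cond_signal(&ctx->wait);
	pthread_mutex_unlock(&ctx->lock);
}

/* Analogue of io_sq_thread(): sleep only when nothing is left to poll. */
static void *poller(void *arg)
{
	struct poll_ctx *ctx = arg;

	pthread_mutex_lock(&ctx->lock);
	for (;;) {
		/* Check the list itself on every pass, not a side counter. */
		while (ctx->pending > 0) {
			ctx->pending--;
			ctx->reaped++;
		}
		if (ctx->stop)
			break;
		/* Releases the lock while asleep; reqs may arrive meanwhile. */
		pthread_cond_wait(&ctx->wait, &ctx->lock);
	}
	pthread_mutex_unlock(&ctx->lock);
	return NULL;
}

/* Issue nreqs reqs from this thread and return how many were reaped. */
static int run_demo(int nreqs)
{
	struct poll_ctx ctx = { .pending = 0, .reaped = 0, .stop = false };
	pthread_t tid;
	int ret;

	pthread_mutex_init(&ctx.lock, NULL);
	pthread_cond_init(&ctx.wait, NULL);

	ret = pthread_create(&tid, NULL, poller, &ctx);
	assert(ret == 0);

	for (int i = 0; i < nreqs; i++)
		req_issued(&ctx);

	/* Ask the poller to exit once the list has been drained. */
	pthread_mutex_lock(&ctx.lock);
	ctx.stop = true;
	pthread_cond_signal(&ctx.wait);
	pthread_mutex_unlock(&ctx.lock);

	pthread_join(tid, NULL);
	pthread_cond_destroy(&ctx.wait);
	pthread_mutex_destroy(&ctx.lock);
	return ctx.reaped;
}
```

Because the poller re-checks 'pending' under the lock before every
sleep and the issuer signals after every add, run_demo(128) should
return 128 however the two threads interleave.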

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/io_uring.c | 59 +++++++++++++++++++++++----------------------------
 1 file changed, 27 insertions(+), 32 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 60a4832089982..c8f8cc2463986 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1435,6 +1435,10 @@ static void io_iopoll_req_issued(struct io_kiocb *req)
 		list_add(&req->list, &ctx->poll_list);
 	else
 		list_add_tail(&req->list, &ctx->poll_list);
+
+	if ((ctx->flags & IORING_SETUP_SQPOLL) &&
+	    wq_has_sleeper(&ctx->sqo_wait))
+		wake_up(&ctx->sqo_wait);
 }
 
 static void io_file_put(struct io_submit_state *state)
@@ -3857,9 +3861,8 @@ static int io_sq_thread(void *data)
 	const struct cred *old_cred;
 	mm_segment_t old_fs;
 	DEFINE_WAIT(wait);
-	unsigned inflight;
 	unsigned long timeout;
-	int ret;
+	int ret = 0;
 
 	complete(&ctx->completions[1]);
 
@@ -3867,39 +3870,19 @@ static int io_sq_thread(void *data)
 	set_fs(USER_DS);
 	old_cred = override_creds(ctx->creds);
 
-	ret = timeout = inflight = 0;
+	timeout = jiffies + ctx->sq_thread_idle;
 	while (!kthread_should_park()) {
 		unsigned int to_submit;
 
-		if (inflight) {
+		if (!list_empty(&ctx->poll_list)) {
 			unsigned nr_events = 0;
 
-			if (ctx->flags & IORING_SETUP_IOPOLL) {
-				/*
-				 * inflight is the count of the maximum possible
-				 * entries we submitted, but it can be smaller
-				 * if we dropped some of them. If we don't have
-				 * poll entries available, then we know that we
-				 * have nothing left to poll for. Reset the
-				 * inflight count to zero in that case.
-				 */
-				mutex_lock(&ctx->uring_lock);
-				if (!list_empty(&ctx->poll_list))
-					io_iopoll_getevents(ctx, &nr_events, 0);
-				else
-					inflight = 0;
-				mutex_unlock(&ctx->uring_lock);
-			} else {
-				/*
-				 * Normal IO, just pretend everything completed.
-				 * We don't have to poll completions for that.
-				 */
-				nr_events = inflight;
-			}
-
-			inflight -= nr_events;
-			if (!inflight)
+			mutex_lock(&ctx->uring_lock);
+			if (!list_empty(&ctx->poll_list))
+				io_iopoll_getevents(ctx, &nr_events, 0);
+			else
 				timeout = jiffies + ctx->sq_thread_idle;
+			mutex_unlock(&ctx->uring_lock);
 		}
 
 		to_submit = io_sqring_entries(ctx);
@@ -3928,7 +3911,7 @@ static int io_sq_thread(void *data)
 			 * more IO, we should wait for the application to
 			 * reap events and wake us up.
 			 */
-			if (inflight ||
+			if (!list_empty(&ctx->poll_list) ||
 			    (!time_after(jiffies, timeout) && ret != -EBUSY &&
 			    !percpu_ref_is_dying(&ctx->refs))) {
 				cond_resched();
@@ -3938,6 +3921,19 @@ static int io_sq_thread(void *data)
 			prepare_to_wait(&ctx->sqo_wait, &wait,
 						TASK_INTERRUPTIBLE);
 
+			/*
+			 * While doing polled IO, before going to sleep, we need
+			 * to check if there are new reqs added to poll_list, it
+			 * is because reqs may have been punted to io worker and
+			 * will be added to poll_list later, hence check the
+			 * poll_list again.
+			 */
+			if ((ctx->flags & IORING_SETUP_IOPOLL) &&
+			    !list_empty_careful(&ctx->poll_list)) {
+				finish_wait(&ctx->sqo_wait, &wait);
+				continue;
+			}
+
 			/* Tell userspace we may need a wakeup call */
 			ctx->rings->sq_flags |= IORING_SQ_NEED_WAKEUP;
 			/* make sure to read SQ tail after writing flags */
@@ -3966,8 +3962,7 @@ static int io_sq_thread(void *data)
 		mutex_lock(&ctx->uring_lock);
 		ret = io_submit_sqes(ctx, to_submit, NULL, -1, &cur_mm, true);
 		mutex_unlock(&ctx->uring_lock);
-		if (ret > 0)
-			inflight += ret;
+		timeout = jiffies + ctx->sq_thread_idle;
 	}
 
 	set_fs(old_fs);
-- 
2.20.1




