From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Al Viro <viro@zeniv.linux.org.uk>,
	Sasha Levin <sashal@kernel.org>,
	linux-aio@kvack.org, linux-fsdevel@vger.kernel.org
Subject: [PATCH AUTOSEL 5.0 66/66] Fix aio_poll() races
Date: Wed, 24 Apr 2019 10:33:40 -0400
Message-ID: <20190424143341.27665-66-sashal@kernel.org>
In-Reply-To: <20190424143341.27665-1-sashal@kernel.org>

From: Al Viro <viro@zeniv.linux.org.uk>

[ Upstream commit af5c72b1fc7a00aa484e90b0c4e0eeb582545634 ]

aio_poll() has to cope with several unpleasant problems:
	* requests that might stay around indefinitely need to be made
	  visible for io_cancel(2); that must not be done to a request
	  already completed, though.
	* in cases when ->poll() has placed us on a waitqueue, wakeup
	  might have happened (and request completed) before ->poll()
	  returns.
	* worse, in some early wakeup cases request might end up
	  re-added into the queue later - we can't treat "woken up and
	  currently not in the queue" as "it's not going to stick
	  around indefinitely".
	* ... moreover, ->poll() might have decided not to put it on
	  any queues to start with, and that needs to be distinguished
	  from the previous case.
	* ->poll() might have tried to put us on more than one queue.
	  Only the first will succeed for aio poll, so we might end up
	  missing wakeups.  OTOH, we might very well notice that only
	  after the wakeup hits and request gets completed (all before
	  ->poll() gets around to the second poll_wait()).  In that
	  case it's too late to decide that we have an error.

req->woken was an attempt to deal with that.  Unfortunately, it was
broken.  What we need to keep track of is not that a wakeup has
happened - the thing might come back after that.  It's that the async
reference is already gone and won't come back, so we can't (and
needn't) put the request on the list of cancellables.
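
To illustrate why "woken" is the wrong thing to track, here is a toy
user-space model (not kernel code).  The struct and function names are
hypothetical; only the field names echo the patch.  It assumes a
deferred wakeup whose work item finds no matching events puts the
request back on the queue, which is the "might come back" case above.

```c
#include <stdbool.h>

/* Toy model of one poll request; NOT the kernel's struct poll_iocb. */
struct req_model {
	bool on_queue;	/* wait entry currently linked into the waitqueue */
	bool woken;	/* old flag: a wakeup has happened at some point */
	bool done;	/* new flag: async reference is gone for good */
};

/* A wakeup that could not complete inline and defers to the work item. */
static void model_wake_deferred(struct req_model *r)
{
	r->woken = true;
	r->on_queue = false;
}

/* Deferred work: nothing to report and not cancelled means wait again. */
static void model_complete_work(struct req_model *r, unsigned int mask,
				bool cancelled)
{
	if (!mask && !cancelled) {
		r->on_queue = true;	/* the request comes back */
		return;
	}
	r->done = true;
}
```

After a deferred wakeup followed by a work run with an empty mask, the
model has woken set, the request back on the queue and done still
clear, so woken alone says nothing about whether the request will
stick around.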

The easiest case is "request hadn't been put on any waitqueues"; we
can tell by seeing NULL apt.head, and in that case there won't be
anything async.  We should either complete the request ourselves
(if vfs_poll() reports anything of interest) or return an error.

In all other cases we get exclusion with wakeups by grabbing the
queue lock.

If the request is currently on the queue and we have something
interesting from vfs_poll(), we can steal it and complete the request
ourselves.

If it's on the queue and vfs_poll() has not reported anything
interesting, we either put it on the cancellable list, or, if we know
that it hadn't been put on all queues ->poll() wanted it on, we steal
it and return an error.

If it's _not_ on the queue, it's either been already dealt with (in
which case we do nothing), or there's aio_poll_complete_work() about
to be executed.  In that case we either put it on the cancellable
list, or, if we know it hadn't been put on all queues ->poll() wanted
it on, simulate what cancel would've done.
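
The case analysis above can be written down as a small user-space
decision function.  This is only a sketch of the decision table, with
no locking and no real waitqueues; struct poll_state, enum poll_action
and decide() are hypothetical names that do not exist in fs/aio.c.

```c
#include <stdbool.h>

enum poll_action {
	ACT_COMPLETE,		/* steal it (if queued) and complete ourselves */
	ACT_RETURN_ERROR,	/* steal it (if queued) and return an error */
	ACT_MARK_CANCELLED,	/* simulate what cancel would have done */
	ACT_ADD_CANCELLABLE,	/* put it on the list of cancellables */
	ACT_NOTHING,		/* already dealt with elsewhere */
};

/* Snapshot of what the submitter sees once ->poll() has returned. */
struct poll_state {
	bool has_head;		/* ->poll() put us on some waitqueue */
	bool on_queue;		/* wait entry is still linked */
	bool done;		/* async reference already gone */
	unsigned int mask;	/* interesting events from vfs_poll() */
	int error;		/* nonzero: not on all queues ->poll() wanted */
};

static enum poll_action decide(struct poll_state s)
{
	bool cancel = false;

	if (!s.has_head) {
		/* Never queued: nothing async can happen. */
		if (s.mask)
			return ACT_COMPLETE;
		return s.error ? ACT_RETURN_ERROR : ACT_NOTHING;
	}
	if (!s.on_queue) {
		/* A wakeup got there first; too late to complete or fail here. */
		if (s.error)
			cancel = true;
		s.error = 0;
		s.mask = 0;
	}
	if (s.mask)
		return ACT_COMPLETE;
	if (s.error)
		return ACT_RETURN_ERROR;
	if (cancel)
		return ACT_MARK_CANCELLED;
	if (!s.done)
		return ACT_ADD_CANCELLABLE;
	return ACT_NOTHING;
}
```

Note that once the request is off the queue, whatever vfs_poll()
reported is discarded: the wakeup side owns completion from then on,
and the submitter only decides between the cancellable list, a
simulated cancel, or doing nothing.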

It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
suck, and unfortunately aio is not an exception...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 fs/aio.c | 90 +++++++++++++++++++++++++-------------------------------
 1 file changed, 40 insertions(+), 50 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 82bf5dffb272..efa13410e04e 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -181,7 +181,7 @@ struct poll_iocb {
 	struct file		*file;
 	struct wait_queue_head	*head;
 	__poll_t		events;
-	bool			woken;
+	bool			done;
 	bool			cancelled;
 	struct wait_queue_entry	wait;
 	struct work_struct	work;
@@ -1606,12 +1606,6 @@ static int aio_fsync(struct fsync_iocb *req, const struct iocb *iocb,
 	return 0;
 }
 
-static inline void aio_poll_complete(struct aio_kiocb *iocb, __poll_t mask)
-{
-	iocb->ki_res.res = mangle_poll(mask);
-	iocb_put(iocb);
-}
-
 static void aio_poll_complete_work(struct work_struct *work)
 {
 	struct poll_iocb *req = container_of(work, struct poll_iocb, work);
@@ -1637,9 +1631,11 @@ static void aio_poll_complete_work(struct work_struct *work)
 		return;
 	}
 	list_del_init(&iocb->ki_list);
+	iocb->ki_res.res = mangle_poll(mask);
+	req->done = true;
 	spin_unlock_irq(&ctx->ctx_lock);
 
-	aio_poll_complete(iocb, mask);
+	iocb_put(iocb);
 }
 
 /* assumes we are called with irqs disabled */
@@ -1667,31 +1663,27 @@ static int aio_poll_wake(struct wait_queue_entry *wait, unsigned mode, int sync,
 	__poll_t mask = key_to_poll(key);
 	unsigned long flags;
 
-	req->woken = true;
-
 	/* for instances that support it check for an event match first: */
-	if (mask) {
-		if (!(mask & req->events))
-			return 0;
+	if (mask && !(mask & req->events))
+		return 0;
 
+	list_del_init(&req->wait.entry);
+
+	if (mask && spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
 		/*
 		 * Try to complete the iocb inline if we can. Use
 		 * irqsave/irqrestore because not all filesystems (e.g. fuse)
 		 * call this function with IRQs disabled and because IRQs
 		 * have to be disabled before ctx_lock is obtained.
 		 */
-		if (spin_trylock_irqsave(&iocb->ki_ctx->ctx_lock, flags)) {
-			list_del(&iocb->ki_list);
-			spin_unlock_irqrestore(&iocb->ki_ctx->ctx_lock, flags);
-
-			list_del_init(&req->wait.entry);
-			aio_poll_complete(iocb, mask);
-			return 1;
-		}
+		list_del(&iocb->ki_list);
+		iocb->ki_res.res = mangle_poll(mask);
+		req->done = true;
+		spin_unlock_irqrestore(&iocb->ki_ctx->ctx_lock, flags);
+		iocb_put(iocb);
+	} else {
+		schedule_work(&req->work);
 	}
-
-	list_del_init(&req->wait.entry);
-	schedule_work(&req->work);
 	return 1;
 }
 
@@ -1723,6 +1715,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	struct kioctx *ctx = aiocb->ki_ctx;
 	struct poll_iocb *req = &aiocb->poll;
 	struct aio_poll_table apt;
+	bool cancel = false;
 	__poll_t mask;
 
 	/* reject any unknown events outside the normal event mask. */
@@ -1736,7 +1729,7 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	req->events = demangle_poll(iocb->aio_buf) | EPOLLERR | EPOLLHUP;
 
 	req->head = NULL;
-	req->woken = false;
+	req->done = false;
 	req->cancelled = false;
 
 	apt.pt._qproc = aio_poll_queue_proc;
@@ -1749,36 +1742,33 @@ static ssize_t aio_poll(struct aio_kiocb *aiocb, const struct iocb *iocb)
 	init_waitqueue_func_entry(&req->wait, aio_poll_wake);
 
 	mask = vfs_poll(req->file, &apt.pt) & req->events;
-	if (unlikely(!req->head)) {
-		/* we did not manage to set up a waitqueue, done */
-		goto out;
-	}
-
 	spin_lock_irq(&ctx->ctx_lock);
-	spin_lock(&req->head->lock);
-	if (req->woken) {
-		/* wake_up context handles the rest */
-		mask = 0;
+	if (likely(req->head)) {
+		spin_lock(&req->head->lock);
+		if (unlikely(list_empty(&req->wait.entry))) {
+			if (apt.error)
+				cancel = true;
+			apt.error = 0;
+			mask = 0;
+		}
+		if (mask || apt.error) {
+			list_del_init(&req->wait.entry);
+		} else if (cancel) {
+			WRITE_ONCE(req->cancelled, true);
+		} else if (!req->done) { /* actually waiting for an event */
+			list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
+			aiocb->ki_cancel = aio_poll_cancel;
+		}
+		spin_unlock(&req->head->lock);
+	}
+	if (mask) { /* no async, we'd stolen it */
+		aiocb->ki_res.res = mangle_poll(mask);
 		apt.error = 0;
-	} else if (mask || apt.error) {
-		/* if we get an error or a mask we are done */
-		WARN_ON_ONCE(list_empty(&req->wait.entry));
-		list_del_init(&req->wait.entry);
-	} else {
-		/* actually waiting for an event */
-		list_add_tail(&aiocb->ki_list, &ctx->active_reqs);
-		aiocb->ki_cancel = aio_poll_cancel;
 	}
-	spin_unlock(&req->head->lock);
 	spin_unlock_irq(&ctx->ctx_lock);
-
-out:
-	if (unlikely(apt.error))
-		return apt.error;
-
 	if (mask)
-		aio_poll_complete(aiocb, mask);
-	return 0;
+		iocb_put(aiocb);
+	return apt.error;
 }
 
 static int __io_submit_one(struct kioctx *ctx, const struct iocb *iocb,
-- 
2.19.1

