From: Kent Overstreet <kmo@daterainc.com>
To: Chris Mason <clm@fb.com>,
linux-fsdevel@vger.kernel.org, linux-aio@kvack.org,
Peter Zijlstra <peterz@infradead.org>
Subject: Re: [PATCH RFC] fs/aio: fix sleeping while TASK_INTERRUPTIBLE
Date: Wed, 24 Dec 2014 18:56:41 -0800
Message-ID: <20141225025641.GC29607@moria.home.lan>
In-Reply-To: <20141223001619.GA26385@ret.masoncoding.com>
On Mon, Dec 22, 2014 at 07:16:25PM -0500, Chris Mason wrote:
> The 3.19 merge window brought in a great new warning to catch someone
> calling might_sleep with their state != TASK_RUNNING. The idea was to
> find buggy code locking mutexes after calling prepare_to_wait(), kind
> of like this:
Ben just told me about this issue.
IMO, the way the code is structured now is correct; I would argue the problem is
with the way wait_event() works - the way it has to mess with the global-ish
task state when adding a wait_queue_t to a wait_queue_head (who came up with
these names?)
Bcache's closures don't have this problem; a closure being on a waitlist has
nothing to do with task state - instead, closures keep a counter of the number
of things they're waiting on. You can add a closure to a waitlist and then
separately, later, do a closure_sync() to wait on the closure's remaining count
to hit 0.
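
As a rough illustration of that idea (a userspace toy, not bcache's actual
closure code), the sketch below keeps a count of outstanding things and only
sleeps in the final sync step. All toy_* names are hypothetical, and pthread
primitives stand in for the kernel's scheduling machinery:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a closure: a count of outstanding things
 * plus the means to sleep until that count reaches zero. */
struct toy_closure {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	int             remaining;
};

static inline void toy_closure_init(struct toy_closure *cl)
{
	pthread_mutex_init(&cl->lock, NULL);
	pthread_cond_init(&cl->cond, NULL);
	cl->remaining = 0;
}

/* Register one more thing to wait on. This only bumps the counter;
 * it does not mark the calling thread as "about to sleep". */
static inline void toy_closure_get(struct toy_closure *cl)
{
	pthread_mutex_lock(&cl->lock);
	cl->remaining++;
	pthread_mutex_unlock(&cl->lock);
}

/* One outstanding thing finished; wake any syncer when the count hits 0. */
static inline void toy_closure_put(struct toy_closure *cl)
{
	pthread_mutex_lock(&cl->lock);
	assert(cl->remaining > 0);
	if (--cl->remaining == 0)
		pthread_cond_broadcast(&cl->cond);
	pthread_mutex_unlock(&cl->lock);
}

/* Separately, later: block until the remaining count is 0. */
static inline void toy_closure_sync(struct toy_closure *cl)
{
	pthread_mutex_lock(&cl->lock);
	while (cl->remaining > 0)
		pthread_cond_wait(&cl->cond, &cl->lock);
	pthread_mutex_unlock(&cl->lock);
}
```

Between toy_closure_get() and toy_closure_sync() the caller is free to take
other locks or otherwise block, since registering interest never changed its
run state.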
Bcache in fact used to have a closure_wait_event() macro that was exactly
analogous to wait_event() but using a closure - I forget what it was used for,
but at some point it wasn't used by bcache anymore and got deleted.
I just cooked up closure_sync_interruptible_hrtimeout() and the corresponding
wait_event macro and then converted aio to use it. This would IMO be a much
cleaner solution to the original problem.
The one disadvantage I know of, with the current code, is that closure waitlists
are singly linked - so they can be lockless, but that means you can't wake
up/remove a single closure from a waitlist; you have to do wake_up_all() - which
is an obvious disadvantage w.r.t. spurious wakeups. If people like this approach
though I'll just make closure waitlists doubly linked with a lock (which is
something I'd been considering doing anyways)
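
The tradeoff can be seen in a small sketch. This is a userspace toy using C11
atomics, not the closure waitlist code from the series; the toy_* names are
hypothetical. Adding is a compare-and-swap on the list head, and the only
cheap lockless removal is to detach the whole list at once, so every waiter
gets woken:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical waiter: a real one would reference a task or closure. */
struct toy_waiter {
	struct toy_waiter *next;
	int                woken;
};

/* Singly linked, lockless waitlist: just an atomic head pointer. */
struct toy_waitlist {
	_Atomic(struct toy_waiter *) head;
};

/* Push a waiter onto the head of the list with a CAS loop. */
static inline void toy_waitlist_add(struct toy_waitlist *wl,
				    struct toy_waiter *w)
{
	struct toy_waiter *old = atomic_load(&wl->head);

	do {
		w->next = old;
	} while (!atomic_compare_exchange_weak(&wl->head, &old, w));
}

/* Detach the entire list in one atomic exchange and wake everything
 * on it. There is no safe way here to unlink just one entry from the
 * middle without a lock. Returns the number of waiters woken. */
static inline int toy_wake_up_all(struct toy_waitlist *wl)
{
	struct toy_waiter *w = atomic_exchange(&wl->head, NULL);
	int n = 0;

	while (w) {
		/* Read next first: once woken, a waiter may reuse itself. */
		struct toy_waiter *next = w->next;

		w->woken = 1;
		n++;
		w = next;
	}
	return n;
}
```

A doubly linked list under a lock would allow unlinking one waiter at a time,
at the cost of taking that lock on every add and wakeup.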
Here's the patch to the aio code - the rest of the series is in a branch at:
http://evilpiepirate.org/git/linux-bcache.git/log/?h=aio_ring_fix
Disclaimer: code has only been _lightly_ tested so far, the closure hrtimer
stuff was somewhat nontrivial
commit c91f0de111da37581709f7d201793a88c6993188
Author: Kent Overstreet <kmo@daterainc.com>
Date: Wed Dec 24 17:20:32 2014 -0800
aio: Convert to closure waitlist for aio ring buffer
The advantage of closure waitlists is that we don't have to muck with the task
state before we actually sleep; instead of prepare_to_wait() we do
closure_wait(), which like prepare_to_wait() adds an object to a waitlist, but
unlike prepare_to_wait() it's the closure that's doing the waiting, not the task.
This fixes the issue with doing copy_to_user() after modifying the task state.
Change-Id: Ifc75123d5bb620277d1e78dd5102e5d8bead1add
diff --git a/fs/aio.c b/fs/aio.c
index 1b7893ecc2..284c74e624 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -40,6 +40,7 @@
#include <linux/ramfs.h>
#include <linux/percpu-refcount.h>
#include <linux/mount.h>
+#include <linux/closure.h>
#include <asm/kmap_types.h>
#include <asm/uaccess.h>
@@ -136,7 +137,7 @@ struct kioctx {
struct {
struct mutex ring_lock;
- wait_queue_head_t wait;
+ struct closure_waitlist wait;
} ____cacheline_aligned_in_smp;
struct {
@@ -689,7 +690,6 @@ static struct kioctx *ioctx_alloc(unsigned nr_events)
/* Protect against page migration throughout kiotx setup by keeping
* the ring_lock mutex held until setup is complete. */
mutex_lock(&ctx->ring_lock);
- init_waitqueue_head(&ctx->wait);
INIT_LIST_HEAD(&ctx->active_reqs);
@@ -772,7 +772,7 @@ static int kill_ioctx(struct mm_struct *mm, struct kioctx *ctx,
spin_unlock(&mm->ioctx_lock);
/* percpu_ref_kill() will do the necessary call_rcu() */
- wake_up_all(&ctx->wait);
+ closure_wake_up(&ctx->wait);
/*
* It'd be more correct to do this in free_ioctx(), after all
@@ -1121,8 +1121,7 @@ void aio_complete(struct kiocb *iocb, long res, long res2)
*/
smp_mb();
- if (waitqueue_active(&ctx->wait))
- wake_up(&ctx->wait);
+ closure_wake_up(&ctx->wait);
percpu_ref_put(&ctx->reqs);
}
@@ -1237,26 +1236,15 @@ static long read_events(struct kioctx *ctx, long min_nr, long nr,
return -EFAULT;
until = timespec_to_ktime(ts);
+
+ if (until.tv64)
+ until = ktime_add(ktime_get(), until);
}
- /*
- * Note that aio_read_events() is being called as the conditional - i.e.
- * we're calling it after prepare_to_wait() has set task state to
- * TASK_INTERRUPTIBLE.
- *
- * But aio_read_events() can block, and if it blocks it's going to flip
- * the task state back to TASK_RUNNING.
- *
- * This should be ok, provided it doesn't flip the state back to
- * TASK_RUNNING and return 0 too much - that causes us to spin. That
- * will only happen if the mutex_lock() call blocks, and we then find
- * the ringbuffer empty. So in practice we should be ok, but it's
- * something to be aware of when touching this code.
- */
if (until.tv64 == 0)
aio_read_events(ctx, min_nr, nr, event, &ret);
else
- wait_event_interruptible_hrtimeout(ctx->wait,
+ closure_wait_event_hrtimeout(&ctx->wait,
aio_read_events(ctx, min_nr, nr, event, &ret),
until);