From: Jan Kara <jack@suse.cz>
To: Nick Piggin <npiggin@gmail.com>
Cc: paulmck@linux.vnet.ibm.com, Jeff Moyer <jmoyer@redhat.com>,
Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
linux-kernel@vger.kernel.org
Subject: Re: [patch] fs: aio fix rcu lookup
Date: Thu, 20 Jan 2011 21:16:02 +0100 [thread overview]
Message-ID: <20110120201602.GA19797@quack.suse.cz> (raw)
In-Reply-To: <AANLkTi=RdmWV9MLWzDxe6N_kWZQA+a2o8jg-PkB-7QGk@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 1521 bytes --]
On Fri 21-01-11 05:31:53, Nick Piggin wrote:
> On Thu, Jan 20, 2011 at 3:03 PM, Paul E. McKenney
> <paulmck@linux.vnet.ibm.com> wrote:
> > On Thu, Jan 20, 2011 at 08:20:00AM +1100, Nick Piggin wrote:
> >> On Thu, Jan 20, 2011 at 8:03 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> >> >> I don't know exactly how all programs use io_destroy -- of the small
> >> >> number that do, probably an even smaller number would care here. But I
> >> >> don't think it simplifies things enough to use synchronize_rcu for it.
> >> >
> >> > Above it sounded like you didn't think AIO should be using RCU at all.
> >>
> >> synchronize_rcu of course, not RCU (typo).
> >
> > I think that Nick is suggesting that call_rcu() be used instead.
> > Perhaps also very sparing use of synchronize_rcu_expedited(), which
> > is faster than synchronize_rcu(), but which which uses more CPU time.
>
> call_rcu() is the obvious alternative, yes.
>
> Basically, once we give in to synchronize_rcu() we're basically giving
> up. That's certainly a very good tradeoff for something like filesystem
> unregistration or module unload, it buys big simplifications in real
> fastpaths. But I just don't think it should be taken lightly.
So in the end, I've realized I don't need synchronize_rcu() at all and
in fact everything is OK even without call_rcu() if I base my fix on top
of your patch.
Attached is your patch with added comment I proposed and also a patch
fixing the second race. Better?
Honza
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR
[-- Attachment #2: 0001-fs-Fix-aio-rcu-ioctx-lookup.patch --]
[-- Type: text/x-patch, Size: 2358 bytes --]
>From 68857d7f2087edbbc5ee1d828f151ac46406f3be Mon Sep 17 00:00:00 2001
From: Nick Piggin <npiggin@gmail.com>
Date: Thu, 20 Jan 2011 20:08:52 +0100
Subject: [PATCH 1/2] fs: Fix aio rcu ioctx lookup
aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
lookup_ioctx doesn't implement the rcu lookup pattern properly. rcu_read_lock
does not prevent refcount going to zero, so we might take a refcount on a zero
count ioctx.
Fix the bug by atomically testing for zero refcount before incrementing.
[JK: Added comment into the code]
Signed-off-by: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/aio.c | 35 ++++++++++++++++++++++++-----------
1 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index fc557a3..b4dd668 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -239,15 +239,23 @@ static void __put_ioctx(struct kioctx *ctx)
call_rcu(&ctx->rcu_head, ctx_rcu_free);
}
-#define get_ioctx(kioctx) do { \
- BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
- atomic_inc(&(kioctx)->users); \
-} while (0)
-#define put_ioctx(kioctx) do { \
- BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
- if (unlikely(atomic_dec_and_test(&(kioctx)->users))) \
- __put_ioctx(kioctx); \
-} while (0)
+static inline void get_ioctx(struct kioctx *kioctx)
+{
+ BUG_ON(atomic_read(&kioctx->users) <= 0);
+ atomic_inc(&kioctx->users);
+}
+
+static inline int try_get_ioctx(struct kioctx *kioctx)
+{
+ return atomic_inc_not_zero(&kioctx->users);
+}
+
+static inline void put_ioctx(struct kioctx *kioctx)
+{
+ BUG_ON(atomic_read(&kioctx->users) <= 0);
+ if (unlikely(atomic_dec_and_test(&kioctx->users)))
+ __put_ioctx(kioctx);
+}
/* ioctx_alloc
* Allocates and initializes an ioctx. Returns an ERR_PTR if it failed.
@@ -601,8 +609,13 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
rcu_read_lock();
hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) {
- if (ctx->user_id == ctx_id && !ctx->dead) {
- get_ioctx(ctx);
+ /*
+ * RCU protects us against accessing freed memory but
+ * we have to be careful not to get a reference when the
+ * reference count already dropped to 0 (ctx->dead test
+ * is unreliable because of races).
+ */
+ if (ctx->user_id == ctx_id && !ctx->dead && try_get_ioctx(ctx)){
ret = ctx;
break;
}
--
1.7.1
[-- Attachment #3: 0002-fs-Fix-race-between-io_destroy-and-io_submit-in-AIO.patch --]
[-- Type: text/x-patch, Size: 1827 bytes --]
>From 6d5375d55b5d88e8ceda739052566e033be620c2 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Wed, 19 Jan 2011 00:37:48 +0100
Subject: [PATCH 2/2] fs: Fix race between io_destroy() and io_submit() in AIO
A race can occur when io_submit() races with io_destroy():
CPU1 CPU2
io_submit()
do_io_submit()
...
ctx = lookup_ioctx(ctx_id);
io_destroy()
Now do_io_submit() holds the last reference to ctx.
...
queue new AIO
put_ioctx(ctx) - frees ctx with active AIOs
We solve this issue by checking whether ctx is being destroyed
in AIO submission path after adding new AIO to ctx. Then we
are guaranteed that either io_destroy() waits for new AIO or
we see that ctx is being destroyed and bail out.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/aio.c | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index b4dd668..0244c04 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1642,6 +1642,21 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
goto out_put_req;
spin_lock_irq(&ctx->ctx_lock);
+ /*
+ * We could have raced with io_destroy() and are currently holding a
+ * reference to ctx which should be destroyed. We cannot submit IO
+ * since ctx gets freed as soon as io_submit() puts its reference.
+ * The check here is reliable since io_destroy() sets ctx->dead before
+ * waiting for outstanding IO. Thus if we don't see ctx->dead set here,
+ * io_destroy() waits for our IO to finish.
+ * The check is inside ctx->ctx_lock to avoid extra memory barrier
+ * in this fast path...
+ */
+ if (ctx->dead) {
+ spin_unlock_irq(&ctx->ctx_lock);
+ ret = -EINVAL;
+ goto out_put_req;
+ }
aio_run_iocb(req);
if (!list_empty(&ctx->run_list)) {
/* drain the run list */
--
1.7.1
next prev parent reply other threads:[~2011-01-20 20:16 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-14 1:35 [patch] fs: aio fix rcu lookup Nick Piggin
2011-01-14 14:52 ` Jeff Moyer
2011-01-14 15:00 ` Nick Piggin
2011-01-17 19:07 ` Jeff Moyer
2011-01-17 23:24 ` Nick Piggin
2011-01-18 17:21 ` Jeff Moyer
2011-01-18 19:01 ` Jan Kara
2011-01-18 22:17 ` Nick Piggin
2011-01-18 23:00 ` Jeff Moyer
2011-01-18 23:05 ` Nick Piggin
2011-01-18 23:52 ` Jan Kara
2011-01-19 0:20 ` Nick Piggin
2011-01-19 13:21 ` Jan Kara
2011-01-19 16:03 ` Nick Piggin
2011-01-19 16:50 ` Jan Kara
2011-01-19 17:37 ` Nick Piggin
2011-01-20 20:21 ` Jan Kara
2011-01-19 19:13 ` Jeff Moyer
2011-01-19 19:46 ` Jeff Moyer
2011-01-19 20:18 ` Nick Piggin
2011-01-19 20:32 ` Jeff Moyer
2011-01-19 20:45 ` Nick Piggin
2011-01-19 21:03 ` Jeff Moyer
2011-01-19 21:20 ` Nick Piggin
2011-01-20 4:03 ` Paul E. McKenney
2011-01-20 18:31 ` Nick Piggin
2011-01-20 20:02 ` Paul E. McKenney
2011-01-20 20:15 ` Eric Dumazet
2011-01-21 21:22 ` Paul E. McKenney
2011-01-20 20:16 ` Jan Kara [this message]
2011-01-20 21:16 ` Jeff Moyer
2011-02-01 16:24 ` Jan Kara
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110120201602.GA19797@quack.suse.cz \
--to=jack@suse.cz \
--cc=akpm@linux-foundation.org \
--cc=jmoyer@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=npiggin@gmail.com \
--cc=paulmck@linux.vnet.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).