From: Jan Kara <jack@suse.cz>
To: Nick Piggin <npiggin@gmail.com>
Cc: paulmck@linux.vnet.ibm.com, Jeff Moyer <jmoyer@redhat.com>,
	Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] fs: aio fix rcu lookup
Date: Tue, 1 Feb 2011 17:24:38 +0100
Message-ID: <20110201162438.GC2059@quack.suse.cz>
In-Reply-To: <20110120201602.GA19797@quack.suse.cz>

On Thu 20-01-11 21:16:02, Jan Kara wrote:
> On Fri 21-01-11 05:31:53, Nick Piggin wrote:
> > On Thu, Jan 20, 2011 at 3:03 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Thu, Jan 20, 2011 at 08:20:00AM +1100, Nick Piggin wrote:
> > >> On Thu, Jan 20, 2011 at 8:03 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> > >> >> I don't know exactly how all programs use io_destroy -- of the small
> > >> >> number that do, probably an even smaller number would care here. But I
> > >> >> don't think it simplifies things enough to use synchronize_rcu for it.
> > >> >
> > >> > Above it sounded like you didn't think AIO should be using RCU at all.
> > >>
> > >> synchronize_rcu of course, not RCU (typo).
> > >
> > > I think that Nick is suggesting that call_rcu() be used instead.
> > > Perhaps also very sparing use of synchronize_rcu_expedited(), which
> > > is faster than synchronize_rcu(), but which uses more CPU time.
> >
> > call_rcu() is the obvious alternative, yes.
> >
> > Once we give in to synchronize_rcu() we're basically giving up.
> > That's certainly a very good tradeoff for something like filesystem
> > unregistration or module unload, where it buys big simplifications in
> > real fastpaths. But I just don't think it should be taken lightly.
> So in the end, I've realized I don't need synchronize_rcu() at all and
> in fact everything is OK even without call_rcu() if I base my fix on top
> of your patch.
>
> Attached is your patch with the comment I proposed added, and also a
> patch fixing the second race. Better?

Nick, any opinion on this? Should I push the patches upstream?

								Honza

> From 68857d7f2087edbbc5ee1d828f151ac46406f3be Mon Sep 17 00:00:00 2001
> From: Nick Piggin <npiggin@gmail.com>
> Date: Thu, 20 Jan 2011 20:08:52 +0100
> Subject: [PATCH 1/2] fs: Fix aio rcu ioctx lookup
>
> aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
>
> lookup_ioctx doesn't implement the rcu lookup pattern properly. rcu_read_lock
> does not prevent refcount going to zero, so we might take a refcount on a zero
> count ioctx.
>
> Fix the bug by atomically testing for zero refcount before incrementing.
>
> [JK: Added comment into the code]
>
> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/aio.c | 35 ++++++++++++++++++++++++-----------
> 1 files changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index fc557a3..b4dd668 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -239,15 +239,23 @@ static void __put_ioctx(struct kioctx *ctx)
>  	call_rcu(&ctx->rcu_head, ctx_rcu_free);
>  }
>
> -#define get_ioctx(kioctx) do { \
> -	BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
> -	atomic_inc(&(kioctx)->users); \
> -} while (0)
> -#define put_ioctx(kioctx) do { \
> -	BUG_ON(atomic_read(&(kioctx)->users) <= 0); \
> -	if (unlikely(atomic_dec_and_test(&(kioctx)->users))) \
> -		__put_ioctx(kioctx); \
> -} while (0)
> +static inline void get_ioctx(struct kioctx *kioctx)
> +{
> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
> +	atomic_inc(&kioctx->users);
> +}
> +
> +static inline int try_get_ioctx(struct kioctx *kioctx)
> +{
> +	return atomic_inc_not_zero(&kioctx->users);
> +}
> +
> +static inline void put_ioctx(struct kioctx *kioctx)
> +{
> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
> +	if (unlikely(atomic_dec_and_test(&kioctx->users)))
> +		__put_ioctx(kioctx);
> +}
>
>  /* ioctx_alloc
>   *	Allocates and initializes an ioctx. Returns an ERR_PTR if it failed.
> @@ -601,8 +609,13 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
>  	rcu_read_lock();
>
>  	hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) {
> -		if (ctx->user_id == ctx_id && !ctx->dead) {
> -			get_ioctx(ctx);
> +		/*
> +		 * RCU protects us against accessing freed memory but
> +		 * we have to be careful not to get a reference when the
> +		 * reference count already dropped to 0 (ctx->dead test
> +		 * is unreliable because of races).
> +		 */
> +		if (ctx->user_id == ctx_id && !ctx->dead && try_get_ioctx(ctx)){
>  			ret = ctx;
>  			break;
>  		}
> --
> 1.7.1
>
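
For readers unfamiliar with the "increment unless zero" idiom that patch 1/2 relies on, here is a minimal userspace sketch using C11 atomics. It is not the kernel implementation: struct obj, try_get() and put() are hypothetical names chosen for illustration, and the compare-exchange loop only approximates what atomic_inc_not_zero() does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct kioctx; only the refcount matters here. */
struct obj {
	atomic_int users;
};

/*
 * Userspace analogue of atomic_inc_not_zero(): take a reference only if
 * the count has not already dropped to zero. Returns true on success.
 */
bool try_get(struct obj *o)
{
	int old = atomic_load(&o->users);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if the caller dropped the last one. */
bool put(struct obj *o)
{
	int prev = atomic_fetch_sub(&o->users, 1);

	assert(prev > 0);	/* mirrors the BUG_ON() in put_ioctx() */
	return prev == 1;
}
```

A lookup that finds the object under rcu_read_lock() would call try_get() and treat a false return the same as "not found", since the object is already on its way to being freed.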
> From 6d5375d55b5d88e8ceda739052566e033be620c2 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Wed, 19 Jan 2011 00:37:48 +0100
> Subject: [PATCH 2/2] fs: Fix race between io_destroy() and io_submit() in AIO
>
> A race can occur when io_submit() races with io_destroy():
>
>   CPU1                                          CPU2
> io_submit()
>   do_io_submit()
>     ...
>     ctx = lookup_ioctx(ctx_id);
>                                                 io_destroy()
>     Now do_io_submit() holds the last reference to ctx.
>     ...
>     queue new AIO
>     put_ioctx(ctx) - frees ctx with active AIOs
>
> We solve this issue by checking whether ctx is being destroyed
> in AIO submission path after adding new AIO to ctx. Then we
> are guaranteed that either io_destroy() waits for new AIO or
> we see that ctx is being destroyed and bail out.
>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
> fs/aio.c | 15 +++++++++++++++
> 1 files changed, 15 insertions(+), 0 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index b4dd668..0244c04 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1642,6 +1642,21 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  		goto out_put_req;
>
>  	spin_lock_irq(&ctx->ctx_lock);
> +	/*
> +	 * We could have raced with io_destroy() and are currently holding a
> +	 * reference to ctx which should be destroyed. We cannot submit IO
> +	 * since ctx gets freed as soon as io_submit() puts its reference.
> +	 * The check here is reliable since io_destroy() sets ctx->dead before
> +	 * waiting for outstanding IO. Thus if we don't see ctx->dead set here,
> +	 * io_destroy() waits for our IO to finish.
> +	 * The check is inside ctx->ctx_lock to avoid extra memory barrier
> +	 * in this fast path...
> +	 */
> +	if (ctx->dead) {
> +		spin_unlock_irq(&ctx->ctx_lock);
> +		ret = -EINVAL;
> +		goto out_put_req;
> +	}
>  	aio_run_iocb(req);
>  	if (!list_empty(&ctx->run_list)) {
>  		/* drain the run list */
> --
> 1.7.1
>
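
The "recheck the dead flag under the lock before queueing" rule from patch 2/2 can be sketched in userspace roughly as follows. This is a simplified illustration under stated assumptions, not the fs/aio.c code: struct ctx, submit_one(), complete_one() and mark_dead() are hypothetical names, a C11 atomic_flag spinlock stands in for ctx->ctx_lock, and the teardown side is assumed to set the flag under that same lock, which may not match the kernel exactly.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified context: a lock, a dead flag, an in-flight count. */
struct ctx {
	atomic_flag lock;
	bool dead;
	int reqs_active;
};

static void ctx_lock(struct ctx *c)
{
	while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
		;	/* spin until the lock is free */
}

static void ctx_unlock(struct ctx *c)
{
	atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Submission path: recheck 'dead' under the lock before queueing. */
int submit_one(struct ctx *c)
{
	ctx_lock(c);
	if (c->dead) {
		ctx_unlock(c);
		return -EINVAL;
	}
	c->reqs_active++;
	ctx_unlock(c);
	return 0;
}

/* Completion path: one in-flight request has finished. */
void complete_one(struct ctx *c)
{
	ctx_lock(c);
	assert(c->reqs_active > 0);
	c->reqs_active--;
	ctx_unlock(c);
}

/*
 * Teardown path: mark the context dead and report how many requests
 * the caller still has to wait for before freeing it.
 */
int mark_dead(struct ctx *c)
{
	int pending;

	ctx_lock(c);
	c->dead = true;
	pending = c->reqs_active;
	ctx_unlock(c);
	return pending;
}
```

Because both sides serialize on the lock in this sketch, a submitter either queues its request before mark_dead() samples the count, so teardown waits for it, or it observes dead and bails out with -EINVAL.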
--
Jan Kara <jack@suse.cz>
SUSE Labs, CR