linux-fsdevel.vger.kernel.org archive mirror
From: Jan Kara <jack@suse.cz>
To: Nick Piggin <npiggin@gmail.com>
Cc: paulmck@linux.vnet.ibm.com, Jeff Moyer <jmoyer@redhat.com>,
	Jan Kara <jack@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel@vger.kernel.org
Subject: Re: [patch] fs: aio fix rcu lookup
Date: Tue, 1 Feb 2011 17:24:38 +0100	[thread overview]
Message-ID: <20110201162438.GC2059@quack.suse.cz> (raw)
In-Reply-To: <20110120201602.GA19797@quack.suse.cz>

On Thu 20-01-11 21:16:02, Jan Kara wrote:
> On Fri 21-01-11 05:31:53, Nick Piggin wrote:
> > On Thu, Jan 20, 2011 at 3:03 PM, Paul E. McKenney
> > <paulmck@linux.vnet.ibm.com> wrote:
> > > On Thu, Jan 20, 2011 at 08:20:00AM +1100, Nick Piggin wrote:
> > >> On Thu, Jan 20, 2011 at 8:03 AM, Jeff Moyer <jmoyer@redhat.com> wrote:
> > >> >> I don't know exactly how all programs use io_destroy -- of the small
> > >> >> number that do, probably an even smaller number would care here. But I
> > >> >> don't think it simplifies things enough to use synchronize_rcu for it.
> > >> >
> > >> > Above it sounded like you didn't think AIO should be using RCU at all.
> > >>
> > >> synchronize_rcu of course, not RCU (typo).
> > >
> > > I think that Nick is suggesting that call_rcu() be used instead.
> > > Perhaps also very sparing use of synchronize_rcu_expedited(), which
> > > is faster than synchronize_rcu(), but which uses more CPU time.
> > 
> > call_rcu() is the obvious alternative, yes.
> > 
> > Basically, once we give in to synchronize_rcu() we're giving
> > up. That's certainly a very good tradeoff for something like filesystem
> > unregistration or module unload, it buys big simplifications in real
> > fastpaths. But I just don't think it should be taken lightly.
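
[Editor's note: for readers following the tradeoff above, the two teardown
styles look like this in kernel code. A minimal sketch, not runnable on its
own; ctx_rcu_free and the rcu_head field match the names used in fs/aio.c
below, while the two destroy functions are illustrative, not real kernel
functions:]

```c
/* Style 1: blocking -- simple, but stalls the caller for a full RCU
 * grace period, which is why Nick objects to using it lightly. */
static void destroy_ioctx_sync(struct kioctx *ctx)
{
	hlist_del_rcu(&ctx->list);
	synchronize_rcu();	/* wait until no RCU reader can still see ctx */
	kfree(ctx);
}

/* Style 2: deferred -- the callback runs after a grace period and the
 * caller never blocks.  This is what __put_ioctx() in the patch below
 * already does via call_rcu(). */
static void destroy_ioctx_async(struct kioctx *ctx)
{
	hlist_del_rcu(&ctx->list);
	call_rcu(&ctx->rcu_head, ctx_rcu_free);
}
```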
> So in the end, I've realized I don't need synchronize_rcu() at all and
> in fact everything is OK even without call_rcu() if I base my fix on top
> of your patch.
> 
> Attached is your patch with added comment I proposed and also a patch
> fixing the second race. Better?
  Nick, any opinion on this? Should I push the patches upstream?

									Honza

> From 68857d7f2087edbbc5ee1d828f151ac46406f3be Mon Sep 17 00:00:00 2001
> From: Nick Piggin <npiggin@gmail.com>
> Date: Thu, 20 Jan 2011 20:08:52 +0100
> Subject: [PATCH 1/2] fs: Fix aio rcu ioctx lookup
> 
> aio-dio-invalidate-failure GPFs in aio_put_req from io_submit.
> 
> lookup_ioctx doesn't implement the rcu lookup pattern properly.  rcu_read_lock
> does not prevent refcount going to zero, so we might take a refcount on a zero
> count ioctx.
> 
> Fix the bug by atomically testing for zero refcount before incrementing.
> 
> [JK: Added comment into the code]
> 
> Signed-off-by: Nick Piggin <npiggin@kernel.dk>
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/aio.c |   35 ++++++++++++++++++++++++-----------
>  1 files changed, 24 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index fc557a3..b4dd668 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -239,15 +239,23 @@ static void __put_ioctx(struct kioctx *ctx)
>  	call_rcu(&ctx->rcu_head, ctx_rcu_free);
>  }
>  
> -#define get_ioctx(kioctx) do {						\
> -	BUG_ON(atomic_read(&(kioctx)->users) <= 0);			\
> -	atomic_inc(&(kioctx)->users);					\
> -} while (0)
> -#define put_ioctx(kioctx) do {						\
> -	BUG_ON(atomic_read(&(kioctx)->users) <= 0);			\
> -	if (unlikely(atomic_dec_and_test(&(kioctx)->users))) 		\
> -		__put_ioctx(kioctx);					\
> -} while (0)
> +static inline void get_ioctx(struct kioctx *kioctx)
> +{
> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
> +	atomic_inc(&kioctx->users);
> +}
> +
> +static inline int try_get_ioctx(struct kioctx *kioctx)
> +{
> +	return atomic_inc_not_zero(&kioctx->users);
> +}
> +
> +static inline void put_ioctx(struct kioctx *kioctx)
> +{
> +	BUG_ON(atomic_read(&kioctx->users) <= 0);
> +	if (unlikely(atomic_dec_and_test(&kioctx->users)))
> +		__put_ioctx(kioctx);
> +}
>  
>  /* ioctx_alloc
>   *	Allocates and initializes an ioctx.  Returns an ERR_PTR if it failed.
> @@ -601,8 +609,13 @@ static struct kioctx *lookup_ioctx(unsigned long ctx_id)
>  	rcu_read_lock();
>  
>  	hlist_for_each_entry_rcu(ctx, n, &mm->ioctx_list, list) {
> -		if (ctx->user_id == ctx_id && !ctx->dead) {
> -			get_ioctx(ctx);
> +		/*
> +		 * RCU protects us against accessing freed memory but
> +		 * we have to be careful not to get a reference when the
> +		 * reference count already dropped to 0 (ctx->dead test
> +		 * is unreliable because of races).
> +		 */
> +		if (ctx->user_id == ctx_id && !ctx->dead && try_get_ioctx(ctx)){
>  			ret = ctx;
>  			break;
>  		}
> -- 
> 1.7.1
> 

> From 6d5375d55b5d88e8ceda739052566e033be620c2 Mon Sep 17 00:00:00 2001
> From: Jan Kara <jack@suse.cz>
> Date: Wed, 19 Jan 2011 00:37:48 +0100
> Subject: [PATCH 2/2] fs: Fix race between io_destroy() and io_submit() in AIO
> 
> A race can occur when io_submit() races with io_destroy():
> 
>  CPU1						CPU2
> io_submit()
>   do_io_submit()
>     ...
>     ctx = lookup_ioctx(ctx_id);
> 						io_destroy()
>     Now do_io_submit() holds the last reference to ctx.
>     ...
>     queue new AIO
>     put_ioctx(ctx) - frees ctx with active AIOs
> 
> We solve this issue by checking whether ctx is being destroyed
> in AIO submission path after adding new AIO to ctx. Then we
> are guaranteed that either io_destroy() waits for new AIO or
> we see that ctx is being destroyed and bail out.
> 
> Signed-off-by: Jan Kara <jack@suse.cz>
> ---
>  fs/aio.c |   15 +++++++++++++++
>  1 files changed, 15 insertions(+), 0 deletions(-)
> 
> diff --git a/fs/aio.c b/fs/aio.c
> index b4dd668..0244c04 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -1642,6 +1642,21 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>  		goto out_put_req;
>  
>  	spin_lock_irq(&ctx->ctx_lock);
> +	/*
> +	 * We could have raced with io_destroy() and are currently holding a
> +	 * reference to ctx which should be destroyed. We cannot submit IO
> +	 * since ctx gets freed as soon as io_submit() puts its reference.
> +	 * The check here is reliable since io_destroy() sets ctx->dead before
> +	 * waiting for outstanding IO. Thus if we don't see ctx->dead set here,
> +	 * io_destroy() waits for our IO to finish.
> +	 * The check is inside ctx->ctx_lock to avoid extra memory barrier
> +	 * in this fast path...
> +	 */
> +	if (ctx->dead) {
> +		spin_unlock_irq(&ctx->ctx_lock);
> +		ret = -EINVAL;
> +		goto out_put_req;
> +	}
>  	aio_run_iocb(req);
>  	if (!list_empty(&ctx->run_list)) {
>  		/* drain the run list */
> -- 
> 1.7.1
> 

-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR


Thread overview: 32+ messages
2011-01-14  1:35 [patch] fs: aio fix rcu lookup Nick Piggin
2011-01-14 14:52 ` Jeff Moyer
2011-01-14 15:00   ` Nick Piggin
2011-01-17 19:07     ` Jeff Moyer
2011-01-17 23:24       ` Nick Piggin
2011-01-18 17:21         ` Jeff Moyer
2011-01-18 19:01         ` Jan Kara
2011-01-18 22:17           ` Nick Piggin
2011-01-18 23:00             ` Jeff Moyer
2011-01-18 23:05               ` Nick Piggin
2011-01-18 23:52             ` Jan Kara
2011-01-19  0:20               ` Nick Piggin
2011-01-19 13:21                 ` Jan Kara
2011-01-19 16:03                   ` Nick Piggin
2011-01-19 16:50                     ` Jan Kara
2011-01-19 17:37                       ` Nick Piggin
2011-01-20 20:21                         ` Jan Kara
2011-01-19 19:13                   ` Jeff Moyer
2011-01-19 19:46                     ` Jeff Moyer
2011-01-19 20:18                       ` Nick Piggin
2011-01-19 20:32                         ` Jeff Moyer
2011-01-19 20:45                           ` Nick Piggin
2011-01-19 21:03                             ` Jeff Moyer
2011-01-19 21:20                               ` Nick Piggin
2011-01-20  4:03                                 ` Paul E. McKenney
2011-01-20 18:31                                   ` Nick Piggin
2011-01-20 20:02                                     ` Paul E. McKenney
2011-01-20 20:15                                       ` Eric Dumazet
2011-01-21 21:22                                         ` Paul E. McKenney
2011-01-20 20:16                                     ` Jan Kara
2011-01-20 21:16                                       ` Jeff Moyer
2011-02-01 16:24                                       ` Jan Kara [this message]
