linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "J. Bruce Fields" <bfields@fieldses.org>
To: Neil Brown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
Date: Wed, 22 Sep 2010 23:25:40 -0400	[thread overview]
Message-ID: <20100923032540.GA13554@fieldses.org> (raw)
In-Reply-To: <20100923130002.5f2da3e9@notabene>

On Thu, Sep 23, 2010 at 01:00:02PM +1000, Neil Brown wrote:
> On Wed, 22 Sep 2010 13:50:27 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > On Wed, Sep 22, 2010 at 12:55:06PM +1000, NeilBrown wrote:
> > > If we set up to wait for a cache item to be filled in, and then find
> > > that it is no longer pending, it could be that some other thread is
> > > in 'cache_revisit_request' and has moved our request to its 'pending' list.
> > > So when our setup_deferral calls cache_revisit_request it will find nothing to
> > > put on the pending list, and do nothing.
> > > 
> > > We then return from cache_wait_req, thus leaving the 'sleeper'
> > > on-stack structure open to being corrupted by subsequent stack usage.
> > > 
> > > However that 'sleeper' could still be on the 'pending' list that the
> > > other thread is looking at and so any corruption could cause it to behave badly.
> > > 
> > > To avoid this race we simply take the same path as if the
> > > 'wait_for_completion_interruptible_timeout' was interrupted and if the
> > > sleeper is no longer on the list (which it won't be) we wait on the
> > > completion - which will ensure that any other cache_revisit_request
> > > will have let go of the sleeper.
> > 
> > OK, but I don't think we need that first CACHE_PENDING check in
> > setup_deferral at all.  Best just to ignore it?:
> > 
> > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > index d789dfc..82804b4 100644
> > --- a/net/sunrpc/cache.c
> > +++ b/net/sunrpc/cache.c
> > @@ -567,10 +567,9 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
> >  	sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
> >  	dreq->revisit = cache_restart_thread;
> >  
> > -	ret = setup_deferral(dreq, item);
> > +	setup_deferral(dreq, item);
> >  
> > -	if (ret ||
> > -	    wait_for_completion_interruptible_timeout(
> > +	if (wait_for_completion_interruptible_timeout(
> >  		    &sleeper.completion, timeout) <= 0) {
> >  		/* The completion wasn't completed, so we need
> >  		 * to clean up
> > 
> > Then it's obvious that we always wait for completion, and that we fill
> > the basic requirement to avoid corruption here.
> > 
> > (Which is, in more detail: cache_wait_req must not return while dreq is
> > still reachable from anywhere else.  Since dreq is reachable from
> > elsewhere only as long as it is hashed, and since anyone else that might
> > unhash it will call our revisit (which unconditionally calls complete),
> > then forget about it, it suffices for cache_wait_req to either wait for
> > completion, or unhash dreq *itself* (before someone else does).)
> > 
> > Also we don't need the PENDING check to ensure we wait only when
> > necessary--we only wait while dreq is hashed, and as long as we're
> > hashed anyone clearing PENDING will also end up doing the complete().
> > 
> > In the deferral case it's maybe a useful optimization if it avoids
> > an unnecessary drop sometimes.  Here it doesn't help.
> > 
> > --b.
> 
> I like your thinking and I agree with most of it.
> After adding dreq to the hash table, we need to check CACHE_PENDING.  It is
> possible that it was cleared moments before we added dreq to the hash-table so
> the cache_revisit_request associated with clearing it might have missed our
> dreq, so we need to do something about that....

Arrgh, right--I wasn't thinking straight.

> How about this.
> It gets rid of the return values (which were confusing anyway) and adds
> explicit checks on CAHCE_PENDING where needed.
> ??

Thanks, I'll take a look in the morning when my head's (hopefully)
clearer.

> 
> Also, I noticed there is a race with the call to cache_limit_defers.  The
> 'dreq' could be freed before that is called.
> 
> It looks like I need to resubmit a lot of this - do you want to just discard
> it all from your -next and I'll make something new?

I'm trying very hard not to rewind -next; so I'd prefer incremental
patches for anything already there, replacements for the rest.

--b.

> 
> Thank for the review!
> NeilBrown
> 
> 
>  cache.c |   53 +++++++++++++++++------------------------------------
>  1 file changed, 17 insertions(+), 36 deletions(-)
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index c60c7f6..db0fa44 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -37,7 +37,7 @@
>  
>  #define	 RPCDBG_FACILITY RPCDBG_CACHE
>  
> -static int cache_defer_req(struct cache_req *req, struct cache_head *item);
> +static void cache_defer_req(struct cache_req *req, struct cache_head *item);
>  static void cache_revisit_request(struct cache_head *item);
>  
>  static void cache_init(struct cache_head *h)
> @@ -268,7 +268,8 @@ int cache_check(struct cache_detail *detail,
>  	}
>  
>  	if (rv == -EAGAIN) {
> -		if (cache_defer_req(rqstp, h) < 0) {
> +		cache_defer_req(rqstp, h);
> +		if (!test_bit(CACHE_PENDING, &h->flags)) {
>  			/* Request is not deferred */
>  			rv = cache_is_valid(detail, h);
>  			if (rv == -EAGAIN)
> @@ -527,7 +528,7 @@ static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_he
>  	hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
>  }
>  
> -static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
> +static void setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
>  {
>  
>  	dreq->item = item;
> @@ -537,13 +538,6 @@ static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *it
>  	__hash_deferred_req(dreq, item);
>  
>  	spin_unlock(&cache_defer_lock);
> -
> -	if (!test_bit(CACHE_PENDING, &item->flags)) {
> -		/* must have just been validated... */
> -		cache_revisit_request(item);
> -		return -EAGAIN;
> -	}
> -	return 0;
>  }
>  
>  struct thread_deferred_req {
> @@ -558,18 +552,17 @@ static void cache_restart_thread(struct cache_deferred_req *dreq, int too_many)
>  	complete(&dr->completion);
>  }
>  
> -static int cache_wait_req(struct cache_req *req, struct cache_head *item, int timeout)
> +static void cache_wait_req(struct cache_req *req, struct cache_head *item, int timeout)
>  {
>  	struct thread_deferred_req sleeper;
>  	struct cache_deferred_req *dreq = &sleeper.handle;
> -	int ret;
>  
>  	sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
>  	dreq->revisit = cache_restart_thread;
>  
> -	ret = setup_deferral(dreq, item);
> +	setup_deferral(dreq, item);
>  
> -	if (ret ||
> +	if (!test_bit(CACHE_PENDING, &item->flags) ||
>  	    wait_for_completion_interruptible_timeout(
>  		    &sleeper.completion, timeout) <= 0) {
>  		/* The completion wasn't completed, so we need
> @@ -589,18 +582,6 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
>  			wait_for_completion(&sleeper.completion);
>  		}
>  	}
> -	if (test_bit(CACHE_PENDING, &item->flags)) {
> -		/* item is still pending, try request
> -		 * deferral
> -		 */
> -		return -ETIMEDOUT;
> -	}
> -	/* only return success if we actually deferred the
> -	 * request.  In this case we waited until it was
> -	 * answered so no deferral has happened - rather
> -	 * an answer already exists.
> -	 */
> -	return -EEXIST;
>  }
>  
>  static void cache_limit_defers(struct cache_deferred_req *dreq)
> @@ -639,7 +620,7 @@ static void cache_limit_defers(struct cache_deferred_req *dreq)
>  		discard->revisit(discard, 1);
>  }
>  
> -static int cache_defer_req(struct cache_req *req, struct cache_head *item)
> +static void cache_defer_req(struct cache_req *req, struct cache_head *item)
>  {
>  	struct cache_deferred_req *dreq;
>  	int ret;
> @@ -647,17 +628,19 @@ static int cache_defer_req(struct cache_req *req, struct cache_head *item)
>  
>  	timeout = req->set_waiter(req, 1);
>  	if (timeout) {
> -		ret = cache_wait_req(req, item, timeout);
> +		cache_wait_req(req, item, timeout);
>  		req->set_waiter(req, 0);
> -		return ret;
>  	} else {
>  		dreq = req->defer(req);
>  		if (dreq == NULL)
> -			return -ENOMEM;
> -		if (setup_deferral(dreq, item) < 0)
> -			return -EAGAIN;
> +			return;
> +		setup_deferral(dreq, item);
>  		cache_limit_defers(dreq);
> -		return 0;
> +		if (!test_bit(CACHE_PENDING, &item->flags))
> +			/* Bit could have been cleared before we managed to
> +			 * set up the deferral, so need to revisit just in case
> +			 */
> +			cache_revisit_request(item);
>  	}
>  }
>  
> 

  reply	other threads:[~2010-09-23  3:27 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-09-22  2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
2010-09-22  2:55 ` [PATCH 2/7] sunrpc/cache: fix recent breakage of cache_clean_deferred NeilBrown
     [not found]   ` <20100922025506.31745.74964.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22 18:27     ` J. Bruce Fields
2010-09-22  2:55 ` [PATCH 1/7] sunrpc: fix race in new cache_wait code NeilBrown
2010-09-22 17:50   ` J. Bruce Fields
2010-09-23  3:00     ` Neil Brown
2010-09-23  3:25       ` J. Bruce Fields [this message]
2010-09-23 14:46         ` J. Bruce Fields
2010-10-01 23:09           ` J. Bruce Fields
2010-10-02  0:12             ` Neil Brown
2010-09-22  2:55 ` [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist NeilBrown
2010-09-22  2:59   ` J. Bruce Fields
2010-09-22  4:51     ` Neil Brown
2010-09-22  2:55 ` [PATCH 4/7] sunrpc/cache: centralise handling of size limit on deferred list NeilBrown
     [not found]   ` <20100922025507.31745.61919.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22 18:31     ` J. Bruce Fields
2010-09-23  3:02       ` Neil Brown
2010-09-22  2:55 ` [PATCH 6/7] nfsd: formally deprecate legacy nfsd syscall interface NeilBrown
     [not found]   ` <20100922025507.31745.57024.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22  3:10     ` J. Bruce Fields
2010-09-22  2:55 ` [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls NeilBrown
2010-09-22 18:36   ` J. Bruce Fields
2010-09-23  3:23     ` Neil Brown
2010-09-22  2:55 ` [PATCH 7/7] nfsd: allow deprecated interface to be compiled out NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100923032540.GA13554@fieldses.org \
    --to=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).