linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH] svcrpc: modifying positive sunrpc cache entries is racy
Date: Tue, 4 Jan 2011 16:01:52 +1100	[thread overview]
Message-ID: <20110104160152.602a3c44@notabene.brown> (raw)
In-Reply-To: <20110103205514.GB18056@fieldses.org>

On Mon, 3 Jan 2011 15:55:14 -0500 "J. Bruce Fields" <bfields@fieldses.org>
wrote:

> On Wed, Dec 29, 2010 at 08:57:19PM -0500, J. Bruce Fields wrote:
> > On Thu, Dec 30, 2010 at 12:19:40PM +1100, Neil Brown wrote:
> > > On Wed, 29 Dec 2010 15:59:42 -0500 "J. Bruce Fields" <bfields@fieldses.org>
> > > wrote:
> > > > Also noticed while trying to track down an rhel5 oops in
> > > > svcauth_unix_set_client():
> > > > 
> > > > 	- cache_check() can set an entry negative in place, which if
> > > > 	  nothing else must cause a leak in some cases.  (Because when
> > > > 	  the entry is eventually destroyed, it will be assumed to not
> > > > 	  have any contents.)  I suppose the fix is again to try to
> > > > 	  adding a new negative entry instead.
> > > 
> > > cache_check should only set an entry 'negative' if it is not already valid
> > > (rv == -EAGAIN) and there is no up-call pending.
> > 
> > I don't think anything keeps VALID from being set after the
> > cache_is_valid check but before the code that does the
> > set_bit(CACHE_NEGATIVE).
> > 
> > > Maybe we should check CACHE_VALID again after the test_and_set of
> > > CACHE_PENDING, but is a very unlikely race (if it is actually a race at all)
> > > 
> > > > 
> > > > 	- since cache_check() doesn't use any locking, I can't see what
> > > > 	  guarantees that when it sees the CACHE_VALID bit set and
> > > > 	  CACHE_NEGATIVE cleared, it must necessarily see the new
> > > > 	  contents.   I think that'd be fixed by a wmb() before setting
> > > > 	  those bits and a rmb() after checking them.  I don't know if
> > > > 	  it's actually possible to hit that bug....
> > > 
> > > Yes, we probably want a set_bit_lock in cache_fresh_locked() though I don't
> > > think that exists, so we could use test_and_set_bit_locked() instead.
> > > 
> > > But it does feel like maybe we should add some locking to cache_check.
> > > Take the lock at the the start, and release it after the
> > > test_and_set_bit(CACHE_PENDING) or once we have decided not to do that ???
> > 
> > Maybe so.
> 
> Here's one attempt.
> 
> --b.
> 
> commit 55563023f85d01698ccf72325c87e3a7039a189b
> Author: J. Bruce Fields <bfields@redhat.com>
> Date:   Mon Jan 3 15:10:27 2011 -0500
> 
>     svcrpc: take locks to fix cache_check races
>     
>     There are at least a couple races in cache_check:
>     
>     	- We attempt to turn a cache entry negative in place.  But that
>     	  entry may already have been filled in by some other task since
>     	  we last checked whether it was valid, so we could be modifying
>     	  an already-valid entry.  If nothing else there's a likely leak
>     	  in such a case when the entry is eventually put() and contents
>     	  are not freed because it has CACHE_NEGATIVE set.
>     	- If cache_check races with an update that is turning the entry
>     	  CACHE_VALID, then it's possible that the CACHE_VALID bit could
>     	  become visible on this CPU before the actual contents do, so
>     	  we could tell the caller this entry is ready to use when in
>     	  fact the caller could still get invalid contents.
>     
>     Some memory barriers might be sufficient to fix at least the latter; but
>     for now let's keep things simple and take the hash_lock when we turn an
>     entry negative or check the CACHE_VALID bit.
>     
>     Signed-off-by: J. Bruce Fields <bfields@redhat.com>
> 
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index 0d6002f..2105b40 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -200,8 +200,9 @@ static int cache_make_upcall(struct cache_detail *cd, struct cache_head *h)
>  	return cd->cache_upcall(cd, h);
>  }
>  
> -static inline int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
> +static int __cache_is_valid(struct cache_detail *detail, struct cache_head *h)
>  {
> +
>  	if (!test_bit(CACHE_VALID, &h->flags))
>  		return -EAGAIN;
>  	else {
> @@ -213,6 +214,33 @@ static inline int cache_is_valid(struct cache_detail *detail, struct cache_head
>  	}
>  }
>  
> +static int cache_is_valid(struct cache_detail *detail, struct cache_head *h)
> +{
> +	int rv;
> +
> +	read_lock(&detail->hash_lock);
> +	rv = __cache_is_valid(detail, h);
> +	read_unlock(&detail->hash_lock);
> +	return rv;
> +}

I don't think there is anything in __cache_is_valid that needs to be
protected.
The compiler will almost certainly produce code which loads f->flags once and
then performs 1 or 2 bit tests against the value in the register and produces
one of 3 possible return values based on the result.
There is absolutely no value in putting locking around that, especially as
CACHE_VALID is never cleared.

Maybe you imagine a re-ordering of setting CACHE_NEGATIVE and CACHE_VALID,
but as they are in the same cache line (and in fact in the same byte) they
cannot be re-ordered.  We always set CACHE_NEGATIVE before CACHE_VALID and
there is no way those two could get to memory in the wrong order.



> +
> +static int try_to_negate_entry(struct cache_detail *detail, struct cache_head *h)
> +{
> +	int rv;
> +
> +	write_lock(&detail->hash_lock);
> +	rv = __cache_is_valid(detail, h);
> +	if (rv != -EAGAIN) {
> +		write_unlock(&detail->hash_lock);
> +		return rv;
> +	}
> +	set_bit(CACHE_NEGATIVE, &h->flags);
> +	cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> +	write_unlock(&detail->hash_lock);
> +	cache_fresh_unlocked(h, detail);
> +	return -ENOENT;
> +}
> +
>  /*
>   * This is the generic cache management routine for all
>   * the authentication caches.
> @@ -251,14 +279,8 @@ int cache_check(struct cache_detail *detail,
>  			case -EINVAL:
>  				clear_bit(CACHE_PENDING, &h->flags);
>  				cache_revisit_request(h);
> -				if (rv == -EAGAIN) {
> -					set_bit(CACHE_NEGATIVE, &h->flags);
> -					cache_fresh_locked(h, seconds_since_boot()+CACHE_NEW_EXPIRY);
> -					cache_fresh_unlocked(h, detail);
> -					rv = -ENOENT;
> -				}
> +				rv = try_to_negate_entry(detail, h);
>  				break;

This bit looks good those.  It feels much better having an 'unlock' between
'cache_fresh_locked' and 'cache_fresh_unlocked' !!

Thanks,

NeilBrown


> -
>  			case -EAGAIN:
>  				clear_bit(CACHE_PENDING, &h->flags);
>  				cache_revisit_request(h);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


  reply	other threads:[~2011-01-04  5:02 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-12-29 20:47 [PATCH] svcrpc: modifying positive sunrpc cache entries is racy J. Bruce Fields
2010-12-29 20:59 ` J. Bruce Fields
2010-12-30  1:19   ` Neil Brown
2010-12-30  1:57     ` J. Bruce Fields
2011-01-03 20:55       ` J. Bruce Fields
2011-01-04  5:01         ` NeilBrown [this message]
2011-01-04 15:22           ` J. Bruce Fields
2011-01-04 19:23             ` J. Bruce Fields
2011-01-04 19:31               ` [PATCH 1/2] svcrpc: take lock on turning entry NEGATIVE in cache_check J. Bruce Fields
2011-01-04 19:31               ` [PATCH 2/2] svcrpc: ensure cache_check caller sees updated entry J. Bruce Fields
2011-01-04 21:10               ` [PATCH] svcrpc: modifying positive sunrpc cache entries is racy NeilBrown
     [not found]                 ` <20110105081031.220bfbc9-wvvUuzkyo1EYVZTmpyfIwg@public.gmane.org>
2011-01-04 21:15                   ` J. Bruce Fields
2011-01-03 22:26 ` J. Bruce Fields
2011-01-04  3:08   ` J. Bruce Fields
2011-01-04  4:51     ` NeilBrown
2011-01-04 18:43       ` J. Bruce Fields
2011-01-04 21:15         ` NeilBrown
2011-01-04 21:21           ` J. Bruce Fields
2011-01-04 21:46       ` J. Bruce Fields
2011-01-04 23:05         ` NeilBrown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110104160152.602a3c44@notabene.brown \
    --to=neilb@suse.de \
    --cc=bfields@fieldses.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).