From: "J. Bruce Fields" <bfields@fieldses.org>
To: Neil Brown <neilb@suse.de>
Cc: linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
Date: Wed, 22 Sep 2010 23:25:40 -0400 [thread overview]
Message-ID: <20100923032540.GA13554@fieldses.org> (raw)
In-Reply-To: <20100923130002.5f2da3e9@notabene>
On Thu, Sep 23, 2010 at 01:00:02PM +1000, Neil Brown wrote:
> On Wed, 22 Sep 2010 13:50:27 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>
> > On Wed, Sep 22, 2010 at 12:55:06PM +1000, NeilBrown wrote:
> > > If we set up to wait for a cache item to be filled in, and then find
> > > that it is no longer pending, it could be that some other thread is
> > > in 'cache_revisit_request' and has moved our request to its 'pending' list.
> > > So when our setup_deferral calls cache_revisit_request it will find nothing to
> > > put on the pending list, and do nothing.
> > >
> > > We then return from cache_wait_req, thus leaving the 'sleeper'
> > > on-stack structure open to being corrupted by subsequent stack usage.
> > >
> > > However that 'sleeper' could still be on the 'pending' list that the
> > > other thread is looking at and so any corruption could cause it to behave badly.
> > >
> > > To avoid this race we simply take the same path as if the
> > > 'wait_for_completion_interruptible_timeout' was interrupted and if the
> > > sleeper is no longer on the list (which it won't be) we wait on the
> > > completion - which will ensure that any other cache_revisit_request
> > > will have let go of the sleeper.
> >
> > OK, but I don't think we need that first CACHE_PENDING check in
> > setup_deferral at all. Best just to ignore it?:
> >
> > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > index d789dfc..82804b4 100644
> > --- a/net/sunrpc/cache.c
> > +++ b/net/sunrpc/cache.c
> > @@ -567,10 +567,9 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
> > sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
> > dreq->revisit = cache_restart_thread;
> >
> > - ret = setup_deferral(dreq, item);
> > + setup_deferral(dreq, item);
> >
> > - if (ret ||
> > - wait_for_completion_interruptible_timeout(
> > + if (wait_for_completion_interruptible_timeout(
> > &sleeper.completion, timeout) <= 0) {
> > /* The completion wasn't completed, so we need
> > * to clean up
> >
> > Then it's obvious that we always wait for completion, and that we fill
> > the basic requirement to avoid corruption here.
> >
> > (Which is, in more detail: cache_wait_req must not return while dreq is
> > still reachable from anywhere else. Since dreq is reachable from
> > elsewhere only as long as it is hashed, and since anyone else that might
> > unhash it will call our revisit (which unconditionally calls complete),
> > then forget about it, it suffices for cache_wait_req to either wait for
> > completion, or unhash dreq *itself* (before someone else does).)
> >
> > Also we don't need the PENDING check to ensure we wait only when
> > necessary--we only wait while dreq is hashed, and as long as we're
> > hashed anyone clearing PENDING will also end up doing the complete().
> >
> > In the deferral case it's maybe a useful optimization if it avoids
> > an unnecessary drop sometimes. Here it doesn't help.
> >
> > --b.
>
> I like your thinking and I agree with most of it.
> After adding dreq to the hash table, we need to check CACHE_PENDING. It is
> possible that it was cleared moments before we added dreq to the hash-table so
> the cache_revisit_request associated with clearing it might have missed our
> dreq, so we need to do something about that....
Arrgh, right--I wasn't thinking straight.
> How about this.
> It gets rid of the return values (which were confusing anyway) and adds
> explicit checks on CAHCE_PENDING where needed.
> ??
Thanks, I'll take a look in the morning when my head's (hopefully)
clearer.
>
> Also, I noticed there is a race with the call to cache_limit_defers. The
> 'dreq' could be freed before that is called.
>
> It looks like I need to resubmit a lot of this - do you want to just discard
> it all from your -next and I'll make something new?
I'm trying very hard not to rewind -next; so I'd prefer incremental
patches for anything already there, replacements for the rest.
--b.
>
> Thank for the review!
> NeilBrown
>
>
> cache.c | 53 +++++++++++++++++------------------------------------
> 1 file changed, 17 insertions(+), 36 deletions(-)
>
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index c60c7f6..db0fa44 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -37,7 +37,7 @@
>
> #define RPCDBG_FACILITY RPCDBG_CACHE
>
> -static int cache_defer_req(struct cache_req *req, struct cache_head *item);
> +static void cache_defer_req(struct cache_req *req, struct cache_head *item);
> static void cache_revisit_request(struct cache_head *item);
>
> static void cache_init(struct cache_head *h)
> @@ -268,7 +268,8 @@ int cache_check(struct cache_detail *detail,
> }
>
> if (rv == -EAGAIN) {
> - if (cache_defer_req(rqstp, h) < 0) {
> + cache_defer_req(rqstp, h);
> + if (!test_bit(CACHE_PENDING, &h->flags)) {
> /* Request is not deferred */
> rv = cache_is_valid(detail, h);
> if (rv == -EAGAIN)
> @@ -527,7 +528,7 @@ static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_he
> hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
> }
>
> -static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
> +static void setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
> {
>
> dreq->item = item;
> @@ -537,13 +538,6 @@ static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *it
> __hash_deferred_req(dreq, item);
>
> spin_unlock(&cache_defer_lock);
> -
> - if (!test_bit(CACHE_PENDING, &item->flags)) {
> - /* must have just been validated... */
> - cache_revisit_request(item);
> - return -EAGAIN;
> - }
> - return 0;
> }
>
> struct thread_deferred_req {
> @@ -558,18 +552,17 @@ static void cache_restart_thread(struct cache_deferred_req *dreq, int too_many)
> complete(&dr->completion);
> }
>
> -static int cache_wait_req(struct cache_req *req, struct cache_head *item, int timeout)
> +static void cache_wait_req(struct cache_req *req, struct cache_head *item, int timeout)
> {
> struct thread_deferred_req sleeper;
> struct cache_deferred_req *dreq = &sleeper.handle;
> - int ret;
>
> sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
> dreq->revisit = cache_restart_thread;
>
> - ret = setup_deferral(dreq, item);
> + setup_deferral(dreq, item);
>
> - if (ret ||
> + if (!test_bit(CACHE_PENDING, &item->flags) ||
> wait_for_completion_interruptible_timeout(
> &sleeper.completion, timeout) <= 0) {
> /* The completion wasn't completed, so we need
> @@ -589,18 +582,6 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
> wait_for_completion(&sleeper.completion);
> }
> }
> - if (test_bit(CACHE_PENDING, &item->flags)) {
> - /* item is still pending, try request
> - * deferral
> - */
> - return -ETIMEDOUT;
> - }
> - /* only return success if we actually deferred the
> - * request. In this case we waited until it was
> - * answered so no deferral has happened - rather
> - * an answer already exists.
> - */
> - return -EEXIST;
> }
>
> static void cache_limit_defers(struct cache_deferred_req *dreq)
> @@ -639,7 +620,7 @@ static void cache_limit_defers(struct cache_deferred_req *dreq)
> discard->revisit(discard, 1);
> }
>
> -static int cache_defer_req(struct cache_req *req, struct cache_head *item)
> +static void cache_defer_req(struct cache_req *req, struct cache_head *item)
> {
> struct cache_deferred_req *dreq;
> int ret;
> @@ -647,17 +628,19 @@ static int cache_defer_req(struct cache_req *req, struct cache_head *item)
>
> timeout = req->set_waiter(req, 1);
> if (timeout) {
> - ret = cache_wait_req(req, item, timeout);
> + cache_wait_req(req, item, timeout);
> req->set_waiter(req, 0);
> - return ret;
> } else {
> dreq = req->defer(req);
> if (dreq == NULL)
> - return -ENOMEM;
> - if (setup_deferral(dreq, item) < 0)
> - return -EAGAIN;
> + return;
> + setup_deferral(dreq, item);
> cache_limit_defers(dreq);
> - return 0;
> + if (!test_bit(CACHE_PENDING, &item->flags))
> + /* Bit could have been cleared before we managed to
> + * set up the deferral, so need to revisit just in case
> + */
> + cache_revisit_request(item);
> }
> }
>
>
next prev parent reply other threads:[~2010-09-23 3:27 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
2010-09-22 2:55 ` [PATCH 2/7] sunrpc/cache: fix recent breakage of cache_clean_deferred NeilBrown
[not found] ` <20100922025506.31745.74964.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22 18:27 ` J. Bruce Fields
2010-09-22 2:55 ` [PATCH 1/7] sunrpc: fix race in new cache_wait code NeilBrown
2010-09-22 17:50 ` J. Bruce Fields
2010-09-23 3:00 ` Neil Brown
2010-09-23 3:25 ` J. Bruce Fields [this message]
2010-09-23 14:46 ` J. Bruce Fields
2010-10-01 23:09 ` J. Bruce Fields
2010-10-02 0:12 ` Neil Brown
2010-09-22 2:55 ` [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist NeilBrown
2010-09-22 2:59 ` J. Bruce Fields
2010-09-22 4:51 ` Neil Brown
2010-09-22 2:55 ` [PATCH 4/7] sunrpc/cache: centralise handling of size limit on deferred list NeilBrown
[not found] ` <20100922025507.31745.61919.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22 18:31 ` J. Bruce Fields
2010-09-23 3:02 ` Neil Brown
2010-09-22 2:55 ` [PATCH 6/7] nfsd: formally deprecate legacy nfsd syscall interface NeilBrown
[not found] ` <20100922025507.31745.57024.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22 3:10 ` J. Bruce Fields
2010-09-22 2:55 ` [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls NeilBrown
2010-09-22 18:36 ` J. Bruce Fields
2010-09-23 3:23 ` Neil Brown
2010-09-22 2:55 ` [PATCH 7/7] nfsd: allow deprecated interface to be compiled out NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20100923032540.GA13554@fieldses.org \
--to=bfields@fieldses.org \
--cc=linux-nfs@vger.kernel.org \
--cc=neilb@suse.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).