* [PATCH 2/7] sunrpc/cache: fix recent breakage of cache_clean_deferred
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
@ 2010-09-22 2:55 ` NeilBrown
[not found] ` <20100922025506.31745.74964.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22 2:55 ` [PATCH 1/7] sunrpc: fix race in new cache_wait code NeilBrown
` (5 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2010-09-22 2:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
commit 6610f720e9e8103c22d1f1ccf8fbb695550a571f
broke cache_clean_deferred as entries are no longer added to the
pending list for subsequent revisiting.
So put those requests back on the pending list.
Signed-off-by: NeilBrown <neilb@suse.de>
---
net/sunrpc/cache.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 62078be..a9e850e 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -678,8 +678,10 @@ void cache_clean_deferred(void *owner)
spin_lock(&cache_defer_lock);
list_for_each_entry_safe(dreq, tmp, &cache_defer_list, recent) {
- if (dreq->owner == owner)
+ if (dreq->owner == owner) {
__unhash_deferred_req(dreq);
+ list_add(&dreq->recent, &pending);
+ }
}
spin_unlock(&cache_defer_lock);
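The hunk above only moves the matching entries onto the local 'pending' list; the rest of
cache_clean_deferred then drains that list outside the lock and fires each request's
->revisit callback, roughly like this (a sketch of the surrounding code, not quoted from
the patch; the '1' argument follows the existing "too many, give up" convention):

    while (!list_empty(&pending)) {
            dreq = list_first_entry(&pending, struct cache_deferred_req, recent);
            list_del_init(&dreq->recent);
            dreq->revisit(dreq, 1);  /* tell the owner to give up on this request */
    }

Without the list_add() restored here, the drain loop finds nothing and the owner's
deferred requests are never revisited.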
* [PATCH 1/7] sunrpc: fix race in new cache_wait code.
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
2010-09-22 2:55 ` [PATCH 2/7] sunrpc/cache: fix recent breakage of cache_clean_deferred NeilBrown
@ 2010-09-22 2:55 ` NeilBrown
2010-09-22 17:50 ` J. Bruce Fields
2010-09-22 2:55 ` [PATCH 6/7] nfsd: formally deprecate legacy nfsd syscall interface NeilBrown
` (4 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2010-09-22 2:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
If we set up to wait for a cache item to be filled in, and then find
that it is no longer pending, it could be that some other thread is
in 'cache_revisit_request' and has moved our request to its 'pending' list.
So when our setup_deferral calls cache_revisit_request it will find nothing to
put on the pending list, and do nothing.
We then return from cache_wait_req, thus leaving the 'sleeper'
on-stack structure open to being corrupted by subsequent stack usage.
However that 'sleeper' could still be on the 'pending' list that the
other thread is looking at and so any corruption could cause it to behave badly.
To avoid this race we simply take the same path as if the
'wait_for_completion_interruptible_timeout' had been interrupted: if the
sleeper is no longer on the list (which it won't be), we wait on the
completion, which ensures that any other cache_revisit_request
has let go of the sleeper.
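In code terms, the change means we always fall into the existing cleanup path whenever the
completion was not completed. That path (sketched here in simplified form, with field and
lock names taken from the existing code but not quoted verbatim from the patch) either
unhashes the on-stack sleeper itself, or waits for whoever already took it:

    spin_lock(&cache_defer_lock);
    if (!list_empty(&sleeper.handle.hash)) {
            /* still hashed: nobody else can reach it once we unhash it */
            __unhash_deferred_req(&sleeper.handle);
            spin_unlock(&cache_defer_lock);
    } else {
            /* another thread has taken it off the list; it will call
             * complete(), so wait until it has let go of the on-stack
             * structure before we return and reuse that stack. */
            spin_unlock(&cache_defer_lock);
            wait_for_completion(&sleeper.completion);
    }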
Signed-off-by: NeilBrown <neilb@suse.de>
---
net/sunrpc/cache.c | 5 ++---
1 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index ca7c621..62078be 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -579,10 +579,9 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item)
dreq->revisit = cache_restart_thread;
ret = setup_deferral(dreq, item);
- if (ret)
- return ret;
- if (wait_for_completion_interruptible_timeout(
+ if (ret ||
+ wait_for_completion_interruptible_timeout(
&sleeper.completion, req->thread_wait) <= 0) {
/* The completion wasn't completed, so we need
* to clean up
* Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
2010-09-22 2:55 ` [PATCH 1/7] sunrpc: fix race in new cache_wait code NeilBrown
@ 2010-09-22 17:50 ` J. Bruce Fields
2010-09-23 3:00 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2010-09-22 17:50 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-nfs
On Wed, Sep 22, 2010 at 12:55:06PM +1000, NeilBrown wrote:
> If we set up to wait for a cache item to be filled in, and then find
> that it is no longer pending, it could be that some other thread is
> in 'cache_revisit_request' and has moved our request to its 'pending' list.
> So when our setup_deferral calls cache_revisit_request it will find nothing to
> put on the pending list, and do nothing.
>
> We then return from cache_wait_req, thus leaving the 'sleeper'
> on-stack structure open to being corrupted by subsequent stack usage.
>
> However that 'sleeper' could still be on the 'pending' list that the
> other thread is looking at and so any corruption could cause it to behave badly.
>
> To avoid this race we simply take the same path as if the
> 'wait_for_completion_interruptible_timeout' was interrupted and if the
> sleeper is no longer on the list (which it won't be) we wait on the
> completion - which will ensure that any other cache_revisit_request
> will have let go of the sleeper.
OK, but I don't think we need that first CACHE_PENDING check in
setup_deferral at all. Best just to ignore it?:
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index d789dfc..82804b4 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -567,10 +567,9 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
dreq->revisit = cache_restart_thread;
- ret = setup_deferral(dreq, item);
+ setup_deferral(dreq, item);
- if (ret ||
- wait_for_completion_interruptible_timeout(
+ if (wait_for_completion_interruptible_timeout(
&sleeper.completion, timeout) <= 0) {
/* The completion wasn't completed, so we need
* to clean up
Then it's obvious that we always wait for completion, and that we fill
the basic requirement to avoid corruption here.
(Which is, in more detail: cache_wait_req must not return while dreq is
still reachable from anywhere else. Since dreq is reachable from
elsewhere only as long as it is hashed, and since anyone else that might
unhash it will call our revisit (which unconditionally calls complete),
then forget about it, it suffices for cache_wait_req to either wait for
completion, or unhash dreq *itself* (before someone else does).)
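For reference, the revisit callback relied on here is the one registered just above the
hunk, and (as quoted later in this thread) it does nothing but complete the waiter:

    static void cache_restart_thread(struct cache_deferred_req *dreq, int too_many)
    {
            struct thread_deferred_req *dr =
                    container_of(dreq, struct thread_deferred_req, handle);

            complete(&dr->completion);
    }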
Also we don't need the PENDING check to ensure we wait only when
necessary--we only wait while dreq is hashed, and as long as we're
hashed anyone clearing PENDING will also end up doing the complete().
In the deferral case it's maybe a useful optimization if it avoids
an unnecessary drop sometimes. Here it doesn't help.
--b.
* Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
2010-09-22 17:50 ` J. Bruce Fields
@ 2010-09-23 3:00 ` Neil Brown
2010-09-23 3:25 ` J. Bruce Fields
0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2010-09-23 3:00 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
On Wed, 22 Sep 2010 13:50:27 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Wed, Sep 22, 2010 at 12:55:06PM +1000, NeilBrown wrote:
> > If we set up to wait for a cache item to be filled in, and then find
> > that it is no longer pending, it could be that some other thread is
> > in 'cache_revisit_request' and has moved our request to its 'pending' list.
> > So when our setup_deferral calls cache_revisit_request it will find nothing to
> > put on the pending list, and do nothing.
> >
> > We then return from cache_wait_req, thus leaving the 'sleeper'
> > on-stack structure open to being corrupted by subsequent stack usage.
> >
> > However that 'sleeper' could still be on the 'pending' list that the
> > other thread is looking at and so any corruption could cause it to behave badly.
> >
> > To avoid this race we simply take the same path as if the
> > 'wait_for_completion_interruptible_timeout' was interrupted and if the
> > sleeper is no longer on the list (which it won't be) we wait on the
> > completion - which will ensure that any other cache_revisit_request
> > will have let go of the sleeper.
>
> OK, but I don't think we need that first CACHE_PENDING check in
> setup_deferral at all. Best just to ignore it?:
>
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index d789dfc..82804b4 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -567,10 +567,9 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
> sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
> dreq->revisit = cache_restart_thread;
>
> - ret = setup_deferral(dreq, item);
> + setup_deferral(dreq, item);
>
> - if (ret ||
> - wait_for_completion_interruptible_timeout(
> + if (wait_for_completion_interruptible_timeout(
> &sleeper.completion, timeout) <= 0) {
> /* The completion wasn't completed, so we need
> * to clean up
>
> Then it's obvious that we always wait for completion, and that we fill
> the basic requirement to avoid corruption here.
>
> (Which is, in more detail: cache_wait_req must not return while dreq is
> still reachable from anywhere else. Since dreq is reachable from
> elsewhere only as long as it is hashed, and since anyone else that might
> unhash it will call our revisit (which unconditionally calls complete),
> then forget about it, it suffices for cache_wait_req to either wait for
> completion, or unhash dreq *itself* (before someone else does).)
>
> Also we don't need the PENDING check to ensure we wait only when
> necessary--we only wait while dreq is hashed, and as long as we're
> hashed anyone clearing PENDING will also end up doing the complete().
>
> In the deferral case it's maybe a useful optimization if it avoids
> an unnecessary drop sometimes. Here it doesn't help.
>
> --b.
I like your thinking and I agree with most of it.
After adding dreq to the hash table, we still need to check CACHE_PENDING. It is
possible that it was cleared moments before we added dreq to the hash table, so
the cache_revisit_request associated with clearing it might have missed our
dreq; we need to do something about that....
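Spelling the window out (a rough interleaving, not actual trace output):

    deferring thread                      thread answering the upcall
    ----------------                      ----------------------------
                                          clear_bit(CACHE_PENDING, &item->flags)
                                          cache_revisit_request(item)
                                            -> hash chain still empty, nothing moved
    setup_deferral(dreq, item)
      -> dreq is now hashed, but nobody
         will ever call its ->revisit

Hence the explicit test_bit(CACHE_PENDING) / cache_revisit_request() after
setup_deferral() in the patch below.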
How about this.
It gets rid of the return values (which were confusing anyway) and adds
explicit checks on CACHE_PENDING where needed.
??
Also, I noticed there is a race with the call to cache_limit_defers. The
'dreq' could be freed before that is called.
It looks like I need to resubmit a lot of this - do you want to just discard
it all from your -next and I'll make something new?
Thanks for the review!
NeilBrown
cache.c | 53 +++++++++++++++++------------------------------------
1 file changed, 17 insertions(+), 36 deletions(-)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index c60c7f6..db0fa44 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -37,7 +37,7 @@
#define RPCDBG_FACILITY RPCDBG_CACHE
-static int cache_defer_req(struct cache_req *req, struct cache_head *item);
+static void cache_defer_req(struct cache_req *req, struct cache_head *item);
static void cache_revisit_request(struct cache_head *item);
static void cache_init(struct cache_head *h)
@@ -268,7 +268,8 @@ int cache_check(struct cache_detail *detail,
}
if (rv == -EAGAIN) {
- if (cache_defer_req(rqstp, h) < 0) {
+ cache_defer_req(rqstp, h);
+ if (!test_bit(CACHE_PENDING, &h->flags)) {
/* Request is not deferred */
rv = cache_is_valid(detail, h);
if (rv == -EAGAIN)
@@ -527,7 +528,7 @@ static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_he
hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
}
-static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
+static void setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
{
dreq->item = item;
@@ -537,13 +538,6 @@ static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *it
__hash_deferred_req(dreq, item);
spin_unlock(&cache_defer_lock);
-
- if (!test_bit(CACHE_PENDING, &item->flags)) {
- /* must have just been validated... */
- cache_revisit_request(item);
- return -EAGAIN;
- }
- return 0;
}
struct thread_deferred_req {
@@ -558,18 +552,17 @@ static void cache_restart_thread(struct cache_deferred_req *dreq, int too_many)
complete(&dr->completion);
}
-static int cache_wait_req(struct cache_req *req, struct cache_head *item, int timeout)
+static void cache_wait_req(struct cache_req *req, struct cache_head *item, int timeout)
{
struct thread_deferred_req sleeper;
struct cache_deferred_req *dreq = &sleeper.handle;
- int ret;
sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
dreq->revisit = cache_restart_thread;
- ret = setup_deferral(dreq, item);
+ setup_deferral(dreq, item);
- if (ret ||
+ if (!test_bit(CACHE_PENDING, &item->flags) ||
wait_for_completion_interruptible_timeout(
&sleeper.completion, timeout) <= 0) {
/* The completion wasn't completed, so we need
@@ -589,18 +582,6 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
wait_for_completion(&sleeper.completion);
}
}
- if (test_bit(CACHE_PENDING, &item->flags)) {
- /* item is still pending, try request
- * deferral
- */
- return -ETIMEDOUT;
- }
- /* only return success if we actually deferred the
- * request. In this case we waited until it was
- * answered so no deferral has happened - rather
- * an answer already exists.
- */
- return -EEXIST;
}
static void cache_limit_defers(struct cache_deferred_req *dreq)
@@ -639,7 +620,7 @@ static void cache_limit_defers(struct cache_deferred_req *dreq)
discard->revisit(discard, 1);
}
-static int cache_defer_req(struct cache_req *req, struct cache_head *item)
+static void cache_defer_req(struct cache_req *req, struct cache_head *item)
{
struct cache_deferred_req *dreq;
int ret;
@@ -647,17 +628,19 @@ static int cache_defer_req(struct cache_req *req, struct cache_head *item)
timeout = req->set_waiter(req, 1);
if (timeout) {
- ret = cache_wait_req(req, item, timeout);
+ cache_wait_req(req, item, timeout);
req->set_waiter(req, 0);
- return ret;
} else {
dreq = req->defer(req);
if (dreq == NULL)
- return -ENOMEM;
- if (setup_deferral(dreq, item) < 0)
- return -EAGAIN;
+ return;
+ setup_deferral(dreq, item);
cache_limit_defers(dreq);
- return 0;
+ if (!test_bit(CACHE_PENDING, &item->flags))
+ /* Bit could have been cleared before we managed to
+ * set up the deferral, so need to revisit just in case
+ */
+ cache_revisit_request(item);
}
}
* Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
2010-09-23 3:00 ` Neil Brown
@ 2010-09-23 3:25 ` J. Bruce Fields
2010-09-23 14:46 ` J. Bruce Fields
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2010-09-23 3:25 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Thu, Sep 23, 2010 at 01:00:02PM +1000, Neil Brown wrote:
> On Wed, 22 Sep 2010 13:50:27 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>
> > On Wed, Sep 22, 2010 at 12:55:06PM +1000, NeilBrown wrote:
> > > If we set up to wait for a cache item to be filled in, and then find
> > > that it is no longer pending, it could be that some other thread is
> > > in 'cache_revisit_request' and has moved our request to its 'pending' list.
> > > So when our setup_deferral calls cache_revisit_request it will find nothing to
> > > put on the pending list, and do nothing.
> > >
> > > We then return from cache_wait_req, thus leaving the 'sleeper'
> > > on-stack structure open to being corrupted by subsequent stack usage.
> > >
> > > However that 'sleeper' could still be on the 'pending' list that the
> > > other thread is looking at and so any corruption could cause it to behave badly.
> > >
> > > To avoid this race we simply take the same path as if the
> > > 'wait_for_completion_interruptible_timeout' was interrupted and if the
> > > sleeper is no longer on the list (which it won't be) we wait on the
> > > completion - which will ensure that any other cache_revisit_request
> > > will have let go of the sleeper.
> >
> > OK, but I don't think we need that first CACHE_PENDING check in
> > setup_deferral at all. Best just to ignore it?:
> >
> > diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> > index d789dfc..82804b4 100644
> > --- a/net/sunrpc/cache.c
> > +++ b/net/sunrpc/cache.c
> > @@ -567,10 +567,9 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item, int ti
> > sleeper.completion = COMPLETION_INITIALIZER_ONSTACK(sleeper.completion);
> > dreq->revisit = cache_restart_thread;
> >
> > - ret = setup_deferral(dreq, item);
> > + setup_deferral(dreq, item);
> >
> > - if (ret ||
> > - wait_for_completion_interruptible_timeout(
> > + if (wait_for_completion_interruptible_timeout(
> > &sleeper.completion, timeout) <= 0) {
> > /* The completion wasn't completed, so we need
> > * to clean up
> >
> > Then it's obvious that we always wait for completion, and that we fill
> > the basic requirement to avoid corruption here.
> >
> > (Which is, in more detail: cache_wait_req must not return while dreq is
> > still reachable from anywhere else. Since dreq is reachable from
> > elsewhere only as long as it is hashed, and since anyone else that might
> > unhash it will call our revisit (which unconditionally calls complete),
> > then forget about it, it suffices for cache_wait_req to either wait for
> > completion, or unhash dreq *itself* (before someone else does).)
> >
> > Also we don't need the PENDING check to ensure we wait only when
> > necessary--we only wait while dreq is hashed, and as long as we're
> > hashed anyone clearing PENDING will also end up doing the complete().
> >
> > In the deferral case it's maybe a useful optimization if it avoids
> > an unnecessary drop sometimes. Here it doesn't help.
> >
> > --b.
>
> I like your thinking and I agree with most of it.
> After adding dreq to the hash table, we need to check CACHE_PENDING. It is
> possible that it was cleared moments before we added dreq to the hash-table so
> the cache_revisit_request associated with clearing it might have missed our
> dreq, so we need to do something about that....
Arrgh, right--I wasn't thinking straight.
> How about this.
> It gets rid of the return values (which were confusing anyway) and adds
> explicit checks on CACHE_PENDING where needed.
> ??
Thanks, I'll take a look in the morning when my head's (hopefully)
clearer.
>
> Also, I noticed there is a race with the call to cache_limit_defers. The
> 'dreq' could be freed before that is called.
>
> It looks like I need to resubmit a lot of this - do you want to just discard
> it all from your -next and I'll make something new?
I'm trying very hard not to rewind -next; so I'd prefer incremental
patches for anything already there, replacements for the rest.
--b.
* Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
2010-09-23 3:25 ` J. Bruce Fields
@ 2010-09-23 14:46 ` J. Bruce Fields
2010-10-01 23:09 ` J. Bruce Fields
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2010-09-23 14:46 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Wed, Sep 22, 2010 at 11:25:40PM -0400, J. Bruce Fields wrote:
> On Thu, Sep 23, 2010 at 01:00:02PM +1000, Neil Brown wrote:
> > How about this.
> > It gets rid of the return values (which were confusing anyway) and adds
> > explicit checks on CACHE_PENDING where needed.
> > ??
>
> Thanks, I'll take a look in the morning when my head's (hopefully)
> clearer.
Looks reasonable to me on a quick skim.
> > Also, I noticed there is a race with the call to cache_limit_defers. The
> > 'dreq' could be freed before that is called.
> >
> > It looks like I need to resubmit a lot of this - do you want to just discard
> > it all from your -next and I'll make something new?
>
> I'm trying very hard not to rewind -next; so I'd prefer incremental
> patches for anything already there, replacements for the rest.
But I'll wait for a new series. Thanks!
--b.
* Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
2010-09-23 14:46 ` J. Bruce Fields
@ 2010-10-01 23:09 ` J. Bruce Fields
2010-10-02 0:12 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2010-10-01 23:09 UTC (permalink / raw)
To: Neil Brown; +Cc: linux-nfs
On Thu, Sep 23, 2010 at 10:46:47AM -0400, J. Bruce Fields wrote:
> On Wed, Sep 22, 2010 at 11:25:40PM -0400, J. Bruce Fields wrote:
> > On Thu, Sep 23, 2010 at 01:00:02PM +1000, Neil Brown wrote:
> > > How about this.
> > > It gets rid of the return values (which were confusing anyway) and adds
> > > explicit checks on CACHE_PENDING where needed.
> > > ??
> >
> > Thanks, I'll take a look in the morning when my head's (hopefully)
> > clearer.
>
> Looks reasonable to me on a quick skim.
>
> > > Also, I noticed there is a race with the call to cache_limit_defers. The
> > > 'dreq' could be freed before that is called.
> > >
> > > It looks like I need to resubmit a lot of this - do you want to just discard
> > > it all from your -next and I'll make something new?
> >
> > I'm trying very hard not to rewind -next; so I'd prefer incremental
> > patches for anything already there, replacements for the rest.
>
> But I'll wait for a new series. Thanks!
Well, just to have the bug fixed, I've applied your simple original fix
(1/7), but would still happily take incremental cleanup.
Out of this series that just leaves
[PATCH 4/7] sunrpc/cache: centralise handling of size limit on
deferred list.
[PATCH 5/7] sunrpc/cache: allow thread manager more control of
whether threads can wait for upcalls
(And with no particular objection to either--they just seemed more RFC's
than "please apply"'s.)
--b.
* Re: [PATCH 1/7] sunrpc: fix race in new cache_wait code.
2010-10-01 23:09 ` J. Bruce Fields
@ 2010-10-02 0:12 ` Neil Brown
0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2010-10-02 0:12 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
On Fri, 1 Oct 2010 19:09:41 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Thu, Sep 23, 2010 at 10:46:47AM -0400, J. Bruce Fields wrote:
> > On Wed, Sep 22, 2010 at 11:25:40PM -0400, J. Bruce Fields wrote:
> > > On Thu, Sep 23, 2010 at 01:00:02PM +1000, Neil Brown wrote:
> > > > How about this.
> > > > It gets rid of the return values (which were confusing anyway) and adds
> > > > explicit checks on CACHE_PENDING where needed.
> > > > ??
> > >
> > > Thanks, I'll take a look in the morning when my head's (hopefully)
> > > clearer.
> >
> > Looks reasonable to me on a quick skim.
> >
> > > > Also, I noticed there is a race with the call to cache_limit_defers. The
> > > > 'dreq' could be freed before that is called.
> > > >
> > > > It looks like I need to resubmit a lot of this - do you want to just discard
> > > > it all from your -next and I'll make something new?
> > >
> > > I'm trying very hard not to rewind -next; so I'd prefer incremental
> > > patches for anything already there, replacements for the rest.
> >
> > But I'll wait for a new series. Thanks!
>
> Well, just to have the bug fixed, I've applied your simple original fix
> (1/7), but would still happily take incremental cleanup.
Thanks. I've been on leave this past week, but have now flagged this email
for attention when I get back into things next week.
>
> Out of this series that just leaves
>
> [PATCH 4/7] sunrpc/cache: centralise handling of size limit on
> deferred list.
>
> [PATCH 5/7] sunrpc/cache: allow thread manager more control of
> whether threads can wait for upcalls
>
> (And with no particular objection to either--they just seemed more RFC's
> than "please apply"'s.)
.... Yes..... I think I like them, but until I have some idea of what if
anything could/should be done to the usage of request deferral in lockd, it
is hard know whether it is really worth the churn. So let's leave them for
now.
Thanks,
NeilBrown
* [PATCH 6/7] nfsd: formally deprecate legacy nfsd syscall interface
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
2010-09-22 2:55 ` [PATCH 2/7] sunrpc/cache: fix recent breakage of cache_clean_deferred NeilBrown
2010-09-22 2:55 ` [PATCH 1/7] sunrpc: fix race in new cache_wait code NeilBrown
@ 2010-09-22 2:55 ` NeilBrown
[not found] ` <20100922025507.31745.57024.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
2010-09-22 2:55 ` [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls NeilBrown
` (3 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2010-09-22 2:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
The syscall interface has been replaced by a more flexible
interface since 2.6.0. It is time to work towards discarding
the old interface.
So add an entry to feature-removal-schedule.txt and print a warning
when the interface is used.
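The warning is printed only once, the first time one of the '.' transaction files is
read, and would look something like this (the process name here, rpc.mountd, is only an
example):

    Warning: "rpc.mountd" uses deprecated NFSD interface: .getfd. This will be removed in 2.6.40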
Signed-off-by: NeilBrown <neilb@suse.de>
---
Documentation/feature-removal-schedule.txt | 10 ++++++++++
fs/nfsd/nfsctl.c | 10 ++++++++++
2 files changed, 20 insertions(+), 0 deletions(-)
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 842aa9d..076a2c0 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -564,3 +564,13 @@ Who: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
----------------------------
+What: access to nfsd auth cache through sys_nfsservctl or '.' files
+ in the 'nfsd' filesystem.
+When: 2.6.40
+Why: This is a legacy interface which has been replaced by a more
+ dynamic cache. Continuing to maintain this interface is an
+ unnecessary burden.
+Who: NeilBrown <neilb@suse.de>
+
+----------------------------
+
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index b53b1d0..7f0fc88 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -121,6 +121,16 @@ static ssize_t nfsctl_transaction_write(struct file *file, const char __user *bu
static ssize_t nfsctl_transaction_read(struct file *file, char __user *buf, size_t size, loff_t *pos)
{
+ static int warned;
+ if (file->f_dentry->d_name.name[0] == '.' && !warned) {
+ char name[sizeof(current->comm)];
+ printk(KERN_INFO
+ "Warning: \"%s\" uses deprecated NFSD interface: %s."
+ " This will be removed in 2.6.40\n",
+ get_task_comm(name, current),
+ file->f_dentry->d_name.name);
+ warned = 1;
+ }
if (! file->private_data) {
/* An attempt to read a transaction file without writing
* causes a 0-byte write so that the file can return
* [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
` (2 preceding siblings ...)
2010-09-22 2:55 ` [PATCH 6/7] nfsd: formally deprecate legacy nfsd syscall interface NeilBrown
@ 2010-09-22 2:55 ` NeilBrown
2010-09-22 18:36 ` J. Bruce Fields
2010-09-22 2:55 ` [PATCH 7/7] nfsd: allow deprecated interface to be compiled out NeilBrown
` (2 subsequent siblings)
6 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2010-09-22 2:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
Rather than blindly setting a timeout based on a coarse idea of
busy-ness, allow the 'cache' to call into the 'rqstp' manager to
request permission to wait for an upcall, and to be told how long to wait.
This allows the thread manager to know how many threads are waiting
and reduce the permitted timeout when there are a large number.
The same code can check if waiting makes any sense (which it doesn't
on single-threaded services) or if deferral is allowed (which it isn't
e.g. for NFSv4).
The current heuristic is to allow a long wait (30 sec) if fewer than
1/2 the threads are waiting, and to allow no wait at all if there are
more than 1/2 already waiting.
This provides better guarantees that slow responses to upcalls will
not block too many threads for too long.
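For example, reading the svc_set_waiter heuristic below in an 8-thread pool: while at
most half the threads are already waiting, a new waiter is granted the full 30 second
timeout; beyond that a caller gets only a token 1-jiffy timeout (so it neither blocks for
long nor falls back to deferral), and a single-threaded service that is allowed to defer
never waits at all.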
Signed-off-by: NeilBrown <neilb@suse.de>
---
include/linux/sunrpc/cache.h | 8 ++++--
include/linux/sunrpc/svc.h | 1 +
net/sunrpc/cache.c | 30 ++++++++++++-----------
net/sunrpc/svc_xprt.c | 54 +++++++++++++++++++++++++++++++++---------
4 files changed, 64 insertions(+), 29 deletions(-)
diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 0349635..e2f5e56 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -125,9 +125,11 @@ struct cache_detail {
*/
struct cache_req {
struct cache_deferred_req *(*defer)(struct cache_req *req);
- int thread_wait; /* How long (jiffies) we can block the
- * current thread to wait for updates.
- */
+ /* See how long (jiffies) we can block the
+ * current thread to wait for updates, and register
+ * (or unregister) that we are waiting.
+ */
+ int (*set_waiter)(struct cache_req *req, int set);
};
/* this must be embedded in a deferred_request that is being
* delayed awaiting cache-fill
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 5a3085b..060029a 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -48,6 +48,7 @@ struct svc_pool {
struct list_head sp_threads; /* idle server threads */
struct list_head sp_sockets; /* pending sockets */
unsigned int sp_nrthreads; /* # of threads in pool */
+ unsigned int sp_nrwaiting; /* # of threads waiting on callbacks */
struct list_head sp_all_threads; /* all server threads */
struct svc_pool_stats sp_stats; /* statistics on pool operation */
} ____cacheline_aligned_in_smp;
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 1963e2a..b14d0ec 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -558,7 +558,7 @@ static void cache_restart_thread(struct cache_deferred_req *dreq, int too_many)
complete(&dr->completion);
}
-static int cache_wait_req(struct cache_req *req, struct cache_head *item)
+static int cache_wait_req(struct cache_req *req, struct cache_head *item, int timeout)
{
struct thread_deferred_req sleeper;
struct cache_deferred_req *dreq = &sleeper.handle;
@@ -571,7 +571,7 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item)
if (ret ||
wait_for_completion_interruptible_timeout(
- &sleeper.completion, req->thread_wait) <= 0) {
+ &sleeper.completion, timeout) <= 0) {
/* The completion wasn't completed, so we need
* to clean up
*/
@@ -643,20 +643,22 @@ static int cache_defer_req(struct cache_req *req, struct cache_head *item)
{
struct cache_deferred_req *dreq;
int ret;
+ int timeout;
- if (req->thread_wait) {
- ret = cache_wait_req(req, item);
- if (ret != -ETIMEDOUT)
- return ret;
+ timeout = req->set_waiter(req, 1);
+ if (timeout) {
+ ret = cache_wait_req(req, item, timeout);
+ req->set_waiter(req, 0);
+ return ret;
+ } else {
+ dreq = req->defer(req);
+ if (dreq == NULL)
+ return -ENOMEM;
+ if (setup_deferral(dreq, item) < 0)
+ return -EAGAIN;
+ cache_limit_defers(dreq);
+ return 0;
}
- dreq = req->defer(req);
- if (dreq == NULL)
- return -ENOMEM;
- if (setup_deferral(dreq, item) < 0)
- return -EAGAIN;
-
- cache_limit_defers(dreq);
- return 0;
}
static void cache_revisit_request(struct cache_head *item)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 95fc3e8..4d4032c 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -20,6 +20,7 @@
static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt);
static int svc_deferred_recv(struct svc_rqst *rqstp);
static struct cache_deferred_req *svc_defer(struct cache_req *req);
+static int svc_set_waiter(struct cache_req *req, int add);
static void svc_age_temp_xprts(unsigned long closure);
/* apparently the "standard" is that clients close
@@ -651,11 +652,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
if (signalled() || kthread_should_stop())
return -EINTR;
- /* Normally we will wait up to 5 seconds for any required
- * cache information to be provided.
- */
- rqstp->rq_chandle.thread_wait = 5*HZ;
-
spin_lock_bh(&pool->sp_lock);
xprt = svc_xprt_dequeue(pool);
if (xprt) {
@@ -663,12 +659,6 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
svc_xprt_get(xprt);
rqstp->rq_reserved = serv->sv_max_mesg;
atomic_add(rqstp->rq_reserved, &xprt->xpt_reserved);
-
- /* As there is a shortage of threads and this request
- * had to be queued, don't allow the thread to wait so
- * long for cache updates.
- */
- rqstp->rq_chandle.thread_wait = 1*HZ;
} else {
/* No data pending. Go to sleep */
svc_thread_enqueue(pool, rqstp);
@@ -772,6 +762,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
rqstp->rq_secure = svc_port_is_privileged(svc_addr(rqstp));
rqstp->rq_chandle.defer = svc_defer;
+ rqstp->rq_chandle.set_waiter = svc_set_waiter;
if (serv->sv_stats)
serv->sv_stats->netcnt++;
@@ -987,7 +978,7 @@ static struct cache_deferred_req *svc_defer(struct cache_req *req)
struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
struct svc_deferred_req *dr;
- if (rqstp->rq_arg.page_len || !rqstp->rq_usedeferral)
+ if (rqstp->rq_arg.page_len)
return NULL; /* if more than a page, give up FIXME */
if (rqstp->rq_deferred) {
dr = rqstp->rq_deferred;
@@ -1065,6 +1056,45 @@ static struct svc_deferred_req *svc_deferred_dequeue(struct svc_xprt *xprt)
return dr;
}
+/*
+ * If 'add', the cache wants to wait for user-space to respond to a request.
+ * We return the number of jiffies to wait. 0 means not to wait at all but
+ * to try deferral instead.
+ * If not 'add', then the wait is over and we can discount this one.
+ */
+static int svc_set_waiter(struct cache_req *req, int add)
+{
+ struct svc_rqst *rqstp = container_of(req, struct svc_rqst, rq_chandle);
+ struct svc_pool *pool = rqstp->rq_pool;
+ int rv = 0;
+
+ spin_lock_bh(&pool->sp_lock);
+ if (add) {
+ if (pool->sp_nrthreads <= 1 && rqstp->rq_usedeferral)
+ /* single threaded - never wait */
+ rv = 0;
+ else if (pool->sp_nrwaiting * 2 > pool->sp_nrthreads)
+ /* > 1/2 are waiting already, something looks wrong,
+ * Don't defer or wait */
+ rv = 1;
+ else
+ /* Wait a reasonably long time */
+ rv = 30 * HZ;
+
+ if (!rqstp->rq_usedeferral && rv == 0)
+ /* just double-checking - we mustn't allow deferral */
+ rv = 1;
+
+ if (rv)
+ pool->sp_nrwaiting++;
+ } else {
+ BUG_ON(pool->sp_nrwaiting == 0);
+ pool->sp_nrwaiting--;
+ }
+ spin_unlock_bh(&pool->sp_lock);
+ return rv;
+}
+
/**
* svc_find_xprt - find an RPC transport instance
* @serv: pointer to svc_serv to search
* Re: [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls
2010-09-22 2:55 ` [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls NeilBrown
@ 2010-09-22 18:36 ` J. Bruce Fields
2010-09-23 3:23 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2010-09-22 18:36 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-nfs
On Wed, Sep 22, 2010 at 12:55:07PM +1000, NeilBrown wrote:
> Rather than blindly setting a timeout based on a coarse idea of
> busy-ness, allow the 'cache' to call into the 'rqstp' manager to
> request permission to wait for an upcall, and how long to wait for.
>
> This allows the thread manager to know how many threads are waiting
> and reduce the permitted timeout when there are a large number.
>
> The same code can check if waiting makes any sense (which it doesn't
> on single-threaded services) or if deferral is allowed (which it isn't
> e.g. for NFSv4).
>
> The current heuristic is to allow a long wait (30 sec) if fewer than
> 1/2 the threads are waiting, and to allow no wait at all if there are
> more than 1/2 already waiting.
>
> This provides better guarantees that slow responses to upcalls will
> not block too many threads for too long.
I suppose you're probably looking for comments on the idea rather than
the particular choice of heuristic, but: wasn't one of the motivations
that a cache flush in the midst of heavy write traffic could cause long
delays due to writes being dropped?
That seems like a case where most threads may be handling rpc's, but
waiting is still preferable to dropping.
--b.
* Re: [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls
2010-09-22 18:36 ` J. Bruce Fields
@ 2010-09-23 3:23 ` Neil Brown
0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2010-09-23 3:23 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
On Wed, 22 Sep 2010 14:36:22 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Wed, Sep 22, 2010 at 12:55:07PM +1000, NeilBrown wrote:
> > Rather than blindly setting a timeout based on a coarse idea of
> > busy-ness, allow the 'cache' to call into the 'rqstp' manager to
> > request permission to wait for an upcall, and how long to wait for.
> >
> > This allows the thread manager to know how many threads are waiting
> > and reduce the permitted timeout when there are a large number.
> >
> > The same code can check if waiting makes any sense (which it doesn't
> > on single-threaded services) or if deferral is allowed (which it isn't
> > e.g. for NFSv4).
> >
> > The current heuristic is to allow a long wait (30 sec) if fewer than
> > 1/2 the threads are waiting, and to allow no wait at all if there are
> > more than 1/2 already waiting.
> >
> > This provides better guarantees that slow responses to upcalls will
> > not block too many threads for too long.
>
> I suppose you're probably looking for comments on the idea rather than
> the particular choice of heuristic, but: wasn't one of the motivations
> that a cache flush in the midst of heavy write traffic could cause long
> delays due to writes being dropped?
>
> That seems like a case where most threads may handling rpc's, but
> waiting is still preferable to dropping.
>
All comments are welcome - thanks.
Yes, the heuristic needs careful thought.
Currently a high level of traffic after a cache flush (and with slow mountd)
will cause the first 300 requests to get deferred and subsequent requests to
either be dropped or to cause old requests to be dropped.
With the proposed code, the '300' becomes 'half the number of nfsd threads'.
This is smaller by default (bad) but configurable (good).
I think making it configurable is much more important than having it large by
default. But maybe 3/4 would be better than 1/2.
I keep thinking that it would be really nice to have a dynamic thread pool,
but I'm not sure that would do more than just move the problem...
Another approach we could take (which only really works for tcp) is to count
the number of waiting requests against each connection (rqstp) and if a
single connection has "more than it's fair share" we stop accepting requests
on that connection, while still allowing new requests on connections where
there are fewer waiting requests.... sounds a bit complex though.
There is probably a really nice solution and I just cannot see it (maybe the
solution is just to ignore the 'problem' and allow any request to wait as
long as it likes).
Thanks,
NeilBrown
* [PATCH 7/7] nfsd: allow deprecated interface to be compiled out.
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
` (3 preceding siblings ...)
2010-09-22 2:55 ` [PATCH 5/7] sunrpc/cache: allow thread manager more control of whether threads can wait for upcalls NeilBrown
@ 2010-09-22 2:55 ` NeilBrown
2010-09-22 2:55 ` [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist NeilBrown
2010-09-22 2:55 ` [PATCH 4/7] sunrpc/cache: centralise handling of size limit on deferred list NeilBrown
6 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2010-09-22 2:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
Add CONFIG_NFSD_DEPRECATED, default to y.
Only include the deprecated interface if this is defined.
This allows distros to remove this interface before the official
removal, and allows developers to test without it.
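For instance, a developer wanting to test the removal ahead of time simply builds with
the new option disabled, i.e. with the kernel .config containing:

    # CONFIG_NFSD_DEPRECATED is not set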
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/Makefile | 5 +----
fs/compat.c | 2 +-
fs/nfsd/Kconfig | 12 ++++++++++++
fs/nfsd/export.c | 22 +++++++++++++++++++---
fs/nfsd/nfsctl.c | 10 ++++++++++
5 files changed, 43 insertions(+), 8 deletions(-)
diff --git a/fs/Makefile b/fs/Makefile
index e6ec1d3..26956fc 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -29,10 +29,7 @@ obj-$(CONFIG_EVENTFD) += eventfd.o
obj-$(CONFIG_AIO) += aio.o
obj-$(CONFIG_FILE_LOCKING) += locks.o
obj-$(CONFIG_COMPAT) += compat.o compat_ioctl.o
-
-nfsd-$(CONFIG_NFSD) := nfsctl.o
-obj-y += $(nfsd-y) $(nfsd-m)
-
+obj-$(CONFIG_NFSD_DEPRECATED) += nfsctl.o
obj-$(CONFIG_BINFMT_AOUT) += binfmt_aout.o
obj-$(CONFIG_BINFMT_EM86) += binfmt_em86.o
obj-$(CONFIG_BINFMT_MISC) += binfmt_misc.o
diff --git a/fs/compat.c b/fs/compat.c
index 718c706..df5e671 100644
--- a/fs/compat.c
+++ b/fs/compat.c
@@ -1963,7 +1963,7 @@ asmlinkage long compat_sys_ppoll(struct pollfd __user *ufds,
}
#endif /* HAVE_SET_RESTORE_SIGMASK */
-#if defined(CONFIG_NFSD) || defined(CONFIG_NFSD_MODULE)
+#if (defined(CONFIG_NFSD) || defined(CONFIG_NFSD_MODULE)) && !defined(CONFIG_NFSD_DEPRECATED)
/* Stuff for NFS server syscalls... */
struct compat_nfsctl_svc {
u16 svc32_port;
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index 4264377..18b3e89 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -28,6 +28,18 @@ config NFSD
If unsure, say N.
+config NFSD_DEPRECATED
+ bool "Include support for deprecated syscall interface to NFSD"
+ depends on NFSD
+ default y
+ help
+ The syscall interface to nfsd was obsoleted in 2.6.0 by a new
+ filesystem based interface. The old interface is due for removal
+ in 2.6.40. If you wish to remove the interface before then
+ say N.
+
+ If unsure, say Y.
+
config NFSD_V2_ACL
bool
depends on NFSD
diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index e56827b..a3c7d0c 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -28,9 +28,6 @@
typedef struct auth_domain svc_client;
typedef struct svc_export svc_export;
-static void exp_do_unexport(svc_export *unexp);
-static int exp_verify_string(char *cp, int max);
-
/*
* We have two caches.
* One maps client+vfsmnt+dentry to export options - the export map
@@ -802,6 +799,7 @@ exp_find_key(svc_client *clp, int fsid_type, u32 *fsidv, struct cache_req *reqp)
return ek;
}
+#ifdef CONFIG_NFSD_DEPRECATED
static int exp_set_key(svc_client *clp, int fsid_type, u32 *fsidv,
struct svc_export *exp)
{
@@ -852,6 +850,7 @@ exp_get_fsid_key(svc_client *clp, int fsid)
return exp_find_key(clp, FSID_NUM, fsidv, NULL);
}
+#endif
static svc_export *exp_get_by_name(svc_client *clp, const struct path *path,
struct cache_req *reqp)
@@ -893,6 +892,7 @@ static struct svc_export *exp_parent(svc_client *clp, struct path *path)
return exp;
}
+#ifdef CONFIG_NFSD_DEPRECATED
/*
* Hashtable locking. Write locks are placed only by user processes
* wanting to modify export information.
@@ -925,6 +925,19 @@ exp_writeunlock(void)
{
up_write(&hash_sem);
}
+#else
+
+/* hash_sem not needed once deprecated interface is removed */
+void exp_readlock(void) {}
+static inline void exp_writelock(void){}
+void exp_readunlock(void) {}
+static inline void exp_writeunlock(void){}
+
+#endif
+
+#ifdef CONFIG_NFSD_DEPRECATED
+static void exp_do_unexport(svc_export *unexp);
+static int exp_verify_string(char *cp, int max);
static void exp_fsid_unhash(struct svc_export *exp)
{
@@ -1147,6 +1160,7 @@ out_unlock:
exp_writeunlock();
return err;
}
+#endif /* CONFIG_NFSD_DEPRECATED */
/*
* Obtain the root fh on behalf of a client.
@@ -1529,6 +1543,7 @@ const struct seq_operations nfs_exports_op = {
.show = e_show,
};
+#ifdef CONFIG_NFSD_DEPRECATED
/*
* Add or modify a client.
* Change requests may involve the list of host addresses. The list of
@@ -1618,6 +1633,7 @@ exp_verify_string(char *cp, int max)
printk(KERN_NOTICE "nfsd: couldn't validate string %s\n", cp);
return 0;
}
+#endif /* CONFIG_NFSD_DEPRECATED */
/*
* Initialize the exports module.
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 7f0fc88..b278e44 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -22,6 +22,7 @@
*/
enum {
NFSD_Root = 1,
+#ifdef CONFIG_NFSD_DEPRECATED
NFSD_Svc,
NFSD_Add,
NFSD_Del,
@@ -29,6 +30,7 @@ enum {
NFSD_Unexport,
NFSD_Getfd,
NFSD_Getfs,
+#endif
NFSD_List,
NFSD_Export_features,
NFSD_Fh,
@@ -54,6 +56,7 @@ enum {
/*
* write() for these nodes.
*/
+#ifdef CONFIG_NFSD_DEPRECATED
static ssize_t write_svc(struct file *file, char *buf, size_t size);
static ssize_t write_add(struct file *file, char *buf, size_t size);
static ssize_t write_del(struct file *file, char *buf, size_t size);
@@ -61,6 +64,7 @@ static ssize_t write_export(struct file *file, char *buf, size_t size);
static ssize_t write_unexport(struct file *file, char *buf, size_t size);
static ssize_t write_getfd(struct file *file, char *buf, size_t size);
static ssize_t write_getfs(struct file *file, char *buf, size_t size);
+#endif
static ssize_t write_filehandle(struct file *file, char *buf, size_t size);
static ssize_t write_unlock_ip(struct file *file, char *buf, size_t size);
static ssize_t write_unlock_fs(struct file *file, char *buf, size_t size);
@@ -76,6 +80,7 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
#endif
static ssize_t (*write_op[])(struct file *, char *, size_t) = {
+#ifdef CONFIG_NFSD_DEPRECATED
[NFSD_Svc] = write_svc,
[NFSD_Add] = write_add,
[NFSD_Del] = write_del,
@@ -83,6 +88,7 @@ static ssize_t (*write_op[])(struct file *, char *, size_t) = {
[NFSD_Unexport] = write_unexport,
[NFSD_Getfd] = write_getfd,
[NFSD_Getfs] = write_getfs,
+#endif
[NFSD_Fh] = write_filehandle,
[NFSD_FO_UnlockIP] = write_unlock_ip,
[NFSD_FO_UnlockFS] = write_unlock_fs,
@@ -196,6 +202,7 @@ static const struct file_operations pool_stats_operations = {
* payload - write methods
*/
+#ifdef CONFIG_NFSD_DEPRECATED
/**
* write_svc - Start kernel's NFSD server
*
@@ -491,6 +498,7 @@ static ssize_t write_getfd(struct file *file, char *buf, size_t size)
out:
return err;
}
+#endif /* CONFIG_NFSD_DEPRECATED */
/**
* write_unlock_ip - Release all locks used by a client
@@ -1365,6 +1373,7 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
{
static struct tree_descr nfsd_files[] = {
+#ifdef CONFIG_NFSD_DEPRECATED
[NFSD_Svc] = {".svc", &transaction_ops, S_IWUSR},
[NFSD_Add] = {".add", &transaction_ops, S_IWUSR},
[NFSD_Del] = {".del", &transaction_ops, S_IWUSR},
@@ -1372,6 +1381,7 @@ static int nfsd_fill_super(struct super_block * sb, void * data, int silent)
[NFSD_Unexport] = {".unexport", &transaction_ops, S_IWUSR},
[NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
+#endif
[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
[NFSD_Export_features] = {"export_features",
&export_features_operations, S_IRUGO},
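The shape of the change, as a minimal stand-alone sketch (CONFIG_NFSD_DEPRECATED is the real option
name; the function bodies below are simplified stand-ins, not the nfsd code): the legacy paths are
only compiled in when the option is set, and empty stubs keep every call site free of #ifdefs.

/* Sketch of the pattern used throughout this patch: gate a legacy interface
 * behind a config option and provide no-op stubs so callers never change.
 * Compile with or without -DCONFIG_NFSD_DEPRECATED; main() is identical. */
#include <stdio.h>

#ifdef CONFIG_NFSD_DEPRECATED
/* Real locking only matters while the legacy write interface can race with us. */
static int hash_sem;			/* stand-in for the real rw_semaphore */

void exp_readlock(void)   { hash_sem++; }
void exp_readunlock(void) { hash_sem--; }
#else
/* hash_sem not needed once the deprecated interface is removed. */
void exp_readlock(void)   {}
void exp_readunlock(void) {}
#endif

int main(void)
{
	/* Call sites stay the same either way; the #ifdef lives in one place. */
	exp_readlock();
	printf("serve request\n");
	exp_readunlock();
	return 0;
}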
^ permalink raw reply related [flat|nested] 22+ messages in thread* [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist.
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
` (4 preceding siblings ...)
2010-09-22 2:55 ` [PATCH 7/7] nfsd: allow deprecated interface to be compiled out NeilBrown
@ 2010-09-22 2:55 ` NeilBrown
2010-09-22 2:59 ` J. Bruce Fields
2010-09-22 2:55 ` [PATCH 4/7] sunrpc/cache: centralise handling of size limit on deferred list NeilBrown
6 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2010-09-22 2:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
Being a hash table, hlist is the best option.
There is currently some ugliness where we treat "->next == NULL" as
a special case to avoid having to initialise the whole array.
This change nicely gets rid of that case.
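To illustrate the point, here is a minimal stand-alone sketch (toy stand-ins for the kernel's
hlist types, not the real headers): a zero-filled array of hlist_head is already an array of
valid empty chains, so the lazy "init on first use" dance is simply unnecessary.

/* Why an hlist_head array needs no explicit initialisation:
 * NULL 'first' already means "empty chain". Toy types for illustration. */
#include <stddef.h>
#include <stdio.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

static struct hlist_head table[16];	/* zero-filled => 16 valid empty chains */

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

struct item {
	int key;
	struct hlist_node hash;
};

int main(void)
{
	struct item a = { .key = 42 };
	struct hlist_node *p;

	/* With list_head this would need an INIT_LIST_HEAD() special case first;
	 * with hlist the all-zero head is already usable. */
	hlist_add_head(&a.hash, &table[a.key % 16]);

	for (p = table[a.key % 16].first; p; p = p->next) {
		struct item *it = (struct item *)((char *)p - offsetof(struct item, hash));
		printf("found key %d\n", it->key);
	}
	return 0;
}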
Signed-off-by: NeilBrown <neilb@suse.de>
---
include/linux/sunrpc/cache.h | 2 +-
net/sunrpc/cache.c | 22 ++++++++--------------
2 files changed, 9 insertions(+), 15 deletions(-)
diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
index 52a7d72..0349635 100644
--- a/include/linux/sunrpc/cache.h
+++ b/include/linux/sunrpc/cache.h
@@ -133,7 +133,7 @@ struct cache_req {
* delayed awaiting cache-fill
*/
struct cache_deferred_req {
- struct list_head hash; /* on hash chain */
+ struct hlist_node hash; /* on hash chain */
struct list_head recent; /* on fifo */
struct cache_head *item; /* cache item we wait on */
void *owner; /* we might need to discard all defered requests
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index a9e850e..466f65c 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -506,13 +506,13 @@ EXPORT_SYMBOL_GPL(cache_purge);
static DEFINE_SPINLOCK(cache_defer_lock);
static LIST_HEAD(cache_defer_list);
-static struct list_head cache_defer_hash[DFR_HASHSIZE];
+static struct hlist_head cache_defer_hash[DFR_HASHSIZE];
static int cache_defer_cnt;
static void __unhash_deferred_req(struct cache_deferred_req *dreq)
{
list_del_init(&dreq->recent);
- list_del_init(&dreq->hash);
+ hlist_del_init(&dreq->hash);
cache_defer_cnt--;
}
@@ -521,9 +521,7 @@ static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_he
int hash = DFR_HASH(item);
list_add(&dreq->recent, &cache_defer_list);
- if (cache_defer_hash[hash].next == NULL)
- INIT_LIST_HEAD(&cache_defer_hash[hash]);
- list_add(&dreq->hash, &cache_defer_hash[hash]);
+ hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
}
static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
@@ -587,7 +585,7 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item)
* to clean up
*/
spin_lock(&cache_defer_lock);
- if (!list_empty(&sleeper.handle.hash)) {
+ if (!hlist_unhashed(&sleeper.handle.hash)) {
__unhash_deferred_req(&sleeper.handle);
spin_unlock(&cache_defer_lock);
} else {
@@ -642,23 +640,19 @@ static void cache_revisit_request(struct cache_head *item)
struct cache_deferred_req *dreq;
struct list_head pending;
- struct list_head *lp;
+ struct hlist_node *lp, *tmp;
int hash = DFR_HASH(item);
INIT_LIST_HEAD(&pending);
spin_lock(&cache_defer_lock);
- lp = cache_defer_hash[hash].next;
- if (lp) {
- while (lp != &cache_defer_hash[hash]) {
- dreq = list_entry(lp, struct cache_deferred_req, hash);
- lp = lp->next;
+ hlist_for_each_entry_safe(dreq, lp, tmp,
+ &cache_defer_hash[hash], hash)
if (dreq->item == item) {
__unhash_deferred_req(dreq);
list_add(&dreq->recent, &pending);
}
- }
- }
+
spin_unlock(&cache_defer_lock);
while (!list_empty(&pending)) {
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist.
2010-09-22 2:55 ` [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist NeilBrown
@ 2010-09-22 2:59 ` J. Bruce Fields
2010-09-22 4:51 ` Neil Brown
0 siblings, 1 reply; 22+ messages in thread
From: J. Bruce Fields @ 2010-09-22 2:59 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-nfs
On Wed, Sep 22, 2010 at 12:55:07PM +1000, NeilBrown wrote:
> Being a hash table, hlist is the best option.
>
> There is currently some ugliness where we treat "->next == NULL" as
> a special case to avoid having to initialise the whole array.
> This change nicely gets rid of that case.
This (or some hopefully correct variant!) should already be applied;
could you check that it looks right?
--b.
>
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> include/linux/sunrpc/cache.h | 2 +-
> net/sunrpc/cache.c | 22 ++++++++--------------
> 2 files changed, 9 insertions(+), 15 deletions(-)
>
> diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h
> index 52a7d72..0349635 100644
> --- a/include/linux/sunrpc/cache.h
> +++ b/include/linux/sunrpc/cache.h
> @@ -133,7 +133,7 @@ struct cache_req {
> * delayed awaiting cache-fill
> */
> struct cache_deferred_req {
> - struct list_head hash; /* on hash chain */
> + struct hlist_node hash; /* on hash chain */
> struct list_head recent; /* on fifo */
> struct cache_head *item; /* cache item we wait on */
> void *owner; /* we might need to discard all defered requests
> diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
> index a9e850e..466f65c 100644
> --- a/net/sunrpc/cache.c
> +++ b/net/sunrpc/cache.c
> @@ -506,13 +506,13 @@ EXPORT_SYMBOL_GPL(cache_purge);
>
> static DEFINE_SPINLOCK(cache_defer_lock);
> static LIST_HEAD(cache_defer_list);
> -static struct list_head cache_defer_hash[DFR_HASHSIZE];
> +static struct hlist_head cache_defer_hash[DFR_HASHSIZE];
> static int cache_defer_cnt;
>
> static void __unhash_deferred_req(struct cache_deferred_req *dreq)
> {
> list_del_init(&dreq->recent);
> - list_del_init(&dreq->hash);
> + hlist_del_init(&dreq->hash);
> cache_defer_cnt--;
> }
>
> @@ -521,9 +521,7 @@ static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_he
> int hash = DFR_HASH(item);
>
> list_add(&dreq->recent, &cache_defer_list);
> - if (cache_defer_hash[hash].next == NULL)
> - INIT_LIST_HEAD(&cache_defer_hash[hash]);
> - list_add(&dreq->hash, &cache_defer_hash[hash]);
> + hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
> }
>
> static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
> @@ -587,7 +585,7 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item)
> * to clean up
> */
> spin_lock(&cache_defer_lock);
> - if (!list_empty(&sleeper.handle.hash)) {
> + if (!hlist_unhashed(&sleeper.handle.hash)) {
> __unhash_deferred_req(&sleeper.handle);
> spin_unlock(&cache_defer_lock);
> } else {
> @@ -642,23 +640,19 @@ static void cache_revisit_request(struct cache_head *item)
> struct cache_deferred_req *dreq;
> struct list_head pending;
>
> - struct list_head *lp;
> + struct hlist_node *lp, *tmp;
> int hash = DFR_HASH(item);
>
> INIT_LIST_HEAD(&pending);
> spin_lock(&cache_defer_lock);
>
> - lp = cache_defer_hash[hash].next;
> - if (lp) {
> - while (lp != &cache_defer_hash[hash]) {
> - dreq = list_entry(lp, struct cache_deferred_req, hash);
> - lp = lp->next;
> + hlist_for_each_entry_safe(dreq, lp, tmp,
> + &cache_defer_hash[hash], hash)
> if (dreq->item == item) {
> __unhash_deferred_req(dreq);
> list_add(&dreq->recent, &pending);
> }
> - }
> - }
> +
> spin_unlock(&cache_defer_lock);
>
> while (!list_empty(&pending)) {
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist.
2010-09-22 2:59 ` J. Bruce Fields
@ 2010-09-22 4:51 ` Neil Brown
0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2010-09-22 4:51 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
On Tue, 21 Sep 2010 22:59:44 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Wed, Sep 22, 2010 at 12:55:07PM +1000, NeilBrown wrote:
> > Being a hash table, hlist is the best option.
> >
> > There is currently some ugliness where we treat "->next == NULL" as
> > a special case to avoid having to initialise the whole array.
> > This change nicely gets rid of that case.
>
> This (or some hopefully correct variant!) should already be applied;
> could you check that it looks right?
I must have pulled just a little too early - I see it now.
Just a couple of minor formatting improvements in your version, otherwise
identical.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 4/7] sunrpc/cache: centralise handling of size limit on deferred list.
2010-09-22 2:55 [PATCH 0/7] Assorted nfsd patches for 2.6.37 NeilBrown
` (5 preceding siblings ...)
2010-09-22 2:55 ` [PATCH 3/7] sunrpc/cache: change deferred-request hash table to use hlist NeilBrown
@ 2010-09-22 2:55 ` NeilBrown
[not found] ` <20100922025507.31745.61919.stgit-bi+AKbBUZKY6gyzm1THtWbp2dZbC/Bob@public.gmane.org>
6 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2010-09-22 2:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
We limit the number of 'defer' requests to DFR_MAX.
The imposition of this limit is spread about a bit - sometimes we don't
add new things to the list, sometimes we remove old things.
Also it is currently applied to requests which we are 'waiting' for
rather than 'deferring'. This doesn't seem ideal as 'waiting'
requests are naturally limited by the number of threads.
So gather the DFR_MAX handling code to one place and only apply it to
requests that are actually being deferred.
This means that not all 'cache_deferred_req' structures go on the
'cache_defer_list', so we need to be careful when removing things.
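The trick that makes the removal safe is sketched below with simplified stand-in types
(illustrative only, not the sunrpc code): keep 'recent' initialised to an empty list while the
request is merely hashed, splice it onto the FIFO only when the request is really deferred, and
let list_empty() on removal decide whether the counter needs decrementing.

/* "Empty list_head as membership marker" pattern, stand-alone sketch. */
#include <stdio.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }
static void list_add(struct list_head *n, struct list_head *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}
static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	INIT_LIST_HEAD(n);
}

static struct list_head fifo = { &fifo, &fifo };
static int deferred_cnt;

struct req { struct list_head recent; };

static void hash_req(struct req *r)
{
	INIT_LIST_HEAD(&r->recent);	/* hashed, but not counted yet */
}

static void defer_req(struct req *r)
{
	list_add(&r->recent, &fifo);	/* now it counts toward the limit */
	deferred_cnt++;
}

static void unhash_req(struct req *r)
{
	if (!list_empty(&r->recent)) {	/* only decrement if it was deferred */
		list_del_init(&r->recent);
		deferred_cnt--;
	}
}

int main(void)
{
	struct req a, b;

	hash_req(&a); hash_req(&b);
	defer_req(&a);			/* only 'a' goes on the FIFO */
	unhash_req(&a); unhash_req(&b);	/* removal is safe for both */
	printf("deferred_cnt = %d\n", deferred_cnt);	/* prints 0 */
	return 0;
}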
Signed-off-by: NeilBrown <neilb@suse.de>
---
net/sunrpc/cache.c | 68 +++++++++++++++++++++++++++++++++++-----------------
1 files changed, 46 insertions(+), 22 deletions(-)
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c
index 466f65c..1963e2a 100644
--- a/net/sunrpc/cache.c
+++ b/net/sunrpc/cache.c
@@ -511,22 +511,24 @@ static int cache_defer_cnt;
static void __unhash_deferred_req(struct cache_deferred_req *dreq)
{
- list_del_init(&dreq->recent);
hlist_del_init(&dreq->hash);
- cache_defer_cnt--;
+ if (!list_empty(&dreq->recent)) {
+ list_del_init(&dreq->recent);
+ cache_defer_cnt--;
+ }
}
static void __hash_deferred_req(struct cache_deferred_req *dreq, struct cache_head *item)
{
int hash = DFR_HASH(item);
+ INIT_LIST_HEAD(&dreq->recent);
list_add(&dreq->recent, &cache_defer_list);
hlist_add_head(&dreq->hash, &cache_defer_hash[hash]);
}
static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *item)
{
- struct cache_deferred_req *discard;
dreq->item = item;
@@ -534,19 +536,8 @@ static int setup_deferral(struct cache_deferred_req *dreq, struct cache_head *it
__hash_deferred_req(dreq, item);
- /* it is in, now maybe clean up */
- discard = NULL;
- if (++cache_defer_cnt > DFR_MAX) {
- discard = list_entry(cache_defer_list.prev,
- struct cache_deferred_req, recent);
- __unhash_deferred_req(discard);
- }
spin_unlock(&cache_defer_lock);
- if (discard)
- /* there was one too many */
- discard->revisit(discard, 1);
-
if (!test_bit(CACHE_PENDING, &item->flags)) {
/* must have just been validated... */
cache_revisit_request(item);
@@ -612,18 +603,47 @@ static int cache_wait_req(struct cache_req *req, struct cache_head *item)
return -EEXIST;
}
+static void cache_limit_defers(struct cache_deferred_req *dreq)
+{
+ /* Add 'dreq' to the list of deferred requests and make sure
+ * we don't exceed the limit of allowed deferred requests.
+ */
+ struct cache_deferred_req *discard = NULL;
+
+ spin_lock(&cache_defer_lock);
+ if (!list_empty(&dreq->recent) ||
+ hlist_unhashed(&dreq->hash)) {
+ /* Must have lost a race, maybe cache_revisit_request is
+ * already processing this. In any case, there is nothing for
+ * us to do.
+ */
+		spin_unlock(&cache_defer_lock);
+		return;
+	}
+
+ /* First, add this to the list */
+ cache_defer_cnt++;
+ list_add(&dreq->recent, &cache_defer_list);
+
+ /* now consider removing either the first or the last */
+ if (cache_defer_cnt > DFR_MAX) {
+ if (net_random() & 1)
+ discard = list_entry(cache_defer_list.next,
+ struct cache_deferred_req, recent);
+ else
+ discard = list_entry(cache_defer_list.prev,
+ struct cache_deferred_req, recent);
+ __unhash_deferred_req(discard);
+ }
+ spin_unlock(&cache_defer_lock);
+ if (discard)
+ discard->revisit(discard, 1);
+}
+
static int cache_defer_req(struct cache_req *req, struct cache_head *item)
{
struct cache_deferred_req *dreq;
int ret;
- if (cache_defer_cnt >= DFR_MAX) {
- /* too much in the cache, randomly drop this one,
- * or continue and drop the oldest
- */
- if (net_random()&1)
- return -ENOMEM;
- }
if (req->thread_wait) {
ret = cache_wait_req(req, item);
if (ret != -ETIMEDOUT)
@@ -632,7 +652,11 @@ static int cache_defer_req(struct cache_req *req, struct cache_head *item)
dreq = req->defer(req);
if (dreq == NULL)
return -ENOMEM;
- return setup_deferral(dreq, item);
+ if (setup_deferral(dreq, item) < 0)
+ return -EAGAIN;
+
+ cache_limit_defers(dreq);
+ return 0;
}
static void cache_revisit_request(struct cache_head *item)
^ permalink raw reply related [flat|nested] 22+ messages in thread