[PATCH 0/2] Cache state owners [take 2]

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] Cache state owners [take 2]
@ 2011-12-06 21:13 Chuck Lever
  2011-12-06 21:13 ` [PATCH 1/2] NFS: Clean up nfs4_find_state_owners_locked() Chuck Lever
  2011-12-06 21:13 ` [PATCH 2/2] NFS: Cache state owners after files are closed Chuck Lever
  0 siblings, 2 replies; 18+ messages in thread
From: Chuck Lever @ 2011-12-06 21:13 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs

This has seen some light testing.  Changes since version 1:

  o Inclusion of additional nfs4_find_state_owners_locked() clean up
  o Follow Bruce's suggestion to set ->destroy in common setup function
  o Use time_in_range() instead of time_before() to check for expiry
  o Discard state owners immediately in certain cases

---

Chuck Lever (2):
      NFS: Cache state owners after files are closed
      NFS: Clean up nfs4_find_state_owners_locked()


 fs/nfs/client.c           |    8 ++++
 fs/nfs/nfs4_fs.h          |    3 +
 fs/nfs/nfs4state.c        |  102 ++++++++++++++++++++++++++++++++++++---------
 include/linux/nfs_fs_sb.h |    1 
 4 files changed, 94 insertions(+), 20 deletions(-)

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/2] NFS: Clean up nfs4_find_state_owners_locked()
  2011-12-06 21:13 [PATCH 0/2] Cache state owners [take 2] Chuck Lever
@ 2011-12-06 21:13 ` Chuck Lever
  2011-12-06 21:13 ` [PATCH 2/2] NFS: Cache state owners after files are closed Chuck Lever
  1 sibling, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2011-12-06 21:13 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs

There's no longer a need to check the so_server field in the state
owner, because nowadays the RB tree we search for state owners
contains owners for that only server.

Make nfs4_find_state_owners_locked() use the same tree searching logic
as nfs4_insert_state_owner_locked().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfs/nfs4state.c |   15 +++------------
 1 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 39914be..8794b82 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -377,31 +377,22 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
 {
 	struct rb_node **p = &server->state_owners.rb_node,
 		       *parent = NULL;
-	struct nfs4_state_owner *sp, *res = NULL;
+	struct nfs4_state_owner *sp;
 
 	while (*p != NULL) {
 		parent = *p;
 		sp = rb_entry(parent, struct nfs4_state_owner, so_server_node);
 
-		if (server < sp->so_server) {
-			p = &parent->rb_left;
-			continue;
-		}
-		if (server > sp->so_server) {
-			p = &parent->rb_right;
-			continue;
-		}
 		if (cred < sp->so_cred)
 			p = &parent->rb_left;
 		else if (cred > sp->so_cred)
 			p = &parent->rb_right;
 		else {
 			atomic_inc(&sp->so_count);
-			res = sp;
-			break;
+			return sp;
 		}
 	}
-	return res;
+	return NULL;
 }
 
 static struct nfs4_state_owner *


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/2] NFS: Cache state owners after files are closed
  2011-12-06 21:13 [PATCH 0/2] Cache state owners [take 2] Chuck Lever
  2011-12-06 21:13 ` [PATCH 1/2] NFS: Clean up nfs4_find_state_owners_locked() Chuck Lever
@ 2011-12-06 21:13 ` Chuck Lever
  2012-01-04 15:53   ` Trond Myklebust
  1 sibling, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2011-12-06 21:13 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-nfs

Servers have a finite amount of memory to store NFSv4 open and lock
owners.  Moreover, servers may have a difficult time determining when
they can reap their state owner table, thanks to gray areas in the
NFSv4 protocol specification.  Thus clients should be careful to reuse
state owners when possible.

Currently Linux is not too careful.  When a user has closed all her
files on one mount point, the state owner's reference count goes to
zero, and it is released.  The next OPEN allocates a new one.  A
workload that serially opens and closes files can run through a large
number of open owners this way.

When a state owner's reference count goes to zero, slap it onto a free
list for that nfs_server, with an expiry time.  Garbage collect before
looking for a state owner.  This makes state owners for active users
available for re-use.

Now that there can be unused state owners remaining at umount time,
purge the state owner free list when a server is destroyed.  Also be
sure not to reclaim unused state owners during state recovery.

This change has benefits for the client as well.  For some workloads,
this approach drops the number of OPEN_CONFIRM calls from the same as
the number of OPEN calls, down to just one.  This reduces wire traffic
and thus open(2) latency.  Before this patch, untarring a kernel
source tarball shows the OPEN_CONFIRM call counter steadily increasing
through the test.  With the patch, the OPEN_CONFIRM count remains at 1
throughout the entire untar.

As long as the expiry time is kept short, I don't think garbage
collection should be terribly expensive, although it does bounce the
clp->cl_lock around a bit.

[ At some point we should rationalize the use of the nfs_server
->destroy method. ]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---

 fs/nfs/client.c           |    8 ++++
 fs/nfs/nfs4_fs.h          |    3 ++
 fs/nfs/nfs4state.c        |   87 +++++++++++++++++++++++++++++++++++++++++----
 include/linux/nfs_fs_sb.h |    1 +
 4 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 5833fbb..9db51cf 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -250,6 +250,11 @@ static void pnfs_init_server(struct nfs_server *server)
 	rpc_init_wait_queue(&server->roc_rpcwaitq, "pNFS ROC");
 }
 
+static void nfs4_destroy_server(struct nfs_server *server)
+{
+	nfs4_purge_state_owners(server);
+}
+
 #else
 static void nfs4_shutdown_client(struct nfs_client *clp)
 {
@@ -1064,6 +1069,7 @@ static struct nfs_server *nfs_alloc_server(void)
 	INIT_LIST_HEAD(&server->master_link);
 	INIT_LIST_HEAD(&server->delegations);
 	INIT_LIST_HEAD(&server->layouts);
+	INIT_LIST_HEAD(&server->state_owners_lru);
 
 	atomic_set(&server->active, 0);
 
@@ -1537,6 +1543,7 @@ static int nfs4_server_common_setup(struct nfs_server *server,
 
 	nfs_server_insert_lists(server);
 	server->mount_time = jiffies;
+	server->destroy = nfs4_destroy_server;
 out:
 	nfs_free_fattr(fattr);
 	return error;
@@ -1718,6 +1725,7 @@ struct nfs_server *nfs_clone_server(struct nfs_server *source,
 
 	/* Copy data from the source */
 	server->nfs_client = source->nfs_client;
+	server->destroy = source->destroy;
 	atomic_inc(&server->nfs_client->cl_count);
 	nfs_server_copy_userdata(server, source);
 
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 3e93e9a..5a6dc35 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -118,6 +118,8 @@ struct nfs_unique_id {
 struct nfs4_state_owner {
 	struct nfs_unique_id so_owner_id;
 	struct nfs_server    *so_server;
+	struct list_head     so_lru;
+	unsigned long        so_expires;
 	struct rb_node	     so_server_node;
 
 	struct rpc_cred	     *so_cred;	 /* Associated cred */
@@ -343,6 +345,7 @@ static inline void nfs4_schedule_session_recovery(struct nfs4_session *session)
 
 extern struct nfs4_state_owner * nfs4_get_state_owner(struct nfs_server *, struct rpc_cred *);
 extern void nfs4_put_state_owner(struct nfs4_state_owner *);
+extern void nfs4_purge_state_owners(struct nfs_server *);
 extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
 extern void nfs4_put_open_state(struct nfs4_state *);
 extern void nfs4_close_state(struct nfs4_state *, fmode_t);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 8794b82..6bc75b5 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -49,6 +49,7 @@
 #include <linux/ratelimit.h>
 #include <linux/workqueue.h>
 #include <linux/bitops.h>
+#include <linux/jiffies.h>
 
 #include "nfs4_fs.h"
 #include "callback.h"
@@ -453,6 +454,7 @@ nfs4_alloc_state_owner(void)
 	spin_lock_init(&sp->so_sequence.lock);
 	INIT_LIST_HEAD(&sp->so_sequence.list);
 	atomic_set(&sp->so_count, 1);
+	INIT_LIST_HEAD(&sp->so_lru);
 	return sp;
 }
 
@@ -464,12 +466,48 @@ nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 		struct nfs_client *clp = server->nfs_client;
 
 		spin_lock(&clp->cl_lock);
+		list_del_init(&sp->so_lru);
 		rb_erase(&sp->so_server_node, &server->state_owners);
 		RB_CLEAR_NODE(&sp->so_server_node);
 		spin_unlock(&clp->cl_lock);
 	}
 }
 
+static void nfs4_release_state_owner(struct nfs4_state_owner *sp)
+{
+	struct nfs_client *clp = sp->so_server->nfs_client;
+
+	spin_lock(&clp->cl_lock);
+	nfs4_remove_state_owner_locked(sp);
+	spin_unlock(&clp->cl_lock);
+	rpc_destroy_wait_queue(&sp->so_sequence.wait);
+	put_rpccred(sp->so_cred);
+	kfree(sp);
+}
+
+static void nfs4_gc_state_owners(struct nfs_server *server)
+{
+	struct nfs_client *clp = server->nfs_client;
+	struct nfs4_state_owner *sp, *tmp;
+	unsigned long now = jiffies;
+	LIST_HEAD(doomed);
+
+	spin_lock(&clp->cl_lock);
+	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
+		/* NB: LRU is sorted so that oldest is at the head */
+		if (time_in_range(now, sp->so_expires,
+				sp->so_expires + clp->cl_lease_time))
+			break;
+		list_move(&sp->so_lru, &doomed);
+	}
+	spin_unlock(&clp->cl_lock);
+
+	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
+		list_del_init(&sp->so_lru);
+		nfs4_release_state_owner(sp);
+	}
+}
+
 /**
  * nfs4_get_state_owner - Look up a state owner given a credential
  * @server: nfs_server to search
@@ -483,11 +521,13 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
 	struct nfs_client *clp = server->nfs_client;
 	struct nfs4_state_owner *sp, *new;
 
+	nfs4_gc_state_owners(server);
+
 	spin_lock(&clp->cl_lock);
 	sp = nfs4_find_state_owner_locked(server, cred);
 	spin_unlock(&clp->cl_lock);
 	if (sp != NULL)
-		return sp;
+		goto out;
 	new = nfs4_alloc_state_owner();
 	if (new == NULL)
 		return NULL;
@@ -502,26 +542,56 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
 		rpc_destroy_wait_queue(&new->so_sequence.wait);
 		kfree(new);
 	}
+out:
+	list_del_init(&sp->so_lru);
 	return sp;
 }
 
 /**
  * nfs4_put_state_owner - Release a nfs4_state_owner
  * @sp: state owner data to release
- *
  */
 void nfs4_put_state_owner(struct nfs4_state_owner *sp)
 {
-	struct nfs_client *clp = sp->so_server->nfs_client;
-	struct rpc_cred *cred = sp->so_cred;
+	struct nfs_server *server = sp->so_server;
+	struct nfs_client *clp = server->nfs_client;
 
 	if (!atomic_dec_and_lock(&sp->so_count, &clp->cl_lock))
 		return;
-	nfs4_remove_state_owner_locked(sp);
+
+	if (!RB_EMPTY_NODE(&sp->so_server_node)) {
+		sp->so_expires = jiffies;
+		list_move_tail(&sp->so_lru, &server->state_owners_lru);
+		spin_unlock(&clp->cl_lock);
+	} else {
+		list_del_init(&sp->so_lru);
+		spin_unlock(&clp->cl_lock);
+		nfs4_release_state_owner(sp);
+	}
+}
+
+/**
+ * nfs4_purge_state_owners - Release all cached state owners
+ * @server: nfs_server with cached state owners to release
+ *
+ * Called at umount time.  Remaining state owners will be on
+ * the LRU with ref count of zero.
+ */
+void nfs4_purge_state_owners(struct nfs_server *server)
+{
+	struct nfs_client *clp = server->nfs_client;
+	struct nfs4_state_owner *sp, *tmp;
+	LIST_HEAD(doomed);
+
+	spin_lock(&clp->cl_lock);
+	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru)
+		list_move(&sp->so_lru, &doomed);
 	spin_unlock(&clp->cl_lock);
-	rpc_destroy_wait_queue(&sp->so_sequence.wait);
-	put_rpccred(cred);
-	kfree(sp);
+
+	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
+		list_del_init(&sp->so_lru);
+		nfs4_release_state_owner(sp);
+	}
 }
 
 static struct nfs4_state *
@@ -1385,6 +1455,7 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
 restart:
 	rcu_read_lock();
 	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
+		nfs4_purge_state_owners(server);
 		spin_lock(&clp->cl_lock);
 		for (pos = rb_first(&server->state_owners);
 		     pos != NULL;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b5479df..ba4d765 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -153,6 +153,7 @@ struct nfs_server {
 	struct rb_root		openowner_id;
 	struct rb_root		lockowner_id;
 #endif
+	struct list_head	state_owners_lru;
 	struct list_head	layouts;
 	struct list_head	delegations;
 	void (*destroy)(struct nfs_server *);


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] NFS: Cache state owners after files are closed
  2011-12-06 21:13 ` [PATCH 2/2] NFS: Cache state owners after files are closed Chuck Lever
@ 2012-01-04 15:53   ` Trond Myklebust
  2012-01-04 15:59     ` Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 15:53 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Tue, 2011-12-06 at 16:13 -0500, Chuck Lever wrote: 
> Servers have a finite amount of memory to store NFSv4 open and lock
> owners.  Moreover, servers may have a difficult time determining when
> they can reap their state owner table, thanks to gray areas in the
> NFSv4 protocol specification.  Thus clients should be careful to reuse
> state owners when possible.
> 
> Currently Linux is not too careful.  When a user has closed all her
> files on one mount point, the state owner's reference count goes to
> zero, and it is released.  The next OPEN allocates a new one.  A
> workload that serially opens and closes files can run through a large
> number of open owners this way.
> 
> When a state owner's reference count goes to zero, slap it onto a free
> list for that nfs_server, with an expiry time.  Garbage collect before
> looking for a state owner.  This makes state owners for active users
> available for re-use.
> 
> Now that there can be unused state owners remaining at umount time,
> purge the state owner free list when a server is destroyed.  Also be
> sure not to reclaim unused state owners during state recovery.
> 
> This change has benefits for the client as well.  For some workloads,
> this approach drops the number of OPEN_CONFIRM calls from the same as
> the number of OPEN calls, down to just one.  This reduces wire traffic
> and thus open(2) latency.  Before this patch, untarring a kernel
> source tarball shows the OPEN_CONFIRM call counter steadily increasing
> through the test.  With the patch, the OPEN_CONFIRM count remains at 1
> throughout the entire untar.
> 
> As long as the expiry time is kept short, I don't think garbage
> collection should be terribly expensive, although it does bounce the
> clp->cl_lock around a bit.
> 
> [ At some point we should rationalize the use of the nfs_server
> ->destroy method. ]
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
<snip> 
> +
> +static void nfs4_gc_state_owners(struct nfs_server *server)
> +{
> +	struct nfs_client *clp = server->nfs_client;
> +	struct nfs4_state_owner *sp, *tmp;
> +	unsigned long now = jiffies;
> +	LIST_HEAD(doomed);
> +
> +	spin_lock(&clp->cl_lock);
> +	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
> +		/* NB: LRU is sorted so that oldest is at the head */
> +		if (time_in_range(now, sp->so_expires,
> +				sp->so_expires + clp->cl_lease_time))
> +			break;
> +		list_move(&sp->so_lru, &doomed);
> +	}
> +	spin_unlock(&clp->cl_lock);

There is a race here: the state owner is still visible in the rb-tree,
and so a second thread that calls nfs4_get_state_owner() may still pick
up one of these open owners that are now on your 'doomed' list.

I think you rather need to do the equivalent of an
'nfs4_drop_state_owner' in your loop above in order to be 100% safe.

> +	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
> +		list_del_init(&sp->so_lru);
> +		nfs4_release_state_owner(sp);
> +	}
> +}
> +
>  /**
>   * nfs4_get_state_owner - Look up a state owner given a credential
>   * @server: nfs_server to search
> @@ -483,11 +521,13 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>  	struct nfs_client *clp = server->nfs_client;
>  	struct nfs4_state_owner *sp, *new;
>  
> +	nfs4_gc_state_owners(server);
> +
>  	spin_lock(&clp->cl_lock);
>  	sp = nfs4_find_state_owner_locked(server, cred);
>  	spin_unlock(&clp->cl_lock);
>  	if (sp != NULL)
> -		return sp;
> +		goto out;
>  	new = nfs4_alloc_state_owner();
>  	if (new == NULL)
>  		return NULL;
> @@ -502,26 +542,56 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>  		rpc_destroy_wait_queue(&new->so_sequence.wait);
>  		kfree(new);
>  	}
> +out:
> +	list_del_init(&sp->so_lru);

Ehem... List corruption due to no spin lock. Doesn't it make more sense
to do the above inside nfs4_find/insert_state_owner_locked?

> 	return sp;
>  }
>  

Otherwise it looks OK...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] NFS: Cache state owners after files are closed
  2012-01-04 15:53   ` Trond Myklebust
@ 2012-01-04 15:59     ` Trond Myklebust
  2012-01-04 16:08       ` Chuck Lever
  0 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 15:59 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Wed, 2012-01-04 at 10:53 -0500, Trond Myklebust wrote: 
> On Tue, 2011-12-06 at 16:13 -0500, Chuck Lever wrote: 
> > Servers have a finite amount of memory to store NFSv4 open and lock
> > owners.  Moreover, servers may have a difficult time determining when
> > they can reap their state owner table, thanks to gray areas in the
> > NFSv4 protocol specification.  Thus clients should be careful to reuse
> > state owners when possible.
> > 
> > Currently Linux is not too careful.  When a user has closed all her
> > files on one mount point, the state owner's reference count goes to
> > zero, and it is released.  The next OPEN allocates a new one.  A
> > workload that serially opens and closes files can run through a large
> > number of open owners this way.
> > 
> > When a state owner's reference count goes to zero, slap it onto a free
> > list for that nfs_server, with an expiry time.  Garbage collect before
> > looking for a state owner.  This makes state owners for active users
> > available for re-use.
> > 
> > Now that there can be unused state owners remaining at umount time,
> > purge the state owner free list when a server is destroyed.  Also be
> > sure not to reclaim unused state owners during state recovery.
> > 
> > This change has benefits for the client as well.  For some workloads,
> > this approach drops the number of OPEN_CONFIRM calls from the same as
> > the number of OPEN calls, down to just one.  This reduces wire traffic
> > and thus open(2) latency.  Before this patch, untarring a kernel
> > source tarball shows the OPEN_CONFIRM call counter steadily increasing
> > through the test.  With the patch, the OPEN_CONFIRM count remains at 1
> > throughout the entire untar.
> > 
> > As long as the expiry time is kept short, I don't think garbage
> > collection should be terribly expensive, although it does bounce the
> > clp->cl_lock around a bit.
> > 
> > [ At some point we should rationalize the use of the nfs_server
> > ->destroy method. ]
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> <snip> 
> > +
> > +static void nfs4_gc_state_owners(struct nfs_server *server)
> > +{
> > +	struct nfs_client *clp = server->nfs_client;
> > +	struct nfs4_state_owner *sp, *tmp;
> > +	unsigned long now = jiffies;
> > +	LIST_HEAD(doomed);
> > +
> > +	spin_lock(&clp->cl_lock);
> > +	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
> > +		/* NB: LRU is sorted so that oldest is at the head */
> > +		if (time_in_range(now, sp->so_expires,
> > +				sp->so_expires + clp->cl_lease_time))
> > +			break;
> > +		list_move(&sp->so_lru, &doomed);
> > +	}
> > +	spin_unlock(&clp->cl_lock);
> 
> There is a race here: the state owner is still visible in the rb-tree,
> and so a second thread that calls nfs4_get_state_owner() may still pick
> up one of these open owners that are now on your 'doomed' list.
> 
> I think you rather need to do the equivalent of an
> 'nfs4_drop_state_owner' in your loop above in order to be 100% safe.
> 
> > +	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
> > +		list_del_init(&sp->so_lru);
> > +		nfs4_release_state_owner(sp);
> > +	}
> > +}
> > +
> >  /**
> >   * nfs4_get_state_owner - Look up a state owner given a credential
> >   * @server: nfs_server to search
> > @@ -483,11 +521,13 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> >  	struct nfs_client *clp = server->nfs_client;
> >  	struct nfs4_state_owner *sp, *new;
> >  
> > +	nfs4_gc_state_owners(server);
> > +

One more thing: doesn't it make more sense to do the garbage collection
after you've looked up the state owner? Even if the open owner has been
expired on the server, you will still gain by avoiding the extra
allocation.

> >  	spin_lock(&clp->cl_lock);
> >  	sp = nfs4_find_state_owner_locked(server, cred);
> >  	spin_unlock(&clp->cl_lock);
> >  	if (sp != NULL)
> > -		return sp;
> > +		goto out;
> >  	new = nfs4_alloc_state_owner();
> >  	if (new == NULL)
> >  		return NULL;
> > @@ -502,26 +542,56 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> >  		rpc_destroy_wait_queue(&new->so_sequence.wait);
> >  		kfree(new);
> >  	}
> > +out:
> > +	list_del_init(&sp->so_lru);
> 
> Ehem... List corruption due to no spin lock. Doesn't it make more sense
> to do the above inside nfs4_find/insert_state_owner_locked?
> 
> > 	return sp;
> >  }
> >  
> 
> Otherwise it looks OK...
> 

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] NFS: Cache state owners after files are closed
  2012-01-04 15:59     ` Trond Myklebust
@ 2012-01-04 16:08       ` Chuck Lever
  2012-01-04 16:14         ` Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2012-01-04 16:08 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


On Jan 4, 2012, at 10:59 AM, Trond Myklebust wrote:

> On Wed, 2012-01-04 at 10:53 -0500, Trond Myklebust wrote: 
>> On Tue, 2011-12-06 at 16:13 -0500, Chuck Lever wrote: 
>>> Servers have a finite amount of memory to store NFSv4 open and lock
>>> owners.  Moreover, servers may have a difficult time determining when
>>> they can reap their state owner table, thanks to gray areas in the
>>> NFSv4 protocol specification.  Thus clients should be careful to reuse
>>> state owners when possible.
>>> 
>>> Currently Linux is not too careful.  When a user has closed all her
>>> files on one mount point, the state owner's reference count goes to
>>> zero, and it is released.  The next OPEN allocates a new one.  A
>>> workload that serially opens and closes files can run through a large
>>> number of open owners this way.
>>> 
>>> When a state owner's reference count goes to zero, slap it onto a free
>>> list for that nfs_server, with an expiry time.  Garbage collect before
>>> looking for a state owner.  This makes state owners for active users
>>> available for re-use.
>>> 
>>> Now that there can be unused state owners remaining at umount time,
>>> purge the state owner free list when a server is destroyed.  Also be
>>> sure not to reclaim unused state owners during state recovery.
>>> 
>>> This change has benefits for the client as well.  For some workloads,
>>> this approach drops the number of OPEN_CONFIRM calls from the same as
>>> the number of OPEN calls, down to just one.  This reduces wire traffic
>>> and thus open(2) latency.  Before this patch, untarring a kernel
>>> source tarball shows the OPEN_CONFIRM call counter steadily increasing
>>> through the test.  With the patch, the OPEN_CONFIRM count remains at 1
>>> throughout the entire untar.
>>> 
>>> As long as the expiry time is kept short, I don't think garbage
>>> collection should be terribly expensive, although it does bounce the
>>> clp->cl_lock around a bit.
>>> 
>>> [ At some point we should rationalize the use of the nfs_server
>>> ->destroy method. ]
>>> 
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>> <snip> 
>>> +
>>> +static void nfs4_gc_state_owners(struct nfs_server *server)
>>> +{
>>> +	struct nfs_client *clp = server->nfs_client;
>>> +	struct nfs4_state_owner *sp, *tmp;
>>> +	unsigned long now = jiffies;
>>> +	LIST_HEAD(doomed);
>>> +
>>> +	spin_lock(&clp->cl_lock);
>>> +	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
>>> +		/* NB: LRU is sorted so that oldest is at the head */
>>> +		if (time_in_range(now, sp->so_expires,
>>> +				sp->so_expires + clp->cl_lease_time))
>>> +			break;
>>> +		list_move(&sp->so_lru, &doomed);
>>> +	}
>>> +	spin_unlock(&clp->cl_lock);
>> 
>> There is a race here: the state owner is still visible in the rb-tree,
>> and so a second thread that calls nfs4_get_state_owner() may still pick
>> up one of these open owners that are now on your 'doomed' list.
>> 
>> I think you rather need to do the equivalent of an
>> 'nfs4_drop_state_owner' in your loop above in order to be 100% safe.
>> 
>>> +	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
>>> +		list_del_init(&sp->so_lru);
>>> +		nfs4_release_state_owner(sp);
>>> +	}
>>> +}
>>> +
>>> /**
>>>  * nfs4_get_state_owner - Look up a state owner given a credential
>>>  * @server: nfs_server to search
>>> @@ -483,11 +521,13 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>>> 	struct nfs_client *clp = server->nfs_client;
>>> 	struct nfs4_state_owner *sp, *new;
>>> 
>>> +	nfs4_gc_state_owners(server);
>>> +
> 
> One more thing: doesn't it make more sense to do the garbage collection
> after you've looked up the state owner? Even if the open owner has been
> expired on the server, you will still gain by avoiding the extra
> allocation.

I think "GC first" is more fair.  If you really want to make sure an idle open owner is not re-used passed the sell-by, we have to do GC first.  Otherwise an open owner can languish for days if there's no other activity, and will get re-used on the next open.

It seems to me that on-the-wire behavior would be inconsistent if some open owners were re-used after a long idle, some were not, and the precise re-use behavior depended on workload.  "GC first" gives nicely predictable behavior.

> 
>>> 	spin_lock(&clp->cl_lock);
>>> 	sp = nfs4_find_state_owner_locked(server, cred);
>>> 	spin_unlock(&clp->cl_lock);
>>> 	if (sp != NULL)
>>> -		return sp;
>>> +		goto out;
>>> 	new = nfs4_alloc_state_owner();
>>> 	if (new == NULL)
>>> 		return NULL;
>>> @@ -502,26 +542,56 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>>> 		rpc_destroy_wait_queue(&new->so_sequence.wait);
>>> 		kfree(new);
>>> 	}
>>> +out:
>>> +	list_del_init(&sp->so_lru);
>> 
>> Ehem... List corruption due to no spin lock. Doesn't it make more sense
>> to do the above inside nfs4_find/insert_state_owner_locked?
>> 
>>> 	return sp;
>>> }
>>> 
>> 
>> Otherwise it looks OK...
>> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer
> 
> NetApp
> Trond.Myklebust@netapp.com
> www.netapp.com
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] NFS: Cache state owners after files are closed
  2012-01-04 16:08       ` Chuck Lever
@ 2012-01-04 16:14         ` Trond Myklebust
  2012-01-04 17:07           ` Chuck Lever
  0 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 16:14 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Wed, 2012-01-04 at 11:08 -0500, Chuck Lever wrote: 
> On Jan 4, 2012, at 10:59 AM, Trond Myklebust wrote:
> 
> > On Wed, 2012-01-04 at 10:53 -0500, Trond Myklebust wrote: 
> >> On Tue, 2011-12-06 at 16:13 -0500, Chuck Lever wrote: 
> >>> Servers have a finite amount of memory to store NFSv4 open and lock
> >>> owners.  Moreover, servers may have a difficult time determining when
> >>> they can reap their state owner table, thanks to gray areas in the
> >>> NFSv4 protocol specification.  Thus clients should be careful to reuse
> >>> state owners when possible.
> >>> 
> >>> Currently Linux is not too careful.  When a user has closed all her
> >>> files on one mount point, the state owner's reference count goes to
> >>> zero, and it is released.  The next OPEN allocates a new one.  A
> >>> workload that serially opens and closes files can run through a large
> >>> number of open owners this way.
> >>> 
> >>> When a state owner's reference count goes to zero, slap it onto a free
> >>> list for that nfs_server, with an expiry time.  Garbage collect before
> >>> looking for a state owner.  This makes state owners for active users
> >>> available for re-use.
> >>> 
> >>> Now that there can be unused state owners remaining at umount time,
> >>> purge the state owner free list when a server is destroyed.  Also be
> >>> sure not to reclaim unused state owners during state recovery.
> >>> 
> >>> This change has benefits for the client as well.  For some workloads,
> >>> this approach drops the number of OPEN_CONFIRM calls from the same as
> >>> the number of OPEN calls, down to just one.  This reduces wire traffic
> >>> and thus open(2) latency.  Before this patch, untarring a kernel
> >>> source tarball shows the OPEN_CONFIRM call counter steadily increasing
> >>> through the test.  With the patch, the OPEN_CONFIRM count remains at 1
> >>> throughout the entire untar.
> >>> 
> >>> As long as the expiry time is kept short, I don't think garbage
> >>> collection should be terribly expensive, although it does bounce the
> >>> clp->cl_lock around a bit.
> >>> 
> >>> [ At some point we should rationalize the use of the nfs_server
> >>> ->destroy method. ]
> >>> 
> >>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> >>> ---
> >> <snip> 
> >>> +
> >>> +static void nfs4_gc_state_owners(struct nfs_server *server)
> >>> +{
> >>> +	struct nfs_client *clp = server->nfs_client;
> >>> +	struct nfs4_state_owner *sp, *tmp;
> >>> +	unsigned long now = jiffies;
> >>> +	LIST_HEAD(doomed);
> >>> +
> >>> +	spin_lock(&clp->cl_lock);
> >>> +	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
> >>> +		/* NB: LRU is sorted so that oldest is at the head */
> >>> +		if (time_in_range(now, sp->so_expires,
> >>> +				sp->so_expires + clp->cl_lease_time))
> >>> +			break;
> >>> +		list_move(&sp->so_lru, &doomed);
> >>> +	}
> >>> +	spin_unlock(&clp->cl_lock);
> >> 
> >> There is a race here: the state owner is still visible in the rb-tree,
> >> and so a second thread that calls nfs4_get_state_owner() may still pick
> >> up one of these open owners that are now on your 'doomed' list.
> >> 
> >> I think you rather need to do the equivalent of an
> >> 'nfs4_drop_state_owner' in your loop above in order to be 100% safe.
> >> 
> >>> +	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
> >>> +		list_del_init(&sp->so_lru);
> >>> +		nfs4_release_state_owner(sp);
> >>> +	}
> >>> +}
> >>> +
> >>> /**
> >>>  * nfs4_get_state_owner - Look up a state owner given a credential
> >>>  * @server: nfs_server to search
> >>> @@ -483,11 +521,13 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> >>> 	struct nfs_client *clp = server->nfs_client;
> >>> 	struct nfs4_state_owner *sp, *new;
> >>> 
> >>> +	nfs4_gc_state_owners(server);
> >>> +
> > 
> > One more thing: doesn't it make more sense to do the garbage collection
> > after you've looked up the state owner? Even if the open owner has been
> > expired on the server, you will still gain by avoiding the extra
> > allocation.
> 
> I think "GC first" is more fair.  If you really want to make sure an idle open owner is not re-used passed the sell-by, we have to do GC first.  Otherwise an open owner can languish for days if there's no other activity, and will get re-used on the next open.

Why do we care about fairness here? The only use for the garbage
collector is to ensure that state owners that refer to credentials that
are no longer in use get kicked out. I don't see why we should throw out
an open owner that matches our search criteria.

> It seems to me that on-the-wire behavior would be inconsistent if some open owners were re-used after a long idle, some were not, and the precise re-use behavior depended on workload.  "GC first" gives nicely predictable behavior.

Inconsistent how? An open owner is just a label and a counter. The
server isn't going to be using it for anything other than figuring out
ordering semantics (minor version 0 only) and share lock semantics.


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] NFS: Cache state owners after files are closed
  2012-01-04 16:14         ` Trond Myklebust
@ 2012-01-04 17:07           ` Chuck Lever
  2012-01-04 17:14             ` Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2012-01-04 17:07 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


On Jan 4, 2012, at 11:14 AM, Trond Myklebust wrote:

> On Wed, 2012-01-04 at 11:08 -0500, Chuck Lever wrote: 
>> On Jan 4, 2012, at 10:59 AM, Trond Myklebust wrote:
>> 
>>> On Wed, 2012-01-04 at 10:53 -0500, Trond Myklebust wrote: 
>>>> On Tue, 2011-12-06 at 16:13 -0500, Chuck Lever wrote: 
>>>>> Servers have a finite amount of memory to store NFSv4 open and lock
>>>>> owners.  Moreover, servers may have a difficult time determining when
>>>>> they can reap their state owner table, thanks to gray areas in the
>>>>> NFSv4 protocol specification.  Thus clients should be careful to reuse
>>>>> state owners when possible.
>>>>> 
>>>>> Currently Linux is not too careful.  When a user has closed all her
>>>>> files on one mount point, the state owner's reference count goes to
>>>>> zero, and it is released.  The next OPEN allocates a new one.  A
>>>>> workload that serially opens and closes files can run through a large
>>>>> number of open owners this way.
>>>>> 
>>>>> When a state owner's reference count goes to zero, slap it onto a free
>>>>> list for that nfs_server, with an expiry time.  Garbage collect before
>>>>> looking for a state owner.  This makes state owners for active users
>>>>> available for re-use.
>>>>> 
>>>>> Now that there can be unused state owners remaining at umount time,
>>>>> purge the state owner free list when a server is destroyed.  Also be
>>>>> sure not to reclaim unused state owners during state recovery.
>>>>> 
>>>>> This change has benefits for the client as well.  For some workloads,
>>>>> this approach drops the number of OPEN_CONFIRM calls from the same as
>>>>> the number of OPEN calls, down to just one.  This reduces wire traffic
>>>>> and thus open(2) latency.  Before this patch, untarring a kernel
>>>>> source tarball shows the OPEN_CONFIRM call counter steadily increasing
>>>>> through the test.  With the patch, the OPEN_CONFIRM count remains at 1
>>>>> throughout the entire untar.
>>>>> 
>>>>> As long as the expiry time is kept short, I don't think garbage
>>>>> collection should be terribly expensive, although it does bounce the
>>>>> clp->cl_lock around a bit.
>>>>> 
>>>>> [ At some point we should rationalize the use of the nfs_server
>>>>> ->destroy method. ]
>>>>> 
>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>> ---
>>>> <snip> 
>>>>> +
>>>>> +static void nfs4_gc_state_owners(struct nfs_server *server)
>>>>> +{
>>>>> +	struct nfs_client *clp = server->nfs_client;
>>>>> +	struct nfs4_state_owner *sp, *tmp;
>>>>> +	unsigned long now = jiffies;
>>>>> +	LIST_HEAD(doomed);
>>>>> +
>>>>> +	spin_lock(&clp->cl_lock);
>>>>> +	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
>>>>> +		/* NB: LRU is sorted so that oldest is at the head */
>>>>> +		if (time_in_range(now, sp->so_expires,
>>>>> +				sp->so_expires + clp->cl_lease_time))
>>>>> +			break;
>>>>> +		list_move(&sp->so_lru, &doomed);
>>>>> +	}
>>>>> +	spin_unlock(&clp->cl_lock);
>>>> 
>>>> There is a race here: the state owner is still visible in the rb-tree,
>>>> and so a second thread that calls nfs4_get_state_owner() may still pick
>>>> up one of these open owners that are now on your 'doomed' list.
>>>> 
>>>> I think you rather need to do the equivalent of an
>>>> 'nfs4_drop_state_owner' in your loop above in order to be 100% safe.
>>>> 
>>>>> +	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
>>>>> +		list_del_init(&sp->so_lru);
>>>>> +		nfs4_release_state_owner(sp);
>>>>> +	}
>>>>> +}
>>>>> +
>>>>> /**
>>>>> * nfs4_get_state_owner - Look up a state owner given a credential
>>>>> * @server: nfs_server to search
>>>>> @@ -483,11 +521,13 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>>>>> 	struct nfs_client *clp = server->nfs_client;
>>>>> 	struct nfs4_state_owner *sp, *new;
>>>>> 
>>>>> +	nfs4_gc_state_owners(server);
>>>>> +
>>> 
>>> One more thing: doesn't it make more sense to do the garbage collection
>>> after you've looked up the state owner? Even if the open owner has been
>>> expired on the server, you will still gain by avoiding the extra
>>> allocation.
>> 
>> I think "GC first" is more fair.  If you really want to make sure an idle open owner is not re-used passed the sell-by, we have to do GC first.  Otherwise an open owner can languish for days if there's no other activity, and will get re-used on the next open.
> 
> Why do we care about fairness here? The only use for the garbage
> collector is to ensure that state owners that refer to credentials that
> are no longer in use get kicked out.  I don't see why we should throw out
> an open owner that matches our search criteria.

You can't avoid that.  It's easy to find some workload where "GC last" tosses a state owner right before an application wants to reuse it.  We will always have this problem as long as we garbage collect.

Right now the client is maximally aggressive about tossing state owners.  That is an existence proof that it's really not critical if we throw these away a little more aggressively than is optimal.  It would be much less conservative to allow some state owners to live a long time ("GC last" does not guarantee a lifetime upper bound), since we've never allowed a state owner to live a long time before.

>> It seems to me that on-the-wire behavior would be inconsistent if some open owners were re-used after a long idle, some were not, and the precise re-use behavior depended on workload.  "GC first" gives nicely predictable behavior.
> 
> Inconsistent how?

I think I explained it already.  Some state owners get re-used, some don't, and there's no real pattern.  It's an invitation to unwanted load-dependent behavior (he said, having spent the past month tracking down unwanted load-dependent behavior).  I don't have something specific in mind at the moment.

> An open owner is just a label and a counter. The
> server isn't going to be using it for anything other than figuring out
> ordering semantics (minor version 0 only) and share lock semantics.

As you state above, on our client state owners actually pin resources.  They are more than a label and a counter.

If you prefer "GC last" I can change my patch (or you can).  I think "GC first" is a much cleaner implementation.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/2] NFS: Cache state owners after files are closed
  2012-01-04 17:07           ` Chuck Lever
@ 2012-01-04 17:14             ` Trond Myklebust
  2012-01-04 21:52               ` [PATCH] NFS: Fix a few issues with cached state owners Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 17:14 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Wed, 2012-01-04 at 12:07 -0500, Chuck Lever wrote: 
> On Jan 4, 2012, at 11:14 AM, Trond Myklebust wrote:
> As you state above, on our client state owners actually pin resources.  They are more than a label and a counter.
> 
> If you prefer "GC last" I can change my patch (or you can).

Please do. It is slightly more efficient in the corner case I pointed
out, and should leave all other cases unaffected...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] NFS: Fix a few issues with cached state owners
  2012-01-04 17:14             ` Trond Myklebust
@ 2012-01-04 21:52               ` Trond Myklebust
  2012-01-04 21:56                 ` Chuck Lever
  0 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 21:52 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

The garbage collector needs to remove the entry from the lru list
_and_ the red-black tree atomically in order avoid races in
nfs4_get_state_owner.

Fix a case in nfs4_get_state_owner in which the garbage-collector
list was being manipulated while not holding the appropriate spin
lock.

nfs4_drop_state_owner doesn't need to look at sp->so_lru: the caller
must have a reference to the state owner, so it can't be on the
garbage-collection list.

Run the garbage collector _after_ we've looked up the state owner
for efficiency reasons.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/nfs4state.c |   23 ++++++++++++++++-------
 1 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index b9ade99..dc78e0e 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -389,6 +389,8 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
 		else if (cred > sp->so_cred)
 			p = &parent->rb_right;
 		else {
+			if (!list_empty(&sp->so_lru))
+				list_del_init(&sp->so_lru);
 			atomic_inc(&sp->so_count);
 			return sp;
 		}
@@ -413,6 +415,8 @@ nfs4_insert_state_owner_locked(struct nfs4_state_owner *new)
 		else if (new->so_cred > sp->so_cred)
 			p = &parent->rb_right;
 		else {
+			if (!list_empty(&sp->so_lru))
+				list_del_init(&sp->so_lru);
 			atomic_inc(&sp->so_count);
 			return sp;
 		}
@@ -459,6 +463,13 @@ nfs4_alloc_state_owner(void)
 }
 
 static void
+nfs4_drop_state_owner_locked(struct nfs_server *server, struct nfs4_state_owner *sp)
+{
+	rb_erase(&sp->so_server_node, &server->state_owners);
+	RB_CLEAR_NODE(&sp->so_server_node);
+}
+
+static void
 nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 {
 	if (!RB_EMPTY_NODE(&sp->so_server_node)) {
@@ -466,9 +477,8 @@ nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 		struct nfs_client *clp = server->nfs_client;
 
 		spin_lock(&clp->cl_lock);
-		list_del_init(&sp->so_lru);
-		rb_erase(&sp->so_server_node, &server->state_owners);
-		RB_CLEAR_NODE(&sp->so_server_node);
+		if (!RB_EMPTY_NODE(&sp->so_server_node))
+			nfs4_drop_state_owner_locked(server, sp);
 		spin_unlock(&clp->cl_lock);
 	}
 }
@@ -499,6 +509,7 @@ static void nfs4_gc_state_owners(struct nfs_server *server)
 				sp->so_expires + clp->cl_lease_time))
 			break;
 		list_move(&sp->so_lru, &doomed);
+		nfs4_drop_state_owner_locked(server, sp);
 	}
 	spin_unlock(&clp->cl_lock);
 
@@ -521,8 +532,6 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
 	struct nfs_client *clp = server->nfs_client;
 	struct nfs4_state_owner *sp, *new;
 
-	nfs4_gc_state_owners(server);
-
 	spin_lock(&clp->cl_lock);
 	sp = nfs4_find_state_owner_locked(server, cred);
 	spin_unlock(&clp->cl_lock);
@@ -530,7 +539,7 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
 		goto out;
 	new = nfs4_alloc_state_owner();
 	if (new == NULL)
-		return NULL;
+		goto out;
 	new->so_server = server;
 	new->so_cred = cred;
 	spin_lock(&clp->cl_lock);
@@ -543,7 +552,7 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
 		kfree(new);
 	}
 out:
-	list_del_init(&sp->so_lru);
+	nfs4_gc_state_owners(server);
 	return sp;
 }
 
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] NFS: Fix a few issues with cached state owners
  2012-01-04 21:52               ` [PATCH] NFS: Fix a few issues with cached state owners Trond Myklebust
@ 2012-01-04 21:56                 ` Chuck Lever
  2012-01-04 21:58                   ` Chuck Lever
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2012-01-04 21:56 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


On Jan 4, 2012, at 4:52 PM, Trond Myklebust wrote:

> The garbage collector needs to remove the entry from the lru list
> _and_ the red-black tree atomically in order avoid races in
> nfs4_get_state_owner.
> 
> Fix a case in nfs4_get_state_owner in which the garbage-collector
> list was being manipulated while not holding the appropriate spin
> lock.
> 
> nfs4_drop_state_owner doesn't need to look at sp->so_lru: the caller
> must have a reference to the state owner, so it can't be on the
> garbage-collection list.
> 
> Run the garbage collector _after_ we've looked up the state owner
> for efficiency reasons.
> 
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

> ---
> fs/nfs/nfs4state.c |   23 ++++++++++++++++-------
> 1 files changed, 16 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index b9ade99..dc78e0e 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -389,6 +389,8 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
> 		else if (cred > sp->so_cred)
> 			p = &parent->rb_right;
> 		else {
> +			if (!list_empty(&sp->so_lru))
> +				list_del_init(&sp->so_lru);
> 			atomic_inc(&sp->so_count);
> 			return sp;
> 		}
> @@ -413,6 +415,8 @@ nfs4_insert_state_owner_locked(struct nfs4_state_owner *new)
> 		else if (new->so_cred > sp->so_cred)
> 			p = &parent->rb_right;
> 		else {
> +			if (!list_empty(&sp->so_lru))
> +				list_del_init(&sp->so_lru);
> 			atomic_inc(&sp->so_count);
> 			return sp;
> 		}
> @@ -459,6 +463,13 @@ nfs4_alloc_state_owner(void)
> }
> 
> static void
> +nfs4_drop_state_owner_locked(struct nfs_server *server, struct nfs4_state_owner *sp)
> +{
> +	rb_erase(&sp->so_server_node, &server->state_owners);
> +	RB_CLEAR_NODE(&sp->so_server_node);
> +}
> +
> +static void
> nfs4_drop_state_owner(struct nfs4_state_owner *sp)
> {
> 	if (!RB_EMPTY_NODE(&sp->so_server_node)) {
> @@ -466,9 +477,8 @@ nfs4_drop_state_owner(struct nfs4_state_owner *sp)
> 		struct nfs_client *clp = server->nfs_client;
> 
> 		spin_lock(&clp->cl_lock);
> -		list_del_init(&sp->so_lru);
> -		rb_erase(&sp->so_server_node, &server->state_owners);
> -		RB_CLEAR_NODE(&sp->so_server_node);
> +		if (!RB_EMPTY_NODE(&sp->so_server_node))
> +			nfs4_drop_state_owner_locked(server, sp);
> 		spin_unlock(&clp->cl_lock);
> 	}
> }
> @@ -499,6 +509,7 @@ static void nfs4_gc_state_owners(struct nfs_server *server)
> 				sp->so_expires + clp->cl_lease_time))
> 			break;
> 		list_move(&sp->so_lru, &doomed);
> +		nfs4_drop_state_owner_locked(server, sp);
> 	}
> 	spin_unlock(&clp->cl_lock);
> 
> @@ -521,8 +532,6 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> 	struct nfs_client *clp = server->nfs_client;
> 	struct nfs4_state_owner *sp, *new;
> 
> -	nfs4_gc_state_owners(server);
> -
> 	spin_lock(&clp->cl_lock);
> 	sp = nfs4_find_state_owner_locked(server, cred);
> 	spin_unlock(&clp->cl_lock);
> @@ -530,7 +539,7 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> 		goto out;
> 	new = nfs4_alloc_state_owner();
> 	if (new == NULL)
> -		return NULL;
> +		goto out;
> 	new->so_server = server;
> 	new->so_cred = cred;
> 	spin_lock(&clp->cl_lock);
> @@ -543,7 +552,7 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> 		kfree(new);
> 	}
> out:
> -	list_del_init(&sp->so_lru);
> +	nfs4_gc_state_owners(server);
> 	return sp;
> }
> 
> -- 
> 1.7.7.5
> 

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] NFS: Fix a few issues with cached state owners
  2012-01-04 21:56                 ` Chuck Lever
@ 2012-01-04 21:58                   ` Chuck Lever
  2012-01-04 22:03                     ` Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2012-01-04 21:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


On Jan 4, 2012, at 4:56 PM, Chuck Lever wrote:

> 
> On Jan 4, 2012, at 4:52 PM, Trond Myklebust wrote:
> 
>> The garbage collector needs to remove the entry from the lru list
>> _and_ the red-black tree atomically in order avoid races in
>> nfs4_get_state_owner.
>> 
>> Fix a case in nfs4_get_state_owner in which the garbage-collector
>> list was being manipulated while not holding the appropriate spin
>> lock.
>> 
>> nfs4_drop_state_owner doesn't need to look at sp->so_lru: the caller
>> must have a reference to the state owner, so it can't be on the
>> garbage-collection list.
>> 
>> Run the garbage collector _after_ we've looked up the state owner
>> for efficiency reasons.
>> 
>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

You might consider squashing this with my patch.  The list_del_init() outside the cl_lock is enough to warrant it.

> 
>> ---
>> fs/nfs/nfs4state.c |   23 ++++++++++++++++-------
>> 1 files changed, 16 insertions(+), 7 deletions(-)
>> 
>> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
>> index b9ade99..dc78e0e 100644
>> --- a/fs/nfs/nfs4state.c
>> +++ b/fs/nfs/nfs4state.c
>> @@ -389,6 +389,8 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
>> 		else if (cred > sp->so_cred)
>> 			p = &parent->rb_right;
>> 		else {
>> +			if (!list_empty(&sp->so_lru))
>> +				list_del_init(&sp->so_lru);
>> 			atomic_inc(&sp->so_count);
>> 			return sp;
>> 		}
>> @@ -413,6 +415,8 @@ nfs4_insert_state_owner_locked(struct nfs4_state_owner *new)
>> 		else if (new->so_cred > sp->so_cred)
>> 			p = &parent->rb_right;
>> 		else {
>> +			if (!list_empty(&sp->so_lru))
>> +				list_del_init(&sp->so_lru);
>> 			atomic_inc(&sp->so_count);
>> 			return sp;
>> 		}
>> @@ -459,6 +463,13 @@ nfs4_alloc_state_owner(void)
>> }
>> 
>> static void
>> +nfs4_drop_state_owner_locked(struct nfs_server *server, struct nfs4_state_owner *sp)
>> +{
>> +	rb_erase(&sp->so_server_node, &server->state_owners);
>> +	RB_CLEAR_NODE(&sp->so_server_node);
>> +}
>> +
>> +static void
>> nfs4_drop_state_owner(struct nfs4_state_owner *sp)
>> {
>> 	if (!RB_EMPTY_NODE(&sp->so_server_node)) {
>> @@ -466,9 +477,8 @@ nfs4_drop_state_owner(struct nfs4_state_owner *sp)
>> 		struct nfs_client *clp = server->nfs_client;
>> 
>> 		spin_lock(&clp->cl_lock);
>> -		list_del_init(&sp->so_lru);
>> -		rb_erase(&sp->so_server_node, &server->state_owners);
>> -		RB_CLEAR_NODE(&sp->so_server_node);
>> +		if (!RB_EMPTY_NODE(&sp->so_server_node))
>> +			nfs4_drop_state_owner_locked(server, sp);
>> 		spin_unlock(&clp->cl_lock);
>> 	}
>> }
>> @@ -499,6 +509,7 @@ static void nfs4_gc_state_owners(struct nfs_server *server)
>> 				sp->so_expires + clp->cl_lease_time))
>> 			break;
>> 		list_move(&sp->so_lru, &doomed);
>> +		nfs4_drop_state_owner_locked(server, sp);
>> 	}
>> 	spin_unlock(&clp->cl_lock);
>> 
>> @@ -521,8 +532,6 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>> 	struct nfs_client *clp = server->nfs_client;
>> 	struct nfs4_state_owner *sp, *new;
>> 
>> -	nfs4_gc_state_owners(server);
>> -
>> 	spin_lock(&clp->cl_lock);
>> 	sp = nfs4_find_state_owner_locked(server, cred);
>> 	spin_unlock(&clp->cl_lock);
>> @@ -530,7 +539,7 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>> 		goto out;
>> 	new = nfs4_alloc_state_owner();
>> 	if (new == NULL)
>> -		return NULL;
>> +		goto out;
>> 	new->so_server = server;
>> 	new->so_cred = cred;
>> 	spin_lock(&clp->cl_lock);
>> @@ -543,7 +552,7 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
>> 		kfree(new);
>> 	}
>> out:
>> -	list_del_init(&sp->so_lru);
>> +	nfs4_gc_state_owners(server);
>> 	return sp;
>> }
>> 
>> -- 
>> 1.7.7.5
>> 
> 
> -- 
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 
> 

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] NFS: Fix a few issues with cached state owners
  2012-01-04 21:58                   ` Chuck Lever
@ 2012-01-04 22:03                     ` Trond Myklebust
  2012-01-04 22:07                       ` Chuck Lever
  0 siblings, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 22:03 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Wed, 2012-01-04 at 16:58 -0500, Chuck Lever wrote: 
> On Jan 4, 2012, at 4:56 PM, Chuck Lever wrote:
> 
> > 
> > On Jan 4, 2012, at 4:52 PM, Trond Myklebust wrote:
> > 
> >> The garbage collector needs to remove the entry from the lru list
> >> _and_ the red-black tree atomically in order avoid races in
> >> nfs4_get_state_owner.
> >> 
> >> Fix a case in nfs4_get_state_owner in which the garbage-collector
> >> list was being manipulated while not holding the appropriate spin
> >> lock.
> >> 
> >> nfs4_drop_state_owner doesn't need to look at sp->so_lru: the caller
> >> must have a reference to the state owner, so it can't be on the
> >> garbage-collection list.
> >> 
> >> Run the garbage collector _after_ we've looked up the state owner
> >> for efficiency reasons.
> >> 
> >> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> > 
> > Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
> 
> You might consider squashing this with my patch.  The list_del_init() outside the cl_lock is enough to warrant it.

Sure, as long as you're fine with that.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] NFS: Fix a few issues with cached state owners
  2012-01-04 22:03                     ` Trond Myklebust
@ 2012-01-04 22:07                       ` Chuck Lever
  2012-01-04 22:27                         ` [PATCH] NFS: Cache state owners after files are closed Trond Myklebust
  0 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2012-01-04 22:07 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


On Jan 4, 2012, at 5:03 PM, Trond Myklebust wrote:

> On Wed, 2012-01-04 at 16:58 -0500, Chuck Lever wrote: 
>> On Jan 4, 2012, at 4:56 PM, Chuck Lever wrote:
>> 
>>> 
>>> On Jan 4, 2012, at 4:52 PM, Trond Myklebust wrote:
>>> 
>>>> The garbage collector needs to remove the entry from the lru list
>>>> _and_ the red-black tree atomically in order avoid races in
>>>> nfs4_get_state_owner.
>>>> 
>>>> Fix a case in nfs4_get_state_owner in which the garbage-collector
>>>> list was being manipulated while not holding the appropriate spin
>>>> lock.
>>>> 
>>>> nfs4_drop_state_owner doesn't need to look at sp->so_lru: the caller
>>>> must have a reference to the state owner, so it can't be on the
>>>> garbage-collection list.
>>>> 
>>>> Run the garbage collector _after_ we've looked up the state owner
>>>> for efficiency reasons.
>>>> 
>>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>> 
>>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>> 
>> You might consider squashing this with my patch.  The list_del_init() outside the cl_lock is enough to warrant it.
> 
> Sure, as long as you're fine with that.

Usually I like to see who did what, but I think this should be one patch for better bisecting behavior and to make it easy for distributors to find and apply this fix.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH] NFS: Cache state owners after files are closed
  2012-01-04 22:07                       ` Chuck Lever
@ 2012-01-04 22:27                         ` Trond Myklebust
  2012-01-04 22:29                           ` Trond Myklebust
  2012-01-04 23:10                           ` Chuck Lever
  0 siblings, 2 replies; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 22:27 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

From: Chuck Lever <chuck.lever@oracle.com>

Servers have a finite amount of memory to store NFSv4 open and lock
owners.  Moreover, servers may have a difficult time determining when
they can reap their state owner table, thanks to gray areas in the
NFSv4 protocol specification.  Thus clients should be careful to reuse
state owners when possible.

Currently Linux is not too careful.  When a user has closed all her
files on one mount point, the state owner's reference count goes to
zero, and it is released.  The next OPEN allocates a new one.  A
workload that serially opens and closes files can run through a large
number of open owners this way.

When a state owner's reference count goes to zero, slap it onto a free
list for that nfs_server, with an expiry time.  Garbage collect before
looking for a state owner.  This makes state owners for active users
available for re-use.

Now that there can be unused state owners remaining at umount time,
purge the state owner free list when a server is destroyed.  Also be
sure not to reclaim unused state owners during state recovery.

This change has benefits for the client as well.  For some workloads,
this approach drops the number of OPEN_CONFIRM calls from the same as
the number of OPEN calls, down to just one.  This reduces wire traffic
and thus open(2) latency.  Before this patch, untarring a kernel
source tarball shows the OPEN_CONFIRM call counter steadily increasing
through the test.  With the patch, the OPEN_CONFIRM count remains at 1
throughout the entire untar.

As long as the expiry time is kept short, I don't think garbage
collection should be terribly expensive, although it does bounce the
clp->cl_lock around a bit.

[ At some point we should rationalize the use of the nfs_server
->destroy method. ]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Trond: Fixed a garbage collection race and a few efficiency issues]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 fs/nfs/client.c           |    8 ++++
 fs/nfs/nfs4_fs.h          |    3 ++
 fs/nfs/nfs4state.c        |   88 ++++++++++++++++++++++++++++++++++++++++-----
 include/linux/nfs_fs_sb.h |    1 +
 4 files changed, 91 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 32ea371..41bd67f 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -250,6 +250,11 @@ static void pnfs_init_server(struct nfs_server *server)
 	rpc_init_wait_queue(&server->roc_rpcwaitq, "pNFS ROC");
 }
 
+static void nfs4_destroy_server(struct nfs_server *server)
+{
+	nfs4_purge_state_owners(server);
+}
+
 #else
 static void nfs4_shutdown_client(struct nfs_client *clp)
 {
@@ -1065,6 +1070,7 @@ static struct nfs_server *nfs_alloc_server(void)
 	INIT_LIST_HEAD(&server->master_link);
 	INIT_LIST_HEAD(&server->delegations);
 	INIT_LIST_HEAD(&server->layouts);
+	INIT_LIST_HEAD(&server->state_owners_lru);
 
 	atomic_set(&server->active, 0);
 
@@ -1538,6 +1544,7 @@ static int nfs4_server_common_setup(struct nfs_server *server,
 
 	nfs_server_insert_lists(server);
 	server->mount_time = jiffies;
+	server->destroy = nfs4_destroy_server;
 out:
 	nfs_free_fattr(fattr);
 	return error;
@@ -1719,6 +1726,7 @@ struct nfs_server *nfs_clone_server(struct nfs_server *source,
 
 	/* Copy data from the source */
 	server->nfs_client = source->nfs_client;
+	server->destroy = source->destroy;
 	atomic_inc(&server->nfs_client->cl_count);
 	nfs_server_copy_userdata(server, source);
 
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 693ae22..4d7d0ae 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -94,6 +94,8 @@ struct nfs_unique_id {
 struct nfs4_state_owner {
 	struct nfs_unique_id so_owner_id;
 	struct nfs_server    *so_server;
+	struct list_head     so_lru;
+	unsigned long        so_expires;
 	struct rb_node	     so_server_node;
 
 	struct rpc_cred	     *so_cred;	 /* Associated cred */
@@ -319,6 +321,7 @@ static inline void nfs4_schedule_session_recovery(struct nfs4_session *session)
 
 extern struct nfs4_state_owner * nfs4_get_state_owner(struct nfs_server *, struct rpc_cred *);
 extern void nfs4_put_state_owner(struct nfs4_state_owner *);
+extern void nfs4_purge_state_owners(struct nfs_server *);
 extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
 extern void nfs4_put_open_state(struct nfs4_state *);
 extern void nfs4_close_state(struct nfs4_state *, fmode_t);
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index 6354e4f..8aadb38 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -49,6 +49,7 @@
 #include <linux/ratelimit.h>
 #include <linux/workqueue.h>
 #include <linux/bitops.h>
+#include <linux/jiffies.h>
 
 #include "nfs4_fs.h"
 #include "callback.h"
@@ -388,6 +389,8 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
 		else if (cred > sp->so_cred)
 			p = &parent->rb_right;
 		else {
+			if (!list_empty(&sp->so_lru))
+				list_del_init(&sp->so_lru);
 			atomic_inc(&sp->so_count);
 			return sp;
 		}
@@ -412,6 +415,8 @@ nfs4_insert_state_owner_locked(struct nfs4_state_owner *new)
 		else if (new->so_cred > sp->so_cred)
 			p = &parent->rb_right;
 		else {
+			if (!list_empty(&sp->so_lru))
+				list_del_init(&sp->so_lru);
 			atomic_inc(&sp->so_count);
 			return sp;
 		}
@@ -453,6 +458,7 @@ nfs4_alloc_state_owner(void)
 	spin_lock_init(&sp->so_sequence.lock);
 	INIT_LIST_HEAD(&sp->so_sequence.list);
 	atomic_set(&sp->so_count, 1);
+	INIT_LIST_HEAD(&sp->so_lru);
 	return sp;
 }
 
@@ -470,6 +476,37 @@ nfs4_drop_state_owner(struct nfs4_state_owner *sp)
 	}
 }
 
+static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
+{
+	rpc_destroy_wait_queue(&sp->so_sequence.wait);
+	put_rpccred(sp->so_cred);
+	kfree(sp);
+}
+
+static void nfs4_gc_state_owners(struct nfs_server *server)
+{
+	struct nfs_client *clp = server->nfs_client;
+	struct nfs4_state_owner *sp, *tmp;
+	unsigned long now = jiffies;
+	LIST_HEAD(doomed);
+
+	spin_lock(&clp->cl_lock);
+	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
+		/* NB: LRU is sorted so that oldest is at the head */
+		if (time_in_range(now, sp->so_expires,
+				sp->so_expires + clp->cl_lease_time))
+			break;
+		list_move(&sp->so_lru, &doomed);
+		nfs4_remove_state_owner_locked(sp);
+	}
+	spin_unlock(&clp->cl_lock);
+
+	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
+		list_del_init(&sp->so_lru);
+		nfs4_free_state_owner(sp);
+	}
+}
+
 /**
  * nfs4_get_state_owner - Look up a state owner given a credential
  * @server: nfs_server to search
@@ -487,10 +524,10 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
 	sp = nfs4_find_state_owner_locked(server, cred);
 	spin_unlock(&clp->cl_lock);
 	if (sp != NULL)
-		return sp;
+		goto out;
 	new = nfs4_alloc_state_owner();
 	if (new == NULL)
-		return NULL;
+		goto out;
 	new->so_server = server;
 	new->so_cred = cred;
 	spin_lock(&clp->cl_lock);
@@ -502,26 +539,58 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
 		rpc_destroy_wait_queue(&new->so_sequence.wait);
 		kfree(new);
 	}
+out:
+	nfs4_gc_state_owners(server);
 	return sp;
 }
 
 /**
  * nfs4_put_state_owner - Release a nfs4_state_owner
  * @sp: state owner data to release
- *
  */
 void nfs4_put_state_owner(struct nfs4_state_owner *sp)
 {
-	struct nfs_client *clp = sp->so_server->nfs_client;
-	struct rpc_cred *cred = sp->so_cred;
+	struct nfs_server *server = sp->so_server;
+	struct nfs_client *clp = server->nfs_client;
 
 	if (!atomic_dec_and_lock(&sp->so_count, &clp->cl_lock))
 		return;
-	nfs4_remove_state_owner_locked(sp);
+
+	if (!RB_EMPTY_NODE(&sp->so_server_node)) {
+		sp->so_expires = jiffies;
+		list_move_tail(&sp->so_lru, &server->state_owners_lru);
+		spin_unlock(&clp->cl_lock);
+	} else {
+		nfs4_remove_state_owner_locked(sp);
+		spin_unlock(&clp->cl_lock);
+		nfs4_free_state_owner(sp);
+	}
+}
+
+/**
+ * nfs4_purge_state_owners - Release all cached state owners
+ * @server: nfs_server with cached state owners to release
+ *
+ * Called at umount time.  Remaining state owners will be on
+ * the LRU with ref count of zero.
+ */
+void nfs4_purge_state_owners(struct nfs_server *server)
+{
+	struct nfs_client *clp = server->nfs_client;
+	struct nfs4_state_owner *sp, *tmp;
+	LIST_HEAD(doomed);
+
+	spin_lock(&clp->cl_lock);
+	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
+		list_move(&sp->so_lru, &doomed);
+		nfs4_remove_state_owner_locked(sp);
+	}
 	spin_unlock(&clp->cl_lock);
-	rpc_destroy_wait_queue(&sp->so_sequence.wait);
-	put_rpccred(cred);
-	kfree(sp);
+
+	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
+		list_del_init(&sp->so_lru);
+		nfs4_free_state_owner(sp);
+	}
 }
 
 static struct nfs4_state *
@@ -1393,6 +1462,7 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
 restart:
 	rcu_read_lock();
 	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
+		nfs4_purge_state_owners(server);
 		spin_lock(&clp->cl_lock);
 		for (pos = rb_first(&server->state_owners);
 		     pos != NULL;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index b5479df..ba4d765 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -153,6 +153,7 @@ struct nfs_server {
 	struct rb_root		openowner_id;
 	struct rb_root		lockowner_id;
 #endif
+	struct list_head	state_owners_lru;
 	struct list_head	layouts;
 	struct list_head	delegations;
 	void (*destroy)(struct nfs_server *);
-- 
1.7.7.5


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] NFS: Cache state owners after files are closed
  2012-01-04 22:27                         ` [PATCH] NFS: Cache state owners after files are closed Trond Myklebust
@ 2012-01-04 22:29                           ` Trond Myklebust
  2012-01-04 22:39                             ` Chuck Lever
  2012-01-04 23:10                           ` Chuck Lever
  1 sibling, 1 reply; 18+ messages in thread
From: Trond Myklebust @ 2012-01-04 22:29 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-nfs

On Wed, 2012-01-04 at 17:27 -0500, Trond Myklebust wrote: 
> [Trond: Fixed a garbage collection race and a few efficiency issues]
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

As you can see from the resulting patch, I found a couple more
optimisations: mainly to do with optimising away the dropping and then
retaking of the client->cl_lock...

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] NFS: Cache state owners after files are closed
  2012-01-04 22:29                           ` Trond Myklebust
@ 2012-01-04 22:39                             ` Chuck Lever
  0 siblings, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2012-01-04 22:39 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


On Jan 4, 2012, at 5:29 PM, Trond Myklebust wrote:

> On Wed, 2012-01-04 at 17:27 -0500, Trond Myklebust wrote: 
>> [Trond: Fixed a garbage collection race and a few efficiency issues]
>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> 
> As you can see from the resulting patch, I found a couple more
> optimisations: mainly to do with optimising away the dropping and then
> retaking of the client->cl_lock...

Noted.  Good to eliminate more contention on that lock.  I'll try it out.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] NFS: Cache state owners after files are closed
  2012-01-04 22:27                         ` [PATCH] NFS: Cache state owners after files are closed Trond Myklebust
  2012-01-04 22:29                           ` Trond Myklebust
@ 2012-01-04 23:10                           ` Chuck Lever
  1 sibling, 0 replies; 18+ messages in thread
From: Chuck Lever @ 2012-01-04 23:10 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs


On Jan 4, 2012, at 5:27 PM, Trond Myklebust wrote:

> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Servers have a finite amount of memory to store NFSv4 open and lock
> owners.  Moreover, servers may have a difficult time determining when
> they can reap their state owner table, thanks to gray areas in the
> NFSv4 protocol specification.  Thus clients should be careful to reuse
> state owners when possible.
> 
> Currently Linux is not too careful.  When a user has closed all her
> files on one mount point, the state owner's reference count goes to
> zero, and it is released.  The next OPEN allocates a new one.  A
> workload that serially opens and closes files can run through a large
> number of open owners this way.
> 
> When a state owner's reference count goes to zero, slap it onto a free
> list for that nfs_server, with an expiry time.  Garbage collect before

^before$^when

You get the idea.

> looking for a state owner.  This makes state owners for active users
> available for re-use.
> 
> Now that there can be unused state owners remaining at umount time,
> purge the state owner free list when a server is destroyed.  Also be
> sure not to reclaim unused state owners during state recovery.
> 
> This change has benefits for the client as well.  For some workloads,
> this approach drops the number of OPEN_CONFIRM calls from the same as
> the number of OPEN calls, down to just one.  This reduces wire traffic
> and thus open(2) latency.  Before this patch, untarring a kernel
> source tarball shows the OPEN_CONFIRM call counter steadily increasing
> through the test.  With the patch, the OPEN_CONFIRM count remains at 1
> throughout the entire untar.
> 
> As long as the expiry time is kept short, I don't think garbage
> collection should be terribly expensive, although it does bounce the
> clp->cl_lock around a bit.
> 
> [ At some point we should rationalize the use of the nfs_server
> ->destroy method. ]
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> [Trond: Fixed a garbage collection race and a few efficiency issues]
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> ---
> fs/nfs/client.c           |    8 ++++
> fs/nfs/nfs4_fs.h          |    3 ++
> fs/nfs/nfs4state.c        |   88 ++++++++++++++++++++++++++++++++++++++++-----
> include/linux/nfs_fs_sb.h |    1 +
> 4 files changed, 91 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 32ea371..41bd67f 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -250,6 +250,11 @@ static void pnfs_init_server(struct nfs_server *server)
> 	rpc_init_wait_queue(&server->roc_rpcwaitq, "pNFS ROC");
> }
> 
> +static void nfs4_destroy_server(struct nfs_server *server)
> +{
> +	nfs4_purge_state_owners(server);
> +}
> +
> #else
> static void nfs4_shutdown_client(struct nfs_client *clp)
> {
> @@ -1065,6 +1070,7 @@ static struct nfs_server *nfs_alloc_server(void)
> 	INIT_LIST_HEAD(&server->master_link);
> 	INIT_LIST_HEAD(&server->delegations);
> 	INIT_LIST_HEAD(&server->layouts);
> +	INIT_LIST_HEAD(&server->state_owners_lru);
> 
> 	atomic_set(&server->active, 0);
> 
> @@ -1538,6 +1544,7 @@ static int nfs4_server_common_setup(struct nfs_server *server,
> 
> 	nfs_server_insert_lists(server);
> 	server->mount_time = jiffies;
> +	server->destroy = nfs4_destroy_server;
> out:
> 	nfs_free_fattr(fattr);
> 	return error;
> @@ -1719,6 +1726,7 @@ struct nfs_server *nfs_clone_server(struct nfs_server *source,
> 
> 	/* Copy data from the source */
> 	server->nfs_client = source->nfs_client;
> +	server->destroy = source->destroy;
> 	atomic_inc(&server->nfs_client->cl_count);
> 	nfs_server_copy_userdata(server, source);
> 
> diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
> index 693ae22..4d7d0ae 100644
> --- a/fs/nfs/nfs4_fs.h
> +++ b/fs/nfs/nfs4_fs.h
> @@ -94,6 +94,8 @@ struct nfs_unique_id {
> struct nfs4_state_owner {
> 	struct nfs_unique_id so_owner_id;
> 	struct nfs_server    *so_server;
> +	struct list_head     so_lru;
> +	unsigned long        so_expires;
> 	struct rb_node	     so_server_node;
> 
> 	struct rpc_cred	     *so_cred;	 /* Associated cred */
> @@ -319,6 +321,7 @@ static inline void nfs4_schedule_session_recovery(struct nfs4_session *session)
> 
> extern struct nfs4_state_owner * nfs4_get_state_owner(struct nfs_server *, struct rpc_cred *);
> extern void nfs4_put_state_owner(struct nfs4_state_owner *);
> +extern void nfs4_purge_state_owners(struct nfs_server *);
> extern struct nfs4_state * nfs4_get_open_state(struct inode *, struct nfs4_state_owner *);
> extern void nfs4_put_open_state(struct nfs4_state *);
> extern void nfs4_close_state(struct nfs4_state *, fmode_t);
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index 6354e4f..8aadb38 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -49,6 +49,7 @@
> #include <linux/ratelimit.h>
> #include <linux/workqueue.h>
> #include <linux/bitops.h>
> +#include <linux/jiffies.h>
> 
> #include "nfs4_fs.h"
> #include "callback.h"
> @@ -388,6 +389,8 @@ nfs4_find_state_owner_locked(struct nfs_server *server, struct rpc_cred *cred)
> 		else if (cred > sp->so_cred)
> 			p = &parent->rb_right;
> 		else {
> +			if (!list_empty(&sp->so_lru))
> +				list_del_init(&sp->so_lru);
> 			atomic_inc(&sp->so_count);
> 			return sp;
> 		}
> @@ -412,6 +415,8 @@ nfs4_insert_state_owner_locked(struct nfs4_state_owner *new)
> 		else if (new->so_cred > sp->so_cred)
> 			p = &parent->rb_right;
> 		else {
> +			if (!list_empty(&sp->so_lru))
> +				list_del_init(&sp->so_lru);
> 			atomic_inc(&sp->so_count);
> 			return sp;
> 		}
> @@ -453,6 +458,7 @@ nfs4_alloc_state_owner(void)
> 	spin_lock_init(&sp->so_sequence.lock);
> 	INIT_LIST_HEAD(&sp->so_sequence.list);
> 	atomic_set(&sp->so_count, 1);
> +	INIT_LIST_HEAD(&sp->so_lru);
> 	return sp;
> }
> 
> @@ -470,6 +476,37 @@ nfs4_drop_state_owner(struct nfs4_state_owner *sp)
> 	}
> }
> 
> +static void nfs4_free_state_owner(struct nfs4_state_owner *sp)
> +{
> +	rpc_destroy_wait_queue(&sp->so_sequence.wait);
> +	put_rpccred(sp->so_cred);
> +	kfree(sp);
> +}
> +
> +static void nfs4_gc_state_owners(struct nfs_server *server)
> +{
> +	struct nfs_client *clp = server->nfs_client;
> +	struct nfs4_state_owner *sp, *tmp;
> +	unsigned long now = jiffies;
> +	LIST_HEAD(doomed);
> +
> +	spin_lock(&clp->cl_lock);
> +	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
> +		/* NB: LRU is sorted so that oldest is at the head */
> +		if (time_in_range(now, sp->so_expires,
> +				sp->so_expires + clp->cl_lease_time))
> +			break;
> +		list_move(&sp->so_lru, &doomed);
> +		nfs4_remove_state_owner_locked(sp);
> +	}
> +	spin_unlock(&clp->cl_lock);
> +
> +	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
> +		list_del_init(&sp->so_lru);
> +		nfs4_free_state_owner(sp);
> +	}
> +}
> +
> /**
>  * nfs4_get_state_owner - Look up a state owner given a credential
>  * @server: nfs_server to search
> @@ -487,10 +524,10 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> 	sp = nfs4_find_state_owner_locked(server, cred);
> 	spin_unlock(&clp->cl_lock);
> 	if (sp != NULL)
> -		return sp;
> +		goto out;
> 	new = nfs4_alloc_state_owner();
> 	if (new == NULL)
> -		return NULL;
> +		goto out;
> 	new->so_server = server;
> 	new->so_cred = cred;
> 	spin_lock(&clp->cl_lock);
> @@ -502,26 +539,58 @@ struct nfs4_state_owner *nfs4_get_state_owner(struct nfs_server *server,
> 		rpc_destroy_wait_queue(&new->so_sequence.wait);
> 		kfree(new);
> 	}
> +out:
> +	nfs4_gc_state_owners(server);
> 	return sp;
> }
> 
> /**
>  * nfs4_put_state_owner - Release a nfs4_state_owner
>  * @sp: state owner data to release
> - *
>  */
> void nfs4_put_state_owner(struct nfs4_state_owner *sp)
> {
> -	struct nfs_client *clp = sp->so_server->nfs_client;
> -	struct rpc_cred *cred = sp->so_cred;
> +	struct nfs_server *server = sp->so_server;
> +	struct nfs_client *clp = server->nfs_client;
> 
> 	if (!atomic_dec_and_lock(&sp->so_count, &clp->cl_lock))
> 		return;
> -	nfs4_remove_state_owner_locked(sp);
> +
> +	if (!RB_EMPTY_NODE(&sp->so_server_node)) {
> +		sp->so_expires = jiffies;
> +		list_move_tail(&sp->so_lru, &server->state_owners_lru);
> +		spin_unlock(&clp->cl_lock);
> +	} else {
> +		nfs4_remove_state_owner_locked(sp);
> +		spin_unlock(&clp->cl_lock);
> +		nfs4_free_state_owner(sp);
> +	}
> +}
> +
> +/**
> + * nfs4_purge_state_owners - Release all cached state owners
> + * @server: nfs_server with cached state owners to release
> + *
> + * Called at umount time.  Remaining state owners will be on
> + * the LRU with ref count of zero.
> + */
> +void nfs4_purge_state_owners(struct nfs_server *server)
> +{
> +	struct nfs_client *clp = server->nfs_client;
> +	struct nfs4_state_owner *sp, *tmp;
> +	LIST_HEAD(doomed);
> +
> +	spin_lock(&clp->cl_lock);
> +	list_for_each_entry_safe(sp, tmp, &server->state_owners_lru, so_lru) {
> +		list_move(&sp->so_lru, &doomed);
> +		nfs4_remove_state_owner_locked(sp);
> +	}
> 	spin_unlock(&clp->cl_lock);
> -	rpc_destroy_wait_queue(&sp->so_sequence.wait);
> -	put_rpccred(cred);
> -	kfree(sp);
> +
> +	list_for_each_entry_safe(sp, tmp, &doomed, so_lru) {
> +		list_del_init(&sp->so_lru);
> +		nfs4_free_state_owner(sp);
> +	}
> }
> 
> static struct nfs4_state *
> @@ -1393,6 +1462,7 @@ static int nfs4_do_reclaim(struct nfs_client *clp, const struct nfs4_state_recov
> restart:
> 	rcu_read_lock();
> 	list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
> +		nfs4_purge_state_owners(server);
> 		spin_lock(&clp->cl_lock);
> 		for (pos = rb_first(&server->state_owners);
> 		     pos != NULL;
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index b5479df..ba4d765 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -153,6 +153,7 @@ struct nfs_server {
> 	struct rb_root		openowner_id;
> 	struct rb_root		lockowner_id;
> #endif
> +	struct list_head	state_owners_lru;
> 	struct list_head	layouts;
> 	struct list_head	delegations;
> 	void (*destroy)(struct nfs_server *);
> -- 
> 1.7.7.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-01-04 23:11 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-06 21:13 [PATCH 0/2] Cache state owners [take 2] Chuck Lever
2011-12-06 21:13 ` [PATCH 1/2] NFS: Clean up nfs4_find_state_owners_locked() Chuck Lever
2011-12-06 21:13 ` [PATCH 2/2] NFS: Cache state owners after files are closed Chuck Lever
2012-01-04 15:53   ` Trond Myklebust
2012-01-04 15:59     ` Trond Myklebust
2012-01-04 16:08       ` Chuck Lever
2012-01-04 16:14         ` Trond Myklebust
2012-01-04 17:07           ` Chuck Lever
2012-01-04 17:14             ` Trond Myklebust
2012-01-04 21:52               ` [PATCH] NFS: Fix a few issues with cached state owners Trond Myklebust
2012-01-04 21:56                 ` Chuck Lever
2012-01-04 21:58                   ` Chuck Lever
2012-01-04 22:03                     ` Trond Myklebust
2012-01-04 22:07                       ` Chuck Lever
2012-01-04 22:27                         ` [PATCH] NFS: Cache state owners after files are closed Trond Myklebust
2012-01-04 22:29                           ` Trond Myklebust
2012-01-04 22:39                             ` Chuck Lever
2012-01-04 23:10                           ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).