* [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: Jeff Layton @ 2012-01-23 20:01 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
This is the fourth iteration of this patchset. I had originally asked
Bruce to take the last one for 3.3, but decided at the last minute to
wait on it a bit. I knew there would be some changes needed in the
upcall, so by waiting we can avoid needing to deal with those in code
that has already shipped. I would like to see this patchset considered
for 3.4 however.
The previous patchset can be viewed here. That set also contains a
more comprehensive description of the rationale for this:
http://www.spinics.net/lists/linux-nfs/msg26324.html
There have been a number of significant changes since the last set:
- the remove/expire upcall is now gone. In a clustered environment, the
records would need to be refcounted in order to handle that properly. That
becomes a sticky problem when you could have nodes rebooting. We don't
really need to remove these records individually however. Cleaning them
out only when the grace period ends should be sufficient.
- an "init" upcall has been added. When knfsd starts, it will upcall to
userspace for a boot_generation value. This will replace the boot time
in the scheme that generates clientid4's. By doing this, we eliminate
a potential problem with clientid4 collisions in clustered setups.
- a "client index" has been added to the create and check upcalls. The
kernel ignores this value for now, but eventually we'll need to use it
to help establish which clients own which locks in a clustered setup.
- a number of the client_tracking functions have been changed to void
return functions. Since the callers all ignore the return codes of those
functions, there was little benefit in passing them back.
Comments and suggestions are welcome...
Jeff Layton (6):
nfsd: add nfsd4_client_tracking_ops struct and a way to set it
sunrpc: create nfsd dir in rpc_pipefs
nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
nfsd: add a header describing upcall to nfsdcld
nfsd: add the infrastructure to handle the cld upcall
nfsd: get boot generation number from upcall instead of boot_time
fs/nfsd/nfs4callback.c | 12 +-
fs/nfsd/nfs4recover.c | 480 +++++++++++++++++++++++++++++++++++++++++++++-
fs/nfsd/nfs4state.c | 63 +++----
fs/nfsd/state.h | 22 ++-
include/linux/nfsd/cld.h | 58 ++++++
net/sunrpc/rpc_pipe.c | 5 +
6 files changed, 581 insertions(+), 59 deletions(-)
create mode 100644 include/linux/nfsd/cld.h
--
1.7.7.5
* [PATCH v4 1/6] nfsd: add nfsd4_client_tracking_ops struct and a way to set it
From: Jeff Layton @ 2012-01-23 20:01 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
Abstract out the mechanism that we use to track clients into a set of
client name tracking functions.
This gives us a mechanism to plug in a new set of client tracking
functions without disturbing the callers. It also gives us a way to
decide on what tracking scheme to use at runtime.
For now, this just looks like pointless abstraction, but later we'll
add a new alternate scheme for tracking clients on stable storage.
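For orientation, a condensed sketch of the shape of the abstraction (the complete definitions are in the diff below). The rest of nfsd never touches the ops table directly; it calls thin nfsd4_client_record_*() wrappers, and the backend pointer is chosen when nfsd starts up, so a different tracking scheme can be plugged in without touching the callers:

struct nfsd4_client_tracking_ops {
        int (*init)(void);
        void (*exit)(void);
        void (*create)(struct nfs4_client *);
        void (*remove)(struct nfs4_client *);
        int (*check)(struct nfs4_client *);
        void (*grace_done)(time_t);
};

static struct nfsd4_client_tracking_ops *client_tracking_ops;

/* callers go through wrappers like this one */
void nfsd4_client_record_create(struct nfs4_client *clp)
{
        if (client_tracking_ops && client_tracking_ops->create)
                client_tracking_ops->create(clp);
}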
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/nfsd/nfs4recover.c | 123 +++++++++++++++++++++++++++++++++++++++++++++----
fs/nfsd/nfs4state.c | 46 +++++++------------
fs/nfsd/state.h | 14 +++--
3 files changed, 138 insertions(+), 45 deletions(-)
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 0b3e875..3fbf1f4 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -43,9 +43,20 @@
#define NFSDDBG_FACILITY NFSDDBG_PROC
+/* Declarations */
+struct nfsd4_client_tracking_ops {
+ int (*init)(void);
+ void (*exit)(void);
+ void (*create)(struct nfs4_client *);
+ void (*remove)(struct nfs4_client *);
+ int (*check)(struct nfs4_client *);
+ void (*grace_done)(time_t);
+};
+
/* Globals */
static struct file *rec_file;
static char user_recovery_dirname[PATH_MAX] = "/var/lib/nfs/v4recovery";
+static struct nfsd4_client_tracking_ops *client_tracking_ops;
static int
nfs4_save_creds(const struct cred **original_creds)
@@ -117,7 +128,8 @@ out_no_tfm:
return status;
}
-void nfsd4_create_clid_dir(struct nfs4_client *clp)
+static void
+nfsd4_create_clid_dir(struct nfs4_client *clp)
{
const struct cred *original_cred;
char *dname = clp->cl_recdir;
@@ -265,7 +277,7 @@ out_unlock:
return status;
}
-void
+static void
nfsd4_remove_clid_dir(struct nfs4_client *clp)
{
const struct cred *original_cred;
@@ -292,7 +304,6 @@ out:
if (status)
printk("NFSD: Failed to remove expired client state directory"
" %.*s\n", HEXDIR_LEN, clp->cl_recdir);
- return;
}
static int
@@ -311,8 +322,8 @@ purge_old(struct dentry *parent, struct dentry *child)
return 0;
}
-void
-nfsd4_recdir_purge_old(void) {
+static void
+nfsd4_recdir_purge_old(time_t boot_time __attribute__ ((unused))) {
int status;
if (!rec_file)
@@ -343,7 +354,7 @@ load_recdir(struct dentry *parent, struct dentry *child)
return 0;
}
-int
+static int
nfsd4_recdir_load(void) {
int status;
@@ -361,8 +372,8 @@ nfsd4_recdir_load(void) {
* Hold reference to the recovery directory.
*/
-void
-nfsd4_init_recdir()
+static int
+nfsd4_init_recdir(void)
{
const struct cred *original_cred;
int status;
@@ -377,20 +388,37 @@ nfsd4_init_recdir()
printk("NFSD: Unable to change credentials to find recovery"
" directory: error %d\n",
status);
- return;
+ return status;
}
rec_file = filp_open(user_recovery_dirname, O_RDONLY | O_DIRECTORY, 0);
if (IS_ERR(rec_file)) {
printk("NFSD: unable to find recovery directory %s\n",
user_recovery_dirname);
+ status = PTR_ERR(rec_file);
rec_file = NULL;
}
nfs4_reset_creds(original_cred);
+ return status;
}
-void
+static int
+nfsd4_load_reboot_recovery_data(void)
+{
+ int status;
+
+ nfs4_lock_state();
+ status = nfsd4_init_recdir();
+ if (!status)
+ status = nfsd4_recdir_load();
+ nfs4_unlock_state();
+ if (status)
+ printk(KERN_ERR "NFSD: Failure reading reboot recovery data\n");
+ return status;
+}
+
+static void
nfsd4_shutdown_recdir(void)
{
if (!rec_file)
@@ -425,3 +453,78 @@ nfs4_recoverydir(void)
{
return user_recovery_dirname;
}
+
+static int
+nfsd4_check_legacy_client(struct nfs4_client *clp)
+{
+ if (nfsd4_find_reclaim_client(clp) != NULL)
+ return 0;
+ else
+ return -ENOENT;
+}
+
+static struct nfsd4_client_tracking_ops nfsd4_legacy_tracking_ops = {
+ .init = nfsd4_load_reboot_recovery_data,
+ .exit = nfsd4_shutdown_recdir,
+ .create = nfsd4_create_clid_dir,
+ .remove = nfsd4_remove_clid_dir,
+ .check = nfsd4_check_legacy_client,
+ .grace_done = nfsd4_recdir_purge_old,
+};
+
+int
+nfsd4_client_tracking_init(void)
+{
+ int status;
+
+ client_tracking_ops = &nfsd4_legacy_tracking_ops;
+
+ if (!client_tracking_ops->init)
+ return 0;
+
+ status = client_tracking_ops->init();
+ if (status) {
+ printk(KERN_WARNING "NFSD: Unable to initialize client "
+ "recovery tracking! (%d)\n", status);
+ client_tracking_ops = NULL;
+ }
+ return status;
+}
+
+void
+nfsd4_client_tracking_exit(void)
+{
+ if (client_tracking_ops && client_tracking_ops->exit)
+ client_tracking_ops->exit();
+ client_tracking_ops = NULL;
+}
+
+void
+nfsd4_client_record_create(struct nfs4_client *clp)
+{
+ if (client_tracking_ops && client_tracking_ops->create)
+ client_tracking_ops->create(clp);
+}
+
+void
+nfsd4_client_record_remove(struct nfs4_client *clp)
+{
+ if (client_tracking_ops && client_tracking_ops->remove)
+ client_tracking_ops->remove(clp);
+}
+
+int
+nfsd4_client_record_check(struct nfs4_client *clp)
+{
+ if (client_tracking_ops && client_tracking_ops->check)
+ return client_tracking_ops->check(clp);
+
+ return -EOPNOTSUPP;
+}
+
+void
+nfsd4_grace_done(time_t boot_time)
+{
+ if (client_tracking_ops && client_tracking_ops->grace_done)
+ client_tracking_ops->grace_done(boot_time);
+}
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index e8c98f0..971a117 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2045,7 +2045,7 @@ nfsd4_reclaim_complete(struct svc_rqst *rqstp, struct nfsd4_compound_state *csta
goto out;
status = nfs_ok;
- nfsd4_create_clid_dir(cstate->session->se_client);
+ nfsd4_client_record_create(cstate->session->se_client);
out:
nfs4_unlock_state();
return status;
@@ -2240,7 +2240,7 @@ nfsd4_setclientid_confirm(struct svc_rqst *rqstp,
conf = find_confirmed_client_by_str(unconf->cl_recdir,
hash);
if (conf) {
- nfsd4_remove_clid_dir(conf);
+ nfsd4_client_record_remove(conf);
expire_client(conf);
}
move_to_confirmed(unconf);
@@ -3066,7 +3066,7 @@ static void
nfsd4_end_grace(void)
{
dprintk("NFSD: end of grace period\n");
- nfsd4_recdir_purge_old();
+ nfsd4_grace_done(boot_time);
locks_end_grace(&nfsd4_manager);
/*
* Now that every NFSv4 client has had the chance to recover and
@@ -3115,7 +3115,7 @@ nfs4_laundromat(void)
clp = list_entry(pos, struct nfs4_client, cl_lru);
dprintk("NFSD: purging unused client (clientid %08x)\n",
clp->cl_clientid.cl_id);
- nfsd4_remove_clid_dir(clp);
+ nfsd4_client_record_remove(clp);
expire_client(clp);
}
spin_lock(&recall_lock);
@@ -3539,7 +3539,7 @@ nfsd4_open_confirm(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
dprintk("NFSD: %s: success, seqid=%d stateid=" STATEID_FMT "\n",
__func__, oc->oc_seqid, STATEID_VAL(&stp->st_stid.sc_stateid));
- nfsd4_create_clid_dir(oo->oo_owner.so_client);
+ nfsd4_client_record_create(oo->oo_owner.so_client);
status = nfs_ok;
out:
if (!cstate->replay_owner)
@@ -4397,19 +4397,13 @@ nfs4_release_reclaim(void)
/*
* called from OPEN, CLAIM_PREVIOUS with a new clientid. */
-static struct nfs4_client_reclaim *
-nfs4_find_reclaim_client(clientid_t *clid)
+struct nfs4_client_reclaim *
+nfsd4_find_reclaim_client(struct nfs4_client *clp)
{
unsigned int strhashval;
- struct nfs4_client *clp;
struct nfs4_client_reclaim *crp = NULL;
- /* find clientid in conf_id_hashtbl */
- clp = find_confirmed_client(clid);
- if (clp == NULL)
- return NULL;
-
dprintk("NFSD: nfs4_find_reclaim_client for %.*s with recdir %s\n",
clp->cl_name.len, clp->cl_name.data,
clp->cl_recdir);
@@ -4430,7 +4424,14 @@ nfs4_find_reclaim_client(clientid_t *clid)
__be32
nfs4_check_open_reclaim(clientid_t *clid)
{
- return nfs4_find_reclaim_client(clid) ? nfs_ok : nfserr_reclaim_bad;
+ struct nfs4_client *clp;
+
+ /* find clientid in conf_id_hashtbl */
+ clp = find_confirmed_client(clid);
+ if (clp == NULL)
+ return nfserr_reclaim_bad;
+
+ return nfsd4_client_record_check(clp) ? nfserr_reclaim_bad : nfs_ok;
}
#ifdef CONFIG_NFSD_FAULT_INJECTION
@@ -4577,19 +4578,6 @@ nfs4_state_init(void)
reclaim_str_hashtbl_size = 0;
}
-static void
-nfsd4_load_reboot_recovery_data(void)
-{
- int status;
-
- nfs4_lock_state();
- nfsd4_init_recdir();
- status = nfsd4_recdir_load();
- nfs4_unlock_state();
- if (status)
- printk("NFSD: Failure reading reboot recovery data\n");
-}
-
/*
* Since the lifetime of a delegation isn't limited to that of an open, a
* client may quite reasonably hang on to a delegation as long as it has
@@ -4642,7 +4630,7 @@ out_free_laundry:
int
nfs4_state_start(void)
{
- nfsd4_load_reboot_recovery_data();
+ nfsd4_client_tracking_init();
return __nfs4_state_start();
}
@@ -4676,7 +4664,7 @@ __nfs4_state_shutdown(void)
unhash_delegation(dp);
}
- nfsd4_shutdown_recdir();
+ nfsd4_client_tracking_exit();
}
void
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index ffb5df1..8f539ee 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -463,6 +463,7 @@ extern __be32 nfs4_preprocess_stateid_op(struct nfsd4_compound_state *cstate,
extern void nfs4_lock_state(void);
extern void nfs4_unlock_state(void);
extern int nfs4_in_grace(void);
+extern struct nfs4_client_reclaim *nfsd4_find_reclaim_client(struct nfs4_client *crp);
extern __be32 nfs4_check_open_reclaim(clientid_t *clid);
extern void nfs4_free_openowner(struct nfs4_openowner *);
extern void nfs4_free_lockowner(struct nfs4_lockowner *);
@@ -477,16 +478,17 @@ extern void nfsd4_destroy_callback_queue(void);
extern void nfsd4_shutdown_callback(struct nfs4_client *);
extern void nfs4_put_delegation(struct nfs4_delegation *dp);
extern __be32 nfs4_make_rec_clidname(char *clidname, struct xdr_netobj *clname);
-extern void nfsd4_init_recdir(void);
-extern int nfsd4_recdir_load(void);
-extern void nfsd4_shutdown_recdir(void);
extern int nfs4_client_to_reclaim(const char *name);
extern int nfs4_has_reclaimed_state(const char *name, bool use_exchange_id);
-extern void nfsd4_recdir_purge_old(void);
-extern void nfsd4_create_clid_dir(struct nfs4_client *clp);
-extern void nfsd4_remove_clid_dir(struct nfs4_client *clp);
extern void release_session_client(struct nfsd4_session *);
extern __be32 nfs4_validate_stateid(struct nfs4_client *, stateid_t *);
extern void nfsd4_purge_closed_stateid(struct nfs4_stateowner *);
+/* nfs4recover operations */
+extern int nfsd4_client_tracking_init(void);
+extern void nfsd4_client_tracking_exit(void);
+extern void nfsd4_client_record_create(struct nfs4_client *clp);
+extern void nfsd4_client_record_remove(struct nfs4_client *clp);
+extern int nfsd4_client_record_check(struct nfs4_client *clp);
+extern void nfsd4_grace_done(time_t boot_time);
#endif /* NFSD4_STATE_H */
--
1.7.7.5
* [PATCH v4 2/6] sunrpc: create nfsd dir in rpc_pipefs
From: Jeff Layton @ 2012-01-23 20:01 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid
upcall.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
net/sunrpc/rpc_pipe.c | 5 +++++
1 files changed, 5 insertions(+), 0 deletions(-)
diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index 63a7a7a..3dc2c6e 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -986,6 +986,7 @@ enum {
RPCAUTH_statd,
RPCAUTH_nfsd4_cb,
RPCAUTH_cache,
+ RPCAUTH_nfsd,
RPCAUTH_RootEOF
};
@@ -1018,6 +1019,10 @@ static const struct rpc_filelist files[] = {
.name = "cache",
.mode = S_IFDIR | S_IRUGO | S_IXUGO,
},
+ [RPCAUTH_nfsd] = {
+ .name = "nfsd",
+ .mode = S_IFDIR | S_IRUGO | S_IXUGO,
+ },
};
static int
--
1.7.7.5
* [PATCH v4 3/6] nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
From: Jeff Layton @ 2012-01-23 20:01 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
We'll need a way to flag the nfs4_client as already being recorded on
stable storage so that we don't continually upcall.
The cl_cb_flags field is only using 2 bits right now, so repurpose that
to a generic flags field. Rename NFSD4_CLIENT_KILL to
NFSD4_CLIENT_CB_KILL to make it evident that it's part of the callback
flags. Add a mask that we can use for existing checks that look to see
whether any flags are set, so that the new STABLE flag doesn't
interfere.
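To spell out the bit arithmetic (the values below are bit numbers used with set_bit()/test_bit(), exactly as in the state.h hunk): STABLE is bit 0, the two callback flags are bits 1 and 2, so the callback mask is (1 << 1) | (1 << 2) == 0x6 and the existing "any callback work pending?" tests keep working:

#define NFSD4_CLIENT_STABLE             (0)     /* client on stable storage */
#define NFSD4_CLIENT_CB_UPDATE          (1)
#define NFSD4_CLIENT_CB_KILL            (2)
#define NFSD4_CLIENT_CB_FLAG_MASK       (0x6)   /* CB_UPDATE and CB_KILL bits */

/* old check:  if (clp->cl_cb_flags) ...
 * new check:  if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK) ...
 * setting NFSD4_CLIENT_STABLE (bit 0, value 0x1) no longer trips it. */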
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/nfsd/nfs4callback.c | 12 ++++++------
fs/nfsd/state.h | 8 +++++---
2 files changed, 11 insertions(+), 9 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index 6f3ebb4..730e2cc 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -756,7 +756,7 @@ void nfsd4_probe_callback(struct nfs4_client *clp)
{
/* XXX: atomicity? Also, should we be using cl_cb_flags? */
clp->cl_cb_state = NFSD4_CB_UNKNOWN;
- set_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_cb_flags);
+ set_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags);
do_probe_callback(clp);
}
@@ -915,7 +915,7 @@ void nfsd4_destroy_callback_queue(void)
/* must be called under the state lock */
void nfsd4_shutdown_callback(struct nfs4_client *clp)
{
- set_bit(NFSD4_CLIENT_KILL, &clp->cl_cb_flags);
+ set_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags);
/*
* Note this won't actually result in a null callback;
* instead, nfsd4_do_callback_rpc() will detect the killed
@@ -966,15 +966,15 @@ static void nfsd4_process_cb_update(struct nfsd4_callback *cb)
svc_xprt_put(clp->cl_cb_conn.cb_xprt);
clp->cl_cb_conn.cb_xprt = NULL;
}
- if (test_bit(NFSD4_CLIENT_KILL, &clp->cl_cb_flags))
+ if (test_bit(NFSD4_CLIENT_CB_KILL, &clp->cl_flags))
return;
spin_lock(&clp->cl_lock);
/*
* Only serialized callback code is allowed to clear these
* flags; main nfsd code can only set them:
*/
- BUG_ON(!clp->cl_cb_flags);
- clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_cb_flags);
+ BUG_ON(!(clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK));
+ clear_bit(NFSD4_CLIENT_CB_UPDATE, &clp->cl_flags);
memcpy(&conn, &cb->cb_clp->cl_cb_conn, sizeof(struct nfs4_cb_conn));
c = __nfsd4_find_backchannel(clp);
if (c) {
@@ -1000,7 +1000,7 @@ void nfsd4_do_callback_rpc(struct work_struct *w)
struct nfs4_client *clp = cb->cb_clp;
struct rpc_clnt *clnt;
- if (clp->cl_cb_flags)
+ if (clp->cl_flags & NFSD4_CLIENT_CB_FLAG_MASK)
nfsd4_process_cb_update(cb);
clnt = clp->cl_cb_client;
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 8f539ee..0f9a0ac 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -250,9 +250,11 @@ struct nfs4_client {
/* for v4.0 and v4.1 callbacks: */
struct nfs4_cb_conn cl_cb_conn;
-#define NFSD4_CLIENT_CB_UPDATE 1
-#define NFSD4_CLIENT_KILL 2
- unsigned long cl_cb_flags;
+#define NFSD4_CLIENT_STABLE (0) /* client on stable storage */
+#define NFSD4_CLIENT_CB_UPDATE (1)
+#define NFSD4_CLIENT_CB_KILL (2)
+#define NFSD4_CLIENT_CB_FLAG_MASK (0x6)
+ unsigned long cl_flags;
struct rpc_clnt *cl_cb_client;
u32 cl_cb_ident;
#define NFSD4_CB_UP 0
--
1.7.7.5
* [PATCH v4 4/6] nfsd: add a header describing upcall to nfsdcld
From: Jeff Layton @ 2012-01-23 20:01 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
The daemon takes a versioned binary struct. Hopefully this should allow
us to revise the struct later if it becomes necessary.
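For reference, a minimal sketch of what the userspace side of this protocol could look like. This is not the real nfsdcld; it assumes rpc_pipefs is mounted at /var/lib/nfs/rpc_pipefs (the mount point is site-specific), that a local copy of this cld.h header is available to userspace, and it blindly acks every request instead of persisting anything. The placeholder generation value is also an assumption:

#include <fcntl.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>
#include "cld.h"                /* local copy of the header added by this patch */

int main(void)
{
        struct cld_msg msg;
        struct pollfd pfd;
        int fd = open("/var/lib/nfs/rpc_pipefs/nfsd/cld", O_RDWR);

        if (fd < 0)
                return 1;

        pfd.fd = fd;
        pfd.events = POLLIN;
        for (;;) {
                if (poll(&pfd, 1, -1) < 0)
                        break;
                if (read(fd, &msg, sizeof(msg)) != sizeof(msg))
                        continue;
                switch (msg.cm_cmd) {
                case Cld_Create:        /* would record cm_u.cm_name on disk */
                case Cld_Allow:         /* would look cm_u.cm_name up on disk */
                case Cld_GraceDone:     /* would purge records older than cm_gracetime */
                        msg.cm_status = 0;
                        break;
                case Cld_Init:          /* would return a persistent boot generation */
                        msg.cm_u.cm_generation = 1;     /* placeholder value */
                        msg.cm_status = 0;
                        break;
                default:
                        msg.cm_status = -1;     /* unknown command */
                }
                /* the downcall is the same struct written back whole */
                if (write(fd, &msg, sizeof(msg)) != sizeof(msg))
                        break;
        }
        close(fd);
        return 0;
}

The kernel matches the reply to the outstanding request by cm_xid, so that field must be echoed back unchanged; only cm_status (and, for Cld_Init, cm_generation) is filled in by the daemon.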
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
include/linux/nfsd/cld.h | 58 ++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 58 insertions(+), 0 deletions(-)
create mode 100644 include/linux/nfsd/cld.h
diff --git a/include/linux/nfsd/cld.h b/include/linux/nfsd/cld.h
new file mode 100644
index 0000000..b144a70
--- /dev/null
+++ b/include/linux/nfsd/cld.h
@@ -0,0 +1,58 @@
+/*
+ * include/linux/nfsd/cld.h - upcall description for nfsdcld communication
+ *
+ * Copyright (c) 2011 Red Hat, Inc.
+ * Author(s): Jeff Layton <jlayton@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#ifndef _NFSD_CLD_H
+#define _NFSD_CLD_H
+
+/* latest upcall version available */
+#define CLD_UPCALL_VERSION 1
+
+/* defined by RFC3530 */
+#define NFS4_OPAQUE_LIMIT 1024
+
+enum cld_command {
+ Cld_Init, /* called when nfsd starts */
+ Cld_Create, /* create a record for this cm_id */
+ Cld_Allow, /* is this cm_id allowed? */
+ Cld_GraceDone, /* grace period is complete */
+};
+
+/* representation of long-form NFSv4 client ID */
+struct cld_name {
+ uint16_t cn_len; /* length of cn_id */
+ unsigned char cn_id[NFS4_OPAQUE_LIMIT]; /* client-provided */
+} __attribute__((packed));
+
+/* message struct for communication with userspace */
+struct cld_msg {
+ uint8_t cm_vers; /* upcall version */
+ uint8_t cm_cmd; /* upcall command */
+ int16_t cm_status; /* return code */
+ uint32_t cm_xid; /* transaction id */
+ union {
+ int64_t cm_gracetime; /* grace period start time */
+ int64_t cm_index; /* clientid index */
+ uint32_t cm_generation; /* boot generation index */
+ struct cld_name cm_name;
+ } __attribute__((packed)) cm_u;
+} __attribute__((packed));
+
+#endif /* !_NFSD_CLD_H */
--
1.7.7.5
* [PATCH v4 5/6] nfsd: add the infrastructure to handle the cld upcall
From: Jeff Layton @ 2012-01-23 20:01 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
...and add a mechanism for switching between the "legacy" tracker and
the new one. The decision is made by looking to see whether the
v4recoverydir exists. If it does, then the legacy client tracker is
used.
If it does not, then the kernel will create a "cld" pipe in rpc_pipefs.
That pipe is used to talk to a daemon for handling the upcall.
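The heart of the patch is a synchronous round trip over that pipe: each request gets a cld_upcall tagged with a unique cm_xid, the caller sleeps in cld_pipe_upcall() until the daemon's downcall with the matching xid wakes it, and cm_status then carries the daemon's answer. Condensed from the check path in the diff below (the NFSD4_CLIENT_STABLE short-circuit is omitted here):

static int
nfsd4_cld_check(struct nfs4_client *clp)
{
        int ret;
        struct cld_upcall *cup = alloc_cld_upcall();    /* picks a unique cm_xid */

        if (!cup)
                return -ENOMEM;

        cup->cu_msg.cm_cmd = Cld_Allow;
        cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
        memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
               clp->cl_name.len);

        ret = cld_pipe_upcall(&cup->cu_msg);    /* sleeps until the downcall */
        if (!ret)
                ret = cup->cu_msg.cm_status;    /* daemon's verdict */

        free_cld_upcall(cup);
        return ret;
}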
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/nfsd/nfs4recover.c | 328 ++++++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 327 insertions(+), 1 deletions(-)
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 3fbf1f4..7ef9327 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -1,5 +1,6 @@
/*
* Copyright (c) 2004 The Regents of the University of Michigan.
+* Copyright (c) 2011 Jeff Layton <jlayton@redhat.com>
* All rights reserved.
*
* Andy Adamson <andros@citi.umich.edu>
@@ -36,6 +37,10 @@
#include <linux/namei.h>
#include <linux/crypto.h>
#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/sunrpc/rpc_pipe_fs.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfsd/cld.h>
#include "nfsd.h"
#include "state.h"
@@ -472,12 +477,333 @@ static struct nfsd4_client_tracking_ops nfsd4_legacy_tracking_ops = {
.grace_done = nfsd4_recdir_purge_old,
};
+/* Globals */
+#define NFSD_PIPE_DIR "/nfsd"
+
+static struct dentry *cld_pipe;
+
+/* list of cld_msg's that are currently in use */
+static DEFINE_SPINLOCK(cld_lock);
+static LIST_HEAD(cld_list);
+static unsigned int cld_xid;
+
+struct cld_upcall {
+ struct list_head cu_list;
+ struct task_struct *cu_task;
+ struct cld_msg cu_msg;
+};
+
+static int
+__cld_pipe_upcall(struct cld_msg *cmsg)
+{
+ int ret;
+ struct rpc_pipe_msg msg;
+ struct inode *inode = cld_pipe->d_inode;
+
+ memset(&msg, 0, sizeof(msg));
+ msg.data = cmsg;
+ msg.len = sizeof(*cmsg);
+
+ /*
+ * Set task state before we queue the upcall. That prevents
+ * wake_up_process in the downcall from racing with schedule.
+ */
+ set_current_state(TASK_UNINTERRUPTIBLE);
+ ret = rpc_queue_upcall(inode, &msg);
+ if (ret < 0) {
+ set_current_state(TASK_RUNNING);
+ goto out;
+ }
+
+ schedule();
+ set_current_state(TASK_RUNNING);
+
+ if (msg.errno < 0)
+ ret = msg.errno;
+out:
+ return ret;
+}
+
+static int
+cld_pipe_upcall(struct cld_msg *cmsg)
+{
+ int ret;
+
+ /*
+ * -EAGAIN occurs when the pipe is closed and reopened while there are
+ * upcalls queued.
+ */
+ do {
+ ret = __cld_pipe_upcall(cmsg);
+ } while (ret == -EAGAIN);
+
+ return ret;
+}
+
+static ssize_t
+cld_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
+{
+ struct cld_upcall *tmp, *cup;
+ struct cld_msg *cmsg = (struct cld_msg *)src;
+ uint32_t xid;
+
+ if (mlen != sizeof(*cmsg)) {
+ dprintk("%s: got %lu bytes, expected %lu\n", __func__, mlen,
+ sizeof(*cmsg));
+ return -EINVAL;
+ }
+
+ /* copy just the xid so we can try to find that */
+ if (copy_from_user(&xid, &cmsg->cm_xid, sizeof(xid)) != 0) {
+ dprintk("%s: error when copying xid from userspace", __func__);
+ return -EFAULT;
+ }
+
+ /* walk the list and find corresponding xid */
+ cup = NULL;
+ spin_lock(&cld_lock);
+ list_for_each_entry(tmp, &cld_list, cu_list) {
+ if (get_unaligned(&tmp->cu_msg.cm_xid) == xid) {
+ cup = tmp;
+ list_del_init(&cup->cu_list);
+ break;
+ }
+ }
+ spin_unlock(&cld_lock);
+
+ /* couldn't find upcall? */
+ if (!cup) {
+ dprintk("%s: couldn't find upcall -- xid=%u\n", __func__,
+ cup->cu_msg.cm_xid);
+ return -EINVAL;
+ }
+
+ if (copy_from_user(&cup->cu_msg, src, mlen) != 0)
+ return -EFAULT;
+
+ wake_up_process(cup->cu_task);
+ return mlen;
+}
+
+static void
+cld_pipe_destroy_msg(struct rpc_pipe_msg *msg)
+{
+ struct cld_msg *cmsg = msg->data;
+ struct cld_upcall *cup = container_of(cmsg, struct cld_upcall,
+ cu_msg);
+
+ /* errno >= 0 means we got a downcall */
+ if (msg->errno >= 0)
+ return;
+
+ wake_up_process(cup->cu_task);
+}
+
+static const struct rpc_pipe_ops cld_upcall_ops = {
+ .upcall = rpc_pipe_generic_upcall,
+ .downcall = cld_pipe_downcall,
+ .destroy_msg = cld_pipe_destroy_msg,
+};
+
+/* Initialize rpc_pipefs pipe for communication with client tracking daemon */
+static int
+nfsd4_init_cld_pipe(void)
+{
+ int ret;
+ struct path path;
+ struct vfsmount *mnt;
+
+ if (cld_pipe)
+ return 0;
+
+ mnt = rpc_get_mount();
+ if (IS_ERR(mnt))
+ return PTR_ERR(mnt);
+
+ ret = vfs_path_lookup(mnt->mnt_root, mnt, NFSD_PIPE_DIR, 0, &path);
+ if (ret)
+ goto err;
+
+ cld_pipe = rpc_mkpipe(path.dentry, "cld", NULL,
+ &cld_upcall_ops, RPC_PIPE_WAIT_FOR_OPEN);
+ path_put(&path);
+ if (!IS_ERR(cld_pipe))
+ return 0;
+
+ ret = PTR_ERR(cld_pipe);
+err:
+ rpc_put_mount();
+ printk(KERN_ERR "NFSD: unable to create nfsdcld upcall pipe (%d)\n",
+ ret);
+ return ret;
+}
+
+static void
+nfsd4_remove_cld_pipe(void)
+{
+ int ret;
+
+ ret = rpc_unlink(cld_pipe);
+ if (ret)
+ printk(KERN_ERR "NFSD: error removing cld pipe: %d\n", ret);
+ cld_pipe = NULL;
+ rpc_put_mount();
+}
+
+static struct cld_upcall *
+alloc_cld_upcall(void)
+{
+ struct cld_upcall *new, *tmp;
+
+ new = kzalloc(sizeof(*new), GFP_KERNEL);
+ if (!new)
+ return new;
+
+ /* FIXME: hard cap on number in flight? */
+restart_search:
+ spin_lock(&cld_lock);
+ list_for_each_entry(tmp, &cld_list, cu_list) {
+ if (tmp->cu_msg.cm_xid == cld_xid) {
+ cld_xid++;
+ spin_unlock(&cld_lock);
+ goto restart_search;
+ }
+ }
+ new->cu_task = current;
+ new->cu_msg.cm_vers = CLD_UPCALL_VERSION;
+ put_unaligned(cld_xid++, &new->cu_msg.cm_xid);
+ list_add(&new->cu_list, &cld_list);
+ spin_unlock(&cld_lock);
+
+ dprintk("%s: allocated xid %u\n", __func__, new->cu_msg.cm_xid);
+
+ return new;
+}
+
+static void
+free_cld_upcall(struct cld_upcall *victim)
+{
+ spin_lock(&cld_lock);
+ list_del(&victim->cu_list);
+ spin_unlock(&cld_lock);
+ kfree(victim);
+}
+
+/* Ask daemon to create a new record */
+static void
+nfsd4_cld_create(struct nfs4_client *clp)
+{
+ int ret;
+ struct cld_upcall *cup;
+
+ /* Don't upcall if it's already stored */
+ if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags)) {
+ ret = -EEXIST;
+ goto out_err;
+ }
+
+ cup = alloc_cld_upcall();
+ if (!cup) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ cup->cu_msg.cm_cmd = Cld_Create;
+ cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
+ memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
+ clp->cl_name.len);
+
+ ret = cld_pipe_upcall(&cup->cu_msg);
+ if (!ret) {
+ ret = cup->cu_msg.cm_status;
+ set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
+ }
+
+ free_cld_upcall(cup);
+out_err:
+ if (ret)
+ printk(KERN_ERR "NFSD: Unable to create client "
+ "record on stable storage: %d\n", ret);
+}
+
+/* Check for presence of a record, and update its timestamp */
+static int
+nfsd4_cld_check(struct nfs4_client *clp)
+{
+ int ret;
+ struct cld_upcall *cup;
+
+ /* Don't upcall if one was already stored during this grace period */
+ if (test_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags))
+ return 0;
+
+ cup = alloc_cld_upcall();
+ if (!cup)
+ return -ENOMEM;
+
+ cup->cu_msg.cm_cmd = Cld_Allow;
+ cup->cu_msg.cm_u.cm_name.cn_len = clp->cl_name.len;
+ memcpy(cup->cu_msg.cm_u.cm_name.cn_id, clp->cl_name.data,
+ clp->cl_name.len);
+
+ ret = cld_pipe_upcall(&cup->cu_msg);
+ if (!ret) {
+ ret = cup->cu_msg.cm_status;
+ set_bit(NFSD4_CLIENT_STABLE, &clp->cl_flags);
+ }
+
+ free_cld_upcall(cup);
+ return ret;
+}
+
+static void
+nfsd4_cld_grace_done(time_t boot_time)
+{
+ int ret;
+ struct cld_upcall *cup;
+
+ cup = alloc_cld_upcall();
+ if (!cup) {
+ ret = -ENOMEM;
+ goto out_err;
+ }
+
+ cup->cu_msg.cm_cmd = Cld_GraceDone;
+ cup->cu_msg.cm_u.cm_gracetime = (int64_t)boot_time;
+ ret = cld_pipe_upcall(&cup->cu_msg);
+ if (!ret)
+ ret = cup->cu_msg.cm_status;
+
+ free_cld_upcall(cup);
+out_err:
+ if (ret)
+ printk(KERN_ERR "NFSD: Unable to end grace period: %d\n", ret);
+}
+
+static struct nfsd4_client_tracking_ops nfsd4_cld_tracking_ops = {
+ .init = nfsd4_init_cld_pipe,
+ .exit = nfsd4_remove_cld_pipe,
+ .create = nfsd4_cld_create,
+ .check = nfsd4_cld_check,
+ .grace_done = nfsd4_cld_grace_done,
+};
+
int
nfsd4_client_tracking_init(void)
{
int status;
+ struct path path;
- client_tracking_ops = &nfsd4_legacy_tracking_ops;
+ if (!client_tracking_ops) {
+ client_tracking_ops = &nfsd4_cld_tracking_ops;
+ status = kern_path(nfs4_recoverydir(), LOOKUP_FOLLOW, &path);
+ if (!status) {
+ if (S_ISDIR(path.dentry->d_inode->i_mode))
+ client_tracking_ops =
+ &nfsd4_legacy_tracking_ops;
+ path_put(&path);
+ }
+ }
if (!client_tracking_ops->init)
return 0;
--
1.7.7.5
* [PATCH v4 6/6] nfsd: get boot generation number from upcall instead of boot_time
From: Jeff Layton @ 2012-01-23 20:01 UTC (permalink / raw)
To: bfields; +Cc: linux-nfs
Replace boot_time as a mechanism for generating clientids with a
number that we upcall for from stable storage. If the upcall is
not available, then fall back to get_seconds() to set it, much as
we do today.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
fs/nfsd/nfs4recover.c | 41 ++++++++++++++++++++++++++++++++++++-----
fs/nfsd/nfs4state.c | 19 +++++++++++++------
fs/nfsd/state.h | 2 +-
3 files changed, 50 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4recover.c b/fs/nfsd/nfs4recover.c
index 7ef9327..1477dc5 100644
--- a/fs/nfsd/nfs4recover.c
+++ b/fs/nfsd/nfs4recover.c
@@ -50,7 +50,7 @@
/* Declarations */
struct nfsd4_client_tracking_ops {
- int (*init)(void);
+ int (*init)(uint32_t *);
void (*exit)(void);
void (*create)(struct nfs4_client *);
void (*remove)(struct nfs4_client *);
@@ -409,7 +409,7 @@ nfsd4_init_recdir(void)
}
static int
-nfsd4_load_reboot_recovery_data(void)
+nfsd4_load_reboot_recovery_data(uint32_t *boot_gen)
{
int status;
@@ -420,6 +420,8 @@ nfsd4_load_reboot_recovery_data(void)
nfs4_unlock_state();
if (status)
printk(KERN_ERR "NFSD: Failure reading reboot recovery data\n");
+ else
+ *boot_gen = (uint32_t)get_seconds();
return status;
}
@@ -689,6 +691,35 @@ free_cld_upcall(struct cld_upcall *victim)
kfree(victim);
}
+static int
+nfsd4_cld_init(uint32_t *boot_gen)
+{
+ int ret;
+ struct cld_upcall *cup;
+
+ ret = nfsd4_init_cld_pipe();
+ if (ret)
+ return ret;
+
+ /* upcall for boot gen */
+ cup = alloc_cld_upcall();
+ if (!cup)
+ return -ENOMEM;
+
+ cup->cu_msg.cm_cmd = Cld_Init;
+
+ ret = cld_pipe_upcall(&cup->cu_msg);
+ if (!ret) {
+ ret = cup->cu_msg.cm_status;
+ if (!ret)
+ *boot_gen = cup->cu_msg.cm_u.cm_generation;
+ }
+
+ free_cld_upcall(cup);
+ return ret;
+
+}
+
/* Ask daemon to create a new record */
static void
nfsd4_cld_create(struct nfs4_client *clp)
@@ -781,7 +812,7 @@ out_err:
}
static struct nfsd4_client_tracking_ops nfsd4_cld_tracking_ops = {
- .init = nfsd4_init_cld_pipe,
+ .init = nfsd4_cld_init,
.exit = nfsd4_remove_cld_pipe,
.create = nfsd4_cld_create,
.check = nfsd4_cld_check,
@@ -789,7 +820,7 @@ static struct nfsd4_client_tracking_ops nfsd4_cld_tracking_ops = {
};
int
-nfsd4_client_tracking_init(void)
+nfsd4_client_tracking_init(uint32_t *boot_gen)
{
int status;
struct path path;
@@ -808,7 +839,7 @@ nfsd4_client_tracking_init(void)
if (!client_tracking_ops->init)
return 0;
- status = client_tracking_ops->init();
+ status = client_tracking_ops->init(boot_gen);
if (status) {
printk(KERN_WARNING "NFSD: Unable to initialize client "
"recovery tracking! (%d)\n", status);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 971a117..47e4981 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -49,6 +49,7 @@
time_t nfsd4_lease = 90; /* default lease time */
time_t nfsd4_grace = 90;
static time_t boot_time;
+static uint32_t boot_gen;
#define all_ones {{~0,~0},~0}
static const stateid_t one_stateid = {
@@ -975,10 +976,10 @@ renew_client(struct nfs4_client *clp)
static int
STALE_CLIENTID(clientid_t *clid)
{
- if (clid->cl_boot == boot_time)
+ if (clid->cl_boot == boot_gen)
return 0;
- dprintk("NFSD stale clientid (%08x/%08x) boot_time %08lx\n",
- clid->cl_boot, clid->cl_id, boot_time);
+ dprintk("NFSD stale clientid (%08x/%08x) boot_gen %08x\n",
+ clid->cl_boot, clid->cl_id, boot_gen);
return 1;
}
@@ -1132,7 +1133,7 @@ static void gen_clid(struct nfs4_client *clp)
{
static u32 current_clientid = 1;
- clp->cl_clientid.cl_boot = boot_time;
+ clp->cl_clientid.cl_boot = boot_gen;
clp->cl_clientid.cl_id = current_clientid++;
}
@@ -3175,7 +3176,7 @@ static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_ol_stateid *s
static int
STALE_STATEID(stateid_t *stateid)
{
- if (stateid->si_opaque.so_clid.cl_boot == boot_time)
+ if (stateid->si_opaque.so_clid.cl_boot == boot_gen)
return 0;
dprintk("NFSD: stale stateid " STATEID_FMT "!\n",
STATEID_VAL(stateid));
@@ -4630,7 +4631,13 @@ out_free_laundry:
int
nfs4_state_start(void)
{
- nfsd4_client_tracking_init();
+ int ret;
+
+ ret = nfsd4_client_tracking_init(&boot_gen);
+
+ /* if init fails, then we still need to set boot_gen */
+ if (ret)
+ boot_gen = (uint32_t)get_seconds();
return __nfs4_state_start();
}
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 0f9a0ac..dd1f688 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -487,7 +487,7 @@ extern __be32 nfs4_validate_stateid(struct nfs4_client *, stateid_t *);
extern void nfsd4_purge_closed_stateid(struct nfs4_stateowner *);
/* nfs4recover operations */
-extern int nfsd4_client_tracking_init(void);
+extern int nfsd4_client_tracking_init(uint32_t *boot_gen);
extern void nfsd4_client_tracking_exit(void);
extern void nfsd4_client_record_create(struct nfs4_client *clp);
extern void nfsd4_client_record_remove(struct nfs4_client *clp);
--
1.7.7.5
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: J. Bruce Fields @ 2012-01-24 23:08 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-nfs
On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
> This is the fourth iteration of this patchset. I had originally asked
> Bruce to take the last one for 3.3, but decided at the last minute to
> wait on it a bit. I knew there would be some changes needed in the
> upcall, so by waiting we can avoid needing to deal with those in code
> that has already shipped. I would like to see this patchset considered
> for 3.4 however.
>
> The previous patchset can be viewed here. That set also contains a
> more comprehensive description of the rationale for this:
>
> http://www.spinics.net/lists/linux-nfs/msg26324.html
>
> There have been a number of significant changes since the last set:
>
> - the remove/expire upcall is now gone. In a clustered environment, the
> records would need to be refcounted in order to handle that properly. That
> becomes a sticky problem when you could have nodes rebooting. We don't
> really need to remove these records individually however. Cleaning them
> out only when the grace period ends should be sufficient.
I don't think so:
1. Client establishes state with server.
2. Network goes down.
3. A lease period passes without the client being able to renew.
The server expires the client and grants conflicting locks to
other clients.
3. Server reboots.
4. Network comes back up.
At this point, the client sees that the server has rebooted and is in
its grace period, and reclaims. Ooops.
The server needs to be able to tell the client "nope, you're not allowed
to reclaim any more" at this point.
So we need some sort of remove/expire upcall.
--b.
> - an "init" upcall has been added. When knfsd starts, it will upcall to
> userspace for a boot_generation value. This will replace the boot time
> in the scheme that generates clientid4's. By doing this, we eliminate
> a potential problem with clientid4 collisions in clustered setups.
>
> - a "client index" has been added to the create and check upcalls. The
> kernel ignores this value for now, but eventually we'll need to use it
> to help establish which clients own which locks in a clustered setup.
>
> - a number of the client_tracking functions have been changed to void
> return functions. Since the callers all ignore the return codes of those
> functions, there was little benefit in passing them back.
>
> Comments and suggestions are welcome...
>
> Jeff Layton (6):
> nfsd: add nfsd4_client_tracking_ops struct and a way to set it
> sunrpc: create nfsd dir in rpc_pipefs
> nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
> nfsd: add a header describing upcall to nfsdcld
> nfsd: add the infrastructure to handle the cld upcall
> nfsd: get boot generation number from upcall instead of boot_time
>
> fs/nfsd/nfs4callback.c | 12 +-
> fs/nfsd/nfs4recover.c | 480 +++++++++++++++++++++++++++++++++++++++++++++-
> fs/nfsd/nfs4state.c | 63 +++----
> fs/nfsd/state.h | 22 ++-
> include/linux/nfsd/cld.h | 58 ++++++
> net/sunrpc/rpc_pipe.c | 5 +
> 6 files changed, 581 insertions(+), 59 deletions(-)
> create mode 100644 include/linux/nfsd/cld.h
>
> --
> 1.7.7.5
>
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: J. Bruce Fields @ 2012-01-24 23:10 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-nfs
On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
> - a number of the client_tracking functions have been changed to void
> return functions. Since the callers all ignore the return codes of those
> functions, there was little benefit in passing them back.
If you'd like to make that patch #1, I could apply it now.
--b.
>
> Comments and suggestions are welcome...
>
> Jeff Layton (6):
> nfsd: add nfsd4_client_tracking_ops struct and a way to set it
> sunrpc: create nfsd dir in rpc_pipefs
> nfsd: convert nfs4_client->cl_cb_flags to a generic flags field
> nfsd: add a header describing upcall to nfsdcld
> nfsd: add the infrastructure to handle the cld upcall
> nfsd: get boot generation number from upcall instead of boot_time
>
> fs/nfsd/nfs4callback.c | 12 +-
> fs/nfsd/nfs4recover.c | 480 +++++++++++++++++++++++++++++++++++++++++++++-
> fs/nfsd/nfs4state.c | 63 +++----
> fs/nfsd/state.h | 22 ++-
> include/linux/nfsd/cld.h | 58 ++++++
> net/sunrpc/rpc_pipe.c | 5 +
> 6 files changed, 581 insertions(+), 59 deletions(-)
> create mode 100644 include/linux/nfsd/cld.h
>
> --
> 1.7.7.5
>
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: J. Bruce Fields @ 2012-01-24 23:11 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-nfs
On Tue, Jan 24, 2012 at 06:08:55PM -0500, J. Bruce Fields wrote:
> On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
> > This is the fourth iteration of this patchset. I had originally asked
> > Bruce to take the last one for 3.3, but decided at the last minute to
> > wait on it a bit. I knew there would be some changes needed in the
> > upcall, so by waiting we can avoid needing to deal with those in code
> > that has already shipped. I would like to see this patchset considered
> > for 3.4 however.
> >
> > The previous patchset can be viewed here. That set also contains a
> > more comprehensive description of the rationale for this:
> >
> > http://www.spinics.net/lists/linux-nfs/msg26324.html
> >
> > There have been a number of significant changes since the last set:
> >
> > - the remove/expire upcall is now gone. In a clustered environment, the
> > records would need to be refcounted in order to handle that properly. That
> > becomes a sticky problem when you could have nodes rebooting. We don't
> > really need to remove these records individually however. Cleaning them
> > out only when the grace period ends should be sufficient.
>
> I don't think so:
>
> 1. Client establishes state with server.
> 2. Network goes down.
> 3. A lease period passes without the client being able to renew.
> The server expires the client and grants conflicting locks to
> other clients.
> 3. Server reboots.
(Doh. Counting is hard.)
--b.
> 4. Network comes back up.
>
> At this point, the client sees that the server has rebooted and is in
> its grace period, and reclaims. Ooops.
>
> The server needs to be able to tell the client "nope, you're not allowed
> to reclaim any more" at this point.
>
> So we need some sort of remove/expire upcall.
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: Jeff Layton @ 2012-01-25 11:41 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs
On Tue, 24 Jan 2012 18:08:55 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
> > This is the fourth iteration of this patchset. I had originally asked
> > Bruce to take the last one for 3.3, but decided at the last minute to
> > wait on it a bit. I knew there would be some changes needed in the
> > upcall, so by waiting we can avoid needing to deal with those in code
> > that has already shipped. I would like to see this patchset considered
> > for 3.4 however.
> >
> > The previous patchset can be viewed here. That set also contains a
> > more comprehensive description of the rationale for this:
> >
> > http://www.spinics.net/lists/linux-nfs/msg26324.html
> >
> > There have been a number of significant changes since the last set:
> >
> > - the remove/expire upcall is now gone. In a clustered environment, the
> > records would need to be refcounted in order to handle that properly. That
> > becomes a sticky problem when you could have nodes rebooting. We don't
> > really need to remove these records individually however. Cleaning them
> > out only when the grace period ends should be sufficient.
>
> I don't think so:
>
> 1. Client establishes state with server.
> 2. Network goes down.
> 3. A lease period passes without the client being able to renew.
> The server expires the client and grants conflicting locks to
> other clients.
> 3. Server reboots.
> 4. Network comes back up.
>
> At this point, the client sees that the server has rebooted and is in
> its grace period, and reclaims. Ooops.
>
> The server needs to be able to tell the client "nope, you're not allowed
> to reclaim any more" at this point.
>
> So we need some sort of remove/expire upcall.
>
Doh! I don't know what I was thinking -- you're correct and we do need
that.
Ok, I'll see about putting it back and will resend. That does make it
rather nasty to handle clients mounting from multiple nodes in the same
cluster though. We'll need to come up with a data model that allows for
that as well.
>
> If you'd like to make that patch #1, I could apply it now.
>
That's already patch #1, but I might need to change some of the
operations to take an additional arg with the server's address. For
now, let's hold off on applying anything until I'm clear on what these
ops need to look like. We still have some runway until the 3.4 merge
window...
Thanks,
--
Jeff Layton <jlayton@redhat.com>
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: J. Bruce Fields @ 2012-01-25 13:11 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-nfs
On Wed, Jan 25, 2012 at 06:41:58AM -0500, Jeff Layton wrote:
> On Tue, 24 Jan 2012 18:08:55 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>
> > On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
> > > This is the fourth iteration of this patchset. I had originally asked
> > > Bruce to take the last one for 3.3, but decided at the last minute to
> > > wait on it a bit. I knew there would be some changes needed in the
> > > upcall, so by waiting we can avoid needing to deal with those in code
> > > that has already shipped. I would like to see this patchset considered
> > > for 3.4 however.
> > >
> > > The previous patchset can be viewed here. That set also contains a
> > > more comprehensive description of the rationale for this:
> > >
> > > http://www.spinics.net/lists/linux-nfs/msg26324.html
> > >
> > > There have been a number of significant changes since the last set:
> > >
> > > - the remove/expire upcall is now gone. In a clustered environment, the
> > > records would need to be refcounted in order to handle that properly. That
> > > becomes a sticky problem when you could have nodes rebooting. We don't
> > > really need to remove these records individually however. Cleaning them
> > > out only when the grace period ends should be sufficient.
> >
> > I don't think so:
> >
> > 1. Client establishes state with server.
> > 2. Network goes down.
> > 3. A lease period passes without the client being able to renew.
> > The server expires the client and grants conflicting locks to
> > other clients.
> > 3. Server reboots.
> > 4. Network comes back up.
> >
> > At this point, the client sees that the server has rebooted and is in
> > its grace period, and reclaims. Ooops.
> >
> > The server needs to be able to tell the client "nope, you're not allowed
> > to reclaim any more" at this point.
> >
> > So we need some sort of remove/expire upcall.
> >
>
> Doh! I don't know what I was thinking -- you're correct and we do need
> that.
>
> Ok, I'll see about putting it back and will resend. That does make it
> rather nasty to handle clients mounting from multiple nodes in the same
> cluster though. We'll need to come up with a data model that allows for
> that as well.
Honestly, in the v4-based migration case if one client can hold state on
multiple nodes, and could (could it?) after reboot decide to reclaim
state on a different node from the one it previously held the same state
on--I'm not even clear what *should* happen, or if the protocol is
really adequate for that case.
--b.
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: Jeff Layton @ 2012-01-25 13:38 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: linux-nfs, chuck.lever
On Wed, 25 Jan 2012 08:11:17 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Wed, Jan 25, 2012 at 06:41:58AM -0500, Jeff Layton wrote:
> > On Tue, 24 Jan 2012 18:08:55 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> >
> > > On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
> > > > This is the fourth iteration of this patchset. I had originally asked
> > > > Bruce to take the last one for 3.3, but decided at the last minute to
> > > > wait on it a bit. I knew there would be some changes needed in the
> > > > upcall, so by waiting we can avoid needing to deal with those in code
> > > > that has already shipped. I would like to see this patchset considered
> > > > for 3.4 however.
> > > >
> > > > The previous patchset can be viewed here. That set also contains a
> > > > more comprehensive description of the rationale for this:
> > > >
> > > > http://www.spinics.net/lists/linux-nfs/msg26324.html
> > > >
> > > > There have been a number of significant changes since the last set:
> > > >
> > > > - the remove/expire upcall is now gone. In a clustered environment, the
> > > > records would need to be refcounted in order to handle that properly. That
> > > > becomes a sticky problem when you could have nodes rebooting. We don't
> > > > really need to remove these records individually however. Cleaning them
> > > > out only when the grace period ends should be sufficient.
> > >
> > > I don't think so:
> > >
> > > 1. Client establishes state with server.
> > > 2. Network goes down.
> > > 3. A lease period passes without the client being able to renew.
> > > The server expires the client and grants conflicting locks to
> > > other clients.
> > > 3. Server reboots.
> > > 4. Network comes back up.
> > >
> > > At this point, the client sees that the server has rebooted and is in
> > > its grace period, and reclaims. Ooops.
> > >
> > > The server needs to be able to tell the client "nope, you're not allowed
> > > to reclaim any more" at this point.
> > >
> > > So we need some sort of remove/expire upcall.
> > >
> >
> > Doh! I don't know what I was thinking -- you're correct and we do need
> > that.
> >
> > Ok, I'll see about putting it back and will resend. That does make it
> > rather nasty to handle clients mounting from multiple nodes in the same
> > cluster though. We'll need to come up with a data model that allows for
> > that as well.
>
> Honestly, in the v4-based migration case if one client can hold state on
> multiple nodes, and could (could it?) after reboot decide to reclaim
> state on a different node from the one it previously held the same state
> on--I'm not even clear what *should* happen, or if the protocol is
> really adequate for that case.
>
> --b.
That was one of Chuck's concerns, IIUC:
--------------[snip]----------------
What if a server has more than one address? For example, an IPv4 and
an IPv6 address? Does it get two separate database files? If so, how
do you ensure that a client's nfs_client_id4 is recorded in both places
atomically? I'm not sure tying the server's identity to an IP address
is wise.
--------------[snip]----------------
This is the problem...
We need to tie the record to some property that's invariant for the NFS
server "instance". That can't be a physical nodeid or anything, since
part of the goal here is to allow for cluster services to float freely
between them.
I really would like to avoid having to establish some abstract "service
ID" or something since we'd have to track that on stable storage on a
per-nfs-service basis.
The server address seems like a natural fit here. With the design I'm
proposing, a client will need to reestablish its state on another node
if it migrates for any reason.
Chuck, what was your specific worry about tracking these on a per
server address basis? Can you outline a scenario where that would break
something?
--
Jeff Layton <jlayton@redhat.com>
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
From: Chuck Lever @ 2012-01-25 16:47 UTC (permalink / raw)
To: Jeff Layton; +Cc: J. Bruce Fields, linux-nfs
On Jan 25, 2012, at 8:38 AM, Jeff Layton wrote:
> On Wed, 25 Jan 2012 08:11:17 -0500
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>
>> On Wed, Jan 25, 2012 at 06:41:58AM -0500, Jeff Layton wrote:
>>> On Tue, 24 Jan 2012 18:08:55 -0500
>>> "J. Bruce Fields" <bfields@fieldses.org> wrote:
>>>
>>>> On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
>>>>> This is the fourth iteration of this patchset. I had originally asked
>>>>> Bruce to take the last one for 3.3, but decided at the last minute to
>>>>> wait on it a bit. I knew there would be some changes needed in the
>>>>> upcall, so by waiting we can avoid needing to deal with those in code
>>>>> that has already shipped. I would like to see this patchset considered
>>>>> for 3.4 however.
>>>>>
>>>>> The previous patchset can be viewed here. That set also contains a
>>>>> more comprehensive description of the rationale for this:
>>>>>
>>>>> http://www.spinics.net/lists/linux-nfs/msg26324.html
>>>>>
>>>>> There have been a number of significant changes since the last set:
>>>>>
>>>>> - the remove/expire upcall is now gone. In a clustered environment, the
>>>>> records would need to be refcounted in order to handle that properly. That
>>>>> becomes a sticky problem when you could have nodes rebooting. We don't
>>>>> really need to remove these records individually however. Cleaning them
>>>>> out only when the grace period ends should be sufficient.
>>>>
>>>> I don't think so:
>>>>
>>>> 1. Client establishes state with server.
>>>> 2. Network goes down.
>>>> 3. A lease period passes without the client being able to renew.
>>>> The server expires the client and grants conflicting locks to
>>>> other clients.
>>>> 3. Server reboots.
>>>> 4. Network comes back up.
>>>>
>>>> At this point, the client sees that the server has rebooted and is in
>>>> its grace period, and reclaims. Ooops.
>>>>
>>>> The server needs to be able to tell the client "nope, you're not allowed
>>>> to reclaim any more" at this point.
>>>>
>>>> So we need some sort of remove/expire upcall.
>>>>
>>>
>>> Doh! I don't know what I was thinking -- you're correct and we do need
>>> that.
>>>
>>> Ok, I'll see about putting it back and will resend. That does make it
>>> rather nasty to handle clients mounting from multiple nodes in the same
>>> cluster though. We'll need to come up with a data model that allows for
>>> that as well.
>>
>> Honestly, in the v4-based migration case if one client can hold state on
>> multiple nodes, and could (could it?) after reboot decide to reclaim
>> state on a different node from the one it previously held the same state
>> on--I'm not even clear what *should* happen, or if the protocol is
>> really adequate for that case.
>>
>> --b.
>
> That was one of Chuck's concerns, IIUC:
>
> --------------[snip]----------------
>
> What if a server has more than one address? For example, an IPv4 and
> an IPv6 address? Does it get two separate database files? If so, how
> do you ensure that a client's nfs_client_id4 is recorded in both places
> atomically? I'm not sure tying the server's identity to an IP address
> is wise.
>
> --------------[snip]----------------
>
> This is the problem...
>
> We need to tie the record to some property that's invariant for the NFS
> server "instance". That can't be a physical nodeid or anything, since
> part of the goal here is to allow for cluster services to float freely
> between them.
>
> I really would like to avoid having to establish some abstract "service
> ID" or something since we'd have to track that on stable storage on a
> per-nfs-service basis.
I don't understand this concern. You are already building an on-disk database, so adding this item would add no more than a few bytes of overhead. And having a service ID is roughly the same as an NFSv4.1 server ID, if I understand this correctly.
> The server address seems like a natural fit here. With the design I'm
> proposing, a client will need to reestablish its state on another node
> if it migrates for any reason.
The server's IP address is certainly not invariant. It could be assigned via DHCP, for instance. But it definitely can be changed by an administrator at any time.
And a server can be multi-homed. It almost certainly will be multi-homed where IPv6 is present. Which IP address represents the server's identity?
We have the same problem on clients. We shouldn't (although we currently do) use the client's IP address in its nfs_client_id4 string: the string is supposed to be invariant, but IP addresses can change, and which address do you pick if there is more than one?
For NFSv4.1, you already have a single server ID object that is not linked to any of the server's IP addresses.
I think therefore that an IP address is actually the very last thing you should use to identify a server instance.
> Chuck, what was your specific worry about tracking these on a per
> server address basis? Can you outline a scenario where that would break
> something?
I'm having a hard time following the discussion, I must be lacking some context. But the problem is how NFSv4.0 clients detect server identity. The only way they can do it is by performing a SETCLIENTID_CONFIRM with a particular clientid4 against every server a client knows about. If the clientid4 is recognized by multiple server IPs, the client knows these IPs are the same server.
Thus if you are preserving clientid4's on stable storage, it seems to me that you need to preserve the relationship between a clientid4 and which servers recognize it.
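As a rough illustration of that check, here is a sketch in which the types and RPC helpers are made-up stand-ins for the SETCLIENTID/SETCLIENTID_CONFIRM operations, not the Linux client's actual code:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t clientid4;
typedef uint64_t verifier4;	/* really an opaque 8-byte verifier */

/* hypothetical RPC wrappers; each returns an NFSv4 status code (0 == OK) */
extern int do_setclientid(const char *server, const char *client_name,
                          clientid4 *id, verifier4 *verf);
extern int do_setclientid_confirm(const char *server, clientid4 id,
                                  verifier4 verf);

/* Are addresses A and B two faces of the same server instance? */
static bool same_server_instance(const char *addr_a, const char *addr_b,
                                 const char *client_name)
{
        clientid4 id;
        verifier4 verf;

        /* get a clientid4 from server A... */
        if (do_setclientid(addr_a, client_name, &id, &verf) != 0)
                return false;

        /* ...and see whether server B recognizes it; if it does, A and B
         * are addresses of one server, otherwise they are distinct
         * server instances with distinct state */
        return do_setclientid_confirm(addr_b, id, verf) == 0;
}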
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 16:47 ` Chuck Lever
@ 2012-01-25 17:14 ` J. Bruce Fields
2012-01-25 17:41 ` Chuck Lever
2012-01-25 19:08 ` Jeff Layton
1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2012-01-25 17:14 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Wed, Jan 25, 2012 at 11:47:41AM -0500, Chuck Lever wrote:
> I'm having a hard time following the discussion, I must be lacking some context. But the problem is how NFSv4.0 clients detect server identity. The only way they can do it is by performing a SETCLIENTID_CONFIRM with a particular clientid4 against every server a client knows about. If the clientid4 is recognized by multiple server IPs, the client knows these IPs are the same server.
>
> Thus if you are preserving clientid4's on stable storage, it seems to me that you need to preserve the relationship between a clientid4 and which servers recognize it.
The part I'm having trouble thinking about:
Suppose your cluster nodes all advertise each other as replicas using
v4 (fs_locations or fs_locations_info).
Suppose your clients support (v4-based) failover, either transparent or
not.
Now suppose your cluster reboots.
Must the client necessarily reclaim its locks against the same server
that it last acquired them from? And if not, how do we decide whether a
given reclaim is allowed or not?
I think there may be scenarios involving server reboots and the client
losing contact with some but not all servers, where we could get
confused about which (if any) state the client is allowed to reclaim.
--b.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 17:14 ` J. Bruce Fields
@ 2012-01-25 17:41 ` Chuck Lever
2012-01-25 18:55 ` J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2012-01-25 17:41 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs
Hi-
On Jan 25, 2012, at 12:14 PM, J. Bruce Fields wrote:
> On Wed, Jan 25, 2012 at 11:47:41AM -0500, Chuck Lever wrote:
>> I'm having a hard time following the discussion, I must be lacking some context. But the problem is how NFSv4.0 clients detect server identity. The only way they can do it is by performing a SETCLIENTID_CONFIRM with a particular clientid4 against every server a client knows about. If the clientid4 is recognized by multiple server IPs, the client knows these IPs are the same server.
>>
>> Thus if you are preserving clientid4's on stable storage, it seems to me that you need to preserve the relationship between a clientid4 and which servers recognize it.
>
> The part I'm having trouble thinking about:
>
> Suppose your cluster nodes all advertise each other as replicas using
> v4 (fs_locations or fs_locations_info).
>
> Suppose your clients support (v4-based) failover, either transparent or
> not.
>
> Now suppose your cluster reboots.
>
> Must the client necessarily reclaim its locks against the same server
> that it last acquired them from? And if not, how do we decide whether a
> given reclaim is allowed or not?
This is just my opinion, but...
One might define an NFSv4 server as an entity that passes out and recognizes clientid4's.
Suppose a client is talking to NFSv4 servers at IP addresses A and B. If the client gets a clientid4 from IP address A and performs a failing SETCLIENTID_CONFIRM with that clientid4 on IP address B, then one can say that A and B are IP addresses for distinct servers.
If the SETCLIENTID_CONFIRM succeeds, however, then the client must try other tests to confirm that A and B represent the same server. For example, can the client use the same state tokens when sending NFSv4 operations to A and B?
> I think there may be scenarios involving server reboots and the client
> losing contact with some but not all servers, where we could get
> confused about which (if any) state the client is allowed to reclaim.
If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.
Conversely, if all the servers in your cluster recognize a particular clientid4, then there must be some kind of state sharing relationship amongst them. Reclaiming against one should be sufficient. Not all clustering works that way, though.
fs_locations can pass out information for entirely separate server instances. fs_locations does not in any way indicate a relationship of state or data consistency between the servers in the lists it returns. fs_locations_info, in fact, returns much richer information that allows clients some visibility of state and data consistency amongst the listed servers. But the mere presence of a server in one of these lists is not enough for a wayward client to draw any conclusions about the need to reclaim state after a server reboot.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 17:41 ` Chuck Lever
@ 2012-01-25 18:55 ` J. Bruce Fields
2012-01-25 20:23 ` Jeff Layton
2012-01-25 20:29 ` Chuck Lever
0 siblings, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2012-01-25 18:55 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Wed, Jan 25, 2012 at 12:41:27PM -0500, Chuck Lever wrote:
> If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.
Hm, but does it have to do the recovery with that server?
And if so, then how does that fit with failover?
I mean, suppose the whole cluster is rebooted. From the client's point
of view, its server becomes unresponsive. So it should probably start
pinging the replicas to see if another one's up. The first server it
gets a response from won't necessarily be the one it was using before.
What happens next?
--b.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 16:47 ` Chuck Lever
2012-01-25 17:14 ` J. Bruce Fields
@ 2012-01-25 19:08 ` Jeff Layton
1 sibling, 0 replies; 28+ messages in thread
From: Jeff Layton @ 2012-01-25 19:08 UTC (permalink / raw)
To: Chuck Lever; +Cc: J. Bruce Fields, linux-nfs
On Wed, 25 Jan 2012 11:47:41 -0500
Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Jan 25, 2012, at 8:38 AM, Jeff Layton wrote:
>
> > On Wed, 25 Jan 2012 08:11:17 -0500
> > "J. Bruce Fields" <bfields@fieldses.org> wrote:
> >
> >> On Wed, Jan 25, 2012 at 06:41:58AM -0500, Jeff Layton wrote:
> >>> On Tue, 24 Jan 2012 18:08:55 -0500
> >>> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> >>>
> >>>> On Mon, Jan 23, 2012 at 03:01:01PM -0500, Jeff Layton wrote:
> >>>>> This is the fourth iteration of this patchset. I had originally asked
> >>>>> Bruce to take the last one for 3.3, but decided at the last minute to
> >>>>> wait on it a bit. I knew there would be some changes needed in the
> >>>>> upcall, so by waiting we can avoid needing to deal with those in code
> >>>>> that has already shipped. I would like to see this patchset considered
> >>>>> for 3.4 however.
> >>>>>
> >>>>> The previous patchset can be viewed here. That set also contains a
> >>>>> more comprehensive description of the rationale for this:
> >>>>>
> >>>>> http://www.spinics.net/lists/linux-nfs/msg26324.html
> >>>>>
> >>>>> There have been a number of significant changes since the last set:
> >>>>>
> >>>>> - the remove/expire upcall is now gone. In a clustered environment, the
> >>>>> records would need to be refcounted in order to handle that properly. That
> >>>>> becomes a sticky problem when you could have nodes rebooting. We don't
> >>>>> really need to remove these records individually however. Cleaning them
> >>>>> out only when the grace period ends should be sufficient.
> >>>>
> >>>> I don't think so:
> >>>>
> >>>> 1. Client establishes state with server.
> >>>> 2. Network goes down.
> >>>> 3. A lease period passes without the client being able to renew.
> >>>> The server expires the client and grants conflicting locks to
> >>>> other clients.
> >>>> 4. Server reboots.
> >>>> 5. Network comes back up.
> >>>>
> >>>> At this point, the client sees that the server has rebooted and is in
> >>>> its grace period, and reclaims. Ooops.
> >>>>
> >>>> The server needs to be able to tell the client "nope, you're not allowed
> >>>> to reclaim any more" at this point.
> >>>>
> >>>> So we need some sort of remove/expire upcall.
> >>>>
> >>>
> >>> Doh! I don't know what I was thinking -- you're correct and we do need
> >>> that.
> >>>
> >>> Ok, I'll see about putting it back and will resend. That does make it
> >>> rather nasty to handle clients mounting from multiple nodes in the same
> >>> cluster though. We'll need to come up with a data model that allows for
> >>> that as well.
> >>
> >> Honestly, in the v4-based migration case if one client can hold state on
> >> multiple nodes, and could (could it?) after reboot decide to reclaim
> >> state on a different node from the one it previously held the same state
> >> on--I'm not even clear what *should* happen, or if the protocol is
> >> really adequate for that case.
> >>
> >> --b.
> >
> > That was one of Chuck's concerns, IIUC:
> >
> > --------------[snip]----------------
> >
> > What if a server has more than one address? For example, an IPv4 and
> > an IPv6 address? Does it get two separate database files? If so, how
> > do you ensure that a client's nfs_client_id4 is recorded in both places
> > atomically? I'm not sure tying the server's identity to an IP address
> > is wise.
> >
> > --------------[snip]----------------
> >
> > This is the problem...
> >
> > We need to tie the record to some property that's invariant for the NFS
> > server "instance". That can't be a physical nodeid or anything, since
> > part of the goal here is to allow for cluster services to float freely
> > between them.
> >
> > I really would like to avoid having to establish some abstract "service
> > ID" or something since we'd have to track that on stable storage on a
> > per-nfs-service basis.
>
> I don't understand this concern. You are already building an on-disk database, so adding this item would add no more than a few bytes of overhead. And having a service ID is roughly the same as an NFSv4.1 server ID, if I understand this correctly.
It's more difficult than you think. Each physical server node could
potentially be home to one or more NFS "services". If we add this
persistent serviceid, how will we stick it into the kernel, and what
will we associate that with? How will the kernel know that it should
use serviceid #1 for a particular SETCLIENTID call and not serviceid #2?
Dealing with that means adding a lot of smarts to the kernel that I'm
keen to avoid having to deal with (think containerization). There's
nothing that stops someone from putting that in later if they choose,
but it's not necessary to do here.
>
> > The server address seems like a natural fit here. With the design I'm
> > proposing, a client will need to reestablish its state on another node
> > if it migrates for any reason.
>
> The server's IP address is certainly not invariant. It could be assigned via DHCP, for instance. But it definitely can be changed by an administrator at any time.
>
It's not invariant, but if the server reboots and its address changes,
can we reasonably assume that the client will know to pick up a new
address? While I'm all for coding for future flexibility, dealing with
that situation will add a lot of complexity that we really don't need
at the moment.
> And a server can be multi-homed. It almost certainly will be multi-homed where IPv6 is present. Which IP address represents the server's identity?
>
> We have the same problem on clients. We shouldn't (although we currently do) use the client's IP address in its nfs_client_id4 string: the string is supposed to be invariant, but IP addresses can change, and which address do you pick if there is more than one?
>
> For NFSv4.1, you already have a single server ID object that is not linked to any of the server's IP addresses.
>
> I think therefore that an IP address is actually the very last thing you should use to identify a server instance.
>
> > Chuck, what was your specific worry about tracking these on a per
> > server address basis? Can you outline a scenario where that would break
> > something?
>
> I'm having a hard time following the discussion, I must be lacking some context. But the problem is how NFSv4.0 clients detect server identity. The only way they can do it is by performing a SETCLIENTID_CONFIRM with a particular clientid4 against every server a client knows about. If the clientid4 is recognized by multiple server IPs, the client knows these IPs are the same server.
>
> Thus if you are preserving clientid4's on stable storage, it seems to me that you need to preserve the relationship between a clientid4 and which servers recognize it.
>
I'm not storing clientid4's on stable storage. The clientid4 will be
unique by virtue of the fact that nfsd will upcall (once) to get its
boot_generation value.
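As a sketch of that idea only (the variable names are guesses at what the patch might do, not a quote of it):

/* Sketch: clientid4 generation once nfsd has a boot_generation from
 * the "init" upcall.  Field names follow fs/nfsd, but this is not the
 * actual patch. */
static u32 boot_generation;	/* filled in once by the init upcall */

static void gen_clientid(struct nfs4_client *clp)
{
        static u32 current_clientid = 1;

        /* boot_generation replaces the old boot_time here, so two
         * cluster nodes (or two boots) can't hand out colliding
         * clientid4's */
        clp->cl_clientid.cl_boot = boot_generation;
        clp->cl_clientid.cl_id = current_clientid++;
}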
The basic idea here is to allow serving from more than one node of a
clustered filesystem at a time. The design I'm shooting for here is to
set it up so that a client could potentially migrate to another server
node in the cluster at any time. It could do that in response to the
address moving, a migration event, or for reasons of its own choosing.
There are a lot of details to the design that I don't want to go over
here, but basically we don't need anything quite that complicated. We
just need to ensure that if a client issues a reclaim request against a
service, we don't grant it unless the client actually held state there
before the last reboot.
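In other words, something with roughly this shape at reclaim time (the helper name is hypothetical; the real check would go through the client tracking ops):

static __be32 check_reclaim_allowed(struct nfs4_client *clp)
{
        /* did this client hold state against this service before the
         * last reboot?  If stable storage has no record of it, the
         * reclaim is refused. */
        if (!client_record_exists(clp))	/* hypothetical helper */
                return nfserr_reclaim_bad;
        return nfs_ok;
}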
Tying clientids to the server's address is fairly simple. True, if a
client migrates to a different address on a multihomed server, the
server wouldn't recognize its clientid4 on the new address, but is that
a use case we really need to concern ourselves with?
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 18:55 ` J. Bruce Fields
@ 2012-01-25 20:23 ` Jeff Layton
2012-01-25 21:25 ` J. Bruce Fields
2012-01-25 20:29 ` Chuck Lever
1 sibling, 1 reply; 28+ messages in thread
From: Jeff Layton @ 2012-01-25 20:23 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs
On Wed, 25 Jan 2012 13:55:29 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Wed, Jan 25, 2012 at 12:41:27PM -0500, Chuck Lever wrote:
> > If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.
>
> Hm, but does it have to do the recovery with that server?
>
> And if so, then how does that fit with failover?
>
> I mean, suppose the whole cluster is rebooted. From the client's point
> of view, its server becomes unresponsive. So it should probably start
> pinging the replicas to see if another one's up. The first server it
> gets a response from won't necessarily be the one it was using before.
> What happens next?
>
> --b.
Perhaps this will articulate the problem better:
Suppose we have a cluster of two machines (node1 and node2) and they
both serve out two exports (/exp1 and /exp2). A client mounts /exp1
from node1 and /exp2 from node2. Furthermore, let's assume that the
client sends the same name string to both hosts in the SETCLIENTID call.
Now, there's a network partition such that the client cannot talk to
node2 anymore, but can talk to node1. node2 expires the client and
tries to remove its record from stable storage...but we can't allow the
record to be purged since we need to keep the record around for node1.
So, we need to ensure that both nodes have their own record.
Fine, so let's assume that we tie that record to a "nodeid" or
something specific to the physical host. Now, let's suppose there's a
cluster-wide reboot and the network partition is repaired. The client
decides on its own to migrate all of its mounts to node1. But, we
can't allow it to reclaim the locks it held on /exp2. So maybe we need
to track the nfs_client_id4 on a per-node + per-fs basis...
Don't worry, it gets worse...suppose we end up with the client mounting
subdirectories of the same export from different hosts (say,
node1:/exp2/dir1 and node2:/exp2/dir2 -- it's pathological, but there's no
reason you couldn't do that). Now, it's not even sufficient to track
this info on a per-node + per-fs basis...
So, while it's all well and good to talk about keeping this flexible, I
think the only way to bring in sanity here is to put some artificial
constraints in place. I suggest that we only allow the reclaim of locks
on the original address against which they were established. It's less
than ideal, and we can try to loosen that later somehow, but doing
anything else on the first pass is going to be really ugly.
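One possible shape for the stable-storage record under that constraint, purely as an illustration (none of these names are from the patchset):

#include <stdint.h>
#include <sys/socket.h>		/* struct sockaddr_storage */

#define NFS4_OPAQUE_LIMIT 1024	/* matches nfsd's limit for client names */

struct client_track_record {
        struct sockaddr_storage cr_server_addr;	/* address the state was established against */
        uint16_t                cr_name_len;
        unsigned char           cr_name[NFS4_OPAQUE_LIMIT]; /* nfs_client_id4 */
};

Keying each record on both the server address and the client name means a reclaim would only be honored on the address where the state was originally established.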
Thoughts?
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 18:55 ` J. Bruce Fields
2012-01-25 20:23 ` Jeff Layton
@ 2012-01-25 20:29 ` Chuck Lever
2012-01-25 20:53 ` J. Bruce Fields
1 sibling, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2012-01-25 20:29 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs
On Jan 25, 2012, at 1:55 PM, J. Bruce Fields wrote:
> On Wed, Jan 25, 2012 at 12:41:27PM -0500, Chuck Lever wrote:
>> If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.
>
> Hm, but does it have to do the recovery with that server?
If a client has a lease and open state on that server, it should do recovery if the server reboots.
> And if so, then how does that fit with failover?
We were supposed to discuss that with Bill and Piyush. Maybe we can bring it up again at Connectathon. But my assumption is that fail over is supposed to look like a server reboot. The question is what clients does the server allow to recover, and which does it force to start fresh? Shouldn't it be enough for a server to remember nfs_client_id4 strings?
> I mean, suppose the whole cluster is rebooted. From the client's point
> of view, its server becomes unresponsive. So it should probably start
> pinging the replicas to see if another one's up. The first server it
> gets a response from won't necessarily be the one it was using before.
> What happens next?
Again, it depends on whether your clustering implementation shares state among all servers in the cluster.
Sharing state across a cluster seems like a difficult implementation choice. For simplicity I think each server in a cluster should manage its own leases. NFSv4.0 replication is generally for read-only datasets anyway.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 20:29 ` Chuck Lever
@ 2012-01-25 20:53 ` J. Bruce Fields
2012-01-25 21:08 ` Chuck Lever
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2012-01-25 20:53 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Wed, Jan 25, 2012 at 03:29:34PM -0500, Chuck Lever wrote:
>
> On Jan 25, 2012, at 1:55 PM, J. Bruce Fields wrote:
>
> > On Wed, Jan 25, 2012 at 12:41:27PM -0500, Chuck Lever wrote:
> >> If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.
> >
> > Hm, but does it have to do the recovery with that server?
>
> If a client has a lease and open state on that server, it should do recovery if the server reboots.
Yes, but does it have to do it against *that* server, or could it
recover against another?
Again, as long as failover is allowed, I think the latter is too.
> > And if so, then how does that fit with failover?
>
> We were supposed to discuss that with Bill and Piyush. Maybe we can bring it up again at Connectathon. But my assumption is that fail over is supposed to look like a server reboot.
That's what I assume too: but that means, if I'm a client, and I fail
over from server A to server B, and server B gives me a STALE error: I
don't know if that's just because I failed over, or if in fact A and/or
B did just reboot.
And from the point of view of the servers: they don't know if the state
I'm trying to reclaim is state I previously held from server A, or if
it's some other state that I previously held on server C (but then lost,
unbeknownst to me, due to a network partition that lost my RENEWs to C).
So I guess the servers would be stuck trying to track all that state
across reboots?
> The question is what clients does the server allow to recover, and which does it force to start fresh? Shouldn't it be enough for a server to remember nfs_client_id4 strings?
>
> > I mean, suppose the whole cluster is rebooted. From the client's point
> > of view, its server becomes unresponsive. So it should probably start
> > pinging the replicas to see if another one's up. The first server it
> > gets a response from won't necessarily be the one it was using before.
> > What happens next?
>
> Again, it depends on whether your clustering implementation shares state among all servers in the cluster.
Assume for now it doesn't.
--b.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 20:53 ` J. Bruce Fields
@ 2012-01-25 21:08 ` Chuck Lever
0 siblings, 0 replies; 28+ messages in thread
From: Chuck Lever @ 2012-01-25 21:08 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs
On Jan 25, 2012, at 3:53 PM, J. Bruce Fields wrote:
> On Wed, Jan 25, 2012 at 03:29:34PM -0500, Chuck Lever wrote:
>>
>> On Jan 25, 2012, at 1:55 PM, J. Bruce Fields wrote:
>>
>>> On Wed, Jan 25, 2012 at 12:41:27PM -0500, Chuck Lever wrote:
>>>> If SETCLIENTID returns a unique clientid4 that a client hasn't seen from other servers, the client knows that's a unique server instance which must be recovered separately after a reboot.
>>>
>>> Hm, but does it have to do the recovery with that server?
>>
>> If a client has a lease and open state on that server, it should do recovery if the server reboots.
>
> Yes, but does it have to do it against *that* server, or could it
> recover against another?
>
> Again, as long as failover is allowed, I think the latter is too.
Your questions assume a number of implementation details that are not in evidence. I think we should have an f2f or phone meeting to walk through this.
>
>>> And if so, then how does that fit with failover?
>>
>> We were supposed to discuss that with Bill and Piyush. Maybe we can bring it up again at Connectathon. But my assumption is that fail over is supposed to look like a server reboot.
>
> That's what I assume too: but that means, if I'm a client, and I fail
> over from server A to server B, and server B gives me a STALE error: I
> don't know if that's just because I failed over, or if in fact A and/or
> B did just reboot.
>
> And from the point of view of the servers: they don't know if the state
> I'm trying to reclaim is state I previously held from server A, or if
> it's some other state that I previously held on server C (but then lost,
> unbeknownst to me, due to a network partition that lost my RENEWs to C).
>
> So I guess the servers would be stuck trying to track all that state
> across reboots?
>
>> The question is what clients does the server allow to recover, and which does it force to start fresh? Shouldn't it be enough for a server to remember nfs_client_id4 strings?
>>
>>> I mean, suppose the whole cluster is rebooted. From the client's point
>>> of view, its server becomes unresponsive. So it should probably start
>>> pinging the replicas to see if another one's up. The first server it
>>> gets a response from won't necessarily be the one it was using before.
>>> What happens next?
>>
>> Again, it depends on whether your clustering implementation shares state among all servers in the cluster.
>
> Assume for now it doesn't.
>
> --b.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 20:23 ` Jeff Layton
@ 2012-01-25 21:25 ` J. Bruce Fields
2012-01-25 21:29 ` Chuck Lever
2012-01-27 15:43 ` Jeff Layton
0 siblings, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2012-01-25 21:25 UTC (permalink / raw)
To: Jeff Layton; +Cc: Chuck Lever, linux-nfs
On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
> I suggest that we only allow the reclaim of locks
> on the original address against which they were established.
I'm not sure what that means.
If a server stops responding, the v4.0 client has two choices: it can
either wait for the server to come back, and reclaim when it does. Or
if it supports failover it can go find another server and perform
reclaims over there.
I'm a little unclear how it does that, but I suppose it first tests
somehow to see whether its existing state is supported, and if not, it
establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
its old name, and then attempts to reclaim.
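A rough pseudocode reading of that sequence (the helpers are hypothetical wrappers around the SETCLIENTID calls, not the Linux client's actual structure):

#include <stdint.h>

/* hypothetical RPC wrappers; each returns an NFSv4 status (0 == OK) */
extern int do_setclientid(const char *server, const char *name,
                          uint64_t *id, uint64_t *verf);
extern int do_setclientid_confirm(const char *server, uint64_t id,
                                  uint64_t verf);
extern int reclaim_state(const char *server, uint64_t id);

static int failover_to(const char *replica, const char *name,
                       uint64_t old_id, uint64_t old_verf)
{
        uint64_t id, verf;

        /* 1. test whether the replica already honors our existing state */
        if (do_setclientid_confirm(replica, old_id, old_verf) == 0)
                return 0;	/* same instance; nothing to re-establish */

        /* 2. otherwise establish a new clientid using the old name... */
        if (do_setclientid(replica, name, &id, &verf) != 0)
                return -1;
        if (do_setclientid_confirm(replica, id, verf) != 0)
                return -1;

        /* 3. ...and then attempt to reclaim the state we held */
        return reclaim_state(replica, id);
}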
You're now requiring it *not* to do that if it happens that the servers
all rebooted in the meantime. How does it know that that's what
happened?
Or maybe that's not what you want to require, I'm not sure.
--b.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 21:25 ` J. Bruce Fields
@ 2012-01-25 21:29 ` Chuck Lever
2012-01-25 21:54 ` J. Bruce Fields
2012-01-27 15:43 ` Jeff Layton
1 sibling, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2012-01-25 21:29 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs
On Jan 25, 2012, at 4:25 PM, J. Bruce Fields wrote:
> On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
>> I suggest that we only allow the reclaim of locks
>> on the original address against which they were established.
>
> I'm not sure what that means.
>
> If a server stops responding, the v4.0 client has two choices: it can
> either wait for the server to come back, and reclaim when it does. Or
> if it supports failover it can go find another server and perform
> reclaims over there.
Honestly, I don't think that's possible in NFSv4.0.
The richer information provided by fs_locations_info can allow the client to determine whether two servers share more than simply data. That information does not exist in NFSv4.0, so there's no way a client can expect that two servers listed in fs_locations results will always have the same NFSv4 state.
Thus I believe that NFSv4.0 replication is limited to read-only data. But I have to go back and read that chapter of 3530 again.
> I'm a little unclear how it does that, but I suppose it first tests
> somehow to see whether its existing state is supported, and if not, it
> establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
> its old name, and then attempts to reclaim.
>
> You're now requiring it *not* to do that if it happens that the servers
> all rebooted in the meantime. How does it know that that's what
> happened?
>
> Or maybe that's not what you want to require, I'm not sure.
>
> --b.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 21:29 ` Chuck Lever
@ 2012-01-25 21:54 ` J. Bruce Fields
2012-01-25 21:55 ` Chuck Lever
0 siblings, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2012-01-25 21:54 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Wed, Jan 25, 2012 at 04:29:02PM -0500, Chuck Lever wrote:
>
> On Jan 25, 2012, at 4:25 PM, J. Bruce Fields wrote:
>
> > On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
> >> I suggest that we only allow the reclaim of locks
> >> on the original address against which they were established.
> >
> > I'm not sure what that means.
> >
> > If a server stops responding, the v4.0 client has two choices: it can
> > either wait for the server to come back, and reclaim when it does. Or
> > if it supports failover it can go find another server and perform
> > reclaims over there.
>
> Honestly, I don't think that's possible in NFSv4.0.
Well, it was *supposed* to be possible, wasn't it? I thought fixing up
3530 to make that work was part of what the discussion around
http://datatracker.ietf.org/doc/draft-dnoveck-nfsv4-migration-issues/
was about? (OK, I should actually read that.)
Agreed that there doesn't seem to be an agreed-upon way to do it now....
--b.
> The richer information provided by fs_locations_info can allow the client to determine whether two servers share more than simply data. That information does not exist in NFSv4.0, so there's no way a client can expect that two servers listed in fs_locations results will always have the same NFSv4 state.
>
> Thus I believe that NFSv4.0 replication is limited to read-only data. But I have to go back and read that chapter of 3530 again.
>
> > I'm a little unclear how it does that, but I suppose it first tests
> > somehow to see whether its existing state is supported, and if not, it
>>> establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
> > its old name, and then attempts to reclaim.
> >
> > You're now requiring it *not* to do that if it happens that the servers
> > all rebooted in the meantime. How does it know that that's what
> > happened?
> >
> > Or maybe that's not what you want to require, I'm not sure.
> >
> > --b.
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 21:54 ` J. Bruce Fields
@ 2012-01-25 21:55 ` Chuck Lever
2012-01-25 22:11 ` J. Bruce Fields
0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever @ 2012-01-25 21:55 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Jeff Layton, linux-nfs
On Jan 25, 2012, at 4:54 PM, J. Bruce Fields wrote:
> On Wed, Jan 25, 2012 at 04:29:02PM -0500, Chuck Lever wrote:
>>
>> On Jan 25, 2012, at 4:25 PM, J. Bruce Fields wrote:
>>
>>> On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
>>>> I suggest that we only allow the reclaim of locks
>>>> on the original address against which they were established.
>>>
>>> I'm not sure what that means.
>>>
>>> If a server stops responding, the v4.0 client has two choices: it can
>>> either wait for the server to come back, and reclaim when it does. Or
>>> if it supports failover it can go find another server and perform
>>> reclaims over there.
>>
>> Honestly, I don't think that's possible in NFSv4.0.
>
> Well, it was *supposed* to be possible, wasn't it? I thought fixing up
> 3530 to make that work was part of what the discussion around
> http://datatracker.ietf.org/doc/draft-dnoveck-nfsv4-migration-issues/
> was about? (OK, I should actually read that.)
No, that's just for migration. Failover is different, though it uses some of the same mechanisms.
> Agreed that there doesn't seem to be an agreed-upon way to do it now....
>
> --b.
>
>> The richer information provided by fs_locations_info can allow the client to determine whether two servers share more than simply data. That information does not exist in NFSv4.0, so there's no way a client can expect that two servers listed in fs_locations results will always have the same NFSv4 state.
>>
>> Thus I believe that NFSv4.0 replication is limited to read-only data. But I have to go back and read that chapter of 3530 again.
>>
>>> I'm a little unclear how it does that, but I suppose it first tests
>>> somehow to see whether its existing state is supported, and if not, it
>>> establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
>>> its old name, and then attempts to reclaim.
>>>
>>> You're now requiring it *not* to do that if it happens that the servers
>>> all rebooted in the meantime. How does it know that that's what
>>> happened?
>>>
>>> Or maybe that's not what you want to require, I'm not sure.
>>>
>>> --b.
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
>>
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 21:55 ` Chuck Lever
@ 2012-01-25 22:11 ` J. Bruce Fields
0 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2012-01-25 22:11 UTC (permalink / raw)
To: Chuck Lever; +Cc: Jeff Layton, linux-nfs
On Wed, Jan 25, 2012 at 04:55:54PM -0500, Chuck Lever wrote:
>
> On Jan 25, 2012, at 4:54 PM, J. Bruce Fields wrote:
>
> > On Wed, Jan 25, 2012 at 04:29:02PM -0500, Chuck Lever wrote:
> >>
> >> On Jan 25, 2012, at 4:25 PM, J. Bruce Fields wrote:
> >>
> >>> On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
> >>>> I suggest that we only allow the reclaim of locks
> >>>> on the original address against which they were established.
> >>>
> >>> I'm not sure what that means.
> >>>
> >>> If a server stops responding, the v4.0 client has two choices: it can
> >>> either wait for the server to come back, and reclaim when it does. Or
> >>> if it supports failover it can go find another server and perform
> >>> reclaims over there.
> >>
> >> Honestly, I don't think that's possible in NFSv4.0.
> >
> > Well, it was *supposed* to be possible, wasn't it? I thought fixing up
> > 3530 to make that work was part of what the discussion around
> > http://datatracker.ietf.org/doc/draft-dnoveck-nfsv4-migration-issues/
> > was about? (OK, I should actually read that.)
>
> No, that's just for migration. Failover is different, though it uses some of the same mechanisms.
Hm, I would've thought once you've worked out the clientid, then
failover would look pretty similar in 4.0 and 4.1.
And I don't think the richer locations info helps e.g. with the server
trying to track client-reclaimability across reboots and failovers.
--b.
> >> The richer information provided by fs_locations_info can allow the client to determine whether two servers share more than simply data. That information does not exist in NFSv4.0, so there's no way a client can expect that two servers listed in fs_locations results will always have the same NFSv4 state.
> >>
> >> Thus I believe that NFSv4.0 replication is limited to read-only data. But I have to go back and read that chapter of 3530 again.
> >>
> >>> I'm a little unclear how it does that, but I suppose it first tests
> >>> somehow to see whether its existing state is supported, and if not, it
> >>> establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
> >>> its old name, and then attempts to reclaim.
> >>>
> >>> You're now requiring it *not* to do that if it happens that the servers
> >>> all rebooted in the meantime. How does it know that that's what
> >>> happened?
> >>>
> >>> Or maybe that's not what you want to require, I'm not sure.
> >>>
> >>> --b.
> >>
> >> --
> >> Chuck Lever
> >> chuck[dot]lever[at]oracle[dot]com
> >>
> >>
> >>
> >>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
>
>
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH v4 0/6] nfsd: overhaul the client name tracking code
2012-01-25 21:25 ` J. Bruce Fields
2012-01-25 21:29 ` Chuck Lever
@ 2012-01-27 15:43 ` Jeff Layton
1 sibling, 0 replies; 28+ messages in thread
From: Jeff Layton @ 2012-01-27 15:43 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs
On Wed, 25 Jan 2012 16:25:53 -0500
"J. Bruce Fields" <bfields@fieldses.org> wrote:
> On Wed, Jan 25, 2012 at 03:23:56PM -0500, Jeff Layton wrote:
> > I suggest that we only allow the reclaim of locks
> > on the original address against which they were established.
>
> I'm not sure what that means.
>
> If a server stops responding, the v4.0 client has two choices: it can
> either wait for the server to come back, and reclaim when it does. Or
> if it supports failover it can go find another server and perform
> reclaims over there.
>
> I'm a little unclear how it does that, but I suppose it first tests
> somehow to see whether its existing state is supported, and if not, it
> establishes a new clientid with SETCLIENTID/SETCLIENTID_CONFIRM using
> its old name, and then attempts to reclaim.
>
> You're now requiring it *not* to do that if it happens that the servers
> all rebooted in the meantime. How does it know that that's what
> happened?
>
> Or maybe that's not what you want to require, I'm not sure.
>
Sorry I didn't respond sooner. I spent some time yesterday poring
over Dave's whitepaper and the RFCs to see if I could figure out a
better way to do this. Short answer: I don't think we can...
By the above, I meant that we can't reasonably allow a client to
acquire a lock on address A and then reclaim that lock on address B
after a reboot. But now I'm not even certain that's sufficient to
prevent all possible problems after a cold-start of the cluster.
In particular, I'm concerned about this one (from my earlier email):
> Don't worry, it gets worse...suppose we end up with the client mounting
> subdirectories of the same export from different hosts (say,
> node1:/exp2/dir1 and node2:/exp2/dir2 -- it's pathological, but there's no
> reason you couldn't do that). Now, it's not even sufficient to track
> this info on a per-node + per-fs basis...
We have no way to prevent someone from doing the above, or even to
reliably detect whether this has been done. The only way that I can see
that we could handle the above situation would be to track each
individual lock on stable storage along with enough information to know
which client owns it at a particular time. That's something I really
don't want to implement at this point in time...
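Just to give a sense of the scale of that, a per-lock record would have to carry something like the following; none of this exists in the current series:

#include <stdint.h>
#include <sys/socket.h>

#define NFS4_OPAQUE_LIMIT	1024	/* nfsd's limit for opaque client names */
#define NFS4_FHSIZE		128	/* maximum NFSv4 filehandle size */

struct lock_track_record {
        struct sockaddr_storage lr_server_addr;	/* node the lock was taken on */
        uint16_t                lr_client_len;
        unsigned char           lr_client[NFS4_OPAQUE_LIMIT]; /* nfs_client_id4 */
        uint16_t                lr_fh_len;
        unsigned char           lr_fh[NFS4_FHSIZE];	/* locked file */
        uint64_t                lr_offset;		/* byte-range start */
        uint64_t                lr_length;
        uint32_t                lr_type;		/* READ_LT or WRITE_LT */
};

Keeping something like that current would mean touching stable storage on every lock operation, which helps explain why it's unattractive.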
I'm going to continue researching this and seeing if I can come up with
a way to handle the clustered configuration sanely. What I'll probably
plan to do in the interim is to fix the patchsets that I have so far to
at least work properly in the single-node case. I'll also try to
"future-proof" the upcall format such that a clustered configuration
hopefully won't require much in the way of changes.
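For what it's worth, a versioned, fixed-size message is one way to leave that room to grow; a hypothetical shape (not the actual include/linux/nfsd/cld.h from this series) might look like:

#include <stdint.h>

#define CLD_UPCALL_VERSION	1
#define CLD_NAME_MAX		1024	/* room for an nfs_client_id4 */

enum cld_command {
        Cld_Create,	/* create a client record */
        Cld_Remove,	/* remove/expire a client record */
        Cld_Check,	/* may this client reclaim? */
        Cld_GraceDone,	/* grace over; purge unreclaimed records */
};

struct cld_name {
        uint16_t      cn_len;
        unsigned char cn_id[CLD_NAME_MAX];
} __attribute__((packed));

struct cld_msg {
        uint8_t  cm_vers;	/* CLD_UPCALL_VERSION */
        uint8_t  cm_cmd;	/* enum cld_command */
        int16_t  cm_status;	/* filled in by userspace on the downcall */
        uint32_t cm_xid;	/* matches an upcall to its downcall */
        union {
                int64_t         cm_gracetime;
                struct cld_name cm_name;
        } __attribute__((packed)) cm_u;
} __attribute__((packed));

A daemon can check cm_vers first and reject messages it doesn't understand, which leaves room to extend the format for clustered configurations later.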
--
Jeff Layton <jlayton@redhat.com>
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread (newest: 2012-01-27 15:43 UTC)
Thread overview: 28+ messages
2012-01-23 20:01 [PATCH v4 0/6] nfsd: overhaul the client name tracking code Jeff Layton
2012-01-23 20:01 ` [PATCH v4 1/6] nfsd: add nfsd4_client_tracking_ops struct and a way to set it Jeff Layton
2012-01-23 20:01 ` [PATCH v4 2/6] sunrpc: create nfsd dir in rpc_pipefs Jeff Layton
2012-01-23 20:01 ` [PATCH v4 3/6] nfsd: convert nfs4_client->cl_cb_flags to a generic flags field Jeff Layton
2012-01-23 20:01 ` [PATCH v4 4/6] nfsd: add a header describing upcall to nfsdcld Jeff Layton
2012-01-23 20:01 ` [PATCH v4 5/6] nfsd: add the infrastructure to handle the cld upcall Jeff Layton
2012-01-23 20:01 ` [PATCH v4 6/6] nfsd: get boot generation number from upcall instead of boot_time Jeff Layton
2012-01-24 23:08 ` [PATCH v4 0/6] nfsd: overhaul the client name tracking code J. Bruce Fields
2012-01-24 23:11 ` J. Bruce Fields
2012-01-25 11:41 ` Jeff Layton
2012-01-25 13:11 ` J. Bruce Fields
2012-01-25 13:38 ` Jeff Layton
2012-01-25 16:47 ` Chuck Lever
2012-01-25 17:14 ` J. Bruce Fields
2012-01-25 17:41 ` Chuck Lever
2012-01-25 18:55 ` J. Bruce Fields
2012-01-25 20:23 ` Jeff Layton
2012-01-25 21:25 ` J. Bruce Fields
2012-01-25 21:29 ` Chuck Lever
2012-01-25 21:54 ` J. Bruce Fields
2012-01-25 21:55 ` Chuck Lever
2012-01-25 22:11 ` J. Bruce Fields
2012-01-27 15:43 ` Jeff Layton
2012-01-25 20:29 ` Chuck Lever
2012-01-25 20:53 ` J. Bruce Fields
2012-01-25 21:08 ` Chuck Lever
2012-01-25 19:08 ` Jeff Layton
2012-01-24 23:10 ` J. Bruce Fields