cluster-devel.redhat.com archive mirror
* [Cluster-devel] [PATCH 1/2] NLM failover unlock commands
@ 2008-01-07  5:39 Wendy Cheng
       [not found] ` <message from Wendy Cheng on Monday January 7>
  2008-01-08 17:02 ` [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands Christoph Hellwig
  0 siblings, 2 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-07  5:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

We've implemented two new NFSD procfs files:

o /proc/fs/nfsd/unlock_ip
o /proc/fs/nfsd/unlock_filesystem

They are intended to allow an admin or user-mode script to release NLM 
locks based on either a path name or a server in-bound IP address (IPv4 
for now), for example:

shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem

The expected usage is in High Availability (HA) environments where NFS 
servers are clustered together to provide either load balancing or 
takeover upon server failure. The task normally starts by transferring a 
floating IP address from serverA to serverB with the following sequence:

ServerA:
1. Tear down the IP address
2. Unexport the path
3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
4. If an unmount is required,
      write the path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
5. Signal the peer to begin takeover.

For details, check out:
http://people.redhat.com/wcheng/Patches/NFS/NLM/004.txt

Acknowledgment goes to Neil Brown, who has offered support and 
guidance during our prototype efforts.

-- Wendy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: unlock_001.patch
Type: text/x-patch
Size: 11591 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20080107/5c32265e/attachment.bin>


* [Cluster-devel] [PATCH 2/2] Fix lockd panic
@ 2008-01-07  5:53 Wendy Cheng
  0 siblings, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-07  5:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This small patch has not been changed since our last discussion:
http://www.opensubscriber.com/message/nfs at lists.sourceforge.net/6348912.html

To recap the issue, a client could ask for a POSIX lock that invokes:

 >>>     server calls nlm4svc_proc_lock() ->
 >>>         * server lookup file (f_count++)
 >>>         * server lock the file
 >>>         * server calls nlm_release_host
 >>>         * server calls nlm_release_file (f_count--)
 >>>         * server return to client with status 0
 >>>

As part of the file lookup, the lock stays on the VFS inode->i_flock list 
with zero f_count. Any call into nlm_traverse_files() will then BUG() in 
locks_remove_flock() (fs/locks.c:2034) during fclose() if that file 
happens to be of no interest to that particular search: after 
nlm_inspect_file(), the logic unconditionally checks whether the file can 
be removed. Since the file has no blocked locks, nothing to do with 
shares, and a zero f_count, it gets removed from the hash and fclose() is 
invoked with the POSIX lock still hanging on the i_flock list.
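
For reference, the removal check in question (quoted again in the review
later in this thread) looks like this inside nlm_traverse_files(); with no
blocked locks, no shares, and a zero f_count, nothing stops the fclose()
even though a POSIX lock is still queued on i_flock:

	/* inside nlm_traverse_files(), after nlm_inspect_file() has run */
	if (list_empty(&file->f_blocks) && !file->f_locks
	 && !file->f_shares && !file->f_count) {
		hlist_del(&file->f_list);
		nlmsvc_ops->fclose(file->f_file);	/* BUG()s in locks_remove_flock() */
		kfree(file);
	}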

-- Wendy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: unlock_002.patch
Type: text/x-patch
Size: 939 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20080107/7dc95a17/attachment.bin>


* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found] ` <message from Wendy Cheng on Monday January 7>
@ 2008-01-08  5:18   ` Neil Brown
  2008-01-09  2:51     ` Wendy Cheng
  2008-01-08  5:31   ` [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic Neil Brown
  1 sibling, 1 reply; 57+ messages in thread
From: Neil Brown @ 2008-01-08  5:18 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Monday January 7, wcheng at redhat.com wrote:
> We've implemented two new NFSD procfs files:
> 
> o /proc/fs/nfsd/unlock_ip
> o /proc/fs/nfsd/unlock_filesystem
> 
> They are intended to allow an admin or user-mode script to release NLM 
> locks based on either a path name or a server in-bound IP address (IPv4 
> for now), for example:
> 
> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem

I'm happy with this interface and the code looks credible, so 
  Acked-by: NeilBrown <neilb@suse.de>

however......
	

> --- linux-o/fs/nfsd/nfsctl.c	2008-01-04 10:01:08.000000000 -0500
> +++ linux/fs/nfsd/nfsctl.c	2008-01-06 15:27:34.000000000 -0500
> @@ -288,6 +295,56 @@ static ssize_t write_getfd(struct file *
>  	return err;
>  }
>  
> +extern __u32 in_aton(const char *str);

Bad.  It is "__be32" in linux/inet.h, and the difference can be
important.
Can you just #include <linux/inet.h> ???
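
For context: __be32 is a sparse-annotated big-endian type, and assigning
it to a plain __u32 is exactly the kind of endianness mixup that
`make C=1` (sparse) flags. A minimal sketch, assuming the <linux/inet.h>
of this era:

	#include <linux/inet.h>	/* declares: __be32 in_aton(const char *str); */

	__be32 addr = in_aton("10.1.1.2");	/* network byte order */
	__u32 host_order = ntohl(addr);		/* convert explicitly when needed */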

> +
> +static
> +ssize_t failover_parse(int where, struct file *file, char *buf, size_t size)
> +{
> +	char *fo_path, *mesg;
> +	__be32 server_ip[4];

Why '4' ???

Also, fo_path is only sometimes a path, so the name choice could be
confusing.  You use "data" in the formal parameters for nfsd_fo_cmd,
which is more idiomatic at least.
Maybe we should have a 
union unlock_args {
	char *path;
	__be32 IPv4;
};
and pass around a pointer to such a union?
If you don't like that I would be happy with a 'void*', but not with a
'char *' called path.
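
To make the suggestion concrete, a hypothetical caller-side sketch (the
union and the retyped datap argument are Neil's proposal, not code from
the patch):

	union unlock_args {
		char	*path;
		__be32	IPv4;
	};

	union unlock_args args;
	args.IPv4 = in_aton(buf);
	/* assumes nfsd_fo_cmd's datap is retyped to void * or
	 * union unlock_args * */
	nfsd_fo_cmd(NFSD_FO_VIP, &args, 0);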

> @@ -717,7 +776,6 @@ static void __exit exit_nfsd(void)
>  	nfsd4_free_slabs();
>  	unregister_filesystem(&nfsd_fs_type);
>  }
> -
>  MODULE_AUTHOR("Olaf Kirch <okir@monad.swb.de>");
>  MODULE_LICENSE("GPL");
>  module_init(init_nfsd)

Any good reason for removing this blank line?


> +int nlmsvc_fo_match(struct nlm_host *dummy1, struct nlm_host *dummy2)
> +{
> +        return 1;
> +}

White space damage.  Did you run checkpatch.pl??

> +int
> +nlmsvc_fo_cmd(int cmd, void *datap, int grace_time)
> +{
> +	nlm_fo_cmd fo_cmd;
> +	int rc=-EINVAL;
> +
> +	fo_printk("lockd: nlmsvc_fo_cmd enter, cmd=%d, datap=0x%p, gp=%d\n",
> +		cmd, datap, grace_time);
> +
> +	fo_cmd.cmd   = cmd;
> +	fo_cmd.stat  = 0;
> +	fo_cmd.gp    = 0;
> +	fo_cmd.datap = datap;
> +
> +	/* "if" place holder for NFSD_FO_RESUME */
> +	{
> +		/* fo_start */
> +		rc = nlm_traverse_files((struct nlm_host*) &fo_cmd, 
> +					nlmsvc_fo_match);
> +		fo_printk("nlmsvc_fo_cmd rc=%d, stat=%d\n", rc, fo_cmd.stat);
> +	} 
> +
> +	return rc;
> +}
> +
> +EXPORT_SYMBOL(nlmsvc_fo_cmd);

I think today's convention is to not have a blank line before
EXPORT_SYMBOL.  checkpatch.pl should pick this up for you.

> --- linux-o/include/linux/lockd/lockd.h	2008-01-04 10:01:08.000000000 -0500
> +++ linux/include/linux/lockd/lockd.h	2008-01-06 15:14:55.000000000 -0500
> @@ -39,7 +39,7 @@
>  struct nlm_host {
>  	struct hlist_node	h_hash;		/* doubly linked list */
>  	struct sockaddr_in	h_addr;		/* peer address */
> -	struct sockaddr_in	h_saddr;	/* our address (optional) */
> +	struct sockaddr_in      h_saddr;        /* our address (optional) */
>  	struct rpc_clnt	*	h_rpcclnt;	/* RPC client to talk to peer */
>  	char *			h_name;		/* remote hostname */
>  	u32			h_version;	/* interface version */

This change is purely white-space breakage.


> @@ -214,6 +215,17 @@ void		  nlmsvc_mark_resources(void);
>  void		  nlmsvc_free_host_resources(struct nlm_host *);
>  void		  nlmsvc_invalidate_all(void);
>  
> +/* cluster failover support */
> +
> +typedef struct {
> +	int     cmd;
> +	int     stat;
> +	int     gp;
> +	void    *datap;
> +} nlm_fo_cmd;

gp???  I guess that means 'grace period'.  It isn't used at all in
this patch.  Ideally it should only be introduced in the patch which
uses it, but it definitely needs a better name - and preferably a
comment.
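
A hedged sketch of what this asks for - a plain struct with a descriptive,
commented field (the names here are illustrative only):

	struct nlm_fo_cmd {
		int	cmd;
		int	stat;
		int	grace_period;	/* seconds of grace after unlock;
					 * was 'gp', unused until a later patch */
		void	*datap;
	};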

NeilBrown




* [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic
       [not found] ` <message from Wendy Cheng on Monday January 7>
  2008-01-08  5:18   ` [Cluster-devel] " Neil Brown
@ 2008-01-08  5:31   ` Neil Brown
  2008-01-09  3:02     ` Wendy Cheng
  1 sibling, 1 reply; 57+ messages in thread
From: Neil Brown @ 2008-01-08  5:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Monday January 7, wcheng at redhat.com wrote:
> This small patch has not been changed since our last discussion:
> http://www.opensubscriber.com/message/nfs at lists.sourceforge.net/6348912.html
> 
> To recap the issue, a client could ask for a POSIX lock that invokes:
> 
>  >>>     server calls nlm4svc_proc_lock() ->
>  >>>         * server lookup file (f_count++)
>  >>>         * server lock the file
>  >>>         * server calls nlm_release_host
>  >>>         * server calls nlm_release_file (f_count--)
>  >>>         * server return to client with status 0
>  >>>
> 
> As part of the file lookup, the lock stays on the VFS inode->i_flock list 
> with zero f_count. Any call into nlm_traverse_files() will then BUG() in 
> locks_remove_flock() (fs/locks.c:2034) during fclose() if that file 
> happens to be of no interest to that particular search: after 
> nlm_inspect_file(), the logic unconditionally checks whether the file can 
> be removed. Since the file has no blocked locks, nothing to do with 
> shares, and a zero f_count, it gets removed from the hash and fclose() is 
> invoked with the POSIX lock still hanging on the i_flock list.

If I'm reading this correctly, this bug is introduced by your previous
patch.

The important difference between the old code and the new code here is
that the old code tests "file->f_locks" while the new code iterates
through i_flock to see if there are any lockd locks.

f_locks is set to a count of lockd locks in nlm_traverse_locks which
*was* always called by nlm_inspect_file which is called immediately
before the code you are changing.
But since your patch, nlm_inspect_file does not always call
nlm_traverse_locks, so there is a chance that f_locks is wrong.

With this patch, f_locks is not used at all any more.

Introducing a bug in one patch and fixing in the next is bad style.

Some options:

 Have an initial patch which removes all references to f_locks and
 includes the change in this patch.  With that in place your main
 patch won't introduce a bug.  If you do this, you should attempt to
 understand and justify the performance impact (does nlm_traverse_files
 become quadratic in the number of locks?  Is that acceptable?).

 Change the first patch to explicitly update f_count if you bypass the
 call to nlm_inspect_file.

 something else???

So NAK for this one in its current form... unless I've misunderstood
something.

NeilBrown

> 
> -- Wendy
> 
> This fixes the incorrect fclose call inside nlm_traverse_files() where
> a POSIX lock could still be held by an NFS client. The problem was found as
> a kernel panic inside locks_remove_flock() (fs/locks.c:2034) during the
> fclose call, due to NFS-NLM locks still hanging on the inode->i_flock list.
> 
> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
> 
>  svcsubs.c |    3 +--
>  1 files changed, 1 insertion(+), 2 deletions(-)
> 
> --- linux-nlm-1/fs/lockd/svcsubs.c	2008-01-06 18:23:20.000000000 -0500
> +++ linux/fs/lockd/svcsubs.c	2008-01-06 18:24:12.000000000 -0500
> @@ -332,8 +332,7 @@ nlm_traverse_files(struct nlm_host *host
>  			mutex_lock(&nlm_file_mutex);
>  			file->f_count--;
>  			/* No more references to this file. Let go of it. */
> -			if (list_empty(&file->f_blocks) && !file->f_locks
> -			 && !file->f_shares && !file->f_count) {
> +			if (!nlm_file_inuse(file)) {
>  				hlist_del(&file->f_list);
>  				nlmsvc_ops->fclose(file->f_file);
>  				kfree(file);




* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-07  5:39 [Cluster-devel] [PATCH 1/2] NLM failover unlock commands Wendy Cheng
       [not found] ` <message from Wendy Cheng on Monday January 7>
@ 2008-01-08 17:02 ` Christoph Hellwig
  2008-01-08 17:49   ` Christoph Hellwig
  2008-01-09  3:49   ` Wendy Cheng
  1 sibling, 2 replies; 57+ messages in thread
From: Christoph Hellwig @ 2008-01-08 17:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Mon, Jan 07, 2008 at 12:39:25AM -0500, Wendy Cheng wrote:
> +#define DEBUG 0
> +#define fo_printk(x...) ((void)(DEBUG && printk(x)))

Please don't introduce more debugging helpers but use the existing
ones.
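
For reference, lockd's existing debug helper is dprintk(), gated by the
per-file NLMDBG_FACILITY define already used in these patches, e.g.:

	#define NLMDBG_FACILITY		NLMDBG_SVCSUBS

	dprintk("lockd: nlmsvc_fo_cmd enter, cmd=%d\n", cmd);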

> +extern __u32 in_aton(const char *str);

This is properly declared in <linux/inet.h>

> +	return (nfsd_fo_cmd(where, fo_path, 0));

no parentheses around the return values please.  (happens multiple times)

> +int
> +nfsd_fo_cmd(int cmd, char *datap, int grace_period)
> +{
> +	struct nameidata nd;
> +	void *objp = (void *)datap;
> +	int rc=0;
> +
> +	if (cmd == NFSD_FO_PATH) { 
> +		rc = path_lookup((const char *)datap, 0, &nd);
> +		if (rc) {
> +			fo_printk("nfsd: nfsd_fo path (%s) not found\n", datap);
> +			return rc;
> +		}
> +		fo_printk("nfsd: nfsd_fo lookup path = (0x%p,0x%p)\n", 
> +			nd.mnt, nd.dentry);
> +		objp = (void *) &nd;
> +	} 
> +	return (nlmsvc_fo_cmd(cmd, objp, grace_period));

this has nothing in common for the two cases except for the final
function call.  Please just inline this function into the caller, which
gives you quite a bit of nice cleanup by not passing all the parameters
in odd ways as well.

And btw, I think this code has far too many debug printks,
almost more than code.  It would be more readable with fewer of them.

> +static inline int
> +nlmsvc_fo_unlock_match(void *datap, struct nlm_file *file)
> +{
> +	nlm_fo_cmd *fo_cmd = (nlm_fo_cmd *) datap;
> +	int cmd = fo_cmd->cmd;
> +	struct path *f_path;
> +
> +	fo_printk("nlm_fo_unlock_match cmd=%d\n", cmd);
> +
> +	if (cmd == NFSD_FO_VIP) {

Please split this into two separate functions for the NFSD_FO_VIP/
NFSD_FO_PATH cases as there's just about nothing in common for the two.

>  {
> +	/* Cluster failover has timing constraints. There is a slight
> +	 * performance hit if nlm_fo_unlock_match() is implemented as 
> +	 * a match fn (since it will be invoked for each block, share,
> +	 * and lock later when the lists are traversed). Instead, we 
> +	 * add path-matching logic into the following unlikely clause. 
> +	 * If matches, the dummy nlmsvc_fo_match will always return 
> +	 * true. 
> +	 */
> +	dprintk("nlm_inspect_files: file=%p\n", file);
> +	if (unlikely(match == nlmsvc_fo_match)) {
> +		if (!nlmsvc_fo_unlock_match((void *)host, file))
> +			return 0;
> +		fo_printk("nlm_fo find lock file entry (0x%p)\n", file);
> +	}

That's quite a nasty hack.  Did you benchmark the match fn variant
to see if there is actually any measurable difference?  Also no need
to downcast pointers to void *, it's implicit in C.
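
On the last point: in C any object pointer converts to void * implicitly,
so the two calls below are equivalent (datap is declared void *):

	nlmsvc_fo_unlock_match((void *)host, file);	/* unnecessary cast */
	nlmsvc_fo_unlock_match(host, file);		/* same meaning */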

> +	/* "if" place holder for NFSD_FO_RESUME */
> +	{

no need for such placeholders.

> +/* cluster failover support */
> +
> +typedef struct {
> +	int     cmd;
> +	int     stat;
> +	int     gp;
> +	void    *datap;
> +} nlm_fo_cmd;

please don't introduce typedefs for struct types.




* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-08 17:02 ` [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands Christoph Hellwig
@ 2008-01-08 17:49   ` Christoph Hellwig
  2008-01-08 20:57     ` Wendy Cheng
  2008-01-09  3:49   ` Wendy Cheng
  1 sibling, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2008-01-08 17:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Ok, I played around with this and cleaned up the ip/path code paths to
be entirely separate, which helped the code quite a bit.  Also a few other
cleanups and two bugfixes (positive error code returned and missing
path_release) fell out of it.  I still don't like what's going on in
fs/lockd/svcsubs.c; it would be much better if the cluster unlock code
simply didn't use nlm_traverse_files but did its own loop over
the nlm hosts.  That should also make the second patch obsolete.
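
A rough sketch of what such a loop might look like - untested, not part of
the patch, and assuming the lockd host table layout of this era (an
nlm_hosts hash of struct nlm_host entries local to fs/lockd/host.c, with
h_saddr holding the bound server address; nlm_host_mutex locking omitted):

	static void nlmsvc_unlock_by_ip(__be32 server_addr)
	{
		struct nlm_host *host;
		struct hlist_node *pos;
		int i;

		for (i = 0; i < NLM_HOST_NRHASH; i++) {
			hlist_for_each_entry(host, pos, &nlm_hosts[i], h_hash) {
				if (host->h_saddr.sin_addr.s_addr == server_addr)
					nlmsvc_free_host_resources(host);
			}
		}
	}

nlmsvc_free_host_resources() already exists (see the lockd.h hunk below),
so the failover path would reuse the normal per-host cleanup instead of a
special file traversal.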


Index: linux-2.6/fs/nfsd/nfsctl.c
===================================================================
--- linux-2.6.orig/fs/nfsd/nfsctl.c	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/fs/nfsd/nfsctl.c	2008-01-08 18:45:55.000000000 +0100
@@ -22,6 +22,7 @@
 #include <linux/seq_file.h>
 #include <linux/pagemap.h>
 #include <linux/init.h>
+#include <linux/inet.h>
 #include <linux/string.h>
 #include <linux/smp_lock.h>
 #include <linux/ctype.h>
@@ -35,6 +36,7 @@
 #include <linux/nfsd/cache.h>
 #include <linux/nfsd/xdr.h>
 #include <linux/nfsd/syscall.h>
+#include <linux/lockd/lockd.h>
 
 #include <asm/uaccess.h>
 
@@ -52,6 +54,8 @@ enum {
 	NFSD_Getfs,
 	NFSD_List,
 	NFSD_Fh,
+	NFSD_FO_UnlockIP,
+	NFSD_FO_UnlockFS,
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Versions,
@@ -88,6 +92,9 @@ static ssize_t write_leasetime(struct fi
 static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
 #endif
 
+static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size);
+static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size);
+
 static ssize_t (*write_op[])(struct file *, char *, size_t) = {
 	[NFSD_Svc] = write_svc,
 	[NFSD_Add] = write_add,
@@ -97,6 +104,8 @@ static ssize_t (*write_op[])(struct file
 	[NFSD_Getfd] = write_getfd,
 	[NFSD_Getfs] = write_getfs,
 	[NFSD_Fh] = write_filehandle,
+	[NFSD_FO_UnlockIP] = failover_unlock_ip,
+	[NFSD_FO_UnlockFS] = failover_unlock_fs,
 	[NFSD_Threads] = write_threads,
 	[NFSD_Pool_Threads] = write_pool_threads,
 	[NFSD_Versions] = write_versions,
@@ -288,6 +297,55 @@ static ssize_t write_getfd(struct file *
 	return err;
 }
 
+static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
+{
+	__be32 server_ip;
+	char *fo_path;
+	char *mesg;
+
+	/* sanity check */
+	if (size <= 0)
+		return -EINVAL;
+
+	if (buf[size-1] == '\n')
+		buf[size-1] = 0;
+
+	fo_path = mesg = buf;
+	if (qword_get(&mesg, fo_path, size) < 0)
+		return -EINVAL;
+
+	server_ip = in_aton(fo_path);
+	return nlmsvc_failover_ip(server_ip[0]);
+}
+
+static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
+{
+	struct nameidata nd;
+	char *fo_path;
+	char *mesg;
+	int error;
+
+	/* sanity check */
+	if (size <= 0)
+		return -EINVAL;
+
+	if (buf[size-1] == '\n')
+		buf[size-1] = 0;
+
+	fo_path = mesg = buf;
+	if (qword_get(&mesg, fo_path, size) < 0)
+		return -EINVAL;
+
+	error = path_lookup(fo_path, 0, &nd);
+	if (error)
+		return error;
+
+	error = nlmsvc_failover_path(&nd);
+
+	path_release(&nd);
+	return error;
+}
+
 static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
 {
 	/* request is:
@@ -646,6 +704,8 @@ static int nfsd_fill_super(struct super_
 		[NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
+		[NFSD_FO_UnlockIP] = {"unlock_ip", &transaction_ops, S_IWUSR|S_IRUSR},
+		[NFSD_FO_UnlockFS] = {"unlock_filesystem", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
Index: linux-2.6/fs/lockd/svcsubs.c
===================================================================
--- linux-2.6.orig/fs/lockd/svcsubs.c	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/fs/lockd/svcsubs.c	2008-01-08 18:44:11.000000000 +0100
@@ -18,6 +18,8 @@
 #include <linux/lockd/lockd.h>
 #include <linux/lockd/share.h>
 #include <linux/lockd/sm_inter.h>
+#include <linux/module.h>
+#include <linux/mount.h>
 
 #define NLMDBG_FACILITY		NLMDBG_SVCSUBS
 
@@ -87,7 +89,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
 	unsigned int	hash;
 	__be32		nfserr;
 
-	nlm_debug_print_fh("nlm_file_lookup", f);
+	nlm_debug_print_fh("nlm_lookup_file", f);
 
 	hash = file_hash(f);
 
@@ -123,6 +125,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
 
 	hlist_add_head(&file->f_list, &nlm_files[hash]);
 
+	/* fill in f_iaddr for nlm lock failover */
+	file->f_iaddr = rqstp->rq_daddr;
+
 found:
 	dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
 	*result = file;
@@ -194,12 +199,63 @@ again:
 	return 0;
 }
 
+static int
+nlmsvc_fo_unlock_match_path(void *datap, struct nlm_file *file)
+{
+	struct nameidata *nd = datap;
+	return nd->mnt == file->f_file->f_path.mnt;
+}
+
+static int
+nlmsvc_fo_unlock_match_ip(void *datap, struct nlm_file *file)
+{
+	struct in_addr *in = datap;
+
+	return file->f_iaddr.addr.s_addr == in->s_addr;
+}
+
+/*
+ * To fit the logic into current lockd code structure, we add a
+ * little wrapper function here. The real matching task should be
+ * carried out by nlm_fo_check_fsid().
+ */
+
+static int nlmsvc_fo_match_path(struct nlm_host *dummy1,
+		struct nlm_host *dummy2)
+{
+        return 1;
+}
+
+static int nlmsvc_fo_match_ip(struct nlm_host *dummy1, struct nlm_host *dummy2)
+{
+        return 1;
+}
+
 /*
  * Inspect a single file
  */
 static inline int
 nlm_inspect_file(struct nlm_host *host, struct nlm_file *file, nlm_host_match_fn_t match)
 {
+	/*
+	 * Cluster failover has timing constraints. There is a slight
+	 * performance hit if nlm_fo_unlock_match() is implemented as
+	 * a match fn (since it will be invoked for each block, share,
+	 * and lock later when the lists are traversed). Instead, we
+	 * add path-matching logic into the following unlikely clause.
+	 * If matches, the dummy nlmsvc_fo_match will always return
+	 * true.
+	 */
+	dprintk("nlm_inspect_files: file=%p\n", file);
+	if (unlikely(match == nlmsvc_fo_match_path)) {
+		if (!nlmsvc_fo_unlock_match_path(host, file))
+			return 0;
+	}
+	if (unlikely(match == nlmsvc_fo_match_ip)) {
+		if (!nlmsvc_fo_unlock_match_ip(host, file))
+			return 0;
+	}
+
 	nlmsvc_traverse_blocks(host, file, match);
 	nlmsvc_traverse_shares(host, file, match);
 	return nlm_traverse_locks(host, file, match);
@@ -370,3 +426,22 @@ nlmsvc_invalidate_all(void)
 	 */
 	nlm_traverse_files(NULL, nlmsvc_is_client);
 }
+
+/*
+ * Release locks associated with an export fsid upon failover
+ *      invoked via nfsd nfsctl call (write_fo_unlock).
+ */
+int
+nlmsvc_failover_path(struct nameidata *nd)
+{
+	return nlm_traverse_files((struct nlm_host*)nd, nlmsvc_fo_match_path);
+}
+EXPORT_SYMBOL_GPL(nlmsvc_failover_path);
+
+int
+nlmsvc_failover_ip(__be32 server_addr)
+{
+	return nlm_traverse_files((struct nlm_host *)&server_addr,
+				nlmsvc_fo_match_ip);
+}
+EXPORT_SYMBOL_GPL(nlmsvc_failover_ip);
Index: linux-2.6/include/linux/lockd/lockd.h
===================================================================
--- linux-2.6.orig/include/linux/lockd/lockd.h	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/include/linux/lockd/lockd.h	2008-01-08 18:44:44.000000000 +0100
@@ -113,6 +113,7 @@ struct nlm_file {
 	unsigned int		f_locks;	/* guesstimate # of locks */
 	unsigned int		f_count;	/* reference count */
 	struct mutex		f_mutex;	/* avoid concurrent access */
+	union svc_addr_u	f_iaddr;	/* server ip for failover */
 };
 
 /*
@@ -214,6 +215,12 @@ void		  nlmsvc_mark_resources(void);
 void		  nlmsvc_free_host_resources(struct nlm_host *);
 void		  nlmsvc_invalidate_all(void);
 
+/*
+ * Cluster failover support
+ */
+int           nlmsvc_failover_path(struct nameidata *nd);
+int           nlmsvc_failover_ip(__be32 server_addr);
+
 static __inline__ struct inode *
 nlmsvc_file_inode(struct nlm_file *file)
 {




* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-08 17:49   ` Christoph Hellwig
@ 2008-01-08 20:57     ` Wendy Cheng
  2008-01-09 18:02       ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-08 20:57 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Christoph Hellwig wrote:
> Ok, I played around with this and cleaned up the ip/path code paths to
> be entirely separate, which helped the code quite a bit.  Also a few other
>   
Thanks for doing this :) . I'm in the middle of running it with our cluster 
test - if it passes, I will repost it. Get your "Signed-off-by" line ready?

-- Wendy
> cleanups and two bugfixes (positive error code returned and missing
> path_release) fell out of it.  I still don't like what's going on in
> fs/lockd/svcsubs.c; it would be much better if the cluster unlock code
> simply didn't use nlm_traverse_files but did its own loop over
> the nlm hosts.  That should also make the second patch obsolete.
>   
>
> Index: linux-2.6/fs/nfsd/nfsctl.c
> ===================================================================
> --- linux-2.6.orig/fs/nfsd/nfsctl.c	2008-01-08 18:36:37.000000000 +0100
> +++ linux-2.6/fs/nfsd/nfsctl.c	2008-01-08 18:45:55.000000000 +0100
> @@ -22,6 +22,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/pagemap.h>
>  #include <linux/init.h>
> +#include <linux/inet.h>
>  #include <linux/string.h>
>  #include <linux/smp_lock.h>
>  #include <linux/ctype.h>
> @@ -35,6 +36,7 @@
>  #include <linux/nfsd/cache.h>
>  #include <linux/nfsd/xdr.h>
>  #include <linux/nfsd/syscall.h>
> +#include <linux/lockd/lockd.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -52,6 +54,8 @@ enum {
>  	NFSD_Getfs,
>  	NFSD_List,
>  	NFSD_Fh,
> +	NFSD_FO_UnlockIP,
> +	NFSD_FO_UnlockFS,
>  	NFSD_Threads,
>  	NFSD_Pool_Threads,
>  	NFSD_Versions,
> @@ -88,6 +92,9 @@ static ssize_t write_leasetime(struct fi
>  static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
>  #endif
>  
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size);
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size);
> +
>  static ssize_t (*write_op[])(struct file *, char *, size_t) = {
>  	[NFSD_Svc] = write_svc,
>  	[NFSD_Add] = write_add,
> @@ -97,6 +104,8 @@ static ssize_t (*write_op[])(struct file
>  	[NFSD_Getfd] = write_getfd,
>  	[NFSD_Getfs] = write_getfs,
>  	[NFSD_Fh] = write_filehandle,
> +	[NFSD_FO_UnlockIP] = failover_unlock_ip,
> +	[NFSD_FO_UnlockFS] = failover_unlock_fs,
>  	[NFSD_Threads] = write_threads,
>  	[NFSD_Pool_Threads] = write_pool_threads,
>  	[NFSD_Versions] = write_versions,
> @@ -288,6 +297,55 @@ static ssize_t write_getfd(struct file *
>  	return err;
>  }
>  
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
> +{
> +	__be32 server_ip;
> +	char *fo_path;
> +	char *mesg;
> +
> +	/* sanity check */
> +	if (size <= 0)
> +		return -EINVAL;
> +
> +	if (buf[size-1] == '\n')
> +		buf[size-1] = 0;
> +
> +	fo_path = mesg = buf;
> +	if (qword_get(&mesg, fo_path, size) < 0)
> +		return -EINVAL;
> +
> +	server_ip = in_aton(fo_path);
> +	return nlmsvc_failover_ip(server_ip[0]);
> +}
> +
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
> +{
> +	struct nameidata nd;
> +	char *fo_path;
> +	char *mesg;
> +	int error;
> +
> +	/* sanity check */
> +	if (size <= 0)
> +		return -EINVAL;
> +
> +	if (buf[size-1] == '\n')
> +		buf[size-1] = 0;
> +
> +	fo_path = mesg = buf;
> +	if (qword_get(&mesg, fo_path, size) < 0)
> +		return -EINVAL;
> +
> +	error = path_lookup(fo_path, 0, &nd);
> +	if (error)
> +		return error;
> +
> +	error = nlmsvc_failover_path(&nd);
> +
> +	path_release(&nd);
> +	return error;
> +}
> +
>  static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
>  {
>  	/* request is:
> @@ -646,6 +704,8 @@ static int nfsd_fill_super(struct super_
>  		[NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
> +		[NFSD_FO_UnlockIP] = {"unlock_ip", &transaction_ops, S_IWUSR|S_IRUSR},
> +		[NFSD_FO_UnlockFS] = {"unlock_filesystem", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
> Index: linux-2.6/fs/lockd/svcsubs.c
> ===================================================================
> --- linux-2.6.orig/fs/lockd/svcsubs.c	2008-01-08 18:36:37.000000000 +0100
> +++ linux-2.6/fs/lockd/svcsubs.c	2008-01-08 18:44:11.000000000 +0100
> @@ -18,6 +18,8 @@
>  #include <linux/lockd/lockd.h>
>  #include <linux/lockd/share.h>
>  #include <linux/lockd/sm_inter.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
>  
>  #define NLMDBG_FACILITY		NLMDBG_SVCSUBS
>  
> @@ -87,7 +89,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
>  	unsigned int	hash;
>  	__be32		nfserr;
>  
> -	nlm_debug_print_fh("nlm_file_lookup", f);
> +	nlm_debug_print_fh("nlm_lookup_file", f);
>  
>  	hash = file_hash(f);
>  
> @@ -123,6 +125,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
>  
>  	hlist_add_head(&file->f_list, &nlm_files[hash]);
>  
> +	/* fill in f_iaddr for nlm lock failover */
> +	file->f_iaddr = rqstp->rq_daddr;
> +
>  found:
>  	dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
>  	*result = file;
> @@ -194,12 +199,63 @@ again:
>  	return 0;
>  }
>  
> +static int
> +nlmsvc_fo_unlock_match_path(void *datap, struct nlm_file *file)
> +{
> +	struct nameidata *nd = datap;
> +	return nd->mnt == file->f_file->f_path.mnt;
> +}
> +
> +static int
> +nlmsvc_fo_unlock_match_ip(void *datap, struct nlm_file *file)
> +{
> +	struct in_addr *in = datap;
> +
> +	return file->f_iaddr.addr.s_addr == in->s_addr;
> +}
> +
> +/*
> + * To fit the logic into current lockd code structure, we add a
> + * little wrapper function here. The real matching task should be
> + * carried out by nlm_fo_check_fsid().
> + */
> +
> +static int nlmsvc_fo_match_path(struct nlm_host *dummy1,
> +		struct nlm_host *dummy2)
> +{
> +        return 1;
> +}
> +
> +static int nlmsvc_fo_match_ip(struct nlm_host *dummy1, struct nlm_host *dummy2)
> +{
> +        return 1;
> +}
> +
>  /*
>   * Inspect a single file
>   */
>  static inline int
>  nlm_inspect_file(struct nlm_host *host, struct nlm_file *file, nlm_host_match_fn_t match)
>  {
> +	/*
> +	 * Cluster failover has timing constraints. There is a slight
> +	 * performance hit if nlm_fo_unlock_match() is implemented as
> +	 * a match fn (since it will be invoked for each block, share,
> +	 * and lock later when the lists are traversed). Instead, we
> +	 * add path-matching logic into the following unlikely clause.
> +	 * If matches, the dummy nlmsvc_fo_match will always return
> +	 * true.
> +	 */
> +	dprintk("nlm_inspect_files: file=%p\n", file);
> +	if (unlikely(match == nlmsvc_fo_match_path)) {
> +		if (!nlmsvc_fo_unlock_match_path(host, file))
> +			return 0;
> +	}
> +	if (unlikely(match == nlmsvc_fo_match_ip)) {
> +		if (!nlmsvc_fo_unlock_match_ip(host, file))
> +			return 0;
> +	}
> +
>  	nlmsvc_traverse_blocks(host, file, match);
>  	nlmsvc_traverse_shares(host, file, match);
>  	return nlm_traverse_locks(host, file, match);
> @@ -370,3 +426,22 @@ nlmsvc_invalidate_all(void)
>  	 */
>  	nlm_traverse_files(NULL, nlmsvc_is_client);
>  }
> +
> +/*
> + * Release locks associated with an export fsid upon failover
> + *      invoked via nfsd nfsctl call (write_fo_unlock).
> + */
> +int
> +nlmsvc_failover_path(struct nameidata *nd)
> +{
> +	return nlm_traverse_files((struct nlm_host*)nd, nlmsvc_fo_match_path);
> +}
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_path);
> +
> +int
> +nlmsvc_failover_ip(__be32 server_addr)
> +{
> +	return nlm_traverse_files((struct nlm_host *)&server_addr,
> +				nlmsvc_fo_match_ip);
> +}
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_ip);
> Index: linux-2.6/include/linux/lockd/lockd.h
> ===================================================================
> --- linux-2.6.orig/include/linux/lockd/lockd.h	2008-01-08 18:36:37.000000000 +0100
> +++ linux-2.6/include/linux/lockd/lockd.h	2008-01-08 18:44:44.000000000 +0100
> @@ -113,6 +113,7 @@ struct nlm_file {
>  	unsigned int		f_locks;	/* guesstimate # of locks */
>  	unsigned int		f_count;	/* reference count */
>  	struct mutex		f_mutex;	/* avoid concurrent access */
> +	union svc_addr_u	f_iaddr;	/* server ip for failover */
>  };
>  
>  /*
> @@ -214,6 +215,12 @@ void		  nlmsvc_mark_resources(void);
>  void		  nlmsvc_free_host_resources(struct nlm_host *);
>  void		  nlmsvc_invalidate_all(void);
>  
> +/*
> + * Cluster failover support
> + */
> +int           nlmsvc_failover_path(struct nameidata *nd);
> +int           nlmsvc_failover_ip(__be32 server_addr);
> +
>  static __inline__ struct inode *
>  nlmsvc_file_inode(struct nlm_file *file)
>  {
>   




* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-08  5:18   ` [Cluster-devel] " Neil Brown
@ 2008-01-09  2:51     ` Wendy Cheng
  0 siblings, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-09  2:51 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Neil Brown wrote:

>On Monday January 7, wcheng at redhat.com wrote:
>  
>
>>We've implemented two new NFSD procfs files:
>>
>>o /proc/fs/nfsd/unlock_ip
>>o /proc/fs/nfsd/unlock_filesystem
>>
>>They are intended to allow an admin or user-mode script to release NLM 
>>locks based on either a path name or a server in-bound IP address (IPv4 
>>for now), for example:
>>
>>shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
>>shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
>>    
>>
>
>I'm happy with this interface and the code looks credible, so 
>  Acked-by: NeilBrown <neilb@suse.de>
>
>however......
>  
>
[snip]

Thanks .. all points taken .. the patch will be re-submitted tomorrow (my 
time zone, North Carolina, US) ..

-- Wendy




* [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic
  2008-01-08  5:31   ` [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic Neil Brown
@ 2008-01-09  3:02     ` Wendy Cheng
  2008-01-09  4:43       ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-09  3:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Neil Brown wrote:

>
>If I'm reading this correctly, this bug is introduced by your previous
>patch.
>  
>
It depends on how you see the issue. From my end, I view this as the 
existing code having a "trap" that I fell into. This is probably a chance 
to clean up this logic.

>The important difference between the old code and the new code here is
>that the old code tests "file->f_locks" while the new code iterates
>through i_flock to see if there are any lockd locks.
>
>f_locks is set to a count of lockd locks in nlm_traverse_locks which
>*was* always called by nlm_inspect_file which is called immediately
>before the code you are changing.
>But since your patch, nlm_inspect_file does not always call
>nlm_traverse_locks, so there is a chance that f_locks is wrong.
>
>With this patch, f_locks is not used at all any more.
>  
>
Yes, a fair description of the issue!

>Introducing a bug in one patch and fixing in the next is bad style.
>  
>
ok .....

>Some options:
>
> Have an initial patch which removes all references to f_locks and
> includes the change in this patch.  With that in place your main
> patch won't introduce a bug.  If you do this, you should attempt to
> understand and justify the performance impact (does nlm_traverse_files
> become quadratic in the number of locks? Is that acceptable?).
>
> Change the first patch to explicitly update f_count if you bypass the
> call to nlm_inspect_file.
>
> something else???
>  
>

Let's see what hch says in another email... will come back to this soon.

>So NAK for this one in its current form... unless I've misunderstood
>something.
>
>  
>
I expected this :)

-- Wendy




* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-08 17:02 ` [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands Christoph Hellwig
  2008-01-08 17:49   ` Christoph Hellwig
@ 2008-01-09  3:49   ` Wendy Cheng
  2008-01-09 16:13     ` J. Bruce Fields
  1 sibling, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-09  3:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Christoph Hellwig wrote:

>>+/* cluster failover support */
>>+
>>+typedef struct {
>>+	int     cmd;
>>+	int     stat;
>>+	int     gp;
>>+	void    *datap;
>>+} nlm_fo_cmd;
>>    
>>
>
>please don't introduce typedefs for struct types.
>  
>

I don't do much community Linux code, so its coding standard 
is new to me. Any reason for this (not doing typedefs)?

-- Wendy




* [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic
  2008-01-09  3:02     ` Wendy Cheng
@ 2008-01-09  4:43       ` Wendy Cheng
  2008-01-09 23:33         ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-09  4:43 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Wendy Cheng wrote:

> Neil Brown wrote:
>
>> Some options:
>>
>> Have an initial patch which removes all references to f_locks and
>> includes the change in this patch. With that in place your main
>> patch won't introduce a bug. If you do this, you should attempt to
>> understand and justify the performance impact (does nlm_traverse_files
>> become quadratic in the number of locks? Is that acceptable?).
>>
>> Change the first patch to explicitly update f_count if you bypass the
>> call to nlm_inspect_file.
>>
>> something else???
>>
>>
>
> Let's see what hch says in another email... will come back to this soon.
>

Will do a quick and dirty performance measure tomorrow when I get to the 
office. Then we'll go from there.

-- Wendy




* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-09  3:49   ` Wendy Cheng
@ 2008-01-09 16:13     ` J. Bruce Fields
  0 siblings, 0 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-09 16:13 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Jan 08, 2008 at 10:49:17PM -0500, Wendy Cheng wrote:
> Christoph Hellwig wrote:
>
>>> +/* cluster failover support */
>>> +
>>> +typedef struct {
>>> +	int     cmd;
>>> +	int     stat;
>>> +	int     gp;
>>> +	void    *datap;
>>> +} nlm_fo_cmd;
>>>    
>>>
>>
>> please don't introduce typedefs for struct types.
>>  
>>
>
> I don't do much community Linux code, so its coding standard  
> is new to me. Any reason for this (not doing typedefs)?

The argument is in "Chapter 5: Typedefs" of Documentation/CodingStyle.
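
For illustration, the style argument is that the struct tag should stay
visible at the point of use instead of hiding behind a typedef:

	/* preferred */
	struct nlm_fo_cmd {
		int	cmd;
		/* ... */
	};
	struct nlm_fo_cmd fo_cmd;

	/* discouraged: the typedef hides that fo_cmd is a struct */
	typedef struct {
		int	cmd;
		/* ... */
	} nlm_fo_cmd;
	nlm_fo_cmd fo_cmd;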

--b.




* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-08 20:57     ` Wendy Cheng
@ 2008-01-09 18:02       ` Christoph Hellwig
  2008-01-10  7:59         ` Christoph Hellwig
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2008-01-09 18:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Jan 08, 2008 at 03:57:45PM -0500, Wendy Cheng wrote:
> Christoph Hellwig wrote:
> >> Ok, I played around with this and cleaned up the ip/path code paths to
> >> be entirely separate, which helped the code quite a bit.  Also a few other
> >>   
> > Thanks for doing this :) . I'm in the middle of running it with our cluster 
> > test - if it passes, I will repost it. Get your "Signed-off-by" line ready?

Not quite yet.  I'm not happy with what's going on in svcsubs.c in the
current form.

I've added another (untested) idea patch below which adds a second
match function to nlm_traverse_files to remove the current hardcoded
hack.  If that works out we'll just need to incorporate Neil's feedback
to the second patch somehow.


Index: linux-2.6/fs/nfsd/nfsctl.c
===================================================================
--- linux-2.6.orig/fs/nfsd/nfsctl.c	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/fs/nfsd/nfsctl.c	2008-01-08 18:45:55.000000000 +0100
@@ -22,6 +22,7 @@
 #include <linux/seq_file.h>
 #include <linux/pagemap.h>
 #include <linux/init.h>
+#include <linux/inet.h>
 #include <linux/string.h>
 #include <linux/smp_lock.h>
 #include <linux/ctype.h>
@@ -35,6 +36,7 @@
 #include <linux/nfsd/cache.h>
 #include <linux/nfsd/xdr.h>
 #include <linux/nfsd/syscall.h>
+#include <linux/lockd/lockd.h>
 
 #include <asm/uaccess.h>
 
@@ -52,6 +54,8 @@ enum {
 	NFSD_Getfs,
 	NFSD_List,
 	NFSD_Fh,
+	NFSD_FO_UnlockIP,
+	NFSD_FO_UnlockFS,
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Versions,
@@ -88,6 +92,9 @@ static ssize_t write_leasetime(struct fi
 static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
 #endif
 
+static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size);
+static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size);
+
 static ssize_t (*write_op[])(struct file *, char *, size_t) = {
 	[NFSD_Svc] = write_svc,
 	[NFSD_Add] = write_add,
@@ -97,6 +104,8 @@ static ssize_t (*write_op[])(struct file
 	[NFSD_Getfd] = write_getfd,
 	[NFSD_Getfs] = write_getfs,
 	[NFSD_Fh] = write_filehandle,
+	[NFSD_FO_UnlockIP] = failover_unlock_ip,
+	[NFSD_FO_UnlockFS] = failover_unlock_fs,
 	[NFSD_Threads] = write_threads,
 	[NFSD_Pool_Threads] = write_pool_threads,
 	[NFSD_Versions] = write_versions,
@@ -288,6 +297,55 @@ static ssize_t write_getfd(struct file *
 	return err;
 }
 
+static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
+{
+	__be32 server_ip;
+	char *fo_path;
+	char *mesg;
+
+	/* sanity check */
+	if (size <= 0)
+		return -EINVAL;
+
+	if (buf[size-1] == '\n')
+		buf[size-1] = 0;
+
+	fo_path = mesg = buf;
+	if (qword_get(&mesg, fo_path, size) < 0)
+		return -EINVAL;
+
+	server_ip = in_aton(fo_path);
+	return nlmsvc_failover_ip(server_ip[0]);
+}
+
+static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
+{
+	struct nameidata nd;
+	char *fo_path;
+	char *mesg;
+	int error;
+
+	/* sanity check */
+	if (size <= 0)
+		return -EINVAL;
+
+	if (buf[size-1] == '\n')
+		buf[size-1] = 0;
+
+	fo_path = mesg = buf;
+	if (qword_get(&mesg, fo_path, size) < 0)
+		return -EINVAL;
+
+	error = path_lookup(fo_path, 0, &nd);
+	if (error)
+		return error;
+
+	error = nlmsvc_failover_path(&nd);
+
+	path_release(&nd);
+	return error;
+}
+
 static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
 {
 	/* request is:
@@ -646,6 +704,8 @@ static int nfsd_fill_super(struct super_
 		[NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
+		[NFSD_FO_UnlockIP] = {"unlock_ip", &transaction_ops, S_IWUSR|S_IRUSR},
+		[NFSD_FO_UnlockFS] = {"unlock_filesystem", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
Index: linux-2.6/fs/lockd/svcsubs.c
===================================================================
--- linux-2.6.orig/fs/lockd/svcsubs.c	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/fs/lockd/svcsubs.c	2008-01-09 18:59:37.000000000 +0100
@@ -18,6 +18,8 @@
 #include <linux/lockd/lockd.h>
 #include <linux/lockd/share.h>
 #include <linux/lockd/sm_inter.h>
+#include <linux/module.h>
+#include <linux/mount.h>
 
 #define NLMDBG_FACILITY		NLMDBG_SVCSUBS
 
@@ -87,7 +89,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
 	unsigned int	hash;
 	__be32		nfserr;
 
-	nlm_debug_print_fh("nlm_file_lookup", f);
+	nlm_debug_print_fh("nlm_lookup_file", f);
 
 	hash = file_hash(f);
 
@@ -123,6 +125,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
 
 	hlist_add_head(&file->f_list, &nlm_files[hash]);
 
+	/* fill in f_iaddr for nlm lock failover */
+	file->f_iaddr = rqstp->rq_daddr;
+
 found:
 	dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
 	*result = file;
@@ -194,6 +199,12 @@ again:
 	return 0;
 }
 
+static int
+nlmsvc_always_match(struct nlm_host *dummy1, struct nlm_host *dummy2)
+{
+        return 1;
+}
+
 /*
  * Inspect a single file
  */
@@ -230,7 +241,8 @@ nlm_file_inuse(struct nlm_file *file)
  * Loop over all files in the file table.
  */
 static int
-nlm_traverse_files(struct nlm_host *host, nlm_host_match_fn_t match)
+nlm_traverse_files(void *data, nlm_host_match_fn_t match,
+		int (*file_ok)(void *data, struct nlm_file *file))
 {
 	struct hlist_node *pos, *next;
 	struct nlm_file	*file;
@@ -244,8 +256,10 @@ nlm_traverse_files(struct nlm_host *host
 
 			/* Traverse locks, blocks and shares of this file
 			 * and update file->f_locks count */
-			if (nlm_inspect_file(host, file, match))
-				ret = 1;
+			if (file_ok && file_ok(data, file)) {
+				if (nlm_inspect_file(data, file, match))
+					ret = 1;
+			}
 
 			mutex_lock(&nlm_file_mutex);
 			file->f_count--;
@@ -337,7 +351,7 @@ void
 nlmsvc_mark_resources(void)
 {
 	dprintk("lockd: nlmsvc_mark_resources\n");
-	nlm_traverse_files(NULL, nlmsvc_mark_host);
+	nlm_traverse_files(NULL, nlmsvc_mark_host, NULL);
 }
 
 /*
@@ -348,7 +362,7 @@ nlmsvc_free_host_resources(struct nlm_ho
 {
 	dprintk("lockd: nlmsvc_free_host_resources\n");
 
-	if (nlm_traverse_files(host, nlmsvc_same_host)) {
+	if (nlm_traverse_files(host, nlmsvc_same_host, NULL)) {
 		printk(KERN_WARNING
 			"lockd: couldn't remove all locks held by %s\n",
 			host->h_name);
@@ -368,5 +382,36 @@ nlmsvc_invalidate_all(void)
 	 * turn, which is about as inefficient as it gets.
 	 * Now we just do it once in nlm_traverse_files.
 	 */
-	nlm_traverse_files(NULL, nlmsvc_is_client);
+	nlm_traverse_files(NULL, nlmsvc_is_client, NULL);
+}
+
+static int
+nlmsvc_failover_file_ok_path(void *datap, struct nlm_file *file)
+{
+	struct nameidata *nd = datap;
+	return nd->mnt == file->f_file->f_path.mnt;
+}
+
+int
+nlmsvc_failover_path(struct nameidata *nd)
+{
+	return nlm_traverse_files(nd, nlmsvc_always_match,
+			nlmsvc_failover_file_ok_path);
+}
+EXPORT_SYMBOL_GPL(nlmsvc_failover_path);
+
+static int
+nlmsvc_failover_file_ok_ip(void *datap, struct nlm_file *file)
+{
+	struct in_addr *in = datap;
+
+	return file->f_iaddr.addr.s_addr == in->s_addr;
+}
+
+int
+nlmsvc_failover_ip(__be32 server_addr)
+{
+	return nlm_traverse_files(&server_addr, nlmsvc_always_match,
+			nlmsvc_failover_file_ok_ip);
 }
+EXPORT_SYMBOL_GPL(nlmsvc_failover_ip);
Index: linux-2.6/include/linux/lockd/lockd.h
===================================================================
--- linux-2.6.orig/include/linux/lockd/lockd.h	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/include/linux/lockd/lockd.h	2008-01-08 18:44:44.000000000 +0100
@@ -113,6 +113,7 @@ struct nlm_file {
 	unsigned int		f_locks;	/* guesstimate # of locks */
 	unsigned int		f_count;	/* reference count */
 	struct mutex		f_mutex;	/* avoid concurrent access */
+	union svc_addr_u	f_iaddr;	/* server ip for failover */
 };
 
 /*
@@ -214,6 +215,12 @@ void		  nlmsvc_mark_resources(void);
 void		  nlmsvc_free_host_resources(struct nlm_host *);
 void		  nlmsvc_invalidate_all(void);
 
+/*
+ * Cluster failover support
+ */
+int           nlmsvc_failover_path(struct nameidata *nd);
+int           nlmsvc_failover_ip(__be32 server_addr);
+
 static __inline__ struct inode *
 nlmsvc_file_inode(struct nlm_file *file)
 {




* [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic
  2008-01-09  4:43       ` Wendy Cheng
@ 2008-01-09 23:33         ` Wendy Cheng
  2008-01-12  6:51           ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-09 23:33 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Wendy Cheng wrote:
> Wendy Cheng wrote:
>
>> Neil Brown wrote:
>>
>>> Some options:
>>>
>>> Have an initial patch which removes all references to f_locks and
>>> includes the change in this patch. With that in place your main
>>> patch won't introduce a bug. If you do this, you should attempt to
>>> understand and justify the performance impact (does nlm_traverse_files
>>> become quadratic in the number of locks? Is that acceptable?).
>>>
>>> Change the first patch to explicitly update f_count if you bypass the
>>> call to nlm_inspect_file.
>>>
>>> something else???
>>>
>>>
>>
>> Let's see what hch says in another email... will come back to this soon.
>>
>
> Will do a quick and dirty performance measure tomorrow when I get to 
> the office. Then we'll go from there.

The test didn't go well and I'm still debugging .. However, let's 
revisit the issue:

[quote from Neil's comment]
The important difference between the old code and the new code here is
that the old code tests "file->f_locks" while the new code iterates
through i_flock to see if there are any lockd locks.

f_locks is set to a count of lockd locks in nlm_traverse_locks which
*was* always called by nlm_inspect_file which is called immediately
before the code you are changing.
But since your patch, nlm_inspect_file does not always call
nlm_traverse_locks, so there is a chance that f_locks is wrong.

With this patch, f_locks is not used at all any more.
[end quote]

The point here is "with this patch, f_locks is not used at all any 
more."  Note that we have a nice inline function "nlm_file_inuse"; why 
should we use f_locks (which I assume people agree is awkward)?  Could 
we simply drop f_locks altogether in this section of code?
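
For reference, the helper in question looked roughly like this in the
lockd code of this era (an approximation, so treat the details as such);
note the walk over i_flock, which is the extra list traversal discussed in
the follow-up below:

	static int
	nlm_file_inuse(struct nlm_file *file)
	{
		struct inode	 *inode = nlmsvc_file_inode(file);
		struct file_lock *fl;

		if (file->f_count || !list_empty(&file->f_blocks) || file->f_shares)
			return 1;

		for (fl = inode->i_flock; fl; fl = fl->fl_next) {
			if (fl->fl_lmops == &nlmsvc_lock_operations)
				return 1;
		}
		file->f_locks = 0;	/* no lockd locks remain */
		return 0;
	}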

-- Wendy







* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-09 18:02       ` Christoph Hellwig
@ 2008-01-10  7:59         ` Christoph Hellwig
  2008-01-12  7:03           ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: Christoph Hellwig @ 2008-01-10  7:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Wed, Jan 09, 2008 at 06:02:15PM +0000, Christoph Hellwig wrote:
> On Tue, Jan 08, 2008 at 03:57:45PM -0500, Wendy Cheng wrote:
> > Christoph Hellwig wrote:
> >> Ok, I played around with this and cleaned up the ip/path code paths to
> >> be entirely separate, which helped the code quite a bit.  Also a few other
> >>   
> > Thanks for doing this :) . I'm in the middle of running it with our cluster 
> > test - if it passes, I will repost it. Get your "Signed-off-by" line ready?
> 
> Not quite yet.  I'm not happy with what's going on in svcsubs.c in the
> current form.
> 
> I've added another (untested) idea patch below which adds a second
> match function to nlm_traverse_files to remove the current hardcoded
> hack.  If that works out we'll just need to incorporate Neil's feedback
> to the second patch somehow.

This patch introduced a new bug: a reversed check for the file_ok
function that would hit all non-failover lockd functionality.  There was
also a bug left over from my previous patch where the file_ok
callback expected the wrong type passed to it and would crash.
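
Concretely, the difference between the two versions of the guard in
nlm_traverse_files() (both hunks appear in the patches in this thread):

	/* previous version: a NULL file_ok made the condition false, so
	 * every ordinary (non-failover) traversal skipped every file */
	if (file_ok && file_ok(data, file)) {
		if (nlm_inspect_file(data, file, match))
			ret = 1;
	}

	/* fixed: no callback means "no extra filter" */
	if (!file_ok || file_ok(data, file)) {
		if (nlm_inspect_file(data, file, match))
			ret = 1;
	}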

Fixed version below:


Index: linux-2.6/fs/nfsd/nfsctl.c
===================================================================
--- linux-2.6.orig/fs/nfsd/nfsctl.c	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/fs/nfsd/nfsctl.c	2008-01-10 08:56:55.000000000 +0100
@@ -22,6 +22,7 @@
 #include <linux/seq_file.h>
 #include <linux/pagemap.h>
 #include <linux/init.h>
+#include <linux/inet.h>
 #include <linux/string.h>
 #include <linux/smp_lock.h>
 #include <linux/ctype.h>
@@ -35,6 +36,7 @@
 #include <linux/nfsd/cache.h>
 #include <linux/nfsd/xdr.h>
 #include <linux/nfsd/syscall.h>
+#include <linux/lockd/lockd.h>
 
 #include <asm/uaccess.h>
 
@@ -52,6 +54,8 @@ enum {
 	NFSD_Getfs,
 	NFSD_List,
 	NFSD_Fh,
+	NFSD_FO_UnlockIP,
+	NFSD_FO_UnlockFS,
 	NFSD_Threads,
 	NFSD_Pool_Threads,
 	NFSD_Versions,
@@ -88,6 +92,9 @@ static ssize_t write_leasetime(struct fi
 static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
 #endif
 
+static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size);
+static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size);
+
 static ssize_t (*write_op[])(struct file *, char *, size_t) = {
 	[NFSD_Svc] = write_svc,
 	[NFSD_Add] = write_add,
@@ -97,6 +104,8 @@ static ssize_t (*write_op[])(struct file
 	[NFSD_Getfd] = write_getfd,
 	[NFSD_Getfs] = write_getfs,
 	[NFSD_Fh] = write_filehandle,
+	[NFSD_FO_UnlockIP] = failover_unlock_ip,
+	[NFSD_FO_UnlockFS] = failover_unlock_fs,
 	[NFSD_Threads] = write_threads,
 	[NFSD_Pool_Threads] = write_pool_threads,
 	[NFSD_Versions] = write_versions,
@@ -288,6 +297,55 @@ static ssize_t write_getfd(struct file *
 	return err;
 }
 
+static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
+{
+	__be32 server_ip;
+	char *fo_path;
+	char *mesg;
+
+	/* sanity check */
+	if (size <= 0)
+		return -EINVAL;
+
+	if (buf[size-1] == '\n')
+		buf[size-1] = 0;
+
+	fo_path = mesg = buf;
+	if (qword_get(&mesg, fo_path, size) < 0)
+		return -EINVAL;
+
+	server_ip = in_aton(fo_path);
+	return nlmsvc_failover_ip(server_ip);
+}
+
+static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
+{
+	struct nameidata nd;
+	char *fo_path;
+	char *mesg;
+	int error;
+
+	/* sanity check */
+	if (size <= 0)
+		return -EINVAL;
+
+	if (buf[size-1] == '\n')
+		buf[size-1] = 0;
+
+	fo_path = mesg = buf;
+	if (qword_get(&mesg, fo_path, size) < 0)
+		return -EINVAL;
+
+	error = path_lookup(fo_path, 0, &nd);
+	if (error)
+		return error;
+
+	error = nlmsvc_failover_path(&nd);
+
+	path_release(&nd);
+	return error;
+}
+
 static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
 {
 	/* request is:
@@ -646,6 +704,8 @@ static int nfsd_fill_super(struct super_
 		[NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
+		[NFSD_FO_UnlockIP] = {"unlock_ip", &transaction_ops, S_IWUSR|S_IRUSR},
+		[NFSD_FO_UnlockFS] = {"unlock_filesystem", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
 		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
Index: linux-2.6/fs/lockd/svcsubs.c
===================================================================
--- linux-2.6.orig/fs/lockd/svcsubs.c	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/fs/lockd/svcsubs.c	2008-01-10 08:56:18.000000000 +0100
@@ -18,6 +18,8 @@
 #include <linux/lockd/lockd.h>
 #include <linux/lockd/share.h>
 #include <linux/lockd/sm_inter.h>
+#include <linux/module.h>
+#include <linux/mount.h>
 
 #define NLMDBG_FACILITY		NLMDBG_SVCSUBS
 
@@ -87,7 +89,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
 	unsigned int	hash;
 	__be32		nfserr;
 
-	nlm_debug_print_fh("nlm_file_lookup", f);
+	nlm_debug_print_fh("nlm_lookup_file", f);
 
 	hash = file_hash(f);
 
@@ -123,6 +125,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
 
 	hlist_add_head(&file->f_list, &nlm_files[hash]);
 
+	/* fill in f_iaddr for nlm lock failover */
+	file->f_iaddr = rqstp->rq_daddr;
+
 found:
 	dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
 	*result = file;
@@ -194,6 +199,12 @@ again:
 	return 0;
 }
 
+static int
+nlmsvc_always_match(struct nlm_host *dummy1, struct nlm_host *dummy2)
+{
+        return 1;
+}
+
 /*
  * Inspect a single file
  */
@@ -230,7 +241,8 @@ nlm_file_inuse(struct nlm_file *file)
  * Loop over all files in the file table.
  */
 static int
-nlm_traverse_files(struct nlm_host *host, nlm_host_match_fn_t match)
+nlm_traverse_files(void *data, nlm_host_match_fn_t match,
+		int (*file_ok)(void *data, struct nlm_file *file))
 {
 	struct hlist_node *pos, *next;
 	struct nlm_file	*file;
@@ -244,8 +256,10 @@ nlm_traverse_files(struct nlm_host *host
 
 			/* Traverse locks, blocks and shares of this file
 			 * and update file->f_locks count */
-			if (nlm_inspect_file(host, file, match))
-				ret = 1;
+			if (!file_ok || file_ok(data, file)) {
+				if (nlm_inspect_file(data, file, match))
+					ret = 1;
+			}
 
 			mutex_lock(&nlm_file_mutex);
 			file->f_count--;
@@ -337,7 +351,7 @@ void
 nlmsvc_mark_resources(void)
 {
 	dprintk("lockd: nlmsvc_mark_resources\n");
-	nlm_traverse_files(NULL, nlmsvc_mark_host);
+	nlm_traverse_files(NULL, nlmsvc_mark_host, NULL);
 }
 
 /*
@@ -348,7 +362,7 @@ nlmsvc_free_host_resources(struct nlm_ho
 {
 	dprintk("lockd: nlmsvc_free_host_resources\n");
 
-	if (nlm_traverse_files(host, nlmsvc_same_host)) {
+	if (nlm_traverse_files(host, nlmsvc_same_host, NULL)) {
 		printk(KERN_WARNING
 			"lockd: couldn't remove all locks held by %s\n",
 			host->h_name);
@@ -368,5 +382,36 @@ nlmsvc_invalidate_all(void)
 	 * turn, which is about as inefficient as it gets.
 	 * Now we just do it once in nlm_traverse_files.
 	 */
-	nlm_traverse_files(NULL, nlmsvc_is_client);
+	nlm_traverse_files(NULL, nlmsvc_is_client, NULL);
+}
+
+static int
+nlmsvc_failover_file_ok_path(void *datap, struct nlm_file *file)
+{
+	struct nameidata *nd = datap;
+	return nd->mnt == file->f_file->f_path.mnt;
+}
+
+int
+nlmsvc_failover_path(struct nameidata *nd)
+{
+	return nlm_traverse_files(nd, nlmsvc_always_match,
+			nlmsvc_failover_file_ok_path);
+}
+EXPORT_SYMBOL_GPL(nlmsvc_failover_path);
+
+static int
+nlmsvc_failover_file_ok_ip(void *datap, struct nlm_file *file)
+{
+	__be32 *server_addr = datap;
+
+	return file->f_iaddr.addr.s_addr == *server_addr;
+}
+
+int
+nlmsvc_failover_ip(__be32 server_addr)
+{
+	return nlm_traverse_files(&server_addr, nlmsvc_always_match,
+			nlmsvc_failover_file_ok_ip);
 }
+EXPORT_SYMBOL_GPL(nlmsvc_failover_ip);
Index: linux-2.6/include/linux/lockd/lockd.h
===================================================================
--- linux-2.6.orig/include/linux/lockd/lockd.h	2008-01-08 18:36:37.000000000 +0100
+++ linux-2.6/include/linux/lockd/lockd.h	2008-01-08 18:44:44.000000000 +0100
@@ -113,6 +113,7 @@ struct nlm_file {
 	unsigned int		f_locks;	/* guesstimate # of locks */
 	unsigned int		f_count;	/* reference count */
 	struct mutex		f_mutex;	/* avoid concurrent access */
+	union svc_addr_u	f_iaddr;	/* server ip for failover */
 };
 
 /*
@@ -214,6 +215,12 @@ void		  nlmsvc_mark_resources(void);
 void		  nlmsvc_free_host_resources(struct nlm_host *);
 void		  nlmsvc_invalidate_all(void);
 
+/*
+ * Cluster failover support
+ */
+int           nlmsvc_failover_path(struct nameidata *nd);
+int           nlmsvc_failover_ip(__be32 server_addr);
+
 static __inline__ struct inode *
 nlmsvc_file_inode(struct nlm_file *file)
 {



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic
  2008-01-09 23:33         ` Wendy Cheng
@ 2008-01-12  6:51           ` Wendy Cheng
  0 siblings, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-12  6:51 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Wendy Cheng wrote:

> The point here is "with this patch, f_locks is not used at all any
> more." Note that we have a nice inline function, "nlm_file_inuse";
> why should we use f_locks (which I assume people agree is awkward)?
> Could we simply drop f_locks altogether in this section of code?
>
Starting to have second thoughts about this ....

Removing f_locks does make the code much more readable. However, if the
inode->i_flock list is long, e.g. a large number of client hosts and/or
processes from other hosts competing for the same lock, we don't want to
walk the list twice within nlm_traverse_files(). Intuitively this is
unlikely, but I prefer not to change the current behavior. As a result,
I'm withdrawing the patch altogether.
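
For context, the walk in question is the kind done by nlm_file_inuse()
over the inode's i_flock chain. A rough sketch of that check,
reconstructed from the 2.6-era lockd code (illustrative, not verbatim):

static int
nlm_file_inuse(struct nlm_file *file)
{
	struct inode	 *inode = nlmsvc_file_inode(file);
	struct file_lock *fl;

	/* Walks the entire i_flock chain looking for an NLM-owned
	 * lock.  Without f_locks, nlm_traverse_files() would have to
	 * repeat this walk on top of the one nlm_inspect_file()
	 * already does. */
	for (fl = inode->i_flock; fl; fl = fl->fl_next) {
		if (fl->fl_lmops == &nlmsvc_lock_operations)
			return 1;
	}
	return 0;
}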

The first patch handles the issue correctly. I will submit it after
this post.

-- Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-10  7:59         ` Christoph Hellwig
@ 2008-01-12  7:03           ` Wendy Cheng
  2008-01-12  9:38             ` Christoph Hellwig
                               ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-12  7:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This is a combined patch that has:

* changes made by Christoph Hellwig
* a code segment that handles f_locks so we do not walk the
inode->i_flock list twice.

If you agree, please re-add your "Acked-by" and "Signed-off-by" lines.
Thanks ...

-- Wendy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unlock_v2.patch
Type: text/x-patch
Size: 8535 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20080112/4b2b3d90/attachment.bin>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-12  7:03           ` Wendy Cheng
@ 2008-01-12  9:38             ` Christoph Hellwig
  2008-01-14 23:07             ` J. Bruce Fields
       [not found]             ` <message from Wendy Cheng on Saturday January 12>
  2 siblings, 0 replies; 57+ messages in thread
From: Christoph Hellwig @ 2008-01-12  9:38 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Sat, Jan 12, 2008 at 02:03:55AM -0500, Wendy Cheng wrote:
> This is a combined patch that has:
>
> * changes made by Christoph Hellwig
> * code segment that handles f_locks so we would not walk inode->i_flock 
> list twice.
>
> If agreed, please re-add your "ack-by" and "signed-off" lines respectively. 
> Thanks ...
>
> -- Wendy

> Two new NFSD procfs files are added:
>   /proc/fs/nfsd/unlock_ip
>   /proc/fs/nfsd/unlock_filesystem
> 
> They are intended to allow admin or user mode script to release NLM locks
> based on either a path name or a server in-bound ip address (ipv4 for now)
> as;
> 
> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
> 
> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
> Signed-off-by: Lon Hohberger  <lhh@redhat.com>

Signed-off-by: Christoph Hellwig <hch@lst.de>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-12  7:03           ` Wendy Cheng
  2008-01-12  9:38             ` Christoph Hellwig
@ 2008-01-14 23:07             ` J. Bruce Fields
       [not found]               ` <message from J. Bruce Fields on Monday January 14>
  2008-01-15 16:14               ` Wendy Cheng
       [not found]             ` <message from Wendy Cheng on Saturday January 12>
  2 siblings, 2 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-14 23:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Sat, Jan 12, 2008 at 02:03:55AM -0500, Wendy Cheng wrote:
> This is a combined patch that has:
>
> * changes made by Christoph Hellwig
> * code segment that handles f_locks so we would not walk inode->i_flock  
> list twice.
>
> If agreed, please re-add your "ack-by" and "signed-off" lines  
> respectively. Thanks ...
>
> -- Wendy

Comments below--

> Two new NFSD procfs files are added:
>   /proc/fs/nfsd/unlock_ip
>   /proc/fs/nfsd/unlock_filesystem
> 
> They are intended to allow admin or user mode script to release NLM locks
> based on either a path name or a server in-bound ip address (ipv4 for now)
> as;
> 
> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
> 
> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
> Signed-off-by: Lon Hohberger  <lhh@redhat.com>
> 
>  fs/lockd/svcsubs.c          |   68 ++++++++++++++++++++++++++++++++++++++------
>  fs/nfsd/nfsctl.c            |   62 ++++++++++++++++++++++++++++++++++++++++
>  include/linux/lockd/lockd.h |    7 ++++
>  3 files changed, 129 insertions(+), 8 deletions(-)
> 
> --- linux-o/fs/nfsd/nfsctl.c	2008-01-04 10:01:08.000000000 -0500
> +++ linux/fs/nfsd/nfsctl.c	2008-01-11 19:08:02.000000000 -0500
> @@ -22,6 +22,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/pagemap.h>
>  #include <linux/init.h>
> +#include <linux/inet.h>
>  #include <linux/string.h>
>  #include <linux/smp_lock.h>
>  #include <linux/ctype.h>
> @@ -35,6 +36,7 @@
>  #include <linux/nfsd/cache.h>
>  #include <linux/nfsd/xdr.h>
>  #include <linux/nfsd/syscall.h>
> +#include <linux/lockd/lockd.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -52,6 +54,8 @@ enum {
>  	NFSD_Getfs,
>  	NFSD_List,
>  	NFSD_Fh,
> +	NFSD_FO_UnlockIP,
> +	NFSD_FO_UnlockFS,
>  	NFSD_Threads,
>  	NFSD_Pool_Threads,
>  	NFSD_Versions,
> @@ -88,6 +92,9 @@ static ssize_t write_leasetime(struct fi
>  static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
>  #endif
>  
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size);
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size);
> +
>  static ssize_t (*write_op[])(struct file *, char *, size_t) = {
>  	[NFSD_Svc] = write_svc,
>  	[NFSD_Add] = write_add,
> @@ -97,6 +104,8 @@ static ssize_t (*write_op[])(struct file
>  	[NFSD_Getfd] = write_getfd,
>  	[NFSD_Getfs] = write_getfs,
>  	[NFSD_Fh] = write_filehandle,
> +	[NFSD_FO_UnlockIP] = failover_unlock_ip,
> +	[NFSD_FO_UnlockFS] = failover_unlock_fs,
>  	[NFSD_Threads] = write_threads,
>  	[NFSD_Pool_Threads] = write_pool_threads,
>  	[NFSD_Versions] = write_versions,
> @@ -288,6 +297,55 @@ static ssize_t write_getfd(struct file *
>  	return err;
>  }
>  
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
> +{
> +	__be32 server_ip;
> +	char *fo_path;
> +	char *mesg;
> +
> +	/* sanity check */
> +	if (size <= 0)
> +		return -EINVAL;

Not only is size never negative, it's actually an unsigned type here, so
this is a no-op.

> +
> +	if (buf[size-1] == '\n')
> +		buf[size-1] = 0;

The other write methods in this file actually just do

	if (buf[size-1] != '\n')
		return -EINVAL;

I don't know why.  But absent some reason, I'd rather these two new
files behaved the same as existing ones.

> +
> +	fo_path = mesg = buf;
> +	if (qword_get(&mesg, fo_path, size) < 0)
> +		return -EINVAL;

"mesg" is unneeded here, right?  You can just do:

	fo_path = buf;
	if (qword_get(&buf, buf, size) < 0)

> +
> +	server_ip = in_aton(fo_path);

It'd be nice if we could sanity-check this.  (Is there code already in
the kernel someplace to do this?)

And, this isn't your problem for now, but eventually I guess this will
have to accept an IPv6 address as well?

> +	return nlmsvc_failover_ip(server_ip);
> +}
> +
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
> +{
> +	struct nameidata nd;
> +	char *fo_path;
> +	char *mesg;
> +	int error;
> +
> +	/* sanity check */
> +	if (size <= 0)
> +		return -EINVAL;
> +
> +	if (buf[size-1] == '\n')
> +		buf[size-1] = 0;
> +
> +	fo_path = mesg = buf;
> +	if (qword_get(&mesg, fo_path, size) < 0)
> +		return -EINVAL;

Same comments as above.

> +
> +	error = path_lookup(fo_path, 0, &nd);
> +	if (error)
> +		return error;
> +
> +	error = nlmsvc_failover_path(&nd);
> +
> +	path_release(&nd);
> +	return error;
> +}
> +
>  static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
>  {
>  	/* request is:
> @@ -646,6 +704,10 @@ static int nfsd_fill_super(struct super_
>  		[NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
> +		[NFSD_FO_UnlockIP] = {"unlock_ip",
> +					&transaction_ops, S_IWUSR|S_IRUSR},
> +		[NFSD_FO_UnlockFS] = {"unlock_filesystem",
> +					&transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
> --- linux-o/fs/lockd/svcsubs.c	2008-01-04 10:01:08.000000000 -0500
> +++ linux/fs/lockd/svcsubs.c	2008-01-11 19:08:28.000000000 -0500
> @@ -18,6 +18,8 @@
>  #include <linux/lockd/lockd.h>
>  #include <linux/lockd/share.h>
>  #include <linux/lockd/sm_inter.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
>  
>  #define NLMDBG_FACILITY		NLMDBG_SVCSUBS
>  
> @@ -87,7 +89,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
>  	unsigned int	hash;
>  	__be32		nfserr;
>  
> -	nlm_debug_print_fh("nlm_file_lookup", f);
> +	nlm_debug_print_fh("nlm_lookup_file", f);
>  
>  	hash = file_hash(f);
>  
> @@ -123,6 +125,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
>  
>  	hlist_add_head(&file->f_list, &nlm_files[hash]);
>  
> +	/* fill in f_iaddr for nlm lock failover */
> +	file->f_iaddr = rqstp->rq_daddr;
> +
>  found:
>  	dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
>  	*result = file;
> @@ -194,6 +199,12 @@ again:
>  	return 0;
>  }
>  
> +static int
> +nlmsvc_always_match(struct nlm_host *dummy1, struct nlm_host *dummy2)
> +{
> +	return 1;
> +}
> +
>  /*
>   * Inspect a single file
>   */
> @@ -230,27 +241,37 @@ nlm_file_inuse(struct nlm_file *file)
>   * Loop over all files in the file table.
>   */
>  static int
> -nlm_traverse_files(struct nlm_host *host, nlm_host_match_fn_t match)
> +nlm_traverse_files(void *data, nlm_host_match_fn_t match,
> +		int (*failover)(void *data, struct nlm_file *file))
>  {
>  	struct hlist_node *pos, *next;
>  	struct nlm_file	*file;
> -	int i, ret = 0;
> +	int i, ret = 0, inspect_file;
>  
>  	mutex_lock(&nlm_file_mutex);
>  	for (i = 0; i < FILE_NRHASH; i++) {
>  		hlist_for_each_entry_safe(file, pos, next, &nlm_files[i], f_list) {
>  			file->f_count++;
>  			mutex_unlock(&nlm_file_mutex);
> +			inspect_file = 1;
>  
>  			/* Traverse locks, blocks and shares of this file
>  			 * and update file->f_locks count */
> -			if (nlm_inspect_file(host, file, match))
> +
> +			if (unlikely(failover)) {
> +				if (!failover(data, file)) {
> +					inspect_file = 0;
> +					file->f_locks = nlm_file_inuse(file);
> +				}
> +			}
> +
> +			if (inspect_file && nlm_inspect_file(data, file, match))
>  				ret = 1;

This seems quite complicated!  I don't have an alternative suggestion,
though.

--b.


>  
>  			mutex_lock(&nlm_file_mutex);
>  			file->f_count--;
>  			/* No more references to this file. Let go of it. */
> -			if (list_empty(&file->f_blocks) && !file->f_locks
> +			if (!file->f_locks && list_empty(&file->f_blocks)
>  			 && !file->f_shares && !file->f_count) {
>  				hlist_del(&file->f_list);
>  				nlmsvc_ops->fclose(file->f_file);
> @@ -337,7 +358,7 @@ void
>  nlmsvc_mark_resources(void)
>  {
>  	dprintk("lockd: nlmsvc_mark_resources\n");
> -	nlm_traverse_files(NULL, nlmsvc_mark_host);
> +	nlm_traverse_files(NULL, nlmsvc_mark_host, NULL);
>  }
>  
>  /*
> @@ -348,7 +369,7 @@ nlmsvc_free_host_resources(struct nlm_ho
>  {
>  	dprintk("lockd: nlmsvc_free_host_resources\n");
>  
> -	if (nlm_traverse_files(host, nlmsvc_same_host)) {
> +	if (nlm_traverse_files(host, nlmsvc_same_host, NULL)) {
>  		printk(KERN_WARNING
>  			"lockd: couldn't remove all locks held by %s\n",
>  			host->h_name);
> @@ -368,5 +389,36 @@ nlmsvc_invalidate_all(void)
>  	 * turn, which is about as inefficient as it gets.
>  	 * Now we just do it once in nlm_traverse_files.
>  	 */
> -	nlm_traverse_files(NULL, nlmsvc_is_client);
> +	nlm_traverse_files(NULL, nlmsvc_is_client, NULL);
> +}
> +
> +static int
> +nlmsvc_failover_file_ok_path(void *datap, struct nlm_file *file)
> +{
> +	struct nameidata *nd = datap;
> +	return nd->mnt == file->f_file->f_path.mnt;
> +}
> +
> +int
> +nlmsvc_failover_path(struct nameidata *nd)
> +{
> +	return nlm_traverse_files(nd, nlmsvc_always_match,
> +			nlmsvc_failover_file_ok_path);
> +}
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_path);
> +
> +static int
> +nlmsvc_failover_file_ok_ip(void *datap, struct nlm_file *file)
> +{
> +	__be32 *server_addr = datap;
> +
> +	return file->f_iaddr.addr.s_addr == *server_addr;
> +}
> +
> +int
> +nlmsvc_failover_ip(__be32 server_addr)
> +{
> +	return nlm_traverse_files(&server_addr, nlmsvc_always_match,
> +			nlmsvc_failover_file_ok_ip);
>  }
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_ip);
> --- linux-o/include/linux/lockd/lockd.h	2008-01-04 10:01:08.000000000 -0500
> +++ linux/include/linux/lockd/lockd.h	2008-01-11 18:24:45.000000000 -0500
> @@ -113,6 +113,7 @@ struct nlm_file {
>  	unsigned int		f_locks;	/* guesstimate # of locks */
>  	unsigned int		f_count;	/* reference count */
>  	struct mutex		f_mutex;	/* avoid concurrent access */
> +	union svc_addr_u	f_iaddr;	/* server ip for failover */
>  };
>  
>  /*
> @@ -214,6 +215,12 @@ void		  nlmsvc_mark_resources(void);
>  void		  nlmsvc_free_host_resources(struct nlm_host *);
>  void		  nlmsvc_invalidate_all(void);
>  
> +/*
> + * Cluster failover support
> + */
> +int           nlmsvc_failover_path(struct nameidata *nd);
> +int           nlmsvc_failover_ip(__be32 server_addr);
> +
>  static __inline__ struct inode *
>  nlmsvc_file_inode(struct nlm_file *file)
>  {



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found]               ` <message from J. Bruce Fields on Monday January 14>
@ 2008-01-14 23:31                 ` Neil Brown
  2008-01-22 22:53                   ` J. Bruce Fields
  0 siblings, 1 reply; 57+ messages in thread
From: Neil Brown @ 2008-01-14 23:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Monday January 14, bfields at fieldses.org wrote:
> >  
> > +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
> > +{
> > +	__be32 server_ip;
> > +	char *fo_path;
> > +	char *mesg;
> > +
> > +	/* sanity check */
> > +	if (size <= 0)
> > +		return -EINVAL;
> 
> Not only is size never negative, it's actually an unsigned type here, so
> this is a no-op.

No, it is equivalent to
          if (size == 0)

Which alternative is clearer and more maintainable is debatable.

> 
> > +
> > +	if (buf[size-1] == '\n')
> > +		buf[size-1] = 0;
> 
> The other write methods in this file actually just do
> 
> 	if (buf[size-1] != '\n')
> 		return -EINVAL;

and those which don't check for size == 0 are underflowing an array.
That should probably be fixed.
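
(Concretely, with the pattern quoted above,

	if (buf[size-1] != '\n')
		return -EINVAL;

a zero-length write would read buf[-1], one byte before the buffer.)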

> 
> I don't know why.  But absent some reason, I'd rather these two new
> files behaved the same as existing ones.
> 
> > +
> > +	fo_path = mesg = buf;
> > +	if (qword_get(&mesg, fo_path, size) < 0)
> > +		return -EINVAL;
> 
> "mesg" is unneeded here, right?  You can just do:
> 
> 	fo_path = buf;
> 	if (qword_get(&buf, buf, size) < 0)
> 
> > +
> > +	server_ip = in_aton(fo_path);
> 
> It'd be nice if we could sanity-check this.  (Is there code already in
> the kernel someplace to do this?)

In ip_map_parse we do:

	if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
		return -EINVAL;
	...
	addr.s_addr =
		htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);

I suspect that would fit in an inline function somewhere quite
nicely.  But where?
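
For illustration, a minimal sketch of what such a helper might look
like (the name, location and range checking are all hypothetical):

static inline int nfsd_parse_ipv4(const char *buf, __be32 *addr)
{
	unsigned int b1, b2, b3, b4;
	char c;

	/* The trailing %c only matches if extra characters follow the
	 * dotted quad, in which case sscanf returns 5, not 4. */
	if (sscanf(buf, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
		return -EINVAL;
	if (b1 > 255 || b2 > 255 || b3 > 255 || b4 > 255)
		return -EINVAL;
	*addr = htonl((((((b1 << 8) | b2) << 8) | b3) << 8) | b4);
	return 0;
}

For example, "10.1.1.2" would come back as htonl(0x0a010102).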


NeilBrown



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found]             ` <message from Wendy Cheng on Saturday January 12>
@ 2008-01-14 23:52               ` Neil Brown
  2008-01-15 20:17                 ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: Neil Brown @ 2008-01-14 23:52 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Saturday January 12, wcheng at redhat.com wrote:
> This is a combined patch that has:
> 
> * changes made by Christoph Hellwig
> * code segment that handles f_locks so we would not walk inode->i_flock 
> list twice.
> 
> If agreed, please re-add your "ack-by" and "signed-off" lines 
> respectively. Thanks ...
> 

> -	int i, ret = 0;
> +	int i, ret = 0, inspect_file;
>  
>  	mutex_lock(&nlm_file_mutex);
>  	for (i = 0; i < FILE_NRHASH; i++) {
>  		hlist_for_each_entry_safe(file, pos, next, &nlm_files[i], f_list) {
>  			file->f_count++;
>  			mutex_unlock(&nlm_file_mutex);
> +			inspect_file = 1;
>  
>  			/* Traverse locks, blocks and shares of this file
>  			 * and update file->f_locks count */
> -			if (nlm_inspect_file(host, file, match))
> +
> +			if (unlikely(failover)) {
> +				if (!failover(data, file)) {
> +					inspect_file = 0;
> +					file->f_locks = nlm_file_inuse(file);
> +				}
> +			}
> +
> +			if (inspect_file && nlm_inspect_file(data, file, match))
>  				ret = 1;

if (unlikely(failover) &&
    !failover(data, file))
	file->f_locks = nlm_file_inuse(file);
else if (nlm_inspect_file(data, file, match))
	ret = 1;

Though the logic still isn't very clear... maybe:

if (likely(failover == NULL) ||
    failover(data, file))
	ret |= nlm_inspect_file(data, file, match);
else
    file->f_locks = nlm_file_inuse(file);

Actually I would like to make nlm_inspect_file return 'void'.
The returned value of '1' is ultimately either ignored or it triggers
a BUG().  And the case where it triggers a BUG is the "host != NULL"
case.  (I think - if someone could check, that would be good).
So putting BUG_ON(host) in nlm_traverse_locks (along with a nice big
comment) would mean we can discard the return value from
nlm_traverse_locks and nlm_inspect_file and nlm_traverse_files.

Also, if we could change the function name 'failover' to some sort of
verb like "is_failover" or "is_failover_file", then the above could be

  if (likely(is_failover_file == NULL) ||
      is_failover_file(data, file))
	/* note nlm_inspect_file updates f_locks */
	nlm_inspect_file(data, file, match);
  else
	file->f_locks = nlm_file_inuse(file);


>  
>  			mutex_lock(&nlm_file_mutex);
>  			file->f_count--;
>  			/* No more references to this file. Let go of it. */
> -			if (list_empty(&file->f_blocks) && !file->f_locks
> +			if (!file->f_locks && list_empty(&file->f_blocks)

Is this change actually achieving something?  or is it just noise?


NeilBrown



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-14 23:07             ` J. Bruce Fields
       [not found]               ` <message from J. Bruce Fields on Monday January 14>
@ 2008-01-15 16:14               ` Wendy Cheng
  2008-01-15 16:30                 ` J. Bruce Fields
  1 sibling, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-15 16:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
>>  
>> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
>> +{
>> +	__be32 server_ip;
>> +	char *fo_path;
>> +	char *mesg;
>> +
>> +	/* sanity check */
>> +	if (size <= 0)
>> +		return -EINVAL;
>>     
>
> Not only is size never negative, it's actually an unsigned type here, so
> this is a no-op.
>   

Based on Neil's comment, let's keep this?

>   
>> +
>> +	if (buf[size-1] == '\n')
>> +		buf[size-1] = 0;
>>     
>
> The other write methods in this file actually just do
>
> 	if (buf[size-1] != '\n')
> 		return -EINVAL;
>
> I don't know why.  But absent some reason, I'd rather these two new
> files behaved the same as existing ones.
>   
ok, fixed.

>   
>> +
>> +	fo_path = mesg = buf;
>> +	if (qword_get(&mesg, fo_path, size) < 0)
>> +		return -EINVAL;
>>     
>
> "mesg" is unneeded here, right?  You can just do:
>
> 	fo_path = buf;
> 	if (qword_get(&buf, buf, size) < 0)
>   

A left-over from a previous patch where the command parsing was done by
a common routine. Will fix this for now.
>   
>> +
>> +	server_ip = in_aton(fo_path);
>>     
>
> It'd be nice if we could sanity-check this.  (Is there code already in
> the kernel someplace to do this?)
>   

The argument here is that if this is a bogus address, the search (of the
nlm files list) will simply fail (not found) later anyway. Failover is
normally time-critical: the quicker we finish, the fewer issues will be
seen. The sanity check here can be skipped (imo).
> And, this isn't your problem for now, but eventually I guess this will
> have to accept an IPv6 address as well?
>   

Yeah .. thinking about this right now. Maybe borrow the code from
ip_map_parse(), as Neil pointed out in another mail?
>   
>> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
>> +{
>> +	struct nameidata nd;
>> +	char *fo_path;
>> +	char *mesg;
>> +	int error;
>> +
>> +	/* sanity check */
>> +	if (size <= 0)
>> +		return -EINVAL;
>> +
>> +	if (buf[size-1] == '\n')
>> +		buf[size-1] = 0;
>> +
>> +	fo_path = mesg = buf;
>> +	if (qword_get(&mesg, fo_path, size) < 0)
>> +		return -EINVAL;
>>     
>
> Same comments as above.
>   
ok, fixed.

...
[snip]
>>  /*
>>   * Inspect a single file
>>   */
>> @@ -230,27 +241,37 @@ nlm_file_inuse(struct nlm_file *file)
>>   * Loop over all files in the file table.
>>   */
>>  static int
>> -nlm_traverse_files(struct nlm_host *host, nlm_host_match_fn_t match)
>> +nlm_traverse_files(void *data, nlm_host_match_fn_t match,
>> +		int (*failover)(void *data, struct nlm_file *file))
>>  {
>>  	struct hlist_node *pos, *next;
>>  	struct nlm_file	*file;
>> -	int i, ret = 0;
>> +	int i, ret = 0, inspect_file;
>>  
>>  	mutex_lock(&nlm_file_mutex);
>>  	for (i = 0; i < FILE_NRHASH; i++) {
>>  		hlist_for_each_entry_safe(file, pos, next, &nlm_files[i], f_list) {
>>  			file->f_count++;
>>  			mutex_unlock(&nlm_file_mutex);
>> +			inspect_file = 1;
>>  
>>  			/* Traverse locks, blocks and shares of this file
>>  			 * and update file->f_locks count */
>> -			if (nlm_inspect_file(host, file, match))
>> +
>> +			if (unlikely(failover)) {
>> +				if (!failover(data, file)) {
>> +					inspect_file = 0;
>> +					file->f_locks = nlm_file_inuse(file);
>> +				}
>> +			}
>> +
>> +			if (inspect_file && nlm_inspect_file(data, file, match))
>>  				ret = 1;
>>     
>
> This seems quite complicated!  I don't have an alternative suggestion,
> though.
>   
I hear you .... we also need to look into other (vfs) layer locking
functions and do an overall cleanup eventually. It is not related to the
failover function, though.

Will post the revised patch after some light testing soon ...

-- Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-15 16:14               ` Wendy Cheng
@ 2008-01-15 16:30                 ` J. Bruce Fields
  0 siblings, 0 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-15 16:30 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Jan 15, 2008 at 11:14:54AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>>>  +static ssize_t failover_unlock_ip(struct file *file, char *buf, 
>>> size_t size)
>>> +{
>>> +	__be32 server_ip;
>>> +	char *fo_path;
>>> +	char *mesg;
>>> +
>>> +	/* sanity check */
>>> +	if (size <= 0)
>>> +		return -EINVAL;
>>>     
>>
>> Not only is size never negative, it's actually an unsigned type here, so
>> this is a no-op.
>>   
>
> Based on comment from Neil, let's keep this ?

Yes, I misread, apologies!

>>   
>>> +
>>> +	server_ip = in_aton(fo_path);
>>>     
>>
>> It'd be nice if we could sanity-check this.  (Is there code already in
>> the kernel someplace to do this?)
>>   
>
> The argument here is that if this is a bogus address, the search (of nlm  
> files list) will fail (not found) later anyway.

Or it could accidentally find some other address?

> Failover is normally  
> time critical. Quicker we finish, less issues will be seen. The sanity  
> check here can be skipped (imo).

>> And, this isn't your problem for now, but eventually I guess this will
>> have to accept an IPv6 address as well?
>>   
>
> yeah .. Thinking about this right now.. May be borrowing the code from  
> ip_map_parse() as Neil pointed out in another mail ?

Yes, that looks easy enough.

I think the sanity checking would make it easier to extend the interface
later (for ipv6 or whatever else we might discover we need) because
we'll be able to depend on older kernels failing in a predictable way.

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-14 23:52               ` Neil Brown
@ 2008-01-15 20:17                 ` Wendy Cheng
       [not found]                   ` <message from Wendy Cheng on Tuesday January 15>
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-15 20:17 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Neil Brown wrote:
> On Saturday January 12, wcheng at redhat.com wrote:
>   
>> This is a combined patch that has:
>>
>> * changes made by Christoph Hellwig
>> * code segment that handles f_locks so we would not walk inode->i_flock 
>> list twice.
>>
>>     
.....
>
> if (unlikely(failover) &&
>     !failover(data, file))
> 	file->f_locks = nlm_file_inuse(file);
> else if (nlm_inspect_file(data, file, match))
> 	ret = 1;
>
> Though the logic still isn't very clear... maybe:
>
> if (likely(failover == NULL) ||
>     failover(data, file))
> 	ret |= nlm_inspect_file(data, file, match);
> else
>     file->f_locks = nlm_file_inuse(file);
>
> Actually I would like to make nlm_inspect_file return 'void'.
> The returned value of '1' is ultimately either ignored or it triggers
> a BUG().  And the case where it triggers a BUG is the "host != NULL"
> case.  (I think - if someone could check, that would be good).
> So putting BUG_ON(host) in nlm_traverse_locks (along with a nice big
> comment) would mean we can discard the return value from
> nlm_traverse_locks and nlm_inspect_file and nlm_traverse_files.
>   

The current logic BUG()s when:

1. host is not NULL; and
2. nlm_traverse_locks() somehow can't unlock the file.

I don't feel comfortable changing the existing code structure,
especially a BUG() statement. It would be better to keep the lock
failover function separate from the lockd code clean-up, to make
problem isolation easier (just in case).

On the other hand, if we view "ret" as a file count that tells us how
many files failed to get unlocked, it would be great for debugging
purposes. So the changes I made (currently in the middle of cluster
testing) end up like this:

if (likely(is_failover_file == NULL) ||
	is_failover_file(data, file)) {
	/*
	 *  Note that nlm_inspect_file updates f_locks
	 *  and ret is the number of files that can't
	 *  be unlocked.
	 */
	ret += nlm_inspect_file(data, file, match);
} else
	file->f_locks = nlm_file_inuse(file);


>
>
>
>   
>>  
>>  			mutex_lock(&nlm_file_mutex);
>>  			file->f_count--;
>>  			/* No more references to this file. Let go of it. */
>> -			if (list_empty(&file->f_blocks) && !file->f_locks
>> +			if (!file->f_locks && list_empty(&file->f_blocks)
>>     
>
> Is this change actually achieving something?  or is it just noise?
>   
Not really - but I thought checking for f_locks would be faster (tiny 
bit of optimization :))

-- Wendy
>
> NeilBrown
>
>   



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found]                   ` <message from Wendy Cheng on Tuesday January 15>
@ 2008-01-15 20:50                     ` Neil Brown
  2008-01-15 20:56                       ` Wendy Cheng
  2008-01-15 22:48                       ` Wendy Cheng
  2008-01-16  4:19                     ` Neil Brown
  1 sibling, 2 replies; 57+ messages in thread
From: Neil Brown @ 2008-01-15 20:50 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday January 15, wcheng at redhat.com wrote:
> 
> I don't feel comfortable to change the existing code structure, 
> especially a BUG() statement. It would be better to separate lock 
> failover function away from lockd code clean-up. This is to make it 
> easier for problem isolations (just in case).

Fair enough.

> 
> On the other hand, if we view "ret" as a file count that tells us how 
> many files fail to get unlocked, it would be great for debugging 
> purpose. So the changes I made (currently in the middle of cluster 
> testing) end up like this:
> 
> if (likely(is_failover_file == NULL) ||
> 	is_failover_file(data, file)) {
> 	/*
> 	 *  Note that nlm_inspect_file updates f_locks
> 	 *  and ret is the number of files that can't
> 	 *  be unlocked.
> 	 */
> 	ret += nlm_inspect_file(data, file, match);
> } else
> 	file->f_locks = nlm_file_inuse(file);

Looks good.

> >>  			mutex_lock(&nlm_file_mutex);
> >>  			file->f_count--;
> >>  			/* No more references to this file. Let go of it. */
> >> -			if (list_empty(&file->f_blocks) && !file->f_locks
> >> +			if (!file->f_locks && list_empty(&file->f_blocks)
> >>     
> >
> > Is this change actually achieving something?  or is it just noise?
> >   
> Not really - but I thought checking for f_locks would be faster (tiny 
> bit of optimization :))

You don't want to put the fastest operation first.  You want the one
that is most likely to fail (as it is an '&&' where failure aborts the
sequence).
So you would need some argument (preferably with measurements) to say
which of the conditions will fail most often, before rearranging as an
optimisation.
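
To make that concrete with the condition at hand (purely illustrative -
the real frequencies would have to be measured): the reordered test

	if (!file->f_locks && list_empty(&file->f_blocks)
	 && !file->f_shares && !file->f_count)

only wins if f_locks is usually non-zero at this point, while the
original order

	if (list_empty(&file->f_blocks) && !file->f_locks
	 && !file->f_shares && !file->f_count)

wins whenever f_blocks is usually non-empty, since '&&' stops at the
first false operand.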

NeilBrown



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-15 20:50                     ` Neil Brown
@ 2008-01-15 20:56                       ` Wendy Cheng
  2008-01-15 22:48                       ` Wendy Cheng
  1 sibling, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-15 20:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Neil Brown wrote:
> On Tuesday January 15, wcheng at redhat.com wrote:
>   
>> I don't feel comfortable to change the existing code structure, 
>> especially a BUG() statement. It would be better to separate lock 
>> failover function away from lockd code clean-up. This is to make it 
>> easier for problem isolations (just in case).
>>     
>
> Fair enough.
>
>   
>> On the other hand, if we view "ret" as a file count that tells us how 
>> many files fail to get unlocked, it would be great for debugging 
>> purpose. So the changes I made (currently in the middle of cluster 
>> testing) end up like this:
>>
>> if (likely(is_failover_file == NULL) ||
>> 	is_failover_file(data, file)) {
>> 	/*
>> 	 *  Note that nlm_inspect_file updates f_locks
>> 	 *  and ret is the number of files that can't
>> 	 *  be unlocked.
>> 	 */
>> 	ret += nlm_inspect_file(data, file, match);
>> } else
>> 	file->f_locks = nlm_file_inuse(file);
>>     
>
> Looks good.
>
>   
>>>>  			mutex_lock(&nlm_file_mutex);
>>>>  			file->f_count--;
>>>>  			/* No more references to this file. Let go of it. */
>>>> -			if (list_empty(&file->f_blocks) && !file->f_locks
>>>> +			if (!file->f_locks && list_empty(&file->f_blocks)
>>>>     
>>>>         
>>> Is this change actually achieving something?  or is it just noise?
>>>   
>>>       
>> Not really - but I thought checking for f_locks would be faster (tiny 
>> bit of optimization :))
>>     
>
> You don't want to put the fastest operation first.  You want the one
> that is most likely to fail (as it is an '&&' where failure aborts the
> sequence).
> So you would need some argument (preferably with measurements) to say
> which of the conditions will fail most often, before rearranging as an
> optimisation.
>
>
>   
Yep .. now that I really think about it, my bad. Have reset it ... Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-15 20:50                     ` Neil Brown
  2008-01-15 20:56                       ` Wendy Cheng
@ 2008-01-15 22:48                       ` Wendy Cheng
  2008-01-17 15:10                         ` J. Bruce Fields
  1 sibling, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-15 22:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Revised version of the patch:

* based on a comment from Neil Brown, use sscanf() to parse the IP
address (a cool idea imo).
* the "ret" inside nlm_traverse_files() is now the count of files that
can't be unlocked.
* other minor changes from the latest round of review.

Thanks to all for the review !

-- Wendy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: unlock_v3.patch
Type: text/x-patch
Size: 8335 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20080115/ebf7ec37/attachment.bin>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found]                   ` <message from Wendy Cheng on Tuesday January 15>
  2008-01-15 20:50                     ` Neil Brown
@ 2008-01-16  4:19                     ` Neil Brown
  1 sibling, 0 replies; 57+ messages in thread
From: Neil Brown @ 2008-01-16  4:19 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday January 15, wcheng at redhat.com wrote:
> Revised version of the patch:
> 
> * based on comment from Neil Brown, use sscanf() to parse IP address (a 
> cool idea imo).
> * the "ret" inside nlm_traverse_files() is now the file count that can't 
> be unlocked.
> * other minor changes from latest round of review.
> 
> Thanks to all for the review !
> 
> -- Wendy
> 
> Two new NFSD procfs files are added:
>   /proc/fs/nfsd/unlock_ip
>   /proc/fs/nfsd/unlock_filesystem
> 
> They are intended to allow admin or user mode script to release NLM locks
> based on either a path name or a server in-bound ip address (ipv4 for now)
> as;
> 
> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
> 
> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
> Signed-off-by: Lon Hohberger  <lhh@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-By: NeilBrown <neilb@suse.de>

Thanks.

NeilBrown



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-15 22:48                       ` Wendy Cheng
@ 2008-01-17 15:10                         ` J. Bruce Fields
  2008-01-17 15:48                           ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-17 15:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Jan 15, 2008 at 05:48:00PM -0500, Wendy Cheng wrote:
> Revised version of the patch:
>
> * based on comment from Neil Brown, use sscanf() to parse IP address (a  
> cool idea imo).
> * the "ret" inside nlm_traverse_files() is now the file count that can't  
> be unlocked.
> * other minor changes from latest round of review.

Thanks!

Remind me: why do we need both per-ip and per-filesystem methods?  In
practice, I assume that we'll always do *both*?

We're migrating clients by moving a server ip address from one node to
another.  And I assume we're permitting at most one node to export each
filesystem at a time.  So it *should* be the case that the set of locks
held on the filesystem(s) that are moving are the same as the set of
locks held by the virtual ip that is moving.

But presumably in some scenarios clients can get confused, and we need
to ensure that stale locks are not left behind?

We've discussed this before, but we should get the answer into comments
in the code (or on the patches).

--b.

>
> Thanks to all for the review !
>
> -- Wendy
>

> Two new NFSD procfs files are added:
>   /proc/fs/nfsd/unlock_ip
>   /proc/fs/nfsd/unlock_filesystem
> 
> They are intended to allow admin or user mode script to release NLM locks
> based on either a path name or a server in-bound ip address (ipv4 for now)
> as;
> 
> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
> 
> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
> Signed-off-by: Lon Hohberger  <lhh@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> 
>  fs/lockd/svcsubs.c          |   66 +++++++++++++++++++++++++++++++++++++++-----
>  fs/nfsd/nfsctl.c            |   65 +++++++++++++++++++++++++++++++++++++++++++
>  include/linux/lockd/lockd.h |    7 ++++
>  3 files changed, 131 insertions(+), 7 deletions(-)
> 
> --- linux-o/fs/nfsd/nfsctl.c	2008-01-04 10:01:08.000000000 -0500
> +++ linux/fs/nfsd/nfsctl.c	2008-01-15 11:30:19.000000000 -0500
> @@ -22,6 +22,7 @@
>  #include <linux/seq_file.h>
>  #include <linux/pagemap.h>
>  #include <linux/init.h>
> +#include <linux/inet.h>
>  #include <linux/string.h>
>  #include <linux/smp_lock.h>
>  #include <linux/ctype.h>
> @@ -35,6 +36,7 @@
>  #include <linux/nfsd/cache.h>
>  #include <linux/nfsd/xdr.h>
>  #include <linux/nfsd/syscall.h>
> +#include <linux/lockd/lockd.h>
>  
>  #include <asm/uaccess.h>
>  
> @@ -52,6 +54,8 @@ enum {
>  	NFSD_Getfs,
>  	NFSD_List,
>  	NFSD_Fh,
> +	NFSD_FO_UnlockIP,
> +	NFSD_FO_UnlockFS,
>  	NFSD_Threads,
>  	NFSD_Pool_Threads,
>  	NFSD_Versions,
> @@ -88,6 +92,9 @@ static ssize_t write_leasetime(struct fi
>  static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
>  #endif
>  
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size);
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size);
> +
>  static ssize_t (*write_op[])(struct file *, char *, size_t) = {
>  	[NFSD_Svc] = write_svc,
>  	[NFSD_Add] = write_add,
> @@ -97,6 +104,8 @@ static ssize_t (*write_op[])(struct file
>  	[NFSD_Getfd] = write_getfd,
>  	[NFSD_Getfs] = write_getfs,
>  	[NFSD_Fh] = write_filehandle,
> +	[NFSD_FO_UnlockIP] = failover_unlock_ip,
> +	[NFSD_FO_UnlockFS] = failover_unlock_fs,
>  	[NFSD_Threads] = write_threads,
>  	[NFSD_Pool_Threads] = write_pool_threads,
>  	[NFSD_Versions] = write_versions,
> @@ -288,6 +297,58 @@ static ssize_t write_getfd(struct file *
>  	return err;
>  }
>  
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
> +{
> +	__be32 server_ip;
> +	char *fo_path, c;
> +	int b1, b2, b3, b4;
> +
> +	/* sanity check */
> +	if (size == 0)
> +		return -EINVAL;
> +
> +	if (buf[size-1] != '\n')
> +		return -EINVAL;
> +
> +	fo_path = buf;
> +	if (qword_get(&buf, fo_path, size) < 0)
> +		return -EINVAL;
> +
> +	/* get ipv4 address */
> +	if (sscanf(fo_path, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
> +		return -EINVAL;
> +	server_ip = htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
> +
> +	return nlmsvc_failover_ip(server_ip);
> +}
> +
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
> +{
> +	struct nameidata nd;
> +	char *fo_path;
> +	int error;
> +
> +	/* sanity check */
> +	if (size == 0)
> +		return -EINVAL;
> +
> +	if (buf[size-1] != '\n')
> +		return -EINVAL;
> +
> +	fo_path = buf;
> +	if (qword_get(&buf, fo_path, size) < 0)
> +		return -EINVAL;
> +
> +	error = path_lookup(fo_path, 0, &nd);
> +	if (error)
> +		return error;
> +
> +	error = nlmsvc_failover_path(&nd);
> +
> +	path_release(&nd);
> +	return error;
> +}
> +
>  static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
>  {
>  	/* request is:
> @@ -646,6 +707,10 @@ static int nfsd_fill_super(struct super_
>  		[NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_List] = {"exports", &exports_operations, S_IRUGO},
> +		[NFSD_FO_UnlockIP] = {"unlock_ip",
> +					&transaction_ops, S_IWUSR|S_IRUSR},
> +		[NFSD_FO_UnlockFS] = {"unlock_filesystem",
> +					&transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
>  		[NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
> --- linux-o/fs/lockd/svcsubs.c	2008-01-04 10:01:08.000000000 -0500
> +++ linux/fs/lockd/svcsubs.c	2008-01-15 11:16:48.000000000 -0500
> @@ -18,6 +18,8 @@
>  #include <linux/lockd/lockd.h>
>  #include <linux/lockd/share.h>
>  #include <linux/lockd/sm_inter.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
>  
>  #define NLMDBG_FACILITY		NLMDBG_SVCSUBS
>  
> @@ -87,7 +89,7 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
>  	unsigned int	hash;
>  	__be32		nfserr;
>  
> -	nlm_debug_print_fh("nlm_file_lookup", f);
> +	nlm_debug_print_fh("nlm_lookup_file", f);
>  
>  	hash = file_hash(f);
>  
> @@ -123,6 +125,9 @@ nlm_lookup_file(struct svc_rqst *rqstp, 
>  
>  	hlist_add_head(&file->f_list, &nlm_files[hash]);
>  
> +	/* fill in f_iaddr for nlm lock failover */
> +	file->f_iaddr = rqstp->rq_daddr;
> +
>  found:
>  	dprintk("lockd: found file %p (count %d)\n", file, file->f_count);
>  	*result = file;
> @@ -194,6 +199,12 @@ again:
>  	return 0;
>  }
>  
> +static int
> +nlmsvc_always_match(struct nlm_host *dummy1, struct nlm_host *dummy2)
> +{
> +	return 1;
> +}
> +
>  /*
>   * Inspect a single file
>   */
> @@ -230,7 +241,8 @@ nlm_file_inuse(struct nlm_file *file)
>   * Loop over all files in the file table.
>   */
>  static int
> -nlm_traverse_files(struct nlm_host *host, nlm_host_match_fn_t match)
> +nlm_traverse_files(void *data, nlm_host_match_fn_t match,
> +		int (*is_failover_file)(void *data, struct nlm_file *file))
>  {
>  	struct hlist_node *pos, *next;
>  	struct nlm_file	*file;
> @@ -244,8 +256,17 @@ nlm_traverse_files(struct nlm_host *host
>  
>  			/* Traverse locks, blocks and shares of this file
>  			 * and update file->f_locks count */
> -			if (nlm_inspect_file(host, file, match))
> -				ret = 1;
> +
> +			if (likely(is_failover_file == NULL) ||
> +				is_failover_file(data, file)) {
> +				/*
> +				 *  Note that nlm_inspect_file updates f_locks
> +				 *  and ret is the number of files that can't
> +				 *  be unlocked.
> +				 */
> +				ret += nlm_inspect_file(data, file, match);
> +			} else
> +				file->f_locks = nlm_file_inuse(file);
>  
>  			mutex_lock(&nlm_file_mutex);
>  			file->f_count--;
> @@ -337,7 +358,7 @@ void
>  nlmsvc_mark_resources(void)
>  {
>  	dprintk("lockd: nlmsvc_mark_resources\n");
> -	nlm_traverse_files(NULL, nlmsvc_mark_host);
> +	nlm_traverse_files(NULL, nlmsvc_mark_host, NULL);
>  }
>  
>  /*
> @@ -348,7 +369,7 @@ nlmsvc_free_host_resources(struct nlm_ho
>  {
>  	dprintk("lockd: nlmsvc_free_host_resources\n");
>  
> -	if (nlm_traverse_files(host, nlmsvc_same_host)) {
> +	if (nlm_traverse_files(host, nlmsvc_same_host, NULL)) {
>  		printk(KERN_WARNING
>  			"lockd: couldn't remove all locks held by %s\n",
>  			host->h_name);
> @@ -368,5 +389,36 @@ nlmsvc_invalidate_all(void)
>  	 * turn, which is about as inefficient as it gets.
>  	 * Now we just do it once in nlm_traverse_files.
>  	 */
> -	nlm_traverse_files(NULL, nlmsvc_is_client);
> +	nlm_traverse_files(NULL, nlmsvc_is_client, NULL);
> +}
> +
> +static int
> +nlmsvc_failover_file_ok_path(void *datap, struct nlm_file *file)
> +{
> +	struct nameidata *nd = datap;
> +	return nd->mnt == file->f_file->f_path.mnt;
> +}
> +
> +int
> +nlmsvc_failover_path(struct nameidata *nd)
> +{
> +	return nlm_traverse_files(nd, nlmsvc_always_match,
> +			nlmsvc_failover_file_ok_path);
> +}
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_path);
> +
> +static int
> +nlmsvc_failover_file_ok_ip(void *datap, struct nlm_file *file)
> +{
> +	__be32 *server_addr = datap;
> +
> +	return file->f_iaddr.addr.s_addr == *server_addr;
> +}
> +
> +int
> +nlmsvc_failover_ip(__be32 server_addr)
> +{
> +	return nlm_traverse_files(&server_addr, nlmsvc_always_match,
> +			nlmsvc_failover_file_ok_ip);
>  }
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_ip);
> --- linux-o/include/linux/lockd/lockd.h	2008-01-04 10:01:08.000000000 -0500
> +++ linux/include/linux/lockd/lockd.h	2008-01-15 11:13:04.000000000 -0500
> @@ -113,6 +113,7 @@ struct nlm_file {
>  	unsigned int		f_locks;	/* guesstimate # of locks */
>  	unsigned int		f_count;	/* reference count */
>  	struct mutex		f_mutex;	/* avoid concurrent access */
> +	union svc_addr_u	f_iaddr;	/* server ip for failover */
>  };
>  
>  /*
> @@ -214,6 +215,12 @@ void		  nlmsvc_mark_resources(void);
>  void		  nlmsvc_free_host_resources(struct nlm_host *);
>  void		  nlmsvc_invalidate_all(void);
>  
> +/*
> + * Cluster failover support
> + */
> +int           nlmsvc_failover_path(struct nameidata *nd);
> +int           nlmsvc_failover_ip(__be32 server_addr);
> +
>  static __inline__ struct inode *
>  nlmsvc_file_inode(struct nlm_file *file)
>  {



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 15:10                         ` J. Bruce Fields
@ 2008-01-17 15:48                           ` Wendy Cheng
  2008-01-17 16:08                             ` Wendy Cheng
                                               ` (2 more replies)
  0 siblings, 3 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-17 15:48 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
> Remind me: why do we need both per-ip and per-filesystem methods?  In
> practice, I assume that we'll always do *both*?
>   

Failover is normally done via a virtual IP address - so the per-ip based
method should be the core routine. However, for a non-cluster filesystem
such as ext3/4, changing servers also implies an unmount. If there are
clients not following the rules and obtaining locks via different IP
interfaces, the unmount would fail, which ends up aborting the failover
process. That's where we need the per-filesystem method.

ServerA:
1. Tear down the IP address
2. Unexport the path
3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
4. If unmount required,
      write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
5. Signal peer to begin take-over.

Some time ago we were looking at "export name" as the core method (the
per-filesystem method being a subset of that). Unfortunately, the
prototype efforts showed the code would be too intrusive (if a
filesystem sub-tree is exported).
> We're migrating clients by moving a server ip address from one node to
> another.  And I assume we're permitting at most one node to export each
> filesystem at a time.  So it *should* be the case that the set of locks
> held on the filesystem(s) that are moving are the same as the set of
> locks held by the virtual ip that is moving.
>   

This is true for a non-cluster filesystem. But a cluster filesystem can
be exported from multiple servers.
> But presumably in some scenarios clients can get confused, and we need
> to ensure that stale locks are not left behind?
>   

Yes.

> We've discussed this before, but we should get the answer into comments
> in the code (or on the patches).
>
>   
OK, working on it. Or should we add something under linux/Documentation
to describe the overall logic?

-- Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 15:48                           ` Wendy Cheng
@ 2008-01-17 16:08                             ` Wendy Cheng
  2008-01-17 16:10                               ` Wendy Cheng
  2008-01-17 16:14                             ` J. Bruce Fields
  2008-01-17 16:31                             ` J. Bruce Fields
  2 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-17 16:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Added a more detailed description at the top of the patch itself. I'm
working on the resume patch now - it will include an overall write-up in
the Documentation directory.

-- Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 16:08                             ` Wendy Cheng
@ 2008-01-17 16:10                               ` Wendy Cheng
  2008-01-18 10:21                                 ` Frank van Maarseveen
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-17 16:10 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Wendy Cheng wrote:
> Add a more detailed description into the top of the patch itself. I'm 
> working on the resume patch now - it will include an overall write-up 
> in the Documentation directory.

Sorry, forgot to attach the patch. Here it is ... Wendy

-------------- next part --------------
A non-text attachment was scrubbed...
Name: unlock_v4.patch
Type: text/x-patch
Size: 8610 bytes
Desc: not available
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20080117/ea248495/attachment.bin>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 15:48                           ` Wendy Cheng
  2008-01-17 16:08                             ` Wendy Cheng
@ 2008-01-17 16:14                             ` J. Bruce Fields
  2008-01-17 16:17                               ` Wendy Cheng
  2008-01-17 16:31                             ` J. Bruce Fields
  2 siblings, 1 reply; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-17 16:14 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Remind me: why do we need both per-ip and per-filesystem methods?  In
>> practice, I assume that we'll always do *both*?
>>   
>
> Failover normally is done via virtual IP address - so per-ip base method  
> should be the core routine. However, for non-cluster filesystem such as  
> ext3/4, changing server also implies umount. If there are clients not  
> following rule and obtaining locks via different ip interfaces, umount  
> would fail that ends up aborting the failover process. That's the place  
> we need the per-filesystem method.
>
> ServerA:
> 1. Tear down the IP address
> 2. Unexport the path
> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
> 4. If unmount required,
> write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
> 5. Signal peer to begin take-over.
>
> Sometime ago we were looking at "export name" as the core method (so  
> per-filesystem method is a subset of that). Unfortunately, the prototype  
> efforts showed the code would be too intrusive (if filesystem sub-tree  
> is exported).
>> We're migrating clients by moving a server ip address from one node to
>> another.  And I assume we're permitting at most one node to export each
>> filesystem at a time.  So it *should* be the case that the set of locks
>> held on the filesystem(s) that are moving are the same as the set of
>> locks held by the virtual ip that is moving.
>>   
>
> This is true for non-cluster filesystem. But a cluster filesystem can be  
> exported from multiple servers.
>> But presumably in some scenarios clients can get confused, and we need
>> to ensure that stale locks are not left behind?
>>   
>
> Yes.
>
>> We've discussed this before, but we should get the answer into comments
>> in the code (or on the patches).
>>
>>   
> ok, working on it. or should we add something into linux/Documentation  
> to describe the overall logic ?

Yeah, sounds good.  Maybe under Documentation/filesystems?  And it might
also be helpful to leave a reference to it in the code, e.g., in
nfsctl.c:

	/*
	 * The following are used for failover; see
	 * Documentation/filesystems/nfsd-failover.txt for details.
	 */

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 16:14                             ` J. Bruce Fields
@ 2008-01-17 16:17                               ` Wendy Cheng
  2008-01-17 16:21                                 ` J. Bruce Fields
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-17 16:17 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
> Yeah, sounds good.  Maybe under Documentation/filesystems?  And it might
> also be helpful to leave a reference to it in the code, e.g., in
> nfsctl.c:
>
> 	/*
> 	 * The following are used for failover; see
> 	 * Documentation/filesystems/nfsd-failover.txt for details.
> 	 */
>
> --b.
>   

OK, looks good ... but let's "fix" this in the resume patch? Or do you
want it in the unlock patch *now*?

-- Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 16:17                               ` Wendy Cheng
@ 2008-01-17 16:21                                 ` J. Bruce Fields
  0 siblings, 0 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-17 16:21 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 17, 2008 at 11:17:08AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Yeah, sounds good.  Maybe under Documentation/filesystems?  And it might
>> also be helpful to leave a reference to it in the code, e.g., in
>> nfsctl.c:
>>
>> 	/*
>> 	 * The following are used for failover; see
>> 	 * Documentation/filesystems/nfsd-failover.txt for details.
>> 	 */
>>
>> --b.
>>   
>
> ok, looks good ... but let's "fix" this on the resume patch ? or you  
> want it on the unlock patch *now* ?

It's not urgent.

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 15:48                           ` Wendy Cheng
  2008-01-17 16:08                             ` Wendy Cheng
  2008-01-17 16:14                             ` J. Bruce Fields
@ 2008-01-17 16:31                             ` J. Bruce Fields
  2008-01-17 16:31                               ` Wendy Cheng
  2 siblings, 1 reply; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-17 16:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Remind me: why do we need both per-ip and per-filesystem methods?  In
>> practice, I assume that we'll always do *both*?
>>   
>
> Failover normally is done via virtual IP address - so per-ip base method  
> should be the core routine. However, for non-cluster filesystem such as  
> ext3/4, changing server also implies umount. If there are clients not  
> following rule and obtaining locks via different ip interfaces, umount  
> would fail that ends up aborting the failover process. That's the place  
> we need the per-filesystem method.
>
> ServerA:
> 1. Tear down the IP address
> 2. Unexport the path
> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
> 4. If unmount required,
> write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
> 5. Signal peer to begin take-over.
>
> Some time ago we were looking at "export name" as the core method (so
> the per-filesystem method is a subset of that). Unfortunately, the
> prototype efforts showed the code would be too intrusive (if a
> filesystem sub-tree is exported).
>> We're migrating clients by moving a server ip address from one node to
>> another.  And I assume we're permitting at most one node to export each
>> filesystem at a time.  So it *should* be the case that the set of locks
>> held on the filesystem(s) that are moving are the same as the set of
>> locks held by the virtual ip that is moving.
>>   
>
> This is true for a non-cluster filesystem. But a cluster filesystem can
> be exported from multiple servers.

But that last sentence:

	it *should* be the case that the set of locks held on the
	filesystem(s) that are moving are the same as the set of locks
	held by the virtual ip that is moving.

is still true in the cluster filesystem case, right?

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 16:31                             ` J. Bruce Fields
@ 2008-01-17 16:31                               ` Wendy Cheng
  2008-01-17 16:40                                 ` J. Bruce Fields
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-17 16:31 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
> On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
>   
>> J. Bruce Fields wrote:
>>     
>>> Remind me: why do we need both per-ip and per-filesystem methods?  In
>>> practice, I assume that we'll always do *both*?
>>>   
>>>       
>> Failover normally is done via virtual IP address - so per-ip base method  
>> should be the core routine. However, for non-cluster filesystem such as  
>> ext3/4, changing server also implies umount. If there are clients not  
>> following rule and obtaining locks via different ip interfaces, umount  
>> would fail that ends up aborting the failover process. That's the place  
>> we need the per-filesystem method.
>>
>> ServerA:
>> 1. Tear down the IP address
>> 2. Unexport the path
>> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
>> 4. If unmount required,
>> write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
>> 5. Signal peer to begin take-over.
>>
>> Sometime ago we were looking at "export name" as the core method (so  
>> per-filesystem method is a subset of that). Unfortunately, the prototype  
>> efforts showed the code would be too intrusive (if filesystem sub-tree  
>> is exported).
>>     
>>> We're migrating clients by moving a server ip address from one node to
>>> another.  And I assume we're permitting at most one node to export each
>>> filesystem at a time.  So it *should* be the case that the set of locks
>>> held on the filesystem(s) that are moving are the same as the set of
>>> locks held by the virtual ip that is moving.
>>>   
>>>       
>> This is true for non-cluster filesystem. But a cluster filesystem can be  
>> exported from multiple servers.
>>     
>
> But that last sentence:
>
> 	it *should* be the case that the set of locks held on the
> 	filesystem(s) that are moving are the same as the set of locks
> 	held by the virtual ip that is moving.
>
> is still true in the cluster filesystem case, right?
>
> --b.
>   
Yes .... Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 16:31                               ` Wendy Cheng
@ 2008-01-17 16:40                                 ` J. Bruce Fields
       [not found]                                   ` <1200591323.13670.34.camel@dyn9047022153>
  2008-01-17 18:07                                   ` Wendy Cheng
  0 siblings, 2 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-17 16:40 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 17, 2008 at 11:31:22AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
>>   
>>> J. Bruce Fields wrote:
>>>     
>>>> Remind me: why do we need both per-ip and per-filesystem methods?  In
>>>> practice, I assume that we'll always do *both*?
>>>>         
>>> Failover normally is done via virtual IP address - so per-ip base 
>>> method  should be the core routine. However, for non-cluster 
>>> filesystem such as  ext3/4, changing server also implies umount. If 
>>> there are clients not  following rule and obtaining locks via 
>>> different ip interfaces, umount  would fail that ends up aborting the 
>>> failover process. That's the place  we need the per-filesystem 
>>> method.
>>>
>>> ServerA:
>>> 1. Tear down the IP address
>>> 2. Unexport the path
>>> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
>>> 4. If unmount required,
>>> write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
>>> 5. Signal peer to begin take-over.
>>>
>>> Sometime ago we were looking at "export name" as the core method (so  
>>> per-filesystem method is a subset of that). Unfortunately, the 
>>> prototype  efforts showed the code would be too intrusive (if 
>>> filesystem sub-tree  is exported).
>>>     
>>>> We're migrating clients by moving a server ip address from one node to
>>>> another.  And I assume we're permitting at most one node to export each
>>>> filesystem at a time.  So it *should* be the case that the set of locks
>>>> held on the filesystem(s) that are moving are the same as the set of
>>>> locks held by the virtual ip that is moving.
>>>>         
>>> This is true for non-cluster filesystem. But a cluster filesystem can 
>>> be  exported from multiple servers.
>>>     
>>
>> But that last sentence:
>>
>> 	it *should* be the case that the set of locks held on the
>> 	filesystem(s) that are moving are the same as the set of locks
>> 	held by the virtual ip that is moving.
>>
>> is still true in the cluster filesystem case, right?
>>
>> --b.
>>   
> Yes .... Wendy

In what situations (buggy client?  weird network failure?) could that
fail to be the case?

Would there be any advantage to enforcing that requirement in the
server?  (For example, teaching nlm to reject any locking request for a
certain filesystem that wasn't sent to a certain server IP.)
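
(Roughly what I have in mind, as a sketch only; nlm_fs_bound_ip() and
rqstp_dest_ip() are hypothetical helpers for looking up the binding and
the request's destination address:)

	/*
	 * Sketch only: fail any NLM request whose destination address
	 * is not the server IP this filesystem was bound to.  Both
	 * helpers used here are hypothetical.
	 */
	static __be32 nlmsvc_check_bound_ip(struct svc_rqst *rqstp,
					    struct super_block *sb)
	{
		__be32 bound = nlm_fs_bound_ip(sb);	/* 0 = no binding */

		if (bound && bound != rqstp_dest_ip(rqstp))
			return nlm_lck_denied;
		return nlm_granted;
	}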

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found]                                   ` <1200591323.13670.34.camel@dyn9047022153>
@ 2008-01-17 17:59                                     ` Wendy Cheng
  0 siblings, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-17 17:59 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Frank Filz wrote:
>
> I assume the intent here with this implementation is that the node
> taking over will start lock recovery for the ip address? With that
> perspective, I guess it would be important that each filesystem only be
> accessed via a single ip address; otherwise lock recovery will not work
> correctly, since another node/ip could accept locks for that filesystem,
> possibly "stealing" a lock that is in recovery. As I recall, our
> implementation put the entire filesystem cluster-wide into recovery
> during fail-over.
>
>   

We have 2 more patches on their way to this mailing list that:

* Set per-ip based grace period
* Notify only relevant clients about reclaim events
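
(Very roughly, and only to indicate the direction rather than the actual
code: the per-ip grace check in the NLM lock path would be shaped like
the sketch below, where serv_ip_in_grace() is a made-up lookup against
the set of resuming server IPs.)

	/* Direction only, not the actual patch: non-reclaim requests
	 * addressed to a server IP still in its grace period get
	 * bounced; traffic to other server IPs is unaffected. */
	static __be32 nlmsvc_ip_grace_check(__be32 dest_ip, u32 reclaim)
	{
		if (serv_ip_in_grace(dest_ip) && !reclaim)
			return nlm_lck_denied_grace_period;
		return nlm_granted;
	}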

-- Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 16:40                                 ` J. Bruce Fields
       [not found]                                   ` <1200591323.13670.34.camel@dyn9047022153>
@ 2008-01-17 18:07                                   ` Wendy Cheng
  2008-01-17 20:23                                     ` J. Bruce Fields
  1 sibling, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-17 18:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
> On Thu, Jan 17, 2008 at 11:31:22AM -0500, Wendy Cheng wrote:
>   
>> J. Bruce Fields wrote:
>>     
>>> On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
>>>   
>>>       
>>>> J. Bruce Fields wrote:
>>>>     
>>>>         
>>>>> Remind me: why do we need both per-ip and per-filesystem methods?  In
>>>>> practice, I assume that we'll always do *both*?
>>>>>         
>>>>>           
>>>> Failover normally is done via virtual IP address - so per-ip base 
>>>> method  should be the core routine. However, for non-cluster 
>>>> filesystem such as  ext3/4, changing server also implies umount. If 
>>>> there are clients not  following rule and obtaining locks via 
>>>> different ip interfaces, umount  would fail that ends up aborting the 
>>>> failover process. That's the place  we need the per-filesystem 
>>>> method.
>>>>
>>>> ServerA:
>>>> 1. Tear down the IP address
>>>> 2. Unexport the path
>>>> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
>>>> 4. If unmount required,
>>>> write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
>>>> 5. Signal peer to begin take-over.
>>>>
>>>> Sometime ago we were looking at "export name" as the core method (so  
>>>> per-filesystem method is a subset of that). Unfortunately, the 
>>>> prototype  efforts showed the code would be too intrusive (if 
>>>> filesystem sub-tree  is exported).
>>>>     
>>>>         
>>>>> We're migrating clients by moving a server ip address from one node to
>>>>> another.  And I assume we're permitting at most one node to export each
>>>>> filesystem at a time.  So it *should* be the case that the set of locks
>>>>> held on the filesystem(s) that are moving are the same as the set of
>>>>> locks held by the virtual ip that is moving.
>>>>>         
>>>>>           
>>>> This is true for non-cluster filesystem. But a cluster filesystem can 
>>>> be  exported from multiple servers.
>>>>     
>>>>         
>>> But that last sentence:
>>>
>>> 	it *should* be the case that the set of locks held on the
>>> 	filesystem(s) that are moving are the same as the set of locks
>>> 	held by the virtual ip that is moving.
>>>
>>> is still true in the cluster filesystem case, right?
>>>
>>> --b.
>>>   
>>>       
>> Yes .... Wendy
>>     
>
> In one situations (buggy client?  Weird network failure?) could that
> fail to be the case?
>
> Would there be any advantage to enforcing that requirement in the
> server?  (For example, teaching nlm to reject any locking request for a
> certain filesystem that wasn't sent to a certain server IP.)
>
> --b.
>   
It is doable... it could be added into the "resume" patch that is
currently being tested (since the logic is so similar to the per-ip based
grace period), which should be out for review no later than next Monday.

However, as with any new code added into the system, there are
trade-offs. I'm not sure we want to keep enhancing this too much, though.
Remember, locking is about latency. Adding more checking will hurt
latency.

-- Wendy




^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 18:07                                   ` Wendy Cheng
@ 2008-01-17 20:23                                     ` J. Bruce Fields
  2008-01-18 10:03                                       ` Frank van Maarseveen
  2008-01-24 16:00                                       ` J. Bruce Fields
  0 siblings, 2 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-17 20:23 UTC (permalink / raw)
  To: cluster-devel.redhat.com

To summarize a phone conversation from today:

On Thu, Jan 17, 2008 at 01:07:02PM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Would there be any advantage to enforcing that requirement in the
>> server?  (For example, teaching nlm to reject any locking request for a
>> certain filesystem that wasn't sent to a certain server IP.)
>>
>> --b.
>>   
> It is doable... could be added into the "resume" patch that is currently  
> being tested (since the logic is so similar to the per-ip base grace  
> period) that should be out for review no later than next Monday.
>
> However, as any new code added into the system, there are trade-off(s).  
> I'm not sure we want to keep enhancing this too much though.

Sure.  And I don't want to make this terribly complicated.  The patch
looks good, and solves a clear problem.  That said, there are a few
related problems we'd like to solve:

	- We want to be able to move an export to a node with an already
	  active nfs server.  Currently that requires restarting all of
	  nfsd on the target node.  This is what I understand your next
	  patch fixes.
	- In the case of a filesystem that may be mounted from multiple
	  nodes at once, we need to make sure we're not leaving a window
	  allowing other applications to claim locks that nfs clients
	  haven't recovered yet.
	- Ideally we'd like this to be possible without making the
	  filesystem block all lock requests during a 90-second grace
	  period; instead it should only have to block those requests
	  that conflict with to-be-recovered locks.
	- All this should work for nfsv4, where we want to eventually
	  also allow migration of individual clients, and
	  client-initiated failover.

I absolutely don't want to delay solving this particular problem until
all the above is figured out, but I would like to be reasonably
confident that the new user-interface can be extended naturally to
handle the above cases; or at least that it won't unnecessarily
complicate their implementation.

I'll try to sketch an implementation of most of the above in the next
week.

Anyway, that, together with the fact that 2.6.25 is opening up soon (in a
week or so?), inclines me toward delaying submitting this until 2.6.26.

> Remember,  
> locking is about latency. Adding more checking will hurt latency.

Do you have any latency tests that we could use, or latency-sensitive
workloads that you use as benchmarks?

My suspicion is that checks such as these would be dwarfed by the posix
deadlock detection checks, not to mention the roundtrip to the server
for the nlm rpc and (in the gfs2 case) the communication with gfs2's
posix lock manager.

But I'd love any chance to demonstrate lock latency problems--I'm sure
there's good work to be done there.
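
Nothing fancy, but as a starting point, a loop like the one below (the
file name and iteration count are arbitrary) measures the full per-cycle
cost when run against a file on an NFS mount, NLM round trip included:

	/* lockbench.c: time F_SETLKW/F_UNLCK cycles on argv[1]. */
	#include <stdio.h>
	#include <fcntl.h>
	#include <unistd.h>
	#include <time.h>

	int main(int argc, char **argv)
	{
		struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
		struct timespec t0, t1;
		int i, n = 10000, fd;

		if (argc < 2) {
			fprintf(stderr, "usage: %s <file>\n", argv[0]);
			return 1;
		}
		fd = open(argv[1], O_RDWR | O_CREAT, 0644);
		if (fd < 0) {
			perror("open");
			return 1;
		}
		clock_gettime(CLOCK_MONOTONIC, &t0);
		for (i = 0; i < n; i++) {
			fl.l_type = F_WRLCK;	/* whole-file write lock */
			if (fcntl(fd, F_SETLKW, &fl) < 0)
				perror("lock");
			fl.l_type = F_UNLCK;
			if (fcntl(fd, F_SETLK, &fl) < 0)
				perror("unlock");
		}
		clock_gettime(CLOCK_MONOTONIC, &t1);
		printf("%.1f us per lock/unlock cycle\n",
		       ((t1.tv_sec - t0.tv_sec) * 1e9 +
		        (t1.tv_nsec - t0.tv_nsec)) / n / 1e3);
		return 0;
	}

Build with something like "cc -o lockbench lockbench.c -lrt"; comparing
a local filesystem against an NFS mount, with and without extra checks
in lockd, would show whether a per-request check is even measurable.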

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 20:23                                     ` J. Bruce Fields
@ 2008-01-18 10:03                                       ` Frank van Maarseveen
  2008-01-18 14:56                                         ` Wendy Cheng
  2008-01-24 16:00                                       ` J. Bruce Fields
  1 sibling, 1 reply; 57+ messages in thread
From: Frank van Maarseveen @ 2008-01-18 10:03 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 17, 2008 at 03:23:42PM -0500, J. Bruce Fields wrote:
> To summarize a phone conversation from today:
> 
> On Thu, Jan 17, 2008 at 01:07:02PM -0500, Wendy Cheng wrote:
> > J. Bruce Fields wrote:
> >> Would there be any advantage to enforcing that requirement in the
> >> server?  (For example, teaching nlm to reject any locking request for a
> >> certain filesystem that wasn't sent to a certain server IP.)
> >>
> >> --b.
> >>   
> > It is doable... could be added into the "resume" patch that is currently  
> > being tested (since the logic is so similar to the per-ip base grace  
> > period) that should be out for review no later than next Monday.
> >
> > However, as any new code added into the system, there are trade-off(s).  
> > I'm not sure we want to keep enhancing this too much though.
> 
> Sure.  And I don't want to make this terribly complicated.  The patch
> looks good, and solves a clear problem.  That said, there are a few
> related problems we'd like to solve:
> 
> 	- We want to be able to move an export to a node with an already
> 	  active nfs server.  Currently that requires restarting all of
> 	  nfsd on the target node.  This is what I understand your next
> 	  patch fixes.

Maybe a silly question, but what about using "exportfs -r" for this?

-- 
Frank



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 16:10                               ` Wendy Cheng
@ 2008-01-18 10:21                                 ` Frank van Maarseveen
  2008-01-18 15:00                                   ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: Frank van Maarseveen @ 2008-01-18 10:21 UTC (permalink / raw)
  To: cluster-devel.redhat.com


> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
> 
> The expected sequence of events can be:
> 1. Tear down the IP address

You might consider using iptables at this point for dropping outgoing
packets with that source IP address to catch any packet still in
flight. It fixed ESTALE problems for me IIRC (NFSv3, UDP).

> 2. Unexport the path
> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
> 4. If unmount required, write path name to
>    /proc/fs/nfsd/unlock_filesystem, then unmount.
> 5. Signal peer to begin take-over.
> 

-- 
Frank



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-18 10:03                                       ` Frank van Maarseveen
@ 2008-01-18 14:56                                         ` Wendy Cheng
  0 siblings, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-18 14:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Frank van Maarseveen wrote:
>
>> 	- We want to be able to move an export to a node with an already
>> 	  active nfs server.  Currently that requires restarting all of
>> 	  nfsd on the target node.  This is what I understand your next
>> 	  patch fixes.
>>     
>
> Maybe a silly question but what about using "exportfs -r" for this?
>   

/me prays we won't go back to our *old* export failover proposal (about 
two years ago) :) ...

Anyway, re-export is part of the required steps that the take-over server
needs to do. It, however, doesn't handle the lockd grace period, so we
will have the possibility that another client will steal the locks away.
That's why Bruce and I are working on the second "resume" patch at this
moment.

-- Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-18 10:21                                 ` Frank van Maarseveen
@ 2008-01-18 15:00                                   ` Wendy Cheng
  0 siblings, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-18 15:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Frank van Maarseveen wrote:
>> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
>> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
>>
>> The expected sequence of events can be:
>> 1. Tear down the IP address
>>     
>
> You might consider using iptables at this point for dropping outgoing
> packets with that source IP address to catch any packet still in
> flight. It fixed ESTALE problems for me IIRC (NFSv3, UDP).
>   

OK, thanks ... Wendy
>   
>> 2. Unexport the path
>> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
>> 4. If unmount required, write path name to
>>    /proc/fs/nfsd/unlock_filesystem, then unmount.
>> 5. Signal peer to begin take-over.
>>
>>     
>
>   



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-14 23:31                 ` Neil Brown
@ 2008-01-22 22:53                   ` J. Bruce Fields
       [not found]                     ` <message from J. Bruce Fields on Tuesday January 22>
  0 siblings, 1 reply; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-22 22:53 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tue, Jan 15, 2008 at 10:31:18AM +1100, Neil Brown wrote:
> On Monday January 14, bfields at fieldses.org wrote:
> > >  
> > > +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
> > > +{
> > > +	__be32 server_ip;
> > > +	char *fo_path;
> > > +	char *mesg;
> > > +
> > > +	/* sanity check */
> > > +	if (size <= 0)
> > > +		return -EINVAL;
> > 
> > Not only is size never negative, it's actually an unsigned type here, so
> > this is a no-op.
> 
> No.  It is equivalent to
>           if (size == 0)
> 
> Which alternative is clearer and more maintainable is debatable.
> 
> > 
> > > +
> > > +	if (buf[size-1] == '\n')
> > > +		buf[size-1] = 0;
> > 
> > The other write methods in this file actually just do
> > 
> > 	if (buf[size-1] != '\n')
> > 		return -EINVAL;
> 
> and those which don't check for size == 0 are underflowing an array.
> That should probably be fixed.

?

--b.

commit 6685389d610950126f700d25f3d010c7049441c3
Author: J. Bruce Fields <bfields@citi.umich.edu>
Date:   Tue Jan 22 17:40:42 2008 -0500

    nfsd: more careful input validation in nfsctl write methods
    
    Neil Brown points out that we're checking buf[size-1] in a couple places
    without first checking whether size is zero.
    
    Actually, given the implementation of simple_transaction_get(), buf[-1]
    is zero, so in both of these cases the subsequent check of the value of
    buf[size-1] will catch this case.
    
    But it seems fragile to depend on that, so add explicit checks for this
    case.
    
    Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    Cc: Neil Brown <neilb@suse.de>

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 61015cf..9ed2a2b 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -304,6 +304,9 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
 	struct auth_domain *dom;
 	struct knfsd_fh fh;
 
+	if (size == 0)
+		return -EINVAL;
+
 	if (buf[size-1] != '\n')
 		return -EINVAL;
 	buf[size-1] = 0;
@@ -663,7 +666,7 @@ static ssize_t write_recoverydir(struct file *file, char *buf, size_t size)
 	char *recdir;
 	int len, status;
 
-	if (size > PATH_MAX || buf[size-1] != '\n')
+	if (size == 0 || size > PATH_MAX || buf[size-1] != '\n')
 		return -EINVAL;
 	buf[size-1] = 0;
 



^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found]                     ` <message from J. Bruce Fields on Tuesday January 22>
@ 2008-01-24  4:02                       ` Neil Brown
  0 siblings, 0 replies; 57+ messages in thread
From: Neil Brown @ 2008-01-24  4:02 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Tuesday January 22, bfields at fieldses.org wrote:
> 
> ?

!

(i.e. Acked-By: NeilBrown <neilb@suse.de>)

tnx.NB

> 
> --b.
> 
> commit 6685389d610950126f700d25f3d010c7049441c3
> Author: J. Bruce Fields <bfields@citi.umich.edu>
> Date:   Tue Jan 22 17:40:42 2008 -0500
> 
>     nfsd: more careful input validation in nfsctl write methods
>     
>     Neil Brown points out that we're checking buf[size-1] in a couple places
>     without first checking whether size is zero.
>     
>     Actually, given the implementation of simple_transaction_get(), buf[-1]
>     is zero, so in both of these cases the subsequent check of the value of
>     buf[size-1] will catch this case.
>     
>     But it seems fragile to depend on that, so add explicit checks for this
>     case.
>     
>     Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
>     Cc: Neil Brown <neilb@suse.de>



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-17 20:23                                     ` J. Bruce Fields
  2008-01-18 10:03                                       ` Frank van Maarseveen
@ 2008-01-24 16:00                                       ` J. Bruce Fields
       [not found]                                         ` <4798BAAE.6090107@redhat.com>
                                                           ` (2 more replies)
  1 sibling, 3 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-24 16:00 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 17, 2008 at 03:23:42PM -0500, J. Bruce Fields wrote:
> To summarize a phone conversation from today:
> 
> On Thu, Jan 17, 2008 at 01:07:02PM -0500, Wendy Cheng wrote:
> > J. Bruce Fields wrote:
> >> Would there be any advantage to enforcing that requirement in the
> >> server?  (For example, teaching nlm to reject any locking request for a
> >> certain filesystem that wasn't sent to a certain server IP.)
> >>
> >> --b.
> >>   
> > It is doable... could be added into the "resume" patch that is currently  
> > being tested (since the logic is so similar to the per-ip base grace  
> > period) that should be out for review no later than next Monday.
> >
> > However, as any new code added into the system, there are trade-off(s).  
> > I'm not sure we want to keep enhancing this too much though.
> 
> Sure.  And I don't want to make this terribly complicated.  The patch
> looks good, and solves a clear problem.  That said, there are a few
> related problems we'd like to solve:
> 
> 	- We want to be able to move an export to a node with an already
> 	  active nfs server.  Currently that requires restarting all of
> 	  nfsd on the target node.  This is what I understand your next
> 	  patch fixes.
> 	- In the case of a filesystem that may be mounted from multiple
> 	  nodes at once, we need to make sure we're not leaving a window
> 	  allowing other applications to claim locks that nfs clients
> 	  haven't recovered yet.
> 	- Ideally we'd like this to be possible without making the
> 	  filesystem block all lock requests during a 90-second grace
> 	  period; instead it should only have to block those requests
> 	  that conflict with to-be-recovered locks.
> 	- All this should work for nfsv4, where we want to eventually
> 	  also allow migration of individual clients, and
> 	  client-initiated failover.
> 
> I absolutely don't want to delay solving this particular problem until
> all the above is figured out, but I would like to be reasonably
> confident that the new user-interface can be extended naturally to
> handle the above cases; or at least that it won't unnecessarily
> complicate their implementation.
> 
> I'll try to sketch an implementation of most of the above in the next
> week.

Bah.  Apologies, this is taking me longer than it should to figure
out--I've only barely started writing patches.

The basic idea, though:

In practice, it seems that both the unlock_ip and unlock_pathname
methods that revoke locks are going to be called together.  The two
separate calls therefore seem a little redundant.  The reason we *need*
both is that it's possible that a misconfigured client could grab locks
for a (server ip, export) combination that it isn't supposed to.

So it makes sense to me to restrict locking from the beginning to
prevent that from happening.  Therefore I'd like to add a call at the
beginning like:

	echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace

before any exports are set up, which both starts a grace period, and
tells nfs to allow locks on the filesystem /exports/example only if
they're addressed to the server ip 192.168.1.1.  Then on shutdown,

	echo "192.168.1.1" >/proc/fs/nfsd/unlock_ip

should be sufficient to guarantee that nfsd/lockd no longer holds locks
on /exports/example.
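
(Parsing-wise I imagine something in the style of the existing nfsctl
write methods; write_start_grace and the nfsd_bind_ip_to_path() helper
below are both made up:)

	/* Hypothetical handler for /proc/fs/nfsd/start_grace; input is
	 * "<server-ip> <export-path>\n".  Parsing only. */
	static ssize_t write_start_grace(struct file *file, char *buf, size_t size)
	{
		char *path;
		int error;

		if (size == 0 || buf[size-1] != '\n')
			return -EINVAL;
		buf[size-1] = 0;

		path = strchr(buf, ' ');
		if (path == NULL)
			return -EINVAL;
		*path++ = 0;	/* buf is now the ip string, path the export */

		/* made-up helper: record the (ip, path) binding and
		 * start the grace period for that server ip */
		error = nfsd_bind_ip_to_path(buf, path);
		return error ? error : size;
	}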

(I think Wendy's pretty close to that api already after adding the
second method to start grace?)

The other advantage to having the server-ip from the start is that at
the time we make lock requests to the cluster filesystem, we can tell it
that the locks associated with 192.168.1.1 are special: they may migrate
as a group to another node, and on node failure they should (if
possible) be held to give a chance for another node to take them over.

Internally I'd like to have an object like

	struct lock_manager {
		char *lm_name;
		...
	}

for each server ip address.  A pointer to this structure would be passed
with each lock request, allowing the filesystem to associate locks to
lock_manager's.  The name would be a string derived from the server ip
address that the cluster can compare to match reclaim requests with the
locks that they're reclaiming from another node.
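
(Guessing at the rest of the structure and at how a request would carry
it; none of the names beyond lm_name are settled:)

	struct lock_manager {
		char		 *lm_name;	/* e.g. derived from "192.168.1.1" */
		struct list_head  lm_locks;	/* locks granted under this manager */
		int		  lm_grace;	/* still reclaiming after a move? */
	};

	/* lock requests would then carry their manager, along the lines of: */
	int nlmsvc_lock_with_manager(struct file *filp, struct file_lock *fl,
				     struct lock_manager *lm);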

(And in the NFSv4 case we would eventually also allow lock_managers with
single nfsv4 client (as opposed to server-ip) granularity.)

Does that seem sane?

But it's taking me longer than I'd like to get patches that implement
this.  Hopefully by next week I can get working code together for people
to look at....

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
       [not found]                                         ` <4798BAAE.6090107@redhat.com>
@ 2008-01-24 16:39                                           ` J. Bruce Fields
  0 siblings, 0 replies; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-24 16:39 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 24, 2008 at 11:19:58AM -0500, Peter Staubach wrote:
> J. Bruce Fields wrote:
>> On Thu, Jan 17, 2008 at 03:23:42PM -0500, J. Bruce Fields wrote:
>>   
>>> To summarize a phone conversation from today:
>>>
>>> On Thu, Jan 17, 2008 at 01:07:02PM -0500, Wendy Cheng wrote:
>>>     
>>>> J. Bruce Fields wrote:
>>>>       
>>>>> Would there be any advantage to enforcing that requirement in the
>>>>> server?  (For example, teaching nlm to reject any locking request for a
>>>>> certain filesystem that wasn't sent to a certain server IP.)
>>>>>
>>>>> --b.
>>>>>           
>>>> It is doable... could be added into the "resume" patch that is 
>>>> currently  being tested (since the logic is so similar to the 
>>>> per-ip base grace  period) that should be out for review no later 
>>>> than next Monday.
>>>>
>>>> However, as any new code added into the system, there are 
>>>> trade-off(s).  I'm not sure we want to keep enhancing this too much 
>>>> though.
>>>>       
>>> Sure.  And I don't want to make this terribly complicated.  The patch
>>> looks good, and solves a clear problem.  That said, there are a few
>>> related problems we'd like to solve:
>>>
>>> 	- We want to be able to move an export to a node with an already
>>> 	  active nfs server.  Currently that requires restarting all of
>>> 	  nfsd on the target node.  This is what I understand your next
>>> 	  patch fixes.
>>> 	- In the case of a filesystem that may be mounted from multiple
>>> 	  nodes at once, we need to make sure we're not leaving a window
>>> 	  allowing other applications to claim locks that nfs clients
>>> 	  haven't recovered yet.
>>> 	- Ideally we'd like this to be possible without making the
>>> 	  filesystem block all lock requests during a 90-second grace
>>> 	  period; instead it should only have to block those requests
>>> 	  that conflict with to-be-recovered locks.
>>> 	- All this should work for nfsv4, where we want to eventually
>>> 	  also allow migration of individual clients, and
>>> 	  client-initiated failover.
>>>
>>> I absolutely don't want to delay solving this particular problem until
>>> all the above is figured out, but I would like to be reasonably
>>> confident that the new user-interface can be extended naturally to
>>> handle the above cases; or at least that it won't unnecessarily
>>> complicate their implementation.
>>>
>>> I'll try to sketch an implementation of most of the above in the next
>>> week.
>>>     
>>
>> Bah.  Apologies, this is taking me longer than it should to figure
>> out--I've only barely started writing patches.
>>
>> The basic idea, though:
>>
>> In practice, it seems that both the unlock_ip and unlock_pathname
>> methods that revoke locks are going to be called together.  The two
>> separate calls therefore seem a little redundant.  The reason we *need*
>> both is that it's possible that a misconfigured client could grab locks
>> for a (server ip, export) combination that it isn't supposed to.
>>
>> So it makes sense to me to restrict locking from the beginning to
>> prevent that from happening.  Therefore I'd like to add a call at the
>> beginning like:
>>
>> 	echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace
>>
>> before any exports are set up, which both starts a grace period, and
>> tells nfs to allow locks on the filesystem /exports/example only if
>> they're addressed to the server ip 192.168.1.1.  Then on shutdown,
>>
>> 	echo "192.168.1.1" >/proc/fs/nfsd/unlock_ip
>>
>> should be sufficient to guarantee that nfsd/lockd no longer holds locks
>> on /exports/example.
>>
>> (I think Wendy's pretty close to that api already after adding the
>> second method to start grace?)
>>
>> The other advantage to having the server-ip from the start is that at
>> the time we make lock requests to the cluster filesystem, we can tell it
>> that the locks associated with 192.168.1.1 are special: they may migrate
>> as a group to another node, and on node failure they should (if
>> possible) be held to give a chance for another node to take them over.
>>
>> Internally I'd like to have an object like
>>
>> 	struct lock_manager {
>> 		char *lm_name;
>> 		...
>> 	}
>>
>> for each server ip address.  A pointer to this structure would be passed
>> with each lock request, allowing the filesystem to associate locks to
>> lock_manager's.  The name would be a string derived from the server ip
>> address that the cluster can compare to match reclaim requests with the
>> locks that they're reclaiming from another node.
>>
>> (And in the NFSv4 case we would eventually also allow lock_managers with
>> single nfsv4 client (as opposed to server-ip) granularity.)
>>
>> Does that seem sane?
>>
>> But it's taking me longer than I'd like to get patches that implement
>> this.  Hopefully by next week I can get working code together for people
>> to look at....
>
> This seems somewhat less than scalable to me, and not particularly
> useful outside this specific sort of situation.  The ratio between
> servers and clients tends to be one to many, where "many" is not
> uncommonly 4 digits.  What happens if the cluster is 1000+ nodes?  This
> strikes me as tough on a systems administrator.

I think that may turn out to be a valid objection to the notion of allowing
migration of individual v4 clients.  Does it really apply to the case of
migration by server ip addresses?

Maybe I'm not understanding the exact scalability problem you're
thinking of.

> It seems to me that it would be nice to be able to solve this
> problem someplace else like a cluster lock manager and not
> clutter up the NFS lock manager.

All we're doing in lockd/nfsd is associating individual lock requests with
"lock managers" (sets of locks that may migrate as a group), and making it
possible to individually shut down and start up lock managers.

In the case of a cluster filesystem, what I hope we end up with is an
api with calls to the filesystem like:

	lock_manager_start(lock_manager, super_block);
	lock_manager_end_grace(lock_manager, super_block);
	lock_manager_shutdown(lock_manager, super_block);

that inform the filesystem of the status of each lock manager.  If we pass
lock requests (including reclaims) to the filesystem as well, then it can,
if it wants, get complete control over the lock recovery process--so for
example if it knows which locks a dead node held, it may decide to allow
normal locking that doesn't conflict with those locks during the grace
period.  Or it may choose to do something simpler.
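
(In C, the filesystem-facing half might end up as a small table of
optional callbacks; this is purely a strawman:)

	/* Strawman only: callbacks a cluster filesystem could supply to
	 * hear about lock manager state transitions. */
	struct lock_manager_operations {
		int (*lm_start)(struct lock_manager *lm, struct super_block *sb);
		int (*lm_end_grace)(struct lock_manager *lm, struct super_block *sb);
		int (*lm_shutdown)(struct lock_manager *lm, struct super_block *sb);
	};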

So I *think* this allows us to do what you want--it adds the minimal
infrastructure to lockd required to allow the cluster's lock manager to do the
real work.

(Though we'd also want a simple default implementation in the vfs which handles
simpler cases--like two servers that mount a single ext3 filesystem one at a
time from a shared block device.)

--b.

>
> Perhaps the cluster folks could explain why this problem
> can't be solved there?  Is this another case of attempting
> to create a cluster concept without actually doing all of
> the work to make the cluster appear to be a single system?
>
> Or am I misunderstanding the situation here?



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-24 16:00                                       ` J. Bruce Fields
       [not found]                                         ` <4798BAAE.6090107@redhat.com>
@ 2008-01-24 19:45                                         ` Wendy Cheng
  2008-01-24 20:19                                           ` J. Bruce Fields
  2008-01-28  3:46                                         ` Felix Blyakher
  2 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-24 19:45 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
> On Thu, Jan 17, 2008 at 03:23:42PM -0500, J. Bruce Fields wrote:
>   
>> To summarize a phone conversation from today:
>>
>> On Thu, Jan 17, 2008 at 01:07:02PM -0500, Wendy Cheng wrote:
>>     
>>> J. Bruce Fields wrote:
>>>       
>>>> Would there be any advantage to enforcing that requirement in the
>>>> server?  (For example, teaching nlm to reject any locking request for a
>>>> certain filesystem that wasn't sent to a certain server IP.)
>>>>
>>>> --b.
>>>>   
>>>>         
>>> It is doable... could be added into the "resume" patch that is currently  
>>> being tested (since the logic is so similar to the per-ip base grace  
>>> period) that should be out for review no later than next Monday.
>>>
>>> However, as any new code added into the system, there are trade-off(s).  
>>> I'm not sure we want to keep enhancing this too much though.
>>>       
>> Sure.  And I don't want to make this terribly complicated.  The patch
>> looks good, and solves a clear problem.  That said, there are a few
>> related problems we'd like to solve:
>>
>> 	- We want to be able to move an export to a node with an already
>> 	  active nfs server.  Currently that requires restarting all of
>> 	  nfsd on the target node.  This is what I understand your next
>> 	  patch fixes.
>> 	- In the case of a filesystem that may be mounted from multiple
>> 	  nodes at once, we need to make sure we're not leaving a window
>> 	  allowing other applications to claim locks that nfs clients
>> 	  haven't recovered yet.
>> 	- Ideally we'd like this to be possible without making the
>> 	  filesystem block all lock requests during a 90-second grace
>> 	  period; instead it should only have to block those requests
>> 	  that conflict with to-be-recovered locks.
>> 	- All this should work for nfsv4, where we want to eventually
>> 	  also allow migration of individual clients, and
>> 	  client-initiated failover.
>>
>> I absolutely don't want to delay solving this particular problem until
>> all the above is figured out, but I would like to be reasonably
>> confident that the new user-interface can be extended naturally to
>> handle the above cases; or at least that it won't unnecessarily
>> complicate their implementation.
>>
>> I'll try to sketch an implementation of most of the above in the next
>> week.
>>     
>
> Bah.  Apologies, this is taking me longer than it should to figure
> out--I've only barely started writing patches.
>
> The basic idea, though:
>
> In practice, it seems that both the unlock_ip and unlock_pathname
> methods that revoke locks are going to be called together.  The two
> separate calls therefore seem a little redundant.  The reason we *need*
> both is that it's possible that a misconfigured client could grab locks
> for a (server ip, export) combination that it isn't supposed to.
>   

That is not a correct assumption. The two commands (unlock_ip and
unlock_pathname) are not necessarily called together. The assumption is OK
for a local filesystem (ext3) but not for a cluster filesystem, where the
very same filesystem (or a subtree of it) can be exported from multiple
servers using different subtrees. Also, as we discussed before, it is
"unlock_filesystem", *not* "unlock_pathname" (the latter would imply
sub-tree exports), due to implementation difficulties (see the
"Implementation Notes" from
http://people.redhat.com/wcheng/Patches/NFS/NLM/004.txt).
> So it makes sense to me to restrict locking from the beginning to
> prevent that from happening.  Therefore I'd like to add a call at the
> beginning like:
>
> 	echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace
>   

My second patch set is about to be sent out (writing the text description
at this moment ... sorry for the delay).

> before any exports are set up, which both starts a grace period, and
> tells nfs to allow locks on the filesystem /exports/example only if
> they're addressed to the server ip 192.168.1.1.  Then on shutdown,
>
> 	echo "192.168.1.1" >/proc/fs/nfsd/unlock_ip
>
> should be sufficient to guarantee that nfsd/lockd no longer holds locks
> on /exports/example.
>
> (I think Wendy's pretty close to that api already after adding the
> second method to start grace?)
>
> The other advantage to having the server-ip from the start is that at
> the time we make lock requests to the cluster filesystem, we can tell it
> that the locks associated with 192.168.1.1 are special: they may migrate
> as a group to another node, and on node failure they should (if
> possible) be held to give a chance for another node to take them over.
>
> Internally I'd like to have an object like
>
> 	struct lock_manager {
> 		char *lm_name;
> 		...
> 	}
>
> for each server ip address.  A pointer to this structure would be passed
> with each lock request, allowing the filesystem to associate locks to
> lock_manager's.  The name would be a string derived from the server ip
> address that the cluster can compare to match reclaim requests with the
> locks that they're reclaiming from another node.
>   

I still don't have a warm feeling about adding this (at this moment) -
somehow I feel we are over-engineering the lock failover issues. Remember,
lock failover is just a small portion of the general NFS server failover
issues (for both HA and load-balancing purposes). Maybe we should have
something simple and usable for the 2.6 kernel before adding this type of
complication?
> (And in the NFSv4 case we would eventually also allow lock_managers with
> single nfsv4 client (as opposed to server-ip) granularity.)
>
> Does that seem sane?
>
> But it's taking me longer than I'd like to get patches that implement
> this.  Hopefully by next week I can get working code together for people
> to look at....
>
> --b.
>   



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-24 19:45                                         ` Wendy Cheng
@ 2008-01-24 20:19                                           ` J. Bruce Fields
  2008-01-24 21:06                                             ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-24 20:19 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 24, 2008 at 02:45:37PM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> In practice, it seems that both the unlock_ip and unlock_pathname
>> methods that revoke locks are going to be called together.  The two
>> separate calls therefore seem a little redundant.  The reason we *need*
>> both is that it's possible that a misconfigured client could grab locks
>> for a (server ip, export) combination that it isn't supposed to.
>>   
>
> That is not a correct assumption. The two commands (unlock_ip and  
> unlock_pathname) are not necessarily called together. It is ok for local  
> filesystem (ext3) but not for cluster filesystem where the very same  
> filesystem (or subtree) can be exported from multiple servers using  
> different subtrees.

Ouch.  Are people really doing that, and why?  What happens if the
subtrees share files (because of hard links) that are locked from both
nodes?

> Also as we discussed before, it is  
> "unlock_filesystem", *not* "unlock_pathname" (this implies sub-tree  
> exports) due to implementation difficulties (see the "Implementation  
> Notes" from http://people.redhat.com/wcheng/Patches/NFS/NLM/004.txt).

Unless I misread the latest patch, it's actually matching on the
vfsmount, right?

I guess that means we *could* handle the above situation by doing a

	mount --bind /path/to/export/point /path/to/export/point

on each export, at which point there will be a separate vfsmount for
each export point?

But I don't think that's what we really want.  The goal is to ensure
that the nfs server holds no locks on a disk filesystem so we can
unmount it completely from this machine and mount it elsewhere.  So we
should really be removing all locks for the superblock, not just for a
particular mount of that superblock.  Otherwise we'll have odd problems
if someone happens to do the unlock_filesystem downcall from a different
namespace or something.
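
(The difference in code terms is small; assuming a traversal that hands
each nlm_file to a predicate, matching on the superblock instead of one
vfsmount would look something like:)

	/* Sketch: unlock_filesystem match predicate keyed on the
	 * superblock, so every mount of the filesystem is covered. */
	static int nlmsvc_match_sb(void *datap, struct nlm_file *file)
	{
		struct super_block *sb = datap;

		return sb == file->f_file->f_path.dentry->d_sb;
	}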

>> So it makes sense to me to restrict locking from the beginning to
>> prevent that from happening.  Therefore I'd like to add a call at the
>> beginning like:
>>
>> 	echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace
>>   
>
> My second patch set is about to be sent out (doing text description at  
> this moment .. sorry for the delay).

Good, thanks.

>> before any exports are set up, which both starts a grace period, and
>> tells nfs to allow locks on the filesystem /exports/example only if
>> they're addressed to the server ip 192.168.1.1.  Then on shutdown,
>>
>> 	echo "192.168.1.1" >/proc/fs/nfsd/unlock_ip
>>
>> should be sufficient to guarantee that nfsd/lockd no longer holds locks
>> on /exports/example.
>>
>> (I think Wendy's pretty close to that api already after adding the
>> second method to start grace?)
>>
>> The other advantage to having the server-ip from the start is that at
>> the time we make lock requests to the cluster filesystem, we can tell it
>> that the locks associated with 192.168.1.1 are special: they may migrate
>> as a group to another node, and on node failure they should (if
>> possible) be held to give a chance for another node to take them over.
>>
>> Internally I'd like to have an object like
>>
>> 	struct lock_manager {
>> 		char *lm_name;
>> 		...
>> 	}
>>
>> for each server ip address.  A pointer to this structure would be passed
>> with each lock request, allowing the filesystem to associate locks to
>> lock_manager's.  The name would be a string derived from the server ip
>> address that the cluster can compare to match reclaim requests with the
>> locks that they're reclaiming from another node.
>>   
>
> I still don't have a warm feeling about adding this (at this moment) -  
> somehow feel we over-engineer the lock failover issues.

I agree that that's a risk.

> Remember lock  failover is just a small portion of the general NFS
> server failover (for  both HA and load balancing purpose) issues.
> Maybe we should have  something simple and usable for 2.6 kernel
> before adding this type of  complication ?

Yeah.  We should aim to include basic failover functionality in 2.6.26,
one way or another--I think that dealing with the other issues I'm
worried about won't actually be a great deal more complicated, but if
that doesn't pan out then fine.  I would like to at least make sure this
is all working for nfsv4 as well, though.  (Currently only locks held by
v2/v3 clients are dropped.)

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-24 20:19                                           ` J. Bruce Fields
@ 2008-01-24 21:06                                             ` Wendy Cheng
  2008-01-24 21:40                                               ` J. Bruce Fields
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-24 21:06 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
> On Thu, Jan 24, 2008 at 02:45:37PM -0500, Wendy Cheng wrote:
>   
>> J. Bruce Fields wrote:
>>     
>>> In practice, it seems that both the unlock_ip and unlock_pathname
>>> methods that revoke locks are going to be called together.  The two
>>> separate calls therefore seem a little redundant.  The reason we *need*
>>> both is that it's possible that a misconfigured client could grab locks
>>> for a (server ip, export) combination that it isn't supposed to.
>>>   
>>>       
>> That is not a correct assumption. The two commands (unlock_ip and  
>> unlock_pathname) are not necessarily called together. It is ok for local  
>> filesystem (ext3) but not for cluster filesystem where the very same  
>> filesystem (or subtree) can be exported from multiple servers using  
>> different subtrees.
>>     
>
> Ouch.  Are people really doing that, and why?  What happens if the
> subtrees share files (because of hard links) that are locked from both
> nodes?
>   

It is *more* common than you would expect - say server1 exports 
"/mnt/gfs/maildir/namea-j" and server2 exports "/mnt/gfs/maildir/namek-z".

>   
>> Also as we discussed before, it is  
>> "unlock_filesystem", *not* "unlock_pathname" (this implies sub-tree  
>> exports) due to implementation difficulties (see the "Implementation  
>> Notes" from http://people.redhat.com/wcheng/Patches/NFS/NLM/004.txt).
>>     
>
> Unless I misread the latest patch, it's actually matching on the
> vfsmount, right?
>   

Yes.

> I guess that means we *could* handle the above situation by doing a
>
> 	mount --bind /path/to/export/point /path/to/export/point
>
> on each export, at which point there will be a separate vfsmount for
> each export point?
>   

Cluster configuration itself has been cumbersome and error-prone. A
requirement like this will not be well received. On the other hand,
force-unlocking a mount point is a *last* resort, since NFS clients using
another IP interface would lose contact with the server. We should *not*
consider "unlock_filesystem" a frequent event.

> But I don't think that's what we really want.  The goal is to ensure
> that the nfs server holds no locks on a disk filesystem so we can
> unmount it completely from this machine and mount it elsewhere.  So we
> should really be removing all locks for the superblock, not just for a
> particular mount of that superblock.  Otherwise we'll have odd problems
> if someone happens to do the unlock_filesystem downcall from a different
> namespace or something.
>   

Oh ... sorry .. didn't read this far... so we agree the "--bind" is not 
a good idea :) ..

-- Wendy




^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-24 21:06                                             ` Wendy Cheng
@ 2008-01-24 21:40                                               ` J. Bruce Fields
  2008-01-24 21:49                                                 ` Wendy Cheng
  0 siblings, 1 reply; 57+ messages in thread
From: J. Bruce Fields @ 2008-01-24 21:40 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Thu, Jan 24, 2008 at 04:06:49PM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> On Thu, Jan 24, 2008 at 02:45:37PM -0500, Wendy Cheng wrote:
>>   
>>> J. Bruce Fields wrote:
>>>     
>>>> In practice, it seems that both the unlock_ip and unlock_pathname
>>>> methods that revoke locks are going to be called together.  The two
>>>> separate calls therefore seem a little redundant.  The reason we *need*
>>>> both is that it's possible that a misconfigured client could grab locks
>>>> for a (server ip, export) combination that it isn't supposed to.
>>>>         
>>> That is not a correct assumption. The two commands (unlock_ip and   
>>> unlock_pathname) are not necessarily called together. It is ok for 
>>> local  filesystem (ext3) but not for cluster filesystem where the 
>>> very same  filesystem (or subtree) can be exported from multiple 
>>> servers using  different subtrees.
>>>     
>>
>> Ouch.  Are people really doing that, and why?  What happens if the
>> subtrees share files (because of hard links) that are locked from both
>> nodes?
>>   
>
> It is *more* common than you would expect - say server1 exports  
> "/mnt/gfs/maildir/namea-j" and server2 exports 
> "/mnt/gfs/maildir/namek-z".

I believe it, but how hard would it be for them to just set those up as
separate partitions?

I'm really not fond of exports of subdirectories of filesystems, mainly
because I'm worried that many administrators don't understand the
security issue (which is that they probably are exposing the whole
filesystem when they export a subdirectory).

--b.



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-24 21:40                                               ` J. Bruce Fields
@ 2008-01-24 21:49                                                 ` Wendy Cheng
  0 siblings, 0 replies; 57+ messages in thread
From: Wendy Cheng @ 2008-01-24 21:49 UTC (permalink / raw)
  To: cluster-devel.redhat.com

J. Bruce Fields wrote:
> On Thu, Jan 24, 2008 at 04:06:49PM -0500, Wendy Cheng wrote:
>   
>> J. Bruce Fields wrote:
>>     
>>> On Thu, Jan 24, 2008 at 02:45:37PM -0500, Wendy Cheng wrote:
>>>   
>>>       
>>>> J. Bruce Fields wrote:
>>>>     
>>>>         
>>>>> In practice, it seems that both the unlock_ip and unlock_pathname
>>>>> methods that revoke locks are going to be called together.  The two
>>>>> separate calls therefore seem a little redundant.  The reason we *need*
>>>>> both is that it's possible that a misconfigured client could grab locks
>>>>> for a (server ip, export) combination that it isn't supposed to.
>>>>>         
>>>>>           
>>>> That is not a correct assumption. The two commands (unlock_ip and   
>>>> unlock_pathname) are not necessarily called together. It is ok for 
>>>> local  filesystem (ext3) but not for cluster filesystem where the 
>>>> very same  filesystem (or subtree) can be exported from multiple 
>>>> servers using  different subtrees.
>>>>     
>>>>         
>>> Ouch.  Are people really doing that, and why?  What happens if the
>>> subtrees share files (because of hard links) that are locked from both
>>> nodes?
>>>   
>>>       
>> It is *more* common than you would expect - say server1 exports  
>> "/mnt/gfs/maildir/namea-j" and server2 exports 
>> "/mnt/gfs/maildir/namek-z".
>>     
>
> I believe it, but how hard would it be for them to just set those up as
> separate partitions?
>
>   
The answer that I normally hear is "then why would I bother to use a
cluster filesystem?" .... Wendy



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-24 16:00                                       ` J. Bruce Fields
       [not found]                                         ` <4798BAAE.6090107@redhat.com>
  2008-01-24 19:45                                         ` Wendy Cheng
@ 2008-01-28  3:46                                         ` Felix Blyakher
  2008-01-28 15:56                                           ` Wendy Cheng
  2 siblings, 1 reply; 57+ messages in thread
From: Felix Blyakher @ 2008-01-28  3:46 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi Bruce,

On Jan 24, 2008, at 10:00 AM, J. Bruce Fields wrote:

> On Thu, Jan 17, 2008 at 03:23:42PM -0500, J. Bruce Fields wrote:
>> To summarize a phone conversation from today:
>>
>> On Thu, Jan 17, 2008 at 01:07:02PM -0500, Wendy Cheng wrote:
>>> J. Bruce Fields wrote:
>>>> Would there be any advantage to enforcing that requirement in the
>>>> server?  (For example, teaching nlm to reject any locking request
>>>> for a certain filesystem that wasn't sent to a certain server IP.)
>>>>
>>>> --b.
>>>>
>>> It is doable... could be added into the "resume" patch that is
>>> currently being tested (since the logic is so similar to the per-ip
>>> base grace period) that should be out for review no later than next
>>> Monday.
>>>
>>> However, as any new code added into the system, there are
>>> trade-off(s).  I'm not sure we want to keep enhancing this too much
>>> though.
>>
>> Sure.  And I don't want to make this terribly complicated.  The patch
>> looks good, and solves a clear problem.  That said, there are a few
>> related problems we'd like to solve:
>>
>> 	- We want to be able to move an export to a node with an already
>> 	  active nfs server.  Currently that requires restarting all of
>> 	  nfsd on the target node.  This is what I understand your next
>> 	  patch fixes.
>> 	- In the case of a filesystem that may be mounted from multiple
>> 	  nodes at once, we need to make sure we're not leaving a window
>> 	  allowing other applications to claim locks that nfs clients
>> 	  haven't recovered yet.
>> 	- Ideally we'd like this to be possible without making the
>> 	  filesystem block all lock requests during a 90-second grace
>> 	  period; instead it should only have to block those requests
>> 	  that conflict with to-be-recovered locks.
>> 	- All this should work for nfsv4, where we want to eventually
>> 	  also allow migration of individual clients, and
>> 	  client-initiated failover.
>>
>> I absolutely don't want to delay solving this particular problem until
>> all the above is figured out, but I would like to be reasonably
>> confident that the new user-interface can be extended naturally to
>> handle the above cases; or at least that it won't unnecessarily
>> complicate their implementation.
>>
>> I'll try to sketch an implementation of most of the above in the next
>> week.
>
> Bah.  Apologies, this is taking me longer than it should to figure
> out--I've only barely started writing patches.
>
> The basic idea, though:
>
> In practice, it seems that both the unlock_ip and unlock_pathname
> methods that revoke locks are going to be called together.  The two
> separate calls therefore seem a little redundant.  The reason we *need*
> both is that it's possible that a misconfigured client could grab locks
> for a (server ip, export) combination that it isn't supposed to.
>
> So it makes sense to me to restrict locking from the beginning to
> prevent that from happening.  Therefore I'd like to add a call at the
> beginning like:
>
> 	echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace
>
> before any exports are set up, which both starts a grace period, and
> tells nfs to allow locks on the filesystem /exports/example only if
> they're addressed to the server ip 192.168.1.1.  Then on shutdown,
>
> 	echo "192.168.1.1" >/proc/fs/nfsd/unlock_ip
>
> should be sufficient to guarantee that nfsd/lockd no longer holds locks
> on /exports/example.
>
> (I think Wendy's pretty close to that api already after adding the
> second method to start grace?)
>
> The other advantage to having the server-ip from the start is that at
> the time we make lock requests to the cluster filesystem, we can tell it
> that the locks associated with 192.168.1.1 are special: they may migrate
> as a group to another node, and on node failure they should (if
> possible) be held to give a chance for another node to take them over.
>
> Internally I'd like to have an object like
>
> 	struct lock_manager {
> 		char *lm_name;
> 		...
> 	}
>
> for each server ip address.  A pointer to this structure would be passed
> with each lock request, allowing the filesystem to associate locks with
> lock_manager's.  The name would be a string derived from the server ip
> address that the cluster can compare to match reclaim requests with the
> locks that they're reclaiming from another node.
>
> (And in the NFSv4 case we would eventually also allow lock_managers with
> single nfsv4 client (as opposed to server-ip) granularity.)
>
> Does that seem sane?

It does. Though I'd like to elaborate on the effect of this change on
the disk filesystem, and in particular on a cluster filesystem.
I know I'm jumping ahead, but I'd like to make sure that it's all
going to work well with cluster filesystems.

As part of processing an "unlock by ip" request (from the above example
of writing into /proc/fs/nfsd/unlock_ip) nfsd would call into the
underlying filesystem. In a cluster filesystem we really can't just
delete the locks, as the filesystem is still available and accessible
from other nodes in the cluster. We need to protect the nfs clients'
locks till they're reclaimed by the new nfs server.
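
To make that concrete, here is roughly the behavior I'd expect on the
filesystem side. This is a minimal sketch only -- the helpers
sb_nlm_locks(), fl_server_ip(), sb_is_cluster_fs(), locks_really_free()
and the FL_RECLAIM_PENDING flag are all made-up names for illustration,
not existing kernel interfaces:

#include <linux/fs.h>
#include <linux/list.h>

/*
 * Sketch of unlock-by-ip as seen by the filesystem.  A local
 * filesystem can simply free the locks; a cluster filesystem must
 * keep them visible to the other nodes, marked for reclaim, until
 * the new nfs server's clients have reclaimed them (or a grace
 * timeout expires).  All helper names here are hypothetical.
 */
static void fs_drop_locks_for_ip(struct super_block *sb, __be32 server_ip)
{
	struct file_lock *fl;

	list_for_each_entry(fl, sb_nlm_locks(sb), fl_link) {
		if (fl_server_ip(fl) != server_ip)
			continue;
		if (sb_is_cluster_fs(sb))
			/* other nodes must still see the lock as held */
			fl->fl_flags |= FL_RECLAIM_PENDING;
		else
			/* local fs (ext3): safe to just release it */
			locks_really_free(fl);
	}
}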

Bruce mentioned in another mail that communication to the underlying
filesystem would be through the lock_manager calls:

On Jan 24, 2008, at 10:39 AM, J. Bruce Fields wrote:
> In the case of a cluster filesystem, what I hope we end up with is an
> api with calls to the filesystem like:
>
> 	lock_manager_start(lock_manager, super_block);
> 	lock_manager_end_grace(lock_manager, super_block);
> 	lock_manager_shutdown(lock_manager, super_block);

Would that be part of a lock_manager_ops, which any filesystem can
implement, with the generic ops handling the case of two servers
mounting a single ext3 filesystem one at a time?
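
Something along these lines, say? (A strawman sketch only, to make the
question concrete -- none of these names exist in the kernel today;
lock_manager itself is just the object Bruce described above:)

#include <linux/fs.h>

struct lock_manager;

/*
 * Strawman ops table for the three calls listed above.  A filesystem
 * that needs nothing special (e.g. ext3 moved between servers one at
 * a time) would leave these NULL and get the generic behavior.
 */
struct lock_manager_ops {
	int  (*lm_start)(struct lock_manager *lm, struct super_block *sb);
	void (*lm_end_grace)(struct lock_manager *lm, struct super_block *sb);
	void (*lm_shutdown)(struct lock_manager *lm, struct super_block *sb);
};

struct lock_manager {
	char *lm_name;	/* e.g. derived from the server ip "192.168.1.1" */
	const struct lock_manager_ops *lm_ops;	/* NULL => generic handling */
};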

Back to the cluster filesystem: lock_manager_shutdown would be called
as a result of an unlock_ip request. That should trigger a "grace
period" in the cluster filesystem, such that none of the locks
associated with the lock_manager actually get unlocked; rather, they
are marked for reclaim. That period lasts till the new nfs server
comes up, or till some predefined (or tunable) timeout expires.
The new nfs server will call lock_manager_start() to mark the start of
its new grace period, during which the nfs clients reclaim their
locks. At the end of the grace period lock_manager_end_grace() is
called, signaling to the filesystem that the grace period is over and
any remaining unreclaimed locks can be cleaned up.
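
Put as a timeline (illustrative only, reusing the hypothetical calls
above; in reality the two halves run on different nodes, not in one
function):

/* Failover timeline for server ip 192.168.1.1 -- illustrative only. */
static void failover_timeline(struct lock_manager *lm,
			      struct super_block *sb_old,
			      struct super_block *sb_new)
{
	/* Old node, on "echo 192.168.1.1 > /proc/fs/nfsd/unlock_ip":
	 * the cluster fs marks lm's locks for reclaim instead of
	 * releasing them to the other cluster nodes. */
	lock_manager_shutdown(lm, sb_old);

	/* New node, once the ip address has moved over: start the
	 * grace period during which nfs clients reclaim their locks. */
	lock_manager_start(lm, sb_new);

	/* New node, when the grace period (or a tunable timeout)
	 * expires: any still-unreclaimed locks may be cleaned up. */
	lock_manager_end_grace(lm, sb_new);
}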

Seems sane to me. Is that how you envision nfsd interaction with the
underlying (clustered) filesystem?

The next step would be to think about recovery of an nfs server on
another node. The key difference is that there is no controlled way to
shut down the filesystem and release the file locks. That, though, will
be the subject of another topic.

Cheers,
Felix



^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-28  3:46                                         ` Felix Blyakher
@ 2008-01-28 15:56                                           ` Wendy Cheng
  2008-01-28 17:06                                             ` Felix Blyakher
  0 siblings, 1 reply; 57+ messages in thread
From: Wendy Cheng @ 2008-01-28 15:56 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Felix Blyakher wrote:
>>
>> (I think Wendy's pretty close to that api already after adding the
>> second method to start grace?)

For reclaiming grace period issues, maybe we should move to the
https://www.redhat.com/archives/cluster-devel/2008-January/msg00340.html
thread?

I view this (unlock) patch set as mostly done and really don't want to
add more complications to it.

-- Wendy




^ permalink raw reply	[flat|nested] 57+ messages in thread

* [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
  2008-01-28 15:56                                           ` Wendy Cheng
@ 2008-01-28 17:06                                             ` Felix Blyakher
  0 siblings, 0 replies; 57+ messages in thread
From: Felix Blyakher @ 2008-01-28 17:06 UTC (permalink / raw)
  To: cluster-devel.redhat.com


On Jan 28, 2008, at 9:56 AM, Wendy Cheng wrote:

> Felix Blyakher wrote:
>>>
>>> (I think Wendy's pretty close to that api already after adding the
>>> second method to start grace?)
>
> For reclaiming grace period issues, maybe we should move to the
> https://www.redhat.com/archives/cluster-devel/2008-January/msg00340.html
> thread?
>
> I view this (unlock) patch set as mostly done and really don't want
> to add more complications to it.

Heh, that was rather a reply to Bruce's forward-looking discussion
on related issues. Incidentally, it was attached to your unlock
patch thread.

No problem with your patch, Wendy.

Cheers,
Felix



^ permalink raw reply	[flat|nested] 57+ messages in thread

end of thread, other threads:[~2008-01-28 17:06 UTC | newest]

Thread overview: 57+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-01-07  5:39 [Cluster-devel] [PATCH 1/2] NLM failover unlock commands Wendy Cheng
     [not found] ` <message from Wendy Cheng on Monday January 7>
2008-01-08  5:18   ` [Cluster-devel] " Neil Brown
2008-01-09  2:51     ` Wendy Cheng
2008-01-08  5:31   ` [Cluster-devel] Re: [PATCH 2/2] Fix lockd panic Neil Brown
2008-01-09  3:02     ` Wendy Cheng
2008-01-09  4:43       ` Wendy Cheng
2008-01-09 23:33         ` Wendy Cheng
2008-01-12  6:51           ` Wendy Cheng
2008-01-08 17:02 ` [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands Christoph Hellwig
2008-01-08 17:49   ` Christoph Hellwig
2008-01-08 20:57     ` Wendy Cheng
2008-01-09 18:02       ` Christoph Hellwig
2008-01-10  7:59         ` Christoph Hellwig
2008-01-12  7:03           ` Wendy Cheng
2008-01-12  9:38             ` Christoph Hellwig
2008-01-14 23:07             ` J. Bruce Fields
     [not found]               ` <message from J. Bruce Fields on Monday January 14>
2008-01-14 23:31                 ` Neil Brown
2008-01-22 22:53                   ` J. Bruce Fields
     [not found]                     ` <message from J. Bruce Fields on Tuesday January 22>
2008-01-24  4:02                       ` Neil Brown
2008-01-15 16:14               ` Wendy Cheng
2008-01-15 16:30                 ` J. Bruce Fields
     [not found]             ` <message from Wendy Cheng on Saturday January 12>
2008-01-14 23:52               ` Neil Brown
2008-01-15 20:17                 ` Wendy Cheng
     [not found]                   ` <message from Wendy Cheng on Tuesday January 15>
2008-01-15 20:50                     ` Neil Brown
2008-01-15 20:56                       ` Wendy Cheng
2008-01-15 22:48                       ` Wendy Cheng
2008-01-17 15:10                         ` J. Bruce Fields
2008-01-17 15:48                           ` Wendy Cheng
2008-01-17 16:08                             ` Wendy Cheng
2008-01-17 16:10                               ` Wendy Cheng
2008-01-18 10:21                                 ` Frank van Maarseveen
2008-01-18 15:00                                   ` Wendy Cheng
2008-01-17 16:14                             ` J. Bruce Fields
2008-01-17 16:17                               ` Wendy Cheng
2008-01-17 16:21                                 ` J. Bruce Fields
2008-01-17 16:31                             ` J. Bruce Fields
2008-01-17 16:31                               ` Wendy Cheng
2008-01-17 16:40                                 ` J. Bruce Fields
     [not found]                                   ` <1200591323.13670.34.camel@dyn9047022153>
2008-01-17 17:59                                     ` Wendy Cheng
2008-01-17 18:07                                   ` Wendy Cheng
2008-01-17 20:23                                     ` J. Bruce Fields
2008-01-18 10:03                                       ` Frank van Maarseveen
2008-01-18 14:56                                         ` Wendy Cheng
2008-01-24 16:00                                       ` J. Bruce Fields
     [not found]                                         ` <4798BAAE.6090107@redhat.com>
2008-01-24 16:39                                           ` J. Bruce Fields
2008-01-24 19:45                                         ` Wendy Cheng
2008-01-24 20:19                                           ` J. Bruce Fields
2008-01-24 21:06                                             ` Wendy Cheng
2008-01-24 21:40                                               ` J. Bruce Fields
2008-01-24 21:49                                                 ` Wendy Cheng
2008-01-28  3:46                                         ` Felix Blyakher
2008-01-28 15:56                                           ` Wendy Cheng
2008-01-28 17:06                                             ` Felix Blyakher
2008-01-16  4:19                     ` Neil Brown
2008-01-09  3:49   ` Wendy Cheng
2008-01-09 16:13     ` J. Bruce Fields
  -- strict thread matches above, loose matches on Subject: below --
2008-01-07  5:53 [Cluster-devel] [PATCH 2/2] Fix lockd panic Wendy Cheng
