Linux NFS development
* [RFC] NLM lock failover admin interface
@ 2006-06-12  5:25 Wendy Cheng
  2006-06-12  6:11 ` Wendy Cheng
                   ` (4 more replies)
  0 siblings, 5 replies; 28+ messages in thread
From: Wendy Cheng @ 2006-06-12  5:25 UTC
  To: nfs; +Cc: linux-cluster

NFS v2/v3 active-active NLM lock failover has been an issue with our
cluster suite. The current implementation works around the problem as
far as it can with user-mode scripts that, upon failover, do the
following on the taken-over server:

1. Tear down virtual IP.
2. Unexport the subject NFS export.
3. Signal lockd to drop the locks.
4. Un-mount filesystem if needed.

There are many other issues (such as the /var/lib/nfs/statd/sm file,
etc.), but this post aims to refine step 3 so that it avoids the
50-second global (default) grace period for all NFS exports; i.e., we
would like to drop only the locks associated with the requested exports,
without disrupting other NFS services.

We've done some prototype coding but would like to seek community
consensus on the admin interface if possible. We've tried out the
following:

1. /proc interface, say writing the fsid into a /proc directory entry
would end up dropping all NLM locks associated with the NFS export that
has fsid in its /etc/exports file.

2. Adding a new flag into "exportfs" command, say "h", such that

   "exportfs -uh *:/export_path"

would un-export the entry and drop the NLM locks associated with the
entry.

3. Add a new nfsctl by re-using a 2.4 kernel flag (NFSCTL_FOLOCKS) where
it takes:

   struct nfsctl_folocks {
        int           type;
        unsigned int  fsid;
        unsigned int  devno;
   };

as input argument. Depending on "type", the kernel call would drop the
locks associated with either the fsid, or devno. 
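As an illustration of option 3, here is a minimal user-space sketch of
how a handler might dispatch on "type". Only the three struct fields
come from the post. The selector constants FO_BY_FSID and FO_BY_DEVNO,
the export_entry struct, and the folocks_match() helper are hypothetical
names invented for this example; the actual values and kernel plumbing
are not specified here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector values; the post does not define them. */
enum { FO_BY_FSID = 0, FO_BY_DEVNO = 1 };

/* Argument layout as given in the post. */
struct nfsctl_folocks {
    int          type;
    unsigned int fsid;
    unsigned int devno;
};

/* Hypothetical stand-in for the identity of one export. */
struct export_entry {
    unsigned int fsid;
    unsigned int devno;
};

/*
 * Return 1 if the request selects this export, 0 if it does not,
 * and -1 if the request type is not recognized.
 */
int folocks_match(const struct nfsctl_folocks *req,
                  const struct export_entry *exp)
{
    assert(req != NULL && exp != NULL);

    switch (req->type) {
    case FO_BY_FSID:
        return exp->fsid == req->fsid;
    case FO_BY_DEVNO:
        return exp->devno == req->devno;
    default:
        return -1;
    }
}
```

Rejecting an unknown "type" explicitly, rather than treating it as "no
match", would let the caller tell a bad request apart from an export
that simply holds no locks.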

The core of the implementation is a cloned version of
nlm_traverse_files() that walks the "nlm_files" list entry by entry,
comparing the fsid (or devno) derived from each nlm_file.f_handle field.
A helper function extracts the fsid (or devno) from f_handle.
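The traversal described above can be sketched in user space as follows.
This is not the kernel code: struct mock_file is a simplified stand-in
for struct nlm_file, FH_SIZE is an arbitrary size, and the assumption
that the fsid occupies the first four bytes of the handle is made up for
the example (real knfsd file handles have a more involved layout). The
names handle_fsid() and drop_locks_by_fsid() are likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FH_SIZE 32  /* arbitrary handle size for this sketch */

/* Simplified stand-in for an NLM file entry. */
struct mock_file {
    struct mock_file *next;
    unsigned char     handle[FH_SIZE];
    int               nlocks;   /* number of locks held on the file */
};

/* Assumed layout: the first 4 bytes of the handle hold the fsid. */
unsigned int handle_fsid(const unsigned char *handle)
{
    unsigned int fsid;

    assert(handle != NULL);
    memcpy(&fsid, handle, sizeof(fsid));
    return fsid;
}

/*
 * Walk the list and drop the locks of every file whose handle carries
 * the requested fsid. Returns the number of files that matched.
 */
int drop_locks_by_fsid(struct mock_file *head, unsigned int fsid)
{
    struct mock_file *f;
    int matched = 0;

    for (f = head; f != NULL; f = f->next) {
        if (handle_fsid(f->handle) != fsid)
            continue;       /* belongs to another export; leave it */
        f->nlocks = 0;
        matched++;
    }
    return matched;
}
```

Files belonging to other exports are skipped untouched, which is the
property that avoids a global grace period.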

The new function is planned to allow failover to abort if the file can't
be closed. We may also put the file locks back if abort occurs.
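One possible shape for the abort-and-restore behaviour is sketched
below, again in user space. It assumes a per-file "busy" flag stands in
for "the file can't be closed" and that remembering the previous lock
count is enough to put the locks back; both are simplifications, and
struct fo_file and failover_drop() are hypothetical names, not part of
the prototype.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a file tracked by lockd. */
struct fo_file {
    struct fo_file *next;
    unsigned int    fsid;
    int             nlocks;        /* locks currently held */
    int             saved_nlocks;  /* remembered for rollback */
    int             busy;          /* nonzero: file cannot be closed */
};

/*
 * Drop the locks of every file with the given fsid. If a matching
 * file cannot be closed, restore the locks already dropped in this
 * call and return -1. Returns 0 on success.
 */
int failover_drop(struct fo_file *head, unsigned int fsid)
{
    struct fo_file *f, *g;

    for (f = head; f != NULL; f = f->next) {
        if (f->fsid != fsid)
            continue;
        assert(f->nlocks >= 0);
        if (f->busy) {
            /* Abort: undo the files processed before this one. */
            for (g = head; g != f; g = g->next)
                if (g->fsid == fsid)
                    g->nlocks = g->saved_nlocks;
            return -1;
        }
        f->saved_nlocks = f->nlocks;
        f->nlocks = 0;
    }
    return 0;
}
```

With a rollback of this kind, a failed failover would leave the server
in its original state instead of with a partially dropped set of locks.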

Would appreciate comments on the above admin interface. As soon as the
external interface can be finalized, the code will be submitted for
review.

-- Wendy

* Re: [Linux-cluster] [RFC] NLM lock failover admin interface
@ 2006-06-12 14:45 Stanley, Jon
  2006-06-13  3:39 ` Wendy Cheng
  0 siblings, 1 reply; 28+ messages in thread
From: Stanley, Jon @ 2006-06-12 14:45 UTC
  To: linux clustering, nfs

 

> -----Original Message-----
> From: linux-cluster-bounces@redhat.com 
> [mailto:linux-cluster-bounces@redhat.com] On Behalf Of Wendy Cheng
> Sent: Monday, June 12, 2006 12:26 AM
> To: nfs@lists.sourceforge.net
> Cc: linux-cluster@redhat.com
> Subject: [Linux-cluster] [RFC] NLM lock failover admin interface
> 
NOTE - I don't use NFS functionality in Cluster Suite, so my comments may
be entirely meaningless.

> 
> 1. /proc interface, say writing the fsid into a /proc directory entry
> would end up dropping all NLM locks associated with the NFS export that
> has fsid in its /etc/exports file.

This would definitely have its advantages for people who know what
they're doing - they could drop all locks without unexporting the
filesystem.  However, it also gives people the opportunity to shoot
themselves in the foot - by eliminating locks that are needed.  After
weighing the pros and cons, I really don't think that any method
accessible via /proc is a good idea.

> 
> 2. Adding a new flag into "exportfs" command, say "h", such that
> 
>    "exportfs -uh *:/export_path"
> 
> would un-export the entry and drop the NLM locks associated with the
> entry.
> 

This is the best of the three, IMHO.  Gives you the safety of *knowing*
that the filesystem was unexported before dropping the locks, and
preventing folks from shooting themselves in the foot.

The other option that was mentioned, a separate lockd for each fs, is
also a good idea - but would require a lot of coding no doubt, and
introduce more instability into what I already perceive as an unstable
NFS subsystem in Linux (I *refuse* to use Linux as an NFS server and
instead go with Solaris - I've had *really* bad experiences with Linux
NFS under load - but that's getting OT).



Thread overview: 28+ messages
2006-06-12  5:25 [RFC] NLM lock failover admin interface Wendy Cheng
2006-06-12  6:11 ` Wendy Cheng
2006-06-12 15:00 ` [Linux-cluster] " J. Bruce Fields
2006-06-12 15:44   ` [NFS] " Wendy Cheng
2006-06-12 16:20     ` [Linux-cluster] " Madhan P
2006-06-12 16:58       ` Madhan P
2006-06-12 18:09       ` [NFS] " Wendy Cheng
2006-06-12 17:23     ` [Linux-cluster] " Steve Dickson
2006-06-12 17:27 ` James Yarbrough
2006-06-12 19:07   ` [NFS] " Wendy Cheng
2006-06-13  3:17 ` Neil Brown
2006-06-13  7:00   ` [NFS] " Wendy Cheng
2006-06-13  7:08     ` Neil Brown
2006-06-14  6:54   ` [NFS] " Wendy Cheng
2006-06-14 11:36     ` Christoph Hellwig
2006-06-14 13:39       ` Wendy Cheng
2006-06-14 14:00     ` Wendy Cheng
2006-06-15 14:07       ` [NFS] " William A.(Andy) Adamson
2006-06-15 15:09         ` Wendy Cheng
2006-06-16  6:09         ` [Linux-cluster] " Neil Brown
2006-06-16 15:39           ` [NFS] " William A.(Andy) Adamson
2006-06-15  4:27     ` [NFS] " Neil Brown
2006-06-15  6:39       ` Wendy Cheng
2006-06-15  8:02         ` Neil Brown
2006-06-15 18:43           ` Wendy Cheng
2006-06-13 15:23 ` James Yarbrough
  -- strict thread matches above, loose matches on Subject: below --
2006-06-12 14:45 [Linux-cluster] " Stanley, Jon
2006-06-13  3:39 ` Wendy Cheng
