[Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover


From: Wendy Cheng <wcheng@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
Date: Sat, 05 Aug 2006 01:44:42 -0400
Message-ID: <1154756682.3384.34.camel@localhost.localdomain>
In-Reply-To: <1154706709.4727.21.camel@localhost>

On Fri, 2006-08-04 at 11:51 -0400, Trond Myklebust wrote:
> On Fri, 2006-08-04 at 10:56 -0400, Wendy Cheng wrote:
> > Anyway, better be conservative than sorry - I think we want to switch to
> > "fsid" approach to avoid messing with these networking issues, including
> > IPV6 modification. That is, we use fsid as the key to drop the lock and
> > set per-fsid NLM grace period. The ha-callout will have a 4th argument
> > (fsid) when invoked. 
> 
> What is the point of doing that? As far as the client is concerned, a
> server has either rebooted or it hasn't. It doesn't know about single
> filesystems rebooting.
> 

For active-active failover, the submitted patches allow:  

1: Drop the locks tied to one particular floating ip (on the old server).
2: Notify relevant clients that the floating ip has been rebooted.
3: Set per-ip nlm grace period.
4: The (notified) nfs clients reclaim their locks on the new server.

While the above 4 steps are being executed, both servers stay up, with
their other nfs services uninterrupted. (1) and (3) are accomplished by
Patch 3-1 and Patch 3-2. (4) is the nfs client's task, which follows its
existing logic without changes.

For (2), the basics build on rpc.statd's existing HA features,
specifically the -H, -n, -N and -P options. It does, however, need
Patch 3-3 to pass the correct floating ip address to the user-mode
rpc.statd daemon, as follows:

For systems not involved in HA failover, nothing has changed. All new
functions are optional, add-on features. For cluster failover:

1. The rpc.statd is dispatched as "rpc.statd -H ha-callout"
2. Upon each monitor RPC call (SM_MON or SM_UNMON), rpc.statd
   receives the following from the kernel:
   2-a: event (mon or unmon)
   2-b: server interface
   2-c: client interface.
3. The rpc.statd does its normal chores by writing or deleting 
   the client interface to/from the default sm directory. Server
   interface is not used here.  
   (btw, this is the existing logic without changes).
4. Then, the rpc.statd invokes ha-callout with the following three
   arguments:
   4-a: event (add-client or del-client)
   4-b: server interface
   4-c: client interface
   The ha-callout (in our case, it will be part of RHCS cluster suite)
   builds multiple sm directories based on 4-b, say 
   /shared_storage/sm_x,  where x is server's ip interface.
5. Upon failover, the cluster suite invokes 
   "rpc.statd -n x -N -P /shared_storage/sm_x" to notify affected
   clients. The new short-lived rpc.statd will send the notification
   to the relevant (nlm) clients and then exit. The old rpc.statd
   (from step 1) is not aware of the failover event.
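
As an illustration of step 4, here is a minimal sketch of what an
ha-callout program could look like. It is not the RHCS implementation.
The only things taken from the description above are the three
arguments and the /shared_storage/sm_x layout. Keeping one empty file
per client name is an assumption made for this sketch, and the function
names are hypothetical.

```python
import os
import sys

# Assumed shared mount point, taken from the example path in step 4.
SM_ROOT = "/shared_storage"


def sm_dir(server, root=SM_ROOT):
    """Per-server-interface sm directory, e.g. /shared_storage/sm_10.10.1.1."""
    return os.path.join(root, "sm_" + server)


def ha_callout(event, server, client, root=SM_ROOT):
    """Record or forget one monitored client under its server interface."""
    directory = sm_dir(server, root)
    entry = os.path.join(directory, client)
    if event == "add-client":
        os.makedirs(directory, exist_ok=True)
        # Assumption: an empty file named after the client is enough here.
        open(entry, "a").close()
    elif event == "del-client":
        try:
            os.remove(entry)
        except FileNotFoundError:
            pass  # already gone, nothing to do
    else:
        raise ValueError("unknown event: " + event)


if __name__ == "__main__" and len(sys.argv) == 4:
    # Invocation shape from step 4: <event> <server interface> <client interface>
    ha_callout(sys.argv[1], sys.argv[2], sys.argv[3])
```

Upon failover, the directory returned by sm_dir("x") is the one that
would be handed to the "-P" option in step 5.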

Note that before patch 3-3, the kernel always sets 2-b to
system_utsname.nodename. In rpc.statd, if the RESTRICTED_STATD flag is
on, rpc.statd always sets 4-b to 127.0.0.1. Without RESTRICTED_STATD,
it sets 4-b to whatever was passed by the kernel (via 2-b). What the
(kernel) patch 3-3 does is set 2-b to the floating ip, so that rpc.statd
can get the correct ip and pass it on as 4-b.
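
The interaction between patch 3-3 and RESTRICTED_STATD can be modelled
with two small functions. This is only a model of the behaviour
described in the paragraph above, not code from the kernel or from
nfs-utils. The function names and the "node-a" hostname are made up
for illustration.

```python
def kernel_server_arg(nodename, floating_ip, patched):
    """Value the kernel passes as 2-b: the nodename before patch 3-3,
    the floating ip after it."""
    return floating_ip if patched else nodename


def callout_server_arg(kernel_value, restricted_statd):
    """Value rpc.statd passes to ha-callout as 4-b."""
    if restricted_statd:
        return "127.0.0.1"  # the value from 2-b is ignored
    return kernel_value


if __name__ == "__main__":
    for patched in (False, True):
        for restricted in (False, True):
            two_b = kernel_server_arg("node-a", "10.10.1.1", patched)
            four_b = callout_server_arg(two_b, restricted)
            print("patched=%s restricted=%s -> 4-b=%s"
                  % (patched, restricted, four_b))
```

Only the combination "patched, not restricted" delivers the floating ip
to the ha-callout program.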

Greg said (I haven't figured out how) that without setting 4-b to
127.0.0.1, we "may" open a security hole. So the thinking here is, then,
let's not change anything but add an fsid as a 4th argument for
ha-callout:

   4-d: fsid.

where "fsid" can be viewed as a unique identifier for an NFS export
specified in the exports file (check "man exports"); e.g.

        /failover_dir   *(rw,fsid=1234)

With the added fsid info from the ha-callout program, the cluster suite
(or a human administrator) should be able to work out which (nlm)
clients have been affected by one particular failover.
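
To make the fsid bookkeeping concrete, here is a minimal in-memory
sketch of how a callout with the proposed 4th argument could map fsids
to clients. The class and method names are hypothetical. A real
callout would need persistent, shared storage rather than a dict.

```python
from collections import defaultdict


class FsidClientMap:
    """Tracks which (nlm) clients are monitored for each fsid."""

    def __init__(self):
        self._clients = defaultdict(set)

    def handle(self, event, server, client, fsid):
        # 'server' (4-b) is deliberately unused: it may just be 127.0.0.1.
        if event == "add-client":
            self._clients[fsid].add(client)
        elif event == "del-client":
            self._clients[fsid].discard(client)
        else:
            raise ValueError("unknown event: " + event)

    def affected_clients(self, fsids):
        """Clients to notify when the given fsids fail over."""
        result = set()
        for fsid in fsids:
            result |= self._clients.get(fsid, set())
        return sorted(result)


if __name__ == "__main__":
    m = FsidClientMap()
    m.handle("add-client", "127.0.0.1", "client-a", 1234)
    m.handle("add-client", "127.0.0.1", "client-b", 5678)
    print(m.affected_clients([1234]))
```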

From an implementation point of view, since the fsid, if specified, is
already part of the filehandle, which in turn is part of the nlm_file
structure, we should be able to replace the floating ip in the
submitted patches with the fsid and still accomplish the very same
thing. In short, the failover sequence with the new interface would
look like:

taken-over server:
A-1. tear down floating ip, say 10.10.1.1
A-2. unexport subject filesystem
A-3. "echo 1234 > /proc/fs/nfsd/nlm_unlock"  //fsid=1234
A-4. umount filesystem.

take-over server:
B-1. mount the subject filesystem
B-2. "echo 1234 > /proc/fs/nfsd/nlm_set_ip_grace"
B-3. "rpc.statd -n 10.10.1.1 -N -P /shared_storage/sm_10.10.1.1"
B-4. bring up 10.10.1.1
B-5. re-export the filesystem

A-3 and B-2 could be issued multiple times if the floating ip is
associated with multiple fsid(s).
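
The two sequences above can be sketched as a small plan generator that
repeats A-3 and B-2 once per fsid. It only builds the commands quoted
in this mail and does not run anything. The /proc file names are the
ones proposed here, not an existing kernel interface. Steps whose
commands are site-specific are left as None, and the function names are
hypothetical.

```python
# File names as proposed in this mail.
NLM_UNLOCK = "/proc/fs/nfsd/nlm_unlock"
NLM_SET_GRACE = "/proc/fs/nfsd/nlm_set_ip_grace"


def taken_over_plan(floating_ip, fsids):
    """Steps A-1..A-4 as (step, description, command-or-None) tuples."""
    plan = [
        ("A-1", "tear down floating ip %s" % floating_ip, None),
        ("A-2", "unexport subject filesystem", None),
    ]
    for fsid in fsids:
        plan.append(("A-3", "drop NLM locks for fsid %s" % fsid,
                     "echo %s > %s" % (fsid, NLM_UNLOCK)))
    plan.append(("A-4", "umount filesystem", None))
    return plan


def take_over_plan(floating_ip, fsids, sm_root="/shared_storage"):
    """Steps B-1..B-5 as (step, description, command-or-None) tuples."""
    plan = [("B-1", "mount the subject filesystem", None)]
    for fsid in fsids:
        plan.append(("B-2", "set NLM grace period for fsid %s" % fsid,
                     "echo %s > %s" % (fsid, NLM_SET_GRACE)))
    sm_dir = "%s/sm_%s" % (sm_root, floating_ip)
    plan.append(("B-3", "notify affected clients",
                 "rpc.statd -n %s -N -P %s" % (floating_ip, sm_dir)))
    plan.append(("B-4", "bring up %s" % floating_ip, None))
    plan.append(("B-5", "re-export the filesystem", None))
    return plan


if __name__ == "__main__":
    steps = taken_over_plan("10.10.1.1", [1234]) + take_over_plan("10.10.1.1", [1234])
    for step, what, cmd in steps:
        print(step, what, cmd or "(site-specific command)")
```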

Does this make sense?

This fsid can also resolve Neil's concern (about an nlm client using
the wrong server interface to access a filesystem), which I'll follow
up on sometime next week.

-- Wendy

