* [Cluster-devel] [RFC PATCH 0/3] NLM lock failover
@ 2006-06-29 17:47 Wendy Cheng
2006-08-01 1:55 ` [Cluster-devel] [PATCH " Wendy Cheng
0 siblings, 1 reply; 14+ messages in thread
From: Wendy Cheng @ 2006-06-29 17:47 UTC (permalink / raw)
To: cluster-devel.redhat.com
The uploaded patches implement NLM lock failover as discussed in:
http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html
[PATCH 1/3]
Add a new admin interface into current nfsd procfs filesystem to trigger
NLM lock releasing logic. The command is invoked by echoing the server
virtual IP address into /proc/fs/nfsd/nlm_unlock file as:
shell> cd /proc/fs/nfsd
shell> echo 10.10.1.1 > nlm_unlock
It is currently restricted to IPv4 addressing, and the command should be
invoked from the taken-over NFS server. The to-do list (not yet
implemented) includes:
1. IPv6 addressing enablement.
2. Adding a "client:server" IP pair to allow NFS v4 lock failover, as
   proposed by Andy Adamson (CITI).
[PATCH 2/3]
Add the take-over server's counterpart command into the nfsd procfs
interface to allow selective setting of a per-(virtual-)IP lockd grace
period. It is also invoked by echoing the virtual IP into the
/proc/fs/nfsd/nlm_set_ip_grace file as:
shell> cd /proc/fs/nfsd
shell> echo 10.10.1.1 > nlm_set_ip_grace
It is likewise restricted to IPv4 addressing, and the command is
expected to be invoked from the take-over NFS server.
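The two echo commands above lend themselves to a small wrapper script. The sketch below assumes only the /proc/fs/nfsd/nlm_unlock and /proc/fs/nfsd/nlm_set_ip_grace files described in these patches; the `valid_ipv4` helper and the function names are hypothetical additions, motivated by the interface being IPv4-only:

```shell
#!/bin/sh
# Sketch of a failover helper around the two procfs files described
# above. valid_ipv4 is a hypothetical helper; the patches themselves
# only document "echo <ip> > <file>".

# Check for a plausible IPv4 dotted quad (the interface is IPv4-only).
valid_ipv4() {
    echo "$1" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$' || return 1
    oldIFS=$IFS; IFS=.
    set -- $1                      # split into the four octets
    IFS=$oldIFS
    for octet in "$@"; do
        [ "$octet" -le 255 ] || return 1
    done
}

# On the taken-over server: release NLM locks held via this virtual IP.
nlm_unlock() {
    valid_ipv4 "$1" || { echo "bad IPv4 address: $1" >&2; return 1; }
    echo "$1" > /proc/fs/nfsd/nlm_unlock
}

# On the take-over server: start a per-IP lockd grace period.
nlm_set_ip_grace() {
    valid_ipv4 "$1" || { echo "bad IPv4 address: $1" >&2; return 1; }
    echo "$1" > /proc/fs/nfsd/nlm_set_ip_grace
}
```

Guarding the echo this way avoids silently writing malformed addresses into the procfs files during a scripted failover.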
[PATCH 3/3]
This kernel patch has *not* been unit tested yet, and it needs to be
paired with user mode nfs-utils changes (not ready in time for this
RFC). It puts the taken-over IPv4 address, in standard dot notation,
into the 3rd parameter of the ha_callout program (see man rpc.statd for
details) for the "add-client" event. For "del-client", we assume the
monitored host should be removed from the machine-wide lists,
regardless of the server's interface.

Once agreed upon, we will integrate the changes into our cluster suite
to start a full functional verification test. Please comment.
-- Wendy
^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Cluster-devel] [PATCH 0/3] NLM lock failover
  2006-06-29 17:47 [Cluster-devel] [RFC PATCH 0/3] NLM lock failover Wendy Cheng
@ 2006-08-01  1:55 ` Wendy Cheng
  [not found]   ` <message from Wendy Cheng on Monday July 31>
  2006-08-04  9:27   ` Greg Banks
  0 siblings, 2 replies; 14+ messages in thread
From: Wendy Cheng @ 2006-08-01 1:55 UTC (permalink / raw)
To: cluster-devel.redhat.com

For background info, please check out:
o http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html
  for the interface discussion.
o https://www.redhat.com/archives/cluster-devel/2006-June/msg00231.html
  for the first drafted code review.

Notes and restrictions:
o With nfs-utils-1.0.8-rc4 and nfs-utils-lib-1.0.8, the tests went
  surprisingly well, particularly the ha-callout feature. *No* change is
  made to these two user mode utility packages.
o The nfs-utils config flag RESTRICTED_STATD must be off for NLM
  failover to function correctly.
o The third parameter passed to the rpc.statd ha-callout program is no
  longer the system_utsname.nodename (set by sethostname()). It is,
  instead, the specific IP interface where the server receives the
  client's request.
o The patches are for NFS v2/v3 only. However, we do leave room for
  future NFS v4 expansion. For example, echoing "client_ip at server_ip"
  into /proc/fs/nfsd/nlm_unlock could be used to drop the v4 locks.
o The IPv6 modification is not included in this patch set. If required,
  it will be submitted as another patch set.

PATCH 1/3
---------
Add a new admin interface into the current nfsd procfs filesystem to
trigger the NLM lock releasing logic. The command is invoked by echoing
the server IPv4 address (in standard dot notation) into the
/proc/fs/nfsd/nlm_unlock file as:

shell> cd /proc/fs/nfsd
shell> echo 10.10.1.1 > nlm_unlock

PATCH 2/3
---------
Add the take-over server's counterpart command into the nfsd procfs
interface to allow selective setting of a per-(virtual-)IP lockd grace
period.
The grace period setting follows the current system-wide grace period
rule and default. It is also invoked by echoing the server IPv4 address
(in dot notation) into the /proc/fs/nfsd/nlm_set_ip_grace file:

shell> cd /proc/fs/nfsd
shell> echo 10.10.1.1 > nlm_set_ip_grace

PATCH 3/3
---------
This kernel patch adds a new field into struct nlm_host that holds the
server IP address. Upon SM_MON and SM_UNMON procedure calls, the IP (in
standard v4 dot notation) is placed in the "my_name" string and passed
to the local statd daemon. This enables the ha-callout program ("man
rpc.statd" for details) to receive the IP address of the server that
has received the client's request. Before this change, the my_name
string is universally set to system_utsname.nodename.

The user mode HA implementation is expected to:
1. Specify a user mode ha-callout program (rpc.statd -H) for receiving
   client monitored states.
2. Based on the info passed by ha-callout, an individual state
   directory should be created that can be read from the take-over
   server.
3. Upon failover, on the take-over server, send out notifications to
   the NFS clients via the (rpc.statd -n server_ip -P
   individual_state_directory -N) command.

-- Wendy
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  [not found] ` <message from Wendy Cheng on Monday July 31>
@ 2006-08-03  4:14 ` Neil Brown
  2006-08-03 21:34   ` Wendy Cheng
  2006-08-07 22:38   ` Wendy Cheng
  0 siblings, 2 replies; 14+ messages in thread
From: Neil Brown @ 2006-08-03 4:14 UTC (permalink / raw)
To: cluster-devel.redhat.com

Thanks for these.

First note: it helps a lot if the Subject line for each patch contains
a distinctive short description of what the patch does. Rather than
just "NLM lock failover" three times, maybe:

  Add nlm_unlock file to nfsd fs to allow selective unlocking based on server IP
  Add nlm_set_ip_grace file to allow selective setting of grace time
  Use IP address rather than hostname in SM_{UN,}MON calls to statd

Or something like that.

> PATCH 1/3
> ---------
> Add a new admin interface into current nfsd procfs filesystem to trigger
> NLM lock releasing logic. The command is invoked by echoing the server
> IP V4 address (in standard dot notation) into /proc/fs/nfsd/nlm_unlock
> file as:
>
> shell> cd /proc/fs/nfsd
> shell> echo 10.10.1.1 > nlm_unlock

This patch makes an assumption that any given filehandle will only
arrive at one particular interface - never more. This is implicit in
the fact that f_iaddr is stored in 'struct nlm_file', which is indexed
by filehandle.

In the case where you are intending to support active-active failover
this should be the case, but obviously configuration errors are
possible.

I think what I would like is that if requests arrive at two (or more)
different interfaces for the one file, then f_iaddr is cleared and some
flag is set. When an IP is written to nlm_unlock, if the flag is set,
then a warning message is printed:

  Note: some files accessed via multiple interfaces will not be unlocked.

A consequence of this is that you cannot have a virtual server with
two (or more) interfaces. Is this likely to be a problem? e.g.
if you have 4 physical interfaces on your server, might you want to
bind a different IP to each for each virtual server? If you did, then
my change above would mean that you couldn't do failover, and we might
need to look at other options...

Possibly (and maybe this is more work than is justified), lockd can
monitor interface usage and deduce interface pools based on seeing the
same filehandle on multiple interfaces. Then, when an unlock request
arrives on nlm_unlock, lockd would require all interfaces that touched
a file to be 'unlocked' before actually dropping the locks on the file.

As you can probably tell, I was "thinking out loud" there, and it may
not be particularly coherent or cohesive. Do you have any thoughts on
these issues?

> PATCH 2/3
> ---------
> Add take-over server counter-part command into nfsd procfs interface to
> allow selective setting of per (virtual) ip (lockd) grace period. The
> grace period setting follows current system-wide grace period rule and
> default. It is also invoked by echoing the server IP V4 address (in dot
> notation) into /proc/fs/nfsd/nlm_set_ip_grace file:
>
> shell> cd /proc/fs/nfsd
> shell> echo 10.10.1.1 > nlm_set_ip_grace

I think nlm_ip_mutex should be a spinlock, and I don't think you should
need to hold the lock in __nlm_servs_gc, as you have already
disconnected these entries from the global list.

  +extern unsigned long set_grace_period(void);  /* see fs/lockd/svc.c */

That should go in a header file.

  +	switch (passthru_check) {
  +	case NLMSVC_FO_PASSTHRU:
  +		break;
  +	case NLMSVC_FO_RECLAIM:
  +		if (argp->reclaim) break;
  +	default:
  +		return nlm_lck_denied_grace_period;
  +	}

I'd rather you spelt out

	case NLMSVC_FO_BLOCK_ANY:

rather than used 'default:' - it makes the code more readable. And
surely you should check for NLMSVC_FO_PASSTHRU before calling
nlmsvc_fo_check???

It seems to me that it would be clearer not to put the nlmsvc_fo_check
call inside nlm*svc_retrieve_args, but rather to define e.g.
	nlmsvc_is_grace_period(rqstp)

which checks nlmsvc_grace_period and the list of IPs, and then replace
every

	if (nlmsvc_grace_period) {
		resp->status = nlm_lck_denied_grace_period;
		return rpc_success;
	}

with

	if (nlmsvc_is_grace_period(rqstp)) {
		resp->status = nlm_lck_denied_grace_period;
		return rpc_success;
	}

and every

	if (nlmsvc_grace_period && !argp->reclaim) {
		resp->status = nlm_lck_denied_grace_period;
		return rpc_success;
	}

with

	if (!argp->reclaim && nlmsvc_is_grace_period(rqstp)) {
		resp->status = nlm_lck_denied_grace_period;
		return rpc_success;
	}

Does that seem reasonable to you?

> PATCH 3/3
> ---------
> This kernel patch adds a new field into struct nlm_host that holds the
> server IP address. Upon SM_MON and SM_UNMON procedure calls, the IP (in
> standard V4 dot notation) is placed as the "my_name" string and passed
> to local statd daemon. This enables ha-callout program ("man rpc.statd"
> for details) to receive the IP address of the server that has received
> the client's request. Before this change, my_name string is universally
> set to system_utsname.nodename.
>
> The user mode HA implementation is expected to:
> 1. Specify a user mode ha-callout program (rpc.statd -H) for receiving
>    client monitored states.
> 2. Based on info passed by ha-callout, individual state-directory should
>    be created and can be read from take-over server.
> 3. Upon failover, on take-over server, send out notification to nfs
>    client via (rpc.statd -n server_ip -P individual_state_directory -N)
>    command.

Was it necessary to rename "s_server" to "s_where"? Could you not have
just introduced "s_server_ip"??? And if you really want s_where, then I
would like some #defines, and make

  +	if (server)
  +		host->h_where = 1;
  +	else
  +		host->h_where = 0;

something like

	host->where = server ? ON_SERVER : ON_CLIENT;

(reading some more) In fact, you don't need h_where at all. Just change
h_server to be the IP address, and then use e.g.
  -	args.proto = (host->h_proto<<1) | host->h_server;
  +	args.serv  = host->h_server;
  +	args.proto = (host->h_proto<<1) | (host->h_server?1:0);

(or host->h_server!=0 or !!host->h_server - whatever takes your fancy).

  @@ -142,7 +142,7 @@ out_err:
   static u32 *
   xdr_encode_common(struct rpc_rqst *rqstp, u32 *p, struct nsm_args *argp)
   {
  -	char	buffer[20];
  +	char	buffer[100];

This looks like it should really be a separate patch, and should
probably be __NEW_UTS_LEN+1 rather than 100.

But I worry about other people who might be using ha-callout already
and expecting a host name there. We are making a non-optional
user-visible change here. Is it really a safe thing to do?

Thanks,
NeilBrown
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-03  4:14 ` [Cluster-devel] Re: [NFS] " Neil Brown
@ 2006-08-03 21:34 ` Wendy Cheng
  2006-08-07 22:38 ` Wendy Cheng
  1 sibling, 0 replies; 14+ messages in thread
From: Wendy Cheng @ 2006-08-03 21:34 UTC (permalink / raw)
To: cluster-devel.redhat.com

Neil Brown wrote:

>First note: it helps a lot if the Subject line for each patch
>contains a distinctive short description of what the patch does.
>
This is due to inexperience with open source patch submission plus
end-of-day fatigue :) .. It will be improved.

>>PATCH 1/3
>>---------
>>This patch makes an assumption that any given filehandle will only
>>arrive at one particular interface - never more. This is implicit in
>>the fact that f_iaddr is stored in 'struct nlm_file' which is indexed
>>by filehandle.
>>
>>.....
>>
>>A consequence of this is that you cannot have a virtual server with
>>two (or more) interfaces. Is this likely to be a problem?
>>e.g. if you have 4 physical interfaces on your server, might you want
>>to bind a different IP to each for each virtual server?
>>If you did, then my change above would mean that you couldn't do
>>failover, and we might need to look at other options...
>>
>>Possibly (and maybe this is more work than is justified), lockd can
>>monitor interface usage and deduce interface pools based on seeing the
>>same filehandle on multiple interfaces. Then when an unlock request
>>arrives on nlm_unlock, lockd would require all interfaces that touched
>>a file to be 'unlocked' before actually dropping the locks on the
>>file.
>>
>>As you can probably tell I was "thinking out loud" there and it may
>>not be particularly coherent or cohesive.
>>
>>Do you have any thoughts on these issues?
>>
Another option is dropping the (NLM) locks based on "fsid" (which can
be retrieved from the filehandle), instead of the virtual IP address.
Note that "fsid" has a good use in a cluster environment (compared to
device major/minor numbers, since different nodes may have different
device numbers). Do you see anything bad about the fsid approach?

One catch (about fsid) I can think of is that it must be passed from
lockd to statd (and then to the ha-callout program). The current SM_MON
and SM_UNMON protocol doesn't have any extra field for us to do that.
Would adding one more field cause any issue? e.g. the current SM_MON
arguments are:

  string<1024>	mon_name;
  string<1024>	my_name;
  uint32	my_prog;
  uint32	my_vers;
  uint32	my_proc;
  opaque[16]	priv;

Would adding "opaque[16] fsid" after "priv" be ok? Ditto for SM_UNMON.
On the other hand, the fsid could be the 4th parameter passed to the
ha-callout program (so that we can avoid breaking any existing
ha-callout application).

Let's give it a few more days to think these issues over. All the
others (comments for PATCH 2/3 and 3/3) are helpful coding advice -
they are appreciated and changes will be made accordingly.

-- Wendy
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-03  4:14 ` [Cluster-devel] Re: [NFS] " Neil Brown
  2006-08-03 21:34 ` Wendy Cheng
@ 2006-08-07 22:38 ` Wendy Cheng
  1 sibling, 0 replies; 14+ messages in thread
From: Wendy Cheng @ 2006-08-07 22:38 UTC (permalink / raw)
To: cluster-devel.redhat.com

Neil Brown wrote:

>This patch makes an assumption that any given filehandle will only arrive at
>one particular interface - never more. This is implicit in the fact
>that f_iaddr is stored in 'struct nlm_file' which is indexed by
>filehandle.
>
>In the case where you are intending to support active-active failover
>this should be the case, but obviously configuration errors are
>possible.
>
>I think what I would like is that if requests arrive at two (or more)
>different interfaces for the one file, then f_iaddr is cleared and some
>flag is set.
>When an IP is written to nlm_unlock, if the flag is set, then a
>warning message is printed:
>  Note: some files accessed via multiple interfaces will not be
>  unlocked.
>
I have given this issue a long thought over the weekend. The suggested
changes would work, but so would the "fsid" approach. I prefer "fsid"
because it is simpler to implement and very effective. The problem with
the fsid approach is that it is a little bit awkward to explain - I
don't believe it is well understood (or even known) by most system
admins. We may need a good write-up of the procedures. It does,
however, effectively handle the issues associated with an export entry
getting accessed via multiple virtual IP interfaces.

The test runs (with fsid) today have been encouraging. I will push the
revised patches for review as soon as they pass some sanity checks and
testing. The following are the main changes made by this new
implementation:

First, it is required to export the filesystem using "fsid"; e.g.
"export *:/mnt/tank1 (fsid=9468,rw)".

Patch 1: Drop the locks based on fsid; e.g.
  "echo 9468 > /proc/fs/nfsd/nlm_unlock"

Patch 2: Set an individual grace period based on fsid:

  "echo 9468 > /proc/fs/nfsd/nlm_set_igrace"

Patch 3: Utility functions to allow the cluster suite (or system admin)
to implement its own client reclaim notifications.

Unfortunately, it is too cumbersome to switch Patch 3 to using fsid, so
the old trick stays (we still use the server IP to facilitate the
client reclaim notification process).

In the meantime, the following is how I yank the fsid out of the file
handle. I send it here for preview purposes. If anyone spots anything
wrong, please do let me know. This will be the "guts" of this whole
thing:

-- Wendy

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: fsid_from_fh.txt
URL: <http://listman.redhat.com/archives/cluster-devel/attachments/20060807/9e34b65b/attachment.txt>
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-01  1:55 ` [Cluster-devel] [PATCH " Wendy Cheng
  [not found] ` <message from Wendy Cheng on Monday July 31>
@ 2006-08-04  9:27 ` Greg Banks
  2006-08-04 13:27 ` Wendy Cheng
  1 sibling, 1 reply; 14+ messages in thread
From: Greg Banks @ 2006-08-04 9:27 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Tue, 2006-08-01 at 11:55, Wendy Cheng wrote:
> o The nfs-utils config flag RESTRICTED_STATD must be off for NLM
>   failover to function correctly.

That would reopen this ancient security hole:

http://www.cert.org/advisories/CA-99-05-statd-automountd.html

which might not be the best of ideas.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-04  9:27 ` Greg Banks
@ 2006-08-04 13:27 ` Wendy Cheng
  2006-08-04 14:56 ` Wendy Cheng
  2006-08-07  4:05 ` Greg Banks
  0 siblings, 2 replies; 14+ messages in thread
From: Wendy Cheng @ 2006-08-04 13:27 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Fri, 2006-08-04 at 19:27 +1000, Greg Banks wrote:
> On Tue, 2006-08-01 at 11:55, Wendy Cheng wrote:
> > o The nfs-utils config flag RESTRICTED_STATD must be off for NLM
> >   failover to function correctly.
>
> That would reopen this ancient security hole:
>
> http://www.cert.org/advisories/CA-99-05-statd-automountd.html
>
> which might not be the best of ideas.
>
OK, thanks! I'll look into this. But I believe nfs-utils-1.0.8-rc4 has
this off by default?

-- Wendy
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-04 13:27 ` Wendy Cheng
@ 2006-08-04 14:56 ` Wendy Cheng
  2006-08-04 15:51 ` Trond Myklebust
  1 sibling, 1 reply; 14+ messages in thread
From: Wendy Cheng @ 2006-08-04 14:56 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Fri, 2006-08-04 at 09:27 -0400, Wendy Cheng wrote:
> On Fri, 2006-08-04 at 19:27 +1000, Greg Banks wrote:
> > On Tue, 2006-08-01 at 11:55, Wendy Cheng wrote:
> > > o The nfs-utils config flag RESTRICTED_STATD must be off for NLM
> > >   failover to function correctly.
> >
> > That would reopen this ancient security hole:
> >
> > http://www.cert.org/advisories/CA-99-05-statd-automountd.html
> >
> > which might not be the best of ideas.
> >
> OK, thanks! I'll look into this. But I believe nfs-utils-1.0.8-rc4 has
> this off by default?
>
Anyway, better be conservative than sorry - I think we want to switch to
the "fsid" approach to avoid messing with these networking issues,
including the IPv6 modification. That is, we use the fsid as the key to
drop the locks and to set the per-fsid NLM grace period. The ha-callout
will have a 4th argument (fsid) when invoked.

-- Wendy
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-04 14:56 ` Wendy Cheng
@ 2006-08-04 15:51 ` Trond Myklebust
  2006-08-05  5:44 ` Wendy Cheng
  0 siblings, 1 reply; 14+ messages in thread
From: Trond Myklebust @ 2006-08-04 15:51 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Fri, 2006-08-04 at 10:56 -0400, Wendy Cheng wrote:
> Anyway, better be conservative than sorry - I think we want to switch to
> the "fsid" approach to avoid messing with these networking issues,
> including the IPv6 modification. That is, we use the fsid as the key to
> drop the locks and to set the per-fsid NLM grace period. The ha-callout
> will have a 4th argument (fsid) when invoked.

What is the point of doing that? As far as the client is concerned, a
server has either rebooted or it hasn't. It doesn't know about single
filesystems rebooting.

Cheers,
  Trond
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-04 15:51 ` Trond Myklebust
@ 2006-08-05  5:44 ` Wendy Cheng
  2006-08-07  4:05 ` Greg Banks
  0 siblings, 1 reply; 14+ messages in thread
From: Wendy Cheng @ 2006-08-05 5:44 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Fri, 2006-08-04 at 11:51 -0400, Trond Myklebust wrote:
> On Fri, 2006-08-04 at 10:56 -0400, Wendy Cheng wrote:
> > Anyway, better be conservative than sorry - I think we want to switch to
> > the "fsid" approach to avoid messing with these networking issues,
> > including the IPv6 modification. That is, we use the fsid as the key to
> > drop the locks and to set the per-fsid NLM grace period. The ha-callout
> > will have a 4th argument (fsid) when invoked.
>
> What is the point of doing that? As far as the client is concerned, a
> server has either rebooted or it hasn't. It doesn't know about single
> filesystems rebooting.
>
For active-active failover, the submitted patches allow:
1: Dropping the locks tied to one particular floating IP (on the old
   server).
2: Notifying the relevant clients that the floating IP has been
   rebooted.
3: Setting a per-IP NLM grace period.
4: The (notified) NFS clients reclaiming their locks against the new
   server.

While the above 4 steps are being executed, both servers stay alive
with their other NFS services un-interrupted. (1) and (3) are
accomplished by Patch 3-1 and Patch 3-2. (4) is the NFS client's task,
which follows its existing logic without changes. For (2), the basics
are built upon the existing rpc.statd HA features, specifically the -H
and -n/-N/-P options. It, however, needs Patch 3-3 to pass the correct
floating IP address into the rpc.statd user mode daemon, as follows:

For a system not involved in HA failover, nothing has changed. All the
new functions are optional, added-on features. For cluster failover:

1. The rpc.statd is dispatched as "rpc.statd -H ha-callout".
2.
Upon each monitor RPC call (SM_MON or SM_UNMON), rpc.statd receives
   the following from the kernel:
   2-a: event (mon or unmon)
   2-b: server interface
   2-c: client interface
3. The rpc.statd does its normal chores by writing or deleting the
   client interface to/from the default sm directory. The server
   interface is not used here (btw, this is the existing logic, without
   changes).
4. Then, the rpc.statd invokes ha-callout with the following three
   arguments:
   4-a: event (add-client or del-client)
   4-b: server interface
   4-c: client interface
   The ha-callout (in our case, it will be part of the RHCS cluster
   suite) builds multiple sm directories based on 4-b, say
   /shared_storage/sm_x, where x is the server's IP interface.
5. Upon failover, the cluster suite invokes "rpc.statd -n x -N -P
   /shared_storage/sm_x" to notify the affected clients. The new
   short-lived rpc.statd sends the notification to the relevant (nlm)
   clients and subsequently exits. The old rpc.statd (from step 1) is
   not aware of the failover event.

Note that before patch 3-3, the kernel always sets 2-b to
system_utsname.nodename. For rpc.statd, if the RESTRICTED_STATD flag is
on, rpc.statd always sets 4-b to 127.0.0.1. Without RESTRICTED_STATD
on, it sets 4-b to whatever was passed by the kernel (via 2-b). What
(kernel) patch 3-3 does is set 2-b to the floating IP so rpc.statd can
get the correct IP and pass it into 4-b.

Greg said (I haven't figured out how) that without setting 4-b to
127.0.0.1, we "may" open a security hole. So the thinking here is,
then, let's not change anything but add an fsid as a 4th argument for
ha-callout:
   4-d: fsid
where "fsid" can be viewed as a unique identifier for an NFS export
specified in the exports file (check "man exports"); e.g.

   /failover_dir *(rw,fsid=1234)

With the added fsid info from the ha-callout program, the cluster suite
(or human administrator) should be able to associate which (nlm) client
has been affected by one particular failover.
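The ha-callout behavior in step 4 above could be sketched roughly as below. Only the argument order (event, server interface, client interface) and the per-server-IP directory convention (/shared_storage/sm_x) come from the thread; the `SM_BASE` override, the function name, and the one-file-per-client layout are assumptions of this sketch:

```shell
#!/bin/sh
# Hypothetical ha-callout handler for "rpc.statd -H", following the
# per-server-IP sm directory layout described in step 4. SM_BASE and
# the file layout are assumptions; a real implementation (e.g. the RHCS
# one mentioned in the thread) may differ.
ha_callout() {
    event=$1 server_ip=$2 client_ip=$3
    dir="${SM_BASE:-/shared_storage}/sm_$server_ip"
    case "$event" in
    add-client)
        # Remember the client so a take-over server can notify it later
        # via "rpc.statd -n <server_ip> -N -P <dir>".
        mkdir -p "$dir" && touch "$dir/$client_ip"
        ;;
    del-client)
        rm -f "$dir/$client_ip"
        ;;
    *)
        echo "unknown event: $event" >&2
        return 1
        ;;
    esac
}

# rpc.statd would invoke this script roughly as:
#   ha-callout add-client 10.10.1.1 10.15.89.66
if [ $# -gt 0 ]; then
    ha_callout "$@"
fi
```

Keeping one file per client interface makes the take-over side trivial: the notification directory for a given floating IP is just `sm_<ip>`, ready to hand to `rpc.statd -P`.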
From the implementation point of view, since the fsid, if specified, is
already part of the filehandle that is part of the nlm_file structure,
we should be able to replace the floating IP in the submitted patches
with the fsid and still accomplish the very same thing. In short, the
failover sequence with the new interface would look like:

taken-over server:
A-1. tear down the floating IP, say 10.10.1.1
A-2. unexport the subject filesystem
A-3. "echo 1234 > /proc/fs/nfsd/nlm_unlock"    // fsid=1234
A-4. umount the filesystem

take-over server:
B-1. mount the subject filesystem
B-2. "echo 1234 > /proc/fs/nfsd/nlm_set_ip_grace"
B-3. "rpc.statd -n 10.10.1.1 -N -P /shared_storage/sm_10.10.1.1"
B-4. bring up 10.10.1.1
B-5. re-export the filesystem

A-3 and B-2 could be issued multiple times if the floating IP is
associated with multiple fsid(s). Make sense?

This fsid approach can also resolve Neil's concern (about an nlm client
using the wrong server interface to access the filesystem), which I'll
follow up on sometime next week.

-- Wendy
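The A/B sequence above can be expressed as a dry-run planning script. Only the two procfs echoes and the rpc.statd invocation are taken from the thread; the device name (eth0), the exact `ip`/`exportfs` command forms, and the function names are assumptions of this sketch:

```shell
#!/bin/sh
# Dry-run sketch of the fsid-based failover sequence described above.
# Each function prints the command plan instead of executing it, so the
# ordering can be audited first. eth0 and the ip/exportfs forms are
# assumptions; only the procfs echoes and rpc.statd line come from the
# thread (which also later discusses reordering B-3..B-5).

# Taken-over server, steps A-1 .. A-4.
nlm_failover_release() {    # $1=floating IP, $2=mount point, $3=fsid
cat <<EOF
ip addr del $1 dev eth0
exportfs -u "*:$2"
echo $3 > /proc/fs/nfsd/nlm_unlock
umount $2
EOF
}

# Take-over server, steps B-1 .. B-5, as listed in the message above.
nlm_failover_acquire() {    # $1=floating IP, $2=mount point, $3=fsid
cat <<EOF
mount $2
echo $3 > /proc/fs/nfsd/nlm_set_ip_grace
rpc.statd -n $1 -N -P /shared_storage/sm_$1
ip addr add $1 dev eth0
exportfs "*:$2"
EOF
}
```

Piping a plan to `sh` would execute it; printing first makes it easy to repeat A-3/B-2 for each fsid tied to the floating IP, as the message notes.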
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-05  5:44 ` Wendy Cheng
@ 2006-08-07  4:05 ` Greg Banks
  2006-08-07 20:14 ` James Yarbrough
  0 siblings, 1 reply; 14+ messages in thread
From: Greg Banks @ 2006-08-07 4:05 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Sat, 2006-08-05 at 15:44, Wendy Cheng wrote:
> Note that before patch 3-3, the kernel always sets 2-b to
> system_utsname.nodename. For rpc.statd, if the RESTRICTED_STATD flag is
> on, rpc.statd always sets 4-b to 127.0.0.1. Without RESTRICTED_STATD
> on, it sets 4-b to whatever was passed by the kernel (via 2-b). What
> (kernel) patch 3-3 does is set 2-b to the floating IP so rpc.statd can
> get the correct IP and pass it into 4-b.
>
> Greg said (I haven't figured out how) that without setting 4-b to
> 127.0.0.1, we "may" open a security hole.

Aha, I see what you needed. You could have changed the logic in the
RESTRICTED_STATD case of sm_mon_1_svc() not to ignore the passed
my_addr.s_addr if svc_getcaller(rqstp->rq_xprt) is a privileged port
on localhost. This would probably give you your logic without
reopening the security hole.

> take-over server:
> B-1. mount the subject filesystem
> B-2. "echo 1234 > /proc/fs/nfsd/nlm_set_ip_grace"
> B-3. "rpc.statd -n 10.10.1.1 -N -P /shared_storage/sm_10.10.1.1"
> B-4. bring up 10.10.1.1
> B-5. re-export the filesystem

Umm, don't you want to do B-3 after B-4 and B-5? Otherwise clients
might racily fail on the first try.

Also, just curious here: when do you purge the clients' ARP caches?

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-07  4:05 ` Greg Banks
@ 2006-08-07 20:14 ` James Yarbrough
  2006-08-07 21:03 ` Wendy Cheng
  0 siblings, 1 reply; 14+ messages in thread
From: James Yarbrough @ 2006-08-07 20:14 UTC (permalink / raw)
To: cluster-devel.redhat.com

> > take-over server:
> > B-1. mount the subject filesystem
> > B-2. "echo 1234 > /proc/fs/nfsd/nlm_set_ip_grace"
> > B-3. "rpc.statd -n 10.10.1.1 -N -P /shared_storage/sm_10.10.1.1"
> > B-4. bring up 10.10.1.1
> > B-5. re-export the filesystem
>
> Umm, don't you want to do B-3 after B-4 and B-5? Otherwise clients
> might racily fail on the first try.

I don't think they will necessarily fail. It depends on whether the
server sends ICMP unreachable messages and how the client responds to
those. In any case, I think the ordering should be B-5, B-4, and B-3
last. One can argue about the ordering of B-3 and B-4, but if exporting
(B-5) does not happen before bringing up the IP address (B-4), clients
can get ESTALE replies. For better transparency, it's probably best to
avoid ESTALE.

It's probably OK to do step B-3 after bringing up the IP address, since
that will mimic what happens during boot.

> Also, just curious here: when do you purge the clients' ARP caches?

I don't think you can actually do a purge from the server explicitly.
You should get the desired result when the IP address (10.10.1.1 in the
above example) is brought up. There's a gratuitous ARP that goes with
that step.

-- 
jmy at sgi.com
650 933 3124

Blow up the mailbox!
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-07 20:14 ` James Yarbrough
@ 2006-08-07 21:03 ` Wendy Cheng
  0 siblings, 0 replies; 14+ messages in thread
From: Wendy Cheng @ 2006-08-07 21:03 UTC (permalink / raw)
To: cluster-devel.redhat.com

James Yarbrough wrote:

>>>take-over server:
>>>B-1. mount the subject filesystem
>>>B-2. "echo 1234 > /proc/fs/nfsd/nlm_set_ip_grace"
>>>B-3. "rpc.statd -n 10.10.1.1 -N -P /shared_storage/sm_10.10.1.1"
>>>B-4. bring up 10.10.1.1
>>>B-5. re-export the filesystem
>>>
>>Umm, don't you want to do B-3 after B-4 and B-5? Otherwise clients
>>might racily fail on the first try.
>>
>I don't think they will necessarily fail. It depends on whether the
>server sends ICMP unreachable messages and how the client responds to
>those. In any case, I think the ordering should be B-5, B-4, and B-3
>last. One can argue about the ordering of B-3 and B-4, but if exporting
>(B-5) does not happen before bringing up the IP address (B-4), clients
>can get ESTALE replies. For better transparency, it's probably best
>to avoid ESTALE.
>
>It's probably OK to do step B-3 after bringing up the IP address since
>that will mimic what happens during boot.
>
Yes, you and Greg are mostly right - that was an oversight in my test
script. But our user mode RHCS script (Lon wrote that piece of code)
does it correctly: he did B-5, B-4, and B-3 last.

<info>  Adding export: *:/mnt/tank1 (fsid=9468,rw)
<info>  Adding export: *:/mnt/tank2 (fsid=661,rw)
<debug> Link for eth0: Detected
<info>  Adding IPv4 address 10.15.89.203 to eth0
<debug> Sending gratuitous ARP: 10.15.89.203 00:30:48:27:92:d6
        brd ff:ff:ff:ff:ff:ff
<info>  Sending reclaim notifications via tank-02
Start of nfs1 complete

-- Wendy
* [Cluster-devel] Re: [NFS] [PATCH 0/3] NLM lock failover
  2006-08-04 13:27 ` Wendy Cheng
  2006-08-04 14:56 ` Wendy Cheng
@ 2006-08-07  4:05 ` Greg Banks
  1 sibling, 0 replies; 14+ messages in thread
From: Greg Banks @ 2006-08-07 4:05 UTC (permalink / raw)
To: cluster-devel.redhat.com

On Fri, 2006-08-04 at 23:27, Wendy Cheng wrote:
> On Fri, 2006-08-04 at 19:27 +1000, Greg Banks wrote:
> > On Tue, 2006-08-01 at 11:55, Wendy Cheng wrote:
> > > o The nfs-utils config flag RESTRICTED_STATD must be off for NLM
> > >   failover to function correctly.
> >
> > That would reopen this ancient security hole:
> >
> > http://www.cert.org/advisories/CA-99-05-statd-automountd.html
> >
> > which might not be the best of ideas.
> >
> OK, thanks! I'll look into this. But I believe nfs-utils-1.0.8-rc4 has
> this off by default?

I really hope distros have --enable-secure-statd in their .specs. I
know SLES9+ doesn't need it, because SLES has Olaf's in-kernel
rpc.statd which (IIRC) has the equivalent of RESTRICTED_STATD
hardcoded.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
Thread overview: 14+ messages
2006-06-29 17:47 [Cluster-devel] [RFC PATCH 0/3] NLM lock failover Wendy Cheng
2006-08-01 1:55 ` [Cluster-devel] [PATCH " Wendy Cheng
[not found] ` <message from Wendy Cheng on Monday July 31>
2006-08-03 4:14 ` [Cluster-devel] Re: [NFS] " Neil Brown
2006-08-03 21:34 ` Wendy Cheng
2006-08-07 22:38 ` Wendy Cheng
2006-08-04 9:27 ` Greg Banks
2006-08-04 13:27 ` Wendy Cheng
2006-08-04 14:56 ` Wendy Cheng
2006-08-04 15:51 ` Trond Myklebust
2006-08-05 5:44 ` Wendy Cheng
2006-08-07 4:05 ` Greg Banks
2006-08-07 20:14 ` James Yarbrough
2006-08-07 21:03 ` Wendy Cheng
2006-08-07 4:05 ` Greg Banks