From mboxrd@z Thu Jan 1 00:00:00 1970
From: Wendy Cheng
Date: Wed, 25 Apr 2007 10:10:31 -0400
Subject: [Cluster-devel] Re: [NFS] [PATCH 0/4 Revised] NLM - lock failover
In-Reply-To: <20070425141818.GA14729@fieldses.org>
References: <46156F3F.3070606@redhat.com> <20070425141818.GA14729@fieldses.org>
Message-ID: <462F6157.7060604@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

J. Bruce Fields wrote:
> On Thu, Apr 05, 2007 at 05:50:55PM -0400, Wendy Cheng wrote:
>
>> 1) Failover server exports filesystem with "fsid" option as:
>>    /etc/exports entry> /mnt/shared/exports *(fsid=1234,sync,rw)
>> 2) Failover server dispatch rpc.statd with "-H" option.
>> 3) Failover server drops locks based on fsid by:
>>    shell> echo 1234 > /proc/fs/nfsd/nlm_unlock
>> 4) Takeover server enters per fsid grace period by:
>>    shell> echo 1234 > /proc/fs/nfsd/nlm_set_igrace
>> 5) Takeover server notifies clients for lock reclaim by:
>>    shell> /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory
>>
>
> I don't understand statd and lockd as well as I should. Where exactly
> does the takeover server stop serving requests, and the failover server
> start? If this isn't done carefully, you can leave a window between
> steps 3 and 4 where a client could acquire a lock before its rightful
> owner reclaims it, right?
>
>

The detailed overall steps were described in the first email we sent a
*long* time (> 6 months, I think) ago. The first step of the whole
process is tearing down the floating IP from the failover server. The IP
is not accessible again until the filesystem has safely failed over and
SM_NOTIFY is ready to be sent.

The last round of discussion gave me the impression that as long as I
rebased the code onto akpm's mm tree, these patches would get accepted.
So I have been quite careless in this submission, and I have just
realized that people have very short memories :) ..
I will do the write-up and put it somewhere so we don't need to go
through this again.

-- Wendy
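
For reference, the ordering described above (floating IP torn down first,
brought back only once the filesystem has failed over and SM_NOTIFY is
ready to go out) can be sketched as a dry run. This is only an
illustration, not the actual cluster script: it prints the commands and
executes nothing. The "ip addr" commands, the address 192.0.2.10/24 and
the device eth0 are assumptions made for the example; the /proc files
come from this patch set, and an_sm_directory is the same placeholder
used in step 5.

```shell
#!/bin/sh
# Dry-run sketch: prints the failover sequence in order, runs nothing.
# FLOATING_IP and DEV are hypothetical values chosen for illustration.
FSID=1234
FLOATING_IP=192.0.2.10/24
DEV=eth0
SM_DIR=an_sm_directory   # placeholder, as in step 5 of the quoted list

plan_failover() {
    # Failover server: remove the floating IP first, so no client can
    # reach either server while the locks are in transit.
    echo "failover: ip addr del $FLOATING_IP dev $DEV"
    # Failover server: drop the NLM locks held for this fsid.
    echo "failover: echo $FSID > /proc/fs/nfsd/nlm_unlock"
    # Takeover server: start the per-fsid grace period before the IP
    # becomes reachable again.
    echo "takeover: echo $FSID > /proc/fs/nfsd/nlm_set_igrace"
    # Takeover server: bring the floating IP up, then notify clients.
    echo "takeover: ip addr add $FLOATING_IP dev $DEV"
    echo "takeover: /usr/sbin/sm-notify -f -v ${FLOATING_IP%/*} -P $SM_DIR"
}

plan_failover
```

Because the IP is down between the first and fourth printed steps, a
client cannot grab a lock in the window between nlm_unlock and
nlm_set_igrace.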