From: Wendy Cheng <wcheng@redhat.com>
To: cluster-devel.redhat.com
Subject: [NFS] [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
Date: Thu, 26 Apr 2007 00:35:13 -0400 [thread overview]
Message-ID: <46302C01.2060500@redhat.com> (raw)
In-Reply-To: <17965.39683.396108.623418@notabene.brown>
Neil Brown wrote:
>On Monday April 23, wcheng at redhat.com wrote:
>
>
>>Neil Brown wrote:
>>
>>[snip]
>>
>> We started the discussion using the network interface (to drop
>>the locks) but found it wouldn't work well on local filesystems such as
>>ext3. There is really no control over which local (server-side) interface
>>NFS clients will use (though it shouldn't be hard to implement one). When
>>the fail-over server starts to remove the locks, it needs a way to find
>>*all* of the locks associated with the to-be-moved partition. This is
>>to allow umount to succeed. The server IP address alone can't guarantee
>>that. That was the reason we switched to fsid. Also remember this is NFS
>>v2/v3 - clients have no knowledge of server migration.
>>
>>
>[snip]
>
>So it seems to me we do know exactly the list of local-addresses that
>could possibly be associated with locks on a given filesystem. They
>are exactly the IP addresses that are publicly acknowledged to be
>usable for that filesystem.
>And if any client tries to access the filesystem using a different IP
>address then they are doing the wrong thing and should be reformatted.
>
>
A convincing argument... unfortunately, this happens to be a case where
we need to protect the server from clients' misbehavior. For a local
filesystem (ext3), if any file reference count is non-zero (i.e. some
clients are still holding locks), the filesystem can't be
unmounted. We would have to fail the failover to avoid data corruption.
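The point can be illustrated with a small sketch. This is a hypothetical record layout for illustration only, not the real lockd data structures: keying the teardown on fsid finds every lock held on the partition, while keying on the "expected" server IPs can leave behind a lock from a client that came in through an unexpected interface, so umount would fail.

```python
# Hypothetical sketch, not the actual lockd structures. Each lock
# record carries the fsid of the exported filesystem and the
# server-side IP the client happened to use.

def locks_by_fsid(locks, fsid):
    """All locks on the filesystem, regardless of which interface was used."""
    return [l for l in locks if l["fsid"] == fsid]

def locks_by_server_ip(locks, expected_ips):
    """Only the locks that arrived via the 'approved' server addresses."""
    return [l for l in locks if l["server_ip"] in expected_ips]

locks = [
    {"client": "well-behaved", "fsid": 1234, "server_ip": "10.0.0.1"},
    # A misbehaving client reached the same filesystem via another interface:
    {"client": "rogue", "fsid": 1234, "server_ip": "192.168.5.9"},
]

# fsid-based selection catches both locks; IP-based selection misses
# the rogue one, leaving a non-zero reference count at umount time.
```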
>Maybe the idea of using network addresses was the first suggestion,
>and maybe it was rejected for the reasons you give, but it doesn't
>currently seem like those reasons are valid. Maybe those who proposed
>those reasons (and maybe that was me) couldn't see the big picture at
>the time...
>
>
This debate has been (so far) tolerable and helpful - so I'm not going
to comment on this paragraph :) ... But I have to remind people that my
first proposal was adding new flags to the exportfs command (say
"exportfs -ud" to unexport and drop locks, and "exportfs -g" to
re-export and start the grace period). Then we moved to "echo the
network address into procfs", and later switched to the "fsid"
approach. A very long journey ...
>
>
>>> The reply to SM_MON (currently completely ignored by all versions
>>> of Linux) has an extra value which indicates how many more seconds
>>> of grace period there is to go. This can be stuffed into res_stat
>>> maybe.
>>> Places where we currently check 'nlmsvc_grace_period', get moved to
>>> *after* the nlmsvc_retrieve_args call, and the grace_period value
>>> is extracted from host->nsm.
>>>
>>>
>>>
>>>
>>OK with me, but I don't see the advantage?
>>
>>
>
>So we can have a different grace period for each different 'host'.
>
>
IMHO, having a grace period for each client (host) is overkill.
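For context, the per-host scheme Neil sketches above could look roughly like the following. This is only a Python illustration of the idea (the real code would live in lockd/statd in C, and the field and method names here are invented): the SM_MON reply carries "seconds of grace remaining", which lockd stores per host and consults instead of the single global nlmsvc_grace_period.

```python
import time

# Illustration only: a stand-in for the per-host nsm handle. The SM_MON
# reply's "seconds of grace left" is converted to an absolute expiry
# time, and lockd checks it per host after retrieving the arguments.

class NsmHandle:
    def __init__(self):
        self.grace_expiry = 0.0          # absolute time grace ends

    def set_grace_from_sm_mon(self, seconds_left, now=None):
        now = time.time() if now is None else now
        self.grace_expiry = now + seconds_left

    def in_grace(self, now=None):
        now = time.time() if now is None else now
        return now < self.grace_expiry
```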
> [snip]
>
>Part of unmounting the filesystem from Server A requires getting
>Server A to drop all the locks on the filesystem. We know they can
>only be held by clients that sent requests to a given set of IP
>addresses. Lockd created an 'nsm' for each client/local-IP pair and
>registered each of those with statd. The information registered with
>statd includes the details of an RPC call that can be made to lockd to
>tell it to drop all the locks owned by that client/local-IP pair.
>
>The statd in 1.1.0 records all this information in the files created
>in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
>So when it is time to unmount the filesystem, some program can look
>through all the files in /var/lib/nfs/sm, read each of the lines, find
>those which relate to any of the local IP addresses that we want to
>move, and initiate the RPC callback described on that line. This will
>tell lockd to drop those locks. When all the RPCs have been sent, lockd
>will not hold any locks on that filesystem any more.
>
>
Bright idea! But it doesn't solve the issue of misbehaving clients who
come in via unwanted (server) interfaces, does it?
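The scan Neil describes could be sketched as follows. The one-line "client local_ip" record layout is an assumption for illustration; the real files under /var/lib/nfs/sm have their own format, and the actual RPC callback to lockd is omitted here.

```python
# Hypothetical sketch of the teardown scan: read statd's monitor
# records, select the client/local-IP pairs whose local IP is being
# moved, and (in the real program) fire the recorded RPC callback so
# lockd drops those locks.

def pairs_to_unlock(record_lines, moving_ips):
    """Return (client, local_ip) pairs whose locks must be dropped."""
    pairs = []
    for line in record_lines:
        fields = line.split()
        if len(fields) < 2:          # skip malformed records
            continue
        client, local_ip = fields[0], fields[1]
        if local_ip in moving_ips:
            pairs.append((client, local_ip))
    return pairs
```

Once every selected pair has been notified, lockd holds no locks against the moving addresses, so the filesystem's reference counts drop and umount can proceed.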
>
>[snip]
>I feel it has taken me quite a while to gain a full understanding of
>what you are trying to achieve. Maybe it would be useful to have a
>concise/precise description of what the goal is.
>I think a lot of the issues have now become clear, but it seems there
>remains the issue of what system-wide configurations are expected, and
>what configuration we can rule 'out of scope' and decide we don't have
>to deal with.
>
>
I'm trying to do the write-up now. But could the following temporarily
serve the purpose? What is not clear from this thread of discussion?
http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html
-- Wendy