linux-nfs.vger.kernel.org archive mirror
* [PATCH 0/4 Revised] NLM - lock failover
@ 2007-04-05 21:50 Wendy Cheng
  2007-04-11 17:01 ` J. Bruce Fields
                   ` (3 more replies)
  0 siblings, 4 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-05 21:50 UTC (permalink / raw)
  To: nfs, cluster-devel; +Cc: Lon Hohberger

Revised patches based on 2.6.21-rc4 kernel and nfs-utils-1.1.0-rc1 that 
address issues discussed in:
https://www.redhat.com/archives/cluster-devel/2006-September/msg00034.html

Quick How-to:
1) Failover server exports the filesystem with the "fsid" option:
    /etc/exports entry> /mnt/shared/exports *(fsid=1234,sync,rw)
2) Failover server dispatches rpc.statd with the "-H" option.
3) Failover server drops locks based on fsid:
    shell> echo 1234 > /proc/fs/nfsd/nlm_unlock
4) Takeover server enters a per-fsid grace period:
    shell> echo 1234 > /proc/fs/nfsd/nlm_set_igrace
5) Takeover server notifies clients to reclaim locks:
    shell> /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory
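For illustration, the sequence can be wrapped into two small command
sequences, one per server role (a sketch only; the callout program name
is a placeholder, and the address/directory placeholders are the same
ones used above):

    # On the failover server (steps 2-3):
    rpc.statd -H /usr/sbin/my-ha-callout      # statd with an HA callout
    echo 1234 > /proc/fs/nfsd/nlm_unlock      # drop this fsid's locks

    # On the takeover server (steps 4-5):
    echo 1234 > /proc/fs/nfsd/nlm_set_igrace  # enter the per-fsid grace period
    /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory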

Patch Summary:
4-1: implement /proc/fs/nfsd/nlm_unlock
4-2: implement /proc/fs/nfsd/nlm_set_igrace
4-3: correctly record and pass the incoming server IP interface to rpc.statd.
4-4: nfs-utils statd changes
4-1 includes an existing lockd bug fix as discussed in:
http://sourceforge.net/mailarchive/forum.php?thread_name=4603506D.5040807%40redhat.com&forum_name=nfs
(subject: [NFS] Question about f_count in struct nlm_file)
4-4 includes an existing nfs-utils statd bug fix as discussed in:
http://sourceforge.net/mailarchive/message.php?msg_name=46142B4F.1030507%40redhat.com
(subject: Re: [NFS] lockd and statd)

Misc:
o No IPv6 support yet (to keep the testing effort manageable).
o NFS V3 only - will compare notes with CITI folks (NFS V4 issues).
o Still needs some error-injection tests.

-- Wendy



* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-05 21:50 [PATCH 0/4 Revised] NLM - lock failover Wendy Cheng
@ 2007-04-11 17:01 ` J. Bruce Fields
  2007-04-17 19:30 ` [Cluster-devel] " Wendy Cheng
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 42+ messages in thread
From: J. Bruce Fields @ 2007-04-11 17:01 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: cluster-devel, Lon Hohberger, nfs

On Thu, Apr 05, 2007 at 05:50:55PM -0400, Wendy Cheng wrote:
> Revised patches based on 2.6.21-rc4 kernel and nfs-utils-1.1.0-rc1 that 
> address issues discussed in:
> https://www.redhat.com/archives/cluster-devel/2006-September/msg00034.html
> 
> Quick How-to:
> 1) Failover server exports the filesystem with the "fsid" option:
>     /etc/exports entry> /mnt/shared/exports *(fsid=1234,sync,rw)
> 2) Failover server dispatches rpc.statd with the "-H" option.
> 3) Failover server drops locks based on fsid:
>     shell> echo 1234 > /proc/fs/nfsd/nlm_unlock
> 4) Takeover server enters a per-fsid grace period:
>     shell> echo 1234 > /proc/fs/nfsd/nlm_set_igrace
> 5) Takeover server notifies clients to reclaim locks:
>     shell> /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory
> 
> Patch Summary:
> 4-1: implement /proc/fs/nfsd/nlm_unlock
> 4-2: implement /proc/fs/nfsd/nlm_set_igrace
> 4-3: correctly record and pass the incoming server IP interface to rpc.statd.
> 4-4: nfs-utils statd changes
> 4-1 includes an existing lockd bug fix as discussed in:
> http://sourceforge.net/mailarchive/forum.php?thread_name=4603506D.5040807%40redhat.com&forum_name=nfs
> (subject: [NFS] Question about f_count in struct nlm_file)

That's the one separate chunk in nlm_traverse_files()?  Could you keep
that split out as a separate patch?  I see that it got some discussion
before but I'm not clear what the resolution was....

--b.


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-05 21:50 [PATCH 0/4 Revised] NLM - lock failover Wendy Cheng
  2007-04-11 17:01 ` J. Bruce Fields
@ 2007-04-17 19:30 ` Wendy Cheng
  2007-04-18 18:56   ` Wendy Cheng
  2007-04-19  7:04   ` [Cluster-devel] " Neil Brown
  2007-04-25 14:18 ` J. Bruce Fields
  2011-11-30 10:13 ` Pavel
  3 siblings, 2 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-17 19:30 UTC (permalink / raw)
  To: Neil Brown; +Cc: cluster-devel, nfs

A few new thoughts from the latest round of review are really good and
worth doing.

However, since this particular NLM patch set is only part of the overall 
scaffolding code to allow NFS V3 server fail over before NFS V4 is 
widely adopted and stabilized, I'm wondering whether we should drag 
ourselves too far for something that will be replaced soon. Lon and I
have been discussing the possibility of proposing new design changes to
the existing state monitoring protocol itself - but I'm leaning toward
*not* doing client SM_NOTIFY eventually (by passing the lock states
directly from the fail-over server to the take-over server if at all
possible). This would consolidate a few upcoming work items, such as
NFSD V3 request reply cache entries (or at least non-idempotent
operation entries) and NFS V4 states, that need to be moved between the
fail-over servers.

In general, NFS cluster failover has been error-prone and subject to
timing constraints (e.g. failover must finish within a sensible time
interval). Would it make more sense to have a workable solution with a
restricted scope first? We can always merge the various pieces together
later as we learn more from our users. For this reason, simple and
plain patches like this set would work best for now.

In any case, the following collects the review comments so far:

o 1-1 [from hch]
"Dropping locks should also support uuid or dev_t based exports."

A valid request. The easiest solution might be simply to take Neil's
idea of using the export path name. So this issue is folded into 1-3
(see below for details).

o 1-2 [from hch]
"It would be nice to have a more general push api for changes to 
filesystem state, that works on a similar basis as getting information 
from /etc/exports."

Could hch (or anyone) elaborate on this? Should I interpret it as
implementing a configuration file that describes the failover options
in a format similar to /etc/exports (including filesystem identifiers,
the length of the grace period, etc.), plus a command (maybe two - one
on the failover server and one on the take-over server) to kick off
the failover based on that pre-defined configuration file?

o 1-3 [from neilb]
"It would seem to make more sense to use the filesystem name (i.e. a 
path) by writing a directory name to /proc/fs/nfsd/nlm_unlock and maybe 
also to /proc/fs/nlm_restart_grace_for_fs" and have 'my_name' in the 
SM_MON request be the path name of the export point rather the network 
address."

It was my mistake to mention in a previous post that we could use
"fsid" in the "my_name" field. As Lon pointed out, SM_MON requires the
server address so that we do not blindly notify clients, which could
result in unspecified behavior. On the other hand, the "path name" idea
does solve various problems if we want to support different types of
existing filesystem identifiers for failover purposes. Combined with
the configuration file mentioned in 1-2, this could be a nice long-term
solution. A few concerns about using a path name alone:

* String comparison can be error-prone and slow.
* It loses the abstraction provided by the "fsid" approach, particularly
for cluster filesystem load balancing. With the "fsid" approach, we can
simply export the same directory using two different fsids (associated
with two different IP addresses) for various purposes on the same node,
as shown in the example after this list.
* We will have to repeatedly educate users that "dev_t" is not unique
across reboots or nodes; that a uuid is restricted to one single disk
partition; and that both require extra steps to obtain values that are
not easily read by human eyes. My support experience has taught me that
by the time users really understand the difference, they'll have
switched to fsid anyway.
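For example (an illustration only; the client ranges and the second
fsid value are invented), the same directory could carry two fsids,
each published through a different address:

    /etc/exports entry> /mnt/shared/exports 10.0.0.0/24(fsid=1234,sync,rw)
    /etc/exports entry> /mnt/shared/exports 10.0.1.0/24(fsid=5678,sync,rw)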

o 1-4 [from bfields]
"Unrelated bug fixes should be broken out from the feature patches."

Will do

o 2-1 [from cluster coherent NFS conf. call]
"Hooks to allow a cluster filesystem to do its own "start" and "stop"
of the grace period."

This could be solved by using a configuration file as described in 1-2.

o 3-1 [from okir]
"There's not enough room in the SM_MON request to accommodate additional
network addresses (e.g. IPv6)."

SM_MON is sent and received *within* the very same server. Does it
really matter whether we follow the protocol standard in this particular
RPC call? My guess is not. The current patch writes the server IP into
the "my_name" field as a variable-length character array. I see no
reason this can't be a larger character array (say 128 bytes for IPv6)
to accommodate all the existing network addressing we know of.

o 3-2 [from okir]
"Should we think about replacing SM_MON with some new design altogether
(think netlink)?"

Yes. But before we spend the effort, I would suggest we focus on:
1. Offering a tentative, workable NFS V3 solution for our users first.
2. Checking out the requirements of the NFS V4 implementation so we
don't end up revising the new changes again when V4 failover arrives.

In short, my vote is to take this (NLM) patch set and let people try it
out while we switch gears to look into other NFS V3 failover issues
(nfsd in particular). Neil?

-- Wendy


* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-17 19:30 ` [Cluster-devel] " Wendy Cheng
@ 2007-04-18 18:56   ` Wendy Cheng
  2007-04-18 19:46     ` [Cluster-devel] " Wendy Cheng
  2007-04-19 14:41     ` Christoph Hellwig
  2007-04-19  7:04   ` [Cluster-devel] " Neil Brown
  1 sibling, 2 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-18 18:56 UTC (permalink / raw)
  To: nfs, cluster-devel

Arjan, I need an objective opinion and am wondering whether you could
give me some advice...

I'm quite upset about this Christoph guy and really want to "talk back".
Would my response be so strong that it ends up harming my later patches?

-- Wendy

Christoph Hellwig wrote:
> On Tue, Apr 17, 2007 at 10:11:13PM -0400, Wendy Cheng wrote:
> > However, since this particular NLM patch set is only part of the overall
> > scaffolding code to allow NFS V3 server fail over before NFS V4 is
> > widely adopted and stabilized, I'm wondering whether we should drag
> > ourselves too far for something that will be replaced soon.
>
> I don't think that's a valid argument.  "We hack this up because it's
> going to be obsolete mid-term" never was a really good argument.  And in
> this case it's a particularly bad one.  People won't rush to NFSv4 just
> because someone declares it stable now.  And if they did we couldn't
> simply rip out existing functionality.
>

The "hack" and "bad" are very subjective words in this context. 
Comparing to many other code currently living inside Linux kernel tree, 
this patch set, gone thru 3 rounds of extensive review and discussions, 
deserves at least "average" standing in terms of solution, quality and 
testing efforts.

On the other hand, I certainly welcome further constructive suggestions 
and ideas though.

-- Wendy

 


* Re: [Cluster-devel] Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-18 18:56   ` Wendy Cheng
@ 2007-04-18 19:46     ` Wendy Cheng
  2007-04-19 14:41     ` Christoph Hellwig
  1 sibling, 0 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-18 19:46 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: cluster-devel, nfs

oops :( ...



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-17 19:30 ` [Cluster-devel] " Wendy Cheng
  2007-04-18 18:56   ` Wendy Cheng
@ 2007-04-19  7:04   ` Neil Brown
  2007-04-19 14:53     ` Wendy Cheng
  2007-04-24  3:30     ` Wendy Cheng
  1 sibling, 2 replies; 42+ messages in thread
From: Neil Brown @ 2007-04-19  7:04 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: cluster-devel, nfs

On Tuesday April 17, wcheng@redhat.com wrote:
> 
> In short, my vote is to take this (NLM) patch set and let people try it
> out while we switch gears to look into other NFS V3 failover issues
> (nfsd in particular). Neil?

I agree with Christoph in that we should do it properly.
That doesn't mean that we need a complete solution.  But we do want to
make sure to avoid any design decisions that we might not want to be
stuck with.  Sometimes that's unavoidable, but let's try a little
harder for the moment.

One thing that has been bothering me is that sometimes the
"filesystem" (in the guise of an fsid) is used to talk to the kernel
about failover issues (when flushing locks or restarting the grace
period) and sometimes the local network address is used (when talking
with statd). 

I would rather use a single identifier.  In my previous email I was
leaning towards using the filesystem as the single identifier.  Today
I'm leaning the other way - to using the local network address.

It works like this:

  We have a module parameter for lockd something like
  "virtual_server".
  If that is set to 0, none of the following changes are effective.
  If it is set to 1:

   The destination address for any lockd request becomes part of the
   key to find the nsm_handle.
   The my_name field in SM_MON requests and SM_UNMON requests is set
   to a textual representation of that destination address.
   The reply to SM_MON (currently completely ignored by all versions
   of Linux) has an extra value which indicates how many more seconds
   of grace period there is to go.  This can be stuffed into res_stat
   maybe.
   Places where we currently check 'nlmsvc_grace_period', get moved to
   *after* the nlmsvc_retrieve_args call, and the grace_period value
   is extracted from host->nsm.

  This is the full extent of the kernel changes.

  To remove old locks, we arrange for the callbacks registered with
  statd for the relevant clients to be called.
  To set the grace period, we make sure statd knows about it and it
  will return the relevant information to lockd.
   To notify clients of the need to reclaim locks, we simply use the
  information stored by statd, which contains the local network
  address.

The only aspect of this that gives me any cause for concern is
overloading the return value for SM_MON.  Possibly it might be cleaner
to define an SM_MON2 with different args or whatever.
As this interface is entirely local to the one machine, and as it can
quite easily be kept back-compatible, I think the concept is fine.

Statd would need to pass the my_name field to the ha callout rather
than replacing it with "127.0.0.1", but other than that I don't think
any changes are needed to statd (though I haven't thought through that
fully yet).
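As a concrete illustration of the proposed "virtual_server" switch
(hypothetical - the parameter is only a suggestion above and does not
exist yet), enabling it would look like any other lockd module option:

    modprobe lockd virtual_server=1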

Comments?

NeilBrown


* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-18 18:56   ` Wendy Cheng
  2007-04-18 19:46     ` [Cluster-devel] " Wendy Cheng
@ 2007-04-19 14:41     ` Christoph Hellwig
  2007-04-19 15:08       ` Wendy Cheng
  1 sibling, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2007-04-19 14:41 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: cluster-devel, nfs

On Wed, Apr 18, 2007 at 02:56:18PM -0400, Wendy Cheng wrote:
> Arjan, I need an objective opinion and am wondering whether you could
> give me some advice...
> 
> I'm quite upset about this Christoph guy and really want to "talk back".
> Would my response be so strong that it ends up harming my later patches?

I don't mind strong answers, but I'd generally prefer clueful strong
answers :)



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-19  7:04   ` [Cluster-devel] " Neil Brown
@ 2007-04-19 14:53     ` Wendy Cheng
  2007-04-24  3:30     ` Wendy Cheng
  1 sibling, 0 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-19 14:53 UTC (permalink / raw)
  To: Neil Brown; +Cc: cluster-devel, nfs

Neil Brown wrote:
> On Tuesday April 17, wcheng@redhat.com wrote:
>   
>> In short, my vote is to take this (NLM) patch set and let people try it
>> out while we switch gears to look into other NFS V3 failover issues
>> (nfsd in particular). Neil?
>>     
>
> I agree with Christoph in that we should do it properly.
> That doesn't mean that we need a complete solution.  But we do want to
> make sure to avoid any design decisions that we might not want to be
> stuck with.  Sometimes that's unavoidable, but let's try a little
> harder for the moment.
>   

As with any code review, once you set personal feelings aside, you
eventually start to appreciate some of the seemingly harsh comments.
This is definitely one of those moments. I agree we should try harder.

NFS failover has been a difficult subject. There is a three-year-old
Red Hat bugzilla asking for this feature, plus a few others marked as
duplicates. Reading through the comments last night, I came to feel
strongly that we should put restrictions on the implementation to avoid
dragging users along for another three years.

> One thing that has been bothering me is that sometimes the
> "filesystem" (in the guise of an fsid) is used to talk to the kernel
> about failover issues (when flushing locks or restarting the grace
> period) and sometimes the local network address is used (when talking
> with statd). 
>
> I would rather use a single identifier.  In my previous email I was
> leaning towards using the filesystem as the single identifier.  Today
> I'm leaning the other way - to using the local network address.
>
> It works like this:
>
>   We have a module parameter for lockd something like
>   "virtual_server".
>   If that is set to 0, none of the following changes are effective.
>   If it is set to 1:
>
>    The destination address for any lockd request becomes part of the
>    key to find the nsm_handle.
>    The my_name field in SM_MON requests and SM_UNMON requests is set
>    to a textual representation of that destination address.
>    The reply to SM_MON (currently completely ignored by all versions
>    of Linux) has an extra value which indicates how many more seconds
>    of grace period there is to go.  This can be stuffed into res_stat
>    maybe.
>    Places where we currently check 'nlmsvc_grace_period', get moved to
>    *after* the nlmsvc_retrieve_args call, and the grace_period value
>    is extracted from host->nsm.
>
>   This is the full extent of the kernel changes.
>
>   To remove old locks, we arrange for the callbacks registered with
>   statd for the relevant clients to be called.
>   To set the grace period, we make sure statd knows about it and it
>   will return the relevant information to lockd.
>   To notify clients of the need to reclaim locks, we simply use the
>   information stored by statd, which contains the local network
>   address.
>
> The only aspect of this that gives me any cause for concern is
> overloading the return value for SM_MON.  Possibly it might be cleaner
> to define an SM_MON2 with different args or whatever.
> As this interface is entirely local to the one machine, and as it can
> quite easily be kept back-compatible, I think the concept is fine.
>
> Statd would need to pass the my_name field to the ha callout rather
> than replacing it with "127.0.0.1", but other than that I don't think
> any changes are needed to statd (though I haven't thought through that
> fully yet).
>
> Comments?
>
>   
I need some time to look into the ramifications ... comments will follow soon.

-- Wendy



* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-19 14:41     ` Christoph Hellwig
@ 2007-04-19 15:08       ` Wendy Cheng
  0 siblings, 0 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-19 15:08 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: cluster-devel, nfs

Christoph Hellwig wrote:
> On Wed, Apr 18, 2007 at 02:56:18PM -0400, Wendy Cheng wrote:
>   
>> Arjan, I need an objective opinion and am wondering whether you could
>> give me some advice...
>>
>> I'm quite upset about this Christoph guy and really want to "talk back".
>> Would my response be so strong that it ends up harming my later patches?
>>     
>
> I don't mind strong answers, but I'd generally prefer clueful strong
> answers :)
>
>   

Well, I have been seriously considering your previous advice about
getting another job. Maybe working on a new email system that would
allow people to cancel or recall mistakenly sent emails would be a
very good new project? :)

Glad to know you don't feel offended.

-- Wendy



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-19  7:04   ` [Cluster-devel] " Neil Brown
  2007-04-19 14:53     ` Wendy Cheng
@ 2007-04-24  3:30     ` Wendy Cheng
  2007-04-24  5:52       ` Neil Brown
  1 sibling, 1 reply; 42+ messages in thread
From: Wendy Cheng @ 2007-04-24  3:30 UTC (permalink / raw)
  To: Neil Brown; +Cc: cluster-devel, nfs

Neil Brown wrote:

>One thing that has been bothering me is that sometimes the
>"filesystem" (in the guise of an fsid) is used to talk to the kernel
>about failover issues (when flushing locks or restarting the grace
>period) and sometimes the local network address is used (when talking
>with statd). 
>  
>

This is a perception issue - it depends on how the design is described. 
More on this later.

>I would rather use a single identifier.  In my previous email I was
>leaning towards using the filesystem as the single identifier.  Today
>I'm leaning the other way - to using the local network address.
>  
>
I guess you're juggling too many things and forgot why we came down
this route? We started the discussion using the network interface (to
drop the locks) but found it wouldn't work well on local filesystems
such as ext3. There is really no control over which local (server-side)
interface NFS clients will use (though it shouldn't be hard to
implement one). When the fail-over server starts to remove the locks,
it needs a way to find *all* of the locks associated with the
will-be-moved partition. This is to allow umount to succeed. The
server IP address alone can't guarantee that. That was the reason we
switched to fsid. Also remember this is NFS v2/v3 - clients have no
knowledge of server migration.

Now, let's move back to the first paragraph. An active-active failover
can be described as a five-step process (sketched in shell after the
list):

Step 1. Quiesce the floating network address.
Step 2. Move the exported filesystem directories from Server A to Server B.
Step 3. Re-enable the network interface.
Step 4. Inform clients about the changes via the NSM (Network Status
Monitor) protocol.
Step 5. Grace period.
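A rough sketch of the five steps with standard tools (the address,
device, and paths are placeholders; a real failover agent would add
error handling and ordering guarantees):

    # Step 1 (on A): quiesce the floating address
    ip addr del 10.0.0.50/24 dev eth0
    # Step 2: move the export from A to B
    exportfs -u '*:/mnt/shared/exports'                    # on A
    echo 1234 > /proc/fs/nfsd/nlm_unlock                   # on A: drop locks so umount succeeds
    umount /mnt/shared                                     # on A
    mount /dev/shared_disk /mnt/shared                     # on B (shared storage)
    exportfs -o fsid=1234,sync,rw '*:/mnt/shared/exports'  # on B
    # Step 3 (on B): re-enable the network interface
    ip addr add 10.0.0.50/24 dev eth0
    # Steps 4-5 (on B): enter per-fsid grace and notify clients via NSM
    echo 1234 > /proc/fs/nfsd/nlm_set_igrace
    /usr/sbin/sm-notify -f -v 10.0.0.50 -P /var/lib/nfs/sm.ha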

I was told last week that, independent of lockd, some cluster
filesystems have their own implementation of a grace period. It is on
the wish list that this feature be taken into consideration. IMHO, the
overall process should be viewed as a collaboration between the
filesystem, the network interface, and the NFS protocol itself. Mixing
filesystem and network operations is unavoidable.

On the other hand, the currently proposed interface is expandable ...
say, prefix a non-numerical string "DEV" or "UUID" to ask for dropping
locks, as in:
shell> echo "DEV12390" > /proc/fs/nfsd/nlm_unlock

or allow an individual grace period of 10 seconds as:
shell> echo "1234@10" > nlm_set_grace_for_fsid

With the above said, some of the following flow confuses me ... comments
inlined below.

>It works like this:
>
>  We have a module parameter for lockd something like
>  "virtual_server".
>  If that is set to 0, none of the following changes are effective.
>  If it is set to 1:
>  
>
ok with me ...

>   The destination address for any lockd request becomes part of the
>   key to find the nsm_handle.
>  
>

As explained above, the address alone can't guarantee that the
associated locks get cleaned up for one particular filesystem.

>   The my_name field in SM_MON requests and SM_UNMON requests is set
>   to a textual representation of that destination address.
>  
>

That's what the current patch does.

>   The reply to SM_MON (currently completely ignored by all versions
>   of Linux) has an extra value which indicates how many more seconds
>   of grace period there is to go.  This can be stuffed into res_stat
>   maybe.
>   Places where we currently check 'nlmsvc_grace_period', get moved to
>   *after* the nlmsvc_retrieve_args call, and the grace_period value
>   is extracted from host->nsm.
>  
>
OK with me, but I don't see the advantages?

>  This is the full extent of the kernel changes.
>
>  To remove old locks, we arrange for the callbacks registered with
>  statd for the relevant clients to be called.
>  To set the grace period, we make sure statd knows about it and it
>  will return the relevant information to lockd.
>  To notify clients of the need to reclaim locks, we simply use the
>  information stored by statd, which contains the local network
>  address.
>  
>

I'm lost here... help ?

>The only aspect of this that gives me any cause for concern is
>overloading the return value for SM_MON.  Possibly it might be cleaner
>to define an SM_MON2 with different args or whatever.
>As this interface is entirely local to the one machine, and as it can
>quite easily be kept back-compatible, I think the concept is fine.
>  
>
Agree !

>Statd would need to pass the my_name field to the ha callout rather
>than replacing it with "127.0.0.1", but other than that I don't think
>any changes are needed to statd (though I haven't thought through that
>fully yet).
>  
>

That's what the current patch does.

>Comments?
>
>
>  
>
I feel we're going in circles again... If there is any way I can
shorten this discussion, please do let me know.

-- Wendy



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-24  3:30     ` Wendy Cheng
@ 2007-04-24  5:52       ` Neil Brown
  2007-04-26  4:35         ` Wendy Cheng
  0 siblings, 1 reply; 42+ messages in thread
From: Neil Brown @ 2007-04-24  5:52 UTC (permalink / raw)
  To: wcheng; +Cc: cluster-devel, nfs

On Monday April 23, wcheng@redhat.com wrote:
> Neil Brown wrote:
> 
> >One thing that has been bothering me is that sometimes the
> >"filesystem" (in the guise of an fsid) is used to talk to the kernel
> >about failover issues (when flushing locks or restarting the grace
> >period) and sometimes the local network address is used (when talking
> >with statd). 
> 
> This is a perception issue - it depends on how the design is described. 

Perception affects understanding.  Understanding is vital.

> More on this later.

OK.

> 
> >I would rather use a single identifier.  In my previous email I was
> >leaning towards using the filesystem as the single identifier.  Today
> >I'm leaning the other way - to using the local network address.
> >  
> >
> I guess you're juggling too many things and forgot why we came down
> this route?

Probably :-)

>              We started the discussion using the network interface (to
> drop the locks) but found it wouldn't work well on local filesystems
> such as ext3. There is really no control over which local (server-side)
> interface NFS clients will use (though it shouldn't be hard to
> implement one). When the fail-over server starts to remove the locks,
> it needs a way to find *all* of the locks associated with the
> will-be-moved partition. This is to allow umount to succeed. The
> server IP address alone can't guarantee that. That was the reason we
> switched to fsid. Also remember this is NFS v2/v3 - clients have no
> knowledge of server migration.

Hmmm...
I had in mind that you would have some name in the DNS like
"virtual-nas-foo" which maps to a number of IP addresses,  And every
client that wants to access /bar, which is known to be served by
virtual-nas-foo would:
   mount virtual-nas-foo:/bar /bar

and some server (A) from the pool of possibilities would configure a bunch
of virtual interfaces to have the different IP addresses that the DNS
knows to be associated with 'virtual-nas-foo'.
It might also configure a bunch of other virtual interfaces with the
addresses of 'virtual-nas-baz', but no client would ever try to 
   mount virtual-nas-baz:/bar /bar
because, while that might work depending on the server configuration,
it is clearly a config error and as soon as /bar was migrated from A to B,
those clients would mysteriously lose service.

So it seems to me we do know exactly the list of local-addresses that
could possibly be associated with locks on a given filesystem.  They
are exactly the IP addresses that are publicly acknowledged to be
usable for that filesystem.
And if any client tries to access the filesystem using a different IP
address then they are doing the wrong thing and should be reformatted.

Maybe the idea of using network addresses was the first suggestion,
and maybe it was rejected for the reasons you give, but it doesn't
currently seem like those reasons are valid.  Maybe those who proposed
those reasons (and maybe that was me) couldn't see the big picture at
the time.... maybe I still don't see the big picture?

> >   The reply to SM_MON (currently completely ignored by all versions
> >   of Linux) has an extra value which indicates how many more seconds
> >   of grace period there is to go.  This can be stuffed into res_stat
> >   maybe.
> >   Places where we currently check 'nlmsvc_grace_period', get moved to
> >   *after* the nlmsvc_retrieve_args call, and the grace_period value
> >   is extracted from host->nsm.
> >  
> >
> OK with me, but I don't see the advantages?

So we can have a different grace period for each different 'host'.

> 
> >  This is the full extent of the kernel changes.
> >
> >  To remove old locks, we arrange for the callbacks registered with
> >  statd for the relevant clients to be called.
> >  To set the grace period, we make sure statd knows about it and it
> >  will return the relevant information to lockd.
> >  To notify clients of the need to reclaim locks, we simply use the
> >  information stored by statd, which contains the local network
> >  address.
> >  
> >
> 
> I'm lost here... help ?
> 

Ok, I'll try to not be so terse.

> >  To remove old locks, we arrange for the callbacks registered with
> >  statd for the relevant clients to be called.

Part of unmounting the filesystem from Server A requires getting
Server A to drop all the locks on the filesystem.  We know they can
only be held by clients that sent requests to a given set of IP
addresses.   Lockd created an 'nsm' for each client/local-IP pair and
registered each of those with statd.  The information registered with
statd includes the details of an RPC call that can be made to lockd to
tell it to drop all the locks owned by that client/local-IP pair.

The statd in 1.1.0 records all this information in the files created
in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
So when it is time to unmount the filesystem, some program can look
through all the files in nfs/sm, read each of the lines, find those
which relate to any of the local IP addresses that we want to move, and
initiate the RPC callback described on that line.  This will tell
lockd to drop those locks.  When all the RPCs have been sent, lockd
will not hold any locks on that filesystem any more.
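A rough sketch of such a program in shell (the record layout is
whatever statd wrote, and nlm-drop-callback is a hypothetical helper
that would issue the RPC described on a matching line - no such tool
exists today):

    FLOAT_IP=10.0.0.50
    for f in /var/lib/nfs/sm/*; do
        # pick out records registered against the address being moved
        grep "$FLOAT_IP" "$f" | while read -r line; do
            nlm-drop-callback "$line"   # hypothetical: fire the registered RPC
        done
    done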

> >  To set the grace period, we make sure statd knows about it and it
> >  will return the relevant information to lockd.

On Server-B, we mount the filesystem(s) and export them.  When a lock
request arrives from some client, lockd needs to know whether the
grace period is still active.  We want that determination to depend on
which filesystem/local-IP was used.  One way to do that is to have the
information passed in by statd when lockd asks for the client to be
monitored.  A possible implementation would be to have the ha-callout
find out that the virtual server was migrated, and return the number
of seconds remaining by writing it to stdout.  statd could run the
ha-callout with output to a pipe, read the number, and include that in
the reply to SM_MON.
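A minimal sketch of such a callout (everything here is an assumption:
the argument layout, the state file, and the idea that the cluster
agent records the grace end time in it beforehand):

    #!/bin/sh
    # print the seconds of grace remaining for the local address in $1
    state=/var/lib/nfs/grace.$1
    if [ -f "$state" ]; then
        end=$(cat "$state")        # absolute end time, epoch seconds
        now=$(date +%s)
        remain=$((end - now))
        [ "$remain" -gt 0 ] && echo "$remain" || echo 0
    else
        echo 0                     # no grace period in effect
    fi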

> >  To notify clients of the need to reclaim locks, we simply use the
> >  information stored by statd, which contains the local network
> >  address.

Once the filesystem is exported on Server-B, we need to notify all
clients to reclaim their locks.  We can find the same lines that were
used to tell lockd to drop locks on the server, and use that
information to tell clients that they need to reclaim (or information
recorded elsewhere by the ha-callout can do the same thing).

Does that make it clearer?

> I feel we're going in circles again... If there is any way I can
> shorten this discussion, please do let me know.
> 

Much as the 'waterfall model' is frowned upon these days, I wonder if
it could serve us here.
I feel it has taken me quite a while to gain a full understanding of
what you are trying to achieve.  Maybe it would be useful to have a
concise/precise description of what the goal is.
I think a lot of the issues have now become clear, but it seems there
remains the issue of what system-wide configurations are expected, and
what configurations we can rule 'out of scope' and decide we don't have
to deal with.
Once we have a clear statement of the goal that we can agree on, it
should be a lot easier to evaluate and reason about different
implementation proposals.

NeilBrown


* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-25 14:18 ` J. Bruce Fields
@ 2007-04-25 14:10   ` Wendy Cheng
  2007-04-25 15:21     ` Marc Eshel
  2007-04-25 15:59     ` J. Bruce Fields
  0 siblings, 2 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-25 14:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: cluster-devel, Lon Hohberger, nfs

J. Bruce Fields wrote:
> On Thu, Apr 05, 2007 at 05:50:55PM -0400, Wendy Cheng wrote:
>   
>> 1) Failover server exports the filesystem with the "fsid" option:
>>     /etc/exports entry> /mnt/shared/exports *(fsid=1234,sync,rw)
>> 2) Failover server dispatches rpc.statd with the "-H" option.
>> 3) Failover server drops locks based on fsid:
>>     shell> echo 1234 > /proc/fs/nfsd/nlm_unlock
>> 4) Takeover server enters a per-fsid grace period:
>>     shell> echo 1234 > /proc/fs/nfsd/nlm_set_igrace
>> 5) Takeover server notifies clients to reclaim locks:
>>     shell> /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory
>>     
>
> I don't understand statd and lockd as well as I should.  Where exactly
> does the failover server stop serving requests, and the takeover server
> start?  If this isn't done carefully, you can leave a window between
> steps 3 and 4 where a client could acquire a lock before its rightful
> owner reclaims it, right?
>
>   
The detailed overall steps were described in the first email we sent a
*long* time ago (> 6 months, I think). The first step of the whole
process is tearing down the floating IP on the failover server. The IP
is not accessible until the filesystem has safely failed over and
SM_NOTIFY is ready to be sent.

The last round of discussion gave me the impression that as long as I
rebased the code onto akpm's mm tree, these patches would get accepted.
So I was quite careless in this submission and just realized people
have very short memories :) .. I will do the write-up and put it
somewhere so we don't need to go through this again.

-- Wendy



* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-05 21:50 [PATCH 0/4 Revised] NLM - lock failover Wendy Cheng
  2007-04-11 17:01 ` J. Bruce Fields
  2007-04-17 19:30 ` [Cluster-devel] " Wendy Cheng
@ 2007-04-25 14:18 ` J. Bruce Fields
  2007-04-25 14:10   ` Wendy Cheng
  2011-11-30 10:13 ` Pavel
  3 siblings, 1 reply; 42+ messages in thread
From: J. Bruce Fields @ 2007-04-25 14:18 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: cluster-devel, Lon Hohberger, nfs

On Thu, Apr 05, 2007 at 05:50:55PM -0400, Wendy Cheng wrote:
> 1) Failover server exports the filesystem with the "fsid" option:
>     /etc/exports entry> /mnt/shared/exports *(fsid=1234,sync,rw)
> 2) Failover server dispatches rpc.statd with the "-H" option.
> 3) Failover server drops locks based on fsid:
>     shell> echo 1234 > /proc/fs/nfsd/nlm_unlock
> 4) Takeover server enters a per-fsid grace period:
>     shell> echo 1234 > /proc/fs/nfsd/nlm_set_igrace
> 5) Takeover server notifies clients to reclaim locks:
>     shell> /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory

I don't understand statd and lockd as well as I should.  Where exactly
does the failover server stop serving requests, and the takeover server
start?  If this isn't done carefully, you can leave a window between
steps 3 and 4 where a client could acquire a lock before its rightful
owner reclaims it, right?

--b.


* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-25 15:21     ` Marc Eshel
@ 2007-04-25 15:19       ` Wendy Cheng
  2007-04-25 15:39         ` [Cluster-devel] " Wendy Cheng
  0 siblings, 1 reply; 42+ messages in thread
From: Wendy Cheng @ 2007-04-25 15:19 UTC (permalink / raw)
  To: Marc Eshel
  Cc: J. Bruce Fields, cluster-devel, Lon Hohberger, nfs, nfs-bounces

Marc Eshel wrote:
>> The detailed overall steps were described in the first email we sent a
>> *long* time ago (> 6 months, I think). The first step of the whole
>> process is tearing down the floating IP on the failover server. The IP
>> is not accessible until the filesystem has safely failed over and
>> SM_NOTIFY is ready to be sent.
>>     
>
> I thought this was a solution for an active-active server, where a cluster
> file system can export the same file system from multiple NFS servers.
> Marc.
>
>   

Yes ... but remember we have two cases here:

1) Local filesystems such as ext3 - both the IP and the filesystem are
not accessible during the transition.
2) Cluster filesystems such as GFS or GPFS - the filesystem may still
be accessible (depending on the configuration; say you have advertised
two exported IP addresses, each serving different subdirectories of the
very same cluster filesystem). The failover IP should be suspended
during the transition until SM_NOTIFY is ready to go out (but the other
IP should stay up and service requests as it should).

I assume people understand that the affected export entries should have
been un-exported (as part of the overall process).
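For example, un-exporting only the affected entry uses the usual
exportfs syntax (a sketch with invented paths; the second entry keeps
serving through the other address):

    shell> exportfs -u '*:/mnt/shared/exports/dir1'
    shell> exportfs -v    # the entry for /mnt/shared/exports/dir2 remains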

-- Wendy



* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-25 14:10   ` Wendy Cheng
@ 2007-04-25 15:21     ` Marc Eshel
  2007-04-25 15:19       ` Wendy Cheng
  2007-04-25 15:59     ` J. Bruce Fields
  1 sibling, 1 reply; 42+ messages in thread
From: Marc Eshel @ 2007-04-25 15:21 UTC (permalink / raw)
  To: Wendy Cheng
  Cc: J. Bruce Fields, cluster-devel, Lon Hohberger, nfs, nfs-bounces

nfs-bounces@lists.sourceforge.net wrote on 04/25/2007 07:10:31 AM:

> J. Bruce Fields wrote:
> > On Thu, Apr 05, 2007 at 05:50:55PM -0400, Wendy Cheng wrote:
> > 
> >> 1) Failover server exports the filesystem with the "fsid" option:
> >>     /etc/exports entry> /mnt/shared/exports *(fsid=1234,sync,rw)
> >> 2) Failover server dispatches rpc.statd with the "-H" option.
> >> 3) Failover server drops locks based on fsid:
> >>     shell> echo 1234 > /proc/fs/nfsd/nlm_unlock
> >> 4) Takeover server enters a per-fsid grace period:
> >>     shell> echo 1234 > /proc/fs/nfsd/nlm_set_igrace
> >> 5) Takeover server notifies clients to reclaim locks:
> >>     shell> /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory
> >> 
> >
> > I don't understand statd and lockd as well as I should.  Where exactly
> > does the failover server stop serving requests, and the takeover server
> > start?  If this isn't done carefully, you can leave a window between
> > steps 3 and 4 where a client could acquire a lock before its rightful
> > owner reclaims it, right?
> >
> > 
> The detailed overall steps were described in the first email we sent a
> *long* time ago (> 6 months, I think). The first step of the whole
> process is tearing down the floating IP on the failover server. The IP
> is not accessible until the filesystem has safely failed over and
> SM_NOTIFY is ready to be sent.

I thought this was a solution for an active-active server, where a cluster
file system can export the same file system from multiple NFS servers.
Marc.

> 
> The last round of discussion gave me the impression that as long as I
> rebased the code onto akpm's mm tree, these patches would get accepted.
> So I was quite careless in this submission and just realized people
> have very short memories :) .. I will do the write-up and put it
> somewhere so we don't need to go through this again.
> 
> -- Wendy

* Re: [Cluster-devel] Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-25 15:19       ` Wendy Cheng
@ 2007-04-25 15:39         ` Wendy Cheng
  0 siblings, 0 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-25 15:39 UTC (permalink / raw)
  To: cluster-devel, nfs

Wendy Cheng wrote:
> Marc Eshel wrote:
>>> The detailed overall steps were described in the first email we sent a
>>> *long* time ago (> 6 months, I think). The first step of the whole
>>> process is tearing down the floating IP on the failover server. The IP
>>> is not accessible until the filesystem has safely failed over and
>>> SM_NOTIFY is ready to be sent.
>>
>> I thought this was a solution for an active-active server, where a
>> cluster file system can export the same file system from multiple NFS
>> servers.
>> Marc.
>>
>
> Yes ... but remember we have two cases here:
>
> 1) Local filesystems such as ext3 - both the IP and the filesystem are
> not accessible during the transition.
> 2) Cluster filesystems such as GFS or GPFS - the filesystem may still
> be accessible (depending on the configuration; say you have advertised
> two exported IP addresses, each serving different subdirectories of
> the very same cluster filesystem). The failover IP should be suspended
> during the transition until SM_NOTIFY is ready to go out (but the
> other IP should stay up and service requests as it should).
>
> I assume people understand that the affected export entries should
> have been un-exported (as part of the overall process).

To remind people, we described the overall steps in the first round of
kernel interface discussions submitted to the nfs mailing list (before
we implemented the code). These steps included un-exporting the entries,
tearing down the IP, filesystem migration, SM_NOTIFY, the grace period,
etc. I'm in the middle of redoing the write-up. It will be uploaded
somewhere soon.

-- Wendy



* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-25 15:59     ` J. Bruce Fields
@ 2007-04-25 15:52       ` Wendy Cheng
  0 siblings, 0 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-25 15:52 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: cluster-devel, Lon Hohberger, nfs

J. Bruce Fields wrote:
> If practical, it would be helpful to have any such documentation in the
> final version of the patches, though, either as patch comments or (maybe
> better in this case) as comments in the code or in Documentation/.  When
> someone needs to go back and find out how this was all meant to work,
> it'll be easier to find in the source tree or the git history than in
> the mail archives.
>
>   
I'm correcting the oversight at this moment ... I had been working in
cluster-only development groups for too long and forgot that the NFS
mailing list is a community of people from different backgrounds.

-- Wendy



* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-25 14:10   ` Wendy Cheng
  2007-04-25 15:21     ` Marc Eshel
@ 2007-04-25 15:59     ` J. Bruce Fields
  2007-04-25 15:52       ` Wendy Cheng
  1 sibling, 1 reply; 42+ messages in thread
From: J. Bruce Fields @ 2007-04-25 15:59 UTC (permalink / raw)
  To: Wendy Cheng; +Cc: cluster-devel, Lon Hohberger, nfs

On Wed, Apr 25, 2007 at 10:10:31AM -0400, Wendy Cheng wrote:
> The detailed overall steps were described in the first email we sent a
> *long* time ago (> 6 months, I think). The first step of the whole
> process is tearing down the floating IP on the failover server. The IP
> is not accessible until the filesystem has safely failed over and
> SM_NOTIFY is ready to be sent.

I understand, thanks.

> The last round of discussion gave me the impression that as long as I
> rebased the code onto akpm's mm tree, these patches would get accepted.
> So I was quite careless in this submission and just realized people
> have very short memories :) .. I will do the write-up and put it
> somewhere so we don't need to go through this again.

Yeah, apologies for the short memory.  I'll try to follow more closely
from now on!

If practical, it would be helpful to have any such documentation in the
final version of the patches, though, either as patch comments or (maybe
better in this case) as comments in the code or in Documentation/.  When
someone needs to go back and find out how this was all meant to work,
it'll be easier to find in the source tree or the git history than in
the mail archives.

--b.


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-24  5:52       ` Neil Brown
@ 2007-04-26  4:35         ` Wendy Cheng
  2007-04-26  5:43           ` Neil Brown
  0 siblings, 1 reply; 42+ messages in thread
From: Wendy Cheng @ 2007-04-26  4:35 UTC (permalink / raw)
  To: Neil Brown; +Cc: cluster-devel, nfs

Neil Brown wrote:

>On Monday April 23, wcheng@redhat.com wrote:
>  
>
>>Neil Brown wrote:
>>
>>[snip]
>>
>>             We started the discussion using the network interface (to
>>drop the locks) but found it wouldn't work well on local filesystems
>>such as ext3. There is really no control over which local (server-side)
>>interface NFS clients will use (though it shouldn't be hard to
>>implement one). When the fail-over server starts to remove the locks,
>>it needs a way to find *all* of the locks associated with the
>>will-be-moved partition. This is to allow umount to succeed. The
>>server IP address alone can't guarantee that. That was the reason we
>>switched to fsid. Also remember this is NFS v2/v3 - clients have no
>>knowledge of server migration.
>>    
>>
>[snip]
>
>So it seems to me we do know exactly the list of local-addresses that
>could possibly be associated with locks on a given filesystem.  They
>are exactly the IP addresses that are publicly acknowledged to be
>usable for that filesystem.
>And if any client tries to access the filesystem using a different IP
>address then they are doing the wrong thing and should be reformatted.
>  
>

A convincing argument ... unfortunately, this happens to be a case where
we need to protect the server from clients' misbehavior. For a local
filesystem (ext3), if any file reference count is non-zero (i.e. some
clients are still holding locks), the filesystem can't be unmounted. We
would have to fail the failover to avoid data corruption.

>Maybe the idea of using network addresses was the first suggestion,
>and maybe it was rejected for the reasons you give, but it doesn't
>currently seem like those reasons are valid.  Maybe those who proposed
>those reasons (and maybe that was me) couldn't see the big picture at
>the time...
>  
>

This debate has been (so far) tolerable and helpful - so I'm not going
to comment on that paragraph :) ... But I have to remind people that my
first proposal was adding new flags to the export command (say
"exportfs -ud" to unexport and drop locks, and "exportfs -g" to
re-export and start the grace period). Then we moved to echoing a
network address into procfs, and later switched to the "fsid" approach.
A very long journey ...

>  
>
>>>  The reply to SM_MON (currently completely ignored by all versions
>>>  of Linux) has an extra value which indicates how many more seconds
>>>  of grace period there is to go.  This can be stuffed into res_stat
>>>  maybe.
>>>  Places where we currently check 'nlmsvc_grace_period', get moved to
>>>  *after* the nlmsvc_retrieve_args call, and the grace_period value
>>>  is extracted from host->nsm.
>>> 
>>>
>>>      
>>>
>>OK with me, but I don't see the advantages though?
>>    
>>
>
>So we can have a different grace period for each different 'host'.
>  
>

IMHO, having a grace period for each client (host) is overkill.

> [snip]
>
>Part of unmounting the filesystem from Server A requires getting
>Server A to drop all the locks on the filesystem.  We know they can
>only be held by clients that sent requests to a given set of IP
>addresses.   Lockd created an 'nsm' for each client/local-IP pair and
>registered each of those with statd.  The information registered with
>statd includes the details of an RPC call that can be made to lockd to
>tell it to drop all the locks owned by that client/local-IP pair.
>
>The statd in 1.1.0 records all this information in the files created
>in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
>So when it is time to unmount the filesystem, some program can look
>through all the files in nfs/sm, read each of the lines, find those
>which relate to any of the local IP addresses that we want to move, and
>initiate the RPC callback described on that line.  This will tell
>lockd to drop those locks.  When all the RPCs have been sent, lockd
>will not hold any locks on that filesystem any more.
>  
>

Bright idea! But it doesn't solve the issue of misbehaving clients who come 
in from unwanted (server) interfaces, does it?

>
>[snip]
>I feel it has taken me quite a while to gain a full understanding of
>what you are trying to achieve.  Maybe it would be useful to have a
>concise/precise description of what the goal is.
>I think a lot of the issues have now become clear, but it seems there
>remains the issue of what system-wide configurations are expected, and
>what configuration we can rule 'out of scope' and decide we don't have
>to deal with.
>  
>
I'm trying to do the write-up now. But could the following temporarily 
serve the purpose? What is not clear from this thread of discussion?

http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html


-- Wendy


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-26  4:35         ` Wendy Cheng
@ 2007-04-26  5:43           ` Neil Brown
  2007-04-27  2:24             ` Wendy Cheng
  0 siblings, 1 reply; 42+ messages in thread
From: Neil Brown @ 2007-04-26  5:43 UTC (permalink / raw)
  To: wcheng; +Cc: cluster-devel, nfs

On Thursday April 26, wcheng@redhat.com wrote:
> 
> A convincing argument... unfortunately, this happens to be a case where 
> we need to protect the server from clients' misbehavior. For a local 
> filesystem (ext3), if any file reference count is non-zero (i.e. some 
> clients are still holding locks), the filesystem can't be unmounted. We 
> would have to fail the failover to avoid data corruption.

I think this is a tangential problem.
"removing locks held by troublesome clients so that I can unmount my
filesystem" is quite different from "remove locks held by client
clients using virtual-NAS-foo so they can be migrated".

I would have no problem with a new file in the nfsd filesystem such
that
      echo /some/path > /proc/fs/nfsd/nlm_drop_locks
would cause lockd to drop all locks on all files with the same 'struct
super_block' as "/some/path"->i_sb.
But I think that is independent functionality, that might be useful to
people who aren't doing active-active failover, but happens also to be
useful in conjunction with active-active failover.

We could discuss whether it should be "same superblock" or "same
vfsmount".  Both make sense to some extent.  The latter is possibly
more flexible.
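
For concreteness, a rough sketch of what the kernel side might look
like - this is not real code; nlm_unlock_sb() is an invented helper,
and error handling is minimal:

    static int nlm_drop_locks_on_path(const char *pathname)
    {
            struct nameidata nd;
            int err;

            /* resolve the written path (2.6.21-era lookup API) */
            err = path_lookup(pathname, LOOKUP_FOLLOW, &nd);
            if (err)
                    return err;

            /* invented helper: ask lockd to release every lock it
             * holds on a file belonging to this superblock */
            err = nlm_unlock_sb(nd.dentry->d_inode->i_sb);

            path_release(&nd);
            return err;
    }

The "same vfsmount" variant would compare nd.mnt instead of the
superblock.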

If you had this interface, you might not need to send the various RPC
calls to lockd to get it to drop locks.... but then if you had a
cluster filesystem and wanted to only move some clients to a different
host, you would not want to drop *all* the locks on the filesystem, so
maybe both interfaces are still needed.

> 
> IMHO, having a grace period for each client (host) is overkill.

Yes, it gives you much more flexibility than you would ever want or
use, and in that sense it is overkill.
But it also makes available the specific flexibility that you do want
(grace period per local-address) with an extremely simple change to the
lockd interface, which I think is a big win.
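
To illustrate, the per-host test would be roughly as cheap as this
(sketch only - 'grace_expires' is an invented field, following the
host->nsm shorthand used earlier in this thread, that would be filled
in from the SM_MON reply):

    /* does this client still get grace-period treatment? */
    static inline int nlmsvc_host_in_grace(struct nlm_host *host)
    {
            return time_before(jiffies, host->nsm->grace_expires);
    }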

> >
> >[snip]
> >I feel it has taken me quite a while to gain a full understanding of
> >what you are trying to achieve.  Maybe it would be useful to have a
> >concise/precise description of what the goal is.
> >I think a lot of the issues have now become clear, but it seems there
> >remains the issue of what system-wide configurations are expected, and
> >what configuration we can rule 'out of scope' and decide we don't have
> >to deal with.
> >  
> >
> I'm trying to do the write-up now. But could the following temporarily 
> serve the purpose? What is not clear from this thread of discussion?
> 
> http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html

Lots of things are not clear - mostly things that have since become
clear in the ongoing discussion.
 - The many-IPs to many-filesystems possibility
 - The need to explicitly handle mis-configured clients
 - The details of needs with respect to SM_NOTIFY callbacks
 - the "big picture" stuff.

I confess that I had a much more shallow understanding of how statd
interacts with lockd  when this discussion first started.
I'm sure that slowed me down in understanding the key issues, and in
suggesting workable possibilities.

I am sorry that this has taken so long.  However I think we are very
close to a solution that will solve everybody's needs.  And you've
found some bugs along the way!!

NeilBrown


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-26  5:43           ` Neil Brown
@ 2007-04-27  2:24             ` Wendy Cheng
  2007-04-27  6:00               ` Neil Brown
  0 siblings, 1 reply; 42+ messages in thread
From: Wendy Cheng @ 2007-04-27  2:24 UTC (permalink / raw)
  To: Neil Brown; +Cc: cluster-devel, nfs

Neil Brown wrote:

>On Thursday April 26, wcheng@redhat.com wrote:
>  
>
>>A convincing argument... unfortunately, this happens to be a case where 
>>we need to protect the server from clients' misbehavior. For a local 
>>filesystem (ext3), if any file reference count is non-zero (i.e. some 
>>clients are still holding locks), the filesystem can't be unmounted. We 
>>would have to fail the failover to avoid data corruption.
>>    
>>
>
>I think this is a tangential problem.
>"removing locks held by troublesome clients so that I can unmount my
>filesystem" is quite different from "remove locks held by client
>clients using virtual-NAS-foo so they can be migrated".
>  
>
The reason to unmount is because we want to migrate the virtual IP. IMO 
they are the same issue but it is silly to keep fighting about this. In 
any case, one interface is better than two, if you allow me to insist on 
this.

So how about we do an RPC call to lockd telling it to drop the locks owned 
by the client/local-IP pair as you proposed, *but* add an "OR" with fsid 
to foolproof the process? Say something like this:

RPC_to_lockd_with(client_host, client_ip, fsid);

if ((host == client_host && vip == client_ip) ||
    (get_fsid(file) == client_fsid))
        drop_the_locks();

This logic (RPC to lockd) will be triggered by a new command added to 
the nfs-utils package.

If we can agree on this, the rest would be easy. Done?

-- Wendy


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27  2:24             ` Wendy Cheng
@ 2007-04-27  6:00               ` Neil Brown
  2007-04-27 11:15                 ` Jeff Layton
  0 siblings, 1 reply; 42+ messages in thread
From: Neil Brown @ 2007-04-27  6:00 UTC (permalink / raw)
  To: wcheng; +Cc: cluster-devel, nfs

On Thursday April 26, wcheng@redhat.com wrote:
> Neil Brown wrote:
> 
> >On Thursday April 26, wcheng@redhat.com wrote:
> >  
> >
> >>A convincing argument... unfortunately, this happens to be a case where 
> >>we need to protect the server from clients' misbehavior. For a local 
> >>filesystem (ext3), if any file reference count is non-zero (i.e. some 
> >>clients are still holding locks), the filesystem can't be unmounted. We 
> >>would have to fail the failover to avoid data corruption.
> >>    
> >>
> >
> >I think this is a tangential problem.
> >"removing locks held by troublesome clients so that I can unmount my
> >filesystem" is quite different from "remove locks held by client
> >clients using virtual-NAS-foo so they can be migrated".
> >  
> >
> The reason to unmount is because we want to migrate the virtual IP.

The reason to unmount is because we want to migrate the filesystem. In
your application that happens at the same time as migrating the
virtual IP, but they are still distinct operations.
 
>                                                                     IMO 
> they are the same issue but it is silly to keep fighting about this. In 
> any case, one interface is better than two, if you allow me to insist on 
> this.

How many interfaces depends somewhat on how many jobs there are to do.
You want to destroy state that will be rebuilt on a different server,
and you want to force-unmount a filesystem.  Two different jobs. Two
interfaces seem OK.
If they could both be done with one simple interface that would be
ideal, but I'm not sure they can.

And no-one gets to insist on anything.
You are writing the code.  I am accepting/rejecting it.  We both need
to agree or we won't move forward.  (Well... I could just write code
myself, but I don't plan to do that).

> 
> So how about we do an RPC call to lockd telling it to drop the locks owned 
> by the client/local-IP pair as you proposed, *but* add an "OR" with fsid 
> to foolproof the process? Say something like this:
> 
> RPC_to_lockd_with(client_host, client_ip, fsid);
> 
> if ((host == client_host && vip == client_ip) ||
>     (get_fsid(file) == client_fsid))
>         drop_the_locks();
> 
> This logic (RPC to lockd) will be triggered by a new command added to 
> nfs-util package.
> 
> If we can agree on this, the rest would be easy. Done ?

Sorry, but we cannot agree with this, and I think the rest is still
easy.

The more I think about it, the less I like the idea of using an fsid.
The fsid concept was created simply because we needed something that
would fit inside a filehandle.  I think that is the only place it
should be used.
Outside of filehandles, we have a perfectly good and well-understood
mechanism for identifying files and filesystems.  It is a "path name".
The functionality "drop all locks held by lockd on a particular
filesystem" is potentially useful outside of any fail-over
configuration, and should work on any filesystem, not just one that
was exported with 'fsid='.

So if you need that, then I think it really must be implemented by
something a lot like
   echo -n /path/name > /proc/fs/nfsd/nlm_unlock_filesystem

This is something that we could possibly teach "fuser -k" about - so
it can effectively 'kill' that part of lockd that is accessing a given
filesystem.  It is useful for failover, but definitely useful beyond
failover.
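
From C, the equivalent of that "echo -n" is just writing the exact
path bytes with no trailing newline - a sketch against the proposed
(not yet existing) file:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    /* write the path with no trailing newline, like "echo -n" */
    static int nlm_unlock_filesystem(const char *path)
    {
            int fd = open("/proc/fs/nfsd/nlm_unlock_filesystem", O_WRONLY);
            int ok;

            if (fd < 0)
                    return -1;
            ok = write(fd, path, strlen(path)) == (ssize_t)strlen(path);
            close(fd);
            return ok ? 0 : -1;
    }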


Everything else can be done in the RPC interface between lockd and
statd, leveraging the "my_name" field to identify state based on which
local network address was used.  All this other functionality is
completely agnostic about the particular filesystem and just looks at
the virtual IP that was used.
All this other functionality is all that you need unless you have a
misbehaving client.
You would do all the lockd/statd/rpc stuff.  Then try to unmount the
filesystem.  If that fails, try "fuser -k -m /whatever" and try the
unmount again.

Another interface alternative might be to hook into
umount(MNT_FORCE), but that would require even broader review, and
probably isn't worth it....

NeilBrown


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27  6:00               ` Neil Brown
@ 2007-04-27 11:15                 ` Jeff Layton
  2007-04-27 12:40                   ` Neil Brown
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff Layton @ 2007-04-27 11:15 UTC (permalink / raw)
  To: Neil Brown; +Cc: cluster-devel, nfs

On Fri, Apr 27, 2007 at 04:00:13PM +1000, Neil Brown wrote:
> On Thursday April 26, wcheng@redhat.com wrote:
> > Neil Brown wrote:
> > 
> > >On Thursday April 26, wcheng@redhat.com wrote:
> > >  
> > >
> > >>A convincing argument... unfortunately, this happens to be a case where 
> > >>we need to protect the server from clients' misbehavior. For a local 
> > >>filesystem (ext3), if any file reference count is non-zero (i.e. some 
> > >>clients are still holding locks), the filesystem can't be unmounted. We 
> > >>would have to fail the failover to avoid data corruption.
> > >>    
> > >>
> > >
> > >I think this is a tangential problem.
> > >"removing locks held by troublesome clients so that I can unmount my
> > >filesystem" is quite different from "remove locks held by client
> > >clients using virtual-NAS-foo so they can be migrated".
> > >  
> > >
> > The reason to unmount is because we want to migrate the virtual IP.
> 
> The reason to unmount is because we want to migrate the filesystem. In
> your application that happens at the same time as migrating the
> virtual IP, but they are still distinct operations.
>  
> >                                                                     IMO 
> > they are the same issue but it is silly to keep fighting about this. In 
> > any case, one interface is better than two, if you allow me to insist on 
> > this.
> 
> How many interfaces depends somewhat on how many jobs there are to do.
> You want to destroy state that will be rebuilt on a different server,
> and you want to force-unmount a filesystem.  Two different jobs. Two
> interfaces seem OK.
> If they could both be done with one simple interface that would be
> ideal, but I'm not sure they can.
> 
> And no-one gets to insist on anything.
> You are writing the code.  I am accepting/rejecting it.  We both need
> to agree or we won't move forward.  (Well... I could just write code
> myself, but I don't plan to do that).
> 
> > 
> > So how about we do an RPC call to lockd telling it to drop the locks owned 
> > by the client/local-IP pair as you proposed, *but* add an "OR" with fsid 
> > to foolproof the process? Say something like this:
> > 
> > RPC_to_lockd_with(client_host, client_ip, fsid);
> > 
> > if ((host == client_host && vip == client_ip) ||
> >     (get_fsid(file) == client_fsid))
> >         drop_the_locks();
> > 
> > This logic (RPC to lockd) will be triggered by a new command added to 
> > nfs-util package.
> > 
> > If we can agree on this, the rest would be easy. Done ?
> 
> Sorry, but we cannot agree with this, and I think the rest is still
> easy.
> 
> The more I think about it, the less I like the idea of using an fsid.
> The fsid concept was created simply because we needed something that
> would fit inside a filehandle.  I think that is the only place it
> should be used.
> Outside of filehandles, we have a perfectly good and well-understood
> mechanism for identifying files and filesystems.  It is a "path name".
> The functionality "drop all locks held by lockd on a particular
> filesystem" is potentially useful outside of any fail-over
> configuration, and should work on any filesystem, not just one that
> was exported with 'fsid='.
> 
> So if you need that, then I think it really must be implemented by
> something a lot like
>    echo -n /path/name > /proc/fs/nfsd/nlm_unlock_filesystem
> 
> This is something that we could possibly teach "fuser -k" about - so
> it can effectively 'kill' that part of lockd that is accessing a given
> filesystem.  It is useful for failover, but definitely useful beyond
> failover.

Just a note that I posted a patch ~ a year ago that did precisely that. The
interface was a little bit different. I had userspace echoing in a dev_t
number, but it wouldn't be too hard to change it to use a pathname instead.

Subject was:

    [PATCH] lockd: add procfs control to cue lockd to release all locks on a device   

...if anyone is interested in having me resurrect it.
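
For reference, the userspace side amounted to something like this (the
procfs file name here is illustrative - it is not necessarily the one
the original patch used):

    #include <stdio.h>
    #include <sys/stat.h>

    /* look up the device number behind a mount point and echo it
     * into the (illustrative) procfs control file */
    static int drop_locks_on_device(const char *mountpoint)
    {
            struct stat st;
            FILE *f;

            if (stat(mountpoint, &st) != 0)
                    return -1;
            f = fopen("/proc/fs/lockd/unlock_device", "w");
            if (!f)
                    return -1;
            fprintf(f, "%lu\n", (unsigned long)st.st_dev);
            return fclose(f) == 0 ? 0 : -1;
    }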

-- Jeff


> 
> 
> Everything else can be done in the RPC interface between lockd and
> statd, leveraging the "my_name" field to identify state based on which
> local network address was used.  All this other functionality is
> completely agnostic about the particular filesystem and just looks at
> the virtual IP that was used.
> All this other functionality is all that you need unless you have a
> misbehaving client.
> You would do all the lockd/statd/rpc stuff.  Then try to unmount the
> filesystem.  If that fails, try "fuser -k -m /whatever" and try the
> unmount again.
> 
> Another interface alternative might be to hook in to
> umount(MNT_FORCE), but that would require even broader review, and
> probably isn't worth it....
> 
> NeilBrown
> 


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 11:15                 ` Jeff Layton
@ 2007-04-27 12:40                   ` Neil Brown
  2007-04-27 13:42                     ` Jeff Layton
  0 siblings, 1 reply; 42+ messages in thread
From: Neil Brown @ 2007-04-27 12:40 UTC (permalink / raw)
  To: Jeff Layton; +Cc: cluster-devel, nfs

On Friday April 27, jlayton@poochiereds.net wrote:
> On Fri, Apr 27, 2007 at 04:00:13PM +1000, Neil Brown wrote:
> > 
> > So if you need that, then I think it really must be implemented by
> > something a lot like
> >    echo -n /path/name > /proc/fs/nfsd/nlm_unlock_filesystem
> > 
> > This is something that we could possibly teach "fuser -k" about - so
> > it can effectively 'kill' that part of lockd that is accessing a given
> > filesystem.  It is useful for failover, but definitely useful beyond
> > failover.
> 
> Just a note that I posted a patch ~ a year ago that did precisely that. The
> interface was a little bit different. I had userspace echoing in a dev_t
> number, but it wouldn't be too hard to change it to use a pathname instead.
> 
> Subject was:
> 
>     [PATCH] lockd: add procfs control to cue lockd to release all locks on a device   
> 
> ...if anyone is interested in having me resurrect it.
> 
> -- Jeff

http://lkml.org/lkml/2006/4/10/240

Looks like no-one ever replied.
I probably didn't see it:  things on linux-kernel that don't have
'nfs' or 'raid' (or a few related strings) in the subject have at best
an even chance of me seeing them.  I've just added 'lockd' to the list
of important strings :-)

I would rather a path name, and would rather it came through the
'nfsd' filesystem, but those are fairly trivial changes.

nlm_traverse_files has changed a bit since then, but it should be
easier to unlock based on filesystem with the current code
(especially if we made the first arg a void*..).
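
Roughly like this - the signatures are invented for illustration (the
real nlm_traverse_files() currently takes an nlm_host, not a cookie):

    /* illustrative match callback: keep only files on the given
     * superblock; 'datap' would be the new void* first argument */
    static int nlm_file_on_sb(void *datap, struct nlm_file *file)
    {
            struct super_block *sb = datap;

            return file->f_file->f_dentry->d_inode->i_sb == sb;
    }

    /* ... and then something like:
     *         nlm_traverse_files(sb, nlm_file_on_sb);
     */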

NeilBrown


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 12:40                   ` Neil Brown
@ 2007-04-27 13:42                     ` Jeff Layton
  2007-04-27 14:17                       ` Christoph Hellwig
  2007-04-27 15:12                       ` Jeff Layton
  0 siblings, 2 replies; 42+ messages in thread
From: Jeff Layton @ 2007-04-27 13:42 UTC (permalink / raw)
  To: Neil Brown, Wendy Cheng; +Cc: cluster-devel, nfs

On Fri, Apr 27, 2007 at 10:40:16PM +1000, Neil Brown wrote:
> On Friday April 27, jlayton@poochiereds.net wrote:
> > On Fri, Apr 27, 2007 at 04:00:13PM +1000, Neil Brown wrote:
> > > 
> > > So if you need that, then I think it really must be implemented by
> > > something a lot like
> > >    echo -n /path/name > /proc/fs/nfsd/nlm_unlock_filesystem
> > > 
> > > This is something that we could possibly teach "fuser -k" about - so
> > > it can effectively 'kill' that part of lockd that is accessing a given
> > > filesystem.  It is useful for failover, but definitely useful beyond
> > > failover.
> > 
> > Just a note that I posted a patch ~ a year ago that did precisely that. The
> > interface was a little bit different. I had userspace echoing in a dev_t
> > number, but it wouldn't be too hard to change it to use a pathname instead.
> > 
> > Subject was:
> > 
> >     [PATCH] lockd: add procfs control to cue lockd to release all locks on a device   
> > 
> > ...if anyone is interested in having me resurrect it.
> > 
> > -- Jeff
> 
> http://lkml.org/lkml/2006/4/10/240
> 
> Looks like no-one ever replied.
> I probably didn't see it:  things on linux-kernel that don't have
> 'nfs' or 'raid' (or a few related strings) in the subject have at best
> an even chance of me seeing them.  I've just added 'lockd' to the list
> of important strings :-)
> 
> I would rather a path name, and would rather it came through the
> 'nfsd' filesystem, but those are fairly trivial changes.
> 
> nlm_traverse_files has changed a bit since then, but it should be
> easier to unlock based on filesystem with the current code
> (especially if we made the first arg a void*..).
> 
> NeilBrown
> 

Ok, I'll toss cleaning that patch up and reposting it on to my to-do list...

Wendy, it seems like you had some objection to this patch at the time, but
the nature of it escapes me. Do you recall what your concern with it was?

Thanks,
Jeff



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 13:42                     ` Jeff Layton
@ 2007-04-27 14:17                       ` Christoph Hellwig
  2007-04-27 15:42                         ` J. Bruce Fields
  2007-04-27 15:12                       ` Jeff Layton
  1 sibling, 1 reply; 42+ messages in thread
From: Christoph Hellwig @ 2007-04-27 14:17 UTC (permalink / raw)
  To: Jeff Layton; +Cc: Neil Brown, cluster-devel, nfs

On Fri, Apr 27, 2007 at 09:42:48AM -0400, Jeff Layton wrote:
> Ok, I'll toss cleaning that patch up and reposting it on to my to-do list...
> 
> Wendy, it seems like you had some objection to this patch at the time, but
> the nature of it escapes me. Do you recall what your concern with it was?

I like the idea of the patch.  Instead of writing a dev_t into procfs
it should probably be changed to write a path to a new file in the
nfsctl filesystem.   In fact couldn't this be treated as a reexport
with a NFSEXP_ flag meaning drop all locks to avoid creating new
interfaces?



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 13:42                     ` Jeff Layton
  2007-04-27 14:17                       ` Christoph Hellwig
@ 2007-04-27 15:12                       ` Jeff Layton
  1 sibling, 0 replies; 42+ messages in thread
From: Jeff Layton @ 2007-04-27 15:12 UTC (permalink / raw)
  To: Neil Brown, Wendy Cheng, cluster-devel, nfs

On Fri, Apr 27, 2007 at 09:42:48AM -0400, Jeff Layton wrote:
> Ok, I'll toss cleaning that patch up and reposting it on to my to-do list...
> 
> Wendy, it seems like you had some objection to this patch at the time, but
> the nature of it escapes me. Do you recall what your concern with it was?
> 
> Thanks,
> Jeff
> 

Actually, on second thought, I'm going to leave this in Wendy's capable
hands. The changes to the existing code and the interface changes needed
are probably enough that this patch would be rewritten from scratch
anyway...

Cheers,
Jeff



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 15:42                         ` J. Bruce Fields
@ 2007-04-27 15:36                           ` Wendy Cheng
  2007-04-27 16:31                             ` J. Bruce Fields
  2007-04-27 20:34                             ` Frank van Maarseveen
  0 siblings, 2 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-27 15:36 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Christoph Hellwig, Neil Brown, cluster-devel, nfs, Jeff Layton

J. Bruce Fields wrote:
> On Fri, Apr 27, 2007 at 03:17:10PM +0100, Christoph Hellwig wrote:
>   
>> In fact couldn't this be treated as a reexport with a NFSEXP_ flag
>> meaning drop all locks to avoid creating new interfaces?
>>     
>
> Off hand, I can't see any reason why that wouldn't work.  The code to
> handle it would probably go in fs/nfsd/export.c:svc_export_parse().
>
>   
Sigh :( ... folks, here we go around the loop again. That *was* my first 
proposal ...

-- Wendy


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 14:17                       ` Christoph Hellwig
@ 2007-04-27 15:42                         ` J. Bruce Fields
  2007-04-27 15:36                           ` Wendy Cheng
  0 siblings, 1 reply; 42+ messages in thread
From: J. Bruce Fields @ 2007-04-27 15:42 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Neil Brown, cluster-devel, nfs, Jeff Layton

On Fri, Apr 27, 2007 at 03:17:10PM +0100, Christoph Hellwig wrote:
> In fact couldn't this be treated as a reexport with a NFSEXP_ flag
> meaning drop all locks to avoid creating new interfaces?

Off hand, I can't see any reason why that wouldn't work.  The code to
handle it would probably go in fs/nfsd/export.c:svc_export_parse().
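
Very roughly, and purely hypothetical - neither the flag nor the helper
below exists - the handling inside svc_export_parse() might look like:

    /* after the export flags have been parsed; NFSEXP_DROPLOCKS and
     * nlm_unlock_sb() are invented names */
    if (exp.ex_flags & NFSEXP_DROPLOCKS) {
            /* it's an action, not state: act on it, then drop the bit
             * so it is never stored as if it were a state flag */
            nlm_unlock_sb(exp.ex_dentry->d_inode->i_sb);
            exp.ex_flags &= ~NFSEXP_DROPLOCKS;
    }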

--b.


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 15:36                           ` Wendy Cheng
@ 2007-04-27 16:31                             ` J. Bruce Fields
  2007-04-27 22:22                               ` Neil Brown
  2007-04-27 20:34                             ` Frank van Maarseveen
  1 sibling, 1 reply; 42+ messages in thread
From: J. Bruce Fields @ 2007-04-27 16:31 UTC (permalink / raw)
  To: Wendy Cheng
  Cc: Christoph Hellwig, Neil Brown, cluster-devel, nfs, Jeff Layton

On Fri, Apr 27, 2007 at 11:36:16AM -0400, Wendy Cheng wrote:
> J. Bruce Fields wrote:
> >On Fri, Apr 27, 2007 at 03:17:10PM +0100, Christoph Hellwig wrote:
> >  
> >>In fact couldn't this be treated as a reexport with a NFSEXP_ flag
> >>meaning drop all locks to avoid creating new interfaces?
> >>    
> >
> >Off hand, I can't see any reason why that wouldn't work.  The code to
> >handle it would probably go in fs/nfsd/export.c:svc_export_parse().
> >
> >  
> Sigh :( ... folks, here we go around the loop again. That *was* my first 
> proposal ...

So you're talking about this and followups?:

	http://marc.info/?l=linux-nfs&m=115009204513790&w=2

I just took a look and couldn't find any complaints about that
approach.  Were they elsewhere?

I understand the frustration.  There's a balance between, on the one
hand, being willing to throw out some hard work and start over if
someone comes up with a real objection, and, on the other hand, sticking
to a design when you're convinced it's right.

I *really* appreciate good review, but I also try to avoid doing
something I don't like just because it seems to be the only way to make
somebody else happy....  If they've got a real point then I should be
able to understand it.  If not, then I risk doing all the work to make
them happy just to throw it away because I can't defend the approach in
the end, or because I find out I misunderstood their original point.

(Then again, sometimes I do just have to trust somebody.  And sometimes
I guess learning who can be trusted about what is part of the process.)

In this case I think the complaint about requiring fsid's on everything
is legitimate, and the original approach of using the export was sensible.
But I haven't been paying as much attention as I should have, and I
probably missed something.

--b.


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 15:36                           ` Wendy Cheng
  2007-04-27 16:31                             ` J. Bruce Fields
@ 2007-04-27 20:34                             ` Frank van Maarseveen
  2007-04-28  3:55                               ` Wendy Cheng
  1 sibling, 1 reply; 42+ messages in thread
From: Frank van Maarseveen @ 2007-04-27 20:34 UTC (permalink / raw)
  To: Wendy Cheng
  Cc: Christoph Hellwig, cluster-devel, Jeff Layton, Neil Brown,
	J. Bruce Fields, nfs

On Fri, Apr 27, 2007 at 11:36:16AM -0400, Wendy Cheng wrote:
> J. Bruce Fields wrote:
> > On Fri, Apr 27, 2007 at 03:17:10PM +0100, Christoph Hellwig wrote:
> >   
> >> In fact couldn't this be treated as a reexport with a NFSEXP_ flag
> >> meaning drop all locks to avoid creating new interfaces?
> >>     
> >
> > Off hand, I can't see any reason why that wouldn't work.  The code to
> > handle it would probably go in fs/nfsd/export.c:svc_export_parse().
> >
> >   
> Sigh :( ... folks, here we go around the loop again. That *was* my first 
> proposal ...

I'm quite interested in _any_ patch which would allow me to drop
the locks obtained by NFS clients on a specific export, either by (1)
"exportfs -uh" or by (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock"
as Neil mentioned.
I want to migrate virtual(*) NFS servers _including_ the locks without
having to tear down the whole machine. In my case "migration" is a sort
of scheduled failover: no HA or clusters involved.

At first, the "exportfs -uh" proposal (maybe fsid driven) seems "the
right thing" because after unexporting there's no valid case for
preserving the locks AFAICS. Unexport implies EACCES for subsequent NFS
accesses anyway, and unexporting /cdrom for example is _required_ in
order to be able to umount and eject the thing. As it stands today,
unexporting is not even enough to be able to unmount it, and that's not
good. (The need to unexport a /cdrom before being able to eject it is
actually a problem of its own -- a separate issue.)

So why not always drop the locks upon unexport? A stupid question, of
course, because exporting anything will not send out any sm notifications,
so that would break symmetry.


I'd prefer (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock" because:

- Tying the -h (drop locks) option to -u (unexport) is too restrictive IMO.
  For one thing, there's a bug in the Linux NFS client locking code (I
  reported this earlier) which results in locks not being removed from
  the server. It was not too difficult to reproduce, and programs on the
  client will wait forever due to this. To handle these kinds of situations
  I need (2) on the server.

- (2) may be useful for other NFS server setups: it is inherently more
  flexible.

- (2) does not depend on nfs-utils. It's simpler.


(*) virtual in this case means a UUID or IP based fsid= option and an
additional IP address on eth0 per export entry, such, that it becomes
possible to move an export entry + disk partition + local mount to
different hardware without needing to remount it on all <large number>
NFS clients.

-- 
Frank


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 16:31                             ` J. Bruce Fields
@ 2007-04-27 22:22                               ` Neil Brown
  2007-04-29 20:13                                 ` J. Bruce Fields
  0 siblings, 1 reply; 42+ messages in thread
From: Neil Brown @ 2007-04-27 22:22 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, cluster-devel, nfs, Jeff Layton

On Friday April 27, bfields@fieldses.org wrote:
> On Fri, Apr 27, 2007 at 11:36:16AM -0400, Wendy Cheng wrote:
> > J. Bruce Fields wrote:
> > >On Fri, Apr 27, 2007 at 03:17:10PM +0100, Christoph Hellwig wrote:
> > >  
> > >>In fact couldn't this be treated as a reexport with a NFSEXP_ flag
> > >>meaning drop all locks to avoid creating new interfaces?
> > >>    
> > >
> > >Off hand, I can't see any reason why that wouldn't work.  The code to
> > >handle it would probably go in fs/nfsd/export.c:svc_export_parse().
> > >
> > >  
> > Sigh :( ... folks, here we go around the loop again. That *was* my first 
> > proposal ...

Yes, I grinned when I saw it too.
Your first proposal was actually a flag to "unexport", whereas
Christoph's seems to be a flag to "export".  So there is at least a
subtle difference.

A flag to unexport cannot work because we don't call unexport - we
just flush a kernel cache.

A flag to export is just .... weird.  All the other export flags are
state flags.  This would be an action flag.  They are quite different
things.   Setting a state flag again is a no-op.  Setting an action
flag again has a very real effect.

Also, each filesystem is potentially exported multiple times for
different sets of clients.  If such a flag (whether on 'export' or
'unexport') just said "remove locks from this set of clients" it
wouldn't meet the needs, and if it said "remove all locks" it would be
a very irregular interface.

> 
> So you're talking about this and followups?:
> 
> 	http://marc.info/?l=linux-nfs&m=115009204513790&w=2
> 
> I just took a look and couldn't find any complaints about that
> approach.  Were they elsewhere?

https://www.redhat.com/archives/linux-cluster/2006-June/msg00101.html

Is where I said that I don't like the unexport flag.

NeilBrown


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 20:34                             ` Frank van Maarseveen
@ 2007-04-28  3:55                               ` Wendy Cheng
  2007-04-28  4:51                                 ` Neil Brown
  0 siblings, 1 reply; 42+ messages in thread
From: Wendy Cheng @ 2007-04-28  3:55 UTC (permalink / raw)
  To: Frank van Maarseveen
  Cc: Christoph Hellwig, cluster-devel, Jeff Layton, Neil Brown,
	J. Bruce Fields, nfs

Frank van Maarseveen wrote:

>I'm quite interested in _any_ patch which would allow me to drop
>the locks obtained by NFS clients on a specific export, either by (1)
>"exportfs -uh" or by (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock"
>as Neil mentioned.
>  
>
Thanks for commenting on this. Opinions from users who will eventually 
use these interfaces are always valued.

>[snip]
>
>
>I'd prefer (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock" because:
>  
>
Converting the first patch of this submitted series from "fsid" to 
"/some/path" is a no-brainer, since we have gone through several rounds of 
similar changes. However, my questions (they are really questions for 
Neil) are, if I convert the first patch to do this:

1) Why do we still need the RPC drop-lock call in nfs-utils?
2) What should we do for the 2nd patch? i.e., how do we tell the 
takeover server that it is time for its action, by RPC call or by 
"echo /some/path > /proc/fs/nfsd/nlm_set_grace_or_whatever"?

In general, I feel that if we do this "/some/path" approach, we may as well 
simply convert the 2nd patch from "fsid" to "/some/path". Then we would 
finish this long journey.

-- Wendy

>- Tying the -h (drop locks) option to -u (unexport) is too restrictive IMO.
>  For one thing, there's a bug in the linux NFS client locking code (I
>  reported this earlier) which results in locks not being removed from
>  the server. It was not too difficult to reproduce and programs on the
>  client will wait forever due to this. To handle these kind of situations
>  I need (2) on the server.
>
>- (2) may be useful for other NFS server setups: it is inherently more
>  flexible.
>
>- (2) does not depend on nfs-utils. It's simpler.
>
>
>(*) virtual in this case means a UUID or IP based fsid= option and an
>additional IP address on eth0 per export entry, such, that it becomes
>possible to move an export entry + disk partition + local mount to
>different hardware without needing to remount it on all <large number>
>NFS clients.
>
>  
>



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-28  3:55                               ` Wendy Cheng
@ 2007-04-28  4:51                                 ` Neil Brown
  2007-04-28  5:26                                   ` Marc Eshel
  2007-04-28 12:33                                   ` Frank van Maarseveen
  0 siblings, 2 replies; 42+ messages in thread
From: Neil Brown @ 2007-04-28  4:51 UTC (permalink / raw)
  To: wcheng
  Cc: cluster-devel, Frank van Maarseveen, Jeff Layton,
	Christoph Hellwig, nfs, J. Bruce Fields

On Friday April 27, wcheng@redhat.com wrote:
> Frank van Maarseveen wrote:
> >
> >I'd prefer (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock" because:
> >  
> >
> Converting the first patch of this submitted series from "fsid" to 
> "/some/path" is a no-brainer, since we have gone through several rounds of 
> similar changes. However, my questions (they are really questions for 
> Neil) are, if I convert the first patch to do this:
> 
> 1) Why do we still need the RPC drop-lock call in nfs-utils?

Maybe we don't.
I can imagine a (probably hypothetical) situation where you want to
drop some but not all of the locks on a filesystem - if it is a
cluster-aware filesystem that several virtual-NAS's export, and you
want to move just one virtual-NAS.  But if you don't want to be able
to do that, you obviously don't have to.

> 2) What should we do for the 2nd patch? i.e., how do we tell the 
> takeover server that it is time for its action, by RPC call or by 
> "echo /some/path > /proc/fs/nfsd/nlm_set_grace_or_whatever"?

I'm happy with using a path name like this to restart the grace
period.  Where would you store the per-filesystem grace-period end?
I guess you would need a new little data structure indexed by
'struct super_block *'.  It would need to hold a reference
on the superblock until the grace period expired, wouldn't it?

It might seem 'obvious' to store it in 'struct svc_export', but there
can be several of these per filesystem, and more could be added after
you set the grace period.  So it would be messy to get that right.
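
Something like the following, perhaps - a sketch keyed on the
superblock as above (nothing here exists, and locking is omitted):

    /* one entry per filesystem currently in grace; the entry would
     * pin the superblock until the grace period ends */
    struct nlm_sb_grace {
            struct list_head        list;
            struct super_block      *sb;
            unsigned long           expires;        /* in jiffies */
    };

    static LIST_HEAD(nlm_sb_grace_list);

    static int nlm_sb_in_grace(struct super_block *sb)
    {
            struct nlm_sb_grace *g;

            list_for_each_entry(g, &nlm_sb_grace_list, list)
                    if (g->sb == sb)
                            return time_before(jiffies, g->expires);
            return 0;
    }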


> 
> In general, I feel if we do this "/some/path" approach, we may as well 
> simply convert the 2nd patch from "fsid" to "/some/path". Then we would 
> finish this long journey.

Certainly a lot closer.
If we are creating "nlm_drop_locks" and "nlm_set_grace" interfaces, we
should spend a few moments considering exactly what semantics they
should have.

In both cases we write a filename.  Presumably it must start with a
'/' and be null terminated, so you use "echo -n" rather than "echo".
After all, a filename can contain a newline.

Is there any extra info we might want to pass in or out at the same
time?

For nlm_drop_locks, we might also want to be able to query lock state -
"Do you hold any locks on this filesystem?".  Even "how many?".
For set_grace, we might want to ask how many seconds are left in the
grace period (I'm not sure how this info would be used, but it is
always nice to be able to read any value that you can write).

Does it make sense to have a single file with composite semantics?

We write
    XX/path/name
where XX can be:
    a number, to set seconds remaining in the grace period
    a '?' (or empty string) to query state
    a '-' to remove all locks (and cancel any grace period)
We then read back two numbers, the seconds remaining in the grace
period, and the number of locked files.
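
Parsing that format would be trivial - a sketch in userspace C just to
show the shape (all names invented):

    #include <stdlib.h>
    #include <string.h>

    enum grace_op { GRACE_QUERY, GRACE_SET, GRACE_DROP };

    /* split "XX/path/name" into an operation, optional seconds and
     * an optional path; a NULL path means "system-wide" */
    static enum grace_op parse_grace_cmd(const char *buf, long *seconds,
                                         const char **path)
    {
            *seconds = 0;
            *path = strchr(buf, '/');
            if (buf[0] == '-')
                    return GRACE_DROP;      /* drop locks, cancel grace */
            if (buf[0] == '?' || buf[0] == '/' || buf[0] == '\0')
                    return GRACE_QUERY;
            *seconds = strtol(buf, NULL, 10);
            return GRACE_SET;
    }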

Then we need to make sure we choose appropriate names.  I think that
the string 'lockd' makes more sense than 'nlm', as we are interacting
with the daemon, not configuring the protocol.  We might not need
either, as the file is inside /proc/fs/nfsd, so it is obviously
related to nfsd.
And if we can use the interface to query, then names like 'set' and
'drop' are probably misplaced.  Maybe "grace" and "locks".
If no path is given, the requests have system-wide effect.  If there
is a non-empty path, just that filesystem is queried/modified.

These are just possibilities.  I'm quite happy with either 1 or 2
files.  I just want to be sure a number of options have been
considered, and that a reasoned choice has been made.

NeilBrown


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-28  4:51                                 ` Neil Brown
@ 2007-04-28  5:26                                   ` Marc Eshel
  2007-04-28 12:33                                   ` Frank van Maarseveen
  1 sibling, 0 replies; 42+ messages in thread
From: Marc Eshel @ 2007-04-28  5:26 UTC (permalink / raw)
  To: Neil Brown
  Cc: cluster-devel, Jeff Layton, Christoph Hellwig, J. Bruce Fields,
	nfs, Frank van Maarseveen, nfs-bounces

nfs-bounces@lists.sourceforge.net wrote on 04/27/2007 09:51:17 PM:

> On Friday April 27, wcheng@redhat.com wrote:
> > Frank van Maarseveen wrote:
> > >
> > >I'd prefer (2) "echo /some/path > /proc/fs/nfsd/nlm_drop_lock" because:
> > > 
> > >
> > Converting the first patch of this submitted series from "fsid" to 
> > "/some/path" is a no-brainer, since we have gone through several rounds of 
> > similar changes. However, my questions (they are really questions for 
> > Neil) are, if I convert the first patch to do this:
> > 
> > 1) Why do we still need the RPC drop-lock call in nfs-utils?
> 
> Maybe we don't.
> I can imagine a (probably hypothetical) situation where you want to
> drop some but not all of the locks on a filesystem - if it is a
> cluster-aware filesystem that several virtual-NAS's export, and you
> want to move just one virtual-NAS.  But if you don't want to be able
> to do that, you obviously don't have to.

It would be very useful for cluster filesystems, which can export the same 
filesystem from several servers using multiple IP addresses on each server, 
to be able to move IP addresses among servers for load balancing. 
Marc.
 
> > 2) What should we do for the 2nd patch? i.e., how do we tell the 
> > takeover server that it is time for its action, by RPC call or by 
> > "echo /some/path > /proc/fs/nfsd/nlm_set_grace_or_whatever"?
> 
> I'm happy with using a path name like this to restart the grace
> period.  Where would you store the per-filesystem grace-period end?
> I guess you would need a new little data structure indexed by
> 'struct super_block *'.  It would need to hold a reference
> on the superblock until the grace period expired, wouldn't it?
> 
> It might seem 'obvious' to store it in 'struct svc_export', but there
> can be several of these per filesystem, and more could be added after
> you set the grace period.  So it would be messy to get that right.
> 
> 
> > 
> > In general, I feel that if we do this "/some/path" approach, we may as well 
> > simply convert the 2nd patch from "fsid" to "/some/path". Then we would 
> > finish this long journey.
> 
> Certainly a lot closer.
> If we are creating "nlm_drop_locks" and "nlm_set_grace" interfaces, we
> should spend a few moments considering exactly what semantics they
> should have.
> 
> In both cases we write a filename.  Presumably it must start with a
> '/' and be null terminated, so you use "echo -n" rather than "echo".
> After all, a filename can contain a newline.
> 
> Is there any extra info we might want to pass in or out at the same
> time?
> 
> For nlm_drop_locks, we might also want to be able to query locked -
> "Do you hold any locks on this filesystem".  Even "how many?".
> For set_grace, we might want to ask how many seconds are left in the
> grace period (I'm not sure how this info would be used, but it is
> always nice to be able to read any value that you can write).
> 
> Does it make sense to have a single file with composite semantics?
> 
> We write
>     XX/path/name
> where XX can be:
>     a number, to set seconds remaining in the grace period
>     a '?' (or empty string) to query state
>     a '-' to remove all locks (and cancel any grace period)
> We then read back two numbers, the seconds remaining in the grace
> period, and the number of locked files.
> 
> Then we need to make sure we choose appropriate names.  I think that
> the string 'lockd' makes more sense than 'nlm', as we are interacting
> with the daemon, not configuring the protocol.  We might not need
> either, as the file is inside /proc/fs/nfsd, so it is obviously
> related to nfsd.
> And if we can use the interface to query, then names like 'set' and
> 'drop' are probably misplaced.  Maybe "grace" and "locks".
> If no path is given, the requests have system-wide effect.  If there
> is a non-empty path, just that filesystem is queried/modified.
> 
> These are just possibilities.  I'm quite happy with either 1 or 2
> files.  I just want to be sure a number of options have been
> considered, and that a reasoned choice has been made.
> 
> NeilBrown
> 



* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-28  4:51                                 ` Neil Brown
  2007-04-28  5:26                                   ` Marc Eshel
@ 2007-04-28 12:33                                   ` Frank van Maarseveen
  1 sibling, 0 replies; 42+ messages in thread
From: Frank van Maarseveen @ 2007-04-28 12:33 UTC (permalink / raw)
  To: Neil Brown
  Cc: cluster-devel, Jeff Layton, Christoph Hellwig, nfs,
	J. Bruce Fields

On Sat, Apr 28, 2007 at 02:51:17PM +1000, Neil Brown wrote:
> On Friday April 27, wcheng@redhat.com wrote:
[...]
> Certainly a lot closer.
> If we are creating "nlm_drop_locks" and "nlm_set_grace" interfaces, we
> should spend a few moments considering exactly what semantics they
> should have.
> 
> In both cases we write a filename.  Presumably it must start with a
> '/' and be null terminated, so you use "echo -n" rather than "echo".
> After all, a filename can contain a newline.

I don't care much about the trailing newline. Try mounting and
exporting a path with a newline in it, and then mounting it on the
client ;-). Truncating the string at the first newline may be a
practical thing to do.

> 
> Is there any extra info we might want to pass in or out at the same
> time?
> 
> For nlm_drop_locks, we might also want to be able to query locked -
> "Do you hold any locks on this filesystem".  Even "how many?".

The "no locks dropped" case might be useful. #locks dropped is
only informational (without client info) and covers the first case
too so that would be my choice but I don't have any strong opinion
about this.

> 
> Does it make sense to have a single file with composite semantics?

Only if that would avoid an otherwise unavoidable race. There are
just too many components involved with NFS, so to avoid any race I'd
probably unplug it temporarily with iptables or "ip addr del ...".
But I would like to be able to drop locks without entering grace
mode: a zero-second grace period when combined.

> 
> We write
>     XX/path/name
> where XX can be:

Try mounting and exporting pathnames with spaces... that's not going
to work anytime soon, or even anytime at all (on other Unixes). So there
is no need to use '/' as the separator.

>     a number, to set second remaining in grace period
>     a '?' (or empty string) to query state

You mean: write "?/path/name" to tell the kernel what subsequent
reads should query?

>     a '-' to remove all locks (and cancels any grace period)

That's a strange combination. But cancelling a grace period
is equivalent to setting it to zero seconds, so no need for a
special case.

I'd go for simplicity: one file per function (unless there's an
unavoidable race). What about:

/proc/fs/nfsd/nlm_grace:
	Write a number to set the grace period in seconds
	(0==cancel). May be followed by a space + pathname to
	indicate the superblock/list of svc_something the grace
	period applies to (otherwise it's global). Truncate the
	string at a newline.

/proc/fs/nfsd/nlm_unlock:
	Write either a pathname or "" to drop locks. This has the
	same syntax as the second field of nlm_grace.
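
A sketch of the parsing I have in mind (illustration only, in
userspace C just to pin the syntax down):

    #include <stdlib.h>
    #include <string.h>

    /* parse "<seconds>[ <pathname>]", truncated at the first
     * newline; *path is NULL for a global grace period */
    static long parse_nlm_grace(char *buf, char **path)
    {
            char *p = strchr(buf, '\n');

            if (p)
                    *p = '\0';              /* truncate at newline */
            p = strchr(buf, ' ');
            if (p) {
                    *p = '\0';
                    *path = p + 1;          /* per-filesystem grace */
            } else {
                    *path = NULL;           /* global */
            }
            return strtol(buf, NULL, 10);   /* 0 == cancel */
    }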


Optional: In addition to a pathname, support "fsid=" syntax in
both cases.

If you want to go wild, then support a "file=" syntax to recover from
stale locks on individual files due to buggy clients.

-- 
Frank


* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-27 22:22                               ` Neil Brown
@ 2007-04-29 20:13                                 ` J. Bruce Fields
  2007-04-29 23:10                                   ` Neil Brown
  0 siblings, 1 reply; 42+ messages in thread
From: J. Bruce Fields @ 2007-04-29 20:13 UTC (permalink / raw)
  To: Neil Brown; +Cc: Christoph Hellwig, cluster-devel, nfs, Jeff Layton

On Sat, Apr 28, 2007 at 08:22:55AM +1000, Neil Brown wrote:
> A flag to unexport cannot work because we don't call unexport - we
> just flush a kernel cache.
> 
> A flag to export is just .... weird.  All the other export flags are
> state flags.  This would be an action flag.  They are quite different
> things.   Setting a state flag again is a no-op.  Setting an action
> flag again has a very real effect.

In this case the second set shouldn't have any effect--whatever flag is
set should prevent further locks from being accepted, shouldn't it?  (If
it matters.)

> Also, each filesystem is potentially exported multiple times for
> different sets of clients.  If such a flag (whether on 'export' or
> 'unexport') just said "remove locks from this set of clients" it
> wouldn't meet the needs, and if it said "remove all locks" it would be
> a very irregular interface.

The same could be said of the "fsid=" option on exports.  It doesn't
make sense to provide different filehandle- or path-name spaces
depending on the IP address of a client.  If my laptop changes IP
address, then I can (grudgingly) accept the fact that the server may
have to deny me access that I had before--maybe it just can't trust the
network I moved to for whatever reason--but I'd really rather it didn't
suddenly start giving me different paths, different filehandles, or
different semantics (like sync vs. async).

So the export interface is already being used for stuff that's really
intended to be per-filesystem rather than per-(filesystem, client) pair.
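
For example (addresses purely illustrative), the same filesystem
exported to two client sets already has to carry the same fsid= if
filehandles are to stay stable, even though ro/rw may differ:

    /etc/exports entry> /mnt/shared/exports 192.168.1.0/24(fsid=1234,rw,sync)
    /etc/exports entry> /mnt/shared/exports 10.0.0.0/8(fsid=1234,ro,sync)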

> > So you're talking about this and followups?:
> > 
> > 	http://marc.info/?l=linux-nfs&m=115009204513790&w=2
> > 
> > I just took a look and couldn't find any complaints about that
> > approach.  Were they elsewhere?
> 
> https://www.redhat.com/archives/linux-cluster/2006-June/msg00101.html
> 
> Is where I said that I don't like the unexport flag.

Got it, thanks.

--b.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-29 20:13                                 ` J. Bruce Fields
@ 2007-04-29 23:10                                   ` Neil Brown
  2007-04-30  5:19                                     ` Wendy Cheng
  2007-05-04 18:42                                     ` J. Bruce Fields
  0 siblings, 2 replies; 42+ messages in thread
From: Neil Brown @ 2007-04-29 23:10 UTC (permalink / raw)
  To: J. Bruce Fields; +Cc: Christoph Hellwig, cluster-devel, nfs, Jeff Layton

On Sunday April 29, bfields@fieldses.org wrote:
> On Sat, Apr 28, 2007 at 08:22:55AM +1000, Neil Brown wrote:
> > A flag to unexport cannot work because we don't call unexport - we
> > just flush a kernel cache.
> > 
> > A flag to export is just .... weird.  All the other export flags are
> > state flags.  This would be an action flag.  They are quite different
> > things.   Setting a state flag again is a no-op.  Setting an action
> > flag again has a very real effect.
> 
> In this case the second set shouldn't have any effect--whatever flag is
> set should prevent further locks from being accepted, shouldn't it?  (If
> it matters.)

Yes, I guess "no locks are allowed against this export" makes more
sense than "remove all locks on this export now".
Though currently the locks are against the filesystem - the export can
disappear from the cache while the locks remain - so it's a long way
from perfect.  Possibly we could insist that the export remains in the
kernel while files are locked .... but we update export flags by
replacing the export, so that would be a little awkward.

Also, I think I was half-thinking about the "reset the grace period"
operation, and that looks a lot like an action.... unless you make it
  grace_period_ends=seconds-since-epoch.

That might work.

> 
> > Also, each filesystem is potentially exported multiple times for
> > different sets of clients.  If such a flag (whether on 'export' or
> > 'unexport') just said "remove locks from this set of clients" it
> > wouldn't meet the needs, and if it said "remove all locks" it would be
> > a very irregular interface.
> 
> The same could be said of the "fsid=" option on exports.  It doesn't
> make sense to provide different filehandle- or path-name spaces
> depending on the IP address of a client.  If my laptop changes IP
> address, then I can (grudgingly) accept the fact that the server may
> have to deny me access that I had before--maybe it just can't trust the
> network I moved to for whatever reason--but I'd really rather it didn't
> suddenly start giving me different paths, different filehandles, or different
> semantics (like sync vs. async).
> 
> So the export interface is already being used for stuff that's really
> intended to be per-filesystem rather than per-(filesystem, client) pair.

ro/rw is often different based on client address, but yes: a lot of
the flags don't really make sense being different for different
clients on the same filesystem.

My feeling was that the "nolocks" flag is essentially pointless unless
it is the same for all exports on the one filesystem, and that gives
it a very different feel.

To make use of such a flag you could not rely on the normal mechanism
for loading flag information (on-demand loading by mountd).
You would need to look through /proc/fs/nfsd/exports, find all the
current exports for the filesystem, and tell the kernel to change each
export to have the "nolocks" flag.  And then, when you have done all of
that, you would want to immediately remove all those export entries so
you can unmount the filesystem.
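
Spelled out as a sketch (the "nolocks" flag is hypothetical, and the
two named clients stand in for whatever mountd currently has loaded):

    shell> grep '^/mnt/shared/exports' /proc/fs/nfsd/exports
    shell> exportfs -o rw,sync,nolocks clientA:/mnt/shared/exports
    shell> exportfs -o rw,sync,nolocks clientB:/mnt/shared/exports
    shell> exportfs -u clientA:/mnt/shared/exports
    shell> exportfs -u clientB:/mnt/shared/exports
    shell> umount /mnt/shared/exports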

So while it could be made to work, it doesn't feel clean at all.

A grace_period_ends=seconds-since-epoch flag would not have most of
those problems; e.g. it could be demand-loaded.
But there is the risk that it might be set for some exports on a given
filesystem and not for others.  And the consequence of that is that
some clients might not be able to reclaim their locks (because the
lock has already been given to a client which didn't know about the
new grace period).
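
(With the hypothetical flag, an export might look like

    /etc/exports entry> /mnt/shared/exports *(fsid=1234,rw,sync,grace_period_ends=1178000000)

where the value is an absolute expiry time in seconds since the epoch.)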

Now maybe it would be good to have a bunch of nfsd options that are
explicitly per-filesystem rather than per-export.
Maybe that is the sort of interface we should be designing.
  echo "+nolocks /path/to/filesystem" > /proc/fs/nfsd/filesystem_settings
  echo "grace_end=12345678 /path/to/filesystem" > /proc/....
  echo "-write_gather /path" > .....
  

We would need to be clear on how long those settings remain in the
kernel, how the kernel can be told to completely forget a particular
filesystem, etc.

But we probably don't need to go overboard straight away.
I like the interface:
   echo -n "flag flag .. /path/name" >  /proc/fs/nfsd/filesystem_settings

where if flags is "?flag", then the value is returned by a subsequent
read on the same file-descriptor.

At this point we only need "nolocks" and "grace_end".
The grace_end information persists until that point in time.
The "nolocks" information .... doesn't persist(?).

NeilBrown

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-29 23:10                                   ` Neil Brown
@ 2007-04-30  5:19                                     ` Wendy Cheng
  2007-05-04 18:42                                     ` J. Bruce Fields
  1 sibling, 0 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-04-30  5:19 UTC (permalink / raw)
  To: Neil Brown
  Cc: J. Bruce Fields, Christoph Hellwig, cluster-devel, nfs,
	Jeff Layton

Neil Brown wrote:

>But we probably don't need to go overboard straight away.
>I like the interface:
>   echo -n "flag flag .. /path/name" >  /proc/fs/nfsd/filesystem_settings
>
>where if flags is "?flag", then the value is returned by a subsequent
>read on the same file-descriptor.
>
>  
>
Will do a quick prototype to see whether this works as well as it 
appears it should. I haven't given up on the RPC call (into lockd) 
either, since it seems to be a bright idea.

-- Wendy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-04-29 23:10                                   ` Neil Brown
  2007-04-30  5:19                                     ` Wendy Cheng
@ 2007-05-04 18:42                                     ` J. Bruce Fields
  2007-05-04 21:35                                       ` Wendy Cheng
  1 sibling, 1 reply; 42+ messages in thread
From: J. Bruce Fields @ 2007-05-04 18:42 UTC (permalink / raw)
  To: Neil Brown; +Cc: Christoph Hellwig, cluster-devel, nfs, Jeff Layton

On Mon, Apr 30, 2007 at 09:10:38AM +1000, Neil Brown wrote:
> where if flags is "?flag", then the value is returned by a subsequent
> read on the same file-descriptor.

The ?flag thing seems a little awkward.  It'd be nice if we could get
all the flags for a single filesystem just by cat'ing an appropriate
file.

--b.

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
  2007-05-04 18:42                                     ` J. Bruce Fields
@ 2007-05-04 21:35                                       ` Wendy Cheng
  0 siblings, 0 replies; 42+ messages in thread
From: Wendy Cheng @ 2007-05-04 21:35 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Neil Brown, Christoph Hellwig, cluster-devel, nfs, Jeff Layton

J. Bruce Fields wrote:

>On Mon, Apr 30, 2007 at 09:10:38AM +1000, Neil Brown wrote:
>  
>
>>where if flags is "?flag", then the value is returned by a subsequent
>>read on the same file-descriptor.
>>    
>>
>
>The ?flag thing seems a little awkward.  It'd be nice if we could get
>all the flags for a single filesystem just by cat'ing an appropriate
>file.
>
>--b.
>  
>
ok, makes sense ... Wendy

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 0/4 Revised] NLM - lock failover
  2007-04-05 21:50 [PATCH 0/4 Revised] NLM - lock failover Wendy Cheng
                   ` (2 preceding siblings ...)
  2007-04-25 14:18 ` J. Bruce Fields
@ 2011-11-30 10:13 ` Pavel
  3 siblings, 0 replies; 42+ messages in thread
From: Pavel @ 2011-11-30 10:13 UTC (permalink / raw)
  To: linux-nfs

Wendy Cheng <wcheng <at> redhat.com> writes:

> 
> Revised patches based on 2.6.21-rc4 kernel and nfs-utils-1.1.0-rc1 that 
> address issues discussed in:
> https://www.redhat.com/archives/cluster-devel/2006-September/msg00034.html
> 
> Quick How-to:
> 1) Failover server exports filesystem with "fsid" option as:
>     /etc/exports entry> /mnt/shared/exports *(fsid=1234,sync,rw)
> 2) Failover server dispatch rpc.statd with "-H" option.
> 3) Failover server drops locks based on fsid by:
>     shell> echo 1234 > /proc/fs/nfsd/nlm_unlock
> 4) Takeover server enters per fsid grace period by:
>     shell> echo 1234 > /proc/fs/nfsd/nlm_set_igrace
> 5) Takeover server notifies clients for lock reclaim by:
>     shell> /usr/sbin/sm-notify -f -v floating_ip_address -P an_sm_directory
> 
> Patch Summary:
> 4-1: implement /proc/fs/nfsd/nlm_unlock
> 4-2: implement /proc/fs/nfsd/nlm_set_igrace
> 4-3: correctly record and pass incoming server ip interface into rpc.statd.
> 4-4: nfs-utils statd changes
> 4-1 includes an existing lockd bug fix as discussed in:
> http://sourceforge.net/mailarchive/forum.php?thread_name=4603506D.5040807%40redhat.com&forum_name=nfs
> (subject: [NFS] Question about f_count in struct nlm_file)
> 4-4 includes an existing nfs-utils statd bug fix as discussed in:
> http://sourceforge.net/mailarchive/message.php?msg_name=46142B4F.1030507%40redhat.com
> (subject: Re: [NFS] lockd and statd)
> 
> Misc:
> o No IPV6 support due to testing efforts
> o NFS V3 only - will compare notes with CITI folks (NFS V4 issues)
> o Still need some error-inject tests.
> 

Hi everyone!

I'm building an A/A cluster using NFS v3 and local file systems, and I'm
looking for efficient ways to fail over (for now I have to restart
nfs-kernel-server on the takeover node to be able to initiate a grace
period), so the solutions discussed here are very interesting to me.

Now (4 years later), in current nfs-utils packages (v. 1.2.2-4 and later),
I can see that the ability to release locks really was implemented and is
working well (I mean the interfaces /proc/fs/nfsd/unlock_ip and
/proc/fs/nfsd/unlock_filesystem), but what about reacquiring locks on the
node the share migrates to?  I've been going through various mailing lists
and found a lot of discussions on the topic (also dated mainly 2007), but
I don't seem to find any RPC-based mechanism or interface like
/proc/fs/nfsd/nlm_set_grace to do that.  Was it ever implemented?
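
For reference, the drop-side interfaces work like this here (the floating
address and mount point are just examples from my setup):

    shell> echo 10.1.1.100 > /proc/fs/nfsd/unlock_ip
    shell> echo /mnt/shared > /proc/fs/nfsd/unlock_filesystem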

Thanks!



^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2011-11-30 10:14 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-04-05 21:50 [PATCH 0/4 Revised] NLM - lock failover Wendy Cheng
2007-04-11 17:01 ` J. Bruce Fields
2007-04-17 19:30 ` [Cluster-devel] " Wendy Cheng
2007-04-18 18:56   ` Wendy Cheng
2007-04-18 19:46     ` [Cluster-devel] " Wendy Cheng
2007-04-19 14:41     ` Christoph Hellwig
2007-04-19 15:08       ` Wendy Cheng
2007-04-19  7:04   ` [Cluster-devel] " Neil Brown
2007-04-19 14:53     ` Wendy Cheng
2007-04-24  3:30     ` Wendy Cheng
2007-04-24  5:52       ` Neil Brown
2007-04-26  4:35         ` Wendy Cheng
2007-04-26  5:43           ` Neil Brown
2007-04-27  2:24             ` Wendy Cheng
2007-04-27  6:00               ` Neil Brown
2007-04-27 11:15                 ` Jeff Layton
2007-04-27 12:40                   ` Neil Brown
2007-04-27 13:42                     ` Jeff Layton
2007-04-27 14:17                       ` Christoph Hellwig
2007-04-27 15:42                         ` J. Bruce Fields
2007-04-27 15:36                           ` Wendy Cheng
2007-04-27 16:31                             ` J. Bruce Fields
2007-04-27 22:22                               ` Neil Brown
2007-04-29 20:13                                 ` J. Bruce Fields
2007-04-29 23:10                                   ` Neil Brown
2007-04-30  5:19                                     ` Wendy Cheng
2007-05-04 18:42                                     ` J. Bruce Fields
2007-05-04 21:35                                       ` Wendy Cheng
2007-04-27 20:34                             ` Frank van Maarseveen
2007-04-28  3:55                               ` Wendy Cheng
2007-04-28  4:51                                 ` Neil Brown
2007-04-28  5:26                                   ` Marc Eshel
2007-04-28 12:33                                   ` Frank van Maarseveen
2007-04-27 15:12                       ` Jeff Layton
2007-04-25 14:18 ` J. Bruce Fields
2007-04-25 14:10   ` Wendy Cheng
2007-04-25 15:21     ` Marc Eshel
2007-04-25 15:19       ` Wendy Cheng
2007-04-25 15:39         ` [Cluster-devel] " Wendy Cheng
2007-04-25 15:59     ` J. Bruce Fields
2007-04-25 15:52       ` Wendy Cheng
2011-11-30 10:13 ` Pavel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).