From: Wendy Cheng <wcheng@redhat.com>
To: Neil Brown <neilb@suse.de>
Cc: cluster-devel@redhat.com, nfs@lists.sourceforge.net
Subject: Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
Date: Thu, 26 Apr 2007 00:35:13 -0400	[thread overview]
Message-ID: <46302C01.2060500@redhat.com> (raw)
In-Reply-To: <17965.39683.396108.623418@notabene.brown>

Neil Brown wrote:

>On Monday April 23, wcheng@redhat.com wrote:
>  
>
>>Neil Brown wrote:
>>
>>[snip]
>>
>>             We started the discussion using the network interface (to
>>drop the locks) but found it wouldn't work well on local filesystems
>>such as ext3. There is really no control over which local (server-side)
>>interface NFS clients will use (though it shouldn't be hard to implement
>>one). When the fail-over server starts to remove the locks, it needs a
>>way to find *all* of the locks associated with the will-be-moved
>>partition, so that umount can succeed. The server IP address alone can't
>>guarantee that. That was the reason we switched to fsid. Also remember
>>this is NFS v2/v3 - clients have no knowledge of server migration.
>>    
>>
>[snip]
>
>So it seems to me we do know exactly the list of local-addresses that
>could possibly be associated with locks on a given filesystem.  They
>are exactly the IP addresses that are publicly acknowledged to be
>usable for that filesystem.
>And if any client tries to access the filesystem using a different IP
>address then they are doing the wrong thing and should be reformatted.
>  
>

A convincing argument... unfortunately, this happens to be a case where 
we need to protect the server from clients' misbehavior. For a local 
filesystem (ext3), if any file reference count is non-zero (i.e. some 
clients are still holding locks), the filesystem can't be un-mounted. We 
would have to fail the failover to avoid data corruption.
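
To make that concrete: the failover agent basically has to treat a busy 
umount as a hard failure rather than forcing it. A minimal sketch (the 
mount point and the error policy are illustrative only, not part of the 
patch set):

#include <sys/mount.h>
#include <errno.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    /* hypothetical export being relocated */
    const char *mnt = argc > 1 ? argv[1] : "/export/fs1";

    if (umount(mnt) == 0) {
        printf("%s unmounted, safe to relocate\n", mnt);
        return 0;
    }
    if (errno == EBUSY) {
        /* locks (or other references) still held: abort the failover
         * instead of forcing the umount and risking corruption */
        fprintf(stderr, "%s still busy, failing the failover\n", mnt);
        return 1;
    }
    perror("umount");
    return 1;
}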

>Maybe the idea of using network addresses was the first suggestion,
>and maybe it was rejected for the reasons you give, but it doesn't
>currently seem like those reasons are valid.  Maybe those who proposed
>those reasons (and maybe that was me) couldn't see the big picture at
>the time...
>  
>

This debate has been (so far) tolerable and helpful - so I'm not going 
to comment on this paragraph :) ... But I have to remind people that my 
first proposal was adding new flags to the exportfs command (say 
"exportfs -ud" to unexport and drop locks, and "exportfs -g" to 
re-export and start the grace period). Then we moved to "echo the 
network address into procfs", and later switched to the "fsid" approach. 
A very long journey ...

>  
>
>>>  The reply to SM_MON (currently completely ignored by all versions
>>>  of Linux) has an extra value which indicates how many more seconds
>>>  of grace period there is to go.  This can be stuffed into res_stat
>>>  maybe.
>>>  Places where we currently check 'nlmsvc_grace_period', get moved to
>>>  *after* the nlmsvc_retrieve_args call, and the grace_period value
>>>  is extracted from host->nsm.
>>> 
>>>
>>>      
>>>
>>OK with me, but I don't see the advantage, though?
>>    
>>
>
>So we can have a different grace period for each different 'host'.
>  
>

IMHO, having a grace period for each client (host) is overkill.
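
For reference, the way I read that suggestion - as a rough sketch only, 
with struct and field names that are illustrative rather than the actual 
lockd/statd layout - is roughly:

#include <time.h>

/* hypothetical per-peer state, filled in from the SM_MON reply */
struct nsm_handle_sketch {
    time_t sm_grace_expiry;   /* absolute end of this host's grace period */
};

struct nlm_host_sketch {
    struct nsm_handle_sketch *h_nsm;
};

/*
 * Would be called after nlmsvc_retrieve_args() has resolved the host,
 * in place of testing the global nlmsvc_grace_period.
 */
int host_in_grace(const struct nlm_host_sketch *host, time_t now)
{
    return now < host->h_nsm->sm_grace_expiry;
}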

> [snip]
>
>Part of unmounting the filesystem from Server A requires getting
>Server A to drop all the locks on the filesystem.  We know they can
>only be held by clients that sent requests to a given set of IP
>addresses.   Lockd created an 'nsm' for each client/local-IP pair and
>registered each of those with statd.  The information registered with
>statd includes the details of an RPC call that can be made to lockd to
>tell it to drop all the locks owned by that client/local-IP pair.
>
>The statd in 1.1.0 records all this information in the files created
>in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
>So when it is time to unmount the filesystem, some program can look
>through all the files in nfs/sm, read each of the lines, find those
>which relate to any of the local IP addresses that we want to move, and
>initiate the RPC callback described on that line.  This will tell
>lockd to drop those locks.  When all the RPCs have been sent, lockd
>will not hold any locks on that filesystem any more.
>  
>

Bright idea! But it doesn't solve the issue of misbehaving clients who 
come in through unwanted (server) interfaces, does it?
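
Just to be concrete about the flow described above, the user-space side 
might look roughly like this. The record format and the lock-dropping 
helper are placeholders - the exact statd 1.1.0 file layout would need 
to be checked:

#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* placeholder: issue the RPC callback registered with statd so that
 * lockd drops the locks held for this client/local-IP pair */
static void drop_locks_for_pair(const char *client, const char *local_ip)
{
    printf("would ask lockd to drop locks: client=%s local=%s\n",
           client, local_ip);
}

int main(void)
{
    const char *moving_ip = "10.0.0.5";   /* server address being moved */
    DIR *dir = opendir("/var/lib/nfs/sm");
    struct dirent *de;

    if (!dir)
        return 1;

    while ((de = readdir(dir)) != NULL) {
        char path[512], line[256];
        FILE *f;

        if (de->d_name[0] == '.')
            continue;
        snprintf(path, sizeof(path), "/var/lib/nfs/sm/%s", de->d_name);
        f = fopen(path, "r");
        if (!f)
            continue;
        /* assumed record format: "<local-ip> <client-name> ..." */
        while (fgets(line, sizeof(line), f)) {
            char local[64], client[128];

            if (sscanf(line, "%63s %127s", local, client) == 2 &&
                strcmp(local, moving_ip) == 0)
                drop_locks_for_pair(client, local);
        }
        fclose(f);
    }
    closedir(dir);
    return 0;
}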

>
>[snip]
>I feel it has taken me quite a while to gain a full understanding of
>what you are trying to achieve.  Maybe it would be useful to have a
>concise/precise description of what the goal is.
>I think a lot of the issues have now become clear, but it seems there
>remains the issue of what system-wide configurations are expected, and
>what configuration we can rule 'out of scope' and decide we don't have
>to deal with.
>  
>
I'm working on the write-up now, but could the following serve the 
purpose temporarily? What is still unclear from this thread of discussion?

http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html


-- Wendy



Thread overview: 42+ messages
2007-04-05 21:50 [PATCH 0/4 Revised] NLM - lock failover Wendy Cheng
2007-04-11 17:01 ` J. Bruce Fields
2007-04-17 19:30 ` [Cluster-devel] " Wendy Cheng
2007-04-18 18:56   ` Wendy Cheng
2007-04-18 19:46     ` [Cluster-devel] " Wendy Cheng
2007-04-19 14:41     ` Christoph Hellwig
2007-04-19 15:08       ` Wendy Cheng
2007-04-19  7:04   ` [Cluster-devel] " Neil Brown
2007-04-19 14:53     ` Wendy Cheng
2007-04-24  3:30     ` Wendy Cheng
2007-04-24  5:52       ` Neil Brown
2007-04-26  4:35         ` Wendy Cheng [this message]
2007-04-26  5:43           ` Neil Brown
2007-04-27  2:24             ` Wendy Cheng
2007-04-27  6:00               ` Neil Brown
2007-04-27 11:15                 ` Jeff Layton
2007-04-27 12:40                   ` Neil Brown
2007-04-27 13:42                     ` Jeff Layton
2007-04-27 14:17                       ` Christoph Hellwig
2007-04-27 15:42                         ` J. Bruce Fields
2007-04-27 15:36                           ` Wendy Cheng
2007-04-27 16:31                             ` J. Bruce Fields
2007-04-27 22:22                               ` Neil Brown
2007-04-29 20:13                                 ` J. Bruce Fields
2007-04-29 23:10                                   ` Neil Brown
2007-04-30  5:19                                     ` Wendy Cheng
2007-05-04 18:42                                     ` J. Bruce Fields
2007-05-04 21:35                                       ` Wendy Cheng
2007-04-27 20:34                             ` Frank van Maarseveen
2007-04-28  3:55                               ` Wendy Cheng
2007-04-28  4:51                                 ` Neil Brown
2007-04-28  5:26                                   ` Marc Eshel
2007-04-28 12:33                                   ` Frank van Maarseveen
2007-04-27 15:12                       ` Jeff Layton
2007-04-25 14:18 ` J. Bruce Fields
2007-04-25 14:10   ` Wendy Cheng
2007-04-25 15:21     ` Marc Eshel
2007-04-25 15:19       ` Wendy Cheng
2007-04-25 15:39         ` [Cluster-devel] " Wendy Cheng
2007-04-25 15:59     ` J. Bruce Fields
2007-04-25 15:52       ` Wendy Cheng
2011-11-30 10:13 ` Pavel
