[NFS] [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover


From: Wendy Cheng <wcheng@redhat.com>
To: cluster-devel.redhat.com
Subject: [NFS] [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
Date: Thu, 26 Apr 2007 00:35:13 -0400	[thread overview]
Message-ID: <46302C01.2060500@redhat.com> (raw)
In-Reply-To: <17965.39683.396108.623418@notabene.brown>

Neil Brown wrote:

>On Monday April 23, wcheng at redhat.com wrote:
>  
>
>>Neil Brown wrote:
>>
>>[snip]
>>
>>We started the discussion using the network interface (to drop 
>>the locks) but found it wouldn't work well on local filesystems such as 
>>ext3. There is really no control over which local (server side) interface 
>>NFS clients will use (shouldn't be hard to implement one though). When 
>>the fail-over server starts to remove the locks, it needs a way to find 
>>*all* of the locks associated with the will-be-moved partition. This is 
>>to allow umount to succeed. The server IP address alone can't guarantee 
>>that. That was the reason we switched to fsid. Also remember this is NFS 
>>v2/v3 - clients have no knowledge of server migration.
>>    
>>
>[snip]
>
>So it seems to me we do know exactly the list of local-addresses that
>could possibly be associated with locks on a given filesystem.  They
>are exactly the IP addresses that are publicly acknowledged to be
>usable for that filesystem.
>And if any client tries to access the filesystem using a different IP
>address then they are doing the wrong thing and should be reformatted.
>  
>

A convincing argument... unfortunately, this happens to be a case where 
we need to protect the server from client misbehavior. For a local 
filesystem (ext3), if any file reference count is not zero (i.e. some 
clients are still holding locks), the filesystem can't be 
unmounted. We would have to abort the failover to avoid data corruption.
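
To make the point concrete, here is a toy model (not code from the patch 
set; the Lock record, the addresses and the fsid values are all made up 
for illustration). It only shows why dropping locks by server address can 
leave a lock behind on the filesystem being moved, while dropping by fsid 
cannot:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lock:
    client: str      # client host name (hypothetical)
    server_ip: str   # server-side address the lock request arrived on
    fsid: int        # exported filesystem the locked file lives on


def drop_by_server_ip(locks, ips):
    """Keep only locks that did NOT arrive on one of the given addresses."""
    return [l for l in locks if l.server_ip not in ips]


def drop_by_fsid(locks, fsid):
    """Keep only locks that are NOT on the given filesystem."""
    return [l for l in locks if l.fsid != fsid]


def umount_would_succeed(remaining, fsid):
    """In this model, umount succeeds only if no lock is left on the fsid."""
    return all(l.fsid != fsid for l in remaining)


locks = [
    Lock("clientA", "10.0.0.1", 1234),  # used the address published for fsid 1234
    Lock("clientB", "10.0.0.9", 1234),  # came in through an unexpected interface
    Lock("clientC", "10.0.0.2", 5678),  # different filesystem, must survive
]

after_ip = drop_by_server_ip(locks, {"10.0.0.1"})
after_fsid = drop_by_fsid(locks, 1234)

print(umount_would_succeed(after_ip, 1234))
print(umount_would_succeed(after_fsid, 1234))
```

In this sketch clientB's lock survives the address-based drop, which is 
exactly the case that would keep the filesystem busy.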

>Maybe the idea of using network addresses was the first suggestion,
>and maybe it was rejected for the reasons you give, but it doesn't
>currently seem like those reasons are valid.  Maybe those who proposed
>those reasons (and maybe that was me) couldn't see the big picture at
>the time...
>  
>

This debate has been (so far) tolerable and helpful - so I'm not going 
to comment on this paragraph :) ... But I have to remind people that my first 
proposal was to add new flags to the export command (say "exportfs -ud" to 
unexport and drop locks, and "exportfs -g" to re-export and start the grace 
period). Then we moved to "echo network-addr into procfs", and later 
switched to the "fsid" approach. A very long journey ...

>  
>
>>>  The reply to SM_MON (currently completely ignored by all versions
>>>  of Linux) has an extra value which indicates how many more seconds
>>>  of grace period there is to go.  This can be stuffed into res_stat
>>>  maybe.
>>>  Places where we currently check 'nlmsvc_grace_period', get moved to
>>>  *after* the nlmsvc_retrieve_args call, and the grace_period value
>>>  is extracted from host->nsm.
>>> 
>>>
>>>      
>>>
>>ok with me but I don't see the advantages though ?
>>    
>>
>
>So we can have a different grace period for each different 'host'.
>  
>

IMHO, having a grace period for each client (host) is overkill.
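
For readers following along, this is roughly what a per-host grace period 
would mean. It is a toy model only, assuming each host record carries its 
own grace deadline (as if filled in from the SM_MON reply); the Host class 
and handle_lock_request() are hypothetical names, not lockd code:

```python
import time


class Host:
    """Per-client record with its own grace deadline (hypothetical model)."""

    def __init__(self, name, grace_seconds_left, now=None):
        now = time.monotonic() if now is None else now
        self.name = name
        self.grace_end = now + grace_seconds_left


def in_grace(host, now=None):
    """True while this particular host is still inside its grace period."""
    now = time.monotonic() if now is None else now
    return now < host.grace_end


def handle_lock_request(host, is_reclaim, now=None):
    # The grace check happens after the host has been looked up, so the
    # answer can differ from one host to the next.
    if in_grace(host, now) and not is_reclaim:
        return "denied_grace_period"
    return "proceed"


# clientA's locks were just moved here: 90 seconds of grace remain.
moved = Host("clientA", 90, now=0.0)
# clientB was already served by this machine: no grace period.
settled = Host("clientB", 0, now=0.0)

print(handle_lock_request(moved, is_reclaim=False, now=10.0))
print(handle_lock_request(settled, is_reclaim=False, now=10.0))
```

A single global grace period would collapse the two deadlines into one 
value checked before the host lookup.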

> [snip]
>
>Part of unmounting the filesystem from Server A requires getting
>Server A to drop all the locks on the filesystem.  We know they can
>only be held by clients that sent requests to a given set of IP
>addresses.   Lockd created an 'nsm' for each client/local-IP pair and
>registered each of those with statd.  The information registered with
>statd includes the details of an RPC call that can be made to lockd to
>tell it to drop all the locks owned by that client/local-IP pair.
>
>The statd in 1.1.0 records all this information in the files created
>in /var/lib/nfs/sm (and could pass it to the ha-callout if required).
>So when it is time to unmount the filesystem, some program can look
>through all the files in nfs/sm, read each of the lines, find those
>which relate to any of the local IP addresses that we want to move, and
>initiate the RPC callback described on that line.  This will tell
>lockd to drop those locks.  When all the RPCs have been sent, lockd
>will not hold any locks on that filesystem any more.
>  
>

Bright idea! But it doesn't solve the issue of misbehaving clients that come 
in through unwanted (server) interfaces, does it?
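
As a rough sketch of the scan described above: the on-disk format of the 
statd files is not spelled out in this thread, so the three-field line 
format below ("client local-ip callback") is purely an assumption for 
illustration, and find_callbacks() is a hypothetical helper. The actual 
RPC call is left out; the sketch only selects the records to act on:

```python
import os
import tempfile


def find_callbacks(sm_dir, moving_ips):
    """Return (client, local_ip, callback) for records on a moving address.

    Assumes one record per line in the hypothetical form
    "<client> <local-ip> <callback>"; the real statd format may differ.
    """
    matches = []
    for name in sorted(os.listdir(sm_dir)):
        path = os.path.join(sm_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) != 3:
                    continue  # skip anything not in the assumed format
                client, local_ip, callback = fields
                if local_ip in moving_ips:
                    matches.append((client, local_ip, callback))
    return matches


# Demo against a throwaway directory standing in for /var/lib/nfs/sm.
with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "clientA"), "w") as f:
        f.write("clientA 10.0.0.1 cb-1\n")
    with open(os.path.join(d, "clientB"), "w") as f:
        f.write("clientB 10.0.0.2 cb-2\nclientB 10.0.0.1 cb-3\n")
    found = find_callbacks(d, {"10.0.0.1"})

print(found)
```

Note that the record for 10.0.0.2 is never selected, so a client that 
locked files on the moving filesystem through that address would be missed.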

>
>[snip]
>I feel it has taken me quite a while to gain a full understanding of
>what you are trying to achieve.  Maybe it would be useful to have a
>concise/precise description of what the goal is.
>I think a lot of the issues have now become clear, but it seems there
>remains the issue of what system-wide configurations are expected, and
>what configuration we can rule 'out of scope' and decide we don't have
>to deal with.
>  
>
I'm working on the write-up now. In the meantime, could the following 
serve the purpose? What is still unclear from this thread of discussion?

http://www.redhat.com/archives/linux-cluster/2006-June/msg00050.html


-- Wendy

WARNING: multiple messages have this Message-ID (diff)

From: Wendy Cheng <wcheng@redhat.com>
To: Neil Brown <neilb@suse.de>
Cc: cluster-devel@redhat.com, nfs@lists.sourceforge.net
Subject: Re: [Cluster-devel] [PATCH 0/4 Revised] NLM - lock failover
Date: Thu, 26 Apr 2007 00:35:13 -0400	[thread overview]
Message-ID: <46302C01.2060500@redhat.com> (raw)
In-Reply-To: <17965.39683.396108.623418@notabene.brown>



