From: Wendy Cheng <wcheng@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
Date: Thu, 17 Jan 2008 13:07:02 -0500
Message-ID: <478F9946.9010601@redhat.com>
In-Reply-To: <20080117164002.GH16581@fieldses.org>

J. Bruce Fields wrote:
> On Thu, Jan 17, 2008 at 11:31:22AM -0500, Wendy Cheng wrote:
>   
>> J. Bruce Fields wrote:
>>     
>>> On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
>>>   
>>>       
>>>> J. Bruce Fields wrote:
>>>>     
>>>>         
>>>>> Remind me: why do we need both per-ip and per-filesystem methods?  In
>>>>> practice, I assume that we'll always do *both*?
>>>>>         
>>>>>           
>>>> Failover is normally done via a virtual IP address, so the per-ip
>>>> method should be the core routine. However, for a non-cluster
>>>> filesystem such as ext3/4, changing servers also implies umount. If
>>>> some clients do not follow the rule and obtain locks via different
>>>> ip interfaces, umount will fail, which ends up aborting the
>>>> failover process. That is where we need the per-filesystem
>>>> method.
>>>>
>>>> ServerA:
>>>> 1. Tear down the IP address
>>>> 2. Unexport the path
>>>> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
>>>> 4. If unmount is required, write the path name to
>>>>    /proc/fs/nfsd/unlock_filesystem, then unmount.
>>>> 5. Signal peer to begin take-over.
>>>>
>>>> Some time ago we were looking at "export name" as the core method (so
>>>> the per-filesystem method would be a subset of that). Unfortunately,
>>>> the prototype efforts showed the code would be too intrusive (if a
>>>> filesystem sub-tree is exported).
>>>>     
>>>>         
>>>>> We're migrating clients by moving a server ip address from one node to
>>>>> another.  And I assume we're permitting at most one node to export each
>>>>> filesystem at a time.  So it *should* be the case that the set of locks
>>>>> held on the filesystem(s) that are moving are the same as the set of
>>>>> locks held by the virtual ip that is moving.
>>>>>         
>>>>>           
>>>> This is true for a non-cluster filesystem, but a cluster filesystem
>>>> can be exported from multiple servers.
>>>>     
>>>>         
>>> But that last sentence:
>>>
>>> 	it *should* be the case that the set of locks held on the
>>> 	filesystem(s) that are moving are the same as the set of locks
>>> 	held by the virtual ip that is moving.
>>>
>>> is still true in the cluster filesystem case, right?
>>>
>>> --b.
>>>   
>>>       
>> Yes .... Wendy
>>     
>
> In what situations (buggy client?  Weird network failure?) could that
> fail to be the case?
>
> Would there be any advantage to enforcing that requirement in the
> server?  (For example, teaching nlm to reject any locking request for a
> certain filesystem that wasn't sent to a certain server IP.)
>
> --b.
>   
It is doable. It could be added to the "resume" patch that is currently
being tested (the logic is very similar to the per-ip grace period);
that patch should be out for review no later than next Monday.
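
To make the proposed check concrete, here is a rough user-space sketch in
Python of "reject any locking request for a certain filesystem that wasn't
sent to a certain server IP". It is only an illustration of the idea, not
lockd code: LockRequest, FailoverLockPolicy, fs_id and server_ip are all
made-up names, and a real implementation would live in the kernel's NLM
request path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockRequest:
    """Hypothetical stand-in for an incoming NLM lock request."""
    fs_id: str       # identifier of the exported filesystem (assumed)
    server_ip: str   # local address the request arrived on (assumed)


class FailoverLockPolicy:
    """Accept a lock only if it arrived on the IP bound to its filesystem."""

    def __init__(self):
        # fs_id -> the single virtual IP allowed to take locks on it
        self._owner_ip = {}

    def bind(self, fs_id, server_ip):
        self._owner_ip[fs_id] = server_ip

    def allows(self, req):
        expected = self._owner_ip.get(req.fs_id)
        # Filesystems with no binding are left unrestricted.
        return expected is None or expected == req.server_ip
```

The extra work per request in this sketch is one table lookup and one
comparison; whether the equivalent kernel check is cheap enough is exactly
the latency question.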

However, as with any new code added to the system, there are trade-offs.
I'm not sure we want to keep enhancing this too much, though. Remember,
locking is about latency, and adding more checks will hurt latency.
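
For reference, the ServerA sequence quoted earlier in this message can be
written down as an ordered plan. The Python sketch below only builds the
list of steps and executes nothing. The two /proc/fs/nfsd paths come from
the patch under discussion; the action names (teardown_ip, unexport,
umount, signal_peer) are placeholders for whatever site-specific commands
a cluster manager would run, and the function name is hypothetical.

```python
UNLOCK_IP = "/proc/fs/nfsd/unlock_ip"
UNLOCK_FS = "/proc/fs/nfsd/unlock_filesystem"


def failover_plan(vip, export_path, need_umount):
    """Return the ordered ServerA steps as (action, argument) tuples.

    Nothing is executed here. Actions other than the two /proc writes
    are placeholder names, not real commands.
    """
    plan = [
        ("teardown_ip", vip),           # 1. tear down the IP address
        ("unexport", export_path),      # 2. unexport the path
        ("write:" + UNLOCK_IP, vip),    # 3. release locks taken via this IP
    ]
    if need_umount:
        # 4. release any remaining locks on the filesystem, then unmount
        plan.append(("write:" + UNLOCK_FS, export_path))
        plan.append(("umount", export_path))
    plan.append(("signal_peer", vip))   # 5. tell the peer to take over
    return plan
```

With need_umount set, the per-filesystem write always comes after the
per-ip write and before umount, which is the ordering that keeps stray
locks from blocking the unmount.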

-- Wendy



