From: J. Bruce Fields <bfields@fieldses.org>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
Date: Thu, 24 Jan 2008 15:19:10 -0500
Message-ID: <20080124201910.GF26164@fieldses.org>
In-Reply-To: <4798EAE1.2000707@redhat.com>

On Thu, Jan 24, 2008 at 02:45:37PM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> In practice, it seems that both the unlock_ip and unlock_pathname
>> methods that revoke locks are going to be called together.  The two
>> separate calls therefore seem a little redundant.  The reason we *need*
>> both is that it's possible that a misconfigured client could grab locks
>> for a (server ip, export) combination that it isn't supposed to.
>>   
>
> That is not a correct assumption. The two commands (unlock_ip and
> unlock_pathname) are not necessarily called together. It is ok for a
> local filesystem (ext3) but not for a cluster filesystem, where the
> very same filesystem (or subtree) can be exported from multiple
> servers, each using different subtrees.

Ouch.  Are people really doing that, and why?  What happens if the
subtrees share files (because of hard links) that are locked from both
nodes?

> Also, as we discussed before, it is
> "unlock_filesystem", *not* "unlock_pathname" (which would imply
> sub-tree exports), due to implementation difficulties (see the
> "Implementation Notes" from
> http://people.redhat.com/wcheng/Patches/NFS/NLM/004.txt).

Unless I misread the latest patch, it's actually matching on the
vfsmount, right?

I guess that means we *could* handle the above situation by doing a

	mount --bind /path/to/export/point /path/to/export/point

on each export, at which point there will be a separate vfsmount for
each export point?
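
For example, with two exports on the same filesystem (hypothetical
paths):

	mount --bind /exports/a /exports/a
	mount --bind /exports/b /exports/b

would give /exports/a and /exports/b their own vfsmounts, so the two
could be matched independently.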

But I don't think that's what we really want.  The goal is to ensure
that the nfs server holds no locks on a disk filesystem so we can
unmount it completely from this machine and mount it elsewhere.  So we
should really be removing all locks for the superblock, not just for a
particular mount of that superblock.  Otherwise we'll have odd problems
if someone happens to do the unlock_filesystem downcall from a different
namespace or something.
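
In code terms that would mean keying the match on the superblock;
roughly like this (just a sketch; the function name and callback
signature here are made up, not taken from the posted patch):

	#include <linux/lockd/lockd.h>

	/*
	 * Sketch only: match lockd-held files by superblock rather than
	 * by vfsmount, so that locks taken through a bind mount or from
	 * another namespace are released along with the rest.
	 */
	static int nlmsvc_match_sb(void *datap, struct nlm_file *file)
	{
		struct super_block *sb = datap;

		return file->f_file->f_path.dentry->d_sb == sb;
	}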

>> So it makes sense to me to restrict locking from the beginning to
>> prevent that from happening.  Therefore I'd like to add a call at the
>> beginning like:
>>
>> 	echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace
>>   
>
> My second patch set is about to be sent out (writing the text
> description at this moment ... sorry for the delay).

Good, thanks.

>> before any exports are set up, which both starts a grace period, and
>> tells nfs to allow locks on the filesystem /exports/example only if
>> they're addressed to the server ip 192.168.1.1.  Then on shutdown,
>>
>> 	echo "192.168.1.1" >/proc/fs/nfsd/unlock_ip
>>
>> should be sufficient to guarantee that nfsd/lockd no longer holds locks
>> on /exports/example.
>>
>> (I think Wendy's pretty close to that api already after adding the
>> second method to start grace?)
>>
>> The other advantage to having the server-ip from the start is that at
>> the time we make lock requests to the cluster filesystem, we can tell it
>> that the locks associated with 192.168.1.1 are special: they may migrate
>> as a group to another node, and on node failure they should (if
>> possible) be held to give a chance for another node to take them over.
>>
>> Internally I'd like to have an object like
>>
>> 	struct lock_manager {
>> 		char *lm_name;
>> 		...
>> 	}
>>
>> for each server ip address.  A pointer to this structure would be passed
>> with each lock request, allowing the filesystem to associate locks to
>> lock_manager's.  The name would be a string derived from the server ip
>> address that the cluster can compare to match reclaim requests with the
>> locks that they're reclaiming from another node.
>>   
>
> I still don't have a warm feeling about adding this (at this moment) -
> somehow I feel we're over-engineering the lock failover issue.

I agree that that's a risk.
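
To be concrete, though, all I'm imagining is something small, along
these lines (pure sketch; the field names are illustrative only):

	#include <linux/list.h>

	/*
	 * Sketch only: one of these per virtual server IP address.  A
	 * pointer to it would be passed down with each lock request, so
	 * the cluster filesystem can group locks by lock_manager and
	 * match reclaims from another node against the group by lm_name.
	 */
	struct lock_manager {
		char		 *lm_name;	/* derived from the server
						 * ip, e.g. "192.168.1.1" */
		struct list_head lm_list;	/* all active managers */
	};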

> Remember lock failover is just a small portion of the general NFS
> server failover issues (for both HA and load balancing purposes).
> Maybe we should have something simple and usable for the 2.6 kernel
> before adding this type of complication?

Yeah.  We should aim to include basic failover functionality in 2.6.26,
one way or another--I think that dealing with the other issues I'm
worried about won't actually be a great deal more complicated, but if
that doesn't pan out then fine.  I would like to at least make sure this
is all working for nfsv4 as well, though.  (Currently only locks held by
v2/v3 clients are dropped.)

--b.



