From: J. Bruce Fields <bfields@fieldses.org>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] Re: [PATCH 1/2] NLM failover unlock commands
Date: Thu, 24 Jan 2008 15:19:10 -0500 [thread overview]
Message-ID: <20080124201910.GF26164@fieldses.org> (raw)
In-Reply-To: <4798EAE1.2000707@redhat.com>
On Thu, Jan 24, 2008 at 02:45:37PM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> In practice, it seems that both the unlock_ip and unlock_pathname
>> methods that revoke locks are going to be called together. The two
>> separate calls therefore seem a little redundant. The reason we *need*
>> both is that it's possible that a misconfigured client could grab locks
>> for a (server ip, export) combination that it isn't supposed to.
>>
>
> That is not a correct assumption. The two commands (unlock_ip and
> unlock_pathname) are not necessarily called together. It is ok for a
> local filesystem (ext3) but not for a cluster filesystem, where the
> very same filesystem (or subtree) can be exported from multiple
> servers using different subtrees.
Ouch. Are people really doing that, and why? What happens if the
subtrees share files (because of hard links) that are locked from both
nodes?
> Also as we discussed before, it is
> "unlock_filesystem", *not* "unlock_pathname" (which would imply sub-tree
> exports), due to implementation difficulties (see the "Implementation
> Notes" from http://people.redhat.com/wcheng/Patches/NFS/NLM/004.txt).
Unless I misread the latest patch, it's actually matching on the
vfsmount, right?
I guess that means we *could* handle the above situation by doing a
mount --bind /path/to/export/point /path/to/export/point
on each export, at which point there would be a separate vfsmount for
each export point?
But I don't think that's what we really want. The goal is to ensure
that the nfs server holds no locks on a disk filesystem so we can
unmount it completely from this machine and mount it elsewhere. So we
should really be removing all locks for the superblock, not just for a
particular mount of that superblock. Otherwise we'll have odd problems
if someone happens to do the unlock_filesystem downcall from a different
namespace or something.
>> So it makes sense to me to restrict locking from the beginning to
>> prevent that from happening. Therefore I'd like to add a call at the
>> beginning like:
>>
>> echo "192.168.1.1 /exports/example" > /proc/fs/nfsd/start_grace
>>
>
> My second patch set is about to be sent out (I'm writing up the text
> description at this moment... sorry for the delay).
Good, thanks.
>> before any exports are set up, which both starts a grace period, and
>> tells nfs to allow locks on the filesystem /exports/example only if
>> they're addressed to the server ip 192.168.1.1. Then on shutdown,
>>
>> echo "192.168.1.1" >/proc/fs/nfsd/unlock_ip
>>
>> should be sufficient to guarantee that nfsd/lockd no longer holds locks
>> on /exports/example.
>>
>> (I think Wendy's pretty close to that api already after adding the
>> second method to start grace?)
>>
>> The other advantage to having the server-ip from the start is that at
>> the time we make lock requests to the cluster filesystem, we can tell it
>> that the locks associated with 192.168.1.1 are special: they may migrate
>> as a group to another node, and on node failure they should (if
>> possible) be held to give a chance for another node to take them over.
>>
>> Internally I'd like to have an object like
>>
>> struct lock_manager {
>> char *lm_name;
>> ...
>> }
>>
>> for each server ip address. A pointer to this structure would be passed
>> with each lock request, allowing the filesystem to associate locks to
>> lock_manager's. The name would be a string derived from the server ip
>> address that the cluster can compare to match reclaim requests with the
>> locks that they're reclaiming from another node.
>>
>
> I still don't have a warm feeling about adding this (at this moment) -
> I somehow feel we are over-engineering the lock failover issues.
I agree that that's a risk.
> Remember lock failover is just a small portion of the general NFS
> server failover (for both HA and load-balancing purposes) problem.
> Maybe we should have something simple and usable for the 2.6 kernel
> before adding this type of complication?
Yeah. We should aim to include basic failover functionality in 2.6.26,
one way or another--I think that dealing with the other issues I'm
worried about won't actually be a great deal more complicated, but if
that doesn't pan out then fine. I would like to at least make sure this
is all working for nfsv4 as well, though. (Currently only locks held by
v2/v3 clients are dropped.)
--b.