From: Wendy Cheng <s.wendy.cheng@gmail.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: linux-nfs@vger.kernel.org
Subject: Re: multiple instances of rpc.statd
Date: Mon, 28 Apr 2008 15:19:28 -0400
Message-ID: <48162340.6060509@gmail.com>
In-Reply-To: <20080428182612.GC22037@fieldses.org>

J. Bruce Fields wrote:
> On Sun, Apr 27, 2008 at 10:59:11PM -0500, Wendy Cheng wrote:
>
>>
>>> So for basic v2/v3 failover, what remains is some statd -H scripts, and
>>> some form of grace period control? Is there anything else we're
>>> missing?
>>>
>> The submitted patch set is reasonably complete ... .
>>
>> There was another thought about statd patches though - mostly because of
>> the concerns over statd's responsiveness. It depended so much on network
>> status and clients' participations. I was hoping NFS V4 would catch up
>> by the time v2/v3 grace period patches got accepted into mainline
>> kernel. Ideally the v2/v3 lock reclaiming logic could use (or at least
>> did a similar implementation) the communication channel established by
>> v4 servers - that is,
>>
>> 1. Enable grace period as previous submitted patches on secondary server.
>> 2. Drop the locks on primary server (and chained the dropped locks into
>> a lock-list).
>>
>
> What information exactly would be on that lock list?
>
Can't believe I got myself into this ... I'm supposed to be a disk
firmware person *now* ... Anyway,
Is the lock state finalized in v4 yet? Can we borrow the concepts (and
saved lock states) from v4? We could certainly define the saved state
needed for v3 independently of v4: say client IP, file path, lock range,
lock type, and user id. I need to re-read the Linux source to make sure
it is doable, though.
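
To make the suggested saved state concrete, here is a minimal sketch of
one lock-list entry carrying the five fields named above (client IP,
file path, lock range, lock type, user id). Everything in it is an
assumption for illustration: the names LockRecord, dump_lock_list and
load_lock_list are hypothetical, JSON is only a stand-in wire format,
and nothing here is taken from the kernel or from the v4 state.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LockRecord:
    """One saved lock. All field names are hypothetical."""
    client_ip: str   # address of the client holding the lock
    file_path: str   # path of the locked file on the export
    start: int       # byte offset where the locked range begins
    length: int      # range length; 0 means "to end of file", as in fcntl
    lock_type: str   # "read" or "write"
    user_id: int     # uid of the lock owner


def dump_lock_list(records):
    """Serialize a lock-list so it can be shipped to another server."""
    return json.dumps([asdict(r) for r in records])


def load_lock_list(payload):
    """Rebuild the lock-list on the receiving server."""
    return [LockRecord(**item) for item in json.loads(payload)]
```

A path is used here because the mail proposes one; a real design might
prefer file handles, as the later question about fh's suggests.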
>
>> 3. Send the lock-list via v4 communication channel (or similar
>> implementation) from primary server to backup server.
>> 4. Reclaim the lock base on the lock-list on backup server.
>>
>
> So at this step it's the server itself reclaiming those locks, and
> you're talking about a completely transparent migration that doesn't
> look to the client like a reboot?
>
Yes, that's the idea ... I have never implemented any prototype code,
so I'm not sure how feasible it would be.
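
The four steps quoted above (enable grace on the backup, drop the locks
on the primary into a lock-list, send the list, reclaim on the backup)
can be sketched as a toy in-memory simulation. This is not prototype
code for the kernel: Server, drop_locks, reclaim and fail_over are
hypothetical names, locks are plain tuples, and a direct function call
stands in for the v4 (or similar) communication channel.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy stand-in for an NFS server; not a real lockd interface."""
    name: str
    in_grace: bool = False
    locks: set = field(default_factory=set)

    def drop_locks(self):
        """Step 2: release every held lock and chain it into a lock-list."""
        lock_list = sorted(self.locks)
        self.locks.clear()
        return lock_list

    def reclaim(self, lock_list):
        """Step 4: re-establish locks from the list, only during grace."""
        if not self.in_grace:
            raise RuntimeError("reclaim attempted outside the grace period")
        self.locks.update(lock_list)


def fail_over(primary, backup):
    """Run steps 1-4 in order and return the transferred lock-list."""
    backup.in_grace = True            # step 1: enable grace on the backup
    lock_list = primary.drop_locks()  # step 2: drop locks on the primary
    # Step 3: a direct hand-off stands in for the server-to-server channel.
    backup.reclaim(lock_list)         # step 4: reclaim from the lock-list
    backup.in_grace = False           # grace ends once the reclaim is done
    return lock_list
```

Because the server itself reclaims the locks, the clients take no part
in this flow, which is what would make the migration look transparent
rather than like a reboot.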
> My feeling has been that that's best done after first making sure we can
> handle the case where the client reclaims the locks, since the latter is
> easier, and is likely to involve at least some of the same work. I
> could be wrong.
>
Makes sense ... so the steps taken may be:
1. Push the patch sets that we originally submitted. This is to make
sure we have something working.
2. Prototype the new logic in parallel with v4 development, and observe
and learn from the results of step 1 based on user feedback.
3. Integrate the new logic, if it turns out to be good.
> Exactly which data has to be transferred from the old server to the new?
> (Lock types, ranges, fh's, owners, and pid's, for established locks; do
> we also need to hand off blocking locks? Statd data still needs to be
> transferred. Ideally rpc reply caches. What else?)
>
All statd has is the client network addresses (which are already part
of the current NLM state anyway). Yes, the rpc reply cache is important
(and that's exactly the motivation for this thread of discussion).
Eventually the rpc reply cache needs to be transferred. Once the
communication channel is established, there is no reason for the lock
state not to take advantage of it.
>
>> In short, it would be nice to replace the existing statd lock reclaiming
>> logic with the above steps if all possible during active-active
>> failover. For reboot, on the other hand, should stay same as today's
>> statd logic without changes.
>>
As mentioned before, cluster issues are not trivial. Take one step at a
time ... So the next task we should be focusing on may be the grace
period patch. I will see what I can do to help out here.
-- Wendy