From: "Labiaga, Ricardo" <ricardo.labiaga@netapp.com>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: William Adamson <William.Adamson@netapp.com>,
<linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 0/12] Fix session reset deadlocks Version 4
Date: Sat, 05 Dec 2009 19:25:12 -0800
Message-ID: <C7406418.1099B%ricardo.labiaga@netapp.com>
In-Reply-To: <1260059679.10985.9.camel@localhost>
These patches do improve the situation. I still see a number of sequence
calls with sessionID=0 and the same sequenceID that triggered the initial
BADSESSION. It does recover after the session is fully established though.
The sequence calls with sessionID=0 are generated because nfs4_reset_session()
clears the DRAINING flag and wakes the pending RPCs even on error. This is
broken, since we don't have a valid sessionID. Since we're already in the
state manager, why not just let the state manager retry if the error is
recoverable (such as STALE_CLIENTID)?
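As a rough illustration of the idea above, here is a standalone toy model in
C. It is not the fs/nfs code: every name in it (toy_session, toy_err,
toy_reset_wake_always, toy_reset_wake_on_success, toy_state_manager) is
hypothetical, and the outcome of creating a session is scripted by the caller
rather than coming from a server. It only contrasts ending the drain
unconditionally with ending it on success and letting the state manager loop
retry on a recoverable error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are the real fs/nfs definitions. */
enum toy_err { TOY_OK = 0, TOY_STALE_CLIENTID, TOY_FATAL };

struct toy_session {
	bool valid;         /* true once a session has been (re)created */
	bool draining;      /* models the DRAINING flag */
	int  woken_invalid; /* times waiters were released with no valid session */
};

/* Assumption: only STALE_CLIENTID is treated as recoverable in this model. */
static bool toy_recoverable(enum toy_err e)
{
	return e == TOY_STALE_CLIENTID;
}

/* Scripted session creation: the caller supplies the outcome. */
static enum toy_err toy_create_session(struct toy_session *s, enum toy_err outcome)
{
	s->valid = (outcome == TOY_OK);
	return outcome;
}

/* Clear the drain flag and "wake" waiters, counting wakeups that would
 * go out without a valid session (the sessionID=0 case). */
static void toy_end_drain(struct toy_session *s)
{
	s->draining = false;
	if (!s->valid)
		s->woken_invalid++;
}

/* Behaviour described above: the drain ends even when the reset failed. */
static enum toy_err toy_reset_wake_always(struct toy_session *s, enum toy_err outcome)
{
	s->draining = true;
	s->valid = false; /* old session destroyed */
	enum toy_err e = toy_create_session(s, outcome);
	toy_end_drain(s);
	return e;
}

/* Suggested alternative: keep draining on error and let the caller retry. */
static enum toy_err toy_reset_wake_on_success(struct toy_session *s, enum toy_err outcome)
{
	s->draining = true;
	s->valid = false;
	enum toy_err e = toy_create_session(s, outcome);
	if (e == TOY_OK)
		toy_end_drain(s);
	return e;
}

/* State manager loop: retry the reset while the error is recoverable.
 * What to do with waiters after a fatal error is outside this sketch. */
static enum toy_err toy_state_manager(struct toy_session *s,
				      const enum toy_err *outcomes, int n)
{
	enum toy_err e = TOY_FATAL;

	assert(n > 0);
	for (int i = 0; i < n; i++) {
		e = toy_reset_wake_on_success(s, outcomes[i]);
		if (e == TOY_OK || !toy_recoverable(e))
			break;
	}
	return e;
}
```

In this model, calling toy_reset_wake_always() with TOY_STALE_CLIENTID
releases the waiters once without a valid session, while
toy_state_manager() given the script { TOY_STALE_CLIENTID, TOY_OK } never
does, because the drain only ends after the second attempt succeeds.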
I'll give that a try after dinner :-)
- ricardo
On 12/5/09 4:34 PM, "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
> On Sat, 2009-12-05 at 13:42 -0800, Labiaga, Ricardo wrote:
>>
>>
>> On 12/5/09 1:39 PM, "Ricardo Labiaga" <ricardo.labiaga@netapp.com> wrote:
>>
>>> On 12/5/09 1:12 PM, "Trond Myklebust" <Trond.Myklebust@netapp.com> wrote:
>>>
>>>> On Sat, 2009-12-05 at 12:55 -0800, Labiaga, Ricardo wrote:
>>>>> Tried with this patch but it didn't make a difference.
>>>>
>>>> You are still seeing RPC calls with 0 session ids?
>>>>
>>>
>>> Yes, right after the session is destroyed, and before it's recreated. The
>>> original RPC that got the BAD_SESSION error keeps on trying.
>>>
>>
>> I should clarify. It's not a retransmission, the client issues the same
>> compound with a new XID.
>>
>> - ricardo
>>
>>> After the session is recreated, the same RPC is issued (with the same
>>> sequenceID) but with the new sessionID. This time it fails with
>>> SEQ_MISORDERED. This repeats indefinitely until the process is manually
>>> interrupted.
>>>
>>>>> I haven't tried applying the second cleanup patch yet since it
>>>>> didn't apply cleanly on top of nfs-for-next. Is this the branch you
>>>>> used?
>>>>
>>>> I've pushed out all patches (including the cleanup patch) onto
>>>> nfs-for-next now...
>>>>
>>>
>>> Got it, I was able to apply both patches. The results above are with both
>>> patches.
>
> I've found some other interesting session reset cases. I've coded up
> some fixes, and pushed them to the nfs-for-next tree.
>
> In particular, please see
>
> http://git.linux-nfs.org/?p=trondmy/nfs-2.6.git&a=commitdiff&h=f26468fb9384e73fb357d2e84d3e9c88c7d1129d
> which should ensure that we always reinitialise the slot sequence number
> after a server reboot.
>
> Could you please see if that in any way changes the above behaviour?
>
> Cheers
> Trond