From: Ben Greear <greearb@candelatech.com>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: "Myklebust, Trond" <Trond.Myklebust@netapp.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: Question on nfs40_discover_server_trunking.
Date: Fri, 18 Jan 2013 15:21:12 -0800
Message-ID: <50F9D8E8.6080803@candelatech.com>
In-Reply-To: <C11ACA70-EC3A-4FAD-8ADC-572F9B6B4CFE@oracle.com>

On 01/18/2013 03:14 PM, Chuck Lever wrote:
>
> On Jan 18, 2013, at 5:59 PM, "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
>
>> On Fri, 2013-01-18 at 17:03 -0500, an unknown sender wrote:
>>> On Fri, 2013-01-18 at 16:33 -0500, Chuck Lever wrote:
>>>> On Jan 18, 2013, at 4:28 PM, Ben Greear <greearb@candelatech.com> wrote:
>>>>
>>>>> Any chance the STALE_CLIENTID case needs a 'break'?
>>>>
>>>> I don't think so. LEASE_CONFIRM is set, and we want to wake the state renewal thread.
>>>>
>>>>>
>>>>> Twice I've seen kernel crashes after the nfs40_walk_client_list
>>>>> failed (though code comments say it should never fail).
>>>>
>>>> nfs40_walk_client_list() is looking for an nfs_client that is supposed to already be in the nfs_client list. If the search fails, that's a bug.
>>>>
>>>> Eyeball the contents of your nfs_client list. You should find an appropriate nfs_client in there, and then figure out why the search doesn't find it.
>>>
>>> You have considered the fact that the call to
>>> nfs4_proc_setclientid_confirm can potentially return
>>> NFS4ERR_STALE_CLIENTID if the server rebooted while the client was
>>> walking the list?
>>
>> In fact, as far as I can see, the correct behaviour in
>> nfs40_discover_server_trunking() should be to re-issue the setclientid
>> call, and then walk the list again if nfs40_walk_client_list() returns
>> NFS4ERR_STALE_CLIENTID.
>
> When I wrote the server trunking detection logic, I think we hadn't clearly decided what needed to be done in the STALE_CLIENTID case.
>
>> Something like the attached patch:
>
> A couple of comments:
>
> o nfs_get_client() already sticks the new client on the tail of the nfs_client list
>
> o We don't want to get stuck in a loop here. Should the "do {}" loop in nfs40_discover_server_trunking() be bounded by a retry count?
>
> However, I haven't heard Ben say "oh, yes, my server had rebooted." I'd like some confirmation that the match failed for an explainable and expected reason.

The server machine did not reboot, but it is badly overloaded:
it is trying to serve 3000 mount points that are
constantly being brought up and torn down while
NFS write traffic is going on.

Even with all this, I've seen this particular problem only twice in
around two days of solid testing. (I've been optimizing my user-space
app, and the better it gets, the more kernel bugs I find!)

If you have some particular debug info you want printed in
the failure case, I'll be happy to run with that.
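
The following is a minimal user-space sketch of the policy discussed above. It shows Trond's suggestion (re-issue SETCLIENTID and walk the list again when the walk returns STALE_CLIENTID) combined with Chuck's point that the loop should be bounded by a retry count. Everything here is hypothetical and is not the kernel's nfs40_discover_server_trunking(). The names sim_server, sim_setclientid, sim_walk_client_list and discover_trunking, as well as the retry limit, are illustrative assumptions. The only external fact used is the NFS4ERR_STALE_CLIENTID value (10022) from the NFSv4.0 spec.

```c
/* NFSv4.0 status code (RFC 7530). The Linux client passes NFS errors
 * around as negative values; this sketch mimics that convention. */
#define NFS4ERR_STALE_CLIENTID 10022

/* Hypothetical simulated server: how many more times it will "forget"
 * our client ID (e.g. because it rebooted mid-walk). */
struct sim_server {
    int reboots_left;
    int setclientid_calls;
};

/* Hypothetical stand-in for SETCLIENTID; always succeeds here. */
static int sim_setclientid(struct sim_server *srv)
{
    srv->setclientid_calls++;
    return 0;
}

/* Hypothetical stand-in for walking the nfs_client list, which issues
 * SETCLIENTID_CONFIRM. Reports a stale client ID while the simulated
 * server still has "reboots" pending. */
static int sim_walk_client_list(struct sim_server *srv)
{
    if (srv->reboots_left > 0) {
        srv->reboots_left--;
        return -NFS4ERR_STALE_CLIENTID;
    }
    return 0;
}

/* On STALE_CLIENTID, re-issue SETCLIENTID and walk the list again,
 * but allow at most max_retries extra attempts so that a server that
 * keeps losing our state cannot trap us in an endless loop. */
static int discover_trunking(struct sim_server *srv, int max_retries)
{
    int status;
    int attempt = 0;

    do {
        status = sim_setclientid(srv);
        if (status != 0)
            break;
        status = sim_walk_client_list(srv);
    } while (status == -NFS4ERR_STALE_CLIENTID && attempt++ < max_retries);

    return status;
}
```

With max_retries set to 3, two simulated reboots are absorbed after three SETCLIENTID calls. Ten simulated reboots give up after four calls and return -NFS4ERR_STALE_CLIENTID to the caller.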
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com