From: "Dr. J. Bruce Fields" <bfields@fieldses.org>
To: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Harry Edmon <harry@uw.edu>, Chuck Lever <chuck.lever@oracle.com>,
	linux-nfs@vger.kernel.org
Subject: Re: 2.6.38.6 - state manager constantly respawns
Date: Mon, 16 May 2011 16:53:51 -0400
Message-ID: <20110516205351.GD1680@fieldses.org>
In-Reply-To: <20110516202059.GC1680@fieldses.org>

On Mon, May 16, 2011 at 04:20:59PM -0400, Dr. J. Bruce Fields wrote:
> On Mon, May 16, 2011 at 03:54:16PM -0400, Trond Myklebust wrote:
> > On Mon, 2011-05-16 at 12:48 -0700, Harry Edmon wrote:
> > > On 05/16/11 12:43, Trond Myklebust wrote:
> > > > On Mon, 2011-05-16 at 12:36 -0700, Harry Edmon wrote:
> > > >
> > > >> On 05/16/11 12:22, Chuck Lever wrote:
> > > >>
> > > >>> On May 16, 2011, at 3:12 PM, Harry Edmon wrote:
> > > >>>
> > > >>>
> > > >>>
> > > >>>> Attached is 1000 lines of output from tshark when the problem is occurring. The client and server are connected by a private ethernet.
> > > >>>>
> > > >>>>
> > > >>> Disappointing: tshark is not telling us the return codes. However, I see "PUTFH;READ" then "RENEW" in a loop, which indicates the state manager thread is being kicked off because of ongoing difficulties with state recovery. Is there a stuck application on that client?
> > > >>>
> > > >>> Try again with "tshark -V".
> > > >>>
> > > >>>
> > > >> Here is the output from tshark -V (first 50,000 lines). Nothing
> > > >> appears to be stuck, and as I said when I reboot the client into 2.6.32
> > > >> the problem goes away, only to reappear when I reboot it back into 2.6.38.6.
> > > >>
> > > >>
> > > > Possibly, but it definitely indicates a server bug. What kind of server
> > > > are you using?
> > > >
> > > > Basically, the client is getting confused because when it sends a READ,
> > > > the server is telling it that the lease has expired, then when it sends
> > > > a RENEW, the same server replies that the lease is OK...
> > > >
> > > > Trond
> > > >
> > > The server is running the 2.6.38.6 kernel with Debian squeeze, just like
> > > the client. The kernel config is attached.
> >
> > Bruce, any idea how the server might get into this state?
>
> So READ is getting ESTALE
Err, sorry, EXPIRED.
> and RENEW is getting OK? And we're positive
> that the stateid on the READ is derived from the clientid sent with the
> RENEW?
>
> OK, I'll look at the capture....
Hm, so the renews all have clid 465ccc4d09000000, and the reads all have
a stateid (0, 465ccc4dc24c0a0000000000).
So the first 4 bytes matching just tells me both were handed out by the
same server instance (so there was no server reboot in between); there's
no way for me to tell whether they really belong to the same client.
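
A minimal sketch of that prefix comparison, using the two values from
the capture. The helper names are hypothetical, and reading the prefix
as a little-endian 32-bit timestamp is only an assumption about how the
server fills in that field, not something the capture confirms:

```python
# Values copied from the capture: the clientid sent with RENEW and the
# 12-byte "other" part of the stateid sent with READ (seqid was 0).
RENEW_CLIENTID = "465ccc4d09000000"
READ_STATEID_OTHER = "465ccc4dc24c0a0000000000"


def instance_prefix(hex_id: str) -> bytes:
    """Return the first 4 bytes of an opaque id given as a hex string."""
    return bytes.fromhex(hex_id)[:4]


def same_server_instance(clientid_hex: str, stateid_other_hex: str) -> bool:
    """True if both ids carry the same 4-byte prefix.

    This says nothing about whether the stateid belongs to that clientid.
    """
    return instance_prefix(clientid_hex) == instance_prefix(stateid_other_hex)


def prefix_as_le_seconds(hex_id: str) -> int:
    """ASSUMPTION: treat the prefix as a little-endian 32-bit count of
    seconds since the epoch (e.g. a boot time). Illustrative only."""
    return int.from_bytes(instance_prefix(hex_id), "little")


if __name__ == "__main__":
    print(same_server_instance(RENEW_CLIENTID, READ_STATEID_OTHER))
    print(prefix_as_le_seconds(RENEW_CLIENTID))
```

A match only rules out a server restart between the two ids being
issued; the remaining bytes are needed to tie a stateid to a client,
and that mapping lives in the server's tables, not on the wire.
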
The server does assume that any stateid from the current server instance
that no longer exists in its table is expired. I believe that's
correct, given a correctly functioning client, but perhaps I'm missing a
case.
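
The rule above can be sketched roughly as follows. This is an
illustration under stated assumptions, not the actual nfsd code:
check_stateid, the set-based table, and the 4-byte instance prefix are
all hypothetical stand-ins, and the stale-stateid branch for ids from
another server instance is the usual NFSv4 behaviour rather than
something discussed in this thread.

```python
from enum import Enum


class Status(Enum):
    OK = "NFS4_OK"
    STALE_STATEID = "NFS4ERR_STALE_STATEID"
    EXPIRED = "NFS4ERR_EXPIRED"


def check_stateid(other: bytes, current_instance: bytes, table: set) -> Status:
    """Classify a stateid's opaque part (hypothetical helper).

    other            -- opaque bytes of the stateid from the request
    current_instance -- 4-byte prefix identifying this server instance
    table            -- set of opaque values the server currently knows
    """
    # Handed out by some other server instance (e.g. before a reboot).
    if other[:4] != current_instance:
        return Status.STALE_STATEID
    # From this instance but no longer in the table: treated as expired.
    if other not in table:
        return Status.EXPIRED
    return Status.OK


if __name__ == "__main__":
    instance = bytes.fromhex("465ccc4d")
    read_stateid = bytes.fromhex("465ccc4dc24c0a0000000000")
    # An empty table models the server having dropped this state.
    print(check_stateid(read_stateid, instance, set()).value)
```

With an empty table the READ stateid from the capture falls into the
expired branch, while a RENEW for a clientid the server still knows
would succeed, which is the pattern seen in the trace.
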
--b.