From: Guillaume Morin <guillaume@morinfr.org>
To: Chuck Lever <chucklever@gmail.com>
Cc: Guillaume Morin <guillaume@morinfr.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
Trond Myklebust <trond.myklebust@primarydata.com>,
Chris Mason <clm@fb.com>
Subject: Re: [BUG] nfs3 client stops retrying to connect
Date: Mon, 8 Jun 2015 19:10:06 +0200
Message-ID: <20150608171006.GA13396@bender.morinfr.org>
In-Reply-To: <22109174-5489-46AB-8C0A-62840D63DC97@gmail.com>
Chuck,
On 04 Jun 22:57, Chuck Lever wrote:
> > I am 100% sure that XPRT_CONNECTING is the issue because 1) the state
> > had the flag up 2) there was absolutely no nfs network traffic between the
> > client and the server 3) I "unfroze" the mounts by clearing it manually.
> >
> > xs_tcp_cancel_linger_timeout, I think, is guaranteed to clear the flag.
>
> I'm speculating based on some comments in the git log, but what if
> the transport never sees TCP_CLOSE, but rather gets an error_report
> callback instead?
I don't think that could be it: xs_tcp_setup_socket() does the
connecting and clears the bit in all cases, so by the time you would get
a TCP_CLOSE it would have been cleared a while ago.
That's why I thought the best explanation was a place where the worker
task running xs_tcp_setup_socket() is cancelled and the bit is not
cleared. This is how I found xs_tcp_close().
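To make the suspected sequence concrete, here is a minimal userspace
sketch of it. It is not the sunrpc code: every model_* name is
hypothetical, MODEL_CONNECTING only stands in for XPRT_CONNECTING, and
the assumption that a new connect attempt is refused while the bit is
set is mine. It just shows how cancelling the worker without clearing
the bit could leave the transport stuck.

```c
#include <stdbool.h>

/* Hypothetical userspace model; NOT the kernel's sunrpc implementation. */
#define MODEL_CONNECTING (1u << 0)   /* stands in for XPRT_CONNECTING */

struct model_xprt {
    unsigned int state;   /* bit flags */
    bool work_pending;    /* stands in for the queued connect worker */
};

/* Start a connect. Assumption: refused while the bit is already set. */
static bool model_connect(struct model_xprt *x)
{
    if (x->state & MODEL_CONNECTING)
        return false;
    x->state |= MODEL_CONNECTING;
    x->work_pending = true;
    return true;
}

/* The worker clears the bit in all cases, but only if it actually runs. */
static void model_worker_run(struct model_xprt *x)
{
    if (!x->work_pending)
        return;
    x->work_pending = false;
    x->state &= ~MODEL_CONNECTING;
}

/* Suspected pattern: cancel the pending worker, leave the bit set. */
static void model_close_buggy(struct model_xprt *x)
{
    x->work_pending = false;
}

/* Same cancel, but the bit is cleared as well. */
static void model_close_fixed(struct model_xprt *x)
{
    x->work_pending = false;
    x->state &= ~MODEL_CONNECTING;
}
```

With this model, model_connect() followed by model_close_buggy() makes
every later model_connect() fail, because nothing is left to clear the
bit. With model_close_fixed() the next connect attempt goes through.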
> > Either the callback is canceled and it clears the flag or the callback
> > will do it. I am not sure how this could leave the flag set but I am
> > not familiar with this code, so I could totally be missing something
> > obvious.
> >
> > xs_tcp_close() is the only thing I have found which cancels the callback
> > and does not clear the flag.
>
> How would xs_tcp_close() be invoked?
TBH I do not know. It's the close() method of the xprt so I am assuming
there are a few places where it could be called. But I am not familiar
with the code base.
> >> It's rather academic, though. All this code was replaced in 4.0.
> >
> > Well, it's not academic for all the users of the stable branches which
> > might have this bug in the kernel they're running :-)
>
> I didn't mean to be glib. The point is, stable kernels are always fixed
> by backporting an existing fix from a newer kernel.
The stable kernel rules say an "equivalent" fix must be in Linus' tree. I
think that Greg would pick up this fix unless it's too complicated.
Nevertheless, it's such an annoying bug I am pretty sure the
distributions would pick it up if Greg does not.
We had to move an nfs server on Friday and I got a few machines that had
the same issue again...
Thanks for your help, I appreciate it.
Guillaume.
--
Guillaume Morin <guillaume@morinfr.org>