From: Guillaume Morin <guillaume@morinfr.org>
To: Chuck Lever <chucklever@gmail.com>
Cc: Guillaume Morin <guillaume@morinfr.org>,
	Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Trond Myklebust <trond.myklebust@primarydata.com>,
	Chris Mason <clm@fb.com>
Subject: Re: [BUG] nfs3 client stops retrying to connect
Date: Mon, 8 Jun 2015 19:10:06 +0200
Message-ID: <20150608171006.GA13396@bender.morinfr.org>
In-Reply-To: <22109174-5489-46AB-8C0A-62840D63DC97@gmail.com>

Chuck,

On 04 Jun 22:57, Chuck Lever wrote:
> > I am 100% sure that XPRT_CONNECTING is the issue because 1) the state
> > had the flag up 2) there was absolutely no nfs network traffic between the
> > client and the server 3) I "unfroze" the mounts by clearing it manually.
> > 
> > xs_tcp_cancel_linger_timeout, I think, is guaranteed to clear the flag.
> 
> I'm speculating based on some comments in the git log, but what if
> the transport never sees TCP_CLOSE, but rather gets an error_report
> callback instead?

I don't think that could be it: xs_tcp_setup_socket() does the
connecting and clears the bit in all cases, so by the time you got a
TCP_CLOSE the bit would have been cleared long ago.

So that's why I thought the best explanation was a place where the
worker task running xs_tcp_setup_socket() is cancelled and the bit is
not cleared.  This is how I found xs_tcp_close().
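
To make the suspicion concrete, here is a small userspace model of it.
It is not the sunrpc code: struct model_xprt and all the model_*
functions are made up for illustration.  It only assumes that the bit is
set when the connect worker is queued and that a new connect attempt is
refused while the bit is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the transport state; NOT struct rpc_xprt. */
struct model_xprt {
	bool connecting;	/* models the XPRT_CONNECTING bit */
	bool work_pending;	/* models the queued connect worker */
};

/* Queue the connect worker unless a connect is already marked in flight. */
static bool model_connect(struct model_xprt *x)
{
	if (x->connecting)
		return false;	/* refused: a connect is supposedly running */
	x->connecting = true;
	x->work_pending = true;
	return true;
}

/* The worker: whatever the outcome of the connect, it clears the bit. */
static void model_worker_run(struct model_xprt *x)
{
	if (!x->work_pending)
		return;
	assert(x->connecting);	/* a queued worker implies the bit is set */
	x->work_pending = false;
	/* ... the connect attempt itself would go here ... */
	x->connecting = false;
}

/* A cancel path that clears the bit when it really cancelled the worker. */
static void model_cancel_and_clear(struct model_xprt *x)
{
	if (x->work_pending) {
		x->work_pending = false;
		x->connecting = false;
	}
}

/* The suspected path: the worker is cancelled but the bit stays set. */
static void model_close_without_clear(struct model_xprt *x)
{
	x->work_pending = false;
}

/* Bit set with no worker left to clear it: every later connect is refused. */
static bool model_is_stuck(const struct model_xprt *x)
{
	return x->connecting && !x->work_pending;
}
```

In this model, model_connect() followed by model_worker_run() or
model_cancel_and_clear() always leaves the bit clear.  model_connect()
followed by model_close_without_clear() leaves model_is_stuck() true, and
every later model_connect() returns false, which is the kind of frozen
mount I am seeing.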

> > Either the callback is canceled and it clears the flag or the callback
> > will do it.  I am not sure how this could leave the flag set but I am
> > not familiar with this code, so I could totally be missing something
> > obvious.
> > 
> > xs_tcp_close() is the only thing I have found which cancels the callback
> > and does not clear the flag.
> 
> How would xs_tcp_close() be invoked?

TBH I do not know.  It's the close() method of the xprt, so I assume
there are a few places it could be called from.  But I am not familiar
with the code base.

> >> It's rather academic, though. All this code was replaced in 4.0.
> > 
> > Well, it's not academic for all the users of the stable branches which
> > might have this bug in the kernel they're running :-)
> 
> I didn't mean to be glib. The point is, stable kernels are always fixed
> by backporting an existing fix from a newer kernel.

The stable kernel rules say an "equivalent" fix in Linus' tree is
enough.  I think that Greg would pick up this fix unless it's too
complicated.

Nevertheless, it's such an annoying bug I am pretty sure the
distributions would pick it up if Greg does not.

We had to move an NFS server on Friday and a few machines hit the same
issue again.

Thanks for your help, I appreciate it.

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>
