Re: PROBLEM: NFS Client Ignores TCP Resets


From: NeilBrown <nfbrown@novell.com>
To: Richard Laager <rlaager@wiktel.com>,
	trond.myklebust@primarydata.com, anna.schumaker@netapp.com
Cc: linux-nfs@vger.kernel.org
Subject: Re: PROBLEM: NFS Client Ignores TCP Resets
Date: Sun, 03 Apr 2016 13:58:59 +1000
Message-ID: <87twjjpcl8.fsf@notabene.neil.brown.name>
In-Reply-To: <56BFE55D.1010509@wiktel.com>


On Sun, Feb 14 2016, Richard Laager wrote:

> [1.] One line summary of the problem:
>
> NFS Client Ignores TCP Resets
>
> [2.] Full description of the problem/report:
>
> Steps to reproduce:
> 1) Mount NFS share from HA cluster with TCP.
> 2) Failover the HA cluster. (The NFS server's IP address moves from one
>     machine to the other.)
> 3) Access the mounted NFS share from the client (an `ls` is sufficient).
>
> Expected results:
> Accessing the NFS mount works fine immediately.
>
> Actual results:
> Accessing the NFS mount hangs for 5 minutes. Then the TCP connection 
> times out, a new connection is established, and it works fine again.
>
> After the IP moves, the new server responds to the client with TCP RST 
> packets, just as I would expect. I would expect the client to tear down 
> its TCP connection immediately and re-establish a new one. But it 
> doesn't. Am I confused, or is this a bug?
>
> For the duration of this test, all iptables firewalling was disabled on 
> the client machine.
>
> I have a packet capture of a minimized test (just a simple ls):
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1542826/+attachment/4571304/+files/dovecot-test.upstream-kernel.pcap

I notice that the server sends packets from a different MAC address to
the one it advertises in ARP replies (and the one the client sends to).
This is probably normal - maybe you have two interfaces bonded together?

Maybe it would help to be explicit about the network configuration
between client and server - are there switches, and are they software or
hardware?

Where is tcpdump being run?  On the (virtual) client, or on the
(physical) host or elsewhere?

As you say, everything looks perfect until the server sends an RST and
the client appears to ignore it.  The from/to addresses are all
identical to those on the subsequent SYN/ACK, which is not ignored, so it
seems unlikely that the SYN/ACK would get through but not the RST.

This bug (it is definitely a bug somewhere) looks suspiciously similar
to the one fixed by
Commit: 7b514a886ba5 ("tcp: accept RST without ACK flag")
but that was fixed 3 years ago - a temporary bug in v3.8.  I cannot see
any evidence that it has crept back.

Can you create a TCP connection to some other port on the server
(telnet? ssh? http?) and see what happens to it on fail-over?
You would need some protocol that the server won't quickly close.
Maybe just "telnet SERVER 2049" and don't type anything until after the
failover.

If that closes quickly, then maybe it is an NFS bug.  If that persists
for a long timeout before closing, then it must be a network bug -
either in the network code or the network hardware.
In that case, netdev@vger.kernel.org might be the best place to ask.
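
If telnet is not convenient, the same test could be scripted.  The
following is only a sketch in Python: the function names, the single
newline byte it sends, and the timeouts are choices made for
illustration, not anything taken from the NFS client.  It connects,
stays silent until told the failover has happened, sends one byte to
provoke a response from the new server, and reports how the connection
ends.

```python
import socket
import time


def classify_outcome(exc):
    """Turn the exception (or None) from the send/recv step into a label."""
    if exc is None:
        return "peer closed the connection cleanly"
    if isinstance(exc, ConnectionResetError):
        return "connection reset (RST was honoured)"
    if isinstance(exc, socket.timeout):
        return "no response before the wait expired"
    return "other error: %s" % exc


def probe_after_failover(host, port, wait_for_failover=None, max_wait=60.0):
    """Open a TCP connection, idle across the failover, then poke it once.

    wait_for_failover is a hypothetical hook: any callable that returns
    once the server IP has moved.  By default it waits for Enter on stdin.
    """
    if wait_for_failover is None:
        wait_for_failover = lambda: input("Press Enter after the failover: ")
    with socket.create_connection((host, port), timeout=10) as sock:
        wait_for_failover()            # send nothing while the IP moves
        sock.settimeout(max_wait)
        start = time.monotonic()
        try:
            sock.sendall(b"\n")        # like typing a line into telnet
            data = sock.recv(1)        # returns on data or FIN; raises on RST or timeout
            outcome = classify_outcome(None) if data == b"" else "peer sent data"
        except OSError as exc:
            outcome = classify_outcome(exc)
        return outcome, time.monotonic() - start
```

Calling probe_after_failover("SERVER", 2049) and pressing Enter once the
failover has completed gives the outcome and the number of seconds it
took.  A "connection reset" result after a second or so would point at
NFS; "no response" would match the hang described above and point at the
network.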

Looking at the debug logs, the most interesting (to me) part is

2016-03-11T03:27:24.897463-06:00 imap1 kernel:
  [  479.708050] RPC:       xs_error_report client ffff88003cfe4000, error=113...

error 113 is EHOSTUNREACH.  This strongly suggests that the network layer
didn't "see" the RST and has only broken the connection because it isn't
getting a reply from the server for its GETATTR retransmissions.
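
The number-to-name mapping is easy to check rather than trust from
memory.  A small Python sketch (describe_errno is a hypothetical helper
name, and errno values are platform-specific, so this should be run on
the Linux client itself):

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


# 113 is the value from the xs_error_report line above.
print(113, *describe_errno(113))
```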

If you were up to building your own kernel, I would suggest putting some
printks in tcp_validate_incoming() (in net/ipv4/tcp_input.c).

Print a message if th->rst is ever set, and another if the
tcp_sequence() test causes it to be discarded.  It shouldn't be, but
something seems to be discarding it somewhere...

NeilBrown


