All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Wysochanski <dwysocha@redhat.com>
To: Olga Kornievskaia <olga.kornievskaia@gmail.com>
Cc: trond.myklebust@hammerspace.com,
	Anna Schumaker <anna.schumaker@netapp.com>,
	linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: [PATCH 1/1] SUNRPC: fix handling of half-closed connection
Date: Fri, 22 Feb 2019 14:17:29 -0500	[thread overview]
Message-ID: <1550863049.9958.3.camel@redhat.com> (raw)
In-Reply-To: <CAN-5tyEpWevmXJb8weecFBm_V=FZm4Zc=b+rrwJFtMgUYhm2tg@mail.gmail.com>

On Fri, 2019-02-22 at 12:10 -0500, Olga Kornievskaia wrote:
> On Fri, Feb 22, 2019 at 11:32 AM Dave Wysochanski <dwysocha@redhat.co
> m> wrote:
> > 
> > On Fri, 2019-02-22 at 09:44 -0500, Olga Kornievskaia wrote:
> > > Hi Dave,
> > > 
> > > A re-producer is a server that sends an ACK to the client's
> > > FIN/ACK
> > > request and does nothing afterwards (I can reproduce it 100% with
> > > a
> > > hacked up server. It was discovered with a "broken" server that
> > > doesn't fully closes a connection). It leave this client unable
> > > to
> > > connect to this server again indefinitely/forever/reboot required
> > > kind
> > > of state. Once it was considered that doing something like that
> > > to
> > > the
> > > client is a form of an attack (denial-of-server) and thus the
> > > kernel
> > > has a tcp_fin_timeout option after which the kernel will abort
> > > the
> > > connection. However this only applies to the sockets that have
> > > been
> > > closed by the client. This is NOT the case. NFS does not close
> > > the
> > > connection and it ignores kernel's notification of FIN_WAIT2
> > > state.
> > > 
> > 
> > Interesting.  I had the same reproducer but eventually an RST would
> > get
> > sent from the NFS client due to the TCP keepalives.  It sounds like
> > that is not the case anymore.  When I did my testing was over a
> > year
> > and a half ago though.
> 
> after the keepalives the TCP also sends a FIN/ACK if the server
> properly sent a FIN/ACK back, then keep alive will indeed be
> sufficient. Can you check if in your trace server in one time sent
> just an ACK but in another case sent the FIN/ACK?
> 

I had two reproducers actually.  In both cases the NFS server would
never send a FIN.

The first bug (https://bugzilla.redhat.com/show_bug.cgi?id=1458421) was
about a NFS server crash after being in the half-close state.  An NFS4
client without that keepalive patch would hang.
This is a valid test case and we check for that now in all releases. 
Here's the outline:
# Outline
# Step 1. On NFS client: tcpdump -U -q -i $INTERFACE -w $PCAP_FILE -s 0 port 2049 and host $NFS_SERVER
# Step 2. On NFS client, mount NFS4 server export with "vers=4,timeo=5"
# Step 3. On NFS server, block outgoing FIN packets from the NFS server"
# Step 4. On NFS server, using systemtap script, to delay NFS4OPEN_CONFIRM response for a few seconds
# Step 5. On NFS client, open a file, which should get delayed and trigger a FIN from the NFS client
# Step 6. On NFS server, after a short delay, crash the server by 'echo c > /proc/sysrq-trigger'
# Step 7. On NFS client, wait for NFS server to reboot
# Step 8. On NFS client, wait for the NFS to reconnect TCP connection
# Step 9. On NFS client, wait for NFS4 grace period
# Step 10. On NFS client, try a 'ls' of the file created earlier
# Step 11. On NFS client, sleep for $DELAY seconds to allow all operations to complete
# Step 12. On NFS client, ensure all operations have completed
# Step 13. Cleanup


The second test case (after the keepalive patch) was arguably invalid
test as I used systemtap on the NFS server to never close the socket so
it would never send a FIN.  This obviously should never happen but at
the time I wanted to see if the NFS client would recover.
I did that with this:
probe module("sunrpc").function("svc_xprt_enqueue") {
	printf("%s: %s removing XPT_CLOSE from xprt = %p\n", tz_ctime(gettimeofday_s()), ppfunc(), $xprt);
	$xprt->xpt_flags = $xprt->xpt_flags & ~(1 << 2);
}
Here's the full outline for this test:

# This test checks automatic NFS TCP recovery after a specific failure scenario.
# The failure scenario is with an NFSv3 mount that goes idle and the NFS client closes the
# TCP connection, and the final FIN from the NFS server gets lost.
# This can be a tricky scenario since the NFS client's TCP may get stuck in FIN-WAIT-2 indefinitely.
# The NFS client should be able to recover the NFS TCP connection automatically without unmount/mount
# cycle or reboot, and the TCP connection should not be stuck in FIN-WAIT-2 or any other non-connected
# state.
# 
# Outline
# Step 1. On NFS client: tcpdump -U -q -i $INTERFACE -w $PCAP_FILE -s 0 port 2049 and host $NFS_SERVER
# Step 2. On NFS client, mount NFS3 server export
# Step 3. On NFS server, using systemtap script, force the NFS server to never send a FIN (avoid close)
# Step 4. On NFS client, verify the NFS mount with 'ls' and TCP connection in TCP_ESTABLISHED
# Step 5. On NFS client, wait up to $time_limit seconds to allow NFS TCP connection to transition to TIME_WAIT state
# Step 6. On NFS client, wait up to $time_limit seconds to allow NFS TCP connection to disappear
# Step 7. Cleanup
#
# PASS: The state of NFS's TCP connection is usable by the end of the test (commands don't hang, etc)
# FAIL: The state of the NFS TCP connection remains in FIN-WAIT-2 indefinitely


> > > One can argue that this is a broken server and we shouldn't
> > > bother.
> > > But this patch is an attempt to argue that the client still
> > > should
> > > care and deal with this condition. However, if the community
> > > feels
> > > that a broken server is a broken server and this form of an
> > > attack is
> > > not interested, this patch can live will be an archive for later
> > > or
> > > never.
> > > 
> > 
> > This isn't IPoIB is it?
> 
> No, just normal TCP.
> 
> 
> > Actually, fwiw, looking back I had speculated on changes in this
> > area.
> > I'm adding you to the CC list of this bug which had some of my
> > musings
> > on it:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1466802#c43
> > That bug I ended up closing when we could no longer prove there was
> > any
> > state where the NFS client could get stuck in FIN_WAIT2 after the
> > keepalive patch.
> 
> I can reproduce current behavior with the current upstream code.
> 
> 
> > It can happen that the server only sends the ACK back to the
> > clients
> > FIN,ACK so in general the testcase is valid.  But then the question
> > is
> > how long should one wait for the final data and FIN from the
> > server, or
> > are there ever instances where you shouldn't wait forever?  Is
> > there a
> > way for us to know for sure there is no data left to receive so
> > it's
> > safe to timeout?  No RPCs outstanding?
> 
> Yep all good questions which I don't have answers to. I realize that
> for instance, that a server might send an ACK and then send a FIN/ACK
> after that. But why is it bad for the client to proactively send an
> RST (by shutting down the connection in TCP_FIN_WAIT2 if the client
> was shutting down the connection anyway.
> 
I think you could lose data from the server if you RST and don't wait
but other than that I don't know.  We may be splitting hairs here
though if there's no demonstrable harm to the application (NFS).  As
Trond points out elsewhere reconnect / port reuse may be an issue.

When I looked at this in the second bug I was almost convinced that
there was a problem with the 'close' method in xs_tcp_ops - we needed
to be calling xs_close(), but I ran into some problems and I wasn't
sure about the test case.


> > I don't claim to know many of the subtleties here as far as would
> > the
> > server wait forever in LAST_ACK or do implementations eventually
> > timeout after some time?  Seems like if these last packets get lost
> > there is likely a bug somewhere (either firewall or TCP stack,
> > etc).
> > https://tools.ietf.org/html/rfc793#page-22
> > 
> > It looks like at least some people are putting timeouts into their
> > stacks though I'm not sure that's a good idea or not.
> 
> I saw only one other driver in the kernel that does have handling of
> the TCP_FIN_WAIT2 state (driver/crypto/chelsio) ...


One final thought - is it possible that we could have a network that
does not support TCP keepalives?  In that instance, I'll bet we could
get an indefinite hang in FIN_WAIT2 on my first test case (server
crashed after the half-close state).

It does not look like TCP keepalives have to be implemented and maybe
some routers / NATs / firewalls disallow them (due to extra traffic)?
https://tools.ietf.org/html/rfc1122#page-101



  reply	other threads:[~2019-02-22 19:17 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-20 14:56 [PATCH 1/1] SUNRPC: fix handling of half-closed connection Olga Kornievskaia
2019-02-22 12:12 ` Dave Wysochanski
2019-02-22 13:45   ` Trond Myklebust
2019-02-22 14:46     ` Olga Kornievskaia
2019-02-22 15:05       ` Trond Myklebust
2019-02-22 15:11         ` Olga Kornievskaia
2019-02-22 15:50           ` Trond Myklebust
2019-02-22 17:02             ` Olga Kornievskaia
2019-02-22 17:09               ` Trond Myklebust
2019-02-22 14:44   ` Olga Kornievskaia
2019-02-22 16:32     ` Dave Wysochanski
2019-02-22 17:10       ` Olga Kornievskaia
2019-02-22 19:17         ` Dave Wysochanski [this message]
2019-02-22 20:00           ` Trond Myklebust
2019-02-23 17:31             ` Dave Wysochanski
2019-03-05 17:22             ` Olga Kornievskaia

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1550863049.9958.3.camel@redhat.com \
    --to=dwysocha@redhat.com \
    --cc=anna.schumaker@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=olga.kornievskaia@gmail.com \
    --cc=trond.myklebust@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.