From: Rick Macklem <rmacklem@uoguelph.ca>
To: Chuck Lever <chuck.lever@oracle.com>
Cc: Bram Vandoren <brambi@gmail.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: NFS client hangs after server reboot
Date: Wed, 29 May 2013 08:49:24 -0400 (EDT) [thread overview]
Message-ID: <1091403305.30723.1369831764803.JavaMail.root@erie.cs.uoguelph.ca> (raw)
In-Reply-To: <BC47EE42-5F58-4555-A49C-4175B75943D8@oracle.com>
Chuck Lever wrote:
> On May 28, 2013, at 9:04 PM, Chuck Lever <chuck.lever@oracle.com>
> wrote:
>
> >
> > On May 28, 2013, at 7:30 PM, Rick Macklem <rmacklem@uoguelph.ca>
> > wrote:
> >
> >> Bram Vandoren wrote:
> >>>>> Hi Rick, Chuck, Bruce,
> >>>>> in attachment is a small pcap when a client is in the locked.
> >>>>> Hopefully I can reproduce the problem so I can send you a
> >>>>> capture
> >>>>> during a reboot cycle.
> >>>>
> >>>> The pcap file confirms that the state IDs and client ID do not
> >>>> appear to match, and do appear on the same TCP connection (in
> >>>> different operations). I think the presence of the RENEW
> >>>> operations
> >>>> here suggest that the client believes it has not been able to
> >>>> renew
> >>>> its lease using stateful operations like READ. IMO this is
> >>>> evidence
> >>>> in favor of the theory that the client neglected to recover these
> >>>> state IDs for some reason.
> >>>>
> >>>> We'll need to see the actual reboot recovery traffic to analyze
> >>>> further, and that occurs just after the server reboots. Even
> >>>> better
> >>>> would be to see the initial OPEN of the file where the READ
> >>>> operations are failing. I recognize this is a non-determinstic
> >>>> problem that will be a challenge to capture properly.
> >>>>
> >>>> Rather than capturing the trace on the server, you should be able
> >>>> to
> >>>> capture it on your clients in order to capture traffic before,
> >>>> during, and after the server reboot. To avoid capturing an
> >>>> enormous
> >>>> amount of data, both tcpdump and tshark provide options to save
> >>>> the
> >>>> captured network data into a small ring of files (see their man
> >>>> pages). Once a client mount point has locked, you can stop the
> >>>> capture, and hopefully the ring will have everything we need.
> >>>
> >>> Hi All,
> >>> I managed to capture the packets after a reboot. I send the pcap
> >>> file
> >>> to the people in cc (privacy issue, contact me if someone on the
> >>> list
> >>> wants a copy). This is a summary of what happens after a reboot
> >>> (perhaps a missed some relevant information):
> >>>
> >>> 38:
> >>> - client -> server: client executes 3 writes with a stale clientid
> >>> (A)
> >>> - client -> server: RENEW
> >>> 44:
> >>> - server -> client: NFS4ERR_STALE_STATEID (in reponse to A)
> >>> 45:
> >>> - server -> client: NFS4ERR_STALE_CLIENTID
> >>> 65:
> >>> - client -> server: RENEW
> >>> 66
> >>> - server -> client: NFS4ERR_STALE_CLIENTID
> >>> 67,85,87,93:
> >>> SETCLIENTID/SETCLIENTID_CONFIRM sequence (ok)
> >>> 78,79:
> >>> NFS4STALE_STATEID (reponse to the other 2 writes in A)
> >>>
> >>> 98: OPEN with CLAIM_PREVIOUS
> >>> 107: response to open: NFS4ERR_NO_GRACE (strange?)
> >>> after that the client re-opens the files without CLAIM_PREVIOUS
> >>> option
> >>> and they are all succesful.
> >>>
> >>> The client starts using the new stateids except for the files in
> >>> A.
> >>> The server returns a NFS4_STALE_STATEID, the client executes a
> >>> RENEW
> >>> (IMO this should be an OPEN request) and retries the WRITE, the
> >>> server
> >>> returns a NFS4_STALE_STATEID
> >>>
> >>> Server: FreeBSD 9.1 with new NFS server implementation
> >>> Client: Fedora 17, 3.8.11-100.fc17.x86_64
> >>>
> >>> Any clues?
> >>>
> >>> Thanks,
> >>> Bram
> >> I just took a look at the packet capture (I hadn't realized you had
> >> posted one when I did the last reply).
> >>
> >> I think the problem is a TCP layer one, from looking at the first 8
> >> packets captured amd their timestamps. However, my TCP is old and
> >> tired, so I'm not sure. From the first 8 packets:
> >> Time(sec)
> >> 0 - client sends a SYN
> >> 0 - server replies [RST, ACK]
> >> Since Reset is in the reply, is this when the server rebooted?
> >> 6 - client sends a SYN with same port#s. Wireshark thinks this is
> >> a new connection using same port#s.
> >> 6 - FreeBSD replies with [RST, ACK] again, thinking it is a resend
> >> of the first packet and still considers it the old Reset
> >> connection.
> >> 15 - Repeat of what happened at 6sec.
> >> 30,31 - client sends SYN twice, still from the same port#s
> >> - no reply from FreeBSD
> >> 4393 - client sends SYN again with same port#s
> >> 4393 - server replies with [SYN, ACK] establishing the new TCP
> >> connection on the same port#s, but over an hour later.
> >
> > To be clear, this is very early in the capture, just frame 9 and 10.
> > The Linux client attempts to re-use the previous connection's source
> > port to preserve the server's duplicate reply cache.
>
Btw, I think that a DRC that assumes the client will use the same port#
is broken. I'm not sure if all clients use the same port# for UDP (the
FreeBSD client does a connect() for UDP, so I think it will), but I
definitely think that a server can't assume "same port#" for TCP.)
However, this is off the current topic/issue.
> And by the way, if the server actually did reboot, it shouldn't have
> noticed the client's port re-use. TCP connection state should have
> been wiped by the system restart.
>
Good point. Which leads to the question, "When did the server reboot
relative to the packet capture?". Was it before the capture started or
sometime during the 4400 seconds when the client wasn't sending anything or ???
> > But it's not clear why the client is not retrying SYN during the
> > period between frame 8 and frame 9. The client should retry SYN
> > after just another minute or two, and by that time the server's TCP
> > connection state should allow a connection on that port.
> >
> > It's not typical for a client with an active workload to wait 4400
> > seconds to send a fresh SYN. Bram, can you shed any light on this?
> >
> >> Since this is now over an hour after packet 0, it isn't
> >> surprising that Grace is over.
> >
> > Fair enough. However, it does not explain why the client fails to
> > recover the open state for one file.
> >
> >> I'd guess that the problem occurs because the client reconnect
> >> uses the same client side port# even though the server did a
> >> Reset for that port#, but I'm not enough of a TCP guy to know
> >> what should be correct TCP behaviour.
> >>
> >> If you don't mind the first part of this packet capture being
> >> posted (email me to say if it ok to post it), I can post it to
> >> freebsd-net@freebsd.org, since there are a bunch of TCP savy
> >> types that might be able to help. (If you give the ok for this,
> >> please also email me what version of FreeBSD the server is
> >> running.)
> >
> > Thanks for the extra set of eyes.
> >
Well, I am waiting for permission from Bram to post it. However, I
suppose I could post a summary instead (leaving IP#s out, etc). I
don't think there would be anything in it other than IP#s that he
might not want advertised?
rick
> > --
> > Chuck Lever
> > chuck[dot]lever[at]oracle[dot]com
> >
> >
> >
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-nfs"
> > in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
next prev parent reply other threads:[~2013-05-29 12:49 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-04-09 15:51 NFS client hangs after server reboot Bram Vandoren
2013-04-09 19:08 ` J. Bruce Fields
2013-04-10 19:33 ` Chuck Lever
2013-04-10 23:23 ` Rick Macklem
2013-04-11 23:15 ` Rick Macklem
2013-04-12 9:19 ` Bram Vandoren
2013-04-12 15:10 ` J. Bruce Fields
[not found] ` <CACQjR_CcKwHU8sMrmQ5YfgV5dbuiMLRRqBkDRQEVq2yjGEuzmg@mail.gmail.com>
2013-04-12 15:14 ` Chuck Lever
2013-05-28 12:31 ` Bram Vandoren
2013-05-28 19:23 ` Chuck Lever
2013-05-28 22:06 ` Rick Macklem
2013-05-28 23:30 ` Rick Macklem
2013-05-29 1:04 ` Chuck Lever
2013-05-29 1:13 ` Chuck Lever
2013-05-29 12:49 ` Rick Macklem [this message]
2013-05-30 11:09 ` Bram Vandoren
2013-05-30 0:24 ` Rick Macklem
2013-05-30 0:31 ` Rick Macklem
2013-05-30 11:20 ` Bram Vandoren
2013-05-30 11:04 ` Bram Vandoren
2013-05-30 11:55 ` Rick Macklem
2013-05-31 16:35 ` Bram Vandoren
2013-05-31 23:24 ` Rick Macklem
2013-08-28 13:39 ` William Dauchy
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1091403305.30723.1369831764803.JavaMail.root@erie.cs.uoguelph.ca \
--to=rmacklem@uoguelph.ca \
--cc=bfields@fieldses.org \
--cc=brambi@gmail.com \
--cc=chuck.lever@oracle.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).