From: Chuck Lever <chuck.lever@oracle.com>
To: Rick Macklem <rmacklem@uoguelph.ca>
Cc: Bram Vandoren <brambi@gmail.com>,
"J. Bruce Fields" <bfields@fieldses.org>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>
Subject: Re: NFS client hangs after server reboot
Date: Tue, 28 May 2013 21:04:32 -0400
Message-ID: <E626E85E-BC60-4FBA-BF02-536508991C65@oracle.com>
In-Reply-To: <1480713067.24683.1369783846612.JavaMail.root@erie.cs.uoguelph.ca>
On May 28, 2013, at 7:30 PM, Rick Macklem <rmacklem@uoguelph.ca> wrote:
> Bram Vandoren wrote:
>>>> Hi Rick, Chuck, Bruce,
>>>> in attachment is a small pcap when a client is in the locked.
>>>> Hopefully I can reproduce the problem so I can send you a capture
>>>> during a reboot cycle.
>>>
>>> The pcap file confirms that the state IDs and client ID do not
>>> appear to match, and do appear on the same TCP connection (in
>>> different operations). I think the presence of the RENEW operations
>>> here suggest that the client believes it has not been able to renew
>>> its lease using stateful operations like READ. IMO this is evidence
>>> in favor of the theory that the client neglected to recover these
>>> state IDs for some reason.
>>>
>>> We'll need to see the actual reboot recovery traffic to analyze
>>> further, and that occurs just after the server reboots. Even better
>>> would be to see the initial OPEN of the file where the READ
>>> operations are failing. I recognize this is a non-deterministic
>>> problem that will be a challenge to capture properly.
>>>
>>> Rather than capturing the trace on the server, you should be able to
>>> capture it on your clients in order to capture traffic before,
>>> during, and after the server reboot. To avoid capturing an enormous
>>> amount of data, both tcpdump and tshark provide options to save the
>>> captured network data into a small ring of files (see their man
>>> pages). Once a client mount point has locked, you can stop the
>>> capture, and hopefully the ring will have everything we need.
>>
>> Hi All,
>> I managed to capture the packets after a reboot. I sent the pcap file
>> to the people in cc (privacy issue, contact me if someone on the list
>> wants a copy). This is a summary of what happens after a reboot
>> (perhaps I missed some relevant information):
>>
>> 38:
>> - client -> server: client executes 3 writes with a stale clientid (A)
>> - client -> server: RENEW
>> 44:
>> - server -> client: NFS4ERR_STALE_STATEID (in response to A)
>> 45:
>> - server -> client: NFS4ERR_STALE_CLIENTID
>> 65:
>> - client -> server: RENEW
>> 66:
>> - server -> client: NFS4ERR_STALE_CLIENTID
>> 67,85,87,93:
>> SETCLIENTID/SETCLIENTID_CONFIRM sequence (ok)
>> 78,79:
>> NFS4ERR_STALE_STATEID (response to the other 2 writes in A)
>>
>> 98: OPEN with CLAIM_PREVIOUS
>> 107: response to open: NFS4ERR_NO_GRACE (strange?)
>> after that the client re-opens the files without the CLAIM_PREVIOUS
>> option and they are all successful.
>>
>> The client starts using the new stateids except for the files in A.
>> The server returns NFS4ERR_STALE_STATEID, the client executes a RENEW
>> (IMO this should be an OPEN request) and retries the WRITE, and the
>> server returns NFS4ERR_STALE_STATEID again.
>>
>> Server: FreeBSD 9.1 with new NFS server implementation
>> Client: Fedora 17, 3.8.11-100.fc17.x86_64
>>
>> Any clues?
>>
>> Thanks,
>> Bram
> I just took a look at the packet capture (I hadn't realized you had
> posted one when I did the last reply).
>
> I think the problem is a TCP layer one, from looking at the first 8
> packets captured and their timestamps. However, my TCP is old and
> tired, so I'm not sure. From the first 8 packets:
> Time(sec)
> 0 - client sends a SYN
> 0 - server replies [RST, ACK]
> Since Reset is in the reply, is this when the server rebooted?
> 6 - client sends a SYN with same port#s. Wireshark thinks this is
> a new connection using same port#s.
> 6 - FreeBSD replies with [RST, ACK] again, thinking it is a resend
> of the first packet and still considers it the old Reset connection.
> 15 - Repeat of what happened at 6sec.
> 30,31 - client sends SYN twice, still from the same port#s
> - no reply from FreeBSD
> 4393 - client sends SYN again with same port#s
> 4393 - server replies with [SYN, ACK] establishing the new TCP
> connection on the same port#s, but over an hour later.
To be clear, this is very early in the capture: just frames 9 and 10. The Linux client attempts to re-use the previous connection's source port to preserve the server's duplicate reply cache.
But it's not clear why the client is not retrying SYN during the period between frame 8 and frame 9. The client should retry SYN after just another minute or two, and by that time the server's TCP connection state should allow a connection on that port.
It's not typical for a client with an active workload to wait roughly 4400 seconds to send a fresh SYN. Bram, can you shed any light on this?
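For reference, the size of that gap can be tallied from Rick's summary with a short sketch. The timestamps below are copied from the quoted timeline (whole seconds only), and the names syn_times and syn_gaps are illustrative, not taken from any tool.

```python
# SYN send times in seconds, copied from Rick's summary of the first packets.
# These are the rounded values quoted in the thread, not exact capture times.
syn_times = [0, 6, 15, 30, 31, 4393]


def syn_gaps(times):
    """Return the delay between each consecutive pair of SYN attempts."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


gaps = syn_gaps(syn_times)
longest = max(gaps)

# Print each retry interval, then the one that stands out.
for start, gap in zip(syn_times, gaps):
    print(f"SYN at {start:>4}s, next SYN after {gap}s")
print(f"longest gap between SYN attempts: {longest}s")
```

The early retries are spaced seconds apart, while the last interval is over an hour, which is the period that needs explaining.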
> Since this is now over an hour after packet 0, it isn't
> surprising that Grace is over.
Fair enough. However, it does not explain why the client fails to recover the open state for one file.
> I'd guess that the problem occurs because the client reconnect
> uses the same client side port# even though the server did a
> Reset for that port#, but I'm not enough of a TCP guy to know
> what should be correct TCP behaviour.
>
> If you don't mind the first part of this packet capture being
> posted (email me to say if it is ok to post it), I can post it to
> freebsd-net@freebsd.org, since there are a bunch of TCP savvy
> types that might be able to help. (If you give the ok for this,
> please also email me what version of FreeBSD the server is
> running.)
Thanks for the extra set of eyes.
--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com