From: "Benjamin Coddington" <bcodding@redhat.com>
To: "Alan Post" <adp@prgmr.com>
Cc: linux-nfs <linux-nfs@vger.kernel.org>
Subject: Re: User process NFS write hang in wait_on_commit with kworker
Date: Wed, 19 Jun 2019 08:38:02 -0400
Message-ID: <25608EB2-87F0-4196-BEF9-8AB8FC72270B@redhat.com>
In-Reply-To: <20190619000746.GT4158@turtle.email>

On 18 Jun 2019, at 20:07, Alan Post wrote:
> On Tue, Jun 18, 2019 at 11:29:16AM -0400, Benjamin Coddington wrote:
>> I think that your transport or NFS server is dropping the response to an
>> RPC. The NFS client will not retransmit on an established connection.
>>
>> What server are you using? Any middle boxes on the network that could be
>> transparently dropping transmissions (less likely, but I have seen them)?
>>
>
> I've found 8 separate NFS client hangs of the sort I reported here,
> and in all cases the same NFS server was involved: an Ubuntu Trusty
> system running 4.4.0. I've been upgrading all of these NFS servers,
> but haven't done this one yet--the complexity of the NFS hangs I've
> been seeing has slowed me down.
>
> Of the 8 NFS clients with a hang to this server, about half are in
> the same computer room where packets only transit rack switches, with
> the other half also going through a computer room router.
>
> I see positive dropped and overrun packet counts on the NFS server
> interface, along with a similar magnitude of pause counts on the
> switch port for the NFS server. Given the occurrences of this issue,
> only this rack switch and a redundant pair of top-of-rack switches in
> the rack with the NFS server are in common between all 8 NFS clients
> with write hangs.

TCP drops or overruns should not be a problem, since the TCP layer will
retransmit packets that are not acked. The issue would be if the NFS
server is silently dropping a response to an IO RPC, or if there were an
intelligent middle-box between client and server that keeps its own
stateful, transparent TCP handling (you clearly don't have that here).

I recall some knfsd issues with dropped replies in that era of kernel
versions, when the GSS sequence number fell outside the sequence window.
Are you using sec=krb5* on these mounts, or is it all sec=sys? Perhaps
that's the problem you are seeing. Again, just some guessing.
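
One quick way to answer the sec= question is to read the security flavor
out of the client's mount options. This is a minimal sketch, assuming a
Linux client whose /proc/mounts lists a sec= option for NFS mounts; the
function name nfs_sec_flavors is made up for illustration.

```python
def nfs_sec_flavors(mounts_text):
    """Map each NFS mount point to its sec= flavor (None if not listed)."""
    flavors = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue
        sec = None
        for opt in fields[3].split(","):
            if opt.startswith("sec="):
                sec = opt[len("sec="):]
        flavors[fields[1]] = sec
    return flavors
```

Passing it the contents of /proc/mounts (for example
open("/proc/mounts").read()) should give one entry per NFS mount; any
value starting with krb5 means RPCSEC_GSS is in use on that mount.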

Verifying that this is the problem could be done by setting up some rolling
network captures, but it can be hard to keep the capture from filling up
with continuing traffic from other processes.
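
A rolling capture of this kind might be sketched as a tcpdump ring buffer
filtered to the one NFS server, which also limits the noise from other
traffic. This is only an illustration under stated assumptions: the
interface name, server address, output path, sizes, and snap length are
placeholders, port 2049 assumes the standard NFS port, and the helper
name rolling_capture_cmd is hypothetical.

```python
def rolling_capture_cmd(iface, server, out_path, file_mb=100, file_count=20):
    """Build a tcpdump argv for a fixed-size ring of capture files."""
    return [
        "tcpdump",
        "-i", iface,            # capture interface (placeholder)
        "-s", "256",            # assumed snap length: headers only, not full payloads
        "-C", str(file_mb),     # start a new file after about file_mb million bytes
        "-W", str(file_count),  # keep at most file_count files, overwriting the oldest
        "-w", out_path,         # base name for the capture files
        "host", server, "and", "port", "2049",
    ]
```

The resulting list can be handed to subprocess.run() as root on the
client. After a hang, the newest files should show whether the reply to
the stuck RPC ever arrived.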

Ben

Thread overview: 9+ messages
2019-06-18 0:06 User process NFS write hang in wait_on_commit with kworker Alan Post
2019-06-18 15:29 ` Benjamin Coddington
2019-06-19 0:07 ` Alan Post
2019-06-19 12:38 ` Benjamin Coddington [this message]
2019-06-21 20:47 ` Alan Post
2019-06-28 18:33 ` Alan Post
2019-07-02 9:55 ` Benjamin Coddington
2019-07-03 21:32 ` Alan Post
2019-07-05 23:53 ` Tom Talpey