Weird TCP hang when doing loopback NFS (wireshark traces attached)


* Weird TCP hang when doing loopback NFS (wireshark traces attached)
@ 2015-03-01 19:14 Trond Myklebust
       [not found] ` <1425237291.24845.13.camel-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2015-03-01 19:14 UTC (permalink / raw)
  To: Linux Network Devel Mailing List, Linux NFS Mailing List

[-- Attachment #1: Type: text/plain, Size: 711 bytes --]

Hi,

When doing testing of NFSv3 loopback mounts (client and server are on
the same IP address), I'm seeing a very reproducible hang in which the
client stops receiving data from the server. The TCP connection is still
marked as established, and the server appears to continue to receive and
send data, however the client does not.

So far, I've reproduced on both v4.0-rc1, and the Fedora v3.18.7 kernel.

The reproducer is simply to loopback mount using NFSv3, and then run the
'fsx' filesystem exerciser. I'm usually able to trigger the hang with
"fsx -N 100000 foobar".
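
For readers who do not have fsx to hand, the following is a minimal sketch of the kind of load it generates: random writes, truncates and verified reads against one file. It is a simplified stand-in, not fsx itself (the real tool exercises more operation types), and the name mini_exerciser, the 64 KiB size cap and the operation mix are assumptions made for illustration. Whether it is enough to trigger the hang on a loopback NFSv3 mount has not been tested.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAXLEN 65536	/* hypothetical cap on the exercised file size */

/*
 * Run nops random write/truncate/read operations on path, checking every
 * read against an in-memory shadow copy of the expected contents.
 * Returns 0 if all reads matched, 1 on a data mismatch, -1 on an I/O error.
 */
int mini_exerciser(const char *path, long nops, unsigned seed)
{
	static unsigned char shadow[MAXLEN], buf[MAXLEN];
	size_t size = 0;	/* expected file size */
	int rc = 0;
	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return -1;
	memset(shadow, 0, sizeof(shadow));	/* bytes past EOF stay zero */
	srand(seed);

	for (long i = 0; i < nops && rc == 0; i++) {
		size_t off = (size_t)rand() % MAXLEN;
		size_t len = (size_t)rand() % (MAXLEN - off) + 1;

		assert(off + len <= MAXLEN);
		switch (rand() % 3) {
		case 0:	/* write random bytes, possibly leaving a hole */
			for (size_t j = 0; j < len; j++)
				shadow[off + j] = (unsigned char)rand();
			if (pwrite(fd, shadow + off, len, (off_t)off) != (ssize_t)len)
				rc = -1;
			else if (off + len > size)
				size = off + len;
			break;
		case 1:	/* truncate (shrink or extend) to off bytes */
			if (ftruncate(fd, (off_t)off) != 0) {
				rc = -1;
				break;
			}
			if (off < size)
				memset(shadow + off, 0, size - off);
			size = off;
			break;
		default:	/* read back and compare with the shadow copy */
			if (off >= size)
				break;
			if (len > size - off)
				len = size - off;
			if (pread(fd, buf, len, (off_t)off) != (ssize_t)len)
				rc = -1;
			else if (memcmp(buf, shadow + off, len) != 0)
				rc = 1;
			break;
		}
	}
	close(fd);
	return rc;
}
```

To approximate the reproducer, call it with a path inside the loopback NFSv3 mount, for example mini_exerciser("foobar", 100000, 1) from that directory; on a hang the call would simply never return.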

I've attached a couple of wireshark traces of a few frames just before
and during the hang in case it jogs any memories.

Cheers
  Trond

[-- Attachment #2: dump_lastframes.out.pcapng.gz --]
[-- Type: application/x-pcapng, Size: 1689 bytes --]

[-- Attachment #3: dump_3.18.7_lastframes.pcapng.gz --]
[-- Type: application/x-pcapng, Size: 49636 bytes --]


* Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)
       [not found] ` <1425237291.24845.13.camel-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
@ 2015-03-02  0:52   ` Trond Myklebust
  2015-03-02  1:06     ` Bruce James Fields
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2015-03-02  0:52 UTC (permalink / raw)
  To: Linux Network Devel Mailing List, Linux NFS Mailing List,
	Bruce James Fields

Hi Bruce,

On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> Hi,
>
> When doing testing of NFSv3 loopback mounts (client and server are on
> the same IP address), I'm seeing a very reproducible hang in which the
> client stops receiving data from the server. The TCP connection is still
> marked as established, and the server appears to continue to receive and
> send data, however the client does not.
>
> So far, I've reproduced on both v4.0-rc1, and the Fedora v3.18.7 kernel.
>
> The reproducer is simply to loopback mount using NFSv3, and then run the
> 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
> "fsx -N 100000 foobar".
>
> I've attached a couple of wireshark traces of a few frames just before
> and during the hang in case it jogs any memories.

This bug appears to go away when I disable the splice()-based reads by
clearing the RQ_SPLICE_OK flag.

I noticed that it always involved a combination of a READ and a
truncating SETATTR call. Are you sure that it is safe to share
pagecache pages directly with sendpage() in this way? As far as I can
tell, there is no locking to prevent them from being modified while in
the TCP send queue.
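
The concern about unlocked sharing can be pictured with a toy userspace model. This is only a sketch under stated assumptions: the types and functions below (page_model, send_frag, queue_frag, truncate_page, transmit) are hypothetical and are not kernel APIs. It shows that a fragment queued by reference picks up later changes to the page, while a copied fragment does not. It says nothing about why the socket would hang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096

/* Toy stand-in for one page-cache page. */
struct page_model {
	unsigned char data[PAGE_SZ];
};

/* One fragment sitting in a (modelled) send queue. */
struct send_frag {
	const unsigned char *ref;	/* zero-copy path: points at the shared page */
	unsigned char copy[PAGE_SZ];	/* copying path: private snapshot */
	size_t len;
	int zero_copy;
};

/* Queue len bytes of the page, either by reference or by copy. */
void queue_frag(struct send_frag *f, const struct page_model *pg,
		size_t len, int zero_copy)
{
	assert(len <= PAGE_SZ);
	f->ref = pg->data;
	f->len = len;
	f->zero_copy = zero_copy;
	if (!zero_copy)
		memcpy(f->copy, pg->data, len);
}

/* Model a truncating SETATTR: bytes past new_len are zeroed in place. */
void truncate_page(struct page_model *pg, size_t new_len)
{
	assert(new_len <= PAGE_SZ);
	memset(pg->data + new_len, 0, PAGE_SZ - new_len);
}

/* Model transmission: copy out whatever the fragment refers to right now. */
void transmit(const struct send_frag *f, unsigned char *wire)
{
	memcpy(wire, f->zero_copy ? f->ref : f->copy, f->len);
}
```

If a page full of 'A' bytes is queued both ways and then truncated to 100 bytes before transmit() runs, the zero-copy fragment sends zeroes from offset 100 onwards, whereas the copied fragment still sends the original 'A' bytes. The length on the wire is the same in both cases; only the payload differs.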

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)
  2015-03-02  0:52   ` Trond Myklebust
@ 2015-03-02  1:06     ` Bruce James Fields
  2015-03-02  1:20       ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Bruce James Fields @ 2015-03-02  1:06 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Network Devel Mailing List, Linux NFS Mailing List

On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
> Hi Bruce,
> 
> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
> > Hi,
> >
> > When doing testing of NFSv3 loopback mounts (client and server are on
> > the same IP address), I'm seeing a very reproducible hang in which the
> > client stops receiving data from the server. The TCP connection is still
> > marked as established, and the server appears to continue to receive and
> > send data, however the client does not.
> >
> > So far, I've reproduced on both v4.0-rc1, and the Fedora v3.18.7 kernel.
> >
> > The reproducer is simply to loopback mount using NFSv3, and then run the
> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
> > "fsx -N 100000 foobar".
> >
> > I've attached a couple of wireshark traces of a few frames just before
> > and during the hang in case it jogs any memories.
> 
> This bug appears to go away when I disable the splice()-based reads by
> clearing the RQ_SPLICE_OK flag.
> 
> I noticed that it always involved a combination of a READ and a
> truncating SETATTR call. Are you sure that it is safe to share
> pagecache pages directly with sendpage() in this way? As far as I can
> tell, there is no locking to prevent them from being modified while in
> the TCP send queue.

This is the stable-pages problem that we've had forever, isn't it?  Or
is this a different problem?

--b.


* Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)
  2015-03-02  1:06     ` Bruce James Fields
@ 2015-03-02  1:20       ` Trond Myklebust
  2015-03-02  4:31         ` Trond Myklebust
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2015-03-02  1:20 UTC (permalink / raw)
  To: Bruce James Fields
  Cc: Linux Network Devel Mailing List, Linux NFS Mailing List

On Sun, Mar 1, 2015 at 8:06 PM, Bruce James Fields <bfields@fieldses.org> wrote:
> On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
>> Hi Bruce,
>>
>> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
>> <trond.myklebust@primarydata.com> wrote:
>> > Hi,
>> >
>> > When doing testing of NFSv3 loopback mounts (client and server are on
>> > the same IP address), I'm seeing a very reproducible hang in which the
>> > client stops receiving data from the server. The TCP connection is still
>> > marked as established, and the server appears to continue to receive and
>> > send data, however the client does not.
>> >
>> > So far, I've reproduced on both v4.0-rc1, and the Fedora v3.18.7 kernel.
>> >
>> > The reproducer is simply to loopback mount using NFSv3, and then run the
>> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
>> > "fsx -N 100000 foobar".
>> >
>> > I've attached a couple of wireshark traces of a few frames just before
>> > and during the hang in case it jogs any memories.
>>
>> This bug appears to go away when I disable the splice()-based reads by
>> clearing the RQ_SPLICE_OK flag.
>>
>> I noticed that it always involved a combination of a READ and a
>> truncating SETATTR call. Are you sure that it is safe to share
>> pagecache pages directly with sendpage() in this way? As far as I can
>> tell, there is no locking to prevent them from being modified while in
>> the TCP send queue.
>
> This is the stable-pages problem that we've had forever, isn't it?  Or
> is this a different problem?

It is causing the TCP socket to hang, so it goes beyond the usual
stable pages issue.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


* Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)
  2015-03-02  1:20       ` Trond Myklebust
@ 2015-03-02  4:31         ` Trond Myklebust
       [not found]           ` <CAHQdGtQnbPWYhdvwTGJKUD4mt8x_rmQjCH3AO4X17Y4RBSpUQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 6+ messages in thread
From: Trond Myklebust @ 2015-03-02  4:31 UTC (permalink / raw)
  To: Bruce James Fields
  Cc: Linux Network Devel Mailing List, Linux NFS Mailing List

On Sun, Mar 1, 2015 at 8:20 PM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Sun, Mar 1, 2015 at 8:06 PM, Bruce James Fields <bfields@fieldses.org> wrote:
>> On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
>>> Hi Bruce,
>>>
>>> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
>>> <trond.myklebust@primarydata.com> wrote:
>>> > Hi,
>>> >
>>> > When doing testing of NFSv3 loopback mounts (client and server are on
>>> > the same IP address), I'm seeing a very reproducible hang in which the
>>> > client stops receiving data from the server. The TCP connection is still
>>> > marked as established, and the server appears to continue to receive and
>>> > send data, however the client does not.
>>> >
>>> > So far, I've reproduced on both v4.0-rc1, and the Fedora v3.18.7 kernel.
>>> >
>>> > The reproducer is simply to loopback mount using NFSv3, and then run the
>>> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
>>> > "fsx -N 100000 foobar".
>>> >
>>> > I've attached a couple of wireshark traces of a few frames just before
>>> > and during the hang in case it jogs any memories.
>>>
>>> This bug appears to go away when I disable the splice()-based reads by
>>> clearing the RQ_SPLICE_OK flag.
>>>
>>> I noticed that it always involved a combination of a READ and a
>>> truncating SETATTR call. Are you sure that it is safe to share
>>> pagecache pages directly with sendpage() in this way? As far as I can
>>> tell, there is no locking to prevent them from being modified while in
>>> the TCP send queue.
>>
>> This is the stable-pages problem that we've had forever, isn't it?  Or
>> is this a different problem?
>
> It is causing the TCP socket to hang, so it goes beyond the usual
> stable pages issue.
>

Confirming that clearing RQ_SPLICE_OK fixes the issue on all kernels
that I've tested so far.
-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com


* Re: Weird TCP hang when doing loopback NFS (wireshark traces attached)
       [not found]           ` <CAHQdGtQnbPWYhdvwTGJKUD4mt8x_rmQjCH3AO4X17Y4RBSpUQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-03-02 19:58             ` Bruce James Fields
  0 siblings, 0 replies; 6+ messages in thread
From: Bruce James Fields @ 2015-03-02 19:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux Network Devel Mailing List, Linux NFS Mailing List

On Sun, Mar 01, 2015 at 11:31:31PM -0500, Trond Myklebust wrote:
> On Sun, Mar 1, 2015 at 8:20 PM, Trond Myklebust
> <trond.myklebust@primarydata.com> wrote:
> > On Sun, Mar 1, 2015 at 8:06 PM, Bruce James Fields <bfields@fieldses.org> wrote:
> >> On Sun, Mar 01, 2015 at 07:52:28PM -0500, Trond Myklebust wrote:
> >>> Hi Bruce,
> >>>
> >>> On Sun, Mar 1, 2015 at 2:14 PM, Trond Myklebust
> >>> <trond.myklebust@primarydata.com> wrote:
> >>> > Hi,
> >>> >
> >>> > When doing testing of NFSv3 loopback mounts (client and server are on
> >>> > the same IP address), I'm seeing a very reproducible hang in which the
> >>> > client stops receiving data from the server. The TCP connection is still
> >>> > marked as established, and the server appears to continue to receive and
> >>> > send data, however the client does not.
> >>> >
> >>> > So far, I've reproduced on both v4.0-rc1, and the Fedora v3.18.7 kernel.
> >>> >
> >>> > The reproducer is simply to loopback mount using NFSv3, and then run the
> >>> > 'fsx' filesystem exerciser. I'm usually able to trigger the hang with
> >>> > "fsx -N 100000 foobar".
> >>> >
> >>> > I've attached a couple of wireshark traces of a few frames just before
> >>> > and during the hang in case it jogs any memories.
> >>>
> >>> This bug appears to go away when I disable the splice()-based reads by
> >>> clearing the RQ_SPLICE_OK flag.
> >>>
> >>> I noticed that it always involved a combination of a READ and a
> >>> truncating SETATTR call. Are you sure that it is safe to share
> >>> pagecache pages directly with sendpage() in this way? As far as I can
> >>> tell, there is no locking to prevent them from being modified while in
> >>> the TCP send queue.
> >>
> >> This is the stable-pages problem that we've had forever, isn't it?  Or
> >> is this a different problem?
> >
> > It is causing the TCP socket to hang, so it goes beyond the usual
> > stable pages issue.
> >
> 
> Confirming that clearing RQ_SPLICE_OK fixes the issue on all kernels
> that I've tested so far.

Well, if the problem is a race with truncate then I guess it may have
something to do with sending pages that are no longer part of the page
cache?

I'd think that the get_page() in nfsd_splice_actor would prevent the
page being put to any other use until the network layer was done with
it, so that at worst the client would see garbage.  But I don't begin to
understand how truncation actually works....
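
The reference-counting argument can be sketched with a toy model. This assumes, for illustration only, that the page cache holds one reference on each page and that truncation detaches the page and drops that reference; struct toy_page and the toy_* helpers are hypothetical and are not the kernel's struct page or its API. Under those assumptions, an extra reference taken by the sender keeps the page from being freed, but does not keep it attached to the file.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page-cache page; not the kernel's struct page. */
struct toy_page {
	int refcount;	/* number of current holders */
	bool in_cache;	/* still attached to the file's page cache? */
	bool freed;	/* set once the last reference is dropped */
};

/* Take an extra reference, as a sender would before queueing the page. */
void toy_get_page(struct toy_page *pg)
{
	assert(!pg->freed && pg->refcount > 0);
	pg->refcount++;
}

/* Drop a reference; the page is only freed when the count reaches zero. */
void toy_put_page(struct toy_page *pg)
{
	assert(pg->refcount > 0);
	if (--pg->refcount == 0)
		pg->freed = true;
}

/* The (modelled) page cache inserts the page and holds one reference. */
void toy_cache_add(struct toy_page *pg)
{
	pg->refcount = 1;
	pg->in_cache = true;
	pg->freed = false;
}

/* Modelled truncation: detach from the cache, drop the cache's reference. */
void toy_truncate(struct toy_page *pg)
{
	assert(pg->in_cache);
	pg->in_cache = false;
	toy_put_page(pg);
}
```

With the sequence toy_cache_add(), toy_get_page(), toy_truncate(), the page ends up detached from the cache but still allocated with one reference, and it is only freed by the sender's final toy_put_page(). That matches the expectation that the sender never touches freed memory, while leaving open what the page contains by the time it is sent.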

The zero-copy v3 code has been there since 2002, if I'm reading the
history right, so if it's really a fundamental problem with the approach
then I wonder how it's survived so long.

I haven't tried to reproduce yet.

--b.


end of thread, other threads:[~2015-03-02 19:58 UTC | newest]

Thread overview: 6+ messages
2015-03-01 19:14 Weird TCP hang when doing loopback NFS (wireshark traces attached) Trond Myklebust
     [not found] ` <1425237291.24845.13.camel-7I+n7zu2hftEKMMhf/gKZA@public.gmane.org>
2015-03-02  0:52   ` Trond Myklebust
2015-03-02  1:06     ` Bruce James Fields
2015-03-02  1:20       ` Trond Myklebust
2015-03-02  4:31         ` Trond Myklebust
     [not found]           ` <CAHQdGtQnbPWYhdvwTGJKUD4mt8x_rmQjCH3AO4X17Y4RBSpUQQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-03-02 19:58             ` Bruce James Fields
