* hang on xfstests generic/074
From: J. Bruce Fields @ 2015-02-10 15:43 UTC
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

I finally got around to running xfstests as part of my regular testing
and ran across a reproducible hang on generic/074:

[110040.300055] INFO: task fstest:22762 blocked for more than 120 seconds.
[110040.300571] Tainted: G W 3.19.0-rc4-00206-g53ea83c #16
[110040.301082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[110040.301656] fstest D ffff88005246bc98 11320 22762 22761 0x00000000
[110040.302334] ffff88005246bc98 ffff88005246bc58 0000000000009000 ffff880057c4d7d0
[110040.303185] ffff88005246bfd8 ffff880052f84b90 ffff880057c4d7d0 0000000000000000
[110040.304041] 0000000000000000 0000000000000001 0000000000000000 ffff88005246bc58
[110040.305096] Call Trace:
[110040.305331] [<ffffffff810b0ed7>] ? prepare_to_wait+0x27/0x90
[110040.305770] [<ffffffffa000f492>] ? rpc_make_runnable+0xc2/0xd0 [sunrpc]
[110040.306274] [<ffffffff81a9550d>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
[110040.306799] [<ffffffff81a90910>] ? bit_wait+0x60/0x60
[110040.307258] [<ffffffff810bad1d>] ? trace_hardirqs_on_caller+0x15d/0x200
[110040.307798] [<ffffffff810badcd>] ? trace_hardirqs_on+0xd/0x10
[110040.308283] [<ffffffff81a90910>] ? bit_wait+0x60/0x60
[110040.308678] [<ffffffff81a8ffc9>] schedule+0x29/0x70
[110040.309054] [<ffffffff81a90295>] io_schedule+0x55/0x80
[110040.309510] [<ffffffff81a90944>] bit_wait_io+0x34/0x60
[110040.309909] [<ffffffff81a905b7>] __wait_on_bit+0x67/0x90
[110040.310313] [<ffffffff8114e72d>] ? find_get_pages_tag+0xd/0x210
[110040.310756] [<ffffffff8114d2c6>] wait_on_page_bit+0xb6/0xc0
[110040.311177] [<ffffffff810b1410>] ? autoremove_wake_function+0x40/0x40
[110040.311654] [<ffffffff8114d422>] filemap_fdatawait_range+0xf2/0x190
[110040.312131] [<ffffffff8114f062>] filemap_write_and_wait_range+0x42/0x70
[110040.312627] [<ffffffffa017866f>] nfs4_file_fsync+0x5f/0xb0 [nfsv4]
[110040.313090] [<ffffffffa0178905>] ? nfs4_do_check_delegation+0x5/0xc0 [nfsv4]
[110040.313616] [<ffffffff811d05a9>] vfs_fsync+0x29/0x40
[110040.313991] [<ffffffffa010fdea>] nfs_file_flush+0x7a/0xb0 [nfs]
[110040.314421] [<ffffffff8119b093>] filp_close+0x33/0x80
[110040.314801] [<ffffffff811bce42>] __close_fd+0x82/0xa0
[110040.315178] [<ffffffff8119b103>] SyS_close+0x23/0x50
[110040.315551] [<ffffffff81a960d2>] system_call_fastpath+0x12/0x17
[110040.316247] no locks held by fstest/22762.
[110063.328025] nfs: server f21-1 not responding, still trying
...

Is this a known issue? Client and server were running some version of
my tree, 3.19-rc4 based.

--b.

* Re: hang on xfstests generic/074
From: J. Bruce Fields @ 2015-02-10 15:45 UTC
To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

On Tue, Feb 10, 2015 at 10:43:06AM -0500, bfields wrote:
> I finally got around to running xfstests as part of my regular testing
> and ran across a reproducible hang on generic/074:

By the way, running ./check -nfs -g auto, I also see failures on:

  generic/005 generic/017 generic/031 generic/032 generic/033 generic/035
  generic/037 generic/053 generic/062 generic/068 generic/088 generic/089
  generic/105 generic/126 generic/133 generic/184 generic/225 generic/277
  generic/294 generic/306

and another 94 tests were skipped. I haven't looked into any of those
yet. All of this is over NFSv4.1.

--b.

* Re: hang on xfstests generic/074
From: Christoph Hellwig @ 2015-02-11 12:34 UTC
To: J. Bruce Fields; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs

On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
> I finally got around to running xfstests as part of my regular testing
> and ran across a reproducible hang on generic/074:

Yes, I reported this about half a year ago. It was caused (or at least
unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
that commit fixes the issue for me.

* Re: hang on xfstests generic/074
From: Trond Myklebust @ 2015-03-01 15:56 UTC
To: Christoph Hellwig; +Cc: J. Bruce Fields, Anna Schumaker, Linux NFS Mailing List

On Wed, Feb 11, 2015 at 7:34 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
>> I finally got around to running xfstests as part of my regular testing
>> and ran across a reproducible hang on generic/074:
>
> Yes, I reported this about half a year ago. It was caused (or at least
> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
> that commit fixes the issue for me.
>

Can you please recheck with the new 'devel' branch on
git://git.linux-nfs.org/projects/trondmy/linux-nfs.git ? That fixes
the test for me.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

* Re: hang on xfstests generic/074
From: Trond Myklebust @ 2015-03-01 15:58 UTC
To: Christoph Hellwig; +Cc: J. Bruce Fields, Anna Schumaker, Linux NFS Mailing List

On Sun, Mar 1, 2015 at 10:56 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
> On Wed, Feb 11, 2015 at 7:34 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
>>> I finally got around to running xfstests as part of my regular testing
>>> and ran across a reproducible hang on generic/074:
>>
>> Yes, I reported this about half a year ago. It was caused (or at least
>> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
>> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
>> that commit fixes the issue for me.
>>
>
> Can you please recheck with the new 'devel' branch on
> git://git.linux-nfs.org/projects/trondmy/linux-nfs.git ? That fixes
> the test for me.
>

Oops. Never mind. I was thinking of a different issue, but also with
generic/074.

-- 
Trond Myklebust
Linux NFS client maintainer, PrimaryData
trond.myklebust@primarydata.com

* Re: hang on xfstests generic/074
From: Benjamin Coddington @ 2016-04-01 14:57 UTC
To: Trond Myklebust, Christoph Hellwig
Cc: J. Bruce Fields, Anna Schumaker, linux-nfs

On Wed, 11 Feb 2015, Christoph Hellwig wrote:

> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
> > I finally got around to running xfstests as part of my regular testing
> > and ran across a reproducible hang on generic/074:
>
> Yes, I reported this about half a year ago. It was caused (or at least
> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
> that commit fixes the issue for me.

I just ran into this.

Now that we have SO_REUSEPORT, can we get rid of
RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT?

Ben

* Re: hang on xfstests generic/074
From: Trond Myklebust @ 2016-04-01 15:26 UTC
To: Benjamin Coddington
Cc: Christoph Hellwig, J. Bruce Fields, Anna Schumaker, Linux NFS Mailing List

On Fri, Apr 1, 2016 at 10:57 AM, Benjamin Coddington
<bcodding@redhat.com> wrote:
> On Wed, 11 Feb 2015, Christoph Hellwig wrote:
>
>> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
>> > I finally got around to running xfstests as part of my regular testing
>> > and ran across a reproducible hang on generic/074:
>>
>> Yes, I reported this about half a year ago. It was caused (or at least
>> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
>> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
>> that commit fixes the issue for me.
>
> I just ran into this.
>
> Now that we have SO_REUSEPORT, can we get rid of
> RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT?

They are unrelated.

If you are hitting this hang, then you have a borked server that is
dropping NFSv4 RPC requests. The old behaviour of having the client
break the connection is not actually sanctioned by the NFSv4 protocol.

Cheers
  Trond
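
To make Trond's point concrete, here is a toy model of the two client
policies. It is NOT SUNRPC code: Server, wait_for_reply and the drops
counter are hypothetical names invented for illustration, and the model
ignores real timers and TCP entirely. It only shows why a server that
silently drops a request turns into an indefinite wait (like the stuck
fsync in the trace above) once the client stops breaking the connection
and resending on timeout.

```python
"""Toy model (hypothetical, not kernel code) of an RPC client waiting
for a reply with and without retransmit timeouts."""
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    # How many transmissions the server silently drops before it replies.
    # drops > 0 models the buggy server that drops NFSv4 requests.
    drops: int

    def handle(self) -> bool:
        """Return True if the server replies to this transmission."""
        if self.drops > 0:
            self.drops -= 1
            return False
        return True


def wait_for_reply(server: Server, retrans_timeout: bool,
                   timeouts: int = 10) -> Optional[int]:
    """Simulate one RPC. Return the number of transmissions needed to get
    a reply, or None if there is still no reply after `timeouts` timeout
    periods (the model's stand-in for a hang)."""
    transmissions = 1
    replied = server.handle()
    for _ in range(timeouts):
        if replied:
            return transmissions
        if retrans_timeout:
            # Old behaviour: on timeout, break the connection and resend.
            transmissions += 1
            replied = server.handle()
        # Without retransmit timeouts the client just keeps waiting on the
        # same connection; a dropped request is never resent.
    return transmissions if replied else None


print(wait_for_reply(Server(drops=1), retrans_timeout=True))   # 2
print(wait_for_reply(Server(drops=1), retrans_timeout=False))  # None
```

With a well-behaved server (drops=0) both policies get a reply on the
first transmission; the difference only shows up when the server drops
requests, which is why the hang points at the server.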

* Re: hang on xfstests generic/074
From: Benjamin Coddington @ 2016-04-08 18:24 UTC
To: Trond Myklebust
Cc: Christoph Hellwig, J. Bruce Fields, Anna Schumaker, Linux NFS Mailing List

On Fri, 1 Apr 2016, Trond Myklebust wrote:

> On Fri, Apr 1, 2016 at 10:57 AM, Benjamin Coddington
> <bcodding@redhat.com> wrote:
> > On Wed, 11 Feb 2015, Christoph Hellwig wrote:
> >
> >> On Tue, Feb 10, 2015 at 10:43:06AM -0500, J. Bruce Fields wrote:
> >> > I finally got around to running xfstests as part of my regular testing
> >> > and ran across a reproducible hang on generic/074:
> >>
> >> Yes, I reported this about half a year ago. It was caused (or at least
> >> unhidden) by commit 2aca5b869ace67a63aab895659e5dc14c33a4d6e ("SUNRPC:
> >> Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT"). Reverting
> >> that commit fixes the issue for me.
> >
> > I just ran into this.
> >
> > Now that we have SO_REUSEPORT, can we get rid of
> > RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT?
>
> They are unrelated.
>
> If you are hitting this hang, then you have a borked server that is
> dropping NFSv4 RPC requests. The old behaviour of having the client
> break the connection is not actually sanctioned by the NFSv4 protocol.
>
> Cheers
>   Trond

Ah, thanks for pointing that out. It is a server bug, I think. I'm
trying to find out more, and I'll write about it under separate cover.

Ben