From: Simon Kirby <sim@hostway.ca>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Yan-Pai Chen <yanpai.chen@gmail.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang
Date: Fri, 12 Oct 2012 01:15:16 -0700 [thread overview]
Message-ID: <20121012081516.GA20046@hostway.ca> (raw)
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA908FD9796@SACEXCMBX04-PRD.hq.netapp.com>
On Wed, Sep 19, 2012 at 10:11:57PM +0000, Myklebust, Trond wrote:
> On Wed, 2012-09-19 at 15:01 -0700, Simon Kirby wrote:
>
> > Yes, based on data collected today, this seems to fix the problem!
> > Awesome! :)
>
> Great! I'm adding "Tested-by:" and "Cc: stable" lines since you've been
> seeing this for some time.
This has been working well, except we just saw one case now where we
still got a CLOSE_WAIT hang. The issue was that the queue_work() was not
called because setting XPRT_LOCKED failed, which you asked about before.
This trace from our debugging patches:
WARNING: at net/sunrpc/xprt.c:648 xprt_force_disconnect+0x12a/0x140()
Hardware name: PowerEdge 1950
xprt_force_disconnect(): setting XPRT_LOCKED failed
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 serio_raw evdev
Pid: 0, comm: swapper/1 Tainted: G W 3.2.28-hw+ #25
Call Trace:
<IRQ> [<ffffffff816b74ba>] ? xprt_force_disconnect+0x12a/0x140
[<ffffffff8105db70>] warn_slowpath_common+0x80/0xc0
[<ffffffff8105dca9>] warn_slowpath_fmt+0x69/0x70
[<ffffffff81701c8c>] ? printk+0x67/0x69
[<ffffffff81704ef1>] ? _raw_spin_lock_bh+0x11/0x40
[<ffffffff816b74ba>] xprt_force_disconnect+0x12a/0x140
[<ffffffff816ba99c>] xs_tcp_state_change+0xdc/0x340
[<ffffffff81669deb>] tcp_fin+0x17b/0x210
[<ffffffff8166aaf2>] tcp_data_queue+0x4c2/0xe70
[<ffffffff8166e259>] ? tcp_validate_incoming+0x109/0x340
[<ffffffff8166e5d9>] tcp_rcv_established+0x149/0xb50
[<ffffffff8167711c>] tcp_v4_do_rcv+0x15c/0x2e0
[<ffffffff8164b6e4>] ? nf_iterate+0x84/0xb0
[<ffffffff816778ef>] tcp_v4_rcv+0x64f/0x8b0
[<ffffffff8164b85d>] ? nf_hook_slow+0x6d/0x130
[<ffffffff81656160>] ? ip_rcv+0x2f0/0x2f0
[<ffffffff8165620a>] ip_local_deliver_finish+0xaa/0x1d0
[<ffffffff816563bd>] ip_local_deliver+0x8d/0xa0
[<ffffffff81655bdb>] ip_rcv_finish+0x7b/0x310
[<ffffffff816560ed>] ip_rcv+0x27d/0x2f0
[<ffffffff816272b7>] __netif_receive_skb+0x437/0x490
[<ffffffff81629148>] netif_receive_skb+0x78/0x80
...
-pid- flgs status -client- --rqstp- -timeout ---ops--
60022 0080 -11 ffff880096e5bc00 (null) 0 ffffffff817beb20 nfsv3 FSINFO a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880142c1d800) state=16, snd_task=0
SUNRPC: xprt(ffff880032175800) state=16, snd_task=0
SUNRPC: xprt(ffff8800c2da9000) state=16, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880221977000) state=16, snd_task=0
SUNRPC: xprt(ffff88022212f800) state=16, snd_task=0
SUNRPC: xprt(ffff880221504000) state=0, snd_task=0
SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0
SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0
SUNRPC: xprt(ffff880222711000) state=18, snd_task=0
SUNRPC: xprt(ffff880222711000) state=18, snd_task=0
SUNRPC: xprt(ffff880221737800) state=18, snd_task=0
SUNRPC: xprt(ffff880221737800) state=18, snd_task=0
SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0
SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff880222704000) state=18, snd_task=0
SUNRPC: xprt(ffff880222704000) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0
SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0
SUNRPC: xprt(ffff880221734000) state=18, snd_task=0
SUNRPC: xprt(ffff880221734000) state=18, snd_task=0
SUNRPC: xprt(ffff880222702000) state=18, snd_task=0
SUNRPC: xprt(ffff880222702000) state=18, snd_task=0
1758 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1770 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1799 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
59728 0001 -11 ffff8802253d9e00 (null) 0 ffffffff8173ff20 nfsv3 COMMIT a:call_reserveresult q:xprt_sending
1449 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1766 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1811 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1431 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1604 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1851 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1870 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5693 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5791 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1871 0080 -11 ffff880225522800 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1442 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1457 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1829 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5784 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1625 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
2152 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
2196 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1517 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1733 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1741 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1872 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
2211 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
4316 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
4350 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 LOOKUP a:call_reserveresult q:xprt_sending
4353 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5862 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
6982 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
9869 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880225098800) state=16, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
We've only seen this once, so we're definitely better than before.
Simon-
prev parent reply other threads:[~2012-10-12 8:15 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-09 19:45 [3.1-rc4] NFSv3 client hang Simon Kirby
2011-09-09 23:18 ` Trond Myklebust
2011-10-20 19:03 ` Simon Kirby
2012-03-01 22:55 ` Simon Kirby
2012-03-02 0:25 ` Simon Kirby
2012-03-02 18:49 ` [3.2.5] NFSv3 CLOSE_WAIT hang Simon Kirby
2012-09-05 7:49 ` Yan-Pai Chen
2012-09-05 15:09 ` Myklebust, Trond
2012-09-07 13:57 ` Dick Streefland, rnews
2012-09-07 14:13 ` Myklebust, Trond
2012-09-07 14:33 ` Dick Streefland, rnews
2012-09-07 15:46 ` Myklebust, Trond
2012-09-08 19:32 ` Dick Streefland, rnews
2012-09-10 9:00 ` Yan-Pai Chen
2012-09-11 19:40 ` Simon Kirby
2012-09-11 22:17 ` Myklebust, Trond
2012-09-13 5:22 ` Yan-Pai Chen
2012-09-13 13:32 ` Myklebust, Trond
2012-09-21 7:30 ` Yan-Pai Chen
[not found] ` <1347401844.15208.17.camel@lade.trondhjem.org>
2012-09-12 20:54 ` Myklebust, Trond
2012-09-19 22:01 ` Simon Kirby
2012-09-19 22:11 ` Myklebust, Trond
2012-10-12 8:15 ` Simon Kirby [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121012081516.GA20046@hostway.ca \
--to=sim@hostway.ca \
--cc=Trond.Myklebust@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=yanpai.chen@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).