From: Simon Kirby <sim@hostway.ca>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Yan-Pai Chen <yanpai.chen@gmail.com>,
"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang
Date: Fri, 12 Oct 2012 01:15:16 -0700 [thread overview]
Message-ID: <20121012081516.GA20046@hostway.ca> (raw)
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA908FD9796@SACEXCMBX04-PRD.hq.netapp.com>
On Wed, Sep 19, 2012 at 10:11:57PM +0000, Myklebust, Trond wrote:
> On Wed, 2012-09-19 at 15:01 -0700, Simon Kirby wrote:
>
> > Yes, based on data collected today, this seems to fix the problem!
> > Awesome! :)
>
> Great! I'm adding "Tested-by:" and "Cc: stable" lines since you've been
> seeing this for some time.
This has been working well, except we just saw one case now where we
still got a CLOSE_WAIT hang. The issue was that the queue_work() was not
called because setting XPRT_LOCKED failed, which you asked about before.
This trace from our debugging patches:
WARNING: at net/sunrpc/xprt.c:648 xprt_force_disconnect+0x12a/0x140()
Hardware name: PowerEdge 1950
xprt_force_disconnect(): setting XPRT_LOCKED failed
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 serio_raw evdev
Pid: 0, comm: swapper/1 Tainted: G W 3.2.28-hw+ #25
Call Trace:
<IRQ> [<ffffffff816b74ba>] ? xprt_force_disconnect+0x12a/0x140
[<ffffffff8105db70>] warn_slowpath_common+0x80/0xc0
[<ffffffff8105dca9>] warn_slowpath_fmt+0x69/0x70
[<ffffffff81701c8c>] ? printk+0x67/0x69
[<ffffffff81704ef1>] ? _raw_spin_lock_bh+0x11/0x40
[<ffffffff816b74ba>] xprt_force_disconnect+0x12a/0x140
[<ffffffff816ba99c>] xs_tcp_state_change+0xdc/0x340
[<ffffffff81669deb>] tcp_fin+0x17b/0x210
[<ffffffff8166aaf2>] tcp_data_queue+0x4c2/0xe70
[<ffffffff8166e259>] ? tcp_validate_incoming+0x109/0x340
[<ffffffff8166e5d9>] tcp_rcv_established+0x149/0xb50
[<ffffffff8167711c>] tcp_v4_do_rcv+0x15c/0x2e0
[<ffffffff8164b6e4>] ? nf_iterate+0x84/0xb0
[<ffffffff816778ef>] tcp_v4_rcv+0x64f/0x8b0
[<ffffffff8164b85d>] ? nf_hook_slow+0x6d/0x130
[<ffffffff81656160>] ? ip_rcv+0x2f0/0x2f0
[<ffffffff8165620a>] ip_local_deliver_finish+0xaa/0x1d0
[<ffffffff816563bd>] ip_local_deliver+0x8d/0xa0
[<ffffffff81655bdb>] ip_rcv_finish+0x7b/0x310
[<ffffffff816560ed>] ip_rcv+0x27d/0x2f0
[<ffffffff816272b7>] __netif_receive_skb+0x437/0x490
[<ffffffff81629148>] netif_receive_skb+0x78/0x80
...
-pid- flgs status -client- --rqstp- -timeout ---ops--
60022 0080 -11 ffff880096e5bc00 (null) 0 ffffffff817beb20 nfsv3 FSINFO a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880142c1d800) state=16, snd_task=0
SUNRPC: xprt(ffff880032175800) state=16, snd_task=0
SUNRPC: xprt(ffff8800c2da9000) state=16, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880221977000) state=16, snd_task=0
SUNRPC: xprt(ffff88022212f800) state=16, snd_task=0
SUNRPC: xprt(ffff880221504000) state=0, snd_task=0
SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0
SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0
SUNRPC: xprt(ffff880222711000) state=18, snd_task=0
SUNRPC: xprt(ffff880222711000) state=18, snd_task=0
SUNRPC: xprt(ffff880221737800) state=18, snd_task=0
SUNRPC: xprt(ffff880221737800) state=18, snd_task=0
SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0
SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff880222704000) state=18, snd_task=0
SUNRPC: xprt(ffff880222704000) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0
SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0
SUNRPC: xprt(ffff880221734000) state=18, snd_task=0
SUNRPC: xprt(ffff880221734000) state=18, snd_task=0
SUNRPC: xprt(ffff880222702000) state=18, snd_task=0
SUNRPC: xprt(ffff880222702000) state=18, snd_task=0
1758 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1770 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1799 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
59728 0001 -11 ffff8802253d9e00 (null) 0 ffffffff8173ff20 nfsv3 COMMIT a:call_reserveresult q:xprt_sending
1449 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1766 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1811 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1431 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1604 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1851 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1870 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5693 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5791 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1871 0080 -11 ffff880225522800 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1442 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1457 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1829 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5784 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1625 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
2152 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
2196 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
1517 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1733 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1741 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
1872 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
2211 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
4316 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
4350 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 LOOKUP a:call_reserveresult q:xprt_sending
4353 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
5862 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
6982 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
9869 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880225098800) state=16, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
We've only seen this once, so we're definitely better than before.
Simon-
prev parent reply other threads:[~2012-10-12 8:15 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-09-09 19:45 [3.1-rc4] NFSv3 client hang Simon Kirby
2011-09-09 23:18 ` Trond Myklebust
2011-09-09 23:18 ` Trond Myklebust
2011-10-20 19:03 ` Simon Kirby
2012-03-01 22:55 ` Simon Kirby
2012-03-02 0:25 ` Simon Kirby
2012-03-02 18:49 ` [3.2.5] NFSv3 CLOSE_WAIT hang Simon Kirby
2012-09-05 7:49 ` Yan-Pai Chen
2012-09-05 15:09 ` Myklebust, Trond
2012-09-07 13:57 ` Dick Streefland, rnews
2012-09-07 14:13 ` Myklebust, Trond
2012-09-07 14:33 ` Dick Streefland, rnews
2012-09-07 15:46 ` Myklebust, Trond
2012-09-08 19:32 ` Dick Streefland, rnews
2012-09-10 9:00 ` Yan-Pai Chen
2012-09-11 19:40 ` Simon Kirby
2012-09-11 22:17 ` Myklebust, Trond
2012-09-13 5:22 ` Yan-Pai Chen
2012-09-13 13:32 ` Myklebust, Trond
2012-09-21 7:30 ` Yan-Pai Chen
[not found] ` <1347401844.15208.17.camel@lade.trondhjem.org>
2012-09-12 20:54 ` Myklebust, Trond
2012-09-19 22:01 ` Simon Kirby
2012-09-19 22:11 ` Myklebust, Trond
2012-10-12 8:15 ` Simon Kirby [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20121012081516.GA20046@hostway.ca \
--to=sim@hostway.ca \
--cc=Trond.Myklebust@netapp.com \
--cc=linux-nfs@vger.kernel.org \
--cc=yanpai.chen@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.