All of lore.kernel.org
 help / color / mirror / Atom feed
From: Simon Kirby <sim@hostway.ca>
To: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Cc: Yan-Pai Chen <yanpai.chen@gmail.com>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang
Date: Fri, 12 Oct 2012 01:15:16 -0700	[thread overview]
Message-ID: <20121012081516.GA20046@hostway.ca> (raw)
In-Reply-To: <4FA345DA4F4AE44899BD2B03EEEC2FA908FD9796@SACEXCMBX04-PRD.hq.netapp.com>

On Wed, Sep 19, 2012 at 10:11:57PM +0000, Myklebust, Trond wrote:

> On Wed, 2012-09-19 at 15:01 -0700, Simon Kirby wrote:
> 
> > Yes, based on data collected today, this seems to fix the problem!
> > Awesome! :)
> 
> Great! I'm adding "Tested-by:" and "Cc: stable" lines since you've been
> seeing this for some time.

This has been working well, except we just saw one case now where we
still got a CLOSE_WAIT hang. The issue was that the queue_work() was not
called because setting XPRT_LOCKED failed, which you asked about before.

This trace from our debugging patches:

WARNING: at net/sunrpc/xprt.c:648 xprt_force_disconnect+0x12a/0x140() 
Hardware name: PowerEdge 1950 
xprt_force_disconnect(): setting XPRT_LOCKED failed                                                                                                                                                
Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 serio_raw evdev                             
Pid: 0, comm: swapper/1 Tainted: G        W    3.2.28-hw+ #25 
Call Trace: 
 <IRQ>  [<ffffffff816b74ba>] ? xprt_force_disconnect+0x12a/0x140 
 [<ffffffff8105db70>] warn_slowpath_common+0x80/0xc0 
 [<ffffffff8105dca9>] warn_slowpath_fmt+0x69/0x70 
 [<ffffffff81701c8c>] ? printk+0x67/0x69 
 [<ffffffff81704ef1>] ? _raw_spin_lock_bh+0x11/0x40 
 [<ffffffff816b74ba>] xprt_force_disconnect+0x12a/0x140 
 [<ffffffff816ba99c>] xs_tcp_state_change+0xdc/0x340 
 [<ffffffff81669deb>] tcp_fin+0x17b/0x210 
 [<ffffffff8166aaf2>] tcp_data_queue+0x4c2/0xe70 
 [<ffffffff8166e259>] ? tcp_validate_incoming+0x109/0x340 
 [<ffffffff8166e5d9>] tcp_rcv_established+0x149/0xb50 
 [<ffffffff8167711c>] tcp_v4_do_rcv+0x15c/0x2e0 
 [<ffffffff8164b6e4>] ? nf_iterate+0x84/0xb0 
 [<ffffffff816778ef>] tcp_v4_rcv+0x64f/0x8b0 
 [<ffffffff8164b85d>] ? nf_hook_slow+0x6d/0x130 
 [<ffffffff81656160>] ? ip_rcv+0x2f0/0x2f0 
 [<ffffffff8165620a>] ip_local_deliver_finish+0xaa/0x1d0 
 [<ffffffff816563bd>] ip_local_deliver+0x8d/0xa0 
 [<ffffffff81655bdb>] ip_rcv_finish+0x7b/0x310 
 [<ffffffff816560ed>] ip_rcv+0x27d/0x2f0 
 [<ffffffff816272b7>] __netif_receive_skb+0x437/0x490 
 [<ffffffff81629148>] netif_receive_skb+0x78/0x80 
...
-pid- flgs status -client- --rqstp- -timeout ---ops--
60022 0080    -11 ffff880096e5bc00   (null)        0 ffffffff817beb20 nfsv3 FSINFO a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880142c1d800) state=16, snd_task=0
SUNRPC: xprt(ffff880032175800) state=16, snd_task=0
SUNRPC: xprt(ffff8800c2da9000) state=16, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880221977000) state=16, snd_task=0
SUNRPC: xprt(ffff88022212f800) state=16, snd_task=0
SUNRPC: xprt(ffff880221504000) state=0, snd_task=0
SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0
SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0
SUNRPC: xprt(ffff880222711000) state=18, snd_task=0
SUNRPC: xprt(ffff880222711000) state=18, snd_task=0
SUNRPC: xprt(ffff880221737800) state=18, snd_task=0
SUNRPC: xprt(ffff880221737800) state=18, snd_task=0
SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0
SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff880222704000) state=18, snd_task=0
SUNRPC: xprt(ffff880222704000) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0
SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0
SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0
SUNRPC: xprt(ffff880221734000) state=18, snd_task=0
SUNRPC: xprt(ffff880221734000) state=18, snd_task=0
SUNRPC: xprt(ffff880222702000) state=18, snd_task=0
SUNRPC: xprt(ffff880222702000) state=18, snd_task=0
 1758 0080    -11 ffff880225562400   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1770 0080    -11 ffff880225562400   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1799 0080    -11 ffff880225562400   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
59728 0001    -11 ffff8802253d9e00   (null)        0 ffffffff8173ff20 nfsv3 COMMIT a:call_reserveresult q:xprt_sending
 1449 0080    -11 ffff8802253d9e00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1766 0080    -11 ffff8802253d9e00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1811 0080    -11 ffff8802253d9e00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
 1431 0080    -11 ffff8802253d6000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1604 0080    -11 ffff8802253d6000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1851 0080    -11 ffff8802253d6000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1870 0080    -11 ffff8802253d6000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 5693 0080    -11 ffff8802253d6000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 5791 0080    -11 ffff8802253d6000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
 1871 0080    -11 ffff880225522800   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
 1442 0080    -11 ffff8802253d7600   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1457 0080    -11 ffff8802253d7600   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1829 0080    -11 ffff8802253d7600   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 5784 0080    -11 ffff8802253d7600   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
 1625 0080    -11 ffff880225522000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 2152 0080    -11 ffff880225522000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 2196 0080    -11 ffff880225522000   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
 1517 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1733 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1741 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 1872 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 2211 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 4316 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 4350 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 LOOKUP a:call_reserveresult q:xprt_sending
 4353 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 5862 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 6982 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
 9869 0080    -11 ffff8802253d5a00   (null)        0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0
SUNRPC: xprt(ffff880225098800) state=16, snd_task=0
SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0

We've only seen this once, so we're definitely better than before.

Simon-

      reply	other threads:[~2012-10-12  8:15 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-09 19:45 [3.1-rc4] NFSv3 client hang Simon Kirby
2011-09-09 23:18 ` Trond Myklebust
2011-09-09 23:18   ` Trond Myklebust
2011-10-20 19:03   ` Simon Kirby
2012-03-01 22:55     ` Simon Kirby
2012-03-02  0:25       ` Simon Kirby
2012-03-02 18:49         ` [3.2.5] NFSv3 CLOSE_WAIT hang Simon Kirby
2012-09-05  7:49           ` Yan-Pai Chen
2012-09-05 15:09             ` Myklebust, Trond
2012-09-07 13:57               ` Dick Streefland, rnews
2012-09-07 14:13                 ` Myklebust, Trond
2012-09-07 14:33                   ` Dick Streefland, rnews
2012-09-07 15:46                     ` Myklebust, Trond
2012-09-08 19:32                       ` Dick Streefland, rnews
2012-09-10  9:00                         ` Yan-Pai Chen
2012-09-11 19:40                           ` Simon Kirby
2012-09-11 22:17                             ` Myklebust, Trond
2012-09-13  5:22                               ` Yan-Pai Chen
2012-09-13 13:32                                 ` Myklebust, Trond
2012-09-21  7:30                                   ` Yan-Pai Chen
     [not found]                             ` <1347401844.15208.17.camel@lade.trondhjem.org>
2012-09-12 20:54                               ` Myklebust, Trond
2012-09-19 22:01                                 ` Simon Kirby
2012-09-19 22:11                                   ` Myklebust, Trond
2012-10-12  8:15                                     ` Simon Kirby [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20121012081516.GA20046@hostway.ca \
    --to=sim@hostway.ca \
    --cc=Trond.Myklebust@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=yanpai.chen@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.