* [3.1-rc4] NFSv3 client hang @ 2011-09-09 19:45 Simon Kirby 2011-09-09 23:18 ` Trond Myklebust 0 siblings, 1 reply; 23+ messages in thread From: Simon Kirby @ 2011-09-09 19:45 UTC (permalink / raw) To: linux-nfs, linux-kernel

The 3.1-rc4 NFSv3 client hung on another box (separate from the other one
which Oopsed in vfs_rmdir() with similar workload). This build was also of
9e79e3e9dd9672b37ac9412e9a926714306551fe (slightly past 3.1-rc4), and
"git log 9e79e3e9dd96.. fs/nfs net/sunrpc" is empty.

All mounts to one server IP have hung, while all other mounts work fine.
I ran "cd /proc/sys/sunrpc; echo 255 > rpc_debug; echo 255 > nfs_debug"
for a while, then kill -9'd all D-state processes to simplify the
debugging, and was left with one that was not interruptible:

28612 D /usr/local/apache2/bin/http sleep_on_page

# cat /proc/28612/stack
[<ffffffff810bdf49>] sleep_on_page+0x9/0x10
[<ffffffff810bdf34>] __lock_page+0x64/0x70
[<ffffffff8112a9e5>] __generic_file_splice_read+0x2d5/0x500
[<ffffffff8112ac5a>] generic_file_splice_read+0x4a/0x90
[<ffffffff812030e5>] nfs_file_splice_read+0x85/0xd0
[<ffffffff81128fb2>] do_splice_to+0x72/0xa0
[<ffffffff811297e4>] splice_direct_to_actor+0xc4/0x1d0
[<ffffffff81129942>] do_splice_direct+0x52/0x70
[<ffffffff81100096>] do_sendfile+0x166/0x1d0
[<ffffffff81100185>] sys_sendfile64+0x85/0xb0
[<ffffffff816af57b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

echo 1 > /proc/sys/sunrpc/rpc_debug emits:

-pid- flgs status -client- --rqstp- -timeout ---ops--
37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending

tcpdump to this server shows absolutely no packets to the server IP for
several minutes. netstat shows the socket in CLOSE_WAIT:

# netstat -tan|grep 2049
tcp 0 0 10.10.52.50:806 10.10.52.230:2049 CLOSE_WAIT

This is the only port-2049 socket that still exists.
rpcinfo -p 10.10.52.230, -t 10.10.52.230 lockmgr, etc., all show the
server seems fine. rpciod is sleeping in rescuer_thread, and nothing
else is in D state.

mount opts were "rw,hard,intr,tcp,timeo=300,retrans=2,vers=3"

Running another "df" on the mountpoint with rpc_debug = 255 shows:

-pid- flgs status -client- --rqstp- -timeout ---ops--
37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending
RPC: looking up Generic cred
NFS call access
RPC: new task initialized, procpid 30679
RPC: allocated task ffff880030c17a00
RPC: 37133 __rpc_execute flags=0x80
RPC: 37133 call_start nfs3 proc ACCESS (sync)
RPC: 37133 call_reserve (status 0)
RPC: 37133 failed to lock transport ffff880223d0a000
RPC: 37133 sleep_on(queue "xprt_sending" time 4489651610)
RPC: 37133 added to queue ffff880223d0a178 "xprt_sending"
RPC: 37133 sync task going to sleep

So something is not closing the old transport socket here?

Simon-

^ permalink raw reply [flat|nested] 23+ messages in thread
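[Background for the rpc_debug output above: the "failed to lock transport" line and the q:xprt_sending queue name come from the transport write-lock path in net/sunrpc/xprt.c. The sketch below is a simplified paraphrase of that path, not the verbatim 3.1 source; the struct fields and queue names are real, the rest is condensed. It shows why a task that cannot take the per-transport XPRT_LOCKED bit gets parked on the "xprt_sending" wait queue until the current owner releases the lock -- which, in the hang reported here, never happens.]

/*
 * Simplified paraphrase of the transport locking logic in net/sunrpc/xprt.c
 * (3.1 era). Treat this as an illustration, not the exact upstream code.
 */
static int sketch_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
{
	/* Only one task may own the transport at a time. */
	if (test_and_set_bit(XPRT_LOCKED, &xprt->state)) {
		if (task == xprt->snd_task)
			return 1;	/* we already own the transport */
		/* Lock is busy: park the task on the "xprt_sending" queue.
		 * This is the "failed to lock transport" state shown in
		 * the rpc_debug dump above. */
		dprintk("RPC: %5u failed to lock transport %p\n",
			task->tk_pid, xprt);
		task->tk_timeout = 0;
		task->tk_status = -EAGAIN;
		rpc_sleep_on(&xprt->sending, task, NULL);
		return 0;
	}
	xprt->snd_task = task;		/* we now own the transport */
	return 1;
}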
* Re: [3.1-rc4] NFSv3 client hang 2011-09-09 19:45 [3.1-rc4] NFSv3 client hang Simon Kirby @ 2011-09-09 23:18 ` Trond Myklebust 2011-10-20 19:03 ` Simon Kirby 0 siblings, 1 reply; 23+ messages in thread From: Trond Myklebust @ 2011-09-09 23:18 UTC (permalink / raw) To: Simon Kirby; +Cc: linux-nfs, linux-kernel On Fri, 2011-09-09 at 12:45 -0700, Simon Kirby wrote: > The 3.1-rc4 NFSv3 client hung on another box (separate from the other one > which Oopsed in vfs_rmdir() with similar workload). This build was also > of 9e79e3e9dd9672b37ac9412e9a926714306551fe (slightly past 3.1-rc4), and > "git log 9e79e3e9dd96.. fs/nfs net/sunrpc" is empty. > > All mounts to one server IP have hung, while all other mounts work fine. > I ran "cd /proc/sys/sunrpc; echo 255 > rpc_debug; echo 255 > nfs_debug" > for a while, then kill -9'd all D-state processes to simplify the > debugging, and was left with one that was not interruptible: > > 28612 D /usr/local/apache2/bin/http sleep_on_page > # cat /proc/28612/stack > [<ffffffff810bdf49>] sleep_on_page+0x9/0x10 > [<ffffffff810bdf34>] __lock_page+0x64/0x70 > [<ffffffff8112a9e5>] __generic_file_splice_read+0x2d5/0x500 > [<ffffffff8112ac5a>] generic_file_splice_read+0x4a/0x90 > [<ffffffff812030e5>] nfs_file_splice_read+0x85/0xd0 > [<ffffffff81128fb2>] do_splice_to+0x72/0xa0 > [<ffffffff811297e4>] splice_direct_to_actor+0xc4/0x1d0 > [<ffffffff81129942>] do_splice_direct+0x52/0x70 > [<ffffffff81100096>] do_sendfile+0x166/0x1d0 > [<ffffffff81100185>] sys_sendfile64+0x85/0xb0 > [<ffffffff816af57b>] system_call_fastpath+0x16/0x1b > [<ffffffffffffffff>] 0xffffffffffffffff > > echo 1 > /proc/sys/sunrpc/rpc_debug emits: > > -pid- flgs status -client- --rqstp- -timeout ---ops-- > 37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending > > tcpdump to this server shows absolutely no packets to the server IP for > several minutes. netstat shows the socket in CLOSE_WAIT: > > # netstat -tan|grep 2049 > tcp 0 0 10.10.52.50:806 10.10.52.230:2049 CLOSE_WAIT > > This is the only port-2049 socket that still exists. > rpcinfo -p 10.10.52.230, -t 10.10.52.230 lockmgr, etc., all show the > server seems fine. rpciod is sleeping in rescuer_thread, and nothing > else is in D state. > > mount opts were "rw,hard,intr,tcp,timeo=300,retrans=2,vers=3" > > Running another "df" on the mountpoint with rpc_debug = 255 shows: > > -pid- flgs status -client- --rqstp- -timeout ---ops-- > 37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending > RPC: looking up Generic cred > NFS call access > RPC: new task initialized, procpid 30679 > RPC: allocated task ffff880030c17a00 > RPC: 37133 __rpc_execute flags=0x80 > RPC: 37133 call_start nfs3 proc ACCESS (sync) > RPC: 37133 call_reserve (status 0) > RPC: 37133 failed to lock transport ffff880223d0a000 > RPC: 37133 sleep_on(queue "xprt_sending" time 4489651610) > RPC: 37133 added to queue ffff880223d0a178 "xprt_sending" > RPC: 37133 sync task going to sleep > > So something is not closing the old transport socket here? Hi Simon, I've stared at this for some time now, and I'm having trouble seeing what could be going wrong. Can you therefore please apply the following patch, and see if you can reproduce the problem? Cheers Trond 8<------------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.1-rc4] NFSv3 client hang 2011-09-09 23:18 ` Trond Myklebust @ 2011-10-20 19:03 ` Simon Kirby 2012-03-01 22:55 ` Simon Kirby 0 siblings, 1 reply; 23+ messages in thread From: Simon Kirby @ 2011-10-20 19:03 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs, linux-kernel On Fri, Sep 09, 2011 at 07:18:42PM -0400, Trond Myklebust wrote: > On Fri, 2011-09-09 at 12:45 -0700, Simon Kirby wrote: > > The 3.1-rc4 NFSv3 client hung on another box (separate from the other one > > which Oopsed in vfs_rmdir() with similar workload). This build was also > > of 9e79e3e9dd9672b37ac9412e9a926714306551fe (slightly past 3.1-rc4), and > > "git log 9e79e3e9dd96.. fs/nfs net/sunrpc" is empty. > > > > All mounts to one server IP have hung, while all other mounts work fine. > > I ran "cd /proc/sys/sunrpc; echo 255 > rpc_debug; echo 255 > nfs_debug" > > for a while, then kill -9'd all D-state processes to simplify the > > debugging, and was left with one that was not interruptible: > > > > 28612 D /usr/local/apache2/bin/http sleep_on_page > > # cat /proc/28612/stack > > [<ffffffff810bdf49>] sleep_on_page+0x9/0x10 > > [<ffffffff810bdf34>] __lock_page+0x64/0x70 > > [<ffffffff8112a9e5>] __generic_file_splice_read+0x2d5/0x500 > > [<ffffffff8112ac5a>] generic_file_splice_read+0x4a/0x90 > > [<ffffffff812030e5>] nfs_file_splice_read+0x85/0xd0 > > [<ffffffff81128fb2>] do_splice_to+0x72/0xa0 > > [<ffffffff811297e4>] splice_direct_to_actor+0xc4/0x1d0 > > [<ffffffff81129942>] do_splice_direct+0x52/0x70 > > [<ffffffff81100096>] do_sendfile+0x166/0x1d0 > > [<ffffffff81100185>] sys_sendfile64+0x85/0xb0 > > [<ffffffff816af57b>] system_call_fastpath+0x16/0x1b > > [<ffffffffffffffff>] 0xffffffffffffffff > > > > echo 1 > /proc/sys/sunrpc/rpc_debug emits: > > > > -pid- flgs status -client- --rqstp- -timeout ---ops-- > > 37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending > > > > tcpdump to this server shows absolutely no packets to the server IP for > > several minutes. netstat shows the socket in CLOSE_WAIT: > > > > # netstat -tan|grep 2049 > > tcp 0 0 10.10.52.50:806 10.10.52.230:2049 CLOSE_WAIT > > > > This is the only port-2049 socket that still exists. > > rpcinfo -p 10.10.52.230, -t 10.10.52.230 lockmgr, etc., all show the > > server seems fine. rpciod is sleeping in rescuer_thread, and nothing > > else is in D state. > > > > mount opts were "rw,hard,intr,tcp,timeo=300,retrans=2,vers=3" > > > > Running another "df" on the mountpoint with rpc_debug = 255 shows: > > > > -pid- flgs status -client- --rqstp- -timeout ---ops-- > > 37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending > > RPC: looking up Generic cred > > NFS call access > > RPC: new task initialized, procpid 30679 > > RPC: allocated task ffff880030c17a00 > > RPC: 37133 __rpc_execute flags=0x80 > > RPC: 37133 call_start nfs3 proc ACCESS (sync) > > RPC: 37133 call_reserve (status 0) > > RPC: 37133 failed to lock transport ffff880223d0a000 > > RPC: 37133 sleep_on(queue "xprt_sending" time 4489651610) > > RPC: 37133 added to queue ffff880223d0a178 "xprt_sending" > > RPC: 37133 sync task going to sleep > > > > So something is not closing the old transport socket here? > > Hi Simon, > > I've stared at this for some time now, and I'm having trouble seeing > what could be going wrong. Can you therefore please apply the following > patch, and see if you can reproduce the problem? 
> > Cheers > Trond > 8<------------------------------------------------------------------------- > From 7030a57b25ce1647cb8489851cff3419e8856189 Mon Sep 17 00:00:00 2001 > From: Trond Myklebust <Trond.Myklebust@netapp.com> > Date: Fri, 9 Sep 2011 19:14:36 -0400 > Subject: [PATCH] NFS: debugging > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > --- > net/sunrpc/clnt.c | 10 ++++++++++ > net/sunrpc/xprt.c | 8 ++++---- > 2 files changed, 14 insertions(+), 4 deletions(-) > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > index c5347d2..3e32cfc 100644 > --- a/net/sunrpc/clnt.c > +++ b/net/sunrpc/clnt.c > @@ -1867,6 +1867,16 @@ void rpc_show_tasks(void) > rpc_show_task(clnt, task); > } > spin_unlock(&clnt->cl_lock); > + task = ACCESS_ONCE(clnt->cl_xprt->snd_task); > + if (task) > + printk("SUNRPC: xprt(%p) state=%lu, snd_task->tk_pid=%u\n", > + clnt->cl_xprt, > + clnt->cl_xprt->state, > + task->tk_pid); > + else > + printk("SUNRPC: xprt(%p) state=%lu, snd_task=0\n", > + clnt->cl_xprt, > + clnt->cl_xprt->state); > } > spin_unlock(&rpc_client_lock); > } > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > index f4385e4..9f5f0e1 100644 > --- a/net/sunrpc/xprt.c > +++ b/net/sunrpc/xprt.c > @@ -235,7 +235,7 @@ static void xprt_clear_locked(struct rpc_xprt *xprt) > clear_bit(XPRT_LOCKED, &xprt->state); > smp_mb__after_clear_bit(); > } else > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > } > > /* > @@ -636,7 +636,7 @@ void xprt_force_disconnect(struct rpc_xprt *xprt) > set_bit(XPRT_CLOSE_WAIT, &xprt->state); > /* Try to schedule an autoclose RPC call */ > if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > xprt_wake_pending_tasks(xprt, -EAGAIN); > spin_unlock_bh(&xprt->transport_lock); > } > @@ -663,7 +663,7 @@ void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie) > set_bit(XPRT_CLOSE_WAIT, &xprt->state); > /* Try to schedule an autoclose RPC call */ > if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > xprt_wake_pending_tasks(xprt, -EAGAIN); > out: > spin_unlock_bh(&xprt->transport_lock); > @@ -681,7 +681,7 @@ xprt_init_autodisconnect(unsigned long data) > goto out_abort; > spin_unlock(&xprt->transport_lock); > set_bit(XPRT_CONNECTION_CLOSE, &xprt->state); > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > return; > out_abort: > spin_unlock(&xprt->transport_lock); > -- > 1.7.6 Hello! Finally hit this again, on 3.1-rc9. The port 2049 socket is in CLOSE_WAIT state again, and everything that touches any mount to that IP hangs. I also have logs before unmounting anything, but I figured I'd clear the mounts that weren't hanging to simplify this output. 
Here is the output from echo 1 > /proc/sys/sunrpc/rpc_debug: [592254.493768] SUNRPC: xprt(ffff880030d3d800) state=16, snd_task=0 [592254.493847] SUNRPC: xprt(ffff880223a8b800) state=16, snd_task=0 [592254.493919] SUNRPC: xprt(ffff880225122800) state=16, snd_task=0 [592254.493991] SUNRPC: xprt(ffff880223a0c800) state=16, snd_task=0 [592254.494064] SUNRPC: xprt(ffff88022280b000) state=18, snd_task=0 [592254.494136] SUNRPC: xprt(ffff88022280b000) state=18, snd_task=0 [592254.494208] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.494283] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.494357] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.494431] -pid- flgs status -client- --rqstp- -timeout ---ops-- [592254.494512] 31859 0801 -11 ffff88022522c000 ffff88008aebb400 0 ffffffff816f0220 nfsv3 WRITE a:call_status q:xprt_sending [592254.494611] 60646 0801 -11 ffff88022522c000 ffff8800c4e21e00 0 ffffffff816f0220 nfsv3 WRITE a:call_status q:xprt_sending [592254.494709] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.494786] 61304 0080 -11 ffff880225e8a600 (null) 0 ffffffff81764ea0 nfsv3 GETATTR a:call_reserveresult q:xprt_sending [592254.494883] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.494958] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495032] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495106] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495180] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495254] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495331] 63054 0080 -11 ffff88022522de00 (null) 0 ffffffff81764ea0 nfsv3 GETATTR a:call_reserveresult q:xprt_sending [592254.495428] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495502] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495576] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 [592254.495651] SUNRPC: xprt(ffff8802251d6800) state=18, snd_task=0 [592254.495723] SUNRPC: xprt(ffff8802251d6800) state=18, snd_task=0 [592254.495795] SUNRPC: xprt(ffff8802239d3800) state=18, snd_task=0 [592254.495867] SUNRPC: xprt(ffff8802239d3800) state=18, snd_task=0 [592254.495939] SUNRPC: xprt(ffff880225056800) state=16, snd_task=0 # ps -eo pid,stat,args,wchan | grep D PID STAT COMMAND WCHAN 2627 D [php] rpc_wait_bit_killable 8524 D [php4] sleep_on_page 8525 D /usr/lib/cgi-bin/php4 sleep_on_page 8528 D /usr/lib/cgi-bin/php4 generic_file_aio_write 8530 D /usr/lib/cgi-bin/php4 nfs_file_fsync 10892 D /usr/lib/cgi-bin/php4 sleep_on_page 12157 D [php4] rpc_wait_bit_killable 23965 S+ grep --colour D pipe_wait The box was pulled from the cluster for now, so we can leave it in this state temporarily, assuming there is anything further we can do to debug it. Simon- ^ permalink raw reply [flat|nested] 23+ messages in thread
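[The state=NN values printed by the debugging patch, here and in the later dumps, are the struct rpc_xprt state bitmask. Assuming the XPRT_* bit numbering from include/linux/sunrpc/xprt.h of this era (LOCKED=0, CONNECTED=1, CONNECTING=2, CLOSE_WAIT=3, BOUND=4, BINDING=5, CLOSING=6), state=16 is just BOUND, state=18 is CONNECTED|BOUND, and the stuck transport's state=27 is LOCKED|CONNECTED|CLOSE_WAIT|BOUND -- the transport lock is still held while the socket sits in CLOSE_WAIT. A small hypothetical helper, not part of this thread or the kernel, to decode such values under that assumption:]

/* decode_xprt_state.c - hypothetical helper; the bit layout is an assumption
 * taken from include/linux/sunrpc/xprt.h of the 3.x era. */
#include <stdio.h>
#include <stdlib.h>

static const char *bits[] = {
	"LOCKED", "CONNECTED", "CONNECTING", "CLOSE_WAIT",
	"BOUND", "BINDING", "CLOSING",
};

int main(int argc, char **argv)
{
	int i;
	unsigned int b;

	for (i = 1; i < argc; i++) {
		unsigned long state = strtoul(argv[i], NULL, 0);

		printf("state=%lu:", state);
		for (b = 0; b < sizeof(bits) / sizeof(bits[0]); b++)
			if (state & (1UL << b))
				printf(" XPRT_%s", bits[b]);
		if (state >> (sizeof(bits) / sizeof(bits[0])))
			printf(" (plus unknown higher bits)");
		printf("\n");
	}
	return 0;
}

/* Example: ./decode_xprt_state 16 18 27 */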
* Re: [3.1-rc4] NFSv3 client hang 2011-10-20 19:03 ` Simon Kirby @ 2012-03-01 22:55 ` Simon Kirby 2012-03-02 0:25 ` Simon Kirby 0 siblings, 1 reply; 23+ messages in thread From: Simon Kirby @ 2012-03-01 22:55 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Thu, Oct 20, 2011 at 12:03:34PM -0700, Simon Kirby wrote: > On Fri, Sep 09, 2011 at 07:18:42PM -0400, Trond Myklebust wrote: > > > On Fri, 2011-09-09 at 12:45 -0700, Simon Kirby wrote: > > > The 3.1-rc4 NFSv3 client hung on another box (separate from the other one > > > which Oopsed in vfs_rmdir() with similar workload). This build was also > > > of 9e79e3e9dd9672b37ac9412e9a926714306551fe (slightly past 3.1-rc4), and > > > "git log 9e79e3e9dd96.. fs/nfs net/sunrpc" is empty. > > > > > > All mounts to one server IP have hung, while all other mounts work fine. > > > I ran "cd /proc/sys/sunrpc; echo 255 > rpc_debug; echo 255 > nfs_debug" > > > for a while, then kill -9'd all D-state processes to simplify the > > > debugging, and was left with one that was not interruptible: > > > > > > 28612 D /usr/local/apache2/bin/http sleep_on_page > > > # cat /proc/28612/stack > > > [<ffffffff810bdf49>] sleep_on_page+0x9/0x10 > > > [<ffffffff810bdf34>] __lock_page+0x64/0x70 > > > [<ffffffff8112a9e5>] __generic_file_splice_read+0x2d5/0x500 > > > [<ffffffff8112ac5a>] generic_file_splice_read+0x4a/0x90 > > > [<ffffffff812030e5>] nfs_file_splice_read+0x85/0xd0 > > > [<ffffffff81128fb2>] do_splice_to+0x72/0xa0 > > > [<ffffffff811297e4>] splice_direct_to_actor+0xc4/0x1d0 > > > [<ffffffff81129942>] do_splice_direct+0x52/0x70 > > > [<ffffffff81100096>] do_sendfile+0x166/0x1d0 > > > [<ffffffff81100185>] sys_sendfile64+0x85/0xb0 > > > [<ffffffff816af57b>] system_call_fastpath+0x16/0x1b > > > [<ffffffffffffffff>] 0xffffffffffffffff > > > > > > echo 1 > /proc/sys/sunrpc/rpc_debug emits: > > > > > > -pid- flgs status -client- --rqstp- -timeout ---ops-- > > > 37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending > > > > > > tcpdump to this server shows absolutely no packets to the server IP for > > > several minutes. netstat shows the socket in CLOSE_WAIT: > > > > > > # netstat -tan|grep 2049 > > > tcp 0 0 10.10.52.50:806 10.10.52.230:2049 CLOSE_WAIT > > > > > > This is the only port-2049 socket that still exists. > > > rpcinfo -p 10.10.52.230, -t 10.10.52.230 lockmgr, etc., all show the > > > server seems fine. rpciod is sleeping in rescuer_thread, and nothing > > > else is in D state. > > > > > > mount opts were "rw,hard,intr,tcp,timeo=300,retrans=2,vers=3" > > > > > > Running another "df" on the mountpoint with rpc_debug = 255 shows: > > > > > > -pid- flgs status -client- --rqstp- -timeout ---ops-- > > > 37163 0001 -11 ffff8802251bca00 (null) 0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending > > > RPC: looking up Generic cred > > > NFS call access > > > RPC: new task initialized, procpid 30679 > > > RPC: allocated task ffff880030c17a00 > > > RPC: 37133 __rpc_execute flags=0x80 > > > RPC: 37133 call_start nfs3 proc ACCESS (sync) > > > RPC: 37133 call_reserve (status 0) > > > RPC: 37133 failed to lock transport ffff880223d0a000 > > > RPC: 37133 sleep_on(queue "xprt_sending" time 4489651610) > > > RPC: 37133 added to queue ffff880223d0a178 "xprt_sending" > > > RPC: 37133 sync task going to sleep > > > > > > So something is not closing the old transport socket here? 
> > > > Hi Simon, > > > > I've stared at this for some time now, and I'm having trouble seeing > > what could be going wrong. Can you therefore please apply the following > > patch, and see if you can reproduce the problem? > > > > Cheers > > Trond > > 8<------------------------------------------------------------------------- > > From 7030a57b25ce1647cb8489851cff3419e8856189 Mon Sep 17 00:00:00 2001 > > From: Trond Myklebust <Trond.Myklebust@netapp.com> > > Date: Fri, 9 Sep 2011 19:14:36 -0400 > > Subject: [PATCH] NFS: debugging > > > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > > --- > > net/sunrpc/clnt.c | 10 ++++++++++ > > net/sunrpc/xprt.c | 8 ++++---- > > 2 files changed, 14 insertions(+), 4 deletions(-) > > > > diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c > > index c5347d2..3e32cfc 100644 > > --- a/net/sunrpc/clnt.c > > +++ b/net/sunrpc/clnt.c > > @@ -1867,6 +1867,16 @@ void rpc_show_tasks(void) > > rpc_show_task(clnt, task); > > } > > spin_unlock(&clnt->cl_lock); > > + task = ACCESS_ONCE(clnt->cl_xprt->snd_task); > > + if (task) > > + printk("SUNRPC: xprt(%p) state=%lu, snd_task->tk_pid=%u\n", > > + clnt->cl_xprt, > > + clnt->cl_xprt->state, > > + task->tk_pid); > > + else > > + printk("SUNRPC: xprt(%p) state=%lu, snd_task=0\n", > > + clnt->cl_xprt, > > + clnt->cl_xprt->state); > > } > > spin_unlock(&rpc_client_lock); > > } > > diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c > > index f4385e4..9f5f0e1 100644 > > --- a/net/sunrpc/xprt.c > > +++ b/net/sunrpc/xprt.c > > @@ -235,7 +235,7 @@ static void xprt_clear_locked(struct rpc_xprt *xprt) > > clear_bit(XPRT_LOCKED, &xprt->state); > > smp_mb__after_clear_bit(); > > } else > > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > > } > > > > /* > > @@ -636,7 +636,7 @@ void xprt_force_disconnect(struct rpc_xprt *xprt) > > set_bit(XPRT_CLOSE_WAIT, &xprt->state); > > /* Try to schedule an autoclose RPC call */ > > if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) > > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > > xprt_wake_pending_tasks(xprt, -EAGAIN); > > spin_unlock_bh(&xprt->transport_lock); > > } > > @@ -663,7 +663,7 @@ void xprt_conditional_disconnect(struct rpc_xprt *xprt, unsigned int cookie) > > set_bit(XPRT_CLOSE_WAIT, &xprt->state); > > /* Try to schedule an autoclose RPC call */ > > if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) > > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > > xprt_wake_pending_tasks(xprt, -EAGAIN); > > out: > > spin_unlock_bh(&xprt->transport_lock); > > @@ -681,7 +681,7 @@ xprt_init_autodisconnect(unsigned long data) > > goto out_abort; > > spin_unlock(&xprt->transport_lock); > > set_bit(XPRT_CONNECTION_CLOSE, &xprt->state); > > - queue_work(rpciod_workqueue, &xprt->task_cleanup); > > + BUG_ON(!queue_work(rpciod_workqueue, &xprt->task_cleanup)); > > return; > > out_abort: > > spin_unlock(&xprt->transport_lock); > > -- > > 1.7.6 > > Hello! > > Finally hit this again, on 3.1-rc9. The port 2049 socket is in CLOSE_WAIT > state again, and everything that touches any mount to that IP hangs. > > I also have logs before unmounting anything, but I figured I'd clear the > mounts that weren't hanging to simplify this output. 
Here is the output > from echo 1 > /proc/sys/sunrpc/rpc_debug: > > [592254.493768] SUNRPC: xprt(ffff880030d3d800) state=16, snd_task=0 > [592254.493847] SUNRPC: xprt(ffff880223a8b800) state=16, snd_task=0 > [592254.493919] SUNRPC: xprt(ffff880225122800) state=16, snd_task=0 > [592254.493991] SUNRPC: xprt(ffff880223a0c800) state=16, snd_task=0 > [592254.494064] SUNRPC: xprt(ffff88022280b000) state=18, snd_task=0 > [592254.494136] SUNRPC: xprt(ffff88022280b000) state=18, snd_task=0 > [592254.494208] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.494283] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.494357] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.494431] -pid- flgs status -client- --rqstp- -timeout ---ops-- > [592254.494512] 31859 0801 -11 ffff88022522c000 ffff88008aebb400 0 ffffffff816f0220 nfsv3 WRITE a:call_status q:xprt_sending > [592254.494611] 60646 0801 -11 ffff88022522c000 ffff8800c4e21e00 0 ffffffff816f0220 nfsv3 WRITE a:call_status q:xprt_sending > [592254.494709] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.494786] 61304 0080 -11 ffff880225e8a600 (null) 0 ffffffff81764ea0 nfsv3 GETATTR a:call_reserveresult q:xprt_sending > [592254.494883] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.494958] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495032] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495106] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495180] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495254] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495331] 63054 0080 -11 ffff88022522de00 (null) 0 ffffffff81764ea0 nfsv3 GETATTR a:call_reserveresult q:xprt_sending > [592254.495428] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495502] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495576] SUNRPC: xprt(ffff880223946000) state=27, snd_task->tk_pid=34816 > [592254.495651] SUNRPC: xprt(ffff8802251d6800) state=18, snd_task=0 > [592254.495723] SUNRPC: xprt(ffff8802251d6800) state=18, snd_task=0 > [592254.495795] SUNRPC: xprt(ffff8802239d3800) state=18, snd_task=0 > [592254.495867] SUNRPC: xprt(ffff8802239d3800) state=18, snd_task=0 > [592254.495939] SUNRPC: xprt(ffff880225056800) state=16, snd_task=0 > > # ps -eo pid,stat,args,wchan | grep D > PID STAT COMMAND WCHAN > 2627 D [php] rpc_wait_bit_killable > 8524 D [php4] sleep_on_page > 8525 D /usr/lib/cgi-bin/php4 sleep_on_page > 8528 D /usr/lib/cgi-bin/php4 generic_file_aio_write > 8530 D /usr/lib/cgi-bin/php4 nfs_file_fsync > 10892 D /usr/lib/cgi-bin/php4 sleep_on_page > 12157 D [php4] rpc_wait_bit_killable > 23965 S+ grep --colour D pipe_wait > > The box was pulled from the cluster for now, so we can leave it in this > state temporarily, assuming there is anything further we can do to debug it. Here's another one, from 3.2.2. I guess it's still happening. :/ One mount point is stuck (hangs on access), and netstat shows the socket to what should be the server in CLOSE_WAIT state. The other mounts work, and are in ESTABLISHED state. 
With your debugging patch still applied, echo 0 > /proc/sys/sunrpc/rpc_debug shows: # dmesg | grep SUNRPC: | cut -f2- -d' ' | sort -u SUNRPC: xprt(ffff88008a111000) state=16, snd_task=0 SUNRPC: xprt(ffff8801daae5800) state=16, snd_task=0 SUNRPC: xprt(ffff880209f04000) state=18, snd_task=0 SUNRPC: xprt(ffff88020ec78800) state=16, snd_task=0 SUNRPC: xprt(ffff880222c10800) state=16, snd_task=0 SUNRPC: xprt(ffff880222c13000) state=18, snd_task=0 SUNRPC: xprt(ffff880222c18800) state=16, snd_task=0 SUNRPC: xprt(ffff880222ca0800) state=18, snd_task=0 SUNRPC: xprt(ffff880223f6b000) state=18, snd_task=0 SUNRPC: xprt(ffff880223f79800) state=18, snd_task=0 SUNRPC: xprt(ffff880223f91800) state=18, snd_task=0 SUNRPC: xprt(ffff880223f93000) state=18, snd_task=0 SUNRPC: xprt(ffff880223f97000) state=18, snd_task=0 SUNRPC: xprt(ffff880223f9b800) state=18, snd_task=0 SUNRPC: xprt(ffff880223faa800) state=18, snd_task=0 SUNRPC: xprt(ffff880223fc1000) state=16, snd_task=0 SUNRPC: xprt(ffff880223fc7800) state=27, snd_task->tk_pid=0 SUNRPC: xprt(ffff880223fd3800) state=18, snd_task=0 SUNRPC: xprt(ffff880225069800) state=18, snd_task=0 SUNRPC: xprt(ffff8802251bb800) state=18, snd_task=0 SUNRPC: xprt(ffff8802257c3800) state=16, snd_task=0 SUNRPC: xprt(ffff8802257c6000) state=18, snd_task=0 SUNRPC: xprt(ffff880225e8c000) state=16, snd_task=0 SUNRPC: xprt(ffff880226181000) state=18, snd_task=0 SUNRPC: xprt(ffff880226182000) state=18, snd_task=0 SUNRPC: xprt(ffff880226185800) state=18, snd_task=0 SUNRPC: xprt(ffff880226187800) state=16, snd_task=0 Full dmesg: 0x.ca/sim/ref/3.2.2/rpc_debug.txt Simon- ^ permalink raw reply [flat|nested] 23+ messages in thread
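[Since the symptom in each of these reports is a lone NFS socket stuck in CLOSE_WAIT while the mounts hang, a monitoring job can watch for that state directly. Below is a hypothetical, self-contained helper -- not something used in this thread -- that scans /proc/net/tcp for sockets whose peer port is 2049 and reports their TCP state; 08 is CLOSE_WAIT per include/net/tcp_states.h, and the address decoding assumes a little-endian host like the x86 boxes here.]

/* nfs_close_wait_check.c - hypothetical monitoring helper */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/net/tcp", "r");
	char line[512];

	if (!f) {
		perror("/proc/net/tcp");
		return 1;
	}
	if (!fgets(line, sizeof(line), f)) {	/* skip the header line */
		fclose(f);
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		unsigned int lip, lport, rip, rport, st;

		/* fields are "sl: local_address rem_address st ..." in hex */
		if (sscanf(line, "%*d: %8X:%4X %8X:%4X %2X",
			   &lip, &lport, &rip, &rport, &st) != 5)
			continue;
		if (rport != 2049)	/* only sockets talking to an NFS server */
			continue;
		printf("server %u.%u.%u.%u local port %u state %02X%s\n",
		       rip & 0xff, (rip >> 8) & 0xff,
		       (rip >> 16) & 0xff, (rip >> 24) & 0xff,
		       lport, st, st == 0x08 ? "  <-- CLOSE_WAIT" : "");
	}
	fclose(f);
	return 0;
}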
* Re: [3.1-rc4] NFSv3 client hang 2012-03-01 22:55 ` Simon Kirby @ 2012-03-02 0:25 ` Simon Kirby 2012-03-02 18:49 ` [3.2.5] NFSv3 CLOSE_WAIT hang Simon Kirby 0 siblings, 1 reply; 23+ messages in thread From: Simon Kirby @ 2012-03-02 0:25 UTC (permalink / raw) To: Trond Myklebust; +Cc: linux-nfs On Thu, Mar 01, 2012 at 02:55:24PM -0800, Simon Kirby wrote: > Here's another one, from 3.2.2. I guess it's still happening. :/ In cleaning this one up, "umount -t nfs -a -l" left the socket there still in CLOSE_WAIT. There were a few stuck "df" processes from our monitoring script: 31370 D< df -mPt nfs rpc_wait_bit_killable 31822 D< df -mPt nfs rpc_wait_bit_killable 31847 D< df -mPt nfs rpc_wait_bit_killable ...killing those made the socket actually close and change to TIME_WAIT, allowing me to remount everything, and it all works again. Simon- ^ permalink raw reply [flat|nested] 23+ messages in thread
* [3.2.5] NFSv3 CLOSE_WAIT hang 2012-03-02 0:25 ` Simon Kirby @ 2012-03-02 18:49 ` Simon Kirby 2012-09-05 7:49 ` Yan-Pai Chen 0 siblings, 1 reply; 23+ messages in thread From: Simon Kirby @ 2012-03-02 18:49 UTC (permalink / raw) To: linux-nfs Here's another CLOSE_WAIT hang, 3.2.5 client, 3.2.2 knfsd server, NFSv3. Not sure why these are all happening again now. This cluster seems to have a set of customers that are good at breaking things. ;) The only other maybe weird thing we do on this partition is bind mount some NFS paths to break through some chroots. Perhaps this is pegging things in place (or expecting them to stay there) and causing problems? export: /home/logs 10.10.52.0/24(rw,async,no_root_squash,no_subtree_check,fsid=1) mount: 10.10.52.15:/home/logs /home/logs nfs rw,soft,intr,tcp,timeo=300,retrans=2,vers=3 0 0 current socket state: tcp 0 0 10.10.52.11:719 10.10.52.15:2049 CLOSE_WAIT - off (0.00/0/0) demsg: [505238.409106] ------------[ cut here ]------------ [505238.409194] WARNING: at fs/dcache.c:2485 prepend_path+0x19c/0x1b0() [505238.409269] Hardware name: PowerEdge 1950 [505238.409339] Root dentry has weird name <> [505238.409409] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 [505238.410107] Pid: 3461, comm: sftp-server Tainted: G W 3.2.5-hw+ #3 [505238.410184] Call Trace: [505238.410256] [<ffffffff8113140c>] ? prepend_path+0x19c/0x1b0 [505238.410333] [<ffffffff8105b0e0>] warn_slowpath_common+0x80/0xc0 [505238.410408] [<ffffffff8105b219>] warn_slowpath_fmt+0x69/0x70 [505238.410485] [<ffffffff8111bf89>] ? do_sync_read+0xd9/0x120 [505238.410560] [<ffffffff81130db0>] ? prepend+0x30/0x40 [505238.410634] [<ffffffff8113140c>] prepend_path+0x19c/0x1b0 [505238.410708] [<ffffffff81131940>] sys_getcwd+0x100/0x1e0 [505238.410786] [<ffffffff816e1392>] system_call_fastpath+0x16/0x1b [505238.410861] ---[ end trace 3cedde7daf396ef5 ]--- [505238.461689] ------------[ cut here ]------------ [505238.461772] WARNING: at fs/dcache.c:2485 prepend_path+0x19c/0x1b0() [505238.461846] Hardware name: PowerEdge 1950 [505238.461916] Root dentry has weird name <> [505238.461986] Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 [505238.462680] Pid: 3461, comm: sftp-server Tainted: G W 3.2.5-hw+ #3 [505238.462756] Call Trace: [505238.462826] [<ffffffff8113140c>] ? prepend_path+0x19c/0x1b0 [505238.462902] [<ffffffff8105b0e0>] warn_slowpath_common+0x80/0xc0 [505238.462977] [<ffffffff8105b219>] warn_slowpath_fmt+0x69/0x70 [505238.463053] [<ffffffff8111bf89>] ? do_sync_read+0xd9/0x120 [505238.463128] [<ffffffff81130db0>] ? 
prepend+0x30/0x40 [505238.463202] [<ffffffff8113140c>] prepend_path+0x19c/0x1b0 [505238.463277] [<ffffffff81131940>] sys_getcwd+0x100/0x1e0 [505238.463352] [<ffffffff816e1392>] system_call_fastpath+0x16/0x1b [505238.463427] ---[ end trace 3cedde7daf396ef6 ]--- [505526.997108] SUNRPC: xprt(ffff880081b48800) state=18, snd_task=0 [505526.997189] SUNRPC: xprt(ffff8802252a4800) state=18, snd_task=0 [505526.997264] SUNRPC: xprt(ffff880025b5b800) state=18, snd_task=0 [505526.997338] SUNRPC: xprt(ffff880223590000) state=18, snd_task=0 [505526.997412] SUNRPC: xprt(ffff880223590000) state=18, snd_task=0 [505526.997486] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.997560] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.997634] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.997708] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.997782] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.997856] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.997930] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.998004] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.998078] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505526.998152] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505526.998226] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505526.998300] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 [505526.998374] SUNRPC: xprt(ffff880208c53800) state=16, snd_task=0 [505526.998448] SUNRPC: xprt(ffff8801cabf1000) state=16, snd_task=0 [505526.998523] SUNRPC: xprt(ffff8802235ac800) state=18, snd_task=0 [505526.998597] SUNRPC: xprt(ffff880208c53000) state=18, snd_task=0 [505526.998671] SUNRPC: xprt(ffff8802235a0800) state=18, snd_task=0 [505526.998745] SUNRPC: xprt(ffff880211b70000) state=18, snd_task=0 [505526.998819] SUNRPC: xprt(ffff88022503b800) state=18, snd_task=0 [505527.000640] SUNRPC: xprt(ffff88022360d000) state=18, snd_task=0 [505527.000715] SUNRPC: xprt(ffff880223650800) state=18, snd_task=0 [505527.000790] SUNRPC: xprt(ffff880223608800) state=16, snd_task=0 [505527.000864] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.000938] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.001012] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.001086] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.001160] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.001234] -pid- flgs status -client- --rqstp- -timeout ---ops-- [505527.001317] 7094 0880 0 ffff880223763000 ffff880224046400 7500 ffffffff8178b220 nfsv3 SETATTR a:call_status q:xprt_pending [505527.001417] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.001522] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.001597] SUNRPC: xprt(ffff8802257d2000) state=18, snd_task=0 [505527.001671] SUNRPC: xprt(ffff8802235a1800) state=18, snd_task=0 [505527.001745] SUNRPC: xprt(ffff8802235a1800) state=18, snd_task=0 [505527.001824] 12105 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.001926] 12106 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.002028] 12125 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.002130] 12126 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.002231] 12129 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS 
a:call_reserveresult q:xprt_sending [505527.002332] 12167 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.002434] 12176 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.002535] 12188 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.002637] 12189 0280 -11 ffff88022551c200 (null) 0 ffffffff8178b220 nfsv3 ACCESS a:call_reserveresult q:xprt_sending [505527.002736] SUNRPC: xprt(ffff88022510f800) state=27, snd_task->tk_pid=32732 [505527.002813] SUNRPC: xprt(ffff88022510f800) state=27, snd_task->tk_pid=32732 [505527.002889] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505527.002963] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505527.003036] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505527.003110] SUNRPC: xprt(ffff8802251d7800) state=18, snd_task=0 [505527.003184] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505527.003258] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505527.003334] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505527.003408] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505527.003482] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505527.003555] SUNRPC: xprt(ffff8802251d3800) state=18, snd_task=0 [505527.003629] SUNRPC: xprt(ffff8802252a5000) state=18, snd_task=0 [505527.003703] SUNRPC: xprt(ffff880225f05800) state=18, snd_task=0 [505527.003777] SUNRPC: xprt(ffff880225f05800) state=18, snd_task=0 [505527.003851] SUNRPC: xprt(ffff8802252a5000) state=18, snd_task=0 [505527.003925] SUNRPC: xprt(ffff8802252a5000) state=18, snd_task=0 [505527.004012] SUNRPC: xprt(ffff880223586800) state=18, snd_task=0 [505527.004089] SUNRPC: xprt(ffff880223586800) state=18, snd_task=0 [505527.004167] SUNRPC: xprt(ffff8802235a3800) state=18, snd_task=0 [505527.004243] SUNRPC: xprt(ffff8802235a3800) state=18, snd_task=0 [505527.004317] SUNRPC: xprt(ffff8802235ab800) state=18, snd_task=0 [505527.004393] SUNRPC: xprt(ffff8802235ab800) state=18, snd_task=0 [505527.004467] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.004544] 7708 0880 0 ffff880224080000 ffff88022557b400 7500 ffffffff8178b220 nfsv3 SETATTR a:call_status q:xprt_pending [505527.004644] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.004718] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.004792] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.004868] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.004945] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005019] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005096] 7512 0801 0 ffff88022501f200 ffff880223695800 7500 ffffffff81714ee0 nfsv3 WRITE a:call_status q:xprt_pending [505527.005198] 7516 0880 0 ffff88022501f200 ffff8802252e1200 7500 ffffffff8178b220 nfsv3 SETATTR a:call_status q:xprt_pending [505527.005297] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005372] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005447] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005521] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005607] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005682] SUNRPC: xprt(ffff8802236a3800) state=18, snd_task=0 [505527.005756] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.005829] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.005906] 7688 0880 0 
ffff880225479a00 ffff880225019600 7500 ffffffff8178b220 nfsv3 GETATTR a:call_status q:xprt_pending [505527.006005] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.006079] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.006153] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.006227] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.006301] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.006375] SUNRPC: xprt(ffff88022359e800) state=18, snd_task=0 [505527.006448] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.006522] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.006596] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.006670] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.006744] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.006818] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.006891] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.006965] SUNRPC: xprt(ffff8802235bf000) state=18, snd_task=0 [505527.007039] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 [505527.007113] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 [505527.007187] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 [505527.007261] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 [505527.007335] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 [505527.007409] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 [505527.007483] SUNRPC: xprt(ffff8802236a2000) state=16, snd_task=0 [505527.007557] SUNRPC: xprt(ffff88022344a000) state=18, snd_task=0 Thanks, Simon- ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-03-02 18:49 ` [3.2.5] NFSv3 CLOSE_WAIT hang Simon Kirby @ 2012-09-05 7:49 ` Yan-Pai Chen 2012-09-05 15:09 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Yan-Pai Chen @ 2012-09-05 7:49 UTC (permalink / raw) To: linux-nfs Simon Kirby <sim@...> writes: > > Here's another CLOSE_WAIT hang, 3.2.5 client, 3.2.2 knfsd server, NFSv3. > Not sure why these are all happening again now. This cluster seems to > have a set of customers that are good at breaking things. ;) Hi all, I have the same problem in 3.3 kernel (client). After applying the following modification as suggested by Dick in http://www.spinics.net/lists/linux-nfs/msg32407.html, the problem is just gone. Does anyone know if they are related? Thanks. diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index c64c0ef..f979e9f 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1071,24 +1071,9 @@ void xprt_reserve(struct rpc_task *task) { struct rpc_xprt *xprt = task->tk_xprt; - task->tk_status = 0; - if (task->tk_rqstp != NULL) - return; - - /* Note: grabbing the xprt_lock_write() here is not strictly needed, - * but ensures that we throttle new slot allocation if the transport - * is congested (e.g. if reconnecting or if we're out of socket - * write buffer space). - */ - task->tk_timeout = 0; - task->tk_status = -EAGAIN; - if (!xprt_lock_write(xprt, task)) - return; - spin_lock(&xprt->reserve_lock); xprt_alloc_slot(task); spin_unlock(&xprt->reserve_lock); - xprt_release_write(xprt, task); } ^ permalink raw reply related [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-05 7:49 ` Yan-Pai Chen @ 2012-09-05 15:09 ` Myklebust, Trond 2012-09-07 13:57 ` Dick Streefland, rnews 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-09-05 15:09 UTC (permalink / raw) To: Yan-Pai Chen; +Cc: linux-nfs@vger.kernel.org

On Wed, 2012-09-05 at 07:49 +0000, Yan-Pai Chen wrote:
> Simon Kirby <sim@...> writes:
> 
> > 
> > Here's another CLOSE_WAIT hang, 3.2.5 client, 3.2.2 knfsd server, NFSv3.
> > Not sure why these are all happening again now. This cluster seems to
> > have a set of customers that are good at breaking things. ;)
> 
> Hi all,
> 
> I have the same problem in 3.3 kernel (client).
> After applying the following modification as suggested by Dick in
> http://www.spinics.net/lists/linux-nfs/msg32407.html, the problem
> is just gone.
> 
> Does anyone know if they are related?
> Thanks.
> 
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index c64c0ef..f979e9f 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1071,24 +1071,9 @@ void xprt_reserve(struct rpc_task *task)
>  {
>         struct rpc_xprt *xprt = task->tk_xprt;
> 
> -       task->tk_status = 0;
> -       if (task->tk_rqstp != NULL)
> -               return;
> -
> -       /* Note: grabbing the xprt_lock_write() here is not strictly needed,
> -        * but ensures that we throttle new slot allocation if the transport
> -        * is congested (e.g. if reconnecting or if we're out of socket
> -        * write buffer space).
> -        */
> -       task->tk_timeout = 0;
> -       task->tk_status = -EAGAIN;
> -       if (!xprt_lock_write(xprt, task))
> -               return;
> -
>         spin_lock(&xprt->reserve_lock);
>         xprt_alloc_slot(task);
>         spin_unlock(&xprt->reserve_lock);
> -       xprt_release_write(xprt, task);
>  }

Clearly the comment is misleading and should be removed. That write lock
_is_ needed in order to throttle slots on TCP.

As far as I know, kernel 3.3 is not in stable support any more, so I
can't help that. Can you reproduce the problem on a 3.5 kernel or
higher?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-05 15:09 ` Myklebust, Trond @ 2012-09-07 13:57 ` Dick Streefland, rnews 2012-09-07 14:13 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Dick Streefland, rnews @ 2012-09-07 13:57 UTC (permalink / raw) To: linux-nfs "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: | Clearly the comment is misleading and should be removed. That write lock | | _is_ needed in order to throttle slots on TCP. | | As far as I know, kernel 3.3 is not in stable support any more, so I | can't help that. Can you reproduce the problem on a 3.5 kernel or | higher? I can easily reproduce the hang on the latest kernel (eeea3ac912) with my script (http://permalink.gmane.org/gmane.linux.nfs/51833). With this change, it doesn't hang: diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index a5a402a..6e739bf 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -1083,20 +1083,9 @@ void xprt_reserve(struct rpc_task *task) if (task->tk_rqstp != NULL) return; - /* Note: grabbing the xprt_lock_write() here is not strictly needed, - * but ensures that we throttle new slot allocation if the transport - * is congested (e.g. if reconnecting or if we're out of socket - * write buffer space). - */ - task->tk_timeout = 0; - task->tk_status = -EAGAIN; - if (!xprt_lock_write(xprt, task)) - return; - spin_lock(&xprt->reserve_lock); xprt_alloc_slot(task); spin_unlock(&xprt->reserve_lock); - xprt_release_write(xprt, task); } static inline __be32 xprt_alloc_xid(struct rpc_xprt *xprt) -- Dick ^ permalink raw reply related [flat|nested] 23+ messages in thread
* RE: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-07 13:57 ` Dick Streefland, rnews @ 2012-09-07 14:13 ` Myklebust, Trond 2012-09-07 14:33 ` Dick Streefland, rnews 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-09-07 14:13 UTC (permalink / raw) To: Dick Streefland, linux-nfs@vger.kernel.org > -----Original Message----- > From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs- > owner@vger.kernel.org] On Behalf Of Dick Streefland > Sent: Friday, September 07, 2012 9:57 AM > To: linux-nfs@vger.kernel.org > Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang > > "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: > | Clearly the comment is misleading and should be removed. That write > | lock > | > | _is_ needed in order to throttle slots on TCP. > | > | As far as I know, kernel 3.3 is not in stable support any more, so I > | can't help that. Can you reproduce the problem on a 3.5 kernel or > | higher? > > I can easily reproduce the hang on the latest kernel (eeea3ac912) with my > script (http://permalink.gmane.org/gmane.linux.nfs/51833). On TCP? > With this change, it doesn't hang: It doesn't matter as long as that change is wrong. That code _is_ needed for TCP. Now if UDP is broken, then we can talk about a fix for that issue, but that's not going to be the one you propose. Trond ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-07 14:13 ` Myklebust, Trond @ 2012-09-07 14:33 ` Dick Streefland, rnews 2012-09-07 15:46 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Dick Streefland, rnews @ 2012-09-07 14:33 UTC (permalink / raw) To: linux-nfs "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: | > I can easily reproduce the hang on the latest kernel (eeea3ac912) with my | > script (http://permalink.gmane.org/gmane.linux.nfs/51833). | | On TCP? No, on UDP (mount options are in that first email). | > With this change, it doesn't hang: | | It doesn't matter as long as that change is wrong. That code _is_ needed for TCP. | | Now if UDP is broken, then we can talk about a fix for that issue, but that's not going to be the one you propose. OK, fine with me. My proposal to revert this change was only based on the remark in the comment that it is not strictly needed. So we need to differentiate between UDP and TCP then? -- Dick ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-07 14:33 ` Dick Streefland, rnews @ 2012-09-07 15:46 ` Myklebust, Trond 2012-09-08 19:32 ` Dick Streefland, rnews 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-09-07 15:46 UTC (permalink / raw) To: Dick Streefland; +Cc: linux-nfs@vger.kernel.org

On Fri, 2012-09-07 at 14:33 +0000, Dick Streefland wrote:
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> | > I can easily reproduce the hang on the latest kernel (eeea3ac912) with my
> | > script (http://permalink.gmane.org/gmane.linux.nfs/51833).
> | 
> | On TCP?
> 
> No, on UDP (mount options are in that first email).
> 
> | > With this change, it doesn't hang:
> | 
> | It doesn't matter as long as that change is wrong. That code _is_ needed for TCP.
> | 
> | Now if UDP is broken, then we can talk about a fix for that issue, but that's not going to be the one you propose.
> 
> OK, fine with me. My proposal to revert this change was only based on
> the remark in the comment that it is not strictly needed. So we need
> to differentiate between UDP and TCP then?

Yes. Can you please see if the following patch fixes the UDP hang?

8<---------------------------------------------------------------------
From f39c1bfb5a03e2d255451bff05be0d7255298fa4 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Fri, 7 Sep 2012 11:08:50 -0400
Subject: [PATCH] SUNRPC: Fix a UDP transport regression

Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that
we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
hangs in the case of NFS over UDP mounts.

Since neither the UDP or the RDMA transport mechanism use dynamic slot
allocation, we can skip grabbing the socket lock for those transports.
Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
case.

Note that the NFSv4.1 back channel assigns the slot directly
through rpc_run_bc_task, so we can ignore that case.

Reported-by: Dick Streefland <dick.streefland@altium.nl>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [>= 3.1]
---
 include/linux/sunrpc/xprt.h     |  3 +++
 net/sunrpc/xprt.c               | 34 ++++++++++++++++++++--------------
 net/sunrpc/xprtrdma/transport.c |  1 +
 net/sunrpc/xprtsock.c           |  3 +++
 4 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index cff40aa..bf8c49f 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -114,6 +114,7 @@ struct rpc_xprt_ops {
 	void		(*set_buffer_size)(struct rpc_xprt *xprt, size_t sndsize, size_t rcvsize);
 	int		(*reserve_xprt)(struct rpc_xprt *xprt, struct rpc_task *task);
 	void		(*release_xprt)(struct rpc_xprt *xprt, struct rpc_task *task);
+	void		(*alloc_slot)(struct rpc_xprt *xprt, struct rpc_task *task);
 	void		(*rpcbind)(struct rpc_task *task);
 	void		(*set_port)(struct rpc_xprt *xprt, unsigned short port);
 	void		(*connect)(struct rpc_task *task);
@@ -281,6 +282,8 @@ void			xprt_connect(struct rpc_task *task);
 void			xprt_reserve(struct rpc_task *task);
 int			xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task);
 int			xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task);
+void			xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
+void			xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
 int			xprt_prepare_transmit(struct rpc_task *task);
 void			xprt_transmit(struct rpc_task *task);
 void			xprt_end_transmit(struct rpc_task *task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a5a402a..5d7f61d 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -969,11 +969,11 @@ static bool xprt_dynamic_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req)
 	return false;
 }
 
-static void xprt_alloc_slot(struct rpc_task *task)
+void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
 {
-	struct rpc_xprt	*xprt = task->tk_xprt;
 	struct rpc_rqst *req;
 
+	spin_lock(&xprt->reserve_lock);
 	if (!list_empty(&xprt->free)) {
 		req = list_entry(xprt->free.next, struct rpc_rqst, rq_list);
 		list_del(&req->rq_list);
@@ -994,12 +994,29 @@ static void xprt_alloc_slot(struct rpc_task *task)
 	default:
 		task->tk_status = -EAGAIN;
 	}
+	spin_unlock(&xprt->reserve_lock);
 	return;
 out_init_req:
 	task->tk_status = 0;
 	task->tk_rqstp = req;
 	xprt_request_init(task, xprt);
+	spin_unlock(&xprt->reserve_lock);
+}
+EXPORT_SYMBOL_GPL(xprt_alloc_slot);
+
+void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
+{
+	/* Note: grabbing the xprt_lock_write() ensures that we throttle
+	 * new slot allocation if the transport is congested (i.e. when
+	 * reconnecting a stream transport or when out of socket write
+	 * buffer space).
+	 */
+	if (xprt_lock_write(xprt, task)) {
+		xprt_alloc_slot(xprt, task);
+		xprt_release_write(xprt, task);
+	}
 }
+EXPORT_SYMBOL_GPL(xprt_lock_and_alloc_slot);
 
 static void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req)
 {
@@ -1083,20 +1100,9 @@ void xprt_reserve(struct rpc_task *task)
 	if (task->tk_rqstp != NULL)
 		return;
 
-	/* Note: grabbing the xprt_lock_write() here is not strictly needed,
-	 * but ensures that we throttle new slot allocation if the transport
-	 * is congested (e.g. if reconnecting or if we're out of socket
-	 * write buffer space).
-	 */
 	task->tk_timeout = 0;
 	task->tk_status = -EAGAIN;
-	if (!xprt_lock_write(xprt, task))
-		return;
-
-	spin_lock(&xprt->reserve_lock);
-	xprt_alloc_slot(task);
-	spin_unlock(&xprt->reserve_lock);
-	xprt_release_write(xprt, task);
+	xprt->ops->alloc_slot(xprt, task);
 }
 
 static inline __be32 xprt_alloc_xid(struct rpc_xprt *xprt)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 06cdbff..5d9202d 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -713,6 +713,7 @@ static void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
 static struct rpc_xprt_ops xprt_rdma_procs = {
 	.reserve_xprt		= xprt_rdma_reserve_xprt,
 	.release_xprt		= xprt_release_xprt_cong, /* sunrpc/xprt.c */
+	.alloc_slot		= xprt_alloc_slot,
 	.release_request	= xprt_release_rqst_cong,       /* ditto */
 	.set_retrans_timeout	= xprt_set_retrans_timeout_def, /* ditto */
 	.rpcbind		= rpcb_getport_async,	/* sunrpc/rpcb_clnt.c */
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 4005672..a35b8e5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2473,6 +2473,7 @@ static void bc_destroy(struct rpc_xprt *xprt)
 static struct rpc_xprt_ops xs_local_ops = {
 	.reserve_xprt		= xprt_reserve_xprt,
 	.release_xprt		= xs_tcp_release_xprt,
+	.alloc_slot		= xprt_alloc_slot,
 	.rpcbind		= xs_local_rpcbind,
 	.set_port		= xs_local_set_port,
 	.connect		= xs_connect,
@@ -2489,6 +2490,7 @@ static struct rpc_xprt_ops xs_udp_ops = {
 	.set_buffer_size	= xs_udp_set_buffer_size,
 	.reserve_xprt		= xprt_reserve_xprt_cong,
 	.release_xprt		= xprt_release_xprt_cong,
+	.alloc_slot		= xprt_alloc_slot,
 	.rpcbind		= rpcb_getport_async,
 	.set_port		= xs_set_port,
 	.connect		= xs_connect,
@@ -2506,6 +2508,7 @@ static struct rpc_xprt_ops xs_udp_ops = {
 static struct rpc_xprt_ops xs_tcp_ops = {
 	.reserve_xprt		= xprt_reserve_xprt,
 	.release_xprt		= xs_tcp_release_xprt,
+	.alloc_slot		= xprt_lock_and_alloc_slot,
 	.rpcbind		= rpcb_getport_async,
 	.set_port		= xs_set_port,
 	.connect		= xs_connect,
-- 
1.7.11.4


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-07 15:46 ` Myklebust, Trond @ 2012-09-08 19:32 ` Dick Streefland, rnews 2012-09-10 9:00 ` Yan-Pai Chen 0 siblings, 1 reply; 23+ messages in thread From: Dick Streefland, rnews @ 2012-09-08 19:32 UTC (permalink / raw) To: linux-nfs "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote: | Yes. Can you please see if the following patch fixes the UDP hang? | | 8<--------------------------------------------------------------------- | From f39c1bfb5a03e2d255451bff05be0d7255298fa4 Mon Sep 17 00:00:00 2001 | From: Trond Myklebust <Trond.Myklebust@netapp.com> | Date: Fri, 7 Sep 2012 11:08:50 -0400 | Subject: [PATCH] SUNRPC: Fix a UDP transport regression | | Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that | we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing | hangs in the case of NFS over UDP mounts. | | Since neither the UDP or the RDMA transport mechanism use dynamic slot | allocation, we can skip grabbing the socket lock for those transports. | Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA | case. | | Note that the NFSv4.1 back channel assigns the slot directly | through rpc_run_bc_task, so we can ignore that case. | | Reported-by: Dick Streefland <dick.streefland@altium.nl> | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> | Cc: stable@vger.kernel.org [>= 3.1] This patch appears to fix the issue for me. I cannot reproduce the hang anymore. -- Dick ^ permalink raw reply [flat|nested] 23+ messages in thread
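To make the mechanism in the quoted commit message concrete: slot allocation becomes a per-transport operation, so only stream transports take the transport write lock (XPRT_LOCKED) before allocating a slot, while UDP and RDMA skip it. The sketch below is condensed from the patch earlier in the thread; it is illustrative only and omits the slot-table details.

/* xprt_reserve() now defers to the transport's alloc_slot op. */
void xprt_reserve(struct rpc_task *task)
{
	struct rpc_xprt *xprt = task->tk_xprt;

	if (task->tk_rqstp != NULL)
		return;

	task->tk_timeout = 0;
	task->tk_status = -EAGAIN;
	xprt->ops->alloc_slot(xprt, task);
}

/* TCP sets .alloc_slot = xprt_lock_and_alloc_slot, which takes XPRT_LOCKED
 * first so new slots are throttled while the socket is reconnecting or out
 * of write buffer space. UDP and RDMA set .alloc_slot = xprt_alloc_slot and
 * never touch the lock, which is what avoids the UDP hang.
 */
void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
{
	if (xprt_lock_write(xprt, task)) {
		xprt_alloc_slot(xprt, task);
		xprt_release_write(xprt, task);
	}
}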
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-08 19:32 ` Dick Streefland, rnews @ 2012-09-10 9:00 ` Yan-Pai Chen 2012-09-11 19:40 ` Simon Kirby 0 siblings, 1 reply; 23+ messages in thread From: Yan-Pai Chen @ 2012-09-10 9:00 UTC (permalink / raw) To: linux-nfs
Dick Streefland <dick.streefland@...> writes:
> 
> "Myklebust, Trond" <Trond.Myklebust@...> wrote:
> | Yes. Can you please see if the following patch fixes the UDP hang?
> | 
> | 8<---------------------------------------------------------------------
> | From f39c1bfb5a03e2d255451bff05be0d7255298fa4 Mon Sep 17 00:00:00 2001
> | From: Trond Myklebust <Trond.Myklebust@...>
> | Date: Fri, 7 Sep 2012 11:08:50 -0400
> | Subject: [PATCH] SUNRPC: Fix a UDP transport regression
> | 
> | Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that
> | we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
> | hangs in the case of NFS over UDP mounts.
> | 
> | Since neither the UDP or the RDMA transport mechanism use dynamic slot
> | allocation, we can skip grabbing the socket lock for those transports.
> | Add a new rpc_xprt_op to allow switching between the TCP and UDP/RDMA
> | case.
> | 
> | Note that the NFSv4.1 back channel assigns the slot directly
> | through rpc_run_bc_task, so we can ignore that case.
> | 
> | Reported-by: Dick Streefland <dick.streefland@...>
> | Signed-off-by: Trond Myklebust <Trond.Myklebust@...>
> | Cc: stable@... [>= 3.1]
> 
> This patch appears to fix the issue for me. I cannot reproduce the
> hang anymore.
> 

Hi Trond,

Apologies for my late response.
Upgrading to kernel 3.5 requires some effort. I am still working on it.

After applying your patch on the 3.3 kernel, the problem is gone when using UDP
mounts.
But it still hangs in the case of NFS over TCP mounts.

I reproduced the problem by executing mm/mtest06_3 (i.e. mmap3) in the LTP test
suite repeatedly.
After fewer than 200 runs, it eventually ran into the CLOSE_WAIT hang.
I got the following messages after enabling rpc_debug & nfs_debug:

47991 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47992 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47993 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47994 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47995 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
...

And the hung task information:

INFO: task mmap3:24017 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mmap3 D c0237070 0 24017 23980 0x00000000 [<c0237070>] (__schedule+0x608/0x6d8) from [<c02372d4>] (io_schedule+0x84/0xc0) [<c02372d4>] (io_schedule+0x84/0xc0) from [<c006f2f0>] (sleep_on_page+0x8/0x10) [<c006f2f0>] (sleep_on_page+0x8/0x10) from [<c02357dc>] (__wait_on_bit+0x54/0x9c) [<c02357dc>] (__wait_on_bit+0x54/0x9c) from [<c006f6a8>] (wait_on_page_bit+0xbc/0xd4) [<c006f6a8>] (wait_on_page_bit+0xbc/0xd4) from [<c00700c4>] (filemap_fdatawait_range+0x88/0x13c) [<c00700c4>] (filemap_fdatawait_range+0x88/0x13c) from [<c007029c>] (filemap_write_and_wait_range+0x50/0x64) [<c007029c>] (filemap_write_and_wait_range+0x50/0x64) from [<c00ff280>] (nfs_file_fsync+0x5c/0x154) [<c00ff280>] (nfs_file_fsync+0x5c/0x154) from [<c00c200c>] (vfs_fsync_range+0x30/0x40) [<c00c200c>] (vfs_fsync_range+0x30/0x40) from [<c00c203c>] (vfs_fsync+0x20/0x28) [<c00c203c>] (vfs_fsync+0x20/0x28) from [<c009b880>] (filp_close+0x40/0x84) [<c009b880>] (filp_close+0x40/0x84) from [<c001b9b0>] (put_files_struct+0xa8/0xfc) [<c001b9b0>] (put_files_struct+0xa8/0xfc) from [<c001d3f4>] (do_exit+0x278/0x78c) [<c001d3f4>] (do_exit+0x278/0x78c) from [<c001d9b0>] (do_group_exit+0xa8/0xd4) [<c001d9b0>] (do_group_exit+0xa8/0xd4) from [<c002a538>] (get_signal_to_deliver+0x48c/0x4f8) [<c002a538>] (get_signal_to_deliver+0x48c/0x4f8) from [<c000b7a0>] (do_signal+0x88/0x584) [<c000b7a0>] (do_signal+0x88/0x584) from [<c000bcb4>] (do_notify_resume+0x18/0x50) [<c000bcb4>] (do_notify_resume+0x18/0x50) from [<c0009418>] (work_pending+0x24/0x28) -- Regards, Andrew ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-10 9:00 ` Yan-Pai Chen @ 2012-09-11 19:40 ` Simon Kirby 2012-09-11 22:17 ` Myklebust, Trond [not found] ` <1347401844.15208.17.camel@lade.trondhjem.org> 0 siblings, 2 replies; 23+ messages in thread From: Simon Kirby @ 2012-09-11 19:40 UTC (permalink / raw) To: Yan-Pai Chen; +Cc: linux-nfs
On Mon, Sep 10, 2012 at 09:00:37AM +0000, Yan-Pai Chen wrote:

> Hi Trond,
> 
> Apologies for my late response.
> Upgrading to kernel 3.5 requires some effort. I am still working on it.
> 
> After applying your patch on the 3.3 kernel, the problem is gone when using UDP
> mounts.
> But it still hangs in the case of NFS over TCP mounts.
> 
> I reproduced the problem by executing mm/mtest06_3 (i.e. mmap3) in the LTP test
> suite repeatedly.
> After fewer than 200 runs, it eventually ran into the CLOSE_WAIT hang.
> I got the following messages after enabling rpc_debug & nfs_debug:
> 
> 47991 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> 47992 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> 47993 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> 47994 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> 47995 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> ...

Hello!

This problem still bites us rarely, and we've been using TCP NFS for some
time. However, we seem to have narrowed our case down to a very long
storage hang on the knfsd side. If storage never has any problems, we
don't see the NFS client hang. I was going to try to make a test-case by
forcing the server to hang, but I never got around to this. Meanwhile,
I've been running the clients with the debugging patches I posted
earlier, and it always prints the "xprt_force_disconnect(): setting
XPRT_CLOSE_WAIT" warning before hanging. If Apache is in sendfile() at
the time, it seems to get stuck forever; otherwise, it might recover.

http://www.spinics.net/lists/linux-nfs/msg29495.html
http://0x.ca/sim/ref/3.2.10/dmesg

I suppose we could try 3.5 at this point.

Simon-
 ^ permalink raw reply	[flat|nested] 23+ messages in thread
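For context on the warning Simon mentions, and on the question in the reply that follows: xprt_force_disconnect() only queues the autoclose worker when it can take XPRT_LOCKED itself. The sketch below shows the 3.2-era upstream function with the kind of debug output the messages in this thread suggest; the actual debugging patch lives at the spinics link above and may differ in detail, so treat the added print statements as an assumption.

/* net/sunrpc/xprt.c (3.2-era) with illustrative debug output added.
 * The print text mirrors the messages quoted later in this thread, but
 * the real debugging patch is not reproduced here.
 */
void xprt_force_disconnect(struct rpc_xprt *xprt)
{
	/* Don't race with the test_bit() in xprt_clear_locked() */
	spin_lock_bh(&xprt->transport_lock);
	set_bit(XPRT_CLOSE_WAIT, &xprt->state);
	printk(KERN_WARNING "xprt_force_disconnect(): setting XPRT_CLOSE_WAIT\n");	/* debug */
	/* Try to schedule an autoclose RPC call */
	if (test_and_set_bit(XPRT_LOCKED, &xprt->state) == 0) {
		printk(KERN_WARNING "xprt_force_disconnect(): setting XPRT_LOCKED worked, calling queue_work()\n");	/* debug */
		queue_work(rpciod_workqueue, &xprt->task_cleanup);
	} else {
		WARN(1, "xprt_force_disconnect(): setting XPRT_LOCKED failed\n");	/* debug */
	}
	xprt_wake_pending_tasks(xprt, -EAGAIN);
	spin_unlock_bh(&xprt->transport_lock);
}

When XPRT_LOCKED is already held, the close is not meant to be lost: whoever holds the lock is expected to queue the autoclose when it releases the transport. The failed case only turns into a hang if that release never happens, which is what the final message in this thread appears to show.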
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-11 19:40 ` Simon Kirby @ 2012-09-11 22:17 ` Myklebust, Trond 2012-09-13 5:22 ` Yan-Pai Chen 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-09-11 22:17 UTC (permalink / raw) To: Simon Kirby; +Cc: Yan-Pai Chen, linux-nfs@vger.kernel.org
On Tue, 2012-09-11 at 12:40 -0700, Simon Kirby wrote:
> On Mon, Sep 10, 2012 at 09:00:37AM +0000, Yan-Pai Chen wrote:
> 
> > Hi Trond,
> > 
> > Apologies for my late response.
> > Upgrading to kernel 3.5 requires some effort. I am still working on it.
> > 
> > After applying your patch on the 3.3 kernel, the problem is gone when using UDP
> > mounts.
> > But it still hangs in the case of NFS over TCP mounts.
> > 
> > I reproduced the problem by executing mm/mtest06_3 (i.e. mmap3) in the LTP test
> > suite repeatedly.
> > After fewer than 200 runs, it eventually ran into the CLOSE_WAIT hang.
> > I got the following messages after enabling rpc_debug & nfs_debug:
> > 
> > 47991 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47992 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47993 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47994 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > 47995 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
> > ...
> 
> Hello!
> 
> This problem still bites us rarely, and we've been using TCP NFS for some
> time. However, we seem to have narrowed our case down to a very long
> storage hang on the knfsd side. If storage never has any problems, we
> don't see the NFS client hang. I was going to try to make a test-case by
> forcing the server to hang, but I never got around to this. Meanwhile,
> I've been running the clients with the debugging patches I posted
> earlier, and it always prints the "xprt_force_disconnect(): setting
> XPRT_CLOSE_WAIT" warning before hanging. If Apache is in sendfile() at
> the time, it seems to get stuck forever; otherwise, it might recover.

Does the "if (test_and_set_bit(XPRT_LOCK) == 0)" condition immediately
following that succeed so that queue_work() is called?

> http://www.spinics.net/lists/linux-nfs/msg29495.html
> http://0x.ca/sim/ref/3.2.10/dmesg
> 
> I suppose we could try 3.5 at this point.

If you've been keeping up with the 3.2 stable releases, then I wouldn't
expect any major differences to the sunrpc code, but it might be worth a
try in case the networking layer has changed.

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
 ^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-11 22:17 ` Myklebust, Trond @ 2012-09-13 5:22 ` Yan-Pai Chen 2012-09-13 13:32 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Yan-Pai Chen @ 2012-09-13 5:22 UTC (permalink / raw) To: linux-nfs Myklebust, Trond <Trond.Myklebust@...> writes: > > > > I suppose we could try 3.5 at this point. > > If you've been keeping up with the 3.2 stable releases, then I wouldn't > expect any major differences to the sunrpc code, but it might be worth a > try in case the networking layer has changed. > Hi Trond, I got the same problem on kernel 3.5 when using TCP mounts. The hung task info and rpc_debug messages were just the same as what I got on kernel 3.3. -- Regards, Andrew ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-13 5:22 ` Yan-Pai Chen @ 2012-09-13 13:32 ` Myklebust, Trond 2012-09-21 7:30 ` Yan-Pai Chen 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-09-13 13:32 UTC (permalink / raw) To: Yan-Pai Chen; +Cc: linux-nfs@vger.kernel.org
On Thu, 2012-09-13 at 05:22 +0000, Yan-Pai Chen wrote:
> Myklebust, Trond <Trond.Myklebust@...> writes:
> 
> > > 
> > > I suppose we could try 3.5 at this point.
> > 
> > If you've been keeping up with the 3.2 stable releases, then I wouldn't
> > expect any major differences to the sunrpc code, but it might be worth a
> > try in case the networking layer has changed.
> > 
> 
> Hi Trond,
> 
> I got the same problem on kernel 3.5 when using TCP mounts.
> The hung task info and rpc_debug messages were just the same as what
> I got on kernel 3.3.

Thanks. Can you try using the patch that I posted on the list yesterday?
It should apply cleanly to the latest 3.5.x kernel.

Cheers
  Trond
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
 ^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-13 13:32 ` Myklebust, Trond @ 2012-09-21 7:30 ` Yan-Pai Chen 0 siblings, 0 replies; 23+ messages in thread From: Yan-Pai Chen @ 2012-09-21 7:30 UTC (permalink / raw) To: linux-nfs Myklebust, Trond <Trond.Myklebust@...> writes: > > Thanks. Can you try using the patch that I posted on the list yesterday? > It should apply cleanly to the latest 3.5.x kernel. > > Cheers > Trond Hi Trond, I can reproduce the same problem after applying your latest patch on 3.5.0 kernel. But it works for the later 3.5.x kernel (fbcbe2b3c9). -- Regards, Andrew ^ permalink raw reply [flat|nested] 23+ messages in thread
[parent not found: <1347401844.15208.17.camel@lade.trondhjem.org>]
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang [not found] ` <1347401844.15208.17.camel@lade.trondhjem.org> @ 2012-09-12 20:54 ` Myklebust, Trond 2012-09-19 22:01 ` Simon Kirby 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-09-12 20:54 UTC (permalink / raw) To: Simon Kirby; +Cc: Yan-Pai Chen, linux-nfs@vger.kernel.org
On Tue, 2012-09-11 at 18:17 -0400, Trond Myklebust wrote:
> On Tue, 2012-09-11 at 12:40 -0700, Simon Kirby wrote:
> > Hello!
> > 
> > This problem still bites us rarely, and we've been using TCP NFS for some
> > time. However, we seem to have narrowed our case down to a very long
> > storage hang on the knfsd side. If storage never has any problems, we
> > don't see the NFS client hang. I was going to try to make a test-case by
> > forcing the server to hang, but I never got around to this. Meanwhile,
> > I've been running the clients with the debugging patches I posted
> > earlier, and it always prints the "xprt_force_disconnect(): setting
> > XPRT_CLOSE_WAIT" warning before hanging. If Apache is in sendfile() at
> > the time, it seems to get stuck forever; otherwise, it might recover.
> 
> Does the "if (test_and_set_bit(XPRT_LOCK) == 0)" condition immediately
> following that succeed so that queue_work() is called?
> 
> > http://www.spinics.net/lists/linux-nfs/msg29495.html
> > http://0x.ca/sim/ref/3.2.10/dmesg
> > 
> > I suppose we could try 3.5 at this point.
> 
> If you've been keeping up with the 3.2 stable releases, then I wouldn't
> expect any major differences to the sunrpc code, but it might be worth a
> try in case the networking layer has changed.

Hi Simon,

Can you try the following patch, and see if it addresses the TCP "server
hangs" case?

Cheers
  Trond
8<----------------------------------------------------------------------
From 99330d09cc1074fbdc64089fa0a3f8dbdc74daaf Mon Sep 17 00:00:00 2001
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Wed, 12 Sep 2012 16:49:15 -0400
Subject: [PATCH] SUNRPC: Ensure that the TCP socket is closed when in
 CLOSE_WAIT

Instead of doing a shutdown() call, we need to do an actual close().
Ditto if/when the server is sending us junk RPC headers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
 net/sunrpc/xprtsock.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index a35b8e5..d1988cf 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1025,6 +1025,16 @@ static void xs_udp_data_ready(struct sock *sk, int len)
 	read_unlock_bh(&sk->sk_callback_lock);
 }
 
+/*
+ * Helper function to force a TCP close if the server is sending
+ * junk and/or it has put us in CLOSE_WAIT
+ */
+static void xs_tcp_force_close(struct rpc_xprt *xprt)
+{
+	set_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
+	xprt_force_disconnect(xprt);
+}
+
 static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc)
 {
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
@@ -1051,7 +1061,7 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea
 	/* Sanity check of the record length */
 	if (unlikely(transport->tcp_reclen < 8)) {
 		dprintk("RPC:       invalid TCP record fragment length\n");
-		xprt_force_disconnect(xprt);
+		xs_tcp_force_close(xprt);
 		return;
 	}
 	dprintk("RPC:       reading TCP record fragment of length %d\n",
@@ -1132,7 +1142,7 @@ static inline void xs_tcp_read_calldir(struct sock_xprt *transport,
 		break;
 	default:
 		dprintk("RPC:       invalid request message type\n");
-		xprt_force_disconnect(&transport->xprt);
+		xs_tcp_force_close(&transport->xprt);
 	}
 	xs_tcp_check_fraghdr(transport);
 }
@@ -1455,6 +1465,8 @@ static void xs_tcp_cancel_linger_timeout(struct rpc_xprt *xprt)
 static void xs_sock_mark_closed(struct rpc_xprt *xprt)
 {
 	smp_mb__before_clear_bit();
+	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
+	clear_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
 	clear_bit(XPRT_CLOSING, &xprt->state);
 	smp_mb__after_clear_bit();
@@ -1512,8 +1524,8 @@ static void xs_tcp_state_change(struct sock *sk)
 		break;
 	case TCP_CLOSE_WAIT:
 		/* The server initiated a shutdown of the socket */
-		xprt_force_disconnect(xprt);
 		xprt->connect_cookie++;
+		xs_tcp_force_close(xprt);
 	case TCP_CLOSING:
 		/*
 		 * If the server closed down the connection, make sure that
@@ -2199,8 +2211,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		/* We're probably in TIME_WAIT. Get rid of existing socket,
 		 * and retry
 		 */
-		set_bit(XPRT_CONNECTION_CLOSE, &xprt->state);
-		xprt_force_disconnect(xprt);
+		xs_tcp_force_close(xprt);
 		break;
 	case -ECONNREFUSED:
 	case -ECONNRESET:
-- 
1.7.11.4


-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
 ^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-12 20:54 ` Myklebust, Trond @ 2012-09-19 22:01 ` Simon Kirby 2012-09-19 22:11 ` Myklebust, Trond 0 siblings, 1 reply; 23+ messages in thread From: Simon Kirby @ 2012-09-19 22:01 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Yan-Pai Chen, linux-nfs@vger.kernel.org On Tue, Sep 11, 2012 at 10:17:25PM +0000, Myklebust, Trond wrote: > Does the "if (test_and_set_bit(XPRT_LOCK) == 0)" condition immediately > following that succeed so that queue_work() is called? Yes, it seems to: [146957.793093] RPC: server initated shutdown -- state 8 conn 1 dead 0 zapped 1 sk_shutdown 1 [146957.793418] xprt_force_disconnect(): setting XPRT_CLOSE_WAIT [146957.799288] xprt_force_disconnect(): setting XPRT_LOCKED worked, calling queue_work() On Wed, Sep 12, 2012 at 08:54:14PM +0000, Myklebust, Trond wrote: > On Tue, 2012-09-11 at 18:17 -0400, Trond Myklebust wrote: > > Hi Simon, > > Can you try the following patch, and see if it addresses the TCP "server > hangs" case? > > Cheers > Trond > 8<---------------------------------------------------------------------- > From 99330d09cc1074fbdc64089fa0a3f8dbdc74daaf Mon Sep 17 00:00:00 2001 > From: Trond Myklebust <Trond.Myklebust@netapp.com> > Date: Wed, 12 Sep 2012 16:49:15 -0400 > Subject: [PATCH] SUNRPC: Ensure that the TCP socket is closed when in > CLOSE_WAIT > > Instead of doing a shutdown() call, we need to do an actual close(). > Ditto if/when the server is sending us junk RPC headers. > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > --- > net/sunrpc/xprtsock.c | 21 ++++++++++++++++----- > 1 file changed, 16 insertions(+), 5 deletions(-) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index a35b8e5..d1988cf 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1025,6 +1025,16 @@ static void xs_udp_data_ready(struct sock *sk, int len) > read_unlock_bh(&sk->sk_callback_lock); > } > > +/* > + * Helper function to force a TCP close if the server is sending > + * junk and/or it has put us in CLOSE_WAIT > + */ > +static void xs_tcp_force_close(struct rpc_xprt *xprt) > +{ > + set_bit(XPRT_CONNECTION_CLOSE, &xprt->state); > + xprt_force_disconnect(xprt); > +} > + > static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc) > { > struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt); > @@ -1051,7 +1061,7 @@ static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_rea > /* Sanity check of the record length */ > if (unlikely(transport->tcp_reclen < 8)) { > dprintk("RPC: invalid TCP record fragment length\n"); > - xprt_force_disconnect(xprt); > + xs_tcp_force_close(xprt); > return; > } > dprintk("RPC: reading TCP record fragment of length %d\n", > @@ -1132,7 +1142,7 @@ static inline void xs_tcp_read_calldir(struct sock_xprt *transport, > break; > default: > dprintk("RPC: invalid request message type\n"); > - xprt_force_disconnect(&transport->xprt); > + xs_tcp_force_close(&transport->xprt); > } > xs_tcp_check_fraghdr(transport); > } > @@ -1455,6 +1465,8 @@ static void xs_tcp_cancel_linger_timeout(struct rpc_xprt *xprt) > static void xs_sock_mark_closed(struct rpc_xprt *xprt) > { > smp_mb__before_clear_bit(); > + clear_bit(XPRT_CONNECTION_ABORT, &xprt->state); > + clear_bit(XPRT_CONNECTION_CLOSE, &xprt->state); > clear_bit(XPRT_CLOSE_WAIT, &xprt->state); > clear_bit(XPRT_CLOSING, &xprt->state); > smp_mb__after_clear_bit(); > @@ -1512,8 +1524,8 @@ static void xs_tcp_state_change(struct sock *sk) > break; > 
case TCP_CLOSE_WAIT: > /* The server initiated a shutdown of the socket */ > - xprt_force_disconnect(xprt); > xprt->connect_cookie++; > + xs_tcp_force_close(xprt); > case TCP_CLOSING: > /* > * If the server closed down the connection, make sure that > @@ -2199,8 +2211,7 @@ static void xs_tcp_setup_socket(struct work_struct *work) > /* We're probably in TIME_WAIT. Get rid of existing socket, > * and retry > */ > - set_bit(XPRT_CONNECTION_CLOSE, &xprt->state); > - xprt_force_disconnect(xprt); > + xs_tcp_force_close(xprt); > break; > case -ECONNREFUSED: > case -ECONNRESET: > -- > 1.7.11.4 Yes, based on data collected today, this seems to fix the problem! Awesome! :) Simon- ^ permalink raw reply [flat|nested] 23+ messages in thread
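A note on why forcing a full close matters here: xprt_force_disconnect() on its own ends up in the transport's close op, which for TCP normally performs only a shutdown of the existing socket. Setting XPRT_CONNECTION_CLOSE first, which is what xs_tcp_force_close() adds, steers the autoclose worker into a real close and socket teardown. The snippet below is an approximation of the 3.x TCP close op from memory, not a quote of the tree.

/* Approximate shape of the TCP close op in 3.x net/sunrpc/xprtsock.c
 * (reconstruction, not copied from the tree): with XPRT_CONNECTION_CLOSE
 * set, the autoclose path takes the hard xs_close() branch instead of the
 * half-close done by xs_tcp_shutdown().
 */
static void xs_tcp_close(struct rpc_xprt *xprt)
{
	if (test_and_clear_bit(XPRT_CONNECTION_CLOSE, &xprt->state))
		xs_close(xprt);		/* tear the socket down; reconnect gets a fresh one */
	else
		xs_tcp_shutdown(xprt);	/* orderly shutdown of the existing socket */
}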
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-19 22:01 ` Simon Kirby @ 2012-09-19 22:11 ` Myklebust, Trond 2012-10-12 8:15 ` Simon Kirby 0 siblings, 1 reply; 23+ messages in thread From: Myklebust, Trond @ 2012-09-19 22:11 UTC (permalink / raw) To: Simon Kirby; +Cc: Yan-Pai Chen, linux-nfs@vger.kernel.org
On Wed, 2012-09-19 at 15:01 -0700, Simon Kirby wrote:

> Yes, based on data collected today, this seems to fix the problem!
> Awesome! :)

Great! I'm adding "Tested-by:" and "Cc: stable" lines since you've been
seeing this for some time.
-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
 ^ permalink raw reply	[flat|nested] 23+ messages in thread
* Re: [3.2.5] NFSv3 CLOSE_WAIT hang 2012-09-19 22:11 ` Myklebust, Trond @ 2012-10-12 8:15 ` Simon Kirby 0 siblings, 0 replies; 23+ messages in thread From: Simon Kirby @ 2012-10-12 8:15 UTC (permalink / raw) To: Myklebust, Trond; +Cc: Yan-Pai Chen, linux-nfs@vger.kernel.org On Wed, Sep 19, 2012 at 10:11:57PM +0000, Myklebust, Trond wrote: > On Wed, 2012-09-19 at 15:01 -0700, Simon Kirby wrote: > > > Yes, based on data collected today, this seems to fix the problem! > > Awesome! :) > > Great! I'm adding "Tested-by:" and "Cc: stable" lines since you've been > seeing this for some time. This has been working well, except we just saw one case now where we still got a CLOSE_WAIT hang. The issue was that the queue_work() was not called because setting XPRT_LOCKED failed, which you asked about before. This trace from our debugging patches: WARNING: at net/sunrpc/xprt.c:648 xprt_force_disconnect+0x12a/0x140() Hardware name: PowerEdge 1950 xprt_force_disconnect(): setting XPRT_LOCKED failed Modules linked in: ipmi_devintf ipmi_si ipmi_msghandler xt_recent nf_conntrack_ftp xt_state xt_owner nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 bnx2 serio_raw evdev Pid: 0, comm: swapper/1 Tainted: G W 3.2.28-hw+ #25 Call Trace: <IRQ> [<ffffffff816b74ba>] ? xprt_force_disconnect+0x12a/0x140 [<ffffffff8105db70>] warn_slowpath_common+0x80/0xc0 [<ffffffff8105dca9>] warn_slowpath_fmt+0x69/0x70 [<ffffffff81701c8c>] ? printk+0x67/0x69 [<ffffffff81704ef1>] ? _raw_spin_lock_bh+0x11/0x40 [<ffffffff816b74ba>] xprt_force_disconnect+0x12a/0x140 [<ffffffff816ba99c>] xs_tcp_state_change+0xdc/0x340 [<ffffffff81669deb>] tcp_fin+0x17b/0x210 [<ffffffff8166aaf2>] tcp_data_queue+0x4c2/0xe70 [<ffffffff8166e259>] ? tcp_validate_incoming+0x109/0x340 [<ffffffff8166e5d9>] tcp_rcv_established+0x149/0xb50 [<ffffffff8167711c>] tcp_v4_do_rcv+0x15c/0x2e0 [<ffffffff8164b6e4>] ? nf_iterate+0x84/0xb0 [<ffffffff816778ef>] tcp_v4_rcv+0x64f/0x8b0 [<ffffffff8164b85d>] ? nf_hook_slow+0x6d/0x130 [<ffffffff81656160>] ? ip_rcv+0x2f0/0x2f0 [<ffffffff8165620a>] ip_local_deliver_finish+0xaa/0x1d0 [<ffffffff816563bd>] ip_local_deliver+0x8d/0xa0 [<ffffffff81655bdb>] ip_rcv_finish+0x7b/0x310 [<ffffffff816560ed>] ip_rcv+0x27d/0x2f0 [<ffffffff816272b7>] __netif_receive_skb+0x437/0x490 [<ffffffff81629148>] netif_receive_skb+0x78/0x80 ... 
-pid- flgs status -client- --rqstp- -timeout ---ops-- 60022 0080 -11 ffff880096e5bc00 (null) 0 ffffffff817beb20 nfsv3 FSINFO a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 SUNRPC: xprt(ffff880142c1d800) state=16, snd_task=0 SUNRPC: xprt(ffff880032175800) state=16, snd_task=0 SUNRPC: xprt(ffff8800c2da9000) state=16, snd_task=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 SUNRPC: xprt(ffff880221977000) state=16, snd_task=0 SUNRPC: xprt(ffff88022212f800) state=16, snd_task=0 SUNRPC: xprt(ffff880221504000) state=0, snd_task=0 SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0 SUNRPC: xprt(ffff88022509c800) state=18, snd_task=0 SUNRPC: xprt(ffff880222711000) state=18, snd_task=0 SUNRPC: xprt(ffff880222711000) state=18, snd_task=0 SUNRPC: xprt(ffff880221737800) state=18, snd_task=0 SUNRPC: xprt(ffff880221737800) state=18, snd_task=0 SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0 SUNRPC: xprt(ffff88021f275800) state=18, snd_task=0 SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0 SUNRPC: xprt(ffff880222704000) state=18, snd_task=0 SUNRPC: xprt(ffff880222704000) state=18, snd_task=0 SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0 SUNRPC: xprt(ffff8802252b3800) state=18, snd_task=0 SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0 SUNRPC: xprt(ffff88022234e800) state=18, snd_task=0 SUNRPC: xprt(ffff880221734000) state=18, snd_task=0 SUNRPC: xprt(ffff880221734000) state=18, snd_task=0 SUNRPC: xprt(ffff880222702000) state=18, snd_task=0 SUNRPC: xprt(ffff880222702000) state=18, snd_task=0 1758 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1770 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1799 0080 -11 ffff880225562400 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 59728 0001 -11 ffff8802253d9e00 (null) 0 ffffffff8173ff20 nfsv3 COMMIT a:call_reserveresult q:xprt_sending 1449 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1766 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1811 0080 -11 ffff8802253d9e00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 1431 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1604 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1851 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1870 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 5693 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 5791 0080 -11 ffff8802253d6000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 1871 0080 -11 ffff880225522800 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, 
snd_task->tk_pid=0 1442 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1457 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1829 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 5784 0080 -11 ffff8802253d7600 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 1625 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 2152 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 2196 0080 -11 ffff880225522000 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 1517 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1733 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1741 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 1872 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 2211 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 4316 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 4350 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 LOOKUP a:call_reserveresult q:xprt_sending 4353 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 5862 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 6982 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending 9869 0080 -11 ffff8802253d5a00 (null) 0 ffffffff817beb20 nfsv3 ACCESS a:call_reserveresult q:xprt_sending SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 SUNRPC: xprt(ffff88021e87d800) state=283, snd_task->tk_pid=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 SUNRPC: xprt(ffff880225098800) state=16, snd_task=0 SUNRPC: xprt(ffff88022509b000) state=18, snd_task=0 We've only seen this once, so we're definitely better than before. Simon- ^ permalink raw reply [flat|nested] 23+ messages in thread
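The trace above shows the other half of the race discussed earlier: the XPRT_LOCKED trylock in xprt_force_disconnect() failed, so the autoclose work was not queued from there. In that situation the close is supposed to be queued by the lock holder when it releases the transport; as I recall the 3.2-era code, the release path looks roughly like the sketch below, so treat it as an approximation rather than a quote of the tree.

/* net/sunrpc/xprt.c (3.2-era, approximate): releasing the transport write
 * lock either clears XPRT_LOCKED or, if a close is pending
 * (XPRT_CLOSE_WAIT), queues the autoclose work itself. If the task holding
 * XPRT_LOCKED never gets this far, the socket can sit in CLOSE_WAIT
 * indefinitely, which matches the rare hang reported above.
 */
static void xprt_clear_locked(struct rpc_xprt *xprt)
{
	xprt->snd_task = NULL;
	if (!test_bit(XPRT_CLOSE_WAIT, &xprt->state) || xprt->shutdown) {
		smp_mb__before_clear_bit();
		clear_bit(XPRT_LOCKED, &xprt->state);
		smp_mb__after_clear_bit();
	} else
		queue_work(rpciod_workqueue, &xprt->task_cleanup);
}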