From: Simon Kirby <sim@hostway.ca>
To: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [3.1-rc4] NFSv3 client hang
Date: Fri, 9 Sep 2011 12:45:10 -0700
Message-ID: <20110909194509.GB6195@hostway.ca>

The 3.1-rc4 NFSv3 client hung on another box (separate from the other one
which Oopsed in vfs_rmdir() with a similar workload). This build was also
of commit 9e79e3e9dd9672b37ac9412e9a926714306551fe (slightly past 3.1-rc4),
and "git log 9e79e3e9dd96.. fs/nfs net/sunrpc" is empty, so nothing in
fs/nfs or net/sunrpc has changed since that commit.

All mounts to one server IP have hung, while all other mounts work fine.
I ran "cd /proc/sys/sunrpc; echo 255 > rpc_debug; echo 255 > nfs_debug"
for a while, then kill -9'd all D-state processes to simplify the
debugging, and was left with one that was not interruptible:

28612 D    /usr/local/apache2/bin/http sleep_on_page
# cat /proc/28612/stack
[<ffffffff810bdf49>] sleep_on_page+0x9/0x10
[<ffffffff810bdf34>] __lock_page+0x64/0x70
[<ffffffff8112a9e5>] __generic_file_splice_read+0x2d5/0x500
[<ffffffff8112ac5a>] generic_file_splice_read+0x4a/0x90
[<ffffffff812030e5>] nfs_file_splice_read+0x85/0xd0
[<ffffffff81128fb2>] do_splice_to+0x72/0xa0
[<ffffffff811297e4>] splice_direct_to_actor+0xc4/0x1d0
[<ffffffff81129942>] do_splice_direct+0x52/0x70
[<ffffffff81100096>] do_sendfile+0x166/0x1d0
[<ffffffff81100185>] sys_sendfile64+0x85/0xb0
[<ffffffff816af57b>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

echo 1 > /proc/sys/sunrpc/rpc_debug emits:

-pid- flgs status -client- --rqstp- -timeout ---ops--
37163 0001    -11 ffff8802251bca00   (null)        0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending
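
For anyone comparing several of these dumps, the task row above can be split into named fields mechanically. This is only a minimal sketch: the `RpcTask` type and the `parse_rpc_task` helper are hypothetical names, and the column meanings are inferred from the header line rather than taken from the kernel source. A status of -11 corresponds to -EAGAIN on Linux, and the "(null)" rqstp column suggests the task has not been assigned a request slot.

```python
from typing import NamedTuple


class RpcTask(NamedTuple):
    """One row of the task list; field names are illustrative, not kernel names."""
    pid: int
    flags: int
    status: int
    client: str
    rqstp: str
    timeout: int
    ops: str
    program: str
    procedure: str
    action: str
    queue: str


def parse_rpc_task(line: str) -> RpcTask:
    """Split one task row on whitespace and convert the numeric columns."""
    fields = line.split()
    if len(fields) != 11:
        raise ValueError(f"expected 11 columns, got {len(fields)}: {line!r}")
    pid, flags, status, client, rqstp, timeout, ops, prog, proc, action, queue = fields
    return RpcTask(
        pid=int(pid),
        flags=int(flags, 16),             # assumed hexadecimal, as the zero padding suggests
        status=int(status),
        client=client,
        rqstp=rqstp,
        timeout=int(timeout),
        ops=ops,
        program=prog,
        procedure=proc,
        action=action.partition(":")[2],  # strip the "a:" prefix
        queue=queue.partition(":")[2],    # strip the "q:" prefix
    )


# The row from the dump above, copied verbatim.
SAMPLE = ("37163 0001    -11 ffff8802251bca00   (null)        0 "
          "ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending")

task = parse_rpc_task(SAMPLE)
print(task.pid, task.status, task.rqstp, task.action, task.queue)
```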

tcpdump to this server shows absolutely no packets to the server IP for
several minutes. netstat shows the socket in CLOSE_WAIT:

# netstat -tan|grep 2049
tcp        0      0 10.10.52.50:806         10.10.52.230:2049       CLOSE_WAIT

This is the only port-2049 socket that still exists.
"rpcinfo -p 10.10.52.230", "rpcinfo -t 10.10.52.230 lockmgr", etc., all
indicate that the server is fine. rpciod is sleeping in rescuer_thread,
and nothing else is in D state.
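
CLOSE_WAIT means the peer has sent its FIN and the local side has not yet closed the socket, so a socket that stays in that state is worth flagging automatically. The sketch below filters "netstat -tan" output for such sockets on the NFS port. The `close_wait_sockets` helper is a hypothetical name, and the six-column layout it expects is an assumption based on the output shown above.

```python
def close_wait_sockets(netstat_output: str, port: int = 2049):
    """Return (local, remote) pairs for TCP sockets in CLOSE_WAIT to `port`."""
    stuck = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state.
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        remote_port = remote.rsplit(":", 1)[-1]
        if state == "CLOSE_WAIT" and remote_port == str(port):
            stuck.append((local, remote))
    return stuck


# The socket line from the netstat output above, copied verbatim.
SAMPLE = "tcp        0      0 10.10.52.50:806         10.10.52.230:2049       CLOSE_WAIT"

for local, remote in close_wait_sockets(SAMPLE):
    print(f"{local} -> {remote} is stuck in CLOSE_WAIT")
```

Feeding it the full output of "netstat -tan" rather than a single line works the same way, since header lines and sockets in other states are skipped.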

Mount options were "rw,hard,intr,tcp,timeo=300,retrans=2,vers=3".

Running another "df" on the mountpoint with rpc_debug = 255 shows:

-pid- flgs status -client- --rqstp- -timeout ---ops--
37163 0001    -11 ffff8802251bca00   (null)        0 ffffffff816e4110 nfsv3 READ a:call_reserveresult q:xprt_sending
RPC:       looking up Generic cred
NFS call  access
RPC:       new task initialized, procpid 30679
RPC:       allocated task ffff880030c17a00
RPC: 37133 __rpc_execute flags=0x80
RPC: 37133 call_start nfs3 proc ACCESS (sync)
RPC: 37133 call_reserve (status 0)
RPC: 37133 failed to lock transport ffff880223d0a000
RPC: 37133 sleep_on(queue "xprt_sending" time 4489651610)
RPC: 37133 added to queue ffff880223d0a178 "xprt_sending"
RPC: 37133 sync task going to sleep
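
With rpc_debug at 255 the log gets large quickly, so it can help to extract only the tasks that went to sleep and the queue each one is waiting on. This is a rough sketch: the `sleepers_by_queue` helper and its regular expression are hypothetical and match only the "sleep_on" line format seen in the trace above.

```python
import re
from collections import defaultdict

# Matches lines such as: RPC: 37133 sleep_on(queue "xprt_sending" time 4489651610)
SLEEP_RE = re.compile(r'^RPC:\s+(\d+)\s+sleep_on\(queue "([^"]+)" time (\d+)\)')


def sleepers_by_queue(log_text: str):
    """Map each wait queue name to the list of task ids that slept on it."""
    queues = defaultdict(list)
    for line in log_text.splitlines():
        match = SLEEP_RE.match(line.strip())
        if match:
            queues[match.group(2)].append(int(match.group(1)))
    return dict(queues)


# Lines from the trace above, copied verbatim.
SAMPLE_LOG = """\
RPC: 37133 call_reserve (status 0)
RPC: 37133 failed to lock transport ffff880223d0a000
RPC: 37133 sleep_on(queue "xprt_sending" time 4489651610)
RPC: 37133 added to queue ffff880223d0a178 "xprt_sending"
RPC: 37133 sync task going to sleep
"""

print(sleepers_by_queue(SAMPLE_LOG))
```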

So something is not closing the old transport socket here?

Simon-

