From: Dr Fields James Bruce <bfields@fieldses.org>
To: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Andrew Martin <amartin@xes-inc.com>, Jim Rees <rees@umich.edu>,
	bhawley@luminex.com, Brown Neil <neilb@suse.de>,
	linux-nfs-owner@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
Date: Fri, 28 Mar 2014 18:00:38 -0400
Message-ID: <20140328220038.GK6041@fieldses.org>
In-Reply-To: <A95B7939-BDEB-4FFB-BCC0-6EAD9487E7D8@primarydata.com>

On Tue, Mar 18, 2014 at 06:27:57PM -0400, Trond Myklebust wrote:
> 
> On Mar 18, 2014, at 17:50, Andrew Martin <amartin@xes-inc.com> wrote:
> 
> > ----- Original Message -----
> >> From: "Trond Myklebust" <trond.myklebust@primarydata.com>
> >> To: "Andrew Martin" <amartin@xes-inc.com>
> >> Cc: "Jim Rees" <rees@umich.edu>, bhawley@luminex.com, "Brown Neil" <neilb@suse.de>, linux-nfs-owner@vger.kernel.org,
> >> linux-nfs@vger.kernel.org
> >> Sent: Thursday, March 6, 2014 3:01:03 PM
> >> Subject: Re: Optimal NFS mount options to safely allow interrupts and timeouts on newer kernels
> >> 
> >> 
> > 
> > Trond,
> > 
> > This problem has reoccurred, and I have captured the debug output that you requested:
> > 
> > echo 0 >/proc/sys/sunrpc/rpc_debug:
> > http://pastebin.com/9juDs2TW
> > 
> > echo w > /proc/sysrq-trigger ; dmesg:
> > http://pastebin.com/1vDx9bNf
> > 
> > netstat -tn:
> > http://pastebin.com/mjxqjmuL
> > 
> > One suggestion for debug was to attempt to run "umount -f /path/to/mountpoint"
> > repeatedly to attempt to send SIGKILL back up to the application. This always
> > returned "Device or resource busy" and I was unable to unmount the filesystem
> > until I used "umount -l".
> > 
> > I was able to kill -9 all but two of the processes that were blocking in
> > uninterruptible sleep. Note that I was able to get lsof output on these
> > processes this time, and they all appeared to be blocking on access to a
> > single file on the nfs share. If I tried to cat said file from this client,
> > my terminal would block:
> > open("/path/to/file", O_RDONLY)        = 3
> > fstat(3, {st_mode=S_IFREG|0644, st_size=42385, ...}) = 0
> > mmap(NULL, 1056768, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb00f0dc000
> > read(3,
> > 
> > However, I could cat the file just fine from another nfs client. Does this 
> > additional information shed any light on the source of this problem?
> > 
> 
> Ah… So this machine is acting both as an NFSv3 client and an NFSv4 server?
> 
> 	• [1140235.544551] SysRq : Show Blocked State
> 	• [1140235.547126]   task                        PC stack   pid father
> 	• [1140235.547145] rpciod/0      D 0000000000000001     0   833      2 0x00000000
> 	• [1140235.547150]  ffff8802812a3c20 0000000000000046 0000000000015e00 0000000000015e00
> 	• [1140235.547155]  ffff880297251ad0 ffff8802812a3fd8 0000000000015e00 ffff880297251700
> 	• [1140235.547159]  0000000000015e00 ffff8802812a3fd8 0000000000015e00 ffff880297251ad0
> 	• [1140235.547164] Call Trace:
> 	• [1140235.547175]  [<ffffffff8156a1a5>] schedule_timeout+0x195/0x300
> 	• [1140235.547182]  [<ffffffff81078130>] ? process_timeout+0x0/0x10
> 	• [1140235.547197]  [<ffffffffa009ef52>] rpc_shutdown_client+0xc2/0x100 [sunrpc]
> 	• [1140235.547203]  [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40
> 	• [1140235.547216]  [<ffffffffa01aa62c>] put_nfs4_client+0x4c/0xb0 [nfsd]
> 	• [1140235.547227]  [<ffffffffa01ae669>] nfsd4_cb_probe_done+0x29/0x60 [nfsd]
> 	• [1140235.547238]  [<ffffffffa00a5d0c>] rpc_exit_task+0x2c/0x60 [sunrpc]
> 	• [1140235.547250]  [<ffffffffa00a64e6>] __rpc_execute+0x66/0x2a0 [sunrpc]
> 	• [1140235.547261]  [<ffffffffa00a6750>] ? rpc_async_schedule+0x0/0x20 [sunrpc]
> 	• [1140235.547272]  [<ffffffffa00a6765>] rpc_async_schedule+0x15/0x20 [sunrpc]
> 	• [1140235.547276]  [<ffffffff81081ba7>] run_workqueue+0xc7/0x1a0
> 	• [1140235.547279]  [<ffffffff81081d23>] worker_thread+0xa3/0x110
> 	• [1140235.547284]  [<ffffffff81086750>] ? autoremove_wake_function+0x0/0x40
> 	• [1140235.547287]  [<ffffffff81081c80>] ? worker_thread+0x0/0x110
> 	• [1140235.547291]  [<ffffffff810863d6>] kthread+0x96/0xa0
> 	• [1140235.547295]  [<ffffffff810141aa>] child_rip+0xa/0x20
> 	• [1140235.547299]  [<ffffffff81086340>] ? kthread+0x0/0xa0
> 	• [1140235.547302]  [<ffffffff810141a0>] ? child_rip+0x0/0x20
> 
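> The above looks bad. The rpciod thread is sleeping, waiting for the rpc client to terminate, and the only task running on that rpc client, according to your rpc_debug output, is the above CB_NULL probe. The trace shows that probe's completion callback running on this same rpciod thread, so the wait can never finish. Deadlock...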
> 
> Bruce, it looks like the above should have been fixed in Linux 2.6.35 with commit 9045b4b9f7f3 (nfsd4: remove probe task's reference on client), is that correct?

Yes, that definitely looks like it would explain the bug.  And the sysrq
trace shows 2.6.32-57.
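
For anyone following along, the shape of that deadlock can be sketched
in userspace. This is only a toy model under stated assumptions:
ToyRpcClient, probe_done and run are hypothetical names, the 0.2 second
timeout exists only so the example terminates, and the "release first"
variant is just one way to break the self-wait in the model. It does not
claim to mirror what commit 9045b4b9f7f3 actually changes in nfsd.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyRpcClient:
    """Hypothetical stand-in for an RPC client; not the kernel's rpc_clnt."""

    def __init__(self):
        self.active_tasks = 0
        self.cond = threading.Condition()

    def task_start(self):
        with self.cond:
            self.active_tasks += 1

    def task_done(self):
        with self.cond:
            self.active_tasks -= 1
            self.cond.notify_all()

    def shutdown(self, timeout):
        """Wait until no tasks remain; return False if the wait timed out."""
        with self.cond:
            return self.cond.wait_for(lambda: self.active_tasks == 0, timeout)


def probe_done(client, release_first):
    """Completion callback, run on the single worker like the CB_NULL probe."""
    if release_first:
        # The task stops counting against the client before shutdown waits.
        client.task_done()
        return client.shutdown(timeout=0.2)
    # The task is still counted, so shutdown waits for the very task
    # that is calling it. Only the timeout lets this return.
    finished = client.shutdown(timeout=0.2)
    client.task_done()
    return finished


def run(release_first):
    client = ToyRpcClient()
    client.task_start()  # the probe is the only task on this client
    with ThreadPoolExecutor(max_workers=1) as worker:  # one "rpciod" thread
        return worker.submit(probe_done, client, release_first).result()


if __name__ == "__main__":
    print("shutdown finished, task still counted:", run(False))
    print("shutdown finished, task released first:", run(True))
```

In the first case the wait can only end by timing out, which is the
userspace analogue of rpciod sitting in schedule_timeout() under
rpc_shutdown_client() in the trace above.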

Andrew Martin, can you confirm that the problem is no longer
reproducible on a kernel with that patch applied?
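
As a rough first check, a kernel release string can be compared against
2.6.35, the release Trond names above. The helper names here
(parse_kernel_version, predates_fix) are hypothetical, and the result is
only a hint: distribution kernels often carry backported fixes, so a
2.6.32-based kernel may or may not already include the patch.

```python
import platform
import re

# Upstream release said in this thread to contain the fix.
FIXED_IN = (2, 6, 35)


def parse_kernel_version(release):
    """Return (major, minor, patch) from a string such as '2.6.32-57'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) if part is not None else 0
                 for part in match.groups())


def predates_fix(release):
    """True if the upstream base version is older than FIXED_IN."""
    return parse_kernel_version(release) < FIXED_IN


if __name__ == "__main__":
    running = platform.release()
    print(running, "predates 2.6.35:", predates_fix(running))
```

Anything this flags still needs the actual reproduction test on a
patched kernel; the version number alone does not settle it.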

--b.
