linux-nfs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>,
	Eric Sandeen <sandeen@sandeen.net>,
	linux-nfs@vger.kernel.org, xfs@oss.sgi.com
Subject: Re: panic on 4.20 server exporting xfs filesystem
Date: Fri, 6 Mar 2015 07:59:22 +1100
Message-ID: <20150305205922.GF18360@dastard>
In-Reply-To: <20150305204749.GA17934@fieldses.org>

On Thu, Mar 05, 2015 at 03:47:49PM -0500, J. Bruce Fields wrote:
> On Thu, Mar 05, 2015 at 12:02:17PM -0500, J. Bruce Fields wrote:
> > On Thu, Mar 05, 2015 at 10:01:38AM -0500, J. Bruce Fields wrote:
> > > On Thu, Mar 05, 2015 at 02:17:31PM +0100, Christoph Hellwig wrote:
> > > > On Wed, Mar 04, 2015 at 11:08:49PM -0500, J. Bruce Fields wrote:
> > > > > Ah-hah:
> > > > > 
> > > > > 	static void
> > > > > 	nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
> > > > > 	{
> > > > > 		...
> > > > > 		nfsd4_cb_layout_fail(ls);
> > > > > 
> > > > > That'd do it!
> > > > > 
> > > > > Haven't tried to figure out why exactly that's getting called, and why
> > > > > only rarely.  Some intermittent problem with the callback path, I guess.
> > > > > 
> > > > > Anyway, I think that solves most of the mystery....
> > > > 
> > > > Ooops, that was a nasty git merge error in the last rebase, see the fix
> > > > below.
> > > 
> > > Thanks!
> > 
> > And with that fix things look good.
> > 
> > I'm still curious why the callbacks are failing.  It's also logging
> > "nfsd: client 192.168.122.32 failed to respond to layout recall".
> 
> I spoke too soon, I'm still not getting through my usual test run--the most
> recent run is hanging in generic/247 with the following in the server logs.
> 
> But I probably still won't get a chance to look at this any closer till after
> vault.
> 
> --b.
> 
> nfsd: client 192.168.122.32 failed to respond to layout recall.   Fencing..
> nfsd: fence failed for client 192.168.122.32: -2!
> nfsd: client 192.168.122.32 failed to respond to layout recall.   Fencing..
> nfsd: fence failed for client 192.168.122.32: -2!
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt ffff88005639a000 xid c21abd62
> kswapd0: page allocation failure: order:0, mode:0x120

[snip network driver memory allocation failure]

> active_anon:7053 inactive_anon:2435 isolated_anon:0
>  active_file:88743 inactive_file:89505 isolated_file:32
>  unevictable:0 dirty:9786 writeback:0 unstable:0
>  free:3571 slab_reclaimable:227807 slab_unreclaimable:75772
>  mapped:21010 shmem:380 pagetables:1567 bounce:0
>  free_cma:0

Looks like there should be heaps of reclaimable memory (slab_reclaimable
alone is 227807 pages, against only 3571 free)...
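
As a rough check of the numbers in that dump, here is a small sketch that
converts page counts to MiB. It assumes 4 KiB pages (the x86_64 default);
the log itself does not state the page size, and PAGE_KIB and
pages_to_mib are names made up for this illustration.

```shell
# Assumption: 4 KiB pages (x86_64 default); not stated in the log.
PAGE_KIB=4

# Convert a page count from the allocation-failure dump to whole MiB
# (integer division, so the result is rounded down).
pages_to_mib() {
	echo $(( $1 * PAGE_KIB / 1024 ))
}
```

Under that assumption, `pages_to_mib 227807` (slab_reclaimable) prints 889
and `pages_to_mib 3571` (free) prints 13, so most of the memory is sitting
in reclaimable slab rather than on the free list.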

> nfsd: client 192.168.122.32 failed to respond to layout recall.   Fencing..

So there's a layout recall pending...

> nfsd: fence failed for client 192.168.122.32: -2!
> receive_cb_reply: Got unrecognized reply: calldir 0x1 xpt_bc_xprt ffff880051dfc000 xid 8ff02aaf
> INFO: task nfsd:17653 blocked for more than 120 seconds.
>       Not tainted 4.0.0-rc2-09922-g26cbcc7 #89
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> nfsd            D ffff8800753a7848 11720 17653      2 0x00000000
>  ffff8800753a7848 0000000000000001 0000000000000001 ffffffff82210580
>  ffff88004e9bcb50 0000000000000006 ffffffff8119d9f0 ffff8800753a7848
>  ffff8800753a7fd8 ffff88002e5e3d70 0000000000000246 ffff88004e9bcb50
> Call Trace:
>  [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0
>  [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0
>  [<ffffffff81a95737>] schedule+0x37/0x90
>  [<ffffffff81a95ac8>] schedule_preempt_disabled+0x18/0x30
>  [<ffffffff81a97756>] mutex_lock_nested+0x156/0x400
>  [<ffffffff813a0d5a>] ? xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0
>  [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0
>  [<ffffffff813a0d5a>] xfs_file_buffered_aio_write.isra.9+0x6a/0x2a0
>  [<ffffffff8119d9f0>] ? new_sync_read+0xb0/0xb0
>  [<ffffffff813a1016>] xfs_file_write_iter+0x86/0x130
>  [<ffffffff8119db05>] do_iter_readv_writev+0x65/0xa0

And the nfsd got hung up on the inode mutex during a write.

Which means there's some other process blocked while holding the i_mutex.
sysrq-w and sysrq-t are probably going to tell us more here.

I suspect we'll have another write stuck in break_layout().....
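
For reference, a minimal sketch of capturing the sysrq-w and sysrq-t
output mentioned above from a shell on the server. The helper names and
the SYSRQ_TRIGGER override are hypothetical, added only so the functions
can be dry-run against a scratch file; on a real system the target is
/proc/sysrq-trigger and writing to it needs root.

```shell
# Write a single sysrq command character to the trigger file.
# SYSRQ_TRIGGER is a hypothetical override for dry runs; by default the
# real /proc/sysrq-trigger is used.
sysrq() {
	printf '%s\n' "$1" > "${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
}

# sysrq-w: dump tasks in uninterruptible (blocked, "D") state.
dump_blocked_tasks() { sysrq w; }

# sysrq-t: dump the state of every task.
dump_all_tasks() { sysrq t; }
```

Run `dump_blocked_tasks` (and `dump_all_tasks` if more context is needed)
as root while the nfsd is hung; the traces land in the kernel log, so
read them back with `dmesg`.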

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
