linux-nfs.vger.kernel.org archive mirror
From: Jeff Layton <jlayton@redhat.com>
To: Christoph Hellwig <hch@infradead.org>, bfields@redhat.com
Cc: linux-nfs@vger.kernel.org
Subject: Re: nfsd crash when running xfstests/089
Date: Tue, 20 Sep 2016 13:41:38 -0400
Message-ID: <1474393298.19989.47.camel@redhat.com>
In-Reply-To: <20160920163717.GA946@infradead.org>

On Tue, 2016-09-20 at 09:37 -0700, Christoph Hellwig wrote:
> Running the latest Linus tree with NFSv4.2 and the blocklayout driver
> against an XFS file system exported from the local machine, I get
> this error when running generic/089. It reproduces fairly reliably
> (not on every run, but more often than not):
> 
> generic/089 133s ...[  387.409504] run fstests generic/089 at 2016-09-20 16:31:44
> [  462.789037] general protection fault: 0000 [#1] SMP
> [  462.790231] Modules linked in:
> [  462.790557] CPU: 2 PID: 3087 Comm: nfsd Tainted: G        W 4.8.0-rc6+ #1939
> [  462.791235] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
> [  462.792145] task: ffff880137fa1280 task.stack: ffff880137fa4000
> [  462.792163] RIP: 0010:[<ffffffff813fff4a>]  [<ffffffff813fff4a>] release_lock_stateid+0x1a/0x60
> [  462.792163] RSP: 0018:ffff880137fa7cd8  EFLAGS: 00010246
> [  462.792163] RAX: ffff880137d90548 RBX: ffff88013a7c7a88 RCX: ffff880137fa1a48
> [  462.792163] RDX: ffff880137fa1a48 RSI: 0000000000000000 RDI: ffff88013a7c7a88
> [  462.792163] RBP: ffff880137fa7cf0 R08: 0000000000000001 R09: 0000000000000000
> [  462.792163] R10: 0000000000000001 R11: 0000000000000000 R12: 6b6b6b6b6b6b6b6b
> [  462.792163] R13: 0000000000000000 R14: ffff88013ac08000 R15: 0000000000000021
> [  462.792163] FS:  0000000000000000(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> [  462.792163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  462.792163] CR2: 00007fcf2d818000 CR3: 0000000135b0e000 CR4: 00000000000006e0
> [  462.792163] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  462.792163] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  462.792163] Stack:
> [  462.792163]  ffff88013a7c7a88 ffff88013a7c7b40 0000000000000000 ffff880137fa7d18
> [  462.792163]  ffffffff81405b5a ffff880138ffd000 ffff880136dce000 ffff880138ffd068
> [  462.792163]  ffff880137fa7d70 ffffffff813f13f4 ffff88013ac08220 0000000000000870
> [  462.792163] Call Trace:
> [  462.792163]  [<ffffffff81405b5a>] nfsd4_free_stateid+0x16a/0x170
> [  462.792163]  [<ffffffff813f13f4>] nfsd4_proc_compound+0x344/0x630
> [  462.792163]  [<ffffffff813dc573>] nfsd_dispatch+0xb3/0x1f0
> [  462.792163]  [<ffffffff81dc0898>] svc_process_common+0x428/0x650
> [  462.792163]  [<ffffffff81dc0c15>] svc_process+0x155/0x340
> [  462.792163]  [<ffffffff813dbaf2>] nfsd+0x172/0x270
> [  462.792163]  [<ffffffff813db980>] ? nfsd_destroy+0x180/0x180
> [  462.792163]  [<ffffffff813db980>] ? nfsd_destroy+0x180/0x180
> [  462.792163]  [<ffffffff810fab91>] kthread+0xf1/0x110
> [  462.792163]  [<ffffffff81e01e6f>] ret_from_fork+0x1f/0x40
> [  462.792163]  [<ffffffff810faaa0>] ? kthread_create_on_node+0x200/0x200
> 
> (gdb) l *(release_lock_stateid+0x1a)
> 0xffffffff813fff4a is in release_lock_stateid
> (./include/linux/spinlock.h:302).
> 297		raw_spin_lock_init(&(_lock)->rlock);		\
> 298	} while (0)
> 299	
> 300	static __always_inline void spin_lock(spinlock_t *lock)
> 301	{
> 302		raw_spin_lock(&lock->rlock);
> 303	}
> 304	
> 305	static __always_inline void spin_lock_bh(spinlock_t *lock)
> 306	{
> (gdb) l *(nfsd4_free_stateid+0x16a)
> 0xffffffff81405b5a is in nfsd4_free_stateid (fs/nfsd/nfs4state.c:4923).
> > 4918		ret = nfserr_locks_held;
> > 4919		if (check_for_locks(stp->st_stid.sc_file,
> > 4920				    lockowner(stp->st_stateowner)))
> > 4921			goto out;
> 4922	
> > 4923		release_lock_stateid(stp);
> > 4924		ret = nfs_ok;
> 4925	
> > 4926	out:
> > 4927		mutex_unlock(&stp->st_mutex);
> (gdb) 
> 

Super. Ok, so it looks like it oopsed in release_lock_stateid() while
trying to take the cl_lock, but to get there it first has to chase
through several pointers from the stateid to reach the clp.

My first suspicion would be a refcounting problem of some sort. Could
the stateid be getting freed while still hashed? How about the
openowner? That shouldn't happen of course, but if the refcounts were
somehow off...
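
One detail in the register dump above is consistent with a use-after-free:
R12 holds 6b6b6b6b6b6b6b6b, which matches the 0x6b byte (POISON_FREE) that
slab debugging writes over freed objects, assuming the kernel was built with
slab poisoning enabled. Such a value is not a canonical x86-64 address, so
dereferencing it raises a general protection fault rather than a page fault.
The following is only a userspace sketch of that arithmetic, not nfsd code;
the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Byte the kernel's slab poisoning writes over freed objects
 * (POISON_FREE in include/linux/poison.h). */
#define POISON_FREE 0x6b

/* Hypothetical helper: an 8-byte field as it would read after the
 * object containing it has been freed and poisoned. */
static uint64_t demo_poisoned_word(void)
{
	uint64_t word;

	memset(&word, POISON_FREE, sizeof(word));
	return word;
}

/* With 48-bit virtual addresses, an x86-64 address is canonical only
 * if bits 63..47 are all zeros or all ones. */
static bool demo_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;

	return top == 0 || top == 0x1ffff;
}

/* Hypothetical self-check tying the sketch to the values in the oops. */
static void demo_check(void)
{
	/* Matches the value seen in R12. */
	assert(demo_poisoned_word() == 0x6b6b6b6b6b6b6b6bULL);
	/* A poisoned pointer is non-canonical, hence #GP on dereference. */
	assert(!demo_is_canonical(demo_poisoned_word()));
	/* A normal kernel pointer from the dump (RBX) is canonical. */
	assert(demo_is_canonical(0xffff88013a7c7a88ULL));
}
```

Calling demo_check() from a small main() exercises all three checks. This
does not show which object was freed, only that one of the pointers loaded
on the way to the cl_lock looks like it came from poisoned memory.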

-- 
Jeff Layton <jlayton@redhat.com>

Thread overview: 5+ messages
2016-09-20 16:37 nfsd crash when running xfstests/089 Christoph Hellwig
2016-09-20 17:41 ` Jeff Layton [this message]
2016-09-22 16:20 ` Jeff Layton
2016-09-23 16:40   ` Christoph Hellwig
2016-09-23 17:42     ` Benjamin Coddington