From: Trond Myklebust <trond.myklebust@fys.uio.no>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Dr. David Alan Gilbert" <linux@treblig.org>,
	linux-kernel@vger.kernel.org,
	"J. Bruce Fields" <bfields@fieldses.org>
Subject: Re: 2.6.22.1 Oops in put_nfs_open_context
Date: Mon, 06 Aug 2007 14:19:54 -0400
Message-ID: <1186424394.6616.44.camel@localhost>
In-Reply-To: <20070806110107.0d0a2c2b.akpm@linux-foundation.org>

[-- Attachment #1: Type: text/plain, Size: 4674 bytes --]

On Mon, 2007-08-06 at 11:01 -0700, Andrew Morton wrote:
> On Mon, 6 Aug 2007 11:08:13 +0100 "Dr. David Alan Gilbert" <linux@treblig.org> wrote:
> 
> >   The oops below is from one of a pair of machines that run compiles;
> > they're not managing to stay up for more than a day or two at a time, and
> > this is the first time I've actually managed to capture an oops from one.
> > They lock up to the point where they still ping but won't toggle
> > capslock.  A top left running on them showed pdflush using 99% CPU.
> > 
> >   Config at the bottom.  The hardware is Supermicro X7DVA boards with
> > 2x Xeon 5140s. (These Supermicro BIOSes don't appear to have the PCI-Express
> > coalesce option being discussed in another thread).
> > 
> > Dave
> > 
> > Aug  3 19:15:41 fel kernel: [185427.633686] BUG: unable to handle kernel paging request at virtual address 00100104
> > Aug  3 19:15:41 fel kernel: [185427.633691]  printing eip:
> > Aug  3 19:15:41 fel kernel: [185427.633693] c01fe613
> > Aug  3 19:15:41 fel kernel: [185427.633694] *pde = 00000000
> > Aug  3 19:15:41 fel kernel: [185427.633697] Oops: 0002 [#1]
> > Aug  3 19:15:41 fel kernel: [185427.633705] SMP
> > Aug  3 19:15:41 fel kernel: [185427.633712] Modules linked in: netconsole
> > Aug  3 19:15:41 fel kernel: [185427.633721] CPU:    2
> > Aug  3 19:15:41 fel kernel: [185427.633722] EIP:    0060:[<c01fe613>]    Not tainted VLI
> > Aug  3 19:15:41 fel kernel: [185427.633723] EFLAGS: 00010296   (2.6.22.1daveg #3)
> > Aug  3 19:15:41 fel kernel: [185427.633735] EIP is at put_nfs_open_context+0x30/0x7d
> > Aug  3 19:15:41 fel kernel: [185427.633739] eax: 00100100   ebx: d299b634   ecx: 00000001   edx: 00200200
> > Aug  3 19:15:41 fel kernel: [185427.633744] esi: cd4c7240   edi: cd4c7260   ebp: e2a25d30   esp: e2a25d24
> > Aug  3 19:15:42 fel kernel: [185427.633748] ds: 007b   es: 007b   fs: 00d8  gs: 0033  ss: 0068
> > Aug  3 19:15:43 fel kernel: [185427.633752] Process ccache (pid: 8352, ti=e2a24000 task=f69c15b0 task.ti=e2a24000)
> > Aug  3 19:15:43 fel kernel: [185427.633756] Stack: e469b680 d299b57c 00000029 e2a25d3c c02014dd 00000000 e2a25d68 c020475f
> > Aug  3 19:15:43 fel kernel: [185427.633774]        00000001 00000000 00000000 ffffffff d299b4ac e469b680 d299b57c 00000000
> > Aug  3 19:15:43 fel kernel: [185427.633791]        00000001 e2a25da0 c0205a2d 00000000 00000000 00000000 00000000 d299b4ac
> > Aug  3 19:15:44 fel kernel: [185427.633809] Call Trace:
> > Aug  3 19:15:44 fel kernel: [185427.633813]  [<c0104034>] show_trace_log_lvl+0x19/0x2e
> > Aug  3 19:15:46 fel kernel: [185427.633821]  [<c01040fe>] show_stack_log_lvl+0xa1/0xa9
> > Aug  3 19:15:47 fel kernel: [185427.633827]  [<c0104301>] show_registers+0x1b8/0x289
> > Aug  3 19:15:47 fel kernel: [185427.633833]  [<c0104524>] die+0x10d/0x1d2
> > Aug  3 19:15:47 fel kernel: [185427.633839]  [<c0431938>] do_page_fault+0x44d/0x524
> > Aug  3 19:15:47 fel kernel: [185427.633847]  [<c043018a>] error_code+0x72/0x78
> > Aug  3 19:15:47 fel kernel: [185427.633852]  [<c02014dd>] nfs_release_request+0x20/0x2f
> > Aug  3 19:15:47 fel kernel: [185427.633859]  [<c020475f>] nfs_wait_on_requests_locked+0x6d/0xae
> > Aug  3 19:15:47 fel kernel: [185427.633866]  [<c0205a2d>] nfs_sync_mapping_wait+0x82/0x11b
> > Aug  3 19:15:47 fel kernel: [185427.633872]  [<c0205b23>] nfs_wb_all+0x5d/0x7b
> > Aug  3 19:15:47 fel kernel: [185427.633878]  [<c01fc93b>] nfs_rename+0x16b/0x275
> > Aug  3 19:15:47 fel kernel: [185427.633884]  [<c0168c99>] vfs_rename_other+0x65/0xaf
> > Aug  3 19:15:47 fel kernel: [185427.633891]  [<c0168dd0>] vfs_rename+0xed/0x20b
> > Aug  3 19:15:47 fel kernel: [185427.633896]  [<c016902f>] do_rename+0x141/0x182
> > Aug  3 19:15:47 fel kernel: [185427.633902]  [<c01690ab>] sys_renameat+0x3b/0x5d
> > Aug  3 19:15:47 fel kernel: [185427.633907]  [<c01690f5>] sys_rename+0x28/0x2a
> > Aug  3 19:15:47 fel kernel: [185427.633913]  [<c01032a6>] sysenter_past_esp+0x5f/0x85
> > Aug  3 19:15:47 fel kernel: [185427.633919]  =======================
> > Aug  3 19:15:47 fel kernel: [185427.633922] Code: 89 c6 53 f0 ff 08 0f 94 c0 84 c0 74 66 8d 7e 20 39 7e 20 74 30 8b 46 08 8b 58 24 83 c3 6c 89 d8 e8 4c 17 23 00 8b 46 20 8b 57 04 <89> 50 04 89 02 89 d8 c7 46 20 00 01 10 00 c7 47 04 00 02 20 00
> > Aug  3 19:15:47 fel kernel: [185427.634022] EIP: [<c01fe613>] put_nfs_open_context+0x30/0x7d SS:ESP 0068:e2a25d24
> 
> We did a list operation on an already-deleted list_head.
> 
> That code has changed a lot between 2.6.22 and 2.6.23-rc2.  Hopefully for the
> better, although I can't immediately find a commit in there which looks like
> it addresses this bug.
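
The register dump appears consistent with that reading. In 2.6.22,
list_del() leaves LIST_POISON1 (0x00100100) in ->next and LIST_POISON2
(0x00200200) in ->prev, which are the values in eax and edx above. A
second list_del() on the same entry would then execute
"next->prev = prev", that is, a store to LIST_POISON1 plus the offset of
->prev. The following is only a user-space sketch of that arithmetic,
assuming an i386 layout with two 32-bit pointers; struct list_head32 and
list_del_first_store() are names made up for illustration and are not
kernel APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Poison values that list_del() stores into a removed entry in 2.6.22. */
#define LIST_POISON1 0x00100100u
#define LIST_POISON2 0x00200200u

/* Hypothetical model of struct list_head on i386: two 32-bit pointers. */
struct list_head32 {
	uint32_t next;
	uint32_t prev;
};

/* The sketch relies on ->prev sitting 4 bytes into the structure. */
_Static_assert(offsetof(struct list_head32, prev) == 4,
	       "prev must follow a 32-bit next pointer");

/* Model of the tail of list_del(): mark the entry as removed. */
static void poison_entry(struct list_head32 *entry)
{
	entry->next = LIST_POISON1;
	entry->prev = LIST_POISON2;
}

/*
 * Address that the first store of list_del(), "next->prev = prev",
 * would write to for the given entry. Nothing is dereferenced here;
 * the function only computes the target address.
 */
static uint32_t list_del_first_store(const struct list_head32 *entry)
{
	return entry->next + (uint32_t)offsetof(struct list_head32, prev);
}
```

For an entry that has already been through poison_entry(), the computed
address is 0x00100104, which matches the faulting address in the oops,
and "Oops: 0002" indicates a write to a non-present page.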

I believe this fix should address it.

Trond


[-- Attachment #2: linux-2.6.23-002-fix_put_nfs_open_context.dif --]
[-- Type: message/rfc822, Size: 2723 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: Thu, 26 Jul 2007 12:06:17 -0400
Message-ID: <1186424394.6616.45.camel@localhost>

We need to grab the inode->i_lock atomically with the last reference put in
order to remove the open context that is being freed from the
nfsi->open_files list.

Fix by converting the kref to a standard atomic counter and then using
atomic_dec_and_lock()...

Thanks to Arnd Bergmann for pointing out the problem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/nfs/inode.c         |   24 ++++++++----------------
 include/linux/nfs_fs.h |    2 +-
 2 files changed, 9 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bca6cdc..71a49c3 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -468,7 +468,7 @@ static struct nfs_open_context *alloc_nfs_open_context(struct vfsmount *mnt, str
 		ctx->lockowner = current->files;
 		ctx->error = 0;
 		ctx->dir_cookie = 0;
-		kref_init(&ctx->kref);
+		atomic_set(&ctx->count, 1);
 	}
 	return ctx;
 }
@@ -476,21 +476,18 @@ static struct nfs_open_context *alloc_nfs_open_context(struct vfsmount *mnt, str
 struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx)
 {
 	if (ctx != NULL)
-		kref_get(&ctx->kref);
+		atomic_inc(&ctx->count);
 	return ctx;
 }
 
-static void nfs_free_open_context(struct kref *kref)
+void put_nfs_open_context(struct nfs_open_context *ctx)
 {
-	struct nfs_open_context *ctx = container_of(kref,
-			struct nfs_open_context, kref);
+	struct inode *inode = ctx->path.dentry->d_inode;
 
-	if (!list_empty(&ctx->list)) {
-		struct inode *inode = ctx->path.dentry->d_inode;
-		spin_lock(&inode->i_lock);
-		list_del(&ctx->list);
-		spin_unlock(&inode->i_lock);
-	}
+	if (!atomic_dec_and_lock(&ctx->count, &inode->i_lock))
+		return;
+	list_del(&ctx->list);
+	spin_unlock(&inode->i_lock);
 	if (ctx->state != NULL)
 		nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
 	if (ctx->cred != NULL)
@@ -500,11 +497,6 @@ static void nfs_free_open_context(struct kref *kref)
 	kfree(ctx);
 }
 
-void put_nfs_open_context(struct nfs_open_context *ctx)
-{
-	kref_put(&ctx->kref, nfs_free_open_context);
-}
-
 /*
  * Ensure that mmap has a recent RPC credential for use when writing out
  * shared pages
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 9ba4aec..157dcb0 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -71,7 +71,7 @@ struct nfs_access_entry {
 
 struct nfs4_state;
 struct nfs_open_context {
-	struct kref kref;
+	atomic_t count;
 	struct path path;
 	struct rpc_cred *cred;
 	struct nfs4_state *state;
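
The pattern the patch relies on can be sketched in user space. This is
only an illustration under stated assumptions, not the kernel
implementation: dec_and_lock() is a hypothetical stand-in for
atomic_dec_and_lock() built on C11 atomics and a pthread mutex,
struct owner stands in for the inode, and an integer nr_open stands in
for the nfsi->open_files list. Unlike the real code, the sketch links
the context at allocation time to keep it short.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct owner {			/* hypothetical stand-in for the inode */
	pthread_mutex_t lock;	/* plays the role of inode->i_lock */
	int nr_open;		/* plays the role of nfsi->open_files */
};

struct ctx {			/* hypothetical stand-in for the open context */
	atomic_int count;
	struct owner *owner;
};

/*
 * Drop one reference. Returns true with the lock held if this was the
 * last reference, so the caller can unlink the object before anyone
 * walking the list under the same lock can take a new reference.
 */
static bool dec_and_lock(atomic_int *count, pthread_mutex_t *lock)
{
	int old = atomic_load(count);

	/* Fast path: not the last reference, no lock needed. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return false;
	}
	/* Slow path: possibly the last reference, lock before dropping. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(count, 1) == 1)
		return true;
	pthread_mutex_unlock(lock);
	return false;
}

static struct ctx *ctx_alloc(struct owner *o)
{
	struct ctx *c = malloc(sizeof(*c));

	if (c == NULL)
		return NULL;
	atomic_init(&c->count, 1);
	c->owner = o;
	pthread_mutex_lock(&o->lock);
	o->nr_open++;		/* simplified "list_add" */
	pthread_mutex_unlock(&o->lock);
	return c;
}

static struct ctx *ctx_get(struct ctx *c)
{
	if (c != NULL)
		atomic_fetch_add(&c->count, 1);
	return c;
}

/* Returns true if the context was unlinked and freed. */
static bool ctx_put(struct ctx *c)
{
	struct owner *o = c->owner;

	if (!dec_and_lock(&c->count, &o->lock))
		return false;
	o->nr_open--;		/* simplified "list_del", lock is held */
	pthread_mutex_unlock(&o->lock);
	free(c);
	return true;
}
```

With the kref version, the count reached zero before the lock was taken.
One plausible interleaving is that a lookup walking the list under the
lock could still find the context in that window and take a reference to
it, leading to a second release and a second list_del(). Taking the lock
as part of the final decrement closes that window.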

Thread overview: 7+ messages
2007-08-06 10:08 2.6.22.1 Oops in put_nfs_open_context Dr. David Alan Gilbert
2007-08-06 18:01 ` Andrew Morton
2007-08-06 18:19   ` Trond Myklebust [this message]
2007-08-06 18:45     ` Andrew Morton
2007-08-06 19:38       ` Trond Myklebust
2007-08-07 12:59     ` Dr. David Alan Gilbert
2007-08-23  9:41       ` Dr. David Alan Gilbert
