From: Christian Krafft <krafft@de.ibm.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
linux-kernel@vger.kernel.org
Subject: Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
Date: Thu, 26 Jul 2007 14:44:33 +0200 [thread overview]
Message-ID: <20070726144433.080126f7@localhost> (raw)
In-Reply-To: <200707261323.38310.arnd@arndb.de>
[-- Attachment #1: Type: text/plain, Size: 6167 bytes --]
On Thu, 26 Jul 2007 13:23:37 +0200
Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 25 July 2007, Trond Myklebust wrote:
> >
> > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
> >
> > > Obviously the locking code in nfs_free_open_context is wrong.
> > > Checking the list for entries and removing the entry should be an atomic operation.
> >
> > Wrong. It is quite safe to test the structure member ctx->list for
> > emptiness outside the spinlock because we have an explicit guarantee
> > that nobody else has a reference to this structure, plus the
> > atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> > us.
>
> Well, the real question then is how the ctx can still be present in the
> nfsi->open_files list. Since we are in nfs_free_open_context(), there
> must not be any pointer to the ctx anywhere, but still we have this other
> thread calling get_nfs_open_context() on it.
>
> Arnd <><
Thanks for the pointer Arnd,
Trond, does the patch below look better to you ?
Note:
For now the list_del_init is needed, so that the BUG_ON(!list_empty) works correctly.
(I have no clue why list_empty should not be working, when the list has been poisoned by a list_del.)
I am running the stress tests now. It seems to work so far.
---
Subject: nfs: fix locking in nfs/inode.c in nfs_free_open_context
From: Christian Krafft <krafft@de.ibm.com>
While running some tests on a server using NFS root I found some
locking problems.
The scripts I am using can be found under at
http://localhorst.gotdns.org/parabelboi/nfs/
Please note, I am using rpm to generate read load. I found no better way
to generate that much stress on the read path ;-)
It's a two way machine btw, with mount options tcp,nolock the problem
doesn't occur (probably due to a different timing).
When running write_back.sh six times concurrent with one read.sh,
the processes will be locked within a few minutes.
The problem occurs at least on 2.6.22 and 2.6.23-rc1.
It shows up like this:
Badness at lib/kref.c:33
NIP: c00000000018b734 LR: c00000000014c37c CTR: c0000000001566f0
REGS: c00000007aebf160 TRAP: 0700 Not tainted (2.6.23-rc1)
MSR: 9000000000029032 <EE,ME,IR,DR> CR: 84000422 XER: 00000000
TASK = c00000007df7d530[30769] 'sync' THREAD: c00000007aebc000 CPU: 3
NIP [c00000000018b734] .kref_get+0xc/0x28
LR [c00000000014c37c] .get_nfs_open_context+0x1c/0x38
Call Trace:
[c00000007aebf3e0] [c00000000033d9f4] ._spin_lock+0x10/0x24 (unreliable)
[c00000007aebf460] [c00000000014cf40] .nfs_find_open_context+0x68/0xcc
[c00000007aebf500] [c000000000156654] .nfs_writepage_locked+0x164/0x200
[c00000007aebf600] [c00000000015670c] .nfs_writepage+0x1c/0x48
[c00000007aebf690] [c00000000008e53c] .__writepage+0x3c/0x84
[c00000007aebf710] [c00000000008ed84] .write_cache_pages+0x238/0x3f4
[c00000007aebf880] [c000000000155840] .nfs_writepages+0x7c/0xc0
[c00000007aebf970] [c00000000008efe0] .do_writepages+0x68/0xa4
[c00000007aebf9f0] [c0000000000e0290] .__writeback_single_inode+0x1bc/0x3b8
[c00000007aebfae0] [c0000000000e0984] .sync_sb_inodes+0x20c/0x308
[c00000007aebfb90] [c0000000000e0b50] .sync_inodes_sb+0xd0/0x100
[c00000007aebfc80] [c0000000000e0c14] .__sync_inodes+0x94/0x120
nstruction dump:
4e800421 e8410028 38000001 38210080 7c030378 e8010010 ebc1fff0 7c0803a6
4e800020 80030000 7c000034 5400d97e <0b000000> 7c001828 30000001 7c00192d
Unable to handle kernel paging request for data at address 0x00200200
Faulting instruction address: 0xc000000000193474
cpu 0x0: Vector: 300 (Data Access) at [c00000000ff4b5b0]
pc: c000000000193474: .list_del+0x20/0xa4
lr: c00000000014c2e8: .nfs_free_open_context+0x4c/0xc4
sp: c00000000ff4b830
msr: 9000000000009032
dar: 200200
dsisr: 40000000
current = 0xc00000000ff0a230
paca = 0xc00000000047e900
pid = 506, comm = rpciod/0
enter ? for help
[c00000000ff4b8b0] c00000000014c2e8 .nfs_free_open_context+0x4c/0xc4
[c00000000ff4b940] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4b9c0] c00000000014c1dc .put_nfs_open_context+0x1c/0x34
[c00000000ff4ba40] c000000000150ec0 .nfs_free_request+0x2c/0x5c
[c00000000ff4bad0] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4bb50] c000000000150e2c .nfs_release_request+0x20/0x38
[c00000000ff4bbd0] c00000000015750c .nfs_writeback_done_partial+0x270/0x2c8
[c00000000ff4bc80] c0000000003274fc .rpc_exit_task+0x5c/0xa0
Obviously the locking code in nfs_free_open_context is wrong.
Checking the list for entries and removing the entry should be an atomic operation.
Also list_del_init should be used, because list_del will leave the empty list in
an undefined condition.
After applying the patch the scripts run a bit longer, but another error occurs :-/
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Index: linux-2.6.23-rc1/fs/nfs/inode.c
===================================================================
--- linux-2.6.23-rc1.orig/fs/nfs/inode.c
+++ linux-2.6.23-rc1/fs/nfs/inode.c
@@ -485,12 +485,8 @@ static void nfs_free_open_context(struct
struct nfs_open_context *ctx = container_of(kref,
struct nfs_open_context, kref);
- if (!list_empty(&ctx->list)) {
- struct inode *inode = ctx->path.dentry->d_inode;
- spin_lock(&inode->i_lock);
- list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
- }
+ BUG_ON(!list_empty(&ctx->list));
+
if (ctx->state != NULL)
nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
if (ctx->cred != NULL)
@@ -549,7 +545,7 @@ static void nfs_file_clear_open_context(
if (ctx) {
filp->private_data = NULL;
spin_lock(&inode->i_lock);
- list_move_tail(&ctx->list, &NFS_I(inode)->open_files);
+ list_del_init(&ctx->list);
spin_unlock(&inode->i_lock);
put_nfs_open_context(ctx);
}
--
Mit freundlichen Gruessen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registriergericht: Amtsgericht Stuttgart, HRB 243294
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
next prev parent reply other threads:[~2007-07-26 12:45 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-25 15:08 [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context Christian Krafft
2007-07-25 17:28 ` Trond Myklebust
2007-07-26 11:23 ` Arnd Bergmann
2007-07-26 12:37 ` Trond Myklebust
2007-07-26 12:44 ` Christian Krafft [this message]
2007-07-26 13:23 ` Trond Myklebust
2007-07-26 15:13 ` Arnd Bergmann
2007-07-26 16:08 ` Trond Myklebust
2007-07-26 18:00 ` Christian Krafft
2007-07-27 10:02 ` Christian Krafft
2007-07-27 9:44 ` Arnd Bergmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070726144433.080126f7@localhost \
--to=krafft@de.ibm.com \
--cc=arnd@arndb.de \
--cc=linux-kernel@vger.kernel.org \
--cc=trond.myklebust@fys.uio.no \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.