From: Christian Krafft <krafft@de.ibm.com>
To: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>,
linux-kernel@vger.kernel.org
Subject: Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
Date: Thu, 26 Jul 2007 14:44:33 +0200 [thread overview]
Message-ID: <20070726144433.080126f7@localhost> (raw)
In-Reply-To: <200707261323.38310.arnd@arndb.de>
[-- Attachment #1: Type: text/plain, Size: 6167 bytes --]
On Thu, 26 Jul 2007 13:23:37 +0200
Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 25 July 2007, Trond Myklebust wrote:
> >
> > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
> >
> > > Obviously the locking code in nfs_free_open_context is wrong.
> > > Checking the list for entries and removing the entry should be an atomic operation.
> >
> > Wrong. It is quite safe to test the structure member ctx->list for
> > emptiness outside the spinlock because we have an explicit guarantee
> > that nobody else has a reference to this structure, plus the
> > atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> > us.
>
> Well, the real question then is how the ctx can still be present in the
> nfsi->open_files list. Since we are in nfs_free_open_context(), there
> must not be any pointer to the ctx anywhere, but still we have this other
> thread calling get_nfs_open_context() on it.
>
> Arnd <><
Thanks for the pointer Arnd,
Trond, does the patch below look better to you ?
Note:
For now the list_del_init is needed, so that the BUG_ON(!list_empty) works correctly.
(I have no clue why list_empty should not be working, when the list has been poisoned by a list_del.)
I am running the stress tests now. It seems to work so far.
---
Subject: nfs: fix locking in nfs/inode.c in nfs_free_open_context
From: Christian Krafft <krafft@de.ibm.com>
While running some tests on a server using NFS root I found some
locking problems.
The scripts I am using can be found under at
http://localhorst.gotdns.org/parabelboi/nfs/
Please note, I am using rpm to generate read load. I found no better way
to generate that much stress on the read path ;-)
It's a two way machine btw, with mount options tcp,nolock the problem
doesn't occur (probably due to a different timing).
When running write_back.sh six times concurrent with one read.sh,
the processes will be locked within a few minutes.
The problem occurs at least on 2.6.22 and 2.6.23-rc1.
It shows up like this:
Badness at lib/kref.c:33
NIP: c00000000018b734 LR: c00000000014c37c CTR: c0000000001566f0
REGS: c00000007aebf160 TRAP: 0700 Not tainted (2.6.23-rc1)
MSR: 9000000000029032 <EE,ME,IR,DR> CR: 84000422 XER: 00000000
TASK = c00000007df7d530[30769] 'sync' THREAD: c00000007aebc000 CPU: 3
NIP [c00000000018b734] .kref_get+0xc/0x28
LR [c00000000014c37c] .get_nfs_open_context+0x1c/0x38
Call Trace:
[c00000007aebf3e0] [c00000000033d9f4] ._spin_lock+0x10/0x24 (unreliable)
[c00000007aebf460] [c00000000014cf40] .nfs_find_open_context+0x68/0xcc
[c00000007aebf500] [c000000000156654] .nfs_writepage_locked+0x164/0x200
[c00000007aebf600] [c00000000015670c] .nfs_writepage+0x1c/0x48
[c00000007aebf690] [c00000000008e53c] .__writepage+0x3c/0x84
[c00000007aebf710] [c00000000008ed84] .write_cache_pages+0x238/0x3f4
[c00000007aebf880] [c000000000155840] .nfs_writepages+0x7c/0xc0
[c00000007aebf970] [c00000000008efe0] .do_writepages+0x68/0xa4
[c00000007aebf9f0] [c0000000000e0290] .__writeback_single_inode+0x1bc/0x3b8
[c00000007aebfae0] [c0000000000e0984] .sync_sb_inodes+0x20c/0x308
[c00000007aebfb90] [c0000000000e0b50] .sync_inodes_sb+0xd0/0x100
[c00000007aebfc80] [c0000000000e0c14] .__sync_inodes+0x94/0x120
nstruction dump:
4e800421 e8410028 38000001 38210080 7c030378 e8010010 ebc1fff0 7c0803a6
4e800020 80030000 7c000034 5400d97e <0b000000> 7c001828 30000001 7c00192d
Unable to handle kernel paging request for data at address 0x00200200
Faulting instruction address: 0xc000000000193474
cpu 0x0: Vector: 300 (Data Access) at [c00000000ff4b5b0]
pc: c000000000193474: .list_del+0x20/0xa4
lr: c00000000014c2e8: .nfs_free_open_context+0x4c/0xc4
sp: c00000000ff4b830
msr: 9000000000009032
dar: 200200
dsisr: 40000000
current = 0xc00000000ff0a230
paca = 0xc00000000047e900
pid = 506, comm = rpciod/0
enter ? for help
[c00000000ff4b8b0] c00000000014c2e8 .nfs_free_open_context+0x4c/0xc4
[c00000000ff4b940] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4b9c0] c00000000014c1dc .put_nfs_open_context+0x1c/0x34
[c00000000ff4ba40] c000000000150ec0 .nfs_free_request+0x2c/0x5c
[c00000000ff4bad0] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4bb50] c000000000150e2c .nfs_release_request+0x20/0x38
[c00000000ff4bbd0] c00000000015750c .nfs_writeback_done_partial+0x270/0x2c8
[c00000000ff4bc80] c0000000003274fc .rpc_exit_task+0x5c/0xa0
Obviously the locking code in nfs_free_open_context is wrong.
Checking the list for entries and removing the entry should be an atomic operation.
Also list_del_init should be used, because list_del will leave the empty list in
an undefined condition.
After applying the patch the scripts run a bit longer, but another error occurs :-/
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Index: linux-2.6.23-rc1/fs/nfs/inode.c
===================================================================
--- linux-2.6.23-rc1.orig/fs/nfs/inode.c
+++ linux-2.6.23-rc1/fs/nfs/inode.c
@@ -485,12 +485,8 @@ static void nfs_free_open_context(struct
struct nfs_open_context *ctx = container_of(kref,
struct nfs_open_context, kref);
- if (!list_empty(&ctx->list)) {
- struct inode *inode = ctx->path.dentry->d_inode;
- spin_lock(&inode->i_lock);
- list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
- }
+ BUG_ON(!list_empty(&ctx->list));
+
if (ctx->state != NULL)
nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
if (ctx->cred != NULL)
@@ -549,7 +545,7 @@ static void nfs_file_clear_open_context(
if (ctx) {
filp->private_data = NULL;
spin_lock(&inode->i_lock);
- list_move_tail(&ctx->list, &NFS_I(inode)->open_files);
+ list_del_init(&ctx->list);
spin_unlock(&inode->i_lock);
put_nfs_open_context(ctx);
}
--
Mit freundlichen Gruessen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registriergericht: Amtsgericht Stuttgart, HRB 243294
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
next prev parent reply other threads:[~2007-07-26 12:45 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-07-25 15:08 [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context Christian Krafft
2007-07-25 17:28 ` Trond Myklebust
2007-07-26 11:23 ` Arnd Bergmann
2007-07-26 12:37 ` Trond Myklebust
2007-07-26 12:44 ` Christian Krafft [this message]
2007-07-26 13:23 ` Trond Myklebust
2007-07-26 15:13 ` Arnd Bergmann
2007-07-26 16:08 ` Trond Myklebust
2007-07-26 18:00 ` Christian Krafft
2007-07-27 10:02 ` Christian Krafft
2007-07-27 9:44 ` Arnd Bergmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20070726144433.080126f7@localhost \
--to=krafft@de.ibm.com \
--cc=arnd@arndb.de \
--cc=linux-kernel@vger.kernel.org \
--cc=trond.myklebust@fys.uio.no \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox