* [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
@ 2007-07-25 15:08 Christian Krafft
2007-07-25 17:28 ` Trond Myklebust
0 siblings, 1 reply; 11+ messages in thread
From: Christian Krafft @ 2007-07-25 15:08 UTC (permalink / raw)
To: linux-kernel; +Cc: krafft
[-- Attachment #1: Type: text/plain, Size: 4635 bytes --]
From: Christian Krafft <krafft@de.ibm.com>
While running some tests on a server using NFS root I found some
locking problems.
The scripts I am using can be found under at
http://localhorst.gotdns.org/parabelboi/nfs/
Please note, I am using rpm to generate read load. I found no better way
to generate that much stress on the read path ;-)
It's a two way machine btw, with mount options tcp,nolock the problem
doesn't occur (probably due to a different timing).
When running write_back.sh six times concurrent with one read.sh,
the processes will be locked within a few minutes.
The problem occurs at least on 2.6.22 and 2.6.23-rc1.
It shows up like this:
Badness at lib/kref.c:33
NIP: c00000000018b734 LR: c00000000014c37c CTR: c0000000001566f0
REGS: c00000007aebf160 TRAP: 0700 Not tainted (2.6.23-rc1)
MSR: 9000000000029032 <EE,ME,IR,DR> CR: 84000422 XER: 00000000
TASK = c00000007df7d530[30769] 'sync' THREAD: c00000007aebc000 CPU: 3
NIP [c00000000018b734] .kref_get+0xc/0x28
LR [c00000000014c37c] .get_nfs_open_context+0x1c/0x38
Call Trace:
[c00000007aebf3e0] [c00000000033d9f4] ._spin_lock+0x10/0x24 (unreliable)
[c00000007aebf460] [c00000000014cf40] .nfs_find_open_context+0x68/0xcc
[c00000007aebf500] [c000000000156654] .nfs_writepage_locked+0x164/0x200
[c00000007aebf600] [c00000000015670c] .nfs_writepage+0x1c/0x48
[c00000007aebf690] [c00000000008e53c] .__writepage+0x3c/0x84
[c00000007aebf710] [c00000000008ed84] .write_cache_pages+0x238/0x3f4
[c00000007aebf880] [c000000000155840] .nfs_writepages+0x7c/0xc0
[c00000007aebf970] [c00000000008efe0] .do_writepages+0x68/0xa4
[c00000007aebf9f0] [c0000000000e0290] .__writeback_single_inode+0x1bc/0x3b8
[c00000007aebfae0] [c0000000000e0984] .sync_sb_inodes+0x20c/0x308
[c00000007aebfb90] [c0000000000e0b50] .sync_inodes_sb+0xd0/0x100
[c00000007aebfc80] [c0000000000e0c14] .__sync_inodes+0x94/0x120
nstruction dump:
4e800421 e8410028 38000001 38210080 7c030378 e8010010 ebc1fff0 7c0803a6
4e800020 80030000 7c000034 5400d97e <0b000000> 7c001828 30000001 7c00192d
Unable to handle kernel paging request for data at address 0x00200200
Faulting instruction address: 0xc000000000193474
cpu 0x0: Vector: 300 (Data Access) at [c00000000ff4b5b0]
pc: c000000000193474: .list_del+0x20/0xa4
lr: c00000000014c2e8: .nfs_free_open_context+0x4c/0xc4
sp: c00000000ff4b830
msr: 9000000000009032
dar: 200200
dsisr: 40000000
current = 0xc00000000ff0a230
paca = 0xc00000000047e900
pid = 506, comm = rpciod/0
enter ? for help
[c00000000ff4b8b0] c00000000014c2e8 .nfs_free_open_context+0x4c/0xc4
[c00000000ff4b940] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4b9c0] c00000000014c1dc .put_nfs_open_context+0x1c/0x34
[c00000000ff4ba40] c000000000150ec0 .nfs_free_request+0x2c/0x5c
[c00000000ff4bad0] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4bb50] c000000000150e2c .nfs_release_request+0x20/0x38
[c00000000ff4bbd0] c00000000015750c .nfs_writeback_done_partial+0x270/0x2c8
[c00000000ff4bc80] c0000000003274fc .rpc_exit_task+0x5c/0xa0
Obviously the locking code in nfs_free_open_context is wrong.
Checking the list for entries and removing the entry should be an atomic operation.
Also list_del_init should be used, because list_del will leave the empty list in
an undefined condition.
After applying the patch the scripts run a bit longer, but another error occurs :-/
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Index: linux-2.6.23-rc1/fs/nfs/inode.c
===================================================================
--- linux-2.6.23-rc1.orig/fs/nfs/inode.c
+++ linux-2.6.23-rc1/fs/nfs/inode.c
@@ -484,13 +484,13 @@ static void nfs_free_open_context(struct
{
struct nfs_open_context *ctx = container_of(kref,
struct nfs_open_context, kref);
+ struct inode *inode = ctx->path.dentry->d_inode;
+
+ spin_lock(&inode->i_lock);
+ if (!list_empty(&ctx->list))
+ list_del_init(&ctx->list);
+ spin_unlock(&inode->i_lock);
- if (!list_empty(&ctx->list)) {
- struct inode *inode = ctx->path.dentry->d_inode;
- spin_lock(&inode->i_lock);
- list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
- }
if (ctx->state != NULL)
nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
if (ctx->cred != NULL)
--
Mit freundlichen Gruessen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registriergericht: Amtsgericht Stuttgart, HRB 243294
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-25 15:08 [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context Christian Krafft @ 2007-07-25 17:28 ` Trond Myklebust 2007-07-26 11:23 ` Arnd Bergmann 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2007-07-25 17:28 UTC (permalink / raw) To: Christian Krafft; +Cc: linux-kernel On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote: > Obviously the locking code in nfs_free_open_context is wrong. > Checking the list for entries and removing the entry should be an atomic operation. Wrong. It is quite safe to test the structure member ctx->list for emptiness outside the spinlock because we have an explicit guarantee that nobody else has a reference to this structure, plus the atomic_dec_and_test() in kref_put() has acted as a memory barrier for us. > Also list_del_init should be used, because list_del will leave the empty list in > an undefined condition. Huh? We're freeing the context. It will _never_ _ever_ _ever_ be used again. If anything tries to use ctx->list after this, then that is a major bug. Patch rejected... Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-25 17:28 ` Trond Myklebust @ 2007-07-26 11:23 ` Arnd Bergmann 2007-07-26 12:37 ` Trond Myklebust 2007-07-26 12:44 ` Christian Krafft 0 siblings, 2 replies; 11+ messages in thread From: Arnd Bergmann @ 2007-07-26 11:23 UTC (permalink / raw) To: Trond Myklebust; +Cc: Christian Krafft, linux-kernel On Wednesday 25 July 2007, Trond Myklebust wrote: > > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote: > > > Obviously the locking code in nfs_free_open_context is wrong. > > Checking the list for entries and removing the entry should be an atomic operation. > > Wrong. It is quite safe to test the structure member ctx->list for > emptiness outside the spinlock because we have an explicit guarantee > that nobody else has a reference to this structure, plus the > atomic_dec_and_test() in kref_put() has acted as a memory barrier for > us. Well, the real question then is how the ctx can still be present in the nfsi->open_files list. Since we are in nfs_free_open_context(), there must not be any pointer to the ctx anywhere, but still we have this other thread calling get_nfs_open_context() on it. Arnd <>< ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 11:23 ` Arnd Bergmann @ 2007-07-26 12:37 ` Trond Myklebust 2007-07-26 12:44 ` Christian Krafft 1 sibling, 0 replies; 11+ messages in thread From: Trond Myklebust @ 2007-07-26 12:37 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Christian Krafft, linux-kernel On Thu, 2007-07-26 at 13:23 +0200, Arnd Bergmann wrote: > On Wednesday 25 July 2007, Trond Myklebust wrote: > > > > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote: > > > > > Obviously the locking code in nfs_free_open_context is wrong. > > > Checking the list for entries and removing the entry should be an atomic operation. > > > > Wrong. It is quite safe to test the structure member ctx->list for > > emptiness outside the spinlock because we have an explicit guarantee > > that nobody else has a reference to this structure, plus the > > atomic_dec_and_test() in kref_put() has acted as a memory barrier for > > us. > > Well, the real question then is how the ctx can still be present in the > nfsi->open_files list. Since we are in nfs_free_open_context(), there > must not be any pointer to the ctx anywhere, but still we have this other > thread calling get_nfs_open_context() on it. Yup. That is definitely a bug. I wish we had a 'kref_put_and_lock' to deal with these situations where you want to grab a lock atomically with the last put. It would make krefs a lot more useful... Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 11:23 ` Arnd Bergmann 2007-07-26 12:37 ` Trond Myklebust @ 2007-07-26 12:44 ` Christian Krafft 2007-07-26 13:23 ` Trond Myklebust 1 sibling, 1 reply; 11+ messages in thread From: Christian Krafft @ 2007-07-26 12:44 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Trond Myklebust, linux-kernel [-- Attachment #1: Type: text/plain, Size: 6167 bytes --] On Thu, 26 Jul 2007 13:23:37 +0200 Arnd Bergmann <arnd@arndb.de> wrote: > On Wednesday 25 July 2007, Trond Myklebust wrote: > > > > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote: > > > > > Obviously the locking code in nfs_free_open_context is wrong. > > > Checking the list for entries and removing the entry should be an atomic operation. > > > > Wrong. It is quite safe to test the structure member ctx->list for > > emptiness outside the spinlock because we have an explicit guarantee > > that nobody else has a reference to this structure, plus the > > atomic_dec_and_test() in kref_put() has acted as a memory barrier for > > us. > > Well, the real question then is how the ctx can still be present in the > nfsi->open_files list. Since we are in nfs_free_open_context(), there > must not be any pointer to the ctx anywhere, but still we have this other > thread calling get_nfs_open_context() on it. > > Arnd <>< Thanks for the pointer Arnd, Trond, does the patch below look better to you ? Note: For now the list_del_init is needed, so that the BUG_ON(!list_empty) works correctly. (I have no clue why list_empty should not be working, when the list has been poisoned by a list_del.) I am running the stress tests now. It seems to work so far. --- Subject: nfs: fix locking in nfs/inode.c in nfs_free_open_context From: Christian Krafft <krafft@de.ibm.com> While running some tests on a server using NFS root I found some locking problems. The scripts I am using can be found under at http://localhorst.gotdns.org/parabelboi/nfs/ Please note, I am using rpm to generate read load. I found no better way to generate that much stress on the read path ;-) It's a two way machine btw, with mount options tcp,nolock the problem doesn't occur (probably due to a different timing). When running write_back.sh six times concurrent with one read.sh, the processes will be locked within a few minutes. The problem occurs at least on 2.6.22 and 2.6.23-rc1. It shows up like this: Badness at lib/kref.c:33 NIP: c00000000018b734 LR: c00000000014c37c CTR: c0000000001566f0 REGS: c00000007aebf160 TRAP: 0700 Not tainted (2.6.23-rc1) MSR: 9000000000029032 <EE,ME,IR,DR> CR: 84000422 XER: 00000000 TASK = c00000007df7d530[30769] 'sync' THREAD: c00000007aebc000 CPU: 3 NIP [c00000000018b734] .kref_get+0xc/0x28 LR [c00000000014c37c] .get_nfs_open_context+0x1c/0x38 Call Trace: [c00000007aebf3e0] [c00000000033d9f4] ._spin_lock+0x10/0x24 (unreliable) [c00000007aebf460] [c00000000014cf40] .nfs_find_open_context+0x68/0xcc [c00000007aebf500] [c000000000156654] .nfs_writepage_locked+0x164/0x200 [c00000007aebf600] [c00000000015670c] .nfs_writepage+0x1c/0x48 [c00000007aebf690] [c00000000008e53c] .__writepage+0x3c/0x84 [c00000007aebf710] [c00000000008ed84] .write_cache_pages+0x238/0x3f4 [c00000007aebf880] [c000000000155840] .nfs_writepages+0x7c/0xc0 [c00000007aebf970] [c00000000008efe0] .do_writepages+0x68/0xa4 [c00000007aebf9f0] [c0000000000e0290] .__writeback_single_inode+0x1bc/0x3b8 [c00000007aebfae0] [c0000000000e0984] .sync_sb_inodes+0x20c/0x308 [c00000007aebfb90] [c0000000000e0b50] .sync_inodes_sb+0xd0/0x100 [c00000007aebfc80] [c0000000000e0c14] .__sync_inodes+0x94/0x120 nstruction dump: 4e800421 e8410028 38000001 38210080 7c030378 e8010010 ebc1fff0 7c0803a6 4e800020 80030000 7c000034 5400d97e <0b000000> 7c001828 30000001 7c00192d Unable to handle kernel paging request for data at address 0x00200200 Faulting instruction address: 0xc000000000193474 cpu 0x0: Vector: 300 (Data Access) at [c00000000ff4b5b0] pc: c000000000193474: .list_del+0x20/0xa4 lr: c00000000014c2e8: .nfs_free_open_context+0x4c/0xc4 sp: c00000000ff4b830 msr: 9000000000009032 dar: 200200 dsisr: 40000000 current = 0xc00000000ff0a230 paca = 0xc00000000047e900 pid = 506, comm = rpciod/0 enter ? for help [c00000000ff4b8b0] c00000000014c2e8 .nfs_free_open_context+0x4c/0xc4 [c00000000ff4b940] c00000000018b708 .kref_put+0x74/0x94 [c00000000ff4b9c0] c00000000014c1dc .put_nfs_open_context+0x1c/0x34 [c00000000ff4ba40] c000000000150ec0 .nfs_free_request+0x2c/0x5c [c00000000ff4bad0] c00000000018b708 .kref_put+0x74/0x94 [c00000000ff4bb50] c000000000150e2c .nfs_release_request+0x20/0x38 [c00000000ff4bbd0] c00000000015750c .nfs_writeback_done_partial+0x270/0x2c8 [c00000000ff4bc80] c0000000003274fc .rpc_exit_task+0x5c/0xa0 Obviously the locking code in nfs_free_open_context is wrong. Checking the list for entries and removing the entry should be an atomic operation. Also list_del_init should be used, because list_del will leave the empty list in an undefined condition. After applying the patch the scripts run a bit longer, but another error occurs :-/ Signed-off-by: Christian Krafft <krafft@de.ibm.com> Index: linux-2.6.23-rc1/fs/nfs/inode.c =================================================================== --- linux-2.6.23-rc1.orig/fs/nfs/inode.c +++ linux-2.6.23-rc1/fs/nfs/inode.c @@ -485,12 +485,8 @@ static void nfs_free_open_context(struct struct nfs_open_context *ctx = container_of(kref, struct nfs_open_context, kref); - if (!list_empty(&ctx->list)) { - struct inode *inode = ctx->path.dentry->d_inode; - spin_lock(&inode->i_lock); - list_del(&ctx->list); - spin_unlock(&inode->i_lock); - } + BUG_ON(!list_empty(&ctx->list)); + if (ctx->state != NULL) nfs4_close_state(&ctx->path, ctx->state, ctx->mode); if (ctx->cred != NULL) @@ -549,7 +545,7 @@ static void nfs_file_clear_open_context( if (ctx) { filp->private_data = NULL; spin_lock(&inode->i_lock); - list_move_tail(&ctx->list, &NFS_I(inode)->open_files); + list_del_init(&ctx->list); spin_unlock(&inode->i_lock); put_nfs_open_context(ctx); } -- Mit freundlichen Gruessen, kind regards, Christian Krafft IBM Systems & Technology Group, Linux Kernel Development IT Specialist Vorsitzender des Aufsichtsrats: Martin Jetter Geschaeftsfuehrung: Herbert Kircher Sitz der Gesellschaft: Boeblingen Registriergericht: Amtsgericht Stuttgart, HRB 243294 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 12:44 ` Christian Krafft @ 2007-07-26 13:23 ` Trond Myklebust 2007-07-26 15:13 ` Arnd Bergmann 0 siblings, 1 reply; 11+ messages in thread From: Trond Myklebust @ 2007-07-26 13:23 UTC (permalink / raw) To: Christian Krafft; +Cc: Arnd Bergmann, linux-kernel On Thu, 2007-07-26 at 14:44 +0200, Christian Krafft wrote: > On Thu, 26 Jul 2007 13:23:37 +0200 > Arnd Bergmann <arnd@arndb.de> wrote: > > > On Wednesday 25 July 2007, Trond Myklebust wrote: > > > > > > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote: > > > > > > > Obviously the locking code in nfs_free_open_context is wrong. > > > > Checking the list for entries and removing the entry should be an atomic operation. > > > > > > Wrong. It is quite safe to test the structure member ctx->list for > > > emptiness outside the spinlock because we have an explicit guarantee > > > that nobody else has a reference to this structure, plus the > > > atomic_dec_and_test() in kref_put() has acted as a memory barrier for > > > us. > > > > Well, the real question then is how the ctx can still be present in the > > nfsi->open_files list. Since we are in nfs_free_open_context(), there > > must not be any pointer to the ctx anywhere, but still we have this other > > thread calling get_nfs_open_context() on it. > > > > Arnd <>< > > Thanks for the pointer Arnd, > > Trond, does the patch below look better to you ? No. That is still incorrect. The list of open contexts is used for things like NFSv4 state recovery (when we're doing background writes, and the server happens to reboot). The lifetime of the open context may exceed that of the struct file that created it. Trond ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 13:23 ` Trond Myklebust @ 2007-07-26 15:13 ` Arnd Bergmann 2007-07-26 16:08 ` Trond Myklebust 0 siblings, 1 reply; 11+ messages in thread From: Arnd Bergmann @ 2007-07-26 15:13 UTC (permalink / raw) To: Trond Myklebust; +Cc: Christian Krafft, linux-kernel On Thursday 26 July 2007, Trond Myklebust wrote: > > > > > Wrong. It is quite safe to test the structure member ctx->list for > > > > emptiness outside the spinlock because we have an explicit guarantee > > > > that nobody else has a reference to this structure, plus the > > > > atomic_dec_and_test() in kref_put() has acted as a memory barrier for > > > > us > > > > > > Well, the real question then is how the ctx can still be present in the > > > nfsi->open_files list. Since we are in nfs_free_open_context(), there > > > must not be any pointer to the ctx anywhere, but still we have this other > > > thread calling get_nfs_open_context() on it > > > No. That is still incorrect. The list of open contexts is used for > things like NFSv4 state recovery (when we're doing background writes, > and the server happens to reboot). The lifetime of the open context may > exceed that of the struct file that created it. Unfortunately, you didn't answer my question. The observed problem is that the final kref_put gets called at a time where there are still references to the context in nfsi->open_files, and other threads therefore do get at them. The list_empty()/list_del() in nfs_free_open_context is bogus, because as you say, there is a guarantee that there is no other reference on the structure, and I take it that ought to include the ->open_files list. I would guess that a quick fix for this looks something like the patch below. The race is in put_nfs_open_context() between the point where we check the reference count and the point where the context gets removed from the open_files list. If another thread searches the list between these points and grabs a new reference, we die. The patch holds the i_lock around the kref_put to prevent others from searching the list. Ugly, I know, but it seems that's the price you pay for using a kref in such unconventional ways, i.e. not counting every reference. Arnd <>< --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -487,10 +487,10 @@ static void nfs_free_open_context(struct kref *kref) if (!list_empty(&ctx->list)) { struct inode *inode = ctx->path.dentry->d_inode; - spin_lock(&inode->i_lock); list_del(&ctx->list); - spin_unlock(&inode->i_lock); } + spin_unlock(&inode->i_lock); + if (ctx->state != NULL) nfs4_close_state(&ctx->path, ctx->state, ctx->mode); if (ctx->cred != NULL) @@ -498,11 +498,15 @@ static void nfs_free_open_context(struct kref *kref) dput(ctx->path.dentry); mntput(ctx->path.mnt); kfree(ctx); + + spin_lock(&inode->i_lock); } void put_nfs_open_context(struct nfs_open_context *ctx) { + spin_lock(&inode->i_lock); kref_put(&ctx->kref, nfs_free_open_context); + spin_unlock(&inode->i_lock); } /* ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 15:13 ` Arnd Bergmann @ 2007-07-26 16:08 ` Trond Myklebust 2007-07-26 18:00 ` Christian Krafft 2007-07-27 9:44 ` Arnd Bergmann 0 siblings, 2 replies; 11+ messages in thread From: Trond Myklebust @ 2007-07-26 16:08 UTC (permalink / raw) To: Arnd Bergmann; +Cc: Christian Krafft, linux-kernel On Thu, 2007-07-26 at 17:13 +0200, Arnd Bergmann wrote: > Unfortunately, you didn't answer my question. The observed problem is > that the final kref_put gets called at a time where there are still > references to the context in nfsi->open_files, and other threads > therefore do get at them. Actually, I thought I did: I said we need to grab the lock atomically with the last put. See below. > The patch holds the i_lock around the kref_put to prevent others > from searching the list. Ugly, I know, but it seems that's the > price you pay for using a kref in such unconventional ways, i.e. > not counting every reference. Really ugly. Here's an alternative that is a lot more palatable. Trond --------------------- From: Trond Myklebust <Trond.Myklebust@netapp.com> Date: Thu, 26 Jul 2007 12:06:17 -0400 NFS: Fix put_nfs_open_context We need to grab the inode->i_lock atomically with the last reference put in order to remove the open context that is being freed from the nfsi->open_files list. Fix by converting the kref to a standard atomic counter and then using atomic_dec_and_lock()... Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> --- fs/nfs/inode.c | 24 ++++++++---------------- include/linux/nfs_fs.h | 2 +- 2 files changed, 9 insertions(+), 17 deletions(-) diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index bca6cdc..71a49c3 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -468,7 +468,7 @@ static struct nfs_open_context *alloc_nfs_open_context(struct vfsmount *mnt, str ctx->lockowner = current->files; ctx->error = 0; ctx->dir_cookie = 0; - kref_init(&ctx->kref); + atomic_set(&ctx->count, 1); } return ctx; } @@ -476,21 +476,18 @@ static struct nfs_open_context *alloc_nfs_open_context(struct vfsmount *mnt, str struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx) { if (ctx != NULL) - kref_get(&ctx->kref); + atomic_inc(&ctx->count); return ctx; } -static void nfs_free_open_context(struct kref *kref) +void put_nfs_open_context(struct nfs_open_context *ctx) { - struct nfs_open_context *ctx = container_of(kref, - struct nfs_open_context, kref); + struct inode *inode = ctx->path.dentry->d_inode; - if (!list_empty(&ctx->list)) { - struct inode *inode = ctx->path.dentry->d_inode; - spin_lock(&inode->i_lock); - list_del(&ctx->list); - spin_unlock(&inode->i_lock); - } + if (!atomic_dec_and_lock(&ctx->count, &inode->i_lock)) + return; + list_del(&ctx->list); + spin_unlock(&inode->i_lock); if (ctx->state != NULL) nfs4_close_state(&ctx->path, ctx->state, ctx->mode); if (ctx->cred != NULL) @@ -500,11 +497,6 @@ static void nfs_free_open_context(struct kref *kref) kfree(ctx); } -void put_nfs_open_context(struct nfs_open_context *ctx) -{ - kref_put(&ctx->kref, nfs_free_open_context); -} - /* * Ensure that mmap has a recent RPC credential for use when writing out * shared pages diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 9ba4aec..157dcb0 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -71,7 +71,7 @@ struct nfs_access_entry { struct nfs4_state; struct nfs_open_context { - struct kref kref; + atomic_t count; struct path path; struct rpc_cred *cred; struct nfs4_state *state; ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 16:08 ` Trond Myklebust @ 2007-07-26 18:00 ` Christian Krafft 2007-07-27 10:02 ` Christian Krafft 2007-07-27 9:44 ` Arnd Bergmann 1 sibling, 1 reply; 11+ messages in thread From: Christian Krafft @ 2007-07-26 18:00 UTC (permalink / raw) To: Trond Myklebust; +Cc: Arnd Bergmann, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1259 bytes --] On Thu, 26 Jul 2007 12:08:17 -0400 Trond Myklebust <trond.myklebust@fys.uio.no> wrote: > On Thu, 2007-07-26 at 17:13 +0200, Arnd Bergmann wrote: > > Unfortunately, you didn't answer my question. The observed problem is > > that the final kref_put gets called at a time where there are still > > references to the context in nfsi->open_files, and other threads > > therefore do get at them. > > Actually, I thought I did: I said we need to grab the lock atomically > with the last put. See below. > > > The patch holds the i_lock around the kref_put to prevent others > > from searching the list. Ugly, I know, but it seems that's the > > price you pay for using a kref in such unconventional ways, i.e. > > not counting every reference. > > Really ugly. Here's an alternative that is a lot more palatable. Indeed, and it also compiles ;-) I started a test run using this patch, we'll see in a few hours ... -- Mit freundlichen Gruessen, kind regards, Christian Krafft IBM Systems & Technology Group, Linux Kernel Development IT Specialist Vorsitzender des Aufsichtsrats: Martin Jetter Geschaeftsfuehrung: Herbert Kircher Sitz der Gesellschaft: Boeblingen Registriergericht: Amtsgericht Stuttgart, HRB 243294 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 18:00 ` Christian Krafft @ 2007-07-27 10:02 ` Christian Krafft 0 siblings, 0 replies; 11+ messages in thread From: Christian Krafft @ 2007-07-27 10:02 UTC (permalink / raw) To: Christian Krafft; +Cc: Trond Myklebust, Arnd Bergmann, linux-kernel [-- Attachment #1: Type: text/plain, Size: 639 bytes --] Hi Trond, On Thu, 26 Jul 2007 20:00:47 +0200 Christian Krafft <krafft@de.ibm.com> wrote: > > Indeed, and it also compiles ;-) > I started a test run using this patch, we'll see in a few hours ... > Scripts were running over night, no hang occurred. Would be great to see this patch integrated. -- Mit freundlichen Gruessen, kind regards, Christian Krafft IBM Systems & Technology Group, Linux Kernel Development IT Specialist Vorsitzender des Aufsichtsrats: Martin Jetter Geschaeftsfuehrung: Herbert Kircher Sitz der Gesellschaft: Boeblingen Registriergericht: Amtsgericht Stuttgart, HRB 243294 [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context 2007-07-26 16:08 ` Trond Myklebust 2007-07-26 18:00 ` Christian Krafft @ 2007-07-27 9:44 ` Arnd Bergmann 1 sibling, 0 replies; 11+ messages in thread From: Arnd Bergmann @ 2007-07-27 9:44 UTC (permalink / raw) To: Trond Myklebust; +Cc: Christian Krafft, linux-kernel, stable On Thursday 26 July 2007, Trond Myklebust wrote: > Really ugly. Here's an alternative that is a lot more palatable. Yes, much better than mine. > --------------------- > From: Trond Myklebust <Trond.Myklebust@netapp.com> > Date: Thu, 26 Jul 2007 12:06:17 -0400 > NFS: Fix put_nfs_open_context > > We need to grab the inode->i_lock atomically with the last reference put in > order to remove the open context that is being freed from the > nfsi->open_files list. > > Fix by converting the kref to a standard atomic counter and then using > atomic_dec_and_lock()... > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Arnd Bergmann <arnd@arndb.de> >From the bit of research I did on the bug yesterday, it seems that the race has been in there for ages, but may have become easier to hit with the change to kref after 2.6.22. I don't really understand why we didn't hit it in RHEL5/2.6.18 with the same test case, but in Fedora 7/2.6.21. Should a patch like the one below (I said like, you know you can't trust my patches for your code ;-) be put into 2.6.22.x and updated distro kernels? Arnd <>< --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -482,21 +482,19 @@ struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx) void put_nfs_open_context(struct nfs_open_context *ctx) { - if (atomic_dec_and_test(&ctx->count)) { - if (!list_empty(&ctx->list)) { - struct inode *inode = ctx->dentry->d_inode; - spin_lock(&inode->i_lock); - list_del(&ctx->list); - spin_unlock(&inode->i_lock); - } - if (ctx->state != NULL) - nfs4_close_state(ctx->state, ctx->mode); - if (ctx->cred != NULL) - put_rpccred(ctx->cred); - dput(ctx->dentry); - mntput(ctx->vfsmnt); - kfree(ctx); - } + struct inode *inode = ctx->dentry->d_inode; + + if (!atomic_dec_and_lock(&ctx->count, &inode->i_lock)) + return; + list_del(&ctx->list); + spin_unlock(&inode->i_lock); + if (ctx->state != NULL) + nfs4_close_state(ctx->state, ctx->mode); + if (ctx->cred != NULL) + put_rpccred(ctx->cred); + dput(ctx->dentry); + mntput(ctx->vfsmnt); + kfree(ctx); } /* ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-07-27 10:41 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-07-25 15:08 [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context Christian Krafft 2007-07-25 17:28 ` Trond Myklebust 2007-07-26 11:23 ` Arnd Bergmann 2007-07-26 12:37 ` Trond Myklebust 2007-07-26 12:44 ` Christian Krafft 2007-07-26 13:23 ` Trond Myklebust 2007-07-26 15:13 ` Arnd Bergmann 2007-07-26 16:08 ` Trond Myklebust 2007-07-26 18:00 ` Christian Krafft 2007-07-27 10:02 ` Christian Krafft 2007-07-27 9:44 ` Arnd Bergmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox