* [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
@ 2007-07-25 15:08 Christian Krafft
2007-07-25 17:28 ` Trond Myklebust
0 siblings, 1 reply; 11+ messages in thread
From: Christian Krafft @ 2007-07-25 15:08 UTC (permalink / raw)
To: linux-kernel; +Cc: krafft
[-- Attachment #1: Type: text/plain, Size: 4635 bytes --]
From: Christian Krafft <krafft@de.ibm.com>
While running some tests on a server using NFS root I found some
locking problems.
The scripts I am using can be found under at
http://localhorst.gotdns.org/parabelboi/nfs/
Please note, I am using rpm to generate read load. I found no better way
to generate that much stress on the read path ;-)
It's a two way machine btw, with mount options tcp,nolock the problem
doesn't occur (probably due to a different timing).
When running write_back.sh six times concurrent with one read.sh,
the processes will be locked within a few minutes.
The problem occurs at least on 2.6.22 and 2.6.23-rc1.
It shows up like this:
Badness at lib/kref.c:33
NIP: c00000000018b734 LR: c00000000014c37c CTR: c0000000001566f0
REGS: c00000007aebf160 TRAP: 0700 Not tainted (2.6.23-rc1)
MSR: 9000000000029032 <EE,ME,IR,DR> CR: 84000422 XER: 00000000
TASK = c00000007df7d530[30769] 'sync' THREAD: c00000007aebc000 CPU: 3
NIP [c00000000018b734] .kref_get+0xc/0x28
LR [c00000000014c37c] .get_nfs_open_context+0x1c/0x38
Call Trace:
[c00000007aebf3e0] [c00000000033d9f4] ._spin_lock+0x10/0x24 (unreliable)
[c00000007aebf460] [c00000000014cf40] .nfs_find_open_context+0x68/0xcc
[c00000007aebf500] [c000000000156654] .nfs_writepage_locked+0x164/0x200
[c00000007aebf600] [c00000000015670c] .nfs_writepage+0x1c/0x48
[c00000007aebf690] [c00000000008e53c] .__writepage+0x3c/0x84
[c00000007aebf710] [c00000000008ed84] .write_cache_pages+0x238/0x3f4
[c00000007aebf880] [c000000000155840] .nfs_writepages+0x7c/0xc0
[c00000007aebf970] [c00000000008efe0] .do_writepages+0x68/0xa4
[c00000007aebf9f0] [c0000000000e0290] .__writeback_single_inode+0x1bc/0x3b8
[c00000007aebfae0] [c0000000000e0984] .sync_sb_inodes+0x20c/0x308
[c00000007aebfb90] [c0000000000e0b50] .sync_inodes_sb+0xd0/0x100
[c00000007aebfc80] [c0000000000e0c14] .__sync_inodes+0x94/0x120
nstruction dump:
4e800421 e8410028 38000001 38210080 7c030378 e8010010 ebc1fff0 7c0803a6
4e800020 80030000 7c000034 5400d97e <0b000000> 7c001828 30000001 7c00192d
Unable to handle kernel paging request for data at address 0x00200200
Faulting instruction address: 0xc000000000193474
cpu 0x0: Vector: 300 (Data Access) at [c00000000ff4b5b0]
pc: c000000000193474: .list_del+0x20/0xa4
lr: c00000000014c2e8: .nfs_free_open_context+0x4c/0xc4
sp: c00000000ff4b830
msr: 9000000000009032
dar: 200200
dsisr: 40000000
current = 0xc00000000ff0a230
paca = 0xc00000000047e900
pid = 506, comm = rpciod/0
enter ? for help
[c00000000ff4b8b0] c00000000014c2e8 .nfs_free_open_context+0x4c/0xc4
[c00000000ff4b940] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4b9c0] c00000000014c1dc .put_nfs_open_context+0x1c/0x34
[c00000000ff4ba40] c000000000150ec0 .nfs_free_request+0x2c/0x5c
[c00000000ff4bad0] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4bb50] c000000000150e2c .nfs_release_request+0x20/0x38
[c00000000ff4bbd0] c00000000015750c .nfs_writeback_done_partial+0x270/0x2c8
[c00000000ff4bc80] c0000000003274fc .rpc_exit_task+0x5c/0xa0
Obviously the locking code in nfs_free_open_context is wrong.
Checking the list for entries and removing the entry should be an atomic operation.
Also list_del_init should be used, because list_del will leave the empty list in
an undefined condition.
After applying the patch the scripts run a bit longer, but another error occurs :-/
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Index: linux-2.6.23-rc1/fs/nfs/inode.c
===================================================================
--- linux-2.6.23-rc1.orig/fs/nfs/inode.c
+++ linux-2.6.23-rc1/fs/nfs/inode.c
@@ -484,13 +484,13 @@ static void nfs_free_open_context(struct
{
struct nfs_open_context *ctx = container_of(kref,
struct nfs_open_context, kref);
+ struct inode *inode = ctx->path.dentry->d_inode;
+
+ spin_lock(&inode->i_lock);
+ if (!list_empty(&ctx->list))
+ list_del_init(&ctx->list);
+ spin_unlock(&inode->i_lock);
- if (!list_empty(&ctx->list)) {
- struct inode *inode = ctx->path.dentry->d_inode;
- spin_lock(&inode->i_lock);
- list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
- }
if (ctx->state != NULL)
nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
if (ctx->cred != NULL)
--
Mit freundlichen Gruessen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registriergericht: Amtsgericht Stuttgart, HRB 243294
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-25 15:08 [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context Christian Krafft
@ 2007-07-25 17:28 ` Trond Myklebust
2007-07-26 11:23 ` Arnd Bergmann
0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2007-07-25 17:28 UTC (permalink / raw)
To: Christian Krafft; +Cc: linux-kernel
On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
> Obviously the locking code in nfs_free_open_context is wrong.
> Checking the list for entries and removing the entry should be an atomic operation.
Wrong. It is quite safe to test the structure member ctx->list for
emptiness outside the spinlock because we have an explicit guarantee
that nobody else has a reference to this structure, plus the
atomic_dec_and_test() in kref_put() has acted as a memory barrier for
us.
> Also list_del_init should be used, because list_del will leave the empty list in
> an undefined condition.
Huh? We're freeing the context. It will _never_ _ever_ _ever_ be used
again. If anything tries to use ctx->list after this, then that is a
major bug.
Patch rejected...
Trond
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-25 17:28 ` Trond Myklebust
@ 2007-07-26 11:23 ` Arnd Bergmann
2007-07-26 12:37 ` Trond Myklebust
2007-07-26 12:44 ` Christian Krafft
0 siblings, 2 replies; 11+ messages in thread
From: Arnd Bergmann @ 2007-07-26 11:23 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Christian Krafft, linux-kernel
On Wednesday 25 July 2007, Trond Myklebust wrote:
>
> On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
>
> > Obviously the locking code in nfs_free_open_context is wrong.
> > Checking the list for entries and removing the entry should be an atomic operation.
>
> Wrong. It is quite safe to test the structure member ctx->list for
> emptiness outside the spinlock because we have an explicit guarantee
> that nobody else has a reference to this structure, plus the
> atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> us.
Well, the real question then is how the ctx can still be present in the
nfsi->open_files list. Since we are in nfs_free_open_context(), there
must not be any pointer to the ctx anywhere, but still we have this other
thread calling get_nfs_open_context() on it.
Arnd <><
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 11:23 ` Arnd Bergmann
@ 2007-07-26 12:37 ` Trond Myklebust
2007-07-26 12:44 ` Christian Krafft
1 sibling, 0 replies; 11+ messages in thread
From: Trond Myklebust @ 2007-07-26 12:37 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: Christian Krafft, linux-kernel
On Thu, 2007-07-26 at 13:23 +0200, Arnd Bergmann wrote:
> On Wednesday 25 July 2007, Trond Myklebust wrote:
> >
> > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
> >
> > > Obviously the locking code in nfs_free_open_context is wrong.
> > > Checking the list for entries and removing the entry should be an atomic operation.
> >
> > Wrong. It is quite safe to test the structure member ctx->list for
> > emptiness outside the spinlock because we have an explicit guarantee
> > that nobody else has a reference to this structure, plus the
> > atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> > us.
>
> Well, the real question then is how the ctx can still be present in the
> nfsi->open_files list. Since we are in nfs_free_open_context(), there
> must not be any pointer to the ctx anywhere, but still we have this other
> thread calling get_nfs_open_context() on it.
Yup. That is definitely a bug. I wish we had a 'kref_put_and_lock' to
deal with these situations where you want to grab a lock atomically with
the last put. It would make krefs a lot more useful...
Trond
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 11:23 ` Arnd Bergmann
2007-07-26 12:37 ` Trond Myklebust
@ 2007-07-26 12:44 ` Christian Krafft
2007-07-26 13:23 ` Trond Myklebust
1 sibling, 1 reply; 11+ messages in thread
From: Christian Krafft @ 2007-07-26 12:44 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: Trond Myklebust, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 6167 bytes --]
On Thu, 26 Jul 2007 13:23:37 +0200
Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 25 July 2007, Trond Myklebust wrote:
> >
> > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
> >
> > > Obviously the locking code in nfs_free_open_context is wrong.
> > > Checking the list for entries and removing the entry should be an atomic operation.
> >
> > Wrong. It is quite safe to test the structure member ctx->list for
> > emptiness outside the spinlock because we have an explicit guarantee
> > that nobody else has a reference to this structure, plus the
> > atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> > us.
>
> Well, the real question then is how the ctx can still be present in the
> nfsi->open_files list. Since we are in nfs_free_open_context(), there
> must not be any pointer to the ctx anywhere, but still we have this other
> thread calling get_nfs_open_context() on it.
>
> Arnd <><
Thanks for the pointer Arnd,
Trond, does the patch below look better to you ?
Note:
For now the list_del_init is needed, so that the BUG_ON(!list_empty) works correctly.
(I have no clue why list_empty should not be working, when the list has been poisoned by a list_del.)
I am running the stress tests now. It seems to work so far.
---
Subject: nfs: fix locking in nfs/inode.c in nfs_free_open_context
From: Christian Krafft <krafft@de.ibm.com>
While running some tests on a server using NFS root I found some
locking problems.
The scripts I am using can be found under at
http://localhorst.gotdns.org/parabelboi/nfs/
Please note, I am using rpm to generate read load. I found no better way
to generate that much stress on the read path ;-)
It's a two way machine btw, with mount options tcp,nolock the problem
doesn't occur (probably due to a different timing).
When running write_back.sh six times concurrent with one read.sh,
the processes will be locked within a few minutes.
The problem occurs at least on 2.6.22 and 2.6.23-rc1.
It shows up like this:
Badness at lib/kref.c:33
NIP: c00000000018b734 LR: c00000000014c37c CTR: c0000000001566f0
REGS: c00000007aebf160 TRAP: 0700 Not tainted (2.6.23-rc1)
MSR: 9000000000029032 <EE,ME,IR,DR> CR: 84000422 XER: 00000000
TASK = c00000007df7d530[30769] 'sync' THREAD: c00000007aebc000 CPU: 3
NIP [c00000000018b734] .kref_get+0xc/0x28
LR [c00000000014c37c] .get_nfs_open_context+0x1c/0x38
Call Trace:
[c00000007aebf3e0] [c00000000033d9f4] ._spin_lock+0x10/0x24 (unreliable)
[c00000007aebf460] [c00000000014cf40] .nfs_find_open_context+0x68/0xcc
[c00000007aebf500] [c000000000156654] .nfs_writepage_locked+0x164/0x200
[c00000007aebf600] [c00000000015670c] .nfs_writepage+0x1c/0x48
[c00000007aebf690] [c00000000008e53c] .__writepage+0x3c/0x84
[c00000007aebf710] [c00000000008ed84] .write_cache_pages+0x238/0x3f4
[c00000007aebf880] [c000000000155840] .nfs_writepages+0x7c/0xc0
[c00000007aebf970] [c00000000008efe0] .do_writepages+0x68/0xa4
[c00000007aebf9f0] [c0000000000e0290] .__writeback_single_inode+0x1bc/0x3b8
[c00000007aebfae0] [c0000000000e0984] .sync_sb_inodes+0x20c/0x308
[c00000007aebfb90] [c0000000000e0b50] .sync_inodes_sb+0xd0/0x100
[c00000007aebfc80] [c0000000000e0c14] .__sync_inodes+0x94/0x120
nstruction dump:
4e800421 e8410028 38000001 38210080 7c030378 e8010010 ebc1fff0 7c0803a6
4e800020 80030000 7c000034 5400d97e <0b000000> 7c001828 30000001 7c00192d
Unable to handle kernel paging request for data at address 0x00200200
Faulting instruction address: 0xc000000000193474
cpu 0x0: Vector: 300 (Data Access) at [c00000000ff4b5b0]
pc: c000000000193474: .list_del+0x20/0xa4
lr: c00000000014c2e8: .nfs_free_open_context+0x4c/0xc4
sp: c00000000ff4b830
msr: 9000000000009032
dar: 200200
dsisr: 40000000
current = 0xc00000000ff0a230
paca = 0xc00000000047e900
pid = 506, comm = rpciod/0
enter ? for help
[c00000000ff4b8b0] c00000000014c2e8 .nfs_free_open_context+0x4c/0xc4
[c00000000ff4b940] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4b9c0] c00000000014c1dc .put_nfs_open_context+0x1c/0x34
[c00000000ff4ba40] c000000000150ec0 .nfs_free_request+0x2c/0x5c
[c00000000ff4bad0] c00000000018b708 .kref_put+0x74/0x94
[c00000000ff4bb50] c000000000150e2c .nfs_release_request+0x20/0x38
[c00000000ff4bbd0] c00000000015750c .nfs_writeback_done_partial+0x270/0x2c8
[c00000000ff4bc80] c0000000003274fc .rpc_exit_task+0x5c/0xa0
Obviously the locking code in nfs_free_open_context is wrong.
Checking the list for entries and removing the entry should be an atomic operation.
Also list_del_init should be used, because list_del will leave the empty list in
an undefined condition.
After applying the patch the scripts run a bit longer, but another error occurs :-/
Signed-off-by: Christian Krafft <krafft@de.ibm.com>
Index: linux-2.6.23-rc1/fs/nfs/inode.c
===================================================================
--- linux-2.6.23-rc1.orig/fs/nfs/inode.c
+++ linux-2.6.23-rc1/fs/nfs/inode.c
@@ -485,12 +485,8 @@ static void nfs_free_open_context(struct
struct nfs_open_context *ctx = container_of(kref,
struct nfs_open_context, kref);
- if (!list_empty(&ctx->list)) {
- struct inode *inode = ctx->path.dentry->d_inode;
- spin_lock(&inode->i_lock);
- list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
- }
+ BUG_ON(!list_empty(&ctx->list));
+
if (ctx->state != NULL)
nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
if (ctx->cred != NULL)
@@ -549,7 +545,7 @@ static void nfs_file_clear_open_context(
if (ctx) {
filp->private_data = NULL;
spin_lock(&inode->i_lock);
- list_move_tail(&ctx->list, &NFS_I(inode)->open_files);
+ list_del_init(&ctx->list);
spin_unlock(&inode->i_lock);
put_nfs_open_context(ctx);
}
--
Mit freundlichen Gruessen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registriergericht: Amtsgericht Stuttgart, HRB 243294
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 12:44 ` Christian Krafft
@ 2007-07-26 13:23 ` Trond Myklebust
2007-07-26 15:13 ` Arnd Bergmann
0 siblings, 1 reply; 11+ messages in thread
From: Trond Myklebust @ 2007-07-26 13:23 UTC (permalink / raw)
To: Christian Krafft; +Cc: Arnd Bergmann, linux-kernel
On Thu, 2007-07-26 at 14:44 +0200, Christian Krafft wrote:
> On Thu, 26 Jul 2007 13:23:37 +0200
> Arnd Bergmann <arnd@arndb.de> wrote:
>
> > On Wednesday 25 July 2007, Trond Myklebust wrote:
> > >
> > > On Wed, 2007-07-25 at 17:08 +0200, Christian Krafft wrote:
> > >
> > > > Obviously the locking code in nfs_free_open_context is wrong.
> > > > Checking the list for entries and removing the entry should be an atomic operation.
> > >
> > > Wrong. It is quite safe to test the structure member ctx->list for
> > > emptiness outside the spinlock because we have an explicit guarantee
> > > that nobody else has a reference to this structure, plus the
> > > atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> > > us.
> >
> > Well, the real question then is how the ctx can still be present in the
> > nfsi->open_files list. Since we are in nfs_free_open_context(), there
> > must not be any pointer to the ctx anywhere, but still we have this other
> > thread calling get_nfs_open_context() on it.
> >
> > Arnd <><
>
> Thanks for the pointer Arnd,
>
> Trond, does the patch below look better to you ?
No. That is still incorrect. The list of open contexts is used for
things like NFSv4 state recovery (when we're doing background writes,
and the server happens to reboot). The lifetime of the open context may
exceed that of the struct file that created it.
Trond
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 13:23 ` Trond Myklebust
@ 2007-07-26 15:13 ` Arnd Bergmann
2007-07-26 16:08 ` Trond Myklebust
0 siblings, 1 reply; 11+ messages in thread
From: Arnd Bergmann @ 2007-07-26 15:13 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Christian Krafft, linux-kernel
On Thursday 26 July 2007, Trond Myklebust wrote:
>
> > > > Wrong. It is quite safe to test the structure member ctx->list for
> > > > emptiness outside the spinlock because we have an explicit guarantee
> > > > that nobody else has a reference to this structure, plus the
> > > > atomic_dec_and_test() in kref_put() has acted as a memory barrier for
> > > > us
> > >
> > > Well, the real question then is how the ctx can still be present in the
> > > nfsi->open_files list. Since we are in nfs_free_open_context(), there
> > > must not be any pointer to the ctx anywhere, but still we have this other
> > > thread calling get_nfs_open_context() on it
> >
> No. That is still incorrect. The list of open contexts is used for
> things like NFSv4 state recovery (when we're doing background writes,
> and the server happens to reboot). The lifetime of the open context may
> exceed that of the struct file that created it.
Unfortunately, you didn't answer my question. The observed problem is
that the final kref_put gets called at a time where there are still
references to the context in nfsi->open_files, and other threads
therefore do get at them.
The list_empty()/list_del() in nfs_free_open_context is bogus, because
as you say, there is a guarantee that there is no other reference on
the structure, and I take it that ought to include the ->open_files
list.
I would guess that a quick fix for this looks something like the
patch below. The race is in put_nfs_open_context() between the point
where we check the reference count and the point where the context
gets removed from the open_files list. If another thread searches
the list between these points and grabs a new reference, we die.
The patch holds the i_lock around the kref_put to prevent others
from searching the list. Ugly, I know, but it seems that's the
price you pay for using a kref in such unconventional ways, i.e.
not counting every reference.
Arnd <><
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -487,10 +487,10 @@ static void nfs_free_open_context(struct kref *kref)
if (!list_empty(&ctx->list)) {
struct inode *inode = ctx->path.dentry->d_inode;
- spin_lock(&inode->i_lock);
list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
}
+ spin_unlock(&inode->i_lock);
+
if (ctx->state != NULL)
nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
if (ctx->cred != NULL)
@@ -498,11 +498,15 @@ static void nfs_free_open_context(struct kref *kref)
dput(ctx->path.dentry);
mntput(ctx->path.mnt);
kfree(ctx);
+
+ spin_lock(&inode->i_lock);
}
void put_nfs_open_context(struct nfs_open_context *ctx)
{
+ spin_lock(&inode->i_lock);
kref_put(&ctx->kref, nfs_free_open_context);
+ spin_unlock(&inode->i_lock);
}
/*
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 15:13 ` Arnd Bergmann
@ 2007-07-26 16:08 ` Trond Myklebust
2007-07-26 18:00 ` Christian Krafft
2007-07-27 9:44 ` Arnd Bergmann
0 siblings, 2 replies; 11+ messages in thread
From: Trond Myklebust @ 2007-07-26 16:08 UTC (permalink / raw)
To: Arnd Bergmann; +Cc: Christian Krafft, linux-kernel
On Thu, 2007-07-26 at 17:13 +0200, Arnd Bergmann wrote:
> Unfortunately, you didn't answer my question. The observed problem is
> that the final kref_put gets called at a time where there are still
> references to the context in nfsi->open_files, and other threads
> therefore do get at them.
Actually, I thought I did: I said we need to grab the lock atomically
with the last put. See below.
> The patch holds the i_lock around the kref_put to prevent others
> from searching the list. Ugly, I know, but it seems that's the
> price you pay for using a kref in such unconventional ways, i.e.
> not counting every reference.
Really ugly. Here's an alternative that is a lot more palatable.
Trond
---------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Thu, 26 Jul 2007 12:06:17 -0400
NFS: Fix put_nfs_open_context
We need to grab the inode->i_lock atomically with the last reference put in
order to remove the open context that is being freed from the
nfsi->open_files list.
Fix by converting the kref to a standard atomic counter and then using
atomic_dec_and_lock()...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---
fs/nfs/inode.c | 24 ++++++++----------------
include/linux/nfs_fs.h | 2 +-
2 files changed, 9 insertions(+), 17 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bca6cdc..71a49c3 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -468,7 +468,7 @@ static struct nfs_open_context *alloc_nfs_open_context(struct vfsmount *mnt, str
ctx->lockowner = current->files;
ctx->error = 0;
ctx->dir_cookie = 0;
- kref_init(&ctx->kref);
+ atomic_set(&ctx->count, 1);
}
return ctx;
}
@@ -476,21 +476,18 @@ static struct nfs_open_context *alloc_nfs_open_context(struct vfsmount *mnt, str
struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx)
{
if (ctx != NULL)
- kref_get(&ctx->kref);
+ atomic_inc(&ctx->count);
return ctx;
}
-static void nfs_free_open_context(struct kref *kref)
+void put_nfs_open_context(struct nfs_open_context *ctx)
{
- struct nfs_open_context *ctx = container_of(kref,
- struct nfs_open_context, kref);
+ struct inode *inode = ctx->path.dentry->d_inode;
- if (!list_empty(&ctx->list)) {
- struct inode *inode = ctx->path.dentry->d_inode;
- spin_lock(&inode->i_lock);
- list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
- }
+ if (!atomic_dec_and_lock(&ctx->count, &inode->i_lock))
+ return;
+ list_del(&ctx->list);
+ spin_unlock(&inode->i_lock);
if (ctx->state != NULL)
nfs4_close_state(&ctx->path, ctx->state, ctx->mode);
if (ctx->cred != NULL)
@@ -500,11 +497,6 @@ static void nfs_free_open_context(struct kref *kref)
kfree(ctx);
}
-void put_nfs_open_context(struct nfs_open_context *ctx)
-{
- kref_put(&ctx->kref, nfs_free_open_context);
-}
-
/*
* Ensure that mmap has a recent RPC credential for use when writing out
* shared pages
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 9ba4aec..157dcb0 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -71,7 +71,7 @@ struct nfs_access_entry {
struct nfs4_state;
struct nfs_open_context {
- struct kref kref;
+ atomic_t count;
struct path path;
struct rpc_cred *cred;
struct nfs4_state *state;
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 16:08 ` Trond Myklebust
@ 2007-07-26 18:00 ` Christian Krafft
2007-07-27 10:02 ` Christian Krafft
2007-07-27 9:44 ` Arnd Bergmann
1 sibling, 1 reply; 11+ messages in thread
From: Christian Krafft @ 2007-07-26 18:00 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Arnd Bergmann, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1259 bytes --]
On Thu, 26 Jul 2007 12:08:17 -0400
Trond Myklebust <trond.myklebust@fys.uio.no> wrote:
> On Thu, 2007-07-26 at 17:13 +0200, Arnd Bergmann wrote:
> > Unfortunately, you didn't answer my question. The observed problem is
> > that the final kref_put gets called at a time where there are still
> > references to the context in nfsi->open_files, and other threads
> > therefore do get at them.
>
> Actually, I thought I did: I said we need to grab the lock atomically
> with the last put. See below.
>
> > The patch holds the i_lock around the kref_put to prevent others
> > from searching the list. Ugly, I know, but it seems that's the
> > price you pay for using a kref in such unconventional ways, i.e.
> > not counting every reference.
>
> Really ugly. Here's an alternative that is a lot more palatable.
Indeed, and it also compiles ;-)
I started a test run using this patch, we'll see in a few hours ...
--
Mit freundlichen Gruessen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registriergericht: Amtsgericht Stuttgart, HRB 243294
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 16:08 ` Trond Myklebust
2007-07-26 18:00 ` Christian Krafft
@ 2007-07-27 9:44 ` Arnd Bergmann
1 sibling, 0 replies; 11+ messages in thread
From: Arnd Bergmann @ 2007-07-27 9:44 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Christian Krafft, linux-kernel, stable
On Thursday 26 July 2007, Trond Myklebust wrote:
> Really ugly. Here's an alternative that is a lot more palatable.
Yes, much better than mine.
> ---------------------
> From: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Thu, 26 Jul 2007 12:06:17 -0400
> NFS: Fix put_nfs_open_context
>
> We need to grab the inode->i_lock atomically with the last reference put in
> order to remove the open context that is being freed from the
> nfsi->open_files list.
>
> Fix by converting the kref to a standard atomic counter and then using
> atomic_dec_and_lock()...
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
>From the bit of research I did on the bug yesterday, it seems
that the race has been in there for ages, but may have become easier
to hit with the change to kref after 2.6.22. I don't really
understand why we didn't hit it in RHEL5/2.6.18 with the same
test case, but in Fedora 7/2.6.21.
Should a patch like the one below (I said like, you know you
can't trust my patches for your code ;-) be put into 2.6.22.x
and updated distro kernels?
Arnd <><
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -482,21 +482,19 @@ struct nfs_open_context *get_nfs_open_context(struct nfs_open_context *ctx)
void put_nfs_open_context(struct nfs_open_context *ctx)
{
- if (atomic_dec_and_test(&ctx->count)) {
- if (!list_empty(&ctx->list)) {
- struct inode *inode = ctx->dentry->d_inode;
- spin_lock(&inode->i_lock);
- list_del(&ctx->list);
- spin_unlock(&inode->i_lock);
- }
- if (ctx->state != NULL)
- nfs4_close_state(ctx->state, ctx->mode);
- if (ctx->cred != NULL)
- put_rpccred(ctx->cred);
- dput(ctx->dentry);
- mntput(ctx->vfsmnt);
- kfree(ctx);
- }
+ struct inode *inode = ctx->dentry->d_inode;
+
+ if (!atomic_dec_and_lock(&ctx->count, &inode->i_lock))
+ return;
+ list_del(&ctx->list);
+ spin_unlock(&inode->i_lock);
+ if (ctx->state != NULL)
+ nfs4_close_state(ctx->state, ctx->mode);
+ if (ctx->cred != NULL)
+ put_rpccred(ctx->cred);
+ dput(ctx->dentry);
+ mntput(ctx->vfsmnt);
+ kfree(ctx);
}
/*
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context
2007-07-26 18:00 ` Christian Krafft
@ 2007-07-27 10:02 ` Christian Krafft
0 siblings, 0 replies; 11+ messages in thread
From: Christian Krafft @ 2007-07-27 10:02 UTC (permalink / raw)
To: Christian Krafft; +Cc: Trond Myklebust, Arnd Bergmann, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 639 bytes --]
Hi Trond,
On Thu, 26 Jul 2007 20:00:47 +0200
Christian Krafft <krafft@de.ibm.com> wrote:
>
> Indeed, and it also compiles ;-)
> I started a test run using this patch, we'll see in a few hours ...
>
Scripts were running over night, no hang occurred.
Would be great to see this patch integrated.
--
Mit freundlichen Gruessen,
kind regards,
Christian Krafft
IBM Systems & Technology Group,
Linux Kernel Development
IT Specialist
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Herbert Kircher
Sitz der Gesellschaft: Boeblingen
Registriergericht: Amtsgericht Stuttgart, HRB 243294
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2007-07-27 10:41 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-25 15:08 [patch] nfs: fix locking in nfs/inode.c in nfs_free_open_context Christian Krafft
2007-07-25 17:28 ` Trond Myklebust
2007-07-26 11:23 ` Arnd Bergmann
2007-07-26 12:37 ` Trond Myklebust
2007-07-26 12:44 ` Christian Krafft
2007-07-26 13:23 ` Trond Myklebust
2007-07-26 15:13 ` Arnd Bergmann
2007-07-26 16:08 ` Trond Myklebust
2007-07-26 18:00 ` Christian Krafft
2007-07-27 10:02 ` Christian Krafft
2007-07-27 9:44 ` Arnd Bergmann
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox