* [PATCH][RFC] sleep_on fixes for lockd
2004-02-09 10:28 ` [PATCH][RFC] use completions instead of sleep_on for rpciod Olaf Kirch
@ 2004-02-09 11:38 ` Christoph Hellwig
2004-02-09 20:48 ` [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod trond.myklebust
2004-02-10 1:52 ` [NFS] [PATCH][RFC] use completions instead of sleep_on for rpciod Greg Banks
2 siblings, 0 replies; 8+ messages in thread
From: Christoph Hellwig @ 2004-02-09 11:38 UTC (permalink / raw)
To: Olaf Kirch; +Cc: nfs, linux-fsdevel
Here's another interesting one for you, this time in lockd:
nlmclnt_grace_wait() is completetly unused, removed.
nlmclnt_call() uses sleep_on_interruptible, but doesn't actually check
for signals, I've fixed it to use wait_event_interruptible and check
for signals, but again I'd prefer someone knowledgeable to take look
at what this actually was intended to do..
--- 1.8/fs/lockd/clntproc.c Fri Feb 7 21:25:20 2003
+++ edited/fs/lockd/clntproc.c Sat Feb 7 15:58:42 2004
@@ -195,19 +195,6 @@
}
/*
- * Wait while server is in grace period
- */
-static inline int
-nlmclnt_grace_wait(struct nlm_host *host)
-{
- if (!host->h_reclaiming)
- interruptible_sleep_on_timeout(&host->h_gracewait, 10*HZ);
- else
- interruptible_sleep_on(&host->h_gracewait);
- return signalled()? -ERESTARTSYS : 0;
-}
-
-/*
* Allocate an NLM RPC call struct
*/
struct nlm_rqst *
@@ -254,8 +241,11 @@
msg.rpc_cred = nfs_file_cred(filp);
do {
- if (host->h_reclaiming && !argp->reclaim) {
- interruptible_sleep_on(&host->h_gracewait);
+ status = wait_event_interruptible(host->h_gracewait,
+ (!host->h_reclaiming || argp->reclaim));
+ if (status) {
+ if (signalled())
+ return -EINTR;
continue;
}
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod
2004-02-09 10:28 ` [PATCH][RFC] use completions instead of sleep_on for rpciod Olaf Kirch
2004-02-09 11:38 ` [PATCH][RFC] sleep_on fixes for lockd Christoph Hellwig
@ 2004-02-09 20:48 ` trond.myklebust
2004-02-09 21:00 ` David S. Miller
2004-02-10 9:15 ` Olaf Kirch
2004-02-10 1:52 ` [NFS] [PATCH][RFC] use completions instead of sleep_on for rpciod Greg Banks
2 siblings, 2 replies; 8+ messages in thread
From: trond.myklebust @ 2004-02-09 20:48 UTC (permalink / raw)
To: Olaf Kirch, Christoph Hellwig; +Cc: netdev, nfs
P=E5 m=E5 , 09/02/2004 klokka 11:28, skreiv Olaf Kirch:
> On Sat, Feb 07, 2004 at 03:44:05PM +0100, Christoph Hellwig wrote:
> > The rpciod shutdown code gives ugly sleep_on without BKL warnings in
-mm. And it looks indeed somewhat racy.
>
> Yes, this code is indeed one of the areas that breaks when you use the
sunrpc code without the BKL. Another one was the missing spinlock in
xprt_alloc_xid. The code in fs/nfs/unlink.c is probably also racy
without BKL.
>
> So if you remove the sunrpc BKL be prepared for lots and lots
> of bug reports :)
Ahem....
xprt_alloc_xid is quite safe. It's always called under xprt->xprt_lock.
The code in unlink.c is only called from dentry_iput() or from
sillyrename(). In either case, there is no possibility for SMP races.
I'll admit that the rpciod killer code is racy, but I'm not sure that
completions are the best answer, as they have at least one nasty feature:
they are not interruptible.
Cheers,
Trond
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod
2004-02-09 20:48 ` [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod trond.myklebust
@ 2004-02-09 21:00 ` David S. Miller
2004-02-10 9:15 ` Olaf Kirch
1 sibling, 0 replies; 8+ messages in thread
From: David S. Miller @ 2004-02-09 21:00 UTC (permalink / raw)
To: trond.myklebust; +Cc: okir, hch, netdev, nfs
On Mon, 9 Feb 2004 21:48:59 +0100 (CET)
trond.myklebust@fys.uio.no wrote:
> I'll admit that the rpciod killer code is racy, but I'm not sure that
> completions are the best answer, as they have at least one nasty feature:
> they are not interruptible.
Would you like to back out the change then? I put it into
Linus's tree already.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod
2004-02-09 20:48 ` [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod trond.myklebust
2004-02-09 21:00 ` David S. Miller
@ 2004-02-10 9:15 ` Olaf Kirch
2004-02-10 11:47 ` trond.myklebust
1 sibling, 1 reply; 8+ messages in thread
From: Olaf Kirch @ 2004-02-10 9:15 UTC (permalink / raw)
To: trond.myklebust; +Cc: Christoph Hellwig, netdev, nfs
On Mon, Feb 09, 2004 at 09:48:59PM +0100, Trond Myklebust wrote:
> > So if you remove the sunrpc BKL be prepared for lots and lots
> > of bug reports :)
>
> Ahem....
>
> xprt_alloc_xid is quite safe. It's always called under xprt->xprt_lock.
It wasn't in 2.4.19 or whenever I ran into this. And a per-transport
lock isn't enough to protect a global variable shared by all transports.
Trust me: I spent several weeks debugging this particular problem :)
(See the PS for a somewhat lengthy description of what I remember)
I think that BKL removal at this point is probably not a good idea. Suse
had a BKL removal patch in their SLES8 kernel, and I spent a total of 4-5
weeks hunting down all sorts of bugs related to SMP races. And I don't
think we've seen all of these bugs yet. It might be better to postpone
this for the 2.7 branch.
> The code in unlink.c is only called from dentry_iput() or from
> sillyrename(). In either case, there is no possibility for SMP races.
Are you sure? nfs_dentry_iput explicitly does a lock_kernel before calling
nfs_complete_unlink. I think that's because dentry_iput explicitly drops
the dcache_lock before calling dentry->d_op->d_iput. I wouldn't remove
that BKL :)
And even if you leave that BKL and just remove the one in rpciod, you may
still have a race between rpciod and dput. One CPU may be adding entries
inside nfs_async_unlink (called from dentry_iput()), and at the same
time rpciod can run on another CPU, calling nfs_async_unlink_release,
removing stuff from the nfs_deletes list. The reference counting on
nfs_unlinkdata isn't atomic either.
Olaf
PS: Let me try to put the pieces back together... this was on a PPC
SMP system (so it can take a while for modified values to be
be visible on other CPUs). There were several mounts being stressed
in parallel.
xprt_request_init did a simple "req->rq_xid = xid++;" without any
additional spin locks protecting the xid variable (we have that
spin lock now).
The bug required at least 3 CPUs, and 3 different mounts.
CPU0 CPU1 CPU2
--------------------------------------------------------
| lock xprt1 lock xprt2
| lock xprt3
| read xid=0
| read xid=0
| write xid=1
read xid=1
time++ write xid=2
write xid=1
| unlock xprt3
| lock xprt3
| read xid=1
| write xid=2
v unlock xprt3
Note that on PPC, modified data can take a long time to become
visible on other CPUs. Cache sync is only guaranteed by operations
such as spin_unlock etc.
Olaf
--
Olaf Kirch | Stop wasting entropy - start using predictable
okir@suse.de | tempfile names today!
---------------+
^ permalink raw reply [flat|nested] 8+ messages in thread* Re: [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod
2004-02-10 9:15 ` Olaf Kirch
@ 2004-02-10 11:47 ` trond.myklebust
0 siblings, 0 replies; 8+ messages in thread
From: trond.myklebust @ 2004-02-10 11:47 UTC (permalink / raw)
To: Olaf Kirch; +Cc: Christoph Hellwig, netdev, nfs
P=E5 ty , 10/02/2004 klokka 10:15, skreiv Olaf Kirch:
> It wasn't in 2.4.19 or whenever I ran into this. And a per-transport
lock isn't enough to protect a global variable shared by all transports.
Trust me: I spent several weeks debugging this particular problem :)
(See the PS for a somewhat lengthy description of what I remember)
Duh... I forgot that the XID was still global...
Does anybody see any problems with moving the XID counter into the struct
xprt? Please bear in mind that the replay caches on modern NFS servers
look at XID+client IP address+port number, so 2 sockets that play the sam=
e
XID to the server should not normally conflict.
> I think that BKL removal at this point is probably not a good idea. Sus=
e
had a BKL removal patch in their SLES8 kernel, and I spent a total of
4-5
> weeks hunting down all sorts of bugs related to SMP races. And I don't
think we've seen all of these bugs yet. It might be better to postpone
this for the 2.7 branch.
I totally agree with this...
No extra work has been done on BKL removal in the 2.6 branch, and I know
that the old patch that was up for testing in the 2.4 tree was buggy in
several ways...
> > The code in unlink.c is only called from dentry_iput() or from
sillyrename(). In either case, there is no possibility for SMP races.
>
> Are you sure? nfs_dentry_iput explicitly does a lock_kernel before
calling
> nfs_complete_unlink. I think that's because dentry_iput explicitly drop=
s
the dcache_lock before calling dentry->d_op->d_iput. I wouldn't remove
that BKL :)
Yep, but there is only one inode per dentry, and the VFS rules forbids yo=
u
to convert a negative dentry into a positive one (the contrary is allowed=
,
but only under certain restricted situations).
> And even if you leave that BKL and just remove the one in rpciod, you
may
> still have a race between rpciod and dput. One CPU may be adding entrie=
s
inside nfs_async_unlink (called from dentry_iput()), and at the same
time rpciod can run on another CPU, calling nfs_async_unlink_release,
removing stuff from the nfs_deletes list. The reference counting on
nfs_unlinkdata isn't atomic either.
Sure... The unlink.c global lists definitely depend on the BKL for
protection. The latter would need to be replaced by a new global
spinlock.
Cheers,
Trond
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [NFS] [PATCH][RFC] use completions instead of sleep_on for rpciod
2004-02-09 10:28 ` [PATCH][RFC] use completions instead of sleep_on for rpciod Olaf Kirch
2004-02-09 11:38 ` [PATCH][RFC] sleep_on fixes for lockd Christoph Hellwig
2004-02-09 20:48 ` [NFS] [PATCH][RFC] use completions instead of sleep_on forrpciod trond.myklebust
@ 2004-02-10 1:52 ` Greg Banks
2004-02-10 9:18 ` Olaf Kirch
2 siblings, 1 reply; 8+ messages in thread
From: Greg Banks @ 2004-02-10 1:52 UTC (permalink / raw)
To: Olaf Kirch; +Cc: Christoph Hellwig, netdev, nfs
Olaf Kirch wrote:
>
> [...] The code in fs/nfs/unlink.c is probably also racy
> without BKL.
I had cause to poke around there recently, and it's *definitely* racy
without BKL. That file would benefit from several modernisations:
1. move the list of nfs_unlinkdata from a global to a field in nfs_server
2. add a lock in nfs_server for it instead of implicit BKL in
nfs_complete_unlink, nfs_async_unlink, nfs_put_unlinkdata
3. use proper list.h lists
> So if you remove the sunrpc BKL be prepared for lots and lots
> of bug reports :)
Yes but it's worth doing, especially serverside.
Greg.
--
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.
^ permalink raw reply [flat|nested] 8+ messages in thread