* [PATCH] First casuality of hlist poisoning in 2.5.70
@ 2003-06-11 23:34 Trond Myklebust
2003-06-11 23:50 ` Linus Torvalds
2003-06-12 5:22 ` Linus Torvalds
0 siblings, 2 replies; 9+ messages in thread
From: Trond Myklebust @ 2003-06-11 23:34 UTC (permalink / raw)
To: Linus Torvalds, Alexander Viro; +Cc: Linux Kernel
Hi,
This patch removes the Oops that occurs when either the source or
the target of a d_move() operation is unhashed. It is currently
triggered by the NFS sillyrename code.
Cheers,
Trond
--- linux-2.5.70-up/fs/dcache.c.orig 2003-06-07 10:17:01.000000000 -0700
+++ linux-2.5.70-up/fs/dcache.c 2003-06-11 16:11:56.000000000 -0700
@@ -1223,8 +1223,13 @@
/* Move the dentry to the target hash queue, if on different bucket */
if (dentry->d_bucket != target->d_bucket) {
dentry->d_bucket = target->d_bucket;
- hlist_del_rcu(&dentry->d_hash);
- hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
+ if (!hlist_unhashed(&dentry->d_hash))
+ hlist_del_rcu(&dentry->d_hash);
+ if (!hlist_unhashed(&target->d_hash)) {
+ hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
+ dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
+ } else
+ dentry->d_vfs_flags |= DCACHE_UNHASHED;
}
/* Unhash the target: dput() will then get rid of it */
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-11 23:34 [PATCH] First casuality of hlist poisoning in 2.5.70 Trond Myklebust @ 2003-06-11 23:50 ` Linus Torvalds 2003-06-12 0:04 ` Trond Myklebust 2003-06-12 5:22 ` Linus Torvalds 1 sibling, 1 reply; 9+ messages in thread From: Linus Torvalds @ 2003-06-11 23:50 UTC (permalink / raw) To: Trond Myklebust; +Cc: Alexander Viro, Linux Kernel On Wed, 11 Jun 2003, Trond Myklebust wrote: > > This patch removes the Oops that occurs when either the source or > the target of a d_move() operation is unhashed. It is currently > triggered by the NFS sillyrename code. Cool. The thing found something! However, I'm still a bit confused: > - hlist_del_rcu(&dentry->d_hash); > - hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); > + if (!hlist_unhashed(&dentry->d_hash)) > + hlist_del_rcu(&dentry->d_hash); > + if (!hlist_unhashed(&target->d_hash)) { > + hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); > + dentry->d_vfs_flags &= ~DCACHE_UNHASHED; > + } else > + dentry->d_vfs_flags |= DCACHE_UNHASHED; Can source or target really be validly unhashed? That makes no sense, since we just looked it up, and we've held the directory semaphores over the whole thing. In other words, I worry that the real bug is something else, and your patch makes it not oops, but hides the real problem. I'm sure you're right, but can you tell me what the sequence of events is that validly leads to this? Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-11 23:50 ` Linus Torvalds @ 2003-06-12 0:04 ` Trond Myklebust 2003-06-12 0:32 ` Linus Torvalds 0 siblings, 1 reply; 9+ messages in thread From: Trond Myklebust @ 2003-06-12 0:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Trond Myklebust, Alexander Viro, Linux Kernel >>>>> " " == Linus Torvalds <torvalds@transmeta.com> writes: >> - hlist_del_rcu(&dentry->d_hash); >> - hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); >> + if (!hlist_unhashed(&dentry->d_hash)) >> + hlist_del_rcu(&dentry->d_hash); >> + if (!hlist_unhashed(&target->d_hash)) { >> + hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); >> + dentry->d_vfs_flags &= ~DCACHE_UNHASHED; >> + } else >> + dentry->d_vfs_flags |= DCACHE_UNHASHED; > Can source or target really be validly unhashed? That makes no > sense, since we just looked it up, and we've held the directory > semaphores over the whole thing. When renaming, we may want to unhash the dentry in order to stop d_lookup()s from succeeding (Recall that cached_lookup() does not attempt to take the directory semaphore - only real_lookup() does that). AFAICS one should not rehash the dentry until after the d_move(). Does that make sense? Cheers, Trond ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-12 0:04 ` Trond Myklebust @ 2003-06-12 0:32 ` Linus Torvalds 0 siblings, 0 replies; 9+ messages in thread From: Linus Torvalds @ 2003-06-12 0:32 UTC (permalink / raw) To: Trond Myklebust; +Cc: Alexander Viro, Linux Kernel On Wed, 11 Jun 2003, Trond Myklebust wrote: > > AFAICS one should not rehash the dentry until after the d_move(). Does > that make sense? Yeah, it does seem that rehashing before actually calling d_move() means that there is a small window where another process might now come in, and use the dcache (without getting the semaphore) to see the old value of the target dentry, even though the low-level filesystem has already move the new dentry value over the target. Ie the window would be between i_op->rename(...) d_rehash(new_dentry) ... race here ... d_move(old_dentry, new_dentry) That might confuse a filesystem that expected that the target was deleted an no longer reachable by anybody. Al? What do you think? Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-11 23:34 [PATCH] First casuality of hlist poisoning in 2.5.70 Trond Myklebust 2003-06-11 23:50 ` Linus Torvalds @ 2003-06-12 5:22 ` Linus Torvalds 2003-06-12 6:04 ` Trond Myklebust ` (2 more replies) 1 sibling, 3 replies; 9+ messages in thread From: Linus Torvalds @ 2003-06-12 5:22 UTC (permalink / raw) To: Trond Myklebust; +Cc: Alexander Viro, Linux Kernel On Wed, 11 Jun 2003, Trond Myklebust wrote: > > This patch removes the Oops that occurs when either the source or > the target of a d_move() operation is unhashed. It is currently > triggered by the NFS sillyrename code. Btw, thinking more about this, the patch can't be right. > + if (!hlist_unhashed(&dentry->d_hash)) > + hlist_del_rcu(&dentry->d_hash); This leaves the d_hash pointers as crapola. > + if (!hlist_unhashed(&target->d_hash)) { > + hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); > + dentry->d_vfs_flags &= ~DCACHE_UNHASHED; > + } else > + dentry->d_vfs_flags |= DCACHE_UNHASHED; And this will fix it only if target is hashed. So either the patch depends on the target being hashed (and if so, it just shouldn't have the test, or have a BUG_ON() instead), or the patch is buggy. I have this suspicion that the logical patch would be something like this: if (hlist_unhashed(&target->d_hash)) { /* Unhash the dentry too */ __d_drop(dentry); } else { /* Rehash the dentry onto the same hash as the target */ hlist_del_rcu(&dentry->d_hash); hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); dentry->d_vfs_flags &= ~DCACHE_UNHASHED; } At least this would make slightly more sense than the patch. HOWEVER, I actually suspect that the target really _cannot_ be unhashed, and that the test makes no sense, and the sequence should just be /* Rehash the dentry onto the same hash as the target */ hlist_del_rcu(&dentry->d_hash); hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); dentry->d_vfs_flags &= ~DCACHE_UNHASHED; this is actually identical to what we have right now, except for one small detail: the current 2.5.x tree does not clear the DCACHE_UNHASHED bit in d_move(). Also, if a unhashed dentry really is ok, then I think the test for /* Move the dentry to the target hash queue, if on different bucket */ if (dentry->d_bucket != target->d_bucket) { ... should really be something like if (!d_hashed(dentry) || dentry->d_bucket != target->d_bucket) { ... because otherwise the dentry may well share the same bucket as the target, but if it wasn't hashed before, it won't be hashed after either. But I suspect that neither dentry nor target should really ever be unhashed by the time we call d_move(). That's reinforced by the fact that it looks like a unhashed dentry in d_move() would have been a silent bug previously - staying unhashed if it just shared the bucket. Al, I'll be really happy having you go over this code too. And whatever we decide is right (enforcing hashedness or whatever), we should assert it, because clearly d_move() has been a bit too subtle for us so far. Linus ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-12 5:22 ` Linus Torvalds @ 2003-06-12 6:04 ` Trond Myklebust 2003-06-12 7:30 ` Trond Myklebust 2003-06-12 16:26 ` viro 2 siblings, 0 replies; 9+ messages in thread From: Trond Myklebust @ 2003-06-12 6:04 UTC (permalink / raw) To: Linus Torvalds; +Cc: Alexander Viro, Linux Kernel >>>>> " " == Linus Torvalds <torvalds@transmeta.com> writes: > But I suspect that neither dentry nor target should really ever > be unhashed by the time we call d_move(). That's reinforced by > the fact that it looks like a unhashed dentry in d_move() would > have been a silent bug previously - staying unhashed if it just > shared the bucket. It is not a bug. Looking more carefully at the Oops, it seems that this problem is occurring inside an nfs_rename() in which the target name belongs to an open file - and thus needs to be sillyrenamed first. In that case, we certainly do not want to rehash the dentry in order to do the d_move() since that would give rise to a race: we want to do a real rename into the same dentry after we're done with the sillyrename. I can agree that the patch was flawed, but I still believe that we do need to allow d_move to work with unhashed dentries. Cheers, Trond ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-12 5:22 ` Linus Torvalds 2003-06-12 6:04 ` Trond Myklebust @ 2003-06-12 7:30 ` Trond Myklebust 2003-06-12 16:26 ` viro 2 siblings, 0 replies; 9+ messages in thread From: Trond Myklebust @ 2003-06-12 7:30 UTC (permalink / raw) To: Linus Torvalds; +Cc: Alexander Viro, Linux Kernel Note: As far as NFS is concerned, d_move() should always assume that the dentry is unhashed. Even if we ensure that we rehash, someone else could in theory trigger a call to d_revalidate() that causes the dentry to be dropped before we get to d_move(). Cheers, Trond ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-12 5:22 ` Linus Torvalds 2003-06-12 6:04 ` Trond Myklebust 2003-06-12 7:30 ` Trond Myklebust @ 2003-06-12 16:26 ` viro 2003-06-19 23:01 ` Paul E. McKenney 2 siblings, 1 reply; 9+ messages in thread From: viro @ 2003-06-12 16:26 UTC (permalink / raw) To: Linus Torvalds; +Cc: Trond Myklebust, Linux Kernel On Wed, Jun 11, 2003 at 10:22:25PM -0700, Linus Torvalds wrote: > HOWEVER, I actually suspect that the target really _cannot_ be unhashed, > and that the test makes no sense, and the sequence should just be > > /* Rehash the dentry onto the same hash as the target */ > hlist_del_rcu(&dentry->d_hash); > hlist_add_head_rcu(&dentry->d_hash, target->d_bucket); > dentry->d_vfs_flags &= ~DCACHE_UNHASHED; > But I suspect that neither dentry nor target should really ever be > unhashed by the time we call d_move(). That's reinforced by the fact that > it looks like a unhashed dentry in d_move() would have been a silent bug > previously - staying unhashed if it just shared the bucket. > Al, I'll be really happy having you go over this code too. And whatever we > decide is right (enforcing hashedness or whatever), we should assert it, > because clearly d_move() has been a bit too subtle for us so far. Sigh... The real problem is not in d_move(), but in the way NFS drops dentries. That, and the fact that we are eating the consequences of RCU use in dcache - it had predictably made the entire thing _far_ too subtle. We probably should accept that both d_move() arguments can be unhashed. After the move hashed status of source should remain as it is and victim^Wtarget should get unhashed. We _do_ need to sort out the situation with unhashing stuff in NFS - in particular, the way it deals with mountpoints and with directories is a mess. I'm looking through that code, but it's bloody slow analysis due to RCU. Premature optimizations and all such... ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] First casuality of hlist poisoning in 2.5.70 2003-06-12 16:26 ` viro @ 2003-06-19 23:01 ` Paul E. McKenney 0 siblings, 0 replies; 9+ messages in thread From: Paul E. McKenney @ 2003-06-19 23:01 UTC (permalink / raw) To: viro; +Cc: Linus Torvalds, Trond Myklebust, Linux Kernel On Thu, Jun 12, 2003 at 05:26:27PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > We _do_ need to sort out the situation with unhashing stuff in NFS - in > particular, the way it deals with mountpoints and with directories is > a mess. I'm looking through that code, but it's bloody slow analysis > due to RCU. Premature optimizations and all such... No arguments on NFS, and I have to agree that RCU has been used primarily as an optimization thus far in Linux. However, RCU can and should be considered quite early on. The situation is similar to reader-writer locking -- you are better off working out which of your locks are going to be reader-writer locks earlier rather than later. If you use either reader-writer locking or RCU as a late optimization, you will find that your earlier design decisions make your life more complex than necessary. On the other hand, if RCU is considered early, it can make life simpler. Breaking up locks often introduces deadlocks. Some of these deadlocks can be avoided entirely by thinking through the use of RCU at design time. An extreme example of deadlock avoidance may be found in IPMI (drivers/char/ipmi/ipmi_kcs_intf.c, the synchronize_kernel() near line 1174). Really tough to do this as a late optimization... Thanx, Paul ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2003-06-19 23:48 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2003-06-11 23:34 [PATCH] First casuality of hlist poisoning in 2.5.70 Trond Myklebust 2003-06-11 23:50 ` Linus Torvalds 2003-06-12 0:04 ` Trond Myklebust 2003-06-12 0:32 ` Linus Torvalds 2003-06-12 5:22 ` Linus Torvalds 2003-06-12 6:04 ` Trond Myklebust 2003-06-12 7:30 ` Trond Myklebust 2003-06-12 16:26 ` viro 2003-06-19 23:01 ` Paul E. McKenney
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.