All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] First casuality of hlist poisoning in 2.5.70
@ 2003-06-11 23:34 Trond Myklebust
  2003-06-11 23:50 ` Linus Torvalds
  2003-06-12  5:22 ` Linus Torvalds
  0 siblings, 2 replies; 9+ messages in thread
From: Trond Myklebust @ 2003-06-11 23:34 UTC (permalink / raw)
  To: Linus Torvalds, Alexander Viro; +Cc: Linux Kernel


Hi,

  This patch removes the Oops that occurs when either the source or
the target of a d_move() operation is unhashed. It is currently
triggered by the NFS sillyrename code.

Cheers,
  Trond

--- linux-2.5.70-up/fs/dcache.c.orig	2003-06-07 10:17:01.000000000 -0700
+++ linux-2.5.70-up/fs/dcache.c	2003-06-11 16:11:56.000000000 -0700
@@ -1223,8 +1223,13 @@
 	/* Move the dentry to the target hash queue, if on different bucket */
 	if (dentry->d_bucket != target->d_bucket) {
 		dentry->d_bucket = target->d_bucket;
-		hlist_del_rcu(&dentry->d_hash);
-		hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
+		if (!hlist_unhashed(&dentry->d_hash))
+			hlist_del_rcu(&dentry->d_hash);
+		if (!hlist_unhashed(&target->d_hash)) {
+			hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
+			dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
+		} else
+			dentry->d_vfs_flags |= DCACHE_UNHASHED;
 	}
 
 	/* Unhash the target: dput() will then get rid of it */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-11 23:34 [PATCH] First casuality of hlist poisoning in 2.5.70 Trond Myklebust
@ 2003-06-11 23:50 ` Linus Torvalds
  2003-06-12  0:04   ` Trond Myklebust
  2003-06-12  5:22 ` Linus Torvalds
  1 sibling, 1 reply; 9+ messages in thread
From: Linus Torvalds @ 2003-06-11 23:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Alexander Viro, Linux Kernel


On Wed, 11 Jun 2003, Trond Myklebust wrote:
> 
>   This patch removes the Oops that occurs when either the source or
> the target of a d_move() operation is unhashed. It is currently
> triggered by the NFS sillyrename code.

Cool. The thing found something!

However, I'm still a bit confused:

> -		hlist_del_rcu(&dentry->d_hash);
> -		hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> +		if (!hlist_unhashed(&dentry->d_hash))
> +			hlist_del_rcu(&dentry->d_hash);
> +		if (!hlist_unhashed(&target->d_hash)) {
> +			hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> +			dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
> +		} else
> +			dentry->d_vfs_flags |= DCACHE_UNHASHED;

Can source or target really be validly unhashed? That makes no sense,
since we just looked it up, and we've held the directory semaphores over
the whole thing.

In other words, I worry that the real bug is something else, and your 
patch makes it not oops, but hides the real problem.

I'm sure you're right, but can you tell me what the sequence of events is 
that validly leads to this?

		Linus


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-11 23:50 ` Linus Torvalds
@ 2003-06-12  0:04   ` Trond Myklebust
  2003-06-12  0:32     ` Linus Torvalds
  0 siblings, 1 reply; 9+ messages in thread
From: Trond Myklebust @ 2003-06-12  0:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Trond Myklebust, Alexander Viro, Linux Kernel

>>>>> " " == Linus Torvalds <torvalds@transmeta.com> writes:

    >> - hlist_del_rcu(&dentry->d_hash);
    >> - hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
    >> + if (!hlist_unhashed(&dentry->d_hash))
    >> + hlist_del_rcu(&dentry->d_hash);
    >> + if (!hlist_unhashed(&target->d_hash)) {
    >> + hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
    >> + dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
    >> + } else
    >> + dentry->d_vfs_flags |= DCACHE_UNHASHED;

     > Can source or target really be validly unhashed? That makes no
     > sense, since we just looked it up, and we've held the directory
     > semaphores over the whole thing.

When renaming, we may want to unhash the dentry in order to stop
d_lookup()s from succeeding (Recall that cached_lookup() does not
attempt to take the directory semaphore - only real_lookup() does
that).

AFAICS one should not rehash the dentry until after the d_move(). Does
that make sense?

Cheers,
   Trond

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-12  0:04   ` Trond Myklebust
@ 2003-06-12  0:32     ` Linus Torvalds
  0 siblings, 0 replies; 9+ messages in thread
From: Linus Torvalds @ 2003-06-12  0:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Alexander Viro, Linux Kernel


On Wed, 11 Jun 2003, Trond Myklebust wrote:
>
> AFAICS one should not rehash the dentry until after the d_move(). Does
> that make sense?

Yeah, it does seem that rehashing before actually calling d_move() means 
that there is a small window where another process might now come in, and 
use the dcache (without getting the semaphore) to see the old value of the 
target dentry, even though the low-level filesystem has already move the 
new dentry value over the target. Ie the window would be between

		i_op->rename(...)
	d_rehash(new_dentry)
	... race here ...
	d_move(old_dentry, new_dentry)

That might confuse a filesystem that expected that the target was deleted 
an no longer reachable by anybody.

Al? What do you think?

		Linus


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-11 23:34 [PATCH] First casuality of hlist poisoning in 2.5.70 Trond Myklebust
  2003-06-11 23:50 ` Linus Torvalds
@ 2003-06-12  5:22 ` Linus Torvalds
  2003-06-12  6:04   ` Trond Myklebust
                     ` (2 more replies)
  1 sibling, 3 replies; 9+ messages in thread
From: Linus Torvalds @ 2003-06-12  5:22 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Alexander Viro, Linux Kernel


On Wed, 11 Jun 2003, Trond Myklebust wrote:
> 
>   This patch removes the Oops that occurs when either the source or
> the target of a d_move() operation is unhashed. It is currently
> triggered by the NFS sillyrename code.

Btw, thinking more about this, the patch can't be right.

> +		if (!hlist_unhashed(&dentry->d_hash))
> +			hlist_del_rcu(&dentry->d_hash);

This leaves the d_hash pointers as crapola.

> +		if (!hlist_unhashed(&target->d_hash)) {
> +			hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> +			dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
> +		} else
> +			dentry->d_vfs_flags |= DCACHE_UNHASHED;

And this will fix it only if target is hashed.

So either the patch depends on the target being hashed (and if so, it just 
shouldn't have the test, or have a BUG_ON() instead), or the patch is 
buggy.

I have this suspicion that the logical patch would be something like this:

	if (hlist_unhashed(&target->d_hash)) {
		/* Unhash the dentry too */
		__d_drop(dentry);
	} else {
		/* Rehash the dentry onto the same hash as the target */
		hlist_del_rcu(&dentry->d_hash);
		hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
		dentry->d_vfs_flags &= ~DCACHE_UNHASHED;   
	}

At least this would make slightly more sense than the patch.

HOWEVER, I actually suspect that the target really _cannot_ be unhashed, 
and that the test makes no sense, and the sequence should just be

	/* Rehash the dentry onto the same hash as the target */
	hlist_del_rcu(&dentry->d_hash);
	hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
	dentry->d_vfs_flags &= ~DCACHE_UNHASHED;

this is actually identical to what we have right now, except for one small 
detail: the current 2.5.x tree does not clear the DCACHE_UNHASHED bit in 
d_move().

Also, if a unhashed dentry really is ok, then I think the test for 

        /* Move the dentry to the target hash queue, if on different bucket */
        if (dentry->d_bucket != target->d_bucket) {
		...

should really be something like

	if (!d_hashed(dentry) || dentry->d_bucket != target->d_bucket) {
		...

because otherwise the dentry may well share the same bucket as the target, 
but if it wasn't hashed before, it won't be hashed after either. 

But I suspect that neither dentry nor target should really ever be
unhashed by the time we call d_move(). That's reinforced by the fact that
it looks like a unhashed dentry in d_move() would have been a silent bug
previously - staying unhashed if it just shared the bucket.

Al, I'll be really happy having you go over this code too. And whatever we 
decide is right (enforcing hashedness or whatever), we should assert it, 
because clearly d_move() has been a bit too subtle for us so far.

		Linus


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-12  5:22 ` Linus Torvalds
@ 2003-06-12  6:04   ` Trond Myklebust
  2003-06-12  7:30   ` Trond Myklebust
  2003-06-12 16:26   ` viro
  2 siblings, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2003-06-12  6:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alexander Viro, Linux Kernel

>>>>> " " == Linus Torvalds <torvalds@transmeta.com> writes:

     > But I suspect that neither dentry nor target should really ever
     > be unhashed by the time we call d_move(). That's reinforced by
     > the fact that it looks like a unhashed dentry in d_move() would
     > have been a silent bug previously - staying unhashed if it just
     > shared the bucket.

It is not a bug.

Looking more carefully at the Oops, it seems that this problem is
occurring inside an nfs_rename() in which the target name belongs to
an open file - and thus needs to be sillyrenamed first.

In that case, we certainly do not want to rehash the dentry in order
to do the d_move() since that would give rise to a race: we want to do
a real rename into the same dentry after we're done with the
sillyrename.

I can agree that the patch was flawed, but I still believe that we do
need to allow d_move to work with unhashed dentries.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-12  5:22 ` Linus Torvalds
  2003-06-12  6:04   ` Trond Myklebust
@ 2003-06-12  7:30   ` Trond Myklebust
  2003-06-12 16:26   ` viro
  2 siblings, 0 replies; 9+ messages in thread
From: Trond Myklebust @ 2003-06-12  7:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Alexander Viro, Linux Kernel


Note: As far as NFS is concerned, d_move() should always assume that
the dentry is unhashed. Even if we ensure that we rehash, someone else
could in theory trigger a call to d_revalidate() that causes the
dentry to be dropped before we get to d_move().

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-12  5:22 ` Linus Torvalds
  2003-06-12  6:04   ` Trond Myklebust
  2003-06-12  7:30   ` Trond Myklebust
@ 2003-06-12 16:26   ` viro
  2003-06-19 23:01     ` Paul E. McKenney
  2 siblings, 1 reply; 9+ messages in thread
From: viro @ 2003-06-12 16:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Trond Myklebust, Linux Kernel

On Wed, Jun 11, 2003 at 10:22:25PM -0700, Linus Torvalds wrote:

> HOWEVER, I actually suspect that the target really _cannot_ be unhashed, 
> and that the test makes no sense, and the sequence should just be
> 
> 	/* Rehash the dentry onto the same hash as the target */
> 	hlist_del_rcu(&dentry->d_hash);
> 	hlist_add_head_rcu(&dentry->d_hash, target->d_bucket);
> 	dentry->d_vfs_flags &= ~DCACHE_UNHASHED;
 
> But I suspect that neither dentry nor target should really ever be
> unhashed by the time we call d_move(). That's reinforced by the fact that
> it looks like a unhashed dentry in d_move() would have been a silent bug
> previously - staying unhashed if it just shared the bucket.

> Al, I'll be really happy having you go over this code too. And whatever we 
> decide is right (enforcing hashedness or whatever), we should assert it, 
> because clearly d_move() has been a bit too subtle for us so far.

Sigh...  The real problem is not in d_move(), but in the way NFS drops
dentries.  That, and the fact that we are eating the consequences of
RCU use in dcache - it had predictably made the entire thing _far_ too
subtle.

We probably should accept that both d_move() arguments can be unhashed.
After the move hashed status of source should remain as it is and
victim^Wtarget should get unhashed.

We _do_ need to sort out the situation with unhashing stuff in NFS - in
particular, the way it deals with mountpoints and with directories is
a mess.  I'm looking through that code, but it's bloody slow analysis
due to RCU.  Premature optimizations and all such...

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] First casuality of hlist poisoning in 2.5.70
  2003-06-12 16:26   ` viro
@ 2003-06-19 23:01     ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2003-06-19 23:01 UTC (permalink / raw)
  To: viro; +Cc: Linus Torvalds, Trond Myklebust, Linux Kernel

On Thu, Jun 12, 2003 at 05:26:27PM +0100, viro@parcelfarce.linux.theplanet.co.uk wrote:
> We _do_ need to sort out the situation with unhashing stuff in NFS - in
> particular, the way it deals with mountpoints and with directories is
> a mess.  I'm looking through that code, but it's bloody slow analysis
> due to RCU.  Premature optimizations and all such...

No arguments on NFS, and I have to agree that RCU has been used
primarily as an optimization thus far in Linux.  However, RCU
can and should be considered quite early on.  The situation is
similar to reader-writer locking -- you are better off working
out which of your locks are going to be reader-writer locks
earlier rather than later.  If you use either reader-writer
locking or RCU as a late optimization, you will find that your
earlier design decisions make your life more complex than necessary.

On the other hand, if RCU is considered early, it can make life
simpler.  Breaking up locks often introduces deadlocks.  Some
of these deadlocks can be avoided entirely by thinking through
the use of RCU at design time.  An extreme example of deadlock
avoidance may be found in IPMI (drivers/char/ipmi/ipmi_kcs_intf.c,
the synchronize_kernel() near line 1174).  Really tough to do
this as a late optimization...

						Thanx, Paul

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2003-06-19 23:48 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2003-06-11 23:34 [PATCH] First casuality of hlist poisoning in 2.5.70 Trond Myklebust
2003-06-11 23:50 ` Linus Torvalds
2003-06-12  0:04   ` Trond Myklebust
2003-06-12  0:32     ` Linus Torvalds
2003-06-12  5:22 ` Linus Torvalds
2003-06-12  6:04   ` Trond Myklebust
2003-06-12  7:30   ` Trond Myklebust
2003-06-12 16:26   ` viro
2003-06-19 23:01     ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.