Re: Linux 3.0-rc6 ..

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: Linux 3.0-rc6 ..
       [not found] <CA+55aFxQw4T07hxz8JSm12x3FOH_Dcf=G5mvLrxiTuLxjbw+Mg@mail.gmail.com>
@ 2011-07-11  6:03 ` Al Viro
  2011-07-12 23:48   ` ->d_lock FUBAR (was Re: Linux 3.0-rc6) Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-07-11  6:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel

On Mon, Jul 04, 2011 at 04:09:29PM -0700, Linus Torvalds wrote:

> Other than the isci driver, the rest really is just lots of random
> small stuff. It's getting to the point where I'm thinking I should
> just release 3.0, because it's been pretty quiet, and the fixes
> haven't been earth-shakingly exciting. Some drm (radeon and intel)
> fixes might be noticeable to more people, the rest would tend to be
> pretty esoteric.

Sigh...  Looks like we have serious problems around ->d_parent handling.
First of all, __d_unalias() is fscked - calling d_ancestor() is not
going to do us any good before we made sure that tree topology won't
change right under us.  Used to be protected by dcache_lock, but not
anymore.  Moreover, there's a similar problem with __d_materialise_dentry()
side of things; there we don't check for loop creation at all and with
NFS we just might try to attach a root of disconnected subtree *inside*
that subtree.  No check and no locking either...

Another piece of PITA - cifs_get_root() will cheerfully call
d_materialise_unique() on dentry it got from d_lookup() if it happens to
be negative to start with *and* directory had been created on server in
the meanwhile.  BUG_ON() triggered in d_materialise_unique()... The lack
of i_mutex on parent also doesn't help.  cifs_get_root() mess is from
this cycle, BTW.

btrfs get_default_root() doesn't grab i_mutex either.  It should, since it
calls d_splice_alias().

I'm really not fond of the code in dentry_lock_for_move(); we _might_ manage
to avoid deadlocks since i_mutex serialization might prevent contention on
d_lock, but I'm still not convinced, especially due to missing i_mutex
in several callers...  I mean, this

 * If there is an ancestor relationship:
 * dentry->d_parent->...->d_parent->d_lock
 *   ...
 *     dentry->d_parent->d_lock
 *       dentry->d_lock
 *
 * If no ancestor relationship:
 * if (dentry1 < dentry2)
 *   dentry1->d_lock
 *     dentry2->d_lock

is no good: suppose A is ancestor of B and C is unrelated to either.
With B sitting at lower address than C and A at higher one.  We have
A before B, since it's an ancestor; C before A since they are unrelated
and addresses compare that way; B before C (ditto).  Loops in lock
ordering are generally bad; we _might_ get away with that in this case
since we serialize d_move() callers to hell and back, but...

Al, going through ->d_parent code review and not happy about the state of
that animal...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-11  6:03 ` Linux 3.0-rc6 Al Viro
@ 2011-07-12 23:48   ` Al Viro
  2011-07-13  0:04     ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-07-12 23:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, npiggin

On Mon, Jul 11, 2011 at 07:03:15AM +0100, Al Viro wrote:

>  * If there is an ancestor relationship:
>  * dentry->d_parent->...->d_parent->d_lock
>  *   ...
>  *     dentry->d_parent->d_lock
>  *       dentry->d_lock
>  *
>  * If no ancestor relationship:
>  * if (dentry1 < dentry2)
>  *   dentry1->d_lock
>  *     dentry2->d_lock
> 
> is no good: suppose A is ancestor of B and C is unrelated to either.
> With B sitting at lower address than C and A at higher one.  We have
> A before B, since it's an ancestor; C before A since they are unrelated
> and addresses compare that way; B before C (ditto).  Loops in lock
> ordering are generally bad; we _might_ get away with that in this case
> since we serialize d_move() callers to hell and back, but...

But it's not enough.  Look: getting from rcu pathwalk to normal one
involves
	* grabbing d_lock on parent
	* grabbing d_lock on child
	* checking that child hadn't been moved elsewhere in the meanwhile
All flakiness of the locking "order" aside, here we simply lock two dentries
that might be nowhere near each other by now.  Hell, by that point the parent
might've been moved under (what used to be) child.  Or it might have address
greater than that of child and be not an ancestor anymore.  Note that no
i_mutex, etc. is held at that point, so there's no external serialization
to save our arses...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-12 23:48   ` ->d_lock FUBAR (was Re: Linux 3.0-rc6) Al Viro
@ 2011-07-13  0:04     ` Linus Torvalds
  2011-07-13  0:56       ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2011-07-13  0:04 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, npiggin

On Tue, Jul 12, 2011 at 4:48 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> All flakiness of the locking "order" aside, here we simply lock two dentries
> that might be nowhere near each other by now.  Hell, by that point the parent
> might've been moved under (what used to be) child.  Or it might have address
> greater than that of child and be not an ancestor anymore.  Note that no
> i_mutex, etc. is held at that point, so there's no external serialization
> to save our arses...

So why wouldn't it be sufficient to just take the s_vfs_rename_mutex
in the caller? We're only talking d_materialise_unique(), no?

That said, looking at NFS (nfs_prime_dcache), we do seem to hold the
parent directory inode semaphore. So I'm not seeing any renames
happening wrt the entry we found and the parent. So the relationship
there should be safe regardless, no?

I didn't check the other users of d_materialise_unique() (ciph and
ceph) but at least the cifs use is in "readdir.c", which implies
similar directory inode mutex issues.

                 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  0:04     ` Linus Torvalds
@ 2011-07-13  0:56       ` Al Viro
  2011-07-13  1:39         ` Al Viro
  2011-07-13  1:48         ` Linus Torvalds
  0 siblings, 2 replies; 12+ messages in thread
From: Al Viro @ 2011-07-13  0:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, npiggin

On Tue, Jul 12, 2011 at 05:04:55PM -0700, Linus Torvalds wrote:

> > All flakiness of the locking "order" aside, here we simply lock two dentries
> > that might be nowhere near each other by now. ?Hell, by that point the parent
> > might've been moved under (what used to be) child. ?Or it might have address
> > greater than that of child and be not an ancestor anymore. ?Note that no
> > i_mutex, etc. is held at that point, so there's no external serialization
> > to save our arses...
> 
> So why wouldn't it be sufficient to just take the s_vfs_rename_mutex
> in the caller? We're only talking d_materialise_unique(), no?

Alas, no.  d_materialize_unique() aside (we have a couple of bugs in
there), ->d_lock handling is fscked up.  Basically, it's unlazy_walk()
taking ->d_lock instances without anything to guarantee that it's doing
that in order consistent with d_move() (and any other users, for that
matter).  *And* the "locking order" in question is not transitive, but
that's a separate story.  And no, we obviously are not going to
serialize the switch from RCU to normal pathwalk on a per-fs mutex...

I'm crawling through the d_lock users right now (>400 places ;-/), trying
to put together reasonable locking rules.

FWIW, we used to have this for d_move() et.al.:
	* all places changing ->d_parent hold i_mutex on parent(s)
	* if parents differ, we have ->s_vfs_rename_mutex as well
	* if old_dentry was not attached to the tree, it was positive
and a subdirectory of new_dentry->d_parent.  Said new_dentry->d_parent
was been locked by caller.

Unfortunately, current tree has these rules violated in several places.
And these rules have nothing to say about ->d_lock ;-/

Nick, could you please describe the locking rules you had in mind for
->d_lock?  unlazy_walk() (aka nameidata_dentry_drop_rcu()) can probably
be dealt with by checking d_seq twice, once before locking the child.
Then we could be sure that it's still a child of parent and will stay
so as long as parent's ->d_lock is held, and thus the ordering would
stay stable...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  0:56       ` Al Viro
@ 2011-07-13  1:39         ` Al Viro
  2011-07-13  2:40           ` Linus Torvalds
  2011-07-13  1:48         ` Linus Torvalds
  1 sibling, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-07-13  1:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, npiggin

On Wed, Jul 13, 2011 at 01:56:34AM +0100, Al Viro wrote:

> Nick, could you please describe the locking rules you had in mind for
> ->d_lock?  unlazy_walk() (aka nameidata_dentry_drop_rcu()) can probably
> be dealt with by checking d_seq twice, once before locking the child.
> Then we could be sure that it's still a child of parent and will stay
> so as long as parent's ->d_lock is held, and thus the ordering would
> stay stable...

As the matter of fact, can we ever get there with IS_ROOT(dentry)?
AFAICS, that should be impossible - dentry->d_seq would have to be
changed by whatever had torn it off the tree and we would have
buggered off on __d_rcu_to_refcount() failing...

AFAICS, the only way to get there would be with mountpoint crossing
returning a symlink with symlink already killed by rename() somehow
(call in walk_component()).  The first part should be impossible -
symlinks can't be mounted/bound on anything (and if it would be
possible, we'd trigger that BUG_ON() if symlink was still alive,
anyway).

So here's what I want to do to unlazy_walk(); it'll almost certainly
leave other problems with ->d_lock, but at least it'll take care of
that one:

Make sure that child is still a child of parent before nested locking
of child->d_lock in unlazy_walk(); otherwise we are risking a violation
of locking order and deadlocks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/fs/namei.c b/fs/namei.c
index 0223c41..5c867dd 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -433,6 +433,8 @@ static int unlazy_walk(struct nameidata *nd, struct dentry *dentry)
 			goto err_parent;
 		BUG_ON(nd->inode != parent->d_inode);
 	} else {
+		if (dentry->d_parent != parent)
+			goto err_parent;
 		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
 		if (!__d_rcu_to_refcount(dentry, nd->seq))
 			goto err_child;

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  1:39         ` Al Viro
@ 2011-07-13  2:40           ` Linus Torvalds
  2011-07-13  2:59             ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2011-07-13  2:40 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, npiggin

On Tue, Jul 12, 2011 at 6:39 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> So here's what I want to do to unlazy_walk(); it'll almost certainly
> leave other problems with ->d_lock, but at least it'll take care of
> that one:

That patch seems pretty clearly safe. If the parent has changed, we'd
want to exit anyway: as you point out the __d_rcu_to_refcount() is
there to catch that case. So exiting early and thus making that direct
parenthood requirement explicit in that d_lock case seems to be a good
thing regardless.

It's dentry_lock_for_move() that makes me really nervous. Not only
does it lock up to four dentries, but it mixes the whole parenthood vs
pointer ordering.  Or course, it does have those BUG_ON() checks, so
it should never cause any circular dependencies, but still..

The actual main protection to get lookups correct in the presence of
concurrent moves largely depends on the sequence numbers (ie
d_lookup() retrying if it hits a rename), which is why I also find it
unlikely that we really should need to hold all those d_lock cases all
at the same time.

So does d_move() really need to get all the locks at the same time and
then do all the operations inside that "super-locked" region? Or could
we get the locks in sequence and do individual parts of the rename
operations under individual locks?

Are there any other d_lock cases that depend on the pointer ordering?
Most everything else seems to be about direct parenthood, no?

                               Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  2:40           ` Linus Torvalds
@ 2011-07-13  2:59             ` Al Viro
  2011-07-13  3:22               ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-07-13  2:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, npiggin

On Tue, Jul 12, 2011 at 07:40:01PM -0700, Linus Torvalds wrote:

> It's dentry_lock_for_move() that makes me really nervous. Not only
> does it lock up to four dentries, but it mixes the whole parenthood vs
> pointer ordering.  Or course, it does have those BUG_ON() checks, so
> it should never cause any circular dependencies, but still..

Me too, obviously.

> The actual main protection to get lookups correct in the presence of
> concurrent moves largely depends on the sequence numbers (ie
> d_lookup() retrying if it hits a rename), which is why I also find it
> unlikely that we really should need to hold all those d_lock cases all
> at the same time.
> 
> So does d_move() really need to get all the locks at the same time and
> then do all the operations inside that "super-locked" region? Or could
> we get the locks in sequence and do individual parts of the rename
> operations under individual locks?
> 
> Are there any other d_lock cases that depend on the pointer ordering?
> Most everything else seems to be about direct parenthood, no?

It's not that easy.  We want ->d_lock on parents - not only because
there's code iterating through the list of children, but because
ordering on direct parenthood bloody depends on children not moving
out while we hold ->d_lock on their parent.  Otherwise we are looking
for nightmares in shrink_dcache_parent() et.al.

I'm not sure how much do we care about stability of x->d_parent when
x->d_lock is held.  ->d_compare() is the most obvious potential area
of trouble in that respect, but there might be more.

I'm still not finished reviewing ->d_lock uses; about a couple of hundreds
is left to wade through.  I would really, *REALLY* appreciate explicitly
defined locking rules from Nick (it's his changes, mostly).  As in "this,
this and that field is protected by ->d_lock on..."

Note that ->d_parent is stable when i_mutex is held on parent, which
makes most of the users of ->d_parent safe and fine (->lookup(), etc.
are all called with directory locked).  I've not finished reviewing
->d_parent users either, but IMO ->d_lock review is more important, so
it got bumped in front of queue...

Back to GrepTFS and stripping the paint off the walls...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  2:59             ` Al Viro
@ 2011-07-13  3:22               ` Al Viro
  2011-07-13  3:56                 ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-07-13  3:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, npiggin

On Wed, Jul 13, 2011 at 03:59:18AM +0100, Al Viro wrote:

> I'm not sure how much do we care about stability of x->d_parent when
> x->d_lock is held.  ->d_compare() is the most obvious potential area
> of trouble in that respect, but there might be more.

Oh, and another fun area is per-chain locks, of course ;-/  Look at
__d_drop(); reengineering the callers of d_drop() to make sure that
fscker's parent stays stable would be very painful and turning that
into loop-based variant is going to be interesting.  Doable, but not
fun...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  3:22               ` Al Viro
@ 2011-07-13  3:56                 ` Linus Torvalds
  2011-07-14  7:21                   ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2011-07-13  3:56 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, npiggin

On Tue, Jul 12, 2011 at 8:22 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Oh, and another fun area is per-chain locks, of course ;-/  Look at
> __d_drop(); reengineering the callers of d_drop() to make sure that
> fscker's parent stays stable would be very painful and turning that
> into loop-based variant is going to be interesting.  Doable, but not
> fun...

So I'm almost convinced.

The reason I say "almost" is that I wonder if we couldn't just turn
the d_move() into basically a

   write_seqlock(rename_lock);
   write_seqcount_begin(dentry);
   write_seqcount_begin(target);

   unhash(dentry);  /* Takes and releases dentry/dentry-parent locks */
   unhash(target);  /* Takes and releases target/target-parent locks */

   tname = target->name;
   dname = dentry->name;

   rehash(dentry, tname); /* takes/releases locks as above.. */
   rehash(target, dname); /* takes/releases locks as above.. */

   write_seqcount_end(target);
   write_seqcount_end(dentry);
   write_sequnlock(rename_lock);

where the seqlock and sequence counts mean that any lookups (RCU or
not) while the thing was unhashed get automatically re-done, so the
fact that it's not at "atomic" move in between would not important.

See what I'm saying? Sure, we want to hold both the dentry and parent
locks when messing with dentry->d_parent: but does that really mean
that we want to hold all FOUR locks at the same time, and in
particular hold those *independent* locks (which is why we do that
pointer value comparison, to give ordering *outside* of the parenthood
issue)?

IOW, maybe we could just hold and release them pairwise, and at any
one point in time we'd only hold one pair of dentry/parent d_lock? No
dentry-pointer-based ordering required - but we'd still protect each
d_parent pointer _individually_.

But I didn't really look closely at the code. The above suggestion is
purely based on my high-level gut feel, and it's entirely possible
that there's some particular practical reason why it's a totally
broken idea, and I'm just a complete moron. I'm realizing, for
example, that maybe we can't even do the dentry seqcount without
holding d_lock?

It would take the locks a few more times, but it would avoid the nasty
lock ordering issues exactly because it drops them in between rather
than try to hold them all at once. And d_move() is *not* a performance
critical area, afaik, so the point would be to make the locking more
straightforward and avoid the horrible subtlety.

Crazy? Stupid? What am I missing? It's probably something obvious.

                            Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  3:56                 ` Linus Torvalds
@ 2011-07-14  7:21                   ` Al Viro
  2011-07-15  4:58                     ` Al Viro
  0 siblings, 1 reply; 12+ messages in thread
From: Al Viro @ 2011-07-14  7:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, npiggin

On Tue, Jul 12, 2011 at 08:56:35PM -0700, Linus Torvalds wrote:

> It would take the locks a few more times, but it would avoid the nasty
> lock ordering issues exactly because it drops them in between rather
> than try to hold them all at once. And d_move() is *not* a performance
> critical area, afaik, so the point would be to make the locking more
> straightforward and avoid the horrible subtlety.
> 
> Crazy? Stupid? What am I missing? It's probably something obvious.

	It's loops over d_subdirs of given dentry.  IOW, dentry can
be found not just via the hash chains and those iterators (ones outside
of fs/dcache.c) don't play with rename seqlock.  They just rely on ->d_lock
on dentry being enough to stabilize the list of children.

It's not all that horrible, since i_mutex on parent (held in quite a
few of those) is enough to prevent d_move() altogether.  Or iterator
sits on fs that never does d_move/d_materialise_unique at all.  There
are exceptions. The most generic one is __fsnotify_update_child_dentry_flags(),
but that's not all.  E.g. coda ->d_revalidate(), possibly some ceph code...
Overall, it might be doable, but it'll take some massage in strange places.
Hell knows; I think at that point in release cycle we don't want to go
there.  After looking through ->d_lock users I'm mostly convinced that
unlazy_walk() fix + making damn sure we don't try to create loops in
d_materialise_unique() + adding missing ->i_mutex where needed is the
way to go for now.  We have several places where ->d_lock overlap is
not parent-to-child, but these really can't lead to deadlocks - fs is
specialized enough to prevent them.

However, looking through these loops over ->d_subdirs shows *another* pile of
bugs, like this one:
        mutex_lock(&bus->d_inode->i_mutex);

        list_for_each_entry(dev, &bus->d_subdirs, d_u.d_child)
                if (dev->d_inode)
                        update_dev(dev);

        mutex_unlock(&bus->d_inode->i_mutex);
in drivers/usb/core/inode.c:update_bus().  See the problem?  Sure,
it's ramfs-like: all dentries are pinned down until unlink/rmdir,
which is blocked by i_mutex on parent.  And yes, d_move() is also
not a problem (ditto).  However, if you open a file and unlink it,
dentry will be unpinned, but stay around.  It will be unhashed,
but it'll be on ->d_subdir.  Close the file and dentry will be killed,
without i_mutex on parent.  Note that ->d_locked will be taken -
both on victim and parent.  But the quoted sucker takes no ->d_lock
at all.  Bummer...

IOW, iterators relying _purely_ on i_mutex on parent are currently fucked.
We have more such cases.  Another one is debugfs_remove_recursive().

I wonder if we simply ought to evict the sucker from d_subdirs/d_child
in simple_unlink()...  It would require verifying that users of that
animal are OK with that, but I suspect that most of them will be happy.
It might simplify life for readdir() and llseek() on such directories,
while we are at it...  The "hunt for dentry leaks" logics on umount
might become slightly less useful for those filesystems (we'll get
"parent is misteriously busy" instead of "child is unlinked, but somehow
busy on fs shutdown"), but I doubt it's that serious.  Hell knows -
I never had to hunt dentry leaks on debugfs/usbfs/etc.

BTW, usbfs tree iterators should've been killed a long time ago - we
simply need ->permission() and ->getattr() added for usbfs to DTRT
instead of this "iterate over all inodes, patch new i_uid/i_gid/i_mode
when remount asks to do that" crap.

BTW^2: in clk_debugfs_register_one() in assorted arch/arm/mach-*/clock.c
        d = c->dent;
        list_for_each_entry_safe(child, child_tmp, &d->d_subdirs, d_u.d_child)
                debugfs_remove(child);
        debugfs_remove(c->dent);
should be debugfs_remove_recursive().  Here we don't even bother with
i_mutex...

BTW^3: sel_remove_classes() plays very unsafe games - no ->i_mutex at all, no
->d_lock until we get leaf directories.  Root-only, but...

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-14  7:21                   ` Al Viro
@ 2011-07-15  4:58                     ` Al Viro
  0 siblings, 0 replies; 12+ messages in thread
From: Al Viro @ 2011-07-15  4:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-fsdevel, npiggin

On Thu, Jul 14, 2011 at 08:21:20AM +0100, Al Viro wrote:
> Overall, it might be doable, but it'll take some massage in strange places.
> Hell knows; I think at that point in release cycle we don't want to go
> there.  After looking through ->d_lock users I'm mostly convinced that
> unlazy_walk() fix + making damn sure we don't try to create loops in
> d_materialise_unique() + adding missing ->i_mutex where needed is the
> way to go for now.  We have several places where ->d_lock overlap is
> not parent-to-child, but these really can't lead to deadlocks - fs is
> specialized enough to prevent them.

OK, hopefully that should take care of the ->d_lock for now; missing
->i_mutex is *not* dealt with yet, since I'm yet to finish ->d_parent
code review.  dentry_lock_for_move() is serialized by rename_lock now
(call in __dentry_materialise_unique() used to be done without it; fixed
in this series), so we can't have more than one of those in any potential
deadlock set.  Everything else either does lock child after immediate
parent or happens on filesystems that see neither d_move() nor
d_materialise_unique() and have serialization of their own.  IOW, I
think that should be enough to avoid deadlocks for in-tree filesystems
for the time being; anything more dramatic will have to wait for the
next merge window - it's too late in the cycle to do anything daring
in that area...  Usual place,

git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6.git/ for-linus

Shortlog:
Al Viro (2):
      Fix ->d_lock locking order in unlazy_walk()
      fix loop checks in d_materialise_unique()

Diffstat:
 fs/dcache.c |   51 ++++++++++++++++++++++++++++++++++-----------------
 fs/namei.c  |    2 ++
 2 files changed, 36 insertions(+), 17 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
  2011-07-13  0:56       ` Al Viro
  2011-07-13  1:39         ` Al Viro
@ 2011-07-13  1:48         ` Linus Torvalds
  1 sibling, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2011-07-13  1:48 UTC (permalink / raw)
  To: Al Viro; +Cc: linux-fsdevel, npiggin

On Tue, Jul 12, 2011 at 5:56 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> Alas, no.  d_materialize_unique() aside (we have a couple of bugs in
> there), ->d_lock handling is fscked up.

Ok, I misread your previous email, and read it as if the problem you
were discussing was __d_unalias(), and then mis-read the newer email
as a result.

I now see what you're saying.

Without having thought very deeply about this, I would suggest that we
aim for entirely getting rid of anything that holds two d_lock's at
the same time. There may be cases where we do end up having both
"dentry" and the immediate parent, and in that case maybe we can have
that direct relationship act as a ordering requirement, but yes, we
should definitely aim to get rid of the whole "if ancestor do one
ordering, otherwise base it on pointer ordering".

Some of the things that take both locks make me go "do we really need
to hold those locks at the same time?" IOW, maybe the locking could be
simply sequential. Some of that pain may be simply unnecessary.

                             Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2011-07-15  4:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <CA+55aFxQw4T07hxz8JSm12x3FOH_Dcf=G5mvLrxiTuLxjbw+Mg@mail.gmail.com>
2011-07-11  6:03 ` Linux 3.0-rc6 Al Viro
2011-07-12 23:48   ` ->d_lock FUBAR (was Re: Linux 3.0-rc6) Al Viro
2011-07-13  0:04     ` Linus Torvalds
2011-07-13  0:56       ` Al Viro
2011-07-13  1:39         ` Al Viro
2011-07-13  2:40           ` Linus Torvalds
2011-07-13  2:59             ` Al Viro
2011-07-13  3:22               ` Al Viro
2011-07-13  3:56                 ` Linus Torvalds
2011-07-14  7:21                   ` Al Viro
2011-07-15  4:58                     ` Al Viro
2011-07-13  1:48         ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).