From: Al Viro <viro@ZenIV.linux.org.uk>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-fsdevel@vger.kernel.org, npiggin@kernel.dk
Subject: Re: ->d_lock FUBAR (was Re: Linux 3.0-rc6)
Date: Thu, 14 Jul 2011 08:21:20 +0100 [thread overview]
Message-ID: <20110714072119.GO11013@ZenIV.linux.org.uk> (raw)
In-Reply-To: <CA+55aFz3SZPa6MvDfzG0zF=AFLh1XUeVNVdyPxQKrovtU5MdbA@mail.gmail.com>
On Tue, Jul 12, 2011 at 08:56:35PM -0700, Linus Torvalds wrote:
> It would take the locks a few more times, but it would avoid the nasty
> lock ordering issues exactly because it drops them in between rather
> than try to hold them all at once. And d_move() is *not* a performance
> critical area, afaik, so the point would be to make the locking more
> straightforward and avoid the horrible subtlety.
>
> Crazy? Stupid? What am I missing? It's probably something obvious.
It's loops over d_subdirs of given dentry. IOW, dentry can
be found not just via the hash chains and those iterators (ones outside
of fs/dcache.c) don't play with rename seqlock. They just rely on ->d_lock
on dentry being enough to stabilize the list of children.
It's not all that horrible, since i_mutex on parent (held in quite a
few of those) is enough to prevent d_move() altogether. Or iterator
sits on fs that never does d_move/d_materialise_unique at all. There
are exceptions. The most generic one is __fsnotify_update_child_dentry_flags(),
but that's not all. E.g. coda ->d_revalidate(), possibly some ceph code...
Overall, it might be doable, but it'll take some massage in strange places.
Hell knows; I think at that point in release cycle we don't want to go
there. After looking through ->d_lock users I'm mostly convinced that
unlazy_walk() fix + making damn sure we don't try to create loops in
d_materialise_unique() + adding missing ->i_mutex where needed is the
way to go for now. We have several places where ->d_lock overlap is
not parent-to-child, but these really can't lead to deadlocks - fs is
specialized enough to prevent them.
However, looking through these loops over ->d_subdirs shows *another* pile of
bugs, like this one:
mutex_lock(&bus->d_inode->i_mutex);
list_for_each_entry(dev, &bus->d_subdirs, d_u.d_child)
if (dev->d_inode)
update_dev(dev);
mutex_unlock(&bus->d_inode->i_mutex);
in drivers/usb/core/inode.c:update_bus(). See the problem? Sure,
it's ramfs-like: all dentries are pinned down until unlink/rmdir,
which is blocked by i_mutex on parent. And yes, d_move() is also
not a problem (ditto). However, if you open a file and unlink it,
dentry will be unpinned, but stay around. It will be unhashed,
but it'll be on ->d_subdir. Close the file and dentry will be killed,
without i_mutex on parent. Note that ->d_locked will be taken -
both on victim and parent. But the quoted sucker takes no ->d_lock
at all. Bummer...
IOW, iterators relying _purely_ on i_mutex on parent are currently fucked.
We have more such cases. Another one is debugfs_remove_recursive().
I wonder if we simply ought to evict the sucker from d_subdirs/d_child
in simple_unlink()... It would require verifying that users of that
animal are OK with that, but I suspect that most of them will be happy.
It might simplify life for readdir() and llseek() on such directories,
while we are at it... The "hunt for dentry leaks" logics on umount
might become slightly less useful for those filesystems (we'll get
"parent is misteriously busy" instead of "child is unlinked, but somehow
busy on fs shutdown"), but I doubt it's that serious. Hell knows -
I never had to hunt dentry leaks on debugfs/usbfs/etc.
BTW, usbfs tree iterators should've been killed a long time ago - we
simply need ->permission() and ->getattr() added for usbfs to DTRT
instead of this "iterate over all inodes, patch new i_uid/i_gid/i_mode
when remount asks to do that" crap.
BTW^2: in clk_debugfs_register_one() in assorted arch/arm/mach-*/clock.c
d = c->dent;
list_for_each_entry_safe(child, child_tmp, &d->d_subdirs, d_u.d_child)
debugfs_remove(child);
debugfs_remove(c->dent);
should be debugfs_remove_recursive(). Here we don't even bother with
i_mutex...
BTW^3: sel_remove_classes() plays very unsafe games - no ->i_mutex at all, no
->d_lock until we get leaf directories. Root-only, but...
next prev parent reply other threads:[~2011-07-14 7:21 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-07-04 23:09 Linux 3.0-rc6 Linus Torvalds
2011-07-06 15:27 ` Arkadiusz Miśkiewicz
2011-07-11 6:03 ` Al Viro
2011-07-12 23:48 ` ->d_lock FUBAR (was Re: Linux 3.0-rc6) Al Viro
2011-07-13 0:04 ` Linus Torvalds
2011-07-13 0:56 ` Al Viro
2011-07-13 1:39 ` Al Viro
2011-07-13 2:40 ` Linus Torvalds
2011-07-13 2:59 ` Al Viro
2011-07-13 3:22 ` Al Viro
2011-07-13 3:56 ` Linus Torvalds
2011-07-14 7:21 ` Al Viro [this message]
2011-07-15 4:58 ` Al Viro
2011-07-13 1:48 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110714072119.GO11013@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=linux-fsdevel@vger.kernel.org \
--cc=npiggin@kernel.dk \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.