From: Tejun Heo <htejun@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Gabor Gombas <gombasg@sztaki.hu>,
Dave Young <hidave.darkstar@gmail.com>,
linux-kernel@vger.kernel.org, bluez-devel@lists.sourceforge.net,
Greg KH <greg@kroah.com>,
ebiederm@xmission.com
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs
Date: Sun, 06 Jan 2008 11:07:52 +0900
Message-ID: <478037F8.8020103@gmail.com>
In-Reply-To: <20080105194510.GK27894@ZenIV.linux.org.uk>
Hello,
Al Viro wrote:
> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>> Assuming that this is what we get, everything looks explainable - we
>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>> gets evicted. We don't have any exclusion, so while we are playing
>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>> negative; the rest is obvious...
>> That part of code is walking down the sysfs tree from the s_root of
>> sysfs hierarchy and on each step parent is held using dget() while being
>> referenced, so I don't think they can turn negative there.
>
> Turn? Just what stops you from getting a negative (and unhashed) from
> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?
Right, I hadn't thought about that. When sysfs_get_dentry() is called,
@sd is always valid, so unless there was an existing negative dentry, lookup
is guaranteed to return a positive dentry. But if the dcache is populated
with a negative dentry before the node is created, things can go wrong. I
don't think that's what's going on here, though. If that were the case, the
while() loop that finds the next sd to look up (@cur) should have blown
up, as a negative dentry will have NULL d_fsdata, which doesn't match any sd.
I guess what's needed here is d_revalidate(), as other distributed
filesystems do. I'll test whether this can actually be triggered and
prepare a fix. Thanks a lot for pointing out the problem.
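To make the stale negative dentry case concrete, here is a rough userspace model of the revalidation idea. It is not sysfs code: the toy_* types and functions are made up for this sketch, and only the return convention (1 for "still valid", 0 for "look it up again") is borrowed from ->d_revalidate().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a sysfs_dirent: backing data that can change
 * behind the cache's back.  Hypothetical, for illustration only. */
struct toy_node {
	const char *name;
};

/* Toy stand-in for a dentry; node == NULL means "negative". */
struct toy_dentry {
	const char *name;
	const struct toy_node *node;
};

/* Look @name up in the backing directory; NULL if it does not exist. */
static const struct toy_node *toy_find(const struct toy_node *dir,
				       size_t n, const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < n; i++)
		if (strcmp(dir[i].name, name) == 0)
			return &dir[i];
	return NULL;
}

/* Return 1 if the cached entry still matches the backing data,
 * 0 if it is stale and has to be looked up again. */
static int toy_revalidate(const struct toy_dentry *d,
			  const struct toy_node *dir, size_t n)
{
	return d->node == toy_find(dir, n, d->name);
}
```

In this model a negative entry cached before the node exists is reported stale as soon as the node shows up in the backing directory, which is the situation described above.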
>>> AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
>>> and the way it plays with fs/namei.c are ucking fugly.
>> Can you elaborate a bit? The locking in sysfs is unconventional but
>> that's mostly from necessity. It has dual interface - vfs and driver
>> model && vfs data structures (dentry and inode) are too big to always
>> keep around, so it basically becomes a small distributed file system
>> where the backing data can change asynchronously.
>
> ... with all fun that creates. As it is, you have those async changers
> of backing data using VFS locking _under_ sysfs locks via lookup_one_noperm()
> and yet it needs sysfs_mutex inside sysfs_lookup(). So you can't have
> sysfs_get_dentry() under it. So you don't have exclusion with arseloads
> of sysfs tree changes in there. Joy...
There are two locks: sysfs_rename_mutex and sysfs_mutex.
sysfs_rename_mutex is above VFS locks while sysfs_mutex is below VFS
locks. sysfs_rename_mutex protects against move/rename, which can
change the ancestry of a held sysfs_dirent, while sysfs_mutex protects
the sd hierarchy itself. Locking can be wrong if sysfs_rename_mutex
locking is missing from a place where the ancestry of a held sd can
change, but I can't find one ATM. If I'm missing your point again, feel
free to scream at me. :-)
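The ordering can be sketched as a tiny userspace lock-rank checker. This is only a model of the hierarchy described above (sysfs_rename_mutex, then VFS locks, then sysfs_mutex); the toy_* names are invented for the example and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Outermost lock has the lowest rank.  Hypothetical names. */
enum toy_lock {
	TOY_RENAME_MUTEX = 0,	/* models sysfs_rename_mutex */
	TOY_VFS_LOCKS    = 1,	/* models i_mutex and friends */
	TOY_SYSFS_MUTEX  = 2,	/* models sysfs_mutex */
};

struct toy_ctx {
	unsigned held;		/* bitmask of locks currently held */
};

/* Allow the acquisition only if no lock of the same or an inner
 * rank is already held; otherwise report an ordering violation. */
static bool toy_acquire(struct toy_ctx *c, enum toy_lock l)
{
	unsigned same_or_inner = ~0u << l;

	if (c->held & same_or_inner)
		return false;
	c->held |= 1u << l;
	return true;
}

static void toy_release(struct toy_ctx *c, enum toy_lock l)
{
	assert(c->held & (1u << l));	/* must actually be held */
	c->held &= ~(1u << l);
}
```

Taking the three locks outer to inner succeeds, while asking for TOY_VFS_LOCKS with TOY_SYSFS_MUTEX held is rejected, which is why a lookup that needs VFS locks cannot run under sysfs_mutex.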
As the current scheme is unnecessarily unintuitive, there's a pending
change to rename sysfs_rename_mutex and use it to protect the whole tree
structure, while sysfs_mutex only guards VFS access. The locking
hierarchy then plainly becomes sysfs_rename_mutex -> VFS locks ->
sysfs_mutex, where all internal sysfs structure is protected by the
outer mutex and the inner one just protects VFS accesses.
> Frankly, with the current state of sysfs the last vestiges of arguments
> used to push it into the tree back then are dead and buried. I'm not
> blaming you, BTW - the shitpile *did* grow past the point where its
> memory footprint became far too large and something needed to be done.
> Unfortunately, it happened too late for that something being "get rid
> of the entire mess" and now we are saddled with it for good.
Yeah, it's too late to get rid of sysfs, and regardless of implementation
ugliness, which BTW I think has improved a lot during the last six or so
months, it's now pretty useful and important to drivers, so I guess the
only option is trying hard to make it better.
Oh, BTW, the ugly lookup_one_noperm() can be removed if a LOOKUP_NOPERM
flag is added. The only reason sysfs_lookup() uses the specialized
lookup is to avoid the permission check.
Thanks.
--
tejun