public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <htejun@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Gabor Gombas <gombasg@sztaki.hu>,
	Dave Young <hidave.darkstar@gmail.com>,
	linux-kernel@vger.kernel.org, bluez-devel@lists.sourceforge.net,
	Greg KH <greg@kroah.com>,
	ebiederm@xmission.com
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs
Date: Sun, 06 Jan 2008 11:07:52 +0900	[thread overview]
Message-ID: <478037F8.8020103@gmail.com> (raw)
In-Reply-To: <20080105194510.GK27894@ZenIV.linux.org.uk>

Hello,

Al Viro wrote:
> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>> Assuming that this is what we get, everything looks explainable - we
>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>> gets evicted.  We don't have any exclusion, so while we are playing
>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>> negative; the rest is obvious...
>> That part of code is walking down the sysfs tree from the s_root of
>> sysfs hierarchy and on each step parent is held using dget() while being
>> referenced, so I don't think they can turn negative there.
> 
> Turn?  Just what stops you from getting a negative (and unhashed) from
> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?

Right, I haven't thought about that.  When sysfs_get_dentry() is called,
@sd is always valid so unless there was existing negative dentry, lookup
is guaranteed to return positive dentry, but by populating dcache with
negative dentry before a node is created, things can go wrong.  I don't
think that's what's going on here tho.  If that was the case, the
while() loop looking up the next sd to lookup (@cur) should have blown
up as negative dentry will have NULL d_fsdata which doesn't match any sd.

I guess what's needed here is d_revalidate() as other distributed
filesystems do.  I'll test whether this can be actually triggered and
prepare a fix.  Thanks a lot for pointing out the problem.

>>> AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
>>> and the way it plays with fs/namei.c are ucking fugly.
>> Can you elaborate a bit?  The locking in sysfs is unconventional but
>> that's mostly from necessity.  It has dual interface - vfs and driver
>> model && vfs data structures (dentry and inode) are too big to always
>> keep around, so it basically becomes a small distributed file system
>> where the backing data can change asynchronously.
> 
> ... with all fun that creates.  As it is, you have those async changers
> of backing data using VFS locking _under_ sysfs locks via lookup_one_noperm()
> and yet it needs sysfs_mutex inside sysfs_lookup().  So you can't have
> sysfs_get_dentry() under it.  So you don't have exclusion with arseloads
> of sysfs tree changes in there.  Joy...

There are two locks.  sysfs_rename_mutex and sysfs_mutex.
sysfs_rename_mutex is above VFS locks while sysfs_mutex is below VFS
locks.  sysfs_rename_mutex() protects against move/rename which can
change the ancestry of a held sysfs_dirent while sysfs_mutex protects
the sd hierarchy itself.  Locking can be wrong if sysfs_rename_mutex
locking is missing from the places where ancestry of a held sd can
change but I can't find one ATM.  If I'm missing your point again, feel
free to scream at me.  :-)

As it's unnecessarily unintuitive, there's a pending change to rename
sysfs_rename_mutex and use it to protect the whole tree structure to
make locking simpler while using sysfs_mutex to guard VFS access such
that the locking hierarchy plainly becomes sysfs_rename_mutex - VFS
locks - sysfs_mutex where all internal sysfs structure is protected by
the outer mutex and the inner one just protects VFS accesses.

> Frankly, with the current state of sysfs the last vestiges of arguments
> used to push it into the tree back then are dead and buried.  I'm not
> blaming you, BTW - the shitpile *did* grow past the point where its
> memory footprint became far too large and something needed to be done.
> Unfortunately, it happened too late for that something being "get rid
> of the entire mess" and now we are saddled with it for good.

Yeah, it's too late to get rid of sysfs and regardless implementation
ugliness, which BTW I think has improved a lot during last six or so
months, it's now pretty useful and important to drivers, so I guess the
only option is trying hard to make it better.

Oh, BTW, the ugly lookup_one_noperm() can be removed if LOOKUP_NOPERM
flag is added.  The only reason sysfs_lookup() uses the specialized
lookup is to avoid permission check.

Thanks.

-- 
tejun

  reply	other threads:[~2008-01-06  2:08 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-28 17:32 Oops involving RFCOMM and sysfs Gabor Gombas
2007-12-29  8:07 ` [Bluez-devel] " Dave Young
2008-01-02 14:48   ` Gabor Gombas
2008-01-02 15:16   ` Gabor Gombas
2008-01-03 13:16     ` Gabor Gombas
2008-01-04  1:05       ` Dave Young
2008-01-07  8:07         ` Tejun Heo
2008-01-07 14:10         ` Gabor Gombas
2008-01-05  7:50     ` Al Viro
2008-01-05 14:30       ` Tejun Heo
2008-01-05 19:45         ` Al Viro
2008-01-06  2:07           ` Tejun Heo [this message]
2008-01-06  2:18             ` Al Viro
2008-01-06  2:54               ` Tejun Heo
2008-01-06  3:35                 ` Al Viro
2008-01-06  3:54                   ` Tejun Heo
2008-01-07  2:37             ` Tejun Heo
2008-01-07  8:21               ` Eric W. Biederman
2008-01-07  9:17                 ` Tejun Heo
2008-01-07  9:18                   ` Tejun Heo
2008-01-07  9:22                     ` Al Viro
2008-01-07 10:33                       ` Eric W. Biederman
2008-01-07 14:13       ` Gabor Gombas
2008-01-07 15:24         ` Tejun Heo
2008-01-07 21:00           ` Gabor Gombas
2008-01-08  9:42             ` Tejun Heo
2008-01-08 13:32               ` Gabor Gombas
2008-01-09  9:16                 ` Tejun Heo
2008-01-09 15:57                   ` Cornelia Huck
2008-01-10  1:11                   ` Dave Young
2008-01-11 23:09                     ` Gabor Gombas
2008-01-14  7:05                       ` Dave Young
2008-01-14 12:52                         ` Cornelia Huck
2008-01-15  1:57                           ` Dave Young
2008-01-16  1:02                             ` Dave Young
2008-01-16 23:06                               ` Gabor Gombas
2008-01-17  7:24                                 ` Dave Young
2008-01-17  8:15                                   ` Dave Young
2008-01-17 11:42                                     ` Cornelia Huck
2008-01-18  3:37                                       ` Dave Young
2008-01-18  9:19                                         ` Cornelia Huck
2008-01-18 10:23                                           ` Cornelia Huck
2008-01-18 10:34                                             ` Dave Young
2008-01-18 11:26                                               ` Cornelia Huck
2008-01-21  3:15                                                 ` Dave Young
2008-01-21 15:09                                                   ` [Patch] Driver core: Cleanup get_device_parent() in device_add() and device_move() Cornelia Huck
2008-01-10 10:15                   ` [Bluez-devel] Oops involving RFCOMM and sysfs Gabor Gombas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=478037F8.8020103@gmail.com \
    --to=htejun@gmail.com \
    --cc=bluez-devel@lists.sourceforge.net \
    --cc=ebiederm@xmission.com \
    --cc=gombasg@sztaki.hu \
    --cc=greg@kroah.com \
    --cc=hidave.darkstar@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox