All of lore.kernel.org
 help / color / mirror / Atom feed
From: Tejun Heo <htejun@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Gabor Gombas <gombasg@sztaki.hu>, Greg KH <greg@kroah.com>,
	linux-kernel@vger.kernel.org, ebiederm@xmission.com,
	bluez-devel@lists.sf.net
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs
Date: Sun, 06 Jan 2008 11:07:52 +0900	[thread overview]
Message-ID: <478037F8.8020103@gmail.com> (raw)
In-Reply-To: <20080105194510.GK27894@ZenIV.linux.org.uk>

Hello,

Al Viro wrote:
> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>> Assuming that this is what we get, everything looks explainable - we
>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>> gets evicted.  We don't have any exclusion, so while we are playing
>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>> negative; the rest is obvious...
>> That part of code is walking down the sysfs tree from the s_root of
>> sysfs hierarchy and on each step parent is held using dget() while being
>> referenced, so I don't think they can turn negative there.
> 
> Turn?  Just what stops you from getting a negative (and unhashed) from
> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?

Right, I haven't thought about that.  When sysfs_get_dentry() is called,
@sd is always valid so unless there was existing negative dentry, lookup
is guaranteed to return positive dentry, but by populating dcache with
negative dentry before a node is created, things can go wrong.  I don't
think that's what's going on here tho.  If that was the case, the
while() loop looking up the next sd to lookup (@cur) should have blown
up as negative dentry will have NULL d_fsdata which doesn't match any sd.

I guess what's needed here is d_revalidate() as other distributed
filesystems do.  I'll test whether this can be actually triggered and
prepare a fix.  Thanks a lot for pointing out the problem.

>>> AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
>>> and the way it plays with fs/namei.c are ucking fugly.
>> Can you elaborate a bit?  The locking in sysfs is unconventional but
>> that's mostly from necessity.  It has dual interface - vfs and driver
>> model && vfs data structures (dentry and inode) are too big to always
>> keep around, so it basically becomes a small distributed file system
>> where the backing data can change asynchronously.
> 
> ... with all fun that creates.  As it is, you have those async changers
> of backing data using VFS locking _under_ sysfs locks via lookup_one_noperm()
> and yet it needs sysfs_mutex inside sysfs_lookup().  So you can't have
> sysfs_get_dentry() under it.  So you don't have exclusion with arseloads
> of sysfs tree changes in there.  Joy...

There are two locks.  sysfs_rename_mutex and sysfs_mutex.
sysfs_rename_mutex is above VFS locks while sysfs_mutex is below VFS
locks.  sysfs_rename_mutex() protects against move/rename which can
change the ancestry of a held sysfs_dirent while sysfs_mutex protects
the sd hierarchy itself.  Locking can be wrong if sysfs_rename_mutex
locking is missing from the places where ancestry of a held sd can
change but I can't find one ATM.  If I'm missing your point again, feel
free to scream at me.  :-)

As it's unnecessarily unintuitive, there's a pending change to rename
sysfs_rename_mutex and use it to protect the whole tree structure to
make locking simpler while using sysfs_mutex to guard VFS access such
that the locking hierarchy plainly becomes sysfs_rename_mutex - VFS
locks - sysfs_mutex where all internal sysfs structure is protected by
the outer mutex and the inner one just protects VFS accesses.

> Frankly, with the current state of sysfs the last vestiges of arguments
> used to push it into the tree back then are dead and buried.  I'm not
> blaming you, BTW - the shitpile *did* grow past the point where its
> memory footprint became far too large and something needed to be done.
> Unfortunately, it happened too late for that something being "get rid
> of the entire mess" and now we are saddled with it for good.

Yeah, it's too late to get rid of sysfs and regardless implementation
ugliness, which BTW I think has improved a lot during last six or so
months, it's now pretty useful and important to drivers, so I guess the
only option is trying hard to make it better.

Oh, BTW, the ugly lookup_one_noperm() can be removed if LOOKUP_NOPERM
flag is added.  The only reason sysfs_lookup() uses the specialized
lookup is to avoid permission check.

Thanks.

-- 
tejun

-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2005.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/
_______________________________________________
Bluez-devel mailing list
Bluez-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bluez-devel

WARNING: multiple messages have this Message-ID (diff)
From: Tejun Heo <htejun@gmail.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Gabor Gombas <gombasg@sztaki.hu>,
	Dave Young <hidave.darkstar@gmail.com>,
	linux-kernel@vger.kernel.org, bluez-devel@lists.sourceforge.net,
	Greg KH <greg@kroah.com>,
	ebiederm@xmission.com
Subject: Re: [Bluez-devel] Oops involving RFCOMM and sysfs
Date: Sun, 06 Jan 2008 11:07:52 +0900	[thread overview]
Message-ID: <478037F8.8020103@gmail.com> (raw)
In-Reply-To: <20080105194510.GK27894@ZenIV.linux.org.uk>

Hello,

Al Viro wrote:
> On Sat, Jan 05, 2008 at 11:30:25PM +0900, Tejun Heo wrote:
>>> Assuming that this is what we get, everything looks explainable - we
>>> have sysfs_rename_dir() calling sysfs_get_dentry() while the parent
>>> gets evicted.  We don't have any exclusion, so while we are playing
>>> silly buggers with lookups in sysfs_get_dentry() we have parent become
>>> negative; the rest is obvious...
>> That part of code is walking down the sysfs tree from the s_root of
>> sysfs hierarchy and on each step parent is held using dget() while being
>> referenced, so I don't think they can turn negative there.
> 
> Turn?  Just what stops you from getting a negative (and unhashed) from
> lookup_one_noperm() and on the next iteration being buggered on mutex_lock()?

Right, I haven't thought about that.  When sysfs_get_dentry() is called,
@sd is always valid so unless there was existing negative dentry, lookup
is guaranteed to return positive dentry, but by populating dcache with
negative dentry before a node is created, things can go wrong.  I don't
think that's what's going on here tho.  If that was the case, the
while() loop looking up the next sd to lookup (@cur) should have blown
up as negative dentry will have NULL d_fsdata which doesn't match any sd.

I guess what's needed here is d_revalidate() as other distributed
filesystems do.  I'll test whether this can be actually triggered and
prepare a fix.  Thanks a lot for pointing out the problem.

>>> AFAICS, the locking here is quite broken and frankly, sysfs_get_dentry()
>>> and the way it plays with fs/namei.c are ucking fugly.
>> Can you elaborate a bit?  The locking in sysfs is unconventional but
>> that's mostly from necessity.  It has dual interface - vfs and driver
>> model && vfs data structures (dentry and inode) are too big to always
>> keep around, so it basically becomes a small distributed file system
>> where the backing data can change asynchronously.
> 
> ... with all fun that creates.  As it is, you have those async changers
> of backing data using VFS locking _under_ sysfs locks via lookup_one_noperm()
> and yet it needs sysfs_mutex inside sysfs_lookup().  So you can't have
> sysfs_get_dentry() under it.  So you don't have exclusion with arseloads
> of sysfs tree changes in there.  Joy...

There are two locks.  sysfs_rename_mutex and sysfs_mutex.
sysfs_rename_mutex is above VFS locks while sysfs_mutex is below VFS
locks.  sysfs_rename_mutex() protects against move/rename which can
change the ancestry of a held sysfs_dirent while sysfs_mutex protects
the sd hierarchy itself.  Locking can be wrong if sysfs_rename_mutex
locking is missing from the places where ancestry of a held sd can
change but I can't find one ATM.  If I'm missing your point again, feel
free to scream at me.  :-)

As it's unnecessarily unintuitive, there's a pending change to rename
sysfs_rename_mutex and use it to protect the whole tree structure to
make locking simpler while using sysfs_mutex to guard VFS access such
that the locking hierarchy plainly becomes sysfs_rename_mutex - VFS
locks - sysfs_mutex where all internal sysfs structure is protected by
the outer mutex and the inner one just protects VFS accesses.

> Frankly, with the current state of sysfs the last vestiges of arguments
> used to push it into the tree back then are dead and buried.  I'm not
> blaming you, BTW - the shitpile *did* grow past the point where its
> memory footprint became far too large and something needed to be done.
> Unfortunately, it happened too late for that something being "get rid
> of the entire mess" and now we are saddled with it for good.

Yeah, it's too late to get rid of sysfs and regardless implementation
ugliness, which BTW I think has improved a lot during last six or so
months, it's now pretty useful and important to drivers, so I guess the
only option is trying hard to make it better.

Oh, BTW, the ugly lookup_one_noperm() can be removed if LOOKUP_NOPERM
flag is added.  The only reason sysfs_lookup() uses the specialized
lookup is to avoid permission check.

Thanks.

-- 
tejun

  reply	other threads:[~2008-01-06  2:07 UTC|newest]

Thread overview: 74+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-12-28 17:32 [Bluez-devel] Oops involving RFCOMM and sysfs Gabor Gombas
2007-12-28 17:32 ` Gabor Gombas
2007-12-29  8:07 ` [Bluez-devel] " Dave Young
2007-12-29  8:07   ` Dave Young
2008-01-02 14:48   ` Gabor Gombas
2008-01-02 14:48     ` Gabor Gombas
2008-01-02 15:16   ` Gabor Gombas
2008-01-03 13:16     ` Gabor Gombas
2008-01-04  1:05       ` Dave Young
2008-01-04  1:05         ` Dave Young
2008-01-07  8:07         ` Tejun Heo
2008-01-07  8:07           ` Tejun Heo
2008-01-07 14:10         ` Gabor Gombas
2008-01-07 14:10           ` Gabor Gombas
2008-01-05  7:50     ` Al Viro
2008-01-05  7:50       ` Al Viro
2008-01-05 14:30       ` Tejun Heo
2008-01-05 14:30         ` Tejun Heo
2008-01-05 19:45         ` Al Viro
2008-01-05 19:45           ` Al Viro
2008-01-06  2:07           ` Tejun Heo [this message]
2008-01-06  2:07             ` Tejun Heo
2008-01-06  2:18             ` Al Viro
2008-01-06  2:18               ` Al Viro
2008-01-06  2:54               ` Tejun Heo
2008-01-06  2:54                 ` Tejun Heo
2008-01-06  3:35                 ` Al Viro
2008-01-06  3:35                   ` Al Viro
2008-01-06  3:54                   ` Tejun Heo
2008-01-07  2:37             ` Tejun Heo
2008-01-07  2:37               ` Tejun Heo
2008-01-07  8:21               ` Eric W. Biederman
2008-01-07  8:21                 ` Eric W. Biederman
2008-01-07  9:17                 ` Tejun Heo
2008-01-07  9:17                   ` Tejun Heo
2008-01-07  9:18                   ` Tejun Heo
2008-01-07  9:18                     ` Tejun Heo
2008-01-07  9:22                     ` Al Viro
2008-01-07  9:22                       ` Al Viro
2008-01-07 10:33                       ` Eric W. Biederman
2008-01-07 10:33                         ` Eric W. Biederman
2008-01-07 14:13       ` Gabor Gombas
2008-01-07 14:13         ` Gabor Gombas
2008-01-07 15:24         ` Tejun Heo
2008-01-07 15:24           ` Tejun Heo
2008-01-07 21:00           ` Gabor Gombas
2008-01-07 21:00             ` Gabor Gombas
2008-01-08  9:42             ` Tejun Heo
2008-01-08 13:32               ` Gabor Gombas
2008-01-08 13:32                 ` Gabor Gombas
2008-01-09  9:16                 ` Tejun Heo
2008-01-09  9:16                   ` Tejun Heo
2008-01-09 15:57                   ` Cornelia Huck
2008-01-10  1:11                   ` Dave Young
2008-01-10  1:11                     ` Dave Young
2008-01-11 23:09                     ` Gabor Gombas
2008-01-11 23:09                       ` Gabor Gombas
2008-01-14  7:05                       ` Dave Young
2008-01-14 12:52                         ` Cornelia Huck
2008-01-15  1:57                           ` Dave Young
2008-01-16  1:02                             ` Dave Young
2008-01-16 23:06                               ` Gabor Gombas
2008-01-17  7:24                                 ` Dave Young
2008-01-17  8:15                                   ` Dave Young
2008-01-17 11:42                                     ` Cornelia Huck
2008-01-18  3:37                                       ` Dave Young
2008-01-18  9:19                                         ` Cornelia Huck
2008-01-18 10:23                                           ` Cornelia Huck
2008-01-18 10:34                                             ` Dave Young
2008-01-18 11:26                                               ` Cornelia Huck
2008-01-21  3:15                                                 ` Dave Young
2008-01-21 15:09                                                   ` [Patch] Driver core: Cleanup get_device_parent() in device_add() and device_move() Cornelia Huck
2008-01-10 10:15                   ` [Bluez-devel] Oops involving RFCOMM and sysfs Gabor Gombas
2008-01-10 10:15                     ` Gabor Gombas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=478037F8.8020103@gmail.com \
    --to=htejun@gmail.com \
    --cc=bluez-devel@lists.sf.net \
    --cc=bluez-devel@lists.sourceforge.net \
    --cc=ebiederm@xmission.com \
    --cc=gombasg@sztaki.hu \
    --cc=greg@kroah.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=viro@ZenIV.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.