From: Daniel Lezcano <dlezcano@fr.ibm.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
Benjamin Thery <benjamin.thery@bull.net>,
Greg KH <greg@kroah.com>,
linux-kernel@vger.kernel.org,
"Serge E. Hallyn" <serue@us.ibm.com>,
Al Viro <viro@ftp.linux.org.uk>,
Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: sysfs: tagged directories not merged completely yet
Date: Tue, 07 Oct 2008 11:01:32 +0200
Message-ID: <48EB256C.4020003@fr.ibm.com>
In-Reply-To: <20081003101331.GH28946@ZenIV.linux.org.uk>
Al Viro wrote:
> On Tue, Sep 23, 2008 at 11:23:57AM -0700, Eric W. Biederman wrote:
>> Benjamin Thery <benjamin.thery@bull.net> writes:
>>> Oh.
>>> It's a pity Al couldn't re-review them before. We've already lost a lot
>>> of time with this patchset and it's blocking easier testing of network
>>> namespaces (right now, with a mainline kernel, we have to disable sysfs
>>> to build network namespaces).
>> I am confident that we have a good base with these patches and the rest of
>> the work can be done incrementally on top of them if any issues turn up.
>>
>> Al's recent rework of sysctl has a very similar structure.
>
> No, it does not. My apologies for delay, but here are more printable parts
> of review.
>
> First of all, this stuff breaks just about every damn integrity constraint VFS
> knows of. It tries to tiptoe through the resulting minefield - without
> success. So the first group of comments will be of "you *really* don't
> do $FOO" variety. I'm very far from being convinced that we want to
> special-case in VFS every kind of weirdness sysfs happens to do; in effect,
> that would require adding a FS_IS_SYSFS_SO_BEND_OVER fs type flag and making
> a lot of locking conditional on that.
>
> a) You do *not* share struct inode between the superblocks, for fsck sake!
> b) You do *not* grab i_mutex on ancestors after having grabbed it on
> file, as sysfs_chmod_file() does.
> c) You do *not* change dentry tree topology without s_vfs_rename_mutex on
> affected superblock. That, BTW, is broken in mainline sysfs as well.
> d) You REALLY, REALLY do not unhash busy dentries of directories.
>
> In addition to that, there are interesting internal problems:
> * inumbers are released by final sysfs_put(); that can happen before the
> final iput() on corresponding inode. Guess what happens if new entry is
> created in the meanwhile, reuses the same inumber and lookup gets to
> sysfs_get_inode() on it?
> * may I politely suggest that
> again:
>         mutex_lock(&A);
>         if (!mutex_trylock(&B)) {
>                 mutex_unlock(&A);
>                 goto again;
>         }
> is somewhat, er, deficient way to deal with buggered locking hierarchy?
> Not to mention anything else, that's obviously FUBAR on UP box - if we
> have B contended, we've just killed the box dead since we'll never give
> the CPU back to whoever happens to hold B. See sysfs_mv_dir() for a lovely
> example.
> * sysfs_count_nlink() is called from sysfs_fill_super() without sysfs_mutex;
> now this sucker can get called at any moment.
> * just what is protecting ->s_tag?
> * __sysfs_remove_dir() appears to assume that subdirectories are possible;
> AFAICS, if we *do* get them, we get very screwed after remove_dir().
> * everything else aside, the internal locking is extremely heavy. For
> fsck sake, guys, a single system-wide mutex that can be grabbed for the
> duration of readdir on any directory and block just about anything
> in the filesystem? Just mmap() something over NFS on a slow link and
> do getdents() to such buffer. Watch a *lot* of stuff getting buggered.
> Hell, you can't even do ifconfig up while that sucker is held...
>
> Seriously, people, it's getting worse than devfs had ever been ;-/
Thank you, Al, for reviewing the patchset.
-- Daniel
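
Note on the trylock loop quoted above: the retry pattern spins because the two mutexes are taken in an order that conflicts with some other path. One common alternative, sketched here in userspace C with pthreads, is to always take both locks in a single fixed order (by address in this sketch), so no retry is needed. This is only an illustration of the general technique and not what sysfs or the posted patches do; lock_pair() and unlock_pair() are hypothetical names, and ordering by address is an assumption made for this example.

```c
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical helper: take two mutexes in a fixed global order
 * (lowest address first), so that two threads locking the same
 * pair can never hold one lock each while waiting for the other.
 */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    if (a == b) {
        /* Same mutex passed twice: lock it only once. */
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {
        /* Swap so that 'a' is always the lower address. */
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Hypothetical helper: release a pair taken with lock_pair(). */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```

Because both callers block in pthread_mutex_lock() instead of looping on a failed trylock, a waiter sleeps and the current holder gets the CPU back, which is the property the quoted loop lacks on a uniprocessor box.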
Thread overview: 42+ messages
2008-09-22 14:31 sysfs: tagged directories not merged completely yet Benjamin Thery
2008-09-22 15:34 ` Greg KH
2008-09-22 20:24 ` Eric W. Biederman
2008-09-23 14:24 ` Benjamin Thery
2008-09-23 18:23 ` Eric W. Biederman
2008-10-03 10:13 ` Al Viro
2008-10-05 5:32 ` Greg KH
2008-10-07 8:27 ` Eric W. Biederman
2008-10-07 10:47 ` [PATCH 0/3] minor sysfs tagged directory fixes Eric W. Biederman
2008-10-07 10:49 ` [PATCH 1/3] sysfs: Remove lock ordering violation in sysfs_chmod_file Eric W. Biederman
2008-10-07 10:51 ` [PATCH 2/3] sysfs: Fix and sysfs_mv_dir by using lock_rename Eric W. Biederman
2008-10-07 10:52 ` [PATCH 3/3] sysfs: Take sysfs_mutex when fetching the root inode Eric W. Biederman
2008-10-07 21:21 ` [PATCH 2/3] sysfs: Fix and sysfs_mv_dir by using lock_rename Dave Hansen
2008-10-07 21:19 ` [PATCH 1/3] sysfs: Remove lock ordering violation in sysfs_chmod_file Dave Hansen
2008-10-07 22:31 ` Eric W. Biederman
2008-10-07 22:27 ` sysfs: tagged directories not merged completely yet Greg KH
2008-10-07 22:54 ` Serge E. Hallyn
2008-10-07 23:39 ` Greg KH
2008-10-08 0:12 ` Serge E. Hallyn
2008-10-08 0:38 ` Greg KH
2008-10-08 14:18 ` Serge E. Hallyn
2008-10-07 23:34 ` Tejun Heo
2008-10-14 1:11 ` Eric W. Biederman
2008-10-14 7:55 ` Tejun Heo
2008-10-14 12:19 ` Eric W. Biederman
2008-10-15 11:04 ` Tejun Heo
2008-10-16 21:58 ` Eric W. Biederman
2008-10-14 18:53 ` Serge E. Hallyn
2008-10-15 0:48 ` Eric W. Biederman
2008-10-15 13:42 ` Serge E. Hallyn
2008-10-15 13:54 ` Benjamin Thery
2008-10-08 0:39 ` Eric W. Biederman
2008-10-08 1:29 ` Eric W. Biederman
2008-10-07 8:08 ` Eric W. Biederman
2008-10-07 9:01 ` Daniel Lezcano [this message]
2008-10-07 9:12 ` Tejun Heo
2008-10-07 11:56 ` Eric W. Biederman
2008-10-07 12:19 ` Tejun Heo
2008-10-07 23:17 ` Tejun Heo
2008-10-08 0:04 ` Eric W. Biederman
2008-10-08 0:20 ` Tejun Heo
2008-10-08 0:58 ` Eric W. Biederman