linux-kernel.vger.kernel.org archive mirror
From: Daniel Lezcano <dlezcano@fr.ibm.com>
To: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>,
	Benjamin Thery <benjamin.thery@bull.net>,
	Greg KH <greg@kroah.com>,
	linux-kernel@vger.kernel.org,
	"Serge E. Hallyn" <serue@us.ibm.com>,
	Al Viro <viro@ftp.linux.org.uk>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: Re: sysfs: tagged directories not merged completely yet
Date: Tue, 07 Oct 2008 11:01:32 +0200
Message-ID: <48EB256C.4020003@fr.ibm.com>
In-Reply-To: <20081003101331.GH28946@ZenIV.linux.org.uk>

Al Viro wrote:
> On Tue, Sep 23, 2008 at 11:23:57AM -0700, Eric W. Biederman wrote:
>> Benjamin Thery <benjamin.thery@bull.net> writes:
>>> Oh.
>>> It's a pity Al couldn't re-review them before. We've already lost a lot
>>> of time with this patchset and it's blocking easier testing of network
>>> namespaces (right now, with a mainline kernel, we have to disable sysfs
>>> to build network namespaces).
>> I am confident that we have a good base with these patches and the rest of
>> the work can be done incrementally on top of them if any issues turn up.
>>
>> Al recent rework of sysctl has a very similar structure.
> 
> No, it does not.  My apologies for delay, but here are more printable parts
> of review.
> 
> First of all, this stuff breaks just about every damn integrity constraint VFS
> knows of.  It tries to tiptoe through the resulting minefield - without
> success.  So the first group of comments will be of "you *really* don't
> do $FOO" variety.  I'm very far from being convinced that we want to
> special-case in VFS every kind of weirdness sysfs happens to do; in effect,
> that would require adding a FS_IS_SYSFS_SO_BEND_OVER fs type flag and making
> a lot of locking conditional on that.
> 
> a) You do *not* share struct inode between the superblocks, for fsck sake!
> b) You do *not* grab i_mutex on ancestors after having grabbed it on
> file, as sysfs_chmod_file() does.
> c) You do *not* change dentry tree topology without s_vfs_rename_mutex on
> affected superblock.  That, BTW, is broken in mainline sysfs as well.
> d) You REALLY, REALLY do not unhash busy dentries of directories.
> 
> In addition to that, there are interesting internal problems:
> * inumbers are released by final sysfs_put(); that can happen before the
> final iput() on corresponding inode.  Guess what happens if new entry is
> created in the meanwhile, reuses the same inumber and lookup gets to
> sysfs_get_inode() on it?
> * may I politely suggest that
> again:
> 	mutex_lock(&A);
> 	if (!mutex_trylock(&B)) {
> 		mutex_unlock(&A);
> 		goto again;
> 	}
> is somewhat, er, deficient way to deal with buggered locking hierarchy?
> Not to mention anything else, that's obviously FUBAR on UP box - if we
> have B contended, we've just killed the box dead since we'll never give
> the CPU back to whoever happens to hold B.  See sysfs_mv_dir() for a lovely
> example.
> * sysfs_count_nlink() is called from sysfs_fill_super() without sysfs_mutex;
> now this sucker can get called at any moment.
> * just what is protecting ->s_tag?
> * __sysfs_remove_dir() appears to assume that subdirectories are possible;
> AFAICS, if we *do* get them, we get very screwed after remove_dir().
> * everything else aside, the internal locking is extremely heavy.  For
> fsck sake, guys, a single system-wide mutex that can be grabbed for the
> duration of readdir on any directory and block just about anything
> in the filesystem?  Just mmap() something over NFS on a slow link and
> do getdents() to such buffer.  Watch a *lot* of stuff getting buggered.
> Hell, you can't even do ifconfig up while that sucker is held...
> 
> Seriously, people, it's getting worse than devfs had ever been ;-/

Thank you Al for reviewing the patchset.

   -- Daniel
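
For readers following the locking point in the quoted review: the
lock(A)/trylock(B)/retry loop spins without ever sleeping on B, so on a
uniprocessor box the holder of B may never get to run. A common alternative
is to give the two locks a fixed acquisition order and block on both. The
sketch below is a userspace illustration with POSIX mutexes, not sysfs or
kernel code; the helper names lock_pair() and unlock_pair() are hypothetical,
and ordering by address is only one possible choice of hierarchy (the VFS
itself orders directory pairs differently, via lock_rename()).

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/*
 * Hypothetical helpers, for illustration only.
 *
 * Take two mutexes in a fixed global order (lower address first).
 * Every caller that needs both locks agrees on the same order, so two
 * threads can never each hold one lock while waiting for the other,
 * and nobody has to spin on trylock: a contended lock simply sleeps.
 */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	int rc;

	if (a == b) {
		/* Same lock passed twice: take it once. */
		rc = pthread_mutex_lock(a);
		assert(rc == 0);
		return;
	}

	/* Compare as integers; relational comparison of unrelated
	 * pointers is not defined by the C standard. */
	if ((uintptr_t)a > (uintptr_t)b) {
		pthread_mutex_t *tmp = a;

		a = b;
		b = tmp;
	}

	rc = pthread_mutex_lock(a);
	assert(rc == 0);
	rc = pthread_mutex_lock(b);	/* blocks instead of busy-retrying */
	assert(rc == 0);
}

/* Release both locks; unlock order does not matter for correctness. */
static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
	int rc;

	rc = pthread_mutex_unlock(a);
	assert(rc == 0);
	if (a != b) {
		rc = pthread_mutex_unlock(b);
		assert(rc == 0);
	}
}
```

A caller would write lock_pair(&A, &B); ... unlock_pair(&A, &B); and gets
the same acquisition order whichever way round the arguments are passed.
This only works if every path that takes both locks goes through the same
helper.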
