linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Serge E. Hallyn" <serue@us.ibm.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>,
	Kay Sievers <kay.sievers@vrfy.org>,
	linux-kernel@vger.kernel.org, Tejun Heo <tj@kernel.org>,
	Cornelia Huck <cornelia.huck@de.ibm.com>,
	linux-fsdevel@vger.kernel.org,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Benjamin LaHaise <bcrl@lhnet.ca>,
	netdev@vger.kernel.org, Benjamin Thery <benjamin.thery@bull.net>
Subject: Re: [PATCH 3/6] sysfs: Implement sysfs tagged directory support.
Date: Tue, 30 Mar 2010 23:02:34 -0500	[thread overview]
Message-ID: <20100331040234.GA7184@us.ibm.com> (raw)
In-Reply-To: <m1zl1prwie.fsf@fess.ebiederm.org>

Quoting Eric W. Biederman (ebiederm@xmission.com):
> "Serge E. Hallyn" <serue@us.ibm.com> writes:
> 
> > Quoting Eric W. Biederman (ebiederm@xmission.com):
> >>  int sysfs_rename(struct sysfs_dirent *sd,
> >> -	struct sysfs_dirent *new_parent_sd, const char *new_name)
> >> +	struct sysfs_dirent *new_parent_sd, const void *new_ns,
> >> +	const char *new_name)
> >>  {
> >>  	const char *dup_name = NULL;
> >>  	int error;
> >> @@ -743,12 +789,12 @@ int sysfs_rename(struct sysfs_dirent *sd,
> >>  	mutex_lock(&sysfs_mutex);
> >> 
> >>  	error = 0;
> >> -	if ((sd->s_parent == new_parent_sd) &&
> >> +	if ((sd->s_parent == new_parent_sd) && (sd->s_ns == new_ns) &&
> >>  	    (strcmp(sd->s_name, new_name) == 0))
> >>  		goto out;	/* nothing to rename */
> >> 
> >>  	error = -EEXIST;
> >> -	if (sysfs_find_dirent(new_parent_sd, new_name))
> >> +	if (sysfs_find_dirent(new_parent_sd, new_ns, new_name))
> >>  		goto out;
> >> 
> >>  	/* rename sysfs_dirent */
> >> @@ -770,6 +816,7 @@ int sysfs_rename(struct sysfs_dirent *sd,
> >>  		sd->s_parent = new_parent_sd;
> >>  		sysfs_link_sibling(sd);
> >>  	}
> >> +	sd->s_ns = new_ns;
> >> 
> >>  	error = 0;
> >>   out:
> >
> > ...
> >
> >> +void sysfs_exit_ns(enum kobj_ns_type type, const void *ns)
> >> +{
> >> +	struct super_block *sb;
> >> +
> >> +	mutex_lock(&sysfs_mutex);
> >> +	spin_lock(&sb_lock);
> >> +	list_for_each_entry(sb, &sysfs_fs_type.fs_supers, s_instances) {
> >> +		struct sysfs_super_info *info = sysfs_info(sb);
> >> +		/* Ignore superblocks that are in the process of unmounting */
> >> +		if (sb->s_count <= S_BIAS)
> >> +			continue;
> >> +		/* Ignore superblocks with the wrong ns */
> >> +		if (info->ns[type] != ns)
> >> +			continue;
> >> +		info->ns[type] = NULL;
> >> +	}
> >> +	spin_unlock(&sb_lock);
> >> +	mutex_unlock(&sysfs_mutex);
> >> +}
> >> +
> >
> > ..
> >
> >> @@ -136,6 +138,7 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >>  			const char *old, const char *new)
> >>  {
> >>  	struct sysfs_dirent *parent_sd, *sd = NULL;
> >> +	const void *old_ns = NULL, *new_ns = NULL;
> >>  	int result;
> >> 
> >>  	if (!kobj)
> >> @@ -143,8 +146,11 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >>  	else
> >>  		parent_sd = kobj->sd;
> >> 
> >> +	if (targ->sd)
> >> +		old_ns = targ->sd->s_ns;
> >> +
> >>  	result = -ENOENT;
> >> -	sd = sysfs_get_dirent(parent_sd, old);
> >> +	sd = sysfs_get_dirent(parent_sd, old_ns, old);
> >>  	if (!sd)
> >>  		goto out;
> >> 
> >> @@ -154,7 +160,10 @@ int sysfs_rename_link(struct kobject *kobj, struct kobject *targ,
> >>  	if (sd->s_symlink.target_sd->s_dir.kobj != targ)
> >>  		goto out;
> >> 
> >> -	result = sysfs_rename(sd, parent_sd, new);
> >> +	if (sysfs_ns_type(parent_sd))
> >> +		new_ns = targ->ktype->namespace(targ);
> >> +
> >> +	result = sysfs_rename(sd, parent_sd, new_ns, new);
> >> 
> >>  out:
> >>  	sysfs_put(sd);
> >
> > This is a huge patch, and for the most part I haven't found any problems,
> > except potentially this one.  It looks like sysfs_rename_link() checks
> > old_ns and new_ns before calling sysfs_rename().  But sysfs_mutex isn't
> > taken until sysfs_rename().  sysfs_rename() will then proceed to do
> > the rename, and unconditionally set sd->ns = new_ns.
> >
> > In the meantime, it seems as though new_ns might have exited, and
> > sysfs_exit_ns() unset new_ns on the new parent dir.  This means that
> > we'll end up with the namespace code having thought that it cleared
> > all new_ns's, but this file will have snuck by.  Meaning an action on
> > the renamed file might dereference a freed namespace.
> >
> > Or am I way off base?
> 
> There are a couple of reasons why this is not a concern.
> 
> The only new_ns we clear is on the super block.

Oops, yeah - I failed to note that.

> sysfs itself never dereferences namespace arguments and only uses them
> for comparison purposes.  They are just cookies that cause comparisons
> to differ from a sysfs perspective.
> 
> The upper levels are responsible for taking care of them selves
> sysfs_mutex does not protect them.  If you compile out sysfs the sysfs
> mutex is not even present.
> 
> In the worst case if the upper levels mess up we will have a stale
> token that we never dereference on a sysfs dirent, which in a pathological
> case will happen to be the same as a new namespace and we will have
> a spurious directory entry that we have leaked.
> 
> In practice we move all network devices (and thus sysfs files) out of
> a network namespace before allowing it to exit.

Ok, that makes sense too - so any tagged sysfs file created for some object
in a ns must be deleted at netns exit.  I could imagine someone expecting
that if the ns exits, the tasks in the ns will exit, causing the sysfs
mount to be umounted and auto-deleting the files?  (which of course would
get buggered if task in other ns was examining the mount which it got
through mounts propagation)  We'll have to make sure noone does that.  Should
it be documented somewhere, or is that obvious enough?

(I'm thinking of other namespaces in the future, not net_ns which I
understand doesn't do that)

> The network namespace
> is not listed so it is invisible to anyone wanting to poke a network
> device into an exiting network namespace.  The unlisting of the
> network namespace and the device_rename both happen under the
> rtnl_lock which guarantees they are serialized.
> 
> Eric

  reply	other threads:[~2010-03-31  4:02 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-03-30 18:30 [PATCH 0/6] tagged sysfs support Eric W. Biederman
2010-03-30 18:31 ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Eric W. Biederman
2010-03-30 19:23   ` Eric Dumazet
2010-03-30 23:50     ` [PATCH 7/6] sysfs: Remove double free sysfs_get_sb Eric W. Biederman
2010-03-31  5:01   ` [PATCH 1/6] sysfs: Basic support for multiple super blocks Serge E. Hallyn
2010-03-31  5:01     ` Serge E. Hallyn
2010-03-31  5:41   ` Tejun Heo
2010-03-31  5:51     ` Eric W. Biederman
2010-03-31 13:47       ` Serge E. Hallyn
2010-03-31 14:02         ` Eric W. Biederman
2010-04-05  7:45       ` Tejun Heo
2010-03-30 18:31 ` [PATCH 2/6] kobj: Add basic infrastructure for dealing with namespaces Eric W. Biederman
2010-03-30 18:31 ` [PATCH 3/6] sysfs: Implement sysfs tagged directory support Eric W. Biederman
2010-03-31  2:43   ` Serge E. Hallyn
2010-03-31  3:38     ` Eric W. Biederman
2010-03-31  4:02       ` Serge E. Hallyn [this message]
2010-03-31  4:23         ` Eric W. Biederman
2010-03-31  4:53           ` Serge E. Hallyn
2010-03-31  6:49   ` Tejun Heo
2010-03-31  7:43     ` Eric W. Biederman
2010-03-31  8:17       ` Tejun Heo
2010-03-31  8:22         ` Tejun Heo
2010-03-31  9:39           ` Eric W. Biederman
2010-04-05  8:17             ` Tejun Heo
2010-03-30 18:31 ` [PATCH 4/6] sysfs: Add support for tagged directories with untagged members Eric W. Biederman
2010-03-30 18:31 ` [PATCH 5/6] sysfs: Implement sysfs_delete_link Eric W. Biederman
2010-03-30 18:31 ` [PATCH 6/6] driver core: Implement ns directory support for device classes Eric W. Biederman
2010-03-30 18:53 ` [PATCH 0/6] tagged sysfs support Kay Sievers
2010-03-30 23:04   ` Eric W. Biederman
2010-03-31  5:51     ` Kay Sievers
2010-03-31  6:25       ` Tejun Heo
2010-03-31  6:52       ` Eric W. Biederman
2010-04-03  0:58       ` Ben Hutchings
2010-04-03  8:35         ` Kay Sievers
2010-04-03 16:05           ` Ben Hutchings
2010-04-03 16:35             ` Kay Sievers
2010-03-31 17:21 ` Serge E. Hallyn
2010-03-31 18:09   ` Eric W. Biederman
2010-05-20 17:47 ` Greg KH

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20100331040234.GA7184@us.ibm.com \
    --to=serue@us.ibm.com \
    --cc=bcrl@lhnet.ca \
    --cc=benjamin.thery@bull.net \
    --cc=cornelia.huck@de.ibm.com \
    --cc=ebiederm@xmission.com \
    --cc=eric.dumazet@gmail.com \
    --cc=gregkh@suse.de \
    --cc=kay.sievers@vrfy.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).