Linux Container Development
From: ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org (Eric W. Biederman)
To: Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org>
Cc: linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Linux Containers
	<containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org>,
	Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [REVIEW][PATCH 1/2] userns: Better restrictions on when proc and sysfs can be mounted
Date: Fri, 08 Nov 2013 21:42:36 -0800
Message-ID: <87k3gigmgj.fsf@xmission.com>
In-Reply-To: <527C4D88.10907-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> (Gao feng's message of "Fri, 08 Nov 2013 10:33:44 +0800")

Gao feng <gaofeng-BthXqXjhjHXQFUHtdCDX3A@public.gmane.org> writes:

> On 11/02/2013 02:06 PM, Gao feng wrote:
>> Hi Eric,
>> 
>> On 08/28/2013 05:44 AM, Eric W. Biederman wrote:
>>>
>>> Rely on the fact that another flavor of the filesystem is already
>>> mounted and do not rely on state in the user namespace.
>>>
>>> Verify that the mounted filesystem is not covered in any significant
>>> way.  I would love to verify that the previously mounted filesystem
>>> has no mounts on top but there are at least the directories
>>> /proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
>>> for other filesystems to mount on top of.
>>>
>>> Refactor the test into a function named fs_fully_visible and call that
>>> function from the mount routines of proc and sysfs.  This makes this
>>> test local to the filesystems involved and the results current of when
>>> the mounts take place, removing a weird threading of the user
>>> namespace, the mount namespace and the filesystems themselves.
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
>>> ---
>>>  fs/namespace.c                 |   37 +++++++++++++++++++++++++------------
>>>  fs/proc/root.c                 |    7 +++++--
>>>  fs/sysfs/mount.c               |    3 ++-
>>>  include/linux/fs.h             |    1 +
>>>  include/linux/user_namespace.h |    4 ----
>>>  kernel/user.c                  |    2 --
>>>  kernel/user_namespace.c        |    2 --
>>>  7 files changed, 33 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/fs/namespace.c b/fs/namespace.c
>>> index 64627f8..877e427 100644
>>> --- a/fs/namespace.c
>>> +++ b/fs/namespace.c
>>> @@ -2867,25 +2867,38 @@ bool current_chrooted(void)
>>>  	return chrooted;
>>>  }
>>>  
>>> -void update_mnt_policy(struct user_namespace *userns)
>>> +bool fs_fully_visible(struct file_system_type *type)
>>>  {
>>>  	struct mnt_namespace *ns = current->nsproxy->mnt_ns;
>>>  	struct mount *mnt;
>>> +	bool visible = false;
>>>  
>>> -	down_read(&namespace_sem);
>>> +	if (unlikely(!ns))
>>> +		return false;
>>> +
>>> +	namespace_lock();
>>>  	list_for_each_entry(mnt, &ns->list, mnt_list) {
>>> -		switch (mnt->mnt.mnt_sb->s_magic) {
>>> -		case SYSFS_MAGIC:
>>> -			userns->may_mount_sysfs = true;
>>> -			break;
>>> -		case PROC_SUPER_MAGIC:
>>> -			userns->may_mount_proc = true;
>>> -			break;
>>> +		struct mount *child;
>>> +		if (mnt->mnt.mnt_sb->s_type != type)
>>> +			continue;
>>> +
>>> +		/* This mount is not fully visible if there are any child mounts
>>> +		 * that cover anything except for empty directories.
>>> +		 */
>>> +		list_for_each_entry(child, &mnt->mnt_mounts, mnt_child) {
>>> +			struct inode *inode = child->mnt_mountpoint->d_inode;
>>> +			if (!S_ISDIR(inode->i_mode))
>>> +				goto next;
>>> +			if (inode->i_nlink != 2)
>>> +				goto next;
>> 
>> 
>> I ran into a problem: the proc filesystem fails to mount in a user
>> namespace.  The i_nlink of sysctl directories under proc is not 2; it
>> is always 1, even for a directory (see proc_sys_make_inode).  On btrfs
>> the i_nlink of an empty directory is 1 too, so the value seems to
>> depend on the filesystem itself rather than on the VFS.  On my system
>> binfmt_misc is mounted on /proc/sys/fs/binfmt_misc, and the i_nlink of
>> that directory's inode is 1.

Yes.  1 is what a filesystem returns when it is too lazy to count the
number of links to a directory, and /proc/sys is currently such a
filesystem.

Ordinarily nlink == 2 means a directory does not have any subdirectories.

>> By the way, I don't quite understand what the inode->i_nlink != 2 test
>> means here.  Is it checking that the directory is empty?  As far as I
>> know, creating a file (not a directory) under a directory does not
>> increase the directory's i_nlink.
>> 
>> Another question: it looks like if proc or sysfs is not already
>> mounted, then mounting proc or sysfs will fail?
>> 
>
> Any ideas?  Or should we revert this patch?

The patch is mostly doing what it is supposed to be doing.

Now the code is slightly buggy.  The inode->i_nlink check tests whether a
directory has subdirectories, but it does not test whether the directory
is empty: regular files in a directory do not change its link count.
Where did my brain go when I was writing that test?

Right now I would rather not have the empty directory exception than
remove this code.

The test is a little trickier to write than it might otherwise be
because /proc and /sys tend to be slightly imperfect filesystems.

I think the only way to really test that is to call readdir on the
directory itself :(  I don't like that thought.
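
For reference, the readdir-based emptiness check looks roughly like this
in userspace.  This is only a sketch of the idea using opendir(3) and
readdir(3); the function name dir_is_empty is hypothetical, and the
in-kernel equivalent would have to go through the directory's iterate
operation instead.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 if the directory holds nothing but "."
 * and "..", 0 if it holds any other entry, and -1 on error (errno set).
 */
static int dir_is_empty(const char *path)
{
	struct dirent *ent;
	int empty = 1;
	int saved_errno;
	DIR *dir;

	assert(path != NULL);

	dir = opendir(path);
	if (!dir)
		return -1;

	/* readdir() returns NULL both at end of directory and on error,
	 * so clear errno first to tell the two cases apart. */
	errno = 0;
	while ((ent = readdir(dir)) != NULL) {
		if (strcmp(ent->d_name, ".") == 0 ||
		    strcmp(ent->d_name, "..") == 0)
			continue;
		empty = 0;	/* found a real entry, stop early */
		break;
	}
	if (ent == NULL && errno != 0)
		empty = -1;

	/* Preserve the readdir() errno across closedir(). */
	saved_errno = errno;
	closedir(dir);
	errno = saved_errno;
	return empty;
}
```

Unlike the link count, this catches regular files as well as
subdirectories, at the cost of actually walking the directory.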

I don't know what I was thinking when I wrote that test but I definitely
goofed up.  Grr!

I can certainly filter out any directory with nlink > 2.  That would be
an easy partial step forward.

The real question though is how do I detect directories that are safe to
mount on, meaning directories that contain no files.  I can't call
iterate with the namespace_lock held, so things are a bit tricky.

Eric
