From: Andrew Vagin <avagin-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: "Eric W. Biederman" <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
Linux FS Devel
<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Andrey Vagin <avagin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
Alexander Viro
<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Cyrill Gorcunov
<gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>,
Serge Hallyn
<serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>,
Rob Landley <rob-VoJi6FS/r0vR7s880joybQ@public.gmane.org>
Subject: Re: [PATCH] [RFC] mnt: add ability to clone mntns starting with the current root
Date: Thu, 9 Oct 2014 14:29:19 +0400
Message-ID: <20141009102917.GA3257@paralelels.com>
In-Reply-To: <87vbnue56f.fsf-JOvCrm2gF+uungPnsOpG7nhyD016LWXt@public.gmane.org>
On Wed, Oct 08, 2014 at 12:23:52PM -0700, Eric W. Biederman wrote:
> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>
> > On Wed, Oct 8, 2014 at 4:08 AM, Andrew Vagin <avagin-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org> wrote:
> >> On Tue, Oct 07, 2014 at 01:45:22PM -0700, Eric W. Biederman wrote:
> >>> Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org> writes:
> >>>
> >>> > From: Andrey Vagin <avagin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> >>> >
> >>> > Currently when we create a new container with a separate root,
> >>> > we need to clone the current mount namespace with all mounts and then
> >>> > clean it up by using pivot_root(). Many of the mountpoints are cloned
> >>> > only to be unmounted.
> >>>
> >>> Is the motivation performance? Because if that is the motivation we
> >>> need numbers.
> >>
> >> The major motivation is to create a clean mount namespace which contains
> >> only the required mounts.
> >>
> >> Now you want to convince us that nothing is wrong with using
> >> userns, because all inherited mounts are locked. My point is that all
> >> useless mounts should be unmounted. If the current root isn't on rootfs,
> >> pivot_root() allows us to unmount all useless mounts. But pivot_root()
> >> doesn't work if the current root is on rootfs. How can we unmount the
> >> useless mounts in this case?
>
> One of your justifications for a new system call was so you could do
> less. Doing less to get to where you want to go is only justified when
> you're doing less to get better performance.
>
> >> Maybe we want to say that rootfs should not be used if we are going to
> >> create containers...
>
> Today it is an assumption of the vfs that rootfs is mounted. With
> rootfs mounted and pivot_root at the base of the mount stack you can
> make as minimal a set of mounts as the vfs allows.
You have misunderstood me.
For most systems, /proc/self/mountinfo looks like this:
[root@dhcp-10-30-23-214 ~]# cat /proc/self/mountinfo
17 22 0:3 / /proc rw,relatime - proc proc rw
18 22 0:0 / /sys rw,relatime - sysfs sysfs rw
19 22 0:5 / /dev rw,relatime - devtmpfs devtmpfs rw,size=502324k,nr_inodes=125581,mode=755
20 19 0:11 / /dev/pts rw,relatime - devpts devpts rw,gid=5,mode=620,ptmxmode=000
21 19 0:17 / /dev/shm rw,nosuid,nodev,noexec,relatime - tmpfs tmpfs rw
22 1 253:2 / / rw,relatime - ext4 /dev/vda2 rw,barrier=1,data=ordered
24 22 253:1 / /boot rw,relatime - ext3 /dev/vda1 rw,errors=continue,user_xattr,acl,barrier=1,data=ordered
/ isn't a rootfs mount here, so pivot_root() works fine and there is
no problem for such systems.
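For reference, the pivot_root()-based cleanup that works on such a system can be sketched in C as below. This is a minimal illustration, not the code from the patch: the helper name enter_new_root() and the put_old directory name .old_root are hypothetical, and the sketch assumes the caller has CAP_SYS_ADMIN and passes an absolute path to an already prepared root tree. Detaching put_old with MNT_DETACH drops every inherited mount in a single call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: enter a new mount namespace whose root is new_root
 * and drop all inherited mounts. Returns 0 on success, -1 with errno set. */
int enter_new_root(const char *new_root)
{
    char put_old[4096];

    /* Get a private copy of the current mount namespace. */
    if (unshare(CLONE_NEWNS) == -1)
        return -1;
    /* Keep the following mount changes from propagating to the parent. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == -1)
        return -1;
    /* pivot_root() requires new_root to be a mount point. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) == -1)
        return -1;
    snprintf(put_old, sizeof(put_old), "%s/.old_root", new_root);
    if (mkdir(put_old, 0700) == -1 && errno != EEXIST)
        return -1;
    /* glibc has no wrapper, so invoke the syscall directly. Per
     * pivot_root(2) this fails with EINVAL when the current root is rootfs. */
    if (syscall(SYS_pivot_root, new_root, put_old) == -1)
        return -1;
    if (chdir("/") == -1)
        return -1;
    /* The old root tree, with every inherited mount, is now at /.old_root. */
    if (umount2("/.old_root", MNT_DETACH) == -1)
        return -1;
    return rmdir("/.old_root");
}
```

On the second layout shown next, where / is rootfs, the pivot_root() call is the step that fails.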
Now look at the second case:
hell@android:/ $ cat /proc/self/mountinfo
1 1 0:1 / / ro,relatime - rootfs rootfs ro
11 1 0:11 / /dev rw,nosuid,relatime - tmpfs tmpfs rw,mode=755
12 11 0:9 / /dev/pts rw,relatime - devpts devpts rw,mode=600
13 1 0:3 / /proc rw,relatime - proc proc rw
14 1 0:12 / /sys rw,relatime - sysfs sysfs rw
Now / is a rootfs mount. pivot_root() doesn't work in this case, and we
need to resort to tricks to get a minimal set of mounts.
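A container runtime has to tell these two cases apart before choosing a strategy. The following is a minimal sketch, assuming it is enough to check the filesystem type of the last /proc/self/mountinfo entry whose mount point is "/". As in the listings above, field 5 is the mount point and the filesystem type is the first field after the " - " separator. The function names are hypothetical, and escape sequences such as \040 in mount points are not decoded.

```c
#include <stdio.h>
#include <string.h>

/* Parse one mountinfo line: field 5 is the mount point, and the filesystem
 * type is the first field after the " - " separator.
 * Returns 0 on success, -1 if the line does not look like mountinfo. */
int parse_mountinfo_line(const char *line, char mnt[256], char fstype[64])
{
    const char *sep;

    /* Skip mount ID, parent ID, major:minor and root; read the mount point. */
    if (sscanf(line, "%*d %*d %*s %*s %255s", mnt) != 1)
        return -1;
    sep = strstr(line, " - ");
    if (sep == NULL)
        return -1;
    if (sscanf(sep + 3, "%63s", fstype) != 1)
        return -1;
    return 0;
}

/* Returns 1 if "/" is a rootfs mount, 0 if not, -1 if mountinfo is
 * unreadable. Assumes the last entry for "/" is the one on top. */
int root_is_rootfs(void)
{
    FILE *f = fopen("/proc/self/mountinfo", "r");
    char line[4096], mnt[256], fstype[64];
    int result = 0;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof(line), f) != NULL) {
        if (parse_mountinfo_line(line, mnt, fstype) == 0 &&
            strcmp(mnt, "/") == 0)
            result = strcmp(fstype, "rootfs") == 0;
    }
    fclose(f);
    return result;
}
```

Given the "1 1 0:1 / / ro,relatime - rootfs rootfs ro" line above, the parser yields mount point "/" and type "rootfs". For the first system it yields "ext4" for "/".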
Thanks,
Andrew
>
> Removing rootfs from the vfs requires an audit of everything that
> manipulates mounts. It is not remotely a local exercise.
>
> One of the things that needs to be considered, if you really want
> to audit mounts, is that the code that manipulates them needs to be
> audited every bit as much as the mounts themselves.
>
> > Could we have an extra rootfs-like fs that is always completely empty,
> > doesn't allow any writes, and can sit at the bottom of container
> > namespace hierarchies? If so, and if we add a new syscall that's like
> > pivot_root (or unshare) but prunes the hierarchy, then we could switch
> > to that rootfs then.
>
> Or equally have something that guarantees that rootfs is empty and
> read-only at the time the normal root filesystem is mounted. That is
> certainly a much more localized change if we want to go there.
>
> I am half tempted to suggest that mount --move /some/path / be updated
> to make the old / just go away (perhaps to be replaced with a read-only
> empty rootfs). That gets us into figuring out if we break userspace
> which is a big challenge.
>
> Eric
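When / is rootfs, the usual workaround is the one switch_root-style tools use. It relies on the current behaviour of mount --move onto / that Eric mentions: move the new root on top of / and chroot into it. The sketch below is an illustration under stated assumptions. move_root() is a hypothetical name, the caller is assumed to have CAP_SYS_ADMIN, and new_root must be a mount point. It only hides rootfs and anything still mounted on it. Nothing is unmounted, which is exactly the limitation under discussion.

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <unistd.h>

/* Hypothetical helper: switch_root-style root change for when "/" is
 * rootfs and pivot_root() fails. Returns 0 on success, -1 with errno set. */
int move_root(const char *new_root)
{
    /* Work relative to the new root so "." names it from here on. */
    if (chdir(new_root) == -1)
        return -1;
    /* Move the mount at new_root on top of "/". The old rootfs stays
     * underneath it, hidden but still mounted. */
    if (mount(".", "/", NULL, MS_MOVE, NULL) == -1)
        return -1;
    /* Make the moved mount the process root. */
    if (chroot(".") == -1)
        return -1;
    return chdir("/");
}
```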