From: Rob Landley <rob-VoJi6FS/r0vR7s880joybQ@public.gmane.org>
To: Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org>,
	"Eric W. Biederman"
	<ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org>
Cc: Andrew Vagin <avagin-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>,
	Andrey Vagin <avagin-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
	Linux FS Devel
	<linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	"linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org"
	<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Linux API <linux-api-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Andrey Vagin <avagin-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Alexander Viro
	<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Cyrill Gorcunov
	<gorcunov-GEFAQzZX7r8dnm+yROfE0A@public.gmane.org>,
	Pavel Emelyanov <xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>,
	Serge Hallyn
	<serge.hallyn-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
Subject: Re: [PATCH] [RFC] mnt: add ability to clone mntns starting with the current root
Date: Wed, 08 Oct 2014 16:36:01 -0500
Message-ID: <5435AE41.20105@landley.net>
In-Reply-To: <CALCETrVSxYr=Oa29qHNL-GoifS26U8TfpreGY+KN7g926YgHUw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 10/08/14 14:31, Andy Lutomirski wrote:
> On Wed, Oct 8, 2014 at 12:23 PM, Eric W. Biederman
> <ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org> wrote:
>> Andy Lutomirski <luto-kltTT9wpgjJwATOyAt5JVQ@public.gmane.org> writes:
>>>> Maybe we want to say that rootfs should not be used if we are going to
>>>> create containers...
>>
>> Today it is an assumption of the vfs that rootfs is mounted.  With
>> rootfs mounted and pivot_root at the base of the mount stack you can
>> make as minimal of a set of mounts as the vfs allows.
>>
>> Removing rootfs from the vfs requires an audit of everything that
>> manipulates mounts.  It is not remotely a local exercise.
> 
> Would it be a less invasive audit to allow different mount namespaces
> to have different rootfses?

I.E. The same way different namespaces have different init tasks?

The abstraction that containers have implemented here should be logically
consistent.

>>> Could we have an extra rootfs-like fs that is always completely empty,
>>> doesn't allow any writes, and can sit at the bottom of container
>>> namespace hierarchies?  If so, and if we add a new syscall that's like
>>> pivot_root (or unshare) but prunes the hierarchy, then we could switch
>>> to that rootfs then.
>>
>> Or equally have something that guarantees that rootfs is empty and
>> read-only at the time the normal root filesystem is mounted.  That is
>> certainly a much more localized change if we want to go there.
>>
>> I am half tempted to suggest that mount --move /some/path / be updated
>> to make the old / just go away (perhaps to be replaced with a read-only
>> empty rootfs).  That gets us into figuring out if we break userspace
>> which is a big challenge.
> 
> Hence my argument for a new syscall or entirely new operation.

I'm still waiting for somebody to explain to me why chroot() shouldn't
be changed to do this instead of adding a new syscall. (At least when
mount namespace support is enabled.)
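
For reference, a minimal sketch (not from the thread) of the closest
existing way to make the old / "just go away": unshare the mount
namespace, pivot_root(2) onto the new root, then lazily detach the old
one. The function name enter_new_root() is hypothetical. This needs
CAP_SYS_ADMIN, and pivot_root(2) is documented to fail with EINVAL while
the current root is still rootfs, so it does not remove rootfs from the
bottom of the mount stack, which is the gap being discussed here.

```c
#include <errno.h>
#include <linux/sched.h>   /* CLONE_NEWNS without needing _GNU_SOURCE */
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: make new_root the root of a private mount
 * namespace and drop the old tree.  Returns 0 on success, -errno on
 * failure.  On failure the caller may already be in a new namespace.
 */
int enter_new_root(const char *new_root)
{
    /* Get a private copy of the mount tree (raw syscall for unshare(2)). */
    if (syscall(SYS_unshare, CLONE_NEWNS) < 0)
        return -errno;

    /* Keep our mount changes from propagating back to the parent. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0)
        return -errno;

    /* pivot_root() wants new_root to be a mount point: bind it to itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) < 0)
        return -errno;

    if (chdir(new_root) < 0)
        return -errno;

    /* glibc has no wrapper; "." as put_old stacks the old root here. */
    if (syscall(SYS_pivot_root, ".", ".") < 0)
        return -errno;

    /* Lazily detach the old root, which is now mounted on top of ".". */
    if (umount2(".", MNT_DETACH) < 0)
        return -errno;

    if (chdir("/") < 0)
        return -errno;

    return 0;
}
```

A caller would typically fork first, call enter_new_root("/some/path")
in the child, and exec the container's init on success.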

> mount(2) and friends are way too multiplexed right now.  I just found
> yet another security bug due to the insanely complicated semantics of
> the vfs syscalls.  (Yes, a different one from the one yesterday.)

As the guy who rewrote busybox mount 3 times, and who just implemented a
brand new one (toybox) from scratch:

It's a bit fiddly, yes.

> A new operation kills several birds with one stone.  It could look like:
> 
> int mntns_change_root(int dfd, const char *path, int flags);
> 
> Returns -EPERM if chrooted.

Really?

>  Returns -EINVAL if path (relative to dfd) isn't a mountpoint.

Requiring that chroot() only be called on mountpoints would break
existing semantics, which gets us back to a new system call instead of
changing the behavior of the existing one.

If I recall, the first line of pushback against merging the openvz code
as is was "buckets of new syscalls". Pushback against adding a new
system call is understandable. Why can't we fix chroot() now that we
have the tools to do so?

>  Otherwise it disconnects path from the existing
> hierarchy, attaches a permanently-empty read-only rootfs under it,
> makes it the root of the mntns, and does the root refs fixup.  The old
> hierarchy gets thrown out.

We have a chroot() syscall. We don't use it for containers because it
doesn't do what we want. Does it currently do what _anybody_ wants?

> Systemd could use this, too.

While that's a strong argument against it, I'm willing to overlook it.

Rob
