cgroups.vger.kernel.org archive mirror
From: Glauber Costa <glommer-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org>
To: Al Viro <viro-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	ebiederm-aS9lmoZGLiVWk0Htik3J/w@public.gmane.org,
	serge-A9i7LUbDfNHQT0dZR+AlfA@public.gmane.org,
	daniel.lezcano-GANU6spQydw@public.gmane.org,
	pjt-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
	mzxreary-uLTowLwuiw4b1SvskN2V4Q@public.gmane.org,
	xemul-bzQdu9zFT3WakBO8gow8eQ@public.gmane.org,
	James.Bottomley-d9PhHud1JfjCXq6kfMZ53/egYHeGw8Jk@public.gmane.org,
	tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
	eric.dumazet-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org
Subject: Re: [RFC 0/4] per-namespace allowed filesystems list
Date: Tue, 24 Jan 2012 14:22:49 +0400
Message-ID: <4F1E8679.5060606@parallels.com>
In-Reply-To: <20120123211218.GF23916-3bDd1+5oDREiFSDQTTA3OLVCufUGDwFn@public.gmane.org>

On 01/24/2012 01:12 AM, Al Viro wrote:
> On Mon, Jan 23, 2012 at 08:56:08PM +0400, Glauber Costa wrote:
>> This patch creates a list of allowed filesystems per-namespace.
>> The goal is to prevent users inside a container, even root,
>> to mount filesystems that are not allowed by the main box admin.
>>
>> My main two motivators to pursue this are:
>>   1) We want to prevent a certain tailored view of some virtual
>>      filesystems, for example, by bind-mounting files with userspace
>>      generated data into /proc. The ability of mounting /proc inside
>>      the container works against this effort, while disallowing it
>>      via capabilities would have the effect of disallowing other
>>      mounts as well.
>
> Translation, please.
>
>> 2) Some filesystems are known not to behave well under a container
>>     environment. They require changes to work in a safe-way. We can
>>     whitelist only the filesystems we want.
>
> So fix them.
>
>> This works as a whitelist. Only filesystems in the list are allowed
>> to be mounted. Doing a blacklist would create problems when, say,
>> a module is loaded. The whitelist is only checked if it is enabled first.
>> So any setup that was already working, will keep working. And whoever
>> is not interested in limiting filesystem mount, does not need
>> to bother about it.
>>
>> Please let me know what you guys think about it.
>
> NAKed-by: Al Viro<viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn@public.gmane.org>
> NAKed-because: too fucking ugly
>
> This is bloody ridiculous; if you want to prevent a luser adming playing with
> the set of mounts you've given it, the right way to go is not to mess with the
> "which fs types are allowed" but to add a per-namespace "immutable" flag.
> And add a new clone(2)/unshare(2) flag, used only along with the CLONE_NEWNS
> and setting the "immutable" on the copied namespace.
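
The whitelist semantics described in the quoted cover letter can be sketched
as follows. This is only an illustration of the stated behaviour (check the
list only when it is enabled, otherwise allow everything); the class and
function names are hypothetical and are not taken from the patch series.

```python
from dataclasses import dataclass, field


@dataclass
class MountNamespace:
    """Hypothetical stand-in for a mount namespace; not a kernel structure."""
    fs_whitelist_enabled: bool = False
    allowed_fs: set = field(default_factory=set)


def may_mount(ns: MountNamespace, fstype: str) -> bool:
    """Return True if a mount of `fstype` would be permitted in `ns`."""
    # Whitelist not enabled: nothing is checked, so existing setups keep working.
    if not ns.fs_whitelist_enabled:
        return True
    # Whitelist enabled: only listed filesystem types may be mounted.
    # A filesystem registered later by a module is absent from the set and
    # is therefore denied by default, which a blacklist could not guarantee.
    return fstype in ns.allowed_fs
```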

Okay, now that I have laid down the problem, I am happy to pursue any
solution we think is better. But let me develop it a bit more first.

An immutable flag does not work, because I don't want to prevent a luser
(loved that) from messing with the mounts they are given. In general, it
is perfectly fine for them to mount things inside the container as time
goes on.

But some others I don't consider fine. Let me elaborate on the /proc
example I've given: much of the information living in /proc is really
global, rather than per-container. The entries pertaining to the pid
namespace and other namespaces are already per-namespace, so they are
fine. But there is more: some of the things /proc tracks, like cpu usage,
memory, and the like, are resource-constrained by other entities, for
instance, cgroups. In some cases, like /proc/stat, the information exists
in cgroups, but comes from more than one cgroup. All of them are
independent in nature, making it hard to come up with a coherent view.

Furthermore, there is no connection between namespaces and cgroups, so
it is not obvious at all (there were discussions before) which
information the process should see. Unlike with namespaces, the mere fact
that a process lives in a cgroup does not really mean it is isolated
from the system in this sense.

One of the solutions is to do it all in userspace, from outside the
container, and bind mount the files inside the container's /proc. But it
only works if we can prevent the user from remounting the real /proc
somewhere. Not because it would screw up his system, which I don't care
about, but because it would give him information about the global state
of the system.
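
A minimal sketch of the userspace approach above, assuming the host side
has already generated per-container replacement files. The paths and the
helper names are hypothetical; the only real interface used is
`mount --bind` from mount(8). Building the plan is separated from applying
it, since applying needs root on the host.

```python
import subprocess
from pathlib import Path


def bind_mount_plan(container_root, overrides):
    """Build mount(8) command lines that bind host-generated files over
    entries in the container's /proc.

    `overrides` maps names relative to /proc (e.g. "stat") to host files
    holding the userspace-generated, container-specific data.
    """
    proc = Path(container_root) / "proc"
    return [
        ["mount", "--bind", str(source), str(proc / name)]
        for name, source in sorted(overrides.items())
    ]


def apply_plan(plan):
    """Run each planned bind mount; must be called from outside the container."""
    for argv in plan:
        subprocess.run(argv, check=True)
```

For example, `bind_mount_plan("/var/lib/ct/101/rootfs", {"stat": "/run/ct/101/stat"})`
yields a single `mount --bind` command targeting the container's /proc/stat.
Nothing in this sketch stops the container's root from mounting a fresh
procfs elsewhere, which is exactly the gap discussed here.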

An immutable flag fixes this, but then it prevents all further 
legitimate mounts
