From: Seth Forshee <seth.forshee@canonical.com>
To: Djalal Harouni <tixxdz@opendz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Chris Mason <clm@fb.com>,
tytso@mit.edu, Serge Hallyn <serge.hallyn@canonical.com>,
Josh Triplett <josh@joshtriplett.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Andy Lutomirski <luto@kernel.org>,
Dongsu Park <dongsu@endocode.com>,
David Herrmann <dh.herrmann@googlemail.com>,
Miklos Szeredi <mszeredi@redhat.com>,
Alban Crequy <alban.crequy@gmail.com>, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org,
linux-security-module@vger.kernel.org
Subject: Re: [RFC PATCH 0/0] VFS:userns: support portable root filesystems
Date: Wed, 4 May 2016 08:34:04 -0500
Message-ID: <20160504133404.GA139418@ubuntu-hedt>
In-Reply-To: <1462317714-27360-1-git-send-email-tixxdz@opendz.org>

On Wed, May 04, 2016 at 01:21:46AM +0200, Djalal Harouni wrote:
> This RFC tries to explore how to support filesystem operations inside
> user namespace using only VFS and a per mount namespace solution. This
> allows to take advantage of user namespace separations without
> introducing any change at the filesystems level. All this is handled
> with the virtual view of mount namespaces.
>
>
> 1) Presentation:
> ================
>
> The main aim is to support portable root filesystems and allow containers,
> virtual machines and other cases to use the same root filesystem.
> Due to security reasons, filesystems can't be mounted inside user
> namespaces, and mounting them outside will not solve the problem since
> they will show up with the wrong UIDs/GIDs. Read and write operations
> will also fail and so on.
>
> The current userspace solution is to automatically chown the whole root
> filesystem before starting a container, example:
> (host) init_user_ns 1000000:1065536 => (container) user_ns_X1 0:65535
> (host) init_user_ns 2000000:2065536 => (container) user_ns_Y1 0:65535
> (host) init_user_ns 3000000:3065536 => (container) user_ns_Z1 0:65535
> ...
>
> Every time a chown is called, files are changed and so on... This
> prevents having portable filesystems that you can drop anywhere
> and boot. Having an extra step to adapt the filesystem to the current
> mapping and persist it does not allow verifying its integrity, makes
> snapshots and migration a bit harder, and probably has other limitations...
>
> It seems that there are multiple ways to allow user namespaces to combine
> nicely with filesystems, but none of them is that easy. Bind mounting
> and pinning the user namespace at mount time will not work: bind mounts
> share the same super block, hence you may end up working on the wrong
> vfsmount context and there is no easy way to get out of that...
>
> Using the user namespace in the super block seems the way to go, and
> there is the "Support fuse mounts in user namespaces" [1] patches which
> seem nice but perhaps too complex!? There is also the overlayfs solution,
> and finally the VFS layer solution.

I'm not sure whether you mean to propose your patches as an alternative
to mine, but I think the two are orthogonal. My goal is to allow
containers in user namespaces to mount some subset of filesystem types
(not specifically container root filesystems, but in general), which
your patches won't enable. Your goal is to share a rootfs between
multiple containers with different uid/gid shifts, which my patches
don't help with.

> We present here a simple VFS solution, everything is packed inside VFS,
> filesystems don't need to know anything (except probably XFS, and special
> operations inside union filesystems). Currently it supports ext4, btrfs
> and overlayfs. Changes into filesystems are small, just parse the
> vfs_shift_uids and vfs_shift_gids options during mount and set the
> appropriate flags into the super_block structure.
>
> 1) Filesystems don't need the FS_USERNS_MOUNT flag, so no user
> namespace mounting, they stay secure, nothing changes.
>
> 2) The solution is based on VFS and mount namespaces, we use the user
> namespace of the containing mount namespace to check if we should shift
> UIDs/GIDs from/to virtual <=> on-disk view.
> If a filesystem was mounted with "vfs_shift_uids" and "vfs_shift_gids"
> options, and if it shows up inside a mount namespace that supports VFS
> UIDs/GIDs shifts then during each access we will remap UID/GID either
> to virtual or to on-disk view using simple helper functions to allow the
> access. In case the mount or current mount namespace do not support VFS
> UID/GID shifts, we fall back to the old behaviour, no shift is performed.
>
> 3) inodes will always keep their original values which reflect the
> mapping inside init_user_ns which we consider the on-disk mapping.
> Therefore they will have a mapping from 0:65536 on-disk, these values are
> the persistent values that we have to write to the disk. We don't keep
> track of any UID/GID shift that was applied before. This gives
> portability and allows to use the previous mapping which was freed for
> another root filesystem...
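
For readers following along, the remapping described in points 2) and 3)
above can be sketched roughly as follows. This is only an illustration
under stated assumptions, not code from the patches: struct id_shift,
shift_to_virtual() and shift_to_disk() are hypothetical names, and the
0:65536 on-disk range and the 1000000 offset are taken from the example
mapping quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical description of one shift: on-disk IDs
 * [disk_first, disk_first + count) are presented as
 * [virt_first, virt_first + count) in the virtual view.
 */
struct id_shift {
	uint32_t disk_first;
	uint32_t virt_first;
	uint32_t count;
};

/* On-disk -> virtual. Returns false when the ID is outside the range. */
static bool shift_to_virtual(const struct id_shift *s, uint32_t disk_id,
			     uint32_t *out)
{
	assert(s != NULL && out != NULL);
	if (disk_id < s->disk_first || disk_id - s->disk_first >= s->count)
		return false;
	*out = s->virt_first + (disk_id - s->disk_first);
	return true;
}

/* Virtual -> on-disk, the inverse used before writing IDs back. */
static bool shift_to_disk(const struct id_shift *s, uint32_t virt_id,
			  uint32_t *out)
{
	assert(s != NULL && out != NULL);
	if (virt_id < s->virt_first || virt_id - s->virt_first >= s->count)
		return false;
	*out = s->disk_first + (virt_id - s->virt_first);
	return true;
}
```

With { 0, 1000000, 65536 }, on-disk uid 0 would be seen as 1000000 and
written back as 0, so the inode itself never records which shift was
applied. IDs outside the range are reported as unmapped rather than
shifted.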

Sorry, I haven't had time to look at the patches, but how are you
handling suid/sgid? Will the process get the ids in the inode or the
shifted ids?
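
To make the question concrete, here is a minimal sketch of the two
possible answers. Everything in it is an assumption for illustration:
struct sketch_inode, exec_euid() and the use_shifted switch are
hypothetical, the shift is modelled as a plain offset, and none of this
describes what the patches actually do.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODE_SETUID 04000u	/* same bit value as S_ISUID */

/* Hypothetical, stripped-down inode: only what the question needs. */
struct sketch_inode {
	uint32_t disk_uid;	/* owner as stored on disk */
	uint32_t mode;		/* permission bits, including setuid */
};

/*
 * Effective uid after exec'ing a file, expressed as an ID in the
 * initial user namespace.
 *   use_shifted == false: the raw inode owner is used
 *   use_shifted == true:  the owner is shifted first
 */
static uint32_t exec_euid(const struct sketch_inode *ino,
			  uint32_t caller_euid, uint32_t shift,
			  bool use_shifted)
{
	if (!(ino->mode & MODE_SETUID))
		return caller_euid;	/* not setuid: euid is unchanged */
	return use_shifted ? ino->disk_uid + shift : ino->disk_uid;
}
```

For a setuid binary owned by on-disk uid 0 and a shift of 1000000, the
first variant would yield 0 (host root) and the second 1000000 (root
only inside the container), which is why the distinction matters.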

Thanks,
Seth