From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: "Stéphane Graber" <stgraber@ubuntu.com>
Cc: linux-security-module@vger.kernel.org,
Kees Cook <keescook@chromium.org>,
Jonathan Corbet <corbet@lwn.net>,
linux-api@vger.kernel.org,
Linux Containers <containers@lists.linux-foundation.org>,
Jann Horn <jannh@google.com>,
linux-kernel@vger.kernel.org, smbarber@chromium.org,
Seth Forshee <seth.forshee@canonical.com>,
"Eric W. Biederman" <ebiederm@xmission.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Christian Brauner <christian.brauner@ubuntu.com>,
Alexey Dobriyan <adobriyan@gmail.com>,
Alexander Viro <viro@zeniv.linux.org.uk>
Subject: Re: [PATCH v2 00/28] user_namespace: introduce fsid mappings
Date: Mon, 17 Feb 2020 15:03:45 -0800
Message-ID: <1581980625.24289.30.camel@HansenPartnership.com>
In-Reply-To: <CA+enf=vwd-dxzve87t7Mw1Z35RZqdLzVaKq=fZ4EGOpnES0f5w@mail.gmail.com>
On Mon, 2020-02-17 at 16:57 -0500, Stéphane Graber wrote:
> On Mon, Feb 17, 2020 at 4:12 PM James Bottomley
> <James.Bottomley@hansenpartnership.com> wrote:
>
> > On Fri, 2020-02-14 at 19:35 +0100, Christian Brauner wrote:
> > [...]
> > > With this patch series we simply introduce the ability to create
> > > fsid mappings that are different from the id mappings of a user
> > > namespace. The whole feature set is placed under a config option
> > > that defaults to false.
> > >
> > > In the usual case of running an unprivileged container we will
> > > have set up an id mapping, e.g. 0 100000 100000. The on-disk
> > > mapping will correspond to this id mapping, i.e. all files which
> > > we want to appear as 0:0 inside the user namespace will be
> > > chowned to 100000:100000 on the host. This works, because
> > > whenever the kernel needs to do a filesystem access it will
> > > look up the corresponding uid and gid in the idmapping tables of
> > > the container.
> > > Now think about the case where we want to have an id mapping of 0
> > > 100000 100000 but an on-disk mapping of 0 300000 100000 which is
> > > needed to e.g. share a single on-disk mapping with multiple
> > > containers that all have different id mappings.
> > > This will be problematic. Whenever a filesystem access is
> > > requested, the kernel will now try to look up a mapping for 300000
> > > in the id mapping tables of the user namespace but since there is
> > > none the files will appear to be owned by the overflow id, i.e.
> > > usually 65534:65534 or nobody:nogroup.
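The translation described above can be modelled in a few lines of user-space C. This is a minimal sketch assuming a single extent per map; the kernel's own code (map_id_down()/map_id_up() in kernel/user_namespace.c) handles multiple extents and is not reproduced here. The names struct id_extent, ns_to_host() and host_to_ns_munged() are hypothetical. It shows that 0 inside the namespace is stored as 100000 on disk, and that a host id of 300000 has no mapping and is reported as the overflow id.

```c
#include <assert.h>
#include <stdint.h>

#define OVERFLOW_ID 65534u /* default kernel.overflowuid / overflowgid */

/* One line of a uid_map/gid_map: "first lower_first count". */
struct id_extent {
	uint32_t first;       /* first id inside the user namespace */
	uint32_t lower_first; /* first id in the parent (host) namespace */
	uint32_t count;       /* length of the range */
};

/* Namespace id -> host id, e.g. what chown(0, 0) in the container stores. */
static int ns_to_host(struct id_extent e, uint32_t id, uint32_t *out)
{
	if (id < e.first || id - e.first >= e.count)
		return -1; /* not mapped: this id cannot be used */
	*out = e.lower_first + (id - e.first);
	return 0;
}

/* Host id -> namespace id as reported by stat(); unmapped ids are
 * "munged" to the overflow id instead of causing an error. */
static uint32_t host_to_ns_munged(struct id_extent e, uint32_t host_id)
{
	if (host_id < e.lower_first || host_id - e.lower_first >= e.count)
		return OVERFLOW_ID;
	return e.first + (host_id - e.lower_first);
}
```

With the map {0, 100000, 100000}, ns_to_host(map, 0, &h) yields 100000 and host_to_ns_munged(map, 300000) yields 65534, which is the nobody:nogroup ownership described above.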
> > >
> > > With fsid mappings we can solve this by writing an id mapping of
> > > 0 100000 100000 and an fsid mapping of 0 300000 100000. On
> > > filesystem access the kernel will now look up the mapping for
> > > 300000 in the fsid mapping tables of the user namespace. And
> > > since such a mapping exists, the corresponding files will have
> > > correct ownership.
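A rough model of the split the patch series introduces: credentials are translated through the id map, while filesystem ownership goes through the fsid map. This is a sketch, not the patch's code; struct userns_model, fs_owner() and task_uid() are hypothetical names, and falling back to the id map when no fsid map has been written is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OVERFLOW_ID 65534u

struct id_extent { uint32_t first, lower_first, count; };

/* Hypothetical model of a user namespace carrying both kinds of map. */
struct userns_model {
	struct id_extent uid_map;   /* process credentials */
	struct id_extent fsuid_map; /* filesystem ownership */
	bool has_fsuid_map;         /* assumption: fall back to uid_map if unset */
};

static uint32_t up_munged(struct id_extent e, uint32_t host_id)
{
	if (host_id < e.lower_first || host_id - e.lower_first >= e.count)
		return OVERFLOW_ID;
	return e.first + (host_id - e.lower_first);
}

/* Owner of an on-disk uid as seen from inside the namespace. */
static uint32_t fs_owner(const struct userns_model *ns, uint32_t disk_uid)
{
	return up_munged(ns->has_fsuid_map ? ns->fsuid_map : ns->uid_map,
			 disk_uid);
}

/* A task's host uid as seen from inside the namespace. */
static uint32_t task_uid(const struct userns_model *ns, uint32_t host_uid)
{
	return up_munged(ns->uid_map, host_uid);
}
```

With an id map of 0 100000 100000 and an fsid map of 0 300000 100000, a file owned by 300000 on disk shows up as owned by 0, while the container's root process still runs as host uid 100000.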
> >
> > How do we parametrise this new fsid shift for the unprivileged use
> > case? For newuidmap/newgidmap, it's easy because each user gets a
> > dedicated range and everything "just works (tm)". However, for the
> > fsid mapping, assuming some newfsuid/newfsgid tool to help, that
> > tool has to know not only your allocated uid/gid chunk, but also
> > the offset map of the image. The former is easy, but the latter is
> > going to vary by the actual image ... well unless we standardise
> > some accepted shift for images and it simply becomes a known static
> > offset.
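To make the question concrete: a helper in the style of newuidmap would need two inputs, the caller's delegated range (easy, from an /etc/subuid line of the form name:start:count) and the offset the image was shifted by (which varies per image). The sketch below assumes a hypothetical "newfsuidmap" tool and a hypothetical policy that the image's shifted range must lie inside the caller's delegation; neither the tool nor the function names come from the patch series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct id_extent { uint32_t first, lower_first, count; };

/* Parse an /etc/subuid-style line "name:start:count". */
static bool parse_subid_line(const char *line, char name[64],
			     uint32_t *start, uint32_t *count)
{
	unsigned int s, c;

	if (sscanf(line, "%63[^:]:%u:%u", name, &s, &c) != 3)
		return false;
	*start = s;
	*count = c;
	return true;
}

/* Hypothetical policy: build the fsid extent "0 image_offset image_count"
 * only if that host range lies entirely inside the delegated range. */
static bool make_fsid_extent(uint32_t sub_start, uint32_t sub_count,
			     uint32_t image_offset, uint32_t image_count,
			     struct id_extent *out)
{
	uint64_t img_end = (uint64_t)image_offset + image_count;
	uint64_t sub_end = (uint64_t)sub_start + sub_count;

	if (image_count == 0 || image_offset < sub_start || img_end > sub_end)
		return false;
	*out = (struct id_extent){ 0, image_offset, image_count };
	return true;
}
```

The image_offset argument is exactly the per-image parameter the paragraph above worries about; unless it is standardised, something outside the helper has to supply it.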
> >
>
> For unprivileged runtimes, I would expect images to be unshifted and
> be unpacked from within a userns.
For images whose resting format is an archive like tar, I concur.
> So your unprivileged user would be allowed a uid/gid range through
> /etc/subuid and /etc/subgid and allowed to use them through
> newuidmap/newgidmap. In that namespace, you can then pull
> and unpack any images/layers you may want and the resulting fs tree
> will look correct from within that namespace.
>
> All that is possible today and is how for example unprivileged LXC
> works right now.
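For reference, newuidmap(1) takes a pid followed by "uid loweruid count" triples and checks them against /etc/subuid. A common rootless layout maps namespace uid 0 to the caller's own uid and the rest to the delegated range. The sketch below only formats such a command line; format_newuidmap() is a hypothetical helper, and the layout is one common choice rather than what LXC necessarily does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Build "newuidmap PID 0 OWN_UID 1 1 SUB_START SUB_COUNT":
 * ns uid 0 -> caller's own uid, ns uids 1..SUB_COUNT -> subuid range. */
static int format_newuidmap(char *buf, size_t len, pid_t pid, uid_t own_uid,
			    unsigned int sub_start, unsigned int sub_count)
{
	int n = snprintf(buf, len, "newuidmap %ld 0 %lu 1 1 %u %u",
			 (long)pid, (unsigned long)own_uid,
			 sub_start, sub_count);

	return (n < 0 || (size_t)n >= len) ? -1 : 0; /* -1 on truncation */
}
```

Inside the resulting namespace, image layers can be unpacked as ordinary files and their ownership lands in the delegated host range automatically.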
I do have a counter example, but it might be more esoteric: I do use
unprivileged architecture emulation containers to maintain actual
physical system boot environments. These are stored as mountable disk
images, not as archives, so I do need a simple remapping ... however, I
think this use case is simple: it's a back shift along my owned uid/gid
range, so tools that allow unprivileged use can easily cope with it:
the fsid mapping is either the identity or a shift back along the
existing user_ns mapping.
> What this patchset then allows is for containers to have differing
> uid/gid maps while still being based off the same image or layers.
> In this scenario, you would carve a subset of your main uid/gid map
> for each container you run and run them in a child user namespace
> while setting up an fsuid/fsgid map such that their filesystem accesses
> do not follow their uid/gid map. This then results in proper
> isolation for processes, networks, ... as everything runs as
> different kuid/kgid but the VFS view will be the same in all
> containers.
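One way to picture the carving described above: cut the delegated range into equal slices, give every container its own slice for its uid map, and point every container's fsid map at one shared slice that holds the image. The layout below (slice 0 reserved for the shared image, slice i+1 for container i, all ids expressed in the parent namespace) and the name carve_container() are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct id_extent { uint32_t first, lower_first, count; };
struct container_maps { struct id_extent uid_map, fsuid_map; };

/* Hypothetical layout over the delegated range [base, base + total):
 * slice 0 is the shared on-disk range, slice i + 1 is container i's. */
static bool carve_container(uint32_t base, uint32_t total, uint32_t slice,
			    uint32_t i, struct container_maps *out)
{
	uint64_t start = (uint64_t)base + ((uint64_t)i + 1) * slice;

	if (slice == 0 || start + slice > (uint64_t)base + total)
		return false; /* not enough ids left for this container */
	out->uid_map = (struct id_extent){ 0, (uint32_t)start, slice };
	out->fsuid_map = (struct id_extent){ 0, base, slice }; /* shared */
	return true;
}
```

Each container then runs as distinct kuids (isolating signals, networking and so on) while every fsuid_map points at the same on-disk range, so all of them see identical file ownership.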
Who owns the shifted range of the image ... all tenants or none?
> Shared storage between those otherwise isolated containers would also
> work just fine by simply bind-mounting the same path into two or more
> containers.
>
>
> Now one additional thing that would be safe for a setuid wrapper to
> allow would be arbitrary mapping of any of the uids/gids that the
> user owns within the fsuid/fsgid map. One potential use
> for this would be to create any number of user namespaces, each with
> their own mapping for uid 0 while still having all VFS access be
> mapped to the user that spawned them (say uid=1000, gid=1000).
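The safety rule suggested above can be stated as a simple check a setuid helper might run before writing an fsid map: every extent must land on host ids the caller already owns, that is, their own uid or one of their /etc/subuid ranges. This is a sketch of that policy under stated simplifications; fsid_map_allowed() and struct id_range are hypothetical, and an extent is not allowed to straddle two owned ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct id_extent { uint32_t first, lower_first, count; };
struct id_range { uint32_t start, count; }; /* host ids the caller owns */

static bool range_contains(struct id_range r, uint32_t start, uint32_t count)
{
	uint64_t end = (uint64_t)start + count;

	return start >= r.start && end <= (uint64_t)r.start + r.count;
}

/* Hypothetical policy: each requested fsid extent must fit entirely
 * inside a single range the caller owns. */
static bool fsid_map_allowed(const struct id_extent *map, size_t n,
			     const struct id_range *owned, size_t n_owned)
{
	for (size_t i = 0; i < n; i++) {
		bool ok = false;

		if (map[i].count == 0)
			return false;
		for (size_t j = 0; j < n_owned && !ok; j++)
			ok = range_contains(owned[j], map[i].lower_first,
					    map[i].count);
		if (!ok)
			return false;
	}
	return true;
}
```

For a user owning uid 1000 and 100000:65536, an fsid map of "0 1000 1" passes (every namespace's root writes files as uid 1000 on the host), while "0 0 1" (host root) is rejected.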
>
>
> Note that in our case, the intended use for this is from a privileged
> runtime where our images would be unshifted as would be the container
> storage and any shared storage for containers. The security model
> effectively relying on properly configured filesystem permissions and
> mount namespaces such that the content of those paths can never be
> seen by anyone but root outside of those containers (and therefore
> avoids all the issues around setuid/setgid/fscaps).
Yes, I understand ... all orchestration systems are currently hugely
privileged. However, there is interest in getting them down to only
"slightly privileged".
James
> We will then be able to allocate distinct, random, ranges of 65536
> uids/gids (or more) for each container without ever having to do any
> uid/gid shifting at the filesystem layer or run into issues when
> having to setup shared storage between containers or attaching
> external storage volumes to those containers.