From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: virtio-fs@redhat.com, qemu-devel@nongnu.org,
Stefan Hajnoczi <stefanha@redhat.com>,
Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [Virtio-fs] [PATCH 0/2] virtiofsd: drop Linux capabilities(7)
Date: Fri, 19 Jun 2020 17:16:48 +0100
Message-ID: <20200619161648.GJ2690@work-vm>
In-Reply-To: <20200619160923.GD3154@redhat.com>
* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Fri, Jun 19, 2020 at 09:27:46AM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > On Thu, Jun 18, 2020 at 08:16:55PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > > > On Thu, Apr 16, 2020 at 05:49:05PM +0100, Stefan Hajnoczi wrote:
> > > > > > virtiofsd doesn't need all Linux capabilities(7) available to root. Keep a
> > > > > > whitelisted set of capabilities that we require. This improves security in
> > > > > > case virtiofsd is compromised by making it hard for an attacker to gain further
> > > > > > access to the system.
> > > > >
> > > > > Hi Stefan,
> > > > >
> > > > > I just noticed that this patch set breaks overlayfs on top of virtiofs.
> > > > >
> > > > > overlayfs sets "trusted.overlay.*" and xattrs in trusted domain
> > > > > need CAP_SYS_ADMIN.
> > > > >
> > > > > man xattr says.
> > > > >
> > > > > Trusted extended attributes
> > > > > Trusted extended attributes are visible and accessible only to pro‐
> > > > > cesses that have the CAP_SYS_ADMIN capability. Attributes in this
> > > > > class are used to implement mechanisms in user space (i.e., outside the
> > > > > kernel) which keep information in extended attributes to which ordinary
> > > > > processes should not have access.
> > > > >
> > > > > There is a chance that overlay moves away from trusted xattr in future.
> > > > > But for now we need to make it work. This is an important use case for
> > > > > kata docker in docker build.
> > > > >
> > > > > Maybe we can add an option to virtiofsd, say "--add-cap <capability>", and
> > > > > ask users to pass "--add-cap cap_sys_admin" if they need to run the daemon
> > > > > with this capability.
> > > >
> > > > I'll admit I don't like the idea of giving it cap_sys_admin.
> > > > Can you explain:
> > > > a) What overlayfs uses trusted for?
> > >
> > > overlayfs stores a bunch of metadata and uses "trusted" xattrs for it.
> >
> > Tell me more about this metadata.
> > Taking a juicy looking one, what does OVL_XATTR_REDIRECT do?
>
> It contains path information which is used for lookup into lower layer.
>
> > Or what happens if I was to write random numbers into OVL_XATTR_NLINK?
>
> Overlay is storing its metadata in trusted.* xattrs. If a user modifies
> metadata, then various kinds of bad things can happen. I think one can
> do some kind of checks on metadata to make sure it does not crash
> at least.
>
> And that's true for any filesystem, isn't it? If a user manages to modify
> metadata outside of the filesystem, then a lot of bad things can happen. I
> thought that's the reason that people are not comfortable with the
> idea of allowing mounting a filesystem from inside a user namespace because
> it makes it easy to mount a hand crafted filesystem.
>
> Anyway, I think overlayfs is just one use case of trusted xattr. Even
> if overlayfs moves away from trusted xattr, what about other users.
> We need to have a story around how will we support trusted xattrs
> safely.
>
>
> >
> > > > b) If something nasty was to write junk into the trusted attributes,
> > > > what would happen?
> > >
> > > This directory is owned by guest. So it should be able to write
> > > anything it wants, as long as process in guest has CAP_SYS_ADMIN, right?
> >
> > Well, we shouldn't be able to break/crash/escape into the host; how
> > much does overlayfs validate trusted.* it uses?
>
> I thought qemu and kvm are the ones who should ensure we should not be
> able to break out of the sandbox. Kernel implementation could be as
> buggy as it wanted to be. We are working with this security model
> that kernel is completely untrusted.

But with virtiofs we allow the guest to do a lot of filesystem
operations on the host. It's virtiofsd that has to ensure that
these are safe and contained within the fs it exposes; qemu/kvm
can't protect us from that.

That's why we sandbox virtiofsd like we do - if we allowed a
privileged guest to make calls into an unconstrained virtiofsd, it would
be able to escape. That's what I want to check.

Dave
> >
> > > > c) I see overlayfs has a fallback check if xattr isn't supported at
> > > > all - what is the consequence?
> > >
> > > It falls back to I think read only mode.
> >
> > It looks like the fallback is more subtle to me:
> > /*
> > * Check if upper/work fs supports trusted.overlay.* xattr
> > */
> > err = ovl_do_setxattr(ofs->workdir, OVL_XATTR_OPAQUE, "0", 1, 0);
> > if (err) {
> > ofs->noxattr = true;
> > ofs->config.index = false;
> > ofs->config.metacopy = false;
> > pr_warn("upper fs does not support xattr, falling back to index=off and metacopy=off.\n");
> >
> > but I don't know what index and metacopy are.
>
> They enable certain features in overlayfs. In fact, we fall back to
> lesser capability on if we are running on ext4/xfs. For virtiofs,
> we deny the mount completely.
>
> /*
> * We allowed sub-optimal upper fs configuration and don't want to break
> * users over kernel upgrade, but we never allowed remote upper fs, so
> * we can enforce strict requirements for remote upper fs.
> */
> if (ovl_dentry_remote(ofs->workdir) &&
> (!d_type || !rename_whiteout || ofs->noxattr)) {
> pr_err("upper fs missing required features.\n");
> err = -EINVAL;
> goto out;
> }
>
> >
> > > For a moment forget about overlayfs. Say a user process in guest with
> > > CAP_SYS_ADMIN is writing trusted.foo. Should that succeed? It is a
> > > passthrough filesystem, so it should go through. But currently it
> > > won't.
> >
> > As long as any effects of what it writes are contained to the area of
> > the filesystem exposed to the guest, yes - however it worries me what
> > the consequences of broken trusted metadata is. If it's delicate enough
> > that it's guarded by CAP_SYS_ADMIN someone must have worried about it.
>
> Agreed that we need to look into whether having CAP_SYS_ADMIN allows
> virtiofsd to break out of jail.
>
> Maybe we need to provide that remapping trusted xattr feature so
> that we don't have to have CAP_SYS_ADMIN in init_user_ns and can
> provide this emulation even when running in user namespace.
>
> Vivek
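
To make the remapping idea quoted above a bit more concrete, here is a
rough sketch of the name translation only. It is not virtiofsd code: the
function name remap_guest_xattr() and the host prefix "user.virtiofs."
are made up for illustration, and a real version would also have to stop
the guest from reaching the remapped names directly.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical prefixes for illustration; not taken from virtiofsd. */
#define GUEST_TRUSTED_PREFIX "trusted."
#define HOST_REMAP_PREFIX    "user.virtiofs."

/*
 * Translate a guest-visible xattr name into the name stored on the host.
 * Names in the "trusted." namespace are placed under HOST_REMAP_PREFIX so
 * the daemon would not need CAP_SYS_ADMIN to store them; every other name
 * is copied through unchanged.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int remap_guest_xattr(const char *guest_name, char *out, size_t out_len)
{
    int n;

    if (strncmp(guest_name, GUEST_TRUSTED_PREFIX,
                strlen(GUEST_TRUSTED_PREFIX)) == 0) {
        n = snprintf(out, out_len, "%s%s", HOST_REMAP_PREFIX, guest_name);
    } else {
        n = snprintf(out, out_len, "%s", guest_name);
    }

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With this, a guest write to "trusted.overlay.opaque" would land on the
host as "user.virtiofs.trusted.overlay.opaque", while "user.foo" stays as
it is. Whether that is safe still depends on the questions above about
how much overlayfs validates what it reads back.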
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK