qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: virtio-fs@redhat.com, qemu-devel@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Subject: Re: [Virtio-fs] [PATCH 0/2] virtiofsd: drop Linux capabilities(7)
Date: Fri, 19 Jun 2020 17:16:48 +0100
Message-ID: <20200619161648.GJ2690@work-vm>
In-Reply-To: <20200619160923.GD3154@redhat.com>

* Vivek Goyal (vgoyal@redhat.com) wrote:
> On Fri, Jun 19, 2020 at 09:27:46AM +0100, Dr. David Alan Gilbert wrote:
> > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > On Thu, Jun 18, 2020 at 08:16:55PM +0100, Dr. David Alan Gilbert wrote:
> > > > * Vivek Goyal (vgoyal@redhat.com) wrote:
> > > > > On Thu, Apr 16, 2020 at 05:49:05PM +0100, Stefan Hajnoczi wrote:
> > > > > > > virtiofsd doesn't need all of the Linux capabilities(7) available to root.  Keep a
> > > > > > whitelisted set of capabilities that we require.  This improves security in
> > > > > > case virtiofsd is compromised by making it hard for an attacker to gain further
> > > > > > access to the system.
> > > > > 
> > > > > Hi Stefan,
> > > > > 
> > > > > I just noticed that this patch set breaks overlayfs on top of virtiofs.
> > > > > 
> > > > > overlayfs sets "trusted.overlay.*" and xattrs in trusted domain
> > > > > need CAP_SYS_ADMIN.
> > > > > 
> > > > > man xattr says.
> > > > > 
> > > > >    Trusted extended attributes
> > > > >        Trusted  extended  attributes  are  visible and accessible only to pro‐
> > > > >        cesses that have the  CAP_SYS_ADMIN  capability.   Attributes  in  this
> > > > >        class are used to implement mechanisms in user space (i.e., outside the
> > > > >        kernel) which keep information in extended attributes to which ordinary
> > > > >        processes should not have access.
> > > > > 
> > > > > There is a chance that overlay moves away from trusted xattr in future.
> > > > > But for now we need to make it work. This is an important use case for
> > > > > kata docker in docker build.
> > > > > 
> > > > > > Maybe we can add an option to virtiofsd, say "--add-cap <capability>", and
> > > > > > ask the user to pass "--add-cap cap_sys_admin" if they need to run the daemon
> > > > > > with this capability.
> > > > 
> > > > I'll admit I don't like the idea of giving it cap_sys_admin.
> > > > Can you explain:
> > > >   a) What overlayfs uses trusted for?
> > > 
> > > overlayfs stores a bunch of metadata and uses "trusted" xattrs for it.
> > 
> > Tell me more about this metadata.
> > Taking a juicy looking one, what does OVL_XATTR_REDIRECT do?
> 
> It contains path information which is used for lookup into lower layer.
> 
> > Or what happens if I was to write random numbers into OVL_XATTR_NLINK?
> 
> Overlay is storing its metadata in trusted.* xattrs. If a user modifies
> the metadata, then various kinds of bad things can happen. I think one can
> do some kind of checks on the metadata to make sure it does not crash,
> at least.
> 
> And that's true for any filesystem, isn't it? If a user manages to modify
> metadata outside of the filesystem, then a lot of bad things can happen. I
> thought that's the reason people are not comfortable with the
> idea of allowing filesystems to be mounted from inside a user namespace:
> it makes it easy to mount a hand-crafted filesystem.
> 
> Anyway, I think overlayfs is just one use case of trusted xattrs. Even
> if overlayfs moves away from trusted xattrs, what about other users?
> We need to have a story around how we will support trusted xattrs
> safely.
> 
> 
> > 
> > > >   b) If something nasty was to write junk into the trusted attributes,
> > > >     what would happen?
> > > 
> > > This directory is owned by guest. So it should be able to write
> > > anything it wants, as long as process in guest has CAP_SYS_ADMIN, right?
> > 
> > Well, we shouldn't be able to break/crash/escape into the host; how
> > much does overlayfs validate the trusted.* xattrs it uses?
> 
> I thought qemu and kvm are the ones that should ensure we cannot
> break out of the sandbox. The kernel implementation could be as
> buggy as it wanted to be. We are working with a security model
> in which the kernel is completely untrusted.

But with virtiofs we allow the guest to do a lot of filesystem
operations on the host.  It's the virtiofsd that has to ensure that
these are safe and contained within the fs it's exposed; the qemu/kvm
can't protect us from that.

That's why we sandbox virtiofsd the way we do: if we allowed a
privileged guest to make calls to an unconstrained virtiofsd, it would
be able to escape.  That's what I want to check.
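
For illustration only, here is a minimal sketch of the "remap trusted
xattrs" idea mentioned in the quoted text further down: the daemon rewrites
guest-supplied trusted.* names into an unprivileged namespace, so it would
not itself need CAP_SYS_ADMIN. The "user.virtiofs." prefix and the
remap_xattr_name() helper are assumptions made up for this example, not
existing virtiofsd code. The sketch also refuses guest names that already
sit under the reserved prefix, since otherwise an unprivileged guest process
could forge what looks like trusted metadata.

```c
#include <stdio.h>
#include <string.h>

#define TRUSTED_PREFIX "trusted."
#define REMAP_PREFIX   "user.virtiofs."  /* hypothetical reserved prefix */

/*
 * Translate a guest-supplied xattr name into the name the daemon would
 * use on the host (hypothetical helper).
 *
 * Returns 0 on success, or -1 if the name lies in the reserved prefix
 * or the result does not fit into out.
 */
static int remap_xattr_name(const char *guest_name, char *out, size_t out_len)
{
    int n;

    /* The guest must never address the reserved namespace directly. */
    if (strncmp(guest_name, REMAP_PREFIX, strlen(REMAP_PREFIX)) == 0) {
        return -1;
    }

    if (strncmp(guest_name, TRUSTED_PREFIX, strlen(TRUSTED_PREFIX)) == 0) {
        /* trusted.foo -> user.virtiofs.trusted.foo */
        n = snprintf(out, out_len, "%s%s", REMAP_PREFIX, guest_name);
    } else {
        /* Every other namespace passes through unchanged. */
        n = snprintf(out, out_len, "%s", guest_name);
    }

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With this mapping, a guest write to trusted.overlay.opaque would land on
the host as user.virtiofs.trusted.overlay.opaque. Whether such a scheme is
actually safe still depends on the question above: what the consumer of
that metadata does when it contains junk.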

Dave

> > 
> > > >   c) I see overlayfs has a fallback check if xattr isn't supported at
> > > > all - what is the consequence?
> > > 
> > > I think it falls back to read-only mode.
> > 
> > It looks like the fallback is more subtle to me:
> >         /*
> >          * Check if upper/work fs supports trusted.overlay.* xattr
> >          */
> >         err = ovl_do_setxattr(ofs->workdir, OVL_XATTR_OPAQUE, "0", 1, 0);
> >         if (err) {
> >                 ofs->noxattr = true;
> >                 ofs->config.index = false;
> >                 ofs->config.metacopy = false;
> >                 pr_warn("upper fs does not support xattr, falling back to index=off and metacopy=off.\n");
> > 
> > but I don't know what index and metacopy are.
> 
> They enable certain features in overlayfs. In fact, we fall back to
> lesser capability only if we are running on ext4/xfs. For virtiofs,
> we deny the mount completely.
> 
>         /*
>          * We allowed sub-optimal upper fs configuration and don't want to break
>          * users over kernel upgrade, but we never allowed remote upper fs, so
>          * we can enforce strict requirements for remote upper fs.
>          */
>         if (ovl_dentry_remote(ofs->workdir) &&
>             (!d_type || !rename_whiteout || ofs->noxattr)) {
>                 pr_err("upper fs missing required features.\n");
>                 err = -EINVAL;
>                 goto out;
>         }
> 
> > 
> > > For a moment, forget about overlayfs. Say a user process in the guest with
> > > CAP_SYS_ADMIN is writing trusted.foo. Should that succeed? It is a
> > > passthrough filesystem, so it should go through. But currently it
> > > won't.
> > 
> > As long as any effects of what it writes are contained to the area of
> > the filesystem exposed to the guest, yes - however it worries me what
> > the consequences of broken trusted metadata are.  If it's delicate enough
> > that it's guarded by CAP_SYS_ADMIN someone must have worried about it.
> 
> Agreed that we need to look into whether having CAP_SYS_ADMIN allows
> virtiofsd to break out of jail.
> 
> Maybe we need to provide that trusted xattr remapping feature so
> that we don't need CAP_SYS_ADMIN in init_user_ns and can
> provide this emulation even when running in a user namespace.
> 
> Vivek
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


