Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess


From: Jag Raman <jag.raman@oracle.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
	"John G Johnson" <john.g.johnson@oracle.com>,
	sstabellini@kernel.org, konrad.wilk@oracle.com,
	"Stefan Hajnoczi" <stefanha@gmail.com>,
	"Philippe Mathieu-Daudé" <f4bug@amsat.org>,
	qemu-devel@nongnu.org, ross.lagerwall@citrix.com,
	liran.alon@oracle.com, kanth.ghatraju@oracle.com
Subject: Re: [Qemu-devel] [multiprocess RFC PATCH 36/37] multi-process: add the concept description to docs/devel/qemu-multiprocess
Date: Tue, 7 May 2019 15:00:52 -0400
Message-ID: <fe4b0b42-523d-5877-173c-3e878abd4e32@oracle.com>
In-Reply-To: <20190425154421.GG17806@stefanha-x1.localdomain>

Hi Stefan,

Thank you very much for your feedback. The following is a summary of
our team's discussions about it.

On 4/25/2019 11:44 AM, Stefan Hajnoczi wrote:
> 
> Can multiple LSI SCSI controllers be launched such that each process
> only has access to a subset of disk images?  Or is the disk image label
> per-VM so that there is no isolation between LSI SCSI controller
> processes for that VM?

Yes, it is possible to give each process access to only a subset of the
disk images. The orchestrator (libvirt, etc.) assigns a set of SELinux
MCS (Multi-Category Security) categories to each VM; individual device
instances can then be isolated by assigning each one a subset of the
VM's categories.
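To make the category-subset idea concrete, here is a small Python model
of the MCS dominance rule. It assumes the usual MCS constraint that a
process may access an object only when the process's category set is a
superset of the object's. It ignores sensitivity (everything is s0),
parses only the level part of a context (e.g. `s0:c10,c20`), and the
category numbers are made up for illustration. Actual enforcement is
done by the kernel according to the loaded policy, not by code like
this.

```python
# Hypothetical model of the MCS (Multi-Category Security) dominance rule:
# a subject may access an object only if the subject's category set is a
# superset of the object's. Sensitivity is assumed to be s0 everywhere and
# is ignored; real enforcement is done by the kernel per the loaded policy.

def parse_categories(level):
    """Return the category set of an MCS level such as 's0:c10,c20'."""
    _sensitivity, _, cats = level.partition(":")
    result = set()
    if not cats:
        return frozenset(result)
    for item in cats.split(","):
        if "." in item:                  # range form, e.g. 'c0.c3'
            low, high = item.split(".")
            result.update(f"c{i}" for i in range(int(low[1:]), int(high[1:]) + 1))
        else:
            result.add(item)
    return frozenset(result)


def dominates(subject_level, object_level):
    """True if a process at subject_level may access an object at object_level."""
    return parse_categories(subject_level) >= parse_categories(object_level)


if __name__ == "__main__":
    vm = "s0:c10,c20,c30"            # categories assigned to the whole VM
    scsi0 = "s0:c10,c20"             # first LSI SCSI controller process
    scsi1 = "s0:c10,c30"             # second LSI SCSI controller process
    disk_a, disk_b = "s0:c10,c20", "s0:c10,c30"

    print(dominates(scsi0, disk_a))  # True
    print(dominates(scsi0, disk_b))  # False
    print(dominates(scsi1, disk_b))  # True
    print(dominates(vm, disk_b))     # True
```

Giving each device process a different pair drawn from the VM's set means
each LSI SCSI controller process can reach only its own disk images,
while the VM-wide label still covers all of them.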

> 
> My concern with this overall approach is the practicality vs its
> benefits.  Regarding practicality, each emulated device needs to be
> proxied separately.  The QEMU subsystem used by the device also needs to
> be proxied.  Global state, monitor commands, and live migration all
> require code changes to support proxied operation.  This is very
> invasive.
> 
> Then each emulated device needs an SELinux policy to achieve the
> benefits of confinement.  I have no idea how to correctly write a policy
> like this and it's likely that developers who contribute a single new
> device will not be proficient in it either.  Writing these policies is a
> rare thing and few people will be good at this.  It also makes me worry
> about how we test and review them.

We also think that having an SELinux policy per device would become
complicated. Our proposal, therefore, is to define SELinux policies per
device class, viz. disk, network, console, graphics, etc. These policies
would live in the upstream "fedora-selinux" repository [1], so device
developers wouldn't have to define new policies for each device. This
would reduce the complexity of the SELinux policies.

> 
> Despite the efforts required in making this work, all processes still
> effectively have full access to the guest since they can access guest
> RAM.  What I mean is that the device is actually not confined to its
> host process (e.g. LSI SCSI controller process) because it can write
> code to executable guest RAM pages.  The guest will then execute that
> code and therefore all guest I/O (networking, disk, etc) is still
> available indirectly to the "confined" processes.  They are not really
> sandboxed from the outside world, regardless of how strict the SELinux
> policy is :(.
> 
> There are performance issues due to proxying as well, but let's ignore
> them for now and focus on security.

We are also focusing on performance. Please take a look at the following
blog post for an initial performance report. The results are for an
iSCSI backend in Oracle Cloud. We are working on collecting data for a
much heavier IOPS workload, such as an NVMe backend.

https://blogs.oracle.com/linux/towards-a-more-secure-qemu-hypervisor%2c-part-3-of-3-v2

> 
> How do the benefits compare against today's monolithic approach?  If the
> guest exploits monolithic QEMU it has full access to all host files and
> APIs available to QEMU.  However, these are largely just the resources
> that belong to the guest anyway - not resources we are trying to keep
> away from the guest.  With multi-process QEMU each process still has
> access to all guest interfaces via the code injection I mentioned above,
> but the SELinux policy could restrict access to some resources.  But
> this benefit is really small in my opinion, given that the resources
> belong to the guest anyway and the guest can already access them.

The primary focus of our project is to defend the host from a malicious
guest. The code injection problem you outlined above involves one part
of the guest attacking the guest itself, not the host. Therefore, it
doesn't compromise our objective.

As you know, there are parts of QEMU that are not directly accessible
from the guest (via drivers, etc.), which we prefer to call the control
plane. The control plane issues ioctls to the host kernel and needs a
broader set of syscalls than the device emulation code does. We want to
protect the control plane from the emulated devices: even if a device
injects code into guest RAM to attack another device in the same VM,
the control plane would still be protected.

Another benefit of the project is better detection and reporting of
failures in emulated devices. For instance, in cases like
CVE-2018-18849, where an emulated device hangs or crashes, the failure
would no longer directly crash the main QEMU process. Instead, QEMU
could detect the failure, log the problem and exit, rather than dumping
core or hanging.
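As a rough illustration of the failure-detection point, the Python
sketch below shows how a supervising process could tell a healthy device
process from one that has crashed or hung. The ping/pong exchange over
stdin/stdout, the helper names `start_device()` and `check_device()`,
and the stand-in child programs are all hypothetical; they are not taken
from the multi-process QEMU patches. It uses `select()` on pipes, so it
assumes a POSIX host.

```python
import select
import subprocess
import sys

# Hypothetical supervisor sketch: the ping/pong protocol over stdin/stdout
# is an assumption for illustration, not the multi-process QEMU protocol.
# POSIX only, because select() is used on pipes.

def start_device(code):
    """Launch a stand-in 'device process' running the given Python code."""
    return subprocess.Popen([sys.executable, "-c", code],
                            stdin=subprocess.PIPE, stdout=subprocess.PIPE)


def check_device(proc, timeout):
    """Classify a device process as "ok", "crashed" or "hung"."""
    # A device process is not expected to exit while the VM runs,
    # so any exit is treated as a crash.
    if proc.poll() is not None:
        return "crashed"
    try:
        proc.stdin.write(b"ping\n")
        proc.stdin.flush()
    except BrokenPipeError:              # exited while we were writing
        return "crashed"
    ready, _, _ = select.select([proc.stdout], [], [], timeout)
    if not ready:                        # alive but silent: treat as hung
        return "hung"
    if proc.stdout.readline() == b"":    # EOF: the process died
        return "crashed"
    return "ok"


if __name__ == "__main__":
    devices = {
        "healthy": "import sys\nwhile sys.stdin.readline(): print('pong', flush=True)",
        "crashing": "import sys; sys.exit(3)",
        "hanging": "import time; time.sleep(60)",
    }
    for name, code in devices.items():
        proc = start_device(code)
        state = check_device(proc, timeout=2.0)
        # A real supervisor would log the failure here and then exit.
        print(f"{name}: {state}")
        proc.kill()
        proc.wait()
```

The point of the design is that the crash or hang stays inside the child
process, so the supervisor can observe it and decide what to do (log and
exit) instead of being taken down with it.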

> 
> I think you can implement this for a handful of devices as a one-time
> thing, but the invasiveness and the impracticality of getting wide cover
> of QEMU make this approach questionable.
> 
> Am I mistaken about the invasiveness or impracticality?

We are not planning to implement this for all devices, since that would
be impractical. However, the project adds a framework for moving more
devices into separate processes in the future.

One other thing we would like to bring to your attention is that the
project doesn't affect current usage: the same devices can still be
used as part of monolithic QEMU if the user chooses to do so.

> 
> Am I misunderstanding the security benefits compared to what already
> exists today?

As far as we know, there is no other open-source KVM-based toolstack
where the privileged operations run in a separate process, the emulated
devices are jailed, and legacy OSes like Windows XP can still be run.

> 
> A more practical approach is to strip down QEMU (compiling out unused
> devices and features) and to run virtio devices in vhost-user processes
> (e.g. virtio-input, virtio-gpu, virtio-fs).  This achieves similar goals
> without proxy objects or invasive changes to QEMU since the vhost-user
> devices use a different codebase and aren't accessible via the QEMU
> monitor.  The limitation is that existing QEMU code and non-virtio
> devices aren't available in this model.

In some cases, the user/customer brings in VMs with legacy devices
attached to them. It's not possible to take the virtio/vhost approach in
this case.

[1] https://github.com/fedora-selinux

Thanks!
-- 
Jag

> 
> Stefan
>
