Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: "Philippe Mathieu-Daudé" <philmd@redhat.com>
To: "Alex Bennée" <alex.bennee@linaro.org>, qemu-devel@nongnu.org
Cc: Eduardo Otubo <otubo@redhat.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Markus Armbruster <armbru@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs
Date: Fri, 3 May 2019 12:10:04 +0200	[thread overview]
Message-ID: <ab085fad-e97c-ce30-326e-4e06a05d4a4e@redhat.com> (raw)
In-Reply-To: <87v9yr52ph.fsf@zen.linaroharston>

On 5/3/19 11:04 AM, Alex Bennée wrote:
> 
> Stefan Hajnoczi <stefanha@redhat.com> writes:
> 
>> At KVM Forum 2018 I gave a presentation on security in QEMU:
>> https://www.youtube.com/watch?v=YAdRf_hwxU8 (video)
>> https://vmsplice.net/~stefan/stefanha-kvm-forum-2018.pdf (slides)
>>
>> This patch adds a security guide to the developer docs.  This document
>> covers things that developers should know about security in QEMU.  It is
>> just a starting point that we can expand on later.  I hope it will be
>> useful as a resource for new contributors and will save code reviewers
>> from explaining the same concepts many times.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> v2:
>>  * Added mention of passthrough USB and PCI devices [philmd]
>>  * Reworded resource limits [philmd]
>>  * Added qemu_log_mask(LOG_GUEST_ERROR) [philmd]
>> ---
>>  docs/devel/index.rst    |   1 +
>>  docs/devel/security.rst | 225 ++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 226 insertions(+)
>>  create mode 100644 docs/devel/security.rst
>>
>> diff --git a/docs/devel/index.rst b/docs/devel/index.rst
>> index ebbab636ce..fd0b5fa387 100644
>> --- a/docs/devel/index.rst
>> +++ b/docs/devel/index.rst
>> @@ -20,3 +20,4 @@ Contents:
>>     stable-process
>>     testing
>>     decodetree
>> +   security
>> diff --git a/docs/devel/security.rst b/docs/devel/security.rst
>> new file mode 100644
>> index 0000000000..83c6fb2231
>> --- /dev/null
>> +++ b/docs/devel/security.rst
>> @@ -0,0 +1,225 @@
>> +==============
>> +Security Guide
>> +==============
>> +Overview
>> +--------
>> +This guide covers security topics relevant to developers working on QEMU.  It
>> +includes an explanation of the security requirements that QEMU gives its users,
>> +the architecture of the code, and secure coding practices.
>> +
>> +Security Requirements
>> +---------------------
>> +QEMU supports many different use cases, some of which have stricter security
>> +requirements than others.  The community has agreed on the overall security
>> +requirements that users may depend on.  These requirements define what is
>> +considered supported from a security perspective.
>> +
>> +Virtualization Use Case
>> +~~~~~~~~~~~~~~~~~~~~~~~
>> +The virtualization use case covers cloud and virtual private server (VPS)
>> +hosting, as well as traditional data center and desktop virtualization.  These
>> +use cases rely on hardware virtualization extensions to execute guest code
>> +safely on the physical CPU at close-to-native speed.
>> +
>> +The following entities are **untrusted**, meaning that they may be buggy or
>> +malicious:
>> +
>> +* Guest
>> +* User-facing interfaces (e.g. VNC, SPICE, WebSocket)
>> +* Network protocols (e.g. NBD, live migration)
>> +* User-supplied files (e.g. disk images, kernels, device trees)
>> +* Passthrough devices (e.g. PCI, USB)

Thanks.

>> +
>> +Bugs affecting these entities are evaluated on whether they can cause damage in
>> +real-world use cases and treated as security bugs if this is the case.
>> +
>> +Non-virtualization Use Case
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The non-virtualization use case covers emulation using the Tiny Code Generator
>> +(TCG).  In principle the TCG and device emulation code used in conjunction with
>> +the non-virtualization use case should meet the same security requirements as
>> +the virtualization use case.  However, for historical reasons much of the
>> +non-virtualization use case code was not written with these security
>> +requirements in mind.
>> +
>> +Bugs affecting the non-virtualization use case are not considered security
>> +bugs at this time.  Users with non-virtualization use cases must not rely on
>> +QEMU to provide guest isolation or any security guarantees.
>> +
>> +Architecture
>> +------------
>> +This section describes the design principles that ensure the security
>> +requirements are met.
>> +
>> +Guest Isolation
>> +~~~~~~~~~~~~~~~
>> +Guest isolation is the confinement of guest code to the virtual machine.  When
>> +guest code gains control of execution on the host this is called escaping the
>> +virtual machine.  Isolation also includes resource limits such as throttling of
>> +CPU, memory, disk, or network.  Guests must be unable to exceed their resource
>> +limits.
>> +
>> +QEMU presents an attack surface to the guest in the form of emulated devices.
>> +The guest must not be able to gain control of QEMU.  Bugs in emulated devices
>> +could allow malicious guests to gain code execution in QEMU.  At this point the
>> +guest has escaped the virtual machine and is able to act in the context of the
>> +QEMU process on the host.
>> +
>> +Guests often interact with other guests and share resources with them.  A
>> +malicious guest must not gain control of other guests or access their data.
>> +Disk image files and network traffic must be protected from other guests unless
>> +explicitly shared between them by the user.
>> +
>> +Principle of Least Privilege
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The principle of least privilege states that each component only has access to
>> +the privileges necessary for its function.  In the case of QEMU this means that
>> +each process only has access to resources belonging to the guest.
>> +
>> +The QEMU process should not have access to any resources that are inaccessible
>> +to the guest.  This way the guest does not gain anything by escaping into the
>> +QEMU process since it already has access to those same resources from within
>> +the guest.
>> +
>> +Following the principle of least privilege immediately fulfills guest isolation
>> +requirements.  For example, guest A only has access to its own disk image file
>> +``a.img`` and not guest B's disk image file ``b.img``.
>> +
>> +In reality certain resources are inaccessible to the guest but must be
>> +available to QEMU to perform its function.  For example, host system calls are
>> +necessary for QEMU but are not exposed to guests.  A guest that escapes into
>> +the QEMU process can then begin invoking host system calls.
>> +
>> +New features must be designed to follow the principle of least privilege.
>> +Should this not be possible for technical reasons, the security risk must be
>> +clearly documented so users are aware of the trade-off of enabling the feature.
>> +
>> +Isolation mechanisms
>> +~~~~~~~~~~~~~~~~~~~~
>> +Several isolation mechanisms are available to realize this architecture of
>> +guest isolation and the principle of least privilege.  With the exception of
>> +Linux seccomp, these mechanisms are all deployed by management tools that
>> +launch QEMU, such as libvirt.  They are also platform-specific so they are only
>> +described briefly for Linux here.
>> +
>> +The fundamental isolation mechanism is that QEMU processes must run as
>> +**unprivileged users**.  Sometimes it seems more convenient to launch QEMU as
>> +root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
>> +huge security risk.  File descriptor passing can be used to give an otherwise
>> +unprivileged QEMU process access to host devices without running QEMU
>> as root.
> 
> Should we mention that you can still maintain running as a user and just
> make the devices you need available to the user/group rather than
> becoming root? For example I generally make /dev/kvm group accessible to
> my user account.

Good suggestion.

>> +
>> +**SELinux** and **AppArmor** make it possible to confine processes beyond the
>> +traditional UNIX process and file permissions model.  They restrict the QEMU
>> +process from accessing processes and files on the host system that are not
>> +needed by QEMU.
>> +
>> +**Resource limits** and **cgroup controllers** provide throughput and utilization
>> +limits on key resources such as CPU time, memory, and I/O bandwidth.
>> +
>> +**Linux namespaces** can be used to make process, file system, and other system
>> +resources unavailable to QEMU.  A namespaced QEMU process is restricted to only
>> +those resources that were granted to it.
>> +
>> +**Linux seccomp** is available via the QEMU ``--sandbox`` option.  It disables
>> +system calls that are not needed by QEMU, thereby reducing the host kernel
>> +attack surface.
>> +
>> +Secure coding practices
>> +-----------------------
>> +At the source code level there are several points to keep in mind.  Both
>> +developers and security researchers must be aware of them so that they can
>> +develop safe code and audit existing code properly.
>> +
>> +General Secure C Coding Practices
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +Most CVEs (security bugs) reported against QEMU are not specific to
>> +virtualization or emulation.  They are simply C programming bugs.  Therefore
>> +it's critical to be aware of common classes of security bugs.
>> +
>> +There is a wide selection of resources available covering secure C coding.  For
>> +example, the `CERT C Coding Standard
>> +<https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard>`_
>> +covers the most important classes of security bugs.
>> +
>> +Instead of describing them in detail here, only the names of the most important
>> +classes of security bugs are mentioned:
>> +
>> +* Buffer overflows
>> +* Use-after-free and double-free
>> +* Integer overflows
>> +* Format string vulnerabilities
>> +
>> +Some of these classes of bugs can be detected by analyzers.  Static analysis is
>> +performed regularly by Coverity and the most obvious of these bugs are even
>> +reported by compilers.  Dynamic analysis is possible with valgrind, tsan, and
>> +asan.
>> +
>> +Input Validation
>> +~~~~~~~~~~~~~~~~
>> +Inputs from the guest or external sources (e.g. network, files) cannot be
>> +trusted and may be invalid.  Inputs must be checked before using them in a way
>> +that could crash the program, expose host memory to the guest, or otherwise be
>> +exploitable by an attacker.
>> +
>> +The most sensitive attack surface is device emulation.  All hardware register
>> +accesses and data read from guest memory must be validated.  A typical example
>> +is a device that contains multiple units that are selectable by the guest via
>> +an index register::
>> +
>> +  typedef struct {
>> +      ProcessingUnit unit[2];
>> +      ...
>> +  } MyDeviceState;
>> +
>> +  static void mydev_writel(void *opaque, uint32_t addr, uint32_t val)
>> +  {
>> +      MyDeviceState *mydev = opaque;
>> +      ProcessingUnit *unit;
>> +
>> +      switch (addr) {
>> +      case MYDEV_SELECT_UNIT:
>> +          unit = &mydev->unit[val];   <-- this input wasn't validated!
>> +          ...
>> +      }
>> +  }
>> +
>> +If ``val`` is not in range [0, 1] then an out-of-bounds memory access will take
>> +place when ``unit`` is dereferenced.  The code must check that ``val`` is 0 or
>> +1 and handle the case where it is invalid.
>> +
>> +Unexpected Device Accesses
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +The guest may access device registers in unusual orders or at unexpected
>> +moments.  Device emulation code must not assume that the guest follows the
>> +typical "theory of operation" presented in driver writer manuals.  The guest
>> +may make nonsense accesses to device registers such as starting operations
>> +before the device has been fully initialized.
>> +
>> +A related issue is that device emulation code must be prepared for unexpected
>> +device register accesses while asynchronous operations are in progress.  A
>> +well-behaved guest might wait for a completion interrupt before accessing
>> +certain device registers.  Device emulation code must handle the case where the
>> +guest overwrites registers or submits further requests before an ongoing
>> +request completes.  Unexpected accesses must not cause memory corruption or
>> +leaks in QEMU.
>> +
>> +Invalid device register accesses can be reported with
>> +``qemu_log_mask(LOG_GUEST_ERROR, ...)``.  The ``-d guest_errors`` command-line
>> +option enables these log messages.

Thanks for adding this section!

>> +
>> +Live migration
>> +~~~~~~~~~~~~~~
>> +Device state can be saved to disk image files and shared with other users.
>> +Live migration code must validate inputs when loading device state so an
>> +attacker cannot gain control by crafting invalid device states.  Device state
>> +is therefore considered untrusted even though it is typically generated by QEMU
>> +itself.
>> +
>> +Guest Memory Access Races
>> +~~~~~~~~~~~~~~~~~~~~~~~~~
>> +Guests with multiple vCPUs may modify guest RAM while device emulation code is
>> +running.  Device emulation code must copy in descriptors and other guest RAM
>> +structures and only process the local copy.  This prevents
>> +time-of-check-to-time-of-use (TOCTOU) race conditions that could cause QEMU to
>> +crash when a vCPU thread modifies guest RAM while device emulation is
>> +processing it.
> 
> Anyway:
> 
> Reviewed-by: Alex Bennée <alex.bennee@linaro.org>

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

next prev parent reply	other threads:[~2019-05-03 10:10 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-25 13:35 [Qemu-devel] [PATCH v2] security.rst: add Security Guide to developer docs Stefan Hajnoczi
2019-04-25 13:35 ` Stefan Hajnoczi
2019-05-01 16:20 ` Stefan Hajnoczi
2019-05-01 16:20   ` Stefan Hajnoczi
2019-05-03  8:14 ` Stefano Garzarella
2019-05-03  8:14   ` Stefano Garzarella
2019-05-03  9:04 ` Alex Bennée
2019-05-03  9:04   ` Alex Bennée
2019-05-03 10:10   ` Philippe Mathieu-Daudé [this message]
2019-05-03 10:10     ` Philippe Mathieu-Daudé
2019-05-03 17:32   ` Stefan Hajnoczi
2019-05-03 17:32     ` Stefan Hajnoczi
2019-05-03 10:19 ` Daniel P. Berrangé
2019-05-03 10:19   ` Daniel P. Berrangé
2019-05-03 10:28   ` Peter Maydell
2019-05-03 10:28     ` Peter Maydell
2019-05-03 10:35     ` Daniel P. Berrangé
2019-05-03 10:35       ` Daniel P. Berrangé
2019-05-03 17:30       ` Stefan Hajnoczi
2019-05-03 17:30         ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ab085fad-e97c-ce30-326e-4e06a05d4a4e@redhat.com \
    --to=philmd@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=armbru@redhat.com \
    --cc=otubo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).