Re: RFC: HVM de-privileged mode scheduling considerations

From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Ben Catterall <Ben.Catterall@citrix.com>,
	george.dunlap@eu.citrix.com, dario.faggioli@citrix.com
Cc: xen-devel@lists.xen.org
Subject: Re: RFC: HVM de-privileged mode scheduling considerations
Date: Mon, 3 Aug 2015 14:54:51 +0100
Message-ID: <55BF72AB.8070100@citrix.com>
In-Reply-To: <55BF6E38.509@citrix.com>

On 03/08/15 14:35, Ben Catterall wrote:
> Hi all,
>
> I am working on an x86 proof-of-concept to evaluate if it is feasible
> to move device models and x86 emulation code for HVM guests into a
> de-privileged context.
>
> I was hoping to get feedback from relevant maintainers on scheduling
> considerations for this system to mitigate potential DoS attacks.
>
> Many thanks in advance,
> Ben
>
> This is intended as a proof-of-concept, with the aim of determining if
> this idea is feasible within performance constraints.
>
> Motivation
> ----------
> The motivation for moving the device models and x86 emulation code
> into ring 3 is to mitigate a system compromise due to a bug in any of
> these systems. These systems are currently part of the hypervisor and,
> consequently, a bug in any of these could allow an attacker to gain
> control of (or perform a DoS on) Xen and/or guests.
>
> Migrating between PCPUs
> -----------------------
> There is a need to support migration between pcpus so that the
> scheduler can still perform this operation. However, there is an issue
> to resolve. Currently, I have a per-vcpu copy of the Xen ring 0 stack
> up to the point of entering the de-privileged mode. This allows us to
> restore this stack and then continue from the entry point when we have
> finished in de-privileged mode. There will be per-pcpu data on these
> per-vcpu stacks such as saved stack frame pointers for the per-pcpu
> stack, smp_processor_id() responses etc.
>
> Therefore, it will be necessary to lock the vcpu to the current pcpu
> when it enters this user mode so that it does not wake up on a
> different pcpu where such pointers and other data are invalid. We can
> do this by setting a hard affinity to the pcpu that the vcpu is
> executing on. See common/wait.c which does something similar to what I
> am doing.
>
> However, needing to have hard affinity to a pcpu leads to the
> following problem:
> - An attacker could lock multiple vcpus to a single pcpu, leading to a
> DoS. This could be achieved by spinning in a loop in Xen
> de-privileged mode (assuming a bug in this mode) and performing this
> operation on multiple vcpus at once. The attacker could wait until all
> of their vcpus were on the same pcpu and then execute this attack.
> This could cause the pcpu to, effectively, lock up, as it will be
> under heavy load, and we would be unable to move work elsewhere.
>
> A solution to the DoS would be to force migration to another pcpu
> after, say, 100 quanta have passed with the vcpu still in
> de-privileged mode. This forcing of migration would require us to
> forcibly complete the de-privileged operation, and then, just before
> returning into the guest, force a cpu change. We could not just force
> a migration at the schedule call point as the Xen stack needs to
> unwind to free up resources. We would reset this count each time we
> completed a de-privileged mode operation.
>
> A legitimate long-running de-privileged operation would trigger this
> forced migration mechanism. However, it is unlikely that such
> operations will be needed and the count can be adjusted appropriately
> to mitigate this.
>
> Any suggestions or feedback would be appreciated!

I don't see why any scheduling support is needed.

Currently all operations like this are run synchronously in the vmexit
context of the vcpu.  Any current DoS is already a real issue.

In any reasonable situation, emulation of a device is a small state
mutation which occasionally kicks off a further action.  (The
far bigger risk from this kind of emulation is following bad
pointers/etc, rather than long loops.)

I think it would be entirely reasonable to have a deadline for a single
execution of depriv mode, after which the domain is declared malicious
and killed.

We already have this for host pcpus - the watchdog defaults to 5
seconds.  Having a similar cutoff for depriv mode should be fine.
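
A minimal sketch of such a deadline check, in standalone C.  Every name
here (depriv_state, depriv_enter, depriv_exit, depriv_deadline_expired,
DEPRIV_DEADLINE_NS) is hypothetical and not an existing Xen interface.
The 5 second value simply mirrors the watchdog default mentioned above,
and real code would read a hypervisor clock instead of taking the time
as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cutoff for one depriv execution: 5 s, in nanoseconds. */
#define DEPRIV_DEADLINE_NS (5ULL * 1000000000ULL)

/* Hypothetical per-vcpu bookkeeping for depriv mode. */
struct depriv_state {
    uint64_t entry_ns; /* timestamp taken when depriv mode was entered */
    bool     active;   /* true while the vcpu is executing in depriv mode */
};

/* Record the entry time when the vcpu drops into depriv mode. */
static void depriv_enter(struct depriv_state *s, uint64_t now_ns)
{
    s->entry_ns = now_ns;
    s->active = true;
}

/* Clear the state when the depriv operation completes normally. */
static void depriv_exit(struct depriv_state *s)
{
    s->active = false;
}

/*
 * Return true if a single depriv execution has run past the deadline.
 * The caller would then treat the domain as malicious and kill it.
 */
static bool depriv_deadline_expired(const struct depriv_state *s,
                                    uint64_t now_ns)
{
    return s->active && (now_ns - s->entry_ns) > DEPRIV_DEADLINE_NS;
}
```

The check could be made from whatever periodic path already runs on the
pcpu (a timer tick, for instance).  Because the deadline bounds a single
execution rather than counting quanta, it needs no forced migration.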

~Andrew

