* RFC: HVM de-privileged mode scheduling considerations
From: Ben Catterall @ 2015-08-03 13:35 UTC
  To: george.dunlap, dario.faggioli; +Cc: xen-devel

Hi all,

I am working on an x86 proof-of-concept to evaluate if it is feasible to 
move device models and x86 emulation code for HVM guests into a 
de-privileged context.

I was hoping to get feedback from relevant maintainers on scheduling 
considerations for this system to mitigate potential DoS attacks.

Many thanks in advance,
Ben

This is intended as a proof-of-concept, with the aim of determining if 
this idea is feasible within performance constraints.

Motivation
----------
The motivation for moving the device models and x86 emulation code into 
ring 3 is to mitigate a system compromise due to a bug in any of these 
systems. These systems are currently part of the hypervisor and, 
consequently, a bug in any of these could allow an attacker to gain 
control of (or perform a DoS on) Xen and/or guests.

Migrating between PCPUs
-----------------------
There is a need to support migration between pcpus so that the scheduler 
can still perform this operation. However, there is an issue to resolve. 
Currently, I have a per-vcpu copy of the Xen ring 0 stack up to the 
point of entering the de-privileged mode. This allows us to restore this 
stack and then continue from the entry point when we have finished in 
de-privileged mode. These per-vcpu stack copies will contain per-pcpu 
data, such as saved stack frame pointers into the per-pcpu stack, values 
returned by smp_processor_id(), etc.

Therefore, it will be necessary to lock the vcpu to the current pcpu 
when it enters this user mode so that it does not wake up on a different 
pcpu where such pointers and other data are invalid. We can do this by 
setting a hard affinity to the pcpu that the vcpu is executing on. See 
common/wait.c, which does something similar to what I am doing.
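
To make the pinning step concrete, here is a minimal, self-contained 
sketch of the idea. It is only a model: struct vcpu_model, depriv_enter(), 
depriv_exit() and may_run_on() are hypothetical names made up for 
illustration, the affinity is a plain 64-bit mask, and none of this is 
Xen's actual scheduler API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of a vcpu; NOT Xen's struct vcpu. */
struct vcpu_model {
    uint64_t hard_affinity;   /* bit n set => vcpu may run on pcpu n */
    uint64_t saved_affinity;  /* affinity to restore when leaving user mode */
    bool     in_depriv;       /* currently in de-privileged mode? */
};

/* Pin the vcpu to the pcpu it is running on before entering user mode. */
static void depriv_enter(struct vcpu_model *v, unsigned int cur_pcpu)
{
    assert(cur_pcpu < 64 && !v->in_depriv);
    v->saved_affinity = v->hard_affinity;
    v->hard_affinity  = UINT64_C(1) << cur_pcpu;
    v->in_depriv      = true;
}

/* Restore the original affinity once the de-privileged operation is done. */
static void depriv_exit(struct vcpu_model *v)
{
    assert(v->in_depriv);
    v->hard_affinity = v->saved_affinity;
    v->in_depriv     = false;
}

/* Would the scheduler be allowed to place this vcpu on the given pcpu? */
static bool may_run_on(const struct vcpu_model *v, unsigned int pcpu)
{
    return pcpu < 64 && ((v->hard_affinity >> pcpu) & 1);
}
```

While the vcpu is between depriv_enter() and depriv_exit(), may_run_on() 
is true only for the pcpu it entered on, so the saved stack copy is never 
resumed on a pcpu where its per-pcpu data would be stale.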

However, needing to have hard affinity to a pcpu leads to the following 
problem:
- An attacker could lock multiple vcpus to a single pcpu, leading to a 
DoS. This could be achieved by spinning in a loop in Xen de-privileged 
mode (assuming a bug in this mode) and doing so on multiple vcpus at 
once. The attacker could wait until all of their vcpus were on the same 
pcpu and then execute this attack. The pcpu would then, effectively, lock 
up: it would be under heavy load and we would be unable to move the work 
elsewhere.

A solution to the DoS would be to force migration to another pcpu once 
the vcpu has remained in de-privileged mode for, say, 100 quanta. Forcing 
the migration would require us to forcibly complete the de-privileged 
operation and then, just before returning into the guest, force a cpu 
change. We could not simply force a migration at the schedule call point, 
as the Xen stack needs to unwind to free up resources. We would reset this 
count each time we completed a de-privileged mode operation.
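
The counting scheme described above could look roughly like the sketch 
below. This is an illustration under stated assumptions, not a proposed 
patch: struct depriv_state, depriv_quantum_expired() and depriv_complete() 
are hypothetical names, the limit of 100 is the example figure from the 
text, and the "next pcpu" choice is a placeholder for whatever the real 
scheduler would pick.

```c
#include <assert.h>
#include <stdbool.h>

/* Example threshold from the text ("say, 100 quanta"); meant to be tunable. */
#define DEPRIV_QUANTA_LIMIT 100

/* Hypothetical per-vcpu bookkeeping; NOT an existing Xen structure. */
struct depriv_state {
    bool         in_depriv;         /* vcpu is in de-privileged mode */
    unsigned int quanta_in_depriv;  /* quanta spent there without finishing */
    bool         force_complete;    /* operation must be forcibly completed */
    bool         migrate_on_return; /* change pcpu just before guest re-entry */
};

/* Called each time a scheduling quantum expires for this vcpu. */
static void depriv_quantum_expired(struct depriv_state *s)
{
    if (!s->in_depriv)
        return;
    if (++s->quanta_in_depriv >= DEPRIV_QUANTA_LIMIT) {
        s->force_complete    = true;
        s->migrate_on_return = true;
    }
}

/*
 * Called once the operation has finished (normally or forcibly) and the
 * Xen stack has unwound. Resets the count and returns the pcpu to use
 * when returning to the guest.
 */
static unsigned int depriv_complete(struct depriv_state *s,
                                    unsigned int cur_pcpu,
                                    unsigned int nr_pcpus)
{
    unsigned int next = cur_pcpu;

    assert(nr_pcpus > 0 && cur_pcpu < nr_pcpus);
    if (s->migrate_on_return && nr_pcpus > 1)
        next = (cur_pcpu + 1) % nr_pcpus;  /* placeholder policy only */

    s->in_depriv         = false;
    s->quanta_in_depriv  = 0;
    s->force_complete    = false;
    s->migrate_on_return = false;
    return next;
}
```

Note that the migration decision is only acted on in depriv_complete(), 
i.e. after the stack has unwound, which matches the constraint that we 
cannot migrate directly at the schedule call point.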

A legitimate long-running de-privileged operation would also trigger this 
forced migration mechanism. However, it is unlikely that such operations 
will be needed, and the count can be adjusted appropriately to mitigate 
this.

Any suggestions or feedback would be appreciated!
