* L1TF, and future work
@ 2018-08-15 13:17 Andrew Cooper
2018-08-15 13:21 ` Andrew Cooper
` (3 more replies)
0 siblings, 4 replies; 13+ messages in thread
From: Andrew Cooper @ 2018-08-15 13:17 UTC (permalink / raw)
To: Xen-devel List
Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Dario Faggioli, Tim Deegan,
Jan Beulich, Xen Security, Woodhouse, David, Roger Pau Monne
Hello,
Now that the embargo on XSA-273 is up, we can start publicly discussing
the remaining work to do, because there is plenty to do.  In no particular
order...
1) Attempting to shadow dom0 from boot leads to some assertions very
very quickly. Shadowing dom0 after-the-fact leads to some very weird
crashes where whole swathes of the shadow appear to be missing.  This
is why, for now, automatic shadowing of dom0 is disabled by default.
2) 32bit PV guests which use writeable pagetable support will
automatically get shadowed when they clear the lower half.  Ideally, such
guests should be modified to use hypercalls rather than the ptwr
infrastructure (as it's more efficient to begin with), but we can
probably work around this in Xen by emulating the next few instructions
until we have a complete PTE (same as the shadow code).
3) Toolstack CPUID/MSR work. This is needed for many reasons.
3a) Able to level MSR_ARCH_CAPS and maxphysaddr to regain some migration
safety (see the sketch after this list).
3b) Able to report accurate topology to Xen (see point 5) and to guests.
3c) Able to configure/level the Viridian leaves, and implement the
Viridian L1TF extension.
3d) Able to configure/level the Xen leaves and implement a similar L1TF
enlightenment.
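
As a rough illustration of what "levelling" in 3a means (a sketch with
invented structures, not the toolstack's actual code): a guest which may
migrate anywhere in a pool should only be shown the common subset of the
hosts' features, i.e. the AND of their MSR_ARCH_CAPS values and the minimum
of their physical address widths:

#include <stdint.h>

struct host_policy {
    uint64_t arch_caps;     /* raw MSR_ARCH_CAPS value for this host */
    uint8_t  maxphysaddr;   /* physical address width (CPUID 0x80000008, EAX[7:0]) */
};

/* Compute the most restrictive policy a migratable guest can rely on. */
static struct host_policy level_pool(const struct host_policy *hosts, unsigned nr)
{
    struct host_policy out = { .arch_caps = ~0ULL, .maxphysaddr = 64 };

    for (unsigned i = 0; i < nr; i++) {
        out.arch_caps &= hosts[i].arch_caps;        /* only commonly-set bits */
        if (hosts[i].maxphysaddr < out.maxphysaddr)
            out.maxphysaddr = hosts[i].maxphysaddr; /* narrowest width wins */
    }
    return out;
}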
4) The shadow MMIO fastpath truncates the MMIO gfn at 2^28 without any
indication of failure.  The most compatible bugfix AFAICT would be to
add an extra nibble's worth of gfn space which gets us to 2^32, and
clamp the guest maxphysaddr calculation at 44 bits. The alternative is
to clamp maxphysaddr to 40 bits, but that will break incoming migrate of
very large shadow guests.
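
(For anyone checking the arithmetic, this is just the 4 KiB page size: a gfn
field of N bits covers N + 12 bits of guest physical address.  A trivial,
purely illustrative check:)

#include <stdio.h>

int main(void)
{
    /* 28-bit gfn field -> 40-bit maxphysaddr; 32-bit field -> 44-bit. */
    unsigned gfn_bits[] = { 28, 32 };

    for (unsigned i = 0; i < 2; i++)
        printf("%u gfn bits -> %u-bit maxphysaddr\n",
               gfn_bits[i], gfn_bits[i] + 12);
    return 0;
}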
4a) The shadow MMIO fastpath needs a runtime clobber, because it will
not function at all on Icelake hardware with a 52-bit physical address
width. Also, it turns out there is an architectural corner case when
levelling maxphysaddr, where some bits which (v)maxphysaddr says should
elicit #PF[RSVD], don't because the actual pipeline address width is larger.
5) Core-aware scheduling. At the moment, Xen will schedule arbitrary
guest vcpus on arbitrary hyperthreads. This is bad and wants fixing.
I'll defer to Dario for further details.
Perhaps the more important longer term action is to start removing
secrets from Xen, because it's getting uncomfortably easy to exfiltrate
data. I'll defer to David for his further plans in this direction.
I'm sure I've probably missed something in all of this, but this is
enough to begin the discussion.
~Andrew
* Re: L1TF, and future work
2018-08-15 13:17 L1TF, and future work Andrew Cooper
@ 2018-08-15 13:21 ` Andrew Cooper
2018-08-15 14:11 ` Juergen Gross
2018-08-15 14:10 ` Jan Beulich
` (2 subsequent siblings)
3 siblings, 1 reply; 13+ messages in thread
From: Andrew Cooper @ 2018-08-15 13:21 UTC (permalink / raw)
To: Xen-devel List
Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Tim Deegan, Dario Faggioli,
Jan Beulich, Xen Security, Woodhouse, David, Roger Pau Monne
On 15/08/18 14:17, Andrew Cooper wrote:
> Hello,
Apologies. Getting Dario's correct email address this time.
>
> Now that the embargo on XSA-273 is up, we can start publicly discussing
> the remaining work to do, because there is plenty to do.  In no particular
> order...
>
> 1) Attempting to shadow dom0 from boot leads to some assertions very
> very quickly. Shadowing dom0 after-the-fact leads to some very weird
> crashes where whole swathes of the shadow appear to be missing.  This
> is why, for now, automatic shadowing of dom0 is disabled by default.
>
> 2) 32bit PV guests which use writeable pagetable support will
> automatically get shadowed when they clear the lower half.  Ideally, such
> guests should be modified to use hypercalls rather than the ptwr
> infrastructure (as it's more efficient to begin with), but we can
> probably work around this in Xen by emulating the next few instructions
> until we have a complete PTE (same as the shadow code).
>
> 3) Toolstack CPUID/MSR work. This is needed for many reasons.
> 3a) Able to level MSR_ARCH_CAPS and maxphysaddr to regain some migration
> safety.
> 3b) Able to report accurate topology to Xen (see point 5) and to guests.
> 3c) Able to configure/level the Viridian leaves, and implement the
> Viridian L1TF extension.
> 3d) Able to configure/level the Xen leaves and implement a similar L1TF
> enlightenment.
>
> 4) The shadow MMIO fastpath truncates the MMIO gfn at 2^28 without any
> indication of failure.  The most compatible bugfix AFAICT would be to
> add an extra nibble's worth of gfn space which gets us to 2^32, and
> clamp the guest maxphysaddr calculation at 44 bits. The alternative is
> to clamp maxphysaddr to 40 bits, but that will break incoming migrate of
> very large shadow guests.
>
> 4a) The shadow MMIO fastpath needs a runtime clobber, because it will
> not function at all on Icelake hardware with a 52-bit physical address
> width. Also, it turns out there is an architectural corner case when
> levelling maxphysaddr, where some bits which (v)maxphysaddr says should
> elicit #PF[RSVD], don't because the actual pipeline address width is larger.
>
> 5) Core-aware scheduling. At the moment, Xen will schedule arbitrary
> guest vcpus on arbitrary hyperthreads. This is bad and wants fixing.
> I'll defer to Dario for further details.
>
> Perhaps the more important longer term action is to start removing
> secrets from Xen, because it's getting uncomfortably easy to exfiltrate
> data. I'll defer to David for his further plans in this direction.
>
> I'm sure I've probably missed something in all of this, but this is
> enough to begin the discussion.
>
> ~Andrew
* Re: L1TF, and future work
2018-08-15 13:17 L1TF, and future work Andrew Cooper
2018-08-15 13:21 ` Andrew Cooper
@ 2018-08-15 14:10 ` Jan Beulich
[not found] ` <5B74347002000078001DE714@suse.com>
2018-08-24 9:15 ` Dario Faggioli
3 siblings, 0 replies; 13+ messages in thread
From: Jan Beulich @ 2018-08-15 14:10 UTC (permalink / raw)
To: Andrew Cooper
Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Tim Deegan, Dario Faggioli,
Xen Security, Xen-devel List, David Woodhouse, Roger Pau Monne
>>> On 15.08.18 at 15:17, <andrew.cooper3@citrix.com> wrote:
> 2) 32bit PV guests which use writeable pagetable support will
> automatically get shadowed when they clear the lower half.
... of a page table entry.
> Ideally, such
> guests should be modified to use hypercalls rather than the ptwr
> infrastructure (as it's more efficient to begin with), but we can
> probably work around this in Xen by emulating the next few instructions
> until we have a complete PTE (same as the shadow code).
Provided the intervening insns are simple enough. I've looked into
current Linux pv-ops code the other day, and afaict it's already
using mmu-op or cmpxchg8b, but not two separate mov-s. But
of course I've looked at the general routines only, not at things
perhaps hidden in special cases, or in init-only code.
> 4) The shadow MMIO fastpath truncates the MMIO gfn at 2^28 without any
> indication of failure.  The most compatible bugfix AFAICT would be to
> add an extra nibble's worth of gfn space which gets us to 2^32, and
> clamp the guest maxphysaddr calculation at 44 bits. The alternative is
> to clamp maxphysaddr to 40 bits, but that will break incoming migrate of
> very large shadow guests.
Urgh.
> 4a) The shadow MMIO fastpath needs a runtime clobber, because it will
> not function at all on Icelake hardware with a 52-bit physical address
> width. Also, it turns out there is an architectural corner case when
> levelling maxphysaddr, where some bits which (v)maxphysaddr says should
> elicit #PF[RSVD], don't because the actual pipeline address width is larger.
By "runtime clobber" you mean something to disable that path at
runtime, rather than at build time?
Jan
* Re: L1TF, and future work
2018-08-15 13:21 ` Andrew Cooper
@ 2018-08-15 14:11 ` Juergen Gross
0 siblings, 0 replies; 13+ messages in thread
From: Juergen Gross @ 2018-08-15 14:11 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel List
Cc: Sergey Dyasli, Wei Liu, Tim Deegan, Dario Faggioli, Jan Beulich,
Xen Security, Woodhouse, David, Roger Pau Monne
On 15/08/18 15:21, Andrew Cooper wrote:
> On 15/08/18 14:17, Andrew Cooper wrote:
>> Hello,
>
> Apologies. Getting Dario's correct email address this time.
>
>>
>> Now that the embargo on XSA-273 is up, we can start publicly discussing
>> the remaining work to do, because there is plenty to do.  In no particular
>> order...
>>
>> 1) Attempting to shadow dom0 from boot leads to some assertions very
>> very quickly. Shadowing dom0 after-the-fact leads to some very weird
>> crashes where whole swathes of the shadow appear to be missing.  This
>> is why, for now, automatic shadowing of dom0 is disabled by default.
>>
>> 2) 32bit PV guests which use writeable pagetable support will
>> automatically get shadowed when they clear the lower half.  Ideally, such
>> guests should be modified to use hypercalls rather than the ptwr
>> infrastructure (as it's more efficient to begin with), but we can
>> probably work around this in Xen by emulating the next few instructions
>> until we have a complete PTE (same as the shadow code).
I can work on that in the Linux kernel.
There has been another bug which I suspect is related to that:
https://bugzilla.kernel.org/show_bug.cgi?id=198497
Juergen
* Re: L1TF, and future work
[not found] ` <5B74347002000078001DE714@suse.com>
@ 2018-08-15 14:35 ` Juergen Gross
2018-08-24 18:43 ` Jason Andryuk
0 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2018-08-15 14:35 UTC (permalink / raw)
To: Jan Beulich, Andrew Cooper
Cc: Sergey Dyasli, Wei Liu, Tim Deegan,
Xen-devel List <xen-devel@lists.xen.org>,
Xen Security <security@xen.org>, Dario Faggioli,
David Woodhouse, Roger Pau Monne
On 15/08/18 16:10, Jan Beulich wrote:
>>>> On 15.08.18 at 15:17, <andrew.cooper3@citrix.com> wrote:
>> 2) 32bit PV guests which use writeable pagetable support will
>> automatically get shadowed when they clear the lower half.
>
> ... of a page table entry.
>
>> Ideally, such
>> guests should be modified to use hypercalls rather than the ptwr
>> infrastructure (as it's more efficient to begin with), but we can
>> probably work around this in Xen by emulating the next few instructions
>> until we have a complete PTE (same as the shadow code).
>
> Provided the intervening insns are simple enough. I've looked into
> current Linux pv-ops code the other day, and afaict it's already
> using mmu-op or cmpxchg8b, but not two separate mov-s. But
> of course I've looked at the general routines only, not at things
> perhaps hidden in special cases, or in init-only code.
Look at xen_pte_clear(). Inside irq handling it will use (PAE case):
static inline void native_pte_clear(struct mm_struct *mm,
                                    unsigned long addr, pte_t *ptep)
{
        ptep->pte_low = 0;
        smp_wmb();
        ptep->pte_high = 0;
}
Juergen
* Re: L1TF, and future work
2018-08-15 13:17 L1TF, and future work Andrew Cooper
` (2 preceding siblings ...)
[not found] ` <5B74347002000078001DE714@suse.com>
@ 2018-08-24 9:15 ` Dario Faggioli
2018-09-10 21:45 ` Tamas K Lengyel
3 siblings, 1 reply; 13+ messages in thread
From: Dario Faggioli @ 2018-08-24 9:15 UTC (permalink / raw)
To: Andrew Cooper, Xen-devel List
Cc: Juergen Gross, Sergey Dyasli, Wei Liu, Dario Faggioli, Tim Deegan,
Jan Beulich, Xen Security, Woodhouse, David, Roger Pau Monne
On Wed, 2018-08-15 at 14:17 +0100, Andrew Cooper wrote:
> Hello,
>
> Now that the embargo on XSA-273 is up, we can start publicly
> discussing
> the remaining work to do, because there is plenty to do.  In no
> particular
> order...
>
>
> [...]
>
> 5) Core-aware scheduling. At the moment, Xen will schedule arbitrary
> guest vcpus on arbitrary hyperthreads. This is bad and wants
> fixing.
> I'll defer to Dario for further details.
>
Yes. So, basically, making sure that, if we have hyperthreading, only
vCPUs from one domain are, at any given time, concurrently running on
the threads of a core, acts as a form of mitigation.
As a reference, check how this is mentioned in L1TF writeups coming
from other hypervisors that have (or are introducing) support for this
already:
Hyper-V:
https://support.microsoft.com/en-us/help/4457951/windows-server-guidance-to-protect-against-l1-terminal-fault
VMWare:
https://kb.vmware.com/s/article/55806
(MS' Hyper-V's core-scheduler is also mentioned in one of Intel's
documents
https://www.intel.com/content/www/us/en/architecture-and-technology/l1tf.html
)
It's not a *complete* mitigation, and, e.g., the other measures (like
the L1D flush on VMEnter, sketched below) are still required, but it
helps prevent the issue of a VM being able to read/steal data from
another VM.
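
As a rough illustration of what "the L1D flush on VMEnter" amounts to
(concept only, not Xen's actual VMX entry path; wrmsr64() here is a made-up
helper around the raw instruction, and the write requires ring 0):

#include <stdint.h>

#define MSR_IA32_FLUSH_CMD  0x0000010bU
#define L1D_FLUSH           (1ULL << 0)

/* Raw MSR write: ECX = index, EDX:EAX = value. */
static inline void wrmsr64(uint32_t msr, uint64_t val)
{
    asm volatile ("wrmsr"
                  :: "c" (msr), "a" ((uint32_t)val), "d" ((uint32_t)(val >> 32)));
}

/* Before entering the guest, invalidate the L1 data cache so it cannot
 * leak stale hypervisor or other-guest data via L1TF. */
static inline void flush_l1d_before_vmenter(void)
{
    wrmsr64(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
}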
As an example, if we have VM 1 and VM 2, with four vCPUs each, and a
two-core system with hyperthreading, i.e., cpu 0 and cpu 1 are threads
of core 0, while cpu 2 and cpu 3 are threads of core 1, we want to
schedule the vCPUs, for instance, like this:
cpu0 <-- d2v3
cpu1 <-- d2v1
cpu2 <-- d1v2
cpu3 <-- d1v0
and not like this:
cpu0 <-- d1v2
cpu1 <-- d2v3
...
Of course, this means that, if only d1v2, from VM 1, is active and
wants to run, while all four vCPUs of VM 2 are active and want to
run too, we can end up in this situation:
cpu0 <-- d1v2
cpu1 <-- _idle_
cpu2 <-- d2v1
cpu3 <-- d2v3
wanting_to_run: d2v0, d2v2
I.e., there are ready-to-run vCPUs, there is an idle pCPU, but we can't
run them there. This is not ideal, but is, at least in theory, better
than disabling hyperthreading entirely. (Again, these are all just
examples!)
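
To make the constraint concrete, here is a minimal sketch (purely
illustrative, not from the RFC series; all types and helpers are invented)
of the placement rule: a vCPU may only go onto a hyperthread if every busy
sibling of that core is already running a vCPU of the same domain.

#include <stdbool.h>
#include <stddef.h>

struct vcpu   { int domain_id; };
struct thread { struct vcpu *running; };   /* NULL when the thread is idle */
struct core   { struct thread sibling[2]; };

static bool core_accepts(const struct core *c, const struct vcpu *v)
{
    for (size_t i = 0; i < 2; i++) {
        const struct vcpu *cur = c->sibling[i].running;

        if (cur && cur->domain_id != v->domain_id)
            return false;   /* would mix two domains on one physical core */
    }
    return true;            /* idle, or only running v's own domain */
}

In the situation above, core 0's busy sibling is running d1v2, so
core_accepts() rejects d2v0 and d2v2 and cpu1 has to stay idle.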
Of course, this makes the scheduling much more complicated, especially
when it comes to fairness considerations and to avoiding starvation.
I do have an RFC-level patch series that starts implementing this
"core-scheduling", which I shared with someone during the embargo, and
which I will post here on xen-devel later.
Note that I'll be off for ~2 weeks, effective next Monday, so feel free
to comment, reply, etc, but expect me to reply back only in September.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
* Re: L1TF, and future work
2018-08-15 14:35 ` Juergen Gross
@ 2018-08-24 18:43 ` Jason Andryuk
2018-08-25 5:21 ` Juergen Gross
0 siblings, 1 reply; 13+ messages in thread
From: Jason Andryuk @ 2018-08-24 18:43 UTC (permalink / raw)
To: Juergen Gross
Cc: sergey.dyasli, Wei Liu, Andrew Cooper, tim, xen-devel,
Jan Beulich, security, dfaggioli, dwmw, roger.pau
On Wed, Aug 15, 2018 at 10:39 AM Juergen Gross <jgross@suse.com> wrote:
>
> On 15/08/18 16:10, Jan Beulich wrote:
> >>>> On 15.08.18 at 15:17, <andrew.cooper3@citrix.com> wrote:
> >> 2) 32bit PV guests which use writeable pagetable support will
> >> automatically get shadowed when they clear the lower half.
> >
> > ... of a page table entry.
> >
> >> Ideally, such
> >> guests should be modified to use hypercalls rather than the ptwr
> >> infrastructure (as it's more efficient to begin with), but we can
> >> probably work around this in Xen by emulating the next few instructions
> >> until we have a complete PTE (same as the shadow code).
> >
> > Provided the intervening insns are simple enough. I've looked into
> > current Linux pv-ops code the other day, and afaict it's already
> > using mmu-op or cmpxchg8b, but not two separate mov-s. But
> > of course I've looked at the general routines only, not at things
> > perhaps hidden in special cases, or in init-only code.
>
> Look at xen_pte_clear(). Inside irq handling it will use (PAE case):
>
> static inline void native_pte_clear(struct mm_struct *mm,
>                                     unsigned long addr, pte_t *ptep)
> {
>         ptep->pte_low = 0;
>         smp_wmb();
>         ptep->pte_high = 0;
> }
I've been testing out set_64bit for PTE operations on 32bit PAE. I
haven't found all the spots, but shadowing is now enabled a few
seconds into boot instead of immediately.
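For what it's worth, the shape of the change looks roughly like this (an
illustrative sketch only, not the actual patch, and the helper name is
invented): write the whole 64-bit entry with set_64bit(), which uses
cmpxchg8b, instead of two 32-bit stores, so Xen never observes a
half-cleared PTE and has no reason to start shadowing.

static inline void pae_pte_clear_atomic(struct mm_struct *mm,
                                        unsigned long addr, pte_t *ptep)
{
        /* One atomic 8-byte write instead of pte_low = 0; pte_high = 0; */
        set_64bit((unsigned long long *)ptep, 0);
}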
And yes, I think https://bugzilla.kernel.org/show_bug.cgi?id=198497 is
related as you presumed a while back.
Regards,
Jason
* Re: L1TF, and future work
2018-08-24 18:43 ` Jason Andryuk
@ 2018-08-25 5:21 ` Juergen Gross
2018-08-27 12:10 ` Jason Andryuk
0 siblings, 1 reply; 13+ messages in thread
From: Juergen Gross @ 2018-08-25 5:21 UTC (permalink / raw)
To: Jason Andryuk
Cc: sergey.dyasli, Wei Liu, Andrew Cooper, tim, xen-devel,
Jan Beulich, security, dfaggioli, dwmw, roger.pau
On 24/08/18 20:43, Jason Andryuk wrote:
> On Wed, Aug 15, 2018 at 10:39 AM Juergen Gross <jgross@suse.com> wrote:
>>
>> On 15/08/18 16:10, Jan Beulich wrote:
>>>>>> On 15.08.18 at 15:17, <andrew.cooper3@citrix.com> wrote:
>>>> 2) 32bit PV guests which use writeable pagetable support will
>>>> automatically get shadowed when they clear the lower half.
>>>
>>> ... of a page table entry.
>>>
>>>> Ideally, such
>>>> guests should be modified to use hypercalls rather than the ptwr
>>>> infrastructure (as it's more efficient to begin with), but we can
>>>> probably work around this in Xen by emulating the next few instructions
>>>> until we have a complete PTE (same as the shadow code).
>>>
>>> Provided the intervening insns are simple enough. I've looked into
>>> current Linux pv-ops code the other day, and afaict it's already
>>> using mmu-op or cmpxchg8b, but not two separate mov-s. But
>>> of course I've looked at the general routines only, not at things
>>> perhaps hidden in special cases, or in init-only code.
>>
>> Look at xen_pte_clear(). Inside irq handling it will use (PAE case):
>>
>> static inline void native_pte_clear(struct mm_struct *mm,
>>                                     unsigned long addr, pte_t *ptep)
>> {
>>         ptep->pte_low = 0;
>>         smp_wmb();
>>         ptep->pte_high = 0;
>> }
>
> I've been testing out set_64bit for PTE operations on 32bit PAE. I
> haven't found all the spots, but shadowing is now enabled a few
> seconds into boot instead of immediately.
>
> And yes, I think https://bugzilla.kernel.org/show_bug.cgi?id=198497 is
> related as you presumed a while back.
I have a patch series (two patches) avoiding shadowing completely:
https://lists.xen.org/archives/html/xen-devel/2018-08/msg01785.html
Juergen
* Re: L1TF, and future work
2018-08-25 5:21 ` Juergen Gross
@ 2018-08-27 12:10 ` Jason Andryuk
0 siblings, 0 replies; 13+ messages in thread
From: Jason Andryuk @ 2018-08-27 12:10 UTC (permalink / raw)
To: Juergen Gross
Cc: sergey.dyasli, Wei Liu, Andrew Cooper, tim, xen-devel,
Jan Beulich, security, Dario Faggioli, dwmw, roger.pau
On Sat, Aug 25, 2018 at 1:21 AM Juergen Gross <jgross@suse.com> wrote:
>
> On 24/08/18 20:43, Jason Andryuk wrote:
> > On Wed, Aug 15, 2018 at 10:39 AM Juergen Gross <jgross@suse.com> wrote:
> >>
> >> On 15/08/18 16:10, Jan Beulich wrote:
> >>>>>> On 15.08.18 at 15:17, <andrew.cooper3@citrix.com> wrote:
> >>>> 2) 32bit PV guests which use writeable pagetable support will
> >>>> automatically get shadowed when they clear the lower half.
> >>>
> >>> ... of a page table entry.
> >>>
> >>>> Ideally, such
> >>>> guests should be modified to use hypercalls rather than the ptwr
> >>>> infrastructure (as it's more efficient to begin with), but we can
> >>>> probably work around this in Xen by emulating the next few instructions
> >>>> until we have a complete PTE (same as the shadow code).
> >>>
> >>> Provided the intervening insns are simple enough. I've looked into
> >>> current Linux pv-ops code the other day, and afaict it's already
> >>> using mmu-op or cmpxchg8b, but not two separate mov-s. But
> >>> of course I've looked at the general routines only, not at things
> >>> perhaps hidden in special cases, or in init-only code.
> >>
> >> Look at xen_pte_clear(). Inside irq handling it will use (PAE case):
> >>
> >> static inline void native_pte_clear(struct mm_struct *mm,
> >>                                     unsigned long addr, pte_t *ptep)
> >> {
> >>         ptep->pte_low = 0;
> >>         smp_wmb();
> >>         ptep->pte_high = 0;
> >> }
> >
> > I've been testing out set_64bit for PTE operations on 32bit PAE. I
> > haven't found all the spots, but shadowing is now enabled a few
> > seconds into boot instead of immediately.
> >
> > And yes, I think https://bugzilla.kernel.org/show_bug.cgi?id=198497 is
> > related as you presumed a while back.
>
> I have a patch series (two patches) avoiding shadowing completely:
>
> https://lists.xen.org/archives/html/xen-devel/2018-08/msg01785.html
Great! Thank you. I'm building now.
Looks like I missed native_ptep_get_and_clear, which is why shadowing
still got enabled, just with a delay.
Regards,
Jason
* Re: L1TF, and future work
2018-08-24 9:15 ` Dario Faggioli
@ 2018-09-10 21:45 ` Tamas K Lengyel
2018-09-11 15:13 ` Dario Faggioli
0 siblings, 1 reply; 13+ messages in thread
From: Tamas K Lengyel @ 2018-09-10 21:45 UTC (permalink / raw)
To: Dario Faggioli
Cc: JGross, sergey.dyasli, Wei Liu, Andrew Cooper, Dario Faggioli,
Tim Deegan, Xen-devel, Jan Beulich, security, dwmw,
Roger Pau Monné
On Fri, Aug 24, 2018 at 3:16 AM Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Wed, 2018-08-15 at 14:17 +0100, Andrew Cooper wrote:
> > Hello,
> >
> > Now that the embargo on XSA-273 is up, we can start publicly
> > discussing
> > the remaining work do, because there is plenty to do. In no
> > particular
> > order...
> >
> >
> > [...]
> >
> > 5) Core-aware scheduling. At the moment, Xen will schedule arbitrary
> > guest vcpus on arbitrary hyperthreads. This is bad and wants
> > fixing.
> > I'll defer to Dario for further details.
> >
> Yes. So, basically, making sure that, if we have hyperthreading, only
> vCPUs from one domain are, at any given time, concurrently running on
> the threads of a core, acts as a form of mitigation.
>
> As a reference, check how this is mentioned in L1TF writeups coming
> from other hypervisors that have (or are introducing) support for this
> already:
>
> Hyper-V:
> https://support.microsoft.com/en-us/help/4457951/windows-server-guidance-to-protect-against-l1-terminal-fault
>
> VMWare:
> https://kb.vmware.com/s/article/55806
>
> (MS' Hyper-V's core-scheduler is also mentioned in one of Intel's
> documents
> https://www.intel.com/content/www/us/en/architecture-and-technology/l1tf.html
> )
>
> It's not a *complete* mitigation, and, e.g., the other measures (like
> the L1D flushing on VMEnter) are still required, but it helps
> prevent the issue of a VM being able to read/steal data from another
> VM.
>
> As an example, if we have VM 1 and VM 2, with four vCPUs each, and a
> two-core system with hyperthreading, i.e., cpu 0 and cpu 1 are threads
> of core 0, while cpu 2 and cpu 3 are threads of core 1, we want to
> schedule the vCPUs, for instance, like this:
>
> cpu0 <-- d2v3
> cpu1 <-- d2v1
> cpu2 <-- d1v2
> cpu3 <-- d1v0
>
> and not like this:
>
> cpu0 <-- d1v2
> cpu1 <-- d2v3
> ...
>
> Of course, this means that, if only d1v2, from VM 1, is active and
> wants to run, while all four vCPUs of VM 2 are active and want to
> run too, we can end up in this situation:
>
> cpu0 <-- d1v2
> cpu1 <-- _idle_
> cpu2 <-- d2v1
> cpu3 <-- d2v3
>
> wanting_to_run: d2v0, d2v2
>
> I.e., there are ready-to-run vCPUs, there is an idle pCPU, but we can't
> run them there. This is not ideal, but is, at least in theory, better
> than disabling hyperthreading entirely. (Again, these are all just
> examples!)
>
> Of course, this makes the scheduling much more complicated, especially
> when it comes to fairness considerations and to avoiding starvation.
>
> I do have an RFC-level patch series that starts implementing this
> "core-scheduling", which I shared with someone during the embargo, and
> which I will post here on xen-devel later.
>
> Note that I'll be off for ~2 weeks, effective next Monday, so feel free
> to comment, reply, etc, but expect me to reply back only in September.
Hi Dario,
once you are back from vacation, could you share the RFC patches you mentioned?
Thanks,
Tamas
* Re: L1TF, and future work
2018-09-10 21:45 ` Tamas K Lengyel
@ 2018-09-11 15:13 ` Dario Faggioli
2018-09-11 17:14 ` Tamas K Lengyel
0 siblings, 1 reply; 13+ messages in thread
From: Dario Faggioli @ 2018-09-11 15:13 UTC (permalink / raw)
To: Tamas K Lengyel
Cc: JGross, sergey.dyasli, Wei Liu, Andrew Cooper, Dario Faggioli,
Tim Deegan, Xen-devel, Jan Beulich, security, dwmw,
Roger Pau Monné
On Mon, 2018-09-10 at 15:45 -0600, Tamas K Lengyel wrote:
> On Fri, Aug 24, 2018 at 3:16 AM Dario Faggioli <dfaggioli@suse.com>
> wrote:
> >
> > Note that I'll be off for ~2 weeks, effective next Monday, so feel
> > free
> > to comment, reply, etc, but expect me to reply back only in
> > September.
>
> Hi Dario,
>
Hi,
> once you are back from vacation, could you share the RFC patches you
> mentioned?
>
I did that before leaving actually. :-)
https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg02164.html
I'm back now, and am working on the series again.  In the meantime, do
feel free to share any kind of feedback or opinion.
Regards,
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
* Re: L1TF, and future work
2018-09-11 15:13 ` Dario Faggioli
@ 2018-09-11 17:14 ` Tamas K Lengyel
2018-09-12 6:12 ` Dario Faggioli
0 siblings, 1 reply; 13+ messages in thread
From: Tamas K Lengyel @ 2018-09-11 17:14 UTC (permalink / raw)
To: Dario Faggioli
Cc: JGross, sergey.dyasli, Wei Liu, Andrew Cooper, Dario Faggioli,
Tim Deegan, Xen-devel, Jan Beulich, security, dwmw,
Roger Pau Monné
On Tue, Sep 11, 2018 at 9:13 AM Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Mon, 2018-09-10 at 15:45 -0600, Tamas K Lengyel wrote:
> > On Fri, Aug 24, 2018 at 3:16 AM Dario Faggioli <dfaggioli@suse.com>
> > wrote:
> > >
> > > Note that I'll be off for ~2 weeks, effective next Monday, so feel
> > > free
> > > to comment, reply, etc, but expect me to reply back only in
> > > September.
> >
> > Hi Dario,
> >
> Hi,
>
> > once you are back from vacation, could you share the RFC patches you
> > mentioned?
> >
> I did that before leaving actually. :-)
>
> https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg02164.html
>
> I'm back now, and am working on the series again.  In the meantime, do
> feel free to share any kind of feedback or opinion.
Ah, thanks, I missed it somehow! :) We'll give it a spin; this is
likely something we will need down the road (probably with credit2
though).
Tamas
* Re: L1TF, and future work
2018-09-11 17:14 ` Tamas K Lengyel
@ 2018-09-12 6:12 ` Dario Faggioli
0 siblings, 0 replies; 13+ messages in thread
From: Dario Faggioli @ 2018-09-12 6:12 UTC (permalink / raw)
To: Tamas K Lengyel
Cc: JGross, sergey.dyasli, Wei Liu, Andrew Cooper, Dario Faggioli,
Tim Deegan, Xen-devel, Jan Beulich, security, dwmw,
Roger Pau Monné
On Tue, 2018-09-11 at 11:14 -0600, Tamas K Lengyel wrote:
> On Tue, Sep 11, 2018 at 9:13 AM Dario Faggioli <dfaggioli@suse.com>
>
> > https://lists.xenproject.org/archives/html/xen-devel/2018-08/msg02164.html
> >
> > I'm back now, and am working on the series again.  In the meantime,
> > do
> > feel free to share any kind of feedback or opinion.
>
> Ah, thanks, I missed it somehow! :) We'll give it a spin; this is
> likely something we will need down the road (probably with credit2
> though).
>
Dario
--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
Thread overview: 13+ messages
2018-08-15 13:17 L1TF, and future work Andrew Cooper
2018-08-15 13:21 ` Andrew Cooper
2018-08-15 14:11 ` Juergen Gross
2018-08-15 14:10 ` Jan Beulich
[not found] ` <5B74347002000078001DE714@suse.com>
2018-08-15 14:35 ` Juergen Gross
2018-08-24 18:43 ` Jason Andryuk
2018-08-25 5:21 ` Juergen Gross
2018-08-27 12:10 ` Jason Andryuk
2018-08-24 9:15 ` Dario Faggioli
2018-09-10 21:45 ` Tamas K Lengyel
2018-09-11 15:13 ` Dario Faggioli
2018-09-11 17:14 ` Tamas K Lengyel
2018-09-12 6:12 ` Dario Faggioli