xen-devel.lists.xenproject.org archive mirror
From: Andrew Cooper <andrew.cooper3@citrix.com>
To: Juergen Gross <jgross@suse.com>, Xen-devel <xen-devel@lists.xen.org>
Subject: Re: [PATCH FAIRLY-RFC 00/44] x86: Prerequisite work for a Xen KAISER solution
Date: Fri, 5 Jan 2018 09:26:55 +0000
Message-ID: <ad7e36d8-d2c9-f521-77c6-0a1c6c7fba51@citrix.com>
In-Reply-To: <e2f7e4b2-dd3e-977a-711c-5b9665a2b123@suse.com>

On 05/01/2018 07:48, Juergen Gross wrote:
> On 04/01/18 21:21, Andrew Cooper wrote:
>> This work was developed as an SP3 mitigation, but shelved when it became clear
>> that it wasn't viable to get done in the timeframe.
>>
>> To protect against SP3 attacks, most mappings need to be flushed while in
>> user context.  However, to protect against all cross-VM attacks, it is
>> necessary to ensure that the Xen stacks are not mapped in any other CPU's
>> address space, or an attacker can still recover at least the GPR state of
>> separate VMs.
> Above statement is too strict: it would be sufficient if no stacks of
> other domains are mapped.

Sadly not.  Having stacks shared by domain means one vcpu can still
steal at least GPR state from other vcpus belonging to the same domain.

Whether or not a specific kernel cares, some definitely will.

> I'm just working on a proof of concept using dedicated per-vcpu stacks
> for 64-bit PV domains. Those stacks would be mapped in the per-domain
> region of the address space. I hope to have an RFC version of the patches
> ready next week.
>
> This would allow removing the per-physical-cpu mappings from the
> guest-visible address space when doing page table isolation.
>
> In order to avoid SP3 attacks against other vcpus' stacks of the same
> guest we could extend the PV ABI to mark a guest's user L4 page table as
> "single use", i.e. not allowed to be active on multiple vcpus at the
> same time (introducing that ABI modification in the Linux kernel would
> be simple, as the Linux kernel currently lacks protection against
> cross-cpu stack exploits, and when that protection is added via per-cpu
> user L4 page tables we could just chime in). An L4 page table marked as
> "single use" would map the local vcpu's stacks only.
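The "single use" rule proposed above could in principle be enforced at the point where a vcpu loads an L4 into %cr3. A minimal sketch, assuming a hypothetical per-L4 metadata structure: `struct l4_meta`, `l4_activate()` and `l4_deactivate()` are illustrative names only, not Xen internals or an existing PV ABI. An L4 flagged single-use may be active on at most one vcpu, so a second vcpu trying to load it is refused.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-L4 bookkeeping; not Xen's struct page_info and not
 * part of any existing PV ABI.
 */
struct l4_meta {
    bool single_use;          /* guest marked this L4 as "single use" */
    atomic_int active_vcpus;  /* vcpus currently running on this L4 */
};

static void l4_meta_init(struct l4_meta *l4, bool single_use)
{
    l4->single_use = single_use;
    atomic_init(&l4->active_vcpus, 0);
}

/*
 * Called before a vcpu loads this L4 into %cr3.  Returns false if doing
 * so would make a single-use L4 active on more than one vcpu.
 */
static bool l4_activate(struct l4_meta *l4)
{
    if (!l4->single_use) {
        atomic_fetch_add(&l4->active_vcpus, 1);
        return true;
    }

    /* Claim the L4 only if no other vcpu is currently using it. */
    int expected = 0;
    return atomic_compare_exchange_strong(&l4->active_vcpus, &expected, 1);
}

/* Called when a vcpu switches away from this L4. */
static void l4_deactivate(struct l4_meta *l4)
{
    int prev = atomic_fetch_sub(&l4->active_vcpus, 1);

    assert(prev > 0);  /* catches an unbalanced deactivate */
    (void)prev;        /* silence unused-variable warnings under NDEBUG */
}
```

The sketch only captures the invariant. A real implementation would also need a defined outcome for the refusal path, such as failing the hypercall or faulting the guest.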

For PV guests, it is the Xen stacks which matter, not the guest kernel's
per-vcpu stacks.

64bit PV guest kernels are already mitigated better than KPTI can ever
manage, because there are no entry stacks or entry stubs required to be
mapped into guest userspace at all.

>> To have isolated stacks, Xen needs a per-pcpu isolated region, which requires
>> that two pCPUs never share the same %cr3.  This is trivial for 32bit PV guests
>> and HVM guests due to the existing per-vcpu Monitor Tables, but is problematic
>> for 64bit PV guests, which will run on the same %cr3 when scheduling different
>> threads from the same process.
>>
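The unique-%cr3 requirement can be written as a simple invariant check across pCPUs. A minimal sketch, assuming a hypothetical per-pCPU record of the loaded pagetable: `cur_cr3[]`, `NR_CPUS`, `record_cr3()` and `cr3_is_unique()` are illustrative names, and PCID and flag bits in %cr3 are ignored. When two threads of one 64-bit PV process are scheduled on different pCPUs, both would load the same guest L4, and this check would fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8  /* hypothetical; for illustration only */

/* Physical address of the L4 each pCPU has loaded; 0 means "none". */
static uint64_t cur_cr3[NR_CPUS];

static void record_cr3(unsigned int cpu, uint64_t cr3)
{
    assert(cpu < NR_CPUS);
    cur_cr3[cpu] = cr3;
}

/* True if no other pCPU is running on the same top-level pagetable. */
static bool cr3_is_unique(unsigned int cpu)
{
    assert(cpu < NR_CPUS);
    for (unsigned int i = 0; i < NR_CPUS; i++)
        if (i != cpu && cur_cr3[i] != 0 && cur_cr3[i] == cur_cr3[cpu])
            return false;
    return true;
}
```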
>> To avoid breaking the PV ABI, Xen needs to shadow the guest L4 pagetables if
>> it wants to maintain the unique %cr3 property it needs.
>>
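One way to picture L4 shadowing: each pCPU runs on its own copy of the guest's L4, with one slot overridden to point at that pCPU's private region. The sketch below is not the pt-shadow.c algorithm. It assumes a plain `uint64_t` entry type (`l4e_t`), a hypothetical `PERCPU_SLOT` index, and a flat array of shadows, and it models only that one Xen-owned slot. It also suggests where the overhead comes from: every guest L4 update has to be propagated to each shadow of that L4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L4_ENTRIES  512
#define PERCPU_SLOT 261  /* hypothetical slot reserved for per-pCPU mappings */

typedef uint64_t l4e_t;  /* simplified L4 entry; not Xen's l4_pgentry_t */

/* One pCPU's private copy of a guest L4. */
struct shadow_l4 {
    const l4e_t *guest;          /* guest L4 being mirrored, or NULL */
    l4e_t entries[L4_ENTRIES];   /* what this pCPU actually loads in %cr3 */
};

/* Build a shadow of @guest for a pCPU whose private region is @percpu_l4e. */
static void shadow_l4_sync(struct shadow_l4 *sh, const l4e_t *guest,
                           l4e_t percpu_l4e)
{
    assert(sh != NULL && guest != NULL);
    memcpy(sh->entries, guest, sizeof(sh->entries));
    sh->entries[PERCPU_SLOT] = percpu_l4e;  /* never the guest's value */
    sh->guest = guest;
}

/*
 * After the guest updates guest[idx], copy the new value into every
 * shadow of that L4.  This walk over all shadows is the extra work the
 * shadowing scheme adds to each L4 update.
 */
static void shadow_l4_propagate(struct shadow_l4 *shadows, size_t nr,
                                const l4e_t *guest, unsigned int idx)
{
    if (idx >= L4_ENTRIES || idx == PERCPU_SLOT)
        return;  /* the per-pCPU slot is owned by Xen, not the guest */

    for (size_t i = 0; i < nr; i++)
        if (shadows[i].guest == guest)
            shadows[i].entries[idx] = guest[idx];
}
```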
>> tl;dr The shadowing algorithm in pt-shadow.c is too much of a performance
>> overhead to be viable, and very high risk to productise in an embargo window.
>> If we want to continue down this route, we either need someone to have a
>> clever alternative to the shadowing algorithm I came up with, or change the PV
>> ABI to require VMs not to share L4 pagetables.
>>
>> Either way, these patches are presented to start a discussion of the issues.
>> The series as a whole is not in a suitable state for committing.
> I think patch 1 should be excluded from that statement, as it is not
> directly related to the series.

There are bits of the series I do intend to take in, largely in this
form.  Another is "x86/pv: Drop support for paging out the LDT", because
it's long since time for that to disappear.

I should also say that the net changes to context switching and
critical-structure handling across this series are a performance and
security benefit, irrespective of the KAISER/KPTI side of things.
They'd qualify for inclusion on their own merits (if it weren't for the
dependent L4 shadowing issues).

If you're interested, I stumbled onto patch one after introducing the
per-pcpu stack mapping, as virt_to_maddr() came out spectacularly
wrong.  Very observant readers might also notice the bit of misc
debugging which caused me to blindly stumble into XSA-243, which was an
interesting diversion from Xen crashing because of my own pagetable
mistakes.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Thread overview: 61+ messages
2018-01-04 20:21 [PATCH FAIRLY-RFC 00/44] x86: Prerequisite work for a Xen KAISER solution Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 01/44] passthrough/vtd: Don't DMA to the stack in queue_invalidate_wait() Andrew Cooper
2018-01-05  9:21   ` Jan Beulich
2018-01-05  9:33     ` Andrew Cooper
2018-01-16  6:41   ` Tian, Kevin
2018-01-04 20:21 ` [PATCH RFC 02/44] x86/idt: Factor out enabling and disabling of ISTs Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 03/44] x86/pv: Rename invalidate_shadow_ldt() to pv_destroy_ldt() Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 04/44] x86/boot: Introduce cpu_smpboot_bsp() to dynamically allocate BSP state Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 05/44] x86/boot: Move arch_init_memory() earlier in the boot sequence Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 06/44] x86/boot: Allocate percpu pagetables for the idle vcpus Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 07/44] x86/boot: Use " Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 08/44] x86/pv: Avoid an opencoded mov to %cr3 in toggle_guest_mode() Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 09/44] x86/mm: Track the current %cr3 in a per_cpu variable Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 10/44] x86/pt-shadow: Initial infrastructure for L4 PV pagetable shadowing Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 11/44] x86/pt-shadow: Always set _PAGE_ACCESSED on L4e updates Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 12/44] x86/fixmap: Temporarily add a percpu fixmap range Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 13/44] x86/pt-shadow: Shadow L4 tables from 64bit PV guests Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 14/44] x86/mm: Added safety checks that pagetables aren't shared Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 15/44] x86: Rearrange the virtual layout to introduce a PERCPU linear slot Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 16/44] xen/ipi: Introduce arch_ipi_param_ok() to check IPI parameters Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 17/44] x86/smp: Infrastructure for allocating and freeing percpu pagetables Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 18/44] x86/mm: Maintain the correct percpu mappings on context switch Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 19/44] x86/boot: Defer TSS/IST setup until later during boot on the BSP Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 20/44] x86/smp: Allocate a percpu linear range for the IDT Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 21/44] x86/smp: Switch to using the percpu IDT mappings Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 22/44] x86/mm: Track whether the current cr3 has a short or extended directmap Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 23/44] x86/smp: Allocate percpu resources for map_domain_page() to use Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 24/44] x86/mapcache: Reimplement map_domain_page() from scratch Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 25/44] x86/fixmap: Drop percpu fixmap range Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 26/44] x86/pt-shadow: Maintain a small cache of shadowed frames Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 27/44] x86/smp: Allocate a percpu linear range for the compat translation area Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 28/44] x86/xlat: Use the percpu " Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 29/44] x86/smp: Allocate percpu resources for the GDT and LDT Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 30/44] x86/pv: Break handle_ldt_mapping_fault() out of handle_gdt_ldt_mapping_fault() Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 31/44] x86/pv: Drop support for paging out the LDT Andrew Cooper
2018-01-24 11:04   ` Jan Beulich
2018-01-04 20:21 ` [PATCH RFC 32/44] x86: Always reload the LDT on vcpu context switch Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 33/44] x86/smp: Use the percpu GDT/LDT mappings Andrew Cooper
2018-01-04 20:21 ` [PATCH RFC 34/44] x86: Drop the PERDOMAIN mappings Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 35/44] x86/smp: Allocate the stack in the percpu range Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 36/44] x86/monitor: Capture Xen's intent to use monitor at boot time Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 37/44] x86/misc: Move some IPI parameters off the stack Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 38/44] x86/mca: Move __HYPERVISOR_mca " Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 39/44] x86/smp: Introduce get_smp_ipi_buf() and take more " Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 40/44] x86/boot: Switch the APs to the percpu pagetables before entering C Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 41/44] x86/smp: Switch to using the percpu stacks Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 42/44] x86/smp: Allocate a percpu linear range for the TSS Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 43/44] x86/smp: Use the percpu TSS mapping Andrew Cooper
2018-01-04 20:22 ` [PATCH RFC 44/44] misc debugging Andrew Cooper
2018-01-05  7:48 ` [PATCH FAIRLY-RFC 00/44] x86: Prerequisite work for a Xen KAISER solution Juergen Gross
2018-01-05  9:26   ` Andrew Cooper [this message]
2018-01-05  9:39     ` Juergen Gross
2018-01-05  9:56       ` Andrew Cooper
2018-01-05 14:11       ` George Dunlap
2018-01-05 14:17         ` Juergen Gross
2018-01-05 14:21           ` George Dunlap
2018-01-05 14:28             ` Jan Beulich
2018-01-05 14:27         ` Jan Beulich
2018-01-05 14:35           ` Andrew Cooper
2018-01-08 11:41             ` George Dunlap
2018-01-09 23:14   ` Stefano Stabellini