From: Andi Kleen <ak@linux.intel.com>
To: speck@linutronix.de
Subject: [MODERATED] Re: [patch 2/2] Command line and documentation 2
Date: Mon, 9 Jul 2018 15:07:01 -0700
Message-ID: <20180709220701.GN25550@tassilo.jf.intel.com>
In-Reply-To: <20180708125654.812951995@linutronix.de>
> + - Processors which have the ARCH_CAP_RDCL_NO bit set in the
> + IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is also not
> + affected by the Meltdown vulnerability. These CPUs should become
> + available end of 2018.
Would be better to specify the sysfs file output here.
> +Problem
> +-------
> +
> +If an instruction accesses a virtual address for which the relevant page
> +table entry (PTE) has the present bit cleared, then the speculative
> +execution can load the data into the speculation flow when the data from
"load the data into the speculation flow"?
No idea what that means. I would just drop it.
> +the physical address which is referenced in the PTE address bits is
> +available in the Level 1 Data Cache. This is a purely speculative
> +mechanism and the instruction will raise a page fault when it is retired.
> +
> +This creates a window between the load and retirement where speculative
> +execution can operate on the data and malicious code can use this to create
> +a side channel to leak the speculated data which was accessed without
> +permission.
> +
> +This flaw is very similar to the Meltdown vulnerability, which speculates
> +on data which should be not accessible from user space because the
> +speculation ignores the permission bits. Contrary to Meltdown L1TF can not
> +be exploited without actually generating page faults. While Meltdown breaks
Not correct due to TSX: inside a transaction the fault aborts the
transaction instead of being delivered as a page fault.
> +the user space to kernel space protection, L1TF has a broader scope. It
It's actually not broader for user space, but much narrower.
> +allows to attack any physical memory address in the system and the attack
.. but only when the page table entry is controllable.
> +works across all protection domains. It allows to attack SGX and also
> +works from inside virtual machines because the speculation bypasses the
which is only the case for virtual machines.
> +extended page table (EPT) protection mechanism.
... and virtual machines can control page table entries.
> +2. Malicious guest in a virtual machine
> +
> + The fact that L1TF breaks all domain protections allows malicious guest
> + OSes, which can control the PTEs directly, and malicious userspace,
> + which runs on an unprotected guest kernel, to attack physical host
> + memory.
... to attack other guests or the host
> +
> +1. L1D flush on VMENTER
> +
> + To make sure that a guest cannot attack data which is present in L1D the
> + hypervisor flushes L1D before entering the guest.
> +
> + Flushing L1D evicts not only the data which should not be accessed by a
> + potentially malicious guest, it also flushes the guest data. Flushing
I suspect this is misleading because most non-trivial exits already
effectively clear the L1.
> + L1D has a performance impact as the processor has to bring the flushed
> + guest data back into L1D. Depending on the frequency of VMEXIT/VMENTER
> + and the type of computations in the guest performance degradation in the
> + range of 1% to 50% has been observed. For scenarios where guest
> + VMEXIT/VMENTER are rare the performance impact is minimal. Virtio and
> + mechanisms like posted interrupts are designed to confine the VMEXITs to
> + a bare minimum, but specific configurations and application scenarios
> + might still suffer from a high VMEXIT rate.
> +
> + The general recommendation is to enable L1D flush on VMENTER.
... right, so we should make it the default.
> +2. Guest VCPU confinement to dedicated physical cores
> +
> + To address the SMT problem, it is possible to make a guest or a group of
> + guests affine to one or more physical cores. The proper mechanism for
> + that is to utilize cpusets and to make sure that no other guest or host
> + tasks can run on these cores.
Need to refer to exclusive cpusets here. If the cpuset is not exclusive
it doesn't help.
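To illustrate the exclusive requirement, here is a minimal sketch of a check
an admin script could run before trusting a cpuset for guest confinement. It
assumes the cgroup v1 cpuset file names (cpuset.cpu_exclusive, cpuset.cpus);
the helper names and any cpuset directory passed in, for example
/sys/fs/cgroup/cpuset/guests, are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text: str) -> set:
    """Expand a kernel CPU list such as "2-3,6" into {2, 3, 6}."""
    cpus = set()
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # an empty cpuset yields an empty string
        first, _, last = chunk.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def cpuset_is_exclusive(cpuset_dir: str) -> bool:
    """True if the cpuset has its cpu_exclusive flag set (cgroup v1 naming)."""
    flag = Path(cpuset_dir) / "cpuset.cpu_exclusive"
    return flag.read_text().strip() == "1"


def confined_cpus(cpuset_dir: str) -> set:
    """Return the CPUs of an exclusive cpuset, or an empty set otherwise.

    A non-exclusive cpuset gives no isolation guarantee, so it is
    reported as confining nothing.
    """
    if not cpuset_is_exclusive(cpuset_dir):
        return set()
    return parse_cpu_list((Path(cpuset_dir) / "cpuset.cpus").read_text())
```

This only checks the cpuset flag. It says nothing about whether host tasks
or interrupts can still land on the sibling threads of those cores.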
> + The host memory is attackable, when one of the sibling threads runs in
> + host OS (hypervisor) context and the other in guest context. The amount
> + of valuable information from the host OS context depends on the context
> + which the host OS executes, i.e. interrupts, soft interrupts and kernel
> + threads. The amount of valuable data from these contexts cannot be
> + declared as non interesting for an attacker without deep inspection of
> + the code.
The interrupt part seems misplaced in this section; it should be in the next one.
> +
> + Note, that assigning guests to a fixed set of physical cores affects the
> + ability of the scheduler to do load balancing and might have negative
> + effects on CPU utilization depending on the hosting scenario. Disabling
> + SMT might be a viable alternative for particular scenarios.
> +
> + For further information about confining guests to a single or to a group
> + of cores consult the cpusets documentation.
.... add a pointer to the document here ...
> + true because there are types of interrupts which are truly per CPU
> + interrupts, e.g. the local timer interrupt. Aside of that multi queue
> + devices affine their interrupts to single CPUs or groups of CPUs per
> + queue without allowing the administrator to control the affinities.
Mention that timers run on the CPUs they were triggered on, so they
are effectively tied to the processes?
> + /sys/devices/system/cpu/smt/active:
> +
> + This file reports whether SMT is enabled and active, i.e. if on any
> + physical core two or more sibling threads are online.
> +
> +There is ongoing research and development for other mitigation mechanisms
> +to address the performance impact of disabling SMT.
Really need to refer to the sysfs file for L1D here.
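For reference, a minimal sketch of how a script could read both pieces of
state together. It assumes the smt/active file quoted above and the mainline
location of the L1TF status file, /sys/devices/system/cpu/vulnerabilities/l1tf.
The helper names and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path
from typing import Optional


def read_sysfs(path: Path) -> Optional[str]:
    """Return the stripped file contents, or None if the file is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def l1tf_report(sysfs_root: str = "/sys") -> dict:
    """Collect SMT and L1TF state; sysfs_root is overridable for testing."""
    cpu = Path(sysfs_root) / "devices" / "system" / "cpu"
    smt_active = read_sysfs(cpu / "smt" / "active")
    return {
        # "1" means at least one core has two or more sibling threads online.
        "smt_active": None if smt_active is None else smt_active == "1",
        # Free-form status line from the kernel, passed through unparsed.
        "l1tf": read_sysfs(cpu / "vulnerabilities" / "l1tf"),
    }


if __name__ == "__main__":
    print(l1tf_report())
```

On kernels without these files both values come back as None, which a caller
should treat as "unknown" rather than "safe".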
> + configuration, i.e. SMT enabled or L1D flush disabled.
> +
> + novirt,nowarn: Same as 'novirt', but hypervisors will not warn when
> + a VM is started in a potentially insecure configuration.
> +
> +The default is 'novirt'.
> +
> +Mitigation control for KVM - command line or module parameter
> +-------------------------------------------------------------
> +
> +The KVM hypervisor mitigation mechanism, flushing the L1D cache when
> +entering a guest, can be controlled from the kernel command line or when
> +the KVM-Intel hypervisor is built as a module also with a module parameter.
That can also be changed dynamically through /sys/module/kvm_intel/parameters/*, right?
That would be useful to document.
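A rough sketch of what the runtime path could look like from a management
script. It assumes the parameter shows up as
/sys/module/kvm_intel/parameters/vmentry_l1d_flush (it belongs to the
kvm-intel module) and that the kernel accepts writes to it, which is exactly
the point that needs confirming. The function names and the sysfs_root
parameter are hypothetical; the accepted values are the ones listed in the
patch.

```python
from pathlib import Path

# Values documented in the patch under review.
VALID_MODES = ("always", "cond", "never")


def param_path(sysfs_root: str = "/sys") -> Path:
    """Assumed sysfs location of the kvm-intel module parameter."""
    return (Path(sysfs_root) / "module" / "kvm_intel"
            / "parameters" / "vmentry_l1d_flush")


def get_flush_mode(sysfs_root: str = "/sys") -> str:
    """Read the current L1D flush mode as reported by the kernel."""
    return param_path(sysfs_root).read_text().strip()


def set_flush_mode(mode: str, sysfs_root: str = "/sys") -> None:
    """Write a new mode; needs root and a loaded kvm_intel module."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}, expected one of {VALID_MODES}")
    param_path(sysfs_root).write_text(mode)
```

Rejecting unknown values before the write keeps a typo from turning into an
opaque I/O error from the kernel.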
> +
> +The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
> +following arguments::
> +
> + always: L1D cache flush on every VMENTER.
> +
> + cond: Flush L1D on VMENTER only when the code between VMEXIT and
> + VMENTER can leak host memory which is considered
> + interesting for an attacker. This still can leak host data
> + which allows e.g. to determine the hosts address space layout.
> +
> + never: Disables the mitigation
> +
> +The default is 'cond'. If 'l1tf=full' or 'l1tf=full,force' are given on the
> +kernel command line, these take precedence.
Ah, so the command line description was actually wrong. The default
is not novirt, but cond. That's good. But really need to fix that description
in the other patch ...
> +3. Virtualization with untrusted guests
> +
> + If SMT is not supported by the processor or disabled in the BIOS or by
> + the kernel, it's only required to enforce L1D flushing on VMENTER. This
> + can be achieved with the l1tf or the kvm-intel command line or module
> + parameters.
With cond they should already be ok in most cases?
Just need to describe the trade-off between cond and always here,
something like:
With the default a guest can determine the address space layout
of the hypervisor, but cannot read user data.
> + - Isolating the guest CPUs from interrupts can reduce the attack surface
> + further, but still allows a malicious guest to explore a limited
> + amount of host physical memory. This can at least be used to gain
> + knowledge about the host address space layout. The interrupts which
> + have a fixed affinity to the CPUs which run the untrusted guests can
> + depending on the scenario still trigger soft interrupts and schedule
> + kernel threads which might expose valuable information.
> +
> + - Disabling SMT and enforcing the L1D flushing provides the maximum
enforcing it with 'always'
-Andi