From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dave.hansen@intel.com>
Received: from mail.linutronix.de (146.0.238.70:993) by
  crypto-ml.lab.linutronix.de with IMAP4-SSL for <speck@linutronix.de>; 14 Jan
  2019 19:20:10 -0000
Received: from mga07.intel.com ([134.134.136.100])
	by Galois.linutronix.de with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256)
	(Exim 4.80)
	(envelope-from <dave.hansen@intel.com>)
	id 1gj7mK-0003hl-O0
	for speck@linutronix.de; Mon, 14 Jan 2019 20:20:09 +0100
Subject: [MODERATED] Re: [PATCH v4 05/28] MDSv4 10
References: <cover.1547256470.git.ak@linux.intel.com>
 <021c5ba2a9fdae326058dd16785b30c31546cd0f.1547256470.git.ak@linux.intel.com>
From: Dave Hansen <dave.hansen@intel.com>
Message-ID: <1ce79648-6263-d726-bb69-da54131535a2@intel.com>
Date: Mon, 14 Jan 2019 11:20:02 -0800
MIME-Version: 1.0
In-Reply-To: 
 <021c5ba2a9fdae326058dd16785b30c31546cd0f.1547256470.git.ak@linux.intel.com>
Content-Type: multipart/mixed; boundary="7OGI42AkB7AxZBUWrbS9X3j2Y6mtJBg3f"; protected-headers="v1"
To: speck@linutronix.de
List-ID: <speck.linutronix.de>

This is an OpenPGP/MIME encrypted message (RFC 4880 and 3156)
--7OGI42AkB7AxZBUWrbS9X3j2Y6mtJBg3f
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: quoted-printable

On 1/11/19 5:29 PM, speck for Andi Kleen wrote:
> When entering idle the internal state of the current CPU might
> become visible to the thread sibling because the CPU "frees" some
> internal resources.

Is there some documentation somewhere about what "idle" means here?  It
looks like MWAIT and HLT certainly count, but is there anything else?

I'm just trying to figure out how we make sure we catch all of the
call-sites for these.  This sprinkles quite a few of them around, and
I'm wondering how you found these, how we know if we missed any, and how
we keep folks from reintroducing new call-sites that would make us
vulnerable again.

I did a quick "objdump | grep mwait" and this patch appears to catch all
the functions that I encountered.

> +/*
> + * Clear CPU buffers before going idle, so that no state is leaked to =
SMT
> + * siblings taking over thread resources.
> + * Out of line to avoid include hell.
> + *
> + * Assumes that interrupts are disabled and only get reenabled
> + * before idle, otherwise the data from a racing interrupt might not
> + * get cleared. There are some callers who violate this,
> + * but they are only used in unattackable cases.> + */

Can we please document the unattackable cases, along with the reasons
they are unattackable?  This property also keeps us from being able to
annotate this site with lockdep checks for interrupts being off, which
is a bit unfortunate.

> +static inline void clear_cpu_idle(void)
> +{
> +	if (sched_smt_active()) {
> +		clear_thread_flag(TIF_CLEAR_CPU);
> +		clear_cpu();
> +	}
> +}

=2E..
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idl=
e.c
> index b2131c4ea124..0342daa122fe 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -33,6 +33,7 @@
>  #include <linux/cpuidle.h>
>  #include <linux/cpu.h>
>  #include <acpi/processor.h>
> +#include <asm/clearcpu.h>
> =20
>  /*
>   * Include the apic definitions for x86 to have the APIC timer related=
 defines
> @@ -120,6 +121,7 @@ static const struct dmi_system_id processor_power_d=
mi_table[] =3D {
>   */
>  static void __cpuidle acpi_safe_halt(void)
>  {
> +	clear_cpu_idle();
>  	if (!tif_need_resched()) {
>  		safe_halt();
>  		local_irq_disable();

Why is this one outside the if()?  Seems like it could be safely inside
next to safe_halt().

> @@ -681,6 +683,7 @@ static int acpi_idle_play_dead(struct cpuidle_devic=
e *dev, int index)
> =20
>  	ACPI_FLUSH_CPU_CACHE();
> =20
> +	clear_cpu_idle();
>  	while (1) {
> =20
>  		if (cx->entry_method =3D=3D ACPI_CSTATE_HALT)

At the risk of bike-shedding...  Why don't we just catch all these
*play_dead() sites inside play_dead() itself, or at arch_cpu_idle_dead()?=


> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index 8b5d85c91e9d..ddaa7603d53a 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -65,6 +65,7 @@
>  #include <asm/intel-family.h>
>  #include <asm/mwait.h>
>  #include <asm/msr.h>
> +#include <asm/clearcpu.h>
> =20
>  #define INTEL_IDLE_VERSION "0.4.1"
> =20
> @@ -933,6 +934,8 @@ static __cpuidle int intel_idle(struct cpuidle_devi=
ce *dev,
>  		}
>  	}
> =20
> +	clear_cpu_idle();
> +
>  	mwait_idle_with_hints(eax, ecx);

And my like bikeshed: It seems like this would be a much smaller patch,
and be less likely to have future code add vulnerabilities if we just
patched mwait_idle_with_hints().


--7OGI42AkB7AxZBUWrbS9X3j2Y6mtJBg3f--