From mboxrd@z Thu Jan  1 00:00:00 1970
Return-path: <peterz@infradead.org>
Received: from bombadil.infradead.org ([2607:7c80:54:e::133])
	by Galois.linutronix.de with esmtps (TLS1.2:RSA_AES_256_CBC_SHA256:256)
	(Exim 4.80)
	(envelope-from <peterz@infradead.org>)
	id 1fAuMP-0001hm-6f
	for speck@linutronix.de; Tue, 24 Apr 2018 11:35:42 +0200
Received: from j217100.upc-j.chello.nl ([24.132.217.100]
 helo=hirez.programming.kicks-ass.net)	by bombadil.infradead.org with esmtpsa
 (Exim 4.90_1 #2 (Red Hat Linux))	id 1fAuMN-0000DZ-5f	for speck@linutronix.de;
 Tue, 24 Apr 2018 09:35:39 +0000
Date: Tue, 24 Apr 2018 11:35:37 +0200
From: Peter Zijlstra <peterz@infradead.org>
Subject: [MODERATED] Re: L1D-Fault KVM mitigation
Message-ID: <20180424093537.GC4064@hirez.programming.kicks-ass.net>
References: <20180424090630.wlghmrpasn7v7wbn@suse.de>
MIME-Version: 1.0
In-Reply-To: <20180424090630.wlghmrpasn7v7wbn@suse.de>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID: <speck.linutronix.de>

On Tue, Apr 24, 2018 at 11:06:30AM +0200, speck for Joerg Roedel wrote:
> Hey,
> 
> I've been looking into the mitigation for the L1D fault issue in KVM,
> and since the hardware seems to speculate with the GPA as an HPA, it
> seems we have to disable SMT to be fully secure here because otherwise
> two different guests running on HT siblings could spy on each other.
> 
> I'd like to discuss how we mitigate this, the big hammer would be not
> initializing the HT siblings at boot on affected machines, but that is
> probably a bit too eager as it also penalizes people not using KVM.
> 
> Another option is to just print a fat warning and/or refuse to load the
> KVM modules on affected machines when HT is enabled.
> 
> So what are the opinions on how we should best mitigate this issue?

Another option, that is being explored, is to co-schedule siblings.
So ensure all siblings either run vcpus of the _same_ VM or idle.

Of course, this is all rather intrusive and ugly and brings with it
setup costs as well, because you'd have to sync up on VMENTER, VMEXIT
and interrupts (on the idle CPUs).
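To make the co-scheduling invariant concrete, here is a minimal C sketch of only the per-core check. It assumes a hypothetical struct sibling_state that records which VM (if any) each hardware thread of a core is running. None of these names exist in the kernel. The sketch deliberately ignores the VMENTER/VMEXIT/interrupt synchronization and the scheduler integration, which are the intrusive and expensive parts.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker: sibling is idle (runs no guest and no host task). */
#define SIBLING_IDLE 0u

/* Hypothetical per-hardware-thread state; not a real kernel structure. */
struct sibling_state {
    unsigned int vm_id;   /* SIBLING_IDLE, or ID of the VM whose vcpu runs here */
};

/* Invariant: all siblings of a core either run vcpus of the same VM or idle. */
static bool core_is_safe(const struct sibling_state *sib, size_t nr_siblings)
{
    unsigned int vm = SIBLING_IDLE;

    for (size_t i = 0; i < nr_siblings; i++) {
        if (sib[i].vm_id == SIBLING_IDLE)
            continue;              /* an idle sibling never conflicts */
        if (vm == SIBLING_IDLE)
            vm = sib[i].vm_id;     /* first busy sibling fixes the VM */
        else if (sib[i].vm_id != vm)
            return false;          /* two different VMs share the core */
    }
    return true;
}

/*
 * Hypothetical admission check before VMENTER: may a vcpu of vm_id enter
 * on sibling 'cpu' given what the other siblings are currently running?
 * A false result means the caller would have to wait, or force the
 * conflicting sibling to exit/idle (the sync cost mentioned above).
 */
static bool may_vmenter(const struct sibling_state *sib, size_t nr_siblings,
                        size_t cpu, unsigned int vm_id)
{
    for (size_t i = 0; i < nr_siblings; i++) {
        if (i == cpu || sib[i].vm_id == SIBLING_IDLE)
            continue;
        if (sib[i].vm_id != vm_id)
            return false;
    }
    return true;
}
```

The check itself is trivial; the cost lies in keeping the sibling state coherent, which requires cross-sibling synchronization on every VMENTER, VMEXIT and interrupt.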

Another complication is that on overcommitted systems the regular load
balancer will happily migrate vcpu tasks around. So it is fairly tricky
to ensure runnable vcpu threads of the same VM are in fact around to be
run on a core.

Not to mention that Linus has basically said: "No way, Jose".

I know that I worked a little with Tim on this, and I know Google did
their own thing (but have not seen patches from them -- is pjt on this
list?). I've also heard Amazon was working on things (are they
here?). And I think RHT was also looking into something (mingo, bonzini
-- are you guys reading?)

In any case, if any of that is to fly, we need very solid numbers to
convince Linus to reconsider.

Another idea that I had was to only allow trusted guest kernels, as in
trusted computing, key-verified images, etc. Of course, they too can be
compromised, but hopefully it avoids the most egregious hostile guest
scenarios.