From mboxrd@z Thu Jan 1 00:00:00 1970
Return-path:
Received: from bombadil.infradead.org ([2607:7c80:54:e::133])
 by Galois.linutronix.de with esmtps (TLS1.2:RSA_AES_256_CBC_SHA256:256)
 (Exim 4.80)
 (envelope-from )
 id 1fAuMP-0001hm-6f
 for speck@linutronix.de; Tue, 24 Apr 2018 11:35:42 +0200
Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=hirez.programming.kicks-ass.net)
 by bombadil.infradead.org with esmtpsa (Exim 4.90_1 #2 (Red Hat Linux))
 id 1fAuMN-0000DZ-5f
 for speck@linutronix.de; Tue, 24 Apr 2018 09:35:39 +0000
Date: Tue, 24 Apr 2018 11:35:37 +0200
From: Peter Zijlstra
Subject: [MODERATED] Re: L1D-Fault KVM mitigation
Message-ID: <20180424093537.GC4064@hirez.programming.kicks-ass.net>
References: <20180424090630.wlghmrpasn7v7wbn@suse.de>
MIME-Version: 1.0
In-Reply-To: <20180424090630.wlghmrpasn7v7wbn@suse.de>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID:

On Tue, Apr 24, 2018 at 11:06:30AM +0200, speck for Joerg Roedel wrote:
> Hey,
>
> I've been looking into the mitigation for the L1D fault issue in KVM,
> and since the hardware seems to speculate with the GPA as an HPA, it
> seems we have to disable SMT to be fully secure here because otherwise
> two different guests running on HT siblings could spy on each other.
>
> I'd like to discuss how we mitigate this, the big hammer would be not
> initializing the HT siblings at boot on affected machines, but that is
> probably a bit too eager as it also penalizes people not using KVM.
>
> Another option is to just print a fat warning and/or refuse to load the
> KVM modules on affected machines when HT is enabled.
>
> So what are the opinions on how we should best mitigate this issue?

Another option, that is being explored, is to co-schedule siblings. So
ensure all siblings either run vcpus of the _same_ VM or idle.
Of course, this is all rather intrusive and ugly and brings with it
setup costs as well, because you'd have to sync up on VMENTER, VMEXIT
and interrupts (on the idle CPUs).

Another complication is that on overcommitted systems the regular load
balancer will happily migrate vcpu tasks around. So it is fairly tricky
to ensure runnable vcpu threads of the same VM are in fact around to be
run on a core.

Not to mention that Linus has basically said: "No way, Jose".

I know that I worked a little with Tim on this, and I know Google did
their own thing (but I have not seen patches from them -- is pjt on this
list?). I've also heard Amazon was working on things (are they here?).
And I think RHT was also looking into something (mingo, bonzini -- are
you guys reading?)

In any case, if any of that is to fly we need very solid numbers to
convince Linus to reconsider.

Another idea that I had was to only allow trusted guest kernels, as in
trusted computing, key verified images etc. Of course, they too can be
compromised, but hopefully it avoids the most egregious hostile guest
scenarios.
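
For reference, the co-scheduling rule mentioned earlier (every sibling
of a core either runs a vcpu of the same VM or idles) can be written
down as a small check. This is only a sketch in plain userspace C, not
kernel code and not taken from any of the patch sets mentioned; the
names core_is_coscheduled and VM_IDLE, and the convention that a VM id
of 0 means "idle", are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical convention: VM id 0 means the sibling is idle. */
#define VM_IDLE 0

/*
 * sibling_vm[i] holds the id of the VM whose vcpu currently runs on
 * hardware thread i of one core, or VM_IDLE if that thread is idle.
 *
 * Returns true if all non-idle siblings belong to the same VM.
 */
bool core_is_coscheduled(const int *sibling_vm, size_t nr_siblings)
{
	int owner = VM_IDLE;

	for (size_t i = 0; i < nr_siblings; i++) {
		if (sibling_vm[i] == VM_IDLE)
			continue;		/* idle siblings are always fine */
		if (owner == VM_IDLE)
			owner = sibling_vm[i];	/* first busy sibling claims the core */
		else if (sibling_vm[i] != owner)
			return false;		/* two different VMs share the core */
	}
	return true;
}
```

The check itself is trivial; the hard part described above is keeping
it true across VMENTER, VMEXIT, interrupts and load-balancer
migrations, none of which this sketch attempts to model.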