From mboxrd@z Thu Jan  1 00:00:00 1970
Return-path: =?utf-8?q?=3CBATV+662c0d45ef9886d85002+5386+infradead=2Eorg+d?=
 =?utf-8?q?wmw2=40twosheds=2Esrs=2Einfradead=2Eorg=3E?=
Received: from twosheds.infradead.org ([2001:8b0:10b:1:21d:7dff:fe04:dbe2])
 by Galois.linutronix.de with esmtps (TLS1.2:RSA_AES_256_CBC_SHA256:256)
 (Exim 4.80)
 (envelope-from =?utf-8?q?=3CBATV+662c0d45ef9886d85002+5386+infradea?=
 =?utf-8?q?d=2Eorg+dwmw2=40twosheds=2Esrs=2Einfradead=2Eorg=3E=29?=
 id 1fLQL6-0000Nz-Sn
 for speck@linutronix.de; Wed, 23 May 2018 11:45:50 +0200
Received: from [2001:8b0:10b:1::b8f] by twosheds.infradead.org with esmtpsa
 (Exim 4.90_1 #2 (Red Hat Linux)) id 1fLQL4-0002rE-9R
 for speck@linutronix.de; Wed, 23 May 2018 09:45:46 +0000
Message-ID: <1527068745.8186.89.camel@infradead.org>
Subject: [MODERATED] Re: L1D-Fault KVM mitigation
From: David Woodhouse
In-Reply-To: <20180424110445.GU4043@hirez.programming.kicks-ass.net>
References: <20180424090630.wlghmrpasn7v7wbn@suse.de>
 <20180424093537.GC4064@hirez.programming.kicks-ass.net>
 <1524563292.8691.38.camel@infradead.org>
 <20180424110445.GU4043@hirez.programming.kicks-ass.net>
Date: Wed, 23 May 2018 10:45:45 +0100
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID:

On Tue, 2018-04-24 at 13:04 +0200, speck for Peter Zijlstra wrote:
> On Tue, Apr 24, 2018 at 10:48:12AM +0100, speck for David Woodhouse wrote:
> >
> > On Tue, 2018-04-24 at 11:35 +0200, speck for Peter Zijlstra wrote:
> > >
> > >
> > > Another option, that is being explored, is to co-schedule siblings.
> > > So ensure all siblings either run vcpus of the _same_ VM or idle.
> > >
> > > Of course, this is all rather intrusive and ugly and brings with it
> > > setup costs as well, because you'd have to sync up on VMENTER, VMEXIT
> > > and interrupts (on the idle CPUs).
> >
> > I hate to suggest more microcode hacks but... if there was an MSR bit
> > which, when set, would pause any HT sibling that was currently in VMX
> > non-root mode, then we could set that up to be automatically set on
> > vmexit and it would automatically pause the problematic siblings.
> > Meaning that co-ordinating vmexits with them might actually be
> > feasible?
> Not sure I'm following. The above assumes a sibling is running a VCPU of
> another VM, right? But it could equally well run any regular old task
> (including idle).
>
> So only pausing siblings in VMX mode wouldn't help anything. The !VMX
> tasks could still be loading stuff into L1.

That's OK because it's only the VMX tasks which can abuse it, isn't it?
Let's assume we've fixed the problem for normal tasks, by flipping the
top bit in absent PTEs that actually contain swap pointers, etc. The
only thing we have left is VM guests.

The microcode bit would say that *if* a CPU thread is in non-root mode
then *it* gets paused unless its sibling is also in non-root mode for
the same VMID.

So when both siblings are actually in the VM, they get to run. If one
sibling comes *out* of the VM to the host kernel or to run (host)
userspace, then the other one doesn't execute any guest instructions.
It can take exceptions which cause a vmexit though.

We'd also want a vCPU to be able to run if its sibling is actually in
the host but *idle* (and has flushed the L1. Perhaps we actually
automatically flush the L1 when resuming a sibling that got paused).

It does still depend on gang scheduling (or at least forced sibling
idle which is a subset of that), or a singleton vCPU might *never* get
run. But we were going to have to do something along those lines
anyway.

The microcode trick just makes it a lot easier because we don't have to
*explicitly* pause the sibling vCPUs and manage their state on every
vmexit/entry. And avoids potential race conditions with managing that
in software.
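
To make the proposed pause rule concrete, here is a minimal model of it in C. It is only a sketch of the rule as worded in this mail, not real microcode and not any existing MSR or KVM interface. The names `ht_mode`, `ht_thread`, `ht_may_execute` and `ht_self_check` are all made up for illustration, and the "host but idle" state simply assumes that the L1 has already been flushed. The model also ignores the point that a paused thread can still take exceptions which cause a vmexit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states of one hyperthread; the names are illustrative only. */
enum ht_mode {
	HT_HOST_BUSY,	/* VMX root mode, running host kernel or userspace */
	HT_HOST_IDLE,	/* VMX root mode, idle; L1 assumed already flushed */
	HT_GUEST,	/* VMX non-root mode */
};

struct ht_thread {
	enum ht_mode mode;
	unsigned int vmid;	/* only meaningful when mode == HT_GUEST */
};

/*
 * Returns true if 'self' may execute instructions under the proposed rule.
 * Only a thread in non-root mode is ever paused; host threads always run.
 */
bool ht_may_execute(const struct ht_thread *self,
		    const struct ht_thread *sibling)
{
	if (self->mode != HT_GUEST)
		return true;

	switch (sibling->mode) {
	case HT_GUEST:
		/* Both in the guest: run only if it is the same VM. */
		return sibling->vmid == self->vmid;
	case HT_HOST_IDLE:
		/* Sibling idle in the host with a flushed L1: safe to run. */
		return true;
	default:
		/* Sibling busy in the host: no guest instructions. */
		return false;
	}
}

/* A few of the cases from the text, written as assertions. */
void ht_self_check(void)
{
	struct ht_thread vm1_a = { HT_GUEST, 1 };
	struct ht_thread vm1_b = { HT_GUEST, 1 };
	struct ht_thread vm2   = { HT_GUEST, 2 };
	struct ht_thread host  = { HT_HOST_BUSY, 0 };
	struct ht_thread idle  = { HT_HOST_IDLE, 0 };

	assert(ht_may_execute(&vm1_a, &vm1_b));	/* same VM on both siblings */
	assert(!ht_may_execute(&vm1_a, &vm2));	/* different VMs */
	assert(!ht_may_execute(&vm1_a, &host));	/* sibling left the VM */
	assert(ht_may_execute(&vm1_a, &idle));	/* sibling idle in host */
	assert(ht_may_execute(&host, &vm1_a));	/* host is never paused */
}
```

The asymmetry in the last two assertions is the reason gang scheduling (or forced sibling idle) is still needed: a guest thread whose sibling stays busy in the host would never get to run under this rule.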