From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: from merlin.infradead.org ([2001:8b0:10b:1231::1]) by
 Galois.linutronix.de with esmtps (TLS1.2:RSA_AES_256_CBC_SHA256:256)
 (Exim 4.80) (envelope-from ) id 1fLmoM-00023d-Mm for
 speck@linutronix.de; Thu, 24 May 2018 11:45:31 +0200
Received: from j217100.upc-j.chello.nl ([24.132.217.100]
 helo=hirez.programming.kicks-ass.net) by merlin.infradead.org with
 esmtpsa (Exim 4.90_1 #2 (Red Hat Linux)) id 1fLmoL-00060N-65 for
 speck@linutronix.de; Thu, 24 May 2018 09:45:29 +0000
Date: Thu, 24 May 2018 11:45:26 +0200
From: Peter Zijlstra 
Subject: [MODERATED] Re: L1D-Fault KVM mitigation
Message-ID: <20180524094526.GE12198@hirez.programming.kicks-ass.net>
References: <20180424090630.wlghmrpasn7v7wbn@suse.de>
 <20180424093537.GC4064@hirez.programming.kicks-ass.net>
 <1524563292.8691.38.camel@infradead.org>
 <20180424110445.GU4043@hirez.programming.kicks-ass.net>
 <1527068745.8186.89.camel@infradead.org>
MIME-Version: 1.0
In-Reply-To: <1527068745.8186.89.camel@infradead.org>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID: 

On Wed, May 23, 2018 at 10:45:45AM +0100, speck for David Woodhouse wrote:
> That's OK because it's only the VMX tasks which can abuse it, isn't it?

If, like you outline below, this is an (optional) ucode assist for
co-scheduling matching VCPU threads, then yes.

> Let's assume we've fixed the problem for normal tasks, by flipping the
> top bit in absent PTEs that actually contain swap pointers, etc.
>
> The only thing we have left is VM guests. The microcode bit would say
> that *if* a CPU thread is in non-root mode then *it* gets paused unless
> its sibling is also in non-root mode for the same VMID.
>
> So when both siblings are actually in the VM, they get to run. If one
> sibling comes *out* of the VM to the host kernel or to run (host)
> userspace, then the other one doesn't execute any guest instructions.
> It can take exceptions which cause a vmexit though.

Would it make sense to time-limit being 'stuck', much like PLE?

> We'd also want a vCPU to be able to run if its sibling is actually in
> the host but *idle* (and has flushed the L1. Perhaps we actually
> automatically flush the L1 when resuming a sibling that got paused).

Right, idle is a wildcard which matches any VCPU. We don't care about
the cache state of the sibling though; L1 is shared, and since VMENTER
must flush L1, that is sufficient.

> It does still depend on gang scheduling (or at least forced sibling
> idle which is a subset of that), or a singleton vCPU might *never* get
> run. But we were going to have to do something along those lines
> anyway.

Linus has opinions on that.. but yes, without that, all that remains is
disabling HT afaict.

> The microcode trick just makes it a lot easier because we don't
> have to *explicitly* pause the sibling vCPUs and manage their state on
> every vmexit/entry. And avoids potential race conditions with managing
> that in software.

Yes, it would certainly help and avoid a fair bit of ugly. It would, for
instance, avoid having to modify irq_enter() / irq_exit(), which would
otherwise be required (and which could otherwise leak all data touched
before that point is reached).

But even with all that, adding an L1 flush to every VMENTER will hurt a
lot. Consider for example the PIO emulation used when booting a guest
from a disk image; that causes VMEXIT/VMENTER at stupendous rates.

Also, none of this readily addresses the problem of load balancing
shredding the VCPU localities required for this.
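For illustration, the pairing rule discussed above (a hyperthread in VMX
non-root mode may only execute if its sibling is idle or in non-root mode
for the same VMID, with idle acting as a wildcard) could be sketched as
below. This is a minimal model of the proposed ucode behaviour, not real
code; the names (sibling_mode, thread_state, may_run_guest) are all made
up for the example:

```c
#include <stdbool.h>

/*
 * Sketch of the proposed ucode-assisted pairing rule: all names are
 * illustrative, not an actual kernel or hardware interface.
 */
enum sibling_mode { MODE_HOST, MODE_GUEST, MODE_IDLE };

struct thread_state {
	enum sibling_mode mode;
	unsigned int vmid;	/* only meaningful in MODE_GUEST */
};

/* May @self execute guest instructions, given @sibling's state? */
static bool may_run_guest(const struct thread_state *self,
			  const struct thread_state *sibling)
{
	/* the rule only constrains threads in non-root (guest) mode */
	if (self->mode != MODE_GUEST)
		return false;

	/* idle is a wildcard: it matches any VCPU */
	if (sibling->mode == MODE_IDLE)
		return true;

	/* otherwise the sibling must be in the same guest */
	return sibling->mode == MODE_GUEST && sibling->vmid == self->vmid;
}
```

Note that with this rule a singleton VCPU whose sibling runs host tasks
never gets to execute, which is exactly why the gang scheduling (or
forced sibling idle) discussed above is still needed.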