From: Tim Chen
To: speck@linutronix.de
Date: Fri, 25 May 2018 11:22:37 -0700
Subject: Re: L1D-Fault KVM mitigation

On 05/24/2018 04:18 PM, speck for Tim Chen wrote:
> On 05/24/2018 08:33 AM, speck for Thomas Gleixner wrote:
>> On Thu, 24 May 2018, speck for Thomas Gleixner wrote:
>>> On Thu, 24 May 2018, speck for Peter Zijlstra wrote:
>>>> On Wed, May 23, 2018 at 10:45:45AM +0100, speck for David Woodhouse wrote:
>>>>> The microcode trick just makes it a lot easier because we don't
>>>>> have to *explicitly* pause the sibling vCPUs and manage their state on
>>>>> every vmexit/entry. And avoids potential race conditions with managing
>>>>> that in software.
>>>>
>>>> Yes, it would certainly help and avoid a fair bit of ugly.
>>>> It would, for
>>>> instance, avoid having to modify irq_enter() / irq_exit(), which would
>>>> otherwise be required (and possibly leak all data touched up until that
>>>> point is reached).
>>>>
>>>> But even with all that, adding L1-flush to every VMENTER will hurt lots.
>>>> Consider for example the PIO emulation used when booting a guest from a
>>>> disk image. That causes VMEXIT/VMENTER at stupendous rates.
>>>
>>> Just did a test on SKL Client where I have ucode. It does not have HT, so
>>> it's not suffering from any HT side effects when L1D is flushed.
>>>
>>> Boot time from a disk image is ~1s measured from the first vcpu enter.
>>>
>>> With L1D Flush on vmenter the boot time is about 5-10% slower. And that has
>>> lots of PIO operations in the early boot.
>>>
>>> For a kernel build the L1D Flush has an overhead of < 1%.
>>>
>>> Netperf guest to host has a slight drop of the throughput in the 2%
>>> range. Host to guest surprisingly goes up by ~3%. Fun stuff!
>>>
>>> Now I isolated two host CPUs and pinned the two vCPUs on them to be able to
>>> measure the overhead. Running cyclictest with a period of 25us in the guest
>>> on an isolated guest CPU and monitoring the behaviour with perf on the host
>>> for the corresponding host CPU gives:
>>>
>>>          No Flush                         Flush
>>>
>>>     1.31 insn per cycle              1.14 insn per cycle
>>>
>>>     2e6 L1-dcache-load-misses/sec    26e6 L1-dcache-load-misses/sec
>>>
>>> In that simple test the L1D misses go up by a factor of 13.
>>>
>>> Now with the whole gang scheduling the numbers I heard through the
>>> grapevine are in the range of factor 130, i.e. 13k% for a simple boot from
>>> disk image. 13 minutes instead of 6 seconds...
>
> The performance is highly dependent on how often we VM exit.
> Working with Peter Z on his prototype, the performance ranges from
> no regression for a network loopback, through ~20% regression for a kernel
> compile, to ~100% regression on file IO.
> PIO brings out the worst aspect of the synchronization overhead, as we
> VM exit on every dword PIO read. The kernel and initrd image was about
> 50 MB for the experiment, which led to 13 min of load time.
>
> We may need to do the co-scheduling only when the VM exit rate is low, and
> turn off SMT when the VM exit rate becomes too high.
>
> (Note: I haven't added in the L1 flush on VM entry for my experiment; that
> is on the todo list.)

As a post note, I added in the L1 flush and the performance numbers
pretty much stay the same. So the synchronization overhead is dominant
and the L1 flush overhead is secondary.

Tim