From: Tim Chen
To: speck for Thomas Gleixner <speck@linutronix.de>
Date: Thu, 24 May 2018 16:18:08 -0700
Subject: Re: L1D-Fault KVM mitigation

On 05/24/2018 08:33 AM, speck for Thomas Gleixner wrote:
> On Thu, 24 May 2018, speck for Thomas Gleixner wrote:
>> On Thu, 24 May 2018, speck for Peter Zijlstra wrote:
>>> On Wed, May 23, 2018 at 10:45:45AM +0100, speck for David Woodhouse wrote:
>>>> The microcode trick just makes it a lot easier because we don't
>>>> have to *explicitly* pause the sibling vCPUs and manage their state on
>>>> every vmexit/entry. And avoids potential race conditions with managing
>>>> that in software.
>>>
>>> Yes, it would certainly help and avoid a fair bit of ugly. It would, for
>>> instance, avoid having to modify irq_enter() / irq_exit(), which would
>>> otherwise be required (and possibly leak all data touched up until that
>>> point is reached).
>>>
>>> But even with all that, adding L1-flush to every VMENTER will hurt lots.
>>> Consider for example the PIO emulation used when booting a guest from a
>>> disk image. That causes VMEXIT/VMENTER at stupendous rates.
>>
>> Just did a test on SKL Client where I have ucode. It does not have HT so
>> it's not suffering from any HT side effects when L1D is flushed.
>>
>> Boot time from a disk image is ~1s measured from the first vcpu enter.
>>
>> With L1D Flush on vmenter the boot time is about 5-10% slower. And that has
>> lots of PIO operations in the early boot.
>>
>> For a kernel build the L1D Flush has an overhead of < 1%.
>>
>> Netperf guest to host has a slight drop of the throughput in the 2%
>> range. Host to guest surprisingly goes up by ~3%. Fun stuff!
>>
>> Now I isolated two host CPUs and pinned the two vCPUs on them to be able to
>> measure the overhead. Running cyclictest with a period of 25us in the guest
>> on an isolated guest CPU and monitoring the behaviour with perf on the host
>> for the corresponding host CPU gives
>>
>>          No Flush                        Flush
>>
>>     1.31 insn per cycle             1.14 insn per cycle
>>
>>     2e6 L1-dcache-load-misses/sec   26e6 L1-dcache-load-misses/sec
>>
>> In that simple test the L1D misses go up by a factor of 13.
>>
>> Now with the whole gang scheduling the numbers I heard through the
>> grapevine are in the range of factor 130, i.e. 13k% for a simple boot from
>> disk image. 13 minutes instead of 6 seconds...
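For reference, the flush being measured above is a single MSR write on the
VM-entry path. A minimal sketch, assuming the updated microcode exposes
MSR_IA32_FLUSH_CMD (0x10b) with the L1D_FLUSH command bit; the software
fallback for CPUs without the MSR (reading a 64K buffer to displace L1D)
is not shown:

/*
 * Sketch of the per-VMENTER L1D flush discussed above, assuming the
 * updated microcode exposes MSR_IA32_FLUSH_CMD (0x10b) with the
 * L1D_FLUSH command bit.
 */
#include <asm/msr.h>

#ifndef MSR_IA32_FLUSH_CMD
#define MSR_IA32_FLUSH_CMD	0x0000010b
#define L1D_FLUSH		(1ULL << 0)
#endif

static inline void l1d_flush_before_vmenter(void)
{
	/* One WRMSR per VM entry; this is the cost in the numbers above. */
	wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
}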
The performance is highly dependent on how often we VM exit. Working with
Peter Z on his prototype, the performance ranges from no regression for a
network loopback, through ~20% regression for a kernel compile, to ~100%
regression on file I/O. PIO brings out the worst aspects of the
synchronization overhead, as we VM exit on every dword PIO read in; the
kernel and initrd images were about 50 MB for the experiment, which led to
13 min of load time.

We may need to do the co-scheduling only when the VM exit rate is low, and
turn off SMT when the VM exit rate becomes too high. A rough sketch of such
a heuristic follows at the end of this mail.

(Note: I haven't added the L1 flush on VM entry to my experiment; that is
on the todo list.)

Tim

>>
>> That's not surprising at all, though the magnitude is way higher than I
>> expected. I don't see a realistic chance for vmexit heavy workloads to work
>> with that synchronization thing at all, whether it's ucode assisted or not.
>
> That said, I think we should stage the host side mitigations plus the L1
> flush on vmenter ASAP so we are not standing there with our pants down when
> the cat comes out of the bag early. That means HT off, but it's still
> better than having absolutely nothing.
>
> The gang scheduling nonsense can be added on top if it should
> surprisingly turn out to be usable at all.
>
> Thanks,
>
> tglx
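A rough sketch of the exit-rate heuristic mentioned above. The window size,
thresholds and all names here are made up for illustration, and acting on
the decision (parking the sibling, re-enabling SMT) is left out:

/*
 * Keep SMT and co-scheduling while the guest's VM-exit rate stays
 * low; fall back to SMT-off (or per-entry L1D flush) when it spikes.
 * Hysteresis between the two thresholds avoids flapping on the
 * boundary.
 */
#include <linux/types.h>
#include <linux/time64.h>

#define EXIT_WINDOW_NS	(100 * NSEC_PER_MSEC)
#define EXITS_HIGH	10000	/* per window: PIO-style exit storm */
#define EXITS_LOW	1000	/* per window: mostly compute-bound */

struct exit_rate {
	u64	window_start;	/* ns timestamp of window begin */
	u64	count;		/* exits seen in current window */
	bool	smt_off;	/* current policy decision */
};

static void account_vmexit(struct exit_rate *r, u64 now)
{
	r->count++;

	if (now - r->window_start < EXIT_WINDOW_NS)
		return;

	/* Window full: pick the policy for the next window. */
	if (!r->smt_off && r->count > EXITS_HIGH)
		r->smt_off = true;
	else if (r->smt_off && r->count < EXITS_LOW)
		r->smt_off = false;

	r->count = 0;
	r->window_start = now;
}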