From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ak@linux.intel.com>
Received: from mga03.intel.com ([134.134.136.65])
	by Galois.linutronix.de with esmtps (TLS1.2:DHE_RSA_AES_256_CBC_SHA256:256)
	(Exim 4.80)
	(envelope-from <ak@linux.intel.com>)
	id 1fR4Gq-0005xK-5u
	for speck@linutronix.de; Fri, 08 Jun 2018 01:24:45 +0200
Date: Thu, 7 Jun 2018 16:24:41 -0700
From: Andi Kleen <ak@linux.intel.com>
Subject: [MODERATED] Re: Is: Tim, Q to you. Was:Re: [PATCH 1/2] L1TF KVM 1
Message-ID: <20180607232441.GH7220@tassilo.jf.intel.com>
References: <20180529194214.2600-1-pbonzini@redhat.com>
 <20180529194240.7F1336110A@crypto-ml.lab.linutronix.de>
 <alpine.DEB.2.21.1805292350200.1597@nanos.tec.linutronix.de>
 <a225b8e5-494c-3f0d-8d2a-25af9f3fafbc@citrix.com>
 <99e589e5-6385-2e3e-aac4-6a5d6955a505@redhat.com>
 <0263eeab-7c6a-20e4-324a-135b97bc1691@amazon.com>
 <20180604131133.GB7296@char.us.oracle.com>
 <dd4d323e-de25-f559-076c-d9994fa431e4@linux.intel.com>
 <55fb75e8-d57f-29b3-5255-3be0677c2452@linux.intel.com>
 <bdbfc27c-aba0-dec1-8de8-93c7e5d1c572@linux.intel.com>
MIME-Version: 1.0
In-Reply-To: <bdbfc27c-aba0-dec1-8de8-93c7e5d1c572@linux.intel.com>
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
To: speck@linutronix.de
List-ID: <speck.linutronix.de>

On Thu, Jun 07, 2018 at 12:11:21PM -0700, speck for Tim Chen wrote:
> From: Tim Chen <tim.c.chen@linux.intel.com>
> To: speck for Konrad Rzeszutek Wilk <speck@linutronix.de>
> Subject: Re: Is: Tim, Q to you. Was:Re: [PATCH 1/2] L1TF KVM 1

> On 06/05/2018 04:37 PM, Tim Chen wrote:
> > On 06/05/2018 04:34 PM, Tim Chen wrote:
> >> On 06/04/2018 06:11 AM, speck for Konrad Rzeszutek Wilk wrote:
> >>> On Mon, Jun 04, 2018 at 10:24:59AM +0200, speck for Martin Pohlack wrote:
> >>>> [resending as new message as the reply seems to have been lost on at
> >>>> least some mail paths]
> >>>>
> >>>> On 30.05.2018 11:01, speck for Paolo Bonzini wrote:
> >>>>> On 30/05/2018 01:54, speck for Andrew Cooper wrote:
> >>>>>> Other bits I don't understand are the 64k limit in the first place, why
> >>>>>> it gets walked over in 4k strides to begin with (I'm not aware of any
> >>>>>> prefetching which would benefit that...) and why a particularly
> >>>>>> obfuscated piece of magic is used for the 64byte strides.
> >>>>>
> >>>>> That is the only part I understood, :) the 4k strides ensure that the
> >>>>> source data is in the TLB.  Why that is needed is still a mystery though.
> >>>>
> >>>> I think the reasoning is that you first want to populate the TLB for the
> >>>> whole flush array, then fence, to make sure TLB walks do not interfere
> >>>> with the actual flushing later, either for performance reasons or for
> >>>> preventing leakage of partial walk results.
> >>>>
> >>>> Not sure about the 64K, it likely is about the LRU implementation for L1
> >>>> replacement not being perfect (but pseudo LRU), so you need to flush
> >>>> more than the L1 size (32K) in software.  But I have also seen smaller
> >>>> recommendations for that (52K).
> >>>
> >>
> >> Had some discussions with other Intel folks.
> >>
> >> Our recommendation is not to use the software sequence for L1 clear but
> >> use wrmsrl(MSR_IA32_FLUSH_L1D, MSR_IA32_FLUSH_L1D_VALUE).
> >> We expect that all affected systems will be receiving a ucode update
> >> to provide L1 clearing capability.
> >>
> >> Yes, the 4k stride is for getting TLB walks out of the way and
> >> the 64kB replacement is to accommodate pseudo LRU.
> > 
> > I will try to see if I can get hold of the relevant documentation
> > on pseudo LRU.
> > 
> 
> The HW folks mentioned that if we have nothing from the flush buffer in
> L1, then 32 KB would be sufficient (if we load miss for everything).
> 
> However, that's not the case. If some data from the flush buffer is
> already in L1, it could protect an unrelated line that's considered
> "near" by the LRU from getting flushed.  To make sure that does not
> happen, we go through 64 KB of data to guarantee every line in L1 will
> encounter a load miss and is flushed.
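
For reference, the access pattern described above can be sketched in
plain userspace C roughly as follows. This is only an illustration of
the two passes (4 KB strides to populate the TLB, then 64-byte strides
over 64 KB), not the sequence from the patch. The function name, the
constants and the returned load count are assumptions made for the
example, and the compiler barrier only stands in for the real fence.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define L1D_SIZE        (32u * 1024u)    /* L1D size cited in this thread */
#define FLUSH_BUF_SIZE  (2u * L1D_SIZE)  /* 64 KB, to cover pseudo LRU */
#define PAGE_STRIDE     4096u            /* one load per page */
#define LINE_STRIDE     64u              /* one load per cache line */

/*
 * Hypothetical helper: walks a FLUSH_BUF_SIZE byte buffer the way the
 * thread describes and returns the number of loads issued.
 */
static size_t l1d_flush_sw_sketch(const volatile uint8_t *buf)
{
	size_t loads = 0;
	size_t off;

	assert(buf != NULL);

	/* Pass 1: touch every page so later loads need no TLB walk. */
	for (off = 0; off < FLUSH_BUF_SIZE; off += PAGE_STRIDE) {
		(void)buf[off];
		loads++;
	}

	/*
	 * Compiler-only barrier. Real code would need an actual
	 * serializing instruction between the two passes.
	 */
	atomic_signal_fence(memory_order_seq_cst);

	/* Pass 2: load every cache line of the 64 KB buffer. */
	for (off = 0; off < FLUSH_BUF_SIZE; off += LINE_STRIDE) {
		(void)buf[off];
		loads++;
	}

	return loads;
}
```

Called on a 64 KB buffer this issues 16 page loads plus 1024 line
loads. Whether that actually evicts every L1D line depends on the
replacement policy discussed above, which the sketch cannot verify.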

Also, the recommended mitigation is really to use the MSR write instead
of the magic software sequence. Perhaps it would be best to just remove
the software sequence. Updated microcode is needed in any case, so it
doesn't make sense to try to support partially updated systems.
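
A minimal sketch of what an MSR-only flush path could look like, with
no software fallback. The MSR name and value macro are the ones quoted
above. The numeric values (MSR 0x10b, bit 0) are my understanding of
the IA32_FLUSH_CMD definition and should be checked against the SDM.
The wrmsr callback, the feature flag argument and the recording stub
are hypothetical, added only so the logic can be exercised outside the
kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, see the note above. */
#define MSR_IA32_FLUSH_L1D        0x10bu
#define MSR_IA32_FLUSH_L1D_VALUE  (1ull << 0)

/* Hypothetical stand-in for the kernel's wrmsrl(). */
typedef void (*wrmsr_fn)(uint32_t msr, uint64_t val);

/*
 * Flush L1D through the MSR if the (updated) microcode enumerates it.
 * Returns false without doing anything on a CPU that lacks the
 * feature; there is deliberately no software sequence to fall back to.
 */
static bool l1d_flush_msr_sketch(bool cpu_has_flush_l1d, wrmsr_fn wrmsr)
{
	assert(wrmsr != NULL);

	if (!cpu_has_flush_l1d)
		return false;

	wrmsr(MSR_IA32_FLUSH_L1D, MSR_IA32_FLUSH_L1D_VALUE);
	return true;
}

/* Recording stub for illustration: remembers the last MSR write. */
static uint32_t last_msr;
static uint64_t last_val;
static unsigned int msr_writes;

static void record_wrmsr(uint32_t msr, uint64_t val)
{
	last_msr = msr;
	last_val = val;
	msr_writes++;
}
```

With the stub, l1d_flush_msr_sketch(true, record_wrmsr) records one
write to MSR_IA32_FLUSH_L1D, while passing false leaves the counters
untouched and reports that no flush happened.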

-Andi