From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=79U8=MF=lists.infradead.org=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-4.5 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 154C8C07E95
	for <linux-arm-kernel@archiver.kernel.org>; Tue, 13 Jul 2021 16:14:15 +0000 (UTC)
Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPS id D743C611AC
	for <linux-arm-kernel@archiver.kernel.org>; Tue, 13 Jul 2021 16:14:14 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D743C611AC
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=lists.infradead.org; s=bombadil.20210309; h=Sender:
	Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post:
	List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References:
	Message-ID:Subject:Cc:To:From:Date:Reply-To:Content-ID:Content-Description:
	Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:
	List-Owner; bh=5o8cSv8m0W2ctv+rsWfBMyacMxR9TtBFWKb2bzrFQnM=; b=lSh73w1srx7ojY
	4WRsLbtCpvEmpRdpGSLDKfu6+vUI8FtL4vVxqVOFW9LFdNjWj7gzyTXcvEJ3esBxaRqG2PcjDU8kV
	EZVEVZEOp2mtjIZo26Plt6KNl+h/8BVZBEjrGHNek3wij2NwrShGUShOMutaQTNY+pmheqv1k1ANl
	yKD3S7iMsAq6uvPP6qDi9nKDj9iONYM7gCITIQiBmngyLAlCIFSq5OSagxM3PIVPURfJL09xTsLkH
	KDjlALypCE44GaR5vx++cYvhEa7ZUYmpc9KZdi0EsmNNEdn0ySZs1OcpNcZWPE0Ld+rtwqubj0RxC
	W0Z1Rm10QaoOMEICqwZw==;
Received: from localhost ([::1] helo=bombadil.infradead.org)
	by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux))
	id 1m3L1c-00AtsD-0v; Tue, 13 Jul 2021 16:12:48 +0000
Received: from foss.arm.com ([217.140.110.172])
 by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux))
 id 1m3L1X-00Atri-Ht
 for linux-arm-kernel@lists.infradead.org; Tue, 13 Jul 2021 16:12:45 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D9C756D;
 Tue, 13 Jul 2021 09:12:38 -0700 (PDT)
Received: from C02TD0UTHF1T.local (unknown [10.57.6.209])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 53E653F7D8;
 Tue, 13 Jul 2021 09:12:36 -0700 (PDT)
Date: Tue, 13 Jul 2021 17:12:33 +0100
From: Mark Rutland <mark.rutland@arm.com>
To: Bharat Bhushan <bbhushan2@marvell.com>
Cc: "catalin.marinas@arm.com" <catalin.marinas@arm.com>,
 "will@kernel.org" <will@kernel.org>,
 "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
 "maz@kernel.org" <maz@kernel.org>,
 "konrad.dybcio@somainline.org" <konrad.dybcio@somainline.org>,
 "saiprakash.ranjan@codeaurora.org" <saiprakash.ranjan@codeaurora.org>,
 "robh@kernel.org" <robh@kernel.org>, "marcan@marcan.st" <marcan@marcan.st>,
 "suzuki.poulose@arm.com" <suzuki.poulose@arm.com>,
 "broonie@kernel.org" <broonie@kernel.org>,
 "linux-arm-kernel@lists.infradead.org" <linux-arm-kernel@lists.infradead.org>, 
 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
 Linu Cherian <lcherian@marvell.com>,
 Sunil Kovvuri Goutham <sgoutham@marvell.com>
Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround
Message-ID: <20210713161233.GB13027@C02TD0UTHF1T.local>
References: <20210705060843.3150-1-bbhushan2@marvell.com>
 <20210705090753.GD38629@C02TD0UTHF1T.local>
 <CO6PR18MB4465687A22FE724E59D4225CE3199@CO6PR18MB4465.namprd18.prod.outlook.com>
 <20210708114157.GC24650@C02TD0UTHF1T.local>
 <CO6PR18MB4465C85A872D7CE4138EBF8FE3149@CO6PR18MB4465.namprd18.prod.outlook.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <CO6PR18MB4465C85A872D7CE4138EBF8FE3149@CO6PR18MB4465.namprd18.prod.outlook.com>
X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 
X-CRM114-CacheID: sfid-20210713_091243_744471_69028D92 
X-CRM114-Status: GOOD (  71.77  )
X-BeenThere: linux-arm-kernel@lists.infradead.org
X-Mailman-Version: 2.1.34
Precedence: list
List-Id: <linux-arm-kernel.lists.infradead.org>
List-Unsubscribe: <http://lists.infradead.org/mailman/options/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=unsubscribe>
List-Archive: <http://lists.infradead.org/pipermail/linux-arm-kernel/>
List-Post: <mailto:linux-arm-kernel@lists.infradead.org>
List-Help: <mailto:linux-arm-kernel-request@lists.infradead.org?subject=help>
List-Subscribe: <http://lists.infradead.org/mailman/listinfo/linux-arm-kernel>, 
 <mailto:linux-arm-kernel-request@lists.infradead.org?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: "linux-arm-kernel" <linux-arm-kernel-bounces@lists.infradead.org>
Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org

On Tue, Jul 13, 2021 at 02:40:22AM +0000, Bharat Bhushan wrote:
> Hi Mark,
> 
> > -----Original Message-----
> > From: Mark Rutland <mark.rutland@arm.com>
> > Sent: Thursday, July 8, 2021 5:12 PM
> > To: Bharat Bhushan <bbhushan2@marvell.com>
> > Cc: catalin.marinas@arm.com; will@kernel.org; daniel.lezcano@linaro.org;
> > maz@kernel.org; konrad.dybcio@somainline.org;
> > saiprakash.ranjan@codeaurora.org; robh@kernel.org; marcan@marcan.st;
> > suzuki.poulose@arm.com; broonie@kernel.org; linux-arm-
> > kernel@lists.infradead.org; linux-kernel@vger.kernel.org; Linu Cherian
> > <lcherian@marvell.com>
> > Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
> > workaround
> > 
> > On Thu, Jul 08, 2021 at 10:47:42AM +0000, Bharat Bhushan wrote:
> > > Hi Mark,
> > >
> > > Sorry for the delay, was gathering some details.
> > > Pease see inline
> > >
> > > > -----Original Message-----
> > > > From: Mark Rutland <mark.rutland@arm.com>
> > > > Sent: Monday, July 5, 2021 2:38 PM
> > > > To: Bharat Bhushan <bbhushan2@marvell.com>
> > > > Cc: catalin.marinas@arm.com; will@kernel.org;
> > > > daniel.lezcano@linaro.org; maz@kernel.org;
> > > > konrad.dybcio@somainline.org; saiprakash.ranjan@codeaurora.org;
> > > > robh@kernel.org; marcan@marcan.st; suzuki.poulose@arm.com;
> > > > broonie@kernel.org; linux-arm- kernel@lists.infradead.org;
> > > > linux-kernel@vger.kernel.org; Linu Cherian <lcherian@marvell.com>
> > > > Subject: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
> > > > workaround
> > > >
> > > > External Email
> > > >
> > > > --------------------------------------------------------------------
> > > > --
> > > > Hi Bharat,
> > > >
> > > > On Mon, Jul 05, 2021 at 11:38:43AM +0530, Bharat Bhushan wrote:
> > > > > CPU pipeline have unpredicted behavior when timer interrupt
> > > > > appears and then disappears prior to the exception happening. Time
> > > > > interrupt appears on timer expiry and disappears when timer
> > > > > programming or timer disable. This typically can happen when a
> > > > > load instruction misses in the cache,  which can take few hundreds
> > > > > of cycles, and an interrupt appears after the load instruction
> > > > > starts executing but disappears before the load instruction completes.
> > > >
> > > > Could you elaborate on the scenario? What sort of unpredictable
> > > > behaviour can occur? e.g:
> > >
> > > This is a race condition where an instruction (except store, system,
> > > load atomic and load exclusive) becomes "nop" if interrupt appears and
> > > disappears before taken by CPU. For example interrupt appears after
> > > the atomic load instruction starts executing and disappears before the
> > > atomic load instruction completes, in that case instruction (not all)
> > > can become "nop". As interrupt disappears before atomic instruction
> > > completes, cpu continues to execute and while take junk from register
> > > as other dependent got "nop".
> > 
> > Thanks for this; I have a number of further questions below.
> > 
> > You said this doesn't apply to:
> > 
> > * store
> > * system
> > * load atomic
> > * load exclusive
> > 
> > ... but your example explains this happening for an atomic load, which was in
> > that list. Was the example bad, or was the list wrong?
> 
> The load atomic completes successfully. It doesn't become a nop. A
> loads atomic is significant just because it's an instruction which has
> a long time between executing an retiring. This provides a window of
> vulnerability when an interrupt asserts and then deasserts. This
> stimulates the bug and causes other instructions executing in
> parallel,  which can get nop.

Thanks for clarifiying; this was not clear from your initial
description.

> > It's not entirely clear to me which instructions this covers. e.g. is "system" the
> > entire system instruction class (i.e. all opcodes
> > 0b110101010_0xxxxxxx_xxxxxxxx_xxxxxxxx), or did you mean something more
> > specific? Does "store" include store-exlcusive?
> > 
> > Other than that list, can this occur for *any* instruction? e.g. MOV, SHA256*,
> > *DIV?
> 
> There are two general classes of instructions. Those that only change
> a gpr or PC. These are arithmetic, floating point, branch. Loads with
> no side effects also fall into this category. These are the
> instructions that can erroneously be nop'd. The other category are
> instructions that can change architectural state more than a GPR.
> These include all stores, atomic loads, exclusive loads, loads to
> non-cacheable space,msr,mrs,eret,tlb*,sys,brk,etc, these does not get
> "noped"
> 
> > 
> > Does this only apply to a single instruction at a time, or can multiple instructions
> > "become nop"?
> 
> Can be multiple,
> 
> > 
> > When an instruction "becomes nop", will subsequent instructions see a
> > consistent architectural state (e.g. GPRs as they were exactly before the
> > instruction which "becomes nop"), or can they see something else (e.g. garbage
> > forwarded from register renaming or other internal microarchitectural state)?
> 
> > > > * Does the CPU lockup?
> > > No
> > >
> > > > * Does the CPU take the exception at all?
> > > No
> > >
> > > > * Does the load behave erroneously?
> > > No,
> > >
> > > > * Does any CPU state (e.g. GPRs, PC, PSTATE) become corrupted?
> > >
> > > yes, GPRs will get corrupted, will have stale value
> > 
> > As above, is that the prior architectural value of the GPRs, or can that be some
> > bogus microarchitectural state (e.g. from renaming or other forwarding paths)?
> 
> The instructions that become a nop doesn't write the GPR and because
> this is an OOO machine the GPR result isn't the prior architectural
> value but whatever crap is leftover in the physical register.

Ok, so that's a potential information leak from a different context
(e.g. higher EL), depending on what happened to be left in that physical
register.

Consider a malicious guest at EL1. What prevents it from triggering this
deliberately, then inspecting the GPRs after taking the IRQ in order to
read prior secrets?

> > > > Does the problem manifest when IRQs are masked by DAIF.I, or by
> > > > CNT*_CTL_EL0.{IMASK,ENABLE} ?
> > >
> > > No, there are no issue if interrupts are masked.
> > 
> > If a write to CNTV_CTL_EL0.IMASK races with the interrupt being asserted, can
> > that trigger the problem?
> 
> If interrupt is enabled (DAIF) - then it will be taken, and no issue
> But if interrupts are disabled then following sequence can see the race
>  1)	interrupt is disabled (DAIF) 
> 2)	TVAL/ENABLE/IMASK at timer h/w programming to de-assert interrupt.
> 	Race of Irq asserted before irq de-asserted, than this short window of assertion will be considered as spike from timer h/w block
> 3)	Enable DAIF 
> 	Because of propagation delay CPU sees assertion and de-assertion (spike), errata hit
> 
> Will add "isb" around interrupt enablement in next version of patch.

... why?

That doesn't seem to follow from the abvoe, so I think I'm missing a
step.

> > If a write to DAIF.I races with the interrupt being asserted, can that trigger the
> > problem?
> 
> No race with writing to DAIF.I with interrupt assertion,
> Writing DAIF.I = 0 (enablement of interrupt) can race with de-assertion, which can lead to hitting errata

Ok.

That *might* make it possible to bodge around the timer specifically,
but as below I don't think we can ensure this is safe in the presence of
virtualization, nor when considering other interrupts.

> > From your description so far, this doesn't sound like it is specific to the timer
> > interrupt. Is it possible for a different interrupt to trigger this, e.g:
> > 
> > * Can the same happen with another PPI, e.g. the PMU interrupt if that
> >   gets de-asserted, or there's a race with DAIF.I?
> > 
> > * Can the same happen with an SGI, e.g. if one CPU asserts then
> >   de-asserts an SGI targetting another CPU, or there's a race with
> >   DAIF.I?
> > 
> > * Can the same happen with an SPI, e.g. if a device asserts then
> >   de-asserts its IRQ line, or there's a race with DAIF.I?
> 
> No issue with edge triggered, but this can happen with any level sensitive interrupt.

Ok.

So that'll include at least the PMU and 

> > If not, *why* does this happen specifically for the timer interrupt?
> > 
> > > > > Workaround of this is to ensure maximum 2us of time gap between
> > > > > timer interrupt and timer programming which can de-assert timer interrupt.
> > > >
> > > > The code below seems to try to enforce a 2us *minimum*. Which is it
> > > > supposed to be?
> > >
> > > Yes, it is minimum 2us.
> > >
> > > >
> > > > Can you explain *why* this is supposed to help?
> > > With the workaround interrupt assertion and de-assertion will be minimum 2us
> > apart.
> > 
> > I understood that, but why is that deemed to be sufficient? e.g. is it somehow
> > guaranteed that the CPU will complete the instruction that would "become nop"
> > in that time?
> 
> With this delay we avoid spike, either this this will becomes an
> actual interrupt or the spike never visible to core.
> 
> > > > I don't see how we can guarantee this in a VM, or if the CPU misses
> > > > on an instruction fetch.
> > >
> > > This errata applies to VM (virtual timer) as well, maybe there is some
> > > gap in my understanding, how it will be different in VM.
> > > Can you help with what issue we can have VM?
> > 
> > A VCPU can be pre-empted by the host at *any* time, for an arbitrary length of
> > time. So e.g. you can have a scenario such as:
> > 
> > 1. Guest reads CNTx_TVAL, sees interrupt is 4us in the future and
> >    decides it does not need to wait
> > 2. Host preempts guest
> > 3. Host does some processing for ~3.9us
> > 4. Host returns to guest, with 0.1us left until the interrupt triggers 5. Guest
> > reprograms CNTx_TVAL, and triggers the erratum
> 
> Yes, when timer expire just before tval written (race condition) , so
> there is assertion-followed by de-assertion, As interrupts are enabled
> in host, interrupt will be visible as spike to host.

Ok, so that's a recipe for the guest to attack the host.

> We will apply workaround whenever entering to guest (add a delay
> before exiting to guest in case guest timer is going to expire).

I think this is papering over the problem.

You've said this can happen for *any* level-triggered interrupt. AFAIK,
nothing prevents a malicious guest from deliberately asserting and
de-asserting a level-triggered interrupt (e.g. by writing to the GIC
distributor), and I also note that the GIC maintenance interrupt is
level-triggered.

So, as above:

1) A guest can deliberately cause information to be leaked to itself via
   the corrupted GPRs. I haven't seen any rationale for why that is not
   a problem, nor have I seen a suggested workaround.

2) A guest *may* be able to trigger this while the host is running. I
   haven't seen anything that rules this out so far.

3) Even in the absence of virtualization, it would be necessary to
   workaround this for *every* level-triggered interrupt, which includes
   at the timer, PMU, and GIC maintenance interrupts, in addition to any
   other configurable PPIs or SPIs.

Without a fix that covers all of those, I don't think the workaround is
viable.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel