Date: Tue, 13 Jul 2021 17:12:33 +0100
From: Mark Rutland
To: Bharat Bhushan
Cc: catalin.marinas@arm.com, will@kernel.org, daniel.lezcano@linaro.org,
    maz@kernel.org, konrad.dybcio@somainline.org,
    saiprakash.ranjan@codeaurora.org, robh@kernel.org, marcan@marcan.st,
    suzuki.poulose@arm.com, broonie@kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
    Linu Cherian, Sunil Kovvuri Goutham
Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627 workaround
Message-ID: <20210713161233.GB13027@C02TD0UTHF1T.local>
References: <20210705060843.3150-1-bbhushan2@marvell.com>
    <20210705090753.GD38629@C02TD0UTHF1T.local>
    <20210708114157.GC24650@C02TD0UTHF1T.local>

On Tue, Jul 13, 2021 at 02:40:22AM +0000, Bharat Bhushan wrote:
> Hi Mark,
>
> > -----Original Message-----
> > From: Mark Rutland
> > Sent: Thursday, July 8, 2021 5:12 PM
> > To: Bharat Bhushan
> > Cc: catalin.marinas@arm.com; will@kernel.org; daniel.lezcano@linaro.org;
> > maz@kernel.org; konrad.dybcio@somainline.org;
> > saiprakash.ranjan@codeaurora.org; robh@kernel.org; marcan@marcan.st;
> > suzuki.poulose@arm.com; broonie@kernel.org;
> > linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> > Linu Cherian
> > Subject: Re: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
> > workaround
> >
> > On Thu, Jul 08, 2021 at 10:47:42AM +0000, Bharat Bhushan wrote:
> > > Hi Mark,
> > >
> > > Sorry for the delay, was gathering some details.
> > > Please see inline
> > >
> > > > -----Original Message-----
> > > > From: Mark Rutland
> > > > Sent: Monday, July 5, 2021 2:38 PM
> > > > To: Bharat Bhushan
> > > > Cc: catalin.marinas@arm.com; will@kernel.org;
> > > > daniel.lezcano@linaro.org; maz@kernel.org;
> > > > konrad.dybcio@somainline.org; saiprakash.ranjan@codeaurora.org;
> > > > robh@kernel.org; marcan@marcan.st; suzuki.poulose@arm.com;
> > > > broonie@kernel.org; linux-arm-kernel@lists.infradead.org;
> > > > linux-kernel@vger.kernel.org; Linu Cherian
> > > > Subject: [EXT] Re: [PATCH] clocksource: Add Marvell Errata-38627
> > > > workaround
> > > >
> > > > External Email
> > > >
> > > > ----------------------------------------------------------------------
> > > > Hi Bharat,
> > > >
> > > > On Mon, Jul 05, 2021 at 11:38:43AM +0530, Bharat Bhushan wrote:
> > > > > CPU pipeline have unpredicted behavior when timer interrupt
> > > > > appears and then disappears prior to the exception happening. Timer
> > > > > interrupt appears on timer expiry and disappears when timer
> > > > > programming or timer disable.
> > > > > This typically can happen when a
> > > > > load instruction misses in the cache, which can take a few hundred
> > > > > cycles, and an interrupt appears after the load instruction
> > > > > starts executing but disappears before the load instruction completes.
> > > >
> > > > Could you elaborate on the scenario? What sort of unpredictable
> > > > behaviour can occur? e.g:
> > >
> > > This is a race condition where an instruction (except store, system,
> > > load atomic and load exclusive) becomes "nop" if interrupt appears and
> > > disappears before taken by CPU. For example interrupt appears after
> > > the atomic load instruction starts executing and disappears before the
> > > atomic load instruction completes, in that case instruction (not all)
> > > can become "nop". As interrupt disappears before atomic instruction
> > > completes, cpu continues to execute and will take junk from register
> > > as other dependent got "nop".
> >
> > Thanks for this; I have a number of further questions below.
> >
> > You said this doesn't apply to:
> >
> > * store
> > * system
> > * load atomic
> > * load exclusive
> >
> > ... but your example explains this happening for an atomic load, which was in
> > that list. Was the example bad, or was the list wrong?
>
> The load atomic completes successfully. It doesn't become a nop. A
> load atomic is significant just because it's an instruction which has
> a long time between executing and retiring. This provides a window of
> vulnerability when an interrupt asserts and then deasserts. This
> stimulates the bug and causes other instructions executing in
> parallel, which can get nop.

Thanks for clarifying; this was not clear from your initial description.

> > It's not entirely clear to me which instructions this covers. e.g. is "system" the
> > entire system instruction class (i.e. all opcodes
> > 0b110101010_0xxxxxxx_xxxxxxxx_xxxxxxxx), or did you mean something more
> > specific? Does "store" include store-exclusive?
> >
> > Other than that list, can this occur for *any* instruction? e.g. MOV, SHA256*,
> > *DIV?
>
> There are two general classes of instructions. Those that only change
> a gpr or PC. These are arithmetic, floating point, branch. Loads with
> no side effects also fall into this category. These are the
> instructions that can erroneously be nop'd. The other category are
> instructions that can change architectural state more than a GPR.
> These include all stores, atomic loads, exclusive loads, loads to
> non-cacheable space, msr, mrs, eret, tlb*, sys, brk, etc. These do not
> get nop'd.
>
> > Does this only apply to a single instruction at a time, or can multiple instructions
> > "become nop"?
>
> Can be multiple,
>
> > When an instruction "becomes nop", will subsequent instructions see a
> > consistent architectural state (e.g. GPRs as they were exactly before the
> > instruction which "becomes nop"), or can they see something else (e.g. garbage
> > forwarded from register renaming or other internal microarchitectural state)?
>
> > > > * Does the CPU lockup?
> > > No
> > >
> > > > * Does the CPU take the exception at all?
> > > No
> > >
> > > > * Does the load behave erroneously?
> > > No,
> > >
> > > > * Does any CPU state (e.g. GPRs, PC, PSTATE) become corrupted?
> > >
> > > yes, GPRs will get corrupted, will have stale value
> >
> > As above, is that the prior architectural value of the GPRs, or can that be some
> > bogus microarchitectural state (e.g. from renaming or other forwarding paths)?
>
> The instructions that become a nop don't write the GPR and because
> this is an OOO machine the GPR result isn't the prior architectural
> value but whatever crap is leftover in the physical register.

Ok, so that's a potential information leak from a different context (e.g.
higher EL), depending on what happened to be left in that physical
register. Consider a malicious guest at EL1.
What prevents it from triggering this deliberately, then inspecting the
GPRs after taking the IRQ in order to read prior secrets?

> > > > Does the problem manifest when IRQs are masked by DAIF.I, or by
> > > > CNT*_CTL_EL0.{IMASK,ENABLE} ?
> > >
> > > No, there is no issue if interrupts are masked.
> >
> > If a write to CNTV_CTL_EL0.IMASK races with the interrupt being asserted, can
> > that trigger the problem?
>
> If interrupt is enabled (DAIF) - then it will be taken, and no issue
> But if interrupts are disabled then following sequence can see the race
> 1) interrupt is disabled (DAIF)
> 2) TVAL/ENABLE/IMASK at timer h/w programming to de-assert interrupt.
> Race of Irq asserted before irq de-asserted, then this short window of assertion will be considered as spike from timer h/w block
> 3) Enable DAIF
> Because of propagation delay CPU sees assertion and de-assertion (spike), errata hit
>
> Will add "isb" around interrupt enablement in next version of patch.

... why? That doesn't seem to follow from the above, so I think I'm
missing a step.

> > If a write to DAIF.I races with the interrupt being asserted, can that trigger the
> > problem?
>
> No race with writing to DAIF.I with interrupt assertion,
> Writing DAIF.I = 0 (enablement of interrupt) can race with de-assertion, which can lead to hitting errata

Ok. That *might* make it possible to bodge around the timer
specifically, but as below I don't think we can ensure this is safe in
the presence of virtualization, nor when considering other interrupts.

> > From your description so far, this doesn't sound like it is specific to the timer
> > interrupt. Is it possible for a different interrupt to trigger this, e.g:
> >
> > * Can the same happen with another PPI, e.g. the PMU interrupt if that
> >   gets de-asserted, or there's a race with DAIF.I?
> >
> > * Can the same happen with an SGI, e.g. if one CPU asserts then
> >   de-asserts an SGI targeting another CPU, or there's a race with
> >   DAIF.I?
> >
> > * Can the same happen with an SPI, e.g. if a device asserts then
> >   de-asserts its IRQ line, or there's a race with DAIF.I?
>
> No issue with edge triggered, but this can happen with any level sensitive interrupt.

Ok. So that'll include at least the PMU and

> > If not, *why* does this happen specifically for the timer interrupt?
> >
> > > > > Workaround of this is to ensure maximum 2us of time gap between
> > > > > timer interrupt and timer programming which can de-assert timer interrupt.
> > > >
> > > > The code below seems to try to enforce a 2us *minimum*. Which is it
> > > > supposed to be?
> > >
> > > Yes, it is minimum 2us.
> > >
> > > >
> > > > Can you explain *why* this is supposed to help?
> > > With the workaround interrupt assertion and de-assertion will be minimum 2us
> > > apart.
> >
> > I understood that, but why is that deemed to be sufficient? e.g. is it somehow
> > guaranteed that the CPU will complete the instruction that would "become nop"
> > in that time?
>
> With this delay we avoid spike, either this will become an
> actual interrupt or the spike is never visible to core.
>
> > > > I don't see how we can guarantee this in a VM, or if the CPU misses
> > > > on an instruction fetch.
> > >
> > > This errata applies to VM (virtual timer) as well, maybe there is some
> > > gap in my understanding, how it will be different in VM.
> > > Can you help with what issue we can have in a VM?
> >
> > A VCPU can be pre-empted by the host at *any* time, for an arbitrary length of
> > time. So e.g. you can have a scenario such as:
> >
> > 1. Guest reads CNTx_TVAL, sees interrupt is 4us in the future and
> >    decides it does not need to wait
> > 2. Host preempts guest
> > 3. Host does some processing for ~3.9us
> > 4. Host returns to guest, with 0.1us left until the interrupt triggers
> > 5. Guest reprograms CNTx_TVAL, and triggers the erratum
>
> Yes, when timer expires just before tval is written (race condition), so
> there is assertion followed by de-assertion. As interrupts are enabled
> in host, interrupt will be visible as spike to host.

Ok, so that's a recipe for the guest to attack the host.

> We will apply workaround whenever entering to guest (add a delay
> before exiting to guest in case guest timer is going to expire).

I think this is papering over the problem.

You've said this can happen for *any* level-triggered interrupt. AFAIK,
nothing prevents a malicious guest from deliberately asserting and
de-asserting a level-triggered interrupt (e.g. by writing to the GIC
distributor), and I also note that the GIC maintenance interrupt is
level-triggered.

So, as above:

1) A guest can deliberately cause information to be leaked to itself via
   the corrupted GPRs. I haven't seen any rationale for why that is not
   a problem, nor have I seen a suggested workaround.

2) A guest *may* be able to trigger this while the host is running. I
   haven't seen anything that rules this out so far.

3) Even in the absence of virtualization, it would be necessary to
   work around this for *every* level-triggered interrupt, which
   includes at least the timer, PMU, and GIC maintenance interrupts, in
   addition to any other configurable PPIs or SPIs.

Without a fix that covers all of those, I don't think the workaround is
viable.

Thanks,
Mark.
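
For illustration only, the pre-emption scenario quoted in this mail (guest
checks the timer, host runs for ~3.9us, guest then reprograms CNTx_TVAL)
can be modelled with a few lines of plain C. This is a userspace sketch,
not kernel code and not part of the proposed patch: the function names,
the nanosecond units and the MIN_GAP_NS constant are all assumptions made
up for the example. It only shows that a check made before pre-emption
says nothing about the gap at the time of the write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; names and values are illustrative assumptions. */
#define MIN_GAP_NS 2000ULL /* the 2us minimum gap the workaround aims for */

/* Guest-side check made at time now_ns: is expiry at least 2us away? */
bool guest_check_passes(uint64_t now_ns, uint64_t expiry_ns)
{
	return expiry_ns > now_ns && (expiry_ns - now_ns) >= MIN_GAP_NS;
}

/*
 * Time left until expiry when the guest actually writes TVAL, given that
 * the host ran for preempt_ns between the check and the write.
 * Returns 0 if the timer has already expired by then.
 */
uint64_t gap_at_write_ns(uint64_t check_ns, uint64_t preempt_ns,
			 uint64_t expiry_ns)
{
	uint64_t write_ns = check_ns + preempt_ns;

	return expiry_ns > write_ns ? expiry_ns - write_ns : 0;
}

/* True if the check passed but the write still lands inside the window. */
bool check_defeated_by_preemption(uint64_t check_ns, uint64_t preempt_ns,
				  uint64_t expiry_ns)
{
	return guest_check_passes(check_ns, expiry_ns) &&
	       gap_at_write_ns(check_ns, preempt_ns, expiry_ns) < MIN_GAP_NS;
}
```

With the numbers from the scenario (check at 0, expiry at 4000ns, host
runs for 3900ns), guest_check_passes(0, 4000) is true, while
gap_at_write_ns(0, 3900, 4000) is 100ns, well under the 2us minimum.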