From: Marc Zyngier
To: Alexandru Elisei
Cc: kvmarm@lists.cs.columbia.edu, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, James Morse, Suzuki K Poulose,
	Eric Auger, Christoffer Dall, kernel-team@android.com
Subject: Re: [PATCH 2/5] KVM: arm64: Work around GICv3 locally generated SErrors
Date: Mon, 04 Oct 2021 14:25:12 +0100
Message-ID: <8735pgsz0n.wl-maz@kernel.org>
In-Reply-To: <6e50193e-95c4-e1fa-8287-1b909a714ebd@arm.com>
References: <20210924082542.2766170-1-maz@kernel.org>
	<20210924082542.2766170-3-maz@kernel.org>
	<6e50193e-95c4-e1fa-8287-1b909a714ebd@arm.com>
X-Mailing-List: kvm@vger.kernel.org

Hi Alex,

On Mon, 04 Oct 2021 12:23:41 +0100,
Alexandru Elisei wrote:
>
> Hi Marc,
>
> On 9/24/21 09:25, Marc Zyngier wrote:
> > The infamous M1 has a feature nobody else ever implemented,
> > in the form of the "GIC locally generated SError interrupts",
> > also known as SEIS for short.
> >
> > These SErrors are generated when a guest does something that violates
> > the GIC state machine. It would have been simpler to just *ignore*
> > the damned thing, but that's not what this HW does. Oh well.
> >
> > This part of the architecture is also amazingly under-specified.
> > There is a whole 10 lines that describe the feature in a spec that
> > is 930 pages long, and some of these lines are factually wrong.
> > Oh, and it is deprecated, so the incentive to clarify it is low.
> >
> > Now, the spec says that this should be a *virtual* SError when
> > HCR_EL2.AMO is set. As it turns out, that's not always the case
> > on this CPU, and the SError sometimes fires on the host as a
> > physical SError. Goodbye, cruel world. This clearly is a HW bug,
> > and it means that a guest can easily take the host down, on demand.
> >
> > Thankfully, we have seen systems that were just as broken in the
> > past, and we have the perfect vaccine for it.
> >
> > Apple M1, please meet the Cavium ThunderX workaround. All your
> > GIC accesses will be trapped, sanitised, and emulated. Only the
> > signalling aspect of the HW will be used. It won't be super speedy,
> > but it will at least be safe. You're most welcome.
> >
> > Given that this has only ever been seen on this single implementation,
> > that the spec is unclear at best and that we cannot trust it to ever
> > be implemented correctly, gate the workaround solely on ICH_VTR_EL2.SEIS
> > being set.
>
> I grepped for system error in Arm IHI 0069F, and it turns out there are
> a number of ways to make the GIC generate one:
>
> - When programming the ITS
>
> - On a write to ICC_DIR_EL1 (or the corresponding virtual CPU interface
>   register) when split priority drop/interrupt deactivation is not
>   enabled.
>
> - On a write to GICV_AEOIR or GICC_DIR.
>
> The ITS and the legacy GICv2 interface are memory mapped, so I am going
> to trust that KVM emulates them correctly and avoids putting the GIC
> into a state that triggers the SErrors.

And to be clear, if the host kernel was doing the wrong thing, it
would take a *physical* SError. And on the M1, it really doesn't
matter as there is no physical GIC.

> The CPU interface registers are accessed directly by the guest, so
> changing that to trap-and-emulate looks like the only way to prevent
> the guest from crashing the host with an SError.
>
> As for making the trap-and-emulate depend on ICH_VTR_EL2.SEIS
> being set, that sounds reasonable to me, considering that there were
> no reports so far of this being implemented. And if it turns out
> that there are devices which implement GIC generated SErrors
> *correctly* and the trap-and-emulate cost is too much, then we can
> always get an erratum number from Apple and have the trapping depend
> on that, right?

I have very little hope that we can get Apple to give us anything
here. The CPU doesn't even advertise that it has a vGIC, so we're in
uncharted territory. But we could definitely key that on the MIDR.

> Reviewed-by: Alexandru Elisei

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.
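
For illustration only, the two gating policies discussed in this thread
(trap whenever ICH_VTR_EL2.SEIS is set, or additionally key the trap on
the MIDR) could be sketched as below. This is a standalone sketch, not
the code from the patch: the helper names and the bad_impl/bad_part
parameters are hypothetical. The bit positions are my reading of the
architecture (SEIS at bit 22 of ICH_VTR_EL2, implementer in MIDR_EL1
bits [31:24], part number in bits [15:4]) and should be checked against
the spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of ICH_VTR_EL2.SEIS: bit 22. */
#define ICH_VTR_SEIS_SHIFT	22
#define ICH_VTR_SEIS_MASK	(UINT64_C(1) << ICH_VTR_SEIS_SHIFT)

/* Assumed MIDR_EL1 layout: implementer [31:24], part number [15:4]. */
#define MIDR_IMPLEMENTER(midr)	(((midr) >> 24) & 0xffu)
#define MIDR_PARTNUM(midr)	(((midr) >> 4) & 0xfffu)

/*
 * Hypothetical helper: the policy from the patch description. Trap and
 * emulate all GIC CPU interface accesses whenever the implementation
 * advertises locally generated SErrors.
 */
static bool gic_trap_on_seis(uint64_t ich_vtr)
{
	return (ich_vtr & ICH_VTR_SEIS_MASK) != 0;
}

/*
 * Hypothetical helper: the narrower fallback mentioned in the reply.
 * Trap only when SEIS is advertised *and* the CPU matches a
 * known-broken implementer/part pair supplied by the caller.
 */
static bool gic_trap_on_seis_and_midr(uint64_t ich_vtr, uint32_t midr,
				      uint32_t bad_impl, uint32_t bad_part)
{
	return gic_trap_on_seis(ich_vtr) &&
	       MIDR_IMPLEMENTER(midr) == bad_impl &&
	       MIDR_PARTNUM(midr) == bad_part;
}
```

The first helper errs on the side of safety, since any implementation
advertising SEIS pays the trap cost. The second only makes sense if a
correct SEIS implementation ever shows up and the cost matters.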