Date: Thu, 09 Apr 2026 19:12:25 +0100
Message-ID: <86qzoo2eba.wl-maz@kernel.org>
From: Marc Zyngier
To: David Woodhouse
Cc: "Gutierrez Cantu, Bernardo", alexandru.elisei@arm.com,
 alyssa@rosenzweig.io, asahi@lists.linux.dev, broonie@kernel.org,
 catalin.marinas@arm.com, james.morse@arm.com, kvmarm@lists.linux.dev,
 linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org,
 marcan@marcan.st, mathieu.poirier@linaro.org, oliver.upton@linux.dev,
 suzuki.poulose@arm.com, sven@svenpeter.dev, will@kernel.org
Subject: Re: [PATCH] KVM: arm64: Add KVM_CAP_ARM_NATIVE_CACHE_CONFIG vcpu capability
In-Reply-To: <584931abfa8c6f6c9c8a1379d26ef6beb58db1cd.camel@infradead.org>
References: <7fb7b823c68e04321eb532a5b8ae21a818d4926d.camel@infradead.org>
 <86se942hbv.wl-maz@kernel.org>
 <584931abfa8c6f6c9c8a1379d26ef6beb58db1cd.camel@infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 09 Apr 2026 18:49:09 +0100,
David Woodhouse wrote:
>
> On Thu, 2026-04-09 at 18:07 +0100, Marc Zyngier wrote:
> > On Thu, 09 Apr 2026 16:29:06 +0100,
> > David Woodhouse wrote:
> > >
> > > From: David Woodhouse
> > >
> > > Commit 7af0c2534f4c5 ("KVM: arm64: Normalize cache configuration")
> > > fabricates CLIDR_EL1 and CCSIDR_EL1 values instead of using the real
> > > hardware values. While this provides consistent values across
> > > heterogeneous CPUs, it does cause visible changes in the CPU model
> > > exposed to guests.
> > >
> > > The commit claims that userspace can restore the original values, but
> > > there is no way for userspace to obtain the real CLIDR_EL1 register
> > > value — it is not fully reconstructible from sysfs, which lacks the
> > > LoC, LoUU, and LoUIS fields.
> > >
> > > Add a per-vcpu KVM_CAP_ARM_NATIVE_CACHE_CONFIG capability that reads
> > > the real CLIDR_EL1 and all CCSIDR_EL1 values from the current physical
> > > CPU and sets them on the vcpu.
> > >
> > > This allows hypervisors to present the real hardware cache configuration
> > > to guests, which is important for consistency of the environment across
> > > kernel versions and for migration compatibility with hosts running
> > > older kernels that exposed the real values.
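The CLIDR_EL1 fields the commit message refers to can be illustrated with a minimal sketch (not part of the patch) that decodes them from a raw register value. It assumes the Arm ARM layout of CLIDR_EL1: Ctype<n> in consecutive 3-bit fields starting at bit 0, LoUIS at [23:21], LoC at [26:24] and LoUU at [29:27]. The struct and function names are hypothetical, and ICB and the MTE Ttype<n> fields are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only: decode the CLIDR_EL1 fields that sysfs does not expose.
 * Assumed layout (Arm ARM): Ctype<n> at bits [3(n-1)+2 : 3(n-1)] for
 * n = 1..7, LoUIS at [23:21], LoC at [26:24], LoUU at [29:27].
 * ICB and the MTE Ttype<n> fields are not decoded.
 * struct clidr_fields and clidr_decode() are hypothetical names.
 */
enum ctype {
	CTYPE_NONE     = 0,	/* no cache at this level */
	CTYPE_INSN     = 1,	/* instruction cache only */
	CTYPE_DATA     = 2,	/* data cache only */
	CTYPE_SEPARATE = 3,	/* separate instruction and data caches */
	CTYPE_UNIFIED  = 4,	/* unified cache */
};

struct clidr_fields {
	unsigned int ctype[7];	/* ctype[0] is Ctype1 (L1), ctype[6] is Ctype7 */
	unsigned int louis;	/* Level of Unification, Inner Shareable */
	unsigned int loc;	/* Level of Coherence */
	unsigned int louu;	/* Level of Unification, Uniprocessor */
};

/* Extract the 3-bit field that starts at bit 'shift'. */
static unsigned int field3(uint64_t reg, unsigned int shift)
{
	assert(shift <= 61);
	return (unsigned int)((reg >> shift) & 0x7);
}

static struct clidr_fields clidr_decode(uint64_t clidr)
{
	struct clidr_fields f;

	for (unsigned int n = 0; n < 7; n++)
		f.ctype[n] = field3(clidr, 3 * n);
	f.louis = field3(clidr, 21);
	f.loc   = field3(clidr, 24);
	f.louu  = field3(clidr, 27);
	return f;
}
```

For example, the made-up value 0x0A200023 decodes to separate L1 instruction and data caches, a unified L2, no L3, LoUIS = 1, LoC = 2 and LoUU = 1. The last three are exactly the values the commit message says cannot be recovered from sysfs.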
> > >
> > > Fixes: 7af0c2534f4c ("KVM: arm64: Normalize cache configuration")
> > > Signed-off-by: David Woodhouse
> > > ---
> > >  Documentation/virt/kvm/api.rst                | 23 ++++++++
> > >  arch/arm64/include/asm/kvm_host.h             |  1 +
> > >  arch/arm64/kvm/arm.c                          | 17 ++++++
> > >  arch/arm64/kvm/sys_regs.c                     | 26 ++++++++++
> > >  include/uapi/linux/kvm.h                      |  1 +
> > >  tools/testing/selftests/kvm/Makefile.kvm      |  1 +
> > >  .../selftests/kvm/arm64/native_cache_config.c | 52 +++++++++++++++++++
> > >  7 files changed, 121 insertions(+)
> > >  create mode 100644 tools/testing/selftests/kvm/arm64/native_cache_config.c
> > >
> > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> > > index e3b3bd9edeec..ee47dc07ceac 100644
> > > --- a/Documentation/virt/kvm/api.rst
> > > +++ b/Documentation/virt/kvm/api.rst
> > > @@ -8930,6 +8930,29 @@ no-op.
> > >  
> > >  ``KVM_CHECK_EXTENSION`` returns the bitmask of exits that can be disabled.
> > >  
> > > +7.48 KVM_CAP_ARM_NATIVE_CACHE_CONFIG
> > > +-------------------------------------
> > > +
> > > +:Architecture: arm64
> > > +:Target: vcpu
> > > +:Parameters: none
> > > +:Returns: 0 on success, -ENOMEM on allocation failure, -EINVAL if
> > > +          args[0] or flags are non-zero.
> > > +
> > > +This per-vcpu capability reads the real CLIDR_EL1 and CCSIDR_EL1 values
> > > +from the physical CPU on which the ioctl is executed, and sets them on
> > > +the vcpu. This replaces the fabricated cache configuration that KVM
> > > +provides by default.
> > > +
> > > +The caller should ensure the vcpu thread is pinned to the desired
> > > +physical CPU before invoking this capability, so that the correct cache
> > > +topology is captured. On heterogeneous systems, different physical CPUs
> > > +may have different cache configurations.
> > > +
> > > +After this capability is enabled, the vcpu's CLIDR_EL1 and CCSIDR_EL1
> > > +values can still be overridden individually via ``KVM_SET_ONE_REG`` and
> > > +the ``KVM_REG_ARM_DEMUX`` interface.
> > > +
> > >  8. Other capabilities.
> > >  ======================
> > >  
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index a1bb025c641f..c9713a472c47 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -1296,6 +1296,7 @@ void kvm_sys_regs_create_debugfs(struct kvm *kvm);
> > >  void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
> > >  
> > >  int __init kvm_sys_reg_table_init(void);
> > > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu);
> > >  struct sys_reg_desc;
> > >  int __init populate_sysreg_config(const struct sys_reg_desc *sr,
> > >  				  unsigned int idx);
> > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> > > index 326a99fea753..579583e8dc5c 100644
> > > --- a/arch/arm64/kvm/arm.c
> > > +++ b/arch/arm64/kvm/arm.c
> > > @@ -393,6 +393,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> > >  	case KVM_CAP_ARM_DISABLE_EXITS:
> > >  		r = KVM_ARM_DISABLE_VALID_EXITS;
> > >  		break;
> > > +	case KVM_CAP_ARM_NATIVE_CACHE_CONFIG:
> > > +	case KVM_CAP_ENABLE_CAP:
> > > +		r = 1;
> > > +		break;
> > >  	case KVM_CAP_SET_GUEST_DEBUG2:
> > >  		return KVM_GUESTDBG_VALID_MASK;
> > >  	case KVM_CAP_ARM_SET_DEVICE_ADDR:
> > > @@ -1793,6 +1797,19 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> > >  		r = kvm_arch_vcpu_ioctl_vcpu_init(vcpu, &init);
> > >  		break;
> > >  	}
> > > +	case KVM_ENABLE_CAP: {
> > > +		struct kvm_enable_cap cap;
> > > +
> > > +		r = -EFAULT;
> > > +		if (copy_from_user(&cap, argp, sizeof(cap)))
> > > +			break;
> > > +
> > > +		r = -EINVAL;
> > > +		if (cap.cap == KVM_CAP_ARM_NATIVE_CACHE_CONFIG &&
> > > +		    !cap.args[0] && !cap.flags)
> > > +			r = kvm_vcpu_set_native_cache_config(vcpu);
> > > +		break;
> > > +	}
> > >  	case KVM_SET_ONE_REG:
> > >  	case KVM_GET_ONE_REG: {
> > >  		struct kvm_one_reg reg;
> > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > index 1b4cacb6e918..c19d84e48f8b 100644
> > > --- a/arch/arm64/kvm/sys_regs.c
> > > +++ b/arch/arm64/kvm/sys_regs.c
> > > @@ -484,6 +484,32 @@ static int set_ccsidr(struct kvm_vcpu *vcpu, u32 csselr, u32 val)
> > >  	return 0;
> > >  }
> > >  
> > > +int kvm_vcpu_set_native_cache_config(struct kvm_vcpu *vcpu)
> > > +{
> > > +	u32 csselr;
> > > +
> > > +	if (!vcpu->arch.ccsidr) {
> > > +		vcpu->arch.ccsidr = kmalloc_array(CSSELR_MAX, sizeof(u32),
> > > +						  GFP_KERNEL_ACCOUNT);
> > > +		if (!vcpu->arch.ccsidr)
> > > +			return -ENOMEM;
> > > +	}
> >
> > Well, no.
> >
> > The moment you decide to expose all of the host's crap, you really
> > need to put everything on the table. It means fully handling
> > FEAT_CCIDX, which we were careful not to expose anywhere because it is
> > a terrible idea.
>
> The intent here is not to "expose all of the host's crap", but to
> maintain compatibility with what the kernel did before commit
> 7af0c2534f4c. No need to expose FEAT_CCIDX.

That's not optional.
Without FEAT_CCIDX, the guest cannot interpret the correct cache
geometry.

>
> > > +	for (csselr = 0; csselr < CSSELR_MAX; csselr++) {
> > > +		write_sysreg(csselr, csselr_el1);
> > > +		isb();
> > > +		vcpu->arch.ccsidr[csselr] = read_sysreg(ccsidr_el1);
> >
> > That's not how the selection register works. CLIDR_EL1 tells you what
> > each cache level is (Instructions, Data, Unified, Tags), and that must
> > be combined with the index (which doesn't start at bit 0).
>
> Ack, thanks. I'll rework that based on the old is_valid_cache()
> function.
>
> > I also wonder how you reconcile not exposing MTE when the cache
> > hierarchy indicates support for tags. That clearly contradicts "report
> > what the HW has".
>
> If that was an issue then it would already have been an issue before
> commit 7af0c2534f4 (and in kernels with that commit reverted), hosting
> millions of guests today.

That only means you are doing a pretty bad job at supporting guests.
And yes, this is an issue for anything that expects to see something
meaningful in CCSIDR[]. The fact that none of your guests hit that
problem only means you're lacking coverage. From what I can read,
anything from Neoverse V1 is affected.

>
> This isn't about introducing *new* behaviour; it's about allowing the
> existing established behaviour to be maintained so that we can have a
> *managed* transition to the new model (for new launches) rather than an
> unconditional uncontrolled change as the kernel gets upgraded.

Then fully implement "show me the cache hierarchy", read it out, and
write it back with whatever level of brokenness you intend to inflict
on your guests.

But I'm not reintroducing this particular bug.

	M.

-- 
Without deviation from the norm, progress is not possible.
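The point about the selection register can be illustrated with a small, self-contained sketch in the spirit of the is_valid_cache() logic mentioned above: rather than probing every selector from 0 to CSSELR_MAX, derive the meaningful CSSELR_EL1 values from the per-level Ctype fields of CLIDR_EL1. It assumes the CSSELR_EL1 layout InD = bit 0, Level = bits [3:1] (cache level minus one), TnD = bit 4. Tag caches (TnD) and FEAT_CCIDX are not handled, and the helper names are hypothetical. In the kernel, each returned selector would then be written to CSSELR_EL1, followed by an ISB before reading CCSIDR_EL1; that step needs EL1 and is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch only: derive the meaningful CSSELR_EL1 selectors from CLIDR_EL1
 * rather than probing every value from 0 to CSSELR_MAX.
 * Assumed CSSELR_EL1 layout: InD = bit 0, Level = bits [3:1] (cache level
 * minus one), TnD = bit 4 (tag caches, FEAT_MTE2; not handled here).
 * csselr_valid() and csselr_for_clidr() are hypothetical helpers.
 */
#define CSSELR_IND		(1u << 0)
#define CSSELR_NR_SELECTORS	14u	/* 7 levels x {data/unified, instruction} */

static bool csselr_valid(uint64_t clidr, uint32_t csselr)
{
	unsigned int level = (csselr >> 1) & 0x7;	/* 0-based cache level */
	bool ind = csselr & CSSELR_IND;
	unsigned int ctype;

	if (csselr & ~0xfu)	/* TnD or RES0 bits: out of scope here */
		return false;
	if (level > 6)		/* Level 0b111 is reserved */
		return false;

	ctype = (clidr >> (3 * level)) & 0x7;	/* Ctype<level + 1> */
	switch (ctype) {
	case 1:			/* instruction cache only */
		return ind;
	case 2:			/* data cache only */
	case 4:			/* unified cache */
		return !ind;
	case 3:			/* separate instruction and data caches */
		return true;
	default:		/* no cache, or reserved encoding */
		return false;
	}
}

/* Store valid selectors in ascending order; returns the number stored (<= max). */
static unsigned int csselr_for_clidr(uint64_t clidr, uint32_t *out,
				     unsigned int max)
{
	unsigned int n = 0;

	assert(out != NULL || max == 0);
	for (uint32_t sel = 0; sel < CSSELR_NR_SELECTORS; sel++) {
		if (csselr_valid(clidr, sel) && n < max)
			out[n++] = sel;
	}
	return n;
}
```

With a made-up CLIDR_EL1 value of 0x0A200023 (separate L1 instruction and data caches, unified L2), only selectors 0 (L1 data), 1 (L1 instruction) and 2 (L2 unified) are returned. Selector 3 is rejected because L2 has no separate instruction cache.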