Linux Confidential Computing Development


* Re: [PATCH v14 23/44] arm64: RMI: Handle RMI_EXIT_RIPAS_CHANGE
From: Wei-Lin Chang @ 2026-05-27 10:52 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-24-steven.price@arm.com>

Hi,

On Wed, May 13, 2026 at 02:17:31PM +0100, Steven Price wrote:
> The guest can request that a region of its protected address space is
> switched between RIPAS_RAM and RIPAS_EMPTY (and back) using
> RSI_IPA_STATE_SET. This causes a guest exit with the
> RMI_EXIT_RIPAS_CHANGE code. We treat this as a request to convert a
> protected region to unprotected (or back), exiting to the VMM to make
> the necessary changes to the guest_memfd and memslot mappings. On the
> next entry the RIPAS changes are committed by making RMI_RTT_SET_RIPAS
> calls.
> 
> The VMM may wish to reject the RIPAS change requested by the guest. For
> now it can only do this by no longer scheduling the VCPU as we don't
> currently have a usecase for returning that rejection to the guest, but
> by postponing the RMI_RTT_SET_RIPAS changes to entry we leave the door
> open for adding a new ioctl in the future for this purpose.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Switch to the new RMI_RTT_UNPROT_UNMAP range-based API.
>  * Drop ugly hack for RMM bug which errored when the RIPAS was already
>    set to the desired value.
> Changes since v12:
>  * Switch to the new RMM v2.0 RMI_RTT_DATA_UNMAP which can unmap an
>    address range.
> Changes since v11:
>  * Combine the "Allow VMM to set RIPAS" patch into this one to avoid
>    adding functions before they are used.
>  * Drop the CAP for setting RIPAS and adapt to changes from previous
>    patches.
> Changes since v10:
>  * Add comment explaining the assignment of rec->run->exit.ripas_base in
>    kvm_complete_ripas_change().
> Changes since v8:
>  * Make use of ripas_change() from a previous patch to implement
>    realm_set_ipa_state().
>  * Update exit.ripas_base after a RIPAS change so that, if instead of
>    entering the guest we exit to user space, we don't attempt to repeat
>    the RIPAS change (triggering an error from the RMM).
> Changes since v7:
>  * Rework the loop in realm_set_ipa_state() to make it clear when the
>    'next' output value of rmi_rtt_set_ripas() is used.
> New patch for v7: The code was previously split awkwardly between two
> other patches.
> ---
>  arch/arm64/include/asm/kvm_rmi.h |   6 +
>  arch/arm64/kvm/mmu.c             |   8 +-
>  arch/arm64/kvm/rmi.c             | 439 +++++++++++++++++++++++++++++++
>  3 files changed, 450 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
> index feb534a6678e..007249a13dbc 100644
> --- a/arch/arm64/include/asm/kvm_rmi.h
> +++ b/arch/arm64/include/asm/kvm_rmi.h
> @@ -88,6 +88,12 @@ int kvm_rec_enter(struct kvm_vcpu *vcpu);
>  int kvm_rec_pre_enter(struct kvm_vcpu *vcpu);
>  int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_status);
>  
> +void kvm_realm_unmap_range(struct kvm *kvm,
> +			   unsigned long ipa,
> +			   unsigned long size,
> +			   bool unmap_private,
> +			   bool may_block);
> +
>  static inline bool kvm_realm_is_private_address(struct realm *realm,
>  						unsigned long addr)
>  {
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index eb56d4e7f21a..10ca9dbe40a0 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -319,6 +319,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * @start: The intermediate physical base address of the range to unmap
>   * @size:  The size of the area to unmap
>   * @may_block: Whether or not we are permitted to block
> + * @only_shared: If true then protected mappings should not be unmapped

Do you think it's better if we use enum kvm_gfn_range_filter for this?
Pass KVM_FILTER_{PRIVATE, SHARED} to indicate what to unmap. This way we
don't have to think about booleans. kvm_realm_unmap_range() in patch 23
will have to change too though.
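
As a rough userspace illustration of the suggestion (not code from the
series): the sketch below uses local stand-in names and assumed bit values
rather than the kernel's enum kvm_gfn_range_filter, and both helper
functions are hypothetical. It shows how a filter bitmask can answer
"should this mapping be unmapped?" and how it maps back onto the patch's
only_shared boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the filter bits; names and values are assumptions. */
enum gfn_range_filter {
	FILTER_SHARED  = 1u << 0,
	FILTER_PRIVATE = 1u << 1,
};

/* Hypothetical helper: does the filter select this kind of mapping? */
static bool filter_allows_unmap(unsigned int filter, bool mapping_is_private)
{
	/* A caller must ask for at least one kind of mapping. */
	assert(filter != 0);

	return filter & (mapping_is_private ? FILTER_PRIVATE : FILTER_SHARED);
}

/* Same derivation as the only_shared argument in kvm_unmap_gfn_range(). */
static bool filter_to_only_shared(unsigned int filter)
{
	return !(filter & FILTER_PRIVATE);
}
```

With a filter argument the call site names what it wants unmapped, instead
of passing a bare true or false whose meaning depends on parameter order.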

>   *
>   * Clear a range of stage-2 mappings, lowering the various ref-counts.  Must
>   * be called while holding mmu_lock (unless for freeing the stage2 pgd before
> @@ -326,7 +327,7 @@ static void invalidate_icache_guest_page(void *va, size_t size)
>   * with things behind our backs.
>   */
>  static void __unmap_stage2_range(struct kvm_s2_mmu *mmu, phys_addr_t start, u64 size,
> -				 bool may_block)
> +				 bool may_block, bool only_shared)
>  {
>  	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>  	phys_addr_t end = start + size;
> @@ -343,7 +344,7 @@ void kvm_stage2_unmap_range(struct kvm_s2_mmu *mmu, phys_addr_t start,
>  	if (kvm_vm_is_protected(kvm_s2_mmu_to_kvm(mmu)))
>  		return;
>  
> -	__unmap_stage2_range(mmu, start, size, may_block);
> +	__unmap_stage2_range(mmu, start, size, may_block, false);
>  }
>  
>  void kvm_stage2_flush_range(struct kvm_s2_mmu *mmu, phys_addr_t addr, phys_addr_t end)
> @@ -2418,7 +2419,8 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
>  
>  	__unmap_stage2_range(&kvm->arch.mmu, range->start << PAGE_SHIFT,
>  			     (range->end - range->start) << PAGE_SHIFT,
> -			     range->may_block);
> +			     range->may_block,
> +			     !(range->attr_filter & KVM_FILTER_PRIVATE));
>  
>  	kvm_nested_s2_unmap(kvm, range->may_block);
>  	return false;

[...]

Thanks,
Wei-Lin Chang


* Re: [PATCH v10 02/25] x86/virt/tdx: Move TDX global initialization states to file scope
From: Kiryl Shutsemau @ 2026-05-27 10:47 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
In-Reply-To: <20260520133909.409394-3-chao.gao@intel.com>

On Wed, May 20, 2026 at 06:38:05AM -0700, Chao Gao wrote:
> TDX module global initialization is executed only once. The first call
> caches both the result and the "done" state, and later callers reuse the
> saved result. A lock protects those cached states.
> 
> Those states and the lock are currently kept as function-local statics
> because they are used only by try_init_module_global().
> 
> TDX module updates need to reset the cached states so TDX global
> initialization can be run again after an update. That will add another
> access site in the same file.
> 
> Move the cached states to file scope so they are accessible outside
> try_init_module_global(), and move the lock along with the states it
> protects.
> 
> No functional change intended.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>

Reviewed-by: Kiryl Shutsemau <kas@kernel.org>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* Re: [PATCH v10 01/25] x86/virt/tdx: Clarify try_init_module_global() result caching
From: Kiryl Shutsemau @ 2026-05-27 10:43 UTC (permalink / raw)
  To: Chao Gao
  Cc: kvm, linux-coco, linux-kernel, binbin.wu, dave.hansen, djbw,
	ira.weiny, kai.huang, nik.borisov, paulmck, pbonzini,
	reinette.chatre, rick.p.edgecombe, sagis, seanjc, tony.lindgren,
	vannapurve, vishal.l.verma, yilun.xu, xiaoyao.li, yan.y.zhao,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	H. Peter Anvin
In-Reply-To: <20260520133909.409394-2-chao.gao@intel.com>

On Wed, May 20, 2026 at 06:38:04AM -0700, Chao Gao wrote:
> TDX module global initialization is executed only once. The first call
> caches both the result and the "done" state, and later callers reuse the
> saved result. A lock protects that cached state.
> 
> The current code is hard to read because sysinit_done is accessed under
> the lock, while sysinit_ret is not.
> 
> To improve readability, move sysinit_ret accesses within the lock.

Have you considered using a guard for the lock? It would simplify the
flow.
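
For readers unfamiliar with the idea, here is a minimal userspace sketch of
a scope-based lock guard, assuming GCC or Clang (it relies on the cleanup
variable attribute). The toy lock, the SCOPED_LOCK macro, try_init_once()
and do_global_init() are all hypothetical stand-ins; only the names
sysinit_done and sysinit_ret are borrowed from the patch description. This
is not the kernel's implementation.

```c
#include <assert.h>

/* Toy lock for illustration only; it is not thread-safe. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static void toy_lock_release(struct toy_lock **l)
{
	(*l)->held = 0;
}

/* Take the lock now and drop it automatically when the scope ends. */
#define SCOPED_LOCK(l) \
	struct toy_lock *scoped_lock_ \
		__attribute__((cleanup(toy_lock_release))) = (l); \
	toy_lock_acquire(scoped_lock_)

static struct toy_lock sysinit_lock;
static int sysinit_done;
static int sysinit_ret;
static int init_calls;	/* counts how often the one-time init ran */

static int do_global_init(void)
{
	init_calls++;
	return 0;
}

static int try_init_once(void)
{
	SCOPED_LOCK(&sysinit_lock);

	/* Both cached values are read and written under the lock. */
	if (sysinit_done)
		return sysinit_ret;

	sysinit_ret = do_global_init();
	sysinit_done = 1;
	return sysinit_ret;
}
```

Every return path releases the lock without an explicit unlock call or a
goto label, which is the simplification being suggested.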

-- 
  Kiryl Shutsemau / Kirill A. Shutemov


* [PATCH v6 4/4] coco: guest: arm64: Replace dummy CCA device with sysfs ABI
From: Aneesh Kumar K.V (Arm) @ 2026-05-27 10:02 UTC (permalink / raw)
  To: linux-coco, linux-arm-kernel, linux-kernel
  Cc: Aneesh Kumar K.V (Arm), Catalin Marinas, Greg KH, Jeremy Linton,
	Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland, Sudeep Holla,
	Will Deacon, Steven Price, Suzuki K Poulose
In-Reply-To: <20260527100233.428018-1-aneesh.kumar@kernel.org>

The SMCCC firmware driver now creates the arm-smccc platform device and
instantiates the CCA RSI auxiliary devices once the RSI ABI is discovered.
The arm64-specific arm-cca-dev platform device stub is therefore no longer
needed.

However, userspace has used the arm-cca-dev platform device to detect Arm
CCA Realm guests [1]. Removing it without a replacement would break that
detection and would also leave userspace depending on kernel device-model
details.

Add /sys/firmware/cca/realm_guest as a stable, architecture-provided ABI
for detecting whether the kernel is running as an Arm CCA Realm guest. The
file returns 1 in Realm world and 0 otherwise, similar to the existing s390
/sys/firmware/uv/prot_virt_guest interface for protected virtualization
guests.
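
A minimal sketch of how userspace might consume such a file, assuming the
format documented in this patch (a single 0 or 1 followed by a newline).
The function name read_realm_guest_flag() and its -1 error convention are
hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns 1 if the file reports a Realm guest,
 * 0 if it reports a non-Realm kernel, and -1 if the file is missing
 * or does not contain 0 or 1.
 */
static int read_realm_guest_flag(const char *path)
{
	FILE *f;
	int value = -1;

	assert(path != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%d", &value) != 1 || (value != 0 && value != 1))
		value = -1;

	fclose(f);
	return value;
}
```

A caller would pass "/sys/firmware/cca/realm_guest" and treat -1 as "the
kernel does not provide this interface", not as proof of a non-Realm
environment.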

Remove the dummy arm-cca-dev registration now that userspace has a
dedicated CCA Realm guest indicator, and document the new ABI in
Documentation/ABI/testing/sysfs-firmware-cca.

[1] https://lore.kernel.org/all/4a7d84b2-2ec4-4773-a2d5-7b63d5c683cf@arm.com

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 Documentation/ABI/testing/sysfs-firmware-cca | 10 +++++
 arch/arm64/kernel/rsi.c                      | 39 +++++++++++++++-----
 2 files changed, 39 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-cca

diff --git a/Documentation/ABI/testing/sysfs-firmware-cca b/Documentation/ABI/testing/sysfs-firmware-cca
new file mode 100644
index 000000000000..bf177d636b92
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-cca
@@ -0,0 +1,10 @@
+What:		/sys/firmware/cca/realm_guest
+Date:		May 2026
+Contact:	Linux ARM Kernel Mailing list <linux-arm-kernel@lists.infradead.org>
+Description:	Read-only. Indicates whether the kernel is running as an
+		Arm Confidential Compute Architecture (CCA) Realm guest.
+
+		The value is one of:
+
+		0: the kernel is not running as a Realm guest
+		1: the kernel is running as a Realm guest
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index da440f71bb64..a333029ddf08 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -9,6 +9,8 @@
 #include <linux/swiotlb.h>
 #include <linux/cc_platform.h>
 #include <linux/platform_device.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
 
 #include <asm/io.h>
 #include <asm/mem_encrypt.h>
@@ -16,6 +18,7 @@
 #include <asm/rsi.h>
 
 static struct realm_config config;
+static struct kobject *cca_kobj;
 
 unsigned long prot_ns_shared;
 EXPORT_SYMBOL(prot_ns_shared);
@@ -160,17 +163,33 @@ void __init arm64_rsi_init(void)
 	static_branch_enable(&rsi_present);
 }
 
-static struct platform_device rsi_dev = {
-	.name = "arm-cca-dev",
-	.id = PLATFORM_DEVID_NONE
+static ssize_t cca_is_realm_guest(struct kobject *kobj,
+		struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", is_realm_world());
+}
+
+static struct kobj_attribute cca_realm_guest =
+	__ATTR(realm_guest, 0444, cca_is_realm_guest, NULL);
+
+static const struct attribute *cca_realm_attrs[] = {
+	&cca_realm_guest.attr,
+	NULL,
 };
 
-static int __init arm64_create_dummy_rsi_dev(void)
+static int __init realm_sysfs_init(void)
 {
-	if (is_realm_world() &&
-	    platform_device_register(&rsi_dev))
-		pr_err("failed to register rsi platform device\n");
-	return 0;
-}
+	int ret;
+
+	cca_kobj = kobject_create_and_add("cca", firmware_kobj);
+	if (!cca_kobj)
+		return -ENOMEM;
 
-arch_initcall(arm64_create_dummy_rsi_dev)
+	ret = sysfs_create_files(cca_kobj, cca_realm_attrs);
+	if (!ret)
+		return 0;
+
+	kobject_put(cca_kobj);
+	return ret;
+}
+device_initcall(realm_sysfs_init);
-- 
2.43.0



* [PATCH v6 3/4] firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC device
From: Aneesh Kumar K.V (Arm) @ 2026-05-27 10:02 UTC (permalink / raw)
  To: linux-coco, linux-arm-kernel, linux-kernel
  Cc: Aneesh Kumar K.V (Arm), Catalin Marinas, Greg KH, Jeremy Linton,
	Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland, Sudeep Holla,
	Will Deacon, Steven Price, Suzuki K Poulose
In-Reply-To: <20260527100233.428018-1-aneesh.kumar@kernel.org>

The Arm CCA guest TSM provider currently binds through the arm-cca-dev
platform device. Like arm-smccc-trng, this device is not an independent
platform resource; it is a software representation of the RSI firmware
service discovered through SMCCC.

Move RSI discovery into the SMCCC firmware driver. When the SMCCC conduit
is SMC and the RSI ABI version check succeeds, create an arm-rsi-dev SMCCC
device. Convert the Arm CCA guest TSM provider to an SMCCC driver so it
binds to that discovered RSI service and keeps module autoloading through
the SMCCC device id table.

Keep the old arm-cca-dev platform-device registration for now. Userspace
has used that device as a Realm-guest indicator, so removing it is left to
a follow-up patch that adds a replacement sysfs ABI.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 arch/arm64/include/asm/rsi.h                  |  2 +-
 arch/arm64/kernel/rsi.c                       |  2 +-
 drivers/firmware/smccc/Makefile               |  4 ++
 drivers/firmware/smccc/rmm.c                  | 25 ++++++++
 drivers/firmware/smccc/rmm.h                  | 17 ++++++
 drivers/firmware/smccc/smccc.c                |  8 +++
 drivers/virt/coco/arm-cca-guest/Kconfig       |  1 +
 drivers/virt/coco/arm-cca-guest/Makefile      |  2 +
 .../{arm-cca-guest.c => arm-cca.c}            | 60 +++++++++----------
 9 files changed, 89 insertions(+), 32 deletions(-)
 create mode 100644 drivers/firmware/smccc/rmm.c
 create mode 100644 drivers/firmware/smccc/rmm.h
 rename drivers/virt/coco/arm-cca-guest/{arm-cca-guest.c => arm-cca.c} (85%)

diff --git a/arch/arm64/include/asm/rsi.h b/arch/arm64/include/asm/rsi.h
index 88b50d660e85..2d2d363aaaee 100644
--- a/arch/arm64/include/asm/rsi.h
+++ b/arch/arm64/include/asm/rsi.h
@@ -10,7 +10,7 @@
 #include <linux/jump_label.h>
 #include <asm/rsi_cmds.h>
 
-#define RSI_PDEV_NAME "arm-cca-dev"
+#define RSI_DEV_NAME "arm-rsi-dev"
 
 DECLARE_STATIC_KEY_FALSE(rsi_present);
 
diff --git a/arch/arm64/kernel/rsi.c b/arch/arm64/kernel/rsi.c
index 92160f2e57ff..da440f71bb64 100644
--- a/arch/arm64/kernel/rsi.c
+++ b/arch/arm64/kernel/rsi.c
@@ -161,7 +161,7 @@ void __init arm64_rsi_init(void)
 }
 
 static struct platform_device rsi_dev = {
-	.name = RSI_PDEV_NAME,
+	.name = "arm-cca-dev",
 	.id = PLATFORM_DEVID_NONE
 };
 
diff --git a/drivers/firmware/smccc/Makefile b/drivers/firmware/smccc/Makefile
index 40d19144a860..33c850aaff4d 100644
--- a/drivers/firmware/smccc/Makefile
+++ b/drivers/firmware/smccc/Makefile
@@ -2,3 +2,7 @@
 #
 obj-$(CONFIG_HAVE_ARM_SMCCC_DISCOVERY)	+= smccc.o kvm_guest.o
 obj-$(CONFIG_ARM_SMCCC_SOC_ID)	+= soc_id.o
+
+ifeq ($(CONFIG_HAVE_ARM_SMCCC_DISCOVERY),y)
+obj-$(CONFIG_ARM64) += rmm.o
+endif
diff --git a/drivers/firmware/smccc/rmm.c b/drivers/firmware/smccc/rmm.c
new file mode 100644
index 000000000000..d572f47e955c
--- /dev/null
+++ b/drivers/firmware/smccc/rmm.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2026 Arm Limited
+ */
+
+#include <linux/arm-smccc-bus.h>
+#include <linux/err.h>
+#include <linux/printk.h>
+
+#include "rmm.h"
+
+void __init register_rsi_device(void)
+{
+	unsigned long ret;
+
+	if (arm_smccc_1_1_get_conduit() != SMCCC_CONDUIT_SMC)
+		return;
+
+	ret = rsi_request_version(RSI_ABI_VERSION, NULL, NULL);
+	if (ret != RSI_SUCCESS)
+		return;
+
+	if (IS_ERR(arm_smccc_device_register(RSI_DEV_NAME)))
+		pr_err("%s: could not register device\n", RSI_DEV_NAME);
+}
diff --git a/drivers/firmware/smccc/rmm.h b/drivers/firmware/smccc/rmm.h
new file mode 100644
index 000000000000..627098e2ae1f
--- /dev/null
+++ b/drivers/firmware/smccc/rmm.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _SMCCC_RMM_H
+#define _SMCCC_RMM_H
+
+#include <linux/init.h>
+
+#ifdef CONFIG_ARM64
+#include <linux/arm-smccc-bus.h>
+#include <asm/rsi_cmds.h>
+void __init register_rsi_device(void);
+#else
+
+static inline void __init register_rsi_device(void)
+{
+}
+#endif
+#endif
diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
index 6d260354d0f9..888e7f1d6f86 100644
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -15,6 +15,8 @@
 
 #include <asm/archrandom.h>
 
+#include "rmm.h"
+
 static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
 static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
 static DEFINE_IDA(arm_smccc_bus_id);
@@ -240,6 +242,12 @@ subsys_initcall(arm_smccc_bus_init);
 
 static int __init smccc_devices_init(void)
 {
+	/*
+	 * Register the RMI and RSI devices only when firmware exposes
+	 * the required SMCCC function IDs at a supported revision.
+	 */
+	register_rsi_device();
+
 	if (smccc_trng_available) {
 		struct arm_smccc_device *sdev;
 
diff --git a/drivers/virt/coco/arm-cca-guest/Kconfig b/drivers/virt/coco/arm-cca-guest/Kconfig
index 3f0f013f03f1..ad7538750c5a 100644
--- a/drivers/virt/coco/arm-cca-guest/Kconfig
+++ b/drivers/virt/coco/arm-cca-guest/Kconfig
@@ -1,6 +1,7 @@
 config ARM_CCA_GUEST
 	tristate "Arm CCA Guest driver"
 	depends on ARM64
+	depends on HAVE_ARM_SMCCC_DISCOVERY
 	select TSM_REPORTS
 	help
 	  The driver provides userspace interface to request and
diff --git a/drivers/virt/coco/arm-cca-guest/Makefile b/drivers/virt/coco/arm-cca-guest/Makefile
index 69eeba08e98a..75a120e24fda 100644
--- a/drivers/virt/coco/arm-cca-guest/Makefile
+++ b/drivers/virt/coco/arm-cca-guest/Makefile
@@ -1,2 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0-only
 obj-$(CONFIG_ARM_CCA_GUEST) += arm-cca-guest.o
+
+arm-cca-guest-y +=  arm-cca.o
diff --git a/drivers/virt/coco/arm-cca-guest/arm-cca-guest.c b/drivers/virt/coco/arm-cca-guest/arm-cca.c
similarity index 85%
rename from drivers/virt/coco/arm-cca-guest/arm-cca-guest.c
rename to drivers/virt/coco/arm-cca-guest/arm-cca.c
index 66d00b6ceb78..8d5a09bd772a 100644
--- a/drivers/virt/coco/arm-cca-guest/arm-cca-guest.c
+++ b/drivers/virt/coco/arm-cca-guest/arm-cca.c
@@ -4,6 +4,7 @@
  */
 
 #include <linux/arm-smccc.h>
+#include <linux/arm-smccc-bus.h>
 #include <linux/cc_platform.h>
 #include <linux/kernel.h>
 #include <linux/mod_devicetable.h>
@@ -182,52 +183,51 @@ static int arm_cca_report_new(struct tsm_report *report, void *data)
 	return ret;
 }
 
-static const struct tsm_report_ops arm_cca_tsm_ops = {
+static const struct tsm_report_ops arm_cca_tsm_report_ops = {
 	.name = KBUILD_MODNAME,
 	.report_new = arm_cca_report_new,
 };
 
-/**
- * arm_cca_guest_init - Register with the Trusted Security Module (TSM)
- * interface.
- *
- * Return:
- * * %0        - Registered successfully with the TSM interface.
- * * %-ENODEV  - The execution context is not an Arm Realm.
- * * %-EBUSY   - Already registered.
- */
-static int __init arm_cca_guest_init(void)
+static void unregister_cca_tsm_report(void *data)
+{
+	tsm_report_unregister(&arm_cca_tsm_report_ops);
+}
+
+static int cca_tsm_probe(struct arm_smccc_device *sdev)
 {
 	int ret;
 
 	if (!is_realm_world())
 		return -ENODEV;
 
-	ret = tsm_report_register(&arm_cca_tsm_ops, NULL);
-	if (ret < 0)
-		pr_err("Error %d registering with TSM\n", ret);
+	ret = tsm_report_register(&arm_cca_tsm_report_ops, NULL);
+	if (ret < 0) {
+		dev_err_probe(&sdev->dev, ret, "Error registering with TSM\n");
+		return ret;
+	}
 
-	return ret;
-}
-module_init(arm_cca_guest_init);
+	ret = devm_add_action_or_reset(&sdev->dev, unregister_cca_tsm_report,
+				       NULL);
+	if (ret < 0) {
+		dev_err_probe(&sdev->dev, ret, "Error registering devm action\n");
+		return ret;
+	}
 
-/**
- * arm_cca_guest_exit - unregister with the Trusted Security Module (TSM)
- * interface.
- */
-static void __exit arm_cca_guest_exit(void)
-{
-	tsm_report_unregister(&arm_cca_tsm_ops);
+	return 0;
 }
-module_exit(arm_cca_guest_exit);
 
-/* modalias, so userspace can autoload this module when RSI is available */
-static const struct platform_device_id arm_cca_match[] __maybe_unused = {
-	{ RSI_PDEV_NAME, 0},
-	{ }
+static const struct arm_smccc_device_id cca_tsm_id_table[] = {
+	{ .name = RSI_DEV_NAME },
+	{}
 };
+MODULE_DEVICE_TABLE(arm_smccc, cca_tsm_id_table);
 
-MODULE_DEVICE_TABLE(platform, arm_cca_match);
+static struct arm_smccc_driver cca_tsm_driver = {
+	.name = KBUILD_MODNAME,
+	.probe = cca_tsm_probe,
+	.id_table = cca_tsm_id_table,
+};
+module_arm_smccc_driver(cca_tsm_driver);
 MODULE_AUTHOR("Sami Mujawar <sami.mujawar@arm.com>");
 MODULE_DESCRIPTION("Arm CCA Guest TSM Driver");
 MODULE_LICENSE("GPL");
-- 
2.43.0



* [PATCH v6 2/4] firmware: hwrng: arm_smccc_trng: Register as an SMCCC device
From: Aneesh Kumar K.V (Arm) @ 2026-05-27 10:02 UTC (permalink / raw)
  To: linux-coco, linux-arm-kernel, linux-kernel
  Cc: Aneesh Kumar K.V (Arm), Catalin Marinas, Greg KH, Jeremy Linton,
	Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland, Sudeep Holla,
	Will Deacon, Steven Price, Suzuki K Poulose
In-Reply-To: <20260527100233.428018-1-aneesh.kumar@kernel.org>

The SMCCC TRNG interface is a firmware-provided SMCCC service rather than a
standalone platform device. Now that the SMCCC core has an SMCCC bus,
create an arm-smccc-trng device for the discovered TRNG service and convert
the hwrng driver to an SMCCC driver.

The SMCCC id table preserves module autoloading for systems where the TRNG
driver is built as a module.

The sysfs device path changes from the old smccc_trng platform-device path
to an arm-smccc device path. No known userspace dependency on the old path
was found; a Debian Code Search lookup for the existing platform-device
name/path did not find any users.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 drivers/char/hw_random/arm_smccc_trng.c | 26 ++++++++++++++-----------
 drivers/firmware/smccc/smccc.c          | 14 ++++++-------
 2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/char/hw_random/arm_smccc_trng.c b/drivers/char/hw_random/arm_smccc_trng.c
index dcb8e7f37f25..d273e6fc7129 100644
--- a/drivers/char/hw_random/arm_smccc_trng.c
+++ b/drivers/char/hw_random/arm_smccc_trng.c
@@ -16,8 +16,8 @@
 #include <linux/device.h>
 #include <linux/hw_random.h>
 #include <linux/module.h>
-#include <linux/platform_device.h>
 #include <linux/arm-smccc.h>
+#include <linux/arm-smccc-bus.h>
 
 #ifdef CONFIG_ARM64
 #define ARM_SMCCC_TRNG_RND	ARM_SMCCC_TRNG_RND64
@@ -94,29 +94,33 @@ static int smccc_trng_read(struct hwrng *rng, void *data, size_t max, bool wait)
 	return copied;
 }
 
-static int smccc_trng_probe(struct platform_device *pdev)
+static int smccc_trng_probe(struct arm_smccc_device *sdev)
 {
 	struct hwrng *trng;
 
-	trng = devm_kzalloc(&pdev->dev, sizeof(*trng), GFP_KERNEL);
+	trng = devm_kzalloc(&sdev->dev, sizeof(*trng), GFP_KERNEL);
 	if (!trng)
 		return -ENOMEM;
 
 	trng->name = "smccc_trng";
 	trng->read = smccc_trng_read;
 
-	return devm_hwrng_register(&pdev->dev, trng);
+	return devm_hwrng_register(&sdev->dev, trng);
 }
 
-static struct platform_driver smccc_trng_driver = {
-	.driver = {
-		.name		= "smccc_trng",
-	},
-	.probe		= smccc_trng_probe,
+static const struct arm_smccc_device_id smccc_trng_id_table[] = {
+	{ .name = "arm-smccc-trng" },
+	{}
 };
-module_platform_driver(smccc_trng_driver);
+MODULE_DEVICE_TABLE(arm_smccc, smccc_trng_id_table);
+
+static struct arm_smccc_driver smccc_trng_driver = {
+	.name	  = KBUILD_MODNAME,
+	.probe	  = smccc_trng_probe,
+	.id_table = smccc_trng_id_table,
+};
+module_arm_smccc_driver(smccc_trng_driver);
 
-MODULE_ALIAS("platform:smccc_trng");
 MODULE_AUTHOR("Andre Przywara");
 MODULE_DESCRIPTION("Arm SMCCC TRNG firmware interface support");
 MODULE_LICENSE("GPL");
diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
index 695c920a8087..6d260354d0f9 100644
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -9,7 +9,6 @@
 #include <linux/init.h>
 #include <linux/arm-smccc.h>
 #include <linux/kernel.h>
-#include <linux/platform_device.h>
 #include <linux/arm-smccc-bus.h>
 #include <linux/idr.h>
 #include <linux/slab.h>
@@ -241,14 +240,13 @@ subsys_initcall(arm_smccc_bus_init);
 
 static int __init smccc_devices_init(void)
 {
-	struct platform_device *pdev;
-
 	if (smccc_trng_available) {
-		pdev = platform_device_register_simple("smccc_trng", -1,
-						       NULL, 0);
-		if (IS_ERR(pdev))
-			pr_err("smccc_trng: could not register device: %ld\n",
-			       PTR_ERR(pdev));
+		struct arm_smccc_device *sdev;
+
+		sdev = arm_smccc_device_register("arm-smccc-trng");
+		if (IS_ERR(sdev))
+			pr_err("arm-smccc-trng: could not register device: %ld\n",
+			       PTR_ERR(sdev));
 	}
 
 	return 0;
-- 
2.43.0



* [PATCH v6 1/4] firmware: smccc: Add an Arm SMCCC bus
From: Aneesh Kumar K.V (Arm) @ 2026-05-27 10:02 UTC (permalink / raw)
  To: linux-coco, linux-arm-kernel, linux-kernel
  Cc: Aneesh Kumar K.V (Arm), Catalin Marinas, Greg KH, Jeremy Linton,
	Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland, Sudeep Holla,
	Will Deacon, Steven Price, Suzuki K Poulose
In-Reply-To: <20260527100233.428018-1-aneesh.kumar@kernel.org>

SMCCC-discovered firmware services are currently represented by separate
platform devices, such as smccc_trng and arm-cca-dev. Those devices do not
represent independent DT/ACPI-described platform resources; they are
features of the SMCCC firmware interface.

Add an Arm SMCCC bus for services discovered through the SMCCC firmware
interface. The bus provides SMCCC device and driver registration helpers,
name-based matching, modalias generation, and a sysfs modalias attribute so
SMCCC service drivers can bind to discovered firmware services and autoload
as modules.
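
The name-based matching and modalias generation described above can be
sketched in plain userspace C. This mirrors the shape of the match loop in
this patch but is not the kernel code: struct svc_id, svc_match(),
svc_modalias() and example_ids are hypothetical names, and the two service
names are taken from later patches in this series.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_SIZE 40	/* same size as ARM_SMCCC_NAME_SIZE in this patch */

struct svc_id {
	char name[NAME_SIZE];
};

/* Walk a table terminated by an empty name; 1 on match, 0 otherwise. */
static int svc_match(const char *dev_name, const struct svc_id *id_table)
{
	assert(dev_name != NULL);

	if (!id_table)
		return 0;

	for (; id_table->name[0]; id_table++)
		if (!strcmp(dev_name, id_table->name))
			return 1;

	return 0;
}

/* Build the modalias string: bus prefix followed by the device name. */
static int svc_modalias(char *buf, size_t len, const char *dev_name)
{
	return snprintf(buf, len, "arm_smccc:%s", dev_name);
}

static const struct svc_id example_ids[] = {
	{ .name = "arm-smccc-trng" },
	{ .name = "arm-rsi-dev" },
	{ .name = "" },	/* terminator */
};
```

Because the modalias is derived from the same name used for matching, a
module whose id table lists that name can be autoloaded when the device
appears.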

Follow-up changes can then register SMCCC firmware services as arm-smccc
devices instead of creating independent per-feature platform devices.

Based on the arm_ffa code.

Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
---
 drivers/firmware/smccc/smccc.c    | 158 ++++++++++++++++++++++++++++++
 include/linux/arm-smccc-bus.h     |  49 +++++++++
 include/linux/mod_devicetable.h   |  13 +++
 scripts/mod/devicetable-offsets.c |   3 +
 scripts/mod/file2alias.c          |   8 ++
 5 files changed, 231 insertions(+)
 create mode 100644 include/linux/arm-smccc-bus.h

diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
index bdee057db2fd..695c920a8087 100644
--- a/drivers/firmware/smccc/smccc.c
+++ b/drivers/firmware/smccc/smccc.c
@@ -10,10 +10,15 @@
 #include <linux/arm-smccc.h>
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
+#include <linux/arm-smccc-bus.h>
+#include <linux/idr.h>
+#include <linux/slab.h>
+
 #include <asm/archrandom.h>
 
 static u32 smccc_version = ARM_SMCCC_VERSION_1_0;
 static enum arm_smccc_conduit smccc_conduit = SMCCC_CONDUIT_NONE;
+static DEFINE_IDA(arm_smccc_bus_id);
 
 bool __ro_after_init smccc_trng_available = false;
 s32 __ro_after_init smccc_soc_id_version = SMCCC_RET_NOT_SUPPORTED;
@@ -81,6 +86,159 @@ bool arm_smccc_hypervisor_has_uuid(const uuid_t *hyp_uuid)
 }
 EXPORT_SYMBOL_GPL(arm_smccc_hypervisor_has_uuid);
 
+static int arm_smccc_bus_match(struct device *dev,
+		const struct device_driver *drv)
+{
+	const struct arm_smccc_device_id *id_table;
+	struct arm_smccc_device *smccc_dev = to_arm_smccc_device(dev);
+
+	id_table = to_arm_smccc_driver(drv)->id_table;
+	if (!id_table)
+		return 0;
+
+	while (id_table->name[0]) {
+		if (!strcmp(smccc_dev->name, id_table->name))
+			return 1;
+		id_table++;
+	}
+
+	return 0;
+}
+
+static int arm_smccc_bus_probe(struct device *dev)
+{
+	struct arm_smccc_driver *smccc_drv = to_arm_smccc_driver(dev->driver);
+
+	return smccc_drv->probe(to_arm_smccc_device(dev));
+}
+
+static void arm_smccc_bus_remove(struct device *dev)
+{
+	struct arm_smccc_driver *smcc_drv = to_arm_smccc_driver(dev->driver);
+
+	if (smcc_drv->remove)
+		smcc_drv->remove(to_arm_smccc_device(dev));
+}
+
+static int arm_smccc_bus_uevent(const struct device *dev,
+		struct kobj_uevent_env *env)
+{
+	const struct arm_smccc_device *smccc_dev = to_arm_smccc_device(dev);
+
+	return add_uevent_var(env, "MODALIAS=" ARM_SMCCC_MODULE_PREFIX "%s",
+			      smccc_dev->name);
+}
+
+static ssize_t modalias_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct arm_smccc_device *smccc_dev = to_arm_smccc_device(dev);
+
+	return sysfs_emit(buf, ARM_SMCCC_MODULE_PREFIX "%s\n", smccc_dev->name);
+}
+static DEVICE_ATTR_RO(modalias);
+
+static struct attribute *arm_smccc_device_attrs[] = {
+	&dev_attr_modalias.attr,
+	NULL,
+};
+ATTRIBUTE_GROUPS(arm_smccc_device);
+
+const struct bus_type arm_smccc_bus_type = {
+	.name = "arm_smccc",
+	.match = arm_smccc_bus_match,
+	.probe = arm_smccc_bus_probe,
+	.remove = arm_smccc_bus_remove,
+	.uevent = arm_smccc_bus_uevent,
+	.dev_groups = arm_smccc_device_groups,
+};
+EXPORT_SYMBOL_GPL(arm_smccc_bus_type);
+
+int arm_smccc_driver_register(struct arm_smccc_driver *driver,
+		struct module *owner, const char *mod_name)
+{
+	if (!driver->probe)
+		return -EINVAL;
+
+	driver->driver.bus = &arm_smccc_bus_type;
+	driver->driver.name = driver->name;
+	driver->driver.owner = owner;
+	driver->driver.mod_name = mod_name;
+
+	return driver_register(&driver->driver);
+}
+EXPORT_SYMBOL_GPL(arm_smccc_driver_register);
+
+void arm_smccc_driver_unregister(struct arm_smccc_driver *driver)
+{
+	driver_unregister(&driver->driver);
+}
+EXPORT_SYMBOL_GPL(arm_smccc_driver_unregister);
+
+static void arm_smccc_release_device(struct device *dev)
+{
+	struct arm_smccc_device *smccc_dev = to_arm_smccc_device(dev);
+
+	ida_free(&arm_smccc_bus_id, smccc_dev->id);
+	kfree(smccc_dev);
+}
+
+struct arm_smccc_device *arm_smccc_device_register(const char *name)
+{
+	struct arm_smccc_device *smccc_dev;
+	int id, ret;
+
+	id = ida_alloc_min(&arm_smccc_bus_id, 1, GFP_KERNEL);
+	if (id < 0)
+		return ERR_PTR(id);
+
+	smccc_dev = kzalloc_obj(*smccc_dev);
+	if (!smccc_dev) {
+		ida_free(&arm_smccc_bus_id, id);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	smccc_dev->id = id;
+	if (strscpy(smccc_dev->name, name) < 0) {
+		kfree(smccc_dev);
+		ida_free(&arm_smccc_bus_id, id);
+		return ERR_PTR(-EINVAL);
+	}
+	smccc_dev->dev.bus = &arm_smccc_bus_type;
+	smccc_dev->dev.release = arm_smccc_release_device;
+
+	ret = dev_set_name(&smccc_dev->dev, "%s-%d", smccc_dev->name, id);
+	if (ret) {
+		kfree(smccc_dev);
+		ida_free(&arm_smccc_bus_id, id);
+		return ERR_PTR(ret);
+	}
+
+	ret = device_register(&smccc_dev->dev);
+	if (ret) {
+		put_device(&smccc_dev->dev);
+		return ERR_PTR(ret);
+	}
+
+	return smccc_dev;
+}
+EXPORT_SYMBOL_GPL(arm_smccc_device_register);
+
+void arm_smccc_device_unregister(struct arm_smccc_device *smccc_dev)
+{
+	if (!smccc_dev)
+		return;
+
+	device_unregister(&smccc_dev->dev);
+}
+EXPORT_SYMBOL_GPL(arm_smccc_device_unregister);
+
+static int __init arm_smccc_bus_init(void)
+{
+	return bus_register(&arm_smccc_bus_type);
+}
+subsys_initcall(arm_smccc_bus_init);
+
 static int __init smccc_devices_init(void)
 {
 	struct platform_device *pdev;
diff --git a/include/linux/arm-smccc-bus.h b/include/linux/arm-smccc-bus.h
new file mode 100644
index 000000000000..188891441e57
--- /dev/null
+++ b/include/linux/arm-smccc-bus.h
@@ -0,0 +1,49 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2026 Arm Limited
+ */
+#ifndef __LINUX_ARM_SMCCC_BUS_H
+#define __LINUX_ARM_SMCCC_BUS_H
+
+#include <linux/device.h>
+#include <linux/mod_devicetable.h>
+#include <linux/module.h>
+
+struct arm_smccc_device {
+	int id;
+	char name[ARM_SMCCC_NAME_SIZE];
+	struct device dev;
+};
+
+#define to_arm_smccc_device(d) container_of(d, struct arm_smccc_device, dev)
+
+struct arm_smccc_driver {
+	const char *name;
+	int (*probe)(struct arm_smccc_device *sdev);
+	void (*remove)(struct arm_smccc_device *sdev);
+	const struct arm_smccc_device_id *id_table;
+
+	struct device_driver driver;
+};
+
+#define to_arm_smccc_driver(d) \
+	container_of_const(d, struct arm_smccc_driver, driver)
+
+int arm_smccc_driver_register(struct arm_smccc_driver *driver,
+		struct module *owner, const char *mod_name);
+void arm_smccc_driver_unregister(struct arm_smccc_driver *driver);
+struct arm_smccc_device *arm_smccc_device_register(const char *name);
+void arm_smccc_device_unregister(struct arm_smccc_device *smccc_dev);
+
+#define arm_smccc_register(driver) \
+	arm_smccc_driver_register(driver, THIS_MODULE, KBUILD_MODNAME)
+#define arm_smccc_unregister(driver) \
+	arm_smccc_driver_unregister(driver)
+
+#define module_arm_smccc_driver(__arm_smccc_driver) \
+	module_driver(__arm_smccc_driver, arm_smccc_register, \
+		      arm_smccc_unregister)
+
+extern const struct bus_type arm_smccc_bus_type;
+
+#endif /* __LINUX_ARM_SMCCC_BUS_H */
diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h
index 23ff24080dfd..c9cee8c5a0b2 100644
--- a/include/linux/mod_devicetable.h
+++ b/include/linux/mod_devicetable.h
@@ -876,6 +876,19 @@ struct auxiliary_device_id {
 	kernel_ulong_t driver_data;
 };
 
+#define ARM_SMCCC_NAME_SIZE 40
+#define ARM_SMCCC_MODULE_PREFIX "arm_smccc:"
+
+/**
+ * struct arm_smccc_device_id - Arm SMCCC bus device identifier
+ * @name: SMCCC device name
+ * @driver_data: driver data
+ */
+struct arm_smccc_device_id {
+	char name[ARM_SMCCC_NAME_SIZE];
+	kernel_ulong_t driver_data;
+};
+
 /* Surface System Aggregator Module */
 
 #define SSAM_MATCH_TARGET	0x1
diff --git a/scripts/mod/devicetable-offsets.c b/scripts/mod/devicetable-offsets.c
index b4178c42d08f..a485011ff137 100644
--- a/scripts/mod/devicetable-offsets.c
+++ b/scripts/mod/devicetable-offsets.c
@@ -254,6 +254,9 @@ int main(void)
 	DEVID(auxiliary_device_id);
 	DEVID_FIELD(auxiliary_device_id, name);
 
+	DEVID(arm_smccc_device_id);
+	DEVID_FIELD(arm_smccc_device_id, name);
+
 	DEVID(ssam_device_id);
 	DEVID_FIELD(ssam_device_id, match_flags);
 	DEVID_FIELD(ssam_device_id, domain);
diff --git a/scripts/mod/file2alias.c b/scripts/mod/file2alias.c
index 4e99393a35f1..0ce4fb049711 100644
--- a/scripts/mod/file2alias.c
+++ b/scripts/mod/file2alias.c
@@ -1296,6 +1296,13 @@ static void do_auxiliary_entry(struct module *mod, void *symval)
 	module_alias_printf(mod, false, AUXILIARY_MODULE_PREFIX "%s", *name);
 }
 
+static void do_arm_smccc_entry(struct module *mod, void *symval)
+{
+	DEF_FIELD_ADDR(symval, arm_smccc_device_id, name);
+
+	module_alias_printf(mod, false, ARM_SMCCC_MODULE_PREFIX "%s", *name);
+}
+
 /*
  * Looks like: ssam:dNcNtNiNfN
  *
@@ -1466,6 +1473,7 @@ static const struct devtable devtable[] = {
 	{"mhi", SIZE_mhi_device_id, do_mhi_entry},
 	{"mhi_ep", SIZE_mhi_device_id, do_mhi_ep_entry},
 	{"auxiliary", SIZE_auxiliary_device_id, do_auxiliary_entry},
+	{"arm_smccc", SIZE_arm_smccc_device_id, do_arm_smccc_entry},
 	{"ssam", SIZE_ssam_device_id, do_ssam_entry},
 	{"dfl", SIZE_dfl_device_id, do_dfl_entry},
 	{"ishtp", SIZE_ishtp_device_id, do_ishtp_entry},
-- 
2.43.0



* [PATCH v6 0/4] Switch Arm SMCCC firmware services to an SMCCC bus
From: Aneesh Kumar K.V (Arm) @ 2026-05-27 10:02 UTC (permalink / raw)
  To: linux-coco, linux-arm-kernel, linux-kernel
  Cc: Aneesh Kumar K.V (Arm), Catalin Marinas, Greg KH, Jeremy Linton,
	Jonathan Cameron, Lorenzo Pieralisi, Mark Rutland, Sudeep Holla,
	Will Deacon, Steven Price, Suzuki K Poulose

As discussed here:
https://lore.kernel.org/all/20250728135216.48084-12-aneesh.kumar@kernel.org

The earlier CCA guest support used an arm-cca-dev platform device as a pure
software anchor for the TSM class device. That platform device did not
correspond to a DT/ACPI described device, MMIO range, interrupt, or other
platform resource; it existed only to make the CCA guest driver bind and to
place the resulting TSM device in the driver model. The same pattern also
exists for smccc_trng. Creating separate platform devices for such
SMCCC-discovered features is misleading, because those features are not
independent platform devices.

This series adds an Arm SMCCC bus for services discovered through the SMCCC
firmware interface. The bus provides SMCCC device and driver registration
helpers, name-based matching, uevent modalias generation, and a sysfs modalias
attribute. SMCCC service drivers can use MODULE_DEVICE_TABLE(arm_smccc, ...)
to emit arm_smccc:<name> aliases, allowing userspace to autoload service
drivers when the SMCCC core registers matching firmware-service devices.
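
For illustration only, name-based matching against an id table could look
roughly like the userspace sketch below. The bus match callback itself is
not shown in this part of the thread, so smccc_match_id(), struct smccc_id,
and the "example-*" names are hypothetical stand-ins; only the layout of the
id entry (a fixed-size name plus driver_data) follows the patch.

```c
#include <stddef.h>
#include <string.h>

#define ARM_SMCCC_NAME_SIZE 40

/* Userspace stand-in that mirrors struct arm_smccc_device_id. */
struct smccc_id {
	char name[ARM_SMCCC_NAME_SIZE];
	unsigned long driver_data;
};

/*
 * Hypothetical helper: walk an id table terminated by an entry with an
 * empty name and return the entry whose name equals dev_name, or NULL.
 */
const struct smccc_id *smccc_match_id(const struct smccc_id *table,
				      const char *dev_name)
{
	if (!table || !dev_name)
		return NULL;

	for (; table->name[0]; table++) {
		if (!strcmp(table->name, dev_name))
			return table;
	}

	return NULL;
}

/* Example table; the names are illustrative, not the ones in the series. */
const struct smccc_id example_ids[] = {
	{ .name = "example-trng", .driver_data = 1 },
	{ .name = "example-rsi",  .driver_data = 2 },
	{ .name = "" }	/* terminator */
};
```

A driver would bind when its table contains the device name, and the same
name is what an arm_smccc:<name> alias would carry for autoloading.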

The series then moves SMCCC TRNG and the Arm CCA guest RSI service off the
platform bus. When the SMCCC core discovers the corresponding firmware
service, it registers an arm-smccc device for that service. The hwrng
arm_smccc_trng driver and the Arm CCA guest TSM provider are converted to
SMCCC drivers that bind to those discovered devices.

The old arm-cca-dev platform device has also been used by userspace as a Realm
guest indicator. Removing it without a replacement would leave userspace
depending on an internal driver-binding device. This series therefore adds
/sys/firmware/cca/realm_guest as a stable, architecture-provided ABI for
detecting whether the kernel is running as an Arm CCA Realm guest, and then
removes the dummy arm-cca-dev platform-device registration.
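
As a rough sketch of how userspace might consume the new attribute: the
helper below reads a flag file and reports whether it starts with '1'. The
content format of /sys/firmware/cca/realm_guest is not spelled out in this
cover letter, so treating "1" as "running as a Realm guest" is an
assumption, and read_flag_file() and is_realm_guest() are hypothetical
names.

```c
#include <stdio.h>

/*
 * Returns 1 if the first character of the file is '1', 0 for any other
 * content, and -1 if the file cannot be opened or is empty.
 */
int read_flag_file(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;

	c = fgetc(f);
	fclose(f);

	if (c == EOF)
		return -1;

	return c == '1';
}

/* Assumption: the attribute reads as "1" inside a Realm guest. */
int is_realm_guest(void)
{
	return read_flag_file("/sys/firmware/cca/realm_guest") == 1;
}
```

A missing file (older kernel, or not arm64) is reported as -1 by the helper,
so callers can tell "not a Realm" apart from "cannot tell".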

Changes from v5:
https://lore.kernel.org/all/20260514094030.42495-1-aneesh.kumar@kernel.org
* Replace the arm-smccc platform-device plus auxiliary-child model with a
  dedicated Arm SMCCC bus.
* Add SMCCC module alias support so SMCCC service drivers can use
  MODULE_DEVICE_TABLE(arm_smccc, ...) and autoload through arm_smccc:<name>
  aliases.
* Convert smccc_trng from a platform driver to an SMCCC driver.
* Convert the Arm CCA guest TSM provider from the arm-cca-dev platform device
  to an SMCCC driver bound to the discovered RSI service.
* Add /sys/firmware/cca/realm_guest before removing the old arm-cca-dev dummy
  platform device.

Changes from v4:
https://lore.kernel.org/all/20260427061615.905018-1-aneesh.kumar@kernel.org
* Add /sys/firmware/cca/realm_guest for detecting realm guest
* Convert smccc_trng to auxiliary device from platform device

Changes from v3:
https://lore.kernel.org/all/20260309100507.2303361-1-aneesh.kumar@kernel.org
* Rebased onto the latest kernel
* Drop pr_fmt() from drivers/firmware/smccc/rmm.c

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: Lorenzo Pieralisi <lpieralisi@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steven Price <steven.price@arm.com>
Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>

Aneesh Kumar K.V (Arm) (4):
  firmware: smccc: Add an Arm SMCCC bus
  firmware: hwrng: arm_smccc_trng: Register as an SMCCC device
  firmware: smccc: arm-cca-guest: Bind the TSM provider to an SMCCC
    device
  coco: guest: arm64: Replace dummy CCA device with sysfs ABI

 Documentation/ABI/testing/sysfs-firmware-cca  |  10 +
 arch/arm64/include/asm/rsi.h                  |   2 +-
 arch/arm64/kernel/rsi.c                       |  39 +++-
 drivers/char/hw_random/arm_smccc_trng.c       |  26 +--
 drivers/firmware/smccc/Makefile               |   4 +
 drivers/firmware/smccc/rmm.c                  |  25 +++
 drivers/firmware/smccc/rmm.h                  |  17 ++
 drivers/firmware/smccc/smccc.c                | 178 +++++++++++++++++-
 drivers/virt/coco/arm-cca-guest/Kconfig       |   1 +
 drivers/virt/coco/arm-cca-guest/Makefile      |   2 +
 .../{arm-cca-guest.c => arm-cca.c}            |  60 +++---
 include/linux/arm-smccc-bus.h                 |  49 +++++
 include/linux/mod_devicetable.h               |  13 ++
 scripts/mod/devicetable-offsets.c             |   3 +
 scripts/mod/file2alias.c                      |   8 +
 15 files changed, 378 insertions(+), 59 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-firmware-cca
 create mode 100644 drivers/firmware/smccc/rmm.c
 create mode 100644 drivers/firmware/smccc/rmm.h
 rename drivers/virt/coco/arm-cca-guest/{arm-cca-guest.c => arm-cca.c} (85%)
 create mode 100644 include/linux/arm-smccc-bus.h

base-commit: 50897c955902c93ae71c38698abb910525ebdc89
-- 
2.43.0


* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xiaoyao Li @ 2026-05-27  8:18 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan
In-Reply-To: <ahaeCGLPEfmYNFtC@yilunxu-OptiPlex-7050>

On 5/27/2026 3:32 PM, Xu Yilun wrote:
> On Wed, May 27, 2026 at 02:38:27PM +0800, Xiaoyao Li wrote:
>> On 5/27/2026 11:47 AM, Xu Yilun wrote:
>>>>> +static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
>>>>> +{
>>>>> +	u64 *entries = page_to_virt(root);
>>>>> +	int i;
>>>>> +
>>>>> +	for (i = 0; i < nr_pages; i++)
>>>>> +		clflush_cache_range(__va(entries[i]), PAGE_SIZE);
>>>>
>>>> Is the page flush only needed when CLFLUSH_BEFORE_ALLOC is true?
>>>>
>>>> If so, it inherits the same decision to always flush as what
>>>
>>> Yes it is basically the same as tdx_clflush_page().
>>>
>>>> tdx_clflush_page() did. Then, any chance we can use tdx_clflush_page() here
>>>
>>> But I don't think we should convert hpa/page/va back and forth just for
>>> re-using one line of code.
>>
>> Because we want/need to flush page as late as possible so that the page
>> flush needs to happen right before SEAMCALL?
> 
> I think so. Let the flushing be part of the tdh call semantic.
> 
>>
>> How about we pass in the struct page * and number into tdx_ext_mem_add() and
>> construct the root page inside it?
> 
> I assume you don't suggest allocate root page inside the call, then we
> need 3 parameters for the HPA_LIST_INFO:
> 
>    struct page *, unsigned int nr_pages, struct page *root
> 
> which I think too much.

yeah, sort of.

> I think your concern is to try not to introduce another tdx_clflush_
> variant, but I believe this will happen, pfn based memory description is
> on the way:
> 
> https://lore.kernel.org/all/20260430014929.24210-1-yan.y.zhao@intel.com/

I don't object to the tdx_clflush_hpa_list() variant, but I'd suggest
checking whether tdx_clflush_page() can be used instead of the raw
clflush_cache_range().

Maybe we can try to put tdx_clflush_hpa_list() along with 
tdx_clflush_page() and tdx_clflush_pfn()? This way, I think we can save 
the separate comment.
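
To make the idea concrete, here is a minimal userspace sketch in which the
list variant is only a loop over a single per-page helper, so the
"always flush" decision and its comment live in one place. All names
(flush_hpa_list(), flush_page_fn, struct flush_log) are hypothetical, and
the callback only records calls instead of flushing caches.

```c
#include <stdint.h>

/* Per-page flush helper; stands in for something like tdx_clflush_page(). */
typedef void (*flush_page_fn)(uint64_t hpa, void *ctx);

/* List variant: no flush policy of its own, just iterates the entries. */
void flush_hpa_list(const uint64_t *entries, unsigned int nr_pages,
		    flush_page_fn flush, void *ctx)
{
	unsigned int i;

	for (i = 0; i < nr_pages; i++)
		flush(entries[i], ctx);
}

/* Test stand-in that counts flushes and remembers the last address. */
struct flush_log {
	unsigned int count;
	uint64_t last;
};

void log_flush(uint64_t hpa, void *ctx)
{
	struct flush_log *log = ctx;

	log->count++;
	log->last = hpa;
}
```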



* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-05-27  7:32 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan
In-Reply-To: <9073ac91-3aa4-41e2-bb81-8878409498e5@intel.com>

On Wed, May 27, 2026 at 02:38:27PM +0800, Xiaoyao Li wrote:
> On 5/27/2026 11:47 AM, Xu Yilun wrote:
> > > > +static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
> > > > +{
> > > > +	u64 *entries = page_to_virt(root);
> > > > +	int i;
> > > > +
> > > > +	for (i = 0; i < nr_pages; i++)
> > > > +		clflush_cache_range(__va(entries[i]), PAGE_SIZE);
> > > 
> > > Is the page flush only needed when CLFLUSH_BEFORE_ALLOC is true?
> > > 
> > > If so, it inherits the same decision to always flush as what
> > 
> > Yes it is basically the same as tdx_clflush_page().
> > 
> > > tdx_clflush_page() did. Then, any chance we can use tdx_clflush_page() here
> > 
> > But I don't think we should convert hpa/page/va back and forth just for
> > re-using one line of code.
> 
> Because we want/need to flush page as late as possible so that the page
> flush needs to happen right before SEAMCALL?

I think so. Let the flushing be part of the tdh call semantics.

> 
> How about we pass in the struct page * and number into tdx_ext_mem_add() and
> construct the root page inside it?

I assume you aren't suggesting that we allocate the root page inside the
call, so we would need 3 parameters for the HPA_LIST_INFO:

  struct page *, unsigned int nr_pages, struct page *root

which I think is too many.

I think your concern is to avoid introducing another tdx_clflush_
variant, but I believe this will happen anyway; a pfn-based memory
description is on the way:

https://lore.kernel.org/all/20260430014929.24210-1-yan.y.zhao@intel.com/


* Re: [RFC PATCH 14/15] x86/virt/tdx: Embed version info in SEAMCALL leaf function definitions
From: Xiaoyao Li @ 2026-05-27  7:44 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan
In-Reply-To: <ahaTH+O8YGxEXSpz@yilunxu-OptiPlex-7050>

On 5/27/2026 2:45 PM, Xu Yilun wrote:
>>>    /*
>>>     * TDX module SEAMCALL leaf functions
>>>     */
>>> @@ -31,7 +44,7 @@
>>>    #define TDH_VP_CREATE			10
>>>    #define TDH_MNG_KEY_FREEID		20
>>>    #define TDH_MNG_INIT			21
>>> -#define TDH_VP_INIT			22
>>> +#define TDH_VP_INIT			SEAMCALL_LEAF_VER(22, 1)
>>
>> how about
>>
>> #define TDH_VP_INIT			22
>> #define TDH_VP_INIT_V1			SEAMCALL_LEAF_VER(TDH_VP_INIT, 1)
>>
>> and use TDH_VP_INIT_V1 below?
> 
> I'm trying to avoid a _Vx postfix if unnecessary. Don't make callers
> have to choose between versions. The main MACRO should always point to
> the latest version since later versions are backward compatible.

I don't agree.

The later versions are backwards compatible, but the later versions 
might not be supported by the loaded TDX module.

Callers will usually have to choose between versions because the TDX
module in use varies, just like the case in the next patch.

We can make TDH_VP_INIT represent v1, as this patch does, because Linux
already mandated v1 when the code was merged. So it can be made the default.
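
A small sketch of what I mean, under stated assumptions: the definition of
SEAMCALL_LEAF_VER() is not quoted in this mail, so the layout below (leaf
number in bits 15:0, version in bits 23:16) is assumed, and the EX_* names
and pick_vp_init_leaf() are made up for illustration.

```c
#include <stdint.h>

/* Assumed encoding: leaf number in bits 15:0, version in bits 23:16. */
#define EX_LEAF_VER_SHIFT	16
#define EX_LEAF_VER(leaf, ver) \
	((uint64_t)(leaf) | ((uint64_t)(ver) << EX_LEAF_VER_SHIFT))

#define EX_VP_INIT		22
#define EX_VP_INIT_V1		EX_LEAF_VER(EX_VP_INIT, 1)

/*
 * Hypothetical caller-side choice: use the newer version only when the
 * loaded module is known to support it, otherwise fall back to v0.
 */
uint64_t pick_vp_init_leaf(int module_supports_v1)
{
	return module_supports_v1 ? EX_VP_INIT_V1 : EX_VP_INIT;
}
```

With both names available, a leaf whose newer version is mandatory can keep
the unversioned name, and the fallback is needed only where module support
really differs.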

> The next patch is an exception. I've found there is no public TDX Module
> release available for TDH.SYS.CONFIG v1. I expect people just use the
> un-versioned MACRO for development, but have to keep the explicitly
> versioned _V0 macro for compatibility for now.



* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Xu Yilun @ 2026-05-27  7:11 UTC (permalink / raw)
  To: Sohil Mehta
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan, xiaoyao.li
In-Reply-To: <b25e03ad-0bf4-482f-86ec-eebc6ac03d95@intel.com>

On Tue, May 26, 2026 at 11:05:48PM -0700, Sohil Mehta wrote:
> On 5/21/2026 8:41 PM, Xu Yilun wrote:
> > Add reading of the global metadata for TDX Module Extensions.
> > 
> > TDX Module Extensions is an add-on feature enumerated by TDX_FEATURES0.
> > But for the Module's integrity, Linux requires that all features that a
> > Module advertises must have a complete, valid set of metadata, and the
> > validation must succeed at core TDX initialization time.
> > 
> > Check TDX_FEATURES0 before reading these metadata. If a feature is
> > advertised, a failure in reading associated metadata causes the entire
> > TDX initialization to fail, otherwise skip.
> > 
> > Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> > ---
> >  arch/x86/include/asm/tdx_global_metadata.h  |  6 ++++++
> >  arch/x86/virt/vmx/tdx/tdx.h                 |  1 +
> >  arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 16 ++++++++++++++++
> 
> The top comments in tdx_global_metadata.h and tdx_global_metadata.c say
> that these files are autogenerated. I believe the script lives outside
> the tree. Is there a plan to merge the script?

No, the plan to auto-generate these files has been dropped. We have switched
to manual updates.

> 
> The generated code is optimized for space instead of readability. Also,
> I see odd uncommented assignments u64 => u8/u16 all over the file. I am
> assuming the upper bits are expected to be zero.
> 
> The patch is hard to review without the script. Can you post a link to

Yes, it is. The new plan is to refactor the file in the future.

> the updated script that led to this patch?
> 
> 
> >  3 files changed, 23 insertions(+)
> > 
> > diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/asm/tdx_global_metadata.h
> > index 40689c8dc67e..533afe50a3f1 100644
> > --- a/arch/x86/include/asm/tdx_global_metadata.h
> > +++ b/arch/x86/include/asm/tdx_global_metadata.h
> > @@ -40,12 +40,18 @@ struct tdx_sys_info_td_conf {
> >  	u64 cpuid_config_values[128][2];
> >  };
> >  
> > +struct tdx_sys_info_ext {
> > +	u16 memory_pool_required_pages;
> > +	u8 ext_required;
> 
> The name ext_required seems like a boolean. It is also used like a
> boolean later.
> 	if (!tdx_sysinfo.ext.ext_required)
> 		return 0;
> 
> But, IIUC, is it actually a mask that lists any feature that needs

No, it is just a bool indicating whether Extensions need to be initialized.

> extensions to work correctly? If so, it would be good to give it a name
> that reflects its usage. Maybe:
> features_requiring_ext or something better
> 
> As Xiaoyao mentioned, the struct requires a better explanation in the
> commit log.

Will do. I also plan to change the patch organization: instead of the
old auto-generated patch splitting style, I will switch to a human-readable
style and fold these metadata readings directly into the patches that
actually use them (e.g., DPAMT and TDX Runtime Update).


* Re: [RFC PATCH 14/15] x86/virt/tdx: Embed version info in SEAMCALL leaf function definitions
From: Xu Yilun @ 2026-05-27  6:45 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan
In-Reply-To: <90a4835f-bc11-4415-a7b6-84347f40861b@intel.com>

> >   /*
> >    * TDX module SEAMCALL leaf functions
> >    */
> > @@ -31,7 +44,7 @@
> >   #define TDH_VP_CREATE			10
> >   #define TDH_MNG_KEY_FREEID		20
> >   #define TDH_MNG_INIT			21
> > -#define TDH_VP_INIT			22
> > +#define TDH_VP_INIT			SEAMCALL_LEAF_VER(22, 1)
> 
> how about
> 
> #define TDH_VP_INIT			22
> #define TDH_VP_INIT_V1			SEAMCALL_LEAF_VER(TDH_VP_INIT, 1)
> 
> and use TDH_VP_INIT_V1 below?

I'm trying to avoid a _Vx postfix if unnecessary. Don't make callers
have to choose between versions. The main MACRO should always point to
the latest version since later versions are backward compatible.

The next patch is an exception. I've found there is no public TDX Module
release available for TDH.SYS.CONFIG v1. I expect people just use the
un-versioned MACRO for development, but have to keep the explicitly
versioned _V0 macro for compatibility for now.


* RE: [PATCH v5 5/5] iommufd/vdevice: add TSM request ioctl
From: Tian, Kevin @ 2026-05-27  6:56 UTC (permalink / raw)
  To: Dan Williams (nvidia), Alexey Kardashevskiy,
	Aneesh Kumar K.V (Arm), linux-coco@lists.linux.dev,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org
  Cc: Bjorn Helgaas, Jason Gunthorpe, Joerg Roedel, Jonathan Cameron,
	Nicolin Chen, Samuel Ortiz, Steven Price, Suzuki K Poulose,
	Will Deacon, Xu Yilun, Shameer Kolothum, Paolo Bonzini,
	Tony Krowiak, Halil Pasic, Jason Herne, Harald Freudenberger,
	Holger Dengler, Heiko Carstens, Vasily Gorbik, Alexander Gordeev,
	Christian Borntraeger, Sven Schnelle, Alex Williamson,
	Matthew Rosato, Farhan Ali, Eric Farman,
	linux-s390@vger.kernel.org
In-Reply-To: <6a168c8ea7d10_2129b2100e@djbw-dev.notmuch>

> From: Dan Williams (nvidia) <djbw@kernel.org>
> Sent: Wednesday, May 27, 2026 2:18 PM
> 
> Alexey Kardashevskiy wrote:
> >
> >
> > On 26/5/26 01:48, Aneesh Kumar K.V (Arm) wrote:
> > > +static bool iommufd_vdevice_tsm_req_scope_valid(u32 scope)
> > > +{
> > > +	if (scope > IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_LAST)
> > > +		return false;
> > > +
> > > +	switch (scope) {
> > > +	case IOMMU_VDEVICE_TSM_REQ_PCI_INFO:
> > > +	case IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE:
> > > +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ:
> > > +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE:
> >
> > This scope thing still needs clarification.
> >
> > I have 3 types of requests to fit here, all go via VM -> KVM -> QEMU ->
> IOMMUFD -> TSM.
> >
> > 1) bind/unbind TDI <- moves to CONFIG_LOCKED, this is "OP";
> > 2) start/stop TDI <- moves to RUN, this is "GR"? Right now I route it via "OP";
> > 3) enable/disable MMIO/DMA <- no TDI state change, this is "GR" but
> which scope is it here?
> 
> The scope parameter was meant to enumerate a security model for classes
> of commands that are otherwise opaque to the kernel. However, none of
> the commands we are targeting are opaque (private specification with
> unknown effect). It now turns out there is no role for @scope for
> security.

Yeah, I haven't managed to figure out that role so far. It sounds like an
unnecessary abstraction: vendor-specific code is asked to translate its
command into an opaque one, and in the end we go back to the vendor code
to decide the security scope of that opaque command.

[...]
> ...or just observe that per CC arch commands are needed to setup the VM
> so per CC arch commands are needed to marshal device assignment support
> requests.
> 
> In that case pci_tsm_req_scope becomes tsm_req_type and is just:
> 
> TSM_REQ_TYPE_CCA
> TSM_REQ_TYPE_SEV
> TSM_REQ_TYPE_TDX
> 
> I am leaning towards the latter at this point.

+1


* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xiaoyao Li @ 2026-05-27  6:38 UTC (permalink / raw)
  To: Xu Yilun
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan
In-Reply-To: <ahZpUA62NRdNrkvZ@yilunxu-OptiPlex-7050>

On 5/27/2026 11:47 AM, Xu Yilun wrote:
>>> +static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
>>> +{
>>> +	u64 *entries = page_to_virt(root);
>>> +	int i;
>>> +
>>> +	for (i = 0; i < nr_pages; i++)
>>> +		clflush_cache_range(__va(entries[i]), PAGE_SIZE);
>>
>> Is the page flush only needed when CLFLUSH_BEFORE_ALLOC is true?
>>
>> If so, it inherits the same decision to always flush as what
> 
> Yes it is basically the same as tdx_clflush_page().
> 
>> tdx_clflush_page() did. Then, any chance we can use tdx_clflush_page() here
> 
> But I don't think we should convert hpa/page/va back and forth just for
> re-using one line of code.

Is that because we want/need to flush the page as late as possible, so the
flush has to happen right before the SEAMCALL?

How about we pass in the struct page * and number into tdx_ext_mem_add() 
and construct the root page inside it?

>> so that we have a single central place of the comment to explain the kernel
>> design decision.
> 
> How about I add a comment here to connect this wrapper to
> tdx_clflush_page():
> 
> /*
>   * Unconditionally flush the pages regardless of CLFLUSH_BEFORE_ALLOC. Inherit
>   * the same decision as tdx_clflush_page().
>   */
> static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
> ...

Either works. I don't have a strong preference. Let's see if anyone
else has something to say about it.



* Re: [PATCH v5 5/5] iommufd/vdevice: add TSM request ioctl
From: Dan Williams (nvidia) @ 2026-05-27  6:17 UTC (permalink / raw)
  To: Alexey Kardashevskiy, Aneesh Kumar K.V (Arm), linux-coco, iommu,
	linux-kernel, kvm
  Cc: Bjorn Helgaas, Dan Williams, Jason Gunthorpe, Joerg Roedel,
	Jonathan Cameron, Kevin Tian, Nicolin Chen, Samuel Ortiz,
	Steven Price, Suzuki K Poulose, Will Deacon, Xu Yilun,
	Shameer Kolothum, Paolo Bonzini, Tony Krowiak, Halil Pasic,
	Jason Herne, Harald Freudenberger, Holger Dengler, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Alex Williamson, Matthew Rosato, Farhan Ali,
	Eric Farman, linux-s390
In-Reply-To: <becd865d-09a4-4ac3-b719-4a0deae2692a@amd.com>

Alexey Kardashevskiy wrote:
> 
> 
> On 26/5/26 01:48, Aneesh Kumar K.V (Arm) wrote:
> > Add IOMMU_VDEVICE_TSM_REQUEST for issuing TSM guest request/response
> > transactions against an iommufd vdevice.
> > 
> > The ioctl takes a vdevice_id plus request/response user buffers and length
> > fields, and forwards the request through tsm_guest_req() to the PCI TSM
> > backend. This provides the host-side passthrough path used by CoCo guests
> > for TSM device attestation and acceptance flows after the device has been
> > bound to TSM.
> > 
> > Also add the supporting tsm_guest_req() helper and associated TSM core
> > interface definitions.
> > 
> > Based on changes from: Alexey Kardashevskiy <aik@amd.com>
> > 
> > Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> > ---
> >   drivers/iommu/iommufd/iommufd_private.h |  6 ++
> >   drivers/iommu/iommufd/main.c            |  3 +
> >   drivers/iommu/iommufd/tsm.c             | 68 +++++++++++++++++++++
> >   drivers/virt/coco/tsm-core.c            | 39 ++++++++++++
> >   include/linux/pci-tsm.h                 |  9 +--
> >   include/linux/tsm.h                     | 25 ++++++++
> >   include/uapi/linux/iommufd.h            | 80 +++++++++++++++++++++++++
> >   7 files changed, 226 insertions(+), 4 deletions(-)
[..]
> > diff --git a/drivers/iommu/iommufd/tsm.c b/drivers/iommu/iommufd/tsm.c
> > index 09ee668dbed9..342fbdb6a6b9 100644
> > --- a/drivers/iommu/iommufd/tsm.c
> > +++ b/drivers/iommu/iommufd/tsm.c
> > @@ -60,3 +60,71 @@ int iommufd_vdevice_tsm_op_ioctl(struct iommufd_ucmd *ucmd)
> >   	iommufd_put_object(ucmd->ictx, &vdev->obj);
> >   	return rc;
> >   }
> > +
> > +static bool iommufd_vdevice_tsm_req_scope_valid(u32 scope)
> > +{
> > +	if (scope > IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_LAST)
> > +		return false;
> > +
> > +	switch (scope) {
> > +	case IOMMU_VDEVICE_TSM_REQ_PCI_INFO:
> > +	case IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE:
> > +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ:
> > +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE:
> 
> This scope thing still needs clarification.
> 
> I have 3 types of requests to fit here, all go via VM -> KVM -> QEMU -> IOMMUFD -> TSM.
> 
> 1) bind/unbind TDI <- moves to CONFIG_LOCKED, this is "OP";
> 2) start/stop TDI <- moves to RUN, this is "GR"? Right now I route it via "OP";
> 3) enable/disable MMIO/DMA <- no TDI state change, this is "GR" but which scope is it here?

The scope parameter was meant to enumerate a security model for classes
of commands that are otherwise opaque to the kernel. However, none of
the commands we are targeting are opaque (private specification with
unknown effect). It now turns out there is no role for @scope for
security.

Now a command family that iommufd can validate seems useful. As it
stands this implementation aliases command codes across TSMs. Do we
proceed with creating an actual shared command uapi for the truly shared
commands:

TSM_REQ_TYPE_DEFAULT: Commands every arch needs
TSM_REQ_READ_OBJECT
TSM_REQ_REGEN_OBJECT
TSM_REQ_OBJECT_INFO
TSM_REQ_VALIDATE_MMIO
TSM_REQ_SET_TDI_STATE

TSM_REQ_TYPE_SEV: Commands only SEV needs
TSM_REQ_SEV_ENABLE_DMA
TSM_REQ_SEV_DISABLE_DMA

...or just observe that per-CC-arch commands are needed to set up the VM,
so per-CC-arch commands are needed to marshal device assignment support
requests.

In that case pci_tsm_req_scope becomes tsm_req_type and is just:

TSM_REQ_TYPE_CCA
TSM_REQ_TYPE_SEV
TSM_REQ_TYPE_TDX

I am leaning towards the latter at this point.
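
A minimal sketch of that shape, with the caveat that nothing here is settled
uapi: the three type names come from the list above, while the numeric
values, the TSM_REQ_TYPE_MAX sentinel, and tsm_req_type_valid() are
hypothetical. The only validation iommufd would do in this model is check
that the request type is known and matches the CC arch of the bound TSM.

```c
#include <stdbool.h>
#include <stdint.h>

enum tsm_req_type {
	TSM_REQ_TYPE_CCA,
	TSM_REQ_TYPE_SEV,
	TSM_REQ_TYPE_TDX,
	TSM_REQ_TYPE_MAX	/* hypothetical sentinel, not a real type */
};

/*
 * Hypothetical check: accept a userspace-supplied type only if it is a
 * known value and equals the type of the TSM the device is bound to.
 */
bool tsm_req_type_valid(uint32_t type, enum tsm_req_type bound_tsm)
{
	if (type >= TSM_REQ_TYPE_MAX)
		return false;

	return type == (uint32_t)bound_tsm;
}
```

The request payload itself stays arch-defined; the type only selects which
arch's command space the payload belongs to.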


* Re: [PATCH 01/15] x86/virt/tdx: Read global metadata for TDX Module Extensions
From: Sohil Mehta @ 2026-05-27  6:05 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan,
	xiaoyao.li
In-Reply-To: <20260522034128.3144354-2-yilun.xu@linux.intel.com>

On 5/21/2026 8:41 PM, Xu Yilun wrote:
> Add reading of the global metadata for TDX Module Extensions.
> 
> TDX Module Extensions is an add-on feature enumerated by TDX_FEATURES0.
> But for the Module's integrity, Linux requires that all features that a
> Module advertises must have a complete, valid set of metadata, and the
> validation must succeed at core TDX initialization time.
> 
> Check TDX_FEATURES0 before reading these metadata. If a feature is
> advertised, a failure in reading associated metadata causes the entire
> TDX initialization to fail, otherwise skip.
> 
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
>  arch/x86/include/asm/tdx_global_metadata.h  |  6 ++++++
>  arch/x86/virt/vmx/tdx/tdx.h                 |  1 +
>  arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 16 ++++++++++++++++

The top comments in tdx_global_metadata.h and tdx_global_metadata.c say
that these files are autogenerated. I believe the script lives outside
the tree. Is there a plan to merge the script?

The generated code is optimized for space instead of readability. Also,
I see odd uncommented assignments u64 => u8/u16 all over the file. I am
assuming the upper bits are expected to be zero.

The patch is hard to review without the script. Can you post a link to
the updated script that led to this patch?


>  3 files changed, 23 insertions(+)
> 
> diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/asm/tdx_global_metadata.h
> index 40689c8dc67e..533afe50a3f1 100644
> --- a/arch/x86/include/asm/tdx_global_metadata.h
> +++ b/arch/x86/include/asm/tdx_global_metadata.h
> @@ -40,12 +40,18 @@ struct tdx_sys_info_td_conf {
>  	u64 cpuid_config_values[128][2];
>  };
>  
> +struct tdx_sys_info_ext {
> +	u16 memory_pool_required_pages;
> +	u8 ext_required;

The name ext_required seems like a boolean. It is also used like a
boolean later.
	if (!tdx_sysinfo.ext.ext_required)
		return 0;

But, IIUC, is it actually a mask that lists any feature that needs
extensions to work correctly? If so, it would be good to give it a name
that reflects its usage. Maybe:
features_requiring_ext or something better

As Xiaoyao mentioned, the struct requires a better explanation in the
commit log.

> +};

...

>  static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
>  {
>  	int ret = 0;
> @@ -116,5 +129,8 @@ static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
>  	ret = ret ?: get_tdx_sys_info_td_ctrl(&sysinfo->td_ctrl);
>  	ret = ret ?: get_tdx_sys_info_td_conf(&sysinfo->td_conf);
>  
> +	if (sysinfo->features.tdx_features0 & TDX_FEATURES0_EXT)

Other metadata reads aren't gated on feature checking. Is this check
manually added or autogenerated? If manually added, it should have a
code comment clarifying that.

> +		ret = ret ?: get_tdx_sys_info_ext(&sysinfo->ext);
> +
>  	return ret;
>  }
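
Something like the following userspace sketch is the kind of comment I have
in mind. It only models the behaviour described in the commit message (skip
the read when the feature is not advertised, fail initialization when it is
advertised but the read fails). The ex_* names and the bit position of
EX_FEATURES0_EXT are placeholders, not the real definitions.

```c
#include <errno.h>
#include <stdint.h>

#define EX_FEATURES0_EXT	(1ULL << 0)	/* placeholder bit position */

struct ex_sys_info {
	uint64_t features0;
	int ext_valid;
};

/* Stand-in for a metadata read; fails on request to model a bad module. */
int ex_read_ext(struct ex_sys_info *info, int simulate_failure)
{
	if (simulate_failure)
		return -EIO;

	info->ext_valid = 1;
	return 0;
}

int ex_get_sys_info(struct ex_sys_info *info, int simulate_failure)
{
	int ret = 0;

	/*
	 * Manually added gate: the Extensions metadata exists only when the
	 * module advertises the feature. Once advertised, a read failure is
	 * fatal for the whole initialization.
	 */
	if (info->features0 & EX_FEATURES0_EXT)
		ret = ex_read_ext(info, simulate_failure);

	return ret;
}
```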



* Re: [PATCH 00/15] Enable TDX Module Extensions and DICE-based TDX Quoting
From: Sohil Mehta @ 2026-05-27  5:23 UTC (permalink / raw)
  To: Xu Yilun, kas, djbw, rick.p.edgecombe, x86, peter.fang
  Cc: linux-coco, linux-kernel, kvm, yilun.xu, baolu.lu, zhenzhong.duan,
	xiaoyao.li
In-Reply-To: <20260522034128.3144354-1-yilun.xu@linux.intel.com>

Hello,

On 5/21/2026 8:41 PM, Xu Yilun wrote:

> The first 4 patches will eventually need an ack by an x86 maintainer, so
> please review with that in mind.
> 

I am looking at this from an x86 reviewer perspective with limited prior
TDX knowledge.

> == Overview ==
> 
> TDX Module introduces the "TDX Module Extensions" to support long
> running / hard-irq preemptible flows inside. This makes TDX Module
> capable of handling complex tasks through "Extension SEAMCALLs".

Can we explain a bit more about why these extensions are needed or what
would happen if the kernel didn't enable them? I ran the series through
an LLM out of curiosity. I think something along the lines below might
be a good addition to the cover letter itself.

(Please verify)

The TDX module's normal SEAMCALLs are designed to be short,
non-preemptible operations. However, some newer features (like
DICE-based TDX Quoting) require complex, potentially long-running
computations that can't complete within the tight constraints of a
single non-preemptible SEAMCALL.

The "TDX Module Extensions" solve this by introducing "Extension
SEAMCALLs" — a new class of SEAMCALLs that are:

* Long-running — they may take significant time to complete (e.g.,
cryptographic operations for attestation/quoting).

* Hard-IRQ preemptible — they can be interrupted by hardware interrupts
and later resumed, so they don't monopolize the CPU or cause
unacceptable interrupt latency.

Without this mechanism, complex operations like generating DICE
attestation quotes would either block interrupts for too long
(unacceptable for a host kernel) or wouldn't be possible inside the TDX
module at all. The Extensions give the TDX module a way to handle these
heavyweight tasks while remaining cooperative with the host's
interrupt/scheduling model.
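
As a rough illustration of the interrupt-and-resume idea described
above, here is a toy model in plain C. Everything in it is hypothetical:
the status values, struct ext_op and both functions are invented for the
sketch and do not reflect the real SEAMCALL ABI or TDX module status
codes.

```c
/* Hypothetical status codes, not real TDX module return values. */
enum ext_status {
	EXT_DONE = 0,
	EXT_INTERRUPTED = 1,
};

/* Toy long-running operation: a counter of remaining work units. */
struct ext_op {
	unsigned int steps_left;
};

/*
 * Model of one call: perform a bounded amount of work, then return.
 * EXT_INTERRUPTED means work remains and the call must be re-issued.
 */
static enum ext_status ext_seamcall_step(struct ext_op *op)
{
	if (op->steps_left > 0)
		op->steps_left--;

	return op->steps_left ? EXT_INTERRUPTED : EXT_DONE;
}

/* Re-issue the call until it completes; return the number of calls. */
static unsigned int ext_seamcall_run(struct ext_op *op)
{
	unsigned int calls = 0;
	enum ext_status st;

	do {
		st = ext_seamcall_step(op);
		calls++;
		/* A real host would get to service pending interrupts here. */
	} while (st == EXT_INTERRUPTED);

	return calls;
}
```

The point of the model is only that no single call runs unbounded: the
caller regains control between steps and decides when to resume.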

> 
> TDX Module allows some add-on features to use the Extension. 

s/Module/module throughout the series.

The existing kernel code predominantly uses the lower case TDX "module".

> The first feature to use Extensions is DICE-based TDX Quoting [1].
> DICE is an industry-standard, certificate-backed attestation
> framework that layers evidence through a chain of certificates.
> 
> This series adds infrastructure to enable the Extensions and then
> implement DICE-based TDX Quoting.
> 
> The Extensions consumes relatively large amount of memory (~50MB). So it
> is designed to be off by default. It must be enabled after basic TDX
> Module initialization and when add-on features require it. To enable
> the Extensions, host first adds extra memory to TDX Module via a
> SEAMCALL (TDH.EXT.MEM.ADD), then uses another SEAMCALL (TDH.EXT.INIT) to
> initialize Extensions, and then some add-on features, e.g. DICE, could
> use Extension SEAMCALLs for work. Note that host can never get the added
> memory back.
> 
> Theoretically, the Extensions doesn't need to be enabled right after
> basic TDX initialization. It could be enabled right before the first
> Extension SEAMCALL is issued. That would save or postpone memory usage.
> But it isn't worth the complexity, the needs for the Extensions are vast
> but the savings are little for a typical TDX capable system (about
> 0.001% of memory). So the Linux decision is to just enable it along with
> the basic TDX.
> 

I think enabling it by default on TDX platforms (with the module
extension) might make sense. But the explanation here is slightly
confusing.

You said earlier that "The Extensions consumes relatively large amount
of memory (~50MB)", so they must be off by default. Later you say that
"...the savings are little...".

Are you saying that the dynamic enabling of the extensions is not worth
it or the dynamic allocation of the memory needed to support them?
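
For reference, the two figures can be related with a quick calculation.
This is only a sketch: it takes the ~50MB figure from the cover letter,
treats MB as MiB, and the 512 GiB system size used in the note below is
an arbitrary example, not a number from the series.

```c
#include <stdint.h>

/* Extensions memory as a percentage of total system memory. */
static double ext_mem_percent(uint64_t ext_bytes, uint64_t total_bytes)
{
	return 100.0 * (double)ext_bytes / (double)total_bytes;
}

/* Total memory (in bytes) at which ext_bytes equals the given percentage. */
static uint64_t total_for_percent(uint64_t ext_bytes, double percent)
{
	return (uint64_t)((double)ext_bytes * 100.0 / percent);
}
```

Under these assumptions, 50 MiB is roughly 0.0095% of a 512 GiB system,
and 0.001% would correspond to a system with about 4.8 TiB of memory.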

In addition, could you briefly describe the complexity we are trading off?

> This series has 2 distinct parts:
> 
>   Patches  1-4:  TDX Module Extensions enabling
>   Patches  5-15: DICE-based TDX Quoting, primarily Peter's work.
> 


* Re: [PATCH 04/15] x86/virt/tdx: Enable the Extensions right after basic TDX Module init
From: Xu Yilun @ 2026-05-27  4:02 UTC (permalink / raw)
  To: Tony Lindgren
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan, xiaoyao.li
In-Reply-To: <ahPlgBgvIwimzmzH@tlindgre-MOBL1>

On Mon, May 25, 2026 at 09:00:32AM +0300, Tony Lindgren wrote:
> On Fri, May 22, 2026 at 11:41:17AM +0800, Xu Yilun wrote:
> > The detailed initialization flow for TDX Module Extensions has been
> > fully implemented. Enable the flow after basic TDX Module
> > initialization.
> > 
> > Theoretically, the Extensions doesn't need to be enabled right after
> > basic TDX initialization. It could be enabled right before the first
> > Extension SEAMCALL is issued. That would save or postpone memory usage.
> > But it isn't worth the complexity, the needs for the Extensions are vast
> > but the savings are little for a typical TDX capable system (about
> > 0.001% of memory). So the Linux decision is to just enable it along with
> > the basic TDX.
> > 
> > Note that the Extensions initialization flow will still not start if no
> > add-on features require Extensions. The enabling of add-on features will
> > be in later patches. Until then, the system hasn't consumed extra memory.
> 
> Looking at patch 15/15, we need to reload the TDX module metadata at least
> for the attestation. We need to do that early, so to me it seems that
> everything can be just tagged __init from the start.

I'm fine with that. The Extension initialization will not start without
add-on features anyway. Let me move this patch to be the first one to
avoid tag churn.


* Re: [PATCH 02/15] x86/virt/tdx: Add extra memory to TDX Module for Extensions
From: Xu Yilun @ 2026-05-27  3:47 UTC (permalink / raw)
  To: Xiaoyao Li
  Cc: kas, djbw, rick.p.edgecombe, x86, peter.fang, linux-coco,
	linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan
In-Reply-To: <7139c55b-b949-415d-ab82-fca1b1cc3880@intel.com>

> > +static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
> > +{
> > +	u64 *entries = page_to_virt(root);
> > +	int i;
> > +
> > +	for (i = 0; i < nr_pages; i++)
> > +		clflush_cache_range(__va(entries[i]), PAGE_SIZE);
> 
> Is the page flush only needed when CLFLUSH_BEFORE_ALLOC is true?
> 
> If so, it inherits the same decision to always flush as what

Yes it is basically the same as tdx_clflush_page().

> tdx_clflush_page() did. Then, any chance we can use tdx_clflush_page() here

But I don't think we should convert hpa/page/va back and forth just for
re-using one line of code.

> so that we have a single central place of the comment to explain the kernel
> design decision.

How about I add a comment here to connect this wrapper to
tdx_clflush_page():

/*
 * Unconditionally flush the pages regardless of CLFLUSH_BEFORE_ALLOC. Inherit
 * the same decision as tdx_clflush_page().
 */
static void tdx_clflush_hpa_list(struct page *root, unsigned int nr_pages)
...


* Re: [RFC PATCH 15/15] x86/virt/tdx: Enable TDX Quoting extension
From: Xiaoyao Li @ 2026-05-27  1:30 UTC (permalink / raw)
  To: Xu Yilun
  Cc: Tony Lindgren, kas, djbw, rick.p.edgecombe, x86, peter.fang,
	linux-coco, linux-kernel, kvm, sohil.mehta, yilun.xu, baolu.lu,
	zhenzhong.duan
In-Reply-To: <ahXAL41ZmIDHmgfu@yilunxu-OptiPlex-7050>

On 5/26/2026 11:45 PM, Xu Yilun wrote:
> On Mon, May 25, 2026 at 06:51:27PM +0800, Xiaoyao Li wrote:
>> On 5/25/2026 1:17 PM, Tony Lindgren wrote:
>>> On Fri, May 22, 2026 at 11:41:28AM +0800, Xu Yilun wrote:
>>>> From: Peter Fang <peter.fang@intel.com>
>>>>
>>>> TDX Module updates global metadata when add-on features are enabled.
>>>> Host should update the cached tdx_sysinfo to reflect these changes.
>>>
>>> This should be made clearer IMO. How about mention that get_tdx_sys_info()
>>> needs to get called again to reload the TDX module global metadata?
>>
>> Ah ha! This patch answers my comment to patch 1:
>> https://lore.kernel.org/all/956fa1e6-2920-4b2e-8037-d4b9d812ae53@intel.com/
>>
>> sysinfo_ext->memory_pool_required_pages and sysinfo_ext->ext_required will
>> be updated after extensions are enabled by TDH.SYS.CONFIG.
>>
>> Patch 06 in this series already reads the tdx_sys_info_quote out of
>> get_tdx_sys_info(), which mean get_tdx_sys_info() doesn't ensure all the
>> global metadata will be update again.
>>
>> So how about move the read of memory_pool_required_pages and ext_required
>> out of get_tdx_sys_info() and put them after TDH.SYS.CONFIG, so that we
>> don't need call get_tdx_sys_info() again?
> 
> Yes, I'm good to it. I hesitated to move them out in case we need some
> central control on global data. But now I see there is already a
> precedent:
> 
> https://lore.kernel.org/kvm/20260520133909.409394-22-chao.gao@intel.com/
> 
> Once we've agreed on moving add-on data reading out of get_tdx_sys_info(),
> we don't have to read them after TDH.SYS.CONFIG, read them when really
> needed. How about the following, that makes the Extension part in this
> series self-contained.

Actually, what you have below is what I meant by "after TDH.SYS.CONFIG".

And I think we can reorder the patches that enable TDX extensions by
moving patch 04 to be the first one.

> ----8<----
> 
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 86e5b7ad19b3..b729c1f5ab9e 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1536,6 +1536,10 @@ static __init int init_tdx_ext(void)
>          if (!(tdx_sysinfo.features.tdx_features0 & TDX_FEATURES0_EXT))
>                  return 0;
> 
> +       ret = get_tdx_sys_info_ext(&tdx_sysinfo.ext);
> +       if (ret)
> +               return ret;
> +
>          /* No feature requires TDX Module Extensions. */
>          if (!tdx_sysinfo.ext.ext_required)
>                  return 0;
> diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> index f9cc2dd02caf..e7d9e0c4b604 100644
> --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> @@ -140,8 +140,5 @@ static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
>          ret = ret ?: get_tdx_sys_info_td_ctrl(&sysinfo->td_ctrl);
>          ret = ret ?: get_tdx_sys_info_td_conf(&sysinfo->td_conf);
> 
> -       if (sysinfo->features.tdx_features0 & TDX_FEATURES0_EXT)
> -               ret = ret ?: get_tdx_sys_info_ext(&sysinfo->ext);
> -
>          return ret;
>   }



* Re: [PATCH v5 5/5] iommufd/vdevice: add TSM request ioctl
From: Alexey Kardashevskiy @ 2026-05-27  0:16 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm), linux-coco, iommu, linux-kernel, kvm
  Cc: Bjorn Helgaas, Dan Williams, Jason Gunthorpe, Joerg Roedel,
	Jonathan Cameron, Kevin Tian, Nicolin Chen, Samuel Ortiz,
	Steven Price, Suzuki K Poulose, Will Deacon, Xu Yilun,
	Shameer Kolothum, Paolo Bonzini, Tony Krowiak, Halil Pasic,
	Jason Herne, Harald Freudenberger, Holger Dengler, Heiko Carstens,
	Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
	Sven Schnelle, Alex Williamson, Matthew Rosato, Farhan Ali,
	Eric Farman, linux-s390
In-Reply-To: <20260525154816.1029642-6-aneesh.kumar@kernel.org>



On 26/5/26 01:48, Aneesh Kumar K.V (Arm) wrote:
> Add IOMMU_VDEVICE_TSM_REQUEST for issuing TSM guest request/response
> transactions against an iommufd vdevice.
> 
> The ioctl takes a vdevice_id plus request/response user buffers and length
> fields, and forwards the request through tsm_guest_req() to the PCI TSM
> backend. This provides the host-side passthrough path used by CoCo guests
> for TSM device attestation and acceptance flows after the device has been
> bound to TSM.
> 
> Also add the supporting tsm_guest_req() helper and associated TSM core
> interface definitions.
> 
> Based on changes from: Alexey Kardashevskiy <aik@amd.com>
> 
> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>   drivers/iommu/iommufd/iommufd_private.h |  6 ++
>   drivers/iommu/iommufd/main.c            |  3 +
>   drivers/iommu/iommufd/tsm.c             | 68 +++++++++++++++++++++
>   drivers/virt/coco/tsm-core.c            | 39 ++++++++++++
>   include/linux/pci-tsm.h                 |  9 +--
>   include/linux/tsm.h                     | 25 ++++++++
>   include/uapi/linux/iommufd.h            | 80 +++++++++++++++++++++++++
>   7 files changed, 226 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
> index 8eea0c2c332b..0080895e9e92 100644
> --- a/drivers/iommu/iommufd/iommufd_private.h
> +++ b/drivers/iommu/iommufd/iommufd_private.h
> @@ -701,11 +701,17 @@ int iommufd_hw_queue_alloc_ioctl(struct iommufd_ucmd *ucmd);
>   void iommufd_hw_queue_destroy(struct iommufd_object *obj);
>   #ifdef CONFIG_TSM
>   int iommufd_vdevice_tsm_op_ioctl(struct iommufd_ucmd *ucmd);
> +int iommufd_vdevice_tsm_req_ioctl(struct iommufd_ucmd *ucmd);
>   #else
>   static inline int iommufd_vdevice_tsm_op_ioctl(struct iommufd_ucmd *ucmd)
>   {
>   	return -EOPNOTSUPP;
>   }
> +
> +static inline int iommufd_vdevice_tsm_req_ioctl(struct iommufd_ucmd *ucmd)
> +{
> +	return -EOPNOTSUPP;
> +}
>   #endif
>   
>   static inline struct iommufd_vdevice *
> diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
> index d73e6b391c6f..5f49b546ec92 100644
> --- a/drivers/iommu/iommufd/main.c
> +++ b/drivers/iommu/iommufd/main.c
> @@ -433,6 +433,7 @@ union ucmd_buffer {
>   	struct iommu_vfio_ioas vfio_ioas;
>   	struct iommu_viommu_alloc viommu;
>   	struct iommu_vdevice_tsm_op tsm_op;
> +	struct iommu_vdevice_tsm_req tsm_req;
>   #ifdef CONFIG_IOMMUFD_TEST
>   	struct iommu_test_cmd test;
>   #endif
> @@ -496,6 +497,8 @@ static const struct iommufd_ioctl_op iommufd_ioctl_ops[] = {
>   		 struct iommu_viommu_alloc, out_viommu_id),
>   	IOCTL_OP(IOMMU_VDEVICE_TSM_OP, iommufd_vdevice_tsm_op_ioctl,
>   		 struct iommu_vdevice_tsm_op, vdevice_id),
> +	IOCTL_OP(IOMMU_VDEVICE_TSM_REQ, iommufd_vdevice_tsm_req_ioctl,
> +		 struct iommu_vdevice_tsm_req, resp_uptr),
>   #ifdef CONFIG_IOMMUFD_TEST
>   	IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
>   #endif
> diff --git a/drivers/iommu/iommufd/tsm.c b/drivers/iommu/iommufd/tsm.c
> index 09ee668dbed9..342fbdb6a6b9 100644
> --- a/drivers/iommu/iommufd/tsm.c
> +++ b/drivers/iommu/iommufd/tsm.c
> @@ -60,3 +60,71 @@ int iommufd_vdevice_tsm_op_ioctl(struct iommufd_ucmd *ucmd)
>   	iommufd_put_object(ucmd->ictx, &vdev->obj);
>   	return rc;
>   }
> +
> +static bool iommufd_vdevice_tsm_req_scope_valid(u32 scope)
> +{
> +	if (scope > IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_LAST)
> +		return false;
> +
> +	switch (scope) {
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_INFO:
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE:
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ:
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE:

This scope thing still needs clarification.

I have 3 types of requests to fit here; all of them go via VM -> KVM -> QEMU -> IOMMUFD -> TSM.

1) bind/unbind TDI <- moves to CONFIG_LOCKED, this is "OP";
2) start/stop TDI <- moves to RUN, this is "GR"? Right now I route it via "OP";
3) enable/disable MMIO/DMA <- no TDI state change, this is "GR" but which scope is it here?

thanks,



> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
> +/**
> + * iommufd_vdevice_tsm_req_ioctl - Forward TSM requests
> + * @ucmd: user command data for IOMMU_VDEVICE_TSM_REQ
> + *
> + * Resolve @iommu_vdevice_tsm_req::vdevice_id to a vdevice and pass the
> + * request/response buffers to the TSM core.
> + *
> + * Return:
> + *  -errno on error.
> + *  positive residue if response/request bytes were left unconsumed.
> + *    if response buffer is provided, residue indicates the number of bytes
> + *    not used in response buffer
> + *    if there is no response buffer, residue indicates the number of bytes
> + *    not consumed in req buffer
> + *  0 otherwise.
> + */
> +int iommufd_vdevice_tsm_req_ioctl(struct iommufd_ucmd *ucmd)
> +{
> +	int rc;
> +	struct iommufd_vdevice *vdev;
> +	struct iommu_vdevice_tsm_req *cmd = ucmd->cmd;
> +	struct tsm_guest_req_info info = {
> +		.scope = cmd->scope,
> +		.req   = {
> +			.user = u64_to_user_ptr(cmd->req_uptr),
> +			.is_kernel = false,
> +		},
> +		.req_len = cmd->req_len,
> +		.resp    =  {
> +			.user = u64_to_user_ptr(cmd->resp_uptr),
> +			.is_kernel = false,
> +		},
> +		.resp_len = cmd->resp_len,
> +	};
> +
> +	if (cmd->__reserved)
> +		return -EOPNOTSUPP;
> +
> +	if (!iommufd_vdevice_tsm_req_scope_valid(cmd->scope))
> +		return -EINVAL;
> +
> +	vdev = iommufd_get_vdevice(ucmd->ictx, cmd->vdevice_id);
> +	if (IS_ERR(vdev))
> +		return PTR_ERR(vdev);
> +
> +	rc = tsm_guest_req(vdev->idev->dev, &info);
> +
> +	/* No inline response, hence we don't need to copy the response */
> +	iommufd_put_object(ucmd->ictx, &vdev->obj);
> +	return rc;
> +}
> diff --git a/drivers/virt/coco/tsm-core.c b/drivers/virt/coco/tsm-core.c
> index 3870d08ffe0d..c24886851f9e 100644
> --- a/drivers/virt/coco/tsm-core.c
> +++ b/drivers/virt/coco/tsm-core.c
> @@ -8,6 +8,7 @@
>   #include <linux/module.h>
>   #include <linux/cleanup.h>
>   #include <linux/pci-tsm.h>
> +#include <uapi/linux/iommufd.h>
>   
>   static void tsm_release(struct device *);
>   static const struct class tsm_class = {
> @@ -127,6 +128,44 @@ int tsm_unbind(struct device *dev)
>   }
>   EXPORT_SYMBOL_GPL(tsm_unbind);
>   
> +static int tsm_pci_req_scope(u32 scope, enum pci_tsm_req_scope *pci_scope)
> +{
> +	switch (scope) {
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_INFO:
> +		*pci_scope = PCI_TSM_REQ_INFO;
> +		return 0;
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE:
> +		*pci_scope = PCI_TSM_REQ_STATE_CHANGE;
> +		return 0;
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ:
> +		*pci_scope = PCI_TSM_REQ_DEBUG_READ;
> +		return 0;
> +	case IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE:
> +		*pci_scope = PCI_TSM_REQ_DEBUG_WRITE;
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
> +ssize_t tsm_guest_req(struct device *dev, struct tsm_guest_req_info *info)
> +{
> +	int ret;
> +	enum pci_tsm_req_scope pci_scope;
> +
> +	if (!dev_is_pci(dev))
> +		return -EINVAL;
> +
> +	ret = tsm_pci_req_scope(info->scope, &pci_scope);
> +	if (ret)
> +		return ret;
> +
> +	return pci_tsm_guest_req(to_pci_dev(dev), pci_scope, info->req,
> +				 info->req_len, info->resp, info->resp_len,
> +				 NULL);
> +}
> +EXPORT_SYMBOL_GPL(tsm_guest_req);
> +
>   static void tsm_release(struct device *dev)
>   {
>   	struct tsm_dev *tsm_dev = container_of(dev, typeof(*tsm_dev), dev);
> diff --git a/include/linux/pci-tsm.h b/include/linux/pci-tsm.h
> index a6435aba03f9..ec2236a7a279 100644
> --- a/include/linux/pci-tsm.h
> +++ b/include/linux/pci-tsm.h
> @@ -4,6 +4,7 @@
>   #include <linux/mutex.h>
>   #include <linux/pci.h>
>   #include <linux/sockptr.h>
> +#include <uapi/linux/iommufd.h>
>   
>   struct pci_tsm;
>   struct tsm_dev;
> @@ -173,7 +174,7 @@ enum pci_tsm_req_scope {
>   	 * typical TDISP collateral information like Device Interface Reports.
>   	 * No device secrets are permitted, and no device state is changed.
>   	 */
> -	PCI_TSM_REQ_INFO = 0,
> +	PCI_TSM_REQ_INFO = IOMMU_VDEVICE_TSM_REQ_PCI_INFO,
>   	/**
>   	 * @PCI_TSM_REQ_STATE_CHANGE: Request to change the TDISP state from
>   	 * UNLOCKED->LOCKED, LOCKED->RUN, or other architecture specific state
> @@ -181,14 +182,14 @@ enum pci_tsm_req_scope {
>   	 * to TDISP) device / host state, configuration, or data change is
>   	 * permitted.
>   	 */
> -	PCI_TSM_REQ_STATE_CHANGE = 1,
> +	PCI_TSM_REQ_STATE_CHANGE = IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE,
>   	/**
>   	 * @PCI_TSM_REQ_DEBUG_READ: Read-only request for debug information
>   	 *
>   	 * A method to facilitate TVM information retrieval outside of typical
>   	 * TDISP operational requirements. No device secrets are permitted.
>   	 */
> -	PCI_TSM_REQ_DEBUG_READ = 2,
> +	PCI_TSM_REQ_DEBUG_READ = IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ,
>   	/**
>   	 * @PCI_TSM_REQ_DEBUG_WRITE: Device state changes for debug purposes
>   	 *
> @@ -196,7 +197,7 @@ enum pci_tsm_req_scope {
>   	 * the TDISP operational model. If allowed, requires CAP_SYS_RAW_IO, and
>   	 * will taint the kernel.
>   	 */
> -	PCI_TSM_REQ_DEBUG_WRITE = 3,
> +	PCI_TSM_REQ_DEBUG_WRITE = IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE,
>   };
>   
>   #ifdef CONFIG_PCI_TSM
> diff --git a/include/linux/tsm.h b/include/linux/tsm.h
> index 7b6df827321b..6101a2a1db61 100644
> --- a/include/linux/tsm.h
> +++ b/include/linux/tsm.h
> @@ -6,6 +6,7 @@
>   #include <linux/types.h>
>   #include <linux/uuid.h>
>   #include <linux/device.h>
> +#include <linux/sockptr.h>
>   
>   #define TSM_REPORT_INBLOB_MAX 64
>   #define TSM_REPORT_OUTBLOB_MAX SZ_16M
> @@ -128,6 +129,23 @@ struct kvm;
>   #ifdef CONFIG_TSM
>   int tsm_bind(struct device *dev, struct kvm *kvm, u64 tdi_id);
>   int tsm_unbind(struct device *dev);
> +
> +/**
> + * struct tsm_guest_req_info - parameter for tsm_guest_req()
> + * @scope: iommufd allocated scope for tsm guest request
> + * @req: request data buffer filled by guest
> + * @req_len: the size of @req filled by guest
> + * @resp: response data buffer filled by host
> + * @resp_len: the size of @resp buffer filled by guest
> + */
> +struct tsm_guest_req_info {
> +	u32 scope;
> +	sockptr_t req;
> +	size_t req_len;
> +	sockptr_t resp;
> +	size_t resp_len;
> +};
> +ssize_t tsm_guest_req(struct device *dev, struct tsm_guest_req_info *info);
>   #else
>   static inline int tsm_bind(struct device *dev, struct kvm *kvm, u64 tdi_id)
>   {
> @@ -138,6 +156,13 @@ static inline int tsm_unbind(struct device *dev)
>   {
>   	return 0;
>   }
> +
> +struct tsm_guest_req_info;
> +static inline ssize_t tsm_guest_req(struct device *dev,
> +		struct tsm_guest_req_info *info)
> +{
> +	return -EINVAL;
> +}
>   #endif
>   
>   #endif /* __TSM_H */
> diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
> index 66398efa31d1..7953e99a9671 100644
> --- a/include/uapi/linux/iommufd.h
> +++ b/include/uapi/linux/iommufd.h
> @@ -58,6 +58,7 @@ enum {
>   	IOMMUFD_CMD_VEVENTQ_ALLOC = 0x93,
>   	IOMMUFD_CMD_HW_QUEUE_ALLOC = 0x94,
>   	IOMMUFD_CMD_VDEVICE_TSM_OP = 0x95,
> +	IOMMUFD_CMD_VDEVICE_TSM_REQ = 0x96,
>   };
>   
>   /**
> @@ -1373,4 +1374,83 @@ struct iommu_hw_queue_alloc {
>   	__aligned_u64 length;
>   };
>   #define IOMMU_HW_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_HW_QUEUE_ALLOC)
> +
> +/*
> + * TSM request scope values are allocated by iommufd. Each device-bus transport
> + * gets a range from this number space.
> + */
> +#define IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE	0
> +
> +enum iommu_vdevice_tsm_req_scope {
> +	/*
> +	 * Read-only, without side effects, request for typical TDISP
> +	 * collateral information like Device Interface Reports. No device
> +	 * secrets are permitted, and no device state is changed.
> +	 */
> +	IOMMU_VDEVICE_TSM_REQ_PCI_INFO =
> +		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE,
> +	/*
> +	 * Request to change the TDISP state from UNLOCKED->LOCKED,
> +	 * LOCKED->RUN, or other architecture specific state changes to
> +	 * support those transitions for a TDI. No other device or host state,
> +	 * configuration, or data change is permitted.
> +	 */
> +	IOMMU_VDEVICE_TSM_REQ_PCI_STATE_CHANGE =
> +		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE + 1,
> +	/*
> +	 * Read-only request for debug information outside of typical TDISP
> +	 * operational requirements. No device secrets are permitted.
> +	 */
> +	IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_READ =
> +		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE + 2,
> +	/*
> +	 * Device state changes for debug purposes. The request may affect the
> +	 * operational state of the device outside of the TDISP operational
> +	 * model. If allowed, this requires CAP_SYS_RAW_IO and taints the
> +	 * kernel.
> +	 */
> +	IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE =
> +		IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_BASE + 3,
> +	IOMMU_VDEVICE_TSM_REQ_SCOPE_PCI_LAST =
> +		IOMMU_VDEVICE_TSM_REQ_PCI_DEBUG_WRITE,
> +};
> +
> +/**
> + * struct iommu_vdevice_tsm_req - ioctl(IOMMU_VDEVICE_TSM_REQ)
> + * @size: sizeof(struct iommu_vdevice_tsm_req)
> + * @vdevice_id: vDevice ID the guest request is for
> + * @scope: One of enum iommu_vdevice_tsm_req_scope
> + * @req_len: Size in bytes of the input payload at @req_uptr
> + * @resp_len: Size in bytes of the output buffer at @resp_uptr
> + * @__reserved: Must be 0
> + * @req_uptr: Userspace pointer to the guest-provided request payload
> + * @resp_uptr: Userspace pointer to the guest response buffer
> + *
> + * Forward a TSM request to the TSM bound vDevice. This is intended for
> + * guest TSM/TDISP message transport where the host kernel only marshals
> + * bytes between userspace and the TSM implementation.
> + *
> + * Requests outside the iommufd allocated scope values are rejected. Lower
> + * layers may reject scope values that are valid in the global iommufd
> + * namespace, but not permitted for a specific bus.
> + *
> + * The request payload is read from @req_uptr/@req_len. If a response is
> + * expected, userspace provides @resp_uptr/@resp_len as writable storage for
> + * response bytes returned by the TSM path.
> + *
> + * The ioctl is only suitable for commands and results that the host kernel
> + * has no use, the host is only facilitating guest to TSM communication.
> + */
> +struct iommu_vdevice_tsm_req {
> +	__u32 size;
> +	__u32 vdevice_id;
> +	__u32 scope;
> +	__u32 req_len;
> +	__u32 resp_len;
> +	__u32 __reserved;
> +	__aligned_u64 req_uptr;
> +	__aligned_u64 resp_uptr;
> +};
> +
> +#define IOMMU_VDEVICE_TSM_REQ _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_TSM_REQ)
>   #endif

-- 
Alexey



* [Invitation] bi-weekly guest_memfd upstream call on 2026-05-28
From: Ackerley Tng @ 2026-05-26 23:49 UTC (permalink / raw)
  To: linux-coco, linux-mm, kvm; +Cc: david

Hi,

Our next guest_memfd upstream call is scheduled for Thursday, 2026-05-28
at 8:00 - 9:00am (GMT-07:00) Pacific Time - Vancouver.

We'll be using the following Google Meet:
http://meet.google.com/wxp-wtju-jzw

In this meeting, we'll have Tarun talk about guest_memfd preservation
for LiveUpdate.

We also have one question regarding virtio-mem and CoCo forwarded from
David.

The meeting notes can be found at [1], where we also link recordings and
collect current guest_memfd upstream proposals. If you want a Google
Calendar invitation that also covers all future meetings, just write
Ackerley or David a mail.

To put something to discuss onto the agenda, reply to this mail or add
it to the "Topics/questions for next meeting(s)" section in the
meeting notes as a comment.

[1] https://docs.google.com/document/d/1M6766BzdY1Lhk7LiR5IqVR8B8mG3cr-cxTxOrAosPOk/edit?usp=sharing

Ackerley


* Re: [PATCH v14 28/44] arm64: RMI: Create the realm descriptor
From: Wei-Lin Chang @ 2026-05-26 22:47 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-29-steven.price@arm.com>

Hi,

On Wed, May 13, 2026 at 02:17:36PM +0100, Steven Price wrote:
> Creating a realm involves first creating a realm descriptor (RD). This
> involves passing the configuration information to the RMM. Do this as
> part of realm_ensure_created() so that the realm is created when it is
> first needed.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * The RMM no longer uses AUX granules, so no need to ask it how many it
>    needs.
>  * Adapted to other changes.
> Changes since v12:
>  * Since RMM page size is now equal to the host's page size various
>    calculations are simplified.
>  * Switch to using range based APIs to delegate/undelegate.
>  * VMID handling is now handled entirely by the RMM.
> ---
>  arch/arm64/kvm/rmi.c | 88 +++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 86 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
> index fb96bcaa73ed..cae29fd3353c 100644
> --- a/arch/arm64/kvm/rmi.c
> +++ b/arch/arm64/kvm/rmi.c
> @@ -418,6 +418,77 @@ static void realm_unmap_shared_range(struct kvm *kvm,
>  			     start, end);
>  }
>  
> +static int realm_create_rd(struct kvm *kvm)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	struct realm_params *params = realm->params;
> +	void *rd = NULL;
> +	phys_addr_t rd_phys, params_phys;
> +	size_t pgd_size = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
> +	int r;
> +
> +	realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
> +
> +	if (WARN_ON(realm->rd || !realm->params))
> +		return -EEXIST;
> +
> +	rd = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> +	if (!rd)
> +		return -ENOMEM;
> +
> +	rd_phys = virt_to_phys(rd);
> +	if (rmi_delegate_page(rd_phys)) {
> +		r = -ENXIO;
> +		goto free_rd;
> +	}
> +
> +	if (rmi_delegate_range(kvm->arch.mmu.pgd_phys, pgd_size)) {
> +		r = -ENXIO;
> +		goto out_undelegate_tables;
> +	}
> +
> +	params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
> +	params->rtt_level_start = get_start_level(realm);
> +	params->rtt_num_start = pgd_size / PAGE_SIZE;
> +	params->rtt_base = kvm->arch.mmu.pgd_phys;
> +
> +	if (kvm->arch.arm_pmu) {
> +		params->pmu_num_ctrs = kvm->arch.nr_pmu_counters;
> +		params->flags |= RMI_REALM_PARAM_FLAG_PMU;
> +	}
> +
> +	if (kvm_lpa2_is_enabled())
> +		params->flags |= RMI_REALM_PARAM_FLAG_LPA2;
> +
> +	params_phys = virt_to_phys(params);
> +
> +	if (rmi_realm_create(rd_phys, params_phys)) {
> +		r = -ENXIO;
> +		goto out_undelegate_tables;
> +	}
> +
> +	realm->rd = rd;
> +	kvm_set_realm_state(kvm, REALM_STATE_NEW);
> +	/* The realm is up, free the parameters.  */
> +	free_page((unsigned long)realm->params);
> +	realm->params = NULL;
> +
> +	return 0;
> +
> +out_undelegate_tables:
> +	if (WARN_ON(rmi_undelegate_range(kvm->arch.mmu.pgd_phys, pgd_size))) {
> +		/* Leak the pages if they cannot be returned */
> +		kvm->arch.mmu.pgt = NULL;
> +	}
> +	if (WARN_ON(rmi_undelegate_page(rd_phys))) {
> +		/* Leak the page if it isn't returned */
> +		return r;
> +	}
> +free_rd:
> +	free_page((unsigned long)rd);
> +	return r;
> +}
> +
>  static void realm_unmap_private_range(struct kvm *kvm,
>  				      unsigned long start,
>  				      unsigned long end,
> @@ -647,8 +718,21 @@ static int realm_init_ipa_state(struct kvm *kvm,
>  
>  static int realm_ensure_created(struct kvm *kvm)
>  {
> -	/* Provided in later patch */
> -	return -ENXIO;
> +	int ret;
> +
> +	switch (kvm_realm_state(kvm)) {
> +	case REALM_STATE_NONE:
> +		break;
> +	case REALM_STATE_NEW:
> +		return 0;
> +	case REALM_STATE_DEAD:
> +		return -ENXIO;
> +	default:
> +		return -EBUSY;
> +	}
> +
> +	ret = realm_create_rd(kvm);
> +	return ret;
>  }

I think ret can be simplified out.
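
A minimal standalone sketch of that simplification, assuming nothing
else in the function needs the local variable. The enum, struct kvm and
the two helpers here are reduced stand-ins so the snippet compiles on
its own (REALM_STATE_OTHER is a hypothetical placeholder for any
remaining state); only realm_ensure_created() mirrors the patch:

```c
#include <errno.h>

/* Reduced stand-ins for the kernel definitions, for illustration only. */
enum realm_state {
	REALM_STATE_NONE,
	REALM_STATE_NEW,
	REALM_STATE_OTHER,	/* hypothetical: any other state */
	REALM_STATE_DEAD,
};

struct kvm {
	enum realm_state state;
};

static enum realm_state kvm_realm_state(struct kvm *kvm)
{
	return kvm->state;
}

/* Stub: pretend the realm descriptor was created successfully. */
static int realm_create_rd(struct kvm *kvm)
{
	kvm->state = REALM_STATE_NEW;
	return 0;
}

static int realm_ensure_created(struct kvm *kvm)
{
	switch (kvm_realm_state(kvm)) {
	case REALM_STATE_NONE:
		break;
	case REALM_STATE_NEW:
		return 0;
	case REALM_STATE_DEAD:
		return -ENXIO;
	default:
		return -EBUSY;
	}

	/* No local 'ret' needed: return the result directly. */
	return realm_create_rd(kvm);
}
```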

Thanks,
Wei-Lin Chang

>  
>  static int set_ripas_of_protected_regions(struct kvm *kvm)
> -- 
> 2.43.0
> 


* Re: [PATCH v14 19/44] arm64: RMI: Allocate/free RECs to match vCPUs
From: Wei-Lin Chang @ 2026-05-26 22:39 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-20-steven.price@arm.com>

Hi,

On Wed, May 13, 2026 at 02:17:27PM +0100, Steven Price wrote:
> The RMM maintains a data structure known as the Realm Execution Context
> (or REC). It is similar to struct kvm_vcpu and tracks the state of the
> virtual CPUs. KVM must delegate memory and request the structures are
> created when vCPUs are created, and suitably tear down on destruction.
> 
> RECs may require additional pages (e.g. for storing larger register
> state for SVE). The RMM can request extra pages for this purpose using
> the Stateful RMI Operations (SRO) functionality to request pages during
> REC creation. These pages are then passed back to the host from the RMM
> ('reclaimed') when the REC is destroyed. The kernel tracking object
> (struct rmi_sro_state) is stored in the realm_rec structure to avoid
> memory allocation during the destruction path.
> 
> Note that only some of the register state for the REC can be set by KVM; the
> rest is defined by the RMM (zeroed). The register state then cannot be
> changed by KVM after the REC is created (except when the guest
> explicitly requests this e.g. by performing a PSCI call).
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Support SRO for REC creation/destruction instead of auxiliary
>    granules.
> Changes since v12:
>  * Use the new range-based delegation RMI.
> Changes since v11:
>  * Remove the KVM_ARM_VCPU_REC feature. User space no longer needs to
>    configure each VCPU separately, RECs are created on the first VCPU
>    run of the guest.
> Changes since v9:
>  * Size the aux_pages array according to the PAGE_SIZE of the host.
> Changes since v7:
>  * Add comment explaining the aux_pages array.
>  * Rename "undeleted_failed" variable to "should_free" to avoid a
>    confusing double negative.
> Changes since v6:
>  * Avoid reporting the KVM_ARM_VCPU_REC feature if the guest isn't a
>    realm guest.
>  * Support host page size being larger than RMM's granule size when
>    allocating/freeing aux granules.
> Changes since v5:
>  * Separate the concept of vcpu_is_rec() and
>    kvm_arm_vcpu_rec_finalized() by using the KVM_ARM_VCPU_REC feature as
>    the indication that the VCPU is a REC.
> Changes since v2:
>  * Free rec->run earlier in kvm_destroy_realm() and adapt to previous patches.
> ---
>  arch/arm64/include/asm/kvm_emulate.h |   2 +-
>  arch/arm64/include/asm/kvm_host.h    |   3 +
>  arch/arm64/include/asm/kvm_rmi.h     |  17 +++++
>  arch/arm64/kvm/arm.c                 |   6 ++
>  arch/arm64/kvm/reset.c               |   1 +
>  arch/arm64/kvm/rmi.c                 | 105 +++++++++++++++++++++++++++
>  6 files changed, 133 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 82fd777bd9bb..2e69fe494716 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -714,7 +714,7 @@ static inline bool kvm_realm_is_created(struct kvm *kvm)
>  
>  static inline bool vcpu_is_rec(const struct kvm_vcpu *vcpu)
>  {
> -	return false;
> +	return kvm_is_realm(vcpu->kvm);
>  }
>  
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3512696ed506..39b5de03d0fe 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -969,6 +969,9 @@ struct kvm_vcpu_arch {
>  
>  	/* Hyp-readable copy of kvm_vcpu::pid */
>  	pid_t pid;
> +
> +	/* Realm meta data */
> +	struct realm_rec rec;
>  };
>  
>  /*
> diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
> index 8bd743093ccf..d99bf4fc3c39 100644
> --- a/arch/arm64/include/asm/kvm_rmi.h
> +++ b/arch/arm64/include/asm/kvm_rmi.h
> @@ -59,6 +59,22 @@ struct realm {
>  	unsigned int ia_bits;
>  };
>  
> +/**
> + * struct realm_rec - Additional per VCPU data for a Realm
> + *
> + * @mpidr: MPIDR (Multiprocessor Affinity Register) value to identify this VCPU
> + * @rec_page: Kernel VA of the RMM's private page for this REC
> + * @aux_pages: Additional pages private to the RMM for this REC
> + * @run: Kernel VA of the RmiRecRun structure shared with the RMM
> + * @sro: A preallocated SRO state context
> + */
> +struct realm_rec {
> +	unsigned long mpidr;
> +	void *rec_page;
> +	struct rec_run *run;
> +	struct rmi_sro_state *sro;
> +};
> +
>  void kvm_init_rmi(void);
>  u32 kvm_realm_ipa_limit(void);
>  
> @@ -66,6 +82,7 @@ int kvm_init_realm(struct kvm *kvm);
>  int kvm_activate_realm(struct kvm *kvm);
>  void kvm_destroy_realm(struct kvm *kvm);
>  void kvm_realm_destroy_rtts(struct kvm *kvm);
> +void kvm_destroy_rec(struct kvm_vcpu *vcpu);
>  
>  static inline bool kvm_realm_is_private_address(struct realm *realm,
>  						unsigned long addr)
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index eb2b61fe1f0a..93d34762db91 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -586,6 +586,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
>  	/* Force users to call KVM_ARM_VCPU_INIT */
>  	vcpu_clear_flag(vcpu, VCPU_INITIALIZED);
>  
> +	vcpu->arch.rec.mpidr = INVALID_HWID;
> +
>  	vcpu->arch.mmu_page_cache.gfp_zero = __GFP_ZERO;
>  
>  	/* Set up the timer */
> @@ -1651,6 +1653,10 @@ static int kvm_vcpu_init_check_features(struct kvm_vcpu *vcpu,
>  	if (test_bit(KVM_ARM_VCPU_HAS_EL2, &features))
>  		return -EINVAL;
>  
> +	/* Realms are incompatible with AArch32 */
> +	if (vcpu_is_rec(vcpu))
> +		return -EINVAL;
> +
>  	return 0;
>  }
>  
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index b963fd975aac..c18cdca7d125 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -161,6 +161,7 @@ void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu)
>  	free_page((unsigned long)vcpu->arch.ctxt.vncr_array);
>  	kfree(vcpu->arch.vncr_tlb);
>  	kfree(vcpu->arch.ccsidr);
> +	kvm_destroy_rec(vcpu);
>  }
>  
>  static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
> index 849111817af7..353a5ca45e78 100644
> --- a/arch/arm64/kvm/rmi.c
> +++ b/arch/arm64/kvm/rmi.c
> @@ -173,9 +173,108 @@ static int realm_ensure_created(struct kvm *kvm)
>  	return -ENXIO;
>  }
>  
> +static int kvm_create_rec(struct kvm_vcpu *vcpu)
> +{
> +	struct user_pt_regs *vcpu_regs = vcpu_gp_regs(vcpu);
> +	unsigned long mpidr = kvm_vcpu_get_mpidr_aff(vcpu);
> +	struct realm *realm = &vcpu->kvm->arch.realm;
> +	struct realm_rec *rec = &vcpu->arch.rec;
> +	unsigned long rec_page_phys;
> +	struct rec_params *params;
> +	int r, i;
> +
> +	if (rec->run)
> +		return -EBUSY;
> +
> +	/*
> +	 * The RMM will report PSCI v1.0 to Realms and the KVM_ARM_VCPU_PSCI_0_2
> +	 * flag covers v0.2 and onwards.
> +	 */
> +	if (!vcpu_has_feature(vcpu, KVM_ARM_VCPU_PSCI_0_2))
> +		return -EINVAL;
> +
> +	BUILD_BUG_ON(sizeof(*params) > PAGE_SIZE);
> +	BUILD_BUG_ON(sizeof(*rec->run) > PAGE_SIZE);
> +
> +	params = (struct rec_params *)get_zeroed_page(GFP_KERNEL);
> +	rec->rec_page = (void *)__get_free_page(GFP_KERNEL);
> +	rec->run = (void *)get_zeroed_page(GFP_KERNEL);

Should this be cast to (struct rec_run *) rather than (void *)?
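
To be clear, this is a consistency nit rather than a bug: in C a void * converts implicitly to any object pointer type, so both spellings assign the same pointer. A small userspace sketch of the two forms follows. Here struct rec_run is a placeholder with a made-up field, and mock_get_zeroed_page() is a hypothetical stand-in for get_zeroed_page() that, like the kernel helper, returns the address as an unsigned long (this assumes an LP64 target where unsigned long can hold a pointer).

```c
#include <stdlib.h>

/* Placeholder layout; the real struct rec_run is defined by the RMI headers. */
struct rec_run {
	unsigned long exit_reason;	/* made-up field for illustration */
};

/*
 * Hypothetical stand-in for get_zeroed_page(): returns a zeroed 4K buffer
 * as an unsigned long address (assumes LP64, as on arm64 Linux).
 */
static unsigned long mock_get_zeroed_page(void)
{
	return (unsigned long)calloc(1, 4096);
}

/* Form used in the patch: cast to void *, then implicit conversion. */
static struct rec_run *alloc_run_void_cast(void)
{
	struct rec_run *run = (void *)mock_get_zeroed_page();

	return run;
}

/* Form suggested above: cast straight to the destination type. */
static struct rec_run *alloc_run_typed_cast(void)
{
	struct rec_run *run = (struct rec_run *)mock_get_zeroed_page();

	return run;
}
```

Either way the integer-to-pointer cast itself is required; the typed form just matches the (struct rec_params *) cast used for params two lines earlier.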

> +	rec->sro = kmalloc_obj(*rec->sro);
> +	if (!params || !rec->rec_page || !rec->run || !rec->sro) {
> +		r = -ENOMEM;
> +		goto out_free_pages;
> +	}
> +
> +	for (i = 0; i < ARRAY_SIZE(params->gprs); i++)
> +		params->gprs[i] = vcpu_regs->regs[i];
> +
> +	params->pc = vcpu_regs->pc;
> +
> +	if (vcpu->vcpu_id == 0)
> +		params->flags |= REC_PARAMS_FLAG_RUNNABLE;
> +
> +	rec_page_phys = virt_to_phys(rec->rec_page);
> +
> +	if (rmi_delegate_page(rec_page_phys)) {
> +		r = -ENXIO;
> +		goto out_free_pages;
> +	}
> +
> +	params->mpidr = mpidr;
> +
> +	if (rmi_rec_create(virt_to_phys(realm->rd), rec_page_phys,
> +			   virt_to_phys(params), rec->sro)) {
> +		r = -ENXIO;
> +		goto out_undelegate_rmm_rec;
> +	}
> +
> +	rec->mpidr = mpidr;
> +
> +	free_page((unsigned long)params);
> +	return 0;
> +
> +out_undelegate_rmm_rec:
> +	if (WARN_ON(rmi_undelegate_page(rec_page_phys)))
> +		rec->rec_page = NULL;
> +out_free_pages:
> +	free_page((unsigned long)rec->run);
> +	free_page((unsigned long)rec->rec_page);
> +	free_page((unsigned long)params);
> +	kfree(rec->sro);
> +	rec->run = NULL;
> +	return r;
> +}
> +

[...]

Thanks,
Wei-Lin Chang

