Linux Confidential Computing Development
 help / color / mirror / Atom feed
* Re: [PATCH v14 07/44] arm64: RMI: Configure the RMM with the host's page size
From: Suzuki K Poulose @ 2026-05-21 14:53 UTC (permalink / raw)
  To: Marc Zyngier, Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni, Gavin Shan, Shanker Donthineni,
	Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
	WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <86a4tsx536.wl-maz@kernel.org>

On 21/05/2026 14:30, Marc Zyngier wrote:
> On Wed, 13 May 2026 14:17:15 +0100,
> Steven Price <steven.price@arm.com> wrote:
>>
>> RMM v2.0 brings the ability to set the RMM's granule size. Check the
>> feature registers and configure the RMM so that it matches the host's
>> page size. This means that operations can be done with a granulatity
>> equal to PAGE_SIZE.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> Changes since v13:
>>   * Moved out of KVM.
>> ---
>>   arch/arm64/kernel/rmi.c | 42 +++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 42 insertions(+)
>>
>> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
>> index 99c1ccc35c11..a14ead5dedda 100644
>> --- a/arch/arm64/kernel/rmi.c
>> +++ b/arch/arm64/kernel/rmi.c
>> @@ -49,6 +49,45 @@ static int rmi_check_version(void)
>>   	return 0;
>>   }
>>   
>> +static int rmi_configure(void)
>> +{
>> +	struct rmm_config *config __free(free_page) = NULL;
>> +	unsigned long ret;
>> +
>> +	config = (struct rmm_config *)get_zeroed_page(GFP_KERNEL);
>> +	if (!config)
>> +		return -ENOMEM;
> 
> This is the sort of buggy construct that is highlighted in
> include/linux/cleanup.h: initialising the object for cleanup with
> NULL, and only later assigning the expected value.
> 
> It may not matter here, but it will catch you (or more probably me) in
> the future.
> 
>> +
>> +	switch (PAGE_SIZE) {
>> +	case SZ_4K:
>> +		config->rmi_granule_size = RMI_GRANULE_SIZE_4KB;
>> +		break;
>> +	case SZ_16K:
>> +		config->rmi_granule_size = RMI_GRANULE_SIZE_16KB;
>> +		break;
>> +	case SZ_64K:
>> +		config->rmi_granule_size = RMI_GRANULE_SIZE_64KB;
>> +		break;
>> +	default:
>> +		pr_err("Unsupported PAGE_SIZE for RMM\n");
> 
> Do you really anticipate PAGE_SIZE being any other value? This is 100%
> dead code. If you want to be extra cautious, have a BUILD_BUg_ON().
> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	ret = rmi_rmm_config_set(virt_to_phys(config));
>> +	if (ret) {
>> +		pr_err("RMM config set failed\n");
>> +		return -EINVAL;
>> +	}
> 
> What is the live cycle of the page when the call succeeds? Is it
> switched back to the NS PAS and allowed to be freed?

It always remains in the NS world. We relay some information in the
NS PAS page, which the RMM consumes. The checks are performed on
the values consumed by the RMM.

Kind regards
Suzuki

> 
>> +
>> +	ret = rmi_rmm_activate();
>> +	if (ret) {
>> +		pr_err("RMM activate failed\n");
>> +		return -ENXIO;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>>   static int __init arm64_init_rmi(void)
>>   {
>>   	/* Continue without realm support if we can't agree on a version */
>> @@ -60,6 +99,9 @@ static int __init arm64_init_rmi(void)
>>   	if (WARN_ON(rmi_features(1, &rmm_feat_reg1)))
>>   		return 0;
>>   
>> +	if (rmi_configure())
>> +		return 0;
>> +
>>   	return 0;
>>   }
>>   subsys_initcall(arm64_init_rmi);
> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply

* Re: [PATCH v14 04/44] arm64: RMI: Add SMC definitions for calling the RMM
From: Suzuki K Poulose @ 2026-05-21 14:50 UTC (permalink / raw)
  To: Marc Zyngier, Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni, Gavin Shan, Shanker Donthineni,
	Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
	WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <86ecj5vsu4.wl-maz@kernel.org>

On 21/05/2026 13:40, Marc Zyngier wrote:
> On Wed, 13 May 2026 14:17:12 +0100,
> Steven Price <steven.price@arm.com> wrote:
>>
>> The RMM (Realm Management Monitor) provides functionality that can be
>> accessed by SMC calls from the host.
>>
>> The SMC definitions are based on DEN0137[1] version 2.0-bet1
>>
>> [1] https://developer.arm.com/documentation/den0137/2-0bet1/
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>> Changes since v13:
>>   * Updated to RMM spec v2.0-bet1
>> Changes since v12:
>>   * Updated to RMM spec v2.0-bet0
>> Changes since v9:
>>   * Corrected size of 'ripas_value' in struct rec_exit. The spec states
>>     this is an 8-bit type with padding afterwards (rather than a u64).
>> Changes since v8:
>>   * Added RMI_PERMITTED_GICV3_HCR_BITS to define which bits the RMM
>>     permits to be modified.
>> Changes since v6:
>>   * Renamed REC_ENTER_xxx defines to include 'FLAG' to make it obvious
>>     these are flag values.
>> Changes since v5:
>>   * Sorted the SMC #defines by value.
>>   * Renamed SMI_RxI_CALL to SMI_RMI_CALL since the macro is only used for
>>     RMI calls.
>>   * Renamed REC_GIC_NUM_LRS to REC_MAX_GIC_NUM_LRS since the actual
>>     number of available list registers could be lower.
>>   * Provided a define for the reserved fields of FeatureRegister0.
>>   * Fix inconsistent names for padding fields.
>> Changes since v4:
>>   * Update to point to final released RMM spec.
>>   * Minor rearrangements.
>> Changes since v3:
>>   * Update to match RMM spec v1.0-rel0-rc1.
>> Changes since v2:
>>   * Fix specification link.
>>   * Rename rec_entry->rec_enter to match spec.
>>   * Fix size of pmu_ovf_status to match spec.
>> ---
>>   arch/arm64/include/asm/rmi_smc.h | 448 +++++++++++++++++++++++++++++++
>>   1 file changed, 448 insertions(+)
>>   create mode 100644 arch/arm64/include/asm/rmi_smc.h
>>
>> diff --git a/arch/arm64/include/asm/rmi_smc.h b/arch/arm64/include/asm/rmi_smc.h
>> new file mode 100644
>> index 000000000000..a09b7a631fef
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/rmi_smc.h
>> @@ -0,0 +1,448 @@
>> +/* SPDX-License-Identifier: GPL-2.0 */
>> +/*
>> + * Copyright (C) 2023-2026 ARM Ltd.
>> + *
>> + * The values and structures in this file are from the Realm Management Monitor
>> + * specification (DEN0137) version 2.0-bet1:
>> + * https://developer.arm.com/documentation/den0137/2-0bet1/
> 
> How long is this spec going to be available on the ARM web site, which
> has a tendency of being reorganised every other week? And there is
> already a beta2.
> 
>> + */
>> +
>> +#ifndef __ASM_RMI_SMC_H
>> +#define __ASM_RMI_SMC_H
>> +
>> +#include <linux/arm-smccc.h>
>> +
>> +#define SMC_RMI_CALL(func)				\
>> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,		\
>> +			   ARM_SMCCC_SMC_64,		\
>> +			   ARM_SMCCC_OWNER_STANDARD,	\
>> +			   (func))
>> +
>> +#define SMC_RMI_VERSION				SMC_RMI_CALL(0x0150)
>> +
>> +#define SMC_RMI_RTT_DATA_MAP_INIT		SMC_RMI_CALL(0x0153)
>> +
>> +#define SMC_RMI_REALM_ACTIVATE			SMC_RMI_CALL(0x0157)
>> +#define SMC_RMI_REALM_CREATE			SMC_RMI_CALL(0x0158)
>> +#define SMC_RMI_REALM_DESTROY			SMC_RMI_CALL(0x0159)
>> +#define SMC_RMI_REC_CREATE			SMC_RMI_CALL(0x015a)
>> +#define SMC_RMI_REC_DESTROY			SMC_RMI_CALL(0x015b)
>> +#define SMC_RMI_REC_ENTER			SMC_RMI_CALL(0x015c)
>> +#define SMC_RMI_RTT_CREATE			SMC_RMI_CALL(0x015d)
>> +#define SMC_RMI_RTT_DESTROY			SMC_RMI_CALL(0x015e)
>> +
>> +#define SMC_RMI_RTT_READ_ENTRY			SMC_RMI_CALL(0x0161)
>> +
>> +#define SMC_RMI_RTT_DEV_VALIDATE		SMC_RMI_CALL(0x0163)
>> +#define SMC_RMI_PSCI_COMPLETE			SMC_RMI_CALL(0x0164)
>> +#define SMC_RMI_FEATURES			SMC_RMI_CALL(0x0165)
>> +#define SMC_RMI_RTT_FOLD			SMC_RMI_CALL(0x0166)
>> +
>> +#define SMC_RMI_RTT_INIT_RIPAS			SMC_RMI_CALL(0x0168)
>> +#define SMC_RMI_RTT_SET_RIPAS			SMC_RMI_CALL(0x0169)
>> +#define SMC_RMI_VSMMU_CREATE			SMC_RMI_CALL(0x016a)
>> +#define SMC_RMI_VSMMU_DESTROY			SMC_RMI_CALL(0x016b)
>> +#define SMC_RMI_RMM_CONFIG_SET			SMC_RMI_CALL(0x016e)
>> +#define SMC_RMI_PSMMU_IRQ_NOTIFY		SMC_RMI_CALL(0x016f)
>> +
>> +#define SMC_RMI_PDEV_ABORT			SMC_RMI_CALL(0x0174)
>> +#define SMC_RMI_PDEV_COMMUNICATE		SMC_RMI_CALL(0x0175)
>> +#define SMC_RMI_PDEV_CREATE			SMC_RMI_CALL(0x0176)
>> +#define SMC_RMI_PDEV_DESTROY			SMC_RMI_CALL(0x0177)
>> +#define SMC_RMI_PDEV_GET_STATE			SMC_RMI_CALL(0x0178)
>> +
>> +#define SMC_RMI_PDEV_STREAM_KEY_REFRESH		SMC_RMI_CALL(0x017a)
>> +#define SMC_RMI_PDEV_SET_PUBKEY			SMC_RMI_CALL(0x017b)
>> +#define SMC_RMI_PDEV_STOP			SMC_RMI_CALL(0x017c)
>> +#define SMC_RMI_RTT_AUX_CREATE			SMC_RMI_CALL(0x017d)
>> +#define SMC_RMI_RTT_AUX_DESTROY			SMC_RMI_CALL(0x017e)
>> +#define SMC_RMI_RTT_AUX_FOLD			SMC_RMI_CALL(0x017f)
>> +
>> +#define SMC_RMI_VDEV_ABORT			SMC_RMI_CALL(0x0185)
>> +#define SMC_RMI_VDEV_COMMUNICATE		SMC_RMI_CALL(0x0186)
>> +#define SMC_RMI_VDEV_CREATE			SMC_RMI_CALL(0x0187)
>> +#define SMC_RMI_VDEV_DESTROY			SMC_RMI_CALL(0x0188)
>> +#define SMC_RMI_VDEV_GET_STATE			SMC_RMI_CALL(0x0189)
>> +#define SMC_RMI_VDEV_UNLOCK			SMC_RMI_CALL(0x018a)
>> +#define SMC_RMI_RTT_SET_S2AP			SMC_RMI_CALL(0x018b)
>> +#define SMC_RMI_VDEV_COMPLETE			SMC_RMI_CALL(0x018e)
>> +
>> +#define SMC_RMI_VDEV_GET_INTERFACE_REPORT	SMC_RMI_CALL(0x01d0)
>> +#define SMC_RMI_VDEV_GET_MEASUREMENTS		SMC_RMI_CALL(0x01d1)
>> +#define SMC_RMI_VDEV_LOCK			SMC_RMI_CALL(0x01d2)
>> +#define SMC_RMI_VDEV_START			SMC_RMI_CALL(0x01d3)
>> +
>> +#define SMC_RMI_VSMMU_EVENT_NOTIFY		SMC_RMI_CALL(0x01d6)
>> +#define SMC_RMI_PSMMU_ACTIVATE			SMC_RMI_CALL(0x01d7)
>> +#define SMC_RMI_PSMMU_DEACTIVATE		SMC_RMI_CALL(0x01d8)
>> +
>> +#define SMC_RMI_PSMMU_ST_L2_CREATE		SMC_RMI_CALL(0x01db)
>> +#define SMC_RMI_PSMMU_ST_L2_DESTROY		SMC_RMI_CALL(0x01dc)
>> +#define SMC_RMI_DPT_L0_CREATE			SMC_RMI_CALL(0x01dd)
>> +#define SMC_RMI_DPT_L0_DESTROY			SMC_RMI_CALL(0x01de)
>> +#define SMC_RMI_DPT_L1_CREATE			SMC_RMI_CALL(0x01df)
>> +#define SMC_RMI_DPT_L1_DESTROY			SMC_RMI_CALL(0x01e0)
>> +#define SMC_RMI_GRANULE_TRACKING_GET		SMC_RMI_CALL(0x01e1)
>> +
>> +#define SMC_RMI_GRANULE_TRACKING_SET		SMC_RMI_CALL(0x01e3)
>> +
>> +#define SMC_RMI_RMM_CONFIG_GET			SMC_RMI_CALL(0x01ec)
>> +
>> +#define SMC_RMI_RMM_STATE_GET			SMC_RMI_CALL(0x01ee)
>> +
>> +#define SMC_RMI_PSMMU_EVENT_CONSUME		SMC_RMI_CALL(0x01f0)
>> +#define SMC_RMI_GRANULE_RANGE_DELEGATE		SMC_RMI_CALL(0x01f1)
>> +#define SMC_RMI_GRANULE_RANGE_UNDELEGATE	SMC_RMI_CALL(0x01f2)
>> +#define SMC_RMI_GPT_L1_CREATE			SMC_RMI_CALL(0x01f3)
>> +#define SMC_RMI_GPT_L1_DESTROY			SMC_RMI_CALL(0x01f4)
>> +#define SMC_RMI_RTT_DATA_MAP			SMC_RMI_CALL(0x01f5)
>> +#define SMC_RMI_RTT_DATA_UNMAP			SMC_RMI_CALL(0x01f6)
>> +#define SMC_RMI_RTT_DEV_MAP			SMC_RMI_CALL(0x01f7)
>> +#define SMC_RMI_RTT_DEV_UNMAP			SMC_RMI_CALL(0x01f8)
>> +#define SMC_RMI_RTT_ARCH_DEV_MAP		SMC_RMI_CALL(0x01f9)
>> +#define SMC_RMI_RTT_ARCH_DEV_UNMAP		SMC_RMI_CALL(0x01fa)
>> +#define SMC_RMI_RTT_UNPROT_MAP			SMC_RMI_CALL(0x01fb)
>> +#define SMC_RMI_RTT_UNPROT_UNMAP		SMC_RMI_CALL(0x01fc)
>> +#define SMC_RMI_RTT_AUX_PROT_MAP		SMC_RMI_CALL(0x01fd)
>> +#define SMC_RMI_RTT_AUX_PROT_UNMAP		SMC_RMI_CALL(0x01fe)
>> +#define SMC_RMI_RTT_AUX_UNPROT_MAP		SMC_RMI_CALL(0x01ff)
>> +#define SMC_RMI_RTT_AUX_UNPROT_UNMAP		SMC_RMI_CALL(0x0200)
>> +#define SMC_RMI_REALM_TERMINATE			SMC_RMI_CALL(0x0201)
>> +#define SMC_RMI_RMM_ACTIVATE			SMC_RMI_CALL(0x0202)
>> +#define SMC_RMI_OP_CONTINUE			SMC_RMI_CALL(0x0203)
>> +#define SMC_RMI_PDEV_STREAM_CONNECT		SMC_RMI_CALL(0x0204)
>> +#define SMC_RMI_PDEV_STREAM_DISCONNECT		SMC_RMI_CALL(0x0205)
>> +#define SMC_RMI_PDEV_STREAM_COMPLETE		SMC_RMI_CALL(0x0206)
>> +#define SMC_RMI_PDEV_STREAM_KEY_PURGE		SMC_RMI_CALL(0x0207)
>> +#define SMC_RMI_OP_MEM_DONATE			SMC_RMI_CALL(0x0208)
>> +#define SMC_RMI_OP_MEM_RECLAIM			SMC_RMI_CALL(0x0209)
>> +#define SMC_RMI_OP_CANCEL			SMC_RMI_CALL(0x020a)
>> +#define SMC_RMI_VSMMU_FEATURES			SMC_RMI_CALL(0x020b)
>> +#define SMC_RMI_VSMMU_CMD_GET			SMC_RMI_CALL(0x020c)
>> +#define SMC_RMI_VSMMU_CMD_COMPLETE		SMC_RMI_CALL(0x020d)
>> +#define SMC_RMI_PSMMU_INFO			SMC_RMI_CALL(0x020e)
>> +
>> +#define RMI_ABI_MAJOR_VERSION	2
>> +#define RMI_ABI_MINOR_VERSION	0
>> +
>> +#define RMI_ABI_VERSION_GET_MAJOR(version) ((version) >> 16)
>> +#define RMI_ABI_VERSION_GET_MINOR(version) ((version) & 0xFFFF)
>> +#define RMI_ABI_VERSION(major, minor)      (((major) << 16) | (minor))
>> +
>> +#define RMI_UNASSIGNED			0
>> +#define RMI_ASSIGNED			1
>> +#define RMI_TABLE			2
>> +
>> +#define RMI_RETURN_STATUS(ret)		((ret) & 0xFF)
>> +#define RMI_RETURN_INDEX(ret)		(((ret) >> 8) & 0xFF)
>> +#define RMI_RETURN_MEMREQ(ret)		(((ret) >> 8) & 0x3)
>> +#define RMI_RETURN_CAN_CANCEL(ret)	(((ret) >> 10) & 0x1)
> 
> Use FIELD_GET() and specify masks that define the actual fields.
> 
>> +
>> +#define RMI_SUCCESS			0
>> +#define RMI_ERROR_INPUT			1
>> +#define RMI_ERROR_REALM			2
>> +#define RMI_ERROR_REC			3
>> +#define RMI_ERROR_RTT			4
>> +#define RMI_ERROR_NOT_SUPPORTED		5
>> +#define RMI_ERROR_DEVICE		6
>> +#define RMI_ERROR_RTT_AUX		7
>> +#define RMI_ERROR_PSMMU_ST		8
>> +#define RMI_ERROR_DPT			9
>> +#define RMI_BUSY			10
>> +#define RMI_ERROR_GLOBAL		11
>> +#define RMI_ERROR_TRACKING		12
>> +#define RMI_INCOMPLETE			13
>> +#define RMI_BLOCKED			14
>> +#define RMI_ERROR_GPT			15
>> +#define RMI_ERROR_GRANULE		16
>> +
>> +#define RMI_OP_MEM_REQ_NONE		0
>> +#define RMI_OP_MEM_REQ_DONATE		1
>> +#define RMI_OP_MEM_REQ_RECLAIM		2
>> +
>> +#define RMI_DONATE_SIZE(req)		((req) & 0x3)
>> +#define RMI_DONATE_COUNT_MASK		GENMASK(15, 2)
>> +#define RMI_DONATE_COUNT(req)		(((req) & RMI_DONATE_COUNT_MASK) >> 2)
>> +#define RMI_DONATE_CONTIG(req)		(!!((req) & BIT(16)))
>> +#define RMI_DONATE_STATE(req)		(!!((req) & BIT(17)))
> 
> FIELD_GET().
> 
>> +
>> +#define RMI_OP_MEM_DELEGATED		0
>> +#define RMI_OP_MEM_UNDELEGATED		1
>> +
>> +#define RMI_ADDR_TYPE_NONE		0
>> +#define RMI_ADDR_TYPE_SINGLE		1
>> +#define RMI_ADDR_TYPE_LIST		2
>> +
>> +#define RMI_ADDR_RANGE_SIZE_MASK	GENMASK(1, 0)
>> +#define RMI_ADDR_RANGE_COUNT_MASK	GENMASK(PAGE_SHIFT - 1, 2)
>> +#define RMI_ADDR_RANGE_ADDR_MASK	(PAGE_MASK & GENMASK(51, 0))
>> +#define RMI_ADDR_RANGE_STATE_MASK	BIT(63)
>> +
>> +#define RMI_ADDR_RANGE_SIZE(ar)		(FIELD_GET(RMI_ADDR_RANGE_SIZE_MASK, \
>> +						   (ar)))
>> +#define RMI_ADDR_RANGE_COUNT(ar)	(FIELD_GET(RMI_ADDR_RANGE_COUNT_MASK, \
>> +						   (ar)))
>> +#define RMI_ADDR_RANGE_ADDR(ar)		((ar) & RMI_ADDR_RANGE_ADDR_MASK)
>> +#define RMI_ADDR_RANGE_STATE(ar)	(FIELD_GET(RMI_ADDR_RANGE_STATE_MASK, \
>> +						   (ar)))
>> +
>> +enum rmi_ripas {
>> +	RMI_EMPTY = 0,
>> +	RMI_RAM = 1,
>> +	RMI_DESTROYED = 2,
>> +	RMI_DEV = 3,
>> +};
>> +
>> +#define RMI_NO_MEASURE_CONTENT	0
>> +#define RMI_MEASURE_CONTENT	1
>> +
>> +#define RMI_FEATURE_REGISTER_0_S2SZ		GENMASK(7, 0)
>> +#define RMI_FEATURE_REGISTER_0_LPA2		BIT(8)
>> +#define RMI_FEATURE_REGISTER_0_SVE		BIT(9)
>> +#define RMI_FEATURE_REGISTER_0_SVE_VL		GENMASK(13, 10)
>> +#define RMI_FEATURE_REGISTER_0_NUM_BPS		GENMASK(19, 14)
>> +#define RMI_FEATURE_REGISTER_0_NUM_WPS		GENMASK(25, 20)
>> +#define RMI_FEATURE_REGISTER_0_PMU		BIT(26)
>> +#define RMI_FEATURE_REGISTER_0_PMU_NUM_CTRS	GENMASK(31, 27)
>> +
>> +#define RMI_FEATURE_REGISTER_1_RMI_GRAN_SZ_4KB	BIT(0)
>> +#define RMI_FEATURE_REGISTER_1_RMI_GRAN_SZ_16KB	BIT(1)
>> +#define RMI_FEATURE_REGISTER_1_RMI_GRAN_SZ_64KB	BIT(2)
>> +#define RMI_FEATURE_REGISTER_1_HASH_SHA_256	BIT(3)
>> +#define RMI_FEATURE_REGISTER_1_HASH_SHA_384	BIT(4)
>> +#define RMI_FEATURE_REGISTER_1_HASH_SHA_512	BIT(5)
>> +#define RMI_FEATURE_REGISTER_1_MAX_RECS_ORDER	GENMASK(9, 6)
>> +#define RMI_FEATURE_REGISTER_1_L0GPTSZ		GENMASK(13, 10)
>> +#define RMI_FEATURE_REGISTER_1_PPS		GENMASK(16, 14)
>> +
>> +#define RMI_FEATURE_REGISTER_2_DA		BIT(0)
>> +#define RMI_FEATURE_REGISTER_2_DA_COH		BIT(1)
>> +#define RMI_FEATURE_REGISTER_2_VSMMU		BIT(2)
>> +#define RMI_FEATURE_REGISTER_2_ATS		BIT(3)
>> +#define RMI_FEATURE_REGISTER_2_MAX_VDEVS_ORDER	GENMASK(7, 4)
>> +#define RMI_FEATURE_REGISTER_2_VDEV_KROU	BIT(8)
>> +#define RMI_FEATURE_REGISTER_2_NON_TEE_STREAM	BIT(9)
>> +
>> +#define RMI_FEATURE_REGISTER_3_MAX_NUM_AUX_PLANES	GENMASK(3, 0)
>> +#define RMI_FEATURE_REGISTER_3_RTT_PLAN			GENMASK(5, 4)
>> +#define RMI_FEATURE_REGISTER_3_RTT_S2AP_INDIRECT	BIT(6)
>> +
>> +#define RMI_FEATURE_REGISTER_4_MEC_COUNT		GENMASK(63, 0)
>> +
>> +#define RMI_MEM_CATEGORY_CONVENTIONAL		0
>> +#define RMI_MEM_CATEGORY_DEV_NCOH		1
>> +#define RMI_MEM_CATEGORY_DEV_COH		2
>> +
>> +#define RMI_TRACKING_RESERVED			0
>> +#define RMI_TRACKING_NONE			1
>> +#define RMI_TRACKING_FINE			2
>> +#define RMI_TRACKING_COARSE			3
>> +
>> +#define RMI_GRANULE_SIZE_4KB	0
>> +#define RMI_GRANULE_SIZE_16KB	1
>> +#define RMI_GRANULE_SIZE_64KB	2
>> +
>> +/*
>> + * Note many of these fields are smaller than u64 but all fields have u64
>> + * alignment, so use u64 to ensure correct alignment.
>> + */
>> +struct rmm_config {
>> +	union { /* 0x0 */
>> +		struct {
>> +			u64 tracking_region_size;
>> +			u64 rmi_granule_size;
>> +		};
>> +		u8 sizer[0x1000];
> 
> SZ_4K?
> 
>> +	};
>> +};
>> +
>> +#define RMI_REALM_PARAM_FLAG_LPA2		BIT(0)
>> +#define RMI_REALM_PARAM_FLAG_SVE		BIT(1)
>> +#define RMI_REALM_PARAM_FLAG_PMU		BIT(2)
>> +
>> +struct realm_params {
>> +	union { /* 0x0 */
>> +		struct {
>> +			u64 flags;
>> +			u64 s2sz;
>> +			u64 sve_vl;
>> +			u64 num_bps;
>> +			u64 num_wps;
>> +			u64 pmu_num_ctrs;
>> +			u64 hash_algo;
>> +			u64 num_aux_planes;
>> +		};
>> +		u8 padding0[0x400];
> 
> SZ_1K? And similarly all over the shop?

Agreed to the comments above.

> 
> I haven't checked the details of the encodings (life is too short),
> but I wonder how much of this exists as an MRS and could be
> automatically generated?

Good point. This is something that we can check and get back to you.

Thanks
Suzuki


> 
> Thanks,
> 
> 	M.
> 


^ permalink raw reply

* Re: [PATCH v6 16/43] KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release()
From: Ackerley Tng @ 2026-05-21 14:40 UTC (permalink / raw)
  To: Sean Christopherson, Fuad Tabba
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ag8BmtzxTlcuA_zy@google.com>

Sean Christopherson <seanjc@google.com> writes:

>
> [...snip...]
>
> --- virt/kvm/guest_memfd.c
> +++ virt/kvm/guest_memfd.c
> @@ -640,9 +640,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  }
>
>  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> -                 unsigned int fd, loff_t offset)
> +                 unsigned int fd, u64 offset)
>  {
> -       loff_t size = slot->npages << PAGE_SHIFT;
> +       u64 size = slot->npages << PAGE_SHIFT;
>         unsigned long start, end;
>         struct gmem_file *f;
>         struct inode *inode;
>

My mental model was:

+ offsets => loff_t
+ indices => pgoff_t
+ sizes => size_t

But looks like loff_t is more suitable for places where return values
(possibly negative) matter.

Good to go with u64!

> [...snip...]
>

^ permalink raw reply

* Re: [PATCH v6 11/43] KVM: guest_memfd: Ensure pages are not in use before conversion
From: Ackerley Tng @ 2026-05-21 14:36 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTyaBpTYsJRRyP09YggoHbi6s-ZgDoWoFgDRxO5k_BkoBw@mail.gmail.com>

Fuad Tabba <tabba@google.com> writes:

>
> [...snip...]
>
>> +static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
>> +                                           size_t nr_pages, pgoff_t *err_index)
>> +{
>> +       struct address_space *mapping = inode->i_mapping;
>> +       const int filemap_get_folios_refcount = 1;
>> +       pgoff_t last = start + nr_pages - 1;
>> +       struct folio_batch fbatch;
>> +       bool safe = true;
>> +       int i;
>> +
>> +       folio_batch_init(&fbatch);
>> +       while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {
>> +
>> +               for (i = 0; i < folio_batch_count(&fbatch); ++i) {
>> +                       struct folio *folio = fbatch.folios[i];
>> +
>> +                       if (folio_ref_count(folio) !=
>> +                           folio_nr_pages(folio) + filemap_get_folios_refcount) {
>> +                               safe = false;
>> +                               *err_index = folio->index;
>> +                               break;
>
> https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=11
>

Sashiko's first issue on lru is addressed in a separate patch later. :)

> Sashiko raised a few issues here, but I think this one might be
> genuine. Can you look into it please?
>
> If that's right, when huge page support lands, if start falls in the
> middle of a large folio, returning folio->index as the err_index will
> return an offset strictly less than the requested start. A naive
> userspace retry loop resuming from error_offset would step backwards
> and corrupt attributes on memory it didn't intend to convert.
> err_index should be clamped to max(start, folio->index).
>

For these ones, I was thinking to defer all the huge-page related issues
to be fixed when huge pages land, since there are probably quite a few
places to update.

On second thought, this isn't a huge change, I'll fix this in the next
revision.

> Cheers,
> /fuad
>
>> +                       }
>> +               }
>> +
>>
>> [...snip...]
>>

^ permalink raw reply

* Re: [PATCH v14 10/44] arm64: RMI: Add support for SRO
From: Marc Zyngier @ 2026-05-21 14:35 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-11-steven.price@arm.com>

On Wed, 13 May 2026 14:17:18 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> RMM v2.0 introduces the concept of "Stateful RMI Operations" (SRO). This
> means that an SMC can return with an operation still in progress. The
> host is excepted to continue the operation until is reaches a conclusion
> (either success or failure). During this process the RMM can request
> additional memory ('donate') or hand memory back to the host
> ('reclaim'). The host can request an in progress operation is cancelled,
> but still continue the operation until it has completed (otherwise the
> incomplete operation may cause future RMM operations to fail).
> 
> The SRO is tracked using a struct rmi_sro_state object which keeps track
> of any memory which has been allocated but not yet consumed by the RMM
> or reclaimed from the RMM. This allows the memory to be reused in a
> future request within the same operation. It will also permit an
> operation to be done in a context where memory allocation may be
> difficult (e.g. atomic context) with the option to abort the operation
> and retry the memory allocation outside of the atomic context. The
> memory stored in the struct rmi_sro_state object can then be reused on
> the subsequent attempt.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> v14:
>  * SRO support has improved although is still not fully complete. The
>    infrastructure has been moved out of KVM.
> ---
>  arch/arm64/include/asm/rmi_cmds.h |   1 +
>  arch/arm64/kernel/rmi.c           | 359 ++++++++++++++++++++++++++++++
>  2 files changed, 360 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> index eb213c8e6f26..1a7b0c8f1e38 100644
> --- a/arch/arm64/include/asm/rmi_cmds.h
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -35,6 +35,7 @@ struct rmi_sro_state {
>  
>  int rmi_delegate_range(phys_addr_t phys, unsigned long size);
>  int rmi_undelegate_range(phys_addr_t phys, unsigned long size);
> +int free_delegated_page(phys_addr_t phys);
>  
>  static inline int rmi_delegate_page(phys_addr_t phys)
>  {
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index 08cef54acadb..a8107ca9bb6d 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -48,6 +48,365 @@ int rmi_undelegate_range(phys_addr_t phys, unsigned long size)
>  	return ret;
>  }
>  
> +static unsigned long donate_req_to_size(unsigned long donatereq)
> +{
> +	unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> +
> +	switch (unit_size) {
> +	case 0:
> +		return PAGE_SIZE;
> +	case 1:
> +		return PMD_SIZE;
> +	case 2:
> +		return PUD_SIZE;
> +	case 3:
> +		return P4D_SIZE;

How does this work when we have folded levels? If this is supposed to
be the architected size, then it should actively express that:

	return BIT(unit_size * (PAGE_SHIFT - 3) + PAGE_SHIFT);

> +	}
> +	unreachable();
> +}
> +
> +static void rmi_smccc_invoke(struct arm_smccc_1_2_regs *regs_in,
> +			     struct arm_smccc_1_2_regs *regs_out)
> +{
> +	struct arm_smccc_1_2_regs regs = *regs_in;
> +	unsigned long status;
> +
> +	do {
> +		arm_smccc_1_2_invoke(&regs, regs_out);
> +		status = RMI_RETURN_STATUS(regs_out->a0);
> +	} while (status == RMI_BUSY || status == RMI_BLOCKED);
> +}
> +
> +int free_delegated_page(phys_addr_t phys)
> +{
> +	if (WARN_ON(rmi_undelegate_page(phys))) {

Please drop this WARN_ON(). Or at least make it ONCE. Everywhere.

> +		/* Undelegate failed: leak the page */
> +		return -EBUSY;
> +	}
> +
> +	free_page((unsigned long)phys_to_virt(phys));
> +
> +	return 0;
> +}
> +
> +static int rmi_sro_ensure_capacity(struct rmi_sro_state *sro,
> +				   unsigned long count)
> +{
> +	if (WARN_ON_ONCE(sro->addr_count > RMI_MAX_ADDR_LIST))
> +		return -EOVERFLOW;
> +
> +	if (count > RMI_MAX_ADDR_LIST - sro->addr_count)
> +		return -ENOSPC;
> +
> +	return 0;
> +}
> +
> +static int rmi_sro_donate_contig(struct rmi_sro_state *sro,
> +				 unsigned long sro_handle,
> +				 unsigned long donatereq,
> +				 struct arm_smccc_1_2_regs *out_regs,
> +				 gfp_t gfp)
> +{
> +	unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> +	unsigned long unit_size_bytes = donate_req_to_size(donatereq);
> +	unsigned long count = RMI_DONATE_COUNT(donatereq);
> +	unsigned long state = RMI_DONATE_STATE(donatereq);
> +	unsigned long size = unit_size_bytes * count;
> +	unsigned long addr_range;
> +	int ret;
> +	void *virt;
> +	phys_addr_t phys;
> +	struct arm_smccc_1_2_regs regs = {
> +		SMC_RMI_OP_MEM_DONATE,
> +		sro_handle
> +	};
> +
> +	for (int i = 0; i < sro->addr_count; i++) {
> +		unsigned long entry = sro->addr_list[i];
> +
> +		if (RMI_ADDR_RANGE_SIZE(entry) == unit_size &&
> +		    RMI_ADDR_RANGE_COUNT(entry) == count &&
> +		    RMI_ADDR_RANGE_STATE(entry) == state) {
> +			sro->addr_count--;
> +			swap(sro->addr_list[sro->addr_count],
> +			     sro->addr_list[i]);
> +
> +			goto out;
> +		}
> +	}
> +
> +	ret = rmi_sro_ensure_capacity(sro, 1);
> +	if (ret)
> +		return ret;
> +
> +	virt = alloc_pages_exact(size, gfp);
> +	if (!virt)
> +		return -ENOMEM;
> +	phys = virt_to_phys(virt);
> +
> +	if (state == RMI_OP_MEM_DELEGATED) {
> +		if (rmi_delegate_range(phys, size)) {
> +			free_pages_exact(virt, size);
> +			return -ENXIO;
> +		}
> +	}
> +
> +	addr_range = phys & RMI_ADDR_RANGE_ADDR_MASK;
> +	FIELD_MODIFY(RMI_ADDR_RANGE_SIZE_MASK, &addr_range, unit_size);
> +	FIELD_MODIFY(RMI_ADDR_RANGE_COUNT_MASK, &addr_range, count);
> +	FIELD_MODIFY(RMI_ADDR_RANGE_STATE_MASK, &addr_range, state);
> +
> +	sro->addr_list[sro->addr_count] = addr_range;
> +

Shouldn't this be moved to a helper that ensures capacity, and returns
an error otherwise?

> +out:
> +	regs.a2 = virt_to_phys(&sro->addr_list[sro->addr_count]);
> +	regs.a3 = 1;

This could really do with context specific helpers that populate regs
based on a set of parameters. I have no idea what this 1 here is, and
the init is spread over too much code. Think of the children!

That's valid for the whole patch.

	M.
> +	rmi_smccc_invoke(&regs, out_regs);
> +
> +	unsigned long donated_granules = out_regs->a1;
> +	unsigned long donated_size = donated_granules << PAGE_SHIFT;
> +
> +	if (donated_granules == 0) {
> +		/* No pages used by the RMM */
> +		sro->addr_count++;
> +	} else if (donated_size < size) {
> +		phys = sro->addr_list[sro->addr_count] & RMI_ADDR_RANGE_ADDR_MASK;
> +
> +		/* Not all granules used by the RMM, free the remaining pages */
> +		for (long i = donated_size; i < size; i += PAGE_SIZE) {
> +			if (state == RMI_OP_MEM_DELEGATED)
> +				free_delegated_page(phys + i);
> +			else
> +				__free_page(phys_to_page(phys + i));
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int rmi_sro_donate_noncontig(struct rmi_sro_state *sro,
> +				    unsigned long sro_handle,
> +				    unsigned long donatereq,
> +				    struct arm_smccc_1_2_regs *out_regs,
> +				    gfp_t gfp)
> +{
> +	unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> +	unsigned long unit_size_bytes = donate_req_to_size(donatereq);
> +	unsigned long count = RMI_DONATE_COUNT(donatereq);
> +	unsigned long state = RMI_DONATE_STATE(donatereq);
> +	unsigned long found = 0;
> +	unsigned long addr_list_start = sro->addr_count;
> +	int ret;
> +	struct arm_smccc_1_2_regs regs = {
> +		SMC_RMI_OP_MEM_DONATE,
> +		sro_handle
> +	};
> +
> +	for (int i = 0; i < addr_list_start && found < count; i++) {
> +		unsigned long entry = sro->addr_list[i];
> +
> +		if (RMI_ADDR_RANGE_SIZE(entry) == unit_size &&
> +		    RMI_ADDR_RANGE_COUNT(entry) == 1 &&
> +		    RMI_ADDR_RANGE_STATE(entry) == state) {
> +			addr_list_start--;
> +			swap(sro->addr_list[addr_list_start],
> +			     sro->addr_list[i]);
> +			found++;
> +			i--;
> +		}
> +	}
> +
> +	ret = rmi_sro_ensure_capacity(sro, count - found);
> +	if (ret)
> +		return ret;
> +
> +	while (found < count) {
> +		unsigned long addr_range;
> +		void *virt = alloc_pages_exact(unit_size_bytes, gfp);
> +		phys_addr_t phys;
> +
> +		if (!virt)
> +			return -ENOMEM;
> +
> +		phys = virt_to_phys(virt);
> +
> +		if (state == RMI_OP_MEM_DELEGATED) {
> +			if (rmi_delegate_range(phys, unit_size_bytes)) {
> +				free_pages_exact(virt, unit_size_bytes);
> +				return -ENXIO;
> +			}
> +		}
> +
> +		addr_range = phys & RMI_ADDR_RANGE_ADDR_MASK;
> +		FIELD_MODIFY(RMI_ADDR_RANGE_SIZE_MASK, &addr_range, unit_size);
> +		FIELD_MODIFY(RMI_ADDR_RANGE_COUNT_MASK, &addr_range, 1);
> +		FIELD_MODIFY(RMI_ADDR_RANGE_STATE_MASK, &addr_range, state);
> +
> +		sro->addr_list[sro->addr_count++] = addr_range;
> +		found++;
> +	}
> +
> +	regs.a2 = virt_to_phys(&sro->addr_list[addr_list_start]);
> +	regs.a3 = found;
> +	rmi_smccc_invoke(&regs, out_regs);
> +
> +	unsigned long donated_granules = out_regs->a1;
> +
> +	if (WARN_ON(donated_granules & ((unit_size_bytes >> PAGE_SHIFT) - 1))) {
> +		/*
> +		 * FIXME: RMM has only consumed part of a huge page, this leaks
> +		 * the rest of the huge page
> +		 */
> +		donated_granules = ALIGN(donated_granules,
> +					 (unit_size_bytes >> PAGE_SHIFT));
> +	}
> +	unsigned long donated_blocks = donated_granules / (unit_size_bytes >> PAGE_SHIFT);
> +
> +	if (WARN_ON(donated_blocks > found))
> +		donated_blocks = found;
> +
> +	unsigned long undonated_blocks = found - donated_blocks;
> +
> +	while (donated_blocks && undonated_blocks) {
> +		sro->addr_count--;
> +		swap(sro->addr_list[addr_list_start],
> +		     sro->addr_list[sro->addr_count]);
> +		addr_list_start++;
> +
> +		donated_blocks--;
> +		undonated_blocks--;
> +	}
> +	sro->addr_count -= donated_blocks;
> +
> +	return 0;
> +}
> +
> +static int rmi_sro_donate(struct rmi_sro_state *sro,
> +			  unsigned long sro_handle,
> +			  unsigned long donatereq,
> +			  struct arm_smccc_1_2_regs *regs,
> +			  gfp_t gfp)
> +{
> +	unsigned long count = RMI_DONATE_COUNT(donatereq);
> +
> +	if (WARN_ON(!count))
> +		return 0;
> +
> +	if (RMI_DONATE_CONTIG(donatereq)) {
> +		return rmi_sro_donate_contig(sro, sro_handle, donatereq,
> +					     regs, gfp);
> +	} else {
> +		return rmi_sro_donate_noncontig(sro, sro_handle, donatereq,
> +						regs, gfp);
> +	}
> +}
> +
> +static int rmi_sro_reclaim(struct rmi_sro_state *sro,
> +			   unsigned long sro_handle,
> +			   struct arm_smccc_1_2_regs *out_regs)
> +{
> +	unsigned long capacity;
> +	struct arm_smccc_1_2_regs regs;
> +	int ret;
> +
> +	ret = rmi_sro_ensure_capacity(sro, 1);
> +	if (ret)
> +		rmi_sro_free(sro);
> +
> +	capacity = RMI_MAX_ADDR_LIST - sro->addr_count;
> +
> +	regs = (struct arm_smccc_1_2_regs){
> +		SMC_RMI_OP_MEM_RECLAIM,
> +		sro_handle,
> +		virt_to_phys(&sro->addr_list[sro->addr_count]),
> +		capacity
> +	};
> +	rmi_smccc_invoke(&regs, out_regs);
> +
> +	if (WARN_ON_ONCE(out_regs->a1 > capacity))
> +		out_regs->a1 = capacity;
> +
> +	sro->addr_count += out_regs->a1;
> +
> +	return 0;
> +}
> +
> +void rmi_sro_free(struct rmi_sro_state *sro)
> +{
> +	for (int i = 0; i < sro->addr_count; i++) {
> +		unsigned long entry = sro->addr_list[i];
> +		unsigned long addr = RMI_ADDR_RANGE_ADDR(entry);
> +		unsigned long unit_size = RMI_ADDR_RANGE_SIZE(entry);
> +		unsigned long count = RMI_ADDR_RANGE_COUNT(entry);
> +		unsigned long state = RMI_ADDR_RANGE_STATE(entry);
> +		unsigned long size = donate_req_to_size(unit_size) * count;
> +
> +		if (state == RMI_OP_MEM_DELEGATED) {
> +			if (WARN_ON(rmi_undelegate_range(addr, size))) {
> +				/* Leak the pages */
> +				continue;
> +			}
> +		}
> +		free_pages_exact(phys_to_virt(addr), size);
> +	}
> +
> +	sro->addr_count = 0;
> +}
> +
> +unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp)
> +{
> +	unsigned long sro_handle;
> +	struct arm_smccc_1_2_regs regs;
> +	struct arm_smccc_1_2_regs *regs_in = &sro->regs;
> +
> +	rmi_smccc_invoke(regs_in, &regs);
> +
> +	sro_handle = regs.a1;
> +
> +	while (RMI_RETURN_STATUS(regs.a0) == RMI_INCOMPLETE) {
> +		bool can_cancel = RMI_RETURN_CAN_CANCEL(regs.a0);
> +		int ret;
> +
> +		switch (RMI_RETURN_MEMREQ(regs.a0)) {
> +		case RMI_OP_MEM_REQ_NONE:
> +			regs = (struct arm_smccc_1_2_regs){
> +				SMC_RMI_OP_CONTINUE, sro_handle, 0
> +			};
> +			rmi_smccc_invoke(&regs, &regs);
> +			break;
> +		case RMI_OP_MEM_REQ_DONATE:
> +			ret = rmi_sro_donate(sro, sro_handle, regs.a2, &regs,
> +					     gfp);
> +			break;
> +		case RMI_OP_MEM_REQ_RECLAIM:
> +			ret = rmi_sro_reclaim(sro, sro_handle, &regs);
> +			break;
> +		default:
> +			ret = WARN_ON(1);
> +			break;
> +		}
> +
> +		if (ret) {
> +			if (can_cancel) {
> +				/*
> +				 * FIXME: Handle cancelling properly!
> +				 *
> +				 * If the operation has failed due to memory
> +				 * allocation failure then the information on
> +				 * the memory allocation should be saved, so
> +				 * that the allocation can be repeated outside
> +				 * of any context which prevented the
> +				 * allocation.

Honestly, this is the sort of stuff that I'd expect to be solved
*before* posting this code. Since this is so central to the whole
memory management, it needs to be correct from day-1.

If you can't make it work in time, then tone the supported features
down. But FIXMEs and WARN_ONs are not the way to go.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Ackerley Tng @ 2026-05-21 14:29 UTC (permalink / raw)
  To: Sean Christopherson, Fuad Tabba
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ag8JIlHjohAOC3-g@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Thu, May 21, 2026, Fuad Tabba wrote:
>> On Wed, 20 May 2026 at 22:44, Ackerley Tng <ackerleytng@google.com> wrote:
>> >
>> > Fuad Tabba <tabba@google.com> writes:
>> >
>> > >
>> > > [...snip...]
>> > >
>> > >> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
>> > >> +{
>> > >> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
>> > >> +       struct inode *inode;
>> > >> +
>> > >> +       /*
>> > >> +        * If this gfn has no associated memslot, there's no chance of the gfn
>> > >> +        * being backed by private memory, since guest_memfd must be used for
>> > >> +        * private memory, and guest_memfd must be associated with some memslot.
>> > >> +        */
>> > >> +       if (!slot)
>> > >> +               return 0;
>> > >> +
>> > >> +       CLASS(gmem_get_file, file)(slot);
>> > >> +       if (!file)
>> > >> +               return 0;
>> > >> +
>> > >> +       inode = file_inode(file);
>> > >> +
>> > >> +       /*
>> > >> +        * Rely on the maple tree's internal RCU lock to ensure a
>> > >> +        * stable result. This result can become stale as soon as the
>> > >> +        * lock is dropped, so the caller _must_ still protect
>> > >> +        * consumption of private vs. shared by checking
>> > >> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
>> > >> +        * against ongoing attribute updates.
>> > >> +        */
>> > >> +       return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
>> > >> +}
>> > >
>> > > Doesn't this imply that all consumers of kvm_mem_is_private() should
>> > > validate the result using mmu_lock and the invalidation sequence?
>> >
>> > Let me know how I can improve the comment.
>>
>> Given Sean's context, the comment is good I think. I would quibble
>> with the the "_must_ still protect" phrasing being a bit too strict.
>>
>> Maybe just soften it slightly to acknowledge the exception? Something like:
>>
>>   * lock is dropped, so callers that require a strict result _must_ protect
>>   * consumption of private vs. shared by checking mmu_invalidate_retry_gfn()
>>   * under mmu_lock to serialize against ongoing attribute updates. Callers
>>   * doing lockless reads must be able to tolerate a stale result.
>>
>> That aligns the comment with how KVM is actually using it today. That
>> said, this is nitpicking. Feel free to use or ignore.
>
> Hmm, I wonder if we can figure out a way to consolidate some documentation,
> because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
> deals with (see its big comment below).
>

This would be great, are you thinking an actual comment or something in
Documentation/?

Perhaps we could iterate on this a little with me providing the newbie
perspective. Do you want me to take a stab at writing something up?

> There's also the stale comment in kvm_invalidate_memslot(), which, stating the
> obvious, speaks to the memslot+SRCU side of things.
>
> Maybe it makes sense to to find a central location for one giant comment about
> how how MMU notifier events and memslot+SRCU protections work?  And then refer
> to that in paths where some asset needs to be tied into MMU notifiers and/or
> memslots+SRCU?
>
> [*] https://lore.kernel.org/all/agcbWe8s9lmPuJwG@google.com
>
> [...snip...]
>

^ permalink raw reply

* Re: [PATCH v14 08/44] arm64: RMI: Ensure that the RMM has GPT entries for memory
From: Marc Zyngier @ 2026-05-21 14:24 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <868q9cx4ac.wl-maz@kernel.org>

On Thu, 21 May 2026 14:47:55 +0100,
Marc Zyngier <maz@kernel.org> wrote:
> 
> On Wed, 13 May 2026 14:17:16 +0100,
> Steven Price <steven.price@arm.com> wrote:
> > 
> > +/*
> > + * For now we set the tracking_region_size to 0 for RMI_RMM_CONFIG_SET().
> > + * TODO: Support other tracking sizes (via Kconfig option).
> > + */
> > +#ifdef CONFIG_PAGE_SIZE_4KB
> > +#define RMM_GRANULE_TRACKING_SIZE	SZ_1G
> > +#elif defined(CONFIG_PAGE_SIZE_16KB)
> > +#define RMM_GRANULE_TRACKING_SIZE	SZ_32M
> > +#elif defined(CONFIG_PAGE_SIZE_64KB)
> > +#define RMM_GRANULE_TRACKING_SIZE	SZ_512M
> > +#endif
> 
> Basically, a level 2 mapping. Which means this whole block really is:
> 
> #define RMM_GRANULE_TRAKING_SIZE	(2 * PAGE_SHIFT - 3)

Obviously wrong:

#define RMM_GRANULE_TRAKING_SIZE	BIT(2 * PAGE_SHIFT - 3)

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v6 19/43] KVM: Let userspace disable per-VM mem attributes, enable per-gmem attributes
From: Sean Christopherson @ 2026-05-21 14:21 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: ackerleytng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTzu=28Sr3=9A9LmJEGz0tBEDbU9taznVV5kdL7s8Nw=Jg@mail.gmail.com>

On Thu, May 21, 2026, Fuad Tabba wrote:
> Hi Ackerley,
> 
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >
> > From: Sean Christopherson <seanjc@google.com>
> >
> > Make vm_memory_attributes a module parameter so that userspace can disable
> > the use of memory attributes on the VM level.
> >
> > To avoid inconsistencies in the way memory attributes are tracked in KVM
> > and guest_memfd, the vm_memory_attributes module_param is made
> > read-only (0444).
> >
> > Make CONFIG_KVM_VM_MEMORY_ATTRIBUTES selectable, only for (CoCo) VM types
> > that might use vm_memory_attributes.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> 
> Config files always confuse me, but Sashiko might be onto something:
> 
> https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=19

: Since this prompt does not have a default value, will it default to N
: and silently drop KVM_VM_MEMORY_ATTRIBUTES during configuration updates
: like make olddefconfig?
: 
: Existing userspace VMMs that rely on the KVM_SET_MEMORY_ATTRIBUTES ioctl
: for TDX or SEV VMs might fail to boot if the feature is unexpectedly
: compiled out. Could a default y be used to preserve backwards
: compatibility for existing configurations?

> I think this partially goes back to commit 6, the one I flagged
> yesterday. But also adding "default y" to KVM_VM_MEMORY_ATTRIBUTES?
> The default value should at least fix this issue, but I'm not sure if
> it would cause other problems...

Hrm.  As much as I want per-gmem attributes to be the default going forward,
silently breaking existing setups isn't great.  On the other hand, I'm *very*
skeptical there are any SNP or TDX deployments using a distro kernel, so I'm
still leaning towards forcing the issue and turning per-VM attributes off by
default.

^ permalink raw reply

* Re: [PATCH v3 27/41] x86/kvmclock: Enable kvmclock on APs during onlining if kvmclock isn't sched_clock
From: David Woodhouse @ 2026-05-21 14:13 UTC (permalink / raw)
  To: Sean Christopherson, Peter Zijlstra
  Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Jan Kiszka, Dave Hansen, Andy Lutomirski, Juergen Gross,
	Daniel Lezcano, Thomas Gleixner, John Stultz, Rick Edgecombe,
	Vitaly Kuznetsov, Broadcom internal kernel review list,
	Boris Ostrovsky, Stephen Boyd, x86, linux-coco, kvm, linux-hyperv,
	virtualization, linux-kernel, xen-devel, Michael Kelley,
	Tom Lendacky, Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <ag8K2FRGcoEa-D2Y@google.com>

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

On Thu, 2026-05-21 at 06:38 -0700, Sean Christopherson wrote:
> On Thu, May 21, 2026, Peter Zijlstra wrote:
> > On Thu, May 21, 2026 at 05:59:17AM -0700, Sean Christopherson wrote:
> > > On Thu, May 21, 2026, David Woodhouse wrote:
> > > > On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > > > > In anticipation of making x86_cpuinit.early_percpu_clock_init(), i.e.
> > > > > kvm_setup_secondary_clock(), a dedicated sched_clock hook that will be
> > > > > invoked if and only if kvmclock is set as sched_clock, ensure APs enable
> > > > > their kvmclock during CPU online.  While a redundant write to the MSR is
> > > > > technically ok, skip the registration when kvmclock is sched_clock so that
> > > > > it's somewhat obvious that kvmclock *needs* to be enabled during early
> > > > > bringup when it's being used as sched_clock.
> > > > > 
> > > > > Plumb in the BSP's resume path purely for documentation purposes.  Both
> > > > > KVM (as-a-guest) and timekeeping/clocksource hook syscore_ops, and it's
> > > > > not super obvious that using KVM's hooks would be flawed.  E.g. it would
> > > > > work today, because KVM's hooks happen to run after/before timekeeping's
> > > > > hooks during suspend/resume, but that's sheer dumb luck as the order in
> > > > > which syscore_ops are invoked depends entirely on when a subsystem is
> > > > > initialized and thus registers its hooks.
> > > > > 
> > > > > Opportunsitically make the registration messages more precise to help
> > > > > debug issues where kvmclock is enabled too late.
> > > > 
> > > > That's a hard word to type, isn't it?
> > > 
> > > Heh, you have no idea.  I've been "this" close to creating a VIM binding for a
> > > while, it is time...
> > 
> > 'z=' not good enough?
> 
> You people and your fancy ways.  I'm just happy I can get in and out of the editor :-)

I reached the peak of my vi knowledge in about 1995 when I learned that
I could log in on another terminal, kill it from there, and then set
EDITOR=emacs.

Ironically I still find myself doing that kind of thing when I'm
composing a git-send-email cover letter and decide I don't want to send
the series as-is at all. Maybe there's a way to put a poison pill in
the message (or save it unchanged?) to make git *NOT* send anything...
but I always err on the side of caution and just nuke it from orbit, or
at least from another terminal.


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v14 09/44] arm64: RMI: Provide functions to delegate/undelegate ranges of memory
From: Marc Zyngier @ 2026-05-21 13:59 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-10-steven.price@arm.com>

On Wed, 13 May 2026 14:17:17 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> The RMM requires memory is 'delegated' to it so that it can be used
> either for a realm guest or for various tracking purposes within the RMM
> (e.g. for metadata or page tables). Memory that has been delegated
> cannot be accessed by the host (it will result in a Granule Protection
> Fault).
> 
> Undelegation may fail if the memory is still in use by the RMM. This
> shouldn't happen (Linux should ensure it has destroyed the RMM objects
> before attempting to undelegate). In the event that it does happen this
> points to a programming bug and the only reasonable approach is for the
> physical pages to be leaked - it is up to the caller of
> rmi_undelegate_range() to handle this.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> v14:
>  * Split into separate patch and moved out of KVM
> ---
>  arch/arm64/include/asm/rmi_cmds.h | 13 +++++++++++
>  arch/arm64/kernel/rmi.c           | 36 +++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> index 9078a2920a7c..eb213c8e6f26 100644
> --- a/arch/arm64/include/asm/rmi_cmds.h
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -33,6 +33,19 @@ struct rmi_sro_state {
>  } while (RMI_RETURN_STATUS(res.a0) == RMI_BUSY ||			\
>  	 RMI_RETURN_STATUS(res.a0) == RMI_BLOCKED)
>  
> +int rmi_delegate_range(phys_addr_t phys, unsigned long size);
> +int rmi_undelegate_range(phys_addr_t phys, unsigned long size);
> +
> +static inline int rmi_delegate_page(phys_addr_t phys)
> +{
> +	return rmi_delegate_range(phys, PAGE_SIZE);
> +}
> +
> +static inline int rmi_undelegate_page(phys_addr_t phys)
> +{
> +	return rmi_undelegate_range(phys, PAGE_SIZE);
> +}
> +
>  bool rmi_is_available(void);
>  
>  unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp);
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index 52a415e99500..08cef54acadb 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -12,6 +12,42 @@ static bool arm64_rmi_is_available;
>  unsigned long rmm_feat_reg0;
>  unsigned long rmm_feat_reg1;
>  
> +int rmi_delegate_range(phys_addr_t phys, unsigned long size)
> +{
> +	unsigned long ret = 0;
> +	unsigned long top = phys + size;
> +	unsigned long out_top;
> +
> +	while (phys < top) {
> +		ret = rmi_granule_range_delegate(phys, top, &out_top);
> +		if (ret == RMI_SUCCESS)
> +			phys = out_top;
> +		else if (ret != RMI_BUSY && ret != RMI_BLOCKED)
> +			return ret;
> +	}
> +
> +	return ret;
> +}
> +
> +int rmi_undelegate_range(phys_addr_t phys, unsigned long size)
> +{
> +	unsigned long ret = 0;
> +	unsigned long top = phys + size;
> +	unsigned long out_top;
> +
> +	WARN_ON(size == 0);

I find it odd to warn on size = 0. After all, free(NULL) is not an
error. But even then, you continue feeding this to the RMM.

You also don't seem to be bothered with that on the delegation side...

> +
> +	while (phys < top) {
> +		ret = rmi_granule_range_undelegate(phys, top, &out_top);
> +		if (ret == RMI_SUCCESS)
> +			phys = out_top;

and size==0 doesn't violate any of the failure conditions listed in
B4.5.18.2 (beta2). Will you end-up looping around forever?

Same questions for the delegation, obviously.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Fuad Tabba @ 2026-05-21 13:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ag8JIlHjohAOC3-g@google.com>

On Thu, 21 May 2026 at 14:31, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, May 21, 2026, Fuad Tabba wrote:
> > On Wed, 20 May 2026 at 22:44, Ackerley Tng <ackerleytng@google.com> wrote:
> > >
> > > Fuad Tabba <tabba@google.com> writes:
> > >
> > > >
> > > > [...snip...]
> > > >
> > > >> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > > >> +{
> > > >> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> > > >> +       struct inode *inode;
> > > >> +
> > > >> +       /*
> > > >> +        * If this gfn has no associated memslot, there's no chance of the gfn
> > > >> +        * being backed by private memory, since guest_memfd must be used for
> > > >> +        * private memory, and guest_memfd must be associated with some memslot.
> > > >> +        */
> > > >> +       if (!slot)
> > > >> +               return 0;
> > > >> +
> > > >> +       CLASS(gmem_get_file, file)(slot);
> > > >> +       if (!file)
> > > >> +               return 0;
> > > >> +
> > > >> +       inode = file_inode(file);
> > > >> +
> > > >> +       /*
> > > >> +        * Rely on the maple tree's internal RCU lock to ensure a
> > > >> +        * stable result. This result can become stale as soon as the
> > > >> +        * lock is dropped, so the caller _must_ still protect
> > > >> +        * consumption of private vs. shared by checking
> > > >> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
> > > >> +        * against ongoing attribute updates.
> > > >> +        */
> > > >> +       return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
> > > >> +}
> > > >
> > > > Doesn't this imply that all consumers of kvm_mem_is_private() should
> > > > validate the result using mmu_lock and the invalidation sequence?
> > >
> > > Let me know how I can improve the comment.
> >
> > Given Sean's context, the comment is good I think. I would quibble
> > with the the "_must_ still protect" phrasing being a bit too strict.
> >
> > Maybe just soften it slightly to acknowledge the exception? Something like:
> >
> >   * lock is dropped, so callers that require a strict result _must_ protect
> >   * consumption of private vs. shared by checking mmu_invalidate_retry_gfn()
> >   * under mmu_lock to serialize against ongoing attribute updates. Callers
> >   * doing lockless reads must be able to tolerate a stale result.
> >
> > That aligns the comment with how KVM is actually using it today. That
> > said, this is nitpicking. Feel free to use or ignore.
>
> Hmm, I wonder if we can figure out a way to consolidate some documentation,
> because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
> deals with (see its big comment below).
>
> There's also the stale comment in kvm_invalidate_memslot(), which, stating the
> obvious, speaks to the memslot+SRCU side of things.
>
> Maybe it makes sense to to find a central location for one giant comment about
> how how MMU notifier events and memslot+SRCU protections work?  And then refer
> to that in paths where some asset needs to be tied into MMU notifiers and/or
> memslots+SRCU?
>
> [*] https://lore.kernel.org/all/agcbWe8s9lmPuJwG@google.com

This would fix a few related issues at once. sgtm
/fuad


/fuad

>
> /*
>  * Lookup the mapping level for @gfn in the current mm.
>  *
>  * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
>  * consumer to be tied into KVM's handlers for MMU notifier events!
>  *
>  * There are several ways to safely use this helper:
>  *
>  * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, before
>  *   consuming it.  In this case, mmu_lock doesn't need to be held during the
>  *   lookup, but it does need to be held while checking the MMU notifier.
>  *
>  * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
>  *   event for the hva.  This can be done by explicit checking the MMU notifier
>  *   or by ensuring that KVM already has a valid mapping that covers the hva.
>  *
>  * - Do not use the result to install new mappings, e.g. use the host mapping
>  *   level only to decide whether or not to zap an entry.  In this case, it's
>  *   not required to hold mmu_lock (though it's highly likely the caller will
>  *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
>  *
>  * Note!  The lookup can still race with modifications to host page tables, but
>  * the above "rules" ensure KVM will not _consume_ the result of the walk if a
>  * race with the primary MMU occurs.
>  */

^ permalink raw reply

* Re: [PATCH v14 08/44] arm64: RMI: Ensure that the RMM has GPT entries for memory
From: Marc Zyngier @ 2026-05-21 13:47 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-9-steven.price@arm.com>

On Wed, 13 May 2026 14:17:16 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> The RMM maintains the state of all the granules in the system to make
> sure that the host is abiding by the rules. This state can be maintained
> at different granularity, per page (TRACKING_FINE) or per region
> (TRACKING_COARSE). The region size depends on the underlying
> "RMI_GRANULE_SIZE". For a "coarse" region all pages in the region must
> be of the same state, this implies we need to have "fine" tracking for
> DRAM, so that we can delegated individual pages.
> 
> For now we only support a statically carved out memory for tracking
> granules for the "fine" regions. This can be extended in the future to
> allow modifying the tracking granularity and remove the need for a
> static allocation.
> 
> Similarly, the firmware may create L0 GPT entries describing the total
> address space. But if we change the "PAS" (Physical Address Space) of a
> granule then the firmware may need to create L1 tables to track the PAS
> at a finer granularity.
> 
> Note: support is currently missing for SROs which means that if the RMM
> needs memory donating this will fail (and render CCA unusable in Linux).
> This effectively means that the L1 GPT tables must be created before
> Linux starts.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Moved out of KVM
> ---
>  arch/arm64/include/asm/rmi_cmds.h |   2 +
>  arch/arm64/kernel/rmi.c           | 103 ++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> index 9179934925c5..9078a2920a7c 100644
> --- a/arch/arm64/include/asm/rmi_cmds.h
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -33,6 +33,8 @@ struct rmi_sro_state {
>  } while (RMI_RETURN_STATUS(res.a0) == RMI_BUSY ||			\
>  	 RMI_RETURN_STATUS(res.a0) == RMI_BLOCKED)
>  
> +bool rmi_is_available(void);
> +
>  unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp);
>  void rmi_sro_free(struct rmi_sro_state *sro);
>  
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index a14ead5dedda..52a415e99500 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -7,6 +7,8 @@
>  
>  #include <asm/rmi_cmds.h>
>  
> +static bool arm64_rmi_is_available;
> +
>  unsigned long rmm_feat_reg0;
>  unsigned long rmm_feat_reg1;
>  
> @@ -88,6 +90,102 @@ static int rmi_configure(void)
>  	return 0;
>  }
>  
> +/*
> + * For now we set the tracking_region_size to 0 for RMI_RMM_CONFIG_SET().
> + * TODO: Support other tracking sizes (via Kconfig option).
> + */
> +#ifdef CONFIG_PAGE_SIZE_4KB
> +#define RMM_GRANULE_TRACKING_SIZE	SZ_1G
> +#elif defined(CONFIG_PAGE_SIZE_16KB)
> +#define RMM_GRANULE_TRACKING_SIZE	SZ_32M
> +#elif defined(CONFIG_PAGE_SIZE_64KB)
> +#define RMM_GRANULE_TRACKING_SIZE	SZ_512M
> +#endif

Basically, a level 2 mapping. Which means this whole block really is:

#define RMM_GRANULE_TRAKING_SIZE	(2 * PAGE_SHIFT - 3)

(adjust for D128 as needed).

> +
> +/*
> + * Make sure the area is tracked by RMM at FINE granularity.
> + * We do not support changing the tracking yet.
> + */
> +static int rmi_verify_memory_tracking(phys_addr_t start, phys_addr_t end)
> +{
> +	while (start < end) {
> +		unsigned long ret, category, state, next;
> +
> +		ret = rmi_granule_tracking_get(start, end, &category, &state, &next);
> +		if (ret != RMI_SUCCESS ||
> +		    state != RMI_TRACKING_FINE ||
> +		    category != RMI_MEM_CATEGORY_CONVENTIONAL) {
> +			/* TODO: Set granule tracking in this case */
> +			pr_err("Granule tracking for region isn't fine/conventional: %llx",
> +			       start);
> +			return -ENODEV;

How is this triggered? Do we really need to spam the console with
this? A PA doesn't mean much, and there is no context (stack trace).

If that's not expected, turn this into a WARN_ONCE().

> +		}
> +		start = next;
> +	}
> +
> +	return 0;
> +}
> +
> +static unsigned long rmi_l0gpt_size(void)
> +{
> +	return 1UL << (30 + FIELD_GET(RMI_FEATURE_REGISTER_1_L0GPTSZ,
> +				      rmm_feat_reg1));
> +}
> +
> +static int rmi_create_gpts(phys_addr_t start, phys_addr_t end)
> +{
> +	unsigned long l0gpt_sz = rmi_l0gpt_size();
> +
> +	start = ALIGN_DOWN(start, l0gpt_sz);
> +	end = ALIGN(end, l0gpt_sz);
> +
> +	while (start < end) {
> +		int ret = rmi_gpt_l1_create(start);
> +
> +		/*
> +		 * Make sure the L1 GPT tables are created for the region.
> +		 * RMI_ERROR_GPT indicates the L1 table already exists.
> +		 */
> +		if (ret && ret != RMI_ERROR_GPT) {
> +			/*
> +			 * FIXME: Handle SRO so that memory can be donated for
> +			 * the tables.
> +			 */
> +			pr_err("GPT Level1 table missing for %llx\n", start);
> +			return -ENOMEM;

If any of this fails, where is the cleanup done? Is that part of the
missing SRO support that's indicated in the commit message?

> +		}
> +		start += l0gpt_sz;
> +	}
> +
> +	return 0;
> +}
> +
> +static int rmi_init_metadata(void)
> +{
> +	phys_addr_t start, end;
> +	const struct memblock_region *r;
> +
> +	for_each_mem_region(r) {
> +		int ret;
> +
> +		start = memblock_region_memory_base_pfn(r) << PAGE_SHIFT;
> +		end = memblock_region_memory_end_pfn(r) << PAGE_SHIFT;
> +		ret = rmi_verify_memory_tracking(start, end);
> +		if (ret)
> +			return ret;
> +		ret = rmi_create_gpts(start, end);
> +		if (ret)
> +			return ret;
> +	}

How does this work with, say, memory hotplug?

> +
> +	return 0;
> +}
> +
> +bool rmi_is_available(void)
> +{
> +	return arm64_rmi_is_available;
> +}
> +
>  static int __init arm64_init_rmi(void)
>  {
>  	/* Continue without realm support if we can't agree on a version */
> @@ -101,6 +199,11 @@ static int __init arm64_init_rmi(void)
>  
>  	if (rmi_configure())
>  		return 0;
> +	if (rmi_init_metadata())
> +		return 0;
> +
> +	arm64_rmi_is_available = true;
> +	pr_info("RMI configured");
>  
>  	return 0;
>  }

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v3 27/41] x86/kvmclock: Enable kvmclock on APs during onlining if kvmclock isn't sched_clock
From: Sean Christopherson @ 2026-05-21 13:38 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Woodhouse, Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Dave Hansen, Andy Lutomirski,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260521131019.GI3126523@noisy.programming.kicks-ass.net>

On Thu, May 21, 2026, Peter Zijlstra wrote:
> On Thu, May 21, 2026 at 05:59:17AM -0700, Sean Christopherson wrote:
> > On Thu, May 21, 2026, David Woodhouse wrote:
> > > On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > > > In anticipation of making x86_cpuinit.early_percpu_clock_init(), i.e.
> > > > kvm_setup_secondary_clock(), a dedicated sched_clock hook that will be
> > > > invoked if and only if kvmclock is set as sched_clock, ensure APs enable
> > > > their kvmclock during CPU online.  While a redundant write to the MSR is
> > > > technically ok, skip the registration when kvmclock is sched_clock so that
> > > > it's somewhat obvious that kvmclock *needs* to be enabled during early
> > > > bringup when it's being used as sched_clock.
> > > > 
> > > > Plumb in the BSP's resume path purely for documentation purposes.  Both
> > > > KVM (as-a-guest) and timekeeping/clocksource hook syscore_ops, and it's
> > > > not super obvious that using KVM's hooks would be flawed.  E.g. it would
> > > > work today, because KVM's hooks happen to run after/before timekeeping's
> > > > hooks during suspend/resume, but that's sheer dumb luck as the order in
> > > > which syscore_ops are invoked depends entirely on when a subsystem is
> > > > initialized and thus registers its hooks.
> > > > 
> > > > Opportunsitically make the registration messages more precise to help
> > > > debug issues where kvmclock is enabled too late.
> > > 
> > > That's a hard word to type, isn't it?
> > 
> > Heh, you have no idea.  I've been "this" close to creating a VIM binding for a
> > while, it is time...
> 
> 'z=' not good enough?

You people and your fancy ways.  I'm just happy I can get in and out of the editor :-)

^ permalink raw reply

* Re: [PATCH v6 05/43] KVM: guest_memfd: Wire up kvm_get_memory_attributes() to per-gmem attributes
From: Sean Christopherson @ 2026-05-21 13:31 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: Ackerley Tng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTzLCD-dU-euZKgzwyEr2ecPqFDNutcaHm2fCDGA+MHVXA@mail.gmail.com>

On Thu, May 21, 2026, Fuad Tabba wrote:
> On Wed, 20 May 2026 at 22:44, Ackerley Tng <ackerleytng@google.com> wrote:
> >
> > Fuad Tabba <tabba@google.com> writes:
> >
> > >
> > > [...snip...]
> > >
> > >> +unsigned long kvm_gmem_get_memory_attributes(struct kvm *kvm, gfn_t gfn)
> > >> +{
> > >> +       struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);
> > >> +       struct inode *inode;
> > >> +
> > >> +       /*
> > >> +        * If this gfn has no associated memslot, there's no chance of the gfn
> > >> +        * being backed by private memory, since guest_memfd must be used for
> > >> +        * private memory, and guest_memfd must be associated with some memslot.
> > >> +        */
> > >> +       if (!slot)
> > >> +               return 0;
> > >> +
> > >> +       CLASS(gmem_get_file, file)(slot);
> > >> +       if (!file)
> > >> +               return 0;
> > >> +
> > >> +       inode = file_inode(file);
> > >> +
> > >> +       /*
> > >> +        * Rely on the maple tree's internal RCU lock to ensure a
> > >> +        * stable result. This result can become stale as soon as the
> > >> +        * lock is dropped, so the caller _must_ still protect
> > >> +        * consumption of private vs. shared by checking
> > >> +        * mmu_invalidate_retry_gfn() under mmu_lock to serialize
> > >> +        * against ongoing attribute updates.
> > >> +        */
> > >> +       return kvm_gmem_get_attributes(inode, kvm_gmem_get_index(slot, gfn));
> > >> +}
> > >
> > > Doesn't this imply that all consumers of kvm_mem_is_private() should
> > > validate the result using mmu_lock and the invalidation sequence?
> >
> > Let me know how I can improve the comment.
> 
> Given Sean's context, the comment is good I think. I would quibble
> with the the "_must_ still protect" phrasing being a bit too strict.
> 
> Maybe just soften it slightly to acknowledge the exception? Something like:
> 
>   * lock is dropped, so callers that require a strict result _must_ protect
>   * consumption of private vs. shared by checking mmu_invalidate_retry_gfn()
>   * under mmu_lock to serialize against ongoing attribute updates. Callers
>   * doing lockless reads must be able to tolerate a stale result.
> 
> That aligns the comment with how KVM is actually using it today. That
> said, this is nitpicking. Feel free to use or ignore.

Hmm, I wonder if we can figure out a way to consolidate some documentation,
because this is _exactly_ the same pattern that x86's host_pfn_mapping_level()
deals with (see its big comment below).

There's also the stale comment in kvm_invalidate_memslot(), which, stating the
obvious, speaks to the memslot+SRCU side of things.

Maybe it makes sense to to find a central location for one giant comment about
how how MMU notifier events and memslot+SRCU protections work?  And then refer
to that in paths where some asset needs to be tied into MMU notifiers and/or
memslots+SRCU?

[*] https://lore.kernel.org/all/agcbWe8s9lmPuJwG@google.com


/*
 * Lookup the mapping level for @gfn in the current mm.
 *
 * WARNING!  Use of host_pfn_mapping_level() requires the caller and the end
 * consumer to be tied into KVM's handlers for MMU notifier events!
 *
 * There are several ways to safely use this helper:
 *
 * - Check mmu_invalidate_retry_gfn() after grabbing the mapping level, before
 *   consuming it.  In this case, mmu_lock doesn't need to be held during the
 *   lookup, but it does need to be held while checking the MMU notifier.
 *
 * - Hold mmu_lock AND ensure there is no in-progress MMU notifier invalidation
 *   event for the hva.  This can be done by explicit checking the MMU notifier
 *   or by ensuring that KVM already has a valid mapping that covers the hva.
 *
 * - Do not use the result to install new mappings, e.g. use the host mapping
 *   level only to decide whether or not to zap an entry.  In this case, it's
 *   not required to hold mmu_lock (though it's highly likely the caller will
 *   want to hold mmu_lock anyways, e.g. to modify SPTEs).
 *
 * Note!  The lookup can still race with modifications to host page tables, but
 * the above "rules" ensure KVM will not _consume_ the result of the walk if a
 * race with the primary MMU occurs.
 */

^ permalink raw reply

* Re: [PATCH v14 07/44] arm64: RMI: Configure the RMM with the host's page size
From: Marc Zyngier @ 2026-05-21 13:30 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-8-steven.price@arm.com>

On Wed, 13 May 2026 14:17:15 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> RMM v2.0 brings the ability to set the RMM's granule size. Check the
> feature registers and configure the RMM so that it matches the host's
> page size. This means that operations can be done with a granulatity
> equal to PAGE_SIZE.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Moved out of KVM.
> ---
>  arch/arm64/kernel/rmi.c | 42 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 42 insertions(+)
> 
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> index 99c1ccc35c11..a14ead5dedda 100644
> --- a/arch/arm64/kernel/rmi.c
> +++ b/arch/arm64/kernel/rmi.c
> @@ -49,6 +49,45 @@ static int rmi_check_version(void)
>  	return 0;
>  }
>  
> +static int rmi_configure(void)
> +{
> +	struct rmm_config *config __free(free_page) = NULL;
> +	unsigned long ret;
> +
> +	config = (struct rmm_config *)get_zeroed_page(GFP_KERNEL);
> +	if (!config)
> +		return -ENOMEM;

This is the sort of buggy construct that is highlighted in
include/linux/cleanup.h: initialising the object for cleanup with
NULL, and only later assigning the expected value.

It may not matter here, but it will catch you (or more probably me) in
the future.

> +
> +	switch (PAGE_SIZE) {
> +	case SZ_4K:
> +		config->rmi_granule_size = RMI_GRANULE_SIZE_4KB;
> +		break;
> +	case SZ_16K:
> +		config->rmi_granule_size = RMI_GRANULE_SIZE_16KB;
> +		break;
> +	case SZ_64K:
> +		config->rmi_granule_size = RMI_GRANULE_SIZE_64KB;
> +		break;
> +	default:
> +		pr_err("Unsupported PAGE_SIZE for RMM\n");

Do you really anticipate PAGE_SIZE being any other value? This is 100%
dead code. If you want to be extra cautious, have a BUILD_BUg_ON().

> +		return -EINVAL;
> +	}
> +
> +	ret = rmi_rmm_config_set(virt_to_phys(config));
> +	if (ret) {
> +		pr_err("RMM config set failed\n");
> +		return -EINVAL;
> +	}

What is the live cycle of the page when the call succeeds? Is it
switched back to the NS PAS and allowed to be freed?

> +
> +	ret = rmi_rmm_activate();
> +	if (ret) {
> +		pr_err("RMM activate failed\n");
> +		return -ENXIO;
> +	}
> +
> +	return 0;
> +}
> +
>  static int __init arm64_init_rmi(void)
>  {
>  	/* Continue without realm support if we can't agree on a version */
> @@ -60,6 +99,9 @@ static int __init arm64_init_rmi(void)
>  	if (WARN_ON(rmi_features(1, &rmm_feat_reg1)))
>  		return 0;
>  
> +	if (rmi_configure())
> +		return 0;
> +
>  	return 0;
>  }
>  subsys_initcall(arm64_init_rmi);

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v6 16/43] KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release()
From: Fuad Tabba @ 2026-05-21 13:29 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: ackerleytng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ag8BmtzxTlcuA_zy@google.com>

On Thu, 21 May 2026 at 13:59, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, May 21, 2026, Fuad Tabba wrote:
> > Hi Ackerley,
> >
> > On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> > <devnull+ackerleytng.google.com@kernel.org> wrote:
> > >
> > > From: Ackerley Tng <ackerleytng@google.com>
> > >
> > > __kvm_gmem_invalidate_begin() and __kvm_gmem_invalidate_end() actually do
> > > not specially handle -1ul. -1ul is used as a huge number, which legal
> > > indices do not exceed, and hence the invalidation works as expected.
> > >
> > > Since a later patch is going to make use of the exact range, calculate the
> > > size of the guest_memfd inode and use it as the end range for invalidating
> > > SPTEs.
> > >
> > > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> >
> > Want to look at what Sashiko has to say? Seems to be a real issue:
> >
> > https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=16
> >
> > If I understand correctly, the fix should simple: use
> > check_add_overflow() to validate the offset and size parameters in
> > kvm_gmem_bind()
> >
> >    int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> >              unsigned int fd, loff_t offset)
> >    {
> >        loff_t size = slot->npages << PAGE_SHIFT;
> >    +    loff_t end;
> >        unsigned long start, end_index;
> >        struct gmem_file *f;
> > ...
> >    -    if (offset < 0 || !PAGE_ALIGNED(offset) ||
> >    -        offset + size > i_size_read(inode))
> >    +    if (offset < 0 || !PAGE_ALIGNED(offset) ||
> >    +        check_add_overflow(offset, size, &end) ||
>
> Eww, TIL I'm not a fan of check_add_overflow().  Burying an out-param in an
> if-statement is nasty.
>
> >    +        end > i_size_read(inode))
>
> This is all rather silly.  @offset and and @slot->npages are fundamentally
> unsigned values.   I don't see any reason to convert them to signed values, only
> to convert them *back* to unsigned values (when stored in start/end, because xarrays
> operate on "unsigned long" indices).
>
> i_size_read() obviously has to return a positive value, so can't we just do this?

lgtm,
/fuad

>
> diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
> index a35a55571a2d..9c6dbb54e800 100644
> --- virt/kvm/guest_memfd.c
> +++ virt/kvm/guest_memfd.c
> @@ -640,9 +640,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
>  }
>
>  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> -                 unsigned int fd, loff_t offset)
> +                 unsigned int fd, u64 offset)
>  {
> -       loff_t size = slot->npages << PAGE_SHIFT;
> +       u64 size = slot->npages << PAGE_SHIFT;
>         unsigned long start, end;
>         struct gmem_file *f;
>         struct inode *inode;
> @@ -664,8 +664,7 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>
>         inode = file_inode(file);
>
> -       if (offset < 0 || !PAGE_ALIGNED(offset) ||
> -           offset + size > i_size_read(inode))
> +       if (!PAGE_ALIGNED(offset) || offset + size > i_size_read(inode))
>                 goto err;
>
>         filemap_invalidate_lock(inode->i_mapping);
> diff --git virt/kvm/kvm_mm.h virt/kvm/kvm_mm.h
> index 9fcc5d5b7f8d..3cb5ef86d0d9 100644
> --- virt/kvm/kvm_mm.h
> +++ virt/kvm/kvm_mm.h
> @@ -72,7 +72,7 @@ int kvm_gmem_init(struct module *module);
>  void kvm_gmem_exit(void);
>  int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
>  int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> -                 unsigned int fd, loff_t offset);
> +                 unsigned int fd, u64 offset);
>  void kvm_gmem_unbind(struct kvm_memory_slot *slot);
>  #else
>  static inline int kvm_gmem_init(struct module *module)
> @@ -80,9 +80,8 @@ static inline int kvm_gmem_init(struct module *module)
>         return 0;
>  }
>  static inline void kvm_gmem_exit(void) {};
> -static inline int kvm_gmem_bind(struct kvm *kvm,
> -                                        struct kvm_memory_slot *slot,
> -                                        unsigned int fd, loff_t offset)
> +static inline int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
> +                               unsigned int fd, u64 offset)
>  {
>         WARN_ON_ONCE(1);
>         return -EIO;
>

^ permalink raw reply

* Re: [PATCH v6 21/43] KVM: SEV: Make 'uaddr' parameter optional for KVM_SEV_SNP_LAUNCH_UPDATE
From: Sean Christopherson @ 2026-05-21 13:21 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: ackerleytng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTwrygfMrZZSw4y7-ry8fidW2x0C7iuF2Q=dnPNHUmNtUg@mail.gmail.com>

On Thu, May 21, 2026, Fuad Tabba wrote:
> Hi,
> 
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >
> > From: Michael Roth <michael.roth@amd.com>
> >
> > For vm_memory_attributes=1, in-place conversion/population is not
> > supported, so the initial contents necessarily must need to come
> > from a separate src address, which is enforced by the current
> > implementation. However, for vm_memory_attributes=0, it is possible for
> > guest memory to be initialized directly from userspace by mmap()'ing the
> > guest_memfd and writing to it while the corresponding GPA ranges are in
> > a 'shared' state before converting them to the 'private' state expected
> > by KVM_SEV_SNP_LAUNCH_UPDATE.
> >
> > Update the handling/documentation for KVM_SEV_SNP_LAUNCH_UPDATE to allow
> > for 'uaddr' to be set to NULL when vm_memory_attributes=0, which
> > SNP_LAUNCH_UPDATE will then use to determine when it should/shouldn't
> > copy in data from a separate memory location. Continue to enforce
> > non-NULL for the original vm_memory_attributes=1 case.
> >
> > Signed-off-by: Michael Roth <michael.roth@amd.com>
> > [Added src_page check in error handling path when the firmware command fails]
> > [Dropped ifdef CONFIG_KVM_VM_MEMORY_ATTRIBUTES]
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> 
> I'm not very familiar with the SEV-SNP populate flows, but it looks
> like Sashiko is on to something:
> https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=21
> 
> - a potential read-only page overwrite, because src_page is acquired
> via get_user_pages_fast() without the FOLL_WRITE flag, but is then
> overwritten via memcpy

Oof, yeah, that's bad.  Adding FOLL_WRITE to kvm_gmem_populate() feels wrong, and
could break uABI, but doing gup() in SNP code would reintroduce the AB-BA issue
with filemap_invalidate_lock().

Aha!  Not if we use get_user_page_fast_only().  Ugh, but then we'd have to plumb
the userspace address into the post-populated callback.

Hrm.  Given that no one has yelled about overwriting their CPUID page, and given
that the CPUID page is likely dynamically created and thus is unlikely to be a
read-only mapping (e.g. versus the initial image), maybe this?

diff --git arch/x86/kvm/svm/sev.c arch/x86/kvm/svm/sev.c
index 37d4cfa5d980..c73c028d72c1 100644
--- arch/x86/kvm/svm/sev.c
+++ arch/x86/kvm/svm/sev.c
@@ -2456,6 +2456,7 @@ static int snp_launch_update(struct kvm *kvm, struct kvm_sev_cmd *argp)
        sev_populate_args.type = params.type;
 
        count = kvm_gmem_populate(kvm, params.gfn_start, src, npages,
+                                 params.type == KVM_SEV_SNP_PAGE_TYPE_CPUID,
                                  sev_gmem_post_populate, &sev_populate_args);
        if (count < 0) {
                argp->error = sev_populate_args.fw_error;
diff --git arch/x86/kvm/vmx/tdx.c arch/x86/kvm/vmx/tdx.c
index f97bcf580e6d..33f35be4455b 100644
--- arch/x86/kvm/vmx/tdx.c
+++ arch/x86/kvm/vmx/tdx.c
@@ -3188,7 +3188,7 @@ static int tdx_vcpu_init_mem_region(struct kvm_vcpu *vcpu, struct kvm_tdx_cmd *c
                };
                gmem_ret = kvm_gmem_populate(kvm, gpa_to_gfn(region.gpa),
                                             u64_to_user_ptr(region.source_addr),
-                                            1, tdx_gmem_post_populate, &arg);
+                                            1, false, tdx_gmem_post_populate, &arg);
                if (gmem_ret < 0) {
                        ret = gmem_ret;
                        break;
diff --git include/linux/kvm_host.h include/linux/kvm_host.h
index 61a3430957f2..b83cda2870ba 100644
--- include/linux/kvm_host.h
+++ include/linux/kvm_host.h
@@ -2596,7 +2596,8 @@ int kvm_arch_gmem_prepare(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn, int max_ord
 typedef int (*kvm_gmem_populate_cb)(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
                                    struct page *page, void *opaque);
 
-long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages,
+long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
+                      long npages, bool writable,
                       kvm_gmem_populate_cb post_populate, void *opaque);
 #endif
 
diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index a35a55571a2d..6553d4e032ce 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -858,7 +858,8 @@ static long __kvm_gmem_populate(struct kvm *kvm, struct kvm_memory_slot *slot,
        return ret;
 }
 
-long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long npages,
+long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src,
+                      long npages, bool writable,
                       kvm_gmem_populate_cb post_populate, void *opaque)
 {
        struct kvm_memory_slot *slot;
@@ -892,8 +893,9 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t start_gfn, void __user *src, long
 
                if (src) {
                        unsigned long uaddr = (unsigned long)src + i * PAGE_SIZE;
+                       unsigned int flags = writable ? FOLL_WRITE : 0;
 
-                       ret = get_user_pages_fast(uaddr, 1, 0, &src_page);
+                       ret = get_user_pages_fast(uaddr, 1, flags, &src_page);
                        if (ret < 0)
                                break;
                        if (ret != 1) {

> - an ordering violation with the kunmap_local() calls

Yeesh, that's a new one for me.  Thankfully this is 64-bit only, so it's not an
issue.

> These predate this patch series and are just being touched by the
> 'src_page' addition, but if Sashiko's right, these should probably be
> fixed sooner rather than later.

Yeah, ditto with the offset wrapping case.

^ permalink raw reply related

* Re: [PATCH v3 27/41] x86/kvmclock: Enable kvmclock on APs during onlining if kvmclock isn't sched_clock
From: Peter Zijlstra @ 2026-05-21 13:10 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: David Woodhouse, Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan,
	Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li, Ajay Kaher,
	Alexey Makhalov, Jan Kiszka, Dave Hansen, Andy Lutomirski,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <ag8Bpc_uVNrNWqfX@google.com>

On Thu, May 21, 2026 at 05:59:17AM -0700, Sean Christopherson wrote:
> On Thu, May 21, 2026, David Woodhouse wrote:
> > On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > > In anticipation of making x86_cpuinit.early_percpu_clock_init(), i.e.
> > > kvm_setup_secondary_clock(), a dedicated sched_clock hook that will be
> > > invoked if and only if kvmclock is set as sched_clock, ensure APs enable
> > > their kvmclock during CPU online.  While a redundant write to the MSR is
> > > technically ok, skip the registration when kvmclock is sched_clock so that
> > > it's somewhat obvious that kvmclock *needs* to be enabled during early
> > > bringup when it's being used as sched_clock.
> > > 
> > > Plumb in the BSP's resume path purely for documentation purposes.  Both
> > > KVM (as-a-guest) and timekeeping/clocksource hook syscore_ops, and it's
> > > not super obvious that using KVM's hooks would be flawed.  E.g. it would
> > > work today, because KVM's hooks happen to run after/before timekeeping's
> > > hooks during suspend/resume, but that's sheer dumb luck as the order in
> > > which syscore_ops are invoked depends entirely on when a subsystem is
> > > initialized and thus registers its hooks.
> > > 
> > > Opportunsitically make the registration messages more precise to help
> > > debug issues where kvmclock is enabled too late.
> > 
> > That's a hard word to type, isn't it?
> 
> Heh, you have no idea.  I've been "this" close to creating a VIM binding for a
> while, it is time...

'z=' not good enough?


^ permalink raw reply

* Re: [PATCH v14 06/44] arm64: RMI: Check for RMI support at init
From: Marc Zyngier @ 2026-05-21 13:02 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-7-steven.price@arm.com>

On Wed, 13 May 2026 14:17:14 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> Query the RMI version number and check if it is a compatible version.
> The first two feature registers are read and exposed for future code to
> use.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> v14:
>  * This moves the basic RMI setup into the 'kernel' directory. This is
>    because RMI will be used for some features outside of KVM so should
>    be available even if KVM isn't compiled in.
> ---
>  arch/arm64/include/asm/rmi_cmds.h |  3 ++
>  arch/arm64/kernel/Makefile        |  2 +-
>  arch/arm64/kernel/cpufeature.c    |  1 +
>  arch/arm64/kernel/rmi.c           | 65 +++++++++++++++++++++++++++++++
>  4 files changed, 70 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm64/kernel/rmi.c
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> index 04f7066894e9..9179934925c5 100644
> --- a/arch/arm64/include/asm/rmi_cmds.h
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -10,6 +10,9 @@
>  
>  #include <asm/rmi_smc.h>
>  
> +extern unsigned long rmm_feat_reg0;
> +extern unsigned long rmm_feat_reg1;
> +
>  struct rtt_entry {
>  	unsigned long walk_level;
>  	unsigned long desc;
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index 74b76bb70452..d68f351aae75 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -34,7 +34,7 @@ obj-y			:= debug-monitors.o entry.o irq.o fpsimd.o		\
>  			   cpufeature.o alternative.o cacheinfo.o		\
>  			   smp.o smp_spin_table.o topology.o smccc-call.o	\
>  			   syscall.o proton-pack.o idle.o patching.o pi/	\
> -			   rsi.o jump_label.o
> +			   rsi.o jump_label.o rmi.o
>  
>  obj-$(CONFIG_COMPAT)			+= sys32.o signal32.o			\
>  					   sys_compat.o
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 6d53bb15cf7b..8bdd95a8c2de 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -292,6 +292,7 @@ static const struct arm64_ftr_bits ftr_id_aa64isar3[] = {
>  static const struct arm64_ftr_bits ftr_id_aa64pfr0[] = {
>  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_CSV3_SHIFT, 4, 0),
>  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_CSV2_SHIFT, 4, 0),
> +	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_RME_SHIFT, 4, 0),
>  	ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_DIT_SHIFT, 4, 0),
>  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_NONSTRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_AMU_SHIFT, 4, 0),
>  	ARM64_FTR_BITS(FTR_HIDDEN, FTR_STRICT, FTR_LOWER_SAFE, ID_AA64PFR0_EL1_MPAM_SHIFT, 4, 0),
> diff --git a/arch/arm64/kernel/rmi.c b/arch/arm64/kernel/rmi.c
> new file mode 100644
> index 000000000000..99c1ccc35c11
> --- /dev/null
> +++ b/arch/arm64/kernel/rmi.c
> @@ -0,0 +1,65 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2023-2025 ARM Ltd.
> + */
> +
> +#include <linux/memblock.h>
> +
> +#include <asm/rmi_cmds.h>
> +
> +unsigned long rmm_feat_reg0;
> +unsigned long rmm_feat_reg1;

What is the requirement for making those globally accessible? Can't
they be made static and use an accessor that returns them? Can the
variables be made __ro_after_init?

> +
> +static int rmi_check_version(void)
> +{
> +	struct arm_smccc_res res;
> +	unsigned short version_major, version_minor;
> +	unsigned long host_version = RMI_ABI_VERSION(RMI_ABI_MAJOR_VERSION,
> +						     RMI_ABI_MINOR_VERSION);
> +	unsigned long aa64pfr0 = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
> +
> +	/* If RME isn't supported, then RMI can't be */
> +	if (cpuid_feature_extract_unsigned_field(aa64pfr0, ID_AA64PFR0_EL1_RME_SHIFT) == 0)
> +		return -ENXIO;
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_VERSION, host_version, &res);
> +
> +	if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
> +		return -ENXIO;
> +
> +	version_major = RMI_ABI_VERSION_GET_MAJOR(res.a1);
> +	version_minor = RMI_ABI_VERSION_GET_MINOR(res.a1);
> +
> +	if (res.a0 != RMI_SUCCESS) {
> +		unsigned short high_version_major, high_version_minor;
> +
> +		high_version_major = RMI_ABI_VERSION_GET_MAJOR(res.a2);
> +		high_version_minor = RMI_ABI_VERSION_GET_MINOR(res.a2);
> +
> +		pr_err("Unsupported RMI ABI (v%d.%d - v%d.%d) we want v%d.%d\n",
> +		       version_major, version_minor,
> +		       high_version_major, high_version_minor,
> +		       RMI_ABI_MAJOR_VERSION,
> +		       RMI_ABI_MINOR_VERSION);
> +		return -ENXIO;
> +	}
> +
> +	pr_info("RMI ABI version %d.%d\n", version_major, version_minor);
> +
> +	return 0;
> +}
> +
> +static int __init arm64_init_rmi(void)
> +{
> +	/* Continue without realm support if we can't agree on a version */
> +	if (rmi_check_version())
> +		return 0;
> +
> +	if (WARN_ON(rmi_features(0, &rmm_feat_reg0)))
> +		return 0;
> +	if (WARN_ON(rmi_features(1, &rmm_feat_reg1)))
> +		return 0;
> +
> +	return 0;
> +}
> +subsys_initcall(arm64_init_rmi);

Is there any reliance on this being executed before or after KVM's own
initialisation? If so, this should be captured.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v3 27/41] x86/kvmclock: Enable kvmclock on APs during onlining if kvmclock isn't sched_clock
From: Sean Christopherson @ 2026-05-21 12:59 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Kiryl Shutsemau, Paolo Bonzini, K. Y. Srinivasan, Haiyang Zhang,
	Wei Liu, Dexuan Cui, Long Li, Ajay Kaher, Alexey Makhalov,
	Jan Kiszka, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Juergen Gross, Daniel Lezcano, Thomas Gleixner, John Stultz,
	Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <423b37f056f0d4d596d5f4cc73802fb1079ecf63.camel@infradead.org>

On Thu, May 21, 2026, David Woodhouse wrote:
> On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> > In anticipation of making x86_cpuinit.early_percpu_clock_init(), i.e.
> > kvm_setup_secondary_clock(), a dedicated sched_clock hook that will be
> > invoked if and only if kvmclock is set as sched_clock, ensure APs enable
> > their kvmclock during CPU online.  While a redundant write to the MSR is
> > technically ok, skip the registration when kvmclock is sched_clock so that
> > it's somewhat obvious that kvmclock *needs* to be enabled during early
> > bringup when it's being used as sched_clock.
> > 
> > Plumb in the BSP's resume path purely for documentation purposes.  Both
> > KVM (as-a-guest) and timekeeping/clocksource hook syscore_ops, and it's
> > not super obvious that using KVM's hooks would be flawed.  E.g. it would
> > work today, because KVM's hooks happen to run after/before timekeeping's
> > hooks during suspend/resume, but that's sheer dumb luck as the order in
> > which syscore_ops are invoked depends entirely on when a subsystem is
> > initialized and thus registers its hooks.
> > 
> > Opportunsitically make the registration messages more precise to help
> > debug issues where kvmclock is enabled too late.
> 
> That's a hard word to type, isn't it?

Heh, you have no idea.  I've been "this" close to creating a VIM binding for a
while, it is time...

^ permalink raw reply

* Re: [PATCH v6 16/43] KVM: guest_memfd: Use actual size for invalidation in kvm_gmem_release()
From: Sean Christopherson @ 2026-05-21 12:59 UTC (permalink / raw)
  To: Fuad Tabba
  Cc: ackerleytng, aik, andrew.jones, binbin.wu, brauner, chao.p.peng,
	david, ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H. Peter Anvin, Steven Rostedt, Masami Hiramatsu,
	Mathieu Desnoyers, Jonathan Corbet, Shuah Khan, Shuah Khan,
	Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
	Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka, kvm,
	linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <CA+EHjTxcadguOfOo7RpJVtAzcY5JAFZTbrAT_wcN6akMi8gCUg@mail.gmail.com>

On Thu, May 21, 2026, Fuad Tabba wrote:
> Hi Ackerley,
> 
> On Thu, 7 May 2026 at 21:22, Ackerley Tng via B4 Relay
> <devnull+ackerleytng.google.com@kernel.org> wrote:
> >
> > From: Ackerley Tng <ackerleytng@google.com>
> >
> > __kvm_gmem_invalidate_begin() and __kvm_gmem_invalidate_end() actually do
> > not specially handle -1ul. -1ul is used as a huge number, which legal
> > indices do not exceed, and hence the invalidation works as expected.
> >
> > Since a later patch is going to make use of the exact range, calculate the
> > size of the guest_memfd inode and use it as the end range for invalidating
> > SPTEs.
> >
> > Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> 
> Want to look at what Sashiko has to say? Seems to be a real issue:
> 
> https://sashiko.dev/#/patchset/20260507-gmem-inplace-conversion-v6-0-91ab5a8b19a4%40google.com?part=16
> 
> If I understand correctly, the fix should simple: use
> check_add_overflow() to validate the offset and size parameters in
> kvm_gmem_bind()
> 
>    int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
>              unsigned int fd, loff_t offset)
>    {
>        loff_t size = slot->npages << PAGE_SHIFT;
>    +    loff_t end;
>        unsigned long start, end_index;
>        struct gmem_file *f;
> ...
>    -    if (offset < 0 || !PAGE_ALIGNED(offset) ||
>    -        offset + size > i_size_read(inode))
>    +    if (offset < 0 || !PAGE_ALIGNED(offset) ||
>    +        check_add_overflow(offset, size, &end) ||

Eww, TIL I'm not a fan of check_add_overflow().  Burying an out-param in an
if-statement is nasty.

>    +        end > i_size_read(inode))

This is all rather silly.  @offset and and @slot->npages are fundamentally
unsigned values.   I don't see any reason to convert them to signed values, only
to convert them *back* to unsigned values (when stored in start/end, because xarrays
operate on "unsigned long" indices).

i_size_read() obviously has to return a positive value, so can't we just do this?

diff --git virt/kvm/guest_memfd.c virt/kvm/guest_memfd.c
index a35a55571a2d..9c6dbb54e800 100644
--- virt/kvm/guest_memfd.c
+++ virt/kvm/guest_memfd.c
@@ -640,9 +640,9 @@ int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args)
 }
 
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
-                 unsigned int fd, loff_t offset)
+                 unsigned int fd, u64 offset)
 {
-       loff_t size = slot->npages << PAGE_SHIFT;
+       u64 size = slot->npages << PAGE_SHIFT;
        unsigned long start, end;
        struct gmem_file *f;
        struct inode *inode;
@@ -664,8 +664,7 @@ int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
 
        inode = file_inode(file);
 
-       if (offset < 0 || !PAGE_ALIGNED(offset) ||
-           offset + size > i_size_read(inode))
+       if (!PAGE_ALIGNED(offset) || offset + size > i_size_read(inode))
                goto err;
 
        filemap_invalidate_lock(inode->i_mapping);
diff --git virt/kvm/kvm_mm.h virt/kvm/kvm_mm.h
index 9fcc5d5b7f8d..3cb5ef86d0d9 100644
--- virt/kvm/kvm_mm.h
+++ virt/kvm/kvm_mm.h
@@ -72,7 +72,7 @@ int kvm_gmem_init(struct module *module);
 void kvm_gmem_exit(void);
 int kvm_gmem_create(struct kvm *kvm, struct kvm_create_guest_memfd *args);
 int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
-                 unsigned int fd, loff_t offset);
+                 unsigned int fd, u64 offset);
 void kvm_gmem_unbind(struct kvm_memory_slot *slot);
 #else
 static inline int kvm_gmem_init(struct module *module)
@@ -80,9 +80,8 @@ static inline int kvm_gmem_init(struct module *module)
        return 0;
 }
 static inline void kvm_gmem_exit(void) {};
-static inline int kvm_gmem_bind(struct kvm *kvm,
-                                        struct kvm_memory_slot *slot,
-                                        unsigned int fd, loff_t offset)
+static inline int kvm_gmem_bind(struct kvm *kvm, struct kvm_memory_slot *slot,
+                               unsigned int fd, u64 offset)
 {
        WARN_ON_ONCE(1);
        return -EIO;


^ permalink raw reply related

* Re: [PATCH v14 05/44] arm64: RMI: Add wrappers for RMI calls
From: Marc Zyngier @ 2026-05-21 12:49 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-6-steven.price@arm.com>

On Wed, 13 May 2026 14:17:13 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> The wrappers make the call sites easier to read and deal with the
> boiler plate of handling the error codes from the RMM.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes from v13:
>  * Update to RMM v2.0-bet1 spec including some SRO support (there still
>    some FIXMEs where SRO support is incomplete).
> Changes from v12:
>  * Update to RMM v2.0 specification
> Changes from v8:
>  * Switch from arm_smccc_1_2_smc() to arm_smccc_1_2_invoke() in
>    rmi_rtt_read_entry() for consistency.
> Changes from v7:
>  * Minor renaming of parameters and updated comments
> Changes from v5:
>  * Further improve comments
> Changes from v4:
>  * Improve comments
> Changes from v2:
>  * Make output arguments optional.
>  * Mask RIPAS value rmi_rtt_read_entry()
>  * Drop unused rmi_rtt_get_phys()
> ---
>  arch/arm64/include/asm/rmi_cmds.h | 661 ++++++++++++++++++++++++++++++
>  1 file changed, 661 insertions(+)
>  create mode 100644 arch/arm64/include/asm/rmi_cmds.h
> 
> diff --git a/arch/arm64/include/asm/rmi_cmds.h b/arch/arm64/include/asm/rmi_cmds.h
> new file mode 100644
> index 000000000000..04f7066894e9
> --- /dev/null
> +++ b/arch/arm64/include/asm/rmi_cmds.h
> @@ -0,0 +1,661 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023 ARM Ltd.
> + */
> +
> +#ifndef __ASM_RMI_CMDS_H
> +#define __ASM_RMI_CMDS_H
> +
> +#include <linux/arm-smccc.h>
> +
> +#include <asm/rmi_smc.h>
> +
> +struct rtt_entry {
> +	unsigned long walk_level;
> +	unsigned long desc;
> +	int state;
> +	int ripas;
> +};
> +
> +#define RMI_MAX_ADDR_LIST	256
> +
> +struct rmi_sro_state {
> +	struct arm_smccc_1_2_regs regs;
> +	unsigned long addr_count;
> +	unsigned long addr_list[RMI_MAX_ADDR_LIST];
> +};
> +
> +#define rmi_smccc(...) do {						\
> +	arm_smccc_1_1_invoke(__VA_ARGS__);				\
> +} while (RMI_RETURN_STATUS(res.a0) == RMI_BUSY ||			\
> +	 RMI_RETURN_STATUS(res.a0) == RMI_BLOCKED)
> +
> +unsigned long rmi_sro_execute(struct rmi_sro_state *sro, gfp_t gfp);
> +void rmi_sro_free(struct rmi_sro_state *sro);
> +
> +/**
> + * rmi_rmm_config_set() - Configure the RMM
> + * @cfg_ptr: PA of a struct rmm_config
> + *
> + * Sets configuration options on the RMM.
> + *
> + * Return: RMI return code
> + */
> +static inline int rmi_rmm_config_set(unsigned long cfg_ptr)
> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_RMM_CONFIG_SET, cfg_ptr, &res);
> +
> +	return res.a0;
> +}
> +
> +/**
> + * rmi_rmm_activate() - Activate the RMM
> + *
> + * Return: RMI return code
> + */
> +static inline int rmi_rmm_activate(void)
> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_RMM_ACTIVATE, &res);
> +
> +	return res.a0;
> +}
> +
> +/**
> + * rmi_granule_tracking_get() - Get configuration of a Granule tracking region
> + * @start: Base PA of the tracking region
> + * @end: End of the PA region
> + * @out_category: Memory category
> + * @out_state: Tracking region state
> + * @out_top: Top of the memory region
> + *
> + * Return: RMI return code
> + */
> +static inline int rmi_granule_tracking_get(unsigned long start,
> +					   unsigned long end,
> +					   unsigned long *out_category,
> +					   unsigned long *out_state,
> +					   unsigned long *out_top)
> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_GRANULE_TRACKING_GET, start, end, &res);
> +
> +	if (out_category)
> +		*out_category = res.a1;
> +	if (out_state)
> +		*out_state = res.a2;
> +	if (out_top)
> +		*out_top = res.a3;
> +
> +	return res.a0;
> +}
> +
> +/**
> + * rmi_gpt_l1_create() - Create a Level 1 GPT
> + * @addr: Base of physical address region described by the L1GPT
> + *
> + * Return: RMI return code
> + */
> +static inline int rmi_gpt_l1_create(unsigned long addr)
> +{
> +	struct arm_smccc_res res;
> +
> +	arm_smccc_1_1_invoke(SMC_RMI_GPT_L1_CREATE, addr, &res);
> +
> +	if (RMI_RETURN_STATUS(res.a0) == RMI_INCOMPLETE) {
> +		/* FIXME */

Is that part of the SRO stuff you're talking about in the notes?
What is the ETA for fixing all these FIXMEs?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v14 04/44] arm64: RMI: Add SMC definitions for calling the RMM
From: Marc Zyngier @ 2026-05-21 12:40 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-5-steven.price@arm.com>

On Wed, 13 May 2026 14:17:12 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> The RMM (Realm Management Monitor) provides functionality that can be
> accessed by SMC calls from the host.
> 
> The SMC definitions are based on DEN0137[1] version 2.0-bet1
> 
> [1] https://developer.arm.com/documentation/den0137/2-0bet1/
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Updated to RMM spec v2.0-bet1
> Changes since v12:
>  * Updated to RMM spec v2.0-bet0
> Changes since v9:
>  * Corrected size of 'ripas_value' in struct rec_exit. The spec states
>    this is an 8-bit type with padding afterwards (rather than a u64).
> Changes since v8:
>  * Added RMI_PERMITTED_GICV3_HCR_BITS to define which bits the RMM
>    permits to be modified.
> Changes since v6:
>  * Renamed REC_ENTER_xxx defines to include 'FLAG' to make it obvious
>    these are flag values.
> Changes since v5:
>  * Sorted the SMC #defines by value.
>  * Renamed SMI_RxI_CALL to SMI_RMI_CALL since the macro is only used for
>    RMI calls.
>  * Renamed REC_GIC_NUM_LRS to REC_MAX_GIC_NUM_LRS since the actual
>    number of available list registers could be lower.
>  * Provided a define for the reserved fields of FeatureRegister0.
>  * Fix inconsistent names for padding fields.
> Changes since v4:
>  * Update to point to final released RMM spec.
>  * Minor rearrangements.
> Changes since v3:
>  * Update to match RMM spec v1.0-rel0-rc1.
> Changes since v2:
>  * Fix specification link.
>  * Rename rec_entry->rec_enter to match spec.
>  * Fix size of pmu_ovf_status to match spec.
> ---
>  arch/arm64/include/asm/rmi_smc.h | 448 +++++++++++++++++++++++++++++++
>  1 file changed, 448 insertions(+)
>  create mode 100644 arch/arm64/include/asm/rmi_smc.h
> 
> diff --git a/arch/arm64/include/asm/rmi_smc.h b/arch/arm64/include/asm/rmi_smc.h
> new file mode 100644
> index 000000000000..a09b7a631fef
> --- /dev/null
> +++ b/arch/arm64/include/asm/rmi_smc.h
> @@ -0,0 +1,448 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2023-2026 ARM Ltd.
> + *
> + * The values and structures in this file are from the Realm Management Monitor
> + * specification (DEN0137) version 2.0-bet1:
> + * https://developer.arm.com/documentation/den0137/2-0bet1/

How long is this spec going to be available on the ARM web site, which
has a tendency of being reorganised every other week? And there is
already a beta2.

> + */
> +
> +#ifndef __ASM_RMI_SMC_H
> +#define __ASM_RMI_SMC_H
> +
> +#include <linux/arm-smccc.h>
> +
> +#define SMC_RMI_CALL(func)				\
> +	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,		\
> +			   ARM_SMCCC_SMC_64,		\
> +			   ARM_SMCCC_OWNER_STANDARD,	\
> +			   (func))
> +
> +#define SMC_RMI_VERSION				SMC_RMI_CALL(0x0150)
> +
> +#define SMC_RMI_RTT_DATA_MAP_INIT		SMC_RMI_CALL(0x0153)
> +
> +#define SMC_RMI_REALM_ACTIVATE			SMC_RMI_CALL(0x0157)
> +#define SMC_RMI_REALM_CREATE			SMC_RMI_CALL(0x0158)
> +#define SMC_RMI_REALM_DESTROY			SMC_RMI_CALL(0x0159)
> +#define SMC_RMI_REC_CREATE			SMC_RMI_CALL(0x015a)
> +#define SMC_RMI_REC_DESTROY			SMC_RMI_CALL(0x015b)
> +#define SMC_RMI_REC_ENTER			SMC_RMI_CALL(0x015c)
> +#define SMC_RMI_RTT_CREATE			SMC_RMI_CALL(0x015d)
> +#define SMC_RMI_RTT_DESTROY			SMC_RMI_CALL(0x015e)
> +
> +#define SMC_RMI_RTT_READ_ENTRY			SMC_RMI_CALL(0x0161)
> +
> +#define SMC_RMI_RTT_DEV_VALIDATE		SMC_RMI_CALL(0x0163)
> +#define SMC_RMI_PSCI_COMPLETE			SMC_RMI_CALL(0x0164)
> +#define SMC_RMI_FEATURES			SMC_RMI_CALL(0x0165)
> +#define SMC_RMI_RTT_FOLD			SMC_RMI_CALL(0x0166)
> +
> +#define SMC_RMI_RTT_INIT_RIPAS			SMC_RMI_CALL(0x0168)
> +#define SMC_RMI_RTT_SET_RIPAS			SMC_RMI_CALL(0x0169)
> +#define SMC_RMI_VSMMU_CREATE			SMC_RMI_CALL(0x016a)
> +#define SMC_RMI_VSMMU_DESTROY			SMC_RMI_CALL(0x016b)
> +#define SMC_RMI_RMM_CONFIG_SET			SMC_RMI_CALL(0x016e)
> +#define SMC_RMI_PSMMU_IRQ_NOTIFY		SMC_RMI_CALL(0x016f)
> +
> +#define SMC_RMI_PDEV_ABORT			SMC_RMI_CALL(0x0174)
> +#define SMC_RMI_PDEV_COMMUNICATE		SMC_RMI_CALL(0x0175)
> +#define SMC_RMI_PDEV_CREATE			SMC_RMI_CALL(0x0176)
> +#define SMC_RMI_PDEV_DESTROY			SMC_RMI_CALL(0x0177)
> +#define SMC_RMI_PDEV_GET_STATE			SMC_RMI_CALL(0x0178)
> +
> +#define SMC_RMI_PDEV_STREAM_KEY_REFRESH		SMC_RMI_CALL(0x017a)
> +#define SMC_RMI_PDEV_SET_PUBKEY			SMC_RMI_CALL(0x017b)
> +#define SMC_RMI_PDEV_STOP			SMC_RMI_CALL(0x017c)
> +#define SMC_RMI_RTT_AUX_CREATE			SMC_RMI_CALL(0x017d)
> +#define SMC_RMI_RTT_AUX_DESTROY			SMC_RMI_CALL(0x017e)
> +#define SMC_RMI_RTT_AUX_FOLD			SMC_RMI_CALL(0x017f)
> +
> +#define SMC_RMI_VDEV_ABORT			SMC_RMI_CALL(0x0185)
> +#define SMC_RMI_VDEV_COMMUNICATE		SMC_RMI_CALL(0x0186)
> +#define SMC_RMI_VDEV_CREATE			SMC_RMI_CALL(0x0187)
> +#define SMC_RMI_VDEV_DESTROY			SMC_RMI_CALL(0x0188)
> +#define SMC_RMI_VDEV_GET_STATE			SMC_RMI_CALL(0x0189)
> +#define SMC_RMI_VDEV_UNLOCK			SMC_RMI_CALL(0x018a)
> +#define SMC_RMI_RTT_SET_S2AP			SMC_RMI_CALL(0x018b)
> +#define SMC_RMI_VDEV_COMPLETE			SMC_RMI_CALL(0x018e)
> +
> +#define SMC_RMI_VDEV_GET_INTERFACE_REPORT	SMC_RMI_CALL(0x01d0)
> +#define SMC_RMI_VDEV_GET_MEASUREMENTS		SMC_RMI_CALL(0x01d1)
> +#define SMC_RMI_VDEV_LOCK			SMC_RMI_CALL(0x01d2)
> +#define SMC_RMI_VDEV_START			SMC_RMI_CALL(0x01d3)
> +
> +#define SMC_RMI_VSMMU_EVENT_NOTIFY		SMC_RMI_CALL(0x01d6)
> +#define SMC_RMI_PSMMU_ACTIVATE			SMC_RMI_CALL(0x01d7)
> +#define SMC_RMI_PSMMU_DEACTIVATE		SMC_RMI_CALL(0x01d8)
> +
> +#define SMC_RMI_PSMMU_ST_L2_CREATE		SMC_RMI_CALL(0x01db)
> +#define SMC_RMI_PSMMU_ST_L2_DESTROY		SMC_RMI_CALL(0x01dc)
> +#define SMC_RMI_DPT_L0_CREATE			SMC_RMI_CALL(0x01dd)
> +#define SMC_RMI_DPT_L0_DESTROY			SMC_RMI_CALL(0x01de)
> +#define SMC_RMI_DPT_L1_CREATE			SMC_RMI_CALL(0x01df)
> +#define SMC_RMI_DPT_L1_DESTROY			SMC_RMI_CALL(0x01e0)
> +#define SMC_RMI_GRANULE_TRACKING_GET		SMC_RMI_CALL(0x01e1)
> +
> +#define SMC_RMI_GRANULE_TRACKING_SET		SMC_RMI_CALL(0x01e3)
> +
> +#define SMC_RMI_RMM_CONFIG_GET			SMC_RMI_CALL(0x01ec)
> +
> +#define SMC_RMI_RMM_STATE_GET			SMC_RMI_CALL(0x01ee)
> +
> +#define SMC_RMI_PSMMU_EVENT_CONSUME		SMC_RMI_CALL(0x01f0)
> +#define SMC_RMI_GRANULE_RANGE_DELEGATE		SMC_RMI_CALL(0x01f1)
> +#define SMC_RMI_GRANULE_RANGE_UNDELEGATE	SMC_RMI_CALL(0x01f2)
> +#define SMC_RMI_GPT_L1_CREATE			SMC_RMI_CALL(0x01f3)
> +#define SMC_RMI_GPT_L1_DESTROY			SMC_RMI_CALL(0x01f4)
> +#define SMC_RMI_RTT_DATA_MAP			SMC_RMI_CALL(0x01f5)
> +#define SMC_RMI_RTT_DATA_UNMAP			SMC_RMI_CALL(0x01f6)
> +#define SMC_RMI_RTT_DEV_MAP			SMC_RMI_CALL(0x01f7)
> +#define SMC_RMI_RTT_DEV_UNMAP			SMC_RMI_CALL(0x01f8)
> +#define SMC_RMI_RTT_ARCH_DEV_MAP		SMC_RMI_CALL(0x01f9)
> +#define SMC_RMI_RTT_ARCH_DEV_UNMAP		SMC_RMI_CALL(0x01fa)
> +#define SMC_RMI_RTT_UNPROT_MAP			SMC_RMI_CALL(0x01fb)
> +#define SMC_RMI_RTT_UNPROT_UNMAP		SMC_RMI_CALL(0x01fc)
> +#define SMC_RMI_RTT_AUX_PROT_MAP		SMC_RMI_CALL(0x01fd)
> +#define SMC_RMI_RTT_AUX_PROT_UNMAP		SMC_RMI_CALL(0x01fe)
> +#define SMC_RMI_RTT_AUX_UNPROT_MAP		SMC_RMI_CALL(0x01ff)
> +#define SMC_RMI_RTT_AUX_UNPROT_UNMAP		SMC_RMI_CALL(0x0200)
> +#define SMC_RMI_REALM_TERMINATE			SMC_RMI_CALL(0x0201)
> +#define SMC_RMI_RMM_ACTIVATE			SMC_RMI_CALL(0x0202)
> +#define SMC_RMI_OP_CONTINUE			SMC_RMI_CALL(0x0203)
> +#define SMC_RMI_PDEV_STREAM_CONNECT		SMC_RMI_CALL(0x0204)
> +#define SMC_RMI_PDEV_STREAM_DISCONNECT		SMC_RMI_CALL(0x0205)
> +#define SMC_RMI_PDEV_STREAM_COMPLETE		SMC_RMI_CALL(0x0206)
> +#define SMC_RMI_PDEV_STREAM_KEY_PURGE		SMC_RMI_CALL(0x0207)
> +#define SMC_RMI_OP_MEM_DONATE			SMC_RMI_CALL(0x0208)
> +#define SMC_RMI_OP_MEM_RECLAIM			SMC_RMI_CALL(0x0209)
> +#define SMC_RMI_OP_CANCEL			SMC_RMI_CALL(0x020a)
> +#define SMC_RMI_VSMMU_FEATURES			SMC_RMI_CALL(0x020b)
> +#define SMC_RMI_VSMMU_CMD_GET			SMC_RMI_CALL(0x020c)
> +#define SMC_RMI_VSMMU_CMD_COMPLETE		SMC_RMI_CALL(0x020d)
> +#define SMC_RMI_PSMMU_INFO			SMC_RMI_CALL(0x020e)
> +
> +#define RMI_ABI_MAJOR_VERSION	2
> +#define RMI_ABI_MINOR_VERSION	0
> +
> +#define RMI_ABI_VERSION_GET_MAJOR(version) ((version) >> 16)
> +#define RMI_ABI_VERSION_GET_MINOR(version) ((version) & 0xFFFF)
> +#define RMI_ABI_VERSION(major, minor)      (((major) << 16) | (minor))
> +
> +#define RMI_UNASSIGNED			0
> +#define RMI_ASSIGNED			1
> +#define RMI_TABLE			2
> +
> +#define RMI_RETURN_STATUS(ret)		((ret) & 0xFF)
> +#define RMI_RETURN_INDEX(ret)		(((ret) >> 8) & 0xFF)
> +#define RMI_RETURN_MEMREQ(ret)		(((ret) >> 8) & 0x3)
> +#define RMI_RETURN_CAN_CANCEL(ret)	(((ret) >> 10) & 0x1)

Use FIELD_GET() and specify masks that define the actual fields.

> +
> +#define RMI_SUCCESS			0
> +#define RMI_ERROR_INPUT			1
> +#define RMI_ERROR_REALM			2
> +#define RMI_ERROR_REC			3
> +#define RMI_ERROR_RTT			4
> +#define RMI_ERROR_NOT_SUPPORTED		5
> +#define RMI_ERROR_DEVICE		6
> +#define RMI_ERROR_RTT_AUX		7
> +#define RMI_ERROR_PSMMU_ST		8
> +#define RMI_ERROR_DPT			9
> +#define RMI_BUSY			10
> +#define RMI_ERROR_GLOBAL		11
> +#define RMI_ERROR_TRACKING		12
> +#define RMI_INCOMPLETE			13
> +#define RMI_BLOCKED			14
> +#define RMI_ERROR_GPT			15
> +#define RMI_ERROR_GRANULE		16
> +
> +#define RMI_OP_MEM_REQ_NONE		0
> +#define RMI_OP_MEM_REQ_DONATE		1
> +#define RMI_OP_MEM_REQ_RECLAIM		2
> +
> +#define RMI_DONATE_SIZE(req)		((req) & 0x3)
> +#define RMI_DONATE_COUNT_MASK		GENMASK(15, 2)
> +#define RMI_DONATE_COUNT(req)		(((req) & RMI_DONATE_COUNT_MASK) >> 2)
> +#define RMI_DONATE_CONTIG(req)		(!!((req) & BIT(16)))
> +#define RMI_DONATE_STATE(req)		(!!((req) & BIT(17)))

FIELD_GET().

> +
> +#define RMI_OP_MEM_DELEGATED		0
> +#define RMI_OP_MEM_UNDELEGATED		1
> +
> +#define RMI_ADDR_TYPE_NONE		0
> +#define RMI_ADDR_TYPE_SINGLE		1
> +#define RMI_ADDR_TYPE_LIST		2
> +
> +#define RMI_ADDR_RANGE_SIZE_MASK	GENMASK(1, 0)
> +#define RMI_ADDR_RANGE_COUNT_MASK	GENMASK(PAGE_SHIFT - 1, 2)
> +#define RMI_ADDR_RANGE_ADDR_MASK	(PAGE_MASK & GENMASK(51, 0))
> +#define RMI_ADDR_RANGE_STATE_MASK	BIT(63)
> +
> +#define RMI_ADDR_RANGE_SIZE(ar)		(FIELD_GET(RMI_ADDR_RANGE_SIZE_MASK, \
> +						   (ar)))
> +#define RMI_ADDR_RANGE_COUNT(ar)	(FIELD_GET(RMI_ADDR_RANGE_COUNT_MASK, \
> +						   (ar)))
> +#define RMI_ADDR_RANGE_ADDR(ar)		((ar) & RMI_ADDR_RANGE_ADDR_MASK)
> +#define RMI_ADDR_RANGE_STATE(ar)	(FIELD_GET(RMI_ADDR_RANGE_STATE_MASK, \
> +						   (ar)))
> +
> +enum rmi_ripas {
> +	RMI_EMPTY = 0,
> +	RMI_RAM = 1,
> +	RMI_DESTROYED = 2,
> +	RMI_DEV = 3,
> +};
> +
> +#define RMI_NO_MEASURE_CONTENT	0
> +#define RMI_MEASURE_CONTENT	1
> +
> +#define RMI_FEATURE_REGISTER_0_S2SZ		GENMASK(7, 0)
> +#define RMI_FEATURE_REGISTER_0_LPA2		BIT(8)
> +#define RMI_FEATURE_REGISTER_0_SVE		BIT(9)
> +#define RMI_FEATURE_REGISTER_0_SVE_VL		GENMASK(13, 10)
> +#define RMI_FEATURE_REGISTER_0_NUM_BPS		GENMASK(19, 14)
> +#define RMI_FEATURE_REGISTER_0_NUM_WPS		GENMASK(25, 20)
> +#define RMI_FEATURE_REGISTER_0_PMU		BIT(26)
> +#define RMI_FEATURE_REGISTER_0_PMU_NUM_CTRS	GENMASK(31, 27)
> +
> +#define RMI_FEATURE_REGISTER_1_RMI_GRAN_SZ_4KB	BIT(0)
> +#define RMI_FEATURE_REGISTER_1_RMI_GRAN_SZ_16KB	BIT(1)
> +#define RMI_FEATURE_REGISTER_1_RMI_GRAN_SZ_64KB	BIT(2)
> +#define RMI_FEATURE_REGISTER_1_HASH_SHA_256	BIT(3)
> +#define RMI_FEATURE_REGISTER_1_HASH_SHA_384	BIT(4)
> +#define RMI_FEATURE_REGISTER_1_HASH_SHA_512	BIT(5)
> +#define RMI_FEATURE_REGISTER_1_MAX_RECS_ORDER	GENMASK(9, 6)
> +#define RMI_FEATURE_REGISTER_1_L0GPTSZ		GENMASK(13, 10)
> +#define RMI_FEATURE_REGISTER_1_PPS		GENMASK(16, 14)
> +
> +#define RMI_FEATURE_REGISTER_2_DA		BIT(0)
> +#define RMI_FEATURE_REGISTER_2_DA_COH		BIT(1)
> +#define RMI_FEATURE_REGISTER_2_VSMMU		BIT(2)
> +#define RMI_FEATURE_REGISTER_2_ATS		BIT(3)
> +#define RMI_FEATURE_REGISTER_2_MAX_VDEVS_ORDER	GENMASK(7, 4)
> +#define RMI_FEATURE_REGISTER_2_VDEV_KROU	BIT(8)
> +#define RMI_FEATURE_REGISTER_2_NON_TEE_STREAM	BIT(9)
> +
> +#define RMI_FEATURE_REGISTER_3_MAX_NUM_AUX_PLANES	GENMASK(3, 0)
> +#define RMI_FEATURE_REGISTER_3_RTT_PLAN			GENMASK(5, 4)
> +#define RMI_FEATURE_REGISTER_3_RTT_S2AP_INDIRECT	BIT(6)
> +
> +#define RMI_FEATURE_REGISTER_4_MEC_COUNT		GENMASK(63, 0)
> +
> +#define RMI_MEM_CATEGORY_CONVENTIONAL		0
> +#define RMI_MEM_CATEGORY_DEV_NCOH		1
> +#define RMI_MEM_CATEGORY_DEV_COH		2
> +
> +#define RMI_TRACKING_RESERVED			0
> +#define RMI_TRACKING_NONE			1
> +#define RMI_TRACKING_FINE			2
> +#define RMI_TRACKING_COARSE			3
> +
> +#define RMI_GRANULE_SIZE_4KB	0
> +#define RMI_GRANULE_SIZE_16KB	1
> +#define RMI_GRANULE_SIZE_64KB	2
> +
> +/*
> + * Note many of these fields are smaller than u64 but all fields have u64
> + * alignment, so use u64 to ensure correct alignment.
> + */
> +struct rmm_config {
> +	union { /* 0x0 */
> +		struct {
> +			u64 tracking_region_size;
> +			u64 rmi_granule_size;
> +		};
> +		u8 sizer[0x1000];

SZ_4K?

> +	};
> +};
> +
> +#define RMI_REALM_PARAM_FLAG_LPA2		BIT(0)
> +#define RMI_REALM_PARAM_FLAG_SVE		BIT(1)
> +#define RMI_REALM_PARAM_FLAG_PMU		BIT(2)
> +
> +struct realm_params {
> +	union { /* 0x0 */
> +		struct {
> +			u64 flags;
> +			u64 s2sz;
> +			u64 sve_vl;
> +			u64 num_bps;
> +			u64 num_wps;
> +			u64 pmu_num_ctrs;
> +			u64 hash_algo;
> +			u64 num_aux_planes;
> +		};
> +		u8 padding0[0x400];

SZ_1K? And similarly all over the shop?

I haven't checked the details of the encodings (life is too short),
but I wonder how much of this exists as an MRS and could be
automatically generated?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v14 03/44] arm64: RME: Handle Granule Protection Faults (GPFs)
From: Marc Zyngier @ 2026-05-21 12:25 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-4-steven.price@arm.com>

On Wed, 13 May 2026 14:17:11 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> If the host attempts to access granules that have been delegated for use
> in a realm these accesses will be caught and will trigger a Granule
> Protection Fault (GPF).
> 
> A fault during a page walk signals a bug in the kernel and is handled by
> oopsing the kernel. A non-page walk fault could be caused by user space
> having access to a page which has been delegated to the kernel and will
> trigger a SIGBUS to allow debugging why user space is trying to access a
> delegated page.
> 
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Reviewed-by: Gavin Shan <gshan@redhat.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v10:
>  * Don't call arm64_notify_die() in do_gpf() but simply return 1.
> Changes since v2:
>  * Include missing "Granule Protection Fault at level -1"
> ---
>  arch/arm64/mm/fault.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> index 0f3c5c7ca054..6358ea4787ba 100644
> --- a/arch/arm64/mm/fault.c
> +++ b/arch/arm64/mm/fault.c
> @@ -905,6 +905,22 @@ static int do_tag_check_fault(unsigned long far, unsigned long esr,
>  	return 0;
>  }
>  
> +static int do_gpf_ptw(unsigned long far, unsigned long esr, struct pt_regs *regs)
> +{
> +	const struct fault_info *inf = esr_to_fault_info(esr);
> +
> +	die_kernel_fault(inf->name, far, esr, regs);
> +	return 0;
> +}
> +
> +static int do_gpf(unsigned long far, unsigned long esr, struct pt_regs *regs)
> +{
> +	if (!is_el1_instruction_abort(esr) && fixup_exception(regs, esr))
> +		return 0;
> +
> +	return 1;
> +}
> +
>  static const struct fault_info fault_info[] = {
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"ttbr address size fault"	},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"level 1 address size fault"	},
> @@ -941,12 +957,12 @@ static const struct fault_info fault_info[] = {
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 32"			},
>  	{ do_alignment_fault,	SIGBUS,  BUS_ADRALN,	"alignment fault"		},
>  	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 34"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 35"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 36"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 37"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 38"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 39"			},
> -	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 40"			},
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level -1" },
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 0" },
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 1" },
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 2" },
> +	{ do_gpf_ptw,		SIGKILL, SI_KERNEL,	"Granule Protection Fault at level 3" },
> +	{ do_gpf,		SIGBUS,  SI_KERNEL,	"Granule Protection Fault not on table walk" },

It wouldn't hurt to align the textual description with what we have
for other fault syndromes:

	"level X granule protection fault (translation table walk)"

for the PTW-trigger faults, and

	"granule protection fault"

for the non PTW case.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: [PATCH v6 24/43] KVM: selftests: Rename guest_memfd{,_offset} to gmem_{fd,offset}
From: Fuad Tabba @ 2026-05-21 12:13 UTC (permalink / raw)
  To: ackerleytng
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	ira.weiny, jmattson, jthoughton, michael.roth, oupton,
	pankaj.gupta, qperret, rick.p.edgecombe, rientjes, shivankg,
	steven.price, willy, wyihan, yan.y.zhao, forkloop, pratyush,
	suzuki.poulose, aneesh.kumar, liam, Paolo Bonzini,
	Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Baoquan He, Barry Song,
	Axel Rasmussen, Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng,
	Shakeel Butt, Kiryl Shutsemau, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <20260507-gmem-inplace-conversion-v6-24-91ab5a8b19a4@google.com>

On Thu, 7 May 2026 at 21:23, Ackerley Tng via B4 Relay
<devnull+ackerleytng.google.com@kernel.org> wrote:
>
> From: Sean Christopherson <seanjc@google.com>
>
> Rename local variables and function parameters for the guest memory file
> descriptor and its offset to use a "gmem_" prefix instead of
> "guest_memfd_".
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>

Reviewed-by: Fuad Tabba <tabba@google.com>

Cheers,
/fuad

> ---
>  tools/testing/selftests/kvm/include/kvm_util.h |  6 +++---
>  tools/testing/selftests/kvm/lib/kvm_util.c     | 26 +++++++++++++-------------
>  2 files changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/kvm_util.h b/tools/testing/selftests/kvm/include/kvm_util.h
> index 2ecaaa0e99654..f19383376ee8e 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util.h
> @@ -690,17 +690,17 @@ int __vm_set_user_memory_region(struct kvm_vm *vm, u32 slot, u32 flags,
>                                 gpa_t gpa, u64 size, void *hva);
>  void vm_set_user_memory_region2(struct kvm_vm *vm, u32 slot, u32 flags,
>                                 gpa_t gpa, u64 size, void *hva,
> -                               u32 guest_memfd, u64 guest_memfd_offset);
> +                               u32 gmem_fd, u64 gmem_offset);
>  int __vm_set_user_memory_region2(struct kvm_vm *vm, u32 slot, u32 flags,
>                                  gpa_t gpa, u64 size, void *hva,
> -                                u32 guest_memfd, u64 guest_memfd_offset);
> +                                u32 gmem_fd, u64 gmem_offset);
>
>  void vm_userspace_mem_region_add(struct kvm_vm *vm,
>                                  enum vm_mem_backing_src_type src_type,
>                                  gpa_t gpa, u32 slot, u64 npages, u32 flags);
>  void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>                 gpa_t gpa, u32 slot, u64 npages, u32 flags,
> -               int guest_memfd_fd, u64 guest_memfd_offset);
> +               int gmem_fd, u64 gmem_offset);
>
>  #ifndef vm_arch_has_protected_memory
>  static inline bool vm_arch_has_protected_memory(struct kvm_vm *vm)
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index df73b23a4c66a..11da9b7546d03 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -947,7 +947,7 @@ void vm_set_user_memory_region(struct kvm_vm *vm, u32 slot, u32 flags,
>
>  int __vm_set_user_memory_region2(struct kvm_vm *vm, u32 slot, u32 flags,
>                                  gpa_t gpa, u64 size, void *hva,
> -                                u32 guest_memfd, u64 guest_memfd_offset)
> +                                u32 gmem_fd, u64 gmem_offset)
>  {
>         struct kvm_userspace_memory_region2 region = {
>                 .slot = slot,
> @@ -955,8 +955,8 @@ int __vm_set_user_memory_region2(struct kvm_vm *vm, u32 slot, u32 flags,
>                 .guest_phys_addr = gpa,
>                 .memory_size = size,
>                 .userspace_addr = (uintptr_t)hva,
> -               .guest_memfd = guest_memfd,
> -               .guest_memfd_offset = guest_memfd_offset,
> +               .guest_memfd = gmem_fd,
> +               .guest_memfd_offset = gmem_offset,
>         };
>
>         TEST_REQUIRE_SET_USER_MEMORY_REGION2();
> @@ -966,10 +966,10 @@ int __vm_set_user_memory_region2(struct kvm_vm *vm, u32 slot, u32 flags,
>
>  void vm_set_user_memory_region2(struct kvm_vm *vm, u32 slot, u32 flags,
>                                 gpa_t gpa, u64 size, void *hva,
> -                               u32 guest_memfd, u64 guest_memfd_offset)
> +                               u32 gmem_fd, u64 gmem_offset)
>  {
>         int ret = __vm_set_user_memory_region2(vm, slot, flags, gpa, size, hva,
> -                                              guest_memfd, guest_memfd_offset);
> +                                              gmem_fd, gmem_offset);
>
>         TEST_ASSERT(!ret, "KVM_SET_USER_MEMORY_REGION2 failed, errno = %d (%s)",
>                     errno, strerror(errno));
> @@ -979,7 +979,7 @@ void vm_set_user_memory_region2(struct kvm_vm *vm, u32 slot, u32 flags,
>  /* FIXME: This thing needs to be ripped apart and rewritten. */
>  void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>                 gpa_t gpa, u32 slot, u64 npages, u32 flags,
> -               int guest_memfd, u64 guest_memfd_offset)
> +               int gmem_fd, u64 gmem_offset)
>  {
>         int ret;
>         struct userspace_mem_region *region;
> @@ -1055,12 +1055,12 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>                 region->mmap_size += alignment;
>
>         if (flags & KVM_MEM_GUEST_MEMFD) {
> -               if (guest_memfd < 0) {
> -                       u32 guest_memfd_flags = 0;
> +               if (gmem_fd < 0) {
> +                       u32 gmem_flags = 0;
>
> -                       TEST_ASSERT(!guest_memfd_offset,
> +                       TEST_ASSERT(!gmem_offset,
>                                     "Offset must be zero when creating new guest_memfd");
> -                       guest_memfd = vm_create_guest_memfd(vm, mem_size, guest_memfd_flags);
> +                       gmem_fd = vm_create_guest_memfd(vm, mem_size, gmem_flags);
>                 } else {
>                         /*
>                          * Install a unique fd for each memslot so that the fd
> @@ -1068,11 +1068,11 @@ void vm_mem_add(struct kvm_vm *vm, enum vm_mem_backing_src_type src_type,
>                          * needing to track if the fd is owned by the framework
>                          * or by the caller.
>                          */
> -                       guest_memfd = kvm_dup(guest_memfd);
> +                       gmem_fd = kvm_dup(gmem_fd);
>                 }
>
> -               region->region.guest_memfd = guest_memfd;
> -               region->region.guest_memfd_offset = guest_memfd_offset;
> +               region->region.guest_memfd = gmem_fd;
> +               region->region.guest_memfd_offset = gmem_offset;
>         } else {
>                 region->region.guest_memfd = -1;
>         }
>
> --
> 2.54.0.563.g4f69b47b94-goog
>
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox