Linux Confidential Computing Development
 help / color / mirror / Atom feed
* Re: [PATCH v14 37/44] arm64: RMI: Prevent Device mappings for Realms
From: Aneesh Kumar K.V @ 2026-05-19 10:25 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-38-steven.price@arm.com>

Steven Price <steven.price@arm.com> writes:

> Physical device assignment is not yet supported. RMM v2.0 does add the
> relevant APIs, but device assignment is a big topic so will be handled
> in a future patch series. For now prevent device mappings when the guest
> is a realm.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes from v6:
>  * Fix the check in user_mem_abort() to prevent all pages that are not
>    guest_memfd() from being mapped into the protected half of the IPA.
> Changes from v5:
>  * Also prevent accesses in user_mem_abort()
> ---
>  arch/arm64/kvm/mmu.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 776ffe56d17e..7678226ffd38 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1230,6 +1230,10 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
>  	if (is_protected_kvm_enabled())
>  		return -EPERM;
>  
> +	/* We don't support mapping special pages into a Realm */
> +	if (kvm_is_realm(kvm))
> +		return -EPERM;
> +
>  	size += offset_in_page(guest_ipa);
>  	guest_ipa &= PAGE_MASK;
>  

The commit message suggests that this will need to be updated to support
Device Assignment, but that is not true. IIUC, this is only used by
GICv2?. Can we update the commit message?

-aneesh

^ permalink raw reply

* Re: [PATCH v9 02/23] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
From: Chao Gao @ 2026-05-19 10:24 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: Hansen, Dave, linux-kernel@vger.kernel.org,
	linux-coco@lists.linux.dev, Huang, Kai, kvm@vger.kernel.org,
	Li, Xiaoyao, Zhao, Yan Y, dave.hansen@linux.intel.com,
	tony.lindgren@linux.intel.com, Chatre, Reinette,
	seanjc@google.com, pbonzini@redhat.com, binbin.wu@linux.intel.com,
	Weiny, Ira, nik.borisov@suse.com, mingo@redhat.com,
	Verma, Vishal L, kas@kernel.org, Shahar, Sagi, Annapurve, Vishal,
	djbw@kernel.org, tglx@kernel.org, paulmck@kernel.org,
	hpa@zytor.com, bp@alien8.de, yilun.xu@linux.intel.com,
	x86@kernel.org
In-Reply-To: <60dfac5273fa3bd5b5d31dbe6b8f32a60d329a78.camel@intel.com>

On Tue, May 19, 2026 at 09:59:02AM +0800, Edgecombe, Rick P wrote:
>On Mon, 2026-05-18 at 09:57 -0700, Rick Edgecombe wrote:
>> On Mon, 2026-05-18 at 15:52 +0800, Chao Gao wrote:
>> > On Fri, May 15, 2026 at 09:15:47AM -0700, Dave Hansen wrote:
>> > > On 5/13/26 08:09, Chao Gao wrote:
>> > > > This prepares for TDX module update [1] and Dynamic PAMT [2] support. Both
>> > > > add new TDX_FEATURES0 capability bits, and both need those capabilities to
>> > > > be queried from code outside arch/x86/virt. The corresponding feature-query
>> > > > helpers therefore need to live in the public asm/tdx.h header, so move the
>> > > > existing bit definitions there first.
>> > > 
>> > > Please don't add unnecessary changelog cruft. If you need this move for
>> > > this series, that's enough.
>> > 
>> > Sure. Will remove "Dynamic PAMT" stuff from the changelog.
>> 
>> I think it should not link to old versions of this series to explain the
>> preparation. That is very confusing. We can just explain what will come in the
>> later patches of *this* series. I'll circle back and propose some verbiage.
>
>How about?
>
>Future changes will add support for new TDX features exposed as TDX_FEATURES0
>bits. The presence of these features will need to be checked outside of
>arch/x86/virt. So the feature query helpers, and the TDX_FEATURES0 defines they
>reference, will need to live in the widely accessible asm/tdx.h helper. Move the
>existing TDX_FEATURES0 to asm/tdx.h so that they can all be kept together.

Yes. This looks much clearer.

Thanks for helping me on this.

^ permalink raw reply

* Re: [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step
From: Chao Gao @ 2026-05-19 10:20 UTC (permalink / raw)
  To: Edgecombe, Rick P
  Cc: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Li, Xiaoyao, Huang, Kai,
	Zhao, Yan Y, dave.hansen@linux.intel.com, kas@kernel.org,
	Chatre, Reinette, seanjc@google.com, pbonzini@redhat.com,
	binbin.wu@linux.intel.com, Verma, Vishal L, nik.borisov@suse.com,
	mingo@redhat.com, Weiny, Ira, tony.lindgren@linux.intel.com,
	Annapurve, Vishal, Shahar, Sagi, djbw@kernel.org, tglx@kernel.org,
	paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
	yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <c0c5edb3cf4dfeee986a478aa3db886aa8d5db38.camel@intel.com>

On Tue, May 19, 2026 at 10:34:31AM +0800, Edgecombe, Rick P wrote:
>On Wed, 2026-05-13 at 08:09 -0700, Chao Gao wrote:
>> A TDX module update is a multi-step process, and any step can fail.
>> 
>> The current update flow continues to later steps after an error.
>> Continuing after a failure can leave the TDX module in an unrecoverable
>> state.
>
>I get what you are saying here, but "continuing" vs "leaving" is a tiny bit
>confusing to me. Maybe: Continuing with subsequent update steps after a failure
>can cause the TDX module to enter an unrecoverable state?

Yes. it is better.

>
>> 
>> One failure case must remain recoverable: update contention with an ongoing
>> TD build. The agreed kernel behavior for this case [1] is to fail the
>> update with -EBUSY so userspace can retry later.
>
>The link to the discussion is nice, but the explanation of just that there was
>an agreement is not saying much. But the reasoning around AVOID_COMPAT_SENSITIVE
>*is* handled in later patch. So can we say future changes will want to return
>errors to userspace for certain update failures? Then we can discuss the
>specifics when code is actual error is added?

yes. the main point is certain update failures should be recoverable so userspace
can retry.

>
>And why talk about EBUSY specifically? It is not in this patch. Stale log? 

Sure. there is no need to single out EBUSY.

>
>> 
>> Abort the update on any failure. This also makes the TD-build contention
>> case recoverable, because that failure occurs before any TDX module state
>> is changed. 
>> 
>
>Oh, maybe I didn't get what you meant above actually. The contention case is
>only recoverable because we detect it at the first step? Does "Continuing after
>a failure can leave the TDX module in an unrecoverable" really mean that any
>failure after the first step is unrecoverable?

You are right. Any failure after the initial module shutdown step is
unrecoverable.

> Or can we put it in some other
>more specific terms like that. Terms which are more specific but still not
>overly complex description of TDX module update flows?
>
>> Apply the same rule to all errors instead of special-casing
>> -EBUSY.
>
>It seems like actually it is not special cased...? The error returned is
>whatever is returned from the step.
>
>> 
>> Track per-step failures, stop the update loop once a failure is observed,
>> and do not advance the state machine to the next step.
>
>Hmm, so this is actually a bunch of generic handling for each step, that really
>only works for the first one? Is the generic handling really needed?

We could special-case the first step, but that would add step-specific
error handling to the update loop. I think the simpler rule is to abort the
update on the first observed failure, regardless of which step reports it.

how about:

A TDX module update is a multi-step process, and any step can fail.

The current update flow continues to later steps after an error.
Continuing after a failure can cause the TDX module to enter an
unrecoverable state.

But certain failures during the initial module shutdown step should
simply return an error to userspace, so the update can be retried cleanly.

To preserve that recoverability, one option would be to abort the update
only for those failures, since they occur before any TDX module state is
changed. But special-casing specific failures in specific steps would
complicate the do-while() update loop for no benefit.

Simply abort update on any failure, at any step.

Track failures for each step, stop the update loop once a failure is
observed, and do not advance the state machine to the next step.

>
>> 
>> Signed-off-by: Chao Gao <chao.gao@intel.com>
>> Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
>> Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
>> Reviewed-by: Kai Huang <kai.huang@intel.com>
>> Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
>> Link: https://lore.kernel.org/linux-coco/aQFmOZCdw64z14cJ@google.com/ # [1]
>> ---
>> v9:
>>   - Avoid nested if/else by deferring failure accounting to ack_state().
>>   - Reduce indentation of the main flow.
>>   - Convert the failed flag into a counter. This avoids a conditional
>>     update of the flag; the counter can simply accumulate failures.
>> ---
>>  arch/x86/virt/vmx/tdx/seamldr.c | 11 +++++++----
>>  1 file changed, 7 insertions(+), 4 deletions(-)
>> 
>> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
>> index 7befe4a08f33..48fe71319fea 100644
>> --- a/arch/x86/virt/vmx/tdx/seamldr.c
>> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
>> @@ -170,6 +170,7 @@ enum module_update_state {
>>  static struct update_ctrl {
>>  	enum module_update_state state;
>>  	int num_ack;
>> +	int num_failed;
>
>Was there past discussion on why it keeps a failed count? All we need to know is
>if anything failed right? So a bool is fine too?

Kiryl suggested a boolean, and I used that in earlier versions.  In v9 I
moved the failure tracking next to the ack counting in ack_state(). A
boolean still works, but it needs an extra conditional to latch the failure
state.

static void ack_state(struct update_ctrl *ctrl, int result)
{
	raw_spin_lock(&ctrl->lock);

-	ctrl->num_failed += !!ret;
+	if (!ctrl->failed)
+		ctrl->failed = !!ret;
	ctrl->num_ack++;
	if (ctrl->num_ack == num_online_cpus())
	if (ctrl->num_ack == num_online_cpus() && !ctrl->num_failed)
		__set_target_state(ctrl, ctrl->state + 1);

Using an int mainly to keep the failure and ack tracking similar
and avoid the extra if. (I put a note under --- to explain this.)

If you prefer, I can switch it back to bool.

^ permalink raw reply

* Re: [PATCH v14 27/44] arm64: RMI: Set RIPAS of initial memslots
From: Suzuki K Poulose @ 2026-05-19 10:13 UTC (permalink / raw)
  To: Aneesh Kumar K.V, Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Zenghui Yu, linux-arm-kernel, linux-kernel,
	Joey Gouly, Alexandru Elisei, Christoffer Dall, Fuad Tabba,
	linux-coco, Ganapatrao Kulkarni, Gavin Shan, Shanker Donthineni,
	Alper Gun, Emi Kisanuki, Vishal Annapurve, WeiLin.Chang,
	Lorenzo.Pieralisi2
In-Reply-To: <yq5av7cjsonr.fsf@kernel.org>

On 19/05/2026 11:02, Aneesh Kumar K.V wrote:
> Steven Price <steven.price@arm.com> writes:
> 
>> The memory which the realm guest accesses must be set to RIPAS_RAM.
>> Iterate over the memslots and set all gmem memslots to RIPAS_RAM.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>   
>   ...
>   
>> +static int set_ripas_of_protected_regions(struct kvm *kvm)
>> +{
>> +	struct kvm_memslots *slots;
>> +	struct kvm_memory_slot *memslot;
>> +	int idx, bkt;
>> +	int ret = 0;
>> +
>> +	idx = srcu_read_lock(&kvm->srcu);
>> +
>> +	slots = kvm_memslots(kvm);
>> +	kvm_for_each_memslot(memslot, bkt, slots) {
>> +		if (!kvm_slot_has_gmem(memslot))
>> +			continue;
>> +
>> +		ret = realm_init_ipa_state(kvm, memslot->base_gfn,
>> +					   memslot->npages);
>> +		if (ret)
>> +			break;
>> +	}
>> +	srcu_read_unlock(&kvm->srcu, idx);
>> +
>> +	return ret;
>> +}
>> +
>>   int kvm_arm_rmi_populate(struct kvm *kvm,
>>   			 struct kvm_arm_rmi_populate *args)
>>   {
>> @@ -890,6 +922,10 @@ int kvm_activate_realm(struct kvm *kvm)
>>   			return ret;
>>   	}
>>   
>> +	ret = set_ripas_of_protected_regions(kvm);
>> +	if (ret)
>> +		return ret;
>> +
>>   	ret = rmi_realm_activate(virt_to_phys(realm->rd));
>>   	if (ret)
>>   		return -ENXIO;
> 
> relam guest already does.
> 	for_each_mem_range(i, &start, &end) {
> 		if (rsi_set_memory_range_protected_safe(start, end)) {
> 			panic("Failed to set memory range to protected: %pa-%pa",
> 			      &start, &end);
> 		}
> 	}
> 
> if so why is host required to do this ?

Ideally this should be a call from the VMM (i.e., user). Irrespective of
what the guest does (which the host has no knowledge about), the VMM/
user is better aware of what to do for a given guest. We have done this
implicitly in the KVM as a start, to keep the initial implementation
simple. This could be moved out to the VMM as UABI, if there is
sufficient demand for it.

TL,DR: This should be a host/deployer decision, not the Guest. There
may other guest OS, which do not do RIPAS_RAM early enough.

Suzuki






^ permalink raw reply

* Re: [PATCH v14 27/44] arm64: RMI: Set RIPAS of initial memslots
From: Aneesh Kumar K.V @ 2026-05-19 10:02 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-28-steven.price@arm.com>

Steven Price <steven.price@arm.com> writes:

> The memory which the realm guest accesses must be set to RIPAS_RAM.
> Iterate over the memslots and set all gmem memslots to RIPAS_RAM.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
 
 ...
 
> +static int set_ripas_of_protected_regions(struct kvm *kvm)
> +{
> +	struct kvm_memslots *slots;
> +	struct kvm_memory_slot *memslot;
> +	int idx, bkt;
> +	int ret = 0;
> +
> +	idx = srcu_read_lock(&kvm->srcu);
> +
> +	slots = kvm_memslots(kvm);
> +	kvm_for_each_memslot(memslot, bkt, slots) {
> +		if (!kvm_slot_has_gmem(memslot))
> +			continue;
> +
> +		ret = realm_init_ipa_state(kvm, memslot->base_gfn,
> +					   memslot->npages);
> +		if (ret)
> +			break;
> +	}
> +	srcu_read_unlock(&kvm->srcu, idx);
> +
> +	return ret;
> +}
> +
>  int kvm_arm_rmi_populate(struct kvm *kvm,
>  			 struct kvm_arm_rmi_populate *args)
>  {
> @@ -890,6 +922,10 @@ int kvm_activate_realm(struct kvm *kvm)
>  			return ret;
>  	}
>  
> +	ret = set_ripas_of_protected_regions(kvm);
> +	if (ret)
> +		return ret;
> +
>  	ret = rmi_realm_activate(virt_to_phys(realm->rd));
>  	if (ret)
>  		return -ENXIO;

relam guest already does. 
	for_each_mem_range(i, &start, &end) {
		if (rsi_set_memory_range_protected_safe(start, end)) {
			panic("Failed to set memory range to protected: %pa-%pa",
			      &start, &end);
		}
	}

if so why is host required to do this ?

-aneesh

^ permalink raw reply

* Re: [RFC PATCH v4 00/14] coco/TSM: Host-side Arm CCA IDE setup via connect/disconnect callbacks
From: Will Deacon @ 2026-05-19  9:46 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Aneesh Kumar K.V (Arm), linux-coco, kvmarm, linux-arm-kernel,
	linux-kernel, Alexey Kardashevskiy, Catalin Marinas, Dan Williams,
	Jason Gunthorpe, Jonathan Cameron, Marc Zyngier, Samuel Ortiz,
	Steven Price, Xu Yilun
In-Reply-To: <afca6cdd-f9e3-4b5f-9f70-940c8dbb7a80@arm.com>

On Tue, May 19, 2026 at 09:24:07AM +0100, Suzuki K Poulose wrote:
> On 18/05/2026 13:59, Will Deacon wrote:
> > On Mon, Apr 27, 2026 at 12:21:07PM +0530, Aneesh Kumar K.V (Arm) wrote:
> > >   arch/arm64/include/asm/rmi_cmds.h         |  85 +++
> > >   arch/arm64/include/asm/rmi_smc.h          | 168 +++++
> > 
> > Curious, but why does this stuff have to live in the arch code? Wouldn't
> > it be better off somewhere like drivers/firmware/ or
> > include/linux/arm-rmi.h?
> 
> Good point. RMI interface is only available for arm64 (not in Arm32). That
> said, it is indeed a firmware ! ;-) interface. The APIs are closely
> integrated with the KVM Realm management. If the general consensus is
> to move them under drivers/firmware (like PSCI), we could take that
> approach.

I'd certainly prefer that as it means it's co-located with other firmware
interface code and also means that the arch maintainers don't need to
worry about changes to driver code :p

Will

^ permalink raw reply

* Re: [PATCH v14 23/44] arm64: RMI: Handle RMI_EXIT_RIPAS_CHANGE
From: Aneesh Kumar K.V @ 2026-05-19  9:40 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-24-steven.price@arm.com>

Steven Price <steven.price@arm.com> writes:

...

> +void kvm_realm_unmap_range(struct kvm *kvm, unsigned long start,
> +			   unsigned long size, bool unmap_private,
> +			   bool may_block)
> +{
> +	unsigned long end = start + size;
> +	struct realm *realm = &kvm->arch.realm;
> +
> +	if (!kvm_realm_is_created(kvm))
> +		return;
> +
> +	end = min(BIT(realm->ia_bits - 1), end);
> +
> +	realm_unmap_shared_range(kvm, start, end, may_block);
> +	if (unmap_private)
> +		realm_unmap_private_range(kvm, start, end, may_block);
> +}
> +
 
kvm_gmem_invalidate_begin() indicates a private-only invalidation. How
is that supported?

-aneesh

^ permalink raw reply

* Re: [RFC PATCH v4 00/14] coco/TSM: Host-side Arm CCA IDE setup via connect/disconnect callbacks
From: Suzuki K Poulose @ 2026-05-19  8:24 UTC (permalink / raw)
  To: Will Deacon, Aneesh Kumar K.V (Arm)
  Cc: linux-coco, kvmarm, linux-arm-kernel, linux-kernel,
	Alexey Kardashevskiy, Catalin Marinas, Dan Williams,
	Jason Gunthorpe, Jonathan Cameron, Marc Zyngier, Samuel Ortiz,
	Steven Price, Xu Yilun
In-Reply-To: <agsNO9cc7H-b0H8L@willie-the-truck>

On 18/05/2026 13:59, Will Deacon wrote:
> On Mon, Apr 27, 2026 at 12:21:07PM +0530, Aneesh Kumar K.V (Arm) wrote:
>>   arch/arm64/include/asm/rmi_cmds.h         |  85 +++
>>   arch/arm64/include/asm/rmi_smc.h          | 168 +++++
> 
> Curious, but why does this stuff have to live in the arch code? Wouldn't
> it be better off somewhere like drivers/firmware/ or
> include/linux/arm-rmi.h?

Good point. RMI interface is only available for arm64 (not in Arm32). 
That said, it is indeed a firmware ! ;-) interface. The APIs are closely
integrated with the KVM Realm management. If the general consensus is
to move them under drivers/firmware (like PSCI), we could take that
approach.

Suzuki

> 
> Will


^ permalink raw reply

* Re: [PATCH v14 17/44] arm64: RMI: RTT tear down
From: Aneesh Kumar K.V @ 2026-05-19  6:54 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-18-steven.price@arm.com>

Steven Price <steven.price@arm.com> writes:
> +static void kvm_realm_uninit_stage2(struct kvm_s2_mmu *mmu)
> +{
> +	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
> +	struct realm *realm = &kvm->arch.realm;
> +
> +	if (kvm_realm_state(kvm) != REALM_STATE_ACTIVE)
> +		return;
> +
> +	write_lock(&kvm->mmu_lock);
> +	kvm_stage2_unmap_range(mmu, 0, BIT(realm->ia_bits - 1), true);
> +	write_unlock(&kvm->mmu_lock);
> +	kvm_realm_destroy_rtts(kvm);
> +}
> +

We also call kvm_stage2_unmap_range in kvm_destroy_realm()

void kvm_destroy_realm(struct kvm *kvm)
{
...
	write_lock(&kvm->mmu_lock);
	kvm_stage2_unmap_range(&kvm->arch.mmu, 0,
			       BIT(realm->ia_bits - 1), true);
	write_unlock(&kvm->mmu_lock);
        
-aneesh

^ permalink raw reply

* Re: [PATCH v14 14/44] arm64: RMI: Basic infrastructure for creating a realm.
From: Aneesh Kumar K.V @ 2026-05-19  6:31 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-15-steven.price@arm.com>

Steven Price <steven.price@arm.com> writes:

> @@ -1114,7 +1119,10 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	write_unlock(&kvm->mmu_lock);
>  
>  	if (pgt) {
> -		kvm_stage2_destroy(pgt);
> +		if (!kvm_is_realm(kvm))
> +			kvm_stage2_destroy(pgt);
> +		else
> +			kvm_pgtable_stage2_destroy_pgd(pgt);
>  		kfree(pgt);
>  	}
>  }

Maybe add a comment here explaining the difference.

We now have:

kvm_arch_destroy_vm()
  -> kvm_uninit_stage2_mmu()
       -> kvm_realm_uninit_stage2()
            -> unmap_range(0, max_ipa)        // for Realm VMs
       -> kvm_free_stage2_pgd()
            -> unmap and free PGD             // for non-Realm VMs
  -> kvm_destroy_realm()                      // for Realm VMs
       -> kvm_free_stage2_pgd()
            -> free PGD                       // for Realm VMs

I wonder whether this can be simplified using different functions names?
(can we call kvm_pgtable_stage2_destroy_pgd() from kvm_destroy_realm()? )

-aneesh

^ permalink raw reply

* Re: [PATCH v14 10/44] arm64: RMI: Add support for SRO
From: Aneesh Kumar K.V @ 2026-05-19  6:02 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-11-steven.price@arm.com>

Steven Price <steven.price@arm.com> writes:

> +static unsigned long donate_req_to_size(unsigned long donatereq)
> +{
> +	unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> +
> +	switch (unit_size) {
> +	case 0:
> +		return PAGE_SIZE;
> +	case 1:
> +		return PMD_SIZE;
> +	case 2:
> +		return PUD_SIZE;
> +	case 3:
> +		return P4D_SIZE;
> +	}
> +	unreachable();
> +}
>
> +
> +static void rmi_smccc_invoke(struct arm_smccc_1_2_regs *regs_in,
> +			     struct arm_smccc_1_2_regs *regs_out)
> +{
> +	struct arm_smccc_1_2_regs regs = *regs_in;
> +	unsigned long status;
> +
> +	do {
> +		arm_smccc_1_2_invoke(&regs, regs_out);
> +		status = RMI_RETURN_STATUS(regs_out->a0);
> +	} while (status == RMI_BUSY || status == RMI_BLOCKED);
> +}
> +
> +int free_delegated_page(phys_addr_t phys)
> +{
> +	if (WARN_ON(rmi_undelegate_page(phys))) {
> +		/* Undelegate failed: leak the page */
> +		return -EBUSY;
> +	}
> +
> +	free_page((unsigned long)phys_to_virt(phys));
> +
> +	return 0;
> +}
> +
> +static int rmi_sro_ensure_capacity(struct rmi_sro_state *sro,
> +				   unsigned long count)
> +{
> +	if (WARN_ON_ONCE(sro->addr_count > RMI_MAX_ADDR_LIST))
> +		return -EOVERFLOW;
> +
> +	if (count > RMI_MAX_ADDR_LIST - sro->addr_count)
> +		return -ENOSPC;
> +
> +	return 0;
> +}
> +
> +static int rmi_sro_donate_contig(struct rmi_sro_state *sro,
> +				 unsigned long sro_handle,
> +				 unsigned long donatereq,
> +				 struct arm_smccc_1_2_regs *out_regs,
> +				 gfp_t gfp)
> +{
> +	unsigned long unit_size = RMI_DONATE_SIZE(donatereq);
> +	unsigned long unit_size_bytes = donate_req_to_size(donatereq);
> +	unsigned long count = RMI_DONATE_COUNT(donatereq);
> +	unsigned long state = RMI_DONATE_STATE(donatereq);
> +	unsigned long size = unit_size_bytes * count;
> +	unsigned long addr_range;
>

Looking at above and the related code, I am wondering whether we should
use u64 instead of unsigned long for everything that the specification
defines as 64-bit.

-aneesh

^ permalink raw reply

* Re: [PATCH v14 08/44] arm64: RMI: Ensure that the RMM has GPT entries for memory
From: Aneesh Kumar K.V @ 2026-05-19  5:55 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-9-steven.price@arm.com>

> +
> +bool rmi_is_available(void)
> +{
> +	return arm64_rmi_is_available;
> +}
> +

Can we rename to is_rmi_available(void) ?

-aneesh

^ permalink raw reply

* Re: [PATCH v14 05/44] arm64: RMI: Add wrappers for RMI calls
From: Aneesh Kumar K.V @ 2026-05-19  5:35 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Steven Price, Catalin Marinas, Marc Zyngier, Will Deacon,
	James Morse, Oliver Upton, Suzuki K Poulose, Zenghui Yu,
	linux-arm-kernel, linux-kernel, Joey Gouly, Alexandru Elisei,
	Christoffer Dall, Fuad Tabba, linux-coco, Ganapatrao Kulkarni,
	Gavin Shan, Shanker Donthineni, Alper Gun, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-6-steven.price@arm.com>

Steven Price <steven.price@arm.com> writes:

> The wrappers make the call sites easier to read and deal with the
> boiler plate of handling the error codes from the RMM.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> +#define rmi_smccc(...) do {						\
> +	arm_smccc_1_1_invoke(__VA_ARGS__);				\
> +} while (RMI_RETURN_STATUS(res.a0) == RMI_BUSY ||			\
> +	 RMI_RETURN_STATUS(res.a0) == RMI_BLOCKED)
> +

I guess this is not used. Also, that would require the call site to have a struct arm_smccc_res res.


-aneesh

^ permalink raw reply

* Re: [PATCH v9 19/23] x86/virt/tdx: Refresh TDX module version after update
From: Edgecombe, Rick P @ 2026-05-19  3:16 UTC (permalink / raw)
  To: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	kas@kernel.org, Chatre, Reinette, seanjc@google.com,
	pbonzini@redhat.com, binbin.wu@linux.intel.com, Verma, Vishal L,
	nik.borisov@suse.com, mingo@redhat.com, Weiny, Ira,
	tony.lindgren@linux.intel.com, Annapurve, Vishal, Shahar, Sagi,
	djbw@kernel.org, tglx@kernel.org, paulmck@kernel.org,
	hpa@zytor.com, bp@alien8.de, yilun.xu@linux.intel.com,
	x86@kernel.org
In-Reply-To: <20260513151045.1420990-20-chao.gao@intel.com>

On Wed, 2026-05-13 at 08:10 -0700, Chao Gao wrote:
> The kernel exposes the TDX module version through sysfs so userspace can
> check update compatibility. That information needs to remain accurate
> across runtime updates.
> 
> A runtime update may change the module's update_version, so refresh the
> cached version right after a successful update.
> 
> Drop __ro_after_init from tdx_sysinfo because it is now updated at runtime.
> 
> Do not refresh the rest of tdx_sysinfo, even if some values change across
> updates. TDX module updates are backward compatible, so existing
> tdx_sysinfo consumers, e.g. KVM, can continue to operate without seeing the
> new values.
> 
> Refreshing the full structure would be risky. A tdx_sysinfo consumer may
> initialize its TDX support based on the features originally reported in
> tdx_sysinfo. If a runtime update adds new features and the full structure
> is refreshed, that consumer could observe and use the newly reported
> features without having performed the setup required to use them safely.
> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> ---

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

The only thing I saw missing from Dave's last comments was:
---
> Note that major and minor versions are not refreshed because runtime updates
> are supported only between releases with identical major and minor versions.

I'd rather have this in code than a changelog comment.

If they can't change then warn if they do.
---

But I think we discussed offline to not do this, is it right?

^ permalink raw reply

* Re: [PATCH v9 14/23] x86/virt/seamldr: Shut down the current TDX module
From: Edgecombe, Rick P @ 2026-05-19  3:00 UTC (permalink / raw)
  To: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	kas@kernel.org, Chatre, Reinette, seanjc@google.com,
	pbonzini@redhat.com, binbin.wu@linux.intel.com, Verma, Vishal L,
	nik.borisov@suse.com, mingo@redhat.com, Weiny, Ira,
	tony.lindgren@linux.intel.com, Annapurve, Vishal, Shahar, Sagi,
	djbw@kernel.org, tglx@kernel.org, paulmck@kernel.org,
	hpa@zytor.com, bp@alien8.de, yilun.xu@linux.intel.com,
	x86@kernel.org
In-Reply-To: <20260513151045.1420990-15-chao.gao@intel.com>

On Wed, 2026-05-13 at 08:09 -0700, Chao Gao wrote:
> The first step of TDX module updates is shutting down the current TDX
> module. This step also packs state information that needs to be
> preserved across updates as handoff data, 
> 

kinda reads like handoff data is an existing term, but its the first reference
in this series.

Maybe packs state information that needs to be preserved across updates, called
"handoff data". This handoff data is consumed...

> which will be consumed by the
> updated module. The handoff data is stored internally in the SEAM range
> and is hidden from the kernel.
> 
> To ensure a successful update, the new module must be able to consume
> the handoff data generated by the old module.
> 

Is it too obvious thing to state? Above you already say it's needed.

>  Since handoff data layout
> may change between modules, the handoff data is versioned. Each module
> has a native handoff version and provides backward support for several
> older versions.
> 
> The complete handoff versioning protocol is complex as it supports both
> module upgrades and downgrades. See details in Intel® Trust Domain
> Extensions (Intel® TDX) Module Base Architecture Specification, Chapter
> "Handoff Versioning".
> 
> Ideally, the kernel needs to retrieve the handoff versions supported by
> the current module and the new module and select a version supported by
> both. But, since this implementation chooses to only support module
> upgrades, simply request the current module to generate handoff data
> using its highest supported version, expecting that the new module will
> likely support it.

Hmm, "likely"? Is this trying to justify the kernel's policy? Dunno, stands out
as weird to me. Like "this will mostly work". Sounds incomplete, rather than a
reason of "this policy is the optimal initial implementation" or something like
that.

> 
> Retrieve the module's handoff version from TDX global metadata and add an
> update step to shut down the module.
> 

This is small patch with both things, but it's almost two changes.

>  Module shutdown has global effect, so
> it only needs to run on one CPU.

I wouldn't think having some global effect would necessarily exclude having to
run on multiple CPUs. Or at least I don't follow. Is it a TDX arch thing? I
guess it's ok.

> 
> Note that the handoff information isn't cached in tdx_sysinfo. It is used
> only for module shutdown, and is present only when the TDX module supports
> updates. Caching it in get_tdx_sys_info() would require extra update-support
> guards and refreshing the cached value across module updates.

Instead of being a "note", could this be just an imperative: Don't cache the
handoff information in tdx_sysinfo...

> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
> Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> ---
> v9:
>  - Use CPU0 as the primary CPU
> ---
>  arch/x86/include/asm/tdx_global_metadata.h  |  4 ++++
>  arch/x86/virt/vmx/tdx/seamldr.c             | 15 ++++++++++++++-
>  arch/x86/virt/vmx/tdx/tdx.c                 | 19 ++++++++++++++++++-
>  arch/x86/virt/vmx/tdx/tdx.h                 |  3 +++
>  arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 13 +++++++++++++
>  5 files changed, 52 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/asm/tdx_global_metadata.h
> index 40689c8dc67e..41150d546589 100644
> --- a/arch/x86/include/asm/tdx_global_metadata.h
> +++ b/arch/x86/include/asm/tdx_global_metadata.h
> @@ -40,6 +40,10 @@ struct tdx_sys_info_td_conf {
>  	u64 cpuid_config_values[128][2];
>  };
>  
> +struct tdx_sys_info_handoff {
> +	u16 module_hv;
> +};
> +
>  struct tdx_sys_info {
>  	struct tdx_sys_info_version version;
>  	struct tdx_sys_info_features features;
> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
> index 48fe71319fea..6114cab46196 100644
> --- a/arch/x86/virt/vmx/tdx/seamldr.c
> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
> @@ -15,6 +15,7 @@
>  #include <asm/seamldr.h>
>  
>  #include "seamcall_internal.h"
> +#include "tdx.h"
>  
>  /* P-SEAMLDR SEAMCALL leaf function */
>  #define P_SEAMLDR_INFO			0x8000000000000000
> @@ -164,6 +165,7 @@ static int init_seamldr_params(struct seamldr_params *params, const u8 *data, u3
>   */
>  enum module_update_state {
>  	MODULE_UPDATE_START,
> +	MODULE_UPDATE_SHUTDOWN,
>  	MODULE_UPDATE_DONE,
>  };
>  
> @@ -214,8 +216,16 @@ static void init_state(struct update_ctrl *ctrl)
>  static int do_seamldr_install_module(void *seamldr_params)
>  {
>  	enum module_update_state newstate, curstate = MODULE_UPDATE_START;
> +	int cpu = smp_processor_id();
> +	bool primary;
>  	int ret = 0;
>  
> +	/*
> +	 * Use CPU 0 to execute update steps that must run exactly once.
> +	 * Note CPU 0 is always online.
> +	 */
> +	primary = cpu == 0;
> +

Where does the term 'primary' come from? I'm guessing that the global steps must
each be run on the same CPU? Is that right? And we just pick the cpu that we
know much be online? Or can the global steps be run on different CPUs? Or they
*have* to be run on cpu 0? It might be worth some comments explaining, depending
on the answers to those questions.

>  	do {
>  		newstate = READ_ONCE(update_ctrl.state);
>  
> @@ -226,7 +236,10 @@ static int do_seamldr_install_module(void *seamldr_params)
>  
>  		curstate = newstate;
>  		switch (curstate) {
> -		/* TODO: add the update steps. */
> +		case MODULE_UPDATE_SHUTDOWN:
> +			if (primary)
> +				ret = tdx_module_shutdown();
> +			break;
>  		default:
>  			break;
>  		}
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 1621695d7561..da3c1e857b26 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -321,7 +321,7 @@ static __init int build_tdx_memlist(struct list_head *tmb_list)
>  	return ret;
>  }
>  
> -static __init int read_sys_metadata_field(u64 field_id, u64 *data)
> +static int read_sys_metadata_field(u64 field_id, u64 *data)
>  {
>  	struct tdx_module_args args = {};
>  	int ret;
> @@ -1267,6 +1267,23 @@ static __init int tdx_enable(void)
>  }
>  subsys_initcall(tdx_enable);
>  
> +int tdx_module_shutdown(void)
> +{
> +	struct tdx_sys_info_handoff handoff = {};
> +	struct tdx_module_args args = {};
> +	int ret;
> +
> +	ret = get_tdx_sys_info_handoff(&handoff);
> +	WARN_ON_ONCE(ret);

Take or leave it:

  Why not just WARN_ON_ONCE(get_tdx_sys_info_handoff(&handoff));
  And we can drop the ret var. Save 2 LOC.

> +
> +	/*
> +	 * Use the module's handoff version as it is the highest the
> +	 * module can produce and most likely supported by newer modules.
> +	 */
> +	args.rcx = handoff.module_hv;
> +	return seamcall_prerr(TDH_SYS_SHUTDOWN, &args);
> +}
> +
>  static bool is_pamt_page(unsigned long phys)
>  {
>  	struct tdmr_info_list *tdmr_list = &tdx_tdmr_list;
> diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
> index 76c5fb1e1ffe..f0c20dea0388 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.h
> +++ b/arch/x86/virt/vmx/tdx/tdx.h
> @@ -46,6 +46,7 @@
>  #define TDH_PHYMEM_PAGE_WBINVD		41
>  #define TDH_VP_WR			43
>  #define TDH_SYS_CONFIG			45
> +#define TDH_SYS_SHUTDOWN		52
>  #define TDH_SYS_DISABLE			69
>  
>  /*
> @@ -108,4 +109,6 @@ struct tdmr_info_list {
>  	int max_tdmrs;	/* How many 'tdmr_info's are allocated */
>  };
>  
> +int tdx_module_shutdown(void);
> +
>  #endif
> diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> index d54d4227990c..e793dec688ab 100644
> --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c
> @@ -100,6 +100,19 @@ static __init int get_tdx_sys_info_td_conf(struct tdx_sys_info_td_conf *sysinfo_
>  	return ret;
>  }
>  
> +static int get_tdx_sys_info_handoff(struct tdx_sys_info_handoff *sysinfo_handoff)
> +{
> +	int ret;
> +	u64 val;
> +
> +	ret = read_sys_metadata_field(0x8900000100000000, &val);
> +	if (ret)
> +		return ret;
> +
> +	sysinfo_handoff->module_hv = val;
> +	return 0;
> +}
> +
>  static __init int get_tdx_sys_info(struct tdx_sys_info *sysinfo)
>  {
>  	int ret = 0;


^ permalink raw reply

* Re: [PATCH v9 13/23] x86/virt/seamldr: Abort updates after a failed step
From: Edgecombe, Rick P @ 2026-05-19  2:34 UTC (permalink / raw)
  To: kvm@vger.kernel.org, linux-coco@lists.linux.dev,
	linux-kernel@vger.kernel.org, Gao, Chao
  Cc: Li, Xiaoyao, Huang, Kai, Zhao, Yan Y, dave.hansen@linux.intel.com,
	kas@kernel.org, Chatre, Reinette, seanjc@google.com,
	pbonzini@redhat.com, binbin.wu@linux.intel.com, Verma, Vishal L,
	nik.borisov@suse.com, mingo@redhat.com, Weiny, Ira,
	tony.lindgren@linux.intel.com, Annapurve, Vishal, Shahar, Sagi,
	djbw@kernel.org, tglx@kernel.org, paulmck@kernel.org,
	hpa@zytor.com, bp@alien8.de, yilun.xu@linux.intel.com,
	x86@kernel.org
In-Reply-To: <20260513151045.1420990-14-chao.gao@intel.com>

On Wed, 2026-05-13 at 08:09 -0700, Chao Gao wrote:
> A TDX module update is a multi-step process, and any step can fail.
> 
> The current update flow continues to later steps after an error.
> Continuing after a failure can leave the TDX module in an unrecoverable
> state.

I get what you are saying here, but "continuing" vs "leaving" is a tiny bit
confusing to me. Maybe: Continuing with subsequent update steps after a failure
can cause the TDX module to enter an unrecoverable state?

> 
> One failure case must remain recoverable: update contention with an ongoing
> TD build. The agreed kernel behavior for this case [1] is to fail the
> update with -EBUSY so userspace can retry later.

The link to the discussion is nice, but the explanation of just that there was
an agreement is not saying much. But the reasoning around AVOID_COMPAT_SENSITIVE
*is* handled in later patch. So can we say future changes will want to return
errors to userspace for certain update failures? Then we can discuss the
specifics when code is actual error is added?

And why talk about EBUSY specifically? It is not in this patch. Stale log? 

> 
> Abort the update on any failure. This also makes the TD-build contention
> case recoverable, because that failure occurs before any TDX module state
> is changed. 
> 

Oh, maybe I didn't get what you meant above actually. The contention case is
only recoverable because we detect it at the first step? Does "Continuing after
a failure can leave the TDX module in an unrecoverable" really mean that any
failure after the first step is unrecoverable? Or can we put it in some other
more specific terms like that. Terms which are more specific but still not
overly complex description of TDX module update flows?

> Apply the same rule to all errors instead of special-casing
> -EBUSY.

It seems like actually it is not special cased...? The error returned is
whatever is returned from the step.

> 
> Track per-step failures, stop the update loop once a failure is observed,
> and do not advance the state machine to the next step.

Hmm, so this is actually a bunch of generic handling for each step, that really
only works for the first one? Is the generic handling really needed?

> 
> Signed-off-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Xu Yilun <yilun.xu@linux.intel.com>
> Reviewed-by: Tony Lindgren <tony.lindgren@linux.intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
> Link: https://lore.kernel.org/linux-coco/aQFmOZCdw64z14cJ@google.com/ # [1]
> ---
> v9:
>   - Avoid nested if/else by deferring failure accounting to ack_state().
>   - Reduce indentation of the main flow.
>   - Convert the failed flag into a counter. This avoids a conditional
>     update of the flag; the counter can simply accumulate failures.
> ---
>  arch/x86/virt/vmx/tdx/seamldr.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
> index 7befe4a08f33..48fe71319fea 100644
> --- a/arch/x86/virt/vmx/tdx/seamldr.c
> +++ b/arch/x86/virt/vmx/tdx/seamldr.c
> @@ -170,6 +170,7 @@ enum module_update_state {
>  static struct update_ctrl {
>  	enum module_update_state state;
>  	int num_ack;
> +	int num_failed;

Was there past discussion on why it keeps a failed count? All we need to know is
if anything failed right? So a bool is fine too?

>  	/*
>  	 * Protect update_ctrl. Raw spinlock as it will be acquired from
>  	 * interrupt-disabled contexts.
> @@ -187,12 +188,13 @@ static void __set_target_state(struct update_ctrl *ctrl,
>  }
>  
>  /* Last one to ack a state moves to the next state. */
> -static void ack_state(struct update_ctrl *ctrl)
> +static void ack_state(struct update_ctrl *ctrl, int result)
>  {
>  	raw_spin_lock(&ctrl->lock);
>  
> +	ctrl->num_failed += !!result;
>  	ctrl->num_ack++;
> -	if (ctrl->num_ack == num_online_cpus())
> +	if (ctrl->num_ack == num_online_cpus() && !ctrl->num_failed)
>  		__set_target_state(ctrl, ctrl->state + 1);
>  
>  	raw_spin_unlock(&ctrl->lock);
> @@ -202,6 +204,7 @@ static void init_state(struct update_ctrl *ctrl)
>  {
>  	raw_spin_lock_init(&ctrl->lock);
>  	__set_target_state(ctrl, MODULE_UPDATE_START + 1);
> +	ctrl->num_failed = 0;
>  }
>  
>  /*
> @@ -228,8 +231,8 @@ static int do_seamldr_install_module(void *seamldr_params)
>  			break;
>  		}
>  
> -		ack_state(&update_ctrl);
> -	} while (curstate != MODULE_UPDATE_DONE);
> +		ack_state(&update_ctrl, ret);
> +	} while (curstate != MODULE_UPDATE_DONE && !READ_ONCE(update_ctrl.num_failed));
>  
>  	return ret;
>  }


^ permalink raw reply

* Re: [PATCH v9 02/23] x86/virt/tdx: Move TDX_FEATURES0 bits to asm/tdx.h
From: Edgecombe, Rick P @ 2026-05-19  1:59 UTC (permalink / raw)
  To: Hansen, Dave, Gao, Chao
  Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	Huang, Kai, kvm@vger.kernel.org, Li, Xiaoyao, Zhao, Yan Y,
	dave.hansen@linux.intel.com, tony.lindgren@linux.intel.com,
	Chatre, Reinette, seanjc@google.com, pbonzini@redhat.com,
	binbin.wu@linux.intel.com, Weiny, Ira, nik.borisov@suse.com,
	mingo@redhat.com, Verma, Vishal L, kas@kernel.org, Shahar, Sagi,
	Annapurve, Vishal, djbw@kernel.org, tglx@kernel.org,
	paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
	yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <68e91c7ae1d10bdff73dd178d0e4ee48eaf1cfe1.camel@intel.com>

On Mon, 2026-05-18 at 09:57 -0700, Rick Edgecombe wrote:
> On Mon, 2026-05-18 at 15:52 +0800, Chao Gao wrote:
> > On Fri, May 15, 2026 at 09:15:47AM -0700, Dave Hansen wrote:
> > > On 5/13/26 08:09, Chao Gao wrote:
> > > > This prepares for TDX module update [1] and Dynamic PAMT [2] support. Both
> > > > add new TDX_FEATURES0 capability bits, and both need those capabilities to
> > > > be queried from code outside arch/x86/virt. The corresponding feature-query
> > > > helpers therefore need to live in the public asm/tdx.h header, so move the
> > > > existing bit definitions there first.
> > > 
> > > Please don't add unnecessary changelog cruft. If you need this move for
> > > this series, that's enough.
> > 
> > Sure. Will remove "Dynamic PAMT" stuff from the changelog.
> 
> I think it should not link to old versions of this series to explain the
> preparation. That is very confusing. We can just explain what will come in the
> later patches of *this* series. I'll circle back and propose some verbiage.

How about?

Future changes will add support for new TDX features exposed as TDX_FEATURES0
bits. The presence of these features will need to be checked outside of
arch/x86/virt. So the feature query helpers, and the TDX_FEATURES0 defines they
reference, will need to live in the widely accessible asm/tdx.h helper. Move the
existing TDX_FEATURES0 to asm/tdx.h so that they can all be kept together.


I ended up re-writing the whole thing. Not sure it was entirely necessary, but
lets def lose the links to the old patch versions.

^ permalink raw reply

* Re: [PATCH v9 09/23] coco/tdx-host: Don't expose P-SEAMLDR information on CPUs with erratum
From: Edgecombe, Rick P @ 2026-05-19  1:22 UTC (permalink / raw)
  To: Hansen, Dave, Gao, Chao
  Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	Huang, Kai, kvm@vger.kernel.org, Li, Xiaoyao, Zhao, Yan Y,
	dave.hansen@linux.intel.com, tony.lindgren@linux.intel.com,
	Chatre, Reinette, seanjc@google.com, pbonzini@redhat.com,
	binbin.wu@linux.intel.com, Weiny, Ira, nik.borisov@suse.com,
	mingo@redhat.com, Verma, Vishal L, kas@kernel.org, Shahar, Sagi,
	Annapurve, Vishal, djbw@kernel.org, tglx@kernel.org,
	paulmck@kernel.org, hpa@zytor.com, bp@alien8.de,
	yilun.xu@linux.intel.com, x86@kernel.org
In-Reply-To: <2e082f23-1383-4dba-8cdf-0df612f64ace@intel.com>

On Mon, 2026-05-18 at 08:29 -0700, Dave Hansen wrote:
> On 5/18/26 05:44, Chao Gao wrote:
> > On Fri, May 15, 2026 at 10:26:19AM -0700, Dave Hansen wrote:
> > > On 5/13/26 08:09, Chao Gao wrote:
> > > > Some TDX-capable CPUs have an erratum, as documented in Intel® Trust
> > > > Domain CPU Architectural Extensions (May 2021 edition) Chapter 2.3:
> > > 2021, eh?
> > The TDX ISA document has not been updated since then; the May 2021
> > edition is still the latest revision. See:
> > 
> > https://www.intel.com/content/www/us/en/developer/tools/trust-domain-
> > extensions/documentation.html
> 
> I think you are saying that the CPUs have an erratum.
> 
> That erratum diverges their implementation from the spec: "Intel® Trust
> Domain CPU Architectural Extensions (May 2021 edition) Chapter 2.3".

It actually is documented in that May 2021 spec as the architectural behavior.
But it looks like not earlier, because the doc said it is new verbiage on that
one.

> 
> But when you combine those two things in one sentence, it's incredibly
> confusing.
> 
> The erratum you are talking about is brand new. I just asked for it to
> be created in the last month or two. Thus, my confusion when you say
> there: "an erratum, as documented in ... May 2021".
> 
> Thus, I'm questioning the 2021 date. You probably also want to mention
> that the erratum is, as of today, not publicly documented.
> 
> Can you rephrase this all and make it clearer, please?

So I guess we want to explain:
1. The problematic VMCS clearing behavior
2. That the problematic behavior is only documented in later docs (right?)
3. That it will be documented as an erratum later, and checked via the bit

Maybe something like?

Some TDX-capable CPUs have an erratum where SEAMRET clears the current VMCS
pointer. The behavior relies on the VMM to reload the current VMCS pointer.
However, that is a problem for KVM because clearing the current VMCS pointer
behind KVM's back will break KVM. While the VMCS clearing is documented as the
actual architecture in later versions of the "Intel® Trust Domain CPU
Architectural Extensions"[0] documents, it is not present in the earlier ones. 

Future docs will describe this SEAMRET VMCS clearing behavior as being present
when IA32_VMX_BASIC[60] is set...






^ permalink raw reply

* Re: [PATCH v2] KVM: TDX: Fix x2APIC MSR handling in tdx_has_emulated_msr()
From: Sean Christopherson @ 2026-05-19  0:41 UTC (permalink / raw)
  To: Sean Christopherson, kas, kvm, linux-coco, linux-kernel, pbonzini,
	binbin.wu, dmaluka, Rick Edgecombe
In-Reply-To: <20260410232654.3864196-1-rick.p.edgecombe@intel.com>

On Fri, 10 Apr 2026 16:26:54 -0700, Rick Edgecombe wrote:
> Rework tdx_has_emulated_msr() to explicitly enumerate the x2APIC MSRs
> that KVM can emulate, instead of trying to enumerate the MSRs that KVM
> cannot emulate. Drop the inner switch and list the emulatable x2APIC
> registers directly in the outer switch's "return true" block.
> 
> The old code had multiple bugs in the x2APIC range handling.
> X2APIC_MSR(APIC_ISR + APIC_ISR_NR) was incorrect because APIC_ISR_NR is
> 0x8, not 0x80, so the X2APIC_MSR() shift lost the lower bits, collapsing
> each range to a single MSR. IA32_X2APIC_SELF_IPI was also missing from
> the non-emulatable list.
> 
> [...]

Applied to kvm-x86 vmx, with a massaged comment as suggested by Binbin.  Thanks!

[1/1] KVM: TDX: Fix x2APIC MSR handling in tdx_has_emulated_msr()
      https://github.com/kvm-x86/linux/commit/1f3e69af5f93

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply

* Re: [RFC PATCH v5 00/45] TDX: Dynamic PAMT + S-EPT Hugepage
From: Sean Christopherson @ 2026-05-19  0:40 UTC (permalink / raw)
  To: Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, Kiryl Shutsemau, Paolo Bonzini
  Cc: linux-kernel, linux-coco, kvm, Kai Huang, Rick Edgecombe,
	Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu,
	Xiaoyao Li, Isaku Yamahata
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>

On Wed, 28 Jan 2026 17:14:32 -0800, Sean Christopherson wrote:
> This is a combined series of Dynamic PAMT (from Rick), and S-EPT hugepage
> support (from Yan).  Except for some last minute tweaks to the DPAMT array
> args stuff, a version of this based on a Google-internal kernel has been
> moderately well tested (thanks Vishal!).  But overall it's still firmly RFC
> as I have deliberately NOT addressed others feedback from v4 of DPAMT and v3
> of S-EPT hugepage (mostly lack of cycles), and there's at least one patch in
> here that shouldn't be merged as-is (the quick-and-dirty switch from struct
> page to raw pfns).
> 
> [...]

Applied 1-4 to kvm-x86 mmu.  Please yell if this was unexpected in any way.
I'm pretty sure this is what we agreed on, but the last few week have been a
bit chaotic...

[01/45] x86/tdx: Use pg_level in TDX APIs, not the TDX-Module's 0-based level
        https://github.com/kvm-x86/linux/commit/4487492b92a4
[02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
        https://github.com/kvm-x86/linux/commit/02eaaffdd865
[03/45] KVM: TDX: Account all non-transient page allocations for per-TD structures
        https://github.com/kvm-x86/linux/commit/a8b2924676ec
[04/45] KVM: x86: Make "external SPTE" ops that can fail RET0 static calls
        https://github.com/kvm-x86/linux/commit/e1a31ca28c9d

--
https://github.com/kvm-x86/linux/tree/next

^ permalink raw reply

* Re: [PATCH v2 08/15] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
From: Huang, Kai @ 2026-05-18 23:44 UTC (permalink / raw)
  To: seanjc@google.com
  Cc: dwmw2@infradead.org, Edgecombe, Rick P, x86@kernel.org,
	kas@kernel.org, binbin.wu@linux.intel.com,
	dave.hansen@linux.intel.com, vkuznets@redhat.com, paul@xen.org,
	yosry@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org
In-Reply-To: <agt7vnVL5rJJUVzo@google.com>

On Mon, 2026-05-18 at 13:51 -0700, Sean Christopherson wrote:
> On Mon, May 18, 2026, Kai Huang wrote:
> > 
> > > @@ -10413,29 +10413,30 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
> > >  
> > >  	if (!is_64_bit_hypercall(vcpu))
> > >  		ret = (u32)ret;
> > > -	kvm_rax_write(vcpu, ret);
> > > +	kvm_rax_write_raw(vcpu, ret);
> > >  	return kvm_skip_emulated_instruction(vcpu);
> > >  }
> > > 
> > 
> > Nit:  AFAICT if we use kvm_rax_write(vcpu, ret) instead of the "raw" version
> > here, we can then remove the
> > 
> > 	if (!is_64_bit_hypercall(vcpu))
> > 		ret = (u32)ret;
> 
> No, because sneakily, is_64_bit_hypercall() != is_64_bit_mode(vcpu).  And because
> we also need to avoid calling is_64_bit_mode().  If we use kvm_rax_write(), then
> the unpacked code will be:
> 
> 	WARN_ON_ONCE(vcpu->arch.guest_state_protected);
> 
> 	if (is_long_mode(vcpu))
> 		kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
> 	else
> 		cs_l = 0;
> 
> 	if (cs_l)
> 		vcpu->arch.regs[VCPU_REGS_RAX] = ret;
> 	else	
> 		vcpu->arch.regs[VCPU_REGS_RAX] = (u32)ret;
> 
> whereas the (correct) behavior here is:
> 
> 	if (vcpu->arch.guest_state_protected)
> 		cs_l = 1;
> 	else if (is_long_mode(vcpu))
> 		kvm_x86_call(get_cs_db_l_bits)(vcpu, &cs_db, &cs_l);
> 	else
> 		cs_l = 0;
> 
> 	if (cs_l)
> 		vcpu->arch.regs[VCPU_REGS_RAX] = ret;
> 	else	
> 		vcpu->arch.regs[VCPU_REGS_RAX] = (u32)ret;
> 
> I.e. using the non-raw version will trigger the WARN_ON_ONCE(), and will incorrectly
> truncate "ret" whenever cs_l is stale (which might be always?).

FWIW, I sanity tested that booting/destroying both TD and VMX guests worked
fine.  I have no environment to test SVM and Xen related parts, though.

^ permalink raw reply

* Re: [PATCH v3 00/41] x86: Try to wrangle PV clocks vs. TSC
From: David Woodhouse @ 2026-05-18 23:38 UTC (permalink / raw)
  To: Sean Christopherson, Kiryl Shutsemau, Paolo Bonzini,
	K. Y. Srinivasan, Haiyang Zhang, Wei Liu, Dexuan Cui, Long Li,
	Ajay Kaher, Alexey Makhalov, Jan Kiszka, Dave Hansen,
	Andy Lutomirski, Peter Zijlstra, Juergen Gross, Daniel Lezcano,
	Thomas Gleixner, John Stultz
  Cc: Rick Edgecombe, Vitaly Kuznetsov,
	Broadcom internal kernel review list, Boris Ostrovsky,
	Stephen Boyd, x86, linux-coco, kvm, linux-hyperv, virtualization,
	linux-kernel, xen-devel, Michael Kelley, Tom Lendacky,
	Nikunj A Dadhania, Thomas Gleixner
In-Reply-To: <20260515191942.1892718-1-seanjc@google.com>

[-- Attachment #1: Type: text/plain, Size: 3647 bytes --]

On Fri, 2026-05-15 at 12:19 -0700, Sean Christopherson wrote:
> Dave/Thomas/Peter/Boris, what's the going rate for bribes to take something
> like this through the tip tree?
> 
> The bulk of the changes are in kvmclock and TSC, but pretty much every
> hypervisor's guest-side code gets touched at some point.  I am reaonsably
> confident in the correctness of the KVM changes.  Michael tested Hyper-V in
> v2, and while there were conflicts when rebasing, they were largely
> superficial (and I've just jinxed myself).  For all other hypervisors, assume
> the code is compile-tested only, but those changes are all quite small and
> straightforward.
> 
> The only changes that are questionable/contentious are the last two patches,
> which have KVM-as-a-guest use CPUID 0x16 to get the CPU frequency, even on
> AMD (that's the dubious part).  I very deliberately put them last, so that
> they can be dropped at will (I don't care terribly if those patches land).
> To merge them, I would want explicit Acks from Paolo and David W.
> 
> So, except for the last two patches, to get the stuff I really care about
> landed, I think/hope it's just the TSC and guest-side CoCo changes that need
> reviews/acks?
> 
> The primary goal of this series is (or at least was, when I started) to
> fix flaws with SNP and TDX guests where a PV clock provided by the untrusted
> hypervisor is used instead of the secure/trusted TSC that is controlled by
> trusted firmware.
> 
> The secondary goal is to draft off of the SNP and TDX changes to slightly
> modernize running under KVM.  Currently, KVM guests will use TSC for
> clocksource, but not sched_clock.  And they ignore Intel's CPUID-based TSC
> and CPU frequency enumeration, even when using the TSC instead of kvmclock.
> And if the host provides the core crystal frequency in CPUID.0x15, then KVM
> guests can use that for the APIC timer period instead of manually calibrating
> the frequency.
> 
> The tertiary goal is to clean up all of the PV clock code to deduplicate logic
> across hypervisors, and to hopefully make it all easier to maintain going
> forward.

I booted this in qemu with -cpu host,+invtsc,+vmware-cpuid-freq

I was expecting to see it eschew the kvmclock and use *only* the TSC.
Is there even any need for 'tsc-early' given that it's *told* the TSC
frequency in CPUID? Shouldn't it have detected that the TSC is known
before init_tsc_clocksource() runs?

And then it even spent some time at boot actually using the kvmclock as
clocksource... when ideally I don't think it would even have *enabled*
it at all?

[    0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] tsc: Detected 2400.000 MHz processor
[    0.008205] TSC deadline timer available
[    0.008270] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns
[    0.159085] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604467 ns
[    0.164074] clocksource: tsc-early: mask: 0xffffffffffffffff max_cycles: 0x22983777dd9, max_idle_ns: 440795300422 ns
[    0.229087] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns
[    0.337095] clocksource: Switched to clocksource kvm-clock
[    0.345246] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    0.356201] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x22983777dd9, max_idle_ns: 440795300422 ns
[    0.360560] clocksource: Switched to clocksource tsc


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/4] x86/vmware: Hypercall refactoring and improved guest support
From: Alexey Makhalov @ 2026-05-18 22:52 UTC (permalink / raw)
  To: x86, virtualization, bp, hpa, dave.hansen, mingo, tglx
  Cc: ajay.kaher, brennan.lamoreaux, bo.gan, bcm-kernel-feedback-list,
	linux-kernel, kas, rick.p.edgecombe, linux-coco
In-Reply-To: <20260309235250.2611115-1-alexey.makhalov@broadcom.com>

On 3/9/26 4:52 PM, Alexey Makhalov wrote:
> This series improves VMware guest support on x86 by refactoring the
> hypercall infrastructure and adding better crash diagnostics, along
> with encrypted guest support for the steal time clock.
> 
> The first patch introduces a common vmware_hypercall() backend selected
> via static calls. It consolidates the existing hypercall mechanisms
> (backdoor, VMCALL/VMMCALL, and TDX) behind a single interface and
> selects the optimal implementation at boot. This reduces duplication
> and simplifies future extensions.
> 
> Building on top of the new hypercall infrastructure, the next two
> patches improve post-mortem debugging of VMware guests. They export
> panic information to the hypervisor by dumping kernel messages to the
> VM vmware.log on the host and explicitly reporting guest crash event
> to the hypervisor.
> 
> The final patch adds support for encrypted guests by ensuring that the
> shared memory used for the steal time clock is mapped as decrypted
> before being shared with the hypervisor. This enables steal time
> accounting to function correctly when guest memory encryption is
> enabled.
> 
> Patch overview:
> 
> 1. x86/vmware: Introduce common vmware_hypercall
> 
>     * Consolidate hypercall implementations behind a common API
>     * Select backend via static_call at boot
> 
> 2. x86/vmware: Log kmsg dump on panic
> 
>     * Register a kmsg dumper
>     * Export panic logs to the host
> 
> 3. x86/vmware: Report guest crash to the hypervisor
> 
>     * Register a panic notifier
>     * Notify the hypervisor about guest crashes
> 
> 4. x86/vmware: Support steal time clock for encrypted guests
> 
>     * Mark shared steal time memory as decrypted early in boot
> 
> 
> Changelog:
> 
> V1 -> V2
>     * Fix compilation warnings in patch 2 "x86/vmware: Log kmsg dump on panic"
>       reported by kernel test robot <lkp@intel.com>
> 
> 
> Alexey Makhalov (4):
>    x86/vmware: Introduce common vmware_hypercall()
>    x86/vmware: Log kmsg dump on panic
>    x86/vmware: Report guest crash to the hypervisor
>    x86/vmware: Support steal time clock for encrypted guests
> 
>   arch/x86/include/asm/vmware.h | 276 ++++++++------------
>   arch/x86/kernel/cpu/vmware.c  | 470 +++++++++++++++++++++++++---------
>   2 files changed, 463 insertions(+), 283 deletions(-)
> 
> 
> base-commit: 7d08a6ad25f85c9bb7d0382142838cb54713f1a3

Gentle reminder to review this change. Thanks,
--Alexey



^ permalink raw reply

* Re: [PATCH v5 2/7] x86/msr: add wrmsrq_on_cpus helper
From: Dave Hansen @ 2026-05-18 22:38 UTC (permalink / raw)
  To: Kalra, Ashish, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, ackerleytng, jackyli, pgonda, rientjes, jacobhxu,
	xin, pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen,
	darwi, linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <6e365465-9ecf-416f-9561-67cab6428e15@amd.com>

On 5/18/26 15:09, Kalra, Ashish wrote:
> On 5/18/2026 5:04 PM, Dave Hansen wrote:
>> On 5/18/26 14:42, Ashish Kalra wrote:
>>> Co-developed-by: Dave Hansen <dave.hansen@linux.intel.com>
>>> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
>>> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
>>> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
>> Hi Ashish,
>>
>> Sorry if my memory fails me, but I don't remember signing off on this.
>> Could you point me to the place where I gave you my Signed-off-by?
> Sorry about this, added this accidentally. 
> 
> You had suggested the code change, i accidentally took it as a Signed-off.

Hi Ashish,

First, please do me a favor and go back and re-read:

	Documentation/process/submitting-patches.rst

I recommend that everyone do this every once in a while so they remember
what they are constantly signing off on. It's important.

My personal rule for SoB lines is that I don't add them unless I've
explicitly talked to the person. Even then, I vastly prefer that the
person provides it to me *explicitly* (so I just copy and paste
directly) and on a public mailing list. That way, the avenues for
accidents to occur are very narrow.

^ permalink raw reply

* Re: [PATCH v2 08/15] KVM: x86: Add mode-aware versions of kvm_<reg>_{read,write}() helpers
From: Huang, Kai @ 2026-05-18 22:29 UTC (permalink / raw)
  To: seanjc@google.com
  Cc: dwmw2@infradead.org, Edgecombe, Rick P, x86@kernel.org,
	kas@kernel.org, binbin.wu@linux.intel.com,
	dave.hansen@linux.intel.com, vkuznets@redhat.com, paul@xen.org,
	yosry@kernel.org, pbonzini@redhat.com, kvm@vger.kernel.org,
	linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org
In-Reply-To: <agt7vnVL5rJJUVzo@google.com>

On Mon, 2026-05-18 at 13:51 -0700, Sean Christopherson wrote:
> On Mon, May 18, 2026, Kai Huang wrote:
> > 
> > > @@ -10413,29 +10413,30 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
> > >  
> > >  	if (!is_64_bit_hypercall(vcpu))
> > >  		ret = (u32)ret;
> > > -	kvm_rax_write(vcpu, ret);
> > > +	kvm_rax_write_raw(vcpu, ret);
> > >  	return kvm_skip_emulated_instruction(vcpu);
> > >  }
> > > 
> > 
> > Nit:  AFAICT if we use kvm_rax_write(vcpu, ret) instead of the "raw" version
> > here, we can then remove the
> > 
> > 	if (!is_64_bit_hypercall(vcpu))
> > 		ret = (u32)ret;
> 
> No, because sneakily, is_64_bit_hypercall() != is_64_bit_mode(vcpu).  

Oh I missed this :-(  sorry for the noise.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox