Linux Confidential Computing Development
 help / color / mirror / Atom feed
* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
From: Dave Hansen @ 2026-05-28 16:43 UTC (permalink / raw)
  To: Kiryl Shutsemau
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable
In-Reply-To: <ahgUBLjBRGhxULu3@thinkstation>

On 5/28/26 03:14, Kiryl Shutsemau wrote:
> +	switch (size) {
> +	case 1:
> +		*(u8 *)&regs->ax = (u8)val;
> +		break;
> +	case 2:
> +		*(u16 *)&regs->ax = (u16)val;
> +		break;

Is this intentionally clever to only work on little endian, or
accidentally clever? This seems like a great IOCCC thing to do, but it's
far too clever for my taste.

I mean, it's making a pointer to a 64-bit value with an 8-bit name and
casting that to an 8-bit pointer and then assigning that to a 32-bit
value cast to a u8.

Is it just my tiny brain that thinks this will be unintelligible on Monday?

How about we just make the CPU do the thinking for us? IN[BWL] and
MOV[BWL] have the same semantics here, right? So even if 'rax' and 'val'
are 64-bit values here, the following should have all the right
behaviors, I think.

I generally loathe inline assembly. But we have a CPU that kinda knows
the rules already. No need for us to laboriously reimplement it. Right?

Thanks to the friendly LLM that knows inline assembly better than I do.
The resulting compiled assembly looks right to me.

/*
 * Use MOV[BWL] to/from registers to match the IN[BWL] behavior
 * including the fact that INL zeros the upper 64-bits while
 * IN[BW] don't zero anything.
 */

    switch (size) {
    case 1:
	// Just write 1 byte of RAX:
        __asm__ volatile ("movb %b1, %b0" : "+q"(rax)
                                          : "q"(val));
        break;
    case 2:
	// Write 2 bytes of RAX:
        __asm__ volatile ("movw %w1, %w0" : "+r"(rax)
                                          : "r"(val));
        break;
    case 4:
	// Write 'val' into lower 32 bits. Zero the upper 32 bits:
        __asm__ volatile ("movl %k1, %k0" : "=r"(rax)
                                          : "r"(val));
        break;
    default:
	// WARN
    }

Thoughts?

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Jürgen Groß @ 2026-05-28 15:50 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	David Woodhouse, Paul Durrant
In-Reply-To: <ahhBQDvyzHQfTCBD@google.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 1487 bytes --]

On 28.05.26 15:21, Sean Christopherson wrote:
> On Thu, May 28, 2026, Jürgen Groß wrote:
>> On 28.05.26 15:09, Sean Christopherson wrote:
>>> On Thu, May 28, 2026, Juergen Gross wrote:
>>>> Please disregard this series, there is one complication sashiko made me
>>>> aware of.
>>>
>>> Sashiko beat me to the punch. :-)
>>>
>>> See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
>>> for a real world example of how things can and will go wrong.
>>
>> Yeah, with Sashiko's pointer it was easy to spot.
>>
>> Question now is whether the already existing cases of -errno passed as return
>> value are wrong or on purpose.
> 
> What are the existing cases?
> 
>> If the latter, there should be a comment for
>> that, otherwise they need to be fixed..
>>
>> Disentangling the MSR emulation return values from the "normal" ones ("return
>> to guest"/"return to user mode") will be quite interesting with the overloaded
>> semantics of "1".
> 
> LOL, "interesting".

What do you think about the following idea:

Lets pass struct msr_info * down to all functions which get their return
value passed up. Then extend msr_info with a bool "return_to_guest" (valid
only if !host_initiated), which should be set instead of passing "1" up to
the caller (probably using an inline helper). Then the return value could
be 0 or -errno, and after MSR emulation the return_to_guest indicator can
be tested if needed.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Jürgen Groß @ 2026-05-28 15:36 UTC (permalink / raw)
  To: David Woodhouse, Sean Christopherson
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	Paul Durrant
In-Reply-To: <a6fdbcc655ea24209ed03e377f7e4a2a4d43bd9c.camel@infradead.org>


[-- Attachment #1.1.1: Type: text/plain, Size: 1337 bytes --]

On 28.05.26 17:32, David Woodhouse wrote:
> On Thu, 2026-05-28 at 16:33 +0200, Jürgen Groß wrote:
>> On 28.05.26 15:21, Sean Christopherson wrote:
>>> On Thu, May 28, 2026, Jürgen Groß wrote:
>>>> On 28.05.26 15:09, Sean Christopherson wrote:
>>>>> On Thu, May 28, 2026, Juergen Gross wrote:
>>>>>> Please disregard this series, there is one complication sashiko made me
>>>>>> aware of.
>>>>>
>>>>> Sashiko beat me to the punch. :-)
>>>>>
>>>>> See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
>>>>> for a real world example of how things can and will go wrong.
>>>>
>>>> Yeah, with Sashiko's pointer it was easy to spot.
>>>>
>>>> Question now is whether the already existing cases of -errno passed as return
>>>> value are wrong or on purpose.
>>>
>>> What are the existing cases?
>>
>> Found another one:
>>
>> kvm_xen_write_hypercall_page() (called by kvm_set_msr_common())
> 
> You mean in the case where it's using the user-provided hypercall page,
> and can't copy from the buffer that the VMM provided?

Yes.

> 
> I think that's correct to return -errno via PTR_ERR() and let the guest
> die?

In this case I think a comment in this regard would be nice, as it would
prevent others stumbling over it asking the same question again.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: David Woodhouse @ 2026-05-28 15:32 UTC (permalink / raw)
  To: Jürgen Groß, Sean Christopherson
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	Paul Durrant
In-Reply-To: <8d3f3cbe-89ca-44df-b35f-90c2724e9154@suse.com>

[-- Attachment #1: Type: text/plain, Size: 1168 bytes --]

On Thu, 2026-05-28 at 16:33 +0200, Jürgen Groß wrote:
> On 28.05.26 15:21, Sean Christopherson wrote:
> > On Thu, May 28, 2026, Jürgen Groß wrote:
> > > On 28.05.26 15:09, Sean Christopherson wrote:
> > > > On Thu, May 28, 2026, Juergen Gross wrote:
> > > > > Please disregard this series, there is one complication sashiko made me
> > > > > aware of.
> > > > 
> > > > Sashiko beat me to the punch. :-)
> > > > 
> > > > See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
> > > > for a real world example of how things can and will go wrong.
> > > 
> > > Yeah, with Sashiko's pointer it was easy to spot.
> > > 
> > > Question now is whether the already existing cases of -errno passed as return
> > > value are wrong or on purpose.
> > 
> > What are the existing cases?
> 
> Found another one:
> 
> kvm_xen_write_hypercall_page() (called by kvm_set_msr_common())

You mean in the case where it's using the user-provided hypercall page,
and can't copy from the buffer that the VMM provided?

I think that's correct to return -errno via PTR_ERR() and let the guest
die?

The rest return 0 or 1.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5069 bytes --]

^ permalink raw reply

* Re: [PATCH v5 4/7] x86/sev: Add support to perform RMP optimizations asynchronously
From: Ackerley Tng @ 2026-05-28 14:45 UTC (permalink / raw)
  To: Ashish Kalra, tglx, mingo, bp, dave.hansen, x86, hpa, seanjc,
	peterz, thomas.lendacky, herbert, davem, ardb
  Cc: pbonzini, aik, Michael.Roth, KPrateek.Nayak, Tycho.Andersen,
	Nathan.Fontenot, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <6f1ec3d8ebcf3aaceccc099c07d0deb545dd4ab9.1779133590.git.ashish.kalra@amd.com>

Ashish Kalra <Ashish.Kalra@amd.com> writes:

Thank you Ashish!

> From: Ashish Kalra <ashish.kalra@amd.com>
>
> When SEV-SNP is enabled, all writes to memory are checked to ensure
> integrity of SNP guest memory. This imposes performance overhead on the
> whole system.
>
> RMPOPT is a new instruction that minimizes the performance overhead of
> RMP checks on the hypervisor and on non-SNP guests by allowing RMP
> checks to be skipped for 1GB regions of memory that are known not to
> contain any SEV-SNP guest memory.
>
> Add support for performing RMP optimizations asynchronously using a
> dedicated workqueue.
>
> Enable RMPOPT optimizations globally for all system RAM up to 2TB at

This should also be updated to say "Enable RMPOPT optimizations for up
to 2TB worth of system RAM at..."

The current phrasing sounds like only addresses [0, 2TB) are allowed to
be optimized, but actually any address [start, start + 2TB) can be
optimized?

> RMP initialization time. RMP checks can initially be skipped for 1GB
> memory ranges that do not contain SEV-SNP guest memory (excluding
> preassigned pages such as the RMP table and firmware pages). As SNP
> guests are launched, RMPUPDATE will disable the corresponding RMPOPT
> optimizations.
>
> Suggested-by: Thomas Lendacky <thomas.lendacky@amd.com>
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reviewed-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Ashish Kalra <ashish.kalra@amd.com>
> ---
>  arch/x86/virt/svm/sev.c | 167 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 164 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/virt/svm/sev.c b/arch/x86/virt/svm/sev.c
> index 82f9dc7a57c3..8876cac052d5 100644
> --- a/arch/x86/virt/svm/sev.c
> +++ b/arch/x86/virt/svm/sev.c
> @@ -19,6 +19,7 @@
>  #include <linux/iommu.h>
>  #include <linux/amd-iommu.h>
>  #include <linux/nospec.h>
> +#include <linux/workqueue.h>
>
>  #include <asm/sev.h>
>  #include <asm/processor.h>
> @@ -125,7 +126,18 @@ static void *rmp_bookkeeping __ro_after_init;
>  static u64 probed_rmp_base, probed_rmp_size;
>
>  static cpumask_t rmpopt_cpumask;
> -static phys_addr_t rmpopt_pa_start;
> +static phys_addr_t rmpopt_pa_start, rmpopt_pa_end;
> +
> +enum rmpopt_function {
> +	RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS,
> +	RMPOPT_FUNC_REPORT_STATUS
> +};
> +
> +#define RMPOPT_WORK_TIMEOUT	10000
> +
> +static struct workqueue_struct *rmpopt_wq;
> +static struct delayed_work rmpopt_delayed_work;
> +static DEFINE_MUTEX(rmpopt_wq_mutex);
>
>  static LIST_HEAD(snp_leaked_pages_list);
>  static DEFINE_SPINLOCK(snp_leaked_pages_list_lock);
> @@ -564,12 +576,21 @@ EXPORT_SYMBOL_FOR_MODULES(snp_prepare, "ccp");
>
>  static void rmpopt_cleanup(void)
>  {
> +	guard(mutex)(&rmpopt_wq_mutex);
> +
> +	if (!rmpopt_wq)
> +		return;
> +
> +	cancel_delayed_work_sync(&rmpopt_delayed_work);
> +	destroy_workqueue(rmpopt_wq);
> +
>  	cpus_read_lock();
>  	wrmsrq_on_cpus(&rmpopt_cpumask, MSR_AMD64_RMPOPT_BASE, 0);
>  	cpus_read_unlock();
>
>  	cpumask_clear(&rmpopt_cpumask);
> -	rmpopt_pa_start = 0;
> +	rmpopt_pa_start = rmpopt_pa_end = 0;
> +	rmpopt_wq = NULL;
>  }
>
>  void snp_shutdown(void)
> @@ -587,6 +608,105 @@ void snp_shutdown(void)
>  }
>  EXPORT_SYMBOL_FOR_MODULES(snp_shutdown, "ccp");
>
> +static inline bool __rmpopt(u64 rax, u64 rcx)

Perhaps use pa_start instead of rax and op_type for rcx?

> +{
> +	bool optimized;
> +
> +	asm volatile(".byte 0xf2, 0x0f, 0x01, 0xfc"
> +		     : "=@ccc" (optimized)
> +		     : "a" (rax), "c" (rcx)
> +		     : "memory", "cc");
> +
> +	return optimized;
> +}
> +
> +static void rmpopt(u64 pa)
> +{
> +	u64 rax = ALIGN_DOWN(pa, SZ_1G);
> +	u64 rcx = RMPOPT_FUNC_VERIFY_AND_REPORT_STATUS;
> +

And pa_start and op_type here too.

> +	__rmpopt(rax, rcx);
> +}
> +
> +/*
> + * 'val' is a system physical address.
> + */
> +static void rmpopt_smp(void *val)
> +{
> +	rmpopt((u64)val);
> +}
> +
> +/*
> + * RMPOPT optimizations skip RMP checks at 1GB granularity if this
> + * range of memory does not contain any SNP guest memory.
> + */
> +static void rmpopt_work_handler(struct work_struct *work)
> +{
> +	bool current_cpu_cleared = false;
> +	phys_addr_t pa;
> +	int this_cpu;
> +
> +	pr_info("Attempt RMP optimizations on physical address range @1GB alignment [0x%016llx - 0x%016llx]\n",
> +		rmpopt_pa_start, rmpopt_pa_end);
> +
> +	/*
> +	 * RMPOPT scans the RMP table, stores the result of the scan in the
> +	 * reserved processor memory. The RMP scan is the most expensive
> +	 * part. If a second RMPOPT occurs, it can skip the expensive scan
> +	 * if they can see a cached result in the reserved processor memory.
> +	 *
> +	 * Do RMPOPT on one CPU alone. Then, follow that up with RMPOPT
> +	 * on every other primary thread. This potentially allows the

I like the leader and follower comments below, thanks! With this
leader/follower setup, will the followers definitely see the cached scan
results, or might the followers still potentially not benefit from the
caching? If it's still only "potentially", why?

> +	 * followers to use the "cached" scan results to avoid repeating
> +	 * full scans.
> +	 */
> +
> +	/*
> +	 * Pin the worker to the current CPU for the leader loop so that
> +	 * this_cpu remains valid and the RMPOPT instruction executes on
> +	 * the CPU that was cleared from the cpumask.  The workqueue is
> +	 * WQ_UNBOUND, so without pinning, the scheduler could migrate
> +	 * the worker between the cpumask manipulation and the leader
> +	 * loop, causing the leader to run on a different CPU while
> +	 * this_cpu's core is skipped entirely.
> +	 *
> +	 * Use migrate_disable() rather than get_cpu() to prevent
> +	 * migration while still allowing preemption.
> +	 *
> +	 * Note: rmpopt_cpumask is modified here without holding
> +	 * rmpopt_wq_mutex.  This is safe because the delayed_work
> +	 * mechanism guarantees single-threaded execution of this
> +	 * handler, and rmpopt_cleanup() calls cancel_delayed_work_sync()
> +	 * to ensure handler completion before tearing down the cpumask.
> +	 */
> +	migrate_disable();
> +	this_cpu = smp_processor_id();
> +	if (cpumask_test_cpu(this_cpu, &rmpopt_cpumask)) {
> +		cpumask_clear_cpu(this_cpu, &rmpopt_cpumask);
> +		current_cpu_cleared = true;
> +	}
> +

Instead of reusing the global rmpopt_cpumask, why not make a copy of
rmpopt_cpumask for this function? Then this function won't have to
figure out current_cpu_cleared or restore rmpopt_cpumask at the end.

I'm thinking to also drop the test and clear, this function can just
always clear, like

  cpumask_clear_cpu(smp_processor_id(), followers_cpumask);

and later

  on_each_cpu_mask(&followers_cpumask, ...);

Actually, if for whatever reason cpumask_test_cpu(this_cpu,
&rmpopt_cpumask) above returns false, would that mean somehow some cpu
exists that wasn't enabled right when rmpopt was initialized? If yes,
what happens if we call rmpopt() on a cpu where it wasn't initialized?

> +	/* Leader: prime the RMPOPT cache on this CPU */
> +	for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G)
> +		rmpopt(pa);
> +
> +	migrate_enable();
> +
> +	/* Followers: run RMPOPT on all other cores */
> +	cpus_read_lock();
> +	for (pa = rmpopt_pa_start; pa < rmpopt_pa_end; pa += SZ_1G) {
> +		on_each_cpu_mask(&rmpopt_cpumask, rmpopt_smp,
> +				 (void *)pa, true);
> +
> +		 /* Give a chance for other threads to run */
> +		cond_resched();
> +	}
> +	cpus_read_unlock();
> +
> +	if (current_cpu_cleared)
> +		cpumask_set_cpu(this_cpu, &rmpopt_cpumask);
> +}
> +
>
> [...snip...]
>

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Jürgen Groß @ 2026-05-28 14:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	David Woodhouse, Paul Durrant
In-Reply-To: <ahhBQDvyzHQfTCBD@google.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 814 bytes --]

On 28.05.26 15:21, Sean Christopherson wrote:
> On Thu, May 28, 2026, Jürgen Groß wrote:
>> On 28.05.26 15:09, Sean Christopherson wrote:
>>> On Thu, May 28, 2026, Juergen Gross wrote:
>>>> Please disregard this series, there is one complication sashiko made me
>>>> aware of.
>>>
>>> Sashiko beat me to the punch. :-)
>>>
>>> See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
>>> for a real world example of how things can and will go wrong.
>>
>> Yeah, with Sashiko's pointer it was easy to spot.
>>
>> Question now is whether the already existing cases of -errno passed as return
>> value are wrong or on purpose.
> 
> What are the existing cases?

Found another one:

kvm_xen_write_hypercall_page() (called by kvm_set_msr_common())


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Jürgen Groß @ 2026-05-28 14:01 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	David Woodhouse, Paul Durrant
In-Reply-To: <ahhBQDvyzHQfTCBD@google.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 834 bytes --]

On 28.05.26 15:21, Sean Christopherson wrote:
> On Thu, May 28, 2026, Jürgen Groß wrote:
>> On 28.05.26 15:09, Sean Christopherson wrote:
>>> On Thu, May 28, 2026, Juergen Gross wrote:
>>>> Please disregard this series, there is one complication sashiko made me
>>>> aware of.
>>>
>>> Sashiko beat me to the punch. :-)
>>>
>>> See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
>>> for a real world example of how things can and will go wrong.
>>
>> Yeah, with Sashiko's pointer it was easy to spot.
>>
>> Question now is whether the already existing cases of -errno passed as return
>> value are wrong or on purpose.
> 
> What are the existing cases?

Have a look at:

kvm_hv_msr_get_crash_data()
kvm_hv_msr_set_crash_data()
svm_get_msr()
svm_set_msr()


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Sean Christopherson @ 2026-05-28 13:21 UTC (permalink / raw)
  To: Jürgen Groß
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	David Woodhouse, Paul Durrant
In-Reply-To: <476964db-837a-45d8-a647-c5a2ea8b0cb6@suse.com>

On Thu, May 28, 2026, Jürgen Groß wrote:
> On 28.05.26 15:09, Sean Christopherson wrote:
> > On Thu, May 28, 2026, Juergen Gross wrote:
> > > Please disregard this series, there is one complication sashiko made me
> > > aware of.
> > 
> > Sashiko beat me to the punch. :-)
> > 
> > See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
> > for a real world example of how things can and will go wrong.
> 
> Yeah, with Sashiko's pointer it was easy to spot.
> 
> Question now is whether the already existing cases of -errno passed as return
> value are wrong or on purpose. 

What are the existing cases?

> If the latter, there should be a comment for
> that, otherwise they need to be fixed..
> 
> Disentangling the MSR emulation return values from the "normal" ones ("return
> to guest"/"return to user mode") will be quite interesting with the overloaded
> semantics of "1".

LOL, "interesting".

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Jürgen Groß @ 2026-05-28 13:18 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	David Woodhouse, Paul Durrant
In-Reply-To: <ahg-bEiwyqYTdWOD@google.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 832 bytes --]

On 28.05.26 15:09, Sean Christopherson wrote:
> On Thu, May 28, 2026, Juergen Gross wrote:
>> Please disregard this series, there is one complication sashiko made me
>> aware of.
> 
> Sashiko beat me to the punch. :-)
> 
> See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
> for a real world example of how things can and will go wrong.

Yeah, with Sashiko's pointer it was easy to spot.

Question now is whether the already existing cases of -errno passed as return
value are wrong or on purpose. If the latter, there should be a comment for
that, otherwise they need to be fixed..

Disentangling the MSR emulation return values from the "normal" ones ("return
to guest"/"return to user mode") will be quite interesting with the overloaded
semantics of "1".


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Sean Christopherson @ 2026-05-28 13:09 UTC (permalink / raw)
  To: Juergen Gross
  Cc: linux-kernel, x86, kvm, linux-coco, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	David Woodhouse, Paul Durrant
In-Reply-To: <a8f495f0-b4a9-42b6-be74-4fa9d83c7346@suse.com>

On Thu, May 28, 2026, Juergen Gross wrote:
> Please disregard this series, there is one complication sashiko made me
> aware of.

Sashiko beat me to the punch. :-)

See commit 2368048bf5c2 ("KVM: x86: Signal #GP, not -EPERM, on bad WRMSR(MCi_CTL/STATUS)")
for a real world example of how things can and will go wrong.

^ permalink raw reply

* Re: [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Juergen Gross @ 2026-05-28 11:58 UTC (permalink / raw)
  To: linux-kernel, x86, kvm, linux-coco
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Vitaly Kuznetsov,
	Kiryl Shutsemau, Rick Edgecombe, David Woodhouse, Paul Durrant
In-Reply-To: <20260528113605.267111-1-jgross@suse.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 1788 bytes --]

Please disregard this series, there is one complication sashiko made me
aware of.


Juergen

On 28.05.26 13:35, Juergen Gross wrote:
> Get rid of the literal "1" used as general error return value in KVM
> MSR emulation. It can easily be replaced by negative errno values
> instead.
> 
> This is meant to avoid confusion with the literal "1" used as return
> value for "return to guest".
> 
> Changes in V2:
> - series carved out from initial "KVM: Avoid literal numbers as return
>    values" series
> - don't use new KVM_MSR_RET_* defines, but 0 and -errno
> 
> Juergen Gross (6):
>    KVM/x86: Change comment before KVM_MSR_RET_* defines
>    KVM/x86: Return -errno instead of "1" for APIC related MSR emulation
>    KVM/x86: Return -errno instead of "1" for Hyper-V related MSR
>      emulation
>    KVM/x86: Return -errno instead of "1" for VMX related MSR emulation
>    KVM/x86: Return -errno instead of "1" for SVM related MSR emulation
>    KVM/x86: Return -errno instead of "1" for common MSR emulation
> 
>   arch/x86/kvm/hyperv.c        |  72 ++++++++++++-------------
>   arch/x86/kvm/lapic.c         |  39 +++++++-------
>   arch/x86/kvm/mtrr.c          |   6 +--
>   arch/x86/kvm/pmu.c           |   8 +--
>   arch/x86/kvm/svm/pmu.c       |   4 +-
>   arch/x86/kvm/svm/svm.c       |  36 ++++++-------
>   arch/x86/kvm/vmx/nested.c    |   2 +-
>   arch/x86/kvm/vmx/pmu_intel.c |  16 +++---
>   arch/x86/kvm/vmx/tdx.c       |  10 ++--
>   arch/x86/kvm/vmx/vmx.c       |  96 ++++++++++++++++-----------------
>   arch/x86/kvm/x86.c           | 102 +++++++++++++++++------------------
>   arch/x86/kvm/x86.h           |   4 +-
>   arch/x86/kvm/xen.c           |  10 ++--
>   13 files changed, 202 insertions(+), 203 deletions(-)
> 


[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* COCONUT-SVSM Development Release v2026.05-devel
From: Jörg Rödel @ 2026-05-28 11:44 UTC (permalink / raw)
  To: coconut-svsm, linux-coco

Hi all,

The month is almost over and it is time for a new COCONUT-SVSM development
release. This one turned out bigger than usual with 33 merges that brought in
91 non-merge commits from 9 contributors.

The changes include:

  - Boot flow: stage2 was removed and replaced by the new simpler boot/bldr
    loader. Build, xbuild, IGVM builder, configs, and launch paths now
    prefer/consume bldr.

  - Platform/CPU feature model: CPUID handling was routed through the platform
    abstraction, with a feature lookup table added for x2APIC, physical address
    size, Hyper-V discovery, CET, FPU/SSE, INVLPGB, C-bit, and related SNP
    features.

  - Attestation: Added vsock transport support with serial fallback, refactored
    aproxy transport handling, added read_exact / write_all helpers, and
    documented the vsock transport option.

  - Protocol hardening: Core and attestation protocol handlers gained stricter
    region validation, reserved-bit checks, request validation, mutually
    exclusive core calls, safer CAA/VMSA handling, and better guest fault
    forwarding.
  
  - Memory and guest handling: Shared pages are made private on SharedBox drop,
    guest memory reads now require FromBytes, VMSA registration checks
    overlaps, and CAA/VMSA tracking was tightened.

  - Virtio/vTPM fixes: Virtio owning queue validation now checks tokens and
    lengths before indexing/slicing. vTPM failure mode no longer returns
    uninitialized heap bytes.

  - ACPI/fw_cfg cleanup: Removed leftover fw_cfg-based ACPI/MADT logic and
    dropped the ACPI fuzz target.

  - Common architecture code: MSR, CR0/CR4, SEV status, x86, and APIC
    definitions moved into cpuarch.

  - Scripts and CI: QEMU launch no longer invokes sudo, gained --tcg, dropped
    the QEMU >= 11 nocc object path, improved test timeout/error reporting,
    fixed workflow triggers, added Verus caching, updated dependency review to
    Node 24, and dumps host dmesg on QEMU/test failures.

  - Verification: Documentation and workflows now reflect cargo-verus usage and
    Verus installation changes.

  - Misc fixes: ELF symbol/buffer bounds fixes, IPI race fixes, CPU vendor
    display, kernel version display during guest launch, and IGVM target VTL
    selection based on firmware presence.

As usual, the full shortlog since the last release is attached.

Have fun!

Regards,

	Joerg

Carlos López (18):
      kernel: platform: add platform_method!() macro
      kernel: platform: do not take &self to query CPUID
      kernel: cpu/vc: simplify snp_cpuid()
      kernel: platform/snp: properly handle CPUID leaf 0xd
      kernel: platform: add default CPUID implementation
      kernel: always route CPUID through the platform abstraction
      kernel: cpu/features: create feature lookup table
      kernel: cet: make CET discovery into a feature
      kernel: cpu/sse: make FPU feature detection into a feature
      kernel: hyperv: make Hyper-V discovery into a feature
      kernel: sev/tlb: make INVLPGB max entry detection into a feature
      kernel: platform: make phys address sizes into a feature
      kernel: platform: make x2apic into a feature
      kernel: platform: make platform statics read-only after init
      kernel: platform: remove trivial FIXME for SvsmPlatformCell
      kernel: platform/snp: make C-bit into a feature
      kernel: platform/snp: get physical address size through CPU features
      kernel: platform: remove setup_guest_host_comm()

Joerg Roedel (23):
      kernel/guestmem: Require FromBytes for read_from_guest()
      kernel/mm: Make all pages private when SharedBox is dropped
      kernel/protocols: Forward GuestPtr faults to the guest
      kernel/protocols: Make sure memory regions are valid for attestation protocol
      kernel/greq: Round extended guest request size up to page-size
      kernel/percpu: Always page_align CAA address before mapping
      kernel/percpu: Track CAA address in PERCPU_VMSAS
      kernel/percpu: Check for region overlap in VMSA registration
      kernel/percpu: Return SvsmError from PERCPU_VMSAS.unregister()
      kernel/snp: Register initial guest VMSA
      kernel/protocols: Check for valid regions in core_pvalidate
      kernel/protocols: Update PERCPU_VMSAS in core_remap_caa()
      kernel/protocols: Do not deregister VMSA before updating RMP state
      kernel/protocols: Check for reserved bits in core_pvalidate_one()
      kernel/protocols: Check for region validity in core_pvalidate_one()
      kernel/protocols: Check whether attestation requests are valid
      kernel/protocols: Remove try_from_as_ref() from attestation structures
      kernel/protocols: Use valid_phys_region() where needed
      kernel/protocols: Make some core protocol calls mutually exclusive
      kernel/protocols: Use MemoryRegion::checked_new() in core protocol handlers
      Elf: Make sure buffer length is multiple of 16
      Elf: Do not read symbols beyond symbol table
      COCONUT-SVSM Release 2026.05-devel

Jon Lange (22):
      cpuarch: move MSR and CR0/CR4 definitions to common crate
      igvmbuilder: reshape the initial low-mem page tables
      bldr: implement a simpler boot loader
      igvmbuild: support bldr
      xbuild: support bldr
      build: consume bldr instead of stage2
      Merge pull request #1029 from msft-jlange/bldr
      svsm: prefer bldr to stage2
      xbuild: remove stage2 support
      igvmbuilder: remove support for stage2
      stage2: remove stage2
      bldr: clear temporary mapping PTEs after use
      Merge pull request #1064 from MelodyHuibo/init_vgif
      Merge pull request #1068 from 00xc/platform/fixme
      error: avoid `SvsmReqError` outside of SVSM-specific paths
      svsm: detect and display CPU vendor
      scripts: display kernel version when launching guest
      cpu/ipi: fix race conditions
      Merge pull request #1076 from msft-jlange/cpu_vendor
      Merge pull request #1077 from msft-jlange/kernel_info
      igvmbuilder: configure target VTL based on the presence of firmware
      cpuarch: move APIC constants to a common location

Jörg Rödel (24):
      Merge pull request #1048 from luigix25/fix_ci
      Merge pull request #1052 from n-ramacciotti/ci/remove_unmaintained_action
      Merge pull request #1054 from msft-jlange/bldr_ptes
      Merge pull request #1043 from mvanhorn/feat/1042-tools-check
      Merge pull request #1050 from n-ramacciotti/ci/simplify-test-in-svsm
      Merge pull request #1053 from msft-jlange/remove_stage2
      Merge pull request #1055 from n-ramacciotti/ci/update_dep_review_node_24
      Merge pull request #1030 from 00xc/platform/cpuid-v2
      Merge pull request #1059 from ziqiaozhou/fix-broken-alloc-proof
      Merge pull request #1067 from 00xc/ro-after-init
      Merge pull request #1070 from MelodyHuibo/enable_alternate_injection
      Merge pull request #1071 from 00xc/platform/remove-host-comm
      Merge pull request #1072 from stefano-garzarella/virtio-fix-owning-pop
      Merge pull request #1074 from stefano-garzarella/fix-tpm-allocation
      Merge pull request #1078 from msft-jlange/ipi_fix
      Merge pull request #1038 from ziqiaozhou/cargo-verus
      Merge pull request #1079 from stefano-garzarella/verus-cache
      Merge pull request #1080 from stefano-garzarella/ci-fix-verification-label-trigger
      Merge pull request #1082 from stefano-garzarella/ci-dmesg
      Merge pull request #1066 from luigix25/remove_sudo
      Merge pull request #1090 from luigix25/fw_cfg_cleanup
      Merge pull request #1085 from msft-jlange/igvm_vtl
      Merge pull request #1087 from msft-jlange/cpu_apic
      Merge pull request #1069 from 00xc/platform/missing-features

Luigi Leonardi (14):
      github/workflows: add apt update before apt install in publish-docs
      io: add `read_exact` and `write_all` to Read and Write trait
      aproxy: use read/write traits
      aproxy: factor out accept loop to a separate function
      aproxy: enable vsock for attestation
      kernel/attest: abstract transport implementation
      kernel/attest: switch to write_all/read_exact
      kernel/attest: add vsock transport with serial fallback
      Documentation/ATTESTATION: document vsock transport option
      scripts/launch_guest: drop nocc object for QEMU >= 11.0
      scripts/launch_guest: add --tcg option to use TCG acceleration
      github/workflows: set up /dev/kvm permissions
      scripts/launch_guest: remove sudo from QEMU invocation
      acpi: remove fw_cfg-based ACPI/MADT leftover

Matt Van Horn (1):
      testing/scripts: Check required host tools before launching guest

Melody Wang (2):
      cpu: Make sure guest's GIF is set
      boot: Allow Alternate Injection to be configured via boot params

Nicola Ramacciotti (5):
      github/workflows: Remove unmaintained action
      scripts/test-in-svsm: Print exit code when failing
      scripts/test-in-svsm: Add optional timeout handling
      github/workflows: Use the test-in-svsm script directly
      github/workflows: update dependency review to node 24

Stefano Garzarella (10):
      Merge pull request #879 from luigix25/add_attestation_to_vsock
      Merge pull request #1058 from luigix25/qemu_11_launch
      Merge pull request #1065 from msft-jlange/svsm_req_error
      virtio-drivers: queue/owning: validate the token before indexing buffer table
      virtio-drivers: queue/owning: validate len before slicing buffer
      kernel/vtpm: fix uninitialized heap bytes returned in TPM failure mode
      github/manual-verify: fix triggering on 'verification' label
      github/manual-verify: cache verus toolchain
      Merge pull request #1073 from joergroedel/fixes
      github/qemu: dump host kernel messages on QEMU or test failure

Ziqiao Zhou (5):
      mm/alloc.verus: update phys_to_virt proof after stage2 removal
      verification: Support Verus's verita test via cargo-verus.
      scripts: Update vsinstall.sh to directly install verus.
      workflow: Revert "github/manual-verify: check cargo-v output for errors"
      doc: Update verification.md to reflect the use of cargo-verus



^ permalink raw reply

* [PATCH v2 4/6] KVM/x86: Return -errno instead of "1" for VMX related MSR emulation
From: Juergen Gross @ 2026-05-28 11:36 UTC (permalink / raw)
  To: linux-kernel, x86, kvm, linux-coco
  Cc: Juergen Gross, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Kiryl Shutsemau, Rick Edgecombe
In-Reply-To: <20260528113605.267111-1-jgross@suse.com>

Instead of a literal "1" for signalling an error, use a negative errno
value in the emulation code of VMX related MSR registers.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- use -errno instead of KVM_MSR_RET_ERR
---
 arch/x86/kvm/vmx/nested.c    |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c | 16 +++---
 arch/x86/kvm/vmx/tdx.c       | 10 ++--
 arch/x86/kvm/vmx/vmx.c       | 96 ++++++++++++++++++------------------
 4 files changed, 62 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3fe88f29be7a..2236f15ffab2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1611,7 +1611,7 @@ int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata)
 		*pdata = msrs->vmfunc_controls;
 		break;
 	default:
-		return 1;
+		return -EINVAL;
 	}
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 27eb76e6b6a0..4f7e354c4b50 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -362,7 +362,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, true)) {
 			break;
 		}
-		return 1;
+		return -EINVAL;
 	}
 
 	return 0;
@@ -379,14 +379,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	switch (msr) {
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 		if (data & pmu->fixed_ctr_ctrl_rsvd)
-			return 1;
+			return -EINVAL;
 
 		if (pmu->fixed_ctr_ctrl != data)
 			reprogram_fixed_counters(pmu, data);
 		break;
 	case MSR_IA32_PEBS_ENABLE:
 		if (data & pmu->pebs_enable_rsvd)
-			return 1;
+			return -EINVAL;
 
 		if (pmu->pebs_enable != data) {
 			diff = pmu->pebs_enable ^ data;
@@ -396,13 +396,13 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_DS_AREA:
 		if (is_noncanonical_msr_address(data, vcpu))
-			return 1;
+			return -EINVAL;
 
 		pmu->ds_area = data;
 		break;
 	case MSR_PEBS_DATA_CFG:
 		if (data & pmu->pebs_data_cfg_rsvd)
-			return 1;
+			return -EINVAL;
 
 		pmu->pebs_data_cfg = data;
 		break;
@@ -411,7 +411,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
 			if ((msr & MSR_PMC_FULL_WIDTH_BIT) &&
 			    (data & ~pmu->counter_bitmask[KVM_PMC_GP]))
-				return 1;
+				return -EINVAL;
 
 			if (!msr_info->host_initiated &&
 			    !(msr & MSR_PMC_FULL_WIDTH_BIT))
@@ -427,7 +427,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			    (pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED))
 				reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
 			if (data & reserved_bits)
-				return 1;
+				return -EINVAL;
 
 			if (data != pmc->eventsel) {
 				pmc->eventsel = data;
@@ -439,7 +439,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			break;
 		}
 		/* Not a known PMU MSR. */
-		return 1;
+		return -EINVAL;
 	}
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 04ce321ebdf3..acc3242af4f4 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2158,12 +2158,12 @@ int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		return 0;
 	case MSR_IA32_MCG_EXT_CTL:
 		if (!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P))
-			return 1;
+			return -EINVAL;
 		msr->data = vcpu->arch.mcg_ext_ctl;
 		return 0;
 	default:
 		if (!tdx_has_emulated_msr(msr->index))
-			return 1;
+			return -EACCES;
 
 		return kvm_get_msr_common(vcpu, msr);
 	}
@@ -2175,15 +2175,15 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_MCG_EXT_CTL:
 		if ((!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) ||
 		    (msr->data & ~MCG_EXT_CTL_LMCE_EN))
-			return 1;
+			return -EINVAL;
 		vcpu->arch.mcg_ext_ctl = msr->data;
 		return 0;
 	default:
 		if (tdx_is_read_only_msr(msr->index))
-			return 1;
+			return -EACCES;
 
 		if (!tdx_has_emulated_msr(msr->index))
-			return 1;
+			return -EACCES;
 
 		return kvm_set_msr_common(vcpu, msr);
 	}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b9103de01428..2eee599fca30 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2076,7 +2076,7 @@ int vmx_get_feature_msr(u32 msr, u64 *data)
 	switch (msr) {
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!nested)
-			return 1;
+			return -EINVAL;
 		return vmx_get_vmx_msr(&vmcs_config.nested, msr, data);
 	default:
 		return KVM_MSR_RET_UNSUPPORTED;
@@ -2111,18 +2111,18 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSX_CTRL:
 		if (!msr_info->host_initiated &&
 		    !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
-			return 1;
+			return -EINVAL;
 		goto find_uret_msr;
 	case MSR_IA32_UMWAIT_CONTROL:
 		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
-			return 1;
+			return -EINVAL;
 
 		msr_info->data = vmx->msr_ia32_umwait_control;
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
 		    !guest_has_spec_ctrl_msr(vcpu))
-			return 1;
+			return -EINVAL;
 
 		msr_info->data = to_vmx(vcpu)->spec_ctrl;
 		break;
@@ -2139,14 +2139,14 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!kvm_mpx_supported() ||
 		    (!msr_info->host_initiated &&
 		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
 		break;
 	case MSR_IA32_MCG_EXT_CTL:
 		if (!msr_info->host_initiated &&
 		    !(vmx->msr_ia32_feature_control &
 		      FEAT_CTL_LMCE_ENABLED))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vcpu->arch.mcg_ext_ctl;
 		break;
 	case MSR_IA32_FEAT_CTL:
@@ -2155,16 +2155,16 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
 		if (!msr_info->host_initiated &&
 		    !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
-			return 1;
+			return -EINVAL;
 		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
 			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
 		break;
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
-			return 1;
+			return -EINVAL;
 		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
 				    &msr_info->data))
-			return 1;
+			return -EINVAL;
 #ifdef CONFIG_KVM_HYPERV
 		/*
 		 * Enlightened VMCS v1 doesn't have certain VMCS fields but
@@ -2180,19 +2180,19 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_RTIT_CTL:
 		if (!vmx_pt_mode_is_host_guest())
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.ctl;
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if (!vmx_pt_mode_is_host_guest())
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.status;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if (!vmx_pt_mode_is_host_guest() ||
 			!intel_pt_validate_cap(vmx->pt_desc.caps,
 						PT_CAP_cr3_filtering))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.cr3_match;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
@@ -2201,7 +2201,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.output_base;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
@@ -2210,14 +2210,14 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.output_mask;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if (!vmx_pt_mode_is_host_guest() ||
 		    (index >= 2 * vmx->pt_desc.num_address_ranges))
-			return 1;
+			return -EINVAL;
 		if (index % 2)
 			msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
 		else
@@ -2359,7 +2359,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		if (!vmx_is_valid_debugctl(vcpu, data, msr_info->host_initiated))
-			return 1;
+			return -EINVAL;
 
 		data &= vmx_get_supported_debugctl(vcpu, msr_info->host_initiated);
 
@@ -2377,10 +2377,10 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!kvm_mpx_supported() ||
 		    (!msr_info->host_initiated &&
 		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
-			return 1;
+			return -EINVAL;
 		if (is_noncanonical_msr_address(data & PAGE_MASK, vcpu) ||
 		    (data & MSR_IA32_BNDCFGS_RSVD))
-			return 1;
+			return -EINVAL;
 
 		if (is_guest_mode(vcpu) &&
 		    ((vmx->nested.msrs.entry_ctls_high & VM_ENTRY_LOAD_BNDCFGS) ||
@@ -2391,21 +2391,21 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_UMWAIT_CONTROL:
 		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
-			return 1;
+			return -EINVAL;
 
 		/* The reserved bit 1 and non-32 bit [63:32] should be zero */
 		if (data & (BIT_ULL(1) | GENMASK_ULL(63, 32)))
-			return 1;
+			return -EINVAL;
 
 		vmx->msr_ia32_umwait_control = data;
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
 		    !guest_has_spec_ctrl_msr(vcpu))
-			return 1;
+			return -EINVAL;
 
 		if (kvm_spec_ctrl_test_value(data))
-			return 1;
+			return -EINVAL;
 
 		vmx->spec_ctrl = data;
 		if (!data)
@@ -2430,9 +2430,9 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSX_CTRL:
 		if (!msr_info->host_initiated &&
 		    !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
-			return 1;
+			return -EINVAL;
 		if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
-			return 1;
+			return -EINVAL;
 		goto find_uret_msr;
 	case MSR_IA32_CR_PAT:
 		ret = kvm_set_msr_common(vcpu, msr_info);
@@ -2451,12 +2451,12 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		     !(to_vmx(vcpu)->msr_ia32_feature_control &
 		       FEAT_CTL_LMCE_ENABLED)) ||
 		    (data & ~MCG_EXT_CTL_LMCE_EN))
-			return 1;
+			return -EINVAL;
 		vcpu->arch.mcg_ext_ctl = data;
 		break;
 	case MSR_IA32_FEAT_CTL:
 		if (!is_vmx_feature_control_msr_valid(vmx, msr_info))
-			return 1;
+			return -EINVAL;
 
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
@@ -2481,70 +2481,70 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC) ||
 		    ((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) &&
 		    !(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED))))
-			return 1;
+			return -EINVAL;
 		vmx->msr_ia32_sgxlepubkeyhash
 			[msr_index - MSR_IA32_SGXLEPUBKEYHASH0] = data;
 		break;
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!msr_info->host_initiated)
-			return 1; /* they are read-only */
+			return -EINVAL; /* they are read-only */
 		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
-			return 1;
+			return -EINVAL;
 		return vmx_set_vmx_msr(vcpu, msr_index, data);
 	case MSR_IA32_RTIT_CTL:
 		if (!vmx_pt_mode_is_host_guest() ||
 			vmx_rtit_ctl_check(vcpu, data) ||
 			vmx->nested.vmxon)
-			return 1;
+			return -EINVAL;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, data);
 		vmx->pt_desc.guest.ctl = data;
 		pt_update_intercept_for_msr(vcpu);
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (data & MSR_IA32_RTIT_STATUS_MASK)
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.status = data;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_cr3_filtering))
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.cr3_match = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_topa_output) &&
 		    !intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_single_range_output))
-			return 1;
+			return -EINVAL;
 		if (!pt_output_base_valid(vcpu, data))
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.output_base = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_topa_output) &&
 		    !intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_single_range_output))
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.output_mask = data;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if (index >= 2 * vmx->pt_desc.num_address_ranges)
-			return 1;
+			return -EINVAL;
 		if (is_noncanonical_msr_address(data, vcpu))
-			return 1;
+			return -EINVAL;
 		if (index % 2)
 			vmx->pt_desc.guest.addr_b[index / 2] = data;
 		else
@@ -2563,20 +2563,20 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (data & PERF_CAP_LBR_FMT) {
 			if ((data & PERF_CAP_LBR_FMT) !=
 			    (kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT))
-				return 1;
+				return -EINVAL;
 			if (!cpuid_model_is_consistent(vcpu))
-				return 1;
+				return -EINVAL;
 		}
 		if (data & PERF_CAP_PEBS_FORMAT) {
 			if ((data & PERF_CAP_PEBS_MASK) !=
 			    (kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK))
-				return 1;
+				return -EINVAL;
 			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DS))
-				return 1;
+				return -EINVAL;
 			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DTES64))
-				return 1;
+				return -EINVAL;
 			if (!cpuid_model_is_consistent(vcpu))
-				return 1;
+				return -EINVAL;
 		}
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
-- 
2.54.0


^ permalink raw reply related

* [PATCH v2 0/6] KVM/x86: Drop "1" as MSR emulation return value
From: Juergen Gross @ 2026-05-28 11:35 UTC (permalink / raw)
  To: linux-kernel, x86, kvm, linux-coco
  Cc: Juergen Gross, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe,
	David Woodhouse, Paul Durrant

Get rid of the literal "1" used as general error return value in KVM
MSR emulation. It can easily be replaced by negative errno values
instead.

This is meant to avoid confusion with the literal "1" used as return
value for "return to guest".

Changes in V2:
- series carved out from initial "KVM: Avoid literal numbers as return
  values" series
- don't use new KVM_MSR_RET_* defines, but 0 and -errno

Juergen Gross (6):
  KVM/x86: Change comment before KVM_MSR_RET_* defines
  KVM/x86: Return -errno instead of "1" for APIC related MSR emulation
  KVM/x86: Return -errno instead of "1" for Hyper-V related MSR
    emulation
  KVM/x86: Return -errno instead of "1" for VMX related MSR emulation
  KVM/x86: Return -errno instead of "1" for SVM related MSR emulation
  KVM/x86: Return -errno instead of "1" for common MSR emulation

 arch/x86/kvm/hyperv.c        |  72 ++++++++++++-------------
 arch/x86/kvm/lapic.c         |  39 +++++++-------
 arch/x86/kvm/mtrr.c          |   6 +--
 arch/x86/kvm/pmu.c           |   8 +--
 arch/x86/kvm/svm/pmu.c       |   4 +-
 arch/x86/kvm/svm/svm.c       |  36 ++++++-------
 arch/x86/kvm/vmx/nested.c    |   2 +-
 arch/x86/kvm/vmx/pmu_intel.c |  16 +++---
 arch/x86/kvm/vmx/tdx.c       |  10 ++--
 arch/x86/kvm/vmx/vmx.c       |  96 ++++++++++++++++-----------------
 arch/x86/kvm/x86.c           | 102 +++++++++++++++++------------------
 arch/x86/kvm/x86.h           |   4 +-
 arch/x86/kvm/xen.c           |  10 ++--
 13 files changed, 202 insertions(+), 203 deletions(-)

-- 
2.54.0


^ permalink raw reply

* Re: [PATCH v2 0/5] KVM/x86: Drop "1" as MSR emulation return value
From: Juergen Gross @ 2026-05-28 11:35 UTC (permalink / raw)
  To: linux-kernel, x86, kvm, linux-coco
  Cc: Sean Christopherson, Paolo Bonzini, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, H. Peter Anvin, Vitaly Kuznetsov,
	Kiryl Shutsemau, Rick Edgecombe
In-Reply-To: <20260528111357.264809-1-jgross@suse.com>


[-- Attachment #1.1.1: Type: text/plain, Size: 1468 bytes --]

On 28.05.26 13:13, Juergen Gross wrote:
> Get rid of the literal "1" used as general error return value in KVM
> MSR emulation. It can easily be replaced by negative errno values
> instead.
> 
> This is meant to avoid confusion with the literal "1" used as return
> value for "return to guest".
> 
> Changes in V2:
> - series carved out from initial "KVM: Avoid literal numbers as return
>    values" series
> - don't use new KVM_MSR_RET_* defines, but 0 and -errno
> 
> Juergen Gross (5):
>    KVM/x86: Change comment before KVM_MSR_RET_* defines
>    KVM/x86: Return -errno instead of "1" for APIC related MSR emulation
>    KVM/x86: Return -errno instead of "1" for Hyper-V related MSR
>      emulation
>    KVM/x86: Return -errno instead of "1" for VMX related MSR emulation
>    KVM/x86: Return -errno instead of "1" for SVM related MSR emulation
> 
>   arch/x86/kvm/hyperv.c        | 72 +++++++++++++--------------
>   arch/x86/kvm/lapic.c         | 39 +++++++--------
>   arch/x86/kvm/svm/pmu.c       |  4 +-
>   arch/x86/kvm/svm/svm.c       | 36 +++++++-------
>   arch/x86/kvm/vmx/nested.c    |  2 +-
>   arch/x86/kvm/vmx/pmu_intel.c | 16 +++---
>   arch/x86/kvm/vmx/tdx.c       | 10 ++--
>   arch/x86/kvm/vmx/vmx.c       | 96 ++++++++++++++++++------------------
>   arch/x86/kvm/x86.h           |  4 +-
>   9 files changed, 139 insertions(+), 140 deletions(-)
> 

Oh, sorry, there is another patch. Resending.


Juergen

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 3743 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

^ permalink raw reply

* [PATCH v2 4/5] KVM/x86: Return -errno instead of "1" for VMX related MSR emulation
From: Juergen Gross @ 2026-05-28 11:13 UTC (permalink / raw)
  To: linux-kernel, x86, kvm, linux-coco
  Cc: Juergen Gross, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Kiryl Shutsemau, Rick Edgecombe
In-Reply-To: <20260528111357.264809-1-jgross@suse.com>

Instead of a literal "1" for signalling an error, use a negative errno
value in the emulation code of VMX related MSR registers.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
V2:
- use -errno instead of KVM_MSR_RET_ERR
---
 arch/x86/kvm/vmx/nested.c    |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c | 16 +++---
 arch/x86/kvm/vmx/tdx.c       | 10 ++--
 arch/x86/kvm/vmx/vmx.c       | 96 ++++++++++++++++++------------------
 4 files changed, 62 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 3fe88f29be7a..2236f15ffab2 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1611,7 +1611,7 @@ int vmx_get_vmx_msr(struct nested_vmx_msrs *msrs, u32 msr_index, u64 *pdata)
 		*pdata = msrs->vmfunc_controls;
 		break;
 	default:
-		return 1;
+		return -EINVAL;
 	}
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/pmu_intel.c b/arch/x86/kvm/vmx/pmu_intel.c
index 27eb76e6b6a0..4f7e354c4b50 100644
--- a/arch/x86/kvm/vmx/pmu_intel.c
+++ b/arch/x86/kvm/vmx/pmu_intel.c
@@ -362,7 +362,7 @@ static int intel_pmu_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		} else if (intel_pmu_handle_lbr_msrs_access(vcpu, msr_info, true)) {
 			break;
 		}
-		return 1;
+		return -EINVAL;
 	}
 
 	return 0;
@@ -379,14 +379,14 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	switch (msr) {
 	case MSR_CORE_PERF_FIXED_CTR_CTRL:
 		if (data & pmu->fixed_ctr_ctrl_rsvd)
-			return 1;
+			return -EINVAL;
 
 		if (pmu->fixed_ctr_ctrl != data)
 			reprogram_fixed_counters(pmu, data);
 		break;
 	case MSR_IA32_PEBS_ENABLE:
 		if (data & pmu->pebs_enable_rsvd)
-			return 1;
+			return -EINVAL;
 
 		if (pmu->pebs_enable != data) {
 			diff = pmu->pebs_enable ^ data;
@@ -396,13 +396,13 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_DS_AREA:
 		if (is_noncanonical_msr_address(data, vcpu))
-			return 1;
+			return -EINVAL;
 
 		pmu->ds_area = data;
 		break;
 	case MSR_PEBS_DATA_CFG:
 		if (data & pmu->pebs_data_cfg_rsvd)
-			return 1;
+			return -EINVAL;
 
 		pmu->pebs_data_cfg = data;
 		break;
@@ -411,7 +411,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    (pmc = get_gp_pmc(pmu, msr, MSR_IA32_PMC0))) {
 			if ((msr & MSR_PMC_FULL_WIDTH_BIT) &&
 			    (data & ~pmu->counter_bitmask[KVM_PMC_GP]))
-				return 1;
+				return -EINVAL;
 
 			if (!msr_info->host_initiated &&
 			    !(msr & MSR_PMC_FULL_WIDTH_BIT))
@@ -427,7 +427,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			    (pmu->raw_event_mask & HSW_IN_TX_CHECKPOINTED))
 				reserved_bits ^= HSW_IN_TX_CHECKPOINTED;
 			if (data & reserved_bits)
-				return 1;
+				return -EINVAL;
 
 			if (data != pmc->eventsel) {
 				pmc->eventsel = data;
@@ -439,7 +439,7 @@ static int intel_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			break;
 		}
 		/* Not a known PMU MSR. */
-		return 1;
+		return -EINVAL;
 	}
 
 	return 0;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 04ce321ebdf3..acc3242af4f4 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2158,12 +2158,12 @@ int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 		return 0;
 	case MSR_IA32_MCG_EXT_CTL:
 		if (!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P))
-			return 1;
+			return -EINVAL;
 		msr->data = vcpu->arch.mcg_ext_ctl;
 		return 0;
 	default:
 		if (!tdx_has_emulated_msr(msr->index))
-			return 1;
+			return -EACCES;
 
 		return kvm_get_msr_common(vcpu, msr);
 	}
@@ -2175,15 +2175,15 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	case MSR_IA32_MCG_EXT_CTL:
 		if ((!msr->host_initiated && !(vcpu->arch.mcg_cap & MCG_LMCE_P)) ||
 		    (msr->data & ~MCG_EXT_CTL_LMCE_EN))
-			return 1;
+			return -EINVAL;
 		vcpu->arch.mcg_ext_ctl = msr->data;
 		return 0;
 	default:
 		if (tdx_is_read_only_msr(msr->index))
-			return 1;
+			return -EACCES;
 
 		if (!tdx_has_emulated_msr(msr->index))
-			return 1;
+			return -EACCES;
 
 		return kvm_set_msr_common(vcpu, msr);
 	}
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b9103de01428..2eee599fca30 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2076,7 +2076,7 @@ int vmx_get_feature_msr(u32 msr, u64 *data)
 	switch (msr) {
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!nested)
-			return 1;
+			return -EINVAL;
 		return vmx_get_vmx_msr(&vmcs_config.nested, msr, data);
 	default:
 		return KVM_MSR_RET_UNSUPPORTED;
@@ -2111,18 +2111,18 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSX_CTRL:
 		if (!msr_info->host_initiated &&
 		    !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
-			return 1;
+			return -EINVAL;
 		goto find_uret_msr;
 	case MSR_IA32_UMWAIT_CONTROL:
 		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
-			return 1;
+			return -EINVAL;
 
 		msr_info->data = vmx->msr_ia32_umwait_control;
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
 		    !guest_has_spec_ctrl_msr(vcpu))
-			return 1;
+			return -EINVAL;
 
 		msr_info->data = to_vmx(vcpu)->spec_ctrl;
 		break;
@@ -2139,14 +2139,14 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!kvm_mpx_supported() ||
 		    (!msr_info->host_initiated &&
 		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmcs_read64(GUEST_BNDCFGS);
 		break;
 	case MSR_IA32_MCG_EXT_CTL:
 		if (!msr_info->host_initiated &&
 		    !(vmx->msr_ia32_feature_control &
 		      FEAT_CTL_LMCE_ENABLED))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vcpu->arch.mcg_ext_ctl;
 		break;
 	case MSR_IA32_FEAT_CTL:
@@ -2155,16 +2155,16 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_SGXLEPUBKEYHASH0 ... MSR_IA32_SGXLEPUBKEYHASH3:
 		if (!msr_info->host_initiated &&
 		    !guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC))
-			return 1;
+			return -EINVAL;
 		msr_info->data = to_vmx(vcpu)->msr_ia32_sgxlepubkeyhash
 			[msr_info->index - MSR_IA32_SGXLEPUBKEYHASH0];
 		break;
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
-			return 1;
+			return -EINVAL;
 		if (vmx_get_vmx_msr(&vmx->nested.msrs, msr_info->index,
 				    &msr_info->data))
-			return 1;
+			return -EINVAL;
 #ifdef CONFIG_KVM_HYPERV
 		/*
 		 * Enlightened VMCS v1 doesn't have certain VMCS fields but
@@ -2180,19 +2180,19 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_RTIT_CTL:
 		if (!vmx_pt_mode_is_host_guest())
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.ctl;
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if (!vmx_pt_mode_is_host_guest())
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.status;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if (!vmx_pt_mode_is_host_guest() ||
 			!intel_pt_validate_cap(vmx->pt_desc.caps,
 						PT_CAP_cr3_filtering))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.cr3_match;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
@@ -2201,7 +2201,7 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.output_base;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
@@ -2210,14 +2210,14 @@ int vmx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 					PT_CAP_topa_output) &&
 			 !intel_pt_validate_cap(vmx->pt_desc.caps,
 					PT_CAP_single_range_output)))
-			return 1;
+			return -EINVAL;
 		msr_info->data = vmx->pt_desc.guest.output_mask;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if (!vmx_pt_mode_is_host_guest() ||
 		    (index >= 2 * vmx->pt_desc.num_address_ranges))
-			return 1;
+			return -EINVAL;
 		if (index % 2)
 			msr_info->data = vmx->pt_desc.guest.addr_b[index / 2];
 		else
@@ -2359,7 +2359,7 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_DEBUGCTLMSR:
 		if (!vmx_is_valid_debugctl(vcpu, data, msr_info->host_initiated))
-			return 1;
+			return -EINVAL;
 
 		data &= vmx_get_supported_debugctl(vcpu, msr_info->host_initiated);
 
@@ -2377,10 +2377,10 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!kvm_mpx_supported() ||
 		    (!msr_info->host_initiated &&
 		     !guest_cpu_cap_has(vcpu, X86_FEATURE_MPX)))
-			return 1;
+			return -EINVAL;
 		if (is_noncanonical_msr_address(data & PAGE_MASK, vcpu) ||
 		    (data & MSR_IA32_BNDCFGS_RSVD))
-			return 1;
+			return -EINVAL;
 
 		if (is_guest_mode(vcpu) &&
 		    ((vmx->nested.msrs.entry_ctls_high & VM_ENTRY_LOAD_BNDCFGS) ||
@@ -2391,21 +2391,21 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		break;
 	case MSR_IA32_UMWAIT_CONTROL:
 		if (!msr_info->host_initiated && !vmx_has_waitpkg(vmx))
-			return 1;
+			return -EINVAL;
 
 		/* The reserved bit 1 and non-32 bit [63:32] should be zero */
 		if (data & (BIT_ULL(1) | GENMASK_ULL(63, 32)))
-			return 1;
+			return -EINVAL;
 
 		vmx->msr_ia32_umwait_control = data;
 		break;
 	case MSR_IA32_SPEC_CTRL:
 		if (!msr_info->host_initiated &&
 		    !guest_has_spec_ctrl_msr(vcpu))
-			return 1;
+			return -EINVAL;
 
 		if (kvm_spec_ctrl_test_value(data))
-			return 1;
+			return -EINVAL;
 
 		vmx->spec_ctrl = data;
 		if (!data)
@@ -2430,9 +2430,9 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	case MSR_IA32_TSX_CTRL:
 		if (!msr_info->host_initiated &&
 		    !(vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR))
-			return 1;
+			return -EINVAL;
 		if (data & ~(TSX_CTRL_RTM_DISABLE | TSX_CTRL_CPUID_CLEAR))
-			return 1;
+			return -EINVAL;
 		goto find_uret_msr;
 	case MSR_IA32_CR_PAT:
 		ret = kvm_set_msr_common(vcpu, msr_info);
@@ -2451,12 +2451,12 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		     !(to_vmx(vcpu)->msr_ia32_feature_control &
 		       FEAT_CTL_LMCE_ENABLED)) ||
 		    (data & ~MCG_EXT_CTL_LMCE_EN))
-			return 1;
+			return -EINVAL;
 		vcpu->arch.mcg_ext_ctl = data;
 		break;
 	case MSR_IA32_FEAT_CTL:
 		if (!is_vmx_feature_control_msr_valid(vmx, msr_info))
-			return 1;
+			return -EINVAL;
 
 		vmx->msr_ia32_feature_control = data;
 		if (msr_info->host_initiated && data == 0)
@@ -2481,70 +2481,70 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		    (!guest_cpu_cap_has(vcpu, X86_FEATURE_SGX_LC) ||
 		    ((vmx->msr_ia32_feature_control & FEAT_CTL_LOCKED) &&
 		    !(vmx->msr_ia32_feature_control & FEAT_CTL_SGX_LC_ENABLED))))
-			return 1;
+			return -EINVAL;
 		vmx->msr_ia32_sgxlepubkeyhash
 			[msr_index - MSR_IA32_SGXLEPUBKEYHASH0] = data;
 		break;
 	case KVM_FIRST_EMULATED_VMX_MSR ... KVM_LAST_EMULATED_VMX_MSR:
 		if (!msr_info->host_initiated)
-			return 1; /* they are read-only */
+			return -EINVAL; /* they are read-only */
 		if (!guest_cpu_cap_has(vcpu, X86_FEATURE_VMX))
-			return 1;
+			return -EINVAL;
 		return vmx_set_vmx_msr(vcpu, msr_index, data);
 	case MSR_IA32_RTIT_CTL:
 		if (!vmx_pt_mode_is_host_guest() ||
 			vmx_rtit_ctl_check(vcpu, data) ||
 			vmx->nested.vmxon)
-			return 1;
+			return -EINVAL;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, data);
 		vmx->pt_desc.guest.ctl = data;
 		pt_update_intercept_for_msr(vcpu);
 		break;
 	case MSR_IA32_RTIT_STATUS:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (data & MSR_IA32_RTIT_STATUS_MASK)
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.status = data;
 		break;
 	case MSR_IA32_RTIT_CR3_MATCH:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_cr3_filtering))
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.cr3_match = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_BASE:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_topa_output) &&
 		    !intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_single_range_output))
-			return 1;
+			return -EINVAL;
 		if (!pt_output_base_valid(vcpu, data))
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.output_base = data;
 		break;
 	case MSR_IA32_RTIT_OUTPUT_MASK:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		if (!intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_topa_output) &&
 		    !intel_pt_validate_cap(vmx->pt_desc.caps,
 					   PT_CAP_single_range_output))
-			return 1;
+			return -EINVAL;
 		vmx->pt_desc.guest.output_mask = data;
 		break;
 	case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B:
 		if (!pt_can_write_msr(vmx))
-			return 1;
+			return -EINVAL;
 		index = msr_info->index - MSR_IA32_RTIT_ADDR0_A;
 		if (index >= 2 * vmx->pt_desc.num_address_ranges)
-			return 1;
+			return -EINVAL;
 		if (is_noncanonical_msr_address(data, vcpu))
-			return 1;
+			return -EINVAL;
 		if (index % 2)
 			vmx->pt_desc.guest.addr_b[index / 2] = data;
 		else
@@ -2563,20 +2563,20 @@ int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (data & PERF_CAP_LBR_FMT) {
 			if ((data & PERF_CAP_LBR_FMT) !=
 			    (kvm_caps.supported_perf_cap & PERF_CAP_LBR_FMT))
-				return 1;
+				return -EINVAL;
 			if (!cpuid_model_is_consistent(vcpu))
-				return 1;
+				return -EINVAL;
 		}
 		if (data & PERF_CAP_PEBS_FORMAT) {
 			if ((data & PERF_CAP_PEBS_MASK) !=
 			    (kvm_caps.supported_perf_cap & PERF_CAP_PEBS_MASK))
-				return 1;
+				return -EINVAL;
 			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DS))
-				return 1;
+				return -EINVAL;
 			if (!guest_cpu_cap_has(vcpu, X86_FEATURE_DTES64))
-				return 1;
+				return -EINVAL;
 			if (!cpuid_model_is_consistent(vcpu))
-				return 1;
+				return -EINVAL;
 		}
 		ret = kvm_set_msr_common(vcpu, msr_info);
 		break;
-- 
2.54.0


^ permalink raw reply related

* [PATCH v2 0/5] KVM/x86: Drop "1" as MSR emulation return value
From: Juergen Gross @ 2026-05-28 11:13 UTC (permalink / raw)
  To: linux-kernel, x86, kvm, linux-coco
  Cc: Juergen Gross, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	H. Peter Anvin, Vitaly Kuznetsov, Kiryl Shutsemau, Rick Edgecombe

Get rid of the literal "1" used as general error return value in KVM
MSR emulation. It can easily be replaced by negative errno values
instead.

This is meant to avoid confusion with the literal "1" used as return
value for "return to guest".

Changes in V2:
- series carved out from initial "KVM: Avoid literal numbers as return
  values" series
- don't use new KVM_MSR_RET_* defines, but 0 and -errno

Juergen Gross (5):
  KVM/x86: Change comment before KVM_MSR_RET_* defines
  KVM/x86: Return -errno instead of "1" for APIC related MSR emulation
  KVM/x86: Return -errno instead of "1" for Hyper-V related MSR
    emulation
  KVM/x86: Return -errno instead of "1" for VMX related MSR emulation
  KVM/x86: Return -errno instead of "1" for SVM related MSR emulation

 arch/x86/kvm/hyperv.c        | 72 +++++++++++++--------------
 arch/x86/kvm/lapic.c         | 39 +++++++--------
 arch/x86/kvm/svm/pmu.c       |  4 +-
 arch/x86/kvm/svm/svm.c       | 36 +++++++-------
 arch/x86/kvm/vmx/nested.c    |  2 +-
 arch/x86/kvm/vmx/pmu_intel.c | 16 +++---
 arch/x86/kvm/vmx/tdx.c       | 10 ++--
 arch/x86/kvm/vmx/vmx.c       | 96 ++++++++++++++++++------------------
 arch/x86/kvm/x86.h           |  4 +-
 9 files changed, 139 insertions(+), 140 deletions(-)

-- 
2.54.0


^ permalink raw reply

* Re: [PATCH v3 2/2] x86/tdx: Fix zero-extension for 32-bit port I/O
From: Kiryl Shutsemau @ 2026-05-28 10:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	H . Peter Anvin, Rick Edgecombe, Kuppuswamy Sathyanarayanan,
	Kai Huang, Sean Christopherson, Borys Tsyrulnikov, linux-kernel,
	linux-coco, kvm, stable
In-Reply-To: <5ed6121c-314e-4cf0-9a11-b0661c87c694@intel.com>

On Wed, May 27, 2026 at 10:45:28AM -0700, Dave Hansen wrote:
> On 5/27/26 05:05, Kiryl Shutsemau (Meta) wrote:
> ...
> > -	/* Update part of the register affected by the emulated instruction */
> > -	regs->ax &= ~mask;
> > +	/*
> > +	 * IN writes the result into a sub-register of RAX. Only the
> > +	 * 32-bit form zero-extends; the smaller forms leave the upper
> > +	 * bits untouched:
> > +	 *
> > +	 *   insn  dest  size  bits written     bits preserved
> > +	 *   inb   AL    1     RAX[ 7: 0]       RAX[63: 8]
> > +	 *   inw   AX    2     RAX[15: 0]       RAX[63:16]
> > +	 *   inl   EAX   4     RAX[63: 0]       (none, zero-extended)
> > +	 *
> > +	 * 'mask' only covers the low 'size' bytes, which is exactly the
> > +	 * range affected for size 1 and 2. For size 4 the write also
> > +	 * clears RAX[63:32], so widen the clear-mask.
> > +	 */
> > +	if (size == 4)
> > +		regs->ax = 0;
> > +	else
> > +		regs->ax &= ~mask;
> > +
> 
> Is there any way we could do this with fewer comments and more code?
> 
> I mean, there's only three cases. Why have;
> 
> 	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
> 
> When there are only 3 possible cases:
> 
> 	1 => 0xf
> 	2 => 0xff
> 	4 => 0xffff
> 
> and one of those cases needs a special case on top of it.
> 
> Maybe something like this?
> 
> 	/* Clear out part of RAX so part of args.r11 can be OR'd in: */
> 	switch (size) {
> 	case 1:
> 		/* inb consumes lower 8 bits of r11: */
> 		regs->ax &= ~GENMASK_ULL(7, 0);
> 		args.r11 &=  GENMASK_ULL(7, 0);
> 		break;
> 	case 2:
> 		/* inw consumes lower 16 bits of r11: */
> 		regs->ax &= ~GENMASK_ULL(15, 0);
> 		args.r11 &=  GENMASK_ULL(15, 0);
> 		break;
> 	case 4:
> 		/* inl is weird and zeros the whole register: */
> 		regs->ax &= ~GENMASK_ULL(63, 0);
> 		/* But only consumes 32-bits from r11: */
> 		args.r11 &=  GENMASK_ULL(31, 0);
> 		break;
> 	default:
> 		/* Probable TDX module bug. Illegal in[bwl] size: */
> 		WARN_ON_ONCE(1);
> 		success = 0;
> 	}
> 
> 	if (success)
> 		regs->ax |= args.r11;
> 
> It might need a temporary variable for args.r11, but you get the point.
> That's basically the data from the comment but written as code.

I hate how verbose it is. All these GENMASK_ULL() make it hard to
follow.

What about the patch below. Inspired by kvm's assign_register().

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 65119362f9a2..460b9fbabf14 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -693,8 +693,8 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
 		.r13 = PORT_READ,
 		.r14 = port,
 	};
-	u64 mask = GENMASK(BITS_PER_BYTE * size - 1, 0);
 	bool success;
+	u32 val;
 
 	/*
 	 * Emulate the I/O read via hypercall. More info about ABI can be found
@@ -703,10 +703,33 @@ static bool handle_in(struct pt_regs *regs, int size, int port)
 	 */
 	success = !__tdx_hypercall(&args);
 
-	/* Update part of the register affected by the emulated instruction */
-	regs->ax &= ~mask;
 	if (success)
-		regs->ax |= args.r11 & mask;
+		val = args.r11;
+	else
+		val = 0;
+
+	/*
+	 * IN writes the result into a sub-register of RAX.
+	 *
+	 * Only the 32-bit form zero-extends; the smaller forms leave
+	 * the upper bits untouched.
+	 */
+	switch (size) {
+	case 1:
+		*(u8 *)&regs->ax = (u8)val;
+		break;
+	case 2:
+		*(u16 *)&regs->ax = (u16)val;
+		break;
+	case 4:
+		/* zero-extended */
+		regs->ax = val;
+		break;
+	default:
+		/* Probable TDX module bug. Illegal in[bwl] size. */
+		WARN_ON_ONCE(1);
+		break;
+	}
 
 	return success;
 }
-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply related

* Re: [PATCH] MAINTAINERS: Move Rick Edgecombe to TDX maintainer
From: Kiryl Shutsemau @ 2026-05-28  9:45 UTC (permalink / raw)
  To: Rick Edgecombe; +Cc: dave.hansen, x86, linux-coco, kvm, pbonzini, seanjc
In-Reply-To: <20260527221342.415814-1-rick.p.edgecombe@intel.com>

On Wed, May 27, 2026 at 03:13:42PM -0700, Rick Edgecombe wrote:
> Per some offline discussion with Kiryl, he could use some help on the TDX
> host side. I have worked on the TDX host side for the past few years
> including wrangling the initial KVM support, and can help with this.
> 
> I am already listed as TDX reviewer. Move it to maintainer.
> 
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> Cc: Kiryl Shutsemau <kas@kernel.org>
> Cc: Dave Hansen <dave.hansen@linux.intel.com>

Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>

-- 
  Kiryl Shutsemau / Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH v14 14/44] arm64: RMI: Basic infrastructure for creating a realm.
From: Marc Zyngier @ 2026-05-28  7:10 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, kvmarm, Catalin Marinas, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Gavin Shan,
	Shanker Donthineni, Alper Gun, Aneesh Kumar K . V, Emi Kisanuki,
	Vishal Annapurve, WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-15-steven.price@arm.com>

On Wed, 13 May 2026 14:17:22 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> Introduce the skeleton functions for creating and destroying a realm.
> The IPA size requested is checked against what the RMM supports.
> 
> The actual work of constructing the realm will be added in future
> patches.

Again, $SUBJECT doesn't reflect that this is purely a KVM patch.

> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>  * Rebased and updated to RMM-v2.0-bet1.
>  * Auxiliary granules have been removed in RMM-v2.0-bet1
> Changes since v12:
>  * Drop the RMM_PAGE_{SHIFT,SIZE} defines - the RMM is now configured to
>    be the same as the host's page size.
>  * Rework delegate/undelegate functions to use the new RMI range based
>    operations.
> Changes since v11:
>  * Major rework to drop the realm configuration and make the
>    construction of realms implicit rather than driven by the VMM
>    directly.
>  * The code to create RDs, handle VMIDs etc is moved to later patches.
> Changes since v10:
>  * Rename from RME to RMI.
>  * Move the stage2 cleanup to a later patch.
> Changes since v9:
>  * Avoid walking the stage 2 page tables when destroying the realm -
>    the real ones are not accessible to the non-secure world, and the RMM
>    may leave junk in the physical pages when returning them.
>  * Fix an error path in realm_create_rd() to actually return an error value.
> Changes since v8:
>  * Fix free_delegated_granule() to not call kvm_account_pgtable_pages();
>    a separate wrapper will be introduced in a later patch to deal with
>    RTTs.
>  * Minor code cleanups following review.
> Changes since v7:
>  * Minor code cleanup following Gavin's review.
> Changes since v6:
>  * Separate RMM RTT calculations from host PAGE_SIZE. This allows the
>    host page size to be larger than 4k while still communicating with an
>    RMM which uses 4k granules.
> Changes since v5:
>  * Introduce free_delegated_granule() to replace many
>    undelegate/free_page() instances and centralise the comment on
>    leaking when the undelegate fails.
>  * Several other minor improvements suggested by reviews - thanks for
>    the feedback!
> Changes since v2:
>  * Improved commit description.
>  * Improved return failures for rmi_check_version().
>  * Clear contents of PGD after it has been undelegated in case the RMM
>    left stale data.
>  * Minor changes to reflect changes in previous patches.
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 29 ++++++++++++++
>  arch/arm64/include/asm/kvm_rmi.h     | 51 +++++++++++++++++++++++++
>  arch/arm64/kvm/arm.c                 | 12 ++++++
>  arch/arm64/kvm/mmu.c                 | 12 +++++-
>  arch/arm64/kvm/rmi.c                 | 57 ++++++++++++++++++++++++++++
>  5 files changed, 159 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index 5bf3d7e1d92c..82fd777bd9bb 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -688,4 +688,33 @@ static inline void vcpu_set_hcrx(struct kvm_vcpu *vcpu)
>  			vcpu->arch.hcrx_el2 |= HCRX_EL2_EnASR;
>  	}
>  }
> +
> +static inline bool kvm_is_realm(struct kvm *kvm)
> +{
> +	if (static_branch_unlikely(&kvm_rmi_is_available))
> +		return kvm->arch.is_realm;
> +	return false;
> +}
> +
> +static inline enum realm_state kvm_realm_state(struct kvm *kvm)
> +{
> +	return READ_ONCE(kvm->arch.realm.state);
> +}
> +
> +static inline void kvm_set_realm_state(struct kvm *kvm,
> +				       enum realm_state new_state)
> +{
> +	WRITE_ONCE(kvm->arch.realm.state, new_state);
> +}
> +
> +static inline bool kvm_realm_is_created(struct kvm *kvm)
> +{
> +	return kvm_is_realm(kvm) && kvm_realm_state(kvm) != REALM_STATE_NONE;
> +}
> +
> +static inline bool vcpu_is_rec(const struct kvm_vcpu *vcpu)
> +{
> +	return false;
> +}
> +
>  #endif /* __ARM64_KVM_EMULATE_H__ */
> diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
> index 4936007947fd..9de34983ee52 100644
> --- a/arch/arm64/include/asm/kvm_rmi.h
> +++ b/arch/arm64/include/asm/kvm_rmi.h
> @@ -6,12 +6,63 @@
>  #ifndef __ASM_KVM_RMI_H
>  #define __ASM_KVM_RMI_H
>  
> +#include <asm/rmi_smc.h>
> +
> +/**
> + * enum realm_state - State of a Realm
> + */
> +enum realm_state {
> +	/**
> +	 * @REALM_STATE_NONE:
> +	 *      Realm has not yet been created. rmi_realm_create() has not
> +	 *      yet been called.
> +	 */
> +	REALM_STATE_NONE,
> +	/**
> +	 * @REALM_STATE_NEW:
> +	 *      Realm is under construction, rmi_realm_create() has been
> +	 *      called, but it is not yet activated. Pages may be populated.
> +	 */
> +	REALM_STATE_NEW,
> +	/**
> +	 * @REALM_STATE_ACTIVE:
> +	 *      Realm has been created and is eligible for execution with
> +	 *      rmi_rec_enter(). Pages may no longer be populated with
> +	 *      rmi_data_create().
> +	 */
> +	REALM_STATE_ACTIVE,
> +	/**
> +	 * @REALM_STATE_DYING:
> +	 *      Realm is in the process of being destroyed or has already been
> +	 *      destroyed.
> +	 */
> +	REALM_STATE_DYING,
> +	/**
> +	 * @REALM_STATE_DEAD:
> +	 *      Realm has been destroyed.
> +	 */
> +	REALM_STATE_DEAD
> +};

What is the ABI status of this state? Is it purely internal to KVM? Or
is it something that the RMM actively tracks?

> +
>  /**
>   * struct realm - Additional per VM data for a Realm
> + *
> + * @state: The lifetime state machine for the realm
> + * @rd: Kernel mapping of the Realm Descriptor (RD)
> + * @params: Parameters for the RMI_REALM_CREATE command
> + * @ia_bits: Number of valid Input Address bits in the IPA
>   */
>  struct realm {
> +	enum realm_state state;
> +	void *rd;

Why is this void? Doesn't it have a proper type?

> +	struct realm_params *params;
> +	unsigned int ia_bits;

Consider reordering this structure to avoid holes.

>  };
>  
>  void kvm_init_rmi(void);
> +u32 kvm_realm_ipa_limit(void);

The use of 'realm' is confusing. This is not a per-realm property, but
something global. I'd rather reserve the term 'realm' for CCA VMs (cue
the two prototypes below).

> +
> +int kvm_init_realm(struct kvm *kvm);
> +void kvm_destroy_realm(struct kvm *kvm);
>  
>  #endif /* __ASM_KVM_RMI_H */
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 247e03b33035..18251e561524 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -264,6 +264,13 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  
>  	bitmap_zero(kvm->arch.vcpu_features, KVM_VCPU_MAX_FEATURES);
>  
> +	/* Initialise the realm bits after the generic bits are enabled */
> +	if (kvm_is_realm(kvm)) {
> +		ret = kvm_init_realm(kvm);
> +		if (ret)
> +			goto err_uninit_mmu;
> +	}
> +
>  	return 0;
>  
>  err_uninit_mmu:
> @@ -326,6 +333,8 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  	kvm_unshare_hyp(kvm, kvm + 1);
>  
>  	kvm_arm_teardown_hypercalls(kvm);
> +	if (kvm_is_realm(kvm))
> +		kvm_destroy_realm(kvm);
>  }
>  
>  static bool kvm_has_full_ptr_auth(void)
> @@ -486,6 +495,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		else
>  			r = kvm_supports_cacheable_pfnmap();
>  		break;
> +	case KVM_CAP_ARM_RMI:
> +		r = static_key_enabled(&kvm_rmi_is_available);
> +		break;
>  
>  	default:
>  		r = 0;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index d089c107d9b7..ba8286472286 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -877,10 +877,14 @@ static struct kvm_pgtable_mm_ops kvm_s2_mm_ops = {
>  
>  static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
>  {
> +	struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu);
>  	u32 kvm_ipa_limit = get_kvm_ipa_limit();
>  	u64 mmfr0, mmfr1;
>  	u32 phys_shift;
>  
> +	if (kvm_is_realm(kvm))
> +		kvm_ipa_limit = kvm_realm_ipa_limit();
> +
>  	phys_shift = KVM_VM_TYPE_ARM_IPA_SIZE(type);
>  	if (is_protected_kvm_enabled()) {
>  		phys_shift = kvm_ipa_limit;
> @@ -974,6 +978,8 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>  		return -EINVAL;
>  	}
>  
> +	mmu->arch = &kvm->arch;
> +
>  	err = kvm_init_ipa_range(mmu, type);
>  	if (err)
>  		return err;
> @@ -982,7 +988,6 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
>  	if (!pgt)
>  		return -ENOMEM;
>  
> -	mmu->arch = &kvm->arch;

Why moving this init?

>  	err = KVM_PGT_FN(kvm_pgtable_stage2_init)(pgt, mmu, &kvm_s2_mm_ops);
>  	if (err)
>  		goto out_free_pgtable;
> @@ -1114,7 +1119,10 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
>  	write_unlock(&kvm->mmu_lock);
>  
>  	if (pgt) {
> -		kvm_stage2_destroy(pgt);
> +		if (!kvm_is_realm(kvm))
> +			kvm_stage2_destroy(pgt);
> +		else
> +			kvm_pgtable_stage2_destroy_pgd(pgt);

Why can't you make kvm_stage2_destroy() do the right thing? Surely the
PTs have to be reclaimed one way or another.

>  		kfree(pgt);
>  	}
>  }
> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
> index 6e28b669ded2..f51ec667445e 100644
> --- a/arch/arm64/kvm/rmi.c
> +++ b/arch/arm64/kvm/rmi.c
> @@ -5,6 +5,8 @@
>  
>  #include <linux/kvm_host.h>
>  
> +#include <asm/kvm_emulate.h>
> +#include <asm/kvm_mmu.h>
>  #include <asm/kvm_pgtable.h>
>  #include <asm/rmi_cmds.h>
>  #include <asm/virt.h>
> @@ -14,6 +16,61 @@ static bool rmi_has_feature(unsigned long feature)
>  	return !!u64_get_bits(rmm_feat_reg0, feature);
>  }
>  
> +u32 kvm_realm_ipa_limit(void)
> +{
> +	return u64_get_bits(rmm_feat_reg0, RMI_FEATURE_REGISTER_0_S2SZ);
> +}
> +
> +void kvm_destroy_realm(struct kvm *kvm)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	size_t pgd_size = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
> +
> +	if (realm->params) {
> +		free_page((unsigned long)realm->params);
> +		realm->params = NULL;
> +	}
> +
> +	if (!kvm_realm_is_created(kvm))
> +		return;
> +
> +	kvm_set_realm_state(kvm, REALM_STATE_DYING);
> +
> +	write_lock(&kvm->mmu_lock);
> +	kvm_stage2_unmap_range(&kvm->arch.mmu, 0,
> +			       BIT(realm->ia_bits - 1), true);
> +	write_unlock(&kvm->mmu_lock);
> +
> +	if (realm->rd) {
> +		phys_addr_t rd_phys = virt_to_phys(realm->rd);
> +
> +		if (WARN_ON(rmi_realm_terminate(rd_phys)))
> +			return;
> +
> +		if (WARN_ON(rmi_realm_destroy(rd_phys)))
> +			return;
> +		free_delegated_page(rd_phys);
> +		realm->rd = NULL;
> +	}
> +
> +	if (WARN_ON(rmi_undelegate_range(kvm->arch.mmu.pgd_phys, pgd_size)))
> +		return;
> +
> +	kvm_set_realm_state(kvm, REALM_STATE_DEAD);
> +
> +	/* Now that the Realm is destroyed, free the entry level RTTs */
> +	kvm_free_stage2_pgd(&kvm->arch.mmu);
> +}

This really needs documentation: what happens at each stage? What
memory is reclaimed when?

But even more importantly, why is this built in a completely parallel
way, potentially deviating from the existing KVM S2 management?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply

* Re: SVSM Development Call May 27th, 2026
From: Jörg Rödel @ 2026-05-28  6:52 UTC (permalink / raw)
  To: Stefano Garzarella; +Cc: coconut-svsm, linux-coco
In-Reply-To: <CAGxU2F5hP=7pA1rKRvq5sgr0t2y1YoUzYCmH8hzaCS58U4+Y3A@mail.gmail.com>

Thanks Stefano for running this weeks meeting! Also thanks to Nicola and Tanish
for introducing themselves and their GSoC projects. Welcome to the COCONUT
community :)

The minutes for the meeting are now available for review:

	https://github.com/coconut-svsm/governance/pull/109

-Joerg

^ permalink raw reply

* Re: [PATCH v14 32/44] KVM: arm64: Handle Realm PSCI requests
From: Gavin Shan @ 2026-05-28  6:55 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Shanker Donthineni,
	Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
	WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-33-steven.price@arm.com>

Hi Steve,

On 5/13/26 11:17 PM, Steven Price wrote:
> The RMM needs to be informed of the target REC when a PSCI call is made
> with an MPIDR argument.
> 
> This requirement will be removed in a future release of the RMM 2.0
> specification but is still required for v2.0-bet1.
> 
> Co-developed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Chanegs since v13:
>   * The ioctl KVM_ARM_VCPU_RMI_PSCI_COMPLETE has gone. The RMI call is
>     made automatically just before entering the REC again.
> Changes since v12:
>   * Chance return code for non-realms to -ENXIO to better represent that
>     the ioctl is invalid for non-realms (checkpatch is insistent that
>     "ENOSYS means 'invalid syscall nr' and nothing else").
> Changes since v11:
>   * RMM->RMI renaming.
> Changes since v6:
>   * Use vcpu_is_rec() rather than kvm_is_realm(vcpu->kvm).
>   * Minor renaming/formatting fixes.
> ---
>   arch/arm64/include/asm/kvm_rmi.h |  3 ++
>   arch/arm64/kvm/psci.c            | 15 ++++++++-
>   arch/arm64/kvm/rmi.c             | 58 ++++++++++++++++++++++++++++++++
>   3 files changed, 75 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
> index b65cfec10dee..eacf82a7467d 100644
> --- a/arch/arm64/include/asm/kvm_rmi.h
> +++ b/arch/arm64/include/asm/kvm_rmi.h
> @@ -109,6 +109,9 @@ int realm_map_non_secure(struct realm *realm,
>   			 unsigned long size,
>   			 enum kvm_pgtable_prot prot,
>   			 struct kvm_mmu_memory_cache *memcache);
> +int realm_psci_complete(struct kvm_vcpu *source,
> +			struct kvm_vcpu *target,
> +			unsigned long status);
>   
>   static inline bool kvm_realm_is_private_address(struct realm *realm,
>   						unsigned long addr)
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 3b5dbe9a0a0e..a2cd55dc7b5b 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -103,7 +103,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>   
>   	reset_state->reset = true;
>   	kvm_make_request(KVM_REQ_VCPU_RESET, vcpu);
> -

This change isn't supposed to be part of this patch :-)

>   	/*
>   	 * Make sure the reset request is observed if the RUNNABLE mp_state is
>   	 * observed.
> @@ -142,6 +141,20 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>   	/* Ignore other bits of target affinity */
>   	target_affinity &= target_affinity_mask;
>   
> +	if (vcpu_is_rec(vcpu)) {
> +		struct kvm_vcpu *target_vcpu;
> +
> +		/* RMM supports only zero affinity level */
> +		if (lowest_affinity_level != 0)
> +			return PSCI_RET_INVALID_PARAMS;
> +
> +		target_vcpu = kvm_mpidr_to_vcpu(kvm, target_affinity);
> +		if (!target_vcpu)
> +			return PSCI_RET_INVALID_PARAMS;
> +
> +		return PSCI_RET_SUCCESS;
> +	}
> +
>   	/*
>   	 * If one or more VCPU matching target affinity are running
>   	 * then ON else OFF
> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
> index 761b38a4071c..2b03e962ee41 100644
> --- a/arch/arm64/kvm/rmi.c
> +++ b/arch/arm64/kvm/rmi.c
> @@ -3,6 +3,7 @@
>    * Copyright (C) 2023-2025 ARM Ltd.
>    */
>   
> +#include <uapi/linux/psci.h>
>   #include <linux/kvm_host.h>
>   
>   #include <asm/kvm_emulate.h>
> @@ -127,6 +128,25 @@ static void free_rtt(phys_addr_t phys)
>   	kvm_account_pgtable_pages(phys_to_virt(phys), -1);
>   }
>   
> +int realm_psci_complete(struct kvm_vcpu *source, struct kvm_vcpu *target,
> +			unsigned long status)
> +{
> +	int ret;
> +
> +	/*
> +	 * XXX: RMM-v2.0 doesn't require the target REC address for completing
> +	 * PSCI requests. Temporary hack until RMM implementation catches up
> +	 * to the full spec.
> +	 */
> +	ret = rmi_psci_complete(virt_to_phys(source->arch.rec.rec_page),
> +				virt_to_phys(target->arch.rec.rec_page),
> +				status);
> +	if (ret)
> +		return -EINVAL;

		return -ENXIO;

> +
> +	return 0;
> +}
> +
>   static int realm_rtt_create(struct realm *realm,
>   			    unsigned long addr,
>   			    int level,
> @@ -1004,6 +1024,41 @@ static void kvm_complete_ripas_change(struct kvm_vcpu *vcpu)
>   	rec->run->exit.ripas_base = base;
>   }
>   
> +static void kvm_rec_complete_psci(struct kvm_vcpu *vcpu)
> +{
> +	struct rec_run *run = vcpu->arch.rec.run;
> +	unsigned long status = PSCI_RET_DENIED;
> +	unsigned long ret = vcpu_get_reg(vcpu, 0);
> +	struct kvm_vcpu *target;
> +
> +	switch (run->exit.gprs[0]) {
> +	/*
> +	 * XXX: RMM-v2.0 doesn't cause RMI_EXIT_PSCI for AFFINITY_INFO
> +	 * Temporary hack until tf-RMM gets the REC to MPIDR mapping via
> +	 * RD Auxiliary granules.
> +	 * For now always report SUCCESS
> +	 */
> +	case PSCI_0_2_FN64_AFFINITY_INFO:
> +		status = PSCI_RET_SUCCESS;
> +		break;
> +	case PSCI_0_2_FN64_CPU_ON: {
> +		if (ret != PSCI_RET_SUCCESS &&
> +		    ret != PSCI_RET_ALREADY_ON)
> +			status = PSCI_RET_DENIED;
> +		else
> +			status = PSCI_RET_SUCCESS;
> +		break;
> +	}
> +	default:
> +		return;
> +	}
> +
> +	target = kvm_mpidr_to_vcpu(vcpu->kvm, run->exit.gprs[1]);
> +	/* RMM makes sure that we don't get RMI_EXIT_PSCI for invalid mpidrs */
> +	if (target)
> +		realm_psci_complete(vcpu, target, status);
> +}
> +
>   /*
>    * kvm_rec_pre_enter - Complete operations before entering a REC
>    *
> @@ -1028,6 +1083,9 @@ int kvm_rec_pre_enter(struct kvm_vcpu *vcpu)
>   		for (int i = 0; i < REC_RUN_GPRS; i++)
>   			rec->run->enter.gprs[i] = vcpu_get_reg(vcpu, i);
>   		break;
> +	case RMI_EXIT_PSCI:
> +		kvm_rec_complete_psci(vcpu);
> +		break;
>   	case RMI_EXIT_RIPAS_CHANGE:
>   		kvm_complete_ripas_change(vcpu);
>   		break;

Thanks,
Gavin


^ permalink raw reply

* Re: [PATCH v14 28/44] arm64: RMI: Create the realm descriptor
From: Gavin Shan @ 2026-05-28  5:51 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Shanker Donthineni,
	Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
	WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-29-steven.price@arm.com>

Hi Steve,

On 5/13/26 11:17 PM, Steven Price wrote:
> Creating a realm involves first creating a realm descriptor (RD). This
> involves passing the configuration information to the RMM. Do this as
> part of realm_ensure_created() so that the realm is created when it is
> first needed.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>   * The RMM no longer uses AUX granules, so no need to ask it how many it
>     needs.
>   * Adapted to other changes.
> Changes since v12:
>   * Since RMM page size is now equal to the host's page size various
>     calculations are simplified.
>   * Switch to using range based APIs to delegate/undelegate.
>   * VMID handling is now handled entirely by the RMM.
> ---
>   arch/arm64/kvm/rmi.c | 88 +++++++++++++++++++++++++++++++++++++++++++-
>   1 file changed, 86 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
> index fb96bcaa73ed..cae29fd3353c 100644
> --- a/arch/arm64/kvm/rmi.c
> +++ b/arch/arm64/kvm/rmi.c
> @@ -418,6 +418,77 @@ static void realm_unmap_shared_range(struct kvm *kvm,
>   			     start, end);
>   }
>   
> +static int realm_create_rd(struct kvm *kvm)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	struct realm_params *params = realm->params;
> +	void *rd = NULL;
> +	phys_addr_t rd_phys, params_phys;
> +	size_t pgd_size = kvm_pgtable_stage2_pgd_size(kvm->arch.mmu.vtcr);
> +	int r;
> +
> +	realm->ia_bits = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
> +
> +	if (WARN_ON(realm->rd || !realm->params))
> +		return -EEXIST;
> +
> +	rd = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> +	if (!rd)
> +		return -ENOMEM;
> +
> +	rd_phys = virt_to_phys(rd);
> +	if (rmi_delegate_page(rd_phys)) {
> +		r = -ENXIO;
> +		goto free_rd;
> +	}
> +
> +	if (rmi_delegate_range(kvm->arch.mmu.pgd_phys, pgd_size)) {
> +		r = -ENXIO;
> +		goto out_undelegate_tables;
> +	}
> +
> +	params->s2sz = VTCR_EL2_IPA(kvm->arch.mmu.vtcr);
> +	params->rtt_level_start = get_start_level(realm);
> +	params->rtt_num_start = pgd_size / PAGE_SIZE;
> +	params->rtt_base = kvm->arch.mmu.pgd_phys;
> +
> +	if (kvm->arch.arm_pmu) {
> +		params->pmu_num_ctrs = kvm->arch.nr_pmu_counters;
> +		params->flags |= RMI_REALM_PARAM_FLAG_PMU;
> +	}
> +
> +	if (kvm_lpa2_is_enabled())
> +		params->flags |= RMI_REALM_PARAM_FLAG_LPA2;
> +
> +	params_phys = virt_to_phys(params);
> +
> +	if (rmi_realm_create(rd_phys, params_phys)) {
> +		r = -ENXIO;
> +		goto out_undelegate_tables;
> +	}
> +
> +	realm->rd = rd;
> +	kvm_set_realm_state(kvm, REALM_STATE_NEW);
> +	/* The realm is up, free the parameters.  */
> +	free_page((unsigned long)realm->params);
> +	realm->params = NULL;
> +
> +	return 0;
> +
> +out_undelegate_tables:
> +	if (WARN_ON(rmi_undelegate_range(kvm->arch.mmu.pgd_phys, pgd_size))) {
> +		/* Leak the pages if they cannot be returned */
> +		kvm->arch.mmu.pgt = NULL;
> +	}

In the latest RMM implementation (topics/rmm-v2.0-poc_2), rmi_delegate_range() works
with the granularity of granule (4KB) and it can fail on any granule. For example,
we have 16x granule as the root RTT and rmi_delegate_range() fails on the first
granule, we're going to undelegate all these 16x granules, which were never delegated
to RMM. It eventually leads to error and memory leakage.

For this, rmi_delegate_range() could be improved to return the number of granules that
have been delegated. The return value can be used by the caller to handle the erroneous
case by passing the correct range to rmi_undelegate_page().

> +	if (WARN_ON(rmi_undelegate_page(rd_phys))) {
> +		/* Leak the page if it isn't returned */
> +		return r;
> +	}
> +free_rd:
> +	free_page((unsigned long)rd);
> +	return r;
> +}
> +
>   static void realm_unmap_private_range(struct kvm *kvm,
>   				      unsigned long start,
>   				      unsigned long end,
> @@ -647,8 +718,21 @@ static int realm_init_ipa_state(struct kvm *kvm,
>   
>   static int realm_ensure_created(struct kvm *kvm)
>   {
> -	/* Provided in later patch */
> -	return -ENXIO;
> +	int ret;
> +
> +	switch (kvm_realm_state(kvm)) {
> +	case REALM_STATE_NONE:
> +		break;
> +	case REALM_STATE_NEW:
> +		return 0;
> +	case REALM_STATE_DEAD:
> +		return -ENXIO;
> +	default:
> +		return -EBUSY;
> +	}
> +
> +	ret = realm_create_rd(kvm);
> +	return ret;
>   }
>   
>   static int set_ripas_of_protected_regions(struct kvm *kvm)

Thanks,
Gavin


^ permalink raw reply

* Re: [RFC PATCH v4 01/14] coco: host: arm64: Add host TSM callback and IDE stream allocation support
From: Dan Williams (nvidia) @ 2026-05-28  5:47 UTC (permalink / raw)
  To: Aneesh Kumar K.V (Arm), linux-coco, kvmarm, linux-arm-kernel,
	linux-kernel
  Cc: Aneesh Kumar K.V (Arm), Alexey Kardashevskiy, Catalin Marinas,
	Dan Williams, Jason Gunthorpe, Jonathan Cameron, Marc Zyngier,
	Samuel Ortiz, Steven Price, Suzuki K Poulose, Will Deacon,
	Xu Yilun
In-Reply-To: <20260427065121.916615-2-aneesh.kumar@kernel.org>

Aneesh Kumar K.V (Arm) wrote:
> Register the TSM callback when the DA feature is supported by KVM.
> 
> This driver handles IDE stream setup for both the root port and PCIe
> endpoints. Root port IDE stream enablement itself is managed by RMM.
> 
> In addition, the driver registers pci_tsm_ops with the TSM subsystem.

Do you want to call out that this is an infrastructure / scaffolding
patch that only handles the PCI-TSM skeleton. The CCA meat comes later,
in particular IDE key management. Tell a bit more of the story 

Otherwise, mostly looks good.

Minor comments below...

> Signed-off-by: Aneesh Kumar K.V (Arm) <aneesh.kumar@kernel.org>
> ---
>  arch/arm64/include/asm/rmi_smc.h         |   2 +
>  drivers/firmware/smccc/rmm.c             |  12 ++
>  drivers/firmware/smccc/rmm.h             |   8 +
>  drivers/firmware/smccc/smccc.c           |   1 +
>  drivers/virt/coco/Kconfig                |   2 +
>  drivers/virt/coco/Makefile               |   1 +
>  drivers/virt/coco/arm-cca-host/Kconfig   |  19 ++
>  drivers/virt/coco/arm-cca-host/Makefile  |   5 +
>  drivers/virt/coco/arm-cca-host/arm-cca.c | 225 +++++++++++++++++++++++
>  drivers/virt/coco/arm-cca-host/rmi-da.h  |  46 +++++
>  10 files changed, 321 insertions(+)
>  create mode 100644 drivers/virt/coco/arm-cca-host/Kconfig
>  create mode 100644 drivers/virt/coco/arm-cca-host/Makefile
>  create mode 100644 drivers/virt/coco/arm-cca-host/arm-cca.c
>  create mode 100644 drivers/virt/coco/arm-cca-host/rmi-da.h
> 
> diff --git a/arch/arm64/include/asm/rmi_smc.h b/arch/arm64/include/asm/rmi_smc.h
> index fa23818e1b4c..109d6cc6ef37 100644
> --- a/arch/arm64/include/asm/rmi_smc.h
> +++ b/arch/arm64/include/asm/rmi_smc.h
[..]
> diff --git a/drivers/firmware/smccc/rmm.c b/drivers/firmware/smccc/rmm.c
> index 2a6187df3285..7444cc3a588c 100644
> --- a/drivers/firmware/smccc/rmm.c
> +++ b/drivers/firmware/smccc/rmm.c
[..]
> diff --git a/drivers/firmware/smccc/rmm.h b/drivers/firmware/smccc/rmm.h
> index a47a650d4f51..37d0d95a099e 100644
> --- a/drivers/firmware/smccc/rmm.h
> +++ b/drivers/firmware/smccc/rmm.h
[..]
> diff --git a/drivers/firmware/smccc/smccc.c b/drivers/firmware/smccc/smccc.c
> index fc9b44b7c687..2bf2d59e686d 100644
> --- a/drivers/firmware/smccc/smccc.c
> +++ b/drivers/firmware/smccc/smccc.c
> @@ -97,6 +97,7 @@ static int __init smccc_devices_init(void)
>  		 * the required SMCCC function IDs at a supported revision.
>  		 */
>  		register_rsi_device(pdev);
> +		register_rmi_device(pdev);
>  	}

Would splitting the above three hunks make this series stand on its own
relative to the base CCA series? I assume likely not as soon as we get
to patch2.

Otherwise, just curious what your intended merge strategy is for this,
tsm.git or arm.git, and what help this needs?

[..]
snip code that looks good.

> diff --git a/drivers/virt/coco/arm-cca-host/Makefile b/drivers/virt/coco/arm-cca-host/Makefile
> new file mode 100644
> index 000000000000..c236827f002c
> --- /dev/null
> +++ b/drivers/virt/coco/arm-cca-host/Makefile
> @@ -0,0 +1,5 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +#
> +obj-$(CONFIG_ARM_CCA_HOST) += arm-cca-host.o
> +
> +arm-cca-host-y	+=  arm-cca.o
> diff --git a/drivers/virt/coco/arm-cca-host/arm-cca.c b/drivers/virt/coco/arm-cca-host/arm-cca.c
> new file mode 100644
> index 000000000000..67f7e80106e8
> --- /dev/null
> +++ b/drivers/virt/coco/arm-cca-host/arm-cca.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2026 ARM Ltd.
> + */
> +
> +#include <linux/auxiliary_bus.h>
> +#include <linux/pci-tsm.h>
> +#include <linux/pci-ide.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/tsm.h>
> +#include <linux/vmalloc.h>
> +#include <linux/cleanup.h>
> +
> +#include "rmi-da.h"
> +
> +/* Total number of stream id supported at root port level */
> +#define MAX_STREAM_ID	256
> +
> +static struct pci_tsm *cca_tsm_pci_probe(struct tsm_dev *tsm_dev, struct pci_dev *pdev)
> +{
> +	int ret;
> +
> +	if (!is_pci_tsm_pf0(pdev)) {
> +		struct cca_host_fn_dsc *fn_dsc __free(kfree) =
> +			kzalloc(sizeof(*fn_dsc), GFP_KERNEL);

kzalloc_obj(*fn_dsc)

> +
> +		if (!fn_dsc)
> +			return NULL;
> +
> +		ret = pci_tsm_link_constructor(pdev, &fn_dsc->pci, tsm_dev);
> +		if (ret)
> +			return NULL;
> +
> +		return &no_free_ptr(fn_dsc)->pci;
> +	}
> +
> +	if (!pdev->ide_cap)
> +		return NULL;

Bailing early?

Maybe the RMM knows something about this device not needing IDE? I have
a similar question in patch2 around trusted sources for whether a device
is internal or not. 

> +
> +	struct cca_host_pf0_ep_dsc *pf0_ep_dsc __free(kfree) =
> +		kzalloc(sizeof(*pf0_ep_dsc), GFP_KERNEL);
> +	if (!pf0_ep_dsc)
> +		return NULL;
> +
> +	ret = pci_tsm_pf0_constructor(pdev, &pf0_ep_dsc->pci, tsm_dev);
> +	if (ret)
> +		return NULL;
> +
> +	pci_dbg(pdev, "tsm enabled\n");
> +	return &no_free_ptr(pf0_ep_dsc)->pci.base_tsm;
> +}
> +
> +static void cca_tsm_pci_remove(struct pci_tsm *tsm)
> +{
> +	struct pci_dev *pdev = tsm->pdev;
> +
> +	if (is_pci_tsm_pf0(pdev)) {
> +		struct cca_host_pf0_ep_dsc *pf0_ep_dsc = to_cca_pf0_ep_dsc(pdev);
> +
> +		pci_tsm_pf0_destructor(&pf0_ep_dsc->pci);
> +		kfree(pf0_ep_dsc);
> +	} else {
> +		kfree(to_cca_fn_dsc(pdev));
> +	}
> +}
> +
> +/* For now global for simplicity. Protected by pci_tsm_rwsem */
> +static DECLARE_BITMAP(cca_stream_ids, MAX_STREAM_ID);
> +static int alloc_stream_id(struct pci_host_bridge *hb)
> +{
> +	int stream_id;
> +
> +redo_alloc:
> +	stream_id = find_first_zero_bit(cca_stream_ids, MAX_STREAM_ID);
> +	if (stream_id == MAX_STREAM_ID)
> +		return stream_id;
> +
> +	if (ida_exists(&hb->ide_stream_ids_ida, stream_id)) {
> +		/* mark the stream allocated in the global bitmap. */
> +		set_bit(stream_id, cca_stream_ids);
> +		goto redo_alloc;
> +	}
> +	return stream_id;

Is 256 total an RMM limit, and/or does it require globally unique
stream-ids? If not you could do what SEV-TIO does and just set stream-id
== stream-index.

> +}
> +
> +static inline bool cca_pdev_need_sel_ide_streams(struct pci_dev *pdev)
> +{
> +	return pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT;
> +}
> +
> +static int cca_tsm_connect(struct pci_dev *pdev)
> +{
> +	struct pci_dev *rp = pcie_find_root_port(pdev);
> +	struct cca_host_pf0_ep_dsc *pf0_ep_dsc;
> +	struct pci_ide *ide;
> +	int ret, stream_id = 0;
> +
> +	/* Only function 0 supports connect in host */
> +	if (WARN_ON(!is_pci_tsm_pf0(pdev)))
> +		return -EIO;
> +
> +	pf0_ep_dsc = to_cca_pf0_ep_dsc(pdev);
> +	if (cca_pdev_need_sel_ide_streams(pdev)) {
> +		/* Allocate stream id */
> +		stream_id = alloc_stream_id(pci_find_host_bridge(pdev->bus));
> +		if (stream_id == MAX_STREAM_ID)
> +			return -EBUSY;
> +		set_bit(stream_id, cca_stream_ids);
> +
> +		ide = pci_ide_stream_alloc(pdev);
> +		if (!ide) {
> +			ret = -ENOMEM;
> +			goto err_stream_alloc;
> +		}
> +
> +		pf0_ep_dsc->sel_stream = ide;
> +		ide->stream_id = stream_id;
> +		ret = pci_ide_stream_register(ide);
> +		if (ret)
> +			goto err_stream;
> +		/*
> +		 * Configure IDE capability for target device
> +		 *
> +		 * Some test devices work only with DEFAULT_STREAM enabled.
> +		 * For simplicity, enable DEFAULT_STREAM for all devices. A
> +		 * future decent solution may be to have a quirk table to
> +		 * specify which devices need DEFAULT_STREAM.
> +		 */
> +		ide->partner[PCI_IDE_EP].default_stream = 1;
> +		pci_ide_stream_setup(pdev, ide);
> +		pci_ide_stream_setup(rp, ide);
> +
> +		ret = tsm_ide_stream_register(ide);
> +		if (ret)
> +			goto err_tsm;
> +
> +		/*
> +		 * Once ide is setup, enable the stream at the endpoint
> +		 * Root port will be done by RMM
> +		 */
> +		pci_ide_stream_enable(pdev, ide);

The end point of these patches follows the spec recommendation of
delaying enable until after key programming.

> +	}
> +	return 0;

Should this be making security claims to userspace without taking any
action for non-endpoint devices that happen to be passed in?

Thinking about a bisection case this should either fail here, print a
message that is removed in the final enabling patch, or do the
__maybe_unused arrangement to land all the CCA bits first and then do
this hookup. Up to you.

^ permalink raw reply

* Re: [PATCH v14 26/44] arm64: RMI: Allow populating initial contents
From: Gavin Shan @ 2026-05-28  5:30 UTC (permalink / raw)
  To: Steven Price, kvm, kvmarm
  Cc: Catalin Marinas, Marc Zyngier, Will Deacon, James Morse,
	Oliver Upton, Suzuki K Poulose, Zenghui Yu, linux-arm-kernel,
	linux-kernel, Joey Gouly, Alexandru Elisei, Christoffer Dall,
	Fuad Tabba, linux-coco, Ganapatrao Kulkarni, Shanker Donthineni,
	Alper Gun, Aneesh Kumar K . V, Emi Kisanuki, Vishal Annapurve,
	WeiLin.Chang, Lorenzo.Pieralisi2
In-Reply-To: <20260513131757.116630-27-steven.price@arm.com>

Hi Steve,

On 5/13/26 11:17 PM, Steven Price wrote:
> The VMM needs to populate the realm with some data before starting (e.g.
> a kernel and initrd). This is measured by the RMM and used as part of
> the attestation later on.
> 
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Changes since v13:
>   * Rename realm_create_protected_data_page() to realm_data_map_init().
> Changes since v12:
>   * The ioctl now updates the structure with the amount populated rather
>     than returning this through the ioctl return code.
>   * Use the new RMM v2.0 range based RMI calls.
>   * Adapt to upstream changes in kvm_gmem_populate().
> Changes since v11:
>   * The multiplex CAP is gone and there's a new ioctl which makes use of
>     the generic kvm_gmem_populate() functionality.
> Changes since v7:
>   * Improve the error codes.
>   * Other minor changes from review.
> Changes since v6:
>   * Handle host potentially having a larger page size than the RMM
>     granule.
>   * Drop historic "par" (protected address range) from
>     populate_par_region() - it doesn't exist within the current
>     architecture.
>   * Add a cond_resched() call in kvm_populate_realm().
> Changes since v5:
>   * Refactor to use PFNs rather than tracking struct page in
>     realm_create_protected_data_page().
>   * Pull changes from a later patch (in the v5 series) for accessing
>     pages from a guest memfd.
>   * Do the populate in chunks to avoid holding locks for too long and
>     triggering RCU stall warnings.
> ---
>   arch/arm64/include/asm/kvm_rmi.h |   4 ++
>   arch/arm64/kvm/Kconfig           |   1 +
>   arch/arm64/kvm/arm.c             |  13 ++++
>   arch/arm64/kvm/rmi.c             | 106 +++++++++++++++++++++++++++++++
>   4 files changed, 124 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_rmi.h b/arch/arm64/include/asm/kvm_rmi.h
> index 007249a13dbc..a2b6bc412a22 100644
> --- a/arch/arm64/include/asm/kvm_rmi.h
> +++ b/arch/arm64/include/asm/kvm_rmi.h
> @@ -88,6 +88,10 @@ int kvm_rec_enter(struct kvm_vcpu *vcpu);
>   int kvm_rec_pre_enter(struct kvm_vcpu *vcpu);
>   int handle_rec_exit(struct kvm_vcpu *vcpu, int rec_run_status);
>   
> +struct kvm_arm_rmi_populate;
> +
> +int kvm_arm_rmi_populate(struct kvm *kvm,
> +			 struct kvm_arm_rmi_populate *arg);
>   void kvm_realm_unmap_range(struct kvm *kvm,
>   			   unsigned long ipa,
>   			   unsigned long size,
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 4e16719fda22..d0cd011cf672 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -38,6 +38,7 @@ menuconfig KVM
>   	select GUEST_PERF_EVENTS if PERF_EVENTS
>   	select KVM_GUEST_MEMFD
>   	select KVM_GENERIC_MEMORY_ATTRIBUTES
> +	select HAVE_KVM_ARCH_GMEM_POPULATE
>   	help
>   	  Support hosting virtualized guest machines.
>   
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index ed88a203b892..073ba9181da9 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -2131,6 +2131,19 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
>   			return -EFAULT;
>   		return kvm_vm_ioctl_get_reg_writable_masks(kvm, &range);
>   	}
> +	case KVM_ARM_RMI_POPULATE: {
> +		struct kvm_arm_rmi_populate req;
> +		int ret;
> +
> +		if (!kvm_is_realm(kvm))
> +			return -ENXIO;
> +		if (copy_from_user(&req, argp, sizeof(req)))
> +			return -EFAULT;
> +		ret = kvm_arm_rmi_populate(kvm, &req);
> +		if (copy_to_user(argp, &req, sizeof(req)))
> +			return -EFAULT;
> +		return ret;
> +	}

s/return ret/return 0; The variable 'ret' can be dropped.

>   	default:
>   		return -EINVAL;
>   	}
> diff --git a/arch/arm64/kvm/rmi.c b/arch/arm64/kvm/rmi.c
> index a89873a5eb77..209087bcf399 100644
> --- a/arch/arm64/kvm/rmi.c
> +++ b/arch/arm64/kvm/rmi.c
> @@ -486,6 +486,75 @@ void kvm_realm_unmap_range(struct kvm *kvm, unsigned long start,
>   		realm_unmap_private_range(kvm, start, end, may_block);
>   }
>   
> +static int realm_data_map_init(struct kvm *kvm, unsigned long ipa,
> +			       kvm_pfn_t dst_pfn, kvm_pfn_t src_pfn,
> +			       unsigned long flags)
> +{
> +	struct realm *realm = &kvm->arch.realm;
> +	phys_addr_t rd = virt_to_phys(realm->rd);
> +	phys_addr_t dst_phys, src_phys;
> +	int ret;
> +
> +	dst_phys = __pfn_to_phys(dst_pfn);
> +	src_phys = __pfn_to_phys(src_pfn);
> +
> +	if (rmi_delegate_page(dst_phys))
> +		return -ENXIO;
> +
> +	ret = rmi_rtt_data_map_init(rd, dst_phys, ipa, src_phys, flags);
> +	if (RMI_RETURN_STATUS(ret) == RMI_ERROR_RTT) {
> +		/* Create missing RTTs and retry */
> +		int level = RMI_RETURN_INDEX(ret);
> +
> +		KVM_BUG_ON(level == KVM_PGTABLE_LAST_LEVEL, kvm);

		KVM_BUG_ON(level >= KVM_PGTABLE_LAST_LEVEL, kvm);> +
> +		ret = realm_create_rtt_levels(realm, ipa, level,
> +					      KVM_PGTABLE_LAST_LEVEL, NULL);
> +		if (!ret) {
> +			ret = rmi_rtt_data_map_init(rd, dst_phys, ipa, src_phys,
> +						    flags);
> +		}
> +	}
> +
> +	if (ret) {
> +		if (WARN_ON(rmi_undelegate_page(dst_phys))) {
> +			/* Undelegate failed, so we leak the page */
> +			get_page(pfn_to_page(dst_pfn));
> +		}
> +	}
> +

	if (ret && WARN_ON(rmi_undelegate_page(dst_phys)) {
		/* Leak the page that fails to be undelegated */
		get_page(pfn_to_page(dst_pfn));
	}

> +	return ret;
> +}
> +
> +static int populate_region_cb(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
> +			      struct page *src_page, void *opaque)
> +{
> +	unsigned long data_flags = *(unsigned long *)opaque;
> +	phys_addr_t ipa = gfn_to_gpa(gfn);
> +
> +	if (!src_page)
> +		return -EOPNOTSUPP;
> +
> +	return realm_data_map_init(kvm, ipa, pfn, page_to_pfn(src_page),
> +				   data_flags);
> +}
> +
> +static long populate_region(struct kvm *kvm,
> +			    gfn_t base_gfn,
> +			    unsigned long pages,
> +			    u64 uaddr,
> +			    unsigned long data_flags)
> +{
> +	long ret = 0;
> +
> +	mutex_lock(&kvm->slots_lock);
> +	ret = kvm_gmem_populate(kvm, base_gfn, u64_to_user_ptr(uaddr), pages,
> +				populate_region_cb, &data_flags);
> +	mutex_unlock(&kvm->slots_lock);
> +
> +	return ret;
> +}
> +
>   enum ripas_action {
>   	RIPAS_INIT,
>   	RIPAS_SET,
> @@ -574,6 +643,43 @@ static int realm_ensure_created(struct kvm *kvm)
>   	return -ENXIO;
>   }
>   
> +int kvm_arm_rmi_populate(struct kvm *kvm,
> +			 struct kvm_arm_rmi_populate *args)
> +{
> +	unsigned long data_flags = 0;
> +	unsigned long ipa_start = args->base;
> +	unsigned long ipa_end = ipa_start + args->size;
> +	long pages_populated;
> +	int ret;
> +
> +	if (args->reserved ||
> +	    (args->flags & ~KVM_ARM_RMI_POPULATE_FLAGS_MEASURE) ||
> +	    !IS_ALIGNED(ipa_start, PAGE_SIZE) ||
> +	    !IS_ALIGNED(ipa_end, PAGE_SIZE) ||
> +	    !IS_ALIGNED(args->source_uaddr, PAGE_SIZE))
> +		return -EINVAL;
> +

There are more conditions missed here:

	args->size == 0, return 0;
	args->base + args->size < args->base, return -EINVAL;  // wrapped range

> +	ret = realm_ensure_created(kvm);
> +	if (ret)
> +		return ret;
> +
> +	if (args->flags & KVM_ARM_RMI_POPULATE_FLAGS_MEASURE)
> +		data_flags |= RMI_MEASURE_CONTENT;
> +
> +	pages_populated = populate_region(kvm, gpa_to_gfn(ipa_start),
> +					  args->size >> PAGE_SHIFT,
> +					  args->source_uaddr, data_flags);
> +
> +	if (pages_populated < 0)
> +		return pages_populated;

pages_populaged is 'unsigned long', this function returns a 'int' value.

> +
> +	args->size -= pages_populated << PAGE_SHIFT;
> +	args->source_uaddr += pages_populated << PAGE_SHIFT;
> +	args->base += pages_populated << PAGE_SHIFT;
> +
> +	return 0;
> +}
> +
>   static void kvm_complete_ripas_change(struct kvm_vcpu *vcpu)
>   {
>   	struct kvm *kvm = vcpu->kvm;

Thanks,
Gavin


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox