Kernel KVM virtualization development
* [PATCH v2] KVM: x86: Exempt in-kernel PIC from "disappearing" interrupt warning
From: syzbot @ 2026-06-25 21:10 UTC (permalink / raw)
  To: syzkaller-bugs, Borislav Petkov, Dave Hansen, kvm, Ingo Molnar,
	Paolo Bonzini, Sean Christopherson, Thomas Gleixner, x86
  Cc: hpa, linux-kernel, syzbot

From: Alexander Potapenko <glider@google.com>

A warning can be triggered in kvm_check_and_inject_events() when an
interrupt disappears between the time it is checked via
kvm_cpu_has_injectable_intr() and the time it is fetched via
kvm_cpu_get_interrupt(). This occurs because the warning incorrectly
assumes that if an interrupt is injectable, fetching it must always return
a valid interrupt vector (i.e., not -1).

However, this assumption is broken by level-triggered interrupts in the
in-kernel PIC that are deasserted concurrently by another thread. For
example, if a misconfigured PIT or a PCI device asserts and then
immediately deasserts a level-triggered interrupt, the vCPU thread might
see the pending interrupt during the check but find it gone during the
fetch, resulting in kvm_cpu_get_interrupt() returning -1.

The warning manifests as follows:

------------[ cut here ]------------
irq == -1
WARNING: arch/x86/kvm/x86.c:10860 at kvm_check_and_inject_events
arch/x86/kvm/x86.c:10860 [inline]
WARNING: arch/x86/kvm/x86.c:10860 at vcpu_enter_guest
arch/x86/kvm/x86.c:11356 [inline]
WARNING: arch/x86/kvm/x86.c:10860 at vcpu_run+0x57ec/0x7950
arch/x86/kvm/x86.c:11770
RIP: 0010:kvm_check_and_inject_events arch/x86/kvm/x86.c:10860 [inline]
RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:11356 [inline]
RIP: 0010:vcpu_run+0x57ec/0x7950 arch/x86/kvm/x86.c:11770
Call Trace:
 <TASK>
 kvm_arch_vcpu_ioctl_run+0x1193/0x2070 arch/x86/kvm/x86.c:12125
 kvm_vcpu_ioctl+0xa61/0xfd0 virt/kvm/kvm_main.c:4470
 vfs_ioctl fs/ioctl.c:51 [inline]
 __do_sys_ioctl fs/ioctl.c:597 [inline]
 __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:583
 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
 do_syscall_64+0x174/0x580 arch/x86/entry/syscall_64.c:94
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
 </TASK>

Since this is a legitimate Time-Of-Check to Time-Of-Use (TOCTOU) race
condition for the in-kernel PIC, WARN_ON_ONCE() must not be used for this
case. Update the warning to exempt the in-kernel PIC, while preserving it
for other interrupt sources (e.g. APIC) as they are not expected to exhibit
this behavior.

Fixes: bf672720e83c ("KVM: x86: check the kvm_cpu_get_interrupt result before using it")
Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot
Reported-by: syzbot+dd769db18693736eee89@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=dd769db18693736eee89
Link: https://syzkaller.appspot.com/ai_job?id=0b59ccd5-8820-460d-84d3-94df6307bd6a
Signed-off-by: Alexander Potapenko <glider@google.com>

---
v2:
- Restrict the warning exemption to the in-kernel PIC case.
- Remove the pr_err_ratelimited() logging.
- Preserve the WARN_ON_ONCE() for non-PIC interrupt sources.

v1:
https://lore.kernel.org/all/345e9d6c-d7d9-4bab-adb3-d6a7bd27599f@mail.kernel.org/T/
---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0550359ed..f1681aa9f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10857,7 +10857,9 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
 		if (r) {
 			int irq = kvm_cpu_get_interrupt(vcpu);
 
-			if (!WARN_ON_ONCE(irq == -1)) {
+			WARN_ON_ONCE(irq == -1 && !pic_in_kernel(vcpu->kvm));
+
+			if (irq != -1) {
 				kvm_queue_interrupt(vcpu, irq, false);
 				kvm_x86_call(inject_irq)(vcpu, false);
 				WARN_ON(kvm_x86_call(interrupt_allowed)(vcpu, true) < 0);


base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
-- 
See https://goo.gle/syzbot-ai-patches for information about AI-generated patches.
You can comment on the patch as usual, syzbot will try to address
the comments and send a new version of the patch if necessary.
syzbot engineers can be reached at syzkaller@googlegroups.com.
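
The check-then-fetch race described in the commit message can be modelled in
userspace. The sketch below is not KVM code: struct toy_pic and the toy_*
helpers are hypothetical stand-ins for kvm_cpu_has_injectable_intr() and
kvm_cpu_get_interrupt(), and the concurrent deassert is simulated with a flag
rather than a second thread. It only illustrates why -1 is a legitimate
result for a level-triggered in-kernel PIC line and why the patch warns for
other sources only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one level-triggered PIC input; not KVM's layout. */
struct toy_pic {
	bool line_asserted;	/* may be lowered by another thread at any time */
	int vector;
};

/* Time of check: models kvm_cpu_has_injectable_intr(). */
static bool toy_has_injectable_intr(const struct toy_pic *pic)
{
	return pic->line_asserted;
}

/* Time of use: models kvm_cpu_get_interrupt(); -1 means nothing is pending. */
static int toy_get_interrupt(const struct toy_pic *pic)
{
	return pic->line_asserted ? pic->vector : -1;
}

/*
 * Returns the vector that would be injected, or -1 if none.
 * *warned is set only when an interrupt vanished and the source is not the
 * (toy) in-kernel PIC, mirroring the condition used in the patch.
 */
static int toy_check_and_inject(struct toy_pic *pic, bool pic_in_kernel,
				bool deassert_in_between, bool *warned)
{
	int irq;

	assert(pic && warned);
	*warned = false;

	if (!toy_has_injectable_intr(pic))
		return -1;

	/* Simulate another thread lowering the line inside the race window. */
	if (deassert_in_between)
		pic->line_asserted = false;

	irq = toy_get_interrupt(pic);
	if (irq == -1 && !pic_in_kernel)
		*warned = true;

	return irq;
}
```

With the line lowered inside the window, the fetch returns -1 in both cases,
but the warning flag is raised only when the in-kernel PIC is not in use.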


* Re: Bug#1135235: linux-image-6.19.13+deb14-amd64: Reoccuring host crash "Invalid SPTE change" with gaming win kvm/qemu guest and device passthrough
From: Salvatore Bonaccorso @ 2026-06-25 20:24 UTC (permalink / raw)
  To: Sean Christopherson, 1135235-done
  Cc: Maximilian Senftleben, kvm, linux-kernel
In-Reply-To: <aj1NRsUIVQRGBqM1@google.com>

Source: linux
Source-Version: 7.0.10-1 

On Thu, Jun 25, 2026 at 03:46:14PM +0000, Sean Christopherson wrote:
> +lists to capture this for posterity
> 
> On Wed, Jun 24, 2026, Maximilian Senftleben wrote:
> > I tried a 7.0.10 with KASAN for several days, and I have now been running
> > 7.0.12+deb14.1-amd64 for a couple of days. So far I have not been able to
> > reproduce my issue, i.e. I have had no crash.
> 
> That, and the fact that 7.0.7 was fine, strongly suggests a broken fix got
> backported and landed in 7.0.8 or 7.0.9, and then a fix-for-the-fix landed in
> 7.0.10.  There aren't any KVM commits of interest anywhere in that range, which
> supports my theory that KVM is an innocent bystander that ran afoul of memory
> corruption due to a bug elsewhere in the kernel.
> 
> Unless you want to bisect to figure out exactly what commit broke things, and
> what commit fixed things, I think it makes sense to consider this resolved unless
> the problem occurs on a 7.0.10+ kernel.

Ack, I'm marking this as fixed with the 7.0.10-based version in Debian then.

Regards,
Salvatore


* Re: [PATCH v2 kvmtool 0/4] Add support for running protected VMs on arm64
From: Fuad Tabba @ 2026-06-25 19:42 UTC (permalink / raw)
  To: Will Deacon
  Cc: kvm, kvmarm, Alexandru Elisei, Suzuki K Poulose, Andre Przywara,
	Oliver Upton, Marc Zyngier
In-Reply-To: <20260625171046.4482-1-will@kernel.org>

On Thu, 25 Jun 2026 at 18:10, Will Deacon <will@kernel.org> wrote:
>
> Hi folks,
>
> This is v2 of the patches I previously posted here:
>
>   https://lore.kernel.org/r/20260619115415.5475-1-will@kernel.org
>
> Changes since v1 include:
>
>   * Bail if user specifies less guest memory than the restricted DMA pool.
>   * Avoid silently dropping KVM_VM_TYPE_ARM_PROTECTED on old host kernels.
>   * Added R-b/T-b tags (thank you!)
>
> The patches are also available here if you want to pull them directly:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/log/?h=pkvm
>
> Cheers,
>
> Will

For the new series:

Reviewed-by: Fuad Tabba <fuad.tabba@linux.dev>
Tested-by: Fuad Tabba <fuad.tabba@linux.dev>

Cheers,
/fuad

>
> Cc: Alexandru Elisei <alexandru.elisei@arm.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
> Cc: Andre Przywara <andre.przywara@arm.com>
> Cc: Fuad Tabba <fuad.tabba@linux.dev>
> Cc: Oliver Upton <oliver.upton@linux.dev>
> Cc: Marc Zyngier <maz@kernel.org>
>
> --->8
>
> Will Deacon (4):
>   Sync kernel UAPI headers with v7.1
>   virtio: Factor out base features for modern virtio transports
>   virtio: Add helper for enabling VIRTIO_F_ACCESS_PLATFORM
>   arm64: Add support for protected VMs
>
>  arm64/fdt.c                         | 37 ++++++++++++++++++--
>  arm64/include/asm/kvm.h             |  1 +
>  arm64/include/kvm/fdt-arch.h        | 10 +++++-
>  arm64/include/kvm/kvm-arch.h        |  2 ++
>  arm64/include/kvm/kvm-config-arch.h |  5 ++-
>  arm64/kvm.c                         | 28 +++++++++++++--
>  arm64/pci.c                         |  2 ++
>  include/kvm/virtio.h                |  2 ++
>  include/linux/kvm.h                 | 53 +++++++++++++++++++++++++----
>  include/linux/virtio_ring.h         |  5 +--
>  riscv/include/asm/kvm.h             | 11 +++---
>  virtio/core.c                       | 12 +++++++
>  virtio/mmio-modern.c                |  2 +-
>  virtio/pci-modern.c                 |  2 +-
>  x86/include/asm/kvm.h               | 21 +++++++-----
>  15 files changed, 163 insertions(+), 30 deletions(-)
>
> --
> 2.55.0.rc0.799.gd6f94ed593-goog
>


* Re: [PATCH v9 3/6] x86/sev: Disable CPU hotplug while SNP is active
From: Kalra, Ashish @ 2026-06-25 19:42 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tglx, mingo, dave.hansen, x86, hpa, seanjc, peterz,
	thomas.lendacky, herbert, davem, ardb, pbonzini, aik,
	Michael.Roth, KPrateek.Nayak, Tycho.Andersen, Nathan.Fontenot,
	ackerleytng, jackyli, pgonda, rientjes, jacobhxu, xin,
	pawan.kumar.gupta, babu.moger, dyoung, nikunj, john.allen, darwi,
	linux-kernel, linux-crypto, kvm, linux-coco
In-Reply-To: <20260625150253.GAaj1DHZC8ULg6PzbI@fat_crate.local>

Hello Boris,

On 6/25/2026 10:02 AM, Borislav Petkov wrote:
> On Wed, Jun 24, 2026 at 09:56:49PM +0000, Ashish Kalra wrote:
>> +/* Set while SNP has CPU hotplug disabled (kernel-lifetime; survives ccp reload). */
>> +static bool snp_cpu_hotplug_disabled;
> 
> Do you really need this?
> 

Yes.

cpu_hotplug_disable()/cpu_hotplug_enable() are refcounted (cpu_hotplug_disabled++/--,
with a WARN on underflow), so they have to be balanced. This flag collapses them to
exactly one outstanding disable per SNP-active window, because the disable and enable
sites are not reached a symmetric number of times:

  - On firmware without SNP_X86_SHUTDOWN_SUPPORTED, __sev_snp_shutdown_locked() does not
  call snp_shutdown() (it's gated on data.x86_snp_shutdown), so SNP stays enabled in
  hardware (SNP_EN stays set and hotplug stays disabled) while sev->snp_initialized is
  cleared. Re-init after that is routine, because the SNP ioctls self-bracket init and
  shutdown (e.g. SNP_COMMIT, SNP_SET_CONFIG, SNP_VLEK_LOAD):

  if (!sev->snp_initialized)
          snp_move_to_init_state(...);   /* -> __sev_snp_init_locked -> snp_prepare() */
  ... SNP_CMD ...
  if (shutdown_required)
          __sev_snp_shutdown_locked(...);

  - So whenever SNP isn't already initialized (psp_init_on_probe off, or after a prior
  legacy shutdown), every such ioctl does init -> command -> legacy shutdown. Each init
  reaches snp_prepare() with SNP_EN already set, and the disable now sits at the top of
  snp_prepare(), so it fires on every cycle. Without this flag, that keeps bumping
  cpu_hotplug_disabled while the legacy shutdown never re-enables, so hotplug ends up
  stuck disabled. This flag makes all but the first disable a no-op.

  - Also, importantly, kvm-amd module reload on legacy firmware follows the same pattern:
  unload leaves SNP_EN set, and reload re-inits.

  - On the enable side it avoids an unbalanced cpu_hotplug_enable() when the teardown/failure
  paths run without an outstanding disable (e.g. shutdown of a never-fully-initialized SNP).

So it's not redundant with cpu_hotplug_disabled — it tracks whether the outstanding disable
belongs to this SNP-active window in this kernel, which keeps the single disable/enable
balanced across the asymmetric legacy-vs-full SNP teardown paths and re-init.

Thanks,
Ashish
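
The balancing argument above can be illustrated with a small userspace model.
This is only a sketch under stated assumptions: the toy_* names are
hypothetical, toy_hotplug_disabled stands in for the refcount behind
cpu_hotplug_disable()/cpu_hotplug_enable(), and toy_snp_owns_disable plays the
role of snp_cpu_hotplug_disabled. It is not the ccp driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the refcount behind cpu_hotplug_disable()/enable(). */
static int toy_hotplug_disabled;

static void toy_hotplug_disable(void)
{
	toy_hotplug_disabled++;
}

static void toy_hotplug_enable(void)
{
	/* The kernel WARNs on underflow; the model asserts instead. */
	assert(toy_hotplug_disabled > 0);
	toy_hotplug_disabled--;
}

/* True while this SNP-active window owns exactly one outstanding disable. */
static bool toy_snp_owns_disable;

/* Reached on every (re-)init; only the first call in a window takes a ref. */
static void toy_snp_prepare(void)
{
	if (toy_snp_owns_disable)
		return;
	toy_hotplug_disable();
	toy_snp_owns_disable = true;
}

/* Legacy shutdown: SNP stays enabled in hardware, so nothing is released. */
static void toy_snp_legacy_shutdown(void)
{
}

/* Full shutdown: drops the reference only if one is actually outstanding. */
static void toy_snp_full_shutdown(void)
{
	if (!toy_snp_owns_disable)
		return;
	toy_hotplug_enable();
	toy_snp_owns_disable = false;
}
```

In this model, any number of init/legacy-shutdown cycles leaves the count at
one, and a full shutdown without a prior disable does not underflow it.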


* Re: [PATCH] vhost/vdpa: reject overflowing PA map page counts
From: sashiko-bot @ 2026-06-25 19:07 UTC (permalink / raw)
  To: Yousef Alhouseen; +Cc: kvm
In-Reply-To: <20260624190653.2893-1-alhouseenyousef@gmail.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:

Pre-existing issues:
- [Critical] The variable `last_pfn` uses `0` as a sentinel value, which breaks the physical discontinuity check if PFN 0 is pinned, allowing an attacker to map arbitrary physical memory into the device IOMMU.
- [High] Permanent leak of pinned memory pages in the error handling path due to flawed state tracking with the `nchunks` variable.
- [High] TOCTOU race in RLIMIT_MEMLOCK check allows users to bypass the locked memory limit by concurrently mapping pages across multiple vhost-vdpa devices.
--

commit b62fefdce516922a639203f628dde15b26a4b097
Author: Yousef Alhouseen <alhouseenyousef@gmail.com>

vhost/vdpa: reject overflowing PA map page counts

This commit adds a check in vhost_vdpa_pa_map to prevent integer overflows
when adding the size and page offset. It also converts the pinned memory
limit check to use subtraction to prevent wrapping.
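
As a rough illustration of why the subtraction form is wrap-safe, the sketch
below compares the two checks on plain 64-bit unsigned values. The function
names and the uint64_t types are assumptions made for the example; the kernel
code uses its own types for npages, pinned_vm and lock_limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The wraparound demonstrated here assumes 64-bit unsigned arithmetic. */
static_assert(sizeof(uint64_t) == 8, "uint64_t must be 64 bits wide");

/* Addition form: npages + pinned can wrap and then compare as a small value. */
static bool exceeds_limit_naive(uint64_t npages, uint64_t pinned, uint64_t limit)
{
	return npages + pinned > limit;
}

/*
 * Subtraction form, as in the patch: npages > limit is tested first, so
 * limit - npages cannot wrap when the second comparison is evaluated.
 */
static bool exceeds_limit_safe(uint64_t npages, uint64_t pinned, uint64_t limit)
{
	return npages > limit || pinned > limit - npages;
}
```

For example, with npages = UINT64_MAX, pinned = 2 and limit = 100 the sum
wraps to 1, so the addition form accepts the request while the subtraction
form rejects it.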

> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index ac55275fa0d0a..090cb8693b7ce 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -1102,6 +1102,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	unsigned int gup_flags = FOLL_LONGTERM;
>  	unsigned long npages, cur_base, map_pfn, last_pfn = 0;

[Severity: Critical]
This is a pre-existing issue, but does using 0 as a sentinel for last_pfn
break the discontinuity check if PFN 0 is pinned?

If PFN 0 is pinned, last_pfn becomes 0, and this check is silently bypassed
for all subsequent pages:

vhost_vdpa_pa_map() {
    ...
    if (last_pfn && (this_pfn != last_pfn + 1)) {
    ...
}

Could this allow discontiguous physical pages to be incorrectly grouped
together and map arbitrary physical memory into the device IOMMU?
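
The sentinel concern can be reproduced in isolation. The following sketch is
a simplified userspace loop, not the vhost_vdpa_pa_map() code: it only counts
contiguity breaks in an array of PFNs, once with 0 as the "no previous PFN"
sentinel and once with an explicit flag. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Uses 0 as "no previous PFN", like the quoted check. */
static size_t count_breaks_sentinel(const unsigned long *pfns, size_t n)
{
	unsigned long last_pfn = 0;
	size_t breaks = 0;

	assert(pfns || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* Skipped whenever the previous PFN happened to be 0. */
		if (last_pfn && pfns[i] != last_pfn + 1)
			breaks++;
		last_pfn = pfns[i];
	}
	return breaks;
}

/* Tracks "have a previous PFN" separately, so PFN 0 is treated like any other. */
static size_t count_breaks_flag(const unsigned long *pfns, size_t n)
{
	unsigned long last_pfn = 0;
	bool have_last = false;
	size_t breaks = 0;

	assert(pfns || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (have_last && pfns[i] != last_pfn + 1)
			breaks++;
		last_pfn = pfns[i];
		have_last = true;
	}
	return breaks;
}
```

For the sequence {0, 5}, the sentinel version reports no break because the
comparison after PFN 0 is skipped, whereas the flag version reports one.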

[ ... ]

> @@ -1123,7 +1130,8 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
>  	mmap_read_lock(dev->mm);
>  
>  	lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK));
> -	if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) {
> +	pinned_vm = atomic64_read(&dev->mm->pinned_vm);
> +	if (npages > lock_limit || pinned_vm > lock_limit - npages) {

[Severity: High]
This isn't a bug introduced by this patch, but does this limit check contain
a race condition?

Because pinned_vm is read without global serialization across the shared
mm_struct, multiple threads could concurrently issue ioctls to update the
IOTLB. Each thread takes its own device mutex and a shared mmap_read_lock.

If all threads concurrently read pinned_vm here, couldn't they all pass the
limit check, pin memory, and increment pinned_vm later, effectively bypassing
the limit?

>  		ret = -ENOMEM;
>  		goto unlock;
>  	}

[Severity: High]
This is a pre-existing issue, but is there a leak of pinned memory pages in
the error handling path of this function?

When a discontiguous page is encountered, the previous chunk is mapped and
nchunks is cleared to 0:

vhost_vdpa_pa_map() {
    ...
    if (last_pfn && (this_pfn != last_pfn + 1)) {
        ...
        ret = vhost_vdpa_map(...)
        ...
        nchunks = 0;
    }
    ...
}

However, the inner loop immediately continues processing the new page, making
it the start of a new outstanding chunk. If a subsequent pin_user_pages()
fails in the next iteration of the outer loop, the code jumps to the out
label:

vhost_vdpa_pa_map() {
    ...
out:
    if (ret) {
        if (nchunks) {
            ...
            for (pfn = map_pfn; pfn <= last_pfn; pfn++)
                unpin_user_page(pfn_to_page(pfn));
        }
        ...
}

Because nchunks was left at 0, doesn't it skip unpinning the outstanding
pages?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260624190653.2893-1-alhouseenyousef@gmail.com?part=1


* Re: [PATCH RFC 00/12] Document synchronization used in managing guest faults
From: Sean Christopherson @ 2026-06-25 18:37 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-0-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Ackerley Tng (12):
>       Documentation: KVM: Elaborate comment on kvm_usage_lock
>       Documentation: KVM: Consolidate notes about cpu_read_lock() and kvm_lock
>       Documentation: KVM: Consolidate notes about kvm->slots_lock and irq_lock
>       Documentation: KVM: Turn - into bullet point
>       Documentation: KVM: Explain what rule the exception section is meant for
>       Documentation: KVM: Have actual headings for exceptions
>       Documentation: KVM: Drop mention of kvm->lock in SRCU documentation
>       Documentation: KVM: Add example for kvm->srcu in relation to mutex/lock
>       Documentation: KVM: Document synchronization for managing guest faults
>       KVM: guest_memfd: Clarify comment about gmem.file vs kvm->srcu
>       KVM: mmu: Point users of host_pfn_mapping_level() to docs
>       Documentation: KVM: Focus acquisition order section on preventing deadlocks

Please split these up into standalone patches or small series of patches that are
logically related at a finer granularity.  "Here's a pile of KVM documentation
updates" is not a reasonable level of granularity.


* Re: [PATCH RFC 07/12] Documentation: KVM: Drop mention of kvm->lock in SRCU documentation
From: Sean Christopherson @ 2026-06-25 18:35 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-7-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> The original comment says that synchronize_srcu(&kvm->srcu) is called
> inside critical sections for kvm->lock, vcpu->mutex and
> kvm->slots_lock. Drop mention of kvm->lock since this is no longer true.

I would *much* rather "fix" this by saying synchronize_srcu() *may* be called
inside blah blah blah.  Because (a) I don't feel like auditing all of KVM to see
if the above is true, (b) KVM's implementation may change again in the future,
and (c) taking kvm->lock inside a kvm->srcu read-side critical section is still
unsafe as we'd end up with ABBA deadlock (well, ABCCA?).

  1. SRCU held, waiting on kvm->lock
  2. kvm->lock held, waiting on vcpu->mutex
  3. vcpu->mutex held, waiting on synchronize_srcu()
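
The three-way dependency listed above can be written down as a tiny wait-for
graph. The sketch below is purely illustrative: the TOY_* identifiers and
toy_has_cycle() are hypothetical, and each slot simply records which resource
the holder of another resource is blocked on. A cycle in that graph is what
makes the ordering a deadlock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three synchronization objects in the list. */
enum { TOY_SRCU, TOY_KVM_LOCK, TOY_VCPU_MUTEX, TOY_NR };

/*
 * waits_on[x] is the resource that the holder of x is blocked on,
 * or -1 if that holder is not waiting for anything.
 */
static bool toy_has_cycle(const int waits_on[TOY_NR])
{
	for (int start = 0; start < TOY_NR; start++) {
		int cur = start;

		/* Follow at most TOY_NR edges; returning to start is a cycle. */
		for (int step = 0; step < TOY_NR; step++) {
			cur = waits_on[cur];
			if (cur < 0)
				break;
			assert(cur < TOY_NR);
			if (cur == start)
				return true;
		}
	}
	return false;
}
```

Encoding the list as { TOY_KVM_LOCK, TOY_VCPU_MUTEX, TOY_SRCU } yields a
cycle; removing the last edge (the vcpu->mutex holder not waiting on
synchronize_srcu()) breaks it.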


* Re: [PATCH RFC 11/12] KVM: mmu: Point users of host_pfn_mapping_level() to docs
From: Sean Christopherson @ 2026-06-25 18:29 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-11-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> After consolidating documentation for host_pfn_mapping_level() in
> Documentation/virt/kvm/locking.rst, point users of function to docs.

NAK.  I want the "formal" documentation to describe the rules and general mechanisms,
not arch specific implementation details.  It's unfortunate that LoongArch copy+pasted
x86's code, comment and all, but that's a separate problem.


* Re: [PATCH RFC 12/12] Documentation: KVM: Focus acquisition order section on preventing deadlocks
From: Sean Christopherson @ 2026-06-25 18:25 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-12-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Now that the first sentence is already described in more detail in the new
> section on synchronization while managing guest faults, drop the first
> sentence.

Nope, nothing in that section says anything about the role of
mn_active_invalidate_count.


* Re: [PATCH v8 24/46] KVM: guest_memfd: Make in-place conversion the default
From: Ackerley Tng @ 2026-06-25 18:20 UTC (permalink / raw)
  To: Yan Zhao
  Cc: aik, andrew.jones, binbin.wu, brauner, chao.p.peng, david,
	jmattson, jthoughton, michael.roth, oupton, pankaj.gupta, qperret,
	rick.p.edgecombe, rientjes, shivankg, steven.price, tabba, willy,
	wyihan, forkloop, pratyush, suzuki.poulose, aneesh.kumar, liam,
	Paolo Bonzini, Sean Christopherson, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, x86, H. Peter Anvin, Steven Rostedt,
	Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
	Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li,
	Kairui Song, Kemeng Shi, Nhat Pham, Barry Song, Axel Rasmussen,
	Yuanchu Xie, Wei Xu, Youngjun Park, Qi Zheng, Shakeel Butt,
	Kiryl Shutsemau, Baoquan He, Jason Gunthorpe, Vlastimil Babka,
	kvm, linux-kernel, linux-trace-kernel, linux-doc, linux-kselftest,
	linux-mm, linux-coco
In-Reply-To: <ajyCn0PnFtQK+Nka@yzhao56-desk.sh.intel.com>

Yan Zhao <yan.y.zhao@intel.com> writes:

> On Wed, Jun 24, 2026 at 05:05:44PM -0700, Ackerley Tng wrote:
>> Yan Zhao <yan.y.zhao@intel.com> writes:
>>
>> >
>> > [...snip...]
>> >
>> >>
>> >>  #ifdef kvm_arch_has_private_mem
>> >> -bool __ro_after_init gmem_in_place_conversion = false;
>> >> +bool __ro_after_init gmem_in_place_conversion = !IS_ENABLED(CONFIG_KVM_VM_MEMORY_ATTRIBUTES);
>> >> +module_param(gmem_in_place_conversion, bool, 0444);
>> >
>> > With gmem_in_place_conversion=true, userspace can create guest_memfd without the
>> > MMAP flag. In such cases, shared memory is allocated from different backends.
>> > This means this module parameter only enables per-gmem memory attribute and does
>> > not guarantee that gmem in-place conversion will actually occur.
>> >
>> > To avoid confusion, could we rename this module parameter to something more
>> > accurate, such as gmem_memory_attribute?
>> >
>>
>> I asked Sean about this after getting some fixes off list. Sean said
>> gmem_in_place_conversion is named for a host admin to use, and something
>> like gmem_memory_attributes is too much implementation details for the
>> admin.
> Thanks for this background.
>
> Some more context on why I'm asking:
>
> Currently, I'm testing TDX huge pages with the following two gmem components:
> 1. The gmem memory attribute in this gmem in-place conversion v8.
> 2. The gmem 2MB from buddy allocator. (for development/testing only).
>
> The gmem 2MB from buddy allocator allocates 2MB folios from buddy for private
> memory, while shared memory is allocated from a different backend.
> (To avoid fragmentation, only private mappings are split during private-to-shared
> conversions. In this approach, the 2MB folios are always retained in the gmem
> inode filemap cache without splitting.)
>
> Since shared memory is not allocated from gmem, there're no in-place conversions.
> The reason I'm using "gmem memory attribute" is that the per-VM attribute is
> being deprecated, as suggested by Sean [1].
>

v8 of the conversions series changed that slightly: per-VM attributes are
going to stay around (because of upcoming work on RWX attributes) and
RWX will stay tracked at the VM level.

For v8 and beyond, only tracking of private/shared in per-VM attributes
is being deprecated.

By extension the entire thing about using guest_memfd for private memory
and a different backing memory for shared memory is being deprecated.

> Besides my current usage,

I think you can set up guest_memfd+2M for private memory and shared
memory from some other source, and that's the deprecated usage pattern.

> there may be other scenarios where gmem memory
> attributes is preferred without allocating shared memory from gmem.
> (e.g., PAGE.ADD from a temp extra shared source memory).
>

Is this TDH.MEM.PAGE.ADD, used indirectly from
tdx_gmem_post_populate()? This use case isn't blocked. Even if
gmem_in_place_conversion=true, you can still set src_address to
non-guest_memfd memory and load from anywhere you like.

Please let me know if that is broken! I think I accidentally used that
setup in selftests and it worked. The selftests are now defaulting to
in-place conversion.

> For such use cases, I'm concerned that admins may find it confusing if they
> enable gmem_in_place_conversion but still observe extra memory consumption for
> shared memory.
>

Hmm but I guess if someone enables gmem_in_place_conversion but still
allocates from elsewhere, they'd have to figure it out?

> [1] https://lore.kernel.org/kvm/aWmEegVP_A613WIr@google.com/
>
>> Sean, would you reconsider since Yan also asked? If the admin compiled
>> the kernel knowing what CONFIG_KVM_VM_MEMORY_ATTRIBUTES means, then the
>> admin would also be able to use a param like gmem_memory_attributes?
>>
>> There's the additional benefit that the similar naming aids in
>> understanding for both the admin and software engineers.
>>
>> Either way, in the next revision, I'll also add this documentation for
>> this module_param:
>>
>>   Setting the module parameter gmem_in_place_conversion to true enables
>>   the KVM_SET_MEMORY_ATTRIBUTES2 guest_memfd ioctl and disables
>>   the KVM_SET_MEMORY_ATTRIBUTES VM ioctl. If gmem_in_place_conversion is
>>   true, the private/shared attribute will be tracked per-guest_memfd
>>   instead of per-VM.
>>
>> Let me know what y'all think of the wording!
>>
>> >>
>> >> [...snip...]
>> >>


* Re: [PATCH RFC 10/12] KVM: guest_memfd: Clarify comment about gmem.file vs kvm->srcu
From: Sean Christopherson @ 2026-06-25 18:19 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-10-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Clarify the existing comment about synchronize_srcu() and
> kvm_gmem_get_pfn() to provide further context. Explain which
> synchronize_srcu() prevents races with how kvm_gmem_get_pfn() is used.
> 
> Also point reader to documentation for better understanding.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  virt/kvm/guest_memfd.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 69c9d6d546b28..f2218db0af980 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -711,8 +711,13 @@ static void __kvm_gmem_unbind(struct kvm_memory_slot *slot, struct gmem_file *f)
>  	xa_store_range(&f->bindings, start, end - 1, NULL, GFP_KERNEL);
>  
>  	/*
> -	 * synchronize_srcu(&kvm->srcu) ensured that kvm_gmem_get_pfn()
> -	 * cannot see this memslot.
> +	 * This is called when memslots are updated, after the old
> +	 * memslot container is no longer in
> +	 * use. synchronize_srcu(&kvm->srcu) was called there, so
> +	 * kvm_gmem_get_pfn() from KVM's guest fault handling cannot
> +	 * see this memslot. See Documentation/virt/kvm/locking.rst
> +	 * for more information about kvm->srcu and the memslots
> +	 * container.

If we want to add to this comment, I would much rather do so as part of an update
to kvm_gmem_release()'s comment as well.

https://lore.kernel.org/all/20251113232229.1698886-1-seanjc@google.com


* Re: [PATCH RFC 08/12] Documentation: KVM: Add example for kvm->srcu in relation to mutex/lock
From: Sean Christopherson @ 2026-06-25 18:17 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-8-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Add example of where vcpu->mutex and kvm->slots_lock are held while calling
> synchronize_srcu(&kvm->srcu) to concretely show where the synchronization
> primitives overlap.

Sorry, but NAK.  This is too x86-centric, and IMO the risk of the documentation
becoming stale and confusing outweighs any benefits from providing an incomplete
example.  Because like the kvm_usage_count stuff, I know the code in question,
and the example confused me and makes it harder to understand the rule(s).


* Re: [PATCH RFC 03/12] Documentation: KVM: Consolidate notes about kvm->slots_lock and irq_lock
From: Sean Christopherson @ 2026-06-25 18:12 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-3-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Move the detail about ordering between kvm->slots_lock and kvm->irq_lock to
> where the two locks are first mentioned.

Why?


* Re: [PATCH RFC 02/12] Documentation: KVM: Consolidate notes about cpu_read_lock() and kvm_lock
From: Sean Christopherson @ 2026-06-25 18:12 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-2-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> Move the detail about cpu_read_lock() and kvm_lock to where the acquisition
> order is mentioned.

Why?


* Re: [PATCH RFC 01/12] Documentation: KVM: Elaborate comment on kvm_usage_lock
From: Sean Christopherson @ 2026-06-25 18:12 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: Paolo Bonzini, Jonathan Corbet, Shuah Khan, Tianrui Zhao,
	Bibo Mao, Huacai Chen, WANG Xuerui, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, Dave Hansen, Fuad Tabba, vannapurve, x86,
	H. Peter Anvin, kvm, linux-doc, linux-kernel, loongarch
In-Reply-To: <20260527-kvm-locking-docs-v1-1-4fe8b602ff47@google.com>

On Wed, May 27, 2026, Ackerley Tng wrote:
> The original comment talks about cpus_read_lock() and kvm_usage_count, but
> doesn't explain why they are related.
> 
> Elaborate comment on kvm_usage_lock to provide more context.
> 
> Signed-off-by: Ackerley Tng <ackerleytng@google.com>
> ---
>  Documentation/virt/kvm/locking.rst | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 662231e958a07..5564c8b38b9cc 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -248,8 +248,23 @@ time it will be set using the Dirty tracking mechanism described above.
>  :Arch:		any
>  :Protects:	- kvm_usage_count
>  		- hardware virtualization enable/disable
> -:Comment:	Exists to allow taking cpus_read_lock() while kvm_usage_count is
> -		protected, which simplifies the virtualization enabling logic.
> +:Comment:       ``kvm_usage_count`` serves to deduplicate hardware
> +    virtualization enabling and disabling requests from different VMs
> +    being created.

kvm_usage_count does that and more, i.e. this is "wrong" by being incomplete.

> +
> +    Hardware virtualization enabling/disabling requires taking
> +    ``cpus_read_lock()``.
> +
> +    ``kvm_lock`` used to also protect ``kvm_usage_count``, but other
> +    parts of the Linux kernel holding ``cpus_read_lock()`` need to
> +    call into KVM to ensure that VM state remains consistent with the
> +    host's state. For example, when the CPU frequency changes, KVM is
> +    notified. ``kvmclock_cpufreq_notifier()`` takes ``kvm_lock`` to
> +    iterate ``vm_list``.
> +
> +    To decouple these, use different locks, ``kvm_lock`` for
> +    ``vm_list`` and ``kvm_usage_lock`` for enabling/disabling hardware
> +    virtualization.

I appreciate the effort, but honestly I think this does more harm than good.  I
already know what this code does, and the above confused me more than anything.

>  
>  ``kvm->mn_invalidate_lock``
>  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> -- 
> 2.54.0.823.g6e5bcc1fc9-goog
> 

* Re: [PATCH v2 16/17] KVM: TDX: Add in-kernel Quote generation
From: Sean Christopherson @ 2026-06-25 18:01 UTC (permalink / raw)
  To: Xu Yilun
  Cc: x86, kvm, linux-coco, linux-kernel, djbw, kas, rick.p.edgecombe,
	yilun.xu, xiaoyao.li, sohil.mehta, adrian.hunter, kishen.maloor,
	tony.lindgren, peter.fang, baolu.lu, zhenzhong.duan, dave.hansen,
	dave.hansen
In-Reply-To: <20260618081355.3253581-17-yilun.xu@linux.intel.com>

On Thu, Jun 18, 2026, Xu Yilun wrote:
> From: Peter Fang <peter.fang@intel.com>
> 
> Provide an in-kernel path for Quote generation when handling
> TDG.VP.VMCALL<GetQuote>, without requiring an exit to userspace.

Why?

* Re: [GIT PULL] KVM fixes for Linux 7.2-rc1
From: pr-tracker-bot @ 2026-06-25 17:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: torvalds, linux-kernel, kvm
In-Reply-To: <20260625122325.100485-1-pbonzini@redhat.com>

The pull request you sent on Thu, 25 Jun 2026 08:23:23 -0400:

> https://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/c75597caada080effbfbc0a7fb10dc2a3bb543ad

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html

* Re: [PATCH 00/34] Add migration support to the MSHV accelerator
From: Paolo Bonzini @ 2026-06-25 17:35 UTC (permalink / raw)
  To: Magnus Kulke
  Cc: qemu-devel, kvm, Magnus Kulke, Wei Liu, Michael S . Tsirkin,
	Cédric Le Goater, Zhao Liu, Richard Henderson, Wei Liu,
	Alex Williamson, Marcel Apfelbaum, Philippe Mathieu-Daudé,
	Marcelo Tosatti
In-Reply-To: <20260417105618.3621-1-magnuskulke@linux.microsoft.com>

I merged everything except:

- target/i386/mshv: migrate LAPIC state (*)
- target/i386/mshv: migrate Synic SINT MSRs
- target/i386/mshv: migrate SIMP and SIEFP state
- target/i386/mshv: migrate STIMER state
- accel/mshv: introduce SaveVMHandler (*)
- accel/mshv: write synthetic MSRs after migration
- accel/mshv: migrate REFERENCE_TIME
- target/i386/mshv: migrate MP_STATE

The ones marked (*) were the important ones that I'll reply to, the rest are
collateral damage.

I also removed TSC_DEADLINE from the MSRs patch, since it's more of a local
APIC register and it has a dependency on "migrate LAPIC state".

Paolo


* [PATCH v2 kvmtool 4/4] arm64: Add support for protected VMs
From: Will Deacon @ 2026-06-25 17:10 UTC (permalink / raw)
  To: kvm
  Cc: kvmarm, Will Deacon, Alexandru Elisei, Suzuki K Poulose,
	Andre Przywara, Fuad Tabba, Oliver Upton, Marc Zyngier
In-Reply-To: <20260625171046.4482-1-will@kernel.org>

Introduce a new '--protected' parameter which requests the creation of
a protected VM type from the kernel. In addition, a reserved DMA region
is advertised in the device-tree and VIRTIO_F_ACCESS_PLATFORM is
advertised so that virtio transfers can be bounced through a shared
memory window.

Signed-off-by: Will Deacon <will@kernel.org>
---
 arm64/fdt.c                         | 37 +++++++++++++++++++++++++++--
 arm64/include/kvm/fdt-arch.h        | 10 +++++++-
 arm64/include/kvm/kvm-arch.h        |  2 ++
 arm64/include/kvm/kvm-config-arch.h |  5 +++-
 arm64/kvm.c                         | 28 ++++++++++++++++++++--
 arm64/pci.c                         |  2 ++
 6 files changed, 78 insertions(+), 6 deletions(-)

diff --git a/arm64/fdt.c b/arm64/fdt.c
index 98f1dd9..3cbd36e 100644
--- a/arm64/fdt.c
+++ b/arm64/fdt.c
@@ -71,6 +71,19 @@ static void generate_irq_prop(void *fdt, u8 irq, enum irq_type irq_type)
 	_FDT(fdt_property(fdt, "interrupts", irq_prop, sizeof(irq_prop)));
 }
 
+static bool emit_dma_regions;
+void generate_dma_region_prop(void *fdt)
+{
+	if (emit_dma_regions)
+		_FDT(fdt_property_cell(fdt, "memory-region", PHANDLE_DMA));
+}
+
+static void generate_aux_props(void *fdt, u8 irq, enum irq_type irq_type)
+{
+	generate_irq_prop(fdt, irq, irq_type);
+	generate_dma_region_prop(fdt);
+}
+
 struct psci_fns {
 	u32 cpu_suspend;
 	u32 cpu_off;
@@ -103,7 +116,7 @@ static int setup_fdt(struct kvm *kvm)
 {
 	struct device_header *dev_hdr;
 	u8 staging_fdt[FDT_MAX_SIZE];
-	u64 mem_reg_prop[]	= {
+	u64 resv_mem_prop, mem_reg_prop[] = {
 		cpu_to_fdt64(kvm->arch.memory_guest_start),
 		cpu_to_fdt64(kvm->ram_size),
 	};
@@ -116,6 +129,9 @@ static int setup_fdt(struct kvm *kvm)
 	void (*generate_cpu_peripheral_fdt_nodes)(void *, struct kvm *)
 					= kvm->cpus[0]->generate_fdt_nodes;
 
+	/* Generate DMA regions for bouncing in protected VMs */
+	emit_dma_regions = kvm->cfg.arch.protected;
+
 	/* Create new tree without a reserve map */
 	_FDT(fdt_create(fdt, FDT_MAX_SIZE));
 	_FDT(fdt_finish_reservemap(fdt));
@@ -162,6 +178,23 @@ static int setup_fdt(struct kvm *kvm)
 	_FDT(fdt_property(fdt, "reg", mem_reg_prop, sizeof(mem_reg_prop)));
 	_FDT(fdt_end_node(fdt));
 
+	/* Reserved memory (restricted DMA pool) */
+	if (emit_dma_regions) {
+		_FDT(fdt_begin_node(fdt, "reserved-memory"));
+		_FDT(fdt_property_cell(fdt, "#address-cells", 0x2));
+		_FDT(fdt_property_cell(fdt, "#size-cells", 0x2));
+		_FDT(fdt_property(fdt, "ranges", NULL, 0));
+
+		_FDT(fdt_begin_node(fdt, "restricted_dma_reserved"));
+		_FDT(fdt_property_string(fdt, "compatible", "restricted-dma-pool"));
+		resv_mem_prop = cpu_to_fdt64(DMA_MEM_REGION_SIZE);
+		_FDT(fdt_property(fdt, "size", &resv_mem_prop, sizeof(resv_mem_prop)));
+		_FDT(fdt_property_cell(fdt, "phandle", PHANDLE_DMA));
+		_FDT(fdt_end_node(fdt));
+
+		_FDT(fdt_end_node(fdt));
+	}
+
 	/* CPU and peripherals (interrupt controller, timers, etc) */
 	generate_cpu_nodes(fdt, kvm);
 	if (generate_cpu_peripheral_fdt_nodes)
@@ -172,7 +205,7 @@ static int setup_fdt(struct kvm *kvm)
 	while (dev_hdr) {
 		generate_mmio_fdt_nodes = dev_hdr->data;
 		if (generate_mmio_fdt_nodes) {
-			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_irq_prop);
+			generate_mmio_fdt_nodes(fdt, dev_hdr, generate_aux_props);
 		} else {
 			pr_debug("Missing FDT node generator for MMIO device %d",
 				 dev_hdr->dev_num);
diff --git a/arm64/include/kvm/fdt-arch.h b/arm64/include/kvm/fdt-arch.h
index 60c2d40..8a0a460 100644
--- a/arm64/include/kvm/fdt-arch.h
+++ b/arm64/include/kvm/fdt-arch.h
@@ -1,6 +1,14 @@
 #ifndef ARM__FDT_H
 #define ARM__FDT_H
 
-enum phandles {PHANDLE_RESERVED = 0, PHANDLE_GIC, PHANDLE_MSI, PHANDLES_MAX};
+enum phandles {
+	PHANDLE_RESERVED = 0,
+	PHANDLE_GIC,
+	PHANDLE_MSI,
+	PHANDLE_DMA,
+	PHANDLES_MAX
+};
+
+void generate_dma_region_prop(void *fdt);
 
 #endif /* ARM__FDT_H */
diff --git a/arm64/include/kvm/kvm-arch.h b/arm64/include/kvm/kvm-arch.h
index a50e6a4..e7dd526 100644
--- a/arm64/include/kvm/kvm-arch.h
+++ b/arm64/include/kvm/kvm-arch.h
@@ -87,6 +87,8 @@
 
 #define MAX_PAGE_SIZE	SZ_64K
 
+/* Size of DMA region for bouncing when running a protected guest */
+#define DMA_MEM_REGION_SIZE	SZ_32M
 
 static inline bool arm_addr_in_ioport_region(u64 phys_addr)
 {
diff --git a/arm64/include/kvm/kvm-config-arch.h b/arm64/include/kvm/kvm-config-arch.h
index d321b77..c2702d5 100644
--- a/arm64/include/kvm/kvm-config-arch.h
+++ b/arm64/include/kvm/kvm-config-arch.h
@@ -19,6 +19,7 @@ struct kvm_config_arch {
 	unsigned int	sve_max_vq;
 	bool		no_pvtime;
 	bool		psci;
+	bool		protected;
 };
 
 int irqchip_parser(const struct option *opt, const char *arg, int unset);
@@ -70,6 +71,8 @@ int sve_vl_parser(const struct option *opt, const char *arg, int unset);
 	OPT_BOOLEAN('\0', "nested", &(cfg)->nested_virt,			\
 		    "Start VCPUs in EL2 (for nested virt)"),			\
 	OPT_BOOLEAN('\0', "e2h0", &(cfg)->e2h0,					\
-		    "Create guest without VHE support"),
+		    "Create guest without VHE support"),			\
+	OPT_BOOLEAN('\0', "protected", &(cfg)->protected,			\
+			"Create a protected VM when pKVM is enabled"),
 
 #endif /* ARM_COMMON__KVM_CONFIG_ARCH_H */
diff --git a/arm64/kvm.c b/arm64/kvm.c
index c8570ce..fb0b98d 100644
--- a/arm64/kvm.c
+++ b/arm64/kvm.c
@@ -6,6 +6,7 @@
 #include "kvm/fdt.h"
 #include "kvm/gic.h"
 #include "kvm/kvm-cpu.h"
+#include "kvm/virtio.h"
 
 #include "asm/smccc.h"
 
@@ -147,6 +148,9 @@ void kvm__arch_init(struct kvm *kvm)
 	kvm__arch_enable_mte(kvm);
 	kvm__setup_smccc(kvm);
 	kvm__arch_set_counter_offset(kvm);
+
+	if (kvm->cfg.arch.protected)
+		virtio_modern_enable_feat_access_platform();
 }
 
 
@@ -463,6 +467,22 @@ void kvm__arch_validate_cfg(struct kvm *kvm)
 
 	if (kvm->cfg.arch.e2h0 && !kvm->cfg.arch.nested_virt)
 		pr_warning("--e2h0 requires --nested, ignoring");
+
+	if (kvm->cfg.arch.protected) {
+		if (kvm->cfg.ram_size &&
+		    kvm->cfg.ram_size < DMA_MEM_REGION_SIZE) {
+			die("RAM size (0x%llx) smaller than DMA bounce region (0x%x)",
+			    kvm->cfg.ram_size, DMA_MEM_REGION_SIZE);
+		}
+
+		if (kvm->cfg.virtio_transport == VIRTIO_MMIO_LEGACY ||
+		    kvm->cfg.virtio_transport == VIRTIO_PCI_LEGACY) {
+			die("Protected VMs require a modern virtio transport");
+		}
+
+		if (kvm->cfg.balloon)
+			die("Ballooning not supported with protected VMs");
+	}
 }
 
 u64 kvm__arch_default_ram_address(void)
@@ -485,11 +505,15 @@ int kvm__get_vm_type(struct kvm *kvm)
 {
 	unsigned int ipa_bits, max_ipa_bits;
 	unsigned long max_ipa;
+	int type = 0;
+
+	if (kvm->cfg.arch.protected)
+		type |= KVM_VM_TYPE_ARM_PROTECTED;
 
 	/* If we're running on an old kernel, use 0 as the VM type */
 	max_ipa_bits = kvm__arch_get_ipa_limit(kvm);
 	if (!max_ipa_bits)
-		return 0;
+		return type;
 
 	/* Otherwise, compute the minimal required IPA size */
 	max_ipa = kvm->cfg.ram_addr + kvm->cfg.ram_size - 1;
@@ -500,7 +524,7 @@ int kvm__get_vm_type(struct kvm *kvm)
 	if (ipa_bits > max_ipa_bits)
 		die("Memory too large for this system (needs %d bits, %d available)", ipa_bits, max_ipa_bits);
 
-	return KVM_VM_TYPE_ARM_IPA_SIZE(ipa_bits);
+	return type | KVM_VM_TYPE_ARM_IPA_SIZE(ipa_bits);
 }
 
 static int kvm__arch_free_kernel_header(struct kvm *kvm)
diff --git a/arm64/pci.c b/arm64/pci.c
index 0366783..db87db8 100644
--- a/arm64/pci.c
+++ b/arm64/pci.c
@@ -73,6 +73,8 @@ void pci__generate_fdt_nodes(void *fdt, struct kvm *kvm)
 	if (irqchip == IRQCHIP_GICV2M || irqchip == IRQCHIP_GICV3_ITS)
 		_FDT(fdt_property_cell(fdt, "msi-parent", PHANDLE_MSI));
 
+	generate_dma_region_prop(fdt);
+
 	/* Generate the interrupt map ... */
 	dev_hdr = device__first_dev(DEVICE_BUS_PCI);
 	while (dev_hdr && nentries < ARRAY_SIZE(irq_map)) {
-- 
2.55.0.rc0.799.gd6f94ed593-goog


* [PATCH v2 kvmtool 3/4] virtio: Add helper for enabling VIRTIO_F_ACCESS_PLATFORM
From: Will Deacon @ 2026-06-25 17:10 UTC (permalink / raw)
  To: kvm
  Cc: kvmarm, Will Deacon, Alexandru Elisei, Suzuki K Poulose,
	Andre Przywara, Fuad Tabba, Oliver Upton, Marc Zyngier
In-Reply-To: <20260625171046.4482-1-will@kernel.org>

In preparation for supporting protected guests on arm64, introduce a
virtio helper to advertise the VIRTIO_F_ACCESS_PLATFORM feature for the
modern transport.

Tested-by: Fuad Tabba <fuad.tabba@linux.dev>
Reviewed-by: Fuad Tabba <fuad.tabba@linux.dev>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/kvm/virtio.h | 1 +
 virtio/core.c        | 5 +++++
 2 files changed, 6 insertions(+)

diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index c95183b..dd3117b 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -277,6 +277,7 @@ void virtio_vhost_reset_vring(struct kvm *kvm, int vhost_fd, u32 index,
 int virtio_vhost_set_features(int vhost_fd, u64 features);
 
 int virtio_transport_parser(const struct option *opt, const char *arg, int unset);
+void virtio_modern_enable_feat_access_platform(void);
 u64 virtio_get_modern_transport_features(void);
 
 #endif /* KVM__VIRTIO_H */
diff --git a/virtio/core.c b/virtio/core.c
index 8c5086d..96bbf96 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -355,6 +355,11 @@ bool virtio_access_config(struct kvm *kvm, struct virtio_device *vdev,
 
 static u64 virtio_modern_transport_features = 1ULL << VIRTIO_F_VERSION_1;
 
+void virtio_modern_enable_feat_access_platform(void)
+{
+	virtio_modern_transport_features |= 1ULL << VIRTIO_F_ACCESS_PLATFORM;
+}
+
 u64 virtio_get_modern_transport_features(void)
 {
 	return virtio_modern_transport_features;
-- 
2.55.0.rc0.799.gd6f94ed593-goog


* [PATCH v2 kvmtool 2/4] virtio: Factor out base features for modern virtio transports
From: Will Deacon @ 2026-06-25 17:10 UTC (permalink / raw)
  To: kvm
  Cc: kvmarm, Will Deacon, Alexandru Elisei, Suzuki K Poulose,
	Andre Przywara, Fuad Tabba, Oliver Upton, Marc Zyngier
In-Reply-To: <20260625171046.4482-1-will@kernel.org>

In preparation for optionally enabling VIRTIO_F_ACCESS_PLATFORM,
factor out the base features for modern virtio transports.

Tested-by: Fuad Tabba <fuad.tabba@linux.dev>
Reviewed-by: Fuad Tabba <fuad.tabba@linux.dev>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
 include/kvm/virtio.h | 1 +
 virtio/core.c        | 7 +++++++
 virtio/mmio-modern.c | 2 +-
 virtio/pci-modern.c  | 2 +-
 4 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/kvm/virtio.h b/include/kvm/virtio.h
index 8b7ec1b..c95183b 100644
--- a/include/kvm/virtio.h
+++ b/include/kvm/virtio.h
@@ -277,5 +277,6 @@ void virtio_vhost_reset_vring(struct kvm *kvm, int vhost_fd, u32 index,
 int virtio_vhost_set_features(int vhost_fd, u64 features);
 
 int virtio_transport_parser(const struct option *opt, const char *arg, int unset);
+u64 virtio_get_modern_transport_features(void);
 
 #endif /* KVM__VIRTIO_H */
diff --git a/virtio/core.c b/virtio/core.c
index 50c9ddd..8c5086d 100644
--- a/virtio/core.c
+++ b/virtio/core.c
@@ -353,6 +353,13 @@ bool virtio_access_config(struct kvm *kvm, struct virtio_device *vdev,
 	return true;
 }
 
+static u64 virtio_modern_transport_features = 1ULL << VIRTIO_F_VERSION_1;
+
+u64 virtio_get_modern_transport_features(void)
+{
+	return virtio_modern_transport_features;
+}
+
 int virtio_init(struct kvm *kvm, void *dev, struct virtio_device *vdev,
 		struct virtio_ops *ops, enum virtio_trans trans,
 		int device_id, int subsys_id, int class)
diff --git a/virtio/mmio-modern.c b/virtio/mmio-modern.c
index 6c0bb38..7787508 100644
--- a/virtio/mmio-modern.c
+++ b/virtio/mmio-modern.c
@@ -11,7 +11,7 @@ static void virtio_mmio_config_in(struct kvm_cpu *vcpu,
 				  struct virtio_device *vdev)
 {
 	struct virtio_mmio *vmmio = vdev->virtio;
-	u64 features = 1ULL << VIRTIO_F_VERSION_1;
+	u64 features = virtio_get_modern_transport_features();
 	u32 val = 0;
 
 	switch (addr) {
diff --git a/virtio/pci-modern.c b/virtio/pci-modern.c
index ef2f3e2..888afa5 100644
--- a/virtio/pci-modern.c
+++ b/virtio/pci-modern.c
@@ -148,7 +148,7 @@ static bool virtio_pci__common_read(struct virtio_device *vdev,
 {
 	u32 val;
 	struct virtio_pci *vpci = vdev->virtio;
-	u64 features = 1ULL << VIRTIO_F_VERSION_1;
+	u64 features = virtio_get_modern_transport_features();
 
 	switch (offset - VPCI_CFG_COMMON_START) {
 	case VIRTIO_PCI_COMMON_DFSELECT:
-- 
2.55.0.rc0.799.gd6f94ed593-goog


* [PATCH v2 kvmtool 1/4] Sync kernel UAPI headers with v7.1
From: Will Deacon @ 2026-06-25 17:10 UTC (permalink / raw)
  To: kvm
  Cc: kvmarm, Will Deacon, Alexandru Elisei, Suzuki K Poulose,
	Andre Przywara, Fuad Tabba, Oliver Upton, Marc Zyngier
In-Reply-To: <20260625171046.4482-1-will@kernel.org>

Generated using util/update_headers.sh.

Tested-by: Fuad Tabba <fuad.tabba@linux.dev>
Reviewed-by: Fuad Tabba <fuad.tabba@linux.dev>
Signed-off-by: Will Deacon <will@kernel.org>
---
 arm64/include/asm/kvm.h     |  1 +
 include/linux/kvm.h         | 53 ++++++++++++++++++++++++++++++++-----
 include/linux/virtio_ring.h |  5 +---
 riscv/include/asm/kvm.h     | 11 +++++---
 x86/include/asm/kvm.h       | 21 +++++++++------
 5 files changed, 69 insertions(+), 22 deletions(-)

diff --git a/arm64/include/asm/kvm.h b/arm64/include/asm/kvm.h
index a792a59..1c13bfa 100644
--- a/arm64/include/asm/kvm.h
+++ b/arm64/include/asm/kvm.h
@@ -428,6 +428,7 @@ enum {
 #define   KVM_DEV_ARM_ITS_RESTORE_TABLES        2
 #define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES	3
 #define   KVM_DEV_ARM_ITS_CTRL_RESET		4
+#define   KVM_DEV_ARM_VGIC_USERSPACE_PPIS	5
 
 /* Device Control API on vcpu fd */
 #define KVM_ARM_VCPU_PMU_V3_CTRL	0
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index dddb781..6c8afa2 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -11,9 +11,14 @@
 #include <linux/const.h>
 #include <linux/types.h>
 #include <linux/compiler.h>
+#include <linux/stddef.h>
 #include <linux/ioctl.h>
 #include <asm/kvm.h>
 
+#ifdef __KERNEL__
+#include <linux/kvm_types.h>
+#endif
+
 #define KVM_API_VERSION 12
 
 /*
@@ -135,6 +140,12 @@ struct kvm_xen_exit {
 	} u;
 };
 
+struct kvm_exit_snp_req_certs {
+	__u64 gpa;
+	__u64 npages;
+	__u64 ret;
+};
+
 #define KVM_S390_GET_SKEYS_NONE   1
 #define KVM_S390_SKEYS_MAX        1048576
 
@@ -180,6 +191,8 @@ struct kvm_xen_exit {
 #define KVM_EXIT_MEMORY_FAULT     39
 #define KVM_EXIT_TDX              40
 #define KVM_EXIT_ARM_SEA          41
+#define KVM_EXIT_ARM_LDST64B      42
+#define KVM_EXIT_SNP_REQ_CERTS    43
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -402,7 +415,7 @@ struct kvm_run {
 		} eoi;
 		/* KVM_EXIT_HYPERV */
 		struct kvm_hyperv_exit hyperv;
-		/* KVM_EXIT_ARM_NISV */
+		/* KVM_EXIT_ARM_NISV / KVM_EXIT_ARM_LDST64B */
 		struct {
 			__u64 esr_iss;
 			__u64 fault_ipa;
@@ -482,6 +495,8 @@ struct kvm_run {
 			__u64 gva;
 			__u64 gpa;
 		} arm_sea;
+		/* KVM_EXIT_SNP_REQ_CERTS */
+		struct kvm_exit_snp_req_certs snp_req_certs;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -528,7 +543,7 @@ struct kvm_coalesced_mmio {
 
 struct kvm_coalesced_mmio_ring {
 	__u32 first, last;
-	struct kvm_coalesced_mmio coalesced_mmio[];
+	__DECLARE_FLEX_ARRAY(struct kvm_coalesced_mmio, coalesced_mmio);
 };
 
 #define KVM_COALESCED_MMIO_MAX \
@@ -578,7 +593,7 @@ struct kvm_clear_dirty_log {
 /* for KVM_SET_SIGNAL_MASK */
 struct kvm_signal_mask {
 	__u32 len;
-	__u8  sigset[];
+	__DECLARE_FLEX_ARRAY(__u8, sigset);
 };
 
 /* for KVM_TPR_ACCESS_REPORTING */
@@ -689,6 +704,11 @@ struct kvm_enable_cap {
 #define KVM_VM_TYPE_ARM_IPA_SIZE_MASK	0xffULL
 #define KVM_VM_TYPE_ARM_IPA_SIZE(x)		\
 	((x) & KVM_VM_TYPE_ARM_IPA_SIZE_MASK)
+
+#define KVM_VM_TYPE_ARM_PROTECTED	(1UL << 31)
+#define KVM_VM_TYPE_ARM_MASK		(KVM_VM_TYPE_ARM_IPA_SIZE_MASK | \
+					 KVM_VM_TYPE_ARM_PROTECTED)
+
 /*
  * ioctls for /dev/kvm fds:
  */
@@ -974,6 +994,8 @@ struct kvm_enable_cap {
 #define KVM_CAP_GUEST_MEMFD_FLAGS 244
 #define KVM_CAP_ARM_SEA_TO_USER 245
 #define KVM_CAP_S390_USER_OPEREXEC 246
+#define KVM_CAP_S390_KEYOP 247
+#define KVM_CAP_S390_VSIE_ESAMODE 248
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1036,7 +1058,7 @@ struct kvm_irq_routing_entry {
 struct kvm_irq_routing {
 	__u32 nr;
 	__u32 flags;
-	struct kvm_irq_routing_entry entries[];
+	__DECLARE_FLEX_ARRAY(struct kvm_irq_routing_entry, entries);
 };
 
 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
@@ -1127,7 +1149,7 @@ struct kvm_dirty_tlb {
 
 struct kvm_reg_list {
 	__u64 n; /* number of regs */
-	__u64 reg[];
+	__DECLARE_FLEX_ARRAY(__u64, reg);
 };
 
 struct kvm_one_reg {
@@ -1209,6 +1231,10 @@ enum kvm_device_type {
 #define KVM_DEV_TYPE_LOONGARCH_EIOINTC	KVM_DEV_TYPE_LOONGARCH_EIOINTC
 	KVM_DEV_TYPE_LOONGARCH_PCHPIC,
 #define KVM_DEV_TYPE_LOONGARCH_PCHPIC	KVM_DEV_TYPE_LOONGARCH_PCHPIC
+	KVM_DEV_TYPE_LOONGARCH_DMSINTC,
+#define KVM_DEV_TYPE_LOONGARCH_DMSINTC	KVM_DEV_TYPE_LOONGARCH_DMSINTC
+	KVM_DEV_TYPE_ARM_VGIC_V5,
+#define KVM_DEV_TYPE_ARM_VGIC_V5	KVM_DEV_TYPE_ARM_VGIC_V5
 
 	KVM_DEV_TYPE_MAX,
 
@@ -1219,6 +1245,16 @@ struct kvm_vfio_spapr_tce {
 	__s32	tablefd;
 };
 
+#define KVM_S390_KEYOP_ISKE 0x01
+#define KVM_S390_KEYOP_RRBE 0x02
+#define KVM_S390_KEYOP_SSKE 0x03
+struct kvm_s390_keyop {
+	__u64 guest_addr;
+	__u8  key;
+	__u8  operation;
+	__u8  pad[6];
+};
+
 /*
  * KVM_CREATE_VCPU receives as a parameter the vcpu slot, and returns
  * a vcpu fd.
@@ -1238,6 +1274,7 @@ struct kvm_vfio_spapr_tce {
 #define KVM_S390_UCAS_MAP        _IOW(KVMIO, 0x50, struct kvm_s390_ucas_mapping)
 #define KVM_S390_UCAS_UNMAP      _IOW(KVMIO, 0x51, struct kvm_s390_ucas_mapping)
 #define KVM_S390_VCPU_FAULT	 _IOW(KVMIO, 0x52, unsigned long)
+#define KVM_S390_KEYOP           _IOWR(KVMIO, 0x53, struct kvm_s390_keyop)
 
 /* Device model IOC */
 #define KVM_CREATE_IRQCHIP        _IO(KVMIO,   0x60)
@@ -1579,7 +1616,11 @@ struct kvm_stats_desc {
 	__u16 size;
 	__u32 offset;
 	__u32 bucket_size;
-	char name[];
+#ifdef __KERNEL__
+	char name[KVM_STATS_NAME_SIZE];
+#else
+	__DECLARE_FLEX_ARRAY(char, name);
+#endif
 };
 
 #define KVM_GET_STATS_FD  _IO(KVMIO,  0xce)
diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h
index f8c20d3..3c47858 100644
--- a/include/linux/virtio_ring.h
+++ b/include/linux/virtio_ring.h
@@ -31,9 +31,6 @@
  * SUCH DAMAGE.
  *
  * Copyright Rusty Russell IBM Corporation 2007. */
-#ifndef __KERNEL__
-#include <stdint.h>
-#endif
 #include <linux/types.h>
 #include <linux/virtio_types.h>
 
@@ -202,7 +199,7 @@ static inline void vring_init(struct vring *vr, unsigned int num, void *p,
 	vr->num = num;
 	vr->desc = p;
 	vr->avail = (struct vring_avail *)((char *)p + num * sizeof(struct vring_desc));
-	vr->used = (void *)(((uintptr_t)&vr->avail->ring[num] + sizeof(__virtio16)
+	vr->used = (void *)(((unsigned long)&vr->avail->ring[num] + sizeof(__virtio16)
 		+ align-1) & ~(align - 1));
 }
 
diff --git a/riscv/include/asm/kvm.h b/riscv/include/asm/kvm.h
index 54f3ad7..504e733 100644
--- a/riscv/include/asm/kvm.h
+++ b/riscv/include/asm/kvm.h
@@ -110,6 +110,10 @@ struct kvm_riscv_timer {
 	__u64 state;
 };
 
+/* Possible states for kvm_riscv_timer */
+#define KVM_RISCV_TIMER_STATE_OFF	0
+#define KVM_RISCV_TIMER_STATE_ON	1
+
 /*
  * ISA extension IDs specific to KVM. This is not the same as the host ISA
  * extension IDs as that is internal to the host and should not be exposed
@@ -192,6 +196,9 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_ZFBFMIN,
 	KVM_RISCV_ISA_EXT_ZVFBFMIN,
 	KVM_RISCV_ISA_EXT_ZVFBFWMA,
+	KVM_RISCV_ISA_EXT_ZCLSD,
+	KVM_RISCV_ISA_EXT_ZILSD,
+	KVM_RISCV_ISA_EXT_ZALASR,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
@@ -235,10 +242,6 @@ struct kvm_riscv_sbi_fwft {
 	struct kvm_riscv_sbi_fwft_feature pointer_masking;
 };
 
-/* Possible states for kvm_riscv_timer */
-#define KVM_RISCV_TIMER_STATE_OFF	0
-#define KVM_RISCV_TIMER_STATE_ON	1
-
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_RISCV_TYPE_MASK		0x00000000FF000000
 #define KVM_REG_RISCV_TYPE_SHIFT	24
diff --git a/x86/include/asm/kvm.h b/x86/include/asm/kvm.h
index 7ceff65..5f2b30d 100644
--- a/x86/include/asm/kvm.h
+++ b/x86/include/asm/kvm.h
@@ -197,13 +197,13 @@ struct kvm_msrs {
 	__u32 nmsrs; /* number of msrs in entries */
 	__u32 pad;
 
-	struct kvm_msr_entry entries[];
+	__DECLARE_FLEX_ARRAY(struct kvm_msr_entry, entries);
 };
 
 /* for KVM_GET_MSR_INDEX_LIST */
 struct kvm_msr_list {
 	__u32 nmsrs; /* number of msrs in entries */
-	__u32 indices[];
+	__DECLARE_FLEX_ARRAY(__u32, indices);
 };
 
 /* Maximum size of any access bitmap in bytes */
@@ -245,7 +245,7 @@ struct kvm_cpuid_entry {
 struct kvm_cpuid {
 	__u32 nent;
 	__u32 padding;
-	struct kvm_cpuid_entry entries[];
+	__DECLARE_FLEX_ARRAY(struct kvm_cpuid_entry, entries);
 };
 
 struct kvm_cpuid_entry2 {
@@ -267,7 +267,7 @@ struct kvm_cpuid_entry2 {
 struct kvm_cpuid2 {
 	__u32 nent;
 	__u32 padding;
-	struct kvm_cpuid_entry2 entries[];
+	__DECLARE_FLEX_ARRAY(struct kvm_cpuid_entry2, entries);
 };
 
 /* for KVM_GET_PIT and KVM_SET_PIT */
@@ -398,7 +398,7 @@ struct kvm_xsave {
 	 * the contents of CPUID leaf 0xD on the host.
 	 */
 	__u32 region[1024];
-	__u32 extra[];
+	__DECLARE_FLEX_ARRAY(__u32, extra);
 };
 
 #define KVM_MAX_XCRS	16
@@ -476,6 +476,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS	(1 << 8)
 #define KVM_X86_QUIRK_IGNORE_GUEST_PAT		(1 << 9)
+#define KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM (1 << 10)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
@@ -503,6 +504,7 @@ struct kvm_sync_regs {
 #define KVM_X86_GRP_SEV			1
 #  define KVM_X86_SEV_VMSA_FEATURES	0
 #  define KVM_X86_SNP_POLICY_BITS	1
+#  define KVM_X86_SEV_SNP_REQ_CERTS	2
 
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
@@ -564,7 +566,7 @@ struct kvm_pmu_event_filter {
 	__u32 fixed_counter_bitmap;
 	__u32 flags;
 	__u32 pad[4];
-	__u64 events[];
+	__DECLARE_FLEX_ARRAY(__u64, events);
 };
 
 #define KVM_PMU_EVENT_ALLOW 0
@@ -743,6 +745,7 @@ enum sev_cmd_id {
 	KVM_SEV_SNP_LAUNCH_START = 100,
 	KVM_SEV_SNP_LAUNCH_UPDATE,
 	KVM_SEV_SNP_LAUNCH_FINISH,
+	KVM_SEV_SNP_ENABLE_REQ_CERTS,
 
 	KVM_SEV_NR_MAX,
 };
@@ -914,8 +917,10 @@ struct kvm_sev_snp_launch_finish {
 	__u64 pad1[4];
 };
 
-#define KVM_X2APIC_API_USE_32BIT_IDS            (1ULL << 0)
-#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK  (1ULL << 1)
+#define KVM_X2APIC_API_USE_32BIT_IDS			_BITULL(0)
+#define KVM_X2APIC_API_DISABLE_BROADCAST_QUIRK		_BITULL(1)
+#define KVM_X2APIC_ENABLE_SUPPRESS_EOI_BROADCAST	_BITULL(2)
+#define KVM_X2APIC_DISABLE_SUPPRESS_EOI_BROADCAST	_BITULL(3)
 
 struct kvm_hyperv_eventfd {
 	__u32 conn_id;
-- 
2.55.0.rc0.799.gd6f94ed593-goog


* [PATCH v2 kvmtool 0/4] Add support for running protected VMs on arm64
From: Will Deacon @ 2026-06-25 17:10 UTC (permalink / raw)
  To: kvm
  Cc: kvmarm, Will Deacon, Alexandru Elisei, Suzuki K Poulose,
	Andre Przywara, Fuad Tabba, Oliver Upton, Marc Zyngier

Hi folks,

This is v2 of the patches I previously posted here:

  https://lore.kernel.org/r/20260619115415.5475-1-will@kernel.org

Changes since v1 include:

  * Bail if user specifies less guest memory than the restricted DMA pool.
  * Avoid silently dropping KVM_VM_TYPE_ARM_PROTECTED on old host kernels.
  * Added R-b/T-b tags (thank you!)
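
For reference, the second item amounts to composing the VM type roughly as
below. This is a simplified, standalone sketch rather than the kvmtool code:
the sketch_* names are made up for illustration, the constants mirror
KVM_VM_TYPE_ARM_PROTECTED and KVM_VM_TYPE_ARM_IPA_SIZE() from the synced UAPI
header, and the check that the requested IPA size fits the host limit is left
out.

```c
#include <stdbool.h>

/* Illustrative copies of the UAPI values; the names are hypothetical. */
#define SKETCH_IPA_SIZE_MASK	0xffULL
#define SKETCH_IPA_SIZE(x)	((x) & SKETCH_IPA_SIZE_MASK)
#define SKETCH_PROTECTED	(1UL << 31)

/*
 * Build the VM type from the protected flag and the IPA size.
 * max_ipa_bits == 0 models an old host kernel that reports no IPA limit.
 */
static unsigned long sketch_vm_type(bool protected_vm,
				    unsigned int max_ipa_bits,
				    unsigned int ipa_bits)
{
	unsigned long type = 0;

	if (protected_vm)
		type |= SKETCH_PROTECTED;

	/* Old kernel: no IPA size field, but the protected flag is kept. */
	if (!max_ipa_bits)
		return type;

	return type | SKETCH_IPA_SIZE(ipa_bits);
}
```

The point of the change is the early return: it hands back the accumulated
flags instead of a bare 0, so a request for a protected VM is never quietly
downgraded.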

The patches are also available here if you want to pull them directly:

  https://git.kernel.org/pub/scm/linux/kernel/git/will/kvmtool.git/log/?h=pkvm

Cheers,

Will

Cc: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Fuad Tabba <fuad.tabba@linux.dev>
Cc: Oliver Upton <oliver.upton@linux.dev>
Cc: Marc Zyngier <maz@kernel.org>

--->8

Will Deacon (4):
  Sync kernel UAPI headers with v7.1
  virtio: Factor out base features for modern virtio transports
  virtio: Add helper for enabling VIRTIO_F_ACCESS_PLATFORM
  arm64: Add support for protected VMs

 arm64/fdt.c                         | 37 ++++++++++++++++++--
 arm64/include/asm/kvm.h             |  1 +
 arm64/include/kvm/fdt-arch.h        | 10 +++++-
 arm64/include/kvm/kvm-arch.h        |  2 ++
 arm64/include/kvm/kvm-config-arch.h |  5 ++-
 arm64/kvm.c                         | 28 +++++++++++++--
 arm64/pci.c                         |  2 ++
 include/kvm/virtio.h                |  2 ++
 include/linux/kvm.h                 | 53 +++++++++++++++++++++++++----
 include/linux/virtio_ring.h         |  5 +--
 riscv/include/asm/kvm.h             | 11 +++---
 virtio/core.c                       | 12 +++++++
 virtio/mmio-modern.c                |  2 +-
 virtio/pci-modern.c                 |  2 +-
 x86/include/asm/kvm.h               | 21 +++++++-----
 15 files changed, 163 insertions(+), 30 deletions(-)

-- 
2.55.0.rc0.799.gd6f94ed593-goog


* Re: [RFC PATCH v2 0/4] KVM: x86: TDX: Validate directly configurable CPUID bits
From: Sean Christopherson @ 2026-06-25 17:04 UTC (permalink / raw)
  To: Binbin Wu
  Cc: kvm, linux-kernel, pbonzini, rick.p.edgecombe, xiaoyao.li,
	chao.gao, kai.huang
In-Reply-To: <eba94468-2507-4c1a-8107-4d13182aed71@linux.intel.com>

On Mon, Jun 22, 2026, Binbin Wu wrote:
> On 6/4/2026 10:33 AM, Binbin Wu wrote:
> > Hi,
> > 
> > A host state clobbering feature on new TDX modules/platforms can lead
> > to host state corruption if KVM does not explicitly save and restore
> > the related MSR(s) during host/guest transitions. If such a feature is
> > blindly exposed to and used by TDs, it will result in unexpected behavior
> > on the host.
> > 
> > The v1 RFC [1] attempted to solve this by introducing a comprehensive
> > CPUID paranoid verification framework across VMX, SVM, and TDX. However,
> > as Sean pointed out in [2] and the discussion in the PUCK meeting, this
> > approach was overly complex and bled too many TDX-specific details into
> > common KVM code, creating an unnecessary maintenance burden.
> > 
> > This v2 takes a significantly simpler, TDX-contained approach. It strictly
> > validates only the TDX directly configurable CPUID bits—those reported by
> > the TDX module in CPUID_CONFIG fields that the VMM can configure for a TD.
> > This is sufficient to address the host clobbering issue, as no new host
> > state clobbering features will be fixed-1. All filtering and validation
> > logic is entirely isolated within TDX code.
> > 
> > Feedback is highly appreciated, particularly on whether this contained
> > approach strikes an acceptable balance regarding complexity.
> 
> Hi Sean,
> 
> Do you think this proposal is the direction to go?

Yeah, the basic gist looks good.

* Re: [RFC PATCH v2 1/4] KVM: x86: TDX: Track supported configurable CPUID bits
From: Sean Christopherson @ 2026-06-25 17:04 UTC (permalink / raw)
  To: Binbin Wu
  Cc: kvm, linux-kernel, pbonzini, rick.p.edgecombe, xiaoyao.li,
	chao.gao, kai.huang
In-Reply-To: <20260604023314.3907511-2-binbin.wu@linux.intel.com>

On Thu, Jun 04, 2026, Binbin Wu wrote:
> ---
>  arch/x86/kvm/vmx/tdx.c | 174 +++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 174 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index ffe9d0db58c5..e0567088ebf5 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -52,6 +52,178 @@
>  	__TDX_BUG_ON(__err, #__fn, __kvm, ", " #a1 " 0x%llx, " #a2 ", 0x%llx, " #a3 " 0x%llx", \
>  		     a1, a2, a3)
>  
> +#define TDX_CPUID_IGNORE_INDEX	BIT(0)
> +struct tdx_supported_cpuid_reg {
> +	u32 function;
> +	u32 index;
> +	u8 flags;
> +	u8 reg;
> +	u32 mask;
> +};
> +
> +/*
> + * Multi-bit fields are statically initialized, feature bits are initialized
> + * in tdx_initialize_cpu_cfg_caps().
> + */
> +static struct tdx_supported_cpuid_reg tdx_kvm_supported_cpuid[] __ro_after_init = {
> +	{ 0x1, 0, 0, CPUID_EAX, GENMASK_U32(27, 16) | GENMASK_U32(13, 0) },
> +	{ 0x1, 0, 0, CPUID_EBX, GENMASK_U32(23, 16) },
> +	{ 0x1, 0, 0, CPUID_ECX, 0 },
> +	{ 0x1, 0, 0, CPUID_EDX, 0 },
> +	{ 0x4, 0, TDX_CPUID_IGNORE_INDEX, CPUID_EAX, ~GENMASK_U32(13, 10) },
> +	{ 0x4, 0, TDX_CPUID_IGNORE_INDEX, CPUID_EBX, GENMASK_U32(31, 12) },
> +	{ 0x4, 0, TDX_CPUID_IGNORE_INDEX, CPUID_ECX, GENMASK_U32(31, 0) },
> +	{ 0x4, 0, TDX_CPUID_IGNORE_INDEX, CPUID_EDX, GENMASK_U32(2, 0) },
> +	{ 0x7, 0, 0, CPUID_EBX, 0 },
> +	{ 0x7, 0, 0, CPUID_ECX, 0 },
> +	{ 0x7, 0, 0, CPUID_EDX, 0 },
> +	{ 0x7, 1, 0, CPUID_EAX, 0 },
> +	{ 0x7, 1, 0, CPUID_EDX, 0 },
> +	{ 0x7, 2, 0, CPUID_EDX, 0 },
> +	{ 0x18, 0, TDX_CPUID_IGNORE_INDEX, CPUID_EDX, GENMASK_U32(25, 14) },
> +	{ 0x1E, 1, 0, CPUID_EAX, 0 },
> +	{ 0x1F, 0, TDX_CPUID_IGNORE_INDEX, CPUID_EAX, GENMASK_U32(4, 0) },
> +	{ 0x1F, 0, TDX_CPUID_IGNORE_INDEX, CPUID_EBX, GENMASK_U32(15, 0) },
> +	{ 0x1F, 0, TDX_CPUID_IGNORE_INDEX, CPUID_ECX, GENMASK_U32(15, 0) },
> +	/* See comments in td_init_cpuid_entry2() for CPUID 0x80000008 EAX[23:16]. */
> +	{ 0x80000008, 0, 0, CPUID_EAX, GENMASK_U32(23, 16) | GENMASK_U32(7, 0) },
> +	{ 0x80000008, 0, 0, CPUID_EBX, 0 },

For non-feature bits, I think I would rather handle them entirely at runtime via
switch statement(s).  Realistically, CPUID.0x1.E{A,B}X are never going to be
repurposed to hold feature bits, and so generating a mask of allowed bits adds
unnecessary cognitive load and maintenance.  Ditto for CPUID 0x4, 0x18, and 0x1F.

CPUID.0x1E is a bit different because it's kinda sorta a feature?  That one is
probably worth restricting, but again that's easy to do in a case-statement.

Then for the feature bits, there should be no need to define a separate structure,
just do "u32 kvm_tdx_cpu_caps[NR_KVM_CPU_CAPS]".  Then KVM can even further
restrict that array with kvm_cpu_caps (though it might take some creativity to
deal with things like MWAIT).  Because generally speaking, KVM shouldn't allow
features that KVM doesn't support for non-TDX VMs.
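
A rough, untested userspace sketch of that split, to make the idea concrete.
It is not a proposed implementation: every sketch_* name is hypothetical, the
number of capability words is arbitrary, and how CPUID.0x1E should actually be
restricted is left open. Leaves not called out above simply fall through to
the feature check, which is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register selector, standing in for CPUID_EAX..CPUID_EDX. */
enum sketch_reg { SKETCH_EAX, SKETCH_EBX, SKETCH_ECX, SKETCH_EDX };

/* How a given CPUID register would be validated. */
enum sketch_policy {
	SKETCH_PASS_THROUGH,	/* non-feature fields, no per-bit mask */
	SKETCH_RESTRICT,	/* needs leaf-specific handling */
	SKETCH_CHECK_FEATURES,	/* compare against the caps array */
};

static enum sketch_policy sketch_classify(uint32_t function,
					  enum sketch_reg reg)
{
	switch (function) {
	case 0x1:
		/* EAX/EBX hold identification fields, not feature bits. */
		if (reg == SKETCH_EAX || reg == SKETCH_EBX)
			return SKETCH_PASS_THROUGH;
		return SKETCH_CHECK_FEATURES;
	case 0x4:
	case 0x18:
	case 0x1f:
		return SKETCH_PASS_THROUGH;
	case 0x1e:
		return SKETCH_RESTRICT;
	default:
		return SKETCH_CHECK_FEATURES;
	}
}

/* Arbitrary size for the sketch; the real array would use NR_KVM_CPU_CAPS. */
#define SKETCH_NR_CAP_WORDS 4

/* Feature bits that TDs may be given, one word per CPUID feature register. */
static uint32_t sketch_tdx_caps[SKETCH_NR_CAP_WORDS];

/* Drop any feature that the generic (non-TDX) caps do not allow. */
static void sketch_restrict_caps(const uint32_t *generic_caps)
{
	for (unsigned int i = 0; i < SKETCH_NR_CAP_WORDS; i++)
		sketch_tdx_caps[i] &= generic_caps[i];
}

/* True if @requested sets no feature bit outside the allowed set. */
static bool sketch_features_ok(unsigned int word, uint32_t requested)
{
	return word < SKETCH_NR_CAP_WORDS &&
	       !(requested & ~sketch_tdx_caps[word]);
}
```

With this shape, the static table only ever describes feature bits, and the
non-feature leaves are a handful of case labels.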
