* [PATCH v5 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
2026-07-01 5:14 [PATCH v5 0/4] KVM: PPC: Expose CPU compatibility modes for nested guests Amit Machhiwal
@ 2026-07-01 5:14 ` Amit Machhiwal
2026-07-01 5:23 ` sashiko-bot
2026-07-01 5:14 ` [PATCH v5 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM Amit Machhiwal
From: Amit Machhiwal @ 2026-07-01 5:14 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, Ritesh Harjani, kvm, linux-kernel,
linux-doc
Introduce a new capability and ioctl to expose CPU compatibility modes
supported by the host processor for nested guests.
On IBM POWER systems, a processor of generation N can operate in
compatibility modes corresponding to earlier generations, such as N-1
and N-2. This is particularly relevant for nested virtualization, where
nested KVM guests may need to run at a specific processor compatibility
level.
Introduce the KVM_CAP_PPC_COMPAT_CAPS capability and the corresponding
KVM_PPC_GET_COMPAT_CAPS vm ioctl. The ioctl returns a bitmap in which
each compatibility mode supported by the host is reported in its own
bit, allowing userspace (e.g., QEMU) to select an appropriate
compatibility level when configuring nested KVM guests.
The ioctl is handled in kvm_arch_vm_ioctl(), which retrieves the host
CPU compatibility capabilities through the new get_compat_caps() hook
in struct kvmppc_ops when the backend implements it.
The struct kvm_ppc_compat_caps places the 'size' field first so it can
be read alone via get_user() before copy_struct_from_user() is called,
avoiding pointer arithmetic to locate the size field.
The ioctl is defined with _IO so that its number remains stable even if
the struct grows in future versions. It uses copy_struct_from_user() and
copy_struct_to_user() to provide forward- and backward-compatible
extensibility. When older userspace passes a smaller struct to a newer
kernel, the kernel zero-fills the trailing fields it did not receive.
When newer userspace passes a larger struct to an older kernel
(usize > ksize), the kernel writes sizeof(struct kvm_ppc_compat_caps)
back to host_caps.size and returns -E2BIG, so that userspace can retry
with the size the older kernel supports.
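As a rough illustration of this negotiation, the userspace C model below reimplements the handler's size checks over a plain buffer. It is a sketch, not the kernel code: model_get_compat_caps() stands in for the ioctl, struct compat_caps_v1 and its future_field are a hypothetical future revision of the struct, and the -EFAULT paths are omitted.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace copy of the v5 UAPI layout (24 bytes). */
struct kvm_ppc_compat_caps {
	uint64_t size;
	uint64_t flags;
	uint64_t compat_capabilities;
};
#define KVM_PPC_COMPAT_CAPS_SIZE_VER0 24

/* Hypothetical future revision that appends one field (32 bytes). */
struct compat_caps_v1 {
	uint64_t size;
	uint64_t flags;
	uint64_t compat_capabilities;
	uint64_t future_field;
};

/*
 * Hypothetical model of the kernel handler: ubuf stands in for the __user
 * pointer, and the kernel's own struct is struct kvm_ppc_compat_caps.
 */
static int model_get_compat_caps(void *ubuf, uint64_t host_bits)
{
	struct kvm_ppc_compat_caps kcaps = { 0 };
	const uint64_t ksize = sizeof(kcaps);
	uint64_t usize;

	memcpy(&usize, ubuf, sizeof(usize));	/* size is the first field */
	if (usize < KVM_PPC_COMPAT_CAPS_SIZE_VER0)
		return -EINVAL;
	if (usize > ksize) {
		/* Newer userspace, older kernel: report ksize, fail with E2BIG. */
		memcpy(ubuf, &ksize, sizeof(ksize));
		return -E2BIG;
	}
	memcpy(&kcaps, ubuf, usize);		/* any missing tail stays zero */
	if (kcaps.flags)
		return -EINVAL;			/* reserved field must be zero */
	kcaps.compat_capabilities = host_bits;
	kcaps.size = ksize;
	memcpy(ubuf, &kcaps, usize);		/* never write past usize */
	return 0;
}

/* Userspace side: try the newest layout, retry once with the kernel's size. */
static int query_compat_caps(struct compat_caps_v1 *out, uint64_t host_bits)
{
	uint64_t ksize;
	int r;

	memset(out, 0, sizeof(*out));
	out->size = sizeof(*out);
	r = model_get_compat_caps(out, host_bits);
	if (r != -E2BIG)
		return r;

	ksize = out->size;			/* written back by the kernel */
	memset(out, 0, sizeof(*out));
	out->size = ksize;
	return model_get_compat_caps(out, host_bits);
}
```

With the v5 layout, ksize equals KVM_PPC_COMPAT_CAPS_SIZE_VER0, so the retry path only becomes reachable once userspace is built against a larger struct revision.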
KVM_PPC_COMPAT_CAPS_SIZE_VER0 is defined as a frozen integer constant
(24) marking the size of the initial struct version. It serves as the
lower bound for size validation, following the convention of other
versioned struct interfaces in the kernel such as CLONE_ARGS_SIZE_VER0
and OPEN_HOW_SIZE_VER0.
The 'flags' field is reserved for future use. The kernel rejects any
call where flags is non-zero with -EINVAL, preventing garbage values
from being baked into the ABI permanently.
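On the userspace side, zero-initializing the struct with a designated initializer is one simple way to satisfy both the VER0 lower bound and the flags == 0 rule, and compile-time checks can pin the layout assumptions stated above. This is a sketch using a local copy of the UAPI struct; compat_caps_init() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the v5 UAPI layout; normally this comes from <asm/kvm.h>. */
struct kvm_ppc_compat_caps {
	uint64_t size;
	uint64_t flags;
	uint64_t compat_capabilities;
};
#define KVM_PPC_COMPAT_CAPS_SIZE_VER0 24

/* 'size' must stay first so the kernel can get_user() it on its own. */
static_assert(offsetof(struct kvm_ppc_compat_caps, size) == 0,
	      "size must be the first member");
/* VER0 is frozen at the size of the first published layout. */
static_assert(sizeof(struct kvm_ppc_compat_caps) ==
	      KVM_PPC_COMPAT_CAPS_SIZE_VER0, "unexpected VER0 layout");

/* Hypothetical helper: members not named in the initializer are zeroed. */
static struct kvm_ppc_compat_caps compat_caps_init(void)
{
	struct kvm_ppc_compat_caps caps = { .size = sizeof(caps) };

	return caps;
}
```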
The ioctl returns appropriate error codes: EINVAL for an invalid size
or non-zero reserved fields, E2BIG if new userspace provides a larger
struct than the kernel knows about (with ksize written back into
host_caps.size for the retry), EFAULT for failed copy operations, and
ENOTTY if the backend doesn't implement get_compat_caps.
Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
Changes in this version:
- Moved size as the first member of the struct
- Replaced strict size equality check with copy_struct_from_user() and
copy_struct_to_user() for proper forward and backward ABI compatibility
- Added KVM_PPC_COMPAT_CAPS_SIZE_VER0 (24) as a frozen version floor
constant, following the convention used by similar interfaces in the kernel
- Added flags == 0 enforcement to prevent uninitialized stack values from
being baked into the ABI permanently
arch/powerpc/include/asm/kvm_ppc.h | 1 +
arch/powerpc/include/uapi/asm/kvm.h | 8 ++++
arch/powerpc/kvm/powerpc.c | 71 +++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 4 ++
4 files changed, 84 insertions(+)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0953f2daa466..169ea6a7fbad 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -319,6 +319,7 @@ struct kvmppc_ops {
bool (*hash_v3_possible)(void);
int (*create_vm_debugfs)(struct kvm *kvm);
int (*create_vcpu_debugfs)(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry);
+ int (*get_compat_caps)(struct kvm_ppc_compat_caps *host_caps);
};
extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 077c5437f521..19e53d5ae540 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -437,6 +437,14 @@ struct kvm_ppc_cpu_char {
__u64 behaviour_mask; /* valid bits in behaviour */
};
+/* For KVM_PPC_GET_COMPAT_CAPS */
+struct kvm_ppc_compat_caps {
+ __u64 size; /* Size of this structure */
+ __u64 flags; /* Reserved for future use */
+ __u64 compat_capabilities; /* Capabilities supported by the host */
+};
+#define KVM_PPC_COMPAT_CAPS_SIZE_VER0 24 /* sizeof first published struct */
+
/*
* Values for character and character_mask.
* These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 98de68379b18..a2919b8b31c0 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -701,6 +701,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
}
}
break;
+#if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+ case KVM_CAP_PPC_COMPAT_CAPS:
+ r = 0;
+ if (kvmhv_on_pseries())
+ r = 1;
+ break;
+#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
default:
r = 0;
break;
@@ -2467,6 +2474,70 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
r = kvm->arch.kvm_ops->svm_off(kvm);
break;
}
+ case KVM_PPC_GET_COMPAT_CAPS: {
+ struct kvm_ppc_compat_caps host_caps = {};
+ u64 usize;
+
+ /*
+ * Read the size field first to drive copy_struct_from_user.
+ * size must be the first field of the struct.
+ */
+ r = -EFAULT;
+ if (get_user(usize, (__u64 __user *)argp))
+ goto out;
+
+ /*
+ * Enforce a minimum: reject buffers smaller than the initial
+ * struct version (VER0). This allows old userspace compiled
+ * against the original struct to still work on a newer kernel
+ * that has grown the struct with appended fields.
+ */
+ r = -EINVAL;
+ if (usize < KVM_PPC_COMPAT_CAPS_SIZE_VER0)
+ goto out;
+
+ /*
+ * New userspace with a larger struct called an older kernel.
+ * Write back ksize in host_caps.size so userspace knows which
+ * older struct to retry with, then fail with -E2BIG.
+ */
+ if (usize > sizeof(host_caps)) {
+ host_caps.size = sizeof(host_caps);
+ r = -EFAULT;
+ if (put_user(host_caps.size, (__u64 __user *)argp))
+ goto out;
+ r = -E2BIG;
+ goto out;
+ }
+
+ /*
+ * copy_struct_from_user() handles forward/backward compat:
+ * usize == ksize: verbatim copy
+ * usize < ksize: zero-pad trailing (old userspace, new kernel)
+ */
+ r = copy_struct_from_user(&host_caps, sizeof(host_caps),
+ argp, usize);
+ if (r)
+ goto out;
+
+ /* Reserved fields must be zero */
+ r = -EINVAL;
+ if (host_caps.flags)
+ goto out;
+
+ r = -ENOTTY;
+ if (!kvm->arch.kvm_ops->get_compat_caps)
+ goto out;
+
+ r = kvm->arch.kvm_ops->get_compat_caps(&host_caps);
+ if (r)
+ goto out;
+
+ host_caps.size = sizeof(host_caps);
+ r = copy_struct_to_user(argp, usize, &host_caps,
+ sizeof(host_caps), NULL);
+ break;
+ }
default: {
struct kvm *kvm = filp->private_data;
r = kvm->arch.kvm_ops->arch_vm_ioctl(filp, ioctl, arg);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 419011097fa8..1cf9a959669e 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -997,6 +997,7 @@ struct kvm_enable_cap {
#define KVM_CAP_S390_KEYOP 247
#define KVM_CAP_S390_VSIE_ESAMODE 248
#define KVM_CAP_S390_HPAGE_2G 249
+#define KVM_CAP_PPC_COMPAT_CAPS 250
struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1350,6 +1351,9 @@ struct kvm_s390_keyop {
#define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)
+/* Available with KVM_CAP_PPC_COMPAT_CAPS */
+#define KVM_PPC_GET_COMPAT_CAPS _IO(KVMIO, 0xe4)
+
/*
* ioctls for vcpu fds
*/
--
2.50.1 (Apple Git-155)
* Re: [PATCH v5 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
From: sashiko-bot @ 2026-07-01 5:23 UTC
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
- [High] The ioctl overwrites the userspace structure's `size` field with a value larger than the allocated buffer on success.
- [Medium] The explicit rejection of larger userspace structs defeats the built-in forward compatibility mechanism of `copy_struct_from_user()`.
--
commit d4727673391c2e4e3f4b9dd75bf4aeda3efb8249
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
This commit introduces a new capability and ioctl to expose CPU
compatibility modes supported by the host processor for nested guests.
It defines struct kvm_ppc_compat_caps and adds ioctl handling via
KVM_PPC_GET_COMPAT_CAPS.
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 00302399fc37b..8d6632c64ca5f 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
[ ... ]
> @@ -2463,6 +2470,70 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
[ ... ]
> + /*
> + * New userspace with a larger struct called an older kernel.
> + * Write back ksize in host_caps.size so userspace knows which
> + * older struct to retry with, then fail with -E2BIG.
> + */
> + if (usize > sizeof(host_caps)) {
> + host_caps.size = sizeof(host_caps);
> + r = -EFAULT;
> + if (put_user(host_caps.size, (__u64 __user *)argp))
> + goto out;
> + r = -E2BIG;
> + goto out;
> + }
> +
> + /*
> + * copy_struct_from_user() handles forward/backward compat:
> + * usize == ksize: verbatim copy
> + * usize < ksize: zero-pad trailing (old userspace, new kernel)
> + */
> + r = copy_struct_from_user(&host_caps, sizeof(host_caps),
> + argp, usize);
[Severity: Medium]
Does the explicit rejection of larger userspace structs defeat the built-in
forward compatibility mechanism of copy_struct_from_user()?
The copy_struct_from_user() helper natively handles larger userspace structs by
succeeding if the trailing bytes are zero, allowing a newer userspace to
transparently run on an older kernel without manual fallback logic. By
unconditionally returning -E2BIG here in kvm_arch_vm_ioctl() when usize is
greater than sizeof(host_caps), doesn't this force userspace to implement
explicit retry mechanisms instead of relying on the standard behavior?
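To make this point concrete, here is a userspace model of the rules that copy_struct_from_user() follows as documented in include/linux/uaccess.h: a short user struct is zero-extended, and a long one is accepted as long as the bytes beyond ksize are all zero. model_copy_struct_from_user() is a hypothetical stand-in that omits the -EFAULT handling of the real helper.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Userspace model of copy_struct_from_user() semantics (not kernel code). */
static int model_copy_struct_from_user(void *dst, size_t ksize,
				       const void *src, size_t usize)
{
	size_t size = usize < ksize ? usize : ksize;
	size_t rest = usize > ksize ? usize - ksize : ksize - usize;

	if (usize < ksize) {
		/* Old userspace: zero the fields it does not know about. */
		memset((char *)dst + size, 0, rest);
	} else if (usize > ksize) {
		/* New userspace: only fail if the unknown tail is non-zero. */
		const unsigned char *tail = (const unsigned char *)src + ksize;

		for (size_t i = 0; i < rest; i++)
			if (tail[i])
				return -E2BIG;
	}
	memcpy(dst, src, size);
	return 0;
}
```

Under this model, a 32-byte request with a zeroed tail succeeds against a 24-byte kernel struct without any retry, which is the behaviour the patch bypasses by returning -E2BIG before calling the helper.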
[ ... ]
> + r = kvm->arch.kvm_ops->get_compat_caps(&host_caps);
> + if (r)
> + goto out;
> +
> + host_caps.size = sizeof(host_caps);
> + r = copy_struct_to_user(argp, usize, &host_caps,
> + sizeof(host_caps), NULL);
[Severity: High]
If an older userspace passes a smaller struct where usize is less than
sizeof(host_caps), does unconditionally setting host_caps.size to the larger
kernel size overwrite the userspace size field with a value larger than its
allocated buffer?
If userspace relies on this returned size field to process the structure,
could reading this larger size cause it to read past the end of its
allocated buffer?
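With the v5 struct, ksize equals KVM_PPC_COMPAT_CAPS_SIZE_VER0 and smaller sizes are rejected earlier, so usize < ksize cannot occur yet; it becomes reachable once the struct grows. One possible remedy, sketched here as a userspace model, is to write back the number of bytes actually copied, min(usize, ksize). struct compat_caps_v1, its future_field and model_write_back() are hypothetical, and this is only one option.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical future kernel-side layout that appends one field. */
struct compat_caps_v1 {
	uint64_t size;
	uint64_t flags;
	uint64_t compat_capabilities;
	uint64_t future_field;
};

/*
 * Model of the final copy-out: copy at most usize bytes (as
 * copy_struct_to_user() does), but first set 'size' to the number of
 * bytes actually written so userspace never sees a size larger than
 * its own buffer.
 */
static void model_write_back(void *ubuf, size_t usize,
			     struct compat_caps_v1 *kcaps)
{
	size_t n = usize < sizeof(*kcaps) ? usize : sizeof(*kcaps);

	kcaps->size = n;
	memcpy(ubuf, kcaps, n);
}
```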
> + break;
> + }
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260701051409.51820-1-amachhiw@linux.ibm.com?part=1
* [PATCH v5 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
From: Amit Machhiwal @ 2026-07-01 5:14 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, Ritesh Harjani, kvm, linux-kernel,
linux-doc
On POWER systems, the host CPU may run in a compatibility mode (e.g., a
Power11 processor operating in Power10 compatibility mode). In such
cases, the effective CPU level exposed to guests differs from the
physical processor generation.
When running nested KVM guests, QEMU derives the host CPU type using
mfpvr(), which reflects the physical processor version. This can result
in a mismatch between the CPU model selected by QEMU and the
compatibility mode enforced by the host, leading to guest boot failures.
For example, booting a nested guest on a Power11 LPAR configured in
Power10 compatibility mode fails with:
KVM-NESTEDv2: couldn't set guest wide elements
[..KVM reg dump..]
This occurs because QEMU selects a CPU model corresponding to the
physical processor (via mfpvr()), while the host operates in a lower
compatibility mode. As a result, KVM rejects the requested compatibility
level during guest initialization.
On pseries nestedv2 systems, add support for retrieving host CPU
compatibility capabilities for nested guests on PowerVM. The capability
bitmap reflects the processor modes negotiated between the Power
hypervisor (L0) and the host partition (L1) via the
H_GUEST_GET_CAPABILITIES hcall. Rather than issuing that hcall on every
ioctl, the value is read from nested_capabilities, which is cached during
module initialization. A WARN_ON_ONCE() flags the unexpected case where
nested_capabilities is zero on a nestedv2 system.
The implementation defines KVM-specific capability constants
(KVM_PPC_COMPAT_CAP_POWER9/10/11), masks unsupported bits, and exposes
the result through the KVM_PPC_GET_COMPAT_CAPS ioctl.
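For userspace consumers, the bit layout can be cross-checked against the MSB-0 numbering given in the documentation patch (POWER9 = bit 1, POWER10 = bit 2, POWER11 = bit 3), and unknown bits can be masked off before a mode is chosen. The IBM_BIT() macro and newest_compat_mode() helper below are hypothetical, and picking the newest reported mode is only one possible policy.

```c
#include <stdint.h>

/* Values as defined in this patch's UAPI hunk. */
#define KVM_PPC_COMPAT_CAP_POWER9	(1ULL << 62)
#define KVM_PPC_COMPAT_CAP_POWER10	(1ULL << 61)
#define KVM_PPC_COMPAT_CAP_POWER11	(1ULL << 60)
#define KVM_PPC_COMPAT_BITMASK		(KVM_PPC_COMPAT_CAP_POWER9 | \
					 KVM_PPC_COMPAT_CAP_POWER10 | \
					 KVM_PPC_COMPAT_CAP_POWER11)

/* IBM MSB-0 bit n corresponds to conventional (LSB-0) bit 63 - n. */
#define IBM_BIT(n)	(1ULL << (63 - (n)))

/* Hypothetical userspace policy: pick the newest mode the host reports. */
static uint64_t newest_compat_mode(uint64_t caps)
{
	caps &= KVM_PPC_COMPAT_BITMASK;	/* ignore bits this code doesn't know */
	if (caps & KVM_PPC_COMPAT_CAP_POWER11)
		return KVM_PPC_COMPAT_CAP_POWER11;
	if (caps & KVM_PPC_COMPAT_CAP_POWER10)
		return KVM_PPC_COMPAT_CAP_POWER10;
	if (caps & KVM_PPC_COMPAT_CAP_POWER9)
		return KVM_PPC_COMPAT_CAP_POWER9;
	return 0;
}
```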
Hook the implementation into the Book3S HV kvmppc_ops so that it can be
invoked by the generic KVM ioctl handling code.
Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
Changes in this version:
- Updated PowerVM implementation to use cached nested_capabilities instead
of making a live H_GUEST_GET_CAPABILITIES hcall on every ioctl call
- Added a WARN_ON_ONCE(!nested_capabilities) sanity check for the case
where nested_capabilities is unexpectedly zero on a nestedv2 system
arch/powerpc/include/uapi/asm/kvm.h | 10 ++++++++++
arch/powerpc/kvm/book3s_hv.c | 20 ++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 19e53d5ae540..913a64b901a3 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -445,6 +445,16 @@ struct kvm_ppc_compat_caps {
};
#define KVM_PPC_COMPAT_CAPS_SIZE_VER0 24 /* sizeof first published struct */
+/*
+ * Capability bits for compat_capabilities field in kvm_ppc_compat_caps.
+ * These bits indicate which processor compatibility modes are supported.
+ */
+#define KVM_PPC_COMPAT_CAP_POWER9 (1ULL << 62)
+#define KVM_PPC_COMPAT_CAP_POWER10 (1ULL << 61)
+#define KVM_PPC_COMPAT_CAP_POWER11 (1ULL << 60)
+#define KVM_PPC_COMPAT_BITMASK (KVM_PPC_COMPAT_CAP_POWER9 | \
+ KVM_PPC_COMPAT_CAP_POWER10 | \
+ KVM_PPC_COMPAT_CAP_POWER11)
/*
* Values for character and character_mask.
* These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f9380ef65750..152cd08a5b38 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6523,6 +6523,25 @@ static bool kvmppc_hash_v3_possible(void)
return true;
}
+
+static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
+{
+ unsigned long capabilities = 0;
+ long rc = -EINVAL;
+
+ if (kvmhv_on_pseries()) {
+ if (kvmhv_is_nestedv2()) {
+ WARN_ON_ONCE(!nested_capabilities);
+ capabilities = nested_capabilities;
+ rc = 0;
+ }
+ }
+
+ host_caps->compat_capabilities = capabilities & KVM_PPC_COMPAT_BITMASK;
+
+ return rc;
+}
+
static struct kvmppc_ops kvm_ops_hv = {
.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -6565,6 +6584,7 @@ static struct kvmppc_ops kvm_ops_hv = {
.hash_v3_possible = kvmppc_hash_v3_possible,
.create_vcpu_debugfs = kvmppc_arch_create_vcpu_debugfs_hv,
.create_vm_debugfs = kvmppc_arch_create_vm_debugfs_hv,
+ .get_compat_caps = kvmppc_get_compat_caps,
};
static int kvm_init_subcore_bitmap(void)
--
2.50.1 (Apple Git-155)
* Re: [PATCH v5 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
From: sashiko-bot @ 2026-07-01 5:26 UTC
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] An unprivileged host-userspace process can trigger a kernel WARN_ON_ONCE by invoking the KVM_PPC_GET_COMPAT_CAPS ioctl when the L0 hypervisor reports zero nested capabilities.
--
commit 6f57063ca19824c71191d87b7d5e423ad8ddc900
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
This commit adds support for retrieving host CPU compatibility capabilities
for nested guests on PowerVM. It exposes these capabilities to userspace
via the KVM_PPC_GET_COMPAT_CAPS ioctl, leveraging a cached value populated
during module initialization to avoid repeated hcalls.
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 61dbeea317f3b..8799124b80960 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
[ ... ]
> @@ -6510,6 +6510,25 @@ static bool kvmppc_hash_v3_possible(void)
> return true;
> }
>
> +
> +static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
> +{
> + unsigned long capabilities = 0;
> + long rc = -EINVAL;
> +
> + if (kvmhv_on_pseries()) {
> + if (kvmhv_is_nestedv2()) {
> + WARN_ON_ONCE(!nested_capabilities);
[Severity: High]
Could this WARN_ON_ONCE be triggered by an unprivileged host-userspace
process?
If the L0 hypervisor reports no recognized capabilities during module
initialization, nested_capabilities is set to 0. A user process with access
to /dev/kvm could then create a VM and invoke the KVM_PPC_GET_COMPAT_CAPS
ioctl, directly evaluating this condition.
Since a userspace-reachable WARN_ON_ONCE still taints the kernel and can
crash the host outright if panic_on_warn is enabled, would it be safer to
return an error to userspace or log via pr_warn_once() instead of
asserting?
> + capabilities = nested_capabilities;
> + rc = 0;
> + }
> + }
> +
> + host_caps->compat_capabilities = capabilities & KVM_PPC_COMPAT_BITMASK;
> +
> + return rc;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260701051409.51820-1-amachhiw@linux.ibm.com?part=2
* [PATCH v5 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
From: Amit Machhiwal @ 2026-07-01 5:14 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, Ritesh Harjani, kvm, linux-kernel,
linux-doc
Currently, when booting a compatibility-mode KVM guest (L1) on a PowerNV
hypervisor (L0), the guest runs with the expected processor
compatibility level. However, when booting a nested KVM guest (L2)
inside the L1, QEMU derives the CPU model from the raw host PVR and
attempts to run the nested guest at that level, instead of honoring the
compatibility mode of the L1.
Extend host CPU compatibility capability reporting to support nested
virtualization on PowerNV systems (PAPR nested API v1).
For nested API v2 (PowerVM), compatibility capabilities are served from
the cached nested_capabilities value (populated at module init via
kvmhv_nested_init() using the H_GUEST_GET_CAPABILITIES hcall). This
information is not available on PowerNV systems.
For nested API v1, derive the compatibility capabilities from the L1
guest by reading the "cpu-version" property from the device tree, which
reflects the effective (logical) processor compatibility level. Map this
value to the corresponding compatibility capability bitmap using
KVM-specific constants.
The mapping is cumulative: a system running at a given compatibility
level is assumed to also support older generations down the supported
chain. Note that, unlike KVM on PowerVM (nested API v2), KVM on PowerNV
currently does not strictly enforce older-generation compatibility modes
for nested guests; the reported capabilities reflect what the host CPU
can present, not what the hypervisor independently validates.
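The cumulative mapping can be modelled in userspace as follows. The PVR_ARCH_* values are assumed to match the logical PVR definitions in arch/powerpc/include/asm/reg.h and are repeated only so the sketch compiles; map_compat() is a hypothetical stand-in for kvmppc_map_compat_capabilities() that takes an already host-endian 32-bit cpu_version.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed to mirror asm/reg.h; only the relative order matters here. */
#define PVR_ARCH_300	0x0f000005
#define PVR_ARCH_31	0x0f000006
#define PVR_ARCH_31_P11	0x0f000007

#define KVM_PPC_COMPAT_CAP_POWER9	(1ULL << 62)
#define KVM_PPC_COMPAT_CAP_POWER10	(1ULL << 61)
#define KVM_PPC_COMPAT_CAP_POWER11	(1ULL << 60)

/* Each level adds its own bit, then falls through to the older ones. */
static int map_compat(uint32_t cpu_version, uint64_t *caps)
{
	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*caps |= KVM_PPC_COMPAT_CAP_POWER11;
		/* fall through */
	case PVR_ARCH_31:
		*caps |= KVM_PPC_COMPAT_CAP_POWER10;
		/* fall through */
	case PVR_ARCH_300:
		*caps |= KVM_PPC_COMPAT_CAP_POWER9;
		return 0;
	default:
		return -EINVAL;	/* unknown or unsupported level */
	}
}
```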
Introduce a helper kvmppc_map_compat_capabilities() to translate CPU
version values into KVM_PPC_COMPAT_CAP bits using a fallthrough switch,
and integrate it into kvmppc_get_compat_caps(). The implementation
applies masking to ensure only supported processor modes are exposed.
This allows userspace to query host CPU compatibility modes on both
KVM on PowerVM and on PowerNV platforms via the KVM_PPC_GET_COMPAT_CAPS
ioctl.
Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
Changes in this version:
- Converted switch in kvmppc_map_compat_capabilities() to use fallthrough
for cumulative compat mode reporting
- Check for 'rc' error before assigning 'capabilities' to
'host_caps->compat_capabilities'
- Call of_node_put(np) before break in for_each_node_by_type() loop to
avoid leaking the OF node reference
arch/powerpc/kvm/book3s_hv.c | 38 ++++++++++++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 152cd08a5b38..ba4b2b3aaf4e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6523,20 +6523,58 @@ static bool kvmppc_hash_v3_possible(void)
return true;
}
+static int kvmppc_map_compat_capabilities(const __be32 cpu_version,
+ unsigned long *capabilities)
+{
+ switch (cpu_version) {
+ case PVR_ARCH_31_P11:
+ *capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
+ fallthrough;
+ case PVR_ARCH_31:
+ *capabilities |= KVM_PPC_COMPAT_CAP_POWER10;
+ fallthrough;
+ case PVR_ARCH_300:
+ *capabilities |= KVM_PPC_COMPAT_CAP_POWER9;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
{
+ struct device_node *np;
unsigned long capabilities = 0;
+ const __be32 *prop = NULL;
long rc = -EINVAL;
+ u32 cpu_version;
if (kvmhv_on_pseries()) {
if (kvmhv_is_nestedv2()) {
WARN_ON_ONCE(!nested_capabilities);
capabilities = nested_capabilities;
rc = 0;
+ } else {
+ for_each_node_by_type(np, "cpu") {
+ prop = of_get_property(np, "cpu-version", NULL);
+ if (prop) {
+ cpu_version = be32_to_cpup(prop);
+ of_node_put(np);
+ break;
+ }
+ }
+ if (!prop)
+ return -EINVAL;
+ rc = kvmppc_map_compat_capabilities(cpu_version,
+ &capabilities);
}
}
+ if (rc < 0)
+ return rc;
+
host_caps->compat_capabilities = capabilities & KVM_PPC_COMPAT_BITMASK;
return rc;
--
2.50.1 (Apple Git-155)
* Re: [PATCH v5 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
From: sashiko-bot @ 2026-07-01 5:27 UTC
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Low] Reading the `cpu-version` device tree property using `of_get_property` without validating its length can result in an out-of-bounds read if the property is malformed.
- [Low] Endianness type signature mismatch for `cpu_version` parameter causes Sparse static analysis warnings.
Pre-existing issues:
- [High] An unprivileged host user can trigger a `WARN_ON_ONCE` in `kvmppc_get_compat_caps()`, leading to a host denial of service.
--
commit 0383e83d38076b8b273c1d04bf2b738d14a3bbc2
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
This commit extends host CPU compatibility capability reporting to support
nested virtualization on PowerNV systems. For nested API v1, it derives the
compatibility capabilities from the L1 guest by reading the cpu-version
property from the device tree and mapping it to the corresponding KVM
capability bitmap.
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 8799124b80960..70dcd370fd9ac 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
[ ... ]
> @@ -6510,20 +6510,58 @@ static bool kvmppc_hash_v3_possible(void)
> return true;
> }
>
> +static int kvmppc_map_compat_capabilities(const __be32 cpu_version,
> + unsigned long *capabilities)
[Severity: Low]
Is cpu_version actually host-endian here? The parameter is declared as
const __be32, but kvmppc_get_compat_caps() passes a host-endian u32 value
to it. Could this cause Sparse static analysis tools to warn about a
restricted __be32 degrading to an integer?
> +{
> + switch (cpu_version) {
> + case PVR_ARCH_31_P11:
[ ... ]
> static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
> {
> + struct device_node *np;
> unsigned long capabilities = 0;
> + const __be32 *prop = NULL;
> long rc = -EINVAL;
> + u32 cpu_version;
>
> if (kvmhv_on_pseries()) {
> if (kvmhv_is_nestedv2()) {
> WARN_ON_ONCE(!nested_capabilities);
[Severity: High]
This is a pre-existing issue, but can this assertion be triggered by
unprivileged userspace?
If the KVM_PPC_GET_COMPAT_CAPS ioctl is called when nested_capabilities is 0
(for example, if the host hypervisor initializes nestedv2 but does not return
the capability flags), would this trigger a WARN_ON_ONCE?
According to KVM guidelines, adversary-reachable WARN_ON assertions can be
host-side denial of service vectors if panic_on_warn is enabled.
> capabilities = nested_capabilities;
> rc = 0;
> + } else {
> + for_each_node_by_type(np, "cpu") {
> + prop = of_get_property(np, "cpu-version", NULL);
> + if (prop) {
> + cpu_version = be32_to_cpup(prop);
[Severity: Low]
Could this read out of bounds if the cpu-version device tree property is
malformed and less than 4 bytes long?
The pointer returned by of_get_property() in kvmppc_get_compat_caps() is
dereferenced by be32_to_cpup() without checking the length. Would using
of_property_read_u32() natively validate the length and securely handle
the endianness conversion?
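In the kernel, the suggested replacement would be of_property_read_u32(np, "cpu-version", &cpu_version), which checks the property length and performs the big-endian conversion in one call, leaving cpu_version as a plain u32. The userspace model below only illustrates those checks; model_read_be32_prop() is hypothetical, and its error codes follow those documented for of_property_read_u32().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of a length-checked read of a one-cell big-endian
 * device-tree property (not the kernel implementation).
 */
static int model_read_be32_prop(const void *prop, size_t len, uint32_t *out)
{
	const uint8_t *p = prop;

	if (!prop)
		return -EINVAL;		/* property not present */
	if (len == 0)
		return -ENODATA;	/* property has no value */
	if (len < sizeof(*out))
		return -EOVERFLOW;	/* shorter than one 32-bit cell */
	/* Assemble byte by byte so the result is host-endian on any CPU. */
	*out = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
	return 0;
}
```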
> + of_node_put(np);
> + break;
> + }
> + }
[ ... ]
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260701051409.51820-1-amachhiw@linux.ibm.com?part=3
* [PATCH v5 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
From: Amit Machhiwal @ 2026-07-01 5:14 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, Ritesh Harjani, kvm, linux-kernel,
linux-doc
Add documentation for the KVM_PPC_GET_COMPAT_CAPS ioctl to the KVM API
documentation.
The ioctl exposes the host processor compatibility modes supported for
nested KVM guests on PowerPC systems. The documentation covers the error
codes (including E2BIG for forward compatibility), the extensible
size-based versioning contract built on KVM_PPC_COMPAT_CAPS_SIZE_VER0,
the rationale for rejecting non-zero reserved fields to avoid ABI
ambiguity, the IBM MSB-0 bit numbering convention, and the KVM-specific
capability bit constants.
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
Changes in this version:
- Updated error table: EINVAL now reflects size < VER0 or flags != 0;
added E2BIG for new userspace on old kernel
- Replaced stale strict-size-validation paragraph with description of
the copy_struct_from_user/to_user extensibility model and
KVM_PPC_COMPAT_CAPS_SIZE_VER0 versioning contract
- Added rationale for flags == 0 enforcement to prevent ABI ambiguity
Documentation/virt/kvm/api.rst | 79 ++++++++++++++++++++++++++++++++++
1 file changed, 79 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a5f9ee92f43e..43810c451317 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6566,6 +6566,85 @@ KVM_S390_KEYOP_SSKE
Sets the storage key for the guest address ``guest_addr`` to the key
specified in ``key``, returning the previous value in ``key``.
+4.145 KVM_PPC_GET_COMPAT_CAPS
+-----------------------------
+:Capability: KVM_CAP_PPC_COMPAT_CAPS
+:Architectures: powerpc
+:Type: vm ioctl
+:Parameters: struct kvm_ppc_compat_caps (in/out)
+:Returns: 0 on success, negative value on failure
+
+Errors include:
+
+ ======== ============================================================
+ EFAULT if ``struct kvm_ppc_compat_caps`` cannot be read from or
+ written to userspace
+ EINVAL if the ``size`` field is smaller than
+ ``KVM_PPC_COMPAT_CAPS_SIZE_VER0``, if the ``flags`` field
+ is non-zero, or if the backend fails to retrieve or map
+ CPU compatibility capabilities
+ E2BIG if ``size`` is larger than the kernel's struct size
+ (new userspace on old kernel); the kernel writes back its
+ own struct size into the ``size`` field so userspace can
+ retry with the correct size
+ ENOTTY if the backend does not implement the ``get_compat_caps``
+ operation (e.g., VMs using the KVM PR backend, which does
+ not provide this hook)
+ ======== ============================================================
+
+IBM POWER server processors provide a compatibility mode feature in which
+an Nth-generation processor can operate in modes consistent with earlier
+generations such as N-1 and N-2.
+
+This ioctl provides userspace with information about the CPU compatibility modes
+supported by the current host processor for booting nested KVM guests on
+KVM on PowerNV (nested API v1) and KVM on PowerVM (nested API v2) platforms.
+
+::
+
+ struct kvm_ppc_compat_caps {
+ __u64 size; /* Size of this structure */
+ __u64 flags; /* Reserved for future use, must be 0 */
+ __u64 compat_capabilities; /* Capabilities supported by the host */
+ };
+
+Before calling this ioctl, userspace must set the ``size`` field to
+``sizeof(struct kvm_ppc_compat_caps)`` and zero the ``flags`` field.
+The kernel rejects non-zero ``flags`` with ``-EINVAL`` to prevent
+uninitialized stack values from being silently accepted, keeping the
+field available for future use without ABI ambiguity.
+
+The ioctl uses ``copy_struct_from_user()`` and ``copy_struct_to_user()``
+to support extensible versioning: if userspace passes a struct smaller
+than the kernel's current version (but no smaller than ``KVM_PPC_COMPAT_CAPS_SIZE_VER0``),
+the kernel zero-pads unknown trailing fields. If userspace passes a larger
+struct (``size > sizeof(struct kvm_ppc_compat_caps)``), the kernel writes
+back its own struct size into the ``size`` field and returns ``-E2BIG``,
+allowing userspace to discover the kernel's struct size and retry.
+``KVM_PPC_COMPAT_CAPS_SIZE_VER0`` (24) is a frozen constant marking the
+size of the initial struct version.
+
+The ``compat_capabilities`` bit field describes the processor compatibility
+modes supported by the host. The following bits indicate support for specific
+processor modes (using IBM's MSB-0 convention where bit 0 is the most
+significant bit):
+
+- ``KVM_PPC_COMPAT_CAP_POWER9`` (bit 1) -- KVM guests can run in Power9 processor mode
+- ``KVM_PPC_COMPAT_CAP_POWER10`` (bit 2) -- KVM guests can run in Power10 processor mode
+- ``KVM_PPC_COMPAT_CAP_POWER11`` (bit 3) -- KVM guests can run in Power11 processor mode
+
+.. note::
+
+ The bit numbering above uses IBM's MSB-0 convention (bit 0 is the most
+ significant bit). In the actual implementation, these are defined as:
+
+ - ``KVM_PPC_COMPAT_CAP_POWER9`` = ``(1ULL << 62)``
+ - ``KVM_PPC_COMPAT_CAP_POWER10`` = ``(1ULL << 61)``
+ - ``KVM_PPC_COMPAT_CAP_POWER11`` = ``(1ULL << 60)``
+
+ Userspace should use the defined constants from ``<linux/kvm.h>`` rather
+ than hardcoding bit positions.
+
.. _kvm_run:
5. The kvm_run structure
--
2.50.1 (Apple Git-155)
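The MSB-0 note at the end of the patch can be checked mechanically. This is a minimal sketch that assumes local copies of the three documented values (real code would use the definitions from the patched `<linux/kvm.h>` instead). `PPC_MSB0_BIT()` and `print_compat_modes()` are hypothetical helpers for illustration, not kernel or UAPI names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Turn an IBM MSB-0 bit number (bit 0 = most significant) into a mask. */
#define PPC_MSB0_BIT(n) (1ULL << (63 - (n)))

/*
 * Local copies of the documented values, for illustration only; real
 * code should use the definitions from <linux/kvm.h> instead.
 */
#define KVM_PPC_COMPAT_CAP_POWER9  (1ULL << 62)
#define KVM_PPC_COMPAT_CAP_POWER10 (1ULL << 61)
#define KVM_PPC_COMPAT_CAP_POWER11 (1ULL << 60)

/* The MSB-0 bit numbers (1, 2, 3) and the shift values must agree. */
static_assert(PPC_MSB0_BIT(1) == KVM_PPC_COMPAT_CAP_POWER9, "Power9");
static_assert(PPC_MSB0_BIT(2) == KVM_PPC_COMPAT_CAP_POWER10, "Power10");
static_assert(PPC_MSB0_BIT(3) == KVM_PPC_COMPAT_CAP_POWER11, "Power11");

/* Print every mode set in compat_capabilities; return how many were set. */
static int print_compat_modes(uint64_t caps)
{
	static const struct {
		uint64_t mask;
		const char *name;
	} modes[] = {
		{ KVM_PPC_COMPAT_CAP_POWER9,  "Power9"  },
		{ KVM_PPC_COMPAT_CAP_POWER10, "Power10" },
		{ KVM_PPC_COMPAT_CAP_POWER11, "Power11" },
	};
	int count = 0;

	for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (caps & modes[i].mask) {
			printf("%s\n", modes[i].name);
			count++;
		}
	}
	return count;
}
```

For example, a `compat_capabilities` value of `KVM_PPC_COMPAT_CAP_POWER9 | KVM_PPC_COMPAT_CAP_POWER10` prints `Power9` and `Power10`. Bits outside the three documented masks are ignored rather than guessed at.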
* Re: [PATCH v5 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
2026-07-01 5:14 ` [PATCH v5 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl Amit Machhiwal
@ 2026-07-01 5:28 ` sashiko-bot
From: sashiko-bot @ 2026-07-01 5:28 UTC
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues:
- [Medium] The documentation specifies the ioctl returns `-ENOTTY` on non-pseries platforms, but the code returns `-EINVAL`.
- [Low] The ioctl parameter is documented as an `(out)` parameter, but it is actually an `(in/out)` parameter.
Pre-existing issues:
- [High] The UAPI unconditionally overwrites the user's `size` field with the kernel's struct size on success, leading to future user-space buffer overflows.
--
commit 2c871fad53b6cd5636eeb954745af1ef601fd441
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
Add documentation for the KVM_PPC_GET_COMPAT_CAPS ioctl to the KVM API
documentation. The ioctl exposes host processor compatibility modes supported
for nested KVM guests on PowerPC systems.
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index a5f9ee92f43e8..43810c451317f 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6566,6 +6566,85 @@ KVM_S390_KEYOP_SSKE
> Sets the storage key for the guest address ``guest_addr`` to the key
> specified in ``key``, returning the previous value in ``key``.
>
> +4.145 KVM_PPC_GET_COMPAT_CAPS
> +-----------------------------
> +:Capability: KVM_CAP_PPC_COMPAT_CAPS
> +:Architectures: powerpc
> +:Type: vm ioctl
> +:Parameters: struct kvm_ppc_compat_caps (out)
[Severity: Low]
Since userspace must set the size and flags fields before calling
the ioctl, should this parameter be designated as (in/out) rather than
just (out)?
> +:Returns: 0 on success, negative value on failure
> +
> +Errors include:
> +
> + ======== ============================================================
> + EFAULT if ``struct kvm_ppc_compat_caps`` cannot be read from or
> + written to userspace
> + EINVAL if the ``size`` field is smaller than
> + ``KVM_PPC_COMPAT_CAPS_SIZE_VER0``, if the ``flags`` field
> + is non-zero, or if the backend fails to retrieve or map
> + CPU compatibility capabilities
> + E2BIG if ``size`` is larger than the kernel's struct size
> + (new userspace on old kernel); the kernel writes back its
> + own struct size into the ``size`` field so userspace can
> + retry with the correct size
> + ENOTTY if the backend does not implement the ``get_compat_caps``
> + operation (e.g., on non-pseries platforms or when the
> + required KVM operations are not available)
[Severity: Medium]
Is this error code accurate for non-pseries platforms?
Looking at kvmppc_get_compat_caps() in arch/powerpc/kvm/book3s_hv.c,
it appears to return -EINVAL instead:
arch/powerpc/kvm/book3s_hv.c:kvmppc_get_compat_caps() {
	long rc = -EINVAL;
	if (kvmhv_on_pseries()) {
		...
	}
	if (rc < 0)
		return rc;
}
> + ======== ============================================================
> +
> +IBM POWER system server-based processors provide a compatibility mode feature
> +where an Nth generation processor can operate in modes consistent with earlier
> +generations such as (N-1) and (N-2).
> +
> +This ioctl provides userspace with information about the CPU compatibility modes
> +supported by the current host processor for booting the nested KVM guests on
> +KVM on PowerNV (nested API v1) and KVM on PowerVM (nested API v2) platforms.
> +
> +::
> +
> + struct kvm_ppc_compat_caps {
> + __u64 size; /* Size of this structure */
> + __u64 flags; /* Reserved for future use, must be 0 */
> + __u64 compat_capabilities; /* Capabilities supported by the host */
> + };
> +
> +Before calling this ioctl, userspace must set the ``size`` field to
> +``sizeof(struct kvm_ppc_compat_caps)`` and zero the ``flags`` field.
> +The kernel rejects non-zero ``flags`` with ``-EINVAL`` to prevent
> +uninitialized stack values from being silently accepted, keeping the
> +field available for future use without ABI ambiguity.
> +
> +The ioctl uses ``copy_struct_from_user()`` and ``copy_struct_to_user()``
> +to support extensible versioning: if userspace passes a struct smaller
> +than the current kernel version (``size >= KVM_PPC_COMPAT_CAPS_SIZE_VER0``),
> +the kernel zero-pads unknown trailing fields.
[Severity: High]
This isn't a bug introduced by this patch, but while documenting the extensible
versioning, does the existing code in kvm_arch_vm_ioctl() handle successful
calls correctly for smaller user structs?
arch/powerpc/kvm/powerpc.c:kvm_arch_vm_ioctl() {
	...
	host_caps.size = sizeof(host_caps);
	r = copy_struct_to_user(argp, usize, &host_caps,
				sizeof(host_caps), NULL);
	...
}
By unconditionally overwriting host_caps.size with the kernel's struct size
on success, if the kernel struct is extended in the future (e.g., from 24 to
32 bytes), a userspace application passing size = 24 will get a success return
but have its size field overwritten with 32.
If userspace then reuses this struct for a subsequent ioctl call without
reinitializing the size field, it will pass size = 32. Will the kernel then read
and write 32 bytes to the user's 24-byte buffer, potentially corrupting memory?
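The reuse hazard described above can be illustrated with a userspace model of the quoted code path. This is a sketch under stated assumptions, not the kernel implementation. `struct caps_v1` and its `new_field` are a hypothetical future layout. `model_get_compat_caps()` stands in for the ioctl, with `memcpy()` in place of the uaccess helpers, and it follows the size and flags contract as the patch documents it. `query_caps_v0()` is a hypothetical caller that sets `size` and `flags` before every call, as the documentation already requires.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Layout documented by the patch; KVM_PPC_COMPAT_CAPS_SIZE_VER0 is 24. */
struct caps_v0 {
	uint64_t size;
	uint64_t flags;
	uint64_t compat_capabilities;
};

/* Hypothetical future kernel layout with one extra trailing field. */
struct caps_v1 {
	uint64_t size;
	uint64_t flags;
	uint64_t compat_capabilities;
	uint64_t new_field;		/* hypothetical, not in the patch */
};

static_assert(sizeof(struct caps_v0) == 24, "VER0 size");

/*
 * Model of the quoted kvm_arch_vm_ioctl() path, acting as a "future"
 * kernel whose struct is caps_v1. ubuf stands in for the user pointer
 * and memcpy() replaces the uaccess helpers.
 */
static int model_get_compat_caps(void *ubuf)
{
	struct caps_v1 k = { 0 };
	uint64_t usize;

	memcpy(&usize, ubuf, sizeof(usize));
	if (usize < sizeof(struct caps_v0))
		return -EINVAL;
	if (usize > sizeof(k)) {
		/* Documented E2BIG path: report the kernel's struct size. */
		k.size = sizeof(k);
		memcpy(ubuf, &k.size, sizeof(k.size));
		return -E2BIG;
	}
	memcpy(&k, ubuf, usize);	/* fields userspace lacks stay zero */
	if (k.flags)
		return -EINVAL;

	k.compat_capabilities = 1ULL << 62;	/* pretend: Power9 only */
	k.size = sizeof(k);			/* unconditional overwrite */
	memcpy(ubuf, &k, usize);		/* usize == min(usize, ksize) here */
	return 0;
}

/* Defensive caller: set the input fields before every single call. */
static int query_caps_v0(struct caps_v0 *c)
{
	c->size = sizeof(*c);
	c->flags = 0;
	return model_get_compat_caps(c);
}
```

After the first successful call, the 24-byte caller sees `size == 32`. Passing that struct back unchanged would make the model read and write 32 bytes, which is the overflow the question describes. `query_caps_v0()` never forwards the stale value.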
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260701051409.51820-1-amachhiw@linux.ibm.com?part=4