* [PATCH v4 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
2026-06-16 12:33 [PATCH v4 0/4] KVM: PPC: Expose CPU compatibility modes for nested guests Amit Machhiwal
@ 2026-06-16 12:33 ` Amit Machhiwal
2026-06-16 12:45 ` sashiko-bot
2026-06-16 12:33 ` [PATCH v4 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM Amit Machhiwal
` (2 subsequent siblings)
3 siblings, 1 reply; 9+ messages in thread
From: Amit Machhiwal @ 2026-06-16 12:33 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
Introduce a new capability and ioctl to expose CPU compatibility modes
supported by the host processor for nested guests.
On IBM POWER systems, newer processor generations (N) can operate in
compatibility modes corresponding to earlier generations, like (N-1) and
(N-2). This is particularly relevant for nested virtualization, where
nested KVM guests may need to run with a specific processor compatibility
level.
Introduce the KVM_CAP_PPC_COMPAT_CAPS capability and the corresponding
KVM_PPC_GET_COMPAT_CAPS vm ioctl. The ioctl returns a bitmap in which
each set bit identifies one compatibility mode supported by the host,
allowing userspace (e.g., QEMU) to select an appropriate compatibility
level when configuring nested KVM guests.
The ioctl is handled in kvm_arch_vm_ioctl(), which retrieves the host
CPU compatibility capabilities via a PowerPC-specific backend
implementation when one is available. The handler validates the structure
size passed from userspace to ensure forward compatibility and returns
an appropriate error code (EINVAL for an invalid size, EFAULT for copy
failures, ENOTTY if the backend is not implemented). struct
kvm_ppc_compat_caps includes a size field to support future ABI
extensions.
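As an illustration of the size handshake described above, a userspace
caller would presumably zero the structure and fill in its own view of
the size before issuing the ioctl. The sketch below is not part of the
patch: the struct is a local mirror of the proposed UAPI layout, and
compat_caps_init() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct proposed in this patch (normally from <asm/kvm.h>). */
struct kvm_ppc_compat_caps {
	uint64_t flags;               /* reserved for future use */
	uint64_t size;                /* size of this structure, set by the caller */
	uint64_t compat_capabilities; /* filled in by the kernel */
};

/* Three 64-bit fields, so the layout is 24 bytes with no padding. */
static_assert(sizeof(struct kvm_ppc_compat_caps) == 24, "unexpected layout");

/* Hypothetical helper: prepare a request before calling the ioctl. */
static void compat_caps_init(struct kvm_ppc_compat_caps *caps)
{
	memset(caps, 0, sizeof(*caps)); /* keeps the reserved flags at zero */
	caps->size = sizeof(*caps);     /* the kernel compares this to its own size */
}
```

The caller would then pass the structure to
ioctl(vm_fd, KVM_PPC_GET_COMPAT_CAPS, &caps) and handle EINVAL, EFAULT
and ENOTTY as listed above.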
Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
arch/powerpc/include/asm/kvm_ppc.h | 1 +
arch/powerpc/include/uapi/asm/kvm.h | 7 ++++++
arch/powerpc/kvm/powerpc.c | 35 +++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 4 ++++
4 files changed, 47 insertions(+)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0953f2daa466..169ea6a7fbad 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -319,6 +319,7 @@ struct kvmppc_ops {
bool (*hash_v3_possible)(void);
int (*create_vm_debugfs)(struct kvm *kvm);
int (*create_vcpu_debugfs)(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry);
+ int (*get_compat_caps)(struct kvm_ppc_compat_caps *host_caps);
};
extern struct kvmppc_ops *kvmppc_hv_ops;
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 077c5437f521..8a38be6c3b03 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -437,6 +437,13 @@ struct kvm_ppc_cpu_char {
__u64 behaviour_mask; /* valid bits in behaviour */
};
+/* For KVM_PPC_GET_COMPAT_CAPS */
+struct kvm_ppc_compat_caps {
+ __u64 flags; /* Reserved for future use */
+ __u64 size; /* Size of this structure */
+ __u64 compat_capabilities; /* Capabilities supported by the host */
+};
+
/*
* Values for character and character_mask.
* These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 98de68379b18..9153b0034b45 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -701,6 +701,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
}
}
break;
+#if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+ case KVM_CAP_PPC_COMPAT_CAPS:
+ r = 0;
+ if (kvmhv_on_pseries())
+ r = 1;
+ break;
+#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
default:
r = 0;
break;
@@ -2467,6 +2474,34 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
r = kvm->arch.kvm_ops->svm_off(kvm);
break;
}
+ case KVM_PPC_GET_COMPAT_CAPS: {
+ struct kvm_ppc_compat_caps host_caps;
+ u64 user_size;
+
+ r = -EFAULT;
+ /* First, get the size field from userspace to validate */
+ if (copy_from_user(&user_size, &((struct kvm_ppc_compat_caps
+ __user *)argp)->size, sizeof(user_size))) {
+ goto out;
+ }
+
+ /* Validate size - must be at least the current structure size */
+ r = -EINVAL;
+ if (user_size < sizeof(host_caps))
+ goto out;
+
+ r = -ENOTTY;
+ memset(&host_caps, 0, sizeof(host_caps));
+ if (!kvm->arch.kvm_ops->get_compat_caps)
+ goto out;
+
+ r = kvm->arch.kvm_ops->get_compat_caps(&host_caps);
+ /* Set the actual size of the structure we're returning */
+ host_caps.size = sizeof(host_caps);
+ if (!r && copy_to_user(argp, &host_caps, sizeof(host_caps)))
+ r = -EFAULT;
+ break;
+ }
default: {
struct kvm *kvm = filp->private_data;
r = kvm->arch.kvm_ops->arch_vm_ioctl(filp, ioctl, arg);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 6c8afa2047bf..1788a0068662 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -996,6 +996,7 @@ struct kvm_enable_cap {
#define KVM_CAP_S390_USER_OPEREXEC 246
#define KVM_CAP_S390_KEYOP 247
#define KVM_CAP_S390_VSIE_ESAMODE 248
+#define KVM_CAP_PPC_COMPAT_CAPS 249
struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1349,6 +1350,9 @@ struct kvm_s390_keyop {
#define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
#define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)
+/* Available with KVM_CAP_PPC_COMPAT_CAPS */
+#define KVM_PPC_GET_COMPAT_CAPS _IOR(KVMIO, 0xe4, struct kvm_ppc_compat_caps)
+
/*
* ioctls for vcpu fds
*/
--
2.50.1 (Apple Git-155)
* Re: [PATCH v4 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
2026-06-16 12:33 ` [PATCH v4 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl Amit Machhiwal
@ 2026-06-16 12:45 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-16 12:45 UTC
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Backward compatibility failure due to strict size check and unconditional copy size.
- [High] Incorrect `_IOR` ioctl direction flag breaks UAPI extensibility and tooling.
- [High] Unvalidated reserved `flags` field prevents future use as an input.
- [Medium] Capability `KVM_CAP_PPC_COMPAT_CAPS` is advertised even when the backend lacks support.
--
commit c1e1544ff6eb8d147eda5713932bdc47b539c526
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
Subject: KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl
This commit introduces a new KVM capability and ioctl to expose host CPU
compatibility modes to nested guests. It adds the KVM_CAP_PPC_COMPAT_CAPS
capability and the KVM_PPC_GET_COMPAT_CAPS ioctl, which returns a bitmap
of supported compatibility modes via the kvm_ppc_compat_caps structure.
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 00302399fc37b..2f72431b6ef53 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
[ ... ]
> @@ -697,6 +697,13 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
> }
> }
> break;
> +#if defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
> + case KVM_CAP_PPC_COMPAT_CAPS:
> + r = 0;
> + if (kvmhv_on_pseries())
> + r = 1;
> + break;
> +#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
[Severity: Medium]
Will this falsely advertise support if the backend lacks the get_compat_caps
callback?
It appears this unconditionally returns 1 when kvmhv_on_pseries() is true,
even if running on a backend like KVM-PR that might not populate
kvm->arch.kvm_ops->get_compat_caps.
Should this verify the backend capabilities before returning 1?
[ ... ]
> @@ -2463,6 +2470,34 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> r = kvm->arch.kvm_ops->svm_off(kvm);
> break;
> }
> + case KVM_PPC_GET_COMPAT_CAPS: {
> + struct kvm_ppc_compat_caps host_caps;
> + u64 user_size;
> +
> + r = -EFAULT;
> + /* First, get the size field from userspace to validate */
> + if (copy_from_user(&user_size, &((struct kvm_ppc_compat_caps
> + __user *)argp)->size, sizeof(user_size))) {
> + goto out;
> + }
> +
> + /* Validate size - must be at least the current structure size */
> + r = -EINVAL;
> + if (user_size < sizeof(host_caps))
> + goto out;
[Severity: High]
Does this strict size check prevent backward compatibility?
If the structure is extended in a future kernel, sizeof(host_caps) will
increase. Older userspace binaries passing their original, smaller user_size
will be rejected with -EINVAL.
Could this be resolved by accepting smaller sizes and limiting the
copy_to_user() length to min(user_size, sizeof(host_caps))?
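A userspace model of that suggestion might look like the sketch below.
It is only an illustration: copy_caps_bounded() is a hypothetical name,
memcpy() stands in for copy_to_user(), and the minimum accepted size
(the flags and size fields) is an assumption of this sketch rather than
something the patch specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct layout proposed in the patch. */
struct compat_caps_v1 {
	uint64_t flags;
	uint64_t size;
	uint64_t compat_capabilities;
};

/*
 * Accept any user_size that covers at least flags and size, and copy back
 * no more than the smaller of the caller's buffer and the kernel's struct.
 * Returns the number of bytes copied, or -1 if user_size is too small.
 */
static long copy_caps_bounded(void *user_buf, uint64_t user_size,
			      const struct compat_caps_v1 *kernel_caps)
{
	const uint64_t min_size = 2 * sizeof(uint64_t); /* assumed floor */
	uint64_t len;

	if (user_size < min_size)
		return -1;

	len = user_size < sizeof(*kernel_caps) ? user_size : sizeof(*kernel_caps);
	memcpy(user_buf, kernel_caps, len); /* stands in for copy_to_user() */
	return (long)len;
}
```

With this shape, a binary built against a smaller struct still gets the
fields it knows about, and a larger caller buffer is never overrun.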
> +
> + r = -ENOTTY;
> + memset(&host_caps, 0, sizeof(host_caps));
[Severity: High]
Does zeroing the structure locally without validating the user-provided flags
prevent those flags from being used in the future?
Since the userspace flags aren't read and verified to be zero, applications
might pass uninitialized garbage memory. If a future kernel assigns meaning to
these flags, wouldn't it interpret that legacy garbage data as valid input?
> + if (!kvm->arch.kvm_ops->get_compat_caps)
> + goto out;
> +
> + r = kvm->arch.kvm_ops->get_compat_caps(&host_caps);
> + /* Set the actual size of the structure we're returning */
> + host_caps.size = sizeof(host_caps);
> + if (!r && copy_to_user(argp, &host_caps, sizeof(host_caps)))
> + r = -EFAULT;
> + break;
> + }
[ ... ]
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6c8afa2047bf3..1788a00686626 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
[ ... ]
> @@ -1349,6 +1350,9 @@ struct kvm_s390_keyop {
> #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
> #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)
>
> +/* Available with KVM_CAP_PPC_COMPAT_CAPS */
> +#define KVM_PPC_GET_COMPAT_CAPS _IOR(KVMIO, 0xe4, struct kvm_ppc_compat_caps)
[Severity: High]
Does using _IOR here cause issues for tracing tools and user-mode translation
layers?
Since the implementation reads the size field from userspace via
copy_from_user(), it seems this should be defined with _IOWR.
Also, since _IOR embeds the structure size into the ioctl number, won't
expanding the structure in the future change the ioctl number and break older
binaries?
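The encoding concern can be checked from userspace with the standard
ioctl macros. In the sketch below, caps_v2 is a hypothetical future
layout with one extra field, and the nr_*() helpers are illustrative
names; KVMIO is 0xAE as in <linux/kvm.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>

#define KVMIO 0xAE /* KVM's ioctl type byte */

struct caps_v1 { uint64_t flags, size, compat_capabilities; };
/* Hypothetical extended layout, used only to show the effect of growth. */
struct caps_v2 { uint64_t flags, size, compat_capabilities, extra; };

/* Request number as defined in the patch (_IOR with the current struct). */
static unsigned int nr_v1(void)
{
	return (unsigned int)_IOR(KVMIO, 0xe4, struct caps_v1);
}

/* Same command number, but encoded with the larger struct. */
static unsigned int nr_v2(void)
{
	return (unsigned int)_IOR(KVMIO, 0xe4, struct caps_v2);
}

/* Same struct as today, but declared read/write. */
static unsigned int nr_v1_rw(void)
{
	return (unsigned int)_IOWR(KVMIO, 0xe4, struct caps_v1);
}
```

_IOC_SIZE() and _IOC_DIR() recover the embedded size and direction from
each value, so nr_v1() and nr_v2() differ even though both use 0xe4.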
> /*
> * ioctls for vcpu fds
> */
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616123314.82721-1-amachhiw@linux.ibm.com?part=1
* [PATCH v4 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
2026-06-16 12:33 [PATCH v4 0/4] KVM: PPC: Expose CPU compatibility modes for nested guests Amit Machhiwal
2026-06-16 12:33 ` [PATCH v4 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl Amit Machhiwal
@ 2026-06-16 12:33 ` Amit Machhiwal
2026-06-16 13:00 ` sashiko-bot
2026-06-16 12:33 ` [PATCH v4 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV Amit Machhiwal
2026-06-16 12:33 ` [PATCH v4 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl Amit Machhiwal
3 siblings, 1 reply; 9+ messages in thread
From: Amit Machhiwal @ 2026-06-16 12:33 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
On POWER systems, the host CPU may run in a compatibility mode (e.g., a
Power11 processor operating in Power10 compatibility mode). In such
cases, the effective CPU level exposed to guests differs from the
physical processor generation.
When running nested KVM guests, QEMU derives the host CPU type using
mfpvr(), which reflects the physical processor version. This can result
in a mismatch between the CPU model selected by QEMU and the
compatibility mode enforced by the host, leading to guest boot failures.
For example, booting a nested guest on a Power11 LPAR configured in
Power10 compatibility mode fails with:
KVM-NESTEDv2: couldn't set guest wide elements
[..KVM reg dump..]
This occurs because QEMU selects a CPU model corresponding to the
physical processor (via mfpvr()), while the host operates in a lower
compatibility mode. As a result, KVM rejects the requested compatibility
level during guest initialization.
Add support for retrieving host CPU compatibility capabilities for
nested guests on PowerVM (PAPR nested API v2). The hypervisor provides
the effective compatibility levels via the H_GUEST_GET_CAPABILITIES
hcall, which reflects the processor modes negotiated between the Power
hypervisor (L0) and the host partition (L1).
On pseries systems, obtain the capability bitmap using
plpar_guest_get_capabilities() and return it via struct
kvm_ppc_compat_caps. The implementation defines KVM-specific capability
constants (KVM_PPC_COMPAT_CAP_POWER9/10/11) and applies masking to ensure
only supported processor modes are exposed to userspace. This information
is then exposed through the KVM_PPC_GET_COMPAT_CAPS ioctl.
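For illustration, userspace could consume the masked bitmap roughly as
in the sketch below. The capability constants are the ones defined in
this patch; newest_compat_mode() is a hypothetical helper and is not how
QEMU is known to use the value.

```c
#include <assert.h>
#include <stdint.h>

/* Capability bits as defined in this patch. */
#define KVM_PPC_COMPAT_CAP_POWER9	(1ULL << 62)
#define KVM_PPC_COMPAT_CAP_POWER10	(1ULL << 61)
#define KVM_PPC_COMPAT_CAP_POWER11	(1ULL << 60)

/*
 * Hypothetical userspace helper: return the newest Power generation
 * advertised in the bitmap (9, 10 or 11), or 0 if no known bit is set.
 */
static int newest_compat_mode(uint64_t compat_capabilities)
{
	if (compat_capabilities & KVM_PPC_COMPAT_CAP_POWER11)
		return 11;
	if (compat_capabilities & KVM_PPC_COMPAT_CAP_POWER10)
		return 10;
	if (compat_capabilities & KVM_PPC_COMPAT_CAP_POWER9)
		return 9;
	return 0;
}
```

A Power11 LPAR running in Power10 compatibility mode would be expected
to report the POWER10 bit without the POWER11 bit, so the helper returns
10 in that case.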
Hook the implementation into the Book3S HV kvmppc_ops so that it can be
invoked by the generic KVM ioctl handling code.
Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
arch/powerpc/include/uapi/asm/kvm.h | 11 ++++++++++-
arch/powerpc/kvm/book3s_hv.c | 17 +++++++++++++++++
2 files changed, 27 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 8a38be6c3b03..730488681443 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -443,7 +443,16 @@ struct kvm_ppc_compat_caps {
__u64 size; /* Size of this structure */
__u64 compat_capabilities; /* Capabilities supported by the host */
};
-
+/*
+ * Capability bits for compat_capabilities field in kvm_ppc_compat_caps.
+ * These bits indicate which processor compatibility modes are supported.
+ */
+#define KVM_PPC_COMPAT_CAP_POWER9 (1ULL << 62)
+#define KVM_PPC_COMPAT_CAP_POWER10 (1ULL << 61)
+#define KVM_PPC_COMPAT_CAP_POWER11 (1ULL << 60)
+#define KVM_PPC_COMPAT_BITMASK (KVM_PPC_COMPAT_CAP_POWER9 | \
+ KVM_PPC_COMPAT_CAP_POWER10 | \
+ KVM_PPC_COMPAT_CAP_POWER11)
/*
* Values for character and character_mask.
* These are identical to the values used by H_GET_CPU_CHARACTERISTICS.
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f9380ef65750..f674386df62c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6523,6 +6523,22 @@ static bool kvmppc_hash_v3_possible(void)
return true;
}
+
+static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
+{
+ unsigned long capabilities = 0;
+ long rc = -EINVAL;
+
+ if (kvmhv_on_pseries()) {
+ if (kvmhv_is_nestedv2())
+ rc = plpar_guest_get_capabilities(0, &capabilities);
+ host_caps->compat_capabilities = capabilities &
+ KVM_PPC_COMPAT_BITMASK;
+ }
+
+ return rc;
+}
+
static struct kvmppc_ops kvm_ops_hv = {
.get_sregs = kvm_arch_vcpu_ioctl_get_sregs_hv,
.set_sregs = kvm_arch_vcpu_ioctl_set_sregs_hv,
@@ -6565,6 +6581,7 @@ static struct kvmppc_ops kvm_ops_hv = {
.hash_v3_possible = kvmppc_hash_v3_possible,
.create_vcpu_debugfs = kvmppc_arch_create_vcpu_debugfs_hv,
.create_vm_debugfs = kvmppc_arch_create_vm_debugfs_hv,
+ .get_compat_caps = kvmppc_get_compat_caps,
};
static int kvm_init_subcore_bitmap(void)
--
2.50.1 (Apple Git-155)
* Re: [PATCH v4 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
2026-06-16 12:33 ` [PATCH v4 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM Amit Machhiwal
@ 2026-06-16 13:00 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-16 13:00 UTC
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
- [High] Strict size validation of the extensible `struct kvm_ppc_compat_caps` prevents future extensions without breaking backward compatibility.
- [High] Raw hypervisor `H_*` error codes are directly leaked to userspace as POSIX error numbers, causing incorrect system call mapping and potential infinite loops.
- [High] The ioctl incorrectly queries raw L0 hypervisor capabilities instead of the capabilities actually negotiated by the L1 KVM host.
- [Medium] Device tree node reference leak in `for_each_node_by_type` loop.
--
commit c079a1c0671cc47d6e2767d0d012a074b48a3edf
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
Subject: KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM
On POWER systems, the host CPU may run in a compatibility mode (e.g., a
Power11 processor operating in Power10 compatibility mode). In such
cases, the effective CPU level exposed to guests differs from the
physical processor generation.
[ ... ]
> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
> index 8a38be6c3b03b..7304886814435 100644
> --- a/arch/powerpc/include/uapi/asm/kvm.h
> +++ b/arch/powerpc/include/uapi/asm/kvm.h
> @@ -443,7 +443,16 @@ struct kvm_ppc_compat_caps {
> __u64 size; /* Size of this structure */
> __u64 compat_capabilities; /* Capabilities supported by the host */
> };
[Severity: High]
The KVM_PPC_GET_COMPAT_CAPS ioctl is defined using
_IOR(KVMIO, 0xe4, struct kvm_ppc_compat_caps). Since _IOR embeds the
structure size in the ioctl number, if fields are added to this
structure in the future, the ioctl number will change.
Additionally, the explicit size validation in kvm_arch_vm_ioctl() rejects
requests where user_size < sizeof(host_caps), returning -EINVAL:
arch/powerpc/kvm/powerpc.c:kvm_arch_vm_ioctl() {
...
/* Validate size - must be at least the current structure size */
r = -EINVAL;
if (user_size < sizeof(host_caps))
goto out;
...
}
Does this prevent extending the structure in the future without breaking
backward UAPI compatibility for older userspace binaries?
> +/*
> + * Capability bits for compat_capabilities field in kvm_ppc_compat_caps.
> + * These bits indicate which processor compatibility modes are supported.
> + */
> +#define KVM_PPC_COMPAT_CAP_POWER9 (1ULL << 62)
[ ... ]
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 61dbeea317f3b..5ef6713343bed 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -6510,6 +6510,22 @@ static bool kvmppc_hash_v3_possible(void)
[ ... ]
> +static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
> +{
> + unsigned long capabilities = 0;
> + long rc = -EINVAL;
> +
> + if (kvmhv_on_pseries()) {
> + if (kvmhv_is_nestedv2())
> + rc = plpar_guest_get_capabilities(0, &capabilities);
[Severity: High]
Are we returning a raw hypervisor error code directly to userspace here?
Negative codes like H_PARAMETER (-4) could be incorrectly interpreted as
standard POSIX errors (e.g., -EINTR). Standard userspace libraries might
handle EINTR by automatically retrying the ioctl, which could lead to an
infinite loop.
Positive codes like H_NOT_AVAILABLE (3) will be returned directly to the
userspace caller, skipping the copy_to_user logic in kvm_arch_vm_ioctl()
while still appearing as a success to userspace.
Should we map these raw hypervisor status codes to standard POSIX errors
before returning?
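One possible shape for such a translation is sketched below. The status
values are the ones quoted in this review (plus H_SUCCESS as 0); the
helper name and the errno chosen for each status are assumptions of this
sketch, not something the patch or the kernel defines.

```c
#include <assert.h>
#include <errno.h>

/* Hcall status values quoted in this review; H_SUCCESS is 0. */
#define H_SUCCESS	0
#define H_NOT_AVAILABLE	3
#define H_PARAMETER	(-4)

/*
 * Hypothetical translation helper: never let a raw hcall status reach
 * userspace. The specific errno per status is an illustrative choice.
 */
static int hcall_status_to_errno(long hrc)
{
	switch (hrc) {
	case H_SUCCESS:
		return 0;
	case H_PARAMETER:
		return -EINVAL;
	case H_NOT_AVAILABLE:
		return -ENODEV;
	default:
		return -EIO; /* any other status, positive or negative */
	}
}
```

With a mapping like this, H_PARAMETER can no longer be mistaken for
-EINTR, and a positive status is no longer returned as an apparent
success.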
[Severity: High]
Does this query the raw L0 hypervisor capabilities instead of the capabilities
actively negotiated by the L1 KVM host?
According to the PAPR nested virtualization specification, this hypercall
retrieves unmodified L0 capabilities. However, KVM limits its nested guest
capabilities based on nested_capabilities negotiated during initialization:
arch/powerpc/kvm/book3s_hv_nested.c:kvmhv_nestedv2_init() {
...
nested_capabilities = capabilities & host_capabilities;
rc = plpar_guest_set_capabilities(0, nested_capabilities);
...
}
If QEMU receives raw L0 capabilities, could it attempt to use CPU modes that
KVM will subsequently reject during vCPU initialization, leading to guest
boot failures?
Should we return the negotiated nested_capabilities instead?
> + host_caps->compat_capabilities = capabilities &
> + KVM_PPC_COMPAT_BITMASK;
> + }
> +
> + return rc;
> +}
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616123314.82721-1-amachhiw@linux.ibm.com?part=2
* [PATCH v4 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
2026-06-16 12:33 [PATCH v4 0/4] KVM: PPC: Expose CPU compatibility modes for nested guests Amit Machhiwal
2026-06-16 12:33 ` [PATCH v4 1/4] KVM: PPC: Introduce KVM_CAP_PPC_COMPAT_CAPS and wire up ioctl Amit Machhiwal
2026-06-16 12:33 ` [PATCH v4 2/4] KVM: PPC: Book3S HV: Implement compat CPU capability retrieval for KVM on PowerVM Amit Machhiwal
@ 2026-06-16 12:33 ` Amit Machhiwal
2026-06-16 12:47 ` sashiko-bot
2026-06-16 12:33 ` [PATCH v4 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl Amit Machhiwal
3 siblings, 1 reply; 9+ messages in thread
From: Amit Machhiwal @ 2026-06-16 12:33 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
Currently, when booting a compatibility-mode KVM guest (L1) on a PowerNV
hypervisor (L0), the guest runs with the expected processor
compatibility level. However, when booting a nested KVM guest (L2)
inside the L1, QEMU derives the CPU model from the raw host PVR and
attempts to run the nested guest at that level, instead of honoring the
compatibility mode of the L1.
Extend host CPU compatibility capability reporting to support nested
virtualization on PowerNV systems (PAPR nested API v1).
For nested API v2 (PowerVM), compatibility capabilities are obtained
from the hypervisor via the H_GUEST_GET_CAPABILITIES hcall. This
information is not available on PowerNV systems.
For nested API v1, derive the compatibility capabilities from the L1
guest by reading the "cpu-version" property from the device tree, which
reflects the effective (logical) processor compatibility level. Map this
value to the corresponding compatibility capability bitmap using
KVM-specific constants.
Introduce a helper to translate CPU version values into KVM_PPC_COMPAT_CAP
bits and integrate it into kvmppc_get_compat_caps(). The implementation
applies masking to ensure only supported processor modes are exposed.
This allows userspace to query host CPU compatibility modes on both
PowerVM and PowerNV platforms via the KVM_PPC_GET_COMPAT_CAPS ioctl.
Suggested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
arch/powerpc/kvm/book3s_hv.c | 37 +++++++++++++++++++++++++++++++++++-
1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index f674386df62c..375e7a7fa9f8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -6523,15 +6523,50 @@ static bool kvmppc_hash_v3_possible(void)
return true;
}
+static int kvmppc_map_compat_capabilities(const __be32 cpu_version,
+ unsigned long *capabilities)
+{
+ switch (cpu_version) {
+ case PVR_ARCH_31_P11:
+ *capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
+ break;
+ case PVR_ARCH_31:
+ *capabilities |= KVM_PPC_COMPAT_CAP_POWER10;
+ break;
+ case PVR_ARCH_300:
+ *capabilities |= KVM_PPC_COMPAT_CAP_POWER9;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
{
+ struct device_node *np;
unsigned long capabilities = 0;
+ const __be32 *prop = NULL;
long rc = -EINVAL;
+ u32 cpu_version;
if (kvmhv_on_pseries()) {
- if (kvmhv_is_nestedv2())
+ if (kvmhv_is_nestedv2()) {
rc = plpar_guest_get_capabilities(0, &capabilities);
+ } else {
+ for_each_node_by_type(np, "cpu") {
+ prop = of_get_property(np, "cpu-version", NULL);
+ if (prop) {
+ cpu_version = be32_to_cpup(prop);
+ break;
+ }
+ }
+ if (!prop)
+ return -EINVAL;
+ rc = kvmppc_map_compat_capabilities(cpu_version,
+ &capabilities);
+ }
host_caps->compat_capabilities = capabilities &
KVM_PPC_COMPAT_BITMASK;
}
--
2.50.1 (Apple Git-155)
* Re: [PATCH v4 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
2026-06-16 12:33 ` [PATCH v4 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV Amit Machhiwal
@ 2026-06-16 12:47 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-16 12:47 UTC
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
- [High] Missing `of_node_put()` causes an OF node reference count leak when breaking out of `for_each_node_by_type()`.
- [High] Incomplete capability reporting due to missing `fallthrough` in `kvmppc_map_compat_capabilities`.
- [High] Hypercall H_ error codes are leaked directly to userspace as standard Linux errnos.
- [Medium] KVM_PPC_GET_COMPAT_CAPS is defined as _IOR but expects input from userspace.
- [Low] Endianness type mismatch for `cpu_version` between caller and callee violates sparse typing.
--
commit 2d0f9bcccc58cf2776556c592aad6db1adf0c8aa
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
Subject: KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV
Extend host CPU compatibility capability reporting to support nested
virtualization on PowerNV systems. For nested API v1, derive the
capabilities from the L1 guest via the "cpu-version" device tree property
and map them to KVM_PPC_COMPAT_CAP bits.
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 61dbeea317f3b..dd0f4d487a770 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
[ ... ]
> @@ -6510,6 +6510,57 @@ static bool kvmppc_hash_v3_possible(void)
> return true;
> }
>
> +static int kvmppc_map_compat_capabilities(const __be32 cpu_version,
> + unsigned long *capabilities)
[Severity: Low]
Does this parameter type violate sparse typing rules?
The caller kvmppc_get_compat_caps() passes a host-endian u32 value that was
converted via be32_to_cpup(). If cpu_version is annotated as __be32, might
this trigger a static analyzer warning?
> +{
> + switch (cpu_version) {
> + case PVR_ARCH_31_P11:
> + *capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
> + break;
> + case PVR_ARCH_31:
> + *capabilities |= KVM_PPC_COMPAT_CAP_POWER10;
> + break;
> + case PVR_ARCH_300:
> + *capabilities |= KVM_PPC_COMPAT_CAP_POWER9;
> + break;
[Severity: High]
Could this switch statement be missing fallthroughs?
Host CPU compatibility modes are typically cumulative. For example, a POWER10
CPU would support both POWER10 and POWER9 compatibility modes. With these
breaks, only a single highest capability bit is set. Is there a risk QEMU
might incorrectly conclude that older nested guests are unsupported?
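If cumulative reporting is indeed the intended behaviour, the mapping
could be written as in the userspace sketch below. The logical PVR
values are assumed to match arch/powerpc/include/asm/reg.h, the helper
name is hypothetical, and the parameter is a host-endian uint32_t, which
would also sidestep the sparse annotation question raised earlier.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Logical PVR values; assumed to match arch/powerpc/include/asm/reg.h. */
#define PVR_ARCH_300	0x0f000005
#define PVR_ARCH_31	0x0f000006
#define PVR_ARCH_31_P11	0x0f000007

/* Capability bits as defined in patch 2/4. */
#define KVM_PPC_COMPAT_CAP_POWER9	(1ULL << 62)
#define KVM_PPC_COMPAT_CAP_POWER10	(1ULL << 61)
#define KVM_PPC_COMPAT_CAP_POWER11	(1ULL << 60)

/*
 * Hypothetical cumulative variant: each level also sets the bits of all
 * older levels. Leaves *capabilities untouched for an unknown version.
 */
static int map_compat_caps_cumulative(uint32_t cpu_version,
				      uint64_t *capabilities)
{
	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
		/* fall through */
	case PVR_ARCH_31:
		*capabilities |= KVM_PPC_COMPAT_CAP_POWER10;
		/* fall through */
	case PVR_ARCH_300:
		*capabilities |= KVM_PPC_COMPAT_CAP_POWER9;
		break;
	default:
		return -EINVAL;
	}

	return 0;
}
```

In kernel code the comments would be replaced by the fallthrough
pseudo-keyword.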
> + default:
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int kvmppc_get_compat_caps(struct kvm_ppc_compat_caps *host_caps)
> +{
> + struct device_node *np;
> + unsigned long capabilities = 0;
> + const __be32 *prop = NULL;
> + long rc = -EINVAL;
> + u32 cpu_version;
> +
> + if (kvmhv_on_pseries()) {
> + if (kvmhv_is_nestedv2()) {
> + rc = plpar_guest_get_capabilities(0, &capabilities);
[Severity: High]
Are hypercall error codes being leaked directly to userspace here?
If plpar_guest_get_capabilities() fails, it returns a hypervisor error code
like H_PARAMETER (-4). This value is directly assigned to rc and returned to
the ioctl handler. Since -4 corresponds to -EINTR in Linux, would userspace
misinterpret the error as an interrupted system call and infinitely retry the
ioctl?
> + } else {
> + for_each_node_by_type(np, "cpu") {
> + prop = of_get_property(np, "cpu-version", NULL);
> + if (prop) {
> + cpu_version = be32_to_cpup(prop);
> + break;
[Severity: High]
Does this early break leak the OF node reference?
The for_each_node_by_type() macro drops the reference to the previous node
and acquires a reference to the current node. By breaking out of the loop
without calling of_node_put(np), the reference count appears to be leaked.
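The pattern can be modelled in userspace as below. This is a mock, not
kernel code: struct mock_node, node_get(), node_put() and next_node()
are hypothetical stand-ins for struct device_node, of_node_get(),
of_node_put() and the iterator behind for_each_node_by_type().

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct device_node; only the refcount matters. */
struct mock_node {
	int refcount;
	int has_cpu_version;
};

static void node_get(struct mock_node *n) { if (n) n->refcount++; }
static void node_put(struct mock_node *n) { if (n) n->refcount--; }

/*
 * Mimics the iterator contract: take a reference on the next node, drop
 * the reference on the previous one, return NULL at the end of the list.
 */
static struct mock_node *next_node(struct mock_node *nodes, size_t count,
				   struct mock_node *prev)
{
	size_t idx = prev ? (size_t)(prev - nodes) + 1 : 0;
	struct mock_node *next = idx < count ? &nodes[idx] : NULL;

	node_get(next);
	node_put(prev);
	return next;
}

/* Returns 1 if a node carrying the property was found, 0 otherwise. */
static int find_cpu_version(struct mock_node *nodes, size_t count)
{
	struct mock_node *np;
	int found = 0;

	for (np = next_node(nodes, count, NULL); np;
	     np = next_node(nodes, count, np)) {
		if (np->has_cpu_version) {
			found = 1;
			node_put(np); /* drop the reference before the early exit */
			break;
		}
	}

	return found;
}
```

Without the node_put() before the break, the matching node would be left
with a non-zero reference count after the function returns.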
> + }
> + }
> + if (!prop)
> + return -EINVAL;
> + rc = kvmppc_map_compat_capabilities(cpu_version,
> + &capabilities);
> + }
> + host_caps->compat_capabilities = capabilities &
> + KVM_PPC_COMPAT_BITMASK;
> + }
> +
> + return rc;
> +}
[ ... ]
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6c8afa2047bf3..1788a00686626 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
[ ... ]
> @@ -1349,6 +1350,9 @@ struct kvm_s390_keyop {
> #define KVM_GET_DEVICE_ATTR _IOW(KVMIO, 0xe2, struct kvm_device_attr)
> #define KVM_HAS_DEVICE_ATTR _IOW(KVMIO, 0xe3, struct kvm_device_attr)
>
> +/* Available with KVM_CAP_PPC_COMPAT_CAPS */
> +#define KVM_PPC_GET_COMPAT_CAPS _IOR(KVMIO, 0xe4, struct kvm_ppc_compat_caps)
[Severity: Medium]
Could this ioctl definition cause issues with wrapper libraries?
The KVM_PPC_GET_COMPAT_CAPS ioctl handler in kvm_arch_vm_ioctl() uses
copy_from_user() to read the size field from the userspace structure.
Since it reads input data from userspace, should this be defined as _IOWR?
Using _IOR might cause unexpected behavior with tools that assume the
parameter is purely an output buffer.
> +
> /*
> * ioctls for vcpu fds
> */
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616123314.82721-1-amachhiw@linux.ibm.com?part=3
* [PATCH v4 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
2026-06-16 12:33 [PATCH v4 0/4] KVM: PPC: Expose CPU compatibility modes for nested guests Amit Machhiwal
` (2 preceding siblings ...)
2026-06-16 12:33 ` [PATCH v4 3/4] KVM: PPC: Book3S HV: Add support for compat CPU capabilities for KVM on PowerNV Amit Machhiwal
@ 2026-06-16 12:33 ` Amit Machhiwal
2026-06-16 12:50 ` sashiko-bot
3 siblings, 1 reply; 9+ messages in thread
From: Amit Machhiwal @ 2026-06-16 12:33 UTC
To: linuxppc-dev, Madhavan Srinivasan
Cc: Vaibhav Jain, Amit Machhiwal, Anushree Mathur, Paolo Bonzini,
Nicholas Piggin, Michael Ellerman, Christophe Leroy (CS GROUP),
Jonathan Corbet, Shuah Khan, kvm, linux-kernel, linux-doc, lkp
Add documentation for the KVM_PPC_GET_COMPAT_CAPS ioctl to the KVM API
documentation.
The ioctl exposes host processor compatibility modes supported for
nested KVM guests on PowerPC systems. The documentation includes
comprehensive error code descriptions, structure field definitions
including the size field for forward compatibility, and KVM-specific
capability bit constants.
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
Documentation/virt/kvm/api.rst | 47 ++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 52bbbb553ce1..ba6feba74d7d 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6553,6 +6553,53 @@ KVM_S390_KEYOP_SSKE
Sets the storage key for the guest address ``guest_addr`` to the key
specified in ``key``, returning the previous value in ``key``.
+4.145 KVM_PPC_GET_COMPAT_CAPS
+-----------------------------
+:Capability: KVM_CAP_PPC_COMPAT_CAPS
+:Architectures: powerpc
+:Type: vm ioctl
+:Parameters: struct kvm_ppc_compat_caps (in/out)
+:Returns: 0 on success, negative value on failure
+
+Errors include:
+
+ ======== ============================================================
+ EFAULT if ``struct kvm_ppc_compat_caps`` cannot be read from or
+ written to userspace
+ EINVAL if the ``size`` field is smaller than the current structure
+ size, or if the backend implementation fails to retrieve or
+ map CPU compatibility capabilities
+ ENOTTY if the backend does not implement the ``get_compat_caps``
+ operation (e.g., on non-pseries platforms or when the
+ required KVM operations are not available)
+ ======== ============================================================
+
+IBM POWER system server-based processors provide a compatibility mode feature
+where an Nth generation processor can operate in modes consistent with earlier
+generations such as (N-1) and (N-2).
+
+This ioctl provides userspace with information about the CPU compatibility modes
+supported by the current host processor for booting the nested KVM guests on
+PowerNV (KVM nested APIv1) and PowerVM (KVM nested APIv2) platforms.
+
+::
+
+ struct kvm_ppc_compat_caps {
+ __u64 flags; /* Reserved for future use */
+ __u64 size; /* Size of this structure */
+ __u64 compat_capabilities; /* Capabilities supported by the host */
+ };
+
+The ``compat_capabilities`` bit field describes the processor compatibility
+modes supported by the host. For example, the following bits indicate support
+for specific processor modes.
+
+::
+
+ KVM_PPC_COMPAT_CAP_POWER9 (bit 1): KVM guests can run in Power9 processor mode
+ KVM_PPC_COMPAT_CAP_POWER10 (bit 2): KVM guests can run in Power10 processor mode
+ KVM_PPC_COMPAT_CAP_POWER11 (bit 3): KVM guests can run in Power11 processor mode
+
.. _kvm_run:
5. The kvm_run structure
--
2.50.1 (Apple Git-155)
* Re: [PATCH v4 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
2026-06-16 12:33 ` [PATCH v4 4/4] KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl Amit Machhiwal
@ 2026-06-16 12:50 ` sashiko-bot
0 siblings, 0 replies; 9+ messages in thread
From: sashiko-bot @ 2026-06-16 12:50 UTC (permalink / raw)
To: Amit Machhiwal; +Cc: kvm
Thank you for your contribution! Sashiko AI review found 7 potential issue(s) to consider:
- [High] UAPI backwards compatibility breakage due to invalid struct size validation.
- [High] Reserved `flags` field is not checked for zero, rendering it permanently unusable for future input.
- [High] Capability mapping logic lacks `fallthrough`, artificially restricting nested guests.
- [High] Resource leak of `device_node` refcount in `kvmppc_get_compat_caps()`.
- [Medium] Misleading documentation causing userspace to pass uninitialized memory to the kernel.
- [Medium] UAPI misconfiguration: IOCTL direction is `_IOR` but kernel reads from userspace.
- [Medium] Inconsistent error return code on non-pseries platforms contradicts documentation.
--
commit 8e3e8114a54d0e9097662db3db8e0e1e2ee2506b
Author: Amit Machhiwal <amachhiw@linux.ibm.com>
KVM: PPC: Document KVM_PPC_GET_COMPAT_CAPS ioctl
This commit adds documentation for the KVM_PPC_GET_COMPAT_CAPS ioctl in the
KVM API. It details the capabilities, structure fields, and expected error
codes.
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 52bbbb553ce10..ba6feba74d7d9 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6553,6 +6553,53 @@ KVM_S390_KEYOP_SSKE
> Sets the storage key for the guest address ``guest_addr`` to the key
> specified in ``key``, returning the previous value in ``key``.
>
> +4.145 KVM_PPC_GET_COMPAT_CAPS
> +-----------------------------
> +:Capability: KVM_CAP_PPC_COMPAT_CAPS
> +:Architectures: powerpc
> +:Type: vm ioctl
> +:Parameters: struct kvm_ppc_compat_caps (out)
[Severity: Medium]
Is this parameter accurately described as strictly output-only?
Looking at the existing implementation, the kernel reads the size field from
userspace:
arch/powerpc/kvm/powerpc.c:kvm_arch_vm_ioctl() {
...
if (copy_from_user(&user_size, &((struct kvm_ppc_compat_caps
__user *)argp)->size, sizeof(user_size))) {
...
}
If userspace leaves the struct uninitialized, as this documentation suggests
it may, the kernel reads an indeterminate value for size and the call can fail
with -EINVAL. Should the documentation describe this as an input/output
parameter?
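For illustration, a caller that treats the argument as input/output would put
the structure in a defined state before the call. This is a minimal sketch,
assuming the three-field layout quoted from the patch; compat_caps_prepare()
is a hypothetical helper, and the struct is redeclared locally with <stdint.h>
types so the snippet builds without the patched <linux/kvm.h>.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout quoted in the patch (assumption: three u64s). */
struct kvm_ppc_compat_caps {
	uint64_t flags;               /* reserved, zeroed by the caller */
	uint64_t size;                /* structure size as seen by userspace */
	uint64_t compat_capabilities; /* filled in by the kernel */
};

/* Hypothetical helper: give every field a defined value before the ioctl. */
static void compat_caps_prepare(struct kvm_ppc_compat_caps *caps)
{
	memset(caps, 0, sizeof(*caps));
	caps->size = sizeof(*caps);
}
```

The prepared structure would then be passed to KVM_PPC_GET_COMPAT_CAPS on the
VM file descriptor.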
[Severity: Medium]
This isn't a bug introduced by this patch, but should the ioctl direction be
updated?
The IOCTL is currently defined as read-only:
include/uapi/linux/kvm.h:
#define KVM_PPC_GET_COMPAT_CAPS _IOR(KVMIO, 0xe4, struct kvm_ppc_compat_caps)
Since the kernel also reads the size field from userspace, _IOR understates
the data direction. This breaks the ioctl encoding convention and can mislead
tracing tools that decode the direction bits. Should it be defined with _IOWR?
> +:Returns: 0 on success, negative value on failure
> +
> +Errors include:
> +
> + ======== ============================================================
> + EFAULT if ``struct kvm_ppc_compat_caps`` cannot be read from or
> + written to userspace
> + EINVAL if the ``size`` field is smaller than the current structure
> + size, or if the backend implementation fails to retrieve or
> + map CPU compatibility capabilities
[Severity: High]
This isn't a bug introduced by this patch, but could the current size
validation break backwards compatibility?
The existing implementation in arch/powerpc/kvm/powerpc.c rejects requests if
the size is smaller than the host structure:
arch/powerpc/kvm/powerpc.c:kvm_arch_vm_ioctl() {
...
if (user_size < sizeof(host_caps))
goto out;
...
}
If struct kvm_ppc_compat_caps is ever extended, older userspace binaries
passing the old, smaller structure size will be unconditionally rejected.
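One possible shape for a size-tolerant handler, modelled in plain userspace C.
This is only a sketch under stated assumptions: struct compat_caps_v2 and its
extra field are a hypothetical future layout, compat_caps_reply() is not a
function from the patch, and memcpy() stands in for copy_to_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical future layout: one field appended after the v4 fields. */
struct compat_caps_v2 {
	uint64_t flags;
	uint64_t size;
	uint64_t compat_capabilities;
	uint64_t extra;               /* hypothetical extension field */
};

/* Smallest layout ever published: flags + size + compat_capabilities. */
#define COMPAT_CAPS_MIN_SIZE offsetof(struct compat_caps_v2, extra)

/*
 * Userspace model of the handler. Only as many bytes as the caller
 * declared are written, so an older, smaller buffer is still served.
 */
static int compat_caps_reply(void *ubuf, uint64_t user_size,
			     const struct compat_caps_v2 *host)
{
	struct compat_caps_v2 out = *host;
	size_t n;

	if (user_size < COMPAT_CAPS_MIN_SIZE)
		return -EINVAL;

	n = user_size < sizeof(out) ? (size_t)user_size : sizeof(out);
	out.size = n;                 /* report how many bytes were written */
	memcpy(ubuf, &out, n);
	return 0;
}
```

With this shape, only sizes below the first published layout are rejected, and
a binary built against the 24-byte structure keeps working after the structure
grows.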
> + ENOTTY if the backend does not implement the ``get_compat_caps``
> + operation (e.g., on non-pseries platforms or when the
> + required KVM operations are not available)
> + ======== ============================================================
[Severity: Medium]
Will the existing implementation return -ENOTTY as documented here?
Looking at kvmppc_get_compat_caps(), it returns -EINVAL on non-pseries
platforms:
arch/powerpc/kvm/book3s_hv.c:kvmppc_get_compat_caps() {
int rc = -EINVAL;
...
if (kvmhv_on_pseries()) { ... }
return rc;
}
This could confuse userspace into thinking it passed an invalid argument size.
Should the documentation or the code be adjusted?
> +
> +IBM POWER system server-based processors provide a compatibility mode feature
> +where an Nth generation processor can operate in modes consistent with earlier
> +generations such as (N-1) and (N-2).
> +
> +This ioctl provides userspace with information about the CPU compatibility modes
> +supported by the current host processor for booting the nested KVM guests on
> +PowerNV (KVM nested APIv1) and PowerVM (KVM nested APIv2) platforms.
> +
> +::
> +
> + struct kvm_ppc_compat_caps {
> + __u64 flags; /* Reserved for future use */
[Severity: High]
This isn't a bug introduced by this patch, but is the reserved flags field
permanently unusable for future input?
The existing implementation skips copying and validating the flags field:
arch/powerpc/kvm/powerpc.c:kvm_arch_vm_ioctl() {
...
if (copy_from_user(&user_size, &((struct kvm_ppc_compat_caps
__user *)argp)->size, sizeof(user_size))) {
...
}
Because the parameter is documented as (out) and the kernel ignores any
uninitialized data in flags, a future kernel will not be able to safely
repurpose flags for input. Should this validate that flags is zero?
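A minimal sketch of such a check, assuming the handler first copies flags in
from userspace. COMPAT_CAPS_KNOWN_FLAGS and compat_caps_check_flags() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* No flag bits are defined today (assumption taken from "reserved"). */
#define COMPAT_CAPS_KNOWN_FLAGS 0ULL

/* Hypothetical check applied to the flags value copied from userspace. */
static int compat_caps_check_flags(uint64_t flags)
{
	/* Reject any bit this kernel does not understand. */
	return (flags & ~COMPAT_CAPS_KNOWN_FLAGS) ? -EINVAL : 0;
}
```

Rejecting unknown bits now lets a later kernel assign meaning to a bit without
having to guess whether older callers passed garbage in it.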
> + __u64 size; /* Size of this structure */
> + __u64 compat_capabilities; /* Capabilities supported by the host */
> + };
> +
> +The ``compat_capabilities`` bit field describes the processor compatibility
> +modes supported by the host. For example, the following bits indicate support
> +for specific processor modes.
> +
> +::
> +
> + KVM_PPC_COMPAT_CAP_POWER9 (bit 1): KVM guests can run in Power9 processor mode
> + KVM_PPC_COMPAT_CAP_POWER10 (bit 2): KVM guests can run in Power10 processor mode
> + KVM_PPC_COMPAT_CAP_POWER11 (bit 3): KVM guests can run in Power11 processor mode
[Severity: High]
This isn't a bug introduced by this patch, but can the capability mapping
logic falsely restrict nested guests?
The switch statement breaks out of each case instead of falling through:
arch/powerpc/kvm/book3s_hv.c:kvmppc_map_compat_capabilities() {
...
case PVR_ARCH_31_P11:
*capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
break;
...
}
This causes the host to report a capability mask with only a single bit,
indicating it cannot run nested guests in older compatibility modes.
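A cumulative mapping could look roughly like the sketch below. It assumes that
a newer processor supports every older mode listed in the documentation, which
the patch does not confirm. The bit positions follow the quoted documentation,
the PVR values are stand-ins for this example, and map_compat_capabilities()
is a hypothetical userspace model (kernel code would use the fallthrough
keyword rather than a comment).

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions as listed in the quoted documentation. */
#define KVM_PPC_COMPAT_CAP_POWER9   (1ULL << 1)
#define KVM_PPC_COMPAT_CAP_POWER10  (1ULL << 2)
#define KVM_PPC_COMPAT_CAP_POWER11  (1ULL << 3)

/* Stand-in logical PVR values for this sketch (assumed). */
#define PVR_ARCH_300     0x0f000005u
#define PVR_ARCH_31      0x0f000006u
#define PVR_ARCH_31_P11  0x0f000007u

/* Accumulate the bit for the host level and for every older level. */
static int map_compat_capabilities(uint32_t cpu_version, uint64_t *capabilities)
{
	*capabilities = 0;

	switch (cpu_version) {
	case PVR_ARCH_31_P11:
		*capabilities |= KVM_PPC_COMPAT_CAP_POWER11;
		/* fall through */
	case PVR_ARCH_31:
		*capabilities |= KVM_PPC_COMPAT_CAP_POWER10;
		/* fall through */
	case PVR_ARCH_300:
		*capabilities |= KVM_PPC_COMPAT_CAP_POWER9;
		break;
	default:
		return -1;	/* unknown processor level */
	}
	return 0;
}
```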
[Severity: High]
This is a pre-existing issue, but are we missing a reference count cleanup
here?
In the compatibility capabilities retrieval:
arch/powerpc/kvm/book3s_hv.c:kvmppc_get_compat_caps() {
...
for_each_node_by_type(np, "cpu") {
prop = of_get_property(np, "cpu-version", NULL);
if (prop) {
cpu_version = be32_to_cpup(prop);
break;
}
}
...
}
The for_each_node_by_type macro takes a reference count on the device node.
Exiting the loop early via break requires manually dropping the reference via
of_node_put(np). Does this leak a reference on every ioctl call?
> .. _kvm_run:
>
> 5. The kvm_run structure
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260616123314.82721-1-amachhiw@linux.ibm.com?part=4