* [RFC PATCH v2 1/4] firmware/psci: Add definitions for PSCI v1.3 specification (ALPHA)
2024-03-18 16:14 [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
@ 2024-03-18 16:14 ` David Woodhouse
2024-03-18 16:14 ` [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation David Woodhouse
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 16:14 UTC (permalink / raw)
To: linux-arm-kernel, kvm
Cc: Paolo Bonzini, Jonathan Corbet, Marc Zyngier, Oliver Upton,
James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Will Deacon, Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki,
Len Brown, Pavel Machek, David Woodhouse, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
From: David Woodhouse <dwmw@amazon.co.uk>
The v1.3 PSCI spec (https://developer.arm.com/documentation/den0022) adds
SYSTEM_OFF2, CLEAN_INV_MEMREGION and CLEAN_INV_MEMREGION_ATTRIBUTES
functions. Add definitions for them and their parameters, along with the
new TIMEOUT, RATE_LIMITED and BUSY error values.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
include/uapi/linux/psci.h | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 42a40ad3fb62..082ed689fdaf 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -59,6 +59,8 @@
#define PSCI_1_1_FN_SYSTEM_RESET2 PSCI_0_2_FN(18)
#define PSCI_1_1_FN_MEM_PROTECT PSCI_0_2_FN(19)
#define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN(20)
+#define PSCI_1_3_FN_SYSTEM_OFF2 PSCI_0_2_FN(21)
+#define PSCI_1_3_FN_CLEAN_INV_MEMREGION_ATTRIBUTES PSCI_0_2_FN(23)
#define PSCI_1_0_FN64_CPU_DEFAULT_SUSPEND PSCI_0_2_FN64(12)
#define PSCI_1_0_FN64_NODE_HW_STATE PSCI_0_2_FN64(13)
@@ -68,6 +70,8 @@
#define PSCI_1_1_FN64_SYSTEM_RESET2 PSCI_0_2_FN64(18)
#define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE PSCI_0_2_FN64(20)
+#define PSCI_1_3_FN64_SYSTEM_OFF2 PSCI_0_2_FN64(21)
+#define PSCI_1_3_FN64_CLEAN_INV_MEMREGION PSCI_0_2_FN64(22)
/* PSCI v0.2 power state encoding for CPU_SUSPEND function */
#define PSCI_0_2_POWER_STATE_ID_MASK 0xffff
@@ -100,6 +104,19 @@
#define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET 0
#define PSCI_1_1_RESET_TYPE_VENDOR_START 0x80000000U
+/* PSCI v1.3 hibernate type for SYSTEM_OFF2 */
+#define PSCI_1_3_HIBERNATE_TYPE_OFF 0
+
+/* PSCI v1.3 flags for CLEAN_INV_MEMREGION */
+#define PSCI_1_3_CLEAN_INV_MEMREGION_FLAG_DRY_RUN BIT(0)
+
+/* PSCI v1.3 attributes for CLEAN_INV_MEMREGION_ATTRIBUTES */
+#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_OP_TYPE 0
+#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_CPU_RDVZ 1
+#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_LATENCY 2
+#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_RATE_LIMIT 3
+#define PSCI_1_3_CLEAN_INV_MEMREGION_ATTR_TIMEOUT 4
+
/* PSCI version decoding (independent of PSCI version) */
#define PSCI_VERSION_MAJOR_SHIFT 16
#define PSCI_VERSION_MINOR_MASK \
@@ -133,5 +150,8 @@
#define PSCI_RET_NOT_PRESENT -7
#define PSCI_RET_DISABLED -8
#define PSCI_RET_INVALID_ADDRESS -9
+#define PSCI_RET_TIMEOUT -10
+#define PSCI_RET_RATE_LIMITED -11
+#define PSCI_RET_BUSY -12
#endif /* _UAPI_LINUX_PSCI_H */
--
2.44.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
2024-03-18 16:14 [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
2024-03-18 16:14 ` [RFC PATCH v2 1/4] firmware/psci: Add definitions for PSCI v1.3 specification (ALPHA) David Woodhouse
@ 2024-03-18 16:14 ` David Woodhouse
2024-03-18 17:29 ` Marc Zyngier
2024-03-18 16:14 ` [RFC PATCH v2 3/4] KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call David Woodhouse
` (2 subsequent siblings)
4 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 16:14 UTC (permalink / raw)
To: linux-arm-kernel, kvm
Cc: Paolo Bonzini, Jonathan Corbet, Marc Zyngier, Oliver Upton,
James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Will Deacon, Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki,
Len Brown, Pavel Machek, David Woodhouse, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
From: David Woodhouse <dwmw@amazon.co.uk>
The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
which is analogous to ACPI S4 state. This will allow hosting environments
to determine that a guest is hibernated rather than just powered off, and
ensure that they preserve the virtual environment appropriately to allow
the guest to resume safely (or bump the hardware_signature in the FACS to
trigger a clean reboot instead).
The beta version will be changed to say that PSCI_FEATURES returns a bit
mask of the supported hibernate types, which is implemented here.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
Documentation/virt/kvm/api.rst | 11 +++++++++
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/include/uapi/asm/kvm.h | 6 +++++
arch/arm64/kvm/arm.c | 5 +++++
arch/arm64/kvm/psci.c | 37 +++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
6 files changed, 62 insertions(+)
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..ff061b6a2393 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6761,6 +6761,10 @@ the first `ndata` items (possibly zero) of the data array are valid.
the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
specification.
+ - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
+ if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
+ specification.
+
- for RISC-V, data[0] is set to the value of the second argument of the
``sbi_system_reset`` call.
@@ -6794,6 +6798,13 @@ either:
- Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
"Caller responsibilities" for possible return values.
+Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the
+KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI
+SYSTEM_OFF2 function, KVM will exit to userspace with the
+KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to
+KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate
+type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0).
+
::
/* KVM_EXIT_IOAPIC_EOI */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6883963bbc3a..6ed05791afdb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -299,6 +299,8 @@ struct kvm_arch {
#define KVM_ARCH_FLAG_ID_REGS_INITIALIZED 7
/* Fine-Grained UNDEF initialised */
#define KVM_ARCH_FLAG_FGU_INITIALIZED 8
+ /* PSCI SYSTEM_OFF2 (hibernate) enabled for the guest */
+#define KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED 9
unsigned long flags;
/* VM-wide vCPU feature set */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 964df31da975..66736ff04011 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -484,6 +484,12 @@ enum {
*/
#define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (1ULL << 0)
+/*
+ * Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call.
+ * Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN.
+ */
+#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 (1ULL << 0)
+
/* run->fail_entry.hardware_entry_failure_reason codes. */
#define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED (1ULL << 0)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3dee5490eea9..47d23064d6f7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -98,6 +98,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
r = 0;
set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags);
break;
+ case KVM_CAP_ARM_SYSTEM_OFF2:
+ r = 0;
+ set_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags);
+ break;
case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
new_cap = cap->args[0];
@@ -243,6 +247,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_VCPU_ATTRIBUTES:
case KVM_CAP_PTP_KVM:
case KVM_CAP_ARM_SYSTEM_SUSPEND:
+ case KVM_CAP_ARM_SYSTEM_OFF2:
case KVM_CAP_IRQFD_RESAMPLE:
case KVM_CAP_COUNTER_OFFSET:
r = 1;
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 1f69b667332b..59570eea8aa7 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -194,6 +194,12 @@ static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0);
}
+static void kvm_psci_system_off2(struct kvm_vcpu *vcpu)
+{
+ kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN,
+ KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
+}
+
static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
{
kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0);
@@ -353,6 +359,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags))
val = 0;
break;
+ case PSCI_1_3_FN_SYSTEM_OFF2:
+ case PSCI_1_3_FN64_SYSTEM_OFF2:
+ if (test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
+ val = 1UL << PSCI_1_3_HIBERNATE_TYPE_OFF;
+ break;
case PSCI_1_1_FN_SYSTEM_RESET2:
case PSCI_1_1_FN64_SYSTEM_RESET2:
if (minor >= 1)
@@ -374,6 +385,32 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
return 0;
}
break;
+ case PSCI_1_3_FN_SYSTEM_OFF2:
+ kvm_psci_narrow_to_32bit(vcpu);
+ fallthrough;
+ case PSCI_1_3_FN64_SYSTEM_OFF2:
+ if (!test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
+ break;
+
+ arg = smccc_get_arg1(vcpu);
+ if (arg != PSCI_1_3_HIBERNATE_TYPE_OFF) {
+ val = PSCI_RET_INVALID_PARAMS;
+ break;
+ }
+ kvm_psci_system_off2(vcpu);
+ /*
+ * We shouldn't be going back to guest VCPU after
+ * receiving SYSTEM_OFF2 request.
+ *
+ * If user space accidentally/deliberately resumes
+ * guest VCPU after SYSTEM_OFF request then guest
+ * VCPU should see internal failure from PSCI return
+ * value. To achieve this, we preload r0 (or x0) with
+ * PSCI return value INTERNAL_FAILURE.
+ */
+ val = PSCI_RET_INTERNAL_FAILURE;
+ ret = 0;
+ break;
case PSCI_1_1_FN_SYSTEM_RESET2:
kvm_psci_narrow_to_32bit(vcpu);
fallthrough;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..caa3845b80d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_ARM_SYSTEM_OFF2 236
struct kvm_irq_routing_irqchip {
__u32 irqchip;
--
2.44.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
2024-03-18 16:14 ` [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation David Woodhouse
@ 2024-03-18 17:29 ` Marc Zyngier
2024-03-18 17:54 ` David Woodhouse
0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2024-03-18 17:29 UTC (permalink / raw)
To: David Woodhouse
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, David Woodhouse,
Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
kvmarm, linux-pm
On Mon, 18 Mar 2024 16:14:24 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
>
> From: David Woodhouse <dwmw@amazon.co.uk>
>
> The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
> which is analogous to ACPI S4 state. This will allow hosting environments
> to determine that a guest is hibernated rather than just powered off, and
> ensure that they preserve the virtual environment appropriately to allow
> the guest to resume safely (or bump the hardware_signature in the FACS to
> trigger a clean reboot instead).
>
> The beta version will be changed to say that PSCI_FEATURES returns a bit
> mask of the supported hibernate types, which is implemented here.
>
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
> Documentation/virt/kvm/api.rst | 11 +++++++++
> arch/arm64/include/asm/kvm_host.h | 2 ++
> arch/arm64/include/uapi/asm/kvm.h | 6 +++++
> arch/arm64/kvm/arm.c | 5 +++++
> arch/arm64/kvm/psci.c | 37 +++++++++++++++++++++++++++++++
> include/uapi/linux/kvm.h | 1 +
> 6 files changed, 62 insertions(+)
>
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 0b5a33ee71ee..ff061b6a2393 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6761,6 +6761,10 @@ the first `ndata` items (possibly zero) of the data array are valid.
> the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
> specification.
>
> + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
> + if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
> + specification.
> +
> - for RISC-V, data[0] is set to the value of the second argument of the
> ``sbi_system_reset`` call.
>
> @@ -6794,6 +6798,13 @@ either:
> - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
> "Caller responsibilities" for possible return values.
>
> +Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the
> +KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI
> +SYSTEM_OFF2 function, KVM will exit to userspace with the
> +KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to
> +KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate
> +type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0).
Again, I really oppose this way of doing things. We already have an
infrastructure for selecting PSCI levels. You may not like it, but it
exists, and I'm not going entertain supporting yet another bike-shed
model. Adding an orthogonal cap for a feature that is specific to a
new PSCI version is just awful.
Please make PSCI 1.3 the only version of PSCI supporting suspend in a
non-optional way, and be done with it.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
2024-03-18 17:29 ` Marc Zyngier
@ 2024-03-18 17:54 ` David Woodhouse
2024-03-18 18:07 ` Marc Zyngier
0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 17:54 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
[-- Attachment #1: Type: text/plain, Size: 1195 bytes --]
On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote:
>
> Again, I really oppose this way of doing things. We already have an
> infrastructure for selecting PSCI levels. You may not like it, but it
> exists, and I'm not going entertain supporting yet another bike-shed
> model. Adding an orthogonal cap for a feature that is specific to a
> new PSCI version is just awful.
Huh? This isn't a "new bike-shed model". This is a straight copy of
what we *already* have for SYSTEM_RESET2.
If I were bike-shedding, I wouldn't do separate caps for them; I'd have
done it as a *bitmask* of the optional PSCI calls that should be
enabled.
The *mandatory* ones should obviously come from the PSCI version alone,
but I can't see how that makes sense for the optional ones...
> Please make PSCI 1.3 the only version of PSCI supporting suspend in a
> non-optional way, and be done with it.
SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are
CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES.
Are you suggesting that enabling v1.3 should automatically enable *all*
of the optional features that were defined in that version (and
previous versions) of the spec?
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
2024-03-18 17:54 ` David Woodhouse
@ 2024-03-18 18:07 ` Marc Zyngier
2024-03-18 18:17 ` David Woodhouse
0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2024-03-18 18:07 UTC (permalink / raw)
To: David Woodhouse
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
On Mon, 18 Mar 2024 17:54:06 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
>
> [1 <text/plain; UTF-8 (quoted-printable)>]
> On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote:
> >
> > Again, I really oppose this way of doing things. We already have an
> > infrastructure for selecting PSCI levels. You may not like it, but it
> > exists, and I'm not going entertain supporting yet another bike-shed
> > model. Adding an orthogonal cap for a feature that is specific to a
> > new PSCI version is just awful.
>
> Huh? This isn't a "new bike-shed model". This is a straight copy of
> what we *already* have for SYSTEM_RESET2.
There is no KVM capability for SYSTEM_RESET2. It is directly
advertised to the guest when PSCI 1.1 is supported.
> If I were bike-shedding, I wouldn't do separate caps for them; I'd have
> done it as a *bitmask* of the optional PSCI calls that should be
> enabled.
>
> The *mandatory* ones should obviously come from the PSCI version alone,
> but I can't see how that makes sense for the optional ones...
The guest is in a position to probe for what is supported or not with
the PSCI_FEATURES call. Why would you add anything else?
>
> > Please make PSCI 1.3 the only version of PSCI supporting suspend in a
> > non-optional way, and be done with it.
>
> SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are
> CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES.
>
> Are you suggesting that enabling v1.3 should automatically enable *all*
> of the optional features that were defined in that version (and
> previous versions) of the spec?
No. We have everything we need to incrementally *add* features. So you
can perfectly implement PSCI 1.3 with only SYSTEM_OFF2, and only later
on add the rest, if ever.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
2024-03-18 18:07 ` Marc Zyngier
@ 2024-03-18 18:17 ` David Woodhouse
0 siblings, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 18:17 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
[-- Attachment #1: Type: text/plain, Size: 2702 bytes --]
On Mon, 2024-03-18 at 18:07 +0000, Marc Zyngier wrote:
> On Mon, 18 Mar 2024 17:54:06 +0000,
> David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > [1 <text/plain; UTF-8 (quoted-printable)>]
> > On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote:
> > >
> > > Again, I really oppose this way of doing things. We already have an
> > > infrastructure for selecting PSCI levels. You may not like it, but it
> > > exists, and I'm not going entertain supporting yet another bike-shed
> > > model. Adding an orthogonal cap for a feature that is specific to a
> > > new PSCI version is just awful.
> >
> > Huh? This isn't a "new bike-shed model". This is a straight copy of
> > what we *already* have for SYSTEM_RESET2.
>
> There is no KVM capability for SYSTEM_RESET2. It is directly
> advertised to the guest when PSCI 1.1 is supported.
Apologies, I got that wrong. It's SYSTEM_SUSPEND and the corresponding
KVM_CAP_ARM_SYSTEM_SUSPEND that I was thinking of. Not SYSTEM_RESET2.I
mixed those up.
> > If I were bike-shedding, I wouldn't do separate caps for them; I'd have
> > done it as a *bitmask* of the optional PSCI calls that should be
> > enabled.
> >
> > The *mandatory* ones should obviously come from the PSCI version alone,
> > but I can't see how that makes sense for the optional ones...
>
> The guest is in a position to probe for what is supported or not with
> the PSCI_FEATURES call. Why would you add anything else?
Because we don't want to silently *change* what's advertised to the
guest with the VMM explicitly opting in.
> > > Please make PSCI 1.3 the only version of PSCI supporting suspend in a
> > > non-optional way, and be done with it.
> >
> > SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are
> > CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES.
> >
> > Are you suggesting that enabling v1.3 should automatically enable *all*
> > of the optional features that were defined in that version (and
> > previous versions) of the spec?
>
> No. We have everything we need to incrementally *add* features. So you
> can perfectly implement PSCI 1.3 with only SYSTEM_OFF2, and only later
> on add the rest, if ever.
OK. It's still awful, but I suppose can live with that since existing
VMMs will just see the same KVM_SYSTEM_EVENT_SHUTDOWN as before, and
hopefully just won't understand the flag (and won't notice) the extra
flag which says it's a hibernate.
A VMM might *perhaps* check for flags it doesn't understand and
complain about them, which is why we shouldn't really do that. But
where PSCI is concerned it seems we've left best practice behind a long
time ago, so I'll let it go.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* [RFC PATCH v2 3/4] KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call
2024-03-18 16:14 [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
2024-03-18 16:14 ` [RFC PATCH v2 1/4] firmware/psci: Add definitions for PSCI v1.3 specification (ALPHA) David Woodhouse
2024-03-18 16:14 ` [RFC PATCH v2 2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation David Woodhouse
@ 2024-03-18 16:14 ` David Woodhouse
2024-03-18 16:14 ` [RFC PATCH v2 4/4] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate David Woodhouse
2024-03-18 16:57 ` [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation Marc Zyngier
4 siblings, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 16:14 UTC (permalink / raw)
To: linux-arm-kernel, kvm
Cc: Paolo Bonzini, Jonathan Corbet, Marc Zyngier, Oliver Upton,
James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Will Deacon, Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki,
Len Brown, Pavel Machek, David Woodhouse, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
From: David Woodhouse <dwmw@amazon.co.uk>
Pass through the SYSTEM_OFF2 function for hibernation, just like SYSTEM_OFF.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
arch/arm64/kvm/hyp/nvhe/psci-relay.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index d57bcb6ab94d..76c7643e7eff 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -265,6 +265,8 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
case PSCI_1_0_FN_PSCI_FEATURES:
case PSCI_1_0_FN_SET_SUSPEND_MODE:
case PSCI_1_1_FN64_SYSTEM_RESET2:
+ case PSCI_1_3_FN_SYSTEM_OFF2:
+ case PSCI_1_3_FN64_SYSTEM_OFF2:
return psci_forward(host_ctxt);
case PSCI_1_0_FN64_SYSTEM_SUSPEND:
return psci_system_suspend(func_id, host_ctxt);
--
2.44.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [RFC PATCH v2 4/4] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
2024-03-18 16:14 [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
` (2 preceding siblings ...)
2024-03-18 16:14 ` [RFC PATCH v2 3/4] KVM: arm64: nvhe: Pass through PSCI v1.3 SYSTEM_OFF2 call David Woodhouse
@ 2024-03-18 16:14 ` David Woodhouse
2024-03-18 16:57 ` [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation Marc Zyngier
4 siblings, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 16:14 UTC (permalink / raw)
To: linux-arm-kernel, kvm
Cc: Paolo Bonzini, Jonathan Corbet, Marc Zyngier, Oliver Upton,
James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
Will Deacon, Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki,
Len Brown, Pavel Machek, David Woodhouse, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
From: David Woodhouse <dwmw@amazon.co.uk>
The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2
function which is analogous to ACPI S4 state. This will allow hosting
environments to determine that a guest is hibernated rather than just
powered off, and handle that state appropriately on subsequent launches.
Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and
poweroff") the EFI shutdown method is deliberately preferred over PSCI
or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which
*only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as
a last resort via the legacy pm_power_off function pointer.
The hibernation code already exports a system_entering_hibernation()
function which is be used by the higher-priority handler to check for
hibernation. That existing function just returns the value of a static
boolean variable from hibernate.c, which was previously only set in the
hibernation_platform_enter() code path. Set the same flag in the simpler
code path around the call to kernel_power_off() too.
An alternative way to hook SYSTEM_OFF2 into the hibernation code would
be to register a platform_hibernation_ops structure with an ->enter()
method which makes the new SYSTEM_OFF2 call. But that would have the
unwanted side-effect of making hibernation take a completely different
code path in hibernation_platform_enter(), invoking a lot of special dpm
callbacks.
Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with
fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to
indicate whether the power off is for hibernation.
But this version works and is relatively simple.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
drivers/firmware/psci/psci.c | 35 +++++++++++++++++++++++++++++++++++
kernel/power/hibernate.c | 5 ++++-
2 files changed, 39 insertions(+), 1 deletion(-)
diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index d9629ff87861..69d2f6969438 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -78,6 +78,7 @@ struct psci_0_1_function_ids get_psci_0_1_function_ids(void)
static u32 psci_cpu_suspend_feature;
static bool psci_system_reset2_supported;
+static bool psci_system_off2_supported;
static inline bool psci_has_ext_power_state(void)
{
@@ -333,6 +334,28 @@ static void psci_sys_poweroff(void)
invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
}
+#ifdef CONFIG_HIBERNATION
+static int psci_sys_hibernate(struct sys_off_data *data)
+{
+ if (system_entering_hibernation())
+ invoke_psci_fn(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2),
+ PSCI_1_3_HIBERNATE_TYPE_OFF, 0, 0);
+ return NOTIFY_DONE;
+}
+
+static int __init psci_hibernate_init(void)
+{
+ if (psci_system_off2_supported) {
+ /* Higher priority than EFI shutdown, but only for hibernate */
+ register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
+ SYS_OFF_PRIO_FIRMWARE + 2,
+ psci_sys_hibernate, NULL);
+ }
+ return 0;
+}
+subsys_initcall(psci_hibernate_init);
+#endif
+
static int psci_features(u32 psci_func_id)
{
return invoke_psci_fn(PSCI_1_0_FN_PSCI_FEATURES,
@@ -364,6 +387,7 @@ static const struct {
PSCI_ID_NATIVE(1_1, SYSTEM_RESET2),
PSCI_ID(1_1, MEM_PROTECT),
PSCI_ID_NATIVE(1_1, MEM_PROTECT_CHECK_RANGE),
+ PSCI_ID_NATIVE(1_3, SYSTEM_OFF2),
};
static int psci_debugfs_read(struct seq_file *s, void *data)
@@ -523,6 +547,16 @@ static void __init psci_init_system_reset2(void)
psci_system_reset2_supported = true;
}
+static void __init psci_init_system_off2(void)
+{
+ int ret;
+
+ ret = psci_features(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2));
+
+ if (ret != PSCI_RET_NOT_SUPPORTED)
+ psci_system_off2_supported = true;
+}
+
static void __init psci_init_system_suspend(void)
{
int ret;
@@ -653,6 +687,7 @@ static int __init psci_probe(void)
psci_init_cpu_suspend();
psci_init_system_suspend();
psci_init_system_reset2();
+ psci_init_system_off2();
kvm_init_hyp_services();
}
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 4b0b7cf2e019..ac87b3cb670c 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -676,8 +676,11 @@ static void power_down(void)
}
fallthrough;
case HIBERNATION_SHUTDOWN:
- if (kernel_can_power_off())
+ if (kernel_can_power_off()) {
+ entering_platform_hibernation = true;
kernel_power_off();
+ entering_platform_hibernation = false;
+ }
break;
}
kernel_halt();
--
2.44.0
^ permalink raw reply related [flat|nested] 15+ messages in thread* Re: [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
2024-03-18 16:14 [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
` (3 preceding siblings ...)
2024-03-18 16:14 ` [RFC PATCH v2 4/4] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate David Woodhouse
@ 2024-03-18 16:57 ` Marc Zyngier
2024-03-18 17:26 ` David Woodhouse
4 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2024-03-18 16:57 UTC (permalink / raw)
To: David Woodhouse
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, David Woodhouse,
Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
kvmarm, linux-pm
On Mon, 18 Mar 2024 16:14:22 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
>
> The PSCI v1.3 spec (https://developer.arm.com/documentation/den0022,
> currently in Alpha state, hence 'RFC') adds support for a SYSTEM_OFF2
> function enabling a HIBERNATE_OFF state which is analogous to ACPI S4.
> This will allow hosting environments to determine that a guest is
> hibernated rather than just powered off, and ensure that they preserve
> the virtual environment appropriately to allow the guest to resume
> safely (or bump the hardware_signature in the FACS to trigger a clean
> reboot instead).
>
> This adds support for it to KVM, exactly the same way as the existing
> support for SYSTEM_RESET2 as added in commits d43583b890e7 ("KVM: arm64:
> Expose PSCI SYSTEM_RESET2 call to the guest") and 34739fd95fab ("KVM:
> arm64: Indicate SYSTEM_RESET2 in kvm_run::system_event flags field").
>
> Back then, KVM was unconditionally bumped to expose PSCI v1.1. This
> means that a kernel upgrade causes guest visible behaviour changes
> without any explicit opt-in from the VMM, which is... unconventional. In
> some cases, a PSCI update isn't just about new optional calls; PSCI v1.2
> for example adds a new permitted error return from the existing CPU_ON
> function.
>
> There *is* a way for a VMM to opt *out* of newer PSCI versions... by
> setting a per-vCPU "special" register that actually ends up setting the
> PSCI version KVM-wide. Quite why this isn't just a simple KVM_CAP, I
> have no idea.
Because the expectations are that the VMM can blindly save/restore the
guest's state, including the PSCI version, and restore that blindly.
KVM CAPs are just a really bad design pattern for this sort of things.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
2024-03-18 16:57 ` [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation Marc Zyngier
@ 2024-03-18 17:26 ` David Woodhouse
2024-03-18 17:41 ` Marc Zyngier
0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 17:26 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
[-- Attachment #1: Type: text/plain, Size: 1076 bytes --]
On Mon, 2024-03-18 at 16:57 +0000, Marc Zyngier wrote:
>
> >
> > There *is* a way for a VMM to opt *out* of newer PSCI versions... by
> > setting a per-vCPU "special" register that actually ends up setting the
> > PSCI version KVM-wide. Quite why this isn't just a simple KVM_CAP, I
> > have no idea.
>
> Because the expectations are that the VMM can blindly save/restore the
> guest's state, including the PSCI version, and restore that blindly.
> KVM CAPs are just a really bad design pattern for this sort of things.
Hm, am I missing something here? Does the *guest* get to set the PSCI
version somehow, and opt into the latest version that it understands
regardless of what the firmware/host can support?
Because if not, surely it's just part of the basic shape of the
machine, like "how many vCPUs does it have". You don't need to be able
to query it back again.
I don't think we ever aspired to be able to hand an arbitrary KVM fd to
a userspace VMM and have the VMM be able to drive that VM without
having any a priori context, did we?
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
2024-03-18 17:26 ` David Woodhouse
@ 2024-03-18 17:41 ` Marc Zyngier
2024-03-18 18:15 ` David Woodhouse
0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2024-03-18 17:41 UTC (permalink / raw)
To: David Woodhouse
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
On Mon, 18 Mar 2024 17:26:07 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
>
> [1 <text/plain; UTF-8 (quoted-printable)>]
> On Mon, 2024-03-18 at 16:57 +0000, Marc Zyngier wrote:
> >
> > >
> > > There *is* a way for a VMM to opt *out* of newer PSCI versions... by
> > > setting a per-vCPU "special" register that actually ends up setting the
> > > PSCI version KVM-wide. Quite why this isn't just a simple KVM_CAP, I
> > > have no idea.
> >
> > Because the expectations are that the VMM can blindly save/restore the
> > guest's state, including the PSCI version, and restore that blindly.
> > KVM CAPs are just a really bad design pattern for this sort of things.
>
> Hm, am I missing something here? Does the *guest* get to set the PSCI
> version somehow, and opt into the latest version that it understands
> regardless of what the firmware/host can support?
No. The *VMM* sets the PSCI version by writing to a pseudo register.
It means that when the guest migrates, the VMM saves and restores that
version, and the guest doesn't see any change.
The host firmware has nothing to do with it, obviously. This is all
about KVM's own implementation of the "firmware", as seen by the guest.
> Because if not, surely it's just part of the basic shape of the
> machine, like "how many vCPUs does it have". You don't need to be able
> to query it back again.
Nobody needs to do this.
> I don't think we ever aspired to be able to hand an arbitrary KVM fd to
> a userspace VMM and have the VMM be able to drive that VM without
> having any a priori context, did we?
Arbitrary? No. This is actually very specific and pretty well
documented.
Also, to answer your question about why we treat 0.1 differently from
0.2+: 0.1 didn't specify the PSCI SMC/HCR encoding, meaning that KVM
implemented something that was never fully specified. The VMM has to
provide firmware tables that describe that. With 0.2+, there is a
standard encoding for all functions, and the VMM doesn't have to
provide the encoding to the guest.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
2024-03-18 17:41 ` Marc Zyngier
@ 2024-03-18 18:15 ` David Woodhouse
2024-03-18 18:31 ` Marc Zyngier
0 siblings, 1 reply; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 18:15 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
[-- Attachment #1: Type: text/plain, Size: 2842 bytes --]
On Mon, 2024-03-18 at 17:41 +0000, Marc Zyngier wrote:
> On Mon, 18 Mar 2024 17:26:07 +0000,
> David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > [1 <text/plain; UTF-8 (quoted-printable)>]
> > On Mon, 2024-03-18 at 16:57 +0000, Marc Zyngier wrote:
> > >
> > > >
> > > > There *is* a way for a VMM to opt *out* of newer PSCI versions... by
> > > > setting a per-vCPU "special" register that actually ends up setting the
> > > > PSCI version KVM-wide. Quite why this isn't just a simple KVM_CAP, I
> > > > have no idea.
> > >
> > > Because the expectations are that the VMM can blindly save/restore the
> > > guest's state, including the PSCI version, and restore that blindly.
> > > KVM CAPs are just a really bad design pattern for this sort of things.
> >
> > Hm, am I missing something here? Does the *guest* get to set the PSCI
> > version somehow, and opt into the latest version that it understands
> > regardless of what the firmware/host can support?
>
> No. The *VMM* sets the PSCI version by writing to a pseudo register.
> It means that when the guest migrates, the VMM saves and restores that
> version, and the guest doesn't see any change.
And when you boot a guest image which has been working for years under
a new kernel+KVM, your guest suddenly experiences a new PSCI version.
As I said that's not just new optional functions; it's potentially even
returning new error codes to the functions that said guest was already
using.
And when you *hibernate* a guest and then launch it again under a newer
kernel+KVM, it experiences the same incompatibility.
Unless the VMM realises this problem and opts *out* of the newer KVM
behaviour, of course. This is very much unlike how we *normally* expose
new KVM capabilities.
> > I don't think we ever aspired to be able to hand an arbitrary KVM fd to
> > a userspace VMM and have the VMM be able to drive that VM without
> > having any a priori context, did we?
>
> Arbitrary? No. This is actually very specific and pretty well
> documented.
>
> Also, to answer your question about why we treat 0.1 differently from
> 0.2+: 0.1 didn't specify the PSCI SMC/HCR encoding, meaning that KVM
> implemented something that was never fully specified. The VMM has to
> provide firmware tables that describe that. With 0.2+, there is a
> standard encoding for all functions, and the VMM doesn't have to
> provide the encoding to the guest.
Gotcha. So for that case we were *forced* to do things correctly and
allow userspace to opt-in to the capability. While for 0.2 onwards we
got away with this awfulness of silently upgrading the version without
VMM consent.
I was hoping to just follow the existing model of SYSTEM_RESET2 and not
have to touch this awfulness with a barge-pole, but sure, whatever you
want.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
2024-03-18 18:15 ` David Woodhouse
@ 2024-03-18 18:31 ` Marc Zyngier
2024-03-18 18:36 ` David Woodhouse
0 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2024-03-18 18:31 UTC (permalink / raw)
To: David Woodhouse
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
On Mon, 18 Mar 2024 18:15:36 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
>
> [1 <text/plain; UTF-8 (quoted-printable)>]
> On Mon, 2024-03-18 at 17:41 +0000, Marc Zyngier wrote:
> > On Mon, 18 Mar 2024 17:26:07 +0000,
> > David Woodhouse <dwmw2@infradead.org> wrote:
> > >
> > > [1 <text/plain; UTF-8 (quoted-printable)>]
> > > On Mon, 2024-03-18 at 16:57 +0000, Marc Zyngier wrote:
> > > >
> > > > >
> > > > > There *is* a way for a VMM to opt *out* of newer PSCI versions... by
> > > > > setting a per-vCPU "special" register that actually ends up setting the
> > > > > PSCI version KVM-wide. Quite why this isn't just a simple KVM_CAP, I
> > > > > have no idea.
> > > >
> > > > Because the expectations are that the VMM can blindly save/restore the
> > > > guest's state, including the PSCI version, and restore that blindly.
> > > > KVM CAPs are just a really bad design pattern for this sort of things.
> > >
> > > Hm, am I missing something here? Does the *guest* get to set the PSCI
> > > version somehow, and opt into the latest version that it understands
> > > regardless of what the firmware/host can support?
> >
> > No. The *VMM* sets the PSCI version by writing to a pseudo register.
> > It means that when the guest migrates, the VMM saves and restores that
> > version, and the guest doesn't see any change.
>
> And when you boot a guest image which has been working for years under
> a new kernel+KVM, your guest suddenly experiences a new PSCI version.
> As I said that's not just new optional functions; it's potentially even
> returning new error codes to the functions that said guest was already
> using.
If you want to stick to a given PSCI version, you write the version
you want.
>
> And when you *hibernate* a guest and then launch it again under a newer
> kernel+KVM, it experiences the same incompatibility.
>
> Unless the VMM realises this problem and opts *out* of the newer KVM
> behaviour, of course. This is very much unlike how we *normally* expose
> new KVM capabilities.
This was discussed at length 5 or 6 years ago (opt-in vs opt-out).
The feedback from QEMU (which is the only public VMM that does
anything remotely useful with this) was that opt-out was a better
model, specially as PSCI is the conduit for advertising the Spectre
mitigations and users (such as certain cloud vendors) were pretty keen
on guests seeing the mitigations advertised *by default*.
And if you can spot any form of "normality" in the KVM interface, I'll
buy you whatever beer you want. It is all inconsistent crap, so I
think we're in pretty good company here.
>
> > > I don't think we ever aspired to be able to hand an arbitrary KVM fd to
> > > a userspace VMM and have the VMM be able to drive that VM without
> > > having any a priori context, did we?
> >
> > Arbitrary? No. This is actually very specific and pretty well
> > documented.
> >
> > Also, to answer your question about why we treat 0.1 differently from
> > 0.2+: 0.1 didn't specify the PSCI SMC/HCR encoding, meaning that KVM
> > implemented something that was never fully specified. The VMM has to
> > provide firmware tables that describe that. With 0.2+, there is a
> > standard encoding for all functions, and the VMM doesn't have to
> > provide the encoding to the guest.
>
> Gotcha. So for that case we were *forced* to do things correctly and
> allow userspace to opt-in to the capability. While for 0.2 onwards we
> got away with this awfulness of silently upgrading the version without
> VMM consent.
>
> I was hoping to just follow the existing model of SYSTEM_RESET2 and not
> have to touch this awfulness with a barge-pole, but sure, whatever you
> want.
Unless I'm reading the whole thing wrong (which isn't impossible given
that I'm jet-lagged to my eyeballs), SYSTEM_RESET2 doesn't have any
form of configuration. If PSCI 1.1 is selected, SYSTEM_RESET2 is
available. So that'd be the model to follow.
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [RFC PATCH v2 0/4] arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
2024-03-18 18:31 ` Marc Zyngier
@ 2024-03-18 18:36 ` David Woodhouse
0 siblings, 0 replies; 15+ messages in thread
From: David Woodhouse @ 2024-03-18 18:36 UTC (permalink / raw)
To: Marc Zyngier
Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm
[-- Attachment #1: Type: text/plain, Size: 4360 bytes --]
On Mon, 2024-03-18 at 18:31 +0000, Marc Zyngier wrote:
> On Mon, 18 Mar 2024 18:15:36 +0000,
> David Woodhouse <dwmw2@infradead.org> wrote:
> >
> > [1 <text/plain; UTF-8 (quoted-printable)>]
> > On Mon, 2024-03-18 at 17:41 +0000, Marc Zyngier wrote:
> > > On Mon, 18 Mar 2024 17:26:07 +0000,
> > > David Woodhouse <dwmw2@infradead.org> wrote:
> > > >
> > > > [1 <text/plain; UTF-8 (quoted-printable)>]
> > > > On Mon, 2024-03-18 at 16:57 +0000, Marc Zyngier wrote:
> > > > >
> > > > > >
> > > > > > There *is* a way for a VMM to opt *out* of newer PSCI versions... by
> > > > > > setting a per-vCPU "special" register that actually ends up setting the
> > > > > > PSCI version KVM-wide. Quite why this isn't just a simple KVM_CAP, I
> > > > > > have no idea.
> > > > >
> > > > > Because the expectations are that the VMM can blindly save/restore the
> > > > > guest's state, including the PSCI version, and restore that blindly.
> > > > > KVM CAPs are just a really bad design pattern for this sort of things.
> > > >
> > > > Hm, am I missing something here? Does the *guest* get to set the PSCI
> > > > version somehow, and opt into the latest version that it understands
> > > > regardless of what the firmware/host can support?
> > >
> > > No. The *VMM* sets the PSCI version by writing to a pseudo register.
> > > It means that when the guest migrates, the VMM saves and restores that
> > > version, and the guest doesn't see any change.
> >
> > And when you boot a guest image which has been working for years under
> > a new kernel+KVM, your guest suddenly experiences a new PSCI version.
> > As I said that's not just new optional functions; it's potentially even
> > returning new error codes to the functions that said guest was already
> > using.
>
> If you want to stick to a given PSCI version, you write the version
> you want.
>
> >
> > And when you *hibernate* a guest and then launch it again under a newer
> > kernel+KVM, it experiences the same incompatibility.
> >
> > Unless the VMM realises this problem and opts *out* of the newer KVM
> > behaviour, of course. This is very much unlike how we *normally* expose
> > new KVM capabilities.
>
> This was discussed at length 5 or 6 years ago (opt-in vs opt-out).
>
> The feedback from QEMU (which is the only public VMM that does
> anything remotely useful with this) was that opt-out was a better
> model, specially as PSCI is the conduit for advertising the Spectre
> mitigations and users (such as certain cloud vendors) were pretty keen
> on guests seeing the mitigations advertised *by default*.
OK.
> And if you can spot any form of "normality" in the KVM interface, I'll
> buy you whatever beer you want. It is all inconsistent crap, so I
> think we're in pretty good company here.
I'll give you that one :)
> >
> > > > I don't think we ever aspired to be able to hand an arbitrary KVM fd to
> > > > a userspace VMM and have the VMM be able to drive that VM without
> > > > having any a priori context, did we?
> > >
> > > Arbitrary? No. This is actually very specific and pretty well
> > > documented.
> > >
> > > Also, to answer your question about why we treat 0.1 differently from
> > > 0.2+: 0.1 didn't specify the PSCI SMC/HCR encoding, meaning that KVM
> > > implemented something that was never fully specified. The VMM has to
> > > provide firmware tables that describe that. With 0.2+, there is a
> > > standard encoding for all functions, and the VMM doesn't have to
> > > provide the encoding to the guest.
> >
> > Gotcha. So for that case we were *forced* to do things correctly and
> > allow userspace to opt-in to the capability. While for 0.2 onwards we
> > got away with this awfulness of silently upgrading the version without
> > VMM consent.
> >
> > I was hoping to just follow the existing model of SYSTEM_RESET2 and not
> > have to touch this awfulness with a barge-pole, but sure, whatever you
> > want.
>
> Unless I'm reading the whole thing wrong (which isn't impossible given
> that I'm jet-lagged to my eyeballs), SYSTEM_RESET2 doesn't have any
> form of configuration. If PSCI 1.1 is selected, SYSTEM_RESET2 is
> available. So that'd be the model to follow.
Sorry, that was supposed to be SYSTEM_SUSPEND not SYSTEM_RESET2. But
OK.
[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread