linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
@ 2024-03-12 13:51 David Woodhouse
  2024-03-12 13:51 ` [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function " David Woodhouse
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: David Woodhouse @ 2024-03-12 13:51 UTC (permalink / raw)
  To: linux-arm-kernel, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Will Deacon, Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki,
	Len Brown, Pavel Machek, David Woodhouse, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

The upcoming PSCI v1.3 specification adds support for a SYSTEM_OFF2 
function which is analogous to ACPI S4 state. This will allow hosting 
environments to determine that a guest is hibernated rather than just 
powered off, and ensure that they preserve the virtual environment 
appropriately to allow the guest to resume safely (or bump the 
hardware_signature in the FACS to trigger a clean reboot instead).

This adds support for it to KVM, and to the guest hibernate code.

Strictly, we should perhaps also allow the guest to detect PSCI v1.3, 
but when v1.1 was added in commit 512865d83fd9 it was done 
unconditionally, which seems wrong. Shouldn't we have a way for 
userspace to control what gets exposed, rather than silently changing 
the guest behaviour with newer host kernels? Should I add a 
KVM_CAP_ARM_PSCI_VERSION?

For the guest side, this adds a new SYS_OFF_MODE_POWER_OFF with higher 
priority than the EFI one, but which *only* triggers when there's a 
hibernation in progress. That seemed like the simplest option, but see 
the commit message for alternative possilities. I told Rafael I'd post a 
straw man for bikeshedding, and here it is.

 Documentation/virt/kvm/api.rst       | 11 +++++++++++
 arch/arm64/include/asm/kvm_host.h    |  2 ++
 arch/arm64/include/uapi/asm/kvm.h    |  6 ++++++
 arch/arm64/kvm/arm.c                 |  5 +++++
 arch/arm64/kvm/hyp/nvhe/psci-relay.c |  2 ++
 arch/arm64/kvm/psci.c                | 37 ++++++++++++++++++++++++++++++++++++
 drivers/firmware/psci/psci.c         | 35 ++++++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h             |  1 +
 include/uapi/linux/psci.h            |  5 +++++
 kernel/power/hibernate.c             |  5 ++++-
 10 files changed, 108 insertions(+), 1 deletion(-)


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
  2024-03-12 13:51 [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
@ 2024-03-12 13:51 ` David Woodhouse
  2024-03-12 15:36   ` Marc Zyngier
  2024-03-12 15:47   ` Oliver Upton
  2024-03-12 13:51 ` [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate David Woodhouse
  2024-03-12 15:24 ` [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation Marc Zyngier
  2 siblings, 2 replies; 16+ messages in thread
From: David Woodhouse @ 2024-03-12 13:51 UTC (permalink / raw)
  To: linux-arm-kernel, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Will Deacon, Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki,
	Len Brown, Pavel Machek, David Woodhouse, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

From: David Woodhouse <dwmw@amazon.co.uk>

The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
which is analogous to ACPI S4 state. This will allow hosting environments
to determine that a guest is hibernated rather than just powered off, and
ensure that they preserve the virtual environment appropriately to allow
the guest to resume safely (or bump the hardware_signature in the FACS to
trigger a clean reboot instead).

The beta version will be changed to say that PSCI_FEATURES returns a bit
mask of the supported hibernate types, which is implemented here.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 Documentation/virt/kvm/api.rst       | 11 +++++++++
 arch/arm64/include/asm/kvm_host.h    |  2 ++
 arch/arm64/include/uapi/asm/kvm.h    |  6 +++++
 arch/arm64/kvm/arm.c                 |  5 ++++
 arch/arm64/kvm/hyp/nvhe/psci-relay.c |  2 ++
 arch/arm64/kvm/psci.c                | 37 ++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h             |  1 +
 include/uapi/linux/psci.h            |  5 ++++
 8 files changed, 69 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index bd93cafd3e4e..f5963c3770a5 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6761,6 +6761,10 @@ the first `ndata` items (possibly zero) of the data array are valid.
    the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
    specification.
 
+ - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
+   if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
+   specification.
+
  - for RISC-V, data[0] is set to the value of the second argument of the
    ``sbi_system_reset`` call.
 
@@ -6794,6 +6798,13 @@ either:
  - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
    "Caller responsibilities" for possible return values.
 
+Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the
+KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI
+SYSTEM_OFF2 function, KVM will exit to userspace with the
+KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to
+KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate
+type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0).
+
 ::
 
 		/* KVM_EXIT_IOAPIC_EOI */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 21c57b812569..d6da0eb1c236 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -274,6 +274,8 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_TIMER_PPIS_IMMUTABLE		6
 	/* Initial ID reg values loaded */
 #define KVM_ARCH_FLAG_ID_REGS_INITIALIZED		7
+	/* PSCI SYSTEM_OFF2 (hibernate) enabled for the guest */
+#define KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED		8
 	unsigned long flags;
 
 	/* VM-wide vCPU feature set */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 964df31da975..66736ff04011 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -484,6 +484,12 @@ enum {
  */
 #define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2	(1ULL << 0)
 
+/*
+ * Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call.
+ * Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN.
+ */
+#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2	(1ULL << 0)
+
 /* run->fail_entry.hardware_entry_failure_reason codes. */
 #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED	(1ULL << 0)
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a25265aca432..1c58762272eb 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -98,6 +98,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags);
 		break;
+	case KVM_CAP_ARM_SYSTEM_OFF2:
+		r = 0;
+		set_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags);
+		break;
 	case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
 		new_cap = cap->args[0];
 
@@ -238,6 +242,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
+	case KVM_CAP_ARM_SYSTEM_OFF2:
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_COUNTER_OFFSET:
 		r = 1;
diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
index d57bcb6ab94d..0d4bea0b9ca2 100644
--- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
+++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
@@ -264,6 +264,8 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
 	switch (func_id) {
 	case PSCI_1_0_FN_PSCI_FEATURES:
 	case PSCI_1_0_FN_SET_SUSPEND_MODE:
+	case PSCI_1_3_FN_SYSTEM_OFF2:
+	case PSCI_1_3_FN64_SYSTEM_OFF2:
 	case PSCI_1_1_FN64_SYSTEM_RESET2:
 		return psci_forward(host_ctxt);
 	case PSCI_1_0_FN64_SYSTEM_SUSPEND:
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 1f69b667332b..59570eea8aa7 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -194,6 +194,12 @@ static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
 	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0);
 }
 
+static void kvm_psci_system_off2(struct kvm_vcpu *vcpu)
+{
+	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN,
+				 KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
+}
+
 static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
 {
 	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0);
@@ -353,6 +359,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
 			if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags))
 				val = 0;
 			break;
+		case PSCI_1_3_FN_SYSTEM_OFF2:
+		case PSCI_1_3_FN64_SYSTEM_OFF2:
+			if (test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
+				val = 1UL << PSCI_1_3_HIBERNATE_TYPE_OFF;
+			break;
 		case PSCI_1_1_FN_SYSTEM_RESET2:
 		case PSCI_1_1_FN64_SYSTEM_RESET2:
 			if (minor >= 1)
@@ -374,6 +385,32 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
 			return 0;
 		}
 		break;
+	case PSCI_1_3_FN_SYSTEM_OFF2:
+		kvm_psci_narrow_to_32bit(vcpu);
+		fallthrough;
+	case PSCI_1_3_FN64_SYSTEM_OFF2:
+		if (!test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
+			break;
+
+		arg = smccc_get_arg1(vcpu);
+		if (arg != PSCI_1_3_HIBERNATE_TYPE_OFF) {
+			val = PSCI_RET_INVALID_PARAMS;
+			break;
+		}
+		kvm_psci_system_off2(vcpu);
+		/*
+		 * We shouldn't be going back to guest VCPU after
+		 * receiving SYSTEM_OFF2 request.
+		 *
+		 * If user space accidentally/deliberately resumes
+		 * guest VCPU after SYSTEM_OFF request then guest
+		 * VCPU should see internal failure from PSCI return
+		 * value. To achieve this, we preload r0 (or x0) with
+		 * PSCI return value INTERNAL_FAILURE.
+		 */
+		val = PSCI_RET_INTERNAL_FAILURE;
+		ret = 0;
+		break;
 	case PSCI_1_1_FN_SYSTEM_RESET2:
 		kvm_psci_narrow_to_32bit(vcpu);
 		fallthrough;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..caa3845b80d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_ARM_SYSTEM_OFF2 236
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
index 42a40ad3fb62..f9a015ebe221 100644
--- a/include/uapi/linux/psci.h
+++ b/include/uapi/linux/psci.h
@@ -59,6 +59,7 @@
 #define PSCI_1_1_FN_SYSTEM_RESET2		PSCI_0_2_FN(18)
 #define PSCI_1_1_FN_MEM_PROTECT			PSCI_0_2_FN(19)
 #define PSCI_1_1_FN_MEM_PROTECT_CHECK_RANGE	PSCI_0_2_FN(20)
+#define PSCI_1_3_FN_SYSTEM_OFF2			PSCI_0_2_FN(21)
 
 #define PSCI_1_0_FN64_CPU_DEFAULT_SUSPEND	PSCI_0_2_FN64(12)
 #define PSCI_1_0_FN64_NODE_HW_STATE		PSCI_0_2_FN64(13)
@@ -68,6 +69,7 @@
 
 #define PSCI_1_1_FN64_SYSTEM_RESET2		PSCI_0_2_FN64(18)
 #define PSCI_1_1_FN64_MEM_PROTECT_CHECK_RANGE	PSCI_0_2_FN64(20)
+#define PSCI_1_3_FN64_SYSTEM_OFF2		PSCI_0_2_FN64(21)
 
 /* PSCI v0.2 power state encoding for CPU_SUSPEND function */
 #define PSCI_0_2_POWER_STATE_ID_MASK		0xffff
@@ -100,6 +102,9 @@
 #define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET	0
 #define PSCI_1_1_RESET_TYPE_VENDOR_START	0x80000000U
 
+/* PSCI v1.3 hibernate type for SYSTEM_OFF2 */
+#define PSCI_1_3_HIBERNATE_TYPE_OFF		0
+
 /* PSCI version decoding (independent of PSCI version) */
 #define PSCI_VERSION_MAJOR_SHIFT		16
 #define PSCI_VERSION_MINOR_MASK			\
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  2024-03-12 13:51 [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
  2024-03-12 13:51 ` [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function " David Woodhouse
@ 2024-03-12 13:51 ` David Woodhouse
  2024-03-12 15:57   ` Sudeep Holla
  2024-03-12 15:24 ` [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation Marc Zyngier
  2 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2024-03-12 13:51 UTC (permalink / raw)
  To: linux-arm-kernel, kvm
  Cc: Paolo Bonzini, Jonathan Corbet, Marc Zyngier, Oliver Upton,
	James Morse, Suzuki K Poulose, Zenghui Yu, Catalin Marinas,
	Will Deacon, Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki,
	Len Brown, Pavel Machek, David Woodhouse, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

From: David Woodhouse <dwmw@amazon.co.uk>

The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2
function which is analogous to ACPI S4 state. This will allow hosting
environments to determine that a guest is hibernated rather than just
powered off, and handle that state appropriately on subsequent launches.

Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and
poweroff") the EFI shutdown method is deliberately preferred over PSCI
or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which
*only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as
a last resort via the legacy pm_power_off function pointer.

The hibernation code already exports a system_entering_hibernation()
function which is be used by the higher-priority handler to check for
hibernation. That existing function just returns the value of a static
boolean variable from hibernate.c, which was previously only set in the
hibernation_platform_enter() code path. Set the same flag in the simpler
code path around the call to kernel_power_off() too.

An alternative way to hook SYSTEM_OFF2 into the hibernation code would
be to register a platform_hibernation_ops structure with an ->enter()
method which makes the new SYSTEM_OFF2 call. But that would have the
unwanted side-effect of making hibernation take a completely different
code path in hibernation_platform_enter(), invoking a lot of special dpm
callbacks.

Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with
fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to
indicate whether the power off is for hibernation.

But this version works and is relatively simple.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 drivers/firmware/psci/psci.c | 35 +++++++++++++++++++++++++++++++++++
 kernel/power/hibernate.c     |  5 ++++-
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
index d9629ff87861..69d2f6969438 100644
--- a/drivers/firmware/psci/psci.c
+++ b/drivers/firmware/psci/psci.c
@@ -78,6 +78,7 @@ struct psci_0_1_function_ids get_psci_0_1_function_ids(void)
 
 static u32 psci_cpu_suspend_feature;
 static bool psci_system_reset2_supported;
+static bool psci_system_off2_supported;
 
 static inline bool psci_has_ext_power_state(void)
 {
@@ -333,6 +334,28 @@ static void psci_sys_poweroff(void)
 	invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
 }
 
+#ifdef CONFIG_HIBERNATION
+static int psci_sys_hibernate(struct sys_off_data *data)
+{
+	if (system_entering_hibernation())
+		invoke_psci_fn(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2),
+			       PSCI_1_3_HIBERNATE_TYPE_OFF, 0, 0);
+	return NOTIFY_DONE;
+}
+
+static int __init psci_hibernate_init(void)
+{
+	if (psci_system_off2_supported) {
+		/* Higher priority than EFI shutdown, but only for hibernate */
+		register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
+					 SYS_OFF_PRIO_FIRMWARE + 2,
+					 psci_sys_hibernate, NULL);
+	}
+	return 0;
+}
+subsys_initcall(psci_hibernate_init);
+#endif
+
 static int psci_features(u32 psci_func_id)
 {
 	return invoke_psci_fn(PSCI_1_0_FN_PSCI_FEATURES,
@@ -364,6 +387,7 @@ static const struct {
 	PSCI_ID_NATIVE(1_1, SYSTEM_RESET2),
 	PSCI_ID(1_1, MEM_PROTECT),
 	PSCI_ID_NATIVE(1_1, MEM_PROTECT_CHECK_RANGE),
+	PSCI_ID_NATIVE(1_3, SYSTEM_OFF2),
 };
 
 static int psci_debugfs_read(struct seq_file *s, void *data)
@@ -523,6 +547,16 @@ static void __init psci_init_system_reset2(void)
 		psci_system_reset2_supported = true;
 }
 
+static void __init psci_init_system_off2(void)
+{
+	int ret;
+
+	ret = psci_features(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2));
+
+	if (ret != PSCI_RET_NOT_SUPPORTED)
+		psci_system_off2_supported = true;
+}
+
 static void __init psci_init_system_suspend(void)
 {
 	int ret;
@@ -653,6 +687,7 @@ static int __init psci_probe(void)
 		psci_init_cpu_suspend();
 		psci_init_system_suspend();
 		psci_init_system_reset2();
+		psci_init_system_off2();
 		kvm_init_hyp_services();
 	}
 
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 4b0b7cf2e019..ac87b3cb670c 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -676,8 +676,11 @@ static void power_down(void)
 		}
 		fallthrough;
 	case HIBERNATION_SHUTDOWN:
-		if (kernel_can_power_off())
+		if (kernel_can_power_off()) {
+			entering_platform_hibernation = true;
 			kernel_power_off();
+			entering_platform_hibernation = false;
+		}
 		break;
 	}
 	kernel_halt();
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
  2024-03-12 13:51 [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
  2024-03-12 13:51 ` [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function " David Woodhouse
  2024-03-12 13:51 ` [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate David Woodhouse
@ 2024-03-12 15:24 ` Marc Zyngier
  2024-03-12 17:01   ` David Woodhouse
  2 siblings, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2024-03-12 15:24 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, David Woodhouse,
	Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
	kvmarm, linux-pm

On Tue, 12 Mar 2024 13:51:27 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
> 
> The upcoming PSCI v1.3 specification adds support for a SYSTEM_OFF2

Pointer to the spec? Crucially, this is in the Alpha state, meaning
that it is still subject to change [1].

> function which is analogous to ACPI S4 state. This will allow hosting 
> environments to determine that a guest is hibernated rather than just 
> powered off, and ensure that they preserve the virtual environment 
> appropriately to allow the guest to resume safely (or bump the 
> hardware_signature in the FACS to trigger a clean reboot instead).
> 
> This adds support for it to KVM, and to the guest hibernate code.
> 
> Strictly, we should perhaps also allow the guest to detect PSCI v1.3, 
> but when v1.1 was added in commit 512865d83fd9 it was done 
> unconditionally, which seems wrong. Shouldn't we have a way for 
> userspace to control what gets exposed, rather than silently changing 
> the guest behaviour with newer host kernels? Should I add a 
> KVM_CAP_ARM_PSCI_VERSION?

Do you mean something like 85bd0ba1ff98?

	M.

[1] https://documentation-service.arm.com/static/65e59325837c4d065f6556a6

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
  2024-03-12 13:51 ` [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function " David Woodhouse
@ 2024-03-12 15:36   ` Marc Zyngier
  2024-03-12 17:06     ` David Woodhouse
  2024-03-12 15:47   ` Oliver Upton
  1 sibling, 1 reply; 16+ messages in thread
From: Marc Zyngier @ 2024-03-12 15:36 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, David Woodhouse,
	Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
	kvmarm, linux-pm

On Tue, 12 Mar 2024 13:51:28 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
> 
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
> which is analogous to ACPI S4 state. This will allow hosting environments
> to determine that a guest is hibernated rather than just powered off, and
> ensure that they preserve the virtual environment appropriately to allow
> the guest to resume safely (or bump the hardware_signature in the FACS to
> trigger a clean reboot instead).
> 
> The beta version will be changed to say that PSCI_FEATURES returns a bit
> mask of the supported hibernate types, which is implemented here.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  Documentation/virt/kvm/api.rst       | 11 +++++++++
>  arch/arm64/include/asm/kvm_host.h    |  2 ++
>  arch/arm64/include/uapi/asm/kvm.h    |  6 +++++
>  arch/arm64/kvm/arm.c                 |  5 ++++
>  arch/arm64/kvm/hyp/nvhe/psci-relay.c |  2 ++
>  arch/arm64/kvm/psci.c                | 37 ++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h             |  1 +
>  include/uapi/linux/psci.h            |  5 ++++
>  8 files changed, 69 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index bd93cafd3e4e..f5963c3770a5 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6761,6 +6761,10 @@ the first `ndata` items (possibly zero) of the data array are valid.
>     the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
>     specification.
>  
> + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
> +   if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
> +   specification.
> +
>   - for RISC-V, data[0] is set to the value of the second argument of the
>     ``sbi_system_reset`` call.
>  
> @@ -6794,6 +6798,13 @@ either:
>   - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
>     "Caller responsibilities" for possible return values.
>  
> +Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the
> +KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI

Checking that PSCI 1.3 is enabled for the guest should be enough, no?
I don't think providing yet another level of optionally brings us
much, other than complexity.

> +SYSTEM_OFF2 function, KVM will exit to userspace with the
> +KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to
> +KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate
> +type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0).
> +
>  ::
>  
>  		/* KVM_EXIT_IOAPIC_EOI */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 21c57b812569..d6da0eb1c236 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -274,6 +274,8 @@ struct kvm_arch {
>  #define KVM_ARCH_FLAG_TIMER_PPIS_IMMUTABLE		6
>  	/* Initial ID reg values loaded */
>  #define KVM_ARCH_FLAG_ID_REGS_INITIALIZED		7
> +	/* PSCI SYSTEM_OFF2 (hibernate) enabled for the guest */
> +#define KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED		8
>  	unsigned long flags;
>  
>  	/* VM-wide vCPU feature set */
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 964df31da975..66736ff04011 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -484,6 +484,12 @@ enum {
>   */
>  #define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2	(1ULL << 0)
>  
> +/*
> + * Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call.
> + * Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN.
> + */
> +#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2	(1ULL << 0)
> +
>  /* run->fail_entry.hardware_entry_failure_reason codes. */
>  #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED	(1ULL << 0)
>  
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index a25265aca432..1c58762272eb 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -98,6 +98,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		r = 0;
>  		set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags);
>  		break;
> +	case KVM_CAP_ARM_SYSTEM_OFF2:
> +		r = 0;
> +		set_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags);
> +		break;
>  	case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
>  		new_cap = cap->args[0];
>  
> @@ -238,6 +242,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VCPU_ATTRIBUTES:
>  	case KVM_CAP_PTP_KVM:
>  	case KVM_CAP_ARM_SYSTEM_SUSPEND:
> +	case KVM_CAP_ARM_SYSTEM_OFF2:
>  	case KVM_CAP_IRQFD_RESAMPLE:
>  	case KVM_CAP_COUNTER_OFFSET:
>  		r = 1;
> diff --git a/arch/arm64/kvm/hyp/nvhe/psci-relay.c b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> index d57bcb6ab94d..0d4bea0b9ca2 100644
> --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> @@ -264,6 +264,8 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
>  	switch (func_id) {
>  	case PSCI_1_0_FN_PSCI_FEATURES:
>  	case PSCI_1_0_FN_SET_SUSPEND_MODE:
> +	case PSCI_1_3_FN_SYSTEM_OFF2:
> +	case PSCI_1_3_FN64_SYSTEM_OFF2:

nit: order by version number.

>  	case PSCI_1_1_FN64_SYSTEM_RESET2:
>  		return psci_forward(host_ctxt);
>  	case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
> index 1f69b667332b..59570eea8aa7 100644
> --- a/arch/arm64/kvm/psci.c
> +++ b/arch/arm64/kvm/psci.c
> @@ -194,6 +194,12 @@ static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
>  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0);
>  }
>  
> +static void kvm_psci_system_off2(struct kvm_vcpu *vcpu)
> +{
> +	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN,
> +				 KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
> +}
> +
>  static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
>  {
>  	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0);
> @@ -353,6 +359,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
>  			if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags))
>  				val = 0;
>  			break;
> +		case PSCI_1_3_FN_SYSTEM_OFF2:
> +		case PSCI_1_3_FN64_SYSTEM_OFF2:
> +			if (test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
> +				val = 1UL << PSCI_1_3_HIBERNATE_TYPE_OFF;
> +			break;

Testing the PSCI version should be enough (minor >= 3). Same thing
goes the the capability: checking that the host supports 1.3 should be
enough.

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
  2024-03-12 13:51 ` [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function " David Woodhouse
  2024-03-12 15:36   ` Marc Zyngier
@ 2024-03-12 15:47   ` Oliver Upton
  2024-03-13 12:53     ` David Woodhouse
  1 sibling, 1 reply; 16+ messages in thread
From: Oliver Upton @ 2024-03-12 15:47 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, David Woodhouse,
	Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
	kvmarm, linux-pm

Hi,

On Tue, Mar 12, 2024 at 01:51:28PM +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
> which is analogous to ACPI S4 state. This will allow hosting environments
> to determine that a guest is hibernated rather than just powered off, and
> ensure that they preserve the virtual environment appropriately to allow
> the guest to resume safely (or bump the hardware_signature in the FACS to
> trigger a clean reboot instead).
> 
> The beta version will be changed to say that PSCI_FEATURES returns a bit
> mask of the supported hibernate types, which is implemented here.

Have you considered doing the PSCI implementation in userspace? The
SMCCC filter [*] was added for this exact purpose. There are other
features being done in userspace (e.g. vCPU hotplug) built on
intercepting hypercalls, this seems to be a reasonable candidate too.

[*] https://docs.kernel.org/virt/kvm/devices/vm.html#attribute-kvm-arm-vm-smccc-filter-w-o

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  2024-03-12 13:51 ` [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate David Woodhouse
@ 2024-03-12 15:57   ` Sudeep Holla
  2024-03-12 16:36     ` David Woodhouse
  0 siblings, 1 reply; 16+ messages in thread
From: Sudeep Holla @ 2024-03-12 15:57 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, Sudeep Holla, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki, Len Brown,
	Pavel Machek, David Woodhouse, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

On Tue, Mar 12, 2024 at 01:51:29PM +0000, David Woodhouse wrote:
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2
> function which is analogous to ACPI S4 state. This will allow hosting
> environments to determine that a guest is hibernated rather than just
> powered off, and handle that state appropriately on subsequent launches.
> 
> Since commit 60c0d45a7f7a ("efi/arm64: use UEFI for system reset and
> poweroff") the EFI shutdown method is deliberately preferred over PSCI
> or other methods. So register a SYS_OFF_MODE_POWER_OFF handler which
> *only* handles the hibernation, leaving the original PSCI SYSTEM_OFF as
> a last resort via the legacy pm_power_off function pointer.
> 
> The hibernation code already exports a system_entering_hibernation()
> function which is be used by the higher-priority handler to check for
> hibernation. That existing function just returns the value of a static
> boolean variable from hibernate.c, which was previously only set in the
> hibernation_platform_enter() code path. Set the same flag in the simpler
> code path around the call to kernel_power_off() too.
> 
> An alternative way to hook SYSTEM_OFF2 into the hibernation code would
> be to register a platform_hibernation_ops structure with an ->enter()
> method which makes the new SYSTEM_OFF2 call. But that would have the
> unwanted side-effect of making hibernation take a completely different
> code path in hibernation_platform_enter(), invoking a lot of special dpm
> callbacks.
> 
> Another option might be to add a new SYS_OFF_MODE_HIBERNATE mode, with
> fallback to SYS_OFF_MODE_POWER_OFF. Or to use the sys_off_data to
> indicate whether the power off is for hibernation.
> 
> But this version works and is relatively simple.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  drivers/firmware/psci/psci.c | 35 +++++++++++++++++++++++++++++++++++
>  kernel/power/hibernate.c     |  5 ++++-
>  2 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/firmware/psci/psci.c b/drivers/firmware/psci/psci.c
> index d9629ff87861..69d2f6969438 100644
> --- a/drivers/firmware/psci/psci.c
> +++ b/drivers/firmware/psci/psci.c
> @@ -78,6 +78,7 @@ struct psci_0_1_function_ids get_psci_0_1_function_ids(void)
>  
>  static u32 psci_cpu_suspend_feature;
>  static bool psci_system_reset2_supported;
> +static bool psci_system_off2_supported;
>  
>  static inline bool psci_has_ext_power_state(void)
>  {
> @@ -333,6 +334,28 @@ static void psci_sys_poweroff(void)
>  	invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
>  }
>  
> +#ifdef CONFIG_HIBERNATION
> +static int psci_sys_hibernate(struct sys_off_data *data)
> +{
> +	if (system_entering_hibernation())
> +		invoke_psci_fn(PSCI_FN_NATIVE(1_3, SYSTEM_OFF2),
> +			       PSCI_1_3_HIBERNATE_TYPE_OFF, 0, 0);
> +	return NOTIFY_DONE;
> +}
> +
> +static int __init psci_hibernate_init(void)
> +{
> +	if (psci_system_off2_supported) {
> +		/* Higher priority than EFI shutdown, but only for hibernate */
> +		register_sys_off_handler(SYS_OFF_MODE_POWER_OFF,
> +					 SYS_OFF_PRIO_FIRMWARE + 2,
> +					 psci_sys_hibernate, NULL);
> +	}
> +	return 0;
> +}
> +subsys_initcall(psci_hibernate_init);

Looked briefly at register_sys_off_handler and it should be OK to call
it from psci_init_system_off2() below. Any particular reason for having
separate initcall to do this ? We can even eliminate the need for
psci_init_system_off2 if it can be called from there. What am I missing ?

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  2024-03-12 15:57   ` Sudeep Holla
@ 2024-03-12 16:36     ` David Woodhouse
  2024-03-13 15:34       ` Sudeep Holla
  0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2024-03-12 16:36 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Mark Rutland,
	Lorenzo Pieralisi, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
	kvmarm, linux-pm

[-- Attachment #1: Type: text/plain, Size: 2969 bytes --]

On Tue, 2024-03-12 at 15:57 +0000, Sudeep Holla wrote:
> Looked briefly at register_sys_off_handler and it should be OK to call
> it from psci_init_system_off2() below. Any particular reason for having
> separate initcall to do this ? We can even eliminate the need for
> psci_init_system_off2 if it can be called from there. What am I missing ?

My first attempt did that. I don't think we can kmalloc that early:

[    0.000000] psci: SMC Calling Convention v1.1
[    0.000000] Unable to handle kernel read from unreadable memory at virtual address 0000000000000018
[    0.000000] Mem abort info:
[    0.000000]   ESR = 0x0000000096000004
[    0.000000]   EC = 0x25: DABT (current EL), IL = 32 bits
[    0.000000]   SET = 0, FnV = 0
[    0.000000]   EA = 0, S1PTW = 0
[    0.000000]   FSC = 0x04: level 0 translation fault
[    0.000000] Data abort info:
[    0.000000]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[    0.000000]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[    0.000000]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[    0.000000] [0000000000000018] user address but active_mm is swapper
[    0.000000] Internal error: Oops: 0000000096000004 [#1] SMP
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.8.0-rc3+ #30
[    0.000000] pstate: 804000c5 (Nzcv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[    0.000000] pc : kmalloc_trace+0x138/0x340
[    0.000000] lr : register_sys_off_handler+0x60/0x258
[    0.000000] sp : ffff8000827d3d10
[    0.000000] x29: ffff8000827d3d20 x28: 000000005cd7e0ac x27: 0000000000001f3f
[    0.000000] x26: 0000000000000000 x25: ffff8000802bd890 x24: ffff8000802bd890
[    0.000000] x23: 0000000000000040 x22: 0000000000000dc0 x21: 0000000000000001
[    0.000000] x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000006
[    0.000000] x17: 000000000036fd40 x16: 000000005ec902c0 x15: ffff8000827d37c0
[    0.000000] x14: 0000000000000000 x13: 312e3176206e6f69 x12: 746e65766e6f4320
[    0.000000] x11: 00000000ffffdfff x10: ffff8000828cebe0 x9 : ffff80008281ea10
[    0.000000] x8 : ffff8000827d3d78 x7 : 0000000000000000 x6 : 0000000000000000
[    0.000000] x5 : 0000000000000000 x4 : ffff8000827e0000 x3 : ffff8000827f41c0
[    0.000000] x2 : 0000000000000040 x1 : 0000000000000dc0 x0 : 0000000000000000
[    0.000000] Call trace:
[    0.000000]  kmalloc_trace+0x138/0x340
[    0.000000]  register_sys_off_handler+0x60/0x258
[    0.000000]  psci_probe+0x2cc/0x350
[    0.000000]  psci_acpi_init+0x50/0x88
[    0.000000]  setup_arch+0x194/0x278
[    0.000000]  start_kernel+0x7c/0x410
[    0.000000]  __primary_switched+0xb8/0xc8
[    0.000000] Code: b5000f7a f94003f4 aa1803fe d50320ff (b9401a64) 
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation
  2024-03-12 15:24 ` [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation Marc Zyngier
@ 2024-03-12 17:01   ` David Woodhouse
  0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2024-03-12 17:01 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]

On Tue, 2024-03-12 at 15:24 +0000, Marc Zyngier wrote:
> 
> > Strictly, we should perhaps also allow the guest to detect PSCI v1.3, 
> > but when v1.1 was added in commit 512865d83fd9 it was done 
> > unconditionally, which seems wrong. Shouldn't we have a way for 
> > userspace to control what gets exposed, rather than silently changing 
> > the guest behaviour with newer host kernels? Should I add a 
> > KVM_CAP_ARM_PSCI_VERSION?
> 
> Do you mean something like 85bd0ba1ff98?

Ew :)

That isn't quite what I was thinking, no. I wasn't thinking of
something that would default to the latest, and would have a per-vCPU
way of setting what's essentially a KVM-wide configuration.

So if current userspace doesn't want the environment it exposes to
guests to be randomly changed by a kernel upgrade in the future, it
needs to explicitly use KVM_ARM_SET_REG on any one of the vCPUs, to set
KVM_REG_ARM_PSCI_VERSION to KVM_ARM_PSCI_1_1?

It isn't just new optional features; PSCI v1.2 added new error returns
from CPU_ON for example. Should guests start to see those, just because
the host kernel got upgraded? 

Now I see it, I suppose we can extend it to v1.2 (and v1.3 when that's
eventually published for real). Should we really continue to increment
the *default* though?

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
  2024-03-12 15:36   ` Marc Zyngier
@ 2024-03-12 17:06     ` David Woodhouse
  0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2024-03-12 17:06 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Oliver Upton, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

[-- Attachment #1: Type: text/plain, Size: 2491 bytes --]

On Tue, 2024-03-12 at 15:36 +0000, Marc Zyngier wrote:
> On Tue, 12 Mar 2024 13:51:28 +0000,
> David Woodhouse <dwmw2@infradead.org> wrote:
> > 
> > +Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the
> > +KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI
> 
> Checking that PSCI 1.3 is enabled for the guest should be enough, no?
> I don't think providing yet another level of optionally brings us
> much, other than complexity.

This is just following what we already do for SYSTEM_RESET2. Regardless
of the PSCI version, these calls are *optional*. Shouldn't exposing
them to the guest be a deliberate choice on the part of the userspace
VMM?

I was originally thinking of a KVM_CAP with a bitmask of the optional
features to be enabled (and which would return the bitmask of supported
features). But that isn't how it was already being done, so I just
followed the existing precedent.

> > --- a/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> > +++ b/arch/arm64/kvm/hyp/nvhe/psci-relay.c
> > @@ -264,6 +264,8 @@ static unsigned long psci_1_0_handler(u64 func_id, struct kvm_cpu_context *host_
> >         switch (func_id) {
> >         case PSCI_1_0_FN_PSCI_FEATURES:
> >         case PSCI_1_0_FN_SET_SUSPEND_MODE:
> > +       case PSCI_1_3_FN_SYSTEM_OFF2:
> > +       case PSCI_1_3_FN64_SYSTEM_OFF2:
> 
> nit: order by version number.

Ack.

> > @@ -353,6 +359,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
> >                         if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags))
> >                                 val = 0;
> >                         break;
> > +               case PSCI_1_3_FN_SYSTEM_OFF2:
> > +               case PSCI_1_3_FN64_SYSTEM_OFF2:
> > +                       if (test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
> > +                               val = 1UL << PSCI_1_3_HIBERNATE_TYPE_OFF;
> > +                       break;
> 
> Testing the PSCI version should be enough (minor >= 3). Same thing
> goes the the capability: checking that the host supports 1.3 should be
> enough.

Wouldn't that mean we should implement *all* the new functions which
are optional in v1.3? I really think the opt-in should be per feature,
for the optional ones.



[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
  2024-03-12 15:47   ` Oliver Upton
@ 2024-03-13 12:53     ` David Woodhouse
  2024-03-13 19:42       ` Oliver Upton
  0 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2024-03-13 12:53 UTC (permalink / raw)
  To: Oliver Upton
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

[-- Attachment #1: Type: text/plain, Size: 1572 bytes --]

On Tue, 2024-03-12 at 08:47 -0700, Oliver Upton wrote:
> Hi,
> 
> On Tue, Mar 12, 2024 at 01:51:28PM +0000, David Woodhouse wrote:
> > From: David Woodhouse <dwmw@amazon.co.uk>
> > 
> > The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
> > which is analogous to ACPI S4 state. This will allow hosting environments
> > to determine that a guest is hibernated rather than just powered off, and
> > ensure that they preserve the virtual environment appropriately to allow
> > the guest to resume safely (or bump the hardware_signature in the FACS to
> > trigger a clean reboot instead).
> > 
> > The beta version will be changed to say that PSCI_FEATURES returns a bit
> > mask of the supported hibernate types, which is implemented here.
> 
> Have you considered doing the PSCI implementation in userspace? The
> SMCCC filter [*] was added for this exact purpose. 
> 

For the purpose of deprecating the in-KVM PSCI implementation and
reimplementing it in VMMs? So we're never going to add new features and
versions to the kernel PSCI?

If that's the case then I suppose I can send the patch to clearly
document the in-KVM PSCI as deprecated, and do it that way.

But to answer your question directly, no I hadn't considered that. I
was just following the existing precedent of adding new optional PSCI
features like SYSTEM_RESET2.

I didn't think that the addition of SYSTEM_OFF2 in precisely the same
fashion would be the straw that broke the camel's back, and pushed us
to reimplement it in userspace instead.

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  2024-03-12 16:36     ` David Woodhouse
@ 2024-03-13 15:34       ` Sudeep Holla
  2024-03-14 11:09         ` Sudeep Holla
  0 siblings, 1 reply; 16+ messages in thread
From: Sudeep Holla @ 2024-03-13 15:34 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Mark Rutland,
	Lorenzo Pieralisi, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
	kvmarm, linux-pm

On Tue, Mar 12, 2024 at 04:36:05PM +0000, David Woodhouse wrote:
> On Tue, 2024-03-12 at 15:57 +0000, Sudeep Holla wrote:
> > Looked briefly at register_sys_off_handler and it should be OK to call
> > it from psci_init_system_off2() below. Any particular reason for having
> > separate initcall to do this ? We can even eliminate the need for
> > psci_init_system_off2 if it can be called from there. What am I missing ?
>
> My first attempt did that. I don't think we can kmalloc that early:
>

That was was initial guess. But a quick hack on my setup and running it on
the FVP model didn't complain. I think either I messed up or something else
wrong, I must check on some h/w. Anyways sorry for the noise and thanks for
the response.

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
  2024-03-13 12:53     ` David Woodhouse
@ 2024-03-13 19:42       ` Oliver Upton
  2024-03-13 23:01         ` David Woodhouse
  0 siblings, 1 reply; 16+ messages in thread
From: Oliver Upton @ 2024-03-13 19:42 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

On Wed, Mar 13, 2024 at 12:53:45PM +0000, David Woodhouse wrote:
> On Tue, 2024-03-12 at 08:47 -0700, Oliver Upton wrote:
> > Hi,
> > 
> > On Tue, Mar 12, 2024 at 01:51:28PM +0000, David Woodhouse wrote:
> > > From: David Woodhouse <dwmw@amazon.co.uk>
> > > 
> > > The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
> > > which is analogous to ACPI S4 state. This will allow hosting environments
> > > to determine that a guest is hibernated rather than just powered off, and
> > > ensure that they preserve the virtual environment appropriately to allow
> > > the guest to resume safely (or bump the hardware_signature in the FACS to
> > > trigger a clean reboot instead).
> > > 
> > > The beta version will be changed to say that PSCI_FEATURES returns a bit
> > > mask of the supported hibernate types, which is implemented here.
> > 
> > Have you considered doing the PSCI implementation in userspace? The
> > SMCCC filter [*] was added for this exact purpose. 
> > 
> 
> For the purpose of deprecating the in-KVM PSCI implementation and
> reimplementing it in VMMs? So we're never going to add new features and
> versions to the kernel PSCI?

I'm not against the idea of adding features to the in-kernel PSCI
implementation when it has a clear reason to live in the kernel. For
this hypercall in particular the actual implementation lives in
userspace, the KVM code is just boilerplate for migration / UAPI
compatibility.

> If that's the case then I suppose I can send the patch to clearly
> document the in-KVM PSCI as deprecated, and do it that way.

There probably isn't an awful lot to be gained from documenting the UAPI
as deprecated, we will never actually get to delete it.

> But to answer your question directly, no I hadn't considered that. I
> was just following the existing precedent of adding new optional PSCI
> features like SYSTEM_RESET2.

Understandable. And the infrastructure I'm recommending didn't exist
around the time of THE SYSTEM_RESET2 addition.

> I didn't think that the addition of SYSTEM_OFF2 in precisely the same
> fashion would be the straw that broke the camel's back, and pushed us
> to reimplement it in userspace instead.

I do not believe using the SMCCC filter to take SYSTEM_OFF2 to userspace
would necessitate a _full_ userspace reimplementation. You're free to
leave the default handler in place for functions you don't care about.
Forwarding PSCI_VERSION, PSCI_FEATURES, and SYSTEM_OFF2 would be sufficient
to get this off the ground, and the VMM can still advertise the rest of
the hypercalls implemented by KVM.

That might get you where you want to go a bit faster, since it'd avoid
any concerns about implementing a draft ABI in the kernel.

-- 
Thanks,
Oliver

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation
  2024-03-13 19:42       ` Oliver Upton
@ 2024-03-13 23:01         ` David Woodhouse
  0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2024-03-13 23:01 UTC (permalink / raw)
  To: Oliver Upton
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, James Morse, Suzuki K Poulose, Zenghui Yu,
	Catalin Marinas, Will Deacon, Mark Rutland, Lorenzo Pieralisi,
	Rafael J. Wysocki, Len Brown, Pavel Machek, Mostafa Saleh,
	Jean-Philippe Brucker, linux-doc, linux-kernel, kvmarm, linux-pm

[-- Attachment #1: Type: text/plain, Size: 1314 bytes --]

On Wed, 2024-03-13 at 12:42 -0700, Oliver Upton wrote:
> I do not believe using the SMCCC filter to take SYSTEM_OFF2 to userspace
> would necessitate a _full_ userspace reimplementation. You're free to
> leave the default handler in place for functions you don't care about.
> Forwarding PSCI_VERSION, PSCI_FEATURES, and SYSTEM_OFF2 would be sufficient
> to get this off the ground, and the VMM can still advertise the rest of
> the hypercalls implemented by KVM.

Right... so we'd intercept PSCI_FEATURES *just* to indicate support for
the one call we implement in userspace, and pass all other
PSCI_FEATURES calls through to the kernel to handle the others? 

And then we'd implement SYSTEM_OFF2, hooking it up to do whatever our
existing KVM_EXIT_SYSTEM_EVENT handler *already* does for a standard
power-off, just with the extra flag to show it's a hibernate.

This concept does not fill me with joy.

> That might get you where you want to go a bit faster, since it'd avoid
> any concerns about implementing a draft ABI in the kernel.

I'd be more concerned about supporting a draft ABI in a public cloud
provider, TBH. Having it as just one more downstream kernel patch,
posted upstream but not yet merged before the final publication of the
spec, would be the least of my worries.


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 5965 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  2024-03-13 15:34       ` Sudeep Holla
@ 2024-03-14 11:09         ` Sudeep Holla
  2024-03-14 11:27           ` David Woodhouse
  0 siblings, 1 reply; 16+ messages in thread
From: Sudeep Holla @ 2024-03-14 11:09 UTC (permalink / raw)
  To: David Woodhouse
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Sudeep Holla,
	Jonathan Corbet, Marc Zyngier, Oliver Upton, James Morse,
	Suzuki K Poulose, Zenghui Yu, Catalin Marinas, Will Deacon,
	Mark Rutland, Lorenzo Pieralisi, Rafael J. Wysocki, Len Brown,
	Pavel Machek, Mostafa Saleh, Jean-Philippe Brucker, linux-doc,
	linux-kernel, kvmarm, linux-pm

On Wed, Mar 13, 2024 at 03:34:44PM +0000, Sudeep Holla wrote:
> On Tue, Mar 12, 2024 at 04:36:05PM +0000, David Woodhouse wrote:
> > On Tue, 2024-03-12 at 15:57 +0000, Sudeep Holla wrote:
> > > Looked briefly at register_sys_off_handler and it should be OK to call
> > > it from psci_init_system_off2() below. Any particular reason for having
> > > separate initcall to do this ? We can even eliminate the need for
> > > psci_init_system_off2 if it can be called from there. What am I missing ?
> >
> > My first attempt did that. I don't think we can kmalloc that early:
> >
>
> That was was initial guess. But a quick hack on my setup and running it on
> the FVP model didn't complain. I think either I messed up or something else
> wrong, I must check on some h/w. Anyways sorry for the noise and thanks for
> the response.
>

OK, it was indeed giving -ENOMEM which in my hack didn't get propogated
properly 🙁. I assume you have some configs that is resulting in the
crash instead of -ENOMEM as I see in my setup(FVP as well as hardware).

Sorry for the noise.

--
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate
  2024-03-14 11:09         ` Sudeep Holla
@ 2024-03-14 11:27           ` David Woodhouse
  0 siblings, 0 replies; 16+ messages in thread
From: David Woodhouse @ 2024-03-14 11:27 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: linux-arm-kernel, kvm, Paolo Bonzini, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, James Morse, Suzuki K Poulose,
	Zenghui Yu, Catalin Marinas, Will Deacon, Mark Rutland,
	Lorenzo Pieralisi, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Mostafa Saleh, Jean-Philippe Brucker, linux-doc, linux-kernel,
	kvmarm, linux-pm

On 14 March 2024 12:09:11 CET, Sudeep Holla <sudeep.holla@arm.com> wrote:
>On Wed, Mar 13, 2024 at 03:34:44PM +0000, Sudeep Holla wrote:
>> On Tue, Mar 12, 2024 at 04:36:05PM +0000, David Woodhouse wrote:
>> > On Tue, 2024-03-12 at 15:57 +0000, Sudeep Holla wrote:
>> > > Looked briefly at register_sys_off_handler and it should be OK to call
>> > > it from psci_init_system_off2() below. Any particular reason for having
>> > > separate initcall to do this ? We can even eliminate the need for
>> > > psci_init_system_off2 if it can be called from there. What am I missing ?
>> >
>> > My first attempt did that. I don't think we can kmalloc that early:
>> >
>>
>> That was was initial guess. But a quick hack on my setup and running it on
>> the FVP model didn't complain. I think either I messed up or something else
>> wrong, I must check on some h/w. Anyways sorry for the noise and thanks for
>> the response.
>>
>
>OK, it was indeed giving -ENOMEM which in my hack didn't get propogated
>properly 🙁. I assume you have some configs that is resulting in the
>crash instead of -ENOMEM as I see in my setup(FVP as well as hardware).
>
>Sorry for the noise.

Fairly stock Fedora config, with a few tweaks.
http://david.woodhou.se/arm-hibernate-config

I note kmalloc_trace() is in the backtrace.


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-03-14 11:27 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-03-12 13:51 [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation David Woodhouse
2024-03-12 13:51 ` [RFC PATCH 1/2] KVM: arm64: Add PSCI SYSTEM_OFF2 function " David Woodhouse
2024-03-12 15:36   ` Marc Zyngier
2024-03-12 17:06     ` David Woodhouse
2024-03-12 15:47   ` Oliver Upton
2024-03-13 12:53     ` David Woodhouse
2024-03-13 19:42       ` Oliver Upton
2024-03-13 23:01         ` David Woodhouse
2024-03-12 13:51 ` [RFC PATCH 2/2] arm64: Use SYSTEM_OFF2 PSCI call to power off for hibernate David Woodhouse
2024-03-12 15:57   ` Sudeep Holla
2024-03-12 16:36     ` David Woodhouse
2024-03-13 15:34       ` Sudeep Holla
2024-03-14 11:09         ` Sudeep Holla
2024-03-14 11:27           ` David Woodhouse
2024-03-12 15:24 ` [RFC PATCH 0/2] Add PSCI v1.3 SYSTEM_OFF2 support for hibernation Marc Zyngier
2024-03-12 17:01   ` David Woodhouse

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).