linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action
@ 2025-01-13 16:25 Ahmad Fatoum
  2025-01-13 16:25 ` [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum Ahmad Fatoum
                   ` (12 more replies)
  0 siblings, 13 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum, Matteo Croce

We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.

This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.

This is inadequate in the general case though as a driver reporting e.g.
an imminent power failure can't know whether a shutdown or a reboot would
be more appropriate for a given hardware platform.

To address this, this series adds a hw_protection kernel parameter and
sysfs toggle that can be used to change the action from the shutdown
default to reboot. A new hw_protection_trigger API then makes use of
this default action.

My particular use case is unattended embedded systems that don't
have support for shutdown and that power on automatically when power is
supplied:

  - A brief power cycle gets detected by the driver
  - The kernel powers down the system and SoC goes into shutdown mode
  - Power is restored
  - The system remains oblivious to the restored power
  - System needs to be manually power cycled for a duration long enough
    to drain the capacitors

With this series, such systems can configure the kernel with
hw_protection=reboot to have the boot firmware worry about critical
conditions.

---
Changes in v2:
- Added Rob's dt-bindings Acked-by
- Add kernel-doc for all newly introduced enums, functions and
  function parameters (lkp)
- Fix kernel-doc warning for do_kernel_restart even though it
  wasn't introduced in this series (lkp)
- Rename the work function and object in patch 2 already to align
  with the functional change
-
- Link to v1: https://lore.kernel.org/r/20241219-hw_protection-reboot-v1-0-263a0c1df802@pengutronix.de

---
Ahmad Fatoum (12):
      reboot: replace __hw_protection_shutdown bool action parameter with an enum
      reboot: reboot, not shutdown, on hw_protection_reboot timeout
      docs: thermal: sync hardware protection doc with code
      reboot: describe do_kernel_restart's cmd argument in kernel-doc
      reboot: rename now misleading __hw_protection_shutdown symbols
      reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown
      reboot: add support for configuring emergency hardware protection action
      regulator: allow user configuration of hardware protection action
      platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal
      dt-bindings: thermal: give OS some leeway in absence of critical-action
      thermal: core: allow user configuration of hardware protection action
      reboot: retire hw_protection_reboot and hw_protection_shutdown helpers

 Documentation/ABI/testing/sysfs-kernel-reboot      |   8 ++
 Documentation/admin-guide/kernel-parameters.txt    |   6 +
 .../devicetree/bindings/thermal/thermal-zones.yaml |   5 +-
 Documentation/driver-api/thermal/sysfs-api.rst     |  25 ++--
 drivers/platform/chrome/cros_ec_lpc.c              |   2 +-
 drivers/regulator/core.c                           |   4 +-
 drivers/regulator/irq_helpers.c                    |  16 +--
 drivers/thermal/thermal_core.c                     |  17 +--
 drivers/thermal/thermal_core.h                     |   1 +
 drivers/thermal/thermal_of.c                       |   7 +-
 include/linux/reboot.h                             |  36 ++++--
 include/uapi/linux/capability.h                    |   1 +
 kernel/reboot.c                                    | 140 ++++++++++++++++-----
 13 files changed, 195 insertions(+), 73 deletions(-)
---
base-commit: 78d4f34e2115b517bcbfe7ec0d018bbbb6f9b0b8
change-id: 20241218-hw_protection-reboot-96953493726a

Best regards,
-- 
Ahmad Fatoum <a.fatoum@pengutronix.de>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:10   ` Tzung-Bi Shih
  2025-01-13 16:25 ` [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout Ahmad Fatoum
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

Currently __hw_protection_shutdown() either reboots or shuts down the
system according to its shutdown argument.

To make the logic easier to follow, both inside __hw_protection_shutdown
and at caller sites, lets replace the bool parameter with an enum.

This will be extra useful, when in a later commit, a third action is
added to the enumeration.

No functional change.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 include/linux/reboot.h | 18 +++++++++++++++---
 kernel/reboot.c        | 14 ++++++--------
 2 files changed, 21 insertions(+), 11 deletions(-)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index abcdde4df697969a8027bcb052efc00daabbbf6a..e97f6b8e858685b4b527daa8920a31eabcf91622 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -177,16 +177,28 @@ void ctrl_alt_del(void);
 
 extern void orderly_poweroff(bool force);
 extern void orderly_reboot(void);
-void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shutdown);
+
+/**
+ * enum hw_protection_action - Hardware protection action
+ *
+ * @HWPROT_ACT_SHUTDOWN:
+ *	The system should be shut down (powered off) for HW protection.
+ * @HWPROT_ACT_REBOOT:
+ *	The system should be rebooted for HW protection.
+ */
+enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT };
+
+void __hw_protection_shutdown(const char *reason, int ms_until_forced,
+			      enum hw_protection_action action);
 
 static inline void hw_protection_reboot(const char *reason, int ms_until_forced)
 {
-	__hw_protection_shutdown(reason, ms_until_forced, false);
+	__hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_REBOOT);
 }
 
 static inline void hw_protection_shutdown(const char *reason, int ms_until_forced)
 {
-	__hw_protection_shutdown(reason, ms_until_forced, true);
+	__hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN);
 }
 
 /*
diff --git a/kernel/reboot.c b/kernel/reboot.c
index a701000bab3470df28665e8c9591cd82a033c6c2..847ac5d17a659981c6765699eac323f5e87f48c1 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -983,10 +983,7 @@ static void hw_failure_emergency_poweroff(int poweroff_delay_ms)
  * @ms_until_forced:	Time to wait for orderly shutdown or reboot before
  *			triggering it. Negative value disables the forced
  *			shutdown or reboot.
- * @shutdown:		If true, indicates that a shutdown will happen
- *			after the critical tempeature is reached.
- *			If false, indicates that a reboot will happen
- *			after the critical tempeature is reached.
+ * @action:		The hardware protection action to be taken.
  *
  * Initiate an emergency system shutdown or reboot in order to protect
  * hardware from further damage. Usage examples include a thermal protection.
@@ -994,7 +991,8 @@ static void hw_failure_emergency_poweroff(int poweroff_delay_ms)
  * pending even if the previous request has given a large timeout for forced
  * shutdown/reboot.
  */
-void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shutdown)
+void __hw_protection_shutdown(const char *reason, int ms_until_forced,
+			      enum hw_protection_action action)
 {
 	static atomic_t allow_proceed = ATOMIC_INIT(1);
 
@@ -1009,10 +1007,10 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shut
 	 * orderly_poweroff failure
 	 */
 	hw_failure_emergency_poweroff(ms_until_forced);
-	if (shutdown)
-		orderly_poweroff(true);
-	else
+	if (action == HWPROT_ACT_REBOOT)
 		orderly_reboot();
+	else
+		orderly_poweroff(true);
 }
 EXPORT_SYMBOL_GPL(__hw_protection_shutdown);
 

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
  2025-01-13 16:25 ` [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:10   ` Tzung-Bi Shih
  2025-01-22 11:28   ` Matti Vaittinen
  2025-01-13 16:25 ` [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code Ahmad Fatoum
                   ` (10 subsequent siblings)
  12 siblings, 2 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

hw_protection_shutdown() will kick off an orderly shutdown and if that
takes longer than a configurable amount of time, an emergency shutdown
will occur.

Recently, hw_protection_reboot() was added for those systems that don't
implement a proper shutdown and are better served by rebooting and
having the boot firmware worry about doing something about the critical
condition.

On timeout of the orderly reboot of hw_protection_reboot(), the system
would go into shutdown, instead of reboot. This is not a good idea, as
going into shutdown was explicitly not asked for.

Fix this by always doing an emergency reboot if hw_protection_reboot()
is called and the orderly reboot takes too long.

Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 kernel/reboot.c | 70 ++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 49 insertions(+), 21 deletions(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index 847ac5d17a659981c6765699eac323f5e87f48c1..222b63dfd31020d0e2bc1b1402dbfa82adc71990 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -932,48 +932,76 @@ void orderly_reboot(void)
 }
 EXPORT_SYMBOL_GPL(orderly_reboot);
 
+static const char *hw_protection_action_str(enum hw_protection_action action)
+{
+	switch (action) {
+	case HWPROT_ACT_SHUTDOWN:
+		return "shutdown";
+	case HWPROT_ACT_REBOOT:
+		return "reboot";
+	default:
+		return "undefined";
+	}
+}
+
+static enum hw_protection_action hw_failure_emergency_action;
+
 /**
- * hw_failure_emergency_poweroff_func - emergency poweroff work after a known delay
- * @work: work_struct associated with the emergency poweroff function
+ * hw_failure_emergency_action_func - emergency action work after a known delay
+ * @work: work_struct associated with the emergency action function
  *
  * This function is called in very critical situations to force
- * a kernel poweroff after a configurable timeout value.
+ * a kernel poweroff or reboot after a configurable timeout value.
  */
-static void hw_failure_emergency_poweroff_func(struct work_struct *work)
+static void hw_failure_emergency_action_func(struct work_struct *work)
 {
+	const char *action_str = hw_protection_action_str(hw_failure_emergency_action);
+
+	pr_emerg("Hardware protection timed-out. Trying forced %s\n",
+		 action_str);
+
 	/*
-	 * We have reached here after the emergency shutdown waiting period has
-	 * expired. This means orderly_poweroff has not been able to shut off
-	 * the system for some reason.
+	 * We have reached here after the emergency action waiting period has
+	 * expired. This means orderly_poweroff/reboot has not been able to
+	 * shut off the system for some reason.
 	 *
-	 * Try to shut down the system immediately using kernel_power_off
-	 * if populated
+	 * Try to shut off the system immediately if possible
 	 */
-	pr_emerg("Hardware protection timed-out. Trying forced poweroff\n");
-	kernel_power_off();
+
+	if (hw_failure_emergency_action == HWPROT_ACT_REBOOT)
+		kernel_restart(NULL);
+	else
+		kernel_power_off();
 
 	/*
 	 * Worst of the worst case trigger emergency restart
 	 */
-	pr_emerg("Hardware protection shutdown failed. Trying emergency restart\n");
+	pr_emerg("Hardware protection %s failed. Trying emergency restart\n",
+		 action_str);
 	emergency_restart();
 }
 
-static DECLARE_DELAYED_WORK(hw_failure_emergency_poweroff_work,
-			    hw_failure_emergency_poweroff_func);
+static DECLARE_DELAYED_WORK(hw_failure_emergency_action_work,
+			    hw_failure_emergency_action_func);
 
 /**
- * hw_failure_emergency_poweroff - Trigger an emergency system poweroff
+ * hw_failure_emergency_schedule - Schedule an emergency system shutdown or reboot
+ *
+ * @action:		The hardware protection action to be taken
+ * @action_delay_ms:	Time in milliseconds to elapse before triggering action
  *
  * This may be called from any critical situation to trigger a system shutdown
- * after a given period of time. If time is negative this is not scheduled.
+ * or reboot after a given period of time.
+ * If time is negative this is not scheduled.
  */
-static void hw_failure_emergency_poweroff(int poweroff_delay_ms)
+static void hw_failure_emergency_schedule(enum hw_protection_action action,
+					  int action_delay_ms)
 {
-	if (poweroff_delay_ms <= 0)
+	if (action_delay_ms <= 0)
 		return;
-	schedule_delayed_work(&hw_failure_emergency_poweroff_work,
-			      msecs_to_jiffies(poweroff_delay_ms));
+	hw_failure_emergency_action = action;
+	schedule_delayed_work(&hw_failure_emergency_action_work,
+			      msecs_to_jiffies(action_delay_ms));
 }
 
 /**
@@ -1006,7 +1034,7 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced,
 	 * Queue a backup emergency shutdown in the event of
 	 * orderly_poweroff failure
 	 */
-	hw_failure_emergency_poweroff(ms_until_forced);
+	hw_failure_emergency_schedule(action, ms_until_forced);
 	if (action == HWPROT_ACT_REBOOT)
 		orderly_reboot();
 	else

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
  2025-01-13 16:25 ` [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum Ahmad Fatoum
  2025-01-13 16:25 ` [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:11   ` Tzung-Bi Shih
  2025-01-22 11:01   ` Matti Vaittinen
  2025-01-13 16:25 ` [PATCH v2 04/12] reboot: describe do_kernel_restart's cmd argument in kernel-doc Ahmad Fatoum
                   ` (9 subsequent siblings)
  12 siblings, 2 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

Originally, the thermal framework's only hardware protection action was
to trigger a shutdown. This has been changed a little over a year ago to
also support rebooting as alternative hardware protection action.

Update the documentation to reflect this.

Fixes: 62e79e38b257 ("thermal/thermal_of: Allow rebooting after critical temp")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 Documentation/driver-api/thermal/sysfs-api.rst | 25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/Documentation/driver-api/thermal/sysfs-api.rst b/Documentation/driver-api/thermal/sysfs-api.rst
index c803b89b7248f9f26ac24608b0144db5e9c2ddb4..6b481364457b8ec56302e80bb443291c2b4a94d5 100644
--- a/Documentation/driver-api/thermal/sysfs-api.rst
+++ b/Documentation/driver-api/thermal/sysfs-api.rst
@@ -413,18 +413,21 @@ This function serves as an arbitrator to set the state of a cooling
 device. It sets the cooling device to the deepest cooling state if
 possible.
 
-5. thermal_emergency_poweroff
-=============================
+5. Critical Events
+==================
 
-On an event of critical trip temperature crossing the thermal framework
-shuts down the system by calling hw_protection_shutdown(). The
-hw_protection_shutdown() first attempts to perform an orderly shutdown
-but accepts a delay after which it proceeds doing a forced power-off
-or as last resort an emergency_restart.
+On an event of critical trip temperature crossing, the thermal framework
+will trigger a hardware protection power-off (shutdown) or reboot,
+depending on configuration.
+
+At first, the kernel will attempt an orderly power-off or reboot, but
+accepts a delay after which it proceeds to do a forced power-off or
+reboot, respectively. If this fails, ``emergency restart()`` is invoked
+as last resort.
 
 The delay should be carefully profiled so as to give adequate time for
-orderly poweroff.
+orderly power-off or reboot.
 
-If the delay is set to 0 emergency poweroff will not be supported. So a
-carefully profiled non-zero positive value is a must for emergency
-poweroff to be triggered.
+If the delay is set to 0, the emergency action will not be supported. So a
+carefully profiled non-zero positive value is a must for the emergency
+action to be triggered.

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 04/12] reboot: describe do_kernel_restart's cmd argument in kernel-doc
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (2 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:11   ` Tzung-Bi Shih
  2025-01-13 16:25 ` [PATCH v2 05/12] reboot: rename now misleading __hw_protection_shutdown symbols Ahmad Fatoum
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

A W=1 build rightfully complains about the function's kernel-doc
being incomplete.

Describe its single parameter to fix this.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 kernel/reboot.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index 222b63dfd31020d0e2bc1b1402dbfa82adc71990..21a85cf95413979322d3183af1a378f5ddd8e780 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -229,6 +229,9 @@ EXPORT_SYMBOL(unregister_restart_handler);
 /**
  *	do_kernel_restart - Execute kernel restart handler call chain
  *
+ *	@cmd: pointer to buffer containing command to execute for restart
+ *		or %NULL
+ *
  *	Calls functions registered with register_restart_handler.
  *
  *	Expected to be called from machine_restart as last step of the restart

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 05/12] reboot: rename now misleading __hw_protection_shutdown symbols
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (3 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 04/12] reboot: describe do_kernel_restart's cmd argument in kernel-doc Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:11   ` Tzung-Bi Shih
  2025-01-13 16:25 ` [PATCH v2 06/12] reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown Ahmad Fatoum
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

The __hw_protection_shutdown function name has become misleading
since it can cause either a shutdown (poweroff) or a reboot
depending on its argument.

To avoid further confusion, let's rename it, so it doesn't suggest
that a poweroff is all it can do.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 include/linux/reboot.h | 8 ++++----
 kernel/reboot.c        | 8 ++++----
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index e97f6b8e858685b4b527daa8920a31eabcf91622..53c64e31b3cfdcb6b6dfe4def45fbb40c29f5144 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -188,17 +188,17 @@ extern void orderly_reboot(void);
  */
 enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT };
 
-void __hw_protection_shutdown(const char *reason, int ms_until_forced,
-			      enum hw_protection_action action);
+void __hw_protection_trigger(const char *reason, int ms_until_forced,
+			     enum hw_protection_action action);
 
 static inline void hw_protection_reboot(const char *reason, int ms_until_forced)
 {
-	__hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_REBOOT);
+	__hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_REBOOT);
 }
 
 static inline void hw_protection_shutdown(const char *reason, int ms_until_forced)
 {
-	__hw_protection_shutdown(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN);
+	__hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN);
 }
 
 /*
diff --git a/kernel/reboot.c b/kernel/reboot.c
index 21a85cf95413979322d3183af1a378f5ddd8e780..eee4f449181f0cf3662892f4001f9a3074ec4881 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -1008,7 +1008,7 @@ static void hw_failure_emergency_schedule(enum hw_protection_action action,
 }
 
 /**
- * __hw_protection_shutdown - Trigger an emergency system shutdown or reboot
+ * __hw_protection_trigger - Trigger an emergency system shutdown or reboot
  *
  * @reason:		Reason of emergency shutdown or reboot to be printed.
  * @ms_until_forced:	Time to wait for orderly shutdown or reboot before
@@ -1022,8 +1022,8 @@ static void hw_failure_emergency_schedule(enum hw_protection_action action,
  * pending even if the previous request has given a large timeout for forced
  * shutdown/reboot.
  */
-void __hw_protection_shutdown(const char *reason, int ms_until_forced,
-			      enum hw_protection_action action)
+void __hw_protection_trigger(const char *reason, int ms_until_forced,
+			     enum hw_protection_action action)
 {
 	static atomic_t allow_proceed = ATOMIC_INIT(1);
 
@@ -1043,7 +1043,7 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced,
 	else
 		orderly_poweroff(true);
 }
-EXPORT_SYMBOL_GPL(__hw_protection_shutdown);
+EXPORT_SYMBOL_GPL(__hw_protection_trigger);
 
 static int __init reboot_setup(char *str)
 {

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 06/12] reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (4 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 05/12] reboot: rename now misleading __hw_protection_shutdown symbols Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:11   ` Tzung-Bi Shih
  2025-01-13 16:25 ` [PATCH v2 07/12] reboot: add support for configuring emergency hardware protection action Ahmad Fatoum
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

It currently depends on the caller, whether we attempt a hardware
protection shutdown (poweroff) or a reboot. A follow-up commit will make
this partially user-configurable, so it's a good idea to have the
emergency message clearly state whether the kernel is going for a reboot
or a shutdown.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 kernel/reboot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/reboot.c b/kernel/reboot.c
index eee4f449181f0cf3662892f4001f9a3074ec4881..c0333b0a901f2cea7e39d1ac04c57bc9a5c9ba20 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -1027,7 +1027,8 @@ void __hw_protection_trigger(const char *reason, int ms_until_forced,
 {
 	static atomic_t allow_proceed = ATOMIC_INIT(1);
 
-	pr_emerg("HARDWARE PROTECTION shutdown (%s)\n", reason);
+	pr_emerg("HARDWARE PROTECTION %s (%s)\n",
+		 hw_protection_action_str(action), reason);
 
 	/* Shutdown should be initiated only once. */
 	if (!atomic_dec_and_test(&allow_proceed))

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 07/12] reboot: add support for configuring emergency hardware protection action
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (5 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 06/12] reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:12   ` Tzung-Bi Shih
  2025-01-13 16:25 ` [PATCH v2 08/12] regulator: allow user configuration of " Ahmad Fatoum
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum, Matteo Croce

We currently leave the decision of whether to shutdown or reboot to
protect hardware in an emergency situation to the individual drivers.

This works out in some cases, where the driver detecting the critical
failure has inside knowledge: It binds to the system management controller
for example or is guided by hardware description that defines what to do.

In the general case, however, the driver detecting the issue can't know
what the appropriate course of action is and shouldn't be dictating the
policy of dealing with it.

Therefore, add a global hw_protection toggle that allows the user to
specify whether shutdown or reboot should be the default action when the
driver doesn't set policy.

This introduces no functional change yet as hw_protection_trigger() has
no callers, but these will be added in subsequent commits.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 Documentation/ABI/testing/sysfs-kernel-reboot   |  8 +++++
 Documentation/admin-guide/kernel-parameters.txt |  6 ++++
 include/linux/reboot.h                          | 22 +++++++++++-
 include/uapi/linux/capability.h                 |  1 +
 kernel/reboot.c                                 | 46 +++++++++++++++++++++++++
 5 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-reboot b/Documentation/ABI/testing/sysfs-kernel-reboot
index 837330fb251134ffdf29cd68f0b2a845b088e5a0..133f54707d533665c68a5946394540ec50b149e5 100644
--- a/Documentation/ABI/testing/sysfs-kernel-reboot
+++ b/Documentation/ABI/testing/sysfs-kernel-reboot
@@ -30,3 +30,11 @@ KernelVersion:	5.11
 Contact:	Matteo Croce <mcroce@microsoft.com>
 Description:	Don't wait for any other CPUs on reboot and
 		avoid anything that could hang.
+
+What:		/sys/kernel/reboot/hw_protection
+Date:		Feb 2025
+KernelVersion:	6.14
+Contact:	Ahmad Fatoum <a.fatoum@pengutronix.de>
+Description:	Hardware protection action taken on critical events like
+		overtemperature or imminent voltage loss.
+		Valid values are: reboot shutdown
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 3872bc6ec49d63772755504966ae70113f24a1db..ff244e6a0e04d2c172825818defd5d94448f8518 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1921,6 +1921,12 @@
 			which allow the hypervisor to 'idle' the guest
 			on lock contention.
 
+	hw_protection=	[HW]
+			Format: reboot | shutdown
+
+			Hardware protection action taken on critical events like
+			overtemperature or imminent voltage loss.
+
 	i2c_bus=	[HW]	Override the default board specific I2C bus speed
 				or register an additional I2C bus that is not
 				registered from board initialization code.
diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 53c64e31b3cfdcb6b6dfe4def45fbb40c29f5144..79e02876f2ba2b5508f6f26567cbcd5cbe97a277 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -181,16 +181,36 @@ extern void orderly_reboot(void);
 /**
  * enum hw_protection_action - Hardware protection action
  *
+ * @HWPROT_ACT_DEFAULT:
+ *      The default action should be taken. This is HWPROT_ACT_SHUTDOWN
+ *      by default, but can be overridden.
  * @HWPROT_ACT_SHUTDOWN:
  *	The system should be shut down (powered off) for HW protection.
  * @HWPROT_ACT_REBOOT:
  *	The system should be rebooted for HW protection.
  */
-enum hw_protection_action { HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT };
+enum hw_protection_action { HWPROT_ACT_DEFAULT, HWPROT_ACT_SHUTDOWN, HWPROT_ACT_REBOOT };
 
 void __hw_protection_trigger(const char *reason, int ms_until_forced,
 			     enum hw_protection_action action);
 
+/**
+ * hw_protection_trigger - Trigger default emergency system hardware protection action
+ *
+ * @reason:		Reason of emergency shutdown or reboot to be printed.
+ * @ms_until_forced:	Time to wait for orderly shutdown or reboot before
+ *			triggering it. Negative value disables the forced
+ *			shutdown or reboot.
+ *
+ * Initiate an emergency system shutdown or reboot in order to protect
+ * hardware from further damage. The exact action taken is controllable at
+ * runtime and defaults to shutdown.
+ */
+static inline void hw_protection_trigger(const char *reason, int ms_until_forced)
+{
+	__hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_DEFAULT);
+}
+
 static inline void hw_protection_reboot(const char *reason, int ms_until_forced)
 {
 	__hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_REBOOT);
diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h
index 5bb9060986974726025eaabee24a0b720ff94657..2e21b5594f81313e8e17aeeb98a09f098355515f 100644
--- a/include/uapi/linux/capability.h
+++ b/include/uapi/linux/capability.h
@@ -275,6 +275,7 @@ struct vfs_ns_cap_data {
 /* Allow setting encryption key on loopback filesystem */
 /* Allow setting zone reclaim policy */
 /* Allow everything under CAP_BPF and CAP_PERFMON for backward compatibility */
+/* Allow setting hardware protection emergency action */
 
 #define CAP_SYS_ADMIN        21
 
diff --git a/kernel/reboot.c b/kernel/reboot.c
index c0333b0a901f2cea7e39d1ac04c57bc9a5c9ba20..b8ecc3c0533f950e8ab42685ce9134b17cfdddef 100644
--- a/kernel/reboot.c
+++ b/kernel/reboot.c
@@ -36,6 +36,8 @@ enum reboot_mode reboot_mode DEFAULT_REBOOT_MODE;
 EXPORT_SYMBOL_GPL(reboot_mode);
 enum reboot_mode panic_reboot_mode = REBOOT_UNDEFINED;
 
+static enum hw_protection_action hw_protection_action = HWPROT_ACT_SHUTDOWN;
+
 /*
  * This variable is used privately to keep track of whether or not
  * reboot_type is still set to its default value (i.e., reboot= hasn't
@@ -1027,6 +1029,9 @@ void __hw_protection_trigger(const char *reason, int ms_until_forced,
 {
 	static atomic_t allow_proceed = ATOMIC_INIT(1);
 
+	if (action == HWPROT_ACT_DEFAULT)
+		action = hw_protection_action;
+
 	pr_emerg("HARDWARE PROTECTION %s (%s)\n",
 		 hw_protection_action_str(action), reason);
 
@@ -1046,6 +1051,46 @@ void __hw_protection_trigger(const char *reason, int ms_until_forced,
 }
 EXPORT_SYMBOL_GPL(__hw_protection_trigger);
 
+static bool hw_protection_action_parse(const char *str,
+				       enum hw_protection_action *action)
+{
+	if (sysfs_streq(str, "shutdown"))
+		*action = HWPROT_ACT_SHUTDOWN;
+	else if (sysfs_streq(str, "reboot"))
+		*action = HWPROT_ACT_REBOOT;
+	else
+		return false;
+
+	return true;
+}
+
+static int __init hw_protection_setup(char *str)
+{
+	hw_protection_action_parse(str, &hw_protection_action);
+	return 1;
+}
+__setup("hw_protection=", hw_protection_setup);
+
+static ssize_t hw_protection_show(struct kobject *kobj,
+				  struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%s\n",
+			  hw_protection_action_str(hw_protection_action));
+}
+static ssize_t hw_protection_store(struct kobject *kobj,
+				   struct kobj_attribute *attr, const char *buf,
+				   size_t count)
+{
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (!hw_protection_action_parse(buf, &hw_protection_action))
+		return -EINVAL;
+
+	return count;
+}
+static struct kobj_attribute hw_protection_attr = __ATTR_RW(hw_protection);
+
 static int __init reboot_setup(char *str)
 {
 	for (;;) {
@@ -1305,6 +1350,7 @@ static struct kobj_attribute reboot_cpu_attr = __ATTR_RW(cpu);
 #endif
 
 static struct attribute *reboot_attrs[] = {
+	&hw_protection_attr.attr,
 	&reboot_mode_attr.attr,
 #ifdef CONFIG_X86
 	&reboot_force_attr.attr,

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 08/12] regulator: allow user configuration of hardware protection action
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (6 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 07/12] reboot: add support for configuring emergency hardware protection action Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:12   ` Tzung-Bi Shih
  2025-01-22 11:18   ` Matti Vaittinen
  2025-01-13 16:25 ` [PATCH v2 09/12] platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal Ahmad Fatoum
                   ` (4 subsequent siblings)
  12 siblings, 2 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

When the core detects permanent regulator hardware failure or imminent
power failure of a critical supply, it will call hw_protection_shutdown
in an attempt to do a limited orderly shutdown followed by powering off
the system.

This doesn't work out well for many unattended embedded systems that don't
have support for shutdown and that power on automatically when power is
supplied:

  - A brief power cycle gets detected by the driver
  - The kernel powers down the system and SoC goes into shutdown mode
  - Power is restored
  - The system remains oblivious to the restored power
  - System needs to be manually power cycled for a duration long enough
    to drain the capacitors

Allow users to fix this by calling the newly introduced
hw_protection_trigger() instead: This way the hw_protection commandline
or sysfs parameter is used to dictate the policy of dealing with the
regulator fault.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 drivers/regulator/core.c        |  4 ++--
 drivers/regulator/irq_helpers.c | 16 ++++++++--------
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 8cb948a91e60d958c6b5ec97d736e6e3bf4b47eb..74c8f1262f2cd4e796ba8f4f1bdf17b685a615c1 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -5137,8 +5137,8 @@ static void regulator_handle_critical(struct regulator_dev *rdev,
 	if (!reason)
 		return;
 
-	hw_protection_shutdown(reason,
-			       rdev->constraints->uv_less_critical_window_ms);
+	hw_protection_trigger(reason,
+			      rdev->constraints->uv_less_critical_window_ms);
 }
 
 /**
diff --git a/drivers/regulator/irq_helpers.c b/drivers/regulator/irq_helpers.c
index 0aa188b2bbb26797b7907cbfb581459ef41df286..5742faee8071dd8104c094587d66693f48fb0f9b 100644
--- a/drivers/regulator/irq_helpers.c
+++ b/drivers/regulator/irq_helpers.c
@@ -64,16 +64,16 @@ static void regulator_notifier_isr_work(struct work_struct *work)
 reread:
 	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
 		if (!d->die)
-			return hw_protection_shutdown("Regulator HW failure? - no IC recovery",
-						      REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
+			return hw_protection_trigger("Regulator HW failure? - no IC recovery",
+						     REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
 		ret = d->die(rid);
 		/*
 		 * If the 'last resort' IC recovery failed we will have
 		 * nothing else left to do...
 		 */
 		if (ret)
-			return hw_protection_shutdown("Regulator HW failure. IC recovery failed",
-						      REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
+			return hw_protection_trigger("Regulator HW failure. IC recovery failed",
+						     REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
 
 		/*
 		 * If h->die() was implemented we assume recovery has been
@@ -263,14 +263,14 @@ static irqreturn_t regulator_notifier_isr(int irq, void *data)
 	if (d->fatal_cnt && h->retry_cnt > d->fatal_cnt) {
 		/* If we have no recovery, just try shut down straight away */
 		if (!d->die) {
-			hw_protection_shutdown("Regulator failure. Retry count exceeded",
-					       REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
+			hw_protection_trigger("Regulator failure. Retry count exceeded",
+					      REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
 		} else {
 			ret = d->die(rid);
 			/* If die() failed shut down as a last attempt to save the HW */
 			if (ret)
-				hw_protection_shutdown("Regulator failure. Recovery failed",
-						       REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
+				hw_protection_trigger("Regulator failure. Recovery failed",
+						      REGULATOR_FORCED_SAFETY_SHUTDOWN_WAIT_MS);
 		}
 	}
 

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 09/12] platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (7 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 08/12] regulator: allow user configuration of " Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:12   ` Tzung-Bi Shih
  2025-01-13 16:25 ` [PATCH v2 10/12] dt-bindings: thermal: give OS some leeway in absence of critical-action Ahmad Fatoum
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

In the general case, a driver doesn't know which of system shutdown or
reboot is the better action to take to protect hardware in an emergency
situation. For this reason, hw_protection_shutdown is going to be
removed in favor of hw_protection_trigger, which defaults to shutdown,
but may be configured at kernel runtime to be a reboot instead.

The ChromeOS EC situation is different as we do know that shutdown is
the correct action as the EC is programmed to force reset after the
short period, thus replace hw_protection_shutdown with
__hw_protection_trigger with HWPROT_ACT_SHUTDOWN as argument to
maintain the same behavior.

No functional change.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 drivers/platform/chrome/cros_ec_lpc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/platform/chrome/cros_ec_lpc.c b/drivers/platform/chrome/cros_ec_lpc.c
index 924bf4d3cc77b9f27d415a10c61ae06886fa7f80..068781b75529d80b51b3773845bcf457fa958710 100644
--- a/drivers/platform/chrome/cros_ec_lpc.c
+++ b/drivers/platform/chrome/cros_ec_lpc.c
@@ -414,7 +414,7 @@ static void cros_ec_lpc_acpi_notify(acpi_handle device, u32 value, void *data)
 		blocking_notifier_call_chain(&ec_dev->panic_notifier, 0, ec_dev);
 		kobject_uevent_env(&ec_dev->dev->kobj, KOBJ_CHANGE, (char **)env);
 		/* Begin orderly shutdown. EC will force reset after a short period. */
-		hw_protection_shutdown("CrOS EC Panic", -1);
+		__hw_protection_trigger("CrOS EC Panic", -1, HWPROT_ACT_SHUTDOWN);
 		/* Do not query for other events after a panic is reported */
 		return;
 	}

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 10/12] dt-bindings: thermal: give OS some leeway in absence of critical-action
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (8 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 09/12] platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-13 16:25 ` [PATCH v2 11/12] thermal: core: allow user configuration of hardware protection action Ahmad Fatoum
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

An operating system may allow its user to configure the action to be
undertaken on critical overtemperature events.

However, the bindings currently mandate an absence of the critical-action
property to be equal to critical-action = "shutdown", which would mean
any differing user configuration would violate the bindings.

Resolve this by documenting the absence of the property to mean that the
OS gets to decide.

Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 Documentation/devicetree/bindings/thermal/thermal-zones.yaml | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
index 0f435be1dbd8cfb4502be9d198ed6d51058f453b..0de0a9757ccc201ebbb0c8c8efb9f8da662f8e9c 100644
--- a/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
+++ b/Documentation/devicetree/bindings/thermal/thermal-zones.yaml
@@ -82,9 +82,8 @@ patternProperties:
         $ref: /schemas/types.yaml#/definitions/string
         description: |
           The action the OS should perform after the critical temperature is reached.
-          By default the system will shutdown as a safe action to prevent damage
-          to the hardware, if the property is not set.
-          The shutdown action should be always the default and preferred one.
+          If the property is not set, it is up to the system to select the correct
+          action. The recommended and preferred default is shutdown.
           Choose 'reboot' with care, as the hardware may be in thermal stress,
           thus leading to infinite reboots that may cause damage to the hardware.
           Make sure the firmware/bootloader will act as the last resort and take

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 11/12] thermal: core: allow user configuration of hardware protection action
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (9 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 10/12] dt-bindings: thermal: give OS some leeway in absence of critical-action Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:12   ` Tzung-Bi Shih
  2025-01-13 16:25 ` [PATCH v2 12/12] reboot: retire hw_protection_reboot and hw_protection_shutdown helpers Ahmad Fatoum
  2025-01-14  0:33 ` [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Andrew Morton
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

In the general case, we don't know which of system shutdown or
reboot is the better action to take to protect hardware in an emergency
situation. We thus allow the policy to come from the device-tree in the
form of an optional critical-action OF property, but so far there was no
way for the end user to configure this.

With recent addition of the hw_protection parameter, the user can now
choose a default action for the case, where the driver isn't fully sure
what's the better course of action.

Let's make use of this by passing HWPROT_ACT_DEFAULT in absence of the
critical-action OF property.

As HWPROT_ACT_DEFAULT is shutdown by default, this introduces no
functional change for users, unless they start using the new parameter.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 drivers/thermal/thermal_core.c | 17 ++++++++++-------
 drivers/thermal/thermal_core.h |  1 +
 drivers/thermal/thermal_of.c   |  7 +++++--
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 19a3894ad752a91ef621794abbeec9abfb2323ec..abe990b7b40b0c7fa5034093b961e239840a18c1 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -369,7 +369,8 @@ void thermal_governor_update_tz(struct thermal_zone_device *tz,
 	tz->governor->update_tz(tz, reason);
 }
 
-static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdown)
+static void thermal_zone_device_halt(struct thermal_zone_device *tz,
+				     enum hw_protection_action action)
 {
 	/*
 	 * poweroff_delay_ms must be a carefully profiled positive value.
@@ -380,21 +381,23 @@ static void thermal_zone_device_halt(struct thermal_zone_device *tz, bool shutdo
 
 	dev_emerg(&tz->device, "%s: critical temperature reached\n", tz->type);
 
-	if (shutdown)
-		hw_protection_shutdown(msg, poweroff_delay_ms);
-	else
-		hw_protection_reboot(msg, poweroff_delay_ms);
+	__hw_protection_trigger(msg, poweroff_delay_ms, action);
 }
 
 void thermal_zone_device_critical(struct thermal_zone_device *tz)
 {
-	thermal_zone_device_halt(tz, true);
+	thermal_zone_device_halt(tz, HWPROT_ACT_DEFAULT);
 }
 EXPORT_SYMBOL(thermal_zone_device_critical);
 
+void thermal_zone_device_critical_shutdown(struct thermal_zone_device *tz)
+{
+	thermal_zone_device_halt(tz, HWPROT_ACT_SHUTDOWN);
+}
+
 void thermal_zone_device_critical_reboot(struct thermal_zone_device *tz)
 {
-	thermal_zone_device_halt(tz, false);
+	thermal_zone_device_halt(tz, HWPROT_ACT_REBOOT);
 }
 
 static void handle_critical_trips(struct thermal_zone_device *tz,
diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h
index be271e7c8f4141146a03efecc82fc4036ec12df6..7d6637126007168ac05010af0f16a4c8012a0d77 100644
--- a/drivers/thermal/thermal_core.h
+++ b/drivers/thermal/thermal_core.h
@@ -262,6 +262,7 @@ int thermal_build_list_of_policies(char *buf);
 void __thermal_zone_device_update(struct thermal_zone_device *tz,
 				  enum thermal_notify_event event);
 void thermal_zone_device_critical_reboot(struct thermal_zone_device *tz);
+void thermal_zone_device_critical_shutdown(struct thermal_zone_device *tz);
 void thermal_governor_update_tz(struct thermal_zone_device *tz,
 				enum thermal_notify_event reason);
 
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index fab11b98ca4952d23d0232998433bd0650b53d24..c574e775d686599deddd08f932a5a6dd781d342e 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -396,9 +396,12 @@ static struct thermal_zone_device *thermal_of_zone_register(struct device_node *
 	of_ops.should_bind = thermal_of_should_bind;
 
 	ret = of_property_read_string(np, "critical-action", &action);
-	if (!ret)
-		if (!of_ops.critical && !strcasecmp(action, "reboot"))
+	if (!ret && !of_ops.critical) {
+		if (!strcasecmp(action, "reboot"))
 			of_ops.critical = thermal_zone_device_critical_reboot;
+		else if (!strcasecmp(action, "shutdown"))
+			of_ops.critical = thermal_zone_device_critical_shutdown;
+	}
 
 	tz = thermal_zone_device_register_with_trips(np->name, trips, ntrips,
 						     data, &of_ops, &tzp,

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v2 12/12] reboot: retire hw_protection_reboot and hw_protection_shutdown helpers
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (10 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 11/12] thermal: core: allow user configuration of hardware protection action Ahmad Fatoum
@ 2025-01-13 16:25 ` Ahmad Fatoum
  2025-01-20  7:13   ` Tzung-Bi Shih
  2025-01-14  0:33 ` [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Andrew Morton
  12 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-13 16:25 UTC (permalink / raw)
  To: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Ahmad Fatoum

The hw_protection_reboot and hw_protection_shutdown functions mix
mechanism with policy: They let the driver requesting an emergency
action for hardware protection also decide how to deal with it.

This is inadequate in the general case as a driver reporting e.g. an
imminent power failure can't know whether a shutdown or a reboot would
be more appropriate for a given hardware platform.

With the addition of the hw_protection parameter, it's now possible to
configure at runtime the default emergency action and drivers are
expected to use hw_protection_trigger to have this parameter dictate
policy.

As no current users of either hw_protection_shutdown or
hw_protection_shutdown helpers remain, remove them, as not to tempt
driver authors to call them.

Existing users now either defer to hw_protection_trigger or call
__hw_protection_trigger with a suitable argument directly when
they have inside knowledge on whether a reboot or shutdown would
be more appropriate.

Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
 include/linux/reboot.h | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/include/linux/reboot.h b/include/linux/reboot.h
index 79e02876f2ba2b5508f6f26567cbcd5cbe97a277..aa08c3bbbf59a9dec65d775d280902b1455427c2 100644
--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -211,16 +211,6 @@ static inline void hw_protection_trigger(const char *reason, int ms_until_forced
 	__hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_DEFAULT);
 }
 
-static inline void hw_protection_reboot(const char *reason, int ms_until_forced)
-{
-	__hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_REBOOT);
-}
-
-static inline void hw_protection_shutdown(const char *reason, int ms_until_forced)
-{
-	__hw_protection_trigger(reason, ms_until_forced, HWPROT_ACT_SHUTDOWN);
-}
-
 /*
  * Emergency restart, callable from an interrupt handler.
  */

-- 
2.39.5


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action
  2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
                   ` (11 preceding siblings ...)
  2025-01-13 16:25 ` [PATCH v2 12/12] reboot: retire hw_protection_reboot and hw_protection_shutdown helpers Ahmad Fatoum
@ 2025-01-14  0:33 ` Andrew Morton
  12 siblings, 0 replies; 33+ messages in thread
From: Andrew Morton @ 2025-01-14  0:33 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki, Zhang Rui,
	Lukasz Luba, Jonathan Corbet, Serge Hallyn, Liam Girdwood,
	Mark Brown, Matti Vaittinen, Benson Leung, Tzung-Bi Shih,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Matteo Croce

On Mon, 13 Jan 2025 17:25:25 +0100 Ahmad Fatoum <a.fatoum@pengutronix.de> wrote:

> We currently leave the decision of whether to shutdown or reboot to
> protect hardware in an emergency situation to the individual drivers.
> 
> This works out in some cases, where the driver detecting the critical
> failure has inside knowledge: It binds to the system management controller
> for example or is guided by hardware description that defines what to do.
> 
> This is inadequate in the general case though as a driver reporting e.g.
> an imminent power failure can't know whether a shutdown or a reboot would
> be more appropriate for a given hardware platform.
> 
> To address this, this series adds a hw_protection kernel parameter and
> sysfs toggle that can be used to change the action from the shutdown
> default to reboot. A new hw_protection_trigger API then makes use of
> this default action.
> 
> My particular use case is unattended embedded systems that don't
> have support for shutdown and that power on automatically when power is
> supplied:
> 
>   - A brief power cycle gets detected by the driver
>   - The kernel powers down the system and SoC goes into shutdown mode
>   - Power is restored
>   - The system remains oblivious to the restored power
>   - System needs to be manually power cycled for a duration long enough
>     to drain the capacitors
> 
> With this series, such systems can configure the kernel with
> hw_protection=reboot to have the boot firmware worry about critical
> conditions.

This seems useful.

>  Documentation/ABI/testing/sysfs-kernel-reboot      |   8 ++
>  Documentation/admin-guide/kernel-parameters.txt    |   6 +
>  .../devicetree/bindings/thermal/thermal-zones.yaml |   5 +-
>  Documentation/driver-api/thermal/sysfs-api.rst     |  25 ++--
>  drivers/platform/chrome/cros_ec_lpc.c              |   2 +-
>  drivers/regulator/core.c                           |   4 +-
>  drivers/regulator/irq_helpers.c                    |  16 +--
>  drivers/thermal/thermal_core.c                     |  17 +--
>  drivers/thermal/thermal_core.h                     |   1 +
>  drivers/thermal/thermal_of.c                       |   7 +-
>  include/linux/reboot.h                             |  36 ++++--
>  include/uapi/linux/capability.h                    |   1 +
>  kernel/reboot.c                                    | 140 ++++++++++++++++-----
>  13 files changed, 195 insertions(+), 73 deletions(-)

I'm not sure what the merge path is.  Maybe the drivers tree, maybe
mm.git's mm-nonmm branches.

We're at -rc7 so I'll save this away and shall revisit after -rc1 with
a view to gathering acks (please) and adding the series to mm-nonmm, thanks.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum
  2025-01-13 16:25 ` [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum Ahmad Fatoum
@ 2025-01-20  7:10   ` Tzung-Bi Shih
  2025-01-21  9:27     ` Ahmad Fatoum
  0 siblings, 1 reply; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:10 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:26PM +0100, Ahmad Fatoum wrote:
> Currently __hw_protection_shutdown() either reboots or shuts down the
> system according to its shutdown argument.
> 
> To make the logic easier to follow, both inside __hw_protection_shutdown
> and at caller sites, lets replace the bool parameter with an enum.
> 
> This will be extra useful, when in a later commit, a third action is
> added to the enumeration.
> 
> No functional change.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

With a minor question,
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

> @@ -1009,10 +1007,10 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shut
>  	 * orderly_poweroff failure
>  	 */
>  	hw_failure_emergency_poweroff(ms_until_forced);
> -	if (shutdown)
> -		orderly_poweroff(true);
> -	else
> +	if (action == HWPROT_ACT_REBOOT)
>  		orderly_reboot();
> +	else
> +		orderly_poweroff(true);

It probably doesn't really matter.  Does it intend to change the branch
order?  As s/shutdown/action == HWPROT_ACT_SHUTDOWN/ should be more
intuitive for the hunk to me.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout
  2025-01-13 16:25 ` [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout Ahmad Fatoum
@ 2025-01-20  7:10   ` Tzung-Bi Shih
  2025-01-22 11:28   ` Matti Vaittinen
  1 sibling, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:10 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:27PM +0100, Ahmad Fatoum wrote:
> hw_protection_shutdown() will kick off an orderly shutdown and if that
> takes longer than a configurable amount of time, an emergency shutdown
> will occur.
> 
> Recently, hw_protection_reboot() was added for those systems that don't
> implement a proper shutdown and are better served by rebooting and
> having the boot firmware worry about doing something about the critical
> condition.
> 
> On timeout of the orderly reboot of hw_protection_reboot(), the system
> would go into shutdown, instead of reboot. This is not a good idea, as
> going into shutdown was explicitly not asked for.
> 
> Fix this by always doing an emergency reboot if hw_protection_reboot()
> is called and the orderly reboot takes too long.
> 
> Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code
  2025-01-13 16:25 ` [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code Ahmad Fatoum
@ 2025-01-20  7:11   ` Tzung-Bi Shih
  2025-01-21  9:29     ` Ahmad Fatoum
  2025-01-22 11:01   ` Matti Vaittinen
  1 sibling, 1 reply; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:11 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:28PM +0100, Ahmad Fatoum wrote:
> Originally, the thermal framework's only hardware protection action was
> to trigger a shutdown. This has been changed a little over a year ago to
> also support rebooting as alternative hardware protection action.
> 
> Update the documentation to reflect this.
> 
> Fixes: 62e79e38b257 ("thermal/thermal_of: Allow rebooting after critical temp")
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

With a possible typo,
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

> +At first, the kernel will attempt an orderly power-off or reboot, but
> +accepts a delay after which it proceeds to do a forced power-off or
> +reboot, respectively. If this fails, ``emergency restart()`` is invoked
                                                   ^
s/ /_/?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 04/12] reboot: describe do_kernel_restart's cmd argument in kernel-doc
  2025-01-13 16:25 ` [PATCH v2 04/12] reboot: describe do_kernel_restart's cmd argument in kernel-doc Ahmad Fatoum
@ 2025-01-20  7:11   ` Tzung-Bi Shih
  0 siblings, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:11 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:29PM +0100, Ahmad Fatoum wrote:
> A W=1 build rightfully complains about the function's kernel-doc
> being incomplete.
> 
> Describe its single parameter to fix this.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 05/12] reboot: rename now misleading __hw_protection_shutdown symbols
  2025-01-13 16:25 ` [PATCH v2 05/12] reboot: rename now misleading __hw_protection_shutdown symbols Ahmad Fatoum
@ 2025-01-20  7:11   ` Tzung-Bi Shih
  0 siblings, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:11 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:30PM +0100, Ahmad Fatoum wrote:
> The __hw_protection_shutdown function name has become misleading
> since it can cause either a shutdown (poweroff) or a reboot
> depending on its argument.
> 
> To avoid further confusion, let's rename it, so it doesn't suggest
> that a poweroff is all it can do.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 06/12] reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown
  2025-01-13 16:25 ` [PATCH v2 06/12] reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown Ahmad Fatoum
@ 2025-01-20  7:11   ` Tzung-Bi Shih
  0 siblings, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:11 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:31PM +0100, Ahmad Fatoum wrote:
> It currently depends on the caller, whether we attempt a hardware
> protection shutdown (poweroff) or a reboot. A follow-up commit will make
> this partially user-configurable, so it's a good idea to have the
> emergency message clearly state whether the kernel is going for a reboot
> or a shutdown.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 07/12] reboot: add support for configuring emergency hardware protection action
  2025-01-13 16:25 ` [PATCH v2 07/12] reboot: add support for configuring emergency hardware protection action Ahmad Fatoum
@ 2025-01-20  7:12   ` Tzung-Bi Shih
  2025-01-21  9:35     ` Ahmad Fatoum
  0 siblings, 1 reply; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:12 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Matteo Croce

On Mon, Jan 13, 2025 at 05:25:32PM +0100, Ahmad Fatoum wrote:
> We currently leave the decision of whether to shutdown or reboot to
> protect hardware in an emergency situation to the individual drivers.
> 
> This works out in some cases, where the driver detecting the critical
> failure has inside knowledge: It binds to the system management controller
> for example or is guided by hardware description that defines what to do.
> 
> In the general case, however, the driver detecting the issue can't know
> what the appropriate course of action is and shouldn't be dictating the
> policy of dealing with it.
> 
> Therefore, add a global hw_protection toggle that allows the user to
> specify whether shutdown or reboot should be the default action when the
> driver doesn't set policy.
> 
> This introduces no functional change yet as hw_protection_trigger() has
> no callers, but these will be added in subsequent commits.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

With a minor comment,
Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

> diff --git a/Documentation/ABI/testing/sysfs-kernel-reboot b/Documentation/ABI/testing/sysfs-kernel-reboot
> index 837330fb251134ffdf29cd68f0b2a845b088e5a0..133f54707d533665c68a5946394540ec50b149e5 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-reboot
> +++ b/Documentation/ABI/testing/sysfs-kernel-reboot
> @@ -30,3 +30,11 @@ KernelVersion:	5.11
>  Contact:	Matteo Croce <mcroce@microsoft.com>
>  Description:	Don't wait for any other CPUs on reboot and
>  		avoid anything that could hang.
> +
> +What:		/sys/kernel/reboot/hw_protection
> +Date:		Feb 2025
> +KernelVersion:	6.14

The info might need to be adjusted if the series would be for 6.15. 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 08/12] regulator: allow user configuration of hardware protection action
  2025-01-13 16:25 ` [PATCH v2 08/12] regulator: allow user configuration of " Ahmad Fatoum
@ 2025-01-20  7:12   ` Tzung-Bi Shih
  2025-01-22 11:18   ` Matti Vaittinen
  1 sibling, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:12 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:33PM +0100, Ahmad Fatoum wrote:
> When the core detects permanent regulator hardware failure or imminent
> power failure of a critical supply, it will call hw_protection_shutdown
> in an attempt to do a limited orderly shutdown followed by powering off
> the system.
> 
> This doesn't work out well for many unattended embedded systems that don't
> have support for shutdown and that power on automatically when power is
> supplied:
> 
>   - A brief power cycle gets detected by the driver
>   - The kernel powers down the system and SoC goes into shutdown mode
>   - Power is restored
>   - The system remains oblivious to the restored power
>   - System needs to be manually power cycled for a duration long enough
>     to drain the capacitors
> 
> Allow users to fix this by calling the newly introduced
> hw_protection_trigger() instead: This way the hw_protection commandline
> or sysfs parameter is used to dictate the policy of dealing with the
> regulator fault.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 09/12] platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal
  2025-01-13 16:25 ` [PATCH v2 09/12] platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal Ahmad Fatoum
@ 2025-01-20  7:12   ` Tzung-Bi Shih
  0 siblings, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:12 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:34PM +0100, Ahmad Fatoum wrote:
> In the general case, a driver doesn't know which of system shutdown or
> reboot is the better action to take to protect hardware in an emergency
> situation. For this reason, hw_protection_shutdown is going to be
> removed in favor of hw_protection_trigger, which defaults to shutdown,
> but may be configured at kernel runtime to be a reboot instead.
> 
> The ChromeOS EC situation is different as we do know that shutdown is
> the correct action as the EC is programmed to force reset after the
> short period, thus replace hw_protection_shutdown with
> __hw_protection_trigger with HWPROT_ACT_SHUTDOWN as argument to
> maintain the same behavior.
> 
> No functional change.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Acked-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 11/12] thermal: core: allow user configuration of hardware protection action
  2025-01-13 16:25 ` [PATCH v2 11/12] thermal: core: allow user configuration of hardware protection action Ahmad Fatoum
@ 2025-01-20  7:12   ` Tzung-Bi Shih
  0 siblings, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:12 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:36PM +0100, Ahmad Fatoum wrote:
> In the general case, we don't know which of system shutdown or
> reboot is the better action to take to protect hardware in an emergency
> situation. We thus allow the policy to come from the device-tree in the
> form of an optional critical-action OF property, but so far there was no
> way for the end user to configure this.
> 
> With recent addition of the hw_protection parameter, the user can now
> choose a default action for the case, where the driver isn't fully sure
> what's the better course of action.
> 
> Let's make use of this by passing HWPROT_ACT_DEFAULT in absence of the
> critical-action OF property.
> 
> As HWPROT_ACT_DEFAULT is shutdown by default, this introduces no
> functional change for users, unless they start using the new parameter.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 12/12] reboot: retire hw_protection_reboot and hw_protection_shutdown helpers
  2025-01-13 16:25 ` [PATCH v2 12/12] reboot: retire hw_protection_reboot and hw_protection_shutdown helpers Ahmad Fatoum
@ 2025-01-20  7:13   ` Tzung-Bi Shih
  0 siblings, 0 replies; 33+ messages in thread
From: Tzung-Bi Shih @ 2025-01-20  7:13 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On Mon, Jan 13, 2025 at 05:25:37PM +0100, Ahmad Fatoum wrote:
> The hw_protection_reboot and hw_protection_shutdown functions mix
> mechanism with policy: They let the driver requesting an emergency
> action for hardware protection also decide how to deal with it.
> 
> This is inadequate in the general case as a driver reporting e.g. an
> imminent power failure can't know whether a shutdown or a reboot would
> be more appropriate for a given hardware platform.
> 
> With the addition of the hw_protection parameter, it's now possible to
> configure at runtime the default emergency action and drivers are
> expected to use hw_protection_trigger to have this parameter dictate
> policy.
> 
> As no current users of either hw_protection_shutdown or
> hw_protection_shutdown helpers remain, remove them, as not to tempt
> driver authors to call them.
> 
> Existing users now either defer to hw_protection_trigger or call
> __hw_protection_trigger with a suitable argument directly when
> they have inside knowledge on whether a reboot or shutdown would
> be more appropriate.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum
  2025-01-20  7:10   ` Tzung-Bi Shih
@ 2025-01-21  9:27     ` Ahmad Fatoum
  0 siblings, 0 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-21  9:27 UTC (permalink / raw)
  To: Tzung-Bi Shih
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

Hello,

On 20.01.25 08:10, Tzung-Bi Shih wrote:
> On Mon, Jan 13, 2025 at 05:25:26PM +0100, Ahmad Fatoum wrote:
>> Currently __hw_protection_shutdown() either reboots or shuts down the
>> system according to its shutdown argument.
>>
>> To make the logic easier to follow, both inside __hw_protection_shutdown
>> and at caller sites, lets replace the bool parameter with an enum.
>>
>> This will be extra useful, when in a later commit, a third action is
>> added to the enumeration.
>>
>> No functional change.
>>
>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> 
> With a minor question,
> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

Thanks for your review.

>> @@ -1009,10 +1007,10 @@ void __hw_protection_shutdown(const char *reason, int ms_until_forced, bool shut
>>  	 * orderly_poweroff failure
>>  	 */
>>  	hw_failure_emergency_poweroff(ms_until_forced);
>> -	if (shutdown)
>> -		orderly_poweroff(true);
>> -	else
>> +	if (action == HWPROT_ACT_REBOOT)
>>  		orderly_reboot();
>> +	else
>> +		orderly_poweroff(true);
> 
> It probably doesn't really matter.  Does it intend to change the branch
> order?  As s/shutdown/action == HWPROT_ACT_SHUTDOWN/ should be more
> intuitive for the hunk to me.

My thinking was that having poweroff in the else branch underlines that
it's the default, i.e. either reboot was explicitly asked for or we fall
back to the poweroff default.

I see that this is subjective. I can change it for v3 if that's preferred.

Cheers,
Ahmad

> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code
  2025-01-20  7:11   ` Tzung-Bi Shih
@ 2025-01-21  9:29     ` Ahmad Fatoum
  0 siblings, 0 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-21  9:29 UTC (permalink / raw)
  To: Tzung-Bi Shih
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On 20.01.25 08:11, Tzung-Bi Shih wrote:
> On Mon, Jan 13, 2025 at 05:25:28PM +0100, Ahmad Fatoum wrote:
>> Originally, the thermal framework's only hardware protection action was
>> to trigger a shutdown. This has been changed a little over a year ago to
>> also support rebooting as alternative hardware protection action.
>>
>> Update the documentation to reflect this.
>>
>> Fixes: 62e79e38b257 ("thermal/thermal_of: Allow rebooting after critical temp")
>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> 
> With a possible typo,
> Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>

Thanks.

> 
>> +At first, the kernel will attempt an orderly power-off or reboot, but
>> +accepts a delay after which it proceeds to do a forced power-off or
>> +reboot, respectively. If this fails, ``emergency restart()`` is invoked
>                                                    ^
> s/ /_/?

Typo was there before and didn't notice when reworking the paragraph.
Will fix for v3.

Cheers,
Ahmad

-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 07/12] reboot: add support for configuring emergency hardware protection action
  2025-01-20  7:12   ` Tzung-Bi Shih
@ 2025-01-21  9:35     ` Ahmad Fatoum
  0 siblings, 0 replies; 33+ messages in thread
From: Ahmad Fatoum @ 2025-01-21  9:35 UTC (permalink / raw)
  To: Tzung-Bi Shih
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Matti Vaittinen, Benson Leung,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel, Matteo Croce

Hi,

On 20.01.25 08:12, Tzung-Bi Shih wrote:
>> +What:		/sys/kernel/reboot/hw_protection
>> +Date:		Feb 2025
>> +KernelVersion:	6.14
> 
> The info might need to be adjusted if the series would be for 6.15. 

I was being optimistic, but ye, now v6.15 would be earliest.
I will wait a bit to see if there's more feedback and then send v3
with the suggested changes and tags.

Thank you for taking the time,
Ahmad


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code
  2025-01-13 16:25 ` [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code Ahmad Fatoum
  2025-01-20  7:11   ` Tzung-Bi Shih
@ 2025-01-22 11:01   ` Matti Vaittinen
  1 sibling, 0 replies; 33+ messages in thread
From: Matti Vaittinen @ 2025-01-22 11:01 UTC (permalink / raw)
  To: Ahmad Fatoum
  Cc: Andrew Morton, Daniel Lezcano, Fabio Estevam, Rafael J. Wysocki,
	Zhang Rui, Lukasz Luba, Jonathan Corbet, Serge Hallyn,
	Liam Girdwood, Mark Brown, Benson Leung, Tzung-Bi Shih,
	Guenter Roeck, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

ma 13.1.2025 klo 18.26 Ahmad Fatoum (a.fatoum@pengutronix.de) kirjoitti:
>
> Originally, the thermal framework's only hardware protection action was
> to trigger a shutdown. This has been changed a little over a year ago to
> also support rebooting as alternative hardware protection action.
>
> Update the documentation to reflect this.
>
> Fixes: 62e79e38b257 ("thermal/thermal_of: Allow rebooting after critical temp")
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>

Matti Vaittinen
Linux kernel developer at ROHM Semiconductors
Oulu Finland

~~ When things go utterly wrong vim users can always type :help! ~~

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 08/12] regulator: allow user configuration of hardware protection action
  2025-01-13 16:25 ` [PATCH v2 08/12] regulator: allow user configuration of " Ahmad Fatoum
  2025-01-20  7:12   ` Tzung-Bi Shih
@ 2025-01-22 11:18   ` Matti Vaittinen
  1 sibling, 0 replies; 33+ messages in thread
From: Matti Vaittinen @ 2025-01-22 11:18 UTC (permalink / raw)
  To: Ahmad Fatoum, Andrew Morton, Daniel Lezcano, Fabio Estevam,
	Rafael J. Wysocki, Zhang Rui, Lukasz Luba, Jonathan Corbet,
	Serge Hallyn, Liam Girdwood, Mark Brown, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On 13/01/2025 18:25, Ahmad Fatoum wrote:
> When the core detects permanent regulator hardware failure or imminent
> power failure of a critical supply, it will call hw_protection_shutdown
> in an attempt to do a limited orderly shutdown followed by powering off
> the system.
> 
> This doesn't work out well for many unattended embedded systems that don't
> have support for shutdown and that power on automatically when power is
> supplied:
> 
>    - A brief power cycle gets detected by the driver
>    - The kernel powers down the system and SoC goes into shutdown mode
>    - Power is restored
>    - The system remains oblivious to the restored power
>    - System needs to be manually power cycled for a duration long enough
>      to drain the capacitors
> 
> Allow users to fix this by calling the newly introduced
> hw_protection_trigger() instead: This way the hw_protection commandline
> or sysfs parameter is used to dictate the policy of dealing with the
> regulator fault.
> 
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>

Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout
  2025-01-13 16:25 ` [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout Ahmad Fatoum
  2025-01-20  7:10   ` Tzung-Bi Shih
@ 2025-01-22 11:28   ` Matti Vaittinen
  2025-02-17 20:22     ` Ahmad Fatoum
  1 sibling, 1 reply; 33+ messages in thread
From: Matti Vaittinen @ 2025-01-22 11:28 UTC (permalink / raw)
  To: Ahmad Fatoum, Andrew Morton, Daniel Lezcano, Fabio Estevam,
	Rafael J. Wysocki, Zhang Rui, Lukasz Luba, Jonathan Corbet,
	Serge Hallyn, Liam Girdwood, Mark Brown, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On 13/01/2025 18:25, Ahmad Fatoum wrote:
> hw_protection_shutdown() will kick off an orderly shutdown and if that
> takes longer than a configurable amount of time, an emergency shutdown
> will occur.
> 
> Recently, hw_protection_reboot() was added for those systems that don't
> implement a proper shutdown and are better served by rebooting and
> having the boot firmware worry about doing something about the critical
> condition.
> 
> On timeout of the orderly reboot of hw_protection_reboot(), the system
> would go into shutdown, instead of reboot. This is not a good idea, as
> going into shutdown was explicitly not asked for.
> 
> Fix this by always doing an emergency reboot if hw_protection_reboot()
> is called and the orderly reboot takes too long.
> 
> Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
> ---
>   kernel/reboot.c | 70 ++++++++++++++++++++++++++++++++++++++++-----------------
>   1 file changed, 49 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/reboot.c b/kernel/reboot.c
> index 847ac5d17a659981c6765699eac323f5e87f48c1..222b63dfd31020d0e2bc1b1402dbfa82adc71990 100644
> --- a/kernel/reboot.c
> +++ b/kernel/reboot.c
> @@ -932,48 +932,76 @@ void orderly_reboot(void)
>   }
>   EXPORT_SYMBOL_GPL(orderly_reboot);
>   
> +static const char *hw_protection_action_str(enum hw_protection_action action)
> +{
> +	switch (action) {
> +	case HWPROT_ACT_SHUTDOWN:
> +		return "shutdown";
> +	case HWPROT_ACT_REBOOT:
> +		return "reboot";
> +	default:
> +		return "undefined";
> +	}
> +}
> +
> +static enum hw_protection_action hw_failure_emergency_action;

nit: Do we have a (theoretical) possibility that two emergency restarts 
get scheduled with different actions? Should the action be allocated 
(maybe not) for each caller, or should there be a check if an operation 
with conflicting action is already scheduled?

If this was already considered and thought it is not an issue:

Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>


Yours,
	-- Matti

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout
  2025-01-22 11:28   ` Matti Vaittinen
@ 2025-02-17 20:22     ` Ahmad Fatoum
  2025-02-18  6:45       ` Matti Vaittinen
  0 siblings, 1 reply; 33+ messages in thread
From: Ahmad Fatoum @ 2025-02-17 20:22 UTC (permalink / raw)
  To: Matti Vaittinen, Andrew Morton, Daniel Lezcano, Fabio Estevam,
	Rafael J. Wysocki, Zhang Rui, Lukasz Luba, Jonathan Corbet,
	Serge Hallyn, Liam Girdwood, Mark Brown, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

Hello Matti,

On 22.01.25 12:28, Matti Vaittinen wrote:
> On 13/01/2025 18:25, Ahmad Fatoum wrote:
>> hw_protection_shutdown() will kick off an orderly shutdown and if that
>> takes longer than a configurable amount of time, an emergency shutdown
>> will occur.
>>
>> Recently, hw_protection_reboot() was added for those systems that don't
>> implement a proper shutdown and are better served by rebooting and
>> having the boot firmware worry about doing something about the critical
>> condition.
>>
>> On timeout of the orderly reboot of hw_protection_reboot(), the system
>> would go into shutdown, instead of reboot. This is not a good idea, as
>> going into shutdown was explicitly not asked for.
>>
>> Fix this by always doing an emergency reboot if hw_protection_reboot()
>> is called and the orderly reboot takes too long.
>>
>> Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>> ---
>>   kernel/reboot.c | 70 ++++++++++++++++++++++++++++++++++++++++-----------------
>>   1 file changed, 49 insertions(+), 21 deletions(-)
>>
>> diff --git a/kernel/reboot.c b/kernel/reboot.c
>> index 847ac5d17a659981c6765699eac323f5e87f48c1..222b63dfd31020d0e2bc1b1402dbfa82adc71990 100644
>> --- a/kernel/reboot.c
>> +++ b/kernel/reboot.c
>> @@ -932,48 +932,76 @@ void orderly_reboot(void)
>>   }
>>   EXPORT_SYMBOL_GPL(orderly_reboot);
>>   +static const char *hw_protection_action_str(enum hw_protection_action action)
>> +{
>> +    switch (action) {
>> +    case HWPROT_ACT_SHUTDOWN:
>> +        return "shutdown";
>> +    case HWPROT_ACT_REBOOT:
>> +        return "reboot";
>> +    default:
>> +        return "undefined";
>> +    }
>> +}
>> +
>> +static enum hw_protection_action hw_failure_emergency_action;
> 
> nit: Do we have a (theoretical) possibility that two emergency restarts get scheduled with different actions? Should the action be allocated (maybe not) for each caller, or should there be a check if an operation with conflicting action is already scheduled?
> 
> If this was already considered and thought it is not an issue:
> 
> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>

__hw_protection_trigger (née __hw_protection_shutdown) has this at its start:

 static atomic_t allow_proceed = ATOMIC_INIT(1);

 /* Shutdown should be initiated only once. */
 if (!atomic_dec_and_test(&allow_proceed))
         return;

It's thus not possible to have a later emergency restart race against the first.

Thanks for your R-b,
Ahmad

> 
> 
> Yours,
>     -- Matti
> 


-- 
Pengutronix e.K.                           |                             |
Steuerwalder Str. 21                       | http://www.pengutronix.de/  |
31137 Hildesheim, Germany                  | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout
  2025-02-17 20:22     ` Ahmad Fatoum
@ 2025-02-18  6:45       ` Matti Vaittinen
  0 siblings, 0 replies; 33+ messages in thread
From: Matti Vaittinen @ 2025-02-18  6:45 UTC (permalink / raw)
  To: Ahmad Fatoum, Andrew Morton, Daniel Lezcano, Fabio Estevam,
	Rafael J. Wysocki, Zhang Rui, Lukasz Luba, Jonathan Corbet,
	Serge Hallyn, Liam Girdwood, Mark Brown, Benson Leung,
	Tzung-Bi Shih, Guenter Roeck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: linux-kernel, linux-pm, linux-doc, linux-security-module,
	chrome-platform, devicetree, kernel

On 17/02/2025 22:22, Ahmad Fatoum wrote:
> Hello Matti,
> 
> On 22.01.25 12:28, Matti Vaittinen wrote:
>> On 13/01/2025 18:25, Ahmad Fatoum wrote:
>>> hw_protection_shutdown() will kick off an orderly shutdown and if that
>>> takes longer than a configurable amount of time, an emergency shutdown
>>> will occur.
>>>
>>> Recently, hw_protection_reboot() was added for those systems that don't
>>> implement a proper shutdown and are better served by rebooting and
>>> having the boot firmware worry about doing something about the critical
>>> condition.
>>>
>>> On timeout of the orderly reboot of hw_protection_reboot(), the system
>>> would go into shutdown, instead of reboot. This is not a good idea, as
>>> going into shutdown was explicitly not asked for.
>>>
>>> Fix this by always doing an emergency reboot if hw_protection_reboot()
>>> is called and the orderly reboot takes too long.
>>>
>>> Fixes: 79fa723ba84c ("reboot: Introduce thermal_zone_device_critical_reboot()")
>>> Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
>>> ---
>>>    kernel/reboot.c | 70 ++++++++++++++++++++++++++++++++++++++++-----------------
>>>    1 file changed, 49 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/kernel/reboot.c b/kernel/reboot.c
>>> index 847ac5d17a659981c6765699eac323f5e87f48c1..222b63dfd31020d0e2bc1b1402dbfa82adc71990 100644
>>> --- a/kernel/reboot.c
>>> +++ b/kernel/reboot.c
>>> @@ -932,48 +932,76 @@ void orderly_reboot(void)
>>>    }
>>>    EXPORT_SYMBOL_GPL(orderly_reboot);
>>>    +static const char *hw_protection_action_str(enum hw_protection_action action)
>>> +{
>>> +    switch (action) {
>>> +    case HWPROT_ACT_SHUTDOWN:
>>> +        return "shutdown";
>>> +    case HWPROT_ACT_REBOOT:
>>> +        return "reboot";
>>> +    default:
>>> +        return "undefined";
>>> +    }
>>> +}
>>> +
>>> +static enum hw_protection_action hw_failure_emergency_action;
>>
>> nit: Do we have a (theoretical) possibility that two emergency restarts get scheduled with different actions? Should the action be allocated (maybe not) for each caller, or should there be a check if an operation with conflicting action is already scheduled?
>>
>> If this was already considered and thought it is not an issue:
>>
>> Reviewed-by: Matti Vaittinen <mazziesaccount@gmail.com>
> 
> __hw_protection_trigger (née __hw_protection_shutdown) has this at its start:
> 
>   static atomic_t allow_proceed = ATOMIC_INIT(1);
> 
>   /* Shutdown should be initiated only once. */
>   if (!atomic_dec_and_test(&allow_proceed))
>           return;
> 
> It's thus not possible to have a later emergency restart race against the first.
> 

Ah, indeed. I missed this. Thanks for the clarification! :)

Yours,
	-- Matti

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2025-02-18  6:45 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-13 16:25 [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Ahmad Fatoum
2025-01-13 16:25 ` [PATCH v2 01/12] reboot: replace __hw_protection_shutdown bool action parameter with an enum Ahmad Fatoum
2025-01-20  7:10   ` Tzung-Bi Shih
2025-01-21  9:27     ` Ahmad Fatoum
2025-01-13 16:25 ` [PATCH v2 02/12] reboot: reboot, not shutdown, on hw_protection_reboot timeout Ahmad Fatoum
2025-01-20  7:10   ` Tzung-Bi Shih
2025-01-22 11:28   ` Matti Vaittinen
2025-02-17 20:22     ` Ahmad Fatoum
2025-02-18  6:45       ` Matti Vaittinen
2025-01-13 16:25 ` [PATCH v2 03/12] docs: thermal: sync hardware protection doc with code Ahmad Fatoum
2025-01-20  7:11   ` Tzung-Bi Shih
2025-01-21  9:29     ` Ahmad Fatoum
2025-01-22 11:01   ` Matti Vaittinen
2025-01-13 16:25 ` [PATCH v2 04/12] reboot: describe do_kernel_restart's cmd argument in kernel-doc Ahmad Fatoum
2025-01-20  7:11   ` Tzung-Bi Shih
2025-01-13 16:25 ` [PATCH v2 05/12] reboot: rename now misleading __hw_protection_shutdown symbols Ahmad Fatoum
2025-01-20  7:11   ` Tzung-Bi Shih
2025-01-13 16:25 ` [PATCH v2 06/12] reboot: indicate whether it is a HARDWARE PROTECTION reboot or shutdown Ahmad Fatoum
2025-01-20  7:11   ` Tzung-Bi Shih
2025-01-13 16:25 ` [PATCH v2 07/12] reboot: add support for configuring emergency hardware protection action Ahmad Fatoum
2025-01-20  7:12   ` Tzung-Bi Shih
2025-01-21  9:35     ` Ahmad Fatoum
2025-01-13 16:25 ` [PATCH v2 08/12] regulator: allow user configuration of " Ahmad Fatoum
2025-01-20  7:12   ` Tzung-Bi Shih
2025-01-22 11:18   ` Matti Vaittinen
2025-01-13 16:25 ` [PATCH v2 09/12] platform/chrome: cros_ec_lpc: prepare for hw_protection_shutdown removal Ahmad Fatoum
2025-01-20  7:12   ` Tzung-Bi Shih
2025-01-13 16:25 ` [PATCH v2 10/12] dt-bindings: thermal: give OS some leeway in absence of critical-action Ahmad Fatoum
2025-01-13 16:25 ` [PATCH v2 11/12] thermal: core: allow user configuration of hardware protection action Ahmad Fatoum
2025-01-20  7:12   ` Tzung-Bi Shih
2025-01-13 16:25 ` [PATCH v2 12/12] reboot: retire hw_protection_reboot and hw_protection_shutdown helpers Ahmad Fatoum
2025-01-20  7:13   ` Tzung-Bi Shih
2025-01-14  0:33 ` [PATCH v2 00/12] reboot: support runtime configuration of emergency hw_protection action Andrew Morton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).