devicetree.vger.kernel.org archive mirror
* [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
@ 2025-03-09 12:13 John Madieu
  2025-03-09 12:13 ` [RFC PATCH 1/3] thermal/cpuplog_cooling: " John Madieu
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: John Madieu @ 2025-03-09 12:13 UTC
  To: geert+renesas, niklas.soderlund+renesas, conor+dt, krzk+dt, robh,
	rafael, daniel.lezcano
  Cc: magnus.damm, claudiu.beznea.uj, devicetree, john.madieu,
	rui.zhang, linux-kernel, linux-renesas-soc, biju.das.jz, linux-pm,
	John Madieu

This patch series introduces a new thermal cooling driver that implements CPU
hotplug-based thermal management. The driver dynamically takes CPUs offline
during thermal excursions to reduce power consumption and prevent overheating,
while maintaining system stability by always keeping the boot CPU online.

1- Problem Statement

Modern SoCs require robust thermal management to prevent overheating under heavy
workloads. Existing cooling mechanisms such as frequency scaling may not always
provide sufficient thermal relief, especially in multi-core systems where
per-core thermal contributions can be significant.

2- Solution Overview 

The driver:

 - Integrates with the Linux thermal framework as a cooling device  
 - Registers per-CPU cooling devices that respond to thermal trip points  
 - Uses CPU hotplug operations to reduce thermal load  
 - Never takes the boot CPU offline, regardless of which CPUs are listed in
 the cooling-device property, to preserve system stability
 - Tracks per-CPU cooling state and brings offlined CPUs back online on cleanup
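
The following is a minimal user-space sketch of the per-CPU state handling described above. It makes two assumptions. First, a simulated online mask stands in for cpu_online()/add_cpu()/remove_cpu(). Second, logical CPU 0 is the boot CPU. The sim_* names are hypothetical and are not part of the driver.

State 0 keeps the CPU online and state 1 takes it offline. For the boot CPU, the new state is only recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_SIM_CPUS 4

/* Simulated online mask standing in for the kernel's hotplug API. */
static bool sim_online[NR_SIM_CPUS] = { true, true, true, true };

/* Assumption: logical CPU 0 is the boot CPU. */
static const int sim_boot_cpu = 0;

struct sim_cooling_dev {
	int cpu;                  /* logical CPU this device controls */
	unsigned long cur_state;  /* 0 = online, 1 = offline */
};

/* Returns 0 on success, -1 if state exceeds max_state (1). */
static int sim_set_cur_state(struct sim_cooling_dev *cd, unsigned long state)
{
	assert(cd->cpu >= 0 && cd->cpu < NR_SIM_CPUS);

	if (state > 1)
		return -1;
	if (cd->cur_state == state)
		return 0;               /* act only on state changes */
	cd->cur_state = state;
	if (cd->cpu == sim_boot_cpu)
		return 0;               /* never offline the boot CPU */

	sim_online[cd->cpu] = (state == 0);
	printf("CPU%d %s\n", cd->cpu, state ? "offline" : "online");
	return 0;
}
```

The range check on state is an assumption of this sketch. Unlike add_cpu()/remove_cpu(), the simulated transitions here cannot fail.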

Key Features:   

 - Dynamic CPU online/offline management based on thermal thresholds  
 - Device tree-based configuration via thermal zones and trip points  
 - Hysteresis support through thermal governor interactions  
 - Safe handling of CPU state transitions during module load/unload  
 - Compatibility with existing thermal management frameworks
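
To make the hysteresis behaviour concrete, the sketch below models a single trip point in simplified form. The trip is crossed when the temperature reaches the trip temperature. It is cleared only once the temperature falls below the trip temperature minus the hysteresis. The sim_trip names are hypothetical.

With the 'plug' trip from the example below (110000 and 1000 millidegrees), this model takes CPUs offline at 110 °C and lets them come back only below 109 °C.

```c
#include <assert.h>
#include <stdbool.h>

/* All values in millidegrees Celsius, as in the device tree. */
struct sim_trip {
	int temperature;
	int hysteresis;
	bool tripped;
};

/* Feed one temperature sample; returns whether the trip is active. */
static bool sim_trip_update(struct sim_trip *t, int temp)
{
	assert(t->hysteresis >= 0);

	if (!t->tripped && temp >= t->temperature)
		t->tripped = true;          /* crossed on the way up */
	else if (t->tripped && temp < t->temperature - t->hysteresis)
		t->tripped = false;         /* cleared on the way down */
	return t->tripped;
}
```

Between 109000 and 109999 the trip keeps its previous state. This is what keeps CPUs from bouncing online and offline around 110 °C.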

Testing    

 - Verified on Renesas RZ/G3E platforms with multi-core CPU configurations  
 - Validated thermal response using emulated temperatures (emul_temp)
 - Confirmed proper interaction with other cooling devices
 - Verified support for 'plug' type trace events
 - Tested with step_wise governor
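
The emulated-temperature test can be scripted with a small helper like the one below. This sketch rests on several assumptions:

 - CONFIG_THERMAL_EMULATION is enabled and the helper runs as root.
 - The zone path (e.g. /sys/class/thermal/thermal_zone0/emul_temp) is board-specific.
 - write_emul_temp() is a hypothetical helper, not part of this series.

Values are in millidegrees Celsius, and writing 0 disables emulation.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an emulated temperature (millidegrees Celsius) to a thermal
 * zone's emul_temp attribute. Returns 0 on success, -1 on error.
 */
static int write_emul_temp(const char *path, long millicelsius)
{
	FILE *f;
	int ok;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%ld\n", millicelsius) > 0;
	if (fclose(f) != 0)   /* sysfs write errors may surface on close */
		ok = 0;
	return ok ? 0 : -1;
}
```

Writing 111000 to the CPU zone should trigger the 'plug' trip from the example below. Writing 0 afterwards returns the zone to its real sensor reading.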

Since the 'hot' trip type is already used for user-space notification, I've
chosen 'plug' for this new type; suggestions are welcome. Here is an example
of a thermal zone that uses a 'plug' trip point:

```
thermal-zones {
	cpu-thermal {
		polling-delay = <1000>;
		polling-delay-passive = <250>;
		thermal-sensors = <&tsu>;

		cooling-maps {
			map0 {
				trip = <&target>;
				cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
				contribution = <1024>;
			};

			map1 {
				trip = <&trip_emergency>;
				cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
				contribution = <1024>;
			};

		};

		trips {
			target: trip-point {
				temperature = <95000>;
				hysteresis = <1000>;
				type = "passive";
			};

			trip_emergency: emergency {
				temperature = <110000>;
				hysteresis = <1000>;
				type = "plug";
			};

			sensor_crit: sensor-crit {
				temperature = <120000>;
				hysteresis = <1000>;
				type = "critical";
			};
		};
	};
};
```
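
The string in each trip's 'type' property is matched against a table of trip type names. The sketch below is a simplified user-space lookup in the spirit of the trip_types[] table that this series extends in thermal_of.c, with 'plug' inserted between 'hot' and 'critical'. The sim_* names are illustrative stand-ins for enum thermal_trip_type, not kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same order as the patched enum in include/uapi/linux/thermal.h. */
enum sim_trip_type {
	SIM_TRIP_ACTIVE = 0,
	SIM_TRIP_PASSIVE,
	SIM_TRIP_HOT,
	SIM_TRIP_PLUG,
	SIM_TRIP_CRITICAL,
};

static const char *const sim_trip_types[] = {
	[SIM_TRIP_ACTIVE]   = "active",
	[SIM_TRIP_PASSIVE]  = "passive",
	[SIM_TRIP_HOT]      = "hot",
	[SIM_TRIP_PLUG]     = "plug",
	[SIM_TRIP_CRITICAL] = "critical",
};

/* Return the enum value for a DT "type" string, or -1 if unknown. */
static int sim_trip_type_from_string(const char *s)
{
	size_t i;

	assert(s != NULL);
	for (i = 0; i < sizeof(sim_trip_types) / sizeof(sim_trip_types[0]); i++)
		if (strcmp(s, sim_trip_types[i]) == 0)
			return (int)i;
	return -1;
}
```

With this ordering, SIM_TRIP_CRITICAL has the value 4 rather than 3, as does THERMAL_TRIP_CRITICAL in the patched uapi header.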

Dependencies    

 - Requires standard thermal framework components (CONFIG_THERMAL)  
 - Depends on CPU hotplug support (CONFIG_HOTPLUG_CPU)  
 - Assumes device tree contains appropriate thermal zone definitions
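
Each cooling-device entry in such a zone is a phandle followed by the minimum and maximum cooling state. For example, <&cpu1 0 1> lets the driver use states 0 (online) and 1 (offline), which matches its max_state of 1. In the kernel, of_parse_phandle_with_args() splits these entries using each target's #cooling-cells.

The sketch below does the same on a flattened cell array, assuming #cooling-cells = <2> for every entry. The phandle values and sim_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sim_cooling_spec {
	uint32_t phandle;
	uint32_t min_state;
	uint32_t max_state;
};

/*
 * Split a flattened "cooling-device" property into (phandle, min, max)
 * entries. Returns the number of entries, or -1 if the cell count is
 * not a multiple of 3 or out[] is too small.
 */
static int sim_parse_cooling_devices(const uint32_t *cells, size_t ncells,
				     struct sim_cooling_spec *out,
				     size_t max_out)
{
	size_t i, n = 0;

	assert(cells != NULL || ncells == 0);
	if (ncells % 3 != 0 || ncells / 3 > max_out)
		return -1;
	for (i = 0; i < ncells; i += 3, n++) {
		out[n].phandle   = cells[i];
		out[n].min_state = cells[i + 1];
		out[n].max_state = cells[i + 2];
	}
	return (int)n;
}
```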

This series also depends on [1], specifically on patch 6/7
("arm64: dts: renesas: r9a09g047: Add TSU node").


3- Notes for Reviewers

 - Focus areas: Thermal framework integration, CPU state management, and error handling  
 - Feedback on device tree binding requirements is particularly welcome  
 - Suggestions for improving interaction with other governors are appreciated

I look forward to your feedback and guidance on this contribution.

[1] https://patchwork.kernel.org/project/linux-clk/cover/20250227122453.30480-1-john.madieu.xa@bp.renesas.com/

Regards,
John


John Madieu (3):
  thermal/cpuplog_cooling: Add CPU hotplug cooling driver
  tmon: Add support for THERMAL_TRIP_PLUG type
  arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point

 arch/arm64/boot/dts/renesas/r9a09g047.dtsi |  13 +
 drivers/thermal/Kconfig                    |  12 +
 drivers/thermal/Makefile                   |   1 +
 drivers/thermal/cpuplug_cooling.c          | 363 +++++++++++++++++++++
 drivers/thermal/thermal_of.c               |   1 +
 drivers/thermal/thermal_trace.h            |   2 +
 drivers/thermal/thermal_trip.c             |   1 +
 include/uapi/linux/thermal.h               |   1 +
 tools/thermal/tmon/tmon.h                  |   1 +
 tools/thermal/tmon/tui.c                   |   3 +-
 10 files changed, 397 insertions(+), 1 deletion(-)
 create mode 100644 drivers/thermal/cpuplug_cooling.c

-- 
2.25.1


* [RFC PATCH 1/3] thermal/cpuplog_cooling: Add CPU hotplug cooling driver
  2025-03-09 12:13 [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver John Madieu
@ 2025-03-09 12:13 ` John Madieu
  2025-03-11  7:30   ` Krzysztof Kozlowski
  2025-03-11  8:28   ` Geert Uytterhoeven
  2025-03-09 12:13 ` [RFC PATCH 2/3] tmon: Add support for THERMAL_TRIP_PLUG type John Madieu
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 14+ messages in thread
From: John Madieu @ 2025-03-09 12:13 UTC
  To: geert+renesas, niklas.soderlund+renesas, conor+dt, krzk+dt, robh,
	rafael, daniel.lezcano
  Cc: magnus.damm, claudiu.beznea.uj, devicetree, john.madieu,
	rui.zhang, linux-kernel, linux-renesas-soc, biju.das.jz, linux-pm,
	John Madieu

Add a thermal cooling mechanism that dynamically manages CPU online/offline
states to prevent overheating. It registers per-CPU cooling devices with the
Linux thermal framework, each of which can take its CPU offline when a thermal
threshold is exceeded.

Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
---
 drivers/thermal/Kconfig           |  12 +
 drivers/thermal/Makefile          |   1 +
 drivers/thermal/cpuplug_cooling.c | 363 ++++++++++++++++++++++++++++++
 drivers/thermal/thermal_of.c      |   1 +
 drivers/thermal/thermal_trace.h   |   2 +
 drivers/thermal/thermal_trip.c    |   1 +
 include/uapi/linux/thermal.h      |   1 +
 7 files changed, 381 insertions(+)
 create mode 100644 drivers/thermal/cpuplug_cooling.c

diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index d3f9686e26e7..6b0687f0d635 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -202,6 +202,18 @@ config CPU_IDLE_THERMAL
 	  This implements the CPU cooling mechanism through
 	  idle injection. This will throttle the CPU by injecting
 	  idle cycle.
+
+config CPU_HOTPLUG_THERMAL
+	bool "CPU hotplug cooling device"
+	depends on THERMAL
+	depends on HOTPLUG_CPU
+	help
+	  Enable this to manage platform thermals using CPU hotplug.
+	  This can offline CPUs when a temperature threshold is exceeded and
+	  bring them back online when it drops below the reset temperature.
+	  The boot CPU is never offlined.
+
+	  If in doubt, say N.
 endif
 
 config DEVFREQ_THERMAL
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 9abf43a74f2b..7b3648daabd2 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -28,6 +28,7 @@ thermal_sys-$(CONFIG_THERMAL_GOV_POWER_ALLOCATOR)	+= gov_power_allocator.o
 # cpufreq cooling
 thermal_sys-$(CONFIG_CPU_FREQ_THERMAL)	+= cpufreq_cooling.o
 thermal_sys-$(CONFIG_CPU_IDLE_THERMAL)	+= cpuidle_cooling.o
+thermal_sys-$(CONFIG_CPU_HOTPLUG_THERMAL)	+= cpuplug_cooling.o
 
 # devfreq cooling
 thermal_sys-$(CONFIG_DEVFREQ_THERMAL) += devfreq_cooling.o
diff --git a/drivers/thermal/cpuplug_cooling.c b/drivers/thermal/cpuplug_cooling.c
new file mode 100644
index 000000000000..1f62325f0665
--- /dev/null
+++ b/drivers/thermal/cpuplug_cooling.c
@@ -0,0 +1,363 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * CPU Hotplug Thermal Cooling Device
+ *
+ * Copyright (C) 2025 Renesas Electronics Corporation
+ */
+#define pr_fmt(fmt) "cpu-hotplug-thermal: " fmt
+
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/thermal.h>
+#include <linux/types.h>
+
+#define COOLING_DEVICE_NAME "cpu-hotplug"
+
+/* CPU Hotplug cooling device private data structure */
+struct cpu_hotplug_cooling_device {
+	struct thermal_cooling_device *cdev;
+	int cpu_id;
+	unsigned long cur_state;
+	bool cpu_online;
+	struct list_head node;
+};
+
+static LIST_HEAD(cooling_devices);
+static DEFINE_MUTEX(cooling_list_lock);
+
+/* Track which CPUs already have cooling devices */
+static DECLARE_BITMAP(cpu_cooling_registered, NR_CPUS);
+
+static inline bool is_boot_cpu(unsigned int cpu)
+{
+	return cpu == cpumask_first(cpu_online_mask);
+}
+
+static int cpu_hotplug_get_max_state(struct thermal_cooling_device *cdev,
+				     unsigned long *state)
+{
+	*state = 1; /* We only have two states: on/off */
+	return 0;
+}
+
+/* Get current cooling state */
+static int cpu_hotplug_get_cur_state(struct thermal_cooling_device *cdev,
+				     unsigned long *state)
+{
+	struct cpu_hotplug_cooling_device *hotplug_cdev = cdev->devdata;
+
+	*state = hotplug_cdev->cur_state;
+	return 0;
+}
+
+static int cpu_hotplug_set_cur_state(struct thermal_cooling_device *cdev,
+				     unsigned long state)
+{
+	struct cpu_hotplug_cooling_device *hotplug_cdev = cdev->devdata;
+	int cpu, ret = 0;
+
+	/* Only take action if state has changed */
+	if (hotplug_cdev->cur_state == state)
+		return 0;
+
+	/* Store the current state */
+	hotplug_cdev->cur_state = state;
+	cpu = hotplug_cdev->cpu_id;
+
+	/* Skip if it's the boot CPU */
+	if (is_boot_cpu(cpu))
+		return 0;
+
+	if (state == 0) {
+		/* Cooling off - bring CPU online if it's offline */
+		if (!cpu_online(cpu)) {
+			pr_info("CPU%d coming back online\n", cpu);
+			ret = add_cpu(cpu);
+			if (ret)
+				pr_err("Failed to bring CPU%d online: %d\n", cpu, ret);
+			else
+				hotplug_cdev->cpu_online = true;
+		}
+	} else {
+		/* Cooling on - take CPU offline if it's online */
+		if (cpu_online(cpu)) {
+			pr_info("CPU%d going offline due to overheating\n", cpu);
+			ret = remove_cpu(cpu);
+			if (ret)
+				pr_err("Failed to offline CPU%d: %d\n", cpu, ret);
+			else
+				hotplug_cdev->cpu_online = false;
+		}
+	}
+
+	return 0;
+}
+
+static const struct thermal_cooling_device_ops cpu_hotplug_cooling_ops = {
+	.get_max_state = cpu_hotplug_get_max_state,
+	.get_cur_state = cpu_hotplug_get_cur_state,
+	.set_cur_state = cpu_hotplug_set_cur_state,
+};
+
+static int register_cpu_hotplug_cooling(struct device_node *cpu_node,
+					int cpu_id)
+{
+	struct cpu_hotplug_cooling_device *hotplug_cdev;
+	struct thermal_cooling_device *cdev;
+	char cooling_name[32];
+
+	/* Check if we already registered this CPU */
+	if (test_bit(cpu_id, cpu_cooling_registered)) {
+		pr_info("Cooling device already registered for CPU%d\n", cpu_id);
+		return 0;
+	}
+
+	/* Skip the boot CPU */
+	if (is_boot_cpu(cpu_id)) {
+		pr_info("Skipping boot CPU%d for hotplug cooling\n", cpu_id);
+		return 0;
+	}
+
+	hotplug_cdev = kzalloc(sizeof(*hotplug_cdev), GFP_KERNEL);
+	if (!hotplug_cdev) {
+		pr_err("Failed to allocate memory for cooling device\n");
+		return -ENOMEM;
+	}
+
+	/* Initialize cooling device */
+	hotplug_cdev->cpu_id = cpu_id;
+	hotplug_cdev->cur_state = 0;
+	hotplug_cdev->cpu_online = cpu_online(cpu_id);
+
+	/* Unique cooling device name */
+	snprintf(cooling_name, sizeof(cooling_name), "%s-%d",
+		 COOLING_DEVICE_NAME, hotplug_cdev->cpu_id);
+
+	/* Register cooling device with a unique name - using CPU node */
+	cdev = thermal_of_cooling_device_register(
+		cpu_node, cooling_name, hotplug_cdev, &cpu_hotplug_cooling_ops);
+	if (IS_ERR(cdev)) {
+		pr_err("Failed to register %s: %ld\n", cooling_name,
+		       PTR_ERR(cdev));
+		kfree(hotplug_cdev);
+		return PTR_ERR(cdev);
+	}
+
+	hotplug_cdev->cdev = cdev;
+
+	/* Mark this CPU as having a registered cooling device */
+	set_bit(cpu_id, cpu_cooling_registered);
+
+	/* Add to our list for cleanup later */
+	mutex_lock(&cooling_list_lock);
+	list_add(&hotplug_cdev->node, &cooling_devices);
+	mutex_unlock(&cooling_list_lock);
+
+	pr_info("Successfully registered %s for CPU%d\n", cooling_name,
+		hotplug_cdev->cpu_id);
+
+	return 0;
+}
+
+/* Cleanup all cooling devices */
+static void cleanup_cooling_devices(void)
+{
+	struct cpu_hotplug_cooling_device *hotplug_cdev, *next;
+
+	mutex_lock(&cooling_list_lock);
+	list_for_each_entry_safe(hotplug_cdev, next, &cooling_devices, node) {
+		pr_info("Unregistering cooling device for CPU%d\n",
+			hotplug_cdev->cpu_id);
+
+		/* Clear the registration bit */
+		clear_bit(hotplug_cdev->cpu_id, cpu_cooling_registered);
+
+		/* Remove from list */
+		list_del(&hotplug_cdev->node);
+
+		/* Unregister cooling device */
+		thermal_cooling_device_unregister(hotplug_cdev->cdev);
+
+		/* Make sure CPU is back online */
+		if (!hotplug_cdev->cpu_online) {
+			int cpu = hotplug_cdev->cpu_id;
+			if (!is_boot_cpu(cpu) && !cpu_online(cpu)) {
+				pr_info("Bringing CPU%d back online during module unload\n", cpu);
+				if (add_cpu(cpu))
+					pr_err("Failed to bring CPU%d online\n", cpu);
+			}
+		}
+
+		/* Free memory */
+		kfree(hotplug_cdev);
+	}
+	mutex_unlock(&cooling_list_lock);
+}
+
+/* Check if a trip point is of type "plug" */
+static bool is_plug_trip_point(struct device_node *trip_node)
+{
+	const char *trip_type_str;
+
+	if (!trip_node) {
+		pr_err("Trip node is NULL\n");
+		return false;
+	}
+
+	if (of_property_read_string(trip_node, "type", &trip_type_str)) {
+		pr_err("Trip node missing 'type' property\n");
+		return false;
+	}
+
+	pr_info("Trip type: '%s'\n", trip_type_str);
+
+	if (strcmp(trip_type_str, "plug") != 0) {
+		pr_debug("Trip type is '%s', not 'plug' - skipping\n",
+			 trip_type_str);
+		return false;
+	}
+
+	return true;
+}
+
+/* Init function */
+static int __init cpu_hotplug_cooling_init(void)
+{
+	struct device_node *thermal_zones, *thermal_zone;
+	int ret = 0;
+	int count = 0;
+
+	bitmap_zero(cpu_cooling_registered, NR_CPUS);
+
+	thermal_zones = of_find_node_by_name(NULL, "thermal-zones");
+	if (!thermal_zones) {
+		pr_err("Missing thermal-zones node\n");
+		return -EINVAL;
+	}
+
+	/* Process each thermal zone */
+	for_each_child_of_node(thermal_zones, thermal_zone) {
+		struct device_node *trips, *trip;
+		struct device_node *maps, *map;
+		bool found_plug = false;
+
+		/* First find trips and get a specific plug trip */
+		trips = of_find_node_by_name(thermal_zone, "trips");
+		if (!trips)
+			continue;
+
+		/* Find the emergency trip with type="plug" */
+		for_each_child_of_node(trips, trip) {
+			if (is_plug_trip_point(trip)) {
+				found_plug = true;
+				break;
+			}
+		}
+
+		/* If we didn't find a plug trip, no need to process this zone */
+		if (!found_plug) {
+			of_node_put(trips);
+			continue;
+		}
+
+		maps = of_find_node_by_name(thermal_zone, "cooling-maps");
+		if (!maps) {
+			of_node_put(trip);
+			of_node_put(trips);
+			continue;
+		}
+
+		pr_info("Found 'plug' trip point, processing cooling devices\n");
+
+		/* Find the specific cooling map that references our plug trip */
+		for_each_child_of_node(maps, map) {
+			struct device_node *trip_ref;
+			struct of_phandle_args cooling_spec;
+			int idx = 0;
+
+			trip_ref = of_parse_phandle(map, "trip", 0);
+			if (!trip_ref || trip_ref != trip) {
+				if (trip_ref)
+					of_node_put(trip_ref);
+				continue;
+			}
+			of_node_put(trip_ref);
+
+			if (!of_find_property(map, "cooling-device", NULL)) {
+				pr_err("Missing cooling-device property\n");
+				continue;
+			}
+
+			/* Iterate through all cooling-device entries */
+			while (of_parse_phandle_with_args(
+				       map, "cooling-device",
+				       "#cooling-cells", idx++,
+				       &cooling_spec) == 0) {
+				struct device_node *cpu_node = cooling_spec.np;
+				int cpu;
+
+				if (!cpu_node) {
+					pr_err("CPU node at index %d is NULL\n",
+					       idx - 1);
+					continue;
+				}
+
+				cpu = of_cpu_node_to_id(cpu_node);
+				if (cpu < 0) {
+					pr_err("Failed to map CPU node %pOF to logical ID\n",
+					       cpu_node);
+					of_node_put(cpu_node);
+					continue;
+				}
+
+				if (cpu >= num_possible_cpus()) {
+					pr_err("Invalid CPU ID %d (max %d)\n",
+					       cpu, num_possible_cpus() - 1);
+					of_node_put(cpu_node);
+					continue;
+				}
+
+				pr_info("Processing cooling device for CPU%d\n", cpu);
+				ret = register_cpu_hotplug_cooling(cpu_node, cpu);
+				if (ret == 0)
+					count++;
+
+				of_node_put(cpu_node);
+			}
+			break; /* Only process the first map that references our trip */
+		}
+		of_node_put(maps);
+		of_node_put(trip);
+		of_node_put(trips);
+	}
+	of_node_put(thermal_zones);
+
+	if (count == 0) {
+		pr_err("No cooling devices registered\n");
+		return -ENODEV;
+	}
+
+	pr_info("CPU hotplug cooling driver initialized with %d devices\n", count);
+	return 0;
+}
+
+/* Exit function */
+static void __exit cpu_hotplug_cooling_exit(void)
+{
+	cleanup_cooling_devices();
+	pr_info("CPU hotplug cooling driver removed\n");
+}
+
+module_init(cpu_hotplug_cooling_init);
+module_exit(cpu_hotplug_cooling_exit);
+
+MODULE_AUTHOR("John Madieu <john.madieu.xa@bp.renesas.com>");
+MODULE_DESCRIPTION("CPU Hotplug Thermal Cooling Device");
+MODULE_LICENSE("GPL");
\ No newline at end of file
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 0eb92d57a1e2..41655af1e419 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -28,6 +28,7 @@ static const char * const trip_types[] = {
 	[THERMAL_TRIP_ACTIVE]	= "active",
 	[THERMAL_TRIP_PASSIVE]	= "passive",
 	[THERMAL_TRIP_HOT]	= "hot",
+	[THERMAL_TRIP_PLUG]	= "plug",
 	[THERMAL_TRIP_CRITICAL]	= "critical",
 };
 
diff --git a/drivers/thermal/thermal_trace.h b/drivers/thermal/thermal_trace.h
index df8f4edd6068..c26a3aa7de5f 100644
--- a/drivers/thermal/thermal_trace.h
+++ b/drivers/thermal/thermal_trace.h
@@ -12,6 +12,7 @@
 #include "thermal_core.h"
 
 TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
+TRACE_DEFINE_ENUM(THERMAL_TRIP_PLUG);
 TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
 TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
 TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
@@ -19,6 +20,7 @@ TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
 #define show_tzt_type(type)					\
 	__print_symbolic(type,					\
 			 { THERMAL_TRIP_CRITICAL, "CRITICAL"},	\
+			 { THERMAL_TRIP_PLUG,     "PLUG"},	\
 			 { THERMAL_TRIP_HOT,      "HOT"},	\
 			 { THERMAL_TRIP_PASSIVE,  "PASSIVE"},	\
 			 { THERMAL_TRIP_ACTIVE,   "ACTIVE"})
diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
index 4b8238468b53..373f6aaaf0da 100644
--- a/drivers/thermal/thermal_trip.c
+++ b/drivers/thermal/thermal_trip.c
@@ -13,6 +13,7 @@ static const char *trip_type_names[] = {
 	[THERMAL_TRIP_ACTIVE] = "active",
 	[THERMAL_TRIP_PASSIVE] = "passive",
 	[THERMAL_TRIP_HOT] = "hot",
+	[THERMAL_TRIP_PLUG]	= "plug",
 	[THERMAL_TRIP_CRITICAL] = "critical",
 };
 
diff --git a/include/uapi/linux/thermal.h b/include/uapi/linux/thermal.h
index 46a2633d33aa..5f76360c6f69 100644
--- a/include/uapi/linux/thermal.h
+++ b/include/uapi/linux/thermal.h
@@ -15,6 +15,7 @@ enum thermal_trip_type {
 	THERMAL_TRIP_ACTIVE = 0,
 	THERMAL_TRIP_PASSIVE,
 	THERMAL_TRIP_HOT,
+	THERMAL_TRIP_PLUG,
 	THERMAL_TRIP_CRITICAL,
 };
 
-- 
2.25.1


* [RFC PATCH 2/3] tmon: Add support for THERMAL_TRIP_PLUG type
  2025-03-09 12:13 [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver John Madieu
  2025-03-09 12:13 ` [RFC PATCH 1/3] thermal/cpuplog_cooling: " John Madieu
@ 2025-03-09 12:13 ` John Madieu
  2025-03-09 12:13 ` [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point John Madieu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: John Madieu @ 2025-03-09 12:13 UTC
  To: geert+renesas, niklas.soderlund+renesas, conor+dt, krzk+dt, robh,
	rafael, daniel.lezcano
  Cc: magnus.damm, claudiu.beznea.uj, devicetree, john.madieu,
	rui.zhang, linux-kernel, linux-renesas-soc, biju.das.jz, linux-pm,
	John Madieu

Extend tmon to handle the new THERMAL_TRIP_PLUG trip type:

- Update UI legend to show 'G=Plug' in status display
- Map trip type to 'G' character in trip_type_to_char()

Align tmon with kernel thermal framework extensions that support
CPU hotplug-based cooling through dedicated trip points.
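
Below is a small user-space sketch of the mapping described above. It goes from a sysfs trip_point_N_type string to tmon's trip enum, then to the legend character. The enum order follows the patched tmon.h and the characters follow trip_type_to_char(). The string-matching helper, the tmon_* names, and the '?' fallback are assumptions of this sketch rather than tmon code.

```c
#include <assert.h>
#include <string.h>

/* Same order as the patched enum trip_type in tmon.h. */
enum tmon_trip_type {
	TMON_TRIP_CRITICAL,
	TMON_TRIP_PLUG,
	TMON_TRIP_HOT,
	TMON_TRIP_PASSIVE,
	TMON_TRIP_ACTIVE,
};

/* Hypothetical helper: map a trip_point_N_type string to the enum, or -1. */
static int tmon_trip_type_from_string(const char *s)
{
	static const char *const names[] = {
		[TMON_TRIP_CRITICAL] = "critical",
		[TMON_TRIP_PLUG]     = "plug",
		[TMON_TRIP_HOT]      = "hot",
		[TMON_TRIP_PASSIVE]  = "passive",
		[TMON_TRIP_ACTIVE]   = "active",
	};
	int i;

	assert(s != NULL);
	for (i = 0; i < (int)(sizeof(names) / sizeof(names[0])); i++)
		if (strcmp(s, names[i]) == 0)
			return i;
	return -1;
}

/* Legend character for each trip type, as in the patched tui.c. */
static char tmon_trip_type_to_char(int type)
{
	switch (type) {
	case TMON_TRIP_CRITICAL: return 'C';
	case TMON_TRIP_HOT:      return 'H';
	case TMON_TRIP_PLUG:     return 'G';
	case TMON_TRIP_PASSIVE:  return 'P';
	case TMON_TRIP_ACTIVE:   return 'A';
	default:                 return '?';
	}
}
```

PLUG sits between CRITICAL and HOT in this enum. Any name table indexed by it, like names[] above, therefore has to list the strings in the same order.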

Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
---
 tools/thermal/tmon/tmon.h | 1 +
 tools/thermal/tmon/tui.c  | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/thermal/tmon/tmon.h b/tools/thermal/tmon/tmon.h
index 44d16d778f04..b9b413be5eac 100644
--- a/tools/thermal/tmon/tmon.h
+++ b/tools/thermal/tmon/tmon.h
@@ -57,6 +57,7 @@ struct cdev_info {
 
 enum trip_type {
 	THERMAL_TRIP_CRITICAL,
+	THERMAL_TRIP_PLUG,
 	THERMAL_TRIP_HOT,
 	THERMAL_TRIP_PASSIVE,
 	THERMAL_TRIP_ACTIVE,
diff --git a/tools/thermal/tmon/tui.c b/tools/thermal/tmon/tui.c
index 7f5dd2b87f15..8579b9a0d00d 100644
--- a/tools/thermal/tmon/tui.c
+++ b/tools/thermal/tmon/tui.c
@@ -307,7 +307,7 @@ void show_dialogue(void)
 	wattroff(w, A_BOLD);
 	/* print legend at the bottom line */
 	mvwprintw(w, rows - 2, 1,
-		"Legend: A=Active, P=Passive, C=Critical");
+		"Legend: A=Active, P=Passive, G=Plug, C=Critical");
 
 	wrefresh(dialogue_window);
 }
@@ -535,6 +535,7 @@ static char trip_type_to_char(int type)
 	switch (type) {
 	case THERMAL_TRIP_CRITICAL: return 'C';
 	case THERMAL_TRIP_HOT: return 'H';
+	case THERMAL_TRIP_PLUG: return 'G';
 	case THERMAL_TRIP_PASSIVE: return 'P';
 	case THERMAL_TRIP_ACTIVE: return 'A';
 	default:
-- 
2.25.1


* [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point
  2025-03-09 12:13 [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver John Madieu
  2025-03-09 12:13 ` [RFC PATCH 1/3] thermal/cpuplog_cooling: " John Madieu
  2025-03-09 12:13 ` [RFC PATCH 2/3] tmon: Add support for THERMAL_TRIP_PLUG type John Madieu
@ 2025-03-09 12:13 ` John Madieu
  2025-03-11 10:53   ` Christian Loehle
  2025-03-10 10:17 ` [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver Biju Das
  2025-03-15 14:40 ` Rafael J. Wysocki
  4 siblings, 1 reply; 14+ messages in thread
From: John Madieu @ 2025-03-09 12:13 UTC
  To: geert+renesas, niklas.soderlund+renesas, conor+dt, krzk+dt, robh,
	rafael, daniel.lezcano
  Cc: magnus.damm, claudiu.beznea.uj, devicetree, john.madieu,
	rui.zhang, linux-kernel, linux-renesas-soc, biju.das.jz, linux-pm,
	John Madieu

Add a CPU hotplug trip point that takes CPU1 and CPU2 offline when the
temperature exceeds 110°C.

Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
---
 arch/arm64/boot/dts/renesas/r9a09g047.dtsi | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
index 93b57d7ad7b9..06bd394582e2 100644
--- a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
+++ b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
@@ -533,6 +533,13 @@ map0 {
 							 <&cpu2 0 3>, <&cpu3 0 3>;
 					contribution = <1024>;
 				};
+
+				map1 {
+					trip = <&trip_emergency>;
+					cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
+					contribution = <1024>;
+				};
+
 			};
 
 			trips {
@@ -542,6 +549,12 @@ target: trip-point {
 					type = "passive";
 				};
 
+				trip_emergency: emergency {
+					temperature = <110000>;
+					hysteresis = <1000>;
+					type = "plug";
+				};
+
 				sensor_crit: sensor-crit {
 					temperature = <120000>;
 					hysteresis = <1000>;
-- 
2.25.1


* RE: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
  2025-03-09 12:13 [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver John Madieu
                   ` (2 preceding siblings ...)
  2025-03-09 12:13 ` [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point John Madieu
@ 2025-03-10 10:17 ` Biju Das
  2025-03-11 11:33   ` John Madieu
  2025-03-15 14:40 ` Rafael J. Wysocki
  4 siblings, 1 reply; 14+ messages in thread
From: Biju Das @ 2025-03-10 10:17 UTC
  To: John Madieu, geert+renesas@glider.be,
	niklas.soderlund+renesas@ragnatech.se, conor+dt@kernel.org,
	krzk+dt@kernel.org, robh@kernel.org, rafael@kernel.org,
	daniel.lezcano@linaro.org
  Cc: magnus.damm@gmail.com, Claudiu Beznea, devicetree@vger.kernel.org,
	john.madieu@gmail.com, rui.zhang@intel.com,
	linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	linux-pm@vger.kernel.org, John Madieu

Hi John,

Thanks for the patch.

> -----Original Message-----
> From: John Madieu <john.madieu.xa@bp.renesas.com>
> Sent: 09 March 2025 12:13
> Subject: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
> 
> [...]
> 
> ```
> thermal-zones {
> 	cpu-thermal {
> 		polling-delay = <1000>;
> 		polling-delay-passive = <250>;
> 		thermal-sensors = <&tsu>;
> 
> 		cooling-maps {
> 			map0 {
> 				trip = <&target>;
> 				cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
> 				contribution = <1024>;
> 			};

Would it not be possible to include cpu1 and cpu2 here as well, for DVFS passive cooling?

> 
> 			map1 {
> 				trip = <&trip_emergency>;
> 				cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> 				contribution = <1024>;
> 			};
> 
> 		};

Would it not be possible to make cpu3 a hotplug cooling device here as well?

Cheers,
Biju


* Re: [RFC PATCH 1/3] thermal/cpuplog_cooling: Add CPU hotplug cooling driver
  2025-03-09 12:13 ` [RFC PATCH 1/3] thermal/cpuplog_cooling: " John Madieu
@ 2025-03-11  7:30   ` Krzysztof Kozlowski
  2025-03-11 11:38     ` John Madieu
  2025-03-11  8:28   ` Geert Uytterhoeven
  1 sibling, 1 reply; 14+ messages in thread
From: Krzysztof Kozlowski @ 2025-03-11  7:30 UTC
  To: John Madieu, geert+renesas, niklas.soderlund+renesas, conor+dt,
	krzk+dt, robh, rafael, daniel.lezcano
  Cc: magnus.damm, claudiu.beznea.uj, devicetree, john.madieu,
	rui.zhang, linux-kernel, linux-renesas-soc, biju.das.jz, linux-pm

On 09/03/2025 13:13, John Madieu wrote:
> +
> +/* Check if a trip point is of type "plug" */
> +static bool is_plug_trip_point(struct device_node *trip_node)
> +{
> +	const char *trip_type_str;
> +
> +	if (!trip_node) {
> +		pr_err("Trip node is NULL\n");
> +		return false;
> +	}
> +
> +	if (of_property_read_string(trip_node, "type", &trip_type_str)) {
> +		pr_err("Trip node missing 'type' property\n");
> +		return false;
> +	}
> +
> +	pr_info("Trip type: '%s'\n", trip_type_str);
> +
> +	if (strcmp(trip_type_str, "plug") != 0) {

type is an object, not a string. Where is the ABI documented, for both the
type and its value?


> [...]
> +		pr_info("Found 'plug' trip point, processing cooling devices\n");

dev_info, or just drop. You are not supposed to print successes of
standard DT parsing.

> [...]
> +	pr_info("CPU hotplug cooling driver initialized with %d devices\n", count);

Drop. Why would you print this on MIPS machine which does not care about
it, just because someone loaded a module?

> +	return 0;
> +}
> +
> +/* Exit function */
> +static void __exit cpu_hotplug_cooling_exit(void)
> +{
> +	cleanup_cooling_devices();
> +	pr_info("CPU hotplug cooling driver removed\n");

No, drop


> +}
> +
> +module_init(cpu_hotplug_cooling_init);
> +module_exit(cpu_hotplug_cooling_exit);
> +
> +MODULE_AUTHOR("John Madieu <john.madieu.xa@bp.renesas.com>");
> +MODULE_DESCRIPTION("CPU Hotplug Thermal Cooling Device");
> +MODULE_LICENSE("GPL");
> \ No newline at end of file

Warning here

> diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
> index 0eb92d57a1e2..41655af1e419 100644
> --- a/drivers/thermal/thermal_of.c
> +++ b/drivers/thermal/thermal_of.c
> @@ -28,6 +28,7 @@ static const char * const trip_types[] = {
>  	[THERMAL_TRIP_ACTIVE]	= "active",
>  	[THERMAL_TRIP_PASSIVE]	= "passive",
>  	[THERMAL_TRIP_HOT]	= "hot",
> +	[THERMAL_TRIP_PLUG]	= "plug",
>  	[THERMAL_TRIP_CRITICAL]	= "critical",
>  };
>  
> diff --git a/drivers/thermal/thermal_trace.h b/drivers/thermal/thermal_trace.h
> index df8f4edd6068..c26a3aa7de5f 100644
> --- a/drivers/thermal/thermal_trace.h
> +++ b/drivers/thermal/thermal_trace.h
> @@ -12,6 +12,7 @@
>  #include "thermal_core.h"
>  
>  TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
> +TRACE_DEFINE_ENUM(THERMAL_TRIP_PLUG);
>  TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
>  TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
>  TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
> @@ -19,6 +20,7 @@ TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
>  #define show_tzt_type(type)					\
>  	__print_symbolic(type,					\
>  			 { THERMAL_TRIP_CRITICAL, "CRITICAL"},	\
> +			 { THERMAL_TRIP_PLUG,     "PLUG"},	\
>  			 { THERMAL_TRIP_HOT,      "HOT"},	\
>  			 { THERMAL_TRIP_PASSIVE,  "PASSIVE"},	\
>  			 { THERMAL_TRIP_ACTIVE,   "ACTIVE"})
> diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
> index 4b8238468b53..373f6aaaf0da 100644
> --- a/drivers/thermal/thermal_trip.c
> +++ b/drivers/thermal/thermal_trip.c
> @@ -13,6 +13,7 @@ static const char *trip_type_names[] = {
>  	[THERMAL_TRIP_ACTIVE] = "active",
>  	[THERMAL_TRIP_PASSIVE] = "passive",
>  	[THERMAL_TRIP_HOT] = "hot",
> +	[THERMAL_TRIP_PLUG]	= "plug",
>  	[THERMAL_TRIP_CRITICAL] = "critical",
>  };
>  
> diff --git a/include/uapi/linux/thermal.h b/include/uapi/linux/thermal.h
> index 46a2633d33aa..5f76360c6f69 100644
> --- a/include/uapi/linux/thermal.h
> +++ b/include/uapi/linux/thermal.h
> @@ -15,6 +15,7 @@ enum thermal_trip_type {
>  	THERMAL_TRIP_ACTIVE = 0,
>  	THERMAL_TRIP_PASSIVE,
>  	THERMAL_TRIP_HOT,
> +	THERMAL_TRIP_PLUG,
>  	THERMAL_TRIP_CRITICAL,
>  };
>  


Best regards,
Krzysztof


* Re: [RFC PATCH 1/3] thermal/cpuplog_cooling: Add CPU hotplug cooling driver
  2025-03-09 12:13 ` [RFC PATCH 1/3] thermal/cpuplog_cooling: " John Madieu
  2025-03-11  7:30   ` Krzysztof Kozlowski
@ 2025-03-11  8:28   ` Geert Uytterhoeven
  2025-03-11 11:41     ` John Madieu
  1 sibling, 1 reply; 14+ messages in thread
From: Geert Uytterhoeven @ 2025-03-11  8:28 UTC (permalink / raw)
  To: John Madieu
  Cc: niklas.soderlund+renesas, conor+dt, krzk+dt, robh, rafael,
	daniel.lezcano, magnus.damm, claudiu.beznea.uj, devicetree,
	john.madieu, rui.zhang, linux-kernel, linux-renesas-soc,
	biju.das.jz, linux-pm

Hi John,

On Sun, 9 Mar 2025 at 13:14, John Madieu <john.madieu.xa@bp.renesas.com> wrote:
> Add a thermal cooling mechanism that dynamically manages CPU online/offline
> states to prevent overheating. It registers per-CPU cooling devices that can
> take CPUs offline when thermal thresholds are exceeded and that integrate
> with the Linux thermal framework as cooling devices.
>
> Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>

Thanks for your patch!

> --- /dev/null
> +++ b/drivers/thermal/cpuplug_cooling.c

> +static int register_cpu_hotplug_cooling(struct device_node *cpu_node,
> +                                       int cpu_id)
> +{

> +       hotplug_cdev = kzalloc(sizeof(*hotplug_cdev), GFP_KERNEL);
> +       if (!hotplug_cdev) {
> +               pr_err("Failed to allocate memory for cooling device\n");

scripts/checkpatch.pl:

WARNING: Possible unnecessary 'out of memory' message

and checkpatch is right, as the memory core already takes care of
printing a message.

> +               return -ENOMEM;
> +       }

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds


* Re: [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point
  2025-03-09 12:13 ` [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point John Madieu
@ 2025-03-11 10:53   ` Christian Loehle
  2025-03-11 11:57     ` John Madieu
  0 siblings, 1 reply; 14+ messages in thread
From: Christian Loehle @ 2025-03-11 10:53 UTC (permalink / raw)
  To: John Madieu, geert+renesas, niklas.soderlund+renesas, conor+dt,
	krzk+dt, robh, rafael, daniel.lezcano
  Cc: magnus.damm, claudiu.beznea.uj, devicetree, john.madieu,
	rui.zhang, linux-kernel, linux-renesas-soc, biju.das.jz, linux-pm

On 3/9/25 12:13, John Madieu wrote:
> Add CPU hotplug trip point to shutdown CPU1 and CPU2 when exceeding 110°C.
> 
> Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
> ---
>  arch/arm64/boot/dts/renesas/r9a09g047.dtsi | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
> index 93b57d7ad7b9..06bd394582e2 100644
> --- a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
> +++ b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
> @@ -533,6 +533,13 @@ map0 {
>  							 <&cpu2 0 3>, <&cpu3 0 3>;
>  					contribution = <1024>;
>  				};
> +
> +				map1 {
> +					trip = <&trip_emergency>;
> +					cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> +					contribution = <1024>;
> +				};
> +
>  			};
>  
>  			trips {
> @@ -542,6 +549,12 @@ target: trip-point {
>  					type = "passive";
>  				};
>  
> +				trip_emergency: emergency {
> +					temperature = <110000>;
> +					hysteresis = <1000>;
> +					type = "plug";
> +				};
> +
>  				sensor_crit: sensor-crit {
>  					temperature = <120000>;
>  					hysteresis = <1000>;


Are there no other cooling methods?
How does it compare to idle inject?

Furthermore, couldn't the offlining of some CPUs lead to the rest being
operated at much higher OPPs therefore the overall power increase, too?
(Without having looked at if this is a possibility for this particular
SoC.)
Some numbers would be helpful IMO.


* RE: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
  2025-03-10 10:17 ` [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver Biju Das
@ 2025-03-11 11:33   ` John Madieu
  0 siblings, 0 replies; 14+ messages in thread
From: John Madieu @ 2025-03-11 11:33 UTC (permalink / raw)
  To: Biju Das, geert+renesas@glider.be,
	niklas.soderlund+renesas@ragnatech.se, conor+dt@kernel.org,
	krzk+dt@kernel.org, robh@kernel.org, rafael@kernel.org,
	daniel.lezcano@linaro.org
  Cc: magnus.damm@gmail.com, Claudiu Beznea, devicetree@vger.kernel.org,
	john.madieu@gmail.com, rui.zhang@intel.com,
	linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	linux-pm@vger.kernel.org

Hi Biju,

Thanks for your review.

> -----Original Message-----
> From: Biju Das <biju.das.jz@bp.renesas.com>
> Sent: Monday, March 10, 2025 11:18 AM
> To: John Madieu <john.madieu.xa@bp.renesas.com>; geert+renesas@glider.be;
> niklas.soderlund+renesas@ragnatech.se; conor+dt@kernel.org;
> krzk+dt@kernel.org; robh@kernel.org; rafael@kernel.org;
> daniel.lezcano@linaro.org
> Subject: RE: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
> 
> Hi John,
> 
> Thanks for the patch.
> 
> > -----Original Message-----
> > From: John Madieu <john.madieu.xa@bp.renesas.com>
> > Sent: 09 March 2025 12:13
> > Subject: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
> >
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=UTF-8
> > Content-Transfer-Encoding: 8bit
> >
> > This patch series introduces a new thermal cooling driver that
> > implements CPU hotplug-based thermal management. The driver
> > dynamically takes CPUs offline during thermal excursions to reduce
> > power consumption and prevent overheating, while maintaining system
> > stability by keeping at least one CPU online.
> >
> > 1- Problem Statement
> >
> > Modern SoCs require robust thermal management to prevent overheating
> > under heavy workloads. Existing cooling mechanisms like frequency
> > scaling may not always provide sufficient thermal relief, especially in
> > multi-core systems where per-core thermal contributions can be
> > significant.
> >
> > 2- Solution Overview
> >
> > The driver:
> >
> >  - Integrates with the Linux thermal framework as a cooling device
> >  - Registers per-CPU cooling devices that respond to thermal trip
> >    points
> >  - Uses CPU hotplug operations to reduce thermal load
> >  - Maintains system stability by preserving the boot CPU from being
> >    put offline, regardless of the CPUs that are specified in the
> >    cooling device list.
> >  - Implements proper state tracking and cleanup
> >
> > Key Features:
> >
> >  - Dynamic CPU online/offline management based on thermal thresholds
> >  - Device tree-based configuration via thermal zones and trip points
> >  - Hysteresis support through thermal governor interactions
> >  - Safe handling of CPU state transitions during module load/unload
> >  - Compatibility with existing thermal management frameworks
> >
> > Testing
> >
> >  - Verified on Renesas RZ/G3E platforms with multi-core CPU
> > configurations
> >  - Validated thermal response using artificial load generation
> > (emul_temp)
> >  - Confirmed proper interaction with other cooling devices
> >  - Verified support for 'plug' type trace events
> >  - Tested with step_wise governor
> >
> > As the 'hot' type is already used for user space notification, I've
> > chosen 'plug' for this new type. Suggestions on this are welcome. Here
> > is an example of a 'thermal-zone' that integrates the 'plug' type:
> >
> > ```
> > thermal-zones {
> > 	cpu-thermal {
> > 		polling-delay = <1000>;
> > 		polling-delay-passive = <250>;
> > 		thermal-sensors = <&tsu>;
> >
> > 		cooling-maps {
> > 			map0 {
> > 				trip = <&target>;
> > 				cooling-device = <&cpu0 0 3>, <&cpu3 0 3>;
> > 				contribution = <1024>;
> > 			};
> 
> Is it not possible here to make cpu1 and cpu2 as well for DVFS passive
> cooling?

From my tests, adding the same CPUs as cooling devices in both maps
generated some warnings saying that the trip could not be bound
to my ("plug") cooling device.

This is a point I still need to investigate, and comments from maintainers
would be welcome. However, despite these warnings, I saw no unexpected
behavior, and even the thermal trace events were OK.

> 
> >
> > 			map1 {
> > 				trip = <&trip_emergency>;
> > 				cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> > 				contribution = <1024>;
> > 			};
> >
> > 		};
> 
> Is it not possible here to make cpu3 as well as hot pluggable device for
> cooling?
> 
> Cheers,
> Biju

Regards,
John



* RE: [RFC PATCH 1/3] thermal/cpuplog_cooling: Add CPU hotplug cooling driver
  2025-03-11  7:30   ` Krzysztof Kozlowski
@ 2025-03-11 11:38     ` John Madieu
  0 siblings, 0 replies; 14+ messages in thread
From: John Madieu @ 2025-03-11 11:38 UTC (permalink / raw)
  To: Krzysztof Kozlowski, geert+renesas@glider.be,
	niklas.soderlund+renesas@ragnatech.se, conor+dt@kernel.org,
	krzk+dt@kernel.org, robh@kernel.org, rafael@kernel.org,
	daniel.lezcano@linaro.org
  Cc: magnus.damm@gmail.com, Claudiu Beznea, devicetree@vger.kernel.org,
	john.madieu@gmail.com, rui.zhang@intel.com,
	linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	Biju Das, linux-pm@vger.kernel.org

Hi Krzysztof,

Thanks for the review.

> -----Original Message-----
> From: Krzysztof Kozlowski <krzk@kernel.org>
> Sent: Tuesday, March 11, 2025 8:31 AM
> To: John Madieu <john.madieu.xa@bp.renesas.com>; geert+renesas@glider.be;
> niklas.soderlund+renesas@ragnatech.se; conor+dt@kernel.org;
> krzk+dt@kernel.org; robh@kernel.org; rafael@kernel.org;
> daniel.lezcano@linaro.org
> Subject: Re: [RFC PATCH 1/3] thermal/cpuplog_cooling: Add CPU hotplug
> cooling driver
> 
> On 09/03/2025 13:13, John Madieu wrote:
> > +
> > +/* Check if a trip point is of type "plug" */
> > +static bool is_plug_trip_point(struct device_node *trip_node)
> > +{
> > +	const char *trip_type_str;
> > +
> > +	if (!trip_node) {
> > +		pr_err("Trip node is NULL\n");
> > +		return false;
> > +	}
> > +
> > +	if (of_property_read_string(trip_node, "type", &trip_type_str)) {
> > +		pr_err("Trip node missing 'type' property\n");
> > +		return false;
> > +	}
> > +
> > +	pr_info("Trip type: '%s'\n", trip_type_str);
> > +
> > +	if (strcmp(trip_type_str, "plug") != 0) {
> 
> type is object, not string. Where is ABI documented? For the type and its
> value?

I'll prepare it for v2.

> 
> 
> > +		pr_debug("Trip type is '%s', not 'plug' - skipping\n",
> > +			 trip_type_str);
> > +		return false;
> > +	}
> > +
> > +	return true;
> > +}
> > +
> > +/* Init function */
> > +static int __init cpu_hotplug_cooling_init(void)
> > +{
> > +	struct device_node *thermal_zones, *thermal_zone;
> > +	int ret = 0;
> > +	int count = 0;
> > +
> > +	bitmap_zero(cpu_cooling_registered, NR_CPUS);
> > +
> > +	thermal_zones = of_find_node_by_name(NULL, "thermal-zones");
> > +	if (!thermal_zones) {
> > +		pr_err("Missing thermal-zones node\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	/* Process each thermal zone */
> > +	for_each_child_of_node(thermal_zones, thermal_zone) {
> > +		struct device_node *trips, *trip;
> > +		struct device_node *maps, *map;
> > +		bool found_plug = false;
> > +
> > +		/* First find trips and get a specific plug trip */
> > +		trips = of_find_node_by_name(thermal_zone, "trips");
> > +		if (!trips)
> > +			continue;
> > +
> > +		/* Find the emergency trip with type="plug" */
> > +		for_each_child_of_node(trips, trip) {
> > +			if (is_plug_trip_point(trip)) {
> > +				found_plug = true;
> > +				break;
> > +			}
> > +		}
> > +
> > +		/* If we didn't find a plug trip, no need to process this zone */
> > +		if (!found_plug) {
> > +			of_node_put(trips);
> > +			continue;
> > +		}
> > +
> > +		maps = of_find_node_by_name(thermal_zone, "cooling-maps");
> > +		if (!maps) {
> > +			of_node_put(trip);
> > +			of_node_put(trips);
> > +			continue;
> > +		}
> > +
> > +		pr_info("Found 'plug' trip point, processing cooling devices\n");
> 
> dev_info, or just drop. You are not supposed to print successes of
> standard DT parsing.

Noted. Thanks!

> 
> > +
> > +		/* Find the specific cooling map that references our plug trip */
> > +		for_each_child_of_node(maps, map) {
> > +			struct device_node *trip_ref;
> > +			struct of_phandle_args cooling_spec;
> > +			int idx = 0;
> > +
> > +			trip_ref = of_parse_phandle(map, "trip", 0);
> > +			if (!trip_ref || trip_ref != trip) {
> > +				if (trip_ref)
> > +					of_node_put(trip_ref);
> > +				continue;
> > +			}
> > +			of_node_put(trip_ref);
> > +
> > +			if (!of_find_property(map, "cooling-device", NULL)) {
> > +				pr_err("Missing cooling-device property\n");
> > +				continue;
> > +			}
> > +
> > +			/* Iterate through all cooling-device entries */
> > +			while (of_parse_phandle_with_args(
> > +				       map, "cooling-device",
> > +				       "#cooling-cells", idx++,
> > +				       &cooling_spec) == 0) {
> > +				struct device_node *cpu_node = cooling_spec.np;
> > +				int cpu;
> > +
> > +				if (!cpu_node) {
> > +					pr_err("CPU node at index %d is NULL\n",
> > +					       idx - 1);
> > +					continue;
> > +				}
> > +
> > +				cpu = of_cpu_node_to_id(cpu_node);
> > +				if (cpu < 0) {
> > +					pr_err("Failed to map CPU node %pOF to logical ID\n",
> > +					       cpu_node);
> > +					of_node_put(cpu_node);
> > +					continue;
> > +				}
> > +
> > +				if (cpu >= num_possible_cpus()) {
> > +					pr_err("Invalid CPU ID %d (max %d)\n",
> > +					       cpu, num_possible_cpus() - 1);
> > +					of_node_put(cpu_node);
> > +					continue;
> > +				}
> > +
> > +				pr_info("Processing cooling device for CPU%d\n", cpu);
> > +				ret = register_cpu_hotplug_cooling(cpu_node, cpu);
> > +				if (ret == 0)
> > +					count++;
> > +
> > +				of_node_put(cpu_node);
> > +			}
> > +			break; /* Only process the first map that references our trip */
> > +		}
> > +		of_node_put(maps);
> > +		of_node_put(trip);
> > +		of_node_put(trips);
> > +	}
> > +	of_node_put(thermal_zones);
> > +
> > +	if (count == 0) {
> > +		pr_err("No cooling devices registered\n");
> > +		return -ENODEV;
> > +	}
> > +
> > +	pr_info("CPU hotplug cooling driver initialized with %d devices\n", count);
> 
> Drop. Why would you print this on MIPS machine which does not care about
> it, just because someone loaded a module?
> 

Will remove this in v2.

> > +	return 0;
> > +}
> > +
> > +/* Exit function */
> > +static void __exit cpu_hotplug_cooling_exit(void)
> > +{
> > +	cleanup_cooling_devices();
> > +	pr_info("CPU hotplug cooling driver removed\n");
> 
> No, drop
> 

Got it.

> 
> > +}
> > +
> > +module_init(cpu_hotplug_cooling_init);
> > +module_exit(cpu_hotplug_cooling_exit);
> > +
> > +MODULE_AUTHOR("John Madieu <john.madieu.xa@bp.renesas.com>");
> > +MODULE_DESCRIPTION("CPU Hotplug Thermal Cooling Device");
> > +MODULE_LICENSE("GPL");
> > \ No newline at end of file
> 
> Warning here
> 

Will be fixed in v2.

> > diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
> > index 0eb92d57a1e2..41655af1e419 100644
> > --- a/drivers/thermal/thermal_of.c
> > +++ b/drivers/thermal/thermal_of.c
> > @@ -28,6 +28,7 @@ static const char * const trip_types[] = {
> >  	[THERMAL_TRIP_ACTIVE]	= "active",
> >  	[THERMAL_TRIP_PASSIVE]	= "passive",
> >  	[THERMAL_TRIP_HOT]	= "hot",
> > +	[THERMAL_TRIP_PLUG]	= "plug",
> >  	[THERMAL_TRIP_CRITICAL]	= "critical",
> >  };
> >
> > diff --git a/drivers/thermal/thermal_trace.h b/drivers/thermal/thermal_trace.h
> > index df8f4edd6068..c26a3aa7de5f 100644
> > --- a/drivers/thermal/thermal_trace.h
> > +++ b/drivers/thermal/thermal_trace.h
> > @@ -12,6 +12,7 @@
> >  #include "thermal_core.h"
> >
> >  TRACE_DEFINE_ENUM(THERMAL_TRIP_CRITICAL);
> > +TRACE_DEFINE_ENUM(THERMAL_TRIP_PLUG);
> >  TRACE_DEFINE_ENUM(THERMAL_TRIP_HOT);
> >  TRACE_DEFINE_ENUM(THERMAL_TRIP_PASSIVE);
> >  TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
> > @@ -19,6 +20,7 @@ TRACE_DEFINE_ENUM(THERMAL_TRIP_ACTIVE);
> >  #define show_tzt_type(type)					\
> >  	__print_symbolic(type,					\
> >  			 { THERMAL_TRIP_CRITICAL, "CRITICAL"},	\
> > +			 { THERMAL_TRIP_PLUG,     "PLUG"},	\
> >  			 { THERMAL_TRIP_HOT,      "HOT"},	\
> >  			 { THERMAL_TRIP_PASSIVE,  "PASSIVE"},	\
> >  			 { THERMAL_TRIP_ACTIVE,   "ACTIVE"})
> > diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c
> > index 4b8238468b53..373f6aaaf0da 100644
> > --- a/drivers/thermal/thermal_trip.c
> > +++ b/drivers/thermal/thermal_trip.c
> > @@ -13,6 +13,7 @@ static const char *trip_type_names[] = {
> >  	[THERMAL_TRIP_ACTIVE] = "active",
> >  	[THERMAL_TRIP_PASSIVE] = "passive",
> >  	[THERMAL_TRIP_HOT] = "hot",
> > +	[THERMAL_TRIP_PLUG]	= "plug",
> >  	[THERMAL_TRIP_CRITICAL] = "critical",
> >  };
> >
> > diff --git a/include/uapi/linux/thermal.h b/include/uapi/linux/thermal.h
> > index 46a2633d33aa..5f76360c6f69 100644
> > --- a/include/uapi/linux/thermal.h
> > +++ b/include/uapi/linux/thermal.h
> > @@ -15,6 +15,7 @@ enum thermal_trip_type {
> >  	THERMAL_TRIP_ACTIVE = 0,
> >  	THERMAL_TRIP_PASSIVE,
> >  	THERMAL_TRIP_HOT,
> > +	THERMAL_TRIP_PLUG,
> >  	THERMAL_TRIP_CRITICAL,
> >  };
> >
> 
> 
> Best regards,
> Krzysztof

Regards,
John


* RE: [RFC PATCH 1/3] thermal/cpuplog_cooling: Add CPU hotplug cooling driver
  2025-03-11  8:28   ` Geert Uytterhoeven
@ 2025-03-11 11:41     ` John Madieu
  0 siblings, 0 replies; 14+ messages in thread
From: John Madieu @ 2025-03-11 11:41 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: niklas.soderlund+renesas@ragnatech.se, conor+dt@kernel.org,
	krzk+dt@kernel.org, robh@kernel.org, rafael@kernel.org,
	daniel.lezcano@linaro.org, magnus.damm@gmail.com, Claudiu Beznea,
	devicetree@vger.kernel.org, john.madieu@gmail.com,
	rui.zhang@intel.com, linux-kernel@vger.kernel.org,
	linux-renesas-soc@vger.kernel.org, Biju Das,
	linux-pm@vger.kernel.org

Hi Geert,

Thanks for your review.

> -----Original Message-----
> From: Geert Uytterhoeven <geert@linux-m68k.org>
> Sent: Tuesday, March 11, 2025 9:28 AM
> To: John Madieu <john.madieu.xa@bp.renesas.com>
> Subject: Re: [RFC PATCH 1/3] thermal/cpuplog_cooling: Add CPU hotplug
> cooling driver
> 
> Hi John,
> 
> On Sun, 9 Mar 2025 at 13:14, John Madieu <john.madieu.xa@bp.renesas.com>
> wrote:
> > Add a thermal cooling mechanism that dynamically manages CPU
> > online/offline states to prevent overheating. It registers per-CPU
> > cooling devices that can take CPUs offline when thermal thresholds are
> > exceeded and that integrate with the Linux thermal framework as cooling
> > devices.
> >
> > Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
> 
> Thanks for your patch!
> 
> > --- /dev/null
> > +++ b/drivers/thermal/cpuplug_cooling.c
> 
> > +static int register_cpu_hotplug_cooling(struct device_node *cpu_node,
> > +                                       int cpu_id)
> > +{
> 
> > +       hotplug_cdev = kzalloc(sizeof(*hotplug_cdev), GFP_KERNEL);
> > +       if (!hotplug_cdev) {
> > +               pr_err("Failed to allocate memory for cooling device\n");
> 
> scripts/checkpatch.pl:
> 
> WARNING: Possible unnecessary 'out of memory' message
> 
> and checkpatch is right, as the memory core already takes care of printing
> a message.
> 

Will be removed in v2.

> > +               return -ENOMEM;
> > +       }
> 
> Gr{oetje,eeting}s,
> 
>                         Geert
> 
> --

Regards,
John



* RE: [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point
  2025-03-11 10:53   ` Christian Loehle
@ 2025-03-11 11:57     ` John Madieu
  2025-03-11 15:29       ` Christian Loehle
  0 siblings, 1 reply; 14+ messages in thread
From: John Madieu @ 2025-03-11 11:57 UTC (permalink / raw)
  To: Christian Loehle, geert+renesas@glider.be,
	niklas.soderlund+renesas@ragnatech.se, conor+dt@kernel.org,
	krzk+dt@kernel.org, robh@kernel.org, rafael@kernel.org,
	daniel.lezcano@linaro.org
  Cc: magnus.damm@gmail.com, Claudiu Beznea, devicetree@vger.kernel.org,
	john.madieu@gmail.com, rui.zhang@intel.com,
	linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	Biju Das, linux-pm@vger.kernel.org

Hi Christian,

Thanks for reviewing.

> -----Original Message-----
> From: Christian Loehle <christian.loehle@arm.com>
> Sent: Tuesday, March 11, 2025 11:53 AM
> To: John Madieu <john.madieu.xa@bp.renesas.com>; geert+renesas@glider.be;
> niklas.soderlund+renesas@ragnatech.se; conor+dt@kernel.org;
> krzk+dt@kernel.org; robh@kernel.org; rafael@kernel.org;
> daniel.lezcano@linaro.org
> Cc: magnus.damm@gmail.com; Claudiu Beznea
> <claudiu.beznea.uj@bp.renesas.com>; devicetree@vger.kernel.org;
> john.madieu@gmail.com; rui.zhang@intel.com; linux-kernel@vger.kernel.org;
> linux-renesas-soc@vger.kernel.org; Biju Das <biju.das.jz@bp.renesas.com>;
> linux-pm@vger.kernel.org
> Subject: Re: [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal
> hotplug trip point
> 
> On 3/9/25 12:13, John Madieu wrote:
> > Add CPU hotplug trip point to shutdown CPU1 and CPU2 when exceeding 110°C.
> >
> > Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
> > ---
> >  arch/arm64/boot/dts/renesas/r9a09g047.dtsi | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
> > index 93b57d7ad7b9..06bd394582e2 100644
> > --- a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
> > +++ b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
> > @@ -533,6 +533,13 @@ map0 {
> >  							 <&cpu2 0 3>, <&cpu3 0 3>;
> >  					contribution = <1024>;
> >  				};
> > +
> > +				map1 {
> > +					trip = <&trip_emergency>;
> > +					cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
> > +					contribution = <1024>;
> > +				};
> > +
> >  			};
> >
> >  			trips {
> > @@ -542,6 +549,12 @@ target: trip-point {
> >  					type = "passive";
> >  				};
> >
> > +				trip_emergency: emergency {
> > +					temperature = <110000>;
> > +					hysteresis = <1000>;
> > +					type = "plug";
> > +				};
> > +
> >  				sensor_crit: sensor-crit {
> >  					temperature = <120000>;
> >  					hysteresis = <1000>;
> 
> 
> Are there no other cooling methods?
> How does it compare to idle inject?
> 
> Furthermore, couldn't the offlining of some CPUs lead to the rest being
> operated at much higher OPPs therefore the overall power increase, too?
> (Without having looked at if this is a possibility for this particular
> SoC.)
> Some numbers would be helpful IMO.

To clarify this, I tested with CPUFreq cooling along with the performance
governor, with the "plug" threshold higher than the "passive" one. When the
passive trip is crossed, we observe proper CPU throttling, and when the "plug"
trip is crossed, we observe the target CPUs being put offline, while throttling
remains.

When the "plug"-targeted CPUs come back online, throttling is still operational.

Once I get comparison results with CPU idle cooling, I'll keep you posted.

Regards,
John.



* Re: [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point
  2025-03-11 11:57     ` John Madieu
@ 2025-03-11 15:29       ` Christian Loehle
  0 siblings, 0 replies; 14+ messages in thread
From: Christian Loehle @ 2025-03-11 15:29 UTC (permalink / raw)
  To: John Madieu, geert+renesas@glider.be,
	niklas.soderlund+renesas@ragnatech.se, conor+dt@kernel.org,
	krzk+dt@kernel.org, robh@kernel.org, rafael@kernel.org,
	daniel.lezcano@linaro.org
  Cc: magnus.damm@gmail.com, Claudiu Beznea, devicetree@vger.kernel.org,
	john.madieu@gmail.com, rui.zhang@intel.com,
	linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	Biju Das, linux-pm@vger.kernel.org

On 3/11/25 11:57, John Madieu wrote:
> Hi Christian,
> 
> Thanks for reviewing.
> 
>> -----Original Message-----
>> From: Christian Loehle <christian.loehle@arm.com>
>> Sent: Tuesday, March 11, 2025 11:53 AM
>> To: John Madieu <john.madieu.xa@bp.renesas.com>; geert+renesas@glider.be;
>> niklas.soderlund+renesas@ragnatech.se; conor+dt@kernel.org;
>> krzk+dt@kernel.org; robh@kernel.org; rafael@kernel.org;
>> daniel.lezcano@linaro.org
>> Cc: magnus.damm@gmail.com; Claudiu Beznea
>> <claudiu.beznea.uj@bp.renesas.com>; devicetree@vger.kernel.org;
>> john.madieu@gmail.com; rui.zhang@intel.com; linux-kernel@vger.kernel.org;
>> linux-renesas-soc@vger.kernel.org; Biju Das <biju.das.jz@bp.renesas.com>;
>> linux-pm@vger.kernel.org
>> Subject: Re: [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal
>> hotplug trip point
>>
>> On 3/9/25 12:13, John Madieu wrote:
>>> Add CPU hotplug trip point to shutdown CPU1 and CPU2 when exceeding 110°C.
>>>
>>> Signed-off-by: John Madieu <john.madieu.xa@bp.renesas.com>
>>> ---
>>>  arch/arm64/boot/dts/renesas/r9a09g047.dtsi | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
>>> index 93b57d7ad7b9..06bd394582e2 100644
>>> --- a/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
>>> +++ b/arch/arm64/boot/dts/renesas/r9a09g047.dtsi
>>> @@ -533,6 +533,13 @@ map0 {
>>>  							 <&cpu2 0 3>, <&cpu3 0 3>;
>>>  					contribution = <1024>;
>>>  				};
>>> +
>>> +				map1 {
>>> +					trip = <&trip_emergency>;
>>> +					cooling-device = <&cpu1 0 1>, <&cpu2 0 1>;
>>> +					contribution = <1024>;
>>> +				};
>>> +
>>>  			};
>>>
>>>  			trips {
>>> @@ -542,6 +549,12 @@ target: trip-point {
>>>  					type = "passive";
>>>  				};
>>>
>>> +				trip_emergency: emergency {
>>> +					temperature = <110000>;
>>> +					hysteresis = <1000>;
>>> +					type = "plug";
>>> +				};
>>> +
>>>  				sensor_crit: sensor-crit {
>>>  					temperature = <120000>;
>>>  					hysteresis = <1000>;
>>
>>
>> Are there no other cooling methods?
>> How does it compare to idle inject?
>>
>> Furthermore, couldn't the offlining of some CPUs lead to the rest being
>> operated at much higher OPPs therefore the overall power increase, too?
>> (Without having looked at if this is a possibility for this particular
>> SoC.)
>> Some numbers would be helpful IMO.
> 
> To clarify this, I tested with CPUFreq cooling along with the performance
> governor, with the "plug" threshold higher than the "passive" one. When the
> passive trip is crossed, we observe proper CPU throttling, and when the "plug"
> trip is crossed, we observe the target CPUs being put offline, while throttling
> remains.
> 
> When the "plug"-targeted CPUs come back online, throttling is still operational.
> 
> Once I get comparison results with CPU idle cooling, I'll keep you posted.
> 

Thanks John!
Might make sense to also try this with schedutil, because my argument doesn't
hold with the performance governor.
As long as we also have throttling that's not a concern anyway.


* Re: [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver
  2025-03-09 12:13 [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver John Madieu
                   ` (3 preceding siblings ...)
  2025-03-10 10:17 ` [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver Biju Das
@ 2025-03-15 14:40 ` Rafael J. Wysocki
  4 siblings, 0 replies; 14+ messages in thread
From: Rafael J. Wysocki @ 2025-03-15 14:40 UTC (permalink / raw)
  To: John Madieu
  Cc: geert+renesas, niklas.soderlund+renesas, conor+dt, krzk+dt, robh,
	rafael, daniel.lezcano, magnus.damm, claudiu.beznea.uj,
	devicetree, john.madieu, rui.zhang, linux-kernel,
	linux-renesas-soc, biju.das.jz, linux-pm

On Sun, Mar 9, 2025 at 1:13 PM John Madieu
<john.madieu.xa@bp.renesas.com> wrote:
>
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> This patch series introduces a new thermal cooling driver that implements CPU
> hotplug-based thermal management. The driver dynamically takes CPUs offline
> during thermal excursions to reduce power consumption and prevent overheating,
> while maintaining system stability by keeping at least one CPU online.

So as far as I am concerned, this is a total no-go.  CPU offline is
not designed to be triggered from within a driver.

> 1- Problem Statement
>
> Modern SoCs require robust thermal management to prevent overheating under heavy
> workloads. Existing cooling mechanisms like frequency scaling may not always
> provide sufficient thermal relief, especially in multi-core systems where
> per-core thermal contributions can be significant.

What about idle injection?

> 2- Solution Overview
>
> The driver:
>
>  - Integrates with the Linux thermal framework as a cooling device
>  - Registers per-CPU cooling devices that respond to thermal trip points
>  - Uses CPU hotplug operations to reduce thermal load
>  - Maintains system stability by preventing the boot CPU from being put offline,
>  regardless of the CPUs that are specified in the cooling device list.
>  - Implements proper state tracking and cleanup
>
> Key Features:
>
>  - Dynamic CPU online/offline management based on thermal thresholds
>  - Device tree-based configuration via thermal zones and trip points

So DT-only.  Not nice.

>  - Hysteresis support through thermal governor interactions

I'd rather not combine thermal governors with CPU offline.

>  - Safe handling of CPU state transitions during module load/unload

Are you sure that it is really safe?

>  - Compatibility with existing thermal management frameworks

I'm not sure about this.

So one of the things that CPU offline does, which you probably are not
aware of, is breaking CPU affinity, which is a very brutal thing for
user space if it is not expecting that to happen.  It also migrates
interrupts between CPUs, which may confuse things as well.  So don't do it
from the kernel, really.

Thanks, Rafael

Thread overview: 14+ messages
2025-03-09 12:13 [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver John Madieu
2025-03-09 12:13 ` [RFC PATCH 1/3] thermal/cpuplog_cooling: " John Madieu
2025-03-11  7:30   ` Krzysztof Kozlowski
2025-03-11 11:38     ` John Madieu
2025-03-11  8:28   ` Geert Uytterhoeven
2025-03-11 11:41     ` John Madieu
2025-03-09 12:13 ` [RFC PATCH 2/3] tmon: Add support for THERMAL_TRIP_PLUG type John Madieu
2025-03-09 12:13 ` [RFC PATCH 3/3] arm64: dts: renesas: r9a09g047: Add thermal hotplug trip point John Madieu
2025-03-11 10:53   ` Christian Loehle
2025-03-11 11:57     ` John Madieu
2025-03-11 15:29       ` Christian Loehle
2025-03-10 10:17 ` [RFC PATCH 0/3] thermal: Add CPU hotplug cooling driver Biju Das
2025-03-11 11:33   ` John Madieu
2025-03-15 14:40 ` Rafael J. Wysocki