public inbox for linux-kernel@vger.kernel.org
* [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points
@ 2024-08-19 15:49 Rafael J. Wysocki
  2024-08-19 15:50 ` [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers Rafael J. Wysocki
                   ` (14 more replies)
  0 siblings, 15 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 15:49 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

Hi Everyone,

This is one more update of

https://lore.kernel.org/linux-pm/3134863.CbtlEUcBR6@rjwysocki.net/#r

the cover letter of which was sent separately by mistake:

https://lore.kernel.org/linux-pm/CAJZ5v0jo5vh2uD5t4GqBnN0qukMBG_ty33PB=NiEqigqxzBcsw@mail.gmail.com/

and it has been updated once already:

https://lore.kernel.org/linux-pm/114901234.nniJfEyVGO@rjwysocki.net/

Relative to the v2 above, it drops 3 patches: one because it was broken ([04/17]
in the v2), and two more that would need to be rebased significantly, either
because of dropping the other broken patch or because of the recent Bang-bang
governor fixes:

https://lore.kernel.org/linux-pm/1903691.tdWV9SEqCh@rjwysocki.net/

The remaining 14 patches, 2 of which have been slightly rebased while the rest
are mostly unchanged (except for some very minor subject and changelog fixes),
are not expected to be controversial and are targeting 6.12, on top of the
current linux-next material.

The original motivation for this series quoted below has not changed:

 The code for binding cooling devices to trip points (and unbinding them from
 trip points) is one of the murkiest pieces of the thermal subsystem.  It is
 convoluted, bloated with unnecessary code doing questionable things, and it
 works backwards.

 The idea is to bind cooling devices to trip points in accordance with some
 information known to the thermal zone owner (thermal driver).  This information
 is not known to the thermal core when the thermal zone is registered, so the
 driver needs to be involved, but instead of just asking the driver whether
 or not the given cooling device should be bound to a given trip point, the
 thermal core expects the driver to carry out all of the binding process
 including calling functions specifically provided by the core for this
 purpose which is cumbersome and counter-intuitive.

 Because the driver has no information regarding the representation of the trip
 points at the core level, it is forced to walk them (and it has to avoid some
 locking traps while doing this), or it needs to make questionable assumptions
 regarding the ordering of the trips in the core.  There are drivers doing both
 these things.

The first 5 patches in the series are preliminary.

Patch [06/14] introduces a new .should_bind() callback for thermal zones and
patches [07,09-12/14] modify drivers to use it instead of the .bind() and
.unbind() callbacks, which allows them to be simplified quite a bit.

The other patches [08,13-14/14] get rid of code that becomes unused after the
previous changes and do some cleanups on top of that.

The entire series along with 2 patches on top of it (that were present in the
v2 of this set of patches) is available in the thermal-core-testing git branch:

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=thermal-core-testing

(note that this branch is going to be rebased shortly on top of 6.11-rc4
and the thermal control material in linux-next).

Thanks!





* [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
@ 2024-08-19 15:50 ` Rafael J. Wysocki
  2024-08-20  7:04   ` Zhang, Rui
  2024-08-21  7:57   ` Daniel Lezcano
  2024-08-19 15:51 ` [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip() Rafael J. Wysocki
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 15:50 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Fold bind_cdev() into __thermal_cooling_device_register() and bind_tz()
into thermal_zone_device_register_with_trips() to reduce code bloat and
make it somewhat easier to follow the code flow.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3: No changes

---
 drivers/thermal/thermal_core.c |   55 ++++++++++++++---------------------------
 1 file changed, 19 insertions(+), 36 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -991,20 +991,6 @@ void print_bind_err_msg(struct thermal_z
 		tz->type, cdev->type, ret);
 }
 
-static void bind_cdev(struct thermal_cooling_device *cdev)
-{
-	int ret;
-	struct thermal_zone_device *pos = NULL;
-
-	list_for_each_entry(pos, &thermal_tz_list, node) {
-		if (pos->ops.bind) {
-			ret = pos->ops.bind(pos, cdev);
-			if (ret)
-				print_bind_err_msg(pos, cdev, ret);
-		}
-	}
-}
-
 /**
  * __thermal_cooling_device_register() - register a new thermal cooling device
  * @np:		a pointer to a device tree node.
@@ -1100,7 +1086,13 @@ __thermal_cooling_device_register(struct
 	list_add(&cdev->node, &thermal_cdev_list);
 
 	/* Update binding information for 'this' new cdev */
-	bind_cdev(cdev);
+	list_for_each_entry(pos, &thermal_tz_list, node) {
+		if (pos->ops.bind) {
+			ret = pos->ops.bind(pos, cdev);
+			if (ret)
+				print_bind_err_msg(pos, cdev, ret);
+		}
+	}
 
 	list_for_each_entry(pos, &thermal_tz_list, node)
 		if (atomic_cmpxchg(&pos->need_update, 1, 0))
@@ -1338,25 +1330,6 @@ void thermal_cooling_device_unregister(s
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
 
-static void bind_tz(struct thermal_zone_device *tz)
-{
-	int ret;
-	struct thermal_cooling_device *pos = NULL;
-
-	if (!tz->ops.bind)
-		return;
-
-	mutex_lock(&thermal_list_lock);
-
-	list_for_each_entry(pos, &thermal_cdev_list, node) {
-		ret = tz->ops.bind(tz, pos);
-		if (ret)
-			print_bind_err_msg(tz, pos, ret);
-	}
-
-	mutex_unlock(&thermal_list_lock);
-}
-
 static void thermal_set_delay_jiffies(unsigned long *delay_jiffies, int delay_ms)
 {
 	*delay_jiffies = msecs_to_jiffies(delay_ms);
@@ -1554,13 +1527,23 @@ thermal_zone_device_register_with_trips(
 	}
 
 	mutex_lock(&thermal_list_lock);
+
 	mutex_lock(&tz->lock);
 	list_add_tail(&tz->node, &thermal_tz_list);
 	mutex_unlock(&tz->lock);
-	mutex_unlock(&thermal_list_lock);
 
 	/* Bind cooling devices for this zone */
-	bind_tz(tz);
+	if (tz->ops.bind) {
+		struct thermal_cooling_device *cdev;
+
+		list_for_each_entry(cdev, &thermal_cdev_list, node) {
+			result = tz->ops.bind(tz, cdev);
+			if (result)
+				print_bind_err_msg(tz, cdev, result);
+		}
+	}
+
+	mutex_unlock(&thermal_list_lock);
 
 	thermal_zone_device_init(tz);
 	/* Update the new thermal zone and mark it as already updated. */





* [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
  2024-08-19 15:50 ` [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers Rafael J. Wysocki
@ 2024-08-19 15:51 ` Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
                     ` (2 more replies)
  2024-08-19 15:52 ` [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks Rafael J. Wysocki
                   ` (12 subsequent siblings)
  14 siblings, 3 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 15:51 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

It is not necessary to look up the thermal zone and the cooling device
in the respective global lists to check whether or not they are
registered.  It is sufficient to check whether or not their respective
list nodes are empty for this purpose.

Use the above observation to simplify thermal_bind_cdev_to_trip().  In
addition, eliminate an unnecessary ternary operator from it.

Moreover, add lockdep_assert_held() for thermal_list_lock to it because
that lock must be held by its callers when it is running.
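
As an illustration of the list-node check described above, here is a minimal
userspace sketch.  It is not the kernel code: struct list_node, struct zone and
all of the helper names are hypothetical stand-ins for struct list_head and the
thermal objects, and it assumes that a node is initialized to point at itself
and is re-initialized when it is removed from the list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *next, *prev;
};

/* A node (or list head) that points at itself counts as "empty". */
static void list_node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

static bool list_node_empty(const struct list_node *n)
{
	return n->next == n;
}

/* Insert @n right after @head. */
static void list_node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink @n and re-initialize it so the emptiness check works afterwards. */
static void list_node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_node_init(n);
}

/* Hypothetical object with an embedded list node, in the spirit of tz->node. */
struct zone {
	struct list_node node;
};

/* Constant-time registration check: no walk over the global list is needed. */
static bool zone_is_registered(const struct zone *z)
{
	return !list_node_empty(&z->node);
}
```

The check only looks at the object's own node, so its cost does not depend on
how many objects are registered.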

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3: No changes

---
 drivers/thermal/thermal_core.c |   16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -781,25 +781,17 @@ int thermal_bind_cdev_to_trip(struct the
 {
 	struct thermal_instance *dev;
 	struct thermal_instance *pos;
-	struct thermal_zone_device *pos1;
-	struct thermal_cooling_device *pos2;
 	bool upper_no_limit;
 	int result;
 
-	list_for_each_entry(pos1, &thermal_tz_list, node) {
-		if (pos1 == tz)
-			break;
-	}
-	list_for_each_entry(pos2, &thermal_cdev_list, node) {
-		if (pos2 == cdev)
-			break;
-	}
+	lockdep_assert_held(&thermal_list_lock);
 
-	if (tz != pos1 || cdev != pos2)
+	if (list_empty(&tz->node) || list_empty(&cdev->node))
 		return -EINVAL;
 
 	/* lower default 0, upper default max_state */
-	lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
+	if (lower == THERMAL_NO_LIMIT)
+		lower = 0;
 
 	if (upper == THERMAL_NO_LIMIT) {
 		upper = cdev->max_state;





* [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
  2024-08-19 15:50 ` [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers Rafael J. Wysocki
  2024-08-19 15:51 ` [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip() Rafael J. Wysocki
@ 2024-08-19 15:52 ` Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
  2024-08-21  9:32   ` Daniel Lezcano
  2024-08-19 15:56 ` [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store Rafael J. Wysocki
                   ` (11 subsequent siblings)
  14 siblings, 2 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 15:52 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Because the trip and cdev pointers are sufficient to identify a thermal
instance holding them unambiguously, drop the additional thermal zone
checks from two loops walking the list of thermal instances in a
thermal zone.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3: No changes

---
 drivers/thermal/thermal_core.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -850,7 +850,7 @@ int thermal_bind_cdev_to_trip(struct the
 	mutex_lock(&tz->lock);
 	mutex_lock(&cdev->lock);
 	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
-		if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
+		if (pos->trip == trip && pos->cdev == cdev) {
 			result = -EEXIST;
 			break;
 		}
@@ -915,7 +915,7 @@ int thermal_unbind_cdev_from_trip(struct
 	mutex_lock(&tz->lock);
 	mutex_lock(&cdev->lock);
 	list_for_each_entry_safe(pos, next, &tz->thermal_instances, tz_node) {
-		if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
+		if (pos->trip == trip && pos->cdev == cdev) {
 			list_del(&pos->tz_node);
 			list_del(&pos->cdev_node);
 





* [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (2 preceding siblings ...)
  2024-08-19 15:52 ` [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks Rafael J. Wysocki
@ 2024-08-19 15:56 ` Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
                     ` (2 more replies)
  2024-08-19 15:58 ` [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions Rafael J. Wysocki
                   ` (10 subsequent siblings)
  14 siblings, 3 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 15:56 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Two sysfs show/store functions for attributes representing thermal
instances, trip_point_show() and weight_store(), retrieve the thermal
zone pointer from the instance object at hand, but they may also get
it from their dev argument, which is more consistent with what the
other thermal sysfs functions do, so make them do so.
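
For readers less familiar with the pattern, the two ways of getting from a
sysfs callback argument to the enclosing object can be sketched in userspace as
follows.  The structure layouts and the to_zone()/attr_to_instance() helpers
are hypothetical and much simpler than the real ones; the sketch only assumes
that the device is embedded in the zone object and the attribute is embedded in
the instance object.

```c
#include <stddef.h>

/* Userspace version of the kernel's container_of() (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct device {
	int id;
};

struct device_attribute {
	const char *name;
};

struct zone {
	int num_trips;
	struct device device;		/* embedded, like tz->device */
};

struct instance {
	int weight;
	struct device_attribute attr;	/* embedded, like instance->attr */
};

/* From the dev argument to the zone that embeds it. */
static struct zone *to_zone(struct device *dev)
{
	return container_of(dev, struct zone, device);
}

/* From the attr argument to the instance that embeds it. */
static struct instance *attr_to_instance(struct device_attribute *attr)
{
	return container_of(attr, struct instance, attr);
}
```

With both helpers available, a show/store function does not need a
back-pointer from the instance to the zone in order to reach the zone.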

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3: No changes (previously [06/17])

---
 drivers/thermal/thermal_sysfs.c |   15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

Index: linux-pm/drivers/thermal/thermal_sysfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_sysfs.c
+++ linux-pm/drivers/thermal/thermal_sysfs.c
@@ -836,13 +836,12 @@ void thermal_cooling_device_stats_reinit
 ssize_t
 trip_point_show(struct device *dev, struct device_attribute *attr, char *buf)
 {
+	struct thermal_zone_device *tz = to_thermal_zone(dev);
 	struct thermal_instance *instance;
 
-	instance =
-	    container_of(attr, struct thermal_instance, attr);
+	instance = container_of(attr, struct thermal_instance, attr);
 
-	return sprintf(buf, "%d\n",
-		       thermal_zone_trip_id(instance->tz, instance->trip));
+	return sprintf(buf, "%d\n", thermal_zone_trip_id(tz, instance->trip));
 }
 
 ssize_t
@@ -858,6 +857,7 @@ weight_show(struct device *dev, struct d
 ssize_t weight_store(struct device *dev, struct device_attribute *attr,
 		     const char *buf, size_t count)
 {
+	struct thermal_zone_device *tz = to_thermal_zone(dev);
 	struct thermal_instance *instance;
 	int ret, weight;
 
@@ -868,14 +868,13 @@ ssize_t weight_store(struct device *dev,
 	instance = container_of(attr, struct thermal_instance, weight_attr);
 
 	/* Don't race with governors using the 'weight' value */
-	mutex_lock(&instance->tz->lock);
+	mutex_lock(&tz->lock);
 
 	instance->weight = weight;
 
-	thermal_governor_update_tz(instance->tz,
-				   THERMAL_INSTANCE_WEIGHT_CHANGED);
+	thermal_governor_update_tz(tz, THERMAL_INSTANCE_WEIGHT_CHANGED);
 
-	mutex_unlock(&instance->tz->lock);
+	mutex_unlock(&tz->lock);
 
 	return count;
 }





* [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (3 preceding siblings ...)
  2024-08-19 15:56 ` [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store Rafael J. Wysocki
@ 2024-08-19 15:58 ` Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
                     ` (2 more replies)
  2024-08-19 16:00 ` [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback Rafael J. Wysocki
                   ` (9 subsequent siblings)
  14 siblings, 3 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 15:58 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
acquire the thermal zone lock, the locking rules for their callers get
complicated.  In particular, the thermal zone lock cannot be acquired
in any code path leading to one of these functions even though it might
be useful to do so.

To address this, remove the thermal zone locking from both these
functions, add lockdep assertions for the thermal zone lock to both
of them and make their callers acquire the thermal zone lock instead.
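
The resulting calling convention can be modeled with a single-threaded toy
sketch.  Everything in it is hypothetical: struct toy_lock is just a flag
standing in for a mutex, and the run-time check in zone_bind_locked() stands in
for lockdep_assert_held(), which in the kernel is a debug assertion and not an
error return.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy single-threaded "lock": only records whether it is held. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/* Hypothetical zone object counting its bindings. */
struct zone {
	struct toy_lock lock;
	int bound;
};

/* Inner helper: does not lock, only checks that the caller did. */
static int zone_bind_locked(struct zone *z)
{
	if (!z->lock.held)
		return -EPERM;	/* stand-in for a lockdep splat */

	z->bound++;
	return 0;
}

/* Wrapper for callers that do not hold the lock yet. */
static int zone_bind(struct zone *z)
{
	int ret;

	toy_lock_acquire(&z->lock);
	ret = zone_bind_locked(z);
	toy_lock_release(&z->lock);

	return ret;
}

/* A caller can now do several bindings under one lock acquisition. */
static int zone_bind_many(struct zone *z, int n)
{
	int i, ret = 0;

	toy_lock_acquire(&z->lock);
	for (i = 0; i < n && !ret; i++)
		ret = zone_bind_locked(z);
	toy_lock_release(&z->lock);

	return ret;
}
```

The last function is the case that was not possible while the inner helper
acquired the lock by itself.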

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: Rebase after dropping patches [04-05/17] from the series

v1 -> v2: No changes

---
 drivers/acpi/thermal.c         |    2 +-
 drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++--------
 2 files changed, 23 insertions(+), 9 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -785,6 +785,7 @@ int thermal_bind_cdev_to_trip(struct the
 	int result;
 
 	lockdep_assert_held(&thermal_list_lock);
+	lockdep_assert_held(&tz->lock);
 
 	if (list_empty(&tz->node) || list_empty(&cdev->node))
 		return -EINVAL;
@@ -847,7 +848,6 @@ int thermal_bind_cdev_to_trip(struct the
 	if (result)
 		goto remove_trip_file;
 
-	mutex_lock(&tz->lock);
 	mutex_lock(&cdev->lock);
 	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
 		if (pos->trip == trip && pos->cdev == cdev) {
@@ -862,7 +862,6 @@ int thermal_bind_cdev_to_trip(struct the
 		thermal_governor_update_tz(tz, THERMAL_TZ_BIND_CDEV);
 	}
 	mutex_unlock(&cdev->lock);
-	mutex_unlock(&tz->lock);
 
 	if (!result)
 		return 0;
@@ -886,11 +885,19 @@ int thermal_zone_bind_cooling_device(str
 				     unsigned long upper, unsigned long lower,
 				     unsigned int weight)
 {
+	int ret;
+
 	if (trip_index < 0 || trip_index >= tz->num_trips)
 		return -EINVAL;
 
-	return thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
-					 upper, lower, weight);
+	mutex_lock(&tz->lock);
+
+	ret = thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
+					upper, lower, weight);
+
+	mutex_unlock(&tz->lock);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);
 
@@ -912,7 +919,8 @@ int thermal_unbind_cdev_from_trip(struct
 {
 	struct thermal_instance *pos, *next;
 
-	mutex_lock(&tz->lock);
+	lockdep_assert_held(&tz->lock);
+
 	mutex_lock(&cdev->lock);
 	list_for_each_entry_safe(pos, next, &tz->thermal_instances, tz_node) {
 		if (pos->trip == trip && pos->cdev == cdev) {
@@ -922,12 +930,10 @@ int thermal_unbind_cdev_from_trip(struct
 			thermal_governor_update_tz(tz, THERMAL_TZ_UNBIND_CDEV);
 
 			mutex_unlock(&cdev->lock);
-			mutex_unlock(&tz->lock);
 			goto unbind;
 		}
 	}
 	mutex_unlock(&cdev->lock);
-	mutex_unlock(&tz->lock);
 
 	return -ENODEV;
 
@@ -945,10 +951,18 @@ int thermal_zone_unbind_cooling_device(s
 				       int trip_index,
 				       struct thermal_cooling_device *cdev)
 {
+	int ret;
+
 	if (trip_index < 0 || trip_index >= tz->num_trips)
 		return -EINVAL;
 
-	return thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
+	mutex_lock(&tz->lock);
+
+	ret = thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
+
+	mutex_unlock(&tz->lock);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(thermal_zone_unbind_cooling_device);
 
Index: linux-pm/drivers/acpi/thermal.c
===================================================================
--- linux-pm.orig/drivers/acpi/thermal.c
+++ linux-pm/drivers/acpi/thermal.c
@@ -609,7 +609,7 @@ static int acpi_thermal_bind_unbind_cdev
 		.thermal = thermal, .cdev = cdev, .bind = bind
 	};
 
-	return for_each_thermal_trip(thermal, bind_unbind_cdev_cb, &bd);
+	return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
 }
 
 static int





* [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (4 preceding siblings ...)
  2024-08-19 15:58 ` [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions Rafael J. Wysocki
@ 2024-08-19 16:00 ` Rafael J. Wysocki
  2024-08-20  7:06   ` Zhang, Rui
                     ` (2 more replies)
  2024-08-19 16:02 ` [PATCH v3 07/14] thermal: ACPI: Use the " Rafael J. Wysocki
                   ` (8 subsequent siblings)
  14 siblings, 3 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:00 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The current design of the code binding cooling devices to trip points in
thermal zones is convoluted and hard to follow.

Namely, a driver that registers a thermal zone can provide .bind()
and .unbind() operations for it, which are required to call either
thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(),
respectively, or thermal_zone_bind_cooling_device() and
thermal_zone_unbind_cooling_device(), respectively, for every relevant
trip point and the given cooling device.  Moreover, if .bind() is
provided and .unbind() is not, the cleanup necessary during the removal
of a thermal zone or a cooling device may not be carried out.

In other words, the core relies on the thermal zone owners to do the
right thing, which is error prone and far from obvious, even though all
of that is not really necessary.  Specifically, if the core could ask
the thermal zone owner, through a special thermal zone callback, whether
or not a given cooling device should be bound to a given trip point in
the given thermal zone, it might as well carry out all of the binding
and unbinding by itself.  In particular, the unbinding can be done
automatically without involving the thermal zone owner at all because
all of the thermal instances associated with a thermal zone or cooling
device going away must be deleted regardless.

Accordingly, introduce a new thermal zone operation, .should_bind(),
that can be invoked by the thermal core for a given thermal zone,
trip point and cooling device combination in order to check whether
or not the cooling device should be bound to the trip point at hand.
It takes an additional cooling_spec argument allowing the thermal
zone owner to specify the highest and lowest cooling states of the
cooling device and its weight for the given trip point binding.

Make the thermal core use this operation, if present, in the absence of
.bind() and .unbind().  Note that .should_bind() will be called under
the thermal zone lock.
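
The division of labor described above can be sketched in userspace roughly as
follows.  This is a simplified model, not the code from the patch: the
structures, the NO_LIMIT and WEIGHT_DEFAULT values, demo_should_bind() and
zone_bind_cdev() are all hypothetical, locking is left out, and the "binding"
is reduced to counting the trips for which the callback returned true.

```c
#include <stdbool.h>
#include <stddef.h>

#define NO_LIMIT	(~0UL)	/* illustrative stand-in for THERMAL_NO_LIMIT */
#define WEIGHT_DEFAULT	0	/* illustrative stand-in for the default weight */

struct cdev {
	const char *type;
};

struct trip {
	int temperature;
	const void *priv;	/* here: the cdev this trip wants, or NULL */
};

struct cooling_spec {
	unsigned long upper;	/* highest cooling state */
	unsigned long lower;	/* lowest cooling state */
	unsigned int weight;	/* cooling device weight */
};

struct zone;

typedef bool (*should_bind_fn)(struct zone *, const struct trip *,
			       struct cdev *, struct cooling_spec *);

struct zone {
	struct trip *trips;
	size_t num_trips;
	should_bind_fn should_bind;
};

/* Driver side: only answers the question and may adjust the spec. */
static bool demo_should_bind(struct zone *z, const struct trip *trip,
			     struct cdev *cdev, struct cooling_spec *c)
{
	(void)z;

	if (trip->priv != cdev)
		return false;

	c->weight = 10;
	return true;
}

/* Core side: walks the trips itself; returns the number of bindings. */
static size_t zone_bind_cdev(struct zone *z, struct cdev *cdev)
{
	size_t i, bound = 0;

	if (!z->should_bind)
		return 0;

	for (i = 0; i < z->num_trips; i++) {
		struct cooling_spec c = {
			.upper = NO_LIMIT,
			.lower = NO_LIMIT,
			.weight = WEIGHT_DEFAULT,
		};

		/* The real core would create a thermal instance here. */
		if (z->should_bind(z, &z->trips[i], cdev, &c))
			bound++;
	}

	return bound;
}
```

The driver callback never walks the trips and never calls back into the core,
which is the simplification the new operation is meant to allow.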

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3: No changes (previously [08/17])

---
 drivers/thermal/thermal_core.c |  106 +++++++++++++++++++++++++++++++----------
 include/linux/thermal.h        |   10 +++
 2 files changed, 92 insertions(+), 24 deletions(-)

Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -85,11 +85,21 @@ struct thermal_trip {
 
 struct thermal_zone_device;
 
+struct cooling_spec {
+	unsigned long upper;	/* Highest cooling state  */
+	unsigned long lower;	/* Lowest cooling state  */
+	unsigned int weight;	/* Cooling device weight */
+};
+
 struct thermal_zone_device_ops {
 	int (*bind) (struct thermal_zone_device *,
 		     struct thermal_cooling_device *);
 	int (*unbind) (struct thermal_zone_device *,
 		       struct thermal_cooling_device *);
+	bool (*should_bind) (struct thermal_zone_device *,
+			     const struct thermal_trip *,
+			     struct thermal_cooling_device *,
+			     struct cooling_spec *);
 	int (*get_temp) (struct thermal_zone_device *, int *);
 	int (*set_trips) (struct thermal_zone_device *, int, int);
 	int (*change_mode) (struct thermal_zone_device *,
Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -991,12 +991,61 @@ static struct class *thermal_class;
 
 static inline
 void print_bind_err_msg(struct thermal_zone_device *tz,
+			const struct thermal_trip *trip,
 			struct thermal_cooling_device *cdev, int ret)
 {
+	if (trip) {
+		dev_err(&tz->device, "binding cdev %s to trip %d failed: %d\n",
+			cdev->type, thermal_zone_trip_id(tz, trip), ret);
+		return;
+	}
+
 	dev_err(&tz->device, "binding zone %s with cdev %s failed:%d\n",
 		tz->type, cdev->type, ret);
 }
 
+static void thermal_zone_cdev_binding(struct thermal_zone_device *tz,
+				      struct thermal_cooling_device *cdev)
+{
+	struct thermal_trip_desc *td;
+	int ret;
+
+	/*
+	 * Old-style binding. The .bind() callback is expected to call
+	 * thermal_bind_cdev_to_trip() under the thermal zone lock.
+	 */
+	if (tz->ops.bind) {
+		ret = tz->ops.bind(tz, cdev);
+		if (ret)
+			print_bind_err_msg(tz, NULL, cdev, ret);
+
+		return;
+	}
+
+	if (!tz->ops.should_bind)
+		return;
+
+	mutex_lock(&tz->lock);
+
+	for_each_trip_desc(tz, td) {
+		struct thermal_trip *trip = &td->trip;
+		struct cooling_spec c = {
+			.upper = THERMAL_NO_LIMIT,
+			.lower = THERMAL_NO_LIMIT,
+			.weight = THERMAL_WEIGHT_DEFAULT
+		};
+
+		if (tz->ops.should_bind(tz, trip, cdev, &c)) {
+			ret = thermal_bind_cdev_to_trip(tz, trip, cdev, c.upper,
+							c.lower, c.weight);
+			if (ret)
+				print_bind_err_msg(tz, trip, cdev, ret);
+		}
+	}
+
+	mutex_unlock(&tz->lock);
+}
+
 /**
  * __thermal_cooling_device_register() - register a new thermal cooling device
  * @np:		a pointer to a device tree node.
@@ -1092,13 +1141,8 @@ __thermal_cooling_device_register(struct
 	list_add(&cdev->node, &thermal_cdev_list);
 
 	/* Update binding information for 'this' new cdev */
-	list_for_each_entry(pos, &thermal_tz_list, node) {
-		if (pos->ops.bind) {
-			ret = pos->ops.bind(pos, cdev);
-			if (ret)
-				print_bind_err_msg(pos, cdev, ret);
-		}
-	}
+	list_for_each_entry(pos, &thermal_tz_list, node)
+		thermal_zone_cdev_binding(pos, cdev);
 
 	list_for_each_entry(pos, &thermal_tz_list, node)
 		if (atomic_cmpxchg(&pos->need_update, 1, 0))
@@ -1299,6 +1343,28 @@ unlock_list:
 }
 EXPORT_SYMBOL_GPL(thermal_cooling_device_update);
 
+static void thermal_zone_cdev_unbinding(struct thermal_zone_device *tz,
+					struct thermal_cooling_device *cdev)
+{
+	struct thermal_trip_desc *td;
+
+	/*
+	 * Old-style unbinding.  The .unbind callback is expected to call
+	 * thermal_unbind_cdev_from_trip() under the thermal zone lock.
+	 */
+	if (tz->ops.unbind) {
+		tz->ops.unbind(tz, cdev);
+		return;
+	}
+
+	mutex_lock(&tz->lock);
+
+	for_each_trip_desc(tz, td)
+		thermal_unbind_cdev_from_trip(tz, &td->trip, cdev);
+
+	mutex_unlock(&tz->lock);
+}
+
 /**
  * thermal_cooling_device_unregister - removes a thermal cooling device
  * @cdev:	the thermal cooling device to remove.
@@ -1325,10 +1391,8 @@ void thermal_cooling_device_unregister(s
 	list_del(&cdev->node);
 
 	/* Unbind all thermal zones associated with 'this' cdev */
-	list_for_each_entry(tz, &thermal_tz_list, node) {
-		if (tz->ops.unbind)
-			tz->ops.unbind(tz, cdev);
-	}
+	list_for_each_entry(tz, &thermal_tz_list, node)
+		thermal_zone_cdev_unbinding(tz, cdev);
 
 	mutex_unlock(&thermal_list_lock);
 
@@ -1403,6 +1467,7 @@ thermal_zone_device_register_with_trips(
 					unsigned int polling_delay)
 {
 	const struct thermal_trip *trip = trips;
+	struct thermal_cooling_device *cdev;
 	struct thermal_zone_device *tz;
 	struct thermal_trip_desc *td;
 	int id;
@@ -1425,8 +1490,9 @@ thermal_zone_device_register_with_trips(
 		return ERR_PTR(-EINVAL);
 	}
 
-	if (!ops || !ops->get_temp) {
-		pr_err("Thermal zone device ops not defined\n");
+	if (!ops || !ops->get_temp || (ops->should_bind && ops->bind) ||
+	    (ops->should_bind && ops->unbind)) {
+		pr_err("Thermal zone device ops not defined or invalid\n");
 		return ERR_PTR(-EINVAL);
 	}
 
@@ -1539,15 +1605,8 @@ thermal_zone_device_register_with_trips(
 	mutex_unlock(&tz->lock);
 
 	/* Bind cooling devices for this zone */
-	if (tz->ops.bind) {
-		struct thermal_cooling_device *cdev;
-
-		list_for_each_entry(cdev, &thermal_cdev_list, node) {
-			result = tz->ops.bind(tz, cdev);
-			if (result)
-				print_bind_err_msg(tz, cdev, result);
-		}
-	}
+	list_for_each_entry(cdev, &thermal_cdev_list, node)
+		thermal_zone_cdev_binding(tz, cdev);
 
 	mutex_unlock(&thermal_list_lock);
 
@@ -1641,8 +1700,7 @@ void thermal_zone_device_unregister(stru
 
 	/* Unbind all cdevs associated with 'this' thermal zone */
 	list_for_each_entry(cdev, &thermal_cdev_list, node)
-		if (tz->ops.unbind)
-			tz->ops.unbind(tz, cdev);
+		thermal_zone_cdev_unbinding(tz, cdev);
 
 	mutex_unlock(&thermal_list_lock);
 





* [PATCH v3 07/14] thermal: ACPI: Use the .should_bind() thermal zone callback
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (5 preceding siblings ...)
  2024-08-19 16:00 ` [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback Rafael J. Wysocki
@ 2024-08-19 16:02 ` Rafael J. Wysocki
  2024-08-20  7:06   ` Zhang, Rui
  2024-08-21 13:22   ` Daniel Lezcano
  2024-08-19 16:05 ` [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() Rafael J. Wysocki
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:02 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the ACPI thermal zone driver use the .should_bind() thermal zone
callback to provide the thermal core with the information on whether or
not to bind the given cooling device to the given trip point in the
given thermal zone.  If it returns 'true', the thermal core will bind
the cooling device to the trip and the corresponding unbinding will be
taken care of automatically by the core on the removal of the involved
thermal zone or cooling device.

This replaces the .bind() and .unbind() thermal zone callbacks, which
allows the code to be simplified quite significantly while providing
the same functionality.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3: No changes (previously [09/17])

This patch only depends on the previous one introducing the .should_bind()
thermal zone callback.

---
 drivers/acpi/thermal.c |   64 ++++++-------------------------------------------
 1 file changed, 9 insertions(+), 55 deletions(-)

Index: linux-pm/drivers/acpi/thermal.c
===================================================================
--- linux-pm.orig/drivers/acpi/thermal.c
+++ linux-pm/drivers/acpi/thermal.c
@@ -558,77 +558,31 @@ static void acpi_thermal_zone_device_cri
 	thermal_zone_device_critical(thermal);
 }
 
-struct acpi_thermal_bind_data {
-	struct thermal_zone_device *thermal;
-	struct thermal_cooling_device *cdev;
-	bool bind;
-};
-
-static int bind_unbind_cdev_cb(struct thermal_trip *trip, void *arg)
+static bool acpi_thermal_should_bind_cdev(struct thermal_zone_device *thermal,
+					  const struct thermal_trip *trip,
+					  struct thermal_cooling_device *cdev,
+					  struct cooling_spec *c)
 {
 	struct acpi_thermal_trip *acpi_trip = trip->priv;
-	struct acpi_thermal_bind_data *bd = arg;
-	struct thermal_zone_device *thermal = bd->thermal;
-	struct thermal_cooling_device *cdev = bd->cdev;
 	struct acpi_device *cdev_adev = cdev->devdata;
 	int i;
 
 	/* Skip critical and hot trips. */
 	if (!acpi_trip)
-		return 0;
+		return false;
 
 	for (i = 0; i < acpi_trip->devices.count; i++) {
 		acpi_handle handle = acpi_trip->devices.handles[i];
-		struct acpi_device *adev = acpi_fetch_acpi_dev(handle);
-
-		if (adev != cdev_adev)
-			continue;
 
-		if (bd->bind) {
-			int ret;
-
-			ret = thermal_bind_cdev_to_trip(thermal, trip, cdev,
-							THERMAL_NO_LIMIT,
-							THERMAL_NO_LIMIT,
-							THERMAL_WEIGHT_DEFAULT);
-			if (ret)
-				return ret;
-		} else {
-			thermal_unbind_cdev_from_trip(thermal, trip, cdev);
-		}
+		if (acpi_fetch_acpi_dev(handle) == cdev_adev)
+			return true;
 	}
 
-	return 0;
-}
-
-static int acpi_thermal_bind_unbind_cdev(struct thermal_zone_device *thermal,
-					 struct thermal_cooling_device *cdev,
-					 bool bind)
-{
-	struct acpi_thermal_bind_data bd = {
-		.thermal = thermal, .cdev = cdev, .bind = bind
-	};
-
-	return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
-}
-
-static int
-acpi_thermal_bind_cooling_device(struct thermal_zone_device *thermal,
-				 struct thermal_cooling_device *cdev)
-{
-	return acpi_thermal_bind_unbind_cdev(thermal, cdev, true);
-}
-
-static int
-acpi_thermal_unbind_cooling_device(struct thermal_zone_device *thermal,
-				   struct thermal_cooling_device *cdev)
-{
-	return acpi_thermal_bind_unbind_cdev(thermal, cdev, false);
+	return false;
 }
 
 static const struct thermal_zone_device_ops acpi_thermal_zone_ops = {
-	.bind = acpi_thermal_bind_cooling_device,
-	.unbind	= acpi_thermal_unbind_cooling_device,
+	.should_bind = acpi_thermal_should_bind_cdev,
 	.get_temp = thermal_get_temp,
 	.get_trend = thermal_get_trend,
 	.hot = acpi_thermal_zone_device_hot,





* [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (6 preceding siblings ...)
  2024-08-19 16:02 ` [PATCH v3 07/14] thermal: ACPI: Use the " Rafael J. Wysocki
@ 2024-08-19 16:05 ` Rafael J. Wysocki
  2024-08-20  7:08   ` Zhang, Rui
                     ` (2 more replies)
  2024-08-19 16:19 ` [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback Rafael J. Wysocki
                   ` (6 subsequent siblings)
  14 siblings, 3 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:05 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
are only called locally in the thermal core now, they can be static,
so change their definitions accordingly and drop their declarations
from the global thermal header file.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: Rebase after dropping patches [04-05/17] from the series

v1 -> v2: No changes

---
 drivers/thermal/thermal_core.c |   10 ++++------
 include/linux/thermal.h        |    8 --------
 2 files changed, 4 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -773,7 +773,7 @@ struct thermal_zone_device *thermal_zone
  *
  * Return: 0 on success, the proper error value otherwise.
  */
-int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
+static int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
 				     const struct thermal_trip *trip,
 				     struct thermal_cooling_device *cdev,
 				     unsigned long upper, unsigned long lower,
@@ -877,7 +877,6 @@ free_mem:
 	kfree(dev);
 	return result;
 }
-EXPORT_SYMBOL_GPL(thermal_bind_cdev_to_trip);
 
 int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
 				     int trip_index,
@@ -913,9 +912,9 @@ EXPORT_SYMBOL_GPL(thermal_zone_bind_cool
  *
  * Return: 0 on success, the proper error value otherwise.
  */
-int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
-				  const struct thermal_trip *trip,
-				  struct thermal_cooling_device *cdev)
+static int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
+					 const struct thermal_trip *trip,
+					 struct thermal_cooling_device *cdev)
 {
 	struct thermal_instance *pos, *next;
 
@@ -945,7 +944,6 @@ unbind:
 	kfree(pos);
 	return 0;
 }
-EXPORT_SYMBOL_GPL(thermal_unbind_cdev_from_trip);
 
 int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
 				       int trip_index,
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -247,18 +247,10 @@ const char *thermal_zone_device_type(str
 int thermal_zone_device_id(struct thermal_zone_device *tzd);
 struct device *thermal_zone_device(struct thermal_zone_device *tzd);
 
-int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
-			      const struct thermal_trip *trip,
-			      struct thermal_cooling_device *cdev,
-			      unsigned long upper, unsigned long lower,
-			      unsigned int weight);
 int thermal_zone_bind_cooling_device(struct thermal_zone_device *, int,
 				     struct thermal_cooling_device *,
 				     unsigned long, unsigned long,
 				     unsigned int);
-int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
-				  const struct thermal_trip *trip,
-				  struct thermal_cooling_device *cdev);
 int thermal_zone_unbind_cooling_device(struct thermal_zone_device *, int,
 				       struct thermal_cooling_device *);
 void thermal_zone_device_update(struct thermal_zone_device *,





* [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (7 preceding siblings ...)
  2024-08-19 16:05 ` [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() Rafael J. Wysocki
@ 2024-08-19 16:19 ` Rafael J. Wysocki
  2024-08-19 20:24   ` Peter Kästle
  2024-08-21 13:25   ` Daniel Lezcano
  2024-08-19 16:24 ` [PATCH v3 10/14] mlxsw: core_thermal: " Rafael J. Wysocki
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:19 UTC (permalink / raw)
  To: Linux PM
  Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Hans de Goede,
	Peter Kaestle, platform-driver-x86

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the acerhdf driver use the .should_bind() thermal zone
callback to provide the thermal core with the information on whether or
not to bind the given cooling device to the given trip point in the
given thermal zone.  If it returns 'true', the thermal core will bind
the cooling device to the trip and the corresponding unbinding will be
taken care of automatically by the core on the removal of the involved
thermal zone or cooling device.

The previously existing acerhdf_bind() function bound cooling devices
to thermal trip point 0 only, so the new callback needs to return 'true'
for trip point 0.  However, it is straightforward to observe that trip
point 0 is an active trip point and the only other trip point in the
driver's thermal zone is a critical one, so it is sufficient to return
'true' from that callback if the type of the given trip point is
THERMAL_TRIP_ACTIVE.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Hans de Goede <hdegoede@redhat.com>
---

v2 -> v3: Reorder (previously [12/17]) and add the ACK from Hans

v1 -> v2: No changes

This patch only depends on patch [06/14], which introduces the
.should_bind() thermal zone callback:

https://lore.kernel.org/linux-pm/9334403.CDJkKcVGEf@rjwysocki.net/
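
For illustration only, the reasoning from the changelog (matching on
the trip type selects exactly the trip that used to be index 0) can be
checked with a small user-space model.  This is a sketch under stated
assumptions: the mock_* types and own_cdev are hypothetical stand-ins
for the kernel structures and for the driver's cl_dev pointer.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
enum mock_trip_type { MOCK_TRIP_ACTIVE, MOCK_TRIP_CRITICAL };

struct mock_trip {
	enum mock_trip_type type;
};

struct mock_cdev {
	int id;
};

/* The driver's own cooling device (plays the role of cl_dev). */
static const struct mock_cdev *own_cdev;

/* Bind only the driver's own cooling device, and only to active trips. */
static bool mock_should_bind(const struct mock_trip *trip,
			     const struct mock_cdev *cdev)
{
	return cdev == own_cdev && trip->type == MOCK_TRIP_ACTIVE;
}
```

With a zone made of one active trip followed by one critical trip, the
function returns true for the first trip only, so no trip index is
needed.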

---
 drivers/platform/x86/acerhdf.c |   33 ++++++---------------------------
 1 file changed, 6 insertions(+), 27 deletions(-)

Index: linux-pm/drivers/platform/x86/acerhdf.c
===================================================================
--- linux-pm.orig/drivers/platform/x86/acerhdf.c
+++ linux-pm/drivers/platform/x86/acerhdf.c
@@ -378,33 +378,13 @@ static int acerhdf_get_ec_temp(struct th
 	return 0;
 }
 
-static int acerhdf_bind(struct thermal_zone_device *thermal,
-			struct thermal_cooling_device *cdev)
+static bool acerhdf_should_bind(struct thermal_zone_device *thermal,
+				const struct thermal_trip *trip,
+				struct thermal_cooling_device *cdev,
+				struct cooling_spec *c)
 {
 	/* if the cooling device is the one from acerhdf bind it */
-	if (cdev != cl_dev)
-		return 0;
-
-	if (thermal_zone_bind_cooling_device(thermal, 0, cdev,
-			THERMAL_NO_LIMIT, THERMAL_NO_LIMIT,
-			THERMAL_WEIGHT_DEFAULT)) {
-		pr_err("error binding cooling dev\n");
-		return -EINVAL;
-	}
-	return 0;
-}
-
-static int acerhdf_unbind(struct thermal_zone_device *thermal,
-			  struct thermal_cooling_device *cdev)
-{
-	if (cdev != cl_dev)
-		return 0;
-
-	if (thermal_zone_unbind_cooling_device(thermal, 0, cdev)) {
-		pr_err("error unbinding cooling dev\n");
-		return -EINVAL;
-	}
-	return 0;
+	return cdev == cl_dev && trip->type == THERMAL_TRIP_ACTIVE;
 }
 
 static inline void acerhdf_revert_to_bios_mode(void)
@@ -447,8 +427,7 @@ static int acerhdf_get_crit_temp(struct
 
 /* bind callback functions to thermalzone */
 static struct thermal_zone_device_ops acerhdf_dev_ops = {
-	.bind = acerhdf_bind,
-	.unbind = acerhdf_unbind,
+	.should_bind = acerhdf_should_bind,
 	.get_temp = acerhdf_get_ec_temp,
 	.change_mode = acerhdf_change_mode,
 	.get_crit_temp = acerhdf_get_crit_temp,





* [PATCH v3 10/14] mlxsw: core_thermal: Use the .should_bind() thermal zone callback
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (8 preceding siblings ...)
  2024-08-19 16:19 ` [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback Rafael J. Wysocki
@ 2024-08-19 16:24 ` Rafael J. Wysocki
  2024-08-19 16:26 ` [PATCH v3 11/14] thermal: imx: " Rafael J. Wysocki
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:24 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Ido Schimmel,
	netdev

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the mlxsw core_thermal driver use the .should_bind() thermal zone
callback to provide the thermal core with the information on whether or
not to bind the given cooling device to the given trip point in the
given thermal zone.  If it returns 'true', the thermal core will bind
the cooling device to the trip and the corresponding unbinding will be
taken care of automatically by the core on the removal of the involved
thermal zone or cooling device.

It replaces the .bind() and .unbind() thermal zone callbacks (in 3
places), which assumed that trip points were ordered identically in
the driver and in the thermal core (an assumption that may no longer
hold in the future).  The .bind() callbacks looped over trip point
indices to call thermal_zone_bind_cooling_device() for the same cdev
(once it had been verified) and each of the trip points, passing
different 'upper' and 'lower' values to it for each trip.

To retain the original functionality, the .should_bind() callbacks
need to use the same 'upper' and 'lower' values that would be used
by the corresponding .bind() callbacks when they are about to return
'true'.  To that end, the 'priv' field of each trip is set during the
thermal zone initialization to point to the corresponding 'state'
object containing the maximum and minimum cooling states of the
cooling device.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
---

v2 -> v3:
   * Add R-by from Ido
   * Reorder (previously [13/17])

v1 -> v2:
   * Fix typo in the changelog.
   * Do not move the mlxsw_thermal_ops definition.
   * Change ordering of local variables in mlxsw_thermal_module_should_bind().

This patch only depends on patch [06/14], which introduces the
.should_bind() thermal zone callback:

https://lore.kernel.org/linux-pm/9334403.CDJkKcVGEf@rjwysocki.net/
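
For illustration only, the 'priv' approach described in the changelog
can be modeled in user space as below.  This is a minimal sketch, not
the driver code: the mock_* types are hypothetical stand-ins, and the
check that the cooling device belongs to the driver is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NUM_TRIPS 3

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct mock_cooling_states {
	unsigned long min_state;
	unsigned long max_state;
};

struct mock_trip {
	void *priv;
};

struct mock_cooling_spec {
	unsigned long upper;
	unsigned long lower;
};

struct mock_zone {
	struct mock_trip trips[MOCK_NUM_TRIPS];
	struct mock_cooling_states cooling_states[MOCK_NUM_TRIPS];
};

/* Done once at initialization: tie each trip to its own state limits. */
static void mock_zone_init(struct mock_zone *z)
{
	size_t i;

	for (i = 0; i < MOCK_NUM_TRIPS; i++)
		z->trips[i].priv = &z->cooling_states[i];
}

/* Per-trip callback: the limits come from trip->priv, not from an index. */
static bool mock_should_bind(const struct mock_trip *trip,
			     struct mock_cooling_spec *c)
{
	const struct mock_cooling_states *state = trip->priv;

	c->upper = state->max_state;
	c->lower = state->min_state;

	return true;
}
```

Because each trip carries a pointer to its own limits, the values
reported to the caller stay correct even if the trips end up in a
different order than the driver's cooling_states[] array.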

---
 drivers/net/ethernet/mellanox/mlxsw/core_thermal.c |  115 +++++----------------
 1 file changed, 31 insertions(+), 84 deletions(-)

Index: linux-pm/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
===================================================================
--- linux-pm.orig/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
+++ linux-pm/drivers/net/ethernet/mellanox/mlxsw/core_thermal.c
@@ -165,52 +165,22 @@ static int mlxsw_get_cooling_device_idx(
 	return -ENODEV;
 }
 
-static int mlxsw_thermal_bind(struct thermal_zone_device *tzdev,
-			      struct thermal_cooling_device *cdev)
+static bool mlxsw_thermal_should_bind(struct thermal_zone_device *tzdev,
+				      const struct thermal_trip *trip,
+				      struct thermal_cooling_device *cdev,
+				      struct cooling_spec *c)
 {
 	struct mlxsw_thermal *thermal = thermal_zone_device_priv(tzdev);
-	struct device *dev = thermal->bus_info->dev;
-	int i, err;
+	const struct mlxsw_cooling_states *state = trip->priv;
 
 	/* If the cooling device is one of ours bind it */
 	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
-		return 0;
-
-	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
-		const struct mlxsw_cooling_states *state = &thermal->cooling_states[i];
-
-		err = thermal_zone_bind_cooling_device(tzdev, i, cdev,
-						       state->max_state,
-						       state->min_state,
-						       THERMAL_WEIGHT_DEFAULT);
-		if (err < 0) {
-			dev_err(dev, "Failed to bind cooling device to trip %d\n", i);
-			return err;
-		}
-	}
-	return 0;
-}
-
-static int mlxsw_thermal_unbind(struct thermal_zone_device *tzdev,
-				struct thermal_cooling_device *cdev)
-{
-	struct mlxsw_thermal *thermal = thermal_zone_device_priv(tzdev);
-	struct device *dev = thermal->bus_info->dev;
-	int i;
-	int err;
+		return false;
 
-	/* If the cooling device is our one unbind it */
-	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
-		return 0;
+	c->upper = state->max_state;
+	c->lower = state->min_state;
 
-	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
-		err = thermal_zone_unbind_cooling_device(tzdev, i, cdev);
-		if (err < 0) {
-			dev_err(dev, "Failed to unbind cooling device\n");
-			return err;
-		}
-	}
-	return 0;
+	return true;
 }
 
 static int mlxsw_thermal_get_temp(struct thermal_zone_device *tzdev,
@@ -240,57 +210,27 @@ static struct thermal_zone_params mlxsw_
 };
 
 static struct thermal_zone_device_ops mlxsw_thermal_ops = {
-	.bind = mlxsw_thermal_bind,
-	.unbind = mlxsw_thermal_unbind,
+	.should_bind = mlxsw_thermal_should_bind,
 	.get_temp = mlxsw_thermal_get_temp,
 };
 
-static int mlxsw_thermal_module_bind(struct thermal_zone_device *tzdev,
-				     struct thermal_cooling_device *cdev)
+static bool mlxsw_thermal_module_should_bind(struct thermal_zone_device *tzdev,
+					     const struct thermal_trip *trip,
+					     struct thermal_cooling_device *cdev,
+					     struct cooling_spec *c)
 {
 	struct mlxsw_thermal_module *tz = thermal_zone_device_priv(tzdev);
+	const struct mlxsw_cooling_states *state = trip->priv;
 	struct mlxsw_thermal *thermal = tz->parent;
-	int i, j, err;
 
 	/* If the cooling device is one of ours bind it */
 	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
-		return 0;
-
-	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
-		const struct mlxsw_cooling_states *state = &tz->cooling_states[i];
+		return false;
 
-		err = thermal_zone_bind_cooling_device(tzdev, i, cdev,
-						       state->max_state,
-						       state->min_state,
-						       THERMAL_WEIGHT_DEFAULT);
-		if (err < 0)
-			goto err_thermal_zone_bind_cooling_device;
-	}
-	return 0;
-
-err_thermal_zone_bind_cooling_device:
-	for (j = i - 1; j >= 0; j--)
-		thermal_zone_unbind_cooling_device(tzdev, j, cdev);
-	return err;
-}
-
-static int mlxsw_thermal_module_unbind(struct thermal_zone_device *tzdev,
-				       struct thermal_cooling_device *cdev)
-{
-	struct mlxsw_thermal_module *tz = thermal_zone_device_priv(tzdev);
-	struct mlxsw_thermal *thermal = tz->parent;
-	int i;
-	int err;
+	c->upper = state->max_state;
+	c->lower = state->min_state;
 
-	/* If the cooling device is one of ours unbind it */
-	if (mlxsw_get_cooling_device_idx(thermal, cdev) < 0)
-		return 0;
-
-	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++) {
-		err = thermal_zone_unbind_cooling_device(tzdev, i, cdev);
-		WARN_ON(err);
-	}
-	return err;
+	return true;
 }
 
 static int mlxsw_thermal_module_temp_get(struct thermal_zone_device *tzdev,
@@ -313,8 +253,7 @@ static int mlxsw_thermal_module_temp_get
 }
 
 static struct thermal_zone_device_ops mlxsw_thermal_module_ops = {
-	.bind		= mlxsw_thermal_module_bind,
-	.unbind		= mlxsw_thermal_module_unbind,
+	.should_bind	= mlxsw_thermal_module_should_bind,
 	.get_temp	= mlxsw_thermal_module_temp_get,
 };
 
@@ -342,8 +281,7 @@ static int mlxsw_thermal_gearbox_temp_ge
 }
 
 static struct thermal_zone_device_ops mlxsw_thermal_gearbox_ops = {
-	.bind		= mlxsw_thermal_module_bind,
-	.unbind		= mlxsw_thermal_module_unbind,
+	.should_bind	= mlxsw_thermal_module_should_bind,
 	.get_temp	= mlxsw_thermal_gearbox_temp_get,
 };
 
@@ -451,6 +389,7 @@ mlxsw_thermal_module_init(struct device
 			  struct mlxsw_thermal_area *area, u8 module)
 {
 	struct mlxsw_thermal_module *module_tz;
+	int i;
 
 	module_tz = &area->tz_module_arr[module];
 	/* Skip if parent is already set (case of port split). */
@@ -465,6 +404,8 @@ mlxsw_thermal_module_init(struct device
 	       sizeof(thermal->trips));
 	memcpy(module_tz->cooling_states, default_cooling_states,
 	       sizeof(thermal->cooling_states));
+	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++)
+		module_tz->trips[i].priv = &module_tz->cooling_states[i];
 }
 
 static void mlxsw_thermal_module_fini(struct mlxsw_thermal_module *module_tz)
@@ -579,7 +520,7 @@ mlxsw_thermal_gearboxes_init(struct devi
 	struct mlxsw_thermal_module *gearbox_tz;
 	char mgpir_pl[MLXSW_REG_MGPIR_LEN];
 	u8 gbox_num;
-	int i;
+	int i, j;
 	int err;
 
 	mlxsw_reg_mgpir_pack(mgpir_pl, area->slot_index);
@@ -606,6 +547,9 @@ mlxsw_thermal_gearboxes_init(struct devi
 		       sizeof(thermal->trips));
 		memcpy(gearbox_tz->cooling_states, default_cooling_states,
 		       sizeof(thermal->cooling_states));
+		for (j = 0; j < MLXSW_THERMAL_NUM_TRIPS; j++)
+			gearbox_tz->trips[j].priv = &gearbox_tz->cooling_states[j];
+
 		gearbox_tz->module = i;
 		gearbox_tz->parent = thermal;
 		gearbox_tz->slot_index = area->slot_index;
@@ -722,6 +666,9 @@ int mlxsw_thermal_init(struct mlxsw_core
 	thermal->bus_info = bus_info;
 	memcpy(thermal->trips, default_thermal_trips, sizeof(thermal->trips));
 	memcpy(thermal->cooling_states, default_cooling_states, sizeof(thermal->cooling_states));
+	for (i = 0; i < MLXSW_THERMAL_NUM_TRIPS; i++)
+		thermal->trips[i].priv = &thermal->cooling_states[i];
+
 	thermal->line_cards[0].slot_index = 0;
 
 	err = mlxsw_reg_query(thermal->core, MLXSW_REG(mfcr), mfcr_pl);





* [PATCH v3 11/14] thermal: imx: Use the .should_bind() thermal zone callback
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (9 preceding siblings ...)
  2024-08-19 16:24 ` [PATCH v3 10/14] mlxsw: core_thermal: " Rafael J. Wysocki
@ 2024-08-19 16:26 ` Rafael J. Wysocki
  2024-08-21 13:42   ` Daniel Lezcano
  2024-08-19 16:30 ` [PATCH v3 12/14] thermal/of: " Rafael J. Wysocki
                   ` (3 subsequent siblings)
  14 siblings, 1 reply; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:26 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the imx_thermal driver use the .should_bind() thermal zone callback
to provide the thermal core with the information on whether or not to
bind the given cooling device to the given trip point in the given
thermal zone.  If it returns 'true', the thermal core will bind the
cooling device to the trip and the corresponding unbinding will be
taken care of automatically by the core on the removal of the involved
thermal zone or cooling device.

In the imx_thermal case, it only needs to return 'true' for the passive
trip point and it will match any cooling device passed to it, in
analogy with the old-style imx_bind() callback function.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 -> v3: No changes

This patch only depends on patch [06/14], which introduces the
.should_bind() thermal zone callback:

https://lore.kernel.org/linux-pm/9334403.CDJkKcVGEf@rjwysocki.net/

---
 drivers/thermal/imx_thermal.c |   20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

Index: linux-pm/drivers/thermal/imx_thermal.c
===================================================================
--- linux-pm.orig/drivers/thermal/imx_thermal.c
+++ linux-pm/drivers/thermal/imx_thermal.c
@@ -353,24 +353,16 @@ static int imx_set_trip_temp(struct ther
 	return 0;
 }
 
-static int imx_bind(struct thermal_zone_device *tz,
-		    struct thermal_cooling_device *cdev)
+static bool imx_should_bind(struct thermal_zone_device *tz,
+			    const struct thermal_trip *trip,
+			    struct thermal_cooling_device *cdev,
+			    struct cooling_spec *c)
 {
-	return thermal_zone_bind_cooling_device(tz, IMX_TRIP_PASSIVE, cdev,
-						THERMAL_NO_LIMIT,
-						THERMAL_NO_LIMIT,
-						THERMAL_WEIGHT_DEFAULT);
-}
-
-static int imx_unbind(struct thermal_zone_device *tz,
-		      struct thermal_cooling_device *cdev)
-{
-	return thermal_zone_unbind_cooling_device(tz, IMX_TRIP_PASSIVE, cdev);
+	return trip->type == THERMAL_TRIP_PASSIVE;
 }
 
 static struct thermal_zone_device_ops imx_tz_ops = {
-	.bind = imx_bind,
-	.unbind = imx_unbind,
+	.should_bind = imx_should_bind,
 	.get_temp = imx_get_temp,
 	.change_mode = imx_change_mode,
 	.set_trip_temp = imx_set_trip_temp,





* [PATCH v3 12/14] thermal/of: Use the .should_bind() thermal zone callback
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (10 preceding siblings ...)
  2024-08-19 16:26 ` [PATCH v3 11/14] thermal: imx: " Rafael J. Wysocki
@ 2024-08-19 16:30 ` Rafael J. Wysocki
  2024-08-21 14:20   ` Daniel Lezcano
  2024-08-26 11:31   ` Marek Szyprowski
  2024-08-19 16:31 ` [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks Rafael J. Wysocki
                   ` (2 subsequent siblings)
  14 siblings, 2 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:30 UTC (permalink / raw)
  To: Linux PM
  Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Krzysztof Kozlowski

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make the thermal_of driver use the .should_bind() thermal zone callback
to provide the thermal core with the information on whether or not to
bind the given cooling device to the given trip point in the given
thermal zone.  If it returns 'true', the thermal core will bind the
cooling device to the trip and the corresponding unbinding will be
taken care of automatically by the core on the removal of the involved
thermal zone or cooling device.

This replaces the .bind() and .unbind() thermal zone callbacks, which
assumed that trip points were ordered identically in the driver and in
the thermal core (an assumption that may no longer hold in the future).
The .bind() callback would walk the given thermal zone's cooling maps
to find all of the valid trip point combinations with the given cooling
device and it would call thermal_zone_bind_cooling_device() for all of
them using trip point indices reflecting the ordering of the trips in
the DT.

The .should_bind() callback still walks the thermal zone's cooling maps,
but it can use the trip object passed to it by the thermal core to find
the trip in question in the first place and then it uses the
corresponding 'cooling-device' entries to look up the given cooling
device.  To be able to match the trip object provided by the thermal
core to a specific device node, the driver sets the 'priv' field of each
trip to the corresponding device node pointer during initialization.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: Reorder (previously [14/17])

v1 -> v2:
   * Fix a build issue (undefined symbol)

This patch only depends on patch [06/14], which introduces the
.should_bind() thermal zone callback:

https://lore.kernel.org/linux-pm/9334403.CDJkKcVGEf@rjwysocki.net/
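
For illustration only, the lookup performed by the new callback can be
modeled in user space as below.  This is a sketch under stated
assumptions, not the driver code: the mock_* types are hypothetical
stand-ins for device nodes and cooling maps, and device node reference
counting and the cooling weight are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for device tree objects; not the real types. */
struct mock_node {
	const char *name;
};

struct mock_cdev_ref {			/* one 'cooling-device' entry */
	const struct mock_node *np;
	unsigned long args[2];		/* args[0]: lower, args[1]: upper */
};

struct mock_map {			/* one child of 'cooling-maps' */
	const struct mock_node *trip_np;
	const struct mock_cdev_ref *cdevs;
	size_t count;
};

struct mock_trip {
	const struct mock_node *priv;	/* set when the trip is populated */
};

struct mock_cooling_spec {
	unsigned long upper;
	unsigned long lower;
};

static bool mock_should_bind(const struct mock_map *maps, size_t nr_maps,
			     const struct mock_trip *trip,
			     const struct mock_node *cdev_np,
			     struct mock_cooling_spec *c)
{
	size_t m, i;

	for (m = 0; m < nr_maps; m++) {
		if (maps[m].trip_np != trip->priv)
			continue;

		/* The trip has been found, look up the cdev. */
		for (i = 0; i < maps[m].count; i++) {
			const struct mock_cdev_ref *ref = &maps[m].cdevs[i];

			if (ref->np != cdev_np)
				continue;

			c->lower = ref->args[0];
			c->upper = ref->args[1];
			return true;
		}

		/* Like the patch, stop at the first map for this trip. */
		return false;
	}

	return false;
}
```

The trip is matched by comparing node pointers, so the position of the
trip in the DT does not matter to the lookup.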

---
 drivers/thermal/thermal_of.c |  171 ++++++++++---------------------------------
 1 file changed, 41 insertions(+), 130 deletions(-)

Index: linux-pm/drivers/thermal/thermal_of.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_of.c
+++ linux-pm/drivers/thermal/thermal_of.c
@@ -20,37 +20,6 @@
 
 /***   functions parsing device tree nodes   ***/
 
-static int of_find_trip_id(struct device_node *np, struct device_node *trip)
-{
-	struct device_node *trips;
-	struct device_node *t;
-	int i = 0;
-
-	trips = of_get_child_by_name(np, "trips");
-	if (!trips) {
-		pr_err("Failed to find 'trips' node\n");
-		return -EINVAL;
-	}
-
-	/*
-	 * Find the trip id point associated with the cooling device map
-	 */
-	for_each_child_of_node(trips, t) {
-
-		if (t == trip) {
-			of_node_put(t);
-			goto out;
-		}
-		i++;
-	}
-
-	i = -ENXIO;
-out:
-	of_node_put(trips);
-
-	return i;
-}
-
 /*
  * It maps 'enum thermal_trip_type' found in include/linux/thermal.h
  * into the device tree binding of 'trip', property type.
@@ -119,6 +88,8 @@ static int thermal_of_populate_trip(stru
 
 	trip->flags = THERMAL_TRIP_FLAG_RW_TEMP;
 
+	trip->priv = np;
+
 	return 0;
 }
 
@@ -290,39 +261,9 @@ static struct device_node *thermal_of_zo
 	return tz_np;
 }
 
-static int __thermal_of_unbind(struct device_node *map_np, int index, int trip_id,
-			       struct thermal_zone_device *tz, struct thermal_cooling_device *cdev)
-{
-	struct of_phandle_args cooling_spec;
-	int ret;
-
-	ret = of_parse_phandle_with_args(map_np, "cooling-device", "#cooling-cells",
-					 index, &cooling_spec);
-
-	if (ret < 0) {
-		pr_err("Invalid cooling-device entry\n");
-		return ret;
-	}
-
-	of_node_put(cooling_spec.np);
-
-	if (cooling_spec.args_count < 2) {
-		pr_err("wrong reference to cooling device, missing limits\n");
-		return -EINVAL;
-	}
-
-	if (cooling_spec.np != cdev->np)
-		return 0;
-
-	ret = thermal_zone_unbind_cooling_device(tz, trip_id, cdev);
-	if (ret)
-		pr_err("Failed to unbind '%s' with '%s': %d\n", tz->type, cdev->type, ret);
-
-	return ret;
-}
-
-static int __thermal_of_bind(struct device_node *map_np, int index, int trip_id,
-			     struct thermal_zone_device *tz, struct thermal_cooling_device *cdev)
+static bool thermal_of_get_cooling_spec(struct device_node *map_np, int index,
+					struct thermal_cooling_device *cdev,
+					struct cooling_spec *c)
 {
 	struct of_phandle_args cooling_spec;
 	int ret, weight = THERMAL_WEIGHT_DEFAULT;
@@ -334,104 +275,75 @@ static int __thermal_of_bind(struct devi
 
 	if (ret < 0) {
 		pr_err("Invalid cooling-device entry\n");
-		return ret;
+		return false;
 	}
 
 	of_node_put(cooling_spec.np);
 
 	if (cooling_spec.args_count < 2) {
 		pr_err("wrong reference to cooling device, missing limits\n");
-		return -EINVAL;
+		return false;
 	}
 
 	if (cooling_spec.np != cdev->np)
-		return 0;
-
-	ret = thermal_zone_bind_cooling_device(tz, trip_id, cdev, cooling_spec.args[1],
-					       cooling_spec.args[0],
-					       weight);
-	if (ret)
-		pr_err("Failed to bind '%s' with '%s': %d\n", tz->type, cdev->type, ret);
-
-	return ret;
-}
-
-static int thermal_of_for_each_cooling_device(struct device_node *tz_np, struct device_node *map_np,
-					      struct thermal_zone_device *tz, struct thermal_cooling_device *cdev,
-					      int (*action)(struct device_node *, int, int,
-							    struct thermal_zone_device *, struct thermal_cooling_device *))
-{
-	struct device_node *tr_np;
-	int count, i, trip_id;
-
-	tr_np = of_parse_phandle(map_np, "trip", 0);
-	if (!tr_np)
-		return -ENODEV;
-
-	trip_id = of_find_trip_id(tz_np, tr_np);
-	if (trip_id < 0)
-		return trip_id;
-
-	count = of_count_phandle_with_args(map_np, "cooling-device", "#cooling-cells");
-	if (count <= 0) {
-		pr_err("Add a cooling_device property with at least one device\n");
-		return -ENOENT;
-	}
+		return false;
 
-	/*
-	 * At this point, we don't want to bail out when there is an
-	 * error, we will try to bind/unbind as many as possible
-	 * cooling devices
-	 */
-	for (i = 0; i < count; i++)
-		action(map_np, i, trip_id, tz, cdev);
+	c->lower = cooling_spec.args[0];
+	c->upper = cooling_spec.args[1];
+	c->weight = weight;
 
-	return 0;
+	return true;
 }
 
-static int thermal_of_for_each_cooling_maps(struct thermal_zone_device *tz,
-					    struct thermal_cooling_device *cdev,
-					    int (*action)(struct device_node *, int, int,
-							  struct thermal_zone_device *, struct thermal_cooling_device *))
+static bool thermal_of_should_bind(struct thermal_zone_device *tz,
+				   const struct thermal_trip *trip,
+				   struct thermal_cooling_device *cdev,
+				   struct cooling_spec *c)
 {
 	struct device_node *tz_np, *cm_np, *child;
-	int ret = 0;
+	bool result = false;
 
 	tz_np = thermal_of_zone_get_by_name(tz);
 	if (IS_ERR(tz_np)) {
 		pr_err("Failed to get node tz by name\n");
-		return PTR_ERR(tz_np);
+		return false;
 	}
 
 	cm_np = of_get_child_by_name(tz_np, "cooling-maps");
 	if (!cm_np)
 		goto out;
 
+	/* Look up the trip and the cdev in the cooling maps. */
 	for_each_child_of_node(cm_np, child) {
-		ret = thermal_of_for_each_cooling_device(tz_np, child, tz, cdev, action);
-		if (ret) {
+		struct device_node *tr_np;
+		int count, i;
+
+		tr_np = of_parse_phandle(child, "trip", 0);
+		if (tr_np != trip->priv) {
 			of_node_put(child);
-			break;
+			continue;
+		}
+
+		/* The trip has been found, look up the cdev. */
+		count = of_count_phandle_with_args(child, "cooling-device", "#cooling-cells");
+		if (count <= 0)
+			pr_err("Add a cooling_device property with at least one device\n");
+
+		for (i = 0; i < count; i++) {
+			result = thermal_of_get_cooling_spec(child, i, cdev, c);
+			if (result)
+				break;
 		}
+
+		of_node_put(child);
+		break;
 	}
 
 	of_node_put(cm_np);
 out:
 	of_node_put(tz_np);
 
-	return ret;
-}
-
-static int thermal_of_bind(struct thermal_zone_device *tz,
-			   struct thermal_cooling_device *cdev)
-{
-	return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_bind);
-}
-
-static int thermal_of_unbind(struct thermal_zone_device *tz,
-			     struct thermal_cooling_device *cdev)
-{
-	return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_unbind);
+	return result;
 }
 
 /**
@@ -502,8 +414,7 @@ static struct thermal_zone_device *therm
 
 	thermal_of_parameters_init(np, &tzp);
 
-	of_ops.bind = thermal_of_bind;
-	of_ops.unbind = thermal_of_unbind;
+	of_ops.should_bind = thermal_of_should_bind;
 
 	ret = of_property_read_string(np, "critical-action", &action);
 	if (!ret)





* [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (11 preceding siblings ...)
  2024-08-19 16:30 ` [PATCH v3 12/14] thermal/of: " Rafael J. Wysocki
@ 2024-08-19 16:31 ` Rafael J. Wysocki
  2024-08-20  7:10   ` Zhang, Rui
                     ` (2 more replies)
  2024-08-19 16:33 ` [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions Rafael J. Wysocki
  2024-08-24 18:45 ` [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Nícolas F. R. A. Prado
  14 siblings, 3 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:31 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

There are no more callers of thermal_zone_bind_cooling_device() and
thermal_zone_unbind_cooling_device(), so drop them along with all of
the corresponding headers, code and documentation.

Moreover, because the .bind() and .unbind() thermal zone callbacks would
only be used when the above functions, respectively, were called, drop
them as well along with all of the code related to them.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: No changes

v1 -> v2:
   * Update the list of thermal zone ops in the documentation.

---
 Documentation/driver-api/thermal/sysfs-api.rst |   59 +------------------
 drivers/thermal/thermal_core.c                 |   75 +------------------------
 include/linux/thermal.h                        |   10 ---
 3 files changed, 6 insertions(+), 138 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -878,28 +878,6 @@ free_mem:
 	return result;
 }
 
-int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
-				     int trip_index,
-				     struct thermal_cooling_device *cdev,
-				     unsigned long upper, unsigned long lower,
-				     unsigned int weight)
-{
-	int ret;
-
-	if (trip_index < 0 || trip_index >= tz->num_trips)
-		return -EINVAL;
-
-	mutex_lock(&tz->lock);
-
-	ret = thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
-					upper, lower, weight);
-
-	mutex_unlock(&tz->lock);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);
-
 /**
  * thermal_unbind_cdev_from_trip - unbind a cooling device from a thermal zone.
  * @tz:		pointer to a struct thermal_zone_device.
@@ -945,25 +923,6 @@ unbind:
 	return 0;
 }
 
-int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
-				       int trip_index,
-				       struct thermal_cooling_device *cdev)
-{
-	int ret;
-
-	if (trip_index < 0 || trip_index >= tz->num_trips)
-		return -EINVAL;
-
-	mutex_lock(&tz->lock);
-
-	ret = thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
-
-	mutex_unlock(&tz->lock);
-
-	return ret;
-}
-EXPORT_SYMBOL_GPL(thermal_zone_unbind_cooling_device);
-
 static void thermal_release(struct device *dev)
 {
 	struct thermal_zone_device *tz;
@@ -992,14 +951,8 @@ void print_bind_err_msg(struct thermal_z
 			const struct thermal_trip *trip,
 			struct thermal_cooling_device *cdev, int ret)
 {
-	if (trip) {
-		dev_err(&tz->device, "binding cdev %s to trip %d failed: %d\n",
-			cdev->type, thermal_zone_trip_id(tz, trip), ret);
-		return;
-	}
-
-	dev_err(&tz->device, "binding zone %s with cdev %s failed:%d\n",
-		tz->type, cdev->type, ret);
+	dev_err(&tz->device, "binding cdev %s to trip %d failed: %d\n",
+		cdev->type, thermal_zone_trip_id(tz, trip), ret);
 }
 
 static void thermal_zone_cdev_binding(struct thermal_zone_device *tz,
@@ -1008,18 +961,6 @@ static void thermal_zone_cdev_binding(st
 	struct thermal_trip_desc *td;
 	int ret;
 
-	/*
-	 * Old-style binding. The .bind() callback is expected to call
-	 * thermal_bind_cdev_to_trip() under the thermal zone lock.
-	 */
-	if (tz->ops.bind) {
-		ret = tz->ops.bind(tz, cdev);
-		if (ret)
-			print_bind_err_msg(tz, NULL, cdev, ret);
-
-		return;
-	}
-
 	if (!tz->ops.should_bind)
 		return;
 
@@ -1346,15 +1287,6 @@ static void thermal_zone_cdev_unbinding(
 {
 	struct thermal_trip_desc *td;
 
-	/*
-	 * Old-style unbinding.  The .unbind callback is expected to call
-	 * thermal_unbind_cdev_from_trip() under the thermal zone lock.
-	 */
-	if (tz->ops.unbind) {
-		tz->ops.unbind(tz, cdev);
-		return;
-	}
-
 	mutex_lock(&tz->lock);
 
 	for_each_trip_desc(tz, td)
@@ -1488,8 +1420,7 @@ thermal_zone_device_register_with_trips(
 		return ERR_PTR(-EINVAL);
 	}
 
-	if (!ops || !ops->get_temp || (ops->should_bind && ops->bind) ||
-	    (ops->should_bind && ops->unbind)) {
+	if (!ops || !ops->get_temp) {
 		pr_err("Thermal zone device ops not defined or invalid\n");
 		return ERR_PTR(-EINVAL);
 	}
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -92,10 +92,6 @@ struct cooling_spec {
 };
 
 struct thermal_zone_device_ops {
-	int (*bind) (struct thermal_zone_device *,
-		     struct thermal_cooling_device *);
-	int (*unbind) (struct thermal_zone_device *,
-		       struct thermal_cooling_device *);
 	bool (*should_bind) (struct thermal_zone_device *,
 			     const struct thermal_trip *,
 			     struct thermal_cooling_device *,
@@ -247,12 +243,6 @@ const char *thermal_zone_device_type(str
 int thermal_zone_device_id(struct thermal_zone_device *tzd);
 struct device *thermal_zone_device(struct thermal_zone_device *tzd);
 
-int thermal_zone_bind_cooling_device(struct thermal_zone_device *, int,
-				     struct thermal_cooling_device *,
-				     unsigned long, unsigned long,
-				     unsigned int);
-int thermal_zone_unbind_cooling_device(struct thermal_zone_device *, int,
-				       struct thermal_cooling_device *);
 void thermal_zone_device_update(struct thermal_zone_device *,
 				enum thermal_notify_event);
 
Index: linux-pm/Documentation/driver-api/thermal/sysfs-api.rst
===================================================================
--- linux-pm.orig/Documentation/driver-api/thermal/sysfs-api.rst
+++ linux-pm/Documentation/driver-api/thermal/sysfs-api.rst
@@ -58,10 +58,9 @@ temperature) and throttle appropriate de
     ops:
 	thermal zone device call-backs.
 
-	.bind:
-		bind the thermal zone device with a thermal cooling device.
-	.unbind:
-		unbind the thermal zone device with a thermal cooling device.
+	.should_bind:
+		check whether or not a given cooling device should be bound to
+		a given trip point in this thermal zone.
 	.get_temp:
 		get the current temperature of the thermal zone.
 	.set_trips:
@@ -246,56 +245,6 @@ temperature) and throttle appropriate de
     It deletes the corresponding entry from /sys/class/thermal folder and
     unbinds itself from all the thermal zone devices using it.
 
-1.3 interface for binding a thermal zone device with a thermal cooling device
------------------------------------------------------------------------------
-
-    ::
-
-	int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
-		int trip, struct thermal_cooling_device *cdev,
-		unsigned long upper, unsigned long lower, unsigned int weight);
-
-    This interface function binds a thermal cooling device to a particular trip
-    point of a thermal zone device.
-
-    This function is usually called in the thermal zone device .bind callback.
-
-    tz:
-	  the thermal zone device
-    cdev:
-	  thermal cooling device
-    trip:
-	  indicates which trip point in this thermal zone the cooling device
-	  is associated with.
-    upper:
-	  the Maximum cooling state for this trip point.
-	  THERMAL_NO_LIMIT means no upper limit,
-	  and the cooling device can be in max_state.
-    lower:
-	  the Minimum cooling state can be used for this trip point.
-	  THERMAL_NO_LIMIT means no lower limit,
-	  and the cooling device can be in cooling state 0.
-    weight:
-	  the influence of this cooling device in this thermal
-	  zone.  See 1.4.1 below for more information.
-
-    ::
-
-	int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
-				int trip, struct thermal_cooling_device *cdev);
-
-    This interface function unbinds a thermal cooling device from a particular
-    trip point of a thermal zone device. This function is usually called in
-    the thermal zone device .unbind callback.
-
-    tz:
-	the thermal zone device
-    cdev:
-	thermal cooling device
-    trip:
-	indicates which trip point in this thermal zone the cooling device
-	is associated with.
-
 1.4 Thermal Zone Parameters
 ---------------------------
 
@@ -366,8 +315,6 @@ Thermal cooling device sys I/F, created
 
 Then next two dynamic attributes are created/removed in pairs. They represent
 the relationship between a thermal zone and its associated cooling device.
-They are created/removed for each successful execution of
-thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.
 
 ::
 




^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (12 preceding siblings ...)
  2024-08-19 16:31 ` [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks Rafael J. Wysocki
@ 2024-08-19 16:33 ` Rafael J. Wysocki
  2024-08-20  7:11   ` Zhang, Rui
                     ` (2 more replies)
  2024-08-24 18:45 ` [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Nícolas F. R. A. Prado
  14 siblings, 3 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-19 16:33 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Make thermal_bind_cdev_to_trip() take a struct cooling_spec pointer
to reduce the number of its arguments, change the return type of
thermal_unbind_cdev_from_trip() to void and rearrange the code in
thermal_zone_cdev_binding() to reduce the indentation level.

No intentional functional impact.
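
For illustration only, the argument-reduction part of this change follows the
usual "pass one specification struct instead of several scalars" pattern.  The
sketch below is a user-space model and not the kernel code: struct spec,
NO_LIMIT and bind_with_spec() are hypothetical stand-ins for struct
cooling_spec, THERMAL_NO_LIMIT and thermal_bind_cdev_to_trip(), and only the
limit normalization visible in the diff is modeled.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for THERMAL_NO_LIMIT. */
#define NO_LIMIT ULONG_MAX

/* Hypothetical stand-in for struct cooling_spec. */
struct spec {
	unsigned long upper;
	unsigned long lower;
	unsigned int weight;
};

/*
 * Normalize and validate the limits in @c against @max_state.
 * Returns 0 on success or -EINVAL if the limits are inconsistent.
 * The defaults are written back into @c, as the diff does with c->lower
 * and c->upper.
 */
static int bind_with_spec(struct spec *c, unsigned long max_state)
{
	assert(c != NULL);

	/* lower defaults to 0, upper defaults to max_state */
	if (c->lower == NO_LIMIT)
		c->lower = 0;

	if (c->upper == NO_LIMIT)
		c->upper = max_state;

	if (c->lower > c->upper || c->upper > max_state)
		return -EINVAL;

	return 0;
}
```

A caller fills in a struct spec with NO_LIMIT defaults, lets a callback adjust
it, and passes its address on, which is roughly what the loop in
thermal_zone_cdev_binding() does with the local variable c.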

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v2 -> v3: Subject fix

v1 -> v2: No changes

---
 drivers/thermal/thermal_core.c |   54 +++++++++++++++--------------------------
 1 file changed, 21 insertions(+), 33 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -757,15 +757,7 @@ struct thermal_zone_device *thermal_zone
  * @tz:		pointer to struct thermal_zone_device
  * @trip:	trip point the cooling devices is associated with in this zone.
  * @cdev:	pointer to struct thermal_cooling_device
- * @upper:	the Maximum cooling state for this trip point.
- *		THERMAL_NO_LIMIT means no upper limit,
- *		and the cooling device can be in max_state.
- * @lower:	the Minimum cooling state can be used for this trip point.
- *		THERMAL_NO_LIMIT means no lower limit,
- *		and the cooling device can be in cooling state 0.
- * @weight:	The weight of the cooling device to be bound to the
- *		thermal zone. Use THERMAL_WEIGHT_DEFAULT for the
- *		default value
+ * @c:		cooling specification for @trip and @cdev
  *
  * This interface function bind a thermal cooling device to the certain trip
  * point of a thermal zone device.
@@ -776,8 +768,7 @@ struct thermal_zone_device *thermal_zone
 static int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
 				     const struct thermal_trip *trip,
 				     struct thermal_cooling_device *cdev,
-				     unsigned long upper, unsigned long lower,
-				     unsigned int weight)
+				     struct cooling_spec *c)
 {
 	struct thermal_instance *dev;
 	struct thermal_instance *pos;
@@ -791,17 +782,17 @@ static int thermal_bind_cdev_to_trip(str
 		return -EINVAL;
 
 	/* lower default 0, upper default max_state */
-	if (lower == THERMAL_NO_LIMIT)
-		lower = 0;
+	if (c->lower == THERMAL_NO_LIMIT)
+		c->lower = 0;
 
-	if (upper == THERMAL_NO_LIMIT) {
-		upper = cdev->max_state;
+	if (c->upper == THERMAL_NO_LIMIT) {
+		c->upper = cdev->max_state;
 		upper_no_limit = true;
 	} else {
 		upper_no_limit = false;
 	}
 
-	if (lower > upper || upper > cdev->max_state)
+	if (c->lower > c->upper || c->upper > cdev->max_state)
 		return -EINVAL;
 
 	dev = kzalloc(sizeof(*dev), GFP_KERNEL);
@@ -810,11 +801,11 @@ static int thermal_bind_cdev_to_trip(str
 	dev->tz = tz;
 	dev->cdev = cdev;
 	dev->trip = trip;
-	dev->upper = upper;
+	dev->upper = c->upper;
 	dev->upper_no_limit = upper_no_limit;
-	dev->lower = lower;
+	dev->lower = c->lower;
 	dev->target = THERMAL_NO_TARGET;
-	dev->weight = weight;
+	dev->weight = c->weight;
 
 	result = ida_alloc(&tz->ida, GFP_KERNEL);
 	if (result < 0)
@@ -887,12 +878,10 @@ free_mem:
  * This interface function unbind a thermal cooling device from the certain
  * trip point of a thermal zone device.
  * This function is usually called in the thermal zone device .unbind callback.
- *
- * Return: 0 on success, the proper error value otherwise.
  */
-static int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
-					 const struct thermal_trip *trip,
-					 struct thermal_cooling_device *cdev)
+static void thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
+					  const struct thermal_trip *trip,
+					  struct thermal_cooling_device *cdev)
 {
 	struct thermal_instance *pos, *next;
 
@@ -912,7 +901,7 @@ static int thermal_unbind_cdev_from_trip
 	}
 	mutex_unlock(&cdev->lock);
 
-	return -ENODEV;
+	return;
 
 unbind:
 	device_remove_file(&tz->device, &pos->weight_attr);
@@ -920,7 +909,6 @@ unbind:
 	sysfs_remove_link(&tz->device.kobj, pos->name);
 	ida_free(&tz->ida, pos->id);
 	kfree(pos);
-	return 0;
 }
 
 static void thermal_release(struct device *dev)
@@ -959,7 +947,6 @@ static void thermal_zone_cdev_binding(st
 				      struct thermal_cooling_device *cdev)
 {
 	struct thermal_trip_desc *td;
-	int ret;
 
 	if (!tz->ops.should_bind)
 		return;
@@ -973,13 +960,14 @@ static void thermal_zone_cdev_binding(st
 			.lower = THERMAL_NO_LIMIT,
 			.weight = THERMAL_WEIGHT_DEFAULT
 		};
+		int ret;
 
-		if (tz->ops.should_bind(tz, trip, cdev, &c)) {
-			ret = thermal_bind_cdev_to_trip(tz, trip, cdev, c.upper,
-							c.lower, c.weight);
-			if (ret)
-				print_bind_err_msg(tz, trip, cdev, ret);
-		}
+		if (!tz->ops.should_bind(tz, trip, cdev, &c))
+			continue;
+
+		ret = thermal_bind_cdev_to_trip(tz, trip, cdev, &c);
+		if (ret)
+			print_bind_err_msg(tz, trip, cdev, ret);
 	}
 
 	mutex_unlock(&tz->lock);




^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback
  2024-08-19 16:19 ` [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback Rafael J. Wysocki
@ 2024-08-19 20:24   ` Peter Kästle
  2024-08-21 13:25   ` Daniel Lezcano
  1 sibling, 0 replies; 66+ messages in thread
From: Peter Kästle @ 2024-08-19 20:24 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Hans de Goede,
	Peter Kaestle, platform-driver-x86

On 19.08.24 18:19, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the acerhdf driver use the .should_bind() thermal zone
> callback to provide the thermal core with the information on whether or
> not to bind the given cooling device to the given trip point in the
> given thermal zone.  If it returns 'true', the thermal core will bind
> the cooling device to the trip and the corresponding unbinding will be
> taken care of automatically by the core on the removal of the involved
> thermal zone or cooling device.
> 
> The previously existing acerhdf_bind() function bound cooling devices
> to thermal trip point 0 only, so the new callback needs to return 'true'
> for trip point 0.  However, it is straightforward to observe that trip
> point 0 is an active trip point and the only other trip point in the
> driver's thermal zone is a critical one, so it is sufficient to return
> 'true' from that callback if the type of the given trip point is
> THERMAL_TRIP_ACTIVE.
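
As a rough user-space sketch of the idea described above (not the driver or
the thermal core): the zone owner supplies a predicate, and the caller walks
the trips and binds wherever the predicate returns true.  All names here
(struct trip, struct cdev, fan_should_bind(), count_bindings()) are
hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the thermal core types. */
enum trip_type { TRIP_ACTIVE, TRIP_CRITICAL };

struct trip {
	enum trip_type type;
};

struct cdev {
	const char *name;
};

typedef bool (*should_bind_fn)(const struct trip *trip,
			       const struct cdev *cdev);

/* The one cooling device this hypothetical driver owns. */
static const struct cdev fan = { "fan" };

/* Bind only the driver's own cooling device, and only to active trips. */
static bool fan_should_bind(const struct trip *trip, const struct cdev *cdev)
{
	return cdev == &fan && trip->type == TRIP_ACTIVE;
}

/* Model of the caller: ask the predicate once per trip, count the matches. */
static int count_bindings(const struct trip *trips, size_t num_trips,
			  const struct cdev *cdev, should_bind_fn fn)
{
	int bound = 0;
	size_t i;

	assert(fn != NULL);

	for (i = 0; i < num_trips; i++) {
		if (fn(&trips[i], cdev))
			bound++;
	}

	return bound;
}
```

With one active and one critical trip, the predicate selects exactly the
active one, which matches the behavior of binding to trip point 0 only.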
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Hans de Goede <hdegoede@redhat.com>

Tested-by: Peter Kästle <peter@piie.net>

> ---
> 
> v2 -> v3: Reorder (previously [12/17]) and add the ACK from Hans
> 
> v1 -> v2: No changes
> 
> This patch only depends on the [06/14] introducing the .should_bind()
> thermal zone callback:
> 
> https://lore.kernel.org/linux-pm/9334403.CDJkKcVGEf@rjwysocki.net/
> 
> ---
>   drivers/platform/x86/acerhdf.c |   33 ++++++---------------------------
>   1 file changed, 6 insertions(+), 27 deletions(-)
> 
> Index: linux-pm/drivers/platform/x86/acerhdf.c
> ===================================================================
> --- linux-pm.orig/drivers/platform/x86/acerhdf.c
> +++ linux-pm/drivers/platform/x86/acerhdf.c
> @@ -378,33 +378,13 @@ static int acerhdf_get_ec_temp(struct th
>   	return 0;
>   }
>   
> -static int acerhdf_bind(struct thermal_zone_device *thermal,
> -			struct thermal_cooling_device *cdev)
> +static bool acerhdf_should_bind(struct thermal_zone_device *thermal,
> +				const struct thermal_trip *trip,
> +				struct thermal_cooling_device *cdev,
> +				struct cooling_spec *c)
>   {
>   	/* if the cooling device is the one from acerhdf bind it */
> -	if (cdev != cl_dev)
> -		return 0;
> -
> -	if (thermal_zone_bind_cooling_device(thermal, 0, cdev,
> -			THERMAL_NO_LIMIT, THERMAL_NO_LIMIT,
> -			THERMAL_WEIGHT_DEFAULT)) {
> -		pr_err("error binding cooling dev\n");
> -		return -EINVAL;
> -	}
> -	return 0;
> -}
> -
> -static int acerhdf_unbind(struct thermal_zone_device *thermal,
> -			  struct thermal_cooling_device *cdev)
> -{
> -	if (cdev != cl_dev)
> -		return 0;
> -
> -	if (thermal_zone_unbind_cooling_device(thermal, 0, cdev)) {
> -		pr_err("error unbinding cooling dev\n");
> -		return -EINVAL;
> -	}
> -	return 0;
> +	return cdev == cl_dev && trip->type == THERMAL_TRIP_ACTIVE;
>   }
>   
>   static inline void acerhdf_revert_to_bios_mode(void)
> @@ -447,8 +427,7 @@ static int acerhdf_get_crit_temp(struct
>   
>   /* bind callback functions to thermalzone */
>   static struct thermal_zone_device_ops acerhdf_dev_ops = {
> -	.bind = acerhdf_bind,
> -	.unbind = acerhdf_unbind,
> +	.should_bind = acerhdf_should_bind,
>   	.get_temp = acerhdf_get_ec_temp,
>   	.change_mode = acerhdf_change_mode,
>   	.get_crit_temp = acerhdf_get_crit_temp,
> 
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers
  2024-08-19 15:50 ` [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers Rafael J. Wysocki
@ 2024-08-20  7:04   ` Zhang, Rui
  2024-08-21  7:57   ` Daniel Lezcano
  1 sibling, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:04 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 17:50 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Fold bind_cdev() into __thermal_cooling_device_register() and bind_tz()
> into thermal_zone_device_register_with_trips() to reduce code bloat and
> make it somewhat easier to follow the code flow.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui
> ---
> 
> v1 -> v3: No changes
> 
> ---
>  drivers/thermal/thermal_core.c |   55 ++++++++++++++---------------------------
>  1 file changed, 19 insertions(+), 36 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -991,20 +991,6 @@ void print_bind_err_msg(struct thermal_z
>                 tz->type, cdev->type, ret);
>  }
>  
> -static void bind_cdev(struct thermal_cooling_device *cdev)
> -{
> -       int ret;
> -       struct thermal_zone_device *pos = NULL;
> -
> -       list_for_each_entry(pos, &thermal_tz_list, node) {
> -               if (pos->ops.bind) {
> -                       ret = pos->ops.bind(pos, cdev);
> -                       if (ret)
> -                               print_bind_err_msg(pos, cdev, ret);
> -               }
> -       }
> -}
> -
>  /**
>   * __thermal_cooling_device_register() - register a new thermal cooling device
>   * @np:                a pointer to a device tree node.
> @@ -1100,7 +1086,13 @@ __thermal_cooling_device_register(struct
>         list_add(&cdev->node, &thermal_cdev_list);
>  
>         /* Update binding information for 'this' new cdev */
> -       bind_cdev(cdev);
> +       list_for_each_entry(pos, &thermal_tz_list, node) {
> +               if (pos->ops.bind) {
> +                       ret = pos->ops.bind(pos, cdev);
> +                       if (ret)
> +                               print_bind_err_msg(pos, cdev, ret);
> +               }
> +       }
>  
>         list_for_each_entry(pos, &thermal_tz_list, node)
>                 if (atomic_cmpxchg(&pos->need_update, 1, 0))
> @@ -1338,25 +1330,6 @@ void thermal_cooling_device_unregister(s
>  }
>  EXPORT_SYMBOL_GPL(thermal_cooling_device_unregister);
>  
> -static void bind_tz(struct thermal_zone_device *tz)
> -{
> -       int ret;
> -       struct thermal_cooling_device *pos = NULL;
> -
> -       if (!tz->ops.bind)
> -               return;
> -
> -       mutex_lock(&thermal_list_lock);
> -
> -       list_for_each_entry(pos, &thermal_cdev_list, node) {
> -               ret = tz->ops.bind(tz, pos);
> -               if (ret)
> -                       print_bind_err_msg(tz, pos, ret);
> -       }
> -
> -       mutex_unlock(&thermal_list_lock);
> -}
> -
>  static void thermal_set_delay_jiffies(unsigned long *delay_jiffies, int delay_ms)
>  {
>         *delay_jiffies = msecs_to_jiffies(delay_ms);
> @@ -1554,13 +1527,23 @@ thermal_zone_device_register_with_trips(
>         }
>  
>         mutex_lock(&thermal_list_lock);
> +
>         mutex_lock(&tz->lock);
>         list_add_tail(&tz->node, &thermal_tz_list);
>         mutex_unlock(&tz->lock);
> -       mutex_unlock(&thermal_list_lock);
>  
>         /* Bind cooling devices for this zone */
> -       bind_tz(tz);
> +       if (tz->ops.bind) {
> +               struct thermal_cooling_device *cdev;
> +
> +               list_for_each_entry(cdev, &thermal_cdev_list, node) {
> +                       result = tz->ops.bind(tz, cdev);
> +                       if (result)
> +                               print_bind_err_msg(tz, cdev, result);
> +               }
> +       }
> +
> +       mutex_unlock(&thermal_list_lock);
>  
>         thermal_zone_device_init(tz);
>         /* Update the new thermal zone and mark it as already updated. */
> 
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-19 15:51 ` [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip() Rafael J. Wysocki
@ 2024-08-20  7:05   ` Zhang, Rui
  2024-08-21  7:59   ` Daniel Lezcano
  2024-08-21  8:49   ` lihuisong (C)
  2 siblings, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:05 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 17:51 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> It is not necessary to look up the thermal zone and the cooling device
> in the respective global lists to check whether or not they are
> registered.  It is sufficient to check whether or not their respective
> list nodes are empty for this purpose.
> 
> Use the above observation to simplify thermal_bind_cdev_to_trip().  In
> addition, eliminate an unnecessary ternary operator from it.
> 
> Moreover, add lockdep_assert_held() for thermal_list_lock to it because
> that lock must be held by its callers when it is running.
> 
> No intentional functional impact.
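
The observation in the changelog can be modeled in user space as follows.
This is only a sketch under stated assumptions: the list helpers are minimal
re-implementations in the style of the kernel's list_head, and struct zone,
zone_registered_by_walk() and zone_registered_by_node() are hypothetical names
invented for the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list node, in the style of list_head. */
struct list_node {
	struct list_node *next, *prev;
};

static void node_init(struct list_node *n)
{
	n->next = n;
	n->prev = n;
}

/* True if @n points only at itself, i.e. it is not linked into any list. */
static bool node_unlinked(const struct list_node *n)
{
	return n->next == n;
}

/* Insert @n right after @head; @n must not be on a list already. */
static void node_add(struct list_node *n, struct list_node *head)
{
	assert(node_unlinked(n));
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink @n and reinitialize it so that node_unlinked() holds again. */
static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

struct zone {
	int id;
	struct list_node node;
};

/* Lookup-based check: walk the global list looking for the zone. */
static bool zone_registered_by_walk(const struct zone *z,
				    const struct list_node *head)
{
	const struct list_node *pos;

	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos == &z->node)
			return true;
	}

	return false;
}

/* Node-based check: inspect the zone's own list node only. */
static bool zone_registered_by_node(const struct zone *z)
{
	return !node_unlinked(&z->node);
}
```

The two checks agree only as long as every node is initialized before use and
is reinitialized when it is removed from the list; the sketch assumes both.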
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v1 -> v3: No changes
> 
> ---
>  drivers/thermal/thermal_core.c |   16 ++++------------
>  1 file changed, 4 insertions(+), 12 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -781,25 +781,17 @@ int thermal_bind_cdev_to_trip(struct the
>  {
>         struct thermal_instance *dev;
>         struct thermal_instance *pos;
> -       struct thermal_zone_device *pos1;
> -       struct thermal_cooling_device *pos2;
>         bool upper_no_limit;
>         int result;
>  
> -       list_for_each_entry(pos1, &thermal_tz_list, node) {
> -               if (pos1 == tz)
> -                       break;
> -       }
> -       list_for_each_entry(pos2, &thermal_cdev_list, node) {
> -               if (pos2 == cdev)
> -                       break;
> -       }
> +       lockdep_assert_held(&thermal_list_lock);
>  
> -       if (tz != pos1 || cdev != pos2)
> +       if (list_empty(&tz->node) || list_empty(&cdev->node))
>                 return -EINVAL;
>  
>         /* lower default 0, upper default max_state */
> -       lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> +       if (lower == THERMAL_NO_LIMIT)
> +               lower = 0;
>  
>         if (upper == THERMAL_NO_LIMIT) {
>                 upper = cdev->max_state;
> 
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks
  2024-08-19 15:52 ` [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks Rafael J. Wysocki
@ 2024-08-20  7:05   ` Zhang, Rui
  2024-08-21  9:32   ` Daniel Lezcano
  1 sibling, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:05 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 17:52 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Because the trip and cdev pointers are sufficient to identify a thermal
> instance holding them unambiguously, drop the additional thermal zone
> checks from two loops walking the list of thermal instances in a
> thermal zone.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v1 -> v3: No changes
> 
> ---
>  drivers/thermal/thermal_core.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -850,7 +850,7 @@ int thermal_bind_cdev_to_trip(struct the
>         mutex_lock(&tz->lock);
>         mutex_lock(&cdev->lock);
>         list_for_each_entry(pos, &tz->thermal_instances, tz_node)
> -               if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
> +               if (pos->trip == trip && pos->cdev == cdev) {
>                         result = -EEXIST;
>                         break;
>                 }
> @@ -915,7 +915,7 @@ int thermal_unbind_cdev_from_trip(struct
>         mutex_lock(&tz->lock);
>         mutex_lock(&cdev->lock);
>         list_for_each_entry_safe(pos, next, &tz->thermal_instances, tz_node) {
> -               if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
> +               if (pos->trip == trip && pos->cdev == cdev) {
>                         list_del(&pos->tz_node);
>                         list_del(&pos->cdev_node);
>  
> 
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store
  2024-08-19 15:56 ` [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store Rafael J. Wysocki
@ 2024-08-20  7:05   ` Zhang, Rui
  2024-08-20  7:59   ` lihuisong (C)
  2024-08-21  9:36   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:05 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 17:56 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Two sysfs show/store functions for attributes representing thermal
> instances, trip_point_show() and weight_store(), retrieve the thermal
> zone pointer from the instance object at hand, but they may also get
> it from their dev argument, which is more consistent with what the
> other thermal sysfs functions do, so make them do so.
> 
> No intentional functional impact.
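
For readers unfamiliar with the pattern, getting the zone "from the dev
argument" relies on the device being embedded in the zone object, so that the
enclosing structure can be recovered from a pointer to the member.  The sketch
below models this in user space; struct device, struct zone and to_zone() are
hypothetical stand-ins, and it is an assumption here that to_thermal_zone()
works along the same container_of()-style lines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
struct device {
	int dummy;
};

struct zone {
	int id;
	struct device device;	/* embedded member, like tz->device */
};

/* Recover the enclosing struct zone from a pointer to its embedded device. */
static struct zone *to_zone(struct device *dev)
{
	assert(dev != NULL);

	return (struct zone *)((char *)dev - offsetof(struct zone, device));
}
```

Because the sysfs callbacks already receive the device pointer, this lookup
makes a separate back-pointer in the instance object unnecessary for them.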
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v1 -> v3: No changes (previously [06/17])
> 
> ---
>  drivers/thermal/thermal_sysfs.c |   15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_sysfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> +++ linux-pm/drivers/thermal/thermal_sysfs.c
> @@ -836,13 +836,12 @@ void thermal_cooling_device_stats_reinit
>  ssize_t
>  trip_point_show(struct device *dev, struct device_attribute *attr, char *buf)
>  {
> +       struct thermal_zone_device *tz = to_thermal_zone(dev);
>         struct thermal_instance *instance;
>  
> -       instance =
> -           container_of(attr, struct thermal_instance, attr);
> +       instance = container_of(attr, struct thermal_instance, attr);
>  
> -       return sprintf(buf, "%d\n",
> -                      thermal_zone_trip_id(instance->tz, instance->trip));
> +       return sprintf(buf, "%d\n", thermal_zone_trip_id(tz, instance->trip));
>  }
>  
>  ssize_t
> @@ -858,6 +857,7 @@ weight_show(struct device *dev, struct d
>  ssize_t weight_store(struct device *dev, struct device_attribute *attr,
>                      const char *buf, size_t count)
>  {
> +       struct thermal_zone_device *tz = to_thermal_zone(dev);
>         struct thermal_instance *instance;
>         int ret, weight;
>  
> @@ -868,14 +868,13 @@ ssize_t weight_store(struct device *dev,
>         instance = container_of(attr, struct thermal_instance, weight_attr);
>  
>         /* Don't race with governors using the 'weight' value */
> -       mutex_lock(&instance->tz->lock);
> +       mutex_lock(&tz->lock);
>  
>         instance->weight = weight;
>  
> -       thermal_governor_update_tz(instance->tz,
> -                                  THERMAL_INSTANCE_WEIGHT_CHANGED);
> +       thermal_governor_update_tz(tz, THERMAL_INSTANCE_WEIGHT_CHANGED);
>  
> -       mutex_unlock(&instance->tz->lock);
> +       mutex_unlock(&tz->lock);
>  
>         return count;
>  }
> 
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions
  2024-08-19 15:58 ` [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions Rafael J. Wysocki
@ 2024-08-20  7:05   ` Zhang, Rui
  2024-08-20  8:27   ` lihuisong (C)
  2024-08-21  9:46   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:05 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 17:58 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> acquire the thermal zone lock, the locking rules for their callers get
> complicated.  In particular, the thermal zone lock cannot be acquired
> in any code path leading to one of these functions even though it might
> be useful to do so.
> 
> To address this, remove the thermal zone locking from both these
> functions, add lockdep assertions for the thermal zone lock to both
> of them and make their callers acquire the thermal zone lock instead.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v2 -> v3: Rebase after dropping patches [04-05/17] from the series
> 
> v1 -> v2: No changes
> 
> ---
>  drivers/acpi/thermal.c         |    2 +-
>  drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++--------
>  2 files changed, 23 insertions(+), 9 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -785,6 +785,7 @@ int thermal_bind_cdev_to_trip(struct the
>         int result;
>  
>         lockdep_assert_held(&thermal_list_lock);
> +       lockdep_assert_held(&tz->lock);
>  
>         if (list_empty(&tz->node) || list_empty(&cdev->node))
>                 return -EINVAL;
> @@ -847,7 +848,6 @@ int thermal_bind_cdev_to_trip(struct the
>         if (result)
>                 goto remove_trip_file;
>  
> -       mutex_lock(&tz->lock);
>         mutex_lock(&cdev->lock);
>         list_for_each_entry(pos, &tz->thermal_instances, tz_node)
>                 if (pos->trip == trip && pos->cdev == cdev) {
> @@ -862,7 +862,6 @@ int thermal_bind_cdev_to_trip(struct the
>                 thermal_governor_update_tz(tz, THERMAL_TZ_BIND_CDEV);
>         }
>         mutex_unlock(&cdev->lock);
> -       mutex_unlock(&tz->lock);
>  
>         if (!result)
>                 return 0;
> @@ -886,11 +885,19 @@ int thermal_zone_bind_cooling_device(str
>                                      unsigned long upper, unsigned long lower,
>                                      unsigned int weight)
>  {
> +       int ret;
> +
>         if (trip_index < 0 || trip_index >= tz->num_trips)
>                 return -EINVAL;
>  
> -       return thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
> -                                        upper, lower, weight);
> +       mutex_lock(&tz->lock);
> +
> +       ret = thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
> +                                       upper, lower, weight);
> +
> +       mutex_unlock(&tz->lock);
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);
>  
> @@ -912,7 +919,8 @@ int thermal_unbind_cdev_from_trip(struct
>  {
>         struct thermal_instance *pos, *next;
>  
> -       mutex_lock(&tz->lock);
> +       lockdep_assert_held(&tz->lock);
> +
>         mutex_lock(&cdev->lock);
>         list_for_each_entry_safe(pos, next, &tz->thermal_instances, tz_node) {
>                 if (pos->trip == trip && pos->cdev == cdev) {
> @@ -922,12 +930,10 @@ int thermal_unbind_cdev_from_trip(struct
>                         thermal_governor_update_tz(tz, THERMAL_TZ_UNBIND_CDEV);
>  
>                         mutex_unlock(&cdev->lock);
> -                       mutex_unlock(&tz->lock);
>                         goto unbind;
>                 }
>         }
>         mutex_unlock(&cdev->lock);
> -       mutex_unlock(&tz->lock);
>  
>         return -ENODEV;
>  
> @@ -945,10 +951,18 @@ int thermal_zone_unbind_cooling_device(s
>                                        int trip_index,
>                                        struct thermal_cooling_device *cdev)
>  {
> +       int ret;
> +
>         if (trip_index < 0 || trip_index >= tz->num_trips)
>                 return -EINVAL;
>  
> -       return thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
> +       mutex_lock(&tz->lock);
> +
> +       ret = thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
> +
> +       mutex_unlock(&tz->lock);
> +
> +       return ret;
>  }
>  EXPORT_SYMBOL_GPL(thermal_zone_unbind_cooling_device);
>  
> Index: linux-pm/drivers/acpi/thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/thermal.c
> +++ linux-pm/drivers/acpi/thermal.c
> @@ -609,7 +609,7 @@ static int acpi_thermal_bind_unbind_cdev
>                 .thermal = thermal, .cdev = cdev, .bind = bind
>         };
>  
> -       return for_each_thermal_trip(thermal, bind_unbind_cdev_cb, &bd);
> +       return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
>  }
>  
>  static int
> 
> 
> 
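
[Editorial sketch, not part of the patch] The locking change described
in the changelog of patch 05/14 can be modeled in userspace roughly as
follows.  This is a minimal sketch under stated assumptions: every
toy_* name is hypothetical, the "lock" is a plain flag rather than a
real mutex, and assert() merely plays the role that
lockdep_assert_held() plays in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a mutex that remembers whether it is held (hypothetical). */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);	/* taking it twice would self-deadlock */
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Toy thermal zone: a lock plus a count of bound cooling devices. */
struct toy_zone {
	struct toy_lock lock;
	int nr_bound;
};

/*
 * Inner helper, modeled on thermal_bind_cdev_to_trip() after the patch:
 * it no longer takes the lock, it only checks that the caller holds it.
 */
static int toy_bind_locked(struct toy_zone *tz)
{
	assert(tz->lock.held);	/* role of lockdep_assert_held(&tz->lock) */
	tz->nr_bound++;
	return 0;
}

/* Outer wrapper, modeled on thermal_zone_bind_cooling_device(). */
static int toy_zone_bind(struct toy_zone *tz)
{
	int ret;

	toy_lock_acquire(&tz->lock);
	ret = toy_bind_locked(tz);
	toy_lock_release(&tz->lock);

	return ret;
}

/* Several bindings under one lock acquisition, as a loop over trips would do. */
static int toy_zone_bind_many(struct toy_zone *tz, int n)
{
	int i, ret = 0;

	toy_lock_acquire(&tz->lock);
	for (i = 0; i < n && !ret; i++)
		ret = toy_bind_locked(tz);
	toy_lock_release(&tz->lock);

	return ret;
}
```

With the lock taken by the caller, toy_zone_bind_many() can hold it
across the whole loop, which would have tripped the assertion in
toy_lock_acquire() had the helper still locked internally.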


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback
  2024-08-19 16:00 ` [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback Rafael J. Wysocki
@ 2024-08-20  7:06   ` Zhang, Rui
  2024-08-21  9:09   ` lihuisong (C)
  2024-08-21 13:21   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:06 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 18:00 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The current design of the code binding cooling devices to trip points in
> thermal zones is convoluted and hard to follow.
> 
> Namely, a driver that registers a thermal zone can provide .bind()
> and .unbind() operations for it, which are required to call either
> thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(),
> respectively, or thermal_zone_bind_cooling_device() and
> thermal_zone_unbind_cooling_device(), respectively, for every relevant
> trip point and the given cooling device.  Moreover, if .bind() is
> provided and .unbind() is not, the cleanup necessary during the removal
> of a thermal zone or a cooling device may not be carried out.
> 
> In other words, the core relies on the thermal zone owners to do the
> right thing, which is error prone and far from obvious, even though all
> of that is not really necessary.  Specifically, if the core could ask
> the thermal zone owner, through a special thermal zone callback, whether
> or not a given cooling device should be bound to a given trip point in
> the given thermal zone, it might as well carry out all of the binding
> and unbinding by itself.  In particular, the unbinding can be done
> automatically without involving the thermal zone owner at all because
> all of the thermal instances associated with a thermal zone or cooling
> device going away must be deleted regardless.
> 
> Accordingly, introduce a new thermal zone operation, .should_bind(),
> that can be invoked by the thermal core for a given thermal zone,
> trip point and cooling device combination in order to check whether
> or not the cooling device should be bound to the trip point at hand.
> It takes an additional cooling_spec argument allowing the thermal
> zone owner to specify the highest and lowest cooling states of the
> cooling device and its weight for the given trip point binding.
> 
> Make the thermal core use this operation, if present, in the absence of
> .bind() and .unbind().  Note that .should_bind() will be called under
> the thermal zone lock.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v1 -> v3: No changes (previously [08/17])
> 
> ---
>  drivers/thermal/thermal_core.c |  106 +++++++++++++++++++++++++++++++----------
>  include/linux/thermal.h        |   10 +++
>  2 files changed, 92 insertions(+), 24 deletions(-)
> 
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -85,11 +85,21 @@ struct thermal_trip {
>  
>  struct thermal_zone_device;
>  
> +struct cooling_spec {
> +       unsigned long upper;    /* Highest cooling state  */
> +       unsigned long lower;    /* Lowest cooling state  */
> +       unsigned int weight;    /* Cooling device weight */
> +};
> +
>  struct thermal_zone_device_ops {
>         int (*bind) (struct thermal_zone_device *,
>                      struct thermal_cooling_device *);
>         int (*unbind) (struct thermal_zone_device *,
>                        struct thermal_cooling_device *);
> +       bool (*should_bind) (struct thermal_zone_device *,
> +                            const struct thermal_trip *,
> +                            struct thermal_cooling_device *,
> +                            struct cooling_spec *);
>         int (*get_temp) (struct thermal_zone_device *, int *);
>         int (*set_trips) (struct thermal_zone_device *, int, int);
>         int (*change_mode) (struct thermal_zone_device *,
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -991,12 +991,61 @@ static struct class *thermal_class;
>  
>  static inline
>  void print_bind_err_msg(struct thermal_zone_device *tz,
> +                       const struct thermal_trip *trip,
>                         struct thermal_cooling_device *cdev, int ret)
>  {
> +       if (trip) {
> +               dev_err(&tz->device, "binding cdev %s to trip %d failed: %d\n",
> +                       cdev->type, thermal_zone_trip_id(tz, trip), ret);
> +               return;
> +       }
> +
>         dev_err(&tz->device, "binding zone %s with cdev %s failed:%d\n",
>                 tz->type, cdev->type, ret);
>  }
>  
> +static void thermal_zone_cdev_binding(struct thermal_zone_device *tz,
> +                                     struct thermal_cooling_device *cdev)
> +{
> +       struct thermal_trip_desc *td;
> +       int ret;
> +
> +       /*
> +        * Old-style binding. The .bind() callback is expected to call
> +        * thermal_bind_cdev_to_trip() under the thermal zone lock.
> +        */
> +       if (tz->ops.bind) {
> +               ret = tz->ops.bind(tz, cdev);
> +               if (ret)
> +                       print_bind_err_msg(tz, NULL, cdev, ret);
> +
> +               return;
> +       }
> +
> +       if (!tz->ops.should_bind)
> +               return;
> +
> +       mutex_lock(&tz->lock);
> +
> +       for_each_trip_desc(tz, td) {
> +               struct thermal_trip *trip = &td->trip;
> +               struct cooling_spec c = {
> +                       .upper = THERMAL_NO_LIMIT,
> +                       .lower = THERMAL_NO_LIMIT,
> +                       .weight = THERMAL_WEIGHT_DEFAULT
> +               };
> +
> +               if (tz->ops.should_bind(tz, trip, cdev, &c)) {
> +                       ret = thermal_bind_cdev_to_trip(tz, trip, cdev, c.upper,
> +                                                       c.lower, c.weight);
> +                       if (ret)
> +                               print_bind_err_msg(tz, trip, cdev, ret);
> +               }
> +       }
> +
> +       mutex_unlock(&tz->lock);
> +}
> +
>  /**
>   * __thermal_cooling_device_register() - register a new thermal cooling device
>   * @np:                a pointer to a device tree node.
> @@ -1092,13 +1141,8 @@ __thermal_cooling_device_register(struct
>         list_add(&cdev->node, &thermal_cdev_list);
>  
>         /* Update binding information for 'this' new cdev */
> -       list_for_each_entry(pos, &thermal_tz_list, node) {
> -               if (pos->ops.bind) {
> -                       ret = pos->ops.bind(pos, cdev);
> -                       if (ret)
> -                               print_bind_err_msg(pos, cdev, ret);
> -               }
> -       }
> +       list_for_each_entry(pos, &thermal_tz_list, node)
> +               thermal_zone_cdev_binding(pos, cdev);
>  
>         list_for_each_entry(pos, &thermal_tz_list, node)
>                 if (atomic_cmpxchg(&pos->need_update, 1, 0))
> @@ -1299,6 +1343,28 @@ unlock_list:
>  }
>  EXPORT_SYMBOL_GPL(thermal_cooling_device_update);
>  
> +static void thermal_zone_cdev_unbinding(struct thermal_zone_device *tz,
> +                                       struct thermal_cooling_device *cdev)
> +{
> +       struct thermal_trip_desc *td;
> +
> +       /*
> +        * Old-style unbinding.  The .unbind callback is expected to call
> +        * thermal_unbind_cdev_from_trip() under the thermal zone lock.
> +        */
> +       if (tz->ops.unbind) {
> +               tz->ops.unbind(tz, cdev);
> +               return;
> +       }
> +
> +       mutex_lock(&tz->lock);
> +
> +       for_each_trip_desc(tz, td)
> +               thermal_unbind_cdev_from_trip(tz, &td->trip, cdev);
> +
> +       mutex_unlock(&tz->lock);
> +}
> +
>  /**
>   * thermal_cooling_device_unregister - removes a thermal cooling device
>   * @cdev:      the thermal cooling device to remove.
> @@ -1325,10 +1391,8 @@ void thermal_cooling_device_unregister(s
>         list_del(&cdev->node);
>  
>         /* Unbind all thermal zones associated with 'this' cdev */
> -       list_for_each_entry(tz, &thermal_tz_list, node) {
> -               if (tz->ops.unbind)
> -                       tz->ops.unbind(tz, cdev);
> -       }
> +       list_for_each_entry(tz, &thermal_tz_list, node)
> +               thermal_zone_cdev_unbinding(tz, cdev);
>  
>         mutex_unlock(&thermal_list_lock);
>  
> @@ -1403,6 +1467,7 @@ thermal_zone_device_register_with_trips(
>                                         unsigned int polling_delay)
>  {
>         const struct thermal_trip *trip = trips;
> +       struct thermal_cooling_device *cdev;
>         struct thermal_zone_device *tz;
>         struct thermal_trip_desc *td;
>         int id;
> @@ -1425,8 +1490,9 @@ thermal_zone_device_register_with_trips(
>                 return ERR_PTR(-EINVAL);
>         }
>  
> -       if (!ops || !ops->get_temp) {
> -               pr_err("Thermal zone device ops not defined\n");
> +       if (!ops || !ops->get_temp || (ops->should_bind && ops->bind) ||
> +           (ops->should_bind && ops->unbind)) {
> +               pr_err("Thermal zone device ops not defined or invalid\n");
>                 return ERR_PTR(-EINVAL);
>         }
>  
> @@ -1539,15 +1605,8 @@ thermal_zone_device_register_with_trips(
>         mutex_unlock(&tz->lock);
>  
>         /* Bind cooling devices for this zone */
> -       if (tz->ops.bind) {
> -               struct thermal_cooling_device *cdev;
> -
> -               list_for_each_entry(cdev, &thermal_cdev_list, node) {
> -                       result = tz->ops.bind(tz, cdev);
> -                       if (result)
> -                               print_bind_err_msg(tz, cdev, result);
> -               }
> -       }
> +       list_for_each_entry(cdev, &thermal_cdev_list, node)
> +               thermal_zone_cdev_binding(tz, cdev);
>  
>         mutex_unlock(&thermal_list_lock);
>  
> @@ -1641,8 +1700,7 @@ void thermal_zone_device_unregister(stru
>  
>         /* Unbind all cdevs associated with 'this' thermal zone */
>         list_for_each_entry(cdev, &thermal_cdev_list, node)
> -               if (tz->ops.unbind)
> -                       tz->ops.unbind(tz, cdev);
> +               thermal_zone_cdev_unbinding(tz, cdev);
>  
>         mutex_unlock(&thermal_list_lock);
>  
> 
> 
> 
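
[Editorial sketch, not part of the patch] The division of labor that
the changelog of patch 06/14 describes (the zone owner only answers
"should this cooling device be bound to this trip?", the core does all
of the binding and unbinding) can be modeled in userspace as below.
This is a sketch under assumptions, not the kernel implementation: all
toy_* names are hypothetical, each trip holds at most one binding, the
default values are arbitrary stand-ins for THERMAL_NO_LIMIT and
THERMAL_WEIGHT_DEFAULT, and locking is left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NO_LIMIT	ULONG_MAX	/* stand-in for THERMAL_NO_LIMIT */
#define TOY_MAX_TRIPS	4

struct toy_cooling_spec {
	unsigned long upper;	/* highest cooling state */
	unsigned long lower;	/* lowest cooling state */
	unsigned int weight;	/* cooling device weight */
};

struct toy_cdev {
	int id;
};

struct toy_zone;

/* The only binding-related hook the zone owner has to provide. */
typedef bool (*toy_should_bind_t)(struct toy_zone *tz, int trip,
				  const struct toy_cdev *cdev,
				  struct toy_cooling_spec *c);

struct toy_binding {
	const struct toy_cdev *cdev;	/* NULL when nothing is bound */
	struct toy_cooling_spec spec;
};

struct toy_zone {
	toy_should_bind_t should_bind;
	int num_trips;
	struct toy_binding bound[TOY_MAX_TRIPS];	/* one slot per trip */
};

/* Core side: ask the owner for every trip and bind where it says yes. */
static int toy_core_bind(struct toy_zone *tz, const struct toy_cdev *cdev)
{
	int trip, nr = 0;

	assert(tz->num_trips <= TOY_MAX_TRIPS);

	if (!tz->should_bind)
		return 0;

	for (trip = 0; trip < tz->num_trips; trip++) {
		/* Defaults the owner may override through the spec pointer. */
		struct toy_cooling_spec c = {
			.upper = TOY_NO_LIMIT,
			.lower = TOY_NO_LIMIT,
			.weight = 0,
		};

		if (!tz->should_bind(tz, trip, cdev, &c))
			continue;

		tz->bound[trip].cdev = cdev;
		tz->bound[trip].spec = c;
		nr++;
	}

	return nr;
}

/* Core side: unbinding needs no help from the zone owner at all. */
static int toy_core_unbind(struct toy_zone *tz, const struct toy_cdev *cdev)
{
	int trip, nr = 0;

	for (trip = 0; trip < tz->num_trips; trip++) {
		if (tz->bound[trip].cdev == cdev) {
			tz->bound[trip].cdev = NULL;
			nr++;
		}
	}

	return nr;
}

/* Example owner policy: bind device 7 to trip 1, capped at state 3. */
static bool toy_fan_should_bind(struct toy_zone *tz, int trip,
				const struct toy_cdev *cdev,
				struct toy_cooling_spec *c)
{
	(void)tz;

	if (cdev->id != 7 || trip != 1)
		return false;

	c->upper = 3;
	c->weight = 10;
	return true;
}
```

The owner callback never touches the binding table, so a missing
"unbind" counterpart cannot leave stale bindings behind, which is the
failure mode the changelog points out for .bind() without .unbind().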


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 07/14] thermal: ACPI: Use the .should_bind() thermal zone callback
  2024-08-19 16:02 ` [PATCH v3 07/14] thermal: ACPI: Use the " Rafael J. Wysocki
@ 2024-08-20  7:06   ` Zhang, Rui
  2024-08-21 13:22   ` Daniel Lezcano
  1 sibling, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:06 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 18:02 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the ACPI thermal zone driver use the .should_bind() thermal zone
> callback to provide the thermal core with the information on whether or
> not to bind the given cooling device to the given trip point in the
> given thermal zone.  If it returns 'true', the thermal core will bind
> the cooling device to the trip and the corresponding unbinding will be
> taken care of automatically by the core on the removal of the involved
> thermal zone or cooling device.
> 
> This replaces the .bind() and .unbind() thermal zone callbacks which
> allows the code to be simplified quite significantly while providing
> the same functionality.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui
> ---
> 
> v1 -> v3: No changes (previously [09/17])
> 
> This patch only depends on the previous one introducing the .should_bind()
> thermal zone callback.
> 
> ---
>  drivers/acpi/thermal.c |   64 ++++++-------------------------------------------
>  1 file changed, 9 insertions(+), 55 deletions(-)
> 
> Index: linux-pm/drivers/acpi/thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/thermal.c
> +++ linux-pm/drivers/acpi/thermal.c
> @@ -558,77 +558,31 @@ static void acpi_thermal_zone_device_cri
>         thermal_zone_device_critical(thermal);
>  }
>  
> -struct acpi_thermal_bind_data {
> -       struct thermal_zone_device *thermal;
> -       struct thermal_cooling_device *cdev;
> -       bool bind;
> -};
> -
> -static int bind_unbind_cdev_cb(struct thermal_trip *trip, void *arg)
> +static bool acpi_thermal_should_bind_cdev(struct thermal_zone_device *thermal,
> +                                         const struct thermal_trip *trip,
> +                                         struct thermal_cooling_device *cdev,
> +                                         struct cooling_spec *c)
>  {
>         struct acpi_thermal_trip *acpi_trip = trip->priv;
> -       struct acpi_thermal_bind_data *bd = arg;
> -       struct thermal_zone_device *thermal = bd->thermal;
> -       struct thermal_cooling_device *cdev = bd->cdev;
>         struct acpi_device *cdev_adev = cdev->devdata;
>         int i;
>  
>         /* Skip critical and hot trips. */
>         if (!acpi_trip)
> -               return 0;
> +               return false;
>  
>         for (i = 0; i < acpi_trip->devices.count; i++) {
>                 acpi_handle handle = acpi_trip->devices.handles[i];
> -               struct acpi_device *adev = acpi_fetch_acpi_dev(handle);
> -
> -               if (adev != cdev_adev)
> -                       continue;
>  
> -               if (bd->bind) {
> -                       int ret;
> -
> -                       ret = thermal_bind_cdev_to_trip(thermal, trip, cdev,
> -                                                       THERMAL_NO_LIMIT,
> -                                                       THERMAL_NO_LIMIT,
> -                                                       THERMAL_WEIGHT_DEFAULT);
> -                       if (ret)
> -                               return ret;
> -               } else {
> -                       thermal_unbind_cdev_from_trip(thermal, trip, cdev);
> -               }
> +               if (acpi_fetch_acpi_dev(handle) == cdev_adev)
> +                       return true;
>         }
>  
> -       return 0;
> -}
> -
> -static int acpi_thermal_bind_unbind_cdev(struct thermal_zone_device *thermal,
> -                                        struct thermal_cooling_device *cdev,
> -                                        bool bind)
> -{
> -       struct acpi_thermal_bind_data bd = {
> -               .thermal = thermal, .cdev = cdev, .bind = bind
> -       };
> -
> -       return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
> -}
> -
> -static int
> -acpi_thermal_bind_cooling_device(struct thermal_zone_device *thermal,
> -                                struct thermal_cooling_device *cdev)
> -{
> -       return acpi_thermal_bind_unbind_cdev(thermal, cdev, true);
> -}
> -
> -static int
> -acpi_thermal_unbind_cooling_device(struct thermal_zone_device *thermal,
> -                                  struct thermal_cooling_device *cdev)
> -{
> -       return acpi_thermal_bind_unbind_cdev(thermal, cdev, false);
> +       return false;
>  }
>  
>  static const struct thermal_zone_device_ops acpi_thermal_zone_ops = {
> -       .bind = acpi_thermal_bind_cooling_device,
> -       .unbind = acpi_thermal_unbind_cooling_device,
> +       .should_bind = acpi_thermal_should_bind_cdev,
>         .get_temp = thermal_get_temp,
>         .get_trend = thermal_get_trend,
>         .hot = acpi_thermal_zone_device_hot,
> 
> 
> 
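
[Editorial sketch, not part of the patch] After this conversion the
ACPI callback in patch 07/14 reduces to a membership test: does the
cooling device appear in the device list of the trip at hand?  A
userspace sketch of that shape, assuming integer ids in place of ACPI
handles and hypothetical toy_* names throughout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy trip: an optional list of device ids allowed to cool it. */
struct toy_trip {
	const int *handles;	/* NULL for trips without a device list */
	size_t count;
};

/*
 * Pure predicate in the spirit of acpi_thermal_should_bind_cdev():
 * no binding or unbinding happens here, the answer is all the core needs.
 */
static bool toy_acpi_should_bind(const struct toy_trip *trip, int cdev_id)
{
	size_t i;

	/* Skip trips that carry no device list (critical and hot). */
	if (!trip->handles)
		return false;

	for (i = 0; i < trip->count; i++) {
		if (trip->handles[i] == cdev_id)
			return true;
	}

	return false;
}
```

Because the function has no side effects, the same predicate serves
registration of either a zone or a cooling device, which is why the
separate bind and unbind wrappers in the old code could be dropped.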


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
  2024-08-19 16:05 ` [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() Rafael J. Wysocki
@ 2024-08-20  7:08   ` Zhang, Rui
  2024-08-21  9:18   ` lihuisong (C)
  2024-08-21 13:23   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:08 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 18:05 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> are only called locally in the thermal core now, they can be static,
> so change their definitions accordingly and drop their headers from
> the global thermal header file.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v2 -> v3: Rebase after dropping patches [04-05/17] from the series
> 
> v1 -> v2: No changes
> 
> ---
>  drivers/thermal/thermal_core.c |   10 ++++------
>  include/linux/thermal.h        |    8 --------
>  2 files changed, 4 insertions(+), 14 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -773,7 +773,7 @@ struct thermal_zone_device *thermal_zone
>   *
>   * Return: 0 on success, the proper error value otherwise.
>   */
> -int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
> +static int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
>                                      const struct thermal_trip *trip,
>                                      struct thermal_cooling_device *cdev,
>                                      unsigned long upper, unsigned long lower,
> @@ -877,7 +877,6 @@ free_mem:
>         kfree(dev);
>         return result;
>  }
> -EXPORT_SYMBOL_GPL(thermal_bind_cdev_to_trip);
>  
>  int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
>                                      int trip_index,
> @@ -913,9 +912,9 @@ EXPORT_SYMBOL_GPL(thermal_zone_bind_cool
>   *
>   * Return: 0 on success, the proper error value otherwise.
>   */
> -int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
> -                                 const struct thermal_trip *trip,
> -                                 struct thermal_cooling_device *cdev)
> +static int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
> +                                        const struct thermal_trip *trip,
> +                                        struct thermal_cooling_device *cdev)
>  {
>         struct thermal_instance *pos, *next;
>  
> @@ -945,7 +944,6 @@ unbind:
>         kfree(pos);
>         return 0;
>  }
> -EXPORT_SYMBOL_GPL(thermal_unbind_cdev_from_trip);
>  
>  int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
>                                        int trip_index,
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -247,18 +247,10 @@ const char *thermal_zone_device_type(str
>  int thermal_zone_device_id(struct thermal_zone_device *tzd);
>  struct device *thermal_zone_device(struct thermal_zone_device *tzd);
>  
> -int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
> -                             const struct thermal_trip *trip,
> -                             struct thermal_cooling_device *cdev,
> -                             unsigned long upper, unsigned long lower,
> -                             unsigned int weight);
>  int thermal_zone_bind_cooling_device(struct thermal_zone_device *, int,
>                                      struct thermal_cooling_device *,
>                                      unsigned long, unsigned long,
>                                      unsigned int);
> -int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
> -                                 const struct thermal_trip *trip,
> -                                 struct thermal_cooling_device *cdev);
>  int thermal_zone_unbind_cooling_device(struct thermal_zone_device *, int,
>                                        struct thermal_cooling_device *);
>  void thermal_zone_device_update(struct thermal_zone_device *,
> 
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks
  2024-08-19 16:31 ` [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks Rafael J. Wysocki
@ 2024-08-20  7:10   ` Zhang, Rui
  2024-08-21  9:33   ` lihuisong (C)
  2024-08-21 14:24   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:10 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 18:31 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> There are no more callers of thermal_zone_bind_cooling_device() and
> thermal_zone_unbind_cooling_device(), so drop them along with all of
> the corresponding headers, code and documentation.
> 
> Moreover, because the .bind() and .unbind() thermal zone callbacks would
> only be used when the above functions, respectively, were called, drop
> them as well along with all of the code related to them.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v2 -> v3: No changes
> 
> v1 -> v2:
>    * Update the list of thermal zone ops in the documentation.
> 
> ---
>  Documentation/driver-api/thermal/sysfs-api.rst |   59 +------------------
>  drivers/thermal/thermal_core.c                 |   75 +------------------------
>  include/linux/thermal.h                        |   10 ---
>  3 files changed, 6 insertions(+), 138 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -878,28 +878,6 @@ free_mem:
>         return result;
>  }
>  
> -int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
> -                                    int trip_index,
> -                                    struct thermal_cooling_device *cdev,
> -                                    unsigned long upper, unsigned long lower,
> -                                    unsigned int weight)
> -{
> -       int ret;
> -
> -       if (trip_index < 0 || trip_index >= tz->num_trips)
> -               return -EINVAL;
> -
> -       mutex_lock(&tz->lock);
> -
> -       ret = thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
> -                                       upper, lower, weight);
> -
> -       mutex_unlock(&tz->lock);
> -
> -       return ret;
> -}
> -EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);
> -
>  /**
>   * thermal_unbind_cdev_from_trip - unbind a cooling device from a thermal zone.
>   * @tz:                pointer to a struct thermal_zone_device.
> @@ -945,25 +923,6 @@ unbind:
>         return 0;
>  }
>  
> -int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
> -                                      int trip_index,
> -                                      struct thermal_cooling_device *cdev)
> -{
> -       int ret;
> -
> -       if (trip_index < 0 || trip_index >= tz->num_trips)
> -               return -EINVAL;
> -
> -       mutex_lock(&tz->lock);
> -
> -       ret = thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
> -
> -       mutex_unlock(&tz->lock);
> -
> -       return ret;
> -}
> -EXPORT_SYMBOL_GPL(thermal_zone_unbind_cooling_device);
> -
>  static void thermal_release(struct device *dev)
>  {
>         struct thermal_zone_device *tz;
> @@ -992,14 +951,8 @@ void print_bind_err_msg(struct thermal_z
>                         const struct thermal_trip *trip,
>                         struct thermal_cooling_device *cdev, int ret)
>  {
> -       if (trip) {
> -               dev_err(&tz->device, "binding cdev %s to trip %d failed: %d\n",
> -                       cdev->type, thermal_zone_trip_id(tz, trip), ret);
> -               return;
> -       }
> -
> -       dev_err(&tz->device, "binding zone %s with cdev %s failed:%d\n",
> -               tz->type, cdev->type, ret);
> +       dev_err(&tz->device, "binding cdev %s to trip %d failed: %d\n",
> +               cdev->type, thermal_zone_trip_id(tz, trip), ret);
>  }
>  
>  static void thermal_zone_cdev_binding(struct thermal_zone_device *tz,
> @@ -1008,18 +961,6 @@ static void thermal_zone_cdev_binding(st
>         struct thermal_trip_desc *td;
>         int ret;
>  
> -       /*
> -        * Old-style binding. The .bind() callback is expected to call
> -        * thermal_bind_cdev_to_trip() under the thermal zone lock.
> -        */
> -       if (tz->ops.bind) {
> -               ret = tz->ops.bind(tz, cdev);
> -               if (ret)
> -                       print_bind_err_msg(tz, NULL, cdev, ret);
> -
> -               return;
> -       }
> -
>         if (!tz->ops.should_bind)
>                 return;
>  
> @@ -1346,15 +1287,6 @@ static void thermal_zone_cdev_unbinding(
>  {
>         struct thermal_trip_desc *td;
>  
> -       /*
> -        * Old-style unbinding.  The .unbind callback is expected to call
> -        * thermal_unbind_cdev_from_trip() under the thermal zone lock.
> -        */
> -       if (tz->ops.unbind) {
> -               tz->ops.unbind(tz, cdev);
> -               return;
> -       }
> -
>         mutex_lock(&tz->lock);
>  
>         for_each_trip_desc(tz, td)
> @@ -1488,8 +1420,7 @@ thermal_zone_device_register_with_trips(
>                 return ERR_PTR(-EINVAL);
>         }
>  
> -       if (!ops || !ops->get_temp || (ops->should_bind && ops->bind) ||
> -           (ops->should_bind && ops->unbind)) {
> +       if (!ops || !ops->get_temp) {
>                 pr_err("Thermal zone device ops not defined or invalid\n");
>                 return ERR_PTR(-EINVAL);
>         }
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -92,10 +92,6 @@ struct cooling_spec {
>  };
>  
>  struct thermal_zone_device_ops {
> -       int (*bind) (struct thermal_zone_device *,
> -                    struct thermal_cooling_device *);
> -       int (*unbind) (struct thermal_zone_device *,
> -                      struct thermal_cooling_device *);
>         bool (*should_bind) (struct thermal_zone_device *,
>                              const struct thermal_trip *,
>                              struct thermal_cooling_device *,
> @@ -247,12 +243,6 @@ const char *thermal_zone_device_type(str
>  int thermal_zone_device_id(struct thermal_zone_device *tzd);
>  struct device *thermal_zone_device(struct thermal_zone_device *tzd);
>  
> -int thermal_zone_bind_cooling_device(struct thermal_zone_device *, int,
> -                                    struct thermal_cooling_device *,
> -                                    unsigned long, unsigned long,
> -                                    unsigned int);
> -int thermal_zone_unbind_cooling_device(struct thermal_zone_device *, int,
> -                                      struct thermal_cooling_device *);
>  void thermal_zone_device_update(struct thermal_zone_device *,
>                                 enum thermal_notify_event);
>  
> Index: linux-pm/Documentation/driver-api/thermal/sysfs-api.rst
> ===================================================================
> --- linux-pm.orig/Documentation/driver-api/thermal/sysfs-api.rst
> +++ linux-pm/Documentation/driver-api/thermal/sysfs-api.rst
> @@ -58,10 +58,9 @@ temperature) and throttle appropriate de
>      ops:
>         thermal zone device call-backs.
>  
> -       .bind:
> -               bind the thermal zone device with a thermal cooling device.
> -       .unbind:
> -               unbind the thermal zone device with a thermal cooling device.
> +       .should_bind:
> +               check whether or not a given cooling device should be bound to
> +               a given trip point in this thermal zone.
>         .get_temp:
>                 get the current temperature of the thermal zone.
>         .set_trips:
> @@ -246,56 +245,6 @@ temperature) and throttle appropriate de
>      It deletes the corresponding entry from /sys/class/thermal folder and
>      unbinds itself from all the thermal zone devices using it.
>  
> -1.3 interface for binding a thermal zone device with a thermal cooling device
> ------------------------------------------------------------------------------
> -
> -    ::
> -
> -       int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
> -               int trip, struct thermal_cooling_device *cdev,
> -               unsigned long upper, unsigned long lower, unsigned int weight);
> -
> -    This interface function binds a thermal cooling device to a particular trip
> -    point of a thermal zone device.
> -
> -    This function is usually called in the thermal zone device .bind callback.
> -
> -    tz:
> -         the thermal zone device
> -    cdev:
> -         thermal cooling device
> -    trip:
> -         indicates which trip point in this thermal zone the cooling device
> -         is associated with.
> -    upper:
> -         the Maximum cooling state for this trip point.
> -         THERMAL_NO_LIMIT means no upper limit,
> -         and the cooling device can be in max_state.
> -    lower:
> -         the Minimum cooling state can be used for this trip point.
> -         THERMAL_NO_LIMIT means no lower limit,
> -         and the cooling device can be in cooling state 0.
> -    weight:
> -         the influence of this cooling device in this thermal
> -         zone.  See 1.4.1 below for more information.
> -
> -    ::
> -
> -       int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
> -                               int trip, struct thermal_cooling_device *cdev);
> -
> -    This interface function unbinds a thermal cooling device from a particular
> -    trip point of a thermal zone device. This function is usually called in
> -    the thermal zone device .unbind callback.
> -
> -    tz:
> -       the thermal zone device
> -    cdev:
> -       thermal cooling device
> -    trip:
> -       indicates which trip point in this thermal zone the cooling device
> -       is associated with.
> -
>  1.4 Thermal Zone Parameters
>  ---------------------------
>  
> @@ -366,8 +315,6 @@ Thermal cooling device sys I/F, created
>  
>  Then next two dynamic attributes are created/removed in pairs. They represent
>  the relationship between a thermal zone and its associated cooling device.
> -They are created/removed for each successful execution of
> -thermal_zone_bind_cooling_device/thermal_zone_unbind_cooling_device.
>  
>  ::
>  
> 
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions
  2024-08-19 16:33 ` [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions Rafael J. Wysocki
@ 2024-08-20  7:11   ` Zhang, Rui
  2024-08-21  9:34   ` lihuisong (C)
  2024-08-21 14:29   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Zhang, Rui @ 2024-08-20  7:11 UTC (permalink / raw)
  To: linux-pm@vger.kernel.org, rjw@rjwysocki.net
  Cc: lukasz.luba@arm.com, linux-kernel@vger.kernel.org,
	daniel.lezcano@linaro.org

On Mon, 2024-08-19 at 18:33 +0200, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make thermal_bind_cdev_to_trip() take a struct cooling_spec pointer
> to reduce the number of its arguments, change the return type of
> thermal_unbind_cdev_from_trip() to void and rearrange the code in
> thermal_zone_cdev_binding() to reduce the indentation level.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Zhang Rui <rui.zhang@intel.com>

thanks,
rui

> ---
> 
> v2 -> v3: Subject fix
> 
> v1-> v2: No changes
> 
> ---
>  drivers/thermal/thermal_core.c |   54 +++++++++++++++---------------
> -----------
>  1 file changed, 21 insertions(+), 33 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -757,15 +757,7 @@ struct thermal_zone_device *thermal_zone
>   * @tz:                pointer to struct thermal_zone_device
>   * @trip:      trip point the cooling devices is associated with in this zone.
>   * @cdev:      pointer to struct thermal_cooling_device
> - * @upper:     the Maximum cooling state for this trip point.
> - *             THERMAL_NO_LIMIT means no upper limit,
> - *             and the cooling device can be in max_state.
> - * @lower:     the Minimum cooling state can be used for this trip point.
> - *             THERMAL_NO_LIMIT means no lower limit,
> - *             and the cooling device can be in cooling state 0.
> - * @weight:    The weight of the cooling device to be bound to the
> - *             thermal zone. Use THERMAL_WEIGHT_DEFAULT for the
> - *             default value
> + * @c:         cooling specification for @trip and @cdev
>   *
>   * This interface function bind a thermal cooling device to the certain trip
>   * point of a thermal zone device.
> @@ -776,8 +768,7 @@ struct thermal_zone_device *thermal_zone
>  static int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
>                                      const struct thermal_trip *trip,
>                                      struct thermal_cooling_device *cdev,
> -                                    unsigned long upper, unsigned long lower,
> -                                    unsigned int weight)
> +                                    struct cooling_spec *c)
>  {
>         struct thermal_instance *dev;
>         struct thermal_instance *pos;
> @@ -791,17 +782,17 @@ static int thermal_bind_cdev_to_trip(str
>                 return -EINVAL;
>  
>         /* lower default 0, upper default max_state */
> -       if (lower == THERMAL_NO_LIMIT)
> -               lower = 0;
> +       if (c->lower == THERMAL_NO_LIMIT)
> +               c->lower = 0;
>  
> -       if (upper == THERMAL_NO_LIMIT) {
> -               upper = cdev->max_state;
> +       if (c->upper == THERMAL_NO_LIMIT) {
> +               c->upper = cdev->max_state;
>                 upper_no_limit = true;
>         } else {
>                 upper_no_limit = false;
>         }
>  
> -       if (lower > upper || upper > cdev->max_state)
> +       if (c->lower > c->upper || c->upper > cdev->max_state)
>                 return -EINVAL;
>  
>         dev = kzalloc(sizeof(*dev), GFP_KERNEL);
> @@ -810,11 +801,11 @@ static int thermal_bind_cdev_to_trip(str
>         dev->tz = tz;
>         dev->cdev = cdev;
>         dev->trip = trip;
> -       dev->upper = upper;
> +       dev->upper = c->upper;
>         dev->upper_no_limit = upper_no_limit;
> -       dev->lower = lower;
> +       dev->lower = c->lower;
>         dev->target = THERMAL_NO_TARGET;
> -       dev->weight = weight;
> +       dev->weight = c->weight;
>  
>         result = ida_alloc(&tz->ida, GFP_KERNEL);
>         if (result < 0)
> @@ -887,12 +878,10 @@ free_mem:
>   * This interface function unbind a thermal cooling device from the certain
>   * trip point of a thermal zone device.
>   * This function is usually called in the thermal zone device .unbind callback.
> - *
> - * Return: 0 on success, the proper error value otherwise.
>   */
> -static int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
> -                                        const struct thermal_trip *trip,
> -                                        struct thermal_cooling_device *cdev)
> +static void thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
> +                                         const struct thermal_trip *trip,
> +                                         struct thermal_cooling_device *cdev)
>  {
>         struct thermal_instance *pos, *next;
>  
> @@ -912,7 +901,7 @@ static int thermal_unbind_cdev_from_trip
>         }
>         mutex_unlock(&cdev->lock);
>  
> -       return -ENODEV;
> +       return;
>  
>  unbind:
>         device_remove_file(&tz->device, &pos->weight_attr);
> @@ -920,7 +909,6 @@ unbind:
>         sysfs_remove_link(&tz->device.kobj, pos->name);
>         ida_free(&tz->ida, pos->id);
>         kfree(pos);
> -       return 0;
>  }
>  
>  static void thermal_release(struct device *dev)
> @@ -959,7 +947,6 @@ static void thermal_zone_cdev_binding(st
>                                       struct thermal_cooling_device *cdev)
>  {
>         struct thermal_trip_desc *td;
> -       int ret;
>  
>         if (!tz->ops.should_bind)
>                 return;
> @@ -973,13 +960,14 @@ static void thermal_zone_cdev_binding(st
>                         .lower = THERMAL_NO_LIMIT,
>                         .weight = THERMAL_WEIGHT_DEFAULT
>                 };
> +               int ret;
>  
> -               if (tz->ops.should_bind(tz, trip, cdev, &c)) {
> -                       ret = thermal_bind_cdev_to_trip(tz, trip, cdev, c.upper,
> -                                                       c.lower, c.weight);
> -                       if (ret)
> -                               print_bind_err_msg(tz, trip, cdev, ret);
> -               }
> +               if (!tz->ops.should_bind(tz, trip, cdev, &c))
> +                       continue;
> +
> +               ret = thermal_bind_cdev_to_trip(tz, trip, cdev, &c);
> +               if (ret)
> +                       print_bind_err_msg(tz, trip, cdev, ret);
>         }
>  
>         mutex_unlock(&tz->lock);
> 
> 
> 
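
For readers following along, the effect of passing the cooling specification
by pointer can be sketched in plain userspace C.  This is only an illustration
under stated assumptions: struct cooling_spec is re-declared here with the
three fields visible in the hunks above, THERMAL_NO_LIMIT is given a
placeholder value rather than the kernel's definition, and
resolve_cooling_spec() is a hypothetical helper name, not a function in the
thermal core.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>

/* Placeholder value for illustration; not the kernel's definition. */
#define THERMAL_NO_LIMIT	ULONG_MAX

/* Re-declared with the fields used in the quoted hunks. */
struct cooling_spec {
	unsigned long upper;	/* highest cooling state allowed */
	unsigned long lower;	/* lowest cooling state allowed */
	unsigned int weight;	/* relative influence of the cooling device */
};

/*
 * Hypothetical helper: replace THERMAL_NO_LIMIT placeholders in @c with
 * concrete limits derived from @max_state, then validate the range.
 * Note that @c is modified in place, so the caller sees the resolved values.
 * Returns 0 on success or -EINVAL if the range is inconsistent.
 */
static int resolve_cooling_spec(struct cooling_spec *c,
				unsigned long max_state,
				bool *upper_no_limit)
{
	if (c->lower == THERMAL_NO_LIMIT)
		c->lower = 0;

	if (c->upper == THERMAL_NO_LIMIT) {
		c->upper = max_state;
		*upper_no_limit = true;
	} else {
		*upper_no_limit = false;
	}

	if (c->lower > c->upper || c->upper > max_state)
		return -EINVAL;

	return 0;
}
```

One thing the sketch makes visible is that, with a pointer argument, the
defaults are written back into the caller's structure instead of into local
copies of upper and lower.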


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store
  2024-08-19 15:56 ` [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
@ 2024-08-20  7:59   ` lihuisong (C)
  2024-08-21  9:36   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-20  7:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui


On 2024/8/19 23:56, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Two sysfs show/store functions for attributes representing thermal
> instances, trip_point_show() and weight_store(), retrieve the thermal
> zone pointer from the instance object at hand, but they may also get
> it from their dev argument, which is more consistent with what the
> other thermal sysfs functions do, so make them do so.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v1 -> v3: No changes (previously [06/17])
>
> ---
>   drivers/thermal/thermal_sysfs.c |   15 +++++++--------
>   1 file changed, 7 insertions(+), 8 deletions(-)
Using to_thermal_zone() and to_cooling_device() in sysfs looks good to me.
Acked-by: Huisong Li <lihuisong@huawei.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions
  2024-08-19 15:58 ` [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
@ 2024-08-20  8:27   ` lihuisong (C)
  2024-08-20 10:27     ` Rafael J. Wysocki
  2024-08-21  9:46   ` Daniel Lezcano
  2 siblings, 1 reply; 66+ messages in thread
From: lihuisong (C) @ 2024-08-20  8:27 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui


On 2024/8/19 23:58, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> acquire the thermal zone lock, the locking rules for their callers get
> complicated.  In particular, the thermal zone lock cannot be acquired
> in any code path leading to one of these functions even though it might
> be useful to do so.
>
> To address this, remove the thermal zone locking from both these
> functions, add lockdep assertions for the thermal zone lock to both
> of them and make their callers acquire the thermal zone lock instead.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v2 -> v3: Rebase after dropping patches [04-05/17] from the series
>
> v1 -> v2: No changes
>
> ---
>   drivers/acpi/thermal.c         |    2 +-
>   drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++--------
>   2 files changed, 23 insertions(+), 9 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -785,6 +785,7 @@ int thermal_bind_cdev_to_trip(struct the
>   	int result;
>   
<snip>
>   
> Index: linux-pm/drivers/acpi/thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/thermal.c
> +++ linux-pm/drivers/acpi/thermal.c
> @@ -609,7 +609,7 @@ static int acpi_thermal_bind_unbind_cdev
>   		.thermal = thermal, .cdev = cdev, .bind = bind
>   	};
>   
> -	return for_each_thermal_trip(thermal, bind_unbind_cdev_cb, &bd);
> +	return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
If so, it seems that for_each_thermal_trip() can be removed, or at least
no longer needs to be exported.
>   }
>   
>   static int
>
>
>
>
>
> .

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions
  2024-08-20  8:27   ` lihuisong (C)
@ 2024-08-20 10:27     ` Rafael J. Wysocki
  2024-08-21  9:02       ` lihuisong (C)
  0 siblings, 1 reply; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-20 10:27 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano, Lukasz Luba,
	Zhang Rui

On Tue, Aug 20, 2024 at 10:27 AM lihuisong (C) <lihuisong@huawei.com> wrote:
>
>
> On 2024/8/19 23:58, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> > acquire the thermal zone lock, the locking rules for their callers get
> > complicated.  In particular, the thermal zone lock cannot be acquired
> > in any code path leading to one of these functions even though it might
> > be useful to do so.
> >
> > To address this, remove the thermal zone locking from both these
> > functions, add lockdep assertions for the thermal zone lock to both
> > of them and make their callers acquire the thermal zone lock instead.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v2 -> v3: Rebase after dropping patches [04-05/17] from the series
> >
> > v1 -> v2: No changes
> >
> > ---
> >   drivers/acpi/thermal.c         |    2 +-
> >   drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++--------
> >   2 files changed, 23 insertions(+), 9 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/thermal_core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_core.c
> > +++ linux-pm/drivers/thermal/thermal_core.c
> > @@ -785,6 +785,7 @@ int thermal_bind_cdev_to_trip(struct the
> >       int result;
> >
> <snip>
> >
> > Index: linux-pm/drivers/acpi/thermal.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/thermal.c
> > +++ linux-pm/drivers/acpi/thermal.c
> > @@ -609,7 +609,7 @@ static int acpi_thermal_bind_unbind_cdev
> >               .thermal = thermal, .cdev = cdev, .bind = bind
> >       };
> >
> > -     return for_each_thermal_trip(thermal, bind_unbind_cdev_cb, &bd);
> > +     return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
> > If so, it seems that for_each_thermal_trip() can be removed, or at least
> > no longer needs to be exported.

I beg to differ:

$ git grep for_each_thermal_trip | head -1
drivers/net/wireless/intel/iwlwifi/mvm/tt.c: for_each_thermal_trip(mvm->tz_device.tzone, iwl_trip_temp_cb, &twd);

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers
  2024-08-19 15:50 ` [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers Rafael J. Wysocki
  2024-08-20  7:04   ` Zhang, Rui
@ 2024-08-21  7:57   ` Daniel Lezcano
  1 sibling, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21  7:57 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 17:50, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Fold bind_cdev() into __thermal_cooling_device_register() and bind_tz()
> into thermal_zone_device_register_with_trips() to reduce code bloat and
> make it somewhat easier to follow the code flow.

I don't fully agree with the description, but anyway I understand it will
be more relevant with the next changes.

> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-19 15:51 ` [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip() Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
@ 2024-08-21  7:59   ` Daniel Lezcano
  2024-08-21  8:49   ` lihuisong (C)
  2 siblings, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21  7:59 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 17:51, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> It is not necessary to look up the thermal zone and the cooling device
> in the respective global lists to check whether or not they are
> registered.  It is sufficient to check whether or not their respective
> list nodes are empty for this purpose.
> 
> Use the above observation to simplify thermal_bind_cdev_to_trip().  In
> addition, eliminate an unnecessary ternary operator from it.
> 
> Moreover, add lockdep_assert_held() for thermal_list_lock to it because
> that lock must be held by its callers when it is running.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Good catch

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-19 15:51 ` [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip() Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
  2024-08-21  7:59   ` Daniel Lezcano
@ 2024-08-21  8:49   ` lihuisong (C)
  2024-08-21  9:28     ` Daniel Lezcano
  2024-08-21 10:51     ` Rafael J. Wysocki
  2 siblings, 2 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21  8:49 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui


On 2024/8/19 23:51, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> It is not necessary to look up the thermal zone and the cooling device
> in the respective global lists to check whether or not they are
> registered.  It is sufficient to check whether or not their respective
> list nodes are empty for this purpose.
>
> Use the above observation to simplify thermal_bind_cdev_to_trip().  In
> addition, eliminate an unnecessary ternary operator from it.
>
> Moreover, add lockdep_assert_held() for thermal_list_lock to it because
> that lock must be held by its callers when it is running.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v1 -> v3: No changes
>
> ---
>   drivers/thermal/thermal_core.c |   16 ++++------------
>   1 file changed, 4 insertions(+), 12 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -781,25 +781,17 @@ int thermal_bind_cdev_to_trip(struct the
>   {
>   	struct thermal_instance *dev;
>   	struct thermal_instance *pos;
> -	struct thermal_zone_device *pos1;
> -	struct thermal_cooling_device *pos2;
>   	bool upper_no_limit;
>   	int result;
>   
> -	list_for_each_entry(pos1, &thermal_tz_list, node) {
> -		if (pos1 == tz)
> -			break;
> -	}
> -	list_for_each_entry(pos2, &thermal_cdev_list, node) {
> -		if (pos2 == cdev)
> -			break;
> -	}
> +	lockdep_assert_held(&thermal_list_lock);
>   
> -	if (tz != pos1 || cdev != pos2)
> +	if (list_empty(&tz->node) || list_empty(&cdev->node))
The old check ensures that tz and cdev have already been added to
thermal_tz_list and thermal_cdev_list, respectively.
Namely, tz and cdev are definitely registered and initialized.
That check also works for any uninitialized thermal_zone_device or cooling
device. But the new check doesn't seem to do that.
>   		return -EINVAL;
>   
>   	/* lower default 0, upper default max_state */
> -	lower = lower == THERMAL_NO_LIMIT ? 0 : lower;
> +	if (lower == THERMAL_NO_LIMIT)
> +		lower = 0;
>   
>   	if (upper == THERMAL_NO_LIMIT) {
>   		upper = cdev->max_state;
>
>
>
>
> .
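
The question above can be made concrete with a small userspace sketch.  It
re-implements just enough of the kernel's circular doubly-linked list to show
what list_empty() on an object's own node reports in each state.  The list
helpers below are simplified stand-ins written for this note, and
node_is_registered() is a hypothetical name; the sketch assumes that a node is
set up with INIT_LIST_HEAD() before registration and taken off the list with
list_del_init(), which is the condition under which the new check is
equivalent to the old list walk.

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Make @h point at itself, i.e. an empty list or an unlinked node. */
static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* True when @h points at itself. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Insert @entry right after @head. */
static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink @entry and reinitialize it so list_empty(@entry) is true again. */
static void list_del_init(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
	INIT_LIST_HEAD(entry);
}

/* Hypothetical helper mirroring the new-style check on an object's node. */
static bool node_is_registered(const struct list_head *node)
{
	return !list_empty(node);
}
```

With these helpers, a node reads as unregistered after INIT_LIST_HEAD(), as
registered after list_add(), and as unregistered again after list_del_init().
A node that was only zero-filled and never initialized has next == NULL, so
list_empty() returns false for it and it would look registered; the shortcut
therefore depends on every node being initialized before the check can run.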

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions
  2024-08-20 10:27     ` Rafael J. Wysocki
@ 2024-08-21  9:02       ` lihuisong (C)
  2024-08-21 10:30         ` Rafael J. Wysocki
  0 siblings, 1 reply; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21  9:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano, Lukasz Luba,
	Zhang Rui


On 2024/8/20 18:27, Rafael J. Wysocki wrote:
> On Tue, Aug 20, 2024 at 10:27 AM lihuisong (C) <lihuisong@huawei.com> wrote:
>>
>> On 2024/8/19 23:58, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
>>> acquire the thermal zone lock, the locking rules for their callers get
>>> complicated.  In particular, the thermal zone lock cannot be acquired
>>> in any code path leading to one of these functions even though it might
>>> be useful to do so.
>>>
>>> To address this, remove the thermal zone locking from both these
>>> functions, add lockdep assertions for the thermal zone lock to both
>>> of them and make their callers acquire the thermal zone lock instead.
>>>
>>> No intentional functional impact.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>> ---
>>>
>>> v2 -> v3: Rebase after dropping patches [04-05/17] from the series
>>>
>>> v1 -> v2: No changes
>>>
>>> ---
>>>    drivers/acpi/thermal.c         |    2 +-
>>>    drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++--------
>>>    2 files changed, 23 insertions(+), 9 deletions(-)
>>>
>>> Index: linux-pm/drivers/thermal/thermal_core.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/thermal/thermal_core.c
>>> +++ linux-pm/drivers/thermal/thermal_core.c
>>> @@ -785,6 +785,7 @@ int thermal_bind_cdev_to_trip(struct the
>>>        int result;
>>>
>> <snip>
>>> Index: linux-pm/drivers/acpi/thermal.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/acpi/thermal.c
>>> +++ linux-pm/drivers/acpi/thermal.c
>>> @@ -609,7 +609,7 @@ static int acpi_thermal_bind_unbind_cdev
>>>                .thermal = thermal, .cdev = cdev, .bind = bind
>>>        };
>>>
>>> -     return for_each_thermal_trip(thermal, bind_unbind_cdev_cb, &bd);
>>> +     return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
>> If so, it seems that for_each_thermal_trip() can be removed, or at least
>> no longer needs to be exported.
> I beg to differ:
>
> $ git grep for_each_thermal_trip | head -1
> drivers/net/wireless/intel/iwlwifi/mvm/tt.c: for_each_thermal_trip(mvm->tz_device.tzone, iwl_trip_temp_cb, &twd);
Can we modify tt.c to use the other one?
It doesn't seem necessary to keep two interfaces. I'm a little confused about that.
> .
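
One common reason for keeping two iterators is that they differ in who takes
the lock.  The sketch below illustrates that pattern in userspace C, on the
assumption (suggested by the ACPI hunk quoted above, not confirmed in this
message) that thermal_zone_for_each_trip() acquires the thermal zone lock and
then does the same walk as for_each_thermal_trip().  All toy_* names are
hypothetical, and the boolean "mutex" only models ownership for a single
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; every name here is hypothetical. */
struct toy_mutex { bool held; };
struct toy_trip { int temperature; };
struct toy_zone {
	struct toy_mutex lock;
	struct toy_trip trips[3];
	size_t num_trips;
};

typedef int (*toy_trip_cb)(struct toy_trip *trip, void *data);

static void toy_lock(struct toy_mutex *m)
{
	assert(!m->held);
	m->held = true;
}

static void toy_unlock(struct toy_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/* Lock-held variant: the caller must already own tz->lock. */
static int toy_for_each_trip_locked(struct toy_zone *tz, toy_trip_cb cb,
				    void *data)
{
	assert(tz->lock.held);	/* plays the role of a lockdep assertion */

	for (size_t i = 0; i < tz->num_trips; i++) {
		int ret = cb(&tz->trips[i], data);

		if (ret)
			return ret;	/* stop at the first nonzero result */
	}
	return 0;
}

/* Locking wrapper for callers that do not hold tz->lock. */
static int toy_for_each_trip(struct toy_zone *tz, toy_trip_cb cb, void *data)
{
	int ret;

	toy_lock(&tz->lock);
	ret = toy_for_each_trip_locked(tz, cb, data);
	toy_unlock(&tz->lock);

	return ret;
}

/* Example callback: remember the highest trip temperature seen. */
static int toy_max_temp_cb(struct toy_trip *trip, void *data)
{
	int *max = data;

	if (trip->temperature > *max)
		*max = trip->temperature;
	return 0;
}
```

Under that assumption, a caller already running under the zone lock needs the
first variant, and everyone else needs the wrapper, so neither one can simply
replace the other.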

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback
  2024-08-19 16:00 ` [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback Rafael J. Wysocki
  2024-08-20  7:06   ` Zhang, Rui
@ 2024-08-21  9:09   ` lihuisong (C)
  2024-08-21 13:21   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21  9:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui


On 2024/8/20 0:00, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The current design of the code binding cooling devices to trip points in
> thermal zones is convoluted and hard to follow.
>
> Namely, a driver that registers a thermal zone can provide .bind()
> and .unbind() operations for it, which are required to call either
> thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(),
> respectively, or thermal_zone_bind_cooling_device() and
> thermal_zone_unbind_cooling_device(), respectively, for every relevant
> trip point and the given cooling device.  Moreover, if .bind() is
> provided and .unbind() is not, the cleanup necessary during the removal
> of a thermal zone or a cooling device may not be carried out.
>
> In other words, the core relies on the thermal zone owners to do the
> right thing, which is error prone and far from obvious, even though all
> of that is not really necessary.  Specifically, if the core could ask
> the thermal zone owner, through a special thermal zone callback, whether
> or not a given cooling device should be bound to a given trip point in
> the given thermal zone, it might as well carry out all of the binding
> and unbinding by itself.  In particular, the unbinding can be done
> automatically without involving the thermal zone owner at all because
> all of the thermal instances associated with a thermal zone or cooling
> device going away must be deleted regardless.
>
> Accordingly, introduce a new thermal zone operation, .should_bind(),
> that can be invoked by the thermal core for a given thermal zone,
> trip point and cooling device combination in order to check whether
> or not the cooling device should be bound to the trip point at hand.
> It takes an additional cooling_spec argument allowing the thermal
> zone owner to specify the highest and lowest cooling states of the
> cooling device and its weight for the given trip point binding.
>
> Make the thermal core use this operation, if present, in the absence of
> .bind() and .unbind().  Note that .should_bind() will be called under
> the thermal zone lock.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
All thermal zones are linked to thermal_tz_list and all cooling devices are
linked to thermal_cdev_list, but whether a given cooling device should be
bound to a trip in a thermal zone is determined by the thermal driver.
Introducing should_bind() looks good to me.
Acked-by: Huisong Li <lihuisong@huawei.com>
>
>
>
>
> .
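
As a rough illustration of the division of labor described in the changelog,
here is a userspace mock in which the "core" walks the trips and asks a
driver-supplied callback for a yes/no answer plus an optional cooling
specification.  It is a sketch only: the toy_* types, TOY_NO_LIMIT,
TOY_WEIGHT_DEFAULT and the "fan" policy are made up for this note, and the
real callback operates on the kernel's thermal zone, trip and cooling device
structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Placeholder constants for illustration; not the kernel's definitions. */
#define TOY_NO_LIMIT		((unsigned long)-1)
#define TOY_WEIGHT_DEFAULT	0U

struct toy_trip { int id; };
struct toy_cdev { const char *type; };

struct cooling_spec {
	unsigned long upper;
	unsigned long lower;
	unsigned int weight;
};

struct toy_zone;

struct toy_zone_ops {
	/* Driver decides; the core does the actual binding. */
	bool (*should_bind)(struct toy_zone *tz, const struct toy_trip *trip,
			    struct toy_cdev *cdev, struct cooling_spec *c);
};

struct toy_zone {
	struct toy_zone_ops ops;
	struct toy_trip trips[2];
	size_t num_trips;
};

/* Hypothetical driver policy: bind "fan" devices to trip 1, capped at state 3. */
static bool toy_should_bind(struct toy_zone *tz, const struct toy_trip *trip,
			    struct toy_cdev *cdev, struct cooling_spec *c)
{
	(void)tz;

	if (trip->id != 1 || strcmp(cdev->type, "fan") != 0)
		return false;

	c->upper = 3;	/* lower and weight keep the core's defaults */
	return true;
}

/*
 * Core-side loop: for each trip, start from default limits, ask the driver,
 * and record the specification for every accepted trip in @out.
 * Returns the number of bindings that would be created.
 */
static size_t toy_cdev_binding(struct toy_zone *tz, struct toy_cdev *cdev,
			       struct cooling_spec *out)
{
	size_t bound = 0;

	if (!tz->ops.should_bind)
		return 0;

	for (size_t i = 0; i < tz->num_trips; i++) {
		struct cooling_spec c = {
			.upper = TOY_NO_LIMIT,
			.lower = TOY_NO_LIMIT,
			.weight = TOY_WEIGHT_DEFAULT,
		};

		if (!tz->ops.should_bind(tz, &tz->trips[i], cdev, &c))
			continue;

		out[bound++] = c;	/* the real core would bind here */
	}

	return bound;
}
```

Because the driver only answers a question, the mock core is free to undo
every binding on its own later, which matches the point in the changelog that
unbinding no longer has to involve the thermal zone owner.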

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
  2024-08-19 16:05 ` [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() Rafael J. Wysocki
  2024-08-20  7:08   ` Zhang, Rui
@ 2024-08-21  9:18   ` lihuisong (C)
  2024-08-21 13:23   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21  9:18 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Linux PM


On 2024/8/20 0:05, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> are only called locally in the thermal core now, they can be static,
> so change their definitions accordingly and drop their headers from
> the global thermal header file.
thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() are
used by acpi/thermal.c.
I guess that patch [07/14], which I didn't receive, must have removed those
uses.
If so, I'd like to add:
Acked-by: Huisong Li <lihuisong@huawei.com>
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>
> v2 -> v3: Rebase after dropping patches [04-05/17] from the series
>
> v1 -> v2: No changes
>
> ---
>   drivers/thermal/thermal_core.c |   10 ++++------
>   include/linux/thermal.h        |    8 --------
>   2 files changed, 4 insertions(+), 14 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -773,7 +773,7 @@ struct thermal_zone_device *thermal_zone
>    *
>    * Return: 0 on success, the proper error value otherwise.
>    */
> -int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
> +static int thermal_bind_cdev_to_trip(struct thermal_zone_device *tz,
>   				     const struct thermal_trip *trip,
>   				     struct thermal_cooling_device *cdev,
>   				     unsigned long upper, unsigned long lower,
> @@ -877,7 +877,6 @@ free_mem:
>   	kfree(dev);
>   	return result;
>   }
> -EXPORT_SYMBOL_GPL(thermal_bind_cdev_to_trip);
>   
>   int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz,
>   				     int trip_index,
> @@ -913,9 +912,9 @@ EXPORT_SYMBOL_GPL(thermal_zone_bind_cool
>    *
>    * Return: 0 on success, the proper error value otherwise.
>    */
> -int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
> -				  const struct thermal_trip *trip,
> -				  struct thermal_cooling_device *cdev)
> +static int thermal_unbind_cdev_from_trip(struct thermal_zone_device *tz,
> +					 const struct thermal_trip *trip,
> +					 struct thermal_cooling_device *cdev)
>   {
>   	struct thermal_instance *pos, *next;
>   
> @@ -945,7 +944,6 @@ unbind:
>   	kfree(pos);
>   	return 0;
>   }
> -EXPORT_SYMBOL_GPL(thermal_unbind_cdev_from_trip);
>   
>   int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz,
<...>

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-21  8:49   ` lihuisong (C)
@ 2024-08-21  9:28     ` Daniel Lezcano
  2024-08-21  9:44       ` lihuisong (C)
  2024-08-21 10:51     ` Rafael J. Wysocki
  1 sibling, 1 reply; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21  9:28 UTC (permalink / raw)
  To: lihuisong (C), Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 21/08/2024 10:49, lihuisong (C) wrote:

[ ... ]

>> -    list_for_each_entry(pos2, &thermal_cdev_list, node) {
>> -        if (pos2 == cdev)
>> -            break;
>> -    }
>> +    lockdep_assert_held(&thermal_list_lock);
>> -    if (tz != pos1 || cdev != pos2)
>> +    if (list_empty(&tz->node) || list_empty(&cdev->node))
> The old check ensured that tz and cdev had already been added to
> thermal_tz_list and thermal_cdev_list, respectively.
> Namely, that tz and cdev were definitely registered and initialized.
> That check also works for any uninitialized thermal_zone_device or cooling device.
> But the new check doesn't seem to do that.

If the tz or the cdev is registered, then its "->node" is not empty
because it is linked into thermal_tz_list or thermal_cdev_list.

So the choice is between walking the lists to find the tz/cdev and just
checking that "->node" is not empty. The latter is faster.

Did I misunderstand your comment?
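
To make the two checks concrete, here is a small userspace sketch. The
list helpers are a minimal stand-in for the kernel's <linux/list.h>, and
struct toy_zone plus the registered_by_*() helpers are hypothetical names
used only for illustration. Note the assumption it relies on: the node-based
check is only meaningful if "->node" was initialized and is reinitialized on
removal.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular doubly linked list. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	init_list_head(n);
}

/* Hypothetical object that can sit on a global list. */
struct toy_zone { struct list_head node; };

/* Old approach: walk the global list looking for the object, O(n). */
static bool registered_by_walk(const struct list_head *global,
			       const struct toy_zone *tz)
{
	const struct list_head *pos;

	for (pos = global->next; pos != global; pos = pos->next)
		if (pos == &tz->node)
			return true;
	return false;
}

/* New approach: look at the object's own node, O(1). */
static bool registered_by_node(const struct toy_zone *tz)
{
	return !list_empty(&tz->node);
}
```

Both helpers agree for any object whose node has been set up with
init_list_head(); they differ only for an object that never went through
initialization, where the walk still answers "no" and the node check reads
whatever happens to be in memory.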

[ ... ]

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks
  2024-08-19 15:52 ` [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
@ 2024-08-21  9:32   ` Daniel Lezcano
  2024-08-21 11:11     ` Rafael J. Wysocki
  1 sibling, 1 reply; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21  9:32 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 17:52, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Because the trip and cdev pointers are sufficient to identify a thermal
> instance holding them unambiguously, drop the additional thermal zone
> checks from two loops walking the list of thermal instances in a
> thermal zone.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>

I'm wondering if the thermal_instance 'tz' field could be removed too ?

> ---
> 
> v1 -> v3: No changes
> 
> ---
>   drivers/thermal/thermal_core.c |    4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -850,7 +850,7 @@ int thermal_bind_cdev_to_trip(struct the
>   	mutex_lock(&tz->lock);
>   	mutex_lock(&cdev->lock);
>   	list_for_each_entry(pos, &tz->thermal_instances, tz_node)
> -		if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
> +		if (pos->trip == trip && pos->cdev == cdev) {
>   			result = -EEXIST;
>   			break;
>   		}
> @@ -915,7 +915,7 @@ int thermal_unbind_cdev_from_trip(struct
>   	mutex_lock(&tz->lock);
>   	mutex_lock(&cdev->lock);
>   	list_for_each_entry_safe(pos, next, &tz->thermal_instances, tz_node) {
> -		if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
> +		if (pos->trip == trip && pos->cdev == cdev) {
>   			list_del(&pos->tz_node);
>   			list_del(&pos->cdev_node);
>   
> 
> 
> 


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks
  2024-08-19 16:31 ` [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks Rafael J. Wysocki
  2024-08-20  7:10   ` Zhang, Rui
@ 2024-08-21  9:33   ` lihuisong (C)
  2024-08-21 14:24   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21  9:33 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Linux PM


On 2024/8/20 0:31, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> There are no more callers of thermal_zone_bind_cooling_device() and
> thermal_zone_unbind_cooling_device(), so drop them along with all of
> the corresponding headers, code and documentation.
>
> Moreover, because the .bind() and .unbind() thermal zone callbacks would
> only be used when the above functions, respectively, were called, drop
> them as well along with all of the code related to them.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
lgtm,
Acked-by: Huisong Li <lihuisong@huawei.com>

* Re: [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions
  2024-08-19 16:33 ` [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions Rafael J. Wysocki
  2024-08-20  7:11   ` Zhang, Rui
@ 2024-08-21  9:34   ` lihuisong (C)
  2024-08-21 14:29   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21  9:34 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Linux PM


On 2024/8/20 0:33, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Make thermal_bind_cdev_to_trip() take a struct cooling_spec pointer
> to reduce the number of its arguments, change the return type of
> thermal_unbind_cdev_from_trip() to void and rearrange the code in
> thermal_zone_cdev_binding() to reduce the indentation level.
>
> No intentional functional impact.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Acked-by: Huisong Li <lihuisong@huawei.com>


* Re: [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store
  2024-08-19 15:56 ` [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
  2024-08-20  7:59   ` lihuisong (C)
@ 2024-08-21  9:36   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21  9:36 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 17:56, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Two sysfs show/store functions for attributes representing thermal
> instances, trip_point_show() and weight_store(), retrieve the thermal
> zone pointer from the instance object at hand, but they may also get
> it from their dev argument, which is more consistent with what the
> other thermal sysfs functions do, so make them do so.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-21  9:28     ` Daniel Lezcano
@ 2024-08-21  9:44       ` lihuisong (C)
  2024-08-21 10:49         ` Daniel Lezcano
  2024-08-21 11:12         ` Rafael J. Wysocki
  0 siblings, 2 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21  9:44 UTC (permalink / raw)
  To: Daniel Lezcano, Rafael J. Wysocki; +Cc: LKML, Lukasz Luba, Zhang Rui, Linux PM


On 2024/8/21 17:28, Daniel Lezcano wrote:
> On 21/08/2024 10:49, lihuisong (C) wrote:
>
> [ ... ]
>
>>> -    list_for_each_entry(pos2, &thermal_cdev_list, node) {
>>> -        if (pos2 == cdev)
>>> -            break;
>>> -    }
>>> +    lockdep_assert_held(&thermal_list_lock);
>>> -    if (tz != pos1 || cdev != pos2)
>>> +    if (list_empty(&tz->node) || list_empty(&cdev->node))
>> The old check ensured that tz and cdev had already been added to
>> thermal_tz_list and thermal_cdev_list, respectively.
>> Namely, that tz and cdev were definitely registered and initialized.
>> That check also works for any uninitialized thermal_zone_device or
>> cooling device.
>> But the new check doesn't seem to do that.
>
> If the tz or the cdev is registered, then its "->node" is not empty
> because it is linked into thermal_tz_list or thermal_cdev_list.
>
> So the choice is between walking the lists to find the tz/cdev and just
> checking that "->node" is not empty. The latter is faster.
Assume that the tz/cdev has not been initialized and registered in
thermal_tz_list or thermal_cdev_list, and that this interface is then
called directly.
>
> Did I misunderstand your comment?
>
> [ ... ]
>

* Re: [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions
  2024-08-19 15:58 ` [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions Rafael J. Wysocki
  2024-08-20  7:05   ` Zhang, Rui
  2024-08-20  8:27   ` lihuisong (C)
@ 2024-08-21  9:46   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21  9:46 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 17:58, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> acquire the thermal zone lock, the locking rules for their callers get
> complicated.  In particular, the thermal zone lock cannot be acquired
> in any code path leading to one of these functions even though it might
> be useful to do so.
> 
> To address this, remove the thermal zone locking from both these
> functions, add lockdep assertions for the thermal zone lock to both
> of them and make their callers acquire the thermal zone lock instead.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions
  2024-08-21  9:02       ` lihuisong (C)
@ 2024-08-21 10:30         ` Rafael J. Wysocki
  0 siblings, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-21 10:30 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML,
	Daniel Lezcano, Lukasz Luba, Zhang Rui

On Wed, Aug 21, 2024 at 11:02 AM lihuisong (C) <lihuisong@huawei.com> wrote:
>
>
> On 2024/8/20 18:27, Rafael J. Wysocki wrote:
> > On Tue, Aug 20, 2024 at 10:27 AM lihuisong (C) <lihuisong@huawei.com> wrote:
> >>
> >> On 2024/8/19 23:58, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>
> >>> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> >>> acquire the thermal zone lock, the locking rules for their callers get
> >>> complicated.  In particular, the thermal zone lock cannot be acquired
> >>> in any code path leading to one of these functions even though it might
> >>> be useful to do so.
> >>>
> >>> To address this, remove the thermal zone locking from both these
> >>> functions, add lockdep assertions for the thermal zone lock to both
> >>> of them and make their callers acquire the thermal zone lock instead.
> >>>
> >>> No intentional functional impact.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>> ---
> >>>
> >>> v2 -> v3: Rebase after dropping patches [04-05/17] from the series
> >>>
> >>> v1 -> v2: No changes
> >>>
> >>> ---
> >>>    drivers/acpi/thermal.c         |    2 +-
> >>>    drivers/thermal/thermal_core.c |   30 ++++++++++++++++++++++--------
> >>>    2 files changed, 23 insertions(+), 9 deletions(-)
> >>>
> >>> Index: linux-pm/drivers/thermal/thermal_core.c
> >>> ===================================================================
> >>> --- linux-pm.orig/drivers/thermal/thermal_core.c
> >>> +++ linux-pm/drivers/thermal/thermal_core.c
> >>> @@ -785,6 +785,7 @@ int thermal_bind_cdev_to_trip(struct the
> >>>        int result;
> >>>
> >> <snip>
> >>> Index: linux-pm/drivers/acpi/thermal.c
> >>> ===================================================================
> >>> --- linux-pm.orig/drivers/acpi/thermal.c
> >>> +++ linux-pm/drivers/acpi/thermal.c
> >>> @@ -609,7 +609,7 @@ static int acpi_thermal_bind_unbind_cdev
> >>>                .thermal = thermal, .cdev = cdev, .bind = bind
> >>>        };
> >>>
> >>> -     return for_each_thermal_trip(thermal, bind_unbind_cdev_cb, &bd);
> >>> +     return thermal_zone_for_each_trip(thermal, bind_unbind_cdev_cb, &bd);
> >> If so, it seems that for_each_thermal_trip() can be removed, or at
> >> least no longer needs to be exported.
> > I beg to differ:
> >
> > $ git grep for_each_thermal_trip | head -1
> > drivers/net/wireless/intel/iwlwifi/mvm/tt.c:
> > for_each_thermal_trip(mvm->tz_device.tzone, iwl_trip_temp_cb, &twd);
> Can we modify it for tt.c?

Not really.

tt.c invokes this under the thermal zone lock, so it cannot use the
"locked" variant.

> It doesn't seem necessary to keep two interfaces. I'm a little confused about that.

The difference between for_each_thermal_trip() and
thermal_zone_for_each_trip() is "unlocked" vs "locked", respectively.
It may just be a question of naming ...
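
In case it helps, the pattern can be sketched in plain userspace C as
follows. Everything named toy_* is hypothetical: a bool flag stands in for
the thermal zone mutex and assert() plays the role of
lockdep_assert_held(), so this only illustrates the calling convention and
is not the code in thermal_core.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: a flag instead of a mutex, plain ints as trips. */
struct toy_zone {
	bool lock_held;
	int trips[3];
	size_t num_trips;
};

typedef int (*toy_trip_cb)(int *trip, void *data);

/* "Unlocked" variant: the caller must already hold the zone lock. */
static int toy_for_each_trip(struct toy_zone *tz, toy_trip_cb cb, void *data)
{
	size_t i;
	int ret;

	assert(tz->lock_held);	/* stand-in for lockdep_assert_held() */

	for (i = 0; i < tz->num_trips; i++) {
		ret = cb(&tz->trips[i], data);
		if (ret)
			return ret;	/* stop at the first nonzero result */
	}
	return 0;
}

/* "Locked" variant: takes the lock itself, so the caller must not hold it. */
static int toy_zone_for_each_trip(struct toy_zone *tz, toy_trip_cb cb,
				  void *data)
{
	int ret;

	assert(!tz->lock_held);	/* a real mutex would deadlock here instead */
	tz->lock_held = true;
	ret = toy_for_each_trip(tz, cb, data);
	tz->lock_held = false;
	return ret;
}

/* Example callback: add each trip value to the int that data points to. */
static int toy_sum_cb(int *trip, void *data)
{
	*(int *)data += *trip;
	return 0;
}
```

A caller that already runs under the lock can only use toy_for_each_trip(),
while everyone else goes through toy_zone_for_each_trip(). That is why both
entry points need to exist even though they walk the same trips.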

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-21  9:44       ` lihuisong (C)
@ 2024-08-21 10:49         ` Daniel Lezcano
  2024-08-21 11:22           ` lihuisong (C)
  2024-08-21 11:12         ` Rafael J. Wysocki
  1 sibling, 1 reply; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 10:49 UTC (permalink / raw)
  To: lihuisong (C), Rafael J. Wysocki; +Cc: LKML, Lukasz Luba, Zhang Rui, Linux PM

On 21/08/2024 11:44, lihuisong (C) wrote:
> 
> On 2024/8/21 17:28, Daniel Lezcano wrote:
>> On 21/08/2024 10:49, lihuisong (C) wrote:
>>
>> [ ... ]
>>
>>>> -    list_for_each_entry(pos2, &thermal_cdev_list, node) {
>>>> -        if (pos2 == cdev)
>>>> -            break;
>>>> -    }
>>>> +    lockdep_assert_held(&thermal_list_lock);
>>>> -    if (tz != pos1 || cdev != pos2)
>>>> +    if (list_empty(&tz->node) || list_empty(&cdev->node))
>>> The old check ensured that tz and cdev had already been added to
>>> thermal_tz_list and thermal_cdev_list, respectively.
>>> Namely, that tz and cdev were definitely registered and initialized.
>>> That check also works for any uninitialized thermal_zone_device or
>>> cooling device.
>>> But the new check doesn't seem to do that.
>>
>> If the tz or the cdev is registered, then its "->node" is not empty
>> because it is linked into thermal_tz_list or thermal_cdev_list.
>>
>> So the choice is between walking the lists to find the tz/cdev and just
>> checking that "->node" is not empty. The latter is faster.
> Assume that the tz/cdev has not been initialized and registered in
> thermal_tz_list or thermal_cdev_list, and that this interface is then
> called directly.

Then there is a bug in the internal code because
thermal_zone_device_register*() and thermal_cooling_device_register()
allocate and initialize those structures.

The caller of the function is supposed to use the API provided by the
thermal framework. It is not possible to plan for every stupid thing a
driver can do. In this particular case, the kernel will very likely
crash immediately, which is a sufficient test for me and coercive enough
to make the API user question their code ;)

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-21  8:49   ` lihuisong (C)
  2024-08-21  9:28     ` Daniel Lezcano
@ 2024-08-21 10:51     ` Rafael J. Wysocki
  1 sibling, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-21 10:51 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano, Lukasz Luba,
	Zhang Rui

On Wed, Aug 21, 2024 at 12:43 PM lihuisong (C) <lihuisong@huawei.com> wrote:
>
>
> On 2024/8/19 23:51, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > It is not necessary to look up the thermal zone and the cooling device
> > in the respective global lists to check whether or not they are
> > registered.  It is sufficient to check whether or not their respective
> > list nodes are empty for this purpose.
> >
> > Use the above observation to simplify thermal_bind_cdev_to_trip().  In
> > addition, eliminate an unnecessary ternary operator from it.
> >
> > Moreover, add lockdep_assert_held() for thermal_list_lock to it because
> > that lock must be held by its callers when it is running.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v1 -> v3: No changes
> >
> > ---
> >   drivers/thermal/thermal_core.c |   16 ++++------------
> >   1 file changed, 4 insertions(+), 12 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/thermal_core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_core.c
> > +++ linux-pm/drivers/thermal/thermal_core.c
> > @@ -781,25 +781,17 @@ int thermal_bind_cdev_to_trip(struct the
> >   {
> >       struct thermal_instance *dev;
> >       struct thermal_instance *pos;
> > -     struct thermal_zone_device *pos1;
> > -     struct thermal_cooling_device *pos2;
> >       bool upper_no_limit;
> >       int result;
> >
> > -     list_for_each_entry(pos1, &thermal_tz_list, node) {
> > -             if (pos1 == tz)
> > -                     break;
> > -     }
> > -     list_for_each_entry(pos2, &thermal_cdev_list, node) {
> > -             if (pos2 == cdev)
> > -                     break;
> > -     }
> > +     lockdep_assert_held(&thermal_list_lock);
> >
> > -     if (tz != pos1 || cdev != pos2)
> > +     if (list_empty(&tz->node) || list_empty(&cdev->node))
> > The old check ensured that tz and cdev had already been added to
> > thermal_tz_list and thermal_cdev_list, respectively.
> > Namely, that tz and cdev were definitely registered and initialized.
> > That check also works for any uninitialized thermal_zone_device or cooling device.
> > But the new check doesn't seem to do that.

It doesn't need to do it and after this series it is only called from
thermal_zone_device_register_with_trips() and
__thermal_cooling_device_register() via thermal_zone_cdev_binding()
after both the cdev and the tz have been added to the list, under
thermal_list_lock.

I guess I can send a patch to remove the check altogether now.

* Re: [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks
  2024-08-21  9:32   ` Daniel Lezcano
@ 2024-08-21 11:11     ` Rafael J. Wysocki
  2024-08-21 11:56       ` Daniel Lezcano
  0 siblings, 1 reply; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-21 11:11 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Rafael J. Wysocki, Linux PM, LKML, Lukasz Luba, Zhang Rui

On Wed, Aug 21, 2024 at 1:01 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 19/08/2024 17:52, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Because the trip and cdev pointers are sufficient to identify a thermal
> > instance holding them unambiguously, drop the additional thermal zone
> > checks from two loops walking the list of thermal instances in a
> > thermal zone.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>
> I'm wondering if the thermal_instance 'tz' field could be removed too ?

It is used in a debug printk in __thermal_cdev_update().  If that
message can be dropped, then yes, but that would be a separate patch
anyway.

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-21  9:44       ` lihuisong (C)
  2024-08-21 10:49         ` Daniel Lezcano
@ 2024-08-21 11:12         ` Rafael J. Wysocki
  1 sibling, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-21 11:12 UTC (permalink / raw)
  To: lihuisong (C)
  Cc: Daniel Lezcano, Rafael J. Wysocki, LKML, Lukasz Luba, Zhang Rui,
	Linux PM

On Wed, Aug 21, 2024 at 1:11 PM lihuisong (C) <lihuisong@huawei.com> wrote:
>
>
> On 2024/8/21 17:28, Daniel Lezcano wrote:
> > On 21/08/2024 10:49, lihuisong (C) wrote:
> >
> > [ ... ]
> >
> >>> -    list_for_each_entry(pos2, &thermal_cdev_list, node) {
> >>> -        if (pos2 == cdev)
> >>> -            break;
> >>> -    }
> >>> +    lockdep_assert_held(&thermal_list_lock);
> >>> -    if (tz != pos1 || cdev != pos2)
> >>> +    if (list_empty(&tz->node) || list_empty(&cdev->node))
> >> The old check ensured that tz and cdev had already been added to
> >> thermal_tz_list and thermal_cdev_list, respectively.
> >> Namely, that tz and cdev were definitely registered and initialized.
> >> That check also works for any uninitialized thermal_zone_device or
> >> cooling device.
> >> But the new check doesn't seem to do that.
> >
> > If the tz or the cdev is registered, then its "->node" is not empty
> > because it is linked into thermal_tz_list or thermal_cdev_list.
> >
> > So the choice is between walking the lists to find the tz/cdev and just
> > checking that "->node" is not empty. The latter is faster.
> Assume that the tz/cdev has not been initialized and registered in
> thermal_tz_list or thermal_cdev_list, and that this interface is then
> called directly.

Who does this?

* Re: [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip()
  2024-08-21 10:49         ` Daniel Lezcano
@ 2024-08-21 11:22           ` lihuisong (C)
  0 siblings, 0 replies; 66+ messages in thread
From: lihuisong (C) @ 2024-08-21 11:22 UTC (permalink / raw)
  To: Daniel Lezcano, Rafael J. Wysocki; +Cc: LKML, Lukasz Luba, Zhang Rui, Linux PM


On 2024/8/21 18:49, Daniel Lezcano wrote:
> On 21/08/2024 11:44, lihuisong (C) wrote:
>>
>> On 2024/8/21 17:28, Daniel Lezcano wrote:
>>> On 21/08/2024 10:49, lihuisong (C) wrote:
>>>
>>> [ ... ]
>>>
>>>>> -    list_for_each_entry(pos2, &thermal_cdev_list, node) {
>>>>> -        if (pos2 == cdev)
>>>>> -            break;
>>>>> -    }
>>>>> +    lockdep_assert_held(&thermal_list_lock);
>>>>> -    if (tz != pos1 || cdev != pos2)
>>>>> +    if (list_empty(&tz->node) || list_empty(&cdev->node))
>>>> The old check ensured that tz and cdev had already been added to
>>>> thermal_tz_list and thermal_cdev_list, respectively.
>>>> Namely, that tz and cdev were definitely registered and initialized.
>>>> That check also works for any uninitialized thermal_zone_device or
>>>> cooling device.
>>>> But the new check doesn't seem to do that.
>>>
>>> If the tz or the cdev is registered, then its "->node" is not
>>> empty because it is linked into thermal_tz_list or thermal_cdev_list.
>>>
>>> So the choice is between walking the lists to find the tz/cdev and
>>> just checking that "->node" is not empty. The latter is faster.
>> Assume that the tz/cdev has not been initialized and registered in
>> thermal_tz_list or thermal_cdev_list, and that this interface is then
>> called directly.
>
> Then there is a bug in the internal code because
> thermal_zone_device_register*() and thermal_cooling_device_register()
> allocate and initialize those structures.
>
> The caller of the function is supposed to use the API provided by the
> thermal framework. It is not possible to plan for every stupid thing a
> driver can do. In this particular case, the kernel will very likely
> crash immediately, which is a sufficient test for me and coercive
> enough to make the API user question their code ;)

Good point. Agreed.


* Re: [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks
  2024-08-21 11:11     ` Rafael J. Wysocki
@ 2024-08-21 11:56       ` Daniel Lezcano
  2024-08-21 12:52         ` Rafael J. Wysocki
  0 siblings, 1 reply; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 11:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Lukasz Luba, Zhang Rui

On 21/08/2024 13:11, Rafael J. Wysocki wrote:
> On Wed, Aug 21, 2024 at 1:01 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 19/08/2024 17:52, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Because the trip and cdev pointers are sufficient to identify a thermal
>>> instance holding them unambiguously, drop the additional thermal zone
>>> checks from two loops walking the list of thermal instances in a
>>> thermal zone.
>>>
>>> No intentional functional impact.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>>
>> I'm wondering if the thermal_instance 'tz' field could be removed too ?
> 
> It is used in a debug printk in __thermal_cdev_update().  If that
> message can be dropped, then yes, but that would be a separate patch
> anyway.

Yes, I don't think the debug message is worth keeping the field for.


-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks
  2024-08-21 11:56       ` Daniel Lezcano
@ 2024-08-21 12:52         ` Rafael J. Wysocki
  0 siblings, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-21 12:52 UTC (permalink / raw)
  To: Daniel Lezcano
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML, Lukasz Luba,
	Zhang Rui

On Wed, Aug 21, 2024 at 1:56 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 21/08/2024 13:11, Rafael J. Wysocki wrote:
> > On Wed, Aug 21, 2024 at 1:01 PM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 19/08/2024 17:52, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>
> >>> Because the trip and cdev pointers are sufficient to identify a thermal
> >>> instance holding them unambiguously, drop the additional thermal zone
> >>> checks from two loops walking the list of thermal instances in a
> >>> thermal zone.
> >>>
> >>> No intentional functional impact.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>
> >> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> >>
> >> I'm wondering if the thermal_instance 'tz' field could be removed too ?
> >
> > It is used in a debug printk in __thermal_cdev_update().  If that
> > message can be dropped, then yes, but that would be a separate patch
> > anyway.
>
> Yes, I don't think the debug message is worth keeping the field for.

OK

* Re: [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback
  2024-08-19 16:00 ` [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback Rafael J. Wysocki
  2024-08-20  7:06   ` Zhang, Rui
  2024-08-21  9:09   ` lihuisong (C)
@ 2024-08-21 13:21   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 13:21 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 18:00, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The current design of the code binding cooling devices to trip points in
> thermal zones is convoluted and hard to follow.
> 
> Namely, a driver that registers a thermal zone can provide .bind()
> and .unbind() operations for it, which are required to call either
> thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip(),
> respectively, or thermal_zone_bind_cooling_device() and
> thermal_zone_unbind_cooling_device(), respectively, for every relevant
> trip point and the given cooling device.  Moreover, if .bind() is
> provided and .unbind() is not, the cleanup necessary during the removal
> of a thermal zone or a cooling device may not be carried out.
> 
> In other words, the core relies on the thermal zone owners to do the
> right thing, which is error prone and far from obvious, even though all
> of that is not really necessary.  Specifically, if the core could ask
> the thermal zone owner, through a special thermal zone callback, whether
> or not a given cooling device should be bound to a given trip point in
> the given thermal zone, it might as well carry out all of the binding
> and unbinding by itself.  In particular, the unbinding can be done
> automatically without involving the thermal zone owner at all because
> all of the thermal instances associated with a thermal zone or cooling
> device going away must be deleted regardless.
> 
> Accordingly, introduce a new thermal zone operation, .should_bind(),
> that can be invoked by the thermal core for a given thermal zone,
> trip point and cooling device combination in order to check whether
> or not the cooling device should be bound to the trip point at hand.
> It takes an additional cooling_spec argument allowing the thermal
> zone owner to specify the highest and lowest cooling states of the
> cooling device and its weight for the given trip point binding.
> 
> Make the thermal core use this operation, if present, in the absence of
> .bind() and .unbind().  Note that .should_bind() will be called under
> the thermal zone lock.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v1 -> v3: No changes (previously [08/17])
> 
> ---
>   drivers/thermal/thermal_core.c |  106 +++++++++++++++++++++++++++++++----------
>   include/linux/thermal.h        |   10 +++
>   2 files changed, 92 insertions(+), 24 deletions(-)
> 
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -85,11 +85,21 @@ struct thermal_trip {
>   
>   struct thermal_zone_device;
>   
> +struct cooling_spec {
> +	unsigned long upper;	/* Highest cooling state  */
> +	unsigned long lower;	/* Lowest cooling state  */
> +	unsigned int weight;	/* Cooling device weight */
> +};
> +
>   struct thermal_zone_device_ops {
>   	int (*bind) (struct thermal_zone_device *,
>   		     struct thermal_cooling_device *);
>   	int (*unbind) (struct thermal_zone_device *,
>   		       struct thermal_cooling_device *);
> +	bool (*should_bind) (struct thermal_zone_device *,
> +			     const struct thermal_trip *,
> +			     struct thermal_cooling_device *,
> +			     struct cooling_spec *);
>   	int (*get_temp) (struct thermal_zone_device *, int *);
>   	int (*set_trips) (struct thermal_zone_device *, int, int);
>   	int (*change_mode) (struct thermal_zone_device *,
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -991,12 +991,61 @@ static struct class *thermal_class;
>   
>   static inline
>   void print_bind_err_msg(struct thermal_zone_device *tz,
> +			const struct thermal_trip *trip,
>   			struct thermal_cooling_device *cdev, int ret)
>   {
> +	if (trip) {
> +		dev_err(&tz->device, "binding cdev %s to trip %d failed: %d\n",
> +			cdev->type, thermal_zone_trip_id(tz, trip), ret);
> +		return;
> +	}
> +
>   	dev_err(&tz->device, "binding zone %s with cdev %s failed:%d\n",
>   		tz->type, cdev->type, ret);
>   }
>   
> +static void thermal_zone_cdev_binding(struct thermal_zone_device *tz,
> +				      struct thermal_cooling_device *cdev)

Nitpick: is there a reason to use 'binding' instead of 'bind'?

IMO it would be more consistent to keep the same wording as the ops.
The present participle usually describes an action in progress,
typically when reporting an event. Here it names an action to be done
(feel free to send a separate patch for the renaming).

Other than that

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
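
For readers following along, the flow described in the quoted changelog
(the core asks the zone owner about each trip and does the binding
itself) can be sketched in userspace C.  This is only an illustration
under stated assumptions: every type below is a simplified stand-in for
the kernel one, and zone_bind_cdev(), struct binding and passive_only()
are hypothetical names that do not appear in the patch.  Only struct
cooling_spec and the .should_bind() argument list follow the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
enum trip_type { TRIP_ACTIVE, TRIP_PASSIVE, TRIP_HOT, TRIP_CRITICAL };

struct trip { enum trip_type type; };
struct cdev { const char *type; };

/* Same layout as struct cooling_spec in the quoted patch. */
struct cooling_spec {
	unsigned long upper;	/* Highest cooling state */
	unsigned long lower;	/* Lowest cooling state */
	unsigned int weight;	/* Cooling device weight */
};

struct zone;

struct zone_ops {
	bool (*should_bind)(struct zone *, const struct trip *,
			    struct cdev *, struct cooling_spec *);
};

struct zone {
	const struct zone_ops *ops;
	struct trip *trips;
	size_t num_trips;
};

/* Hypothetical record of one trip/cdev binding kept by the "core". */
struct binding {
	const struct trip *trip;
	struct cdev *cdev;
	struct cooling_spec spec;
};

/*
 * Ask the zone owner about every trip and record a binding for each
 * "yes".  Returns the number of bindings written to @out (at most @max).
 */
size_t zone_bind_cdev(struct zone *tz, struct cdev *cdev,
		      struct binding *out, size_t max)
{
	size_t n = 0;

	assert(tz && tz->ops && tz->ops->should_bind);

	for (size_t i = 0; i < tz->num_trips && n < max; i++) {
		/* Placeholder defaults that the owner may override. */
		struct cooling_spec c = { .upper = 0, .lower = 0, .weight = 0 };

		if (!tz->ops->should_bind(tz, &tz->trips[i], cdev, &c))
			continue;

		out[n].trip = &tz->trips[i];
		out[n].cdev = cdev;
		out[n].spec = c;
		n++;
	}
	return n;
}

/* Example owner policy (hypothetical): bind any cdev to passive trips. */
bool passive_only(struct zone *tz, const struct trip *trip,
		  struct cdev *cdev, struct cooling_spec *c)
{
	(void)tz;
	(void)cdev;
	c->weight = 1;
	return trip->type == TRIP_PASSIVE;
}
```

The point of the design is visible in zone_bind_cdev(): the owner only
answers a yes/no question per trip, so the bookkeeping (and therefore
the unbinding) stays entirely on the core side.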

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

* Re: [PATCH v3 07/14] thermal: ACPI: Use the .should_bind() thermal zone callback
  2024-08-19 16:02 ` [PATCH v3 07/14] thermal: ACPI: Use the " Rafael J. Wysocki
  2024-08-20  7:06   ` Zhang, Rui
@ 2024-08-21 13:22   ` Daniel Lezcano
  1 sibling, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 13:22 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 18:02, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the ACPI thermal zone driver use the .should_bind() thermal zone
> callback to provide the thermal core with the information on whether or
> not to bind the given cooling device to the given trip point in the
> given thermal zone.  If it returns 'true', the thermal core will bind
> the cooling device to the trip and the corresponding unbinding will be
> taken care of automatically by the core on the removal of the involved
> thermal zone or cooling device.
> 
> This replaces the .bind() and .unbind() thermal zone callbacks, which
> allows the code to be simplified quite significantly while providing
> the same functionality.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Nice!

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>

* Re: [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
  2024-08-19 16:05 ` [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() Rafael J. Wysocki
  2024-08-20  7:08   ` Zhang, Rui
  2024-08-21  9:18   ` lihuisong (C)
@ 2024-08-21 13:23   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 13:23 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 18:05, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Since thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip()
> are only called locally in the thermal core now, they can be static,
> so change their definitions accordingly and drop their headers from
> the global thermal header file.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>


* Re: [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback
  2024-08-19 16:19 ` [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback Rafael J. Wysocki
  2024-08-19 20:24   ` Peter Kästle
@ 2024-08-21 13:25   ` Daniel Lezcano
  1 sibling, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 13:25 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: LKML, Lukasz Luba, Zhang Rui, Hans de Goede, Peter Kaestle,
	platform-driver-x86

On 19/08/2024 18:19, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the acerhdf driver use the .should_bind() thermal zone
> callback to provide the thermal core with the information on whether or
> not to bind the given cooling device to the given trip point in the
> given thermal zone.  If it returns 'true', the thermal core will bind
> the cooling device to the trip and the corresponding unbinding will be
> taken care of automatically by the core on the removal of the involved
> thermal zone or cooling device.
> 
> The previously existing acerhdf_bind() function bound cooling devices
> to thermal trip point 0 only, so the new callback needs to return 'true'
> for trip point 0.  However, it is straightforward to observe that trip
> point 0 is an active trip point and the only other trip point in the
> driver's thermal zone is a critical one, so it is sufficient to return
> 'true' from that callback if the type of the given trip point is
> THERMAL_TRIP_ACTIVE.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Acked-by: Hans de Goede <hdegoede@redhat.com>

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
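
The equivalence argued in the quoted changelog (binding to trip index 0
versus binding to the only active trip) can be checked with a small
userspace sketch.  The trip table, its temperatures and both function
names are assumptions made for illustration; they are not taken from the
acerhdf driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum trip_type { TRIP_ACTIVE, TRIP_CRITICAL };

struct trip {
	enum trip_type type;
	int temperature;	/* millidegrees Celsius, made-up values */
};

/* Layout described in the changelog: trip 0 active, trip 1 critical. */
const struct trip zone_trips[2] = {
	{ TRIP_ACTIVE,   65000 },
	{ TRIP_CRITICAL, 89000 },
};

/* Old-style decision: depends on the position of the trip in the table. */
bool bind_by_index(size_t idx)
{
	return idx == 0;
}

/* New-style decision: depends only on the trip object handed over. */
bool should_bind_by_type(const struct trip *trip)
{
	assert(trip != NULL);
	return trip->type == TRIP_ACTIVE;
}
```

As long as the zone has exactly one active trip, both functions agree
for every trip, but only the second one keeps working if the core ever
stores the trips in a different order.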


* Re: [PATCH v3 11/14] thermal: imx: Use the .should_bind() thermal zone callback
  2024-08-19 16:26 ` [PATCH v3 11/14] thermal: imx: " Rafael J. Wysocki
@ 2024-08-21 13:42   ` Daniel Lezcano
  0 siblings, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 13:42 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 18:26, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the imx_thermal driver use the .should_bind() thermal zone callback
> to provide the thermal core with the information on whether or not to
> bind the given cooling device to the given trip point in the given
> thermal zone.  If it returns 'true', the thermal core will bind the
> cooling device to the trip and the corresponding unbinding will be
> taken care of automatically by the core on the removal of the involved
> thermal zone or cooling device.
> 
> In the imx_thermal case, it only needs to return 'true' for the passive
> trip point and it will match any cooling device passed to it, in
> analogy with the old-style imx_bind() callback function.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>


* Re: [PATCH v3 12/14] thermal/of: Use the .should_bind() thermal zone callback
  2024-08-19 16:30 ` [PATCH v3 12/14] thermal/of: " Rafael J. Wysocki
@ 2024-08-21 14:20   ` Daniel Lezcano
  2024-08-26 11:31   ` Marek Szyprowski
  1 sibling, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 14:20 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: LKML, Lukasz Luba, Zhang Rui, Krzysztof Kozlowski

On 19/08/2024 18:30, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the thermal_of driver use the .should_bind() thermal zone callback
> to provide the thermal core with the information on whether or not to
> bind the given cooling device to the given trip point in the given
> thermal zone.  If it returns 'true', the thermal core will bind the
> cooling device to the trip and the corresponding unbinding will be
> taken care of automatically by the core on the removal of the involved
> thermal zone or cooling device.
> 
> This replaces the .bind() and .unbind() thermal zone callbacks which
> assumed the same trip points ordering in the driver and in the thermal
> core (that may not be true any more in the future).  The .bind()
> callback would walk the given thermal zone's cooling maps to find all
> of the valid trip point combinations with the given cooling device and
> it would call thermal_zone_bind_cooling_device() for all of them using
> trip point indices reflecting the ordering of the trips in the DT.
> 
> The .should_bind() callback still walks the thermal zone's cooling maps,
> but it can use the trip object passed to it by the thermal core to find
> the trip in question in the first place and then it uses the
> corresponding 'cooling-device' entries to look up the given cooling
> device.  To be able to match the trip object provided by the thermal
> core to a specific device node, the driver sets the 'priv' field of each
> trip to the corresponding device node pointer during initialization.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Daniel Lezcano <daniel.lezcano@linaro.org> # rk3399-rock960
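
The matching scheme from the quoted changelog (find the trip through its
'priv' pointer, then look for the cooling device among the map entries)
can be sketched in userspace C.  All of the types and the function name
of_style_should_bind() are hypothetical stand-ins; the real callback
walks device tree nodes and has a different signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node { const char *name; };		/* stand-in for a DT node */
struct trip { int temperature; void *priv; };	/* priv: the trip's node */
struct cdev { const struct node *np; };		/* cooling device's node */

struct cooling_spec {
	unsigned long upper;
	unsigned long lower;
	unsigned int weight;
};

/* One 'cooling-device' entry of a cooling map (simplified). */
struct map_entry {
	const struct node *cdev_np;
	unsigned long lower;
	unsigned long upper;
};

/* One cooling map: a trip node plus its cooling-device entries. */
struct cooling_map {
	const struct node *trip_np;
	const struct map_entry *entries;
	size_t num_entries;
	unsigned int weight;
};

/*
 * Return true if some map refers to both @trip (matched via trip->priv,
 * not via its index) and @cdev, filling @c from the matching entry.
 */
bool of_style_should_bind(const struct cooling_map *maps, size_t num_maps,
			  const struct trip *trip, const struct cdev *cdev,
			  struct cooling_spec *c)
{
	assert(maps && trip && cdev && c);

	for (size_t m = 0; m < num_maps; m++) {
		if (maps[m].trip_np != trip->priv)
			continue;	/* map is for a different trip */

		for (size_t e = 0; e < maps[m].num_entries; e++) {
			const struct map_entry *ent = &maps[m].entries[e];

			if (ent->cdev_np != cdev->np)
				continue;

			c->lower = ent->lower;
			c->upper = ent->upper;
			c->weight = maps[m].weight;
			return true;
		}
	}
	return false;
}
```

Because the lookup key is a pointer stored in the trip itself, the
result does not depend on where the core keeps that trip in its table.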


* Re: [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks
  2024-08-19 16:31 ` [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks Rafael J. Wysocki
  2024-08-20  7:10   ` Zhang, Rui
  2024-08-21  9:33   ` lihuisong (C)
@ 2024-08-21 14:24   ` Daniel Lezcano
  2 siblings, 0 replies; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 14:24 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 18:31, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> There are no more callers of thermal_zone_bind_cooling_device() and
> thermal_zone_unbind_cooling_device(), so drop them along with all of
> the corresponding headers, code and documentation.
> 
> Moreover, because the .bind() and .unbind() thermal zone callbacks would
> only be used when the above functions, respectively, were called, drop
> them as well along with all of the code related to them.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>

* Re: [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions
  2024-08-19 16:33 ` [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions Rafael J. Wysocki
  2024-08-20  7:11   ` Zhang, Rui
  2024-08-21  9:34   ` lihuisong (C)
@ 2024-08-21 14:29   ` Daniel Lezcano
  2024-08-21 16:21     ` Rafael J. Wysocki
  2 siblings, 1 reply; 66+ messages in thread
From: Daniel Lezcano @ 2024-08-21 14:29 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba, Zhang Rui

On 19/08/2024 18:33, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make thermal_bind_cdev_to_trip() take a struct cooling_spec pointer
> to reduce the number of its arguments, change the return type of
> thermal_unbind_cdev_from_trip() to void and rearrange the code in
> thermal_zone_cdev_binding() to reduce the indentation level.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3: Subject fix
> 
> v1-> v2: No changes
> 
> ---
>   drivers/thermal/thermal_core.c |   54 +++++++++++++++--------------------------
>   1 file changed, 21 insertions(+), 33 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -757,15 +757,7 @@ struct thermal_zone_device *thermal_zone
>    * @tz:		pointer to struct thermal_zone_device
>    * @trip:	trip point the cooling devices is associated with in this zone.
>    * @cdev:	pointer to struct thermal_cooling_device
> - * @upper:	the Maximum cooling state for this trip point.
> - *		THERMAL_NO_LIMIT means no upper limit,
> - *		and the cooling device can be in max_state.
> - * @lower:	the Minimum cooling state can be used for this trip point.
> - *		THERMAL_NO_LIMIT means no lower limit,
> - *		and the cooling device can be in cooling state 0.
> - * @weight:	The weight of the cooling device to be bound to the
> - *		thermal zone. Use THERMAL_WEIGHT_DEFAULT for the
> - *		default value
> + * @c:		cooling specification for @trip and @cdev

s/c/cspec/ at least :)

Other than that

Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>


* Re: [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions
  2024-08-21 14:29   ` Daniel Lezcano
@ 2024-08-21 16:21     ` Rafael J. Wysocki
  0 siblings, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-21 16:21 UTC (permalink / raw)
  To: Daniel Lezcano; +Cc: Rafael J. Wysocki, Linux PM, LKML, Lukasz Luba, Zhang Rui

On Wed, Aug 21, 2024 at 4:29 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 19/08/2024 18:33, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Make thermal_bind_cdev_to_trip() take a struct cooling_spec pointer
> > to reduce the number of its arguments, change the return type of
> > thermal_unbind_cdev_from_trip() to void and rearrange the code in
> > thermal_zone_cdev_binding() to reduce the indentation level.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v2 -> v3: Subject fix
> >
> > v1-> v2: No changes
> >
> > ---
> >   drivers/thermal/thermal_core.c |   54 +++++++++++++++--------------------------
> >   1 file changed, 21 insertions(+), 33 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/thermal_core.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/thermal_core.c
> > +++ linux-pm/drivers/thermal/thermal_core.c
> > @@ -757,15 +757,7 @@ struct thermal_zone_device *thermal_zone
> >    * @tz:             pointer to struct thermal_zone_device
> >    * @trip:   trip point the cooling devices is associated with in this zone.
> >    * @cdev:   pointer to struct thermal_cooling_device
> > - * @upper:   the Maximum cooling state for this trip point.
> > - *           THERMAL_NO_LIMIT means no upper limit,
> > - *           and the cooling device can be in max_state.
> > - * @lower:   the Minimum cooling state can be used for this trip point.
> > - *           THERMAL_NO_LIMIT means no lower limit,
> > - *           and the cooling device can be in cooling state 0.
> > - * @weight:  The weight of the cooling device to be bound to the
> > - *           thermal zone. Use THERMAL_WEIGHT_DEFAULT for the
> > - *           default value
> > + * @c:               cooling specification for @trip and @cdev
>
> s/c/cspec/ at least :)

I have settled on cool_spec here.

> Other than that
>
> Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org>

Thank you for all of the reviews!

* Re: [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points
  2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
                   ` (13 preceding siblings ...)
  2024-08-19 16:33 ` [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions Rafael J. Wysocki
@ 2024-08-24 18:45 ` Nícolas F. R. A. Prado
  2024-08-26  9:58   ` Rafael J. Wysocki
  14 siblings, 1 reply; 66+ messages in thread
From: Nícolas F. R. A. Prado @ 2024-08-24 18:45 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui,
	regressions, kernelci, kernel

On Mon, Aug 19, 2024 at 05:49:07PM +0200, Rafael J. Wysocki wrote:
> Hi Everyone,
> 
> This is one more update of
> 
> https://lore.kernel.org/linux-pm/3134863.CbtlEUcBR6@rjwysocki.net/#r
> 
> the cover letter of which was sent separately by mistake:
> 
> https://lore.kernel.org/linux-pm/CAJZ5v0jo5vh2uD5t4GqBnN0qukMBG_ty33PB=NiEqigqxzBcsw@mail.gmail.com/
> 
> and it has been updated once already:
> 
> https://lore.kernel.org/linux-pm/114901234.nniJfEyVGO@rjwysocki.net/
> 
> Relative to the v2 above, it drops 3 patches: one because it was broken ([04/17]
> in the v2), and two more that would need to be rebased significantly, either
> because of dropping the other broken patch or because of the recent Bang-bang
> governor fixes:
> 
> https://lore.kernel.org/linux-pm/1903691.tdWV9SEqCh@rjwysocki.net/
> 
> The remaining 14 patches, 2 of which have been slightly rebased while the rest
> are mostly unchanged (except for some very minor subject and changelog fixes),
> are not expected to be controversial and are targeting 6.12, on top of the
> current linux-next material.
> 
> The original motivation for this series quoted below has not changed:
> 
>  The code for binding cooling devices to trip points (and unbinding them from
>  trip points) is one of the murkiest pieces of the thermal subsystem.  It is
>  convoluted, bloated with unnecessary code doing questionable things, and it
>  works backwards.
> 
>  The idea is to bind cooling devices to trip points in accordance with some
>  information known to the thermal zone owner (thermal driver).  This information
>  is not known to the thermal core when the thermal zone is registered, so the
>  driver needs to be involved, but instead of just asking the driver whether
>  or not the given cooling device should be bound to a given trip point, the
>  thermal core expects the driver to carry out all of the binding process
>  including calling functions specifically provided by the core for this
>  purpose which is cumbersome and counter-intuitive.
> 
>  Because the driver has no information regarding the representation of the trip
>  points at the core level, it is forced to walk them (and it has to avoid some
>  locking traps while doing this), or it needs to make questionable assumptions
>  regarding the ordering of the trips in the core.  There are drivers doing both
>  these things.
> 
> The first 5 patches in the series are preliminary.
> 
> Patch [06/14] introduces a new .should_bind() callback for thermal zones and
> patches [07,09-12/14] modify drivers to use it instead of the .bind() and
> .unbind() callbacks, which allows them to be simplified quite a bit.
> 
> The other patches [08,13-14/14] get rid of code that becomes unused after the
> previous changes and do some cleanups on top of that.
> 
> The entire series along with 2 patches on top of it (that were present in the
> v2 of this set of patches) is available in the thermal-core-testing git branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=thermal-core-testing
> 
> (note that this branch is going to be rebased shortly on top of 6.11-rc4
> and the thermal control material in linux-next).
> 
> Thanks!

Hi,

KernelCI has identified a boot regression originating from this series. I've
verified that reverting the series fixes the issue.

Affected platforms:
* mt8195-cherry-tomato-r2
* mt8192-asurada-spherion-r0
* mt8183-kukui-jacuzzi-juniper-sku16
* mt8186-corsola-steelix-sku131072
* sc7180-trogdor-kingoftown
* sc7180-trogdor-lazor-limozeen

Relevant log from mt8195-cherry-tomato-r2 (with additional debug configs
enabled):

[   11.326726] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
[   11.335294] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 165, name: udevd
[   11.342944] preempt_count: 1, expected: 0
[   11.346943] RCU nest depth: 0, expected: 0
[   11.351028] 4 locks held by udevd/165:
[   11.354766]  #0: ffff4dc8825db0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x130/0x4a8
[   11.363207]  #1: ffffc208f386c3c8 (thermal_list_lock){+.+.}-{3:3}, at: thermal_zone_device_register_with_trips+0x85c/0xcd8
[   11.374248]  #2: ffff4dc7dc3586f0 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding.part.0+0x98/0x280
[   11.383896]  #3: ffffc208f39b7b78 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0xc4
[   11.392418] irq event stamp: 173740
[   11.395895] hardirqs last  enabled at (173739): [<ffffc208ecde804c>] _raw_spin_unlock_irqrestore+0x84/0x90
[   11.405537] hardirqs last disabled at (173740): [<ffffc208ecde6f7c>] _raw_spin_lock_irqsave+0xe0/0xf4
[   11.414742] softirqs last  enabled at (172404): [<ffffc208e978bb20>] handle_softirqs+0x534/0x874
[   11.423517] softirqs last disabled at (172393): [<ffffc208e961097c>] __do_softirq+0x14/0x20
[   11.431857] CPU: 5 UID: 0 PID: 165 Comm: udevd Not tainted 6.11.0-rc4-next-20240822-00002-gfbbbf9faa56a #628
[   11.441670] Hardware name: Acer Tomato (rev2) board (DT)
[   11.446970] Call trace:
[   11.449407]  dump_backtrace+0x98/0xf0
[   11.453059]  show_stack+0x18/0x24
[   11.456364]  dump_stack_lvl+0x90/0xd0
[   11.460018]  dump_stack+0x1c/0x28
[   11.463322]  __might_resched+0x358/0x570
[   11.467234]  __might_sleep+0xa4/0x16c
[   11.470885]  down_write+0x8c/0x21c
[   11.474277]  kernfs_remove+0x64/0x98
[   11.477844]  sysfs_remove_dir+0xa8/0xe8
[   11.481669]  __kobject_del+0xb0/0x27c
[   11.485321]  kobject_release+0xfc/0x134
[   11.489146]  kobject_put+0xb0/0x130
[   11.492624]  of_node_put+0x18/0x28
[   11.496016]  of_get_next_child+0x64/0xc4
[   11.499929]  thermal_of_should_bind+0x154/0x390
[   11.504449]  thermal_zone_cdev_binding.part.0+0x174/0x280
[   11.509836]  thermal_zone_device_register_with_trips+0x914/0xcd8
[   11.515831]  thermal_of_zone_register+0x284/0x464
[   11.520523]  devm_thermal_of_zone_register+0x80/0xf4
[   11.525476]  lvts_domain_init+0x500/0x760 [lvts_thermal]
[   11.530785]  lvts_probe+0x1b4/0x3ac [lvts_thermal]
[   11.535565]  platform_probe+0xc4/0x214
[   11.539303]  really_probe+0x188/0x5d0
[   11.542954]  __driver_probe_device+0x160/0x2e8
[   11.547386]  driver_probe_device+0x5c/0x298
[   11.551558]  __driver_attach+0x13c/0x4a8
[   11.555470]  bus_for_each_dev+0xf8/0x180
[   11.559383]  driver_attach+0x3c/0x58
[   11.562947]  bus_add_driver+0x1c4/0x458
[   11.566772]  driver_register+0xf4/0x3c0
[   11.570598]  __platform_driver_register+0x60/0x88
[   11.575291]  lvts_driver_init+0x20/0x1000 [lvts_thermal]
[   11.580593]  do_one_initcall+0xcc/0x284
[   11.584418]  do_init_module+0x278/0x740
[   11.588244]  load_module+0xed8/0x1434
[   11.591897]  init_module_from_file+0xdc/0x1fc
[   11.596243]  idempotent_init_module+0x2bc/0x604
[   11.600762]  __arm64_sys_finit_module+0xac/0x100
[   11.605368]  invoke_syscall+0x6c/0x258
[   11.609107]  el0_svc_common.constprop.0+0xac/0x230
[   11.613886]  do_el0_svc+0x40/0x58
[   11.617190]  el0_svc+0x48/0xb8
[   11.620234]  el0t_64_sync_handler+0x100/0x12c
[   11.624580]  el0t_64_sync+0x190/0x194
[   11.628233]
[   11.629713] =============================
[   11.633708] [ BUG: Invalid wait context ]
[   11.637705] 6.11.0-rc4-next-20240822-00002-gfbbbf9faa56a #628 Tainted: G        W
[   11.645953] -----------------------------
[   11.649950] udevd/165 is trying to lock:
[   11.653859] ffff4dc880881148 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_remove+0x64/0x98
[   11.662200] other info that might help us debug this:
[   11.667238] context-{4:4}
[   11.669846] 4 locks held by udevd/165:
[   11.673582]  #0: ffff4dc8825db0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x130/0x4a8
[   11.682009]  #1: ffffc208f386c3c8 (thermal_list_lock){+.+.}-{3:3}, at: thermal_zone_device_register_with_trips+0x85c/0xcd8
[   11.693041]  #2: ffff4dc7dc3586f0 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding.part.0+0x98/0x280
[   11.702684]  #3: ffffc208f39b7b78 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0xc4
[   11.711199] stack backtrace:
[   11.714067] CPU: 5 UID: 0 PID: 165 Comm: udevd Tainted: G        W          6.11.0-rc4-next-20240822-00002-gfbbbf9faa56a #628
[   11.725355] Tainted: [W]=WARN
[   11.728310] Hardware name: Acer Tomato (rev2) board (DT)
[   11.733608] Call trace:
[   11.736041]  dump_backtrace+0x98/0xf0
[   11.739692]  show_stack+0x18/0x24
[   11.742994]  dump_stack_lvl+0x90/0xd0
[   11.746645]  dump_stack+0x1c/0x28
[   11.749948]  __lock_acquire+0x10f8/0x2710
[   11.753948]  lock_acquire.part.0+0x218/0x518
[   11.758206]  lock_acquire+0x90/0xb4
[   11.761683]  down_write+0xb4/0x21c
[   11.765074]  kernfs_remove+0x64/0x98
[   11.768637]  sysfs_remove_dir+0xa8/0xe8
[   11.772461]  __kobject_del+0xb0/0x27c
[   11.776111]  kobject_release+0xfc/0x134
[   11.779935]  kobject_put+0xb0/0x130
[   11.783413]  of_node_put+0x18/0x28
[   11.786803]  of_get_next_child+0x64/0xc4
[   11.790714]  thermal_of_should_bind+0x154/0x390
[   11.795231]  thermal_zone_cdev_binding.part.0+0x174/0x280
[   11.800617]  thermal_zone_device_register_with_trips+0x914/0xcd8
[   11.806609]  thermal_of_zone_register+0x284/0x464
[   11.811301]  devm_thermal_of_zone_register+0x80/0xf4
[   11.816253]  lvts_domain_init+0x500/0x760 [lvts_thermal]
[   11.821553]  lvts_probe+0x1b4/0x3ac [lvts_thermal]
[   11.826332]  platform_probe+0xc4/0x214
[   11.830069]  really_probe+0x188/0x5d0
[   11.833719]  __driver_probe_device+0x160/0x2e8
[   11.838150]  driver_probe_device+0x5c/0x298
[   11.842320]  __driver_attach+0x13c/0x4a8
[   11.846230]  bus_for_each_dev+0xf8/0x180
[   11.850141]  driver_attach+0x3c/0x58
[   11.853704]  bus_add_driver+0x1c4/0x458
[   11.857529]  driver_register+0xf4/0x3c0
[   11.861352]  __platform_driver_register+0x60/0x88
[   11.866043]  lvts_driver_init+0x20/0x1000 [lvts_thermal]
[   11.871342]  do_one_initcall+0xcc/0x284
[   11.875166]  do_init_module+0x278/0x740
[   11.878990]  load_module+0xed8/0x1434
[   11.882641]  init_module_from_file+0xdc/0x1fc
[   11.886986]  idempotent_init_module+0x2bc/0x604
[   11.891504]  __arm64_sys_finit_module+0xac/0x100
[   11.896109]  invoke_syscall+0x6c/0x258
[   11.899846]  el0_svc_common.constprop.0+0xac/0x230
[   11.904624]  do_el0_svc+0x40/0x58
[   11.907927]  el0_svc+0x48/0xb8
[   11.910969]  el0t_64_sync_handler+0x100/0x12c
[   11.915314]  el0t_64_sync+0x190/0x194
[   36.261761] watchdog: Watchdog detected hard LOCKUP on cpu 0
[   36.267414] Modules linked in: cbmem cros_ec_lid_angle cros_ec_sensors(+) cros_ec_sensors_core pcie_mediatek_gen3 sbs_battery cros_kbd_led_backlight industrialio_triggered_buffer kfifo_buf cros_ec_chardev cros_ec_rpmsg lvts_thermal(+) cros_ec_typec leds_cros_ec mtk_svs snd_sof_mt8195 mtk_adsp_common snd_sof_xtensa_dsp snd_sof_of mt6577_auxadc snd_soc_mt8195_afe snd_sof snd_sof_utils mtk_scp mtk_rpmsg mtk_scp_ipi pwm_bl mtk_wdt coreboot_table backlight mt8195_mt6359 ramoops reed_solomon
[   36.310414] irq event stamp: 197347
[   36.313890] hardirqs last  enabled at (197347): [<ffffc208ecdc994c>] exit_to_kernel_mode+0x38/0x118
[   36.322923] hardirqs last disabled at (197346): [<ffffc208ecdcac44>] el1_interrupt+0x24/0x54
[   36.331347] softirqs last  enabled at (197268): [<ffffc208e978bb20>] handle_softirqs+0x534/0x874
[   36.340117] softirqs last disabled at (197263): [<ffffc208e961097c>] __do_softirq+0x14/0x20

Full log at http://0x0.st/XyID.txt

Let me know if you need any more information.
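
What the splat above shows is a final reference being dropped
(of_node_put() leading to kobject_release()) while devtree_lock is held
inside of_get_next_child(), so a release path that may sleep runs in
atomic context.  The userspace model below only illustrates that general
hazard.  It does not claim to identify the root cause, which is not
stated in this message; the extra put in the usage is purely
hypothetical, and all names are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Tracks how many (modelled) spinlocks are currently held. */
struct ctx { int atomic_depth; };

struct obj {
	int refcount;
	bool released;
	bool released_in_atomic;	/* release ran with a spinlock held */
};

void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Drop a reference; the final put runs the (possibly sleeping) release. */
void obj_put(struct ctx *c, struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		o->released = true;
		o->released_in_atomic = c->atomic_depth > 0;
	}
}

/*
 * Iterator step shaped like the one in the trace: under the lock, take
 * a reference on the next child and drop the one held on @prev.
 */
struct obj *next_child(struct ctx *c, struct obj *children, int n,
		       struct obj *prev)
{
	long idx = prev ? (prev - children) + 1 : 0;
	struct obj *next = idx < n ? &children[idx] : NULL;

	c->atomic_depth++;		/* models taking the spinlock */
	if (next)
		obj_get(next);
	if (prev)
		obj_put(c, prev);	/* harmless unless it is the last ref */
	c->atomic_depth--;		/* models releasing the spinlock */

	return next;
}
```

With balanced references, the put inside next_child() never reaches
zero, because the parent still holds its own reference.  If the caller
drops one reference too many before advancing, the iterator's put
becomes the final one and the release runs under the lock, which is the
shape of the report above.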

#regzbot introduced: next-20240821..next-20240822
#regzbot title: Hang during boot in sysfs_remove_dir() called by thermal_of_zone_register()

Thanks,
Nícolas

* Re: [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points
  2024-08-24 18:45 ` [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Nícolas F. R. A. Prado
@ 2024-08-26  9:58   ` Rafael J. Wysocki
  2024-08-30 13:55     ` Nícolas F. R. A. Prado
  0 siblings, 1 reply; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-26  9:58 UTC (permalink / raw)
  To: Nícolas F. R. A. Prado
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano, Lukasz Luba,
	Zhang Rui, regressions, kernelci, kernel

On Sat, Aug 24, 2024 at 8:45 PM Nícolas F. R. A. Prado
<nfraprado@collabora.com> wrote:
>
> On Mon, Aug 19, 2024 at 05:49:07PM +0200, Rafael J. Wysocki wrote:
> > Hi Everyone,
> >
> > This is one more update of
> >
> > https://lore.kernel.org/linux-pm/3134863.CbtlEUcBR6@rjwysocki.net/#r
> >
> > the cover letter of which was sent separately by mistake:
> >
> > https://lore.kernel.org/linux-pm/CAJZ5v0jo5vh2uD5t4GqBnN0qukMBG_ty33PB=NiEqigqxzBcsw@mail.gmail.com/
> >
> > and it has been updated once already:
> >
> > https://lore.kernel.org/linux-pm/114901234.nniJfEyVGO@rjwysocki.net/
> >
> > Relative to the v2 above it drops 3 patches, one because it was broken ([04/17
> > in the v2), and two more that would need to be rebased significantly, either
> > because of dropping the other broken patch or because of the recent Bang-bang
> > governor fixes:
> >
> > https://lore.kernel.org/linux-pm/1903691.tdWV9SEqCh@rjwysocki.net/
> >
> > The remaining 14 patches, 2 of which have been slightly rebased and the rest
> > is mostly unchanged (except for some very minor subject and changelog fixes),
> > is not expected to be controversial and are targeting 6.12, on top of the
> > current linux-next material.
> >
> > The original motivation for this series quoted below has not changed:
> >
> >  The code for binding cooling devices to trip points (and unbinding them from
> >  trip point) is one of the murkiest pieces of the thermal subsystem.  It is
> >  convoluted, bloated with unnecessary code doing questionable things, and it
> >  works backwards.
> >
> >  The idea is to bind cooling devices to trip points in accordance with some
> >  information known to the thermal zone owner (thermal driver).  This information
> >  is not known to the thermal core when the thermal zone is registered, so the
> >  driver needs to be involved, but instead of just asking the driver whether
> >  or not the given cooling device should be bound to a given trip point, the
> >  thermal core expects the driver to carry out all of the binding process
> >  including calling functions specifically provided by the core for this
> >  purpose which is cumbersome and counter-intuitive.
> >
> >  Because the driver has no information regarding the representation of the trip
> >  points at the core level, it is forced to walk them (and it has to avoid some
> >  locking traps while doing this), or it needs to make questionable assumptions
> >  regarding the ordering of the trips in the core.  There are drivers doing both
> >  these things.
> >
> > The first 5 patches in the series are preliminary.
> >
> > Patch [06/14] introduces a new .should_bind() callback for thermal zones and
> > patches [07,09-12/14] modifies drivers to use it instead of the .bind() and
> > .unbind() callbacks which allows them to be simplified quite a bit.
> >
> > The other patches [08,13-14/14] get rid of code that becomes unused after the
> > previous changes and do some cleanups on top of that.
> >
> > The entire series along with 2 patches on top of it (that were present in the
> > v2 of this set of patches) is available in the thermal-core-testing git branch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=thermal-core-testing
> >
> > (note that this branch is going to be rebased shortly on top of 6.11-rc4
> > and the thermal control material in linux-next).
> >
> > Thanks!
>
> Hi,
>
> KernelCI has identified a boot regression originating from this series. I've
> verified that reverting the series fixes the issue.

Thanks for the report!

There was a bug in the original patch [12/14] that caused symptoms
like the ones you are observing.  It was reported on Friday and has
since been fixed in the tree.  Please see:

https://lore.kernel.org/linux-pm/CAJZ5v0iw7uXE_cfU5VXOjFDg9GM8Hu0+hKxqfzU3v0OM5KK9oQ@mail.gmail.com/

You probably have not tested the fixed tree yet, so please let
kernelci run again on it and if the issue is still there, please let
me know.


> Affected platforms:
> * mt8195-cherry-tomato-r2
> * mt8192-asurada-spherion-r0
> * mt8183-kukui-jacuzzi-juniper-sku16
> * mt8186-corsola-steelix-sku131072
> * sc7180-trogdor-kingoftown
> * sc7180-trogdor-lazor-limozeen
>
> Relevant log from mt8195-cherry-tomato-r2 (with additional debug configs
> enabled):
>
> [   11.326726] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
> [   11.335294] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 165, name: udevd
> [   11.342944] preempt_count: 1, expected: 0
> [   11.346943] RCU nest depth: 0, expected: 0
> [   11.351028] 4 locks held by udevd/165:
> [   11.354766]  #0: ffff4dc8825db0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x130/0x4a8
> [   11.363207]  #1: ffffc208f386c3c8 (thermal_list_lock){+.+.}-{3:3}, at: thermal_zone_device_register_with_trips+0x85c/0xcd8
> [   11.374248]  #2: ffff4dc7dc3586f0 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding.part.0+0x98/0x280
> [   11.383896]  #3: ffffc208f39b7b78 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0xc4
> [   11.392418] irq event stamp: 173740
> [   11.395895] hardirqs last  enabled at (173739): [<ffffc208ecde804c>] _raw_spin_unlock_irqrestore+0x84/0x90
> [   11.405537] hardirqs last disabled at (173740): [<ffffc208ecde6f7c>] _raw_spin_lock_irqsave+0xe0/0xf4
> [   11.414742] softirqs last  enabled at (172404): [<ffffc208e978bb20>] handle_softirqs+0x534/0x874
> [   11.423517] softirqs last disabled at (172393): [<ffffc208e961097c>] __do_softirq+0x14/0x20
> [   11.431857] CPU: 5 UID: 0 PID: 165 Comm: udevd Not tainted 6.11.0-rc4-next-20240822-00002-gfbbbf9faa56a #628
> [   11.441670] Hardware name: Acer Tomato (rev2) board (DT)
> [   11.446970] Call trace:
> [   11.449407]  dump_backtrace+0x98/0xf0
> [   11.453059]  show_stack+0x18/0x24
> [   11.456364]  dump_stack_lvl+0x90/0xd0
> [   11.460018]  dump_stack+0x1c/0x28
> [   11.463322]  __might_resched+0x358/0x570
> [   11.467234]  __might_sleep+0xa4/0x16c
> [   11.470885]  down_write+0x8c/0x21c
> [   11.474277]  kernfs_remove+0x64/0x98
> [   11.477844]  sysfs_remove_dir+0xa8/0xe8
> [   11.481669]  __kobject_del+0xb0/0x27c
> [   11.485321]  kobject_release+0xfc/0x134
> [   11.489146]  kobject_put+0xb0/0x130
> [   11.492624]  of_node_put+0x18/0x28
> [   11.496016]  of_get_next_child+0x64/0xc4
> [   11.499929]  thermal_of_should_bind+0x154/0x390
> [   11.504449]  thermal_zone_cdev_binding.part.0+0x174/0x280
> [   11.509836]  thermal_zone_device_register_with_trips+0x914/0xcd8
> [   11.515831]  thermal_of_zone_register+0x284/0x464
> [   11.520523]  devm_thermal_of_zone_register+0x80/0xf4
> [   11.525476]  lvts_domain_init+0x500/0x760 [lvts_thermal]
> [   11.530785]  lvts_probe+0x1b4/0x3ac [lvts_thermal]
> [   11.535565]  platform_probe+0xc4/0x214
> [   11.539303]  really_probe+0x188/0x5d0
> [   11.542954]  __driver_probe_device+0x160/0x2e8
> [   11.547386]  driver_probe_device+0x5c/0x298
> [   11.551558]  __driver_attach+0x13c/0x4a8
> [   11.555470]  bus_for_each_dev+0xf8/0x180
> [   11.559383]  driver_attach+0x3c/0x58
> [   11.562947]  bus_add_driver+0x1c4/0x458
> [   11.566772]  driver_register+0xf4/0x3c0
> [   11.570598]  __platform_driver_register+0x60/0x88
> [   11.575291]  lvts_driver_init+0x20/0x1000 [lvts_thermal]
> [   11.580593]  do_one_initcall+0xcc/0x284
> [   11.584418]  do_init_module+0x278/0x740
> [   11.588244]  load_module+0xed8/0x1434
> [   11.591897]  init_module_from_file+0xdc/0x1fc
> [   11.596243]  idempotent_init_module+0x2bc/0x604
> [   11.600762]  __arm64_sys_finit_module+0xac/0x100
> [   11.605368]  invoke_syscall+0x6c/0x258
> [   11.609107]  el0_svc_common.constprop.0+0xac/0x230
> [   11.613886]  do_el0_svc+0x40/0x58
> [   11.617190]  el0_svc+0x48/0xb8
> [   11.620234]  el0t_64_sync_handler+0x100/0x12c
> [   11.624580]  el0t_64_sync+0x190/0x194
> [   11.628233]
> [   11.629713] =============================
> [   11.633708] [ BUG: Invalid wait context ]
> [   11.637705] 6.11.0-rc4-next-20240822-00002-gfbbbf9faa56a #628 Tainted: G        W
> [   11.645953] -----------------------------
> [   11.649950] udevd/165 is trying to lock:
> [   11.653859] ffff4dc880881148 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_remove+0x64/0x98
> [   11.662200] other info that might help us debug this:
> [   11.667238] context-{4:4}
> [   11.669846] 4 locks held by udevd/165:
> [   11.673582]  #0: ffff4dc8825db0f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x130/0x4a8
> [   11.682009]  #1: ffffc208f386c3c8 (thermal_list_lock){+.+.}-{3:3}, at: thermal_zone_device_register_with_trips+0x85c/0xcd8
> [   11.693041]  #2: ffff4dc7dc3586f0 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding.part.0+0x98/0x280
> [   11.702684]  #3: ffffc208f39b7b78 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0xc4
> [   11.711199] stack backtrace:
> [   11.714067] CPU: 5 UID: 0 PID: 165 Comm: udevd Tainted: G        W          6.11.0-rc4-next-20240822-00002-gfbbbf9faa56a #628
> [   11.725355] Tainted: [W]=WARN
> [   11.728310] Hardware name: Acer Tomato (rev2) board (DT)
> [   11.733608] Call trace:
> [   11.736041]  dump_backtrace+0x98/0xf0
> [   11.739692]  show_stack+0x18/0x24
> [   11.742994]  dump_stack_lvl+0x90/0xd0
> [   11.746645]  dump_stack+0x1c/0x28
> [   11.749948]  __lock_acquire+0x10f8/0x2710
> [   11.753948]  lock_acquire.part.0+0x218/0x518
> [   11.758206]  lock_acquire+0x90/0xb4
> [   11.761683]  down_write+0xb4/0x21c
> [   11.765074]  kernfs_remove+0x64/0x98
> [   11.768637]  sysfs_remove_dir+0xa8/0xe8
> [   11.772461]  __kobject_del+0xb0/0x27c
> [   11.776111]  kobject_release+0xfc/0x134
> [   11.779935]  kobject_put+0xb0/0x130
> [   11.783413]  of_node_put+0x18/0x28
> [   11.786803]  of_get_next_child+0x64/0xc4
> [   11.790714]  thermal_of_should_bind+0x154/0x390
> [   11.795231]  thermal_zone_cdev_binding.part.0+0x174/0x280
> [   11.800617]  thermal_zone_device_register_with_trips+0x914/0xcd8
> [   11.806609]  thermal_of_zone_register+0x284/0x464
> [   11.811301]  devm_thermal_of_zone_register+0x80/0xf4
> [   11.816253]  lvts_domain_init+0x500/0x760 [lvts_thermal]
> [   11.821553]  lvts_probe+0x1b4/0x3ac [lvts_thermal]
> [   11.826332]  platform_probe+0xc4/0x214
> [   11.830069]  really_probe+0x188/0x5d0
> [   11.833719]  __driver_probe_device+0x160/0x2e8
> [   11.838150]  driver_probe_device+0x5c/0x298
> [   11.842320]  __driver_attach+0x13c/0x4a8
> [   11.846230]  bus_for_each_dev+0xf8/0x180
> [   11.850141]  driver_attach+0x3c/0x58
> [   11.853704]  bus_add_driver+0x1c4/0x458
> [   11.857529]  driver_register+0xf4/0x3c0
> [   11.861352]  __platform_driver_register+0x60/0x88
> [   11.866043]  lvts_driver_init+0x20/0x1000 [lvts_thermal]
> [   11.871342]  do_one_initcall+0xcc/0x284
> [   11.875166]  do_init_module+0x278/0x740
> [   11.878990]  load_module+0xed8/0x1434
> [   11.882641]  init_module_from_file+0xdc/0x1fc
> [   11.886986]  idempotent_init_module+0x2bc/0x604
> [   11.891504]  __arm64_sys_finit_module+0xac/0x100
> [   11.896109]  invoke_syscall+0x6c/0x258
> [   11.899846]  el0_svc_common.constprop.0+0xac/0x230
> [   11.904624]  do_el0_svc+0x40/0x58
> [   11.907927]  el0_svc+0x48/0xb8
> [   11.910969]  el0t_64_sync_handler+0x100/0x12c
> [   11.915314]  el0t_64_sync+0x190/0x194
> [   36.261761] watchdog: Watchdog detected hard LOCKUP on cpu 0
> [   36.267414] Modules linked in: cbmem cros_ec_lid_angle cros_ec_sensors(+) cros_ec_sensors_core pcie_mediatek_gen3 sbs_battery cros_kbd_led_backlight industrialio_triggered_buffer kfifo_buf cros_ec_chardev cros_ec_rpmsg lvts_thermal(+) cros_ec_typec leds_cros_ec mtk_svs snd_sof_mt8195 mtk_adsp_common snd_sof_xtensa_dsp snd_sof_of mt6577_auxadc snd_soc_mt8195_afe snd_sof snd_sof_utils mtk_scp mtk_rpmsg mtk_scp_ipi pwm_bl mtk_wdt coreboot_table backlight mt8195_mt6359 ramoops reed_solomon
> [   36.310414] irq event stamp: 197347
> [   36.313890] hardirqs last  enabled at (197347): [<ffffc208ecdc994c>] exit_to_kernel_mode+0x38/0x118
> [   36.322923] hardirqs last disabled at (197346): [<ffffc208ecdcac44>] el1_interrupt+0x24/0x54
> [   36.331347] softirqs last  enabled at (197268): [<ffffc208e978bb20>] handle_softirqs+0x534/0x874
> [   36.340117] softirqs last disabled at (197263): [<ffffc208e961097c>] __do_softirq+0x14/0x20
>
> Full log at http://0x0.st/XyID.txt
>
> Let me know if you need any more information.
>
> #regzbot introduced: next-20240821..next-20240822
> #regzbot title: Hang during boot in sysfs_remove_dir() called by thermal_of_zone_register()
>
> Thanks,
> Nícolas
>

* Re: [PATCH v3 12/14] thermal/of: Use the .should_bind() thermal zone callback
  2024-08-19 16:30 ` [PATCH v3 12/14] thermal/of: " Rafael J. Wysocki
  2024-08-21 14:20   ` Daniel Lezcano
@ 2024-08-26 11:31   ` Marek Szyprowski
  2024-08-26 12:14     ` Rafael J. Wysocki
  1 sibling, 1 reply; 66+ messages in thread
From: Marek Szyprowski @ 2024-08-26 11:31 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM
  Cc: LKML, Daniel Lezcano, Lukasz Luba, Zhang Rui, Krzysztof Kozlowski,
	'Linux Samsung SOC', 'Mateusz Majewski',
	linux-amlogic

On 19.08.2024 18:30, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> Make the thermal_of driver use the .should_bind() thermal zone callback
> to provide the thermal core with the information on whether or not to
> bind the given cooling device to the given trip point in the given
> thermal zone.  If it returns 'true', the thermal core will bind the
> cooling device to the trip and the corresponding unbinding will be
> taken care of automatically by the core on the removal of the involved
> thermal zone or cooling device.
>
> This replaces the .bind() and .unbind() thermal zone callbacks which
> assumed the same trip points ordering in the driver and in the thermal
> core (that may not be true any more in the future).  The .bind()
> callback would walk the given thermal zone's cooling maps to find all
> of the valid trip point combinations with the given cooling device and
> it would call thermal_zone_bind_cooling_device() for all of them using
> trip point indices reflecting the ordering of the trips in the DT.
>
> The .should_bind() callback still walks the thermal zone's cooling maps,
> but it can use the trip object passed to it by the thermal core to find
> the trip in question in the first place and then it uses the
> corresponding 'cooling-device' entries to look up the given cooling
> device.  To be able to match the trip object provided by the thermal
> core to a specific device node, the driver sets the 'priv' field of each
> trip to the corresponding device node pointer during initialization.
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

This patch landed recently in linux-next as commit 6d71d55c3b12
("thermal/of: Use the .should_bind() thermal zone callback"). In my
tests I found that it breaks booting on some of my test boards:
Exynos-based (OdroidXU4 with a 32-bit ARM kernel built from
multi_v7_defconfig) and Amlogic Meson-based boards (OdroidC4, VIM3 with
the ARM64 defconfig plus some debug options). Reverting $subject on top
of next-20240823, together with c1ee6e1f68f5 ("thermal: core: Clean up
trip bind/unbind functions") and 526954900465 ("thermal: core: Drop
unused bind/unbind functions and callbacks") because of compile
dependencies, fixes the issue.

On Odroid C4 I see the following warnings before the board hangs:

BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 263, name: systemd-udevd
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
4 locks held by systemd-udevd/263:
  #0: ffff0000013768f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x90/0x1ac
  #1: ffff80008349e1a0 (thermal_list_lock){+.+.}-{3:3}, at: __thermal_cooling_device_register.part.0+0x154/0x2f4
  #2: ffff000000988700 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding+0x84/0x1e4
  #3: ffff8000834b8a98 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0x80
irq event stamp: 7936
hardirqs last  enabled at (7935): [<ffff8000812b1700>] _raw_spin_unlock_irqrestore+0x74/0x78
hardirqs last disabled at (7936): [<ffff8000812b0b14>] _raw_spin_lock_irqsave+0x84/0x88
softirqs last  enabled at (7302): [<ffff8000800b13dc>] handle_softirqs+0x4cc/0x4e4
softirqs last disabled at (7295): [<ffff8000800105b0>] __do_softirq+0x14/0x20
CPU: 3 UID: 0 PID: 263 Comm: systemd-udevd Not tainted 6.11.0-rc3+ #15264
Hardware name: Hardkernel ODROID-C4 (DT)
Call trace:
  dump_backtrace+0x94/0xec
  show_stack+0x18/0x24
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  __might_resched+0x144/0x248
  __might_sleep+0x48/0x98
  down_write+0x28/0xe8
  kernfs_remove+0x34/0x58
  sysfs_remove_dir+0x54/0x70
  __kobject_del+0x40/0xb8
  kobject_put+0x104/0x124
  of_node_put+0x18/0x28
  of_get_next_child+0x4c/0x80
  thermal_of_should_bind+0xec/0x28c
  thermal_zone_cdev_binding+0x104/0x1e4
  __thermal_cooling_device_register.part.0+0x194/0x2f4
  thermal_of_cooling_device_register+0x3c/0x54
  of_devfreq_cooling_register_power+0x220/0x298
  devfreq_cooling_em_register+0x48/0xa8
  panfrost_devfreq_init+0x294/0x320 [panfrost]
  panfrost_device_init+0x16c/0x5c8 [panfrost]
  panfrost_probe+0xbc/0x194 [panfrost]
  platform_probe+0x68/0xdc
  really_probe+0xbc/0x298
  __driver_probe_device+0x78/0x12c
  driver_probe_device+0x40/0x164
  __driver_attach+0x9c/0x1ac
  bus_for_each_dev+0x74/0xd4
  driver_attach+0x24/0x30
  bus_add_driver+0xe4/0x208
  driver_register+0x60/0x128
  __platform_driver_register+0x28/0x34
  panfrost_driver_init+0x20/0x1000 [panfrost]
  do_one_initcall+0x68/0x300
  do_init_module+0x60/0x224
  load_module+0x1b0c/0x1cb0
  init_module_from_file+0x84/0xc4
  idempotent_init_module+0x18c/0x284
  __arm64_sys_finit_module+0x64/0xa0
  invoke_syscall+0x48/0x110
  el0_svc_common.constprop.0+0x40/0xe8
  do_el0_svc_compat+0x20/0x3c
  el0_svc_compat+0x44/0xe0
  el0t_32_sync_handler+0x98/0x148
  el0t_32_sync+0x194/0x198

=============================
[ BUG: Invalid wait context ]
6.11.0-rc3+ #15264 Tainted: G        W
-----------------------------
systemd-udevd/263 is trying to lock:
ffff0000000e5948 (&root->kernfs_rwsem){++++}-{3:3}, at: kernfs_remove+0x34/0x58
other info that might help us debug this:
context-{4:4}
4 locks held by systemd-udevd/263:
  #0: ffff0000013768f8 (&dev->mutex){....}-{3:3}, at: __driver_attach+0x90/0x1ac
  #1: ffff80008349e1a0 (thermal_list_lock){+.+.}-{3:3}, at: __thermal_cooling_device_register.part.0+0x154/0x2f4
  #2: ffff000000988700 (&tz->lock){+.+.}-{3:3}, at: thermal_zone_cdev_binding+0x84/0x1e4
  #3: ffff8000834b8a98 (devtree_lock){....}-{2:2}, at: of_get_next_child+0x2c/0x80
stack backtrace:
CPU: 3 UID: 0 PID: 263 Comm: systemd-udevd Tainted: G        W          6.11.0-rc3+ #15264
Tainted: [W]=WARN
Hardware name: Hardkernel ODROID-C4 (DT)
Call trace:
  dump_backtrace+0x94/0xec
  show_stack+0x18/0x24
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  __lock_acquire+0x9fc/0x21a0
  lock_acquire+0x200/0x340
  down_write+0x50/0xe8
  kernfs_remove+0x34/0x58
  sysfs_remove_dir+0x54/0x70
  __kobject_del+0x40/0xb8
  kobject_put+0x104/0x124
  of_node_put+0x18/0x28
  of_get_next_child+0x4c/0x80
  thermal_of_should_bind+0xec/0x28c
  thermal_zone_cdev_binding+0x104/0x1e4
  __thermal_cooling_device_register.part.0+0x194/0x2f4
  thermal_of_cooling_device_register+0x3c/0x54
  of_devfreq_cooling_register_power+0x220/0x298
  devfreq_cooling_em_register+0x48/0xa8
  panfrost_devfreq_init+0x294/0x320 [panfrost]
  panfrost_device_init+0x16c/0x5c8 [panfrost]
  panfrost_probe+0xbc/0x194 [panfrost]
  platform_probe+0x68/0xdc
  really_probe+0xbc/0x298
  __driver_probe_device+0x78/0x12c
  driver_probe_device+0x40/0x164
  __driver_attach+0x9c/0x1ac
  bus_for_each_dev+0x74/0xd4
  driver_attach+0x24/0x30
  bus_add_driver+0xe4/0x208
  driver_register+0x60/0x128
  __platform_driver_register+0x28/0x34
  panfrost_driver_init+0x20/0x1000 [panfrost]
  do_one_initcall+0x68/0x300
  do_init_module+0x60/0x224
  load_module+0x1b0c/0x1cb0
  init_module_from_file+0x84/0xc4
  idempotent_init_module+0x18c/0x284
  __arm64_sys_finit_module+0x64/0xa0
  invoke_syscall+0x48/0x110
  el0_svc_common.constprop.0+0x40/0xe8
  do_el0_svc_compat+0x20/0x3c
  el0_svc_compat+0x44/0xe0
  el0t_32_sync_handler+0x98/0x148
  el0t_32_sync+0x194/0x198
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:     2-...!: (0 ticks this GP) idle=2aac/1/0x4000000000000000 softirq=798/798 fqs=4
rcu:     3-...!: (0 ticks this GP) idle=28a4/1/0x4000000000000000 softirq=1007/1007 fqs=4
rcu:     (detected by 0, t=6505 jiffies, g=349, q=46 ncpus=4)
Sending NMI from CPU 0 to CPUs 2:
Sending NMI from CPU 0 to CPUs 3:
rcu: rcu_preempt kthread timer wakeup didn't happen for 6483 jiffies! g349 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
rcu:     Possible timer handling issue on cpu=1 timer-softirq=260
rcu: rcu_preempt kthread starved for 6484 jiffies! g349 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
rcu: RCU grace-period kthread stack dump:
task:rcu_preempt     state:I stack:0     pid:16    tgid:16 ppid:2      flags:0x00000008
Call trace:
  __switch_to+0xe0/0x124
  __schedule+0x318/0xc30
  schedule+0x50/0x15c
  schedule_timeout+0xac/0x134
  rcu_gp_fqs_loop+0x16c/0x8b4
  rcu_gp_kthread+0x280/0x314
  kthread+0x124/0x128
  ret_from_fork+0x10/0x20
rcu: Stack dump where RCU GP kthread last ran:
Sending NMI from CPU 0 to CPUs 1:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu:     2-...!: (0 ticks this GP) idle=2aac/1/0x4000000000000000 softirq=798/798 fqs=4
rcu:     3-...!: (0 ticks this GP) idle=28a4/1/0x4000000000000000 softirq=1007/1007 fqs=4
rcu:     (detected by 0, t=26013 jiffies, g=349, q=46 ncpus=4)
Sending NMI from CPU 0 to CPUs 2:
Sending NMI from CPU 0 to CPUs 3:

Let me know if I can help debug this issue further.

> ---
>
> v2 -> v3: Reorder (previously [14/17])
>
> v1 -> v2:
>     * Fix a build issue (undefined symbol)
>
> This patch only depends on the [06/14] introducing the .should_bind()
> thermal zone callback:
>
> https://lore.kernel.org/linux-pm/9334403.CDJkKcVGEf@rjwysocki.net/
>
> ---
>   drivers/thermal/thermal_of.c |  171 ++++++++++---------------------------------
>   1 file changed, 41 insertions(+), 130 deletions(-)
>
> Index: linux-pm/drivers/thermal/thermal_of.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_of.c
> +++ linux-pm/drivers/thermal/thermal_of.c
> @@ -20,37 +20,6 @@
>   
>   /***   functions parsing device tree nodes   ***/
>   
> -static int of_find_trip_id(struct device_node *np, struct device_node *trip)
> -{
> -	struct device_node *trips;
> -	struct device_node *t;
> -	int i = 0;
> -
> -	trips = of_get_child_by_name(np, "trips");
> -	if (!trips) {
> -		pr_err("Failed to find 'trips' node\n");
> -		return -EINVAL;
> -	}
> -
> -	/*
> -	 * Find the trip id point associated with the cooling device map
> -	 */
> -	for_each_child_of_node(trips, t) {
> -
> -		if (t == trip) {
> -			of_node_put(t);
> -			goto out;
> -		}
> -		i++;
> -	}
> -
> -	i = -ENXIO;
> -out:
> -	of_node_put(trips);
> -
> -	return i;
> -}
> -
>   /*
>    * It maps 'enum thermal_trip_type' found in include/linux/thermal.h
>    * into the device tree binding of 'trip', property type.
> @@ -119,6 +88,8 @@ static int thermal_of_populate_trip(stru
>   
>   	trip->flags = THERMAL_TRIP_FLAG_RW_TEMP;
>   
> +	trip->priv = np;
> +
>   	return 0;
>   }
>   
> @@ -290,39 +261,9 @@ static struct device_node *thermal_of_zo
>   	return tz_np;
>   }
>   
> -static int __thermal_of_unbind(struct device_node *map_np, int index, int trip_id,
> -			       struct thermal_zone_device *tz, struct thermal_cooling_device *cdev)
> -{
> -	struct of_phandle_args cooling_spec;
> -	int ret;
> -
> -	ret = of_parse_phandle_with_args(map_np, "cooling-device", "#cooling-cells",
> -					 index, &cooling_spec);
> -
> -	if (ret < 0) {
> -		pr_err("Invalid cooling-device entry\n");
> -		return ret;
> -	}
> -
> -	of_node_put(cooling_spec.np);
> -
> -	if (cooling_spec.args_count < 2) {
> -		pr_err("wrong reference to cooling device, missing limits\n");
> -		return -EINVAL;
> -	}
> -
> -	if (cooling_spec.np != cdev->np)
> -		return 0;
> -
> -	ret = thermal_zone_unbind_cooling_device(tz, trip_id, cdev);
> -	if (ret)
> -		pr_err("Failed to unbind '%s' with '%s': %d\n", tz->type, cdev->type, ret);
> -
> -	return ret;
> -}
> -
> -static int __thermal_of_bind(struct device_node *map_np, int index, int trip_id,
> -			     struct thermal_zone_device *tz, struct thermal_cooling_device *cdev)
> +static bool thermal_of_get_cooling_spec(struct device_node *map_np, int index,
> +					struct thermal_cooling_device *cdev,
> +					struct cooling_spec *c)
>   {
>   	struct of_phandle_args cooling_spec;
>   	int ret, weight = THERMAL_WEIGHT_DEFAULT;
> @@ -334,104 +275,75 @@ static int __thermal_of_bind(struct devi
>   
>   	if (ret < 0) {
>   		pr_err("Invalid cooling-device entry\n");
> -		return ret;
> +		return false;
>   	}
>   
>   	of_node_put(cooling_spec.np);
>   
>   	if (cooling_spec.args_count < 2) {
>   		pr_err("wrong reference to cooling device, missing limits\n");
> -		return -EINVAL;
> +		return false;
>   	}
>   
>   	if (cooling_spec.np != cdev->np)
> -		return 0;
> -
> -	ret = thermal_zone_bind_cooling_device(tz, trip_id, cdev, cooling_spec.args[1],
> -					       cooling_spec.args[0],
> -					       weight);
> -	if (ret)
> -		pr_err("Failed to bind '%s' with '%s': %d\n", tz->type, cdev->type, ret);
> -
> -	return ret;
> -}
> -
> -static int thermal_of_for_each_cooling_device(struct device_node *tz_np, struct device_node *map_np,
> -					      struct thermal_zone_device *tz, struct thermal_cooling_device *cdev,
> -					      int (*action)(struct device_node *, int, int,
> -							    struct thermal_zone_device *, struct thermal_cooling_device *))
> -{
> -	struct device_node *tr_np;
> -	int count, i, trip_id;
> -
> -	tr_np = of_parse_phandle(map_np, "trip", 0);
> -	if (!tr_np)
> -		return -ENODEV;
> -
> -	trip_id = of_find_trip_id(tz_np, tr_np);
> -	if (trip_id < 0)
> -		return trip_id;
> -
> -	count = of_count_phandle_with_args(map_np, "cooling-device", "#cooling-cells");
> -	if (count <= 0) {
> -		pr_err("Add a cooling_device property with at least one device\n");
> -		return -ENOENT;
> -	}
> +		return false;
>   
> -	/*
> -	 * At this point, we don't want to bail out when there is an
> -	 * error, we will try to bind/unbind as many as possible
> -	 * cooling devices
> -	 */
> -	for (i = 0; i < count; i++)
> -		action(map_np, i, trip_id, tz, cdev);
> +	c->lower = cooling_spec.args[0];
> +	c->upper = cooling_spec.args[1];
> +	c->weight = weight;
>   
> -	return 0;
> +	return true;
>   }
>   
> -static int thermal_of_for_each_cooling_maps(struct thermal_zone_device *tz,
> -					    struct thermal_cooling_device *cdev,
> -					    int (*action)(struct device_node *, int, int,
> -							  struct thermal_zone_device *, struct thermal_cooling_device *))
> +static bool thermal_of_should_bind(struct thermal_zone_device *tz,
> +				   const struct thermal_trip *trip,
> +				   struct thermal_cooling_device *cdev,
> +				   struct cooling_spec *c)
>   {
>   	struct device_node *tz_np, *cm_np, *child;
> -	int ret = 0;
> +	bool result = false;
>   
>   	tz_np = thermal_of_zone_get_by_name(tz);
>   	if (IS_ERR(tz_np)) {
>   		pr_err("Failed to get node tz by name\n");
> -		return PTR_ERR(tz_np);
> +		return false;
>   	}
>   
>   	cm_np = of_get_child_by_name(tz_np, "cooling-maps");
>   	if (!cm_np)
>   		goto out;
>   
> +	/* Look up the trip and the cdev in the cooling maps. */
>   	for_each_child_of_node(cm_np, child) {
> -		ret = thermal_of_for_each_cooling_device(tz_np, child, tz, cdev, action);
> -		if (ret) {
> +		struct device_node *tr_np;
> +		int count, i;
> +
> +		tr_np = of_parse_phandle(child, "trip", 0);
> +		if (tr_np != trip->priv) {
>   			of_node_put(child);
> -			break;
> +			continue;
> +		}
> +
> +		/* The trip has been found, look up the cdev. */
> +		count = of_count_phandle_with_args(child, "cooling-device", "#cooling-cells");
> +		if (count <= 0)
> +			pr_err("Add a cooling_device property with at least one device\n");
> +
> +		for (i = 0; i < count; i++) {
> +			result = thermal_of_get_cooling_spec(child, i, cdev, c);
> +			if (result)
> +				break;
>   		}
> +
> +		of_node_put(child);
> +		break;
>   	}
>   
>   	of_node_put(cm_np);
>   out:
>   	of_node_put(tz_np);
>   
> -	return ret;
> -}
> -
> -static int thermal_of_bind(struct thermal_zone_device *tz,
> -			   struct thermal_cooling_device *cdev)
> -{
> -	return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_bind);
> -}
> -
> -static int thermal_of_unbind(struct thermal_zone_device *tz,
> -			     struct thermal_cooling_device *cdev)
> -{
> -	return thermal_of_for_each_cooling_maps(tz, cdev, __thermal_of_unbind);
> +	return result;
>   }
>   
>   /**
> @@ -502,8 +414,7 @@ static struct thermal_zone_device *therm
>   
>   	thermal_of_parameters_init(np, &tzp);
>   
> -	of_ops.bind = thermal_of_bind;
> -	of_ops.unbind = thermal_of_unbind;
> +	of_ops.should_bind = thermal_of_should_bind;
>   
>   	ret = of_property_read_string(np, "critical-action", &action);
>   	if (!ret)
>
>
>
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


* Re: [PATCH v3 12/14] thermal/of: Use the .should_bind() thermal zone callback
  2024-08-26 11:31   ` Marek Szyprowski
@ 2024-08-26 12:14     ` Rafael J. Wysocki
  2024-08-26 20:49       ` Marek Szyprowski
  0 siblings, 1 reply; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-26 12:14 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano, Lukasz Luba,
	Zhang Rui, Krzysztof Kozlowski, Linux Samsung SOC,
	Mateusz Majewski, linux-amlogic

On Mon, Aug 26, 2024 at 1:32 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> On 19.08.2024 18:30, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Make the thermal_of driver use the .should_bind() thermal zone callback
> > to provide the thermal core with the information on whether or not to
> > bind the given cooling device to the given trip point in the given
> > thermal zone.  If it returns 'true', the thermal core will bind the
> > cooling device to the trip and the corresponding unbinding will be
> > taken care of automatically by the core on the removal of the involved
> > thermal zone or cooling device.
> >
> > This replaces the .bind() and .unbind() thermal zone callbacks which
> > assumed the same trip points ordering in the driver and in the thermal
> > core (that may not be true any more in the future).  The .bind()
> > callback would walk the given thermal zone's cooling maps to find all
> > of the valid trip point combinations with the given cooling device and
> > it would call thermal_zone_bind_cooling_device() for all of them using
> > trip point indices reflecting the ordering of the trips in the DT.
> >
> > The .should_bind() callback still walks the thermal zone's cooling maps,
> > but it can use the trip object passed to it by the thermal core to find
> > the trip in question in the first place and then it uses the
> > corresponding 'cooling-device' entries to look up the given cooling
> > device.  To be able to match the trip object provided by the thermal
> > core to a specific device node, the driver sets the 'priv' field of each
> > trip to the corresponding device node pointer during initialization.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> This patch landed recently in linux-next as commit 6d71d55c3b12
> ("thermal/of: Use the .should_bind() thermal zone callback")

It has been fixed since and it is commit 94c6110b0b13c6416146 now.

Bottom line is that it was calling of_node_put() too many times due to
a coding mistake.
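
For illustration only (this is not the thermal_of code, and the names
toy_node, toy_node_get() and toy_node_put() are made up for this sketch),
the effect of one put too many on a reference-counted node can be modeled
roughly like this, assuming the node is released when its count drops to
zero:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted device node. */
struct toy_node {
	int refcount;
	bool released;	/* set when the last reference is dropped */
};

/* Take a reference, in the spirit of of_node_get(). */
static struct toy_node *toy_node_get(struct toy_node *np)
{
	if (np)
		np->refcount++;
	return np;
}

/* Drop a reference, in the spirit of of_node_put(). */
static void toy_node_put(struct toy_node *np)
{
	if (!np)
		return;
	if (--np->refcount == 0)
		np->released = true;
}

/* Balanced: every get is paired with exactly one put. */
static bool lookup_balanced(struct toy_node *np)
{
	struct toy_node *ref = toy_node_get(np);
	bool found = ref != NULL;

	toy_node_put(ref);
	return found;
}

/* Unbalanced: the second put drops a reference this code never took. */
static bool lookup_unbalanced(struct toy_node *np)
{
	struct toy_node *ref = toy_node_get(np);
	bool found = ref != NULL;

	toy_node_put(ref);
	toy_node_put(ref);	/* one put too many */
	return found;
}
```

In this model, a node that its owner still holds with a count of 1 ends up
released after a single unbalanced lookup, so any later use of it would be
a use-after-free.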

> In my tests I found that it breaks booting on some of my test boards: Exynos-based
> (OdroidXU4 with a 32-bit ARM kernel from multi_v7_defconfig) and Amlogic
> Meson-based boards (OdroidC4, VIM3 with ARM64 defconfig+some debug
> options). Reverting $subject on top of next-20240823 together with
> c1ee6e1f68f5 ("thermal: core: Clean up trip bind/unbind functions") and
> 526954900465 ("thermal: core: Drop unused bind/unbind functions and
> callbacks") due to compile dependencies fixes the issue.

Thanks for the report!

* Re: [PATCH v3 12/14] thermal/of: Use the .should_bind() thermal zone callback
  2024-08-26 12:14     ` Rafael J. Wysocki
@ 2024-08-26 20:49       ` Marek Szyprowski
  2024-08-27 11:39         ` Rafael J. Wysocki
  0 siblings, 1 reply; 66+ messages in thread
From: Marek Szyprowski @ 2024-08-26 20:49 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano, Lukasz Luba,
	Zhang Rui, Krzysztof Kozlowski, Linux Samsung SOC,
	Mateusz Majewski, linux-amlogic

On 26.08.2024 14:14, Rafael J. Wysocki wrote:
> On Mon, Aug 26, 2024 at 1:32 PM Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> On 19.08.2024 18:30, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>
>>> Make the thermal_of driver use the .should_bind() thermal zone callback
>>> to provide the thermal core with the information on whether or not to
>>> bind the given cooling device to the given trip point in the given
>>> thermal zone.  If it returns 'true', the thermal core will bind the
>>> cooling device to the trip and the corresponding unbinding will be
>>> taken care of automatically by the core on the removal of the involved
>>> thermal zone or cooling device.
>>>
>>> This replaces the .bind() and .unbind() thermal zone callbacks which
>>> assumed the same trip points ordering in the driver and in the thermal
>>> core (that may not be true any more in the future).  The .bind()
>>> callback would walk the given thermal zone's cooling maps to find all
>>> of the valid trip point combinations with the given cooling device and
>>> it would call thermal_zone_bind_cooling_device() for all of them using
>>> trip point indices reflecting the ordering of the trips in the DT.
>>>
>>> The .should_bind() callback still walks the thermal zone's cooling maps,
>>> but it can use the trip object passed to it by the thermal core to find
>>> the trip in question in the first place and then it uses the
>>> corresponding 'cooling-device' entries to look up the given cooling
>>> device.  To be able to match the trip object provided by the thermal
>>> core to a specific device node, the driver sets the 'priv' field of each
>>> trip to the corresponding device node pointer during initialization.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>> This patch landed recently in linux-next as commit 6d71d55c3b12
>> ("thermal/of: Use the .should_bind() thermal zone callback")
> It has been fixed since and it is commit 94c6110b0b13c6416146 now.


Confirmed. Thanks for fixing it and sorry for the noise.


> Bottom line is that it was calling of_node_put() too many times due to
> a coding mistake.
>
>> In my tests I found that it breaks booting on some of my test boards: Exynos-based
>> (OdroidXU4 with a 32-bit ARM kernel from multi_v7_defconfig) and Amlogic
>> Meson-based boards (OdroidC4, VIM3 with ARM64 defconfig+some debug
>> options). Reverting $subject on top of next-20240823 together with
>> c1ee6e1f68f5 ("thermal: core: Clean up trip bind/unbind functions") and
>> 526954900465 ("thermal: core: Drop unused bind/unbind functions and
>> callbacks") due to compile dependencies fixes the issue.
> Thanks for the report!
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland


* Re: [PATCH v3 12/14] thermal/of: Use the .should_bind() thermal zone callback
  2024-08-26 20:49       ` Marek Szyprowski
@ 2024-08-27 11:39         ` Rafael J. Wysocki
  0 siblings, 0 replies; 66+ messages in thread
From: Rafael J. Wysocki @ 2024-08-27 11:39 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Linux PM, LKML,
	Daniel Lezcano, Lukasz Luba, Zhang Rui, Krzysztof Kozlowski,
	Linux Samsung SOC, Mateusz Majewski, linux-amlogic

On Mon, Aug 26, 2024 at 10:49 PM Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
>
> On 26.08.2024 14:14, Rafael J. Wysocki wrote:
> > On Mon, Aug 26, 2024 at 1:32 PM Marek Szyprowski
> > <m.szyprowski@samsung.com> wrote:
> >> On 19.08.2024 18:30, Rafael J. Wysocki wrote:
> >>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >>>
> >>> Make the thermal_of driver use the .should_bind() thermal zone callback
> >>> to provide the thermal core with the information on whether or not to
> >>> bind the given cooling device to the given trip point in the given
> >>> thermal zone.  If it returns 'true', the thermal core will bind the
> >>> cooling device to the trip and the corresponding unbinding will be
> >>> taken care of automatically by the core on the removal of the involved
> >>> thermal zone or cooling device.
> >>>
> >>> This replaces the .bind() and .unbind() thermal zone callbacks which
> >>> assumed the same trip points ordering in the driver and in the thermal
> >>> core (that may not be true any more in the future).  The .bind()
> >>> callback would walk the given thermal zone's cooling maps to find all
> >>> of the valid trip point combinations with the given cooling device and
> >>> it would call thermal_zone_bind_cooling_device() for all of them using
> >>> trip point indices reflecting the ordering of the trips in the DT.
> >>>
> >>> The .should_bind() callback still walks the thermal zone's cooling maps,
> >>> but it can use the trip object passed to it by the thermal core to find
> >>> the trip in question in the first place and then it uses the
> >>> corresponding 'cooling-device' entries to look up the given cooling
> >>> device.  To be able to match the trip object provided by the thermal
> >>> core to a specific device node, the driver sets the 'priv' field of each
> >>> trip to the corresponding device node pointer during initialization.
> >>>
> >>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >> This patch landed recently in linux-next as commit 6d71d55c3b12
> >> ("thermal/of: Use the .should_bind() thermal zone callback")
> > > It has been fixed since and it is commit 94c6110b0b13c6416146 now.
>
>
> Confirmed. Thanks for fixing it and sorry for the noise.

Thank you!

And it wasn't noise.  You reported the problem as soon as you saw it
and before you could see the fix.  Somebody else saw it earlier, but
there's nothing wrong with that.

* Re: [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points
  2024-08-26  9:58   ` Rafael J. Wysocki
@ 2024-08-30 13:55     ` Nícolas F. R. A. Prado
  0 siblings, 0 replies; 66+ messages in thread
From: Nícolas F. R. A. Prado @ 2024-08-30 13:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Rafael J. Wysocki, Linux PM, LKML, Daniel Lezcano, Lukasz Luba,
	Zhang Rui, regressions, kernelci, kernel

On Mon, Aug 26, 2024 at 11:58:12AM +0200, Rafael J. Wysocki wrote:
> On Sat, Aug 24, 2024 at 8:45 PM Nícolas F. R. A. Prado
> <nfraprado@collabora.com> wrote:
> >
> > On Mon, Aug 19, 2024 at 05:49:07PM +0200, Rafael J. Wysocki wrote:
> > > Hi Everyone,
> > >
> > > This is one more update of
> > >
> > > https://lore.kernel.org/linux-pm/3134863.CbtlEUcBR6@rjwysocki.net/#r
> > >
> > > the cover letter of which was sent separately by mistake:
> > >
> > > https://lore.kernel.org/linux-pm/CAJZ5v0jo5vh2uD5t4GqBnN0qukMBG_ty33PB=NiEqigqxzBcsw@mail.gmail.com/
> > >
> > > and it has been updated once already:
> > >
> > > https://lore.kernel.org/linux-pm/114901234.nniJfEyVGO@rjwysocki.net/
> > >
> > > > Relative to the v2 above it drops 3 patches, one because it was broken ([04/17]
> > > in the v2), and two more that would need to be rebased significantly, either
> > > because of dropping the other broken patch or because of the recent Bang-bang
> > > governor fixes:
> > >
> > > https://lore.kernel.org/linux-pm/1903691.tdWV9SEqCh@rjwysocki.net/
> > >
> > > > The remaining 14 patches, 2 of which have been slightly rebased while the rest
> > > > are mostly unchanged (except for some very minor subject and changelog fixes),
> > > > are not expected to be controversial and are targeting 6.12, on top of the
> > > > current linux-next material.
> > >
> > > The original motivation for this series quoted below has not changed:
> > >
> > > >  The code for binding cooling devices to trip points (and unbinding them from
> > > >  trip points) is one of the murkiest pieces of the thermal subsystem.  It is
> > >  convoluted, bloated with unnecessary code doing questionable things, and it
> > >  works backwards.
> > >
> > >  The idea is to bind cooling devices to trip points in accordance with some
> > >  information known to the thermal zone owner (thermal driver).  This information
> > >  is not known to the thermal core when the thermal zone is registered, so the
> > >  driver needs to be involved, but instead of just asking the driver whether
> > >  or not the given cooling device should be bound to a given trip point, the
> > >  thermal core expects the driver to carry out all of the binding process
> > >  including calling functions specifically provided by the core for this
> > >  purpose which is cumbersome and counter-intuitive.
> > >
> > >  Because the driver has no information regarding the representation of the trip
> > >  points at the core level, it is forced to walk them (and it has to avoid some
> > >  locking traps while doing this), or it needs to make questionable assumptions
> > >  regarding the ordering of the trips in the core.  There are drivers doing both
> > >  these things.
> > >
> > > The first 5 patches in the series are preliminary.
> > >
> > > Patch [06/14] introduces a new .should_bind() callback for thermal zones and
> > > > patches [07,09-12/14] modify drivers to use it instead of the .bind() and
> > > .unbind() callbacks which allows them to be simplified quite a bit.
> > >
> > > The other patches [08,13-14/14] get rid of code that becomes unused after the
> > > previous changes and do some cleanups on top of that.
> > >
> > > The entire series along with 2 patches on top of it (that were present in the
> > > v2 of this set of patches) is available in the thermal-core-testing git branch:
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=thermal-core-testing
> > >
> > > (note that this branch is going to be rebased shortly on top of 6.11-rc4
> > > and the thermal control material in linux-next).
> > >
> > > Thanks!
> >
> > Hi,
> >
> > KernelCI has identified a boot regression originating from this series. I've
> > verified that reverting the series fixes the issue.
> 
> Thanks for the report!
> 
> There was a bug in the original patch [12/14] that would cause
> symptoms like what you are observing to appear, which was reported on
> Friday and has since been fixed in the tree.  Please see:
> 
> https://lore.kernel.org/linux-pm/CAJZ5v0iw7uXE_cfU5VXOjFDg9GM8Hu0+hKxqfzU3v0OM5KK9oQ@mail.gmail.com/
> 
> You probably have not tested the fixed tree yet, so please let
> kernelci run again on it and if the issue is still there, please let
> me know.

Indeed it has been fixed.

#regzbot fix: 'thermal/of: Use the .should_bind() thermal zone callback'

Thanks,
Nícolas

Thread overview: 66+ messages
2024-08-19 15:49 [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Rafael J. Wysocki
2024-08-19 15:50 ` [PATCH v3 01/14] thermal: core: Fold two functions into their respective callers Rafael J. Wysocki
2024-08-20  7:04   ` Zhang, Rui
2024-08-21  7:57   ` Daniel Lezcano
2024-08-19 15:51 ` [PATCH v3 02/14] thermal: core: Rearrange checks in thermal_bind_cdev_to_trip() Rafael J. Wysocki
2024-08-20  7:05   ` Zhang, Rui
2024-08-21  7:59   ` Daniel Lezcano
2024-08-21  8:49   ` lihuisong (C)
2024-08-21  9:28     ` Daniel Lezcano
2024-08-21  9:44       ` lihuisong (C)
2024-08-21 10:49         ` Daniel Lezcano
2024-08-21 11:22           ` lihuisong (C)
2024-08-21 11:12         ` Rafael J. Wysocki
2024-08-21 10:51     ` Rafael J. Wysocki
2024-08-19 15:52 ` [PATCH v3 03/14] thermal: core: Drop redundant thermal instance checks Rafael J. Wysocki
2024-08-20  7:05   ` Zhang, Rui
2024-08-21  9:32   ` Daniel Lezcano
2024-08-21 11:11     ` Rafael J. Wysocki
2024-08-21 11:56       ` Daniel Lezcano
2024-08-21 12:52         ` Rafael J. Wysocki
2024-08-19 15:56 ` [PATCH v3 04/14] thermal: sysfs: Use the dev argument in instance-related show/store Rafael J. Wysocki
2024-08-20  7:05   ` Zhang, Rui
2024-08-20  7:59   ` lihuisong (C)
2024-08-21  9:36   ` Daniel Lezcano
2024-08-19 15:58 ` [PATCH v3 05/14] thermal: core: Move thermal zone locking out of bind/unbind functions Rafael J. Wysocki
2024-08-20  7:05   ` Zhang, Rui
2024-08-20  8:27   ` lihuisong (C)
2024-08-20 10:27     ` Rafael J. Wysocki
2024-08-21  9:02       ` lihuisong (C)
2024-08-21 10:30         ` Rafael J. Wysocki
2024-08-21  9:46   ` Daniel Lezcano
2024-08-19 16:00 ` [PATCH v3 06/14] thermal: core: Introduce .should_bind() thermal zone callback Rafael J. Wysocki
2024-08-20  7:06   ` Zhang, Rui
2024-08-21  9:09   ` lihuisong (C)
2024-08-21 13:21   ` Daniel Lezcano
2024-08-19 16:02 ` [PATCH v3 07/14] thermal: ACPI: Use the " Rafael J. Wysocki
2024-08-20  7:06   ` Zhang, Rui
2024-08-21 13:22   ` Daniel Lezcano
2024-08-19 16:05 ` [PATCH v3 08/14] thermal: core: Unexport thermal_bind_cdev_to_trip() and thermal_unbind_cdev_from_trip() Rafael J. Wysocki
2024-08-20  7:08   ` Zhang, Rui
2024-08-21  9:18   ` lihuisong (C)
2024-08-21 13:23   ` Daniel Lezcano
2024-08-19 16:19 ` [PATCH v3 09/14] platform/x86: acerhdf: Use the .should_bind() thermal zone callback Rafael J. Wysocki
2024-08-19 20:24   ` Peter Kästle
2024-08-21 13:25   ` Daniel Lezcano
2024-08-19 16:24 ` [PATCH v3 10/14] mlxsw: core_thermal: " Rafael J. Wysocki
2024-08-19 16:26 ` [PATCH v3 11/14] thermal: imx: " Rafael J. Wysocki
2024-08-21 13:42   ` Daniel Lezcano
2024-08-19 16:30 ` [PATCH v3 12/14] thermal/of: " Rafael J. Wysocki
2024-08-21 14:20   ` Daniel Lezcano
2024-08-26 11:31   ` Marek Szyprowski
2024-08-26 12:14     ` Rafael J. Wysocki
2024-08-26 20:49       ` Marek Szyprowski
2024-08-27 11:39         ` Rafael J. Wysocki
2024-08-19 16:31 ` [PATCH v3 13/14] thermal: core: Drop unused bind/unbind functions and callbacks Rafael J. Wysocki
2024-08-20  7:10   ` Zhang, Rui
2024-08-21  9:33   ` lihuisong (C)
2024-08-21 14:24   ` Daniel Lezcano
2024-08-19 16:33 ` [PATCH v3 14/14] thermal: core: Clean up trip bind/unbind functions Rafael J. Wysocki
2024-08-20  7:11   ` Zhang, Rui
2024-08-21  9:34   ` lihuisong (C)
2024-08-21 14:29   ` Daniel Lezcano
2024-08-21 16:21     ` Rafael J. Wysocki
2024-08-24 18:45 ` [PATCH v3 00/14] thermal: Rework binding cooling devices to trip points Nícolas F. R. A. Prado
2024-08-26  9:58   ` Rafael J. Wysocki
2024-08-30 13:55     ` Nícolas F. R. A. Prado
