* [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone @ 2026-04-02 2:27 Jiajia Liu 2026-04-03 12:52 ` Rafael J. Wysocki 0 siblings, 1 reply; 10+ messages in thread From: Jiajia Liu @ 2026-04-02 2:27 UTC (permalink / raw) To: Rafael J . Wysocki, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel Cc: Jiajia Liu When hwmon->tz_list has more than one member, thermal_remove_hwmon_sysfs does not unregister hwmon->device. Unregistering the zone which is parent of hwmon->device blocks at wait_for_completion(&tz->removal). Add check and move hwmon to other zone in thermal_remove_hwmon_sysfs. One method of reproducing hung task is to unbind the first acpitz zone on systems with two acpitz zones. $ cd /sys/bus/platform/drivers/acpi-thermal/ $ ls bind LNXTHERM:00 LNXTHERM:01 uevent unbind $ echo 'LNXTHERM:00' | sudo tee unbind > /dev/null Signed-off-by: Jiajia Liu <liujiajia@kylinos.cn> --- drivers/thermal/thermal_hwmon.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c index b624892bc6d6..43cde079fef0 100644 --- a/drivers/thermal/thermal_hwmon.c +++ b/drivers/thermal/thermal_hwmon.c @@ -242,6 +242,15 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz) list_del(&temp->hwmon_node); kfree(temp); if (!list_empty(&hwmon->tz_list)) { + if (hwmon->device->parent == &tz->device) { + struct thermal_hwmon_temp *first; + + first = list_first_entry(&hwmon->tz_list, + struct thermal_hwmon_temp, + hwmon_node); + device_move(hwmon->device, &first->tz->device, + DPM_ORDER_DEV_AFTER_PARENT); + } mutex_unlock(&thermal_hwmon_list_lock); return; } -- 2.53.0 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-02 2:27 [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone Jiajia Liu @ 2026-04-03 12:52 ` Rafael J. Wysocki 2026-04-03 14:20 ` Guenter Roeck 0 siblings, 1 reply; 10+ messages in thread From: Rafael J. Wysocki @ 2026-04-03 12:52 UTC (permalink / raw) To: Jiajia Liu Cc: Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, Guenter Roeck, linux-hwmon On Thursday, April 2, 2026 4:27:42 AM CEST Jiajia Liu wrote: > When hwmon->tz_list has more than one member, > thermal_remove_hwmon_sysfs does not unregister hwmon->device. > Unregistering the zone which is parent of hwmon->device blocks > at wait_for_completion(&tz->removal). Add check and move hwmon > to other zone in thermal_remove_hwmon_sysfs. > > One method of reproducing hung task is to unbind the first > acpitz zone on systems with two acpitz zones. > > $ cd /sys/bus/platform/drivers/acpi-thermal/ > $ ls > bind LNXTHERM:00 LNXTHERM:01 uevent unbind > $ echo 'LNXTHERM:00' | sudo tee unbind > /dev/null Right, this is a problem. However, it is more of a design issue IMV because putting temperature attributes for all of the (possibly unrelated) thermal zones of the same type under one hwmon interface is not particularly useful (for example, if there are more then one of them, it is not particularly straightforward to find the thermal zone corresponding to a given hwmon attribute and vice versa). Also it is asking for trouble as demonstrated by the above. > Signed-off-by: Jiajia Liu <liujiajia@kylinos.cn> > --- > drivers/thermal/thermal_hwmon.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c > index b624892bc6d6..43cde079fef0 100644 > --- a/drivers/thermal/thermal_hwmon.c > +++ b/drivers/thermal/thermal_hwmon.c > @@ -242,6 +242,15 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz) > list_del(&temp->hwmon_node); > kfree(temp); > if (!list_empty(&hwmon->tz_list)) { > + if (hwmon->device->parent == &tz->device) { > + struct thermal_hwmon_temp *first; > + > + first = list_first_entry(&hwmon->tz_list, > + struct thermal_hwmon_temp, > + hwmon_node); > + device_move(hwmon->device, &first->tz->device, > + DPM_ORDER_DEV_AFTER_PARENT); > + } This may just confuse user space if it started to use the hwmon interface already. > mutex_unlock(&thermal_hwmon_list_lock); > return; > } > So I'm wondering if something like the patch below can be done instead. It will basically cause every thermal zone to get its own hwmon interface regardless of the type. It appears to work for me, but I'm not sure if having multiple hwmon class devices with the same value in the name attribute is fine. --- drivers/thermal/thermal_hwmon.c | 109 ++++++++-------------------------------- 1 file changed, 24 insertions(+), 85 deletions(-) --- a/drivers/thermal/thermal_hwmon.c +++ b/drivers/thermal/thermal_hwmon.c @@ -19,19 +19,8 @@ #include "thermal_hwmon.h" #include "thermal_core.h" -/* hwmon sys I/F */ -/* thermal zone devices with the same type share one hwmon device */ -struct thermal_hwmon_device { - char type[THERMAL_NAME_LENGTH]; - struct device *device; - int count; - struct list_head tz_list; - struct list_head node; -}; - struct thermal_hwmon_attr { struct device_attribute attr; - char name[16]; }; /* one temperature input for each thermal zone */ @@ -42,6 +31,15 @@ struct thermal_hwmon_temp { struct thermal_hwmon_attr temp_crit; /* hwmon sys attr */ }; +/* hwmon sys I/F */ +/* thermal zone devices with the same type share one hwmon device */ +struct thermal_hwmon_device { + char type[THERMAL_NAME_LENGTH]; + struct device *device; + struct list_head node; + struct thermal_hwmon_temp tz_temp; +}; + static LIST_HEAD(thermal_hwmon_list); static DEFINE_MUTEX(thermal_hwmon_list_lock); @@ -87,41 +85,17 @@ temp_crit_show(struct device *dev, struc return sysfs_emit(buf, "%d\n", temperature); } - static struct thermal_hwmon_device * -thermal_hwmon_lookup_by_type(const struct thermal_zone_device *tz) +thermal_hwmon_lookup(const struct thermal_zone_device *tz) { struct thermal_hwmon_device *hwmon; - char type[THERMAL_NAME_LENGTH]; - mutex_lock(&thermal_hwmon_list_lock); + guard(mutex)(&thermal_hwmon_list_lock); + list_for_each_entry(hwmon, &thermal_hwmon_list, node) { - strscpy(type, tz->type); - strreplace(type, '-', '_'); - if (!strcmp(hwmon->type, type)) { - mutex_unlock(&thermal_hwmon_list_lock); + if (hwmon->tz_temp.tz == tz) return hwmon; - } } - mutex_unlock(&thermal_hwmon_list_lock); - - return NULL; -} - -/* Find the temperature input matching a given thermal zone */ -static struct thermal_hwmon_temp * -thermal_hwmon_lookup_temp(const struct thermal_hwmon_device *hwmon, - const struct thermal_zone_device *tz) -{ - struct thermal_hwmon_temp *temp; - - mutex_lock(&thermal_hwmon_list_lock); - list_for_each_entry(temp, &hwmon->tz_list, hwmon_node) - if (temp->tz == tz) { - mutex_unlock(&thermal_hwmon_list_lock); - return temp; - } - mutex_unlock(&thermal_hwmon_list_lock); return NULL; } @@ -136,20 +110,15 @@ int thermal_add_hwmon_sysfs(struct therm { struct thermal_hwmon_device *hwmon; struct thermal_hwmon_temp *temp; - int new_hwmon_device = 1; int result; - hwmon = thermal_hwmon_lookup_by_type(tz); - if (hwmon) { - new_hwmon_device = 0; - goto register_sys_interface; - } + if (thermal_hwmon_lookup(tz)) + return -EEXIST; hwmon = kzalloc_obj(*hwmon); if (!hwmon) return -ENOMEM; - INIT_LIST_HEAD(&hwmon->tz_list); strscpy(hwmon->type, tz->type, THERMAL_NAME_LENGTH); strreplace(hwmon->type, '-', '_'); hwmon->device = hwmon_device_register_for_thermal(&tz->device, @@ -159,31 +128,20 @@ int thermal_add_hwmon_sysfs(struct therm goto free_mem; } - register_sys_interface: - temp = kzalloc_obj(*temp); - if (!temp) { - result = -ENOMEM; - goto unregister_name; - } + temp = &hwmon->tz_temp; temp->tz = tz; - hwmon->count++; - snprintf(temp->temp_input.name, sizeof(temp->temp_input.name), - "temp%d_input", hwmon->count); - temp->temp_input.attr.attr.name = temp->temp_input.name; + temp->temp_input.attr.attr.name = "temp1_input"; temp->temp_input.attr.attr.mode = 0444; temp->temp_input.attr.show = temp_input_show; sysfs_attr_init(&temp->temp_input.attr.attr); result = device_create_file(hwmon->device, &temp->temp_input.attr); if (result) - goto free_temp_mem; + goto unregister_name; if (thermal_zone_crit_temp_valid(tz)) { - snprintf(temp->temp_crit.name, - sizeof(temp->temp_crit.name), - "temp%d_crit", hwmon->count); - temp->temp_crit.attr.attr.name = temp->temp_crit.name; + temp->temp_crit.attr.attr.name = "temp1_crit"; temp->temp_crit.attr.attr.mode = 0444; temp->temp_crit.attr.show = temp_crit_show; sysfs_attr_init(&temp->temp_crit.attr.attr); @@ -194,20 +152,15 @@ int thermal_add_hwmon_sysfs(struct therm } mutex_lock(&thermal_hwmon_list_lock); - if (new_hwmon_device) - list_add_tail(&hwmon->node, &thermal_hwmon_list); - list_add_tail(&temp->hwmon_node, &hwmon->tz_list); + list_add_tail(&hwmon->node, &thermal_hwmon_list); mutex_unlock(&thermal_hwmon_list_lock); return 0; unregister_input: device_remove_file(hwmon->device, &temp->temp_input.attr); - free_temp_mem: - kfree(temp); unregister_name: - if (new_hwmon_device) - hwmon_device_unregister(hwmon->device); + hwmon_device_unregister(hwmon->device); free_mem: kfree(hwmon); @@ -220,31 +173,17 @@ void thermal_remove_hwmon_sysfs(struct t struct thermal_hwmon_device *hwmon; struct thermal_hwmon_temp *temp; - hwmon = thermal_hwmon_lookup_by_type(tz); - if (unlikely(!hwmon)) { - /* Should never happen... */ - dev_dbg(&tz->device, "hwmon device lookup failed!\n"); + hwmon = thermal_hwmon_lookup(tz); + if (!hwmon) return; - } - temp = thermal_hwmon_lookup_temp(hwmon, tz); - if (unlikely(!temp)) { - /* Should never happen... */ - dev_dbg(&tz->device, "temperature input lookup failed!\n"); - return; - } + temp = &hwmon->tz_temp; device_remove_file(hwmon->device, &temp->temp_input.attr); if (thermal_zone_crit_temp_valid(tz)) device_remove_file(hwmon->device, &temp->temp_crit.attr); mutex_lock(&thermal_hwmon_list_lock); - list_del(&temp->hwmon_node); - kfree(temp); - if (!list_empty(&hwmon->tz_list)) { - mutex_unlock(&thermal_hwmon_list_lock); - return; - } list_del(&hwmon->node); mutex_unlock(&thermal_hwmon_list_lock); ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-03 12:52 ` Rafael J. Wysocki @ 2026-04-03 14:20 ` Guenter Roeck 2026-04-04 12:58 ` Rafael J. Wysocki 0 siblings, 1 reply; 10+ messages in thread From: Guenter Roeck @ 2026-04-03 14:20 UTC (permalink / raw) To: Rafael J. Wysocki, Jiajia Liu Cc: Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On 4/3/26 05:52, Rafael J. Wysocki wrote: .[ ... ] > It appears to work for me, but I'm not sure if having multiple hwmon class > devices with the same value in the name attribute is fine. Like this ? $ cd /sys/class/hwmon $ grep . */name hwmon0/name:r8169_0_c00:00 hwmon1/name:nvme hwmon2/name:nvme hwmon3/name:nct6687 hwmon4/name:k10temp hwmon5/name:spd5118 hwmon6/name:spd5118 hwmon7/name:spd5118 hwmon8/name:spd5118 hwmon9/name:mt7921_phy0 Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill since the "sensors" command makes it r8169_0_c00:00-mdio-0 Adapter: MDIO adapter temp1: +36.0°C (high = +120.0°C) mt7921_phy0-pci-0d00 Adapter: PCI adapter temp1: +30.0°C essentially duplicating the device index. Thanks, Guenter ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-03 14:20 ` Guenter Roeck @ 2026-04-04 12:58 ` Rafael J. Wysocki 2026-04-04 14:02 ` Guenter Roeck 0 siblings, 1 reply; 10+ messages in thread From: Rafael J. Wysocki @ 2026-04-04 12:58 UTC (permalink / raw) To: Guenter Roeck Cc: Rafael J. Wysocki, Jiajia Liu, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@roeck-us.net> wrote: > > On 4/3/26 05:52, Rafael J. Wysocki wrote: > .[ ... ] > > It appears to work for me, but I'm not sure if having multiple hwmon class > > devices with the same value in the name attribute is fine. > > Like this ? > > $ cd /sys/class/hwmon > $ grep . */name > hwmon0/name:r8169_0_c00:00 > hwmon1/name:nvme > hwmon2/name:nvme > hwmon3/name:nct6687 > hwmon4/name:k10temp > hwmon5/name:spd5118 > hwmon6/name:spd5118 > hwmon7/name:spd5118 > hwmon8/name:spd5118 > hwmon9/name:mt7921_phy0 Yes. > Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill > since the "sensors" command makes it > > r8169_0_c00:00-mdio-0 > Adapter: MDIO adapter > temp1: +36.0°C (high = +120.0°C) > > mt7921_phy0-pci-0d00 > Adapter: PCI adapter > temp1: +30.0°C > > essentially duplicating the device index. Well, with the patch posted by me, the output of sensors from a test system looks like this: acpitz-acpi-0 Adapter: ACPI interface temp1: +16.8°C pch_cannonlake-virtual-0 Adapter: Virtual device temp1: +33.0°C acpitz-acpi-0 Adapter: ACPI interface temp1: +27.8°C (some further data excluded), which is kind of confusing (note the duplicate acpitz-acpi-0 entries with different values of temp1). That could be disambiguated by concatenating the thermal zone ID (possibly after a '_') to the name. Or the "temp*" things for thermal zones of the same type could carry different numbers. A less attractive alternative would be to register a special virtual device serving as a parent for all hwmon interfaces registered automatically for thermal zones. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-04 12:58 ` Rafael J. Wysocki @ 2026-04-04 14:02 ` Guenter Roeck 2026-04-04 17:38 ` Rafael J. Wysocki 0 siblings, 1 reply; 10+ messages in thread From: Guenter Roeck @ 2026-04-04 14:02 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiajia Liu, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On 4/4/26 05:58, Rafael J. Wysocki wrote: > On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@roeck-us.net> wrote: >> >> On 4/3/26 05:52, Rafael J. Wysocki wrote: >> .[ ... ] >>> It appears to work for me, but I'm not sure if having multiple hwmon class >>> devices with the same value in the name attribute is fine. >> >> Like this ? >> >> $ cd /sys/class/hwmon >> $ grep . */name >> hwmon0/name:r8169_0_c00:00 >> hwmon1/name:nvme >> hwmon2/name:nvme >> hwmon3/name:nct6687 >> hwmon4/name:k10temp >> hwmon5/name:spd5118 >> hwmon6/name:spd5118 >> hwmon7/name:spd5118 >> hwmon8/name:spd5118 >> hwmon9/name:mt7921_phy0 > > Yes. > >> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill >> since the "sensors" command makes it >> >> r8169_0_c00:00-mdio-0 >> Adapter: MDIO adapter >> temp1: +36.0°C (high = +120.0°C) >> >> mt7921_phy0-pci-0d00 >> Adapter: PCI adapter >> temp1: +30.0°C >> >> essentially duplicating the device index. > > Well, with the patch posted by me, the output of sensors from a test > system looks like this: > > acpitz-acpi-0 > Adapter: ACPI interface > temp1: +16.8°C > > pch_cannonlake-virtual-0 > Adapter: Virtual device > temp1: +33.0°C > > acpitz-acpi-0 > Adapter: ACPI interface > temp1: +27.8°C > > (some further data excluded), which is kind of confusing (note the > duplicate acpitz-acpi-0 entries with different values of temp1). > Yes, agreed, that is confusing. I would have expected the second one to be identified as "acpitz-acpi-1". Do they both have the same parent ? > That could be disambiguated by concatenating the thermal zone ID > (possibly after a '_') to the name. Or the "temp*" things for thermal > zones of the same type could carry different numbers. > > A less attractive alternative would be to register a special virtual > device serving as a parent for all hwmon interfaces registered > automatically for thermal zones. If they all have the same parent, technically it should be a single hwmon device with multiple sensors, as in: acpitz-acpi-0 Adapter: ACPI interface temp1: +16.8°C temp2: +27.8°C Guenter ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-04 14:02 ` Guenter Roeck @ 2026-04-04 17:38 ` Rafael J. Wysocki 2026-04-05 3:34 ` Guenter Roeck 0 siblings, 1 reply; 10+ messages in thread From: Rafael J. Wysocki @ 2026-04-04 17:38 UTC (permalink / raw) To: Guenter Roeck Cc: Rafael J. Wysocki, Jiajia Liu, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On Sat, Apr 4, 2026 at 4:02 PM Guenter Roeck <linux@roeck-us.net> wrote: > > On 4/4/26 05:58, Rafael J. Wysocki wrote: > > On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@roeck-us.net> wrote: > >> > >> On 4/3/26 05:52, Rafael J. Wysocki wrote: > >> .[ ... ] > >>> It appears to work for me, but I'm not sure if having multiple hwmon class > >>> devices with the same value in the name attribute is fine. > >> > >> Like this ? > >> > >> $ cd /sys/class/hwmon > >> $ grep . */name > >> hwmon0/name:r8169_0_c00:00 > >> hwmon1/name:nvme > >> hwmon2/name:nvme > >> hwmon3/name:nct6687 > >> hwmon4/name:k10temp > >> hwmon5/name:spd5118 > >> hwmon6/name:spd5118 > >> hwmon7/name:spd5118 > >> hwmon8/name:spd5118 > >> hwmon9/name:mt7921_phy0 > > > > Yes. > > > >> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill > >> since the "sensors" command makes it > >> > >> r8169_0_c00:00-mdio-0 > >> Adapter: MDIO adapter > >> temp1: +36.0°C (high = +120.0°C) > >> > >> mt7921_phy0-pci-0d00 > >> Adapter: PCI adapter > >> temp1: +30.0°C > >> > >> essentially duplicating the device index. > > > > Well, with the patch posted by me, the output of sensors from a test > > system looks like this: > > > > acpitz-acpi-0 > > Adapter: ACPI interface > > temp1: +16.8°C > > > > pch_cannonlake-virtual-0 > > Adapter: Virtual device > > temp1: +33.0°C > > > > acpitz-acpi-0 > > Adapter: ACPI interface > > temp1: +27.8°C > > > > (some further data excluded), which is kind of confusing (note the > > duplicate acpitz-acpi-0 entries with different values of temp1). > > > > Yes, agreed, that is confusing. I would have expected the second one > to be identified as "acpitz-acpi-1". Do they both have the same parent ? No, they don't. The parent of each of them is a thermal zone device and both parents have the same "type" value. > > That could be disambiguated by concatenating the thermal zone ID > > (possibly after a '_') to the name. Or the "temp*" things for thermal > > zones of the same type could carry different numbers. > > > > A less attractive alternative would be to register a special virtual > > device serving as a parent for all hwmon interfaces registered > > automatically for thermal zones. > > If they all have the same parent, technically it should be a single > hwmon device with multiple sensors, as in: > > acpitz-acpi-0 > Adapter: ACPI interface > temp1: +16.8°C > temp2: +27.8°C So somebody tried to make it look like that by registering hwmon interfaces for all of the thermal zones of the same type under one of them, but that (quite obviously) doesn't work. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-04 17:38 ` Rafael J. Wysocki @ 2026-04-05 3:34 ` Guenter Roeck 2026-04-08 15:05 ` Rafael J. Wysocki 0 siblings, 1 reply; 10+ messages in thread From: Guenter Roeck @ 2026-04-05 3:34 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiajia Liu, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On 4/4/26 10:38, Rafael J. Wysocki wrote: > On Sat, Apr 4, 2026 at 4:02 PM Guenter Roeck <linux@roeck-us.net> wrote: >> >> On 4/4/26 05:58, Rafael J. Wysocki wrote: >>> On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@roeck-us.net> wrote: >>>> >>>> On 4/3/26 05:52, Rafael J. Wysocki wrote: >>>> .[ ... ] >>>>> It appears to work for me, but I'm not sure if having multiple hwmon class >>>>> devices with the same value in the name attribute is fine. >>>> >>>> Like this ? >>>> >>>> $ cd /sys/class/hwmon >>>> $ grep . */name >>>> hwmon0/name:r8169_0_c00:00 >>>> hwmon1/name:nvme >>>> hwmon2/name:nvme >>>> hwmon3/name:nct6687 >>>> hwmon4/name:k10temp >>>> hwmon5/name:spd5118 >>>> hwmon6/name:spd5118 >>>> hwmon7/name:spd5118 >>>> hwmon8/name:spd5118 >>>> hwmon9/name:mt7921_phy0 >>> >>> Yes. >>> >>>> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill >>>> since the "sensors" command makes it >>>> >>>> r8169_0_c00:00-mdio-0 >>>> Adapter: MDIO adapter >>>> temp1: +36.0°C (high = +120.0°C) >>>> >>>> mt7921_phy0-pci-0d00 >>>> Adapter: PCI adapter >>>> temp1: +30.0°C >>>> >>>> essentially duplicating the device index. >>> >>> Well, with the patch posted by me, the output of sensors from a test >>> system looks like this: >>> >>> acpitz-acpi-0 >>> Adapter: ACPI interface >>> temp1: +16.8°C >>> >>> pch_cannonlake-virtual-0 >>> Adapter: Virtual device >>> temp1: +33.0°C >>> >>> acpitz-acpi-0 >>> Adapter: ACPI interface >>> temp1: +27.8°C >>> >>> (some further data excluded), which is kind of confusing (note the >>> duplicate acpitz-acpi-0 entries with different values of temp1). >>> >> >> Yes, agreed, that is confusing. I would have expected the second one >> to be identified as "acpitz-acpi-1". Do they both have the same parent ? > > No, they don't. > > The parent of each of them is a thermal zone device and both parents > have the same "type" value. > >>> That could be disambiguated by concatenating the thermal zone ID >>> (possibly after a '_') to the name. Or the "temp*" things for thermal >>> zones of the same type could carry different numbers. >>> >>> A less attractive alternative would be to register a special virtual >>> device serving as a parent for all hwmon interfaces registered >>> automatically for thermal zones. >> >> If they all have the same parent, technically it should be a single >> hwmon device with multiple sensors, as in: >> >> acpitz-acpi-0 >> Adapter: ACPI interface >> temp1: +16.8°C >> temp2: +27.8°C > > So somebody tried to make it look like that by registering hwmon > interfaces for all of the thermal zones of the same type under one of > them, but that (quite obviously) doesn't work. Not sure I understand why that doesn't work or why that is obvious, but I'll take you by your word (I would agree that the current _implementation_ looks problematic). I looked into the source code of the "sensors" command. It indeed does not index ACPI devices (nor virtual devices, for that matter) but assumes that such devices are unique. My apologies for not realizing this earlier. So your only option is indeed to index the chip name if you want/need more than one hwmon device with the same base name (here: acpitz) instantiated from the thermal subsystem. One comment to one of your earlier e-mails: "However, it is more of a design issue IMV because putting temperature attributes for all of the (possibly unrelated) thermal zones of the same type under one hwmon interface is not particularly useful" A single hardware monitoring device, by design, serves multiple thermal zones. Anything else would not make sense for multi-channel hardware monitoring chips. The hardware monitoring subsystem groups sensors by chip, not by thermal zones. Having said this: This discussion is not new. Certain subsystems advocate for having one hardware monitoring device per sensor, not per chip. One submitter went as far as telling me that I am clueless. We don't need to repeat the exercise. I have my opinion, you have yours, and all we can do is to agree to disagree. Thanks, Guenter ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-05 3:34 ` Guenter Roeck @ 2026-04-08 15:05 ` Rafael J. Wysocki 2026-04-08 15:32 ` Guenter Roeck 0 siblings, 1 reply; 10+ messages in thread From: Rafael J. Wysocki @ 2026-04-08 15:05 UTC (permalink / raw) To: Guenter Roeck Cc: Rafael J. Wysocki, Jiajia Liu, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On Sun, Apr 5, 2026 at 5:34 AM Guenter Roeck <linux@roeck-us.net> wrote: > > On 4/4/26 10:38, Rafael J. Wysocki wrote: > > On Sat, Apr 4, 2026 at 4:02 PM Guenter Roeck <linux@roeck-us.net> wrote: > >> > >> On 4/4/26 05:58, Rafael J. Wysocki wrote: > >>> On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@roeck-us.net> wrote: > >>>> > >>>> On 4/3/26 05:52, Rafael J. Wysocki wrote: > >>>> .[ ... ] > >>>>> It appears to work for me, but I'm not sure if having multiple hwmon class > >>>>> devices with the same value in the name attribute is fine. > >>>> > >>>> Like this ? > >>>> > >>>> $ cd /sys/class/hwmon > >>>> $ grep . */name > >>>> hwmon0/name:r8169_0_c00:00 > >>>> hwmon1/name:nvme > >>>> hwmon2/name:nvme > >>>> hwmon3/name:nct6687 > >>>> hwmon4/name:k10temp > >>>> hwmon5/name:spd5118 > >>>> hwmon6/name:spd5118 > >>>> hwmon7/name:spd5118 > >>>> hwmon8/name:spd5118 > >>>> hwmon9/name:mt7921_phy0 > >>> > >>> Yes. > >>> > >>>> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill > >>>> since the "sensors" command makes it > >>>> > >>>> r8169_0_c00:00-mdio-0 > >>>> Adapter: MDIO adapter > >>>> temp1: +36.0°C (high = +120.0°C) > >>>> > >>>> mt7921_phy0-pci-0d00 > >>>> Adapter: PCI adapter > >>>> temp1: +30.0°C > >>>> > >>>> essentially duplicating the device index. > >>> > >>> Well, with the patch posted by me, the output of sensors from a test > >>> system looks like this: > >>> > >>> acpitz-acpi-0 > >>> Adapter: ACPI interface > >>> temp1: +16.8°C > >>> > >>> pch_cannonlake-virtual-0 > >>> Adapter: Virtual device > >>> temp1: +33.0°C > >>> > >>> acpitz-acpi-0 > >>> Adapter: ACPI interface > >>> temp1: +27.8°C > >>> > >>> (some further data excluded), which is kind of confusing (note the > >>> duplicate acpitz-acpi-0 entries with different values of temp1). > >>> > >> > >> Yes, agreed, that is confusing. I would have expected the second one > >> to be identified as "acpitz-acpi-1". Do they both have the same parent ? > > > > No, they don't. > > > > The parent of each of them is a thermal zone device and both parents > > have the same "type" value. > > > >>> That could be disambiguated by concatenating the thermal zone ID > >>> (possibly after a '_') to the name. Or the "temp*" things for thermal > >>> zones of the same type could carry different numbers. > >>> > >>> A less attractive alternative would be to register a special virtual > >>> device serving as a parent for all hwmon interfaces registered > >>> automatically for thermal zones. > >> > >> If they all have the same parent, technically it should be a single > >> hwmon device with multiple sensors, as in: > >> > >> acpitz-acpi-0 > >> Adapter: ACPI interface > >> temp1: +16.8°C > >> temp2: +27.8°C > > > > So somebody tried to make it look like that by registering hwmon > > interfaces for all of the thermal zones of the same type under one of > > them, but that (quite obviously) doesn't work. > > Not sure I understand why that doesn't work or why that is obvious, > but I'll take you by your word (I would agree that the current > _implementation_ looks problematic). For example, say that there are two ACPI thermal zones on a system /sys/devices/virtual/thermal/thermal_zone0/ /sys/devices/virtual/thermal/thermal_zone1/ The current mainline code registers a hwmon class device for thermal_zone0 only: /sys/devices/virtual/thermal/thermal_zone0/hwmon0/ because the type is "acpitz" for both of them, but it adds a sysfs attribute that belongs to thermal_zone1 under it: /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp2_input There is also /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp1_input but it belongs to thermal_zone0. Interesting things happen when thermal_zone0 is removed, for example because the ACPI thermal driver is unbound from the underlying platform device. Namely, the removal code skips the removal of hwmon0 because of the temp2_input attribute belonging to thermal_zone1 which effectively prevents thermal_zone0 removal from making progress. AFAICS, nothing particularly smart can be done to address this issue while retaining the current design of the code. Reparenting hwmon0 to thermal_zone1 may confuse user space as well as removing hwmon0 along with temp2_input. That's why I think that this is a design issue. > I looked into the source code of the "sensors" command. It indeed does > not index ACPI devices (nor virtual devices, for that matter) but > assumes that such devices are unique. My apologies for not realizing > this earlier. > > So your only option is indeed to index the chip name if you want/need > more than one hwmon device with the same base name (here: acpitz) > instantiated from the thermal subsystem. > > One comment to one of your earlier e-mails: > > "However, it is more of a design issue IMV because putting temperature > attributes for all of the (possibly unrelated) thermal zones of the > same type under one hwmon interface is not particularly useful" > > A single hardware monitoring device, by design, serves multiple > thermal zones. Anything else would not make sense for multi-channel > hardware monitoring chips. The hardware monitoring subsystem groups > sensors by chip, not by thermal zones. > > Having said this: This discussion is not new. Certain subsystems > advocate for having one hardware monitoring device per sensor, > not per chip. One submitter went as far as telling me that I am > clueless. We don't need to repeat the exercise. I have my opinion, > you have yours, and all we can do is to agree to disagree. I'm not sure if this has anything to do with hardware monitoring chips because hwmon_device_register_for_thermal() sets the chip argument of __hwmon_device_register() to NULL, so the chip information is missing in this particular case. The underlying hardware may or may not be a multi-channel hardware monitoring chip, that is hard to tell in general. In the particular case of ACPI thermal zones, they each correspond to a different platform device and regarding those as different channels of the same hardware monitoring chip is kind of a stretch IMV (they may even be located at different places in the device hierarchy). Regardless, it should be possible to remove each of them cleanly because they are handled by the driver independently. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-08 15:05 ` Rafael J. Wysocki @ 2026-04-08 15:32 ` Guenter Roeck 2026-04-08 15:57 ` Rafael J. Wysocki 0 siblings, 1 reply; 10+ messages in thread From: Guenter Roeck @ 2026-04-08 15:32 UTC (permalink / raw) To: Rafael J. Wysocki Cc: Jiajia Liu, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On 4/8/26 08:05, Rafael J. Wysocki wrote: > On Sun, Apr 5, 2026 at 5:34 AM Guenter Roeck <linux@roeck-us.net> wrote: >> >> On 4/4/26 10:38, Rafael J. Wysocki wrote: >>> On Sat, Apr 4, 2026 at 4:02 PM Guenter Roeck <linux@roeck-us.net> wrote: >>>> >>>> On 4/4/26 05:58, Rafael J. Wysocki wrote: >>>>> On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@roeck-us.net> wrote: >>>>>> >>>>>> On 4/3/26 05:52, Rafael J. Wysocki wrote: >>>>>> .[ ... ] >>>>>>> It appears to work for me, but I'm not sure if having multiple hwmon class >>>>>>> devices with the same value in the name attribute is fine. >>>>>> >>>>>> Like this ? >>>>>> >>>>>> $ cd /sys/class/hwmon >>>>>> $ grep . */name >>>>>> hwmon0/name:r8169_0_c00:00 >>>>>> hwmon1/name:nvme >>>>>> hwmon2/name:nvme >>>>>> hwmon3/name:nct6687 >>>>>> hwmon4/name:k10temp >>>>>> hwmon5/name:spd5118 >>>>>> hwmon6/name:spd5118 >>>>>> hwmon7/name:spd5118 >>>>>> hwmon8/name:spd5118 >>>>>> hwmon9/name:mt7921_phy0 >>>>> >>>>> Yes. >>>>> >>>>>> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill >>>>>> since the "sensors" command makes it >>>>>> >>>>>> r8169_0_c00:00-mdio-0 >>>>>> Adapter: MDIO adapter >>>>>> temp1: +36.0°C (high = +120.0°C) >>>>>> >>>>>> mt7921_phy0-pci-0d00 >>>>>> Adapter: PCI adapter >>>>>> temp1: +30.0°C >>>>>> >>>>>> essentially duplicating the device index. >>>>> >>>>> Well, with the patch posted by me, the output of sensors from a test >>>>> system looks like this: >>>>> >>>>> acpitz-acpi-0 >>>>> Adapter: ACPI interface >>>>> temp1: +16.8°C >>>>> >>>>> pch_cannonlake-virtual-0 >>>>> Adapter: Virtual device >>>>> temp1: +33.0°C >>>>> >>>>> acpitz-acpi-0 >>>>> Adapter: ACPI interface >>>>> temp1: +27.8°C >>>>> >>>>> (some further data excluded), which is kind of confusing (note the >>>>> duplicate acpitz-acpi-0 entries with different values of temp1). >>>>> >>>> >>>> Yes, agreed, that is confusing. I would have expected the second one >>>> to be identified as "acpitz-acpi-1". Do they both have the same parent ? >>> >>> No, they don't. >>> >>> The parent of each of them is a thermal zone device and both parents >>> have the same "type" value. >>> >>>>> That could be disambiguated by concatenating the thermal zone ID >>>>> (possibly after a '_') to the name. Or the "temp*" things for thermal >>>>> zones of the same type could carry different numbers. >>>>> >>>>> A less attractive alternative would be to register a special virtual >>>>> device serving as a parent for all hwmon interfaces registered >>>>> automatically for thermal zones. >>>> >>>> If they all have the same parent, technically it should be a single >>>> hwmon device with multiple sensors, as in: >>>> >>>> acpitz-acpi-0 >>>> Adapter: ACPI interface >>>> temp1: +16.8°C >>>> temp2: +27.8°C >>> >>> So somebody tried to make it look like that by registering hwmon >>> interfaces for all of the thermal zones of the same type under one of >>> them, but that (quite obviously) doesn't work. >> >> Not sure I understand why that doesn't work or why that is obvious, >> but I'll take you by your word (I would agree that the current >> _implementation_ looks problematic). > > For example, say that there are two ACPI thermal zones on a system > > /sys/devices/virtual/thermal/thermal_zone0/ > /sys/devices/virtual/thermal/thermal_zone1/ > > The current mainline code registers a hwmon class device for thermal_zone0 only: > > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/ > > because the type is "acpitz" for both of them, but it adds a sysfs > attribute that belongs to thermal_zone1 under it: > > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp2_input > > There is also > > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp1_input > > but it belongs to thermal_zone0. > > Interesting things happen when thermal_zone0 is removed, for example > because the ACPI thermal driver is unbound from the underlying > platform device. Namely, the removal code skips the removal of hwmon0 > because of the temp2_input attribute belonging to thermal_zone1 which > effectively prevents thermal_zone0 removal from making progress. > > AFAICS, nothing particularly smart can be done to address this issue > while retaining the current design of the code. Reparenting hwmon0 to > thermal_zone1 may confuse user space as well as removing hwmon0 along > with temp2_input. That's why I think that this is a design issue. > The ACPI power meter driver has pretty much the same problem. A clear solution would require making hwmon sysfs attributes dynamic in nature (i.e., by adding the ability to change the visibility of attributes in runtime). I have started working on that, but did not have time to complete the work. The ACPI power meter driver uses a kludge around that: It unregisters the hwmon device whenever it gets a METER_NOTIFY_CONFIG event and re-registers it. Anyway, registering separate hwmon devices, one per thermal zone, is perfectly fine with me. Guenter ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone 2026-04-08 15:32 ` Guenter Roeck @ 2026-04-08 15:57 ` Rafael J. Wysocki 0 siblings, 0 replies; 10+ messages in thread From: Rafael J. Wysocki @ 2026-04-08 15:57 UTC (permalink / raw) To: Guenter Roeck Cc: Rafael J. Wysocki, Jiajia Liu, Daniel Lezcano, Zhang Rui, Lukasz Luba, linux-pm, linux-kernel, Armin Wolf, linux-hwmon On Wed, Apr 8, 2026 at 5:32 PM Guenter Roeck <linux@roeck-us.net> wrote: > > On 4/8/26 08:05, Rafael J. Wysocki wrote: > > On Sun, Apr 5, 2026 at 5:34 AM Guenter Roeck <linux@roeck-us.net> wrote: > >> > >> On 4/4/26 10:38, Rafael J. Wysocki wrote: > >>> On Sat, Apr 4, 2026 at 4:02 PM Guenter Roeck <linux@roeck-us.net> wrote: > >>>> > >>>> On 4/4/26 05:58, Rafael J. Wysocki wrote: > >>>>> On Fri, Apr 3, 2026 at 4:20 PM Guenter Roeck <linux@roeck-us.net> wrote: > >>>>>> > >>>>>> On 4/3/26 05:52, Rafael J. Wysocki wrote: > >>>>>> .[ ... ] > >>>>>>> It appears to work for me, but I'm not sure if having multiple hwmon class > >>>>>>> devices with the same value in the name attribute is fine. > >>>>>> > >>>>>> Like this ? > >>>>>> > >>>>>> $ cd /sys/class/hwmon > >>>>>> $ grep . */name > >>>>>> hwmon0/name:r8169_0_c00:00 > >>>>>> hwmon1/name:nvme > >>>>>> hwmon2/name:nvme > >>>>>> hwmon3/name:nct6687 > >>>>>> hwmon4/name:k10temp > >>>>>> hwmon5/name:spd5118 > >>>>>> hwmon6/name:spd5118 > >>>>>> hwmon7/name:spd5118 > >>>>>> hwmon8/name:spd5118 > >>>>>> hwmon9/name:mt7921_phy0 > >>>>> > >>>>> Yes. > >>>>> > >>>>>> Names such as "r8169_0_c00:00" and "mt7921_phy0" are actually overkill > >>>>>> since the "sensors" command makes it > >>>>>> > >>>>>> r8169_0_c00:00-mdio-0 > >>>>>> Adapter: MDIO adapter > >>>>>> temp1: +36.0°C (high = +120.0°C) > >>>>>> > >>>>>> mt7921_phy0-pci-0d00 > >>>>>> Adapter: PCI adapter > >>>>>> temp1: +30.0°C > >>>>>> > >>>>>> essentially duplicating the device index. > >>>>> > >>>>> Well, with the patch posted by me, the output of sensors from a test > >>>>> system looks like this: > >>>>> > >>>>> acpitz-acpi-0 > >>>>> Adapter: ACPI interface > >>>>> temp1: +16.8°C > >>>>> > >>>>> pch_cannonlake-virtual-0 > >>>>> Adapter: Virtual device > >>>>> temp1: +33.0°C > >>>>> > >>>>> acpitz-acpi-0 > >>>>> Adapter: ACPI interface > >>>>> temp1: +27.8°C > >>>>> > >>>>> (some further data excluded), which is kind of confusing (note the > >>>>> duplicate acpitz-acpi-0 entries with different values of temp1). > >>>>> > >>>> > >>>> Yes, agreed, that is confusing. I would have expected the second one > >>>> to be identified as "acpitz-acpi-1". Do they both have the same parent ? > >>> > >>> No, they don't. > >>> > >>> The parent of each of them is a thermal zone device and both parents > >>> have the same "type" value. > >>> > >>>>> That could be disambiguated by concatenating the thermal zone ID > >>>>> (possibly after a '_') to the name. Or the "temp*" things for thermal > >>>>> zones of the same type could carry different numbers. > >>>>> > >>>>> A less attractive alternative would be to register a special virtual > >>>>> device serving as a parent for all hwmon interfaces registered > >>>>> automatically for thermal zones. > >>>> > >>>> If they all have the same parent, technically it should be a single > >>>> hwmon device with multiple sensors, as in: > >>>> > >>>> acpitz-acpi-0 > >>>> Adapter: ACPI interface > >>>> temp1: +16.8°C > >>>> temp2: +27.8°C > >>> > >>> So somebody tried to make it look like that by registering hwmon > >>> interfaces for all of the thermal zones of the same type under one of > >>> them, but that (quite obviously) doesn't work. > >> > >> Not sure I understand why that doesn't work or why that is obvious, > >> but I'll take you by your word (I would agree that the current > >> _implementation_ looks problematic). > > > > For example, say that there are two ACPI thermal zones on a system > > > > /sys/devices/virtual/thermal/thermal_zone0/ > > /sys/devices/virtual/thermal/thermal_zone1/ > > > > The current mainline code registers a hwmon class device for thermal_zone0 only: > > > > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/ > > > > because the type is "acpitz" for both of them, but it adds a sysfs > > attribute that belongs to thermal_zone1 under it: > > > > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp2_input > > > > There is also > > > > /sys/devices/virtual/thermal/thermal_zone0/hwmon0/temp1_input > > > > but it belongs to thermal_zone0. > > > > Interesting things happen when thermal_zone0 is removed, for example > > because the ACPI thermal driver is unbound from the underlying > > platform device. Namely, the removal code skips the removal of hwmon0 > > because of the temp2_input attribute belonging to thermal_zone1 which > > effectively prevents thermal_zone0 removal from making progress. > > > > AFAICS, nothing particularly smart can be done to address this issue > > while retaining the current design of the code. Reparenting hwmon0 to > > thermal_zone1 may confuse user space as well as removing hwmon0 along > > with temp2_input. That's why I think that this is a design issue. > > > > The ACPI power meter driver has pretty much the same problem. A clear > solution would require making hwmon sysfs attributes dynamic in nature > (i.e., by adding the ability to change the visibility of attributes in > runtime). I have started working on that, but did not have time to > complete the work. The ACPI power meter driver uses a kludge around that: > It unregisters the hwmon device whenever it gets a METER_NOTIFY_CONFIG > event and re-registers it. > > Anyway, registering separate hwmon devices, one per thermal zone, > is perfectly fine with me. OK, I'll do that and see how it goes. Thanks for the feedback! ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2026-04-08 15:57 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-04-02 2:27 [PATCH RESEND v1] thermal: core: fix blocking in unregistering zone Jiajia Liu 2026-04-03 12:52 ` Rafael J. Wysocki 2026-04-03 14:20 ` Guenter Roeck 2026-04-04 12:58 ` Rafael J. Wysocki 2026-04-04 14:02 ` Guenter Roeck 2026-04-04 17:38 ` Rafael J. Wysocki 2026-04-05 3:34 ` Guenter Roeck 2026-04-08 15:05 ` Rafael J. Wysocki 2026-04-08 15:32 ` Guenter Roeck 2026-04-08 15:57 ` Rafael J. Wysocki
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox