Linux Power Management development
 help / color / mirror / Atom feed
* Re: [PATCH v1] thermal: core: Relocate the struct thermal_governor definition
From: Lukasz Luba @ 2024-04-05  8:16 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: LKML, Daniel Lezcano, Linux PM
In-Reply-To: <2725268.mvXUDI8C0e@kreacher>



On 4/4/24 20:27, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Notice that struct thermal_governor is only used by the thermal core
> and so move its definition to thermal_core.h.
> 
> No functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>   drivers/thermal/thermal_core.h |   25 +++++++++++++++++++++++++
>   include/linux/thermal.h        |   25 -------------------------
>   2 files changed, 25 insertions(+), 25 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -23,6 +23,31 @@ struct thermal_trip_desc {
>   };
>   
>   /**
> + * struct thermal_governor - structure that holds thermal governor information
> + * @name:	name of the governor
> + * @bind_to_tz: callback called when binding to a thermal zone.  If it
> + *		returns 0, the governor is bound to the thermal zone,
> + *		otherwise it fails.
> + * @unbind_from_tz:	callback called when a governor is unbound from a
> + *			thermal zone.
> + * @throttle:	callback called for every trip point even if temperature is
> + *		below the trip point temperature
> + * @update_tz:	callback called when thermal zone internals have changed, e.g.
> + *		thermal cooling instance was added/removed
> + * @governor_list:	node in thermal_governor_list (in thermal_core.c)
> + */
> +struct thermal_governor {
> +	const char *name;
> +	int (*bind_to_tz)(struct thermal_zone_device *tz);
> +	void (*unbind_from_tz)(struct thermal_zone_device *tz);
> +	int (*throttle)(struct thermal_zone_device *tz,
> +			const struct thermal_trip *trip);
> +	void (*update_tz)(struct thermal_zone_device *tz,
> +			  enum thermal_notify_event reason);
> +	struct list_head	governor_list;
> +};
> +
> +/**
>    * struct thermal_zone_device - structure for a thermal zone
>    * @id:		unique id number for each thermal zone
>    * @type:	the thermal zone device type
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -126,31 +126,6 @@ struct thermal_cooling_device {
>   #endif
>   };
>   
> -/**
> - * struct thermal_governor - structure that holds thermal governor information
> - * @name:	name of the governor
> - * @bind_to_tz: callback called when binding to a thermal zone.  If it
> - *		returns 0, the governor is bound to the thermal zone,
> - *		otherwise it fails.
> - * @unbind_from_tz:	callback called when a governor is unbound from a
> - *			thermal zone.
> - * @throttle:	callback called for every trip point even if temperature is
> - *		below the trip point temperature
> - * @update_tz:	callback called when thermal zone internals have changed, e.g.
> - *		thermal cooling instance was added/removed
> - * @governor_list:	node in thermal_governor_list (in thermal_core.c)
> - */
> -struct thermal_governor {
> -	const char *name;
> -	int (*bind_to_tz)(struct thermal_zone_device *tz);
> -	void (*unbind_from_tz)(struct thermal_zone_device *tz);
> -	int (*throttle)(struct thermal_zone_device *tz,
> -			const struct thermal_trip *trip);
> -	void (*update_tz)(struct thermal_zone_device *tz,
> -			  enum thermal_notify_event reason);
> -	struct list_head	governor_list;
> -};
> -
>   /* Structure to define Thermal Zone parameters */
>   struct thermal_zone_params {
>   	const char *governor_name;
> 
> 
> 

That makes perfectly sense IMO. We don't have drivers which come
together with some custom governor just for them (like IIRC some
devfreq more complex devices do/did).

Lets close that open dimension.

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>


^ permalink raw reply

* Re: [PATCH v3 6/6] thermal: core: Relocate critical and hot trip handling
From: Lukasz Luba @ 2024-04-05  7:35 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Srinivas Pandruvada, Daniel Lezcano,
	AngeloGioacchino Del Regno, Rafael J. Wysocki, Linux PM
In-Reply-To: <CAJZ5v0j0jKi9=w_RiYqSZuQtveskcE8jKZHDwaP1EmNOxLk-RQ@mail.gmail.com>

Hi Rafael,

On 4/4/24 10:03, Rafael J. Wysocki wrote:
> On Tue, Apr 2, 2024 at 9:04 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> Modify handle_thermal_trip() to call handle_critical_trips() only after
>> finding that the trip temperature has been crossed on the way up and
>> remove the redundant temperature check from the latter.
>>
>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> This change is premature, as it will cause handle_non_critical_trips()
> to be called for hot/critical trips which is questionable, so I'm
> withdrawing it for now.
> 
> The rest of the series is still applicable, though.
> 
> 

Could you explain your concerns about this, please?
Is about the extra execution time for the non-critical trip,
while we are in section of handling critical ASAP?
(also it would require that extra sorting there IMO)

Regards,
Lukasz

^ permalink raw reply

* Re: [PATCH v1] thermal: core: Relocate the struct thermal_governor definition
From: Daniel Lezcano @ 2024-04-05  6:51 UTC (permalink / raw)
  To: Rafael J. Wysocki, Linux PM; +Cc: LKML, Lukasz Luba
In-Reply-To: <2725268.mvXUDI8C0e@kreacher>

On 04/04/2024 21:27, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Notice that struct thermal_governor is only used by the thermal core
> and so move its definition to thermal_core.h.
> 
> No functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> ---
>   drivers/thermal/thermal_core.h |   25 +++++++++++++++++++++++++
>   include/linux/thermal.h        |   25 -------------------------
>   2 files changed, 25 insertions(+), 25 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -23,6 +23,31 @@ struct thermal_trip_desc {
>   };
>   
>   /**
> + * struct thermal_governor - structure that holds thermal governor information
> + * @name:	name of the governor
> + * @bind_to_tz: callback called when binding to a thermal zone.  If it
> + *		returns 0, the governor is bound to the thermal zone,
> + *		otherwise it fails.
> + * @unbind_from_tz:	callback called when a governor is unbound from a
> + *			thermal zone.
> + * @throttle:	callback called for every trip point even if temperature is
> + *		below the trip point temperature
> + * @update_tz:	callback called when thermal zone internals have changed, e.g.
> + *		thermal cooling instance was added/removed
> + * @governor_list:	node in thermal_governor_list (in thermal_core.c)
> + */
> +struct thermal_governor {
> +	const char *name;
> +	int (*bind_to_tz)(struct thermal_zone_device *tz);
> +	void (*unbind_from_tz)(struct thermal_zone_device *tz);
> +	int (*throttle)(struct thermal_zone_device *tz,
> +			const struct thermal_trip *trip);
> +	void (*update_tz)(struct thermal_zone_device *tz,
> +			  enum thermal_notify_event reason);
> +	struct list_head	governor_list;
> +};
> +
> +/**
>    * struct thermal_zone_device - structure for a thermal zone
>    * @id:		unique id number for each thermal zone
>    * @type:	the thermal zone device type
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -126,31 +126,6 @@ struct thermal_cooling_device {
>   #endif
>   };
>   
> -/**
> - * struct thermal_governor - structure that holds thermal governor information
> - * @name:	name of the governor
> - * @bind_to_tz: callback called when binding to a thermal zone.  If it
> - *		returns 0, the governor is bound to the thermal zone,
> - *		otherwise it fails.
> - * @unbind_from_tz:	callback called when a governor is unbound from a
> - *			thermal zone.
> - * @throttle:	callback called for every trip point even if temperature is
> - *		below the trip point temperature
> - * @update_tz:	callback called when thermal zone internals have changed, e.g.
> - *		thermal cooling instance was added/removed
> - * @governor_list:	node in thermal_governor_list (in thermal_core.c)
> - */
> -struct thermal_governor {
> -	const char *name;
> -	int (*bind_to_tz)(struct thermal_zone_device *tz);
> -	void (*unbind_from_tz)(struct thermal_zone_device *tz);
> -	int (*throttle)(struct thermal_zone_device *tz,
> -			const struct thermal_trip *trip);
> -	void (*update_tz)(struct thermal_zone_device *tz,
> -			  enum thermal_notify_event reason);
> -	struct list_head	governor_list;
> -};
> -
>   /* Structure to define Thermal Zone parameters */
>   struct thermal_zone_params {
>   	const char *governor_name;
> 
> 
> 

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply

* pm/testing build: 7 builds: 0 failed, 7 passed, 33 warnings (v6.9-rc2-41-g3bf8b781e45e)
From: kernelci.org bot @ 2024-04-05  0:36 UTC (permalink / raw)
  To: rafael, linux-pm, kernel-build-reports, kernelci-results

pm/testing build: 7 builds: 0 failed, 7 passed, 33 warnings (v6.9-rc2-41-g3bf8b781e45e)

Full Build Summary: https://kernelci.org/build/pm/branch/testing/kernel/v6.9-rc2-41-g3bf8b781e45e/

Tree: pm
Branch: testing
Git Describe: v6.9-rc2-41-g3bf8b781e45e
Git Commit: 3bf8b781e45ef08116e826321af7e36af0565566
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
Built: 7 unique architectures

Warnings Detected:

arc:
    haps_hs_smp_defconfig (gcc-10): 4 warnings

arm:

i386:

mips:
    32r2el_defconfig (gcc-10): 3 warnings

riscv:

sparc:
    sparc64_defconfig (gcc-10): 26 warnings

x86_64:


Warnings summary:

    2    WARNING: modpost: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
    2    <stdin>:1519:2: warning: #warning syscall clone3 not implemented [-Wcpp]
    1    arch/sparc/vdso/vma.c:246:12: warning: no previous prototype for ‘init_vdso_image’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vdso32/../vclock_gettime.c:343:1: warning: no previous prototype for ‘__vdso_gettimeofday_stick’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vdso32/../vclock_gettime.c:307:1: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vdso32/../vclock_gettime.c:282:1: warning: no previous prototype for ‘__vdso_clock_gettime_stick’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vdso32/../vclock_gettime.c:254:1: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vclock_gettime.c:343:1: warning: no previous prototype for ‘__vdso_gettimeofday_stick’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vclock_gettime.c:307:1: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vclock_gettime.c:282:1: warning: no previous prototype for ‘__vdso_clock_gettime_stick’ [-Wmissing-prototypes]
    1    arch/sparc/vdso/vclock_gettime.c:254:1: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
    1    arch/sparc/prom/p1275.c:52:6: warning: no previous prototype for ‘prom_cif_init’ [-Wmissing-prototypes]
    1    arch/sparc/prom/misc_64.c:165:5: warning: no previous prototype for ‘prom_get_mmu_ihandle’ [-Wmissing-prototypes]
    1    arch/sparc/mm/init_64.c:2644:6: warning: no previous prototype for ‘vmemmap_free’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/uprobes.c:237:17: warning: no previous prototype for ‘uprobe_trap’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/traps_64.c:253:6: warning: no previous prototype for ‘is_no_fault_exception’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/traps_64.c:2153:6: warning: no previous prototype for ‘sun4v_nonresum_error_user_handled’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/traps_64.c:2035:6: warning: no previous prototype for ‘do_mcd_err’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/time_64.c:880:20: warning: no previous prototype for ‘sched_clock’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/setup_64.c:602:13: warning: no previous prototype for ‘alloc_irqstack_bootmem’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/pci_sun4v.c:259:15: warning: no previous prototype for ‘dma_4v_iotsb_bind’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/adi_64.c:299:6: warning: no previous prototype for ‘del_tag_store’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/adi_64.c:156:21: warning: no previous prototype for ‘alloc_tag_store’ [-Wmissing-prototypes]
    1    arch/sparc/kernel/adi_64.c:124:21: warning: no previous prototype for ‘find_tag_store’ [-Wmissing-prototypes]
    1    arch/mips/boot/dts/img/boston.dts:136.23-177.6: Warning (interrupt_provider): /pci@14000000/pci2_root@0,0/eg20t_bridge@1,0,0: '#interrupt-cells' found, but node is not an interrupt provider
    1    arch/mips/boot/dts/img/boston.dts:128.17-178.5: Warning (interrupt_provider): /pci@14000000/pci2_root@0,0: '#interrupt-cells' found, but node is not an interrupt provider
    1    arch/mips/boot/dts/img/boston.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider'
    1    arch/arc/kernel/ptrace.c:342:16: warning: no previous prototype for 'syscall_trace_enter' [-Wmissing-prototypes]
    1    arch/arc/kernel/kprobes.c:193:15: warning: no previous prototype for 'arc_kprobe_handler' [-Wmissing-prototypes]
    1    arch/arc/boot/dts/haps_hs_idu.dts:68.16-72.5: Warning (interrupt_provider): /fpga/pct: '#interrupt-cells' found, but node is not an interrupt provider
    1    arch/arc/boot/dts/haps_hs_idu.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider'

================================================================================

Detailed per-defconfig build reports:

--------------------------------------------------------------------------------
32r2el_defconfig (mips, gcc-10) — PASS, 0 errors, 3 warnings, 0 section mismatches

Warnings:
    arch/mips/boot/dts/img/boston.dts:128.17-178.5: Warning (interrupt_provider): /pci@14000000/pci2_root@0,0: '#interrupt-cells' found, but node is not an interrupt provider
    arch/mips/boot/dts/img/boston.dts:136.23-177.6: Warning (interrupt_provider): /pci@14000000/pci2_root@0,0/eg20t_bridge@1,0,0: '#interrupt-cells' found, but node is not an interrupt provider
    arch/mips/boot/dts/img/boston.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider'

--------------------------------------------------------------------------------
defconfig (riscv, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
haps_hs_smp_defconfig (arc, gcc-10) — PASS, 0 errors, 4 warnings, 0 section mismatches

Warnings:
    arch/arc/boot/dts/haps_hs_idu.dts:68.16-72.5: Warning (interrupt_provider): /fpga/pct: '#interrupt-cells' found, but node is not an interrupt provider
    arch/arc/boot/dts/haps_hs_idu.dtb: Warning (interrupt_map): Failed prerequisite 'interrupt_provider'
    arch/arc/kernel/ptrace.c:342:16: warning: no previous prototype for 'syscall_trace_enter' [-Wmissing-prototypes]
    arch/arc/kernel/kprobes.c:193:15: warning: no previous prototype for 'arc_kprobe_handler' [-Wmissing-prototypes]

--------------------------------------------------------------------------------
i386_defconfig (i386, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
multi_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches

--------------------------------------------------------------------------------
sparc64_defconfig (sparc, gcc-10) — PASS, 0 errors, 26 warnings, 0 section mismatches

Warnings:
    <stdin>:1519:2: warning: #warning syscall clone3 not implemented [-Wcpp]
    arch/sparc/kernel/traps_64.c:253:6: warning: no previous prototype for ‘is_no_fault_exception’ [-Wmissing-prototypes]
    arch/sparc/kernel/traps_64.c:2035:6: warning: no previous prototype for ‘do_mcd_err’ [-Wmissing-prototypes]
    arch/sparc/kernel/traps_64.c:2153:6: warning: no previous prototype for ‘sun4v_nonresum_error_user_handled’ [-Wmissing-prototypes]
    arch/sparc/kernel/setup_64.c:602:13: warning: no previous prototype for ‘alloc_irqstack_bootmem’ [-Wmissing-prototypes]
    arch/sparc/kernel/time_64.c:880:20: warning: no previous prototype for ‘sched_clock’ [-Wmissing-prototypes]
    arch/sparc/kernel/adi_64.c:124:21: warning: no previous prototype for ‘find_tag_store’ [-Wmissing-prototypes]
    arch/sparc/kernel/adi_64.c:156:21: warning: no previous prototype for ‘alloc_tag_store’ [-Wmissing-prototypes]
    arch/sparc/kernel/adi_64.c:299:6: warning: no previous prototype for ‘del_tag_store’ [-Wmissing-prototypes]
    arch/sparc/kernel/pci_sun4v.c:259:15: warning: no previous prototype for ‘dma_4v_iotsb_bind’ [-Wmissing-prototypes]
    arch/sparc/kernel/uprobes.c:237:17: warning: no previous prototype for ‘uprobe_trap’ [-Wmissing-prototypes]
    arch/sparc/mm/init_64.c:2644:6: warning: no previous prototype for ‘vmemmap_free’ [-Wmissing-prototypes]
    arch/sparc/vdso/vma.c:246:12: warning: no previous prototype for ‘init_vdso_image’ [-Wmissing-prototypes]
    arch/sparc/vdso/vclock_gettime.c:254:1: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
    arch/sparc/vdso/vclock_gettime.c:282:1: warning: no previous prototype for ‘__vdso_clock_gettime_stick’ [-Wmissing-prototypes]
    arch/sparc/vdso/vclock_gettime.c:307:1: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
    arch/sparc/vdso/vclock_gettime.c:343:1: warning: no previous prototype for ‘__vdso_gettimeofday_stick’ [-Wmissing-prototypes]
    arch/sparc/vdso/vdso32/../vclock_gettime.c:254:1: warning: no previous prototype for ‘__vdso_clock_gettime’ [-Wmissing-prototypes]
    arch/sparc/vdso/vdso32/../vclock_gettime.c:282:1: warning: no previous prototype for ‘__vdso_clock_gettime_stick’ [-Wmissing-prototypes]
    arch/sparc/vdso/vdso32/../vclock_gettime.c:307:1: warning: no previous prototype for ‘__vdso_gettimeofday’ [-Wmissing-prototypes]
    arch/sparc/vdso/vdso32/../vclock_gettime.c:343:1: warning: no previous prototype for ‘__vdso_gettimeofday_stick’ [-Wmissing-prototypes]
    arch/sparc/prom/misc_64.c:165:5: warning: no previous prototype for ‘prom_get_mmu_ihandle’ [-Wmissing-prototypes]
    arch/sparc/prom/p1275.c:52:6: warning: no previous prototype for ‘prom_cif_init’ [-Wmissing-prototypes]
    WARNING: modpost: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.
    <stdin>:1519:2: warning: #warning syscall clone3 not implemented [-Wcpp]
    WARNING: modpost: EXPORT symbol "_mcount" [vmlinux] version generation failed, symbol will not be versioned.

--------------------------------------------------------------------------------
x86_64_defconfig (x86_64, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches

---
For more info write to <info@kernelci.org>

^ permalink raw reply

* Re: [PATCH v3 4/6] thermal: core: Send trip crossing notifications at init time if needed
From: Lukasz Luba @ 2024-04-04 22:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linux PM, Srinivas Pandruvada, Daniel Lezcano,
	AngeloGioacchino Del Regno
In-Reply-To: <1963742.PYKUYFuaPT@kreacher>



On 4/2/24 20:02, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> If a trip point is already exceeded by the zone temperature at the
> initialization time, no trip crossing notification is send regarding
> this even though mitigation should be started then.
> 
> Address this by rearranging the code in handle_thermal_trip() to
> send a trip crossing notification for trip points already exceeded
> by the zone temperature initially which also allows to reduce its
> size by using the observation that the initialization and regular
> trip crossing on the way up become the same case then.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3: New patch
> 
> ---
>   drivers/thermal/thermal_core.c |   37 ++++++++++++++++---------------------
>   1 file changed, 16 insertions(+), 21 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -364,6 +364,7 @@ static void handle_thermal_trip(struct t
>   				struct thermal_trip_desc *td)
>   {
>   	const struct thermal_trip *trip = &td->trip;
> +	int old_threshold;
>   
>   	if (trip->temperature == THERMAL_TEMP_INVALID)
>   		return;
> @@ -375,25 +376,11 @@ static void handle_thermal_trip(struct t
>   	 * is what needs to be compared with the previous zone temperature
>   	 * to decide which action to take.
>   	 */
> -	if (tz->last_temperature == THERMAL_TEMP_INVALID) {
> -		/* Initialization. */
> -		td->threshold = trip->temperature;
> -		if (tz->temperature >= td->threshold)
> -			td->threshold -= trip->hysteresis;
> -	} else if (tz->last_temperature < td->threshold) {
> -		/*
> -		 * There is no mitigation under way, so it needs to be started
> -		 * if the zone temperature exceeds the trip one.  The new
> -		 * threshold is then set to the low temperature of the trip.
> -		 */
> -		if (tz->temperature >= trip->temperature) {
> -			thermal_notify_tz_trip_up(tz, trip);
> -			thermal_debug_tz_trip_up(tz, trip);
> -			td->threshold = trip->temperature - trip->hysteresis;
> -		} else {
> -			td->threshold = trip->temperature;
> -		}
> -	} else {
> +	old_threshold = td->threshold;
> +	td->threshold = trip->temperature;
> +
> +	if (tz->last_temperature >= old_threshold &&
> +	    tz->last_temperature != THERMAL_TEMP_INVALID) {
>   		/*
>   		 * Mitigation is under way, so it needs to stop if the zone
>   		 * temperature falls below the low temperature of the trip.
> @@ -402,10 +389,18 @@ static void handle_thermal_trip(struct t
>   		if (tz->temperature < trip->temperature - trip->hysteresis) {
>   			thermal_notify_tz_trip_down(tz, trip);
>   			thermal_debug_tz_trip_down(tz, trip);
> -			td->threshold = trip->temperature;
>   		} else {
> -			td->threshold = trip->temperature - trip->hysteresis;
> +			td->threshold -= trip->hysteresis;
>   		}
> +	} else if (tz->temperature >= trip->temperature) {
> +		/*
> +		 * There is no mitigation under way, so it needs to be started
> +		 * if the zone temperature exceeds the trip one.  The new
> +		 * threshold is then set to the low temperature of the trip.
> +		 */
> +		thermal_notify_tz_trip_up(tz, trip);
> +		thermal_debug_tz_trip_up(tz, trip);
> +		td->threshold -= trip->hysteresis;
>   	}
>   
>   	if (trip->type == THERMAL_TRIP_CRITICAL || trip->type == THERMAL_TRIP_HOT)
> 
> 
> 


Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply

* Re: [PATCH v3 5/6] thermal: core: Sort trip point crossing notifications by temperature
From: Lukasz Luba @ 2024-04-04 22:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linux PM, Srinivas Pandruvada, Daniel Lezcano,
	AngeloGioacchino Del Regno
In-Reply-To: <7648070.EvYhyI6sBW@kreacher>



On 4/2/24 20:03, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> If multiple trip points are crossed in one go and the trips table in
> the thermal zone device object is not sorted, the corresponding trip
> point crossing notifications sent to user space will not be ordered
> either.
> 
> Moreover, if the trips table is sorted by trip temperature in ascending
> order, the trip crossing notifications on the way up will be sent in that
> order too, but the trip crossing notifications on the way down will be
> sent in the reverse order.
> 
> This is generally confusing and it is better to make the kernel send the
> notifications in the order of growing (on the way up) or falling (on the
> way down) trip temperature.
> 
> To achieve that, instead of sending a trip crossing notification and
> recording a trip crossing event in the statistics right away from
> handle_thermal_trip(), put the trip in question on a list that will be
> sorted by __thermal_zone_device_update() after processing all of the trips
> and before sending the notifications and recording trip crossing events.
> 
> Link: https://lore.kernel.org/linux-pm/20240306085428.88011-1-daniel.lezcano@linaro.org/
> Reported-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3: Rebase, update changelog, add notify_temp to struct thermal_trip_desc
> 
> v1 -> v2: New patch
> 
> ---
>   drivers/thermal/thermal_core.c |   41 +++++++++++++++++++++++++++++++++++------
>   drivers/thermal/thermal_core.h |    2 ++
>   2 files changed, 37 insertions(+), 6 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -15,6 +15,7 @@
>   #include <linux/slab.h>
>   #include <linux/kdev_t.h>
>   #include <linux/idr.h>
> +#include <linux/list_sort.h>
>   #include <linux/thermal.h>
>   #include <linux/reboot.h>
>   #include <linux/string.h>
> @@ -361,7 +362,9 @@ static void handle_critical_trips(struct
>   }
>   
>   static void handle_thermal_trip(struct thermal_zone_device *tz,
> -				struct thermal_trip_desc *td)
> +				struct thermal_trip_desc *td,
> +				struct list_head *way_up_list,
> +				struct list_head *way_down_list)
>   {
>   	const struct thermal_trip *trip = &td->trip;
>   	int old_threshold;
> @@ -387,8 +390,8 @@ static void handle_thermal_trip(struct t
>   		 * In that case, the trip temperature becomes the new threshold.
>   		 */
>   		if (tz->temperature < trip->temperature - trip->hysteresis) {
> -			thermal_notify_tz_trip_down(tz, trip);
> -			thermal_debug_tz_trip_down(tz, trip);
> +			list_add(&td->notify_list_node, way_down_list);
> +			td->notify_temp = trip->temperature - trip->hysteresis;
>   		} else {
>   			td->threshold -= trip->hysteresis;
>   		}
> @@ -398,8 +401,8 @@ static void handle_thermal_trip(struct t
>   		 * if the zone temperature exceeds the trip one.  The new
>   		 * threshold is then set to the low temperature of the trip.
>   		 */
> -		thermal_notify_tz_trip_up(tz, trip);
> -		thermal_debug_tz_trip_up(tz, trip);
> +		list_add_tail(&td->notify_list_node, way_up_list);
> +		td->notify_temp = trip->temperature;
>   		td->threshold -= trip->hysteresis;
>   	}
>   
> @@ -452,10 +455,24 @@ static void thermal_zone_device_init(str
>   		pos->initialized = false;
>   }
>   
> +static int thermal_trip_notify_cmp(void *ascending, const struct list_head *a,
> +				   const struct list_head *b)
> +{
> +	struct thermal_trip_desc *tda = container_of(a, struct thermal_trip_desc,
> +						     notify_list_node);
> +	struct thermal_trip_desc *tdb = container_of(b, struct thermal_trip_desc,
> +						     notify_list_node);
> +	int ret = tdb->notify_temp - tda->notify_temp;
> +
> +	return ascending ? ret : -ret;
> +}
> +
>   void __thermal_zone_device_update(struct thermal_zone_device *tz,
>   				  enum thermal_notify_event event)
>   {
>   	struct thermal_trip_desc *td;
> +	LIST_HEAD(way_down_list);
> +	LIST_HEAD(way_up_list);
>   
>   	if (tz->suspended)
>   		return;
> @@ -470,7 +487,19 @@ void __thermal_zone_device_update(struct
>   	tz->notify_event = event;
>   
>   	for_each_trip_desc(tz, td)
> -		handle_thermal_trip(tz, td);
> +		handle_thermal_trip(tz, td, &way_up_list, &way_down_list);
> +
> +	list_sort(&way_up_list, &way_up_list, thermal_trip_notify_cmp);
> +	list_for_each_entry(td, &way_up_list, notify_list_node) {
> +		thermal_notify_tz_trip_up(tz, &td->trip);
> +		thermal_debug_tz_trip_up(tz, &td->trip);
> +	}
> +
> +	list_sort(NULL, &way_down_list, thermal_trip_notify_cmp);
> +	list_for_each_entry(td, &way_down_list, notify_list_node) {
> +		thermal_notify_tz_trip_down(tz, &td->trip);
> +		thermal_debug_tz_trip_down(tz, &td->trip);
> +	}
>   
>   	monitor_thermal_zone(tz);
>   }
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -17,6 +17,8 @@
>   
>   struct thermal_trip_desc {
>   	struct thermal_trip trip;
> +	struct list_head notify_list_node;
> +	int notify_temp;
>   	int threshold;
>   };
>   
> 
> 
> 


Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply

* Re: [PATCH v3 3/6] thermal: core: Rewrite comments in handle_thermal_trip()
From: Lukasz Luba @ 2024-04-04 22:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linux PM, Srinivas Pandruvada, Daniel Lezcano,
	AngeloGioacchino Del Regno
In-Reply-To: <3284691.aeNJFYEL58@kreacher>



On 4/2/24 19:59, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Make the comments regarding trip crossing and threshold updates in
> handle_thermal_trip() slightly more clear.
> 
> No functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3: New patch
> 
> ---
>   drivers/thermal/thermal_core.c |   26 +++++++++++++-------------
>   1 file changed, 13 insertions(+), 13 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -368,6 +368,13 @@ static void handle_thermal_trip(struct t
>   	if (trip->temperature == THERMAL_TEMP_INVALID)
>   		return;
>   
> +	/*
> +	 * If the trip temperature or hysteresis has been updated recently,
> +	 * the threshold needs to be computed again using the new values.
> +	 * However, its initial value still reflects the old ones and that
> +	 * is what needs to be compared with the previous zone temperature
> +	 * to decide which action to take.
> +	 */
>   	if (tz->last_temperature == THERMAL_TEMP_INVALID) {
>   		/* Initialization. */
>   		td->threshold = trip->temperature;
> @@ -375,11 +382,9 @@ static void handle_thermal_trip(struct t
>   			td->threshold -= trip->hysteresis;
>   	} else if (tz->last_temperature < td->threshold) {
>   		/*
> -		 * The trip threshold is equal to the trip temperature, unless
> -		 * the latter has changed in the meantime.  In either case,
> -		 * the trip is crossed if the current zone temperature is at
> -		 * least equal to its temperature, but otherwise ensure that
> -		 * the threshold and the trip temperature will be equal.
> +		 * There is no mitigation under way, so it needs to be started
> +		 * if the zone temperature exceeds the trip one.  The new
> +		 * threshold is then set to the low temperature of the trip.
>   		 */
>   		if (tz->temperature >= trip->temperature) {
>   			thermal_notify_tz_trip_up(tz, trip);
> @@ -390,14 +395,9 @@ static void handle_thermal_trip(struct t
>   		}
>   	} else {
>   		/*
> -		 * The previous zone temperature was above or equal to the trip
> -		 * threshold, which would be equal to the "low temperature" of
> -		 * the trip (its temperature minus its hysteresis), unless the
> -		 * trip temperature or hysteresis had changed.  In either case,
> -		 * the trip is crossed if the current zone temperature is below
> -		 * the low temperature of the trip, but otherwise ensure that
> -		 * the trip threshold will be equal to the low temperature of
> -		 * the trip.
> +		 * Mitigation is under way, so it needs to stop if the zone
> +		 * temperature falls below the low temperature of the trip.
> +		 * In that case, the trip temperature becomes the new threshold.
>   		 */
>   		if (tz->temperature < trip->temperature - trip->hysteresis) {
>   			thermal_notify_tz_trip_down(tz, trip);
> 
> 
> 

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply

* Re: [PATCH v3 2/6] thermal: core: Make struct thermal_zone_device definition internal
From: Lukasz Luba @ 2024-04-04 22:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linux PM, Srinivas Pandruvada, Daniel Lezcano,
	AngeloGioacchino Del Regno
In-Reply-To: <13485814.uLZWGnKmhe@kreacher>



On 4/2/24 19:57, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> Move the definitions of struct thermal_trip_desc and struct
> thermal_zone_device to an internal header file in the thermal core,
> as they don't need to be accessible to any code other than the thermal
> core and so they don't need to be present in a global header.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3: Minor changelog update
> 
> v1 -> v2: No changes
> 
> ---
>   drivers/thermal/thermal_core.h  |   85 +++++++++++++++++++++++++++++++++++++++
>   drivers/thermal/thermal_trace.h |    2
>   include/linux/thermal.h         |   87 ----------------------------------------
>   3 files changed, 89 insertions(+), 85 deletions(-)
> 

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply

* Re: [PATCH v3 1/6] thermal: core: Move threshold out of struct thermal_trip
From: Lukasz Luba @ 2024-04-04 22:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Linux PM, Srinivas Pandruvada, Daniel Lezcano,
	AngeloGioacchino Del Regno
In-Reply-To: <1884181.tdWV9SEqCh@kreacher>



On 4/2/24 19:56, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The threshold field in struct thermal_trip is only used internally by
> the thermal core and it is better to prevent drivers from misusing it.
> It also takes some space unnecessarily in the trip tables passed by
> drivers to the core during thermal zone registration.
> 
> For this reason, introduce struct thermal_trip_desc as a wrapper around
> struct thermal_trip, move the threshold field directly into it and make
> the thermal core store struct thermal_trip_desc objects in the internal
> thermal zone trip tables.  Adjust all of the code using trip tables in
> the thermal core accordingly.
> 
> No intentional functional impact.
> 
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v2 -> v3: Rebase on top of 6.9-rc2, minor changelog update.
> 
> v1 -> v2: No changes.
> 
> ---
>   drivers/thermal/gov_fair_share.c      |    7 +++--
>   drivers/thermal/gov_power_allocator.c |    6 ++--
>   drivers/thermal/thermal_core.c        |   46 +++++++++++++++++++---------------
>   drivers/thermal/thermal_core.h        |    7 +++--
>   drivers/thermal/thermal_debugfs.c     |    6 ++--
>   drivers/thermal/thermal_helpers.c     |    8 +++--
>   drivers/thermal/thermal_netlink.c     |    6 ++--
>   drivers/thermal/thermal_sysfs.c       |   20 +++++++-------
>   drivers/thermal/thermal_trip.c        |   15 +++++------
>   include/linux/thermal.h               |    9 ++++--
>   10 files changed, 78 insertions(+), 52 deletions(-)
> 
> Index: linux-pm/include/linux/thermal.h
> ===================================================================
> --- linux-pm.orig/include/linux/thermal.h
> +++ linux-pm/include/linux/thermal.h
> @@ -61,7 +61,6 @@ enum thermal_notify_event {
>    * struct thermal_trip - representation of a point in temperature domain
>    * @temperature: temperature value in miliCelsius
>    * @hysteresis: relative hysteresis in miliCelsius
> - * @threshold: trip crossing notification threshold miliCelsius
>    * @type: trip point type
>    * @priv: pointer to driver data associated with this trip
>    * @flags: flags representing binary properties of the trip
> @@ -69,12 +68,16 @@ enum thermal_notify_event {
>   struct thermal_trip {
>   	int temperature;
>   	int hysteresis;
> -	int threshold;
>   	enum thermal_trip_type type;
>   	u8 flags;
>   	void *priv;
>   };
>   
> +struct thermal_trip_desc {
> +	struct thermal_trip trip;
> +	int threshold;
> +};
> +
>   #define THERMAL_TRIP_FLAG_RW_TEMP	BIT(0)
>   #define THERMAL_TRIP_FLAG_RW_HYST	BIT(1)
>   
> @@ -203,7 +206,7 @@ struct thermal_zone_device {
>   #ifdef CONFIG_THERMAL_DEBUGFS
>   	struct thermal_debugfs *debugfs;
>   #endif
> -	struct thermal_trip trips[] __counted_by(num_trips);
> +	struct thermal_trip_desc trips[] __counted_by(num_trips);
>   };
>   
>   /**
> Index: linux-pm/drivers/thermal/thermal_core.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.c
> +++ linux-pm/drivers/thermal/thermal_core.c
> @@ -361,17 +361,19 @@ static void handle_critical_trips(struct
>   }
>   
>   static void handle_thermal_trip(struct thermal_zone_device *tz,
> -				struct thermal_trip *trip)
> +				struct thermal_trip_desc *td)
>   {
> +	const struct thermal_trip *trip = &td->trip;
> +
>   	if (trip->temperature == THERMAL_TEMP_INVALID)
>   		return;
>   
>   	if (tz->last_temperature == THERMAL_TEMP_INVALID) {
>   		/* Initialization. */
> -		trip->threshold = trip->temperature;
> -		if (tz->temperature >= trip->threshold)
> -			trip->threshold -= trip->hysteresis;
> -	} else if (tz->last_temperature < trip->threshold) {
> +		td->threshold = trip->temperature;
> +		if (tz->temperature >= td->threshold)
> +			td->threshold -= trip->hysteresis;
> +	} else if (tz->last_temperature < td->threshold) {
>   		/*
>   		 * The trip threshold is equal to the trip temperature, unless
>   		 * the latter has changed in the meantime.  In either case,
> @@ -382,9 +384,9 @@ static void handle_thermal_trip(struct t
>   		if (tz->temperature >= trip->temperature) {
>   			thermal_notify_tz_trip_up(tz, trip);
>   			thermal_debug_tz_trip_up(tz, trip);
> -			trip->threshold = trip->temperature - trip->hysteresis;
> +			td->threshold = trip->temperature - trip->hysteresis;
>   		} else {
> -			trip->threshold = trip->temperature;
> +			td->threshold = trip->temperature;
>   		}
>   	} else {
>   		/*
> @@ -400,9 +402,9 @@ static void handle_thermal_trip(struct t
>   		if (tz->temperature < trip->temperature - trip->hysteresis) {
>   			thermal_notify_tz_trip_down(tz, trip);
>   			thermal_debug_tz_trip_down(tz, trip);
> -			trip->threshold = trip->temperature;
> +			td->threshold = trip->temperature;
>   		} else {
> -			trip->threshold = trip->temperature - trip->hysteresis;
> +			td->threshold = trip->temperature - trip->hysteresis;
>   		}
>   	}
>   
> @@ -458,7 +460,7 @@ static void thermal_zone_device_init(str
>   void __thermal_zone_device_update(struct thermal_zone_device *tz,
>   				  enum thermal_notify_event event)
>   {
> -	struct thermal_trip *trip;
> +	struct thermal_trip_desc *td;
>   
>   	if (tz->suspended)
>   		return;
> @@ -472,8 +474,8 @@ void __thermal_zone_device_update(struct
>   
>   	tz->notify_event = event;
>   
> -	for_each_trip(tz, trip)
> -		handle_thermal_trip(tz, trip);
> +	for_each_trip_desc(tz, td)
> +		handle_thermal_trip(tz, td);
>   
>   	monitor_thermal_zone(tz);
>   }
> @@ -766,7 +768,7 @@ int thermal_zone_bind_cooling_device(str
>   	if (trip_index < 0 || trip_index >= tz->num_trips)
>   		return -EINVAL;
>   
> -	return thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index], cdev,
> +	return thermal_bind_cdev_to_trip(tz, &tz->trips[trip_index].trip, cdev,
>   					 upper, lower, weight);
>   }
>   EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device);
> @@ -825,7 +827,7 @@ int thermal_zone_unbind_cooling_device(s
>   	if (trip_index < 0 || trip_index >= tz->num_trips)
>   		return -EINVAL;
>   
> -	return thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index], cdev);
> +	return thermal_unbind_cdev_from_trip(tz, &tz->trips[trip_index].trip, cdev);
>   }
>   EXPORT_SYMBOL_GPL(thermal_zone_unbind_cooling_device);
>   
> @@ -1221,16 +1223,19 @@ static void thermal_set_delay_jiffies(un
>   
>   int thermal_zone_get_crit_temp(struct thermal_zone_device *tz, int *temp)
>   {
> -	int i, ret = -EINVAL;
> +	const struct thermal_trip_desc *td;
> +	int ret = -EINVAL;
>   
>   	if (tz->ops.get_crit_temp)
>   		return tz->ops.get_crit_temp(tz, temp);
>   
>   	mutex_lock(&tz->lock);
>   
> -	for (i = 0; i < tz->num_trips; i++) {
> -		if (tz->trips[i].type == THERMAL_TRIP_CRITICAL) {
> -			*temp = tz->trips[i].temperature;
> +	for_each_trip_desc(tz, td) {
> +		const struct thermal_trip *trip = &td->trip;
> +
> +		if (trip->type == THERMAL_TRIP_CRITICAL) {
> +			*temp = trip->temperature;
>   			ret = 0;
>   			break;
>   		}
> @@ -1274,7 +1279,9 @@ thermal_zone_device_register_with_trips(
>   					const struct thermal_zone_params *tzp,
>   					int passive_delay, int polling_delay)
>   {
> +	const struct thermal_trip *trip = trips;
>   	struct thermal_zone_device *tz;
> +	struct thermal_trip_desc *td;
>   	int id;
>   	int result;
>   	struct thermal_governor *governor;
> @@ -1339,7 +1346,8 @@ thermal_zone_device_register_with_trips(
>   	tz->device.class = thermal_class;
>   	tz->devdata = devdata;
>   	tz->num_trips = num_trips;
> -	memcpy(tz->trips, trips, num_trips * sizeof(*trips));
> +	for_each_trip_desc(tz, td)
> +		td->trip = *trip++;
>   
>   	thermal_set_delay_jiffies(&tz->passive_delay_jiffies, passive_delay);
>   	thermal_set_delay_jiffies(&tz->polling_delay_jiffies, polling_delay);
> Index: linux-pm/drivers/thermal/thermal_core.h
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_core.h
> +++ linux-pm/drivers/thermal/thermal_core.h
> @@ -120,8 +120,11 @@ void thermal_governor_update_tz(struct t
>   				enum thermal_notify_event reason);
>   
>   /* Helpers */
> -#define for_each_trip(__tz, __trip)	\
> -	for (__trip = __tz->trips; __trip - __tz->trips < __tz->num_trips; __trip++)
> +#define for_each_trip_desc(__tz, __td)	\
> +	for (__td = __tz->trips; __td - __tz->trips < __tz->num_trips; __td++)
> +
> +#define trip_to_trip_desc(__trip)	\
> +	container_of(__trip, struct thermal_trip_desc, trip)
>   
>   void __thermal_zone_set_trips(struct thermal_zone_device *tz);
>   int thermal_zone_trip_id(const struct thermal_zone_device *tz,
> Index: linux-pm/drivers/thermal/thermal_sysfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_sysfs.c
> +++ linux-pm/drivers/thermal/thermal_sysfs.c
> @@ -88,7 +88,7 @@ trip_point_type_show(struct device *dev,
>   	if (sscanf(attr->attr.name, "trip_point_%d_type", &trip_id) != 1)
>   		return -EINVAL;
>   
> -	switch (tz->trips[trip_id].type) {
> +	switch (tz->trips[trip_id].trip.type) {

This could be a bit shorter, with some helper variable, but that's
minor cosmetic.

>   	case THERMAL_TRIP_CRITICAL:
>   		return sprintf(buf, "critical\n");
>   	case THERMAL_TRIP_HOT:
> @@ -120,7 +120,7 @@ trip_point_temp_store(struct device *dev
>   
>   	mutex_lock(&tz->lock);
>   
> -	trip = &tz->trips[trip_id];
> +	trip = &tz->trips[trip_id].trip;
>   
>   	if (temp != trip->temperature) {
>   		if (tz->ops.set_trip_temp) {
> @@ -150,7 +150,7 @@ trip_point_temp_show(struct device *dev,
>   	if (sscanf(attr->attr.name, "trip_point_%d_temp", &trip_id) != 1)
>   		return -EINVAL;
>   
> -	return sprintf(buf, "%d\n", tz->trips[trip_id].temperature);
> +	return sprintf(buf, "%d\n", tz->trips[trip_id].trip.temperature);
>   }
>   
>   static ssize_t
> @@ -171,7 +171,7 @@ trip_point_hyst_store(struct device *dev
>   
>   	mutex_lock(&tz->lock);
>   
> -	trip = &tz->trips[trip_id];
> +	trip = &tz->trips[trip_id].trip;
>   
>   	if (hyst != trip->hysteresis) {
>   		trip->hysteresis = hyst;
> @@ -194,7 +194,7 @@ trip_point_hyst_show(struct device *dev,
>   	if (sscanf(attr->attr.name, "trip_point_%d_hyst", &trip_id) != 1)
>   		return -EINVAL;
>   
> -	return sprintf(buf, "%d\n", tz->trips[trip_id].hysteresis);
> +	return sprintf(buf, "%d\n", tz->trips[trip_id].trip.hysteresis);
>   }
>   
>   static ssize_t
> @@ -393,7 +393,7 @@ static const struct attribute_group *the
>    */
>   static int create_trip_attrs(struct thermal_zone_device *tz)
>   {
> -	const struct thermal_trip *trip;
> +	const struct thermal_trip_desc *td;
>   	struct attribute **attrs;
>   
>   	/* This function works only for zones with at least one trip */
> @@ -429,8 +429,8 @@ static int create_trip_attrs(struct ther
>   		return -ENOMEM;
>   	}
>   
> -	for_each_trip(tz, trip) {
> -		int indx = thermal_zone_trip_id(tz, trip);
> +	for_each_trip_desc(tz, td) {
> +		int indx = thermal_zone_trip_id(tz, &td->trip);
>   
>   		/* create trip type attribute */
>   		snprintf(tz->trip_type_attrs[indx].name, THERMAL_NAME_LENGTH,
> @@ -452,7 +452,7 @@ static int create_trip_attrs(struct ther
>   						tz->trip_temp_attrs[indx].name;
>   		tz->trip_temp_attrs[indx].attr.attr.mode = S_IRUGO;
>   		tz->trip_temp_attrs[indx].attr.show = trip_point_temp_show;
> -		if (trip->flags & THERMAL_TRIP_FLAG_RW_TEMP) {
> +		if (td->trip.flags & THERMAL_TRIP_FLAG_RW_TEMP) {
>   			tz->trip_temp_attrs[indx].attr.attr.mode |= S_IWUSR;
>   			tz->trip_temp_attrs[indx].attr.store =
>   							trip_point_temp_store;
> @@ -467,7 +467,7 @@ static int create_trip_attrs(struct ther
>   					tz->trip_hyst_attrs[indx].name;
>   		tz->trip_hyst_attrs[indx].attr.attr.mode = S_IRUGO;
>   		tz->trip_hyst_attrs[indx].attr.show = trip_point_hyst_show;
> -		if (trip->flags & THERMAL_TRIP_FLAG_RW_HYST) {
> +		if (td->trip.flags & THERMAL_TRIP_FLAG_RW_HYST) {
>   			tz->trip_hyst_attrs[indx].attr.attr.mode |= S_IWUSR;
>   			tz->trip_hyst_attrs[indx].attr.store =
>   					trip_point_hyst_store;
> Index: linux-pm/drivers/thermal/thermal_debugfs.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_debugfs.c
> +++ linux-pm/drivers/thermal/thermal_debugfs.c
> @@ -744,7 +744,7 @@ static void tze_seq_stop(struct seq_file
>   static int tze_seq_show(struct seq_file *s, void *v)
>   {
>   	struct thermal_zone_device *tz = s->private;
> -	struct thermal_trip *trip;
> +	struct thermal_trip_desc *td;
>   	struct tz_episode *tze;
>   	const char *type;
>   	int trip_id;
> @@ -757,7 +757,9 @@ static int tze_seq_show(struct seq_file
>   
>   	seq_printf(s, "| trip |     type | temp(°mC) | hyst(°mC) |  duration  |  avg(°mC) |  min(°mC) |  max(°mC) |\n");
>   
> -	for_each_trip(tz, trip) {
> +	for_each_trip_desc(tz, td) {
> +		const struct thermal_trip *trip = &td->trip;
> +
>   		/*
>   		 * There is no possible mitigation happening at the
>   		 * critical trip point, so the stats will be always
> Index: linux-pm/drivers/thermal/thermal_netlink.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_netlink.c
> +++ linux-pm/drivers/thermal/thermal_netlink.c
> @@ -442,7 +442,7 @@ out_cancel_nest:
>   static int thermal_genl_cmd_tz_get_trip(struct param *p)
>   {
>   	struct sk_buff *msg = p->msg;
> -	const struct thermal_trip *trip;
> +	const struct thermal_trip_desc *td;
>   	struct thermal_zone_device *tz;
>   	struct nlattr *start_trip;
>   	int id;
> @@ -462,7 +462,9 @@ static int thermal_genl_cmd_tz_get_trip(
>   
>   	mutex_lock(&tz->lock);
>   
> -	for_each_trip(tz, trip) {
> +	for_each_trip_desc(tz, td) {
> +		const struct thermal_trip *trip = &td->trip;
> +
>   		if (nla_put_u32(msg, THERMAL_GENL_ATTR_TZ_TRIP_ID,
>   				thermal_zone_trip_id(tz, trip)) ||
>   		    nla_put_u32(msg, THERMAL_GENL_ATTR_TZ_TRIP_TYPE, trip->type) ||
> Index: linux-pm/drivers/thermal/thermal_trip.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_trip.c
> +++ linux-pm/drivers/thermal/thermal_trip.c
> @@ -13,11 +13,11 @@ int for_each_thermal_trip(struct thermal
>   			  int (*cb)(struct thermal_trip *, void *),
>   			  void *data)
>   {
> -	struct thermal_trip *trip;
> +	struct thermal_trip_desc *td;
>   	int ret;
>   
> -	for_each_trip(tz, trip) {
> -		ret = cb(trip, data);
> +	for_each_trip_desc(tz, td) {
> +		ret = cb(&td->trip, data);
>   		if (ret)
>   			return ret;
>   	}
> @@ -63,7 +63,7 @@ EXPORT_SYMBOL_GPL(thermal_zone_get_num_t
>    */
>   void __thermal_zone_set_trips(struct thermal_zone_device *tz)
>   {
> -	const struct thermal_trip *trip;
> +	const struct thermal_trip_desc *td;
>   	int low = -INT_MAX, high = INT_MAX;
>   	int ret;
>   
> @@ -72,7 +72,8 @@ void __thermal_zone_set_trips(struct the
>   	if (!tz->ops.set_trips)
>   		return;
>   
> -	for_each_trip(tz, trip) {
> +	for_each_trip_desc(tz, td) {
> +		const struct thermal_trip *trip = &td->trip;
>   		int trip_low;
>   
>   		trip_low = trip->temperature - trip->hysteresis;
> @@ -110,7 +111,7 @@ int __thermal_zone_get_trip(struct therm
>   	if (!tz || trip_id < 0 || trip_id >= tz->num_trips || !trip)
>   		return -EINVAL;
>   
> -	*trip = tz->trips[trip_id];
> +	*trip = tz->trips[trip_id].trip;
>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(__thermal_zone_get_trip);
> @@ -135,7 +136,7 @@ int thermal_zone_trip_id(const struct th
>   	 * Assume the trip to be located within the bounds of the thermal
>   	 * zone's trips[] table.
>   	 */
> -	return trip - tz->trips;
> +	return trip_to_trip_desc(trip) - tz->trips;
>   }
>   void thermal_zone_trip_updated(struct thermal_zone_device *tz,
>   			       const struct thermal_trip *trip)
> Index: linux-pm/drivers/thermal/gov_fair_share.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/gov_fair_share.c
> +++ linux-pm/drivers/thermal/gov_fair_share.c
> @@ -17,10 +17,13 @@
>   
>   static int get_trip_level(struct thermal_zone_device *tz)
>   {
> -	const struct thermal_trip *trip, *level_trip = NULL;
> +	const struct thermal_trip *level_trip = NULL;
> +	const struct thermal_trip_desc *td;
>   	int trip_level = -1;
>   
> -	for_each_trip(tz, trip) {
> +	for_each_trip_desc(tz, td) {
> +		const struct thermal_trip *trip = &td->trip;
> +
>   		if (trip->temperature >= tz->temperature)
>   			continue;
>   
> Index: linux-pm/drivers/thermal/gov_power_allocator.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/gov_power_allocator.c
> +++ linux-pm/drivers/thermal/gov_power_allocator.c
> @@ -496,9 +496,11 @@ static void get_governor_trips(struct th
>   	const struct thermal_trip *first_passive = NULL;
>   	const struct thermal_trip *last_passive = NULL;
>   	const struct thermal_trip *last_active = NULL;
> -	const struct thermal_trip *trip;
> +	const struct thermal_trip_desc *td;
> +
> +	for_each_trip_desc(tz, td) {
> +		const struct thermal_trip *trip = &td->trip;
>   
> -	for_each_trip(tz, trip) {
>   		switch (trip->type) {
>   		case THERMAL_TRIP_PASSIVE:
>   			if (!first_passive) {
> Index: linux-pm/drivers/thermal/thermal_helpers.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_helpers.c
> +++ linux-pm/drivers/thermal/thermal_helpers.c
> @@ -50,7 +50,7 @@ get_thermal_instance(struct thermal_zone
>   	mutex_lock(&tz->lock);
>   	mutex_lock(&cdev->lock);
>   
> -	trip = &tz->trips[trip_index];
> +	trip = &tz->trips[trip_index].trip;
>   
>   	list_for_each_entry(pos, &tz->thermal_instances, tz_node) {
>   		if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) {
> @@ -82,7 +82,7 @@ EXPORT_SYMBOL(get_thermal_instance);
>    */
>   int __thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp)
>   {
> -	const struct thermal_trip *trip;
> +	const struct thermal_trip_desc *td;
>   	int crit_temp = INT_MAX;
>   	int ret = -EINVAL;
>   
> @@ -91,7 +91,9 @@ int __thermal_zone_get_temp(struct therm
>   	ret = tz->ops.get_temp(tz, temp);
>   
>   	if (IS_ENABLED(CONFIG_THERMAL_EMULATION) && tz->emul_temperature) {
> -		for_each_trip(tz, trip) {
> +		for_each_trip_desc(tz, td) {
> +			const struct thermal_trip *trip = &td->trip;
> +
>   			if (trip->type == THERMAL_TRIP_CRITICAL) {
>   				crit_temp = trip->temperature;
>   				break;
> 
> 
> 

LGTM,

Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

^ permalink raw reply

* [PATCH v1] thermal: core: Relocate the struct thermal_governor definition
From: Rafael J. Wysocki @ 2024-04-04 19:27 UTC (permalink / raw)
  To: Linux PM; +Cc: LKML, Lukasz Luba, Daniel Lezcano

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Notice that struct thermal_governor is only used by the thermal core
and so move its definition to thermal_core.h.

No functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_core.h |   25 +++++++++++++++++++++++++
 include/linux/thermal.h        |   25 -------------------------
 2 files changed, 25 insertions(+), 25 deletions(-)

Index: linux-pm/drivers/thermal/thermal_core.h
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.h
+++ linux-pm/drivers/thermal/thermal_core.h
@@ -23,6 +23,31 @@ struct thermal_trip_desc {
 };
 
 /**
+ * struct thermal_governor - structure that holds thermal governor information
+ * @name:	name of the governor
+ * @bind_to_tz: callback called when binding to a thermal zone.  If it
+ *		returns 0, the governor is bound to the thermal zone,
+ *		otherwise it fails.
+ * @unbind_from_tz:	callback called when a governor is unbound from a
+ *			thermal zone.
+ * @throttle:	callback called for every trip point even if temperature is
+ *		below the trip point temperature
+ * @update_tz:	callback called when thermal zone internals have changed, e.g.
+ *		thermal cooling instance was added/removed
+ * @governor_list:	node in thermal_governor_list (in thermal_core.c)
+ */
+struct thermal_governor {
+	const char *name;
+	int (*bind_to_tz)(struct thermal_zone_device *tz);
+	void (*unbind_from_tz)(struct thermal_zone_device *tz);
+	int (*throttle)(struct thermal_zone_device *tz,
+			const struct thermal_trip *trip);
+	void (*update_tz)(struct thermal_zone_device *tz,
+			  enum thermal_notify_event reason);
+	struct list_head	governor_list;
+};
+
+/**
  * struct thermal_zone_device - structure for a thermal zone
  * @id:		unique id number for each thermal zone
  * @type:	the thermal zone device type
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -126,31 +126,6 @@ struct thermal_cooling_device {
 #endif
 };
 
-/**
- * struct thermal_governor - structure that holds thermal governor information
- * @name:	name of the governor
- * @bind_to_tz: callback called when binding to a thermal zone.  If it
- *		returns 0, the governor is bound to the thermal zone,
- *		otherwise it fails.
- * @unbind_from_tz:	callback called when a governor is unbound from a
- *			thermal zone.
- * @throttle:	callback called for every trip point even if temperature is
- *		below the trip point temperature
- * @update_tz:	callback called when thermal zone internals have changed, e.g.
- *		thermal cooling instance was added/removed
- * @governor_list:	node in thermal_governor_list (in thermal_core.c)
- */
-struct thermal_governor {
-	const char *name;
-	int (*bind_to_tz)(struct thermal_zone_device *tz);
-	void (*unbind_from_tz)(struct thermal_zone_device *tz);
-	int (*throttle)(struct thermal_zone_device *tz,
-			const struct thermal_trip *trip);
-	void (*update_tz)(struct thermal_zone_device *tz,
-			  enum thermal_notify_event reason);
-	struct list_head	governor_list;
-};
-
 /* Structure to define Thermal Zone parameters */
 struct thermal_zone_params {
 	const char *governor_name;




^ permalink raw reply

* Re: [PATCH] PM:EM: fix wrong utilization estimation in em_cpu_energy()
From: Rafael J. Wysocki @ 2024-04-04 19:05 UTC (permalink / raw)
  To: Lukasz Luba, Vincent Guittot
  Cc: rui.zhang, amit.kucheria, amit.kachhap, daniel.lezcano,
	viresh.kumar, len.brown, pavel, mhiramat, qyousef, wvw,
	xuewen.yan94, rafael, linux-kernel, linux-pm, dietmar.eggemann
In-Reply-To: <7ecd3ec9-6990-4d3e-84ae-d0d3a1cccb78@arm.com>

On Thu, Apr 4, 2024 at 1:05 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
> Hi Vincent,
>
> On 4/4/24 11:42, Vincent Guittot wrote:
> > Commit 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division")
> > has added back map_util_perf() in em_cpu_energy() computation which has
> > been removed with the rework of scheduler/cpufreq interface.
> > This is wrong because sugov_effective_cpu_perf() already takes care of
> > mapping the utilization to a performance level.
> >
> > Fixes: 1b600da51073 ("PM: EM: Optimize em_cpu_energy() and remove division")
> > Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
> > ---
> >   include/linux/energy_model.h | 1 -
> >   1 file changed, 1 deletion(-)
> >
> > diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h
> > index 770755df852f..70cd7258cd29 100644
> > --- a/include/linux/energy_model.h
> > +++ b/include/linux/energy_model.h
> > @@ -245,7 +245,6 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd,
> >        * max utilization to the allowed CPU capacity before calculating
> >        * effective performance.
> >        */
> > -     max_util = map_util_perf(max_util);
> >       max_util = min(max_util, allowed_cpu_cap);
> >
> >       /*
>
> LGTM. It was developed in parallel IIRC and that change which removes
> the extra margin to the util was lost from my radar. I can see it
> landed first.
>
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

Applied as 6.9-rc material, thanks!

^ permalink raw reply

* Re: [PATCH v2] platform/x86/intel/hid: Don't wake on 5-button releases
From: Hans de Goede @ 2024-04-04 18:35 UTC (permalink / raw)
  To: David McFarland
  Cc: linux-pm, Ilpo Järvinen, platform-driver-x86@vger.kernel.org,
	Rafael J . Wysocki, Enrik Berkhan
In-Reply-To: <878r1tpd6u.fsf_-_@gmail.com>

Hi,

On 4/4/24 1:41 PM, David McFarland wrote:
> If, for example, the power button is configured to suspend, holding it
> and releasing it after the machine has suspended, will wake the machine.
> 
> Also on some machines, power button release events are sent during
> hibernation, even if the button wasn't used to hibernate the machine.
> This causes hibernation to be aborted.
> 
> Fixes: 0c4cae1bc00d ("PM: hibernate: Avoid missing wakeup events during hibernation")
> Signed-off-by: David McFarland <corngood@gmail.com>
> Tested-by: Enrik Berkhan <Enrik.Berkhan@inka.de>
> Reviewed-by: Hans de Goede <hdegoede@redhat.com>

Thank you, this version looks good to me.

Ilpo, can you pick this up as a bug-fix for the 6.9 cycle please?

Regards,

Hans



> ---
> v2: Added tags and fixed whitespace, as requested by Hans.
> 
>  drivers/platform/x86/intel/hid.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/platform/x86/intel/hid.c b/drivers/platform/x86/intel/hid.c
> index 7457ca2b27a6..9ffbdc988fe5 100644
> --- a/drivers/platform/x86/intel/hid.c
> +++ b/drivers/platform/x86/intel/hid.c
> @@ -504,6 +504,7 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
>  	struct platform_device *device = context;
>  	struct intel_hid_priv *priv = dev_get_drvdata(&device->dev);
>  	unsigned long long ev_index;
> +	struct key_entry *ke;
>  	int err;
>  
>  	/*
> @@ -545,11 +546,15 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
>  		if (event == 0xc0 || !priv->array)
>  			return;
>  
> -		if (!sparse_keymap_entry_from_scancode(priv->array, event)) {
> +		ke = sparse_keymap_entry_from_scancode(priv->array, event);
> +		if (!ke) {
>  			dev_info(&device->dev, "unknown event 0x%x\n", event);
>  			return;
>  		}
>  
> +		if (ke->type == KE_IGNORE)
> +			return;
> +
>  wakeup:
>  		pm_wakeup_hard_event(&device->dev);
>  


^ permalink raw reply

* Re: [RFC PATCH] therm_throt: test bits as we build therm_intr_core_clear_mask
From: Kyle McMartin @ 2024-04-04 17:17 UTC (permalink / raw)
  To: srinivas pandruvada
  Cc: Kyle McMartin, Rafael J. Wysocki, linux-pm, kernel-team,
	linux-kernel
In-Reply-To: <8b4cb4ad67032fad69f29df8e6b83054c7fa15db.camel@linux.intel.com>

On Wed, Apr 03, 2024 at 06:15:47PM -0700, srinivas pandruvada wrote:
> > On Broadwell and Broadwell-DE, the HWP flag is not set, but writing
> > these bits does not trap.
> > 
> > On our Skylake-DE, Skylake, and Cooper Lake platforms, the HWP flag
> > is
> > set in CPUID, and writing 1 to these bits traps attempting to write
> > 0xAAA8 to MSR 0x19C (THERM_STATUS). Writing 0xAA8 from userspace
> > works
> > as expected to un-stick PROCHOT_LOG.
> 
> I think this issue happens only on Skylake, Cascade Lake, Cooper Lake
> and not on any other systems.
> 
> Please verify:
> GP# happens only when bit13 (Current Limit Log) or bit15 (Cross Domain
> Limit Log) is 1.
> 

Yeah, if either of the bits are set, we'll trap and fail the WRMSRL.

> Basically writing 0x2000 or 0x8000  or A000 will cause this issue.
> Are you using the latest BIOS with microcode?
> Please confirm your microcode version, I can check internally.
> 

On SkylakeDE, 6-85-4 we've got 0x2006e08 and 0x2006e05 as the most commonly
deployed microcodes. On Skylake, 6-85-4 we've got 0x2006e05 and 0x2000065.
Finally, on Cooper Lake, 6-85-11, we have 0x700001f and are in the process
of rolling out 0x7002503.

Rolling out new firmware is a pretty slow process... Since we're not
clearing those bits anywhere in the kernel we're deploying, I just
stubbed out setting BIT(13) and BIT(15) on those platforms for now while
we discuss a more durable fix.

Thanks for following up! --kyle

> Thanks,
> Srinivas
> 
> 
> > 
> > On our Sapphire Rapids platforms, the HWP flag is set, and writing 1
> > to
> > these bits is successful.
> > 
> >  drivers/thermal/intel/therm_throt.c | 29 ++++++++++++++++++++++-----
> > --
> >  1 file changed, 22 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/thermal/intel/therm_throt.c
> > b/drivers/thermal/intel/therm_throt.c
> > index e69868e868eb..3058d8fcfcef 100644
> > --- a/drivers/thermal/intel/therm_throt.c
> > +++ b/drivers/thermal/intel/therm_throt.c
> > @@ -196,8 +196,14 @@ static const struct attribute_group
> > thermal_attr_group = {
> >  static u64 therm_intr_core_clear_mask;
> >  static u64 therm_intr_pkg_clear_mask;
> >  
> > +/* Probe each addition to the mask to ensure that our wrmsrl
> > + * won't fail to clear bits.
> > + */
> >  static void thermal_intr_init_core_clear_mask(void)
> >  {
> > +       u64 bits = 0;
> > +       u64 mask = 0;
> > +
> >         if (therm_intr_core_clear_mask)
> >                 return;
> >  
> > @@ -211,25 +217,34 @@ static void
> > thermal_intr_init_core_clear_mask(void)
> >          * Bit 1, 3, 5: CPUID.01H:EDX[22] = 1. This driver will not
> >          * enable interrupts, when 0 as it checks for
> > X86_FEATURE_ACPI.
> >          */
> > -       therm_intr_core_clear_mask = (BIT(1) | BIT(3) | BIT(5));
> > +       mask = (BIT(1) | BIT(3) | BIT(5));
> >  
> >         /*
> >          * Bit 7 and 9: Thermal Threshold #1 and #2 log
> >          * If CPUID.01H:ECX[8] = 1
> >          */
> > -       if (boot_cpu_has(X86_FEATURE_TM2))
> > -               therm_intr_core_clear_mask |= (BIT(7) | BIT(9));
> > +       bits = BIT(7) | BIT(9);
> > +       if (boot_cpu_has(X86_FEATURE_TM2) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> > +
> >  
> >         /* Bit 11: Power Limitation log (R/WC0) If CPUID.06H:EAX[4] =
> > 1 */
> > -       if (boot_cpu_has(X86_FEATURE_PLN))
> > -               therm_intr_core_clear_mask |= BIT(11);
> > +       bits = BIT(11);
> > +       if (boot_cpu_has(X86_FEATURE_PLN) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> >  
> >         /*
> >          * Bit 13: Current Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
> >          * Bit 15: Cross Domain Limit log (R/WC0) If CPUID.06H:EAX[7]
> > = 1
> >          */
> > -       if (boot_cpu_has(X86_FEATURE_HWP))
> > -               therm_intr_core_clear_mask |= (BIT(13) | BIT(15));
> > +       bits = BIT(13) | BIT(15);
> > +       if (boot_cpu_has(X86_FEATURE_HWP) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> > +
> > +       therm_intr_core_clear_mask = mask;
> >  }
> >  
> >  static void thermal_intr_init_pkg_clear_mask(void)
> 

^ permalink raw reply

* iMX8M Mini suspend/resume hanging on imx8m_blk_ctrl_power_on()
From: vitor @ 2024-04-04 15:53 UTC (permalink / raw)
  To: linux-pm, imx, linux-arm-kernel, linux-kernel
  Cc: ivitro, vitor.soares, ulf.hansson, shawnguo, s.hauer, kernel,
	festevam, rafael, geert+renesas, peng.fan, linus.walleij,
	u.kleine-koenig, marex

Greetings,

I'm trying to suspend/resume our Verdin iMX8M Mini with VPU IP using
the latest 6.9.0-rc2 Kernel. While the system can suspend without
issues, it hangs on the resume routine. After some investigation, I can
see the Kernel hanging on imx8m_blk_ctrl_power_on()[1] while resuming
the hantro-vpu power domain.

Any hint about that?

[1]https://elixir.bootlin.com/linux/v6.9-rc2/source/drivers/pmdomain/imx/imx8m-blk-ctrl.c#L101


^ permalink raw reply

* Re: [PATCH v2 15/15] arm64: dts: mediatek: mt8188: add default thermal zones
From: Daniel Lezcano @ 2024-04-04 15:16 UTC (permalink / raw)
  To: Nicolas Pitre, linux-pm, linux-mediatek, devicetree
  Cc: Nicolas Pitre, AngeloGioacchino Del Regno
In-Reply-To: <20240402032729.2736685-16-nico@fluxnic.net>


Hi Nico,

a few comments about this description.

On 02/04/2024 05:25, Nicolas Pitre wrote:
> From: Nicolas Pitre <npitre@baylibre.com>
> 
> Inspired by the vendor kernel but adapted to the upstream thermal
> driver version.

[ ... ]

> +	thermal_zones: thermal-zones {
> +		cpu-little0-thermal {
> +			polling-delay = <1000>;

Except if I'm wrong, the driver supports the interrupt mode, so it not 
necessary to poll constantly when there is no mitigation. You can remove 
the line and everywhere else.

> +			polling-delay-passive = <250>;

As little CPU, 200ms or 150ms may be more adequate.

> +			thermal-sensors = <&lvts_mcu MT8188_MCU_LITTLE_CPU0>;
> +
> +			trips {
> +				cpu_little0_alert: trip-alert {
> +					temperature = <85000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};

You may want to add a 'hot' trip point in between, so the userspace can 
be notified and take an action before reaching 'critical' (like 
unplugging a CPU)

> +				cpu_little0_crit: trip-crit {
> +					temperature = <100000>;
> +					hysteresis = <2000>;

critical is a point of no return. Hysteresis does not make sense.

These comments apply to all thermal zones.

[ .. ]

> +		cpu_big0-thermal {
> +			polling-delay = <1000>;
> +			polling-delay-passive = <250>;

Same comments as the little but may be an even lower value. eg. 100ms.

> +			thermal-sensors = <&lvts_mcu MT8188_MCU_BIG_CPU0>;
> +
> +			trips {
> +				cpu_big0_alert: trip-alert {
> +					temperature = <85000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +
> +				cpu_big0_crit: trip-crit {
> +					temperature = <100000>;
> +					hysteresis = <2000>;
> +					type = "critical";
> +				};
> +			};
> +
> +			cooling-maps {
> +				map0 {
> +					trip = <&cpu_big0_alert>;
> +					cooling-device = <&cpu6 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> +							 <&cpu7 THERMAL_NO_LIMIT THERMAL_NO_LIMIT>;
> +				};
> +			};
> +		};

[ ... ]

> +		gpu1-thermal {
> +			polling-delay = <1000>;
> +			polling-delay-passive = <250>;
> +			thermal-sensors = <&lvts_ap MT8188_AP_GPU1>;
> +
> +			trips {
> +				gpu1_alert: trip-alert {
> +					temperature = <85000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +
> +				gpu1_crit: trip-crit {
> +					temperature = <100000>;
> +					hysteresis = <2000>;
> +					type = "critical";
> +				};
> +			};
> +		};
> +
> +		gpu2-thermal {
> +			polling-delay = <1000>;
> +			polling-delay-passive = <250>;
> +			thermal-sensors = <&lvts_ap MT8188_AP_GPU2>;
> +
> +			trips {
> +				gpu2_alert: trip-alert {
> +					temperature = <85000>;
> +					hysteresis = <2000>;
> +					type = "passive";
> +				};
> +
> +				gpu2_crit: trip-crit {
> +					temperature = <100000>;
> +					hysteresis = <2000>;
> +					type = "critical";
> +				};
> +			};

You can add a devfreq cooling device for the GPU here.

[ ... ]

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply

* [PATCH v2 2/2] cgroup/cpuset: Add test_cpuset_v1_hp.sh
From: Waiman Long @ 2024-04-04 13:47 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Thomas Gleixner,
	Peter Zijlstra, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Shuah Khan
  Cc: linux-kernel, cgroups, linux-pm, linux-kselftest,
	Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Valentin Schneider, Anna-Maria Behnsen, Alex Shi, Vincent Guittot,
	Michal Koutný, Waiman Long
In-Reply-To: <20240404134749.2857852-1-longman@redhat.com>

Add a simple test to verify that an empty v1 cpuset will force its tasks
to be moved to an ancestor node. It is based on the test case documented
in commit 76bb5ab8f6e3 ("cpuset: break kernfs active protection in
cpuset_write_resmask()").

Signed-off-by: Waiman Long <longman@redhat.com>
---
 tools/testing/selftests/cgroup/Makefile       |  2 +-
 .../selftests/cgroup/test_cpuset_v1_hp.sh     | 46 +++++++++++++++++++
 2 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh

diff --git a/tools/testing/selftests/cgroup/Makefile b/tools/testing/selftests/cgroup/Makefile
index 00b441928909..16461dc0ffdf 100644
--- a/tools/testing/selftests/cgroup/Makefile
+++ b/tools/testing/selftests/cgroup/Makefile
@@ -4,7 +4,7 @@ CFLAGS += -Wall -pthread
 all: ${HELPER_PROGS}
 
 TEST_FILES     := with_stress.sh
-TEST_PROGS     := test_stress.sh test_cpuset_prs.sh
+TEST_PROGS     := test_stress.sh test_cpuset_prs.sh test_cpuset_v1_hp.sh
 TEST_GEN_FILES := wait_inotify
 TEST_GEN_PROGS = test_memcontrol
 TEST_GEN_PROGS += test_kmem
diff --git a/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh b/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
new file mode 100755
index 000000000000..3f45512fb512
--- /dev/null
+++ b/tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test the special cpuset v1 hotplug case where a cpuset become empty of
+# CPUs will force migration of tasks out to an ancestor.
+#
+
+skip_test() {
+	echo "$1"
+	echo "Test SKIPPED"
+	exit 4 # ksft_skip
+}
+
+[[ $(id -u) -eq 0 ]] || skip_test "Test must be run as root!"
+
+# Find cpuset v1 mount point
+CPUSET=$(mount -t cgroup | grep cpuset | head -1 | awk -e '{print $3}')
+[[ -n "$CPUSET" ]] || skip_test "cpuset v1 mount point not found!"
+
+#
+# Create a test cpuset, put a CPU and a task there and offline that CPU
+#
+TDIR=test$$
+[[ -d $CPUSET/$TDIR ]] || mkdir $CPUSET/$TDIR
+echo 1 > $CPUSET/$TDIR/cpuset.cpus
+echo 0 > $CPUSET/$TDIR/cpuset.mems
+sleep 10&
+TASK=$!
+echo $TASK > $CPUSET/$TDIR/tasks
+NEWCS=$(cat /proc/$TASK/cpuset)
+[[ $NEWCS != "/$TDIR" ]] && {
+	echo "Unexpected cpuset $NEWCS, test FAILED!"
+	exit 1
+}
+
+echo 0 > /sys/devices/system/cpu/cpu1/online
+sleep 0.5
+echo 1 > /sys/devices/system/cpu/cpu1/online
+NEWCS=$(cat /proc/$TASK/cpuset)
+rmdir $CPUSET/$TDIR
+[[ $NEWCS != "/" ]] && {
+	echo "cpuset $NEWCS, test FAILED!"
+	exit 1
+}
+echo "Test PASSED"
+exit 0
-- 
2.39.3


^ permalink raw reply related

* [PATCH v2 1/2] cgroup/cpuset: Make cpuset hotplug processing synchronous
From: Waiman Long @ 2024-04-04 13:47 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Thomas Gleixner,
	Peter Zijlstra, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Shuah Khan
  Cc: linux-kernel, cgroups, linux-pm, linux-kselftest,
	Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Valentin Schneider, Anna-Maria Behnsen, Alex Shi, Vincent Guittot,
	Michal Koutný, Waiman Long
In-Reply-To: <20240404134749.2857852-1-longman@redhat.com>

Since commit 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside
get_online_cpus()"), cpuset hotplug was done asynchronously via a work
function. This is to avoid recursive locking of cgroup_mutex.

Since then, the cgroup locking scheme has changed quite a bit. A
cpuset_mutex was introduced to protect cpuset specific operations.
The cpuset_mutex is then replaced by a cpuset_rwsem. With commit
d74b27d63a8b ("cgroup/cpuset: Change cpuset_rwsem and hotplug lock
order"), cpu_hotplug_lock is acquired before cpuset_rwsem. Later on,
cpuset_rwsem is reverted back to cpuset_mutex. All these locking changes
allow the hotplug code to call into cpuset core directly.

The following commits were also merged due to the asynchronous nature
of cpuset hotplug processing.

  - commit b22afcdf04c9 ("cpu/hotplug: Cure the cpusets trainwreck")
  - commit 50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume
    bugs")
  - commit 28b89b9e6f7b ("cpuset: handle race between CPU hotplug and
    cpuset_hotplug_work")

Clean up all these bandages by making cpuset hotplug
processing synchronous again with the exception that the call to
cgroup_transfer_tasks() to transfer tasks out of an empty cgroup v1
cpuset, if necessary, will still be done via a work function due to the
existing cgroup_mutex -> cpu_hotplug_lock dependency. It is possible
to reverse that dependency, but that will require updating a number of
different cgroup controllers. This special hotplug code path should be
rarely taken anyway.

As all the cpuset states will be updated by the end of the hotplug
operation, we can revert most the above commits except commit
50e76632339d ("sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs")
which is partially reverted.  Also removing some cpus_read_lock trylock
attempts in the cpuset partition code as they are no longer necessary
since the cpu_hotplug_lock is now held for the whole duration of the
cpuset hotplug code path.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 include/linux/cpuset.h |   3 -
 kernel/cgroup/cpuset.c | 141 ++++++++++++++++-------------------------
 kernel/cpu.c           |  48 --------------
 kernel/power/process.c |   2 -
 4 files changed, 56 insertions(+), 138 deletions(-)

diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 0ce6ff0d9c9a..de4cf0ee96f7 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -70,7 +70,6 @@ extern int cpuset_init(void);
 extern void cpuset_init_smp(void);
 extern void cpuset_force_rebuild(void);
 extern void cpuset_update_active_cpus(void);
-extern void cpuset_wait_for_hotplug(void);
 extern void inc_dl_tasks_cs(struct task_struct *task);
 extern void dec_dl_tasks_cs(struct task_struct *task);
 extern void cpuset_lock(void);
@@ -185,8 +184,6 @@ static inline void cpuset_update_active_cpus(void)
 	partition_sched_domains(1, NULL, NULL);
 }
 
-static inline void cpuset_wait_for_hotplug(void) { }
-
 static inline void inc_dl_tasks_cs(struct task_struct *task) { }
 static inline void dec_dl_tasks_cs(struct task_struct *task) { }
 static inline void cpuset_lock(void) { }
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4237c8748715..d8d3439eda4e 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -201,6 +201,14 @@ struct cpuset {
 	struct list_head remote_sibling;
 };
 
+/*
+ * Legacy hierarchy call to cgroup_transfer_tasks() is handled asynchrously
+ */
+struct cpuset_remove_tasks_struct {
+	struct work_struct work;
+	struct cpuset *cs;
+};
+
 /*
  * Exclusive CPUs distributed out to sub-partitions of top_cpuset
  */
@@ -449,12 +457,6 @@ static DEFINE_SPINLOCK(callback_lock);
 
 static struct workqueue_struct *cpuset_migrate_mm_wq;
 
-/*
- * CPU / memory hotplug is handled asynchronously.
- */
-static void cpuset_hotplug_workfn(struct work_struct *work);
-static DECLARE_WORK(cpuset_hotplug_work, cpuset_hotplug_workfn);
-
 static DECLARE_WAIT_QUEUE_HEAD(cpuset_attach_wq);
 
 static inline void check_insane_mems_config(nodemask_t *nodes)
@@ -540,22 +542,10 @@ static void guarantee_online_cpus(struct task_struct *tsk,
 	rcu_read_lock();
 	cs = task_cs(tsk);
 
-	while (!cpumask_intersects(cs->effective_cpus, pmask)) {
+	while (!cpumask_intersects(cs->effective_cpus, pmask))
 		cs = parent_cs(cs);
-		if (unlikely(!cs)) {
-			/*
-			 * The top cpuset doesn't have any online cpu as a
-			 * consequence of a race between cpuset_hotplug_work
-			 * and cpu hotplug notifier.  But we know the top
-			 * cpuset's effective_cpus is on its way to be
-			 * identical to cpu_online_mask.
-			 */
-			goto out_unlock;
-		}
-	}
-	cpumask_and(pmask, pmask, cs->effective_cpus);
 
-out_unlock:
+	cpumask_and(pmask, pmask, cs->effective_cpus);
 	rcu_read_unlock();
 }
 
@@ -1217,7 +1207,7 @@ static void rebuild_sched_domains_locked(void)
 	/*
 	 * If we have raced with CPU hotplug, return early to avoid
 	 * passing doms with offlined cpu to partition_sched_domains().
-	 * Anyways, cpuset_hotplug_workfn() will rebuild sched domains.
+	 * Anyways, cpuset_handle_hotplug() will rebuild sched domains.
 	 *
 	 * With no CPUs in any subpartitions, top_cpuset's effective CPUs
 	 * should be the same as the active CPUs, so checking only top_cpuset
@@ -1260,12 +1250,17 @@ static void rebuild_sched_domains_locked(void)
 }
 #endif /* CONFIG_SMP */
 
-void rebuild_sched_domains(void)
+static void rebuild_sched_domains_cpuslocked(void)
 {
-	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
 	rebuild_sched_domains_locked();
 	mutex_unlock(&cpuset_mutex);
+}
+
+void rebuild_sched_domains(void)
+{
+	cpus_read_lock();
+	rebuild_sched_domains_cpuslocked();
 	cpus_read_unlock();
 }
 
@@ -2079,14 +2074,11 @@ static int update_parent_effective_cpumask(struct cpuset *cs, int cmd,
 
 	/*
 	 * For partcmd_update without newmask, it is being called from
-	 * cpuset_hotplug_workfn() where cpus_read_lock() wasn't taken.
-	 * Update the load balance flag and scheduling domain if
-	 * cpus_read_trylock() is successful.
+	 * cpuset_handle_hotplug(). Update the load balance flag and
+	 * scheduling domain accordingly.
 	 */
-	if ((cmd == partcmd_update) && !newmask && cpus_read_trylock()) {
+	if ((cmd == partcmd_update) && !newmask)
 		update_partition_sd_lb(cs, old_prs);
-		cpus_read_unlock();
-	}
 
 	notify_partition_change(cs, old_prs);
 	return 0;
@@ -3599,8 +3591,8 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	 * proceeding, so that we don't end up keep removing tasks added
 	 * after execution capability is restored.
 	 *
-	 * cpuset_hotplug_work calls back into cgroup core via
-	 * cgroup_transfer_tasks() and waiting for it from a cgroupfs
+	 * cpuset_handle_hotplug may call back into cgroup core asynchronously
+	 * via cgroup_transfer_tasks() and waiting for it from a cgroupfs
 	 * operation like this one can lead to a deadlock through kernfs
 	 * active_ref protection.  Let's break the protection.  Losing the
 	 * protection is okay as we check whether @cs is online after
@@ -3609,7 +3601,6 @@ static ssize_t cpuset_write_resmask(struct kernfs_open_file *of,
 	 */
 	css_get(&cs->css);
 	kernfs_break_active_protection(of->kn);
-	flush_work(&cpuset_hotplug_work);
 
 	cpus_read_lock();
 	mutex_lock(&cpuset_mutex);
@@ -4354,6 +4345,16 @@ static void remove_tasks_in_empty_cpuset(struct cpuset *cs)
 	}
 }
 
+static void cpuset_migrate_tasks_workfn(struct work_struct *work)
+{
+	struct cpuset_remove_tasks_struct *s;
+
+	s = container_of(work, struct cpuset_remove_tasks_struct, work);
+	remove_tasks_in_empty_cpuset(s->cs);
+	css_put(&s->cs->css);
+	kfree(s);
+}
+
 static void
 hotplug_update_tasks_legacy(struct cpuset *cs,
 			    struct cpumask *new_cpus, nodemask_t *new_mems,
@@ -4383,12 +4384,21 @@ hotplug_update_tasks_legacy(struct cpuset *cs,
 	/*
 	 * Move tasks to the nearest ancestor with execution resources,
 	 * This is full cgroup operation which will also call back into
-	 * cpuset. Should be done outside any lock.
+	 * cpuset. Execute it asynchronously using workqueue.
 	 */
-	if (is_empty) {
-		mutex_unlock(&cpuset_mutex);
-		remove_tasks_in_empty_cpuset(cs);
-		mutex_lock(&cpuset_mutex);
+	if (is_empty && cs->css.cgroup->nr_populated_csets &&
+	    css_tryget_online(&cs->css)) {
+		struct cpuset_remove_tasks_struct *s;
+
+		s = kzalloc(sizeof(*s), GFP_KERNEL);
+		if (WARN_ON_ONCE(!s)) {
+			css_put(&cs->css);
+			return;
+		}
+
+		s->cs = cs;
+		INIT_WORK(&s->work, cpuset_migrate_tasks_workfn);
+		schedule_work(&s->work);
 	}
 }
 
@@ -4421,30 +4431,6 @@ void cpuset_force_rebuild(void)
 	force_rebuild = true;
 }
 
-/*
- * Attempt to acquire a cpus_read_lock while a hotplug operation may be in
- * progress.
- * Return: true if successful, false otherwise
- *
- * To avoid circular lock dependency between cpuset_mutex and cpus_read_lock,
- * cpus_read_trylock() is used here to acquire the lock.
- */
-static bool cpuset_hotplug_cpus_read_trylock(void)
-{
-	int retries = 0;
-
-	while (!cpus_read_trylock()) {
-		/*
-		 * CPU hotplug still in progress. Retry 5 times
-		 * with a 10ms wait before bailing out.
-		 */
-		if (++retries > 5)
-			return false;
-		msleep(10);
-	}
-	return true;
-}
-
 /**
  * cpuset_hotplug_update_tasks - update tasks in a cpuset for hotunplug
  * @cs: cpuset in interest
@@ -4493,13 +4479,11 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 		compute_partition_effective_cpumask(cs, &new_cpus);
 
 	if (remote && cpumask_empty(&new_cpus) &&
-	    partition_is_populated(cs, NULL) &&
-	    cpuset_hotplug_cpus_read_trylock()) {
+	    partition_is_populated(cs, NULL)) {
 		remote_partition_disable(cs, tmp);
 		compute_effective_cpumask(&new_cpus, cs, parent);
 		remote = false;
 		cpuset_force_rebuild();
-		cpus_read_unlock();
 	}
 
 	/*
@@ -4519,18 +4503,8 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 	else if (is_partition_valid(parent) && is_partition_invalid(cs))
 		partcmd = partcmd_update;
 
-	/*
-	 * cpus_read_lock needs to be held before calling
-	 * update_parent_effective_cpumask(). To avoid circular lock
-	 * dependency between cpuset_mutex and cpus_read_lock,
-	 * cpus_read_trylock() is used here to acquire the lock.
-	 */
 	if (partcmd >= 0) {
-		if (!cpuset_hotplug_cpus_read_trylock())
-			goto update_tasks;
-
 		update_parent_effective_cpumask(cs, partcmd, NULL, tmp);
-		cpus_read_unlock();
 		if ((partcmd == partcmd_invalidate) || is_partition_valid(cs)) {
 			compute_partition_effective_cpumask(cs, &new_cpus);
 			cpuset_force_rebuild();
@@ -4558,8 +4532,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
 }
 
 /**
- * cpuset_hotplug_workfn - handle CPU/memory hotunplug for a cpuset
- * @work: unused
+ * cpuset_handle_hotplug - handle CPU/memory hot{,un}plug for a cpuset
  *
  * This function is called after either CPU or memory configuration has
  * changed and updates cpuset accordingly.  The top_cpuset is always
@@ -4573,8 +4546,10 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
  *
  * Note that CPU offlining during suspend is ignored.  We don't modify
  * cpusets across suspend/resume cycles at all.
+ *
+ * CPU / memory hotplug is handled synchronously.
  */
-static void cpuset_hotplug_workfn(struct work_struct *work)
+static void cpuset_handle_hotplug(void)
 {
 	static cpumask_t new_cpus;
 	static nodemask_t new_mems;
@@ -4585,6 +4560,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	if (on_dfl && !alloc_cpumasks(NULL, &tmp))
 		ptmp = &tmp;
 
+	lockdep_assert_cpus_held();
 	mutex_lock(&cpuset_mutex);
 
 	/* fetch the available cpus/mems and find out which changed how */
@@ -4666,7 +4642,7 @@ static void cpuset_hotplug_workfn(struct work_struct *work)
 	/* rebuild sched domains if cpus_allowed has changed */
 	if (cpus_updated || force_rebuild) {
 		force_rebuild = false;
-		rebuild_sched_domains();
+		rebuild_sched_domains_cpuslocked();
 	}
 
 	free_cpumasks(NULL, ptmp);
@@ -4679,12 +4655,7 @@ void cpuset_update_active_cpus(void)
 	 * inside cgroup synchronization.  Bounce actual hotplug processing
 	 * to a work item to avoid reverse locking order.
 	 */
-	schedule_work(&cpuset_hotplug_work);
-}
-
-void cpuset_wait_for_hotplug(void)
-{
-	flush_work(&cpuset_hotplug_work);
+	cpuset_handle_hotplug();
 }
 
 /*
@@ -4695,7 +4666,7 @@ void cpuset_wait_for_hotplug(void)
 static int cpuset_track_online_nodes(struct notifier_block *self,
 				unsigned long action, void *arg)
 {
-	schedule_work(&cpuset_hotplug_work);
+	cpuset_handle_hotplug();
 	return NOTIFY_OK;
 }
 
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 8f6affd051f7..49ce3f309688 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1208,52 +1208,6 @@ void __init cpuhp_threads_init(void)
 	kthread_unpark(this_cpu_read(cpuhp_state.thread));
 }
 
-/*
- *
- * Serialize hotplug trainwrecks outside of the cpu_hotplug_lock
- * protected region.
- *
- * The operation is still serialized against concurrent CPU hotplug via
- * cpu_add_remove_lock, i.e. CPU map protection.  But it is _not_
- * serialized against other hotplug related activity like adding or
- * removing of state callbacks and state instances, which invoke either the
- * startup or the teardown callback of the affected state.
- *
- * This is required for subsystems which are unfixable vs. CPU hotplug and
- * evade lock inversion problems by scheduling work which has to be
- * completed _before_ cpu_up()/_cpu_down() returns.
- *
- * Don't even think about adding anything to this for any new code or even
- * drivers. It's only purpose is to keep existing lock order trainwrecks
- * working.
- *
- * For cpu_down() there might be valid reasons to finish cleanups which are
- * not required to be done under cpu_hotplug_lock, but that's a different
- * story and would be not invoked via this.
- */
-static void cpu_up_down_serialize_trainwrecks(bool tasks_frozen)
-{
-	/*
-	 * cpusets delegate hotplug operations to a worker to "solve" the
-	 * lock order problems. Wait for the worker, but only if tasks are
-	 * _not_ frozen (suspend, hibernate) as that would wait forever.
-	 *
-	 * The wait is required because otherwise the hotplug operation
-	 * returns with inconsistent state, which could even be observed in
-	 * user space when a new CPU is brought up. The CPU plug uevent
-	 * would be delivered and user space reacting on it would fail to
-	 * move tasks to the newly plugged CPU up to the point where the
-	 * work has finished because up to that point the newly plugged CPU
-	 * is not assignable in cpusets/cgroups. On unplug that's not
-	 * necessarily a visible issue, but it is still inconsistent state,
-	 * which is the real problem which needs to be "fixed". This can't
-	 * prevent the transient state between scheduling the work and
-	 * returning from waiting for it.
-	 */
-	if (!tasks_frozen)
-		cpuset_wait_for_hotplug();
-}
-
 #ifdef CONFIG_HOTPLUG_CPU
 #ifndef arch_clear_mm_cpumask_cpu
 #define arch_clear_mm_cpumask_cpu(cpu, mm) cpumask_clear_cpu(cpu, mm_cpumask(mm))
@@ -1494,7 +1448,6 @@ static int __ref _cpu_down(unsigned int cpu, int tasks_frozen,
 	 */
 	lockup_detector_cleanup();
 	arch_smt_update();
-	cpu_up_down_serialize_trainwrecks(tasks_frozen);
 	return ret;
 }
 
@@ -1728,7 +1681,6 @@ static int _cpu_up(unsigned int cpu, int tasks_frozen, enum cpuhp_state target)
 out:
 	cpus_write_unlock();
 	arch_smt_update();
-	cpu_up_down_serialize_trainwrecks(tasks_frozen);
 	return ret;
 }
 
diff --git a/kernel/power/process.c b/kernel/power/process.c
index cae81a87cc91..66ac067d9ae6 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -194,8 +194,6 @@ void thaw_processes(void)
 	__usermodehelper_set_disable_depth(UMH_FREEZING);
 	thaw_workqueues();
 
-	cpuset_wait_for_hotplug();
-
 	read_lock(&tasklist_lock);
 	for_each_process_thread(g, p) {
 		/* No other threads should have PF_SUSPEND_TASK set */
-- 
2.39.3


^ permalink raw reply related

* [PATCH v2 0/2] cgroup/cpuset: Make cpuset hotplug processing synchronous
From: Waiman Long @ 2024-04-04 13:47 UTC (permalink / raw)
  To: Tejun Heo, Zefan Li, Johannes Weiner, Thomas Gleixner,
	Peter Zijlstra, Rafael J. Wysocki, Len Brown, Pavel Machek,
	Shuah Khan
  Cc: linux-kernel, cgroups, linux-pm, linux-kselftest,
	Frederic Weisbecker, Paul E. McKenney, Ingo Molnar,
	Valentin Schneider, Anna-Maria Behnsen, Alex Shi, Vincent Guittot,
	Michal Koutný, Waiman Long

 v2:
  - Found that rebuild_sched_domains() has external callers, revert its
    change and introduce rebuild_sched_domains_cpuslocked() instead.

As discussed in the LKML thread [1], the asynchronous nature of cpuset
hotplug handling code is causing problem with RCU testing. With recent
changes in the way locking is being handled in the cpuset code, it is
now possible to make the cpuset hotplug code synchronous again without
major changes.

This series enables the hotplug code to call directly into cpuset hotplug
core without indirection with the exception of the special case of v1
cpuset becoming empty still being handled indirectly with workqueue.

A new simple test case was also written to test this special v1 cpuset
case. The test_cpuset_prs.sh script was also run with LOCKDEP on to
verify that there is no regression.

[1] https://lore.kernel.org/lkml/ZgYikMb5kZ7rxPp6@slm.duckdns.org/

Waiman Long (2):
  cgroup/cpuset: Make cpuset hotplug processing synchronous
  cgroup/cpuset: Add test_cpuset_v1_hp.sh

 include/linux/cpuset.h                        |   3 -
 kernel/cgroup/cpuset.c                        | 141 +++++++-----------
 kernel/cpu.c                                  |  48 ------
 kernel/power/process.c                        |   2 -
 tools/testing/selftests/cgroup/Makefile       |   2 +-
 .../selftests/cgroup/test_cpuset_v1_hp.sh     |  46 ++++++
 6 files changed, 103 insertions(+), 139 deletions(-)
 create mode 100755 tools/testing/selftests/cgroup/test_cpuset_v1_hp.sh

-- 
2.39.3


^ permalink raw reply

* Re: [PATCH v2 0/2] thermal: amlogic: introduce A1 SoC family Thermal Sensor controller
From: Daniel Lezcano @ 2024-04-04 12:23 UTC (permalink / raw)
  To: Dmitry Rokosov, neil.armstrong, jbrunet, mturquette, khilman,
	martin.blumenstingl, glaroque, rafael, rui.zhang, lukasz.luba,
	robh+dt, krzysztof.kozlowski+dt, conor+dt
  Cc: kernel, rockosov, linux-amlogic, linux-pm, linux-kernel,
	devicetree, linux-arm-kernel
In-Reply-To: <20240328191322.17551-1-ddrokosov@salutedevices.com>

On 28/03/2024 20:13, Dmitry Rokosov wrote:
> It is primarily based on the G12A thermal controller, with only a slight
> variation in the offset value of the efuse parameters. Therefore, this
> patch series provides appropriate platform data and dt-bindings to
> ensure proper support.


Applied, thanks

-- 
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog


^ permalink raw reply

* Re: [PATCH v1 2/2] thermal: gov_step_wise: Simplify checks related to passive trips
From: Rafael J. Wysocki @ 2024-04-04 11:57 UTC (permalink / raw)
  To: Lukasz Luba; +Cc: Rafael J. Wysocki, LKML, Daniel Lezcano, Linux PM
In-Reply-To: <6f950b18-4ff7-4570-957d-49b46167c12e@arm.com>

On Thu, Apr 4, 2024 at 1:26 PM Lukasz Luba <lukasz.luba@arm.com> wrote:
>
>
>
> On 4/3/24 19:12, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > Make it more clear from the code flow that the passive polling status
> > updates only take place for passive trip points.
> >
> > No intentional functional impact.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >   drivers/thermal/gov_step_wise.c |   14 ++++++--------
> >   1 file changed, 6 insertions(+), 8 deletions(-)
> >
> > Index: linux-pm/drivers/thermal/gov_step_wise.c
> > ===================================================================
> > --- linux-pm.orig/drivers/thermal/gov_step_wise.c
> > +++ linux-pm/drivers/thermal/gov_step_wise.c
> > @@ -92,15 +92,13 @@ static void thermal_zone_trip_update(str
> >               if (instance->initialized && old_target == instance->target)
> >                       continue;
> >
> > -             if (old_target == THERMAL_NO_TARGET &&
> > -                 instance->target != THERMAL_NO_TARGET) {
> > -                     /* Activate a passive thermal instance */
> > -                     if (trip->type == THERMAL_TRIP_PASSIVE)
> > +             if (trip->type == THERMAL_TRIP_PASSIVE) {
> > +                     /* If needed, update the status of passive polling. */
> > +                     if (old_target == THERMAL_NO_TARGET &&
> > +                         instance->target != THERMAL_NO_TARGET)
> >                               tz->passive++;
> > -             } else if (old_target != THERMAL_NO_TARGET &&
> > -                        instance->target == THERMAL_NO_TARGET) {
> > -                     /* Deactivate a passive thermal instance */
> > -                     if (trip->type == THERMAL_TRIP_PASSIVE)
> > +                     else if (old_target != THERMAL_NO_TARGET &&
> > +                              instance->target == THERMAL_NO_TARGET)
> >                               tz->passive--;
> >               }
> >
> >
> >
> >
>
> The patch looks good, although I got some warning while applying with
> my b4 tool:
> BADSIG: DKIM/rjwysocki.net

It's likely because it was sent from an address in the rjwysocki.net
domain, but it is perfectly fine to send "somebody else's" patches if
a replacement From: header is given.

Looks like a b4 issue to me.

> Anyway, it looks like false warning IMO.
>
> Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>

Thank you!

^ permalink raw reply

* Re: [CfP] LPC 2024 - Power Management and Thermal Control Micro-Conference
From: Rafael J. Wysocki @ 2024-04-04 11:54 UTC (permalink / raw)
  To: Ulf Hansson
  Cc: Dhruva Gole, Nikunj Kela, Kevin Hilman, Doug Anderson, Linux PM
In-Reply-To: <CAPDyKFrUKZ+h37YEhkkLj9ffPtV_gn5_8_dqUrSGNsYn7T_ozw@mail.gmail.com>

On Thu, Apr 4, 2024 at 1:32 PM Ulf Hansson <ulf.hansson@linaro.org> wrote:
>
> + Dhruva, Nikunj, Kevin, Doug (I probably forgot some of the last
> years attendants, please loop them if you know)
>
> On Mon, 25 Mar 2024 at 12:37, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > Hi Everyone,
> >
> > This year the Linux Plumbers Conference will be held in Vienna,
> > Austria, on Wednesday 18th,  Thursday 19th and Friday 20th of
> > September (http://linuxplumbersconf.org/).
> >
> > As it has been a tradition for the last few years, I'm planning to
> > submit a proposal for a Power Management and Thermal Control
> > Micro-Conference to the LPC and I'm looking for topics that people
> > would like to discuss in that MC.
>
> Thanks for continuing to drive this!

Anytime!

> If you need some additional help around this, feel free to reach out to me.

I will.

> >
> > The LPC is focused on work in progress and future developments, so any
> > work that has been completed already is not a suitable topic for it,
> > but if there's anything you'd like to do or you are planning or
> > considering in the power management and thermal control space in
> > Linux, please let me know.
>
> We had a BoF last year around "supporting multiple low power states
> for system wide suspend".
>
> Not so much progress has been made I believe? (Except for some local
> hacks that I am working on).
>
> If possible, I think it would be a good opportunity to continue the
> discussions from last year, assuming people are still interested in
> this topic.

OK, I'll add this to the list of prospective topics.

^ permalink raw reply

* [PATCH v2] platform/x86/intel/hid: Don't wake on 5-button releases
From: David McFarland @ 2024-04-04 11:41 UTC (permalink / raw)
  To: Hans de Goede
  Cc: David McFarland, linux-pm, Ilpo Järvinen,
	platform-driver-x86@vger.kernel.org, Rafael J . Wysocki,
	Enrik Berkhan
In-Reply-To: <ed891842-a86f-4ca8-af29-f7921a259146@redhat.com>

If, for example, the power button is configured to suspend, holding it
and releasing it after the machine has suspended, will wake the machine.

Also on some machines, power button release events are sent during
hibernation, even if the button wasn't used to hibernate the machine.
This causes hibernation to be aborted.

Fixes: 0c4cae1bc00d ("PM: hibernate: Avoid missing wakeup events during hibernation")
Signed-off-by: David McFarland <corngood@gmail.com>
Tested-by: Enrik Berkhan <Enrik.Berkhan@inka.de>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
---
v2: Added tags and fixed whitespace, as requested by Hans.

 drivers/platform/x86/intel/hid.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/intel/hid.c b/drivers/platform/x86/intel/hid.c
index 7457ca2b27a6..9ffbdc988fe5 100644
--- a/drivers/platform/x86/intel/hid.c
+++ b/drivers/platform/x86/intel/hid.c
@@ -504,6 +504,7 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
 	struct platform_device *device = context;
 	struct intel_hid_priv *priv = dev_get_drvdata(&device->dev);
 	unsigned long long ev_index;
+	struct key_entry *ke;
 	int err;
 
 	/*
@@ -545,11 +546,15 @@ static void notify_handler(acpi_handle handle, u32 event, void *context)
 		if (event == 0xc0 || !priv->array)
 			return;
 
-		if (!sparse_keymap_entry_from_scancode(priv->array, event)) {
+		ke = sparse_keymap_entry_from_scancode(priv->array, event);
+		if (!ke) {
 			dev_info(&device->dev, "unknown event 0x%x\n", event);
 			return;
 		}
 
+		if (ke->type == KE_IGNORE)
+			return;
+
 wakeup:
 		pm_wakeup_hard_event(&device->dev);
 
-- 
2.42.0

^ permalink raw reply related

* Re: [PATCH 02/18] gpio: cros-ec: provide ID table for avoiding fallback match
From: Linus Walleij @ 2024-04-04 11:32 UTC (permalink / raw)
  To: Tzung-Bi Shih
  Cc: bleung, groeck, brgl, hverkuil-cisco, mchehab, sre,
	alexandre.belloni, chrome-platform, pmalani, linux-gpio,
	linux-media, linux-pm, linux-rtc, krzk
In-Reply-To: <20240329075630.2069474-3-tzungbi@kernel.org>

On Fri, Mar 29, 2024 at 8:57 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:

> Instead of using fallback driver name match, provide ID table[1] for the
> primary match.  Also allow automatic module loading by adding
> MODULE_DEVICE_TABLE().
>
> [1]: https://elixir.bootlin.com/linux/v6.8/source/drivers/base/platform.c#L1353
>
> Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>

Acked-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij

^ permalink raw reply

* Re: [CfP] LPC 2024 - Power Management and Thermal Control Micro-Conference
From: Ulf Hansson @ 2024-04-04 11:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Linux PM, Dhruva Gole, Nikunj Kela, Kevin Hilman, Doug Anderson
In-Reply-To: <CAJZ5v0i+G7q0jxLGEnoigWf8Aa=zi4esC3EzethsBkxrp=sCLA@mail.gmail.com>

+ Dhruva, Nikunj, Kevin, Doug (I probably forgot some of the last
years attendants, please loop them if you know)

On Mon, 25 Mar 2024 at 12:37, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> Hi Everyone,
>
> This year the Linux Plumbers Conference will be held in Vienna,
> Austria, on Wednesday 18th,  Thursday 19th and Friday 20th of
> September (http://linuxplumbersconf.org/).
>
> As it has been a tradition for the last few years, I'm planning to
> submit a proposal for a Power Management and Thermal Control
> Micro-Conference to the LPC and I'm looking for topics that people
> would like to discuss in that MC.

Thanks for continuing to drive this! If you need some additional help
around this, feel free to reach out to me.

>
> The LPC is focused on work in progress and future developments, so any
> work that has been completed already is not a suitable topic for it,
> but if there's anything you'd like to do or you are planning or
> considering in the power management and thermal control space in
> Linux, please let me know.

We had a BoF last year around "supporting multiple low power states
for system wide suspend".

Not so much progress has been made I believe? (Except for some local
hacks that I am working on).

If possible, I think it would be a good opportunity to continue the
discussions from last year, assuming people are still interested in
this topic.

Kind regards
Uffe

^ permalink raw reply

* Re: [PATCH v3 0/6] thermal: More separation between the core and drivers
From: Lukasz Luba @ 2024-04-04 11:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: LKML, Srinivas Pandruvada, Daniel Lezcano,
	AngeloGioacchino Del Regno, Rafael J. Wysocki, Linux PM
In-Reply-To: <CAJZ5v0iSyTP4SbGBYESNy9NynMCQn8dojCFoOtQU4Q305ZKGTQ@mail.gmail.com>



On 4/2/24 20:42, Rafael J. Wysocki wrote:
> On Tue, Apr 2, 2024 at 9:04 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>>
>> Hi Everyone,
>>
>> This is an update of
>>
>> https://lore.kernel.org/linux-pm/4558384.LvFx2qVVIh@kreacher/
>>
>> and
>>
>> https://lore.kernel.org/linux-pm/2331888.ElGaqSPkdT@kreacher/
>>
>> which rebases the first patch on top of 6.9-rc2, adds 3 patches and adjusts
>> the third patch from v2.
>>
>> The original description of the first two patches still applies:
>>
>>> Patch [1/2] is based on the observation that the threshold field in struct
>>> thermal_trip really should be core-internal and to make that happen it
>>> introduces a wrapper structure around struct thermal_trip for internal
>>> use in the core.
>>>
>>> Patch [2/2] moves the definition of the new structure and the struct
>>> thermal_zone_device one to a local header file in the core to enforce
>>> more separation between the core and drivers.
>>>
>>> The patches are not expected to introduce any observable differences in
>>> behavior, so please let me know if you see any of that.
>>
>> Note that these patches were first sent before the merge window and have not
>> really changed since then (except for a minor rebase of the first patch in
>> this series).  Moreover, no comments regarding the merit of these patches
>> have been made shared, so if this continues, I will be considering them as
>> good to go by the end of this week.
>>
>> Patch [3/6] is a rewrite of comments regarding trip crossing and threshold
>> computations.
>>
>> Patch [4/6] updates the trip crossing detection code to consolidate the
>> threshold initialization with trip crossing on the way up.
>>
>> Patch [5/6] ([3/3] in v2) adds a mechanism to sort notifications and debug
>> calls taking place during one invocation of __thermal_zone_device_update() so
>> they always go in temperature order.
>>
>> Patch [6/6] relocates the critical and trip point handling to avoid a
>> redundant temperature check.
>>
>> The series applies on top of 6.9-rc2 and I'm planning to create a test
>> branch containing it.
> 
> As promised:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/log/?h=thermal-core-testing

Thanks Rafael for the handy branch. I'll play with it today.

Regards,
Lukasz

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox