* [PATCH v1 0/2] thermal/debugfs: Fix and clean up trip statistics collection
@ 2024-04-15 18:59 Rafael J. Wysocki
  2024-04-15 19:02 ` [PATCH v1 1/2] thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up() Rafael J. Wysocki
  2024-04-15 19:03 ` [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates Rafael J. Wysocki
  0 siblings, 2 replies; 4+ messages in thread
From: Rafael J. Wysocki @ 2024-04-15 18:59 UTC
To: Linux PM; +Cc: LKML, Lukasz Luba, Daniel Lezcano

Hi Everyone,

This series fixes a possible kernel crash in thermal_debug_tz_trip_up() and
reduces some code duplication between this function and
thermal_debug_update_temp().

The plan is to push the fix (patch [1/2]) for 6.9-rc and apply the cleanup
for 6.10 when the fix reaches the mainline.

Thanks!

* [PATCH v1 1/2] thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up()
From: Rafael J. Wysocki @ 2024-04-15 19:02 UTC
To: Linux PM; +Cc: LKML, Lukasz Luba, Daniel Lezcano

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The count field in struct trip_stats, representing the number of times
the zone temperature was above the trip point, needs to be incremented
in thermal_debug_tz_trip_up(), for two reasons.

First, if a trip point is crossed on the way up for the first time,
thermal_debug_update_temp() called from update_temperature() does
not see it because it has not been added to the trips_crossed[] array
in the thermal zone's struct tz_debugfs object yet.  Therefore, when
thermal_debug_tz_trip_up() is called after that, the trip point's
count value is 0, and the attempt to divide by it during the average
temperature computation leads to a divide error which causes the kernel
to crash.  Setting the count to 1 before the division by incrementing it
fixes this problem.

Second, if a trip point is crossed on the way up, but it has been
crossed on the way up already before, its count value needs to be
incremented to make a record of the fact that the zone temperature is
above the trip now.  Without doing that, if the mitigations applied
after crossing the trip cause the zone temperature to drop below its
threshold, the count will not be updated for this episode at all and
the average temperature in the trip statistics record will be somewhat
smaller than it should be.

Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc: 6.8+ <stable@vger.kernel.org> # 6.8+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_debugfs.c |    1 +
 1 file changed, 1 insertion(+)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -616,6 +616,7 @@ void thermal_debug_tz_trip_up(struct the
 	tze->trip_stats[trip_id].timestamp = now;
 	tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
 	tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
+	tze->trip_stats[trip_id].count++;
 	tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
 		(temperature - tze->trip_stats[trip_id].avg) /
 		tze->trip_stats[trip_id].count;

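The running-average update that patch 1/2 repairs can be tried outside the kernel. What follows is only a minimal userspace sketch, not the kernel code: struct demo_trip_stats, demo_stats_init() and demo_stats_add() are hypothetical names, and the INT_MIN/INT_MAX starting values for max and min are an assumption about how an empty record is initialized. The point it illustrates is that count is incremented before it is used as a divisor, so the first sample divides by 1 rather than 0.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct trip_stats (no timestamp). */
struct demo_trip_stats {
	int max;
	int min;
	int avg;
	int count;
};

/* Start an empty record: no samples yet (initial values are an assumption). */
static void demo_stats_init(struct demo_trip_stats *s)
{
	assert(s != NULL);
	s->max = INT_MIN;
	s->min = INT_MAX;
	s->avg = 0;
	s->count = 0;
}

/*
 * Fold one temperature sample (millidegrees Celsius) into the record.
 * count is incremented first, so the divisor below is never 0.
 */
static void demo_stats_add(struct demo_trip_stats *s, int temperature)
{
	assert(s != NULL);
	if (temperature > s->max)
		s->max = temperature;
	if (temperature < s->min)
		s->min = temperature;
	s->count++;
	s->avg = s->avg + (temperature - s->avg) / s->count;
}
```

Moving the `s->count++` line below the avg computation reproduces the situation described in the changelog: the very first sample would divide by zero.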
* [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates
From: Rafael J. Wysocki @ 2024-04-15 19:03 UTC
To: Linux PM; +Cc: LKML, Lukasz Luba, Daniel Lezcano

From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The code updating a trip_stats entry in thermal_debug_tz_trip_up()
and thermal_debug_update_temp() is almost entirely duplicate, so move
it to a new helper function that will be called from both these places.

While at it, drop a redundant tz_dbg->nr_trips check and a label related
to it from thermal_debug_update_temp().

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_debugfs.c |   42 +++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 23 deletions(-)

Index: linux-pm/drivers/thermal/thermal_debugfs.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_debugfs.c
+++ linux-pm/drivers/thermal/thermal_debugfs.c
@@ -539,6 +539,19 @@ static struct tz_episode *thermal_debugf
 	return tze;
 }
 
+static struct trip_stats *update_tz_episode(struct tz_debugfs *tz_dbg,
+					    int trip_id, int temperature)
+{
+	struct tz_episode *tze = list_first_entry(&tz_dbg->tz_episodes,
+						  struct tz_episode, node);
+	struct trip_stats *trip_stats = &tze->trip_stats[trip_id];
+
+	trip_stats->max = max(trip_stats->max, temperature);
+	trip_stats->min = min(trip_stats->min, temperature);
+	trip_stats->avg += (temperature - trip_stats->avg) / ++trip_stats->count;
+	return trip_stats;
+}
+
 void thermal_debug_tz_trip_up(struct thermal_zone_device *tz,
 			      const struct thermal_trip *trip)
 {
@@ -547,6 +560,7 @@ void thermal_debug_tz_trip_up(struct the
 	struct thermal_debugfs *thermal_dbg = tz->debugfs;
 	int temperature = tz->temperature;
 	int trip_id = thermal_zone_trip_id(tz, trip);
+	struct trip_stats *trip_stats;
 	ktime_t now = ktime_get();
 
 	if (!thermal_dbg)
@@ -612,14 +626,8 @@ void thermal_debug_tz_trip_up(struct the
 	 */
 	tz_dbg->trips_crossed[tz_dbg->nr_trips++] = trip_id;
 
-	tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
-	tze->trip_stats[trip_id].timestamp = now;
-	tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, temperature);
-	tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, temperature);
-	tze->trip_stats[trip_id].count++;
-	tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
-		(temperature - tze->trip_stats[trip_id].avg) /
-		tze->trip_stats[trip_id].count;
+	trip_stats = update_tz_episode(tz_dbg, trip_id, temperature);
+	trip_stats->timestamp = now;
 
 unlock:
 	mutex_unlock(&thermal_dbg->lock);
@@ -686,9 +694,8 @@ out:
 void thermal_debug_update_temp(struct thermal_zone_device *tz)
 {
 	struct thermal_debugfs *thermal_dbg = tz->debugfs;
-	struct tz_episode *tze;
 	struct tz_debugfs *tz_dbg;
-	int trip_id, i;
+	int i;
 
 	if (!thermal_dbg)
 		return;
@@ -697,20 +704,9 @@ void thermal_debug_update_temp(struct th
 
 	tz_dbg = &thermal_dbg->tz_dbg;
 
-	if (!tz_dbg->nr_trips)
-		goto out;
+	for (i = 0; i < tz_dbg->nr_trips; i++)
+		update_tz_episode(tz_dbg, tz_dbg->trips_crossed[i], tz->temperature);
 
-	for (i = 0; i < tz_dbg->nr_trips; i++) {
-		trip_id = tz_dbg->trips_crossed[i];
-		tze = list_first_entry(&tz_dbg->tz_episodes, struct tz_episode, node);
-		tze->trip_stats[trip_id].count++;
-		tze->trip_stats[trip_id].max = max(tze->trip_stats[trip_id].max, tz->temperature);
-		tze->trip_stats[trip_id].min = min(tze->trip_stats[trip_id].min, tz->temperature);
-		tze->trip_stats[trip_id].avg = tze->trip_stats[trip_id].avg +
-			(tz->temperature - tze->trip_stats[trip_id].avg) /
-			tze->trip_stats[trip_id].count;
-	}
-out:
 	mutex_unlock(&thermal_dbg->lock);
 }
 

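The shape of the refactoring in patch 2/2 (one helper that returns the updated record, called from both the trip-crossing path and the periodic temperature path) can be sketched in plain userspace C. Everything below is hypothetical and simplified: the sketch_* names, the fixed-size arrays, the long long timestamp and the absence of locking and of the episode list are assumptions made for illustration and do not mirror the kernel data structures.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_MAX_TRIPS 4

/* Hypothetical per-trip record, loosely modeled on struct trip_stats. */
struct sketch_trip_stats {
	long long timestamp;	/* written only by the trip-up path */
	int max;
	int min;
	int avg;
	int count;
};

/* Hypothetical zone state: per-trip records plus the list of crossed trips. */
struct sketch_zone {
	struct sketch_trip_stats trip_stats[SKETCH_MAX_TRIPS];
	int trips_crossed[SKETCH_MAX_TRIPS];
	int nr_trips;
};

static void sketch_zone_init(struct sketch_zone *z)
{
	int i;

	z->nr_trips = 0;
	for (i = 0; i < SKETCH_MAX_TRIPS; i++) {
		z->trip_stats[i].timestamp = 0;
		z->trip_stats[i].max = INT_MIN;
		z->trip_stats[i].min = INT_MAX;
		z->trip_stats[i].avg = 0;
		z->trip_stats[i].count = 0;
	}
}

/* Shared update path: fold one sample in and hand the record back. */
static struct sketch_trip_stats *sketch_update(struct sketch_zone *z,
					       int trip_id, int temperature)
{
	struct sketch_trip_stats *ts;

	assert(trip_id >= 0 && trip_id < SKETCH_MAX_TRIPS);
	ts = &z->trip_stats[trip_id];
	if (temperature > ts->max)
		ts->max = temperature;
	if (temperature < ts->min)
		ts->min = temperature;
	ts->avg += (temperature - ts->avg) / ++ts->count;
	return ts;
}

/* Caller 1: a trip is crossed on the way up; also records the timestamp. */
static void sketch_trip_up(struct sketch_zone *z, int trip_id,
			   int temperature, long long now)
{
	assert(z->nr_trips < SKETCH_MAX_TRIPS);
	z->trips_crossed[z->nr_trips++] = trip_id;
	sketch_update(z, trip_id, temperature)->timestamp = now;
}

/* Caller 2: a new zone temperature; the loop is a no-op when nr_trips is 0. */
static void sketch_update_temp(struct sketch_zone *z, int temperature)
{
	int i;

	for (i = 0; i < z->nr_trips; i++)
		sketch_update(z, z->trips_crossed[i], temperature);
}
```

Returning the record from the helper lets the trip-up caller set the timestamp without a second lookup, and the for loop's own bounds check is what makes a separate nr_trips test redundant.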
* Re: [PATCH v1 2/2] thermal/debugfs: Add helper function for trip stats updates
From: Rafael J. Wysocki @ 2024-04-17  9:35 UTC
To: Rafael J. Wysocki; +Cc: Linux PM, LKML, Lukasz Luba, Daniel Lezcano

On Mon, Apr 15, 2024 at 9:03 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>
> The code updating a trip_stats entry in thermal_debug_tz_trip_up()
> and thermal_debug_update_temp() is almost entirely duplicate, so move
> it to a new helper function that will be called from both these places.
>
> While at it, drop a redundant tz_dbg->nr_trips check and a label related
> to it from thermal_debug_update_temp().
>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

In the meantime I realized that thermal_debug_update_temp() made
false-positive updates of trips involved in the current mitigation
episode in the cases when they were crossed on the way down.

Namely, in that case the zone temperature is already below the low
temperature of the trip, but that is only recorded by
thermal_debug_tz_trip_down() that is called after
thermal_debug_update_temp().

For this reason, I'm withdrawing this patch and I will send
replacement patches later today.

Thanks!

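The false-positive update described in the reply comes down to a sample being attributed to a trip whose low temperature the zone has already dropped below. The sketch below only illustrates that condition. It is not the replacement patch, which is not part of this thread, and every name in it is hypothetical. It also assumes that the "low temperature" of a trip is its threshold minus its hysteresis and that a sample exactly at the low temperature still belongs to the trip.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical trip description, values in millidegrees Celsius. */
struct sketch_trip {
	int temperature;	/* trip threshold */
	int hysteresis;
};

/* Assumed definition of the trip's low temperature. */
static int sketch_trip_low(const struct sketch_trip *trip)
{
	assert(trip->hysteresis >= 0);
	return trip->temperature - trip->hysteresis;
}

/*
 * True if a zone temperature sample should still be attributed to the trip.
 * A sample below the low temperature means the trip has effectively been
 * crossed on the way down, even if that has not been recorded yet.
 */
static bool sketch_sample_belongs(const struct sketch_trip *trip, int zone_temp)
{
	return zone_temp >= sketch_trip_low(trip);
}
```

With a check of this kind, an update pass that runs before the trip-down bookkeeping could skip trips the zone has already left. Whether the replacement patches take this route or reorder the calls instead is not stated in the thread.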