public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Daniel Lezcano <daniel.lezcano@linaro.org>
To: Thierry Reding <thierry.reding@gmail.com>
Cc: Wei Ni <wni@nvidia.com>, "Rafael J. Wysocki" <rafael@kernel.org>,
	Jon Hunter <jonathanh@nvidia.com>,
	Johan Hovold <johan@kernel.org>,
	Philipp Zabel <p.zabel@pengutronix.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	"linux-tegra@vger.kernel.org" <linux-tegra@vger.kernel.org>,
	Linux PM mailing list <linux-pm@vger.kernel.org>
Subject: Re: thermal/drivers/tegra: Getting rid of the get_thermal_instance() usage
Date: Fri, 10 Feb 2023 14:17:03 +0100	[thread overview]
Message-ID: <20230210131703.GF175687@linaro.org> (raw)
In-Reply-To: <Y9J4WAFyXyV/nqlG@orome>

Hi Thierry,

On Thu, Jan 26, 2023 at 01:55:52PM +0100, Thierry Reding wrote:
> On Tue, Jan 24, 2023 at 08:57:23PM +0100, Daniel Lezcano wrote:
> > 
> > Hi,
> > 
> > does anyone know what is the purpose of the get_thermal_instance() usage in
> > this code:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux.git/tree/drivers/thermal/tegra/soctherm.c?h=thermal/linux-next#n623
> > 
> > The driver is using a function which is reserved for the thermal core. It
> > should not.
> > 
> > Is the following change ok ?
> > 
> > diff --git a/drivers/thermal/tegra/soctherm.c
> > b/drivers/thermal/tegra/soctherm.c
> > index 220873298d77..5f552402d987 100644
> > --- a/drivers/thermal/tegra/soctherm.c
> > +++ b/drivers/thermal/tegra/soctherm.c
> > @@ -620,9 +620,8 @@ static int tegra_thermctl_set_trip_temp(struct
> > thermal_zone_device *tz, int trip
> >  				continue;
> > 
> >  			cdev = ts->throt_cfgs[i].cdev;
> > -			if (get_thermal_instance(tz, cdev, trip_id))
> > -				stc = find_throttle_cfg_by_name(ts, cdev->type);
> > -			else
> > +			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > +			if (!stc)
> >  				continue;
> > 
> >  			return throttrip_program(dev, sg, stc, temp);
> > @@ -768,9 +767,9 @@ static int tegra_soctherm_set_hwtrips(struct device
> > *dev,
> >  			continue;
> > 
> >  		cdev = ts->throt_cfgs[i].cdev;
> > -		if (get_thermal_instance(tz, cdev, trip))
> > -			stc = find_throttle_cfg_by_name(ts, cdev->type);
> > -		else
> > +
> > +		stc = find_throttle_cfg_by_name(ts, cdev->type);
> > +		if (!stc)
> >  			continue;
> > 
> >  		ret = throttrip_program(dev, sg, stc, temperature);
> 
> There's a small difference in behavior after applying this patch. Prior
> to this I get (on Tegra210):
> 
> 	[   12.354091] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.379009] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.388882] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.401007] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.471041] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.482852] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.482860] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.485357] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.501774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 
> and after these changes, it turns into:
> 
> 	[   12.447113] tegra_soctherm 700e2000.thermal-sensor: missing thermtrips, will use critical trips as shut down temp
> 	[   12.472300] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when cpu reaches 102500 mC
> 	[   12.481789] tegra_soctherm 700e2000.thermal-sensor: programming throttle for cpu to 102500
> 	[   12.495447] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when cpu reaches 102500 mC
> 	[   12.496514] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when gpu reaches 103000 mC
> 	[   12.510353] tegra_soctherm 700e2000.thermal-sensor: programming throttle for gpu to 103000
> 	[   12.526856] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when gpu reaches 103000 mC
> 	[   12.528774] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when pll reaches 103000 mC
> 	[   12.569352] tegra_soctherm 700e2000.thermal-sensor: programming throttle for pll to 103000
> 	[   12.577635] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when pll reaches 103000 mC
> 	[   12.590952] tegra_soctherm 700e2000.thermal-sensor: thermtrip: will shut down when mem reaches 103000 mC
> 	[   12.600783] tegra_soctherm 700e2000.thermal-sensor: programming throttle for mem to 103000
> 	[   12.609204] tegra_soctherm 700e2000.thermal-sensor: throttrip: will throttle when mem reaches 103000 mC
> 
> The "programming throttle ..." messages are something I've added locally
> to trace what gets called. So it looks like for "pll" and "mem" thermal
> zones, we now program trip points whereas we previously didn't.

The DT descriptioni (tegra210.dtsi) says one thing and the implementation says
something else.

If we refer to the PLL description, there is one 'hot' trip point and
one 'critical' trip point. No polling delay at all, so we need the
interrupts.

Logically, we should set the 'hot' trip point first, when the trip
point is crossed, we setup the next trip point, which is the critical.

With these two trip points, the first one will send a notification to
the userspace and the second one will force a shutdown of the
system. For both, no cooling device is expected.

Well, actually I don't get the logic of the soctherm driver. It should
just rely on the thermal framework to set the trip point regardless
the cooling devices.

The device tree also is strange. For example, the dram sets
cooling-device = <&emc 0 0>; an inoperative action for a 'nominal'
trip point ... If the goal is to stop the mitigation, that is already
done by the governor when the trip point is crossed the way down. The
second trip point is an 'active' cooling device but it refers to a emc
which is, at the first glance, a passive cooling device.

The gpu description only describes hot and critical trip points. The
cooling device maps to the 'hot' trip point ! The governor is not used
in this case, so the cooling device is inoperative. Same for the cpu
thermal zone.

IOW, the driver is not correctly implemented and the device tree is
wrong. Thermal is not working correctly on these board AFAICT.



-- 

 <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter |
<http://www.linaro.org/linaro-blog/> Blog

  parent reply	other threads:[~2023-02-10 13:17 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-24 19:57 thermal/drivers/tegra: Getting rid of the get_thermal_instance() usage Daniel Lezcano
2023-01-26 12:55 ` Thierry Reding
2023-01-26 15:37   ` Daniel Lezcano
2023-02-06 14:50   ` Daniel Lezcano
2023-02-07 12:18     ` Thierry Reding
2023-02-07 12:38       ` Daniel Lezcano
2023-02-07 14:27         ` Thierry Reding
2023-02-10 13:17   ` Daniel Lezcano [this message]
2023-02-10 14:09     ` Thierry Reding
2023-02-10 14:36       ` Daniel Lezcano
2023-02-10 15:12         ` Thierry Reding
2023-04-11 10:48           ` Daniel Lezcano
2023-04-11 16:30             ` Thierry Reding
2023-03-08 17:21       ` Daniel Lezcano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230210131703.GF175687@linaro.org \
    --to=daniel.lezcano@linaro.org \
    --cc=johan@kernel.org \
    --cc=jonathanh@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=linux-tegra@vger.kernel.org \
    --cc=p.zabel@pengutronix.de \
    --cc=rafael@kernel.org \
    --cc=thierry.reding@gmail.com \
    --cc=wni@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox