From: Mikko Perttunen <mperttunen@nvidia.com>
To: Aaron Kling <webgeek1234@gmail.com>
Cc: Michael Turquette <mturquette@baylibre.com>,
Stephen Boyd <sboyd@kernel.org>, Rob Herring <robh@kernel.org>,
Krzysztof Kozlowski <krzk+dt@kernel.org>,
Conor Dooley <conor+dt@kernel.org>,
Thierry Reding <thierry.reding@gmail.com>,
Jonathan Hunter <jonathanh@nvidia.com>,
Joseph Lo <josephl@nvidia.com>,
Peter De Schrijver <pdeschrijver@nvidia.com>,
Prashant Gaikwad <pgaikwad@nvidia.com>,
linux-clk@vger.kernel.org, devicetree@vger.kernel.org,
linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org,
Thierry Reding <treding@nvidia.com>
Subject: Re: [PATCH 5/5] arm64: tegra: Limit max cpu frequency on P3450
Date: Thu, 04 Sep 2025 09:55:58 +0900
Message-ID: <8194755.G0QQBjFxQf@senjougahara>
In-Reply-To: <CALHNRZ894WcNaAuLFoDLwJ8mXDRM8PzdqRFzcyYUMPy+0q0nMw@mail.gmail.com>
On Wednesday, September 3, 2025 5:01 PM Aaron Kling wrote:
> On Wed, Sep 3, 2025 at 2:29 AM Mikko Perttunen <mperttunen@nvidia.com> wrote:
> >
> > On Wednesday, September 3, 2025 3:28 PM Aaron Kling wrote:
> > > On Wed, Sep 3, 2025 at 12:50 AM Mikko Perttunen <mperttunen@nvidia.com> wrote:
> > > >
> > > > On Saturday, August 16, 2025 2:53 PM Aaron Kling via B4 Relay wrote:
> > > > > From: Aaron Kling <webgeek1234@gmail.com>
> > > > >
> > > > > P3450's cpu is only rated for 1.4 GHz while the CVB table it uses tries
> > > > > to scale to 1.5 GHz. Set an appropriate limit on the maximum scaling
> > > > > frequency.
> > > >
> > > > Looking at downstream, from what I can tell, the CPU's maximum frequency is indeed 1.55GHz under normal conditions. However, at temperatures over 90C, its voltage is limited to 1090mV. Reference:
> > > >
> > > > static struct dvfs_therm_limits
> > > > tegra210_core_therm_caps_ucm2[MAX_THERMAL_LIMITS] = {
> > > > {86, 1090},
> > > > {0, 0},
> > > > };
> > > > (rel-32 kernel-4.9/drivers/soc/tegra/tegra210-dvfs.c)
> > > >
> > > > Here the throttling is set at 86C, I suppose to give some margin.
> > > >
> > > > 1090mV perfectly matches the 1.479GHz operating point defined in the upstream kernel. So it seems to me that rather than setting a maximum frequency, we would need temperature dependent DVFS. Or, at least as a first step, we could have the driver just always limit the maximum frequency so it fits under the thermal cap voltage -- the temperature limit is rather high, after all.
> > > >
> > > > If you have other information, please do tell.
> > >
> > > I am basing this on the following line in the downstream porg dt repo:
> > >
> > > nvidia,dfll-max-freq-khz = <1479000>;
> > > (tegra-l4t-r32.7.6_good kernel-dts/tegra210-porg-p3448-common.dtsi)
> > >
> > > Which in the downstream dfll driver limits the max frequency it will use:
> > >
> > > max_freq = fcpu_data->cpu_max_freq_table[speedo_id];
> > > if (!of_property_read_u32(pdev->dev.of_node, "nvidia,dfll-max-freq-khz",
> > > &f))
> > > max_freq = min(max_freq, f * 1000UL);
> > > (tegra-l4t-r32.7.6_good drivers/clk/tegra/clk-tegra124-dfll-fcpu.c)
> > >
> > > If I read the commit history correctly, it does appear that this limit
> > > was set because the always-on use case was failing thermal tests. I
> > > couldn't say if it was intentional that this throttling was applied to
> > > all use cases or not, but that is what appears to have happened. Hence
> > > trying to replicate here in an effort to squash stability issues.
> >
> > I can't see any reference to failing thermal tests. Can you point to the commit?
>
> In the porg dt repo, commit hash d1326f08, which adds the
> nvidia,dfll-max-freq-khz property, the message body states: "Set
> CPU/GPU Fmax limit for 24x7 105C UCM." I read that to mean that the
> 24x7 always-on use case model was failing to stay under 105C unless
> the cpu and gpu frequencies were limited. Is that an incorrect
> reading? 105C is kind of a crazy number anyways, beyond the soctherm
> critical shutdown temperature.
What that is (trying) to say is that it sets the CPU's Fmax to the limit specified by the 24x7 105C UCM profile, which is the 1090mV, i.e. 1.4GHz, limit. The profile is called that because it is normally used for the 90C-105C temperature range.
>
> > I looked into why this was added for porg -- it does not seem to be related to reliability, but more so consistency of performance. I don't think that's a huge concern for upstream -- though in any case we should be capping the frequency in the DFLL driver for now since we don't support dynamic thermal capping.
>
> So the whole conversation winds around to: The change is valid, but
> the commit message needs better justification?
In my opinion, there is no need to add the device tree property in upstream. The CPU is designed to work at 1.5GHz below 90C, and at 1.4GHz between 90C and 105C. The property is a bit of a downstream-ism. If the user wants to underclock, that should be done through the cpufreq governor or a similar mechanism.
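To make the voltage-cap reasoning concrete, here is a minimal standalone sketch of temperature-dependent capping. It is not the downstream or upstream driver code: the struct names, the function names and the table layout are hypothetical, and only the 86C/1090mV cap and the 1479000 kHz at 1090 mV pairing come from this thread. The other table entries are placeholders.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical operating point: frequency in kHz, required voltage in mV. */
struct opp {
	unsigned long freq_khz;
	unsigned int mv;
};

/* Thermal cap: at or above temp_c, the CPU voltage must not exceed cap_mv. */
struct therm_cap {
	int temp_c;
	unsigned int cap_mv;
};

/*
 * Illustrative table. Only the 1479000 kHz / 1090 mV entry is taken from
 * the thread; the other two rows are made-up placeholders, not CVB data.
 */
static const struct opp opps[] = {
	{ 1224000, 1000 },
	{ 1479000, 1090 },
	{ 1555000, 1130 },
};

/* Mirrors the {86, 1090} entry quoted from the downstream kernel. */
static const struct therm_cap caps[] = {
	{ 86, 1090 },
};

/* Return the tightest voltage cap active at temp_c, or UINT_MAX if none. */
static unsigned int cap_mv_for_temp(int temp_c)
{
	unsigned int cap = UINT_MAX;
	size_t i;

	for (i = 0; i < sizeof(caps) / sizeof(caps[0]); i++) {
		if (temp_c >= caps[i].temp_c && caps[i].cap_mv < cap)
			cap = caps[i].cap_mv;
	}
	return cap;
}

/* Return the highest frequency whose voltage fits under cap_mv, or 0. */
static unsigned long max_freq_under_cap(unsigned int cap_mv)
{
	unsigned long best = 0;
	size_t i;

	for (i = 0; i < sizeof(opps) / sizeof(opps[0]); i++) {
		if (opps[i].mv <= cap_mv && opps[i].freq_khz > best)
			best = opps[i].freq_khz;
	}
	return best;
}
```

With this placeholder table, max_freq_under_cap(cap_mv_for_temp(t)) yields the top entry for temperatures below 86C and 1479000 kHz from 86C upwards. The simpler first step discussed earlier in the thread would amount to always calling max_freq_under_cap(1090) regardless of temperature.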
>
> As a side note: I'm still chasing multiple stability issues on various
> t210 devices. Though, the only one I've seen on p3450/p3541 is that
> nouveau intermittently fails to init the gpu. Just hangs on probe and
> eventually something times out, stack traces, and causes a panic
> reboot. Seems to be about a 50/50 chance for me, but works fine if
> probe succeeds. For another dev, it only works once in a blue moon,
> but still dies shortly thereafter even if probe works. I thought it
> might be related to the cpu/gpu getting 'overclocked'. But even after
> this series, the problem persists. So maybe me calling this underclock
> a stability fix is inaccurate. But stability issues still exist.
Good to know. It doesn't strike me as a CPU issue -- the first place I'd look is nouveau's init code itself, to see what is failing. There are a lot of potential software issues that can cause intermittent failures during GPU boot. If it is power related, I'd suspect the GPU or SOC rail.
Thanks,
Mikko
>
> Aaron
Thread overview: 16+ messages
2025-08-16 5:53 [PATCH 0/5] Properly Limit Tegra210 Clock Rates Aaron Kling via B4 Relay
2025-08-16 5:53 ` [PATCH 1/5] dt-bindings: clock: tegra124-dfll: Add property to limit frequency Aaron Kling via B4 Relay
2025-08-16 8:21 ` Krzysztof Kozlowski
2025-08-18 3:23 ` Aaron Kling
2025-08-18 6:31 ` Krzysztof Kozlowski
2025-08-16 5:53 ` [PATCH 2/5] soc: tegra: fuse: speedo-tegra210: Update speedo ids Aaron Kling via B4 Relay
2025-09-03 6:39 ` Mikko Perttunen
2025-08-16 5:53 ` [PATCH 3/5] soc: tegra: fuse: speedo-tegra210: Add sku 0x8F Aaron Kling via B4 Relay
2025-08-16 5:53 ` [PATCH 4/5] clk: tegra: dfll: Support limiting max clock per device Aaron Kling via B4 Relay
2025-08-16 5:53 ` [PATCH 5/5] arm64: tegra: Limit max cpu frequency on P3450 Aaron Kling via B4 Relay
2025-09-03 5:50 ` Mikko Perttunen
2025-09-03 6:28 ` Aaron Kling
2025-09-03 7:29 ` Mikko Perttunen
2025-09-03 8:01 ` Aaron Kling
2025-09-04 0:55 ` Mikko Perttunen [this message]
2025-09-04 1:55 ` Aaron Kling