linux-pm.vger.kernel.org archive mirror
From: "Zhang, Rui" <rui.zhang@intel.com>
To: Doug Smythies <dsmythies@telus.net>
Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"srinivas.pandruvada@linux.intel.com" 
	<srinivas.pandruvada@linux.intel.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver
Date: Mon, 18 Jan 2021 09:46:30 +0000
Message-ID: <e04c36aae6eb4cbb9b99799290016d58@intel.com>
In-Reply-To: <002601d6ec2a$36663da0$a332b8e0$@net>

Hi, Doug,

Thanks for testing this patch.

> -----Original Message-----
> From: Doug Smythies <dsmythies@telus.net>
> Sent: Sunday, January 17, 2021 1:08 AM
> To: Zhang, Rui <rui.zhang@intel.com>
> Cc: daniel.lezcano@linaro.org; srinivas.pandruvada@linux.intel.com;
> linux-pm@vger.kernel.org
> Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver
> Importance: High
> 
> On 2021.01.15 Zhang Rui wrote:
> >
> > On Intel processors, the core frequency can be reduced below OS
> > request, when the current temperature reaches the TCC (Thermal Control
> > Circuit) activation temperature.
> >
> > The default TCC activation temperature is specified by
> > MSR_IA32_TEMPERATURE_TARGET. However, it can be adjusted by specifying
> > an offset in degrees C, using the TCC Offset bits in the same MSR register.
> >
> > This patch introduces a cooling device driver that utilizes the TCC
> > Offset feature. The higher the current cooling state, the lower the
> > effective TCC activation temperature, so that the processor is
> > throttled earlier, before the system overheats critically.
> 
> Thank you for this useful patch.
> My systems don't need thermald or any other thermal control, but it is nice
> to have this extra margin to add to the critical stuff, as a backup.
> I also like to use the offset to test stuff.
> 
> I use the internal power limit servo for power limiting, and that servo works
> very well indeed. Using this temperature offset as a way to servo the
> thermal operating limit does work, but it tends to overshoot, oscillate, and
> hold the frequency low for excessively long (minutes).

Do you have a script to test and show the drawbacks of this feature?
It seems that it behaves differently on different platforms.
Maybe we can evaluate this on more platforms.
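
For reference, a minimal sketch of the kind of helper such a test script could use, assuming turbostat is run with --Summary and prints a header row as in the logs quoted below. The function name `summarize` and the `trip_c` parameter are hypothetical and not part of turbostat or of this patch. It skips hand-written "<" annotations and reports the peak package temperature and the number of samples above the trip point.

```python
def summarize(lines, trip_c):
    """Return (peak PkgTmp, samples above trip_c) from turbostat --Summary rows."""
    header = None
    temps = []
    for line in lines:
        # Drop trailing hand-written annotations such as "< load removed."
        fields = line.split("<", 1)[0].split()
        if not fields:
            continue
        if header is None:
            if "PkgTmp" in fields:
                header = fields  # column names, e.g. Busy% Bzy_MHz IRQ PkgTmp ...
            continue
        if len(fields) != len(header):
            continue  # not a data row
        try:
            temps.append(float(fields[header.index("PkgTmp")]))
        except ValueError:
            continue  # non-numeric cell, skip the row
    if not temps:
        raise ValueError("no PkgTmp samples found")
    return max(temps), sum(t > trip_c for t in temps)


sample = [
    "Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt",
    "0.04    800     29      23      1.89    0.00",
    "67.98   4572    4492    67      121.00  0.00 < annotation",
    "68.08   4262    4476    51      82.82   0.00",
]
peak, above = summarize(sample, trip_c=55)
print(peak, above)
```

Running the same helper over captured turbostat output from several machines would make the overshoot and recovery behavior easier to compare across platforms.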

> It also seems to limit CPU clock frequency
> reduction to the non-turbo limit, regardless of the desired maximum
> temperature.
> 
> I am not familiar with the thermal stuff at all, and didn't know where to find
> the trip point knob. Anyway, I found "cooling_device11".
> 
> I do not understand this:
> 
> ~$ cat /sys/devices/virtual/thermal/cooling_device11/stats/trans_table
> cat: /sys/devices/virtual/thermal/cooling_device11/stats/trans_table: File too large

This is a known issue: the stats table cannot handle devices with too many
cooling states, for example the 127 cooling states of the TCC Offset cooling
device.
We can ignore this for now.
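
A rough back-of-the-envelope check of why the table gets too large, as a sketch only. It assumes the table prints one cell per (from, to) state pair, that each cell takes about 9 characters, and that a single sysfs read is limited to one 4096-byte page. The cell width and page size are assumptions for illustration; the exact format in the kernel may differ.

```python
PAGE_SIZE = 4096   # assumed sysfs buffer limit, in bytes
CELL_CHARS = 9     # assumed width of one printed counter, including separator


def trans_table_chars(n_states):
    """Approximate size of an n_states x n_states transition table."""
    return n_states * n_states * CELL_CHARS


for n in (7, 127):
    size = trans_table_chars(n)
    print(n, size, "fits" if size < PAGE_SIZE else "too large")
```

Under these assumptions the size grows with the square of the number of states, so a device with a handful of states fits comfortably while one with 127 does not.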

> 
> Rather than enter the actual TCC offset, I would rather enter the desired trip
> point, and have the driver do the math to convert it to the offset.

Hmmm, a writable trip point? I need to think about this.
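
One way to picture the conversion being asked for, as a sketch only. It assumes the effective activation temperature is the default TCC activation temperature minus the offset, both in degrees C, and that the offset is bounded by the maximum cooling state. The names `trip_to_offset`, `offset_to_trip`, `tcc_default_c` and `max_offset`, and the 100 degree default used in the example call, are hypothetical; the real default comes from MSR_IA32_TEMPERATURE_TARGET and is platform specific.

```python
def trip_to_offset(trip_c, tcc_default_c, max_offset):
    """Convert a desired trip point (deg C) to a TCC offset (deg C)."""
    offset = tcc_default_c - trip_c
    if offset < 0 or offset > max_offset:
        raise ValueError(f"trip point {trip_c} C is outside the adjustable range")
    return offset


def offset_to_trip(offset, tcc_default_c):
    """Inverse: effective activation temperature for a given offset."""
    return tcc_default_c - offset


# Hypothetical platform: default activation temperature of 100 C,
# maximum offset of 127 (assumed from the cooling state count above).
print(trip_to_offset(55, tcc_default_c=100, max_offset=127))
```

Rejecting out-of-range trip points, rather than silently clamping them, keeps the value a user reads back consistent with what was written.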

> 
> Example step function overshoot, trip point set to 55 degrees C.
> 
> doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
> 0.07    800     45      24      1.89    0.00
> 0.04    800     29      23      1.89    0.00
> 61.76   4546    4151    66      103.77  0.00 < step function load applied on 4 of 6 cores
> 67.76   4570    4476    66      120.42  0.00
> 68.03   4567    4488    66      120.73  0.00
> 67.98   4572    4492    67      121.00  0.00 < 19 degrees over trip point
> 68.10   4489    4493    58      109.19  0.00 < this throttling is either the power servo or the temp servo.
> 68.08   4262    4476    51      82.82   0.00 < this throttling is the temp servo.
> 68.13   4143    4513    48      75.16   0.00
> 68.03   4086    4488    46      71.87   0.00 < It actually undershoots often, I don't know why.
> 68.12   4000    4505    46      67.02   0.00 < often it doesn't undershoot.
> 68.44   4000    4502    45      67.16   0.00
> 68.06   4000    4483    45      66.95   0.00
> 68.02   3973    4490    44      65.20   0.00
> 67.94   3900    4489    43      60.51   0.00
> 67.88   3900    4501    44      60.55   0.00
> 67.85   3900    4472    43      60.52   0.00
> 67.96   3900    4481    43      60.59   0.00
> 68.26   3900    4501    44      60.70   0.00
> 67.93   3900    4498    43      60.58   0.00
> 68.03   3900    4476    43      60.68   0.00
> 67.83   3900    4481    44      60.54   0.00
> 35.06   3895    2412    25      32.13   0.00 < load removed.
> 0.04    800     25      24      1.89    0.00
> 0.04    800     22      23      1.89    0.00
> 0.06    800     35      23      1.90    0.00
> 0.03    800     18      23      1.89    0.00
> 0.04    800     26      22      1.90    0.00
> 0.30    1927    44      23      1.97    0.00
> ^C0.10  800     25      23      1.91    0.00
> 
> Example long time to recover:
> (actually, this example never recovers, unusual):
> Note: 3.7 GHz is the limit.
> 
> doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 30
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
> 67.58   3700    134812  42      52.15   0.00 <<< the trip point was changed from 37 to 57 degrees
> 67.90   3700    134964  42      52.08   0.00
> 68.07   3700    134424  42      52.06   0.00
> 68.01   3700    134415  41      50.76   0.00
> 68.14   3700    134521  41      50.78   0.00
> 68.11   3700    134424  42      50.75   0.00
> 68.03   3700    134329  42      50.70   0.00
> 68.11   3700    134321  42      50.76   0.00
> 68.05   3700    134456  42      51.09   0.00
> 68.12   3700    134549  42      52.21   0.00
> 68.12   3700    134482  42      52.19   0.00
> 68.10   3700    134301  42      52.20   0.00
> 68.11   3700    134444  42      52.14   0.00
> 68.08   3700    134422  42      52.17   0.00
> 68.07   3700    134430  42      52.23   0.00
> 68.00   3700    134723  42      52.12   0.00
> 67.96   3711    135207  44      52.53   0.00 <<< It takes 8 minutes until the frequency goes above 3.7 GHz
> 68.05   3765    134519  42      54.34   0.00
> 68.11   3771    134461  43      54.60   0.00
> 67.83   3763    134867  43      54.26   0.00
> 67.93   3773    134577  43      54.78   0.00 <<< But it never recovers. Why not?
> ...
> 
> For an unknown reason the processor now seems to think it is not heavily
> loaded. From my MSR decoder:
> 
> 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL
> 
> From the book:
> 
> > Autonomous Utilization-Based Frequency Control Status (R0) When set,
> > frequency is reduced below the operating system request because the
> > processor has detected that utilization is low.
> 
> Which is not true.
> 
> Anyway,
> 
> Acked-by: Doug Smythies <dsmythies@telus.net>
> 
thanks,
rui
