From: "Zhang, Rui" <rui.zhang@intel.com>
To: Doug Smythies <dsmythies@telus.net>
Cc: "daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"srinivas.pandruvada@linux.intel.com" 
	<srinivas.pandruvada@linux.intel.com>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver
Date: Mon, 18 Jan 2021 09:46:30 +0000
Message-ID: <e04c36aae6eb4cbb9b99799290016d58@intel.com>
In-Reply-To: <002601d6ec2a$36663da0$a332b8e0$@net>

Hi, Doug,

Thanks for testing this patch.

> -----Original Message-----
> From: Doug Smythies <dsmythies@telus.net>
> Sent: Sunday, January 17, 2021 1:08 AM
> To: Zhang, Rui <rui.zhang@intel.com>
> Cc: daniel.lezcano@linaro.org; srinivas.pandruvada@linux.intel.com;
> linux-pm@vger.kernel.org
> Subject: RE: [PATCH] thermal/intel: introduce tcc cooling driver
> Importance: High
> 
> On 2021.01.15 Zhang Rui wrote:
> >
> > On Intel processors, the core frequency can be reduced below OS
> > request, when the current temperature reaches the TCC (Thermal Control
> > Circuit) activation temperature.
> >
> > The default TCC activation temperature is specified by
> > MSR_IA32_TEMPERATURE_TARGET. However, it can be adjusted by specifying
> > an offset in degrees C, using the TCC Offset bits in the same MSR register.
> >
> > This patch introduces a cooling device driver that utilizes the TCC
> > Offset feature. The bigger the current cooling state is, the lower the
> > effective TCC activation temperature is, so that the processors can be
> > throttled earlier, before the system overheats critically.
> 
> Thank you for this useful patch.
> My systems don't need thermald or any other thermal control, but it is nice
> to have this extra margin to add to the critical stuff, as a backup.
> I also like to use the offset to test stuff.
> 
> I use the internal power limit servo for power limiting, and that servo works
> very well indeed. Using this temperature offset as a way to servo the
> thermal operating limit does work, but it tends to overshoot, oscillate, and
> hold the frequency low for excessively long (minutes).

Do you have a script to test and show the drawbacks of this feature?
It seems that it behaves differently on different platforms.
Maybe we can evaluate this on more platforms.
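
For comparing platforms, a small helper that measures the overshoot from a turbostat log might be enough. The following is only a sketch, not a script anyone in this thread has run: it assumes the `--Summary` column layout quoted further down (one header line followed by numeric rows, possibly with trailing annotations), and the names `parse_turbostat_summary` and `peak_overshoot` are made up for illustration.

```python
def parse_turbostat_summary(text):
    """Parse turbostat --Summary output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if header is None:
            # Assumption: the first non-empty line is the column header.
            header = fields
            continue
        try:
            # Only the first len(header) fields are samples; anything after
            # them (for example a "< comment") is ignored.
            values = [float(f) for f in fields[:len(header)]]
        except ValueError:
            continue  # not a numeric sample line, skip it
        if len(values) == len(header):
            rows.append(dict(zip(header, values)))
    return rows


def peak_overshoot(rows, trip_c):
    """Return how far the highest PkgTmp sample exceeds the trip point, in degrees C."""
    return max(row["PkgTmp"] for row in rows) - trip_c
```

Feeding it the captured output and the configured trip point gives a single number per run, which is easier to compare across machines than raw logs.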

> It also seems to limit CPU clock frequency
> reduction to the non-turbo limit, regardless of the desired maximum
> temperature.
> 
> I am not familiar with the thermal stuff at all, and didn't know where to find
> the trip point knob. Anyway, found "cooling_device11".
> 
> I do not understand this:
> 
> ~$ cat /sys/devices/virtual/thermal/cooling_device11/stats/trans_table
> cat: /sys/devices/virtual/thermal/cooling_device11/stats/trans_table: File too large

This is a known issue: the stats table cannot handle devices with too many cooling states, such as the 127 cooling states of the TCC Offset cooling device.
We can ignore this for now.
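
For reference, a back-of-the-envelope estimate of why the table overflows. This is a sketch under assumptions: that the sysfs read is limited to one page (4096 bytes here) and that each cell of the N x N transition table takes about 8 characters. Both numbers and the helper names are illustrative, and the header rows are ignored, so the result is a lower bound.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def trans_table_min_bytes(n_states, cell_width=8):
    """Lower bound on the text size of an n_states x n_states transition table."""
    return n_states * n_states * cell_width


def fits_in_one_page(n_states, cell_width=8, page_size=PAGE_SIZE):
    """True if the estimated table body alone stays below one page."""
    return trans_table_min_bytes(n_states, cell_width) < page_size
```

With 128 states the estimate is 131072 bytes, far more than one page, while a device with 16 states stays at 2048 bytes.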

> 
> Rather than enter the actual TCC offset, I would rather enter the desired trip
> point, and have the driver do the math to convert it to the offset.

Hmmm, a writable trip point? I need to think about this.
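
The math itself would be simple. A minimal sketch, assuming the offset is subtracted from the default activation temperature (called `tjmax_c` here) in whole degrees C and that the result must be clamped to the range the hardware accepts. `trip_to_tcc_offset`, `effective_trip`, `tjmax_c` and `max_offset` are hypothetical names, not part of the patch.

```python
def trip_to_tcc_offset(tjmax_c, trip_c, max_offset):
    """Convert a desired trip point into a TCC offset, clamped to [0, max_offset]."""
    offset = tjmax_c - trip_c
    return max(0, min(offset, max_offset))


def effective_trip(tjmax_c, offset):
    """Inverse direction: the activation temperature that results from an offset."""
    return tjmax_c - offset
```

For example, with a hypothetical default of 100 degrees C, a desired trip point of 55 gives an offset of 45. A request below the reachable range is clamped to `max_offset`, so the caller should read back the effective trip point rather than assume the request was honored.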

> 
> Example step function overshoot, trip point set to 55 degrees C.
> 
> doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
> 0.07    800     45      24      1.89    0.00
> 0.04    800     29      23      1.89    0.00
> 61.76   4546    4151    66      103.77  0.00 < step function load applied on 4 of 6 cores
> 67.76   4570    4476    66      120.42  0.00
> 68.03   4567    4488    66      120.73  0.00
> 67.98   4572    4492    67      121.00  0.00 < 12 degrees over trip point
> 68.10   4489    4493    58      109.19  0.00 < this throttling is either the power servo or the temp servo.
> 68.08   4262    4476    51      82.82   0.00 < this throttling is the temp servo.
> 68.13   4143    4513    48      75.16   0.00
> 68.03   4086    4488    46      71.87   0.00 < It actually undershoots often, I don't know why.
> 68.12   4000    4505    46      67.02   0.00 < often it doesn't undershoot.
> 68.44   4000    4502    45      67.16   0.00
> 68.06   4000    4483    45      66.95   0.00
> 68.02   3973    4490    44      65.20   0.00
> 67.94   3900    4489    43      60.51   0.00
> 67.88   3900    4501    44      60.55   0.00
> 67.85   3900    4472    43      60.52   0.00
> 67.96   3900    4481    43      60.59   0.00
> 68.26   3900    4501    44      60.70   0.00
> 67.93   3900    4498    43      60.58   0.00
> 68.03   3900    4476    43      60.68   0.00
> 67.83   3900    4481    44      60.54   0.00
> 35.06   3895    2412    25      32.13   0.00 < load removed.
> 0.04    800     25      24      1.89    0.00
> 0.04    800     22      23      1.89    0.00
> 0.06    800     35      23      1.90    0.00
> 0.03    800     18      23      1.89    0.00
> 0.04    800     26      22      1.90    0.00
> 0.30    1927    44      23      1.97    0.00
> ^C0.10  800     25      23      1.91    0.00
> 
> Example long time to recover:
> (actually, this example never recovers, unusual):
> Note: 3.7 GHz is the limit.
> 
> doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 30
> Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
> 67.58   3700    134812  42      52.15   0.00 <<< the trip point was changed from 37 to 57 degrees
> 67.90   3700    134964  42      52.08   0.00
> 68.07   3700    134424  42      52.06   0.00
> 68.01   3700    134415  41      50.76   0.00
> 68.14   3700    134521  41      50.78   0.00
> 68.11   3700    134424  42      50.75   0.00
> 68.03   3700    134329  42      50.70   0.00
> 68.11   3700    134321  42      50.76   0.00
> 68.05   3700    134456  42      51.09   0.00
> 68.12   3700    134549  42      52.21   0.00
> 68.12   3700    134482  42      52.19   0.00
> 68.10   3700    134301  42      52.20   0.00
> 68.11   3700    134444  42      52.14   0.00
> 68.08   3700    134422  42      52.17   0.00
> 68.07   3700    134430  42      52.23   0.00
> 68.00   3700    134723  42      52.12   0.00
> 67.96   3711    135207  44      52.53   0.00 <<< It takes 8 minutes until the frequency goes above 3.7 GHz
> 68.05   3765    134519  42      54.34   0.00
> 68.11   3771    134461  43      54.60   0.00
> 67.83   3763    134867  43      54.26   0.00
> 67.93   3773    134577  43      54.78   0.00 <<< But it never recovers. Why not?
> ...
> 
> For some unknown reason, the processor now seems to think it is not heavily
> loaded. From my MSR decoder:
> 
> 0x64F: MSR_CORE_PERF_LIMIT_REASONS: 200020 AUTO AUTOL
> 
> From the book:
> 
> > Autonomous Utilization-Based Frequency Control Status (R0) When set,
> > frequency is reduced below the operating system request because the
> > processor has detected that utilization is low.
> 
> Which is not true.
> 
> Anyway,
> 
> Acked-by: Doug Smythies <dsmythies@telus.net>
> 
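
For anyone reproducing this, the quoted MSR value can be split into its set bits. A sketch only: it assumes the decoder output `200020` is hexadecimal and that each log bit sits 16 positions above its status bit, which matches the AUTO/AUTOL pair shown above. The function names are illustrative.

```python
def set_bits(value):
    """Return the positions of all set bits in a 64-bit register value."""
    return [bit for bit in range(64) if (value >> bit) & 1]


def split_status_and_log(value):
    """Split set bits into status bits (0-15) and log bits (16-31).

    Log bits are reported relative to their status bit, assuming the
    log field mirrors the status field 16 positions higher.
    """
    bits = set_bits(value)
    status = [bit for bit in bits if bit < 16]
    log = [bit - 16 for bit in bits if 16 <= bit < 32]
    return status, log
```

For `0x200020` this reports bit 5 in both the status and the log half, so the limit reason was active at the time of the read and had also been latched earlier.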
thanks,
rui
