From: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
To: Borislav Petkov <bp@alien8.de>, "Luck, Tony" <tony.luck@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"mingo@redhat.com" <mingo@redhat.com>,
"hpa@zytor.com" <hpa@zytor.com>,
"bberg@redhat.com" <bberg@redhat.com>,
"x86@kernel.org" <x86@kernel.org>,
"linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"hdegoede@redhat.com" <hdegoede@redhat.com>,
"ckellner@redhat.com" <ckellner@redhat.com>
Subject: Re: [PATCH 1/2] x86, mce, therm_throt: Optimize logging of thermal throttle messages
Date: Fri, 18 Oct 2019 05:26:36 -0700
Message-ID: <c2ce4ef128aad84616b2dc21f6230ad4db12194b.camel@linux.intel.com>
In-Reply-To: <20191017214445.GG14441@zn.tnic>
On Thu, 2019-10-17 at 23:44 +0200, Borislav Petkov wrote:
> On Thu, Oct 17, 2019 at 09:31:30PM +0000, Luck, Tony wrote:
> > That sounds like the right short term action.
> >
> > Depending on what we end up with from Srinivas ... we may want
> > to reconsider the severity. The basic premise of Srinivas' patch
> > is to avoid printing anything for short excursions above
> > temperature threshold. But the effect of that is that when we
> > find the core/package staying above temperature for an extended
> > period of time, we are in a serious situation where some action
> > may be needed. E.g. move the laptop off the soft surface that is
> > blocking the air vents.
>
> I don't think having a critical severity message is nearly enough.
> There are cases where the users simply won't see that message, no
> shell opened, nothing scanning dmesg, nothing pops up on the desktop
> to show KERN_CRIT messages, etc.
>
> If we really wanna handle this case then we must be much more
> reliable:
>
> * we throttle the machine from within the kernel - whatever that may
> mean
There are actions associated with high temperature in the ACPI
thermal subsystem. The problem with associating them with this warning
directly is that this threshold temperature is set too low at power-up
on some recent laptops.

Servers/desktops generally rely on the embedded controller for fan
control, over which the kernel has no control. For them this warning
helps to either bring in additional cooling or fix existing cooling.

If something needs to force throttling from the kernel, then we should
use some offset from the max temperature (aka TJMax) instead of this
warning threshold. Then we can use idle injection or change the duty
cycle of the CPU clocks.
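
To make the offset idea concrete, here is a rough sketch only. It is
not from this patch series and not existing kernel code; the names
(pick_action, FORCE_OFFSET_C, the enum) and the 5 degree offset are
assumptions chosen for illustration. It only shows the decision being
keyed to the distance from TJMax rather than to the firmware-programmed
warning threshold:

```c
/* Hypothetical actions; the names are illustrative only. */
enum throttle_action {
	THROTTLE_NONE,	/* leave things to the normal thermal policy */
	THROTTLE_FORCE,	/* e.g. idle injection or a clock duty-cycle change */
};

/* Assumed offset below TJMax, in degrees Celsius; not a real tunable. */
#define FORCE_OFFSET_C	5

/*
 * Decide on forced throttling from the distance to TJMax. The warning
 * threshold is deliberately not an input here, since it may be
 * programmed too low on some systems.
 */
static enum throttle_action pick_action(int temp_c, int tjmax_c)
{
	if (temp_c >= tjmax_c - FORCE_OFFSET_C)
		return THROTTLE_FORCE;

	return THROTTLE_NONE;
}
```

With a TJMax of 100 C, this would leave an 80 C reading alone and only
force throttling from 95 C upwards, regardless of where the warning
threshold was set at power-up.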
Thanks,
Srinivas
> * if that doesn't help, we stop scheduling !root tasks
> * if that doesn't help, we halt
> * ...
>
> These are purely hypothetical things to do but I'm pointing them out
> as an example that in a high temperature situation we should be
> actively doing something and not wait for the user to do that.
>
> Come to think of it, one can apply the same type of logic here and
> split the temp severity into action-required events and
> action-optional events and then depending on the type, we do things.
>
> Now what those things are, should be determined by the severity of
> the events. Which would mean, we'd need to know how severe those
> events are. And since this is left in the hands of the OEMs, good
> luck to us. ;-\
>