From: Kyle McMartin <kyle@infradead.org>
To: srinivas pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Kyle McMartin <kyle@infradead.org>,
"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
linux-pm@vger.kernel.org, kernel-team@fb.com,
linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] therm_throt: test bits as we build therm_intr_core_clear_mask
Date: Thu, 4 Apr 2024 13:17:19 -0400
Message-ID: <Zg7gn2ATm_NMiw_2@merlin.infradead.org>
In-Reply-To: <8b4cb4ad67032fad69f29df8e6b83054c7fa15db.camel@linux.intel.com>
On Wed, Apr 03, 2024 at 06:15:47PM -0700, srinivas pandruvada wrote:
> > On Broadwell and Broadwell-DE, the HWP flag is not set, but writing
> > these bits does not trap.
> >
> > On our Skylake-DE, Skylake, and Cooper Lake platforms, the HWP flag
> > is
> > set in CPUID, and writing 1 to these bits traps attempting to write
> > 0xAAA8 to MSR 0x19C (THERM_STATUS). Writing 0xAA8 from userspace
> > works
> > as expected to un-stick PROCHOT_LOG.
>
> I think this issue happens only on Skylake, Cascade Lake, Cooper Lake
> and not on any other systems.
>
> Please verify:
> GP# happens only when bit13 (Current Limit Log) or bit15 (Cross Domain
> Limit Log) is 1.
>
Yeah, if either of the bits is set, we'll trap and fail the WRMSRL.
> Basically writing 0x2000 or 0x8000 or A000 will cause this issue.
> Are you using the latest BIOS with microcode?
> Please confirm your microcode version, I can check internally.
>
On Skylake-DE (6-85-4), the most commonly deployed microcode revisions
are 0x2006e08 and 0x2006e05. On Skylake (also 6-85-4), we've got
0x2006e05 and 0x2000065. Finally, on Cooper Lake (6-85-11), we have
0x700001f and are in the process of rolling out 0x7002503.
Rolling out new firmware is a pretty slow process... Since we're not
clearing those bits anywhere in the kernel we're deploying, I just
stubbed out setting BIT(13) and BIT(15) on those platforms for now while
we discuss a more durable fix.
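
For reference, the mask arithmetic behind the values quoted in this
thread can be reproduced with a small userspace sketch. This is not the
driver code: core_clear_mask() and clear_value() are hypothetical
helpers, and the sketch assumes the clear path writes the mask with
only the target log bit set to 0, which is consistent with the 0xAAA8
and 0xAA8 values mentioned above.

```c
#include <stdint.h>

#define BIT(n) (UINT64_C(1) << (n))

/*
 * Hypothetical helper (not a kernel function): build the core clear
 * mask from feature flags, mirroring the bit groups in the patch.
 * skip_13_15 models stubbing out BIT(13) | BIT(15) on affected parts.
 */
static uint64_t core_clear_mask(int has_tm2, int has_pln, int has_hwp,
				int skip_13_15)
{
	uint64_t mask = BIT(1) | BIT(3) | BIT(5);

	if (has_tm2)
		mask |= BIT(7) | BIT(9);	/* thermal threshold #1/#2 log */
	if (has_pln)
		mask |= BIT(11);		/* power limitation log */
	if (has_hwp && !skip_13_15)
		mask |= BIT(13) | BIT(15);	/* current / cross domain limit log */

	return mask;
}

/*
 * Assumed clear semantics: the log bits are write-0-to-clear, so the
 * value written keeps 1s in the other log bits and a 0 in the target.
 */
static uint64_t clear_value(uint64_t mask, uint64_t bit)
{
	return mask & ~bit;
}
```

With all features present, clear_value(core_clear_mask(1, 1, 1, 0),
BIT(1)) gives 0xAAA8, the write that traps here. With bits 13 and 15
left out of the mask, the same call gives 0xAA8, the value that works
from userspace.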
Thanks for following up! --kyle
> Thanks,
> Srinivas
>
>
> >
> > On our Sapphire Rapids platforms, the HWP flag is set, and writing 1
> > to
> > these bits is successful.
> >
> > drivers/thermal/intel/therm_throt.c | 29 ++++++++++++++++++++++-----
> > --
> > 1 file changed, 22 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/thermal/intel/therm_throt.c
> > b/drivers/thermal/intel/therm_throt.c
> > index e69868e868eb..3058d8fcfcef 100644
> > --- a/drivers/thermal/intel/therm_throt.c
> > +++ b/drivers/thermal/intel/therm_throt.c
> > @@ -196,8 +196,14 @@ static const struct attribute_group
> > thermal_attr_group = {
> > static u64 therm_intr_core_clear_mask;
> > static u64 therm_intr_pkg_clear_mask;
> >
> > +/* Probe each addition to the mask to ensure that our wrmsrl
> > + * won't fail to clear bits.
> > + */
> > static void thermal_intr_init_core_clear_mask(void)
> > {
> > + u64 bits = 0;
> > + u64 mask = 0;
> > +
> > if (therm_intr_core_clear_mask)
> > return;
> >
> > @@ -211,25 +217,34 @@ static void
> > thermal_intr_init_core_clear_mask(void)
> > * Bit 1, 3, 5: CPUID.01H:EDX[22] = 1. This driver will not
> > * enable interrupts, when 0 as it checks for
> > X86_FEATURE_ACPI.
> > */
> > - therm_intr_core_clear_mask = (BIT(1) | BIT(3) | BIT(5));
> > + mask = (BIT(1) | BIT(3) | BIT(5));
> >
> > /*
> > * Bit 7 and 9: Thermal Threshold #1 and #2 log
> > * If CPUID.01H:ECX[8] = 1
> > */
> > - if (boot_cpu_has(X86_FEATURE_TM2))
> > - therm_intr_core_clear_mask |= (BIT(7) | BIT(9));
> > + bits = BIT(7) | BIT(9);
> > + if (boot_cpu_has(X86_FEATURE_TM2) &&
> > + wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > + mask |= bits;
> > +
> >
> > /* Bit 11: Power Limitation log (R/WC0) If CPUID.06H:EAX[4] =
> > 1 */
> > - if (boot_cpu_has(X86_FEATURE_PLN))
> > - therm_intr_core_clear_mask |= BIT(11);
> > + bits = BIT(11);
> > + if (boot_cpu_has(X86_FEATURE_PLN) &&
> > + wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > + mask |= bits;
> >
> > /*
> > * Bit 13: Current Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
> > * Bit 15: Cross Domain Limit log (R/WC0) If CPUID.06H:EAX[7]
> > = 1
> > */
> > - if (boot_cpu_has(X86_FEATURE_HWP))
> > - therm_intr_core_clear_mask |= (BIT(13) | BIT(15));
> > + bits = BIT(13) | BIT(15);
> > + if (boot_cpu_has(X86_FEATURE_HWP) &&
> > + wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > + mask |= bits;
> > +
> > + therm_intr_core_clear_mask = mask;
> > }
> >
> > static void thermal_intr_init_pkg_clear_mask(void)
>