From: Kyle McMartin <kyle@infradead.org>
To: srinivas pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Kyle McMartin <kyle@infradead.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	linux-pm@vger.kernel.org, kernel-team@fb.com,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] therm_throt: test bits as we build therm_intr_core_clear_mask
Date: Thu, 4 Apr 2024 13:17:19 -0400
Message-ID: <Zg7gn2ATm_NMiw_2@merlin.infradead.org>
In-Reply-To: <8b4cb4ad67032fad69f29df8e6b83054c7fa15db.camel@linux.intel.com>

On Wed, Apr 03, 2024 at 06:15:47PM -0700, srinivas pandruvada wrote:
> > On Broadwell and Broadwell-DE, the HWP flag is not set, but writing
> > these bits does not trap.
> > 
> > On our Skylake-DE, Skylake, and Cooper Lake platforms, the HWP flag
> > is set in CPUID, and writing 1 to these bits traps attempting to
> > write 0xAAA8 to MSR 0x19C (THERM_STATUS). Writing 0xAA8 from
> > userspace works as expected to un-stick PROCHOT_LOG.
> 
> I think this issue happens only on Skylake, Cascade Lake, Cooper Lake
> and not on any other systems.
> 
> Please verify:
> GP# happens only when bit13 (Current Limit Log) or bit15 (Cross Domain
> Limit Log) is 1.
> 

Yeah, if either of the bits is set, we'll trap and fail the WRMSRL.
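
To make the arithmetic concrete, here is a small userspace sketch (not
driver code; the helper names are made up for illustration) showing
that the trapping value 0xAAA8 and the working value 0xAA8 differ only
in bits 13 and 15:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT64_C(1) << (n))

/* Bit 13 (Current Limit Log) and bit 15 (Cross Domain Limit Log). */
#define FAULTING_LOG_BITS (BIT(13) | BIT(15))

/* Hypothetical helper: true if a THERM_STATUS write value sets bit 13 or 15. */
static bool sets_faulting_log_bits(uint64_t val)
{
	return (val & FAULTING_LOG_BITS) != 0;
}

/* Hypothetical self-check using the values quoted in this thread. */
static void check_thread_values(void)
{
	assert(FAULTING_LOG_BITS == 0xA000);
	assert(sets_faulting_log_bits(0xAAA8));	/* traps on affected parts */
	assert(!sets_faulting_log_bits(0xAA8));	/* works from userspace */
	assert((0xAAA8 & ~FAULTING_LOG_BITS) == 0xAA8);
}
```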

> Basically writing 0x2000 or 0x8000 or 0xA000 will cause this issue.
> Are you using the latest BIOS with microcode?
> Please confirm your microcode version, I can check internally.
> 

On Skylake-DE (6-85-4), our most commonly deployed microcode revisions
are 0x2006e08 and 0x2006e05. On Skylake (6-85-4), they are 0x2006e05 and
0x2000065. Finally, on Cooper Lake (6-85-11), we have 0x700001f and are
in the process of rolling out 0x7002503.

Rolling out new firmware is a pretty slow process... Since we're not
clearing those bits anywhere in the kernel we're deploying, I just
stubbed out setting BIT(13) and BIT(15) on those platforms for now while
we discuss a more durable fix.
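
Roughly, the stub has the following shape. This is a userspace sketch
only, not the actual change: struct cpu_info and skip_bits_13_15() are
hypothetical stand-ins for the boot_cpu_has() checks, and matching on
family 6, model 85 is an assumption based on the 6-85-x signatures
listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT64_C(1) << (n))

/* Hypothetical stand-in for the CPU feature checks in the driver. */
struct cpu_info {
	unsigned int family;
	unsigned int model;
	bool tm2;
	bool pln;
	bool hwp;
};

/* Assumption: family 6, model 85 covers the parts seen trapping. */
static bool skip_bits_13_15(const struct cpu_info *c)
{
	return c->family == 6 && c->model == 85;
}

/* Build the core clear mask, leaving out bits 13 and 15 where they trap. */
static uint64_t core_clear_mask(const struct cpu_info *c)
{
	uint64_t mask = BIT(1) | BIT(3) | BIT(5);

	if (c->tm2)
		mask |= BIT(7) | BIT(9);
	if (c->pln)
		mask |= BIT(11);
	if (c->hwp && !skip_bits_13_15(c))
		mask |= BIT(13) | BIT(15);

	return mask;
}

/* Hypothetical self-check: model 85 loses bits 13/15, other models keep them. */
static void check_masks(void)
{
	struct cpu_info affected = { 6, 85, true, true, true };
	struct cpu_info other = { 6, 0, true, true, true };

	assert(core_clear_mask(&affected) == 0xAAA);
	assert(core_clear_mask(&other) == 0xAAAA);
}
```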

Thanks for following up! --kyle

> Thanks,
> Srinivas
> 
> 
> > 
> > On our Sapphire Rapids platforms, the HWP flag is set, and writing 1
> > to these bits is successful.
> > 
> >  drivers/thermal/intel/therm_throt.c | 29 ++++++++++++++++++++++-------
> >  1 file changed, 22 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/thermal/intel/therm_throt.c b/drivers/thermal/intel/therm_throt.c
> > index e69868e868eb..3058d8fcfcef 100644
> > --- a/drivers/thermal/intel/therm_throt.c
> > +++ b/drivers/thermal/intel/therm_throt.c
> > @@ -196,8 +196,14 @@ static const struct attribute_group thermal_attr_group = {
> >  static u64 therm_intr_core_clear_mask;
> >  static u64 therm_intr_pkg_clear_mask;
> >  
> > +/* Probe each addition to the mask to ensure that our wrmsrl
> > + * won't fail to clear bits.
> > + */
> >  static void thermal_intr_init_core_clear_mask(void)
> >  {
> > +       u64 bits = 0;
> > +       u64 mask = 0;
> > +
> >         if (therm_intr_core_clear_mask)
> >                 return;
> >  
> > @@ -211,25 +217,34 @@ static void thermal_intr_init_core_clear_mask(void)
> >          * Bit 1, 3, 5: CPUID.01H:EDX[22] = 1. This driver will not
> >          * enable interrupts, when 0 as it checks for X86_FEATURE_ACPI.
> >          */
> > -       therm_intr_core_clear_mask = (BIT(1) | BIT(3) | BIT(5));
> > +       mask = (BIT(1) | BIT(3) | BIT(5));
> >  
> >         /*
> >          * Bit 7 and 9: Thermal Threshold #1 and #2 log
> >          * If CPUID.01H:ECX[8] = 1
> >          */
> > -       if (boot_cpu_has(X86_FEATURE_TM2))
> > -               therm_intr_core_clear_mask |= (BIT(7) | BIT(9));
> > +       bits = BIT(7) | BIT(9);
> > +       if (boot_cpu_has(X86_FEATURE_TM2) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> > +
> >  
> >         /* Bit 11: Power Limitation log (R/WC0) If CPUID.06H:EAX[4] = 1 */
> > -       if (boot_cpu_has(X86_FEATURE_PLN))
> > -               therm_intr_core_clear_mask |= BIT(11);
> > +       bits = BIT(11);
> > +       if (boot_cpu_has(X86_FEATURE_PLN) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> >  
> >         /*
> >          * Bit 13: Current Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
> >          * Bit 15: Cross Domain Limit log (R/WC0) If CPUID.06H:EAX[7] = 1
> >          */
> > -       if (boot_cpu_has(X86_FEATURE_HWP))
> > -               therm_intr_core_clear_mask |= (BIT(13) | BIT(15));
> > +       bits = BIT(13) | BIT(15);
> > +       if (boot_cpu_has(X86_FEATURE_HWP) &&
> > +           wrmsrl_safe(MSR_IA32_THERM_STATUS, mask | bits) >= 0)
> > +               mask |= bits;
> > +
> > +       therm_intr_core_clear_mask = mask;
> >  }
> >  
> >  static void thermal_intr_init_pkg_clear_mask(void)
> 
