From: srinivas pandruvada <srinivas.pandruvada@linux.intel.com>
To: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: rui.zhang@intel.com, daniel.lezcano@linaro.org,
linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
bp@alien8.de, Rui Salvaterra <rsalvaterra@gmail.com>,
stable@kernel.org
Subject: Re: [PATCH] thermal: intel: Fix unchecked MSR issue
Date: Tue, 11 Apr 2023 10:12:07 -0700 [thread overview]
Message-ID: <26bea1012a0a02895e30f6a33a4ebfe6ec8ba4b2.camel@linux.intel.com> (raw)
In-Reply-To: <CAJZ5v0iAHY6pOQb2N=AQbks7JKnVa1T29-zTT1XFBcVXEdZuwg@mail.gmail.com>
On Tue, 2023-04-11 at 18:16 +0200, Rafael J. Wysocki wrote:
> On Mon, Apr 10, 2023 at 7:35 PM Srinivas Pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> >
> > Some older processors don't allow BIT(13) and BIT(15) in the
> > current
> > mask set by "THERM_STATUS_CLEAR_CORE_MASK". This results in:
> >
> > unchecked MSR access error: WRMSR to 0x19c (tried to
> > write 0x000000000000aaa8) at rIP: 0xffffffff816f66a6
> > (throttle_active_work+0xa6/0x1d0)
> >
> > To avoid unchecked MSR issues, check cpuid for each feature and
> > then
> > form core mask. Do the same for package mask set by
> > "THERM_STATUS_CLEAR_PKG_MASK".
> >
> > Introduce functions thermal_intr_core_clear_mask() and
> > thermal_intr_pkg_clear_mask()
>
> I've renamed these two functions to
> thermal_intr_init_core_clear_mask() and
> thermal_intr_init_pkg_clear_mask(), respectively.
>
> > to set core and package mask respectively.
> > These functions are called during initialization.
> >
> > Fixes: 6fe1e64b6026 ("thermal: intel: Prevent accidental clearing
> > of HFI status")
> > Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
> > Link:
> > https://lore.kernel.org/lkml/cdf43fb423368ee3994124a9e8c9b4f8d00712c6.camel@linux.intel.com/T/
> > Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
> > Signed-off-by: Srinivas Pandruvada
> > <srinivas.pandruvada@linux.intel.com>
> > Cc: stable@kernel.org # 6.2+
> > ---
> > drivers/thermal/intel/therm_throt.c | 73
> > ++++++++++++++++++++++++++---
> > 1 file changed, 66 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/thermal/intel/therm_throt.c
> > b/drivers/thermal/intel/therm_throt.c
> > index 2e22bb82b738..d5047676f3d2 100644
> > --- a/drivers/thermal/intel/therm_throt.c
> > +++ b/drivers/thermal/intel/therm_throt.c
> > @@ -193,8 +193,67 @@ static const struct attribute_group
> > thermal_attr_group = {
> > #define THERM_THROT_POLL_INTERVAL HZ
> > #define THERM_STATUS_PROCHOT_LOG BIT(1)
> >
> > -#define THERM_STATUS_CLEAR_CORE_MASK (BIT(1) | BIT(3) | BIT(5) |
> > BIT(7) | BIT(9) | BIT(11) | BIT(13) | BIT(15))
> > -#define THERM_STATUS_CLEAR_PKG_MASK (BIT(1) | BIT(3) | BIT(5) |
> > BIT(7) | BIT(9) | BIT(11))
> > +static u64 def_therm_core_clear_mask;
> > +static u64 def_therm_pkg_clear_mask;
>
> And I've renamed these two variables to therm_intr_core_clear_mask
> and
> therm_intr_pkg_clear_mask, respectively.
>
> Also I've changed the subject (to "thermal: intel: Avoid updating
> unsupported THERM_STATUS_CLEAR mask bits") and made some assorted
> changelog edits.
>
> With the above changes, the patch has been queued up for 6.3-rc7.
Thanks!
-Srinivas
>
> > +
> > +static void thermal_intr_core_clear_mask(void)
> > +{
> > + if (def_therm_core_clear_mask)
> > + return;
> > +
> > + /*
> > + * Reference: Intel SDM Volume 4
> > + * "Table 2-2. IA-32 Architectural MSRs", MSR 0x19C
> > + * IA32_THERM_STATUS.
> > + */
> > +
> > + /*
> > + * Bit 1, 3, 5: CPUID.01H:EDX[22] = 1. This driver will not
> > + * enable interrupts, when 0 as it checks for
> > X86_FEATURE_ACPI.
> > + */
> > + def_therm_core_clear_mask = (BIT(1) | BIT(3) | BIT(5));
> > +
> > + /*
> > + * Bit 7 and 9: Thermal Threshold #1 and #2 log
> > + * If CPUID.01H:ECX[8] = 1
> > + */
> > + if (boot_cpu_has(X86_FEATURE_TM2))
> > + def_therm_core_clear_mask |= (BIT(7) | BIT(9));
> > +
> > + /* Bit 11: Power Limitation log (R/WC0) If CPUID.06H:EAX[4]
> > = 1 */
> > + if (boot_cpu_has(X86_FEATURE_PLN))
> > + def_therm_core_clear_mask |= BIT(11);
> > +
> > + /*
> > + * Bit 13: Current Limit log (R/WC0) If CPUID.06H:EAX[7] =
> > 1
> > + * Bit 15: Cross Domain Limit log (R/WC0) If
> > CPUID.06H:EAX[7] = 1
> > + */
> > + if (boot_cpu_has(X86_FEATURE_HWP))
> > + def_therm_core_clear_mask |= (BIT(13) | BIT(15));
> > +}
> > +
> > +static void thermal_intr_pkg_clear_mask(void)
> > +{
> > + if (def_therm_pkg_clear_mask)
> > + return;
> > +
> > + /*
> > + * Reference: Intel SDM Volume 4
> > + * "Table 2-2. IA-32 Architectural MSRs", MSR 0x1B1
> > + * IA32_PACKAGE_THERM_STATUS.
> > + */
> > +
> > + /* All bits except BIT 26 depends on CPUID.06H: EAX[6] = 1
> > */
> > + if (boot_cpu_has(X86_FEATURE_PTS))
> > + def_therm_pkg_clear_mask = (BIT(1) | BIT(3) |
> > BIT(5) | BIT(7) | BIT(9) | BIT(11));
> > +
> > + /*
> > + * Intel SDM Volume 2A: Thermal and Power Management Leaf
> > + * Bit 26: CPUID.06H: EAX[19] = 1
> > + */
> > + if (boot_cpu_has(X86_FEATURE_HFI))
> > + def_therm_pkg_clear_mask |= BIT(26);
> > +}
> >
> > /*
> > * Clear the bits in package thermal status register for bit = 1
> > @@ -207,13 +266,10 @@ void thermal_clear_package_intr_status(int
> > level, u64 bit_mask)
> >
> > if (level == CORE_LEVEL) {
> > msr = MSR_IA32_THERM_STATUS;
> > - msr_val = THERM_STATUS_CLEAR_CORE_MASK;
> > + msr_val = def_therm_core_clear_mask;
> > } else {
> > msr = MSR_IA32_PACKAGE_THERM_STATUS;
> > - msr_val = THERM_STATUS_CLEAR_PKG_MASK;
> > - if (boot_cpu_has(X86_FEATURE_HFI))
> > - msr_val |= BIT(26);
> > -
> > + msr_val = def_therm_pkg_clear_mask;
> > }
> >
> > msr_val &= ~bit_mask;
> > @@ -708,6 +764,9 @@ void intel_init_thermal(struct cpuinfo_x86 *c)
> > h = THERMAL_APIC_VECTOR | APIC_DM_FIXED | APIC_LVT_MASKED;
> > apic_write(APIC_LVTTHMR, h);
> >
> > + thermal_intr_core_clear_mask();
> > + thermal_intr_pkg_clear_mask();
> > +
> > rdmsr(MSR_IA32_THERM_INTERRUPT, l, h);
> > if (cpu_has(c, X86_FEATURE_PLN) && !int_pln_enable)
> > wrmsr(MSR_IA32_THERM_INTERRUPT,
> > --
> > 2.39.1
> >
prev parent reply other threads:[~2023-04-11 17:13 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-10 17:35 [PATCH] thermal: intel: Fix unchecked MSR issue Srinivas Pandruvada
2023-04-11 16:16 ` Rafael J. Wysocki
2023-04-11 17:12 ` srinivas pandruvada [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=26bea1012a0a02895e30f6a33a4ebfe6ec8ba4b2.camel@linux.intel.com \
--to=srinivas.pandruvada@linux.intel.com \
--cc=bp@alien8.de \
--cc=daniel.lezcano@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=rafael@kernel.org \
--cc=rsalvaterra@gmail.com \
--cc=rui.zhang@intel.com \
--cc=stable@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox