From: "Liang, Kan" <kan.liang@linux.intel.com>
To: Borislav Petkov <bp@alien8.de>, x86-ml <x86@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Antonov <alexander.antonov@linux.intel.com>,
	lkml <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: unchecked MSR access error: WRMSR to 0xd84 (tried to write 0x0000000000010003) at rIP: 0xffffffffa025a1b8 (snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore])
Date: Mon, 4 Mar 2024 14:22:50 -0500
Message-ID: <b16add91-30c4-43e6-bcf8-11ca8aeaa783@linux.intel.com>
In-Reply-To: <20240304181841.GCZeYQgbZk6fdntg-X@fat_crate.local>



On 2024-03-04 1:18 p.m., Borislav Petkov wrote:
> Hi all,
> 
> sending this to a bunch of people who have touched this function
> recently and some more relevant Intel folks.
> 
> The machine is an old SNB:
> 
> smpboot: CPU0: Intel(R) Xeon(R) CPU E5-1620 0 @ 3.60GHz (family: 0x6, model: 0x2d, stepping: 0x7)
> 
> and with latest linus/master + tip/master it gives the below.
> 
> It must be something new because 6.8-rc6 is fine.
> 
> ...
> i801_smbus 0000:00:1f.3: enabling device (0000 -> 0003)
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input5
> i801_smbus 0000:00:1f.3: SMBus using PCI interrupt
> ACPI: button: Power Button [PWRF]
> i2c i2c-14: 4/4 memory slots populated (from DMI)
> unchecked MSR access error: WRMSR to 0xd84 (tried to write 0x0000000000010003) at rIP: 0xffffffffa025a1b8 (snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore])

0xd84 is the box control MSR of CBOX 4. (The MSR definition is on
page 11 of
https://www.intel.com/content/www/us/en/develop/download/intel-xeon-processor-e5-v2-and-e7-v2-product-families-uncore-performance-monitoring.html.)

It looks like the driver tries to access CBOX 4, which is not
available on this machine.
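
To make the mapping from the faulting address to a box concrete, here is a
minimal sketch of the address arithmetic. The constants are the values I
recall from arch/x86/events/intel/uncore_snbep.c and should be treated as
assumptions until checked against the tree; the helper names are
hypothetical and do not exist in the driver.

```c
#include <stdint.h>

/* Assumed values, recalled from uncore_snbep.c; verify against your tree. */
#define SNBEP_C0_MSR_PMON_BOX_CTL    0xd04        /* box control MSR of CBOX 0 */
#define SNBEP_CBO_MSR_OFFSET         0x20         /* MSR stride between CBOXes */
#define SNBEP_PMON_BOX_CTL_RST_CTRL  (1u << 0)    /* reset control registers */
#define SNBEP_PMON_BOX_CTL_RST_CTRS  (1u << 1)    /* reset counters */
#define SNBEP_PMON_BOX_CTL_FRZ_EN    (1u << 16)   /* enable freeze */

/* Hypothetical helper: box control MSR address for CBOX number 'box'. */
uint32_t cbox_box_ctl_msr(unsigned int box)
{
	return SNBEP_C0_MSR_PMON_BOX_CTL + box * SNBEP_CBO_MSR_OFFSET;
}

/* Hypothetical helper: CBOX number for a box control MSR, or -1 if the
 * address is not on the CBOX box control stride. */
int cbox_index_from_box_ctl(uint32_t msr)
{
	if (msr < SNBEP_C0_MSR_PMON_BOX_CTL)
		return -1;
	if ((msr - SNBEP_C0_MSR_PMON_BOX_CTL) % SNBEP_CBO_MSR_OFFSET)
		return -1;
	return (int)((msr - SNBEP_C0_MSR_PMON_BOX_CTL) / SNBEP_CBO_MSR_OFFSET);
}

/* Hypothetical helper: the value an init_box routine would write when it
 * resets the box and enables freeze. */
uint64_t box_ctl_init_value(void)
{
	return SNBEP_PMON_BOX_CTL_RST_CTRL |
	       SNBEP_PMON_BOX_CTL_RST_CTRS |
	       SNBEP_PMON_BOX_CTL_FRZ_EN;
}
```

Under these assumptions, 0xd04 + 4 * 0x20 gives 0xd84, and the three init
bits combine to 0x10003, which matches both the address and the value in
the splat.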

The number of available CBOXes on an SNBEP machine is determined at boot
time. It should not be larger than the maximum number of cores.
The recent commit 89b0f15f408f ("x86/cpu/topology: Get rid of
cpuinfo::x86_max_cores") replaced boot_cpu_data.x86_max_cores with
topology_num_cores_per_package().
I suspect the new function returns a different maximum number of cores
on this machine, but I don't have an SNBEP at hand. Could you please
check whether it returns a different maximum number of cores?
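
The suspected failure mode can be sketched as follows. This is a
standalone illustration, not the driver code: the function names are
hypothetical, the limit of 8 CBOXes for SNBEP is my recollection of the
driver default, and the inflated core count of 8 is an invented input
used only to show the effect.

```c
/* Assumed SNBEP default number of CBOXes; verify against uncore_snbep.c. */
#define SNBEP_MAX_CBOXES 8u

/* Hypothetical helper: the number of CBOXes the driver would set up after
 * clamping the model default to the reported per-package core count. */
unsigned int clamp_cbox_count(unsigned int model_max_boxes,
			      unsigned int max_cores)
{
	return model_max_boxes > max_cores ? max_cores : model_max_boxes;
}

/* Hypothetical helper: non-zero if CBOX number 'box' would be initialised
 * for the given reported core count. */
int cbox_is_initialised(unsigned int box, unsigned int max_cores)
{
	return box < clamp_cbox_count(SNBEP_MAX_CBOXES, max_cores);
}
```

With a reported count of 4 (the E5-1620 is a quad-core part), only CBOX 0
to 3 are set up and CBOX 4 is never touched. If the new topology function
reported a larger number, say 8, the clamp would no longer exclude CBOX 4
and the WRMSR to 0xd84 would fault as in the log.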

Thanks,
Kan

> Call Trace:
>  <TASK>
>  ? ex_handler_msr+0xcb/0x130
>  ? fixup_exception+0x166/0x320
>  ? exc_general_protection+0xd7/0x3f0
>  ? asm_exc_general_protection+0x22/0x30
>  ? snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore]
>  uncore_box_ref.part.0+0x9c/0xc0 [intel_uncore]
>  ? __pfx_uncore_event_cpu_online+0x10/0x10 [intel_uncore]
>  uncore_event_cpu_online+0x56/0x140 [intel_uncore]
>  ? __pfx_uncore_event_cpu_online+0x10/0x10 [intel_uncore]
>  cpuhp_invoke_callback+0x174/0x5e0
>  ? cpuhp_thread_fun+0x5a/0x200
>  cpuhp_thread_fun+0x17e/0x200
>  ? smpboot_thread_fn+0x2b/0x250
>  smpboot_thread_fn+0x1ad/0x250
>  ? __pfx_smpboot_thread_fn+0x10/0x10
>  kthread+0xed/0x120
>  ? __pfx_kthread+0x10/0x10
>  ret_from_fork+0x30/0x50
>  ? __pfx_kthread+0x10/0x10
> iTCO_vendor_support: vendor-support=0
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>
> iTCO_wdt iTCO_wdt.1.auto: Found a Patsburg TCO device (Version=2, TCOBASE=0x0460)
> iTCO_wdt iTCO_wdt.1.auto: initialized. heartbeat=30 sec (nowayout=0)
> RAPL PMU: API unit is 2^-32 Joules, 2 fixed counters, 163840 ms ovfl timer
> ...
> 
> Thx.
> 

Thread overview: 8+ messages
2024-03-04 18:18 unchecked MSR access error: WRMSR to 0xd84 (tried to write 0x0000000000010003) at rIP: 0xffffffffa025a1b8 (snbep_uncore_msr_init_box+0x38/0x60 [intel_uncore]) Borislav Petkov
2024-03-04 19:22 ` Liang, Kan [this message]
2024-03-04 20:12   ` Borislav Petkov
2024-03-05 10:14     ` Thomas Gleixner
2024-03-05 12:10       ` Borislav Petkov
2024-03-06 11:17         ` Thomas Gleixner
2024-03-06 12:32           ` Borislav Petkov
2024-03-06 13:42           ` [tip: x86/apic] x86/topology: Ignore non-present APIC IDs in a present package tip-bot2 for Thomas Gleixner
