Linux Perf Users
 help / color / mirror / Atom feed
From: "Chen, Zide" <zide.chen@intel.com>
To: sashiko-reviews@lists.linux.dev
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
Date: Wed, 13 May 2026 11:35:17 -0700	[thread overview]
Message-ID: <61a7734a-752b-4452-ae18-511f0da6aa6a@intel.com> (raw)
In-Reply-To: <20260513014918.E1D66C2BCB0@smtp.kernel.org>



On 5/12/2026 6:49 PM, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
> - [High] Unchecked return value causes a write to MSR 0x0 when a PMON discovery unit is not found for the current die, leading to a #GP fault and kernel panic.
> --
> 
> commit 618972be846a765ebeeda359a995bf217a4d825c
> Author: Zide Chen <zide.chen@intel.com>
> 
> perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems
> 
> This commit updates intel_generic_uncore_box_ctl() to look up discovery
> units using box->dieid instead of hardcoding -1. This ensures the
> correct node is selected on multi-die systems.
> 
>> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
>> index 583cbd06b9b88..1d22d7c00ee06 100644
>> --- a/arch/x86/events/intel/uncore_discovery.c
>> +++ b/arch/x86/events/intel/uncore_discovery.c
>> @@ -481,7 +481,7 @@ static u64 intel_generic_uncore_box_ctl(struct intel_uncore_box *box)
>>  	struct intel_uncore_discovery_unit *unit;
>>  
>>  	unit = intel_uncore_find_discovery_unit(box->pmu->type->boxes,
>> -						-1, box->pmu->pmu_idx);
>> +						box->dieid, box->pmu->pmu_idx);
>>  	if (WARN_ON_ONCE(!unit))
>>  		return 0;
> 
> Could returning 0 here cause a regression during CPU hotplug if a die is
> brought online later?

You're right, it's problematic, but not a regression — it's caused by a
pre-existing bug.

With or without this patch, intel_uncore_find_discovery_unit() could
return NULL, causing intel_generic_uncore_box_ctl() to return 0, and all
callers should check the return value before accessing hardware.

The CPU offline scenario you described is just one example of this. I'll
fix it in a separate patch.


> If a CPU die is offline during boot, uncore_discovery_msr() skips it,
> meaning no PMON units are added to the discovery RB-tree for that dieid.
> 
> When the die is later brought online, uncore_event_cpu_online()
> allocates boxes based on the globally maximum number of discovered units
> and calls uncore_box_init().
> 
> The initialization function unconditionally calls
> intel_generic_uncore_msr_init_box(), which writes to the returned address
> without checking if it is 0:
> 
> arch/x86/events/intel/uncore_discovery.c:intel_generic_uncore_msr_init_box() {
> 	wrmsrq(intel_generic_uncore_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
> }
> 
> Because the lookup now strictly matches box->dieid instead of safely
> falling back with -1 to an identical unit on another die,
> intel_uncore_find_discovery_unit() will return NULL for this die.
> 
> This results in intel_generic_uncore_box_ctl() returning 0. Calling
> wrmsrq(0, ...) writes directly to MSR 0x0 (IA32_P5_MC_ADDR), which will
> trigger a #GP fault and crash the system.
> 
> Similar unvalidated accesses also occur in PCI configuration space
> via intel_generic_uncore_pci_init_box():
> 
> arch/x86/events/intel/uncore_discovery.c:intel_generic_uncore_pci_init_box() {
> 	struct pci_dev *pdev = box->pci_dev;
> 	int box_ctl = intel_pci_uncore_box_ctl(box);
> 
> 	__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> 	pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
> }
> 
> Should the callers be updated to check the return value of
> intel_generic_uncore_box_ctl() before attempting hardware accesses?
> 


  reply	other threads:[~2026-05-13 18:35 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-11 23:05 [PATCH 0/6] perf/x86/intel/uncore: Bug fixes and cleanups Zide Chen
2026-05-11 23:05 ` [PATCH 1/6] perf/x86/intel/uncore: Fix discovery unit lookup for multi-die systems Zide Chen
2026-05-13  1:49   ` sashiko-bot
2026-05-13 18:35     ` Chen, Zide [this message]
2026-05-11 23:05 ` [PATCH 2/6] perf/x86/intel/uncore: Fix PCI device refcount leak in UPI discovery Zide Chen
2026-05-12  9:27   ` Mi, Dapeng
2026-05-12 17:35     ` Chen, Zide
2026-05-13  0:31       ` Mi, Dapeng
2026-05-11 23:05 ` [PATCH 3/6] perf/x86/intel/uncore: Defer ADL global PMON enable to enable_box() Zide Chen
2026-05-13  2:33   ` sashiko-bot
2026-05-13 17:58     ` Chen, Zide
2026-05-11 23:05 ` [PATCH 4/6] perf/x86/intel/uncore: Move die_to_cpu() to uncore.c Zide Chen
2026-05-13  2:58   ` sashiko-bot
2026-05-13 18:11     ` Chen, Zide
2026-05-11 23:05 ` [PATCH 5/6] perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies Zide Chen
2026-05-11 23:05 ` [PATCH 6/6] perf/x86/intel/uncore: Implement global init callback for GNR uncore Zide Chen
2026-05-13  4:25   ` sashiko-bot
2026-05-13 18:26     ` Chen, Zide

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=61a7734a-752b-4452-ae18-511f0da6aa6a@intel.com \
    --to=zide.chen@intel.com \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox