public inbox for linux-perf-users@vger.kernel.org
From: "Mi, Dapeng" <dapeng1.mi@linux.intel.com>
To: "Chen, Zide" <zide.chen@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Eranian Stephane <eranian@google.com>
Cc: linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Steve Wahl <steve.wahl@hpe.com>,
	Chun-Tse Shao <ctshao@google.com>,
	Markus Elfring <Markus.Elfring@web.de>
Subject: Re: [PATCH V6 4/5] perf/x86/intel/uncore: Fix PMON enumeration with NUMA disabled
Date: Fri, 3 Apr 2026 08:58:07 +0800
Message-ID: <ca050d2c-53e6-40fe-b72a-4dd11fe739ed@linux.intel.com>
In-Reply-To: <71af8e5f-8d13-4bec-8856-b30ebaf308c7@intel.com>


On 4/3/2026 5:31 AM, Chen, Zide wrote:
>
> On 4/1/2026 7:48 PM, Mi, Dapeng wrote:
>> On 4/2/2026 4:25 AM, Chen, Zide wrote:
>>> On 3/30/2026 6:26 PM, Mi, Dapeng wrote:
>>>> On 3/31/2026 5:24 AM, Zide Chen wrote:
>>>>> When NUMA is disabled on a NUMA-capable platform, UPI and M3UPI PMON
>>>>> units are not enumerated.
>>>>>
>>>>> In this case, pcibus_to_node() always returns NUMA_NO_NODE, causing
>>>>> uncore_device_to_die() to return -1 for all PCI devices. As a result,
>>>>> the corresponding PMON units are not added to the RB tree.
>>>>>
>>>>> These PMON units are per-die resources, and their utility when NUMA is
>>>>> disabled is limited.  The driver does not prohibit their use, and the
>>>>> enumeration should still work correctly.
>>>>>
>>>>> Fix this by using uncore_pcibus_to_dieid(), which works regardless of
>>>>> whether NUMA is enabled.  This requires calling
>>>>> snbep_pci2phy_map_init() in spr_uncore_pci_init().
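To illustrate the difference, here is a minimal userspace model (not kernel code) of the two die-lookup strategies. It assumes one die per NUMA node and models the bus-to-die table that snbep_pci2phy_map_init() fills in as a plain array for a single PCI segment. All names (`bus_to_die`, `sim_pcibus_to_node()`, `die_from_numa()`, `die_from_bus_map()`) are hypothetical.

```c
#include <stdbool.h>

#define NUMA_NO_NODE (-1)  /* the kernel's "no node" value */
#define MAX_BUSES    256   /* one PCI segment: bus numbers 0..255 */

/* Hypothetical stand-in for the bus-to-die table that
 * snbep_pci2phy_map_init() fills in; -1 means "unknown bus". */
static int bus_to_die[MAX_BUSES];

static void map_reset(void)
{
	for (int bus = 0; bus < MAX_BUSES; bus++)
		bus_to_die[bus] = -1;
}

/* Simulated pcibus_to_node(): with NUMA disabled, every bus reports
 * NUMA_NO_NODE. Assumes one die per node, so node == die otherwise. */
static int sim_pcibus_to_node(int bus, bool numa_enabled)
{
	return numa_enabled ? bus_to_die[bus] : NUMA_NO_NODE;
}

/* Old path (uncore_device_to_die()-like): die derived from the node. */
static int die_from_numa(int bus, bool numa_enabled)
{
	int node = sim_pcibus_to_node(bus, numa_enabled);

	return node == NUMA_NO_NODE ? -1 : node;
}

/* New path (uncore_pcibus_to_dieid()-like): direct table lookup that
 * does not depend on whether NUMA is enabled. */
static int die_from_bus_map(int bus)
{
	if (bus < 0 || bus >= MAX_BUSES)
		return -1;
	return bus_to_die[bus];
}
```

With NUMA disabled, `die_from_numa()` returns -1 for every bus, so a caller that skips negative dies (as the `if (die < 0) continue;` in spr_update_device_location() does) enumerates nothing, while `die_from_bus_map()` still resolves the die.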
>>>>>
>>>>> Since pci_init() is called before mmio_init(), remove the redundant
>>>>> snbep_pci2phy_map_init() call from spr_uncore_mmio_init().  If
>>>>> snbep_pci2phy_map_init() fails, the uncore driver should bail out,
>>>>> so the fallback path in spr_uncore_mmio_init() can be removed.
>>>>>
>>>>> Signed-off-by: Zide Chen <zide.chen@intel.com>
>>>>> ---
>>>>> V6:
>>>>> - Split from patch v5 3/4.
>>>>> - Remove the redundant call in spr_uncore_mmio_init().
>>>>> - Update commit messages.
>>>>> ---
>>>>>  arch/x86/events/intel/uncore.c       |  1 +
>>>>>  arch/x86/events/intel/uncore_snbep.c | 26 +++++++++++---------------
>>>>>  2 files changed, 12 insertions(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>>>>> index 786bd51a0d89..e9cc1ba921c5 100644
>>>>> --- a/arch/x86/events/intel/uncore.c
>>>>> +++ b/arch/x86/events/intel/uncore.c
>>>>> @@ -67,6 +67,7 @@ int uncore_die_to_segment(int die)
>>>>>  	return bus ? pci_domain_nr(bus) : -EINVAL;
>>>>>  }
>>>>>  
>>>>> +/* Note: This API can only be used when NUMA information is available. */
>>>>>  int uncore_device_to_die(struct pci_dev *dev)
>>>>>  {
>>>>>  	int node = pcibus_to_node(dev->bus);
>>>>> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
>>>>> index 8ee06d4659bb..73da1e88e286 100644
>>>>> --- a/arch/x86/events/intel/uncore_snbep.c
>>>>> +++ b/arch/x86/events/intel/uncore_snbep.c
>>>>> @@ -6415,7 +6415,7 @@ static void spr_update_device_location(int type_id)
>>>>>  
>>>>>  	while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
>>>>>  
>>>>> -		die = uncore_device_to_die(dev);
>>>>> +		die = uncore_pcibus_to_dieid(dev->bus);
>>>>>  		if (die < 0)
>>>>>  			continue;
>>>>>  
>>>>> @@ -6439,6 +6439,10 @@ static void spr_update_device_location(int type_id)
>>>>>  
>>>>>  int spr_uncore_pci_init(void)
>>>>>  {
>>>>> +	int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
>>>>> +	if (ret)
>>>>> +		return ret;
>>>>> +
>>>>>  	/*
>>>>>  	 * The discovery table of UPI on some SPR variant is broken,
>>>>>  	 * which impacts the detection of both UPI and M3UPI uncore PMON.
>>>>> @@ -6460,21 +6464,13 @@ int spr_uncore_pci_init(void)
>>>>>  
>>>>>  void spr_uncore_mmio_init(void)
>>>>>  {
>>>>> -	int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
>>>>> +	uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO,
>>>>> +						 UNCORE_SPR_MMIO_EXTRA_UNCORES,
>>>>> +						 spr_mmio_uncores,
>>>>> +						 UNCORE_SPR_NUM_UNCORE_TYPES,
>>>>> +						 spr_uncores);
>>>>>  
>>>>> -	if (ret) {
>>>>> -		uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO, 0, NULL,
>>>>> -							 UNCORE_SPR_NUM_UNCORE_TYPES,
>>>>> -							 spr_uncores);
>>>>> -	} else {
>>>>> -		uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO,
>>>>> -							 UNCORE_SPR_MMIO_EXTRA_UNCORES,
>>>>> -							 spr_mmio_uncores,
>>>>> -							 UNCORE_SPR_NUM_UNCORE_TYPES,
>>>>> -							 spr_uncores);
>>>>> -
>>>>> -		spr_uncore_imc_free_running.num_boxes = uncore_type_max_boxes(uncore_mmio_uncores, UNCORE_SPR_IMC) / 2;
>>>>> -	}
>>>>> +	spr_uncore_imc_free_running.num_boxes = uncore_type_max_boxes(uncore_mmio_uncores, UNCORE_SPR_IMC) / 2;
>>>> I'm not sure if we can directly remove the snbep_pci2phy_map_init() call
>>>> here. In theory, the snbep_pci2phy_map_init() call in spr_uncore_pci_init()
>>>> could fail, and spr_uncore_mmio_init() would not know about it and would
>>>> go ahead and initialize the MMIO PMUs, which could cause the MMIO
>>>> initialization to fail.
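A minimal userspace model of this ordering dependency, assuming a hypothetical `map_ready` flag in place of checking the return value of snbep_pci2phy_map_init(); the function names are made up for illustration. `base_types` stands for the MMIO uncores that do not need the PCI-derived map, and `extra_types` for the extra `spr_mmio_uncores` entries that the removed fallback branch left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical: set once the PCI-bus-to-die map has been built. */
static bool map_ready;

/* Stand-in for spr_uncore_pci_init(): the map is built here first.
 * On failure, the PCI uncores are not registered, but nothing tells
 * the MMIO path about it. */
static int pci_init_model(bool map_init_fails)
{
	if (map_init_fails)
		return -ENODEV;
	map_ready = true;
	return 0;
}

/* Stand-in for spr_uncore_mmio_init(): returns how many MMIO uncore
 * types would be registered. With the fallback kept, types that need
 * the map are dropped; without it, they are registered regardless. */
static int mmio_init_model(int base_types, int extra_types, bool keep_fallback)
{
	if (keep_fallback && !map_ready)
		return base_types;
	return base_types + extra_types;
}
```

If the map init fails and the fallback is removed, `mmio_init_model()` still counts the extra types, which is exactly the case where those PMUs would be registered without a usable per-die MMIO base.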
>>> Yes, this is true. But I would argue that the fix in this patch is
>>> correct, and the issue you pointed out is not new: the uncore driver
>>> registers a PMU device without guaranteeing that it is functional.
>>>
>>> This is because the Intel uncore driver uses lazy initialization, and
>>> when init_box() fails, it doesn't unregister the inaccessible PMU
>>> devices. For example, intel_generic_uncore_mmio_init_box() can fail
>>> for a number of reasons, leaving all associated PMU devices non-functional.
>>>
>>> Originally the uncore driver tried to enumerate PCI/MSR/MMIO uncore
>>> units independently, but growing hardware complexity makes this more
>>> challenging. This patch is just one example: the IMC free-running
>>> counters are accessed via MMIO but rely on PCI devices to read the
>>> die-specific MMIO base address. Explicitly gating sysfs node creation
>>> on the PCI init code in mmio_init() is neither clean nor reliable.
>>>
>>> To fix it, it seems reasonable to have init_box() return int and
>>> unregister the PMU device if it is deemed inaccessible, similar to what
>>> perf_event_ibs_init() does.
>>>
>>> --- a/arch/x86/events/intel/uncore.h
>>> +++ b/arch/x86/events/intel/uncore.h
>>> @@ -129,7 +129,7 @@ struct intel_uncore_type {
>>>  #define events_group attr_groups[2]
>>>
>>>  struct intel_uncore_ops {
>>> -       void (*init_box)(struct intel_uncore_box *);
>>> +       int (*init_box)(struct intel_uncore_box *);
>>>
>>> --- a/arch/x86/events/intel/uncore.c
>>> +++ b/arch/x86/events/intel/uncore.c
>>> @@ -1155,7 +1155,8 @@ static int uncore_pci_pmu_register(struct pci_dev *pdev,
>>>         box->dieid = die;
>>>         box->pci_dev = pdev;
>>>         box->pmu = pmu;
>>> -       uncore_box_init(box);
>>> +       ret = uncore_box_init(box);
>>> +       if (ret)
>>> +               return ret;
>>>
>>> @@ -1598,8 +1599,10 @@ static int uncore_box_ref(struct intel_uncore_type **types,
>>>                 pmu = type->pmus;
>>>                 for (i = 0; i < type->num_boxes; i++, pmu++) {
>>>                         box = pmu->boxes[id];
>>> -                       if (box && box->cpu >= 0 && atomic_inc_return(&box->refcnt) == 1)
>>> -                               uncore_box_init(box);
>>> +                       if (box && box->cpu >= 0 && atomic_inc_return(&box->refcnt) == 1)
>>> +                               if (uncore_box_init(box))
>>> +                                       uncore_pmu_unregister(pmu);
>> Yes, I like this idea. The return value of init_box() should always be
>> checked. I'm not quite sure whether there are other resources that need to
>> be cleaned up besides unregistering the corresponding uncore PMU; please
>> double-check. Thanks.
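One possible shape for the error path, as a userspace sketch: init_box() returns an int, the first reference triggers initialization (mirroring the `atomic_inc_return(&box->refcnt) == 1` check in uncore_box_ref()), and on failure the sketch drops the reference it just took before unregistering the PMU. All types and names here (`model_box`, `model_box_ref()`, `model_pmu_unregister()`) are hypothetical, and the refcount rollback is only one candidate for the extra cleanup asked about, not a verified list.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_box;

/* Mirrors the proposed change: init_box() can report failure. */
struct model_ops {
	int (*init_box)(struct model_box *box);
};

struct model_box {
	atomic_int refcnt;
	int cpu;                 /* -1: no online CPU owns this box */
	bool initialized;
	const struct model_ops *ops;
};

struct model_pmu {
	bool registered;
	struct model_box *box;   /* a single die's box, for brevity */
};

/* Hypothetical stand-in for uncore_pmu_unregister(). */
static void model_pmu_unregister(struct model_pmu *pmu)
{
	pmu->registered = false;
}

/* The first reference initializes the box. On failure, roll back the
 * reference taken here and unregister the PMU so it is not left
 * visible but non-functional. Returns 0 or the init_box() error. */
static int model_box_ref(struct model_pmu *pmu)
{
	struct model_box *box = pmu->box;
	int ret;

	if (!box || box->cpu < 0)
		return 0;
	/* atomic_fetch_add() returns the old value; +1 gives the new one. */
	if (atomic_fetch_add(&box->refcnt, 1) + 1 != 1)
		return 0;                          /* already initialized */

	ret = box->ops->init_box ? box->ops->init_box(box) : 0;
	if (ret) {
		atomic_fetch_sub(&box->refcnt, 1); /* undo our reference */
		model_pmu_unregister(pmu);
		return ret;
	}
	box->initialized = true;
	return 0;
}

static int init_ok(struct model_box *box)   { (void)box; return 0; }
static int init_fail(struct model_box *box) { (void)box; return -EIO; }
```

Note that unregistering the whole PMU on one die's failure also hides the boxes on other dies; whether that is acceptable is a separate design question.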
> I'm thinking of removing this patch from this series and combining it
> with the init_box() changes, so that it becomes a complete fix. I need
> more time to double-check all the init_box() callbacks.

Yeah, it's a fundamental change, so let's do more testing. Thanks.


>
>>>
>>>> Currently the PCI, CPU and MMIO initializations are completely
>>>> independent; the uncore driver aborts only when all three of them fail.
>>>>
>>>> ```
>>>>     if (uncore_init->pci_init) {
>>>>         pret = uncore_init->pci_init();
>>>>         if (!pret)
>>>>             pret = uncore_pci_init();
>>>>     }
>>>>
>>>>     if (uncore_init->cpu_init) {
>>>>         uncore_init->cpu_init();
>>>>         cret = uncore_cpu_init();
>>>>     }
>>>>
>>>>     if (uncore_init->mmio_init) {
>>>>         uncore_init->mmio_init();
>>>>         mret = uncore_mmio_init();
>>>>     }
>>>>
>>>>     if (cret && pret && mret) {
>>>>         ret = -ENODEV;
>>>>         goto free_discovery;
>>>>     }
>>>> ```
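The gating logic in this excerpt can be modeled in userspace as below. The `step_fn` hooks are hypothetical stand-ins that return 0 or a negative errno; the real cpu_init()/mmio_init() platform hooks return void and are omitted, so only the generic uncore_cpu_init()/uncore_mmio_init() results are modeled.

```c
#include <errno.h>

/* Hypothetical step: returns 0 on success or a negative errno. */
typedef int (*step_fn)(void);

struct uncore_steps {
	step_fn platform_pci_init;  /* e.g. spr_uncore_pci_init() */
	step_fn generic_pci_init;   /* uncore_pci_init() */
	step_fn generic_cpu_init;   /* uncore_cpu_init() */
	step_fn generic_mmio_init;  /* uncore_mmio_init() */
};

static int model_uncore_init(const struct uncore_steps *s)
{
	int pret, cret, mret;

	/* Generic PCI registration runs only if the platform hook succeeded. */
	pret = s->platform_pci_init();
	if (!pret)
		pret = s->generic_pci_init();

	/* CPU (MSR) and MMIO registration run regardless of pret. */
	cret = s->generic_cpu_init();
	mret = s->generic_mmio_init();

	/* Bail out only when every access method failed. */
	if (cret && pret && mret)
		return -ENODEV;
	return 0;
}

static int step_ok(void)   { return 0; }
static int step_fail(void) { return -ENODEV; }
```

For example, if the platform PCI hook fails and uncore_cpu_init() fails but uncore_mmio_init() succeeds, model_uncore_init() returns 0, so the MMIO PMUs stay registered even though the PCI map was never built.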
>>>>
>>>>
>>>>>  }
>>>>>  
>>>>>  /* end of SPR uncore support */

Thread overview: 12+ messages
2026-03-30 21:24 [PATCH V6 0/5] Miscellaneous Intel uncore patches Zide Chen
2026-03-30 21:24 ` [PATCH V6 1/5] perf/x86/intel/uncore: Fix iounmap() leak on global_init failure Zide Chen
2026-03-30 21:24 ` [PATCH V6 2/5] perf/x86/intel/uncore: Skip discovery table for offline dies Zide Chen
2026-03-30 21:24 ` [PATCH V6 3/5] perf/x86/intel/uncore: Do not treat -1 die_id as error during UBOX scan Zide Chen
2026-03-31  1:13   ` Mi, Dapeng
2026-03-30 21:24 ` [PATCH V6 4/5] perf/x86/intel/uncore: Fix PMON enumeration with NUMA disabled Zide Chen
2026-03-31  1:26   ` Mi, Dapeng
2026-04-01 20:25     ` Chen, Zide
2026-04-02  2:48       ` Mi, Dapeng
2026-04-02 21:31         ` Chen, Zide
2026-04-03  0:58           ` Mi, Dapeng [this message]
2026-03-30 21:24 ` [PATCH V6 5/5] perf/x86/intel/uncore: Remove extra double quote mark Zide Chen
