Linux Perf Users
 help / color / mirror / Atom feed
From: sashiko-bot@kernel.org
To: "Zide Chen" <zide.chen@intel.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH V3 3/8] perf/x86/intel/uncore: Let init_box() callback report failures
Date: Thu, 11 Jun 2026 16:38:45 +0000	[thread overview]
Message-ID: <20260611163846.1CCAF1F00893@smtp.kernel.org> (raw)
In-Reply-To: <20260611160033.66760-4-zide.chen@intel.com>

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

New issues:
- [Medium] The `adl_uncore_msr_ops` struct lacks an `init_box` callback, causing `exit_box` to be skipped during teardown and leaving hardware PMU logic enabled.
- [Low] PCI `init_box` callbacks return positive `PCIBIOS_*` error codes instead of negative error codes as documented.

Pre-existing issues:
- [High] `ptl_uncore_sncu_init_box` performs a `writel()` gated by a check on `box->io_addr` rather than checking for initialization success, leading to Use-After-Unmap if reinitialization fails.
- [High] `uncore_mmio_exit_box()` calls `iounmap()` on `box->io_addr` but fails to clear the pointer to NULL, leaving a dangling pointer.
--

commit eb92519036a2afbd738b5e91732e11fc662b60f7
Author: Zide Chen <zide.chen@intel.com>

perf/x86/intel/uncore: Let init_box() callback report failures

The init_box() callback currently returns void, so initialization
failures are silently ignored and the box is still marked initialized.
This patch changes the callback to return int so platform code can report
errors back to the common uncore layer.

> diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
> --- a/arch/x86/events/intel/uncore.h
> +++ b/arch/x86/events/intel/uncore.h
[ ... ]
> @@ -557,12 +557,18 @@ static inline u64 uncore_read_counter(struct intel_uncore_box *box,
>  	return box->pmu->type->ops->read_counter(box, event);
>  }
>  
> -static inline void uncore_box_init(struct intel_uncore_box *box)
> +static inline int uncore_box_init(struct intel_uncore_box *box)
>  {
> -	if (!test_and_set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags)) {
> -		if (box->pmu->type->ops->init_box)
> -			box->pmu->type->ops->init_box(box);
> +	int ret = 0;
> +
> +	if (!test_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags) &&
> +	    box->pmu->type->ops->init_box) {
> +		ret = box->pmu->type->ops->init_box(box);
> +		if (!ret)
> +			__set_bit(UNCORE_BOX_FLAG_INITIALIZED, &box->flags);
>  	}

[Severity: Medium]
Will this cause exit_box() to be skipped for PMUs that lack an init_box()
callback?

For example, adl_uncore_msr_ops defines exit_box but lacks an init_box.

Because ops->init_box is NULL, the UNCORE_BOX_FLAG_INITIALIZED flag is
never set for Alder Lake MSR PMUs.

During teardown, uncore_box_exit() checks the initialized flag.

Since it is false, adl_uncore_msr_exit_box() is skipped, and the
ADL_UNC_PERF_GLOBAL_CTL MSR is never cleared.

Does this leave the hardware PMON logic improperly enabled?

> diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
> --- a/arch/x86/events/intel/uncore_discovery.c
> +++ b/arch/x86/events/intel/uncore_discovery.c
[ ... ]
> @@ -578,15 +579,16 @@ static inline int intel_pci_uncore_box_ctl(struct intel_uncore_box *box)
>  	return UNCORE_DISCOVERY_PCI_BOX_CTRL(intel_generic_uncore_box_ctl(box));
>  }
>  
> -void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
> +int intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
>  {
>  	int box_ctl = intel_pci_uncore_box_ctl(box);
>  
>  	if (!box_ctl)
> -		return;
> +		return -ENODEV;
>  
>  	__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
> -	pci_write_config_dword(box->pci_dev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
> +	return pci_write_config_dword(box->pci_dev, box_ctl,
> +				      GENERIC_PMON_BOX_CTL_INT);
>  }

[Severity: Low]
Does this violate the contract mentioned in the commit message to "return 0
on success or a negative error code on failure"?

pci_write_config_dword() yields positive PCIBIOS_* error codes on failure.

Returning positive values directly mixes standard Linux error codes with PCI
BIOS error codes.

> diff --git a/arch/x86/events/intel/uncore_snb.c b/arch/x86/events/intel/uncore_snb.c
> --- a/arch/x86/events/intel/uncore_snb.c
> +++ b/arch/x86/events/intel/uncore_snb.c
[ ... ]
> @@ -1887,13 +1907,15 @@ static struct intel_uncore_type ptl_uncore_imc = {
>  	.mmio_map_size		= 0xf00,
>  };
>  
> -static void ptl_uncore_sncu_init_box(struct intel_uncore_box *box)
> +static int ptl_uncore_sncu_init_box(struct intel_uncore_box *box)
>  {
> -	intel_generic_uncore_mmio_init_box(box);
> +	int ret = intel_generic_uncore_mmio_init_box(box);
>  
>  	/* Clear the global freeze bit */
>  	if (box->io_addr)
>  		writel(0, box->io_addr + PTL_UNCORE_GLOBAL_CTL_OFFSET);
> +
> +	return ret;
>  }

[Severity: High]
This is a pre-existing issue, but should this check if (!ret) instead of
if (box->io_addr)?

When a CPU goes offline, the MMIO region is unmapped by
uncore_mmio_exit_box(), but the pointer box->io_addr is not cleared to NULL.

If the CPU is later brought back online,
intel_generic_uncore_mmio_init_box() may fail early (e.g., returning
-ENODEV) without overwriting the stale box->io_addr pointer.

Because the stale pointer evaluates to true, the kernel would execute
writel(0, box->io_addr + PTL_UNCORE_GLOBAL_CTL_OFFSET) to an unmapped
address.

Could this trigger a page fault?

[Severity: High]
This isn't a bug introduced by this patch, but does uncore_mmio_exit_box()
leave a dangling pointer?

uncore_mmio_exit_box() calls iounmap() on box->io_addr but fails to clear
the pointer to NULL.

If the box is later reused and suffers an early initialization failure,
box->io_addr remains non-NULL.

Core handlers (such as uncore_mmio_read_counter) validate readiness by
checking if (!box->io_addr).

Since the pointer is not cleared, these checks can bypass validation and
permit perf events to trigger memory accesses.

Could this lead to a use-after-unmap memory access?

> diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
[ ... ]
> @@ -627,12 +627,12 @@ static u64 snbep_uncore_pci_read_counter(struct intel_uncore_box *box, struct pe
>  	return count;
>  }
>  
> -static void snbep_uncore_pci_init_box(struct intel_uncore_box *box)
> +static int snbep_uncore_pci_init_box(struct intel_uncore_box *box)
>  {
>  	struct pci_dev *pdev = box->pci_dev;
>  	int box_ctl = uncore_pci_box_ctl(box);
>  
> -	pci_write_config_dword(pdev, box_ctl, SNBEP_PMON_BOX_CTL_INT);
> +	return pci_write_config_dword(pdev, box_ctl, SNBEP_PMON_BOX_CTL_INT);
>  }

[Severity: Low]
Similar to intel_generic_uncore_pci_init_box(), does this also violate the
negative error code API contract by returning positive PCIBIOS error codes?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260611160033.66760-1-zide.chen@intel.com?part=3

  reply	other threads:[~2026-06-11 16:38 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-11 16:00 [PATCH v3 0/8] perf/x86/intel/uncore: PMU setup robustness fixes Zide Chen
2026-06-11 16:00 ` [PATCH V3 1/8] perf/x86/intel/uncore: Fix PCI PMU cleanup on setup failure Zide Chen
2026-06-11 16:26   ` sashiko-bot
2026-06-11 16:00 ` [PATCH V3 2/8] perf/x86/intel/uncore: Fix refcnt and other cleanups Zide Chen
2026-06-11 16:29   ` sashiko-bot
2026-06-11 16:00 ` [PATCH V3 3/8] perf/x86/intel/uncore: Let init_box() callback report failures Zide Chen
2026-06-11 16:38   ` sashiko-bot [this message]
2026-06-11 16:00 ` [PATCH V3 4/8] perf/x86/intel/uncore: Keep PCI PMUs working when MMIO/MSR setup fails Zide Chen
2026-06-11 16:00 ` [PATCH V3 5/8] perf/x86/intel/uncore: Factor out box setup code Zide Chen
2026-06-11 16:00 ` [PATCH V3 6/8] perf/x86/intel/uncore: Introduce PMU flags and broken state Zide Chen
2026-06-11 16:30   ` sashiko-bot
2026-06-11 16:00 ` [PATCH V3 7/8] perf/x86/intel/uncore: Fix uncore_box ref/unref ordering Zide Chen
2026-06-11 16:29   ` sashiko-bot
2026-06-11 16:00 ` [PATCH V3 8/8] perf/x86/intel/uncore: Implement lazy setup for MSR/MMIO PMUs Zide Chen
2026-06-11 16:33   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260611163846.1CCAF1F00893@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    --cc=zide.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox