From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH 5/7] perf/x86/intel/uncore: Introduce PMU flags and broken state
Reply-To: sashiko-reviews@lists.linux.dev
To: "Zide Chen"
Cc: linux-perf-users@vger.kernel.org
In-Reply-To: <20260512233048.9577-6-zide.chen@intel.com>
References: <20260512233048.9577-6-zide.chen@intel.com>
Content-Type: text/plain; charset=utf-8
Date: Thu, 14 May 2026 04:27:37 +0000
Message-Id: <20260514042738.80B40C2BCB7@smtp.kernel.org>
Precedence: bulk
X-Mailing-List: linux-perf-users@vger.kernel.org

Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:

- [Critical] Intentionally leaking `die_refcnt` on PCI box setup failure prevents PMU unregistration, leading to a Use-After-Free.
- [High] Global `PMU_BROKEN` state disables new event creation and breaks CPU context migration for fully functional dies.
- [Medium] Confused lifecycle assumptions lead to contradictory error handling and dead code between PCI and MSR/MMIO PMUs.
- [Medium] Data race between plain store and atomic bitwise operation on `pmu->flags`.

--
commit 553c891e098baf8dc3f2b474849cc4642dfe9be6
Author: Zide Chen

    perf/x86/intel/uncore: Introduce PMU flags and broken state

    This commit replaces the 'registered' field in intel_uncore_pmu with a
    'flags' field, introducing a PMU_BROKEN flag to track box setup failures.
    If a box fails to initialize, the PMU is marked broken, which rejects new
    event assignments and skips future box setup attempts.

> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 617a68c1d76c3..4d31dcaf5405b 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c

[ ... ]

> @@ -738,7 +738,7 @@ static int uncore_pmu_event_init(struct perf_event *event)
> 
> 	pmu = uncore_event_to_pmu(event);
> 	/* no device found for this pmu */
> -	if (!pmu->registered)
> +	if (!uncore_pmu_available(pmu))
> 		return -ENOENT;

Since PMU_BROKEN is a global flag across all dies and packages, if one die
fails initialization, the entire PMU is marked broken. Does this cause
uncore_pmu_event_init() to incorrectly reject new event assignments on
fully functional packages?

[ ... ]

> @@ -934,16 +934,16 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
> 
> 	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
> 	if (!ret)
> -		pmu->registered = true;
> +		uncore_pmu_set_registered(pmu);
> 	return ret;
> }
> 
> static void uncore_pmu_unregister(struct intel_uncore_pmu *pmu)
> {
> 	if (!uncore_pmu_registered(pmu))
> 		return;
> 	perf_pmu_unregister(&pmu->pmu);
> -	pmu->registered = false;
> +	WRITE_ONCE(pmu->flags, 0);

Can this plain store data race with concurrent atomic bitwise operations?

If uncore_box_setup() executes on another CPU (e.g., probing a new failing
die while removing a successful die), it uses the atomic read-modify-write
operation set_bit(PMU_BROKEN_BIT, &(pmu)->flags). Mixing plain stores and
atomic bitwise operations on the same memory location could lead to
clobbered updates.

[ ... ]

> @@ -1136,7 +1136,13 @@ static int uncore_box_setup(struct intel_uncore_pmu *pmu,
> 
> 	/* die_refcnt tracks online dies, not only functioning boxes. */
> 	dies = atomic_inc_return(&pmu->die_refcnt);
> -	uncore_box_init(box);
> +
> +	if (uncore_pmu_broken(pmu))
> +		return -ENODEV;
> +
> +	ret = uncore_box_init(box);
> +	if (ret)
> +		goto err;
> 
> 	/* First active box registers the pmu. */
> 	if (dies > 1)

> @@ -1148,6 +1154,19 @@ static int uncore_box_setup(struct intel_uncore_pmu *pmu,
> 
> 	return 0;
> err:
> +	/*
> +	 * On failure on any box, mark the per-package PMU as broken regardless
> +	 * of whether it was registered or not.
> +	 *
> +	 * Don't decrement die_refcnt to prevent any future CPU online
> +	 * event or PCI probe, from retrying the failed PMU registration.
> +	 *
> +	 * Don't decrement cpu_refcnt to avoid other in-die CPUs from
> +	 * trying to set up the PMU box again.
> +	 *
> +	 * Don't kfree box; MSR and MMIO boxes are freed at module exit only.
> +	 */

Is there a risk of a use-after-free resulting from intentionally leaking
die_refcnt here?

For PCI PMUs, the caller uncore_pci_pmu_register() executes kfree(box) on
error and fails the PCI probe, meaning uncore_pci_remove() is never called
to clean up this leaked reference.

If another successfully probed PCI box for this PMU is later removed,
uncore_pci_pmu_unregister() decrements die_refcnt, but it will not reach 0
due to the earlier leak. Consequently, uncore_pmu_unregister(pmu) is
skipped, leaving the PMU and its associated perf events active.

When uncore_pci_pmu_unregister() then frees the successful box, active
perf_event instances could still hold event->pmu_private pointing to this
freed box, potentially triggering a use-after-free on subsequent event
reads or context switches.

Additionally, this comment claims MSR and MMIO boxes are freed at module
exit only. Does this conflict with PCI PMUs, which unconditionally execute
kfree(box) on error, immediately destroying the box?

> +	uncore_pmu_set_broken(pmu);
> 	uncore_box_exit(box);
> 	return ret;
> }

[ ... ]

> @@ -1483,7 +1502,8 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
> 
> 	if (old_cpu < 0) {
> 		WARN_ON_ONCE(box->cpu != -1);
> -		if (uncore_die_has_box(type, die, pmu->pmu_idx)) {
> +		if (uncore_die_has_box(type, die, pmu->pmu_idx) &&
> +		    !uncore_pmu_broken(pmu)) {
> 			box->cpu = new_cpu;
> 			cpumask_set_cpu(new_cpu, &pmu->cpu_mask);
> 		}
> @@ -1493,7 +1513,7 @@ static void uncore_change_type_ctx(struct intel_uncore_type *type, int old_cpu,
> 		WARN_ON_ONCE(box->cpu != -1 && box->cpu != old_cpu);
> 		box->cpu = -1;
> 		cpumask_clear_cpu(old_cpu, &pmu->cpu_mask);
> -		if (new_cpu < 0)
> +		if (new_cpu < 0 || uncore_pmu_broken(pmu))
> 			continue;

If uncore_pmu_broken(pmu) causes this to skip, does this strand active
hardware events on offlined CPUs for fully functional dies?

The commit message states the goal is to avoid disrupting in-flight events
on other boxes, but skipping uncore_pmu_cancel_hrtimer() and
perf_pmu_migrate_context() here seems to break CPU context migration for
those events.

> 
> 	if (!uncore_die_has_box(type, die, pmu->pmu_idx))

[ ... ]

> @@ -1573,7 +1593,7 @@ static int allocate_boxes(struct intel_uncore_type **types,
> 
> 	type = *types;
> 	pmu = type->pmus;
> 	for (i = 0; i < type->num_boxes; i++, pmu++) {
> -		if (pmu->boxes[die])
> +		if (pmu->boxes[die] || uncore_pmu_broken(pmu))
> 			continue;

Is this check dead code? allocate_boxes() is exclusively used by MSR/MMIO
PMUs, but those PMUs ignore hardware initialization failures and never call
uncore_pmu_set_broken(pmu).

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260512233048.9577-1-zide.chen@intel.com?part=5