From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 54F092641EE for ; Wed, 27 May 2026 19:56:02 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779911763; cv=none; b=BjseCH1oW4NgjnbyIW6Oynro4Uqtl4QfbfMF7umZJN4dVYwzbGSOIGsBH6xT5KbpEdLVR0lAks3xEhqGJW/XJWPfy4g/UNes/KLnewbVqxMdoxqPL9rPV7/CZuh596vVEq+LMN3UWdNMZdpg31cBMJWGFEruBuXkYrUJOaWr8T4= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779911763; c=relaxed/simple; bh=PEgYszr3C0/80DXj9fpELQ8jt6gYhIAhYk0e4gbcO4g=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=TZEHk8ghLuTJ4MJ0hV5j+7ATJdWzb9ppZ9P1a6WNIRLnl4IqnoDpqBIZJnIWfapRGi5ByXZTHZXIkokheW8IMj/wgigRhCL2Vu3+AARP8X1l4cg6aF0ZWWS6Ju7zcuEjdyGK9gWLT/ywcEHqnu5Z0pSUBR4fkAnBGcIfcXiwLnA= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=CceXB88o; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="CceXB88o" Received: by smtp.kernel.org (Postfix) with ESMTPSA id F20AF1F000E9; Wed, 27 May 2026 19:56:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1779911762; bh=NummqP1mBbU7sy9QVHom4VF6oTxQ0g8IRMRDwJOeHVc=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=CceXB88oQi1mxz+e5PUmYQjzKLY8nuDNnnJm4XJci//oXZJbSsbv52sFYJfmuGx+J p5kHAPCvZ5W0L9rlo1s5DAmFNTg3pqRuazA/zv0VVcARPGr+ZwfN9l/ob0TwHMPed/ ATSziLJZaG4Xz2wPUtwmsXDPP3V7VILudVozQ+QSUwShdJTzT+uloKqlhTciUfLKwJ 47mEpTWMUnqR8BKHWyAJTbr2aSb9m5cgmOrHnswkFtzXigUxqnfGi5ju02T6kXU1we SVa9u5hAaOVlikeSa5RjmitbwzQ/S/FgNmfOKh9x5iceNtAMZI6pTZyWG1ZKlq+Z67 zITfmzbnhsfWg== From: sashiko-bot@kernel.org Subject: Re: [PATCH V2 6/7] perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies Reply-To: sashiko-reviews@lists.linux.dev To: "Zide Chen" Cc: linux-perf-users@vger.kernel.org In-Reply-To: <20260527151154.130505-6-zide.chen@intel.com> References: <20260527151154.130505-6-zide.chen@intel.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 27 May 2026 19:56:01 +0000 Message-Id: <20260527195601.F20AF1F000E9@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-perf-users@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 2 potential issue(= s) to consider: Pre-existing issues: - [High] This problem wasn't introduced by this patch, but in `pmu_alloc_to= pology()`, if the memory allocation for `topology[die]` fails, the error cl= eanup loop dereferences the newly-NULL pointer, causing a kernel crash. - [High] This problem wasn't introduced by this patch, but there is an unpr= otected list traversal of PCI buses in `uncore_die_to_segment()` leading to= a potential Use-After-Free if a PCI bus is hot-removed. -- commit ff1f5cc12e40244014f762aa4e444d2ef1d867a3 Author: Zide Chen perf/x86/intel/uncore: Fix uncore_die_to_cpu() for offline dies If the die is offline when uncore_die_to_cpu() is called, it silently returns 0, which is misleading. This patch modifies the function to return -1 in this case, indicating that all CPUs on the die are offline. It also opportunistically replaces -EPERM with -ENODEV. > diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncor= e.c > index 22256ded2d67d..4b3a1fa5b41b7 100644 > --- a/arch/x86/events/intel/uncore.c > +++ b/arch/x86/events/intel/uncore.c > @@ -91,7 +91,7 @@ int uncore_device_to_die(struct pci_dev *dev) > */ > int uncore_die_to_cpu(int die) > { > - int res =3D 0, cpu; > + int res =3D -1, cpu; > =20 > for_each_online_cpu(cpu) { > if (topology_logical_die_id(cpu) =3D=3D die) { [ ... ] > diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel= /uncore_snbep.c > index 772b78237424c..334dc384b5b93 100644 > --- a/arch/x86/events/intel/uncore_snbep.c > +++ b/arch/x86/events/intel/uncore_snbep.c > @@ -3772,12 +3772,17 @@ static void pmu_free_topology(struct intel_uncore= _type *type) > static int skx_pmu_get_topology(struct intel_uncore_type *type, > int (*topology_cb)(struct intel_uncore_type*, int, int, u64)) > { > - int die, ret =3D -EPERM; > + int die, ret =3D -ENODEV; > u64 cpu_bus_msr; > + int cpu; > =20 > cpus_read_lock(); > for (die =3D 0; die < uncore_max_dies(); die++) { > - ret =3D skx_msr_cpu_bus_read(uncore_die_to_cpu(die), &cpu_bus_msr); > + cpu =3D uncore_die_to_cpu(die); > + if (cpu =3D=3D -1) > + continue; > + > + ret =3D skx_msr_cpu_bus_read(cpu, &cpu_bus_msr); > if (ret) > break; [Severity: High] This isn't a bug introduced by this patch, but later in this function, we call uncore_die_to_segment() which iterates over PCI root buses using pci_find_next_bus(): arch/x86/events/intel/uncore.c:uncore_die_to_segment() { struct pci_bus *bus =3D NULL; /* Find first pci bus which attributes to specified die. */ while ((bus =3D pci_find_next_bus(bus)) && (die !=3D uncore_pcibus_to_dieid(bus))) ; return bus ? pci_domain_nr(bus) : -EINVAL; } The pci_find_next_bus() function acquires and releases pci_bus_sem internal= ly, returning a raw bus pointer without holding a lock or taking a reference co= unt. If an unrelated hot-pluggable PCI root bus is hot-removed concurrently, cou= ld the bus pointer be freed before the next iteration? Passing this freed pointer back to pci_find_next_bus() could cause a use-after-free when accessing from->node.next.=20 While cpus_read_lock() is held in skx_pmu_get_topology(), does it prevent concurrent hotplug of unrelated PCI root buses? [Severity: High] This is a pre-existing issue, but I noticed a potential NULL pointer dereference in pmu_alloc_topology() in this same file. If kzalloc_objs() for topology[die] fails, the code jumps to the clear labe= l: arch/x86/events/intel/uncore_snbep.c:pmu_alloc_topology() { ... for (die =3D 0; die < uncore_max_dies(); die++) { topology[die] =3D kzalloc_objs(**topology, type->num_boxes); if (!topology[die]) goto clear; ... clear: for (; die >=3D 0; die--) { for (idx =3D 0; idx < type->num_boxes; idx++) kfree(topology[die][idx].untyped); kfree(topology[die]); ... } At the clear label, the loop variable die still points to the index that failed allocation, so topology[die] is NULL. Does the cleanup loop's attempt to execute kfree(topology[die][idx].untyped) evaluate to (topology[die] + idx)->untyped and crash the kernel? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260527151154.1305= 05-1-zide.chen@intel.com?part=3D6