From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BE8D73A9625 for ; Wed, 13 May 2026 23:39:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778715584; cv=none; b=P4aOSLDJ91yS9WYOkZC446KhPH/Dk6gPv75jqGfuiDULNohbI9wZYp5OZcMl36sc7n31ROmfKpBzPM3pgVDXo8fVyqQdSVrmnUxcRFHysry2WMKORm1LObSwKEwtd6Gw8Ln5lDOUTE1MOCtKYpIacW1txrOwCHwS45R+TcqTWZo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778715584; c=relaxed/simple; bh=JAd7Fl6gOGz/q4M3f/Nh3DRHf0OPjX6Q7FfJWAegsuw=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=eb7RnYL98Lle/Mq2vEaBh5NhVZ5vCHVVCKK/mJKZ9TeEJVFl8R5s31as2c6/wugPnk1Kd6f3S7dzWLtdOuLsXV9MFi9E/p9EZZIXgEVrJARvrZLYduvXTdGuisawzoTfkBk+NAREVjHH6DT/7howXTZiFJYrJUqUock2pj2KK9s= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=baPttK4a; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="baPttK4a" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 896FBC19425; Wed, 13 May 2026 23:39:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1778715584; bh=JAd7Fl6gOGz/q4M3f/Nh3DRHf0OPjX6Q7FfJWAegsuw=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date:From; b=baPttK4aSHaUlbmXktrx6tMJ8mCsPR3LcQNtJamBsQpVETpA1sGEucpvpY0jw6Dms 6U/zy2FRwh2PeL7+KBIQQ7d51N9sItVJMDMzBB5orfKWjieJ9E6U8bH+QfBNFvA+L/ uWFZRRANIx55Q8Q++Ra7NQigNhr8tHheDsHp6oXAd5SwQAX12lqa0on0+UZRNXI8xU 0t5kMYZ68xXoSpfSEcuBklLDMiAUYKSyRUncBbRnn7JiG47D/tMCqbRAL7IwwT57a0 syczd/j8LaOdxQ+BvWgx3zQltlj4kRkx0QODWA464S6igHdsrWu4gVQzruC/6o5PCl P288BbJutYtLg== From: sashiko-bot@kernel.org Subject: Re: [PATCH] pci_crash: capture PCI config space at panic time Reply-To: sashiko-reviews@lists.linux.dev To: hangej Cc: linux-pci@vger.kernel.org In-Reply-To: <20260512162402.1929687-1-hangej@amazon.com> References: <20260512162402.1929687-1-hangej@amazon.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Wed, 13 May 2026 23:39:44 +0000 Message-Id: <20260513233944.896FBC19425@smtp.kernel.org> Precedence: bulk X-Mailing-List: linux-pci@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 9 potential issue(= s) to consider: - [Critical] Deadlock in panic path due to global pci_lock acquisition. - [High] Hardware error and recursive panic from accessing offline or D3col= d devices. - [High] Lockless array teardown and missing memory barriers cause Use-Afte= r-Free during panic. - [High] Loss of intra-page offset in the physical pagemap for kmalloc-back= ed buffers. - [Medium] Resource leak when all matched devices are hot-removed. - [Medium] Uninitialized variable used in AER capture check. - [Medium] Missing locking in sysfs capture parameter setter leads to data = race. - [Low] Ignoring strim() return value prevents stripping of leading whitesp= ace. - [Low] Commit message incorrectly claims the use of slab_is_available(). -- commit f80de92901e525b2a8355b2b6d9ab3d784f115db Author: Johannes Hange Subject: pci_crash: capture PCI config space at panic time This commit introduces CONFIG_PCI_CRASH to capture PCI configuration space data during a kernel panic. The data is stored in a buffer that can be extracted from the vmcore for post-mortem analysis. > diff --git a/Documentation/PCI/pci-crash-capture.rst b/Documentation/PCI/= pci-crash-capture.rst > index 0000000000000..35f676c2724b7 > --- /dev/null > +++ b/Documentation/PCI/pci-crash-capture.rst [ ... ] > +- slab_is_available() guard in param setters prevents use-before-init > + when parameters are set via kernel command line. This isn't a bug, but the commit message and documentation mention using a slab_is_available() guard. It looks like the code actually uses READ_ONCE(pci_crash_ready). Could this documentation be updated to match the implementation? > diff --git a/kernel/pci_crash.c b/kernel/pci_crash.c > new file mode 100644 > index 0000000000000..8bbc9008fdc85 > --- /dev/null > +++ b/kernel/pci_crash.c [ ... ] > +static int capture_param_set(const char *val, const struct kernel_param = *kp) > +{ > + if (strlen(val) >=3D sizeof(capture)) > + return -EINVAL; > + strscpy(capture, val, sizeof(capture)); > + strim(capture); Could concurrent sysfs writes from userspace cause data races here? The devices_param_set() function protects the string modification with pci_crash_lock, but it appears to be missing in this setter. Also, does this code correctly handle leading whitespace? The strim() function modifies the string in-place to remove trailing whitespace, but it returns a pointer to the first non-whitespace character to skip leading whitespace. Since the return value is ignored here and in devices_param_set(), any leading whitespace would be preserved and might break the exact token string comparisons later. [ ... ] > +static struct pci_crash_pagemap *pci_crash_build_pagemap(void *buf, > + size_t buf_size) > +{ [ ... ] > + for (i =3D 0; i < num_pages; i++) { > + struct page *page; > + phys_addr_t pa; > + > + if (is_vmalloc_addr(buf + i * PAGE_SIZE)) > + page =3D vmalloc_to_page(buf + i * PAGE_SIZE); > + else > + page =3D virt_to_page(buf + i * PAGE_SIZE); > + > + if (!page) { > + kfree(pm); > + return NULL; > + } > + pa =3D page_to_phys(page); > + pm->addrs[i] =3D cpu_to_le64(pa); Can this lose the intra-page offset for the buffer? The kvmalloc() function falls back to kmalloc() for smaller allocations, which are aligned to slab boundaries rather than page boundaries. The page_to_phys() macro returns the physical address of the start of the page, so won't the crash parser read from the page boundary instead of the actual buffer start? [ ... ] > +static void pci_crash_read_config_space(struct pci_dev *pdev, u8 *ptr) > +{ [ ... ] > + for (offset =3D 0; offset < pdev->cfg_size; offset +=3D 4) { > + if (pci_read_config_dword(pdev, offset, &val)) { Could this trigger a recursive panic when accessing offline or D3cold devic= es? The code iterates over all captured devices and reads their config space without checking their power state. If a device is in D3cold or otherwise unreachable, accessing it will cause a PCIe Master Abort. On some architectures like ARM64, unhandled Master Aborts trigger an SError, which would cause an immediate hard hang or reset when raised from within the panic handler. Also, the commit message claims that pci_read_config_dword() is lockless. Is this description accurate? Looking at pci_read_config_dword(), the generic implementation unconditionally calls pci_bus_read_config_dword(), which acquires the global raw spinlock pci_lock. If the kernel panics while this lock is held by another CPU, won't the panicking CPU spin indefinitely attempting to acquire it and deadlock the panic handler? > + put_unaligned_le32(0xFFFFFFFF, &cfg_data[offset]); > + continue; > + } > + put_unaligned_le32(val, &cfg_data[offset]); > + } > +} [ ... ] > +static void pci_crash_rebuild_snapshot(void) > +{ [ ... ] > + if (count =3D=3D 0) { > + pr_info("no PCI devices found\n"); > + WRITE_ONCE(pci_crash_num_devs, 0); > + if (old_devs) { > + for (i =3D 0; i < old_num_devs; i++) > + pci_dev_put(old_devs[i]); > + kvfree(old_devs); > + WRITE_ONCE(pci_crash_devs, NULL); > + } > + kfree(old_pm); > + WRITE_ONCE(pci_crash_pagemap, NULL); > + WRITE_ONCE(pci_crash_pagemap_phys, 0); Could this result in a use-after-free during a panic? On weakly ordered architectures, the CPU store buffer might delay making pci_crash_num_devs =3D 0 visible globally while kvfree(old_devs) or kfree(old_pm) modify allocator metadata in memory. If an NMI stops this CPU at this exact point, could the panicking CPU observe a non-zero device count and traverse the freed old_devs array or export a freed physical address from old_pm? Should there be an smp_wmb() after writing 0 to pci_crash_num_devs? > + old_buf =3D pci_crash_buffer; > + WRITE_ONCE(pci_crash_buffer, NULL); > + WRITE_ONCE(pci_crash_buffer_size, 0); > + kvfree(old_buf); > + return; > + } [ ... ] > + if (count =3D=3D 0) { > + kvfree(new_devs); > + pr_info("no devices match devices=3D%s\n", devices); > + goto err_restore; > + } [ ... ] > +err_restore: > + WRITE_ONCE(pci_crash_num_devs, old_num_devs); > +} Does this code leak the struct pci_dev references for the removed devices? When jumping to err_restore, it restores pci_crash_num_devs to old_num_devs but doesn't appear to drop the references on the old_devs array, pinning their memory indefinitely. [ ... ] > +void pci_crash_save(void) > +{ [ ... ] > + if (!(cflags & PCI_CRASH_CAPTURE_ALWAYS)) { > + if (cflags & PCI_CRASH_CAPTURE_AER) { > + struct pci_dev **devs =3D READ_ONCE(pci_crash_devs); > + unsigned int i; > + bool pci_error_found =3D false; > + > + if (!devs) > + return; > + > + for (i =3D 0; i < num_devs; i++) { > + struct pci_dev *pdev =3D devs[i]; > + u32 status; > + > + if (!pdev || !pdev->aer_cap) > + continue; > + > + if (pci_pcie_type(pdev) =3D=3D PCI_EXP_TYPE_ROOT_PORT) { > + pci_read_config_dword(pdev, > + pdev->aer_cap + PCI_ERR_ROOT_STATUS, > + &status); > + if (status & PCI_ERR_ROOT_UNCOR_RCV) { Can this evaluate uninitialized stack memory? If pci_read_config_dword() fails early, it returns an error without populating the pointer, leaving the status variable with uninitialized stack garbage. The return value isn't checked before evaluating status here. > + pci_error_found =3D true; > + break; > + } > + } > + } --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260512162402.1929= 687-1-hangej@amazon.com?part=3D1