From: sashiko-bot@kernel.org
To: "Farhan Ali" <alifm@linux.ibm.com>
Cc: imx@lists.linux.dev, Frank.Li@kernel.org,
linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [RFC PATCH v1] PCI: Remove pcie_flr() and convert all callers to use pcie_reset_flr()
Date: Tue, 16 Jun 2026 19:22:23 +0000 [thread overview]
Message-ID: <20260616192223.74A2E1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260615192200.3489-1-alifm@linux.ibm.com>
Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
New issues:
- [Critical] Unchecked VF FLR failure introduces a reliable DMA Use-After-Free (UAF) in the MANA driver during queue teardown.
- [High] Replacing `pcie_flr()` with `pcie_reset_flr()` silently breaks Function Level Reset (FLR) for all SR-IOV Virtual Functions (VFs) across multiple drivers.
- [High] Silent FLR failure in PCI reset quirks causes the kernel to block for 20 seconds during device teardown or initialization.
- [Low] The patch breaks the kernel build because it missed converting a caller of `pcie_flr()` in the `ixgbe` driver.
Pre-existing issues:
- [Critical] An unbounded MMIO read loop causes an infinite loop and hard lockup if the PCIe device is hot-unplugged or encounters a fatal hardware error.
--
commit e494ba565a5af359d72294da06691da5e6507322
Author: Farhan Ali <alifm@linux.ibm.com>
Subject: PCI: Remove pcie_flr() and convert all callers to use pcie_reset_flr()
This patch removes the unconditional pcie_flr() function and converts its
remaining callers to use pcie_reset_flr(). This ensures validation checks,
such as verifying FLR support in the device capabilities, are applied
consistently across the kernel.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
[ ... ]
> int pcie_reset_flr(struct pci_dev *dev, bool probe)
> {
> int ret;
>
> if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
> return -ENOTTY;
>
> if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
> return -ENOTTY;
[Severity: High]
According to the PCIe Base Specification, SR-IOV Virtual Functions do not
report FLR capability in their Device Capabilities register, so this check
will fail for VFs.
Is this a regression that inadvertently breaks FLR for VFs that previously
relied on pcie_flr() bypassing this check? Callers that ignore the return
value will silently skip resetting the hardware.
> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
[ ... ]
> @@ -8319,7 +8319,7 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
> if (status_reg != IXGBE_FAILED_READ_CFG_WORD &&
> status_reg & PCI_STATUS_REC_MASTER_ABORT) {
> ixgbe_bad_vf_abort(adapter, vf);
> - pcie_flr(vfdev);
> + pcie_reset_flr(vfdev, PCI_RESET_DO_RESET);
> }
[Severity: Low]
Does this patch introduce a build regression by missing another caller in
ixgbe_main.c?
Searching the tree might reveal that ixgbe_io_error_detected() still calls
the now-removed pcie_flr(), which would cause a build failure when compiling
with CONFIG_IXGBE.
> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
[ ... ]
> @@ -3306,7 +3306,8 @@ static int mana_dealloc_queues(struct net_device *ndev)
> }
> if (atomic_read(&txq->pending_sends)) {
> err =
> - pcie_flr(to_pci_dev(gd->gdma_context->dev));
> + pcie_reset_flr(to_pci_dev(gd->gdma_context->dev),
> + PCI_RESET_DO_RESET);
> if (err) {
> netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
> err,
[Severity: Critical]
If pcie_reset_flr() returns -ENOTTY for VFs, the FLR fails and this loop
breaks early.
Will the driver then proceed to unmap and free active SKB buffers in the
subsequent loop:
mana_dealloc_queues() {
...
for (i = 0; i < apc->num_queues; i++) {
txq = &apc->tx_qp[i].txq;
while ((skb = skb_dequeue(&txq->pending_skbs))) {
mana_unmap_skb(skb, apc);
dev_kfree_skb_any(skb);
}
...
}
Since the hardware wasn't reset and remains active, can this lead to a DMA
use-after-free regression as the hardware continues to process the freed
memory?
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
[ ... ]
> @@ -4207,7 +4209,7 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
> val = val | HINIC_VF_FLR_PROC_BIT;
> iowrite32be(val, bar + HINIC_VF_OP);
>
> - pcie_flr(pdev);
> + pcie_reset_flr(pdev, PCI_RESET_DO_RESET);
>
> /*
> * The device must recapture its Bus and Device Numbers after FLR
[Severity: High]
If pcie_reset_flr() returns -ENOTTY without actually issuing a reset to the
hardware, will the firmware ever clear the HINIC_VF_FLR_PROC_BIT?
Looking at the rest of reset_hinic_vf_dev(), it appears the code will
subsequently spin in a busy-wait loop for the full 20-second timeout
waiting for that bit to clear:
reset_hinic_vf_dev() {
...
do {
val = ioread32be(bar + HINIC_VF_OP);
if (!(val & HINIC_VF_FLR_PROC_BIT))
goto reset_complete;
msleep(20);
} while (time_before(jiffies, timeout));
...
}
Could this regression silently block the thread for 20 seconds during
device teardown?
> diff --git a/drivers/ptp/ptp_netc.c b/drivers/ptp/ptp_netc.c
> --- a/drivers/ptp/ptp_netc.c
> +++ b/drivers/ptp/ptp_netc.c
[ ... ]
> @@ -802,7 +802,7 @@ static int netc_timer_pci_probe(struct pci_dev *pdev)
> if (!priv)
> return -ENOMEM;
>
> - pcie_flr(pdev);
> + pcie_reset_flr(pdev, PCI_RESET_DO_RESET);
> err = pci_enable_device_mem(pdev);
[Severity: Critical]
This is a pre-existing issue, but while reviewing this driver's PCIe
handling, does the interrupt handler have an unbounded loop on PCIe
disconnect?
In netc_timer_handle_etts_event(), the code continuously reads a
memory-mapped register:
netc_timer_handle_etts_event() {
...
while (netc_timer_rd(priv, NETC_TMR_STAT) & TMR_STAT_ETS_VLD(index)) {
...
}
...
}
If the device is removed or experiences a fatal error, the MMIO read will
return all 1s (0xFFFFFFFF). Because 0xFFFFFFFF masked with the bit is
always true, would this loop spin infinitely and cause a hard lockup?
--
Sashiko AI review · https://sashiko.dev/#/patchset/20260615192200.3489-1-alifm@linux.ibm.com?part=1
prev parent reply other threads:[~2026-06-16 19:22 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-15 19:21 [RFC PATCH v1] PCI: Remove pcie_flr() and convert all callers to use pcie_reset_flr() Farhan Ali
2026-06-16 19:22 ` sashiko-bot [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260616192223.74A2E1F000E9@smtp.kernel.org \
--to=sashiko-bot@kernel.org \
--cc=Frank.Li@kernel.org \
--cc=alifm@linux.ibm.com \
--cc=imx@lists.linux.dev \
--cc=linux-hyperv@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=sashiko-reviews@lists.linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox