Re: [RFC PATCH v1] PCI: Remove pcie_flr() and convert all callers to use pcie_reset_flr()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Farhan Ali" <alifm@linux.ibm.com>
Cc: imx@lists.linux.dev, Frank.Li@kernel.org,
	linux-hyperv@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [RFC PATCH v1] PCI: Remove pcie_flr() and convert all callers to use pcie_reset_flr()
Date: Tue, 16 Jun 2026 19:22:23 +0000	[thread overview]
Message-ID: <20260616192223.74A2E1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260615192200.3489-1-alifm@linux.ibm.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [Critical] Unchecked VF FLR failure introduces a reliable DMA Use-After-Free (UAF) in the MANA driver during queue teardown.
- [High] Replacing `pcie_flr()` with `pcie_reset_flr()` silently breaks Function Level Reset (FLR) for all SR-IOV Virtual Functions (VFs) across multiple drivers.
- [High] Silent FLR failure in PCI reset quirks causes the kernel to block for 20 seconds during device teardown or initialization.
- [Low] The patch breaks the kernel build because it missed converting a caller of `pcie_flr()` in the `ixgbe` driver.

Pre-existing issues:
- [Critical] An unbounded MMIO read loop causes an infinite loop and hard lockup if the PCIe device is hot-unplugged or encounters a fatal hardware error.
--

commit e494ba565a5af359d72294da06691da5e6507322
Author: Farhan Ali <alifm@linux.ibm.com>
Subject: PCI: Remove pcie_flr() and convert all callers to use pcie_reset_flr()

This patch removes the unconditional pcie_flr() function and converts its
remaining callers to use pcie_reset_flr(). This ensures validation checks,
such as verifying FLR support in the device capabilities, are applied
consistently across the kernel.

> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
[ ... ]
>  int pcie_reset_flr(struct pci_dev *dev, bool probe)
>  {
>  	int ret;
>  
>  	if (dev->dev_flags & PCI_DEV_FLAGS_NO_FLR_RESET)
>  		return -ENOTTY;
>  
>  	if (!(dev->devcap & PCI_EXP_DEVCAP_FLR))
>  		return -ENOTTY;

[Severity: High]
According to the PCIe Base Specification, SR-IOV Virtual Functions do not
report FLR capability in their Device Capabilities register, so this check
will fail for VFs. 

Is this a regression that inadvertently breaks FLR for VFs that previously
relied on pcie_flr() bypassing this check? Callers that ignore the return
value will silently skip resetting the hardware.

> diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
[ ... ]
> @@ -8319,7 +8319,7 @@ static void ixgbe_check_for_bad_vf(struct ixgbe_adapter *adapter)
>  		if (status_reg != IXGBE_FAILED_READ_CFG_WORD &&
>  		    status_reg & PCI_STATUS_REC_MASTER_ABORT) {
>  			ixgbe_bad_vf_abort(adapter, vf);
> -			pcie_flr(vfdev);
> +			pcie_reset_flr(vfdev, PCI_RESET_DO_RESET);
>  		}

[Severity: Low]
Does this patch introduce a build regression by missing another caller in
ixgbe_main.c? 

Searching the tree might reveal that ixgbe_io_error_detected() still calls
the now-removed pcie_flr(), which would cause a build failure when compiling
with CONFIG_IXGBE.

> diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
> --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
[ ... ]
> @@ -3306,7 +3306,8 @@ static int mana_dealloc_queues(struct net_device *ndev)
>  			}
>  			if (atomic_read(&txq->pending_sends)) {
>  				err =
> -				    pcie_flr(to_pci_dev(gd->gdma_context->dev));
> +				    pcie_reset_flr(to_pci_dev(gd->gdma_context->dev),
> +						   PCI_RESET_DO_RESET);
>  				if (err) {
>  					netdev_err(ndev, "flr failed %d with %d pkts pending in txq %u\n",
>  						   err,

[Severity: Critical]
If pcie_reset_flr() returns -ENOTTY for VFs, the FLR fails and this loop
breaks early. 

Will the driver then proceed to unmap and free active SKB buffers in the
subsequent loop:

mana_dealloc_queues() {
        ...
        for (i = 0; i < apc->num_queues; i++) {
                txq = &apc->tx_qp[i].txq;
                while ((skb = skb_dequeue(&txq->pending_skbs))) {
                        mana_unmap_skb(skb, apc);
                        dev_kfree_skb_any(skb);
                }
        ...
}

Since the hardware wasn't reset and remains active, can this lead to a DMA
use-after-free regression as the hardware continues to process the freed
memory?

> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
[ ... ]
> @@ -4207,7 +4209,7 @@ static int reset_hinic_vf_dev(struct pci_dev *pdev, bool probe)
>  	val = val | HINIC_VF_FLR_PROC_BIT;
>  	iowrite32be(val, bar + HINIC_VF_OP);
>  
> -	pcie_flr(pdev);
> +	pcie_reset_flr(pdev, PCI_RESET_DO_RESET);
>  
>  	/*
>  	 * The device must recapture its Bus and Device Numbers after FLR

[Severity: High]
If pcie_reset_flr() returns -ENOTTY without actually issuing a reset to the
hardware, will the firmware ever clear the HINIC_VF_FLR_PROC_BIT? 

Looking at the rest of reset_hinic_vf_dev(), it appears the code will
subsequently spin in a busy-wait loop for the full 20-second timeout
waiting for that bit to clear:

reset_hinic_vf_dev() {
        ...
        do {
                val = ioread32be(bar + HINIC_VF_OP);
                if (!(val & HINIC_VF_FLR_PROC_BIT))
                        goto reset_complete;
                msleep(20);
        } while (time_before(jiffies, timeout));
        ...
}

Could this regression silently block the thread for 20 seconds during
device teardown?

> diff --git a/drivers/ptp/ptp_netc.c b/drivers/ptp/ptp_netc.c
> --- a/drivers/ptp/ptp_netc.c
> +++ b/drivers/ptp/ptp_netc.c
[ ... ]
> @@ -802,7 +802,7 @@ static int netc_timer_pci_probe(struct pci_dev *pdev)
>  	if (!priv)
>  		return -ENOMEM;
>  
> -	pcie_flr(pdev);
> +	pcie_reset_flr(pdev, PCI_RESET_DO_RESET);
>  	err = pci_enable_device_mem(pdev);

[Severity: Critical]
This is a pre-existing issue, but while reviewing this driver's PCIe
handling, does the interrupt handler have an unbounded loop on PCIe
disconnect?

In netc_timer_handle_etts_event(), the code continuously reads a
memory-mapped register:

netc_timer_handle_etts_event() {
        ...
        while (netc_timer_rd(priv, NETC_TMR_STAT) & TMR_STAT_ETS_VLD(index)) {
                ...
        }
        ...
}

If the device is removed or experiences a fatal error, the MMIO read will
return all 1s (0xFFFFFFFF). Because 0xFFFFFFFF masked with the bit is
always true, would this loop spin infinitely and cause a hard lockup?

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260615192200.3489-1-alifm@linux.ibm.com?part=1

     prev parent reply	other threads:[~2026-06-16 19:22 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-15 19:21 [RFC PATCH v1] PCI: Remove pcie_flr() and convert all callers to use pcie_reset_flr() Farhan Ali
2026-06-15 19:21 ` [Intel-wired-lan] " Farhan Ali
2026-06-16 11:21 ` kernel test robot
2026-06-16 12:45 ` kernel test robot
2026-06-16 19:22 ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260616192223.74A2E1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=Frank.Li@kernel.org \
    --cc=alifm@linux.ibm.com \
    --cc=imx@lists.linux.dev \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.