From: qinyuntan <qinyuntan@linux.alibaba.com>
To: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org,
Xunlei Pang <xlpang@linux.alibaba.com>,
Guixin Liu <kanie@linux.alibaba.com>,
oliver.yang@linux.alibaba.com,
Guanghui Feng <guanghuifeng@linux.alibaba.com>,
Bjorn Helgaas <bhelgaas@google.com>,
linux-pci@vger.kernel.org
Subject: Re: [PATCH V1] nvme-pci: disable SR-IOV VFs on driver unbind
Date: Fri, 30 Jan 2026 12:53:25 +0800
Message-ID: <88437e1a-2df0-41e6-a58f-dcc68d4458bc@linux.alibaba.com>
In-Reply-To: <20260127084807.GA342@lst.de>
Hi All,
Thank you all for the insightful discussion!
I agree with Leon's point that not all devices are created equal when it
comes to SR-IOV handling during driver unbind.
Looking at existing driver implementations, I found two different
approaches:
1) mlx5 - unconditionally disables SR-IOV in remove:
drivers/net/ethernet/mellanox/mlx5/core/main.c:
static void remove_one(struct pci_dev *pdev)
{
	...
	mlx5_sriov_disable(pdev, false);
	...
}
drivers/net/ethernet/mellanox/mlx5/core/sriov.c:
void mlx5_sriov_disable(struct pci_dev *pdev, bool num_vf_change)
{
	struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
	struct devlink *devlink = priv_to_devlink(dev);
	int num_vfs = pci_num_vf(dev->pdev);

	/* Always disable, no pci_vfs_assigned() check */
	pci_disable_sriov(pdev);
	devl_lock(devlink);
	mlx5_device_disable_sriov(dev, num_vfs, true, num_vf_change);
	devl_unlock(devlink);
}
2) ixgbe - checks pci_vfs_assigned() and skips disable if VFs are in use:
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c:
static void ixgbe_remove(struct pci_dev *pdev)
{
	...
#ifdef CONFIG_PCI_IOV
	ixgbe_disable_sriov(adapter);
#endif
	...
}
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c:
#ifdef CONFIG_PCI_IOV
	if (pci_vfs_assigned(adapter->pdev)) {
		e_dev_warn("Unloading driver while VFs are assigned - VFs will not be deallocated\n");
		return -EPERM;
	}
	pci_disable_sriov(adapter->pdev);
#endif
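To make the difference between the two policies concrete, here is a minimal
userspace sketch (not kernel code). struct fake_pf, fake_disable_sriov(),
remove_unconditional() and remove_checked() are hypothetical names invented
for illustration; only the decision logic is meant to mirror the excerpts
above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a PF; this is not a kernel structure. */
struct fake_pf {
	int num_vfs;		/* VFs currently enabled */
	int vfs_assigned;	/* VFs currently assigned to guests */
};

/* Stand-in for pci_disable_sriov(): tears down every VF. */
static void fake_disable_sriov(struct fake_pf *pf)
{
	pf->num_vfs = 0;
	pf->vfs_assigned = 0;
}

/*
 * mlx5-style policy: always disable SR-IOV on remove.
 * Returns true if assigned VFs were torn down (a guest was disrupted).
 */
static bool remove_unconditional(struct fake_pf *pf)
{
	bool disrupted;

	assert(pf != NULL);
	disrupted = pf->vfs_assigned > 0;
	fake_disable_sriov(pf);
	return disrupted;
}

/*
 * ixgbe-style policy: skip the disable while any VF is assigned.
 * Returns true if VFs were left enabled after remove.
 */
static bool remove_checked(struct fake_pf *pf)
{
	assert(pf != NULL);
	if (pf->num_vfs == 0)
		return false;
	if (pf->vfs_assigned > 0)
		return true;	/* a real driver would emit its warning here */
	fake_disable_sriov(pf);
	return false;
}
```

In this model, a PF with four VFs and one of them assigned ends up with zero
VFs under remove_unconditional() (the guest is disrupted), while
remove_checked() leaves all four VFs enabled without a bound PF driver.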
Regarding the warning level discussion: I would prefer keeping it as
dev_warn() rather than downgrading to dev_info(). As Leon mentioned,
some devices do require SR-IOV to be disabled when the PF is unbound,
and for those cases, this warning is important for operators to notice
and take action. A warning level helps ensure it doesn't get lost in
normal system logs.
Please let me know how you'd like to proceed.
Thanks,
Qinyun
On 1/27/26 4:48 PM, Christoph Hellwig wrote:
> On Tue, Jan 27, 2026 at 03:33:44PM +0800, Qinyun Tan wrote:
>> The NVMe PCI driver exports the sriov_configure callback via
>> pci_sriov_configure_simple(), which allows userspace to enable SR-IOV
>> VFs through sysfs. However, when the PF driver is unbound, the driver
>> does not disable SR-IOV, leaving VFs orphaned in the system.
>
> That sounds dangerous.
>
>> According to Documentation/PCI/pci-iov-howto.rst, PCI drivers that
>> support SR-IOV should call pci_disable_sriov() in their remove callback
>> to properly clean up VFs before the driver is unloaded.
>
> Bjorn and other PCI folks: is there any reason to not do this in
> the PCI code and leave a landmine for the drivers?
>
>> Fix this by disabling SR-IOV in nvme_remove(). If VFs are not assigned
>> to a guest, disable SR-IOV. If VFs are still assigned, emit a warning
>> since forcibly disabling would disrupt the guest.
>
> Well, I think we have to disrupt it, at least for hot unplug. This
> sounds like we need some better handling in the core code as well.
>
>> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
>> index 58f3097888a7..4f2dc13de48b 100644
>> --- a/drivers/nvme/host/pci.c
>> +++ b/drivers/nvme/host/pci.c
>> @@ -3666,6 +3666,15 @@ static void nvme_remove(struct pci_dev *pdev)
>>  	nvme_stop_ctrl(&dev->ctrl);
>>  	nvme_remove_namespaces(&dev->ctrl);
>>  	nvme_dev_disable(dev, true);
>> +
>> +	if (pci_num_vf(pdev)) {
>> +		if (pci_vfs_assigned(pdev))
>> +			dev_warn(&pdev->dev,
>> +				 "WARNING: Removing PF while VFs are assigned - VFs will not be deallocated!\n");
>> +		else
>> +			pci_disable_sriov(pdev);
>> +	}
>> +
>>  	nvme_free_host_mem(dev);
>>  	nvme_dev_remove_admin(dev);
>>  	nvme_dbbuf_dma_free(dev);
>> --
>> 2.43.5
> ---end quoted text---