stable.vger.kernel.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
	stefanha@redhat.com, pbonzini@redhat.com,
	xuanzhuo@linux.alibaba.com, stable@vger.kernel.org,
	mgurtovoy@nvidia.com, lirongqing@baidu.com
Subject: Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
Date: Fri, 22 Aug 2025 06:21:52 -0400
Message-ID: <20250822060839-mutt-send-email-mst@kernel.org>
In-Reply-To: <20250822091706.21170-1-parav@nvidia.com>

On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
> 
> Virtio drivers and PCI devices have never fully supported true
> surprise (aka hot unplug) removal. Drivers historically continued
> processing and waiting for pending I/O and even continued synchronous
> device reset during surprise removal. Devices have also continued
> completing I/Os, doing DMA and allowing device reset after surprise
> removal to support such drivers.
> 
> Supporting it correctly would require a new device capability

If a device is removed, it is removed. Windows drivers have supported
this forever, and it is just a Linux bug that it does not
handle all the cases. This is not something you can handle
with a capability.

> and
> driver negotiation in the virtio specification to safely stop
> I/O and free queue memory. Failure to do so either breaks all the
> existing drivers with call trace listed in the commit or crashes the
> host on continuing the DMA.

If the device is gone, then DMA does not continue.

IIUC, what is going on for you is that you have developed a surprise
removal emulation that pretends to remove the device while the device
is actually still doing DMA. So of course things break then.

> Hence, until such specification and devices
> are invented, restore the previous behavior of treating surprise
> removal as graceful removal to avoid regressions and maintain system
> stability same as before the
> commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
> 
> As explained above, previous analysis of solving this only in driver
> was incomplete and non-reliable at [1] and at [2]; Hence reverting commit
> 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> is still the best stand to restore failures of virtio net and
> block devices.
> 
> [1] https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t


I can only repeat what I said then: this is not how we do kernel
development.

> [2] https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.com/

What was missing here was handling of corner cases. So let us please
try to handle them.

Here is how I would try to do it:

- add a new driver callback
- start a periodic timer task in virtio core on remove
- in the timer, probe that the device is still present;
  if not, invoke the driver callback
- cancel the task on device reset
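
To make the steps above concrete, here is a rough userspace model of the
control flow, not kernel code. Every name in it (sim_device, device_gone,
sim_watch_start and so on) is hypothetical and made up for illustration.
The `present` flag stands in for a presence probe such as
pci_device_is_present(), and one call to sim_watch_tick() stands in for
one firing of the periodic task. How the task is actually scheduled in
virtio core is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; none of these names exist in the kernel. */
struct sim_device {
	bool present;      /* what a presence probe would report */
	bool watch_active; /* is the periodic task armed? */
	int gone_calls;    /* how many times the driver callback ran */
	/* the proposed new driver callback (assumed name) */
	void (*device_gone)(struct sim_device *dev);
};

/* Example driver callback: a real driver would abort pending I/O here. */
static void sim_device_gone(struct sim_device *dev)
{
	dev->gone_calls++;
}

/* Called from the remove path: arm the periodic task. */
static void sim_watch_start(struct sim_device *dev)
{
	dev->watch_active = true;
}

/* Called once the device reset has completed: cancel the periodic task. */
static void sim_watch_cancel(struct sim_device *dev)
{
	dev->watch_active = false;
}

/*
 * One firing of the periodic task.
 * Returns true if the task should be re-armed for another period.
 */
static bool sim_watch_tick(struct sim_device *dev)
{
	if (!dev->watch_active)
		return false;   /* cancelled, nothing to do */
	if (dev->present)
		return true;    /* device still there, keep polling */

	/* Device is gone: stop polling and notify the driver exactly once. */
	dev->watch_active = false;
	if (dev->device_gone)
		dev->device_gone(dev);
	return false;
}
```

In this model the callback fires at most once per remove, and a reset
that completes normally cancels the task before any callback runs, which
is the graceful-removal case.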

If you do not have the time, let me know and I will try to look into it.

> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> Cc: stable@vger.kernel.org
> Reported-by: lirongqing@baidu.com
> Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index d6d79af44569..dba5eb2eaff9 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
>  	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
>  	struct device *dev = get_device(&vp_dev->vdev.dev);
>  
> -	/*
> -	 * Device is marked broken on surprise removal so that virtio upper
> -	 * layers can abort any ongoing operation.
> -	 */
> -	if (!pci_device_is_present(pci_dev))
> -		virtio_break_device(&vp_dev->vdev);
> -
>  	pci_disable_sriov(pci_dev);
>  
>  	unregister_virtio_device(&vp_dev->vdev);
> -- 
> 2.26.2

