From: "Michael S. Tsirkin" <mst@redhat.com>
To: Parav Pandit <parav@nvidia.com>
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
	stefanha@redhat.com, pbonzini@redhat.com,
	xuanzhuo@linux.alibaba.com, stable@vger.kernel.org,
	mgurtovoy@nvidia.com, lirongqing@baidu.com
Subject: Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
Date: Fri, 22 Aug 2025 06:21:52 -0400
Message-ID: <20250822060839-mutt-send-email-mst@kernel.org>
In-Reply-To: <20250822091706.21170-1-parav@nvidia.com>

On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
> 
> Virtio drivers and PCI devices have never fully supported true
> surprise (aka hot unplug) removal. Drivers historically continued
> processing and waiting for pending I/O and even continued synchronous
> device reset during surprise removal. Devices have also continued
> completing I/Os, doing DMA and allowing device reset after surprise
> removal to support such drivers.
> 
> Supporting it correctly would require a new device capability

If a device is removed, it is removed. Windows drivers have supported
this forever, and it is simply a Linux bug that it does not
handle all the cases. This is not something you can handle
with a capability.

> and
> driver negotiation in the virtio specification to safely stop
> I/O and free queue memory. Failure to do so either breaks all the
> existing drivers with call trace listed in the commit or crashes the
> host on continuing the DMA.

If the device is gone, then DMA does not continue.

IIUC, what is going on in your case is that you have developed a
surprise removal emulation that pretends to remove the device while
the device is actually still doing DMA. So of course things break.

> Hence, until such specification and devices
> are invented, restore the previous behavior of treating surprise
> removal as graceful removal to avoid regressions and maintain system
> stability same as before the
> commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
> 
> As explained above, previous analysis of solving this only in driver
> was incomplete and non-reliable at [1] and at [2]; Hence reverting commit
> 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> is still the best stand to restore failures of virtio net and
> block devices.
> 
> [1] https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t


I can only repeat what I said then: this is not how we do kernel
development.

> [2] https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.com/

What was missing there is handling of corner cases. So let us please
try to handle them.

Here is how I would try to do it:

- add a new driver callback
- start a periodic timer task in virtio core on remove
- in the timer, check whether the device is still present;
  if it is not, invoke the driver callback
- cancel the task on device reset
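
To make the control flow of the steps above concrete, here is a minimal
user-space model in C. It is only a sketch, not kernel code: every name in
it (model_vdev, model_driver, device_gone, model_remove_begin,
model_poll_tick, model_reset_done) is hypothetical and does not exist in
the virtio core. The "present" flag stands in for a check such as
pci_device_is_present(), and model_poll_tick() stands in for one run of
the periodic task, which in the kernel would presumably be something like
a delayed work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_vdev;

/* Hypothetical driver ops: the "new driver callback" from the list. */
struct model_driver {
	void (*device_gone)(struct model_vdev *vdev);
};

/* Hypothetical stand-in for a virtio device during removal. */
struct model_vdev {
	const struct model_driver *drv;
	bool present;     /* models the result of a presence probe */
	bool poll_armed;  /* models an armed periodic task */
	int gone_calls;   /* bookkeeping for the example only */
};

/* Remove path: arm the periodic presence check. */
static void model_remove_begin(struct model_vdev *vdev)
{
	assert(vdev != NULL);
	vdev->poll_armed = true;
}

/* One run of the periodic task: notify the driver once if the device is gone. */
static void model_poll_tick(struct model_vdev *vdev)
{
	assert(vdev != NULL);
	if (!vdev->poll_armed || vdev->present)
		return;
	vdev->poll_armed = false; /* fire at most once */
	if (vdev->drv && vdev->drv->device_gone)
		vdev->drv->device_gone(vdev);
}

/* Device reset completed: the check is no longer needed, so cancel it. */
static void model_reset_done(struct model_vdev *vdev)
{
	assert(vdev != NULL);
	vdev->poll_armed = false;
}

/* Example driver callback; a real driver would abort pending I/O here. */
static void model_count_gone(struct model_vdev *vdev)
{
	vdev->gone_calls++;
}
```

In this model the callback fires only when removal has started, the reset
has not yet completed, and the presence check fails; a device that resets
normally never sees the callback. How a real driver would safely complete
or abort in-flight requests from such a callback is exactly the corner-case
handling that still needs to be worked out.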

If you do not have the time, let me know and I will try to look into it.

> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> Cc: stable@vger.kernel.org
> Reported-by: lirongqing@baidu.com
> Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index d6d79af44569..dba5eb2eaff9 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
>  	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
>  	struct device *dev = get_device(&vp_dev->vdev.dev);
>  
> -	/*
> -	 * Device is marked broken on surprise removal so that virtio upper
> -	 * layers can abort any ongoing operation.
> -	 */
> -	if (!pci_device_is_present(pci_dev))
> -		virtio_break_device(&vp_dev->vdev);
> -
>  	pci_disable_sriov(pci_dev);
>  
>  	unregister_virtio_device(&vp_dev->vdev);
> -- 
> 2.26.2

