virtualization.lists.linux-foundation.org archive mirror
 help / color / mirror / Atom feed
* [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
@ 2024-01-11  9:14 Li,Rongqing
  2024-01-11 10:18 ` Parav Pandit
  2024-01-11 16:21 ` Michael S. Tsirkin
  0 siblings, 2 replies; 9+ messages in thread
From: Li,Rongqing @ 2024-01-11  9:14 UTC (permalink / raw)
  To: mst@redhat.com, jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux.dev, parav@nvidia.com
  Cc: songyang23, liubokai, Song,Zhan

Revert "virtio_pci: Support surprise removal of virtio pci device"

This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049.

Marking the device as broken will cause the uncompleted IO request on
virtio-blk since virtblk_done will not continue when it find the broken virtqueu
at last it will cause the failure of removal of virtio-blk device because of
uncompleted IO request

The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci:
Support surprise removal of virtio pci device") tried to fix
is that virtio backend always complete the IO request as soon as possible

Signed-off-by: Li RongQing <lirongqing@baidu.com>
---
 drivers/virtio/virtio_pci_common.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index 7a55939..d60fe99 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -584,13 +584,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
        struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
        struct device *dev = get_device(&vp_dev->vdev.dev);

-       /*
-        * Device is marked broken on surprise removal so that virtio upper
-        * layers can abort any ongoing operation.
-        */
-       if (!pci_device_is_present(pci_dev))
-               virtio_break_device(&vp_dev->vdev);
-
        pci_disable_sriov(pci_dev);

        unregister_virtio_device(&vp_dev->vdev);
--
2.9.4

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-11  9:14 [RFC] Revert "virtio_pci: Support surprise removal of virtio pci Li,Rongqing
@ 2024-01-11 10:18 ` Parav Pandit
  2024-01-11 11:47   ` Li,Rongqing
  2024-01-11 16:21 ` Michael S. Tsirkin
  1 sibling, 1 reply; 9+ messages in thread
From: Parav Pandit @ 2024-01-11 10:18 UTC (permalink / raw)
  To: NBU-Contact-Li Rongqing (EXTERNAL), mst@redhat.com,
	jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux.dev
  Cc: songyang23, liubokai, Song,Zhan


> From: Li,Rongqing <lirongqing@baidu.com>
> Sent: Thursday, January 11, 2024 2:45 PM
> 
> Revert "virtio_pci: Support surprise removal of virtio pci device"
> 
> This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049.
> 
> Marking the device as broken will cause the uncompleted IO request on
> virtio-blk since virtblk_done will not continue when it find the broken
> virtqueu at last it will cause the failure of removal of virtio-blk device because
> of uncompleted IO request
> 
> The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci:
> Support surprise removal of virtio pci device") tried to fix is that virtio
> backend always complete the IO request as soon as possible
>
This can never be guaranteed by a device and by pci spec given it is surprise removal by definition.

For Linux virtio blk is not the only block device which has surprised removal.
Nvme blk driver has supported surprise removal for several years now.
I am sure nbd and others would have too given its network.

I am not familiar with the blk driver stack.
So please explore with blk community of how to complete an aborted/never completed io which never completed by the device.
Since the blk driver knows exactly that the device is removed, it can 100% complete the io to upper layer with the error in deterministic way.
 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c
> b/drivers/virtio/virtio_pci_common.c
> index 7a55939..d60fe99 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -584,13 +584,6 @@ static void virtio_pci_remove(struct pci_dev
> *pci_dev)
>         struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
>         struct device *dev = get_device(&vp_dev->vdev.dev);
> 
> -       /*
> -        * Device is marked broken on surprise removal so that virtio upper
> -        * layers can abort any ongoing operation.
> -        */
> -       if (!pci_device_is_present(pci_dev))
> -               virtio_break_device(&vp_dev->vdev);
> -
>         pci_disable_sriov(pci_dev);
> 
>         unregister_virtio_device(&vp_dev->vdev);
> --
> 2.9.4

^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-11 10:18 ` Parav Pandit
@ 2024-01-11 11:47   ` Li,Rongqing
  2024-01-11 11:54     ` Parav Pandit
  0 siblings, 1 reply; 9+ messages in thread
From: Li,Rongqing @ 2024-01-11 11:47 UTC (permalink / raw)
  To: Parav Pandit, mst@redhat.com, jasowang@redhat.com,
	xuanzhuo@linux.alibaba.com, virtualization@lists.linux.dev
  Cc: songyang23, liubokai, Song,Zhan



> -----Original Message-----
> From: Parav Pandit <parav@nvidia.com>
> Sent: Thursday, January 11, 2024 6:18 PM
> To: Li,Rongqing <lirongqing@baidu.com>; mst@redhat.com;
> jasowang@redhat.com; xuanzhuo@linux.alibaba.com;
> virtualization@lists.linux.dev
> Cc: songyang23 <songyang23@baidu.com>; liubokai <liubokai@baidu.com>;
> Song,Zhan <songzhan@baidu.com>
> Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
> 
> 
> > From: Li,Rongqing <lirongqing@baidu.com>
> > Sent: Thursday, January 11, 2024 2:45 PM
> >
> > Revert "virtio_pci: Support surprise removal of virtio pci device"
> >
> > This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049.
> >
> > Marking the device as broken will cause the uncompleted IO request on
> > virtio-blk since virtblk_done will not continue when it find the
> > broken virtqueu at last it will cause the failure of removal of
> > virtio-blk device because of uncompleted IO request
> >
> > The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci:
> > Support surprise removal of virtio pci device") tried to fix is that
> > virtio backend always complete the IO request as soon as possible
> >
> This can never be guaranteed by a device and by pci spec given it is surprise
> removal by definition.
> 
> For Linux virtio blk is not the only block device which has surprised removal.
> Nvme blk driver has supported surprise removal for several years now.
> I am sure nbd and others would have too given its network.
> 
> I am not familiar with the blk driver stack.
> So please explore with blk community of how to complete an aborted/never
> completed io which never completed by the device.
> Since the blk driver knows exactly that the device is removed, it can 100%
> complete the io to upper layer with the error in deterministic way.
> 


Without the commit 43bb40c5b92("virtio_pci: Support surprise removal of virtio pci device"), 
My virtio-blk device can be removed successfully, with this commit, and if we use fio to send io
requests the virtio-blk, The removal always failed.


Once virt queue is marked as broken, io request can not be finished since lots of virtio functions
will check If the vq is broken, if broken, exit directly, like virtqueue_get_buf_ctx_packed

static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
                      unsigned int *len,
                      void **ctx)
{
    struct vring_virtqueue *vq = to_vvq(_vq);
    u16 last_used, id, last_used_idx;
    bool used_wrap_counter;
    void *ret;

    START_USE(vq);

    if (unlikely(vq->broken)) {
        END_USE(vq);
        return NULL;
    } 

Nvme is different from virtio, it can ensure that all requests are flushed to completion, like below patch:

commit 1d39e6928cbd0eb737c51545210b5186d5551ba1
Author: Keith Busch <keith.busch@intel.com>
Date:   Wed Jun 6 08:13:08 2018 -0600

    nvme-pci: unquiesce dead controller queues

    This patch ensures the nvme namsepace request queues are not quiesced
    on a surprise removal. It's possible the queues were previously killed
    in a failed reset, so the queues need to be unquiesced to ensure all
    requests are flushed to completion.

    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>


-Li


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-11 11:47   ` Li,Rongqing
@ 2024-01-11 11:54     ` Parav Pandit
  2024-01-11 11:58       ` Li,Rongqing
  0 siblings, 1 reply; 9+ messages in thread
From: Parav Pandit @ 2024-01-11 11:54 UTC (permalink / raw)
  To: NBU-Contact-Li Rongqing (EXTERNAL), mst@redhat.com,
	jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux.dev
  Cc: songyang23, liubokai, Song,Zhan


> From: Li,Rongqing <lirongqing@baidu.com>
> Sent: Thursday, January 11, 2024 5:18 PM
> 
> 
> > -----Original Message-----
> > From: Parav Pandit <parav@nvidia.com>
> > Sent: Thursday, January 11, 2024 6:18 PM
> > To: Li,Rongqing <lirongqing@baidu.com>; mst@redhat.com;
> > jasowang@redhat.com; xuanzhuo@linux.alibaba.com;
> > virtualization@lists.linux.dev
> > Cc: songyang23 <songyang23@baidu.com>; liubokai
> <liubokai@baidu.com>;
> > Song,Zhan <songzhan@baidu.com>
> > Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of
> > virtio pci
> >
> >
> > > From: Li,Rongqing <lirongqing@baidu.com>
> > > Sent: Thursday, January 11, 2024 2:45 PM
> > >
> > > Revert "virtio_pci: Support surprise removal of virtio pci device"
> > >
> > > This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049.
> > >
> > > Marking the device as broken will cause the uncompleted IO request
> > > on virtio-blk since virtblk_done will not continue when it find the
> > > broken virtqueu at last it will cause the failure of removal of
> > > virtio-blk device because of uncompleted IO request
> > >
> > > The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci:
> > > Support surprise removal of virtio pci device") tried to fix is that
> > > virtio backend always complete the IO request as soon as possible
> > >
> > This can never be guaranteed by a device and by pci spec given it is
> > surprise removal by definition.
> >
> > For Linux virtio blk is not the only block device which has surprised removal.
> > Nvme blk driver has supported surprise removal for several years now.
> > I am sure nbd and others would have too given its network.
> >
> > I am not familiar with the blk driver stack.
> > So please explore with blk community of how to complete an
> > aborted/never completed io which never completed by the device.
> > Since the blk driver knows exactly that the device is removed, it can
> > 100% complete the io to upper layer with the error in deterministic way.
> >
> 
> 
> Without the commit 43bb40c5b92("virtio_pci: Support surprise removal of
> virtio pci device"), My virtio-blk device can be removed successfully, with this
> commit, and if we use fio to send io requests the virtio-blk, The removal
> always failed.
> 
How can it complete when the device is not going complete the io ever?

> 
> Once virt queue is marked as broken, io request can not be finished since lots
> of virtio functions
> will check If the vq is broken, if broken, exit directly, like
> virtqueue_get_buf_ctx_packed
> 
> static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
>                       unsigned int *len,
>                       void **ctx)
> {
>     struct vring_virtqueue *vq = to_vvq(_vq);
>     u16 last_used, id, last_used_idx;
>     bool used_wrap_counter;
>     void *ret;
> 
>     START_USE(vq);
> 
>     if (unlikely(vq->broken)) {
>         END_USE(vq);
>         return NULL;
>     }
> 
> Nvme is different from virtio, it can ensure that all requests are flushed to
> completion, like below patch:
> 
Who is "it" in your above comment? Is it Driver or device?

In below commit who (driver/device) flushes the completions?
If it is device, then it is not surprise removal, right?

> commit 1d39e6928cbd0eb737c51545210b5186d5551ba1
> Author: Keith Busch <keith.busch@intel.com>
> Date:   Wed Jun 6 08:13:08 2018 -0600
> 
>     nvme-pci: unquiesce dead controller queues
> 
>     This patch ensures the nvme namsepace request queues are not quiesced
>     on a surprise removal. It's possible the queues were previously killed
>     in a failed reset, so the queues need to be unquiesced to ensure all
>     requests are flushed to completion.
> 
>     Signed-off-by: Keith Busch <keith.busch@intel.com>
>     Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
>     Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>     Signed-off-by: Christoph Hellwig <hch@lst.de>
>     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> 
> 
> -Li


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-11 11:54     ` Parav Pandit
@ 2024-01-11 11:58       ` Li,Rongqing
  2024-01-11 12:01         ` Parav Pandit
  0 siblings, 1 reply; 9+ messages in thread
From: Li,Rongqing @ 2024-01-11 11:58 UTC (permalink / raw)
  To: Parav Pandit, mst@redhat.com, jasowang@redhat.com,
	xuanzhuo@linux.alibaba.com, virtualization@lists.linux.dev
  Cc: songyang23, liubokai, Song,Zhan



> -----Original Message-----
> From: Parav Pandit <parav@nvidia.com>
> Sent: Thursday, January 11, 2024 7:55 PM
> To: Li,Rongqing <lirongqing@baidu.com>; mst@redhat.com;
> jasowang@redhat.com; xuanzhuo@linux.alibaba.com;
> virtualization@lists.linux.dev
> Cc: songyang23 <songyang23@baidu.com>; liubokai <liubokai@baidu.com>;
> Song,Zhan <songzhan@baidu.com>
> Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
> 
> 
> > From: Li,Rongqing <lirongqing@baidu.com>
> > Sent: Thursday, January 11, 2024 5:18 PM
> >
> >
> > > -----Original Message-----
> > > From: Parav Pandit <parav@nvidia.com>
> > > Sent: Thursday, January 11, 2024 6:18 PM
> > > To: Li,Rongqing <lirongqing@baidu.com>; mst@redhat.com;
> > > jasowang@redhat.com; xuanzhuo@linux.alibaba.com;
> > > virtualization@lists.linux.dev
> > > Cc: songyang23 <songyang23@baidu.com>; liubokai
> > <liubokai@baidu.com>;
> > > Song,Zhan <songzhan@baidu.com>
> > > Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of
> > > virtio pci
> > >
> > >
> > > > From: Li,Rongqing <lirongqing@baidu.com>
> > > > Sent: Thursday, January 11, 2024 2:45 PM
> > > >
> > > > Revert "virtio_pci: Support surprise removal of virtio pci device"
> > > >
> > > > This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049.
> > > >
> > > > Marking the device as broken will cause the uncompleted IO request
> > > > on virtio-blk since virtblk_done will not continue when it find
> > > > the broken virtqueu at last it will cause the failure of removal
> > > > of virtio-blk device because of uncompleted IO request
> > > >
> > > > The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci:
> > > > Support surprise removal of virtio pci device") tried to fix is
> > > > that virtio backend always complete the IO request as soon as
> > > > possible
> > > >
> > > This can never be guaranteed by a device and by pci spec given it is
> > > surprise removal by definition.
> > >
> > > For Linux virtio blk is not the only block device which has surprised removal.
> > > Nvme blk driver has supported surprise removal for several years now.
> > > I am sure nbd and others would have too given its network.
> > >
> > > I am not familiar with the blk driver stack.
> > > So please explore with blk community of how to complete an
> > > aborted/never completed io which never completed by the device.
> > > Since the blk driver knows exactly that the device is removed, it
> > > can 100% complete the io to upper layer with the error in deterministic way.
> > >
> >
> >
> > Without the commit 43bb40c5b92("virtio_pci: Support surprise removal
> > of virtio pci device"), My virtio-blk device can be removed
> > successfully, with this commit, and if we use fio to send io requests
> > the virtio-blk, The removal always failed.
> >
> How can it complete when the device is not going complete the io ever?
> 

Now this issue is that the virtio backend (device) has completed the io request, but virtio-blk 
Can not get the completed io request, since the virtqueue is marked as broken.

-Li

> >
> > Once virt queue is marked as broken, io request can not be finished
> > since lots of virtio functions will check If the vq is broken, if
> > broken, exit directly, like virtqueue_get_buf_ctx_packed
> >
> > static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> >                       unsigned int *len,
> >                       void **ctx)
> > {
> >     struct vring_virtqueue *vq = to_vvq(_vq);
> >     u16 last_used, id, last_used_idx;
> >     bool used_wrap_counter;
> >     void *ret;
> >
> >     START_USE(vq);
> >
> >     if (unlikely(vq->broken)) {
> >         END_USE(vq);
> >         return NULL;
> >     }
> >
> > Nvme is different from virtio, it can ensure that all requests are
> > flushed to completion, like below patch:
> >
> Who is "it" in your above comment? Is it Driver or device?
> 
> In below commit who (driver/device) flushes the completions?
> If it is device, then it is not surprise removal, right?
> 
> > commit 1d39e6928cbd0eb737c51545210b5186d5551ba1
> > Author: Keith Busch <keith.busch@intel.com>
> > Date:   Wed Jun 6 08:13:08 2018 -0600
> >
> >     nvme-pci: unquiesce dead controller queues
> >
> >     This patch ensures the nvme namsepace request queues are not
> quiesced
> >     on a surprise removal. It's possible the queues were previously killed
> >     in a failed reset, so the queues need to be unquiesced to ensure all
> >     requests are flushed to completion.
> >
> >     Signed-off-by: Keith Busch <keith.busch@intel.com>
> >     Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> >     Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> >     Signed-off-by: Christoph Hellwig <hch@lst.de>
> >     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> >
> >
> > -Li


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-11 11:58       ` Li,Rongqing
@ 2024-01-11 12:01         ` Parav Pandit
  0 siblings, 0 replies; 9+ messages in thread
From: Parav Pandit @ 2024-01-11 12:01 UTC (permalink / raw)
  To: NBU-Contact-Li Rongqing (EXTERNAL), mst@redhat.com,
	jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux.dev
  Cc: songyang23, liubokai, Song,Zhan


> From: Li,Rongqing <lirongqing@baidu.com>
> Sent: Thursday, January 11, 2024 5:28 PM
> 
> 
> > -----Original Message-----
> > From: Parav Pandit <parav@nvidia.com>
> > Sent: Thursday, January 11, 2024 7:55 PM
> > To: Li,Rongqing <lirongqing@baidu.com>; mst@redhat.com;
> > jasowang@redhat.com; xuanzhuo@linux.alibaba.com;
> > virtualization@lists.linux.dev
> > Cc: songyang23 <songyang23@baidu.com>; liubokai
> <liubokai@baidu.com>;
> > Song,Zhan <songzhan@baidu.com>
> > Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of
> > virtio pci
> >
> >
> > > From: Li,Rongqing <lirongqing@baidu.com>
> > > Sent: Thursday, January 11, 2024 5:18 PM
> > >
> > >
> > > > -----Original Message-----
> > > > From: Parav Pandit <parav@nvidia.com>
> > > > Sent: Thursday, January 11, 2024 6:18 PM
> > > > To: Li,Rongqing <lirongqing@baidu.com>; mst@redhat.com;
> > > > jasowang@redhat.com; xuanzhuo@linux.alibaba.com;
> > > > virtualization@lists.linux.dev
> > > > Cc: songyang23 <songyang23@baidu.com>; liubokai
> > > <liubokai@baidu.com>;
> > > > Song,Zhan <songzhan@baidu.com>
> > > > Subject: RE: [RFC] Revert "virtio_pci: Support surprise removal of
> > > > virtio pci
> > > >
> > > >
> > > > > From: Li,Rongqing <lirongqing@baidu.com>
> > > > > Sent: Thursday, January 11, 2024 2:45 PM
> > > > >
> > > > > Revert "virtio_pci: Support surprise removal of virtio pci device"
> > > > >
> > > > > This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049.
> > > > >
> > > > > Marking the device as broken will cause the uncompleted IO
> > > > > request on virtio-blk since virtblk_done will not continue when
> > > > > it find the broken virtqueu at last it will cause the failure of
> > > > > removal of virtio-blk device because of uncompleted IO request
> > > > >
> > > > > The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci:
> > > > > Support surprise removal of virtio pci device") tried to fix is
> > > > > that virtio backend always complete the IO request as soon as
> > > > > possible
> > > > >
> > > > This can never be guaranteed by a device and by pci spec given it
> > > > is surprise removal by definition.
> > > >
> > > > For Linux virtio blk is not the only block device which has surprised
> removal.
> > > > Nvme blk driver has supported surprise removal for several years now.
> > > > I am sure nbd and others would have too given its network.
> > > >
> > > > I am not familiar with the blk driver stack.
> > > > So please explore with blk community of how to complete an
> > > > aborted/never completed io which never completed by the device.
> > > > Since the blk driver knows exactly that the device is removed, it
> > > > can 100% complete the io to upper layer with the error in deterministic
> way.
> > > >
> > >
> > >
> > > Without the commit 43bb40c5b92("virtio_pci: Support surprise removal
> > > of virtio pci device"), My virtio-blk device can be removed
> > > successfully, with this commit, and if we use fio to send io
> > > requests the virtio-blk, The removal always failed.
> > >
> > How can it complete when the device is not going complete the io ever?
> >
> 
> Now this issue is that the virtio backend (device) has completed the io request,
> but virtio-blk Can not get the completed io request, since the virtqueue is
> marked as broken.
Your test was just a timing luck when the io was completed, it is not always the case.
Hence, vblk driver should flush all the pending requests with error to upper layers for which the ios have not yet processed after it was broken because when surprise removal request was received from that point onwards there is no communication with device.

> 
> -Li
> 
> > >
> > > Once virt queue is marked as broken, io request can not be finished
> > > since lots of virtio functions will check If the vq is broken, if
> > > broken, exit directly, like virtqueue_get_buf_ctx_packed
> > >
> > > static void *virtqueue_get_buf_ctx_packed(struct virtqueue *_vq,
> > >                       unsigned int *len,
> > >                       void **ctx)
> > > {
> > >     struct vring_virtqueue *vq = to_vvq(_vq);
> > >     u16 last_used, id, last_used_idx;
> > >     bool used_wrap_counter;
> > >     void *ret;
> > >
> > >     START_USE(vq);
> > >
> > >     if (unlikely(vq->broken)) {
> > >         END_USE(vq);
> > >         return NULL;
> > >     }
> > >
> > > Nvme is different from virtio, it can ensure that all requests are
> > > flushed to completion, like below patch:
> > >
> > Who is "it" in your above comment? Is it Driver or device?
> >
> > In below commit who (driver/device) flushes the completions?
> > If it is device, then it is not surprise removal, right?
> >
> > > commit 1d39e6928cbd0eb737c51545210b5186d5551ba1
> > > Author: Keith Busch <keith.busch@intel.com>
> > > Date:   Wed Jun 6 08:13:08 2018 -0600
> > >
> > >     nvme-pci: unquiesce dead controller queues
> > >
> > >     This patch ensures the nvme namsepace request queues are not
> > quiesced
> > >     on a surprise removal. It's possible the queues were previously killed
> > >     in a failed reset, so the queues need to be unquiesced to ensure all
> > >     requests are flushed to completion.
> > >
> > >     Signed-off-by: Keith Busch <keith.busch@intel.com>
> > >     Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> > >     Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> > >     Signed-off-by: Christoph Hellwig <hch@lst.de>
> > >     Signed-off-by: Jens Axboe <axboe@kernel.dk>
> > >
> > >
> > > -Li


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-11  9:14 [RFC] Revert "virtio_pci: Support surprise removal of virtio pci Li,Rongqing
  2024-01-11 10:18 ` Parav Pandit
@ 2024-01-11 16:21 ` Michael S. Tsirkin
  2024-01-12  2:28   ` Li,Rongqing
  1 sibling, 1 reply; 9+ messages in thread
From: Michael S. Tsirkin @ 2024-01-11 16:21 UTC (permalink / raw)
  To: Li,Rongqing
  Cc: jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux.dev, parav@nvidia.com, songyang23,
	liubokai, Song,Zhan

On Thu, Jan 11, 2024 at 09:14:37AM +0000, Li,Rongqing wrote:
> Revert "virtio_pci: Support surprise removal of virtio pci device"
> 
> This reverts commit 43bb40c5b92659966bdf4bfe584fde0a3575a049.
> 
> Marking the device as broken will cause the uncompleted IO request on
> virtio-blk since virtblk_done will not continue when it find the broken virtqueu
> at last it will cause the failure of removal of virtio-blk device because of
> uncompleted IO request
> 
> The correct fix for the issue that commit 43bb40c5b9 ("virtio_pci:
> Support surprise removal of virtio pci device") tried to fix
> is that virtio backend always complete the IO request as soon as possible
> 
> Signed-off-by: Li RongQing <lirongqing@baidu.com>
> ---
>  drivers/virtio/virtio_pci_common.c | 7 -------
>  1 file changed, 7 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index 7a55939..d60fe99 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -584,13 +584,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
>         struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
>         struct device *dev = get_device(&vp_dev->vdev.dev);
> 
> -       /*
> -        * Device is marked broken on surprise removal so that virtio upper
> -        * layers can abort any ongoing operation.
> -        */
> -       if (!pci_device_is_present(pci_dev))
> -               virtio_break_device(&vp_dev->vdev);
> -
>         pci_disable_sriov(pci_dev);


Maybe break device is not the right thing to do. Indeed, request buffers which
have been already used are not completed by driver.
Just removing this is not good since you can then get more requests
and the device is gone.

>         unregister_virtio_device(&vp_dev->vdev);
> --
> 2.9.4


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-11 16:21 ` Michael S. Tsirkin
@ 2024-01-12  2:28   ` Li,Rongqing
  2024-01-12  5:56     ` Parav Pandit
  0 siblings, 1 reply; 9+ messages in thread
From: Li,Rongqing @ 2024-01-12  2:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux.dev, parav@nvidia.com, songyang23,
	liubokai, Song,Zhan


> > diff --git a/drivers/virtio/virtio_pci_common.c
> > b/drivers/virtio/virtio_pci_common.c
> > index 7a55939..d60fe99 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -584,13 +584,6 @@ static void virtio_pci_remove(struct pci_dev
> *pci_dev)
> >         struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> >         struct device *dev = get_device(&vp_dev->vdev.dev);
> >
> > -       /*
> > -        * Device is marked broken on surprise removal so that virtio upper
> > -        * layers can abort any ongoing operation.
> > -        */
> > -       if (!pci_device_is_present(pci_dev))
> > -               virtio_break_device(&vp_dev->vdev);
> > -
> >         pci_disable_sriov(pci_dev);
> 
> 
> Maybe break device is not the right thing to do. Indeed, request buffers which
> have been already used are not completed by driver.

True, break device makes virtio-blk no longer work for surprise removal

And seems no simple method can work for all devices.

-Li


^ permalink raw reply	[flat|nested] 9+ messages in thread

* RE: [RFC] Revert "virtio_pci: Support surprise removal of virtio pci
  2024-01-12  2:28   ` Li,Rongqing
@ 2024-01-12  5:56     ` Parav Pandit
  0 siblings, 0 replies; 9+ messages in thread
From: Parav Pandit @ 2024-01-12  5:56 UTC (permalink / raw)
  To: NBU-Contact-Li Rongqing (EXTERNAL), Michael S. Tsirkin
  Cc: jasowang@redhat.com, xuanzhuo@linux.alibaba.com,
	virtualization@lists.linux.dev, songyang23, liubokai, Song,Zhan



> From: Li,Rongqing <lirongqing@baidu.com>
> Sent: Friday, January 12, 2024 7:59 AM
> 
> 
> > > diff --git a/drivers/virtio/virtio_pci_common.c
> > > b/drivers/virtio/virtio_pci_common.c
> > > index 7a55939..d60fe99 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -584,13 +584,6 @@ static void virtio_pci_remove(struct pci_dev
> > *pci_dev)
> > >         struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > >         struct device *dev = get_device(&vp_dev->vdev.dev);
> > >
> > > -       /*
> > > -        * Device is marked broken on surprise removal so that virtio upper
> > > -        * layers can abort any ongoing operation.
> > > -        */
> > > -       if (!pci_device_is_present(pci_dev))
> > > -               virtio_break_device(&vp_dev->vdev);
> > > -
> > >         pci_disable_sriov(pci_dev);
> >
> >
> > Maybe break device is not the right thing to do. Indeed, request
> > buffers which have been already used are not completed by driver.
>
We can possibly have a virtio_device level flag to indicate device dead/not responsive.
 
> True, break device makes virtio-blk no longer work for surprise removal
> 
> And seems no simple method can work for all devices.

When the VQ is broken or when virtio_device is marked dead, virtblk_remove() should 
a. flush all the outstanding requests
b. inform the upper layer to not issue more requests
c. for transient requests arriving on MQ, fail them in the request queue callback.



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2024-01-12  5:56 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-11  9:14 [RFC] Revert "virtio_pci: Support surprise removal of virtio pci Li,Rongqing
2024-01-11 10:18 ` Parav Pandit
2024-01-11 11:47   ` Li,Rongqing
2024-01-11 11:54     ` Parav Pandit
2024-01-11 11:58       ` Li,Rongqing
2024-01-11 12:01         ` Parav Pandit
2024-01-11 16:21 ` Michael S. Tsirkin
2024-01-12  2:28   ` Li,Rongqing
2024-01-12  5:56     ` Parav Pandit

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).