* [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
@ 2025-04-08 14:59 Parav Pandit
2025-04-08 20:15 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-04-08 14:59 UTC (permalink / raw)
To: mst, virtualization, jasowang
Cc: stefanha, pbonzini, xuanzhuo, Parav Pandit, stable, lirongqing,
Max Gurtovoy
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
The cited commit introduced a fix that marks the device as broken
during surprise removal. However, this approach causes uncompleted
I/O requests on virtio-blk device. The presence of uncompleted I/O
requests prevents the successful removal of virtio-blk devices.
This fix allows devices that simulate a surprise removal but actually
remove gracefully to continue working as before.
For surprise removals, a better solution will be preferred in the future.
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
Cc: stable@vger.kernel.org
Reported-by: lirongqing@baidu.com
Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
Reviewed-by: Max Gurtovoy<mgurtovoy@nvidia.com>
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
drivers/virtio/virtio_pci_common.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index d6d79af44569..dba5eb2eaff9 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
struct device *dev = get_device(&vp_dev->vdev.dev);
- /*
- * Device is marked broken on surprise removal so that virtio upper
- * layers can abort any ongoing operation.
- */
- if (!pci_device_is_present(pci_dev))
- virtio_break_device(&vp_dev->vdev);
-
pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev);
--
2.26.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-04-08 14:59 Parav Pandit
@ 2025-04-08 20:15 ` Michael S. Tsirkin
2025-04-09 13:50 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-04-08 20:15 UTC (permalink / raw)
To: Parav Pandit
Cc: virtualization, jasowang, stefanha, pbonzini, xuanzhuo, stable,
lirongqing, Max Gurtovoy
On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
> This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
>
> The cited commit introduced a fix that marks the device as broken
> during surprise removal. However, this approach causes uncompleted
> I/O requests on virtio-blk device. The presence of uncompleted I/O
> requests prevents the successful removal of virtio-blk devices.
>
> This fix allows devices that simulate a surprise removal but actually
> remove gracefully to continue working as before.
>
> For surprise removals, a better solution will be preferred in the future.
Sorry I'm not breaking one thing to fix another.
Device is gone so no new requests will be completed. Why not complete all
unfinished requests, for example?
Come up with a proper fix pls.
>
> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> Cc: stable@vger.kernel.org
> Reported-by: lirongqing@baidu.com
> Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
> Reviewed-by: Max Gurtovoy<mgurtovoy@nvidia.com>
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
> drivers/virtio/virtio_pci_common.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index d6d79af44569..dba5eb2eaff9 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> struct device *dev = get_device(&vp_dev->vdev.dev);
>
> - /*
> - * Device is marked broken on surprise removal so that virtio upper
> - * layers can abort any ongoing operation.
> - */
> - if (!pci_device_is_present(pci_dev))
> - virtio_break_device(&vp_dev->vdev);
> -
> pci_disable_sriov(pci_dev);
>
> unregister_virtio_device(&vp_dev->vdev);
> --
> 2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-04-08 20:15 ` Michael S. Tsirkin
@ 2025-04-09 13:50 ` Parav Pandit
2025-04-09 16:02 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-04-09 13:50 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org,
NBU-Contact-Li Rongqing (EXTERNAL), Max Gurtovoy
Hi Michael,
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, April 9, 2025 1:45 AM
>
> On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
> > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
> virtio pci device").
> >
> > The cited commit introduced a fix that marks the device as broken
> > during surprise removal. However, this approach causes uncompleted I/O
> > requests on virtio-blk device. The presence of uncompleted I/O
> > requests prevents the successful removal of virtio-blk devices.
> >
> > This fix allows devices that simulate a surprise removal but actually
> > remove gracefully to continue working as before.
> >
> > For surprise removals, a better solution will be preferred in the future.
>
> Sorry I'm not breaking one thing to fix another.
> Device is gone so no new requests will be completed. Why not complete all
> unfinished requests, for example?
>
> Come up with a proper fix pls.
>
I would also like to have a proper fix that can be backportable.
However, an attempt [1] had race.
To overcome the race, a different approach also tried, however the block layer was stuck even if all requests in virtio-blk driver layer was completed like you suggested.
It appeared that supporting uncompleted requests won't be so straightforward to backport.
Hence, the request is to revert and restore the previous behavior.
This at least improves the case where the OS thinks that surprise removal occurred, but the device eventually completes the IO.
And hence, virtio block driver successfully unloads.
And virtio-net also does not experience the mentioned crash.
[1] https://lore.kernel.org/all/20240217180848.241068-1-parav@nvidia.com/
> >
> > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio
> > pci device")
> > Cc: stable@vger.kernel.org
> > Reported-by: lirongqing@baidu.com
> > Closes:
> > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474
> > 1@baidu.com/
> > Reviewed-by: Max Gurtovoy<mgurtovoy@nvidia.com>
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
>
>
> > ---
> > drivers/virtio/virtio_pci_common.c | 7 -------
> > 1 file changed, 7 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c
> > b/drivers/virtio/virtio_pci_common.c
> > index d6d79af44569..dba5eb2eaff9 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev
> *pci_dev)
> > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > struct device *dev = get_device(&vp_dev->vdev.dev);
> >
> > - /*
> > - * Device is marked broken on surprise removal so that virtio upper
> > - * layers can abort any ongoing operation.
> > - */
> > - if (!pci_device_is_present(pci_dev))
> > - virtio_break_device(&vp_dev->vdev);
> > -
> > pci_disable_sriov(pci_dev);
> >
> > unregister_virtio_device(&vp_dev->vdev);
> > --
> > 2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-04-09 13:50 ` Parav Pandit
@ 2025-04-09 16:02 ` Michael S. Tsirkin
2025-04-16 3:01 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-04-09 16:02 UTC (permalink / raw)
To: Parav Pandit
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org,
NBU-Contact-Li Rongqing (EXTERNAL), Max Gurtovoy
On Wed, Apr 09, 2025 at 01:50:18PM +0000, Parav Pandit wrote:
> Hi Michael,
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: Wednesday, April 9, 2025 1:45 AM
> >
> > On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
> > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > virtio pci device").
> > >
> > > The cited commit introduced a fix that marks the device as broken
> > > during surprise removal. However, this approach causes uncompleted I/O
> > > requests on virtio-blk device. The presence of uncompleted I/O
> > > requests prevents the successful removal of virtio-blk devices.
> > >
> > > This fix allows devices that simulate a surprise removal but actually
> > > remove gracefully to continue working as before.
> > >
> > > For surprise removals, a better solution will be preferred in the future.
> >
> > Sorry I'm not breaking one thing to fix another.
> > Device is gone so no new requests will be completed. Why not complete all
> > unfinished requests, for example?
> >
> > Come up with a proper fix pls.
> >
> I would also like to have a proper fix that can be backportable.
> However, an attempt [1] had race.
> To overcome the race, a different approach also tried, however the block layer was stuck even if all requests in virtio-blk driver layer was completed like you suggested.
>
> It appeared that supporting uncompleted requests won't be so straightforward to backport.
>
> Hence, the request is to revert and restore the previous behavior.
> This at least improves the case where the OS thinks that surprise removal occurred, but the device eventually completes the IO.
> And hence, virtio block driver successfully unloads.
> And virtio-net also does not experience the mentioned crash.
>
> [1] https://lore.kernel.org/all/20240217180848.241068-1-parav@nvidia.com/
Parav this is a commit from 2021. I am not reverting it "because it
seems to help". We'll never make progress like this.
You will have to debug and fix it properly. Sorry.
Once we have a fix, we will worry about backports and stuff, this
is how we do kernel development here.
> > >
> > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio
> > > pci device")
> > > Cc: stable@vger.kernel.org
> > > Reported-by: lirongqing@baidu.com
> > > Closes:
> > > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474
> > > 1@baidu.com/
> > > Reviewed-by: Max Gurtovoy<mgurtovoy@nvidia.com>
> > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> >
> >
> > > ---
> > > drivers/virtio/virtio_pci_common.c | 7 -------
> > > 1 file changed, 7 deletions(-)
> > >
> > > diff --git a/drivers/virtio/virtio_pci_common.c
> > > b/drivers/virtio/virtio_pci_common.c
> > > index d6d79af44569..dba5eb2eaff9 100644
> > > --- a/drivers/virtio/virtio_pci_common.c
> > > +++ b/drivers/virtio/virtio_pci_common.c
> > > @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev
> > *pci_dev)
> > > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > > struct device *dev = get_device(&vp_dev->vdev.dev);
> > >
> > > - /*
> > > - * Device is marked broken on surprise removal so that virtio upper
> > > - * layers can abort any ongoing operation.
> > > - */
> > > - if (!pci_device_is_present(pci_dev))
> > > - virtio_break_device(&vp_dev->vdev);
> > > -
> > > pci_disable_sriov(pci_dev);
> > >
> > > unregister_virtio_device(&vp_dev->vdev);
> > > --
> > > 2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-04-09 16:02 ` Michael S. Tsirkin
@ 2025-04-16 3:01 ` Parav Pandit
0 siblings, 0 replies; 38+ messages in thread
From: Parav Pandit @ 2025-04-16 3:01 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org,
NBU-Contact-Li Rongqing (EXTERNAL), Max Gurtovoy
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: Wednesday, April 9, 2025 9:32 PM
>
> On Wed, Apr 09, 2025 at 01:50:18PM +0000, Parav Pandit wrote:
> > Hi Michael,
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: Wednesday, April 9, 2025 1:45 AM
> > >
> > > On Tue, Apr 08, 2025 at 05:59:08PM +0300, Parav Pandit wrote:
> > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > > removal of
> > > virtio pci device").
> > > >
> > > > The cited commit introduced a fix that marks the device as broken
> > > > during surprise removal. However, this approach causes uncompleted
> > > > I/O requests on virtio-blk device. The presence of uncompleted I/O
> > > > requests prevents the successful removal of virtio-blk devices.
> > > >
> > > > This fix allows devices that simulate a surprise removal but
> > > > actually remove gracefully to continue working as before.
> > > >
> > > > For surprise removals, a better solution will be preferred in the future.
> > >
> > > Sorry I'm not breaking one thing to fix another.
> > > Device is gone so no new requests will be completed. Why not
> > > complete all unfinished requests, for example?
> > >
> > > Come up with a proper fix pls.
> > >
> > I would also like to have a proper fix that can be backportable.
> > However, an attempt [1] had race.
> > To overcome the race, a different approach also tried, however the block
> layer was stuck even if all requests in virtio-blk driver layer was completed like
> you suggested.
> >
> > It appeared that supporting uncompleted requests won't be so
> straightforward to backport.
> >
> > Hence, the request is to revert and restore the previous behavior.
> > This at least improves the case where the OS thinks that surprise removal
> occurred, but the device eventually completes the IO.
> > And hence, virtio block driver successfully unloads.
> > And virtio-net also does not experience the mentioned crash.
> >
> > [1]
> > https://lore.kernel.org/all/20240217180848.241068-1-parav@nvidia.com/
>
> Parav this is a commit from 2021. I am not reverting it "because it seems to
> help". We'll never make progress like this.
> You will have to debug and fix it properly. Sorry.
>
> Once we have a fix, we will worry about backports and stuff, this is how we do
> kernel development here.
Ok. I will post the candidate patch. Will likely need help to fix it. Will ask Stefano.
Thanks.
^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
@ 2025-08-22 9:17 Parav Pandit
2025-08-22 10:21 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-22 9:17 UTC (permalink / raw)
To: mst, virtualization, jasowang
Cc: stefanha, pbonzini, xuanzhuo, stable, mgurtovoy, Parav Pandit,
lirongqing
This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
Virtio drivers and PCI devices have never fully supported true
surprise (aka hot unplug) removal. Drivers historically continued
processing and waiting for pending I/O and even continued synchronous
device reset during surprise removal. Devices have also continued
completing I/Os, doing DMA and allowing device reset after surprise
removal to support such drivers.
Supporting it correctly would require a new device capability and
driver negotiation in the virtio specification to safely stop
I/O and free queue memory. Failure to do so either breaks all the
existing drivers with call trace listed in the commit or crashes the
host on continuing the DMA. Hence, until such specification and devices
are invented, restore the previous behavior of treating surprise
removal as graceful removal to avoid regressions and maintain system
stability same as before the
commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
As explained above, previous analysis of solving this only in driver
was incomplete and non-reliable at [1] and at [2]; Hence reverting commit
43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
is still the best stand to restore failures of virtio net and
block devices.
[1] https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
[2] https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.com/
Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
Cc: stable@vger.kernel.org
Reported-by: lirongqing@baidu.com
Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
Signed-off-by: Parav Pandit <parav@nvidia.com>
---
drivers/virtio/virtio_pci_common.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index d6d79af44569..dba5eb2eaff9 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
struct device *dev = get_device(&vp_dev->vdev.dev);
- /*
- * Device is marked broken on surprise removal so that virtio upper
- * layers can abort any ongoing operation.
- */
- if (!pci_device_is_present(pci_dev))
- virtio_break_device(&vp_dev->vdev);
-
pci_disable_sriov(pci_dev);
unregister_virtio_device(&vp_dev->vdev);
--
2.26.2
^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 9:17 [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device" Parav Pandit
@ 2025-08-22 10:21 ` Michael S. Tsirkin
2025-08-22 12:22 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-22 10:21 UTC (permalink / raw)
To: Parav Pandit
Cc: virtualization, jasowang, stefanha, pbonzini, xuanzhuo, stable,
mgurtovoy, lirongqing
On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
>
> Virtio drivers and PCI devices have never fully supported true
> surprise (aka hot unplug) removal. Drivers historically continued
> processing and waiting for pending I/O and even continued synchronous
> device reset during surprise removal. Devices have also continued
> completing I/Os, doing DMA and allowing device reset after surprise
> removal to support such drivers.
>
> Supporting it correctly would require a new device capability
If a device is removed, it is removed. Windows drivers supported
this since forever and it's just a Linux bug that it does not
handle all the cases. This is not something you can handle
with a capability.
> and
> driver negotiation in the virtio specification to safely stop
> I/O and free queue memory. Failure to do so either breaks all the
> existing drivers with call trace listed in the commit or crashes the
> host on continuing the DMA.
If the device is gone, then DMA does not continue.
IIUC what is going on for you, is that you have developed a surprise
removal emulation that pretends to remove the device but
actually the device is doing DMA. So of course things break then.
> Hence, until such specification and devices
> are invented, restore the previous behavior of treating surprise
> removal as graceful removal to avoid regressions and maintain system
> stability same as before the
> commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device").
>
> As explained above, previous analysis of solving this only in driver
> was incomplete and non-reliable at [1] and at [2]; Hence reverting commit
> 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> is still the best stand to restore failures of virtio net and
> block devices.
>
> [1] https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
I can only repeat what I said then, this is not how we do kernel
development.
> [2] https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.com/
What was missing here, is handling corner cases. So let us please
try to handle them.
Here is how I would try to do it:
- add a new driver callback
- start a periodic timer task in virtio core on remove
- in the timer, probe that the device is still present.
if not, invoke a driver callback
- cancel the task on device reset
If you do not have the time, let me know and I will try to look into it.
> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> Cc: stable@vger.kernel.org
> Reported-by: lirongqing@baidu.com
> Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@baidu.com/
> Signed-off-by: Parav Pandit <parav@nvidia.com>
> ---
> drivers/virtio/virtio_pci_common.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> index d6d79af44569..dba5eb2eaff9 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> struct device *dev = get_device(&vp_dev->vdev.dev);
>
> - /*
> - * Device is marked broken on surprise removal so that virtio upper
> - * layers can abort any ongoing operation.
> - */
> - if (!pci_device_is_present(pci_dev))
> - virtio_break_device(&vp_dev->vdev);
> -
> pci_disable_sriov(pci_dev);
>
> unregister_virtio_device(&vp_dev->vdev);
> --
> 2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
@ 2025-08-22 10:27 Li,Rongqing
2025-08-22 12:24 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Li,Rongqing @ 2025-08-22 10:27 UTC (permalink / raw)
To: Parav Pandit, mst@redhat.com, virtualization@lists.linux.dev,
jasowang@redhat.com
Cc: stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org,
mgurtovoy@nvidia.com
> This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
> virtio pci device").
>
> Virtio drivers and PCI devices have never fully supported true surprise (aka hot
> unplug) removal. Drivers historically continued processing and waiting for
> pending I/O and even continued synchronous device reset during surprise
> removal. Devices have also continued completing I/Os, doing DMA and allowing
> device reset after surprise removal to support such drivers.
>
> Supporting it correctly would require a new device capability and driver
> negotiation in the virtio specification to safely stop I/O and free queue memory.
> Failure to do so either breaks all the existing drivers with call trace listed in the
> commit or crashes the host on continuing the DMA. Hence, until such
> specification and devices are invented, restore the previous behavior of treating
> surprise removal as graceful removal to avoid regressions and maintain system
> stability same as before the commit 43bb40c5b926 ("virtio_pci: Support surprise
> removal of virtio pci device").
>
> As explained above, previous analysis of solving this only in driver was
> incomplete and non-reliable at [1] and at [2]; Hence reverting commit
> 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") is still
> the best stand to restore failures of virtio net and block devices.
>
> [1]
> https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C638
> DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> [2]
> https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nvidia.c
> om/
>
> Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device")
> Cc: stable@vger.kernel.org
> Reported-by: lirongqing@baidu.com
> Closes:
> https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@b
> aidu.com/
> Signed-off-by: Parav Pandit <parav@nvidia.com>
Tested-by: Li RongQing <lirongqing@baidu.com>
Thanks
-Li
> ---
> drivers/virtio/virtio_pci_common.c | 7 -------
> 1 file changed, 7 deletions(-)
>
> diff --git a/drivers/virtio/virtio_pci_common.c
> b/drivers/virtio/virtio_pci_common.c
> index d6d79af44569..dba5eb2eaff9 100644
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> struct device *dev = get_device(&vp_dev->vdev.dev);
>
> - /*
> - * Device is marked broken on surprise removal so that virtio upper
> - * layers can abort any ongoing operation.
> - */
> - if (!pci_device_is_present(pci_dev))
> - virtio_break_device(&vp_dev->vdev);
> -
> pci_disable_sriov(pci_dev);
>
> unregister_virtio_device(&vp_dev->vdev);
> --
> 2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 10:21 ` Michael S. Tsirkin
@ 2025-08-22 12:22 ` Parav Pandit
2025-08-22 13:03 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-22 12:22 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 22 August 2025 03:52 PM
>
> On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
> virtio pci device").
> >
> > Virtio drivers and PCI devices have never fully supported true
> > surprise (aka hot unplug) removal. Drivers historically continued
> > processing and waiting for pending I/O and even continued synchronous
> > device reset during surprise removal. Devices have also continued
> > completing I/Os, doing DMA and allowing device reset after surprise
> > removal to support such drivers.
> >
> > Supporting it correctly would require a new device capability
>
> If a device is removed, it is removed.
This is how it was implemented and none of the virtio drivers supported it.
So vendors had stepped away from such device implementation.
(not just us).
> Windows drivers supported this since
> forever and it's just a Linux bug that it does not handle all the cases.
Internal tests showed the opposite that hotplug driver + virtio didn't handle the surprise removal either.
Can you please share the starting version of the Windows driver that actually does not wait for the CVQ commands and outstanding requests for block etc drivers?
We have even seen window driver that corrupts the CVQ command buffer when CVQ command times out!
> This is not
> something you can handle with a capability.
>
The sad truth in my view is that capability is needed so that device can know how to do _sane_ stop as driver told exactly to do so.
For sure if device stops the DMA, the device is unusable until kernel 6.20 or whatever new kernel arrives.
>
>
>
>
> > and
> > driver negotiation in the virtio specification to safely stop I/O and
> > free queue memory. Failure to do so either breaks all the existing
> > drivers with call trace listed in the commit or crashes the host on
> > continuing the DMA.
>
> If the device is gone, then DMA does not continue.
>
This is exactly how we built the device.
And device behaved correctly, it resulted in the call trace on multiple driver hands in net, block area.
Commit 43bb40c5b926 partially fixed it.
However, device does not know when to behave correctly vs behave as driver expects.
And to do this reliability a capability is needed.
> IIUC what is going on for you, is that you have developed a surprise removal
> emulation that pretends to remove the device but actually the device is doing
> DMA. So of course things break then.
>
Not really. The device was built correctly to do surprise removal and stop the DMA.
That caused the call traces. One of them listed in 43bb40c5b926.
So to maintain backward compatibility, device had to graceful removal as expected by these drivers.
> > Hence, until such specification and devices are invented, restore the
> > previous behavior of treating surprise removal as graceful removal to
> > avoid regressions and maintain system stability same as before the
> > commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> device").
> >
> > As explained above, previous analysis of solving this only in driver
> > was incomplete and non-reliable at [1] and at [2]; Hence reverting
> > commit
> > 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> > device") is still the best stand to restore failures of virtio net and
> > block devices.
> >
> > [1]
> >
> https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C6
> > 38DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
>
>
> I can only repeat what I said then, this is not how we do kernel development.
>
Kernel development does not promote breaking users.
Here users are affected with this incomplete kernel behaviour.
Virtio subsystem (multiple drivers) including marked stable kernel lacks the contract on when to expect DMA completions and when not.
> > [2]
> > https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nv
> > idia.com/
>
> What was missing here, is handling corner cases. So let us please try to handle
> them.
>
> Here is how I would try to do it:
>
> - add a new driver callback
> - start a periodic timer task in virtio core on remove
> - in the timer, probe that the device is still present.
> if not, invoke a driver callback
> - cancel the task on device reset
>
Multiple users request to restore the stability before adding the surprise removal support in reliable way.
Lets be practical.
Each net, blk, fs, console, crypto has different way of flushing the pending requests.
It is unlikely to find a single stable kernel where all the virtio devices would have support for above.
So until above grant work is added to new kernel, user deserves to have stability restored in stable kernels.
Hence this request to revert multiple times from the users.
> If you do not have the time, let me know and I will try to look into it.
>
> > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio
> > pci device")
> > Cc: stable@vger.kernel.org
> > Reported-by: lirongqing@baidu.com
> > Closes:
> > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474
> > 1@baidu.com/
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > ---
> > drivers/virtio/virtio_pci_common.c | 7 -------
> > 1 file changed, 7 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c
> > b/drivers/virtio/virtio_pci_common.c
> > index d6d79af44569..dba5eb2eaff9 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev
> *pci_dev)
> > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > struct device *dev = get_device(&vp_dev->vdev.dev);
> >
> > - /*
> > - * Device is marked broken on surprise removal so that virtio upper
> > - * layers can abort any ongoing operation.
> > - */
> > - if (!pci_device_is_present(pci_dev))
> > - virtio_break_device(&vp_dev->vdev);
> > -
> > pci_disable_sriov(pci_dev);
> >
> > unregister_virtio_device(&vp_dev->vdev);
> > --
> > 2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 10:27 Li,Rongqing
@ 2025-08-22 12:24 ` Parav Pandit
2025-08-22 13:04 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-22 12:24 UTC (permalink / raw)
To: NBU-Contact-Li Rongqing (EXTERNAL), mst@redhat.com,
virtualization@lists.linux.dev, jasowang@redhat.com
Cc: stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
> From: Li,Rongqing <lirongqing@baidu.com>
> Sent: 22 August 2025 03:57 PM
>
> > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise
> > removal of virtio pci device").
> >
> > Virtio drivers and PCI devices have never fully supported true
> > surprise (aka hot
> > unplug) removal. Drivers historically continued processing and waiting
> > for pending I/O and even continued synchronous device reset during
> > surprise removal. Devices have also continued completing I/Os, doing
> > DMA and allowing device reset after surprise removal to support such drivers.
> >
> > Supporting it correctly would require a new device capability and
> > driver negotiation in the virtio specification to safely stop I/O and free queue
> memory.
> > Failure to do so either breaks all the existing drivers with call
> > trace listed in the commit or crashes the host on continuing the DMA.
> > Hence, until such specification and devices are invented, restore the
> > previous behavior of treating surprise removal as graceful removal to
> > avoid regressions and maintain system stability same as before the
> > commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> device").
> >
> > As explained above, previous analysis of solving this only in driver
> > was incomplete and non-reliable at [1] and at [2]; Hence reverting
> > commit
> > 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> > device") is still the best stand to restore failures of virtio net and block
> devices.
> >
> > [1]
> >
> https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C6
> > 38 DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> > [2]
> > https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nv
> > idia.c
> > om/
> >
> > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio
> > pci device")
> > Cc: stable@vger.kernel.org
> > Reported-by: lirongqing@baidu.com
> > Closes:
> > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474
> > 1@b
> > aidu.com/
> > Signed-off-by: Parav Pandit <parav@nvidia.com>
>
>
>
> Tested-by: Li RongQing <lirongqing@baidu.com>
>
> Thanks
>
> -Li
>
Multiple users are blocked to have this fix in stable kernel.
Thanks a lot Li for the quick test.
>
> > ---
> > drivers/virtio/virtio_pci_common.c | 7 -------
> > 1 file changed, 7 deletions(-)
> >
> > diff --git a/drivers/virtio/virtio_pci_common.c
> > b/drivers/virtio/virtio_pci_common.c
> > index d6d79af44569..dba5eb2eaff9 100644
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -747,13 +747,6 @@ static void virtio_pci_remove(struct pci_dev
> *pci_dev)
> > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > struct device *dev = get_device(&vp_dev->vdev.dev);
> >
> > - /*
> > - * Device is marked broken on surprise removal so that virtio upper
> > - * layers can abort any ongoing operation.
> > - */
> > - if (!pci_device_is_present(pci_dev))
> > - virtio_break_device(&vp_dev->vdev);
> > -
> > pci_disable_sriov(pci_dev);
> >
> > unregister_virtio_device(&vp_dev->vdev);
> > --
> > 2.26.2
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 12:22 ` Parav Pandit
@ 2025-08-22 13:03 ` Michael S. Tsirkin
2025-08-22 13:49 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-22 13:03 UTC (permalink / raw)
To: Parav Pandit
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
On Fri, Aug 22, 2025 at 12:22:50PM +0000, Parav Pandit wrote:
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: 22 August 2025 03:52 PM
> >
> > On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > virtio pci device").
> > >
> > > Virtio drivers and PCI devices have never fully supported true
> > > surprise (aka hot unplug) removal. Drivers historically continued
> > > processing and waiting for pending I/O and even continued synchronous
> > > device reset during surprise removal. Devices have also continued
> > > completing I/Os, doing DMA and allowing device reset after surprise
> > > removal to support such drivers.
> > >
> > > Supporting it correctly would require a new device capability
> >
> > If a device is removed, it is removed.
> This is how it was implemented and none of the virtio drivers supported it.
> So vendors had stepped away from such device implementation.
> (not just us).
If the slot does not have a mechanical interlock, I can pull the device
out. It's not up to a device implementation.
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 12:24 ` Parav Pandit
@ 2025-08-22 13:04 ` Michael S. Tsirkin
2025-08-22 13:53 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-22 13:04 UTC (permalink / raw)
To: Parav Pandit
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
>
> > From: Li,Rongqing <lirongqing@baidu.com>
> > Sent: 22 August 2025 03:57 PM
> >
> > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > removal of virtio pci device").
> > >
> > > Virtio drivers and PCI devices have never fully supported true
> > > surprise (aka hot
> > > unplug) removal. Drivers historically continued processing and waiting
> > > for pending I/O and even continued synchronous device reset during
> > > surprise removal. Devices have also continued completing I/Os, doing
> > > DMA and allowing device reset after surprise removal to support such drivers.
> > >
> > > Supporting it correctly would require a new device capability and
> > > driver negotiation in the virtio specification to safely stop I/O and free queue
> > memory.
> > > Failure to do so either breaks all the existing drivers with call
> > > trace listed in the commit or crashes the host on continuing the DMA.
> > > Hence, until such specification and devices are invented, restore the
> > > previous behavior of treating surprise removal as graceful removal to
> > > avoid regressions and maintain system stability same as before the
> > > commit 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> > device").
> > >
> > > As explained above, previous analysis of solving this only in driver
> > > was incomplete and non-reliable at [1] and at [2]; Hence reverting
> > > commit
> > > 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> > > device") is still the best stand to restore failures of virtio net and block
> > devices.
> > >
> > > [1]
> > >
> > https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6C6
> > > 38 DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> > > [2]
> > > https://lore.kernel.org/virtualization/20250602024358.57114-1-parav@nv
> > > idia.c
> > > om/
> > >
> > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio
> > > pci device")
> > > Cc: stable@vger.kernel.org
> > > Reported-by: lirongqing@baidu.com
> > > Closes:
> > > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b474
> > > 1@b
> > > aidu.com/
> > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> >
> >
> >
> > Tested-by: Li RongQing <lirongqing@baidu.com>
> >
> > Thanks
> >
> > -Li
> >
> Multiple users are blocked to have this fix in stable kernel.
what are these users doing that is blocked by this fix?
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 13:03 ` Michael S. Tsirkin
@ 2025-08-22 13:49 ` Parav Pandit
2025-08-22 13:59 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-22 13:49 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 22 August 2025 06:34 PM
>
> On Fri, Aug 22, 2025 at 12:22:50PM +0000, Parav Pandit wrote:
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: 22 August 2025 03:52 PM
> > >
> > > On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > > removal of
> > > virtio pci device").
> > > >
> > > > Virtio drivers and PCI devices have never fully supported true
> > > > surprise (aka hot unplug) removal. Drivers historically continued
> > > > processing and waiting for pending I/O and even continued
> > > > synchronous device reset during surprise removal. Devices have
> > > > also continued completing I/Os, doing DMA and allowing device
> > > > reset after surprise removal to support such drivers.
> > > >
> > > > Supporting it correctly would require a new device capability
> > >
> > > If a device is removed, it is removed.
> > This is how it was implemented and none of the virtio drivers supported it.
> > So vendors had stepped away from such device implementation.
> > (not just us).
>
>
> If the slot does not have a mechanical interlock, I can pull the device out. It's
> not up to a device implementation.
Sure yes, stack is not there yet to support it.
Each of the virtio device drivers are not there yet.
Lets build that infra, let device indicate it and it will be smooth ride for driver and device.
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 13:04 ` Michael S. Tsirkin
@ 2025-08-22 13:53 ` Parav Pandit
2025-08-22 14:02 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-22 13:53 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 22 August 2025 06:35 PM
>
> On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
> >
> > > From: Li,Rongqing <lirongqing@baidu.com>
> > > Sent: 22 August 2025 03:57 PM
> > >
> > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > > removal of virtio pci device").
> > > >
> > > > Virtio drivers and PCI devices have never fully supported true
> > > > surprise (aka hot
> > > > unplug) removal. Drivers historically continued processing and
> > > > waiting for pending I/O and even continued synchronous device
> > > > reset during surprise removal. Devices have also continued
> > > > completing I/Os, doing DMA and allowing device reset after surprise
> removal to support such drivers.
> > > >
> > > > Supporting it correctly would require a new device capability and
> > > > driver negotiation in the virtio specification to safely stop I/O
> > > > and free queue
> > > memory.
> > > > Failure to do so either breaks all the existing drivers with call
> > > > trace listed in the commit or crashes the host on continuing the DMA.
> > > > Hence, until such specification and devices are invented, restore
> > > > the previous behavior of treating surprise removal as graceful
> > > > removal to avoid regressions and maintain system stability same as
> > > > before the commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > > removal of virtio pci
> > > device").
> > > >
> > > > As explained above, previous analysis of solving this only in
> > > > driver was incomplete and non-reliable at [1] and at [2]; Hence
> > > > reverting commit
> > > > 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> > > > device") is still the best stand to restore failures of virtio net
> > > > and block
> > > devices.
> > > >
> > > > [1]
> > > >
> > > https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6
> > > C6
> > > > 38 DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> > > > [2]
> > > > https://lore.kernel.org/virtualization/20250602024358.57114-1-para
> > > > v@nv
> > > > idia.c
> > > > om/
> > > >
> > > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > > > virtio pci device")
> > > > Cc: stable@vger.kernel.org
> > > > Reported-by: lirongqing@baidu.com
> > > > Closes:
> > > > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9
> > > > b474
> > > > 1@b
> > > > aidu.com/
> > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > >
> > >
> > >
> > > Tested-by: Li RongQing <lirongqing@baidu.com>
> > >
> > > Thanks
> > >
> > > -Li
> > >
> > Multiple users are blocked to have this fix in stable kernel.
>
> what are these users doing that is blocked by this fix?
>
Not sure I understand the question. Let me try to answer.
They are unable to dynamically add/remove the virtio net, block, fs devices in their systems.
Users have their networking applications running over NS network and database and file system through these devices.
Some of them keep reverting the patch. Some are unable to.
They are in search of stable kernel.
Did I understand your question?
> --
> MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 13:49 ` Parav Pandit
@ 2025-08-22 13:59 ` Michael S. Tsirkin
2025-08-24 2:36 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-22 13:59 UTC (permalink / raw)
To: Parav Pandit
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
On Fri, Aug 22, 2025 at 01:49:36PM +0000, Parav Pandit wrote:
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: 22 August 2025 06:34 PM
> >
> > On Fri, Aug 22, 2025 at 12:22:50PM +0000, Parav Pandit wrote:
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: 22 August 2025 03:52 PM
> > > >
> > > > On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > > > removal of
> > > > virtio pci device").
> > > > >
> > > > > Virtio drivers and PCI devices have never fully supported true
> > > > > surprise (aka hot unplug) removal. Drivers historically continued
> > > > > processing and waiting for pending I/O and even continued
> > > > > synchronous device reset during surprise removal. Devices have
> > > > > also continued completing I/Os, doing DMA and allowing device
> > > > > reset after surprise removal to support such drivers.
> > > > >
> > > > > Supporting it correctly would require a new device capability
> > > >
> > > > If a device is removed, it is removed.
> > > This is how it was implemented and none of the virtio drivers supported it.
> > > So vendors had stepped away from such device implementation.
> > > (not just us).
> >
> >
> > If the slot does not have a mechanical interlock, I can pull the device out. It's
> > not up to a device implementation.
>
> Sure yes, stack is not there yet to support it.
> Each of the virtio device drivers are not there yet.
> Lets build that infra, let device indicate it and it will be smooth ride for driver and device.
There is simply no way for the device to "support" for surprise removal,
or lack such support thereof. The support is up to the slot, not the
device. Any pci compliant device can be placed in a slot that allows
surprise removal and that is all. The user can then remove the device.
Software can then either recover gracefully - it should - or hang or
crash - it does sometimes, now. The patch you are trying to revert
is an attempt to move some use-cases from the 1st to the 2nd category.
But what is going on now, as far as I could tell, is that someone developed
a surprise removal emulation that does not actually remove the device,
and is using that for testing the code in linux that
supports surprise removal. That weird emulation
seems to lead to all kind of weird issues. You answer is to remove the
existing code and tell your testing team "we do not support surprise removal".
But just go ahead and tell this to them straight away. You do not need
this patch for this.
Or better still, let's fix the issues please.
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 13:53 ` Parav Pandit
@ 2025-08-22 14:02 ` Michael S. Tsirkin
2025-08-24 2:36 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-22 14:02 UTC (permalink / raw)
To: Parav Pandit
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
On Fri, Aug 22, 2025 at 01:53:02PM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: 22 August 2025 06:35 PM
> >
> > On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
> > >
> > > > From: Li,Rongqing <lirongqing@baidu.com>
> > > > Sent: 22 August 2025 03:57 PM
> > > >
> > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > > > removal of virtio pci device").
> > > > >
> > > > > Virtio drivers and PCI devices have never fully supported true
> > > > > surprise (aka hot
> > > > > unplug) removal. Drivers historically continued processing and
> > > > > waiting for pending I/O and even continued synchronous device
> > > > > reset during surprise removal. Devices have also continued
> > > > > completing I/Os, doing DMA and allowing device reset after surprise
> > removal to support such drivers.
> > > > >
> > > > > Supporting it correctly would require a new device capability and
> > > > > driver negotiation in the virtio specification to safely stop I/O
> > > > > and free queue
> > > > memory.
> > > > > Failure to do so either breaks all the existing drivers with call
> > > > > trace listed in the commit or crashes the host on continuing the DMA.
> > > > > Hence, until such specification and devices are invented, restore
> > > > > the previous behavior of treating surprise removal as graceful
> > > > > removal to avoid regressions and maintain system stability same as
> > > > > before the commit 43bb40c5b926 ("virtio_pci: Support surprise
> > > > > removal of virtio pci
> > > > device").
> > > > >
> > > > > As explained above, previous analysis of solving this only in
> > > > > driver was incomplete and non-reliable at [1] and at [2]; Hence
> > > > > reverting commit
> > > > > 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci
> > > > > device") is still the best stand to restore failures of virtio net
> > > > > and block
> > > > devices.
> > > > >
> > > > > [1]
> > > > >
> > > > https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB100BC6
> > > > C6
> > > > > 38 DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> > > > > [2]
> > > > > https://lore.kernel.org/virtualization/20250602024358.57114-1-para
> > > > > v@nv
> > > > > idia.c
> > > > > om/
> > > > >
> > > > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > > > > virtio pci device")
> > > > > Cc: stable@vger.kernel.org
> > > > > Reported-by: lirongqing@baidu.com
> > > > > Closes:
> > > > > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9
> > > > > b474
> > > > > 1@b
> > > > > aidu.com/
> > > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > >
> > > >
> > > >
> > > > Tested-by: Li RongQing <lirongqing@baidu.com>
> > > >
> > > > Thanks
> > > >
> > > > -Li
> > > >
> > > Multiple users are blocked to have this fix in stable kernel.
> >
> > what are these users doing that is blocked by this fix?
> >
> Not sure I understand the question. Let me try to answer.
> They are unable to dynamically add/remove the virtio net, block, fs devices in their systems.
> Users have their networking applications running over NS network and database and file system through these devices.
> Some of them keep reverting the patch. Some are unable to.
> They are in search of stable kernel.
>
> Did I understand your question?
>
Not really, sorry.
Does the system or does it not have a mechanical interlock?
If it does, how does a user run into surprise removal issues without
the ability to remove the device?
If it does not, and a user pull out the working device, how does your
patch help?
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 14:02 ` Michael S. Tsirkin
@ 2025-08-24 2:36 ` Parav Pandit
2025-08-24 14:29 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-24 2:36 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 22 August 2025 07:32 PM
>
> On Fri, Aug 22, 2025 at 01:53:02PM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: 22 August 2025 06:35 PM
> > >
> > > On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Li,Rongqing <lirongqing@baidu.com>
> > > > > Sent: 22 August 2025 03:57 PM
> > > > >
> > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support
> > > > > > surprise removal of virtio pci device").
> > > > > >
> > > > > > Virtio drivers and PCI devices have never fully supported true
> > > > > > surprise (aka hot
> > > > > > unplug) removal. Drivers historically continued processing and
> > > > > > waiting for pending I/O and even continued synchronous device
> > > > > > reset during surprise removal. Devices have also continued
> > > > > > completing I/Os, doing DMA and allowing device reset after
> > > > > > surprise
> > > removal to support such drivers.
> > > > > >
> > > > > > Supporting it correctly would require a new device capability
> > > > > > and driver negotiation in the virtio specification to safely
> > > > > > stop I/O and free queue
> > > > > memory.
> > > > > > Failure to do so either breaks all the existing drivers with
> > > > > > call trace listed in the commit or crashes the host on continuing the
> DMA.
> > > > > > Hence, until such specification and devices are invented,
> > > > > > restore the previous behavior of treating surprise removal as
> > > > > > graceful removal to avoid regressions and maintain system
> > > > > > stability same as before the commit 43bb40c5b926 ("virtio_pci:
> > > > > > Support surprise removal of virtio pci
> > > > > device").
> > > > > >
> > > > > > As explained above, previous analysis of solving this only in
> > > > > > driver was incomplete and non-reliable at [1] and at [2];
> > > > > > Hence reverting commit
> > > > > > 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio
> > > > > > pci
> > > > > > device") is still the best stand to restore failures of virtio
> > > > > > net and block
> > > > > devices.
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB10
> > > > > 0BC6
> > > > > C6
> > > > > > 38 DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> > > > > > [2]
> > > > > > https://lore.kernel.org/virtualization/20250602024358.57114-1-
> > > > > > para
> > > > > > v@nv
> > > > > > idia.c
> > > > > > om/
> > > > > >
> > > > > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > > > > > virtio pci device")
> > > > > > Cc: stable@vger.kernel.org
> > > > > > Reported-by: lirongqing@baidu.com
> > > > > > Closes:
> > > > > > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb7
> > > > > > 3ca9
> > > > > > b474
> > > > > > 1@b
> > > > > > aidu.com/
> > > > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > > >
> > > > >
> > > > >
> > > > > Tested-by: Li RongQing <lirongqing@baidu.com>
> > > > >
> > > > > Thanks
> > > > >
> > > > > -Li
> > > > >
> > > > Multiple users are blocked to have this fix in stable kernel.
> > >
> > > what are these users doing that is blocked by this fix?
> > >
> > Not sure I understand the question. Let me try to answer.
> > They are unable to dynamically add/remove the virtio net, block, fs devices in
> their systems.
> > Users have their networking applications running over NS network and
> database and file system through these devices.
> > Some of them keep reverting the patch. Some are unable to.
> > They are in search of stable kernel.
> >
> > Did I understand your question?
> >
>
> Not really, sorry.
>
> Does the system or does it not have a mechanical interlock?
>
It is modern system beyond mechanical interlock but has the ability for surprise removal.
> If it does, how does a user run into surprise removal issues without the ability
> to remove the device?
>
User has the ability to surprise removal a device from the slot via the slot's pci registers.
Yet the device is capable enough to fulfil the needs of broken drivers which are waiting for the pending requests to arrive.
> If it does not, and a user pull out the working device, how does your patch
> help?
>
A driver must tell that it will not follow broken ancient behaviour and at that point device would stop its ancient backward compatibility mode.
> --
> MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-22 13:59 ` Michael S. Tsirkin
@ 2025-08-24 2:36 ` Parav Pandit
2025-08-24 14:33 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-24 2:36 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 22 August 2025 07:30 PM
>
> On Fri, Aug 22, 2025 at 01:49:36PM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: 22 August 2025 06:34 PM
> > >
> > > On Fri, Aug 22, 2025 at 12:22:50PM +0000, Parav Pandit wrote:
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: 22 August 2025 03:52 PM
> > > > >
> > > > > On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support
> > > > > > surprise removal of
> > > > > virtio pci device").
> > > > > >
> > > > > > Virtio drivers and PCI devices have never fully supported true
> > > > > > surprise (aka hot unplug) removal. Drivers historically
> > > > > > continued processing and waiting for pending I/O and even
> > > > > > continued synchronous device reset during surprise removal.
> > > > > > Devices have also continued completing I/Os, doing DMA and
> > > > > > allowing device reset after surprise removal to support such drivers.
> > > > > >
> > > > > > Supporting it correctly would require a new device capability
> > > > >
> > > > > If a device is removed, it is removed.
> > > > This is how it was implemented and none of the virtio drivers supported it.
> > > > So vendors had stepped away from such device implementation.
> > > > (not just us).
> > >
> > >
> > > If the slot does not have a mechanical interlock, I can pull the
> > > device out. It's not up to a device implementation.
> >
> > Sure yes, stack is not there yet to support it.
> > Each of the virtio device drivers are not there yet.
> > Lets build that infra, let device indicate it and it will be smooth ride for driver
> and device.
>
> There is simply no way for the device to "support" for surprise removal, or lack
> such support thereof.
> The support is up to the slot, not the device. Any pci
> compliant device can be placed in a slot that allows surprise removal and that is
> all. The user can then remove the device.
> Software can then either recover gracefully - it should - or hang or crash - it
> does sometimes, now. The patch you are trying to revert is an attempt to move
> some use-cases from the 1st to the 2nd category.
>
It is the driver (and not the device) who needs to tell the device that it will do sane cleanup and not wait infinitely.
> But what is going on now, as far as I could tell, is that someone developed a
> surprise removal emulation that does not actually remove the device, and is
> using that for testing the code in linux that supports surprise removal.
Nop. Your analysis is incorrect.
And I explained you that already.
The device implementation supports correct implementation where device stops all the dma and also does not support register access.
And no single virtio driver supported that.
On a surprised removed device, driver expects I/Os to complete and this is beyond a 'bug fix' watermark.
> That
> weird emulation seems to lead to all kind of weird issues. You answer is to
> remove the existing code and tell your testing team "we do not support
> surprise removal".
>
He he, it is no the device, it is the driver that does not support surprise removal as you can see in your proposed patches and other sw changes.
> But just go ahead and tell this to them straight away. You do not need this patch
> for this.
>
It is needed until infrastructure in multiple subsystem is built.
>
> Or better still, let's fix the issues please.
>
The implementation is more than a fix category for stable kernels.
Hence, what is asked is to do proper implementation for future kernels and until that point restore the bad use experience.
>
> --
> MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-24 2:36 ` Parav Pandit
@ 2025-08-24 14:29 ` Michael S. Tsirkin
2025-08-26 18:52 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-24 14:29 UTC (permalink / raw)
To: Parav Pandit
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
On Sun, Aug 24, 2025 at 02:36:11AM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: 22 August 2025 07:32 PM
> >
> > On Fri, Aug 22, 2025 at 01:53:02PM +0000, Parav Pandit wrote:
> > >
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: 22 August 2025 06:35 PM
> > > >
> > > > On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
> > > > >
> > > > > > From: Li,Rongqing <lirongqing@baidu.com>
> > > > > > Sent: 22 August 2025 03:57 PM
> > > > > >
> > > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support
> > > > > > > surprise removal of virtio pci device").
> > > > > > >
> > > > > > > Virtio drivers and PCI devices have never fully supported true
> > > > > > > surprise (aka hot
> > > > > > > unplug) removal. Drivers historically continued processing and
> > > > > > > waiting for pending I/O and even continued synchronous device
> > > > > > > reset during surprise removal. Devices have also continued
> > > > > > > completing I/Os, doing DMA and allowing device reset after
> > > > > > > surprise
> > > > removal to support such drivers.
> > > > > > >
> > > > > > > Supporting it correctly would require a new device capability
> > > > > > > and driver negotiation in the virtio specification to safely
> > > > > > > stop I/O and free queue
> > > > > > memory.
> > > > > > > Failure to do so either breaks all the existing drivers with
> > > > > > > call trace listed in the commit or crashes the host on continuing the
> > DMA.
> > > > > > > Hence, until such specification and devices are invented,
> > > > > > > restore the previous behavior of treating surprise removal as
> > > > > > > graceful removal to avoid regressions and maintain system
> > > > > > > stability same as before the commit 43bb40c5b926 ("virtio_pci:
> > > > > > > Support surprise removal of virtio pci
> > > > > > device").
> > > > > > >
> > > > > > > As explained above, previous analysis of solving this only in
> > > > > > > driver was incomplete and non-reliable at [1] and at [2];
> > > > > > > Hence reverting commit
> > > > > > > 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio
> > > > > > > pci
> > > > > > > device") is still the best stand to restore failures of virtio
> > > > > > > net and block
> > > > > > devices.
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > > https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613EB10
> > > > > > 0BC6
> > > > > > C6
> > > > > > > 38 DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> > > > > > > [2]
> > > > > > > https://lore.kernel.org/virtualization/20250602024358.57114-1-
> > > > > > > para
> > > > > > > v@nv
> > > > > > > idia.c
> > > > > > > om/
> > > > > > >
> > > > > > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > > > > > > virtio pci device")
> > > > > > > Cc: stable@vger.kernel.org
> > > > > > > Reported-by: lirongqing@baidu.com
> > > > > > > Closes:
> > > > > > > https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb7
> > > > > > > 3ca9
> > > > > > > b474
> > > > > > > 1@b
> > > > > > > aidu.com/
> > > > > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > > > >
> > > > > >
> > > > > >
> > > > > > Tested-by: Li RongQing <lirongqing@baidu.com>
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > -Li
> > > > > >
> > > > > Multiple users are blocked to have this fix in stable kernel.
> > > >
> > > > what are these users doing that is blocked by this fix?
> > > >
> > > Not sure I understand the question. Let me try to answer.
> > > They are unable to dynamically add/remove the virtio net, block, fs devices in
> > their systems.
> > > Users have their networking applications running over NS network and
> > database and file system through these devices.
> > > Some of them keep reverting the patch. Some are unable to.
> > > They are in search of stable kernel.
> > >
> > > Did I understand your question?
> > >
> >
> > Not really, sorry.
> >
> > Does the system or does it not have a mechanical interlock?
> >
> It is modern system beyond mechanical interlock but has the ability for surprise removal.
I am not sure what does "beyond" mean. I guess that it does not have it?
> > If it does, how does a user run into surprise removal issues without the ability
> > to remove the device?
> >
> User has the ability to surprise removal a device from the slot via the slot's pci registers.
I don't know what this means. Surprise removal is done by removing the
device. Not via pci registers.
> Yet the device is capable enough to fulfil the needs of broken drivers which are waiting for the pending requests to arrive.
I don't know what this means. A removed device can not do anything at
all.
> > If it does not, and a user pull out the working device, how does your patch
> > help?
> >
> A driver must tell that it will not follow broken ancient behaviour and at that point device would stop its ancient backward compatibility mode.
I don't know what is "ancient backward compatibility mode".
> > --
> > MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-24 2:36 ` Parav Pandit
@ 2025-08-24 14:33 ` Michael S. Tsirkin
2025-08-26 18:52 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-24 14:33 UTC (permalink / raw)
To: Parav Pandit
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
On Sun, Aug 24, 2025 at 02:36:23AM +0000, Parav Pandit wrote:
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: 22 August 2025 07:30 PM
> >
> > On Fri, Aug 22, 2025 at 01:49:36PM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: 22 August 2025 06:34 PM
> > > >
> > > > On Fri, Aug 22, 2025 at 12:22:50PM +0000, Parav Pandit wrote:
> > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > Sent: 22 August 2025 03:52 PM
> > > > > >
> > > > > > On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> > > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support
> > > > > > > surprise removal of
> > > > > > virtio pci device").
> > > > > > >
> > > > > > > Virtio drivers and PCI devices have never fully supported true
> > > > > > > surprise (aka hot unplug) removal. Drivers historically
> > > > > > > continued processing and waiting for pending I/O and even
> > > > > > > continued synchronous device reset during surprise removal.
> > > > > > > Devices have also continued completing I/Os, doing DMA and
> > > > > > > allowing device reset after surprise removal to support such drivers.
> > > > > > >
> > > > > > > Supporting it correctly would require a new device capability
> > > > > >
> > > > > > If a device is removed, it is removed.
> > > > > This is how it was implemented and none of the virtio drivers supported it.
> > > > > So vendors had stepped away from such device implementation.
> > > > > (not just us).
> > > >
> > > >
> > > > If the slot does not have a mechanical interlock, I can pull the
> > > > device out. It's not up to a device implementation.
> > >
> > > Sure yes, stack is not there yet to support it.
> > > Each of the virtio device drivers are not there yet.
> > > Lets build that infra, let device indicate it and it will be smooth ride for driver
> > and device.
> >
> > There is simply no way for the device to "support" for surprise removal, or lack
> > such support thereof.
> > The support is up to the slot, not the device. Any pci
> > compliant device can be placed in a slot that allows surprise removal and that is
> > all. The user can then remove the device.
> > Software can then either recover gracefully - it should - or hang or crash - it
> > does sometimes, now. The patch you are trying to revert is an attempt to move
> > some use-cases from the 1st to the 2nd category.
> >
> It is the driver (and not the device) who needs to tell the device that it will do sane cleanup and not wait infinitely.
You can invent a way for driver to tell the device that it is not
broken. But even if the driver does not do it, nothing at all
prevents users from removing the device.
> > But what is going on now, as far as I could tell, is that someone developed a
> > surprise removal emulation that does not actually remove the device, and is
> > using that for testing the code in linux that supports surprise removal.
> Nop. Your analysis is incorrect.
> And I explained you that already.
> The device implementation supports correct implementation where device stops all the dma and also does not support register access.
> And no single virtio driver supported that.
>
> On a surprised removed device, driver expects I/Os to complete and this is beyond a 'bug fix' watermark.
>
> > That
> > weird emulation seems to lead to all kind of weird issues. You answer is to
> > remove the existing code and tell your testing team "we do not support
> > surprise removal".
> >
> He he, it is no the device, it is the driver that does not support surprise removal as you can see in your proposed patches and other sw changes.
Then fix the driver. Or don't, for that matter, if you lack the time.
> > But just go ahead and tell this to them straight away. You do not need this patch
> > for this.
> >
> It is needed until infrastructure in multiple subsystem is built.
What I do not understand, is what good does the revert do. Sorry.
> >
> > Or better still, let's fix the issues please.
> >
> The implementation is more than a fix category for stable kernels.
> Hence, what is asked is to do proper implementation for future kernels and until that point restore the bad use experience.
I am not at all interested in discussing ease of backporting fixes
before they are developed.
Not how we do kernel development, sorry.
> >
> > --
> > MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-24 14:33 ` Michael S. Tsirkin
@ 2025-08-26 18:52 ` Parav Pandit
2025-08-27 10:19 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-26 18:52 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 24 August 2025 08:03 PM
> To: Parav Pandit <parav@nvidia.com>
> Cc: virtualization@lists.linux.dev; jasowang@redhat.com;
> stefanha@redhat.com; pbonzini@redhat.com; xuanzhuo@linux.alibaba.com;
> stable@vger.kernel.org; Max Gurtovoy <mgurtovoy@nvidia.com>; NBU-
> Contact-Li Rongqing (EXTERNAL) <lirongqing@baidu.com>
> Subject: Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci
> device"
>
> On Sun, Aug 24, 2025 at 02:36:23AM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: 22 August 2025 07:30 PM
> > >
> > > On Fri, Aug 22, 2025 at 01:49:36PM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: 22 August 2025 06:34 PM
> > > > >
> > > > > On Fri, Aug 22, 2025 at 12:22:50PM +0000, Parav Pandit wrote:
> > > > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > > > Sent: 22 August 2025 03:52 PM
> > > > > > >
> > > > > > > On Fri, Aug 22, 2025 at 12:17:06PM +0300, Parav Pandit wrote:
> > > > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support
> > > > > > > > surprise removal of
> > > > > > > virtio pci device").
> > > > > > > >
> > > > > > > > Virtio drivers and PCI devices have never fully supported
> > > > > > > > true surprise (aka hot unplug) removal. Drivers
> > > > > > > > historically continued processing and waiting for pending
> > > > > > > > I/O and even continued synchronous device reset during surprise
> removal.
> > > > > > > > Devices have also continued completing I/Os, doing DMA and
> > > > > > > > allowing device reset after surprise removal to support such
> drivers.
> > > > > > > >
> > > > > > > > Supporting it correctly would require a new device
> > > > > > > > capability
> > > > > > >
> > > > > > > If a device is removed, it is removed.
> > > > > > This is how it was implemented and none of the virtio drivers
> supported it.
> > > > > > So vendors had stepped away from such device implementation.
> > > > > > (not just us).
> > > > >
> > > > >
> > > > > If the slot does not have a mechanical interlock, I can pull the
> > > > > device out. It's not up to a device implementation.
> > > >
> > > > Sure yes, stack is not there yet to support it.
> > > > Each of the virtio device drivers are not there yet.
> > > > Lets build that infra, let device indicate it and it will be
> > > > smooth ride for driver
> > > and device.
> > >
> > > There is simply no way for the device to "support" for surprise
> > > removal, or lack such support thereof.
> > > The support is up to the slot, not the device. Any pci compliant
> > > device can be placed in a slot that allows surprise removal and that
> > > is all. The user can then remove the device.
> > > Software can then either recover gracefully - it should - or hang or
> > > crash - it does sometimes, now. The patch you are trying to revert
> > > is an attempt to move some use-cases from the 1st to the 2nd category.
> > >
> > It is the driver (and not the device) who needs to tell the device that it will
> do sane cleanup and not wait infinitely.
>
> You can invent a way for driver to tell the device that it is not broken. But even
> if the driver does not do it, nothing at all prevents users from removing the
> device.
Sure. Vendors will implement the device accordingly based on the driver's config.
If driver tells that it is ready for surprise removal, the device will implement functionality accordingly.
And user can do anything it wants.
But user will have indication and knowledge from the device, when not to break the system.
>
>
> > > But what is going on now, as far as I could tell, is that someone
> > > developed a surprise removal emulation that does not actually remove
> > > the device, and is using that for testing the code in linux that supports
> surprise removal.
> > Nop. Your analysis is incorrect.
> > And I explained you that already.
> > The device implementation supports correct implementation where device
> stops all the dma and also does not support register access.
> > And no single virtio driver supported that.
> >
> > On a surprised removed device, driver expects I/Os to complete and this is
> beyond a 'bug fix' watermark.
> >
> > > That
> > > weird emulation seems to lead to all kind of weird issues. You
> > > answer is to remove the existing code and tell your testing team "we
> > > do not support surprise removal".
> > >
> > He he, it is no the device, it is the driver that does not support surprise
> removal as you can see in your proposed patches and other sw changes.
>
> Then fix the driver. Or don't, for that matter, if you lack the time.
>
I explained, it is not a fix. It is a completely new implementation of the infrastructure that spans, virtio, pci or core subsystem and each individual device types.
> > > But just go ahead and tell this to them straight away. You do not
> > > need this patch for this.
> > >
> > It is needed until infrastructure in multiple subsystem is built.
>
> What I do not understand, is what good does the revert do. Sorry.
>
Let me explain.
It prevents the issue of vblk requests being stuck due to broken VQ.
It prevents the vnet driver start_xmit() to be not stuck on skb completions.
And virtio stack works exactly the way it was working before the commit.
> > >
> > > Or better still, let's fix the issues please.
> > >
> > The implementation is more than a fix category for stable kernels.
> > Hence, what is asked is to do proper implementation for future kernels and
> until that point restore the bad use experience.
>
>
>
> I am not at all interested in discussing ease of backporting fixes before they
> are developed.
Your attribution of "fixing a problem" for the missing new implementation that span across multiple different subsystem (virtio generic, block net) is not accurate.
I understand that such confusion arises from your previous email regarding physical removal of the device...
Hope I explained that its virtual system where user has not removed the card physically.
Driver is expecting requests to complete and attempting device reset too...
> Not how we do kernel development, sorry.
>
The ask is to revert a patch that broke the virtio stack.
No one is asking to backport new implementation and new spec to old kernels.
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-24 14:29 ` Michael S. Tsirkin
@ 2025-08-26 18:52 ` Parav Pandit
2025-08-27 10:21 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-26 18:52 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 24 August 2025 08:00 PM
>
> On Sun, Aug 24, 2025 at 02:36:11AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: 22 August 2025 07:32 PM
> > >
> > > On Fri, Aug 22, 2025 at 01:53:02PM +0000, Parav Pandit wrote:
> > > >
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: 22 August 2025 06:35 PM
> > > > >
> > > > > On Fri, Aug 22, 2025 at 12:24:06PM +0000, Parav Pandit wrote:
> > > > > >
> > > > > > > From: Li,Rongqing <lirongqing@baidu.com>
> > > > > > > Sent: 22 August 2025 03:57 PM
> > > > > > >
> > > > > > > > This reverts commit 43bb40c5b926 ("virtio_pci: Support
> > > > > > > > surprise removal of virtio pci device").
> > > > > > > >
> > > > > > > > Virtio drivers and PCI devices have never fully supported
> > > > > > > > true surprise (aka hot
> > > > > > > > unplug) removal. Drivers historically continued processing
> > > > > > > > and waiting for pending I/O and even continued synchronous
> > > > > > > > device reset during surprise removal. Devices have also
> > > > > > > > continued completing I/Os, doing DMA and allowing device
> > > > > > > > reset after surprise
> > > > > removal to support such drivers.
> > > > > > > >
> > > > > > > > Supporting it correctly would require a new device
> > > > > > > > capability and driver negotiation in the virtio
> > > > > > > > specification to safely stop I/O and free queue
> > > > > > > memory.
> > > > > > > > Failure to do so either breaks all the existing drivers
> > > > > > > > with call trace listed in the commit or crashes the host
> > > > > > > > on continuing the
> > > DMA.
> > > > > > > > Hence, until such specification and devices are invented,
> > > > > > > > restore the previous behavior of treating surprise removal
> > > > > > > > as graceful removal to avoid regressions and maintain
> > > > > > > > system stability same as before the commit 43bb40c5b926
> ("virtio_pci:
> > > > > > > > Support surprise removal of virtio pci
> > > > > > > device").
> > > > > > > >
> > > > > > > > As explained above, previous analysis of solving this only
> > > > > > > > in driver was incomplete and non-reliable at [1] and at
> > > > > > > > [2]; Hence reverting commit
> > > > > > > > 43bb40c5b926 ("virtio_pci: Support surprise removal of
> > > > > > > > virtio pci
> > > > > > > > device") is still the best stand to restore failures of
> > > > > > > > virtio net and block
> > > > > > > devices.
> > > > > > > >
> > > > > > > > [1]
> > > > > > > >
> > > > > > > https://lore.kernel.org/virtualization/CY8PR12MB719506CC5613
> > > > > > > EB10
> > > > > > > 0BC6
> > > > > > > C6
> > > > > > > > 38 DCBD2@CY8PR12MB7195.namprd12.prod.outlook.com/#t
> > > > > > > > [2]
> > > > > > > > https://lore.kernel.org/virtualization/20250602024358.5711
> > > > > > > > 4-1-
> > > > > > > > para
> > > > > > > > v@nv
> > > > > > > > idia.c
> > > > > > > > om/
> > > > > > > >
> > > > > > > > Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal
> > > > > > > > of virtio pci device")
> > > > > > > > Cc: stable@vger.kernel.org
> > > > > > > > Reported-by: lirongqing@baidu.com
> > > > > > > > Closes:
> > > > > > > > https://lore.kernel.org/virtualization/c45dd68698cd47238c5
> > > > > > > > 5fb7
> > > > > > > > 3ca9
> > > > > > > > b474
> > > > > > > > 1@b
> > > > > > > > aidu.com/
> > > > > > > > Signed-off-by: Parav Pandit <parav@nvidia.com>
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Tested-by: Li RongQing <lirongqing@baidu.com>
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > -Li
> > > > > > >
> > > > > > Multiple users are blocked to have this fix in stable kernel.
> > > > >
> > > > > what are these users doing that is blocked by this fix?
> > > > >
> > > > Not sure I understand the question. Let me try to answer.
> > > > They are unable to dynamically add/remove the virtio net, block,
> > > > fs devices in
> > > their systems.
> > > > Users have their networking applications running over NS network
> > > > and
> > > database and file system through these devices.
> > > > Some of them keep reverting the patch. Some are unable to.
> > > > They are in search of stable kernel.
> > > >
> > > > Did I understand your question?
> > > >
> > >
> > > Not really, sorry.
> > >
> > > Does the system or does it not have a mechanical interlock?
> > >
> > It is modern system beyond mechanical interlock but has the ability for
> surprise removal.
>
> I am not sure what does "beyond" mean. I guess that it does not have it?
>
Right.
> > > If it does, how does a user run into surprise removal issues without
> > > the ability to remove the device?
> > >
> > User has the ability to surprise removal a device from the slot via the slot's
> pci registers.
>
> I don't know what this means. Surprise removal is done by removing the
> device. Not via pci registers.
They are many ways to surprise remove a device from a slot.
Slots are capable to detect the device removal and when that occurs, the pcie switch/bridge updates the slot registers for the downstream port and indicate to the sw.
Some slots are physical in nature. Some are virtual slots in the pcie switch/bridge.
End result is, sw gets to know this sw PCIe registers in switch/bridge/port level.
>
> > Yet the device is capable enough to fulfil the needs of broken drivers which
> are waiting for the pending requests to arrive.
>
> I don't know what this means. A removed device can not do anything at all.
>
If this was really physically removed, yes.
but in virtual system, the device is still present on the slot until it gets indication from the OS.
> > > If it does not, and a user pull out the working device, how does
> > > your patch help?
> > >
> > A driver must tell that it will not follow broken ancient behaviour and at that
> point device would stop its ancient backward compatibility mode.
>
>
>
> I don't know what is "ancient backward compatibility mode".
>
Let me explain.
Sadly, CSPs virtio pci device implementation is done such a way that, it works with ancient Linux kernel which does not have commit 43bb40c5b9265.
Commit 43bb40c5b9265 was partial in nature.
Hence request to revert and do proper implementation like you mentioned using extra callback and/or with spec extension.
>
>
>
>
> > > --
> > > MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-26 18:52 ` Parav Pandit
@ 2025-08-27 10:19 ` Michael S. Tsirkin
2025-08-27 11:33 ` Cornelia Huck
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-27 10:19 UTC (permalink / raw)
To: Parav Pandit
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> > What I do not understand, is what good does the revert do. Sorry.
> >
> Let me explain.
> It prevents the issue of vblk requests being stuck due to broken VQ.
> It prevents the vnet driver start_xmit() to be not stuck on skb completions.
This is the part I don't get. In what scenario, before 43bb40c5b9265
start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
Once the device is gone, it is not using any buffers at all.
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-26 18:52 ` Parav Pandit
@ 2025-08-27 10:21 ` Michael S. Tsirkin
2025-08-27 10:49 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-27 10:21 UTC (permalink / raw)
To: Parav Pandit
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
On Tue, Aug 26, 2025 at 06:52:11PM +0000, Parav Pandit wrote:
> > > > If it does not, and a user pull out the working device, how does
> > > > your patch help?
> > > >
> > > A driver must tell that it will not follow broken ancient behaviour and at that
> > point device would stop its ancient backward compatibility mode.
> >
> >
> >
> > I don't know what is "ancient backward compatibility mode".
> >
> Let me explain.
> Sadly, CSPs virtio pci device implementation is done such a way that, it works with ancient Linux kernel which does not have commit 43bb40c5b9265.
OK we are getting new information here.
So let me summarize. There's a virtual system that pretends, to the
guest, that device was removed by surprise removal, but actually
device is there and is still doing DMA.
Is that a fair summary?
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-27 10:21 ` Michael S. Tsirkin
@ 2025-08-27 10:49 ` Michael S. Tsirkin
2025-08-28 6:23 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-27 10:49 UTC (permalink / raw)
To: Parav Pandit
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
On Wed, Aug 27, 2025 at 06:21:28AM -0400, Michael S. Tsirkin wrote:
> On Tue, Aug 26, 2025 at 06:52:11PM +0000, Parav Pandit wrote:
> > > > > If it does not, and a user pull out the working device, how does
> > > > > your patch help?
> > > > >
> > > > A driver must tell that it will not follow broken ancient behaviour and at that
> > > point device would stop its ancient backward compatibility mode.
> > >
> > >
> > >
> > > I don't know what is "ancient backward compatibility mode".
> > >
> > Let me explain.
> > Sadly, CSPs virtio pci device implementation is done such a way that, it works with ancient Linux kernel which does not have commit 43bb40c5b9265.
>
>
> OK we are getting new information here.
>
> So let me summarize. There's a virtual system that pretends, to the
> guest, that device was removed by surprise removal, but actually
> device is there and is still doing DMA.
> Is that a fair summary?
If that is the case, the thing to do would be to try and detect the fake
removal and then work with device as usual - device not doing DMA
after removal is pretty fundamental, after all.
For example, how about reading device control+status?
If we get all ones device has been removed
If we get 0 in bus master: device has been removed but re-inserted
Anything else is a fake removal
Hmm?
> --
> MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-27 10:19 ` Michael S. Tsirkin
@ 2025-08-27 11:33 ` Cornelia Huck
2025-08-28 6:24 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2025-08-27 11:33 UTC (permalink / raw)
To: Michael S. Tsirkin, Parav Pandit
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
>> > What I do not understand, is what good does the revert do. Sorry.
>> >
>> Let me explain.
>> It prevents the issue of vblk requests being stuck due to broken VQ.
>> It prevents the vnet driver start_xmit() to be not stuck on skb completions.
>
> This is the part I don't get. In what scenario, before 43bb40c5b9265
> start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
>
> Once the device is gone, it is not using any buffers at all.
What I also don't understand: virtio-ccw does exactly the same thing
(virtio_break_device(), added in 2014), and it supports surprise removal
_only_, yet I don't remember seeing bug reports?
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-27 10:49 ` Michael S. Tsirkin
@ 2025-08-28 6:23 ` Parav Pandit
2025-08-28 6:34 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-28 6:23 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 27 August 2025 04:19 PM
>
> On Wed, Aug 27, 2025 at 06:21:28AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Aug 26, 2025 at 06:52:11PM +0000, Parav Pandit wrote:
> > > > > > If it does not, and a user pull out the working device, how
> > > > > > does your patch help?
> > > > > >
> > > > > A driver must tell that it will not follow broken ancient
> > > > > behaviour and at that
> > > > point device would stop its ancient backward compatibility mode.
> > > >
> > > >
> > > >
> > > > I don't know what is "ancient backward compatibility mode".
> > > >
> > > Let me explain.
> > > Sadly, CSPs virtio pci device implementation is done such a way that, it
> works with ancient Linux kernel which does not have commit
> 43bb40c5b9265.
> >
> >
> > OK we are getting new information here.
> >
> > So let me summarize. There's a virtual system that pretends, to the
> > guest, that device was removed by surprise removal, but actually
> > device is there and is still doing DMA.
> > Is that a fair summary?
>
Yes.
> If that is the case, the thing to do would be to try and detect the fake removal
> and then work with device as usual - device not doing DMA after removal is
> pretty fundamental, after all.
>
The issue is: one can build the device to stop the DMA.
There is no predictable combination for the driver and device that can work for the user.
For example,
Device that stops the dma will not work before the commit 43bb40c5b9265.
Device that continues the dma will not work with whatever new implementation done in future kernels.
Hence the capability negotiation would be needed so that device can stop the DMA, config interrupts etc.
> For example, how about reading device control+status?
>
Most platforms read 0xffff on non-existing device, but not sure if this the standard or well defined.
> If we get all ones device has been removed If we get 0 in bus master: device
> has been removed but re-inserted Anything else is a fake removal
>
Bus master check may pass, right returning all 1s, even if the device is removed, isn't it?
> Hmm?
>
>
>
> > --
> > MST
>
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-27 11:33 ` Cornelia Huck
@ 2025-08-28 6:24 ` Parav Pandit
2025-08-28 12:16 ` Cornelia Huck
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-28 6:24 UTC (permalink / raw)
To: Cornelia Huck, Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL)
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: 27 August 2025 05:04 PM
>
> On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> >> > What I do not understand, is what good does the revert do. Sorry.
> >> >
> >> Let me explain.
> >> It prevents the issue of vblk requests being stuck due to broken VQ.
> >> It prevents the vnet driver start_xmit() to be not stuck on skb completions.
> >
> > This is the part I don't get. In what scenario, before 43bb40c5b9265
> > start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
> >
> > Once the device is gone, it is not using any buffers at all.
>
> What I also don't understand: virtio-ccw does exactly the same thing
> (virtio_break_device(), added in 2014), and it supports surprise removal
> _only_, yet I don't remember seeing bug reports?
I suspect that stress testing may not have happened for ccw with active vblk Ios and outstanding transmit pkt and cvq commands.
Hard to say as we don't have ccw hw or systems.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 6:23 ` Parav Pandit
@ 2025-08-28 6:34 ` Michael S. Tsirkin
2025-08-28 6:59 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-28 6:34 UTC (permalink / raw)
To: Parav Pandit
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
On Thu, Aug 28, 2025 at 06:23:02AM +0000, Parav Pandit wrote:
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: 27 August 2025 04:19 PM
> >
> > On Wed, Aug 27, 2025 at 06:21:28AM -0400, Michael S. Tsirkin wrote:
> > > On Tue, Aug 26, 2025 at 06:52:11PM +0000, Parav Pandit wrote:
> > > > > > > If it does not, and a user pull out the working device, how
> > > > > > > does your patch help?
> > > > > > >
> > > > > > A driver must tell that it will not follow broken ancient
> > > > > > behaviour and at that
> > > > > point device would stop its ancient backward compatibility mode.
> > > > >
> > > > >
> > > > >
> > > > > I don't know what is "ancient backward compatibility mode".
> > > > >
> > > > Let me explain.
> > > > Sadly, CSPs virtio pci device implementation is done such a way that, it
> > works with ancient Linux kernel which does not have commit
> > 43bb40c5b9265.
> > >
> > >
> > > OK we are getting new information here.
> > >
> > > So let me summarize. There's a virtual system that pretends, to the
> > > guest, that device was removed by surprise removal, but actually
> > > device is there and is still doing DMA.
> > > Is that a fair summary?
> >
> Yes.
>
> > If that is the case, the thing to do would be to try and detect the fake removal
> > and then work with device as usual - device not doing DMA after removal is
> > pretty fundamental, after all.
> >
> The issue is: one can build the device to stop the DMA.
> There is no predictable combination for the driver and device that can work for the user.
> For example,
> Device that stops the dma will not work before the commit 43bb40c5b9265.
> Device that continues the dma will not work with whatever new implementation done in future kernels.
>
> Hence the capability negotiation would be needed so that device can stop the DMA, config interrupts etc.
So this is a broken implementation at the pci level. We really can't fix
removal for this device at all, except by fixing the device. Whatever
works, works by chance. Feature negotiation in spec is not the way to
fix that, but some work arounds in the driver to skip the device are
acceptable, mostly to not bother with it.
Pls document exactly how this pci looks. Does it have an id we can use
to detect it?
> > For example, how about reading device control+status?
> >
> Most platforms read 0xffff on non-existing device, but not sure if this the standard or well defined.
IIRC it's in the pci spec as a note.
> > If we get all ones device has been removed If we get 0 in bus master: device
> > has been removed but re-inserted Anything else is a fake removal
> >
> Bus master check may pass, right returning all 1s, even if the device is removed, isn't it?
So we check all ones 1st, only check bus master if not all ones?
> > Hmm?
> >
> >
> >
> > > --
> > > MST
> >
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 6:34 ` Michael S. Tsirkin
@ 2025-08-28 6:59 ` Parav Pandit
2025-08-28 9:23 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-28 6:59 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 28 August 2025 12:04 PM
>
> On Thu, Aug 28, 2025 at 06:23:02AM +0000, Parav Pandit wrote:
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: 27 August 2025 04:19 PM
> > >
> > > On Wed, Aug 27, 2025 at 06:21:28AM -0400, Michael S. Tsirkin wrote:
> > > > On Tue, Aug 26, 2025 at 06:52:11PM +0000, Parav Pandit wrote:
> > > > > > > > If it does not, and a user pull out the working device,
> > > > > > > > how does your patch help?
> > > > > > > >
> > > > > > > A driver must tell that it will not follow broken ancient
> > > > > > > behaviour and at that
> > > > > > point device would stop its ancient backward compatibility mode.
> > > > > >
> > > > > >
> > > > > >
> > > > > > I don't know what is "ancient backward compatibility mode".
> > > > > >
> > > > > Let me explain.
> > > > > Sadly, CSPs virtio pci device implementation is done such a way
> > > > > that, it
> > > works with ancient Linux kernel which does not have commit
> > > 43bb40c5b9265.
> > > >
> > > >
> > > > OK we are getting new information here.
> > > >
> > > > So let me summarize. There's a virtual system that pretends, to
> > > > the guest, that device was removed by surprise removal, but
> > > > actually device is there and is still doing DMA.
> > > > Is that a fair summary?
> > >
> > Yes.
> >
> > > If that is the case, the thing to do would be to try and detect the
> > > fake removal and then work with device as usual - device not doing
> > > DMA after removal is pretty fundamental, after all.
> > >
> > The issue is: one can build the device to stop the DMA.
> > There is no predictable combination for the driver and device that can work
> for the user.
> > For example,
> > Device that stops the dma will not work before the commit 43bb40c5b9265.
> > Device that continues the dma will not work with whatever new
> implementation done in future kernels.
> >
> > Hence the capability negotiation would be needed so that device can stop the
> DMA, config interrupts etc.
>
> So this is a broken implementation at the pci level. We really can't fix removal
> for this device at all, except by fixing the device.
The device to be told how to behave with/without commit 43bb40c5b9265.
Not sure what you mean by 'fix the device'.
Users are running stable kernel that has commit 43bb40c5b9265 and its broken setup for them.
> Whatever works, works by
> chance. Feature negotiation in spec is not the way to fix that, but some work
> arounds in the driver to skip the device are acceptable, mostly to not bother
> with it.
>
Why not?
It sounds like we need feature bit like VERSION_1 or ORDER_PLATFORM.
To _fix_ a stable kernel, if you have a suggestion, please suggest.
> Pls document exactly how this pci looks. Does it have an id we can use to detect
> it?
>
CSPs have different device and vendor id for vnet, blk vfs.
Is that what you mean by id?
> > > For example, how about reading device control+status?
> > >
> > Most platforms read 0xffff on non-existing device, but not sure if this the
> standard or well defined.
>
> IIRC it's in the pci spec as a note.
>
Checking.
> > > If we get all ones device has been removed If we get 0 in bus
> > > master: device has been removed but re-inserted Anything else is a
> > > fake removal
> > >
> > Bus master check may pass, right returning all 1s, even if the device is
> removed, isn't it?
>
>
> So we check all ones 1st, only check bus master if not all ones?
>
Pci subsystem typically checks the vendor and device ids, and if its not all 1s, its safe enough check.
How about a fix something like this:
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -746,12 +746,16 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
{
struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
struct device *dev = get_device(&vp_dev->vdev.dev);
+ u32 v;
/*
* Device is marked broken on surprise removal so that virtio upper
* layers can abort any ongoing operation.
+ * Make sure that device is truly removed by directly interacting
+ * with the device (and not just depend on the slot registers).
*/
- if (!pci_device_is_present(pci_dev))
+ if (!pci_device_is_present(pci_dev) &&
+ !pci_bus_read_dev_vendor_id(pci_dev->bus, pci_dev->devfn, &v, 0))
virtio_break_device(&vp_dev->vdev);
So if the device is still there, it let it go through its usual cleanup flow.
And post this fix, a proper implementation with callback etc that you described can be implemented.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 6:59 ` Parav Pandit
@ 2025-08-28 9:23 ` Michael S. Tsirkin
2025-08-28 10:41 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-28 9:23 UTC (permalink / raw)
To: Parav Pandit
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
On Thu, Aug 28, 2025 at 06:59:26AM +0000, Parav Pandit wrote:
>
>
> > From: Michael S. Tsirkin <mst@redhat.com>
> > Sent: 28 August 2025 12:04 PM
> >
> > On Thu, Aug 28, 2025 at 06:23:02AM +0000, Parav Pandit wrote:
> > >
> > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > Sent: 27 August 2025 04:19 PM
> > > >
> > > > On Wed, Aug 27, 2025 at 06:21:28AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, Aug 26, 2025 at 06:52:11PM +0000, Parav Pandit wrote:
> > > > > > > > > If it does not, and a user pull out the working device,
> > > > > > > > > how does your patch help?
> > > > > > > > >
> > > > > > > > A driver must tell that it will not follow broken ancient
> > > > > > > > behaviour and at that
> > > > > > > point device would stop its ancient backward compatibility mode.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > I don't know what is "ancient backward compatibility mode".
> > > > > > >
> > > > > > Let me explain.
> > > > > > Sadly, CSPs virtio pci device implementation is done such a way
> > > > > > that, it
> > > > works with ancient Linux kernel which does not have commit
> > > > 43bb40c5b9265.
> > > > >
> > > > >
> > > > > OK we are getting new information here.
> > > > >
> > > > > So let me summarize. There's a virtual system that pretends, to
> > > > > the guest, that device was removed by surprise removal, but
> > > > > actually device is there and is still doing DMA.
> > > > > Is that a fair summary?
> > > >
> > > Yes.
> > >
> > > > If that is the case, the thing to do would be to try and detect the
> > > > fake removal and then work with device as usual - device not doing
> > > > DMA after removal is pretty fundamental, after all.
> > > >
> > > The issue is: one can build the device to stop the DMA.
> > > There is no predictable combination for the driver and device that can work
> > for the user.
> > > For example,
> > > Device that stops the dma will not work before the commit 43bb40c5b9265.
> > > Device that continues the dma will not work with whatever new
> > implementation done in future kernels.
> > >
> > > Hence the capability negotiation would be needed so that device can stop the
> > DMA, config interrupts etc.
> >
> > So this is a broken implementation at the pci level. We really can't fix removal
> > for this device at all, except by fixing the device.
> The device to be told how to behave with/without commit 43bb40c5b9265.
> Not sure what you mean by 'fix the device'.
>
> Users are running stable kernel that has commit 43bb40c5b9265 and its broken setup for them.
>
> > Whatever works, works by
> > chance. Feature negotiation in spec is not the way to fix that, but some work
> > arounds in the driver to skip the device are acceptable, mostly to not bother
> > with it.
> >
> Why not?
> It sounds like we need feature bit like VERSION_1 or ORDER_PLATFORM.
Because the device is out of spec (PCI spec which virtio references).
Besides the bug is not in the device, it's in the pci emulation.
> To _fix_ a stable kernel, if you have a suggestion, please suggest.
>
> > Pls document exactly how this pci looks. Does it have an id we can use to detect
> > it?
> >
> CSPs have different device and vendor id for vnet, blk vfs.
> Is that what you mean by id?
vendor id is one way, yes. maybe a revision check, too.
> > > > For example, how about reading device control+status?
> > > >
> > > Most platforms read 0xffff on non-existing device, but not sure if this the
> > standard or well defined.
> >
> > IIRC it's in the pci spec as a note.
> >
> Checking.
>
> > > > If we get all ones device has been removed If we get 0 in bus
> > > > master: device has been removed but re-inserted Anything else is a
> > > > fake removal
> > > >
> > > Bus master check may pass, right returning all 1s, even if the device is
> > removed, isn't it?
> >
> >
> > So we check all ones 1st, only check bus master if not all ones?
> >
> Pci subsystem typically checks the vendor and device ids, and if its not all 1s, its safe enough check.
>
> How about a fix something like this:
>
> --- a/drivers/virtio/virtio_pci_common.c
> +++ b/drivers/virtio/virtio_pci_common.c
> @@ -746,12 +746,16 @@ static void virtio_pci_remove(struct pci_dev *pci_dev)
> {
> struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> struct device *dev = get_device(&vp_dev->vdev.dev);
> + u32 v;
>
> /*
> * Device is marked broken on surprise removal so that virtio upper
> * layers can abort any ongoing operation.
> + * Make sure that device is truly removed by directly interacting
> + * with the device (and not just depend on the slot registers).
> */
> - if (!pci_device_is_present(pci_dev))
> + if (!pci_device_is_present(pci_dev) &&
> + !pci_bus_read_dev_vendor_id(pci_dev->bus, pci_dev->devfn, &v, 0))
> virtio_break_device(&vp_dev->vdev);
>
> So if the device is still there, it let it go through its usual cleanup flow.
> And post this fix, a proper implementation with callback etc that you described can be implemented.
I don't have a big problem with this, but I don't understand the
scenario now again. report_error_detected relies on dev->error_state and
bus read. error_state is set on AER reporting an error. This is
not what you described.
Does the patch actually solve the problem for you?
Also can we limit this to a specific vendor id, or something like that?
I also still like the idea of reading dev control and status, since
it always bothered me that there's a theoretical chance that device
is re-inserted and bus read will succeed. Or maybe I'm imagining it.
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 9:23 ` Michael S. Tsirkin
@ 2025-08-28 10:41 ` Parav Pandit
0 siblings, 0 replies; 38+ messages in thread
From: Parav Pandit @ 2025-08-28 10:41 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: NBU-Contact-Li Rongqing (EXTERNAL),
virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 28 August 2025 02:53 PM
>
> On Thu, Aug 28, 2025 at 06:59:26AM +0000, Parav Pandit wrote:
> >
> >
> > > From: Michael S. Tsirkin <mst@redhat.com>
> > > Sent: 28 August 2025 12:04 PM
> > >
> > > On Thu, Aug 28, 2025 at 06:23:02AM +0000, Parav Pandit wrote:
> > > >
> > > > > From: Michael S. Tsirkin <mst@redhat.com>
> > > > > Sent: 27 August 2025 04:19 PM
> > > > >
> > > > > On Wed, Aug 27, 2025 at 06:21:28AM -0400, Michael S. Tsirkin wrote:
> > > > > > On Tue, Aug 26, 2025 at 06:52:11PM +0000, Parav Pandit wrote:
> > > > > > > > > > If it does not, and a user pull out the working
> > > > > > > > > > device, how does your patch help?
> > > > > > > > > >
> > > > > > > > > A driver must tell that it will not follow broken
> > > > > > > > > ancient behaviour and at that
> > > > > > > > point device would stop its ancient backward compatibility mode.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I don't know what is "ancient backward compatibility mode".
> > > > > > > >
> > > > > > > Let me explain.
> > > > > > > Sadly, CSPs virtio pci device implementation is done such a
> > > > > > > way that, it
> > > > > works with ancient Linux kernel which does not have commit
> > > > > 43bb40c5b9265.
> > > > > >
> > > > > >
> > > > > > OK we are getting new information here.
> > > > > >
> > > > > > So let me summarize. There's a virtual system that pretends,
> > > > > > to the guest, that device was removed by surprise removal, but
> > > > > > actually device is there and is still doing DMA.
> > > > > > Is that a fair summary?
> > > > >
> > > > Yes.
> > > >
> > > > > If that is the case, the thing to do would be to try and detect
> > > > > the fake removal and then work with device as usual - device not
> > > > > doing DMA after removal is pretty fundamental, after all.
> > > > >
> > > > The issue is: one can build the device to stop the DMA.
> > > > There is no predictable combination for the driver and device that
> > > > can work
> > > for the user.
> > > > For example,
> > > > Device that stops the dma will not work before the commit
> 43bb40c5b9265.
> > > > Device that continues the dma will not work with whatever new
> > > implementation done in future kernels.
> > > >
> > > > Hence the capability negotiation would be needed so that device
> > > > can stop the
> > > DMA, config interrupts etc.
> > >
> > > So this is a broken implementation at the pci level. We really can't
> > > fix removal for this device at all, except by fixing the device.
> > The device to be told how to behave with/without commit 43bb40c5b9265.
> > Not sure what you mean by 'fix the device'.
> >
> > Users are running stable kernel that has commit 43bb40c5b9265 and its
> broken setup for them.
> >
> > > Whatever works, works by
> > > chance. Feature negotiation in spec is not the way to fix that, but
> > > some work arounds in the driver to skip the device are acceptable,
> > > mostly to not bother with it.
> > >
> > Why not?
> > It sounds like we need feature bit like VERSION_1 or ORDER_PLATFORM.
>
>
> Because the device is out of spec (PCI spec which virtio references).
>
> Besides the bug is not in the device, it's in the pci emulation.
>
>
> > To _fix_ a stable kernel, if you have a suggestion, please suggest.
> >
> > > Pls document exactly how this pci looks. Does it have an id we can
> > > use to detect it?
> > >
> > CSPs have different device and vendor id for vnet, blk vfs.
> > Is that what you mean by id?
>
> vendor id is one way, yes. maybe a revision check, too.
>
Vendor and device id are as defined in virtio spec as ID 0x1AF4 and respective device id.
> > > > > For example, how about reading device control+status?
> > > > >
> > > > Most platforms read 0xffff on non-existing device, but not sure if
> > > > this the
> > > standard or well defined.
> > >
> > > IIRC it's in the pci spec as a note.
> > >
> > Checking.
> >
> > > > > If we get all ones device has been removed If we get 0 in bus
> > > > > master: device has been removed but re-inserted Anything else is
> > > > > a fake removal
> > > > >
> > > > Bus master check may pass, right returning all 1s, even if the
> > > > device is
> > > removed, isn't it?
> > >
> > >
> > > So we check all ones 1st, only check bus master if not all ones?
> > >
> > Pci subsystem typically checks the vendor and device ids, and if its not all 1s,
> its safe enough check.
> >
> > How about a fix something like this:
> >
> > --- a/drivers/virtio/virtio_pci_common.c
> > +++ b/drivers/virtio/virtio_pci_common.c
> > @@ -746,12 +746,16 @@ static void virtio_pci_remove(struct pci_dev
> > *pci_dev) {
> > struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);
> > struct device *dev = get_device(&vp_dev->vdev.dev);
> > + u32 v;
> >
> > /*
> > * Device is marked broken on surprise removal so that virtio upper
> > * layers can abort any ongoing operation.
> > + * Make sure that device is truly removed by directly interacting
> > + * with the device (and not just depend on the slot registers).
> > */
> > - if (!pci_device_is_present(pci_dev))
> > + if (!pci_device_is_present(pci_dev) &&
> > + !pci_bus_read_dev_vendor_id(pci_dev->bus, pci_dev->devfn,
> > + &v, 0))
> > virtio_break_device(&vp_dev->vdev);
> >
> > So if the device is still there, it let it go through its usual cleanup flow.
> > And post this fix, a proper implementation with callback etc that you
> described can be implemented.
>
>
> I don't have a big problem with this, but I don't understand the scenario now
> again. report_error_detected relies on dev->error_state and bus read.
> error_state is set on AER reporting an error. This is not what you described.
>
When pci device is virtually removed from the slot error_state is updated using,
pci_dev_set_disconnected()
pci_dev_set_io_state()
> Does the patch actually solve the problem for you?
>
It should. I am going to check if this approach looks fine to you.
Please let me know.
> Also can we limit this to a specific vendor id, or something like that?
>
Its spec defined 0x1AF4.
>
> I also still like the idea of reading dev control and status, since it always
> bothered me that there's a theoretical chance that device is re-inserted and bus
> read will succeed. Or maybe I'm imagining it.
>
Re-insertion cannot happen in same slot until the previous slot is properly cleaned up and bus number is not released.
User may still attempt to plug in in same (virtual) or physical slot, but it will get different bus assigned as the previous one is not recycled yet because cleanup didnt finish yet.
>
> --
> MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 6:24 ` Parav Pandit
@ 2025-08-28 12:16 ` Cornelia Huck
2025-08-28 12:19 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2025-08-28 12:16 UTC (permalink / raw)
To: Parav Pandit, Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL), linux-s390
On Thu, Aug 28 2025, Parav Pandit <parav@nvidia.com> wrote:
>> From: Cornelia Huck <cohuck@redhat.com>
>> Sent: 27 August 2025 05:04 PM
>>
>> On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>
>> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
>> >> > What I do not understand, is what good does the revert do. Sorry.
>> >> >
>> >> Let me explain.
>> >> It prevents the issue of vblk requests being stuck due to broken VQ.
>> >> It prevents the vnet driver start_xmit() to be not stuck on skb completions.
>> >
>> > This is the part I don't get. In what scenario, before 43bb40c5b9265
>> > start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
>> >
>> > Once the device is gone, it is not using any buffers at all.
>>
>> What I also don't understand: virtio-ccw does exactly the same thing
>> (virtio_break_device(), added in 2014), and it supports surprise removal
>> _only_, yet I don't remember seeing bug reports?
>
> I suspect that stress testing may not have happened for ccw with active vblk Ios and outstanding transmit pkt and cvq commands.
> Hard to say as we don't have ccw hw or systems.
cc:ing linux-s390 list. I'd be surprised if nobody ever tested surprise
removal on a loaded system in the last 11 years.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 12:16 ` Cornelia Huck
@ 2025-08-28 12:19 ` Michael S. Tsirkin
2025-08-28 12:22 ` Cornelia Huck
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-28 12:19 UTC (permalink / raw)
To: Cornelia Huck
Cc: Parav Pandit, virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL), linux-s390
On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
> On Thu, Aug 28 2025, Parav Pandit <parav@nvidia.com> wrote:
>
> >> From: Cornelia Huck <cohuck@redhat.com>
> >> Sent: 27 August 2025 05:04 PM
> >>
> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>
> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> >> >> > What I do not understand, is what good does the revert do. Sorry.
> >> >> >
> >> >> Let me explain.
> >> >> It prevents the issue of vblk requests being stuck due to broken VQ.
> >> >> It prevents the vnet driver start_xmit() to be not stuck on skb completions.
> >> >
> >> > This is the part I don't get. In what scenario, before 43bb40c5b9265
> >> > start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
> >> >
> >> > Once the device is gone, it is not using any buffers at all.
> >>
> >> What I also don't understand: virtio-ccw does exactly the same thing
> >> (virtio_break_device(), added in 2014), and it supports surprise removal
> >> _only_, yet I don't remember seeing bug reports?
> >
> > I suspect that stress testing may not have happened for ccw with active vblk Ios and outstanding transmit pkt and cvq commands.
> > Hard to say as we don't have ccw hw or systems.
>
> cc:ing linux-s390 list. I'd be surprised if nobody ever tested surprise
> removal on a loaded system in the last 11 years.
As it became very clear from follow up discussion, the issue is nothing
to do with virtio, it is with a broken hypervisor that allows device to
DMA into guest memory while also telling the guest that the device has
been removed.
I guess s390 is just not broken like this.
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 12:19 ` Michael S. Tsirkin
@ 2025-08-28 12:22 ` Cornelia Huck
2025-08-28 12:33 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Cornelia Huck @ 2025-08-28 12:22 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Parav Pandit, virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL), linux-s390
On Thu, Aug 28 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
>> On Thu, Aug 28 2025, Parav Pandit <parav@nvidia.com> wrote:
>>
>> >> From: Cornelia Huck <cohuck@redhat.com>
>> >> Sent: 27 August 2025 05:04 PM
>> >>
>> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>> >>
>> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
>> >> >> > What I do not understand, is what good does the revert do. Sorry.
>> >> >> >
>> >> >> Let me explain.
>> >> >> It prevents the issue of vblk requests being stuck due to broken VQ.
>> >> >> It prevents the vnet driver start_xmit() to be not stuck on skb completions.
>> >> >
>> >> > This is the part I don't get. In what scenario, before 43bb40c5b9265
>> >> > start_xmit is not stuck, but after 43bb40c5b9265 it is stuck?
>> >> >
>> >> > Once the device is gone, it is not using any buffers at all.
>> >>
>> >> What I also don't understand: virtio-ccw does exactly the same thing
>> >> (virtio_break_device(), added in 2014), and it supports surprise removal
>> >> _only_, yet I don't remember seeing bug reports?
>> >
>> > I suspect that stress testing may not have happened for ccw with active vblk Ios and outstanding transmit pkt and cvq commands.
>> > Hard to say as we don't have ccw hw or systems.
>>
>> cc:ing linux-s390 list. I'd be surprised if nobody ever tested surprise
>> removal on a loaded system in the last 11 years.
>
>
> As it became very clear from follow up discussion, the issue is nothing
> to do with virtio, it is with a broken hypervisor that allows device to
> DMA into guest memory while also telling the guest that the device has
> been removed.
>
> I guess s390 is just not broken like this.
Ah good, I missed that -- that indeed sounds broken, and needs to be
fixed there.
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 12:22 ` Cornelia Huck
@ 2025-08-28 12:33 ` Parav Pandit
2025-08-28 13:00 ` Michael S. Tsirkin
0 siblings, 1 reply; 38+ messages in thread
From: Parav Pandit @ 2025-08-28 12:33 UTC (permalink / raw)
To: Cornelia Huck, Michael S. Tsirkin
Cc: virtualization@lists.linux.dev, jasowang@redhat.com,
stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL), linux-s390@vger.kernel.org
> From: Cornelia Huck <cohuck@redhat.com>
> Sent: 28 August 2025 05:52 PM
>
> On Thu, Aug 28 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
> >> On Thu, Aug 28 2025, Parav Pandit <parav@nvidia.com> wrote:
> >>
> >> >> From: Cornelia Huck <cohuck@redhat.com>
> >> >> Sent: 27 August 2025 05:04 PM
> >> >>
> >> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >> >>
> >> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> >> >> >> > What I do not understand, is what good does the revert do. Sorry.
> >> >> >> >
> >> >> >> Let me explain.
> >> >> >> It prevents the issue of vblk requests being stuck due to broken VQ.
> >> >> >> It prevents the vnet driver start_xmit() to be not stuck on skb
> completions.
> >> >> >
> >> >> > This is the part I don't get. In what scenario, before
> >> >> > 43bb40c5b9265 start_xmit is not stuck, but after 43bb40c5b9265 it is
> stuck?
> >> >> >
> >> >> > Once the device is gone, it is not using any buffers at all.
> >> >>
> >> >> What I also don't understand: virtio-ccw does exactly the same
> >> >> thing (virtio_break_device(), added in 2014), and it supports
> >> >> surprise removal _only_, yet I don't remember seeing bug reports?
> >> >
> >> > I suspect that stress testing may not have happened for ccw with active
> vblk Ios and outstanding transmit pkt and cvq commands.
> >> > Hard to say as we don't have ccw hw or systems.
> >>
> >> cc:ing linux-s390 list. I'd be surprised if nobody ever tested
> >> surprise removal on a loaded system in the last 11 years.
> >
> >
> > As it became very clear from follow up discussion, the issue is
> > nothing to do with virtio, it is with a broken hypervisor that allows
> > device to DMA into guest memory while also telling the guest that the
> > device has been removed.
> >
> > I guess s390 is just not broken like this.
>
> Ah good, I missed that -- that indeed sounds broken, and needs to be fixed
> there.
Nop. This is not the issue. You missed this focused on fixing the device.
The fact is: the driver is expecting the IOs and CVQ commands and DMA to succeed even after device is removed.
The driver is expecting the device reset to also succeed.
Stefan already pointed out this in the vblk driver patches.
This is why you see call traces on del_gendisk(), CVQ commands.
Again, it is the broken drivers not the device.
Device can stop the DMA and stop responding to the requests and kernel 6.X will continue to hang as long as it has cited commit.
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 12:33 ` Parav Pandit
@ 2025-08-28 13:00 ` Michael S. Tsirkin
2025-08-28 13:37 ` Parav Pandit
0 siblings, 1 reply; 38+ messages in thread
From: Michael S. Tsirkin @ 2025-08-28 13:00 UTC (permalink / raw)
To: Parav Pandit
Cc: Cornelia Huck, virtualization@lists.linux.dev,
jasowang@redhat.com, stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL), linux-s390@vger.kernel.org
On Thu, Aug 28, 2025 at 12:33:58PM +0000, Parav Pandit wrote:
>
> > From: Cornelia Huck <cohuck@redhat.com>
> > Sent: 28 August 2025 05:52 PM
> >
> > On Thu, Aug 28 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> > > On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
> > >> On Thu, Aug 28 2025, Parav Pandit <parav@nvidia.com> wrote:
> > >>
> > >> >> From: Cornelia Huck <cohuck@redhat.com>
> > >> >> Sent: 27 August 2025 05:04 PM
> > >> >>
> > >> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >> >>
> > >> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> > >> >> >> > What I do not understand, is what good does the revert do. Sorry.
> > >> >> >> >
> > >> >> >> Let me explain.
> > >> >> >> It prevents the issue of vblk requests being stuck due to broken VQ.
> > >> >> >> It prevents the vnet driver start_xmit() to be not stuck on skb
> > completions.
> > >> >> >
> > >> >> > This is the part I don't get. In what scenario, before
> > >> >> > 43bb40c5b9265 start_xmit is not stuck, but after 43bb40c5b9265 it is
> > stuck?
> > >> >> >
> > >> >> > Once the device is gone, it is not using any buffers at all.
> > >> >>
> > >> >> What I also don't understand: virtio-ccw does exactly the same
> > >> >> thing (virtio_break_device(), added in 2014), and it supports
> > >> >> surprise removal _only_, yet I don't remember seeing bug reports?
> > >> >
> > >> > I suspect that stress testing may not have happened for ccw with active
> > vblk Ios and outstanding transmit pkt and cvq commands.
> > >> > Hard to say as we don't have ccw hw or systems.
> > >>
> > >> cc:ing linux-s390 list. I'd be surprised if nobody ever tested
> > >> surprise removal on a loaded system in the last 11 years.
> > >
> > >
> > > As it became very clear from follow up discussion, the issue is
> > > nothing to do with virtio, it is with a broken hypervisor that allows
> > > device to DMA into guest memory while also telling the guest that the
> > > device has been removed.
> > >
> > > I guess s390 is just not broken like this.
> >
> > Ah good, I missed that -- that indeed sounds broken, and needs to be fixed
> > there.
> Nop. This is not the issue. You missed this focused on fixing the device.
>
> The fact is: the driver is expecting the IOs and CVQ commands and DMA to succeed even after device is removed.
> The driver is expecting the device reset to also succeed.
> Stefan already pointed out this in the vblk driver patches.
> This is why you see call traces on del_gendisk(), CVQ commands.
>
> Again, it is the broken drivers not the device.
> Device can stop the DMA and stop responding to the requests and kernel 6.X will continue to hang as long as it has cited commit.
Parav, the issues you cite are real but unrelated and will hang anyway
with or without the commit.
All you have to do is pull out the device while e.g. a command is in the
process of being submitted.
All the commit you want to revert does, is in some instances instead of
just hanging it will make queue as broken and release memory. Since you
device is not really gone and keeps DMAing into memory, guest memory
gets corrupted.
But your argument that the issue is that the fix is "incomplete" is
bogus - when we make the fix complete it will become even worse for
this broken devices.
--
MST
^ permalink raw reply [flat|nested] 38+ messages in thread
* RE: [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device"
2025-08-28 13:00 ` Michael S. Tsirkin
@ 2025-08-28 13:37 ` Parav Pandit
0 siblings, 0 replies; 38+ messages in thread
From: Parav Pandit @ 2025-08-28 13:37 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Cornelia Huck, virtualization@lists.linux.dev,
jasowang@redhat.com, stefanha@redhat.com, pbonzini@redhat.com,
xuanzhuo@linux.alibaba.com, stable@vger.kernel.org, Max Gurtovoy,
NBU-Contact-Li Rongqing (EXTERNAL), linux-s390@vger.kernel.org
> From: Michael S. Tsirkin <mst@redhat.com>
> Sent: 28 August 2025 06:31 PM
>
> On Thu, Aug 28, 2025 at 12:33:58PM +0000, Parav Pandit wrote:
> >
> > > From: Cornelia Huck <cohuck@redhat.com>
> > > Sent: 28 August 2025 05:52 PM
> > >
> > > On Thu, Aug 28 2025, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >
> > > > On Thu, Aug 28, 2025 at 02:16:28PM +0200, Cornelia Huck wrote:
> > > >> On Thu, Aug 28 2025, Parav Pandit <parav@nvidia.com> wrote:
> > > >>
> > > >> >> From: Cornelia Huck <cohuck@redhat.com>
> > > >> >> Sent: 27 August 2025 05:04 PM
> > > >> >>
> > > >> >> On Wed, Aug 27 2025, "Michael S. Tsirkin" <mst@redhat.com>
> wrote:
> > > >> >>
> > > >> >> > On Tue, Aug 26, 2025 at 06:52:03PM +0000, Parav Pandit wrote:
> > > >> >> >> > What I do not understand, is what good does the revert do.
> Sorry.
> > > >> >> >> >
> > > >> >> >> Let me explain.
> > > >> >> >> It prevents the issue of vblk requests being stuck due to broken
> VQ.
> > > >> >> >> It prevents the vnet driver start_xmit() to be not stuck on
> > > >> >> >> skb
> > > completions.
> > > >> >> >
> > > >> >> > This is the part I don't get. In what scenario, before
> > > >> >> > 43bb40c5b9265 start_xmit is not stuck, but after
> > > >> >> > 43bb40c5b9265 it is
> > > stuck?
> > > >> >> >
> > > >> >> > Once the device is gone, it is not using any buffers at all.
> > > >> >>
> > > >> >> What I also don't understand: virtio-ccw does exactly the same
> > > >> >> thing (virtio_break_device(), added in 2014), and it supports
> > > >> >> surprise removal _only_, yet I don't remember seeing bug reports?
> > > >> >
> > > >> > I suspect that stress testing may not have happened for ccw
> > > >> > with active
> > > vblk Ios and outstanding transmit pkt and cvq commands.
> > > >> > Hard to say as we don't have ccw hw or systems.
> > > >>
> > > >> cc:ing linux-s390 list. I'd be surprised if nobody ever tested
> > > >> surprise removal on a loaded system in the last 11 years.
> > > >
> > > >
> > > > As it became very clear from follow up discussion, the issue is
> > > > nothing to do with virtio, it is with a broken hypervisor that
> > > > allows device to DMA into guest memory while also telling the
> > > > guest that the device has been removed.
> > > >
> > > > I guess s390 is just not broken like this.
> > >
> > > Ah good, I missed that -- that indeed sounds broken, and needs to be
> > > fixed there.
> > Nop. This is not the issue. You missed this focused on fixing the device.
> >
> > The fact is: the driver is expecting the IOs and CVQ commands and DMA to
> succeed even after device is removed.
> > The driver is expecting the device reset to also succeed.
> > Stefan already pointed out this in the vblk driver patches.
> > This is why you see call traces on del_gendisk(), CVQ commands.
> >
> > Again, it is the broken drivers not the device.
> > Device can stop the DMA and stop responding to the requests and kernel
> 6.X will continue to hang as long as it has cited commit.
>
>
> Parav, the issues you cite are real but unrelated and will hang anyway with or
> without the commit.
>
How is it unrelated?
If it is going to hang anyway (in your view), and you proposed different callback etc as brand-new feature to Linux kernel, what is the objection to revert it?
As you pointed out it will be in multiple subsystems (net, block, pci) etc, why not do the proper work?
Reverting at least helps those stable kernels to operate smoothly as before.
> All you have to do is pull out the device while e.g. a command is in the
> process of being submitted.
>
> All the commit you want to revert does, is in some instances instead of just
> hanging it will make queue as broken and release memory. Since you device is
> not really gone and keeps DMAing into memory, guest memory gets
> corrupted.
Nop. This is not the case.
What is "some instance"?
The virtio block driver is expecting the IOs to be completed without the cited commit.
As you listed cross subsystem callbacks, such infrastructure != fix.
So to make things clear, as discussed.
1. have proper kernel infrastructure in placed as you outlined the design using callback
2. have the spec update to make sure drivers negotiate its readiness for surprise removal and do not expect to access the device.
Until that point, restore the stability of stable kernels.
>
> But your argument that the issue is that the fix is "incomplete" is bogus -
> when we make the fix complete it will become even worse for this broken
> devices.
I explained you that the device was doing the right thing and that is why exactly the call trace in the cited patch showed up.
Again quoting "broken device" is wrong.
The drivers are broken trying to reset the removed device.
And the ask is to do proper feature negotiation to get to that point.
When the reasonable workaround is suggested in previous email, you opted to not respond to it?
This is not how broken user experience is restored for stable kernel.
Are you waiting for the test results if it works?
If so, then yes, it makes sense.
I will of course test and submit proper v2.
^ permalink raw reply [flat|nested] 38+ messages in thread
end of thread, other threads:[~2025-08-28 13:37 UTC | newest]
Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22 9:17 [PATCH] Revert "virtio_pci: Support surprise removal of virtio pci device" Parav Pandit
2025-08-22 10:21 ` Michael S. Tsirkin
2025-08-22 12:22 ` Parav Pandit
2025-08-22 13:03 ` Michael S. Tsirkin
2025-08-22 13:49 ` Parav Pandit
2025-08-22 13:59 ` Michael S. Tsirkin
2025-08-24 2:36 ` Parav Pandit
2025-08-24 14:33 ` Michael S. Tsirkin
2025-08-26 18:52 ` Parav Pandit
2025-08-27 10:19 ` Michael S. Tsirkin
2025-08-27 11:33 ` Cornelia Huck
2025-08-28 6:24 ` Parav Pandit
2025-08-28 12:16 ` Cornelia Huck
2025-08-28 12:19 ` Michael S. Tsirkin
2025-08-28 12:22 ` Cornelia Huck
2025-08-28 12:33 ` Parav Pandit
2025-08-28 13:00 ` Michael S. Tsirkin
2025-08-28 13:37 ` Parav Pandit
-- strict thread matches above, loose matches on Subject: below --
2025-08-22 10:27 Li,Rongqing
2025-08-22 12:24 ` Parav Pandit
2025-08-22 13:04 ` Michael S. Tsirkin
2025-08-22 13:53 ` Parav Pandit
2025-08-22 14:02 ` Michael S. Tsirkin
2025-08-24 2:36 ` Parav Pandit
2025-08-24 14:29 ` Michael S. Tsirkin
2025-08-26 18:52 ` Parav Pandit
2025-08-27 10:21 ` Michael S. Tsirkin
2025-08-27 10:49 ` Michael S. Tsirkin
2025-08-28 6:23 ` Parav Pandit
2025-08-28 6:34 ` Michael S. Tsirkin
2025-08-28 6:59 ` Parav Pandit
2025-08-28 9:23 ` Michael S. Tsirkin
2025-08-28 10:41 ` Parav Pandit
2025-04-08 14:59 Parav Pandit
2025-04-08 20:15 ` Michael S. Tsirkin
2025-04-09 13:50 ` Parav Pandit
2025-04-09 16:02 ` Michael S. Tsirkin
2025-04-16 3:01 ` Parav Pandit
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).