qemu-devel.nongnu.org archive mirror
* vhost-user-blk reconnect issue
@ 2024-03-25 10:54 Yajun Wu
  2024-03-27 10:47 ` Stefano Garzarella
  0 siblings, 1 reply; 8+ messages in thread
From: Yajun Wu @ 2024-03-25 10:54 UTC (permalink / raw)
  To: fengli@smartx.com, raphael.norwitz@nutanix.com,
	qemu-devel@nongnu.org
  Cc: mst@redhat.com, Parav Pandit

Hi experts,

With the latest QEMU (8.2.90), we see two scenarios in which reconnecting
the vhost-user-blk backend fails:

1. Disconnect the vhost-user-blk backend before the guest driver probes
the vblk device, then reconnect the backend after the guest driver has
probed the device. QEMU does not send any vhost messages to restore the
backend.
This is because vhost->vdev is NULL before the guest driver probes the
vblk device, so vhost_user_blk_disconnect is not called and s->connected
stays true. The next vhost_user_blk_connect then simply returns without
doing anything.

2. Run modprobe -r virtio-blk inside the VM, disconnect the backend,
reconnect the backend, then run modprobe virtio-blk. QEMU does not send
the messages in vhost_dev_init.
This is because the rmmod makes QEMU call vhost_user_blk_stop, after
which vhost->vdev is also NULL (cleared in vhost_dev_stop), so
vhost_user_blk_disconnect is not called. Again s->connected stays true,
even though the chardev connection is closed.

I think vhost_user_blk_disconnect should be called when the chardev
connection closes, even if vhost->vdev is NULL?
I hope we can have a fix soon.


Thanks,
Yajun
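
The two scenarios above can be reproduced in a small stand-alone model.
This is only a sketch under stated assumptions, not QEMU code: the struct
and the model_* functions are hypothetical stand-ins for s->connected,
vhost->vdev, vhost_user_blk_connect and the close path, and the message
counter is invented for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the device state; not a QEMU type. */
struct model_blk {
    bool vdev_set;       /* models "vhost->vdev != NULL" */
    bool connected;      /* models "s->connected" */
    int  init_messages;  /* times the backend was (re)initialised */
};

/* Models the connect path: returns early when already marked connected. */
static void model_connect(struct model_blk *s)
{
    if (s->connected) {
        return;
    }
    s->connected = true;
    s->init_messages++;
}

/* Models the close path: the disconnect step is skipped when vdev is unset. */
static void model_chr_closed(struct model_blk *s)
{
    if (s->vdev_set) {
        s->connected = false;
    }
}

/* Scenario 1: the backend drops before the guest driver probes the device. */
static int model_scenario_probe_after_disconnect(void)
{
    struct model_blk s = { false, false, 0 };

    model_connect(&s);     /* initial connection: one init message */
    model_chr_closed(&s);  /* backend goes away, vdev unset: flag stays true */
    s.vdev_set = true;     /* guest driver probes the device */
    model_connect(&s);     /* backend is back, but connect is a no-op */
    return s.init_messages;
}

/* Scenario 2: the guest unloads the driver before the backend drops. */
static int model_scenario_rmmod(void)
{
    struct model_blk s = { false, false, 0 };

    model_connect(&s);     /* initial connection: one init message */
    s.vdev_set = true;     /* guest driver probes the device */
    s.vdev_set = false;    /* driver unloaded: the stop path clears vdev */
    model_chr_closed(&s);  /* backend goes away: disconnect is skipped */
    model_connect(&s);     /* backend is back, but connect is a no-op */
    s.vdev_set = true;     /* driver loaded again */
    return s.init_messages;
}
```

In this model both scenario functions return 1, where a working reconnect
would have initialised the backend a second time.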



* Re: vhost-user-blk reconnect issue
  2024-03-25 10:54 vhost-user-blk reconnect issue Yajun Wu
@ 2024-03-27 10:47 ` Stefano Garzarella
  2024-04-01  2:08   ` Yajun Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Stefano Garzarella @ 2024-03-27 10:47 UTC (permalink / raw)
  To: Yajun Wu
  Cc: fengli@smartx.com, raphael.norwitz@nutanix.com,
	qemu-devel@nongnu.org, mst@redhat.com, Parav Pandit

Hi Yajun,

On Mon, Mar 25, 2024 at 10:54:13AM +0000, Yajun Wu wrote:
>Hi experts,
>
>With latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect 
>failure scenarios:

Do you know whether this ever worked, which would make it a regression,
or have we always had this problem?

Thanks,
Stefano


* Re: vhost-user-blk reconnect issue
  2024-03-27 10:47 ` Stefano Garzarella
@ 2024-04-01  2:08   ` Yajun Wu
  2024-04-01  7:54     ` Michael S. Tsirkin
  2024-04-01  8:34     ` Li Feng
  0 siblings, 2 replies; 8+ messages in thread
From: Yajun Wu @ 2024-04-01  2:08 UTC (permalink / raw)
  To: Stefano Garzarella, ",alex.bennee"
  Cc: fengli@smartx.com, raphael.norwitz@nutanix.com,
	qemu-devel@nongnu.org, mst@redhat.com, Parav Pandit


On 3/27/2024 6:47 PM, Stefano Garzarella wrote:
> Hi Yajun,
>
> On Mon, Mar 25, 2024 at 10:54:13AM +0000, Yajun Wu wrote:
>> Hi experts,
>>
>> With latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect
>> failure scenarios:
> Do you know if has it ever worked and so it's a regression, or have we
> always had this problem?

I am afraid that commit 71e076a07d ("hw/virtio: generalise
CHR_EVENT_CLOSED handling", 2022-12-01 02:30:13 -0500) caused both
failures. The commit just before it works.

I suspect the "if (vhost->vdev)" check in vhost_user_async_close_bh is
the cause; as far as I can tell, the previous code did not have this
check?
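
The suspicion can be illustrated with a stand-alone sketch that compares a
close handler with and without a vdev check. It is a simplified model, not
the QEMU implementation: the sketch_* names and the struct are
hypothetical, and whether dropping the check would be safe in QEMU is not
established here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; not a QEMU type. */
struct sketch_dev {
    void *vdev;       /* NULL until the guest driver has started the device */
    bool  connected;  /* models the "connected" flag of the device */
};

typedef void (*sketch_close_cb)(struct sketch_dev *dev);

/* Models the disconnect callback: it clears the connected flag. */
static void sketch_disconnect(struct sketch_dev *dev)
{
    dev->connected = false;
}

/* Guarded variant: the callback runs only when vdev is set. */
static void sketch_close_guarded(struct sketch_dev *dev, sketch_close_cb cb)
{
    if (dev->vdev) {
        cb(dev);
    }
}

/* Unguarded variant: the callback always runs. */
static void sketch_close_unguarded(struct sketch_dev *dev, sketch_close_cb cb)
{
    cb(dev);
}

/* Returns the connected flag after a close event that arrives with vdev NULL. */
static bool sketch_flag_after_close(bool guarded)
{
    struct sketch_dev dev = { NULL, true };

    if (guarded) {
        sketch_close_guarded(&dev, sketch_disconnect);
    } else {
        sketch_close_unguarded(&dev, sketch_disconnect);
    }
    return dev.connected;
}
```

With the guard, the flag is still true after the close event, which is the
stale state described in the two scenarios; without it, the flag is
cleared and a later connect would not return early.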


* Re: vhost-user-blk reconnect issue
  2024-04-01  2:08   ` Yajun Wu
@ 2024-04-01  7:54     ` Michael S. Tsirkin
  2024-04-01  8:34     ` Li Feng
  1 sibling, 0 replies; 8+ messages in thread
From: Michael S. Tsirkin @ 2024-04-01  7:54 UTC (permalink / raw)
  To: Yajun Wu
  Cc: Stefano Garzarella, fengli@smartx.com,
	raphael.norwitz@nutanix.com, qemu-devel@nongnu.org, Parav Pandit,
	Alex Bennée

On Mon, Apr 01, 2024 at 10:08:10AM +0800, Yajun Wu wrote:
> 
> On 3/27/2024 6:47 PM, Stefano Garzarella wrote:
> > Hi Yajun,
> > 
> > On Mon, Mar 25, 2024 at 10:54:13AM +0000, Yajun Wu wrote:
> > > Hi experts,
> > > 
> > > With latest QEMU (8.2.90), we find two vhost-user-blk backend reconnect
> > > failure scenarios:
> > Do you know if has it ever worked and so it's a regression, or have we
> > always had this problem?
> 
> I am afraid this commit: "71e076a07d (2022-12-01 02:30:13 -0500) hw/virtio:
> generalise CHR_EVENT_CLOSED handling"  caused both failures. Previous hash
> is good.

CCing Alex, who wrote that commit.


* Re: vhost-user-blk reconnect issue
  2024-04-01  2:08   ` Yajun Wu
  2024-04-01  7:54     ` Michael S. Tsirkin
@ 2024-04-01  8:34     ` Li Feng
  2024-04-01  8:43       ` Yajun Wu
  1 sibling, 1 reply; 8+ messages in thread
From: Li Feng @ 2024-04-01  8:34 UTC (permalink / raw)
  To: Yajun Wu
  Cc: Stefano Garzarella, Alex Bennée,
	raphael.norwitz@nutanix.com, qemu-devel@nongnu.org,
	mst@redhat.com, Parav Pandit

Hi Yajun,

I submitted a patch for this problem a few months ago, but in the end it
was not accepted and a different fix was adopted. This was my patch:

https://lore.kernel.org/all/20230804052954.2918915-2-fengli@smartx.com/

This is the fix that was merged:

https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git.mst@redhat.com/

Thanks,
Li


* Re: vhost-user-blk reconnect issue
  2024-04-01  8:34     ` Li Feng
@ 2024-04-01  8:43       ` Yajun Wu
  2024-04-02  8:44         ` Li Feng
  0 siblings, 1 reply; 8+ messages in thread
From: Yajun Wu @ 2024-04-01  8:43 UTC (permalink / raw)
  To: Li Feng
  Cc: Stefano Garzarella, Alex Bennée,
	raphael.norwitz@nutanix.com, qemu-devel@nongnu.org,
	mst@redhat.com, Parav Pandit

On 4/1/2024 4:34 PM, Li Feng wrote:
> Hi yajun,
>
> I have submitted a patch to fix this problem a few months ago, but in 
> the end this solution was not accepted and other solutions
> were adopted to fix it.
>
> [PATCH 1/2] vhost-user: fix lost reconnect - Li Feng
> <https://lore.kernel.org/all/20230804052954.2918915-2-fengli@smartx.com/>
>
I think this fix is valid.

> This is the merged fix:
>
>
> [PULL 76/83] vhost-user: fix lost reconnect - Michael S. Tsirkin
> <https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git.mst@redhat.com/>

My tests were run with this fix applied, and they still failed in the two
scenarios I mentioned.


* Re: vhost-user-blk reconnect issue
  2024-04-01  8:43       ` Yajun Wu
@ 2024-04-02  8:44         ` Li Feng
  2024-04-10  7:51           ` Yajun Wu
  0 siblings, 1 reply; 8+ messages in thread
From: Li Feng @ 2024-04-02  8:44 UTC (permalink / raw)
  To: Yajun Wu, Stefano Garzarella, raphael.norwitz@nutanix.com
  Cc: Alex Bennée, qemu-devel@nongnu.org, mst@redhat.com,
	Parav Pandit

Hi,

I tested this today, and there is indeed a problem in this scenario.
It seems that the first version of the patch is the best approach and can
handle all scenarios. With that patch applied, the previously merged
patch is no longer needed.

I will revert the merged patch and submit a new fix. Do you have any
comments?

Revert: https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git.mst@redhat.com/
New: https://lore.kernel.org/all/20230804052954.2918915-2-fengli@smartx.com/

Thanks,
Li


* Re: vhost-user-blk reconnect issue
  2024-04-02  8:44         ` Li Feng
@ 2024-04-10  7:51           ` Yajun Wu
  0 siblings, 0 replies; 8+ messages in thread
From: Yajun Wu @ 2024-04-10  7:51 UTC (permalink / raw)
  To: Li Feng, Stefano Garzarella, raphael.norwitz@nutanix.com
  Cc: Alex Bennée, qemu-devel@nongnu.org, mst@redhat.com,
	Parav Pandit

On 4/2/2024 4:44 PM, Li Feng wrote:
> Hi,
>
> I tested it today and there is indeed a problem in this scenario.
> It seems that the first version of the patch is the best and can 
> handle all scenarios.
> With this patch, the previously merged patches are no longer useful.
>
>
> I will revert this patch and submit a new fix. Do you have any comments?
>
> Revert:
> https://lore.kernel.org/all/a68c0148e9bf105f9e83ff5e763b8fcb6f7ba9be.1697644299.git.mst@redhat.com/
> New:
> https://lore.kernel.org/all/20230804052954.2918915-2-fengli@smartx.com/

Looks good to me.

Thanks,

Yajun
