public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH v1] virtio_pci: only store successfully populated virtio_pci_vq_info
@ 2026-04-07 21:25 Link Lin
       [not found] ` <CALUx4KQKNYZm5AZzQXNqLRdGAT0nQOpmn_Lz6WHie73w1d9JQA@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Link Lin @ 2026-04-07 21:25 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo
  Cc: eperezma, jiaqiyan, rientjes, weixugc, virtualization,
	linux-kernel, Link Lin

In environments where free page reporting is disabled, a kernel
panic is triggered when tearing down the virtio_balloon module:

  [12261.808190] Call trace:
  [12261.808471]  __list_del_entry_valid_or_report+0x18/0xe0
  [12261.809064]  vp_del_vqs+0x12c/0x270
  [12261.809462]  remove_common+0x80/0x98 [virtio_balloon]
  [12261.810034]  virtballoon_remove+0xfc/0x158 [virtio_balloon]
  [12261.810663]  virtio_dev_remove+0x68/0xf8
  [12261.811108]  device_release_driver_internal+0x17c/0x278
  [12261.811701]  driver_detach+0xd4/0x138
  [12261.812117]  bus_remove_driver+0x90/0xd0
  [12261.812562]  driver_unregister+0x40/0x70
  [12261.813006]  unregister_virtio_driver+0x20/0x38
  [12261.813518]  cleanup_module+0x20/0x7a8 [virtio_balloon]
  [12261.814109]  __arm64_sys_delete_module+0x278/0x3d0
  [12261.814654]  invoke_syscall+0x5c/0x120
  [12261.815086]  el0_svc_common+0x90/0xf8
  [12261.815506]  do_el0_svc+0x2c/0x48
  [12261.815883]  el0_svc+0x3c/0xa8
  [12261.816235]  el0t_64_sync_handler+0x8c/0x108
  [12261.816724]  el0t_64_sync+0x198/0x1a0

The issue originates in vp_find_vqs_intx(). It kzalloc_objs() based
on the nvqs count provided by the caller, virtio_balloon::init_vqs().
However, it is not always the case that all nvqs number of
virtio_pci_vq_info objects will be properly populated.

For example, when VIRTIO_BALLOON_F_FREE_PAGE_HINT is absent, the
VIRTIO_BALLOON_VQ_FREE_PAGE-th item in the vp_dev->vqs array is
actually never populated, and is still a zeroe-initialized
virtio_pci_vq_info object, which is eventually going to trigger
a __list_del_entry_valid_or_report() crash.

Tested by applying this patch to a guest VM kernel with the
VIRTIO_BALLOON_F_REPORTING feature enabled and the
VIRTIO_BALLOON_F_FREE_PAGE_HINT feature disabled.
Without this patch, unloading the virtio_balloon module triggers a panic.
With this patch, no panic is observed.

The fix is to use queue_idx to handle the case that vp_find_vqs_intx()
skips vp_setup_vq() when caller provided null vqs_info[i].name, when
the caller doesn't populate all nvqs number of virtqueue_info objects.
Invariantly queue_idx is the correct index to store a successfully
created and populated virtio_pci_vq_info object. As a result, now
a virtio_pci_device object only stores queue_idx number of valid
virtio_pci_vq_info objects in its vqs array when the for-loop over
nvqs finishes (of course, without goto out_del_vqs).

vp_find_vqs_msix() has similar issue, so fix it in the same way.

This patch is marked as RFC because we are uncertain if any virtio-pci
code implicitly requires virtio_pci_device's vqs array to always
contain nvqs number of virtio_pci_vq_info objects, and to store
zero-initialized virtio_pci_vq_info objects. We have not observed
any issues in our testing, but insights or alternatives are welcome!

Signed-off-by: Link Lin <linkl@google.com>
Co-developed-by: Jiaqi Yan <jiaqiyan@google.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 drivers/virtio/virtio_pci_common.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
index da97b6a988de..9b32301529e5 100644
--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -423,14 +423,15 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
 			vqs[i] = NULL;
 			continue;
 		}
-		vqs[i] = vp_find_one_vq_msix(vdev, queue_idx++, vqi->callback,
+		vqs[i] = vp_find_one_vq_msix(vdev, queue_idx, vqi->callback,
 					     vqi->name, vqi->ctx, false,
 					     &allocated_vectors, vector_policy,
-					     &vp_dev->vqs[i]);
+					     &vp_dev->vqs[queue_idx]);
 		if (IS_ERR(vqs[i])) {
 			err = PTR_ERR(vqs[i]);
 			goto error_find;
 		}
+		++queue_idx;
 	}
 
 	if (!avq_num)
@@ -485,13 +486,14 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
 			vqs[i] = NULL;
 			continue;
 		}
-		vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
+		vqs[i] = vp_setup_vq(vdev, queue_idx, vqi->callback,
 				     vqi->name, vqi->ctx,
-				     VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[i]);
+				     VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[queue_idx]);
 		if (IS_ERR(vqs[i])) {
 			err = PTR_ERR(vqs[i]);
 			goto out_del_vqs;
 		}
+		++queue_idx;
 	}
 
 	if (!avq_num)
-- 
2.53.0.1213.gd9a14994de-goog

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v1] virtio_pci: only store successfully populated virtio_pci_vq_info
       [not found] ` <CALUx4KQKNYZm5AZzQXNqLRdGAT0nQOpmn_Lz6WHie73w1d9JQA@mail.gmail.com>
@ 2026-04-21 21:47   ` Link Lin
  2026-04-21 21:51     ` Michael S. Tsirkin
  2026-04-21 21:50   ` Michael S. Tsirkin
  1 sibling, 1 reply; 6+ messages in thread
From: Link Lin @ 2026-04-21 21:47 UTC (permalink / raw)
  To: mst, jasowang, xuanzhuo
  Cc: eperezma, jiaqiyan, rientjes, weixugc, virtualization,
	linux-kernel

Hi everyone,

Friendly ping. Apologies if you are getting this the second time - my
last ping wasn't in plain text mode and got rejected by some mailing
lists.

Please let me know if anyone has had a chance to look at this RFC
patch, or if any changes are needed.

Thanks,
Link

On Tue, Apr 21, 2026 at 2:24 PM Link Lin <linkl@google.com> wrote:
>
> Hi everyone,
>
> Friendly ping on this RFC patch. Please let me know if anyone has had a chance to look at this, or if any changes are needed.
>
> Thanks,
> Link
>
> On Tue, Apr 7, 2026 at 2:25 PM Link Lin <linkl@google.com> wrote:
>>
>> In environments where free page reporting is disabled, a kernel
>> panic is triggered when tearing down the virtio_balloon module:
>>
>>   [12261.808190] Call trace:
>>   [12261.808471]  __list_del_entry_valid_or_report+0x18/0xe0
>>   [12261.809064]  vp_del_vqs+0x12c/0x270
>>   [12261.809462]  remove_common+0x80/0x98 [virtio_balloon]
>>   [12261.810034]  virtballoon_remove+0xfc/0x158 [virtio_balloon]
>>   [12261.810663]  virtio_dev_remove+0x68/0xf8
>>   [12261.811108]  device_release_driver_internal+0x17c/0x278
>>   [12261.811701]  driver_detach+0xd4/0x138
>>   [12261.812117]  bus_remove_driver+0x90/0xd0
>>   [12261.812562]  driver_unregister+0x40/0x70
>>   [12261.813006]  unregister_virtio_driver+0x20/0x38
>>   [12261.813518]  cleanup_module+0x20/0x7a8 [virtio_balloon]
>>   [12261.814109]  __arm64_sys_delete_module+0x278/0x3d0
>>   [12261.814654]  invoke_syscall+0x5c/0x120
>>   [12261.815086]  el0_svc_common+0x90/0xf8
>>   [12261.815506]  do_el0_svc+0x2c/0x48
>>   [12261.815883]  el0_svc+0x3c/0xa8
>>   [12261.816235]  el0t_64_sync_handler+0x8c/0x108
>>   [12261.816724]  el0t_64_sync+0x198/0x1a0
>>
>> The issue originates in vp_find_vqs_intx(). It kzalloc_objs() based
>> on the nvqs count provided by the caller, virtio_balloon::init_vqs().
>> However, it is not always the case that all nvqs number of
>> virtio_pci_vq_info objects will be properly populated.
>>
>> For example, when VIRTIO_BALLOON_F_FREE_PAGE_HINT is absent, the
>> VIRTIO_BALLOON_VQ_FREE_PAGE-th item in the vp_dev->vqs array is
>> actually never populated, and is still a zeroe-initialized
>> virtio_pci_vq_info object, which is eventually going to trigger
>> a __list_del_entry_valid_or_report() crash.
>>
>> Tested by applying this patch to a guest VM kernel with the
>> VIRTIO_BALLOON_F_REPORTING feature enabled and the
>> VIRTIO_BALLOON_F_FREE_PAGE_HINT feature disabled.
>> Without this patch, unloading the virtio_balloon module triggers a panic.
>> With this patch, no panic is observed.
>>
>> The fix is to use queue_idx to handle the case that vp_find_vqs_intx()
>> skips vp_setup_vq() when caller provided null vqs_info[i].name, when
>> the caller doesn't populate all nvqs number of virtqueue_info objects.
>> Invariantly queue_idx is the correct index to store a successfully
>> created and populated virtio_pci_vq_info object. As a result, now
>> a virtio_pci_device object only stores queue_idx number of valid
>> virtio_pci_vq_info objects in its vqs array when the for-loop over
>> nvqs finishes (of course, without goto out_del_vqs).
>>
>> vp_find_vqs_msix() has similar issue, so fix it in the same way.
>>
>> This patch is marked as RFC because we are uncertain if any virtio-pci
>> code implicitly requires virtio_pci_device's vqs array to always
>> contain nvqs number of virtio_pci_vq_info objects, and to store
>> zero-initialized virtio_pci_vq_info objects. We have not observed
>> any issues in our testing, but insights or alternatives are welcome!
>>
>> Signed-off-by: Link Lin <linkl@google.com>
>> Co-developed-by: Jiaqi Yan <jiaqiyan@google.com>
>> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
>> ---
>>  drivers/virtio/virtio_pci_common.c | 10 ++++++----
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
>> index da97b6a988de..9b32301529e5 100644
>> --- a/drivers/virtio/virtio_pci_common.c
>> +++ b/drivers/virtio/virtio_pci_common.c
>> @@ -423,14 +423,15 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
>>                         vqs[i] = NULL;
>>                         continue;
>>                 }
>> -               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx++, vqi->callback,
>> +               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx, vqi->callback,
>>                                              vqi->name, vqi->ctx, false,
>>                                              &allocated_vectors, vector_policy,
>> -                                            &vp_dev->vqs[i]);
>> +                                            &vp_dev->vqs[queue_idx]);
>>                 if (IS_ERR(vqs[i])) {
>>                         err = PTR_ERR(vqs[i]);
>>                         goto error_find;
>>                 }
>> +               ++queue_idx;
>>         }
>>
>>         if (!avq_num)
>> @@ -485,13 +486,14 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
>>                         vqs[i] = NULL;
>>                         continue;
>>                 }
>> -               vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
>> +               vqs[i] = vp_setup_vq(vdev, queue_idx, vqi->callback,
>>                                      vqi->name, vqi->ctx,
>> -                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[i]);
>> +                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[queue_idx]);
>>                 if (IS_ERR(vqs[i])) {
>>                         err = PTR_ERR(vqs[i]);
>>                         goto out_del_vqs;
>>                 }
>> +               ++queue_idx;
>>         }
>>
>>         if (!avq_num)
>> --
>> 2.53.0.1213.gd9a14994de-goog

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v1] virtio_pci: only store successfully populated virtio_pci_vq_info
       [not found] ` <CALUx4KQKNYZm5AZzQXNqLRdGAT0nQOpmn_Lz6WHie73w1d9JQA@mail.gmail.com>
  2026-04-21 21:47   ` Link Lin
@ 2026-04-21 21:50   ` Michael S. Tsirkin
  1 sibling, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 21:50 UTC (permalink / raw)
  To: Link Lin
  Cc: jasowang, xuanzhuo, eperezma, jiaqiyan, rientjes, weixugc,
	virtualization, linux-kernel

On Tue, Apr 21, 2026 at 02:24:16PM -0700, Link Lin wrote:
> Hi everyone,
> 
> Friendly ping on this RFC patch. Please let me know if anyone has had a chance
> to look at this, or if any changes are needed.
> 
> Thanks,
> Link
> 
> On Tue, Apr 7, 2026 at 2:25 PM Link Lin <linkl@google.com> wrote:
> 
>     In environments where free page reporting is disabled, a kernel
>     panic is triggered when tearing down the virtio_balloon module:
> 
>       [12261.808190] Call trace:
>       [12261.808471]  __list_del_entry_valid_or_report+0x18/0xe0
>       [12261.809064]  vp_del_vqs+0x12c/0x270
>       [12261.809462]  remove_common+0x80/0x98 [virtio_balloon]
>       [12261.810034]  virtballoon_remove+0xfc/0x158 [virtio_balloon]
>       [12261.810663]  virtio_dev_remove+0x68/0xf8
>       [12261.811108]  device_release_driver_internal+0x17c/0x278
>       [12261.811701]  driver_detach+0xd4/0x138
>       [12261.812117]  bus_remove_driver+0x90/0xd0
>       [12261.812562]  driver_unregister+0x40/0x70
>       [12261.813006]  unregister_virtio_driver+0x20/0x38
>       [12261.813518]  cleanup_module+0x20/0x7a8 [virtio_balloon]
>       [12261.814109]  __arm64_sys_delete_module+0x278/0x3d0
>       [12261.814654]  invoke_syscall+0x5c/0x120
>       [12261.815086]  el0_svc_common+0x90/0xf8
>       [12261.815506]  do_el0_svc+0x2c/0x48
>       [12261.815883]  el0_svc+0x3c/0xa8
>       [12261.816235]  el0t_64_sync_handler+0x8c/0x108
>       [12261.816724]  el0t_64_sync+0x198/0x1a0
> 
>     The issue originates in vp_find_vqs_intx(). It kzalloc_objs() based
>     on the nvqs count provided by the caller, virtio_balloon::init_vqs().
>     However, it is not always the case that all nvqs number of
>     virtio_pci_vq_info objects will be properly populated.
> 
>     For example, when VIRTIO_BALLOON_F_FREE_PAGE_HINT is absent, the
>     VIRTIO_BALLOON_VQ_FREE_PAGE-th item in the vp_dev->vqs array is
>     actually never populated, and is still a zeroe-initialized
>     virtio_pci_vq_info object, which is eventually going to trigger
>     a __list_del_entry_valid_or_report() crash.
> 
>     Tested by applying this patch to a guest VM kernel with the
>     VIRTIO_BALLOON_F_REPORTING feature enabled and the
>     VIRTIO_BALLOON_F_FREE_PAGE_HINT feature disabled.
>     Without this patch, unloading the virtio_balloon module triggers a panic.
>     With this patch, no panic is observed.
> 
>     The fix is to use queue_idx to handle the case that vp_find_vqs_intx()
>     skips vp_setup_vq() when caller provided null vqs_info[i].name, when
>     the caller doesn't populate all nvqs number of virtqueue_info objects.
>     Invariantly queue_idx is the correct index to store a successfully
>     created and populated virtio_pci_vq_info object. As a result, now
>     a virtio_pci_device object only stores queue_idx number of valid
>     virtio_pci_vq_info objects in its vqs array when the for-loop over
>     nvqs finishes (of course, without goto out_del_vqs).
> 
>     vp_find_vqs_msix() has similar issue, so fix it in the same way.
> 
>     This patch is marked as RFC because we are uncertain if any virtio-pci
>     code implicitly requires virtio_pci_device's vqs array to always
>     contain nvqs number of virtio_pci_vq_info objects, and to store
>     zero-initialized virtio_pci_vq_info objects. We have not observed
>     any issues in our testing, but insights or alternatives are welcome!
> 
>     Signed-off-by: Link Lin <linkl@google.com>
>     Co-developed-by: Jiaqi Yan <jiaqiyan@google.com>
>     Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
>     ---
>      drivers/virtio/virtio_pci_common.c | 10 ++++++----
>      1 file changed, 6 insertions(+), 4 deletions(-)
> 
>     diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/
>     virtio_pci_common.c
>     index da97b6a988de..9b32301529e5 100644
>     --- a/drivers/virtio/virtio_pci_common.c
>     +++ b/drivers/virtio/virtio_pci_common.c
>     @@ -423,14 +423,15 @@ static int vp_find_vqs_msix(struct virtio_device
>     *vdev, unsigned int nvqs,
>                             vqs[i] = NULL;
>                             continue;
>                     }
>     -               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx++, vqi->
>     callback,
>     +               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx, vqi->
>     callback,
>                                                  vqi->name, vqi->ctx, false,
>                                                  &allocated_vectors,
>     vector_policy,
>     -                                            &vp_dev->vqs[i]);
>     +                                            &vp_dev->vqs[queue_idx]);
>                     if (IS_ERR(vqs[i])) {
>                             err = PTR_ERR(vqs[i]);
>                             goto error_find;
>                     }
>     +               ++queue_idx;
>             }
> 
>             if (!avq_num)
>     @@ -485,13 +486,14 @@ static int vp_find_vqs_intx(struct virtio_device
>     *vdev, unsigned int nvqs,
>                             vqs[i] = NULL;
>                             continue;
>                     }
>     -               vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
>     +               vqs[i] = vp_setup_vq(vdev, queue_idx, vqi->callback,
>                                          vqi->name, vqi->ctx,
>     -                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs
>     [i]);
>     +                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs
>     [queue_idx]);
>                     if (IS_ERR(vqs[i])) {
>                             err = PTR_ERR(vqs[i]);
>                             goto out_del_vqs;
>                     }
>     +               ++queue_idx;
>             }
> 
>             if (!avq_num)
>     --
>     2.53.0.1213.gd9a14994de-goog
> 

I have this in my tree:

https://lore.kernel.org/all/20260315141808.547081-1-ammarfaizi2@openresty.com/


same?

-- 
MST


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v1] virtio_pci: only store successfully populated virtio_pci_vq_info
  2026-04-21 21:47   ` Link Lin
@ 2026-04-21 21:51     ` Michael S. Tsirkin
  2026-04-21 22:15       ` Link Lin
  0 siblings, 1 reply; 6+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 21:51 UTC (permalink / raw)
  To: Link Lin
  Cc: jasowang, xuanzhuo, eperezma, jiaqiyan, rientjes, weixugc,
	virtualization, linux-kernel

On Tue, Apr 21, 2026 at 02:47:32PM -0700, Link Lin wrote:
> Hi everyone,
> 
> Friendly ping. Apologies if you are getting this the second time - my
> last ping wasn't in plain text mode and got rejected by some mailing
> lists.
> 
> Please let me know if anyone has had a chance to look at this RFC
> patch, or if any changes are needed.
> 
> Thanks,
> Link
> 
> On Tue, Apr 21, 2026 at 2:24 PM Link Lin <linkl@google.com> wrote:
> >
> > Hi everyone,
> >
> > Friendly ping on this RFC patch. Please let me know if anyone has had a chance to look at this, or if any changes are needed.
> >
> > Thanks,
> > Link
> >
> > On Tue, Apr 7, 2026 at 2:25 PM Link Lin <linkl@google.com> wrote:
> >>
> >> In environments where free page reporting is disabled, a kernel
> >> panic is triggered when tearing down the virtio_balloon module:
> >>
> >>   [12261.808190] Call trace:
> >>   [12261.808471]  __list_del_entry_valid_or_report+0x18/0xe0
> >>   [12261.809064]  vp_del_vqs+0x12c/0x270
> >>   [12261.809462]  remove_common+0x80/0x98 [virtio_balloon]
> >>   [12261.810034]  virtballoon_remove+0xfc/0x158 [virtio_balloon]
> >>   [12261.810663]  virtio_dev_remove+0x68/0xf8
> >>   [12261.811108]  device_release_driver_internal+0x17c/0x278
> >>   [12261.811701]  driver_detach+0xd4/0x138
> >>   [12261.812117]  bus_remove_driver+0x90/0xd0
> >>   [12261.812562]  driver_unregister+0x40/0x70
> >>   [12261.813006]  unregister_virtio_driver+0x20/0x38
> >>   [12261.813518]  cleanup_module+0x20/0x7a8 [virtio_balloon]
> >>   [12261.814109]  __arm64_sys_delete_module+0x278/0x3d0
> >>   [12261.814654]  invoke_syscall+0x5c/0x120
> >>   [12261.815086]  el0_svc_common+0x90/0xf8
> >>   [12261.815506]  do_el0_svc+0x2c/0x48
> >>   [12261.815883]  el0_svc+0x3c/0xa8
> >>   [12261.816235]  el0t_64_sync_handler+0x8c/0x108
> >>   [12261.816724]  el0t_64_sync+0x198/0x1a0
> >>
> >> The issue originates in vp_find_vqs_intx(). It kzalloc_objs() based
> >> on the nvqs count provided by the caller, virtio_balloon::init_vqs().
> >> However, it is not always the case that all nvqs number of
> >> virtio_pci_vq_info objects will be properly populated.
> >>
> >> For example, when VIRTIO_BALLOON_F_FREE_PAGE_HINT is absent, the
> >> VIRTIO_BALLOON_VQ_FREE_PAGE-th item in the vp_dev->vqs array is
> >> actually never populated, and is still a zeroe-initialized
> >> virtio_pci_vq_info object, which is eventually going to trigger
> >> a __list_del_entry_valid_or_report() crash.
> >>
> >> Tested by applying this patch to a guest VM kernel with the
> >> VIRTIO_BALLOON_F_REPORTING feature enabled and the
> >> VIRTIO_BALLOON_F_FREE_PAGE_HINT feature disabled.
> >> Without this patch, unloading the virtio_balloon module triggers a panic.
> >> With this patch, no panic is observed.
> >>
> >> The fix is to use queue_idx to handle the case that vp_find_vqs_intx()
> >> skips vp_setup_vq() when caller provided null vqs_info[i].name, when
> >> the caller doesn't populate all nvqs number of virtqueue_info objects.
> >> Invariantly queue_idx is the correct index to store a successfully
> >> created and populated virtio_pci_vq_info object. As a result, now
> >> a virtio_pci_device object only stores queue_idx number of valid
> >> virtio_pci_vq_info objects in its vqs array when the for-loop over
> >> nvqs finishes (of course, without goto out_del_vqs).
> >>
> >> vp_find_vqs_msix() has similar issue, so fix it in the same way.
> >>
> >> This patch is marked as RFC because we are uncertain if any virtio-pci
> >> code implicitly requires virtio_pci_device's vqs array to always
> >> contain nvqs number of virtio_pci_vq_info objects, and to store
> >> zero-initialized virtio_pci_vq_info objects. We have not observed
> >> any issues in our testing, but insights or alternatives are welcome!
> >>
> >> Signed-off-by: Link Lin <linkl@google.com>
> >> Co-developed-by: Jiaqi Yan <jiaqiyan@google.com>
> >> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> >> ---
> >>  drivers/virtio/virtio_pci_common.c | 10 ++++++----
> >>  1 file changed, 6 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> >> index da97b6a988de..9b32301529e5 100644
> >> --- a/drivers/virtio/virtio_pci_common.c
> >> +++ b/drivers/virtio/virtio_pci_common.c
> >> @@ -423,14 +423,15 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
> >>                         vqs[i] = NULL;
> >>                         continue;
> >>                 }
> >> -               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx++, vqi->callback,
> >> +               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx, vqi->callback,
> >>                                              vqi->name, vqi->ctx, false,
> >>                                              &allocated_vectors, vector_policy,
> >> -                                            &vp_dev->vqs[i]);
> >> +                                            &vp_dev->vqs[queue_idx]);
> >>                 if (IS_ERR(vqs[i])) {
> >>                         err = PTR_ERR(vqs[i]);
> >>                         goto error_find;
> >>                 }
> >> +               ++queue_idx;
> >>         }
> >>
> >>         if (!avq_num)
> >> @@ -485,13 +486,14 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
> >>                         vqs[i] = NULL;
> >>                         continue;
> >>                 }
> >> -               vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> >> +               vqs[i] = vp_setup_vq(vdev, queue_idx, vqi->callback,
> >>                                      vqi->name, vqi->ctx,
> >> -                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[i]);
> >> +                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[queue_idx]);
> >>                 if (IS_ERR(vqs[i])) {
> >>                         err = PTR_ERR(vqs[i]);
> >>                         goto out_del_vqs;
> >>                 }
> >> +               ++queue_idx;
> >>         }
> >>
> >>         if (!avq_num)
> >> --
> >> 2.53.0.1213.gd9a14994de-goog


I have this in my tree:

https://lore.kernel.org/all/20260315141808.547081-1-ammarfaizi2@openresty.com/


same?



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v1] virtio_pci: only store successfully populated virtio_pci_vq_info
  2026-04-21 21:51     ` Michael S. Tsirkin
@ 2026-04-21 22:15       ` Link Lin
  2026-04-21 22:16         ` Michael S. Tsirkin
  0 siblings, 1 reply; 6+ messages in thread
From: Link Lin @ 2026-04-21 22:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, xuanzhuo, eperezma, jiaqiyan, rientjes, weixugc,
	virtualization, linux-kernel, ammarfaizi2

Hi Michael,

That's essentially the same fix as ours but one month earlier. That's
great news!

Quick follow-up question: Do you know if this will make it into v7.1?

I also noticed the patch has the "Cc: stable@vger.kernel.org # v6.11+"
tag, which is perfect since we actually hit this bug in v6.12.

I wasn't aware of Ammar's patch as I wasn't subscribed to the mailing
list. Looks like we tried to reinvent the wheel ^_^;

Sincerely,
Link


On Tue, Apr 21, 2026 at 2:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Apr 21, 2026 at 02:47:32PM -0700, Link Lin wrote:
> > Hi everyone,
> >
> > Friendly ping. Apologies if you are getting this the second time - my
> > last ping wasn't in plain text mode and got rejected by some mailing
> > lists.
> >
> > Please let me know if anyone has had a chance to look at this RFC
> > patch, or if any changes are needed.
> >
> > Thanks,
> > Link
> >
> > On Tue, Apr 21, 2026 at 2:24 PM Link Lin <linkl@google.com> wrote:
> > >
> > > Hi everyone,
> > >
> > > Friendly ping on this RFC patch. Please let me know if anyone has had a chance to look at this, or if any changes are needed.
> > >
> > > Thanks,
> > > Link
> > >
> > > On Tue, Apr 7, 2026 at 2:25 PM Link Lin <linkl@google.com> wrote:
> > >>
> > >> In environments where free page reporting is disabled, a kernel
> > >> panic is triggered when tearing down the virtio_balloon module:
> > >>
> > >>   [12261.808190] Call trace:
> > >>   [12261.808471]  __list_del_entry_valid_or_report+0x18/0xe0
> > >>   [12261.809064]  vp_del_vqs+0x12c/0x270
> > >>   [12261.809462]  remove_common+0x80/0x98 [virtio_balloon]
> > >>   [12261.810034]  virtballoon_remove+0xfc/0x158 [virtio_balloon]
> > >>   [12261.810663]  virtio_dev_remove+0x68/0xf8
> > >>   [12261.811108]  device_release_driver_internal+0x17c/0x278
> > >>   [12261.811701]  driver_detach+0xd4/0x138
> > >>   [12261.812117]  bus_remove_driver+0x90/0xd0
> > >>   [12261.812562]  driver_unregister+0x40/0x70
> > >>   [12261.813006]  unregister_virtio_driver+0x20/0x38
> > >>   [12261.813518]  cleanup_module+0x20/0x7a8 [virtio_balloon]
> > >>   [12261.814109]  __arm64_sys_delete_module+0x278/0x3d0
> > >>   [12261.814654]  invoke_syscall+0x5c/0x120
> > >>   [12261.815086]  el0_svc_common+0x90/0xf8
> > >>   [12261.815506]  do_el0_svc+0x2c/0x48
> > >>   [12261.815883]  el0_svc+0x3c/0xa8
> > >>   [12261.816235]  el0t_64_sync_handler+0x8c/0x108
> > >>   [12261.816724]  el0t_64_sync+0x198/0x1a0
> > >>
> > >> The issue originates in vp_find_vqs_intx(). It kzalloc_objs() based
> > >> on the nvqs count provided by the caller, virtio_balloon::init_vqs().
> > >> However, it is not always the case that all nvqs number of
> > >> virtio_pci_vq_info objects will be properly populated.
> > >>
> > >> For example, when VIRTIO_BALLOON_F_FREE_PAGE_HINT is absent, the
> > >> VIRTIO_BALLOON_VQ_FREE_PAGE-th item in the vp_dev->vqs array is
> > >> actually never populated, and is still a zeroe-initialized
> > >> virtio_pci_vq_info object, which is eventually going to trigger
> > >> a __list_del_entry_valid_or_report() crash.
> > >>
> > >> Tested by applying this patch to a guest VM kernel with the
> > >> VIRTIO_BALLOON_F_REPORTING feature enabled and the
> > >> VIRTIO_BALLOON_F_FREE_PAGE_HINT feature disabled.
> > >> Without this patch, unloading the virtio_balloon module triggers a panic.
> > >> With this patch, no panic is observed.
> > >>
> > >> The fix is to use queue_idx to handle the case that vp_find_vqs_intx()
> > >> skips vp_setup_vq() when caller provided null vqs_info[i].name, when
> > >> the caller doesn't populate all nvqs number of virtqueue_info objects.
> > >> Invariantly queue_idx is the correct index to store a successfully
> > >> created and populated virtio_pci_vq_info object. As a result, now
> > >> a virtio_pci_device object only stores queue_idx number of valid
> > >> virtio_pci_vq_info objects in its vqs array when the for-loop over
> > >> nvqs finishes (of course, without goto out_del_vqs).
> > >>
> > >> vp_find_vqs_msix() has similar issue, so fix it in the same way.
> > >>
> > >> This patch is marked as RFC because we are uncertain if any virtio-pci
> > >> code implicitly requires virtio_pci_device's vqs array to always
> > >> contain nvqs number of virtio_pci_vq_info objects, and to store
> > >> zero-initialized virtio_pci_vq_info objects. We have not observed
> > >> any issues in our testing, but insights or alternatives are welcome!
> > >>
> > >> Signed-off-by: Link Lin <linkl@google.com>
> > >> Co-developed-by: Jiaqi Yan <jiaqiyan@google.com>
> > >> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > >> ---
> > >>  drivers/virtio/virtio_pci_common.c | 10 ++++++----
> > >>  1 file changed, 6 insertions(+), 4 deletions(-)
> > >>
> > >> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > >> index da97b6a988de..9b32301529e5 100644
> > >> --- a/drivers/virtio/virtio_pci_common.c
> > >> +++ b/drivers/virtio/virtio_pci_common.c
> > >> @@ -423,14 +423,15 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
> > >>                         vqs[i] = NULL;
> > >>                         continue;
> > >>                 }
> > >> -               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx++, vqi->callback,
> > >> +               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx, vqi->callback,
> > >>                                              vqi->name, vqi->ctx, false,
> > >>                                              &allocated_vectors, vector_policy,
> > >> -                                            &vp_dev->vqs[i]);
> > >> +                                            &vp_dev->vqs[queue_idx]);
> > >>                 if (IS_ERR(vqs[i])) {
> > >>                         err = PTR_ERR(vqs[i]);
> > >>                         goto error_find;
> > >>                 }
> > >> +               ++queue_idx;
> > >>         }
> > >>
> > >>         if (!avq_num)
> > >> @@ -485,13 +486,14 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
> > >>                         vqs[i] = NULL;
> > >>                         continue;
> > >>                 }
> > >> -               vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> > >> +               vqs[i] = vp_setup_vq(vdev, queue_idx, vqi->callback,
> > >>                                      vqi->name, vqi->ctx,
> > >> -                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[i]);
> > >> +                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[queue_idx]);
> > >>                 if (IS_ERR(vqs[i])) {
> > >>                         err = PTR_ERR(vqs[i]);
> > >>                         goto out_del_vqs;
> > >>                 }
> > >> +               ++queue_idx;
> > >>         }
> > >>
> > >>         if (!avq_num)
> > >> --
> > >> 2.53.0.1213.gd9a14994de-goog
>
>
> I have this in my tree:
>
> https://lore.kernel.org/all/20260315141808.547081-1-ammarfaizi2@openresty.com/
>
>
> same?
>
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [RFC PATCH v1] virtio_pci: only store successfully populated virtio_pci_vq_info
  2026-04-21 22:15       ` Link Lin
@ 2026-04-21 22:16         ` Michael S. Tsirkin
  0 siblings, 0 replies; 6+ messages in thread
From: Michael S. Tsirkin @ 2026-04-21 22:16 UTC (permalink / raw)
  To: Link Lin
  Cc: jasowang, xuanzhuo, eperezma, jiaqiyan, rientjes, weixugc,
	virtualization, linux-kernel, ammarfaizi2

On Tue, Apr 21, 2026 at 03:15:55PM -0700, Link Lin wrote:
> Hi Michael,
> 
> That's essentially the same fix as ours but one month earlier. That's
> great news!
> 
> Quick follow-up question: Do you know if this will make it into v7.1?


I'll try.

> I also noticed the patch has the "Cc: stable@vger.kernel.org # v6.11+"
> tag, which is perfect since we actually hit this bug in v6.12.
> 
> I wasn't aware of Ammar's patch as I wasn't subscribed to the mailing
> list. Looks like we tried to reinvent the wheel ^_^;
> 
> Sincerely,
> Link
> 
> 
> On Tue, Apr 21, 2026 at 2:51 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Tue, Apr 21, 2026 at 02:47:32PM -0700, Link Lin wrote:
> > > Hi everyone,
> > >
> > > Friendly ping. Apologies if you are getting this the second time - my
> > > last ping wasn't in plain text mode and got rejected by some mailing
> > > lists.
> > >
> > > Please let me know if anyone has had a chance to look at this RFC
> > > patch, or if any changes are needed.
> > >
> > > Thanks,
> > > Link
> > >
> > > On Tue, Apr 21, 2026 at 2:24 PM Link Lin <linkl@google.com> wrote:
> > > >
> > > > Hi everyone,
> > > >
> > > > Friendly ping on this RFC patch. Please let me know if anyone has had a chance to look at this, or if any changes are needed.
> > > >
> > > > Thanks,
> > > > Link
> > > >
> > > > On Tue, Apr 7, 2026 at 2:25 PM Link Lin <linkl@google.com> wrote:
> > > >>
> > > >> In environments where free page reporting is disabled, a kernel
> > > >> panic is triggered when tearing down the virtio_balloon module:
> > > >>
> > > >>   [12261.808190] Call trace:
> > > >>   [12261.808471]  __list_del_entry_valid_or_report+0x18/0xe0
> > > >>   [12261.809064]  vp_del_vqs+0x12c/0x270
> > > >>   [12261.809462]  remove_common+0x80/0x98 [virtio_balloon]
> > > >>   [12261.810034]  virtballoon_remove+0xfc/0x158 [virtio_balloon]
> > > >>   [12261.810663]  virtio_dev_remove+0x68/0xf8
> > > >>   [12261.811108]  device_release_driver_internal+0x17c/0x278
> > > >>   [12261.811701]  driver_detach+0xd4/0x138
> > > >>   [12261.812117]  bus_remove_driver+0x90/0xd0
> > > >>   [12261.812562]  driver_unregister+0x40/0x70
> > > >>   [12261.813006]  unregister_virtio_driver+0x20/0x38
> > > >>   [12261.813518]  cleanup_module+0x20/0x7a8 [virtio_balloon]
> > > >>   [12261.814109]  __arm64_sys_delete_module+0x278/0x3d0
> > > >>   [12261.814654]  invoke_syscall+0x5c/0x120
> > > >>   [12261.815086]  el0_svc_common+0x90/0xf8
> > > >>   [12261.815506]  do_el0_svc+0x2c/0x48
> > > >>   [12261.815883]  el0_svc+0x3c/0xa8
> > > >>   [12261.816235]  el0t_64_sync_handler+0x8c/0x108
> > > >>   [12261.816724]  el0t_64_sync+0x198/0x1a0
> > > >>
> > > >> The issue originates in vp_find_vqs_intx(). It kzalloc_objs() based
> > > >> on the nvqs count provided by the caller, virtio_balloon::init_vqs().
> > > >> However, it is not always the case that all nvqs number of
> > > >> virtio_pci_vq_info objects will be properly populated.
> > > >>
> > > >> For example, when VIRTIO_BALLOON_F_FREE_PAGE_HINT is absent, the
> > > >> VIRTIO_BALLOON_VQ_FREE_PAGE-th item in the vp_dev->vqs array is
> > > >> actually never populated, and is still a zeroe-initialized
> > > >> virtio_pci_vq_info object, which is eventually going to trigger
> > > >> a __list_del_entry_valid_or_report() crash.
> > > >>
> > > >> Tested by applying this patch to a guest VM kernel with the
> > > >> VIRTIO_BALLOON_F_REPORTING feature enabled and the
> > > >> VIRTIO_BALLOON_F_FREE_PAGE_HINT feature disabled.
> > > >> Without this patch, unloading the virtio_balloon module triggers a panic.
> > > >> With this patch, no panic is observed.
> > > >>
> > > >> The fix is to use queue_idx to handle the case that vp_find_vqs_intx()
> > > >> skips vp_setup_vq() when caller provided null vqs_info[i].name, when
> > > >> the caller doesn't populate all nvqs number of virtqueue_info objects.
> > > >> Invariantly queue_idx is the correct index to store a successfully
> > > >> created and populated virtio_pci_vq_info object. As a result, now
> > > >> a virtio_pci_device object only stores queue_idx number of valid
> > > >> virtio_pci_vq_info objects in its vqs array when the for-loop over
> > > >> nvqs finishes (of course, without goto out_del_vqs).
> > > >>
> > > >> vp_find_vqs_msix() has similar issue, so fix it in the same way.
> > > >>
> > > >> This patch is marked as RFC because we are uncertain if any virtio-pci
> > > >> code implicitly requires virtio_pci_device's vqs array to always
> > > >> contain nvqs number of virtio_pci_vq_info objects, and to store
> > > >> zero-initialized virtio_pci_vq_info objects. We have not observed
> > > >> any issues in our testing, but insights or alternatives are welcome!
> > > >>
> > > >> Signed-off-by: Link Lin <linkl@google.com>
> > > >> Co-developed-by: Jiaqi Yan <jiaqiyan@google.com>
> > > >> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> > > >> ---
> > > >>  drivers/virtio/virtio_pci_common.c | 10 ++++++----
> > > >>  1 file changed, 6 insertions(+), 4 deletions(-)
> > > >>
> > > >> diff --git a/drivers/virtio/virtio_pci_common.c b/drivers/virtio/virtio_pci_common.c
> > > >> index da97b6a988de..9b32301529e5 100644
> > > >> --- a/drivers/virtio/virtio_pci_common.c
> > > >> +++ b/drivers/virtio/virtio_pci_common.c
> > > >> @@ -423,14 +423,15 @@ static int vp_find_vqs_msix(struct virtio_device *vdev, unsigned int nvqs,
> > > >>                         vqs[i] = NULL;
> > > >>                         continue;
> > > >>                 }
> > > >> -               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx++, vqi->callback,
> > > >> +               vqs[i] = vp_find_one_vq_msix(vdev, queue_idx, vqi->callback,
> > > >>                                              vqi->name, vqi->ctx, false,
> > > >>                                              &allocated_vectors, vector_policy,
> > > >> -                                            &vp_dev->vqs[i]);
> > > >> +                                            &vp_dev->vqs[queue_idx]);
> > > >>                 if (IS_ERR(vqs[i])) {
> > > >>                         err = PTR_ERR(vqs[i]);
> > > >>                         goto error_find;
> > > >>                 }
> > > >> +               ++queue_idx;
> > > >>         }
> > > >>
> > > >>         if (!avq_num)
> > > >> @@ -485,13 +486,14 @@ static int vp_find_vqs_intx(struct virtio_device *vdev, unsigned int nvqs,
> > > >>                         vqs[i] = NULL;
> > > >>                         continue;
> > > >>                 }
> > > >> -               vqs[i] = vp_setup_vq(vdev, queue_idx++, vqi->callback,
> > > >> +               vqs[i] = vp_setup_vq(vdev, queue_idx, vqi->callback,
> > > >>                                      vqi->name, vqi->ctx,
> > > >> -                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[i]);
> > > >> +                                    VIRTIO_MSI_NO_VECTOR, &vp_dev->vqs[queue_idx]);
> > > >>                 if (IS_ERR(vqs[i])) {
> > > >>                         err = PTR_ERR(vqs[i]);
> > > >>                         goto out_del_vqs;
> > > >>                 }
> > > >> +               ++queue_idx;
> > > >>         }
> > > >>
> > > >>         if (!avq_num)
> > > >> --
> > > >> 2.53.0.1213.gd9a14994de-goog
> >
> >
> > I have this in my tree:
> >
> > https://lore.kernel.org/all/20260315141808.547081-1-ammarfaizi2@openresty.com/
> >
> >
> > same?
> >
> >


^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2026-04-21 22:16 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-07 21:25 [RFC PATCH v1] virtio_pci: only store successfully populated virtio_pci_vq_info Link Lin
     [not found] ` <CALUx4KQKNYZm5AZzQXNqLRdGAT0nQOpmn_Lz6WHie73w1d9JQA@mail.gmail.com>
2026-04-21 21:47   ` Link Lin
2026-04-21 21:51     ` Michael S. Tsirkin
2026-04-21 22:15       ` Link Lin
2026-04-21 22:16         ` Michael S. Tsirkin
2026-04-21 21:50   ` Michael S. Tsirkin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox