Re: [RFC v5 0/7] Add packed format to shadow virtqueue

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Sahil Siddiq <icegambit91@gmail.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
	sahilcdq@proton.me
Subject: Re: [RFC v5 0/7] Add packed format to shadow virtqueue
Date: Mon, 4 Aug 2025 11:34:26 +0530	[thread overview]
Message-ID: <15d4f488-5431-460c-8e06-0741e11fe7f2@gmail.com> (raw)
In-Reply-To: <CAJaqyWcWoyGWeOG1B8FBZgGb6n99fVkTAvaTFco0A2cvr8Lf8w@mail.gmail.com>

Hi,

On 7/31/25 7:22 PM, Eugenio Perez Martin wrote:
> On Wed, Jul 30, 2025 at 4:33 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>> I think I have finally found the reason behind this issue.
>>
>> The order in which "add_packed" and "get_buf_packed" are performed in the
>> nested guest kernel (L2 kernel) and QEMU are a little different. Due to
>> this, the values in free_head and svq->desc_next[] differ and the guest
>> crashes at some point. More below.
>>
>> On 6/26/25 1:07 PM, Eugenio Perez Martin wrote:
>>> On Thu, Jun 26, 2025 at 7:16 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>> I think there's something off in the way "free_head", "last_used_idx" and
>>>> "desc_next" values are calculated in vhost_svq_get_buf_packed() [1].
>>>>
>>>> In the latest test run, QEMU sent ids 0 through 28 to L2. L2 started receiving
>>>> them in order till id 8. At this point it received id 7 again for some reason
>>>> and then crashed.
>>>>
>>>> L2:
>>>>
>>>> [ 1641.129218] (prepare_packed) output.0 -> needs_kick: 1
>>>> [ 1641.130621] (notify) output.0 -> function will return true
>>>> [ 1641.132022] output.0 -> id: 0
>>>> [ 1739.502358] input.0 -> id: 0
>>>> [ 1739.503003] input.0 -> id: 1
>>>> [ 1739.562024] input.0 -> id: 2
>>>> [ 1739.578682] input.0 -> id: 3
>>>> [ 1739.661913] input.0 -> id: 4
>>>> [ 1739.828796] input.0 -> id: 5
>>>> [ 1739.829789] input.0 -> id: 6
>>>> [ 1740.078757] input.0 -> id: 7
>>>> [ 1740.079749] input.0 -> id: 8
>>>> [ 1740.080382] input.0 -> id: 7    <----Received 7 again
>>>> [ 1740.081614] virtio_net virtio1: input.0:id 7 is not a head!
>>>>
>>>> QEMU logs (vhost_svq_get_buf_packed):
>>>> ------
>>>> size              : svq->vring.num
>>>> len               : svq->vring_packed.vring.desc[last_used].len
>>>> id                : svq->vring_packed.vring.desc[last_used].id
>>>> num               : svq->desc_state[id].ndescs
>>>> last_used_chain   : Result of vhost_svq_last_desc_of_chain(svq, num, id) [2]
>>>> free_head         : svq->free_head
>>>> last_used         : (last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR)) + num
>>>> used_wrap_counter : !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR))
>>>> ------
>>>> size: 256, len: 82, id: 7, vq idx: 0
>>>> id: 7, last_used_chain: 7, free_head: 6, vq idx: 0
>>>> num: 1, free_head: 7, id: 7, last_used: 8, used_wrap_counter: 1, vq idx: 0
>>>> ------
>>>> size: 256, len: 104, id: 8, vq idx: 0
>>>> id: 8, last_used_chain: 8, free_head: 7, vq idx: 0
>>>> num: 1, free_head: 8, id: 8, last_used: 9, used_wrap_counter: 1, vq idx: 0
>>>> ------
>>>> size: 256, len: 98, id: 9, vq idx: 0
>>>> id: 9, last_used_chain: 9, free_head: 8, vq idx: 0
>>>> num: 1, free_head: 9, id: 9, last_used: 10, used_wrap_counter: 1, vq idx: 0
>>>> ------
>>>> size: 256, len: 104, id: 10, vq idx: 0
>>>> id: 10, last_used_chain: 10, free_head: 9, vq idx: 0
>>>> num: 1, free_head: 10, id: 10, last_used: 11, used_wrap_counter: 1, vq idx: 0
>>>>
>>>> I have a few more ideas of what to do. I'll let you know if I find something
>>>> else.
>>>>
>>> I cannot find anything just by inspection. What about printing all the
>>> desc_state and all desc_next to check for incoherencies in each
>>> svq_add and get_buf?
>> In this test, all 256 descriptors were filled in the RX vq.
>>
>> In the TX queue, L2 kernel would add one descriptor at a time and notify
>> QEMU. QEMU would then register it in its SVQ and mark it as "available".
>> After processing the descriptor, QEMU would mark it as "used" and flush it
>> back to L2. L2, in turn, would mark this descriptor as "used". After this
>> process, L2 would add the next descriptor in the TX vq while reusing this
>> ID. This was observed from idx 0 till idx 7.
>>
>> L2's debug logs:
>>
>> [   18.379112] (use_indirect?) output.0 -> verdict: 0                <----- Begin adding descriptor in idx 6
>> [   18.387134] (add_packed) output.0 -> idx: 6
>> [   18.389897] (add_packed) output.0 -> id: 0
>> [   18.392290] (add_packed) output.0 -> len: 74
>> [   18.394606] (add_packed) output.0 -> addr: 5012315726
>> [   18.397043] (add_packed) output.0 -> next id: 1
>> [   18.399861] Entering prepare_packed: output.0
>> [   18.402478] (prepare_packed) output.0 -> needs_kick: 1
>> [   18.404998] (notify) output.0 -> function will return true        <----- Notify QEMU
>> [   18.406349] output.0 -> id: 0, idx: 6                             <----- Mark ID 0 in idx 6 as used
>> [   18.409482] output.0 -> old free_head: 1, new free_head: 0        <----- ID 0 can be reused
>> [   18.410919] (after get_buf processed) output.0 -> id: 0, idx: 7   <----- Next slot is idx 7
>> [   18.921895] (use_indirect?) output.0 -> verdict: 0                <----- Begin adding descriptor with ID = 0 in idx 7
>> [   18.930093] (add_packed) output.0 -> idx: 7
>> [   18.935715] (add_packed) output.0 -> id: 0
>> [   18.937609] (add_packed) output.0 -> len: 122
>> [   18.939614] (add_packed) output.0 -> addr: 4925868038
>> [   18.941710] (add_packed) output.0 -> next id: 1
>> [   18.944032] Entering prepare_packed: output.0
>> [   18.946148] (prepare_packed) output.0 -> needs_kick: 1
>> [   18.948234] (notify) output.0 -> function will return true        <----- Notify QEMU
>> [   18.949606] output.0 -> id: 0, idx: 7                             <----- Mark ID 0 in idx 7 as used
>> [   18.952756] output.0 -> old free_head: 1, new free_head: 0        <----- ID 0 can be reused
>> [   18.955154] (after get_buf processed) output.0 -> id: 0, idx: 8   <----- Next slot is idx 8
>>
>> There was no issue in QEMU till this point.
>>
>> [   19.177536] (use_indirect?) output.0 -> verdict: 0                <----- Begin adding descriptor with ID = 0 in idx 8
>> [   19.182415] (add_packed) output.0 -> idx: 8
>> [   19.187257] (add_packed) output.0 -> id: 0
>> [   19.191355] (add_packed) output.0 -> len: 102
>> [   19.195131] (add_packed) output.0 -> addr: 4370702342
>> [   19.199224] (add_packed) output.0 -> next id: 1
>> [   19.204929] Entering prepare_packed: output.0
>> [   19.209505] (prepare_packed) output.0 -> needs_kick: 1
>> [   19.213820] (notify) output.0 -> function will return true       <----- Notify QEMU
>> [   19.218792] (use_indirect?) output.0 -> verdict: 0               <----- Next slot is idx 9
>> [   19.224730] (add_packed) output.0 -> idx: 9
>> [   19.227067] (add_packed) output.0 -> id: 1                       <----- ID 0 can't be reused yet, so use ID = 1
>> [   19.229090] (add_packed) output.0 -> len: 330
>> [   19.231182] (add_packed) output.0 -> addr: 4311020614
>> [   19.233302] (add_packed) output.0 -> next id: 2
>> [   19.235620] Entering prepare_packed: output.0
>> [   19.237781] (prepare_packed) output.0 -> needs_kick: 1
>> [   19.239958] (notify) output.0 -> function will return true       <----- Notify QEMU
>> [   19.237780] output.0 -> id: 0, idx: 8                            <----- Mark ID 0 in idx 8 as used
>> [   19.243676] output.0 -> old free_head: 2, new free_head: 0       <----- ID 0 can now be reused
>> [   19.245214] (after get_buf processed) output.0 -> id: 0, idx: 9  <----- Next slot is idx 9
>> [   19.247097] output.0 -> id: 1, idx: 9                            <----- Mark ID 1 in idx 9 as used
>> [   19.249612] output.0 -> old free_head: 0, new free_head: 1       <----- ID 1 can now be reused
>> [   19.252266] (after get_buf processed) output.0 -> id: 1, idx: 10 <----- Next slot is idx 10
>>
>> ID 0 and ID 1 in idx 8 and idx 9 respectively are pushed to QEMU
>> before either of them are marked as used.
>>
>> But in QEMU, the order is slightly different.
>>
>> num: 1, init_flags: 128                                                        <----- vhost_svq_add_packed()
>> idx: 8, id: 0, len: 0, flags: 0, vq idx: 1                                     <----- Before adding descriptor
>> idx: 8, id: 0, len: 102, flags: 128, vq idx: 1                                 <----- After adding descriptor
>> Finally: new_idx: 9, head_idx: 8, id: 0, len: 102, flags: 128, vq idx: 1
>> svq->vring.num: 256                                                            <----- Begin vhost_svq_get_buf_packed()
>> descriptor_len: 0
>> descriptor_id: 0                                                               <----- Mark ID = 0 as used
>> last_used: 8                                                                   <----- Processing idx 8
>> used_wrap_counter: 1
>> svq->desc_state[id].ndescs: 1
>> free_head: 0                                                                   <----- Update free_head to 0.
>> last_used: 9                                                                   <----- Update last_used to 9.
>> vq idx: 1                                                                      <----- End vhost_svq_get_buf_packed()
>> i: 0                                                                           <----- vhost_svq_flush()
>> descriptor_len: 0
>> elem->len: 22086
>> i: 1
>> elem_is_null: 1
>> vq idx: 1                                                                      <----- End vhost_svq_flush()
>> num: 1, init_flags: 128                                                        <----- vhost_svq_add_packed()
>> idx: 9, id: 0, len: 0, flags: 0, curr: 0, vq idx: 1                            <----- Before adding descriptor
>> idx: 9, id: 0, len: 330, flags: 128, curr: 1, vq idx: 1                        <----- After adding descriptor
>> Finally: new_idx: 10, head_idx: 9, id: 0, len: 330, flags: 128, vq idx: 1      <----- ID 0 has been reused (versus ID 1 in L2)
>> svq->vring.num: 256                                                            <----- Begin vhost_svq_get_buf_packed()
>> descriptor_len: 0
>> descriptor_id: 0                                                               <----- Mark ID = 0 as used
>> last_used: 9                                                                   <----- Processing idx 9
>> used_wrap_counter: 1
>> svq->desc_state[id].ndescs: 1
>> free_head: 0                                                                   <----- Update free_head to 0.
>> last_used: 10                                                                  <----- Update last_used to 10.
>> vq idx: 1                                                                      <----- End vhost_svq_get_buf_packed()
>> i: 0                                                                           <----- vhost_svq_flush()
>> descriptor_len: 0
>> elem->len: 22086
>> i: 1
>> elem_is_null: 1
>> vq idx: 1                                                                      <----- End vhost_svq_flush()
>>
>> In QEMU, id 0 is added in idx 8. But it's marked as used before a
>> descriptor can be added in idx 9. Because of this there's a discrepancy
>> in the value of free_head and in svq->desc_next.
>>
>> In the current implementation, the values of ID are generated, maintained
>> and processed by QEMU instead of reading from the guest's memory. I think
>> reading the value of ID from the guest memory (similar to reading the
>> descriptor length from guest memory) should resolve this issue.
>>
> 
> Ok you made a good catch here :).
> 
> The 1:1 sync is hard to achieve as a single buffer in the guest may
> need to be splitted in many buffers in the host.
> 
>> The alternative would be to ensure that "add_packed" and "get_buf_packed"
>> are synchronized between the guest and QEMU.
> 
> Yes, they're synchronized. When the guest makes an available
> descriptor, its head is saved in the VirtQueueElement of the SVQ's
> head idx on svq->desc_state.
> 
> Reviewing patch 3/7 I see you're actually returning the id of the
> first descriptor of the chain in *head, while it should be the id of
> the *last* descriptor. It should not be the cause of the failure, as I
> don't see any descriptor chain in the log.

Does this mean if the current free_head is 3 and the length of the chain
is 4, then ID 7 should be saved in the descriptor ring? In the current
implementation, *all* descriptors in the chain are being assigned the same
ID (= free_head instead of free_head + length of chain).

> To keep the free linked list happy we may need to store the head of the
> descriptor chain in the vq too.
> 
> Now, why is SVQ id 0 being reused? Sounds like free_list is not
> initialized to 0, 1, 2... but to something else like 0, 0, 0, etc. Can
> you print the whole list in each iteration?
> 

The free_list initially has the following values:
index 0 -> 1
index 1 -> 2
...
index 254 -> 255
index 255 -> 0

free_head is set to 0 at the beginning. When add_packed() executes for
the first time ID 0 is used and free_head is set to 1. If get_buf_packed()
is run immediately after the "add" operation, free_list[ID] is set to
the current free_head (ID 1 in this case). After this free_head is set to
the used ID (0 in this case).

So, free_list still looks like this:
index 0 -> 1
index 1 -> 2
...
index 254 -> 255
index 255 -> 0

But the free_head is 0. So, ID = 0 is reused again.

When 2 IDs (such as with idx 8 and idx 9 in the guest) are added to the
SVQ without either being used, then free_head is updated as shown:

free_head = 0 -> 1 -> 2

And then marking both IDs as "used" results in:

free_list[0] = current free_head (= 2)
free_head = 0
free_list[1] = current free_head (= 0)
free_head = 1

So free_list looks like this:

index 0 -> 2
index 1 -> 0
index 2 -> 3
index 3 -> 4
...
index 254 -> 255
index 255 -> 0

None of the indices in free_list hold the value 1.

Since at this point free_head is 1, ID 1 is now used very frequently
if add_packed() and get_buf_used() run one at a time in an interleaved
fashion.

Thanks,
Sahil

next prev parent reply	other threads:[~2025-08-04  6:05 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-24 13:59 [RFC v5 0/7] Add packed format to shadow virtqueue Sahil Siddiq
2025-03-24 13:59 ` [RFC v5 1/7] vhost: Refactor vhost_svq_add_split Sahil Siddiq
2025-03-26 11:25   ` Eugenio Perez Martin
2025-03-28  5:18     ` Sahil Siddiq
2025-03-24 13:59 ` [RFC v5 2/7] vhost: Data structure changes to support packed vqs Sahil Siddiq
2025-03-26 11:26   ` Eugenio Perez Martin
2025-03-28  5:17     ` Sahil Siddiq
2025-03-24 13:59 ` [RFC v5 3/7] vhost: Forward descriptors to device via packed SVQ Sahil Siddiq
2025-03-24 14:14   ` Sahil Siddiq
2025-03-26  8:03     ` Eugenio Perez Martin
2025-03-27 18:42       ` Sahil Siddiq
2025-03-28  7:51         ` Eugenio Perez Martin
2025-04-14  9:37           ` Sahil Siddiq
2025-04-14 15:07             ` Eugenio Perez Martin
2025-04-15 19:10               ` Sahil Siddiq
2025-03-26 12:02   ` Eugenio Perez Martin
2025-03-28  5:09     ` Sahil Siddiq
2025-03-28  6:42       ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 4/7] vdpa: Allocate memory for SVQ and map them to vdpa Sahil Siddiq
2025-03-26 12:05   ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 5/7] vhost: Forward descriptors to guest via packed vqs Sahil Siddiq
2025-03-24 14:34   ` Sahil Siddiq
2025-03-26  8:34     ` Eugenio Perez Martin
2025-03-28  5:22       ` Sahil Siddiq
2025-03-28  7:53         ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 6/7] vhost: Validate transport device features for " Sahil Siddiq
2025-03-26 12:06   ` Eugenio Perez Martin
2025-03-28  5:33     ` Sahil Siddiq
2025-03-28  8:02       ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 7/7] vdpa: Support setting vring_base for packed SVQ Sahil Siddiq
2025-03-26 12:08   ` Eugenio Perez Martin
2025-03-27 18:44     ` Sahil Siddiq
2025-03-26  7:35 ` [RFC v5 0/7] Add packed format to shadow virtqueue Eugenio Perez Martin
2025-04-14  9:20   ` Sahil Siddiq
2025-04-15 19:20     ` Sahil Siddiq
2025-04-16  7:20     ` Eugenio Perez Martin
2025-05-14  6:21       ` Sahil Siddiq
2025-05-15  6:19         ` Eugenio Perez Martin
2025-06-26  5:16           ` Sahil Siddiq
2025-06-26  7:37             ` Eugenio Perez Martin
2025-07-30 14:32               ` Sahil Siddiq
2025-07-31 13:52                 ` Eugenio Perez Martin
2025-08-04  6:04                   ` Sahil Siddiq [this message]
2025-08-05  9:07                     ` Eugenio Perez Martin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=15d4f488-5431-460c-8e06-0741e11fe7f2@gmail.com \
    --to=icegambit91@gmail.com \
    --cc=eperezma@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=sahilcdq@proton.me \
    --cc=sgarzare@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).