From: Sahil Siddiq <icegambit91@gmail.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
sahilcdq@proton.me
Subject: Re: [RFC v5 0/7] Add packed format to shadow virtqueue
Date: Mon, 4 Aug 2025 11:34:26 +0530 [thread overview]
Message-ID: <15d4f488-5431-460c-8e06-0741e11fe7f2@gmail.com> (raw)
In-Reply-To: <CAJaqyWcWoyGWeOG1B8FBZgGb6n99fVkTAvaTFco0A2cvr8Lf8w@mail.gmail.com>
Hi,
On 7/31/25 7:22 PM, Eugenio Perez Martin wrote:
> On Wed, Jul 30, 2025 at 4:33 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>> I think I have finally found the reason behind this issue.
>>
>> The order in which "add_packed" and "get_buf_packed" are performed in the
>> nested guest kernel (L2 kernel) and QEMU are a little different. Due to
>> this, the values in free_head and svq->desc_next[] differ and the guest
>> crashes at some point. More below.
>>
>> On 6/26/25 1:07 PM, Eugenio Perez Martin wrote:
>>> On Thu, Jun 26, 2025 at 7:16 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>> I think there's something off in the way "free_head", "last_used_idx" and
>>>> "desc_next" values are calculated in vhost_svq_get_buf_packed() [1].
>>>>
>>>> In the latest test run, QEMU sent ids 0 through 28 to L2. L2 started receiving
>>>> them in order till id 8. At this point it received id 7 again for some reason
>>>> and then crashed.
>>>>
>>>> L2:
>>>>
>>>> [ 1641.129218] (prepare_packed) output.0 -> needs_kick: 1
>>>> [ 1641.130621] (notify) output.0 -> function will return true
>>>> [ 1641.132022] output.0 -> id: 0
>>>> [ 1739.502358] input.0 -> id: 0
>>>> [ 1739.503003] input.0 -> id: 1
>>>> [ 1739.562024] input.0 -> id: 2
>>>> [ 1739.578682] input.0 -> id: 3
>>>> [ 1739.661913] input.0 -> id: 4
>>>> [ 1739.828796] input.0 -> id: 5
>>>> [ 1739.829789] input.0 -> id: 6
>>>> [ 1740.078757] input.0 -> id: 7
>>>> [ 1740.079749] input.0 -> id: 8
>>>> [ 1740.080382] input.0 -> id: 7 <----Received 7 again
>>>> [ 1740.081614] virtio_net virtio1: input.0:id 7 is not a head!
>>>>
>>>> QEMU logs (vhost_svq_get_buf_packed):
>>>> ------
>>>> size : svq->vring.num
>>>> len : svq->vring_packed.vring.desc[last_used].len
>>>> id : svq->vring_packed.vring.desc[last_used].id
>>>> num : svq->desc_state[id].ndescs
>>>> last_used_chain : Result of vhost_svq_last_desc_of_chain(svq, num, id) [2]
>>>> free_head : svq->free_head
>>>> last_used : (last_used_idx & ~(1 << VRING_PACKED_EVENT_F_WRAP_CTR)) + num
>>>> used_wrap_counter : !!(last_used_idx & (1 << VRING_PACKED_EVENT_F_WRAP_CTR))
>>>> ------
>>>> size: 256, len: 82, id: 7, vq idx: 0
>>>> id: 7, last_used_chain: 7, free_head: 6, vq idx: 0
>>>> num: 1, free_head: 7, id: 7, last_used: 8, used_wrap_counter: 1, vq idx: 0
>>>> ------
>>>> size: 256, len: 104, id: 8, vq idx: 0
>>>> id: 8, last_used_chain: 8, free_head: 7, vq idx: 0
>>>> num: 1, free_head: 8, id: 8, last_used: 9, used_wrap_counter: 1, vq idx: 0
>>>> ------
>>>> size: 256, len: 98, id: 9, vq idx: 0
>>>> id: 9, last_used_chain: 9, free_head: 8, vq idx: 0
>>>> num: 1, free_head: 9, id: 9, last_used: 10, used_wrap_counter: 1, vq idx: 0
>>>> ------
>>>> size: 256, len: 104, id: 10, vq idx: 0
>>>> id: 10, last_used_chain: 10, free_head: 9, vq idx: 0
>>>> num: 1, free_head: 10, id: 10, last_used: 11, used_wrap_counter: 1, vq idx: 0
>>>>
>>>> I have a few more ideas of what to do. I'll let you know if I find something
>>>> else.
>>>>
>>> I cannot find anything just by inspection. What about printing all the
>>> desc_state and all desc_next to check for incoherencies in each
>>> svq_add and get_buf?
>> In this test, all 256 descriptors were filled in the RX vq.
>>
>> In the TX queue, L2 kernel would add one descriptor at a time and notify
>> QEMU. QEMU would then register it in its SVQ and mark it as "available".
>> After processing the descriptor, QEMU would mark it as "used" and flush it
>> back to L2. L2, in turn, would mark this descriptor as "used". After this
>> process, L2 would add the next descriptor in the TX vq while reusing this
>> ID. This was observed from idx 0 till idx 7.
>>
>> L2's debug logs:
>>
>> [ 18.379112] (use_indirect?) output.0 -> verdict: 0 <----- Begin adding descriptor in idx 6
>> [ 18.387134] (add_packed) output.0 -> idx: 6
>> [ 18.389897] (add_packed) output.0 -> id: 0
>> [ 18.392290] (add_packed) output.0 -> len: 74
>> [ 18.394606] (add_packed) output.0 -> addr: 5012315726
>> [ 18.397043] (add_packed) output.0 -> next id: 1
>> [ 18.399861] Entering prepare_packed: output.0
>> [ 18.402478] (prepare_packed) output.0 -> needs_kick: 1
>> [ 18.404998] (notify) output.0 -> function will return true <----- Notify QEMU
>> [ 18.406349] output.0 -> id: 0, idx: 6 <----- Mark ID 0 in idx 6 as used
>> [ 18.409482] output.0 -> old free_head: 1, new free_head: 0 <----- ID 0 can be reused
>> [ 18.410919] (after get_buf processed) output.0 -> id: 0, idx: 7 <----- Next slot is idx 7
>> [ 18.921895] (use_indirect?) output.0 -> verdict: 0 <----- Begin adding descriptor with ID = 0 in idx 7
>> [ 18.930093] (add_packed) output.0 -> idx: 7
>> [ 18.935715] (add_packed) output.0 -> id: 0
>> [ 18.937609] (add_packed) output.0 -> len: 122
>> [ 18.939614] (add_packed) output.0 -> addr: 4925868038
>> [ 18.941710] (add_packed) output.0 -> next id: 1
>> [ 18.944032] Entering prepare_packed: output.0
>> [ 18.946148] (prepare_packed) output.0 -> needs_kick: 1
>> [ 18.948234] (notify) output.0 -> function will return true <----- Notify QEMU
>> [ 18.949606] output.0 -> id: 0, idx: 7 <----- Mark ID 0 in idx 7 as used
>> [ 18.952756] output.0 -> old free_head: 1, new free_head: 0 <----- ID 0 can be reused
>> [ 18.955154] (after get_buf processed) output.0 -> id: 0, idx: 8 <----- Next slot is idx 8
>>
>> There was no issue in QEMU till this point.
>>
>> [ 19.177536] (use_indirect?) output.0 -> verdict: 0 <----- Begin adding descriptor with ID = 0 in idx 8
>> [ 19.182415] (add_packed) output.0 -> idx: 8
>> [ 19.187257] (add_packed) output.0 -> id: 0
>> [ 19.191355] (add_packed) output.0 -> len: 102
>> [ 19.195131] (add_packed) output.0 -> addr: 4370702342
>> [ 19.199224] (add_packed) output.0 -> next id: 1
>> [ 19.204929] Entering prepare_packed: output.0
>> [ 19.209505] (prepare_packed) output.0 -> needs_kick: 1
>> [ 19.213820] (notify) output.0 -> function will return true <----- Notify QEMU
>> [ 19.218792] (use_indirect?) output.0 -> verdict: 0 <----- Next slot is idx 9
>> [ 19.224730] (add_packed) output.0 -> idx: 9
>> [ 19.227067] (add_packed) output.0 -> id: 1 <----- ID 0 can't be reused yet, so use ID = 1
>> [ 19.229090] (add_packed) output.0 -> len: 330
>> [ 19.231182] (add_packed) output.0 -> addr: 4311020614
>> [ 19.233302] (add_packed) output.0 -> next id: 2
>> [ 19.235620] Entering prepare_packed: output.0
>> [ 19.237781] (prepare_packed) output.0 -> needs_kick: 1
>> [ 19.239958] (notify) output.0 -> function will return true <----- Notify QEMU
>> [ 19.237780] output.0 -> id: 0, idx: 8 <----- Mark ID 0 in idx 8 as used
>> [ 19.243676] output.0 -> old free_head: 2, new free_head: 0 <----- ID 0 can now be reused
>> [ 19.245214] (after get_buf processed) output.0 -> id: 0, idx: 9 <----- Next slot is idx 9
>> [ 19.247097] output.0 -> id: 1, idx: 9 <----- Mark ID 1 in idx 9 as used
>> [ 19.249612] output.0 -> old free_head: 0, new free_head: 1 <----- ID 1 can now be reused
>> [ 19.252266] (after get_buf processed) output.0 -> id: 1, idx: 10 <----- Next slot is idx 10
>>
>> ID 0 and ID 1 in idx 8 and idx 9 respectively are pushed to QEMU
>> before either of them are marked as used.
>>
>> But in QEMU, the order is slightly different.
>>
>> num: 1, init_flags: 128 <----- vhost_svq_add_packed()
>> idx: 8, id: 0, len: 0, flags: 0, vq idx: 1 <----- Before adding descriptor
>> idx: 8, id: 0, len: 102, flags: 128, vq idx: 1 <----- After adding descriptor
>> Finally: new_idx: 9, head_idx: 8, id: 0, len: 102, flags: 128, vq idx: 1
>> svq->vring.num: 256 <----- Begin vhost_svq_get_buf_packed()
>> descriptor_len: 0
>> descriptor_id: 0 <----- Mark ID = 0 as used
>> last_used: 8 <----- Processing idx 8
>> used_wrap_counter: 1
>> svq->desc_state[id].ndescs: 1
>> free_head: 0 <----- Update free_head to 0.
>> last_used: 9 <----- Update last_used to 9.
>> vq idx: 1 <----- End vhost_svq_get_buf_packed()
>> i: 0 <----- vhost_svq_flush()
>> descriptor_len: 0
>> elem->len: 22086
>> i: 1
>> elem_is_null: 1
>> vq idx: 1 <----- End vhost_svq_flush()
>> num: 1, init_flags: 128 <----- vhost_svq_add_packed()
>> idx: 9, id: 0, len: 0, flags: 0, curr: 0, vq idx: 1 <----- Before adding descriptor
>> idx: 9, id: 0, len: 330, flags: 128, curr: 1, vq idx: 1 <----- After adding descriptor
>> Finally: new_idx: 10, head_idx: 9, id: 0, len: 330, flags: 128, vq idx: 1 <----- ID 0 has been reused (versus ID 1 in L2)
>> svq->vring.num: 256 <----- Begin vhost_svq_get_buf_packed()
>> descriptor_len: 0
>> descriptor_id: 0 <----- Mark ID = 0 as used
>> last_used: 9 <----- Processing idx 9
>> used_wrap_counter: 1
>> svq->desc_state[id].ndescs: 1
>> free_head: 0 <----- Update free_head to 0.
>> last_used: 10 <----- Update last_used to 10.
>> vq idx: 1 <----- End vhost_svq_get_buf_packed()
>> i: 0 <----- vhost_svq_flush()
>> descriptor_len: 0
>> elem->len: 22086
>> i: 1
>> elem_is_null: 1
>> vq idx: 1 <----- End vhost_svq_flush()
>>
>> In QEMU, id 0 is added in idx 8. But it's marked as used before a
>> descriptor can be added in idx 9. Because of this there's a discrepancy
>> in the value of free_head and in svq->desc_next.
>>
>> In the current implementation, the values of ID are generated, maintained
>> and processed by QEMU instead of reading from the guest's memory. I think
>> reading the value of ID from the guest memory (similar to reading the
>> descriptor length from guest memory) should resolve this issue.
>>
>
> Ok you made a good catch here :).
>
> The 1:1 sync is hard to achieve as a single buffer in the guest may
> need to be splitted in many buffers in the host.
>
>> The alternative would be to ensure that "add_packed" and "get_buf_packed"
>> are synchronized between the guest and QEMU.
>
> Yes, they're synchronized. When the guest makes an available
> descriptor, its head is saved in the VirtQueueElement of the SVQ's
> head idx on svq->desc_state.
>
> Reviewing patch 3/7 I see you're actually returning the id of the
> first descriptor of the chain in *head, while it should be the id of
> the *last* descriptor. It should not be the cause of the failure, as I
> don't see any descriptor chain in the log.
Does this mean if the current free_head is 3 and the length of the chain
is 4, then ID 7 should be saved in the descriptor ring? In the current
implementation, *all* descriptors in the chain are being assigned the same
ID (= free_head instead of free_head + length of chain).
> To keep the free linked list happy we may need to store the head of the
> descriptor chain in the vq too.
>
> Now, why is SVQ id 0 being reused? Sounds like free_list is not
> initialized to 0, 1, 2... but to something else like 0, 0, 0, etc. Can
> you print the whole list in each iteration?
>
The free_list initially has the following values:
index 0 -> 1
index 1 -> 2
...
index 254 -> 255
index 255 -> 0
free_head is set to 0 at the beginning. When add_packed() executes for
the first time ID 0 is used and free_head is set to 1. If get_buf_packed()
is run immediately after the "add" operation, free_list[ID] is set to
the current free_head (ID 1 in this case). After this free_head is set to
the used ID (0 in this case).
So, free_list still looks like this:
index 0 -> 1
index 1 -> 2
...
index 254 -> 255
index 255 -> 0
But the free_head is 0. So, ID = 0 is reused again.
When 2 IDs (such as with idx 8 and idx 9 in the guest) are added to the
SVQ without either being used, then free_head is updated as shown:
free_head = 0 -> 1 -> 2
And then marking both IDs as "used" results in:
free_list[0] = current free_head (= 2)
free_head = 0
free_list[1] = current free_head (= 0)
free_head = 1
So free_list looks like this:
index 0 -> 2
index 1 -> 0
index 2 -> 3
index 3 -> 4
...
index 254 -> 255
index 255 -> 0
None of the indices in free_list hold the value 1.
Since at this point free_head is 1, ID 1 is now used very frequently
if add_packed() and get_buf_used() run one at a time in an interleaved
fashion.
Thanks,
Sahil
next prev parent reply other threads:[~2025-08-04 6:05 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-24 13:59 [RFC v5 0/7] Add packed format to shadow virtqueue Sahil Siddiq
2025-03-24 13:59 ` [RFC v5 1/7] vhost: Refactor vhost_svq_add_split Sahil Siddiq
2025-03-26 11:25 ` Eugenio Perez Martin
2025-03-28 5:18 ` Sahil Siddiq
2025-03-24 13:59 ` [RFC v5 2/7] vhost: Data structure changes to support packed vqs Sahil Siddiq
2025-03-26 11:26 ` Eugenio Perez Martin
2025-03-28 5:17 ` Sahil Siddiq
2025-03-24 13:59 ` [RFC v5 3/7] vhost: Forward descriptors to device via packed SVQ Sahil Siddiq
2025-03-24 14:14 ` Sahil Siddiq
2025-03-26 8:03 ` Eugenio Perez Martin
2025-03-27 18:42 ` Sahil Siddiq
2025-03-28 7:51 ` Eugenio Perez Martin
2025-04-14 9:37 ` Sahil Siddiq
2025-04-14 15:07 ` Eugenio Perez Martin
2025-04-15 19:10 ` Sahil Siddiq
2025-03-26 12:02 ` Eugenio Perez Martin
2025-03-28 5:09 ` Sahil Siddiq
2025-03-28 6:42 ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 4/7] vdpa: Allocate memory for SVQ and map them to vdpa Sahil Siddiq
2025-03-26 12:05 ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 5/7] vhost: Forward descriptors to guest via packed vqs Sahil Siddiq
2025-03-24 14:34 ` Sahil Siddiq
2025-03-26 8:34 ` Eugenio Perez Martin
2025-03-28 5:22 ` Sahil Siddiq
2025-03-28 7:53 ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 6/7] vhost: Validate transport device features for " Sahil Siddiq
2025-03-26 12:06 ` Eugenio Perez Martin
2025-03-28 5:33 ` Sahil Siddiq
2025-03-28 8:02 ` Eugenio Perez Martin
2025-03-24 13:59 ` [RFC v5 7/7] vdpa: Support setting vring_base for packed SVQ Sahil Siddiq
2025-03-26 12:08 ` Eugenio Perez Martin
2025-03-27 18:44 ` Sahil Siddiq
2025-03-26 7:35 ` [RFC v5 0/7] Add packed format to shadow virtqueue Eugenio Perez Martin
2025-04-14 9:20 ` Sahil Siddiq
2025-04-15 19:20 ` Sahil Siddiq
2025-04-16 7:20 ` Eugenio Perez Martin
2025-05-14 6:21 ` Sahil Siddiq
2025-05-15 6:19 ` Eugenio Perez Martin
2025-06-26 5:16 ` Sahil Siddiq
2025-06-26 7:37 ` Eugenio Perez Martin
2025-07-30 14:32 ` Sahil Siddiq
2025-07-31 13:52 ` Eugenio Perez Martin
2025-08-04 6:04 ` Sahil Siddiq [this message]
2025-08-05 9:07 ` Eugenio Perez Martin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=15d4f488-5431-460c-8e06-0741e11fe7f2@gmail.com \
--to=icegambit91@gmail.com \
--cc=eperezma@redhat.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=sahilcdq@proton.me \
--cc=sgarzare@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).