From: Sahil Siddiq <icegambit91@gmail.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
    sahilcdq@proton.me
Subject: Re: [RFC v5 0/7] Add packed format to shadow virtqueue
Date: Wed, 14 May 2025 11:51:42 +0530
Message-ID: <9a7c409f-cd7e-4906-812b-c8a4d77cfc4d@gmail.com>
In-Reply-To: <CAJaqyWd=ssa5fkmV7Z=tzJvFeciC1P2U2pYheaSrZ2PZCaejHg@mail.gmail.com>

Hi,

Apologies for the long silence. I have an update to share.

On 4/16/25 12:50 PM, Eugenio Perez Martin wrote:
> On Mon, Apr 14, 2025 at 11:20 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>
>> Hi,
>>
>> On 3/26/25 1:05 PM, Eugenio Perez Martin wrote:
>>> On Mon, Mar 24, 2025 at 2:59 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>> I managed to fix a few issues while testing this patch series.
>>>> There is still one issue that I am unable to resolve. I thought
>>>> I would send this patch series for review in case I have missed
>>>> something.
>>>>
>>>> The issue is that this patch series does not work every time. I
>>>> am able to ping L0 from L2 and vice versa via packed SVQ when it
>>>> works.
>>>
>>> So we're on a very good track then!
>>>
>>>> When this doesn't work, both VMs throw a "Destination Host
>>>> Unreachable" error. This is sometimes (not always) accompanied
>>>> by the following kernel error (thrown by L2-kernel):
>>>>
>>>> virtio_net virtio1: output.0:id 1 is not a head!
>>>>
>>>
>>> How many packets have been sent or received before hitting this? If
>>> the answer to that is "the vq size", maybe there is a bug in the code
>>> that handles the wraparound of the packed vq, as the used and avail
>>> flags need to be twisted. You can count them in the SVQ code.
>>
>> I did a lot more testing. This issue is quite unpredictable in terms
>> of the time at which it appears after booting L2. So far, it almost
>> always appears after booting L2. Even when pinging works, this issue
>> appears after several seconds of pinging.
>>
>
> Maybe you can speed it up with ping -f?

Thank you, the -f option let me run the tests much faster. So far, the
RX queue has not shown any problems: once all of its descriptors are
used, it wraps around without issues.

>> The total number of svq descriptors varied in every test run. But in
>> every case, all 256 indices were filled in the descriptor region for
>> vq with vq_idx = 0. This is the RX vq, right?
>
> Right!

The TX queue, on the other hand, seems to be the problematic one. More
on this below.

>> This was filled while L2
>> was booting. In the case when the ctrl vq is disabled, I am not sure
>> what is responsible for filling the vqs in the data plane during
>> booting.
>>
> The nested guest's driver fills the rx queue at startup. After that,
> that nested guest kicks and SVQ receives the descriptors. It copies
> the descriptors to the shadow virtqueue and then kicks L0 QEMU.

Understood.

>> =====
>> The issue is hit most frequently when the following command is run
>> in L0:
>> $ ip addr add 111.1.1.1/24 dev tap0
>> $ ip link set tap0 up
>>
>> or, running the following in L2:
>> # ip addr add 111.1.1.2/24 dev eth0
>>
>
> I guess those are able to start the network, aren't they?

Yes, that's correct.

>> The other vq (vq_idx=1) is not filled completely before the issue is
>> hit.
>> I have been noting down the numbers and here is an example:
>>
>> 295 descriptors were added individually to the queues i.e., there were no chains (vhost_svq_add_packed)
>> |_ 256 additions in vq_idx = 0, all with unique ids
>> |    |---- 27 descriptors (ids 0 through 26) were received later from the device (vhost_svq_get_buf_packed)
>> |_ 39 additions in vq_idx = 1
>>      |_ 13 descriptors had id = 0
>>      |_ 26 descriptors had id = 1
>>      |---- All descriptors were received at some point from the device (vhost_svq_get_buf_packed)
>>
>> There was one case in which vq_idx=0 had wrapped around. I verified
>> that flags were set appropriately during the wrap (avail and used flags
>> were flipped as expected).
>>
>
> Ok sounds like you're able to reach it before filling the queue. I'd
> go for debugging notifications for this one then. More on this below.
>
>> =====
>> The next common situation where this issue is hit is during startup.
>> Before L2 can finish booting successfully, this error is thrown:
>>
>> virtio_net virtio1: output.0:id 0 is not a head!
>>
>> 258 descriptors were added individually to the queues during startup (there were no chains) (vhost_svq_add_packed)
>> |_ 256 additions in vq_idx = 0, all with unique ids
>> |    |---- None of them were received by the device (vhost_svq_get_buf_packed)
>> |_ 2 additions in vq_idx = 1
>>      |_ id = 0 in index 0
>>      |_ id = 1 in index 1
>>      |---- Both descriptors were received at some point during startup from the device (vhost_svq_get_buf_packed)
>>
>> =====
>> Another case is after several seconds of pinging L0 from L2.
>>
>> [ 99.034114] virtio_net virtio1: output.0:id 0 is not a head!
>>
>
> So the L2 guest sees a descriptor it has not made available
> previously. This can be caused because SVQ returns the same descriptor
> twice, or it doesn't fill the id or flags properly. It can also be
> caused because we're not protecting the write ordering in the ring,
> but I don't see anything obviously wrong by looking at the code.
>
>> 366 descriptors were added individually to the queues i.e., there were no chains (vhost_svq_add_packed)
>> |_ 289 additions in vq_idx = 0, wrap-around was observed with avail and used flags inverted for 33 descriptors
>> |    |---- 40 descriptors (ids 0 through 39) were received from the device (vhost_svq_get_buf_packed)
>> |_ 77 additions in vq_idx = 1
>>      |_ 76 descriptors had id = 0
>>      |_ 1 descriptor had id = 1
>>      |---- all 77 descriptors were received at some point from the device (vhost_svq_get_buf_packed)
>>
>> I am not entirely sure now if there's an issue in the packed vq
>> implementation in QEMU or if this is being caused due to some sort
>> of race condition in linux.
>>
>> "id is not a head" is being thrown because vq->packed.desc_state[id].data
>> doesn't exist for the corresponding id in Linux [1]. But QEMU seems to have
>> stored some data for this id via vhost_svq_add() [2]. Linux sets the value
>> of vq->packed.desc_state[id].data in its version of virtqueue_add_packed() [3].
>>
>
> Let's keep debugging further. Can you trace the ids that the L2 kernel
> makes available, and then the ones that it uses? At the same time, can
> you trace the ids that the svq sees in vhost_svq_get_buf and the ones
> that flushes? This allows us to check the set of available descriptors
> at any given time.
>

In the Linux kernel, I am printing which descriptor is received in which
queue in drivers/virtio/virtio_ring.c:virtqueue_get_buf_ctx_packed() [1].
The following lines are printed for the TX queue:

[ 192.101591] output.0 -> id: 0
[ 213.737417] output.0 -> id: 0
[ 213.738714] output.0 -> id: 1
[ 213.740093] output.0 -> id: 0
[ 213.741521] virtio_net virtio1: output.0:id 0 is not a head!
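
The check behind "is not a head" can be modelled outside the kernel to
compare the two traces (ids made available versus ids returned as used).
What follows is only a minimal sketch: the tracker type, the function
names and the ring size of 256 are assumptions made for illustration and
are not taken from Linux or QEMU. It flags an id that comes back as used
without being outstanding, which is the situation reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 256 /* assumed vq size, matching the 256 entries seen earlier */

/* Hypothetical tracker: outstanding[id] is true while id has been made
 * available and has not yet been returned as used. */
struct id_tracker {
    bool outstanding[RING_SIZE];
};

static void tracker_init(struct id_tracker *t)
{
    memset(t, 0, sizeof(*t));
}

/* Record that the driver made id available.
 * Returns false if id was already outstanding (made available twice). */
static bool tracker_avail(struct id_tracker *t, uint16_t id)
{
    assert(id < RING_SIZE);
    if (t->outstanding[id]) {
        return false;
    }
    t->outstanding[id] = true;
    return true;
}

/* Record that id came back as used.
 * Returns false if id was not outstanding, i.e. the "not a head" case. */
static bool tracker_used(struct id_tracker *t, uint16_t id)
{
    assert(id < RING_SIZE);
    if (!t->outstanding[id]) {
        return false;
    }
    t->outstanding[id] = false;
    return true;
}
```

Feeding the ids from both traces into such a tracker in timestamp order
would show whether an id is returned twice for a single submission or
returned without ever having been made available.
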

In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_add_packed(), I am
printing the head_idx, id, len, flags and vq_idx. Just before the crash,
the following lines are printed:

head_idx: 157, id: 0, len: 122, flags: 32768, vq idx: 1
head_idx: 158, id: 0, len: 122, flags: 32768, vq idx: 1
head_idx: 159, id: 0, len: 66, flags: 32768, vq idx: 1
head_idx: 160, id: 1, len: 102, flags: 32768, vq idx: 1
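
As a cross-check on the flags value above, here is a minimal sketch of
how a driver-side flags word is composed in the packed layout. The bit
positions (NEXT = bit 0, WRITE = bit 1, AVAIL = bit 7, USED = bit 15) are
the ones defined by the virtio specification; the macro and helper names
are made up for illustration and are not taken from QEMU or Linux. Under
these assumptions, 32768 (0x8000) has only the USED bit set, which is
what a driver writes when it makes a device-readable, unchained
descriptor available while its avail wrap counter is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the packed descriptor flags, per the virtio spec. */
#define DESC_F_NEXT  (1u << 0)
#define DESC_F_WRITE (1u << 1)
#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical helper: flags written to make a descriptor available.
 * AVAIL mirrors the avail wrap counter and USED is its inverse.
 * "other" carries NEXT/WRITE and must not contain AVAIL or USED. */
static uint16_t avail_flags(bool avail_wrap_counter, uint16_t other)
{
    assert((other & (DESC_F_AVAIL | DESC_F_USED)) == 0);
    uint16_t flags = other;
    flags |= avail_wrap_counter ? DESC_F_AVAIL : DESC_F_USED;
    return flags;
}

/* Hypothetical helper: is the descriptor available for the lap
 * identified by the given wrap counter value? */
static bool is_avail(uint16_t flags, bool wrap_counter)
{
    bool avail = flags & DESC_F_AVAIL;
    bool used = flags & DESC_F_USED;
    return avail == wrap_counter && used != wrap_counter;
}
```

With these helpers, avail_flags(false, 0) evaluates to 32768 and
is_avail(32768, false) is true, while is_avail(32768, true) is false.
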

In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf_packed(), I
am printing the id, last_used index, used wrap counter and vq_idx. These
are the lines just before the crash:

id: 0, last_used: 158, used_wrap_counter: 0, vq idx: 1
id: 0, last_used: 159, used_wrap_counter: 0, vq idx: 1
id: 0, last_used: 160, used_wrap_counter: 0, vq idx: 1
id: 1, last_used: 161, used_wrap_counter: 0, vq idx: 1
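
The used wrap counter printed above decides whether a ring slot counts
as used. Below is a minimal sketch of that check and of stepping
last_used, assuming the packed ring semantics described in the virtio
specification. The struct and function names are hypothetical and not
taken from QEMU, and the sketch does not claim to mirror where the
values above are printed relative to the index update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DESC_F_AVAIL (1u << 7)
#define DESC_F_USED  (1u << 15)

/* Hypothetical model of the driver-side state for the used ring. */
struct used_state {
    uint16_t last_used; /* next ring slot to poll for a used descriptor */
    bool wrap_counter;  /* flips each time last_used wraps back to 0 */
};

/* A slot is used when AVAIL and USED are equal to each other and to
 * the used wrap counter. */
static bool is_used(uint16_t flags, bool used_wrap_counter)
{
    bool avail = flags & DESC_F_AVAIL;
    bool used = flags & DESC_F_USED;
    return avail == used && used == used_wrap_counter;
}

/* Step past ndescs slots, flipping the wrap counter on wrap-around. */
static void advance(struct used_state *s, uint16_t ndescs, uint16_t ring_size)
{
    assert(ring_size > 0 && ndescs <= ring_size);
    uint32_t next = (uint32_t)s->last_used + ndescs;
    if (next >= ring_size) {
        next -= ring_size;
        s->wrap_counter = !s->wrap_counter;
    }
    s->last_used = (uint16_t)next;
}
```

With used_wrap_counter = 0, a slot whose flags are still 32768 (only
USED set) is not yet used; it becomes used once the device writes flags
with both AVAIL and USED cleared.
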

In QEMU's hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush() [2], I am
tracking the values of i and vq_idx in the outer do...while loop as well
as in the inner while (true) loop. The value of i is used as the "idx" in
virtqueue_fill() [3] and as the "count" in virtqueue_flush() [4]. The
lines printed in each iteration of the outer do...while loop are enclosed
between "===" lines. These are the lines just before the crash:

===
in_loop: i: 0, vq idx: 1
in_loop: i: 1, vq idx: 1
out_loop: i: 1, vq idx: 1
===
in_loop: i: 0, vq idx: 1
in_loop: i: 1, vq idx: 1
out_loop: i: 1, vq idx: 1
===
in_loop: i: 0, vq idx: 1
in_loop: i: 1, vq idx: 1
in_loop: i: 2, vq idx: 1
out_loop: i: 2, vq idx: 1
===
in_loop: i: 0, vq idx: 1
out_loop: i: 0, vq idx: 1

So far I have only traced the descriptors that the kernel receives back
as used. I'll also trace the descriptors that the kernel makes available
and let you know what I find.

Thanks,
Sahil

[1] https://github.com/torvalds/linux/blob/master/drivers/virtio/virtio_ring.c#L1727
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/vhost-shadow-virtqueue.c#L499
[3] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1008
[4] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/virtio/virtio.c#L1147