From: Sahil Siddiq <icegambit91@gmail.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
Sahil Siddiq <sahilcdq@proton.me>
Subject: Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
Date: Sun, 19 Jan 2025 12:07:19 +0530
Message-ID: <4ee57bd3-5ea0-49a7-969e-c3fe902d8246@gmail.com>
In-Reply-To: <CAJaqyWftS8angT2=XUUFiR_5yjxNOmV4WXHe3cxkb4t6KbQdDw@mail.gmail.com>

Hi,

On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
> On Fri, Jan 3, 2025 at 2:06 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>
>> Hi,
>>
>> On 12/20/24 12:28 PM, Eugenio Perez Martin wrote:
>>> On Thu, Dec 19, 2024 at 8:37 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 12/17/24 1:20 PM, Eugenio Perez Martin wrote:
>>>>> On Tue, Dec 17, 2024 at 6:45 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>>>> On 12/16/24 2:09 PM, Eugenio Perez Martin wrote:
>>>>>>> On Sun, Dec 15, 2024 at 6:27 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>>>>>> On 12/10/24 2:57 PM, Eugenio Perez Martin wrote:
>>>>>>>>> On Thu, Dec 5, 2024 at 9:34 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>>>>>>>> [...]
>>>>>>>>>> I have been following the "Hands on vDPA: what do you do
>>>>>>>>>> when you ain't got the hardware v2 (Part 2)" [1] blog to
>>>>>>>>>> test my changes. To boot the L1 VM, I ran:
>>>>>>>>>>
>>>>>>>>>> sudo ./qemu/build/qemu-system-x86_64 \
>>>>>>>>>> -enable-kvm \
>>>>>>>>>> -drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
>>>>>>>>>> -net nic,model=virtio \
>>>>>>>>>> -net user,hostfwd=tcp::2222-:22 \
>>>>>>>>>> -device intel-iommu,snoop-control=on \
>>>>>>>>>> -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=on,event_idx=off,bus=pcie.0,addr=0x4 \
>>>>>>>>>> -netdev tap,id=net0,script=no,downscript=no \
>>>>>>>>>> -nographic \
>>>>>>>>>> -m 8G \
>>>>>>>>>> -smp 4 \
>>>>>>>>>> -M q35 \
>>>>>>>>>> -cpu host 2>&1 | tee vm.log
>>>>>>>>>>
>>>>>>>>>> Without "guest_uso4=off,guest_uso6=off,host_uso=off,
>>>>>>>>>> guest_announce=off" in "-device virtio-net-pci", QEMU
>>>>>>>>>> throws "vdpa svq does not work with features" [2] when
>>>>>>>>>> trying to boot L2.
>>>>>>>>>>
>>>>>>>>>> The enums added in commit #2 in this series are new and
>>>>>>>>>> weren't in the earlier versions of the series. Without
>>>>>>>>>> this change, x-svq=true throws "SVQ invalid device feature
>>>>>>>>>> flags" [3] and x-svq is consequently disabled.
>>>>>>>>>>
>>>>>>>>>> The first issue is related to running traffic in L2
>>>>>>>>>> with vhost-vdpa.
>>>>>>>>>>
>>>>>>>>>> In L0:
>>>>>>>>>>
>>>>>>>>>> $ ip addr add 111.1.1.1/24 dev tap0
>>>>>>>>>> $ ip link set tap0 up
>>>>>>>>>> $ ip addr show tap0
>>>>>>>>>> 4: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
>>>>>>>>>> link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
>>>>>>>>>> inet 111.1.1.1/24 scope global tap0
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>> inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>>
>>>>>>>>>> I am able to run traffic in L2 when booting without
>>>>>>>>>> x-svq.
>>>>>>>>>>
>>>>>>>>>> In L1:
>>>>>>>>>>
>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \
>>>>>>>>>> -nographic \
>>>>>>>>>> -m 4G \
>>>>>>>>>> -enable-kvm \
>>>>>>>>>> -M q35 \
>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
>>>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \
>>>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \
>>>>>>>>>> -smp 4 \
>>>>>>>>>> -cpu host \
>>>>>>>>>> 2>&1 | tee vm.log
>>>>>>>>>>
>>>>>>>>>> In L2:
>>>>>>>>>>
>>>>>>>>>> # ip addr add 111.1.1.2/24 dev eth0
>>>>>>>>>> # ip addr show eth0
>>>>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>>>>>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
>>>>>>>>>> altname enp0s7
>>>>>>>>>> inet 111.1.1.2/24 scope global eth0
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>> inet6 fe80::9877:de30:5f17:35f9/64 scope link noprefixroute
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>>
>>>>>>>>>> # ip route
>>>>>>>>>> 111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2
>>>>>>>>>>
>>>>>>>>>> # ping 111.1.1.1 -w3
>>>>>>>>>> PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data.
>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=1 ttl=64 time=0.407 ms
>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=2 ttl=64 time=0.671 ms
>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=3 ttl=64 time=0.291 ms
>>>>>>>>>>
>>>>>>>>>> --- 111.1.1.1 ping statistics ---
>>>>>>>>>> 3 packets transmitted, 3 received, 0% packet loss, time 2034ms
>>>>>>>>>> rtt min/avg/max/mdev = 0.291/0.456/0.671/0.159 ms
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> But if I boot L2 with x-svq=true as shown below, I am unable
>>>>>>>>>> to ping the host machine.
>>>>>>>>>>
>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \
>>>>>>>>>> -nographic \
>>>>>>>>>> -m 4G \
>>>>>>>>>> -enable-kvm \
>>>>>>>>>> -M q35 \
>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
>>>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \
>>>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \
>>>>>>>>>> -smp 4 \
>>>>>>>>>> -cpu host \
>>>>>>>>>> 2>&1 | tee vm.log
>>>>>>>>>>
>>>>>>>>>> In L2:
>>>>>>>>>>
>>>>>>>>>> # ip addr add 111.1.1.2/24 dev eth0
>>>>>>>>>> # ip addr show eth0
>>>>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>>>>>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
>>>>>>>>>> altname enp0s7
>>>>>>>>>> inet 111.1.1.2/24 scope global eth0
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>> inet6 fe80::9877:de30:5f17:35f9/64 scope link noprefixroute
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>>
>>>>>>>>>> # ip route
>>>>>>>>>> 111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2
>>>>>>>>>>
>>>>>>>>>> # ping 111.1.1.1 -w10
>>>>>>>>>> PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data.
>>>>>>>>>> From 111.1.1.2 icmp_seq=1 Destination Host Unreachable
>>>>>>>>>> ping: sendmsg: No route to host
>>>>>>>>>> From 111.1.1.2 icmp_seq=2 Destination Host Unreachable
>>>>>>>>>> From 111.1.1.2 icmp_seq=3 Destination Host Unreachable
>>>>>>>>>>
>>>>>>>>>> --- 111.1.1.1 ping statistics ---
>>>>>>>>>> 3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2076ms
>>>>>>>>>> pipe 3
>>>>>>>>>>
>>>>>>>>>> The other issue is related to booting L2 with "x-svq=true"
>>>>>>>>>> and "packed=on".
>>>>>>>>>>
>>>>>>>>>> In L1:
>>>>>>>>>>
>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \
>>>>>>>>>> -nographic \
>>>>>>>>>> -m 4G \
>>>>>>>>>> -enable-kvm \
>>>>>>>>>> -M q35 \
>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
>>>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true \
>>>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,packed=on,bus=pcie.0,addr=0x7 \
>>>>>>>>>> -smp 4 \
>>>>>>>>>> -cpu host \
>>>>>>>>>> 2>&1 | tee vm.log
>>>>>>>>>>
>>>>>>>>>> The kernel throws "virtio_net virtio1: output.0:id 0 is not
>>>>>>>>>> a head!" [4].
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> So this series implements the descriptor forwarding from the guest to
>>>>>>>>> the device in packed vq. We also need to forward the descriptors from
>>>>>>>>> the device to the guest. The device writes them in the SVQ ring.
>>>>>>>>>
>>>>>>>>> The functions responsible for that in QEMU are
>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush, which is called by
>>>>>>>>> the device when used descriptors are written to the SVQ, which calls
>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf. We need to do
>>>>>>>>> modifications similar to vhost_svq_add: make them conditional on whether
>>>>>>>>> we're using a split or packed vq, and "copy" the code from Linux's
>>>>>>>>> drivers/virtio/virtio_ring.c:virtqueue_get_buf.
>>>>>>>>>
>>>>>>>>> After these modifications you should be able to ping and forward
>>>>>>>>> traffic. As always, it is totally ok if it needs more than one
>>>>>>>>> iteration, and feel free to ask any question you have :).
>>>>>>>>>
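
(A note to myself for the next revision: on the packed side, I think the
"is this descriptor used?" check in vhost_svq_get_buf's packed counterpart
will mirror Linux's is_used_desc_packed(). The sketch below is only my
understanding, self-contained and with made-up names, and it ignores
endianness and memory-barrier details:

#include <stdbool.h>
#include <stdint.h>

#define VRING_PACKED_DESC_F_AVAIL 7
#define VRING_PACKED_DESC_F_USED  15

struct vring_packed_desc {
    uint64_t addr;
    uint32_t len;
    uint16_t id;
    uint16_t flags;
};

struct svq_packed_state {
    struct vring_packed_desc *desc;  /* descriptor ring */
    uint16_t last_used_idx;          /* next ring slot to check */
    bool used_wrap_counter;          /* flips whenever last_used_idx wraps */
};

/* A descriptor is used once its AVAIL and USED bits are equal and both
 * match the used wrap counter that SVQ is tracking. */
static bool svq_packed_more_used(const struct svq_packed_state *s)
{
    uint16_t flags = s->desc[s->last_used_idx].flags;
    bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
    bool used  = flags & (1 << VRING_PACKED_DESC_F_USED);

    return avail == used && used == s->used_wrap_counter;
}

Once this returns true, the packed get_buf would read desc[last_used_idx].id
and .len, advance last_used_idx, and flip used_wrap_counter on wrap-around,
similar to virtqueue_get_buf in Linux.)
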
>>>>>>>>
>>>>>>>> I misunderstood this part. While working on extending
>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf() [1]
>>>>>>>> for packed vqs, I realized that this function and
>>>>>>>> vhost_svq_flush() already support split vqs. However, I am
>>>>>>>> unable to ping L0 when booting L2 with "x-svq=true" and
>>>>>>>> "packed=off" or when the "packed" option is not specified
>>>>>>>> in QEMU's command line.
>>>>>>>>
>>>>>>>> I tried debugging these functions for split vqs after running
>>>>>>>> the following QEMU commands while following the blog [2].
>>>>>>>>
>>>>>>>> Booting L1:
>>>>>>>>
>>>>>>>> $ sudo ./qemu/build/qemu-system-x86_64 \
>>>>>>>> -enable-kvm \
>>>>>>>> -drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
>>>>>>>> -net nic,model=virtio \
>>>>>>>> -net user,hostfwd=tcp::2222-:22 \
>>>>>>>> -device intel-iommu,snoop-control=on \
>>>>>>>> -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4 \
>>>>>>>> -netdev tap,id=net0,script=no,downscript=no \
>>>>>>>> -nographic \
>>>>>>>> -m 8G \
>>>>>>>> -smp 4 \
>>>>>>>> -M q35 \
>>>>>>>> -cpu host 2>&1 | tee vm.log
>>>>>>>>
>>>>>>>> Booting L2:
>>>>>>>>
>>>>>>>> # ./qemu/build/qemu-system-x86_64 \
>>>>>>>> -nographic \
>>>>>>>> -m 4G \
>>>>>>>> -enable-kvm \
>>>>>>>> -M q35 \
>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \
>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \
>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \
>>>>>>>> -smp 4 \
>>>>>>>> -cpu host \
>>>>>>>> 2>&1 | tee vm.log
>>>>>>>>
>>>>>>>> I printed out the contents of VirtQueueElement returned
>>>>>>>> by vhost_svq_get_buf() in vhost_svq_flush() [3].
>>>>>>>> I noticed that "len" which is set by "vhost_svq_get_buf"
>>>>>>>> is always set to 0 while VirtQueueElement.len is non-zero.
>>>>>>>> I haven't understood the difference between these two "len"s.
>>>>>>>>
>>>>>>>
>>>>>>> VirtQueueElement.len is the length of the buffer, while the len of
>>>>>>> vhost_svq_get_buf is the bytes written by the device. In the case of
>>>>>>> the tx queue, VirtQueueElement.len is the length of the tx packet, and
>>>>>>> the vhost_svq_get_buf len is always 0 as the device does not write. In
>>>>>>> the case of rx, VirtQueueElement.len is the available length for an rx
>>>>>>> frame, and the vhost_svq_get_buf len is the actual length written by
>>>>>>> the device.
>>>>>>>
>>>>>>> To be 100% accurate a rx packet can span over multiple buffers, but
>>>>>>> SVQ does not need special code to handle this.
>>>>>>>
>>>>>>> So vhost_svq_get_buf should return > 0 for rx queue (svq->vq->index ==
>>>>>>> 0), and 0 for tx queue (svq->vq->index % 2 == 1).
>>>>>>>
>>>>>>> Take into account that vhost_svq_get_buf only handles split vq at the
>>>>>>> moment! It should be renamed or split into vhost_svq_get_buf_split.
>>>>>>
>>>>>> In L1, there are 2 virtio network devices.
>>>>>>
>>>>>> # lspci -nn | grep -i net
>>>>>> 00:02.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
>>>>>> 00:04.0 Ethernet controller [0200]: Red Hat, Inc. Virtio 1.0 network device [1af4:1041] (rev 01)
>>>>>>
>>>>>> I am using the second one (1af4:1041) for testing my changes and have
>>>>>> bound this device to the vp_vdpa driver.
>>>>>>
>>>>>> # vdpa dev show -jp
>>>>>> {
>>>>>> "dev": {
>>>>>> "vdpa0": {
>>>>>> "type": "network",
>>>>>> "mgmtdev": "pci/0000:00:04.0",
>>>>>> "vendor_id": 6900,
>>>>>> "max_vqs": 3,
>>>>>
>>>>> How is max_vqs=3? For this to happen L0 QEMU should have
>>>>> virtio-net-pci,...,queues=3 cmdline argument.
>>>
>>> Ouch! I totally misread it :(. Everything is correct, max_vqs should
>>> be 3. I read it as the virtio_net queues, which means queue *pairs*,
>>> as it includes rx and tx queue.
>>
>> Understood :)
>>
>>>>
>>>> I am not sure why max_vqs is 3. I haven't set the value of queues to 3
>>>> in the cmdline argument. Is max_vqs expected to have a default value
>>>> other than 3?
>>>>
>>>> In the blog [1] as well, max_vqs is 3 even though there's no queues=3
>>>> argument.
>>>>
>>>>> It's clear the guest is not using them, we can add mq=off
>>>>> to simplify the scenario.
>>>>
>>>> The value of max_vqs is still 3 after adding mq=off. The whole
>>>> command that I run to boot L0 is:
>>>>
>>>> $ sudo ./qemu/build/qemu-system-x86_64 \
>>>> -enable-kvm \
>>>> -drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \
>>>> -net nic,model=virtio \
>>>> -net user,hostfwd=tcp::2222-:22 \
>>>> -device intel-iommu,snoop-control=on \
>>>> -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4 \
>>>> -netdev tap,id=net0,script=no,downscript=no \
>>>> -nographic \
>>>> -m 8G \
>>>> -smp 4 \
>>>> -M q35 \
>>>> -cpu host 2>&1 | tee vm.log
>>>>
>>>> Could it be that 2 of the 3 vqs are used for the dataplane and
>>>> the third vq is the control vq?
>>>>
>>>>>> "max_vq_size": 256
>>>>>> }
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> The max number of vqs is 3 with the max size being 256.
>>>>>>
>>>>>> Since there are 2 virtio net devices, vhost_vdpa_svqs_start [1]
>>>>>> is called twice. For each of them, it calls vhost_svq_start [2]
>>>>>> v->shadow_vqs->len number of times.
>>>>>>
>>>>>
>>>>> Ok I understand this confusion, as the code is not intuitive :). Take
>>>>> into account you can only have svq in vdpa devices, so both
>>>>> vhost_vdpa_svqs_start are acting on the vdpa device.
>>>>>
>>>>> You are seeing two calls to vhost_vdpa_svqs_start because virtio (and
>>>>> vdpa) devices are modelled internally as two devices in QEMU: one for
>>>>> the dataplane vq, and the other for the control vq. There are historical
>>>>> reasons for this, but we use it in vdpa to always shadow the CVQ while
>>>>> leaving dataplane passthrough if x-svq=off and the virtio & virtio-net
>>>>> feature set is understood by SVQ.
>>>>>
>>>>> If you break at vhost_vdpa_svqs_start with gdb and go higher in the
>>>>> stack you should reach vhost_net_start, that starts each vhost_net
>>>>> device individually.
>>>>>
>>>>> To be 100% honest, each dataplane *queue pair* (rx+tx) is modelled
>>>>> with a different vhost_net device in QEMU, but you don't need to take
>>>>> that into account implementing the packed vq :).
>>>>
>>>> Got it, this makes sense now.
>>>>
>>>>>> Printing the values of dev->vdev->name, v->shadow_vqs->len and
>>>>>> svq->vring.num in vhost_vdpa_svqs_start gives:
>>>>>>
>>>>>> name: virtio-net
>>>>>> len: 2
>>>>>> num: 256
>>>>>> num: 256
>>>>>
>>>>> First QEMU's vhost_net device, the dataplane.
>>>>>
>>>>>> name: virtio-net
>>>>>> len: 1
>>>>>> num: 64
>>>>>>
>>>>>
>>>>> Second QEMU's vhost_net device, the control virtqueue.
>>>>
>>>> Ok, if I understand this correctly, the control vq doesn't
>>>> need separate queues for rx and tx.
>>>>
>>>
>>> That's right. Since CVQ has one reply per command, the driver can just
>>> send ro+rw descriptors to the device. In the case of RX, the device
>>> needs a queue with only-writable descriptors, as neither the device nor
>>> the driver knows how many packets will arrive.
>>
>> Got it, this makes sense now.
>>
>>>>>> I am not sure how to match the above log lines to the
>>>>>> right virtio-net device since the actual value of num
>>>>>> can be less than "max_vq_size" in the output of "vdpa
>>>>>> dev show".
>>>>>>
>>>>>
>>>>> Yes, the device can set a different vq max per vq, and the driver can
>>>>> negotiate a lower vq size per vq too.
>>>>>
>>>>>> I think the first 3 log lines correspond to the virtio
>>>>>> net device that I am using for testing since it has
>>>>>> 2 vqs (rx and tx) while the other virtio-net device
>>>>>> only has one vq.
>>>>>>
>>>>>> When printing out the values of svq->vring.num,
>>>>>> used_elem.len and used_elem.id in vhost_svq_get_buf,
>>>>>> there are two sets of output. One set corresponds to
>>>>>> svq->vring.num = 64 and the other corresponds to
>>>>>> svq->vring.num = 256.
>>>>>>
>>>>>> For svq->vring.num = 64, only the following line
>>>>>> is printed repeatedly:
>>>>>>
>>>>>> size: 64, len: 1, i: 0
>>>>>>
>>>>>
>>>>> This is with packed=off, right? If this is testing with packed, you
>>>>> need to change the code to accommodate it. Let me know if you need
>>>>> more help with this.
>>>>
>>>> Yes, this is for packed=off. For the time being, I am trying to
>>>> get L2 to communicate with L0 using split virtqueues and x-svq=true.
>>>>
>>>
>>> Got it.
>>>
>>>>> In the CVQ the only reply is a byte, indicating if the command was
>>>>> applied or not. This seems ok to me.
>>>>
>>>> Understood.
>>>>
>>>>> The queue can also recycle ids as long as they are not available, so
>>>>> that part seems correct to me too.
>>>>
>>>> I am a little confused here. The ids are recycled when they are
>>>> available (i.e., the id is not already in use), right?
>>>>
>>>
>>> In virtio, "available" means the device can use them, and "used" means
>>> the device has returned them to the driver. I think you're aligned; it's
>>> just better to follow the virtio nomenclature :).
>>
>> Got it.
>>
>>>>>> For svq->vring.num = 256, the following line is
>>>>>> printed 20 times,
>>>>>>
>>>>>> size: 256, len: 0, i: 0
>>>>>>
>>>>>> followed by:
>>>>>>
>>>>>> size: 256, len: 0, i: 1
>>>>>> size: 256, len: 0, i: 1
>>>>>>
>>>>>
>>>>> This makes sense for the tx queue too. Can you print the VirtQueue index?
>>>>
>>>> For svq->vring.num = 64, the vq index is 2. So the following line
>>>> (svq->vring.num, used_elem.len, used_elem.id, svq->vq->queue_index)
>>>> is printed repeatedly:
>>>>
>>>> size: 64, len: 1, i: 0, vq idx: 2
>>>>
>>>> For svq->vring.num = 256, the following line is repeated several
>>>> times:
>>>>
>>>> size: 256, len: 0, i: 0, vq idx: 1
>>>>
>>>> This is followed by:
>>>>
>>>> size: 256, len: 0, i: 1, vq idx: 1
>>>>
>>>> In both cases, queue_index is 1. To get the value of queue_index,
>>>> I used "virtio_get_queue_index(svq->vq)" [2].
>>>>
>>>> Since the queue_index is 1, I guess this means this is the tx queue
>>>> and the value of len (0) is correct. However, nothing with
>>>> queue_index % 2 == 0 is printed by vhost_svq_get_buf() which means
>>>> the device is not sending anything to the guest. Is this correct?
>>>>
>>>
>>> Yes, that's totally correct.
>>>
>>> You can set -netdev tap,...,vhost=off in L0 qemu and trace (or debug
>>> with gdb) it to check what is receiving. You should see calls to
>>> hw/net/virtio-net.c:virtio_net_flush_tx. The corresponding function to
>>> receive is virtio_net_receive_rcu; I recommend you trace it too, just
>>> in case you see any strange call to it.
>>>
>>
>> I added "vhost=off" to -netdev tap in L0's qemu command. I followed all
>> the steps in the blog [1] up till the point where L2 is booted. Before
>> booting L2, I had no issues pinging L0 from L1.
>>
>> For each ping, the following trace lines were printed by QEMU:
>>
>> virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 0 out_num 1
>> virtqueue_pop vq 0x5d04109b0ce8 elem 0x5d041024f560 in_num 0 out_num 1
>> virtqueue_fill vq 0x5d04109b0ce8 elem 0x5d041024f560 len 0 idx 0
>> virtqueue_flush vq 0x5d04109b0ce8 count 1
>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0ce8
>> virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 1 out_num 0
>> virtqueue_pop vq 0x5d04109b0c50 elem 0x5d041024f560 in_num 1 out_num 0
>> virtqueue_fill vq 0x5d04109b0c50 elem 0x5d041024f560 len 110 idx 0
>> virtqueue_flush vq 0x5d04109b0c50 count 1
>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50
>>
>> The first 5 lines look like they were printed when an echo request was
>> sent to L0 and the next 5 lines were printed when an echo reply was
>> received.
>>
>> After booting L2, I set up the tap device's IP address in L0 and the
>> vDPA port's IP address in L2.
>>
>> When trying to ping L0 from L2, I only see the following lines being
>> printed:
>>
>> virtqueue_alloc_element elem 0x5d041099ffd0 size 56 in_num 0 out_num 1
>> virtqueue_pop vq 0x5d0410d87168 elem 0x5d041099ffd0 in_num 0 out_num 1
>> virtqueue_fill vq 0x5d0410d87168 elem 0x5d041099ffd0 len 0 idx 0
>> virtqueue_flush vq 0x5d0410d87168 count 1
>> virtio_notify vdev 0x5d0410d79a10 vq 0x5d0410d87168
>>
>> There's no reception. I used wireshark to inspect the packets that are
>> being sent and received through the tap device in L0.
>>
>> When pinging L0 from L2, I see one of the following two outcomes:
>>
>> Outcome 1:
>> ----------
>> L2 broadcasts ARP packets and L0 replies to L2.
>>
>> Source Destination Protocol Length Info
>> 52:54:00:12:34:57 Broadcast ARP 42 Who has 111.1.1.1? Tell 111.1.1.2
>> d2:6d:b9:61:e1:9a 52:54:00:12:34:57 ARP 42 111.1.1.1 is at d2:6d:b9:61:e1:9a
>>
>> Outcome 2 (less frequent):
>> --------------------------
>> L2 sends an ICMP echo request packet to L0 and L0 sends a reply,
>> but the reply is not received by L2.
>>
>> Source Destination Protocol Length Info
>> 111.1.1.2 111.1.1.1 ICMP 98 Echo (ping) request id=0x0006, seq=1/256, ttl=64
>> 111.1.1.1 111.1.1.2 ICMP 98 Echo (ping) reply id=0x0006, seq=1/256, ttl=64
>>
>> When pinging L2 from L0 I get the following output in
>> wireshark:
>>
>> Source Destination Protocol Length Info
>> 111.1.1.1 111.1.1.2 ICMP 100 Echo (ping) request id=0x002c, seq=2/512, ttl=64 (no response found!)
>>
>> I do see a lot of traced lines being printed (by the QEMU instance that
>> was started in L0) with in_num > 0, for example:
>>
>> virtqueue_alloc_element elem 0x5d040fdbad30 size 56 in_num 1 out_num 0
>> virtqueue_pop vq 0x5d04109b0c50 elem 0x5d040fdbad30 in_num 1 out_num 0
>> virtqueue_fill vq 0x5d04109b0c50 elem 0x5d040fdbad30 len 76 idx 0
>> virtqueue_flush vq 0x5d04109b0c50 count 1
>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50
>>
>
> So L0 is able to receive data from L2. We're halfway there, Good! :).
>
>> It looks like L1 is receiving data from L0 but this is not related to
>> the pings that are sent from L2. I haven't figured out what data is
>> actually being transferred in this case. It's not necessary for all of
>> the data that L1 receives from L0 to be passed to L2, is it?
>>
>
> It should be noise, yes.
>

Understood.

>>>>>> For svq->vring.num = 256, the following line is
>>>>>> printed 20 times,
>>>>>>
>>>>>> size: 256, len: 0, i: 0
>>>>>>
>>>>>> followed by:
>>>>>>
>>>>>> size: 256, len: 0, i: 1
>>>>>> size: 256, len: 0, i: 1
>>>>>>
>>>>>
>>>>> This makes sense for the tx queue too. Can you print the VirtQueue index?
>>>>
>>>> For svq->vring.num = 64, the vq index is 2. So the following line
>>>> (svq->vring.num, used_elem.len, used_elem.id, svq->vq->queue_index)
>>>> is printed repeatedly:
>>>>
>>>> size: 64, len: 1, i: 0, vq idx: 2
>>>>
>>>> For svq->vring.num = 256, the following line is repeated several
>>>> times:
>>>>
>>>> size: 256, len: 0, i: 0, vq idx: 1
>>>>
>>>> This is followed by:
>>>>
>>>> size: 256, len: 0, i: 1, vq idx: 1
>>>>
>>>> In both cases, queue_index is 1.
>>
>> I also noticed that there are now some lines with svq->vring.num = 256
>> where len > 0. These lines were printed by the QEMU instance running
>> in L1, so this corresponds to data that was received by L2.
>>
>> svq->vring.num used_elem.len used_elem.id svq->vq->queue_index
>> size: 256 len: 82 i: 0 vq idx: 0
>> size: 256 len: 82 i: 1 vq idx: 0
>> size: 256 len: 82 i: 2 vq idx: 0
>> size: 256 len: 54 i: 3 vq idx: 0
>>
>> I still haven't figured out what data was received by L2 but I am
>> slightly confused as to why this data was received by L2 but not
>> the ICMP echo replies sent by L0.
>>
>
> We're on a good track, let's trace it deeper. I guess these are
> printed from vhost_svq_flush, right? Do virtqueue_fill,
> virtqueue_flush, and event_notifier_set(&svq->svq_call) run properly,
> or do you see anything strange with gdb / tracing?
>

Apologies for the delay in replying. It took me a while to figure
this out, but I have now understood why this doesn't work. L1 is
unable to receive messages from L0 because they get filtered out
by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
the MAC addresses.

In L0, I have:

$ ip a show tap0
6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
inet 111.1.1.1/24 scope global tap0
valid_lft forever preferred_lft forever
inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
valid_lft forever preferred_lft forever

In L1:

# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
altname enp0s2
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
valid_lft 83455sec preferred_lft 83455sec
inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic noprefixroute
valid_lft 86064sec preferred_lft 14064sec
inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
valid_lft forever preferred_lft forever

I'll call this L1-eth0.

In L2:

# ip a show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP gro0
link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
altname enp0s7
inet 111.1.1.2/24 scope global eth0
valid_lft forever preferred_lft forever

I'll call this L2-eth0.

Apart from eth0, lo is the only other device in both L1 and L2.

A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
as its destination address. When booting L2 with x-svq=false, the
value of n->mac in VirtIONet is also L2-eth0's address, so L1 accepts
the frames and passes them on to L2, and pinging works [2].

However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
(LSB = 56) in virtio_net_handle_mac() [3]. n->mac_table.macs also
does not seem to have L2-eth0's MAC address. Due to this,
receive_filter() filters out all the frames [4] that were meant for
L2-eth0.
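
To make the failure concrete, the check that ends up dropping the frames
boils down to something like the following. This is a simplified,
self-contained sketch of the unicast path as I understand it, not the
actual receive_filter() code, and the struct/function names are made up:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

struct mac_filter {
    uint8_t mac[ETH_ALEN];            /* n->mac: the device's own MAC */
    const uint8_t (*macs)[ETH_ALEN];  /* n->mac_table.macs */
    int in_use;                       /* number of entries in the table */
    bool promisc;
};

/* Returns true if a unicast frame with destination 'dst' is accepted. */
static bool accept_unicast(const struct mac_filter *f, const uint8_t *dst)
{
    if (f->promisc)
        return true;
    if (!memcmp(dst, f->mac, ETH_ALEN))        /* matches n->mac? */
        return true;
    for (int i = 0; i < f->in_use; i++)        /* or a mac_table entry? */
        if (!memcmp(dst, f->macs[i], ETH_ALEN))
            return true;
    return false;                              /* otherwise the frame is dropped */
}

In my setup, dst is L2-eth0's address (ending in :57) while n->mac ends
in :56 and the table has no matching entry, so every frame destined for
L2 takes the final "return false" path above.
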
With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
[3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
virtio_net_handle_mac() doesn't seem to be called at all. I haven't yet
understood how the MAC address is set in VirtIONet when x-svq=false.
Understanding this might help explain why n->mac has different values
when x-svq is false versus when it is true.
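
To watch this at runtime (and to catch whatever sets n->mac when
x-svq=false), I'm thinking of either a gdb watchpoint on n->mac or a
temporary debug print along these lines; this is just instrumentation I
may add locally, not part of the series:

#include <stdint.h>
#include <stdio.h>

/* Print a frame's destination MAC next to the device MAC (n->mac) at
 * filter time, to see which address the filter compares against. */
static void dump_filter_state(const uint8_t *dst, const uint8_t *dev_mac)
{
    fprintf(stderr,
            "filter: dst=%02x:%02x:%02x:%02x:%02x:%02x "
            "n->mac=%02x:%02x:%02x:%02x:%02x:%02x\n",
            dst[0], dst[1], dst[2], dst[3], dst[4], dst[5],
            dev_mac[0], dev_mac[1], dev_mac[2], dev_mac[3],
            dev_mac[4], dev_mac[5]);
}
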
Thanks,
Sahil

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/virtio-net.c#L1944
[2] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/virtio-net.c#L1775
[3] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/virtio-net.c#L1108
[4] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/virtio-net.c#L1770-1786