From: Sahil Siddiq <icegambit91@gmail.com>
To: Eugenio Perez Martin <eperezma@redhat.com>
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org,
Sahil Siddiq <sahilcdq@proton.me>
Subject: Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
Date: Thu, 6 Feb 2025 20:47:57 +0530
Message-ID: <360803dd-f1e0-48a3-8917-2477d8a821a9@gmail.com>
In-Reply-To: <CAJaqyWfkOwC_-3N66Gq2EM+eXz7hNv3n+W_2W6XtJZ0iS8PQPw@mail.gmail.com>
Hi,
On 2/6/25 12:42 PM, Eugenio Perez Martin wrote:
> On Thu, Feb 6, 2025 at 6:26 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>
>> Hi,
>>
>> On 2/4/25 11:40 PM, Eugenio Perez Martin wrote:
>>> On Tue, Feb 4, 2025 at 1:49 PM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>> On 1/31/25 12:27 PM, Eugenio Perez Martin wrote:
>>>>> On Fri, Jan 31, 2025 at 6:04 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>>>> On 1/24/25 1:04 PM, Eugenio Perez Martin wrote:
>>>>>>> On Fri, Jan 24, 2025 at 6:47 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>>>>>> On 1/21/25 10:07 PM, Eugenio Perez Martin wrote:
>>>>>>>>> On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq <icegambit91@gmail.com> wrote:
>>>>>>>>>> On 1/7/25 1:35 PM, Eugenio Perez Martin wrote:
>>>>>>>>>> [...]
>>>>>>>>>> Apologies for the delay in replying. It took me a while to figure
>>>>>>>>>> this out, but I have now understood why this doesn't work. L1 is
>>>>>>>>>> unable to receive messages from L0 because they get filtered out
>>>>>>>>>> by hw/net/virtio-net.c:receive_filter [1]. There's an issue with
>>>>>>>>>> the MAC addresses.
>>>>>>>>>>
>>>>>>>>>> In L0, I have:
>>>>>>>>>>
>>>>>>>>>> $ ip a show tap0
>>>>>>>>>> 6: tap0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000
>>>>>>>>>> link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff
>>>>>>>>>> inet 111.1.1.1/24 scope global tap0
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>> inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>>
>>>>>>>>>> In L1:
>>>>>>>>>>
>>>>>>>>>> # ip a show eth0
>>>>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>>>>>>>>> link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff
>>>>>>>>>> altname enp0s2
>>>>>>>>>> inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0
>>>>>>>>>> valid_lft 83455sec preferred_lft 83455sec
>>>>>>>>>> inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic noprefixroute
>>>>>>>>>> valid_lft 86064sec preferred_lft 14064sec
>>>>>>>>>> inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>>
>>>>>>>>>> I'll call this L1-eth0.
>>>>>>>>>>
>>>>>>>>>> In L2:
>>>>>>>>>> # ip a show eth0
>>>>>>>>>> 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
>>>>>>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff
>>>>>>>>>> altname enp0s7
>>>>>>>>>> inet 111.1.1.2/24 scope global eth0
>>>>>>>>>> valid_lft forever preferred_lft forever
>>>>>>>>>>
>>>>>>>>>> I'll call this L2-eth0.
>>>>>>>>>>
>>>>>>>>>> Apart from eth0, lo is the only other device in both L1 and L2.
>>>>>>>>>>
>>>>>>>>>> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57)
>>>>>>>>>> as its destination address. When booting L2 with x-svq=false, the
>>>>>>>>>> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts
>>>>>>>>>> the frames and passes them on to L2 and pinging works [2].
>>>>>>>>>>
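>>>>>>>>>> For reference, the unicast path in receive_filter() is roughly the
>>>>>>>>>> following (a simplified sketch of hw/net/virtio-net.c; the real
>>>>>>>>>> function also handles vlan filtering, broadcast/multicast and the
>>>>>>>>>> MAC table):
>>>>>>>>>>
>>>>>>>>>> static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
>>>>>>>>>> {
>>>>>>>>>>     uint8_t *ptr = (uint8_t *)buf;
>>>>>>>>>>
>>>>>>>>>>     if (n->promisc) {
>>>>>>>>>>         return 1;                /* promiscuous mode: accept all */
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>>     ptr += n->host_hdr_len;      /* ptr now points at the dest MAC */
>>>>>>>>>>     ...
>>>>>>>>>>     if (!memcmp(ptr, n->mac, ETH_ALEN)) {
>>>>>>>>>>         return 1;                /* dest MAC == n->mac: accept */
>>>>>>>>>>     }
>>>>>>>>>>     ...
>>>>>>>>>>     return 0;                    /* otherwise filter out */
>>>>>>>>>> }
>>>>>>>>>>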
>>>>>>>>>
>>>>>>>>> So this behavior is interesting in itself. But L1's kernel network
>>>>>>>>> stack should not receive anything. As I read it, even if it received
>>>>>>>>> the frame, it should not forward it to L2, as L2 is in a different
>>>>>>>>> subnet. Are you able to see it using tcpdump on L1?
>>>>>>>>
>>>>>>>> I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets
>>>>>>>> that were directed at L2 even though L2 was able to receive them.
>>>>>>>> Similarly, it didn't capture any packets that were sent from L2 to
>>>>>>>> L0. This is when L2 is launched with x-svq=false.
>>>>>>>> [...]
>>>>>>>> With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in
>>>>>>>> receive_filter allows L2 to receive packets from L0. I added
>>>>>>>> the following line just before line 1771 [1] to check this out.
>>>>>>>>
>>>>>>>> n->mac[5] = 0x57;
>>>>>>>>
>>>>>>>
>>>>>>> That's very interesting. Let me answer all the gdb questions below and
>>>>>>> we can debug it deeper :).
>>>>>>>
>>>>>>
>>>>>> Thank you for the primer on using gdb with QEMU. I am able to debug
>>>>>> QEMU now.
>>>>>>
>>>>>>>>> Maybe we can make the scenario clearer by telling which virtio-net
>>>>>>>>> device is which with virtio_net_pci,mac=XX:... ?
>>>>>>>>>
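>>>>>>>>> For example, something like this on each layer (hypothetical MAC
>>>>>>>>> values, just picked so the devices can be told apart in traces):
>>>>>>>>>
>>>>>>>>> -device virtio-net-pci,netdev=net0,mac=52:54:00:12:34:57,...
>>>>>>>>>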
>>>>>>>>>> However, when booting L2 with x-svq=true, n->mac is set to L1-eth0
>>>>>>>>>> (LSB = 56) in virtio_net_handle_mac() [3].
>>>>>>>>>
>>>>>>>>> Can you tell with gdb bt if this function is called from net or the
>>>>>>>>> SVQ subsystem?
>>>>>>>>
>>>>>>
>>>>>> It looks like the function is being called from net.
>>>>>>
>>>>>> (gdb) bt
>>>>>> #0 virtio_net_handle_mac (n=0x15622425e, cmd=85 'U', iov=0x555558865980, iov_cnt=1476792840) at ../hw/net/virtio-net.c:1098
>>>>>> #1 0x0000555555e5920b in virtio_net_handle_ctrl_iov (vdev=0x555558fdacd0, in_sg=0x5555580611f8, in_num=1, out_sg=0x555558061208,
>>>>>> out_num=1) at ../hw/net/virtio-net.c:1581
>>>>>> #2 0x0000555555e593a0 in virtio_net_handle_ctrl (vdev=0x555558fdacd0, vq=0x555558fe7730) at ../hw/net/virtio-net.c:1610
>>>>>> #3 0x0000555555e9a7d8 in virtio_queue_notify_vq (vq=0x555558fe7730) at ../hw/virtio/virtio.c:2484
>>>>>> #4 0x0000555555e9dffb in virtio_queue_host_notifier_read (n=0x555558fe77a4) at ../hw/virtio/virtio.c:3869
>>>>>> #5 0x000055555620329f in aio_dispatch_handler (ctx=0x555557d9f840, node=0x7fffdca7ba80) at ../util/aio-posix.c:373
>>>>>> #6 0x000055555620346f in aio_dispatch_handlers (ctx=0x555557d9f840) at ../util/aio-posix.c:415
>>>>>> #7 0x00005555562034cb in aio_dispatch (ctx=0x555557d9f840) at ../util/aio-posix.c:425
>>>>>> #8 0x00005555562242b5 in aio_ctx_dispatch (source=0x555557d9f840, callback=0x0, user_data=0x0) at ../util/async.c:361
>>>>>> #9 0x00007ffff6d86559 in ?? () from /usr/lib/libglib-2.0.so.0
>>>>>> #10 0x00007ffff6d86858 in g_main_context_dispatch () from /usr/lib/libglib-2.0.so.0
>>>>>> #11 0x0000555556225bf9 in glib_pollfds_poll () at ../util/main-loop.c:287
>>>>>> #12 0x0000555556225c87 in os_host_main_loop_wait (timeout=294672) at ../util/main-loop.c:310
>>>>>> #13 0x0000555556225db6 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:589
>>>>>> #14 0x0000555555c0c1a3 in qemu_main_loop () at ../system/runstate.c:835
>>>>>> #15 0x000055555612bd8d in qemu_default_main (opaque=0x0) at ../system/main.c:48
>>>>>> #16 0x000055555612be3d in main (argc=23, argv=0x7fffffffe508) at ../system/main.c:76
>>>>>>
>>>>>> virtio_queue_notify_vq at hw/virtio/virtio.c:2484 [2] calls
>>>>>> vq->handle_output(vdev, vq). I see "handle_output" is a function
>>>>>> pointer and in this case it seems to be pointing to
>>>>>> virtio_net_handle_ctrl.
>>>>>>
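>>>>>> The relevant part of virtio_queue_notify_vq() looks roughly like this
>>>>>> (trimmed from hw/virtio/virtio.c; tracing and error handling omitted):
>>>>>>
>>>>>> static void virtio_queue_notify_vq(VirtQueue *vq)
>>>>>> {
>>>>>>     if (vq->vring.desc && vq->handle_output) {
>>>>>>         VirtIODevice *vdev = vq->vdev;
>>>>>>         ...
>>>>>>         /* For the control vq, this points to virtio_net_handle_ctrl */
>>>>>>         vq->handle_output(vdev, vq);
>>>>>>         ...
>>>>>>     }
>>>>>> }
>>>>>>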
>>>>>>>>>> [...]
>>>>>>>>>> With x-svq=true, I see that n->mac is set by virtio_net_handle_mac()
>>>>>>>>>> [3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false,
>>>>>>>>>> virtio_net_handle_mac() doesn't seem to be getting called. I haven't
>>>>>>>>>> understood how the MAC address is set in VirtIONet when x-svq=false.
>>>>>>>>>> Understanding this might help see why n->mac has different values
>>>>>>>>>> when x-svq is false vs when it is true.
>>>>>>>>>
>>>>>>>>> Ok this makes sense, as x-svq=true is the one that receives the set
>>>>>>>>> mac message. You should see it in L0's QEMU though, both in x-svq=on
>>>>>>>>> and x-svq=off scenarios. Can you check it?
>>>>>>>>
>>>>>>>> L0's QEMU seems to be receiving the "set mac" message only when L1
>>>>>>>> is launched with x-svq=true. With x-svq=off, I don't see any call
>>>>>>>> to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET
>>>>>>>> in L0.
>>>>>>>>
>>>>>>>
>>>>>>> Ok this is interesting. Let's disable control virtqueue to start with
>>>>>>> something simpler:
>>>>>>> device virtio-net-pci,netdev=net0,ctrl_vq=off,...
>>>>>>>
>>>>>>> QEMU will start complaining about features that depend on ctrl_vq,
>>>>>>> like ctrl_rx. Let's disable all of them and check this new scenario.
>>>>>>>
>>>>>>
>>>>>> I am still investigating this part. I set ctrl_vq=off and ctrl_rx=off.
>>>>>> I didn't get any errors as such about features that depend on ctrl_vq.
>>>>>> However, I did notice that after booting L2 (x-svq=true as well as
>>>>>> x-svq=false), no eth0 device was created. There was only a "lo" interface
>>>>>> in L2. An eth0 interface is present only when L1 (L0 QEMU) is booted
>>>>>> with ctrl_vq=on and ctrl_rx=on.
>>>>>>
>>>>>
>>>>> Any error messages on the nested guest's dmesg?
>>>>
>>>> Oh, yes, there were error messages in the output of dmesg related to
>>>> ctrl_vq. After adding the following args, there were no error messages
>>>> in dmesg.
>>>>
>>>> -device virtio-net-pci,ctrl_vq=off,ctrl_rx=off,ctrl_vlan=off,ctrl_mac_addr=off
>>>>
>>>> I see that the eth0 interface is also created. I am able to ping L0
>>>> from L2 and vice versa as well (even with x-svq=true). This is because
>>>> n->promisc is set when these features are disabled and receive_filter() [1]
>>>> always returns 1.
>>>>
>>>>> Is it fixed when you set the same mac address on L0
>>>>> virtio-net-pci and L1's?
>>>>>
>>>>
>>>> I didn't have to set the same mac address in this case, since promiscuous
>>>> mode seems to be enabled, which allows pinging to work.
>>>>
>>>> There is another concept that I am a little confused about. In the case
>>>> where L2 is booted with x-svq=false (and all ctrl features such as ctrl_vq,
>>>> ctrl_rx, etc. are on), I am able to ping L0 from L2. When tracing
>>>> receive_filter() in L0-QEMU, I see the values of n->mac and the destination
>>>> mac address in the ICMP packet match [2].
>>>>
>>>
>>> SVQ makes an effort to set the mac address at the beginning of
>>> operation. L0 interprets it as "filter out all MACs except this
>>> one". But SVQ cannot set the mac if ctrl_mac_addr=off, so the nic
>>> receives all packets and the guest kernel needs to do the filtering
>>> itself.
>>>
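>>> The command SVQ injects through the shadow control virtqueue at start
>>> is essentially this (a rough sketch in the spirit of net/vhost-vdpa.c's
>>> MAC load, not the literal code):
>>>
>>> const struct virtio_net_ctrl_hdr ctrl = {
>>>     .class = VIRTIO_NET_CTRL_MAC,
>>>     .cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
>>> };
>>> /* followed by the 6-byte MAC from QEMU's cmdline, i.e. n->mac */
>>>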
>>>> I haven't understood what n->mac refers to over here. MAC addresses are
>>>> globally unique and so the mac address of the device in L1 should be
>>>> different from that in L2.
>>>
>>> With vDPA, they should be the same device even if they are declared in
>>> different cmdlines or layers of virtualization. If it were a physical
>>> NIC, QEMU should declare the MAC of the physical NIC too.
>>
>> Understood. I guess the issue with x-svq=true is that the MAC address
>> set in L0-QEMU's n->mac is different from the device in L2. That's why
>> the packets get filtered out with x-svq=true but pinging works with
>> x-svq=false.
>>
>
> Right!
>
>
>>> There is a thread on the QEMU mail list discussing how QEMU should
>>> influence the control plane, and maybe it would be easier if QEMU
>>> just checked the device's MAC and ignored the cmdline. But then, that
>>> behavior would be surprising for the rest of the vhost backends, like
>>> vhost-kernel. Or QEMU could just emit a warning if the MAC is
>>> different from the one that the device reports.
>>>
>>
>> Got it.
>>
>>>> But I see L0-QEMU's n->mac is set to the mac
>>>> address of the device in L2 (allowing receive_filter to accept the packet).
>>>>
>>>
>>> That's interesting. Can you check further with gdb what receive_filter
>>> and virtio_net_receive_rcu do? As long as virtio_net_receive_rcu
>>> flushes the packet onto the receive queue, SVQ should receive it.
>>>
>> The control flow is the same irrespective of the value of x-svq up to
>> the MAC address comparison in receive_filter() [1]. With x-svq=true,
>> the equality check between n->mac and the packet's destination MAC address
>> fails and the packet is filtered out; it is not flushed to the receive
>> queue. With x-svq=false, this is not the case.
>>
>> On 2/4/25 11:45 PM, Eugenio Perez Martin wrote:
>>> PS: Please note that you can check packed_vq SVQ implementation
>>> already without CVQ, as these features are totally orthogonal :).
>>>
>>
>> Right. Now that I can ping with the ctrl features turned off, I think
>> this should take precedence. There's another issue specific to the
>> packed virtqueue case. It causes the kernel to crash. I have been
>> investigating this and the situation here looks very similar to what's
>> explained in Jason Wang's mail [2]. My plan of action is to apply his
>> changes in L2's kernel and check if that resolves the problem.
>>
>> The details of the crash can be found in this mail [3].
>>
>
> If you're testing this series without changes, I think that is caused
> by not implementing the packed version of vhost_svq_get_buf.
>
> https://lists.nongnu.org/archive/html/qemu-devel/2024-12/msg01902.html
>
Oh, apologies, I think I had misunderstood your response in the linked mail.
Until now, I thought they were unrelated. In that case, I'll implement the
packed version of vhost_svq_get_buf. Hopefully that fixes it :).
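My rough idea for the used-buffer check is based on how the kernel detects
used descriptors in a packed ring (drivers/virtio/virtio_ring.c,
is_used_desc_packed()); the names below are placeholders from my
work-in-progress, not the final implementation:

static bool vhost_svq_more_used_packed(const VhostShadowVirtqueue *svq)
{
    uint16_t flags =
        le16_to_cpu(svq->vring_packed.vring.desc[svq->last_used_idx].flags);
    bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
    bool used = flags & (1 << VRING_PACKED_DESC_F_USED);

    /* A descriptor is used when both flags match the device's
     * used wrap counter. */
    return avail == used && used == svq->used_wrap_counter;
}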
Thanks,
Sahil