From: Sahil Siddiq
Date: Fri, 24 Jan 2025 11:16:53 +0530
Subject: Re: [RFC v4 0/5] Add packed virtqueue to shadow virtqueue
To: Eugenio Perez Martin
Cc: sgarzare@redhat.com, mst@redhat.com, qemu-devel@nongnu.org, Sahil Siddiq
References: <20241205203430.76251-1-sahilcdq@proton.me> <9b20ffc4-b55b-42c8-9847-a677c30c0051@gmail.com> <4ee57bd3-5ea0-49a7-969e-c3fe902d8246@gmail.com>

Hi, On 1/21/25 10:07 PM, Eugenio Perez Martin wrote: > On Sun, Jan 19, 2025 at 7:37 AM Sahil Siddiq wrote: >> >> Hi, >> >> On 1/7/25 1:35 PM, Eugenio Perez Martin wrote: >>> On Fri, Jan 3, 2025 at 2:06 PM Sahil Siddiq wrote: >>>> >>>> Hi, >>>> >>>> On 12/20/24 12:28 PM, Eugenio Perez Martin wrote: >>>>> On Thu, Dec 19, 2024 at 8:37 PM Sahil Siddiq wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 12/17/24 1:20 PM, Eugenio Perez Martin wrote: >>>>>>> On Tue, Dec 17, 2024 at 6:45 AM Sahil Siddiq wrote: >>>>>>>> On 12/16/24 2:09 PM, Eugenio Perez Martin wrote: >>>>>>>>> On Sun, Dec 15, 2024 at 6:27 PM Sahil Siddiq wrote: >>>>>>>>>> On 12/10/24 2:57 PM, Eugenio Perez Martin wrote: >>>>>>>>>>> On Thu, Dec 5, 2024 at 9:34 PM Sahil Siddiq wrote: >>>>>>>>>>>> [...] >>>>>>>>>>>> I have been following the "Hands on vDPA: what do you do >>>>>>>>>>>> when you ain't got the hardware v2 (Part 2)" [1] blog to >>>>>>>>>>>> test my changes. To boot the L1 VM, I ran: >>>>>>>>>>>> >>>>>>>>>>>> sudo ./qemu/build/qemu-system-x86_64 \ >>>>>>>>>>>> -enable-kvm \ >>>>>>>>>>>> -drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \ >>>>>>>>>>>> -net nic,model=virtio \ >>>>>>>>>>>> -net user,hostfwd=tcp::2222-:22 \ >>>>>>>>>>>> -device intel-iommu,snoop-control=on \ >>>>>>>>>>>> -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=on,event_idx=off,bus=pcie.0,addr=0x4 \ >>>>>>>>>>>> -netdev tap,id=net0,script=no,downscript=no \ >>>>>>>>>>>> -nographic \ >>>>>>>>>>>> -m 8G \ >>>>>>>>>>>> -smp 4 \ >>>>>>>>>>>> -M q35 \ >>>>>>>>>>>> -cpu host 2>&1 | tee vm.log >>>>>>>>>>>> >>>>>>>>>>>> Without "guest_uso4=off,guest_uso6=off,host_uso=off, >>>>>>>>>>>> guest_announce=off" in "-device virtio-net-pci", QEMU >>>>>>>>>>>> throws "vdpa svq does not work with features" [2] when >>>>>>>>>>>> trying to boot L2. >>>>>>>>>>>> >>>>>>>>>>>> The enums added in commit #2 in this series is new and >>>>>>>>>>>> wasn't in the earlier versions of the series. Without >>>>>>>>>>>> this change, x-svq=true throws "SVQ invalid device feature >>>>>>>>>>>> flags" [3] and x-svq is consequently disabled. >>>>>>>>>>>> >>>>>>>>>>>> The first issue is related to running traffic in L2 >>>>>>>>>>>> with vhost-vdpa. >>>>>>>>>>>> >>>>>>>>>>>> In L0: >>>>>>>>>>>> >>>>>>>>>>>> $ ip addr add 111.1.1.1/24 dev tap0 >>>>>>>>>>>> $ ip link set tap0 up >>>>>>>>>>>> $ ip addr show tap0 >>>>>>>>>>>> 4: tap0: mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000 >>>>>>>>>>>> link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff >>>>>>>>>>>> inet 111.1.1.1/24 scope global tap0 >>>>>>>>>>>> valid_lft forever preferred_lft forever >>>>>>>>>>>> inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll >>>>>>>>>>>> valid_lft forever preferred_lft forever >>>>>>>>>>>> >>>>>>>>>>>> I am able to run traffic in L2 when booting without >>>>>>>>>>>> x-svq.
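
A small aside before the L1/L2 command lines that follow: to rule out
feature negotiation as a variable, the guest itself can show whether
VIRTIO_F_RING_PACKED (bit 34) was actually negotiated. As far as I
understand, the generic virtio sysfs "features" attribute prints one
'0'/'1' character per feature bit starting at bit 0, so bit 34 should be
the 35th character. Something along these lines (untested sketch, relying
on that assumption):

# Run inside the guest; character 35 of the bit string is bit 34,
# i.e. VIRTIO_F_RING_PACKED.
for d in /sys/bus/virtio/devices/virtio*; do
    echo "$d: packed=$(cut -c35 "$d"/features)"
done
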
>>>>>>>>>>>> >>>>>>>>>>>> In L1: >>>>>>>>>>>> >>>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \ >>>>>>>>>>>> -nographic \ >>>>>>>>>>>> -m 4G \ >>>>>>>>>>>> -enable-kvm \ >>>>>>>>>>>> -M q35 \ >>>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \ >>>>>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0 \ >>>>>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \ >>>>>>>>>>>> -smp 4 \ >>>>>>>>>>>> -cpu host \ >>>>>>>>>>>> 2>&1 | tee vm.log >>>>>>>>>>>> >>>>>>>>>>>> In L2: >>>>>>>>>>>> >>>>>>>>>>>> # ip addr add 111.1.1.2/24 dev eth0 >>>>>>>>>>>> # ip addr show eth0 >>>>>>>>>>>> 2: eth0: mtu 1500 qdisc fq_codel state UP group default qlen 1000 >>>>>>>>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff >>>>>>>>>>>> altname enp0s7 >>>>>>>>>>>> inet 111.1.1.2/24 scope global eth0 >>>>>>>>>>>> valid_lft forever preferred_lft forever >>>>>>>>>>>> inet6 fe80::9877:de30:5f17:35f9/64 scope link noprefixroute >>>>>>>>>>>> valid_lft forever preferred_lft forever >>>>>>>>>>>> >>>>>>>>>>>> # ip route >>>>>>>>>>>> 111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2 >>>>>>>>>>>> >>>>>>>>>>>> # ping 111.1.1.1 -w3 >>>>>>>>>>>> PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data. >>>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=1 ttl=64 time=0.407 ms >>>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=2 ttl=64 time=0.671 ms >>>>>>>>>>>> 64 bytes from 111.1.1.1: icmp_seq=3 ttl=64 time=0.291 ms >>>>>>>>>>>> >>>>>>>>>>>> --- 111.1.1.1 ping statistics --- >>>>>>>>>>>> 3 packets transmitted, 3 received, 0% packet loss, time 2034ms >>>>>>>>>>>> rtt min/avg/max/mdev = 0.291/0.456/0.671/0.159 ms >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> But if I boot L2 with x-svq=true as shown below, I am unable >>>>>>>>>>>> to ping the host machine. >>>>>>>>>>>> >>>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \ >>>>>>>>>>>> -nographic \ >>>>>>>>>>>> -m 4G \ >>>>>>>>>>>> -enable-kvm \ >>>>>>>>>>>> -M q35 \ >>>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \ >>>>>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \ >>>>>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \ >>>>>>>>>>>> -smp 4 \ >>>>>>>>>>>> -cpu host \ >>>>>>>>>>>> 2>&1 | tee vm.log >>>>>>>>>>>> >>>>>>>>>>>> In L2: >>>>>>>>>>>> >>>>>>>>>>>> # ip addr add 111.1.1.2/24 dev eth0 >>>>>>>>>>>> # ip addr show eth0 >>>>>>>>>>>> 2: eth0: mtu 1500 qdisc fq_codel state UP group default qlen 1000 >>>>>>>>>>>> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff >>>>>>>>>>>> altname enp0s7 >>>>>>>>>>>> inet 111.1.1.2/24 scope global eth0 >>>>>>>>>>>> valid_lft forever preferred_lft forever >>>>>>>>>>>> inet6 fe80::9877:de30:5f17:35f9/64 scope link noprefixroute >>>>>>>>>>>> valid_lft forever preferred_lft forever >>>>>>>>>>>> >>>>>>>>>>>> # ip route >>>>>>>>>>>> 111.1.1.0/24 dev eth0 proto kernel scope link src 111.1.1.2 >>>>>>>>>>>> >>>>>>>>>>>> # ping 111.1.1.1 -w10 >>>>>>>>>>>> PING 111.1.1.1 (111.1.1.1) 56(84) bytes of data. 
>>>>>>>>>>>> From 111.1.1.2 icmp_seq=1 Destination Host Unreachable >>>>>>>>>>>> ping: sendmsg: No route to host >>>>>>>>>>>> From 111.1.1.2 icmp_seq=2 Destination Host Unreachable >>>>>>>>>>>> From 111.1.1.2 icmp_seq=3 Destination Host Unreachable >>>>>>>>>>>> >>>>>>>>>>>> --- 111.1.1.1 ping statistics --- >>>>>>>>>>>> 3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2076ms >>>>>>>>>>>> pipe 3 >>>>>>>>>>>> >>>>>>>>>>>> The other issue is related to booting L2 with "x-svq=true" >>>>>>>>>>>> and "packed=on". >>>>>>>>>>>> >>>>>>>>>>>> In L1: >>>>>>>>>>>> >>>>>>>>>>>> $ ./qemu/build/qemu-system-x86_64 \ >>>>>>>>>>>> -nographic \ >>>>>>>>>>>> -m 4G \ >>>>>>>>>>>> -enable-kvm \ >>>>>>>>>>>> -M q35 \ >>>>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \ >>>>>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-svq=true \ >>>>>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,packed=on,bus=pcie.0,addr=0x7 \ >>>>>>>>>>>> -smp 4 \ >>>>>>>>>>>> -cpu host \ >>>>>>>>>>>> 2>&1 | tee vm.log >>>>>>>>>>>> >>>>>>>>>>>> The kernel throws "virtio_net virtio1: output.0:id 0 is not >>>>>>>>>>>> a head!" [4]. >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> So this series implements the descriptor forwarding from the guest to >>>>>>>>>>> the device in packed vq. We also need to forward the descriptors from >>>>>>>>>>> the device to the guest. The device writes them in the SVQ ring. >>>>>>>>>>> >>>>>>>>>>> The functions responsible for that in QEMU are >>>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_flush, which is called by >>>>>>>>>>> the device when used descriptors are written to the SVQ, which calls >>>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf. We need to do >>>>>>>>>>> modifications similar to vhost_svq_add: Make them conditional if we're >>>>>>>>>>> in split or packed vq, and "copy" the code from Linux's >>>>>>>>>>> drivers/virtio/virtio_ring.c:virtqueue_get_buf. >>>>>>>>>>> >>>>>>>>>>> After these modifications you should be able to ping and forward >>>>>>>>>>> traffic. As always, It is totally ok if it needs more than one >>>>>>>>>>> iteration, and feel free to ask any question you have :). >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I misunderstood this part. While working on extending >>>>>>>>>> hw/virtio/vhost-shadow-virtqueue.c:vhost_svq_get_buf() [1] >>>>>>>>>> for packed vqs, I realized that this function and >>>>>>>>>> vhost_svq_flush() already support split vqs. However, I am >>>>>>>>>> unable to ping L0 when booting L2 with "x-svq=true" and >>>>>>>>>> "packed=off" or when the "packed" option is not specified >>>>>>>>>> in QEMU's command line. >>>>>>>>>> >>>>>>>>>> I tried debugging these functions for split vqs after running >>>>>>>>>> the following QEMU commands while following the blog [2]. 
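
As an aside before the command listings below: my current, untested
understanding of what the packed counterpart of vhost_svq_get_buf() could
look like, modelled on Linux's drivers/virtio/virtio_ring.c:virtqueue_get_buf()
as suggested above, is roughly the following sketch. The svq->vring_packed,
svq->last_used_idx, svq->used_wrap_counter and svq->desc_state names are
assumptions on my side, not code from this series; only the
VRING_PACKED_DESC_F_AVAIL/USED bits and the wrap-counter rule come from the
virtio spec.

/* Sketch only: has the device marked the next descriptor as used? */
static bool vhost_svq_more_used_packed(const VhostShadowVirtqueue *svq)
{
    uint16_t flags =
        le16_to_cpu(svq->vring_packed.vring.desc[svq->last_used_idx].flags);
    bool avail = flags & (1 << VRING_PACKED_DESC_F_AVAIL);
    bool used = flags & (1 << VRING_PACKED_DESC_F_USED);

    /* A descriptor is used when both wrap bits match our used wrap counter. */
    return avail == used && used == svq->used_wrap_counter;
}

/* Sketch only: packed-ring variant of vhost_svq_get_buf(). */
static VirtQueueElement *vhost_svq_get_buf_packed(VhostShadowVirtqueue *svq,
                                                  uint32_t *len)
{
    const struct vring_packed_desc *desc;
    uint16_t id;

    if (!vhost_svq_more_used_packed(svq)) {
        return NULL;
    }

    /* Read id/len only after the flags say the descriptor is used. */
    smp_rmb();
    desc = &svq->vring_packed.vring.desc[svq->last_used_idx];
    id = le16_to_cpu(desc->id);
    *len = le32_to_cpu(desc->len);

    /* Step over the whole chain that was made available under this id. */
    svq->last_used_idx += svq->desc_state[id].ndescs;
    if (svq->last_used_idx >= svq->vring_packed.vring.num) {
        svq->last_used_idx -= svq->vring_packed.vring.num;
        svq->used_wrap_counter = !svq->used_wrap_counter;
    }

    return g_steal_pointer(&svq->desc_state[id].elem);
}
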
>>>>>>>>>> >>>>>>>>>> Booting L1: >>>>>>>>>> >>>>>>>>>> $ sudo ./qemu/build/qemu-system-x86_64 \ >>>>>>>>>> -enable-kvm \ >>>>>>>>>> -drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \ >>>>>>>>>> -net nic,model=virtio \ >>>>>>>>>> -net user,hostfwd=tcp::2222-:22 \ >>>>>>>>>> -device intel-iommu,snoop-control=on \ >>>>>>>>>> -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4 \ >>>>>>>>>> -netdev tap,id=net0,script=no,downscript=no \ >>>>>>>>>> -nographic \ >>>>>>>>>> -m 8G \ >>>>>>>>>> -smp 4 \ >>>>>>>>>> -M q35 \ >>>>>>>>>> -cpu host 2>&1 | tee vm.log >>>>>>>>>> >>>>>>>>>> Booting L2: >>>>>>>>>> >>>>>>>>>> # ./qemu/build/qemu-system-x86_64 \ >>>>>>>>>> -nographic \ >>>>>>>>>> -m 4G \ >>>>>>>>>> -enable-kvm \ >>>>>>>>>> -M q35 \ >>>>>>>>>> -drive file=//root/L2.qcow2,media=disk,if=virtio \ >>>>>>>>>> -netdev type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,x-svq=true,id=vhost-vdpa0 \ >>>>>>>>>> -device virtio-net-pci,netdev=vhost-vdpa0,disable-legacy=on,disable-modern=off,ctrl_vq=on,ctrl_rx=on,event_idx=off,bus=pcie.0,addr=0x7 \ >>>>>>>>>> -smp 4 \ >>>>>>>>>> -cpu host \ >>>>>>>>>> 2>&1 | tee vm.log >>>>>>>>>> >>>>>>>>>> I printed out the contents of VirtQueueElement returned >>>>>>>>>> by vhost_svq_get_buf() in vhost_svq_flush() [3]. >>>>>>>>>> I noticed that "len" which is set by "vhost_svq_get_buf" >>>>>>>>>> is always set to 0 while VirtQueueElement.len is non-zero. >>>>>>>>>> I haven't understood the difference between these two "len"s. >>>>>>>>>> >>>>>>>>> >>>>>>>>> VirtQueueElement.len is the length of the buffer, while the len of >>>>>>>>> vhost_svq_get_buf is the bytes written by the device. In the case of >>>>>>>>> the tx queue, VirtQueuelen is the length of the tx packet, and the >>>>>>>>> vhost_svq_get_buf is always 0 as the device does not write. In the >>>>>>>>> case of rx, VirtQueueElem.len is the available length for a rx frame, >>>>>>>>> and the vhost_svq_get_buf len is the actual length written by the >>>>>>>>> device. >>>>>>>>> >>>>>>>>> To be 100% accurate a rx packet can span over multiple buffers, but >>>>>>>>> SVQ does not need special code to handle this. >>>>>>>>> >>>>>>>>> So vhost_svq_get_buf should return > 0 for rx queue (svq->vq->index == >>>>>>>>> 0), and 0 for tx queue (svq->vq->index % 2 == 1). >>>>>>>>> >>>>>>>>> Take into account that vhost_svq_get_buf only handles split vq at the >>>>>>>>> moment! It should be renamed or splitted into vhost_svq_get_buf_split. >>>>>>>> >>>>>>>> In L1, there are 2 virtio network devices. >>>>>>>> >>>>>>>> # lspci -nn | grep -i net >>>>>>>> 00:02.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000] >>>>>>>> 00:04.0 Ethernet controller [0200]: Red Hat, Inc. Virtio 1.0 network device [1af4:1041] (rev 01) >>>>>>>> >>>>>>>> I am using the second one (1af4:1041) for testing my changes and have >>>>>>>> bound this device to the vp_vdpa driver. >>>>>>>> >>>>>>>> # vdpa dev show -jp >>>>>>>> { >>>>>>>> "dev": { >>>>>>>> "vdpa0": { >>>>>>>> "type": "network", >>>>>>>> "mgmtdev": "pci/0000:00:04.0", >>>>>>>> "vendor_id": 6900, >>>>>>>> "max_vqs": 3, >>>>>>> >>>>>>> How is max_vqs=3? For this to happen L0 QEMU should have >>>>>>> virtio-net-pci,...,queues=3 cmdline argument. >>>>> >>>>> Ouch! I totally misread it :(. Everything is correct, max_vqs should >>>>> be 3. 
I read it as the virtio_net queues, which means queue *pairs*, >>>>> as it includes rx and tx queue. >>>> >>>> Understood :) >>>> >>>>>> >>>>>> I am not sure why max_vqs is 3. I haven't set the value of queues to 3 >>>>>> in the cmdline argument. Is max_vqs expected to have a default value >>>>>> other than 3? >>>>>> >>>>>> In the blog [1] as well, max_vqs is 3 even though there's no queues=3 >>>>>> argument. >>>>>> >>>>>>> It's clear the guest is not using them, we can add mq=off >>>>>>> to simplify the scenario. >>>>>> >>>>>> The value of max_vqs is still 3 after adding mq=off. The whole >>>>>> command that I run to boot L0 is: >>>>>> >>>>>> $ sudo ./qemu/build/qemu-system-x86_64 \ >>>>>> -enable-kvm \ >>>>>> -drive file=//home/valdaarhun/valdaarhun/qcow2_img/L1.qcow2,media=disk,if=virtio \ >>>>>> -net nic,model=virtio \ >>>>>> -net user,hostfwd=tcp::2222-:22 \ >>>>>> -device intel-iommu,snoop-control=on \ >>>>>> -device virtio-net-pci,netdev=net0,disable-legacy=on,disable-modern=off,iommu_platform=on,guest_uso4=off,guest_uso6=off,host_uso=off,guest_announce=off,mq=off,ctrl_vq=on,ctrl_rx=on,packed=off,event_idx=off,bus=pcie.0,addr=0x4 \ >>>>>> -netdev tap,id=net0,script=no,downscript=no \ >>>>>> -nographic \ >>>>>> -m 8G \ >>>>>> -smp 4 \ >>>>>> -M q35 \ >>>>>> -cpu host 2>&1 | tee vm.log >>>>>> >>>>>> Could it be that 2 of the 3 vqs are used for the dataplane and >>>>>> the third vq is the control vq? >>>>>> >>>>>>>> "max_vq_size": 256 >>>>>>>> } >>>>>>>> } >>>>>>>> } >>>>>>>> >>>>>>>> The max number of vqs is 3 with the max size being 256. >>>>>>>> >>>>>>>> Since, there are 2 virtio net devices, vhost_vdpa_svqs_start [1] >>>>>>>> is called twice. For each of them. it calls vhost_svq_start [2] >>>>>>>> v->shadow_vqs->len number of times. >>>>>>>> >>>>>>> >>>>>>> Ok I understand this confusion, as the code is not intuitive :). Take >>>>>>> into account you can only have svq in vdpa devices, so both >>>>>>> vhost_vdpa_svqs_start are acting on the vdpa device. >>>>>>> >>>>>>> You are seeing two calls to vhost_vdpa_svqs_start because virtio (and >>>>>>> vdpa) devices are modelled internally as two devices in QEMU: One for >>>>>>> the dataplane vq, and other for the control vq. There are historical >>>>>>> reasons for this, but we use it in vdpa to always shadow the CVQ while >>>>>>> leaving dataplane passthrough if x-svq=off and the virtio & virtio-net >>>>>>> feature set is understood by SVQ. >>>>>>> >>>>>>> If you break at vhost_vdpa_svqs_start with gdb and go higher in the >>>>>>> stack you should reach vhost_net_start, that starts each vhost_net >>>>>>> device individually. >>>>>>> >>>>>>> To be 100% honest, each dataplain *queue pair* (rx+tx) is modelled >>>>>>> with a different vhost_net device in QEMU, but you don't need to take >>>>>>> that into account implementing the packed vq :). >>>>>> >>>>>> Got it, this makes sense now. >>>>>> >>>>>>>> Printing the values of dev->vdev->name, v->shadow_vqs->len and >>>>>>>> svq->vring.num in vhost_vdpa_svqs_start gives: >>>>>>>> >>>>>>>> name: virtio-net >>>>>>>> len: 2 >>>>>>>> num: 256 >>>>>>>> num: 256 >>>>>>> >>>>>>> First QEMU's vhost_net device, the dataplane. >>>>>>> >>>>>>>> name: virtio-net >>>>>>>> len: 1 >>>>>>>> num: 64 >>>>>>>> >>>>>>> >>>>>>> Second QEMU's vhost_net device, the control virtqueue. >>>>>> >>>>>> Ok, if I understand this correctly, the control vq doesn't >>>>>> need separate queues for rx and tx. >>>>>> >>>>> >>>>> That's right. 
Since CVQ has one reply per command, the driver can just >>>>> send ro+rw descriptors to the device. In the case of RX, the device >>>>> needs a queue with only-writable descriptors, as neither the device or >>>>> the driver knows how many packets will arrive. >>>> >>>> Got it, this makes sense now. >>>> >>>>>>>> I am not sure how to match the above log lines to the >>>>>>>> right virtio-net device since the actual value of num >>>>>>>> can be less than "max_vq_size" in the output of "vdpa >>>>>>>> dev show". >>>>>>>> >>>>>>> >>>>>>> Yes, the device can set a different vq max per vq, and the driver can >>>>>>> negotiate a lower vq size per vq too. >>>>>>> >>>>>>>> I think the first 3 log lines correspond to the virtio >>>>>>>> net device that I am using for testing since it has >>>>>>>> 2 vqs (rx and tx) while the other virtio-net device >>>>>>>> only has one vq. >>>>>>>> >>>>>>>> When printing out the values of svq->vring.num, >>>>>>>> used_elem.len and used_elem.id in vhost_svq_get_buf, >>>>>>>> there are two sets of output. One set corresponds to >>>>>>>> svq->vring.num = 64 and the other corresponds to >>>>>>>> svq->vring.num = 256. >>>>>>>> >>>>>>>> For svq->vring.num = 64, only the following line >>>>>>>> is printed repeatedly: >>>>>>>> >>>>>>>> size: 64, len: 1, i: 0 >>>>>>>> >>>>>>> >>>>>>> This is with packed=off, right? If this is testing with packed, you >>>>>>> need to change the code to accommodate it. Let me know if you need >>>>>>> more help with this. >>>>>> >>>>>> Yes, this is for packed=off. For the time being, I am trying to >>>>>> get L2 to communicate with L0 using split virtqueues and x-svq=true. >>>>>> >>>>> >>>>> Got it. >>>>> >>>>>>> In the CVQ the only reply is a byte, indicating if the command was >>>>>>> applied or not. This seems ok to me. >>>>>> >>>>>> Understood. >>>>>> >>>>>>> The queue can also recycle ids as long as they are not available, so >>>>>>> that part seems correct to me too. >>>>>> >>>>>> I am a little confused here. The ids are recycled when they are >>>>>> available (i.e., the id is not already in use), right? >>>>>> >>>>> >>>>> In virtio, available is that the device can use them. And used is that >>>>> the device returned to the driver. I think you're aligned it's just it >>>>> is better to follow the virtio nomenclature :). >>>> >>>> Got it. >>>> >>>>>>>> For svq->vring.num = 256, the following line is >>>>>>>> printed 20 times, >>>>>>>> >>>>>>>> size: 256, len: 0, i: 0 >>>>>>>> >>>>>>>> followed by: >>>>>>>> >>>>>>>> size: 256, len: 0, i: 1 >>>>>>>> size: 256, len: 0, i: 1 >>>>>>>> >>>>>>> >>>>>>> This makes sense for the tx queue too. Can you print the VirtQueue index? >>>>>> >>>>>> For svq->vring.num = 64, the vq index is 2. So the following line >>>>>> (svq->vring.num, used_elem.len, used_elem.id, svq->vq->queue_index) >>>>>> is printed repeatedly: >>>>>> >>>>>> size: 64, len: 1, i: 0, vq idx: 2 >>>>>> >>>>>> For svq->vring.num = 256, the following line is repeated several >>>>>> times: >>>>>> >>>>>> size: 256, len: 0, i: 0, vq idx: 1 >>>>>> >>>>>> This is followed by: >>>>>> >>>>>> size: 256, len: 0, i: 1, vq idx: 1 >>>>>> >>>>>> In both cases, queue_index is 1. To get the value of queue_index, >>>>>> I used "virtio_get_queue_index(svq->vq)" [2]. >>>>>> >>>>>> Since the queue_index is 1, I guess this means this is the tx queue >>>>>> and the value of len (0) is correct. However, nothing with >>>>>> queue_index % 2 == 0 is printed by vhost_svq_get_buf() which means >>>>>> the device is not sending anything to the guest. Is this correct? 
>>>>>> >>>>> >>>>> Yes, that's totally correct. >>>>> >>>>> You can set -netdev tap,...,vhost=off in L0 qemu and trace (or debug >>>>> with gdb) it to check what is receiving. You should see calls to >>>>> hw/net/virtio-net.c:virtio_net_flush_tx. The corresponding function to >>>>> receive is virtio_net_receive_rcu, I recommend you trace too just it >>>>> in case you see any strange call to it. >>>>> >>>> >>>> I added "vhost=off" to -netdev tap in L0's qemu command. I followed all >>>> the steps in the blog [1] up till the point where L2 is booted. Before >>>> booting L2, I had no issues pinging L0 from L1. >>>> >>>> For each ping, the following trace lines were printed by QEMU: >>>> >>>> virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 0 out_num 1 >>>> virtqueue_pop vq 0x5d04109b0ce8 elem 0x5d041024f560 in_num 0 out_num 1 >>>> virtqueue_fill vq 0x5d04109b0ce8 elem 0x5d041024f560 len 0 idx 0 >>>> virtqueue_flush vq 0x5d04109b0ce8 count 1 >>>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0ce8 >>>> virtqueue_alloc_element elem 0x5d041024f560 size 56 in_num 1 out_num 0 >>>> virtqueue_pop vq 0x5d04109b0c50 elem 0x5d041024f560 in_num 1 out_num 0 >>>> virtqueue_fill vq 0x5d04109b0c50 elem 0x5d041024f560 len 110 idx 0 >>>> virtqueue_flush vq 0x5d04109b0c50 count 1 >>>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50 >>>> >>>> The first 5 lines look like they were printed when an echo request was >>>> sent to L0 and the next 5 lines were printed when an echo reply was >>>> received. >>>> >>>> After booting L2, I set up the tap device's IP address in L0 and the >>>> vDPA port's IP address in L2. >>>> >>>> When trying to ping L0 from L2, I only see the following lines being >>>> printed: >>>> >>>> virtqueue_alloc_element elem 0x5d041099ffd0 size 56 in_num 0 out_num 1 >>>> virtqueue_pop vq 0x5d0410d87168 elem 0x5d041099ffd0 in_num 0 out_num 1 >>>> virtqueue_fill vq 0x5d0410d87168 elem 0x5d041099ffd0 len 0 idx 0 >>>> virtqueue_flush vq 0x5d0410d87168 count 1 >>>> virtio_notify vdev 0x5d0410d79a10 vq 0x5d0410d87168 >>>> >>>> There's no reception. I used wireshark to inspect the packets that are >>>> being sent and received through the tap device in L0. >>>> >>>> When pinging L0 from L2, I see one of the following two outcomes: >>>> >>>> Outcome 1: >>>> ---------- >>>> L2 broadcasts ARP packets and L0 replies to L2. >>>> >>>> Source Destination Protocol Length Info >>>> 52:54:00:12:34:57 Broadcast ARP 42 Who has 111.1.1.1? Tell 111.1.1.2 >>>> d2:6d:b9:61:e1:9a 52:54:00:12:34:57 ARP 42 111.1.1.1 is at d2:6d:b9:61:e1:9a >>>> >>>> Outcome 2 (less frequent): >>>> -------------------------- >>>> L2 sends an ICMP echo request packet to L0 and L0 sends a reply, >>>> but the reply is not received by L2. >>>> >>>> Source Destination Protocol Length Info >>>> 111.1.1.2 111.1.1.1 ICMP 98 Echo (ping) request id=0x0006, seq=1/256, ttl=64 >>>> 111.1.1.1 111.1.1.2 ICMP 98 Echo (ping) reply id=0x0006, seq=1/256, ttl=64 >>>> >>>> When pinging L2 from L0 I get the following output in >>>> wireshark: >>>> >>>> Source Destination Protocol Length Info >>>> 111.1.1.1 111.1.1.2 ICMP 100 Echo (ping) request id=0x002c, seq=2/512, ttl=64 (no response found!) 
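
In case it helps with reproducing this: the virtqueue_* and virtio_notify
lines quoted here are stock QEMU trace events. Assuming the build was
configured with the default "log" trace backend, they can be enabled with
-trace patterns appended to the QEMU command line used earlier, for
example (the patterns are only an illustration):

$ ./qemu/build/qemu-system-x86_64 <same options as before> \
    -trace 'virtqueue_*' \
    -trace 'virtio_notify*' \
    2>&1 | tee vm.log
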
>>>> >>>> I do see a lot of traced lines being printed (by the QEMU instance that >>>> was started in L0) with in_num > 1, for example: >>>> >>>> virtqueue_alloc_element elem 0x5d040fdbad30 size 56 in_num 1 out_num 0 >>>> virtqueue_pop vq 0x5d04109b0c50 elem 0x5d040fdbad30 in_num 1 out_num 0 >>>> virtqueue_fill vq 0x5d04109b0c50 elem 0x5d040fdbad30 len 76 idx 0 >>>> virtqueue_flush vq 0x5d04109b0c50 count 1 >>>> virtio_notify vdev 0x5d04109a8d50 vq 0x5d04109b0c50 >>>> >>> >>> So L0 is able to receive data from L2. We're halfway there, Good! :). >>> >>>> It looks like L1 is receiving data from L0 but this is not related to >>>> the pings that are sent from L2. I haven't figured out what data is >>>> actually being transferred in this case. It's not necessary for all of >>>> the data that L1 receives from L0 to be passed to L2, is it? >>>> >>> >>> It should be noise, yes. >>> >> >> Understood. >> >>>>>>>> For svq->vring.num = 256, the following line is >>>>>>>> printed 20 times, >>>>>>>> >>>>>>>> size: 256, len: 0, i: 0 >>>>>>>> >>>>>>>> followed by: >>>>>>>> >>>>>>>> size: 256, len: 0, i: 1 >>>>>>>> size: 256, len: 0, i: 1 >>>>>>>> >>>>>>> >>>>>>> This makes sense for the tx queue too. Can you print the VirtQueue index? >>>>>> >>>>>> For svq->vring.num = 64, the vq index is 2. So the following line >>>>>> (svq->vring.num, used_elem.len, used_elem.id, svq->vq->queue_index) >>>>>> is printed repeatedly: >>>>>> >>>>>> size: 64, len: 1, i: 0, vq idx: 2 >>>>>> >>>>>> For svq->vring.num = 256, the following line is repeated several >>>>>> times: >>>>>> >>>>>> size: 256, len: 0, i: 0, vq idx: 1 >>>>>> >>>>>> This is followed by: >>>>>> >>>>>> size: 256, len: 0, i: 1, vq idx: 1 >>>>>> >>>>>> In both cases, queue_index is 1. >>>> >>>> I also noticed that there are now some lines with svq->vring.num = 256 >>>> where len > 0. These lines were printed by the QEMU instance running >>>> in L1, so this corresponds to data that was received by L2. >>>> >>>> svq->vring.num used_elem.len used_elem.id svq->vq->queue_index >>>> size: 256 len: 82 i: 0 vq idx: 0 >>>> size: 256 len: 82 i: 1 vq idx: 0 >>>> size: 256 len: 82 i: 2 vq idx: 0 >>>> size: 256 len: 54 i: 3 vq idx: 0 >>>> >>>> I still haven't figured out what data was received by L2 but I am >>>> slightly confused as to why this data was received by L2 but not >>>> the ICMP echo replies sent by L0. >>>> >>> >>> We're on a good track, let's trace it deeper. I guess these are >>> printed from vhost_svq_flush, right? Do virtqueue_fill, >>> virtqueue_flush, and event_notifier_set(&svq->svq_call) run properly, >>> or do you see anything strange with gdb / tracing? >>> >> >> Apologies for the delay in replying. It took me a while to figure >> this out, but I have now understood why this doesn't work. L1 is >> unable to receive messages from L0 because they get filtered out >> by hw/net/virtio-net.c:receive_filter [1]. There's an issue with >> the MAC addresses. 
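
To make the failure mode concrete before the address listings below:
leaving aside the promiscuous/allmulti flags and the n->mac_table lookups
that the real hw/net/virtio-net.c:receive_filter also consults, the check
that matters for these frames reduces to comparing the frame's destination
MAC with n->mac. The helper below is only a standalone illustration of
that logic, not the actual QEMU code; it also shows why the broadcast ARP
requests still get through while the unicast ICMP replies do not.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN 6

/* Simplified version of the unicast/broadcast filtering decision. */
static bool accept_frame(const uint8_t *dst_mac, /* first 6 bytes of frame */
                         const uint8_t *nic_mac) /* VirtIONet's n->mac     */
{
    static const uint8_t bcast[ETH_ALEN] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

    if (memcmp(dst_mac, bcast, ETH_ALEN) == 0) {
        return true;  /* Broadcast, e.g. ARP requests, is accepted. */
    }
    if (dst_mac[0] & 1) {
        return true;  /* Multicast: the real filter checks n->mac_table here. */
    }
    /* Unicast: dropped unless it matches the MAC the device believes it has. */
    return memcmp(dst_mac, nic_mac, ETH_ALEN) == 0;
}

So with n->mac ending in :56 (L1-eth0) instead of :57 (L2-eth0), every
unicast reply addressed to L2-eth0 fails the final comparison and is
dropped.
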
>> >> In L0, I have: >> >> $ ip a show tap0 >> 6: tap0: mtu 1500 qdisc fq_codel state UNKNOWN group default qlen 1000 >> link/ether d2:6d:b9:61:e1:9a brd ff:ff:ff:ff:ff:ff >> inet 111.1.1.1/24 scope global tap0 >> valid_lft forever preferred_lft forever >> inet6 fe80::d06d:b9ff:fe61:e19a/64 scope link proto kernel_ll >> valid_lft forever preferred_lft forever >> >> In L1: >> >> # ip a show eth0 >> 2: eth0: mtu 1500 qdisc fq_codel state UP group default qlen 1000 >> link/ether 52:54:00:12:34:56 brd ff:ff:ff:ff:ff:ff >> altname enp0s2 >> inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic noprefixroute eth0 >> valid_lft 83455sec preferred_lft 83455sec >> inet6 fec0::7bd2:265e:3b8e:5acc/64 scope site dynamic noprefixroute >> valid_lft 86064sec preferred_lft 14064sec >> inet6 fe80::50e7:5bf6:fff8:a7b0/64 scope link noprefixroute >> valid_lft forever preferred_lft forever >> >> I'll call this L1-eth0. >> >> In L2: >> # ip a show eth0 >> 2: eth0: mtu 1500 qdisc fq_codel state UP gro0 >> link/ether 52:54:00:12:34:57 brd ff:ff:ff:ff:ff:ff >> altname enp0s7 >> inet 111.1.1.2/24 scope global eth0 >> valid_lft forever preferred_lft forever >> >> I'll call this L2-eth0. >> >> Apart from eth0, lo is the only other device in both L1 and L2. >> >> A frame that L1 receives from L0 has L2-eth0's MAC address (LSB = 57) >> as its destination address. When booting L2 with x-svq=false, the >> value of n->mac in VirtIONet is also L2-eth0. So, L1 accepts >> the frames and passes them on to L2 and pinging works [2]. >> > > So this behavior is interesting by itself. But L1's kernel net system > should not receive anything. As I read it, even if it receives it, it > should not forward the frame to L2 as it is in a different subnet. Are > you able to read it using tcpdump on L1? I ran "tcpdump -i eth0" in L1. It didn't capture any of the packets that were directed at L2 even though L2 was able to receive them. Similarly, it didn't capture any packets that were sent from L2 to L0. This is when L2 is launched with x-svq=false. With x-svq=true, forcibly setting the LSB of n->mac to 0x57 in receive_filter allows L2 to receive packets from L0. I added the following line just before line 1771 [1] to check this out. n->mac[5] = 0x57; > Maybe we can make the scenario clearer by telling which virtio-net > device is which with virtio_net_pci,mac=XX:... ? > >> However, when booting L2 with x-svq=true, n->mac is set to L1-eth0 >> (LSB = 56) in virtio_net_handle_mac() [3]. > > Can you tell with gdb bt if this function is called from net or the > SVQ subsystem? I am struggling to learn how one uses gdb to debug QEMU. I tried running QEMU in L0 with -s and -S in one terminal. In another terminal, I ran the following: $ gdb ./build/qemu-system-x86_64 I then ran the following in gdb's console, but stepping through or continuing the execution gives me errors: (gdb) target remote localhost:1234 (gdb) break -source ../hw/net/virtio-net.c -function receive_filter (gdb) c Continuing. Warning: Cannot insert breakpoint 2. Cannot access memory at address 0x9058c6 Command aborted. (gdb) ni Continuing. Warning: Cannot insert breakpoint 2. Cannot access memory at address 0x9058c6 Command aborted. I built QEMU using ./configure --enable-debug. I also tried using the --disable-pie option but this results in a build error. 
[8063/8844] Linking target qemu-keymap FAILED: qemu-keymap cc -m64 -o qemu-keymap <...> /usr/bin/ld: libevent-loop-base.a.p/event-loop-base.c.o: relocation R_X86_64_32 against `.rodata' can not be used when making a PIE object; recompile with -fPIE /usr/bin/ld: failed to set dynamic section sizes: bad value collect2: error: ld returned 1 exit status >> n->mac_table.macs also >> does not seem to have L2-eth0's MAC address. Due to this, >> receive_filter() filters out all the frames [4] that were meant for >> L2-eth0. >> > > In the vp_vdpa scenario of the blog receive_filter should not be > called in the qemu running in the L1 guest, the nested one. Can you > check it with gdb or by printing a trace if it is called? This is right. receive_filter is not called in L1's QEMU with x-svq=true. >> With x-svq=true, I see that n->mac is set by virtio_net_handle_mac() >> [3] when L1 receives VIRTIO_NET_CTRL_MAC_ADDR_SET. With x-svq=false, >> virtio_net_handle_mac() doesn't seem to be getting called. I haven't >> understood how the MAC address is set in VirtIONet when x-svq=false. >> Understanding this might help see why n->mac has different values >> when x-svq is false vs when it is true. >> > > Ok this makes sense, as x-svq=true is the one that receives the set > mac message. You should see it in L0's QEMU though, both in x-svq=on > and x-svq=off scenarios. Can you check it? L0's QEMU seems to be receiving the "set mac" message only when L1 is launched with x-svq=true. With x-svq=off, I don't see any call to virtio_net_handle_mac with cmd == VIRTIO_NET_CTRL_MAC_ADDR_SET in L0. Thanks, Sahil [1] https://gitlab.com/qemu-project/qemu/-/blob/master/hw/net/virtio-net.c#L1771
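
P.S. Regarding the gdb trouble above: my working assumption is that -s/-S
only expose a gdbstub for debugging the guest, so breakpoints on QEMU's own
functions such as receive_filter need gdb to be attached to the QEMU
process itself, along these lines:

$ gdb --args ./build/qemu-system-x86_64 <same options as before>
(gdb) break receive_filter
(gdb) run

or, for an instance that is already running:

$ gdb -p $(pgrep -f qemu-system-x86_64)
(gdb) break receive_filter
(gdb) continue

If that assumption holds, it would also explain the "Cannot access memory"
errors: gdb was trying to write a breakpoint at a QEMU code address through
the guest's memory.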