From: "Michael S. Tsirkin" <mst@redhat.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: "Fiona Ebner" <f.ebner@proxmox.com>,
qemu-devel@nongnu.org, "Peter Maydell" <peter.maydell@linaro.org>,
"Jason Wang" <jasowang@redhat.com>,
"Lei Yang" <leiyang@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Yanan Wang" <wangyanan55@huawei.com>,
"Zhao Liu" <zhao1.liu@intel.com>,
"Gabriel Goller" <g.goller@proxmox.com>,
"Stefan Hanreich" <s.hanreich@proxmox.com>,
"Thomas Lamprecht" <t.lamprecht@proxmox.com>
Subject: Re: [PULL 12/14] virtio-net: Advertise UDP tunnel GSO support by default
Date: Mon, 8 Jun 2026 16:06:08 -0400
Message-ID: <20260608160546-mutt-send-email-mst@kernel.org>
In-Reply-To: <375064ee-5a6f-44d4-8192-09f39804be69@redhat.com>
On Mon, Jun 08, 2026 at 07:12:14PM +0200, Paolo Abeni wrote:
> On 6/8/26 12:41 PM, Fiona Ebner wrote:
> > Am 05.06.26 um 4:54 PM schrieb Paolo Abeni:
> >> On 6/5/26 4:02 PM, Fiona Ebner wrote:
> >>> Am 09.11.25 um 4:10 PM schrieb Michael S. Tsirkin:
> >>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >>>> index 17ed0ef919..3b85560f6f 100644
> >>>> --- a/hw/net/virtio-net.c
> >>>> +++ b/hw/net/virtio-net.c
> >>>> @@ -4299,19 +4299,19 @@ static const Property virtio_net_properties[] = {
> >>>> VIRTIO_DEFINE_PROP_FEATURE("host_tunnel", VirtIONet,
> >>>> host_features_ex,
> >>>> VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO,
> >>>> - false),
> >>>> + true),
> >>> it seems that the host_tunnel setting can cause issues when VXLAN
> >>> traffic originating in a guest goes over a physical NIC which does not
> >>> support the feature. We received several reports about the issue
> >>> [0][1][2][3] and were able to reproduce it. Turning off the
> >>> 'host_tunnel' property in the commandline for the VirtIO net device
> >>> makes TCP traffic work. The network configuration from our reproducer
> >>> setup is as follows:
> >>>
> >>> guest A (iperf3 -c) guest B (iperf3 -s)
> >>> vxlan using vNIC as underlay vxlan using vNIC as underlay
> >>> virtualized NIC exposed to guest virtualized NIC exposed to guest
> >>> ---guest boundary--- ---guest boundary---
> >>> tap device connected to bridge tap device connected to bridge
> >>> bridge with physical NIC as port bridge with physical NIC as port
> >>> physical NIC <---host boundary---> physical NIC
> >>>
> >>> Bridge configuration:
> >>> iface vmbr0 inet static
> >>> address 10.48.0.109/20
> >>> gateway 10.48.0.1
> >>> bridge-ports nic3
> >>> bridge-stp off
> >>> bridge-fd 0
> >>> bridge-vlan-aware yes
> >>> bridge-vids 2-4094
> >>>
> >>> VXLAN created with:
> >>> ip link add vxlan0 type vxlan id 100 remote X dstport 4789 dev eth1
> >>> where eth1 is the virtualized NIC exposed to the guest
> >>>
> >>> The physical NIC does not have the feature:
> >>> Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme
> >>> BCM5719 Gigabit Ethernet PCIe [14e4:1657] (rev 01)
> >>> tx-udp_tnl-segmentation: off [fixed]
> >>> tx-udp_tnl-csum-segmentation: off [fixed]
> >>>
> >>> Using a physical NIC which does have the feature works:
> >>> Ethernet controller [0200]: Broadcom Inc. and subsidiaries BCM57504
> >>> NetXtreme-E 10Gb/25Gb/40Gb/50Gb/100Gb Ethernet [14e4:1751] (rev 11)
> >>> tx-udp_tnl-segmentation: on
> >>> tx-udp_tnl-csum-segmentation: on
> >>>
> >>> Host kernel:
> >>> Proxmox VE with 7.0.2-6-pve
> >>>
> >>> Guest kernel:
> >>> Alpine with 6.18.34-0-lts
> >>>
> >>> QEMU commandline for the vNIC:
> >>>> -netdev 'type=tap,id=net2,ifname=tap103i2,script=/usr/libexec/qemu-server/pve-bridge,downscript=/usr/libexec/qemu-server/pve-bridgedown,vhost=on' \
> >>>> -device 'virtio-net-pci,mac=BC:24:11:78:C3:3B,netdev=net2,bus=pci.0,addr=0x14,id=net2,rx_queue_size=1024,tx_queue_size=256,host_mtu=1500' \
> >>>
> >>> We can see that QEMU sets the features for the tap interface via ioctl()
> >>> and the host kernel allows it:
> >>> tx-udp_tnl-segmentation: on
> >>> tx-udp_tnl-csum-segmentation: on
> >>>
> >>> As far as we understand, in the problematic scenario, nothing is ever
> >>> filling in the checksums for the inner TCP packets, meaning the outer
> >>> UDP checksum ends up being wrong on the target side. Is the host kernel
> >>> responsible for doing that before passing the packet to the physical NIC
> >>> (without the feature)? Or who would be?
> >>>
> >>> Turning off host_tunnel_csum without turning off host_tunnel does not help.
> >>>
> >>> Interestingly, turning off the features for the working physical NIC
> >>> does not make it break:
> >>> tx-udp_tnl-segmentation: off
> >>> tx-udp_tnl-csum-segmentation: off
> >>> Could it be that the NIC just always fills in the inner TCP checksums
> >>> regardless of that setting?
> >>>
> >>> On the other hand, running
> >>> localhost:~# ethtool -K eth2 tx-checksum-ip-generic off
> >>> Actual changes:
> >>> tx-checksum-ip-generic: off
> >>> tx-tcp-segmentation: off [not requested]
> >>> tx-tcp-ecn-segmentation: off [not requested]
> >>> tx-tcp6-segmentation: off [not requested]
> >>> tx-udp-segmentation: off [not requested]
> >>> inside the guests makes it work for the physical NIC without the
> >>> tx-udp_tnl* features.
> >>>
> >>> I wanted to ask if this configuration is expected to be unsupported and
> >>> if the management is expected to turn off the feature on the commandline
> >>> if the traffic might go over a physical NIC without the feature. Or if
> >>> this could be a kernel or NIC bug that should be investigated further?
> >>> In the former case, should the option really be turned on by default
> >>> with new machine versions?
> >>
> >> Thank you for the detailed report. The configuration you describe is
> >> supported and expected to work. The fact that different results are
> >> obtained on top of a NIC with:
> >>
> >> [1] tx-udp_tnl-segmentation: off [fixed]
> >>
> >> WRT a similar setup on top of a NIC with:
> >>
> >> [2] tx-udp_tnl-segmentation: off
> >>
> >> is indeed strange/unexpected, as the two scenarios are indistinguishable
> >> from the stack perspective. I suspect the issue is NIC driver dependent.
> >>
> >> I understand [1] is using a tg3 driver, and [2] bnxt, both running Linux
> >> 7.0.2, am I correct?
> >
> > Yes.
> >
> >> If you disable csum offloading on the tg3 NIC, does that impact the results?
> >
> > Yes, doing
> >
> > root@tamy3:~# ethtool -K nic3 tx-checksum-ipv4 off
> > Actual changes:
> > tx-checksum-ipv4: off
> > tx-tcp-segmentation: off [not requested]
> > tx-tcp-ecn-segmentation: off [not requested]
> >
> > on both hosts makes it work.
> >
> >> If you have such data handy, could you please share pcap captures on
> >> both ends? Links to some accessible URL would be better than sending a
> >> lot of data to the ML, I think.
> >
> > I captured the following while the problem is present with
> > tcpdump -i foo udp port 4789 -w bar.pcap
> > on the host interfaces (tap, bridge and physical NIC) just to be sure.
> > Looking at it with tcpdump -envvvr, within the same host, only the
> > timestamps change. Between the hosts, the UDP checksums do change, but
> > the inner TCP checksums do not. So I suppose the NIC fills in the UDP
> > checksum based on the still wrong data? Since the UDP checksum would
> > already be correct if the TCP checksums were fixed up?
> >
> > For the NIC with the tx-udp_tnl features, the inner TCP checksums do get
> > corrected and the UDP checksum stays the same. I did not include
> > captures for this.
> >
> > IPs for the guest running iperf -s (on host tamy2)
> > 10.48.6.81 for the virtualized NIC
> > 10.0.123.102 for the VXLAN
> >
> > IPs for the guest running iperf -c (on host tamy3):
> > 10.48.6.101 for the virtualized NIC
> > 10.0.123.103 for the VXLAN
> >
> > The captures are short, so I take the liberty to just provide them directly:
> >
> > [I] febner@enia ~> tar cf pcap.tar tamy*.pcap
> > [I] febner@enia ~> xz pcap.tar
> > [I] febner@enia ~> base64 -w 70 pcap.tar.xz
> > /Td6WFoAAATm1rRGBMDXBYCgASEBFgAAAAAAAKQwOkPgT/8Cz10AOhhJ/551cIJN23SQMX
> > Q2Us4cGiof2bxOS4FK4DxejNh+76NiWIpdIfOxrB5urac3FT0mPKMbUreSY+04/NhofcgS
> > Zz41D6t/Xp+VkPxNYx7Xsp3xz4xUCsVuK205jz6G/NAY0bJ0+UrJuCkP0G5VBtn88hJstD
> > 7qlaT7qcBLECseOO1OfqsLezxasbm5p614IL18cqAVMCMWucr/Kh2Oqth26v7zI4SVEJC/
> > YSEgaOhfjCbQZSi85BEw9/NSZO6IqoyNLrEiPUPgXTWH63NssG+4RMuBswrkgN5Wld70B1
> > mROOCwKbo9b9oXI4DumGHqgCV5jdAxzITpEjMQpvDKh6NvM5L/8v1cPiGjLFSL2JesZ0F5
> > dTbstymv1q4eN+9f3ng+4AXCvDzaziYMwtGwxYyptK5qDI2oGsCIGwFDpP/ZEw7NYI9EMM
> > G2+SDG6D8bKgKWl9Mi6EJcqSMVKFR1P1Z/P3XJ/9sWOMJug1IVYZGIJmtXXM3+roqOEGMF
> > tco/LMUJHgdmfkitfuZ5tN1+0EVE0/f4GQiUpdidjqfZ2m9jL0svcGXUd5D3LN0tbh5vmP
> > KzXQNtMQiMY6Fj7gbzDbOQGGW/L3/34B5YV+pWEpzhAbeTI9KL0ZF3vJ0OESlL9OMhrqgl
> > WX23bxek2h9eG15eO9cderaoCOFb8NEKIjC+UTh2Ir7/ZFfDvlXeGB/3jXM8OTmWmJSr5b
> > CrAvBQ4xvow3hwKq2Fbyu7aU6KycVpo03a+59LqxPyRfc3qRXcoUnp8MTi2YUk+kfYR6mI
> > S9AE/5xYFzb7I40RUBPUm0OCzguzk9qlIcab3lnTFnrMWa+Cj9AMIkWEEf0tMzw0v9+17u
> > VJg/8tWMad3d9Jc5Z6B9kOukzGvgVEWoq4z9snb/k6u2sBVY36q2iI1cmSPrI+UcF2GtSA
> > Qs6bt/T/c1Xi2r0Up+tRDrIE9O2aNAAAAKAWpkIVfsvrAAHzBYCgAQAA6om1scRn+wIAAA
> > AABFla
>
> Thanks for the data. The bug is in the virtio_net driver; non GSO
> packets requiring inner header csum are handled unmodified from the
> guest to the host, and the H/W NIC has to compute the inner transport
> header csum for the encap packet.
>
> Could you please try the attached kernel patch? You will need to update
> the kernel inside the guest.
Paolo, was there supposed to be a patch here?
> > Do you have any tips where to start looking in the kernel? What is the
> > expected place where the TCP checksums are corrected if the NIC does not
> > have the tx-udp_tnl features?
>
> Hopefully no more investigation needed. FTR, for non GSO packets, the
> inner transport header csum should be computed inside the guest before
> transmitting on the virtio net device.
>
> For GSO packets such csum should be computed inside the host, just
> before transmitting on the H/W NIC (if the latter does not support the
> relevant offload).
>
> Thanks,
>
> Paolo
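
As a reading aid, the checksum responsibility Paolo describes above can be
summarized in a small sketch. This is only an illustration under stated
assumptions: `TunnelPacket`, `inner_csum_owner`, and the returned labels are
hypothetical names, not kernel or QEMU identifiers, and the "nic" case (a NIC
that supports the offload doing the work itself) is inferred from the
parenthetical above rather than stated outright.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TunnelPacket:
    """Hypothetical model of a UDP-tunnel (e.g. VXLAN) packet sent by a guest."""
    is_gso: bool            # True if the packet still has to be segmented
    needs_inner_csum: bool  # True if the inner transport checksum is not filled in yet


def inner_csum_owner(pkt: TunnelPacket, nic_has_udp_tnl_offload: bool) -> str:
    """Return which party is expected to fill in the inner transport checksum.

    Encodes the rule from the thread:
      - non-GSO packets: the guest, before transmitting on the virtio-net device
      - GSO packets: the host, just before transmitting on the H/W NIC,
        unless the NIC supports the relevant offload (assumed: the NIC then does it)
    """
    if not pkt.needs_inner_csum:
        return "none"
    if not pkt.is_gso:
        return "guest"
    return "nic" if nic_has_udp_tnl_offload else "host"


if __name__ == "__main__":
    for gso in (False, True):
        for offload in (False, True):
            pkt = TunnelPacket(is_gso=gso, needs_inner_csum=True)
            print(f"gso={gso} nic_offload={offload} -> {inner_csum_owner(pkt, offload)}")
```

In the failing setup reported in this thread (tg3, no tx-udp_tnl-segmentation),
the non-GSO case is the one Paolo identifies as broken: the guest's virtio_net
driver passed such packets on without the inner checksum filled in.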
Thread overview: 32+ messages
2025-11-09 14:35 [PULL 00/14] virtio,pci,pc: fixes for 10.2 Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 01/14] MAINTAINERS: Update entry for AMD-Vi Emulation Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 02/14] amd_iommu: Fix handling of devices on buses != 0 Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 03/14] amd_iommu: Support 64-bit address for IOTLB lookup Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 04/14] vhost-user: fix shared object lookup handler logic Michael S. Tsirkin
2025-11-10 9:23 ` Albert Esteve
2025-11-10 14:37 ` Richard Henderson
2025-11-10 15:42 ` Michael S. Tsirkin
2025-11-10 15:57 ` Albert Esteve
2025-11-10 16:06 ` Michael S. Tsirkin
2025-11-10 18:54 ` Albert Esteve
2025-11-09 14:35 ` [PULL 05/14] intel_iommu: Handle PASID cache invalidation Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 06/14] intel_iommu: Reset pasid cache when system level reset Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 07/14] intel_iommu: Fix DMA failure when guest switches IOMMU domain Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 08/14] vhost-user: make vhost_set_vring_file() synchronous Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 09/14] tests/qtest/bios-tables-test: Prepare for _DSM change in the DSDT table Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 10/14] hw/pci-host/gpex-acpi: Fix _DSM function 0 support return value Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 11/14] tests/qtest/bios-tables-test: Update DSDT blobs after GPEX _DSM change Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 12/14] virtio-net: Advertise UDP tunnel GSO support by default Michael S. Tsirkin
2026-06-05 14:02 ` Fiona Ebner
2026-06-05 14:54 ` Paolo Abeni
2026-06-05 15:08 ` Paolo Abeni
2026-06-08 10:41 ` Fiona Ebner
2026-06-08 17:12 ` Paolo Abeni
2026-06-08 20:06 ` Michael S. Tsirkin [this message]
2026-06-09 7:03 ` Paolo Abeni
2026-06-05 15:20 ` Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 13/14] q35: increase default tseg size Michael S. Tsirkin
2025-11-09 14:35 ` [PULL 14/14] vhost-user.rst: clarify when FDs can be sent Michael S. Tsirkin
2025-11-10 16:57 ` [PULL 00/14] virtio,pci,pc: fixes for 10.2 Richard Henderson
2025-11-17 10:27 ` Michael S. Tsirkin
2025-11-17 11:44 ` Peter Maydell