* [PATCH v9 0/2] virtio-net: define UDP tunnel offload
@ 2024-10-04 8:13 Paolo Abeni
2024-10-04 8:13 ` [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature Paolo Abeni
` (2 more replies)
0 siblings, 3 replies; 40+ messages in thread
From: Paolo Abeni @ 2024-10-04 8:13 UTC (permalink / raw)
To: virtio-comment
Cc: maxime.coquelin, Eelco Chaudron, Jason Wang, Stefano Garzarella,
Willem de Bruijn, kshankar
UDP tunnel usage is ubiquitous in container deployment, and the ability
to offload UDP encapsulated GSO traffic impacts greatly the performances
and the CPU utilization of such use cases.
This series introduces separate features to handle both UDP tunnel
segmentation offload (patch 1) and UDP tunnel outer checksum offload
(patch 2).
This revision is hopefully a minor delta over the previous iteration,
covering a few problems spotted thanks to the PoC implementation
available here:
https://github.com/pabeni/linux-devel/commits/virtio_tunnel_gso/
https://github.com/pabeni/qemu/commits/virtio_tunnel/
Changes from v8:
- rebased on top of virtio-1.4, changed udp-tunnel features number
to avoid conflicts/duplications
- Clarified the usage of DATA_VALID flag with UDP tunneled GSO packets
- mandate strict hdr validation on the rx side, too - for UDP tunneled
GSO packets only
Changes from v7:
- minor cleanup in patch 1/2
- dropped confusing wording about csum validation in patch 2/2
Changes from v6:
- replaced inner_protocol_type with
VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6}
- many clarifications and cleanup, see the individual patches changelog
for the details
Changes from v5:
- split in 2 patches
- dropped outer_mh_offset field
- many csum related clarification
---
Note that such while the mentioned PoC is based on the previous
iteration of this change, it already deals with all the delta introduced
here except the refined DATA_VALID handling. I'll try to update the PoC
WRT that ASAP.
Paolo Abeni (2):
virtio-net: define UDP tunnel segmentation offload feature
virtio-net: define UDP tunnel checksum offload feature
device-types/net/description.tex | 325 ++++++++++++++++++++++++++++++-
1 file changed, 315 insertions(+), 10 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 40+ messages in thread* [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-04 8:13 [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni @ 2024-10-04 8:13 ` Paolo Abeni 2024-10-09 7:18 ` Jason Wang 2024-10-04 8:13 ` [PATCH v9 2/2] virtio-net: define UDP tunnel checksum " Paolo Abeni 2024-10-04 16:53 ` [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni 2 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-04 8:13 UTC (permalink / raw) To: virtio-comment Cc: maxime.coquelin, Eelco Chaudron, Jason Wang, Stefano Garzarella, Willem de Bruijn, kshankar The VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} are two gso_type flags allowing respectively GSO over UDPv4 tunnel and GSO over UDPv6 tunnel. They can be negotiated on both the host and guest sides. One constraint addressed here is that the virtio side (either device or driver) receiving a UDP tunneled GSO packet must be able to reconstruct completely the inner and outer headers offset - to allow for later GSO. To accommodate such need, new fields are introduced in the virtio_net header: outer_th_offset and inner_nh_offset. They map directly to the corresponding header information. The inner transport header is implied by the (inner) checksum offload. Those fields are required because a virtio device H/W implementation performing segmentation for UDP tunneled packet will need to touch the outer transport protocol (for the UDP length filed), the inner network protocol (for the total length field, in the IPv4 case). Note that segmentation will additionally need to touch the outer network protocol and the inner transport protocol. The first is implied/easily found with trivial parsing, the latter is identified by the existing csum_start field. Note that there is no concept of UDP tunnel type negotiation (e.g. vxlan, geneve, vxlan-gpe, etc.), as a virtio device H/W implementation can perform segmentation for every possible UDP-tunnel given the specified new fields. In the reverse direction, if a virtio device H/W implementation receives some traffic for an unknown or unsupported UDP tunnel, it will simply not aggregate the wire packet in a GSO one. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- v8 -> v9: - rebased on top of virtio-1.4, changed udp-tunnel features number to avoid conflicts/duplications - fixed typos: VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV4 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV6 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 - specified strict validation on RX side (both driver and device) for UDP tunneled packet: NEEDS_CSUM and plain GSO are mandatory for UDP tunnel GSO packets. The driver is not allowed to set DATA_VALID for UDP tunnel GSO packets, the driver can set both DATA_VALID & NEEDS_CSUM. v7 -> v8: - dropped 'Note that' in note - NOT set neither A nor B -> NOT set either A or B - specify that 'gso_partial' needs uh->len set to the wire one v6 -> v7: - replaced inner_protocol_type with VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 && VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 - UDP_TUNNEL now depends on all the TCP and USO segmentation features - reworded the 'VIRTIO_NET_HDR_GSO_UDP_TUNNEL' bit driver normative - specified that csum_start/offset points to the inner header in the driver normative, too - new fields presence depends on either VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO - e.g. -> i.e. - zero -> positive zero (0x0) - checksumming -> checksum validation - explain why we the outer UDP csum can be precomputed only when the GSO packet length is a multiple of gso_size - added an example note for the tunnel-related fields value https://lore.kernel.org/virtio-comment/cover.1722252302.git.pabeni@redhat.com/T/#t v5 -> v6: - split in 2 patches - dropped the unneeded outer_mh_offset field https://lore.kernel.org/virtio-comment/56ccba136272593d0d2af0303600c06f26bd84a1.1718621522.git.pabeni@redhat.com/ v4 -> v5: - add separate negotiation for UDP_TUNNEL_CSUM - much more verbose specification of outer csum handling - UDP_TUNNEL_CSUM requires GSO_UDP_TUNNEL https://lore.kernel.org/virtio-comment/cb89d3d6b6a7e4a7f13e60dd3fded5e950727890.1716448766.git.pabeni@redhat.com/ v3 -> v4: - new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or F_GUEST_UDP_TUNNEL_GSO (Stefano) - Extended the changelog to answer Jason's questions. - Clarified outer csum handling (Jason) https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com v2 -> v3: - UDP_TUNNEL -> UDP_TUNNEL_GSO - add explicit fields for the inner meta-data - more verbose changelog https://lists.oasis-open.org/archives/virtio-dev/202206/msg00026.html v1 -> v2: - explicitly state that the outer header probing is mandatory - explicitly state that GSO_UDP is not allowed with GSO_UDP_TUNNEL - clarify hdr_len usage - clarify UDP_TUNNEL_CSUM bit usage - fix a few typos https://lists.oasis-open.org/archives/virtio-dev/202205/msg00037.html --- device-types/net/description.tex | 205 +++++++++++++++++++++++++++++-- 1 file changed, 198 insertions(+), 7 deletions(-) diff --git a/device-types/net/description.tex b/device-types/net/description.tex index b2a0d39..8e84213 100644 --- a/device-types/net/description.tex +++ b/device-types/net/description.tex @@ -88,6 +88,12 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control channel. +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO (46)] Driver can receive GSO packets + carried by a UDP tunnel. + +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO (48)] Device can receive GSO packets + carried by a UDP tunnel. + \item[VIRTIO_NET_F_DEVICE_STATS(50)] Device can provide device-level statistics to the driver through the control virtqueue. @@ -138,12 +144,16 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_GUEST_UFO] Requires VIRTIO_NET_F_GUEST_CSUM. \item[VIRTIO_NET_F_GUEST_USO4] Requires VIRTIO_NET_F_GUEST_CSUM. \item[VIRTIO_NET_F_GUEST_USO6] Requires VIRTIO_NET_F_GUEST_CSUM. +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, + VIRTIO_NET_F_GUEST_USO4 and VIRTIO_NET_F_GUEST_USO6. \item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM. \item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM. \item[VIRTIO_NET_F_HOST_ECN] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. \item[VIRTIO_NET_F_HOST_UFO] Requires VIRTIO_NET_F_CSUM. \item[VIRTIO_NET_F_HOST_USO] Requires VIRTIO_NET_F_CSUM. +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6 + and VIRTIO_NET_F_HOST_USO. \item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ. @@ -381,6 +391,13 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP), VIRTIO_NET_F_HOST_UFO (UDP fragmentation) and VIRTIO_NET_F_HOST_USO (UDP segmentation) features. +\item If the VIRTIO_NET_F_HOST_TSO6, VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_USO + segmentation features are negotiated, a driver can + use TCP segmentation or UDP segmentation on top of UDP encapsulation + offload, when the outer header does not require checksumming - e.g. + the outer UDP checksum is zero - by negotiating the + VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature. + \item The converse features are also available: a driver can save the virtual device some work by negotiating these features.\note{For example, a network packet transported between two guests on the same system might not need checksumming at all, nor segmentation, @@ -388,8 +405,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially checksummed packets can be received, and if it can do that then the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, - VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4 - and VIRTIO_NET_F_GUEST_USO6 are the input equivalents of the features described above. + VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4, + VIRTIO_NET_F_GUEST_USO6 and VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO + are the input equivalents of the features described above. See \ref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}~\nameref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers} and @@ -451,6 +469,8 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_GSO_UDP 3 #define VIRTIO_NET_HDR_GSO_TCPV6 4 #define VIRTIO_NET_HDR_GSO_UDP_L4 5 +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 #define VIRTIO_NET_HDR_GSO_ECN 0x80 u8 gso_type; le16 hdr_len; @@ -461,6 +481,8 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O le32 hash_value; (Only if VIRTIO_NET_F_HASH_REPORT negotiated) le16 hash_report; (Only if VIRTIO_NET_F_HASH_REPORT negotiated) le16 padding_reserved; (Only if VIRTIO_NET_F_HASH_REPORT negotiated) + le16 outer_th_offset (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated) + le16 inner_nh_offset; (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated) }; \end{lstlisting} @@ -518,6 +540,10 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De followed by the TCP header (with the TCP checksum field 16 bytes into that header). \field{csum_start} will be 14+20 = 34 (the TCP checksum includes the header), and \field{csum_offset} will be 16. +If the given packet has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, +the above checksum fields refer to the inner header checksum, see +the example below. \end{note} \item If the driver negotiated @@ -554,6 +580,39 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De specifically in the protocol.}. \end{itemize} +\item If the driver negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature and the + \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, the GSO protocol is encapsulated + in a UDP tunnel. + The outer UDP header must not require checksum validation, i.e. the outer UDP + checksum must be a positive zero (0x0) as defined in UDP RFC 768. + The other tunnel-related fields indicate how to replicate the packet + headers to cut it into smaller packets: + + \begin{itemize} + \item \field{outer_th_offset} field indicates the outer transport header within + the packet. This field differs from \field{csum_start} as the latter + points to the inner transport header within the packet. + + \item \field{inner_nh_offset} field indicates the inner network header within + the packet. + \end{itemize} + +\begin{note} +For example, consider a partially checksummed TCP (IPv4) packet carried over a +Geneve UDP tunnel (again IPv4) with no tunnel options. The +only relevant variable related to the tunnel type is the tunnel header length. +The packet will have a 14 byte outer ethernet header, 20 byte outer IP header +followed by the 8 byte UDP header (with a 0 checksum value), 8 byte Geneve header, +14 byte inner ethernet header, 20 byte inner IP header +and the TCP header (with the TCP checksum field 16 bytes +into that header). \field{csum_start} will be 14+20+8+8+14+20 = 84 (the TCP +checksum includes the header), \field{csum_offset} will be 16. +\field{inner_nh_offset} will be 14+20+8+8+14 = 62, \field{outer_th_offset} will be +14+20+8 = 42 and \field{gso_type} will be +VIRTIO_NET_HDR_GSO_TCPV4 | VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 = 0x21 +\end{note} + \item \field{num_buffers} is set to zero. This field is unused on transmitted packets. \item The header and packet are added as one output descriptor to the @@ -598,6 +657,29 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De driver MUST set the VIRTIO_NET_HDR_GSO_ECN bit in \field{gso_type}. +If VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is negotiated, the driver MAY set +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit +in \field{gso_type} according to the inner network header protocol type +to request TCP or UDP GSO packets over UDPv4 or UDPv6 tunnel +segmentation, otherwise the driver MUST NOT set either the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit +in \field{gso_type}. + +When requesting TCP or UDP GSO segmentation over UDP tunnel, the driver MUST SET the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit if the inner network header is IPv4, i.e. the +packet is a TCPv4 GSO one, otherwise, if the inner network header is IPv6, the driver +MUST SET the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit. + +The driver MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel +requiring segmentation and outer UDP checksum offload. + +The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit together with VIRTIO_NET_HDR_GSO_UDP, as the +latter is deprecated in favor of UDP_L4 and no new feature will support it. + +The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit together. + If the VIRTIO_NET_F_CSUM feature has been negotiated, the driver MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags}, if so: @@ -639,12 +721,48 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De header. \end{itemize} +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated, the +driver MAY set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}, if so: +\begin{itemize} +\item the driver MUST set \field{outer_th_offset} to the outer UDP header + offset and \field{inner_nh_offset} to the inner network header offset. + The \field{csum_start} and \field{csum_offset} fields point respectively + to the inner transport header and inner transport checksum field. +\end{itemize} + +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the +outer UDP header MUST NOT require checksum validation. That is, the +outer UDP checksum value MUST be 0 or the validated complete checksum +for such header. + +\begin{note} +The valid complete checksum of the outer UDP header of individual segments +can be computed by the driver prior to segmentation only if the GSO packet +size is a multiple of \field{gso_size}, because then all segments +have the same size and thus all data included in the outer UDP +checksum is the same for every segment. These pre-computed segment +length and checksum fields are different from those of the GSO +packet. +In this scenario the outer UDP header of the GSO packet must carry the +segmented UDP packet length. +\end{note} + +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has not +been negotiated, the driver MUST NOT set either the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 +bit or the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 in \field{gso_type}. + The driver SHOULD accept the VIRTIO_NET_F_GUEST_HDRLEN feature if it has been offered, and if it's able to provide the exact header length. The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and VIRTIO_NET_HDR_F_RSC_INFO bits in \field{flags}. +The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} +together with the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 bit or the +VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. + \devicenormative{\paragraph}{Packet Transmission}{Device Types / Network Device / Device Operation / Packet Transmission} The device MUST ignore \field{flag} bits that it does not recognize. @@ -674,6 +792,25 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De \end{note} \end{itemize} +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, +the device MUST NOT accept the packet. + +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 +bit in \field{gso_type} are not set, the device MUST NOT use the +\field{outer_th_offset} and \field{inner_nh_offset}. + +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of +the following is true: +\begin{itemize} +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} +\item the VIRTIO_NET_HDR_F_DATA_VALID is set in \field{flags} +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE +\end{itemize} +the device MUST NOT accept the packet. + If VIRTIO_NET_HDR_F_NEEDS_CSUM is not set, the device MUST NOT rely on the packet checksum being correct. \paragraph{Packet Transmission Interrupt}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt} @@ -776,12 +913,19 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be set: if so, device has validated the packet checksum. + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, + and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, + the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit + in \field{flags}, meaning that the device validated the checksum, + set the csum_start and csum_offset fields, but did not provide the + partial checksum for the transport header. In case of multiple encapsulated protocols, one level of checksums has been validated. \end{enumerate} -Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN -features enable receive checksum, large receive offload and ECN +Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP, UDP_TUNNEL +and ECN features enable receive checksum, large receive offload and ECN support which are the input equivalents of the transmit checksum, transmit segmentation offloading and ECN features, as described in \ref{sec:Device Types / Network Device / Device Operation / @@ -803,8 +947,16 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network from \field{csum_start} and any preceding checksums have been validated. The checksum on the packet is incomplete and if bit VIRTIO_NET_HDR_F_RSC_INFO is not set in \field{flags}, - then \field{csum_start} and \field{csum_offset} indicate how to calculate it - (see Packet Transmission point 1). + then \field{csum_start} and \field{csum_offset} indicate how to calculate it. + If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 + bit are set, the \field{csum_start} field + refers to the inner transport header offset (see Packet Transmission point 1). +\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO option was negotiated and + \field{gso_type} is not VIRTIO_NET_HDR_GSO_NONE, the + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 + bit MAY be set. In such case the \field{outer_th_offset} and + \field{inner_nh_offset} fields indicate the corresponding + headers information. \end{enumerate} @@ -849,6 +1001,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network If none of VIRTIO_NET_F_GUEST_USO4 or VIRTIO_NET_F_GUEST_USO6 have been negotiated, the device MUST NOT set \field{gso_type} to VIRTIO_NET_HDR_GSO_UDP_L4. +If VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO is not negotiated, the device MUST NOT set +either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. + The device SHOULD NOT send to the driver TCP packets requiring segmentation offload which have the Explicit Congestion Notification bit set, unless the VIRTIO_NET_F_GUEST_ECN feature is negotiated, in which case the @@ -872,6 +1028,18 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network fully checksummed packet; \end{enumerate} +The device MUST NOT send to the driver TCP or UDP GSO packets encapsulated in UDP +tunnel and requiring segmentation offload, unless the +VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO is negotiated, in which case the device MUST set +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 +bit in \field{gso_type} according to the inner network header protocol type, +MUST set the \field{outer_th_offset} and \field{inner_nh_offset} fields +to the corresponding header information, and the outer UDP header MUST NOT +require checksum offload. + +The device MUST NOT send the driver TCP or UDP GSO packets encapsulated in UDP +tunnel and requiring segmentation and outer checksum offload. + If none of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have been negotiated, the device MUST set \field{gso_type} to VIRTIO_NET_HDR_GSO_NONE. @@ -896,7 +1064,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags}, if so, the device MUST validate the packet checksum (in case of multiple encapsulated protocols, one level -of checksums is validated). +of checksums is validated). \note{This implies that if either the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, +and the set the VIRTIO_NET_HDR_F_DATA_VALID in \field{flags} is set, +the the VIRTIO_NET_F_GUEST_CSUM bit in \field{flags} is must be set, too} \drivernormative{\paragraph}{Processing of Incoming Packets}{Device Types / Network Device / Device Operation / @@ -920,6 +1092,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT rely on the packet checksum being correct. +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, +the driver MUST NOT accept the packet. + +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 +bit in \field{gso_type} are not set, the driver MUST NOT use the +\field{outer_th_offset} and \field{inner_nh_offset}. + +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of +the following is true: +\begin{itemize} +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE +\end{itemize} +the driver MUST NOT accept the packet. + \paragraph{Hash calculation for incoming packets} \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets} @@ -1763,6 +1953,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi #define VIRTIO_NET_F_GUEST_TSO6 8 #define VIRTIO_NET_F_GUEST_ECN 9 #define VIRTIO_NET_F_GUEST_UFO 10 +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 46 #define VIRTIO_NET_F_GUEST_USO4 54 #define VIRTIO_NET_F_GUEST_USO6 55 -- 2.45.2 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-04 8:13 ` [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature Paolo Abeni @ 2024-10-09 7:18 ` Jason Wang 2024-10-09 8:37 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-09 7:18 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Fri, Oct 4, 2024 at 4:13 PM Paolo Abeni <pabeni@redhat.com> wrote: > > The VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} are two gso_type flags > allowing respectively GSO over UDPv4 tunnel and GSO over UDPv6 tunnel. > They can be negotiated on both the host and guest sides. > > One constraint addressed here is that the virtio side (either device or > driver) receiving a UDP tunneled GSO packet must be able to reconstruct > completely the inner and outer headers offset - to allow for later GSO. > > To accommodate such need, new fields are introduced in the virtio_net > header: outer_th_offset and inner_nh_offset. > They map directly to the corresponding header information. The inner > transport header is implied by the (inner) checksum offload. > > Those fields are required because a virtio device H/W implementation > performing segmentation for UDP tunneled packet will need to touch > the outer transport protocol (for the UDP length filed), the > inner network protocol (for the total length field, in the IPv4 > case). > > Note that segmentation will additionally need to touch > the outer network protocol and the inner transport protocol. The first > is implied/easily found with trivial parsing, the latter is identified > by the existing csum_start field. > > Note that there is no concept of UDP tunnel type negotiation (e.g. > vxlan, geneve, vxlan-gpe, etc.), as a virtio device H/W implementation > can perform segmentation for every possible UDP-tunnel given the > specified new fields. > In the reverse direction, if a virtio device H/W implementation receives > some traffic for an unknown or unsupported UDP tunnel, it will simply > not aggregate the wire packet in a GSO one. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > v8 -> v9: > - rebased on top of virtio-1.4, changed udp-tunnel features number > to avoid conflicts/duplications > - fixed typos: > VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV4 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 > VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV6 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 > - specified strict validation on RX side (both driver and device) > for UDP tunneled packet: NEEDS_CSUM and plain GSO are mandatory for UDP > tunnel GSO packets. The driver is not allowed to set DATA_VALID for UDP > tunnel GSO packets, the driver can set both DATA_VALID & NEEDS_CSUM. DATA_VALID and NEEDS_CSUM are mutually exclusive unless I miss something. > > v7 -> v8: > - dropped 'Note that' in note > - NOT set neither A nor B -> NOT set either A or B > - specify that 'gso_partial' needs uh->len set to the wire one > > v6 -> v7: > - replaced inner_protocol_type with VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 && > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 > - UDP_TUNNEL now depends on all the TCP and USO segmentation features > - reworded the 'VIRTIO_NET_HDR_GSO_UDP_TUNNEL' bit driver normative > - specified that csum_start/offset points to the inner header in the > driver normative, too > - new fields presence depends on either VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or > VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO > - e.g. -> i.e. > - zero -> positive zero (0x0) > - checksumming -> checksum validation > - explain why we the outer UDP csum can be precomputed only > when the GSO packet length is a multiple of gso_size > - added an example note for the tunnel-related fields value > https://lore.kernel.org/virtio-comment/cover.1722252302.git.pabeni@redhat.com/T/#t > > v5 -> v6: > - split in 2 patches > - dropped the unneeded outer_mh_offset field > https://lore.kernel.org/virtio-comment/56ccba136272593d0d2af0303600c06f26bd84a1.1718621522.git.pabeni@redhat.com/ > > v4 -> v5: > - add separate negotiation for UDP_TUNNEL_CSUM > - much more verbose specification of outer csum handling > - UDP_TUNNEL_CSUM requires GSO_UDP_TUNNEL > https://lore.kernel.org/virtio-comment/cb89d3d6b6a7e4a7f13e60dd3fded5e950727890.1716448766.git.pabeni@redhat.com/ > > v3 -> v4: > - new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or > F_GUEST_UDP_TUNNEL_GSO (Stefano) > - Extended the changelog to answer Jason's questions. > - Clarified outer csum handling (Jason) > https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com > > v2 -> v3: > - UDP_TUNNEL -> UDP_TUNNEL_GSO > - add explicit fields for the inner meta-data > - more verbose changelog > https://lists.oasis-open.org/archives/virtio-dev/202206/msg00026.html > > v1 -> v2: > - explicitly state that the outer header probing is mandatory > - explicitly state that GSO_UDP is not allowed with GSO_UDP_TUNNEL > - clarify hdr_len usage > - clarify UDP_TUNNEL_CSUM bit usage > - fix a few typos > https://lists.oasis-open.org/archives/virtio-dev/202205/msg00037.html > --- > device-types/net/description.tex | 205 +++++++++++++++++++++++++++++-- > 1 file changed, 198 insertions(+), 7 deletions(-) > > diff --git a/device-types/net/description.tex b/device-types/net/description.tex > index b2a0d39..8e84213 100644 > --- a/device-types/net/description.tex > +++ b/device-types/net/description.tex > @@ -88,6 +88,12 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits > \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control > channel. > > +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO (46)] Driver can receive GSO packets > + carried by a UDP tunnel. > + > +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO (48)] Device can receive GSO packets > + carried by a UDP tunnel. > + > \item[VIRTIO_NET_F_DEVICE_STATS(50)] Device can provide device-level statistics > to the driver through the control virtqueue. > > @@ -138,12 +144,16 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device > \item[VIRTIO_NET_F_GUEST_UFO] Requires VIRTIO_NET_F_GUEST_CSUM. > \item[VIRTIO_NET_F_GUEST_USO4] Requires VIRTIO_NET_F_GUEST_CSUM. > \item[VIRTIO_NET_F_GUEST_USO6] Requires VIRTIO_NET_F_GUEST_CSUM. > +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, > + VIRTIO_NET_F_GUEST_USO4 and VIRTIO_NET_F_GUEST_USO6. > > \item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM. > \item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM. > \item[VIRTIO_NET_F_HOST_ECN] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6. > \item[VIRTIO_NET_F_HOST_UFO] Requires VIRTIO_NET_F_CSUM. > \item[VIRTIO_NET_F_HOST_USO] Requires VIRTIO_NET_F_CSUM. > +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6 > + and VIRTIO_NET_F_HOST_USO. > > \item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ. > \item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ. > @@ -381,6 +391,13 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev > TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP), VIRTIO_NET_F_HOST_UFO > (UDP fragmentation) and VIRTIO_NET_F_HOST_USO (UDP segmentation) features. > > +\item If the VIRTIO_NET_F_HOST_TSO6, VIRTIO_NET_F_HOST_TSO4 and VIRTIO_NET_F_HOST_USO > + segmentation features are negotiated, a driver can > + use TCP segmentation or UDP segmentation on top of UDP encapsulation > + offload, when the outer header does not require checksumming - e.g. > + the outer UDP checksum is zero - by negotiating the > + VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature. > + > \item The converse features are also available: a driver can save > the virtual device some work by negotiating these features.\note{For example, a network packet transported between two guests on > the same system might not need checksumming at all, nor segmentation, > @@ -388,8 +405,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev > The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially > checksummed packets can be received, and if it can do that then > the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, > - VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4 > - and VIRTIO_NET_F_GUEST_USO6 are the input equivalents of the features described above. > + VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4, > + VIRTIO_NET_F_GUEST_USO6 and VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO > + are the input equivalents of the features described above. > See \ref{sec:Device Types / Network Device / Device Operation / > Setting Up Receive Buffers}~\nameref{sec:Device Types / Network > Device / Device Operation / Setting Up Receive Buffers} and > @@ -451,6 +469,8 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O > #define VIRTIO_NET_HDR_GSO_UDP 3 > #define VIRTIO_NET_HDR_GSO_TCPV6 4 > #define VIRTIO_NET_HDR_GSO_UDP_L4 5 > +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 > +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 > #define VIRTIO_NET_HDR_GSO_ECN 0x80 > u8 gso_type; > le16 hdr_len; > @@ -461,6 +481,8 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O > le32 hash_value; (Only if VIRTIO_NET_F_HASH_REPORT negotiated) > le16 hash_report; (Only if VIRTIO_NET_F_HASH_REPORT negotiated) > le16 padding_reserved; (Only if VIRTIO_NET_F_HASH_REPORT negotiated) > + le16 outer_th_offset (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated) > + le16 inner_nh_offset; (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated) > }; > \end{lstlisting} > > @@ -518,6 +540,10 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > followed by the TCP header (with the TCP checksum field 16 bytes > into that header). \field{csum_start} will be 14+20 = 34 (the TCP > checksum includes the header), and \field{csum_offset} will be 16. > +If the given packet has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, > +the above checksum fields refer to the inner header checksum, see > +the example below. > \end{note} > > \item If the driver negotiated > @@ -554,6 +580,39 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > specifically in the protocol.}. > \end{itemize} > > +\item If the driver negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature and the > + \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, the GSO protocol is encapsulated > + in a UDP tunnel. > + The outer UDP header must not require checksum validation, i.e. the outer UDP > + checksum must be a positive zero (0x0) as defined in UDP RFC 768. > + The other tunnel-related fields indicate how to replicate the packet > + headers to cut it into smaller packets: > + > + \begin{itemize} > + \item \field{outer_th_offset} field indicates the outer transport header within > + the packet. This field differs from \field{csum_start} as the latter > + points to the inner transport header within the packet. > + > + \item \field{inner_nh_offset} field indicates the inner network header within > + the packet. > + \end{itemize} > + > +\begin{note} > +For example, consider a partially checksummed TCP (IPv4) packet carried over a > +Geneve UDP tunnel (again IPv4) with no tunnel options. The > +only relevant variable related to the tunnel type is the tunnel header length. > +The packet will have a 14 byte outer ethernet header, 20 byte outer IP header > +followed by the 8 byte UDP header (with a 0 checksum value), 8 byte Geneve header, > +14 byte inner ethernet header, 20 byte inner IP header > +and the TCP header (with the TCP checksum field 16 bytes > +into that header). \field{csum_start} will be 14+20+8+8+14+20 = 84 (the TCP > +checksum includes the header), \field{csum_offset} will be 16. > +\field{inner_nh_offset} will be 14+20+8+8+14 = 62, \field{outer_th_offset} will be > +14+20+8 = 42 and \field{gso_type} will be > +VIRTIO_NET_HDR_GSO_TCPV4 | VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 = 0x21 > +\end{note} > + > \item \field{num_buffers} is set to zero. This field is unused on transmitted packets. > > \item The header and packet are added as one output descriptor to the > @@ -598,6 +657,29 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > driver MUST set the VIRTIO_NET_HDR_GSO_ECN bit in > \field{gso_type}. > > +If VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is negotiated, the driver MAY set > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit > +in \field{gso_type} according to the inner network header protocol type > +to request TCP or UDP GSO packets over UDPv4 or UDPv6 tunnel > +segmentation, otherwise the driver MUST NOT set either the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit > +in \field{gso_type}. > + > +When requesting TCP or UDP GSO segmentation over UDP tunnel, the driver MUST SET the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit if the inner network header is IPv4, i.e. the > +packet is a TCPv4 GSO one, otherwise, if the inner network header is IPv6, the driver > +MUST SET the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit. > + > +The driver MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel > +requiring segmentation and outer UDP checksum offload. > + > +The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit together with VIRTIO_NET_HDR_GSO_UDP, as the > +latter is deprecated in favor of UDP_L4 and no new feature will support it. > + > +The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit together. > + > If the VIRTIO_NET_F_CSUM feature has been negotiated, the > driver MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in > \field{flags}, if so: > @@ -639,12 +721,48 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > header. > \end{itemize} > > +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated, the > +driver MAY set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}, if so: > +\begin{itemize} > +\item the driver MUST set \field{outer_th_offset} to the outer UDP header > + offset and \field{inner_nh_offset} to the inner network header offset. > + The \field{csum_start} and \field{csum_offset} fields point respectively > + to the inner transport header and inner transport checksum field. > +\end{itemize} > + > +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the > +outer UDP header MUST NOT require checksum validation. That is, the > +outer UDP checksum value MUST be 0 or the validated complete checksum > +for such header. > + > +\begin{note} > +The valid complete checksum of the outer UDP header of individual segments > +can be computed by the driver prior to segmentation only if the GSO packet > +size is a multiple of \field{gso_size}, because then all segments > +have the same size and thus all data included in the outer UDP > +checksum is the same for every segment. These pre-computed segment > +length and checksum fields are different from those of the GSO > +packet. How could a device know if the checksum if pre-computed or not? (I could not find the relevant part in the sample kernel code). > +In this scenario the outer UDP header of the GSO packet must carry the > +segmented UDP packet length. > +\end{note} > + > +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has not > +been negotiated, the driver MUST NOT set either the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 > +bit or the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 in \field{gso_type}. > + > The driver SHOULD accept the VIRTIO_NET_F_GUEST_HDRLEN feature if it has > been offered, and if it's able to provide the exact header length. > > The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID and > VIRTIO_NET_HDR_F_RSC_INFO bits in \field{flags}. > > +The driver MUST NOT set the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} > +together with the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 bit or the > +VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. > + > \devicenormative{\paragraph}{Packet Transmission}{Device Types / Network Device / Device Operation / Packet Transmission} > The device MUST ignore \field{flag} bits that it does not recognize. > > @@ -674,6 +792,25 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > \end{note} > \end{itemize} > > +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and > +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, > +the device MUST NOT accept the packet. > + > +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > +bit in \field{gso_type} are not set, the device MUST NOT use the > +\field{outer_th_offset} and \field{inner_nh_offset}. > + > +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of > +the following is true: > +\begin{itemize} > +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} > +\item the VIRTIO_NET_HDR_F_DATA_VALID is set in \field{flags} > +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 > +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE > +\end{itemize} > +the device MUST NOT accept the packet. > + > If VIRTIO_NET_HDR_F_NEEDS_CSUM is not set, the device MUST NOT > rely on the packet checksum being correct. > \paragraph{Packet Transmission Interrupt}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt} > @@ -776,12 +913,19 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be > set: if so, device has validated the packet checksum. > + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, > + and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > + the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit > + in \field{flags}, meaning that the device validated the checksum, > + set the csum_start and csum_offset fields, but did not provide the > + partial checksum for the transport header. Interesting, we had this in the spec now: """ \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be set: if so, device has validated the packet checksum. In case of multiple encapsulated protocols, one level of checksums has been validated. """ Firstly, it doesn't mandate the csum_start/csum_offset. Anything makes the tunnel GSO different? Secondly, I think "one level of checksums" means the outer, this seems to be contradict to what we had here: 1) DATA_VALID seems to be for inner 2) no outer checksum offload It looks like we want to reuse hdr->flags for inner, not sure if it would result come conflict when introducing outer checksum offload. Or maybe we should introduce a dedicated inner_flags. > In case of multiple encapsulated protocols, one level of checksums > has been validated. > \end{enumerate} > > -Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN > -features enable receive checksum, large receive offload and ECN > +Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP, UDP_TUNNEL > +and ECN features enable receive checksum, large receive offload and ECN > support which are the input equivalents of the transmit checksum, > transmit segmentation offloading and ECN features, as described > in \ref{sec:Device Types / Network Device / Device Operation / > @@ -803,8 +947,16 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > from \field{csum_start} and any preceding checksums > have been validated. The checksum on the packet is incomplete and > if bit VIRTIO_NET_HDR_F_RSC_INFO is not set in \field{flags}, > - then \field{csum_start} and \field{csum_offset} indicate how to calculate it > - (see Packet Transmission point 1). > + then \field{csum_start} and \field{csum_offset} indicate how to calculate it. > + If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > + bit are set, the \field{csum_start} field > + refers to the inner transport header offset (see Packet Transmission point 1). > +\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO option was negotiated and > + \field{gso_type} is not VIRTIO_NET_HDR_GSO_NONE, the > + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > + bit MAY be set. In such case the \field{outer_th_offset} and > + \field{inner_nh_offset} fields indicate the corresponding > + headers information. > > \end{enumerate} > > @@ -849,6 +1001,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > If none of VIRTIO_NET_F_GUEST_USO4 or VIRTIO_NET_F_GUEST_USO6 have been negotiated, > the device MUST NOT set \field{gso_type} to VIRTIO_NET_HDR_GSO_UDP_L4. > > +If VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO is not negotiated, the device MUST NOT set > +either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. > + > The device SHOULD NOT send to the driver TCP packets requiring segmentation offload > which have the Explicit Congestion Notification bit set, unless the > VIRTIO_NET_F_GUEST_ECN feature is negotiated, in which case the > @@ -872,6 +1028,18 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > fully checksummed packet; > \end{enumerate} > > +The device MUST NOT send to the driver TCP or UDP GSO packets encapsulated in UDP > +tunnel and requiring segmentation offload, unless the > +VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO is negotiated, in which case the device MUST set > +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > +bit in \field{gso_type} according to the inner network header protocol type, > +MUST set the \field{outer_th_offset} and \field{inner_nh_offset} fields > +to the corresponding header information, and the outer UDP header MUST NOT > +require checksum offload. > + > +The device MUST NOT send the driver TCP or UDP GSO packets encapsulated in UDP > +tunnel and requiring segmentation and outer checksum offload. > + > If none of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have > been negotiated, the device MUST set \field{gso_type} to > VIRTIO_NET_HDR_GSO_NONE. > @@ -896,7 +1064,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in > \field{flags}, if so, the device MUST validate the packet > checksum (in case of multiple encapsulated protocols, one level > -of checksums is validated). > +of checksums is validated). \note{This implies that if either the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > +and the set the VIRTIO_NET_HDR_F_DATA_VALID in \field{flags} is set, > +the the VIRTIO_NET_F_GUEST_CSUM bit in \field{flags} is must be set, too} > > \drivernormative{\paragraph}{Processing of Incoming > Packets}{Device Types / Network Device / Device Operation / > @@ -920,6 +1092,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT > rely on the packet checksum being correct. > > +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and > +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, > +the driver MUST NOT accept the packet. > + > +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > +bit in \field{gso_type} are not set, the driver MUST NOT use the > +\field{outer_th_offset} and \field{inner_nh_offset}. > + > +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of > +the following is true: > +\begin{itemize} > +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} > +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 > +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE Maybe we can just simplify this as "VIRTIO_NET_HDR_GSO_NONE is set in \field{gso_type}" > +\end{itemize} > +the driver MUST NOT accept the packet. > + > \paragraph{Hash calculation for incoming packets} > \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets} > > @@ -1763,6 +1953,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi > #define VIRTIO_NET_F_GUEST_TSO6 8 > #define VIRTIO_NET_F_GUEST_ECN 9 > #define VIRTIO_NET_F_GUEST_UFO 10 > +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 46 > #define VIRTIO_NET_F_GUEST_USO4 54 > #define VIRTIO_NET_F_GUEST_USO6 55 > > -- > 2.45.2 > Thanks ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-09 7:18 ` Jason Wang @ 2024-10-09 8:37 ` Paolo Abeni 2024-10-10 3:17 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-09 8:37 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/9/24 09:18, Jason Wang wrote: > On Fri, Oct 4, 2024 at 4:13 PM Paolo Abeni <pabeni@redhat.com> wrote: >> >> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} are two gso_type flags >> allowing respectively GSO over UDPv4 tunnel and GSO over UDPv6 tunnel. >> They can be negotiated on both the host and guest sides. >> >> One constraint addressed here is that the virtio side (either device or >> driver) receiving a UDP tunneled GSO packet must be able to reconstruct >> completely the inner and outer headers offset - to allow for later GSO. >> >> To accommodate such need, new fields are introduced in the virtio_net >> header: outer_th_offset and inner_nh_offset. >> They map directly to the corresponding header information. The inner >> transport header is implied by the (inner) checksum offload. >> >> Those fields are required because a virtio device H/W implementation >> performing segmentation for UDP tunneled packet will need to touch >> the outer transport protocol (for the UDP length filed), the >> inner network protocol (for the total length field, in the IPv4 >> case). >> >> Note that segmentation will additionally need to touch >> the outer network protocol and the inner transport protocol. The first >> is implied/easily found with trivial parsing, the latter is identified >> by the existing csum_start field. >> >> Note that there is no concept of UDP tunnel type negotiation (e.g. >> vxlan, geneve, vxlan-gpe, etc.), as a virtio device H/W implementation >> can perform segmentation for every possible UDP-tunnel given the >> specified new fields. >> In the reverse direction, if a virtio device H/W implementation receives >> some traffic for an unknown or unsupported UDP tunnel, it will simply >> not aggregate the wire packet in a GSO one. >> >> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >> --- >> v8 -> v9: >> - rebased on top of virtio-1.4, changed udp-tunnel features number >> to avoid conflicts/duplications >> - fixed typos: >> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV4 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 >> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV6 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 >> - specified strict validation on RX side (both driver and device) >> for UDP tunneled packet: NEEDS_CSUM and plain GSO are mandatory for UDP >> tunnel GSO packets. The driver is not allowed to set DATA_VALID for UDP >> tunnel GSO packets, the driver can set both DATA_VALID & NEEDS_CSUM. > > DATA_VALID and NEEDS_CSUM are mutually exclusive unless I miss something. AFAICS the specification does not specify them to be mutually esclusive and at least the linux kernel implementation allow them to be set together. As a possible alternative to the change from v8 to v9 we can enforce DATA_VALID and NEEDS_CSUM to be mutually esclusive. GSO over UDP tunnel needs the csum_start offset; with DATA_VALID and NEEDS_CSUM mutually esclusive exclusive we must either: - NOT allow GSO over UDP tunnel with DATA_VALID or - required csum_start being set with GSO over UDP tunnel and DATA_VALID Please LMK which one is the preferred option. >> @@ -639,12 +721,48 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De >> header. >> \end{itemize} >> >> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated, the >> +driver MAY set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}, if so: >> +\begin{itemize} >> +\item the driver MUST set \field{outer_th_offset} to the outer UDP header >> + offset and \field{inner_nh_offset} to the inner network header offset. >> + The \field{csum_start} and \field{csum_offset} fields point respectively >> + to the inner transport header and inner transport checksum field. >> +\end{itemize} >> + >> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the >> +outer UDP header MUST NOT require checksum validation. That is, the >> +outer UDP checksum value MUST be 0 or the validated complete checksum >> +for such header. >> + >> +\begin{note} >> +The valid complete checksum of the outer UDP header of individual segments >> +can be computed by the driver prior to segmentation only if the GSO packet >> +size is a multiple of \field{gso_size}, because then all segments >> +have the same size and thus all data included in the outer UDP >> +checksum is the same for every segment. These pre-computed segment >> +length and checksum fields are different from those of the GSO >> +packet. > > How could a device know if the checksum if pre-computed or not? (I > could not find the relevant part in the sample kernel code). Note that the UDP tunnel csum offload is discusses in patch 2/2 as per your request. At this point (only patch 1/2 applied) there is no concept of csum offload for the outer header. With only patch 1/2 applied the outer UDP will be always left-alone/ignored/not touched (either 0 or pre-computed). In patch 2/2 is specified that the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in flags carries the info about the outer csum offload. When set the outer csum is offloaded, when cleared is either 0 or precomputed. >> @@ -776,12 +913,19 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network >> \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the >> VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be >> set: if so, device has validated the packet checksum. >> + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, >> + and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the >> + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, >> + the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit >> + in \field{flags}, meaning that the device validated the checksum, >> + set the csum_start and csum_offset fields, but did not provide the >> + partial checksum for the transport header. > > Interesting, we had this in the spec now: > > """ > \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be > set: if so, device has validated the packet checksum. > In case of multiple encapsulated protocols, one level of checksums > has been validated. > """ > > Firstly, it doesn't mandate the csum_start/csum_offset. Anything makes > the tunnel GSO different? This difference is intentional, to satisfy a requested from Willem for "robustness". We have a continuos flow of serious bugs in the current virtio implementation due to the spec allowing "unexpected"/strange/weak features combination leading to strange packet layout and crashes. i.e. see the kernel commits: 49d14b54a527289d09a9480f214b8c586322310a 6513eb3d3191574b58859ef2d6dc26c0277c6f81 89add40066f9ed9abe5f7f886fe5789ff7e0c50e The idea is that GSO over UDP tunnel specs are new, so we can be more strict and we try to mandate only 'sane' flag/features/layout combinations. > Secondly, I think "one level of checksums" means the outer, this seems > to be contradict to what we had here: 1) DATA_VALID seems to be for > inner 2) no outer checksum offload Yes, with this spec GSO over UDP tunnel DATA_VALID specify the inner head csum. AFAICS the simpler option is allowing DATA_VALID with GSO over UDP tunnel without only with outer header csum offload, too. > It looks like we want to reuse hdr->flags for inner, not sure if it > would result come conflict when introducing outer checksum offload. Or > maybe we should introduce a dedicated inner_flags. I could not follow the above, could you please re-phrase? >> @@ -920,6 +1092,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network >> VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT >> rely on the packet checksum being correct. >> >> +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and >> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, >> +the driver MUST NOT accept the packet. >> + >> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 >> +bit in \field{gso_type} are not set, the driver MUST NOT use the >> +\field{outer_th_offset} and \field{inner_nh_offset}. >> + >> +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or >> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of >> +the following is true: >> +\begin{itemize} >> +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} >> +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 >> +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE > > Maybe we can just simplify this as > > "VIRTIO_NET_HDR_GSO_NONE is set in \field{gso_type}" I considered that option, the main problem is that such statement will never be true: gso_type has at least a bit set (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 or VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6). I preferred to be more accurate even if more verbose. I think we should preserve this accuracy. Thanks, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-09 8:37 ` Paolo Abeni @ 2024-10-10 3:17 ` Jason Wang 2024-10-10 7:40 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-10 3:17 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Wed, Oct 9, 2024 at 4:37 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/9/24 09:18, Jason Wang wrote: > > On Fri, Oct 4, 2024 at 4:13 PM Paolo Abeni <pabeni@redhat.com> wrote: > >> > >> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} are two gso_type flags > >> allowing respectively GSO over UDPv4 tunnel and GSO over UDPv6 tunnel. > >> They can be negotiated on both the host and guest sides. > >> > >> One constraint addressed here is that the virtio side (either device or > >> driver) receiving a UDP tunneled GSO packet must be able to reconstruct > >> completely the inner and outer headers offset - to allow for later GSO. > >> > >> To accommodate such need, new fields are introduced in the virtio_net > >> header: outer_th_offset and inner_nh_offset. > >> They map directly to the corresponding header information. The inner > >> transport header is implied by the (inner) checksum offload. > >> > >> Those fields are required because a virtio device H/W implementation > >> performing segmentation for UDP tunneled packet will need to touch > >> the outer transport protocol (for the UDP length filed), the > >> inner network protocol (for the total length field, in the IPv4 > >> case). > >> > >> Note that segmentation will additionally need to touch > >> the outer network protocol and the inner transport protocol. The first > >> is implied/easily found with trivial parsing, the latter is identified > >> by the existing csum_start field. > >> > >> Note that there is no concept of UDP tunnel type negotiation (e.g. > >> vxlan, geneve, vxlan-gpe, etc.), as a virtio device H/W implementation > >> can perform segmentation for every possible UDP-tunnel given the > >> specified new fields. > >> In the reverse direction, if a virtio device H/W implementation receives > >> some traffic for an unknown or unsupported UDP tunnel, it will simply > >> not aggregate the wire packet in a GSO one. > >> > >> Signed-off-by: Paolo Abeni <pabeni@redhat.com> > >> --- > >> v8 -> v9: > >> - rebased on top of virtio-1.4, changed udp-tunnel features number > >> to avoid conflicts/duplications > >> - fixed typos: > >> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV4 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 > >> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV6 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 > >> - specified strict validation on RX side (both driver and device) > >> for UDP tunneled packet: NEEDS_CSUM and plain GSO are mandatory for UDP > >> tunnel GSO packets. The driver is not allowed to set DATA_VALID for UDP > >> tunnel GSO packets, the driver can set both DATA_VALID & NEEDS_CSUM. > > > > DATA_VALID and NEEDS_CSUM are mutually exclusive unless I miss something. > > AFAICS the specification does not specify them to be mutually esclusive > and at least the linux kernel implementation allow them to be set together. Interesting, where is such code? E.g in virtio_net_hdr_from_skb() we had: if (skb->ip_summed == CHECKSUM_PARTIAL) { hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; hdr->csum_start = __cpu_to_virtio16(little_endian, skb_checksum_start_offset(skb) + vlan_hlen); hdr->csum_offset = __cpu_to_virtio16(little_endian, skb->csum_offset); } else if (has_data_valid && skb->ip_summed == CHECKSUM_UNNECESSARY) { hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID; ... Anyhow, when I introduce DATA_VALID, the idea is to map CHECKSUM_UNNECESSARY to DATA_VALID and CHECKSUM_PARTIAL maps to NEEDS_CSUM. My understanding is CHECKSUM_UNNECESSARY and CHECKSUM_PARTIAL and mutually exclusive. > > As a possible alternative to the change from v8 to v9 we can enforce > DATA_VALID and NEEDS_CSUM to be mutually esclusive. It seems implied in this part of the spec: """ If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has validated the packet checksum. In case of multiple encapsulated protocols, one level of checksums has been validated. """ As DATA_VALID is only used in RX path, if packet has partial cusm, device can't do the validation anyhow. Or did you mean the hardware that can do hardware GRO, in that case, device should set NEEDS_CSUM but not DATA_VALID (maybe we need clarify this in the spec). > GSO over UDP tunnel > needs the csum_start offset; with DATA_VALID and NEEDS_CSUM mutually > esclusive exclusive we must either: > - NOT allow GSO over UDP tunnel with DATA_VALID > or > - required csum_start being set with GSO over UDP tunnel and DATA_VALID I have some comment for this, see below. > > Please LMK which one is the preferred option. > > >> @@ -639,12 +721,48 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > >> header. > >> \end{itemize} > >> > >> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated, the > >> +driver MAY set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}, if so: > >> +\begin{itemize} > >> +\item the driver MUST set \field{outer_th_offset} to the outer UDP header > >> + offset and \field{inner_nh_offset} to the inner network header offset. > >> + The \field{csum_start} and \field{csum_offset} fields point respectively > >> + to the inner transport header and inner transport checksum field. > >> +\end{itemize} > >> + > >> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the > >> +outer UDP header MUST NOT require checksum validation. That is, the > >> +outer UDP checksum value MUST be 0 or the validated complete checksum > >> +for such header. > >> + > >> +\begin{note} > >> +The valid complete checksum of the outer UDP header of individual segments > >> +can be computed by the driver prior to segmentation only if the GSO packet > >> +size is a multiple of \field{gso_size}, because then all segments > >> +have the same size and thus all data included in the outer UDP > >> +checksum is the same for every segment. These pre-computed segment > >> +length and checksum fields are different from those of the GSO > >> +packet. > > > > How could a device know if the checksum if pre-computed or not? (I > > could not find the relevant part in the sample kernel code). > > Note that the UDP tunnel csum offload is discusses in patch 2/2 as per > your request. At this point (only patch 1/2 applied) there is no concept > of csum offload for the outer header. With only patch 1/2 applied the > outer UDP will be always left-alone/ignored/not touched (either 0 or > pre-computed). Correct me if I was wrong, my understanding is that, device should behave differently in those cases: 0: do a full checksum pre-computed: checksum only the data part So you meant if device sees a non zero checksum, it realizes that it is a pre-computed one? If yes, we probably need to clarify this in the spec. > > In patch 2/2 is specified that the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit > in flags carries the info about the outer csum offload. When set the > outer csum is offloaded, when cleared is either 0 or precomputed. > > >> @@ -776,12 +913,19 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > >> \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > >> VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be > >> set: if so, device has validated the packet checksum. > >> + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, > >> + and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > >> + VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > >> + the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit > >> + in \field{flags}, meaning that the device validated the checksum, > >> + set the csum_start and csum_offset fields, but did not provide the > >> + partial checksum for the transport header. > > > > Interesting, we had this in the spec now: > > > > """ > > \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > > VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be > > set: if so, device has validated the packet checksum. > > In case of multiple encapsulated protocols, one level of checksums > > has been validated. > > """ > > > > Firstly, it doesn't mandate the csum_start/csum_offset. Anything makes > > the tunnel GSO different? > > This difference is intentional, to satisfy a requested from Willem for > "robustness". We have a continuos flow of serious bugs in the current > virtio implementation due to the spec allowing "unexpected"/strange/weak > features combination leading to strange packet layout and crashes. > > i.e. see the kernel commits: > 49d14b54a527289d09a9480f214b8c586322310a > 6513eb3d3191574b58859ef2d6dc26c0277c6f81 > 89add40066f9ed9abe5f7f886fe5789ff7e0c50e > I see. > The idea is that GSO over UDP tunnel specs are new, so we can be more > strict and we try to mandate only 'sane' flag/features/layout combinations. But I don't see how mandating device to set csum stuffs can help. 1) There's no way for the driver to know if device sets csum stuffs or not 2) Driver should not trust the device, malicious device can set arbitrary value, so driver needs its own validation > > > Secondly, I think "one level of checksums" means the outer, this seems > > to be contradict to what we had here: 1) DATA_VALID seems to be for > > inner 2) no outer checksum offload > > Yes, with this spec GSO over UDP tunnel DATA_VALID specify the inner > head csum. Just to make sure we are on the same page. I meant we probably need a clarification on the "one level of checksums" here. It looks ambiguous as I see we think differently here: Assuming we have a packet with N level of tunnels. 1) if it means the N level tunnel, this means 1.1) device needs complicated logic to analyze each level of the tunneling which seems to be impossible 1.2) in this case, if device want to set the csum_start/offset, we will need to answer if it for 0 or N 2) if it means the 0 level tunnel (that's my understanding so far), we end up with my questions above 2.1) DATA_VALID means 0 level, we can't reuse it for inner 2.2) if DATA_VALID means 0 level, we should already had outer checksum offload at least in the RX path Btw, I think we should clarify that the tunnel GSO only works for one level. > > AFAICS the simpler option is allowing DATA_VALID with GSO over UDP > tunnel without only with outer header csum offload, too. Not sure I get this, but it seems to be if DATA_VALID is for outer, it would be simpler. > > > It looks like we want to reuse hdr->flags for inner, not sure if it > > would result come conflict when introducing outer checksum offload. Or > > maybe we should introduce a dedicated inner_flags. > > I could not follow the above, could you please re-phrase? I meant e.g for RX we would have the following matrix inner: NEEDS_CUSM, DATA_VALID outer: NEEDS_CSUM, DATA_VALID Do we want to support all of the four combinations? It looks to me thing would be simpler, if NEEDS_CSUM, DATA_VALID is for outer (0 level of tunnel), and new flags or fields were introduced for inner. > > >> @@ -920,6 +1092,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > >> VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT > >> rely on the packet checksum being correct. > >> > >> +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and > >> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, > >> +the driver MUST NOT accept the packet. > >> + > >> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > >> +bit in \field{gso_type} are not set, the driver MUST NOT use the > >> +\field{outer_th_offset} and \field{inner_nh_offset}. > >> + > >> +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > >> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of > >> +the following is true: > >> +\begin{itemize} > >> +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} > >> +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 > >> +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE > > > > Maybe we can just simplify this as > > > > "VIRTIO_NET_HDR_GSO_NONE is set in \field{gso_type}" > > I considered that option, the main problem is that such statement will > never be true: gso_type has at least a bit set > (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 or > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6). I preferred to be more accurate > even if more verbose. I think we should preserve this accuracy. Note that this has been under context of: """ If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, """ (Anyhow, if we can leave this to the native speaker) > > Thanks, > > Paolo > Thanks ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-10 3:17 ` Jason Wang @ 2024-10-10 7:40 ` Paolo Abeni 2024-10-11 2:08 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-10 7:40 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/10/24 05:17, Jason Wang wrote: > On Wed, Oct 9, 2024 at 4:37 PM Paolo Abeni <pabeni@redhat.com> wrote: >> On 10/9/24 09:18, Jason Wang wrote: >>> On Fri, Oct 4, 2024 at 4:13 PM Paolo Abeni <pabeni@redhat.com> wrote: >>>> >>>> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} are two gso_type flags >>>> allowing respectively GSO over UDPv4 tunnel and GSO over UDPv6 tunnel. >>>> They can be negotiated on both the host and guest sides. >>>> >>>> One constraint addressed here is that the virtio side (either device or >>>> driver) receiving a UDP tunneled GSO packet must be able to reconstruct >>>> completely the inner and outer headers offset - to allow for later GSO. >>>> >>>> To accommodate such need, new fields are introduced in the virtio_net >>>> header: outer_th_offset and inner_nh_offset. >>>> They map directly to the corresponding header information. The inner >>>> transport header is implied by the (inner) checksum offload. >>>> >>>> Those fields are required because a virtio device H/W implementation >>>> performing segmentation for UDP tunneled packet will need to touch >>>> the outer transport protocol (for the UDP length filed), the >>>> inner network protocol (for the total length field, in the IPv4 >>>> case). >>>> >>>> Note that segmentation will additionally need to touch >>>> the outer network protocol and the inner transport protocol. The first >>>> is implied/easily found with trivial parsing, the latter is identified >>>> by the existing csum_start field. >>>> >>>> Note that there is no concept of UDP tunnel type negotiation (e.g. >>>> vxlan, geneve, vxlan-gpe, etc.), as a virtio device H/W implementation >>>> can perform segmentation for every possible UDP-tunnel given the >>>> specified new fields. >>>> In the reverse direction, if a virtio device H/W implementation receives >>>> some traffic for an unknown or unsupported UDP tunnel, it will simply >>>> not aggregate the wire packet in a GSO one. >>>> >>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> >>>> --- >>>> v8 -> v9: >>>> - rebased on top of virtio-1.4, changed udp-tunnel features number >>>> to avoid conflicts/duplications >>>> - fixed typos: >>>> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV4 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 >>>> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV6 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 >>>> - specified strict validation on RX side (both driver and device) >>>> for UDP tunneled packet: NEEDS_CSUM and plain GSO are mandatory for UDP >>>> tunnel GSO packets. The driver is not allowed to set DATA_VALID for UDP >>>> tunnel GSO packets, the driver can set both DATA_VALID & NEEDS_CSUM. >>> >>> DATA_VALID and NEEDS_CSUM are mutually exclusive unless I miss something. >> >> AFAICS the specification does not specify them to be mutually esclusive >> and at least the linux kernel implementation allow them to be set together. > > Interesting, where is such code? E.g in virtio_net_hdr_from_skb() we had: > > if (skb->ip_summed == CHECKSUM_PARTIAL) { > hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; > hdr->csum_start = __cpu_to_virtio16(little_endian, > skb_checksum_start_offset(skb) + vlan_hlen); > hdr->csum_offset = __cpu_to_virtio16(little_endian, > skb->csum_offset); > } else if (has_data_valid && > skb->ip_summed == CHECKSUM_UNNECESSARY) { > hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID; > ... The problem is in the opposite direction, in virtio_net_hdr_to_skb(). It accepts GSO packets with the VIRTIO_NET_HDR_F_NEEDS_CSUM bit not set, and accepts also packets with both VIRTIO_NET_HDR_F_NEEDS_CSUM and VIRTIO_NET_HDR_F_DATA_VALID set. (Note that the PoC is more restrictive for GSO over UDP tunnel packets) The above means that an hypervisor with a S/W virtio implementation need to deal with all sort of strange/unexpected combinations. > Anyhow, when I introduce DATA_VALID, the idea is to map > CHECKSUM_UNNECESSARY to DATA_VALID and CHECKSUM_PARTIAL maps to > NEEDS_CSUM. My understanding is CHECKSUM_UNNECESSARY and > CHECKSUM_PARTIAL and mutually exclusive. AFAICS the problem is that the specification does not enforce such understanding (I can't see any related 'MUST'/'MUST NOT') and the s/w implementation allow the same freedom. >> As a possible alternative to the change from v8 to v9 we can enforce >> DATA_VALID and NEEDS_CSUM to be mutually esclusive. > > It seems implied in this part of the spec: > > """ > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > validated the packet checksum. In case of multiple encapsulated > protocols, one level of checksums has been validated. > """ > > As DATA_VALID is only used in RX path, if packet has partial cusm, > device can't do the validation anyhow. Or did you mean the hardware > that can do hardware GRO, in that case, device should set NEEDS_CSUM > but not DATA_VALID (maybe we need clarify this in the spec). Indeed the above is a (possibly 'the') source of confusion. In the current spec (for non UDP tunneled GSO) it's unclear (I can't nee any related MUST/MUST NOT) if DATA_VALID can or can not be set with a GSO packet. AFAICS the current implementation in virtio_net_hdr_to_skb() allows that option. If we restrict DATA_VALID to non GSO packet, I think most/all the doubt discussed here should go away. I'm unsure if we can enforce such restriction for non UDP tunneled GSO, it could break/conflict with existing setup/deplyment, but I think we can enforce that for UDP tunneled GSO packets. Side note: as a consequence, sice we will not have anymore DATA_VALID together with VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV{4,6}, the above will additional avoid the 'DATA_VALID with GSO over UDP tunnel means inner header is valid' strangeness. >>>> @@ -639,12 +721,48 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De >>>> header. >>>> \end{itemize} >>>> >>>> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated, the >>>> +driver MAY set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the >>>> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}, if so: >>>> +\begin{itemize} >>>> +\item the driver MUST set \field{outer_th_offset} to the outer UDP header >>>> + offset and \field{inner_nh_offset} to the inner network header offset. >>>> + The \field{csum_start} and \field{csum_offset} fields point respectively >>>> + to the inner transport header and inner transport checksum field. >>>> +\end{itemize} >>>> + >>>> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the >>>> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the >>>> +outer UDP header MUST NOT require checksum validation. That is, the >>>> +outer UDP checksum value MUST be 0 or the validated complete checksum >>>> +for such header. >>>> + >>>> +\begin{note} >>>> +The valid complete checksum of the outer UDP header of individual segments >>>> +can be computed by the driver prior to segmentation only if the GSO packet >>>> +size is a multiple of \field{gso_size}, because then all segments >>>> +have the same size and thus all data included in the outer UDP >>>> +checksum is the same for every segment. These pre-computed segment >>>> +length and checksum fields are different from those of the GSO >>>> +packet. >>> >>> How could a device know if the checksum if pre-computed or not? (I >>> could not find the relevant part in the sample kernel code). >> >> Note that the UDP tunnel csum offload is discusses in patch 2/2 as per >> your request. At this point (only patch 1/2 applied) there is no concept >> of csum offload for the outer header. With only patch 1/2 applied the >> outer UDP will be always left-alone/ignored/not touched (either 0 or >> pre-computed). > > Correct me if I was wrong, my understanding is that, device should > behave differently in those cases: > > 0: do a full checksum > pre-computed: checksum only the data part > So you meant if device sees a non zero checksum, it realizes that it > is a pre-computed one? If yes, we probably need to clarify this in the > spec. Lacking the UDP tunnel csum offload, the device will simply not touch the outer UDP csum. That will be the correct action both if the outer csum is 0 or the fully csum one. >>> Firstly, it doesn't mandate the csum_start/csum_offset. Anything makes >>> the tunnel GSO different? >> >> This difference is intentional, to satisfy a requested from Willem for >> "robustness". We have a continuos flow of serious bugs in the current >> virtio implementation due to the spec allowing "unexpected"/strange/weak >> features combination leading to strange packet layout and crashes. >> >> i.e. see the kernel commits: >> 49d14b54a527289d09a9480f214b8c586322310a >> 6513eb3d3191574b58859ef2d6dc26c0277c6f81 >> 89add40066f9ed9abe5f7f886fe5789ff7e0c50e >> > > I see. > >> The idea is that GSO over UDP tunnel specs are new, so we can be more >> strict and we try to mandate only 'sane' flag/features/layout combinations. > > But I don't see how mandating device to set csum stuffs can help. > > 1) There's no way for the driver to know if device sets csum stuffs or not > 2) Driver should not trust the device, malicious device can set > arbitrary value, so driver needs its own validation We are discussing only about csum validy, right? Lacking UDP tunnel csum offload, the device does not need to discriminate between outer zero csum and outer fully checksummed packets: in both case it should not touch the outer UDP csum. The 'worst case scenario' is that the guest will send packets with bad outer csum, which is AFACS legit, unharmful and sometimes necesary (for testing purpose). The guest can already send packets with bad csum in many different ways. >>> Secondly, I think "one level of checksums" means the outer, this seems >>> to be contradict to what we had here: 1) DATA_VALID seems to be for >>> inner 2) no outer checksum offload >> >> Yes, with this spec GSO over UDP tunnel DATA_VALID specify the inner >> head csum. > > Just to make sure we are on the same page. I meant we probably need a > clarification on the "one level of checksums" here. It looks ambiguous > as I see we think differently here: > > Assuming we have a packet with N level of tunnels. > > 1) if it means the N level tunnel, this means > 1.1) device needs complicated logic to analyze each level of the > tunneling which seems to be impossible > 1.2) in this case, if device want to set the csum_start/offset, we > will need to answer if it for 0 or N > > 2) if it means the 0 level tunnel (that's my understanding so far), we > end up with my questions above > 2.1) DATA_VALID means 0 level, we can't reuse it for inner > 2.2) if DATA_VALID means 0 level, we should already had outer checksum > offload at least in the RX path > > Btw, I think we should clarify that the tunnel GSO only works for one level. I hope and believe that this part should have been clarified by a previously discussed point: no DATA_VALID for GSO over UDP tunnel, no more differentiate/confusing meaning for the DATA VALID bit. >>>> @@ -920,6 +1092,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network >>>> VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT >>>> rely on the packet checksum being correct. >>>> >>>> +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and >>>> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, >>>> +the driver MUST NOT accept the packet. >>>> + >>>> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 >>>> +bit in \field{gso_type} are not set, the driver MUST NOT use the >>>> +\field{outer_th_offset} and \field{inner_nh_offset}. >>>> + >>>> +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or >>>> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of >>>> +the following is true: >>>> +\begin{itemize} >>>> +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} >>>> +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 >>>> +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE >>> >>> Maybe we can just simplify this as >>> >>> "VIRTIO_NET_HDR_GSO_NONE is set in \field{gso_type}" >> >> I considered that option, the main problem is that such statement will >> never be true: gso_type has at least a bit set >> (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 or >> VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6). I preferred to be more accurate >> even if more verbose. I think we should preserve this accuracy. > > Note that this has been under context of: > > """ > If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > """ That is my point: we know that this statement: hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE is false. The goal is to be as accurate as possible, to avoid strangess/later misuse or misinterpretation. /P ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-10 7:40 ` Paolo Abeni @ 2024-10-11 2:08 ` Jason Wang 2024-10-11 7:50 ` Paolo Abeni 2024-10-14 7:20 ` Paolo Abeni 0 siblings, 2 replies; 40+ messages in thread From: Jason Wang @ 2024-10-11 2:08 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Thu, Oct 10, 2024 at 3:40 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/10/24 05:17, Jason Wang wrote: > > On Wed, Oct 9, 2024 at 4:37 PM Paolo Abeni <pabeni@redhat.com> wrote: > >> On 10/9/24 09:18, Jason Wang wrote: > >>> On Fri, Oct 4, 2024 at 4:13 PM Paolo Abeni <pabeni@redhat.com> wrote: > >>>> > >>>> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} are two gso_type flags > >>>> allowing respectively GSO over UDPv4 tunnel and GSO over UDPv6 tunnel. > >>>> They can be negotiated on both the host and guest sides. > >>>> > >>>> One constraint addressed here is that the virtio side (either device or > >>>> driver) receiving a UDP tunneled GSO packet must be able to reconstruct > >>>> completely the inner and outer headers offset - to allow for later GSO. > >>>> > >>>> To accommodate such need, new fields are introduced in the virtio_net > >>>> header: outer_th_offset and inner_nh_offset. > >>>> They map directly to the corresponding header information. The inner > >>>> transport header is implied by the (inner) checksum offload. > >>>> > >>>> Those fields are required because a virtio device H/W implementation > >>>> performing segmentation for UDP tunneled packet will need to touch > >>>> the outer transport protocol (for the UDP length filed), the > >>>> inner network protocol (for the total length field, in the IPv4 > >>>> case). > >>>> > >>>> Note that segmentation will additionally need to touch > >>>> the outer network protocol and the inner transport protocol. The first > >>>> is implied/easily found with trivial parsing, the latter is identified > >>>> by the existing csum_start field. > >>>> > >>>> Note that there is no concept of UDP tunnel type negotiation (e.g. > >>>> vxlan, geneve, vxlan-gpe, etc.), as a virtio device H/W implementation > >>>> can perform segmentation for every possible UDP-tunnel given the > >>>> specified new fields. > >>>> In the reverse direction, if a virtio device H/W implementation receives > >>>> some traffic for an unknown or unsupported UDP tunnel, it will simply > >>>> not aggregate the wire packet in a GSO one. > >>>> > >>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com> > >>>> --- > >>>> v8 -> v9: > >>>> - rebased on top of virtio-1.4, changed udp-tunnel features number > >>>> to avoid conflicts/duplications > >>>> - fixed typos: > >>>> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV4 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 > >>>> VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV6 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 > >>>> - specified strict validation on RX side (both driver and device) > >>>> for UDP tunneled packet: NEEDS_CSUM and plain GSO are mandatory for UDP > >>>> tunnel GSO packets. The driver is not allowed to set DATA_VALID for UDP > >>>> tunnel GSO packets, the driver can set both DATA_VALID & NEEDS_CSUM. > >>> > >>> DATA_VALID and NEEDS_CSUM are mutually exclusive unless I miss something. > >> > >> AFAICS the specification does not specify them to be mutually esclusive > >> and at least the linux kernel implementation allow them to be set together. > > > > Interesting, where is such code? E.g in virtio_net_hdr_from_skb() we had: > > > > if (skb->ip_summed == CHECKSUM_PARTIAL) { > > hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM; > > hdr->csum_start = __cpu_to_virtio16(little_endian, > > skb_checksum_start_offset(skb) + vlan_hlen); > > hdr->csum_offset = __cpu_to_virtio16(little_endian, > > skb->csum_offset); > > } else if (has_data_valid && > > skb->ip_summed == CHECKSUM_UNNECESSARY) { > > hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID; > > ... > > The problem is in the opposite direction, in virtio_net_hdr_to_skb(). > > It accepts GSO packets with the VIRTIO_NET_HDR_F_NEEDS_CSUM bit not set, This seems to be fine, it requires software checksum. > and accepts also packets with both VIRTIO_NET_HDR_F_NEEDS_CSUM and > VIRTIO_NET_HDR_F_DATA_VALID set. (Note that the PoC is more restrictive > for GSO over UDP tunnel packets) This is actually a bug that needs to be fixed. if (flags & VIRTIO_NET_HDR_F_DATA_VALID) skb->ip_summed = CHECKSUM_UNNECESSARY; if (virtio_net_hdr_to_skb(skb, &hdr->hdr, The code may overwrite CHECKSUM_UNNECESSARY with CHECKSUM_PARTIAL. > > The above means that an hypervisor with a S/W virtio implementation need > to deal with all sort of strange/unexpected combinations. Driver and device doesn't trust each other, they must do necessary validation no matter what features were negotiated. > > > Anyhow, when I introduce DATA_VALID, the idea is to map > > CHECKSUM_UNNECESSARY to DATA_VALID and CHECKSUM_PARTIAL maps to > > NEEDS_CSUM. My understanding is CHECKSUM_UNNECESSARY and > > CHECKSUM_PARTIAL and mutually exclusive. > > AFAICS the problem is that the specification does not enforce such > understanding (I can't see any related 'MUST'/'MUST NOT') and the s/w > implementation allow the same freedom. Spec can be unclear and it's hard to prove an implementation is 100% compliant with spec. There are quirks in both side to workaround buggy implementation in other side. We need to fix the spec if it causes confusion. > > >> As a possible alternative to the change from v8 to v9 we can enforce > >> DATA_VALID and NEEDS_CSUM to be mutually esclusive. > > > > It seems implied in this part of the spec: > > > > """ > > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > > validated the packet checksum. In case of multiple encapsulated > > protocols, one level of checksums has been validated. > > """ > > > > As DATA_VALID is only used in RX path, if packet has partial cusm, > > device can't do the validation anyhow. Or did you mean the hardware > > that can do hardware GRO, in that case, device should set NEEDS_CSUM > > but not DATA_VALID (maybe we need clarify this in the spec). > > Indeed the above is a (possibly 'the') source of confusion. In the > current spec (for non UDP tunneled GSO) it's unclear (I can't nee any > related MUST/MUST NOT) if DATA_VALID can or can not be set with a GSO > packet. According to this """ If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in flags, if so, the device MUST validate the packet checksum (in case of multiple encapsulated protocols, one level of checksums is validated). """ It doesn't have any care for GSO, so my understanding is that DATA_VALID can be set for GSO packet. > AFAICS the current implementation in virtio_net_hdr_to_skb() > allows that option. > > If we restrict DATA_VALID to non GSO packet, I think most/all the doubt > discussed here should go away. I'm unsure if we can enforce such > restriction for non UDP tunneled GSO, it could break/conflict with > existing setup/deplyment, Yes, it's too late to do that. > but I think we can enforce that for UDP > tunneled GSO packets. It looks like there are some existing drivers that support GRO_HW as well as CHECKSUM_UNNECESSARY. If it doesn't turn out to be too hard, we need to find way to support that. > > Side note: as a consequence, sice we will not have anymore DATA_VALID > together with VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV{4,6}, the above will > additional avoid the 'DATA_VALID with GSO over UDP tunnel means inner > header is valid' strangeness. Well, then we will still have the issue with NEEDS_CSUM? For example, as I ask in patch 2, if we want to offload outer checksum where do we put the out csum_start/csum_offset? > > >>>> @@ -639,12 +721,48 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > >>>> header. > >>>> \end{itemize} > >>>> > >>>> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated, the > >>>> +driver MAY set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > >>>> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}, if so: > >>>> +\begin{itemize} > >>>> +\item the driver MUST set \field{outer_th_offset} to the outer UDP header > >>>> + offset and \field{inner_nh_offset} to the inner network header offset. > >>>> + The \field{csum_start} and \field{csum_offset} fields point respectively > >>>> + to the inner transport header and inner transport checksum field. > >>>> +\end{itemize} > >>>> + > >>>> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > >>>> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the > >>>> +outer UDP header MUST NOT require checksum validation. That is, the > >>>> +outer UDP checksum value MUST be 0 or the validated complete checksum > >>>> +for such header. > >>>> + > >>>> +\begin{note} > >>>> +The valid complete checksum of the outer UDP header of individual segments > >>>> +can be computed by the driver prior to segmentation only if the GSO packet > >>>> +size is a multiple of \field{gso_size}, because then all segments > >>>> +have the same size and thus all data included in the outer UDP > >>>> +checksum is the same for every segment. These pre-computed segment > >>>> +length and checksum fields are different from those of the GSO > >>>> +packet. > >>> > >>> How could a device know if the checksum if pre-computed or not? (I > >>> could not find the relevant part in the sample kernel code). > >> > >> Note that the UDP tunnel csum offload is discusses in patch 2/2 as per > >> your request. At this point (only patch 1/2 applied) there is no concept > >> of csum offload for the outer header. With only patch 1/2 applied the > >> outer UDP will be always left-alone/ignored/not touched (either 0 or > >> pre-computed). > > > > Correct me if I was wrong, my understanding is that, device should > > behave differently in those cases: > > > > 0: do a full checksum > > pre-computed: checksum only the data part > > So you meant if device sees a non zero checksum, it realizes that it > > is a pre-computed one? If yes, we probably need to clarify this in the > > spec. > > Lacking the UDP tunnel csum offload, the device will simply not touch > the outer UDP csum. That will be the correct action both if the outer > csum is 0 or the fully csum one. Just to make sure we are talking the same thing. Actually, I meant for TX you propose the following text: """ +The valid complete checksum of the outer UDP header of individual segments +can be computed by the driver prior to segmentation only if the GSO packet +size is a multiple of \field{gso_size}, because then all segments +have the same size and thus all data included in the outer UDP +checksum is the same for every segment. These pre-computed segment +length and checksum fields are different from those of the GSO +packet. """ If I understand correctly, you are talking about the precomputed checksum by the driver which should be TX. If no checksum offload, I don't get how such pre-computing can help. > > >>> Firstly, it doesn't mandate the csum_start/csum_offset. Anything makes > >>> the tunnel GSO different? > >> > >> This difference is intentional, to satisfy a requested from Willem for > >> "robustness". We have a continuos flow of serious bugs in the current > >> virtio implementation due to the spec allowing "unexpected"/strange/weak > >> features combination leading to strange packet layout and crashes. > >> > >> i.e. see the kernel commits: > >> 49d14b54a527289d09a9480f214b8c586322310a > >> 6513eb3d3191574b58859ef2d6dc26c0277c6f81 > >> 89add40066f9ed9abe5f7f886fe5789ff7e0c50e > >> > > > > I see. > > > >> The idea is that GSO over UDP tunnel specs are new, so we can be more > >> strict and we try to mandate only 'sane' flag/features/layout combinations. > > > > But I don't see how mandating device to set csum stuffs can help. > > > > 1) There's no way for the driver to know if device sets csum stuffs or not > > 2) Driver should not trust the device, malicious device can set > > arbitrary value, so driver needs its own validation > > We are discussing only about csum validy, right? Kind of, actually the context is whether mandating the device to set csum_start/offset for DATA_VALID can help. So I meant there's no difference in the following two cases: 1) Device doesn't set csum_start/offset, driver need to probe and validate the packet 2) Device set the csum_start/offset, driver still need to probe and validate the packet And as replied before, there's no way for the driver to know if device set csum_start/offset or not. > Lacking UDP tunnel csum > offload, the device does not need to discriminate between outer zero > csum and outer fully checksummed packets: in both case it should not > touch the outer UDP csum. > > The 'worst case scenario' is that the guest will send packets with bad > outer csum, which is AFACS legit, unharmful and sometimes necesary (for > testing purpose). > > The guest can already send packets with bad csum in many different ways. DATA_VALID is not used in TX I think. But the point is, in any case, validation in each side (driver or device) is a must. > > >>> Secondly, I think "one level of checksums" means the outer, this seems > >>> to be contradict to what we had here: 1) DATA_VALID seems to be for > >>> inner 2) no outer checksum offload > >> > >> Yes, with this spec GSO over UDP tunnel DATA_VALID specify the inner > >> head csum. > > > > Just to make sure we are on the same page. I meant we probably need a > > clarification on the "one level of checksums" here. It looks ambiguous > > as I see we think differently here: > > > > Assuming we have a packet with N level of tunnels. > > > > 1) if it means the N level tunnel, this means > > 1.1) device needs complicated logic to analyze each level of the > > tunneling which seems to be impossible > > 1.2) in this case, if device want to set the csum_start/offset, we > > will need to answer if it for 0 or N > > > > 2) if it means the 0 level tunnel (that's my understanding so far), we > > end up with my questions above > > 2.1) DATA_VALID means 0 level, we can't reuse it for inner > > 2.2) if DATA_VALID means 0 level, we should already had outer checksum > > offload at least in the RX path > > > > Btw, I think we should clarify that the tunnel GSO only works for one level. > > I hope and believe that this part should have been clarified by a > previously discussed point: no DATA_VALID for GSO over UDP tunnel, no > more differentiate/confusing meaning for the DATA VALID bit. Rethink of the design, the root cause is that we're trying to re-use existing fields for inner which works for outer before: For example. consider the case for non GSO tunneling, all csum_start/offset/hdr_flags etc were for the outer, and we change the semantic when GSO tunnel is supported. Such asymmetry might end up of a lot of complexity in both the spec and the implementation. I wonder why not just introduce new fields for the inner, like what has beed done in the linux sk_buff implementation? I know I propose this before, but now I think it can help to reduce the complexity and more aligned with the for non GSO tunneling. ... union { __be16 inner_protocol; __u8 inner_ipproto; }; ... __u16 inner_transport_header; __u16 inner_network_header; __u16 inner_mac_header; __be16 protocol; __u16 transport_header; __u16 network_header; ... Thanks > > >>>> @@ -920,6 +1092,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > >>>> VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT > >>>> rely on the packet checksum being correct. > >>>> > >>>> +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and > >>>> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set, > >>>> +the driver MUST NOT accept the packet. > >>>> + > >>>> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > >>>> +bit in \field{gso_type} are not set, the driver MUST NOT use the > >>>> +\field{outer_th_offset} and \field{inner_nh_offset}. > >>>> + > >>>> +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > >>>> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of > >>>> +the following is true: > >>>> +\begin{itemize} > >>>> +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags} > >>>> +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 > >>>> +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE > >>> > >>> Maybe we can just simplify this as > >>> > >>> "VIRTIO_NET_HDR_GSO_NONE is set in \field{gso_type}" > >> > >> I considered that option, the main problem is that such statement will > >> never be true: gso_type has at least a bit set > >> (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 or > >> VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6). I preferred to be more accurate > >> even if more verbose. I think we should preserve this accuracy. > > > > Note that this has been under context of: > > > > """ > > If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > > the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > > """ > > That is my point: we know that this statement: > > hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE > > is false. The goal is to be as accurate as possible, to avoid > strangess/later misuse or misinterpretation. > > /P > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-11 2:08 ` Jason Wang @ 2024-10-11 7:50 ` Paolo Abeni 2024-10-14 7:20 ` Paolo Abeni 1 sibling, 0 replies; 40+ messages in thread From: Paolo Abeni @ 2024-10-11 7:50 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/11/24 04:08, Jason Wang wrote: > On Thu, Oct 10, 2024 at 3:40 PM Paolo Abeni <pabeni@redhat.com> wrote: >> The above means that an hypervisor with a S/W virtio implementation need >> to deal with all sort of strange/unexpected combinations. > > Driver and device doesn't trust each other, they must do necessary > validation no matter what features were negotiated. The problem is that a non restrictive specification allows for conflicting/confusing layout, hard to check hence the bugs. The goal here is to try to enforce a restrictive specification for the new feature. i.e. if we could replace the whole 'else' statement here: https://elixir.bootlin.com/linux/v6.12-rc2/source/include/linux/virtio_net.h#L113 with a simpler: if (gso_type) return -EINVAL; we would have avoid a lot of serious kernel bugs. Note that the PoC code enforces the above, but just for GSO over UDP tunnel packets. >> Side note: as a consequence, sice we will not have anymore DATA_VALID >> together with VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV{4,6}, the above will >> additional avoid the 'DATA_VALID with GSO over UDP tunnel means inner >> header is valid' strangeness. > > Well, then we will still have the issue with NEEDS_CSUM? For example, > as I ask in patch 2, if we want to offload outer checksum where do we > put the out csum_start/csum_offset? I think we don't have any issue there. For GSO over UDP tunnel packets, csum_start/csum_offset always refer to the inner checksum. For GSO without UDP tunnel, csum_start/csum_offset refers to the only checksum. The outer checksum is offloaded with checksum partial only for GSO over UDP tunnel packets, when the header carries and set the outer_th_offset field: """ If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, both the outer UDP checksum and the inner transport checksum have been validated, otherwise only one level of checksums (the inner one in case of tunnels) has been validated. """ Note that i.e. performing USO aggregation on the outer UDP for GSO over UDP tunnel packets, will cause data stream corruption, must be avoided regardless of virtio. > Just to make sure we are talking the same thing. Actually, I meant for > TX you propose the following text: > > """ > +The valid complete checksum of the outer UDP header of individual segments > +can be computed by the driver prior to segmentation only if the GSO packet > +size is a multiple of \field{gso_size}, because then all segments > +have the same size and thus all data included in the outer UDP > +checksum is the same for every segment. These pre-computed segment > +length and checksum fields are different from those of the GSO > +packet. > """ > > If I understand correctly, you are talking about the precomputed > checksum by the driver which should be TX. If no checksum offload, I > don't get how such pre-computing can help. The above refers to GSO over UDP tunnel, so by the spec change requires that there is csum offload for the inner transport. Precomputing the outer csum basically leads to checksum partial, and allows a device/driver pair not supporting outer header csum offload to handle GSO over UDP tunnel with outer csum enabled. >>>> The idea is that GSO over UDP tunnel specs are new, so we can be more >>>> strict and we try to mandate only 'sane' flag/features/layout combinations. >>> >>> But I don't see how mandating device to set csum stuffs can help. >>> >>> 1) There's no way for the driver to know if device sets csum stuffs or not >>> 2) Driver should not trust the device, malicious device can set >>> arbitrary value, so driver needs its own validation >> >> We are discussing only about csum validy, right? > > Kind of, actually the context is whether mandating the device to set > csum_start/offset for DATA_VALID can help. > > So I meant there's no difference in the following two cases: > > 1) Device doesn't set csum_start/offset, driver need to probe and > validate the packet > 2) Device set the csum_start/offset, driver still need to probe and > validate the packet The current implementation is not doing (rightful) any probing when csum_start/offset are available, just sanity checks, which are way simpler and safer. > And as replied before, there's no way for the driver to know if device > set csum_start/offset or not. I don't understand this statement. According the the spec, if the device has set NEEDS_CSUM it also must have set csum_start/offset. If a broken or malicious device set random/bad values, sanity checks should catch it. Note that the real problems are actually in the opposite direction: the driver sends bad/malicious data to the device and potentially crashing the host. Strict sanity checks should avoid that. >> I hope and believe that this part should have been clarified by a >> previously discussed point: no DATA_VALID for GSO over UDP tunnel, no >> more differentiate/confusing meaning for the DATA VALID bit. > > Rethink of the design, the root cause is that we're trying to re-use > existing fields for inner which works for outer before: > > For example. consider the case for non GSO tunneling, all > csum_start/offset/hdr_flags etc were for the outer, and we change the > semantic when GSO tunnel is supported. Such asymmetry might end up of > a lot of complexity in both the spec and the implementation. Have you looked at the actual PoC implementation? the proposed specification allows re-using the existing code to process the inner headers seamlessly: https://github.com/pabeni/linux-devel/blob/virtio_tunnel_gso_v9/include/linux/virtio_net.h#L302 https://github.com/pabeni/linux-devel/blob/virtio_tunnel_gso_v9/include/linux/virtio_net.h#L359 For non tunneling GSO, the fields refer to the _only_ transport header available. You can call it 'outer' but it's just an arbitrary choice. I must admit all this discussion is quite confusing to me, as keep touching consolidated and agreed stuff. My understanding was you were ok with V8: https://lore.kernel.org/virtio-comment/CA+FuTSf8274uQVEiJ-R5nnqAzDR9+N3WA0YoSbC-Q7MF5+O7Mw@mail.gmail.com/T/#mb455dd5f07ac98f1ccf79ba6edbd17a75f72df0b and the only significant delta between v8 and v9 is a more restrictive and defined usage of the DATA_VALID flag with GSO over UDP tunnel packets (as also reported in the changelog). I think we should restrict the discussion to that point. I agree the conflicting definition of DATA_VALID GSO over tunnel and GSO without tunnel could be confusing. AFAICS there are 2 options to make the DATA_VALID usage with GSO over UDP tunnel not contradictory with the non tunnel GSO: 1) forbid DATA_VALID with GSO over UDP tunnel or 2) forbid DATA_VALID with GSO over UDP tunnel without UDP tunnel checksum offload. That means that DATA_VALID for GSO over UDP tunnel packets could only be set when both the inner and outher header are offloaded. I hope opt 1) would be simpler to define in details, opt 2 would still allow for GSO over UDP tunnel HW GRO with DATA_VALID. Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-11 2:08 ` Jason Wang 2024-10-11 7:50 ` Paolo Abeni @ 2024-10-14 7:20 ` Paolo Abeni 2024-10-17 6:47 ` Jason Wang 1 sibling, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-14 7:20 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/11/24 04:08, Jason Wang wrote: > I wonder why not just introduce new fields for the inner, like what > has beed done in the linux sk_buff implementation? I know I propose > this before, but now I think it can help to reduce the complexity and > more aligned with the for non GSO tunneling. > > ... > union { > __be16 inner_protocol; > __u8 inner_ipproto; > }; > ... > __u16 inner_transport_header; > __u16 inner_network_header; > __u16 inner_mac_header; > > __be16 protocol; > __u16 transport_header; > __u16 network_header; > ... As the discussion touches many different points, please allow me wrap-up, in order of increasing conflictuality: 1) there are a few points clarification requests which are actually not needed, because the asked text is already there in the spec. 2) the DATA_VALID handling proposed in v9 is bad, let me scratch it for now. 3) you feel the handling of csum_start/csum_offset is confusing and would prefer adding new fields for the inner headers. Let's try to set this most critical point first. Adding new fields for the inner headers has several dis-advantages: - we will need to add more new fields WRT to the current proposal. Using csum_start/csum_offset for the outer we will need: inner_nh inner_th inner_offset instead of just outer_th, inner_nh - AFAICS, due to align reason, the above will require an additional <unused> field. - the implementation will not be able to reuse the existing code, leading to duplication and/or hacks. - will discard most of the discussions on this topics done in the past several revisions I understand that one of the argument in favor of using new fields for the inner headers is to align with the current linux kernel handling of tunnel related fields. Please note that the existence of skb->inner_* fields is quite blamed, I suggest avoiding taking them as a reference. AFAICS the main argument against the layout I proposed is the inconsistent usage of csum_start that will refer to the inner header for GSO over udp-tunnel packets and to the outer header in when the device does not expose the GSO over UDP tunnel support. I argue that dubbing such behavior as inconsistent is a matter of interpretation: in both case the csum_start field refers to the innermost header that the device is able to parse. Finally, the specification change to describe the new field usage I propose is quite minimal and almost don't touch the existing definition: it's difficult for alternative specification being simpler - especially when such alternative specification will need more new fields to be described. Cheers, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-14 7:20 ` Paolo Abeni @ 2024-10-17 6:47 ` Jason Wang 2024-10-17 15:34 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-17 6:47 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Mon, Oct 14, 2024 at 3:20 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/11/24 04:08, Jason Wang wrote: > > I wonder why not just introduce new fields for the inner, like what > > has beed done in the linux sk_buff implementation? I know I propose > > this before, but now I think it can help to reduce the complexity and > > more aligned with the for non GSO tunneling. > > > > ... > > union { > > __be16 inner_protocol; > > __u8 inner_ipproto; > > }; > > ... > > __u16 inner_transport_header; > > __u16 inner_network_header; > > __u16 inner_mac_header; > > > > __be16 protocol; > > __u16 transport_header; > > __u16 network_header; > > ... > > As the discussion touches many different points, please allow me > wrap-up, in order of increasing conflictuality: > > 1) there are a few points clarification requests which are actually not > needed, because the asked text is already there in the spec. > > 2) the DATA_VALID handling proposed in v9 is bad, let me scratch it for now. > > 3) you feel the handling of csum_start/csum_offset is confusing and > would prefer adding new fields for the inner headers. > > Let's try to set this most critical point first. Adding new fields for > the inner headers has several dis-advantages: > > - we will need to add more new fields WRT to the current proposal. Using > csum_start/csum_offset for the outer we will need: > > inner_nh > inner_th > inner_offset > > instead of just outer_th, inner_nh > > - AFAICS, due to align reason, the above will require an additional > <unused> field. > > - the implementation will not be able to reuse the existing code, > leading to duplication and/or hacks. This part I don't understand which part of the code could not be re-used? More below. > > - will discard most of the discussions on this topics done in the past > several revisions > > I understand that one of the argument in favor of using new fields for > the inner headers is to align with the current linux kernel handling of > tunnel related fields. Please note that the existence of skb->inner_* > fields is quite blamed, I suggest avoiding taking them as a reference. That's fine, but I would like to know why it is blamed or if such blaming is related. > > AFAICS the main argument against the layout I proposed is the > inconsistent usage of csum_start that will refer to the inner header for > GSO over udp-tunnel packets and to the outer header in when the device > does not expose the GSO over UDP tunnel support. > > I argue that dubbing such behavior as inconsistent is a matter of > interpretation: in both case the csum_start field refers to the > innermost header that the device is able to parse. I was convinced by this argument until I see this """ If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has validated the packet checksum. In case of multiple encapsulated protocols, one level of checksums has been validated. """ The "one level" seems to be ambiguous. But this reminds me that the innermost assumption seems to be impractical: The checksum of non GSO tunnels has been supported. In this case, looking at the current Linux implementation, if I don't misread the code, csum_start/offset were used for outer. What's more, the innermost assumption requires the device to phrase each level of the tunnel which is very hard (or even impossible). If we use csum_start/offset for inner for GSO tunnel we would end up with different semantics (GSO seems not a good reason for the differences of csum): For GSO UDP tunnel, csum_start/offset is for inner. For non GSO UDP tunnel, csum_start/offset is for outer. Such inconsistency a lot of extra statements to describe the difference semantics in the spec and if/else in the code (as demonstrated in your POC which uses a completely different helper to do the header conversion between vnet header and skb metadata). If we introduce new fields for inner, we can stick to the guidance and normatives for the new fields instead of dealing with the different semantics between GSO and non-GSO tunnels. 1) a patch to clarify the existing fields in vnet header were for outer in there's a tunnel 2) adding new fields and explain that they were used for inner, we don't need to worry any possible conflict when trying to reuse fields with outer (as demonstrated by DATA_VALID). > > Finally, the specification change to describe the new field usage I > propose is quite minimal and almost don't touch the existing definition: > it's difficult for alternative specification being simpler - especially > when such alternative specification will need more new fields to be > described. > > Cheers, > > Paolo Thanks > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-17 6:47 ` Jason Wang @ 2024-10-17 15:34 ` Paolo Abeni 2024-10-18 4:26 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-17 15:34 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/17/24 08:47, Jason Wang wrote: >> AFAICS the main argument against the layout I proposed is the >> inconsistent usage of csum_start that will refer to the inner header for >> GSO over udp-tunnel packets and to the outer header in when the device >> does not expose the GSO over UDP tunnel support. >> >> I argue that dubbing such behavior as inconsistent is a matter of >> interpretation: in both case the csum_start field refers to the >> innermost header that the device is able to parse. > > I was convinced by this argument until I see this > > """ > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > validated the packet checksum. In case of multiple encapsulated > protocols, one level of checksums has been validated. > """ > > The "one level" seems to be ambiguous. But this reminds me that the > innermost assumption seems to be impractical: > > The checksum of non GSO tunnels has been supported. I argue the above statement is again a matter of interpretation: the existing specs support checksum offload of UDP packets. That UDP packet may (or may not) carry a tunnel and some other headers, but such info are actually irrelevant to the device: lacking the GSO over UDP tunnel support, the device still parse the packet up to the (only known) transport (UDP) header and offload the related checksum. Also note that I already agreed the specification for DATA_VALID with GSO over UDP tunnel packets must be changed/corrected and I proposed 2 options to avoid confusion there: https://lore.kernel.org/virtio-comment/CACGkMEvmhdYMHf5HKX37XqAYO2uHPq1=eWXfBPoS2gp8FJezDQ@mail.gmail.com/T/#m520efe2519200e878ab103f32598e5163cd01ec4 """ 1) forbid DATA_VALID with GSO over UDP tunnel or 2) forbid DATA_VALID with GSO over UDP tunnel without UDP tunnel checksum offload. That means that DATA_VALID for GSO over UDP tunnel packets could only be set when both the inner and outher header are offloaded. """ [...] > Such inconsistency a lot of extra statements to describe the > difference semantics in the spec and if/else in the code (as > demonstrated in your POC which uses a completely different helper to > do the header conversion between vnet header and skb metadata). I think you have misread the PoC code. It actually re-uses the exiting helpers, verbatim, without the need to duplicate any existing checks. Sure, there is some additional code in the GSO over UDP tunnel helpers, as the latter must process and check additional fields. The other additional conditionals you see are due to both the spec and helper being very strict about the admitted packets - we want to avoid the loopholes leading to nasty bugs, as already mentioned. Such code checks and spec statements should (/must) be there with any layout we chose for the virtio_net_hdr to support GSO over UDP tunnel. Let me reiterate that, instead, using the csum_start/csum_offset fields to track the outer header information, the GSO over UDP tunnel could processing could not use virtio_net_hdr_to_skb()/virtio_net_hdr_from_skb() to process the existing fields. i.e. all the checks in https://elixir.bootlin.com/linux/v6.12-rc2/source/include/linux/virtio_net.h#L97 ..L112 will be meaningless in such scenario and the GSO over UDP tunnel helper code must replicate them with the correct offsets. IDK which extra statements in the spec you refers to, could you please be more specific? > If we introduce new fields for inner, we can stick to the guidance and > normatives for the new fields instead of dealing with the different > semantics between GSO and non-GSO tunnels. > > 1) a patch to clarify the existing fields in vnet header were for > outer in there's a tunnel Would such patch require the sort of extra statements you don't like?!? > 2) adding new fields and explain that they were used for inner, we > don't need to worry any possible conflict when trying to reuse fields > with outer (as demonstrated by DATA_VALID). I must admit the 6m (or 2y?) step backward in time does sound scaring to me. As mentioned the above will lead to a poorer implementation, are we ok with that? What about hdr_len field? for GSO over UDP packets should include the inner hdr len or not? If not, it would be useless, otherwise would be inconsistent. /P ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-17 15:34 ` Paolo Abeni @ 2024-10-18 4:26 ` Jason Wang 2024-10-18 10:10 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-18 4:26 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Thu, Oct 17, 2024 at 11:34 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/17/24 08:47, Jason Wang wrote: > >> AFAICS the main argument against the layout I proposed is the > >> inconsistent usage of csum_start that will refer to the inner header for > >> GSO over udp-tunnel packets and to the outer header in when the device > >> does not expose the GSO over UDP tunnel support. > >> > >> I argue that dubbing such behavior as inconsistent is a matter of > >> interpretation: in both case the csum_start field refers to the > >> innermost header that the device is able to parse. > > > > I was convinced by this argument until I see this > > > > """ > > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > > validated the packet checksum. In case of multiple encapsulated > > protocols, one level of checksums has been validated. > > """ > > > > The "one level" seems to be ambiguous. But this reminds me that the > > innermost assumption seems to be impractical: > > > > The checksum of non GSO tunnels has been supported. > > I argue the above statement is again a matter of interpretation: the > existing specs support checksum offload of UDP packets. That UDP packet > may (or may not) carry a tunnel and some other headers, but such info > are actually irrelevant to the device: lacking the GSO over UDP tunnel > support, the device still parse the packet up to the (only known) > transport (UDP) header and offload the related checksum. Yes, so for non GSO tunnels, all vnet header fields (including csum_start/offset) were for the outer. But this proposal tries to make csum_start/offset an inner for GSO UDP tunnels. That seems to introduce inconsistency with side effects. > > Also note that I already agreed the specification for DATA_VALID with > GSO over UDP tunnel packets must be changed/corrected and I proposed 2 > options to avoid confusion there: > > https://lore.kernel.org/virtio-comment/CACGkMEvmhdYMHf5HKX37XqAYO2uHPq1=eWXfBPoS2gp8FJezDQ@mail.gmail.com/T/#m520efe2519200e878ab103f32598e5163cd01ec4 > > """ > 1) forbid DATA_VALID with GSO over UDP tunnel > If it's not too hard, I would like to keep that. Here are the reasons 1) I notice that some Linux drivers uses DATA_VALID for GSO UDP tunnel 2) when we want to add it back some day in the future, it would cost another feature bit 3) It looks to me all the issues come from when we try to reuse the outer flags, we would not have issues if we introduce a new bits/fields for inner > or > > 2) forbid DATA_VALID with GSO over UDP tunnel without UDP tunnel > checksum offload. That means that DATA_VALID for GSO over UDP tunnel > packets could only be set when both the inner and outher header are > offloaded. This seems to be fine, but again we need to differ the outer/inner here. 1) reusing DATA_VALID/NEEDS_CSUM 2) introducing new fields and bit Reusing seems to be harder. And no matter what way we choose to go, we need clarify the following conditions in the normatives 1) if inner is DATA_VALID but outer is NEEDS_CSUM is allowed 2) if inner is NEEDS_CUSM but outer is DATA_VALID is allowed (It looks to me like it would be easier if we just forbid the above two). > """ > > [...] > > Such inconsistency a lot of extra statements to describe the > > difference semantics in the spec and if/else in the code (as > > demonstrated in your POC which uses a completely different helper to > > do the header conversion between vnet header and skb metadata). > > I think you have misread the PoC code. It actually re-uses the exiting > helpers, verbatim, without the need to duplicate any existing checks. There's no duplication but new semantic for csum_start/offset: static inline int virtio_net_hdr_tnl_to_skb(struct sk_buff *skb, ... ... if (!gso_tunnel_type) return virtio_net_hdr_to_skb(skb, hdr, little_endian); // [1] ... inner_th = __virtio16_to_cpu(little_endian, hdr->csum_start); ... /* Let the basic parsing deal with plain GSO features. */ ret = virtio_net_hdr_from_skb(skb, hdr, little_endian, false, // [2] ... skb->inner_transport_header = inner_th ... static inline int virtio_net_hdr_to_skb(struct sk_buff *skb, ... ... if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) { u32 start = __virtio16_to_cpu(little_endian, hdr->csum_start); ... if (!skb_partial_csum_set(skb, start, off)) ... } And having two calls to virtio_net_hdr_from_skb() is a hint that the logic needs to be extended in virtio_net_hdr_from_skb() instead. > > Sure, there is some additional code in the GSO over UDP tunnel helpers, > as the latter must process and check additional fields. The other > additional conditionals you see are due to both the spec and helper > being very strict about the admitted packets - we want to avoid the > loopholes leading to nasty bugs, as already mentioned. Such code checks > and spec statements should (/must) be there with any layout we chose for > the virtio_net_hdr to support GSO over UDP tunnel. > > Let me reiterate that, instead, using the csum_start/csum_offset fields > to track the outer header information, the GSO over UDP tunnel could > processing could not use > virtio_net_hdr_to_skb()/virtio_net_hdr_from_skb() to process the > existing fields. i.e. all the checks in > > https://elixir.bootlin.com/linux/v6.12-rc2/source/include/linux/virtio_net.h#L97 > ..L112 > > will be meaningless in such scenario and the GSO over UDP tunnel helper > code must replicate them with the correct offsets. I don't understand here. It works like a charm for non GSO UDP tunnel, anything that makes it can't work for GSO? > > IDK which extra statements in the spec you refers to, could you please > be more specific? I meant statements needs to be added to 1) say csum_start/offset is for inner but not outer, one example: """ + \item \field{outer_th_offset} field indicates the outer transport header within + the packet. This field differs from \field{csum_start} as the latter + points to the inner transport header within the packet. """ 2) clarifying about other possible conflicts about such kinds of reusing (gso_size, flags and others). Those seem to be missed in the current proposal. > > > If we introduce new fields for inner, we can stick to the guidance and > > normatives for the new fields instead of dealing with the different > > semantics between GSO and non-GSO tunnels. > > > > 1) a patch to clarify the existing fields in vnet header were for > > outer in there's a tunnel > > Would such patch require the sort of extra statements you don't like?!? No, I meant, we need to clarify "one level" in the spec to align with the existing implementation. > > > 2) adding new fields and explain that they were used for inner, we > > don't need to worry any possible conflict when trying to reuse fields > > with outer (as demonstrated by DATA_VALID). > > I must admit the 6m (or 2y?) step backward in time does sound scaring to me. Sometimes it might be longer than expected, but it's still much better than we found issues after it has been merged. Spec is kind of different from the code as fixing bugs might be harder for the vendor. Lots of vendors just start to implement their device once a feature is merged into the spec. > > As mentioned the above will lead to a poorer implementation, are we ok > with that? 1) sticking to existing csum_start/offset semantic, both the spec and implementation are straightforward: Convert between skb->csum_start/offset and vnet_hdr->csum_start/offset with necessary checks if (packet is GSO_UDP) Do conversion between skb->inner_XXX and vnet_hdr->inner_XXX with necessary checks 2) with the proposal here if (packet is GSO UDP) convert between skb->csum_start/offset/in_nh_offset and skb->inner_XXX with necessary checks convert between outer_th_offset and skb->csum_start/offset with necessary checks else convert between skb->csum_start/offset and vnet_hdr->csum_start/offset with necessary checks I think 1) are easier to implement (no mention of the issues of flags and other side effects caused by trying to reusing outer flags/fields for inner). I don't see how 1) is poorer. Or is it just because more vnet header fields were introduced? > > What about hdr_len field? for GSO over UDP packets should include the > inner hdr len or not? If not, it would be useless, otherwise would be > inconsistent. hdr_len is just a hint, various implementations just ignore that. Actually, it's another example that needs to be clarified in this proposal. Is that for outer and inner? My understanding is that new fields for new functions are easier than new semantics for existing fields. As the latter needs to deal with the possible conflicts which is not easy. By looking at your concern so far about adding new fields for inner, the only thing that matters seems to be the vnet header length. I think it's not a problem as adding a few new bytes won't impact the performance. Thanks > > /P > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-18 4:26 ` Jason Wang @ 2024-10-18 10:10 ` Paolo Abeni 2024-10-20 22:28 ` Willem de Bruijn 2024-10-21 6:54 ` Jason Wang 0 siblings, 2 replies; 40+ messages in thread From: Paolo Abeni @ 2024-10-18 10:10 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/18/24 06:26, Jason Wang wrote: > On Thu, Oct 17, 2024 at 11:34 PM Paolo Abeni <pabeni@redhat.com> wrote: >> i.e. all the checks in >> >> https://elixir.bootlin.com/linux/v6.12-rc2/source/include/linux/virtio_net.h#L97 >> ..L112 >> >> will be meaningless in such scenario and the GSO over UDP tunnel helper >> code must replicate them with the correct offsets. > > I don't understand here. It works like a charm for non GSO UDP tunnel, > anything that makes it can't work for GSO? I we chose to use csum_start for the outer header, for GSO over UDP tunnel packets: 1) the 'all headers in linear part' check: https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L111 and all the need preeceeding algebra will be useless, as we will have later to do the same for inner header. 2) the 'thlen' as computed by the previous switch statement: https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L62 will be incorrect if we use csum_start to point the outer header, as VIRTIO_NET_HDR_GSO_{TCPV{4,6},UDP} refers to the inner header. 3) the csum_start/csum_offset assignment performed by: https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L104 will yeld the wrong values - as they should point to the inner header for GSO over UDP tunnel packets. 4) the gso_size validation checks done under the if statement: https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L156 will be incorrect - we need to account for the inner Basically all actions and checks performed by virtio_net_hdr_to_skb() will be wrong or irrelevant. Instead, by using the virtio_net_hdr csum_start for the inner header, all the above mentioned checks are be correct and meaningful. I wrote a longer reply, addressing every later point, but I fear it would distract from the main point using csum_offset for the outer is a poor and inconsistent choice. Cheers, Paolo p.s. I think you got scared by the many checks that the tnl helpers implementation carry: such helpers are much more strict than the non tnl ones to avoid injecting into the stack packets with bad layout. A functional PoC could be implemented without most of them, but I think we actually want those additional checks no matter what layout we use for the virtio_net_hdr. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-18 10:10 ` Paolo Abeni @ 2024-10-20 22:28 ` Willem de Bruijn 2024-10-21 15:47 ` Paolo Abeni 2024-10-21 6:54 ` Jason Wang 1 sibling, 1 reply; 40+ messages in thread From: Willem de Bruijn @ 2024-10-20 22:28 UTC (permalink / raw) To: Paolo Abeni Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar This thread has gotten very long. I'm not sure where exactly to sprinkle all responses, so just starting my attempt at a summary afresh. One continuing point of confusion is ingress vs egress. * Checksumming. Perhaps we can start with a new precursor patch that clarifies the intended existing behavior? - NEEDS_CSUM and DATA_VALID are mutually exclusive bits - DATA_VALID reads/writes CHECKSUM_UNNECESSARY - NEEDS_CSUM reads/writes CHECKSUM_PARTIAL, and implies that virtio_net_hdr csum_start/_off are set This does not explicitly address CHECKSUM_NONE and CHECKSUM_COMPLETE. And the meaning of the flags are different on ingress vs egress. All of these points can probably be clarified. Also CHECKSUM_UNNECESSARY has the csum_level field to encode how many levels of checksums it covers. This only becomes relevant with tunnels. Also, if the current Linux virtio_net_hdr_to_skb is more permissive than the spec, let's definitely update the code. * Tunnels. Technically there are no existing non-GSO tunnels in the virtio spec. What exists are packets with a single checksum offload. These may happen to be UDP packets with tunnels and inner packets, whose checksum virtio indeed has no way to vouch for. But no tunnel specific information is read/written in virtio_net_hdr_from_skb/virtio_net_hdr_to_skb. Such as the skb inner_.. fields, or csum_level. When introducing tunneling in the spec, and when only one checksum can be hardware offloaded, the most valuable one to offload is the innermost. Inner TCP packets must always be checksummed. UDP or other outer checksum fields can be left zero. Or computed cheaply in software, derived from the innermost checksum (local checksum offload). Though that is only relevant to patch 2. Relevant here is that in the current proposal if NEEDS_CHECKSUM is set, this is a checksum offload request of the inner TCP packet, and the outer UDP header has checksum zero. The inverse, for TSO, would not be very useful. An alternative is to support multiple checksum offload requests at once, at a minimum two. But most hardware only supports one csum_start/_csum_off. And due to LCO, the complexity of supporting multiple levels is just not needed. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-20 22:28 ` Willem de Bruijn @ 2024-10-21 15:47 ` Paolo Abeni 2024-10-22 7:54 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-21 15:47 UTC (permalink / raw) To: Willem de Bruijn Cc: Jason Wang, virtio-comment@lists.linux.dev, maxime.coquelin@redhat.com, Eelco Chaudron, Stefano Garzarella On 10/21/24 00:28, Willem de Bruijn wrote: > This thread has gotten very long. I'm not sure where exactly to > sprinkle all responses, so just starting my attempt at a summary > afresh. > > One continuing point of confusion is ingress vs egress. > > * Checksumming. > > Perhaps we can start with a new precursor patch that clarifies the > intended existing behavior? > > - NEEDS_CSUM and DATA_VALID are mutually exclusive bits > - DATA_VALID reads/writes CHECKSUM_UNNECESSARY > - NEEDS_CSUM reads/writes CHECKSUM_PARTIAL, and implies that > virtio_net_hdr csum_start/_off are set > > This does not explicitly address CHECKSUM_NONE and CHECKSUM_COMPLETE. > And the meaning of the flags are different on ingress vs egress. All > of these points can probably be clarified. > > Also CHECKSUM_UNNECESSARY has the csum_level field to encode how many > levels of checksums it covers. This only becomes relevant with > tunnels. > > Also, if the current Linux virtio_net_hdr_to_skb is more permissive > than the spec, let's definitely update the code. Can we? Wouldn't that possibly cause some interoperability problems with other implementations (H/W or S/W) that are currently leveraging such more permissive behavior? (Note: I don't have any specific info WRT that point). > * Tunnels. > > Technically there are no existing non-GSO tunnels in the virtio spec. > > What exists are packets with a single checksum offload. These may > happen to be UDP packets with tunnels and inner packets, whose > checksum virtio indeed has no way to vouch for. But no tunnel specific > information is read/written in > virtio_net_hdr_from_skb/virtio_net_hdr_to_skb. Such as the skb > inner_.. fields, or csum_level. > > When introducing tunneling in the spec, and when only one checksum can > be hardware offloaded, the most valuable one to offload is the > innermost. Inner TCP packets must always be checksummed. UDP or other > outer checksum fields can be left zero. > > Or computed cheaply in software, derived from the innermost checksum > (local checksum offload). Though that is only relevant to patch 2. > Relevant here is that in the current proposal if NEEDS_CHECKSUM is > set, this is a checksum offload request of the inner TCP packet, and > the outer UDP header has checksum zero. The inverse, for TSO, would > not be very useful. I read the above as we are basically on the same page. > An alternative is to support multiple checksum offload requests at > once, at a minimum two. But most hardware only supports one > csum_start/_csum_off. And due to LCO, the complexity of supporting > multiple levels is just not needed. patch 2 is about supporting offload of 2 checksums. In the v9 incarnation allows either CSUM_PARTIAL or CSUM_COMPLETE for both the offloaded csum. FTR I'm aware of at least a (old, low-end) H/W supporting offload of 2 separate, concurrent running csums. In any case my understanding is that the most controversial point in this thread is the previous one: which header should csum_start point to for GSO over UDP tunnel, as Jason prefer to have csum_start point to the outer header. Let's try to solve such point first. Cheers, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-21 15:47 ` Paolo Abeni @ 2024-10-22 7:54 ` Jason Wang 2024-10-23 20:57 ` Willem de Bruijn 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-22 7:54 UTC (permalink / raw) To: Paolo Abeni Cc: Willem de Bruijn, virtio-comment@lists.linux.dev, maxime.coquelin@redhat.com, Eelco Chaudron, Stefano Garzarella On Mon, Oct 21, 2024 at 11:47 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/21/24 00:28, Willem de Bruijn wrote: > > This thread has gotten very long. I'm not sure where exactly to > > sprinkle all responses, so just starting my attempt at a summary > > afresh. > > > > One continuing point of confusion is ingress vs egress. > > > > * Checksumming. > > > > Perhaps we can start with a new precursor patch that clarifies the > > intended existing behavior? > > > > - NEEDS_CSUM and DATA_VALID are mutually exclusive bits > > - DATA_VALID reads/writes CHECKSUM_UNNECESSARY Spec has this: """ If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has validated the packet checksum. In case of multiple encapsulated protocols, one level of checksums has been validated. """ Is this not sufficient? > > - NEEDS_CSUM reads/writes CHECKSUM_PARTIAL, and implies that > > virtio_net_hdr csum_start/_off are set Spec has this: For TX: """ If the driver negotiated VIRTIO_NET_F_CSUM, it can skip checksumming the packet: flags has the VIRTIO_NET_HDR_F_NEEDS_CSUM set, csum_start is set to the offset within the packet to begin checksumming, and csum_offset indicates how many bytes after the csum_start the new (16 bit ones’ complement) checksum is placed by the device. The TCP checksum field in the packet is set to the sum of the TCP pseudo header, so that replacing it by the ones’ complement checksum of the TCP header and body will give the correct result. """ For RX: """ If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so, the packet checksum at offset csum_offset from csum_start and any preceding checksums have been validated. The checksum on the packet is incomplete and if bit VIRTIO_NET_HDR_F_RSC_INFO is not set in flags, then csum_start and csum_offset indicate how to calculate it (see Packet Transmission point 1). """ > > > > This does not explicitly address CHECKSUM_NONE And this part """ The driver can send a completely checksummed packet. In this case, flags will be zero, and gso_type will be VIRTIO_NET_HDR_GSO_NONE. """ > > and CHECKSUM_COMPLETE. This requires extending of the descriptor or vnet header to report checksum. > > And the meaning of the flags are different on ingress vs egress. All > > of these points can probably be clarified. > > > > Also CHECKSUM_UNNECESSARY has the csum_level field to encode how many > > levels of checksums it covers. This only becomes relevant with > > tunnels. It depends on the semantics of the inner fields, if they are for innermost, we probably need to have some field. If they are just for level 1 tunnel, csum_level would be 1 at most. > > > > Also, if the current Linux virtio_net_hdr_to_skb is more permissive > > than the spec, let's definitely update the code. > > Can we? Wouldn't that possibly cause some interoperability problems with > other implementations (H/W or S/W) that are currently leveraging such > more permissive behavior? (Note: I don't have any specific info WRT that > point). This needs to be careful as it may break the migration compatibility. For example the recent increase of UDP_MAX_SEGMENTS may break migration compatibility silently. (Src has 128 but dst has 64). > > > * Tunnels. > > > > Technically there are no existing non-GSO tunnels in the virtio spec. > > > > What exists are packets with a single checksum offload. These may > > happen to be UDP packets with tunnels and inner packets, whose > > checksum virtio indeed has no way to vouch for. But no tunnel specific > > information is read/written in > > virtio_net_hdr_from_skb/virtio_net_hdr_to_skb. Such as the skb > > inner_.. fields, or csum_level. > > > > When introducing tunneling in the spec, and when only one checksum can > > be hardware offloaded, the most valuable one to offload is the > > innermost. So should we first introduce this and then GSO tunnel on top? As mentioned before, non GSO tunnel tries to offload the outer if I read the spec and Linux code correctly. > > Inner TCP packets must always be checksummed. UDP or other > > outer checksum fields can be left zero. > > > > Or computed cheaply in software, derived from the innermost checksum > > (local checksum offload). Though that is only relevant to patch 2. > > Relevant here is that in the current proposal if NEEDS_CHECKSUM is > > set, this is a checksum offload request of the inner TCP packet, and > > the outer UDP header has checksum zero. The inverse, for TSO, would > > not be very useful. > > I read the above as we are basically on the same page. > > > An alternative is to support multiple checksum offload requests at > > once, at a minimum two. But most hardware only supports one > > csum_start/_csum_off. And due to LCO, the complexity of supporting > > multiple levels is just not needed. > > patch 2 is about supporting offload of 2 checksums. In the v9 > incarnation allows either CSUM_PARTIAL or CSUM_COMPLETE for both the > offloaded csum. FTR I'm aware of at least a (old, low-end) H/W > supporting offload of 2 separate, concurrent running csums. [1] > > In any case my understanding is that the most controversial point in > this thread is the previous one: which header should csum_start point to > for GSO over UDP tunnel, as Jason prefer to have csum_start point to the > outer header. Let's try to solve such point first. I think I would not stick to that. Just wonder if it would be more convenient since it gives us better consistency and is more hardware friendly (seems not a point any more as we have LCO and you mentioned [1] that it's an old low-end one) if it supports more than one. > > Cheers, > > Paolo > Thanks ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-22 7:54 ` Jason Wang @ 2024-10-23 20:57 ` Willem de Bruijn 2024-10-25 8:41 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Willem de Bruijn @ 2024-10-23 20:57 UTC (permalink / raw) To: Jason Wang Cc: Paolo Abeni, virtio-comment@lists.linux.dev, maxime.coquelin@redhat.com, Eelco Chaudron, Stefano Garzarella On Tue, Oct 22, 2024 at 3:54 AM Jason Wang <jasowang@redhat.com> wrote: > > On Mon, Oct 21, 2024 at 11:47 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > On 10/21/24 00:28, Willem de Bruijn wrote: > > > This thread has gotten very long. I'm not sure where exactly to > > > sprinkle all responses, so just starting my attempt at a summary > > > afresh. > > > > > > One continuing point of confusion is ingress vs egress. > > > > > > * Checksumming. > > > > > > Perhaps we can start with a new precursor patch that clarifies the > > > intended existing behavior? > > > > > > - NEEDS_CSUM and DATA_VALID are mutually exclusive bits > > > - DATA_VALID reads/writes CHECKSUM_UNNECESSARY > > Spec has this: > > """ > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > validated the packet checksum. In case of multiple encapsulated > protocols, one level of checksums has been validated. > """ > > Is this not sufficient? It does not rule out setting DATA_VALID and NEEDS_CSUM at the same time. I thought Paolo was suggesting using this, and you responded that they should be mutually exclusive. > > > - NEEDS_CSUM reads/writes CHECKSUM_PARTIAL, and implies that > > > virtio_net_hdr csum_start/_off are set > > Spec has this: > > For TX: > > """ > If the driver negotiated VIRTIO_NET_F_CSUM, it can skip checksumming the packet: > > flags has the VIRTIO_NET_HDR_F_NEEDS_CSUM set, > csum_start is set to the offset within the packet to begin checksumming, and > csum_offset indicates how many bytes after the csum_start the new (16 > bit ones’ complement) checksum is placed by the device. > The TCP checksum field in the packet is set to the sum of the TCP > pseudo header, so that replacing it by the ones’ complement checksum > of the TCP header and body will give the correct result. > """ > > For RX: > > """ > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so, the packet > checksum at offset csum_offset from csum_start and any preceding > checksums have been validated. The checksum on the packet is > incomplete and if bit VIRTIO_NET_HDR_F_RSC_INFO is not set in flags, > then csum_start and csum_offset indicate how to calculate it (see > Packet Transmission point 1). > """ Interesting. So on Rx NEEDS_CSUM also implies that the packet is validated, just like DATA_VALID. And this implicitly has a csum_level: "and any preceding". > > > > > > This does not explicitly address CHECKSUM_NONE > > And this part > > """ > The driver can send a completely checksummed packet. In this case, > flags will be zero, and gso_type will be VIRTIO_NET_HDR_GSO_NONE. > """ I would expect DATA_VALID for this. Yes, segmentation offloading depends on checksum offload, so that must always be zero. > > > and CHECKSUM_COMPLETE. > > This requires extending of the descriptor or vnet header to report checksum. Probably treated as DATA_VALID, after verifying the checksum field against the computed sum. > > > And the meaning of the flags are different on ingress vs egress. All > > > of these points can probably be clarified. > > > > > > Also CHECKSUM_UNNECESSARY has the csum_level field to encode how many > > > levels of checksums it covers. This only becomes relevant with > > > tunnels. > > It depends on the semantics of the inner fields, if they are for > innermost, we probably need to have some field. If they are just for > level 1 tunnel, csum_level would be 1 at most. > > > > > > > Also, if the current Linux virtio_net_hdr_to_skb is more permissive > > > than the spec, let's definitely update the code. > > > > Can we? Wouldn't that possibly cause some interoperability problems with > > other implementations (H/W or S/W) that are currently leveraging such > > more permissive behavior? (Note: I don't have any specific info WRT that > > point). > > This needs to be careful as it may break the migration compatibility. > > For example the recent increase of UDP_MAX_SEGMENTS may break > migration compatibility silently. (Src has 128 but dst has 64). Agreed with both of you. We cannot safely restrict after the initial release. Which is a good reminder to make this new extension as unambiguous as we can from the start, and limited to exposing exactly the feature that we intend only. > > > > > * Tunnels. > > > > > > Technically there are no existing non-GSO tunnels in the virtio spec. > > > > > > What exists are packets with a single checksum offload. These may > > > happen to be UDP packets with tunnels and inner packets, whose > > > checksum virtio indeed has no way to vouch for. But no tunnel specific > > > information is read/written in > > > virtio_net_hdr_from_skb/virtio_net_hdr_to_skb. Such as the skb > > > inner_.. fields, or csum_level. > > > > > > When introducing tunneling in the spec, and when only one checksum can > > > be hardware offloaded, the most valuable one to offload is the > > > innermost. > > So should we first introduce this and then GSO tunnel on top? As > mentioned before, non GSO tunnel tries to offload the outer if I read > the spec and Linux code correctly. It might make sense to clarify the non-GSO tunneling case on virtio as a first patch in this series. If we can get to a clear description. But I'm concerned that it would request a lot of extra work of Paolo that he did not sign up for. I don't know how tunnels are used with virtio_net_hdr today. As far as I can tell they are basically just UDP packets from the virtio point of view. > > > Inner TCP packets must always be checksummed. UDP or other > > > outer checksum fields can be left zero. > > > > > > Or computed cheaply in software, derived from the innermost checksum > > > (local checksum offload). Though that is only relevant to patch 2. > > > Relevant here is that in the current proposal if NEEDS_CHECKSUM is > > > set, this is a checksum offload request of the inner TCP packet, and > > > the outer UDP header has checksum zero. The inverse, for TSO, would > > > not be very useful. > > > > I read the above as we are basically on the same page. > > > > > An alternative is to support multiple checksum offload requests at > > > once, at a minimum two. But most hardware only supports one > > > csum_start/_csum_off. And due to LCO, the complexity of supporting > > > multiple levels is just not needed. > > > > patch 2 is about supporting offload of 2 checksums. In the v9 > > incarnation allows either CSUM_PARTIAL or CSUM_COMPLETE for both the > > offloaded csum. FTR I'm aware of at least a (old, low-end) H/W > > supporting offload of 2 separate, concurrent running csums. > > [1] > > > > > In any case my understanding is that the most controversial point in > > this thread is the previous one: which header should csum_start point to > > for GSO over UDP tunnel, as Jason prefer to have csum_start point to the > > outer header. Let's try to solve such point first. > > I think I would not stick to that. Just wonder if it would be more > convenient since it gives us better consistency and is more hardware > friendly (seems not a point any more as we have LCO and you mentioned > [1] that it's an old low-end one) if it supports more than one. If we can rely on LCO everywhere, I would also not try hard to make the one piece of hardware work that has another, now superfluous, mechanism. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-23 20:57 ` Willem de Bruijn @ 2024-10-25 8:41 ` Jason Wang 0 siblings, 0 replies; 40+ messages in thread From: Jason Wang @ 2024-10-25 8:41 UTC (permalink / raw) To: Willem de Bruijn Cc: Paolo Abeni, virtio-comment@lists.linux.dev, maxime.coquelin@redhat.com, Eelco Chaudron, Stefano Garzarella On Thu, Oct 24, 2024 at 4:58 AM Willem de Bruijn <willemb@google.com> wrote: > > On Tue, Oct 22, 2024 at 3:54 AM Jason Wang <jasowang@redhat.com> wrote: > > > > On Mon, Oct 21, 2024 at 11:47 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > > > On 10/21/24 00:28, Willem de Bruijn wrote: > > > > This thread has gotten very long. I'm not sure where exactly to > > > > sprinkle all responses, so just starting my attempt at a summary > > > > afresh. > > > > > > > > One continuing point of confusion is ingress vs egress. > > > > > > > > * Checksumming. > > > > > > > > Perhaps we can start with a new precursor patch that clarifies the > > > > intended existing behavior? > > > > > > > > - NEEDS_CSUM and DATA_VALID are mutually exclusive bits > > > > - DATA_VALID reads/writes CHECKSUM_UNNECESSARY > > > > Spec has this: > > > > """ > > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > > validated the packet checksum. In case of multiple encapsulated > > protocols, one level of checksums has been validated. > > """ > > > > Is this not sufficient? > > It does not rule out setting DATA_VALID and NEEDS_CSUM at the same time. > > I thought Paolo was suggesting using this, and you responded that they should > be mutually exclusive. Exactly, so it's for "NEEDS_CSUM and DATA_VALID are mutually exclusive bits" I meant I didn't get this: " DATA_VALID reads/writes CHECKSUM_UNNECESSARY" as spec already describes that DATA_VALID is the semantic of CHECKSUM_UNNECESSARY. (See my quoted statement above). > > > > > - NEEDS_CSUM reads/writes CHECKSUM_PARTIAL, and implies that > > > > virtio_net_hdr csum_start/_off are set > > > > Spec has this: > > > > For TX: > > > > """ > > If the driver negotiated VIRTIO_NET_F_CSUM, it can skip checksumming the packet: > > > > flags has the VIRTIO_NET_HDR_F_NEEDS_CSUM set, > > csum_start is set to the offset within the packet to begin checksumming, and > > csum_offset indicates how many bytes after the csum_start the new (16 > > bit ones’ complement) checksum is placed by the device. > > The TCP checksum field in the packet is set to the sum of the TCP > > pseudo header, so that replacing it by the ones’ complement checksum > > of the TCP header and body will give the correct result. > > """ > > > > For RX: > > > > """ > > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > > VIRTIO_NET_HDR_F_NEEDS_CSUM bit in flags can be set: if so, the packet > > checksum at offset csum_offset from csum_start and any preceding > > checksums have been validated. The checksum on the packet is > > incomplete and if bit VIRTIO_NET_HDR_F_RSC_INFO is not set in flags, > > then csum_start and csum_offset indicate how to calculate it (see > > Packet Transmission point 1). > > """ > > Interesting. So on Rx NEEDS_CSUM also implies that the packet > is validated, just like DATA_VALID. Yes, the only difference is that the checksum is incomplete. > > And this implicitly has a csum_level: "and any preceding". Kind of but it requires the driver to figure out the csum level, I wonder if this makes sense for any implementation. > > > > > > > > > This does not explicitly address CHECKSUM_NONE > > > > And this part > > > > """ > > The driver can send a completely checksummed packet. In this case, > > flags will be zero, and gso_type will be VIRTIO_NET_HDR_GSO_NONE. > > """ > > I would expect DATA_VALID for this. DATA_VALID is only used in RX and here zero flags means no offload, if I was not wrong. > > Yes, segmentation offloading depends on checksum offload, so that must > always be zero. > > > > > and CHECKSUM_COMPLETE. > > > > This requires extending of the descriptor or vnet header to report checksum. > > Probably treated as DATA_VALID, after verifying the checksum field against > the computed sum. > > > > > And the meaning of the flags are different on ingress vs egress. All > > > > of these points can probably be clarified. > > > > > > > > Also CHECKSUM_UNNECESSARY has the csum_level field to encode how many > > > > levels of checksums it covers. This only becomes relevant with > > > > tunnels. > > > > It depends on the semantics of the inner fields, if they are for > > innermost, we probably need to have some field. If they are just for > > level 1 tunnel, csum_level would be 1 at most. > > > > > > > > > > Also, if the current Linux virtio_net_hdr_to_skb is more permissive > > > > than the spec, let's definitely update the code. > > > > > > Can we? Wouldn't that possibly cause some interoperability problems with > > > other implementations (H/W or S/W) that are currently leveraging such > > > more permissive behavior? (Note: I don't have any specific info WRT that > > > point). > > > > This needs to be careful as it may break the migration compatibility. > > > > For example the recent increase of UDP_MAX_SEGMENTS may break > > migration compatibility silently. (Src has 128 but dst has 64). > > Agreed with both of you. We cannot safely restrict after the initial release. > > Which is a good reminder to make this new extension as unambiguous > as we can from the start, and limited to exposing exactly the feature that > we intend only. Right. > > > > > > > > * Tunnels. > > > > > > > > Technically there are no existing non-GSO tunnels in the virtio spec. > > > > > > > > What exists are packets with a single checksum offload. These may > > > > happen to be UDP packets with tunnels and inner packets, whose > > > > checksum virtio indeed has no way to vouch for. But no tunnel specific > > > > information is read/written in > > > > virtio_net_hdr_from_skb/virtio_net_hdr_to_skb. Such as the skb > > > > inner_.. fields, or csum_level. > > > > > > > > When introducing tunneling in the spec, and when only one checksum can > > > > be hardware offloaded, the most valuable one to offload is the > > > > innermost. > > > > So should we first introduce this and then GSO tunnel on top? As > > mentioned before, non GSO tunnel tries to offload the outer if I read > > the spec and Linux code correctly. > > It might make sense to clarify the non-GSO tunneling case on virtio > as a first patch in this series. > > If we can get to a clear description. But I'm concerned that it would > request a lot of extra work of Paolo that he did not sign up for. Maybe, let's see. > > I don't know how tunnels are used with virtio_net_hdr today. As far as > I can tell they are basically just UDP packets from the virtio point of view. Paolo said csum_start/offset has already been used for the inner header, and I've asked him to confirm in another thread. > > > > > Inner TCP packets must always be checksummed. UDP or other > > > > outer checksum fields can be left zero. > > > > > > > > Or computed cheaply in software, derived from the innermost checksum > > > > (local checksum offload). Though that is only relevant to patch 2. > > > > Relevant here is that in the current proposal if NEEDS_CHECKSUM is > > > > set, this is a checksum offload request of the inner TCP packet, and > > > > the outer UDP header has checksum zero. The inverse, for TSO, would > > > > not be very useful. > > > > > > I read the above as we are basically on the same page. > > > > > > > An alternative is to support multiple checksum offload requests at > > > > once, at a minimum two. But most hardware only supports one > > > > csum_start/_csum_off. And due to LCO, the complexity of supporting > > > > multiple levels is just not needed. > > > > > > patch 2 is about supporting offload of 2 checksums. In the v9 > > > incarnation allows either CSUM_PARTIAL or CSUM_COMPLETE for both the > > > offloaded csum. FTR I'm aware of at least a (old, low-end) H/W > > > supporting offload of 2 separate, concurrent running csums. > > > > [1] > > > > > > > > In any case my understanding is that the most controversial point in > > > this thread is the previous one: which header should csum_start point to > > > for GSO over UDP tunnel, as Jason prefer to have csum_start point to the > > > outer header. Let's try to solve such point first. > > > > I think I would not stick to that. Just wonder if it would be more > > convenient since it gives us better consistency and is more hardware > > friendly (seems not a point any more as we have LCO and you mentioned > > [1] that it's an old low-end one) if it supports more than one. > > If we can rely on LCO everywhere, I would also not try hard to make the one > piece of hardware work that has another, now superfluous, mechanism. > Ok. Thanks ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-18 10:10 ` Paolo Abeni 2024-10-20 22:28 ` Willem de Bruijn @ 2024-10-21 6:54 ` Jason Wang 2024-10-21 16:27 ` Paolo Abeni 1 sibling, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-21 6:54 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Fri, Oct 18, 2024 at 6:10 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/18/24 06:26, Jason Wang wrote: > > On Thu, Oct 17, 2024 at 11:34 PM Paolo Abeni <pabeni@redhat.com> wrote: > >> i.e. all the checks in > >> > >> https://elixir.bootlin.com/linux/v6.12-rc2/source/include/linux/virtio_net.h#L97 > >> ..L112 > >> > >> will be meaningless in such scenario and the GSO over UDP tunnel helper > >> code must replicate them with the correct offsets. > > > > I don't understand here. It works like a charm for non GSO UDP tunnel, > > anything that makes it can't work for GSO? > > I we chose to use csum_start for the outer header, for GSO over UDP > tunnel packets: > > 1) the 'all headers in linear part' check: > > https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L111 > > and all the need preeceeding algebra will be useless, This part I don't understand. We still need a validation for the outer header and we've already done this for non GSO tunnels already. > as we will have > later to do the same for inner header. Did you mean we only need inner validation? E.g in your POC virtio_net_hdr_tnl_to_skb(): ... /* Let the basic parsing deal with plain GSO features. */ hdr->gso_type &= ~gso_tunnel_type; ret = virtio_net_hdr_to_skb(skb, hdr, little_endian); It seems we lose some validation for the outer header. > > 2) the 'thlen' as computed by the previous switch statement: > > https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L62 > > will be incorrect if we use csum_start to point the outer header, as > VIRTIO_NET_HDR_GSO_{TCPV{4,6},UDP} refers to the inner header. Just to make sure we are on the same page here. I meant having new dedicated fields/flags for inner. In this case, all the existing checks were still done for outer check in e.g virtio_net_hdr_to_skb(). And we can do tunnel specific check inside it: virtio_net_hdr_to_skb() { ...... if (skb is UDP tunnel GSO) { do_udp_tunnel_gso_check() // using separated flags/fields for inner. } } > > 3) the csum_start/csum_offset assignment performed by: > > https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L104 > > will yeld the wrong values - as they should point to the inner header > for GSO over UDP tunnel packets. See above, it just works as a non GSO UDP tunnel. > > 4) the gso_size validation checks done under the if statement: > > https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L156 > > will be incorrect - we need to account for the inner Still, it's the check for outer, inner can have a dedicated check if necessary. Or if we only need validation for the inner, it needs to be explained. > > Basically all actions and checks performed by virtio_net_hdr_to_skb() > will be wrong or irrelevant. See above, if we stick to using existing fields/flags, those checks would just work as is for the outer. > > Instead, by using the virtio_net_hdr csum_start for the inner header, > all the above mentioned checks are be correct and meaningful. > > I wrote a longer reply, addressing every later point, but I fear it > would distract from the main point using csum_offset for the outer is a > poor and inconsistent choice. Right, let's figure out this part first. > > Cheers, > > Paolo > > p.s. I think you got scared by the many checks that the tnl helpers > implementation carry: such helpers are much more strict than the non tnl > ones to avoid injecting into the stack packets with bad layout. A > functional PoC could be implemented without most of them, but I think we > actually want those additional checks no matter what layout we use for > the virtio_net_hdr. I'm totally fine with adding those checks. The main issue left is if it's better to reuse existing fields for inner or not. Thanks > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-21 6:54 ` Jason Wang @ 2024-10-21 16:27 ` Paolo Abeni 2024-10-22 7:42 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-21 16:27 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/21/24 08:54, Jason Wang wrote: > On Fri, Oct 18, 2024 at 6:10 PM Paolo Abeni <pabeni@redhat.com> wrote: >> On 10/18/24 06:26, Jason Wang wrote: >>> On Thu, Oct 17, 2024 at 11:34 PM Paolo Abeni <pabeni@redhat.com> wrote: >>>> i.e. all the checks in >>>> >>>> https://elixir.bootlin.com/linux/v6.12-rc2/source/include/linux/virtio_net.h#L97 >>>> ..L112 >>>> >>>> will be meaningless in such scenario and the GSO over UDP tunnel helper >>>> code must replicate them with the correct offsets. >>> >>> I don't understand here. It works like a charm for non GSO UDP tunnel, >>> anything that makes it can't work for GSO? >> >> I we chose to use csum_start for the outer header, for GSO over UDP >> tunnel packets: >> >> 1) the 'all headers in linear part' check: >> >> https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L111 >> >> and all the need preeceeding algebra will be useless, > > This part I don't understand. We still need a validation for the outer > header and we've already done this for non GSO tunnels already. For GSO over UDP tunnel packet, what we need to ensure is that all the headers, comprising the inner ones, are available. Checking that the outer header are there does not avoid the need for checking that the inner headers are present. Checking the inners only is instead sufficient: if the inner headers are available any preceeding data will also be available, including the outer headers. >> 3) the csum_start/csum_offset assignment performed by: >> >> https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L104 >> >> will yeld the wrong values - as they should point to the inner header >> for GSO over UDP tunnel packets. > > See above, it just works as a non GSO UDP tunnel. Note that all this topic is about GSO over UDP tunnel packets, and which layout is more consistent for them. For GSO over UDP tunnel packets, if we set virtio_net_header csum_start to the outer, we can't reuse a single line of the existing code, as the tests will be unnecessary or will give the wrong results. i.e. - https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L171 would flag as invalid good packets, as it would over-estimante the packet payload size - https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L179 would drop every TCP over UDP tunnel packet, as csum_offset will be 6 even when the inner header is TCP. Instead if for GSO over UDP tunnel we set csum_start to the inner header, all the existing test will yield valid result - mostly because the Linux kernel, for such packets, expect csum_start pointing the inner header. And that is IMHO a clear sign that for such packet csum_offset should point to the inner header even for virtio_net_hdr. Cheers, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-21 16:27 ` Paolo Abeni @ 2024-10-22 7:42 ` Jason Wang 2024-10-22 16:56 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-22 7:42 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Tue, Oct 22, 2024 at 12:27 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/21/24 08:54, Jason Wang wrote: > > On Fri, Oct 18, 2024 at 6:10 PM Paolo Abeni <pabeni@redhat.com> wrote: > >> On 10/18/24 06:26, Jason Wang wrote: > >>> On Thu, Oct 17, 2024 at 11:34 PM Paolo Abeni <pabeni@redhat.com> wrote: > >>>> i.e. all the checks in > >>>> > >>>> https://elixir.bootlin.com/linux/v6.12-rc2/source/include/linux/virtio_net.h#L97 > >>>> ..L112 > >>>> > >>>> will be meaningless in such scenario and the GSO over UDP tunnel helper > >>>> code must replicate them with the correct offsets. > >>> > >>> I don't understand here. It works like a charm for non GSO UDP tunnel, > >>> anything that makes it can't work for GSO? > >> > >> I we chose to use csum_start for the outer header, for GSO over UDP > >> tunnel packets: > >> > >> 1) the 'all headers in linear part' check: > >> > >> https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L111 > >> > >> and all the need preeceeding algebra will be useless, > > > > This part I don't understand. We still need a validation for the outer > > header and we've already done this for non GSO tunnels already. > > For GSO over UDP tunnel packet, what we need to ensure is that all the > headers, comprising the inner ones, are available. > > Checking that the outer header are there does not avoid the need for > checking that the inner headers are present. Right. > > Checking the inners only is instead sufficient: if the inner headers are > available any preceeding data will also be available, including the > outer headers. Somehow but should we validate that the outer_th_offset belongs to the linear part? > > >> 3) the csum_start/csum_offset assignment performed by: > >> > >> https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L104 > >> > >> will yeld the wrong values - as they should point to the inner header > >> for GSO over UDP tunnel packets. > > > > See above, it just works as a non GSO UDP tunnel. > > Note that all this topic is about GSO over UDP tunnel packets, and which > layout is more consistent for them. > > For GSO over UDP tunnel packets, if we set virtio_net_header csum_start > to the outer, we can't reuse a single line of the existing code, as the > tests will be unnecessary or will give the wrong results. > > i.e. > > - > https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L171 > > would flag as invalid good packets, as it would over-estimante the > packet payload size > > - > https://elixir.bootlin.com/linux/v6.12-rc3/source/include/linux/virtio_net.h#L179 > > would drop every TCP over UDP tunnel packet, as csum_offset will be 6 > even when the inner header is TCP. We won't hit this if gso_type is for the outer. Just to clarify, I meant if we stick the existing vnet header fields only for outer, there would be any issue. > > Instead if for GSO over UDP tunnel we set csum_start to the inner > header, all the existing test will yield valid result - mostly because > the Linux kernel, for such packets, expect csum_start pointing the inner > header. Ok, that makes sense. So my point is considering we are designing virtio instead of trying to make existing GSO capable devices to support tunnel GSO (or can we do that for virtio?). If some new fields can make things easier, we probably need to go that way. Let me summarize my understanding for your proposal so far: 1) csum_start/csum_offset were for the inner and so the outer checksum can be done through LCO or offload via patch 2 2) if VIRTIO_NET_HDR_GSO_UDP_TUNNEL_{IPV4|IPV6} is set the rest bits of the gso_type were for inner 3) hdr_len is for inner (this needs to be clarified in the spec), so outer hdr_len could be deduced from inner It should work but for 1) I worry about the possible inconsistency: 1) It might have some conflict with " If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has validated the packet checksum. In case of multiple encapsulated protocols, one level of checksums has been validated. " 2) csum_start/csum_offset were for outer for non GSO tunnel packets. hdr_len, gso_type are fine. I think those can be addressed for sure with more statements in the spec. Just not sure if it's worth it. I'm fine if you stick to that or we can hear from others. Down this road, I have other questions: 1) We need to clarify if there's an N level nesting tunnel, is csum_start/offset/hdr_len for the innermost or not. Linux seems to use the innermost semantic. If yes, for DATA_VALID, as Willem mentioned, we need a way to report csum_level. 2) In your POC[1], hdr->csum_start/offset seems still point to the outer header https://github.com/pabeni/linux-devel/blob/virtio_tunnel_gso/include/linux/virtio_net.h#L229 Thanks > > And that is IMHO a clear sign that for such packet csum_offset should > point to the inner header even for virtio_net_hdr. > > Cheers, > > Paolo > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-22 7:42 ` Jason Wang @ 2024-10-22 16:56 ` Paolo Abeni 2024-10-25 8:28 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-22 16:56 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/22/24 09:42, Jason Wang wrote: > So my point is considering we are designing virtio instead of trying > to make existing GSO capable devices to support tunnel GSO (or can we > do that for virtio?). If some new fields can make things easier, we > probably need to go that way. > > Let me summarize my understanding for your proposal so far: > > 1) csum_start/csum_offset were for the inner and so the outer checksum > can be done through LCO or offload via patch 2 > 2) if VIRTIO_NET_HDR_GSO_UDP_TUNNEL_{IPV4|IPV6} is set the rest bits > of the gso_type were for inner > 3) hdr_len is for inner (this needs to be clarified in the spec), so > outer hdr_len could be deduced from inner > > It should work but for 1) I worry about the possible inconsistency: > > 1) It might have some conflict with > > " > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > validated the packet checksum. In case of multiple encapsulated > protocols, one level of checksums has been validated. > " I proposed the following: For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID is allowed if and only even VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set in flags. That means that the device supports csum offload for both the inner and the outer and both csums are valid. The corresponding skb will have CHECKSUM_UNNECESSARY and csum_level == 1. For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID with a single csum offloaded, VIRTIO_NET_HDR_F_DATA_VALID can't be used. Note that we may want to avoid completely VIRTIO_NET_HDR_F_DATA_VALID for GSO over UDP tunnel packets, see below[1]. Also that option would avoid all the conflicts. > 2) csum_start/csum_offset were for outer for non GSO tunnel packets. > hdr_len, gso_type are fine. > > I think those can be addressed for sure with more statements in the > spec. Just not sure if it's worth it. I'm fine if you stick to that or > we can hear from others. I'm unsure I read this point correctly. A rephrase could help. > Down this road, I have other questions: > > 1) We need to clarify if there's an N level nesting tunnel, is > csum_start/offset/hdr_len for the innermost or not. Linux seems to use > the innermost semantic. If yes, for DATA_VALID, as Willem mentioned, > we need a way to report csum_level. If there are multiple nested tunnels, there will be no GSO over UDP tunnels packets. The existing specs will apply, only one csum will be offloaded. > 2) In your POC[1], hdr->csum_start/offset seems still point to the outer header That would be a bug... > https://github.com/pabeni/linux-devel/blob/virtio_tunnel_gso/include/linux/virtio_net.h#L229 ... but the above code shows that hdr->csum_start points to the inner, because skb->csum_start/skb->csum_offset refers to the inner header in Linux. There is actually another point which I'm quite scared to mention because it caused v9 and implicitly all this discussion. I want to prevent the driver receiving from the device GSO over UDP tunnel packets without csum_start/csum_offset, because the header probing code looks fragile and bug-prone and will be even more complex in case of tunnels. [1] AFAICS the easiest way to prevent such thing is explicitly requiring no DATA_VALID for GSO over UDP tunnel packets. Cheers, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-22 16:56 ` Paolo Abeni @ 2024-10-25 8:28 ` Jason Wang 2024-10-25 11:50 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-25 8:28 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Wed, Oct 23, 2024 at 12:56 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/22/24 09:42, Jason Wang wrote: > > So my point is considering we are designing virtio instead of trying > > to make existing GSO capable devices to support tunnel GSO (or can we > > do that for virtio?). If some new fields can make things easier, we > > probably need to go that way. > > > > Let me summarize my understanding for your proposal so far: > > > > 1) csum_start/csum_offset were for the inner and so the outer checksum > > can be done through LCO or offload via patch 2 > > 2) if VIRTIO_NET_HDR_GSO_UDP_TUNNEL_{IPV4|IPV6} is set the rest bits > > of the gso_type were for inner > > 3) hdr_len is for inner (this needs to be clarified in the spec), so > > outer hdr_len could be deduced from inner > > > > It should work but for 1) I worry about the possible inconsistency: > > > > 1) It might have some conflict with > > > > " > > If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > > VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has > > validated the packet checksum. In case of multiple encapsulated > > protocols, one level of checksums has been validated. > > " > > I proposed the following: > > For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID is allowed if and > only even VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set in flags. > > That means that the device supports csum offload for both the inner and > the outer and both csums are valid. The corresponding skb will have > CHECKSUM_UNNECESSARY and csum_level == 1. Nit: csum_level seems to be Linux specific. > > For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID with a single csum > offloaded, VIRTIO_NET_HDR_F_DATA_VALID can't be used. > > Note that we may want to avoid completely VIRTIO_NET_HDR_F_DATA_VALID > for GSO over UDP tunnel packets, see below[1]. Also that option would > avoid all the conflicts. > > > 2) csum_start/csum_offset were for outer for non GSO tunnel packets. > > hdr_len, gso_type are fine. > > > > I think those can be addressed for sure with more statements in the > > spec. Just not sure if it's worth it. I'm fine if you stick to that or > > we can hear from others. > > I'm unsure I read this point correctly. A rephrase could help. I meant 2.1) in the spec, we need to describe for example, if hdr_len is for inner(most) or outer(most) 2.2) based on the discussion so far, I'm convinced the using cusm_start/offset looks more convenient > > > Down this road, I have other questions: > > > > 1) We need to clarify if there's an N level nesting tunnel, is > > csum_start/offset/hdr_len for the innermost or not. Linux seems to use > > the innermost semantic. If yes, for DATA_VALID, as Willem mentioned, > > we need a way to report csum_level. > > If there are multiple nested tunnels, there will be no GSO over UDP > tunnels packets. This needs to be described in the spec. > The existing specs will apply, only one csum will be > offloaded. This sounds like conflict with what you proposed above """ That means that the device supports csum offload for both the inner and the outer and both csums are valid. The corresponding skb will have CHECKSUM_UNNECESSARY and csum_level == 1. """ This sounds like two csums were offloaded. > > > 2) In your POC[1], hdr->csum_start/offset seems still point to the outer header > > That would be a bug... > > > https://github.com/pabeni/linux-devel/blob/virtio_tunnel_gso/include/linux/virtio_net.h#L229 > > ... but the above code shows that hdr->csum_start points to the inner, > because skb->csum_start/skb->csum_offset refers to the inner header in > Linux. Ah, I may have mis-read the code. So you meant Linux already did LCO by pointing csum_start/offset for the inner header. This kind of answers my question (and Willem's in another thread) about how non GSO UDP tunnel works in Linux now. And my concern of the inconsistency would not make sense any more if my above understanding is correct. > > There is actually another point which I'm quite scared to mention > because it caused v9 and implicitly all this discussion. > > I want to prevent the driver receiving from the device GSO over UDP > tunnel packets without csum_start/csum_offset, because the header > probing code looks fragile and bug-prone and will be even more complex > in case of tunnels. This needs more thought. For example, it seems not specific to GSO. Should we have a new feature for this? (And build a GSO UDP tunnel on top?) > > [1] AFAICS the easiest way to prevent such thing is explicitly requiring > no DATA_VALID for GSO over UDP tunnel packets. > > Cheers, > > Paolo Thanks > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-25 8:28 ` Jason Wang @ 2024-10-25 11:50 ` Paolo Abeni 2024-10-25 13:28 ` Willem de Bruijn 2024-10-28 3:27 ` Jason Wang 0 siblings, 2 replies; 40+ messages in thread From: Paolo Abeni @ 2024-10-25 11:50 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/25/24 10:28, Jason Wang wrote: > On Wed, Oct 23, 2024 at 12:56 AM Paolo Abeni <pabeni@redhat.com> wrote: >> I proposed the following: >> >> For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID is allowed if and >> only even VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set in flags. >> >> That means that the device supports csum offload for both the inner and >> the outer and both csums are valid. The corresponding skb will have >> CHECKSUM_UNNECESSARY and csum_level == 1. > > Nit: csum_level seems to be Linux specific. Note: the above was an informal description of the proposal, not the actual wording to be used in the spec. >> There is actually another point which I'm quite scared to mention >> because it caused v9 and implicitly all this discussion. >> >> I want to prevent the driver receiving from the device GSO over UDP >> tunnel packets without csum_start/csum_offset, because the header >> probing code looks fragile and bug-prone and will be even more complex >> in case of tunnels. > > This needs more thought. For example, it seems not specific to GSO. > Should we have a new feature for this? (And build a GSO UDP tunnel on > top?) Double checking I read the above correctly. Do you mean something alike the following to negotiated another feature controlling a new bit in flag telling 'the csum_start field is set even if NEEDS_CSUM is not set'? What about using the flag already defined in patch 2 instead (currently named VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, possibly a different better name can be used [suggestion more than welcome])? The overall schema will be: - DATA_VALID retains its current semantic when GSO over UDP features are not negotiated and when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set - when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set that means the following: - 2 checksums are offloaded - the 2 outer-most one in case the packet carries more nested headers. and - 'csum_start' points to the innermost offloaded header - DATA_VALID can be set together with VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, meaning: - CHECKSUM_UNNECESSARY with csum_level == 1, and - csum_start -> inner transport offset. If the packet carries many nested headers csum_start points to the 2nd outermost one. - VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} can be set/used if and only if: - NEEDS_CSUM is set (csum_start/csum_offset avail) or - DATA_VALID and VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM are both set (meaning 2 csums are offloaded and csum_start is avail) Note that even the following will be also allowed: VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} is set, NEEDS_CSUM is set, VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set. meaning it's GSO over UDP tunnel packets, with both checksum offloaded and CSUM_PARTIAL. WDYT? Thanks, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-25 11:50 ` Paolo Abeni @ 2024-10-25 13:28 ` Willem de Bruijn 2024-10-25 14:35 ` Paolo Abeni 2024-10-28 3:27 ` Jason Wang 1 sibling, 1 reply; 40+ messages in thread From: Willem de Bruijn @ 2024-10-25 13:28 UTC (permalink / raw) To: Paolo Abeni Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar On Fri, Oct 25, 2024 at 7:50 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/25/24 10:28, Jason Wang wrote: > > On Wed, Oct 23, 2024 at 12:56 AM Paolo Abeni <pabeni@redhat.com> wrote: > >> I proposed the following: > >> > >> For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID is allowed if and > >> only even VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set in flags. > >> > >> That means that the device supports csum offload for both the inner and > >> the outer and both csums are valid. The corresponding skb will have > >> CHECKSUM_UNNECESSARY and csum_level == 1. > > > > Nit: csum_level seems to be Linux specific. > > Note: the above was an informal description of the proposal, not the > actual wording to be used in the spec. > > >> There is actually another point which I'm quite scared to mention > >> because it caused v9 and implicitly all this discussion. > >> > >> I want to prevent the driver receiving from the device GSO over UDP > >> tunnel packets without csum_start/csum_offset, because the header > >> probing code looks fragile and bug-prone and will be even more complex > >> in case of tunnels. > > > > This needs more thought. For example, it seems not specific to GSO. > > Should we have a new feature for this? (And build a GSO UDP tunnel on > > top?) > > Double checking I read the above correctly. > > Do you mean something alike the following to negotiated another feature > controlling a new bit in flag telling 'the csum_start field is set even > if NEEDS_CSUM is not set'? > > What about using the flag already defined in patch 2 instead (currently > named VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, possibly a different better name > can be used [suggestion more than welcome])? > > The overall schema will be: > > - DATA_VALID retains its current semantic when GSO over UDP features are > not negotiated and when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set > > - when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set that means the following: > - 2 checksums are offloaded - the 2 outer-most one in case the packet > carries more nested headers. Is this on transmit, receive or both. Can we be very explicit in all cases? > and > - 'csum_start' points to the innermost offloaded header > > - DATA_VALID can be set together with VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, > meaning: > - CHECKSUM_UNNECESSARY with csum_level == 1, > and > - csum_start -> inner transport offset. If the packet carries many > nested headers csum_start points to the 2nd outermost one. Given LCO, is there ever a compelling reason to support offload of more than the inner checksum? And iff there is, can csum_start imply the innermost offloaded header, such that all checksums preceding it have been verified, rather than hardcoding this as exactly 2? > > - VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} can be set/used if and only if: > - NEEDS_CSUM is set (csum_start/csum_offset avail) > or > - DATA_VALID and VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM are both set > (meaning 2 csums are offloaded and csum_start is avail) > > Note that even the following will be also allowed: > > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} is set, NEEDS_CSUM is set, > VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set. > > meaning it's GSO over UDP tunnel packets, with both checksum offloaded > and CSUM_PARTIAL. > > WDYT? > > Thanks, > > Paolo > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-25 13:28 ` Willem de Bruijn @ 2024-10-25 14:35 ` Paolo Abeni 2024-10-25 15:47 ` Willem de Bruijn 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-25 14:35 UTC (permalink / raw) To: Willem de Bruijn Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar On 10/25/24 15:28, Willem de Bruijn wrote: > On Fri, Oct 25, 2024 at 7:50 AM Paolo Abeni <pabeni@redhat.com> wrote: >> >> On 10/25/24 10:28, Jason Wang wrote: >>> On Wed, Oct 23, 2024 at 12:56 AM Paolo Abeni <pabeni@redhat.com> wrote: >>>> I proposed the following: >>>> >>>> For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID is allowed if and >>>> only even VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set in flags. >>>> >>>> That means that the device supports csum offload for both the inner and >>>> the outer and both csums are valid. The corresponding skb will have >>>> CHECKSUM_UNNECESSARY and csum_level == 1. >>> >>> Nit: csum_level seems to be Linux specific. >> >> Note: the above was an informal description of the proposal, not the >> actual wording to be used in the spec. >> >>>> There is actually another point which I'm quite scared to mention >>>> because it caused v9 and implicitly all this discussion. >>>> >>>> I want to prevent the driver receiving from the device GSO over UDP >>>> tunnel packets without csum_start/csum_offset, because the header >>>> probing code looks fragile and bug-prone and will be even more complex >>>> in case of tunnels. >>> >>> This needs more thought. For example, it seems not specific to GSO. >>> Should we have a new feature for this? (And build a GSO UDP tunnel on >>> top?) >> >> Double checking I read the above correctly. >> >> Do you mean something alike the following to negotiated another feature >> controlling a new bit in flag telling 'the csum_start field is set even >> if NEEDS_CSUM is not set'? >> >> What about using the flag already defined in patch 2 instead (currently >> named VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, possibly a different better name >> can be used [suggestion more than welcome])? >> >> The overall schema will be: >> >> - DATA_VALID retains its current semantic when GSO over UDP features are >> not negotiated and when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set >> >> - when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set that means the following: >> - 2 checksums are offloaded - the 2 outer-most one in case the packet >> carries more nested headers. > > Is this on transmit, receive or both. Can we be very explicit in all cases? This is for both device->driver and driver->device directions. >> and >> - 'csum_start' points to the innermost offloaded header >> >> - DATA_VALID can be set together with VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, >> meaning: >> - CHECKSUM_UNNECESSARY with csum_level == 1, >> and >> - csum_start -> inner transport offset. If the packet carries many >> nested headers csum_start points to the 2nd outermost one. > > Given LCO, is there ever a compelling reason to support offload of more than the > inner checksum? The end goal here is to support VM doing/receiving tunneled traffic. With the above proposal device H/W GRO is straight-forward - likely existing nics could support hit with some formware update. Even with devices not supporting H/W GRO, it's beneficial. Currently GRO for UDP tunneled CHECKSUM_UNNECESSARY traffic requires csum_level == 1 (or zero outer csum). This allows the VM to perform GRO for UDP tunneled traffic out of the virtio_net device. I guess the relevant GRO code could be refactored to lift the mentioned requirement and instead leveraging LCO, but we will need to look at the inner header before tacking the decision of attempting the aggregation. Without any code written yet (here), such refactor looks weird/the current implementation feels preferable and simpler. > And iff there is, can csum_start imply the innermost offloaded header, such > that all checksums preceding it have been verified, rather than hardcoding > this as exactly 2? Uhm... I guess it could be doable with an additional field (carrying the csum_valid value), but why should be useful? multiple nested tunnels can't be GRO aggregated (s/w), not H/W GRO aggregated. Note that the thing discussed here: https://netdev.bots.linux.dev/netconf/2024/paolo.pdf in all its variant, basically tricks multiple nested tunnel in a single one. Given that reaching agreement has proven to be complex, I would avoid adding additional complexity without a very strong reason. Thanks, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-25 14:35 ` Paolo Abeni @ 2024-10-25 15:47 ` Willem de Bruijn 0 siblings, 0 replies; 40+ messages in thread From: Willem de Bruijn @ 2024-10-25 15:47 UTC (permalink / raw) To: Paolo Abeni Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar On Fri, Oct 25, 2024 at 10:36 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/25/24 15:28, Willem de Bruijn wrote: > > On Fri, Oct 25, 2024 at 7:50 AM Paolo Abeni <pabeni@redhat.com> wrote: > >> > >> On 10/25/24 10:28, Jason Wang wrote: > >>> On Wed, Oct 23, 2024 at 12:56 AM Paolo Abeni <pabeni@redhat.com> wrote: > >>>> I proposed the following: > >>>> > >>>> For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID is allowed if and > >>>> only even VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set in flags. > >>>> > >>>> That means that the device supports csum offload for both the inner and > >>>> the outer and both csums are valid. The corresponding skb will have > >>>> CHECKSUM_UNNECESSARY and csum_level == 1. > >>> > >>> Nit: csum_level seems to be Linux specific. > >> > >> Note: the above was an informal description of the proposal, not the > >> actual wording to be used in the spec. > >> > >>>> There is actually another point which I'm quite scared to mention > >>>> because it caused v9 and implicitly all this discussion. > >>>> > >>>> I want to prevent the driver receiving from the device GSO over UDP > >>>> tunnel packets without csum_start/csum_offset, because the header > >>>> probing code looks fragile and bug-prone and will be even more complex > >>>> in case of tunnels. > >>> > >>> This needs more thought. For example, it seems not specific to GSO. > >>> Should we have a new feature for this? (And build a GSO UDP tunnel on > >>> top?) > >> > >> Double checking I read the above correctly. > >> > >> Do you mean something alike the following to negotiated another feature > >> controlling a new bit in flag telling 'the csum_start field is set even > >> if NEEDS_CSUM is not set'? > >> > >> What about using the flag already defined in patch 2 instead (currently > >> named VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, possibly a different better name > >> can be used [suggestion more than welcome])? > >> > >> The overall schema will be: > >> > >> - DATA_VALID retains its current semantic when GSO over UDP features are > >> not negotiated and when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set > >> > >> - when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set that means the following: > >> - 2 checksums are offloaded - the 2 outer-most one in case the packet > >> carries more nested headers. > > > > Is this on transmit, receive or both. Can we be very explicit in all cases? > > This is for both device->driver and driver->device directions. > > >> and > >> - 'csum_start' points to the innermost offloaded header > >> > >> - DATA_VALID can be set together with VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, > >> meaning: > >> - CHECKSUM_UNNECESSARY with csum_level == 1, > >> and > >> - csum_start -> inner transport offset. If the packet carries many > >> nested headers csum_start points to the 2nd outermost one. > > > > Given LCO, is there ever a compelling reason to support offload of more than the > > inner checksum? > > The end goal here is to support VM doing/receiving tunneled traffic. > > With the above proposal device H/W GRO is straight-forward - likely > existing nics could support hit with some formware update. > > Even with devices not supporting H/W GRO, it's beneficial. Currently GRO > for UDP tunneled CHECKSUM_UNNECESSARY traffic requires csum_level == 1 > (or zero outer csum). This allows the VM to perform GRO for UDP tunneled > traffic out of the virtio_net device. I see, so this is derived from a constraint in the current Linux GRO code. > I guess the relevant GRO code could be refactored to lift the mentioned > requirement and instead leveraging LCO, but we will need to look at the > inner header before tacking the decision of attempting the aggregation. > Without any code written yet (here), such refactor looks weird/the > current implementation feels preferable and simpler. > > > And iff there is, can csum_start imply the innermost offloaded header, such > > that all checksums preceding it have been verified, rather than hardcoding > > this as exactly 2? > > Uhm... I guess it could be doable with an additional field (carrying the > csum_valid value), but why should be useful? multiple nested tunnels > can't be GRO aggregated (s/w), not H/W GRO aggregated. In principle, or are not aggregated today? I think the latter. No need for an explicit csum_valid field if it is implicitly implied by where csum_start points: all preceding checksum fields are verified. I don't think this conflicts with your "only 1 level" requirement, is just a generalization of it. > Note that the thing discussed here: > > https://netdev.bots.linux.dev/netconf/2024/paolo.pdf > > in all its variant, basically tricks multiple nested tunnel in a single one. > > Given that reaching agreement has proven to be complex, I would avoid > adding additional complexity without a very strong reason. I agree. This topic is complex enough already. And I think that there are far fewer uses for multiple tunnels than just for a single tunnel. So let's get that merged. But let's try to write the proposal so that it can possibly be extended in the future. I suppose your proposal is. It might be useful to make explicit why the "only 1 level" and that this is a pragmatic choice that may be revisited in the future. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-25 11:50 ` Paolo Abeni 2024-10-25 13:28 ` Willem de Bruijn @ 2024-10-28 3:27 ` Jason Wang 2024-10-28 12:08 ` Paolo Abeni 1 sibling, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-28 3:27 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Fri, Oct 25, 2024 at 7:51 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/25/24 10:28, Jason Wang wrote: > > On Wed, Oct 23, 2024 at 12:56 AM Paolo Abeni <pabeni@redhat.com> wrote: > >> I proposed the following: > >> > >> For GSO over UDP tunnel VIRTIO_NET_HDR_F_DATA_VALID is allowed if and > >> only even VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set in flags. > >> > >> That means that the device supports csum offload for both the inner and > >> the outer and both csums are valid. The corresponding skb will have > >> CHECKSUM_UNNECESSARY and csum_level == 1. > > > > Nit: csum_level seems to be Linux specific. > > Note: the above was an informal description of the proposal, not the > actual wording to be used in the spec. I see. > > >> There is actually another point which I'm quite scared to mention > >> because it caused v9 and implicitly all this discussion. > >> > >> I want to prevent the driver receiving from the device GSO over UDP > >> tunnel packets without csum_start/csum_offset, because the header > >> probing code looks fragile and bug-prone and will be even more complex > >> in case of tunnels. > > > > This needs more thought. For example, it seems not specific to GSO. > > Should we have a new feature for this? (And build a GSO UDP tunnel on > > top?) > > Double checking I read the above correctly. > > Do you mean something alike the following to negotiated another feature > controlling a new bit in flag telling 'the csum_start field is set even > if NEEDS_CSUM is not set'? Just wondering if it's worth it. > > What about using the flag already defined in patch 2 instead (currently > named VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, possibly a different better name > can be used [suggestion more than welcome])? If we want to make a csum_start/offset set for DATA_VALID, I think we need a dedicated feature flag along with the new vnet header flag. > > The overall schema will be: > > - DATA_VALID retains its current semantic when GSO over UDP features are > not negotiated and when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set Just to make sure we are aligned, it looks like we go with this way. The new flag only makes sense for GSO UDP tunnel, then it seems unnecessary for a dedicated feature. I think we need to know if it makes sense to mandate csum_start/offset for non GSO UDP tunnels. Going back to what you points out in the above (I think we had some discussion but without conclusion): " I want to prevent the driver receiving from the device GSO over UDP tunnel packets without csum_start/csum_offset, because the header probing code looks fragile and bug-prone and will be even more complex in case of tunnels. " I want to know which case for such a requirement can help. Or you mean, e.g for RX side: 1) csum_start/offset is not set: driver need to probe the header which 2) csum_start/offset is set: driver need to do something simpler than 1) If 2) is true, it seems to relax the check somehow anyway which seems not optimal. And requiring csum_start/offset doesn't help for fragility of the driver codes. > > - when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set that means the following: > - 2 checksums are offloaded - the 2 outer-most one in case the packet > carries more nested headers. > and > - 'csum_start' points to the innermost offloaded header I'm not sure I will get here, so if we had 3 checksums of 2 levels of tunnel, the above seems to be a conflict: 1) outermost and level 1 tunnel is offload 2) csum start points to level 2 tunnel > > - DATA_VALID can be set together with VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM, > meaning: > - CHECKSUM_UNNECESSARY with csum_level == 1, > and > - csum_start -> inner transport offset. If the packet carries many > nested headers csum_start points to the 2nd outermost one. So it depends on the driver to check the rest of the inner level. It looks anyhow inconsistent with the above. And if we do LCO, at least TX sider csum_start needs to point to the innermost one, this looks like another inconsistency. > > - VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} can be set/used if and only if: > - NEEDS_CSUM is set (csum_start/csum_offset avail) > or As explained before, I think it's better to stick to the semantics that map to CHECKSUM_PARTIAL which seems mutually exclusive with CHECK_UNNECESSARY. > - DATA_VALID and VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM are both set > (meaning 2 csums are offloaded and csum_start is avail) > > Note that even the following will be also allowed: > > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} is set, NEEDS_CSUM is set, > VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set. > > meaning it's GSO over UDP tunnel packets, with both checksum offloaded > and CSUM_PARTIAL. > > WDYT? As the topic becomes more complicated, I wonder if we can do something easier, that is focusing on the one checksum offload first. And I would like to check the current Linux behaviour first for non-GSO UDP tunnel: 1) For TX, as you've mentioned, the csum_start/offset has already pointed to the inner header (but is this the innermost?) 2) For RX, what header would csum_start/offset points (considering we have nested tunnels) I'm asking to make sure our proposal doesn't conflict with the existing implementation. Thanks > > Thanks, > > Paolo > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-28 3:27 ` Jason Wang @ 2024-10-28 12:08 ` Paolo Abeni 2024-10-28 12:26 ` Willem de Bruijn 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-28 12:08 UTC (permalink / raw) To: Jason Wang, Willem de Bruijn Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar On 10/28/24 04:27, Jason Wang wrote: > As the topic becomes more complicated, I wonder if we can do something > easier, that is focusing on the one checksum offload first. And I > would like to check the current Linux behaviour first for non-GSO UDP > tunnel: > > 1) For TX, as you've mentioned, the csum_start/offset has already > pointed to the inner header (but is this the innermost?) > 2) For RX, what header would csum_start/offset points (considering we > have nested tunnels) > > I'm asking to make sure our proposal doesn't conflict with the > existing implementation. In Linux, csum_start/offset, both for rx and tx, points to the innermost _offloaded_ header. On RX, since GSO partial does not apply, such innermost offloaded header correspond the outermost/only header (when no tunnels are configured) or the 2nd outermost, when tunnels are configured and GRO-ed. On TX, via GSO partial and local csum offload, the csum_start can, in theory point to an header with an header with an arbitrary high level, but currently no driver/tx path allow for GSO partial with more than 1 encapsulation level (may change in the future). The proposed specification change in v9 for NEED_CSUM matches the above. Since reaching an agreement WRT DATA_VALID handling looks problematic, would you consider a virtual meeting to discuss the topic? Thanks, Paolo p.s. to be clear, with 'you' I include both Jason and Willem. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-28 12:08 ` Paolo Abeni @ 2024-10-28 12:26 ` Willem de Bruijn 2024-10-28 14:23 ` Paolo Abeni 2024-10-29 7:32 ` Jason Wang 0 siblings, 2 replies; 40+ messages in thread From: Willem de Bruijn @ 2024-10-28 12:26 UTC (permalink / raw) To: Paolo Abeni Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar On Mon, Oct 28, 2024 at 8:08 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/28/24 04:27, Jason Wang wrote: > > As the topic becomes more complicated, I wonder if we can do something > > easier, that is focusing on the one checksum offload first. And I > > would like to check the current Linux behaviour first for non-GSO UDP > > tunnel: > > > > 1) For TX, as you've mentioned, the csum_start/offset has already > > pointed to the inner header (but is this the innermost?) > > 2) For RX, what header would csum_start/offset points (considering we > > have nested tunnels) > > > > I'm asking to make sure our proposal doesn't conflict with the > > existing implementation. > > In Linux, csum_start/offset, both for rx and tx, points to the innermost > _offloaded_ header. > > On RX, since GSO partial does not apply, such innermost offloaded header > correspond the outermost/only header (when no tunnels are configured) or > the 2nd outermost, when tunnels are configured and GRO-ed. > > On TX, via GSO partial and local csum offload, the csum_start can, in > theory point to an header with an header with an arbitrary high level, > but currently no driver/tx path allow for GSO partial with more than 1 > encapsulation level (may change in the future). > > The proposed specification change in v9 for NEED_CSUM matches the above. > > Since reaching an agreement WRT DATA_VALID handling looks problematic, > would you consider a virtual meeting to discuss the topic? > > Thanks, > > Paolo > > p.s. to be clear, with 'you' I include both Jason and Willem. Works for me. Might indeed be faster to get to a resolution than an email thread at this point. ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-28 12:26 ` Willem de Bruijn @ 2024-10-28 14:23 ` Paolo Abeni 2024-10-29 7:32 ` Jason Wang 1 sibling, 0 replies; 40+ messages in thread From: Paolo Abeni @ 2024-10-28 14:23 UTC (permalink / raw) To: Willem de Bruijn Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar On 10/28/24 13:26, Willem de Bruijn wrote: > On Mon, Oct 28, 2024 at 8:08 AM Paolo Abeni <pabeni@redhat.com> wrote: >> On 10/28/24 04:27, Jason Wang wrote: >>> As the topic becomes more complicated, I wonder if we can do something >>> easier, that is focusing on the one checksum offload first. And I >>> would like to check the current Linux behaviour first for non-GSO UDP >>> tunnel: >>> >>> 1) For TX, as you've mentioned, the csum_start/offset has already >>> pointed to the inner header (but is this the innermost?) >>> 2) For RX, what header would csum_start/offset points (considering we >>> have nested tunnels) >>> >>> I'm asking to make sure our proposal doesn't conflict with the >>> existing implementation. >> >> In Linux, csum_start/offset, both for rx and tx, points to the innermost >> _offloaded_ header. >> >> On RX, since GSO partial does not apply, such innermost offloaded header >> correspond the outermost/only header (when no tunnels are configured) or >> the 2nd outermost, when tunnels are configured and GRO-ed. >> >> On TX, via GSO partial and local csum offload, the csum_start can, in >> theory point to an header with an header with an arbitrary high level, >> but currently no driver/tx path allow for GSO partial with more than 1 >> encapsulation level (may change in the future). >> >> The proposed specification change in v9 for NEED_CSUM matches the above. >> >> Since reaching an agreement WRT DATA_VALID handling looks problematic, >> would you consider a virtual meeting to discuss the topic? >> >> Thanks, >> >> Paolo >> >> p.s. to be clear, with 'you' I include both Jason and Willem. > > Works for me. Thanks! I forgot to mention that it would be great if you could share (even off-list) a few timeslots that could fit your schedule, and I'll try to find a match. Cheers, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature 2024-10-28 12:26 ` Willem de Bruijn 2024-10-28 14:23 ` Paolo Abeni @ 2024-10-29 7:32 ` Jason Wang 1 sibling, 0 replies; 40+ messages in thread From: Jason Wang @ 2024-10-29 7:32 UTC (permalink / raw) To: Willem de Bruijn Cc: Paolo Abeni, virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, kshankar On Mon, Oct 28, 2024 at 8:27 PM Willem de Bruijn <willemb@google.com> wrote: > > On Mon, Oct 28, 2024 at 8:08 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > On 10/28/24 04:27, Jason Wang wrote: > > > As the topic becomes more complicated, I wonder if we can do something > > > easier, that is focusing on the one checksum offload first. And I > > > would like to check the current Linux behaviour first for non-GSO UDP > > > tunnel: > > > > > > 1) For TX, as you've mentioned, the csum_start/offset has already > > > pointed to the inner header (but is this the innermost?) > > > 2) For RX, what header would csum_start/offset points (considering we > > > have nested tunnels) > > > > > > I'm asking to make sure our proposal doesn't conflict with the > > > existing implementation. > > > > In Linux, csum_start/offset, both for rx and tx, points to the innermost > > _offloaded_ header. > > > > On RX, since GSO partial does not apply, such innermost offloaded header > > correspond the outermost/only header (when no tunnels are configured) or > > the 2nd outermost, when tunnels are configured and GRO-ed. > > > > On TX, via GSO partial and local csum offload, the csum_start can, in > > theory point to an header with an header with an arbitrary high level, > > but currently no driver/tx path allow for GSO partial with more than 1 > > encapsulation level (may change in the future). > > > > The proposed specification change in v9 for NEED_CSUM matches the above. > > > > Since reaching an agreement WRT DATA_VALID handling looks problematic, > > would you consider a virtual meeting to discuss the topic? > > > > Thanks, > > > > Paolo > > > > p.s. to be clear, with 'you' I include both Jason and Willem. > > Works for me. > Works for me. > Might indeed be faster to get to a resolution than an email thread at > this point. Yes. Thanks > ^ permalink raw reply [flat|nested] 40+ messages in thread
* [PATCH v9 2/2] virtio-net: define UDP tunnel checksum offload feature 2024-10-04 8:13 [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni 2024-10-04 8:13 ` [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature Paolo Abeni @ 2024-10-04 8:13 ` Paolo Abeni 2024-10-09 7:18 ` Jason Wang 2024-10-04 16:53 ` [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni 2 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-04 8:13 UTC (permalink / raw) To: virtio-comment Cc: maxime.coquelin, Eelco Chaudron, Jason Wang, Stefano Garzarella, Willem de Bruijn, kshankar This complements the previous change, additionally introducing the UDP tunnel checksum offload feature. Differently from the plain checksum offload feature, this depends on UDP tunnel segmentation being available, as outer checksum computation for non GSO packets is cheap and H/W implementation of such a feature is complex. UDP tunnel checksum offload does not introduce additional fields, instead it leverages the outer transport offset introduced by the UDP tunnel segmentation feature to locate the outer checksum inside the packet. When UDP tunnel checksum offload is negotiated: - the driver requests the outer UDP checksum offload setting the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the flag field. Such bit is not allocated inside the gso_type field to prevent space exhaustion there. - in the device -> driver direction, the VIRTIO_NET_HDR_F_DATA_VALID bit semantic is extended, covering both the inner and the outer checksum validation. Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- v8 -> v9: - rebased on top of virtio-1.4, changed udp-tunnel features number to avoid conflicts/duplications - fixed typos: VIRTIO_NET_HDR_F_UDP_TUNNEL -> VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM VIRTIO_NET_HDR_GSO_UDP_TUNNEL -> either VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 or VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 - specified strict validation for the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit, only allowed with UDP tunnel GSO packets. - specified that VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM describe the outer header csum offload for both in presence and absence of VIRTIO_NET_HDR_F_DATA_VALID v7 -> v8: - dropped confusing wording around csum validation - clarified VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM allocation v6 -> v7: - rebased - VIRTIO_NET_F_{HOST,GUEST}_UDP_TUNNEL_CSUM -> ...UDP_TUNNEL_GSO_CSUM - dropped unintended change to existing TCP gso spec --- device-types/net/description.tex | 136 ++++++++++++++++++++++++++++--- 1 file changed, 125 insertions(+), 11 deletions(-) diff --git a/device-types/net/description.tex b/device-types/net/description.tex index 8e84213..48e0f5e 100644 --- a/device-types/net/description.tex +++ b/device-types/net/description.tex @@ -91,9 +91,15 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits \item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO (46)] Driver can receive GSO packets carried by a UDP tunnel. +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM (47)] Driver handles packets + carried by a UDP tunnel with partial csum for the outer header. + \item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO (48)] Device can receive GSO packets carried by a UDP tunnel. +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM (49)] Device handles packets + carried by a UDP tunnel with partial csum for the outer header. + \item[VIRTIO_NET_F_DEVICE_STATS(50)] Device can provide device-level statistics to the driver through the control virtqueue. @@ -146,6 +152,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_GUEST_USO6] Requires VIRTIO_NET_F_GUEST_CSUM. \item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, VIRTIO_NET_F_GUEST_USO4 and VIRTIO_NET_F_GUEST_USO6. +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM] Requires VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO \item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM. \item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM. @@ -154,6 +161,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_HOST_USO] Requires VIRTIO_NET_F_CSUM. \item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6 and VIRTIO_NET_F_HOST_USO. +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM] Requires VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO \item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ. \item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ. @@ -168,6 +176,13 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device \item[VIRTIO_NET_F_RSS_CONTEXT] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_RSS. \end{description} +\begin{note} +The dependency between UDP_TUNNEL_GSO_CSUM and UDP_TUNNEL_GSO is intentionally +in the opposite direction with respect to the plain GSO features and the plain +checksum offload because UDP tunnel checksum offload gives very little gain +for non GSO packets and is quite complex to implement in H/W. +\end{note} + \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} \begin{description} \item[VIRTIO_NET_F_GSO (6)] Device handles packets with any GSO type. This was supposed to indicate segmentation offload support, but @@ -398,6 +413,11 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev the outer UDP checksum is zero - by negotiating the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature. +\item If VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is negotiated, a driver can + additionally use TCP segmentation or UDP segmentation on top of UDP + encapsulation with the outer header requiring checksum offload, + negotiating the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature. + \item The converse features are also available: a driver can save the virtual device some work by negotiating these features.\note{For example, a network packet transported between two guests on the same system might not need checksumming at all, nor segmentation, @@ -406,8 +426,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev checksummed packets can be received, and if it can do that then the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4, - VIRTIO_NET_F_GUEST_USO6 and VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO - are the input equivalents of the features described above. + VIRTIO_NET_F_GUEST_USO6 VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO and + VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM are the input equivalents of + the features described above. See \ref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers}~\nameref{sec:Device Types / Network Device / Device Operation / Setting Up Receive Buffers} and @@ -463,6 +484,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 #define VIRTIO_NET_HDR_F_DATA_VALID 2 #define VIRTIO_NET_HDR_F_RSC_INFO 4 +#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8 u8 flags; #define VIRTIO_NET_HDR_GSO_NONE 0 #define VIRTIO_NET_HDR_GSO_TCPV4 1 @@ -584,8 +606,11 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, the GSO protocol is encapsulated in a UDP tunnel. - The outer UDP header must not require checksum validation, i.e. the outer UDP - checksum must be a positive zero (0x0) as defined in UDP RFC 768. + If the outer UDP header requires checksumming, the driver must have + additionally negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature + and offloaded the outer checksum accordingly, otherwise + the outer UDP header must not require checksum validation, i.e. the outer + UDP checksum must be positive zero (0x0) as defined in UDP RFC 768. The other tunnel-related fields indicate how to replicate the packet headers to cut it into smaller packets: @@ -613,6 +638,20 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De VIRTIO_NET_HDR_GSO_TCPV4 | VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 = 0x21 \end{note} +\item If the driver negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature, + the transmitted packet is a GSO one encapsulated in a UDP tunnel, and + the outer UDP header requires checksumming, the driver can skip checksumming + the outer header: + + \begin{itemize} + \item \field{flags} has the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM set, + + \item The outer UDP checksum field in the packet is set to the sum + of the UDP pseudo header, so that replacing it by the ones' + complement checksum of the outer UDP header and payload will give the + correct result. + \end{itemize} + \item \field{num_buffers} is set to zero. This field is unused on transmitted packets. \item The header and packet are added as one output descriptor to the @@ -671,6 +710,14 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De MUST SET the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit. The driver MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel +requiring segmentation and outer UDP checksum offload, unless both the +VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO and VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM features +are negotiated, in which case the driver MUST set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL +bit in the \field{gso_type} and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags}. + +If VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM is not negotiated, the driver MUST not set +the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags} and +MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel requiring segmentation and outer UDP checksum offload. The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the @@ -680,6 +727,10 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit together. +The driver MUST NOT set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit \field{flags} +without setting either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. + If the VIRTIO_NET_F_CSUM feature has been negotiated, the driver MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in \field{flags}, if so: @@ -731,8 +782,23 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De to the inner transport header and inner transport checksum field. \end{itemize} +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, +and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, +the driver MAY set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in +\field{flags}, if so the driver MUST set the packet outer UDP header checksum +to the outer UDP pseudo header. + +\begin{note} +calculating a ones' complement checksum from \field{outer_th_offset} +up until the end of the packet and storing the result at offset 6 +from \field{outer_th_offset} will result in a fully checksummed outer UDP packet; +\end{note} + If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the -VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set +and the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has not +been negotiated, the outer UDP header MUST NOT require checksum validation. That is, the outer UDP checksum value MUST be 0 or the validated complete checksum for such header. @@ -753,6 +819,10 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De been negotiated, the driver MUST NOT set either the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 in \field{gso_type}. +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM option has not been negotiated, +the driver MUST NOT set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit +in \field{flags}. + The driver SHOULD accept the VIRTIO_NET_F_GUEST_HDRLEN feature if it has been offered, and if it's able to provide the exact header length. @@ -811,6 +881,11 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De \end{itemize} the device MUST NOT accept the packet. +If the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in \field{flags} is set, +and both the bits VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 and +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 in \field{gso_type} are not set, +the device MOST NOT accept the packet. + If VIRTIO_NET_HDR_F_NEEDS_CSUM is not set, the device MUST NOT rely on the packet checksum being correct. \paragraph{Packet Transmission Interrupt}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt} @@ -920,8 +995,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network in \field{flags}, meaning that the device validated the checksum, set the csum_start and csum_offset fields, but did not provide the partial checksum for the transport header. - In case of multiple encapsulated protocols, one level of checksums - has been validated. + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, + and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, + both the outer UDP checksum and the inner transport checksum + have been validated, otherwise only one level of checksums (the inner one + in case of tunnels) has been validated. \end{enumerate} Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP, UDP_TUNNEL @@ -957,6 +1035,14 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network bit MAY be set. In such case the \field{outer_th_offset} and \field{inner_nh_offset} fields indicate the corresponding headers information. +\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature was +negotiated, and + the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set in \field{gso_type}, the + VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags} can be set: if so, + the outer UDP checksum has been validated. + If the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} is not set, the UDP + header checksum at offset 6 from from \field{outer_th_offset} + is set to the outer UDP pseudo header checksum. \end{enumerate} @@ -1005,6 +1091,9 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. +If VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM is not negotiated the device MUST NOT set +the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in \field{flags}. + The device SHOULD NOT send to the driver TCP packets requiring segmentation offload which have the Explicit Congestion Notification bit set, unless the VIRTIO_NET_F_GUEST_ECN feature is negotiated, in which case the @@ -1037,7 +1126,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network to the corresponding header information, and the outer UDP header MUST NOT require checksum offload. -The device MUST NOT send the driver TCP or UDP GSO packets encapsulated in UDP +If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has not been negotiated, +the device MUST NOT send the driver TCP or UDP GSO packets encapsulated in UDP tunnel and requiring segmentation and outer checksum offload. If none of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have @@ -1063,13 +1153,31 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags}, if so, the device MUST validate the packet -checksum (in case of multiple encapsulated protocols, one level -of checksums is validated). \note{This implies that if either the -VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the +checksum. If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has +been negotiated, and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit set in +\field{flags}, both the outer UDP checksum and the inner transport +checksum have been validated, otherwise only one level of checksums +(the inner one in case of tunnels) has been validated. \note{This implies +that if either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and the set the VIRTIO_NET_HDR_F_DATA_VALID in \field{flags} is set, the the VIRTIO_NET_F_GUEST_CSUM bit in \field{flags} is must be set, too} +If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated +and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit is set or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is set in \field{gso_type}, the +device MAY set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in +\field{flags}, if so and the VIRTIO_NET_F_DATA_VALID bit in \field{flags} +is not set, the device MUST set the packet outer UDP checksum stored in the +receive buffer to the outer UDP pseudo header. + +Otherwise, the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been +negotiated, either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit is set or the +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is set in \field{gso_type}, +and the bit VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set in +\field{flags}, the device MUST either provide a zero outer UDP header +checksum or a fully checksummed outer UDP header. + \drivernormative{\paragraph}{Processing of Incoming Packets}{Device Types / Network Device / Device Operation / Processing of Incoming Packets} @@ -1110,6 +1218,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network \end{itemize} the driver MUST NOT accept the packet. +If the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in \field{flags} is set, +and both the bits VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 and +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 in \field{gso_type} are not set, +the driver MOST NOT accept the packet. + \paragraph{Hash calculation for incoming packets} \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets} @@ -1954,6 +2067,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi #define VIRTIO_NET_F_GUEST_ECN 9 #define VIRTIO_NET_F_GUEST_UFO 10 #define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 46 +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 47 #define VIRTIO_NET_F_GUEST_USO4 54 #define VIRTIO_NET_F_GUEST_USO6 55 -- 2.45.2 ^ permalink raw reply related [flat|nested] 40+ messages in thread
* Re: [PATCH v9 2/2] virtio-net: define UDP tunnel checksum offload feature 2024-10-04 8:13 ` [PATCH v9 2/2] virtio-net: define UDP tunnel checksum " Paolo Abeni @ 2024-10-09 7:18 ` Jason Wang 2024-10-09 9:39 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-09 7:18 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Fri, Oct 4, 2024 at 4:13 PM Paolo Abeni <pabeni@redhat.com> wrote: > > This complements the previous change, additionally > introducing the UDP tunnel checksum offload feature. > > Differently from the plain checksum offload feature, this > depends on UDP tunnel segmentation being available, as outer checksum > computation for non GSO packets is cheap and H/W implementation of > such a feature is complex. > > UDP tunnel checksum offload does not introduce additional fields, > instead it leverages the outer transport offset introduced by the > UDP tunnel segmentation feature to locate the outer checksum > inside the packet. > > When UDP tunnel checksum offload is negotiated: > - the driver requests the outer UDP checksum offload setting the > VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the flag field. Such bit > is not allocated inside the gso_type field to prevent space > exhaustion there. > - in the device -> driver direction, the VIRTIO_NET_HDR_F_DATA_VALID > bit semantic is extended, covering both the inner and the outer > checksum validation. > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > v8 -> v9: > - rebased on top of virtio-1.4, changed udp-tunnel features number > to avoid conflicts/duplications > - fixed typos: > VIRTIO_NET_HDR_F_UDP_TUNNEL -> VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM > VIRTIO_NET_HDR_GSO_UDP_TUNNEL -> either > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 or VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 > - specified strict validation for the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM > bit, only allowed with UDP tunnel GSO packets. > - specified that VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM describe the outer > header csum offload for both in presence and absence of > VIRTIO_NET_HDR_F_DATA_VALID > > v7 -> v8: > - dropped confusing wording around csum validation > - clarified VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM allocation > > v6 -> v7: > - rebased > - VIRTIO_NET_F_{HOST,GUEST}_UDP_TUNNEL_CSUM -> ...UDP_TUNNEL_GSO_CSUM > - dropped unintended change to existing TCP gso spec > --- > device-types/net/description.tex | 136 ++++++++++++++++++++++++++++--- > 1 file changed, 125 insertions(+), 11 deletions(-) > > diff --git a/device-types/net/description.tex b/device-types/net/description.tex > index 8e84213..48e0f5e 100644 > --- a/device-types/net/description.tex > +++ b/device-types/net/description.tex > @@ -91,9 +91,15 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits > \item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO (46)] Driver can receive GSO packets > carried by a UDP tunnel. > > +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM (47)] Driver handles packets > + carried by a UDP tunnel with partial csum for the outer header. > + > \item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO (48)] Device can receive GSO packets > carried by a UDP tunnel. > > +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM (49)] Device handles packets > + carried by a UDP tunnel with partial csum for the outer header. > + > \item[VIRTIO_NET_F_DEVICE_STATS(50)] Device can provide device-level statistics > to the driver through the control virtqueue. > > @@ -146,6 +152,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device > \item[VIRTIO_NET_F_GUEST_USO6] Requires VIRTIO_NET_F_GUEST_CSUM. > \item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, > VIRTIO_NET_F_GUEST_USO4 and VIRTIO_NET_F_GUEST_USO6. > +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM] Requires VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO > > \item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM. > \item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM. > @@ -154,6 +161,7 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device > \item[VIRTIO_NET_F_HOST_USO] Requires VIRTIO_NET_F_CSUM. > \item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6 > and VIRTIO_NET_F_HOST_USO. > +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM] Requires VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO > > \item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ. > \item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ. > @@ -168,6 +176,13 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device > \item[VIRTIO_NET_F_RSS_CONTEXT] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_RSS. > \end{description} > > +\begin{note} > +The dependency between UDP_TUNNEL_GSO_CSUM and UDP_TUNNEL_GSO is intentionally > +in the opposite direction with respect to the plain GSO features and the plain > +checksum offload because UDP tunnel checksum offload gives very little gain > +for non GSO packets and is quite complex to implement in H/W. > +\end{note} Nit: maybe it's better to define "plain GSO" or just call it "non-tunnel GSO". > + > \subsubsection{Legacy Interface: Feature bits}\label{sec:Device Types / Network Device / Feature bits / Legacy Interface: Feature bits} > \begin{description} > \item[VIRTIO_NET_F_GSO (6)] Device handles packets with any GSO type. This was supposed to indicate segmentation offload support, but > @@ -398,6 +413,11 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev > the outer UDP checksum is zero - by negotiating the > VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature. > > +\item If VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is negotiated, a driver can > + additionally use TCP segmentation or UDP segmentation on top of UDP > + encapsulation with the outer header requiring checksum offload, > + negotiating the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature. > + > \item The converse features are also available: a driver can save > the virtual device some work by negotiating these features.\note{For example, a network packet transported between two guests on > the same system might not need checksumming at all, nor segmentation, > @@ -406,8 +426,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev > checksummed packets can be received, and if it can do that then > the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6, > VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4, > - VIRTIO_NET_F_GUEST_USO6 and VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO > - are the input equivalents of the features described above. > + VIRTIO_NET_F_GUEST_USO6 VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO and > + VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM are the input equivalents of > + the features described above. > See \ref{sec:Device Types / Network Device / Device Operation / > Setting Up Receive Buffers}~\nameref{sec:Device Types / Network > Device / Device Operation / Setting Up Receive Buffers} and > @@ -463,6 +484,7 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O > #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1 > #define VIRTIO_NET_HDR_F_DATA_VALID 2 > #define VIRTIO_NET_HDR_F_RSC_INFO 4 > +#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8 > u8 flags; > #define VIRTIO_NET_HDR_GSO_NONE 0 > #define VIRTIO_NET_HDR_GSO_TCPV4 1 > @@ -584,8 +606,11 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit set, the GSO protocol is encapsulated > in a UDP tunnel. > - The outer UDP header must not require checksum validation, i.e. the outer UDP > - checksum must be a positive zero (0x0) as defined in UDP RFC 768. > + If the outer UDP header requires checksumming, the driver must have > + additionally negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature > + and offloaded the outer checksum accordingly, otherwise > + the outer UDP header must not require checksum validation, i.e. the outer > + UDP checksum must be positive zero (0x0) as defined in UDP RFC 768. > The other tunnel-related fields indicate how to replicate the packet > headers to cut it into smaller packets: > > @@ -613,6 +638,20 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > VIRTIO_NET_HDR_GSO_TCPV4 | VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 = 0x21 > \end{note} > > +\item If the driver negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature, > + the transmitted packet is a GSO one encapsulated in a UDP tunnel, and > + the outer UDP header requires checksumming, the driver can skip checksumming > + the outer header: > + > + \begin{itemize} > + \item \field{flags} has the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM set, > + > + \item The outer UDP checksum field in the packet is set to the sum > + of the UDP pseudo header, so that replacing it by the ones' > + complement checksum of the outer UDP header and payload will give the > + correct result. > + \end{itemize} > + > \item \field{num_buffers} is set to zero. This field is unused on transmitted packets. > > \item The header and packet are added as one output descriptor to the > @@ -671,6 +710,14 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > MUST SET the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit. > > The driver MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel > +requiring segmentation and outer UDP checksum offload, unless both the > +VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO and VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM features > +are negotiated, in which case the driver MUST set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL > +bit in the \field{gso_type} and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags}. > + > +If VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM is not negotiated, the driver MUST not set > +the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags} and > +MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel > requiring segmentation and outer UDP checksum offload. > > The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > @@ -680,6 +727,10 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and the > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit together. > > +The driver MUST NOT set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit \field{flags} > +without setting either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. > + > If the VIRTIO_NET_F_CSUM feature has been negotiated, the > driver MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in > \field{flags}, if so: > @@ -731,8 +782,23 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > to the inner transport header and inner transport checksum field. > \end{itemize} > > +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, > +and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > +the driver MAY set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in > +\field{flags}, if so the driver MUST set the packet outer UDP header checksum > +to the outer UDP pseudo header. > + > +\begin{note} > +calculating a ones' complement checksum from \field{outer_th_offset} > +up until the end of the packet and storing the result at offset 6 > +from \field{outer_th_offset} will result in a fully checksummed outer UDP packet; > +\end{note} > + > If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > -VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set > +and the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has not > +been negotiated, the > outer UDP header MUST NOT require checksum validation. That is, the > outer UDP checksum value MUST be 0 or the validated complete checksum > for such header. Nit: do we need to mention this is for TX? > @@ -753,6 +819,10 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > been negotiated, the driver MUST NOT set either the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4 > bit or the VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6 in \field{gso_type}. > > +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM option has not been negotiated, > +the driver MUST NOT set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit > +in \field{flags}. > + > The driver SHOULD accept the VIRTIO_NET_F_GUEST_HDRLEN feature if it has > been offered, and if it's able to provide the exact header length. > > @@ -811,6 +881,11 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > \end{itemize} > the device MUST NOT accept the packet. > > +If the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in \field{flags} is set, > +and both the bits VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 and > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 in \field{gso_type} are not set, > +the device MOST NOT accept the packet. > + > If VIRTIO_NET_HDR_F_NEEDS_CSUM is not set, the device MUST NOT > rely on the packet checksum being correct. > \paragraph{Packet Transmission Interrupt}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt} > @@ -920,8 +995,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > in \field{flags}, meaning that the device validated the checksum, > set the csum_start and csum_offset fields, but did not provide the > partial checksum for the transport header. > - In case of multiple encapsulated protocols, one level of checksums > - has been validated. > + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, > + and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, > + both the outer UDP checksum and the inner transport checksum > + have been validated, We may need to clarify this is for RX. What's more, this seems to be the semantic of DATA_VALID? If yes, to be consistent, something like VIRTIO_NET_HDR_F_UDP_TUNNEL_DATA_VALID might be better. And is there a chance that we may need partial csum for outer UDP e.g in the case of VM2VM communication. > otherwise only one level of checksums (the inner one > + in case of tunnels) has been validated. > \end{enumerate} Ok, if one level means inner, that answers one of my questions for the previous patch. > > Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP, UDP_TUNNEL > @@ -957,6 +1035,14 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > bit MAY be set. In such case the \field{outer_th_offset} and > \field{inner_nh_offset} fields indicate the corresponding > headers information. > +\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature was > +negotiated, and > + the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set in \field{gso_type}, the > + VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags} can be set: if so, > + the outer UDP checksum has been validated. > + If the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} is not set, the UDP > + header checksum at offset 6 from from \field{outer_th_offset} > + is set to the outer UDP pseudo header checksum. Just to confirm, VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is for outer and VIRTIO_NET_HDR_F_DATA_VALID is for inner in this case? > > \end{enumerate} > > @@ -1005,6 +1091,9 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}. > > +If VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM is not negotiated the device MUST NOT set > +the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in \field{flags}. > + > The device SHOULD NOT send to the driver TCP packets requiring segmentation offload > which have the Explicit Congestion Notification bit set, unless the > VIRTIO_NET_F_GUEST_ECN feature is negotiated, in which case the > @@ -1037,7 +1126,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > to the corresponding header information, and the outer UDP header MUST NOT > require checksum offload. > > -The device MUST NOT send the driver TCP or UDP GSO packets encapsulated in UDP > +If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has not been negotiated, > +the device MUST NOT send the driver TCP or UDP GSO packets encapsulated in UDP > tunnel and requiring segmentation and outer checksum offload. > > If none of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have > @@ -1063,13 +1153,31 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the > device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in > \field{flags}, if so, the device MUST validate the packet > -checksum (in case of multiple encapsulated protocols, one level > -of checksums is validated). \note{This implies that if either the > -VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > +checksum. If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has > +been negotiated, and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit set in > +\field{flags}, both the outer UDP checksum and the inner transport > +checksum have been validated, otherwise only one level of checksums > +(the inner one in case of tunnels) has been validated. \note{This implies > +that if either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > and the set the VIRTIO_NET_HDR_F_DATA_VALID in \field{flags} is set, > the the VIRTIO_NET_F_GUEST_CSUM bit in \field{flags} is must be set, too} > > +If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated > +and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit is set or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is set in \field{gso_type}, the > +device MAY set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in > +\field{flags}, if so and the VIRTIO_NET_F_DATA_VALID bit in \field{flags} > +is not set, the device MUST set the packet outer UDP checksum stored in the > +receive buffer to the outer UDP pseudo header. > + > +Otherwise, the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been > +negotiated, either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit is set or the > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is set in \field{gso_type}, > +and the bit VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set in > +\field{flags}, the device MUST either provide a zero outer UDP header > +checksum or a fully checksummed outer UDP header. > + > \drivernormative{\paragraph}{Processing of Incoming > Packets}{Device Types / Network Device / Device Operation / > Processing of Incoming Packets} > @@ -1110,6 +1218,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > \end{itemize} > the driver MUST NOT accept the packet. > > +If the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in \field{flags} is set, > +and both the bits VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 and > +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 in \field{gso_type} are not set, > +the driver MOST NOT accept the packet. > + > \paragraph{Hash calculation for incoming packets} > \label{sec:Device Types / Network Device / Device Operation / Processing of Incoming Packets / Hash calculation for incoming packets} > > @@ -1954,6 +2067,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi > #define VIRTIO_NET_F_GUEST_ECN 9 > #define VIRTIO_NET_F_GUEST_UFO 10 > #define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 46 > +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 47 > #define VIRTIO_NET_F_GUEST_USO4 54 > #define VIRTIO_NET_F_GUEST_USO6 55 > > -- > 2.45.2 > Thanks ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 2/2] virtio-net: define UDP tunnel checksum offload feature 2024-10-09 7:18 ` Jason Wang @ 2024-10-09 9:39 ` Paolo Abeni 2024-10-10 4:22 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-09 9:39 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/9/24 09:18, Jason Wang wrote: >> @@ -168,6 +176,13 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device >> \item[VIRTIO_NET_F_RSS_CONTEXT] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_RSS. >> \end{description} >> >> +\begin{note} >> +The dependency between UDP_TUNNEL_GSO_CSUM and UDP_TUNNEL_GSO is intentionally >> +in the opposite direction with respect to the plain GSO features and the plain >> +checksum offload because UDP tunnel checksum offload gives very little gain >> +for non GSO packets and is quite complex to implement in H/W. >> +\end{note} > > Nit: maybe it's better to define "plain GSO" or just call it "non-tunnel GSO". I'll opt for "non-tunnel GSO". Note this section is not changed from the previous iteration. >> @@ -731,8 +782,23 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De >> to the inner transport header and inner transport checksum field. >> \end{itemize} >> >> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, >> +and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, >> +the driver MAY set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in >> +\field{flags}, if so the driver MUST set the packet outer UDP header checksum >> +to the outer UDP pseudo header. >> + >> +\begin{note} >> +calculating a ones' complement checksum from \field{outer_th_offset} >> +up until the end of the packet and storing the result at offset 6 >> +from \field{outer_th_offset} will result in a fully checksummed outer UDP packet; >> +\end{note} >> + >> If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the >> -VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set >> +and the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has not >> +been negotiated, the >> outer UDP header MUST NOT require checksum validation. That is, the >> outer UDP checksum value MUST be 0 or the validated complete checksum >> for such header. > > Nit: do we need to mention this is for TX? Note that this the '\drivernormative... Packet Transmission' section. With 'TX' I guess you refer to '\devicenormative... Processing of Incoming Packets'? It's already there: """ Otherwise, the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit is set or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is set in \field{gso_type}, and the bit VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set in \field{flags}, the device MUST either provide a zero outer UDP header checksum or a fully checksummed outer UDP header. """ >> @@ -920,8 +995,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network >> in \field{flags}, meaning that the device validated the checksum, >> set the csum_start and csum_offset fields, but did not provide the >> partial checksum for the transport header. >> - In case of multiple encapsulated protocols, one level of checksums >> - has been validated. >> + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, >> + and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, >> + both the outer UDP checksum and the inner transport checksum >> + have been validated, > > We may need to clarify this is for RX. I'm sorry, I'm still slightly confused by the 'RX' alone. This is in the '\devicenormative ... Setting Up Receive Buffers' section, do you mean the other direction?, that is '\drivernormative ... Processing of Incoming Packets'? Currently such section only prescribe 'MUST NOT' behavior (even for non tunnel GSO), such clarification does not seam to fit there?!? > What's more, this seems to be > the semantic of DATA_VALID? If yes, to be consistent, something like > VIRTIO_NET_HDR_F_UDP_TUNNEL_DATA_VALID might be better. The VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM bit just specify how many csums are validated by the driver (or the device). If DATA_VALID is set, this is checksum unnecessary. Otherwise is csum partial. Its specificed a few lines above: """ \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be set: if so, device has validated the packet checksum. If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags}, meaning that the device validated the checksum, set the csum_start and csum_offset fields, but did not provide the partial checksum for the transport header. """ > And is there a chance that we may need partial csum for outer UDP e.g > in the case of VM2VM communication. I feel like I'm missing something?!? AFAICS this is has been already discussed, agreed and set int spec. This specification allows for checksum partial for the outer header, in both direction (VIRTIO_NET_HDR_F_DATA_VALID not set and VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM set. >> @@ -957,6 +1035,14 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network >> bit MAY be set. In such case the \field{outer_th_offset} and >> \field{inner_nh_offset} fields indicate the corresponding >> headers information. >> +\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature was >> +negotiated, and >> + the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set in \field{gso_type}, the >> + VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags} can be set: if so, >> + the outer UDP checksum has been validated. >> + If the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} is not set, the UDP >> + header checksum at offset 6 from from \field{outer_th_offset} >> + is set to the outer UDP pseudo header checksum. > > Just to confirm, VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is for outer and > VIRTIO_NET_HDR_F_DATA_VALID is for inner in this case? The VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit specify if the outer header has been validated: with !VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM only the inner header is validated, with VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM set, both the inner and the outer. The VIRTIO_NET_HDR_F_DATA_VALID specify if it's the packet is csum_partial (!VIRTIO_NET_HDR_F_DATA_VALID) or csum_unnecessary (VIRTIO_NET_HDR_F_DATA_VALID). It refers only to the inner header when !VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM or to both the inner and the outer when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set. The relevant spec wording is: """ \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be set: if so, device has validated the packet checksum. If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags}, meaning that the device validated the checksum, set the csum_start and csum_offset fields, but did not provide the partial checksum for the transport header. If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, both the outer UDP checksum and the inner transport checksum have been validated, otherwise only one level of checksums (the inner one in case of tunnels) has been validated. """ Is that unclear? which part needs rewording? Thanks, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 2/2] virtio-net: define UDP tunnel checksum offload feature 2024-10-09 9:39 ` Paolo Abeni @ 2024-10-10 4:22 ` Jason Wang 0 siblings, 0 replies; 40+ messages in thread From: Jason Wang @ 2024-10-10 4:22 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Wed, Oct 9, 2024 at 5:39 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/9/24 09:18, Jason Wang wrote: > >> @@ -168,6 +176,13 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device > >> \item[VIRTIO_NET_F_RSS_CONTEXT] Requires VIRTIO_NET_F_CTRL_VQ and VIRTIO_NET_F_RSS. > >> \end{description} > >> > >> +\begin{note} > >> +The dependency between UDP_TUNNEL_GSO_CSUM and UDP_TUNNEL_GSO is intentionally > >> +in the opposite direction with respect to the plain GSO features and the plain > >> +checksum offload because UDP tunnel checksum offload gives very little gain > >> +for non GSO packets and is quite complex to implement in H/W. > >> +\end{note} > > > > Nit: maybe it's better to define "plain GSO" or just call it "non-tunnel GSO". > > I'll opt for "non-tunnel GSO". > Note this section is not changed from the previous iteration. Sorry if some questions were repeated, some of the context might be forgotten during the switch. > > >> @@ -731,8 +782,23 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De > >> to the inner transport header and inner transport checksum field. > >> \end{itemize} > >> > >> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, > >> +and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or > >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > >> +the driver MAY set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in > >> +\field{flags}, if so the driver MUST set the packet outer UDP header checksum > >> +to the outer UDP pseudo header. > >> + > >> +\begin{note} > >> +calculating a ones' complement checksum from \field{outer_th_offset} > >> +up until the end of the packet and storing the result at offset 6 > >> +from \field{outer_th_offset} will result in a fully checksummed outer UDP packet; > >> +\end{note} > >> + > >> If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > >> -VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the > >> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set > >> +and the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM feature has not > >> +been negotiated, the > >> outer UDP header MUST NOT require checksum validation. That is, the > >> outer UDP checksum value MUST be 0 or the validated complete checksum > >> for such header. > > > > Nit: do we need to mention this is for TX? > > Note that this the '\drivernormative... Packet Transmission' section. > With 'TX' I guess you refer to '\devicenormative... Processing of > Incoming Packets'? It's already there: > > """ > Otherwise, the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been > negotiated, either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit is set or the > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is set in \field{gso_type}, > and the bit VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is not set in > \field{flags}, the device MUST either provide a zero outer UDP header > checksum or a fully checksummed outer UDP header. > """ My bad, you are right. > > >> @@ -920,8 +995,11 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > >> in \field{flags}, meaning that the device validated the checksum, > >> set the csum_start and csum_offset fields, but did not provide the > >> partial checksum for the transport header. > >> - In case of multiple encapsulated protocols, one level of checksums > >> - has been validated. > >> + If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, > >> + and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, > >> + both the outer UDP checksum and the inner transport checksum > >> + have been validated, > > > > We may need to clarify this is for RX. > > I'm sorry, I'm still slightly confused by the 'RX' alone. This is in the > '\devicenormative ... Setting Up Receive Buffers' section, do you mean > the other direction?, that is '\drivernormative ... Processing of > Incoming Packets'? > > Currently such section only prescribe 'MUST NOT' behavior (even for non > tunnel GSO), such clarification does not seam to fit there?!? You're right here. > > > What's more, this seems to be > > the semantic of DATA_VALID? If yes, to be consistent, something like > > VIRTIO_NET_HDR_F_UDP_TUNNEL_DATA_VALID might be better. > > The VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM bit just specify how many > csums are validated by the driver (or the device). I'm not sure I get the meaning of "how many csums" here. > > If DATA_VALID is set, this is checksum unnecessary. Otherwise is csum > partial. I think I'm a little bit confused about the usage here. Please see my question in the previous patch, and it looks like it causes some confusion. For example, without this patch, NEEDS_CUSM needs to be set explicitly but this patch switches to assume NEEDS_CSUM if !DATA_VALID? > Its specificed a few lines above: > > """ > \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be > set: if so, device has validated the packet checksum. Note that this part will be changed soon, see: https://lore.kernel.org/all/1704263702-50528-1-git-send-email-hengqi@linux.alibaba.com/ (Note sure why it is not merged) DATA_VALID (CHECKSUM_UNNECESSARY) should not depend on NEEDS_CUSM (CHECKSUM_PARTAIL). > If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, > and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit > in \field{flags}, meaning that the device validated the checksum, > set the csum_start and csum_offset fields, but did not provide the > partial checksum for the transport header. > """ And probably this part """ If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been negotiated, and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, both the outer UDP checksum and the inner transport checksum have been validated, otherwise only one level of checksums (the inner one in case of tunnels) has been validated. """ I think the confusion came from the definition of "validated". I thought it only worked for DATA_VALID (as I replied in patch 1). Or you meant: VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM | DATA_VALID: DATA_VALID for inner and outer VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM: NEEDS_CSUM for both inner or outer This excludes the combination of NEEDS_CUSM & DATA_VALID. It might be worth explaining this in the cover letter. And the introduction of the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is to differ from the case where outer checksum is not offload. Anyhow, it's better to set NEEDS_CUSM explicitly to be consistent with the non-tunnel GSO. > > > And is there a chance that we may need partial csum for outer UDP e.g > > in the case of VM2VM communication. > > I feel like I'm missing something?!? AFAICS this is has been already > discussed, agreed and set int spec. This specification allows for > checksum partial for the outer header, in both direction > (VIRTIO_NET_HDR_F_DATA_VALID not set and > VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM set. > > >> @@ -957,6 +1035,14 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network > >> bit MAY be set. In such case the \field{outer_th_offset} and > >> \field{inner_nh_offset} fields indicate the corresponding > >> headers information. > >> +\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature was > >> +negotiated, and > >> + the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set in \field{gso_type}, the > >> + VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the \field{flags} can be set: if so, > >> + the outer UDP checksum has been validated. > >> + If the VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} is not set, the UDP > >> + header checksum at offset 6 from from \field{outer_th_offset} > >> + is set to the outer UDP pseudo header checksum. > > > > Just to confirm, VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is for outer and > > VIRTIO_NET_HDR_F_DATA_VALID is for inner in this case? > > The VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit specify if the outer header has > been validated: with !VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM only the inner > header is validated, with VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM set, both the > inner and the outer. > > The VIRTIO_NET_HDR_F_DATA_VALID specify if it's the packet is > csum_partial (!VIRTIO_NET_HDR_F_DATA_VALID) or csum_unnecessary > (VIRTIO_NET_HDR_F_DATA_VALID). It refers only to the inner header when > !VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM or to both the inner and the outer > when VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM is set. Ok, I finally got here. But another question is, if we use partial cusm for outer, where does the device put outer csum_start/offset for outer? > > The relevant spec wording is: > > """ > \item If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the > VIRTIO_NET_HDR_F_DATA_VALID bit in \field{flags} can be > set: if so, device has validated the packet checksum. > If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature has been negotiated, > and either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the > VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, > the device can additionally set the VIRTIO_NET_HDR_F_DATA_VALID bit > in \field{flags}, meaning that the device validated the checksum, > set the csum_start and csum_offset fields, but did not provide the > partial checksum for the transport header. > If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM feature has been > negotiated, > and the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit is set in \field{flags}, > both the outer UDP checksum and the inner transport checksum > have been validated, otherwise only one level of checksums (the inner one > in case of tunnels) has been validated. > """ > > Is that unclear? which part needs rewording? I think this is less straightforward to have something like: 1) a new field called inner_flags 2) or a new bit VIRTIO_NET_HDR_F_INNER_DATA_VALID, VIRTIO_NET_HDR_F_INNER_NEEDS_CSUM Thanks > > Thanks, > > Paolo > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 0/2] virtio-net: define UDP tunnel offload 2024-10-04 8:13 [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni 2024-10-04 8:13 ` [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature Paolo Abeni 2024-10-04 8:13 ` [PATCH v9 2/2] virtio-net: define UDP tunnel checksum " Paolo Abeni @ 2024-10-04 16:53 ` Paolo Abeni 2024-10-09 7:24 ` Jason Wang 2 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-04 16:53 UTC (permalink / raw) To: virtio-comment, Jason Wang Cc: maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/4/24 10:13, Paolo Abeni wrote: > Note that such while the mentioned PoC is based on the previous > iteration of this change, it already deals with all the delta introduced > here except the refined DATA_VALID handling. I'll try to update the PoC > WRT that ASAP. The updated PoC dealing with DATA_VALID according to the spec change is avail here: https://github.com/pabeni/linux-devel/commits/virtio_tunnel_gso_v9/ https://github.com/pabeni/qemu/commits/virtio_tunnel_gso_v9/ It's slighly less untested than the previous iteration, as the CHECKSUM_COMPLETE/DATA_VALID code path is explicitly covered allowing the user-space to configure the tun device to set DATA_VALID on the xmitted virtio hdr. @Jason, please LMK is something additional is needed to reach the board voting. Thanks, Paolo ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 0/2] virtio-net: define UDP tunnel offload 2024-10-04 16:53 ` [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni @ 2024-10-09 7:24 ` Jason Wang 2024-10-09 8:08 ` Paolo Abeni 0 siblings, 1 reply; 40+ messages in thread From: Jason Wang @ 2024-10-09 7:24 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Sat, Oct 5, 2024 at 12:53 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/4/24 10:13, Paolo Abeni wrote: > > Note that such while the mentioned PoC is based on the previous > > iteration of this change, it already deals with all the delta introduced > > here except the refined DATA_VALID handling. I'll try to update the PoC > > WRT that ASAP. > > The updated PoC dealing with DATA_VALID according to the spec change is > avail here: > > https://github.com/pabeni/linux-devel/commits/virtio_tunnel_gso_v9/ > https://github.com/pabeni/qemu/commits/virtio_tunnel_gso_v9/ > > It's slighly less untested than the previous iteration, as the > CHECKSUM_COMPLETE/DATA_VALID code path is explicitly covered allowing > the user-space to configure the tun device to set DATA_VALID on the > xmitted virtio hdr. > > @Jason, please LMK is something additional is needed to reach the board > voting. It should work at a first glance. But the tnl_offset and new ioctl seems a little odd. Any reason we can't deduce it from the features? I'm asking since the design needs to work with hardware implementation as well. So any channel outside virtio (e.g the TUNSETTNLOFFLOAD) may not work. Thanks > > Thanks, > > Paolo > ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 0/2] virtio-net: define UDP tunnel offload 2024-10-09 7:24 ` Jason Wang @ 2024-10-09 8:08 ` Paolo Abeni 2024-10-10 2:29 ` Jason Wang 0 siblings, 1 reply; 40+ messages in thread From: Paolo Abeni @ 2024-10-09 8:08 UTC (permalink / raw) To: Jason Wang Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On 10/9/24 09:24, Jason Wang wrote: > On Sat, Oct 5, 2024 at 12:53 AM Paolo Abeni <pabeni@redhat.com> wrote: >> >> On 10/4/24 10:13, Paolo Abeni wrote: >>> Note that such while the mentioned PoC is based on the previous >>> iteration of this change, it already deals with all the delta introduced >>> here except the refined DATA_VALID handling. I'll try to update the PoC >>> WRT that ASAP. >> >> The updated PoC dealing with DATA_VALID according to the spec change is >> avail here: >> >> https://github.com/pabeni/linux-devel/commits/virtio_tunnel_gso_v9/ >> https://github.com/pabeni/qemu/commits/virtio_tunnel_gso_v9/ >> >> It's slighly less untested than the previous iteration, as the >> CHECKSUM_COMPLETE/DATA_VALID code path is explicitly covered allowing >> the user-space to configure the tun device to set DATA_VALID on the >> xmitted virtio hdr. >> >> @Jason, please LMK is something additional is needed to reach the board >> voting. > > It should work at a first glance. But the tnl_offset and new ioctl > seems a little odd. Any reason we can't deduce it from the features? The tun device does not know the complete layout of the virtio_net_hdr. It knows only the total size but is not aware i.e. if the VIRTIO_NET_F_MRG_RXBUF or the VIRTIO_NET_F_HASH_REPORT are negotiated. To support UDP tunnel GSO offload, The tun device must be able to parse correctly the UDP tunnel fields, so it must know at least the tun fields offset inside the virtio_net_hdr structure. The new ioctl is added to pass such additional data. A possible alternative would be piggybacking the mentioned 'tunnel offset' inside the offload 'long'. I avoid such option because it felt quite hackish, but it you feel that as preferrable it's still doable. Another option would let the user-space know to tun all the relevant negotiated features, but that look quite more invasive (I would avoid such option at least for a PoC). > I'm asking since the design needs to work with hardware implementation > as well. So any channel outside virtio (e.g the TUNSETTNLOFFLOAD) may > not work. I've no idea about the above. My uneducated guess is that H/W implementation will still need a S/W interface, and such interface could implement the new ioctl(). In any case I guess a new SW and/or firmware revision will be needed to support UDP tunnel GSO offload on virtio H/W impl. /P ^ permalink raw reply [flat|nested] 40+ messages in thread
* Re: [PATCH v9 0/2] virtio-net: define UDP tunnel offload 2024-10-09 8:08 ` Paolo Abeni @ 2024-10-10 2:29 ` Jason Wang 0 siblings, 0 replies; 40+ messages in thread From: Jason Wang @ 2024-10-10 2:29 UTC (permalink / raw) To: Paolo Abeni Cc: virtio-comment, maxime.coquelin, Eelco Chaudron, Stefano Garzarella, Willem de Bruijn, kshankar On Wed, Oct 9, 2024 at 4:16 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On 10/9/24 09:24, Jason Wang wrote: > > On Sat, Oct 5, 2024 at 12:53 AM Paolo Abeni <pabeni@redhat.com> wrote: > >> > >> On 10/4/24 10:13, Paolo Abeni wrote: > >>> Note that such while the mentioned PoC is based on the previous > >>> iteration of this change, it already deals with all the delta introduced > >>> here except the refined DATA_VALID handling. I'll try to update the PoC > >>> WRT that ASAP. > >> > >> The updated PoC dealing with DATA_VALID according to the spec change is > >> avail here: > >> > >> https://github.com/pabeni/linux-devel/commits/virtio_tunnel_gso_v9/ > >> https://github.com/pabeni/qemu/commits/virtio_tunnel_gso_v9/ > >> > >> It's slighly less untested than the previous iteration, as the > >> CHECKSUM_COMPLETE/DATA_VALID code path is explicitly covered allowing > >> the user-space to configure the tun device to set DATA_VALID on the > >> xmitted virtio hdr. > >> > >> @Jason, please LMK is something additional is needed to reach the board > >> voting. > > > > It should work at a first glance. But the tnl_offset and new ioctl > > seems a little odd. Any reason we can't deduce it from the features? > > The tun device does not know the complete layout of the virtio_net_hdr. We can teach if it is needed. > It knows only the total size but is not aware i.e. if the > VIRTIO_NET_F_MRG_RXBUF or the VIRTIO_NET_F_HASH_REPORT are negotiated. > > To support UDP tunnel GSO offload, The tun device must be able to parse > correctly the UDP tunnel fields, so it must know at least the tun fields > offset inside the virtio_net_hdr structure. > > The new ioctl is added to pass such additional data. Then if we want to add new fields of vnet header that tun needs, new ioctls would be added endlessly. > A possible > alternative would be piggybacking the mentioned 'tunnel offset' inside > the offload 'long'. I avoid such option because it felt quite hackish, > but it you feel that as preferrable it's still doable. > > Another option would let the user-space know to tun all the relevant > negotiated features, but that look quite more invasive (I would avoid > such option at least for a PoC). Yep, for POC, the current code should be sufficient. But I meant for the formal submission, we need a better way. For example, we can map some of the virtio features to tun features (like GSO). or it looks to me the tnl_offset is a fixed value. I think we can deduce it via TUNSETVNETHDRSZ. > > > I'm asking since the design needs to work with hardware implementation > > as well. So any channel outside virtio (e.g the TUNSETTNLOFFLOAD) may > > not work. > > I've no idea about the above. My uneducated guess is that H/W > implementation will still need a S/W interface, and such interface could > implement the new ioctl(). Seems not, the only interface for virtio-net-pci devices is the registers. Looking at the qemu implementation the set tnl offset is called from the feature negotiation path. It means for hardware it can deduce it from the feature negotiation somehow. > In any case I guess a new SW and/or firmware > revision will be needed to support UDP tunnel GSO offload on virtio H/W > impl. Thanks > > /P > ^ permalink raw reply [flat|nested] 40+ messages in thread
end of thread, other threads:[~2024-10-29 7:32 UTC | newest] Thread overview: 40+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-10-04 8:13 [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni 2024-10-04 8:13 ` [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature Paolo Abeni 2024-10-09 7:18 ` Jason Wang 2024-10-09 8:37 ` Paolo Abeni 2024-10-10 3:17 ` Jason Wang 2024-10-10 7:40 ` Paolo Abeni 2024-10-11 2:08 ` Jason Wang 2024-10-11 7:50 ` Paolo Abeni 2024-10-14 7:20 ` Paolo Abeni 2024-10-17 6:47 ` Jason Wang 2024-10-17 15:34 ` Paolo Abeni 2024-10-18 4:26 ` Jason Wang 2024-10-18 10:10 ` Paolo Abeni 2024-10-20 22:28 ` Willem de Bruijn 2024-10-21 15:47 ` Paolo Abeni 2024-10-22 7:54 ` Jason Wang 2024-10-23 20:57 ` Willem de Bruijn 2024-10-25 8:41 ` Jason Wang 2024-10-21 6:54 ` Jason Wang 2024-10-21 16:27 ` Paolo Abeni 2024-10-22 7:42 ` Jason Wang 2024-10-22 16:56 ` Paolo Abeni 2024-10-25 8:28 ` Jason Wang 2024-10-25 11:50 ` Paolo Abeni 2024-10-25 13:28 ` Willem de Bruijn 2024-10-25 14:35 ` Paolo Abeni 2024-10-25 15:47 ` Willem de Bruijn 2024-10-28 3:27 ` Jason Wang 2024-10-28 12:08 ` Paolo Abeni 2024-10-28 12:26 ` Willem de Bruijn 2024-10-28 14:23 ` Paolo Abeni 2024-10-29 7:32 ` Jason Wang 2024-10-04 8:13 ` [PATCH v9 2/2] virtio-net: define UDP tunnel checksum " Paolo Abeni 2024-10-09 7:18 ` Jason Wang 2024-10-09 9:39 ` Paolo Abeni 2024-10-10 4:22 ` Jason Wang 2024-10-04 16:53 ` [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni 2024-10-09 7:24 ` Jason Wang 2024-10-09 8:08 ` Paolo Abeni 2024-10-10 2:29 ` Jason Wang
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox