From: "Michael S. Tsirkin" <mst@redhat.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: virtio-dev@lists.linux.dev, maxime.coquelin@redhat.com,
Eelco Chaudron <echaudro@redhat.com>,
Jason Wang <jasowang@redhat.com>
Subject: Re: [PATCH v3] virtio-net: define UDP tunnel offload feature
Date: Mon, 20 May 2024 22:55:42 -0400
Message-ID: <20240520225254-mutt-send-email-mst@kernel.org>
In-Reply-To: <f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com>
On Mon, May 20, 2024 at 10:24:53AM +0200, Paolo Abeni wrote:
> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> UDP tunnel. It can be negotiated on both the host and guest sides.
>
> UDP tunnel usage is ubiquitous in container deployment, and the ability
> to offload UDP encapsulated GSO traffic impacts greatly the performances
> and the CPU utilization of such use cases.
>
> One constraint addressed here is that the virtio side (either device or
> driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> completely the inner and outer headers offset - to allow for later GSO.
>
> To accommodate such need, new optional fields are introduced in the
> virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> inner_nh_offset. They map directly to the corresponding header
> information.
>
> Note that the inner transport header is implied by the (inner) checksum
> offload, if present. Otherwise, it's up to the receiver to detect the
> inner transport header offset from the provided information, as it's
> currently the case for plain (not UDP tunneled) GSO packets.
>
> The outer UDP header may carry a second checksum, which can be offloaded
> independently from the inner one. Since UDP tunnel checksum offload
> support makes little sense without UDP tunnel GSO support, to avoid
> unnecessary complex feature negotiation, the
> VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> header checksum offload and the checksum itself is handled similarly to
> the inner header one.
>
> Note that there is no concept of UDP tunnel type negotiation (e.g.
> vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> - given the information carried by the guest or host kernel, it's
> impossible to probe reliably the UDP tunnel type. Specifically, the
> outer UDP port numbers give a hint, but peers could use nonstandard
> ones.
> - all the existing UDP tunnel protocols behave the same way WRT GSO
> offload, carrying an immutable header on top of the outer transport
> one.
> - if a new UDP tunnel protocol should surface in the future with
> different constraints, the host and guest kernels will need explicit
> support for it, including new, different GSO features. Additional
> virtio support should be designed separately.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Paolo,
when you subscribed to virtio-dev, you should have received this:

When to use this list:
- questions and change proposals for Virtio drivers and devices
  implementing the specification.

When not to use this list:
- questions and change proposals for the Virtio specification,
  including the specification of basic functionality, transports and
  devices (please use the virtio-comment mailing list
  [mailto:virtio-comment@lists.linux.dev] for this).

Did you receive this? If yes, was it somehow unclear? If no, why
are you sending a Virtio specification change proposal
to virtio-dev?
> ---
> v2 -> v3:
> - UDP_TUNNEL -> UDP_TUNNEL_GSO
> - add explicit fields for the inner meta-data
> - more verbose changelog
> https://lists.oasis-open.org/archives/virtio-dev/202206/msg00026.html
>
> v1 -> v2:
> - explicitly state that the outer header probing is mandatory
> - explicitly state that GSO_UDP is not allowed with GSO_UDP_TUNNEL
> - clarify hdr_len usage
> - clarify UDP_TUNNEL_CSUM bit usage
> - fix a few typos
> https://lists.oasis-open.org/archives/virtio-dev/202205/msg00037.html
> ---
> device-types/net/description.tex | 118 ++++++++++++++++++++++++++++---
> 1 file changed, 110 insertions(+), 8 deletions(-)
>
> diff --git a/device-types/net/description.tex b/device-types/net/description.tex
> index 76585b0..2eae797 100644
> --- a/device-types/net/description.tex
> +++ b/device-types/net/description.tex
> @@ -88,6 +88,12 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
> \item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
> channel.
>
> +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO (49)] Driver can receive GSO packets
> + carried by an UDP tunnel and can handle the outer checksum.
> +
> +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO (50)] Device can receive GSO packets
> + carried by an UDP tunnel and can handle the outer checksum.
> +
> \item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash for encapsulated packets.
>
> \item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue notification coalescing.
> @@ -133,12 +139,16 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
> \item[VIRTIO_NET_F_GUEST_UFO] Requires VIRTIO_NET_F_GUEST_CSUM.
> \item[VIRTIO_NET_F_GUEST_USO4] Requires VIRTIO_NET_F_GUEST_CSUM.
> \item[VIRTIO_NET_F_GUEST_USO6] Requires VIRTIO_NET_F_GUEST_CSUM.
> +\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
> + VIRTIO_NET_F_GUEST_USO4 or VIRTIO_NET_F_GUEST_USO6.
>
> \item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM.
> \item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM.
> \item[VIRTIO_NET_F_HOST_ECN] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
> \item[VIRTIO_NET_F_HOST_UFO] Requires VIRTIO_NET_F_CSUM.
> \item[VIRTIO_NET_F_HOST_USO] Requires VIRTIO_NET_F_CSUM.
> +\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
> + VIRTIO_NET_F_GUEST_USO4 or VIRTIO_NET_F_GUEST_USO6.
>
> \item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ.
> \item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ.
> @@ -374,6 +384,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev
> segmentation/fragmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
> TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP), VIRTIO_NET_F_HOST_UFO
> (UDP fragmentation) and VIRTIO_NET_F_HOST_USO (UDP segmentation) features.
> + Additionally, it can negotiate the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature
> + to use TCP segmentation or UDP segmentation on top of UDP encapsulation,
> + respecting the other negotiated features.
>
> \item The converse features are also available: a driver can save
> the virtual device some work by negotiating these features.\note{For example, a network packet transported between two guests on
> @@ -382,8 +395,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev
> The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
> checksummed packets can be received, and if it can do that then
> the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
> - VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4
> - and VIRTIO_NET_F_GUEST_USO6 are the input equivalents of the features described above.
> + VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4,
> + VIRTIO_NET_F_GUEST_USO6 and VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
> + are the input equivalents of the features described above.
> See \ref{sec:Device Types / Network Device / Device Operation /
> Setting Up Receive Buffers}~\nameref{sec:Device Types / Network
> Device / Device Operation / Setting Up Receive Buffers} and
> @@ -407,12 +421,14 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
> #define VIRTIO_NET_HDR_F_DATA_VALID 2
> #define VIRTIO_NET_HDR_F_RSC_INFO 4
> +#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8
> u8 flags;
> #define VIRTIO_NET_HDR_GSO_NONE 0
> #define VIRTIO_NET_HDR_GSO_TCPV4 1
> #define VIRTIO_NET_HDR_GSO_UDP 3
> #define VIRTIO_NET_HDR_GSO_TCPV6 4
> #define VIRTIO_NET_HDR_GSO_UDP_L4 5
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL 0x40
> #define VIRTIO_NET_HDR_GSO_ECN 0x80
> u8 gso_type;
> le16 hdr_len;
> @@ -423,6 +439,10 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> le32 hash_value; (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> le16 hash_report; (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> le16 padding_reserved; (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
> + le16 outer_th_offset; (Only if VIRTIO_NET_F_UDP_TUNNEL_GSO negotiated)
> + le16 inner_protocol; (Only if VIRTIO_NET_F_UDP_TUNNEL_GSO negotiated)
> + le16 inner_mac_offset; (Only if VIRTIO_NET_F_UDP_TUNNEL_GSO negotiated)
> + le16 inner_nh_offset; (Only if VIRTIO_NET_F_UDP_TUNNEL_GSO negotiated)
> };
> \end{lstlisting}
>
> @@ -480,6 +500,8 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
> followed by the TCP header (with the TCP checksum field 16 bytes
> into that header). \field{csum_start} will be 14+20 = 34 (the TCP
> checksum includes the header), and \field{csum_offset} will be 16.
> +If the given packets has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit set,
> +the above checksum fields refer to the inner header checksum.
> \end{note}
>
> \item If the driver negotiated
> @@ -516,6 +538,30 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
> specifically in the protocol.}.
> \end{itemize}
>
> +\item If the driver negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature,
> + the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit in \field{gso_type} indicates that
> + the GSO protocol is encapsulated in an UDP tunnel.
> + The other tunnel-related fields indicate how to replicate the inner packet
> + header to cut it into smaller packets:
> +
> + \begin{itemize}
> + \item \field{outer_th_offset} field indicates the outer transport header within
> + the packet
> +
> + \item \field{inner_protocol} field indicates the ethernet type of the inner
> + protocol.
> +
> + \item \field{inner_mac_offset} field indicates the inner mac header within the packet
> +
> + \item \field{inner_nh_offset} field indicates the inner network header within
> + the packet
> +
> + \item If the \field{flags} field has the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM set,
> + the outer UDP checksum field carries the checksum for the UDP pseudo header
> + and the complete UDP checksum can be computed in a similar way to the
> + inner TCP.
> + \end{itemize}
> +
> \item \field{num_buffers} is set to zero. This field is unused on transmitted packets.
>
> \item The header and packet are added as one output descriptor to the
> @@ -557,6 +603,14 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
> driver MUST set the VIRTIO_NET_HDR_GSO_ECN bit in
> \field{gso_type}.
>
> +The driver MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel
> +requiring segmentation offload, unless the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is
> +negotiated, in which case the driver MUST set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL
> +bit in the \field{gso_type}.
> +
> +The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL together with
> +VIRTIO_NET_HDR_GSO_UDP.
> +
> If the VIRTIO_NET_F_CSUM feature has been negotiated, the
> driver MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
> \field{flags}, if so:
> @@ -633,6 +687,18 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
> \end{note}
> \end{itemize}
>
> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated:
> +\begin{itemize}
> +\item If the \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit set
> + the device MUST use the \field{outer_th_offset}, \field{inner_protocol},
> + \field{inner_mac_offset} and \field{inner_nh_offset} fields to
> + locate the corresponding headers inside the packet.
> +\end{itemize}
> +
> +If VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit in \field{gso_type} is not set, the
> +device MUST NOT use the \field{outer_th_offset}, \field{inner_protocol},
> +\field{inner_mac_offset} and \field{inner_nh_offset}.
> +
> If VIRTIO_NET_HDR_F_NEEDS_CSUM is not set, the device MUST NOT
> rely on the packet checksum being correct.
> \paragraph{Packet Transmission Interrupt}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt}
> @@ -727,8 +793,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> has been validated.
> \end{enumerate}
>
> -Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN
> -features enable receive checksum, large receive offload and ECN
> +Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP, UDP_TUNNEL
> +and ECN features enable receive checksum, large receive offload and ECN
> support which are the input equivalents of the transmit checksum,
> transmit segmentation offloading and ECN features, as described
> in \ref{sec:Device Types / Network Device / Device Operation /
> @@ -738,6 +804,15 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> negotiated, then \field{gso_type} MAY be something other than
> VIRTIO_NET_HDR_GSO_NONE, and \field{gso_size} field indicates the
> desired MSS (see Packet Transmission point 2).
> +\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO option was negotiated and
> + \field{gso_type} is not VIRTIO_NET_HDR_GSO_NONE, the VIRTIO_NET_HDR_GSO_UDP_TUNNEL
> + bit MAY be set. In such case the \field{outer_th_offset}, \field{inner_protocol},
> + \field{inner_mac_offset} and \field{inner_nh_offset} fields indicates corresponding
> + header information.
> + Additionally, the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the
> + \field{flags} MAY be set, indicating that the outer UDP header
> + carries the UDP pseudo header csum and that the driver can compute
> + the full UDP checksum on top of it (see Packet Transmission point 3).
> \item If the VIRTIO_NET_F_RSC_EXT option was negotiated (this
> implies one of VIRTIO_NET_F_GUEST_TSO4, TSO6), the
> device processes also duplicated ACK segments, reports
> @@ -750,8 +825,9 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> from \field{csum_start} and any preceding checksums
> have been validated. The checksum on the packet is incomplete and
> if bit VIRTIO_NET_HDR_F_RSC_INFO is not set in \field{flags},
> - then \field{csum_start} and \field{csum_offset} indicate how to calculate it
> - (see Packet Transmission point 1).
> + then \field{csum_start} and \field{csum_offset} indicate how to calculate it.
> + If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set, the \field{csum_start} field
> + refers to the inner transport header offset (see Packet Transmission point 1).
>
> \end{enumerate}
>
> @@ -800,6 +876,20 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> which have the Explicit Congestion Notification bit set, unless the
> VIRTIO_NET_F_GUEST_ECN feature is negotiated, in which case the
> device MUST set the VIRTIO_NET_HDR_GSO_ECN bit in
> +
> +The device SHOULD NOT send to the driver TCP or UDP GSO packets encapsulated in UDP
> +tunnel and requiring segmentation offload, unless the
> +VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO is negotiated, in which case the device MUST set
> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit in \field{gso_type} and MUST set the
> +\field{outer_th_offset}, \field{inner_protocol}, \field{inner_mac_offset} and
> +\field{inner_nh_offset}. If the outer UDP header carries a non 0 checksum:
> +\begin{enumerate}
> +\item the device MUST set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in
> + \field{flags}
> +\item the device MUST set the outer UDP header checksum field to the outer
> + UDP pseudo header sum
> +\end{enumerate}
> +
> \field{gso_type}.
>
> If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
> @@ -819,6 +909,12 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> fully checksummed packet;
> \end{enumerate}
>
> +\begin{note}
> +If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature is negotiated and the
> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set, the VIRTIO_NET_HDR_F_NEEDS_CSUM
> +bit refers to the inner header checksum.
> +\end{note}
> +
> If none of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
> been negotiated, the device MUST set \field{gso_type} to
> VIRTIO_NET_HDR_GSO_NONE.
> @@ -842,8 +938,9 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
> device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
> \field{flags}, if so, the device MUST validate the packet
> -checksum (in case of multiple encapsulated protocols, one level
> -of checksums is validated).
> +checksum. If the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM
> +bit in \field{flags} is also set, the device MUST additionally validate
> +the outer UDP header checksum.
>
> \drivernormative{\paragraph}{Processing of Incoming
> Packets}{Device Types / Network Device / Device Operation /
> @@ -863,6 +960,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
> This is due to various bugs in implementations.
> \end{note}
>
> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_GSO bit in \field{gso_type} is not set,
> +the driver MUST NOT use the \field{outer_th_offset}, \field{inner_protocol},
> +\field{inner_mac_offset} and \field{inner_nh_offset}.
> +
> If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor
> VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT
> rely on the packet checksum being correct.
> @@ -1624,6 +1725,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
> #define VIRTIO_NET_F_GUEST_TSO6 8
> #define VIRTIO_NET_F_GUEST_ECN 9
> #define VIRTIO_NET_F_GUEST_UFO 10
> +#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 49
> #define VIRTIO_NET_F_GUEST_USO4 54
> #define VIRTIO_NET_F_GUEST_USO6 55
>
> --
> 2.43.2
>
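For readers following the quoted patch, its gso_type rules can be
sketched in C as below. This is an illustrative sketch only, not part
of the patch or of the specification. The constants are copied from the
quoted header definition; the helper names (gso_base_type,
gso_is_udp_tunnel, gso_tunnel_combination_ok) are hypothetical. The
sketch checks only the rule the patch states explicitly:
VIRTIO_NET_HDR_GSO_UDP_TUNNEL must not be set together with
VIRTIO_NET_HDR_GSO_UDP.

```c
#include <stdbool.h>
#include <stdint.h>

/* gso_type values copied from the quoted header definition. */
#define VIRTIO_NET_HDR_GSO_NONE       0
#define VIRTIO_NET_HDR_GSO_TCPV4      1
#define VIRTIO_NET_HDR_GSO_UDP        3
#define VIRTIO_NET_HDR_GSO_TCPV6      4
#define VIRTIO_NET_HDR_GSO_UDP_L4     5
#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL 0x40
#define VIRTIO_NET_HDR_GSO_ECN        0x80

/* Hypothetical helper: strip the modifier bits (UDP_TUNNEL, ECN),
 * leaving only the base GSO type. */
static inline uint8_t gso_base_type(uint8_t gso_type)
{
    return (uint8_t)(gso_type & ~(VIRTIO_NET_HDR_GSO_UDP_TUNNEL |
                                  VIRTIO_NET_HDR_GSO_ECN));
}

/* Hypothetical helper: true if the packet is flagged as GSO over a
 * UDP tunnel, i.e. the tunnel-related header fields may be consulted. */
static inline bool gso_is_udp_tunnel(uint8_t gso_type)
{
    return (gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL) != 0;
}

/* Hypothetical helper: enforce the one combination rule the patch
 * states explicitly, namely that GSO_UDP_TUNNEL must not be combined
 * with GSO_UDP. Other combinations are not judged here. */
static inline bool gso_tunnel_combination_ok(uint8_t gso_type)
{
    if (!gso_is_udp_tunnel(gso_type))
        return true; /* rule only applies when the tunnel bit is set */
    return gso_base_type(gso_type) != VIRTIO_NET_HDR_GSO_UDP;
}
```

A receiver would call gso_is_udp_tunnel() before reading
outer_th_offset, inner_protocol, inner_mac_offset or inner_nh_offset,
since the patch forbids using those fields when the bit is clear.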