* [PATCH v4] virtio-net: define UDP tunnel offload feature
@ 2024-05-23 7:20 Paolo Abeni
2024-05-23 7:50 ` Paolo Abeni
2024-05-23 8:47 ` Paolo Abeni
0 siblings, 2 replies; 22+ messages in thread
From: Paolo Abeni @ 2024-05-23 7:20 UTC (permalink / raw)
To: virtio-comment
Cc: maxime.coquelin, Eelco Chaudron, Jason Wang, Stefano Garzarella
The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
UDP tunnel. It can be negotiated on both the host and guest sides.
UDP tunnel usage is ubiquitous in container deployment, and the ability
to offload UDP encapsulated GSO traffic impacts greatly the performances
and the CPU utilization of such use cases.
One constraint addressed here is that the virtio side (either device or
driver) receiving a UDP tunneled GSO packet must be able to reconstruct
completely the inner and outer headers offset - to allow for later GSO.
To accommodate such need, new optional fields are introduced in the
virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
inner_nh_offset. They map directly to the corresponding header
information.
Note that the inner transport header is implied by the (inner) checksum
offload, if present. Otherwise, it's up to the receiver to detect the
inner transport header offset from the provided information, as it's
currently the case for plain (not UDP tunneled) GSO packets.
The outer UDP header may carry a second checksum, which can be offloaded
independently from the inner one. Since UDP tunnel checksum offload
support makes little sense without UDP tunnel GSO support, to avoid
unnecessary complex feature negotiation, the
VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
header checksum offload and the checksum itself is handled similarly to
the inner header one.
Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
types of GSO offload: VMs that has negotiated such features could be
migrated only on hosts supporting such features.
Note that there is no concept of UDP tunnel type negotiation (e.g.
vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
- given the information carried by the guest or host kernel, it's
impossible to probe reliably the UDP tunnel type. Specifically, the
outer UDP port numbers give a hint, but peers could use nonstandard
ones.
- all the existing UDP tunnel protocols behave the same way WRT GSO
offload, carrying an immutable header on top of the outer transport
one.
- if a new UDP tunnel protocol, carring packet dependant information
alike the inner packet length or csum should surface in the future,
the host and guest kernels will need explicit support for it,
including new, different GSO features. Additional virtio support
should be designed separately.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v3 -> v4:
- new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or
F_GUEST_UDP_TUNNEL_GSO (Stefano)
- Extended the changelog to answer Jason's questions.
- Clarified outer csum handling (Jason)
https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com
v2 -> v3:
- UDP_TUNNEL -> UDP_TUNNEL_GSO
- add explicit fields for the inner meta-data
- more verbose changelog
https://lists.oasis-open.org/archives/virtio-dev/202206/msg00026.html
v1 -> v2:
- explicitly state that the outer header probing is mandatory
- explicitly state that GSO_UDP is not allowed with GSO_UDP_TUNNEL
- clarify hdr_len usage
- clarify UDP_TUNNEL_CSUM bit usage
- fix a few typos
https://lists.oasis-open.org/archives/virtio-dev/202205/msg00037.html
---
device-types/net/description.tex | 118 ++++++++++++++++++++++++++++---
1 file changed, 110 insertions(+), 8 deletions(-)
diff --git a/device-types/net/description.tex b/device-types/net/description.tex
index 76585b0..2c8214f 100644
--- a/device-types/net/description.tex
+++ b/device-types/net/description.tex
@@ -88,6 +88,12 @@ \subsection{Feature bits}\label{sec:Device Types / Network Device / Feature bits
\item[VIRTIO_NET_F_CTRL_MAC_ADDR(23)] Set MAC address through control
channel.
+\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO (49)] Driver can receive GSO packets
+ carried by an UDP tunnel and can handle the outer checksum.
+
+\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO (50)] Device can receive GSO packets
+ carried by an UDP tunnel and can handle the outer checksum.
+
\item[VIRTIO_NET_F_HASH_TUNNEL(51)] Device supports inner header hash for encapsulated packets.
\item[VIRTIO_NET_F_VQ_NOTF_COAL(52)] Device supports virtqueue notification coalescing.
@@ -133,12 +139,16 @@ \subsubsection{Feature bit requirements}\label{sec:Device Types / Network Device
\item[VIRTIO_NET_F_GUEST_UFO] Requires VIRTIO_NET_F_GUEST_CSUM.
\item[VIRTIO_NET_F_GUEST_USO4] Requires VIRTIO_NET_F_GUEST_CSUM.
\item[VIRTIO_NET_F_GUEST_USO6] Requires VIRTIO_NET_F_GUEST_CSUM.
+\item[VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
+ VIRTIO_NET_F_GUEST_USO4 or VIRTIO_NET_F_GUEST_USO6.
\item[VIRTIO_NET_F_HOST_TSO4] Requires VIRTIO_NET_F_CSUM.
\item[VIRTIO_NET_F_HOST_TSO6] Requires VIRTIO_NET_F_CSUM.
\item[VIRTIO_NET_F_HOST_ECN] Requires VIRTIO_NET_F_HOST_TSO4 or VIRTIO_NET_F_HOST_TSO6.
\item[VIRTIO_NET_F_HOST_UFO] Requires VIRTIO_NET_F_CSUM.
\item[VIRTIO_NET_F_HOST_USO] Requires VIRTIO_NET_F_CSUM.
+\item[VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO] Requires VIRTIO_NET_F_HOST_TSO4, VIRTIO_NET_F_HOST_TSO6,
+ VIRTIO_NET_F_GUEST_USO4 or VIRTIO_NET_F_GUEST_USO6.
\item[VIRTIO_NET_F_CTRL_RX] Requires VIRTIO_NET_F_CTRL_VQ.
\item[VIRTIO_NET_F_CTRL_VLAN] Requires VIRTIO_NET_F_CTRL_VQ.
@@ -374,6 +384,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev
segmentation/fragmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4
TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP), VIRTIO_NET_F_HOST_UFO
(UDP fragmentation) and VIRTIO_NET_F_HOST_USO (UDP segmentation) features.
+ Additionally, it can negotiate the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature
+ to use TCP segmentation or UDP segmentation on top of UDP encapsulation,
+ respecting the other negotiated features.
\item The converse features are also available: a driver can save
the virtual device some work by negotiating these features.\note{For example, a network packet transported between two guests on
@@ -382,8 +395,9 @@ \subsection{Device Initialization}\label{sec:Device Types / Network Device / Dev
The VIRTIO_NET_F_GUEST_CSUM feature indicates that partially
checksummed packets can be received, and if it can do that then
the VIRTIO_NET_F_GUEST_TSO4, VIRTIO_NET_F_GUEST_TSO6,
- VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4
- and VIRTIO_NET_F_GUEST_USO6 are the input equivalents of the features described above.
+ VIRTIO_NET_F_GUEST_UFO, VIRTIO_NET_F_GUEST_ECN, VIRTIO_NET_F_GUEST_USO4,
+ VIRTIO_NET_F_GUEST_USO6 and VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO
+ are the input equivalents of the features described above.
See \ref{sec:Device Types / Network Device / Device Operation /
Setting Up Receive Buffers}~\nameref{sec:Device Types / Network
Device / Device Operation / Setting Up Receive Buffers} and
@@ -407,12 +421,14 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
#define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
#define VIRTIO_NET_HDR_F_DATA_VALID 2
#define VIRTIO_NET_HDR_F_RSC_INFO 4
+#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8
u8 flags;
#define VIRTIO_NET_HDR_GSO_NONE 0
#define VIRTIO_NET_HDR_GSO_TCPV4 1
#define VIRTIO_NET_HDR_GSO_UDP 3
#define VIRTIO_NET_HDR_GSO_TCPV6 4
#define VIRTIO_NET_HDR_GSO_UDP_L4 5
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL 0x40
#define VIRTIO_NET_HDR_GSO_ECN 0x80
u8 gso_type;
le16 hdr_len;
@@ -423,6 +439,10 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
le32 hash_value; (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
le16 hash_report; (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
le16 padding_reserved; (Only if VIRTIO_NET_F_HASH_REPORT negotiated)
+ le16 outer_th_offset; (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated)
+ le16 inner_protocol; (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated)
+ le16 inner_mac_offset; (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated)
+ le16 inner_nh_offset; (Only if VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO or VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO negotiated)
};
\end{lstlisting}
@@ -480,6 +500,8 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
followed by the TCP header (with the TCP checksum field 16 bytes
into that header). \field{csum_start} will be 14+20 = 34 (the TCP
checksum includes the header), and \field{csum_offset} will be 16.
+If the given packets has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit set,
+the above checksum fields refer to the inner header checksum.
\end{note}
\item If the driver negotiated
@@ -516,6 +538,30 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
specifically in the protocol.}.
\end{itemize}
+\item If the driver negotiated the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO feature,
+ the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit in \field{gso_type} indicates that
+ the GSO protocol is encapsulated in an UDP tunnel.
+ The other tunnel-related fields indicate how to replicate the inner packet
+ header to cut it into smaller packets:
+
+ \begin{itemize}
+ \item \field{outer_th_offset} field indicates the outer transport header within
+ the packet
+
+ \item \field{inner_protocol} field indicates the ethernet type of the inner
+ protocol.
+
+ \item \field{inner_mac_offset} field indicates the inner mac header within the packet
+
+ \item \field{inner_nh_offset} field indicates the inner network header within
+ the packet
+
+ \item If the \field{flags} field has the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM set,
+ the outer UDP checksum field carries the checksum for the UDP pseudo header
+ and the complete UDP checksum can be computed in a similar way to the
+ inner TCP.
+ \end{itemize}
+
\item \field{num_buffers} is set to zero. This field is unused on transmitted packets.
\item The header and packet are added as one output descriptor to the
@@ -557,6 +603,14 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
driver MUST set the VIRTIO_NET_HDR_GSO_ECN bit in
\field{gso_type}.
+The driver MUST NOT send to the device TCP or UDP GSO packets over UDP tunnel
+requiring segmentation offload, unless the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO is
+negotiated, in which case the driver MUST set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL
+bit in the \field{gso_type}.
+
+The driver MUST NOT set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL together with
+VIRTIO_NET_HDR_GSO_UDP.
+
If the VIRTIO_NET_F_CSUM feature has been negotiated, the
driver MAY set the VIRTIO_NET_HDR_F_NEEDS_CSUM bit in
\field{flags}, if so:
@@ -633,6 +687,18 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
\end{note}
\end{itemize}
+If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated:
+\begin{itemize}
+\item If the \field{gso_type} has the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit set
+ the device MUST use the \field{outer_th_offset}, \field{inner_protocol},
+ \field{inner_mac_offset} and \field{inner_nh_offset} fields to
+ locate the corresponding headers inside the packet.
+\end{itemize}
+
+If VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit in \field{gso_type} is not set, the
+device MUST NOT use the \field{outer_th_offset}, \field{inner_protocol},
+\field{inner_mac_offset} and \field{inner_nh_offset}.
+
If VIRTIO_NET_HDR_F_NEEDS_CSUM is not set, the device MUST NOT
rely on the packet checksum being correct.
\paragraph{Packet Transmission Interrupt}\label{sec:Device Types / Network Device / Device Operation / Packet Transmission / Packet Transmission Interrupt}
@@ -727,8 +793,8 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
has been validated.
\end{enumerate}
-Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP and ECN
-features enable receive checksum, large receive offload and ECN
+Additionally, VIRTIO_NET_F_GUEST_CSUM, TSO4, TSO6, UDP, UDP_TUNNEL
+and ECN features enable receive checksum, large receive offload and ECN
support which are the input equivalents of the transmit checksum,
transmit segmentation offloading and ECN features, as described
in \ref{sec:Device Types / Network Device / Device Operation /
@@ -738,6 +804,15 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
negotiated, then \field{gso_type} MAY be something other than
VIRTIO_NET_HDR_GSO_NONE, and \field{gso_size} field indicates the
desired MSS (see Packet Transmission point 2).
+\item If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO option was negotiated and
+ \field{gso_type} is not VIRTIO_NET_HDR_GSO_NONE, the VIRTIO_NET_HDR_GSO_UDP_TUNNEL
+ bit MAY be set. In such case the \field{outer_th_offset}, \field{inner_protocol},
+ \field{inner_mac_offset} and \field{inner_nh_offset} fields indicates corresponding
+ header information.
+ Additionally, the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in the
+ \field{flags} MAY be set, indicating that the outer UDP header
+ carries the UDP pseudo header csum and that the driver can compute
+ the full UDP checksum on top of it (see Packet Transmission point 3).
\item If the VIRTIO_NET_F_RSC_EXT option was negotiated (this
implies one of VIRTIO_NET_F_GUEST_TSO4, TSO6), the
device processes also duplicated ACK segments, reports
@@ -750,8 +825,9 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
from \field{csum_start} and any preceding checksums
have been validated. The checksum on the packet is incomplete and
if bit VIRTIO_NET_HDR_F_RSC_INFO is not set in \field{flags},
- then \field{csum_start} and \field{csum_offset} indicate how to calculate it
- (see Packet Transmission point 1).
+ then \field{csum_start} and \field{csum_offset} indicate how to calculate it.
+ If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set, the \field{csum_start} field
+ refers to the inner transport header offset (see Packet Transmission point 1).
\end{enumerate}
@@ -800,6 +876,20 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
which have the Explicit Congestion Notification bit set, unless the
VIRTIO_NET_F_GUEST_ECN feature is negotiated, in which case the
device MUST set the VIRTIO_NET_HDR_GSO_ECN bit in
+
+The device SHOULD NOT send to the driver TCP or UDP GSO packets encapsulated in UDP
+tunnel and requiring segmentation offload, unless the
+VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO is negotiated, in which case the device MUST set
+the VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit in \field{gso_type} and MUST set the
+\field{outer_th_offset}, \field{inner_protocol}, \field{inner_mac_offset} and
+\field{inner_nh_offset}. If the outer UDP header carries a non 0 checksum:
+\begin{enumerate}
+\item the device MUST set the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM bit in
+ \field{flags}
+\item the device MUST set the outer UDP header checksum field to the outer
+ UDP pseudo header sum
+\end{enumerate}
+
\field{gso_type}.
If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
@@ -819,6 +909,12 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
fully checksummed packet;
\end{enumerate}
+\begin{note}
+If the VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO feature is negotiated and the
+VIRTIO_NET_HDR_GSO_UDP_TUNNEL bit is set, the VIRTIO_NET_HDR_F_NEEDS_CSUM
+bit refers to the inner header checksum.
+\end{note}
+
If none of the VIRTIO_NET_F_GUEST_TSO4, TSO6, UFO, USO4 or USO6 options have
been negotiated, the device MUST set \field{gso_type} to
VIRTIO_NET_HDR_GSO_NONE.
@@ -842,8 +938,9 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
If the VIRTIO_NET_F_GUEST_CSUM feature has been negotiated, the
device MAY set the VIRTIO_NET_HDR_F_DATA_VALID bit in
\field{flags}, if so, the device MUST validate the packet
-checksum (in case of multiple encapsulated protocols, one level
-of checksums is validated).
+checksum. If the VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM
+bit in \field{flags} is also set, the device MUST additionally validate
+the outer UDP header checksum.
\drivernormative{\paragraph}{Processing of Incoming
Packets}{Device Types / Network Device / Device Operation /
@@ -863,6 +960,10 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
This is due to various bugs in implementations.
\end{note}
+If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_GSO bit in \field{gso_type} is not set,
+the driver MUST NOT use the \field{outer_th_offset}, \field{inner_protocol},
+\field{inner_mac_offset} and \field{inner_nh_offset}.
+
If neither VIRTIO_NET_HDR_F_NEEDS_CSUM nor
VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT
rely on the packet checksum being correct.
@@ -1624,6 +1725,7 @@ \subsubsection{Control Virtqueue}\label{sec:Device Types / Network Device / Devi
#define VIRTIO_NET_F_GUEST_TSO6 8
#define VIRTIO_NET_F_GUEST_ECN 9
#define VIRTIO_NET_F_GUEST_UFO 10
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 49
#define VIRTIO_NET_F_GUEST_USO4 54
#define VIRTIO_NET_F_GUEST_USO6 55
--
2.43.2
^ permalink raw reply related [flat|nested] 22+ messages in thread* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-23 7:20 [PATCH v4] virtio-net: define UDP tunnel offload feature Paolo Abeni
@ 2024-05-23 7:50 ` Paolo Abeni
2024-05-23 8:47 ` Paolo Abeni
1 sibling, 0 replies; 22+ messages in thread
From: Paolo Abeni @ 2024-05-23 7:50 UTC (permalink / raw)
To: virtio-comment
Cc: maxime.coquelin, Eelco Chaudron, Jason Wang, Stefano Garzarella
On Thu, 2024-05-23 at 09:20 +0200, Paolo Abeni wrote:
> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> UDP tunnel. It can be negotiated on both the host and guest sides.
>
> UDP tunnel usage is ubiquitous in container deployment, and the ability
> to offload UDP encapsulated GSO traffic impacts greatly the performances
> and the CPU utilization of such use cases.
>
> One constraint addressed here is that the virtio side (either device or
> driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> completely the inner and outer headers offset - to allow for later GSO.
>
> To accommodate such need, new optional fields are introduced in the
> virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> inner_nh_offset. They map directly to the corresponding header
> information.
>
> Note that the inner transport header is implied by the (inner) checksum
> offload, if present. Otherwise, it's up to the receiver to detect the
> inner transport header offset from the provided information, as it's
> currently the case for plain (not UDP tunneled) GSO packets.
>
> The outer UDP header may carry a second checksum, which can be offloaded
> independently from the inner one. Since UDP tunnel checksum offload
> support makes little sense without UDP tunnel GSO support, to avoid
> unnecessary complex feature negotiation, the
> VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> header checksum offload and the checksum itself is handled similarly to
> the inner header one.
>
> Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
> types of GSO offload: VMs that has negotiated such features could be
> migrated only on hosts supporting such features.
>
> Note that there is no concept of UDP tunnel type negotiation (e.g.
> vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> - given the information carried by the guest or host kernel, it's
> impossible to probe reliably the UDP tunnel type. Specifically, the
> outer UDP port numbers give a hint, but peers could use nonstandard
> ones.
> - all the existing UDP tunnel protocols behave the same way WRT GSO
> offload, carrying an immutable header on top of the outer transport
> one.
> - if a new UDP tunnel protocol, carring packet dependant information
> alike the inner packet length or csum should surface in the future,
> the host and guest kernels will need explicit support for it,
> including new, different GSO features. Additional virtio support
> should be designed separately.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v3 -> v4:
> - new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or
> F_GUEST_UDP_TUNNEL_GSO (Stefano)
> - Extended the changelog to answer Jason's questions.
> - Clarified outer csum handling (Jason)
> https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com
Overall only minor edit over v3, just to restart the conversation.
@Jason: I don't understand a question you raised on v3:
"""
1) if there's no concept of UDP tunnel type negotiation, would it be
possible for a device to just implement vxlan?
"""
Specifically, I don't understand what do you mean with 'just implement
vxlan': there is no concept of vxlan-specific GSO offload in Linux, all
UDP tunnels are exactly the same thing from GSO perspective.
e.g. see:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/ice/ice_txrx.c#L2061
Could you please re-phrase?
> @@ -407,12 +421,14 @@ \subsection{Device Operation}\label{sec:Device Types / Network Device / Device O
> #define VIRTIO_NET_HDR_F_NEEDS_CSUM 1
> #define VIRTIO_NET_HDR_F_DATA_VALID 2
> #define VIRTIO_NET_HDR_F_RSC_INFO 4
> +#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8
> u8 flags;
> #define VIRTIO_NET_HDR_GSO_NONE 0
> #define VIRTIO_NET_HDR_GSO_TCPV4 1
> #define VIRTIO_NET_HDR_GSO_UDP 3
> #define VIRTIO_NET_HDR_GSO_TCPV6 4
> #define VIRTIO_NET_HDR_GSO_UDP_L4 5
> +#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL 0x40
Note that we can't use '6' here, because GSO_UDP_TUNNEL is a 'flag'
that can be set together with other offloads e.g.
VIRTIO_NET_HDR_GSO_TCPV4 or VIRTIO_NET_HDR_GSO_TCPV6.
Quite alike VIRTIO_NET_HDR_GSO_ECN, I think.
Note that GSO_UDP_TUNNEL could additionally be set together with
VIRTIO_NET_HDR_GSO_UDP_L4.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-23 7:20 [PATCH v4] virtio-net: define UDP tunnel offload feature Paolo Abeni
2024-05-23 7:50 ` Paolo Abeni
@ 2024-05-23 8:47 ` Paolo Abeni
2024-05-27 3:26 ` Jason Wang
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-05-23 8:47 UTC (permalink / raw)
To: virtio-comment
Cc: maxime.coquelin, Eelco Chaudron, Jason Wang, Stefano Garzarella
On Thu, 2024-05-23 at 09:20 +0200, Paolo Abeni wrote:
> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> UDP tunnel. It can be negotiated on both the host and guest sides.
>
> UDP tunnel usage is ubiquitous in container deployment, and the ability
> to offload UDP encapsulated GSO traffic impacts greatly the performances
> and the CPU utilization of such use cases.
>
> One constraint addressed here is that the virtio side (either device or
> driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> completely the inner and outer headers offset - to allow for later GSO.
>
> To accommodate such need, new optional fields are introduced in the
> virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> inner_nh_offset. They map directly to the corresponding header
> information.
>
> Note that the inner transport header is implied by the (inner) checksum
> offload, if present. Otherwise, it's up to the receiver to detect the
> inner transport header offset from the provided information, as it's
> currently the case for plain (not UDP tunneled) GSO packets.
>
> The outer UDP header may carry a second checksum, which can be offloaded
> independently from the inner one. Since UDP tunnel checksum offload
> support makes little sense without UDP tunnel GSO support, to avoid
> unnecessary complex feature negotiation, the
> VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> header checksum offload and the checksum itself is handled similarly to
> the inner header one.
>
> Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
> types of GSO offload: VMs that has negotiated such features could be
> migrated only on hosts supporting such features.
>
> Note that there is no concept of UDP tunnel type negotiation (e.g.
> vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> - given the information carried by the guest or host kernel, it's
> impossible to probe reliably the UDP tunnel type. Specifically, the
> outer UDP port numbers give a hint, but peers could use nonstandard
> ones.
> - all the existing UDP tunnel protocols behave the same way WRT GSO
> offload, carrying an immutable header on top of the outer transport
> one.
> - if a new UDP tunnel protocol, carring packet dependant information
> alike the inner packet length or csum should surface in the future,
> the host and guest kernels will need explicit support for it,
> including new, different GSO features. Additional virtio support
> should be designed separately.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> v3 -> v4:
> - new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or
> F_GUEST_UDP_TUNNEL_GSO (Stefano)
> - Extended the changelog to answer Jason's questions.
> - Clarified outer csum handling (Jason)
> https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com
Overall only minor edit over v3, just to restart the conversation.
@Jason: I don't understand a question you raised on v3:
"""
1) if there's no concept of UDP tunnel type negotiation, would it be
possible for a device to just implement vxlan?
"""
Specifically, I don't understand what do you mean with 'just implement
vxlan': there is no concept of vxlan-specific GSO offload in Linux, all
UDP tunnels are exactly the same thing from GSO perspective.
e.g. see:
https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/ice/ice_txrx.c#L2061
Could you please re-phrase?
Thanks!
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-23 8:47 ` Paolo Abeni
@ 2024-05-27 3:26 ` Jason Wang
2024-05-27 8:06 ` Paolo Abeni
2024-05-30 15:08 ` Willem de Bruijn
0 siblings, 2 replies; 22+ messages in thread
From: Jason Wang @ 2024-05-27 3:26 UTC (permalink / raw)
To: Paolo Abeni
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Thu, May 23, 2024 at 4:47 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2024-05-23 at 09:20 +0200, Paolo Abeni wrote:
> > The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> > UDP tunnel. It can be negotiated on both the host and guest sides.
> >
> > UDP tunnel usage is ubiquitous in container deployment, and the ability
> > to offload UDP encapsulated GSO traffic impacts greatly the performances
> > and the CPU utilization of such use cases.
> >
> > One constraint addressed here is that the virtio side (either device or
> > driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> > completely the inner and outer headers offset - to allow for later GSO.
> >
> > To accommodate such need, new optional fields are introduced in the
> > virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> > inner_nh_offset. They map directly to the corresponding header
> > information.
> >
> > Note that the inner transport header is implied by the (inner) checksum
> > offload, if present. Otherwise, it's up to the receiver to detect the
> > inner transport header offset from the provided information, as it's
> > currently the case for plain (not UDP tunneled) GSO packets.
> >
> > The outer UDP header may carry a second checksum, which can be offloaded
> > independently from the inner one. Since UDP tunnel checksum offload
> > support makes little sense without UDP tunnel GSO support, to avoid
> > unnecessary complex feature negotiation, the
> > VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> > header checksum offload and the checksum itself is handled similarly to
> > the inner header one.
> >
> > Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
> > types of GSO offload: VMs that has negotiated such features could be
> > migrated only on hosts supporting such features.
> >
> > Note that there is no concept of UDP tunnel type negotiation (e.g.
> > vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> > - given the information carried by the guest or host kernel, it's
> > impossible to probe reliably the UDP tunnel type. Specifically, the
> > outer UDP port numbers give a hint, but peers could use nonstandard
> > ones.
> > - all the existing UDP tunnel protocols behave the same way WRT GSO
> > offload, carrying an immutable header on top of the outer transport
> > one.
> > - if a new UDP tunnel protocol, carring packet dependant information
> > alike the inner packet length or csum should surface in the future,
> > the host and guest kernels will need explicit support for it,
> > including new, different GSO features. Additional virtio support
> > should be designed separately.
> >
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > v3 -> v4:
> > - new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or
> > F_GUEST_UDP_TUNNEL_GSO (Stefano)
> > - Extended the changelog to answer Jason's questions.
> > - Clarified outer csum handling (Jason)
> > https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com
>
> Overall only minor edit over v3, just to restart the conversation.
>
> @Jason: I don't understand a question you raised on v3:
>
> """
> 1) if there's no concept of UDP tunnel type negotiation, would it be
> possible for a device to just implement vxlan?
> """
>
> Specifically, I don't understand what do you mean with 'just implement
> vxlan': there is no concept of vxlan-specific GSO offload in Linux, all
> UDP tunnels are exactly the same thing from GSO perspective.
>
> e.g. see:
>
> https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/ice/ice_txrx.c#L2061
>
> Could you please re-phrase?
I meant, virtio is not necessarily for a software implementation.
Various hardware vendors started to implement it as well.
I wonder if hardware is as flexible as Linux to support general UDP
tunnel offload.
Thanks
>
> Thanks!
>
> Paolo
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-27 3:26 ` Jason Wang
@ 2024-05-27 8:06 ` Paolo Abeni
2024-05-28 2:57 ` Jason Wang
2024-05-30 15:08 ` Willem de Bruijn
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-05-27 8:06 UTC (permalink / raw)
To: Jason Wang
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Mon, 2024-05-27 at 11:26 +0800, Jason Wang wrote:
> On Thu, May 23, 2024 at 4:47 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Thu, 2024-05-23 at 09:20 +0200, Paolo Abeni wrote:
> > > The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> > > UDP tunnel. It can be negotiated on both the host and guest sides.
> > >
> > > UDP tunnel usage is ubiquitous in container deployment, and the ability
> > > to offload UDP encapsulated GSO traffic impacts greatly the performances
> > > and the CPU utilization of such use cases.
> > >
> > > One constraint addressed here is that the virtio side (either device or
> > > driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> > > completely the inner and outer headers offset - to allow for later GSO.
> > >
> > > To accommodate such need, new optional fields are introduced in the
> > > virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> > > inner_nh_offset. They map directly to the corresponding header
> > > information.
> > >
> > > Note that the inner transport header is implied by the (inner) checksum
> > > offload, if present. Otherwise, it's up to the receiver to detect the
> > > inner transport header offset from the provided information, as it's
> > > currently the case for plain (not UDP tunneled) GSO packets.
> > >
> > > The outer UDP header may carry a second checksum, which can be offloaded
> > > independently from the inner one. Since UDP tunnel checksum offload
> > > support makes little sense without UDP tunnel GSO support, to avoid
> > > unnecessary complex feature negotiation, the
> > > VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> > > header checksum offload and the checksum itself is handled similarly to
> > > the inner header one.
> > >
> > > Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
> > > types of GSO offload: VMs that has negotiated such features could be
> > > migrated only on hosts supporting such features.
> > >
> > > Note that there is no concept of UDP tunnel type negotiation (e.g.
> > > vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> > > - given the information carried by the guest or host kernel, it's
> > > impossible to probe reliably the UDP tunnel type. Specifically, the
> > > outer UDP port numbers give a hint, but peers could use nonstandard
> > > ones.
> > > - all the existing UDP tunnel protocols behave the same way WRT GSO
> > > offload, carrying an immutable header on top of the outer transport
> > > one.
> > > - if a new UDP tunnel protocol, carring packet dependant information
> > > alike the inner packet length or csum should surface in the future,
> > > the host and guest kernels will need explicit support for it,
> > > including new, different GSO features. Additional virtio support
> > > should be designed separately.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > v3 -> v4:
> > > - new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or
> > > F_GUEST_UDP_TUNNEL_GSO (Stefano)
> > > - Extended the changelog to answer Jason's questions.
> > > - Clarified outer csum handling (Jason)
> > > https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com
> >
> > Overall only minor edit over v3, just to restart the conversation.
> >
> > @Jason: I don't understand a question you raised on v3:
> >
> > """
> > 1) if there's no concept of UDP tunnel type negotiation, would it be
> > possible for a device to just implement vxlan?
> > """
> >
> > Specifically, I don't understand what do you mean with 'just implement
> > vxlan': there is no concept of vxlan-specific GSO offload in Linux, all
> > UDP tunnels are exactly the same thing from GSO perspective.
> >
> > e.g. see:
> >
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/ice/ice_txrx.c#L2061
> >
> > Could you please re-phrase?
>
> I meant, virtio is not necessarily for a software implementation.
> Various hardware vendors started to implement it as well.
Note that the above links refers to an HW offload implementation for
TCP over UDP tunnel TSO support.
> I wonder if hardware is as flexible as Linux to support general UDP
> tunnel offload.
I don't have additional information WRT HW specific limitations.
Perhaps the vendors (nvidia? intel? who else?) could chime here to
specify such limitation if any.
How is that kind of scenario managed with plain TCP TSO for virtio?
e.g. let's suppose that some virtio H/W only supports a limited number
of segments per TSO packet.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-27 8:06 ` Paolo Abeni
@ 2024-05-28 2:57 ` Jason Wang
2024-05-28 17:47 ` Paolo Abeni
0 siblings, 1 reply; 22+ messages in thread
From: Jason Wang @ 2024-05-28 2:57 UTC (permalink / raw)
To: Paolo Abeni
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Mon, May 27, 2024 at 4:06 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2024-05-27 at 11:26 +0800, Jason Wang wrote:
> > On Thu, May 23, 2024 at 4:47 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Thu, 2024-05-23 at 09:20 +0200, Paolo Abeni wrote:
> > > > The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> > > > UDP tunnel. It can be negotiated on both the host and guest sides.
> > > >
> > > > UDP tunnel usage is ubiquitous in container deployment, and the ability
> > > > to offload UDP encapsulated GSO traffic impacts greatly the performances
> > > > and the CPU utilization of such use cases.
> > > >
> > > > One constraint addressed here is that the virtio side (either device or
> > > > driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> > > > completely the inner and outer headers offset - to allow for later GSO.
> > > >
> > > > To accommodate such need, new optional fields are introduced in the
> > > > virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> > > > inner_nh_offset. They map directly to the corresponding header
> > > > information.
> > > >
> > > > Note that the inner transport header is implied by the (inner) checksum
> > > > offload, if present. Otherwise, it's up to the receiver to detect the
> > > > inner transport header offset from the provided information, as it's
> > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > >
> > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > independently from the inner one. Since UDP tunnel checksum offload
> > > > support makes little sense without UDP tunnel GSO support, to avoid
> > > > unnecessary complex feature negotiation, the
> > > > VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> > > > header checksum offload and the checksum itself is handled similarly to
> > > > the inner header one.
> > > >
> > > > Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
> > > > types of GSO offload: VMs that has negotiated such features could be
> > > > migrated only on hosts supporting such features.
> > > >
> > > > Note that there is no concept of UDP tunnel type negotiation (e.g.
> > > > vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> > > > - given the information carried by the guest or host kernel, it's
> > > > impossible to probe reliably the UDP tunnel type. Specifically, the
> > > > outer UDP port numbers give a hint, but peers could use nonstandard
> > > > ones.
> > > > - all the existing UDP tunnel protocols behave the same way WRT GSO
> > > > offload, carrying an immutable header on top of the outer transport
> > > > one.
> > > > - if a new UDP tunnel protocol, carring packet dependant information
> > > > alike the inner packet length or csum should surface in the future,
> > > > the host and guest kernels will need explicit support for it,
> > > > including new, different GSO features. Additional virtio support
> > > > should be designed separately.
> > > >
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > > v3 -> v4:
> > > > - new virtio_net_hdr fields depends on F_HOST_UDP_TUNNEL_GSO or
> > > > F_GUEST_UDP_TUNNEL_GSO (Stefano)
> > > > - Extended the changelog to answer Jason's questions.
> > > > - Clarified outer csum handling (Jason)
> > > > https://lore.kernel.org/virtio-dev/f0bf2db203f8061a3ecb50b7a48cf26d8f3c7f68.1716193086.git.pabeni@redhat.com
> > >
> > > Overall only minor edit over v3, just to restart the conversation.
> > >
> > > @Jason: I don't understand a question you raised on v3:
> > >
> > > """
> > > 1) if there's no concept of UDP tunnel type negotiation, would it be
> > > possible for a device to just implement vxlan?
> > > """
> > >
> > > Specifically, I don't understand what do you mean with 'just implement
> > > vxlan': there is no concept of vxlan-specific GSO offload in Linux, all
> > > UDP tunnels are exactly the same thing from GSO perspective.
> > >
> > > e.g. see:
> > >
> > > https://elixir.bootlin.com/linux/latest/source/drivers/net/ethernet/intel/ice/ice_txrx.c#L2061
> > >
> > > Could you please re-phrase?
> >
> > I meant, virtio is not necessarily for a software implementation.
> > Various hardware vendors started to implement it as well.
>
> Note that the above links refers to an HW offload implementation for
> TCP over UDP tunnel TSO support.
>
> > I wonder if hardware is as flexible as Linux to support general UDP
> > tunnel offload.
>
> I don't have additional information WRT HW specific limitations.
>
> Perhaps the vendors (nvidia? intel? who else?) could chime here to
> specify such limitation if any.
>
> How is that kind of scenario managed with plain TCP TSO for virtio?
It's a problem indeed which needs to be addressed (and it is required
for big TCP).
> e.g. let's suppose that some virtio H/W only supports a limited number
> of segments per TSO packet.
A similar problem like this:
https://lore.kernel.org/lkml/20240411051124.386817-1-yuri.benditovich@daynix.com/T/
I think this approach is problematic as it causes migration
compatibility issues.
(Unfortunately, it is applied and for some reason my comment doesn't
reach the list ...)
Thanks
>
> Thanks,
>
> Paolo
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-28 2:57 ` Jason Wang
@ 2024-05-28 17:47 ` Paolo Abeni
2024-05-29 0:30 ` Jason Wang
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-05-28 17:47 UTC (permalink / raw)
To: Jason Wang
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Tue, 2024-05-28 at 10:57 +0800, Jason Wang wrote:
> On Mon, May 27, 2024 at 4:06 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Mon, 2024-05-27 at 11:26 +0800, Jason Wang wrote:
> > > I meant, virtio is not necessarily for a software implementation.
> > > Various hardware vendors started to implement it as well.
> >
> > Note that the above links refers to an HW offload implementation for
> > TCP over UDP tunnel TSO support.
> >
> > > I wonder if hardware is as flexible as Linux to support general UDP
> > > tunnel offload.
> >
> > I don't have additional information WRT HW specific limitations.
> >
> > Perhaps the vendors (nvidia? intel? who else?) could chime here to
> > specify such limitation if any.
> >
> > How is that kind of scenario managed with plain TCP TSO for virtio?
>
> It's a problem indeed which needs to be addressed (and it is required
> for big TCP).
>
> > e.g. let's suppose that some virtio H/W only supports a limited number
> > of segments per TSO packet.
>
> A similar problem like this:
>
> https://lore.kernel.org/lkml/20240411051124.386817-1-yuri.benditovich@daynix.com/T/
This is actually for the "reverse" problem, right? I think we can have
HW induced migration issues only if the H/W introduces some unexpected
constraints in the VM -> Host direction.
Restricting the problem to the tunnel part of GSO/TSO over UDP tunnel,
The only variables there are:
- the tunnel header size
- the outer header destination port number.
- the inner mac header type
I think the easiest thing is requiring that HW implementation for
virtio net supporting GSO over UDP tunnel must support:
- a 'reasonable' arbitrary maximum tunnel header - say 128.
- any outer UDP header destination port number
- ethernet inner mac header
If the above proves to be too restrictive, we could later extend the
specification to fine grain negotiate the mentioned parameters, and
prevent migration on Hosts with too restrictive capabilities. I think
that designing that now (even before such negotiation for plain TCP
TSO) would be too early.
WDYT?
> I think this approach is problematic as it causes migration
> compatibility issues.
>
> (Unfortunately, it is applied and for some reason my comment doesn't
> reach the list ...)
I guess the above is a slightly different topic. I suggested to send a
revert with the reasoning or at least to start a new thread about the
problem.
Cheers,
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-28 17:47 ` Paolo Abeni
@ 2024-05-29 0:30 ` Jason Wang
2024-05-29 10:25 ` Paolo Abeni
0 siblings, 1 reply; 22+ messages in thread
From: Jason Wang @ 2024-05-29 0:30 UTC (permalink / raw)
To: Paolo Abeni
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Wed, May 29, 2024 at 1:47 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2024-05-28 at 10:57 +0800, Jason Wang wrote:
> > On Mon, May 27, 2024 at 4:06 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Mon, 2024-05-27 at 11:26 +0800, Jason Wang wrote:
> > > > I meant, virtio is not necessarily for a software implementation.
> > > > Various hardware vendors started to implement it as well.
> > >
> > > Note that the above links refers to an HW offload implementation for
> > > TCP over UDP tunnel TSO support.
> > >
> > > > I wonder if hardware is as flexible as Linux to support general UDP
> > > > tunnel offload.
> > >
> > > I don't have additional information WRT HW specific limitations.
> > >
> > > Perhaps the vendors (nvidia? intel? who else?) could chime here to
> > > specify such limitation if any.
> > >
> > > How is that kind of scenario managed with plain TCP TSO for virtio?
> >
> > It's a problem indeed which needs to be addressed (and it is required
> > for big TCP).
> >
> > > e.g. let's suppose that some virtio H/W only supports a limited number
> > > of segments per TSO packet.
> >
> > A similar problem like this:
> >
> > https://lore.kernel.org/lkml/20240411051124.386817-1-yuri.benditovich@daynix.com/T/
>
> This is actually for the "reverse" problem, right?
Not sure but I think the context is all about migration compatibility.
> I think we can have
> HW induced migration issues only if the H/W introduces some unexpected
> constraints in the VM -> Host direction.
Yes, it's kind of similar. For example, destination is not as capable
as the source:
1) dst's max gso segs is less than the source
2) dst's UDP tunnel offloading is not as "rich" as the source
(something similar to the UFO removal issue in the past).
>
> Restricting the problem to the tunnel part of GSO/TSO over UDP tunnel,
> The only variables there are:
> - the tunnel header size
> - the outer header destination port number.
> - the inner mac header type
>
> I think the easiest thing is requiring that HW implementation for
> virtio net supporting GSO over UDP tunnel must support:
> - a 'reasonable' arbitrary maximum tunnel header - say 128.
> - any outer UDP header destination port number
> - ethernet inner mac header
>
> If the above proves to be too restrictive, we could later extend the
> specification to fine grain negotiate the mentioned parameters, and
> prevent migration on Hosts with too restrictive capabilities. I think
> that designing that now (even before such negotiation for plain TCP
> TSO) would be too early.
>
> WDYT?
It should be fine. A question here, does the kernel have checks for
the above? (e.g max tunnel header size)
>
> > I think this approach is problematic as it causes migration
> > compatibility issues.
> >
> > (Unfortunately, it is applied and for some reason my comment doesn't
> > reach the list ...)
>
> I guess the above is a slightly different topic. I suggested to send a
> revert with the reasoning or at least to start a new thread about the
> problem.
Fine.
Thanks
>
>
> Cheers,
>
> Paolo
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-29 0:30 ` Jason Wang
@ 2024-05-29 10:25 ` Paolo Abeni
2024-05-30 2:36 ` Jason Wang
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-05-29 10:25 UTC (permalink / raw)
To: Jason Wang
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Wed, 2024-05-29 at 08:30 +0800, Jason Wang wrote:
> On Wed, May 29, 2024 at 1:47 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Tue, 2024-05-28 at 10:57 +0800, Jason Wang wrote:
> > > On Mon, May 27, 2024 at 4:06 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > On Mon, 2024-05-27 at 11:26 +0800, Jason Wang wrote:
> > > > > I meant, virtio is not necessarily for a software implementation.
> > > > > Various hardware vendors started to implement it as well.
> > > >
> > > > Note that the above links refers to an HW offload implementation for
> > > > TCP over UDP tunnel TSO support.
> > > >
> > > > > I wonder if hardware is as flexible as Linux to support general UDP
> > > > > tunnel offload.
> > > >
> > > > I don't have additional information WRT HW specific limitations.
> > > >
> > > > Perhaps the vendors (nvidia? intel? who else?) could chime here to
> > > > specify such limitation if any.
> > > >
> > > > How is that kind of scenario managed with plain TCP TSO for virtio?
> > >
> > > It's a problem indeed which needs to be addressed (and it is required
> > > for big TCP).
> > >
> > > > e.g. let's suppose that some virtio H/W only supports a limited number
> > > > of segments per TSO packet.
> > >
> > > A similar problem like this:
> > >
> > > https://lore.kernel.org/lkml/20240411051124.386817-1-yuri.benditovich@daynix.com/T/
> >
> > This is actually for the "reverse" problem, right?
>
> Not sure but I think the context is all about migration compatibility.
>
> > I think we can have
> > HW induced migration issues only if the H/W introduces some unexpected
> > constraints in the VM -> Host direction.
>
> Yes, it's kind of similar. For example, destination is not as capable
> as the source:
>
> 1) dst's max gso segs is less than the source
> 2) dst's UDP tunnel offloading is not as "rich" as the source
> (something similar to the UFO removal issue in the past).
>
> >
> > Restricting the problem to the tunnel part of GSO/TSO over UDP tunnel,
> > The only variables there are:
> > - the tunnel header size
> > - the outer header destination port number.
> > - the inner mac header type
> >
> > I think the easiest thing is requiring that HW implementation for
> > virtio net supporting GSO over UDP tunnel must support:
> > - a 'reasonable' arbitrary maximum tunnel header - say 128.
> > - any outer UDP header destination port number
> > - ethernet inner mac header
> >
> > If the above proves to be too restrictive, we could later extend the
> > specification to fine grain negotiate the mentioned parameters, and
> > prevent migration on Hosts with too restrictive capabilities. I think
> > that designing that now (even before such negotiation for plain TCP
> > TSO) would be too early.
> >
> > WDYT?
>
> It should be fine. A question here, does the kernel have checks for
> the above? (e.g max tunnel header size)
At least a driver (ice) has a check for that in its ndo_feature_check:
https://elixir.bootlin.com/linux/v6.10-rc1/source/drivers/net/ethernet/intel/ice/ice_main.c#L9556
The tunnel size limit is 32, and the callback additionally ensures that
the tunnel header len is not odd.
Note that in such ndo there are more constraints related to plain TCP
TSO - e.g. maximum IP header len <= 508 for both the inner and outer
header (no idea how that could be respected in practice).
Do you have other comments or can I go for v5 including the mentioned
constraints on the tunnel header layout?
Thanks,
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-29 10:25 ` Paolo Abeni
@ 2024-05-30 2:36 ` Jason Wang
0 siblings, 0 replies; 22+ messages in thread
From: Jason Wang @ 2024-05-30 2:36 UTC (permalink / raw)
To: Paolo Abeni
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Wed, May 29, 2024 at 6:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Wed, 2024-05-29 at 08:30 +0800, Jason Wang wrote:
> > On Wed, May 29, 2024 at 1:47 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > On Tue, 2024-05-28 at 10:57 +0800, Jason Wang wrote:
> > > > On Mon, May 27, 2024 at 4:06 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > On Mon, 2024-05-27 at 11:26 +0800, Jason Wang wrote:
> > > > > > I meant, virtio is not necessarily for a software implementation.
> > > > > > Various hardware vendors started to implement it as well.
> > > > >
> > > > > Note that the above links refers to an HW offload implementation for
> > > > > TCP over UDP tunnel TSO support.
> > > > >
> > > > > > I wonder if hardware is as flexible as Linux to support general UDP
> > > > > > tunnel offload.
> > > > >
> > > > > I don't have additional information WRT HW specific limitations.
> > > > >
> > > > > Perhaps the vendors (nvidia? intel? who else?) could chime here to
> > > > > specify such limitation if any.
> > > > >
> > > > > How is that kind of scenario managed with plain TCP TSO for virtio?
> > > >
> > > > It's a problem indeed which needs to be addressed (and it is required
> > > > for big TCP).
> > > >
> > > > > e.g. let's suppose that some virtio H/W only supports a limited number
> > > > > of segments per TSO packet.
> > > >
> > > > A similar problem like this:
> > > >
> > > > https://lore.kernel.org/lkml/20240411051124.386817-1-yuri.benditovich@daynix.com/T/
> > >
> > > This is actually for the "reverse" problem, right?
> >
> > Not sure but I think the context is all about migration compatibility.
> >
> > > I think we can have
> > > HW induced migration issues only if the H/W introduces some unexpected
> > > constraints in the VM -> Host direction.
> >
> > Yes, it's kind of similar. For example, destination is not as capable
> > as the source:
> >
> > 1) dst's max gso segs is less than the source
> > 2) dst's UDP tunnel offloading is not as "rich" as the source
> > (something similar to the UFO removal issue in the past).
> >
> > >
> > > Restricting the problem to the tunnel part of GSO/TSO over UDP tunnel,
> > > The only variables there are:
> > > - the tunnel header size
> > > - the outer header destination port number.
> > > - the inner mac header type
> > >
> > > I think the easiest thing is requiring that HW implementation for
> > > virtio net supporting GSO over UDP tunnel must support:
> > > - a 'reasonable' arbitrary maximum tunnel header - say 128.
> > > - any outer UDP header destination port number
> > > - ethernet inner mac header
> > >
> > > If the above proves to be too restrictive, we could later extend the
> > > specification to fine grain negotiate the mentioned parameters, and
> > > prevent migration on Hosts with too restrictive capabilities. I think
> > > that designing that now (even before such negotiation for plain TCP
> > > TSO) would be too early.
> > >
> > > WDYT?
> >
> > It should be fine. A question here, does the kernel have checks for
> > the above? (e.g max tunnel header size)
>
> At least a driver (ice) has a check for that in its ndo_feature_check:
>
> https://elixir.bootlin.com/linux/v6.10-rc1/source/drivers/net/ethernet/intel/ice/ice_main.c#L9556
>
> The tunnel size limit is 32, and the callback additionally ensures that
> the tunnel header len is not odd.
Ok, this should be fine as we can fallback.
>
> Note that in such ndo there are more constraints related to plain TCP
> TSO - e.g. maximum IP header len <= 508 for both the inner and outer
> header (no idea how that could be respected in practice).
Right.
>
> Do you have other comments or can I go for v5 including the mentioned
> constraints on the tunnel header layout?
One more thing is probably to clarify how the outer checksum is done.
Thanks
>
> Thanks,
>
> Paolo
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-27 3:26 ` Jason Wang
2024-05-27 8:06 ` Paolo Abeni
@ 2024-05-30 15:08 ` Willem de Bruijn
2024-05-30 16:09 ` Paolo Abeni
2024-06-03 1:40 ` Jason Wang
1 sibling, 2 replies; 22+ messages in thread
From: Willem de Bruijn @ 2024-05-30 15:08 UTC (permalink / raw)
To: Jason Wang
Cc: Paolo Abeni, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Sun, May 26, 2024 at 11:26 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, May 23, 2024 at 4:47 PM Paolo Abeni <pabeni@redhat.com> wrote:
Sorry, late to this proposal. Originally introduced in 2022?
> >
> > On Thu, 2024-05-23 at 09:20 +0200, Paolo Abeni wrote:
> > > The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> > > UDP tunnel. It can be negotiated on both the host and guest sides.
> > >
> > > UDP tunnel usage is ubiquitous in container deployment, and the ability
> > > to offload UDP encapsulated GSO traffic impacts greatly the performances
> > > and the CPU utilization of such use cases.
> > >
> > > One constraint addressed here is that the virtio side (either device or
> > > driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> > > completely the inner and outer headers offset - to allow for later GSO.
> > >
> > > To accommodate such need, new optional fields are introduced in the
> > > virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> > > inner_nh_offset. They map directly to the corresponding header
> > > information.
> > >
> > > Note that the inner transport header is implied by the (inner) checksum
> > > offload, if present. Otherwise, it's up to the receiver to detect the
Can we just require filling checksum start and checksum off for tunnel
types. Even if the checksum offload flag is not set.
> > > inner transport header offset from the provided information, as it's
> > > currently the case for plain (not UDP tunneled) GSO packets.
> > >
> > > The outer UDP header may carry a second checksum, which can be offloaded
> > > independently from the inner one. Since UDP tunnel checksum offload
Just require local checksum offload? Computing the outer checksum is
always cheap. Optional double checksum offload might add complexity
that is not needed, as it is so cheap.
> > > support makes little sense without UDP tunnel GSO support, to avoid
> > > unnecessary complex feature negotiation, the
> > > VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> > > header checksum offload and the checksum itself is handled similarly to
> > > the inner header one.
> > >
> > > Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
> > > types of GSO offload: VMs that has negotiated such features could be
> > > migrated only on hosts supporting such features.
> > >
> > > Note that there is no concept of UDP tunnel type negotiation (e.g.
> > > vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> > > - given the information carried by the guest or host kernel, it's
> > > impossible to probe reliably the UDP tunnel type. Specifically, the
> > > outer UDP port numbers give a hint, but peers could use nonstandard
> > > ones.
> > > - all the existing UDP tunnel protocols behave the same way WRT GSO
> > > offload, carrying an immutable header on top of the outer transport
> > > one.
> > > - if a new UDP tunnel protocol, carring packet dependant information
> > > alike the inner packet length or csum should surface in the future,
> > > the host and guest kernels will need explicit support for it,
> > > including new, different GSO features. Additional virtio support
> > > should be designed separately.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Can you reconsider the choice to only allow UDP encap that includes an
inner MAC header?
There are useful UDP based tunnels that skip that, such as GUE. Virtio
revisions are a long process. If we can accommodate those in the
design from the start, that saves us much work later. Or, at minimum
make sure that the proposed design can easily be extended to such
protocols.
> I meant, virtio is not necessarily for a software implementation.
> Various hardware vendors started to implement it as well.
>
> I wonder if hardware is as flexible as Linux to support general UDP
> tunnel offload.
Do you mean TSO with tunnels? Using software parser mode, devices can
support that. See also protocol independent segmentation offload
(PISO, https://www.opencompute.org/w/index.php?title=Core_Offloads#Protocol_Independent_Segmentation_Offload)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-30 15:08 ` Willem de Bruijn
@ 2024-05-30 16:09 ` Paolo Abeni
2024-05-30 16:23 ` Willem de Bruijn
2024-06-03 1:40 ` Jason Wang
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-05-30 16:09 UTC (permalink / raw)
To: Willem de Bruijn, Jason Wang
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Thu, 2024-05-30 at 11:08 -0400, Willem de Bruijn wrote:
> On Sun, May 26, 2024 at 11:26 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, May 23, 2024 at 4:47 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Sorry, late to this proposal. Originally introduced in 2022?
Let's say that it went into idle status for long time;)
> > > On Thu, 2024-05-23 at 09:20 +0200, Paolo Abeni wrote:
> > > > The VIRTIO_NET_HDR_GSO_UDP_TUNNEL is a gso_type flag allowing GSO over
> > > > UDP tunnel. It can be negotiated on both the host and guest sides.
> > > >
> > > > UDP tunnel usage is ubiquitous in container deployment, and the ability
> > > > to offload UDP encapsulated GSO traffic impacts greatly the performances
> > > > and the CPU utilization of such use cases.
> > > >
> > > > One constraint addressed here is that the virtio side (either device or
> > > > driver) receiving a UDP tunneled GSO packet must be able to reconstruct
> > > > completely the inner and outer headers offset - to allow for later GSO.
> > > >
> > > > To accommodate such need, new optional fields are introduced in the
> > > > virtio_net header: outer_th_offset, inner_protocol, inner_mac_offset,
> > > > inner_nh_offset. They map directly to the corresponding header
> > > > information.
> > > >
> > > > Note that the inner transport header is implied by the (inner) checksum
> > > > offload, if present. Otherwise, it's up to the receiver to detect the
>
> Can we just require filling checksum start and checksum off for tunnel
> types. Even if the checksum offload flag is not set.
I think it should work.
> > > > inner transport header offset from the provided information, as it's
> > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > >
> > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > independently from the inner one. Since UDP tunnel checksum offload
>
> Just require local checksum offload? Computing the outer checksum is
> always cheap. Optional double checksum offload might add complexity
> that is not needed, as it is so cheap.
The rationale behind the making the outer csum offload manadatory was
to keep the specification change simple and prevent the exhaustion of
the virtio_net_hdr flags and gso_type exhaustion. With this change,
'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
5 bits used and will raise to 6 (even if only 3 of them will be
bitfields).
The complexity problem is only relevant for h/w implementation, right?
Is really such a problem for modern H/W?
> > > > support makes little sense without UDP tunnel GSO support, to avoid
> > > > unnecessary complex feature negotiation, the
> > > > VIRTIO_NET_HDR_GSO_UDP_TUNNEL feature implies the support for the outer
> > > > header checksum offload and the checksum itself is handled similarly to
> > > > the inner header one.
> > > >
> > > > Migration for GSO_UDP_TUNNEL enabled VMs should be managed alike other
> > > > types of GSO offload: VMs that has negotiated such features could be
> > > > migrated only on hosts supporting such features.
> > > >
> > > > Note that there is no concept of UDP tunnel type negotiation (e.g.
> > > > vxlan, geneve, vxlan-gpe, etc.). That is intentional because:
> > > > - given the information carried by the guest or host kernel, it's
> > > > impossible to probe reliably the UDP tunnel type. Specifically, the
> > > > outer UDP port numbers give a hint, but peers could use nonstandard
> > > > ones.
> > > > - all the existing UDP tunnel protocols behave the same way WRT GSO
> > > > offload, carrying an immutable header on top of the outer transport
> > > > one.
> > > > - if a new UDP tunnel protocol, carring packet dependant information
> > > > alike the inner packet length or csum should surface in the future,
> > > > the host and guest kernels will need explicit support for it,
> > > > including new, different GSO features. Additional virtio support
> > > > should be designed separately.
> > > >
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>
> Can you reconsider the choice to only allow UDP encap that includes an
> inner MAC header?
>
> There are useful UDP based tunnels that skip that, such as GUE. Virtio
> revisions are a long process. If we can accommodate those in the
> design from the start, that saves us much work later. Or, at minimum
> make sure that the proposed design can easily be extended to such
> protocols.
I'm surprised about interest in GUE: for years I wondered about
deprecating and removing it (together with FOU).
Yes, I think the ethernet-only constraint could be removed, but I fear
that will pose some difficulties for H/W implementation - requiring
extra flexibility - and possibly some additional features negotiation
should be needed.
I think that the struct layout can already cope with GUE or the like.
>
Cheers,
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-30 16:09 ` Paolo Abeni
@ 2024-05-30 16:23 ` Willem de Bruijn
2024-05-30 16:27 ` Willem de Bruijn
2024-05-30 17:15 ` Paolo Abeni
0 siblings, 2 replies; 22+ messages in thread
From: Willem de Bruijn @ 2024-05-30 16:23 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
> > > > > inner transport header offset from the provided information, as it's
> > > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > > >
> > > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > > independently from the inner one. Since UDP tunnel checksum offload
> >
> > Just require local checksum offload? Computing the outer checksum is
> > always cheap. Optional double checksum offload might add complexity
> > that is not needed, as it is so cheap.
>
> The rationale behind the making the outer csum offload manadatory was
> to keep the specification change simple and prevent the exhaustion of
> the virtio_net_hdr flags and gso_type exhaustion. With this change,
> 'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
> 5 bits used and will raise to 6 (even if only 3 of them will be
> bitfields).
Perhaps we're talking about something different. I'm suggesting that
the outer checksum is not offloaded at all, because outer checksum
generation and validation is always cheap in software. This is the
premise of LCO.
> The complexity problem is only relevant for h/w implementation, right?
> Is really such a problem for modern H/W?
I'm concerned more about the virtio_net_hdr struct and minimizing
the combinatorial explosion of allowed flags.
We've had a lot of issues in the past with the Linux implementation,
in part because we allow combinations of fields that have no practical
use, but let syzbot go nuts. And then we have to patch up the data
path with branches that slow down the actual practical cases.
> > Can you reconsider the choice to only allow UDP encap that includes an
> > inner MAC header?
> >
> > There are useful UDP based tunnels that skip that, such as GUE. Virtio
> > revisions are a long process. If we can accommodate those in the
> > design from the start, that saves us much work later. Or, at minimum
> > make sure that the proposed design can easily be extended to such
> > protocols.
>
> I'm surprised about interest in GUE: for years I wondered about
> deprecating and removing it (together with FOU).
We use it extensively. In fairness, maybe as a result of my choosing
this protocol when we needed a (UDP based) tunnel protocol with
custom internal headers. FOU and GUE are well suited to that.
> Yes, I think the ethernet-only constraint could be removed, but I fear
> that will pose some difficulties for H/W implementation - requiring
> extra flexibility - and possibly some additional features negotiation
> should be needed.
>
> I think that the struct layout can already cope with GUE or the like.
Great.
To be clear, I don't mean to ask for all back-ends to offer support for
all these. But that the layout supports it.
I suppose if we want conditional support, then that would require
extra flags on negotiation. E.g., a separate
VIRTIO_NET_HDR_GSO_UDP_L4_TUNNEL type.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-30 16:23 ` Willem de Bruijn
@ 2024-05-30 16:27 ` Willem de Bruijn
2024-05-30 17:15 ` Paolo Abeni
1 sibling, 0 replies; 22+ messages in thread
From: Willem de Bruijn @ 2024-05-30 16:27 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
> > Yes, I think the ethernet-only constraint could be removed, but I fear
> > that will pose some difficulties for H/W implementation - requiring
> > extra flexibility - and possibly some additional features negotiation
> > should be needed.
> >
> > I think that the struct layout can already cope with GUE or the like.
>
> Great.
>
> To be clear, I don't mean to ask for all back-ends to offer support for
> all these. But that the layout supports it.
>
> I suppose if we want conditional support, then that would require
> extra flags on negotiation. E.g., a separate
> VIRTIO_NET_HDR_GSO_UDP_L4_TUNNEL type.
Whoops. Meant L3 instead of L4 here, of course.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-30 16:23 ` Willem de Bruijn
2024-05-30 16:27 ` Willem de Bruijn
@ 2024-05-30 17:15 ` Paolo Abeni
2024-05-31 15:13 ` Willem de Bruijn
1 sibling, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-05-30 17:15 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Thu, 2024-05-30 at 12:23 -0400, Willem de Bruijn wrote:
> > > > > > inner transport header offset from the provided information, as it's
> > > > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > > > >
> > > > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > > > independently from the inner one. Since UDP tunnel checksum offload
> > >
> > > Just require local checksum offload? Computing the outer checksum is
> > > always cheap. Optional double checksum offload might add complexity
> > > that is not needed, as it is so cheap.
> >
> > The rationale behind the making the outer csum offload manadatory was
> > to keep the specification change simple and prevent the exhaustion of
> > the virtio_net_hdr flags and gso_type exhaustion. With this change,
> > 'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
> > 5 bits used and will raise to 6 (even if only 3 of them will be
> > bitfields).
>
> Perhaps we're talking about something different. I'm suggesting that
> the outer checksum is not offloaded at all, because outer checksum
> generation and validation is always cheap in software. This is the
> premise of LCO.
If the guess produces GSO packets with outer checksum, and the virtio
driver does not support the outer csum offload, the guest will have to
do s/w segmentation. Even if computing the outer csum will be cheap,
the segmentation will be bad enough by itself.
Am I missing something?
> > The complexity problem is only relevant for h/w implementation, right?
> > Is really such a problem for modern H/W?
>
> I'm concerned more about the virtio_net_hdr struct and minimizing
> the combinatorial explosion of allowed flags.
>
> We've had a lot of issues in the past with the Linux implementation,
> in part because we allow combinations of fields that have no practical
> use, but let syzbot go nuts. And then we have to patch up the data
> path with branches that slow down the actual practical cases.
I understand.
I don't have any usage data handy to tell which is the most frequent
use-case.
To me, the relevant use-case is openshift inside a VM, which *should*
carry a csum in the outer header (but I have to double check).
> > > Can you reconsider the choice to only allow UDP encap that includes an
> > > inner MAC header?
> > >
> > > There are useful UDP based tunnels that skip that, such as GUE. Virtio
> > > revisions are a long process. If we can accommodate those in the
> > > design from the start, that saves us much work later. Or, at minimum
> > > make sure that the proposed design can easily be extended to such
> > > protocols.
> >
> > I'm surprised about interest in GUE: for years I wondered about
> > deprecating and removing it (together with FOU).
>
> We use it extensively. In fairness, maybe as a result of my choosing
> this protocol when we needed a (UDP based) tunnel protocol with
> custom internal headers. FOU and GUE are well suited to that.
What about geneve?
> > Yes, I think the ethernet-only constraint could be removed, but I fear
> > that will pose some difficulties for H/W implementation - requiring
> > extra flexibility - and possibly some additional features negotiation
> > should be needed.
> >
> > I think that the struct layout can already cope with GUE or the like.
>
> Great.
>
> To be clear, I don't mean to ask for all back-ends to offer support for
> all these. But that the layout supports it.
>
> I suppose if we want conditional support, then that would require
> extra flags on negotiation. E.g., a separate
> VIRTIO_NET_HDR_GSO_UDP_L4_TUNNEL type.
I think it should suffice negotiating an additional feature, say
VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_L3
as the s/w packet processing should not change in any way.
I propose the 'ethernet only' restriction because AFAICS some existing
H/W implementing offload for TSO over UDP tunnel has that kind of
restriction and Jason suggested to be 'polite' towards H/W
implementation.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-30 17:15 ` Paolo Abeni
@ 2024-05-31 15:13 ` Willem de Bruijn
2024-06-03 14:17 ` Paolo Abeni
0 siblings, 1 reply; 22+ messages in thread
From: Willem de Bruijn @ 2024-05-31 15:13 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Thu, May 30, 2024 at 1:15 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2024-05-30 at 12:23 -0400, Willem de Bruijn wrote:
> > > > > > > inner transport header offset from the provided information, as it's
> > > > > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > > > > >
> > > > > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > > > > independently from the inner one. Since UDP tunnel checksum offload
> > > >
> > > > Just require local checksum offload? Computing the outer checksum is
> > > > always cheap. Optional double checksum offload might add complexity
> > > > that is not needed, as it is so cheap.
> > >
> > > The rationale behind the making the outer csum offload manadatory was
> > > to keep the specification change simple and prevent the exhaustion of
> > > the virtio_net_hdr flags and gso_type exhaustion. With this change,
> > > 'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
> > > 5 bits used and will raise to 6 (even if only 3 of them will be
> > > bitfields).
> >
> > Perhaps we're talking about something different. I'm suggesting that
> > the outer checksum is not offloaded at all, because outer checksum
> > generation and validation is always cheap in software. This is the
> > premise of LCO.
>
> If the guess produces GSO packets with outer checksum, and the virtio
> driver does not support the outer csum offload, the guest will have to
> do s/w segmentation. Even if computing the outer csum will be cheap,
> the segmentation will be bad enough by itself.
>
> Am I missing something?
Do you mean virtio driver (in the guest) or device (in the host)?
All segmentation offload packets imply checksum offload. So the
virtual device must support checksum offload. In this case, of both
checksum fields.
Unless from the start we incorporate partial GSO, and expect the
guest to prepare the outer packet as they have to be for the individual
segments. Then it could also compute the outer checksum in software
using LCO.
I encourage physical hardware vendors to only implement offload for
a single checksum, based on explicit csum_start/csum_off passed by
the host.
Because multiple checksum generally are implemented using packet
parsing, which requires protocol knowledge and thus is fragile.
I suppose you could define a virtio_net_hdr with an extra
outer_csum_start/outer_csum_off. With software segmentation, this
can still apply LCO cheaply.
Is there physical hardware that supports tunnel TSO with multiple
explicit outer_csum_start + inner_csum_start instructions?
>
> > > The complexity problem is only relevant for h/w implementation, right?
> > > Is really such a problem for modern H/W?
> >
> > I'm concerned more about the virtio_net_hdr struct and minimizing
> > the combinatorial explosion of allowed flags.
> >
> > We've had a lot of issues in the past with the Linux implementation,
> > in part because we allow combinations of fields that have no practical
> > use, but let syzbot go nuts. And then we have to patch up the data
> > path with branches that slow down the actual practical cases.
>
> I understand.
>
> I don't have any usage data handy to tell which is the most frequent
> use-case.
>
> To me, the relevant use-case is openshift inside a VM, which *should*
> carry a csum in the outer header (but I have to double check).
>
> > > > Can you reconsider the choice to only allow UDP encap that includes an
> > > > inner MAC header?
> > > >
> > > > There are useful UDP based tunnels that skip that, such as GUE. Virtio
> > > > revisions are a long process. If we can accommodate those in the
> > > > design from the start, that saves us much work later. Or, at minimum
> > > > make sure that the proposed design can easily be extended to such
> > > > protocols.
> > >
> > > I'm surprised about interest in GUE: for years I wondered about
> > > deprecating and removing it (together with FOU).
> >
> > We use it extensively. In fairness, maybe as a result of my choosing
> > this protocol when we needed a (UDP based) tunnel protocol with
> > custom internal headers. FOU and GUE are well suited to that.
>
> What about geneve?
Good point. That also appears to support IP in UDP without inner mac.
So I suppose that could have worked for our use case as well. Another
example is the Linux bare udp driver. In all, plenty of examples.
> > > Yes, I think the ethernet-only constraint could be removed, but I fear
> > > that will pose some difficulties for H/W implementation - requiring
> > > extra flexibility - and possibly some additional features negotiation
> > > should be needed.
> > >
> > > I think that the struct layout can already cope with GUE or the like.
> >
> > Great.
> >
> > To be clear, I don't mean to ask for all back-ends to offer support for
> > all these. But that the layout supports it.
> >
> > I suppose if we want conditional support, then that would require
> > extra flags on negotiation. E.g., a separate
> > VIRTIO_NET_HDR_GSO_UDP_L4_TUNNEL type.
>
> I think it should suffice negotiating an additional feature, say
>
> VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_L3
>
> as the s/w packet processing should not change in any way.
>
> I propose the 'ethernet only' restriction because AFAICS some existing
> H/W implementing offload for TSO over UDP tunnel has that kind of
> restriction and Jason suggested to be 'polite' towards H/W
> implementation.
Makes sense to consider them separate features then.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-31 15:13 ` Willem de Bruijn
@ 2024-06-03 14:17 ` Paolo Abeni
2024-06-03 17:41 ` Willem de Bruijn
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-06-03 14:17 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
Hi,
On Fri, 2024-05-31 at 11:13 -0400, Willem de Bruijn wrote:
> On Thu, May 30, 2024 at 1:15 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Thu, 2024-05-30 at 12:23 -0400, Willem de Bruijn wrote:
> > > > > > > > inner transport header offset from the provided information, as it's
> > > > > > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > > > > > >
> > > > > > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > > > > > independently from the inner one. Since UDP tunnel checksum offload
> > > > >
> > > > > Just require local checksum offload? Computing the outer checksum is
> > > > > always cheap. Optional double checksum offload might add complexity
> > > > > that is not needed, as it is so cheap.
> > > >
> > > > The rationale behind the making the outer csum offload manadatory was
> > > > to keep the specification change simple and prevent the exhaustion of
> > > > the virtio_net_hdr flags and gso_type exhaustion. With this change,
> > > > 'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
> > > > 5 bits used and will raise to 6 (even if only 3 of them will be
> > > > bitfields).
> > >
> > > Perhaps we're talking about something different. I'm suggesting that
> > > the outer checksum is not offloaded at all, because outer checksum
> > > generation and validation is always cheap in software. This is the
> > > premise of LCO.
> >
> > If the guess produces GSO packets with outer checksum, and the virtio
> > driver does not support the outer csum offload, the guest will have to
> > do s/w segmentation. Even if computing the outer csum will be cheap,
> > the segmentation will be bad enough by itself.
> >
> > Am I missing something?
>
> Do you mean virtio driver (in the guest) or device (in the host)?
>
> All segmentation offload packets imply checksum offload. So the
> virtual device must support checksum offload. In this case, of both
> checksum fields.
I feel confused. I think the starting point was your request to avoid
supporting (in the virtual device) outer checksum offload, and I read
the above as quite the opposite?!?
> Unless from the start we incorporate partial GSO, and expect the
> guest to prepare the outer packet as they have to be for the individual
> segments. Then it could also compute the outer checksum in software
> using LCO.
>
> I encourage physical hardware vendors to only implement offload for
> a single checksum, based on explicit csum_start/csum_off passed by
> the host.
>
> Because multiple checksum generally are implemented using packet
> parsing, which requires protocol knowledge and thus is fragile.
>
> I suppose you could define a virtio_net_hdr with an extra
> outer_csum_start/outer_csum_off. With software segmentation, this
> can still apply LCO cheaply.
The proposed spec change adds to the virtio_net_hdr the outer transport
offset, and enforces the outer transport being UDP, no parsing should
be needed (nor s/w segmentation).
> Is there physical hardware that supports tunnel TSO with multiple
> explicit outer_csum_start + inner_csum_start instructions?
I don't have any insights there. Skimming over some major device
drivers in the Linux kernel, I can't find any specific hint of that.
Cheers,
Paolo
> > >
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-06-03 14:17 ` Paolo Abeni
@ 2024-06-03 17:41 ` Willem de Bruijn
2024-06-05 10:59 ` Paolo Abeni
0 siblings, 1 reply; 22+ messages in thread
From: Willem de Bruijn @ 2024-06-03 17:41 UTC (permalink / raw)
To: Paolo Abeni
Cc: Jason Wang, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Mon, Jun 3, 2024 at 10:17 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Fri, 2024-05-31 at 11:13 -0400, Willem de Bruijn wrote:
> > On Thu, May 30, 2024 at 1:15 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Thu, 2024-05-30 at 12:23 -0400, Willem de Bruijn wrote:
> > > > > > > > > inner transport header offset from the provided information, as it's
> > > > > > > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > > > > > > >
> > > > > > > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > > > > > > independently from the inner one. Since UDP tunnel checksum offload
> > > > > >
> > > > > > Just require local checksum offload? Computing the outer checksum is
> > > > > > always cheap. Optional double checksum offload might add complexity
> > > > > > that is not needed, as it is so cheap.
> > > > >
> > > > > The rationale behind the making the outer csum offload manadatory was
> > > > > to keep the specification change simple and prevent the exhaustion of
> > > > > the virtio_net_hdr flags and gso_type exhaustion. With this change,
> > > > > 'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
> > > > > 5 bits used and will raise to 6 (even if only 3 of them will be
> > > > > bitfields).
> > > >
> > > > Perhaps we're talking about something different. I'm suggesting that
> > > > the outer checksum is not offloaded at all, because outer checksum
> > > > generation and validation is always cheap in software. This is the
> > > > premise of LCO.
> > >
> > > If the guess produces GSO packets with outer checksum, and the virtio
> > > driver does not support the outer csum offload, the guest will have to
> > > do s/w segmentation. Even if computing the outer csum will be cheap,
> > > the segmentation will be bad enough by itself.
> > >
> > > Am I missing something?
> >
> > Do you mean virtio driver (in the guest) or device (in the host)?
> >
> > All segmentation offload packets imply checksum offload. So the
> > virtual device must support checksum offload. In this case, of both
> > checksum fields.
>
> I feel confused. I think the starting point was your request to avoid
> supporting (in the virtual device) outer checksum offload, and I read
> the above as quite the opposite?!?
Sorry for the confusion.
I agree that the outer transport offset is needed, after all.
Without segmentation offload, the outer checksum can be calculated
cheaply software.
But with segmentation offload, this either
- has to happen after segmentation, or
- the outer headers must already be setup for segments (GSO_PARTIAL)
The first approach requires the field.
The second is probably too much to ask of a virtio interface. Demanding
GSO_PARTIAL.
It would require the driver to
- split a tunnel GSO packet into a GSO and
non-GSO pair if it payload length is not a multiple of mss.
- rewrite the outer headers to adjust the length fields
- compute the outer csum using LCO
It's not entirely crazy. The Linux kernel already can already do this for
drivers that advertise tunnel offload only as part of gso_partial_features.
> > Unless from the start we incorporate partial GSO, and expect the
> > guest to prepare the outer packet as they have to be for the individual
> > segments. Then it could also compute the outer checksum in software
> > using LCO.
> >
> > I encourage physical hardware vendors to only implement offload for
> > a single checksum, based on explicit csum_start/csum_off passed by
> > the host.
> >
> > Because multiple checksum generally are implemented using packet
> > parsing, which requires protocol knowledge and thus is fragile.
> >
> > I suppose you could define a virtio_net_hdr with an extra
> > outer_csum_start/outer_csum_off. With software segmentation, this
> > can still apply LCO cheaply.
>
> The proposed spec change adds to the virtio_net_hdr the outer transport
> offset, and enforces the outer transport being UDP, no parsing should
> be needed (nor s/w segmentation).
Good point.
> > Is there physical hardware that supports tunnel TSO with multiple
> > explicit outer_csum_start + inner_csum_start instructions?
>
> I don't have any insights there. Skimming over some major device
> drivers in the Linux kernel, I can't find any specific hint of that.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-06-03 17:41 ` Willem de Bruijn
@ 2024-06-05 10:59 ` Paolo Abeni
2024-06-05 11:11 ` Michael S. Tsirkin
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Abeni @ 2024-06-05 10:59 UTC (permalink / raw)
To: Willem de Bruijn, Jason Wang
Cc: virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Mon, 2024-06-03 at 13:41 -0400, Willem de Bruijn wrote:
> On Mon, Jun 3, 2024 at 10:17 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Fri, 2024-05-31 at 11:13 -0400, Willem de Bruijn wrote:
> > > On Thu, May 30, 2024 at 1:15 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > On Thu, 2024-05-30 at 12:23 -0400, Willem de Bruijn wrote:
> > > > > > > > > > inner transport header offset from the provided information, as it's
> > > > > > > > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > > > > > > > >
> > > > > > > > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > > > > > > > independently from the inner one. Since UDP tunnel checksum offload
> > > > > > >
> > > > > > > Just require local checksum offload? Computing the outer checksum is
> > > > > > > always cheap. Optional double checksum offload might add complexity
> > > > > > > that is not needed, as it is so cheap.
> > > > > >
> > > > > > The rationale behind the making the outer csum offload manadatory was
> > > > > > to keep the specification change simple and prevent the exhaustion of
> > > > > > the virtio_net_hdr flags and gso_type exhaustion. With this change,
> > > > > > 'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
> > > > > > 5 bits used and will raise to 6 (even if only 3 of them will be
> > > > > > bitfields).
> > > > >
> > > > > Perhaps we're talking about something different. I'm suggesting that
> > > > > the outer checksum is not offloaded at all, because outer checksum
> > > > > generation and validation is always cheap in software. This is the
> > > > > premise of LCO.
> > > >
> > > > If the guess produces GSO packets with outer checksum, and the virtio
> > > > driver does not support the outer csum offload, the guest will have to
> > > > do s/w segmentation. Even if computing the outer csum will be cheap,
> > > > the segmentation will be bad enough by itself.
> > > >
> > > > Am I missing something?
> > >
> > > Do you mean virtio driver (in the guest) or device (in the host)?
> > >
> > > All segmentation offload packets imply checksum offload. So the
> > > virtual device must support checksum offload. In this case, of both
> > > checksum fields.
> >
> > I feel confused. I think the starting point was your request to avoid
> > supporting (in the virtual device) outer checksum offload, and I read
> > the above as quite the opposite?!?
>
> Sorry for the confusion.
>
> I agree that the outer transport offset is needed, after all.
>
> Without segmentation offload, the outer checksum can be calculated
> cheaply software.
>
> But with segmentation offload, this either
> - has to happen after segmentation, or
> - the outer headers must already be setup for segments (GSO_PARTIAL)
>
> The first approach requires the field.
>
> The second is probably too much to ask of a virtio interface. Demanding
> GSO_PARTIAL.
>
> It would require the driver to
> - split a tunnel GSO packet into a GSO and
> non-GSO pair if it payload length is not a multiple of mss.
> - rewrite the outer headers to adjust the length fields
> - compute the outer csum using LCO
>
> It's not entirely crazy. The Linux kernel already can already do this for
> drivers that advertise tunnel offload only as part of gso_partial_features.
I'm sorry for the latency.
Wrapping-up the above, I (mis?) understand you are ok with the proposed
virtio_net_hdr layout changes.
WRT UDP tunnel outer header csum offload, I guess I could add (back,
IIRC was there in a previous revision) the feature negotiation for the
outer UDP header checksum.
I think such feature negotiation makes sense only in the guest ->
hypervisor direction, that is, device receive path, where we could hit
possibly H/W implementation constraints.
In the opposite direction it's just a matter of s/w support, it would
be strange to me introduce UDP tunnel GSO and UDP tunnel outer checksum
support separately.
WDYT, does the above makes sense to you? Or do you think we should
negotiate the csum feature in both directions, for better symmetry?
Thanks!
Paolo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-06-05 10:59 ` Paolo Abeni
@ 2024-06-05 11:11 ` Michael S. Tsirkin
0 siblings, 0 replies; 22+ messages in thread
From: Michael S. Tsirkin @ 2024-06-05 11:11 UTC (permalink / raw)
To: Paolo Abeni
Cc: Willem de Bruijn, Jason Wang, virtio-comment, maxime.coquelin,
Eelco Chaudron, Stefano Garzarella
On Wed, Jun 05, 2024 at 12:59:58PM +0200, Paolo Abeni wrote:
> On Mon, 2024-06-03 at 13:41 -0400, Willem de Bruijn wrote:
> > On Mon, Jun 3, 2024 at 10:17 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Fri, 2024-05-31 at 11:13 -0400, Willem de Bruijn wrote:
> > > > On Thu, May 30, 2024 at 1:15 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > On Thu, 2024-05-30 at 12:23 -0400, Willem de Bruijn wrote:
> > > > > > > > > > > inner transport header offset from the provided information, as it's
> > > > > > > > > > > currently the case for plain (not UDP tunneled) GSO packets.
> > > > > > > > > > >
> > > > > > > > > > > The outer UDP header may carry a second checksum, which can be offloaded
> > > > > > > > > > > independently from the inner one. Since UDP tunnel checksum offload
> > > > > > > >
> > > > > > > > Just require local checksum offload? Computing the outer checksum is
> > > > > > > > always cheap. Optional double checksum offload might add complexity
> > > > > > > > that is not needed, as it is so cheap.
> > > > > > >
> > > > > > > The rationale behind the making the outer csum offload manadatory was
> > > > > > > to keep the specification change simple and prevent the exhaustion of
> > > > > > > the virtio_net_hdr flags and gso_type exhaustion. With this change,
> > > > > > > 'flags' has 4 bits used out of 8 and will raise to 5. 'gso_type' has 3
> > > > > > > 5 bits used and will raise to 6 (even if only 3 of them will be
> > > > > > > bitfields).
> > > > > >
> > > > > > Perhaps we're talking about something different. I'm suggesting that
> > > > > > the outer checksum is not offloaded at all, because outer checksum
> > > > > > generation and validation is always cheap in software. This is the
> > > > > > premise of LCO.
> > > > >
> > > > > If the guess produces GSO packets with outer checksum, and the virtio
> > > > > driver does not support the outer csum offload, the guest will have to
> > > > > do s/w segmentation. Even if computing the outer csum will be cheap,
> > > > > the segmentation will be bad enough by itself.
> > > > >
> > > > > Am I missing something?
> > > >
> > > > Do you mean virtio driver (in the guest) or device (in the host)?
> > > >
> > > > All segmentation offload packets imply checksum offload. So the
> > > > virtual device must support checksum offload. In this case, of both
> > > > checksum fields.
> > >
> > > I feel confused. I think the starting point was your request to avoid
> > > supporting (in the virtual device) outer checksum offload, and I read
> > > the above as quite the opposite?!?
> >
> > Sorry for the confusion.
> >
> > I agree that the outer transport offset is needed, after all.
> >
> > Without segmentation offload, the outer checksum can be calculated
> > cheaply software.
> >
> > But with segmentation offload, this either
> > - has to happen after segmentation, or
> > - the outer headers must already be setup for segments (GSO_PARTIAL)
> >
> > The first approach requires the field.
> >
> > The second is probably too much to ask of a virtio interface. Demanding
> > GSO_PARTIAL.
> >
> > It would require the driver to
> > - split a tunnel GSO packet into a GSO and
> > non-GSO pair if it payload length is not a multiple of mss.
> > - rewrite the outer headers to adjust the length fields
> > - compute the outer csum using LCO
> >
> > It's not entirely crazy. The Linux kernel already can already do this for
> > drivers that advertise tunnel offload only as part of gso_partial_features.
>
> I'm sorry for the latency.
>
> Wrapping-up the above, I (mis?) understand you are ok with the proposed
> virtio_net_hdr layout changes.
>
> WRT UDP tunnel outer header csum offload, I guess I could add (back,
> IIRC was there in a previous revision) the feature negotiation for the
> outer UDP header checksum.
>
> I think such feature negotiation makes sense only in the guest ->
> hypervisor direction, that is, device receive path, where we could hit
> possibly H/W implementation constraints.
>
> In the opposite direction it's just a matter of s/w support, it would
> be strange to me introduce UDP tunnel GSO and UDP tunnel outer checksum
> support separately.
>
> WDYT, does the above makes sense to you? Or do you think we should
> negotiate the csum feature in both directions, for better symmetry?
>
> Thanks!
>
> Paolo
>
We generally do negotiate csums in both directions, yes.
I'd rather we kept things consistent and we are not short on
feature bits.
--
MST
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-05-30 15:08 ` Willem de Bruijn
2024-05-30 16:09 ` Paolo Abeni
@ 2024-06-03 1:40 ` Jason Wang
2024-06-03 2:36 ` Willem de Bruijn
1 sibling, 1 reply; 22+ messages in thread
From: Jason Wang @ 2024-06-03 1:40 UTC (permalink / raw)
To: Willem de Bruijn
Cc: Paolo Abeni, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Thu, May 30, 2024 at 11:09 PM Willem de Bruijn <willemb@google.com> wrote:
>
> On Sun, May 26, 2024 at 11:26 PM Jason Wang <jasowang@redhat.com> wrote:
> >
[...]
>
> > I meant, virtio is not necessarily for a software implementation.
> > Various hardware vendors started to implement it as well.
> >
> > I wonder if hardware is as flexible as Linux to support general UDP
> > tunnel offload.
>
> Do you mean TSO with tunnels? Using software parser mode,
Is "software" here mean the driver?
> devices can
> support that. See also protocol independent segmentation offload
> (PISO, https://www.opencompute.org/w/index.php?title=Core_Offloads#Protocol_Independent_Segmentation_Offload)
>
This is great, does the whole idea of core offloads has the plan to
release something like ABI (where virtio-net can just transport it).
Or virtio net needs to clarify such a mechanism with e.g CH/SP fully
in the spec, so there would be no worries about whether we need a
whitelist of the supported protocols etc.
But we may have other corner case where it can break the migration
compatibility due the limitation in the hardware resources:
1) maximum legal value of e.g piso_off
2) maximum #gso_segs support by the device
etc.
Thanks
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v4] virtio-net: define UDP tunnel offload feature
2024-06-03 1:40 ` Jason Wang
@ 2024-06-03 2:36 ` Willem de Bruijn
0 siblings, 0 replies; 22+ messages in thread
From: Willem de Bruijn @ 2024-06-03 2:36 UTC (permalink / raw)
To: Jason Wang
Cc: Paolo Abeni, virtio-comment, maxime.coquelin, Eelco Chaudron,
Stefano Garzarella
On Sun, Jun 2, 2024 at 9:40 PM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, May 30, 2024 at 11:09 PM Willem de Bruijn <willemb@google.com> wrote:
> >
> > On Sun, May 26, 2024 at 11:26 PM Jason Wang <jasowang@redhat.com> wrote:
> > >
>
> [...]
>
> >
> > > I meant, virtio is not necessarily for a software implementation.
> > > Various hardware vendors started to implement it as well.
> > >
> > > I wonder if hardware is as flexible as Linux to support general UDP
> > > tunnel offload.
> >
> > Do you mean TSO with tunnels? Using software parser mode,
>
> Is "software" here mean the driver?
Yes, software parser mode (swp) implies suppressing any protocol
parsing by the device, and instead make the driver explicitly pass the
relevant offset(s) in the descriptor.
There are some references to swp in the Linux mlx5 driver, for one example.
> > devices can
> > support that. See also protocol independent segmentation offload
> > (PISO, https://www.opencompute.org/w/index.php?title=Core_Offloads#Protocol_Independent_Segmentation_Offload)
> >
>
> This is great, does the whole idea of core offloads has the plan to
> release something like ABI (where virtio-net can just transport it).
There is no plan to formalize the pattern further. The OCP spec stays
clear of specifying a concrete device API, down to struct layout,
which is vendor-specific. It only tries to list a baseline of features
that the APIs must support.
> Or virtio net needs to clarify such a mechanism with e.g CH/SP fully
> in the spec, so there would be no worries about whether we need a
> whitelist of the supported protocols etc.
>
> But we may have other corner case where it can break the migration
> compatibility due the limitation in the hardware resources:
>
> 1) maximum legal value of e.g piso_off
> 2) maximum #gso_segs support by the device
>
> etc.
True. This would ideally all have been captured in the v1.0 OCP spec.
But it sounds like we need to refine it.
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2024-06-05 11:11 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-23 7:20 [PATCH v4] virtio-net: define UDP tunnel offload feature Paolo Abeni
2024-05-23 7:50 ` Paolo Abeni
2024-05-23 8:47 ` Paolo Abeni
2024-05-27 3:26 ` Jason Wang
2024-05-27 8:06 ` Paolo Abeni
2024-05-28 2:57 ` Jason Wang
2024-05-28 17:47 ` Paolo Abeni
2024-05-29 0:30 ` Jason Wang
2024-05-29 10:25 ` Paolo Abeni
2024-05-30 2:36 ` Jason Wang
2024-05-30 15:08 ` Willem de Bruijn
2024-05-30 16:09 ` Paolo Abeni
2024-05-30 16:23 ` Willem de Bruijn
2024-05-30 16:27 ` Willem de Bruijn
2024-05-30 17:15 ` Paolo Abeni
2024-05-31 15:13 ` Willem de Bruijn
2024-06-03 14:17 ` Paolo Abeni
2024-06-03 17:41 ` Willem de Bruijn
2024-06-05 10:59 ` Paolo Abeni
2024-06-05 11:11 ` Michael S. Tsirkin
2024-06-03 1:40 ` Jason Wang
2024-06-03 2:36 ` Willem de Bruijn
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox