public inbox for virtio-comment@lists.linux.dev
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: virtio-comment@lists.linux.dev, maxime.coquelin@redhat.com,
	Eelco Chaudron <echaudro@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>,
	Willem de Bruijn <willemb@google.com>,
	kshankar@marvell.com
Subject: Re: [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature
Date: Thu, 10 Oct 2024 09:40:48 +0200	[thread overview]
Message-ID: <2b90c368-b514-4077-a6bf-43da075af4fc@redhat.com> (raw)
In-Reply-To: <CACGkMEvrowr6CwF+=RiR+f3EagdKB=rzyCPCQcUdXee4a5MaZA@mail.gmail.com>

On 10/10/24 05:17, Jason Wang wrote:
> On Wed, Oct 9, 2024 at 4:37 PM Paolo Abeni <pabeni@redhat.com> wrote:
>> On 10/9/24 09:18, Jason Wang wrote:
>>> On Fri, Oct 4, 2024 at 4:13 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>
>>>> The VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV{4,6} are two gso_type flags
>>>> allowing respectively GSO over UDPv4 tunnel and GSO over UDPv6 tunnel.
>>>> They can be negotiated on both the host and guest sides.
>>>>
>>>> One constraint addressed here is that the virtio side (either device or
>>>> driver) receiving a UDP tunneled GSO packet must be able to reconstruct
>>>> completely the inner and outer headers offset - to allow for later GSO.
>>>>
>>>> To accommodate such need, new fields are introduced in the virtio_net
>>>> header: outer_th_offset and inner_nh_offset.
>>>> They map directly to the corresponding header information. The inner
>>>> transport header is implied by the (inner) checksum offload.
>>>>
>>>> Those fields are required because a virtio device H/W implementation
>>>> performing segmentation for UDP tunneled packet will need to touch
>>>> the outer transport protocol (for the UDP length filed), the
>>>> inner network protocol (for the total length field, in the IPv4
>>>> case).
>>>>
>>>> Note that segmentation will additionally need to touch
>>>> the outer network protocol and the inner transport protocol. The first
>>>> is implied/easily found with trivial parsing, the latter is identified
>>>> by the existing csum_start field.
>>>>
>>>> Note that there is no concept of UDP tunnel type negotiation (e.g.
>>>> vxlan, geneve, vxlan-gpe, etc.), as a virtio device H/W implementation
>>>> can perform segmentation for every possible UDP-tunnel given the
>>>> specified new fields.
>>>> In the reverse direction, if a virtio device H/W implementation receives
>>>> some traffic for an unknown or unsupported UDP tunnel, it will simply
>>>> not aggregate the wire packet in a GSO one.
>>>>
>>>> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
>>>> ---
>>>> v8 -> v9:
>>>>    - rebased on top of virtio-1.4, changed udp-tunnel features number
>>>>      to avoid conflicts/duplications
>>>>    - fixed typos:
>>>>      VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV4 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV4
>>>>      VIRTIO_NET_HDR_F_UDP_TUNNEL_IPV6 -> VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV6
>>>>    - specified strict validation on RX side (both driver and device)
>>>>      for UDP tunneled packet: NEEDS_CSUM and plain GSO are mandatory for UDP
>>>>      tunnel GSO packets. The driver is not allowed to set DATA_VALID for UDP
>>>>      tunnel GSO packets, the driver can set both DATA_VALID & NEEDS_CSUM.
>>>
>>> DATA_VALID and NEEDS_CSUM are mutually exclusive unless I miss something.
>>
>> AFAICS the specification does not specify them to be mutually esclusive
>> and at least the linux kernel implementation allow them to be set together.
> 
> Interesting, where is such code? E.g in virtio_net_hdr_from_skb() we had:
> 
>          if (skb->ip_summed == CHECKSUM_PARTIAL) {
>                  hdr->flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
>                  hdr->csum_start = __cpu_to_virtio16(little_endian,
>                          skb_checksum_start_offset(skb) + vlan_hlen);
>                  hdr->csum_offset = __cpu_to_virtio16(little_endian,
>                                  skb->csum_offset);
>          } else if (has_data_valid &&
>                     skb->ip_summed == CHECKSUM_UNNECESSARY) {
>                  hdr->flags = VIRTIO_NET_HDR_F_DATA_VALID;
> ...

The problem is in the opposite direction, in virtio_net_hdr_to_skb().

It accepts GSO packets with the VIRTIO_NET_HDR_F_NEEDS_CSUM bit not set, 
and accepts also packets with both VIRTIO_NET_HDR_F_NEEDS_CSUM and 
VIRTIO_NET_HDR_F_DATA_VALID set. (Note that the PoC is more restrictive 
for GSO over UDP tunnel packets)

The above means that an hypervisor with a S/W virtio implementation need 
to deal with all sort of strange/unexpected combinations.

> Anyhow, when I introduce DATA_VALID, the idea is to map
> CHECKSUM_UNNECESSARY to DATA_VALID and CHECKSUM_PARTIAL maps to
> NEEDS_CSUM. My understanding is CHECKSUM_UNNECESSARY and
> CHECKSUM_PARTIAL and mutually exclusive.

AFAICS the problem is that the specification does not enforce such 
understanding (I can't see any related 'MUST'/'MUST NOT') and the s/w 
implementation allow the same freedom.

>> As a possible alternative to the change from v8 to v9 we can enforce
>> DATA_VALID and NEEDS_CSUM to be mutually esclusive.
> 
> It seems implied in this part of the spec:
> 
> """
> If the VIRTIO_NET_F_GUEST_CSUM feature was negotiated, the
> VIRTIO_NET_HDR_F_DATA_VALID bit in flags can be set: if so, device has
> validated the packet checksum. In case of multiple encapsulated
> protocols, one level of checksums has been validated.
> """
> 
> As DATA_VALID is only used in RX path, if packet has partial cusm,
> device can't do the validation anyhow. Or did you mean the hardware
> that can do hardware GRO, in that case, device should set NEEDS_CSUM
> but not DATA_VALID (maybe we need clarify this in the spec).

Indeed the above is a (possibly 'the') source of confusion. In the 
current spec (for non UDP tunneled GSO) it's unclear (I can't nee any 
related MUST/MUST NOT) if DATA_VALID can or can not be set with a GSO 
packet. AFAICS the current implementation in virtio_net_hdr_to_skb() 
allows that option.

If we restrict DATA_VALID to non GSO packet, I think most/all the doubt 
discussed here should go away. I'm unsure if we can enforce such 
restriction for non UDP tunneled GSO, it could break/conflict with 
existing setup/deplyment, but I think we can enforce that for UDP 
tunneled GSO packets.

Side note: as a consequence, sice we will not have anymore DATA_VALID 
together with VIRTIO_NET_HDR_F_GSO_UDP_TUNNEL_IPV{4,6}, the above will 
additional avoid the 'DATA_VALID with GSO over UDP tunnel means inner 
header is valid' strangeness.

>>>> @@ -639,12 +721,48 @@ \subsubsection{Packet Transmission}\label{sec:Device Types / Network Device / De
>>>>           header.
>>>>    \end{itemize}
>>>>
>>>> +If the VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO option has been negotiated, the
>>>> +driver MAY set the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the
>>>> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type}, if so:
>>>> +\begin{itemize}
>>>> +\item the driver MUST set \field{outer_th_offset} to the outer UDP header
>>>> +  offset and \field{inner_nh_offset} to the inner network header offset.
>>>> +  The \field{csum_start} and \field{csum_offset} fields point respectively
>>>> +  to the inner transport header and inner transport checksum field.
>>>> +\end{itemize}
>>>> +
>>>> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the
>>>> +VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, the
>>>> +outer UDP header MUST NOT require checksum validation. That is, the
>>>> +outer UDP checksum value MUST be 0 or the validated complete checksum
>>>> +for such header.
>>>> +
>>>> +\begin{note}
>>>> +The valid complete checksum of the outer UDP header of individual segments
>>>> +can be computed by the driver prior to segmentation only if the GSO packet
>>>> +size is a multiple of \field{gso_size}, because then all segments
>>>> +have the same size and thus all data included in the outer UDP
>>>> +checksum is the same for every segment. These pre-computed segment
>>>> +length and checksum fields are different from those of the GSO
>>>> +packet.
>>>
>>> How could a device know if the checksum if pre-computed or not? (I
>>> could not find the relevant part in the sample kernel code).
>>
>> Note that the UDP tunnel csum offload is discusses in patch 2/2 as per
>> your request. At this point (only patch 1/2 applied) there is no concept
>> of csum offload for the outer header. With only patch 1/2 applied the
>> outer UDP will be always left-alone/ignored/not touched (either 0 or
>> pre-computed).
> 
> Correct me if I was wrong, my understanding is that, device should
> behave differently in those cases:
> 
> 0: do a full checksum
> pre-computed: checksum only the data part
> So you meant if device sees a non zero checksum, it realizes that it
> is a pre-computed one? If yes, we probably need to clarify this in the
> spec.

Lacking the UDP tunnel csum offload, the device will simply not touch 
the outer UDP csum. That will be the correct action both if the outer 
csum is 0 or the fully csum one.

>>> Firstly, it doesn't mandate the csum_start/csum_offset. Anything makes
>>> the tunnel GSO different?
>>
>> This difference is intentional, to satisfy a requested from Willem for
>> "robustness". We have a continuos flow of serious bugs in the current
>> virtio implementation due to the spec allowing "unexpected"/strange/weak
>> features combination leading to strange packet layout and crashes.
>>
>> i.e. see the kernel commits:
>> 49d14b54a527289d09a9480f214b8c586322310a
>> 6513eb3d3191574b58859ef2d6dc26c0277c6f81
>> 89add40066f9ed9abe5f7f886fe5789ff7e0c50e
>>
> 
> I see.
> 
>> The idea is that GSO over UDP tunnel specs are new, so we can be more
>> strict and we try to mandate only 'sane' flag/features/layout combinations.
> 
> But I don't see how mandating device to set csum stuffs can help.
> 
> 1) There's no way for the driver to know if device sets csum stuffs or not
> 2) Driver should not trust the device, malicious device can set
> arbitrary value, so driver needs its own validation

We are discussing only about csum validy, right? Lacking UDP tunnel csum 
offload, the device does not need to discriminate between outer zero 
csum and outer fully checksummed packets: in both case it should not 
touch the outer UDP csum.

The 'worst case scenario' is that the guest will send packets with bad 
outer csum, which is AFACS legit, unharmful and sometimes necesary (for 
testing purpose).

The guest can already send packets with bad csum in many different ways.

>>> Secondly, I think "one level of checksums" means the outer, this seems
>>> to be contradict to what we had here: 1) DATA_VALID seems to be for
>>> inner 2) no outer checksum offload
>>
>> Yes, with this spec GSO over UDP tunnel DATA_VALID specify the inner
>> head csum.
> 
> Just to make sure we are on the same page. I meant we probably need a
> clarification on the "one level of checksums" here. It looks ambiguous
> as I see we think differently here:
> 
> Assuming we have a packet with N level of tunnels.
> 
> 1) if it means the N level tunnel, this means
> 1.1) device needs complicated logic to analyze each level of the
> tunneling which seems to be impossible
> 1.2) in this case, if device want to set the csum_start/offset, we
> will need to answer if it for 0 or N
> 
> 2) if it means the 0 level tunnel (that's my understanding so far), we
> end up with my questions above
> 2.1) DATA_VALID means 0 level, we can't reuse it for inner
> 2.2) if DATA_VALID means 0 level, we should already had outer checksum
> offload at least in the RX path
> 
> Btw, I think we should clarify that the tunnel GSO only works for one level.

I hope and believe that this part should have been clarified by a 
previously discussed point: no DATA_VALID for GSO over UDP tunnel, no 
more differentiate/confusing meaning for the DATA VALID bit.

>>>> @@ -920,6 +1092,24 @@ \subsubsection{Processing of Incoming Packets}\label{sec:Device Types / Network
>>>>    VIRTIO_NET_HDR_F_DATA_VALID is set, the driver MUST NOT
>>>>    rely on the packet checksum being correct.
>>>>
>>>> +If both the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit and
>>>> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in in \field{gso_type} are set,
>>>> +the driver MUST NOT accept the packet.
>>>> +
>>>> +If the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6
>>>> +bit in \field{gso_type} are not set, the driver MUST NOT use the
>>>> +\field{outer_th_offset} and \field{inner_nh_offset}.
>>>> +
>>>> +If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or
>>>> +the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set, and any of
>>>> +the following is true:
>>>> +\begin{itemize}
>>>> +\item the VIRTIO_NET_HDR_F_NEEDS_CSUM is not set in \field{flags}
>>>> +\item the \field{gso_type} excluding the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4
>>>> +bit and the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit is VIRTIO_NET_HDR_GSO_NONE
>>>
>>> Maybe we can just simplify this as
>>>
>>> "VIRTIO_NET_HDR_GSO_NONE is set in \field{gso_type}"
>>
>> I considered that option, the main problem is that such statement will
>> never be true: gso_type has at least a bit set
>> (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 or
>> VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6). I preferred to be more accurate
>> even if more verbose. I think we should preserve this accuracy.
> 
> Note that this has been under context of:
> 
> """
> If either the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 bit or
> the VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 bit in \field{gso_type} are set,
> """

That is my point: we know that this statement:

hdr->gso_type == VIRTIO_NET_HDR_GSO_NONE

is false. The goal is to be as accurate as possible, to avoid 
strangess/later misuse or misinterpretation.

/P


  reply	other threads:[~2024-10-10  7:40 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-04  8:13 [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni
2024-10-04  8:13 ` [PATCH v9 1/2] virtio-net: define UDP tunnel segmentation offload feature Paolo Abeni
2024-10-09  7:18   ` Jason Wang
2024-10-09  8:37     ` Paolo Abeni
2024-10-10  3:17       ` Jason Wang
2024-10-10  7:40         ` Paolo Abeni [this message]
2024-10-11  2:08           ` Jason Wang
2024-10-11  7:50             ` Paolo Abeni
2024-10-14  7:20             ` Paolo Abeni
2024-10-17  6:47               ` Jason Wang
2024-10-17 15:34                 ` Paolo Abeni
2024-10-18  4:26                   ` Jason Wang
2024-10-18 10:10                     ` Paolo Abeni
2024-10-20 22:28                       ` Willem de Bruijn
2024-10-21 15:47                         ` Paolo Abeni
2024-10-22  7:54                           ` Jason Wang
2024-10-23 20:57                             ` Willem de Bruijn
2024-10-25  8:41                               ` Jason Wang
2024-10-21  6:54                       ` Jason Wang
2024-10-21 16:27                         ` Paolo Abeni
2024-10-22  7:42                           ` Jason Wang
2024-10-22 16:56                             ` Paolo Abeni
2024-10-25  8:28                               ` Jason Wang
2024-10-25 11:50                                 ` Paolo Abeni
2024-10-25 13:28                                   ` Willem de Bruijn
2024-10-25 14:35                                     ` Paolo Abeni
2024-10-25 15:47                                       ` Willem de Bruijn
2024-10-28  3:27                                   ` Jason Wang
2024-10-28 12:08                                     ` Paolo Abeni
2024-10-28 12:26                                       ` Willem de Bruijn
2024-10-28 14:23                                         ` Paolo Abeni
2024-10-29  7:32                                         ` Jason Wang
2024-10-04  8:13 ` [PATCH v9 2/2] virtio-net: define UDP tunnel checksum " Paolo Abeni
2024-10-09  7:18   ` Jason Wang
2024-10-09  9:39     ` Paolo Abeni
2024-10-10  4:22       ` Jason Wang
2024-10-04 16:53 ` [PATCH v9 0/2] virtio-net: define UDP tunnel offload Paolo Abeni
2024-10-09  7:24   ` Jason Wang
2024-10-09  8:08     ` Paolo Abeni
2024-10-10  2:29       ` Jason Wang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2b90c368-b514-4077-a6bf-43da075af4fc@redhat.com \
    --to=pabeni@redhat.com \
    --cc=echaudro@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=kshankar@marvell.com \
    --cc=maxime.coquelin@redhat.com \
    --cc=sgarzare@redhat.com \
    --cc=virtio-comment@lists.linux.dev \
    --cc=willemb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox