netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org,  "David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Willem de Bruijn <willemb@google.com>,
	kernel-team@cloudflare.com,
	syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Subject: Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
Date: Fri, 26 Jul 2024 13:23:13 +0200	[thread overview]
Message-ID: <87h6ccl7mm.fsf@cloudflare.com> (raw)
In-Reply-To: <CAF=yD-LLPPg77MUhdXrHUVJj4o2+rnOC_qsHc_8tKurTsAGkYw@mail.gmail.com> (Willem de Bruijn's message of "Thu, 25 Jul 2024 10:21:50 -0400")

On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> doesn't support checksum offload. This was done to satisfy the offload
>> checks in the gso stack.
>>
>> However, when sending a UDP GSO packet from a tunnel device, we will go
>> through the TX path and the GSO offload twice. Once for the tunnel device,
>> which acts as a passthru for GSO packets, and once for the underlying
>> egress device.
>>
>> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> offload checks still happen on transmit from a tunnel device. So if the skb
>> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> warning from the gso stack.
>
> I don't entirely understand. The check should not hit on pass through,
> where segs == skb:
>
>         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> !IS_ERR(segs)))
>                 skb_warn_bad_offload(skb);
>

That's something I should have explained better. Let me try to shed some
light on it now. We're hitting the skb_warn_bad_offload warning because
skb_mac_gso_segment doesn't return any segments (segs == NULL).

And that's because we bail out early out of __udp_gso_segment when we
detect that the tunnel device is capable of tx-udp-segmentation
(GSO_UDP_L4):

	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
		/* Packet is from an untrusted source, reset gso_segs. */
		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
							     mss);
		return NULL;
	}

It has not occurred to me before, but in the spirit of commit
8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
tighten the check to exclude cases when segs == NULL. I'm thinking of:

	if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
		skb_warn_bad_offload(skb);

That would be an alternative. Though I'm not sure I understand the
consequences of such change fully yet. Namely if we're wouldn't be
losing some diagnostics from the bad offload warning.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada

>> Today this can occur in two situations, which we check for in
>> __ip_append_data() and __ip6_append_data():
>>
>> 1) when the tunnel device does not advertise checksum offload, or
>> 2) when there are IPv6 extension headers present.
>>
>> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
>> path, when still in the udp layer, since we need to have ip_summed set up
>> correctly for GSO processing by tunnel devices.
>
> The previous patch converted segments post segmentation to
> CHECKSUM_UNNECESSARY, which is fine as they had
> already been checksummed in software, and CHECKSUM_NONE
> packets on egress are common.
>
> This creates GSO packets without CHECKSUM_PARTIAL.
> Segmentation offload always requires checksum offload. So these
> would be weird new packets. And having CHECKSUM_NONE (or
> equivalent), but entering software checksumming is also confusing.

I agree this is confusing to reason about. That is a GSO packet with
CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
offload in software.

Kind of related, I noticed that turning off tx-checksum-ip-generic with
ethtool doesn't disable tx-udp-segmentation. That looks like a bug.

> The crux is that I don't understand why the warning fires on tunnel
> exit when no segmentation takes place there. Hopefully we can fix
> in a way that does not introduce these weird GSO packets (but if
> not, so be it).

Attaching a self contained repro which I've been using to trace and
understand the GSO code:

---8<---

sh# cat repro-full.py
#!/bin/env python
#
# `modprobe ip6_tunnel` might be needed.
#

import os
import subprocess
import shutil
from socket import *

UDP_SEGMENT = 103

cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
script = b"""
link set dev lo up

link add name sink mtu 1540 type dummy
addr add dev sink fd11::2/48 nodad
link set dev sink up

tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
link set dev iptnl mtu 1500
addr add dev iptnl fd00::2/48 nodad
link set dev iptnl up
"""
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.communicate(input=script)

os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")

# Alternatively to hopopts:
# os.system("ethtool -K iptnl tx-checksum-ip-generic off")

hopopts = b"\x00" * 8
s = socket(AF_INET6, SOCK_DGRAM)
s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
s.sendto(b"x" * 3000, ("fd00::1", 9))
sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 16)               |  __skb_gso_segment() {
 16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.172 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        udp6_ufo_fragment() {
 16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
 16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
 16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
 16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
 16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
 16)               |    skb_warn_bad_offload() {
 [...]
 16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
 16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
 16)               |  __skb_gso_segment() {
 16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.181 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        ip6ip6_gso_segment() {
 16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
 16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
 16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
 16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
 16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
 16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
sh#

  reply	other threads:[~2024-07-26 11:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-25  9:55 [PATCH net 0/2] Fix bad offload warning when sending UDP GSO from a tunnel device Jakub Sitnicki
2024-07-25  9:55 ` [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output Jakub Sitnicki
2024-07-25 14:21   ` Willem de Bruijn
2024-07-26 11:23     ` Jakub Sitnicki [this message]
2024-07-26 13:58       ` Willem de Bruijn
2024-07-29 22:10         ` Jakub Sitnicki
2024-07-25  9:55 ` [PATCH net 2/2] selftests/net: Add coverage for UDP GSO with egress from tunnel Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h6ccl7mm.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).