All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: netdev@vger.kernel.org,  "David S. Miller" <davem@davemloft.net>,
	 Eric Dumazet <edumazet@google.com>,
	 Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>,
	 Willem de Bruijn <willemb@google.com>,
	kernel-team@cloudflare.com,
	syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Subject: Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
Date: Fri, 26 Jul 2024 13:23:13 +0200	[thread overview]
Message-ID: <87h6ccl7mm.fsf@cloudflare.com> (raw)
In-Reply-To: <CAF=yD-LLPPg77MUhdXrHUVJj4o2+rnOC_qsHc_8tKurTsAGkYw@mail.gmail.com> (Willem de Bruijn's message of "Thu, 25 Jul 2024 10:21:50 -0400")

On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> doesn't support checksum offload. This was done to satisfy the offload
>> checks in the gso stack.
>>
>> However, when sending a UDP GSO packet from a tunnel device, we will go
>> through the TX path and the GSO offload twice. Once for the tunnel device,
>> which acts as a passthru for GSO packets, and once for the underlying
>> egress device.
>>
>> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> offload checks still happen on transmit from a tunnel device. So if the skb
>> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> warning from the gso stack.
>
> I don't entirely understand. The check should not hit on pass through,
> where segs == skb:
>
>         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> !IS_ERR(segs)))
>                 skb_warn_bad_offload(skb);
>

That's something I should have explained better. Let me try to shed some
light on it now. We're hitting the skb_warn_bad_offload warning because
skb_mac_gso_segment doesn't return any segments (segs == NULL).

And that's because we bail out early out of __udp_gso_segment when we
detect that the tunnel device is capable of tx-udp-segmentation
(GSO_UDP_L4):

	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
		/* Packet is from an untrusted source, reset gso_segs. */
		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
							     mss);
		return NULL;
	}

It has not occurred to me before, but in the spirit of commit
8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
tighten the check to exclude cases when segs == NULL. I'm thinking of:

	if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
		skb_warn_bad_offload(skb);

That would be an alternative. Though I'm not sure I understand the
consequences of such change fully yet. Namely if we're wouldn't be
losing some diagnostics from the bad offload warning.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada

>> Today this can occur in two situations, which we check for in
>> __ip_append_data() and __ip6_append_data():
>>
>> 1) when the tunnel device does not advertise checksum offload, or
>> 2) when there are IPv6 extension headers present.
>>
>> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
>> path, when still in the udp layer, since we need to have ip_summed set up
>> correctly for GSO processing by tunnel devices.
>
> The previous patch converted segments post segmentation to
> CHECKSUM_UNNECESSARY, which is fine as they had
> already been checksummed in software, and CHECKSUM_NONE
> packets on egress are common.
>
> This creates GSO packets without CHECKSUM_PARTIAL.
> Segmentation offload always requires checksum offload. So these
> would be weird new packets. And having CHECKSUM_NONE (or
> equivalent), but entering software checksumming is also confusing.

I agree this is confusing to reason about. That is a GSO packet with
CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
offload in software.

Kind of related, I noticed that turning off tx-checksum-ip-generic with
ethtool doesn't disable tx-udp-segmentation. That looks like a bug.

> The crux is that I don't understand why the warning fires on tunnel
> exit when no segmentation takes place there. Hopefully we can fix
> in a way that does not introduce these weird GSO packets (but if
> not, so be it).

Attaching a self contained repro which I've been using to trace and
understand the GSO code:

---8<---

sh# cat repro-full.py
#!/bin/env python
#
# `modprobe ip6_tunnel` might be needed.
#

import os
import subprocess
import shutil
from socket import *

UDP_SEGMENT = 103

cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
script = b"""
link set dev lo up

link add name sink mtu 1540 type dummy
addr add dev sink fd11::2/48 nodad
link set dev sink up

tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
link set dev iptnl mtu 1500
addr add dev iptnl fd00::2/48 nodad
link set dev iptnl up
"""
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.communicate(input=script)

os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")

# Alternatively to hopopts:
# os.system("ethtool -K iptnl tx-checksum-ip-generic off")

hopopts = b"\x00" * 8
s = socket(AF_INET6, SOCK_DGRAM)
s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
s.sendto(b"x" * 3000, ("fd00::1", 9))
sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 16)               |  __skb_gso_segment() {
 16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.172 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        udp6_ufo_fragment() {
 16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
 16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
 16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
 16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
 16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
 16)               |    skb_warn_bad_offload() {
 [...]
 16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
 16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
 16)               |  __skb_gso_segment() {
 16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.181 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        ip6ip6_gso_segment() {
 16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
 16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
 16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
 16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
 16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
 16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
sh#

  reply	other threads:[~2024-07-26 11:23 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-25  9:55 [PATCH net 0/2] Fix bad offload warning when sending UDP GSO from a tunnel device Jakub Sitnicki
2024-07-25  9:55 ` [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output Jakub Sitnicki
2024-07-25 14:21   ` Willem de Bruijn
2024-07-26 11:23     ` Jakub Sitnicki [this message]
2024-07-26 13:58       ` Willem de Bruijn
2024-07-29 22:10         ` Jakub Sitnicki
2024-07-25  9:55 ` [PATCH net 2/2] selftests/net: Add coverage for UDP GSO with egress from tunnel Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87h6ccl7mm.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com \
    --cc=willemb@google.com \
    --cc=willemdebruijn.kernel@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.