* [PATCH net 0/2] Fix bad offload warning when sending UDP GSO from a tunnel device
@ 2024-07-25 9:55 Jakub Sitnicki
2024-07-25 9:55 ` [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output Jakub Sitnicki
2024-07-25 9:55 ` [PATCH net 2/2] selftests/net: Add coverage for UDP GSO with egress from tunnel Jakub Sitnicki
0 siblings, 2 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2024-07-25 9:55 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, kernel-team, syzbot+e15b7e15b8a751a91d9a
This series addresses a recent regression report from syzbot [1].
Please see patch 1 description for details.
[1] https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Jakub Sitnicki (2):
udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
selftests/net: Add coverage for UDP GSO with egress from tunnel
net/ipv4/udp.c | 7 ++++++
net/ipv4/udp_offload.c | 8 -------
net/ipv6/udp.c | 7 ++++++
tools/testing/selftests/net/udpgso.sh | 41 ++++++++++++++++++++++++++++++++---
4 files changed, 52 insertions(+), 11 deletions(-)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
2024-07-25 9:55 [PATCH net 0/2] Fix bad offload warning when sending UDP GSO from a tunnel device Jakub Sitnicki
@ 2024-07-25 9:55 ` Jakub Sitnicki
2024-07-25 14:21 ` Willem de Bruijn
2024-07-25 9:55 ` [PATCH net 2/2] selftests/net: Add coverage for UDP GSO with egress from tunnel Jakub Sitnicki
1 sibling, 1 reply; 7+ messages in thread
From: Jakub Sitnicki @ 2024-07-25 9:55 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, kernel-team, syzbot+e15b7e15b8a751a91d9a
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
checksum offload") we have added a tweak in the UDP GSO code to mark GSO
packets being sent out as CHECKSUM_UNNECESSARY when the egress device
doesn't support checksum offload. This was done to satisfy the offload
checks in the gso stack.
However, when sending a UDP GSO packet from a tunnel device, we will go
through the TX path and the GSO offload twice. Once for the tunnel device,
which acts as a passthru for GSO packets, and once for the underlying
egress device.
Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
offload checks still happen on transmit from a tunnel device. So if the skb
is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
warning from the gso stack.
Today this can occur in two situations, which we check for in
__ip_append_data() and __ip6_append_data():
1) when the tunnel device does not advertise checksum offload, or
2) when there are IPv6 extension headers present.
Syzbot has triggered the second case, producing a warning as below:
ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
Modules linked in:
CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
[...]
Call Trace:
<TASK>
__skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
__dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
neigh_output include/net/neighbour.h:542 [inline]
ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0xef/0x270 net/socket.c:745
____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
___sys_sendmsg net/socket.c:2639 [inline]
__sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
__do_sys_sendmmsg net/socket.c:2754 [inline]
__se_sys_sendmmsg net/socket.c:2751 [inline]
__x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[...]
</TASK>
To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
path, when still in the udp layer, since we need to have ip_summed set up
correctly for GSO processing by tunnel devices.
Note that even if GSO packet gets marked as CHECKSUM_PARTIAL due to tunnel
advertising HW csum offload, it will not prevent software csum offload in
UDP GSO from kicking in if the underlying device doesn't offer csum
offload (for example, a TUN/TAP device with default config). This is
because we recheck device features in gso stack instead relying on the
ip_summed hint.
Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
net/ipv4/udp.c | 7 +++++++
net/ipv4/udp_offload.c | 8 --------
net/ipv6/udp.c | 7 +++++++
3 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 49c622e743e8..b7254b8a1e56 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -946,6 +946,13 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
}
if (datalen > cork->gso_size) {
+ /* On the TX path CHECKSUM_NONE and CHECKSUM_UNNECESSARY
+ * have the same meaning. However, check for bad
+ * offloads in the GSO stack expects the latter, if the
+ * checksum can be calculated in software.
+ */
+ if (skb->ip_summed == CHECKSUM_NONE)
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
skb_shinfo(skb)->gso_size = cork->gso_size;
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index aa2e0a28ca61..59448a2dbf2c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -357,14 +357,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
else
uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
- /* On the TX path, CHECKSUM_NONE and CHECKSUM_UNNECESSARY have the same
- * meaning. However, check for bad offloads in the GSO stack expects the
- * latter, if the checksum was calculated in software. To vouch for the
- * segment skbs we actually need to set it on the gso_skb.
- */
- if (gso_skb->ip_summed == CHECKSUM_NONE)
- gso_skb->ip_summed = CHECKSUM_UNNECESSARY;
-
/* update refcount for the packet */
if (copy_dtor) {
int delta = sum_truesize - gso_skb->truesize;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6602a2e9cdb5..360392fc2b68 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1262,6 +1262,13 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
}
if (datalen > cork->gso_size) {
+ /* On the TX path CHECKSUM_NONE and CHECKSUM_UNNECESSARY
+ * have the same meaning. However, check for bad
+ * offloads in the GSO stack expects the latter, if the
+ * checksum can be calculated in software.
+ */
+ if (skb->ip_summed == CHECKSUM_NONE)
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
skb_shinfo(skb)->gso_size = cork->gso_size;
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
--
2.40.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH net 2/2] selftests/net: Add coverage for UDP GSO with egress from tunnel
2024-07-25 9:55 [PATCH net 0/2] Fix bad offload warning when sending UDP GSO from a tunnel device Jakub Sitnicki
2024-07-25 9:55 ` [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output Jakub Sitnicki
@ 2024-07-25 9:55 ` Jakub Sitnicki
1 sibling, 0 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2024-07-25 9:55 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, kernel-team
After enabling UDP GSO for devices not offering checksum offload, we have
hit a regression where a bad offload warning can be triggered if egressing
through a tunnel device.
This can happen when the tunnel device has checksum offload disabled or
when IPv6 extension headers are present.
Extend the UDP GSO tests to cover the egress from a tunnel device scenario.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
tools/testing/selftests/net/udpgso.sh | 41 ++++++++++++++++++++++++++++++++---
1 file changed, 38 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso.sh b/tools/testing/selftests/net/udpgso.sh
index 85d1fa3c1ff7..3fb6fea06b28 100755
--- a/tools/testing/selftests/net/udpgso.sh
+++ b/tools/testing/selftests/net/udpgso.sh
@@ -28,9 +28,13 @@ test_route_mtu() {
}
setup_dummy_sink() {
- ip link add name sink mtu 1500 type dummy
- ip addr add dev sink 10.0.0.0/24
- ip addr add dev sink fd00::2/64 nodad
+ mtu="${1:-1500}"
+ prefix4="${2:-10.0.0.2/24}"
+ prefix6="${3:-fd00::2/48}"
+
+ ip link add name sink mtu "${mtu}" type dummy
+ ip addr add dev sink "${prefix4}"
+ ip addr add dev sink "${prefix6}" nodad
ip link set dev sink up
}
@@ -52,6 +56,25 @@ test_sw_gso_sw_csum() {
ethtool -K sink tx-udp-segmentation off >/dev/null
}
+setup_ipip_tunnel() {
+ setup_dummy_sink 1520 10.1.1.2/24 fd11::2/48
+
+ ip tunnel add iptnl mode ipip local 10.1.1.2 remote 10.1.1.1
+ ip addr add dev iptnl 10.0.0.2/24
+ ip addr add dev iptnl fd00::2/48 nodad
+ ip link set dev iptnl up
+}
+
+test_tunnel_hw_csum() {
+ setup_ipip_tunnel
+ ethtool -K iptnl tx-checksum-ip-generic on >/dev/null
+}
+
+test_tunnel_sw_csum() {
+ setup_ipip_tunnel
+ ethtool -K iptnl tx-checksum-ip-generic off >/dev/null
+}
+
if [ "$#" -gt 0 ]; then
"$1"
shift 2 # pop "test_*" arg and "--" delimiter
@@ -99,3 +122,15 @@ echo "ipv4 sw-gso sw-csum"
echo "ipv6 sw-gso sw-csum"
./in_netns.sh "$0" test_sw_gso_sw_csum -- ./udpgso -6 -C -R
+
+echo "ipv4 tunnel hw-csum"
+./in_netns.sh "$0" test_tunnel_hw_csum -- ./udpgso -4 -C -R
+
+echo "ipv6 tunnel hw-csum"
+./in_netns.sh "$0" test_tunnel_hw_csum -- ./udpgso -6 -C -R
+
+echo "ipv4 tunnel sw-csum"
+./in_netns.sh "$0" test_tunnel_sw_csum -- ./udpgso -4 -C -R
+
+echo "ipv6 tunnel sw-csum"
+./in_netns.sh "$0" test_tunnel_sw_csum -- ./udpgso -6 -C -R
--
2.40.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
2024-07-25 9:55 ` [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output Jakub Sitnicki
@ 2024-07-25 14:21 ` Willem de Bruijn
2024-07-26 11:23 ` Jakub Sitnicki
0 siblings, 1 reply; 7+ messages in thread
From: Willem de Bruijn @ 2024-07-25 14:21 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Willem de Bruijn, kernel-team,
syzbot+e15b7e15b8a751a91d9a
On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> doesn't support checksum offload. This was done to satisfy the offload
> checks in the gso stack.
>
> However, when sending a UDP GSO packet from a tunnel device, we will go
> through the TX path and the GSO offload twice. Once for the tunnel device,
> which acts as a passthru for GSO packets, and once for the underlying
> egress device.
>
> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> offload checks still happen on transmit from a tunnel device. So if the skb
> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> warning from the gso stack.
I don't entirely understand. The check should not hit on pass through,
where segs == skb:
if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
!IS_ERR(segs)))
skb_warn_bad_offload(skb);
> Today this can occur in two situations, which we check for in
> __ip_append_data() and __ip6_append_data():
>
> 1) when the tunnel device does not advertise checksum offload, or
> 2) when there are IPv6 extension headers present.
>
> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> path, when still in the udp layer, since we need to have ip_summed set up
> correctly for GSO processing by tunnel devices.
The previous patch converted segments post segmentation to
CHECKSUM_UNNECESSARY, which is fine as they had
already been checksummed in software, and CHECKSUM_NONE
packets on egress are common.
This creates GSO packets without CHECKSUM_PARTIAL.
Segmentation offload always requires checksum offload. So these
would be weird new packets. And having CHECKSUM_NONE (or
equivalent), but entering software checksumming is also confusing.
The crux is that I don't understand why the warning fires on tunnel
exit when no segmentation takes place there. Hopefully we can fix
in a way that does not introduce these weird GSO packets (but if
not, so be it).
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
2024-07-25 14:21 ` Willem de Bruijn
@ 2024-07-26 11:23 ` Jakub Sitnicki
2024-07-26 13:58 ` Willem de Bruijn
0 siblings, 1 reply; 7+ messages in thread
From: Jakub Sitnicki @ 2024-07-26 11:23 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Willem de Bruijn, kernel-team,
syzbot+e15b7e15b8a751a91d9a
On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> doesn't support checksum offload. This was done to satisfy the offload
>> checks in the gso stack.
>>
>> However, when sending a UDP GSO packet from a tunnel device, we will go
>> through the TX path and the GSO offload twice. Once for the tunnel device,
>> which acts as a passthru for GSO packets, and once for the underlying
>> egress device.
>>
>> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> offload checks still happen on transmit from a tunnel device. So if the skb
>> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> warning from the gso stack.
>
> I don't entirely understand. The check should not hit on pass through,
> where segs == skb:
>
> if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> !IS_ERR(segs)))
> skb_warn_bad_offload(skb);
>
That's something I should have explained better. Let me try to shed some
light on it now. We're hitting the skb_warn_bad_offload warning because
skb_mac_gso_segment doesn't return any segments (segs == NULL).
And that's because we bail out early out of __udp_gso_segment when we
detect that the tunnel device is capable of tx-udp-segmentation
(GSO_UDP_L4):
if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
/* Packet is from an untrusted source, reset gso_segs. */
skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
mss);
return NULL;
}
It has not occurred to me before, but in the spirit of commit
8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
tighten the check to exclude cases when segs == NULL. I'm thinking of:
if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
skb_warn_bad_offload(skb);
That would be an alternative. Though I'm not sure I understand the
consequences of such change fully yet. Namely if we're wouldn't be
losing some diagnostics from the bad offload warning.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
>> Today this can occur in two situations, which we check for in
>> __ip_append_data() and __ip6_append_data():
>>
>> 1) when the tunnel device does not advertise checksum offload, or
>> 2) when there are IPv6 extension headers present.
>>
>> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
>> path, when still in the udp layer, since we need to have ip_summed set up
>> correctly for GSO processing by tunnel devices.
>
> The previous patch converted segments post segmentation to
> CHECKSUM_UNNECESSARY, which is fine as they had
> already been checksummed in software, and CHECKSUM_NONE
> packets on egress are common.
>
> This creates GSO packets without CHECKSUM_PARTIAL.
> Segmentation offload always requires checksum offload. So these
> would be weird new packets. And having CHECKSUM_NONE (or
> equivalent), but entering software checksumming is also confusing.
I agree this is confusing to reason about. That is a GSO packet with
CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
offload in software.
Kind of related, I noticed that turning off tx-checksum-ip-generic with
ethtool doesn't disable tx-udp-segmentation. That looks like a bug.
> The crux is that I don't understand why the warning fires on tunnel
> exit when no segmentation takes place there. Hopefully we can fix
> in a way that does not introduce these weird GSO packets (but if
> not, so be it).
Attaching a self contained repro which I've been using to trace and
understand the GSO code:
---8<---
sh# cat repro-full.py
#!/bin/env python
#
# `modprobe ip6_tunnel` might be needed.
#
import os
import subprocess
import shutil
from socket import *
UDP_SEGMENT = 103
cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
script = b"""
link set dev lo up
link add name sink mtu 1540 type dummy
addr add dev sink fd11::2/48 nodad
link set dev sink up
tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
link set dev iptnl mtu 1500
addr add dev iptnl fd00::2/48 nodad
link set dev iptnl up
"""
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.communicate(input=script)
os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")
# Alternatively to hopopts:
# os.system("ethtool -K iptnl tx-checksum-ip-generic off")
hopopts = b"\x00" * 8
s = socket(AF_INET6, SOCK_DGRAM)
s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
s.sendto(b"x" * 3000, ("fd00::1", 9))
sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
16) | __skb_gso_segment() {
16) 0.288 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
16) 0.172 us | idle_cpu(); /* = 0x0 */
16) | skb_mac_gso_segment() {
16) 0.184 us | skb_network_protocol(); /* = 0xdd86 */
16) 0.161 us | __rcu_read_lock(); /* = 0x2 */
16) | ipv6_gso_segment() {
16) | rcu_read_lock_held() {
16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.514 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.152 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
16) | udp6_ufo_fragment() {
16) 0.237 us | __udp_gso_segment(); /* = 0x0 */
16) 0.727 us | } /* udp6_ufo_fragment = 0x0 */
16) 3.049 us | } /* ipv6_gso_segment = 0x0 */
16) 0.171 us | __rcu_read_unlock(); /* = 0x1 */
16) 4.748 us | } /* skb_mac_gso_segment = 0x0 */
16) | skb_warn_bad_offload() {
[...]
16) ! 785.215 us | } /* skb_warn_bad_offload = 0x0 */
16) ! 800.986 us | } /* __skb_gso_segment = 0x0 */
16) | __skb_gso_segment() {
16) 0.394 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
16) 0.181 us | idle_cpu(); /* = 0x0 */
16) | skb_mac_gso_segment() {
16) 0.182 us | skb_network_protocol(); /* = 0xdd86 */
16) 0.178 us | __rcu_read_lock(); /* = 0x3 */
16) | ipv6_gso_segment() {
16) | rcu_read_lock_held() {
16) 0.155 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.556 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
16) | rcu_read_lock_held() {
16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
16) | ip6ip6_gso_segment() {
16) + 22.176 us | ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
16) + 24.875 us | } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
16) + 27.416 us | } /* ipv6_gso_segment = 0xffffa00c03018c00 */
16) 0.230 us | __rcu_read_unlock(); /* = 0x2 */
16) + 29.065 us | } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
16) + 32.828 us | } /* __skb_gso_segment = 0xffffa00c03018c00 */
sh#
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
2024-07-26 11:23 ` Jakub Sitnicki
@ 2024-07-26 13:58 ` Willem de Bruijn
2024-07-29 22:10 ` Jakub Sitnicki
0 siblings, 1 reply; 7+ messages in thread
From: Willem de Bruijn @ 2024-07-26 13:58 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Willem de Bruijn, kernel-team,
syzbot+e15b7e15b8a751a91d9a
On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> >> doesn't support checksum offload. This was done to satisfy the offload
> >> checks in the gso stack.
> >>
> >> However, when sending a UDP GSO packet from a tunnel device, we will go
> >> through the TX path and the GSO offload twice. Once for the tunnel device,
> >> which acts as a passthru for GSO packets, and once for the underlying
> >> egress device.
> >>
> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> >> offload checks still happen on transmit from a tunnel device. So if the skb
> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> >> warning from the gso stack.
> >
> > I don't entirely understand. The check should not hit on pass through,
> > where segs == skb:
> >
> > if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> > !IS_ERR(segs)))
> > skb_warn_bad_offload(skb);
> >
>
> That's something I should have explained better. Let me try to shed some
> light on it now. We're hitting the skb_warn_bad_offload warning because
> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>
> And that's because we bail out early out of __udp_gso_segment when we
> detect that the tunnel device is capable of tx-udp-segmentation
> (GSO_UDP_L4):
>
> if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
> /* Packet is from an untrusted source, reset gso_segs. */
> skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
> mss);
> return NULL;
> }
Oh I see. Thanks.
> It has not occurred to me before, but in the spirit of commit
> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>
> if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
> skb_warn_bad_offload(skb);
That looks sensible to me. And nicer than the ip_summed conversion in
udp_send_skb.
> That would be an alternative. Though I'm not sure I understand the
> consequences of such change fully yet. Namely if we're wouldn't be
> losing some diagnostics from the bad offload warning.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
>
> >> Today this can occur in two situations, which we check for in
> >> __ip_append_data() and __ip6_append_data():
> >>
> >> 1) when the tunnel device does not advertise checksum offload, or
> >> 2) when there are IPv6 extension headers present.
> >>
> >> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> >> path, when still in the udp layer, since we need to have ip_summed set up
> >> correctly for GSO processing by tunnel devices.
> >
> > The previous patch converted segments post segmentation to
> > CHECKSUM_UNNECESSARY, which is fine as they had
> > already been checksummed in software, and CHECKSUM_NONE
> > packets on egress are common.
> >
> > This creates GSO packets without CHECKSUM_PARTIAL.
> > Segmentation offload always requires checksum offload. So these
> > would be weird new packets. And having CHECKSUM_NONE (or
> > equivalent), but entering software checksumming is also confusing.
>
> I agree this is confusing to reason about. That is a GSO packet with
> CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
> offload in software.
I was mistaken earlier. Was looking at this code just yesterday too for
https://lore.kernel.org/netdev/20240726023359.879166-1-willemdebruijn.kernel@gmail.com/
We do set the GSO skb already skb CHECKSUM_NONE. So your suggestion is
not a significant change.
> Kind of related, I noticed that turning off tx-checksum-ip-generic with
> ethtool doesn't disable tx-udp-segmentation. That looks like a bug.
I saw the same :)
> > The crux is that I don't understand why the warning fires on tunnel
> > exit when no segmentation takes place there. Hopefully we can fix
> > in a way that does not introduce these weird GSO packets (but if
> > not, so be it).
>
> Attaching a self contained repro which I've been using to trace and
> understand the GSO code:
>
> ---8<---
>
> sh# cat repro-full.py
> #!/bin/env python
> #
> # `modprobe ip6_tunnel` might be needed.
> #
>
> import os
> import subprocess
> import shutil
> from socket import *
>
> UDP_SEGMENT = 103
>
> cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
> script = b"""
> link set dev lo up
>
> link add name sink mtu 1540 type dummy
> addr add dev sink fd11::2/48 nodad
> link set dev sink up
>
> tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
> link set dev iptnl mtu 1500
> addr add dev iptnl fd00::2/48 nodad
> link set dev iptnl up
> """
> proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
> proc.communicate(input=script)
>
> os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
> os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")
>
> # Alternatively to hopopts:
> # os.system("ethtool -K iptnl tx-checksum-ip-generic off")
>
> hopopts = b"\x00" * 8
> s = socket(AF_INET6, SOCK_DGRAM)
> s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
> s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
> s.sendto(b"x" * 3000, ("fd00::1", 9))
> sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
> # tracer: function_graph
> #
> # CPU DURATION FUNCTION CALLS
> # | | | | | | |
> 16) | __skb_gso_segment() {
> 16) 0.288 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
> 16) 0.172 us | idle_cpu(); /* = 0x0 */
> 16) | skb_mac_gso_segment() {
> 16) 0.184 us | skb_network_protocol(); /* = 0xdd86 */
> 16) 0.161 us | __rcu_read_lock(); /* = 0x2 */
> 16) | ipv6_gso_segment() {
> 16) | rcu_read_lock_held() {
> 16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.514 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.152 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.151 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.459 us | } /* rcu_read_lock_held = 0x1 */
> 16) | udp6_ufo_fragment() {
> 16) 0.237 us | __udp_gso_segment(); /* = 0x0 */
> 16) 0.727 us | } /* udp6_ufo_fragment = 0x0 */
> 16) 3.049 us | } /* ipv6_gso_segment = 0x0 */
> 16) 0.171 us | __rcu_read_unlock(); /* = 0x1 */
> 16) 4.748 us | } /* skb_mac_gso_segment = 0x0 */
> 16) | skb_warn_bad_offload() {
> [...]
> 16) ! 785.215 us | } /* skb_warn_bad_offload = 0x0 */
> 16) ! 800.986 us | } /* __skb_gso_segment = 0x0 */
> 16) | __skb_gso_segment() {
> 16) 0.394 us | irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
> 16) 0.181 us | idle_cpu(); /* = 0x0 */
> 16) | skb_mac_gso_segment() {
> 16) 0.182 us | skb_network_protocol(); /* = 0xdd86 */
> 16) 0.178 us | __rcu_read_lock(); /* = 0x3 */
> 16) | ipv6_gso_segment() {
> 16) | rcu_read_lock_held() {
> 16) 0.155 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.556 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
> 16) | rcu_read_lock_held() {
> 16) 0.159 us | rcu_lockdep_current_cpu_online(); /* = 0x1 */
> 16) 0.480 us | } /* rcu_read_lock_held = 0x1 */
> 16) | ip6ip6_gso_segment() {
> 16) + 22.176 us | ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
> 16) + 24.875 us | } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
> 16) + 27.416 us | } /* ipv6_gso_segment = 0xffffa00c03018c00 */
> 16) 0.230 us | __rcu_read_unlock(); /* = 0x2 */
> 16) + 29.065 us | } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
> 16) + 32.828 us | } /* __skb_gso_segment = 0xffffa00c03018c00 */
> sh#
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output
2024-07-26 13:58 ` Willem de Bruijn
@ 2024-07-29 22:10 ` Jakub Sitnicki
0 siblings, 0 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2024-07-29 22:10 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Willem de Bruijn, kernel-team,
syzbot+e15b7e15b8a751a91d9a
On Fri, Jul 26, 2024 at 09:58 AM -04, Willem de Bruijn wrote:
> On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
>> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >>
>> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> >> doesn't support checksum offload. This was done to satisfy the offload
>> >> checks in the gso stack.
>> >>
>> >> However, when sending a UDP GSO packet from a tunnel device, we will go
>> >> through the TX path and the GSO offload twice. Once for the tunnel device,
>> >> which acts as a passthru for GSO packets, and once for the underlying
>> >> egress device.
>> >>
>> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> >> offload checks still happen on transmit from a tunnel device. So if the skb
>> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> >> warning from the gso stack.
>> >
>> > I don't entirely understand. The check should not hit on pass through,
>> > where segs == skb:
>> >
>> > if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
>> > !IS_ERR(segs)))
>> > skb_warn_bad_offload(skb);
>> >
>>
>> That's something I should have explained better. Let me try to shed some
>> light on it now. We're hitting the skb_warn_bad_offload warning because
>> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>>
>> And that's because we bail out early out of __udp_gso_segment when we
>> detect that the tunnel device is capable of tx-udp-segmentation
>> (GSO_UDP_L4):
>>
>> if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>> /* Packet is from an untrusted source, reset gso_segs. */
>> skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>> mss);
>> return NULL;
>> }
>
> Oh I see. Thanks.
>
>> It has not occurred to me before, but in the spirit of commit
>> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
>> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>>
>> if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
>> skb_warn_bad_offload(skb);
>
> That looks sensible to me. And nicer than the ip_summed conversion in
> udp_send_skb.
I've audited all existing ->gso_segment callbacks. skb_mac_gso_segment()
returns no segments, that is segs == NULL, if the callback chain ends
with either of these:
… → udp[46]_ufo_fragment → __udp_gso_segment → skb_gso_ok == true
… → tcp[46]_gso_segment → tcp_gso_segment → skb_gso_ok == true
… → sctp_gso_segment → skb_gso_ok == true
IOW when the device advertises that it can handle the desired GSO kind
(skb_gso_ok() returns true).
Considering that a device offering HW GSO and no checksum offload at the
same time makes no sense, I also think that tweaking the bad offload
detection to exclude the !segs case doesn't deprive us of diagnostics.
I will change to that in v2.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-07-29 22:10 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-25 9:55 [PATCH net 0/2] Fix bad offload warning when sending UDP GSO from a tunnel device Jakub Sitnicki
2024-07-25 9:55 ` [PATCH net 1/2] udp: Mark GSO packets as CHECKSUM_UNNECESSARY early on on output Jakub Sitnicki
2024-07-25 14:21 ` Willem de Bruijn
2024-07-26 11:23 ` Jakub Sitnicki
2024-07-26 13:58 ` Willem de Bruijn
2024-07-29 22:10 ` Jakub Sitnicki
2024-07-25 9:55 ` [PATCH net 2/2] selftests/net: Add coverage for UDP GSO with egress from tunnel Jakub Sitnicki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).