* [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload
@ 2024-06-22 16:14 Jakub Sitnicki
2024-06-22 16:14 ` [PATCH net 1/2] udp: Allow GSO transmit from devices with no checksum offload Jakub Sitnicki
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2024-06-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, kernel-team
This is a follow-up to an earlier question [1] if we can make UDP GSO work with
any egress device, even those with no checksum offload capability. That's the
default setup for TUN/TAP.
I leave it to the maintainers to decide if it qualifies as a fix. We plan to
backport it to our v6.6 either way, hence the submission to -net.
[1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Jakub Sitnicki (2):
udp: Allow GSO transmit from devices with no checksum offload
selftests/net: Add test coverage for UDP GSO software fallback
net/ipv4/udp.c | 3 +--
net/ipv4/udp_offload.c | 8 +++++++
net/ipv6/udp.c | 3 +--
tools/testing/selftests/net/udpgso.c | 15 +++++++++---
tools/testing/selftests/net/udpgso.sh | 43 +++++++++++++++++++++++++++++++++++
5 files changed, 65 insertions(+), 7 deletions(-)
^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH net 1/2] udp: Allow GSO transmit from devices with no checksum offload
2024-06-22 16:14 [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Jakub Sitnicki
@ 2024-06-22 16:14 ` Jakub Sitnicki
2024-06-22 16:14 ` [PATCH net 2/2] selftests/net: Add test coverage for UDP GSO software fallback Jakub Sitnicki
2024-06-23 8:18 ` [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Willem de Bruijn
2 siblings, 0 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2024-06-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, kernel-team
Today sending a UDP GSO packet from a TUN device results in an EIO error:
import fcntl, os, struct
from socket import *
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001
IFF_NO_PI = 0x1000
UDP_SEGMENT = 103
tun_fd = os.open("/dev/net/tun", os.O_RDWR)
ifr = struct.pack("16sH", b"tun0", IFF_TUN | IFF_NO_PI)
fcntl.ioctl(tun_fd, TUNSETIFF, ifr)
os.system("ip addr add 192.0.2.1/24 dev tun0")
os.system("ip link set dev tun0 up")
s = socket(AF_INET, SOCK_DGRAM)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 1200)
s.sendto(b"x" * 3000, ("192.0.2.2", 9)) # EIO
This is due to a check in the udp stack if the egress device offers
checksum offload. TUN/TAP devices, by default, don't advertise this
capability because it requires support from the TUN/TAP reader.
However, the GSO stack has a software fallback for checksum calculation,
which we can use. This way we don't force UDP_SEGMENT users to handle the
EIO error and implement a segmentation fallback. Even if it means a
potentially worse performance (see discussion in [1]).
Lift the restriction so that user-space can use UDP_SEGMENT with any egress
device. We also need to adjust the UDP GSO code to match the GSO stack
expectation about ip_summed field, as set in commit 8d63bee643f1 ("net:
avoid skb_warn_bad_offload false positives on UFO"). Otherwise we will hit
the bad offload check.
[1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/
Fixes: bec1f6f69736 ("udp: generate gso with UDP_SEGMENT")
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
net/ipv4/udp.c | 3 +--
net/ipv4/udp_offload.c | 8 ++++++++
net/ipv6/udp.c | 3 +--
3 files changed, 10 insertions(+), 4 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 189c9113fe9a..8d542e9f4a93 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -938,8 +938,7 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
kfree_skb(skb);
return -EINVAL;
}
- if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
- dst_xfrm(skb_dst(skb))) {
+ if (is_udplite || dst_xfrm(skb_dst(skb))) {
kfree_skb(skb);
return -EIO;
}
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 59448a2dbf2c..aa2e0a28ca61 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -357,6 +357,14 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
else
uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
+ /* On the TX path, CHECKSUM_NONE and CHECKSUM_UNNECESSARY have the same
+ * meaning. However, check for bad offloads in the GSO stack expects the
+ * latter, if the checksum was calculated in software. To vouch for the
+ * segment skbs we actually need to set it on the gso_skb.
+ */
+ if (gso_skb->ip_summed == CHECKSUM_NONE)
+ gso_skb->ip_summed = CHECKSUM_UNNECESSARY;
+
/* update refcount for the packet */
if (copy_dtor) {
int delta = sum_truesize - gso_skb->truesize;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index c81a07ac0463..ab4d0332488e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1257,8 +1257,7 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
kfree_skb(skb);
return -EINVAL;
}
- if (skb->ip_summed != CHECKSUM_PARTIAL || is_udplite ||
- dst_xfrm(skb_dst(skb))) {
+ if (is_udplite || dst_xfrm(skb_dst(skb))) {
kfree_skb(skb);
return -EIO;
}
--
2.40.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* [PATCH net 2/2] selftests/net: Add test coverage for UDP GSO software fallback
2024-06-22 16:14 [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Jakub Sitnicki
2024-06-22 16:14 ` [PATCH net 1/2] udp: Allow GSO transmit from devices with no checksum offload Jakub Sitnicki
@ 2024-06-22 16:14 ` Jakub Sitnicki
2024-06-23 2:27 ` Jakub Kicinski
2024-06-23 8:18 ` [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Willem de Bruijn
2 siblings, 1 reply; 7+ messages in thread
From: Jakub Sitnicki @ 2024-06-22 16:14 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, kernel-team
Extend the existing test to exercise UDP GSO egress through devices with
various offload capabilities, including lack of checksum offload, which is
the default case for TUN/TAP devices.
Test against a dummy device because it is simpler to set up then TUN/TAP.
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
tools/testing/selftests/net/udpgso.c | 15 +++++++++---
tools/testing/selftests/net/udpgso.sh | 43 +++++++++++++++++++++++++++++++++++
2 files changed, 55 insertions(+), 3 deletions(-)
diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
index 85b3baa3f7f3..3e74cfa1a2bf 100644
--- a/tools/testing/selftests/net/udpgso.c
+++ b/tools/testing/selftests/net/udpgso.c
@@ -53,6 +53,7 @@ static bool cfg_do_ipv6;
static bool cfg_do_connected;
static bool cfg_do_connectionless;
static bool cfg_do_msgmore;
+static bool cfg_do_recv = true;
static bool cfg_do_setsockopt;
static int cfg_specific_test_id = -1;
@@ -414,6 +415,9 @@ static void run_one(struct testcase *test, int fdt, int fdr,
if (!sent)
return;
+ if (!cfg_do_recv)
+ return;
+
if (test->gso_len)
mss = test->gso_len;
else
@@ -464,8 +468,10 @@ static void run_test(struct sockaddr *addr, socklen_t alen)
if (fdr == -1)
error(1, errno, "socket r");
- if (bind(fdr, addr, alen))
- error(1, errno, "bind");
+ if (cfg_do_recv) {
+ if (bind(fdr, addr, alen))
+ error(1, errno, "bind");
+ }
/* Have tests fail quickly instead of hang */
if (setsockopt(fdr, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
@@ -524,7 +530,7 @@ static void parse_opts(int argc, char **argv)
{
int c;
- while ((c = getopt(argc, argv, "46cCmst:")) != -1) {
+ while ((c = getopt(argc, argv, "46cCmRst:")) != -1) {
switch (c) {
case '4':
cfg_do_ipv4 = true;
@@ -541,6 +547,9 @@ static void parse_opts(int argc, char **argv)
case 'm':
cfg_do_msgmore = true;
break;
+ case 'R':
+ cfg_do_recv = false;
+ break;
case 's':
cfg_do_setsockopt = true;
break;
diff --git a/tools/testing/selftests/net/udpgso.sh b/tools/testing/selftests/net/udpgso.sh
index 6c63178086b0..7aad16750686 100755
--- a/tools/testing/selftests/net/udpgso.sh
+++ b/tools/testing/selftests/net/udpgso.sh
@@ -27,6 +27,31 @@ test_route_mtu() {
ip route add local fd00::1/128 table local dev lo mtu 1500
}
+setup_dummy_sink() {
+ ip link add name sink type dummy mtu 1500
+ ip addr add dev sink 10.0.0.0/24
+ ip addr add dev sink fd00::2/64 nodad
+ ip link set dev sink up
+}
+
+test_hw_gso_hw_csum() {
+ setup_dummy_sink
+ ethtool -K sink tx-checksum-ip-generic on >/dev/null
+ ethtool -K sink tx-udp-segmentation on >/dev/null
+}
+
+test_sw_gso_hw_csum() {
+ setup_dummy_sink
+ ethtool -K sink tx-checksum-ip-generic on >/dev/null
+ ethtool -K sink tx-udp-segmentation off >/dev/null
+}
+
+test_sw_gso_sw_csum() {
+ setup_dummy_sink
+ ethtool -K sink tx-checksum-ip-generic off >/dev/null
+ ethtool -K sink tx-udp-segmentation off >/dev/null
+}
+
if [ "$#" -gt 0 ]; then
"$1"
shift 2 # pop "test_*" arg and "--" delimiter
@@ -56,3 +81,21 @@ echo "ipv4 msg_more"
echo "ipv6 msg_more"
./in_netns.sh "$0" test_dev_mtu -- ./udpgso -6 -C -m
+
+echo "ipv4 hw-gso hw-csum"
+./in_netns.sh "$0" test_hw_gso_hw_csum -- ./udpgso -4 -C -R
+
+echo "ipv6 hw-gso hw-csum"
+./in_netns.sh "$0" test_hw_gso_hw_csum -- ./udpgso -6 -C -R
+
+echo "ipv4 sw-gso hw-csum"
+./in_netns.sh "$0" test_sw_gso_hw_csum -- ./udpgso -4 -C -R
+
+echo "ipv6 sw-gso hw-csum"
+./in_netns.sh "$0" test_sw_gso_hw_csum -- ./udpgso -6 -C -R
+
+echo "ipv4 sw-gso sw-csum"
+./in_netns.sh "$0" test_sw_gso_sw_csum -- ./udpgso -4 -C -R
+
+echo "ipv6 sw-gso sw-csum"
+./in_netns.sh "$0" test_sw_gso_sw_csum -- ./udpgso -6 -C -R
--
2.40.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH net 2/2] selftests/net: Add test coverage for UDP GSO software fallback
2024-06-22 16:14 ` [PATCH net 2/2] selftests/net: Add test coverage for UDP GSO software fallback Jakub Sitnicki
@ 2024-06-23 2:27 ` Jakub Kicinski
2024-06-24 10:20 ` Jakub Sitnicki
0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2024-06-23 2:27 UTC (permalink / raw)
To: Jakub Sitnicki
Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, kernel-team
On Sat, 22 Jun 2024 18:14:44 +0200 Jakub Sitnicki wrote:
> + ip link add name sink type dummy mtu 1500
Looks like this doesn't make CI happy:
https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/651680/59-udpgso-sh/stdout
I'm guessing "mtu 1500" needs to go before "type dummy"?
iproute2 hands over arguments after type to type specific handler.
--
pw-bot: cr
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload
2024-06-22 16:14 [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Jakub Sitnicki
2024-06-22 16:14 ` [PATCH net 1/2] udp: Allow GSO transmit from devices with no checksum offload Jakub Sitnicki
2024-06-22 16:14 ` [PATCH net 2/2] selftests/net: Add test coverage for UDP GSO software fallback Jakub Sitnicki
@ 2024-06-23 8:18 ` Willem de Bruijn
2024-06-24 11:29 ` Jakub Sitnicki
2 siblings, 1 reply; 7+ messages in thread
From: Willem de Bruijn @ 2024-06-23 8:18 UTC (permalink / raw)
To: Jakub Sitnicki, netdev
Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Willem de Bruijn, kernel-team
Jakub Sitnicki wrote:
> This is a follow-up to an earlier question [1] if we can make UDP GSO work with
> any egress device, even those with no checksum offload capability. That's the
> default setup for TUN/TAP.
>
> I leave it to the maintainers to decide if it qualifies as a fix. We plan to
> backport it to our v6.6 either way, hence the submission to -net.
>
> [1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/
Agreed with the change to allow UDP_SEGMENT to work regardless of
device capabilities.
In my opinion this is a new feature with sufficient risk of unintended
side effects to be net-next material.
Maybe worth recording in patch 1 the reason for the original check:
that UDP_SEGMENT with software checksumming in the GSO stack may be
a regression vs copy_and_checksum in the send syscall.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
> Jakub Sitnicki (2):
> udp: Allow GSO transmit from devices with no checksum offload
> selftests/net: Add test coverage for UDP GSO software fallback
>
> net/ipv4/udp.c | 3 +--
> net/ipv4/udp_offload.c | 8 +++++++
> net/ipv6/udp.c | 3 +--
> tools/testing/selftests/net/udpgso.c | 15 +++++++++---
> tools/testing/selftests/net/udpgso.sh | 43 +++++++++++++++++++++++++++++++++++
> 5 files changed, 65 insertions(+), 7 deletions(-)
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 2/2] selftests/net: Add test coverage for UDP GSO software fallback
2024-06-23 2:27 ` Jakub Kicinski
@ 2024-06-24 10:20 ` Jakub Sitnicki
0 siblings, 0 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2024-06-24 10:20 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni,
Willem de Bruijn, kernel-team
On Sat, Jun 22, 2024 at 07:27 PM -07, Jakub Kicinski wrote:
> On Sat, 22 Jun 2024 18:14:44 +0200 Jakub Sitnicki wrote:
>> + ip link add name sink type dummy mtu 1500
>
> Looks like this doesn't make CI happy:
>
> https://netdev-3.bots.linux.dev/vmksft-net-dbg/results/651680/59-udpgso-sh/stdout
>
> I'm guessing "mtu 1500" needs to go before "type dummy"?
> iproute2 hands over arguments after type to type specific handler.
"ip link" got stricter, I see. Been testing against an old one :/
# ip -V
ip utility, iproute2-6.1.0, libbpf 1.2.2
Will rearrange. Thanks for the heads-up.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload
2024-06-23 8:18 ` [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Willem de Bruijn
@ 2024-06-24 11:29 ` Jakub Sitnicki
0 siblings, 0 replies; 7+ messages in thread
From: Jakub Sitnicki @ 2024-06-24 11:29 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Willem de Bruijn, kernel-team
On Sun, Jun 23, 2024 at 04:18 AM -04, Willem de Bruijn wrote:
> Jakub Sitnicki wrote:
>> This is a follow-up to an earlier question [1] if we can make UDP GSO work with
>> any egress device, even those with no checksum offload capability. That's the
>> default setup for TUN/TAP.
>>
>> I leave it to the maintainers to decide if it qualifies as a fix. We plan to
>> backport it to our v6.6 either way, hence the submission to -net.
>>
>> [1] https://lore.kernel.org/netdev/87jzqsld6q.fsf@cloudflare.com/
>
> Agreed with the change to allow UDP_SEGMENT to work regardless of
> device capabilities.
Thanks for the stamp.
> In my opinion this is a new feature with sufficient risk of unintended
> side effects to be net-next material.
OK, I will resubmit for net-next.
> Maybe worth recording in patch 1 the reason for the original check:
> that UDP_SEGMENT with software checksumming in the GSO stack may be
> a regression vs copy_and_checksum in the send syscall.
Will expand the description.
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-06-24 11:29 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-22 16:14 [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Jakub Sitnicki
2024-06-22 16:14 ` [PATCH net 1/2] udp: Allow GSO transmit from devices with no checksum offload Jakub Sitnicki
2024-06-22 16:14 ` [PATCH net 2/2] selftests/net: Add test coverage for UDP GSO software fallback Jakub Sitnicki
2024-06-23 2:27 ` Jakub Kicinski
2024-06-24 10:20 ` Jakub Sitnicki
2024-06-23 8:18 ` [PATCH net 0/2] Lift UDP_SEGMENT restriction for egress via device w/o csum offload Willem de Bruijn
2024-06-24 11:29 ` Jakub Sitnicki
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).