* EIO on send with UDP_SEGMENT
From: Jakub Sitnicki @ 2023-11-08 10:58 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: netdev, kernel-team
Hi Willem et al,
We have hit the EIO error path in udp_send_skb introduced in commit bec1f6f69736
("udp: generate gso with UDP_SEGMENT") [0]:
    if (skb->ip_summed != CHECKSUM_PARTIAL || ...) {
            kfree_skb(skb);
            return -EIO;
    }
... when attempting to send a GSO packet, using the UDP_SEGMENT option,
over a TUN device that has no offloads enabled (the default case).
A trivial reproducer for that would be:
ip tuntap add dev tun0 mode tun
ip addr add dev tun0 192.0.2.1/24
ip link set dev tun0 up
strace -e %net python -c '
from socket import *
s = socket(AF_INET, SOCK_DGRAM)
s.setsockopt(SOL_UDP, 103, 1200)  # 103 is UDP_SEGMENT; 1200 is the segment size in bytes
s.sendto(b"x" * 3000, ("192.0.2.2", 9))
'
which yields:
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC, IPPROTO_IP) = 3
setsockopt(3, SOL_UDP, UDP_SEGMENT, [1200], 4) = 0
sendto(3, "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"..., 3000, 0, {sa_family=AF_INET, sin_port=htons(9), sin_addr=inet_addr("192.0.2.2")}, 16) = -1 EIO (Input/output error)
This came as a surprise and caused us some pain. I think it comes down
to this: anyone using UDP_SEGMENT has to implement a segmentation
fallback in user-space, just to be on the safe side. We can't really
assume that any TUN/TAP interface which happens to be our egress
device has at least checksum offload enabled and implemented.
Which is not ideal, so it made us wonder whether anything can be done
about it.
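For illustration, a user-space fallback of the kind described above might
look roughly like the sketch below. It is a minimal example, not code from
our stack: the helper names split_segments() and send_segmented() are
hypothetical, and it assumes that writing 0 to UDP_SEGMENT turns GSO back
off for the socket.

```python
import errno
import socket

UDP_SEGMENT = 103  # value from linux/udp.h, as used in the reproducer


def split_segments(payload, gso_size):
    """Split payload into gso_size chunks; the last one may be shorter."""
    if gso_size <= 0:
        raise ValueError("gso_size must be positive")
    return [payload[i:i + gso_size] for i in range(0, len(payload), gso_size)]


def send_segmented(sock, payload, addr, gso_size):
    """Try a single GSO send; on EIO fall back to one sendto() per segment."""
    try:
        sock.setsockopt(socket.SOL_UDP, UDP_SEGMENT, gso_size)
        return sock.sendto(payload, addr)
    except OSError as exc:
        if exc.errno != errno.EIO:
            raise  # only the EIO path is handled here
    # Fallback: disable GSO on the socket (assumption: 0 means "off")
    # and do the segmentation in user space instead.
    sock.setsockopt(socket.SOL_UDP, UDP_SEGMENT, 0)
    return sum(sock.sendto(seg, addr) for seg in split_segments(payload, gso_size))
```

With the 3000-byte payload and 1200-byte segment size from the reproducer,
the fallback path sends three datagrams of 1200, 1200 and 600 bytes.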
As it turns out, skb_segment() in the GSO path implements a software
fallback not only for segmentation but also for checksumming [1].
What is more, when we removed the skb->ip_summed == CHECKSUM_PARTIAL
requirement from udp_send_skb, as an experiment, we were able to observe
fully checksummed segments in a packet capture.
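To make "fully checksummed" concrete: each captured UDP datagram should
carry a non-zero checksum that verifies against the IPv4 pseudo-header.
A minimal sketch of such a check follows. It is not the tool we used;
the function names are hypothetical, and build_udp() exists only to
produce a datagram to check.

```python
import socket
import struct


def ones_complement_sum(data):
    """16-bit ones' complement sum (RFC 1071), with carries folded back in."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip, dst_ip, udp_len):
    """IPv4 pseudo-header that the UDP checksum covers."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_len))


def build_udp(src_ip, dst_ip, sport, dport, payload):
    """Build a UDP header plus payload with the checksum filled in."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length) + header + payload)
    csum = (~total & 0xFFFF) or 0xFFFF  # a computed 0 is transmitted as 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, csum) + payload


def udp_checksum_ok(src_ip, dst_ip, datagram):
    """True if the datagram carries a checksum and that checksum verifies."""
    if struct.unpack("!H", datagram[6:8])[0] == 0:
        return False  # a zero field means "no checksum" for UDP over IPv4
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram)
    return total == 0xFFFF
```

Feeding it the UDP header and payload of each segment from the capture,
together with the IP addresses, is enough to tell checksummed segments
from ones that were sent without a valid checksum.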
Which brings me to my question -
Do you think the restriction in udp_send_skb can be lifted or tweaked?
Thanks,
Jakub
[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bec1f6f697362c5bc635dacd7ac8499d0a10a4e7
[1] https://elixir.bootlin.com/linux/v6.6/source/net/core/skbuff.c#L4626
* Re: EIO on send with UDP_SEGMENT
From: Willem de Bruijn @ 2023-11-08 15:10 UTC (permalink / raw)
To: Jakub Sitnicki; +Cc: Willem de Bruijn, netdev, kernel-team
On Wed, Nov 8, 2023 at 6:03 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
[...]
> Do you think the restriction in udp_send_skb can be lifted or tweaked?
The argument against has been that segmentation offload offers no
performance benefit if the stack has to fall back onto software
checksumming.
If this limitation makes userspace code more complex, by having to
branch between using segmentation offload and not, depending on device
features, that would be an argument to drop it. As you point out, it
is not needed for correctness.
* Re: EIO on send with UDP_SEGMENT
From: Jakub Sitnicki @ 2023-11-08 17:55 UTC (permalink / raw)
To: Willem de Bruijn; +Cc: Willem de Bruijn, netdev, kernel-team
On Wed, Nov 08, 2023 at 10:10 AM -05, Willem de Bruijn wrote:
> On Wed, Nov 8, 2023 at 6:03 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
[...]
>> Do you think the restriction in udp_send_skb can be lifted or tweaked?
>
> The argument against has been that segmentation offload offers no
> performance benefit if the stack has to fall back onto software
> checksumming.
Interesting. Thanks for sharing the context. I must admit, it would not
have been my first guess that software GSO+checksum by itself is not
worth it, despite it happening late on the TX path.
> If this limitation makes userspace code more complex, by having to
> branch between segmentation offload and not depending on device
> features, that would be an argument to drop it. As you point out, it
> is not needed for correctness.
That answers my question. Thanks for the feedback.
* Re: EIO on send with UDP_SEGMENT
From: Willem de Bruijn @ 2023-11-08 19:11 UTC (permalink / raw)
To: Jakub Sitnicki; +Cc: Willem de Bruijn, netdev, kernel-team
On Wed, Nov 8, 2023 at 1:08 PM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Wed, Nov 08, 2023 at 10:10 AM -05, Willem de Bruijn wrote:
> > On Wed, Nov 8, 2023 at 6:03 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> [...]
>
> >> Do you think the restriction in udp_send_skb can be lifted or tweaked?
> >
> > The argument against has been that segmentation offload offers no
> > performance benefit if the stack has to fall back onto software
> > checksumming.
>
> Interesting. Thanks for sharing the context. Must admit, it would have
> not been my first guess that the software GSO+checksum itself is not
> worth it. Despite it happening late on the TX path.
The heuristic is that checksum during copy_from_user is cheap, while
checksum after qdisc dequeue might have to read cold memory.
There will be cases where the data is warm. So YMMV. But that is the
basis for the choice.