* [RFC] Big TCP and ping support vs. max ICMP{,v6} packet size
From: Petr Vorel @ 2024-08-19 12:49 UTC
To: Eric Dumazet, Xin Long; +Cc: netdev

Hi Eric, Xin,

I see you both worked on Big TCP support for IPv4/IPv6. I wonder whether anybody
has thought about adding Big TCP to raw sockets or ICMP datagram sockets. I'm not
sure what a real use case would be (due to the MTU limitation, Big TCP is mostly
used on local networks anyway).

I'm asking because I'm just about to limit the -s value for ping in iputils
(it sets the payload size of the ICMP{,v6} packet being sent) to 65507 (IPv4)
or 65527 (IPv6):

	65507 = 65535 (max IPv4 packet size) - 20 (min IPv4 header size) - 8 (ICMP header size)
	65527 = 65535 (max IPv6 payload length, excluding the 40-byte IPv6 header) - 8 (ICMPv6 header size)

which would then block using Big TCP.

The reasons are:

1) The implementation was wrong [1] (signed integer overflow when using
INT_MAX).

2) The kernel limits it to exactly these values:

* ICMP datagram socket: net/ipv4/ping.c in ping_common_sendmsg() [2] (used by
  both ping_v4_sendmsg() and ping_v6_sendmsg()):

	if (len > 0xFFFF)
		return -EMSGSIZE;

* Raw socket, IPv4: in raw_sendmsg() [3]:

	err = -EMSGSIZE;
	if (len > 0xFFFF)
		goto out;

* Raw socket, IPv6: I suppose either in rawv6_send_hdrinc() [4] (when
  IP_HDRINCL is set, i.e. when userspace also passes the IP header) or in
  ip6_append_data() [5] otherwise.

3) Other ping implementations also limit it [6] (I suppose due to 2)).

Kind regards,
Petr
[1] https://github.com/iputils/iputils/issues/542
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/ping.c?h=v6.11-rc4#n655
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv4/raw.c?h=v6.11-rc4#n498
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/raw.c?h=v6.11-rc4#n605
[5] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/ipv6/ip6_output.c?h=v6.11-rc4#n1453
[6] https://github.com/pevik/iputils/wiki/Maximum-value-for-%E2%80%90s-(size)
* Re: [RFC] Big TCP and ping support vs. max ICMP{,v6} packet size
From: Eric Dumazet @ 2024-08-19 12:56 UTC
To: Petr Vorel; +Cc: Xin Long, netdev

On Mon, Aug 19, 2024 at 2:50 PM Petr Vorel <pvorel@suse.cz> wrote:
>
> Hi Eric, Xin,
>
> I see you both worked on Big TCP support for IPv4/IPv6. I wonder if anybody was
> thinking about add Big TCP to raw socket or ICMP datagram socket. I'm not sure
> what would be a real use case (due MTU limitation is Big TCP mostly used on
> local networks anyway).

I think you are mistaken.

BIG TCP does not have any MTU restrictions and can be used on any network.

Think about BIG TCP being GSO/TSO/GRO with bigger logical packet sizes.

>
> I'm asking because I'm just about to limit -s value for ping in iputils (this
> influences size of payload of ICMP{,v6} being send) to 65507 (IPv4) or 65527 (IPv6):
>
> 65507 = 65535 (IPv4 packet size) - 20 (min IPv4 header size) - 8 (ICMP header size)
> 65527 = 65535 (IPv6 packet size) - 8 (ICMPv6 header size)

This would involve IP fragmentation, this is orthogonal to GSO/GRO.

>
> which would then block using Big TCP.
[...]
* Re: [RFC] Big TCP and ping support vs. max ICMP{,v6} packet size
From: Petr Vorel @ 2024-08-20 15:38 UTC
To: Eric Dumazet; +Cc: Xin Long, netdev

Hi Eric,

> On Mon, Aug 19, 2024 at 2:50 PM Petr Vorel <pvorel@suse.cz> wrote:
> > Hi Eric, Xin,

> > I see you both worked on Big TCP support for IPv4/IPv6. I wonder if anybody was
> > thinking about add Big TCP to raw socket or ICMP datagram socket. I'm not sure
> > what would be a real use case (due MTU limitation is Big TCP mostly used on
> > local networks anyway).

> I think you are mistaken.

> BIG TCP does not have any MTU restrictions and can be used on any network.

> Think about BIG TCP being GSO/TSO/GRO with bigger logical packet sizes.

First, thanks for the quick info. I need to study BIG TCP more. I was wondering
whether it could be used for sending ICMP echo requests larger than 65k, as is
possible on FreeBSD, where it's done via Jumbograms [1]:

ping -6 -b 70000 -s 68000 ::1

> > I'm asking because I'm just about to limit -s value for ping in iputils (this
> > influences size of payload of ICMP{,v6} being send) to 65507 (IPv4) or 65527 (IPv6):

> > 65507 = 65535 (IPv4 packet size) - 20 (min IPv4 header size) - 8 (ICMP header size)
> > 65527 = 65535 (IPv6 packet size) - 8 (ICMPv6 header size)

> This would involve IP fragmentation, this is orthogonal to GSO/GRO.

But now I'm not sure: GSO/TSO/GRO are in NIC drivers, whereas this change would
be needed in raw sockets and/or ICMP datagram sockets (net/ipv[46]/{raw,ping}.c).

Also, from RFC 8504, Appendix A, item 15 [2], I understood that Jumbograms are
no longer relevant (on FreeBSD they are used only for loopback):

    15. Removed Jumbograms (RFC 2675) as they aren't deployed.

I guess that's why BIG TCP was created: to have real support anywhere.

Kind regards,
Petr

[1] https://docs.freebsd.org/en/books/developers-handbook/ipv6/#ipv6-jumbo
[2] https://datatracker.ietf.org/doc/html/rfc8504#appendix-A
* Re: [RFC] Big TCP and ping support vs. max ICMP{,v6} packet size
From: Eric Dumazet @ 2024-08-20 18:35 UTC
To: Petr Vorel; +Cc: Xin Long, netdev

On Tue, Aug 20, 2024 at 5:38 PM Petr Vorel <pvorel@suse.cz> wrote:
>
> Hi Eric,
>
[...]
> First, thanks for a quick info. I need to study more BIG TCP. Because I was
> wondering if this could be used for sending larger ICMP echo requests > 65k
> as it's possible in FreeBSD, where it's done via Jumbograms [1]:
>
> ping -6 -b 70000 -s 68000 ::1

I guess ip6_append_data() is a bit conservative and uses IPV6_MAXPLEN
while it should not ;)

Also ping needs to add the jumboheader if/when using RAW6 sockets.

With the following patch, the following commands send big packets just fine:

ifconfig lo mtu 90000
ping -s 68000 ::1

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ab504d31f0cdd8dec9ab01bf9d6e6517307609cd..6b1668e037dae3c88052c50f02f319355baf4304 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1473,7 +1473,7 @@ static int __ip6_append_data(struct sock *sk,
 	}
 
 	if (ip6_sk_ignore_df(sk))
-		maxnonfragsize = sizeof(struct ipv6hdr) + IPV6_MAXPLEN;
+		maxnonfragsize = max_t(u32, mtu, sizeof(struct ipv6hdr) + IPV6_MAXPLEN);
 	else
 		maxnonfragsize = mtu;
 
* Re: [RFC] Big TCP and ping support vs. max ICMP{,v6} packet size
From: Petr Vorel @ 2024-08-21 20:12 UTC
To: Eric Dumazet; +Cc: Xin Long, netdev

Hi Eric, Xin,

> On Tue, Aug 20, 2024 at 5:38 PM Petr Vorel <pvorel@suse.cz> wrote:
[...]
> > ping -6 -b 70000 -s 68000 ::1

> I guess ip6_append_data() is a bit conservative and uses IPV6_MAXPLEN
> while it should not ;)

> Also ping needs to add the jumboheader if/when using RAW6 sockets

At first I thought you meant modifying the kernel's net/ipv6/raw.c and
net/ipv6/icmp.c (+ net/ipv4/ping.c for the ICMP datagram socket), i.e. creating
"Big RAW" and "Big UDP" (maybe the modification could be just in
net/ipv6/icmp.c for both socket types). But thinking about it twice, you may
mean modifying userspace ping to add the jumbo header.

> With the following patch, the following commands sends big packets just fine

> ifconfig lo mtu 90000
> ping -s 68000 ::1

Yes, with the above patch it looks possible to send a bigger packet: it goes
from userspace into the kernel, but gets broken there.

From what I observed for 65528 (the first value that exceeds the limit) on a
raw socket (net/ipv6/raw.c, net/ipv6/ip6_output.c): rawv6_sendmsg() calls
ip6_append_data(), and after that, somewhere in the 3rd pskb_pull() call,
skb->data_len (unsigned int) changes from 65528 to 0 and skb->len from 65576 to
40 (the IP header). The checksum also fails (likely due to this).

The ICMP datagram socket path starts in net/ipv[46]/ping.c, but
ping_v6_sendmsg() also calls ip6_append_data() and suffers from the same
problem. Of course, I also needed to comment out the check in
ping_common_sendmsg():

	if (len > 0xFFFF)
		return -EMSGSIZE;

I'm obviously missing something.

Kind regards,
Petr
Thread overview: 5+ messages
2024-08-19 12:49 [RFC] Big TCP and ping support vs. max ICMP{,v6} packet size Petr Vorel
2024-08-19 12:56 ` Eric Dumazet
2024-08-20 15:38 ` Petr Vorel
2024-08-20 18:35 ` Eric Dumazet
2024-08-21 20:12 ` Petr Vorel