* [PATCH v2 net] udp: gso: do not drop small packets when PMTU reduces
@ 2025-01-30 7:55 Yan Zhai
2025-01-30 15:35 ` Willem de Bruijn
0 siblings, 1 reply; 3+ messages in thread
From: Yan Zhai @ 2025-01-30 7:55 UTC (permalink / raw)
To: netdev
Cc: Willem de Bruijn, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan, Josh Hunt,
Alexander Duyck, netdev, linux-kernel, linux-kselftest,
kernel-team
Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
for small packets. But the kernel currently dismisses GSO requests only
after checking MTU/PMTU on gso_size. This means any packets, regardless
of their payload sizes, could be dropped when PMTU becomes smaller than
requested gso_size. We encountered this issue in production and it
caused a reliability problem that new QUIC connection cannot be
established before PMTU cache expired, while non GSO sockets still
worked fine at the same time.
Ideally, do not check any GSO related constraints when payload size is
smaller than requested gso_size, and return EMSGSIZE instead of EINVAL
on MTU/PMTU check failure to be more specific on the error cause.
Fixes: 4094871db1d6 ("udp: only do GSO if # of segs > 1")
Signed-off-by: Yan Zhai <yan@cloudflare.com>
--
v1->v2: add a missing MTU check when fall back to no GSO mode suggested
by Willem de Bruijn <willemdebruijn.kernel@gmail.com>; Fixed up commit
message to be more precise.
v1: https://lore.kernel.org/all/Z5cgWh%2F6bRQm9vVU@debian.debian/
---
net/ipv4/udp.c | 28 +++++++++++++++++++---------
net/ipv6/udp.c | 28 +++++++++++++++++++---------
tools/testing/selftests/net/udpgso.c | 14 ++++++++++++++
3 files changed, 52 insertions(+), 18 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index c472c9a57cf6..0b5010238d05 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1141,9 +1141,20 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
const int hlen = skb_network_header_len(skb) +
sizeof(struct udphdr);
+ if (datalen <= cork->gso_size) {
+ /*
+ * check MTU again: it's skipped previously when
+ * gso_size != 0
+ */
+ if (hlen + datalen > cork->fragsize) {
+ kfree_skb(skb);
+ return -EMSGSIZE;
+ }
+ goto no_gso;
+ }
if (hlen + cork->gso_size > cork->fragsize) {
kfree_skb(skb);
- return -EINVAL;
+ return -EMSGSIZE;
}
if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
kfree_skb(skb);
@@ -1158,17 +1169,16 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
return -EIO;
}
- if (datalen > cork->gso_size) {
- skb_shinfo(skb)->gso_size = cork->gso_size;
- skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
- skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
- cork->gso_size);
+ skb_shinfo(skb)->gso_size = cork->gso_size;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
+ cork->gso_size);
- /* Don't checksum the payload, skb will get segmented */
- goto csum_partial;
- }
+ /* Don't checksum the payload, skb will get segmented */
+ goto csum_partial;
}
+no_gso:
if (is_udplite) /* UDP-Lite */
csum = udplite_csum(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6671daa67f4f..d97befa7f80d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1389,9 +1389,20 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
const int hlen = skb_network_header_len(skb) +
sizeof(struct udphdr);
+ if (datalen <= cork->gso_size) {
+ /*
+ * check MTU again: it's skipped previously when
+ * gso_size != 0
+ */
+ if (hlen + datalen > cork->fragsize) {
+ kfree_skb(skb);
+ return -EMSGSIZE;
+ }
+ goto no_gso;
+ }
if (hlen + cork->gso_size > cork->fragsize) {
kfree_skb(skb);
- return -EINVAL;
+ return -EMSGSIZE;
}
if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
kfree_skb(skb);
@@ -1406,17 +1417,16 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
return -EIO;
}
- if (datalen > cork->gso_size) {
- skb_shinfo(skb)->gso_size = cork->gso_size;
- skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
- skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
- cork->gso_size);
+ skb_shinfo(skb)->gso_size = cork->gso_size;
+ skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
+ skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
+ cork->gso_size);
- /* Don't checksum the payload, skb will get segmented */
- goto csum_partial;
- }
+ /* Don't checksum the payload, skb will get segmented */
+ goto csum_partial;
}
+no_gso:
if (is_udplite)
csum = udplite_csum(skb);
else if (udp_get_no_check6_tx(sk)) { /* UDP csum disabled */
diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
index 3f2fca02fec5..fb73f1c331fb 100644
--- a/tools/testing/selftests/net/udpgso.c
+++ b/tools/testing/selftests/net/udpgso.c
@@ -102,6 +102,13 @@ struct testcase testcases_v4[] = {
.gso_len = CONST_MSS_V4,
.r_num_mss = 1,
},
+ {
+ /* datalen <= MSS < gso_len: will fall back to no GSO */
+ .tlen = CONST_MSS_V4,
+ .gso_len = CONST_MSS_V4 + 1,
+ .r_num_mss = 0,
+ .r_len_last = CONST_MSS_V4,
+ },
{
/* send a single MSS + 1B */
.tlen = CONST_MSS_V4 + 1,
@@ -205,6 +212,13 @@ struct testcase testcases_v6[] = {
.gso_len = CONST_MSS_V6,
.r_num_mss = 1,
},
+ {
+ /* datalen <= MSS < gso_len: will fall back to no GSO */
+ .tlen = CONST_MSS_V6,
+ .gso_len = CONST_MSS_V6 + 1,
+ .r_num_mss = 0,
+ .r_len_last = CONST_MSS_V6,
+ },
{
/* send a single MSS + 1B */
.tlen = CONST_MSS_V6 + 1,
--
2.30.2
^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH v2 net] udp: gso: do not drop small packets when PMTU reduces
2025-01-30 7:55 [PATCH v2 net] udp: gso: do not drop small packets when PMTU reduces Yan Zhai
@ 2025-01-30 15:35 ` Willem de Bruijn
2025-01-30 18:34 ` Yan Zhai
0 siblings, 1 reply; 3+ messages in thread
From: Willem de Bruijn @ 2025-01-30 15:35 UTC (permalink / raw)
To: Yan Zhai, netdev
Cc: Willem de Bruijn, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan, Josh Hunt,
Alexander Duyck, netdev, linux-kernel, linux-kselftest,
kernel-team
Yan Zhai wrote:
> Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
> for small packets. But the kernel currently dismisses GSO requests only
> after checking MTU/PMTU on gso_size. This means any packets, regardless
> of their payload sizes, could be dropped when PMTU becomes smaller than
> requested gso_size. We encountered this issue in production and it
> caused a reliability problem that new QUIC connection cannot be
> established before PMTU cache expired, while non GSO sockets still
> worked fine at the same time.
>
> Ideally, do not check any GSO related constraints when payload size is
> smaller than requested gso_size, and return EMSGSIZE instead of EINVAL
> on MTU/PMTU check failure to be more specific on the error cause.
>
> Fixes: 4094871db1d6 ("udp: only do GSO if # of segs > 1")
> Signed-off-by: Yan Zhai <yan@cloudflare.com>
> --
> v1->v2: add a missing MTU check when fall back to no GSO mode suggested
> by Willem de Bruijn <willemdebruijn.kernel@gmail.com>; Fixed up commit
> message to be more precise.
>
> v1: https://lore.kernel.org/all/Z5cgWh%2F6bRQm9vVU@debian.debian/
> ---
> net/ipv4/udp.c | 28 +++++++++++++++++++---------
> net/ipv6/udp.c | 28 +++++++++++++++++++---------
> tools/testing/selftests/net/udpgso.c | 14 ++++++++++++++
> 3 files changed, 52 insertions(+), 18 deletions(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index c472c9a57cf6..0b5010238d05 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1141,9 +1141,20 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
> const int hlen = skb_network_header_len(skb) +
> sizeof(struct udphdr);
>
> + if (datalen <= cork->gso_size) {
> + /*
> + * check MTU again: it's skipped previously when
> + * gso_size != 0
> + */
> + if (hlen + datalen > cork->fragsize) {
> + kfree_skb(skb);
> + return -EMSGSIZE;
> + }
> + goto no_gso;
This is almost the same as the test below.
How about just
if (hlen + min(cork->gso_size, datalen) > cork->fragsize)
And don't bypass the subsequent checks with a goto, or modify the
rest of the code.
> + }
> if (hlen + cork->gso_size > cork->fragsize) {
> kfree_skb(skb);
> - return -EINVAL;
> + return -EMSGSIZE;
> }
> if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
> kfree_skb(skb);
> @@ -1158,17 +1169,16 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
> return -EIO;
> }
>
> - if (datalen > cork->gso_size) {
> - skb_shinfo(skb)->gso_size = cork->gso_size;
> - skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> - skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> - cork->gso_size);
> + skb_shinfo(skb)->gso_size = cork->gso_size;
> + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> + skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> + cork->gso_size);
>
> - /* Don't checksum the payload, skb will get segmented */
> - goto csum_partial;
> - }
> + /* Don't checksum the payload, skb will get segmented */
> + goto csum_partial;
> }
>
> +no_gso:
> if (is_udplite) /* UDP-Lite */
> csum = udplite_csum(skb);
>
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 6671daa67f4f..d97befa7f80d 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -1389,9 +1389,20 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
> const int hlen = skb_network_header_len(skb) +
> sizeof(struct udphdr);
>
> + if (datalen <= cork->gso_size) {
> + /*
> + * check MTU again: it's skipped previously when
> + * gso_size != 0
> + */
> + if (hlen + datalen > cork->fragsize) {
> + kfree_skb(skb);
> + return -EMSGSIZE;
> + }
> + goto no_gso;
> + }
> if (hlen + cork->gso_size > cork->fragsize) {
> kfree_skb(skb);
> - return -EINVAL;
> + return -EMSGSIZE;
> }
> if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
> kfree_skb(skb);
> @@ -1406,17 +1417,16 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
> return -EIO;
> }
>
> - if (datalen > cork->gso_size) {
> - skb_shinfo(skb)->gso_size = cork->gso_size;
> - skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> - skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> - cork->gso_size);
> + skb_shinfo(skb)->gso_size = cork->gso_size;
> + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> + skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> + cork->gso_size);
>
> - /* Don't checksum the payload, skb will get segmented */
> - goto csum_partial;
> - }
> + /* Don't checksum the payload, skb will get segmented */
> + goto csum_partial;
> }
>
> +no_gso:
> if (is_udplite)
> csum = udplite_csum(skb);
> else if (udp_get_no_check6_tx(sk)) { /* UDP csum disabled */
> diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
> index 3f2fca02fec5..fb73f1c331fb 100644
> --- a/tools/testing/selftests/net/udpgso.c
> +++ b/tools/testing/selftests/net/udpgso.c
> @@ -102,6 +102,13 @@ struct testcase testcases_v4[] = {
> .gso_len = CONST_MSS_V4,
> .r_num_mss = 1,
> },
> + {
> + /* datalen <= MSS < gso_len: will fall back to no GSO */
> + .tlen = CONST_MSS_V4,
> + .gso_len = CONST_MSS_V4 + 1,
> + .r_num_mss = 0,
> + .r_len_last = CONST_MSS_V4,
> + },
Please also add a test where datalen > MSS < gso_len (with .tfail = true).
> {
> /* send a single MSS + 1B */
> .tlen = CONST_MSS_V4 + 1,
> @@ -205,6 +212,13 @@ struct testcase testcases_v6[] = {
> .gso_len = CONST_MSS_V6,
> .r_num_mss = 1,
> },
> + {
> + /* datalen <= MSS < gso_len: will fall back to no GSO */
> + .tlen = CONST_MSS_V6,
> + .gso_len = CONST_MSS_V6 + 1,
> + .r_num_mss = 0,
> + .r_len_last = CONST_MSS_V6,
> + },
> {
> /* send a single MSS + 1B */
> .tlen = CONST_MSS_V6 + 1,
> --
> 2.30.2
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH v2 net] udp: gso: do not drop small packets when PMTU reduces
2025-01-30 15:35 ` Willem de Bruijn
@ 2025-01-30 18:34 ` Yan Zhai
0 siblings, 0 replies; 3+ messages in thread
From: Yan Zhai @ 2025-01-30 18:34 UTC (permalink / raw)
To: Willem de Bruijn
Cc: netdev, David S. Miller, David Ahern, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Simon Horman, Shuah Khan, Josh Hunt,
Alexander Duyck, linux-kernel, linux-kselftest, kernel-team
On Thu, Jan 30, 2025 at 9:35 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Yan Zhai wrote:
> > Commit 4094871db1d6 ("udp: only do GSO if # of segs > 1") avoided GSO
> > for small packets. But the kernel currently dismisses GSO requests only
> > after checking MTU/PMTU on gso_size. This means any packets, regardless
> > of their payload sizes, could be dropped when PMTU becomes smaller than
> > requested gso_size. We encountered this issue in production and it
> > caused a reliability problem that new QUIC connection cannot be
> > established before PMTU cache expired, while non GSO sockets still
> > worked fine at the same time.
> >
> > Ideally, do not check any GSO related constraints when payload size is
> > smaller than requested gso_size, and return EMSGSIZE instead of EINVAL
> > on MTU/PMTU check failure to be more specific on the error cause.
> >
> > Fixes: 4094871db1d6 ("udp: only do GSO if # of segs > 1")
> > Signed-off-by: Yan Zhai <yan@cloudflare.com>
> > --
> > v1->v2: add a missing MTU check when fall back to no GSO mode suggested
> > by Willem de Bruijn <willemdebruijn.kernel@gmail.com>; Fixed up commit
> > message to be more precise.
> >
> > v1: https://lore.kernel.org/all/Z5cgWh%2F6bRQm9vVU@debian.debian/
> > ---
> > net/ipv4/udp.c | 28 +++++++++++++++++++---------
> > net/ipv6/udp.c | 28 +++++++++++++++++++---------
> > tools/testing/selftests/net/udpgso.c | 14 ++++++++++++++
> > 3 files changed, 52 insertions(+), 18 deletions(-)
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index c472c9a57cf6..0b5010238d05 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1141,9 +1141,20 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
> > const int hlen = skb_network_header_len(skb) +
> > sizeof(struct udphdr);
> >
> > + if (datalen <= cork->gso_size) {
> > + /*
> > + * check MTU again: it's skipped previously when
> > + * gso_size != 0
> > + */
> > + if (hlen + datalen > cork->fragsize) {
> > + kfree_skb(skb);
> > + return -EMSGSIZE;
> > + }
> > + goto no_gso;
>
> This is almost the same as the test below.
>
> How about just
>
> if (hlen + min(cork->gso_size, datalen) > cork->fragsize)
>
> And don't bypass the subsequent checks with a goto, or modify the
> rest of the code.
>
> > + }
> > if (hlen + cork->gso_size > cork->fragsize) {
> > kfree_skb(skb);
> > - return -EINVAL;
> > + return -EMSGSIZE;
> > }
> > if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
> > kfree_skb(skb);
> > @@ -1158,17 +1169,16 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
> > return -EIO;
> > }
> >
> > - if (datalen > cork->gso_size) {
> > - skb_shinfo(skb)->gso_size = cork->gso_size;
> > - skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> > - skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> > - cork->gso_size);
> > + skb_shinfo(skb)->gso_size = cork->gso_size;
> > + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> > + skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> > + cork->gso_size);
> >
> > - /* Don't checksum the payload, skb will get segmented */
> > - goto csum_partial;
> > - }
> > + /* Don't checksum the payload, skb will get segmented */
> > + goto csum_partial;
> > }
> >
> > +no_gso:
> > if (is_udplite) /* UDP-Lite */
> > csum = udplite_csum(skb);
> >
> > diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> > index 6671daa67f4f..d97befa7f80d 100644
> > --- a/net/ipv6/udp.c
> > +++ b/net/ipv6/udp.c
> > @@ -1389,9 +1389,20 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
> > const int hlen = skb_network_header_len(skb) +
> > sizeof(struct udphdr);
> >
> > + if (datalen <= cork->gso_size) {
> > + /*
> > + * check MTU again: it's skipped previously when
> > + * gso_size != 0
> > + */
> > + if (hlen + datalen > cork->fragsize) {
> > + kfree_skb(skb);
> > + return -EMSGSIZE;
> > + }
> > + goto no_gso;
> > + }
> > if (hlen + cork->gso_size > cork->fragsize) {
> > kfree_skb(skb);
> > - return -EINVAL;
> > + return -EMSGSIZE;
> > }
> > if (datalen > cork->gso_size * UDP_MAX_SEGMENTS) {
> > kfree_skb(skb);
> > @@ -1406,17 +1417,16 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
> > return -EIO;
> > }
> >
> > - if (datalen > cork->gso_size) {
> > - skb_shinfo(skb)->gso_size = cork->gso_size;
> > - skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> > - skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> > - cork->gso_size);
> > + skb_shinfo(skb)->gso_size = cork->gso_size;
> > + skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
> > + skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
> > + cork->gso_size);
> >
> > - /* Don't checksum the payload, skb will get segmented */
> > - goto csum_partial;
> > - }
> > + /* Don't checksum the payload, skb will get segmented */
> > + goto csum_partial;
> > }
> >
> > +no_gso:
> > if (is_udplite)
> > csum = udplite_csum(skb);
> > else if (udp_get_no_check6_tx(sk)) { /* UDP csum disabled */
> > diff --git a/tools/testing/selftests/net/udpgso.c b/tools/testing/selftests/net/udpgso.c
> > index 3f2fca02fec5..fb73f1c331fb 100644
> > --- a/tools/testing/selftests/net/udpgso.c
> > +++ b/tools/testing/selftests/net/udpgso.c
> > @@ -102,6 +102,13 @@ struct testcase testcases_v4[] = {
> > .gso_len = CONST_MSS_V4,
> > .r_num_mss = 1,
> > },
> > + {
> > + /* datalen <= MSS < gso_len: will fall back to no GSO */
> > + .tlen = CONST_MSS_V4,
> > + .gso_len = CONST_MSS_V4 + 1,
> > + .r_num_mss = 0,
> > + .r_len_last = CONST_MSS_V4,
> > + },
>
> Please also add a test where datalen > MSS < gso_len (with .tfail = true).
>
ACK on both comments.
Thanks
Yan
> > {
> > /* send a single MSS + 1B */
> > .tlen = CONST_MSS_V4 + 1,
> > @@ -205,6 +212,13 @@ struct testcase testcases_v6[] = {
> > .gso_len = CONST_MSS_V6,
> > .r_num_mss = 1,
> > },
> > + {
> > + /* datalen <= MSS < gso_len: will fall back to no GSO */
> > + .tlen = CONST_MSS_V6,
> > + .gso_len = CONST_MSS_V6 + 1,
> > + .r_num_mss = 0,
> > + .r_len_last = CONST_MSS_V6,
> > + },
> > {
> > /* send a single MSS + 1B */
> > .tlen = CONST_MSS_V6 + 1,
> > --
> > 2.30.2
> >
> >
>
>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-01-30 18:35 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-30 7:55 [PATCH v2 net] udp: gso: do not drop small packets when PMTU reduces Yan Zhai
2025-01-30 15:35 ` Willem de Bruijn
2025-01-30 18:34 ` Yan Zhai
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).