From: Yan Zhai <yan@cloudflare.com>
To: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
David Ahern <dsahern@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Aya Levin <ayal@nvidia.com>, Tariq Toukan <tariqt@nvidia.com>,
linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
Florian Westphal <fw@strlen.de>,
Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Subject: [PATCH v2 net-next] ipv6: avoid atomic fragment on GSO packets
Date: Mon, 16 Oct 2023 11:23:38 -0700 [thread overview]
Message-ID: <ZS1/qtr0dZJ35VII@debian.debian> (raw)
GSO packets can contain a trailing segment that is smaller than
gso_size. When examining the dst MTU for such packet, if its gso_size is
too large, then all segments would be fragmented. However, there is a
good chance the trailing segment has smaller actual size than both
gso_size as well as the MTU, which leads to an "atomic fragment". It is
considered harmful in RFC-8021. An Existing report from APNIC also shows
that atomic fragments are more likely to be dropped even it is
equivalent to a no-op [1].
Refactor __ip6_finish_output code to separate GSO and non-GSO packet
processing. It mirrors __ip_finish_output logic now. Add an extra check
in GSO handling to avoid atomic fragments. Lastly, drop dst_allfrag
check, which is no longer true since commit 9d289715eb5c ("ipv6: stop
sending PTB packets for MTU < 1280").
Link: https://www.potaroo.net/presentations/2022-03-01-ipv6-frag.pdf [1]
Fixes: b210de4f8c97 ("net: ipv6: Validate GSO SKB before finish IPv6 processing")
Suggested-by: Florian Westphal <fw@strlen.de>
Reported-by: David Wragg <dwragg@cloudflare.com>
Signed-off-by: Yan Zhai <yan@cloudflare.com>
---
net/ipv6/ip6_output.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a471c7e91761..1de6f3c11655 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -162,7 +162,14 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
int err;
skb_mark_not_on_list(segs);
- err = ip6_fragment(net, sk, segs, ip6_finish_output2);
+ /* Last gso segment might be smaller than actual MTU. Adding
+ * a fragment header to it would produce an "atomic fragment",
+ * which is considered harmful (RFC-8021)
+ */
+ err = segs->len > mtu ?
+ ip6_fragment(net, sk, segs, ip6_finish_output2) :
+ ip6_finish_output2(net, sk, segs);
+
if (err && ret == 0)
ret = err;
}
@@ -170,10 +177,19 @@ ip6_finish_output_gso_slowpath_drop(struct net *net, struct sock *sk,
return ret;
}
+static int ip6_finish_output_gso(struct net *net, struct sock *sk,
+ struct sk_buff *skb, unsigned int mtu)
+{
+ if (!(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
+ !skb_gso_validate_network_len(skb, mtu))
+ return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
+
+ return ip6_finish_output2(net, sk, skb);
+}
+
static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb)
{
unsigned int mtu;
-
#if defined(CONFIG_NETFILTER) && defined(CONFIG_XFRM)
/* Policy lookup after SNAT yielded a new policy */
if (skb_dst(skb)->xfrm) {
@@ -183,17 +199,14 @@ static int __ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff
#endif
mtu = ip6_skb_dst_mtu(skb);
- if (skb_is_gso(skb) &&
- !(IP6CB(skb)->flags & IP6SKB_FAKEJUMBO) &&
- !skb_gso_validate_network_len(skb, mtu))
- return ip6_finish_output_gso_slowpath_drop(net, sk, skb, mtu);
+ if (skb_is_gso(skb))
+ return ip6_finish_output_gso(net, sk, skb, mtu);
- if ((skb->len > mtu && !skb_is_gso(skb)) ||
- dst_allfrag(skb_dst(skb)) ||
+ if (skb->len > mtu ||
(IP6CB(skb)->frag_max_size && skb->len > IP6CB(skb)->frag_max_size))
return ip6_fragment(net, sk, skb, ip6_finish_output2);
- else
- return ip6_finish_output2(net, sk, skb);
+
+ return ip6_finish_output2(net, sk, skb);
}
static int ip6_finish_output(struct net *net, struct sock *sk, struct sk_buff *skb)
--
2.30.2
next reply other threads:[~2023-10-16 18:23 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-16 18:23 Yan Zhai [this message]
2023-10-16 18:27 ` [PATCH v2 net-next] ipv6: avoid atomic fragment on GSO packets Yan Zhai
2023-10-16 21:00 ` Alexander H Duyck
2023-10-16 21:51 ` Yan Zhai
2023-10-16 22:28 ` Alexander Duyck
2023-10-17 20:02 ` Florian Westphal
2023-10-18 1:41 ` Yan Zhai
2023-10-18 1:57 ` Willem de Bruijn
2023-10-18 13:53 ` Yan Zhai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZS1/qtr0dZJ35VII@debian.debian \
--to=yan@cloudflare.com \
--cc=ayal@nvidia.com \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=fw@strlen.de \
--cc=kernel-team@cloudflare.com \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=tariqt@nvidia.com \
--cc=willemdebruijn.kernel@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.