* [RFC PATCH] fix xfrm MTU regression
@ 2021-04-29 17:02 Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
2021-04-30 5:36 ` [RFC PATCH v2] " Jiri Bohac
0 siblings, 2 replies; 5+ messages in thread
From: Jiri Bohac @ 2021-04-29 17:02 UTC (permalink / raw)
To: Mike Maloney, Eric Dumazet, davem; +Cc: netdev, Steffen Klassert, Herbert Xu
Hi,
Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
A Packet Too Big ICMPv6 message received in response to an ESP
packet will prevent all further communication through the tunnel
if the reported MTU minus the ESP overhead is smaller than 1280.
E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead
is 92 bytes. Receiving a PTB with MTU of 1371 or less will result
in all further packets in the tunnel dropped. A ping through the
tunnel fails with "ping: sendmsg: Invalid argument".
Apparently the MTU on the xfrm route is smaller than 1280 and
fails the check inside ip6_setup_cork() added by 749439bf.
We found this by debugging USGv6/ipv6ready failures. Failing
tests are: "Phase-2 Interoperability Test Scenario IPsec" /
5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
Below is my attempt to fix the situation by dropping the MTU
check and instead checking for the underflows described in the
749439bf commit message (without much understanding of the
details!). Does this make sense?:
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..8af6adb42c85 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
- return -EINVAL;
cork->base.fragsize = mtu;
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
@@ -1465,6 +1463,11 @@ static int __ip6_append_data(struct sock *sk,
fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
(opt ? opt->opt_nflen : 0);
+
+ if (mtu < fragheaderlen ||
+ ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
+ goto emsgsize;
+
maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
sizeof(struct frag_hdr);
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
^ permalink raw reply related [flat|nested] 5+ messages in thread* Re: [RFC PATCH] fix xfrm MTU regression
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
@ 2021-04-29 19:48 ` Sabrina Dubroca
2021-04-29 20:25 ` Jiri Bohac
2021-04-30 5:36 ` [RFC PATCH v2] " Jiri Bohac
1 sibling, 1 reply; 5+ messages in thread
From: Sabrina Dubroca @ 2021-04-29 19:48 UTC (permalink / raw)
To: Jiri Bohac
Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert,
Herbert Xu
2021-04-29, 19:02:54 +0200, Jiri Bohac wrote:
> Hi,
>
> Commit 749439bfac6e1a2932c582e2699f91d329658196 ("ipv6: fix udpv6
> sendmsg crash caused by too small MTU") breaks PMTU for xfrm.
>
> A Packet Too Big ICMPv6 message received in response to an ESP
> packet will prevent all further communication through the tunnel
> if the reported MTU minus the ESP overhead is smaller than 1280.
>
> E.g. in a case of a tunnel-mode ESP with sha256/aes the overhead
> is 92 bytes. Receiving a PTB with MTU of 1371 or less will result
> in all further packets in the tunnel dropped. A ping through the
> tunnel fails with "ping: sendmsg: Invalid argument".
>
> Apparently the MTU on the xfrm route is smaller than 1280 and
> fails the check inside ip6_setup_cork() added by 749439bf.
>
> We found this by debugging USGv6/ipv6ready failures. Failing
> tests are: "Phase-2 Interoperability Test Scenario IPsec" /
> 5.3.11 and 5.4.11 (Tunnel Mode: Fragmentation).
That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
should return at least 1280 for ipv6"), currently in Steffen's ipsec
tree:
https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
--
Sabrina
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [RFC PATCH] fix xfrm MTU regression
2021-04-29 19:48 ` Sabrina Dubroca
@ 2021-04-29 20:25 ` Jiri Bohac
2021-05-01 10:23 ` Sabrina Dubroca
0 siblings, 1 reply; 5+ messages in thread
From: Jiri Bohac @ 2021-04-29 20:25 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert,
Herbert Xu
On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote:
> That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
> should return at least 1280 for ipv6"), currently in Steffen's ipsec
> tree:
> https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
Thanks, that is interesting! The patch makes my large (-s 1400) pings inside
ESP pass through a 1280-MTU link on an intermediary router but in a suboptimal
double-fragmented way. tcpdump on the router shows:
22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232
22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100)
22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276
I.e. the ping is fragmented into two ESP packets and the first ESP packet is then fragmented again.
The same pings with my patch come through in two fragments:
22:13:22.072934 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x28), length 1236
22:13:22.073039 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0x29), length 356
I can do more tests if needed.
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [RFC PATCH] fix xfrm MTU regression
2021-04-29 20:25 ` Jiri Bohac
@ 2021-05-01 10:23 ` Sabrina Dubroca
0 siblings, 0 replies; 5+ messages in thread
From: Sabrina Dubroca @ 2021-05-01 10:23 UTC (permalink / raw)
To: Jiri Bohac
Cc: Mike Maloney, Eric Dumazet, davem, netdev, Steffen Klassert,
Herbert Xu
2021-04-29, 22:25:29 +0200, Jiri Bohac wrote:
> On Thu, Apr 29, 2021 at 09:48:09PM +0200, Sabrina Dubroca wrote:
> > That should be fixed with commit b515d2637276 ("xfrm: xfrm_state_mtu
> > should return at least 1280 for ipv6"), currently in Steffen's ipsec
> > tree:
> > https://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec.git/commit/?id=b515d2637276
>
> Thanks, that is interesting! The patch makes my large (-s 1400) pings inside
> ESP pass through a 1280-MTU link on an intermediary router but in a suboptimal
> double-fragmented way. tcpdump on the router shows:
>
> 22:09:44.556452 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (0|1232) ESP(spi=0x00000001,seq=0xdd), length 1232
> 22:09:44.566269 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: frag (1232|100)
> 22:09:44.566553 IP6 2001:db8:ffff::1 > 2001:db8:ffff:1::1: ESP(spi=0x00000001,seq=0xde), length 276
>
> I.e. the ping is fragmented into two ESP packets and the first ESP packet is then fragmented again.
It's a bit ugly, but I don't think we can do any better. We're going
through the stack twice in tunnel mode. The first pass (before xfrm)
we fragment according to the PMTU (adjusted to IPV6_MIN_MTU, because
MTUs lower than that are illegal in IPv6). The second time (after
xfrm), the first ESP packet is too big so we fragment it. This
behavior is consistent with a vti device running over a network with
MTU=1280 (which doesn't seem to work without my patch).
In transport mode, we're only going through the stack once, so we
don't see this double fragmentation.
I think my patch is correct, because without it we have IPv6 dsts
going around the kernel with an associated MTU smaller than
IPV6_MIN_MTU.
--
Sabrina
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [RFC PATCH v2] fix xfrm MTU regression
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
@ 2021-04-30 5:36 ` Jiri Bohac
1 sibling, 0 replies; 5+ messages in thread
From: Jiri Bohac @ 2021-04-30 5:36 UTC (permalink / raw)
To: Mike Maloney, Eric Dumazet, davem; +Cc: netdev, Steffen Klassert, Herbert Xu
On Thu, Apr 29, 2021 at 07:02:55PM +0200, Jiri Bohac wrote:
> Below is my attempt to fix the situation by dropping the MTU
> check and instead checking for the underflows described in the
> 749439bf commit message (without much understanding of the
> details!). Does this make sense?:
the first version left headersize uninitialized in the error
path; v2 below fixes this.
Signed-off-by: Jiri Bohac <jbohac@suse.cz>
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ff4f9ebcf7f6..171eb4ec1e67 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1402,8 +1402,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
if (np->frag_size)
mtu = np->frag_size;
}
- if (mtu < IPV6_MIN_MTU)
- return -EINVAL;
cork->base.fragsize = mtu;
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
@@ -1465,8 +1463,6 @@ static int __ip6_append_data(struct sock *sk,
fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
(opt ? opt->opt_nflen : 0);
- maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
- sizeof(struct frag_hdr);
headersize = sizeof(struct ipv6hdr) +
(opt ? opt->opt_flen + opt->opt_nflen : 0) +
@@ -1474,6 +1470,13 @@ static int __ip6_append_data(struct sock *sk,
sizeof(struct frag_hdr) : 0) +
rt->rt6i_nfheader_len;
+ if (mtu < fragheaderlen ||
+ ((mtu - fragheaderlen) & ~7) + fragheaderlen < sizeof(struct frag_hdr))
+ goto emsgsize;
+
+ maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
+ sizeof(struct frag_hdr);
+
/* as per RFC 7112 section 5, the entire IPv6 Header Chain must fit
* the first fragment
*/
--
Jiri Bohac <jbohac@suse.cz>
SUSE Labs, Prague, Czechia
^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2021-05-01 10:24 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-04-29 17:02 [RFC PATCH] fix xfrm MTU regression Jiri Bohac
2021-04-29 19:48 ` Sabrina Dubroca
2021-04-29 20:25 ` Jiri Bohac
2021-05-01 10:23 ` Sabrina Dubroca
2021-04-30 5:36 ` [RFC PATCH v2] " Jiri Bohac
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).