From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Jan Tluka <jtluka@redhat.com>,
Jakub Sitnicki <jkbs@redhat.com>,
Hannes Frederic Sowa <hannes@stressinduktion.org>,
"David S. Miller" <davem@davemloft.net>,
Benedict Wong <benedictwong@google.com>
Subject: [PATCH 3.18 01/24] ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
Date: Fri, 2 Mar 2018 09:50:58 +0100
Message-ID: <20180302084239.248232037@linuxfoundation.org>
In-Reply-To: <20180302084239.157503766@linuxfoundation.org>
3.18-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Sitnicki <jkbs@redhat.com>
commit 00bc0ef5880dc7b82f9c320dead4afaad48e47be upstream.
At present we perform an xfrm_lookup() for each UDPv6 message we
send. The lookup involves querying the flow cache (flow_cache_lookup)
and, in case of a cache miss, creating an XFRM bundle.
If we miss the flow cache, we can end up creating a new bundle and
deriving the path MTU (xfrm_init_pmtu) from an already transformed
dst_entry, which we pass from the socket cache (sk->sk_dst_cache) down
to xfrm_lookup(). This can happen only if we're caching the dst_entry
in the socket, that is, when we're using a connected UDP socket.
To put it another way, the path MTU shrinks each time we miss the flow
cache, which later on leads to incorrectly fragmented payloads. It can
be observed with ESPv6 in transport mode:
1) Set up a transformation and lower the MTU to trigger fragmentation
# ip xfrm policy add dir out src ::1 dst ::1 \
tmpl src ::1 dst ::1 proto esp spi 1
# ip xfrm state add src ::1 dst ::1 \
proto esp spi 1 enc 'aes' 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b
# ip link set dev lo mtu 1500
2) Monitor the packet flow and set up a UDP sink
# tcpdump -ni lo -ttt &
# socat udp6-listen:12345,fork /dev/null &
3) Send a datagram that needs fragmentation with a connected socket
# perl -e 'print "@" x 1470' | socat - udp6:[::1]:12345
2016/06/07 18:52:52 socat[724] E read(3, 0x555bb3d5ba00, 8192): Protocol error
00:00:00.000000 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x2), length 1448
00:00:00.000014 IP6 ::1 > ::1: frag (1448|32)
00:00:00.000050 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x3), length 1272
(^ ICMPv6 Parameter Problem)
00:00:00.000022 IP6 ::1 > ::1: ESP(spi=0x00000001,seq=0x5), length 136
4) Compare it to a non-connected socket
# perl -e 'print "@" x 1500' | socat - udp6-sendto:[::1]:12345
00:00:40.535488 IP6 ::1 > ::1: frag (0|1448) ESP(spi=0x00000001,seq=0x6), length 1448
00:00:00.000010 IP6 ::1 > ::1: frag (1448|64)
What happens in step (3) is:
1) when connecting the socket in __ip6_datagram_connect(), we
perform an XFRM lookup, miss the flow cache, create an XFRM
bundle, and cache the destination,
2) afterwards, when sending the datagram, we perform an XFRM lookup,
again, miss the flow cache (due to mismatch of flowi6_iif and
flowi6_oif, which is an issue of its own), and recreate an XFRM
bundle based on the cached (and already transformed) destination.
To prevent the recreation of an XFRM bundle, avoid an XFRM lookup
altogether whenever we already have a destination entry cached in the
socket. This prevents the path MTU shrinkage and brings us on par with
UDPv4.
The fix also benefits connected PINGv6 sockets, another user of
ip6_sk_dst_lookup_flow(), which also suffer from messages being
transformed twice.
Joint work with Hannes Frederic Sowa.
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Benedict Wong <benedictwong@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv6/ip6_output.c | 11 +++--------
1 file changed, 3 insertions(+), 8 deletions(-)
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1038,17 +1038,12 @@ struct dst_entry *ip6_sk_dst_lookup_flow
 					 const struct in6_addr *final_dst)
 {
 	struct dst_entry *dst = sk_dst_check(sk, inet6_sk(sk)->dst_cookie);
-	int err;
 
 	dst = ip6_sk_dst_check(sk, dst, fl6);
+	if (!dst)
+		dst = ip6_dst_lookup_flow(sk, fl6, final_dst);
 
-	err = ip6_dst_lookup_tail(sk, &dst, fl6);
-	if (err)
-		return ERR_PTR(err);
-	if (final_dst)
-		fl6->daddr = *final_dst;
-
-	return xfrm_lookup_route(sock_net(sk), dst, flowi6_to_flowi(fl6), sk, 0);
+	return dst;
 }
 EXPORT_SYMBOL_GPL(ip6_sk_dst_lookup_flow);
 