From: Vlad Buslov <vladbu@nvidia.com>
To: Antoine Tenart <atenart@kernel.org>
Cc: <davem@davemloft.net>, <kuba@kernel.org>, <echaudro@redhat.com>,
<sbrivio@redhat.com>, <netdev@vger.kernel.org>
Subject: Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
Date: Thu, 20 Jan 2022 09:38:05 +0200 [thread overview]
Message-ID: <ygnhh79yluw2.fsf@nvidia.com> (raw)
In-Reply-To: <20210325153533.770125-2-atenart@kernel.org>
Hello Antoine,
On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@kernel.org> wrote:
> When the interface is part of a bridge or an Open vSwitch port and a
> packet exceed a PMTU estimate, an ICMP reply is sent to the sender. When
> using the external mode (collect metadata) the source and destination
> addresses are reversed, so that Open vSwitch can match the packet
> against an existing (reverse) flow.
>
> But inverting the source and destination addresses in the shared
> ip_tunnel_info will make following packets of the flow to use a wrong
> destination address (packets will be tunnelled to itself), if the flow
> isn't updated. Which happens with Open vSwitch, until the flow times
> out.
>
> Fixes this by uncloning the skb's ip_tunnel_info before inverting its
> source and destination addresses, so that the modification will only be
> made for the PTMU packet, not the following ones.
>
> Fixes: fc68c99577cc ("vxlan: Support for PMTU discovery on directly bridged links")
> Tested-by: Eelco Chaudron <echaudro@redhat.com>
> Reviewed-by: Eelco Chaudron <echaudro@redhat.com>
> Signed-off-by: Antoine Tenart <atenart@kernel.org>
> ---
> drivers/net/vxlan.c | 18 ++++++++++++++----
> 1 file changed, 14 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 666dd201c3d5..53dbc67e8a34 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> goto tx_error;
> } else if (err) {
> if (info) {
> + struct ip_tunnel_info *unclone;
> struct in_addr src, dst;
>
> + unclone = skb_tunnel_info_unclone(skb);
> + if (unlikely(!unclone))
> + goto tx_error;
> +
We have been getting memleaks in one of our tests that point to this
code (test deletes vxlan device while running traffic redirected by OvS
TC at the same time):
unreferenced object 0xffff8882d0114200 (size 256):
comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff .........;......
a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00 .&..............
backtrace:
[<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
[<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
[<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
[<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
[<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
[<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
[<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
[<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
[<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
[<0000000011d3f765>] tcf_classify+0x33d/0x800
[<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
[<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
[<0000000065d43bd6>] process_backlog+0x2e3/0x710
[<00000000964357ae>] __napi_poll+0x9f/0x560
[<0000000059a93cf6>] net_rx_action+0x357/0xa60
[<00000000766481bc>] __do_softirq+0x282/0x94e
Looking at the code the potential issue seems to be that
tun_dst_unclone() creates new metadata_dst instance with refcount==1,
increments the refcount with dst_hold() to value 2, then returns it.
This seems to imply that caller is expected to release one of the
references (second one if for skb), but none of the callers (including
original dev_fill_metadata_dst()) do that, so I guess I'm
misunderstanding something here.
Any tips or suggestions?
> src = remote_ip.sin.sin_addr;
> dst = local_ip.sin.sin_addr;
> - info->key.u.ipv4.src = src.s_addr;
> - info->key.u.ipv4.dst = dst.s_addr;
> + unclone->key.u.ipv4.src = src.s_addr;
> + unclone->key.u.ipv4.dst = dst.s_addr;
> }
> vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
> dst_release(ndst);
> @@ -2781,12 +2786,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
> goto tx_error;
> } else if (err) {
> if (info) {
> + struct ip_tunnel_info *unclone;
> struct in6_addr src, dst;
>
> + unclone = skb_tunnel_info_unclone(skb);
> + if (unlikely(!unclone))
> + goto tx_error;
> +
> src = remote_ip.sin6.sin6_addr;
> dst = local_ip.sin6.sin6_addr;
> - info->key.u.ipv6.src = src;
> - info->key.u.ipv6.dst = dst;
> + unclone->key.u.ipv6.src = src;
> + unclone->key.u.ipv6.dst = dst;
> }
>
> vxlan_encap_bypass(skb, vxlan, vxlan, vni, false);
next prev parent reply other threads:[~2022-01-20 7:38 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
2022-01-20 7:38 ` Vlad Buslov [this message]
2022-01-20 10:27 ` Antoine Tenart
2022-01-20 12:58 ` Vlad Buslov
2022-01-28 17:01 ` Antoine Tenart
2022-01-31 11:26 ` Vlad Buslov
2022-01-31 13:26 ` Antoine Tenart
2022-01-31 14:04 ` Stefano Brivio
2022-01-31 14:42 ` Antoine Tenart
2022-01-31 17:55 ` Vlad Buslov
2021-03-25 15:35 ` [PATCH net 2/2] geneve: " Antoine Tenart
2021-03-25 20:28 ` [PATCH net 0/2] net: " Stefano Brivio
2021-03-26 0:40 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ygnhh79yluw2.fsf@nvidia.com \
--to=vladbu@nvidia.com \
--cc=atenart@kernel.org \
--cc=davem@davemloft.net \
--cc=echaudro@redhat.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).