From: Vlad Buslov <vladbu@nvidia.com>
To: Antoine Tenart <atenart@kernel.org>
Cc: <davem@davemloft.net>, <kuba@kernel.org>, <echaudro@redhat.com>,
<sbrivio@redhat.com>, <netdev@vger.kernel.org>, <pshelar@ovn.org>
Subject: Re: [PATCH net 1/2] vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
Date: Thu, 20 Jan 2022 14:58:18 +0200 [thread overview]
Message-ID: <ygnhee52lg2d.fsf@nvidia.com> (raw)
In-Reply-To: <164267447125.4497.8151505359440130213@kwain>
On Thu 20 Jan 2022 at 12:27, Antoine Tenart <atenart@kernel.org> wrote:
> Hello Vlad,
>
> Quoting Vlad Buslov (2022-01-20 08:38:05)
>> On Thu 25 Mar 2021 at 17:35, Antoine Tenart <atenart@kernel.org> wrote:
>> >
>> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> > index 666dd201c3d5..53dbc67e8a34 100644
>> > --- a/drivers/net/vxlan.c
>> > +++ b/drivers/net/vxlan.c
>> > @@ -2725,12 +2725,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>> > goto tx_error;
>> > } else if (err) {
>> > if (info) {
>> > + struct ip_tunnel_info *unclone;
>> > struct in_addr src, dst;
>> >
>> > + unclone = skb_tunnel_info_unclone(skb);
>> > + if (unlikely(!unclone))
>> > + goto tx_error;
>> > +
>>
>> We have been getting memleaks in one of our tests that point to this
>> code (test deletes vxlan device while running traffic redirected by OvS
>> TC at the same time):
>>
>> unreferenced object 0xffff8882d0114200 (size 256):
>> comm "softirq", pid 0, jiffies 4296140292 (age 1435.992s)
>> hex dump (first 32 bytes):
>> 00 00 00 00 00 00 00 00 00 3b 85 84 ff ff ff ff .........;......
>> a1 26 b7 83 ff ff ff ff 00 00 00 00 00 00 00 00 .&..............
>> backtrace:
>> [<0000000097659d47>] metadata_dst_alloc+0x1f/0x470
>> [<000000007571c30f>] tun_dst_unclone+0xee/0x360 [vxlan]
>> [<00000000d2dcfd00>] vxlan_xmit_one+0x131d/0x2a00 [vxlan]
>> [<00000000281572b6>] vxlan_xmit+0x8e6/0x4cd0 [vxlan]
>> [<00000000d49d33fe>] dev_hard_start_xmit+0x1ba/0x710
>> [<00000000eac444f5>] __dev_queue_xmit+0x17c5/0x25f0
>> [<000000005fbd8585>] tcf_mirred_act+0xb1d/0xf70 [act_mirred]
>> [<0000000064b6eb2d>] tcf_action_exec+0x10e/0x350
>> [<00000000352821e8>] fl_classify+0x4e3/0x610 [cls_flower]
>> [<0000000011d3f765>] tcf_classify+0x33d/0x800
>> [<000000006c69b225>] __netif_receive_skb_core+0x18d6/0x2ae0
>> [<00000000dd256fe3>] __netif_receive_skb_one_core+0xaf/0x180
>> [<0000000065d43bd6>] process_backlog+0x2e3/0x710
>> [<00000000964357ae>] __napi_poll+0x9f/0x560
>> [<0000000059a93cf6>] net_rx_action+0x357/0xa60
>> [<00000000766481bc>] __do_softirq+0x282/0x94e
>>
>> Looking at the code the potential issue seems to be that
>> tun_dst_unclone() creates new metadata_dst instance with refcount==1,
>> increments the refcount with dst_hold() to value 2, then returns it.
>> This seems to imply that caller is expected to release one of the
>> references (second one if for skb), but none of the callers (including
>> original dev_fill_metadata_dst()) do that, so I guess I'm
>> misunderstanding something here.
>>
>> Any tips or suggestions?
>
> I'd say there is no need to increase the dst refcount here after calling
> metadata_dst_alloc, as the metadata is local to the skb and the dst
> refcount was already initialized to 1. This might be an issue with
> commit fc4099f17240 ("openvswitch: Fix egress tunnel info."); I CCed
> Pravin, he might recall if there was a reason to increase the refcount.
I tried to remove the dst_hold(), but that caused underflows[0], so I
guess the current reference counting is required at least for some
use-cases.
[0]:
[ 118.803011] ------------[ cut here ]------------
[ 118.803011] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803019] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803027] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803041] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803046] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803060] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803065] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803078] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803083] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803096] dst_release: dst:000000001fc13e61 refcnt:-2
[ 118.803920] dst_release underflow
[ 118.803937] WARNING: CPU: 4 PID: 0 at net/core/dst.c:173 dst_release+0x79/0x90
[ 118.815961] Modules linked in: act_tunnel_key act_mirred act_skbedit veth vxlan ip6_udp_tunnel udp_tunnel act_gact cls_flower sch_ingress openvswitch nsh nf_conncount mlx5_ib mlx5_core mlxfw pci_hyperv_intf ptp pps_core nfsv3 nfs_acl xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_filter iptable_nat nf_nat nf_connt
rack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill overlay rpcrdma rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_iser ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs sunrpc ib_core iTCO_wdt iTCO_vendor_support lpc_ich mfd_core virtio_net
net_failover failover i2c_i801 i2c_smbus kvm_intel kvm pcspkr irqbypass crc32_pclmul ghash_clmulni_intel sch_fq_codel drm i2c_core ip_tables crc32c_intel serio_raw fuse [last unloaded: mlxfw]
[ 118.829464] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.16.0+ #3
[ 118.830567] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
[ 118.832541] RIP: 0010:dst_release+0x79/0x90
[ 118.833372] Code: 04 e8 db 14 01 00 8b 4c 24 04 85 c0 74 c7 e9 4d 60 22 00 48 c7 c7 35 00 32 82 89 4c 24 04 c6 05 c5 47 e5 00 01 e8 6f c2 1b 00 <0f> 0b 8b 4c 24 04 eb cb 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40
[ 118.836566] RSP: 0018:ffffc90000180ee0 EFLAGS: 00010282
[ 118.837546] RAX: 0000000000000000 RBX: ffff88813a139ab0 RCX: 0000000000000000
[ 118.838912] RDX: 0000000000000102 RSI: ffffffff82286273 RDI: 00000000ffffffff
[ 118.840278] RBP: 0000000000000004 R08: 0000000000000015 R09: ffffc90000180e78
[ 118.841646] R10: ffffffff825c7000 R11: 0000000000000001 R12: ffff88811d3ab480
[ 118.843008] R13: 0000000000000170 R14: 0000000000000000 R15: ffff8882f5a2e1b8
[ 118.844371] FS: 0000000000000000(0000) GS:ffff8882f5a00000(0000) knlGS:0000000000000000
[ 118.845968] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 118.847100] CR2: 00007fa5e5abd7e0 CR3: 0000000175408005 CR4: 0000000000770ee0
[ 118.848461] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 118.849834] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 118.851198] PKRU: 55555554
[ 118.851815] Call Trace:
[ 118.852387] <IRQ>
[ 118.852894] dst_cache_destroy+0x33/0x60
[ 118.853728] dst_destroy+0xaa/0xb0
[ 118.854465] rcu_core+0x2a3/0x960
[ 118.855197] __do_softirq+0xf0/0x2f9
[ 118.855976] __irq_exit_rcu+0xcc/0x120
[ 118.856765] sysvec_apic_timer_interrupt+0xa2/0xd0
[ 118.857746] </IRQ>
[ 118.858270] <TASK>
[ 118.858775] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 118.859801] RIP: 0010:native_safe_halt+0xb/0x10
[ 118.860731] Code: 7e ff ff ff 7f 5b c3 65 48 8b 04 25 c0 cb 01 00 f0 80 48 02 20 48 8b 00 a8 08 75 c3 eb 80 cc eb 07 0f 00 2d 8f f5 4f 00 fb f4 <c3> 0f 1f 40 00 eb 07 0f 00 2d 7f f5 4f 00 f4 c3 cc cc cc cc cc 0f
[ 118.864175] RSP: 0018:ffffc9000009fef0 EFLAGS: 00000206
[ 118.865223] RAX: 0000000000027f6c RBX: 0000000000000004 RCX: 0000000000000000
[ 118.866504] RDX: ffff88817c090d50 RSI: ffffffff82286273 RDI: ffffffff8225bf7f
[ 118.867774] RBP: ffff8881009d2e80 R08: 0000000000027f6b R09: 0000000000000000
[ 118.869051] R10: 00000000fffd3b17 R11: 0000000000000001 R12: 0000000000000000
[ 118.870326] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[ 118.871601] default_idle+0xa/0x10
[ 118.872300] default_idle_call+0x33/0xe0
[ 118.873081] do_idle+0x208/0x270
[ 118.873741] cpu_startup_entry+0x19/0x20
[ 118.874562] secondary_startup_64_no_verify+0xc3/0xcb
[ 118.875604] </TASK>
[ 118.876140] ---[ end trace dec5061c76371ce7 ]---
next prev parent reply other threads:[~2022-01-20 12:58 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-03-25 15:35 [PATCH net 0/2] net: do not modify the shared tunnel info when PMTU triggers an ICMP reply Antoine Tenart
2021-03-25 15:35 ` [PATCH net 1/2] vxlan: " Antoine Tenart
2022-01-20 7:38 ` Vlad Buslov
2022-01-20 10:27 ` Antoine Tenart
2022-01-20 12:58 ` Vlad Buslov [this message]
2022-01-28 17:01 ` Antoine Tenart
2022-01-31 11:26 ` Vlad Buslov
2022-01-31 13:26 ` Antoine Tenart
2022-01-31 14:04 ` Stefano Brivio
2022-01-31 14:42 ` Antoine Tenart
2022-01-31 17:55 ` Vlad Buslov
2021-03-25 15:35 ` [PATCH net 2/2] geneve: " Antoine Tenart
2021-03-25 20:28 ` [PATCH net 0/2] net: " Stefano Brivio
2021-03-26 0:40 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ygnhee52lg2d.fsf@nvidia.com \
--to=vladbu@nvidia.com \
--cc=atenart@kernel.org \
--cc=davem@davemloft.net \
--cc=echaudro@redhat.com \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pshelar@ovn.org \
--cc=sbrivio@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.