From: Oliver Herms <oliver.peter.herms@gmail.com>
To: Network Development <netdev@vger.kernel.org>
Cc: David Miller <davem@davemloft.net>, David Ahern <dsahern@gmail.com>
Subject: VRF/IPv4/ARP: unregister_netdevice waiting for dev to become free -> Who's responsible for releasing dst_entry created by ip_route_input_noref?
Date: Sat, 5 Jun 2021 19:16:14 +0200
Message-ID: <20cd265b-d52d-fd1f-c47e-bfa7ea15518f@gmail.com>
Hi everyone,
I'm observing a device unregistration issue when I try to delete a VRF interface after using the VRF.
The issue is reproducible on 5.12.9, 5.10.24, and 5.11.0-18 (Debian).
Here are the steps to reproduce the issue:
ip addr add 10.0.0.1/32 dev lo
ip netns add test-ns
ip link add veth-outside type veth peer name veth-inside
ip link add vrf-100 type vrf table 1100
ip link set veth-outside master vrf-100
ip link set veth-inside netns test-ns
ip link set veth-outside up
ip link set vrf-100 up
ip route add 10.1.1.1/32 dev veth-outside table 1100
ip netns exec test-ns ip link set veth-inside up
ip netns exec test-ns ip addr add 10.1.1.1/32 dev veth-inside
ip netns exec test-ns ip route add 10.0.0.1/32 dev veth-inside
ip netns exec test-ns ip route add default via 10.0.0.1
ip netns exec test-ns ping 10.0.0.1 -c 1 -i 1
sleep 10
ip link set veth-outside nomaster
ip link set vrf-100 down
ip link delete vrf-100    # <= never returns
The issue does not occur if I skip the ping.
I've tracked down all calls to dev_hold and dev_put.
When the ping command runs, the following call to dev_hold occurs, and the corresponding dev_put seems to be missing (it does not happen even when the VRF is set down or deleted):
[ 284.528775] CPU: 2 PID: 1205 Comm: ping Not tainted 5.12.9 #1
[ 284.528790] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 284.528796] Call Trace:
[ 284.528802] <IRQ>
[ 284.528832] dump_stack+0x7d/0x9c
[ 284.528854] dst_alloc.cold+0x11/0x2a
[ 284.528866] rt_dst_alloc+0x48/0xd0
[ 284.528881] ip_route_input_slow+0x507/0xc80
[ 284.528900] ip_route_input_rcu+0x258/0x270
[ 284.528913] ip_route_input_noref+0x2a/0x50
[ 284.528923] arp_process+0x4da/0x8a0
[ 284.528938] arp_rcv+0x1a9/0x1d0
[ 284.528948] ? trigger_load_balance+0x205/0x240
[ 284.528961] __netif_receive_skb_one_core+0x8d/0xa0
[ 284.528974] __netif_receive_skb+0x18/0x60
[ 284.528984] process_backlog+0xa2/0x170
[ 284.528993] __napi_poll+0x31/0x170
[ 284.529002] net_rx_action+0x22f/0x280
[ 284.529012] __do_softirq+0xce/0x281
[ 284.529024] do_softirq+0x77/0xa0
[ 284.529049] </IRQ>
[ 284.529054] __local_bh_enable_ip+0x50/0x60
[ 284.529064] ip_finish_output2+0x1ab/0x590
[ 284.529073] ? __cgroup_bpf_run_filter_skb+0x3ce/0x3e0
[ 284.529086] __ip_finish_output+0x110/0x270
[ 284.529096] ip_finish_output+0x2d/0xb0
[ 284.529105] ip_output+0x78/0x100
[ 284.529114] ? __ip_finish_output+0x270/0x270
[ 284.529122] ip_push_pending_frames+0xa3/0xb0
[ 284.529131] raw_sendmsg+0x5f0/0xdb0
[ 284.529144] ? setup_min_slab_ratio+0x68/0x90
[ 284.529182] ? __cond_resched+0x1a/0x50
[ 284.529195] ? aa_sk_perm+0x43/0x1b0
[ 284.529211] inet_sendmsg+0x6c/0x70
[ 284.529221] sock_sendmsg+0x5e/0x70
[ 284.529234] __sys_sendto+0x113/0x190
[ 284.529249] ? handle_mm_fault+0xda/0x2c0
[ 284.529258] ? do_user_addr_fault+0x1f5/0x670
[ 284.529266] ? exit_to_user_mode_prepare+0x37/0x190
[ 284.529277] __x64_sys_sendto+0x29/0x30
[ 284.529287] do_syscall_64+0x38/0x90
[ 284.529298] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 284.529306] RIP: 0033:0x7f89f02db53a
[ 284.529317] Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c
[ 284.529325] RSP: 002b:00007ffd7c1b0478 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[ 284.529335] RAX: ffffffffffffffda RBX: 00007ffd7c1b1c00 RCX: 00007f89f02db53a
[ 284.529340] RDX: 0000000000000040 RSI: 00005592d86be100 RDI: 0000000000000003
[ 284.529345] RBP: 00005592d86be100 R08: 00007ffd7c1b3e7c R09: 0000000000000010
[ 284.529349] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000040
[ 284.529354] R13: 00007ffd7c1b1bc0 R14: 00007ffd7c1b0480 R15: 0000001d00000001
Processing the incoming ARP request causes a call to ip_route_input_noref => ip_route_input_rcu => ip_route_input_slow => rt_dst_alloc => dst_alloc => dev_hold.
In a non-VRF use case, dst->dev would be the loopback interface, which is never deleted. In the VRF use case, dst->dev is the VRF interface, which is the one I would like to delete.
I've found that dst_release() would call dev_put(), but dst_release() does not seem to be called here (though I guess it should be?). That would leak a dst_entry, and the leaked reference makes the VRF device unremovable.
At least that's what it looks like to me.
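To illustrate the hold/put pairing I mean, here is a minimal userspace sketch in C. It is not kernel code: every toy_* name is hypothetical, the initial dst reference count of 1 is an assumption made for the model, and RCU and all other details are left out. It only shows how an allocation that pins the device keeps unregistration from completing until the matching release happens.

```c
/* Userspace toy model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdlib.h>

struct toy_dev {
    const char *name;
    int refcnt;              /* stands in for the netdev reference count */
};

struct toy_dst {
    struct toy_dev *dev;
    int refcnt;              /* stands in for the dst_entry reference count */
};

static void toy_dev_hold(struct toy_dev *dev)
{
    dev->refcnt++;
}

static void toy_dev_put(struct toy_dev *dev)
{
    assert(dev->refcnt > 0); /* a put without a prior hold is a bug */
    dev->refcnt--;
}

/* Models the allocation path: the new entry takes a hold on its device. */
static struct toy_dst *toy_dst_alloc(struct toy_dev *dev)
{
    struct toy_dst *dst = malloc(sizeof(*dst));

    if (!dst)
        return NULL;
    dst->dev = dev;
    dst->refcnt = 1;         /* assumption: the caller owns one reference */
    toy_dev_hold(dev);
    return dst;
}

/* Models the release path: dropping the last reference puts the device. */
static void toy_dst_release(struct toy_dst *dst)
{
    if (--dst->refcnt == 0) {
        toy_dev_put(dst->dev);
        free(dst);
    }
}

/* Unregistration can only complete once nobody holds the device. */
static int toy_can_unregister(const struct toy_dev *dev)
{
    return dev->refcnt == 0;
}
```

In this model, calling toy_dst_alloc() on a device and never calling toy_dst_release() leaves toy_can_unregister() returning 0 forever, which is the behaviour I see with vrf-100. Once toy_dst_release() is called, it returns 1.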
So: who is responsible for releasing the dst_entry created by ip_route_input_noref in arp_process?
Kind Regards
Oliver