* [PATCH net-next 0/2] ipv6: fix hangup on device removal
@ 2024-10-31 8:53 Paolo Abeni
2024-10-31 8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
2024-10-31 8:53 ` [PATCH net-next 2/2] selftests: net: really check for bg process completion Paolo Abeni
0 siblings, 2 replies; 5+ messages in thread
From: Paolo Abeni @ 2024-10-31 8:53 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Simon Horman, Shuah Khan
This addresses the infamous unregister_netdevice splat in net selftests;
the actual fix is carried by the first patch, while the 2nd one
addresses a related problem in the relevant test that was patially
hiding the problem.
Targeting net-next as the issue is quite old and I feel a little lost
in the fib info/nh jungle.
Paolo Abeni (2):
ipv6: release nexthop on device removal
selftests: net: really check for bg process completion
net/ipv6/route.c | 7 ++++---
tools/testing/selftests/net/pmtu.sh | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)
--
2.45.2
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net-next 1/2] ipv6: release nexthop on device removal
2024-10-31 8:53 [PATCH net-next 0/2] ipv6: fix hangup on device removal Paolo Abeni
@ 2024-10-31 8:53 ` Paolo Abeni
2024-11-05 12:19 ` Paolo Abeni
2024-10-31 8:53 ` [PATCH net-next 2/2] selftests: net: really check for bg process completion Paolo Abeni
1 sibling, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2024-10-31 8:53 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Simon Horman, Shuah Khan
The CI is hitting some aperiodic hangup at device removal time in the
pmtu.sh self-test:
unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 6
ref_tracker: veth_A-R1@ffff888013df15d8 has 1/5 users at
dst_init+0x84/0x4a0
dst_alloc+0x97/0x150
ip6_dst_alloc+0x23/0x90
ip6_rt_pcpu_alloc+0x1e6/0x520
ip6_pol_route+0x56f/0x840
fib6_rule_lookup+0x334/0x630
ip6_route_output_flags+0x259/0x480
ip6_dst_lookup_tail.constprop.0+0x5c2/0x940
ip6_dst_lookup_flow+0x88/0x190
udp_tunnel6_dst_lookup+0x2a7/0x4c0
vxlan_xmit_one+0xbde/0x4a50 [vxlan]
vxlan_xmit+0x9ad/0xf20 [vxlan]
dev_hard_start_xmit+0x10e/0x360
__dev_queue_xmit+0xf95/0x18c0
arp_solicit+0x4a2/0xe00
neigh_probe+0xaa/0xf0
While the first suspect is the dst_cache, explicitly tracking the dst
owing the last device reference via probes proved such dst is held by
the nexthop in the originating fib6_info.
Similar to commit f5b51fe804ec ("ipv6: route: purge exception on
removal"), we need to explicitly release the originating fib info when
disconnecting a to-be-removed device from a live ipv6 dst: move the
fib6_info cleanup into ip6_dst_ifdown().
Tested running:
./pmtu.sh cleanup_ipv6_exception
in a tight loop for more than 400 iterations with no spat, running an
unpatched kernel I observed a splat every ~10 iterations.
Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/ipv6/route.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d7ce5cf2017a..ef55f330dcda 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -187,6 +187,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev)
GFP_ATOMIC);
handled = true;
}
+
if (handled)
list_del_init(&rt->dst.rt_uncached);
}
@@ -374,6 +375,7 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev)
{
struct rt6_info *rt = dst_rt6_info(dst);
struct inet6_dev *idev = rt->rt6i_idev;
+ struct fib6_info *from;
if (idev && idev->dev != blackhole_netdev) {
struct inet6_dev *blackhole_idev = in6_dev_get(blackhole_netdev);
@@ -383,6 +385,8 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev)
in6_dev_put(idev);
}
}
+ from = unrcu_pointer(xchg(&rt->from, NULL));
+ fib6_info_release(from);
}
static bool __rt6_check_expired(const struct rt6_info *rt)
@@ -1455,7 +1459,6 @@ static DEFINE_SPINLOCK(rt6_exception_lock);
static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
struct rt6_exception *rt6_ex)
{
- struct fib6_info *from;
struct net *net;
if (!bucket || !rt6_ex)
@@ -1467,8 +1470,6 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
/* purge completely the exception to allow releasing the held resources:
* some [sk] cache may keep the dst around for unlimited time
*/
- from = unrcu_pointer(xchg(&rt6_ex->rt6i->from, NULL));
- fib6_info_release(from);
dst_dev_put(&rt6_ex->rt6i->dst);
hlist_del_rcu(&rt6_ex->hlist);
--
2.45.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next 2/2] selftests: net: really check for bg process completion
2024-10-31 8:53 [PATCH net-next 0/2] ipv6: fix hangup on device removal Paolo Abeni
2024-10-31 8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
@ 2024-10-31 8:53 ` Paolo Abeni
1 sibling, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2024-10-31 8:53 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Simon Horman, Shuah Khan
A recent refactor transformed the check for process completion
in a true statement, due to a typo.
As a result, the relevant test-case is unable to catch the
regression it was supposed to detect.
Restore the correct condition.
Fixes: 691bb4e49c98 ("selftests: net: avoid just another constant wait")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
tools/testing/selftests/net/pmtu.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 569bce8b6383..6c651c880fe8 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -2056,7 +2056,7 @@ check_running() {
pid=${1}
cmd=${2}
- [ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "{cmd}" ]
+ [ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "${cmd}" ]
}
test_cleanup_vxlanX_exception() {
--
2.45.2
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next 1/2] ipv6: release nexthop on device removal
2024-10-31 8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
@ 2024-11-05 12:19 ` Paolo Abeni
2024-11-05 12:29 ` Eric Dumazet
0 siblings, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2024-11-05 12:19 UTC (permalink / raw)
To: netdev
Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
Simon Horman, Shuah Khan
On 10/31/24 09:53, Paolo Abeni wrote:
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index d7ce5cf2017a..ef55f330dcda 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -187,6 +187,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev)
> GFP_ATOMIC);
> handled = true;
> }
> +
Please do not include unrelated whitespace chang... wait! I'm talking to
myself... in any case a v2 is needed.
/P
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH net-next 1/2] ipv6: release nexthop on device removal
2024-11-05 12:19 ` Paolo Abeni
@ 2024-11-05 12:29 ` Eric Dumazet
0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2024-11-05 12:29 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski,
Simon Horman, Shuah Khan
On Tue, Nov 5, 2024 at 1:19 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 10/31/24 09:53, Paolo Abeni wrote:
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index d7ce5cf2017a..ef55f330dcda 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -187,6 +187,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev)
> > GFP_ATOMIC);
> > handled = true;
> > }
> > +
>
> Please do not include unrelated whitespace chang... wait! I'm talking to
> myself... in any case a v2 is needed.
Oh, I missed this patch, thanks a lot for sorting it out.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-11-05 12:30 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-31 8:53 [PATCH net-next 0/2] ipv6: fix hangup on device removal Paolo Abeni
2024-10-31 8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
2024-11-05 12:19 ` Paolo Abeni
2024-11-05 12:29 ` Eric Dumazet
2024-10-31 8:53 ` [PATCH net-next 2/2] selftests: net: really check for bg process completion Paolo Abeni
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.