All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] ipv6: fix hangup on device removal
@ 2024-10-31  8:53 Paolo Abeni
  2024-10-31  8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
  2024-10-31  8:53 ` [PATCH net-next 2/2] selftests: net: really check for bg process completion Paolo Abeni
  0 siblings, 2 replies; 5+ messages in thread
From: Paolo Abeni @ 2024-10-31  8:53 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Simon Horman, Shuah Khan

This addresses the infamous unregister_netdevice splat in net selftests;
the actual fix is carried by the first patch, while the 2nd one
addresses a related problem in the relevant test that was patially
hiding the problem.

Targeting net-next as the issue is quite old and I feel a little lost
in the fib info/nh jungle.

Paolo Abeni (2):
  ipv6: release nexthop on device removal
  selftests: net: really check for bg process completion

 net/ipv6/route.c                    | 7 ++++---
 tools/testing/selftests/net/pmtu.sh | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

-- 
2.45.2


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH net-next 1/2] ipv6: release nexthop on device removal
  2024-10-31  8:53 [PATCH net-next 0/2] ipv6: fix hangup on device removal Paolo Abeni
@ 2024-10-31  8:53 ` Paolo Abeni
  2024-11-05 12:19   ` Paolo Abeni
  2024-10-31  8:53 ` [PATCH net-next 2/2] selftests: net: really check for bg process completion Paolo Abeni
  1 sibling, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2024-10-31  8:53 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Simon Horman, Shuah Khan

The CI is hitting some aperiodic hangup at device removal time in the
pmtu.sh self-test:

unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 6
ref_tracker: veth_A-R1@ffff888013df15d8 has 1/5 users at
	dst_init+0x84/0x4a0
	dst_alloc+0x97/0x150
	ip6_dst_alloc+0x23/0x90
	ip6_rt_pcpu_alloc+0x1e6/0x520
	ip6_pol_route+0x56f/0x840
	fib6_rule_lookup+0x334/0x630
	ip6_route_output_flags+0x259/0x480
	ip6_dst_lookup_tail.constprop.0+0x5c2/0x940
	ip6_dst_lookup_flow+0x88/0x190
	udp_tunnel6_dst_lookup+0x2a7/0x4c0
	vxlan_xmit_one+0xbde/0x4a50 [vxlan]
	vxlan_xmit+0x9ad/0xf20 [vxlan]
	dev_hard_start_xmit+0x10e/0x360
	__dev_queue_xmit+0xf95/0x18c0
	arp_solicit+0x4a2/0xe00
	neigh_probe+0xaa/0xf0

While the first suspect is the dst_cache, explicitly tracking the dst
owing the last device reference via probes proved such dst is held by
the nexthop in the originating fib6_info.

Similar to commit f5b51fe804ec ("ipv6: route: purge exception on
removal"), we need to explicitly release the originating fib info when
disconnecting a to-be-removed device from a live ipv6 dst: move the
fib6_info cleanup into ip6_dst_ifdown().

Tested running:

./pmtu.sh cleanup_ipv6_exception

in a tight loop for more than 400 iterations with no spat, running an
unpatched kernel  I observed a splat every ~10 iterations.

Fixes: f88d8ea67fbd ("ipv6: Plumb support for nexthop object in a fib6_info")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/route.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d7ce5cf2017a..ef55f330dcda 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -187,6 +187,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev)
 						   GFP_ATOMIC);
 				handled = true;
 			}
+
 			if (handled)
 				list_del_init(&rt->dst.rt_uncached);
 		}
@@ -374,6 +375,7 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev)
 {
 	struct rt6_info *rt = dst_rt6_info(dst);
 	struct inet6_dev *idev = rt->rt6i_idev;
+	struct fib6_info *from;
 
 	if (idev && idev->dev != blackhole_netdev) {
 		struct inet6_dev *blackhole_idev = in6_dev_get(blackhole_netdev);
@@ -383,6 +385,8 @@ static void ip6_dst_ifdown(struct dst_entry *dst, struct net_device *dev)
 			in6_dev_put(idev);
 		}
 	}
+	from = unrcu_pointer(xchg(&rt->from, NULL));
+	fib6_info_release(from);
 }
 
 static bool __rt6_check_expired(const struct rt6_info *rt)
@@ -1455,7 +1459,6 @@ static DEFINE_SPINLOCK(rt6_exception_lock);
 static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
 				 struct rt6_exception *rt6_ex)
 {
-	struct fib6_info *from;
 	struct net *net;
 
 	if (!bucket || !rt6_ex)
@@ -1467,8 +1470,6 @@ static void rt6_remove_exception(struct rt6_exception_bucket *bucket,
 	/* purge completely the exception to allow releasing the held resources:
 	 * some [sk] cache may keep the dst around for unlimited time
 	 */
-	from = unrcu_pointer(xchg(&rt6_ex->rt6i->from, NULL));
-	fib6_info_release(from);
 	dst_dev_put(&rt6_ex->rt6i->dst);
 
 	hlist_del_rcu(&rt6_ex->hlist);
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH net-next 2/2] selftests: net: really check for bg process completion
  2024-10-31  8:53 [PATCH net-next 0/2] ipv6: fix hangup on device removal Paolo Abeni
  2024-10-31  8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
@ 2024-10-31  8:53 ` Paolo Abeni
  1 sibling, 0 replies; 5+ messages in thread
From: Paolo Abeni @ 2024-10-31  8:53 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Simon Horman, Shuah Khan

A recent refactor transformed the check for process completion
in a true statement, due to a typo.

As a result, the relevant test-case is unable to catch the
regression it was supposed to detect.

Restore the correct condition.

Fixes: 691bb4e49c98 ("selftests: net: avoid just another constant wait")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 tools/testing/selftests/net/pmtu.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 569bce8b6383..6c651c880fe8 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -2056,7 +2056,7 @@ check_running() {
 	pid=${1}
 	cmd=${2}
 
-	[ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "{cmd}" ]
+	[ "$(cat /proc/${pid}/cmdline 2>/dev/null | tr -d '\0')" = "${cmd}" ]
 }
 
 test_cleanup_vxlanX_exception() {
-- 
2.45.2


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next 1/2] ipv6: release nexthop on device removal
  2024-10-31  8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
@ 2024-11-05 12:19   ` Paolo Abeni
  2024-11-05 12:29     ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Paolo Abeni @ 2024-11-05 12:19 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, David Ahern, Eric Dumazet, Jakub Kicinski,
	Simon Horman, Shuah Khan

On 10/31/24 09:53, Paolo Abeni wrote:
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index d7ce5cf2017a..ef55f330dcda 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -187,6 +187,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev)
>  						   GFP_ATOMIC);
>  				handled = true;
>  			}
> +

Please do not include unrelated whitespace chang... wait! I'm talking to
myself... in any case a v2 is needed.

/P


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH net-next 1/2] ipv6: release nexthop on device removal
  2024-11-05 12:19   ` Paolo Abeni
@ 2024-11-05 12:29     ` Eric Dumazet
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Dumazet @ 2024-11-05 12:29 UTC (permalink / raw)
  To: Paolo Abeni
  Cc: netdev, David S. Miller, David Ahern, Jakub Kicinski,
	Simon Horman, Shuah Khan

On Tue, Nov 5, 2024 at 1:19 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 10/31/24 09:53, Paolo Abeni wrote:
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index d7ce5cf2017a..ef55f330dcda 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -187,6 +187,7 @@ static void rt6_uncached_list_flush_dev(struct net_device *dev)
> >                                                  GFP_ATOMIC);
> >                               handled = true;
> >                       }
> > +
>
> Please do not include unrelated whitespace chang... wait! I'm talking to
> myself... in any case a v2 is needed.

Oh, I missed this patch, thanks a lot for sorting it out.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-11-05 12:30 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-31  8:53 [PATCH net-next 0/2] ipv6: fix hangup on device removal Paolo Abeni
2024-10-31  8:53 ` [PATCH net-next 1/2] ipv6: release nexthop " Paolo Abeni
2024-11-05 12:19   ` Paolo Abeni
2024-11-05 12:29     ` Eric Dumazet
2024-10-31  8:53 ` [PATCH net-next 2/2] selftests: net: really check for bg process completion Paolo Abeni

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.