Re: [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under high next hop churn

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Ido Schimmel <idosch@idosch.org>
To: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
Cc: netdev@vger.kernel.org, adrian.oliver@menlosecurity.com,
	Adrian Oliver <kernel@aoliver.ca>,
	David Ahern <dsahern@gmail.com>,
	Eric Dumazet <edumazet@google.com>,
	Kuniyuki Iwashima <kuniyu@amazon.com>,
	Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Date: Sun, 20 Oct 2024 13:55:13 +0300	[thread overview]
Message-ID: <ZxThkZZBCAnuJf8Y@shredder.mtl.com> (raw)
In-Reply-To: <20241019034958.197329-1-omid.ehtemamhaghighi@menlosecurity.com>

On Fri, Oct 18, 2024 at 08:49:58PM -0700, Omid Ehtemam-Haghighi wrote:
> Soft lockups have been observed on a cluster of Linux-based edge routers
> located in a highly dynamic environment. Using the `bird` service, these
> routers continuously update BGP-advertised routes due to frequently
> changing nexthop destinations, while also managing significant IPv6
> traffic. The lockups occur during the traversal of the multipath
> circular linked-list in the `fib6_select_path` function, particularly
> while iterating through the siblings in the list. The issue typically
> arises when the nodes of the linked list are unexpectedly deleted
> concurrently on a different core—indicated by their 'next' and
> 'previous' elements pointing back to the node itself and their reference
> count dropping to zero. This results in an infinite loop, leading to a
> soft lockup that triggers a system panic via the watchdog timer.
> 
> Apply RCU primitives in the problematic code sections to resolve the
> issue. Where necessary, update the references to fib6_siblings to
> annotate or use the RCU APIs.
> 
> Include a test script that reproduces the issue. The script
> periodically updates the routing table while generating a heavy load
> of outgoing IPv6 traffic through multiple iperf3 clients. It
> consistently induces infinite soft lockups within a couple of minutes.

[...]

> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
> Reported-by: Adrian Oliver <kernel@aoliver.ca>
> Tested-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
> Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>

You can drop your Tested-by given you authored the patch. It's implicit.

> Cc: David Ahern <dsahern@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Ido Schimmel <idosch@idosch.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Paolo Abeni <pabeni@redhat.com>

Please use scripts/get_maintainer.pl to get the full list of people to
copy.

[...]

> @@ -5195,14 +5195,18 @@ static void ip6_route_mpath_notify(struct fib6_info *rt,
>  	 * nexthop. Since sibling routes are always added at the end of
>  	 * the list, find the first sibling of the last route appended
>  	 */
> +	rcu_read_lock();
> +
>  	if ((nlflags & NLM_F_APPEND) && rt_last && rt_last->fib6_nsiblings) {
> -		rt = list_first_entry(&rt_last->fib6_siblings,
> -				      struct fib6_info,
> -				      fib6_siblings);
> +		rt = list_first_or_null_rcu(&rt_last->fib6_siblings,
> +					    struct fib6_info,
> +					    fib6_siblings);
>  	}
>  
>  	if (rt)
>  		inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);

Ran your test with a debug config [1] and got the following splat [2].
Fixed by [3]. Beside ip6_route_mpath_notify() all the callers of
inet6_rt_notify() already call it from an atomic context.

> +
> +	rcu_read_unlock();
>  }

[1] https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style

[2]
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3730, name: ip
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by ip/3730:
 #0: ffffffff86fc7c30 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60
 #1: ffffffff8655b3e0 (rcu_read_lock){....}-{1:2}, at: ip6_route_mpath_notify+0x75/0x330
CPU: 5 UID: 0 PID: 3730 Comm: ip Tainted: G        W          6.12.0-rc3-custom-virtme-g38827a50efa3 #134
Tainted: [W]=WARN
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
 <TASK>
 dump_stack_lvl+0xba/0x110
 __might_resched.cold+0x1ed/0x253
 kmem_cache_alloc_node_noprof+0x2ab/0x310
 __alloc_skb+0x2da/0x3a0
 inet6_rt_notify+0xf6/0x2a0
 ip6_route_mpath_notify+0x12c/0x330
 ip6_route_multipath_add+0xcc7/0x1f70
 inet6_rtm_newroute+0xfb/0x170
 rtnetlink_rcv_msg+0x3cc/0xf60
 netlink_rcv_skb+0x171/0x450
 netlink_unicast+0x539/0x7f0
 netlink_sendmsg+0x8c1/0xd80
 ____sys_sendmsg+0x8f9/0xc20
 ___sys_sendmsg+0x197/0x1e0
 __sys_sendmsg+0x122/0x1f0
 do_syscall_64+0xbb/0x1d0
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

[3]
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d1065a0edc19..b9b986bda943 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6192,7 +6192,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
        err = -ENOBUFS;
        seq = info->nlh ? info->nlh->nlmsg_seq : 0;
 
-       skb = nlmsg_new(rt6_nlmsg_size(rt), gfp_any());
+       skb = nlmsg_new(rt6_nlmsg_size(rt), GFP_ATOMIC);
        if (!skb)
                goto errout;
 
@@ -6205,7 +6205,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
                goto errout;
        }
        rtnl_notify(skb, net, info->portid, RTNLGRP_IPV6_ROUTE,
-                   info->nlh, gfp_any());
+                   info->nlh, GFP_ATOMIC);
        return;
 errout:
        rtnl_set_sk_err(net, RTNLGRP_IPV6_ROUTE, err);

[...]

> diff --git a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh
> new file mode 100755
> index 000000000000..d257fbf0b0a3
> --- /dev/null
> +++ b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh

[...]

> +	# Check if any iperf3 instances are still running. This could occur if a core has entered an infinite loop and
> +	# the timeout for the soft lockup to be detected has not expired yet, but either the test interval has already
> +	# elapsed or the test is terminated manually (Ctrl+C)

I didn't thoroughly review the test and only ran it to make sure it's
passing, but I suggest wrapping some of the comments there to 80
characters to avoid checkpatch warnings.

     prev parent reply	other threads:[~2024-10-20 10:55 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-19  3:49 [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under high next hop churn Omid Ehtemam-Haghighi
2024-10-20 10:55 ` Ido Schimmel [this message]

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:d1065a0edc1 dfblob:b9b986bda94 )
 OR (
bs:"Re: [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under high next hop churn" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZxThkZZBCAnuJf8Y@shredder.mtl.com \
    --to=idosch@idosch.org \
    --cc=adrian.oliver@menlosecurity.com \
    --cc=dsahern@gmail.com \
    --cc=edumazet@google.com \
    --cc=kernel@aoliver.ca \
    --cc=kuniyu@amazon.com \
    --cc=netdev@vger.kernel.org \
    --cc=omid.ehtemamhaghighi@menlosecurity.com \
    --cc=pabeni@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).