From: Ido Schimmel <idosch@idosch.org>
To: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
Cc: netdev@vger.kernel.org, adrian.oliver@menlosecurity.com,
Adrian Oliver <kernel@aoliver.ca>,
David Ahern <dsahern@gmail.com>,
Eric Dumazet <edumazet@google.com>,
Kuniyuki Iwashima <kuniyu@amazon.com>,
Paolo Abeni <pabeni@redhat.com>
Subject: Re: [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under high next hop churn
Date: Sun, 20 Oct 2024 13:55:13 +0300 [thread overview]
Message-ID: <ZxThkZZBCAnuJf8Y@shredder.mtl.com> (raw)
In-Reply-To: <20241019034958.197329-1-omid.ehtemamhaghighi@menlosecurity.com>
On Fri, Oct 18, 2024 at 08:49:58PM -0700, Omid Ehtemam-Haghighi wrote:
> Soft lockups have been observed on a cluster of Linux-based edge routers
> located in a highly dynamic environment. Using the `bird` service, these
> routers continuously update BGP-advertised routes due to frequently
> changing nexthop destinations, while also managing significant IPv6
> traffic. The lockups occur during the traversal of the multipath
> circular linked-list in the `fib6_select_path` function, particularly
> while iterating through the siblings in the list. The issue typically
> arises when the nodes of the linked list are unexpectedly deleted
> concurrently on a different core—indicated by their 'next' and
> 'previous' elements pointing back to the node itself and their reference
> count dropping to zero. This results in an infinite loop, leading to a
> soft lockup that triggers a system panic via the watchdog timer.
>
> Apply RCU primitives in the problematic code sections to resolve the
> issue. Where necessary, update the references to fib6_siblings to
> annotate or use the RCU APIs.
>
> Include a test script that reproduces the issue. The script
> periodically updates the routing table while generating a heavy load
> of outgoing IPv6 traffic through multiple iperf3 clients. It
> consistently induces infinite soft lockups within a couple of minutes.
[...]
> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
> Reported-by: Adrian Oliver <kernel@aoliver.ca>
> Tested-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
> Signed-off-by: Omid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
You can drop your Tested-by given you authored the patch. It's implicit.
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Ido Schimmel <idosch@idosch.org>
> Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
Please use scripts/get_maintainer.pl to get the full list of people to
copy.
[...]
> @@ -5195,14 +5195,18 @@ static void ip6_route_mpath_notify(struct fib6_info *rt,
> * nexthop. Since sibling routes are always added at the end of
> * the list, find the first sibling of the last route appended
> */
> + rcu_read_lock();
> +
> if ((nlflags & NLM_F_APPEND) && rt_last && rt_last->fib6_nsiblings) {
> - rt = list_first_entry(&rt_last->fib6_siblings,
> - struct fib6_info,
> - fib6_siblings);
> + rt = list_first_or_null_rcu(&rt_last->fib6_siblings,
> + struct fib6_info,
> + fib6_siblings);
> }
>
> if (rt)
> inet6_rt_notify(RTM_NEWROUTE, rt, info, nlflags);
Ran your test with a debug config [1] and got the following splat [2].
Fixed by [3]. Beside ip6_route_mpath_notify() all the callers of
inet6_rt_notify() already call it from an atomic context.
> +
> + rcu_read_unlock();
> }
[1] https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style
[2]
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:321
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3730, name: ip
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
2 locks held by ip/3730:
#0: ffffffff86fc7c30 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x377/0xf60
#1: ffffffff8655b3e0 (rcu_read_lock){....}-{1:2}, at: ip6_route_mpath_notify+0x75/0x330
CPU: 5 UID: 0 PID: 3730 Comm: ip Tainted: G W 6.12.0-rc3-custom-virtme-g38827a50efa3 #134
Tainted: [W]=WARN
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
Call Trace:
<TASK>
dump_stack_lvl+0xba/0x110
__might_resched.cold+0x1ed/0x253
kmem_cache_alloc_node_noprof+0x2ab/0x310
__alloc_skb+0x2da/0x3a0
inet6_rt_notify+0xf6/0x2a0
ip6_route_mpath_notify+0x12c/0x330
ip6_route_multipath_add+0xcc7/0x1f70
inet6_rtm_newroute+0xfb/0x170
rtnetlink_rcv_msg+0x3cc/0xf60
netlink_rcv_skb+0x171/0x450
netlink_unicast+0x539/0x7f0
netlink_sendmsg+0x8c1/0xd80
____sys_sendmsg+0x8f9/0xc20
___sys_sendmsg+0x197/0x1e0
__sys_sendmsg+0x122/0x1f0
do_syscall_64+0xbb/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[3]
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index d1065a0edc19..b9b986bda943 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -6192,7 +6192,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
err = -ENOBUFS;
seq = info->nlh ? info->nlh->nlmsg_seq : 0;
- skb = nlmsg_new(rt6_nlmsg_size(rt), gfp_any());
+ skb = nlmsg_new(rt6_nlmsg_size(rt), GFP_ATOMIC);
if (!skb)
goto errout;
@@ -6205,7 +6205,7 @@ void inet6_rt_notify(int event, struct fib6_info *rt, struct nl_info *info,
goto errout;
}
rtnl_notify(skb, net, info->portid, RTNLGRP_IPV6_ROUTE,
- info->nlh, gfp_any());
+ info->nlh, GFP_ATOMIC);
return;
errout:
rtnl_set_sk_err(net, RTNLGRP_IPV6_ROUTE, err);
[...]
> diff --git a/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh
> new file mode 100755
> index 000000000000..d257fbf0b0a3
> --- /dev/null
> +++ b/tools/testing/selftests/net/ipv6_route_update_soft_lockup.sh
[...]
> + # Check if any iperf3 instances are still running. This could occur if a core has entered an infinite loop and
> + # the timeout for the soft lockup to be detected has not expired yet, but either the test interval has already
> + # elapsed or the test is terminated manually (Ctrl+C)
I didn't thoroughly review the test and only ran it to make sure it's
passing, but I suggest wrapping some of the comments there to 80
characters to avoid checkpatch warnings.
prev parent reply other threads:[~2024-10-20 10:55 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-19 3:49 [PATCH net v5] ipv6: Fix soft lockups in fib6_select_path under high next hop churn Omid Ehtemam-Haghighi
2024-10-20 10:55 ` Ido Schimmel [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZxThkZZBCAnuJf8Y@shredder.mtl.com \
--to=idosch@idosch.org \
--cc=adrian.oliver@menlosecurity.com \
--cc=dsahern@gmail.com \
--cc=edumazet@google.com \
--cc=kernel@aoliver.ca \
--cc=kuniyu@amazon.com \
--cc=netdev@vger.kernel.org \
--cc=omid.ehtemamhaghighi@menlosecurity.com \
--cc=pabeni@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).