From: Pengfei Zhang <zhangfeionline@gmail.com>
To: dsahern@kernel.org, idosch@nvidia.com
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
pabeni@redhat.com, horms@kernel.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, chenzhangqi@xiaomi.com,
baohua@kernel.org, Pengfei Zhang <zhangpengfei16@xiaomi.com>,
Pengfei Zhang <zhangfeionline@gmail.com>
Subject: [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
Date: Thu, 25 Jun 2026 01:11:56 +0800
Message-ID: <20260624171156.822055-1-zhangfeionline@gmail.com>
From: Pengfei Zhang <zhangpengfei16@xiaomi.com>
inet6_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain. Between batches the RTNL lock
is released, so a concurrent fib6_new_table() can insert a new table
at the chain head, shifting all existing entries. The saved index
then lands on a different table, causing fib6_dump_table() to set
w->root to the wrong table while w->node still points into the
previous one. fib6_walk_continue() then never meets w->root, climbs to
the previous table's root, dereferences that node's NULL parent and
panics:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:fib6_walk_continue+0x6e/0x170
Call Trace:
<TASK>
fib6_dump_table.isra.0+0xc5/0x240
inet6_dump_fib+0xf6/0x420
rtnl_dumpit+0x30/0xa0
netlink_dump+0x15b/0x460
netlink_recvmsg+0x1d6/0x2a0
____sys_recvmsg+0x17a/0x190
Fix by storing tb->tb6_id in cb->args[1] instead of a positional
index. On resume, skip entries until the id matches; a concurrent
head-insert can never match the saved id, so the walker always
resumes on the correct table.
Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
---
The same crash was independently reported in a production environment
(kernel 5.15.137, triggered by ovs-vswitchd issuing RTM_GETROUTE):
https://lkml.iu.edu/hypermail/linux/kernel/2402.3/02068.html
The crash is probabilistic and occurs in fib6_walk_continue() at the
FWS_U state:
    case FWS_U:
        if (fn == w->root)
            return 0;
        pn = rcu_dereference_protected(fn->parent, 1);
        left = rcu_dereference_protected(pn->left, 1);  /* crash here */
The crash dump shows fn->parent is NULL. At first glance this looks
like fn is a leaf node whose parent was freed, but closer inspection of
the walker state reveals that fn->fn_flags has RTN_ROOT set: fn is
itself the root node of a routing table, not a child node. A root node
has no parent by definition, so fn->parent == NULL is correct for that
node.
The real question is why fn != w->root despite fn being a root. The
answer is that w->root and fn belong to *different* tables: w->node
(which became fn during traversal) still references a node from the
table that was being dumped when the batch suspended, while w->root was
silently redirected to a different table on resume.
This misdirection happens because inet6_dump_fib() uses a positional
index to resume across batches. Consider a hash slot containing two
tables [A(pos=0), B(pos=1)] where B is large enough to require multiple
batches. On the first batch, B suspends mid-walk and the loop saves:
cb->args[1] = e; /* e=1, position of B in the chain */
The RTNL lock is then released. At this point a concurrent
fib6_new_table() inserts table C at the chain head via
hlist_add_head_rcu(), making the chain [C(pos=0), A(pos=1), B(pos=2)].
On the next batch, inet6_dump_fib() resumes with s_e=1 and iterates:
    s_e = cb->args[1];                      /* s_e = 1 */
    hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
        if (e < s_e)                        /* skip C at pos=0 */
            goto next;
        /* e=1: tb now points to A, not B */
        fib6_dump_table(tb, skb, cb);       /* called with wrong table A */
    }
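The head-insert scenario above can be reproduced in a small userspace
model. This is only a sketch: struct table, add_head(), resume_by_pos()
and resume_by_id() are hypothetical names, not kernel code, and the ids
10/20/30 used for A/B/C in the note below are arbitrary. It compares a
positional resume with the id-based resume that the patch switches to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fib6_table on one hash chain. */
struct table {
	uint32_t id;		/* plays the role of tb->tb6_id */
	struct table *next;	/* plays the role of tb6_hlist */
};

/* Insert at the chain head, as hlist_add_head_rcu() does. */
static struct table *add_head(struct table *head, uint32_t id)
{
	struct table *t = malloc(sizeof(*t));

	assert(t != NULL);
	t->id = id;
	t->next = head;
	return t;
}

/* Old scheme: resume at the saved position s_e in the chain. */
static uint32_t resume_by_pos(const struct table *head, unsigned int s_e)
{
	unsigned int e = 0;

	for (const struct table *t = head; t; t = t->next, e++)
		if (e >= s_e)
			return t->id;
	return 0;	/* chain is shorter than the saved position */
}

/* New scheme: resume at the entry whose id equals the saved id. */
static uint32_t resume_by_id(const struct table *head, uint32_t s_id)
{
	for (const struct table *t = head; t; t = t->next)
		if (t->id == s_id)
			return t->id;
	return 0;	/* saved table is not on this chain */
}
```

With the chain [A(10), B(20)], both helpers resume on B for a saved
position of 1 or a saved id of 20. After add_head() puts C(30) in front,
resume_by_pos(head, 1) returns A's id while resume_by_id(head, 20) still
returns B's.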
Inside fib6_dump_table(), w->root is unconditionally overwritten
before the resume branch is entered:
    w->root = &table->tb6_root;     /* now A's root */
    /* ... */
    } else {
        int sernum = READ_ONCE(w->root->fn_sernum);  /* A's sernum */
        if (cb->args[5] != sernum) {
            /* sernum changed: safe reset, w->node = w->root (A) */
            w->node = w->root;
        } else {
            /* sernum unchanged: w->node untouched, still in B */
            w->skip = 0;
        }
        fib6_walk_continue(w);  /* sernum equal: w->root=A, w->node=B */
    }
The sernum guard was intended to detect tree modifications and reset
the walk, but here the two tables happen to share the same fn_sernum
value (a global flush had previously unified them), so the guard does
not fire and w->node is left pointing into B's tree.
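To make the guard's blind spot concrete, here is a reduced model of the
resume branch. It is a sketch under stated assumptions: struct node,
struct walker and resume() are hypothetical simplifications, and only
the sernum comparison is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced node: only the serial number matters here. */
struct node {
	int sernum;		/* plays the role of fn_sernum */
};

/* Hypothetical reduced walker: just the two pointers in question. */
struct walker {
	const struct node *root;
	const struct node *node;
};

/*
 * Model of the resume path: the root is overwritten first, then the
 * saved sernum (cb->args[5] in the kernel) is compared with the new
 * root's sernum.
 */
static void resume(struct walker *w, const struct node *new_root,
		   int saved_sernum)
{
	assert(w != NULL && new_root != NULL);

	w->root = new_root;
	if (saved_sernum != new_root->sernum)
		w->node = w->root;	/* guard fires: walk restarts at root */
	/* else: w->node is left as is, even if it lives in another tree */
}
```

If A's root and B's root carry the same sernum, calling resume() with
A's root leaves w->node inside B. With differing sernums the walk is
reset to A's root and the mismatch never arises.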
From this point w->root and w->node belong to different tables. When
fib6_walk_continue() traverses upward and reaches B's root node
(fn->fn_flags & RTN_ROOT), the exit check fails and the walker
dereferences the NULL parent:
    if (fn == w->root)      /* B's root != A's root, check fails */
        return 0;
    pn = fn->parent;        /* B's root has no parent: pn == NULL */
    left = pn->left;        /* NULL deref -> crash */
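The upward walk can be modelled in userspace to show why the exit check
is never satisfied. This is a minimal sketch: struct node, enum
walk_result and walk_up() are hypothetical, only the FWS_U steps are
modelled, and a NULL parent is reported instead of being dereferenced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct fib6_node. */
struct node {
	struct node *parent;	/* NULL for a table root */
};

enum walk_result {
	WALK_DONE,		/* reached the walker's root: normal exit */
	WALK_NULL_PARENT,	/* hit a foreign root: the kernel oopses here */
};

/*
 * Climb from 'fn' until 'root' is reached. Unlike the kernel code, a
 * NULL parent is detected and reported rather than dereferenced.
 */
static enum walk_result walk_up(const struct node *fn,
				const struct node *root)
{
	assert(fn != NULL && root != NULL);

	for (;;) {
		if (fn == root)
			return WALK_DONE;
		if (fn->parent == NULL)
			return WALK_NULL_PARENT;
		fn = fn->parent;
	}
}
```

Starting from a node in B with B's root as the walker root yields
WALK_DONE. Starting from the same node with A's root yields
WALK_NULL_PARENT, which corresponds to the pn->left dereference in the
trace.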
net/ipv6/ip6_fib.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
};
const struct nlmsghdr *nlh = cb->nlh;
struct net *net = sock_net(skb->sk);
- unsigned int e = 0, s_e;
struct hlist_head *head;
struct fib6_walker *w;
struct fib6_table *tb;
unsigned int h, s_h;
+ u32 s_id;
int err = 0;
rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
}
s_h = cb->args[0];
- s_e = cb->args[1];
+ s_id = cb->args[1];
- for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
- e = 0;
+ for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
head = &net->ipv6.fib_table_hash[h];
hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
- if (e < s_e)
- goto next;
+ if (s_id && tb->tb6_id != s_id)
+ continue;
+ s_id = 0;
+
+ cb->args[1] = tb->tb6_id;
err = fib6_dump_table(tb, skb, cb);
if (err != 0)
goto out;
-next:
- e++;
}
}
out:
- cb->args[1] = e;
cb->args[0] = h;
unlock:
--
2.34.1