* [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
@ 2026-06-24 17:11 Pengfei Zhang
2026-06-24 17:22 ` Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Pengfei Zhang @ 2026-06-24 17:11 UTC
To: dsahern, idosch
Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
chenzhangqi, baohua, Pengfei Zhang, Pengfei Zhang
From: Pengfei Zhang <zhangpengfei16@xiaomi.com>
inet6_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain. Between batches the RTNL lock
is released, so a concurrent fib6_new_table() can insert a new table
at the chain head, shifting all existing entries. The saved index
then lands on a different table, causing fib6_dump_table() to set
w->root to the wrong table while w->node still points into the
previous one. fib6_walk_continue() dereferences w->node->parent
(NULL) and panics:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:fib6_walk_continue+0x6e/0x170
Call Trace:
<TASK>
fib6_dump_table.isra.0+0xc5/0x240
inet6_dump_fib+0xf6/0x420
rtnl_dumpit+0x30/0xa0
netlink_dump+0x15b/0x460
netlink_recvmsg+0x1d6/0x2a0
____sys_recvmsg+0x17a/0x190
Fix by storing tb->tb6_id in cb->args[1] instead of a positional
index. On resume, skip entries until the id matches; a concurrent
head-insert can never match the saved id, so the walker always
resumes on the correct table.
Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
---
The same crash was independently reported in a production environment
(kernel 5.15.137, triggered by ovs-vswitchd issuing RTM_GETROUTE):
https://lkml.iu.edu/hypermail/linux/kernel/2402.3/02068.html
The crash is probabilistic and occurs in fib6_walk_continue() at the
FWS_U state:
case FWS_U:
	if (fn == w->root)
		return 0;
	pn = rcu_dereference_protected(fn->parent, 1);
	left = rcu_dereference_protected(pn->left, 1);  /* crash here */
The crash dump shows fn->parent is NULL. At first glance this looks
like fn is a leaf node whose parent was freed, but closer inspection of
the walker state shows that fn->fn_flags has RTN_ROOT set: fn is itself
the root node of a routing table, not a child node. A root node has no
parent by definition, so fn->parent == NULL is correct for that node.
The real question is why fn != w->root despite fn being a root. The
answer is that w->root and fn belong to *different* tables: w->node
(which became fn during traversal) still references a node from the
table that was being dumped when the batch suspended, while w->root was
silently redirected to a different table on resume.
This misdirection happens because inet6_dump_fib() uses a positional
index to resume across batches. Consider a hash slot containing two
tables [A(pos=0), B(pos=1)] where B is large enough to require multiple
batches. On the first batch, B suspends mid-walk and the loop saves:
cb->args[1] = e; /* e=1, position of B in the chain */
The RTNL lock is then released. At this point a concurrent
fib6_new_table() inserts table C at the chain head via
hlist_add_head_rcu(), making the chain [C(pos=0), A(pos=1), B(pos=2)].
On the next batch, inet6_dump_fib() resumes with s_e=1 and iterates:
s_e = cb->args[1];                        /* s_e = 1 */
hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
	if (e < s_e)                      /* skip C at pos=0 */
		goto next;
	/* e=1: tb now points to A, not B */
	fib6_dump_table(tb, skb, cb);     /* called with wrong table A */
next:
	e++;
}
Inside fib6_dump_table(), w->root is unconditionally overwritten
before the resume branch is entered:
w->root = &table->tb6_root;             /* now A's root */
if (cb->args[4] == 0) {
	/* ... no walk in progress: start a fresh fib6_walk() ... */
} else {
	int sernum = READ_ONCE(w->root->fn_sernum);  /* A's sernum */
	if (cb->args[5] != sernum) {
		/* sernum changed: safe reset, w->node = w->root (A) */
		w->node = w->root;
	} else {
		/* sernum unchanged: w->node untouched, still in B */
		w->skip = 0;
	}
	fib6_walk_continue(w);  /* sernum equal: w->root=A, w->node=B */
}
The sernum guard was intended to detect tree modifications and reset
the walk, but it now compares cb->args[5], saved from B's root, against
A's root. Here the two tables happen to share the same fn_sernum value
(a global flush had previously unified them), so the guard does not
fire and w->node is left pointing into B's tree.
From this point w->root and w->node belong to different tables. When
fib6_walk_continue() traverses upward and reaches B's root node
(fn->fn_flags & RTN_ROOT), the exit check:
if (fn == w->root)      /* B's root != A's root, check fails */
	return 0;
pn = fn->parent;        /* B's root has no parent: pn == NULL */
left = pn->left;        /* NULL deref -> crash */
net/ipv6/ip6_fib.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
-	unsigned int e = 0, s_e;
 	struct hlist_head *head;
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	unsigned int h, s_h;
+	u32 s_id;
 	int err = 0;
 
 	rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 	s_h = cb->args[0];
-	s_e = cb->args[1];
+	s_id = cb->args[1];
 
-	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
-		e = 0;
+	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
-			if (e < s_e)
-				goto next;
+			if (s_id && tb->tb6_id != s_id)
+				continue;
+			s_id = 0;
+
+			cb->args[1] = tb->tb6_id;
 			err = fib6_dump_table(tb, skb, cb);
 			if (err != 0)
 				goto out;
-next:
-			e++;
 		}
 	}
 out:
-	cb->args[1] = e;
 	cb->args[0] = h;
 
 unlock:
--
2.34.1
* Re: [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
2026-06-24 17:11 [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump Pengfei Zhang
@ 2026-06-24 17:22 ` Eric Dumazet
2026-06-25 1:23 ` Pengfei Zhang
2026-06-25 1:23 ` [PATCH v2] " Pengfei Zhang
2 siblings, 0 replies; 4+ messages in thread
From: Eric Dumazet @ 2026-06-24 17:22 UTC
To: Pengfei Zhang
Cc: dsahern, idosch, davem, kuba, pabeni, horms, netdev, linux-kernel,
chenzhangqi, baohua, Pengfei Zhang
On Wed, Jun 24, 2026 at 10:12 AM Pengfei Zhang <zhangfeionline@gmail.com> wrote:
>
> From: Pengfei Zhang <zhangpengfei16@xiaomi.com>
>
> inet6_dump_fib() saves its progress in cb->args[1] as a positional
> index within the current hash chain. Between batches the RTNL lock
> is released, so a concurrent fib6_new_table() can insert a new table
> at the chain head, shifting all existing entries. The saved index
> then lands on a different table, causing fib6_dump_table() to set
> w->root to the wrong table while w->node still points into the
> previous one. fib6_walk_continue() dereferences w->node->parent
> (NULL) and panics:
>
> BUG: kernel NULL pointer dereference, address: 0000000000000008
> RIP: 0010:fib6_walk_continue+0x6e/0x170
> Call Trace:
> <TASK>
> fib6_dump_table.isra.0+0xc5/0x240
> inet6_dump_fib+0xf6/0x420
> rtnl_dumpit+0x30/0xa0
> netlink_dump+0x15b/0x460
> netlink_recvmsg+0x1d6/0x2a0
> ____sys_recvmsg+0x17a/0x190
>
> Fix by storing tb->tb6_id in cb->args[1] instead of a positional
> index. On resume, skip entries until the id matches; a concurrent
> head-insert can never match the saved id, so the walker always
> resumes on the correct table.
>
> Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
Patch looks good, but you forgot to add a Fixes: tag
Perhaps:
Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
* [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
2026-06-24 17:11 [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump Pengfei Zhang
2026-06-24 17:22 ` Eric Dumazet
@ 2026-06-25 1:23 ` Pengfei Zhang
2026-06-25 1:23 ` [PATCH v2] " Pengfei Zhang
2 siblings, 0 replies; 4+ messages in thread
From: Pengfei Zhang @ 2026-06-25 1:23 UTC
To: dsahern, idosch
Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
chenzhangqi, baohua, Pengfei Zhang, Pengfei Zhang
From: Pengfei Zhang <zhangpengfei16@xiaomi.com>
inet6_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain. Between batches the RTNL lock
is released, so a concurrent fib6_new_table() can insert a new table
at the chain head, shifting all existing entries. The saved index
then lands on a different table, causing fib6_dump_table() to set
w->root to the wrong table while w->node still points into the
previous one. fib6_walk_continue() dereferences w->node->parent
(NULL) and panics:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:fib6_walk_continue+0x6e/0x170
Call Trace:
<TASK>
fib6_dump_table.isra.0+0xc5/0x240
inet6_dump_fib+0xf6/0x420
rtnl_dumpit+0x30/0xa0
netlink_dump+0x15b/0x460
netlink_recvmsg+0x1d6/0x2a0
____sys_recvmsg+0x17a/0x190
Fix by storing tb->tb6_id in cb->args[1] instead of a positional
index. On resume, skip entries until the id matches; a concurrent
head-insert can never match the saved id, so the walker always
resumes on the correct table.
Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
---
net/ipv6/ip6_fib.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
-	unsigned int e = 0, s_e;
 	struct hlist_head *head;
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	unsigned int h, s_h;
+	u32 s_id;
 	int err = 0;
 
 	rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 	s_h = cb->args[0];
-	s_e = cb->args[1];
+	s_id = cb->args[1];
 
-	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
-		e = 0;
+	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
-			if (e < s_e)
-				goto next;
+			if (s_id && tb->tb6_id != s_id)
+				continue;
+			s_id = 0;
+
+			cb->args[1] = tb->tb6_id;
 			err = fib6_dump_table(tb, skb, cb);
 			if (err != 0)
 				goto out;
-next:
-			e++;
 		}
 	}
 out:
-	cb->args[1] = e;
 	cb->args[0] = h;
 
 unlock:
--
2.34.1
* [PATCH v2] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
2026-06-24 17:11 [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump Pengfei Zhang
2026-06-24 17:22 ` Eric Dumazet
2026-06-25 1:23 ` Pengfei Zhang
@ 2026-06-25 1:23 ` Pengfei Zhang
2 siblings, 0 replies; 4+ messages in thread
From: Pengfei Zhang @ 2026-06-25 1:23 UTC
To: dsahern, idosch
Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
chenzhangqi, baohua, Pengfei Zhang, Pengfei Zhang
From: Pengfei Zhang <zhangpengfei16@xiaomi.com>
inet6_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain. Between batches the RTNL lock
is released, so a concurrent fib6_new_table() can insert a new table
at the chain head, shifting all existing entries. The saved index
then lands on a different table, causing fib6_dump_table() to set
w->root to the wrong table while w->node still points into the
previous one. fib6_walk_continue() dereferences w->node->parent
(NULL) and panics:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:fib6_walk_continue+0x6e/0x170
Call Trace:
<TASK>
fib6_dump_table.isra.0+0xc5/0x240
inet6_dump_fib+0xf6/0x420
rtnl_dumpit+0x30/0xa0
netlink_dump+0x15b/0x460
netlink_recvmsg+0x1d6/0x2a0
____sys_recvmsg+0x17a/0x190
Fix by storing tb->tb6_id in cb->args[1] instead of a positional
index. On resume, skip entries until the id matches; a concurrent
head-insert can never match the saved id, so the walker always
resumes on the correct table.
Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
---
net/ipv6/ip6_fib.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
-	unsigned int e = 0, s_e;
 	struct hlist_head *head;
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	unsigned int h, s_h;
+	u32 s_id;
 	int err = 0;
 
 	rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 	s_h = cb->args[0];
-	s_e = cb->args[1];
+	s_id = cb->args[1];
 
-	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
-		e = 0;
+	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
-			if (e < s_e)
-				goto next;
+			if (s_id && tb->tb6_id != s_id)
+				continue;
+			s_id = 0;
+
+			cb->args[1] = tb->tb6_id;
 			err = fib6_dump_table(tb, skb, cb);
 			if (err != 0)
 				goto out;
-next:
-			e++;
 		}
 	}
 out:
-	cb->args[1] = e;
 	cb->args[0] = h;
 
 unlock:
--
2.34.1