* [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
@ 2026-02-14 9:15 Hangbin Liu
2026-02-14 20:49 ` Jay Vosburgh
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Hangbin Liu @ 2026-02-14 9:15 UTC (permalink / raw)
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Bohac, Hangbin Liu, Liang Li,
Nikolay Aleksandrov
The ALB RX path may access rx_hashtbl concurrently with bond
teardown. During rapid bond up/down cycles, rlb_deinitialize()
frees rx_hashtbl while RX handlers are still running, leading
to a use-after-free detected by KASAN.
[ 214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
[ 214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
[ 214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
[ 214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
[ 214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
[ 214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
[ 214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
[ 214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
[ 214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
[ 214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
[ 214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
[ 214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
[ 214.286943] FS: 00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
[ 214.295966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
[ 214.310347] Call Trace:
[ 214.313070] <IRQ>
[ 214.315318] ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
[ 214.320975] bond_handle_frame+0x166/0xb60 [bonding]
[ 214.326537] ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
[ 214.332680] __netif_receive_skb_core.constprop.0+0x576/0x2710
[ 214.339199] ? __pfx_arp_process+0x10/0x10
[ 214.343775] ? sched_balance_find_src_group+0x98/0x630
[ 214.349513] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 214.356513] ? arp_rcv+0x307/0x690
[ 214.360311] ? __pfx_arp_rcv+0x10/0x10
[ 214.364499] ? __lock_acquire+0x58c/0xbd0
[ 214.368975] __netif_receive_skb_one_core+0xae/0x1b0
[ 214.374518] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 214.380743] ? lock_acquire+0x10b/0x140
[ 214.385026] process_backlog+0x3f1/0x13a0
[ 214.389502] ? process_backlog+0x3aa/0x13a0
[ 214.394174] __napi_poll.constprop.0+0x9f/0x370
[ 214.399233] net_rx_action+0x8c1/0xe60
[ 214.403423] ? __pfx_net_rx_action+0x10/0x10
[ 214.408193] ? lock_acquire.part.0+0xbd/0x260
[ 214.413058] ? sched_clock_cpu+0x6c/0x540
[ 214.417540] ? mark_held_locks+0x40/0x70
[ 214.421920] handle_softirqs+0x1fd/0x860
[ 214.426302] ? __pfx_handle_softirqs+0x10/0x10
[ 214.431264] ? __neigh_event_send+0x2d6/0xf50
[ 214.436131] do_softirq+0xb1/0xf0
[ 214.439830] </IRQ>
The issue is reproducible by looping ip link set bond0 up/down,
where rlb_arp_recv() can race with rlb_deinitialize() and dereference
a freed rx_hashtbl entry.
Fix this by setting recv_probe to NULL, and then calling synchronize_net()
to wait for any concurrent RX processing to finish. This ensures that no
RX handler can access rx_hashtbl after it is freed in
bond_alb_deinitialize().
Reported-by: Liang Li <liali@redhat.com>
Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
v2: make the description more clear (Suggested by Jay)
---
drivers/net/bonding/bond_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 47f13d86cb7e..8e1057a2a061 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4314,9 +4314,13 @@ static int bond_close(struct net_device *bond_dev)
 
 	bond_work_cancel_all(bond);
 	bond->send_peer_notif = 0;
+	bond->recv_probe = NULL;
+
+	/* Wait for any in-flight RX handlers */
+	synchronize_net();
+
 	if (bond_is_lb(bond))
 		bond_alb_deinitialize(bond);
-	bond->recv_probe = NULL;
 
 	if (BOND_MODE(bond) == BOND_MODE_8023AD &&
 	    bond->params.broadcast_neighbor)
--
2.50.1
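For readers following the fix, the ordering it relies on (clear the published handler, wait for in-flight readers, only then free the table) can be modelled in plain userspace C. The sketch below is only an illustration under stated assumptions: struct toy_bond, toy_probe(), toy_open(), toy_handle_frame() and toy_close() are hypothetical names, a C11 atomic reader counter stands in for the RCU read-side section, and the busy-wait stands in for synchronize_net(). It does not reflect how the bonding driver or RCU is actually implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* All names below are hypothetical; none exist in the bonding driver. */
struct toy_bond;
typedef int (*toy_probe_fn)(struct toy_bond *);

struct toy_bond {
	_Atomic(toy_probe_fn) recv_probe; /* published RX hook */
	int *rx_table;                    /* stands in for rx_hashtbl */
	atomic_int readers;               /* crude stand-in for RCU readers */
};

/* The hook dereferences the table, as rlb_arp_recv() uses rx_hashtbl. */
static int toy_probe(struct toy_bond *b)
{
	assert(b->rx_table != NULL); /* a freed table here is the modelled bug */
	return b->rx_table[0];
}

/* Bring-up: allocate the table, then install the hook. */
static int toy_open(struct toy_bond *b, int value)
{
	b->rx_table = malloc(sizeof(*b->rx_table));
	if (!b->rx_table)
		return -1;
	b->rx_table[0] = value;
	atomic_init(&b->readers, 0);
	atomic_init(&b->recv_probe, toy_probe);
	return 0;
}

/* RX path: sample the hook once inside a "read-side section". */
static int toy_handle_frame(struct toy_bond *b)
{
	toy_probe_fn fn;
	int ret = -1;

	atomic_fetch_add(&b->readers, 1);
	fn = atomic_load(&b->recv_probe);
	if (fn)
		ret = fn(b);
	atomic_fetch_sub(&b->readers, 1);
	return ret;
}

/* Teardown in the order the patch uses: unpublish, wait, then free. */
static void toy_close(struct toy_bond *b)
{
	atomic_store(&b->recv_probe, (toy_probe_fn)0);
	while (atomic_load(&b->readers) > 0)
		; /* busy-wait stands in for synchronize_net() */
	free(b->rx_table);
	b->rx_table = NULL;
}
```

With this ordering, a toy_handle_frame() call that begins after toy_close() has cleared the hook sees NULL and returns -1 without touching the table. Freeing the table before clearing the hook, as the pre-patch code effectively did, would let a concurrent reader reach toy_probe() with a stale table.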
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
2026-02-14 9:15 [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down Hangbin Liu
@ 2026-02-14 20:49 ` Jay Vosburgh
2026-02-18 0:42 ` Jakub Kicinski
2026-02-18 0:43 ` Jakub Kicinski
2 siblings, 0 replies; 8+ messages in thread
From: Jay Vosburgh @ 2026-02-14 20:49 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Bohac, Liang Li,
Nikolay Aleksandrov
Hangbin Liu <liuhangbin@gmail.com> wrote:
>The ALB RX path may access rx_hashtbl concurrently with bond
>teardown. During rapid bond up/down cycles, rlb_deinitialize()
>frees rx_hashtbl while RX handlers are still running, leading
>to a use-after-free detected by KASAN.
>
>[ 214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
>[ 214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
>[ 214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
>[ 214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
>[ 214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
>[ 214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
>[ 214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
>[ 214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
>[ 214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
>[ 214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
>[ 214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
>[ 214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
>[ 214.286943] FS: 00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
>[ 214.295966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[ 214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
>[ 214.310347] Call Trace:
>[ 214.313070] <IRQ>
>[ 214.315318] ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
>[ 214.320975] bond_handle_frame+0x166/0xb60 [bonding]
>[ 214.326537] ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
>[ 214.332680] __netif_receive_skb_core.constprop.0+0x576/0x2710
>[ 214.339199] ? __pfx_arp_process+0x10/0x10
>[ 214.343775] ? sched_balance_find_src_group+0x98/0x630
>[ 214.349513] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
>[ 214.356513] ? arp_rcv+0x307/0x690
>[ 214.360311] ? __pfx_arp_rcv+0x10/0x10
>[ 214.364499] ? __lock_acquire+0x58c/0xbd0
>[ 214.368975] __netif_receive_skb_one_core+0xae/0x1b0
>[ 214.374518] ? __pfx___netif_receive_skb_one_core+0x10/0x10
>[ 214.380743] ? lock_acquire+0x10b/0x140
>[ 214.385026] process_backlog+0x3f1/0x13a0
>[ 214.389502] ? process_backlog+0x3aa/0x13a0
>[ 214.394174] __napi_poll.constprop.0+0x9f/0x370
>[ 214.399233] net_rx_action+0x8c1/0xe60
>[ 214.403423] ? __pfx_net_rx_action+0x10/0x10
>[ 214.408193] ? lock_acquire.part.0+0xbd/0x260
>[ 214.413058] ? sched_clock_cpu+0x6c/0x540
>[ 214.417540] ? mark_held_locks+0x40/0x70
>[ 214.421920] handle_softirqs+0x1fd/0x860
>[ 214.426302] ? __pfx_handle_softirqs+0x10/0x10
>[ 214.431264] ? __neigh_event_send+0x2d6/0xf50
>[ 214.436131] do_softirq+0xb1/0xf0
>[ 214.439830] </IRQ>
>
>The issue is reproducible by looping ip link set bond0 up/down,
>where rlb_arp_recv() can race with rlb_deinitialize() and dereference
>a freed rx_hashtbl entry.
>
>Fix this by setting recv_probe to NULL, and then calling synchronize_net()
>to wait for any concurrent RX processing to finish. This ensures that no
>RX handler can access rx_hashtbl after it is freed in
>bond_alb_deinitialize().
This is much clearer, thanks.
>Reported-by: Liang Li <liali@redhat.com>
>Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
>Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jay Vosburgh <jv@jvosburgh.net>
>---
>
>v2: make the description more clear (Suggested by Jay)
>
>---
> drivers/net/bonding/bond_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 47f13d86cb7e..8e1057a2a061 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4314,9 +4314,13 @@ static int bond_close(struct net_device *bond_dev)
>
> bond_work_cancel_all(bond);
> bond->send_peer_notif = 0;
>+ bond->recv_probe = NULL;
>+
>+ /* Wait for any in-flight RX handlers */
>+ synchronize_net();
>+
> if (bond_is_lb(bond))
> bond_alb_deinitialize(bond);
>- bond->recv_probe = NULL;
>
> if (BOND_MODE(bond) == BOND_MODE_8023AD &&
> bond->params.broadcast_neighbor)
>--
>2.50.1
>
* Re: [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
2026-02-14 9:15 [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down Hangbin Liu
2026-02-14 20:49 ` Jay Vosburgh
@ 2026-02-18 0:42 ` Jakub Kicinski
2026-02-18 0:43 ` Jakub Kicinski
2 siblings, 0 replies; 8+ messages in thread
From: Jakub Kicinski @ 2026-02-18 0:42 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Jiri Bohac, Liang Li, Nikolay Aleksandrov
On Sat, 14 Feb 2026 09:15:41 +0000 Hangbin Liu wrote:
> + bond->recv_probe = NULL;
nit: WRITE_ONCE()? It's READ_ONCE() in the handler.
--
pw-bot: cr
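The nit above is about pairing a marked store with the marked load in the handler. A userspace sketch of that pairing follows. The READ_ONCE()/WRITE_ONCE() definitions here are simplified approximations built on volatile casts and the GCC/Clang __typeof__ extension, not the kernel's real macros, and struct demo_bond and the demo_* functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace approximations of the kernel macros. */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical stand-ins; not the bonding driver's types. */
struct demo_bond;
typedef int (*demo_probe_fn)(const struct demo_bond *);

struct demo_bond {
	demo_probe_fn recv_probe; /* hook read by the RX path */
	int cookie;               /* value the hook returns */
};

static int demo_probe(const struct demo_bond *b)
{
	assert(b != NULL);
	return b->cookie;
}

/*
 * Reader: load the pointer exactly once into a local, then test and
 * call the local copy, so the check and the call use the same value.
 */
static int demo_handle_frame(struct demo_bond *b)
{
	demo_probe_fn fn = READ_ONCE(b->recv_probe);

	return fn ? fn(b) : -1;
}

/* Writer: a marked store to pair with the marked load above. */
static void demo_unpublish(struct demo_bond *b)
{
	WRITE_ONCE(b->recv_probe, NULL);
}
```

The marked accesses only keep the compiler from tearing, fusing or re-reading the pointer. They do not wait for readers, which is why the patch still needs synchronize_net() before freeing anything.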
* Re: [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
2026-02-14 9:15 [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down Hangbin Liu
2026-02-14 20:49 ` Jay Vosburgh
2026-02-18 0:42 ` Jakub Kicinski
@ 2026-02-18 0:43 ` Jakub Kicinski
2026-02-18 4:36 ` Hangbin Liu
2 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2026-02-18 0:43 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Jiri Bohac, Liang Li, Nikolay Aleksandrov
On Sat, 14 Feb 2026 09:15:41 +0000 Hangbin Liu wrote:
> Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
Ah, also AI says the issue existed already in
3aba891dde38 ("bonding: move processing of recv handlers into
handle_frame()")
not the exact trapping instruction but the hash table was used from
recv_probe so at least a UAF would happen.
* Re: [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
2026-02-18 0:43 ` Jakub Kicinski
@ 2026-02-18 4:36 ` Hangbin Liu
2026-02-19 0:11 ` Jakub Kicinski
0 siblings, 1 reply; 8+ messages in thread
From: Hangbin Liu @ 2026-02-18 4:36 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Jiri Bohac, Liang Li, Nikolay Aleksandrov
On Tue, Feb 17, 2026 at 04:43:55PM -0800, Jakub Kicinski wrote:
> On Sat, 14 Feb 2026 09:15:41 +0000 Hangbin Liu wrote:
> > Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
>
> Ah, also AI says the issue existed already in
> 3aba891dde38 ("bonding: move processing of recv handlers into
> handle_frame()")
> not the exact trapping instruction but the hash table was used from
> recv_probe so at least a UAF would happen.
Not sure I understand correctly. Do you mean we are still able to reach
rlb_arp_recv() after setting recv_probe to NULL?
OK, that also counts as a UAF, even though no crash would happen.
Thanks
Hangbin
* Re: [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
2026-02-18 4:36 ` Hangbin Liu
@ 2026-02-19 0:11 ` Jakub Kicinski
2026-02-19 13:34 ` Hangbin Liu
0 siblings, 1 reply; 8+ messages in thread
From: Jakub Kicinski @ 2026-02-19 0:11 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Jiri Bohac, Liang Li, Nikolay Aleksandrov
On Wed, 18 Feb 2026 04:36:24 +0000 Hangbin Liu wrote:
> On Tue, Feb 17, 2026 at 04:43:55PM -0800, Jakub Kicinski wrote:
> > On Sat, 14 Feb 2026 09:15:41 +0000 Hangbin Liu wrote:
> > > Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
> >
> > Ah, also AI says the issue existed already in
> > 3aba891dde38 ("bonding: move processing of recv handlers into
> > handle_frame()")
> > not the exact trapping instruction but the hash table was used from
> > recv_probe so at least a UAF would happen.
>
> Not sure I understand correctly. Do you mean we are still able to reach
> rlb_arp_recv() after setting recv_probe to NULL?
Simply put -- wasn't there a case where rx_hashtbl was accessed after
being freed in 3aba891dde38 already? That commit is a year and a half
older than the commit you had under Fixes.
* Re: [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
2026-02-19 0:11 ` Jakub Kicinski
@ 2026-02-19 13:34 ` Hangbin Liu
2026-02-19 13:39 ` Hangbin Liu
0 siblings, 1 reply; 8+ messages in thread
From: Hangbin Liu @ 2026-02-19 13:34 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Jiri Bohac, Liang Li, Nikolay Aleksandrov
On Wed, Feb 18, 2026 at 04:11:10PM -0800, Jakub Kicinski wrote:
> On Wed, 18 Feb 2026 04:36:24 +0000 Hangbin Liu wrote:
> > On Tue, Feb 17, 2026 at 04:43:55PM -0800, Jakub Kicinski wrote:
> > > On Sat, 14 Feb 2026 09:15:41 +0000 Hangbin Liu wrote:
> > > > Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
> > >
> > > Ah, also AI says the issue existed already in
> > > 3aba891dde38 ("bonding: move processing of recv handlers into
> > > handle_frame()")
> > > not the exact trapping instruction but the hash table was used from
> > > recv_probe so at least a UAF would happen.
> >
> > Not sure I understand correctly. Do you mean we are still able to reach
> > rlb_arp_recv() after setting recv_probe to NULL?
>
> Simply put -- wasn't there a case where rx_hashtbl was accessed after
> being freed in 3aba891dde38 already? That commit is a year and a half
> older than the commit you had under Fixes.
AFAIK, the UAF/null-ptr-deref issue for rx_hashtbl was introduced by
e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table"),
which added rlb_purge_src_ip() to rlb_arp_recv().
3aba891dde38 ("bonding: move processing of recv handlers into handle_frame()")
only left other CPUs able to call rlb_arp_recv() after we set recv_probe
to NULL. But that doesn't trigger a null-ptr-deref.
Thanks
Hangbin
* Re: [PATCHv2 net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
2026-02-19 13:34 ` Hangbin Liu
@ 2026-02-19 13:39 ` Hangbin Liu
0 siblings, 0 replies; 8+ messages in thread
From: Hangbin Liu @ 2026-02-19 13:39 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Paolo Abeni, Jiri Bohac, Liang Li, Nikolay Aleksandrov
On Thu, Feb 19, 2026 at 01:34:10PM +0000, Hangbin Liu wrote:
> On Wed, Feb 18, 2026 at 04:11:10PM -0800, Jakub Kicinski wrote:
> > On Wed, 18 Feb 2026 04:36:24 +0000 Hangbin Liu wrote:
> > > On Tue, Feb 17, 2026 at 04:43:55PM -0800, Jakub Kicinski wrote:
> > > > On Sat, 14 Feb 2026 09:15:41 +0000 Hangbin Liu wrote:
> > > > > Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
> > > >
> > > > Ah, also AI says the issue existed already in
> > > > 3aba891dde38 ("bonding: move processing of recv handlers into
> > > > handle_frame()")
> > > > not the exact trapping instruction but the hash table was used from
> > > > recv_probe so at least a UAF would happen.
> > >
> > > Not sure I understand correctly. Do you mean we are still able to reach
> > > rlb_arp_recv() after setting recv_probe to NULL?
> >
> > Simply put -- wasn't there a case where rx_hashtbl was accessed after
> > being freed in 3aba891dde38 already? That commit is a year and a half
> > older than the commit you had under Fixes.
>
> AFAIK, the UAF/null-ptr-deref issue for rx_hashtbl was introduced by
> e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table"),
> which added rlb_purge_src_ip() to rlb_arp_recv().
>
> 3aba891dde38 ("bonding: move processing of recv handlers into handle_frame()")
> only left other CPUs able to call rlb_arp_recv() after we set recv_probe
> to NULL. But that doesn't trigger a null-ptr-deref.
Oh, I remember now. rlb_arp_recv() also calls rlb_update_entry_from_arp(),
which could access rx_hashtbl. You are right, the Fixes tag should be
3aba891dde38 ("bonding: move processing of recv handlers into handle_frame()")
Thanks
Hangbin