* [PATCH net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
From: Hangbin Liu @ 2026-02-13 9:54 UTC
To: netdev
Cc: Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Bohac, Hangbin Liu, Liang Li
The ALB RX path may access rx_hashtbl concurrently with bond
teardown. During rapid bond up/down cycles, rlb_deinitialize()
frees rx_hashtbl while RX handlers are still running, leading
to a use-after-free detected by KASAN.
[ 214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
[ 214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
[ 214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
[ 214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
[ 214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
[ 214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6
04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
[ 214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
[ 214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
[ 214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
[ 214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
[ 214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
[ 214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
[ 214.286943] FS: 00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
[ 214.295966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
[ 214.310347] Call Trace:
[ 214.313070] <IRQ>
[ 214.315318] ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
[ 214.320975] bond_handle_frame+0x166/0xb60 [bonding]
[ 214.326537] ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
[ 214.332680] __netif_receive_skb_core.constprop.0+0x576/0x2710
[ 214.339199] ? __pfx_arp_process+0x10/0x10
[ 214.343775] ? sched_balance_find_src_group+0x98/0x630
[ 214.349513] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[ 214.356513] ? arp_rcv+0x307/0x690
[ 214.360311] ? __pfx_arp_rcv+0x10/0x10
[ 214.364499] ? __lock_acquire+0x58c/0xbd0
[ 214.368975] __netif_receive_skb_one_core+0xae/0x1b0
[ 214.374518] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 214.380743] ? lock_acquire+0x10b/0x140
[ 214.385026] process_backlog+0x3f1/0x13a0
[ 214.389502] ? process_backlog+0x3aa/0x13a0
[ 214.394174] __napi_poll.constprop.0+0x9f/0x370
[ 214.399233] net_rx_action+0x8c1/0xe60
[ 214.403423] ? __pfx_net_rx_action+0x10/0x10
[ 214.408193] ? lock_acquire.part.0+0xbd/0x260
[ 214.413058] ? sched_clock_cpu+0x6c/0x540
[ 214.417540] ? mark_held_locks+0x40/0x70
[ 214.421920] handle_softirqs+0x1fd/0x860
[ 214.426302] ? __pfx_handle_softirqs+0x10/0x10
[ 214.431264] ? __neigh_event_send+0x2d6/0xf50
[ 214.436131] do_softirq+0xb1/0xf0
[ 214.439830] </IRQ>
The issue is reproducible by looping ip link set bond0 up/down,
where rlb_arp_recv() can race with rlb_deinitialize() and dereference
a freed rx_hashtbl entry.
Fix this by setting recv_probe to NULL before calling
bond_alb_deinitialize(), and then calling synchronize_net() to wait
for any concurrent RX processing to finish.
This ensures that no RX handler can access rx_hashtbl after it is
freed, while preserving the existing locking semantics.
Reported-by: Liang Li <liali@redhat.com>
Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
---
drivers/net/bonding/bond_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 47f13d86cb7e..8e1057a2a061 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4314,9 +4314,13 @@ static int bond_close(struct net_device *bond_dev)

 	bond_work_cancel_all(bond);
 	bond->send_peer_notif = 0;
+	bond->recv_probe = NULL;
+
+	/* Wait for any in-flight RX handlers */
+	synchronize_net();
+
 	if (bond_is_lb(bond))
 		bond_alb_deinitialize(bond);
-	bond->recv_probe = NULL;

 	if (BOND_MODE(bond) == BOND_MODE_8023AD &&
 	    bond->params.broadcast_neighbor)
--
2.50.1
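
The ordering the patch establishes in bond_close() (clear the hook, wait for in-flight RX handlers, then free the table) can be illustrated with a small userspace sketch. This is not the bonding code: every demo_* name is hypothetical, and the reader counter is only a crude stand-in for what synchronize_net() provides in the kernel. It assumes a C11 compiler with <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_TBL_SIZE 8

/* Hypothetical stand-ins; these names are not from the bonding driver. */
typedef int (*demo_probe_fn)(const int *tbl, unsigned int key);

struct demo_bond {
	_Atomic(demo_probe_fn) recv_probe; /* published RX hook, NULL when closed */
	int *rx_hashtbl;                   /* state the hook dereferences */
	atomic_int readers;                /* RX handlers currently running */
};

/* Plays the role of rlb_arp_recv(): reads the table it is handed. */
static int demo_probe(const int *tbl, unsigned int key)
{
	return tbl[key % DEMO_TBL_SIZE];
}

static int demo_open(struct demo_bond *b)
{
	b->rx_hashtbl = calloc(DEMO_TBL_SIZE, sizeof(*b->rx_hashtbl));
	if (!b->rx_hashtbl)
		return -1;
	for (unsigned int i = 0; i < DEMO_TBL_SIZE; i++)
		b->rx_hashtbl[i] = (int)i * 10;
	atomic_init(&b->readers, 0);
	atomic_init(&b->recv_probe, demo_probe);
	return 0;
}

/* RX path: mark ourselves in flight, then check whether a hook is set. */
static int demo_rx(struct demo_bond *b, unsigned int key)
{
	int ret = -1;

	atomic_fetch_add(&b->readers, 1);
	demo_probe_fn fn = atomic_load(&b->recv_probe);
	if (fn)
		ret = fn(b->rx_hashtbl, key);
	atomic_fetch_sub(&b->readers, 1);
	return ret;
}

static void demo_close(struct demo_bond *b)
{
	/* First unpublish the hook so new RX calls skip the table. */
	atomic_store(&b->recv_probe, (demo_probe_fn)NULL);

	/* Then wait out in-flight readers (stand-in for synchronize_net()). */
	while (atomic_load(&b->readers) != 0)
		;

	/* Only now free what the hook was using. */
	free(b->rx_hashtbl);
	b->rx_hashtbl = NULL;
}
```

Freeing before the wait would reopen the window the patch closes: a handler that loaded a non-NULL hook before it was cleared could still be reading the table.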
* Re: [PATCH net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
From: Nikolay Aleksandrov @ 2026-02-13 10:16 UTC
To: Hangbin Liu
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Bohac, Liang Li
On Fri, Feb 13, 2026 at 09:54:24AM +0000, Hangbin Liu wrote:
> The ALB RX path may access rx_hashtbl concurrently with bond
> teardown. During rapid bond up/down cycles, rlb_deinitialize()
> frees rx_hashtbl while RX handlers are still running, leading
> to a use-after-free detected by KASAN.
>
> [ 214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
> [ 214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
> [ 214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
> [ 214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
> [ 214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
> [ 214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6
> 04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
> [ 214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
> [ 214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
> [ 214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
> [ 214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
> [ 214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
> [ 214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
> [ 214.286943] FS: 00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
> [ 214.295966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
> [ 214.310347] Call Trace:
> [ 214.313070] <IRQ>
> [ 214.315318] ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
> [ 214.320975] bond_handle_frame+0x166/0xb60 [bonding]
> [ 214.326537] ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
> [ 214.332680] __netif_receive_skb_core.constprop.0+0x576/0x2710
> [ 214.339199] ? __pfx_arp_process+0x10/0x10
> [ 214.343775] ? sched_balance_find_src_group+0x98/0x630
> [ 214.349513] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
> [ 214.356513] ? arp_rcv+0x307/0x690
> [ 214.360311] ? __pfx_arp_rcv+0x10/0x10
> [ 214.364499] ? __lock_acquire+0x58c/0xbd0
> [ 214.368975] __netif_receive_skb_one_core+0xae/0x1b0
> [ 214.374518] ? __pfx___netif_receive_skb_one_core+0x10/0x10
> [ 214.380743] ? lock_acquire+0x10b/0x140
> [ 214.385026] process_backlog+0x3f1/0x13a0
> [ 214.389502] ? process_backlog+0x3aa/0x13a0
> [ 214.394174] __napi_poll.constprop.0+0x9f/0x370
> [ 214.399233] net_rx_action+0x8c1/0xe60
> [ 214.403423] ? __pfx_net_rx_action+0x10/0x10
> [ 214.408193] ? lock_acquire.part.0+0xbd/0x260
> [ 214.413058] ? sched_clock_cpu+0x6c/0x540
> [ 214.417540] ? mark_held_locks+0x40/0x70
> [ 214.421920] handle_softirqs+0x1fd/0x860
> [ 214.426302] ? __pfx_handle_softirqs+0x10/0x10
> [ 214.431264] ? __neigh_event_send+0x2d6/0xf50
> [ 214.436131] do_softirq+0xb1/0xf0
> [ 214.439830] </IRQ>
>
> The issue is reproducible by looping ip link set bond0 up/down,
> where rlb_arp_recv() can race with rlb_deinitialize() and dereference
> a freed rx_hashtbl entry.
>
> Fix this by setting recv_probe to NULL before calling
> bond_alb_deinitialize(), and then calling synchronize_net() to wait
> for any concurrent RX processing to finish.
>
> This ensures that no RX handler can access rx_hashtbl after it is
> freed, while preserving the existing locking semantics.
>
> Reported-by: Liang Li <liali@redhat.com>
> Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
> ---
> drivers/net/bonding/bond_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index 47f13d86cb7e..8e1057a2a061 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4314,9 +4314,13 @@ static int bond_close(struct net_device *bond_dev)
>
> bond_work_cancel_all(bond);
> bond->send_peer_notif = 0;
> + bond->recv_probe = NULL;
> +
> + /* Wait for any in-flight RX handlers */
> + synchronize_net();
> +
> if (bond_is_lb(bond))
> bond_alb_deinitialize(bond);
> - bond->recv_probe = NULL;
>
> if (BOND_MODE(bond) == BOND_MODE_8023AD &&
> bond->params.broadcast_neighbor)
> --
> 2.50.1
>
Ouch, good catch! By the way I have a (very old) patch that properly
converts recv_probe to RCU, currently it's missing any annotations on
the write side. I think I'll revive it when net-next opens up. :)
Reviewed-by: Nikolay Aleksandrov <nikolay@nvidia.com>
Cheers,
Nik
* Re: [PATCH net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
From: Hangbin Liu @ 2026-02-13 10:30 UTC
To: Nikolay Aleksandrov
Cc: netdev, Jay Vosburgh, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Bohac, Liang Li
On Fri, Feb 13, 2026 at 12:16:53PM +0200, Nikolay Aleksandrov wrote:
> Ouch, good catch! By the way I have a (very old) patch that properly
> converts recv_probe to RCU, currently it's missing any annotations on
> the write side. I think I'll revive it when net-next opens up. :)
Yes, I also thought about using RCU, but that would change much more
than this patch does, and bond_close() is not performance-critical.
Converting it in net-next is a good choice.
Thanks
Hangbin
* Re: [PATCH net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
From: Jay Vosburgh @ 2026-02-13 21:52 UTC
To: Hangbin Liu
Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Bohac, Liang Li
Hangbin Liu <liuhangbin@gmail.com> wrote:
>The ALB RX path may access rx_hashtbl concurrently with bond
>teardown. During rapid bond up/down cycles, rlb_deinitialize()
>frees rx_hashtbl while RX handlers are still running, leading
>to a use-after-free detected by KASAN.
>
>[ 214.174138] Oops: general protection fault, probably for non-canonical address 0xdffffc000000001d: 0000 [#1] SMP KASAN PTI
>[ 214.186478] KASAN: null-ptr-deref in range [0x00000000000000e8-0x00000000000000ef]
>[ 214.194933] CPU: 30 UID: 0 PID: 2375 Comm: ping Kdump: loaded Not tainted 6.19.0-rc8+ #2 PREEMPT(voluntary)
>[ 214.205907] Hardware name: Dell Inc. PowerEdge R730/0WCJNT, BIOS 2.14.0 01/14/2022
>[ 214.214357] RIP: 0010:rlb_arp_recv+0x505/0xab0 [bonding]
>[ 214.220320] Code: 0f 85 2b 05 00 00 48 b8 00 00 00 00 00 fc ff df 40 0f b6 ed 48 c1 e5 06 49 03 ad 78 01 00 00 48 8d 7d 28 48 89 fa 48 c1 ea 03 <0f> b6
> 04 02 84 c0 74 06 0f 8e 12 05 00 00 80 7d 28 00 0f 84 8c 00
>[ 214.241280] RSP: 0018:ffffc900073d8870 EFLAGS: 00010206
>[ 214.247116] RAX: dffffc0000000000 RBX: ffff888168556822 RCX: ffff88816855681e
>[ 214.255082] RDX: 000000000000001d RSI: dffffc0000000000 RDI: 00000000000000e8
>[ 214.263048] RBP: 00000000000000c0 R08: 0000000000000002 R09: ffffed11192021c8
>[ 214.271013] R10: ffff8888c9010e43 R11: 0000000000000001 R12: 1ffff92000e7b119
>[ 214.278978] R13: ffff8888c9010e00 R14: ffff888168556822 R15: ffff888168556810
>[ 214.286943] FS: 00007f85d2d9cb80(0000) GS:ffff88886ccb3000(0000) knlGS:0000000000000000
>[ 214.295966] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[ 214.302380] CR2: 00007f0d047b5e34 CR3: 00000008a1c2e002 CR4: 00000000001726f0
>[ 214.310347] Call Trace:
>[ 214.313070] <IRQ>
>[ 214.315318] ? __pfx_rlb_arp_recv+0x10/0x10 [bonding]
>[ 214.320975] bond_handle_frame+0x166/0xb60 [bonding]
>[ 214.326537] ? __pfx_bond_handle_frame+0x10/0x10 [bonding]
>[ 214.332680] __netif_receive_skb_core.constprop.0+0x576/0x2710
>[ 214.339199] ? __pfx_arp_process+0x10/0x10
>[ 214.343775] ? sched_balance_find_src_group+0x98/0x630
>[ 214.349513] ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
>[ 214.356513] ? arp_rcv+0x307/0x690
>[ 214.360311] ? __pfx_arp_rcv+0x10/0x10
>[ 214.364499] ? __lock_acquire+0x58c/0xbd0
>[ 214.368975] __netif_receive_skb_one_core+0xae/0x1b0
>[ 214.374518] ? __pfx___netif_receive_skb_one_core+0x10/0x10
>[ 214.380743] ? lock_acquire+0x10b/0x140
>[ 214.385026] process_backlog+0x3f1/0x13a0
>[ 214.389502] ? process_backlog+0x3aa/0x13a0
>[ 214.394174] __napi_poll.constprop.0+0x9f/0x370
>[ 214.399233] net_rx_action+0x8c1/0xe60
>[ 214.403423] ? __pfx_net_rx_action+0x10/0x10
>[ 214.408193] ? lock_acquire.part.0+0xbd/0x260
>[ 214.413058] ? sched_clock_cpu+0x6c/0x540
>[ 214.417540] ? mark_held_locks+0x40/0x70
>[ 214.421920] handle_softirqs+0x1fd/0x860
>[ 214.426302] ? __pfx_handle_softirqs+0x10/0x10
>[ 214.431264] ? __neigh_event_send+0x2d6/0xf50
>[ 214.436131] do_softirq+0xb1/0xf0
>[ 214.439830] </IRQ>
>
>The issue is reproducible by looping ip link set bond0 up/down,
>where rlb_arp_recv() can race with rlb_deinitialize() and dereference
>a freed rx_hashtbl entry.
>
>Fix this by setting recv_probe to NULL before calling
>bond_alb_deinitialize(), and then calling synchronize_net() to wait
>for any concurrent RX processing to finish.
I think the change looks good. However, to my reading, the
sentence above describes the order of events as 1) recv_probe = NULL, 2)
bond_alb_deinitialize(), then 3) synchronize_net(), whereas the changed
code below is 1, 3, then 2.
The code order is correct, but I think the commit description
should match the code.
-J
>This ensures that no RX handler can access rx_hashtbl after it is
>freed, while preserving the existing locking semantics.
>
>Reported-by: Liang Li <liali@redhat.com>
>Fixes: e53665c6eaa6 ("bonding: delete migrated IP addresses from the rlb hash table")
>Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
>---
> drivers/net/bonding/bond_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 47f13d86cb7e..8e1057a2a061 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4314,9 +4314,13 @@ static int bond_close(struct net_device *bond_dev)
>
> bond_work_cancel_all(bond);
> bond->send_peer_notif = 0;
>+ bond->recv_probe = NULL;
>+
>+ /* Wait for any in-flight RX handlers */
>+ synchronize_net();
>+
> if (bond_is_lb(bond))
> bond_alb_deinitialize(bond);
>- bond->recv_probe = NULL;
>
> if (BOND_MODE(bond) == BOND_MODE_8023AD &&
> bond->params.broadcast_neighbor)
>--
>2.50.1
>
---
-Jay Vosburgh, jv@jvosburgh.net
* Re: [PATCH net] bonding: alb: fix UAF in rlb_arp_recv during bond up/down
From: Hangbin Liu @ 2026-02-14 9:13 UTC
To: Jay Vosburgh
Cc: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Jiri Bohac, Liang Li
On Fri, Feb 13, 2026 at 01:52:52PM -0800, Jay Vosburgh wrote:
> Hangbin Liu <liuhangbin@gmail.com> wrote:
> >
> >Fix this by setting recv_probe to NULL before calling
> >bond_alb_deinitialize(), and then calling synchronize_net() to wait
> >for any concurrent RX processing to finish.
>
> I think the change looks good, however, to my reading the
> sentence above describes the order of events as 1) recv_probe = NULL, 2)
> bond_alb_deinitialize(), then 3) synchronize_net(), whereas the changed
> code below is 1, 3, then 2.
>
> The code order is correct, but I think the commit description
> should match the code.
>
Got it. I will post a v2 to fix this.
Hangbin