* [PATCH net v3 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest
@ 2026-02-28  2:19 Jiayuan Chen
  2026-02-28  2:19 ` [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
  2026-02-28  2:19 ` [PATCH net v3 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up Jiayuan Chen
  0 siblings, 2 replies; 8+ messages in thread

From: Jiayuan Chen @ 2026-02-28  2:19 UTC (permalink / raw)
  To: netdev
  Cc: jiayuna.chen, jiayuna.chen, Jiayuan Chen, Jay Vosburgh, Andrew Lunn,
      David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
      Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
      John Fastabend, Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
      Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
      Jiri Olsa, Shuah Khan, Sebastian Andrzej Siewior, Clark Williams,
      Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
      linux-rt-devel

syzkaller reported a kernel panic [1] with the following crash stack:

BUG: unable to handle page fault for address: ffff8ebd08580000
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
PGD 11f201067 P4D 11f201067 PUD 0
Oops: Oops: 0002 [#1] SMP PTI
CPU: 2 UID: 0 PID: 451 Comm: test_progs Not tainted 6.19.0+ #161 PREEMPT_RT
RIP: 0010:bond_rr_gen_slave_id+0x90/0xd0
RSP: 0018:ffffd3f4815f3448 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff8ebc8728b17e
RDX: 0000000000000000 RSI: ffffd3f4815f3538 RDI: ffff8ebc8abcce40
RBP: ffffd3f4815f3460 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffd3f4815f3538
R13: ffff8ebc8abcce40 R14: ffff8ebc8728b17f R15: ffff8ebc8728b170
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8ebd08580000 CR3: 000000010a808006 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 bond_xdp_get_xmit_slave+0xc0/0x240
 xdp_master_redirect+0x74/0xc0
 bpf_prog_run_generic_xdp+0x2f2/0x3f0
 do_xdp_generic+0x1fd/0x3d0
 __netif_receive_skb_core.constprop.0+0x30d/0x1220
 __netif_receive_skb_list_core+0xfc/0x250
 netif_receive_skb_list_internal+0x20c/0x3d0
 ? eth_type_trans+0x137/0x160
 netif_receive_skb_list+0x25/0x140
 xdp_test_run_batch.constprop.0+0x65b/0x6e0
 bpf_test_run_xdp_live+0x1ec/0x3b0
 bpf_prog_test_run_xdp+0x49d/0x6e0
 __sys_bpf+0x446/0x27b0
 __x64_sys_bpf+0x1a/0x30
 x64_sys_call+0x146c/0x26e0
 do_syscall_64+0xd3/0x1510
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Problem Description

bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
when the bond mode is round-robin. If the bond device was never brought
up, rr_tx_counter remains NULL.

The XDP redirect path can reach this code even when the bond is not up:
bpf_master_redirect_enabled_key is a global static key, so when any bond
device has native XDP attached, the XDP_TX -> xdp_master_redirect()
interception is enabled for all bond slaves system-wide.

Solution

Patch 1: Allocate rr_tx_counter unconditionally in bond_init() (ndo_init).
Patch 2: Add a selftest that reproduces the above scenario.
Changes since v2:
https://lore.kernel.org/netdev/20260227092254.272603-1-jiayuan.chen@linux.dev/T/#t
- Moved allocation from bond_create_init() helper into bond_init()
  (ndo_init), which is the natural single point covering both creation
  paths and also handles post-creation mode changes to round-robin

Changes since v1:
https://lore.kernel.org/netdev/20260224112545.37888-1-jiayuan.chen@linux.dev/T/#t
- Moved the guard for NULL rr_tx_counter from xdp_master_redirect() into
  the bonding subsystem itself
  (Suggested by Sebastian Andrzej Siewior <bigeasy@linutronix.de>)

[1] https://syzkaller.appspot.com/bug?extid=80e046b8da2820b6ba73

Jiayuan Chen (2):
  bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
  selftests/bpf: add test for xdp_master_redirect with bond not up

 drivers/net/bonding/bond_main.c                    |  12 +--
 .../selftests/bpf/prog_tests/xdp_bonding.c         | 101 +++++++++++++++++-
 2 files changed, 105 insertions(+), 8 deletions(-)

-- 
2.43.0

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
  2026-02-28  2:19 [PATCH net v3 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest Jiayuan Chen
@ 2026-02-28  2:19 ` Jiayuan Chen
  2026-02-28  3:01   ` Jay Vosburgh
  2026-02-28  2:19 ` [PATCH net v3 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up Jiayuan Chen
  1 sibling, 1 reply; 8+ messages in thread

From: Jiayuan Chen @ 2026-02-28  2:19 UTC (permalink / raw)
  To: netdev
  Cc: jiayuna.chen, jiayuna.chen, Jiayuan Chen,
      syzbot+80e046b8da2820b6ba73, Jay Vosburgh, Andrew Lunn,
      David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
      Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
      John Fastabend, Stanislav Fomichev, Andrii Nakryiko,
      Eduard Zingerman, Martin KaFai Lau, Song Liu, Yonghong Song,
      KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
      Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
      Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel

From: Jiayuan Chen <jiayuan.chen@shopee.com>

bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
when the bond mode is round-robin. If the bond device was never brought
up, rr_tx_counter remains NULL, causing a null-ptr-deref.

The XDP redirect path can reach this code even when the bond is not up:
bpf_master_redirect_enabled_key is a global static key, so when any bond
device has native XDP attached, the XDP_TX -> xdp_master_redirect()
interception is enabled for all bond slaves system-wide. This allows the
path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
reached on a bond that was never opened.

The normal TX path (bond_xmit_roundrobin) is not affected because TX
requires the bond to be UP, which guarantees rr_tx_counter is allocated.
However, bond_xmit_get_slave() (ndo_get_xmit_slave) has the same code
pattern via bond_xmit_roundrobin_slave_get() and could theoretically
hit the same issue.

Fix this by allocating rr_tx_counter unconditionally in bond_init()
(ndo_init), which is called by register_netdevice() and covers both
device creation paths (bond_create() and bond_newlink()). This also
handles the case where bond mode is changed to round-robin after device
creation. The conditional allocation in bond_open() is removed. Since
bond_destructor() already unconditionally calls
free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
ndo_init, free at destructor.

Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
 drivers/net/bonding/bond_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 78cff904cdc3..9f63f67d8418 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
 	struct list_head *iter;
 	struct slave *slave;
 
-	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
-		bond->rr_tx_counter = alloc_percpu(u32);
-		if (!bond->rr_tx_counter)
-			return -ENOMEM;
-	}
-
 	/* reset slave->backup and slave->inactive */
 	if (bond_has_slaves(bond)) {
 		bond_for_each_slave(bond, slave, iter) {
@@ -6411,6 +6405,12 @@ static int bond_init(struct net_device *bond_dev)
 	if (!bond->wq)
 		return -ENOMEM;
 
+	bond->rr_tx_counter = alloc_percpu(u32);
+	if (!bond->rr_tx_counter) {
+		destroy_workqueue(bond->wq);
+		return -ENOMEM;
+	}
+
 	bond->notifier_ctx = false;
 
 	spin_lock_init(&bond->stats_lock);
-- 
2.43.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread
* Re: [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
  2026-02-28  2:19 ` [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
@ 2026-02-28  3:01   ` Jay Vosburgh
  2026-02-28  3:36     ` Jiayuan Chen
  0 siblings, 1 reply; 8+ messages in thread

From: Jay Vosburgh @ 2026-02-28  3:01 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: netdev, jiayuna.chen, jiayuna.chen, Jiayuan Chen,
      syzbot+80e046b8da2820b6ba73, Andrew Lunn, David S. Miller,
      Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
      Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
      Stanislav Fomichev, Andrii Nakryiko, Eduard Zingerman,
      Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
      Jiri Olsa, Shuah Khan, Sebastian Andrzej Siewior, Clark Williams,
      Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
      linux-rt-devel

Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

>From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
>bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
>check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
>when the bond mode is round-robin. If the bond device was never brought
>up, rr_tx_counter remains NULL, causing a null-ptr-deref.
>
>The XDP redirect path can reach this code even when the bond is not up:
>bpf_master_redirect_enabled_key is a global static key, so when any bond
>device has native XDP attached, the XDP_TX -> xdp_master_redirect()
>interception is enabled for all bond slaves system-wide. This allows the
>path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
>bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
>reached on a bond that was never opened.
>
>The normal TX path (bond_xmit_roundrobin) is not affected because TX
>requires the bond to be UP, which guarantees rr_tx_counter is allocated.
>However, bond_xmit_get_slave() (ndo_get_xmit_slave) has the same code
>pattern via bond_xmit_roundrobin_slave_get() and could theoretically
>hit the same issue.

	As a practical matter, though, I don't think the
ndo_get_xmit_slave path can actually hit the issue, as that looks to
only be called from Infiniband, which is only supported in bonding for
active-backup mode.

>Fix this by allocating rr_tx_counter unconditionally in bond_init()
>(ndo_init), which is called by register_netdevice() and covers both
>device creation paths (bond_create() and bond_newlink()). This also
>handles the case where bond mode is changed to round-robin after device
>creation. The conditional allocation in bond_open() is removed. Since
>bond_destructor() already unconditionally calls
>free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
>ndo_init, free at destructor.
>
>Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
>Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
>Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
>Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>

	My only concern is that this will waste a percpu u32 per bond
device for the majority of bonding use cases (which use modes other than
balance-rr), which could be a few hundred bytes on a large machine.

	Does everything work reliably if the rr_tx_counter allocation
happens conditionally on mode == BOND_MODE_ROUNDROBIN in bond_setup, as
well as in bond_option_mode_set?

	-J

>---
> drivers/net/bonding/bond_main.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 78cff904cdc3..9f63f67d8418 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
> 	struct list_head *iter;
> 	struct slave *slave;
> 
>-	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
>-		bond->rr_tx_counter = alloc_percpu(u32);
>-		if (!bond->rr_tx_counter)
>-			return -ENOMEM;
>-	}
>-
> 	/* reset slave->backup and slave->inactive */
> 	if (bond_has_slaves(bond)) {
> 		bond_for_each_slave(bond, slave, iter) {
>@@ -6411,6 +6405,12 @@ static int bond_init(struct net_device *bond_dev)
> 	if (!bond->wq)
> 		return -ENOMEM;
> 
>+	bond->rr_tx_counter = alloc_percpu(u32);
>+	if (!bond->rr_tx_counter) {
>+		destroy_workqueue(bond->wq);
>+		return -ENOMEM;
>+	}
>+
> 	bond->notifier_ctx = false;
> 
> 	spin_lock_init(&bond->stats_lock);
>-- 
>2.43.0
>

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
  2026-02-28  3:01   ` Jay Vosburgh
@ 2026-02-28  3:36     ` Jiayuan Chen
  2026-03-02  8:10       ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 8+ messages in thread

From: Jiayuan Chen @ 2026-02-28  3:36 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, jiayuna.chen, jiayuna.chen, Jiayuan Chen,
      syzbot+80e046b8da2820b6ba73, Andrew Lunn, David S. Miller,
      Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
      Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
      Stanislav Fomichev, Andrii Nakryiko, Eduard Zingerman,
      Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
      Jiri Olsa, Shuah Khan, Sebastian Andrzej Siewior, Clark Williams,
      Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
      linux-rt-devel

February 28, 2026 at 11:01, "Jay Vosburgh" <jv@jvosburgh.net> wrote:

> Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> > From: Jiayuan Chen <jiayuan.chen@shopee.com>
> >
> > bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> > check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> > when the bond mode is round-robin. If the bond device was never brought
> > up, rr_tx_counter remains NULL, causing a null-ptr-deref.
> >
> > The XDP redirect path can reach this code even when the bond is not up:
> > bpf_master_redirect_enabled_key is a global static key, so when any bond
> > device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> > interception is enabled for all bond slaves system-wide. This allows the
> > path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> > bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> > reached on a bond that was never opened.
> >
> > The normal TX path (bond_xmit_roundrobin) is not affected because TX
> > requires the bond to be UP, which guarantees rr_tx_counter is allocated.
> > However, bond_xmit_get_slave() (ndo_get_xmit_slave) has the same code
> > pattern via bond_xmit_roundrobin_slave_get() and could theoretically
> > hit the same issue.
>
> As a practical matter, though, I don't think the
> ndo_get_xmit_slave path can actually hit the issue, as that looks to
> only be called from Infiniband, which is only supported in bonding for
> active-backup mode.
>
> > Fix this by allocating rr_tx_counter unconditionally in bond_init()
> > (ndo_init), which is called by register_netdevice() and covers both
> > device creation paths (bond_create() and bond_newlink()). This also
> > handles the case where bond mode is changed to round-robin after device
> > creation. The conditional allocation in bond_open() is removed. Since
> > bond_destructor() already unconditionally calls
> > free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> > ndo_init, free at destructor.
> >
> > Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> > Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> > Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> My only concern is that this will waste a percpu u32 per bond
> device for the majority of bonding use cases (which use modes other than
> balance-rr), which could be a few hundred bytes on a large machine.
>
> Does everything work reliably if the rr_tx_counter allocation
> happens conditionally on mode == BOND_MODE_ROUNDROBIN in bond_setup, as
> well as in bond_option_mode_set?

Hi Jay,

Thanks for the review.

bond_setup() is not suitable here as it is a void callback with no error
return path, so an alloc_percpu() failure cannot be propagated.

An alternative would be to allocate conditionally in bond_init() (since
the default mode is round-robin) and manage allocation/deallocation in
bond_option_mode_set() when the mode changes.

This is a trade-off between the added complexity of conditional
alloc/free across multiple code paths and saving a per-CPU u32 for
non-round-robin bonds.

For the per-CPU u32 overhead, it's only 4 extra bytes per CPU per bond
device — and machines with that many CPUs tend to have plenty of memory
to match.

I don't have a strong preference either way.

Thanks

> -J
>
> > ---
> > drivers/net/bonding/bond_main.c | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > index 78cff904cdc3..9f63f67d8418 100644
> > --- a/drivers/net/bonding/bond_main.c
> > +++ b/drivers/net/bonding/bond_main.c
> > @@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
> > 	struct list_head *iter;
> > 	struct slave *slave;
> >
> > -	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
> > -		bond->rr_tx_counter = alloc_percpu(u32);
> > -		if (!bond->rr_tx_counter)
> > -			return -ENOMEM;
> > -	}
> > -
> > 	/* reset slave->backup and slave->inactive */
> > 	if (bond_has_slaves(bond)) {
> > 		bond_for_each_slave(bond, slave, iter) {
> > @@ -6411,6 +6405,12 @@ static int bond_init(struct net_device *bond_dev)
> > 	if (!bond->wq)
> > 		return -ENOMEM;
> >
> > +	bond->rr_tx_counter = alloc_percpu(u32);
> > +	if (!bond->rr_tx_counter) {
> > +		destroy_workqueue(bond->wq);
> > +		return -ENOMEM;
> > +	}
> > +
> > 	bond->notifier_ctx = false;
> >
> > 	spin_lock_init(&bond->stats_lock);
> > --
> > 2.43.0
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
  2026-02-28  3:36     ` Jiayuan Chen
@ 2026-03-02  8:10       ` Sebastian Andrzej Siewior
  2026-03-02 10:15         ` Jiayuan Chen
  0 siblings, 1 reply; 8+ messages in thread

From: Sebastian Andrzej Siewior @ 2026-03-02  8:10 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: Jay Vosburgh, netdev, jiayuna.chen, jiayuna.chen, Jiayuan Chen,
      syzbot+80e046b8da2820b6ba73, Andrew Lunn, David S. Miller,
      Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
      Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
      Stanislav Fomichev, Andrii Nakryiko, Eduard Zingerman,
      Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
      Jiri Olsa, Shuah Khan, Clark Williams, Steven Rostedt, Jussi Maki,
      linux-kernel, bpf, linux-kselftest, linux-rt-devel

On 2026-02-28 03:36:24 [+0000], Jiayuan Chen wrote:
> > My only concern is that this will waste a percpu u32 per bond
> > device for the majority of bonding use cases (which use modes other than
> > balance-rr), which could be a few hundred bytes on a large machine.
> >
> > Does everything work reliably if the rr_tx_counter allocation
> > happens conditionally on mode == BOND_MODE_ROUNDROBIN in bond_setup, as
> > well as in bond_option_mode_set?
…
> An alternative would be to allocate conditionally in bond_init() (since
> the default mode is round-robin) and manage allocation/deallocation in
> bond_option_mode_set() when the mode changes.

This sounds reasonable.

> This is a trade-off between the added complexity of conditional
> alloc/free across multiple code paths and saving a per-CPU u32 for
> non-round-robin bonds.
>
> For the per-CPU u32 overhead, it's only 4 extra bytes per CPU per bond
> device — and machines with that many CPUs tend to have plenty of memory
> to match.

4 bytes is the minimum allocation for per-CPU memory. The memory is
already "there"; it is just not assigned. So for the 4 byte allocation it
is only necessary to find a single free area (the smallest allocation
size). In case there is no free block, a new block will be allocated and
mapped for each CPU, which is the part that costs memory.
That said, we should not waste memory, but it is not _that_ expensive
either for a bond device. Things change if there are hundreds of devices.

> Thanks
>
> > -J

Sebastian

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
  2026-03-02  8:10       ` Sebastian Andrzej Siewior
@ 2026-03-02 10:15         ` Jiayuan Chen
  2026-03-04  2:38           ` Jay Vosburgh
  0 siblings, 1 reply; 8+ messages in thread

From: Jiayuan Chen @ 2026-03-02 10:15 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: Jay Vosburgh, netdev, jiayuna.chen, jiayuna.chen, Jiayuan Chen,
      syzbot+80e046b8da2820b6ba73, Andrew Lunn, David S. Miller,
      Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
      Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
      Stanislav Fomichev, Andrii Nakryiko, Eduard Zingerman,
      Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
      Jiri Olsa, Shuah Khan, Clark Williams, Steven Rostedt, Jussi Maki,
      linux-kernel, bpf, linux-kselftest, linux-rt-devel

March 2, 2026 at 16:10, "Sebastian Andrzej Siewior" <bigeasy@linutronix.de> wrote:

> On 2026-02-28 03:36:24 [+0000], Jiayuan Chen wrote:
>
> > > My only concern is that this will waste a percpu u32 per bond
> > > device for the majority of bonding use cases (which use modes other than
> > > balance-rr), which could be a few hundred bytes on a large machine.
> > >
> > > Does everything work reliably if the rr_tx_counter allocation
> > > happens conditionally on mode == BOND_MODE_ROUNDROBIN in bond_setup, as
> > > well as in bond_option_mode_set?
> …
> > An alternative would be to allocate conditionally in bond_init() (since the default mode is round-robin)
> > and manage allocation/deallocation in bond_option_mode_set() when the mode changes.
>
> This sounds reasonable.
>
> > This is a trade-off between the added complexity of conditional alloc/free across multiple code
> > paths and saving a per-CPU u32 for non-round-robin bonds.
> >
> > For the per-CPU u32 overhead, it's only 4 extra bytes per CPU per bond device — and machines with
> > that many CPUs tend to have plenty of memory to match.
>
> 4 bytes is the minimum allocation for per-CPU memory. The memory is
> already "there" it is just not assigned. So for the 4 byte allocation it
> is needed to find a single area (the smallest allocation size).
> In case there no free block, a new block will be allocated and mapped
> for each CPU which the part that costs memory.
> That said, we should not waste memory but it is not _that_ expensive
> either for a bond device. Things change if here are hundreds of devices.

Hi Jay, Sebastian,

Sorry, the conditional alloc/free approach in bond_option_mode_set()
I suggested earlier doesn't work well on closer inspection.

Since bond_option_mode_set() requires the bond to be down, and as this
bug shows, the XDP redirect path can still reach a downed bond device,
we need to carefully order the operations:

- when switching to RR, alloc rr_tx_counter before setting mode;
- when switching away, set mode before freeing rr_tx_counter.

Strictly speaking, this also requires smp_wmb()/smp_rmb() pairs to
guarantee the ordering is visible to other CPUs — the write side in
bond_option_mode_set() and the read side in the XDP path.

In practice the race window is very small and unlikely to trigger, but
leaving out the barriers would look incorrect, and adding them to the
XDP hot path feels wrong for saving only 4 bytes per CPU per bond
device.

smp_wmb()/smp_rmb() (no test):

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
 	struct list_head *iter;
 	struct slave *slave;
 
-	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
-		bond->rr_tx_counter = alloc_percpu(u32);
-		if (!bond->rr_tx_counter)
-			return -ENOMEM;
-	}
-
 	/* reset slave->backup and slave->inactive */
 	if (bond_has_slaves(bond)) {
 		bond_for_each_slave(bond, slave, iter) {
@@ -5532,6 +5526,8 @@ bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
 
 	switch (BOND_MODE(bond)) {
 	case BOND_MODE_ROUNDROBIN:
+		/* Pairs with smp_wmb() in bond_option_mode_set() */
+		smp_rmb();
 		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
 		break;
 
@@ -6411,6 +6407,14 @@ static int bond_init(struct net_device *bond_dev)
 	if (!bond->wq)
 		return -ENOMEM;
 
+	/* Default mode is round-robin, allocate rr_tx_counter for it.
+	 * For mode changes, bond_option_mode_set() manages the lifecycle.
+	 */
+	bond->rr_tx_counter = alloc_percpu(u32);
+	if (!bond->rr_tx_counter) {
+		destroy_workqueue(bond->wq);
+		return -ENOMEM;
+	}
+
 	bond->notifier_ctx = false;

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -918,7 +918,27 @@ static int bond_option_mode_set(struct bonding *bond,
 	/* don't cache arp_validate between modes */
 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
-	bond->params.mode = newval->value;
+
+	if (newval->value == BOND_MODE_ROUNDROBIN) {
+		/* Switching to round-robin: allocate before setting mode,
+		 * so XDP path seeing BOND_MODE_ROUNDROBIN always finds
+		 * rr_tx_counter allocated.
+		 */
+		if (!bond->rr_tx_counter) {
+			bond->rr_tx_counter = alloc_percpu(u32);
+			if (!bond->rr_tx_counter)
+				return -ENOMEM;
+		}
+		/* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
+		smp_wmb();
+		bond->params.mode = newval->value;
+	} else {
+		/* Switching away: set mode first so XDP no longer
+		 * enters RR branch before we free rr_tx_counter.
+		 */
+		bond->params.mode = newval->value;
+		/* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
+		smp_wmb();
+		free_percpu(bond->rr_tx_counter);
+		bond->rr_tx_counter = NULL;
+	}

Thanks,
Jiayuan

> > Thanks
> >
> > > -J
>
> Sebastian

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
  2026-03-02 10:15         ` Jiayuan Chen
@ 2026-03-04  2:38           ` Jay Vosburgh
  0 siblings, 0 replies; 8+ messages in thread

From: Jay Vosburgh @ 2026-03-04  2:38 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: Sebastian Andrzej Siewior, netdev, jiayuna.chen, jiayuna.chen,
      Jiayuan Chen, syzbot+80e046b8da2820b6ba73, Andrew Lunn,
      David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
      Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
      John Fastabend, Stanislav Fomichev, Andrii Nakryiko,
      Eduard Zingerman, Martin KaFai Lau, Song Liu, Yonghong Song,
      KP Singh, Hao Luo, Jiri Olsa, Shuah Khan, Clark Williams,
      Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
      linux-rt-devel

Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

>March 2, 2026 at 16:10, "Sebastian Andrzej Siewior" <bigeasy@linutronix.de> wrote:
>
>> On 2026-02-28 03:36:24 [+0000], Jiayuan Chen wrote:
>>
>> > My only concern is that this will waste a percpu u32 per bond
>> > device for the majority of bonding use cases (which use modes other than
>> > balance-rr), which could be a few hundred bytes on a large machine.
>> >
>> > Does everything work reliably if the rr_tx_counter allocation
>> > happens conditionally on mode == BOND_MODE_ROUNDROBIN in bond_setup, as
>> > well as in bond_option_mode_set?
>> …
>> > An alternative would be to allocate conditionally in bond_init() (since the default mode is round-robin)
>> > and manage allocation/deallocation in bond_option_mode_set() when the mode changes.
>>
>> This sounds reasonable.
>>
>> > This is a trade-off between the added complexity of conditional alloc/free across multiple code
>> > paths and saving a per-CPU u32 for non-round-robin bonds.
>> >
>> > For the per-CPU u32 overhead, it's only 4 extra bytes per CPU per bond device — and machines with
>> > that many CPUs tend to have plenty of memory to match.
>>
>> 4 bytes is the minimum allocation for per-CPU memory. The memory is
>> already "there" it is just not assigned. So for the 4 byte allocation it
>> is needed to find a single area (the smallest allocation size).
>> In case there no free block, a new block will be allocated and mapped
>> for each CPU which the part that costs memory.
>> That said, we should not waste memory but it is not _that_ expensive
>> either for a bond device. Things change if here are hundreds of devices.
>
>Hi Jay, Sebastian,
>
>Sorry, the conditional alloc/free approach in bond_option_mode_set()
>I suggested earlier doesn't work well on closer inspection.
>
>Since bond_option_mode_set() requires the bond to be down, and as this
>bug shows, the XDP redirect path can still reach a downed bond device,
>we need to carefully order the operations:
>
>when switching to RR, alloc rr_tx_counter before setting mode;
>when switching away, set mode before freeing rr_tx_counter.
>
>Strictly speaking, this also requires smp_wmb()/smp_rmb() pairs to
>guarantee the ordering is visible to other CPUs — the write side in
>bond_option_mode_set() and the read side in the XDP path.
>
>In practice the race window is very small and unlikely to trigger, but
>leaving out the barriers would look incorrect, and adding them to the
>XDP hot path feels wrong for saving only 4 bytes per CPU per bond device.

	Ok, fair enough. Can you repost the last version, and please
note in the commit log that we're deliberately allowing the memory to be
allocated but unused for most cases for the above reasons?

	I'm also thinking that a short comment in the code along the
lines of "unused in most modes but needed for XDP" is worthwhile.

	Assuming Jakub, Eric, and Paolo don't object to the patch, I
think we should have it documented that we're doing this on purpose so
nobody tries to "fix" it later.

	-J

>smp_wmb()/smp_rmb() (no test):
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
> 	struct list_head *iter;
> 	struct slave *slave;
> 
>-	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
>-		bond->rr_tx_counter = alloc_percpu(u32);
>-		if (!bond->rr_tx_counter)
>-			return -ENOMEM;
>-	}
>-
> 	/* reset slave->backup and slave->inactive */
> 	if (bond_has_slaves(bond)) {
> 		bond_for_each_slave(bond, slave, iter) {
>@@ -5532,6 +5526,8 @@ bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)
> 
> 	switch (BOND_MODE(bond)) {
> 	case BOND_MODE_ROUNDROBIN:
>+		/* Pairs with smp_wmb() in bond_option_mode_set() */
>+		smp_rmb();
> 		slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
> 		break;
> 
>@@ -6411,6 +6407,14 @@ static int bond_init(struct net_device *bond_dev)
> 	if (!bond->wq)
> 		return -ENOMEM;
> 
>+	/* Default mode is round-robin, allocate rr_tx_counter for it.
>+	 * For mode changes, bond_option_mode_set() manages the lifecycle.
>+	 */
>+	bond->rr_tx_counter = alloc_percpu(u32);
>+	if (!bond->rr_tx_counter) {
>+		destroy_workqueue(bond->wq);
>+		return -ENOMEM;
>+	}
>+
> 	bond->notifier_ctx = false;
>
>diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
>--- a/drivers/net/bonding/bond_options.c
>+++ b/drivers/net/bonding/bond_options.c
>@@ -918,7 +918,27 @@ static int bond_option_mode_set(struct bonding *bond,
> 	/* don't cache arp_validate between modes */
> 	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
>-	bond->params.mode = newval->value;
>+
>+	if (newval->value == BOND_MODE_ROUNDROBIN) {
>+		/* Switching to round-robin: allocate before setting mode,
>+		 * so XDP path seeing BOND_MODE_ROUNDROBIN always finds
>+		 * rr_tx_counter allocated.
>+		 */
>+		if (!bond->rr_tx_counter) {
>+			bond->rr_tx_counter = alloc_percpu(u32);
>+			if (!bond->rr_tx_counter)
>+				return -ENOMEM;
>+		}
>+		/* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
>+		smp_wmb();
>+		bond->params.mode = newval->value;
>+	} else {
>+		/* Switching away: set mode first so XDP no longer
>+		 * enters RR branch before we free rr_tx_counter.
>+		 */
>+		bond->params.mode = newval->value;
>+		/* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
>+		smp_wmb();
>+		free_percpu(bond->rr_tx_counter);
>+		bond->rr_tx_counter = NULL;
>+	}
>
>Thanks,
>Jiayuan
>
>> > Thanks
>> >
>> > > -J
>>
>> Sebastian

---
	-Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply	[flat|nested] 8+ messages in thread
* [PATCH net v3 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up
  2026-02-28  2:19 [PATCH net v3 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest Jiayuan Chen
  2026-02-28  2:19 ` [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
@ 2026-02-28  2:19 ` Jiayuan Chen
  1 sibling, 0 replies; 8+ messages in thread
From: Jiayuan Chen @ 2026-02-28  2:19 UTC (permalink / raw)
  To: netdev
  Cc: jiayuna.chen, jiayuna.chen, Jiayuan Chen, Jay Vosburgh,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel

From: Jiayuan Chen <jiayuan.chen@shopee.com>

Add a selftest that reproduces the null-ptr-deref in
bond_rr_gen_slave_id() when XDP redirect targets a bond device in
round-robin mode that was never brought up. The test verifies the fix
by ensuring no crash occurs.

Test setup:
- bond0: active-backup mode, UP, with native XDP (enables
  bpf_master_redirect_enabled_key globally)
- bond1: round-robin mode, never UP
- veth1: slave of bond1, with generic XDP (XDP_TX)
- BPF_PROG_TEST_RUN with live frames triggers the redirect path

Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
 .../selftests/bpf/prog_tests/xdp_bonding.c | 101 +++++++++++++++++-
 1 file changed, 99 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
index fb952703653e..a5b15e464018 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -191,13 +191,18 @@ static int bonding_setup(struct skeletons *skeletons, int mode, int xmit_policy,
 	return -1;
 }

-static void bonding_cleanup(struct skeletons *skeletons)
+static void link_cleanup(struct skeletons *skeletons)
 {
-	restore_root_netns();
 	while (skeletons->nlinks) {
 		skeletons->nlinks--;
 		bpf_link__destroy(skeletons->links[skeletons->nlinks]);
 	}
+}
+
+static void bonding_cleanup(struct skeletons *skeletons)
+{
+	restore_root_netns();
+	link_cleanup(skeletons);
 	ASSERT_OK(system("ip link delete bond1"), "delete bond1");
 	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
 	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
@@ -493,6 +498,95 @@ static void test_xdp_bonding_nested(struct skeletons *skeletons)
 	system("ip link del bond_nest2");
 }

+/*
+ * Test that XDP redirect via xdp_master_redirect() does not crash when
+ * the bond master device is not up. When bond is in round-robin mode but
+ * never opened, rr_tx_counter is NULL.
+ */
+static void test_xdp_bonding_redirect_no_up(struct skeletons *skeletons)
+{
+	struct nstoken *nstoken = NULL;
+	int xdp_pass_fd, xdp_tx_fd;
+	int veth1_ifindex;
+	int err;
+	char pkt[ETH_HLEN + 1];
+	struct xdp_md ctx_in = {};
+
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+			    .data_in = &pkt,
+			    .data_size_in = sizeof(pkt),
+			    .ctx_in = &ctx_in,
+			    .ctx_size_in = sizeof(ctx_in),
+			    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+			    .repeat = 1,
+			    .batch_size = 1,
+	);
+
+	/* We can't use bonding_setup() because bond will be active */
+	SYS(out, "ip netns add ns_rr_no_up");
+	nstoken = open_netns("ns_rr_no_up");
+	if (!ASSERT_OK_PTR(nstoken, "open ns_rr_no_up"))
+		goto out;
+
+	/* bond0: active-backup, UP with slave veth0.
+	 * Attaching native XDP to bond0 enables bpf_master_redirect_enabled_key
+	 * globally.
+	 */
+	SYS(out, "ip link add bond0 type bond mode active-backup");
+	SYS(out, "ip link add veth0 type veth peer name veth0p");
+	SYS(out, "ip link set veth0 master bond0");
+	SYS(out, "ip link set bond0 up");
+	SYS(out, "ip link set veth0p up");
+
+	/* bond1: round-robin, never UP -> rr_tx_counter stays NULL */
+	SYS(out, "ip link add bond1 type bond mode balance-rr");
+	SYS(out, "ip link add veth1 type veth peer name veth1p");
+	SYS(out, "ip link set veth1 master bond1");
+
+	veth1_ifindex = if_nametoindex("veth1");
+	if (!ASSERT_GT(veth1_ifindex, 0, "veth1_ifindex"))
+		goto out;
+
+	/* Attach native XDP to bond0 -> enables global redirect key */
+	if (xdp_attach(skeletons, skeletons->xdp_tx->progs.xdp_tx, "bond0"))
+		goto out;
+
+	/* Attach generic XDP (XDP_TX) to veth1.
+	 * When packets arrive at veth1 via netif_receive_skb, do_xdp_generic()
+	 * runs this program. XDP_TX + bond slave triggers xdp_master_redirect().
+	 */
+	xdp_tx_fd = bpf_program__fd(skeletons->xdp_tx->progs.xdp_tx);
+	if (!ASSERT_GE(xdp_tx_fd, 0, "xdp_tx prog_fd"))
+		goto out;
+
+	err = bpf_xdp_attach(veth1_ifindex, xdp_tx_fd,
+			     XDP_FLAGS_SKB_MODE, NULL);
+	if (!ASSERT_OK(err, "attach generic XDP to veth1"))
+		goto out;
+
+	/* Run BPF_PROG_TEST_RUN with XDP_PASS live frames on veth1.
+	 * XDP_PASS frames become SKBs with skb->dev = veth1, entering
+	 * netif_receive_skb -> do_xdp_generic -> xdp_master_redirect.
+	 * Without the fix, bond_rr_gen_slave_id() dereferences NULL
+	 * rr_tx_counter and crashes.
+	 */
+	xdp_pass_fd = bpf_program__fd(skeletons->xdp_dummy->progs.xdp_dummy_prog);
+	if (!ASSERT_GE(xdp_pass_fd, 0, "xdp_pass prog_fd"))
+		goto out;
+
+	memset(pkt, 0, sizeof(pkt));
+	ctx_in.data_end = sizeof(pkt);
+	ctx_in.ingress_ifindex = veth1_ifindex;
+
+	err = bpf_prog_test_run_opts(xdp_pass_fd, &opts);
+	ASSERT_OK(err, "xdp_pass test_run should not crash");
+
+out:
+	link_cleanup(skeletons);
+	close_netns(nstoken);
+	SYS_NOFAIL("ip netns del ns_rr_no_up");
+}
+
 static void test_xdp_bonding_features(struct skeletons *skeletons)
 {
 	LIBBPF_OPTS(bpf_xdp_query_opts, query_opts);
@@ -680,6 +774,9 @@ void serial_test_xdp_bonding(void)
 	if (test__start_subtest("xdp_bonding_redirect_multi"))
 		test_xdp_bonding_redirect_multi(&skeletons);

+	if (test__start_subtest("xdp_bonding_redirect_no_up"))
+		test_xdp_bonding_redirect_no_up(&skeletons);
+
 out:
 	xdp_dummy__destroy(skeletons.xdp_dummy);
 	xdp_tx__destroy(skeletons.xdp_tx);
--
2.43.0

^ permalink raw reply related	[flat|nested] 8+ messages in thread
end of thread, other threads:[~2026-03-04  2:38 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-28  2:19 [PATCH net v3 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest Jiayuan Chen
2026-02-28  2:19 ` [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
2026-02-28  3:01   ` Jay Vosburgh
2026-02-28  3:36     ` Jiayuan Chen
2026-03-02  8:10       ` Sebastian Andrzej Siewior
2026-03-02 10:15         ` Jiayuan Chen
2026-03-04  2:38           ` Jay Vosburgh
2026-02-28  2:19 ` [PATCH net v3 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up Jiayuan Chen