* [PATCH net v4 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest
@ 2026-03-04 7:42 Jiayuan Chen
2026-03-04 7:42 ` [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
2026-03-04 7:42 ` [PATCH net v4 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up Jiayuan Chen
0 siblings, 2 replies; 13+ messages in thread
From: Jiayuan Chen @ 2026-03-04 7:42 UTC (permalink / raw)
To: jv, netdev
Cc: jiayuan.chen, jiayuan.chen, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, Andrii Nakryiko, Eduard Zingerman,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
linux-rt-devel
syzkaller reported a kernel panic [1] with the following crash stack:
BUG: unable to handle page fault for address: ffff8ebd08580000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 11f201067 P4D 11f201067 PUD 0
Oops: Oops: 0002 [#1] SMP PTI
CPU: 2 UID: 0 PID: 451 Comm: test_progs Not tainted 6.19.0+ #161 PREEMPT_RT
RIP: 0010:bond_rr_gen_slave_id+0x90/0xd0
RSP: 0018:ffffd3f4815f3448 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff8ebc8728b17e
RDX: 0000000000000000 RSI: ffffd3f4815f3538 RDI: ffff8ebc8abcce40
RBP: ffffd3f4815f3460 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffd3f4815f3538
R13: ffff8ebc8abcce40 R14: ffff8ebc8728b17f R15: ffff8ebc8728b170
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8ebd08580000 CR3: 000000010a808006 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
<TASK>
bond_xdp_get_xmit_slave+0xc0/0x240
xdp_master_redirect+0x74/0xc0
bpf_prog_run_generic_xdp+0x2f2/0x3f0
do_xdp_generic+0x1fd/0x3d0
__netif_receive_skb_core.constprop.0+0x30d/0x1220
__netif_receive_skb_list_core+0xfc/0x250
netif_receive_skb_list_internal+0x20c/0x3d0
? eth_type_trans+0x137/0x160
netif_receive_skb_list+0x25/0x140
xdp_test_run_batch.constprop.0+0x65b/0x6e0
bpf_test_run_xdp_live+0x1ec/0x3b0
bpf_prog_test_run_xdp+0x49d/0x6e0
__sys_bpf+0x446/0x27b0
__x64_sys_bpf+0x1a/0x30
x64_sys_call+0x146c/0x26e0
do_syscall_64+0xd3/0x1510
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Problem Description
bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
when the bond mode is round-robin. If the bond device was never brought
up, rr_tx_counter remains NULL.
The XDP redirect path can reach this code even when the bond is not up:
bpf_master_redirect_enabled_key is a global static key, so when any bond
device has native XDP attached, the XDP_TX -> xdp_master_redirect()
interception is enabled for all bond slaves system-wide.
Solution
Patch 1: Allocate rr_tx_counter unconditionally in bond_init() (ndo_init).
Patch 2: Add a selftest that reproduces the above scenario.
Changes since v3:
https://lore.kernel.org/netdev/20260228021918.141002-1-jiayuan.chen@linux.dev/T/#t
- Added code comment and commit log explaining why rr_tx_counter is
allocated unconditionally for all modes (Suggested by Jay Vosburgh)
Changes since v2:
https://lore.kernel.org/netdev/20260227092254.272603-1-jiayuan.chen@linux.dev/T/#t
- Moved allocation from bond_create_init() helper into bond_init()
(ndo_init), which is the natural single point covering both creation
paths and also handles post-creation mode changes to round-robin
Changes since v1:
https://lore.kernel.org/netdev/20260224112545.37888-1-jiayuan.chen@linux.dev/T/#t
- Moved the guard for NULL rr_tx_counter from xdp_master_redirect()
into the bonding subsystem itself
(Suggested by Sebastian Andrzej Siewior <bigeasy@linutronix.de>)
[1] https://syzkaller.appspot.com/bug?extid=80e046b8da2820b6ba73
Jiayuan Chen (2):
bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
selftests/bpf: add test for xdp_master_redirect with bond not up
drivers/net/bonding/bond_main.c | 19 ++--
.../selftests/bpf/prog_tests/xdp_bonding.c | 101 +++++++++++++++++-
2 files changed, 112 insertions(+), 8 deletions(-)
--
2.43.0
^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 7:42 [PATCH net v4 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest Jiayuan Chen
@ 2026-03-04 7:42 ` Jiayuan Chen
2026-03-04 8:20 ` Daniel Borkmann
2026-03-04 15:59 ` Nikolay Aleksandrov
2026-03-04 7:42 ` [PATCH net v4 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up Jiayuan Chen
1 sibling, 2 replies; 13+ messages in thread
From: Jiayuan Chen @ 2026-03-04 7:42 UTC (permalink / raw)
To: jv, netdev
Cc: jiayuan.chen, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
From: Jiayuan Chen <jiayuan.chen@shopee.com>
bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
when the bond mode is round-robin. If the bond device was never brought
up, rr_tx_counter remains NULL, causing a null-ptr-deref.
The XDP redirect path can reach this code even when the bond is not up:
bpf_master_redirect_enabled_key is a global static key, so when any bond
device has native XDP attached, the XDP_TX -> xdp_master_redirect()
interception is enabled for all bond slaves system-wide. This allows the
path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
reached on a bond that was never opened.
Fix this by allocating rr_tx_counter unconditionally in bond_init()
(ndo_init), which is called by register_netdevice() and covers both
device creation paths (bond_create() and bond_newlink()). This also
handles the case where bond mode is changed to round-robin after device
creation. The conditional allocation in bond_open() is removed. Since
bond_destructor() already unconditionally calls
free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
ndo_init, free at destructor.
Note: rr_tx_counter is only used by round-robin mode, so this
deliberately allocates a per-cpu u32 that goes unused for other modes.
Conditional allocation (e.g., in bond_option_mode_set) was considered
but rejected: the XDP path can race with mode changes on a downed bond,
and adding memory barriers to the XDP hot path is not justified for
saving 4 bytes per CPU.
Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
drivers/net/bonding/bond_main.c | 19 +++++++++++++------
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 78cff904cdc3..55b5c7a6cb5f 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
struct list_head *iter;
struct slave *slave;
- if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
- bond->rr_tx_counter = alloc_percpu(u32);
- if (!bond->rr_tx_counter)
- return -ENOMEM;
- }
-
/* reset slave->backup and slave->inactive */
if (bond_has_slaves(bond)) {
bond_for_each_slave(bond, slave, iter) {
@@ -6411,6 +6405,19 @@ static int bond_init(struct net_device *bond_dev)
if (!bond->wq)
return -ENOMEM;
+ /* rr_tx_counter is only used in round-robin mode, but we allocate
+ * it unconditionally because the XDP redirect path
+ * (xdp_master_redirect -> bond_xdp_get_xmit_slave) can reach here
+ * even when the bond is not up, and deferring allocation to
+ * bond_open or bond_option_mode_set would require memory barriers
+ * on the XDP hot path. The cost is a per-cpu u32 per bond device.
+ */
+ bond->rr_tx_counter = alloc_percpu(u32);
+ if (!bond->rr_tx_counter) {
+ destroy_workqueue(bond->wq);
+ return -ENOMEM;
+ }
+
bond->notifier_ctx = false;
spin_lock_init(&bond->stats_lock);
--
2.43.0
* [PATCH net v4 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up
2026-03-04 7:42 [PATCH net v4 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest Jiayuan Chen
2026-03-04 7:42 ` [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
@ 2026-03-04 7:42 ` Jiayuan Chen
1 sibling, 0 replies; 13+ messages in thread
From: Jiayuan Chen @ 2026-03-04 7:42 UTC (permalink / raw)
To: jv, netdev
Cc: jiayuan.chen, jiayuan.chen, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, Andrii Nakryiko, Eduard Zingerman,
Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
linux-rt-devel
From: Jiayuan Chen <jiayuan.chen@shopee.com>
Add a selftest that reproduces the null-ptr-deref in
bond_rr_gen_slave_id() when XDP redirect targets a bond device in
round-robin mode that was never brought up. The test verifies the fix
by ensuring no crash occurs.
Test setup:
- bond0: active-backup mode, UP, with native XDP (enables
bpf_master_redirect_enabled_key globally)
- bond1: round-robin mode, never UP
- veth1: slave of bond1, with generic XDP (XDP_TX)
- BPF_PROG_TEST_RUN with live frames triggers the redirect path
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
.../selftests/bpf/prog_tests/xdp_bonding.c | 101 +++++++++++++++++-
1 file changed, 99 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
index fb952703653e..a5b15e464018 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -191,13 +191,18 @@ static int bonding_setup(struct skeletons *skeletons, int mode, int xmit_policy,
return -1;
}
-static void bonding_cleanup(struct skeletons *skeletons)
+static void link_cleanup(struct skeletons *skeletons)
{
- restore_root_netns();
while (skeletons->nlinks) {
skeletons->nlinks--;
bpf_link__destroy(skeletons->links[skeletons->nlinks]);
}
+}
+
+static void bonding_cleanup(struct skeletons *skeletons)
+{
+ restore_root_netns();
+ link_cleanup(skeletons);
ASSERT_OK(system("ip link delete bond1"), "delete bond1");
ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
@@ -493,6 +498,95 @@ static void test_xdp_bonding_nested(struct skeletons *skeletons)
system("ip link del bond_nest2");
}
+/*
+ * Test that XDP redirect via xdp_master_redirect() does not crash when
+ * the bond master device is not up. When the bond is in round-robin
+ * mode but never opened, rr_tx_counter is NULL.
+ */
+static void test_xdp_bonding_redirect_no_up(struct skeletons *skeletons)
+{
+ struct nstoken *nstoken = NULL;
+ int xdp_pass_fd, xdp_tx_fd;
+ int veth1_ifindex;
+ int err;
+ char pkt[ETH_HLEN + 1];
+ struct xdp_md ctx_in = {};
+
+ DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+ .data_in = &pkt,
+ .data_size_in = sizeof(pkt),
+ .ctx_in = &ctx_in,
+ .ctx_size_in = sizeof(ctx_in),
+ .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+ .repeat = 1,
+ .batch_size = 1,
+ );
+
+ /* We can't use bonding_setup() because bond will be active */
+ SYS(out, "ip netns add ns_rr_no_up");
+ nstoken = open_netns("ns_rr_no_up");
+ if (!ASSERT_OK_PTR(nstoken, "open ns_rr_no_up"))
+ goto out;
+
+ /* bond0: active-backup, UP with slave veth0.
+ * Attaching native XDP to bond0 enables bpf_master_redirect_enabled_key
+ * globally.
+ */
+ SYS(out, "ip link add bond0 type bond mode active-backup");
+ SYS(out, "ip link add veth0 type veth peer name veth0p");
+ SYS(out, "ip link set veth0 master bond0");
+ SYS(out, "ip link set bond0 up");
+ SYS(out, "ip link set veth0p up");
+
+ /* bond1: round-robin, never UP -> rr_tx_counter stays NULL */
+ SYS(out, "ip link add bond1 type bond mode balance-rr");
+ SYS(out, "ip link add veth1 type veth peer name veth1p");
+ SYS(out, "ip link set veth1 master bond1");
+
+ veth1_ifindex = if_nametoindex("veth1");
+ if (!ASSERT_GT(veth1_ifindex, 0, "veth1_ifindex"))
+ goto out;
+
+ /* Attach native XDP to bond0 -> enables global redirect key */
+ if (xdp_attach(skeletons, skeletons->xdp_tx->progs.xdp_tx, "bond0"))
+ goto out;
+
+ /* Attach generic XDP (XDP_TX) to veth1.
+ * When packets arrive at veth1 via netif_receive_skb, do_xdp_generic()
+ * runs this program. XDP_TX + bond slave triggers xdp_master_redirect().
+ */
+ xdp_tx_fd = bpf_program__fd(skeletons->xdp_tx->progs.xdp_tx);
+ if (!ASSERT_GE(xdp_tx_fd, 0, "xdp_tx prog_fd"))
+ goto out;
+
+ err = bpf_xdp_attach(veth1_ifindex, xdp_tx_fd,
+ XDP_FLAGS_SKB_MODE, NULL);
+ if (!ASSERT_OK(err, "attach generic XDP to veth1"))
+ goto out;
+
+ /* Run BPF_PROG_TEST_RUN with XDP_PASS live frames on veth1.
+ * XDP_PASS frames become SKBs with skb->dev = veth1, entering
+ * netif_receive_skb -> do_xdp_generic -> xdp_master_redirect.
+ * Without the fix, bond_rr_gen_slave_id() dereferences NULL
+ * rr_tx_counter and crashes.
+ */
+ xdp_pass_fd = bpf_program__fd(skeletons->xdp_dummy->progs.xdp_dummy_prog);
+ if (!ASSERT_GE(xdp_pass_fd, 0, "xdp_pass prog_fd"))
+ goto out;
+
+ memset(pkt, 0, sizeof(pkt));
+ ctx_in.data_end = sizeof(pkt);
+ ctx_in.ingress_ifindex = veth1_ifindex;
+
+ err = bpf_prog_test_run_opts(xdp_pass_fd, &opts);
+ ASSERT_OK(err, "xdp_pass test_run should not crash");
+
+out:
+ link_cleanup(skeletons);
+ close_netns(nstoken);
+ SYS_NOFAIL("ip netns del ns_rr_no_up");
+}
+
static void test_xdp_bonding_features(struct skeletons *skeletons)
{
LIBBPF_OPTS(bpf_xdp_query_opts, query_opts);
@@ -680,6 +774,9 @@ void serial_test_xdp_bonding(void)
if (test__start_subtest("xdp_bonding_redirect_multi"))
test_xdp_bonding_redirect_multi(&skeletons);
+ if (test__start_subtest("xdp_bonding_redirect_no_up"))
+ test_xdp_bonding_redirect_no_up(&skeletons);
+
out:
xdp_dummy__destroy(skeletons.xdp_dummy);
xdp_tx__destroy(skeletons.xdp_tx);
--
2.43.0
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 7:42 ` [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
@ 2026-03-04 8:20 ` Daniel Borkmann
2026-03-04 8:47 ` Jiayuan Chen
2026-03-04 9:40 ` Sebastian Andrzej Siewior
2026-03-04 15:59 ` Nikolay Aleksandrov
1 sibling, 2 replies; 13+ messages in thread
From: Daniel Borkmann @ 2026-03-04 8:20 UTC (permalink / raw)
To: Jiayuan Chen, jv, netdev
Cc: jiayuan.chen, syzbot+80e046b8da2820b6ba73, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
linux-rt-devel
On 3/4/26 8:42 AM, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> when the bond mode is round-robin. If the bond device was never brought
> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
>
> The XDP redirect path can reach this code even when the bond is not up:
> bpf_master_redirect_enabled_key is a global static key, so when any bond
> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> interception is enabled for all bond slaves system-wide. This allows the
> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> reached on a bond that was never opened.
>
> Fix this by allocating rr_tx_counter unconditionally in bond_init()
> (ndo_init), which is called by register_netdevice() and covers both
> device creation paths (bond_create() and bond_newlink()). This also
> handles the case where bond mode is changed to round-robin after device
> creation. The conditional allocation in bond_open() is removed. Since
> bond_destructor() already unconditionally calls
> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> ndo_init, free at destructor.
>
> Note: rr_tx_counter is only used by round-robin mode, so this
> deliberately allocates a per-cpu u32 that goes unused for other modes.
> Conditional allocation (e.g., in bond_option_mode_set) was considered
> but rejected: the XDP path can race with mode changes on a downed bond,
> and adding memory barriers to the XDP hot path is not justified for
> saving 4 bytes per CPU.
Arguably it's a corner case, but could we not just do sth like this to
actually check if the device is up and if not drop?
diff --git a/net/core/filter.c b/net/core/filter.c
index ba019ded773d..c447fd989a27 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4387,6 +4387,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
struct net_device *master, *slave;
master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+ if (unlikely(!(master->flags & IFF_UP)))
+ return XDP_ABORTED;
+
slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
if (slave && slave != xdp->rxq->dev) {
/* The target device is different from the receiving device, so
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 8:20 ` Daniel Borkmann
@ 2026-03-04 8:47 ` Jiayuan Chen
2026-03-04 9:40 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 13+ messages in thread
From: Jiayuan Chen @ 2026-03-04 8:47 UTC (permalink / raw)
To: Daniel Borkmann, jv, netdev
Cc: jiayuan.chen, syzbot+80e046b8da2820b6ba73, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Jesper Dangaard Brouer, John Fastabend,
Stanislav Fomichev, Andrii Nakryiko, Martin KaFai Lau,
Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
Jiri Olsa, Shuah Khan, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt, Jussi Maki, linux-kernel, bpf, linux-kselftest,
linux-rt-devel
March 4, 2026 at 16:20, "Daniel Borkmann" <daniel@iogearbox.net> wrote:
>
> On 3/4/26 8:42 AM, Jiayuan Chen wrote:
>
> >
> > From: Jiayuan Chen <jiayuan.chen@shopee.com>
> > bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> > check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> > when the bond mode is round-robin. If the bond device was never brought
> > up, rr_tx_counter remains NULL, causing a null-ptr-deref.
> > The XDP redirect path can reach this code even when the bond is not up:
> > bpf_master_redirect_enabled_key is a global static key, so when any bond
> > device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> > interception is enabled for all bond slaves system-wide. This allows the
> > path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> > bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> > reached on a bond that was never opened.
> > Fix this by allocating rr_tx_counter unconditionally in bond_init()
> > (ndo_init), which is called by register_netdevice() and covers both
> > device creation paths (bond_create() and bond_newlink()). This also
> > handles the case where bond mode is changed to round-robin after device
> > creation. The conditional allocation in bond_open() is removed. Since
> > bond_destructor() already unconditionally calls
> > free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> > ndo_init, free at destructor.
> > Note: rr_tx_counter is only used by round-robin mode, so this
> > deliberately allocates a per-cpu u32 that goes unused for other modes.
> > Conditional allocation (e.g., in bond_option_mode_set) was considered
> > but rejected: the XDP path can race with mode changes on a downed bond,
> > and adding memory barriers to the XDP hot path is not justified for
> > saving 4 bytes per CPU.
> >
> Arguably it's a corner case, but could we not just do sth like this to
> actually check if the device is up and if not drop?
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index ba019ded773d..c447fd989a27 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4387,6 +4387,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
> struct net_device *master, *slave;
> master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> + if (unlikely(!(master->flags & IFF_UP)))
> + return XDP_ABORTED;
> +
> slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
> if (slave && slave != xdp->rxq->dev) {
> /* The target device is different from the receiving device, so
>
Hi Daniel,
It was discussed at [1]. The primary concern at the time was that
inferring from the interface being up that bond->rr_tx_counter had
been allocated was too implicit.
[1] https://lore.kernel.org/netdev/20260224112545.37888-1-jiayuan.chen@linux.dev/T/#t
Feel free to share any other thoughts or ideas.
Thanks,
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 8:20 ` Daniel Borkmann
2026-03-04 8:47 ` Jiayuan Chen
@ 2026-03-04 9:40 ` Sebastian Andrzej Siewior
1 sibling, 0 replies; 13+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-03-04 9:40 UTC (permalink / raw)
To: Daniel Borkmann
Cc: Jiayuan Chen, jv, netdev, jiayuan.chen,
syzbot+80e046b8da2820b6ba73, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Alexei Starovoitov,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Clark Williams, Steven Rostedt, Jussi Maki, linux-kernel, bpf,
linux-kselftest, linux-rt-devel
On 2026-03-04 09:20:27 [+0100], Daniel Borkmann wrote:
> Arguably it's a corner case, but could we not just do sth like this to
> actually check if the device is up and if not drop?
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index ba019ded773d..c447fd989a27 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4387,6 +4387,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
> struct net_device *master, *slave;
> master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> + if (unlikely(!(master->flags & IFF_UP)))
> + return XDP_ABORTED;
Preemption (e.g. the vCPU being scheduled out): while this task is
paused, another CPU could put the device down. Or is that too much of
a corner case?
There is also ndo_get_xmit_slave, which at this time is only used
by Infiniband and supports only active-backup mode [0].
[0] https://lore.kernel.org/netdev/999129.1772247707@famine/
> +
> slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
> if (slave && slave != xdp->rxq->dev) {
> /* The target device is different from the receiving device, so
Sebastian
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 7:42 ` [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
2026-03-04 8:20 ` Daniel Borkmann
@ 2026-03-04 15:59 ` Nikolay Aleksandrov
2026-03-04 17:27 ` Jay Vosburgh
1 sibling, 1 reply; 13+ messages in thread
From: Nikolay Aleksandrov @ 2026-03-04 15:59 UTC (permalink / raw)
To: Jiayuan Chen
Cc: jv, netdev, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> when the bond mode is round-robin. If the bond device was never brought
> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
>
> The XDP redirect path can reach this code even when the bond is not up:
> bpf_master_redirect_enabled_key is a global static key, so when any bond
> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> interception is enabled for all bond slaves system-wide. This allows the
> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> reached on a bond that was never opened.
>
> Fix this by allocating rr_tx_counter unconditionally in bond_init()
> (ndo_init), which is called by register_netdevice() and covers both
> device creation paths (bond_create() and bond_newlink()). This also
> handles the case where bond mode is changed to round-robin after device
> creation. The conditional allocation in bond_open() is removed. Since
> bond_destructor() already unconditionally calls
> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> ndo_init, free at destructor.
>
> Note: rr_tx_counter is only used by round-robin mode, so this
> deliberately allocates a per-cpu u32 that goes unused for other modes.
> Conditional allocation (e.g., in bond_option_mode_set) was considered
> but rejected: the XDP path can race with mode changes on a downed bond,
> and adding memory barriers to the XDP hot path is not justified for
> saving 4 bytes per CPU.
>
> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> ---
> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
> 1 file changed, 13 insertions(+), 6 deletions(-)
>
IMO it's not worth it to waste memory in all modes, for an unpopular mode.
I think it'd be better to add a null check in bond_rr_gen_slave_id(),
READ/WRITE_ONCE() should be enough since it is allocated only once, and
freed when the xmit code cannot be reachable anymore (otherwise we'd have
more bugs now). The branch will be successfully predicted practically always,
and you can also mark the ptr being null as unlikely. That way only RR takes
a very minimal hit, if any.
Cheers,
Nik
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 15:59 ` Nikolay Aleksandrov
@ 2026-03-04 17:27 ` Jay Vosburgh
2026-03-04 17:32 ` Nikolay Aleksandrov
0 siblings, 1 reply; 13+ messages in thread
From: Jay Vosburgh @ 2026-03-04 17:27 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Jiayuan Chen, netdev, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
Nikolay Aleksandrov <razor@blackwall.org> wrote:
>On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
>> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>>
>> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
>> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
>> when the bond mode is round-robin. If the bond device was never brought
>> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
>>
>> The XDP redirect path can reach this code even when the bond is not up:
>> bpf_master_redirect_enabled_key is a global static key, so when any bond
>> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
>> interception is enabled for all bond slaves system-wide. This allows the
>> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
>> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
>> reached on a bond that was never opened.
>>
>> Fix this by allocating rr_tx_counter unconditionally in bond_init()
>> (ndo_init), which is called by register_netdevice() and covers both
>> device creation paths (bond_create() and bond_newlink()). This also
>> handles the case where bond mode is changed to round-robin after device
>> creation. The conditional allocation in bond_open() is removed. Since
>> bond_destructor() already unconditionally calls
>> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
>> ndo_init, free at destructor.
>>
>> Note: rr_tx_counter is only used by round-robin mode, so this
>> deliberately allocates a per-cpu u32 that goes unused for other modes.
>> Conditional allocation (e.g., in bond_option_mode_set) was considered
>> but rejected: the XDP path can race with mode changes on a downed bond,
>> and adding memory barriers to the XDP hot path is not justified for
>> saving 4 bytes per CPU.
>>
>> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
>> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
>> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
>> ---
>> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
>> 1 file changed, 13 insertions(+), 6 deletions(-)
>>
>
>IMO it's not worth it to waste memory in all modes, for an unpopular mode.
>I think it'd be better to add a null check in bond_rr_gen_slave_id(),
>READ/WRITE_ONCE() should be enough since it is allocated only once, and
>freed when the xmit code cannot be reachable anymore (otherwise we'd have
>more bugs now). The branch will be successfully predicted practically always,
>and you can also mark the ptr being null as unlikely. That way only RR takes
>a very minimal hit, if any.
Is what you're suggesting different from Jiayuan's proposal[0],
in the sense of needing barriers in the XDP hot path to ensure ordering?
If I understand correctly, your suggestion is something like
(totally untested):
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index eb27cacc26d7..ac2a4fc0aad0 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4273,13 +4273,17 @@ void bond_work_cancel_all(struct bonding *bond)
 static int bond_open(struct net_device *bond_dev)
 {
 	struct bonding *bond = netdev_priv(bond_dev);
+	u32 __percpu *rr_tx_tmp;
 	struct list_head *iter;
 	struct slave *slave;
 
-	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
-		bond->rr_tx_counter = alloc_percpu(u32);
-		if (!bond->rr_tx_counter)
+	if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN &&
+	    !READ_ONCE(bond->rr_tx_counter)) {
+		rr_tx_tmp = alloc_percpu(u32);
+		if (!rr_tx_tmp)
 			return -ENOMEM;
+		WRITE_ONCE(bond->rr_tx_counter, rr_tx_tmp);
+
 	}
 
 	/* reset slave->backup and slave->inactive */
@@ -4866,6 +4870,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
 	struct reciprocal_value reciprocal_packets_per_slave;
 	int packets_per_slave = bond->params.packets_per_slave;
 
+	if (!READ_ONCE(bond->rr_tx_counter))
+		packets_per_slave = 0;
+
 	switch (packets_per_slave) {
 	case 0:
 		slave_id = get_random_u32();
-J
[0] https://lore.kernel.org/netdev/e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev/
---
-Jay Vosburgh, jv@jvosburgh.net
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 17:27 ` Jay Vosburgh
@ 2026-03-04 17:32 ` Nikolay Aleksandrov
2026-03-05 21:03 ` Jay Vosburgh
0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Aleksandrov @ 2026-03-04 17:32 UTC (permalink / raw)
To: Jay Vosburgh
Cc: Jiayuan Chen, netdev, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
On Wed, Mar 04, 2026 at 09:27:28AM -0800, Jay Vosburgh wrote:
> Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> >On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
> >> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> >>
> >> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> >> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> >> when the bond mode is round-robin. If the bond device was never brought
> >> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
> >>
> >> The XDP redirect path can reach this code even when the bond is not up:
> >> bpf_master_redirect_enabled_key is a global static key, so when any bond
> >> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> >> interception is enabled for all bond slaves system-wide. This allows the
> >> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> >> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> >> reached on a bond that was never opened.
> >>
> >> Fix this by allocating rr_tx_counter unconditionally in bond_init()
> >> (ndo_init), which is called by register_netdevice() and covers both
> >> device creation paths (bond_create() and bond_newlink()). This also
> >> handles the case where bond mode is changed to round-robin after device
> >> creation. The conditional allocation in bond_open() is removed. Since
> >> bond_destructor() already unconditionally calls
> >> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> >> ndo_init, free at destructor.
> >>
> >> Note: rr_tx_counter is only used by round-robin mode, so this
> >> deliberately allocates a per-cpu u32 that goes unused for other modes.
> >> Conditional allocation (e.g., in bond_option_mode_set) was considered
> >> but rejected: the XDP path can race with mode changes on a downed bond,
> >> and adding memory barriers to the XDP hot path is not justified for
> >> saving 4 bytes per CPU.
> >>
> >> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> >> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> >> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> >> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> >> ---
> >> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
> >> 1 file changed, 13 insertions(+), 6 deletions(-)
> >>
> >
> >IMO it's not worth it to waste memory in all modes, for an unpopular mode.
> >I think it'd be better to add a null check in bond_rr_gen_slave_id(),
> >READ/WRITE_ONCE() should be enough since it is allocated only once, and
> >freed when the xmit code cannot be reachable anymore (otherwise we'd have
> >more bugs now). The branch will be successfully predicted practically always,
> >and you can also mark the ptr being null as unlikely. That way only RR takes
> >a very minimal hit, if any.
>
> Is what you're suggesting different from Jiayuan's proposal[0],
> in the sense of needing barriers in the XDP hot path to insure ordering?
>
> If I understand correctly, your suggestion is something like
> (totally untested):
>
Basically yes, that is what I'm proposing, plus an unlikely() around that
null check: the pointer being NULL is really unlikely and the branch will
always be predicted correctly, so only RR mode takes the (minimal) hit.
> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> index eb27cacc26d7..ac2a4fc0aad0 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -4273,13 +4273,17 @@ void bond_work_cancel_all(struct bonding *bond)
> static int bond_open(struct net_device *bond_dev)
> {
> struct bonding *bond = netdev_priv(bond_dev);
> + u32 __percpu *rr_tx_tmp;
> struct list_head *iter;
> struct slave *slave;
>
> - if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
> - bond->rr_tx_counter = alloc_percpu(u32);
> - if (!bond->rr_tx_counter)
> + if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN &&
> + !READ_ONCE(bond->rr_tx_counter)) {
> + rr_tx_tmp = alloc_percpu(u32);
> + if (!rr_tx_tmp)
> return -ENOMEM;
> + WRITE_ONCE(bond->rr_tx_counter, rr_tx_tmp);
> +
> }
>
> /* reset slave->backup and slave->inactive */
> @@ -4866,6 +4870,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
> struct reciprocal_value reciprocal_packets_per_slave;
> int packets_per_slave = bond->params.packets_per_slave;
>
> + if (!READ_ONCE(bond->rr_tx_counter))
> + packets_per_slave = 0;
> +
> switch (packets_per_slave) {
> case 0:
> slave_id = get_random_u32();
>
> -J
>
>
> [0] https://lore.kernel.org/netdev/e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev/
>
> ---
> -Jay Vosburgh, jv@jvosburgh.net
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-04 17:32 ` Nikolay Aleksandrov
@ 2026-03-05 21:03 ` Jay Vosburgh
2026-03-06 2:42 ` Jiayuan Chen
0 siblings, 1 reply; 13+ messages in thread
From: Jay Vosburgh @ 2026-03-05 21:03 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Jiayuan Chen, netdev, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
Nikolay Aleksandrov <razor@blackwall.org> wrote:
>On Wed, Mar 04, 2026 at 09:27:28AM -0800, Jay Vosburgh wrote:
>> Nikolay Aleksandrov <razor@blackwall.org> wrote:
>>
>> >On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
>> >> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>> >>
>> >> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
>> >> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
>> >> when the bond mode is round-robin. If the bond device was never brought
>> >> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
>> >>
>> >> The XDP redirect path can reach this code even when the bond is not up:
>> >> bpf_master_redirect_enabled_key is a global static key, so when any bond
>> >> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
>> >> interception is enabled for all bond slaves system-wide. This allows the
>> >> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
>> >> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
>> >> reached on a bond that was never opened.
>> >>
>> >> Fix this by allocating rr_tx_counter unconditionally in bond_init()
>> >> (ndo_init), which is called by register_netdevice() and covers both
>> >> device creation paths (bond_create() and bond_newlink()). This also
>> >> handles the case where bond mode is changed to round-robin after device
>> >> creation. The conditional allocation in bond_open() is removed. Since
>> >> bond_destructor() already unconditionally calls
>> >> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
>> >> ndo_init, free at destructor.
>> >>
>> >> Note: rr_tx_counter is only used by round-robin mode, so this
>> >> deliberately allocates a per-cpu u32 that goes unused for other modes.
>> >> Conditional allocation (e.g., in bond_option_mode_set) was considered
>> >> but rejected: the XDP path can race with mode changes on a downed bond,
>> >> and adding memory barriers to the XDP hot path is not justified for
>> >> saving 4 bytes per CPU.
>> >>
>> >> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
>> >> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
>> >> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
>> >> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
>> >> ---
>> >> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
>> >> 1 file changed, 13 insertions(+), 6 deletions(-)
>> >>
>> >
>> >IMO it's not worth it to waste memory in all modes, for an unpopular mode.
>> >I think it'd be better to add a null check in bond_rr_gen_slave_id(),
>> >READ/WRITE_ONCE() should be enough since it is allocated only once, and
>> >freed when the xmit code cannot be reachable anymore (otherwise we'd have
>> >more bugs now). The branch will be successfully predicted practically always,
>> >and you can also mark the ptr being null as unlikely. That way only RR takes
>> >a very minimal hit, if any.
>>
>> Is what you're suggesting different from Jiayuan's proposal[0],
>> in the sense of needing barriers in the XDP hot path to insure ordering?
>>
>> If I understand correctly, your suggestion is something like
>> (totally untested):
>>
>
>Basically yes, that is what I'm proposing + an unlikely() around that
>null check since it is really unlikely and will be always predicted
>correctly, this way it's only for RR mode.
Jiayuan,
Do you agree that the patch below (including Nikolay's
suggestion to add "unlikely") resolves the original issue without memory
waste, and without introducing performance issues (barriers) into the
XDP path?
-J
>> diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>> index eb27cacc26d7..ac2a4fc0aad0 100644
>> --- a/drivers/net/bonding/bond_main.c
>> +++ b/drivers/net/bonding/bond_main.c
>> @@ -4273,13 +4273,17 @@ void bond_work_cancel_all(struct bonding *bond)
>> static int bond_open(struct net_device *bond_dev)
>> {
>> struct bonding *bond = netdev_priv(bond_dev);
>> + u32 __percpu *rr_tx_tmp;
>> struct list_head *iter;
>> struct slave *slave;
>>
>> - if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
>> - bond->rr_tx_counter = alloc_percpu(u32);
>> - if (!bond->rr_tx_counter)
>> + if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN &&
>> + !READ_ONCE(bond->rr_tx_counter)) {
>> + rr_tx_tmp = alloc_percpu(u32);
>> + if (!rr_tx_tmp)
>> return -ENOMEM;
>> + WRITE_ONCE(bond->rr_tx_counter, rr_tx_tmp);
>> +
>> }
>>
>> /* reset slave->backup and slave->inactive */
>> @@ -4866,6 +4870,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
>> struct reciprocal_value reciprocal_packets_per_slave;
>> int packets_per_slave = bond->params.packets_per_slave;
>>
>> + if (!READ_ONCE(bond->rr_tx_counter))
>> + packets_per_slave = 0;
>> +
>> switch (packets_per_slave) {
>> case 0:
>> slave_id = get_random_u32();
>>
>> -J
>>
>>
>> [0] https://lore.kernel.org/netdev/e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev/
---
-Jay Vosburgh, jv@jvosburgh.net
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-05 21:03 ` Jay Vosburgh
@ 2026-03-06 2:42 ` Jiayuan Chen
2026-03-06 12:22 ` Nikolay Aleksandrov
0 siblings, 1 reply; 13+ messages in thread
From: Jiayuan Chen @ 2026-03-06 2:42 UTC (permalink / raw)
To: Jay Vosburgh, Nikolay Aleksandrov
Cc: netdev, jiayuan.chen, syzbot+80e046b8da2820b6ba73, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev, Andrii Nakryiko,
Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
2026/3/6 05:03, "Jay Vosburgh" <jv@jvosburgh.net> wrote:
>
> Nikolay Aleksandrov <razor@blackwall.org> wrote:
>
> >
> > On Wed, Mar 04, 2026 at 09:27:28AM -0800, Jay Vosburgh wrote:
> >
> > >
> > > Nikolay Aleksandrov <razor@blackwall.org> wrote:
> > >
> > > >On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
> > > >> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> > > >>
> > > >> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> > > >> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> > > >> when the bond mode is round-robin. If the bond device was never brought
> > > >> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
> > > >>
> > > >> The XDP redirect path can reach this code even when the bond is not up:
> > > >> bpf_master_redirect_enabled_key is a global static key, so when any bond
> > > >> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> > > >> interception is enabled for all bond slaves system-wide. This allows the
> > > >> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> > > >> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> > > >> reached on a bond that was never opened.
> > > >>
> > > >> Fix this by allocating rr_tx_counter unconditionally in bond_init()
> > > >> (ndo_init), which is called by register_netdevice() and covers both
> > > >> device creation paths (bond_create() and bond_newlink()). This also
> > > >> handles the case where bond mode is changed to round-robin after device
> > > >> creation. The conditional allocation in bond_open() is removed. Since
> > > >> bond_destructor() already unconditionally calls
> > > >> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> > > >> ndo_init, free at destructor.
> > > >>
> > > >> Note: rr_tx_counter is only used by round-robin mode, so this
> > > >> deliberately allocates a per-cpu u32 that goes unused for other modes.
> > > >> Conditional allocation (e.g., in bond_option_mode_set) was considered
> > > >> but rejected: the XDP path can race with mode changes on a downed bond,
> > > >> and adding memory barriers to the XDP hot path is not justified for
> > > >> saving 4 bytes per CPU.
> > > >>
> > > >> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> > > >> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> > > >> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> > > >> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> > > >> ---
> > > >> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
> > > >> 1 file changed, 13 insertions(+), 6 deletions(-)
> > > >>
> > > >
> > > >IMO it's not worth it to waste memory in all modes, for an unpopular mode.
> > > >I think it'd be better to add a null check in bond_rr_gen_slave_id(),
> > > >READ/WRITE_ONCE() should be enough since it is allocated only once, and
> > > >freed when the xmit code cannot be reachable anymore (otherwise we'd have
> > > >more bugs now). The branch will be successfully predicted practically always,
> > > >and you can also mark the ptr being null as unlikely. That way only RR takes
> > > >a very minimal hit, if any.
> > >
> > > Is what you're suggesting different from Jiayuan's proposal[0],
> > > in the sense of needing barriers in the XDP hot path to insure ordering?
> > >
> > > If I understand correctly, your suggestion is something like
> > > (totally untested):
> > >
> > Basically yes, that is what I'm proposing + an unlikely() around that
> > null check since it is really unlikely and will be always predicted
> > correctly, this way it's only for RR mode.
> >
> Jiayuan,
>
> Do you agree that the patch below (including Nikolay's
> suggestion to add "unlikely") resolves the original issue without memory
> waste, and without introducing performance issues (barriers) into the
> XDP path?
Sure, it's basically similar to what my v1 did, but the patch below can be more generic.
https://lore.kernel.org/netdev/20260224112545.37888-1-jiayuan.chen@linux.dev/T/#m08e3e53a8aa8d837ddc9242f4b14f2651a2b00aa
> -J
>
> >
> > >
> > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > > index eb27cacc26d7..ac2a4fc0aad0 100644
> > > --- a/drivers/net/bonding/bond_main.c
> > > +++ b/drivers/net/bonding/bond_main.c
> > > @@ -4273,13 +4273,17 @@ void bond_work_cancel_all(struct bonding *bond)
> > > static int bond_open(struct net_device *bond_dev)
> > > {
> > > struct bonding *bond = netdev_priv(bond_dev);
> > > + u32 __percpu *rr_tx_tmp;
> > > struct list_head *iter;
> > > struct slave *slave;
> > >
> > > - if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
> > > - bond->rr_tx_counter = alloc_percpu(u32);
> > > - if (!bond->rr_tx_counter)
> > > + if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN &&
> > > + !READ_ONCE(bond->rr_tx_counter)) {
> > > + rr_tx_tmp = alloc_percpu(u32);
> > > + if (!rr_tx_tmp)
> > > return -ENOMEM;
> > > + WRITE_ONCE(bond->rr_tx_counter, rr_tx_tmp);
> > > +
> > > }
> > >
> > > /* reset slave->backup and slave->inactive */
> > > @@ -4866,6 +4870,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
> > > struct reciprocal_value reciprocal_packets_per_slave;
> > > int packets_per_slave = bond->params.packets_per_slave;
> > >
> > > + if (!READ_ONCE(bond->rr_tx_counter))
> > > + packets_per_slave = 0;
> > > +
> > > switch (packets_per_slave) {
> > > case 0:
> > > slave_id = get_random_u32();
> > >
> > > -J
> > >
> > >
> > > [0] https://lore.kernel.org/netdev/e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev/
> > >
> >
> ---
> -Jay Vosburgh, jv@jvosburgh.net
>
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-06 2:42 ` Jiayuan Chen
@ 2026-03-06 12:22 ` Nikolay Aleksandrov
2026-03-06 12:38 ` Jiayuan Chen
0 siblings, 1 reply; 13+ messages in thread
From: Nikolay Aleksandrov @ 2026-03-06 12:22 UTC (permalink / raw)
To: Jiayuan Chen
Cc: Jay Vosburgh, netdev, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
On Fri, Mar 06, 2026 at 02:42:05AM +0000, Jiayuan Chen wrote:
> 2026/3/6 05:03, "Jay Vosburgh" <jv@jvosburgh.net> wrote:
>
>
> >
> > Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >
> > >
> > > On Wed, Mar 04, 2026 at 09:27:28AM -0800, Jay Vosburgh wrote:
> > >
> > > >
> > > > Nikolay Aleksandrov <razor@blackwall.org> wrote:
> > > >
> > > > >On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
> > > > >> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> > > > >>
> > > > >> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> > > > >> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> > > > >> when the bond mode is round-robin. If the bond device was never brought
> > > > >> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
> > > > >>
> > > > >> The XDP redirect path can reach this code even when the bond is not up:
> > > > >> bpf_master_redirect_enabled_key is a global static key, so when any bond
> > > > >> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> > > > >> interception is enabled for all bond slaves system-wide. This allows the
> > > > >> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> > > > >> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> > > > >> reached on a bond that was never opened.
> > > > >>
> > > > >> Fix this by allocating rr_tx_counter unconditionally in bond_init()
> > > > >> (ndo_init), which is called by register_netdevice() and covers both
> > > > >> device creation paths (bond_create() and bond_newlink()). This also
> > > > >> handles the case where bond mode is changed to round-robin after device
> > > > >> creation. The conditional allocation in bond_open() is removed. Since
> > > > >> bond_destructor() already unconditionally calls
> > > > >> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> > > > >> ndo_init, free at destructor.
> > > > >>
> > > > >> Note: rr_tx_counter is only used by round-robin mode, so this
> > > > >> deliberately allocates a per-cpu u32 that goes unused for other modes.
> > > > >> Conditional allocation (e.g., in bond_option_mode_set) was considered
> > > > >> but rejected: the XDP path can race with mode changes on a downed bond,
> > > > >> and adding memory barriers to the XDP hot path is not justified for
> > > > >> saving 4 bytes per CPU.
> > > > >>
> > > > >> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> > > > >> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> > > > >> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> > > > >> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> > > > >> ---
> > > > >> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
> > > > >> 1 file changed, 13 insertions(+), 6 deletions(-)
> > > > >>
> > > > >
> > > > >IMO it's not worth it to waste memory in all modes, for an unpopular mode.
> > > > >I think it'd be better to add a null check in bond_rr_gen_slave_id(),
> > > > >READ/WRITE_ONCE() should be enough since it is allocated only once, and
> > > > >freed when the xmit code cannot be reachable anymore (otherwise we'd have
> > > > >more bugs now). The branch will be successfully predicted practically always,
> > > > >and you can also mark the ptr being null as unlikely. That way only RR takes
> > > > >a very minimal hit, if any.
> > > >
> > > > Is what you're suggesting different from Jiayuan's proposal[0],
> > > > in the sense of needing barriers in the XDP hot path to insure ordering?
> > > >
> > > > If I understand correctly, your suggestion is something like
> > > > (totally untested):
> > > >
> > > Basically yes, that is what I'm proposing + an unlikely() around that
> > > null check since it is really unlikely and will be always predicted
> > > correctly, this way it's only for RR mode.
> > >
> > Jiayuan,
> >
> > Do you agree that the patch below (including Nikolay's
> > suggestion to add "unlikely") resolves the original issue without memory
> > waste, and without introducing performance issues (barriers) into the
> > XDP path?
>
>
> Sure, it's basically similar to what my v1 did, but the patch below can be more generic.
>
> https://lore.kernel.org/netdev/20260224112545.37888-1-jiayuan.chen@linux.dev/T/#m08e3e53a8aa8d837ddc9242f4b14f2651a2b00aa
>
IMO that is worse: you'd add two new tests, and potentially one more cache
line, for everyone in a hot path that is used a lot and must be kept as fast
as possible.
> > -J
> >
> > >
> > > >
> > > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > > > index eb27cacc26d7..ac2a4fc0aad0 100644
> > > > --- a/drivers/net/bonding/bond_main.c
> > > > +++ b/drivers/net/bonding/bond_main.c
> > > > @@ -4273,13 +4273,17 @@ void bond_work_cancel_all(struct bonding *bond)
> > > > static int bond_open(struct net_device *bond_dev)
> > > > {
> > > > struct bonding *bond = netdev_priv(bond_dev);
> > > > + u32 __percpu *rr_tx_tmp;
> > > > struct list_head *iter;
> > > > struct slave *slave;
> > > >
> > > > - if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
> > > > - bond->rr_tx_counter = alloc_percpu(u32);
> > > > - if (!bond->rr_tx_counter)
> > > > + if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN &&
> > > > + !READ_ONCE(bond->rr_tx_counter)) {
> > > > + rr_tx_tmp = alloc_percpu(u32);
> > > > + if (!rr_tx_tmp)
> > > > return -ENOMEM;
> > > > + WRITE_ONCE(bond->rr_tx_counter, rr_tx_tmp);
> > > > +
> > > > }
> > > >
> > > > /* reset slave->backup and slave->inactive */
> > > > @@ -4866,6 +4870,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
> > > > struct reciprocal_value reciprocal_packets_per_slave;
> > > > int packets_per_slave = bond->params.packets_per_slave;
> > > >
> > > > + if (!READ_ONCE(bond->rr_tx_counter))
> > > > + packets_per_slave = 0;
> > > > +
> > > > switch (packets_per_slave) {
> > > > case 0:
> > > > slave_id = get_random_u32();
> > > >
> > > > -J
> > > >
> > > >
> > > > [0] https://lore.kernel.org/netdev/e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev/
> > > >
> > >
> > ---
> > -Jay Vosburgh, jv@jvosburgh.net
> >
* Re: [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
2026-03-06 12:22 ` Nikolay Aleksandrov
@ 2026-03-06 12:38 ` Jiayuan Chen
0 siblings, 0 replies; 13+ messages in thread
From: Jiayuan Chen @ 2026-03-06 12:38 UTC (permalink / raw)
To: Nikolay Aleksandrov
Cc: Jay Vosburgh, netdev, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Stanislav Fomichev,
Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, Shuah Khan,
Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
Jussi Maki, linux-kernel, bpf, linux-kselftest, linux-rt-devel
March 6, 2026 at 20:22, "Nikolay Aleksandrov" <razor@blackwall.org> wrote:
>
> On Fri, Mar 06, 2026 at 02:42:05AM +0000, Jiayuan Chen wrote:
>
> >
> > 2026/3/6 05:03, "Jay Vosburgh" <jv@jvosburgh.net> wrote:
> >
> >
> >
> > Nikolay Aleksandrov <razor@blackwall.org> wrote:
> >
> > >
> > > On Wed, Mar 04, 2026 at 09:27:28AM -0800, Jay Vosburgh wrote:
> > >
> > > >
> > > > Nikolay Aleksandrov <razor@blackwall.org> wrote:
> > > >
> > > > >On Wed, Mar 04, 2026 at 03:42:57PM +0800, Jiayuan Chen wrote:
> > > > >> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> > > > >>
> > > > >> bond_rr_gen_slave_id() dereferences bond->rr_tx_counter without a NULL
> > > > >> check. rr_tx_counter is a per-CPU counter only allocated in bond_open()
> > > > >> when the bond mode is round-robin. If the bond device was never brought
> > > > >> up, rr_tx_counter remains NULL, causing a null-ptr-deref.
> > > > >>
> > > > >> The XDP redirect path can reach this code even when the bond is not up:
> > > > >> bpf_master_redirect_enabled_key is a global static key, so when any bond
> > > > >> device has native XDP attached, the XDP_TX -> xdp_master_redirect()
> > > > >> interception is enabled for all bond slaves system-wide. This allows the
> > > > >> path xdp_master_redirect() -> bond_xdp_get_xmit_slave() ->
> > > > >> bond_xdp_xmit_roundrobin_slave_get() -> bond_rr_gen_slave_id() to be
> > > > >> reached on a bond that was never opened.
> > > > >>
> > > > >> Fix this by allocating rr_tx_counter unconditionally in bond_init()
> > > > >> (ndo_init), which is called by register_netdevice() and covers both
> > > > >> device creation paths (bond_create() and bond_newlink()). This also
> > > > >> handles the case where bond mode is changed to round-robin after device
> > > > >> creation. The conditional allocation in bond_open() is removed. Since
> > > > >> bond_destructor() already unconditionally calls
> > > > >> free_percpu(bond->rr_tx_counter), the lifecycle is clean: allocate at
> > > > >> ndo_init, free at destructor.
> > > > >>
> > > > >> Note: rr_tx_counter is only used by round-robin mode, so this
> > > > >> deliberately allocates a per-cpu u32 that goes unused for other modes.
> > > > >> Conditional allocation (e.g., in bond_option_mode_set) was considered
> > > > >> but rejected: the XDP path can race with mode changes on a downed bond,
> > > > >> and adding memory barriers to the XDP hot path is not justified for
> > > > >> saving 4 bytes per CPU.
> > > > >>
> > > > >> Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
> > > > >> Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
> > > > >> Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
> > > > >> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> > > > >> ---
> > > > >> drivers/net/bonding/bond_main.c | 19 +++++++++++++------
> > > > >> 1 file changed, 13 insertions(+), 6 deletions(-)
> > > > >>
> > > > >
> > > > >IMO it's not worth it to waste memory in all modes, for an unpopular mode.
> > > > >I think it'd be better to add a null check in bond_rr_gen_slave_id(),
> > > > >READ/WRITE_ONCE() should be enough since it is allocated only once, and
> > > > >freed when the xmit code cannot be reachable anymore (otherwise we'd have
> > > > >more bugs now). The branch will be successfully predicted practically always,
> > > > >and you can also mark the ptr being null as unlikely. That way only RR takes
> > > > >a very minimal hit, if any.
> > > >
> > > > Is what you're suggesting different from Jiayuan's proposal[0],
> > > > in the sense of needing barriers in the XDP hot path to insure ordering?
> > > >
> > > > If I understand correctly, your suggestion is something like
> > > > (totally untested):
> > > >
> > > Basically yes, that is what I'm proposing + an unlikely() around that
> > > null check since it is really unlikely and will be always predicted
> > > correctly, this way it's only for RR mode.
> > >
> > Jiayuan,
> >
> > Do you agree that the patch below (including Nikolay's
> > suggestion to add "unlikely") resolves the original issue without memory
> > waste, and without introducing performance issues (barriers) into the
> > XDP path?
> >
> >
> > Sure, it's basically similar to what my v1 did, but the patch below can be more generic.
> >
> > https://lore.kernel.org/netdev/20260224112545.37888-1-jiayuan.chen@linux.dev/T/#m08e3e53a8aa8d837ddc9242f4b14f2651a2b00aa
> >
> IMO that is worse, you'll add 2 new tests and potentially 1 more cache line
> for everyone in a hot path that is used a lot and must be kept as fast as possible.
Hi Nikolay,
Apologies for the confusion. When I said "the patch below," I was not referring to the v1 link,
but to the diff snippet appended at the bottom of my email.
That specific snippet (using a local temp pointer) is the optimal approach. It ensures the NULL
check for bond->rr_tx_counter is handled efficiently without introducing the performance
overhead or memory waste you were concerned about.
Best regards,
Jiayuan
> >
> > -J
> >
> > >
> > > >
> > > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
> > > > index eb27cacc26d7..ac2a4fc0aad0 100644
> > > > --- a/drivers/net/bonding/bond_main.c
> > > > +++ b/drivers/net/bonding/bond_main.c
> > > > @@ -4273,13 +4273,17 @@ void bond_work_cancel_all(struct bonding *bond)
> > > > static int bond_open(struct net_device *bond_dev)
> > > > {
> > > > struct bonding *bond = netdev_priv(bond_dev);
> > > > + u32 __percpu *rr_tx_tmp;
> > > > struct list_head *iter;
> > > > struct slave *slave;
> > > >
> > > > - if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
> > > > - bond->rr_tx_counter = alloc_percpu(u32);
> > > > - if (!bond->rr_tx_counter)
> > > > + if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN &&
> > > > + !READ_ONCE(bond->rr_tx_counter)) {
> > > > + rr_tx_tmp = alloc_percpu(u32);
> > > > + if (!rr_tx_tmp)
> > > > return -ENOMEM;
> > > > + WRITE_ONCE(bond->rr_tx_counter, rr_tx_tmp);
> > > > +
> > > > }
> > > >
> > > > /* reset slave->backup and slave->inactive */
> > > > @@ -4866,6 +4870,9 @@ static u32 bond_rr_gen_slave_id(struct bonding *bond)
> > > > struct reciprocal_value reciprocal_packets_per_slave;
> > > > int packets_per_slave = bond->params.packets_per_slave;
> > > >
> > > > + if (!READ_ONCE(bond->rr_tx_counter))
> > > > + packets_per_slave = 0;
> > > > +
> > > > switch (packets_per_slave) {
> > > > case 0:
> > > > slave_id = get_random_u32();
> > > >
> > > > -J
> > > >
> > > >
> > > > [0] https://lore.kernel.org/netdev/e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev/
> > > >
> > >
> > ---
> > -Jay Vosburgh, jv@jvosburgh.net
> >
>
Thread overview: 13+ messages
2026-03-04 7:42 [PATCH net v4 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest Jiayuan Chen
2026-03-04 7:42 ` [PATCH net v4 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
2026-03-04 8:20 ` Daniel Borkmann
2026-03-04 8:47 ` Jiayuan Chen
2026-03-04 9:40 ` Sebastian Andrzej Siewior
2026-03-04 15:59 ` Nikolay Aleksandrov
2026-03-04 17:27 ` Jay Vosburgh
2026-03-04 17:32 ` Nikolay Aleksandrov
2026-03-05 21:03 ` Jay Vosburgh
2026-03-06 2:42 ` Jiayuan Chen
2026-03-06 12:22 ` Nikolay Aleksandrov
2026-03-06 12:38 ` Jiayuan Chen
2026-03-04 7:42 ` [PATCH net v4 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up Jiayuan Chen