public inbox for netdev@vger.kernel.org
* [PATCH bpf v1 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest
From: Jiayuan Chen @ 2026-02-24 11:25 UTC
  To: bpf
  Cc: jiayuan.chen, jiayuan.chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jesper Dangaard Brouer, Shuah Khan,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jussi Maki, netdev, linux-kernel, linux-kselftest, linux-rt-devel

syzkaller reported a kernel panic [1] with the following crash stack:

BUG: unable to handle page fault for address: ffff8ebd08580000
PF: supervisor write access in kernel mode
PF: error_code(0x0002) - not-present page
PGD 11f201067 P4D 11f201067 PUD 0
Oops: Oops: 0002 [#1] SMP PTI
CPU: 2 UID: 0 PID: 451 Comm: test_progs Not tainted 6.19.0+ #161 PREEMPT_RT
RIP: 0010:bond_rr_gen_slave_id+0x90/0xd0
RSP: 0018:ffffd3f4815f3448 EFLAGS: 00010246
RAX: 0000000000000001 RBX: 0000000000000001 RCX: ffff8ebc8728b17e
RDX: 0000000000000000 RSI: ffffd3f4815f3538 RDI: ffff8ebc8abcce40
RBP: ffffd3f4815f3460 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffd3f4815f3538
R13: ffff8ebc8abcce40 R14: ffff8ebc8728b17f R15: ffff8ebc8728b170
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8ebd08580000 CR3: 000000010a808006 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 bond_xdp_get_xmit_slave+0xc0/0x240
 xdp_master_redirect+0x74/0xc0
 bpf_prog_run_generic_xdp+0x2f2/0x3f0
 do_xdp_generic+0x1fd/0x3d0
 __netif_receive_skb_core.constprop.0+0x30d/0x1220
 __netif_receive_skb_list_core+0xfc/0x250
 netif_receive_skb_list_internal+0x20c/0x3d0
 ? eth_type_trans+0x137/0x160
 netif_receive_skb_list+0x25/0x140
 xdp_test_run_batch.constprop.0+0x65b/0x6e0
 bpf_test_run_xdp_live+0x1ec/0x3b0
 bpf_prog_test_run_xdp+0x49d/0x6e0
 __sys_bpf+0x446/0x27b0
 __x64_sys_bpf+0x1a/0x30
 x64_sys_call+0x146c/0x26e0
 do_syscall_64+0xd3/0x1510
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Problem Description

This issue occurs when all of the following conditions are met:

1. A bond device is in round-robin mode but has never been brought UP
 (bond_open() was never called)
 - rr_tx_counter is only allocated in bond_open()

2. bpf_master_redirect_enabled_key is a global static key
 - When native XDP is attached to any bond device, this key is enabled globally
 - It affects XDP processing for ALL bond slaves system-wide

3. The XDP redirect data path can reach bond_rr_gen_slave_id()
 - Via: xdp_master_redirect()->bond_xdp_get_xmit_slave()->bond_xdp_xmit_roundrobin_slave_get()->bond_rr_gen_slave_id()
 - bond_rr_gen_slave_id() dereferences rr_tx_counter without a NULL check
 - When the bond is not UP, rr_tx_counter is NULL, causing a null-ptr-deref crash
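To make conditions 1 and 3 concrete, here is a minimal userspace C model (not kernel code). struct model_bond, model_bond_create(), model_bond_open() and model_rr_pick_slave() are hypothetical names. A plain heap-allocated uint32_t stands in for the per-CPU rr_tx_counter, and reducing the id modulo the slave count is an assumption of the model about how the round-robin XDP path uses it. The pick function returns -1 at the point where the kernel, which has no NULL check there, would dereference the NULL counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of a balance-rr bond; not kernel code. */
struct model_bond {
	bool up;
	uint32_t *rr_tx_counter; /* stands in for the per-CPU counter */
	unsigned int slave_cnt;
};

/* Creation: as described above, the counter is NOT allocated here. */
static void model_bond_create(struct model_bond *bond, unsigned int slave_cnt)
{
	assert(slave_cnt > 0);
	bond->up = false;
	bond->rr_tx_counter = NULL;
	bond->slave_cnt = slave_cnt;
}

/* Models bond_open(): the only place the counter gets allocated. */
static int model_bond_open(struct model_bond *bond)
{
	if (!bond->rr_tx_counter) {
		bond->rr_tx_counter = calloc(1, sizeof(*bond->rr_tx_counter));
		if (!bond->rr_tx_counter)
			return -1;
	}
	bond->up = true;
	return 0;
}

/*
 * Roughly models bond_rr_gen_slave_id() plus the modulo by slave count.
 * The real function has no NULL check; -1 marks where the kernel would
 * dereference the NULL counter and fault.
 */
static int model_rr_pick_slave(struct model_bond *bond)
{
	uint32_t id;

	if (!bond->rr_tx_counter)
		return -1;
	id = ++*bond->rr_tx_counter; /* increment-and-return, like this_cpu_inc_return() */
	return (int)(id % bond->slave_cnt);
}
```

A bond that was created but never opened hits the -1 path on its first pick; after model_bond_open() the picks cycle through the slaves (1, then 0, for two slaves).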

Solution

Patch 1: Add a check in xdp_master_redirect() that the master device
	   exists and is running (netif_running()) before proceeding with
	   the redirect.

Patch 2: Add a selftest that reproduces the above scenario and verifies the fix.
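The control flow of xdp_master_redirect() with the Patch 1 check can be sketched in userspace as follows. This is a simplified model, not the kernel function. struct model_master, its running flag and its get_xmit_slave callback are hypothetical stand-ins for struct net_device, netif_running() and ndo_xdp_get_xmit_slave(). Slaves are plain integer indices, and whatever bookkeeping the real function does before returning XDP_REDIRECT is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts mirroring the two outcomes of xdp_master_redirect(). */
enum model_verdict { MODEL_XDP_TX, MODEL_XDP_REDIRECT };

/* Hypothetical stand-in for the master net_device. */
struct model_master {
	bool running;                                  /* netif_running() */
	int (*get_xmit_slave)(struct model_master *m); /* -1 means "no slave" */
	int calls;                                     /* times the callback ran */
};

/* Example callback: records the call and always selects slave 1. */
static int model_pick_slave1(struct model_master *m)
{
	m->calls++;
	return 1;
}

/* Models xdp_master_redirect() with the Patch 1 check in place. */
static enum model_verdict model_master_redirect(struct model_master *master,
						int rx_slave)
{
	int slave;

	/* Patch 1: no master, or master not running -> plain XDP_TX,
	 * and the driver callback is never reached. */
	if (!master || !master->running)
		return MODEL_XDP_TX;

	slave = master->get_xmit_slave(master);
	if (slave >= 0 && slave != rx_slave)
		return MODEL_XDP_REDIRECT; /* forward via a different slave */
	return MODEL_XDP_TX;               /* send back out the receiving device */
}
```

With a NULL or non-running master, the callback is never invoked. This is the property Patch 1 relies on to keep bond_rr_gen_slave_id() from being reached.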

[1] https://syzkaller.appspot.com/bug?extid=80e046b8da2820b6ba73

Jiayuan Chen (2):
  net/bpf: fix null-ptr-deref in xdp_master_redirect() for bonding
  selftests/bpf: add test for xdp_master_redirect with bond not up

 net/core/filter.c                             |   3 +
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 101 +++++++++++++++++-
 2 files changed, 102 insertions(+), 2 deletions(-)

-- 
2.43.0



* [PATCH bpf v1 1/2] net/bpf: fix null-ptr-deref in xdp_master_redirect() for bonding
From: Jiayuan Chen @ 2026-02-24 11:25 UTC
  To: bpf
  Cc: jiayuan.chen, jiayuan.chen, syzbot+80e046b8da2820b6ba73,
	Martin KaFai Lau, Daniel Borkmann, John Fastabend,
	Stanislav Fomichev, Alexei Starovoitov, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, KP Singh, Hao Luo,
	Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jesper Dangaard Brouer, Shuah Khan,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jussi Maki, netdev, linux-kernel, linux-kselftest, linux-rt-devel

From: Jiayuan Chen <jiayuan.chen@shopee.com>

xdp_master_redirect() dereferences the master device pointer and calls
ndo_xdp_get_xmit_slave() without checking whether the master is valid
or running. When a bond device in round-robin mode has never been brought
up, bond->rr_tx_counter is not allocated (only allocated in bond_open()).
The XDP redirect path can still reach bond_rr_gen_slave_id() via
xdp_master_redirect() → bond_xdp_get_xmit_slave() →
bond_xdp_xmit_roundrobin_slave_get(), causing a null-ptr-deref on
rr_tx_counter.

bpf_master_redirect_enabled_key is a global static key. When any bond
device has native XDP attached, this key is enabled system-wide, causing
the XDP_TX → xdp_master_redirect interception to fire for all bond
slaves, even slaves of other bond devices that have never been opened.
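The interception described above can be modeled in userspace as follows. This is a sketch, not the kernel implementation. model_master_redirect_key is a hypothetical counter standing in for bpf_master_redirect_enabled_key, on the assumption that each native XDP attach to a bond takes one reference. model_enters_master_redirect() approximates the check applied after the XDP program returns, which the trace above shows being reached from bpf_prog_run_generic_xdp(). The decision depends only on the global key, the XDP_TX verdict and whether the receiving device is a bond slave. It never depends on which bond that is or whether that bond is up.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for bpf_master_redirect_enabled_key: one global
 * switch for the whole system, modeled as a count of enabling users. */
static int model_master_redirect_key;

enum model_verdict { MODEL_XDP_PASS, MODEL_XDP_TX };

/* Attaching native XDP to ANY bond device enables the global key. */
static void model_attach_native_xdp_to_bond(void)
{
	model_master_redirect_key++;
}

/* Detaching drops one reference; the key stays on while others remain. */
static void model_detach_native_xdp_from_bond(void)
{
	assert(model_master_redirect_key > 0);
	model_master_redirect_key--;
}

/*
 * Approximates the post-program check: with the key on, an XDP_TX verdict
 * on a bond slave is diverted to xdp_master_redirect(). Not checked here:
 * which bond the slave belongs to, or whether that bond is up.
 */
static bool model_enters_master_redirect(enum model_verdict act,
					 bool rx_dev_is_bond_slave)
{
	return model_master_redirect_key > 0 && act == MODEL_XDP_TX &&
	       rx_dev_is_bond_slave;
}
```

In the scenario below, attaching native XDP to bond0 is what makes an XDP_TX verdict on veth1, a slave of the unrelated and never-opened bond1, take the redirect path.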

To trigger, the following conditions must be met:
1. A bond (bond0) with native XDP attached, enabling the global key
2. Another bond (bond1) in round-robin mode, never brought up
3. A slave (veth1) of bond1 with generic XDP returning XDP_TX
4. Packets hitting the generic XDP path on veth1

Add a check in xdp_master_redirect() to verify the master device is
valid and running before proceeding with the redirect. This is
semantically correct because redirecting to a non-running master is
meaningless, and safe from a concurrency perspective because
xdp_master_redirect() runs under RCU protection and the master's
resources are not freed until after an RCU grace period.

Fixes: 879af96ffd72 ("net, core: Add support for XDP redirection to slave device")
Reported-by: syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698f84c6.a70a0220.2c38d7.00cc.GAE@google.com/T/
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
 net/core/filter.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index ba019ded773d..9a45dabd0044 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4387,6 +4387,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
 	struct net_device *master, *slave;
 
 	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
+	if (unlikely(!master || !netif_running(master)))
+		return XDP_TX;
+
 	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
 	if (slave && slave != xdp->rxq->dev) {
 		/* The target device is different from the receiving device, so
-- 
2.43.0



* [PATCH bpf v1 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up
From: Jiayuan Chen @ 2026-02-24 11:25 UTC
  To: bpf
  Cc: jiayuan.chen, jiayuan.chen, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman, Song Liu,
	Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
	Hao Luo, Jiri Olsa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jesper Dangaard Brouer, Shuah Khan,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt,
	Jussi Maki, netdev, linux-kernel, linux-kselftest, linux-rt-devel

From: Jiayuan Chen <jiayuan.chen@shopee.com>

Add a selftest that verifies xdp_master_redirect() does not crash when
the bond master device has never been brought up. The test reproduces
the null-ptr-deref in bond_rr_gen_slave_id() by:

1. Creating bond0 (active-backup, UP) with native XDP attached to
   enable the global bpf_master_redirect_enabled_key
2. Creating bond1 (round-robin, never UP) with veth1 enslaved
3. Attaching generic XDP (XDP_TX) to veth1
4. Running BPF_PROG_TEST_RUN with live frames and XDP_PASS on veth1
   to inject packets into the generic XDP path

Without the fix in xdp_master_redirect(), step 4 causes a kernel crash.

Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
 .../selftests/bpf/prog_tests/xdp_bonding.c    | 101 +++++++++++++++++-
 1 file changed, 99 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
index fb952703653e..a5b15e464018 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bonding.c
@@ -191,13 +191,18 @@ static int bonding_setup(struct skeletons *skeletons, int mode, int xmit_policy,
 	return -1;
 }
 
-static void bonding_cleanup(struct skeletons *skeletons)
+static void link_cleanup(struct skeletons *skeletons)
 {
-	restore_root_netns();
 	while (skeletons->nlinks) {
 		skeletons->nlinks--;
 		bpf_link__destroy(skeletons->links[skeletons->nlinks]);
 	}
+}
+
+static void bonding_cleanup(struct skeletons *skeletons)
+{
+	restore_root_netns();
+	link_cleanup(skeletons);
 	ASSERT_OK(system("ip link delete bond1"), "delete bond1");
 	ASSERT_OK(system("ip link delete veth1_1"), "delete veth1_1");
 	ASSERT_OK(system("ip link delete veth1_2"), "delete veth1_2");
@@ -493,6 +498,95 @@ static void test_xdp_bonding_nested(struct skeletons *skeletons)
 	system("ip link del bond_nest2");
 }
 
+/*
+ * Test that XDP redirect via xdp_master_redirect() does not crash when
+ * the bond master device is not up. When bond is in round-robin mode but
+ * never opened, rr_tx_counter is NULL.
+ */
+static void test_xdp_bonding_redirect_no_up(struct skeletons *skeletons)
+{
+	struct nstoken *nstoken = NULL;
+	int xdp_pass_fd, xdp_tx_fd;
+	int veth1_ifindex;
+	int err;
+	char pkt[ETH_HLEN + 1];
+	struct xdp_md ctx_in = {};
+
+	DECLARE_LIBBPF_OPTS(bpf_test_run_opts, opts,
+			    .data_in = &pkt,
+			    .data_size_in = sizeof(pkt),
+			    .ctx_in = &ctx_in,
+			    .ctx_size_in = sizeof(ctx_in),
+			    .flags = BPF_F_TEST_XDP_LIVE_FRAMES,
+			    .repeat = 1,
+			    .batch_size = 1,
+		);
+
+	/* We can't use bonding_setup() because bond will be active */
+	SYS(out, "ip netns add ns_rr_no_up");
+	nstoken = open_netns("ns_rr_no_up");
+	if (!ASSERT_OK_PTR(nstoken, "open ns_rr_no_up"))
+		goto out;
+
+	/* bond0: active-backup, UP with slave veth0.
+	 * Attaching native XDP to bond0 enables bpf_master_redirect_enabled_key
+	 * globally.
+	 */
+	SYS(out, "ip link add bond0 type bond mode active-backup");
+	SYS(out, "ip link add veth0 type veth peer name veth0p");
+	SYS(out, "ip link set veth0 master bond0");
+	SYS(out, "ip link set bond0 up");
+	SYS(out, "ip link set veth0p up");
+
+	/* bond1: round-robin, never UP -> rr_tx_counter stays NULL */
+	SYS(out, "ip link add bond1 type bond mode balance-rr");
+	SYS(out, "ip link add veth1 type veth peer name veth1p");
+	SYS(out, "ip link set veth1 master bond1");
+
+	veth1_ifindex = if_nametoindex("veth1");
+	if (!ASSERT_GT(veth1_ifindex, 0, "veth1_ifindex"))
+		goto out;
+
+	/* Attach native XDP to bond0 -> enables global redirect key */
+	if (xdp_attach(skeletons, skeletons->xdp_tx->progs.xdp_tx, "bond0"))
+		goto out;
+
+	/* Attach generic XDP (XDP_TX) to veth1.
+	 * When packets arrive at veth1 via netif_receive_skb, do_xdp_generic()
+	 * runs this program. XDP_TX + bond slave triggers xdp_master_redirect().
+	 */
+	xdp_tx_fd = bpf_program__fd(skeletons->xdp_tx->progs.xdp_tx);
+	if (!ASSERT_GE(xdp_tx_fd, 0, "xdp_tx prog_fd"))
+		goto out;
+
+	err = bpf_xdp_attach(veth1_ifindex, xdp_tx_fd,
+			     XDP_FLAGS_SKB_MODE, NULL);
+	if (!ASSERT_OK(err, "attach generic XDP to veth1"))
+		goto out;
+
+	/* Run BPF_PROG_TEST_RUN with XDP_PASS live frames on veth1.
+	 * XDP_PASS frames become SKBs with skb->dev = veth1, entering
+	 * netif_receive_skb -> do_xdp_generic -> xdp_master_redirect.
+	 * Without the fix, bond_rr_gen_slave_id() dereferences NULL
+	 * rr_tx_counter and crashes.
+	 */
+	xdp_pass_fd = bpf_program__fd(skeletons->xdp_dummy->progs.xdp_dummy_prog);
+	if (!ASSERT_GE(xdp_pass_fd, 0, "xdp_pass prog_fd"))
+		goto out;
+
+	memset(pkt, 0, sizeof(pkt));
+	ctx_in.data_end = sizeof(pkt);
+	ctx_in.ingress_ifindex = veth1_ifindex;
+
+	err = bpf_prog_test_run_opts(xdp_pass_fd, &opts);
+	ASSERT_OK(err, "xdp_pass test_run should not crash");
+
+out:
+	link_cleanup(skeletons);
+	close_netns(nstoken);
+	SYS_NOFAIL("ip netns del ns_rr_no_up");
+}
+
 static void test_xdp_bonding_features(struct skeletons *skeletons)
 {
 	LIBBPF_OPTS(bpf_xdp_query_opts, query_opts);
@@ -680,6 +774,9 @@ void serial_test_xdp_bonding(void)
 	if (test__start_subtest("xdp_bonding_redirect_multi"))
 		test_xdp_bonding_redirect_multi(&skeletons);
 
+	if (test__start_subtest("xdp_bonding_redirect_no_up"))
+		test_xdp_bonding_redirect_no_up(&skeletons);
+
 out:
 	xdp_dummy__destroy(skeletons.xdp_dummy);
 	xdp_tx__destroy(skeletons.xdp_tx);
-- 
2.43.0



* Re: [PATCH bpf v1 1/2] net/bpf: fix null-ptr-deref in xdp_master_redirect() for bonding
From: Sebastian Andrzej Siewior @ 2026-02-26  9:58 UTC
  To: Jiayuan Chen
  Cc: bpf, jiayuan.chen, syzbot+80e046b8da2820b6ba73, Martin KaFai Lau,
	Daniel Borkmann, John Fastabend, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Shuah Khan, Clark Williams,
	Steven Rostedt, Jussi Maki, netdev, linux-kernel, linux-kselftest,
	linux-rt-devel

On 2026-02-24 19:25:41 [+0800], Jiayuan Chen wrote:
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4387,6 +4387,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
>  	struct net_device *master, *slave;
>  
>  	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> +	if (unlikely(!master || !netif_running(master)))
> +		return XDP_TX;
> +

I'm not sure this check belongs here as this is not bond specific, is
it? Also, nothing stops the admin from putting the device down right
after the netif_running() check, so it can still fail later; the race
window is just not as wide as it is now.

The per-CPU memory could be allocated when the bond device is created.
I don't think delaying it until "device up" brings any advantages.
One creates the device with the intention of using it, so the "up" is
inevitable.
The bond_xmit_get_slave() callback has the same logic. Couldn't this
scenario also affect the ->ndo_get_xmit_slave() user?

>  	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
>  	if (slave && slave != xdp->rxq->dev) {
>  		/* The target device is different from the receiving device, so

Sebastian


* Re: [PATCH bpf v1 1/2] net/bpf: fix null-ptr-deref in xdp_master_redirect() for bonding
From: Jiayuan Chen @ 2026-02-26 12:57 UTC
  To: Sebastian Andrzej Siewior
  Cc: bpf, jiayuan.chen, syzbot+80e046b8da2820b6ba73, Martin KaFai Lau,
	Daniel Borkmann, John Fastabend, Stanislav Fomichev,
	Alexei Starovoitov, Andrii Nakryiko, Eduard Zingerman, Song Liu,
	Yonghong Song, KP Singh, Hao Luo, Jiri Olsa, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Jesper Dangaard Brouer, Shuah Khan, Clark Williams,
	Steven Rostedt, Jussi Maki, netdev, linux-kernel, linux-kselftest,
	linux-rt-devel

On 2026/2/26 17:58, "Sebastian Andrzej Siewior" <bigeasy@linutronix.de> wrote:
> On 2026-02-24 19:25:41 [+0800], Jiayuan Chen wrote:
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -4387,6 +4387,9 @@ u32 xdp_master_redirect(struct xdp_buff *xdp)
> >  	struct net_device *master, *slave;
> >  
> >  	master = netdev_master_upper_dev_get_rcu(xdp->rxq->dev);
> > +	if (unlikely(!master || !netif_running(master)))
> > +		return XDP_TX;
> > +
> 
> I'm not sure this check belongs here as this is not bond specific, is
> it? Also nothing stops the admin to put the device right after the
> netif_running() check so it fails later but the race window is not as
> wide as it is now.
> 
> The per-CPU memory could be allocated while the bond device is created.
> I don't think delaying it until "device up" brings any advantages.
> One creates the device with the intention to use it so the "up" is
> inevitable.
> The bond_xmit_get_slave() callback has the same logic. Couldn't this
> scenario also occur to the ->ndo_get_xmit_slave() user?
> 
> >  	slave = master->netdev_ops->ndo_xdp_get_xmit_slave(master, xdp);
> >  	if (slave && slave != xdp->rxq->dev) {
> >  		/* The target device is different from the receiving device, so
> 
> Sebastian

I agree with your points, especially allocating
rr_tx_counter unconditionally at device creation time in bond_setup().

This removes the possibility of a NULL rr_tx_counter at its root, fixes
both the ndo_xdp_get_xmit_slave and ndo_get_xmit_slave paths, and keeps
the change within the bonding subsystem.

I'll wait a bit to collect more feedback before sending a v2.
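The direction discussed above, allocating rr_tx_counter when the bond is set up rather than in bond_open(), can be sketched with the same kind of userspace model. All names are hypothetical, and calloc()/free() stand in for the kernel's per-CPU allocation. This is not the v2 patch, which had not been posted at this point in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of a balance-rr bond; not kernel code. */
struct model_bond {
	uint32_t *rr_tx_counter; /* stands in for the per-CPU counter */
	unsigned int slave_cnt;
};

/* Setup time: allocate the counter unconditionally (the bond_setup() idea). */
static int model_bond_create(struct model_bond *bond, unsigned int slave_cnt)
{
	assert(slave_cnt > 0);
	bond->slave_cnt = slave_cnt;
	bond->rr_tx_counter = calloc(1, sizeof(*bond->rr_tx_counter));
	return bond->rr_tx_counter ? 0 : -1;
}

/* In this model, teardown is the only place the counter is released. */
static void model_bond_destroy(struct model_bond *bond)
{
	free(bond->rr_tx_counter);
	bond->rr_tx_counter = NULL;
}

/* Up or down, the counter is valid, so no NULL check is needed here. */
static unsigned int model_rr_pick_slave(struct model_bond *bond)
{
	uint32_t id = ++*bond->rr_tx_counter;

	return id % bond->slave_cnt;
}
```

Because the counter lives for the whole lifetime of the device, any caller of the pick function sees valid memory whether or not the bond was ever brought up. In the model, this covers both the XDP path and the regular transmit path.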

