public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Sebastian Andrzej Siewior" <bigeasy@linutronix.de>
Cc: "Jay Vosburgh" <jv@jvosburgh.net>,
	netdev@vger.kernel.org, jiayuna.chen@linux.dev,
	jiayuna.chen@shopee.com, "Jiayuan Chen" <jiayuan.chen@shopee.com>,
	syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Jakub Kicinski" <kuba@kernel.org>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Jesper Dangaard Brouer" <hawk@kernel.org>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Stanislav Fomichev" <sdf@fomichev.me>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Eduard Zingerman" <eddyz87@gmail.com>,
	"Martin KaFai Lau" <martin.lau@linux.dev>,
	"Song Liu" <song@kernel.org>,
	"Yonghong Song" <yonghong.song@linux.dev>,
	"KP Singh" <kpsingh@kernel.org>, "Hao Luo" <haoluo@google.com>,
	"Jiri Olsa" <jolsa@kernel.org>, "Shuah Khan" <shuah@kernel.org>,
	"Clark Williams" <clrkwllms@kernel.org>,
	"Steven Rostedt" <rostedt@goodmis.org>,
	"Jussi Maki" <joamaki@gmail.com>,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-kselftest@vger.kernel.org, linux-rt-devel@lists.linux.dev
Subject: Re: [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id()
Date: Mon, 02 Mar 2026 10:15:22 +0000	[thread overview]
Message-ID: <e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev> (raw)
In-Reply-To: <20260302081021.MiQQ_LQr@linutronix.de>

March 2, 2026 at 16:10, "Sebastian Andrzej Siewior" <bigeasy@linutronix.de mailto:bigeasy@linutronix.de?to=%22Sebastian%20Andrzej%20Siewior%22%20%3Cbigeasy%40linutronix.de%3E > wrote:


> 
> On 2026-02-28 03:36:24 [+0000], Jiayuan Chen wrote:
> 
> > 
> > My only concern is that this will waste a percpu u32 per bond
> >  device for the majority of bonding use cases (which use modes other than
> >  balance-rr), which could be a few hundred bytes on a large machine.
> >  
> >  Does everything work reliably if the rr_tx_counter allocation
> >  happens conditionally on mode == BOND_MODE_ROUNDROBIN in bond_setup, as
> >  well as in bond_option_mode_set?
> > 
> …
> 
> > 
> > An alternative would be to allocate conditionally in bond_init() (since the default mode is round-robin)
> >  and manage allocation/deallocation in bond_option_mode_set() when the mode changes.
> > 
> This sounds reasonable.
> 
> > 
> > This is a trade-off between the added complexity of conditional alloc/free across multiple code
> >  paths and saving a per-CPU u32 for non-round-robin bonds.
> >  
> >  For the per-CPU u32 overhead, it's only 4 extra bytes per CPU per bond device — and machines with
> >  that many CPUs tend to have plenty of memory to match.
> > 
> 4 bytes is the minimum allocation for per-CPU memory. The memory is
> already "there" it is just not assigned. So for the 4 byte allocation it
> is needed to find a single area (the smallest allocation size).
> In case there no free block, a new block will be allocated and mapped
> for each CPU which the part that costs memory.
> That said, we should not waste memory but it is not _that_ expensive
> either for a bond device. Things change if here are hundreds of devices.


Hi Jay, Sebastian,

Sorry, the conditional alloc/free approach in bond_option_mode_set()
I suggested earlier doesn't work well on closer inspection.

Since bond_option_mode_set() requires the bond to be down, and as this
bug shows, the XDP redirect path can still reach a downed bond device,
we need to carefully order the operations:

when switching to RR, alloc rr_tx_counter before setting mode;
when switching away, set mode before freeing rr_tx_counter.

Strictly speaking, this also requires smp_wmb()/smp_rmb() pairs to
guarantee the ordering is visible to other CPUs — the write side in
bond_option_mode_set() and the read side in the XDP path.

In practice the race window is very small and unlikely to trigger, but
leaving out the barriers would look incorrect, and adding them to the
XDP hot path feels wrong for saving only 4 bytes per CPU per bond device.


smp_wmb()/smp_rmb() (no test): 

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4279,12 +4279,6 @@ static int bond_open(struct net_device *bond_dev)
	struct list_head *iter;
	struct slave *slave;

-     if (BOND_MODE(bond) == BOND_MODE_ROUNDROBIN && !bond->rr_tx_counter) {
-             bond->rr_tx_counter = alloc_percpu(u32);
-             if (!bond->rr_tx_counter)
-                     return -ENOMEM;
-     }
-
	/* reset slave->backup and slave->inactive */
	if (bond_has_slaves(bond)) {
			bond_for_each_slave(bond, slave, iter) {
@@ -5532,6 +5526,8 @@ bond_xdp_get_xmit_slave(struct net_device *bond_dev, struct xdp_buff *xdp)

	switch (BOND_MODE(bond)) {
	case BOND_MODE_ROUNDROBIN:
+             /* Pairs with smp_wmb() in bond_option_mode_set() */
+             smp_rmb();
			slave = bond_xdp_xmit_roundrobin_slave_get(bond, xdp);
			break;

@@ -6411,6 +6407,14 @@ static int bond_init(struct net_device *bond_dev)
	if (!bond->wq)
			return -ENOMEM;

+     /* Default mode is round-robin, allocate rr_tx_counter for it.
+      * For mode changes, bond_option_mode_set() manages the lifecycle.
+      */
+     bond->rr_tx_counter = alloc_percpu(u32);
+     if (!bond->rr_tx_counter) {
+             destroy_workqueue(bond->wq);
+             return -ENOMEM;
+     }
+
	bond->notifier_ctx = false;

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -918,7 +918,27 @@ static int bond_option_mode_set(struct bonding *bond,
	/* don't cache arp_validate between modes */
	bond->params.arp_validate = BOND_ARP_VALIDATE_NONE;
-     bond->params.mode = newval->value;
+
+     if (newval->value == BOND_MODE_ROUNDROBIN) {
+             /* Switching to round-robin: allocate before setting mode,
+              * so XDP path seeing BOND_MODE_ROUNDROBIN always finds
+              * rr_tx_counter allocated.
+              */
+             if (!bond->rr_tx_counter) {
+                     bond->rr_tx_counter = alloc_percpu(u32);
+                     if (!bond->rr_tx_counter)
+                             return -ENOMEM;
+             }
+             /* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
+             smp_wmb();
+             bond->params.mode = newval->value;
+     } else {
+             /* Switching away: set mode first so XDP no longer
+              * enters RR branch before we free rr_tx_counter.
+              */
+             bond->params.mode = newval->value;
+             /* Pairs with smp_rmb() in bond_xdp_get_xmit_slave() */
+             smp_wmb();
+             free_percpu(bond->rr_tx_counter);
+             bond->rr_tx_counter = NULL;
+     }

Thanks,
Jiayuan


> > 
> > Thanks
> >  
> >  -J
> > 
> Sebastian
>

  reply	other threads:[~2026-03-02 10:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-28  2:19 [PATCH net v3 0/2] net,bpf: fix null-ptr-deref in xdp_master_redirect() for bonding and add selftest Jiayuan Chen
2026-02-28  2:19 ` [PATCH net v3 1/2] bonding: fix null-ptr-deref in bond_rr_gen_slave_id() Jiayuan Chen
2026-02-28  3:01   ` Jay Vosburgh
2026-02-28  3:36     ` Jiayuan Chen
2026-03-02  8:10       ` Sebastian Andrzej Siewior
2026-03-02 10:15         ` Jiayuan Chen [this message]
2026-03-04  2:38           ` Jay Vosburgh
2026-02-28  2:19 ` [PATCH net v3 2/2] selftests/bpf: add test for xdp_master_redirect with bond not up Jiayuan Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e4a2a652784ec206728eb3a929a9892238c61f06@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=andrew+netdev@lunn.ch \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=clrkwllms@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=eddyz87@gmail.com \
    --cc=edumazet@google.com \
    --cc=haoluo@google.com \
    --cc=hawk@kernel.org \
    --cc=jiayuan.chen@shopee.com \
    --cc=jiayuna.chen@linux.dev \
    --cc=jiayuna.chen@shopee.com \
    --cc=joamaki@gmail.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=jv@jvosburgh.net \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=martin.lau@linux.dev \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=rostedt@goodmis.org \
    --cc=sdf@fomichev.me \
    --cc=shuah@kernel.org \
    --cc=song@kernel.org \
    --cc=syzbot+80e046b8da2820b6ba73@syzkaller.appspotmail.com \
    --cc=yonghong.song@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox