Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 1/2] bpf: preserve rx_queue_index across XDP redirects
From: Jakub Kicinski @ 2026-06-25  1:54 UTC (permalink / raw)
  To: Siddharth C; +Cc: ast, hawk, andrii, netdev, bpf, linux-kernel, linux-kselftest
In-Reply-To: <20260620121321.45227-2-siddharthcibi@icloud.com>

On Sat, 20 Jun 2026 12:13:13 +0000 Siddharth C wrote:
> diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
> index 5e59ab896f05..8f2d7013620f 100644
> --- a/kernel/bpf/cpumap.c
> +++ b/kernel/bpf/cpumap.c
> @@ -197,7 +197,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
>  
>  		rxq.dev = xdpf->dev_rx;
>  		rxq.mem.type = xdpf->mem_type;
> -		/* TODO: report queue_index to xdp_rxq_info */
> +		rxq.queue_index = xdpf->rx_queue_index;

Do you actually need this or you're just trying to address the TODO?
You can always store the info you need in the metadata prepend.

>  		xdp_convert_frame_to_buff(xdpf, &xdp);
>  
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index dc7b859e8bbf..f419fa0e53e5 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -339,7 +339,7 @@ static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
>  				struct net_device *rx_dev)
>  {
>  	struct xdp_txq_info txq = { .dev = tx_dev };
> -	struct xdp_rxq_info rxq = { .dev = rx_dev };
> +	struct xdp_rxq_info rxq = { };
>  	struct xdp_buff xdp;
>  	int i, nframes = 0;
>  
> @@ -349,6 +349,9 @@ static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
>  		int err;
>  
>  		xdp_convert_frame_to_buff(xdpf, &xdp);
> +		rxq.dev = rx_dev;
> +		rxq.mem.type = xdpf->mem_type;

Why are you setting mem_type?

> +		rxq.queue_index = xdpf->rx_queue_index;
>  		xdp.txq = &txq;
>  		xdp.rxq = &rxq;

^ permalink raw reply

* Re: [PATCH net v2] octeontx2-af: npc: cn20k: Fix subbank free list indexing for search order
From: patchwork-bot+netdevbpf @ 2026-06-25  1:50 UTC (permalink / raw)
  To: Ratheesh Kannoth
  Cc: kuba, linux-kernel, netdev, andrew+netdev, davem, edumazet,
	pabeni, sgoutham
In-Reply-To: <20260619095100.1864440-1-rkannoth@marvell.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 19 Jun 2026 15:21:00 +0530 you wrote:
> subbank_srch_order[i] is the physical subbank at search-order slot i,
> so each subbank's arr_idx must be i (its slot), not
> subbank_srch_order[sb->idx].  The old logic mis-keyed xa_sb_free
> and broke allocation traversal order.
> 
> Populate arr_idx and xa_sb_free in a single pass over the search
> order after subbank structs are initialized.
> 
> [...]

Here is the summary with links:
  - [net,v2] octeontx2-af: npc: cn20k: Fix subbank free list indexing for search order
    https://git.kernel.org/netdev/net/c/429ef02895db

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v2] net: phy: realtek: Clear MDIO_AN_10GBT_CTRL_ADV10G bit
From: patchwork-bot+netdevbpf @ 2026-06-25  1:50 UTC (permalink / raw)
  To: Jan Klos
  Cc: hkallweit1, andrew, linux, netdev, maxime.chevallier, davem,
	edumazet, kuba, pabeni, daniel, vladimir.oltean, olek2,
	markus.stockhausen, jan, ih, amadeus, rmk+kernel, hau,
	linux-kernel
In-Reply-To: <20260620011956.37181-1-honza.klos@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Sat, 20 Jun 2026 03:19:53 +0200 you wrote:
> On RTL8127A connected to a link partner that advertises 10000baseT
> speed cannot be changed to anything other than 10000baseT as 10GbE
> is always advertised regardless of any setting. Fix this by
> clearing MDIO_AN_10GBT_CTRL_ADV10G bit in rtl822x_config_aneg()'s
> call to phy_modify_mmd_changed().
> 
> Fixes: 83d962316128 ("net: phy: realtek: add RTL8127-internal PHY")
> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
> Signed-off-by: Jan Klos <honza.klos@gmail.com>
> 
> [...]

Here is the summary with links:
  - [net,v2] net: phy: realtek: Clear MDIO_AN_10GBT_CTRL_ADV10G bit
    https://git.kernel.org/netdev/net/c/510a283f4d12

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v2] net: wwan: iosm: bound device offsets in the MUX downlink decoder
From: Jakub Kicinski @ 2026-06-25  1:42 UTC (permalink / raw)
  To: maoyixie.tju
  Cc: Jakub Kicinski, loic.poulain, ryazanov.s.a, johannes,
	andrew+netdev, davem, edumazet, pabeni, netdev, linux-kernel,
	stable
In-Reply-To: <178196118045.462404.11069139160448641355@maoyixie.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: wwan: iosm: bound device offsets in the MUX downlink decoder

This patch validates device-supplied offsets and lengths in
mux_dl_adb_decode() and mux_dl_process_dg() against skb->len so that a
modem reporting an index or length past the downlink buffer cannot make
the decoder read out of bounds. The block header, each table header,
the datagram table, and each datagram index and length are now bounded
against skb->len.

> diff --git a/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c b/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
> index bff46f7ca59f..ff9a4bc52f29 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
> +++ b/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c

[ ... ]

> @@ -606,6 +612,11 @@ static void mux_dl_adb_decode(struct iosm_mux *ipc_mux,
>  
>  	/* Loop through mixed session tables. */
>  	while (adth_index) {
> +		/* The table header must lie within the received skb. */
> +		if (adth_index < sizeof(struct mux_adbh) ||
> +		    adth_index > skb->len - sizeof(struct mux_adth))
> +			goto adb_decode_err;
> +
>  		/* Get the reference to the table header. */
>  		adth = (struct mux_adth *)(block + adth_index);

[High]
The commit message frames the threat model as "A modem that reports an
index or a length past the downlink buffer", and this patch bounds every
device offset against skb->len. Could the same threat model also produce
an infinite loop in this while (adth_index) loop?

The loop continues as long as adth_index is non-zero, and each iteration
sets:

    adth_index = le32_to_cpu(adth->next_table_index);

The new bounds check only requires that adth_index lies within
[sizeof(struct mux_adbh), skb->len - sizeof(struct mux_adth)]. It does
not require forward progress. Can a modem stage two valid struct mux_adth
headers inside one downlink skb (each with the IOSM_AGGR_MUX_SIG_ADTH
signature, a valid if_id, and table_length >= sizeof(struct mux_adth))
where table A's next_table_index points to table B and table B's
next_table_index points back to table A?

Each adth_index passes the new range check, so the loop never terminates.
mux_dl_adb_decode() runs in softirq context via:

    ipc_task_queue_handler()
      ipc_imem_tq_irq_cb()
        ipc_imem_handle_irq()
          ipc_imem_dl_pipe_process()
            ipc_imem_dl_skb_process()
              ipc_mux_dl_decode()
                mux_dl_adb_decode()

If the cycled tables also reference valid datagrams, each pass through
the loop calls mux_dl_process_dg() -> ipc_mux_net_receive() which does
skb_clone() per datagram, so the same skb produces unbounded clones for
as long as the loop runs.

This is not a regression introduced by this patch — the cycle was
reachable before — but since the patch is hardening the decoder against
modem-supplied offsets, would it make sense to also require strictly
increasing next_table_index, cap the iteration count, or otherwise
guarantee forward progress in the same change?

> @@ -629,6 +640,10 @@ static void mux_dl_adb_decode(struct iosm_mux *ipc_mux,
>  		if (le16_to_cpu(adth->table_length) < sizeof(struct mux_adth))
>  			goto adb_decode_err;
>  
> +		/* The whole datagram table must fit in the received skb. */
> +		if (le16_to_cpu(adth->table_length) > skb->len - adth_index)
> +			goto adb_decode_err;
> +
>  		/* Calculate the number of datagrams. */
>  		nr_of_dg = (le16_to_cpu(adth->table_length) -
>  					sizeof(struct mux_adth)) /
-- 
pw-bot: cr

^ permalink raw reply

* [PATCH v4] virtio_net: disable cb when NAPI is busy-polled
From: Longjun Tang @ 2026-06-25  1:37 UTC (permalink / raw)
  To: mst, xuanzhuo
  Cc: jasowang, edumazet, virtualization, netdev, tanglongjun,
	lange_tang

From: Longjun Tang <tanglongjun@kylinos.cn>

When busy-poll is active, napi_schedule_prep() returns false in
virtqueue_napi_schedule(), so virtqueue_disable_cb() is skipped.
The device may keep firing irqs until reaches virtqueue_napi_complete().
Under load (received == budget), it will lead to a large number
of spurious interrupts.

Fix it by disabling the callback at the virtnet_poll() entry.
This keeps the callback off while we poll and it is re-enabled by
virtqueue_napi_complete() when going idle.

Fixes: ceef438d613f ("virtio_net: remove custom busy_poll")
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Longjun Tang <tanglongjun@kylinos.cn>

---
V1 -> V2: Remain agnostic to busy polling
V2 -> V3: Add fixes tag
V3 -> V4: Update commit message and remove some comments
---
 drivers/net/virtio_net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f4adcfee7a80..569e4db187d1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3008,6 +3008,8 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
 	unsigned int xdp_xmit = 0;
 	bool napi_complete;

+	virtqueue_disable_cb(rq->vq);
+
 	virtnet_poll_cleantx(rq, budget);

 	received = virtnet_receive(rq, budget, &xdp_xmit);
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH net 0/7] xsk: fix AF_XDP multi-buffer Tx descriptor reclaim
From: Jason Xing @ 2026-06-25  1:33 UTC (permalink / raw)
  To: Maciej Fijalkowski
  Cc: Stanislav Fomichev, netdev, bpf, magnus.karlsson, stfomichev,
	kuba, pabeni, horms, bjorn
In-Reply-To: <ajwHuCp82SazSwKv@boxer>

On Thu, Jun 25, 2026 at 12:37 AM Maciej Fijalkowski
<maciej.fijalkowski@intel.com> wrote:
>
> On Wed, Jun 24, 2026 at 08:38:20AM -0700, Stanislav Fomichev wrote:
> > On 06/23, Maciej Fijalkowski wrote:
> > > Hi,
> > >
> > > This series fixes several AF_XDP multi-buffer Tx paths where descriptors
> > > consumed from the Tx ring are not consistently returned to userspace
> > > through the completion ring when the packet is later dropped as invalid.
> > >
> > > The affected cases are invalid or oversized multi-buffer Tx packets in
> > > both the generic and zero-copy paths. In these cases, the kernel can
> > > consume one or more Tx descriptors while building or validating a
> > > multi-buffer packet, then drop the packet before it reaches the device.
> > > Userspace still owns the UMEM buffers only after the corresponding
> > > addresses are returned through the CQ. Missing completions therefore
> > > make userspace lose track of those buffers.
> > >
> > > The generic path fixes cover three related cases:
> > > * partially built multi-buffer skbs dropped by xsk_drop_skb();
> > >   continuation descriptors left in the Tx ring after xsk_build_skb()
> > >   reports overflow;
> > > * invalid descriptors encountered in the middle of a multi-buffer
> > >   packet, including the offending invalid descriptor itself.
> > >
> > > The zero-copy path is handled separately. The batched Tx parser now
> > > distinguishes descriptors that can be passed to the driver from
> > > descriptors that are consumed only because they belong to an invalid
> > > multi-buffer packet. Reclaim-only descriptors are written to the CQ
> > > address area and published in completion order, after any earlier
> > > driver-visible Tx descriptors.
> > >
> > > The ZC batching path can also retain drain state when userspace has not
> > > yet provided the end of an invalid multi-buffer packet. To keep this
> > > state local to the singular batched path, the series prevents a second
> > > Tx socket from joining the same pool while such drain state exists.
> > > During the singular-to-shared transition, Tx batching is gated,
> > > pre-existing readers are waited out, and bind fails with -EAGAIN if the
> > > existing socket still has pending drain state. This avoids adding
> > > multi-buffer drain handling to the shared-UMEM fallback path.
> > >
> > > The last two patches update xskxceiver so the tests account invalid
> > > multi-buffer Tx packets as descriptors that must be reclaimed, while
> > > still not expecting those invalid packets on the Rx side.
> > >
> > > This is a follow-up to Jason's changes [0] which were addressing generic
> > > xmit only and this set allows me to pass full xskxceiver test suite run
> > > against ice driver.
> >
> > There is a fair amount of feedback from sashiko already :-( So the meta
> > question from me is: is it time to scrap our current approach where
> > we parse descriptor by descriptor? (and maintain half-baked skb and
> > half-consumed descriptor queues)
> >
> > Should we:
> >
> > 1. do desc[MAX_SKB_FRAGS] and xskq_cons_peek_desc until we exhaust
> > PKT_CONT (if the last packet has PKT_CONT, return EOVERFLOW to userspace
> > and do a full stop here)
> > 2. now that we really know the number of valid descriptors -> reserve
> > the cq space (if not -> EAGAIN)
> > 3. pre-allocate everything here (if at any point we have ENOMEM -> cleanup
> > locally, don't ever create semi-initialized skb)
> > 4. construct the skb
> > 5. xmit
>
> Yeah generic xmit became utterly horrible, haven't gone through sashiko
> reviews yet, but bare in mind this set also aligns zc side to what was
> previously being addressed by Jason.
>
> I believe planned logistics were to get these fixes onto net and then
> Jason had an implementation of batching on generic xmit, directed towards
> -next and that's where we could address current flow.

Agreed. That's what I'm hoping for. There would be much more
discussion on how to do batch xmit in an elegant way, I believe.

Thanks,
Jason

>
> >
> > If at any point there is an issue, the cleanup is straightforward. That
> > whole xk->skb goes away, no state between syscalls. Thoughts?

^ permalink raw reply

* Re:Re: [PATCH v3] virtio_net: disable cb when NAPI is busy-polled
From: Lange Tang @ 2026-06-25  1:28 UTC (permalink / raw)
  To: mst@redhat.com
  Cc: xuanzhuo@linux.alibaba.com, jasowang@redhat.com,
	edumazet@google.com, virtualization@lists.linux.dev,
	netdev@vger.kernel.org, Tang Longjun
In-Reply-To: <20260624030656-mutt-send-email-mst@kernel.org>

At 2026-06-24 15:08:24, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>On Wed, Jun 24, 2026 at 03:02:06PM +0800, Longjun Tang wrote:
>> From: Longjun Tang <tanglongjun@kylinos.cn>
>> 
>> When busy-poll is active, napi_schedule_prep() returns false in
>> virtqueue_napi_schedule(), so virtqueue_disable_cb() is skipped.
>> The device may keep firing irqs until reaches virtqueue_napi_complete().
>> Under load (received == budget), it will lead to a large number
>> of spurious interrupts.
>> 
>> Fix it by disabling the callback at the virtnet_poll() entry. This keeps
>> the callback off while we poll and re-enable
>
>and it is re-enabled
>
>> by virtqueue_napi_complete()
>> when going idle.
>> 
>> Fixes: ceef438d613f ("virtio_net: remove custom busy_poll")
>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Longjun Tang <tanglongjun@kylinos.cn>
>> 
>> ---
>> V1 -> V2: Remain agnostic to busy polling
>> V2 -> V3: Add fixes tag
>> ---
>>  drivers/net/virtio_net.c | 5 +++++
>>  1 file changed, 5 insertions(+)
>> 
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index f4adcfee7a80..0a11f2b32500 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -3008,6 +3008,11 @@ static int virtnet_poll(struct napi_struct *napi, int budget)
>>  	unsigned int xdp_xmit = 0;
>>  	bool napi_complete;
>>  
>> +	/* Keep callbacks suppressed for the duration of this poll,
>> +	 * busy-poll need.
>
>I don't know what "busy-poll need" means. Just drop this part?
>In fact, the whole comment can go, we know virtqueue_disable_cb
>disables callbacks.

Thanks for your reply.  I got it, see you next version.
>
>> +	 */
>> +	virtqueue_disable_cb(rq->vq);
>> +
>>  	virtnet_poll_cleantx(rq, budget);
>>  
>>  	received = virtnet_receive(rq, budget, &xdp_xmit);
>> -- 
>> 2.43.0

^ permalink raw reply

* [PATCH v2] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
From: Pengfei Zhang @ 2026-06-25  1:23 UTC (permalink / raw)
  To: dsahern, idosch
  Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
	chenzhangqi, baohua, Pengfei Zhang, Pengfei Zhang
In-Reply-To: <20260624171156.822055-1-zhangfeionline@gmail.com>

From: Pengfei Zhang <zhangpengfei16@xiaomi.com>

inet6_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain.  Between batches the RTNL lock
is released, so a concurrent fib6_new_table() can insert a new table
at the chain head, shifting all existing entries.  The saved index
then lands on a different table, causing fib6_dump_table() to set
w->root to the wrong table while w->node still points into the
previous one.  fib6_walk_continue() dereferences w->node->parent
(NULL) and panics:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:fib6_walk_continue+0x6e/0x170
  Call Trace:
   <TASK>
   fib6_dump_table.isra.0+0xc5/0x240
   inet6_dump_fib+0xf6/0x420
   rtnl_dumpit+0x30/0xa0
   netlink_dump+0x15b/0x460
   netlink_recvmsg+0x1d6/0x2a0
   ____sys_recvmsg+0x17a/0x190

Fix by storing tb->tb6_id in cb->args[1] instead of a positional
index.  On resume, skip entries until the id matches; a concurrent
head-insert can never match the saved id, so the walker always
resumes on the correct table.

Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
---
 net/ipv6/ip6_fib.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
-	unsigned int e = 0, s_e;
 	struct hlist_head *head;
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	unsigned int h, s_h;
+	u32 s_id;
 	int err = 0;
 
 	rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 	s_h = cb->args[0];
-	s_e = cb->args[1];
+	s_id = cb->args[1];
 
-	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
-		e = 0;
+	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
-			if (e < s_e)
-				goto next;
+			if (s_id && tb->tb6_id != s_id)
+				continue;
+			s_id = 0;
+
+			cb->args[1] = tb->tb6_id;
 			err = fib6_dump_table(tb, skb, cb);
 			if (err != 0)
 				goto out;
-next:
-			e++;
 		}
 	}
 out:
-	cb->args[1] = e;
 	cb->args[0] = h;
 
 unlock:
-- 
2.34.1


^ permalink raw reply related

* [PATCH] ipv6: fib6: fix NULL deref in fib6_walk_continue() on multi-batch dump
From: Pengfei Zhang @ 2026-06-25  1:23 UTC (permalink / raw)
  To: dsahern, idosch
  Cc: davem, edumazet, kuba, pabeni, horms, netdev, linux-kernel,
	chenzhangqi, baohua, Pengfei Zhang, Pengfei Zhang
In-Reply-To: <20260624171156.822055-1-zhangfeionline@gmail.com>

From: Pengfei Zhang <zhangpengfei16@xiaomi.com>

inet6_dump_fib() saves its progress in cb->args[1] as a positional
index within the current hash chain.  Between batches the RTNL lock
is released, so a concurrent fib6_new_table() can insert a new table
at the chain head, shifting all existing entries.  The saved index
then lands on a different table, causing fib6_dump_table() to set
w->root to the wrong table while w->node still points into the
previous one.  fib6_walk_continue() dereferences w->node->parent
(NULL) and panics:

  BUG: kernel NULL pointer dereference, address: 0000000000000008
  RIP: 0010:fib6_walk_continue+0x6e/0x170
  Call Trace:
   <TASK>
   fib6_dump_table.isra.0+0xc5/0x240
   inet6_dump_fib+0xf6/0x420
   rtnl_dumpit+0x30/0xa0
   netlink_dump+0x15b/0x460
   netlink_recvmsg+0x1d6/0x2a0
   ____sys_recvmsg+0x17a/0x190

Fix by storing tb->tb6_id in cb->args[1] instead of a positional
index.  On resume, skip entries until the id matches; a concurrent
head-insert can never match the saved id, so the walker always
resumes on the correct table.

Fixes: 1b43af5480c3 ("[IPV6]: Increase number of possible routing tables to 2^32")
Signed-off-by: Pengfei Zhang <zhangfeionline@gmail.com>
---
 net/ipv6/ip6_fib.c | 17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index fc95738de..bda492634 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -636,11 +636,11 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	};
 	const struct nlmsghdr *nlh = cb->nlh;
 	struct net *net = sock_net(skb->sk);
-	unsigned int e = 0, s_e;
 	struct hlist_head *head;
 	struct fib6_walker *w;
 	struct fib6_table *tb;
 	unsigned int h, s_h;
+	u32 s_id;
 	int err = 0;
 
 	rcu_read_lock();
@@ -701,23 +701,22 @@ static int inet6_dump_fib(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 	s_h = cb->args[0];
-	s_e = cb->args[1];
+	s_id = cb->args[1];
 
-	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_e = 0) {
-		e = 0;
+	for (h = s_h; h < FIB6_TABLE_HASHSZ; h++, s_id = 0) {
 		head = &net->ipv6.fib_table_hash[h];
 		hlist_for_each_entry_rcu(tb, head, tb6_hlist) {
-			if (e < s_e)
-				goto next;
+			if (s_id && tb->tb6_id != s_id)
+				continue;
+			s_id = 0;
+
+			cb->args[1] = tb->tb6_id;
 			err = fib6_dump_table(tb, skb, cb);
 			if (err != 0)
 				goto out;
-next:
-			e++;
 		}
 	}
 out:
-	cb->args[1] = e;
 	cb->args[0] = h;
 
 unlock:
-- 
2.34.1


^ permalink raw reply related

* Re: [PATCH] net: ixp4xx_hss: fix duplicate HDLC netdev allocation
From: patchwork-bot+netdevbpf @ 2026-06-25  1:09 UTC (permalink / raw)
  To: Haoxiang Li
  Cc: linusw, kaloz, andrew+netdev, davem, edumazet, kuba, pabeni,
	huangguangbin2, lipeng321, linux-arm-kernel, netdev, linux-kernel,
	stable
In-Reply-To: <20260622043015.643637-1-haoxiang_li2024@163.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 22 Jun 2026 12:30:15 +0800 you wrote:
> ixp4xx_hss_probe() allocates two HDLC netdevs. The first one is stored
> in ndev, initialized, and registered with register_hdlc_device(). The
> second one is stored in port->netdev and later used by the remove path
> for unregister_hdlc_device() and free_netdev().
> 
> This means that the registered netdev is not the same object that is
> unregistered and freed on remove. It also leaks the first allocation if
> the second alloc_hdlcdev() call fails, and the first allocation is not
> checked before ndev is used.
> 
> [...]

Here is the summary with links:
  - net: ixp4xx_hss: fix duplicate HDLC netdev allocation
    https://git.kernel.org/netdev/net/c/db818b0e8af7

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net: dsa: realtek: fix memory leak in rtl8366rb_setup_led()
From: patchwork-bot+netdevbpf @ 2026-06-25  1:09 UTC (permalink / raw)
  To: David Yang
  Cc: netdev, linusw, alsi, andrew, olteanv, davem, edumazet, kuba,
	pabeni, luizluca, linux-kernel
In-Reply-To: <20260618140200.1888707-1-mmyangfl@gmail.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 18 Jun 2026 22:01:55 +0800 you wrote:
> led_classdev_register_ext() only reads init_data.devicename - it never
> stores the pointer. However, the caller allocated devicename with
> kasprintf() but never freed it, leaking the string memory.
> 
> Fix it with a stack buffer to avoid dynamic buffers completely.
> 
> Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb")
> Signed-off-by: David Yang <mmyangfl@gmail.com>
> 
> [...]

Here is the summary with links:
  - [net] net: dsa: realtek: fix memory leak in rtl8366rb_setup_led()
    https://git.kernel.org/netdev/net/c/056a5087d87e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net v2 0/2] airoha: fixes for sched HTB offload support
From: patchwork-bot+netdevbpf @ 2026-06-25  1:08 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, horms, win847,
	linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260619-airoha-qos-fixes-v2-0-5c43485038f9@kernel.org>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 19 Jun 2026 13:37:12 +0200 you wrote:
> ---
> Changes in v2:
> - cosmetics
> - Link to v1: https://lore.kernel.org/r/20260618-airoha-qos-fixes-v1-0-37192652157f@kernel.org
> 
> ---
> Lorenzo Bianconi (2):
>       net: airoha: Fix off-by-one in airoha_tc_remove_htb_queue()
>       net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
> 
> [...]

Here is the summary with links:
  - [net,v2,1/2] net: airoha: Fix off-by-one in airoha_tc_remove_htb_queue()
    https://git.kernel.org/netdev/net/c/bfcce49c4aaa
  - [net,v2,2/2] net: airoha: fix netif_set_real_num_tx_queues for sparse QoS channels
    https://git.kernel.org/netdev/net/c/788663dd28e4

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
From: patchwork-bot+netdevbpf @ 2026-06-25  1:08 UTC (permalink / raw)
  To: Erni Sri Satya Vennela
  Cc: kys, haiyangz, wei.liu, decui, longli, andrew+netdev, davem,
	edumazet, kuba, pabeni, dipayanroy, ssengar, jacob.e.keller,
	horms, gargaditya, kees, linux-hyperv, netdev, linux-kernel, bpf
In-Reply-To: <20260619055348.467224-1-ernis@linux.microsoft.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Thu, 18 Jun 2026 22:53:38 -0700 you wrote:
> Commit d7709812e13d ("net: mana: hardening: Validate adapter_mtu from
> MANA_QUERY_DEV_CONFIG") rejected any adapter_mtu value smaller than
> ETH_MIN_MTU + ETH_HLEN, including 0, returning -EPROTO and failing
> mana_probe().
> 
> Some older PF firmware versions still in the field report
> adapter_mtu as 0 in the MANA_QUERY_DEV_CONFIG response. With the
> hardening check in place, the MANA VF driver now fails to load on
> those hosts, breaking networking entirely for guests.
> 
> [...]

Here is the summary with links:
  - [net] net: mana: Fall back to standard MTU when PF reports adapter_mtu of 0
    https://git.kernel.org/netdev/net/c/6bd81a5b4e0d

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net 1/2] net: dsa: mxl862xx: avoid unaligned 16-bit access in api_wrap
From: patchwork-bot+netdevbpf @ 2026-06-25  1:08 UTC (permalink / raw)
  To: Daniel Golle
  Cc: andrew, olteanv, davem, edumazet, kuba, pabeni, netdev,
	linux-kernel
In-Reply-To: <599327521db465a534d277de53ab9b6cac01928b.1781702256.git.daniel@makrotopia.org>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Fri, 19 Jun 2026 04:39:25 +0100 you wrote:
> The MXL862XX_API_* macros pass the address of a stack-allocated, __packed
> firmware-ABI struct to mxl862xx_api_wrap() as a void *. The struct has an
> alignment of 1, so the compiler is free to place it at an odd address.
> 
> mxl862xx_api_wrap() reinterprets that buffer as a __le16 * and accesses it
> with data[i], for which the compiler assumes the natural 2-byte alignment
> of __le16 and emits aligned 16-bit loads/stores (e.g. lhu/sh on MIPS).
> When the buffer lands on an odd address these fault on architectures that
> do not support unaligned access, such as MIPS32.
> 
> [...]

Here is the summary with links:
  - [net,1/2] net: dsa: mxl862xx: avoid unaligned 16-bit access in api_wrap
    https://git.kernel.org/netdev/net/c/6b3f7af57881
  - [net,2/2] net: dsa: mxl862xx: fix use-after-free of DSA ports in crc_err_work
    https://git.kernel.org/netdev/net/c/bcb3b8314611

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* [PATCH nf] netfilter: ipset: fix race between dump and ip_set_list resize
From: Xiang Mei @ 2026-06-25  1:00 UTC (permalink / raw)
  To: Florian Westphal, Pablo Neira Ayuso, Jozsef Kadlecsik,
	Phil Sutter, netfilter-devel
  Cc: kees, horms, Weiming Shi, coreteam, netdev, linux-kernel,
	Xiang Mei

The release path of ip_set_dump_do() and ip_set_dump_done() read
inst->ip_set_list via ip_set_ref_netlink(), a plain rcu_dereference_raw()
of the array pointer. These run from netlink_recvmsg() without the nfnl
mutex and without an RCU read-side critical section.

A concurrent ip_set_create() can grow the array: it publishes the new
array, calls synchronize_net() and then kvfree()s the old one. Since the
dump paths read the array outside any RCU reader, synchronize_net() does
not wait for them and the old array can be freed while they still index
into it, causing a use-after-free.

The dumped set itself stays pinned via set->ref_netlink, so only the
array load needs protecting. Take rcu_read_lock() around it, matching
ip_set_get_byname() and __ip_set_put_byindex().

  BUG: KASAN: slab-use-after-free in ip_set_dump_do (net/netfilter/ipset/ip_set_core.c:1697)
  Read of size 8 at addr ffff88800b5c4018 by task exploit/150
  Call Trace:
   ...
   kasan_report (mm/kasan/report.c:595)
   ip_set_dump_do (net/netfilter/ipset/ip_set_core.c:1697)
   netlink_dump (net/netlink/af_netlink.c:2325)
   netlink_recvmsg (net/netlink/af_netlink.c:1976)
   sock_recvmsg (net/socket.c:1159)
   __sys_recvfrom (net/socket.c:2315)
   ...
  Oops: general protection fault, probably for non-canonical address ... KASAN NOPTI
  KASAN: maybe wild-memory-access in range [0x02d6...d0-0x02d6...d7]
  RIP: 0010:ip_set_dump_do (net/netfilter/ipset/ip_set_core.c:1698)
  Kernel panic - not syncing: Fatal exception

Fixes: 8a02bdd50b2e ("netfilter: ipset: Fix calling ip_set() macro at dumping")
Reported-by: Weiming Shi <bestswngs@gmail.com>
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Xiang Mei <xmei5@asu.edu>
---
 net/netfilter/ipset/ip_set_core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
index a531b654b8d9..6cfad152d7d1 100644
--- a/net/netfilter/ipset/ip_set_core.c
+++ b/net/netfilter/ipset/ip_set_core.c
@@ -1480,7 +1480,11 @@ ip_set_dump_done(struct netlink_callback *cb)
 		struct ip_set_net *inst =
 			(struct ip_set_net *)cb->args[IPSET_CB_NET];
 		ip_set_id_t index = (ip_set_id_t)cb->args[IPSET_CB_INDEX];
-		struct ip_set *set = ip_set_ref_netlink(inst, index);
+		struct ip_set *set;
+
+		rcu_read_lock();
+		set = ip_set_ref_netlink(inst, index);
+		rcu_read_unlock();

 		if (set->variant->uref)
 			set->variant->uref(set, cb, false);
@@ -1686,7 +1690,9 @@ ip_set_dump_do(struct sk_buff *skb, struct netlink_callback *cb)
 release_refcount:
 	/* If there was an error or set is done, release set */
 	if (ret || !cb->args[IPSET_CB_ARG0]) {
+		rcu_read_lock();
 		set = ip_set_ref_netlink(inst, index);
+		rcu_read_unlock();
 		if (set->variant->uref)
 			set->variant->uref(set, cb, false);
 		pr_debug("release set %s\n", set->name);
-- 
2.43.0

^ permalink raw reply related

* [PATCH net-next] fsl/fman: make enable() return void
From: Haoxiang Li @ 2026-06-25  0:56 UTC (permalink / raw)
  To: madalin.bucur, sean.anderson, andrew+netdev, davem, edumazet,
	kuba, pabeni
  Cc: netdev, linux-kernel, Haoxiang Li

The enable() helper always returns 0 and has no failure path.
Make it return void and update the only caller accordingly.

Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
---
 drivers/net/ethernet/freescale/fman/fman.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index 013273a2de32..aa35373413d4 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -924,7 +924,7 @@ static void hwp_init(struct fman_hwp_regs __iomem *hwp_rg)
 	iowrite32be(HWP_RPIMAC_PEN, &hwp_rg->fmprrpimac);
 }
 
-static int enable(struct fman *fman, struct fman_cfg *cfg)
+static void enable(struct fman *fman, struct fman_cfg *cfg)
 {
 	u32 cfg_reg = 0;
 
@@ -941,8 +941,6 @@ static int enable(struct fman *fman, struct fman_cfg *cfg)
 	iowrite32be(BMI_INIT_START, &fman->bmi_regs->fmbm_init);
 	iowrite32be(cfg_reg | QMI_CFG_ENQ_EN | QMI_CFG_DEQ_EN,
 		    &fman->qmi_regs->fmqm_gc);
-
-	return 0;
 }
 
 static int set_exception(struct fman *fman,
@@ -1998,9 +1996,7 @@ static int fman_init(struct fman *fman)
 	if (!fman->keygen)
 		return -EINVAL;
 
-	err = enable(fman, cfg);
-	if (err != 0)
-		return err;
+	enable(fman, cfg);
 
 	enable_time_stamp(fman);
 
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH net 1/2] net: dsa: mxl862xx: avoid unaligned 16-bit access in api_wrap
From: Jakub Kicinski @ 2026-06-25  0:52 UTC (permalink / raw)
  To: David Laight
  Cc: Daniel Golle, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260619100154.794168e5@pumpkin>

On Fri, 19 Jun 2026 10:01:54 +0100 David Laight wrote:
> > The MXL862XX_API_* macros pass the address of a stack-allocated, __packed
> > firmware-ABI struct to mxl862xx_api_wrap() as a void *. The struct has an
> > alignment of 1, so the compiler is free to place it at an odd address.
> > 
> > mxl862xx_api_wrap() reinterprets that buffer as a __le16 * and accesses it
> > with data[i], for which the compiler assumes the natural 2-byte alignment
> > of __le16 and emits aligned 16-bit loads/stores (e.g. lhu/sh on MIPS).
> > When the buffer lands on an odd address these fault on architectures that
> > do not support unaligned access, such as MIPS32.  
> 
> Isn't the correct fix to not pack the structure?
> (or probably any of the associated structures??)

Agreed, this is very silly:

struct mxl862xx_register_mod {
	__le16 addr;
	__le16 data;
	__le16 mask;
} __packed;

But some structs won't get aligned:

struct mxl862xx_mac_table_clear {
	u8 type;
	u8 port_id;
} __packed;

So I guess the "just don't pack" will have some corner cases, too.

^ permalink raw reply

* [PATCH net v3] fsl/fman: Free init resources on KeyGen failure in fman_init()
From: Haoxiang Li @ 2026-06-25  0:48 UTC (permalink / raw)
  To: madalin.bucur, sean.anderson, andrew+netdev, davem, edumazet,
	kuba, pabeni, florinel.iordache
  Cc: netdev, linux-kernel, Haoxiang Li, stable, Pavan Chebbi,
	Breno Leitao

fman_muram_alloc() allocates initialization resources before
initializing the KeyGen block. If keygen_init() fails, the
function returns -EINVAL directly and leaves those resources
allocated. Free the initialization resources before returning
from the KeyGen failure path.

Fixes: 7472f4f281d0 ("fsl/fman: enable FMan Keygen")
Cc: stable@kernel.org
Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
---
Changes in v2:
 - Add "net" to patch title. Thanks, Pavan!
Changes in v3:
 - Drop the unrelated enable() cleanup from this fix. Thanks, Breno!
---
 drivers/net/ethernet/freescale/fman/fman.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index 013273a2de32..299bab043175 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -1995,8 +1995,10 @@ static int fman_init(struct fman *fman)
 
 	/* Init KeyGen */
 	fman->keygen = keygen_init(fman->kg_regs);
-	if (!fman->keygen)
+	if (!fman->keygen) {
+		free_init_resources(fman);
 		return -EINVAL;
+	}
 
 	err = enable(fman, cfg);
 	if (err != 0)
-- 
2.25.1


^ permalink raw reply related

* Re:Re: [PATCH] octeontx2-af: Free BPID bitmap on setup failure
From: haoxiang_li2024 @ 2026-06-25  0:34 UTC (permalink / raw)
  To: Simon Horman
  Cc: sgoutham, lcherian, gakula, hkelam, sbhatta, andrew+netdev, davem,
	edumazet, kuba, pabeni, netdev, linux-kernel, stable
In-Reply-To: <20260624170930.GB1131256@horms.kernel.org>



At 2026-06-25 01:09:30, "Simon Horman" <horms@kernel.org> wrote:
>On Tue, Jun 23, 2026 at 07:43:16PM +0800, Haoxiang Li wrote:
>> nix_setup_bpids() allocates bp->bpids with rvu_alloc_bitmap(), which uses
>> a plain kcalloc(). If any of the following devm_kcalloc() allocations for
>> the BPID mapping arrays fails, the function returns without freeing the
>> bitmap. Free the BPID bitmap before returning from those error paths.
>> 
>> Fixes: d6212d2e41a0 ("octeontx2-af: Create BPIDs free pool")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
>
>Reviewed-by: Simon Horman <horms@kernel.org>
>
>I am wondering if you did a pass for any other similar problems
>with users of rvu_alloc_bitmap.

Thanks for your review! Yes, I did. I found similar issues in
nix_setup_ipolicers() and rvu_setup_msix_resources(), and
I will address them in follow-up patches.

Thanks,
Haoxiang


^ permalink raw reply

* Re: [PATCH v3 0/7] net: wwan: t9xx: Add MediaTek T9XX WWAN driver
From: Jakub Kicinski @ 2026-06-25  0:09 UTC (permalink / raw)
  To: Jack Wu via B4 Relay
  Cc: jackbb_wu, Loic Poulain, Sergey Ryazanov, Johannes Berg,
	Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni,
	Wen-Zhi Huang, Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
	AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
	Shuah Khan, linux-kernel, netdev, linux-arm-kernel,
	linux-mediatek, linux-doc
In-Reply-To: <20260624-t9xx_driver_v1-v3-0-73ff03f60c48@compal.com>

On Wed, 24 Jun 2026 18:04:06 +0800 Jack Wu via B4 Relay wrote:
> T9XX is the PCIe host device driver for MediaTek's
> t900 modem. The driver uses the WWAN framework
> infrastructure to create the following control ports
> and network interfaces for data transactions.

Replying after a long delay and then immediately posting a new version
of patches is very bad. Don't bother replying and just put the comments
you had in the changelog of the new posting. Otherwise the discussion
may get split.

^ permalink raw reply

* Re: [External Mail] [PATCH v2 1/7] net: wwan: t9xx: Add PCIe core
From: Jakub Kicinski @ 2026-06-24 23:35 UTC (permalink / raw)
  To: Wu. JackBB (GSM)
  Cc: Loic Poulain, Sergey Ryazanov, Johannes Berg, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni, Wen-Zhi Huang,
	Shi-Wei Yeh, Minano Tseng, Matthias Brugger,
	AngeloGioacchino Del Regno, Simon Horman, Jonathan Corbet,
	Shuah Khan, linux-kernel@vger.kernel.org, netdev@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-mediatek@lists.infradead.org, linux-doc@vger.kernel.org
In-Reply-To: <b02c0e1e9f0449f2b819197e4329373b@compal.com>

On Wed, 24 Jun 2026 09:15:17 +0000 Wu. JackBB (GSM) wrote:
> ================================================================================================================================================================
> This message may contain information which is private, privileged or confidential of Compal Electronics, Inc. If you are not the intended recipient of this message, please notify the sender and destroy/delete the message. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information, by persons or entities other than the intended recipient is prohibited.
> ================================================================================================================================================================

If you want to do anything upstream you have to get rid of this first.

^ permalink raw reply

* [PATCH net v3] sctp: add INIT verification after cookie unpacking
From: Xin Long @ 2026-06-24 22:53 UTC (permalink / raw)
  To: network dev, linux-sctp
  Cc: davem, kuba, Eric Dumazet, Paolo Abeni, Simon Horman,
	Marcelo Ricardo Leitner

In SCTP handshake, the INIT chunk is initially processed by the server
and embedded into the cookie carried in INIT-ACK. The client then
returns this cookie via COOKIE-ECHO, where the server unpacks it and
reconstructs the original INIT chunk.

When cookie authentication is enabled, the cookie contents are protected
against tampering, so reusing the unpacked INIT without re-verification
is safe.

However, when cookie authentication is disabled, the reconstructed INIT
can no longer be trusted. In this case, the INIT must be explicitly
validated after unpacking to avoid processing potentially tampered data.

Add sctp_verify_init() checks after cookie unpacking in COOKIE-ECHO
processing paths (sctp_sf_do_5_1D_ce() and sctp_sf_do_5_2_4_dupcook())
when cookie_auth_enable is disabled. On failure, the new association is
freed and the packet is discarded.

Also tighten cookie validation in sctp_unpack_cookie() by verifying the
embedded chunk type is SCTP_CID_INIT before treating it as an INIT
chunk.

Finally, update sctp_verify_init() to validate parameter bounds using
the actual embedded INIT length instead of chunk->chunk_end, since the
INIT stored in COOKIE-ECHO may not span the entire chunk buffer.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
v2:
  - Because of sctp_abort_on_init_err() param change in patch 1/2,
    pass cid and not chunk.
  - Use SCTP_PAD4() around ntohs(peer_init->chunk_hdr.length) when
    checking param.v in sctp_verify_init() to make Sashiko happy.
v3:
  - Validate the embedded INIT chunk type in sctp_unpack_cookie(), as
    noted by Sashiko.
  - Discard the packet if embedded INIT chunk validation fails,
    consistent with malformed cookie handling.
---
 net/sctp/sm_make_chunk.c |  5 ++++-
 net/sctp/sm_statefuns.c  | 36 +++++++++++++++++++++++++++++++++---
 2 files changed, 37 insertions(+), 4 deletions(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 41958b8e59fd..8adac9e0cd66 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1761,6 +1761,8 @@ struct sctp_association *sctp_unpack_cookie(
 	bear_cookie = &cookie->c;
 
 	ch = (struct sctp_chunkhdr *)(bear_cookie + 1);
+	if (ch->type != SCTP_CID_INIT)
+		goto malformed;
 	chlen = ntohs(ch->length);
 	if (chlen < sizeof(struct sctp_init_chunk))
 		goto malformed;
@@ -2298,7 +2300,8 @@ int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep,
 	 * VIOLATION error.  We build the ERROR chunk here and let the normal
 	 * error handling code build and send the packet.
 	 */
-	if (param.v != (void *)chunk->chunk_end)
+	if (param.v != (void *)peer_init +
+		       SCTP_PAD4(ntohs(peer_init->chunk_hdr.length)))
 		return sctp_process_inv_paramlength(asoc, param.p, chunk, errp);
 
 	/* The only missing mandatory param possible today is
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 8e920cef0858..d23d935e128e 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -707,11 +707,12 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 					 struct sctp_cmd_seq *commands)
 {
 	struct sctp_ulpevent *ev, *ai_ev = NULL, *auth_ev = NULL;
+	struct sctp_chunk *err_chk_p = NULL;
 	struct sctp_association *new_asoc;
 	struct sctp_init_chunk *peer_init;
 	struct sctp_chunk *chunk = arg;
-	struct sctp_chunk *err_chk_p;
 	struct sctp_chunk *repl;
+	enum sctp_cid cid;
 	struct sock *sk;
 	int error = 0;
 
@@ -785,6 +786,19 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 		}
 	}
 
+	peer_init = (struct sctp_init_chunk *)(chunk->subh.cookie_hdr + 1);
+	cid = peer_init->chunk_hdr.type;
+	if (!sctp_sk(sk)->cookie_auth_enable &&
+	    !sctp_verify_init(net, ep, asoc, cid, peer_init, chunk,
+			      &err_chk_p)) {
+		sctp_association_free(new_asoc);
+		if (err_chk_p)
+			sctp_chunk_free(err_chk_p);
+		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
+	}
+	if (err_chk_p)
+		sctp_chunk_free(err_chk_p);
+
 	if (security_sctp_assoc_request(new_asoc, chunk->head_skb ?: chunk->skb)) {
 		sctp_association_free(new_asoc);
 		return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
@@ -798,7 +812,6 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 	/* This is a brand-new association, so these are not yet side
 	 * effects--it is safe to run them here.
 	 */
-	peer_init = (struct sctp_init_chunk *)(chunk->subh.cookie_hdr + 1);
 	if (!sctp_process_init(new_asoc, chunk,
 			       &chunk->subh.cookie_hdr->c.peer_addr,
 			       peer_init, GFP_ATOMIC))
@@ -2215,10 +2228,12 @@ enum sctp_disposition sctp_sf_do_5_2_4_dupcook(
 					void *arg,
 					struct sctp_cmd_seq *commands)
 {
+	struct sctp_chunk *err_chk_p = NULL;
 	struct sctp_association *new_asoc;
+	struct sctp_init_chunk *peer_init;
 	struct sctp_chunk *chunk = arg;
 	enum sctp_disposition retval;
-	struct sctp_chunk *err_chk_p;
+	enum sctp_cid cid;
 	int error = 0;
 	char action;
 
@@ -2287,6 +2302,21 @@ enum sctp_disposition sctp_sf_do_5_2_4_dupcook(
 	switch (action) {
 	case 'A': /* Association restart. */
 	case 'B': /* Collision case B. */
+		peer_init = (struct sctp_init_chunk *)
+				(chunk->subh.cookie_hdr + 1);
+		cid = peer_init->chunk_hdr.type;
+		if (!sctp_sk(ep->base.sk)->cookie_auth_enable &&
+		    !sctp_verify_init(net, ep, asoc, cid, peer_init, chunk,
+				      &err_chk_p)) {
+			sctp_association_free(new_asoc);
+			if (err_chk_p)
+				sctp_chunk_free(err_chk_p);
+			return sctp_sf_pdiscard(net, ep, asoc, type, arg,
+						commands);
+		}
+		if (err_chk_p)
+			sctp_chunk_free(err_chk_p);
+		fallthrough;
 	case 'D': /* Collision case D. */
 		/* Update socket peer label if first association. */
 		if (security_sctp_assoc_request((struct sctp_association *)asoc,
-- 
2.47.1


^ permalink raw reply related

* RE: [EXTERNAL] Re: [PATCH net] net: mana: Sync page pool RX frags for CPU
From: Dexuan Cui @ 2026-06-24 22:50 UTC (permalink / raw)
  To: Simon Horman
  Cc: KY Srinivasan, Haiyang Zhang, wei.liu@kernel.org, Long Li,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Konstantin Taranov,
	ernis@linux.microsoft.com, dipayanroy@linux.microsoft.com,
	kees@kernel.org, jacob.e.keller@intel.com,
	ssengar@linux.microsoft.com, linux-hyperv@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-rdma@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <20260619090514.GT827683@horms.kernel.org>

> From: Simon Horman <horms@kernel.org>
> Sent: Friday, June 19, 2026 2:05 AM
> > ...
> > Also validate the packet length reported in the RX CQE before using it as
> > a DMA sync length or passing it to skb processing. The CQE is supplied
> > by the device and should not be blindly trusted by Confidential VMs.
> 
> I think this last part warrants being split out into a separate patch.

Sorry for the late reply. I split v1 into 2 patches of v2, which I just posted:
https://lwn.net/ml/linux-kernel/20260624222605.1794719-1-decui@microsoft.com/
 
Thanks,
Dexuan

^ permalink raw reply

* [PATCH net v2] nfc: nci: fix uninit-value in nci_core_init_rsp_packet()
From: Samuel Page @ 2026-06-24 22:44 UTC (permalink / raw)
  To: David Heidelberg
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, oe-linux-nfc, netdev, linux-kernel, stable

The CORE_INIT_RSP handlers walk the response using length fields taken
from the packet itself, without checking they stay within skb->len:

 - v1 computes
	rsp_2 = skb->data + 6 + rsp_1->num_supported_rf_interfaces;
   from the on-wire (unclamped) interface count and then dereferences
   rsp_2, and memcpy()s the advertised interfaces - both can run past the
   received data;
 - v2 walks supported_rf_interfaces[], advancing the cursor by an
   in-packet rf_extension_cnt with no bound.

A short CORE_INIT_RSP therefore makes the parser read past the packet
(into the uninitialised tail of the RX skb); the values are stored into
struct nci_dev and consumed while bringing the device up:

  BUG: KMSAN: uninit-value in nci_dev_up+0x10f3/0x1720
   nci_dev_up+0x10f3/0x1720
   nfc_dev_up+0x187/0x380
   nfc_genl_dev_up+0xdc/0x1a0
   genl_rcv_msg+0x5d4/0x9e0
   netlink_rcv_skb+0x28f/0x530
  Uninit was stored to memory at:
   nci_rsp_packet+0x68f/0x2310
   nci_rx_work+0x25f/0x5d0
  Uninit was created at:
   __alloc_skb+0x540/0xd40
   virtual_ncidev_write+0x65/0x210

Validate the response length before parsing or storing the
variable-length parts, rejecting truncated responses with
NCI_STATUS_SYNTAX_ERROR.  In v1 the check is done before
num_supported_rf_interfaces is stored into ndev, so a truncated response
cannot leave ndev->num_supported_rf_interfaces holding the unclamped
on-wire count, which nci_init_complete_req() would otherwise use as a
bound for the fixed-size supported_rf_interfaces[] array.

Fixes: 6a2968aaf50c ("NFC: basic NCI protocol implementation")
Fixes: bcd684aace34 ("net/nfc/nci: Support NCI 2.x initial sequence")
Cc: stable@vger.kernel.org
Tested-by: syzbot@syzkaller.appspotmail.com
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
v2: validate the response length before storing num_supported_rf_interfaces
    into @ndev.  In v1 the unclamped on-wire count was stored first and the
    length check returned early on a truncated response, leaving
    ndev->num_supported_rf_interfaces > NCI_MAX_SUPPORTED_RF_INTERFACES; a
    subsequent CORE_INIT completion then walked it in nci_init_complete_req(),
    which the syzbot CI run on v1 flagged as a UBSAN array-index-out-of-bounds.
    https://ci.syzbot.org/series/2a9a8657-37a3-4dce-8cb5-2035027791dd
    v1: https://lore.kernel.org/all/20260623222402.175798-1-sam@bynar.io

 net/nfc/nci/rsp.c | 28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/net/nfc/nci/rsp.c b/net/nfc/nci/rsp.c
index 9eeb862825c5..6b2fa6bdbd14 100644
--- a/net/nfc/nci/rsp.c
+++ b/net/nfc/nci/rsp.c
@@ -50,11 +50,25 @@ static u8 nci_core_init_rsp_packet_v1(struct nci_dev *ndev,
 	const struct nci_core_init_rsp_1 *rsp_1 = (void *)skb->data;
 	const struct nci_core_init_rsp_2 *rsp_2;
 
+	if (skb->len < sizeof(*rsp_1))
+		return NCI_STATUS_SYNTAX_ERROR;
+
 	pr_debug("status 0x%x\n", rsp_1->status);
 
 	if (rsp_1->status != NCI_STATUS_OK)
 		return rsp_1->status;
 
+	/*
+	 * supported_rf_interfaces[] and the trailing nci_core_init_rsp_2 are
+	 * addressed using the on-wire (unclamped) interface count, so the
+	 * response must be long enough for both before any of it is parsed or
+	 * stored into @ndev - otherwise a truncated response would leave
+	 * ndev->num_supported_rf_interfaces holding the unclamped count.
+	 */
+	if (skb->len < sizeof(*rsp_1) +
+	    rsp_1->num_supported_rf_interfaces + sizeof(*rsp_2))
+		return NCI_STATUS_SYNTAX_ERROR;
+
 	ndev->nfcc_features = __le32_to_cpu(rsp_1->nfcc_features);
 	ndev->num_supported_rf_interfaces = rsp_1->num_supported_rf_interfaces;
 
@@ -88,9 +102,13 @@ static u8 nci_core_init_rsp_packet_v2(struct nci_dev *ndev,
 {
 	const struct nci_core_init_rsp_nci_ver2 *rsp = (void *)skb->data;
 	const u8 *supported_rf_interface = rsp->supported_rf_interfaces;
+	const u8 *end = skb->data + skb->len;
 	u8 rf_interface_idx = 0;
 	u8 rf_extension_cnt = 0;
 
+	if (skb->len < sizeof(*rsp))
+		return NCI_STATUS_SYNTAX_ERROR;
+
 	pr_debug("status %x\n", rsp->status);
 
 	if (rsp->status != NCI_STATUS_OK)
@@ -104,10 +122,16 @@ static u8 nci_core_init_rsp_packet_v2(struct nci_dev *ndev,
 		    NCI_MAX_SUPPORTED_RF_INTERFACES);
 
 	while (rf_interface_idx < ndev->num_supported_rf_interfaces) {
-		ndev->supported_rf_interfaces[rf_interface_idx++] = *supported_rf_interface++;
+		/* one interface byte + one extension-count byte must be present */
+		if (end - supported_rf_interface < 2)
+			return NCI_STATUS_SYNTAX_ERROR;
+		ndev->supported_rf_interfaces[rf_interface_idx++] =
+			*supported_rf_interface++;
 
-		/* skip rf extension parameters */
+		/* skip rf extension parameters, bounded by the packet */
 		rf_extension_cnt = *supported_rf_interface++;
+		if (rf_extension_cnt > end - supported_rf_interface)
+			return NCI_STATUS_SYNTAX_ERROR;
 		supported_rf_interface += rf_extension_cnt;
 	}
 

base-commit: a986fde914d88af47eb78fd29c5d1af7952c3500
-- 
2.54.0


^ permalink raw reply related

* [PATCH net v2 1/1] net/sched: sch_teql: Introduce slaves_lock to avoid race condition and UAF
From: Jamal Hadi Salim @ 2026-06-24 22:40 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jiri, victor, security,
	zdi-disclosures, stable, Jamal Hadi Salim, kernel test robot

The teql master->slaves singly linked list is not protected against
multiple writes. It can be mod'ed concurently from teql_master_xmit(),
teql_dequeue(), teql_init() and teql_destroy() without holding any list
lock or RCU protection.

zdi-disclosures@trendmicro.com has demonstrated that the qdisc is freed
after an RCU grace period, but teql_master_xmit() running on another
CPU can still hold a stale pointer into the list, resulting in a
slab-use-after-free:

BUG: KASAN: slab-use-after-free in teql_destroy+0x3ca/0x440 linux/net/sched/sch_teql.c:142
Read of size 8 at addr ffff88802923aa80 by task ip/10024

The zdi-disclosures@trendmicro.com repro created concurrent AF_PACKET
senders on a teql device against a thread that repeatedly adds/deletes the
slave qdisc, together with a SLUB spray that reclaims the freed slot; the
resulting UAF is controllable enough to be turned into a read/write
primitive against the freed qdisc object.

The fix?
Add a per-master slaves_lock spinlock that serializes all mutations of
master->slaves and the NEXT_SLAVE() links in teql_destroy() and
teql_qdisc_init(). teql_master_xmit() also takes the same slaves_lock
around those updates.
Annotate master->slaves and the per-slave ->next pointer with __rcu and
use the appropriate RCU accessors everywhere they are touched:
rcu_assign_pointer() on the writer side (under slaves_lock),
rcu_dereference_protected() for the writer-side loads (also under
slaves_lock), rcu_dereference_bh() for the loads in teql_master_xmit() and
rtnl_dereference() for the loads in teql_master_open()/teql_master_mtu(),
which run under RTNL.
Pair this with rcu_read_lock_bh()/rcu_read_unlock_bh() around the list
traversal in teql_master_xmit(), so that readers either observe a fully
linked list or are deferred until the in-flight mutation completes. The two
early-return paths in teql_master_xmit() are updated to release the RCU-bh
read-side critical section before returning, since leaving it held would
disable BH on that CPU for good.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: zdi-disclosures@trendmicro.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202606241501.XQBMu4b8-lkp@intel.com/
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
Changes in v2:
Address sparse issues found by kernel test robot <lkp@intel.com>
 - rcu annotated struct teql_master->slaves and struct teql_sched_data->next
 - teql_destroy/teql_qdisc_init: replace all READ/WRITE_ONCE() with rcu_dereference_protected()/rcu_assign_pointer()
---
 net/sched/sch_teql.c | 119 ++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 80 insertions(+), 39 deletions(-)

diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index e7bbc9e5174d..f4654f1b1200 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -52,7 +52,8 @@
 struct teql_master {
 	struct Qdisc_ops qops;
 	struct net_device *dev;
-	struct Qdisc *slaves;
+	struct Qdisc __rcu	*slaves;
+	spinlock_t		slaves_lock;	/* serializes writes to ->slaves */
 	struct list_head master_list;
 	unsigned long	tx_bytes;
 	unsigned long	tx_packets;
@@ -61,7 +62,7 @@ struct teql_master {
 };
 
 struct teql_sched_data {
-	struct Qdisc *next;
+	struct Qdisc __rcu	*next;
 	struct teql_master *m;
 	struct sk_buff_head q;
 };
@@ -101,7 +102,9 @@ teql_dequeue(struct Qdisc *sch)
 	if (skb == NULL) {
 		struct net_device *m = qdisc_dev(q);
 		if (m) {
-			dat->m->slaves = sch;
+			spin_lock_bh(&dat->m->slaves_lock);
+			rcu_assign_pointer(dat->m->slaves, sch);
+			spin_unlock_bh(&dat->m->slaves_lock);
 			netif_wake_queue(m);
 		}
 	} else {
@@ -132,34 +135,49 @@ teql_destroy(struct Qdisc *sch)
 	struct Qdisc *q, *prev;
 	struct teql_sched_data *dat = qdisc_priv(sch);
 	struct teql_master *master = dat->m;
+	struct netdev_queue *txq = NULL;
+	bool reset_master_queue = false;
 
 	if (!master)
 		return;
 
-	prev = master->slaves;
+	spin_lock_bh(&master->slaves_lock);
+	prev = rcu_dereference_protected(master->slaves,
+					 lockdep_is_held(&master->slaves_lock));
 	if (prev) {
 		do {
-			q = NEXT_SLAVE(prev);
-			if (q == sch) {
-				NEXT_SLAVE(prev) = NEXT_SLAVE(q);
-				if (q == master->slaves) {
-					master->slaves = NEXT_SLAVE(q);
-					if (q == master->slaves) {
-						struct netdev_queue *txq;
-
-						txq = netdev_get_tx_queue(master->dev, 0);
-						master->slaves = NULL;
-
-						dev_reset_queue(master->dev,
-								txq, NULL);
-					}
-				}
-				skb_queue_purge(&dat->q);
-				break;
+			struct Qdisc *head, *next;
+
+			q = rcu_dereference_protected(NEXT_SLAVE(prev),
+						      lockdep_is_held(&master->slaves_lock));
+			if (q != sch) {
+				prev = q;
+				continue;
 			}
 
-		} while ((prev = q) != master->slaves);
+			next = rcu_dereference_protected(NEXT_SLAVE(q),
+							 lockdep_is_held(&master->slaves_lock));
+			rcu_assign_pointer(NEXT_SLAVE(prev), next);
+
+			head = rcu_dereference_protected(master->slaves,
+							 lockdep_is_held(&master->slaves_lock));
+			if (q == head) {
+				rcu_assign_pointer(master->slaves, next);
+				if (q == next) {
+					txq = netdev_get_tx_queue(master->dev, 0);
+					rcu_assign_pointer(master->slaves, NULL);
+					reset_master_queue = true;
+				}
+			}
+			skb_queue_purge(&dat->q);
+			break;
+		} while (prev != rcu_dereference_protected(master->slaves,
+							   lockdep_is_held(&master->slaves_lock)));
 	}
+	spin_unlock_bh(&master->slaves_lock);
+
+	if (reset_master_queue)
+		dev_reset_queue(master->dev, txq, NULL);
 }
 
 static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
@@ -168,6 +186,7 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
 	struct net_device *dev = qdisc_dev(sch);
 	struct teql_master *m = (struct teql_master *)sch->ops;
 	struct teql_sched_data *q = qdisc_priv(sch);
+	struct Qdisc *first;
 
 	if (dev->hard_header_len > m->dev->hard_header_len)
 		return -EINVAL;
@@ -184,7 +203,9 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
 
 	skb_queue_head_init(&q->q);
 
-	if (m->slaves) {
+	spin_lock_bh(&m->slaves_lock);
+	first = rcu_dereference_protected(m->slaves, lockdep_is_held(&m->slaves_lock));
+	if (first) {
 		if (m->dev->flags & IFF_UP) {
 			if ((m->dev->flags & IFF_POINTOPOINT &&
 			     !(dev->flags & IFF_POINTOPOINT)) ||
@@ -192,8 +213,10 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
 			     !(dev->flags & IFF_BROADCAST)) ||
 			    (m->dev->flags & IFF_MULTICAST &&
 			     !(dev->flags & IFF_MULTICAST)) ||
-			    dev->mtu < m->dev->mtu)
+			    dev->mtu < m->dev->mtu) {
+				spin_unlock_bh(&m->slaves_lock);
 				return -EINVAL;
+			}
 		} else {
 			if (!(dev->flags&IFF_POINTOPOINT))
 				m->dev->flags &= ~IFF_POINTOPOINT;
@@ -204,14 +227,17 @@ static int teql_qdisc_init(struct Qdisc *sch, struct nlattr *opt,
 			if (dev->mtu < m->dev->mtu)
 				m->dev->mtu = dev->mtu;
 		}
-		q->next = NEXT_SLAVE(m->slaves);
-		NEXT_SLAVE(m->slaves) = sch;
+		rcu_assign_pointer(q->next,
+				   rcu_dereference_protected(NEXT_SLAVE(first),
+							     lockdep_is_held(&m->slaves_lock)));
+		rcu_assign_pointer(NEXT_SLAVE(first), sch);
 	} else {
-		q->next = sch;
-		m->slaves = sch;
+		rcu_assign_pointer(q->next, sch);
+		rcu_assign_pointer(m->slaves, sch);
 		m->dev->mtu = dev->mtu;
 		m->dev->flags = (m->dev->flags&~FMASK)|(dev->flags&FMASK);
 	}
+	spin_unlock_bh(&m->slaves_lock);
 	return 0;
 }
 
@@ -285,7 +311,9 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
 	int subq = skb_get_queue_mapping(skb);
 	struct sk_buff *skb_res = NULL;
 
-	start = master->slaves;
+	rcu_read_lock_bh();
+
+	start = rcu_dereference_bh(master->slaves);
 
 restart:
 	nores = 0;
@@ -317,10 +345,14 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
 				    netdev_start_xmit(skb, slave, slave_txq, false) ==
 				    NETDEV_TX_OK) {
 					__netif_tx_unlock(slave_txq);
-					master->slaves = NEXT_SLAVE(q);
+					spin_lock_bh(&master->slaves_lock);
+					rcu_assign_pointer(master->slaves,
+							   rcu_dereference_bh(NEXT_SLAVE(q)));
+					spin_unlock_bh(&master->slaves_lock);
 					netif_wake_queue(dev);
 					master->tx_packets++;
 					master->tx_bytes += length;
+					rcu_read_unlock_bh();
 					return NETDEV_TX_OK;
 				}
 				__netif_tx_unlock(slave_txq);
@@ -329,14 +361,18 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
 				busy = 1;
 			break;
 		case 1:
-			master->slaves = NEXT_SLAVE(q);
+			spin_lock_bh(&master->slaves_lock);
+			rcu_assign_pointer(master->slaves,
+					   rcu_dereference_bh(NEXT_SLAVE(q)));
+			spin_unlock_bh(&master->slaves_lock);
+			rcu_read_unlock_bh();
 			return NETDEV_TX_OK;
 		default:
 			nores = 1;
 			break;
 		}
 		__skb_pull(skb, skb_network_offset(skb));
-	} while ((q = NEXT_SLAVE(q)) != start);
+	} while ((q = rcu_dereference_bh(NEXT_SLAVE(q))) != start);
 
 	if (nores && skb_res == NULL) {
 		skb_res = skb;
@@ -345,29 +381,32 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	if (busy) {
 		netif_stop_queue(dev);
+		rcu_read_unlock_bh();
 		return NETDEV_TX_BUSY;
 	}
 	master->tx_errors++;
 
 drop:
 	master->tx_dropped++;
+	rcu_read_unlock_bh();
 	dev_kfree_skb(skb);
 	return NETDEV_TX_OK;
 }
 
 static int teql_master_open(struct net_device *dev)
 {
-	struct Qdisc *q;
+	struct Qdisc *q, *first;
 	struct teql_master *m = netdev_priv(dev);
 	int mtu = 0xFFFE;
 	unsigned int flags = IFF_NOARP | IFF_MULTICAST;
 
-	if (m->slaves == NULL)
+	first = rtnl_dereference(m->slaves);
+	if (!first)
 		return -EUNATCH;
 
 	flags = FMASK;
 
-	q = m->slaves;
+	q = first;
 	do {
 		struct net_device *slave = qdisc_dev(q);
 
@@ -389,7 +428,7 @@ static int teql_master_open(struct net_device *dev)
 			flags &= ~IFF_BROADCAST;
 		if (!(slave->flags&IFF_MULTICAST))
 			flags &= ~IFF_MULTICAST;
-	} while ((q = NEXT_SLAVE(q)) != m->slaves);
+	} while ((q = rtnl_dereference(NEXT_SLAVE(q))) != first);
 
 	m->dev->mtu = mtu;
 	m->dev->flags = (m->dev->flags&~FMASK) | flags;
@@ -417,14 +456,15 @@ static void teql_master_stats64(struct net_device *dev,
 static int teql_master_mtu(struct net_device *dev, int new_mtu)
 {
 	struct teql_master *m = netdev_priv(dev);
-	struct Qdisc *q;
+	struct Qdisc *q, *first;
 
-	q = m->slaves;
+	first = rtnl_dereference(m->slaves);
+	q = first;
 	if (q) {
 		do {
 			if (new_mtu > qdisc_dev(q)->mtu)
 				return -EINVAL;
-		} while ((q = NEXT_SLAVE(q)) != m->slaves);
+		} while ((q = rtnl_dereference(NEXT_SLAVE(q))) != first);
 	}
 
 	WRITE_ONCE(dev->mtu, new_mtu);
@@ -444,6 +484,7 @@ static __init void teql_master_setup(struct net_device *dev)
 	struct teql_master *master = netdev_priv(dev);
 	struct Qdisc_ops *ops = &master->qops;
 
+	spin_lock_init(&master->slaves_lock);
 	master->dev	= dev;
 	ops->priv_size  = sizeof(struct teql_sched_data);
 

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox