Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net v2] net: airoha: Add missing bits in airoha_qdma_cleanup_tx_queue()
From: Lorenzo Bianconi @ 2026-04-15 17:58 UTC (permalink / raw)
  To: Simon Horman
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260415164643.GQ772670@horms.kernel.org>

[-- Attachment #1: Type: text/plain, Size: 5012 bytes --]

> On Tue, Apr 14, 2026 at 08:50:52AM +0200, Lorenzo Bianconi wrote:
> > Similar to airoha_qdma_cleanup_rx_queue(), reset DMA TX descriptors in
> > airoha_qdma_cleanup_tx_queue routine. Moreover, reset TX_DMA_IDX to
> > TX_CPU_IDX to notify the NIC the QDMA TX ring is empty.
> > 
> > Fixes: 23020f0493270 ("net: airoha: Introduce ethernet support for EN7581 SoC")
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> > Changes in v2:
> > - Move q->ndesc initialization at end of airoha_qdma_init_tx routine in
> >   order to avoid any possible NULL pointer dereference in
> >   airoha_qdma_cleanup_tx_queue()
> 
> This seems to be a separate issue.
> If so, I think it should be split out into a separate patch.

Sorry, I missed this comment.
Ack. I will repost splitting this in two separated patches.

Regards,
Lorenzo

> 
> > - Check if q->tx_list is empty in airoha_qdma_cleanup_tx_queue()
> > - Link to v1: https://lore.kernel.org/r/20260410-airoha_qdma_cleanup_tx_queue-fix-net-v1-1-b7171c8f1e78@kernel.org
> 
> I think it was covered in the review Jakub forwarded for v1.  But FTR,
> Sashiko has some feedback on this patch in the form of an existing bug
> (that should almost certainly be handled separately from this patch).
> 
> > ---
> >  drivers/net/ethernet/airoha/airoha_eth.c | 41 ++++++++++++++++++++++++++------
> >  1 file changed, 34 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
> > index 9e995094c32a..3c1a2bc68c42 100644
> > --- a/drivers/net/ethernet/airoha/airoha_eth.c
> > +++ b/drivers/net/ethernet/airoha/airoha_eth.c
> > @@ -966,27 +966,27 @@ static int airoha_qdma_init_tx_queue(struct airoha_queue *q,
> >  	dma_addr_t dma_addr;
> >  
> >  	spin_lock_init(&q->lock);
> > -	q->ndesc = size;
> >  	q->qdma = qdma;
> >  	q->free_thr = 1 + MAX_SKB_FRAGS;
> >  	INIT_LIST_HEAD(&q->tx_list);
> >  
> > -	q->entry = devm_kzalloc(eth->dev, q->ndesc * sizeof(*q->entry),
> > +	q->entry = devm_kzalloc(eth->dev, size * sizeof(*q->entry),
> >  				GFP_KERNEL);
> >  	if (!q->entry)
> >  		return -ENOMEM;
> >  
> > -	q->desc = dmam_alloc_coherent(eth->dev, q->ndesc * sizeof(*q->desc),
> > +	q->desc = dmam_alloc_coherent(eth->dev, size * sizeof(*q->desc),
> >  				      &dma_addr, GFP_KERNEL);
> >  	if (!q->desc)
> >  		return -ENOMEM;
> >  
> > -	for (i = 0; i < q->ndesc; i++) {
> > +	for (i = 0; i < size; i++) {
> >  		u32 val = FIELD_PREP(QDMA_DESC_DONE_MASK, 1);
> >  
> >  		list_add_tail(&q->entry[i].list, &q->tx_list);
> >  		WRITE_ONCE(q->desc[i].ctrl, cpu_to_le32(val));
> >  	}
> > +	q->ndesc = size;
> >  
> >  	/* xmit ring drop default setting */
> >  	airoha_qdma_set(qdma, REG_TX_RING_BLOCKING(qid),
> > @@ -1051,13 +1051,17 @@ static int airoha_qdma_init_tx(struct airoha_qdma *qdma)
> >  
> >  static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
> >  {
> > -	struct airoha_eth *eth = q->qdma->eth;
> > -	int i;
> > +	struct airoha_qdma *qdma = q->qdma;
> > +	struct airoha_eth *eth = qdma->eth;
> > +	int i, qid = q - &qdma->q_tx[0];
> > +	struct airoha_queue_entry *e;
> > +	u16 index = 0;
> >  
> >  	spin_lock_bh(&q->lock);
> >  	for (i = 0; i < q->ndesc; i++) {
> > -		struct airoha_queue_entry *e = &q->entry[i];
> 
> super nit: In v2 e is always used within a block (here and in the hunk below).
>            So I would lean towards declaring e in the blocks where it is
> 	   used.
> 
> 	   No need to repost just for this!
> 
> > +		struct airoha_qdma_desc *desc = &q->desc[i];
> >  
> > +		e = &q->entry[i];
> >  		if (!e->dma_addr)
> >  			continue;
> >  
> > @@ -1067,8 +1071,31 @@ static void airoha_qdma_cleanup_tx_queue(struct airoha_queue *q)
> >  		e->dma_addr = 0;
> >  		e->skb = NULL;
> >  		list_add_tail(&e->list, &q->tx_list);
> > +
> > +		/* Reset DMA descriptor */
> > +		WRITE_ONCE(desc->ctrl, 0);
> > +		WRITE_ONCE(desc->addr, 0);
> > +		WRITE_ONCE(desc->data, 0);
> > +		WRITE_ONCE(desc->msg0, 0);
> > +		WRITE_ONCE(desc->msg1, 0);
> > +		WRITE_ONCE(desc->msg2, 0);
> > +
> >  		q->queued--;
> >  	}
> > +
> > +	if (!list_empty(&q->tx_list)) {
> > +		e = list_first_entry(&q->tx_list, struct airoha_queue_entry,
> > +				     list);
> > +		index = e - q->entry;
> > +	}
> > +	/* Set TX_DMA_IDX to TX_CPU_IDX to notify the hw the QDMA TX ring is
> > +	 * empty.
> > +	 */
> > +	airoha_qdma_rmw(qdma, REG_TX_CPU_IDX(qid), TX_RING_CPU_IDX_MASK,
> > +			FIELD_PREP(TX_RING_CPU_IDX_MASK, index));
> > +	airoha_qdma_rmw(qdma, REG_TX_DMA_IDX(qid), TX_RING_DMA_IDX_MASK,
> > +			FIELD_PREP(TX_RING_DMA_IDX_MASK, index));
> > +
> >  	spin_unlock_bh(&q->lock);
> >  }
> >  
> > 
> > ---
> > base-commit: 2cd7e6971fc2787408ceef17906ea152791448cf
> > change-id: 20260410-airoha_qdma_cleanup_tx_queue-fix-net-93375f5ee80f
> > 
> > Best regards,
> > -- 
> > Lorenzo Bianconi <lorenzo@kernel.org>
> > 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply

* Re: [PATCH net v3 3/5] bonding: 3ad: fix mux port state on oper down
From: Louis Scalbert @ 2026-04-15 18:03 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
	shemminger, maheshb
In-Reply-To: <707642.1776098994@famine>

Le lun. 13 avr. 2026 à 18:49, Jay Vosburgh <jv@jvosburgh.net> a écrit :
>
> Louis Scalbert <louis.scalbert@6wind.com> wrote:
>
> >When the bonding interface has carrier down due to the absence of
> >valid slaves and a slave transitions from down to up, the bonding
> >interface briefly goes carrier up, then down again, and finally up
> >once LACP negotiates collecting and distributing on the port.
>
>         Instead of "valid," I would suggest "usable."
>
> >The interface should not transition to carrier up until LACP
> >negotiation is complete.

I agree to take lacp_strict as the new parameter name.
I should write instead:

*With lacp_strict enabled,* the interface should not transition...

>
>         If the new option is off, i.e., does not require successful LACP
> negotiation, it should wait some time before asserting carrier up.  If
> negotiation fails, however, how long should it wait?
>

I am not sure I understand your point.

When lacp_strict is disabled, the existing behavior should remain
unchanged. The purpose of this patch is only to correct the
Collecting/Distributing state when the port is not operational.

This state is not used to determine carrier assertion when lacp_strict
is disabled.

I will double-check that the timing remains unchanged before posting
the next version of the patch.

> >This happens because the actor and partner port states remain in
> >collecting (and distributing) when the port goes down. When the port
> >comes back up, it temporarily remains in this state until LACP
> >renegotiation occurs.
> >
> >Previously this was mostly cosmetic, but since the bonding carrier
> >state now depends on the LACP negotiation state, it causes the
> >interface to flap.
>
>         "now depends" -> "may depend"
>
> >Fix this by unsetting the SELECTED flag when a port goes down so that
> >the mux state machine transitions through ATTACHED and DETACHED,
> >which clears the actor collecting and distributing flags. Do not
> >attempt to set the SELECTED flag if the port is still disabled.
> >
> >Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
> >Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> >---
> > drivers/net/bonding/bond_3ad.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index b79a76296966..3a94fbcbf721 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -1570,6 +1570,12 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr)
> >       struct slave *slave;
> >       int found = 0;
> >
> >+      /* Disabled ports cannot be SELECTED.
> >+       * Do not attempt to set the SELECTED flag if the port is still disabled.
> >+       */
> >+      if (!port->is_enabled)
> >+              return;
> >+
>
>         I think the change is fine, but the comment seems redundant to
> me.
>
>         -J
>
> >       /* if the port is already Selected, do nothing */
> >       if (port->sm_vars & AD_PORT_SELECTED)
> >               return;
> >@@ -2794,6 +2800,7 @@ void bond_3ad_handle_link_change(struct slave *slave, char link)
> >               /* link has failed */
> >               port->is_enabled = false;
> >               ad_update_actor_keys(port, true);
> >+              port->sm_vars &= ~AD_PORT_SELECTED;
> >       }
> >       agg = __get_first_agg(port);
> >       ad_agg_selection_logic(agg, &dummy);
> >--
> >2.39.2
> >
>
> ---
>         -Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply

* Re: [PATCH net v3 4/5] bonding: 3ad: fix stuck negotiation on recovery
From: Louis Scalbert @ 2026-04-15 18:08 UTC (permalink / raw)
  To: Jay Vosburgh
  Cc: netdev, andrew+netdev, edumazet, kuba, pabeni, fbl, andy,
	shemminger, maheshb
In-Reply-To: <710787.1776105588@famine>

Le lun. 13 avr. 2026 à 20:39, Jay Vosburgh <jv@jvosburgh.net> a écrit :
>
> Louis Scalbert <louis.scalbert@6wind.com> wrote:
>
> >The previous commit introduced a side effect caused by clearing the
> >SELECTED flag on disabled ports. After all ports in an aggregator go
> >down, if only a subset of ports comes back up, those ports can no
> >longer renegotiate LACP unless all aggregator ports come back up.
> >
> >1. All aggregator ports go down
> >  - The SELECTED flag is cleared on all of them.
> >2. One port comes back up
> >  - Its SELECTED flag is set again.
> >  - It enters the WAITING state and gets its READY_N flag.
> >  - The remaining ports stay UNSELECTED. Because of that, they cannot
> >  enter the WAITING state and therefore never get READY_N.
>
>         This is the part that I think we may be doing something else
> incorrectly.  If the port is UNSELECTED, then that means that no
> aggregator is currently selected for that port, and therefore it
> shouldn't be assigned to an aggregator with other ports (per
> 802.1AX-2014 6.4.8, "Selected").
>
>         I'm not seeing anything in the 6.4.14 Selection Logic that makes
> me think a port that is down (port_enabled == FALSE) is disallowed from
> being SELECTED.

Yes. Clearing SELECTED when port_enabled == FALSE is incorrect.
That means my previous patch was wrong.

>
>         Looking at the Receive machine state diagram (Figure 6-18), I
> tend to think that in this case the port would transition to
> PORT_DISABLED state, as we're not asserting a BEGIN (reinitialization of
> the LACP protocol entity), so the port variables can remain unchanged.
> There's even some language that suggests this is intentional:
>
>         "If the Aggregation Port becomes inoperable and the BEGIN
>         variable is not asserted, the state machine enters the
>         PORT_DISABLED state. [...] This state allows the current
>         Selection state to remain undisturbed, so that, in the event
>         that the Aggregation Port is still connected to the same Partner
>         and Partner Aggregation Port when it becomes operable again,
>         there will be no disturbance caused to higher layers by
>         unnecessary re-configuration.
>
>         So, perhaps the actual bug is that these ports are attached to
> the aggregator but not SELECTED.

Yes. According to the Mux state machine diagram, the correct behavior
when port_enabled == FALSE is to transition to the WAITING state,
except when already in the DETACHED state.

The WAITING state should then be responsible for clearing the
Synchronization, Collecting, and Distributing flags.

I will address this in the previous patch, so this one will no longer
be necessary.

best regards,

Louis Scalbert


>
>         -J
>
>
> >  - __agg_ports_are_ready() returns 0 because it finds a port without
> >  READY_N.
> >  - As a result, __set_agg_ports_ready() keeps the READY flag cleared on
> >  all ports.
> >  - The port that came back up is therefore not marked READY and cannot
> >  transition to ATTACHED.
> >  - LACP negotiation becomes stuck, and the port cannot be used.
> >3. All aggregator ports come back up
> >  - They all regain SELECTED and READY_N.
> >  - __agg_ports_are_ready() now returns 1.
> >  - __set_agg_ports_ready() sets READY on all ports.
> >  - They can then transition to ATTACHED.
> >  - Negotiation resumes and the aggregator becomes operational again.
> >
> >Consider only ports currently in the WAITING mux state for READY_N in
> >order to avoid __agg_ports_are_ready() to return 0 because of a disabled
> >port. That matches 802.3ad, which states: "The Selection Logic asserts
> >Ready TRUE when the values of Ready_N for all ports that are waiting to
> >attach to a given Aggregator are TRUE.".
> >
> >Fixes: 655f8919d549 ("bonding: add min links parameter to 802.3ad")
> >Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
> >---
> > drivers/net/bonding/bond_3ad.c | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> >diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
> >index 3a94fbcbf721..3f56d892b101 100644
> >--- a/drivers/net/bonding/bond_3ad.c
> >+++ b/drivers/net/bonding/bond_3ad.c
> >@@ -700,7 +700,8 @@ static void __update_ntt(struct lacpdu *lacpdu, struct port *port)
> > }
> >
> > /**
> >- * __agg_ports_are_ready - check if all ports in an aggregator are ready
> >+ * __agg_ports_are_ready - check if all ports in an aggregator that are in
> >+ * the WAITING state are ready
> >  * @aggregator: the aggregator we're looking at
> >  *
> >  */
> >@@ -716,6 +717,8 @@ static int __agg_ports_are_ready(struct aggregator *aggregator)
> >               for (port = aggregator->lag_ports;
> >                    port;
> >                    port = port->next_port_in_aggregator) {
> >+                      if (port->sm_mux_state != AD_MUX_WAITING)
> >+                              continue;
> >                       if (!(port->sm_vars & AD_PORT_READY_N)) {
> >                               retval = 0;
> >                               break;
> >--
> >2.39.2
> >
>
> ---
>         -Jay Vosburgh, jv@jvosburgh.net

^ permalink raw reply

* Re: [PATCH net-next v8 4/4] tun/tap & vhost-net: avoid ptr_ring tail-drop when a qdisc is present
From: Simon Schippers @ 2026-04-15 18:27 UTC (permalink / raw)
  To: Jason Wang
  Cc: willemdebruijn.kernel, andrew+netdev, davem, edumazet, kuba,
	pabeni, mst, eperezma, leiyang, stephen, jon, tim.gebauer, netdev,
	linux-kernel, kvm, virtualization
In-Reply-To: <b9d84d88-46d5-4fd3-a5b2-d914f54766f6@tu-dortmund.de>

I'd appreciate your feedback when you have time. 

Thanks!


^ permalink raw reply

* [RFC PATCH net-next v2 0/2] udp: fix FOU/GUE over multicast
From: Anton Danilov @ 2026-04-15 18:48 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest

UDP encapsulation (FOU, GUE) has never worked correctly with multicast
destination addresses. When a FOU-encapsulated packet arrives at a
multicast address, it enters __udp4_lib_mcast_deliver() which calls
consume_skb() on packets that need resubmission to the inner protocol
handler, silently dropping them instead.

The unicast delivery path and the GSO segmentation path both handle
this correctly, but the multicast path was never updated to support
UDP encapsulation resubmit.

This causes silent packet loss for FOU/GRETAP tunnels configured with
multicast remote addresses. The loss ratio depends on the early demux
cache hit rate - packets that hit early demux bypass the multicast path
and work correctly, masking the issue.

Reproducing the issue:

  ip netns add ns_a && ip netns add ns_b
  ip link add veth0 type veth peer name veth1
  ip link set veth0 netns ns_a && ip link set veth1 netns ns_b

  ip -n ns_a addr add 10.0.0.1/24 dev veth0 && ip -n ns_a link set veth0 up
  ip -n ns_b addr add 10.0.0.2/24 dev veth1 && ip -n ns_b link set veth1 up

  # Multicast routes
  ip -n ns_a route add 239.0.0.0/8 dev veth0
  ip -n ns_b route add 239.0.0.0/8 dev veth1

  # Disable early demux to expose the issue (otherwise it's partially masked)
  ip netns exec ns_b sysctl -w net.ipv4.ip_early_demux=0

  # Join multicast group on receiver
  ip -n ns_b addr add 239.0.0.1/32 dev veth1 autojoin

  # Sender: FOU + GRETAP with multicast remote
  ip netns exec ns_a ip fou add port 4797 ipproto 47
  ip -n ns_a link add eoudp0 type gretap \
      remote 239.0.0.1 local 10.0.0.1 \
      encap fou encap-sport 4797 encap-dport 4797 key 239.0.0.1
  ip -n ns_a link set eoudp0 up
  ip -n ns_a addr add 192.168.99.1/24 dev eoudp0

  # Receiver: FOU + GRETAP with multicast remote
  ip netns exec ns_b ip fou add port 4797 ipproto 47
  ip -n ns_b link add eoudp0 type gretap \
      remote 239.0.0.1 local 10.0.0.2 \
      encap fou encap-sport 4797 encap-dport 4797 key 239.0.0.1
  ip -n ns_b link set eoudp0 up
  ip -n ns_b addr add 192.168.99.2/24 dev eoudp0

  # Test: ping through the FOU/GRETAP tunnel
  ip netns exec ns_a ping -c 100 192.168.99.2
  # -> without this patch: 0 packets received on eoudp0
  # -> with this patch: all packets received on eoudp0

AI assistance (Claude, claude-opus-4-6) was used during root cause
analysis of the kernel source code (tracing the call chain from
udp_queue_rcv_skb through encap_rcv to ip_protocol_deliver_rcu,
comparing unicast/GSO/multicast paths) and during patch and selftest
authoring. The fix approach was identified by observing that the
unicast path (udp_unicast_rcv_skb) and the GSO path
(udp_queue_rcv_skb) both already handle encap resubmit correctly,
while the multicast path did not.

Changes since v1 (RFC):
  - Moved inline Python packet generator from the shell script into a
    separate helper (fou_mcast_encap.py) and added it to TEST_FILES
    in the Makefile

Anton Danilov (2):
  udp: fix encapsulation packet resubmit in multicast deliver
  selftests: net: add FOU multicast encapsulation resubmit test

 net/ipv4/udp.c                                |  13 ++-
 net/ipv6/udp.c                                |  13 ++-
 tools/testing/selftests/net/Makefile          |   2 +
 .../testing/selftests/net/fou_mcast_encap.py  |  81 ++++++++++++++
 .../testing/selftests/net/fou_mcast_encap.sh  | 105 ++++++++++++++++++
 5 files changed, 206 insertions(+), 8 deletions(-)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.py
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

--
2.47.3

^ permalink raw reply

* [PATCH v1 net] af_unix: Drop all SCM attributes for SOCKMAP.
From: Kuniyuki Iwashima @ 2026-04-15 18:48 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, Cong Wang, Jiang Wang, Kuniyuki Iwashima,
	Kuniyuki Iwashima, netdev, Xingyu Jin

SOCKMAP can hide inflight fd from AF_UNIX GC.

When a socket in SOCKMAP receives skb with inflight fd,
sk_psock_verdict_data_ready() looks up the mapped socket and
enqueue skb to its psock->ingress_skb.

Since neither the old nor the new GC can inspect the psock
queue, the hidden skb leaks the inflight sockets.  Note that
this cannot be detected via kmemleak because inflight sockets
are linked to a global list.

In addition, SOCKMAP redirect breaks the Tarjan-based GC's
assumption that unix_edge.successor is always alive, which
is no longer true once skb is redirected, resulting in
use-after-free below. [0]

Moreover, SOCKMAP does not call scm_stat_del() properly,
so unix_show_fdinfo() could report an incorrect fd count.

sk_msg_recvmsg() does not support any SCM attributes in the
first place.

Let's drop all SCM attributes before passing skb to the
SOCKMAP layer.

[0]:
BUG: KASAN: slab-use-after-free in unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251)
Read of size 8 at addr ffff888125362670 by task kworker/56:1/496

CPU: 56 UID: 0 PID: 496 Comm: kworker/56:1 Not tainted 7.0.0-rc7-00263-gb9d8b856689d #3 PREEMPT(lazy)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Workqueue: events sk_psock_backlog
Call Trace:
 <TASK>
 dump_stack_lvl (lib/dump_stack.c:122)
 print_report (mm/kasan/report.c:379)
 kasan_report (mm/kasan/report.c:597)
 unix_del_edges (net/unix/garbage.c:118 net/unix/garbage.c:181 net/unix/garbage.c:251)
 unix_destroy_fpl (net/unix/garbage.c:317)
 unix_destruct_scm (./include/net/scm.h:80 ./include/net/scm.h:86 net/unix/af_unix.c:1976)
 sk_psock_backlog (./include/linux/skbuff.h:?)
 process_scheduled_works (kernel/workqueue.c:?)
 worker_thread (kernel/workqueue.c:?)
 kthread (kernel/kthread.c:438)
 ret_from_fork (arch/x86/kernel/process.c:164)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
 </TASK>

Allocated by task 955:
 kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78)
 __kasan_slab_alloc (mm/kasan/common.c:369)
 kmem_cache_alloc_noprof (mm/slub.c:4539)
 sk_prot_alloc (net/core/sock.c:2240)
 sk_alloc (net/core/sock.c:2301)
 unix_create1 (net/unix/af_unix.c:1099)
 unix_create (net/unix/af_unix.c:1169)
 __sock_create (net/socket.c:1606)
 __sys_socketpair (net/socket.c:1811)
 __x64_sys_socketpair (net/socket.c:1863 net/socket.c:1860 net/socket.c:1860)
 do_syscall_64 (arch/x86/entry/syscall_64.c:?)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)

Freed by task 496:
 kasan_save_track (mm/kasan/common.c:58 mm/kasan/common.c:78)
 kasan_save_free_info (mm/kasan/generic.c:587)
 __kasan_slab_free (mm/kasan/common.c:287)
 kmem_cache_free (mm/slub.c:6165)
 __sk_destruct (net/core/sock.c:2282 net/core/sock.c:2384)
 sk_psock_destroy (./include/net/sock.h:?)
 process_scheduled_works (kernel/workqueue.c:?)
 worker_thread (kernel/workqueue.c:?)
 kthread (kernel/kthread.c:438)
 ret_from_fork (arch/x86/kernel/process.c:164)
 ret_from_fork_asm (arch/x86/entry/entry_64.S:258)

Fixes: c63829182c37 ("af_unix: Implement ->psock_update_sk_prot()")
Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types")
Reported-by: Xingyu Jin <xingyuj@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
 net/unix/af_unix.c | 35 +++++++++++++++++++++++++++--------
 1 file changed, 27 insertions(+), 8 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b23c33df8b46..91a03c8a4281 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1964,16 +1964,19 @@ static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb)
 
 static void unix_destruct_scm(struct sk_buff *skb)
 {
-	struct scm_cookie scm;
+	struct scm_cookie scm = {};
+
+	swap(scm.pid, UNIXCB(skb).pid);
 
-	memset(&scm, 0, sizeof(scm));
-	scm.pid = UNIXCB(skb).pid;
 	if (UNIXCB(skb).fp)
 		unix_detach_fds(&scm, skb);
 
-	/* Alas, it calls VFS */
-	/* So fscking what? fput() had been SMP-safe since the last Summer */
 	scm_destroy(&scm);
+}
+
+static void unix_wfree(struct sk_buff *skb)
+{
+	unix_destruct_scm(skb);
 	sock_wfree(skb);
 }
 
@@ -1989,7 +1992,7 @@ static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool sen
 	if (scm->fp && send_fds)
 		err = unix_attach_fds(scm, skb);
 
-	skb->destructor = unix_destruct_scm;
+	skb->destructor = unix_wfree;
 	return err;
 }
 
@@ -2066,6 +2069,13 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb)
 	}
 }
 
+static void unix_orphan_scm(struct sock *sk, struct sk_buff *skb)
+{
+	scm_stat_del(sk, skb);
+	unix_destruct_scm(skb);
+	skb->destructor = sock_wfree;
+}
+
 /*
  *	Send AF_UNIX data.
  */
@@ -2679,10 +2689,16 @@ static int unix_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 	int err;
 
 	mutex_lock(&u->iolock);
+
 	skb = skb_recv_datagram(sk, MSG_DONTWAIT, &err);
-	mutex_unlock(&u->iolock);
-	if (!skb)
+	if (!skb) {
+		mutex_unlock(&u->iolock);
 		return err;
+	}
+
+	unix_orphan_scm(sk, skb);
+
+	mutex_unlock(&u->iolock);
 
 	return recv_actor(sk, skb);
 }
@@ -2882,6 +2898,9 @@ static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor)
 #endif
 
 	spin_unlock(&queue->lock);
+
+	unix_orphan_scm(sk, skb);
+
 	mutex_unlock(&u->iolock);
 
 	return recv_actor(sk, skb);
-- 
2.54.0.rc1.513.gad8abe7a5a-goog


^ permalink raw reply related

* [RFC PATCH net-next v2 1/2] udp: fix encapsulation packet resubmit in multicast deliver
From: Anton Danilov @ 2026-04-15 18:50 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest
In-Reply-To: <ad_dal164gVmImWl@dau-home-pc>

When a UDP encapsulation socket (e.g., FOU) receives a multicast
packet, __udp4_lib_mcast_deliver() and __udp6_lib_mcast_deliver()
incorrectly call consume_skb() when udp_queue_rcv_skb() returns a
positive value. A positive return value from udp_queue_rcv_skb()
indicates that the encap_rcv handler (e.g., fou_udp_recv) has
consumed the UDP header and wants the packet to be resubmitted to
the IP protocol handler for further processing (e.g., as a GRE
packet).

The unicast path in udp_unicast_rcv_skb() handles this correctly by
returning -ret, which propagates up to ip_protocol_deliver_rcu() for
resubmission. The GSO path in udp_queue_rcv_skb() also handles this
correctly by calling ip_protocol_deliver_rcu() directly. However, the
multicast path destroys the packet via consume_skb() instead of
resubmitting it, causing silent packet loss.

This affects any UDP encapsulation (FOU, GUE) combined with multicast
destination addresses. In practice, it causes ~50% packet loss on
FOU/GRETAP tunnels configured with multicast remote addresses, with
the exact ratio depending on the early demux cache hit rate (packets
that hit early demux take the unicast path and are handled correctly).

Fix this by calling ip_protocol_deliver_rcu() (IPv4) or
ip6_protocol_deliver_rcu() (IPv6) instead of consume_skb() when the
return value is positive, matching the behavior of the GSO path.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 net/ipv4/udp.c | 13 +++++++++----
 net/ipv6/udp.c | 13 +++++++++----
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index e9e2ce9522ef..8c2d4367cba2 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2467,6 +2467,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -2500,8 +2501,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			__UDP_INC_STATS(net, UDP_MIB_INERRORS);
 			continue;
 		}
-		if (udp_queue_rcv_skb(sk, nskb) > 0)
-			consume_skb(nskb);
+		ret = udp_queue_rcv_skb(sk, nskb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+						ret);
 	}

 	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -2511,8 +2514,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udp_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udp_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			ip_protocol_deliver_rcu(dev_net(skb->dev), skb,
+						ret);
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 15e032194ecc..f74935d9f7d7 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -949,6 +949,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	struct udp_hslot *hslot;
 	struct sk_buff *nskb;
 	bool use_hash2;
+	int ret;

 	hash2_any = 0;
 	hash2 = 0;
@@ -987,8 +988,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 			continue;
 		}

-		if (udpv6_queue_rcv_skb(sk, nskb) > 0)
-			consume_skb(nskb);
+		ret = udpv6_queue_rcv_skb(sk, nskb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
+						 ret, true);
 	}

 	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
@@ -998,8 +1001,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	}

 	if (first) {
-		if (udpv6_queue_rcv_skb(first, skb) > 0)
-			consume_skb(skb);
+		ret = udpv6_queue_rcv_skb(first, skb);
+		if (ret > 0)
+			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb,
+						 ret, true);
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
-- 
2.47.3

^ permalink raw reply related

* [RFC PATCH net-next v2 2/2] selftests: net: add FOU multicast encapsulation resubmit test
From: Anton Danilov @ 2026-04-15 18:52 UTC (permalink / raw)
  To: netdev
  Cc: willemdebruijn.kernel, davem, dsahern, edumazet, kuba, pabeni,
	horms, shuah, linux-kselftest
In-Reply-To: <ad_dal164gVmImWl@dau-home-pc>

Add a selftest to verify that FOU-encapsulated packets addressed to a
multicast destination are correctly resubmitted to the inner protocol
handler (GRE) via the UDP multicast delivery path.

The test creates two network namespaces connected by a veth pair. The
receiver namespace has a FOU/GRETAP tunnel with a multicast remote
address (239.0.0.1). A Python helper script (fou_mcast_encap.py)
crafts GRE-over-UDP packets and sends them to the multicast address.

The early demux optimization (net.ipv4.ip_early_demux) is disabled on
the receiver to force packets through __udp4_lib_mcast_deliver(),
which is the code path that was previously broken.

Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
Assisted-by: Claude:claude-opus-4-6
---
 tools/testing/selftests/net/Makefile          |   2 +
 .../testing/selftests/net/fou_mcast_encap.py  |  81 ++++++++++++++
 .../testing/selftests/net/fou_mcast_encap.sh  | 105 ++++++++++++++++++
 3 files changed, 188 insertions(+)
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.py
 create mode 100755 tools/testing/selftests/net/fou_mcast_encap.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index a275ed584026..2b463dda5ac6 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -38,6 +38,7 @@ TEST_PROGS := \
 	fib_rule_tests.sh \
 	fib_tests.sh \
 	fin_ack_lat.sh \
+	fou_mcast_encap.sh \
 	fq_band_pktlimit.sh \
 	gre_gso.sh \
 	gre_ipv6_lladdr.sh \
@@ -195,6 +196,7 @@ TEST_GEN_PROGS := \
 
 TEST_FILES := \
 	fcnal-test.sh \
+	fou_mcast_encap.py \
 	in_netns.sh \
 	lib.sh \
 	settings \
diff --git a/tools/testing/selftests/net/fou_mcast_encap.py b/tools/testing/selftests/net/fou_mcast_encap.py
new file mode 100755
index 000000000000..0fd836c454a8
--- /dev/null
+++ b/tools/testing/selftests/net/fou_mcast_encap.py
@@ -0,0 +1,81 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+"""Send FOU/GRE encapsulated packets to a multicast destination.
+
+Build and send GRE-over-UDP (FOU) packets with a multicast outer
+destination IP. Used by fou_mcast_encap.sh to test that the UDP
+multicast delivery path correctly resubmits encapsulated packets
+to the inner protocol handler.
+"""
+
+import argparse
+import socket
+import struct
+
+
+def checksum(data):
+    """Compute Internet checksum (RFC 1071)."""
+    csum = 0
+    for i in range(0, len(data) - 1, 2):
+        csum += (data[i] << 8) + data[i + 1]
+    if len(data) % 2:
+        csum += data[-1] << 8
+    while csum >> 16:
+        csum = (csum & 0xFFFF) + (csum >> 16)
+    return ~csum & 0xFFFF
+
+
+def build_gre_encap_packet(dst_addr):
+    """Build a GRE/Ethernet/IP/ICMP payload for FOU encapsulation."""
+    gre_key = socket.inet_aton(dst_addr)
+    gre_hdr = struct.pack("!HH", 0x2000, 0x6558) + gre_key
+
+    dst_mac = b"\xff\xff\xff\xff\xff\xff"
+    src_mac = b"\x02\x00\x00\x00\x00\x01"
+    eth_hdr = dst_mac + src_mac + struct.pack("!H", 0x0800)
+
+    inner_ip_src = socket.inet_aton("192.168.99.1")
+    inner_ip_dst = socket.inet_aton("192.168.99.2")
+
+    icmp_payload = b"TESTFOU!" * 4
+    icmp_hdr = struct.pack("!BBHHH", 8, 0, 0, 0x1234, 1) + icmp_payload
+    icmp_csum = checksum(icmp_hdr)
+    icmp_hdr = struct.pack("!BBHHH", 8, 0, icmp_csum, 0x1234, 1) + icmp_payload
+
+    ip_len = 20 + len(icmp_hdr)
+    ip_hdr = struct.pack("!BBHHHBBH", 0x45, 0, ip_len, 0x1234, 0, 64, 1, 0)
+    ip_hdr += inner_ip_src + inner_ip_dst
+    ip_csum = checksum(ip_hdr)
+    ip_hdr = ip_hdr[:10] + struct.pack("!H", ip_csum) + ip_hdr[12:]
+
+    return gre_hdr + eth_hdr + ip_hdr + icmp_hdr
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description="Send FOU/GRE encapsulated packets to a multicast address"
+    )
+    parser.add_argument(
+        "-c", "--count", type=int, required=True,
+        help="number of packets to send",
+    )
+    parser.add_argument(
+        "-d", "--dst", default="239.0.0.1",
+        help="destination multicast address (default: 239.0.0.1)",
+    )
+    parser.add_argument(
+        "-p", "--port", type=int, default=4797,
+        help="destination UDP port (default: 4797)",
+    )
+    args = parser.parse_args()
+
+    payload = build_gre_encap_packet(args.dst)
+
+    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+    for _ in range(args.count):
+        sock.sendto(payload, (args.dst, args.port))
+    sock.close()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/tools/testing/selftests/net/fou_mcast_encap.sh b/tools/testing/selftests/net/fou_mcast_encap.sh
new file mode 100755
index 000000000000..1f9dbe7092f3
--- /dev/null
+++ b/tools/testing/selftests/net/fou_mcast_encap.sh
@@ -0,0 +1,105 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that UDP encapsulation (FOU) correctly handles packet resubmit
+# when packets are delivered via the multicast UDP delivery path.
+#
+# When a FOU-encapsulated packet arrives with a multicast destination IP,
+# __udp4_lib_mcast_deliver() must resubmit it to the inner protocol
+# handler (e.g., GRE) rather than consuming it. This test verifies that
+# by creating a FOU/GRETAP tunnel with a multicast remote address, sending
+# encapsulated packets, and checking that they are correctly decapsulated.
+#
+# The early demux optimization can mask this issue by routing packets via
+# the unicast path (udp_unicast_rcv_skb), so we disable it to force
+# packets through __udp4_lib_mcast_deliver().
+
+source lib.sh
+
+NSENDER=""
+NRECV=""
+
+cleanup() {
+	cleanup_all_ns
+}
+
+trap cleanup EXIT
+
+setup() {
+	setup_ns NSENDER NRECV
+
+	ip link add veth_s type veth peer name veth_r
+	ip link set veth_s netns "$NSENDER"
+	ip link set veth_r netns "$NRECV"
+
+	ip -n "$NSENDER" addr add 10.0.0.1/24 dev veth_s
+	ip -n "$NSENDER" link set veth_s up
+
+	ip -n "$NRECV" addr add 10.0.0.2/24 dev veth_r
+	ip -n "$NRECV" link set veth_r up
+
+	# Disable early demux to force multicast delivery path
+	ip netns exec "$NRECV" sysctl -wq net.ipv4.ip_early_demux=0
+
+	# Join multicast group on receiver
+	ip -n "$NRECV" addr add 239.0.0.1/32 dev veth_r autojoin
+
+	# Multicast routes
+	ip -n "$NRECV" route add 239.0.0.0/8 dev veth_r
+	ip -n "$NSENDER" route add 239.0.0.0/8 dev veth_s
+
+	# FOU listener
+	ip netns exec "$NRECV" ip fou add port 4797 ipproto 47
+
+	# GRETAP with multicast remote - this triggers __udp4_lib_mcast_deliver
+	ip -n "$NRECV" link add eoudp0 type gretap \
+		remote 239.0.0.1 local 10.0.0.2 \
+		encap fou encap-sport 4797 encap-dport 4797 \
+		key 239.0.0.1
+	ip -n "$NRECV" link set eoudp0 up
+	ip -n "$NRECV" addr add 192.168.99.2/24 dev eoudp0
+}
+
+send_fou_gre_packets() {
+	local count=$1
+	local script_dir
+	script_dir=$(dirname "$(readlink -f "$0")")
+
+	ip netns exec "$NSENDER" python3 "$script_dir/fou_mcast_encap.py" \
+		-c "$count" -d 239.0.0.1 -p 4797
+}
+
+get_rx_packets() {
+	ip -n "$NRECV" -s link show eoudp0 | awk '/RX:/{getline; print $2}'
+}
+
+test_fou_mcast_encap() {
+	local count=100
+	local rx_before
+	local rx_after
+	local rx_delta
+
+	rx_before=$(get_rx_packets)
+	send_fou_gre_packets $count
+	sleep 1
+	rx_after=$(get_rx_packets)
+
+	rx_delta=$((rx_after - rx_before))
+
+	if [ "$rx_delta" -ge "$count" ]; then
+		echo "PASS: received $rx_delta/$count packets via multicast FOU/GRETAP"
+		return "$ksft_pass"
+	elif [ "$rx_delta" -gt 0 ]; then
+		echo "FAIL: only $rx_delta/$count packets received (partial delivery)"
+		return "$ksft_fail"
+	else
+		echo "FAIL: 0/$count packets received (multicast encap resubmit broken)"
+		return "$ksft_fail"
+	fi
+}
+
+echo "TEST: FOU/GRETAP multicast encapsulation resubmit"
+
+setup
+test_fou_mcast_encap
+exit $?
-- 
2.47.3


^ permalink raw reply related

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Martin KaFai Lau @ 2026-04-15 18:55 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: bpf, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern,
	netdev, linux-doc, linux-kernel
In-Reply-To: <20260414105702.248310-1-jiayuan.chen@linux.dev>

On Tue, Apr 14, 2026 at 06:57:00PM +0800, Jiayuan Chen wrote:
> A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> to inject custom TCP header options. When the kernel builds a TCP packet,
> it calls tcp_established_options() to calculate the header size, which
> invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> callback.
> 
> If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> tcp_current_mss(), which calls tcp_established_options() again,
> re-triggering the same BPF callback. This creates an infinite recursion
> that exhausts the kernel stack and causes a panic.
> 
> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>   -> bpf_setsockopt(TCP_NODELAY)
> 	-> tcp_push_pending_frames()
> 	  -> tcp_current_mss()
> 		-> tcp_established_options()
> 		  -> bpf_skops_hdr_opt_len()
>                            /* infinite recursion */
> 			-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> 
> A similar reentrancy issue exists for TCP congestion control, which is
> guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> bpf_setsockopt(TCP_NODELAY) calls that would trigger
> tcp_push_pending_frames() and cause the recursion.
> 
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/

Thanks for the report and fixes suggested across different threads.

Using has_current_bpf_ctx() to avoid tcp_push_pending_frames() should
work but it may change the expectation for bpf_setsockopt(TCP_NODELAY).
e.g. A bpf_tcp_iter does bpf_setsockopt(TCP_NODELAY).

Adding another bit in the tcp_sock is not ideal either. I agree with
Alexei that it is better to reuse the existing bit if we go down this path.
We also need to audit more closely if there are cases that two different
type of bpf progs may call bpf_setsockopt(). e.g.
bpf_tcp_iter does bpf_setsockopt(TCP_CONGESTION) to switch
to a bpf_tcp_cc and the new bpf_tcp_cc->init() will also do
bpf_setsockopt(xxx) which then will be rejected.

Another fix could be, the bpf_setsockopt(TCP_NODELAY) is always broken
for BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB unless
the bpf prog is doing some maneuver to avoid the recursion. Thus,
this use case is basically broken as is and I don't see a use case
for bpf_setsockopt(TCP_NODELAY) when writing header also.
How about checking the bpf_sock->op, level, and optname in
bpf_sock_ops_setsockopt() and return -EOPNOTSUPP?

^ permalink raw reply

* Re: [PATCH net-next v4 04/13] devlink: allow to use devlink index as a command handle
From: Geert Uytterhoeven @ 2026-04-15 19:04 UTC (permalink / raw)
  To: jiri
  Cc: andrew+netdev, chuck.lever, cjubran, corbet, daniel.zahka, davem,
	donald.hunter, edumazet, horms, kuba, leon, linux-doc, linux-rdma,
	linux-trace-kernel, mathieu.desnoyers, matttbe, mbloch, mhiramat,
	mschmidt, netdev, pabeni, przemyslaw.kitszel, rostedt, saeedm,
	skhan, tariqt, linux-kernel
In-Reply-To: <20260312100407.551173-5-jiri@resnulli.us>

On Thu, 12 Mar 2026, Jiri Pirko wrote:
> Currently devlink instances are addressed bus_name/dev_name tuple.
> Allow the newly introduced DEVLINK_ATTR_INDEX to be used as
> an alternative handle for all devlink commands.
> 
> When DEVLINK_ATTR_INDEX is present in the request, use it for a direct
> xarray lookup instead of iterating over all instances comparing
> bus_name/dev_name strings.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Thanks for your patch, which is now commit d85a8af57da87196 ("devlink:
allow to use devlink index as a command handle").

This has a rather large impact on kernel size.
For e.g. m68k/atari_defconfig, bloat-o-meter reports:

    add/remove: 4/1 grow/shrink: 72/1 up/down: 65804/-76 (65728)
    Function                                     old     new   delta
    devlink_trap_policer_get_dump_nl_policy       24    1480   +1456
    devlink_trap_group_get_dump_nl_policy         24    1480   +1456
    devlink_trap_get_dump_nl_policy               24    1480   +1456
    devlink_selftests_get_nl_policy               24    1480   +1456
    devlink_sb_tc_pool_bind_get_dump_nl_policy      24    1480   +1456
    devlink_sb_port_pool_get_dump_nl_policy       24    1480   +1456
    devlink_sb_pool_get_dump_nl_policy            24    1480   +1456
    devlink_sb_get_dump_nl_policy                 24    1480   +1456
    devlink_resource_dump_nl_policy               24    1480   +1456
    devlink_region_get_dump_nl_policy             24    1480   +1456
    devlink_rate_get_dump_nl_policy               24    1480   +1456
    devlink_port_get_dump_nl_policy               24    1480   +1456
    devlink_param_get_dump_nl_policy              24    1480   +1456
    devlink_linecard_get_dump_nl_policy           24    1480   +1456
    devlink_info_get_nl_policy                    24    1480   +1456
    devlink_get_nl_policy                         24    1480   +1456
    devlink_eswitch_get_nl_policy                 24    1480   +1456
    devlink_dpipe_headers_get_nl_policy           24    1480   +1456
    devlink_port_unsplit_nl_policy                32    1480   +1448
    devlink_port_param_set_nl_policy              32    1480   +1448
    devlink_port_param_get_nl_policy              32    1480   +1448
    devlink_port_get_do_nl_policy                 32    1480   +1448
    devlink_port_del_nl_policy                    32    1480   +1448
    devlink_notify_filter_set_nl_policy           32    1480   +1448
    devlink_health_reporter_get_dump_nl_policy      32    1480   +1448
    devlink_port_split_nl_policy                  80    1480   +1400
    devlink_sb_occ_snapshot_nl_policy             96    1480   +1384
    devlink_sb_occ_max_clear_nl_policy            96    1480   +1384
    devlink_sb_get_do_nl_policy                   96    1480   +1384
    devlink_sb_port_pool_get_do_nl_policy        144    1480   +1336
    devlink_sb_pool_get_do_nl_policy             144    1480   +1336
    devlink_sb_pool_set_nl_policy                168    1480   +1312
    devlink_sb_port_pool_set_nl_policy           176    1480   +1304
    devlink_sb_tc_pool_bind_set_nl_policy        184    1480   +1296
    devlink_sb_tc_pool_bind_get_do_nl_policy     184    1480   +1296
    devlink_dpipe_table_get_nl_policy            240    1480   +1240
    devlink_dpipe_entries_get_nl_policy          240    1480   +1240
    devlink_dpipe_table_counters_set_nl_policy     272    1480   +1208
    devlink_eswitch_set_nl_policy                504    1480    +976
    devlink_resource_set_nl_policy               544    1480    +936
    devlink_param_get_do_nl_policy               656    1480    +824
    devlink_region_get_do_nl_policy              712    1480    +768
    devlink_region_new_nl_policy                 744    1480    +736
    devlink_region_del_nl_policy                 744    1480    +736
    devlink_health_reporter_test_nl_policy       928    1480    +552
    devlink_health_reporter_recover_nl_policy     928    1480    +552
    devlink_health_reporter_get_do_nl_policy     928    1480    +552
    devlink_health_reporter_dump_get_nl_policy     928    1480    +552
    devlink_health_reporter_dump_clear_nl_policy     928    1480    +552
    devlink_health_reporter_diagnose_nl_policy     928    1480    +552
    devlink_trap_get_do_nl_policy               1048    1480    +432
    devlink_trap_set_nl_policy                  1056    1480    +424
    devlink_trap_group_get_do_nl_policy         1088    1480    +392
    devlink_trap_policer_get_do_nl_policy       1144    1480    +336
    devlink_trap_group_set_nl_policy            1144    1480    +336
    devlink_trap_policer_set_nl_policy          1160    1480    +320
    devlink_port_set_nl_policy                  1168    1480    +312
    devlink_flash_update_nl_policy              1224    1480    +256
    devlink_reload_nl_policy                    1248    1480    +232
    devlink_port_new_nl_policy                  1320    1480    +160
    devlink_rate_get_do_nl_policy               1352    1480    +128
    devlink_rate_del_nl_policy                  1352    1480    +128
    devlink_linecard_get_do_nl_policy           1376    1480    +104
    __devlinks_xa_find_get                         -      96     +96
    devlink_linecard_set_nl_policy              1392    1480     +88
    devlink_selftests_run_nl_policy             1416    1480     +64
    devlink_get_from_attrs_lock                  262     314     +52
    devlink_region_read_nl_policy               1440    1480     +40
    devlink_rate_set_nl_policy                  1448    1480     +32
    devlink_rate_new_nl_policy                  1448    1480     +32
    devlinks_xa_lookup_get                         -      30     +30
    devlink_health_reporter_set_nl_policy       1456    1480     +24
    devlink_attr_index_range                       -      16     +16
    devlink_param_set_nl_policy                 1472    1480      +8
    devlink_nl_dumpit                            276     282      +6
    __initcall__kmod_core__670_573_devlink_init4       -       4      +4
    __initcall__kmod_core__670_561_devlink_init4       4       -      -4
    devlinks_xa_find_get                          96      24     -72
    Total: Before=5203976, After=5269704, chg +1.26%

> --- a/net/devlink/netlink_gen.c
> +++ b/net/devlink/netlink_gen.c
> @@ -11,6 +11,11 @@
>  
>  #include <uapi/linux/devlink.h>
>  
> +/* Integer value ranges */
> +static const struct netlink_range_validation devlink_attr_index_range = {
> +	.max	= U32_MAX,
> +};
> +
>  /* Sparse enums validation callbacks */
>  static int
>  devlink_attr_param_type_validate(const struct nlattr *attr,
> @@ -56,37 +61,42 @@ const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_I
>  };
>  
>  /* DEVLINK_CMD_GET - do */
> -static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_DEV_NAME + 1] = {
> +static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_INDEX + 1] = {

Unrelated to this change, but the explicit sizing of these arrays is not
needed, as the compiler will take care of that.

>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

This array, and many others below, are sparse, with large gaps (up to
1456 or 2912 bytes on 32-bit resp. 64-bit systems) before the last
entries.

>  };
>  
>  /* DEVLINK_CMD_PORT_GET - do */
> -static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_PORT_INDEX + 1] = {
> +static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_INDEX + 1] = {
>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

Shouldn't this be inserted at the end, as DEVLINK_ATTR_INDEX >
DEVLINK_ATTR_PORT_INDEX, for readability?

>  	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
>  };

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-15 19:37 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <CAH5Ym4jKdzDeYwCfkMLmUz0FsiD2vFwfuAvqFE=uvMtPmakeMQ@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:38:29AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> > > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> > > <linux@armlinux.org.uk> wrote:
> > > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > > > survives iperf3 -c -R to the imx6.
> > >
> > > Hi Russell,
> > >
> > > Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> > > unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> > >
> > > > Dumping the registers and comparing, and then forcing the RQS and TQS
> > > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > > > *256 = 36864 ytes) respectively seems to solve the problem. Under
> > > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > > > all four of the MTL receive operation mode registers, but only the
> > > > first MTL transmit operation mode register. However, DMA channels 1-3
> > > > aren't initialised.
> > >
> > > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> > > than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> > > allowed to overflow, putting the hardware in a bad state.
> > >
> > > Though I suspect this is only half of the problem: do you still see
> > > RBUs? Everything you've shared so far suggests the DMA failures are
> > > _not_ because the rx ring is drying up.
> >
> > Yes. Note that RBUs will happen not because of DMA failures, but if
> > the kernel fails to keep up with the packet rate. RBU means "we read
> > the next descriptor, and it wasn't owned by hardware".
> 
> Are you speaking from observation, documentation, or understanding?

Observation.

> I'd define RBU the same way, but you reported:

It's not a question about how I define RBU - this is defined by Synopsys
and I'm using it *exactly* that way as stated in the documentation.

"This bit indicates that the host owns the Next Descriptor in the
Receive List and the DMA cannot acquire it. The Receive Process is
suspended. ... This bit is set only when the previous Receive
Descriptor is owned by the DMA."

In other words, DMA has processed the previous receive descriptor which
_was_ owned by the hardware, written back to clear the OWN bit, and
then fetches the next descriptor and finds that the OWN bit is also
clear.

> 
> ```
> [   55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer
> unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245
> last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64
> 
> cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...]
> [...]
> Every ring entry contains the same RDES3 value, so it really is
> completely full at the point RBU fires (bit 31 clear means software
> owns the descriptor, and it's basically saying first/last segment,
> RDES1 valid, buffer 1 length of 1518.
> ```

Right, because the _last_ time stmmac_rx() was called, the ring was
completely refilled (as it always is for me).

There are two scenarios that what I'm seeing may happen.

1) The ring was fully refilled, but before stmmac_rx() is next
   executed, all descriptors end up being consumed due to the rate
   at which packets are being received. Thus, the hardware encounters
   a descriptor that has OWN=0

2) The kernel has been slow to respond to packets that have been
   received, and because of the NAPI throttling stmmac_rx() to only
   process 64 descriptors at a time, we are falling way behind the
   hardware position. Eventually, the hardware catches up with
   the point at which stmmac_rx_refill() is repopulating the receive
   descriptors, and encounters a descriptor that has OWN=0.

For (2), for example, let's take the example which you've quoted from
me.

stmmac_rx() gets called, and cur_rx = dirty_rx = 245. We're limited to
a count of 64 meaning we're not going to process more than 64 entries
no matter how far ahead the hardware is. Let's say the hardware is at
e.g. descriptor 400 at this point.

stmmac_rx() runs, processing descriptors. It works its way up to entry
309, at which point count == limit, so it stops, and we now have
cur_rx = 309, dirty_rx = 245.

The next thing stmmac_rx() does is call stmmac_rx_refill(). This looks
at the difference, and calculates how many entries need to be
repopulated. stmmac_rx_dirty() returns 64, as that's the number of
entries between dirty_rx and the updated cur_rx. It populates those
entries.

At this point, dirty_rx = 309. All well and good. However, during that
process, packet reception hasn't stopped, and let's say it's now at
descriptor 500.

In that scenario, we're consuming 100 descriptors, but only repopulating
64 descriptors. As this continues, the hardware is slowly catching up
with point in the ring that stmmac_rx_refill() is repopulating the
descriptors.

When it does catch up, it will encounter a descriptor with OWN=0, which
will fire the RBU interrupt.

At this point, my debug dumps the state of the ring. If the RBU was
raised when stmmac_rx()/stmmac_rx_refill() was not running, _and_ we
are always successfully refilling all the entries that stmmac_rx()
processed, then cur_rx will equal dirty_rx, even when the hardware
could be way ahead of cur_rx. Neither of these indexes have any
relevance to where the hardware actually is in the ring.

The dump of the ring state *clearly* shows that all descriptors have
a RDES3 value which indicates that every single descriptor is not
hardware owned at this point (since RBU has been raised, the receive
process is suspended, so hardware is no longer changing the ring.)

> It would seem* that the kernel isn't really failing to keep up with
> the packet rate. If RBU is firing with a ring that's not even close to
> empty, that tells me there's another way for it to fire. So I suspect
> the hardware designers implemented it to mean:
> "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> 
> (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> is actually completely depleted, not full? If count==budget, wouldn't
> that mean the whole ring hasn't been visited, so we only refilled 64
> entries and not necessarily the entire ring? Maybe the kernel isn't
> keeping up after all.)

Ah, I think that's where our terminology differs.

You seem to define full as "populated with empty buffers". I define
full to mean "the hardware has filled every buffer with a packet that
it has received and handed it over to software to process." Note even
the terminology there - filling buffers with data. That ultimately
ends up filling the ring, and when completely filled, it is full.

I think of buffers like buckets. If a buffer contains no data, it
is empty. If a buffer contains data, it has been filled or is full.
Apply that to a list of buffers and you get the same thing. Many
ethernet driver documentation uses this same terminology, so I
thought it would be widely understood.

> > That has:
> >
> >         const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
> >                   FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
> >         };
> >         const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
> >                   FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
> >         };
> >
> > where each of those values is the RQS/TQS value to use in KiB:
> >
> > #define FIFO_SZ(x)              ((((x) * 1024U) / 256U) - 1U)
> >
> > This doesn't correspond with the values I'm seeing programmed into
> > the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
> > (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
> > above from a quick look, but they're not in the right place!
> 
> True, but:
> a) I doubt 5.10.216-tegra includes exactly the same version of the
> driver found in this random GitHub mirror. (My intent was only to
> point out that they don't use 5.10's stmmac; I should have been more
> clear that I wasn't trying to link the same version, sorry!)
> b) This is vendor code; I don't know how good their testing/review
> process is. It might not run the way it looks. The intent seems to be
> for RQS > TQS (which makes intuitive sense), but as you're seeing the
> registers programmed the other way 'round, they might have gotten them
> subtly mixed up.
> 
> > Now, as for FIFO sizes, if we sum up all the entries, then we
> > get:
> >
> > SUM(rx_fifo_size[0][]) = 60KiB
> > SUM(rx_fifo_size[1][]) = 64KiB
> > SUM(tx_fifo_size[0][]) = 60KiB
> > SUM(tx_fifo_size[1][]) = 64KiB
> 
> I follow the math with 64KiB, but surely the 60KiB should be
> 9+9+9+9+1+1+1+1=40KiB? This seems to me that the "legacy EQOS" simply
> shifts with smaller FIFOs. Since dwmac is licensed as a soft IP core,
> perhaps the FIFO size is an elaboration parameter? That would mean
> this isn't an issue with dwmac 5.0 broadly, but with Nvidia's specific
> instantiation of it.

Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
trying to do anything.

However, I've tested with 0x7f in both fields, and it still falls flat
on its face. I've also tried other values, but because I had to unplug
the laptop from the nvidia board to use the laptop portably due to the
medical emergency situation, that caused screen to quit, so I've lost
all that. Chaos reigns supreme here :/

So, I'm not sure we understand what's going on - I don't think it's that
the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
results in some kind of throttling that prevents the condition which
hangs the hardware.

I'm not getting as much time as I'd like to really test out scenarios
due to everything that is going on, and honestly I feel like just
writing this week off now and giving up.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH v3 net] vsock: fix buffer size clamping order
From: Michal Luczaj @ 2026-04-15 19:55 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Norbert Szetei, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, virtualization, netdev, linux-kernel
In-Reply-To: <ad9q6r6ZBHIH7k4f@sgarzare-redhat>

On 4/15/26 12:42, Stefano Garzarella wrote:
> On Tue, Apr 14, 2026 at 04:22:04PM +0200, Michal Luczaj wrote:
>> On 4/9/26 18:34, Norbert Szetei wrote:
>>> In vsock_update_buffer_size(), the buffer size was being clamped to the
>>> maximum first, and then to the minimum. If a user sets a minimum buffer
>>> size larger than the maximum, the minimum check overrides the maximum
>>> check, inverting the constraint.
>>>
>>> This breaks the intended socket memory boundaries by allowing the
>>> vsk->buffer_size to grow beyond the configured vsk->buffer_max_size.
>>>
>>> Fix this by checking the minimum first, and then the maximum. This
>>> ensures the buffer size never exceeds the buffer_max_size.
>>
>> Something may be missing. After adding another ioctl to your reproducer, I
>> still see crashes.
>>
>>     SYSCHK(setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MIN_SIZE, &min,
>>                       sizeof(min)));
>> +    SYSCHK(setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE, &min,
>> +                      sizeof(min)));
>> }
>>
>> [*] Setting buffer_min_size to 0x400000000.
>> [socket][0] sending...
>>
>> refcount_t: saturated; leaking memory.
>> WARNING: lib/refcount.c:22 at refcount_warn_saturate+0x7d/0xb0, CPU#2:
>> a.out/1478
>> ...
>> refcount_t: underflow; use-after-free.
>> WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x50/0xb0, CPU#12:
>> kworker/12:0/80
>> Workqueue: vsock-loopback vsock_loopback_work
>> ...
>>
> 
> yeah, I pointed out the same during the bug discussion 
> (https://lore.kernel.org/netdev/acuKUpZQq6z1DY_n@sgarzare-redhat/) and 
> suggested to add a sysctl or reuse net.core.wmem_max/rmem_max
> (https://lore.kernel.org/netdev/adYKERRYwzMIhZAl@sgarzare-redhat/)

Oh, so this patch wasn't meant to tackle the repro from v1. Sorry, I got
confused.

Michal


^ permalink raw reply

* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Russell King (Oracle) @ 2026-04-15 19:58 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <CAH5Ym4gy6g8d88-vGhe1zxoV7jNH_fXHsDSdDWC4x00H7s-3=w@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:53:15AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 9:28 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Apr 15, 2026 at 01:56:32PM +0100, Russell King (Oracle) wrote:
> > > Locally, while debugging my issues, I used this to prevent cur_rx
> > > catching up with dirty_rx:
> > >
> > >                 status = stmmac_rx_status(priv, &priv->xstats, p);
> > >                 /* check if managed by the DMA otherwise go ahead */
> > >                 if (unlikely(status & dma_own))
> > >                         break;
> > >
> > >                 next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
> > >                                                priv->dma_conf.dma_rx_size);
> > >                 if (unlikely(next_entry == rx_q->dirty_rx))
> > >                         break;
> > >
> > >                 rx_q->cur_rx = next_entry;
> > >
> > > If we care about the cost of reloading rx_q->dirty_rx on every
> > > iteration, then I'd suggest that the cost we already incur reading and
> > > writing rx_q->cur_rx is something that should be addressed, and
> > > eliminating that would counter the cost of reading rx_q->dirty_rx. I
> > > suspect, however, that the cost is minimal, as cur_tx and dirty_rx are
> > > likely in the same cache line.
> 
> No, no, I like your approach better. :) It also removes the need for
> the `limit` clamp at the top of the function, so later code can assume
> limit==budget.
> 
> > > It looks like any fix to stmmac_rx() will also need a corresponding
> > > fix for stmmac_rx_zc().
> 
> I agree that stmmac_rx_zc() is likely also broken (in a similar way,
> but not similar enough to permit a "corresponding" fix), but I don't
> agree that there's a dependency relationship here. This patch is
> addressing #221010, which affects the generic/non-ZC codepath; I'm
> afraid the ZC codepath warrants its own investigation.

The code structure is identical. The only difference is what happens
to the packets.

Both paths take the NAPI limit. Both paths process up to that limit of
descriptors. The state saving / restoring is similar. The read_again
label is the same, the condition after is the same.

The ZC path differs at this point in that it will attempt to refill
every 16 descriptors that have been processed.

Both paths then read the descriptor and check the ownership.
Both paths then increment cur_rx to point to the next entry around
the ring.
Both paths then get the following descriptor pointer and prefetch
it.
Both paths then get the extended status if we're using extended
descriptors.
Both paths then handle frame discard.
Both paths then jump back to read_again if this isn't the last
segment and we have an error.
Both paths then check for error.
... and so it goes on.

The ZC path to me looks like a copy-paste-and-tweak approach to
adding support. The difference seems to be centered only around
the handling of the data buffers in the descriptors. The overall
mechanism of processing the descriptors follows the same layout
in both functions.

> > I have some further information, but a new curveball has just been
> > chucked... and I've no idea what this will mean at this stage. Just
> > take it that I won't be responding for a while.
> 
> I think I follow your meaning. Good luck getting it straightened out!

It looks like further curveballs have been thrown as a result,
destroying all "plans" for the next days/week. I have aboslutely
no ideas how much time or when I'll be able to look at anything
at the moment, so don't assume that because I find an opportunity
to send an email, everthing is back to normal.

I'll also note that over the last two days I've written several
emails on this, spent many hours on them, only to discard them
as other ideas/research and maybe even the passage of time means
they're no longer appropriate to send.

Jakub: sorry, I just *can't* review stuff on netdev with everything
that is going on, not when .... cna't complete this.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH net] slip: reject VJ frames when no receive slots are allocated
From: Weiming Shi @ 2026-04-15 20:03 UTC (permalink / raw)
  To: Simon Horman
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, kees, netdev, xmei5
In-Reply-To: <20260414134126.758795-2-horms@kernel.org>

On 26-04-14 14:41, Simon Horman wrote:
> From: 'Simon Horman' <horms@kernel.org>
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> slip: reject VJ frames when no receive slots are allocated
> 
> This commit prevents a null pointer dereference by adding checks for
> a NULL comp->rstate in slhc_uncompress() and slhc_remember(). It ensures
> frames are rejected when no receive slots are allocated.
> 
> > diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
> > index e3c785da3eef3..e67052bcab57b 100644
> > --- a/drivers/net/slip/slhc.c
> > +++ b/drivers/net/slip/slhc.c
> > @@ -502,6 +502,10 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
> >  
> >  	/* We've got a compressed packet; read the change byte */
> >  	comp->sls_i_compressed++;
> > +	if (!comp->rstate) {
> > +		comp->sls_i_error++;
> > +		return 0;
> > +	}
> 
> While reviewing this code, a few pre-existing issues were noticed in and
> around slhc_uncompress().
> 
> Can slhc_uncompress() cause a heap buffer overflow?
> 
> It expands compressed packets by prepending reconstructed IP and TCP
> headers, which can total up to 120 bytes if options are maximized. The
> caller in slip.c assumes a maximum expansion of 80 bytes:
> 
>     if (count + 80 > sl->buffsize)
> 
> If a compressed packet is sized at sl->buffsize - 80, slhc_uncompress()
> shifts the payload and copies the 120-byte headers, writing up to 37 bytes
> past the dynamically allocated sl->rbuff chunk. Does this corrupt memory?
> 
> Also, does the decode() function read variable-length deltas without
> bounds checking against the packet length isize?
> 
> A malformed packet with multiple change flags set could cause decode()
> to read up to 15 bytes past the end of the packet data. Because the packet
> buffer sl->rbuff is allocated via kmalloc without zero-initialization,
> could this read uninitialized kernel heap memory and inject it into the
> reconstructed TCP/IP headers, leaking it back to the network?
> 
> Finally, do slhc_compress() and slhc_uncompress() read and write the
> 16-bit TCP checksum using direct pointer casts?
> 
> drivers/net/slip/slhc.c:slhc_uncompress() {
>     ...
>     thp->check = *(__sum16 *)cp;
>     ...
> }
> 
> Since the compressed header varies dynamically in size, the cp pointer
> frequently lands on odd byte boundaries. On architectures with strict
> alignment requirements, could this direct dereference trigger an alignment
> fault and kernel panic? Should the code use get_unaligned() and
> put_unaligned() instead?
> 
> [ ... ]
> 
> > @@ -651,8 +655,9 @@ slhc_remember(struct slcompress *comp, unsigned char *icp, int isize)
> >  
> >  	/* The packet is shorter than a legal IP header.
> >  	 * Also make sure isize is positive.
> > +	 * Reject if no receive slots are configured (rstate is NULL).
> >  	 */
> > -	if (isize < (int)sizeof(struct iphdr)) {
> > +	if (!comp->rstate || isize < (int)sizeof(struct iphdr)) {
> >  runt:
> >  		comp->sls_i_runt++;
> >  		return slhc_toss(comp);
> 
> When dropping a packet due to a lack of configured receive slots
> (rslots == 0), should this increment comp->sls_i_error instead of
> comp->sls_i_runt?
> 
> Dropping a packet because rstate is missing is an internal configuration
> state, not a runt packet.

Thanks, Simon. I will send v2 patch.

I also reproduced the three pre-existing issues you pointed out
(one via KASAN, the other two under GDB) and will send patches for
them separately.

Best regards,
Weiming Shi

^ permalink raw reply

* Re: [PATCH nf,v2 1/3] rculist: add list_splice_rcu() for private lists
From: Pablo Neira Ayuso @ 2026-04-15 20:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw, horms,
	joelagnelf, josh, boqun, urezki, rostedt, mathieu.desnoyers,
	jiangshanlai, qiang.zhang, rcu
In-Reply-To: <9210a276-8158-40f4-b3b5-6431f5f13541@paulmck-laptop>

On Wed, Apr 15, 2026 at 10:39:33AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 15, 2026 at 07:08:44PM +0200, Pablo Neira Ayuso wrote:
> > This patch adds a helper function, list_splice_rcu(), to safely splice
> > a private (non-RCU-protected) list into an RCU-protected list.
> > 
> > The function ensures that only the pointer visible to RCU readers
> > (prev->next) is updated using rcu_assign_pointer(), while the rest of
> > the list manipulations are performed with regular assignments, as the
> > source list is private and not visible to concurrent RCU readers.
> > 
> > This is useful for moving elements from a private list into a global
> > RCU-protected list, ensuring safe publication for RCU readers.
> > Subsystems with some sort of batching mechanism from userspace can
> > benefit from this new function.
> > 
> > The function __list_splice_rcu() has been added for clarity and to
> > follow the same pattern as in the existing list_splice*() interfaces,
> > where there is a check to ensure that that the list to splice is not
> > empty. Note that __list_splice_rcu() has no documentation for this
> > reason.
> > 
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> > v2: including comments by Paul McKenney.
> > 
> >     Except, I have deliberately keep back the suggestion to squash
> >     __list_splice_rcu() into list_splice_rcu(), I instead removed
> >     the documentation for __list_splice_rcu(). I am looking
> >     at other existing list_splice*() function in list.h and rculist.h
> >     to get this aligned with __list_splice(), which also has no users
> >     in the tree and no documentation. I find it easier to read with
> >     __list_splice(), but if this explaination is not sound so...
> > 
> >     @Paul: I can post v3 squashing __list_splice_rcu(), just let me
> >            know.
> 
> Removing the comment addresses most of my concerns.  I do have a slight
> but not overwhelming preference for the squashed version, but either way:
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Or if you want this to go in via RCU, please let us know.  My guess is
> that it would be easier for you to take it in with the code using it.

I'd prefer to take it through nf.git, I need this as a fix for an
invalid use of list_splice() on a RCU-protected list.

Thanks for your quick review Paul!

^ permalink raw reply

* Re: [RFC PATCH net-next v2 1/2] udp: fix encapsulation packet resubmit in multicast deliver
From: Kuniyuki Iwashima @ 2026-04-15 20:35 UTC (permalink / raw)
  To: littlesmilingcloud
  Cc: davem, dsahern, edumazet, horms, kuba, linux-kselftest, netdev,
	pabeni, shuah, willemdebruijn.kernel
In-Reply-To: <ad_eD6CjMOBSXQVm@dau-home-pc>

From: Anton Danilov <littlesmilingcloud@gmail.com>
Date: Wed, 15 Apr 2026 21:50:55 +0300
> When a UDP encapsulation socket (e.g., FOU) receives a multicast
> packet, __udp4_lib_mcast_deliver() and __udp6_lib_mcast_deliver()
> incorrectly call consume_skb() when udp_queue_rcv_skb() returns a
> positive value. A positive return value from udp_queue_rcv_skb()
> indicates that the encap_rcv handler (e.g., fou_udp_recv) has
> consumed the UDP header and wants the packet to be resubmitted to
> the IP protocol handler for further processing (e.g., as a GRE
> packet).
> 
> The unicast path in udp_unicast_rcv_skb() handles this correctly by
> returning -ret, which propagates up to ip_protocol_deliver_rcu() for
> resubmission. The GSO path in udp_queue_rcv_skb() also handles this
> correctly by calling ip_protocol_deliver_rcu() directly. However, the
> multicast path destroys the packet via consume_skb() instead of
> resubmitting it, causing silent packet loss.
> 
> This affects any UDP encapsulation (FOU, GUE) combined with multicast
> destination addresses. In practice, it causes ~50% packet loss on
> FOU/GRETAP tunnels configured with multicast remote addresses, with
> the exact ratio depending on the early demux cache hit rate (packets
> that hit early demux take the unicast path and are handled correctly).
> 
> Fix this by calling ip_protocol_deliver_rcu() (IPv4) or
> ip6_protocol_deliver_rcu() (IPv6) instead of consume_skb() when the
> return value is positive, matching the behavior of the GSO path.
> 
> Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
> Assisted-by: Claude:claude-opus-4-6
> ---
>  net/ipv4/udp.c | 13 +++++++++----
>  net/ipv6/udp.c | 13 +++++++++----
>  2 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index e9e2ce9522ef..8c2d4367cba2 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2467,6 +2467,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	struct udp_hslot *hslot;
>  	struct sk_buff *nskb;
>  	bool use_hash2;
> +	int ret;
>  
>  	hash2_any = 0;
>  	hash2 = 0;
> @@ -2500,8 +2501,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  			__UDP_INC_STATS(net, UDP_MIB_INERRORS);
>  			continue;
>  		}
> -		if (udp_queue_rcv_skb(sk, nskb) > 0)
> -			consume_skb(nskb);
> +		ret = udp_queue_rcv_skb(sk, nskb);
> +		if (ret > 0)
> +			ip_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
> +						ret);

Is this path reachable in the first place ?

Maybe I'm missing something, but UDP tunnel sockets
should not have SO_REUSEADDR/SO_REUSEPORT.


>  	}
>  
>  	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
> @@ -2511,8 +2514,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	}
>  
>  	if (first) {
> -		if (udp_queue_rcv_skb(first, skb) > 0)
> -			consume_skb(skb);
> +		ret = udp_queue_rcv_skb(first, skb);
> +		if (ret > 0)
> +			ip_protocol_deliver_rcu(dev_net(skb->dev), skb,
> +						ret);

If the above is true, we can simply return -ret here
to avoid possible stack overflow with too many FOU
encapsulation that syzbot is fond of.


Please wait 24h before next submission.
https://docs.kernel.org/process/maintainer-netdev.html


>  	} else {
>  		kfree_skb(skb);
>  		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 15e032194ecc..f74935d9f7d7 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -949,6 +949,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	struct udp_hslot *hslot;
>  	struct sk_buff *nskb;
>  	bool use_hash2;
> +	int ret;
>  
>  	hash2_any = 0;
>  	hash2 = 0;
> @@ -987,8 +988,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  			continue;
>  		}
>  
> -		if (udpv6_queue_rcv_skb(sk, nskb) > 0)
> -			consume_skb(nskb);
> +		ret = udpv6_queue_rcv_skb(sk, nskb);
> +		if (ret > 0)
> +			ip6_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
> +						 ret, true);
>  	}
>  
>  	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
> @@ -998,8 +1001,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	}
>  
>  	if (first) {
> -		if (udpv6_queue_rcv_skb(first, skb) > 0)
> -			consume_skb(skb);
> +		ret = udpv6_queue_rcv_skb(first, skb);
> +		if (ret > 0)
> +			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb,
> +						 ret, true);
>  	} else {
>  		kfree_skb(skb);
>  		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
> -- 
> 2.47.3
> 

^ permalink raw reply

* [PATCH net v2] slip: reject VJ receive packets on instances with no rstate array
From: Weiming Shi @ 2026-04-15 20:41 UTC (permalink / raw)
  To: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Xiang Mei, Weiming Shi

slhc_init() accepts rslots == 0 as a valid configuration, with the
documented meaning of 'no receive compression'. In that case the
allocation loop in slhc_init() is skipped, so comp->rstate stays
NULL and comp->rslot_limit stays 0 (from the kzalloc of struct
slcompress).

The receive helpers do not defend against that configuration.
slhc_uncompress() dereferences comp->rstate[x] when the VJ header
carries an explicit connection ID, and slhc_remember() later assigns
cs = &comp->rstate[...] after only comparing the packet's slot number
to comp->rslot_limit. Because rslot_limit is 0, slot 0 passes the
range check, and the code dereferences a NULL rstate.

The configuration is reachable in-tree through PPP. PPPIOCSMAXCID
stores its argument in a signed int, and (val >> 16) uses arithmetic
shift. Passing 0xffff0000 therefore sign-extends to -1, so val2 + 1
is 0 and ppp_generic.c ends up calling slhc_init(0, 1). Because
/dev/ppp open is gated by ns_capable(CAP_NET_ADMIN), the whole path
is reachable from an unprivileged user namespace. Once the malformed
VJ state is installed, any inbound VJ-compressed or VJ-uncompressed
frame that selects slot 0 crashes the kernel in softirq context:

 Oops: general protection fault, probably for non-canonical
       address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
 RIP: 0010:slhc_uncompress (drivers/net/slip/slhc.c:519)
 Call Trace:
  <TASK>
  ppp_receive_nonmp_frame (drivers/net/ppp/ppp_generic.c:2466)
  ppp_input (drivers/net/ppp/ppp_generic.c:2359)
  ppp_async_process (drivers/net/ppp/ppp_async.c:492)
  tasklet_action_common (kernel/softirq.c:926)
  handle_softirqs (kernel/softirq.c:623)
  run_ksoftirqd (kernel/softirq.c:1055)
  smpboot_thread_fn (kernel/smpboot.c:160)
  kthread (kernel/kthread.c:436)
  ret_from_fork (arch/x86/kernel/process.c:164)
  </TASK>

Reject the receive side on such instances instead of touching rstate.
slhc_uncompress() falls through to its existing 'bad' label, which
bumps sls_i_error and enters the toss state. slhc_remember() mirrors
that with an explicit sls_i_error increment followed by slhc_toss();
the sls_i_runt counter is not used here because a missing rstate is
an internal configuration state, not a runt packet.

The transmit path is unaffected: the only in-tree caller that picks
rslots from userspace (ppp_generic.c) still supplies tslots >= 1, and
slip.c always calls slhc_init(16, 16), so comp->tstate remains valid
and slhc_compress() continues to work.

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v2:
- slhc_remember(): use sls_i_error instead of sls_i_runt for the
  missing-rstate case; it is a configuration error, not a runt packet
  (Simon).
- slhc_uncompress(): goto bad instead of returning 0, so the instance
  also enters SLF_TOSS on the first rejected frame.

 drivers/net/slip/slhc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
index e3c785da3eef3..e18a4213d10ce 100644
--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -506,6 +506,8 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
 		comp->sls_i_error++;
 		return 0;
 	}
+	if (!comp->rstate)
+		goto bad;
 	changes = *cp++;
 	if(changes & NEW_C){
 		/* Make sure the state index is in range, then grab the state.
@@ -649,6 +651,10 @@ slhc_remember(struct slcompress *comp, unsigned char *icp, int isize)
 	struct cstate *cs;
 	unsigned int ihl;

+	if (!comp->rstate) {
+		comp->sls_i_error++;
+		return slhc_toss(comp);
+	}
 	/* The packet is shorter than a legal IP header.
 	 * Also make sure isize is positive.
 	 */
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: KaFai Wan @ 2026-04-15 20:47 UTC (permalink / raw)
  To: Martin KaFai Lau, Jiayuan Chen
  Cc: bpf, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern,
	netdev, linux-doc, linux-kernel
In-Reply-To: <2026415181939.1bue.martin.lau@linux.dev>

On Wed, 2026-04-15 at 11:55 -0700, Martin KaFai Lau wrote:
> On Tue, Apr 14, 2026 at 06:57:00PM +0800, Jiayuan Chen wrote:
> > A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> > to inject custom TCP header options. When the kernel builds a TCP packet,
> > it calls tcp_established_options() to calculate the header size, which
> > invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > callback.
> > 
> > If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> > __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> > tcp_current_mss(), which calls tcp_established_options() again,
> > re-triggering the same BPF callback. This creates an infinite recursion
> > that exhausts the kernel stack and causes a panic.
> > 
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB
> >   -> bpf_setsockopt(TCP_NODELAY)
> > 	-> tcp_push_pending_frames()
> > 	  -> tcp_current_mss()
> > 		-> tcp_established_options()
> > 		  -> bpf_skops_hdr_opt_len()
> >                            /* infinite recursion */
> > 			-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > 
> > A similar reentrancy issue exists for TCP congestion control, which is
> > guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> > tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> > bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> > bpf_setsockopt(TCP_NODELAY) calls that would trigger
> > tcp_push_pending_frames() and cause the recursion.
> > 
> > Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> > Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> > Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> > Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
> > Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> 
> Thanks for the report and fixes suggested across different threads.
> 
> Using has_current_bpf_ctx() to avoid tcp_push_pending_frames() should
> work but it may change the expectation for bpf_setsockopt(TCP_NODELAY).
> e.g. A bpf_tcp_iter does bpf_setsockopt(TCP_NODELAY).
> 
> Adding another bit in the tcp_sock is not ideal either. I agree with
> Alexei that it is better to reuse the existing bit if we go down this path.
> We also need to audit more closely if there are cases that two different
> type of bpf progs may call bpf_setsockopt(). e.g.
> bpf_tcp_iter does bpf_setsockopt(TCP_CONGESTION) to switch
> to a bpf_tcp_cc and the new bpf_tcp_cc->init() will also do
> bpf_setsockopt(xxx) which then will be rejected.
> 
> Another fix could be, the bpf_setsockopt(TCP_NODELAY) is always broken
> for BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB unless
> the bpf prog is doing some maneuver to avoid the recursion. Thus,
> this use case is basically broken as is and I don't see a use case
> for bpf_setsockopt(TCP_NODELAY) when writing header also.
> How about checking the bpf_sock->op, level, and optname in
> bpf_sock_ops_setsockopt() and return -EOPNOTSUPP?

Hi Martin, thanks for the review.

I'm working whit return -EOPNOTSUPP. I've completed whit the code of fix and test, and will send the
patch later.

The fix is:

diff --git a/net/core/filter.c b/net/core/filter.c
index fcfcb72663ca..911ff04bca5a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!is_locked_tcp_sock_ops(bpf_sock))
 		return -EOPNOTSUPP;
 
+	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
+	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
+	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }

-- 
Thanks,
KaFai

^ permalink raw reply related

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Sam Edwards @ 2026-04-15 20:50 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <ad_o4aDP0UBY_8i4@shell.armlinux.org.uk>

On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> It's not a question about how I define RBU - this is defined by Synopsys
> and I'm using it *exactly* that way as stated in the documentation.
>
> "This bit indicates that the host owns the Next Descriptor in the
> Receive List and the DMA cannot acquire it. The Receive Process is
> suspended. ... This bit is set only when the previous Receive
> Descriptor is owned by the DMA."
>
> In other words, DMA has processed the previous receive descriptor which
> _was_ owned by the hardware, written back to clear the OWN bit, and
> then fetches the next descriptor and finds that the OWN bit is also
> clear.

I'm only trying to leave open the possibility that the Synopsys
technical writer and the hardware implementation team weren't
communicating clearly. We already have a situation where RPS isn't
behaving as documented (even if that's likely just hardware
misconfiguration), so while I'm currently pretty sure RBU carries no
other (actual) meaning than "DMA caught up to OWN=0," I'm only about
75% confident.

> > It would seem* that the kernel isn't really failing to keep up with
> > the packet rate. If RBU is firing with a ring that's not even close to
> > empty, that tells me there's another way for it to fire. So I suspect
> > the hardware designers implemented it to mean:
> > "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> >
> > (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> > is actually completely depleted, not full? If count==budget, wouldn't
> > that mean the whole ring hasn't been visited, so we only refilled 64
> > entries and not necessarily the entire ring? Maybe the kernel isn't
> > keeping up after all.)
>
> Ah, I think that's where our terminology differs.
>
> You seem to define full as "populated with empty buffers". I define
> full to mean "the hardware has filled every buffer with a packet that
> it has received and handed it over to software to process." Note even
> the terminology there - filling buffers with data. That ultimately
> ends up filling the ring, and when completely filled, it is full.
>
> I think of buffers like buckets. If a buffer contains no data, it
> is empty. If a buffer contains data, it has been filled or is full.
> Apply that to a list of buffers and you get the same thing. Many
> ethernet driver documentation uses this same terminology, so I
> thought it would be widely understood.

Ah okay, I was beginning to suspect the same. In my defense: though I
also think of buffers in the same way, this driver calls the process
of supplying empty buffers "refilling," which is also the terminology
we've both been using throughout this exchange, and when something is
"completely refilled" I generally call it "full." But I'm realizing
now that the bidirectional (submissions+completions) nature of this
ring means that "full" and "empty" aren't really well-defined
concepts. I'll try to read more carefully (and switch to saying
"completely dirty" and "completely clean") going forward.

So the kernel is able to supply clean buffers without issue, but it
somehow falls behind the incoming packet rate and the DMA is left with
a completely dirty ring. I agree that stmmac_rx() is therefore just
not running fast enough: either it's got really bad scheduler jitter
for the ~6.3ms minimum it takes for 512x full-sized Ethernet frames to
arrive from the PHY (your scenario 1), or -- more likely -- the NAPI
budgets gradually fall behind the hardware (your scenario 2).

> Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
> trying to do anything.
>
> However, I've tested with 0x7f in both fields, and it still falls flat
> on its face. I've also tried other values, but because I had to unplug
> the laptop from the nvidia board to use the laptop portably due to the
> medical emergency situation, that caused screen to quit, so I've lost
> all that. Chaos reigns supreme here :/

I'm sorry to hear about that, please prioritize you/yours and don't
feel like you owe me speedy replies.

> So, I'm not sure we understand what's going on - I don't think it's that
> the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
> results in some kind of throttling that prevents the condition which
> hangs the hardware.

I'll try playing with the FIFO configuration on my end to learn:
a) If a suitably-configured FIFO size makes the RPS status arrive as documented
b) If I can safely fill the FIFO slowly (by manually stalling the
driver and adding frames one at a time) and have it drain on resume
c) Whether the TQS value can be adjusted independently of this
problem's prevalence
d) The maximum RQS value that allows the problem to happen

> I'm not getting as much time as I'd like to really test out scenarios
> due to everything that is going on, and honestly I feel like just
> writing this week off now and giving up.

I have the same hardware, observe the same issue, and find this
interesting enough to keep plugging away at it. I would have no hard
feelings if you left me alone with this problem for a bit. :)

Be well,
Sam

^ permalink raw reply

* Re: [PATCH iwl-net] ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
From: Jacob Keller @ 2026-04-15 21:22 UTC (permalink / raw)
  To: Simon Horman, Petr Oros
  Cc: netdev, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Aleksandr Loktionov, Nikolay Aleksandrov, Daniel Zahka,
	Paul Greenwalt, Dave Ertman, Michal Swiatkowski, intel-wired-lan,
	linux-kernel
In-Reply-To: <20260415163003.GP772670@horms.kernel.org>

On 4/15/2026 9:30 AM, Simon Horman wrote:
> On Mon, Apr 13, 2026 at 09:14:20PM +0200, Petr Oros wrote:
>> On certain E810 configurations where firmware supports Tx scheduler
>> topology switching (tx_sched_topo_comp_mode_en), ice_cfg_tx_topo()
>> may need to apply a new 5-layer or 9-layer topology from the DDP
>> package. If the AQ command to set the topology fails (e.g. due to
>> invalid DDP data or firmware limitations), the global configuration
>> lock must still be cleared via a CORER reset.
>>
>> Commit 86aae43f21cf ("ice: don't leave device non-functional if Tx
>> scheduler config fails") correctly fixed this by refactoring
>> ice_cfg_tx_topo() to always trigger CORER after acquiring the global
>> lock and re-initialize hardware via ice_init_hw() afterwards.
>>
>> However, commit 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end
>> of deinit paths") later moved ice_init_dev_hw() into ice_init_hw(),
>> breaking the reinit path introduced by 86aae43f21cf. This creates an
>> infinite recursive call chain:
>>
>>   ice_init_hw()
>>     ice_init_dev_hw()
>>       ice_cfg_tx_topo()         # topology change needed
>>         ice_deinit_hw()
>>         ice_init_hw()           # reinit after CORER
>>           ice_init_dev_hw()     # recurse
>>             ice_cfg_tx_topo()
>>               ...               # stack overflow
>>
>> Fix by moving ice_init_dev_hw() back out of ice_init_hw() and calling
>> it explicitly from ice_probe() and ice_devlink_reinit_up(). The third
>> caller, ice_cfg_tx_topo(), intentionally does not need ice_init_dev_hw()
>> during its reinit, it only needs the core HW reinitialization. This
>> breaks the recursion cleanly without adding flags or guards.
>>
>> The deinit ordering changes from commit 8a37f9e2ff40 ("ice: move
>> ice_deinit_dev() to the end of deinit paths") which fixed slow rmmod
>> are preserved, only the init-side placement of ice_init_dev_hw() is
>> reverted.
>>
>> Fixes: 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end of deinit paths")
>> Signed-off-by: Petr Oros <poros@redhat.com>
> 
> Hi Petr,
> 
> I don't intended to delay this patch.
> But could you follow-up by looking over the AI generated
> review of this patch on sashiko.dev?
> 
> Thanks!

I'll take a look as well. I recently included this fix in Intel Wired
LAN update last night, so hopefully nothing too problematic...

Sashiko says:

> While this code wasn't introduced by this patch, the restructuring makes it
> more visible: can this cause a use-after-free if the nested hardware
> initialization fails?
> If ice_cfg_tx_topo() triggers a topology change, it performs a CORER reset
> followed by an unroll (ice_deinit_hw) and re-initialization (ice_init_hw). If
> that nested ice_init_hw() fails, its unroll path frees hw->port_info and
> destroys control queues and mutexes.
> Because ice_init_dev_hw() returns void, it swallows the -ENODEV error and
> falls back to safe mode. This allows ice_probe() to proceed with an unrolled
> and freed hardware struct, which would result in a use-after-free when memory
> like hw->port_info is accessed later.
> Should ice_init_dev_hw() be updated to return an error code so the caller can
> abort the probe when base hardware unrolls occur?

I think this suggestion might be good. We fail in probe if ice_init_hw()
fails regardless of "safe" mode, so having init_dev_hw() also fail if
the reinit fails makes some sense to me...

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH iwl-net] ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
From: Jacob Keller @ 2026-04-15 21:23 UTC (permalink / raw)
  To: Simon Horman, Petr Oros
  Cc: netdev, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Aleksandr Loktionov, Nikolay Aleksandrov, Daniel Zahka,
	Paul Greenwalt, Dave Ertman, Michal Swiatkowski, intel-wired-lan,
	linux-kernel
In-Reply-To: <f30ad78e-1eb9-4c9d-9034-c8873966de66@intel.com>

On 4/15/2026 2:22 PM, Jacob Keller wrote:
> I'll take a look as well. I recently included this fix in Intel Wired
> LAN update last night, so hopefully nothing too problematic...
> 

Correction, and I need more caffeine: I think I had considered including
this fix but didn't quite make the cut last night when sending.

^ permalink raw reply

* [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
From: Weiming Shi @ 2026-04-15 21:34 UTC (permalink / raw)
  To: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Andrew Morton, Hans Verkuil, Alex Deucher, Ian Rogers,
	Jonathan Cameron, Kees Cook, Ingo Molnar, Alan Cox, netdev,
	Weiming Shi, Simon Horman

sl_bump() reserves only 80 bytes of expansion headroom before calling
slhc_uncompress(), but the reconstructed IP + TCP header is up to
ip->ihl*4 + thp->doff*4 bytes. IHL and TCP doff are 4-bit fields and
both can legitimately reach 15, so the header can grow to 2*15*4 =
120 bytes. A VJ-uncompressed primer with ihl=15, doff=15 followed by
a compressed frame of size buffsize - 80 therefore writes up to
33 bytes past the kmalloc(buffsize + 4) rbuff allocation, with
attacker-controlled content:

 BUG: KASAN: slab-out-of-bounds in slhc_uncompress
 Write of size 1069 at addr ffff88800ba93078 by task kworker/u8:1/32
 Workqueue: events_unbound flush_to_ldisc
 Call Trace:
  __asan_memmove+0x3f/0x70
  slhc_uncompress (drivers/net/slip/slhc.c:614)
  slip_receive_buf (drivers/net/slip/slip.c:342)
  tty_ldisc_receive_buf
  flush_to_ldisc

Raise the reservation to match the real worst case. The ppp_generic
receive path already enforces skb_tailroom >= 124 and is unaffected.

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: Simon Horman <horms@kernel.org>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 drivers/net/slip/slip.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index 820e1a8fc9560..37af7cbe7f81d 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -333,9 +333,13 @@ static void sl_bump(struct slip *sl)
 				printk(KERN_WARNING "%s: compressed packet ignored\n", dev->name);
 				return;
 			}
-			/* make sure we've reserved enough space for uncompress
-			   to use */
-			if (count + 80 > sl->buffsize) {
+			/* slhc_uncompress() prepends up to
+			 * ip->ihl * 4 + thp->doff * 4 bytes of reconstructed
+			 * IPv4 + TCP header. IHL and doff are 4-bit fields
+			 * (max 15) counting 4-byte units, so the header is
+			 * at most 2 * 15 * 4 = 120 bytes.
+			 */
+			if (count + 2 * 15 * 4 > sl->buffsize) {
 				dev->stats.rx_over_errors++;
 				return;
 			}
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH v5 8/9] driver core: Replace dev->of_node_reused with dev_of_node_reused()
From: Rob Herring (Arm) @ 2026-04-15 22:10 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: astewart, linux-arm-kernel, Mark Brown, bhelgaas, maz, linux,
	kees, Alan Stern, Saravana Kannan, netdev, linux-serial, davem,
	andrew, Greg Kroah-Hartman, brgl, jirislaby, mani, Johan Hovold,
	linux-aspeed, linux-pci, kuba, Alexander Lobakin, Leon Romanovsky,
	andriy.shevchenko, Rafael J . Wysocki, Alexey Kardashevskiy,
	lgirdwood, andrew, hkallweit1, linux-kernel, Danilo Krummrich,
	Eric Dumazet, linux-usb, alexander.stein, Robin Murphy, pabeni,
	devicetree, driver-core, joel, Christoph Hellwig
In-Reply-To: <20260406162231.v5.8.I806b8636cd3724f6cd1f5e199318ab8694472d90@changeid>


On Mon, 06 Apr 2026 16:23:01 -0700, Douglas Anderson wrote:
> In C, bitfields are not necessarily safe to modify from multiple
> threads without locking. Switch "of_node_reused" over to the "flags"
> field so modifications are safe.
> 
> Cc: Johan Hovold <johan@kernel.org>
> Acked-by: Mark Brown <broonie@kernel.org>
> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> Reviewed-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> Not fixing any known bugs; problem is theoretical and found by code
> inspection. Change is done somewhat manually and only lightly tested
> (mostly compile-time tested).
> 
> (no changes since v4)
> 
> Changes in v4:
> - Use accessor functions for flags
> 
> Changes in v3:
> - New
> 
>  drivers/base/core.c                      | 2 +-
>  drivers/base/pinctrl.c                   | 2 +-
>  drivers/base/platform.c                  | 2 +-
>  drivers/net/pcs/pcs-xpcs-plat.c          | 2 +-
>  drivers/of/device.c                      | 6 +++---
>  drivers/pci/of.c                         | 2 +-
>  drivers/pci/pwrctrl/core.c               | 2 +-
>  drivers/regulator/bq257xx-regulator.c    | 2 +-
>  drivers/regulator/rk808-regulator.c      | 2 +-
>  drivers/tty/serial/serial_base_bus.c     | 2 +-
>  drivers/usb/gadget/udc/aspeed-vhub/dev.c | 2 +-
>  include/linux/device.h                   | 7 ++++---
>  12 files changed, 17 insertions(+), 16 deletions(-)
> 

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>


^ permalink raw reply

* Re: [PATCH RFC 7/8] clk: sunxi-ng: a733: Add bus clock gates
From: Andre Przywara @ 2026-04-15 22:14 UTC (permalink / raw)
  To: Junhui Liu, Michael Turquette, Stephen Boyd, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Chen-Yu Tsai, Jernej Skrabec,
	Samuel Holland, Philipp Zabel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Ghiti, Richard Cochran
  Cc: linux-clk, devicetree, linux-arm-kernel, linux-sunxi,
	linux-kernel, linux-riscv, netdev
In-Reply-To: <20260310-a733-clk-v1-7-36b4e9b24457@pigmoral.tech>

Hi,

cheekily jumping in here, for the parts that are easy to verify ;-)

In general this series looks very good, and many thanks for splitting
this up in reviewable chunks, that's much appreciated!

On 3/10/26 09:34, Junhui Liu wrote:
> Add the bus clock gates that control access to the devices' register
> interface on the Allwinner A733 SoC. These clocks are typically
> single-bit controls in the BGR registers, covering UARTs, SPI, I2C, and
> various multimedia engines. It also includes bus gates for system
> components like the IOMMU and MSI-lite interfaces.
> 
> Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
> 
> ---
> The parents of some bus clocks are difficult to determine, as the user
> manual only describes the clock source for a few instances. The current
> configurations are based on references to previous Allwinner SoCs and
> information gathered from the manual. Where documentation is lacking,
> vendor practices are followed by setting the parent to "hosc" for now.
> ---
>   drivers/clk/sunxi-ng/ccu-sun60i-a733.c | 475 ++++++++++++++++++++++++++++++++-
>   1 file changed, 474 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/sunxi-ng/ccu-sun60i-a733.c b/drivers/clk/sunxi-ng/ccu-sun60i-a733.c
> index 36b44568a56f..c0b09f9197d1 100644
> --- a/drivers/clk/sunxi-ng/ccu-sun60i-a733.c
> +++ b/drivers/clk/sunxi-ng/ccu-sun60i-a733.c
> @@ -408,16 +408,19 @@ static SUNXI_CCU_M_DATA_WITH_MUX(ahb_clk, "ahb", ahb_apb_parents, 0x500,
>   				 0, 5,		/* M */
>   				 24, 2,		/* mux */
>   				 0);
> +static const struct clk_hw *ahb_hws[] = { &ahb_clk.common.hw };
>   
>   static SUNXI_CCU_M_DATA_WITH_MUX(apb0_clk, "apb0", ahb_apb_parents, 0x510,
>   				 0, 5,		/* M */
>   				 24, 2,		/* mux */
>   				 0);
> +static const struct clk_hw *apb0_hws[] = { &apb0_clk.common.hw };
>   
>   static SUNXI_CCU_M_DATA_WITH_MUX(apb1_clk, "apb1", ahb_apb_parents, 0x518,
>   				 0, 5,		/* M */
>   				 24, 2,		/* mux */
>   				 0);
> +static const struct clk_hw *apb1_hws[] = { &apb1_clk.common.hw };
>   
>   static const struct clk_parent_data apb_uart_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -430,6 +433,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX(apb_uart_clk, "apb-uart", apb_uart_parents, 0x5
>   				 0, 5,		/* M */
>   				 24, 3,		/* mux */
>   				 0);
> +static const struct clk_hw *apb_uart_hws[] = {
> +	&apb_uart_clk.common.hw
> +};
>   
>   static const struct clk_parent_data trace_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -463,6 +469,8 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(cpu_peri_clk, "cpu-peri", gic_cpu_peri_par
>   				      BIT(31),	/* gate */
>   				      0);
>   
> +static SUNXI_CCU_GATE_DATA(bus_its_pcie_clk, "bus-its-pcie", hosc, 0x574, BIT(1), 0);
> +
>   static const struct clk_parent_data nsi_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
>   	{ .hw = &pll_ddr_clk.common.hw },
> @@ -477,6 +485,7 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT(nsi_clk, "nsi", nsi_parents, 0x580,
>   					    24, 3,	/* mux */
>   					    BIT(31),	/* gate */
>   					    0, CCU_FEATURE_UPDATE_BIT);
> +static SUNXI_CCU_GATE_DATA(bus_nsi_clk, "bus-nsi", hosc, 0x584, BIT(0), 0);
>   
>   static const struct clk_parent_data mbus_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -493,9 +502,117 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT(mbus_clk, "mbus", mbus_parents, 0x58
>   					    BIT(31),	/* gate */
>   					    CLK_IS_CRITICAL,
>   					    CCU_FEATURE_UPDATE_BIT);
> +static const struct clk_hw *mbus_hws[] = { &mbus_clk.common.hw };
> +
> +static SUNXI_CCU_GATE_HWS(mbus_iommu0_sys_clk, "mbus-iommu0-sys", mbus_hws, 0x58c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(apb_iommu0_sys_clk, "apb-iommu0-sys", apb0_hws, 0x58c, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_iommu0_sys_clk, "ahb-iommu0-sys", ahb_hws, 0x58c, BIT(2), 0);
> +
> +static SUNXI_CCU_GATE_DATA(bus_msi_lite0_clk, "bus-msi-lite0", hosc, 0x594, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_msi_lite1_clk, "bus-msi-lite1", hosc, 0x59c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_msi_lite2_clk, "bus-msi-lite2", hosc, 0x5a4, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(mbus_iommu1_sys_clk, "mbus-iommu1-sys", mbus_hws, 0x5b4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(apb_iommu1_sys_clk, "apb_iommu1-sys", apb0_hws, 0x5b4, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_iommu1_sys_clk, "ahb_iommu1-sys", ahb_hws, 0x5b4, BIT(2), 0);
> +
> +static SUNXI_CCU_GATE_HWS(ahb_ve_dec_clk, "ahb-ve-dec", ahb_hws,
> +			  0x5c0, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_ve_enc_clk, "ahb-ve-enc", ahb_hws,
> +			  0x5c0, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_vid_in_clk, "ahb-vid-in", ahb_hws,
> +			  0x5c0, BIT(2), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_vid_cout0_clk, "ahb-vid-cout0", ahb_hws,
> +			  0x5c0, BIT(3), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_vid_cout1_clk, "ahb-vid-cout1", ahb_hws,
> +			  0x5c0, BIT(4), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_de_clk, "ahb-de", ahb_hws,
> +			  0x5c0, BIT(5), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_npu_clk, "ahb-npu", ahb_hws,
> +			  0x5c0, BIT(6), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_gpu0_clk, "ahb-gpu0", ahb_hws,
> +			  0x5c0, BIT(7), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_serdes_clk, "ahb-serdes", ahb_hws,
> +			  0x5c0, BIT(8), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_usb_sys_clk, "ahb-usb-sys", ahb_hws,
> +			  0x5c0, BIT(9), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_msi_lite0_clk, "ahb-msi-lite0", ahb_hws,
> +			  0x5c0, BIT(16), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_store_clk, "ahb-store", ahb_hws,
> +			  0x5c0, BIT(24), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_cpus_clk, "ahb-cpus", ahb_hws,
> +			  0x5c0, BIT(28), 0);
> +
> +static SUNXI_CCU_GATE_HWS(mbus_iommu0_clk, "mbus-iommu0", mbus_hws,
> +			  0x5e0, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_iommu1_clk, "mbus-iommu1", mbus_hws,
> +			  0x5e0, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_desys_clk, "mbus-desys", mbus_hws,
> +			  0x5e0, BIT(11), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_enc_gate_clk, "mbus-ve-enc-gate", mbus_hws,
> +			  0x5e0, BIT(12), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_dec_gate_clk, "mbus-ve-dec-gate", mbus_hws,
> +			  0x5e0, BIT(14), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_gpu0_clk, "mbus-gpu0", mbus_hws,
> +			  0x5e0, BIT(16), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_npu_clk, "mbus-npu", mbus_hws,
> +			  0x5e0, BIT(18), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_vid_in_clk, "mbus-vid-in", mbus_hws,
> +			  0x5e0, BIT(24), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_serdes_clk, "mbus-serdes", mbus_hws,
> +			  0x5e0, BIT(28), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_msi_lite0_clk, "mbus-msi-lite0", mbus_hws,
> +			  0x5e0, BIT(29), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_store_clk, "mbus-store", mbus_hws,
> +			  0x5e0, BIT(30), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_msi_lite2_clk, "mbus-msi-lite2", mbus_hws,
> +			  0x5e0, BIT(31), 0);
> +
> +static SUNXI_CCU_GATE_HWS(mbus_dma0_clk, "mbus-dma0", mbus_hws,
> +			  0x5e4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_enc_clk, "mbus-ve-enc", mbus_hws,
> +			  0x5e4, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ce_clk, "mbus-ce", mbus_hws,
> +			  0x5e4, BIT(2), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_dma1_clk, "mbus-dma1", mbus_hws,
> +			  0x5e4, BIT(3), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_nand_clk, "mbus-nand", mbus_hws,
> +			  0x5e4, BIT(5), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_csi_clk, "mbus-csi", mbus_hws,
> +			  0x5e4, BIT(8), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_isp_clk, "mbus-isp", mbus_hws,
> +			  0x5e4, BIT(9), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_gmac0_clk, "mbus-gmac0", mbus_hws,
> +			  0x5e4, BIT(11), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_gmac1_clk, "mbus-gmac1", mbus_hws,
> +			  0x5e4, BIT(12), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_dec_clk, "mbus-ve-dec", mbus_hws,
> +			  0x5e4, BIT(18), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_dma0_clk, "bus-dma0", ahb_hws,
> +			  0x704, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_dma1_clk, "bus-dma1", ahb_hws,
> +			  0x70c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_spinlock_clk, "bus-spinlock", ahb_hws,
> +			  0x724, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_msgbox_clk, "bus-msgbox", ahb_hws,
> +			  0x744, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_pwm0_clk, "bus-pwm0", apb0_hws,
> +			  0x784, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_pwm1_clk, "bus-pwm1", apb0_hws,
> +			  0x78c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_dbg_clk, "bus-dbg", sys_24M_hws,
> +			  0x7a4, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_sysdap_clk, "bus-sysdap", apb1_hws,
> +			  0x88c, BIT(0), 0);
>   
>   /**************************************************************************
> - *                          mod clocks                                    *
> + *                          mod clocks with gates                         *
>    **************************************************************************/
>   
>   static const struct clk_parent_data timer_parents[] = {
> @@ -565,6 +682,7 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE(timer9_clk, "timer9", timer_parents, 0x82
>   				       24, 3,		/* mux */
>   				       BIT(31),		/* gate */
>   				       0);
> +static SUNXI_CCU_GATE_HWS(bus_timer_clk, "bus-timer", ahb_hws, 0x850, BIT(0), 0);
>   
>   static const struct clk_parent_data avs_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -589,6 +707,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(de_clk, "de", de_parents, 0xa00,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_de_clk, "bus-de", ahb_hws, 0xa04, BIT(0), 0);
>   
>   static const struct clk_hw *di_parents[] = {
>   	&pll_periph0_600M_clk.hw,
> @@ -602,6 +721,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(di_clk, "di", di_parents, 0xa20,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_di_clk, "bus-di", ahb_hws, 0xa24, BIT(0), 0);
>   
>   static const struct clk_hw *g2d_parents[] = {
>   	&pll_periph0_400M_clk.hw,
> @@ -614,6 +734,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(g2d_clk, "g2d", g2d_parents, 0xa40,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_g2d_clk, "bus-g2d", ahb_hws, 0xa44, BIT(0), 0);
>   
>   static const struct clk_hw *eink_parents[] = {
>   	&pll_periph0_480M_clk.common.hw,
> @@ -637,6 +758,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(eink_panel_clk, "eink-panel", eink_panel_par
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_eink_clk, "bus-eink", ahb_hws, 0xa6c, BIT(0), 0);
>   
>   static const struct clk_hw *ve_enc_parents[] = {
>   	&pll_ve0_clk.common.hw,
> @@ -668,6 +790,9 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(ve_dec_clk, "ve-dec", ve_dec_parents, 0xa88,
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
>   
> +static SUNXI_CCU_GATE_HWS(bus_ve_enc_clk, "bus-ve-enc", ahb_hws, 0xa8c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ve_dec_clk, "bus-ve-dec", ahb_hws, 0xa8c, BIT(2), 0);
> +
>   static const struct clk_hw *ce_parents[] = {
>   	&sys_24M_clk.hw,
>   	&pll_periph0_400M_clk.hw,
> @@ -678,6 +803,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(ce_clk, "ce", ce_parents, 0xac0,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_ce_clk, "bus-ce", ahb_hws, 0xac4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ce_sys_clk, "bus-ce-sys", ahb_hws, 0xac4, BIT(1), 0);
>   
>   static const struct clk_hw *npu_parents[] = {
>   	&pll_npu_clk.common.hw,
> @@ -693,6 +820,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(npu_clk, "npu", npu_parents, 0xb00,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_DATA(bus_npu_clk, "bus-npu", hosc, 0xb04, BIT(0), 0);
>   
>   /*
>    * GPU_CLK = ClockSource * ((16 - M) / 16)
> @@ -725,6 +853,7 @@ static struct ccu_div gpu_clk = {
>   							   &ccu_div_ops, 0),
>   	}
>   };
> +static SUNXI_CCU_GATE_HWS(bus_gpu_clk, "bus-gpu", ahb_hws, 0xb24, BIT(0), 0);
>   
>   static const struct clk_parent_data dram_parents[] = {
>   	{ .hw = &pll_ddr_clk.common.hw, },
> @@ -740,6 +869,7 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT(dram_clk, "dram", dram_parents, 0xc0
>   					    BIT(31),	/* gate */
>   					    CLK_IS_CRITICAL,
>   					    CCU_FEATURE_UPDATE_BIT);
> +static SUNXI_CCU_GATE_HWS(bus_dram_clk, "bus-dram", ahb_hws, 0xc0c, BIT(0), 0);
>   
>   static const struct clk_parent_data nand_mmc_parents[] = {
>   	{ .hw = &sys_24M_clk.hw, },
> @@ -758,6 +888,7 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(nand1_clk, "nand1", nand_mmc_parents, 0xc8
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_nand_clk, "bus-nand", ahb_hws, 0xc8c, BIT(0), 0);
>   
>   static SUNXI_CCU_MP_MUX_GATE_POSTDIV_DUALDIV(mmc0_clk, "mmc0", nand_mmc_parents, 0xd00,
>   					     0, 5,	/* M */
> @@ -796,6 +927,11 @@ static SUNXI_CCU_MP_MUX_GATE_POSTDIV_DUALDIV(mmc3_clk, "mmc3", mmc2_mmc3_parents
>   					     2,		/* post div */
>   					     0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_mmc0_clk, "bus-mmc0", ahb_hws, 0xd0c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_mmc1_clk, "bus-mmc1", ahb_hws, 0xd1c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_mmc2_clk, "bus-mmc2", ahb_hws, 0xd2c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_mmc3_clk, "bus-mmc3", ahb_hws, 0xd3c, BIT(0), 0);
> +
>   static const struct clk_hw *ufs_axi_parents[] = {
>   	&pll_periph0_300M_clk.hw,
>   	&pll_periph0_200M_clk.hw,
> @@ -815,6 +951,29 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(ufs_cfg_clk, "ufs-cfg", ufs_cfg_parents, 0
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_DATA(bus_ufs_clk, "bus-ufs", hosc, 0xd8c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_uart0_clk, "bus-uart0", apb_uart_hws, 0xe00, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart1_clk, "bus-uart1", apb_uart_hws, 0xe04, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart2_clk, "bus-uart2", apb_uart_hws, 0xe08, BIT(2), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart3_clk, "bus-uart3", apb_uart_hws, 0xe0c, BIT(3), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart4_clk, "bus-uart4", apb_uart_hws, 0xe10, BIT(4), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart5_clk, "bus-uart5", apb_uart_hws, 0xe14, BIT(5), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart6_clk, "bus-uart6", apb_uart_hws, 0xe18, BIT(6), 0);

According to the manual the gate bits are always BIT(0), since each
UART has its own bus gate register.

> +static SUNXI_CCU_GATE_HWS(bus_i2c0_clk, "bus-i2c0", apb1_hws, 0xe80, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c1_clk, "bus-i2c1", apb1_hws, 0xe84, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c2_clk, "bus-i2c2", apb1_hws, 0xe88, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c3_clk, "bus-i2c3", apb1_hws, 0xe8c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c4_clk, "bus-i2c4", apb1_hws, 0xe90, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c5_clk, "bus-i2c5", apb1_hws, 0xe94, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c6_clk, "bus-i2c6", apb1_hws, 0xe98, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c7_clk, "bus-i2c7", apb1_hws, 0xe9c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c8_clk, "bus-i2c8", apb1_hws, 0xea0, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c9_clk, "bus-i2c9", apb1_hws, 0xea4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c10_clk, "bus-i2c10", apb1_hws, 0xea8, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c11_clk, "bus-i2c11", apb1_hws, 0xeac, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c12_clk, "bus-i2c12", apb1_hws, 0xeb0, BIT(0), 0);
>   
>   static const struct clk_parent_data spi_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -856,6 +1015,11 @@ static SUNXI_CCU_DUALDIV_MUX_GATE(spi4_clk, "spi4", spi_parents, 0xf28,
>   				  24, 3,	/* mux */
>   				  BIT(31),	/* gate */
>   				  0);
> +static SUNXI_CCU_GATE_HWS(bus_spi0_clk, "bus-spi0", ahb_hws, 0xf04, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi1_clk, "bus-spi1", ahb_hws, 0xf0c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi2_clk, "bus-spi2", ahb_hws, 0xf14, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi3_clk, "bus-spi3", ahb_hws, 0xf24, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi4_clk, "bus-spi4", ahb_hws, 0xf2c, BIT(0), 0);
>   
>   static const struct clk_parent_data spif_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -873,6 +1037,7 @@ static SUNXI_CCU_DUALDIV_MUX_GATE(spif_clk, "spif", spif_parents, 0xf18,
>   				  24, 3,	/* mux */
>   				  BIT(31),	/* gate */
>   				  0);
> +static SUNXI_CCU_GATE_HWS(bus_spif_clk, "bus-spif", ahb_hws, 0xf1c, BIT(0), 0);

Can you please move that line into the other SPI gates above, so that
it is ordered by address?

>   
>   static const struct clk_parent_data gpadc_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -883,6 +1048,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(gpadc_clk, "gpadc", gpadc_parents, 0xfc0,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_gpadc_clk, "bus-gpadc", ahb_hws, 0xfc4, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_ths_clk, "bus-ths", apb0_hws, 0xfe4, BIT(0), 0);
>   
>   static const struct clk_parent_data irrx_parents[] = {
>   	{ .fw_name = "losc"},
> @@ -894,6 +1062,7 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(irrx_clk, "irrx", irrx_parents, 0x1000,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_irrx_clk, "bus-irrx", apb0_hws, 0x1004, BIT(0), 0);
>   
>   static const struct clk_parent_data irtx_parents[] = {
>   	{ .fw_name = "losc"},
> @@ -905,6 +1074,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(irtx_clk, "irtx", irtx_parents, 0x1008,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_irtx_clk, "bus-irtx", apb0_hws, 0x100c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_lradc_clk, "bus-lradc", apb0_hws, 0x1024, BIT(0), 0);
>   
>   static const struct clk_parent_data sgpio_parents[] = {
>   	{ .fw_name = "losc"},
> @@ -915,6 +1087,7 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(sgpio_clk, "sgpio", sgpio_parents, 0x1060,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_DATA(bus_sgpio_clk, "bus-sgpio", hosc, 0x1064, BIT(0), 0);
>   
>   static const struct clk_hw *lpc_parents[] = {
>   	&pll_video0_3x_clk.common.hw,
> @@ -927,6 +1100,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(lpc_clk, "lpc", lpc_parents, 0x1080,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_DATA(bus_lpc_clk, "bus-lpc", hosc, 0x1084, BIT(0), 0);

where do these two clocks come from? They are not mentioned in the 
version of the manual I am looking at. If they come from BSP sources, 
please add a comment about that.

>   
>   static const struct clk_hw *i2spcm_parents[] = {
>   	&pll_audio0_4x_clk.common.hw,
> @@ -959,6 +1133,11 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(i2spcm4_clk, "i2spcm4", i2spcm_parents, 0x12
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm0_clk, "bus-i2spcm0", hosc, 0x120c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm1_clk, "bus-i2spcm1", hosc, 0x121c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm2_clk, "bus-i2spcm2", hosc, 0x122c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm3_clk, "bus-i2spcm3", hosc, 0x123c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm4_clk, "bus-i2spcm4", hosc, 0x124c, BIT(0), 0);
>   
>   static const struct clk_hw *i2spcm2_asrc_parents[] = {
>   	&pll_audio0_4x_clk.common.hw,
> @@ -995,6 +1174,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(owa_rx_clk, "owa_rx", owa_rx_parents, 0x1284
>   				    BIT(31),	/* gate */
>   				    0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_owa_clk, "bus-owa", apb1_hws, 0x128c, BIT(0), 0);

In mainline we use "spdif" instead of "owa", compare the other drivers.

> +
>   static const struct clk_hw *dmic_parents[] = {
>   	&pll_audio0_4x_clk.common.hw,
>   	&pll_audio1_div2_clk.common.hw,
> @@ -1006,6 +1187,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(dmic_clk, "dmic", dmic_parents, 0x12c0,
>   				    BIT(31),	/* gate */
>   				    0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_dmic_clk, "bus-dmic", apb1_hws, 0x12cc, BIT(0), 0);
> +
>   /*
>    * The first parent is a 48 MHz input clock divided by 4. That 48 MHz clock is
>    * a 2x multiplier from pll-ref synchronized by pll-periph0, and is also used by
> @@ -1037,6 +1220,9 @@ static struct ccu_mux usb_ohci0_clk = {
>   							   &ccu_mux_ops, 0),
>   	},
>   };
> +static SUNXI_CCU_GATE_HWS(bus_ohci0_clk, "bus-ohci0", ahb_hws, 0x1304, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ehci0_clk, "bus-ehci0", ahb_hws, 0x1304, BIT(4), 0);
> +static SUNXI_CCU_GATE_HWS(bus_otg_clk, "bus-otg", ahb_hws, 0x1304, BIT(8), 0);
>   
>   static struct ccu_mux usb_ohci1_clk = {
>   	.enable		= BIT(31),
> @@ -1053,6 +1239,8 @@ static struct ccu_mux usb_ohci1_clk = {
>   							   &ccu_mux_ops, 0),
>   	},
>   };
> +static SUNXI_CCU_GATE_HWS(bus_ohci1_clk, "bus-ohci1", ahb_hws, 0x130c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ehci1_clk, "bus-ehci1", ahb_hws, 0x130c, BIT(4), 0);
>   
>   static const struct clk_parent_data usb_ref_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -1159,6 +1347,8 @@ static SUNXI_CCU_M_HWS_WITH_GATE(gmac1_phy_clk, "gmac1-phy", pll_periph0_150M_hw
>   				 0, 5,		/* M */
>   				 BIT(31),	/* gate */
>   				 0);
> +static SUNXI_CCU_GATE_HWS(bus_gmac0_clk, "bus-gmac0", ahb_hws, 0x141c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_gmac1_clk, "bus-gmac1", ahb_hws, 0x142c, BIT(0), 0);

That GMAC1 clock is not in the manual, where does it come from?

>   
>   static const struct clk_hw *tcon_lcd_parents[] = {
>   	&pll_video0_4x_clk.common.hw,
> @@ -1181,6 +1371,9 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(tcon_lcd2_clk, "tcon-lcd2", tcon_lcd_parents
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_lcd0_clk, "bus-tcon-lcd0", ahb_hws, 0x1504, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_lcd1_clk, "bus-tcon-lcd1", ahb_hws, 0x150c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_lcd2_clk, "bus-tcon-lcd2", ahb_hws, 0x1514, BIT(0), 0);

Same here, LCD2 is not listed.

The rest looks alright when comparing to the manual, also the whole
boilerplate with the SUNXI_CC_GATE_HWS macro, the list of hw clocks
below and the assignment of the clock IDs to the clocks.

Cheers,
Andre

>   
>   static const struct clk_hw *dsi_parents[] = {
>   	&sys_24M_clk.hw,
> @@ -1197,6 +1390,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(dsi1_clk, "dsi1", dsi_parents, 0x1588,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_dsi0_clk, "bus-dsi0", ahb_hws, 0x1584, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_dsi1_clk, "bus-dsi1", ahb_hws, 0x158c, BIT(0), 0);
>   
>   static const struct clk_hw *combphy_parents[] = {
>   	&pll_video0_4x_clk.common.hw,
> @@ -1216,6 +1411,9 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(combphy1_clk, "combphy1", combphy_parents, 0
>   				    BIT(31),	/* gate */
>   				    0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_tcon_tv0_clk, "bus-tcon-tv0", ahb_hws, 0x1604, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_tv1_clk, "bus-tcon-tv1", ahb_hws, 0x160c, BIT(0), 0);
> +
>   static const struct clk_hw *edp_tv_parents[] = {
>   	&pll_video0_4x_clk.common.hw,
>   	&pll_video1_4x_clk.common.hw,
> @@ -1227,6 +1425,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(edp_tv_clk, "edp-tv", edp_tv_parents, 0x1640
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_edp_tv_clk, "bus-edp-tv", ahb_hws, 0x164c, BIT(0), 0);
>   
>   static SUNXI_CCU_GATE_HWS_WITH_PREDIV(hdmi_cec_32k_clk, "hdmi-cec-32k", pll_periph0_2x_hws, 0x1680,
>   				      BIT(30),	/* gate */
> @@ -1254,6 +1453,7 @@ static SUNXI_CCU_DUALDIV_MUX_GATE(hdmi_tv_clk, "hdmi-tv", hdmi_tv_parents, 0x168
>   				  24, 3,	/* mux */
>   				  BIT(31),	/* gate */
>   				  0);
> +static SUNXI_CCU_GATE_HWS(bus_hdmi_tv_clk, "bus-hdmi-tv", ahb_hws, 0x168c, BIT(0), 0);
>   
>   static const struct clk_parent_data hdmi_sfr_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -1266,6 +1466,9 @@ static SUNXI_CCU_MUX_DATA_WITH_GATE(hdmi_sfr_clk, "hdmi-sfr", hdmi_sfr_parents,
>   
>   static SUNXI_CCU_GATE_HWS(hdmi_esm_clk, "hdmi-esm", pll_periph0_300M_hws, 0x1694, BIT(31), 0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_dpss_top0_clk, "bus-dpss-top0", ahb_hws, 0x16c4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_dpss_top1_clk, "bus-dpss-top1", ahb_hws, 0x16cc, BIT(0), 0);
> +
>   static const struct clk_parent_data ledc_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
>   	{ .hw = &pll_periph0_600M_clk.hw },
> @@ -1276,6 +1479,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(ledc_clk, "ledc", ledc_parents, 0x1700,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_ledc_clk, "bus-ledc", apb0_hws, 0x1704, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_dsc_clk, "bus-dsc", ahb_hws, 0x1744, BIT(0), 0);
>   
>   static const struct clk_parent_data csi_master_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -1317,6 +1523,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(csi_clk, "csi", csi_parents, 0x1840,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_csi_clk, "bus-csi", ahb_hws, 0x1844, BIT(0), 0);
>   
>   static const struct clk_hw *isp_parents[] = {
>   	&pll_video2_4x_clk.common.hw,
> @@ -1446,8 +1653,62 @@ static struct ccu_common *sun60i_a733_ccu_clks[] = {
>   	&trace_clk.common,
>   	&gic_clk.common,
>   	&cpu_peri_clk.common,
> +	&bus_its_pcie_clk.common,
>   	&nsi_clk.common,
> +	&bus_nsi_clk.common,
>   	&mbus_clk.common,
> +	&mbus_iommu0_sys_clk.common,
> +	&apb_iommu0_sys_clk.common,
> +	&ahb_iommu0_sys_clk.common,
> +	&bus_msi_lite0_clk.common,
> +	&bus_msi_lite1_clk.common,
> +	&bus_msi_lite2_clk.common,
> +	&mbus_iommu1_sys_clk.common,
> +	&apb_iommu1_sys_clk.common,
> +	&ahb_iommu1_sys_clk.common,
> +	&ahb_ve_dec_clk.common,
> +	&ahb_ve_enc_clk.common,
> +	&ahb_vid_in_clk.common,
> +	&ahb_vid_cout0_clk.common,
> +	&ahb_vid_cout1_clk.common,
> +	&ahb_de_clk.common,
> +	&ahb_npu_clk.common,
> +	&ahb_gpu0_clk.common,
> +	&ahb_serdes_clk.common,
> +	&ahb_usb_sys_clk.common,
> +	&ahb_msi_lite0_clk.common,
> +	&ahb_store_clk.common,
> +	&ahb_cpus_clk.common,
> +	&mbus_iommu0_clk.common,
> +	&mbus_iommu1_clk.common,
> +	&mbus_desys_clk.common,
> +	&mbus_ve_enc_gate_clk.common,
> +	&mbus_ve_dec_gate_clk.common,
> +	&mbus_gpu0_clk.common,
> +	&mbus_npu_clk.common,
> +	&mbus_vid_in_clk.common,
> +	&mbus_serdes_clk.common,
> +	&mbus_msi_lite0_clk.common,
> +	&mbus_store_clk.common,
> +	&mbus_msi_lite2_clk.common,
> +	&mbus_dma0_clk.common,
> +	&mbus_ve_enc_clk.common,
> +	&mbus_ce_clk.common,
> +	&mbus_dma1_clk.common,
> +	&mbus_nand_clk.common,
> +	&mbus_csi_clk.common,
> +	&mbus_isp_clk.common,
> +	&mbus_gmac0_clk.common,
> +	&mbus_gmac1_clk.common,
> +	&mbus_ve_dec_clk.common,
> +	&bus_dma0_clk.common,
> +	&bus_dma1_clk.common,
> +	&bus_spinlock_clk.common,
> +	&bus_msgbox_clk.common,
> +	&bus_pwm0_clk.common,
> +	&bus_pwm1_clk.common,
> +	&bus_dbg_clk.common,
> +	&bus_sysdap_clk.common,
>   	&timer0_clk.common,
>   	&timer1_clk.common,
>   	&timer2_clk.common,
> @@ -1458,48 +1719,111 @@ static struct ccu_common *sun60i_a733_ccu_clks[] = {
>   	&timer7_clk.common,
>   	&timer8_clk.common,
>   	&timer9_clk.common,
> +	&bus_timer_clk.common,
>   	&avs_clk.common,
>   	&de_clk.common,
> +	&bus_de_clk.common,
>   	&di_clk.common,
> +	&bus_di_clk.common,
>   	&g2d_clk.common,
> +	&bus_g2d_clk.common,
>   	&eink_clk.common,
>   	&eink_panel_clk.common,
> +	&bus_eink_clk.common,
>   	&ve_enc_clk.common,
>   	&ve_dec_clk.common,
> +	&bus_ve_enc_clk.common,
> +	&bus_ve_dec_clk.common,
>   	&ce_clk.common,
> +	&bus_ce_clk.common,
> +	&bus_ce_sys_clk.common,
>   	&npu_clk.common,
> +	&bus_npu_clk.common,
>   	&gpu_clk.common,
> +	&bus_gpu_clk.common,
>   	&dram_clk.common,
> +	&bus_dram_clk.common,
>   	&nand0_clk.common,
>   	&nand1_clk.common,
> +	&bus_nand_clk.common,
>   	&mmc0_clk.common,
>   	&mmc1_clk.common,
>   	&mmc2_clk.common,
>   	&mmc3_clk.common,
> +	&bus_mmc0_clk.common,
> +	&bus_mmc1_clk.common,
> +	&bus_mmc2_clk.common,
> +	&bus_mmc3_clk.common,
>   	&ufs_axi_clk.common,
>   	&ufs_cfg_clk.common,
> +	&bus_ufs_clk.common,
> +	&bus_uart0_clk.common,
> +	&bus_uart1_clk.common,
> +	&bus_uart2_clk.common,
> +	&bus_uart3_clk.common,
> +	&bus_uart4_clk.common,
> +	&bus_uart5_clk.common,
> +	&bus_uart6_clk.common,
> +	&bus_i2c0_clk.common,
> +	&bus_i2c1_clk.common,
> +	&bus_i2c2_clk.common,
> +	&bus_i2c3_clk.common,
> +	&bus_i2c4_clk.common,
> +	&bus_i2c5_clk.common,
> +	&bus_i2c6_clk.common,
> +	&bus_i2c7_clk.common,
> +	&bus_i2c8_clk.common,
> +	&bus_i2c9_clk.common,
> +	&bus_i2c10_clk.common,
> +	&bus_i2c11_clk.common,
> +	&bus_i2c12_clk.common,
>   	&spi0_clk.common,
>   	&spi1_clk.common,
>   	&spi2_clk.common,
>   	&spi3_clk.common,
>   	&spi4_clk.common,
> +	&bus_spi0_clk.common,
> +	&bus_spi1_clk.common,
> +	&bus_spi2_clk.common,
> +	&bus_spi3_clk.common,
> +	&bus_spi4_clk.common,
>   	&spif_clk.common,
> +	&bus_spif_clk.common,
>   	&gpadc_clk.common,
> +	&bus_gpadc_clk.common,
> +	&bus_ths_clk.common,
>   	&irrx_clk.common,
> +	&bus_irrx_clk.common,
>   	&irtx_clk.common,
> +	&bus_irtx_clk.common,
> +	&bus_lradc_clk.common,
>   	&sgpio_clk.common,
> +	&bus_sgpio_clk.common,
>   	&lpc_clk.common,
> +	&bus_lpc_clk.common,
>   	&i2spcm0_clk.common,
>   	&i2spcm1_clk.common,
>   	&i2spcm2_clk.common,
>   	&i2spcm3_clk.common,
>   	&i2spcm4_clk.common,
> +	&bus_i2spcm0_clk.common,
> +	&bus_i2spcm1_clk.common,
> +	&bus_i2spcm2_clk.common,
> +	&bus_i2spcm3_clk.common,
> +	&bus_i2spcm4_clk.common,
>   	&i2spcm2_asrc_clk.common,
>   	&owa_tx_clk.common,
>   	&owa_rx_clk.common,
> +	&bus_owa_clk.common,
>   	&dmic_clk.common,
> +	&bus_dmic_clk.common,
>   	&usb_ohci0_clk.common,
> +	&bus_otg_clk.common,
> +	&bus_ehci0_clk.common,
> +	&bus_ohci0_clk.common,
>   	&usb_ohci1_clk.common,
> +	&bus_ehci1_clk.common,
> +	&bus_ohci1_clk.common,
>   	&usb_ref_clk.common,
>   	&usb2_u2_ref_clk.common,
>   	&usb2_suspend_clk.common,
> @@ -1512,24 +1836,40 @@ static struct ccu_common *sun60i_a733_ccu_clks[] = {
>   	&gmac_ptp_clk.common,
>   	&gmac0_phy_clk.common,
>   	&gmac1_phy_clk.common,
> +	&bus_gmac0_clk.common,
> +	&bus_gmac1_clk.common,
>   	&tcon_lcd0_clk.common,
>   	&tcon_lcd1_clk.common,
>   	&tcon_lcd2_clk.common,
> +	&bus_tcon_lcd0_clk.common,
> +	&bus_tcon_lcd1_clk.common,
> +	&bus_tcon_lcd2_clk.common,
>   	&dsi0_clk.common,
>   	&dsi1_clk.common,
> +	&bus_dsi0_clk.common,
> +	&bus_dsi1_clk.common,
>   	&combphy0_clk.common,
>   	&combphy1_clk.common,
> +	&bus_tcon_tv0_clk.common,
> +	&bus_tcon_tv1_clk.common,
>   	&edp_tv_clk.common,
> +	&bus_edp_tv_clk.common,
>   	&hdmi_cec_32k_clk.common,
>   	&hdmi_cec_clk.common,
>   	&hdmi_tv_clk.common,
> +	&bus_hdmi_tv_clk.common,
>   	&hdmi_sfr_clk.common,
>   	&hdmi_esm_clk.common,
> +	&bus_dpss_top0_clk.common,
> +	&bus_dpss_top1_clk.common,
>   	&ledc_clk.common,
> +	&bus_ledc_clk.common,
> +	&bus_dsc_clk.common,
>   	&csi_master0_clk.common,
>   	&csi_master1_clk.common,
>   	&csi_master2_clk.common,
>   	&csi_clk.common,
> +	&bus_csi_clk.common,
>   	&isp_clk.common,
>   	&apb2jtag_clk.common,
>   	&fanout_24M_clk.common,
> @@ -1596,8 +1936,62 @@ static struct clk_hw_onecell_data sun60i_a733_hw_clks = {
>   		[CLK_TRACE]		= &trace_clk.common.hw,
>   		[CLK_GIC]		= &gic_clk.common.hw,
>   		[CLK_CPU_PERI]		= &cpu_peri_clk.common.hw,
> +		[CLK_BUS_ITS_PCIE]	= &bus_its_pcie_clk.common.hw,
>   		[CLK_NSI]		= &nsi_clk.common.hw,
> +		[CLK_BUS_NSI]		= &bus_nsi_clk.common.hw,
>   		[CLK_MBUS]		= &mbus_clk.common.hw,
> +		[CLK_MBUS_IOMMU0_SYS]	= &mbus_iommu0_sys_clk.common.hw,
> +		[CLK_APB_IOMMU0_SYS]	= &apb_iommu0_sys_clk.common.hw,
> +		[CLK_AHB_IOMMU0_SYS]	= &ahb_iommu0_sys_clk.common.hw,
> +		[CLK_BUS_MSI_LITE0]	= &bus_msi_lite0_clk.common.hw,
> +		[CLK_BUS_MSI_LITE1]	= &bus_msi_lite1_clk.common.hw,
> +		[CLK_BUS_MSI_LITE2]	= &bus_msi_lite2_clk.common.hw,
> +		[CLK_MBUS_IOMMU1_SYS]	= &mbus_iommu1_sys_clk.common.hw,
> +		[CLK_APB_IOMMU1_SYS]	= &apb_iommu1_sys_clk.common.hw,
> +		[CLK_AHB_IOMMU1_SYS]	= &ahb_iommu1_sys_clk.common.hw,
> +		[CLK_AHB_VE_DEC]	= &ahb_ve_dec_clk.common.hw,
> +		[CLK_AHB_VE_ENC]	= &ahb_ve_enc_clk.common.hw,
> +		[CLK_AHB_VID_IN]	= &ahb_vid_in_clk.common.hw,
> +		[CLK_AHB_VID_COUT0]	= &ahb_vid_cout0_clk.common.hw,
> +		[CLK_AHB_VID_COUT1]	= &ahb_vid_cout1_clk.common.hw,
> +		[CLK_AHB_DE]		= &ahb_de_clk.common.hw,
> +		[CLK_AHB_NPU]		= &ahb_npu_clk.common.hw,
> +		[CLK_AHB_GPU0]		= &ahb_gpu0_clk.common.hw,
> +		[CLK_AHB_SERDES]	= &ahb_serdes_clk.common.hw,
> +		[CLK_AHB_USB_SYS]	= &ahb_usb_sys_clk.common.hw,
> +		[CLK_AHB_MSI_LITE0]	= &ahb_msi_lite0_clk.common.hw,
> +		[CLK_AHB_STORE]		= &ahb_store_clk.common.hw,
> +		[CLK_AHB_CPUS]		= &ahb_cpus_clk.common.hw,
> +		[CLK_MBUS_IOMMU0]	= &mbus_iommu0_clk.common.hw,
> +		[CLK_MBUS_IOMMU1]	= &mbus_iommu1_clk.common.hw,
> +		[CLK_MBUS_DESYS]	= &mbus_desys_clk.common.hw,
> +		[CLK_MBUS_VE_ENC_GATE]	= &mbus_ve_enc_gate_clk.common.hw,
> +		[CLK_MBUS_VE_DEC_GATE]	= &mbus_ve_dec_gate_clk.common.hw,
> +		[CLK_MBUS_GPU0]		= &mbus_gpu0_clk.common.hw,
> +		[CLK_MBUS_NPU]		= &mbus_npu_clk.common.hw,
> +		[CLK_MBUS_VID_IN]	= &mbus_vid_in_clk.common.hw,
> +		[CLK_MBUS_SERDES]	= &mbus_serdes_clk.common.hw,
> +		[CLK_MBUS_MSI_LITE0]	= &mbus_msi_lite0_clk.common.hw,
> +		[CLK_MBUS_STORE]	= &mbus_store_clk.common.hw,
> +		[CLK_MBUS_MSI_LITE2]	= &mbus_msi_lite2_clk.common.hw,
> +		[CLK_MBUS_DMA0]		= &mbus_dma0_clk.common.hw,
> +		[CLK_MBUS_VE_ENC]	= &mbus_ve_enc_clk.common.hw,
> +		[CLK_MBUS_CE]		= &mbus_ce_clk.common.hw,
> +		[CLK_MBUS_DMA1]		= &mbus_dma1_clk.common.hw,
> +		[CLK_MBUS_NAND]		= &mbus_nand_clk.common.hw,
> +		[CLK_MBUS_CSI]		= &mbus_csi_clk.common.hw,
> +		[CLK_MBUS_ISP]		= &mbus_isp_clk.common.hw,
> +		[CLK_MBUS_GMAC0]	= &mbus_gmac0_clk.common.hw,
> +		[CLK_MBUS_GMAC1]	= &mbus_gmac1_clk.common.hw,
> +		[CLK_MBUS_VE_DEC]	= &mbus_ve_dec_clk.common.hw,
> +		[CLK_BUS_DMA0]		= &bus_dma0_clk.common.hw,
> +		[CLK_BUS_DMA1]		= &bus_dma1_clk.common.hw,
> +		[CLK_BUS_SPINLOCK]	= &bus_spinlock_clk.common.hw,
> +		[CLK_BUS_MSGBOX]	= &bus_msgbox_clk.common.hw,
> +		[CLK_BUS_PWM0]		= &bus_pwm0_clk.common.hw,
> +		[CLK_BUS_PWM1]		= &bus_pwm1_clk.common.hw,
> +		[CLK_BUS_DBG]		= &bus_dbg_clk.common.hw,
> +		[CLK_BUS_SYSDAP]	= &bus_sysdap_clk.common.hw,
>   		[CLK_TIMER0]		= &timer0_clk.common.hw,
>   		[CLK_TIMER1]		= &timer1_clk.common.hw,
>   		[CLK_TIMER2]		= &timer2_clk.common.hw,
> @@ -1608,48 +2002,111 @@ static struct clk_hw_onecell_data sun60i_a733_hw_clks = {
>   		[CLK_TIMER7]		= &timer7_clk.common.hw,
>   		[CLK_TIMER8]		= &timer8_clk.common.hw,
>   		[CLK_TIMER9]		= &timer9_clk.common.hw,
> +		[CLK_BUS_TIMER]		= &bus_timer_clk.common.hw,
>   		[CLK_AVS]		= &avs_clk.common.hw,
>   		[CLK_DE]		= &de_clk.common.hw,
> +		[CLK_BUS_DE]		= &bus_de_clk.common.hw,
>   		[CLK_DI]		= &di_clk.common.hw,
> +		[CLK_BUS_DI]		= &bus_di_clk.common.hw,
>   		[CLK_G2D]		= &g2d_clk.common.hw,
> +		[CLK_BUS_G2D]		= &bus_g2d_clk.common.hw,
>   		[CLK_EINK]		= &eink_clk.common.hw,
>   		[CLK_EINK_PANEL]	= &eink_panel_clk.common.hw,
> +		[CLK_BUS_EINK]		= &bus_eink_clk.common.hw,
>   		[CLK_VE_ENC]		= &ve_enc_clk.common.hw,
>   		[CLK_VE_DEC]		= &ve_dec_clk.common.hw,
> +		[CLK_BUS_VE_ENC]	= &bus_ve_enc_clk.common.hw,
> +		[CLK_BUS_VE_DEC]	= &bus_ve_dec_clk.common.hw,
>   		[CLK_CE]		= &ce_clk.common.hw,
> +		[CLK_BUS_CE]		= &bus_ce_clk.common.hw,
> +		[CLK_BUS_CE_SYS]	= &bus_ce_sys_clk.common.hw,
>   		[CLK_NPU]		= &npu_clk.common.hw,
> +		[CLK_BUS_NPU]		= &bus_npu_clk.common.hw,
>   		[CLK_GPU]		= &gpu_clk.common.hw,
> +		[CLK_BUS_GPU]		= &bus_gpu_clk.common.hw,
>   		[CLK_DRAM]		= &dram_clk.common.hw,
> +		[CLK_BUS_DRAM]		= &bus_dram_clk.common.hw,
>   		[CLK_NAND0]		= &nand0_clk.common.hw,
>   		[CLK_NAND1]		= &nand1_clk.common.hw,
> +		[CLK_BUS_NAND]		= &bus_nand_clk.common.hw,
>   		[CLK_MMC0]		= &mmc0_clk.common.hw,
>   		[CLK_MMC1]		= &mmc1_clk.common.hw,
>   		[CLK_MMC2]		= &mmc2_clk.common.hw,
>   		[CLK_MMC3]		= &mmc3_clk.common.hw,
> +		[CLK_BUS_MMC0]		= &bus_mmc0_clk.common.hw,
> +		[CLK_BUS_MMC1]		= &bus_mmc1_clk.common.hw,
> +		[CLK_BUS_MMC2]		= &bus_mmc2_clk.common.hw,
> +		[CLK_BUS_MMC3]		= &bus_mmc3_clk.common.hw,
>   		[CLK_UFS_AXI]		= &ufs_axi_clk.common.hw,
>   		[CLK_UFS_CFG]		= &ufs_cfg_clk.common.hw,
> +		[CLK_BUS_UFS]		= &bus_ufs_clk.common.hw,
> +		[CLK_BUS_UART0]		= &bus_uart0_clk.common.hw,
> +		[CLK_BUS_UART1]		= &bus_uart1_clk.common.hw,
> +		[CLK_BUS_UART2]		= &bus_uart2_clk.common.hw,
> +		[CLK_BUS_UART3]		= &bus_uart3_clk.common.hw,
> +		[CLK_BUS_UART4]		= &bus_uart4_clk.common.hw,
> +		[CLK_BUS_UART5]		= &bus_uart5_clk.common.hw,
> +		[CLK_BUS_UART6]		= &bus_uart6_clk.common.hw,
> +		[CLK_BUS_I2C0]		= &bus_i2c0_clk.common.hw,
> +		[CLK_BUS_I2C1]		= &bus_i2c1_clk.common.hw,
> +		[CLK_BUS_I2C2]		= &bus_i2c2_clk.common.hw,
> +		[CLK_BUS_I2C3]		= &bus_i2c3_clk.common.hw,
> +		[CLK_BUS_I2C4]		= &bus_i2c4_clk.common.hw,
> +		[CLK_BUS_I2C5]		= &bus_i2c5_clk.common.hw,
> +		[CLK_BUS_I2C6]		= &bus_i2c6_clk.common.hw,
> +		[CLK_BUS_I2C7]		= &bus_i2c7_clk.common.hw,
> +		[CLK_BUS_I2C8]		= &bus_i2c8_clk.common.hw,
> +		[CLK_BUS_I2C9]		= &bus_i2c9_clk.common.hw,
> +		[CLK_BUS_I2C10]		= &bus_i2c10_clk.common.hw,
> +		[CLK_BUS_I2C11]		= &bus_i2c11_clk.common.hw,
> +		[CLK_BUS_I2C12]		= &bus_i2c12_clk.common.hw,
>   		[CLK_SPI0]		= &spi0_clk.common.hw,
>   		[CLK_SPI1]		= &spi1_clk.common.hw,
>   		[CLK_SPI2]		= &spi2_clk.common.hw,
>   		[CLK_SPI3]		= &spi3_clk.common.hw,
>   		[CLK_SPI4]		= &spi4_clk.common.hw,
> +		[CLK_BUS_SPI0]		= &bus_spi0_clk.common.hw,
> +		[CLK_BUS_SPI1]		= &bus_spi1_clk.common.hw,
> +		[CLK_BUS_SPI2]		= &bus_spi2_clk.common.hw,
> +		[CLK_BUS_SPI3]		= &bus_spi3_clk.common.hw,
> +		[CLK_BUS_SPI4]		= &bus_spi4_clk.common.hw,
>   		[CLK_SPIF]		= &spif_clk.common.hw,
> +		[CLK_BUS_SPIF]		= &bus_spif_clk.common.hw,
>   		[CLK_GPADC]		= &gpadc_clk.common.hw,
> +		[CLK_BUS_GPADC]		= &bus_gpadc_clk.common.hw,
> +		[CLK_BUS_THS]		= &bus_ths_clk.common.hw,
>   		[CLK_IRRX]		= &irrx_clk.common.hw,
> +		[CLK_BUS_IRRX]		= &bus_irrx_clk.common.hw,
>   		[CLK_IRTX]		= &irtx_clk.common.hw,
> +		[CLK_BUS_IRTX]		= &bus_irtx_clk.common.hw,
> +		[CLK_BUS_LRADC]		= &bus_lradc_clk.common.hw,
>   		[CLK_SGPIO]		= &sgpio_clk.common.hw,
> +		[CLK_BUS_SGPIO]		= &bus_sgpio_clk.common.hw,
>   		[CLK_LPC]		= &lpc_clk.common.hw,
> +		[CLK_BUS_LPC]		= &bus_lpc_clk.common.hw,
>   		[CLK_I2SPCM0]		= &i2spcm0_clk.common.hw,
>   		[CLK_I2SPCM1]		= &i2spcm1_clk.common.hw,
>   		[CLK_I2SPCM2]		= &i2spcm2_clk.common.hw,
>   		[CLK_I2SPCM3]		= &i2spcm3_clk.common.hw,
>   		[CLK_I2SPCM4]		= &i2spcm4_clk.common.hw,
> +		[CLK_BUS_I2SPCM0]	= &bus_i2spcm0_clk.common.hw,
> +		[CLK_BUS_I2SPCM1]	= &bus_i2spcm1_clk.common.hw,
> +		[CLK_BUS_I2SPCM2]	= &bus_i2spcm2_clk.common.hw,
> +		[CLK_BUS_I2SPCM3]	= &bus_i2spcm3_clk.common.hw,
> +		[CLK_BUS_I2SPCM4]	= &bus_i2spcm4_clk.common.hw,
>   		[CLK_I2SPCM2_ASRC]	= &i2spcm2_asrc_clk.common.hw,
>   		[CLK_OWA_TX]		= &owa_tx_clk.common.hw,
>   		[CLK_OWA_RX]		= &owa_rx_clk.common.hw,
> +		[CLK_BUS_OWA]		= &bus_owa_clk.common.hw,
>   		[CLK_DMIC]		= &dmic_clk.common.hw,
> +		[CLK_BUS_DMIC]		= &bus_dmic_clk.common.hw,
>   		[CLK_USB_OHCI0]		= &usb_ohci0_clk.common.hw,
> +		[CLK_BUS_OTG]		= &bus_otg_clk.common.hw,
> +		[CLK_BUS_EHCI0]		= &bus_ehci0_clk.common.hw,
> +		[CLK_BUS_OHCI0]		= &bus_ohci0_clk.common.hw,
>   		[CLK_USB_OHCI1]		= &usb_ohci1_clk.common.hw,
> +		[CLK_BUS_EHCI1]		= &bus_ehci1_clk.common.hw,
> +		[CLK_BUS_OHCI1]		= &bus_ohci1_clk.common.hw,
>   		[CLK_USB_REF]		= &usb_ref_clk.common.hw,
>   		[CLK_USB2_U2_REF]	= &usb2_u2_ref_clk.common.hw,
>   		[CLK_USB2_SUSPEND]	= &usb2_suspend_clk.common.hw,
> @@ -1662,24 +2119,40 @@ static struct clk_hw_onecell_data sun60i_a733_hw_clks = {
>   		[CLK_GMAC_PTP]		= &gmac_ptp_clk.common.hw,
>   		[CLK_GMAC0_PHY]		= &gmac0_phy_clk.common.hw,
>   		[CLK_GMAC1_PHY]		= &gmac1_phy_clk.common.hw,
> +		[CLK_BUS_GMAC0]		= &bus_gmac0_clk.common.hw,
> +		[CLK_BUS_GMAC1]		= &bus_gmac1_clk.common.hw,
>   		[CLK_TCON_LCD0]		= &tcon_lcd0_clk.common.hw,
>   		[CLK_TCON_LCD1]		= &tcon_lcd1_clk.common.hw,
>   		[CLK_TCON_LCD2]		= &tcon_lcd2_clk.common.hw,
> +		[CLK_BUS_TCON_LCD0]	= &bus_tcon_lcd0_clk.common.hw,
> +		[CLK_BUS_TCON_LCD1]	= &bus_tcon_lcd1_clk.common.hw,
> +		[CLK_BUS_TCON_LCD2]	= &bus_tcon_lcd2_clk.common.hw,
>   		[CLK_DSI0]		= &dsi0_clk.common.hw,
>   		[CLK_DSI1]		= &dsi1_clk.common.hw,
> +		[CLK_BUS_DSI0]		= &bus_dsi0_clk.common.hw,
> +		[CLK_BUS_DSI1]		= &bus_dsi1_clk.common.hw,
>   		[CLK_COMBPHY0]		= &combphy0_clk.common.hw,
>   		[CLK_COMBPHY1]		= &combphy1_clk.common.hw,
> +		[CLK_BUS_TCON_TV0]	= &bus_tcon_tv0_clk.common.hw,
> +		[CLK_BUS_TCON_TV1]	= &bus_tcon_tv1_clk.common.hw,
>   		[CLK_EDP_TV]		= &edp_tv_clk.common.hw,
> +		[CLK_BUS_EDP_TV]	= &bus_edp_tv_clk.common.hw,
>   		[CLK_HDMI_CEC_32K]	= &hdmi_cec_32k_clk.common.hw,
>   		[CLK_HDMI_CEC]		= &hdmi_cec_clk.common.hw,
>   		[CLK_HDMI_TV]		= &hdmi_tv_clk.common.hw,
> +		[CLK_BUS_HDMI_TV]	= &bus_hdmi_tv_clk.common.hw,
>   		[CLK_HDMI_SFR]		= &hdmi_sfr_clk.common.hw,
>   		[CLK_HDMI_ESM]		= &hdmi_esm_clk.common.hw,
> +		[CLK_BUS_DPSS_TOP0]	= &bus_dpss_top0_clk.common.hw,
> +		[CLK_BUS_DPSS_TOP1]	= &bus_dpss_top1_clk.common.hw,
>   		[CLK_LEDC]		= &ledc_clk.common.hw,
> +		[CLK_BUS_LEDC]		= &bus_ledc_clk.common.hw,
> +		[CLK_BUS_DSC]		= &bus_dsc_clk.common.hw,
>   		[CLK_CSI_MASTER0]	= &csi_master0_clk.common.hw,
>   		[CLK_CSI_MASTER1]	= &csi_master1_clk.common.hw,
>   		[CLK_CSI_MASTER2]	= &csi_master2_clk.common.hw,
>   		[CLK_CSI]		= &csi_clk.common.hw,
> +		[CLK_BUS_CSI]		= &bus_csi_clk.common.hw,
>   		[CLK_ISP]		= &isp_clk.common.hw,
>   		[CLK_APB2JTAG]		= &apb2jtag_clk.common.hw,
>   		[CLK_FANOUT_24M]	= &fanout_24M_clk.common.hw,
> 


^ permalink raw reply

* [PATCH 6.6.y] net: sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr()
From: Chelsy Ratnawat @ 2026-04-15 22:19 UTC (permalink / raw)
  To: stable
  Cc: jhs, jiri, davem, edumazet, kuba, netdev,
	syzbot+f3a497f02c389d86ef16, Chelsy Ratnawat

From: Eric Dumazet <edumazet@google.com>

[Upstream commit 4fe5a00ec70717a7f1002d8913ec6143582b3c8e]

syzbot reported that tcf_get_base_ptr() can be called while transport
header is not set [1].

Instead of returning a dangling pointer, return NULL.

Fix tcf_get_base_ptr() callers to handle this NULL value.

[1]
 WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 skb_transport_header include/linux/skbuff.h:3071 [inline]
 WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 tcf_get_base_ptr include/net/pkt_cls.h:539 [inline]
 WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 em_nbyte_match+0x2d8/0x3f0 net/sched/em_nbyte.c:43
Modules linked in:
CPU: 1 UID: 0 PID: 6019 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Call Trace:
 <TASK>
  tcf_em_match net/sched/ematch.c:494 [inline]
  __tcf_em_tree_match+0x1ac/0x770 net/sched/ematch.c:520
  tcf_em_tree_match include/net/pkt_cls.h:512 [inline]
  basic_classify+0x115/0x2d0 net/sched/cls_basic.c:50
  tc_classify include/net/tc_wrapper.h:197 [inline]
  __tcf_classify net/sched/cls_api.c:1764 [inline]
  tcf_classify+0x4cf/0x1140 net/sched/cls_api.c:1860
  multiq_classify net/sched/sch_multiq.c:39 [inline]
  multiq_enqueue+0xfd/0x4c0 net/sched/sch_multiq.c:66
  dev_qdisc_enqueue+0x4e/0x260 net/core/dev.c:4118
  __dev_xmit_skb net/core/dev.c:4214 [inline]
  __dev_queue_xmit+0xe83/0x3b50 net/core/dev.c:4729
  packet_snd net/packet/af_packet.c:3076 [inline]
  packet_sendmsg+0x3e33/0x5080 net/packet/af_packet.c:3108
  sock_sendmsg_nosec net/socket.c:727 [inline]
  __sock_sendmsg+0x21c/0x270 net/socket.c:742
  ____sys_sendmsg+0x505/0x830 net/socket.c:2630

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+f3a497f02c389d86ef16@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6920855a.a70a0220.2ea503.0058.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251121154100.1616228-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
---
 include/net/pkt_cls.h |  2 ++
 net/sched/em_cmp.c    |  5 ++++-
 net/sched/em_nbyte.c  |  2 ++
 net/sched/em_text.c   | 11 +++++++++--
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f308e8268651..ccc1c698ed00 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -525,6 +525,8 @@ static inline unsigned char * tcf_get_base_ptr(struct sk_buff *skb, int layer)
 		case TCF_LAYER_NETWORK:
 			return skb_network_header(skb);
 		case TCF_LAYER_TRANSPORT:
+			if (!skb_transport_header_was_set(skb))
+				break;
 			return skb_transport_header(skb);
 	}
 
diff --git a/net/sched/em_cmp.c b/net/sched/em_cmp.c
index f17b049ea530..71ce113f2d08 100644
--- a/net/sched/em_cmp.c
+++ b/net/sched/em_cmp.c
@@ -22,9 +22,12 @@ static int em_cmp_match(struct sk_buff *skb, struct tcf_ematch *em,
 			struct tcf_pkt_info *info)
 {
 	struct tcf_em_cmp *cmp = (struct tcf_em_cmp *) em->data;
-	unsigned char *ptr = tcf_get_base_ptr(skb, cmp->layer) + cmp->off;
+	unsigned char *ptr = tcf_get_base_ptr(skb, cmp->layer);
 	u32 val = 0;
 
+	if (!ptr)
+		return 0;
+	ptr += cmp->off;
 	if (!tcf_valid_offset(skb, ptr, cmp->align))
 		return 0;
 
diff --git a/net/sched/em_nbyte.c b/net/sched/em_nbyte.c
index a83b237cbeb0..2e3c1d58d456 100644
--- a/net/sched/em_nbyte.c
+++ b/net/sched/em_nbyte.c
@@ -42,6 +42,8 @@ static int em_nbyte_match(struct sk_buff *skb, struct tcf_ematch *em,
 	struct nbyte_data *nbyte = (struct nbyte_data *) em->data;
 	unsigned char *ptr = tcf_get_base_ptr(skb, nbyte->hdr.layer);
 
+	if (!ptr)
+		return 0;
 	ptr += nbyte->hdr.off;
 
 	if (!tcf_valid_offset(skb, ptr, nbyte->hdr.len))
diff --git a/net/sched/em_text.c b/net/sched/em_text.c
index f176afb70559..32aae8a9deda 100644
--- a/net/sched/em_text.c
+++ b/net/sched/em_text.c
@@ -29,12 +29,19 @@ static int em_text_match(struct sk_buff *skb, struct tcf_ematch *m,
 			 struct tcf_pkt_info *info)
 {
 	struct text_match *tm = EM_TEXT_PRIV(m);
+	unsigned char *ptr;
 	int from, to;
 
-	from = tcf_get_base_ptr(skb, tm->from_layer) - skb->data;
+	ptr = tcf_get_base_ptr(skb, tm->from_layer);
+	if (!ptr)
+		return 0;
+	from = ptr - skb->data;
 	from += tm->from_offset;
 
-	to = tcf_get_base_ptr(skb, tm->to_layer) - skb->data;
+	ptr = tcf_get_base_ptr(skb, tm->to_layer);
+	if (!ptr)
+		return 0;
+	to = ptr - skb->data;
 	to += tm->to_offset;
 
 	return skb_find_text(skb, from, to, tm->config) != UINT_MAX;
-- 
2.43.0


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox