Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: Martin KaFai Lau @ 2026-04-15 18:55 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: bpf, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern,
	netdev, linux-doc, linux-kernel
In-Reply-To: <20260414105702.248310-1-jiayuan.chen@linux.dev>

On Tue, Apr 14, 2026 at 06:57:00PM +0800, Jiayuan Chen wrote:
> A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> to inject custom TCP header options. When the kernel builds a TCP packet,
> it calls tcp_established_options() to calculate the header size, which
> invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> callback.
> 
> If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> tcp_current_mss(), which calls tcp_established_options() again,
> re-triggering the same BPF callback. This creates an infinite recursion
> that exhausts the kernel stack and causes a panic.
> 
> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>   -> bpf_setsockopt(TCP_NODELAY)
> 	-> tcp_push_pending_frames()
> 	  -> tcp_current_mss()
> 		-> tcp_established_options()
> 		  -> bpf_skops_hdr_opt_len()
>                            /* infinite recursion */
> 			-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> 
> A similar reentrancy issue exists for TCP congestion control, which is
> guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> bpf_setsockopt(TCP_NODELAY) calls that would trigger
> tcp_push_pending_frames() and cause the recursion.
> 
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/

Thanks for the report and fixes suggested across different threads.

Using has_current_bpf_ctx() to avoid tcp_push_pending_frames() should
work but it may change the expectation for bpf_setsockopt(TCP_NODELAY).
e.g. A bpf_tcp_iter does bpf_setsockopt(TCP_NODELAY).

Adding another bit in the tcp_sock is not ideal either. I agree with
Alexei that it is better to reuse the existing bit if we go down this path.
We also need to audit more closely if there are cases that two different
type of bpf progs may call bpf_setsockopt(). e.g.
bpf_tcp_iter does bpf_setsockopt(TCP_CONGESTION) to switch
to a bpf_tcp_cc and the new bpf_tcp_cc->init() will also do
bpf_setsockopt(xxx) which then will be rejected.

Another fix could be, the bpf_setsockopt(TCP_NODELAY) is always broken
for BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB unless
the bpf prog is doing some maneuver to avoid the recursion. Thus,
this use case is basically broken as is and I don't see a use case
for bpf_setsockopt(TCP_NODELAY) when writing header also.
How about checking the bpf_sock->op, level, and optname in
bpf_sock_ops_setsockopt() and return -EOPNOTSUPP?

^ permalink raw reply

* Re: [PATCH net-next v4 04/13] devlink: allow to use devlink index as a command handle
From: Geert Uytterhoeven @ 2026-04-15 19:04 UTC (permalink / raw)
  To: jiri
  Cc: andrew+netdev, chuck.lever, cjubran, corbet, daniel.zahka, davem,
	donald.hunter, edumazet, horms, kuba, leon, linux-doc, linux-rdma,
	linux-trace-kernel, mathieu.desnoyers, matttbe, mbloch, mhiramat,
	mschmidt, netdev, pabeni, przemyslaw.kitszel, rostedt, saeedm,
	skhan, tariqt, linux-kernel
In-Reply-To: <20260312100407.551173-5-jiri@resnulli.us>

On Thu, 12 Mar 2026, Jiri Pirko wrote:
> Currently devlink instances are addressed bus_name/dev_name tuple.
> Allow the newly introduced DEVLINK_ATTR_INDEX to be used as
> an alternative handle for all devlink commands.
> 
> When DEVLINK_ATTR_INDEX is present in the request, use it for a direct
> xarray lookup instead of iterating over all instances comparing
> bus_name/dev_name strings.
> 
> Signed-off-by: Jiri Pirko <jiri@nvidia.com>

Thanks for your patch, which is now commit d85a8af57da87196 ("devlink:
allow to use devlink index as a command handle").

This has a rather large impact on kernel size.
For e.g. m68k/atari_defconfig, bloat-o-meter reports:

    add/remove: 4/1 grow/shrink: 72/1 up/down: 65804/-76 (65728)
    Function                                     old     new   delta
    devlink_trap_policer_get_dump_nl_policy       24    1480   +1456
    devlink_trap_group_get_dump_nl_policy         24    1480   +1456
    devlink_trap_get_dump_nl_policy               24    1480   +1456
    devlink_selftests_get_nl_policy               24    1480   +1456
    devlink_sb_tc_pool_bind_get_dump_nl_policy      24    1480   +1456
    devlink_sb_port_pool_get_dump_nl_policy       24    1480   +1456
    devlink_sb_pool_get_dump_nl_policy            24    1480   +1456
    devlink_sb_get_dump_nl_policy                 24    1480   +1456
    devlink_resource_dump_nl_policy               24    1480   +1456
    devlink_region_get_dump_nl_policy             24    1480   +1456
    devlink_rate_get_dump_nl_policy               24    1480   +1456
    devlink_port_get_dump_nl_policy               24    1480   +1456
    devlink_param_get_dump_nl_policy              24    1480   +1456
    devlink_linecard_get_dump_nl_policy           24    1480   +1456
    devlink_info_get_nl_policy                    24    1480   +1456
    devlink_get_nl_policy                         24    1480   +1456
    devlink_eswitch_get_nl_policy                 24    1480   +1456
    devlink_dpipe_headers_get_nl_policy           24    1480   +1456
    devlink_port_unsplit_nl_policy                32    1480   +1448
    devlink_port_param_set_nl_policy              32    1480   +1448
    devlink_port_param_get_nl_policy              32    1480   +1448
    devlink_port_get_do_nl_policy                 32    1480   +1448
    devlink_port_del_nl_policy                    32    1480   +1448
    devlink_notify_filter_set_nl_policy           32    1480   +1448
    devlink_health_reporter_get_dump_nl_policy      32    1480   +1448
    devlink_port_split_nl_policy                  80    1480   +1400
    devlink_sb_occ_snapshot_nl_policy             96    1480   +1384
    devlink_sb_occ_max_clear_nl_policy            96    1480   +1384
    devlink_sb_get_do_nl_policy                   96    1480   +1384
    devlink_sb_port_pool_get_do_nl_policy        144    1480   +1336
    devlink_sb_pool_get_do_nl_policy             144    1480   +1336
    devlink_sb_pool_set_nl_policy                168    1480   +1312
    devlink_sb_port_pool_set_nl_policy           176    1480   +1304
    devlink_sb_tc_pool_bind_set_nl_policy        184    1480   +1296
    devlink_sb_tc_pool_bind_get_do_nl_policy     184    1480   +1296
    devlink_dpipe_table_get_nl_policy            240    1480   +1240
    devlink_dpipe_entries_get_nl_policy          240    1480   +1240
    devlink_dpipe_table_counters_set_nl_policy     272    1480   +1208
    devlink_eswitch_set_nl_policy                504    1480    +976
    devlink_resource_set_nl_policy               544    1480    +936
    devlink_param_get_do_nl_policy               656    1480    +824
    devlink_region_get_do_nl_policy              712    1480    +768
    devlink_region_new_nl_policy                 744    1480    +736
    devlink_region_del_nl_policy                 744    1480    +736
    devlink_health_reporter_test_nl_policy       928    1480    +552
    devlink_health_reporter_recover_nl_policy     928    1480    +552
    devlink_health_reporter_get_do_nl_policy     928    1480    +552
    devlink_health_reporter_dump_get_nl_policy     928    1480    +552
    devlink_health_reporter_dump_clear_nl_policy     928    1480    +552
    devlink_health_reporter_diagnose_nl_policy     928    1480    +552
    devlink_trap_get_do_nl_policy               1048    1480    +432
    devlink_trap_set_nl_policy                  1056    1480    +424
    devlink_trap_group_get_do_nl_policy         1088    1480    +392
    devlink_trap_policer_get_do_nl_policy       1144    1480    +336
    devlink_trap_group_set_nl_policy            1144    1480    +336
    devlink_trap_policer_set_nl_policy          1160    1480    +320
    devlink_port_set_nl_policy                  1168    1480    +312
    devlink_flash_update_nl_policy              1224    1480    +256
    devlink_reload_nl_policy                    1248    1480    +232
    devlink_port_new_nl_policy                  1320    1480    +160
    devlink_rate_get_do_nl_policy               1352    1480    +128
    devlink_rate_del_nl_policy                  1352    1480    +128
    devlink_linecard_get_do_nl_policy           1376    1480    +104
    __devlinks_xa_find_get                         -      96     +96
    devlink_linecard_set_nl_policy              1392    1480     +88
    devlink_selftests_run_nl_policy             1416    1480     +64
    devlink_get_from_attrs_lock                  262     314     +52
    devlink_region_read_nl_policy               1440    1480     +40
    devlink_rate_set_nl_policy                  1448    1480     +32
    devlink_rate_new_nl_policy                  1448    1480     +32
    devlinks_xa_lookup_get                         -      30     +30
    devlink_health_reporter_set_nl_policy       1456    1480     +24
    devlink_attr_index_range                       -      16     +16
    devlink_param_set_nl_policy                 1472    1480      +8
    devlink_nl_dumpit                            276     282      +6
    __initcall__kmod_core__670_573_devlink_init4       -       4      +4
    __initcall__kmod_core__670_561_devlink_init4       4       -      -4
    devlinks_xa_find_get                          96      24     -72
    Total: Before=5203976, After=5269704, chg +1.26%

> --- a/net/devlink/netlink_gen.c
> +++ b/net/devlink/netlink_gen.c
> @@ -11,6 +11,11 @@
>  
>  #include <uapi/linux/devlink.h>
>  
> +/* Integer value ranges */
> +static const struct netlink_range_validation devlink_attr_index_range = {
> +	.max	= U32_MAX,
> +};
> +
>  /* Sparse enums validation callbacks */
>  static int
>  devlink_attr_param_type_validate(const struct nlattr *attr,
> @@ -56,37 +61,42 @@ const struct nla_policy devlink_dl_selftest_id_nl_policy[DEVLINK_ATTR_SELFTEST_I
>  };
>  
>  /* DEVLINK_CMD_GET - do */
> -static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_DEV_NAME + 1] = {
> +static const struct nla_policy devlink_get_nl_policy[DEVLINK_ATTR_INDEX + 1] = {

Unrelated to this change, but the explicit sizing of these arrays is not
needed, as the compiler will take care of that.

>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

This array, and many others below, are sparse, with large gaps (up to
1456 or 2912 bytes on 32-bit resp. 64-bit systems) before the last
entries.

>  };
>  
>  /* DEVLINK_CMD_PORT_GET - do */
> -static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_PORT_INDEX + 1] = {
> +static const struct nla_policy devlink_port_get_do_nl_policy[DEVLINK_ATTR_INDEX + 1] = {
>  	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING, },
>  	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING, },
> +	[DEVLINK_ATTR_INDEX] = NLA_POLICY_FULL_RANGE(NLA_UINT, &devlink_attr_index_range),

Shouldn't this be inserted at the end, as DEVLINK_ATTR_INDEX >
DEVLINK_ATTR_PORT_INDEX, for readability?

>  	[DEVLINK_ATTR_PORT_INDEX] = { .type = NLA_U32, },
>  };

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-15 19:37 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <CAH5Ym4jKdzDeYwCfkMLmUz0FsiD2vFwfuAvqFE=uvMtPmakeMQ@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:38:29AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 5:44 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Tue, Apr 14, 2026 at 07:12:34PM -0700, Sam Edwards wrote:
> > > On Tue, Apr 14, 2026 at 6:19 PM Russell King (Oracle)
> > > <linux@armlinux.org.uk> wrote:
> > > > Okay, just a quick note to say that nvidia's 5.10.216-tegra kernel
> > > > survives iperf3 -c -R to the imx6.
> > >
> > > Hi Russell,
> > >
> > > Aw, you beat me to it! I was about to report that 5.10.104-tegra is
> > > unaffected. And my iperf3 server is a multi-GbE amd64 machine.
> > >
> > > > Dumping the registers and comparing, and then forcing the RQS and TQS
> > > > values to 0x23 (+1 = 36, *256 = 9216 bytes) and 0x8f (+1 = 144,
> > > > *256 = 36864 ytes) respectively seems to solve the problem. Under
> > > > net-next, these both end up being 0xff (+1 = 256, *256 = 65536 bytes.)
> > > > Suspiciously, 36 * 4 = 144, and I also see that this kernel programs
> > > > all four of the MTL receive operation mode registers, but only the
> > > > first MTL transmit operation mode register. However, DMA channels 1-3
> > > > aren't initialised.
> > >
> > > Wow, great! I wonder if the problem is that the MTL FIFOs are smaller
> > > than that, so when the DMA suffers a momentary hiccup, the FIFOs are
> > > allowed to overflow, putting the hardware in a bad state.
> > >
> > > Though I suspect this is only half of the problem: do you still see
> > > RBUs? Everything you've shared so far suggests the DMA failures are
> > > _not_ because the rx ring is drying up.
> >
> > Yes. Note that RBUs will happen not because of DMA failures, but if
> > the kernel fails to keep up with the packet rate. RBU means "we read
> > the next descriptor, and it wasn't owned by hardware".
> 
> Are you speaking from observation, documentation, or understanding?

Observation.

> I'd define RBU the same way, but you reported:

It's not a question about how I define RBU - this is defined by Synopsys
and I'm using it *exactly* that way as stated in the documentation.

"This bit indicates that the host owns the Next Descriptor in the
Receive List and the DMA cannot acquire it. The Receive Process is
suspended. ... This bit is set only when the previous Receive
Descriptor is owned by the DMA."

In other words, DMA has processed the previous receive descriptor which
_was_ owned by the hardware, written back to clear the OWN bit, and
then fetches the next descriptor and finds that the OWN bit is also
clear.

> 
> ```
> [   55.766199] dwc-eth-dwmac 2490000.ethernet eth0: q0: receive buffer
> unavailable: cur_rx=309 dirty_rx=309 last_cur_rx=245
> last_cur_rx_post=309 last_dirty_rx=245 count=64 budget=64
> 
> cur_rx == dirty_rx _should_ mean that we fully refilled the ring. [...]
> [...]
> Every ring entry contains the same RDES3 value, so it really is
> completely full at the point RBU fires (bit 31 clear means software
> owns the descriptor, and it's basically saying first/last segment,
> RDES1 valid, buffer 1 length of 1518.
> ```

Right, because the _last_ time stmmac_rx() was called, the ring was
completely refilled (as it always is for me).

There are two scenarios that what I'm seeing may happen.

1) The ring was fully refilled, but before stmmac_rx() is next
   executed, all descriptors end up being consumed due to the rate
   at which packets are being received. Thus, the hardware encounters
   a descriptor that has OWN=0

2) The kernel has been slow to respond to packets that have been
   received, and because of the NAPI throttling stmmac_rx() to only
   process 64 descriptors at a time, we are falling way behind the
   hardware position. Eventually, the hardware catches up with
   the point at which stmmac_rx_refill() is repopulating the receive
   descriptors, and encounters a descriptor that has OWN=0.

For (2), for example, let's take the example which you've quoted from
me.

stmmac_rx() gets called, and cur_rx = dirty_rx = 245. We're limited to
a count of 64 meaning we're not going to process more than 64 entries
no matter how far ahead the hardware is. Let's say the hardware is at
e.g. descriptor 400 at this point.

stmmac_rx() runs, processing descriptors. It works its way up to entry
309, at which point count == limit, so it stops, and we now have
cur_rx = 309, dirty_rx = 245.

The next thing stmmac_rx() does is call stmmac_rx_refill(). This looks
at the difference, and calculates how many entries need to be
repopulated. stmmac_rx_dirty() returns 64, as that's the number of
entries between dirty_rx and the updated cur_rx. It populates those
entries.

At this point, dirty_rx = 309. All well and good. However, during that
process, packet reception hasn't stopped, and let's say it's now at
descriptor 500.

In that scenario, we're consuming 100 descriptors, but only repopulating
64 descriptors. As this continues, the hardware is slowly catching up
with point in the ring that stmmac_rx_refill() is repopulating the
descriptors.

When it does catch up, it will encounter a descriptor with OWN=0, which
will fire the RBU interrupt.

At this point, my debug dumps the state of the ring. If the RBU was
raised when stmmac_rx()/stmmac_rx_refill() was not running, _and_ we
are always successfully refilling all the entries that stmmac_rx()
processed, then cur_rx will equal dirty_rx, even when the hardware
could be way ahead of cur_rx. Neither of these indexes have any
relevance to where the hardware actually is in the ring.

The dump of the ring state *clearly* shows that all descriptors have
a RDES3 value which indicates that every single descriptor is not
hardware owned at this point (since RBU has been raised, the receive
process is suspended, so hardware is no longer changing the ring.)

> It would seem* that the kernel isn't really failing to keep up with
> the packet rate. If RBU is firing with a ring that's not even close to
> empty, that tells me there's another way for it to fire. So I suspect
> the hardware designers implemented it to mean:
> "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> 
> (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> is actually completely depleted, not full? If count==budget, wouldn't
> that mean the whole ring hasn't been visited, so we only refilled 64
> entries and not necessarily the entire ring? Maybe the kernel isn't
> keeping up after all.)

Ah, I think that's where our terminology differs.

You seem to define full as "populated with empty buffers". I define
full to mean "the hardware has filled every buffer with a packet that
it has received and handed it over to software to process." Note even
the terminology there - filling buffers with data. That ultimately
ends up filling the ring, and when completely filled, it is full.

I think of buffers like buckets. If a buffer contains no data, it
is empty. If a buffer contains data, it has been filled or is full.
Apply that to a list of buffers and you get the same thing. Many
ethernet driver documentation uses this same terminology, so I
thought it would be widely understood.

> > That has:
> >
> >         const nveu32_t rx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(36U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U),
> >                   FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(2U), FIFO_SZ(16U) },
> >         };
> >         const nveu32_t tx_fifo_sz[2U][OSI_EQOS_MAX_NUM_QUEUES] = {
> >                 { FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U), FIFO_SZ(9U),
> >                   FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U), FIFO_SZ(1U) },
> >                 { FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U),
> >                   FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U), FIFO_SZ(8U) },
> >         };
> >
> > where each of those values is the RQS/TQS value to use in KiB:
> >
> > #define FIFO_SZ(x)              ((((x) * 1024U) / 256U) - 1U)
> >
> > This doesn't correspond with the values I'm seeing programmed into
> > the hardware under the 5.10.216-tegra kernel. I'm seeing TQS = 143
> > (36KiB), and RQS = 35 (9KiB). Yes, these values exist in the tables
> > above from a quick look, but they're not in the right place!
> 
> True, but:
> a) I doubt 5.10.216-tegra includes exactly the same version of the
> driver found in this random GitHub mirror. (My intent was only to
> point out that they don't use 5.10's stmmac; I should have been more
> clear that I wasn't trying to link the same version, sorry!)
> b) This is vendor code; I don't know how good their testing/review
> process is. It might not run the way it looks. The intent seems to be
> for RQS > TQS (which makes intuitive sense), but as you're seeing the
> registers programmed the other way 'round, they might have gotten them
> subtly mixed up.
> 
> > Now, as for FIFO sizes, if we sum up all the entries, then we
> > get:
> >
> > SUM(rx_fifo_size[0][]) = 60KiB
> > SUM(rx_fifo_size[1][]) = 64KiB
> > SUM(tx_fifo_size[0][]) = 60KiB
> > SUM(tx_fifo_size[1][]) = 64KiB
> 
> I follow the math with 64KiB, but surely the 60KiB should be
> 9+9+9+9+1+1+1+1=40KiB? This seems to me that the "legacy EQOS" simply
> shifts with smaller FIFOs. Since dwmac is licensed as a soft IP core,
> perhaps the FIFO size is an elaboration parameter? That would mean
> this isn't an issue with dwmac 5.0 broadly, but with Nvidia's specific
> instantiation of it.

Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
trying to do anything.

However, I've tested with 0x7f in both fields, and it still falls flat
on its face. I've also tried other values, but because I had to unplug
the laptop from the nvidia board to use the laptop portably due to the
medical emergency situation, that caused screen to quit, so I've lost
all that. Chaos reigns supreme here :/

So, I'm not sure we understand what's going on - I don't think it's that
the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
results in some kind of throttling that prevents the condition which
hangs the hardware.

I'm not getting as much time as I'd like to really test out scenarios
due to everything that is going on, and honestly I feel like just
writing this week off now and giving up.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH v3 net] vsock: fix buffer size clamping order
From: Michal Luczaj @ 2026-04-15 19:55 UTC (permalink / raw)
  To: Stefano Garzarella
  Cc: Norbert Szetei, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, virtualization, netdev, linux-kernel
In-Reply-To: <ad9q6r6ZBHIH7k4f@sgarzare-redhat>

On 4/15/26 12:42, Stefano Garzarella wrote:
> On Tue, Apr 14, 2026 at 04:22:04PM +0200, Michal Luczaj wrote:
>> On 4/9/26 18:34, Norbert Szetei wrote:
>>> In vsock_update_buffer_size(), the buffer size was being clamped to the
>>> maximum first, and then to the minimum. If a user sets a minimum buffer
>>> size larger than the maximum, the minimum check overrides the maximum
>>> check, inverting the constraint.
>>>
>>> This breaks the intended socket memory boundaries by allowing the
>>> vsk->buffer_size to grow beyond the configured vsk->buffer_max_size.
>>>
>>> Fix this by checking the minimum first, and then the maximum. This
>>> ensures the buffer size never exceeds the buffer_max_size.
>>
>> Something may be missing. After adding another ioctl to your reproducer, I
>> still see crashes.
>>
>>     SYSCHK(setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MIN_SIZE, &min,
>>                       sizeof(min)));
>> +    SYSCHK(setsockopt(fd, AF_VSOCK, SO_VM_SOCKETS_BUFFER_MAX_SIZE, &min,
>> +                      sizeof(min)));
>> }
>>
>> [*] Setting buffer_min_size to 0x400000000.
>> [socket][0] sending...
>>
>> refcount_t: saturated; leaking memory.
>> WARNING: lib/refcount.c:22 at refcount_warn_saturate+0x7d/0xb0, CPU#2:
>> a.out/1478
>> ...
>> refcount_t: underflow; use-after-free.
>> WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x50/0xb0, CPU#12:
>> kworker/12:0/80
>> Workqueue: vsock-loopback vsock_loopback_work
>> ...
>>
> 
> yeah, I pointed out the same during the bug discussion 
> (https://lore.kernel.org/netdev/acuKUpZQq6z1DY_n@sgarzare-redhat/) and 
> suggested to add a sysctl or reuse net.core.wmem_max/rmem_max
> (https://lore.kernel.org/netdev/adYKERRYwzMIhZAl@sgarzare-redhat/)

Oh, so this patch wasn't meant to tackle the repro from v1. Sorry, I got
confused.

Michal


^ permalink raw reply

* Re: [PATCH net v5] net: stmmac: Prevent NULL deref when RX memory exhausted
From: Russell King (Oracle) @ 2026-04-15 19:58 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue, Maxime Chevallier,
	Ovidiu Panait, Vladimir Oltean, Baruch Siach, Serge Semin,
	Giuseppe Cavallaro, netdev, linux-stm32, linux-arm-kernel,
	linux-kernel, stable
In-Reply-To: <CAH5Ym4gy6g8d88-vGhe1zxoV7jNH_fXHsDSdDWC4x00H7s-3=w@mail.gmail.com>

On Wed, Apr 15, 2026 at 10:53:15AM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 9:28 AM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > On Wed, Apr 15, 2026 at 01:56:32PM +0100, Russell King (Oracle) wrote:
> > > Locally, while debugging my issues, I used this to prevent cur_rx
> > > catching up with dirty_rx:
> > >
> > >                 status = stmmac_rx_status(priv, &priv->xstats, p);
> > >                 /* check if managed by the DMA otherwise go ahead */
> > >                 if (unlikely(status & dma_own))
> > >                         break;
> > >
> > >                 next_entry = STMMAC_NEXT_ENTRY(rx_q->cur_rx,
> > >                                                priv->dma_conf.dma_rx_size);
> > >                 if (unlikely(next_entry == rx_q->dirty_rx))
> > >                         break;
> > >
> > >                 rx_q->cur_rx = next_entry;
> > >
> > > If we care about the cost of reloading rx_q->dirty_rx on every
> > > iteration, then I'd suggest that the cost we already incur reading and
> > > writing rx_q->cur_rx is something that should be addressed, and
> > > eliminating that would counter the cost of reading rx_q->dirty_rx. I
> > > suspect, however, that the cost is minimal, as cur_tx and dirty_rx are
> > > likely in the same cache line.
> 
> No, no, I like your approach better. :) It also removes the need for
> the `limit` clamp at the top of the function, so later code can assume
> limit==budget.
> 
> > > It looks like any fix to stmmac_rx() will also need a corresponding
> > > fix for stmmac_rx_zc().
> 
> I agree that stmmac_rx_zc() is likely also broken (in a similar way,
> but not similar enough to permit a "corresponding" fix), but I don't
> agree that there's a dependency relationship here. This patch is
> addressing #221010, which affects the generic/non-ZC codepath; I'm
> afraid the ZC codepath warrants its own investigation.

The code structure is identical. The only difference is what happens
to the packets.

Both paths take the NAPI limit. Both paths process up to that limit of
descriptors. The state saving / restoring is similar. The read_again
label is the same, the condition after is the same.

The ZC path differs at this point in that it will attempt to refill
every 16 descriptors that have been processed.

Both paths then read the descriptor and check the ownership.
Both paths then increment cur_rx to point to the next entry around
the ring.
Both paths then get the following descriptor pointer and prefetch
it.
Both paths then get the extended status if we're using extended
descriptors.
Both paths then handle frame discard.
Both paths then jump back to read_again if this isn't the last
segment and we have an error.
Both paths then check for error.
... and so it goes on.

The ZC path to me looks like a copy-paste-and-tweak approach to
adding support. The difference seems to be centered only around
the handling of the data buffers in the descriptors. The overall
mechanism of processing the descriptors follows the same layout
in both functions.

> > I have some further information, but a new curveball has just been
> > chucked... and I've no idea what this will mean at this stage. Just
> > take it that I won't be responding for a while.
> 
> I think I follow your meaning. Good luck getting it straightened out!

It looks like further curveballs have been thrown as a result,
destroying all "plans" for the next days/week. I have aboslutely
no ideas how much time or when I'll be able to look at anything
at the moment, so don't assume that because I find an opportunity
to send an email, everthing is back to normal.

I'll also note that over the last two days I've written several
emails on this, spent many hours on them, only to discard them
as other ideas/research and maybe even the passage of time means
they're no longer appropriate to send.

Jakub: sorry, I just *can't* review stuff on netdev with everything
that is going on, not when .... cna't complete this.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH net] slip: reject VJ frames when no receive slots are allocated
From: Weiming Shi @ 2026-04-15 20:03 UTC (permalink / raw)
  To: Simon Horman
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, kees, netdev, xmei5
In-Reply-To: <20260414134126.758795-2-horms@kernel.org>

On 26-04-14 14:41, Simon Horman wrote:
> From: 'Simon Horman' <horms@kernel.org>
> 
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> slip: reject VJ frames when no receive slots are allocated
> 
> This commit prevents a null pointer dereference by adding checks for
> a NULL comp->rstate in slhc_uncompress() and slhc_remember(). It ensures
> frames are rejected when no receive slots are allocated.
> 
> > diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
> > index e3c785da3eef3..e67052bcab57b 100644
> > --- a/drivers/net/slip/slhc.c
> > +++ b/drivers/net/slip/slhc.c
> > @@ -502,6 +502,10 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
> >  
> >  	/* We've got a compressed packet; read the change byte */
> >  	comp->sls_i_compressed++;
> > +	if (!comp->rstate) {
> > +		comp->sls_i_error++;
> > +		return 0;
> > +	}
> 
> While reviewing this code, a few pre-existing issues were noticed in and
> around slhc_uncompress().
> 
> Can slhc_uncompress() cause a heap buffer overflow?
> 
> It expands compressed packets by prepending reconstructed IP and TCP
> headers, which can total up to 120 bytes if options are maximized. The
> caller in slip.c assumes a maximum expansion of 80 bytes:
> 
>     if (count + 80 > sl->buffsize)
> 
> If a compressed packet is sized at sl->buffsize - 80, slhc_uncompress()
> shifts the payload and copies the 120-byte headers, writing up to 37 bytes
> past the dynamically allocated sl->rbuff chunk. Does this corrupt memory?
> 
> Also, does the decode() function read variable-length deltas without
> bounds checking against the packet length isize?
> 
> A malformed packet with multiple change flags set could cause decode()
> to read up to 15 bytes past the end of the packet data. Because the packet
> buffer sl->rbuff is allocated via kmalloc without zero-initialization,
> could this read uninitialized kernel heap memory and inject it into the
> reconstructed TCP/IP headers, leaking it back to the network?
> 
> Finally, do slhc_compress() and slhc_uncompress() read and write the
> 16-bit TCP checksum using direct pointer casts?
> 
> drivers/net/slip/slhc.c:slhc_uncompress() {
>     ...
>     thp->check = *(__sum16 *)cp;
>     ...
> }
> 
> Since the compressed header varies dynamically in size, the cp pointer
> frequently lands on odd byte boundaries. On architectures with strict
> alignment requirements, could this direct dereference trigger an alignment
> fault and kernel panic? Should the code use get_unaligned() and
> put_unaligned() instead?
> 
> [ ... ]
> 
> > @@ -651,8 +655,9 @@ slhc_remember(struct slcompress *comp, unsigned char *icp, int isize)
> >  
> >  	/* The packet is shorter than a legal IP header.
> >  	 * Also make sure isize is positive.
> > +	 * Reject if no receive slots are configured (rstate is NULL).
> >  	 */
> > -	if (isize < (int)sizeof(struct iphdr)) {
> > +	if (!comp->rstate || isize < (int)sizeof(struct iphdr)) {
> >  runt:
> >  		comp->sls_i_runt++;
> >  		return slhc_toss(comp);
> 
> When dropping a packet due to a lack of configured receive slots
> (rslots == 0), should this increment comp->sls_i_error instead of
> comp->sls_i_runt?
> 
> Dropping a packet because rstate is missing is an internal configuration
> state, not a runt packet.

Thanks, Simon. I will send v2 patch.

I also reproduced the three pre-existing issues you pointed out
(one via KASAN, the other two under GDB) and will send patches for
them separately.

Best regards,
Weiming Shi

^ permalink raw reply

* Re: [PATCH nf,v2 1/3] rculist: add list_splice_rcu() for private lists
From: Pablo Neira Ayuso @ 2026-04-15 20:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: netfilter-devel, davem, netdev, kuba, pabeni, edumazet, fw, horms,
	joelagnelf, josh, boqun, urezki, rostedt, mathieu.desnoyers,
	jiangshanlai, qiang.zhang, rcu
In-Reply-To: <9210a276-8158-40f4-b3b5-6431f5f13541@paulmck-laptop>

On Wed, Apr 15, 2026 at 10:39:33AM -0700, Paul E. McKenney wrote:
> On Wed, Apr 15, 2026 at 07:08:44PM +0200, Pablo Neira Ayuso wrote:
> > This patch adds a helper function, list_splice_rcu(), to safely splice
> > a private (non-RCU-protected) list into an RCU-protected list.
> > 
> > The function ensures that only the pointer visible to RCU readers
> > (prev->next) is updated using rcu_assign_pointer(), while the rest of
> > the list manipulations are performed with regular assignments, as the
> > source list is private and not visible to concurrent RCU readers.
> > 
> > This is useful for moving elements from a private list into a global
> > RCU-protected list, ensuring safe publication for RCU readers.
> > Subsystems with some sort of batching mechanism from userspace can
> > benefit from this new function.
> > 
> > The function __list_splice_rcu() has been added for clarity and to
> > follow the same pattern as in the existing list_splice*() interfaces,
> > where there is a check to ensure that that the list to splice is not
> > empty. Note that __list_splice_rcu() has no documentation for this
> > reason.
> > 
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
> > v2: including comments by Paul McKenney.
> > 
> >     Except, I have deliberately keep back the suggestion to squash
> >     __list_splice_rcu() into list_splice_rcu(), I instead removed
> >     the documentation for __list_splice_rcu(). I am looking
> >     at other existing list_splice*() function in list.h and rculist.h
> >     to get this aligned with __list_splice(), which also has no users
> >     in the tree and no documentation. I find it easier to read with
> >     __list_splice(), but if this explaination is not sound so...
> > 
> >     @Paul: I can post v3 squashing __list_splice_rcu(), just let me
> >            know.
> 
> Removing the comment addresses most of my concerns.  I do have a slight
> but not overwhelming preference for the squashed version, but either way:
> 
> Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Or if you want this to go in via RCU, please let us know.  My guess is
> that it would be easier for you to take it in with the code using it.

I'd prefer to take it through nf.git, I need this as a fix for an
invalid use of list_splice() on a RCU-protected list.

Thanks for your quick review Paul!

^ permalink raw reply

* Re: [RFC PATCH net-next v2 1/2] udp: fix encapsulation packet resubmit in multicast deliver
From: Kuniyuki Iwashima @ 2026-04-15 20:35 UTC (permalink / raw)
  To: littlesmilingcloud
  Cc: davem, dsahern, edumazet, horms, kuba, linux-kselftest, netdev,
	pabeni, shuah, willemdebruijn.kernel
In-Reply-To: <ad_eD6CjMOBSXQVm@dau-home-pc>

From: Anton Danilov <littlesmilingcloud@gmail.com>
Date: Wed, 15 Apr 2026 21:50:55 +0300
> When a UDP encapsulation socket (e.g., FOU) receives a multicast
> packet, __udp4_lib_mcast_deliver() and __udp6_lib_mcast_deliver()
> incorrectly call consume_skb() when udp_queue_rcv_skb() returns a
> positive value. A positive return value from udp_queue_rcv_skb()
> indicates that the encap_rcv handler (e.g., fou_udp_recv) has
> consumed the UDP header and wants the packet to be resubmitted to
> the IP protocol handler for further processing (e.g., as a GRE
> packet).
> 
> The unicast path in udp_unicast_rcv_skb() handles this correctly by
> returning -ret, which propagates up to ip_protocol_deliver_rcu() for
> resubmission. The GSO path in udp_queue_rcv_skb() also handles this
> correctly by calling ip_protocol_deliver_rcu() directly. However, the
> multicast path destroys the packet via consume_skb() instead of
> resubmitting it, causing silent packet loss.
> 
> This affects any UDP encapsulation (FOU, GUE) combined with multicast
> destination addresses. In practice, it causes ~50% packet loss on
> FOU/GRETAP tunnels configured with multicast remote addresses, with
> the exact ratio depending on the early demux cache hit rate (packets
> that hit early demux take the unicast path and are handled correctly).
> 
> Fix this by calling ip_protocol_deliver_rcu() (IPv4) or
> ip6_protocol_deliver_rcu() (IPv6) instead of consume_skb() when the
> return value is positive, matching the behavior of the GSO path.
> 
> Signed-off-by: Anton Danilov <littlesmilingcloud@gmail.com>
> Assisted-by: Claude:claude-opus-4-6
> ---
>  net/ipv4/udp.c | 13 +++++++++----
>  net/ipv6/udp.c | 13 +++++++++----
>  2 files changed, 18 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index e9e2ce9522ef..8c2d4367cba2 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2467,6 +2467,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	struct udp_hslot *hslot;
>  	struct sk_buff *nskb;
>  	bool use_hash2;
> +	int ret;
>  
>  	hash2_any = 0;
>  	hash2 = 0;
> @@ -2500,8 +2501,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  			__UDP_INC_STATS(net, UDP_MIB_INERRORS);
>  			continue;
>  		}
> -		if (udp_queue_rcv_skb(sk, nskb) > 0)
> -			consume_skb(nskb);
> +		ret = udp_queue_rcv_skb(sk, nskb);
> +		if (ret > 0)
> +			ip_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
> +						ret);

Is this path reachable in the first place ?

Maybe I'm missing something, but UDP tunnel sockets
should not have SO_REUSEADDR/SO_REUSEPORT.


>  	}
>  
>  	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
> @@ -2511,8 +2514,10 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	}
>  
>  	if (first) {
> -		if (udp_queue_rcv_skb(first, skb) > 0)
> -			consume_skb(skb);
> +		ret = udp_queue_rcv_skb(first, skb);
> +		if (ret > 0)
> +			ip_protocol_deliver_rcu(dev_net(skb->dev), skb,
> +						ret);

If the above is true, we can simply return -ret here
to avoid possible stack overflow with too many FOU
encapsulation that syzbot is fond of.


Please wait 24h before next submission.
https://docs.kernel.org/process/maintainer-netdev.html


>  	} else {
>  		kfree_skb(skb);
>  		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 15e032194ecc..f74935d9f7d7 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -949,6 +949,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	struct udp_hslot *hslot;
>  	struct sk_buff *nskb;
>  	bool use_hash2;
> +	int ret;
>  
>  	hash2_any = 0;
>  	hash2 = 0;
> @@ -987,8 +988,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  			continue;
>  		}
>  
> -		if (udpv6_queue_rcv_skb(sk, nskb) > 0)
> -			consume_skb(nskb);
> +		ret = udpv6_queue_rcv_skb(sk, nskb);
> +		if (ret > 0)
> +			ip6_protocol_deliver_rcu(dev_net(nskb->dev), nskb,
> +						 ret, true);
>  	}
>  
>  	/* Also lookup *:port if we are using hash2 and haven't done so yet. */
> @@ -998,8 +1001,10 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
>  	}
>  
>  	if (first) {
> -		if (udpv6_queue_rcv_skb(first, skb) > 0)
> -			consume_skb(skb);
> +		ret = udpv6_queue_rcv_skb(first, skb);
> +		if (ret > 0)
> +			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb,
> +						 ret, true);
>  	} else {
>  		kfree_skb(skb);
>  		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI);
> -- 
> 2.47.3
> 

^ permalink raw reply

* [PATCH net v2] slip: reject VJ receive packets on instances with no rstate array
From: Weiming Shi @ 2026-04-15 20:41 UTC (permalink / raw)
  To: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: netdev, Xiang Mei, Weiming Shi

slhc_init() accepts rslots == 0 as a valid configuration, with the
documented meaning of 'no receive compression'. In that case the
allocation loop in slhc_init() is skipped, so comp->rstate stays
NULL and comp->rslot_limit stays 0 (from the kzalloc of struct
slcompress).

The receive helpers do not defend against that configuration.
slhc_uncompress() dereferences comp->rstate[x] when the VJ header
carries an explicit connection ID, and slhc_remember() later assigns
cs = &comp->rstate[...] after only comparing the packet's slot number
to comp->rslot_limit. Because rslot_limit is 0, slot 0 passes the
range check, and the code dereferences a NULL rstate.

The configuration is reachable in-tree through PPP. PPPIOCSMAXCID
stores its argument in a signed int, and (val >> 16) uses arithmetic
shift. Passing 0xffff0000 therefore sign-extends to -1, so val2 + 1
is 0 and ppp_generic.c ends up calling slhc_init(0, 1). Because
/dev/ppp open is gated by ns_capable(CAP_NET_ADMIN), the whole path
is reachable from an unprivileged user namespace. Once the malformed
VJ state is installed, any inbound VJ-compressed or VJ-uncompressed
frame that selects slot 0 crashes the kernel in softirq context:

 Oops: general protection fault, probably for non-canonical
       address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI
 KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
 RIP: 0010:slhc_uncompress (drivers/net/slip/slhc.c:519)
 Call Trace:
  <TASK>
  ppp_receive_nonmp_frame (drivers/net/ppp/ppp_generic.c:2466)
  ppp_input (drivers/net/ppp/ppp_generic.c:2359)
  ppp_async_process (drivers/net/ppp/ppp_async.c:492)
  tasklet_action_common (kernel/softirq.c:926)
  handle_softirqs (kernel/softirq.c:623)
  run_ksoftirqd (kernel/softirq.c:1055)
  smpboot_thread_fn (kernel/smpboot.c:160)
  kthread (kernel/kthread.c:436)
  ret_from_fork (arch/x86/kernel/process.c:164)
  </TASK>

Reject the receive side on such instances instead of touching rstate.
slhc_uncompress() falls through to its existing 'bad' label, which
bumps sls_i_error and enters the toss state. slhc_remember() mirrors
that with an explicit sls_i_error increment followed by slhc_toss();
the sls_i_runt counter is not used here because a missing rstate is
an internal configuration state, not a runt packet.

The transmit path is unaffected: the only in-tree caller that picks
rslots from userspace (ppp_generic.c) still supplies tslots >= 1, and
slip.c always calls slhc_init(16, 16), so comp->tstate remains valid
and slhc_compress() continues to work.

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
v2:
- slhc_remember(): use sls_i_error instead of sls_i_runt for the
  missing-rstate case; it is a configuration error, not a runt packet
  (Simon).
- slhc_uncompress(): goto bad instead of returning 0, so the instance
  also enters SLF_TOSS on the first rejected frame.

 drivers/net/slip/slhc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/slip/slhc.c b/drivers/net/slip/slhc.c
index e3c785da3eef3..e18a4213d10ce 100644
--- a/drivers/net/slip/slhc.c
+++ b/drivers/net/slip/slhc.c
@@ -506,6 +506,8 @@ slhc_uncompress(struct slcompress *comp, unsigned char *icp, int isize)
 		comp->sls_i_error++;
 		return 0;
 	}
+	if (!comp->rstate)
+		goto bad;
 	changes = *cp++;
 	if(changes & NEW_C){
 		/* Make sure the state index is in range, then grab the state.
@@ -649,6 +651,10 @@ slhc_remember(struct slcompress *comp, unsigned char *icp, int isize)
 	struct cstate *cs;
 	unsigned int ihl;

+	if (!comp->rstate) {
+		comp->sls_i_error++;
+		return slhc_toss(comp);
+	}
 	/* The packet is shorter than a legal IP header.
 	 * Also make sure isize is positive.
 	 */
-- 
2.43.0

^ permalink raw reply related

* Re: [PATCH bpf] bpf,tcp: avoid infinite recursion in BPF_SOCK_OPS_HDR_OPT_LEN_CB
From: KaFai Wan @ 2026-04-15 20:47 UTC (permalink / raw)
  To: Martin KaFai Lau, Jiayuan Chen
  Cc: bpf, Quan Sun, Yinhao Hu, Kaiyan Mei, Dongliang Mu, Eric Dumazet,
	Neal Cardwell, Kuniyuki Iwashima, David S. Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Jonathan Corbet, Shuah Khan,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa, David Ahern,
	netdev, linux-doc, linux-kernel
In-Reply-To: <2026415181939.1bue.martin.lau@linux.dev>

On Wed, 2026-04-15 at 11:55 -0700, Martin KaFai Lau wrote:
> On Tue, Apr 14, 2026 at 06:57:00PM +0800, Jiayuan Chen wrote:
> > A BPF_PROG_TYPE_SOCK_OPS program can set BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG
> > to inject custom TCP header options. When the kernel builds a TCP packet,
> > it calls tcp_established_options() to calculate the header size, which
> > invokes bpf_skops_hdr_opt_len() to trigger the BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > callback.
> > 
> > If the BPF program calls bpf_setsockopt(TCP_NODELAY) inside this callback,
> > __tcp_sock_set_nodelay() will call tcp_push_pending_frames(), which calls
> > tcp_current_mss(), which calls tcp_established_options() again,
> > re-triggering the same BPF callback. This creates an infinite recursion
> > that exhausts the kernel stack and causes a panic.
> > 
> > BPF_SOCK_OPS_HDR_OPT_LEN_CB
> >   -> bpf_setsockopt(TCP_NODELAY)
> > 	-> tcp_push_pending_frames()
> > 	  -> tcp_current_mss()
> > 		-> tcp_established_options()
> > 		  -> bpf_skops_hdr_opt_len()
> >                            /* infinite recursion */
> > 			-> BPF_SOCK_OPS_HDR_OPT_LEN_CB
> > 
> > A similar reentrancy issue exists for TCP congestion control, which is
> > guarded by tp->bpf_chg_cc_inprogress. Adopt the same approach: introduce
> > tp->bpf_hdr_opt_len_cb_inprogress, set it before invoking the callback in
> > bpf_skops_hdr_opt_len(), and check it in sol_tcp_sockopt() to reject
> > bpf_setsockopt(TCP_NODELAY) calls that would trigger
> > tcp_push_pending_frames() and cause the recursion.
> > 
> > Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> > Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> > Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> > Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
> > Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> 
> Thanks for the report and fixes suggested across different threads.
> 
> Using has_current_bpf_ctx() to avoid tcp_push_pending_frames() should
> work but it may change the expectation for bpf_setsockopt(TCP_NODELAY).
> e.g. A bpf_tcp_iter does bpf_setsockopt(TCP_NODELAY).
> 
> Adding another bit in the tcp_sock is not ideal either. I agree with
> Alexei that it is better to reuse the existing bit if we go down this path.
> We also need to audit more closely if there are cases that two different
> type of bpf progs may call bpf_setsockopt(). e.g.
> bpf_tcp_iter does bpf_setsockopt(TCP_CONGESTION) to switch
> to a bpf_tcp_cc and the new bpf_tcp_cc->init() will also do
> bpf_setsockopt(xxx) which then will be rejected.
> 
> Another fix could be, the bpf_setsockopt(TCP_NODELAY) is always broken
> for BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB unless
> the bpf prog is doing some maneuver to avoid the recursion. Thus,
> this use case is basically broken as is and I don't see a use case
> for bpf_setsockopt(TCP_NODELAY) when writing header also.
> How about checking the bpf_sock->op, level, and optname in
> bpf_sock_ops_setsockopt() and return -EOPNOTSUPP?

Hi Martin, thanks for the review.

I'm working whit return -EOPNOTSUPP. I've completed whit the code of fix and test, and will send the
patch later.

The fix is:

diff --git a/net/core/filter.c b/net/core/filter.c
index fcfcb72663ca..911ff04bca5a 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
 	if (!is_locked_tcp_sock_ops(bpf_sock))
 		return -EOPNOTSUPP;
 
+	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
+	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
+	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
+		return -EOPNOTSUPP;
+
 	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
 }

-- 
Thanks,
KaFai

^ permalink raw reply related

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Sam Edwards @ 2026-04-15 20:50 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <ad_o4aDP0UBY_8i4@shell.armlinux.org.uk>

On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
<linux@armlinux.org.uk> wrote:
>
> It's not a question about how I define RBU - this is defined by Synopsys
> and I'm using it *exactly* that way as stated in the documentation.
>
> "This bit indicates that the host owns the Next Descriptor in the
> Receive List and the DMA cannot acquire it. The Receive Process is
> suspended. ... This bit is set only when the previous Receive
> Descriptor is owned by the DMA."
>
> In other words, DMA has processed the previous receive descriptor which
> _was_ owned by the hardware, written back to clear the OWN bit, and
> then fetches the next descriptor and finds that the OWN bit is also
> clear.

I'm only trying to leave open the possibility that the Synopsys
technical writer and the hardware implementation team weren't
communicating clearly. We already have a situation where RPS isn't
behaving as documented (even if that's likely just hardware
misconfiguration), so while I'm currently pretty sure RBU carries no
other (actual) meaning than "DMA caught up to OWN=0," I'm only about
75% confident.

> > It would seem* that the kernel isn't really failing to keep up with
> > the packet rate. If RBU is firing with a ring that's not even close to
> > empty, that tells me there's another way for it to fire. So I suspect
> > the hardware designers implemented it to mean:
> > "We couldn't read the next descriptor, _or_ it wasn't owned by hardware."
> >
> > (* However, if bit 31 is clear everywhere, wouldn't that mean the ring
> > is actually completely depleted, not full? If count==budget, wouldn't
> > that mean the whole ring hasn't been visited, so we only refilled 64
> > entries and not necessarily the entire ring? Maybe the kernel isn't
> > keeping up after all.)
>
> Ah, I think that's where our terminology differs.
>
> You seem to define full as "populated with empty buffers". I define
> full to mean "the hardware has filled every buffer with a packet that
> it has received and handed it over to software to process." Note even
> the terminology there - filling buffers with data. That ultimately
> ends up filling the ring, and when completely filled, it is full.
>
> I think of buffers like buckets. If a buffer contains no data, it
> is empty. If a buffer contains data, it has been filled or is full.
> Apply that to a list of buffers and you get the same thing. Many
> ethernet driver documentation uses this same terminology, so I
> thought it would be widely understood.

Ah okay, I was beginning to suspect the same. In my defense: though I
also think of buffers in the same way, this driver calls the process
of supplying empty buffers "refilling," which is also the terminology
we've both been using throughout this exchange, and when something is
"completely refilled" I generally call it "full." But I'm realizing
now that the bidirectional (submissions+completions) nature of this
ring means that "full" and "empty" aren't really well-defined
concepts. I'll try to read more carefully (and switch to saying
"completely dirty" and "completely clean") going forward.

So the kernel is able to supply clean buffers without issue, but it
somehow falls behind the incoming packet rate and the DMA is left with
a completely dirty ring. I agree that stmmac_rx() is therefore just
not running fast enough: either it's got really bad scheduler jitter
for the ~6.3ms minimum it takes for 512x full-sized Ethernet frames to
arrive from the PHY (your scenario 1), or -- more likely -- the NAPI
budgets gradually fall behind the hardware (your scenario 2).

> Right, 40KiB. Sorry, I'm getting interrupted almost constantly while
> trying to do anything.
>
> However, I've tested with 0x7f in both fields, and it still falls flat
> on its face. I've also tried other values, but because I had to unplug
> the laptop from the nvidia board to use the laptop portably due to the
> medical emergency situation, that caused screen to quit, so I've lost
> all that. Chaos reigns supreme here :/

I'm sorry to hear about that, please prioritize you/yours and don't
feel like you owe me speedy replies.

> So, I'm not sure we understand what's going on - I don't think it's that
> the FIFOs are smaller than specified. I suspect that the 9KiB vs 36KiB
> results in some kind of throttling that prevents the condition which
> hangs the hardware.

I'll try playing with the FIFO configuration on my end to learn:
a) If a suitably-configured FIFO size makes the RPS status arrive as documented
b) If I can safely fill the FIFO slowly (by manually stalling the
driver and adding frames one at a time) and have it drain on resume
c) Whether the TQS value can be adjusted independently of this
problem's prevalence
d) The maximum RQS value that allows the problem to happen

> I'm not getting as much time as I'd like to really test out scenarios
> due to everything that is going on, and honestly I feel like just
> writing this week off now and giving up.

I have the same hardware, observe the same issue, and find this
interesting enough to keep plugging away at it. I would have no hard
feelings if you left me alone with this problem for a bit. :)

Be well,
Sam

^ permalink raw reply

* Re: [PATCH iwl-net] ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
From: Jacob Keller @ 2026-04-15 21:22 UTC (permalink / raw)
  To: Simon Horman, Petr Oros
  Cc: netdev, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Aleksandr Loktionov, Nikolay Aleksandrov, Daniel Zahka,
	Paul Greenwalt, Dave Ertman, Michal Swiatkowski, intel-wired-lan,
	linux-kernel
In-Reply-To: <20260415163003.GP772670@horms.kernel.org>

On 4/15/2026 9:30 AM, Simon Horman wrote:
> On Mon, Apr 13, 2026 at 09:14:20PM +0200, Petr Oros wrote:
>> On certain E810 configurations where firmware supports Tx scheduler
>> topology switching (tx_sched_topo_comp_mode_en), ice_cfg_tx_topo()
>> may need to apply a new 5-layer or 9-layer topology from the DDP
>> package. If the AQ command to set the topology fails (e.g. due to
>> invalid DDP data or firmware limitations), the global configuration
>> lock must still be cleared via a CORER reset.
>>
>> Commit 86aae43f21cf ("ice: don't leave device non-functional if Tx
>> scheduler config fails") correctly fixed this by refactoring
>> ice_cfg_tx_topo() to always trigger CORER after acquiring the global
>> lock and re-initialize hardware via ice_init_hw() afterwards.
>>
>> However, commit 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end
>> of deinit paths") later moved ice_init_dev_hw() into ice_init_hw(),
>> breaking the reinit path introduced by 86aae43f21cf. This creates an
>> infinite recursive call chain:
>>
>>   ice_init_hw()
>>     ice_init_dev_hw()
>>       ice_cfg_tx_topo()         # topology change needed
>>         ice_deinit_hw()
>>         ice_init_hw()           # reinit after CORER
>>           ice_init_dev_hw()     # recurse
>>             ice_cfg_tx_topo()
>>               ...               # stack overflow
>>
>> Fix by moving ice_init_dev_hw() back out of ice_init_hw() and calling
>> it explicitly from ice_probe() and ice_devlink_reinit_up(). The third
>> caller, ice_cfg_tx_topo(), intentionally does not need ice_init_dev_hw()
>> during its reinit, it only needs the core HW reinitialization. This
>> breaks the recursion cleanly without adding flags or guards.
>>
>> The deinit ordering changes from commit 8a37f9e2ff40 ("ice: move
>> ice_deinit_dev() to the end of deinit paths") which fixed slow rmmod
>> are preserved, only the init-side placement of ice_init_dev_hw() is
>> reverted.
>>
>> Fixes: 8a37f9e2ff40 ("ice: move ice_deinit_dev() to the end of deinit paths")
>> Signed-off-by: Petr Oros <poros@redhat.com>
> 
> Hi Petr,
> 
> I don't intended to delay this patch.
> But could you follow-up by looking over the AI generated
> review of this patch on sashiko.dev?
> 
> Thanks!

I'll take a look as well. I recently included this fix in Intel Wired
LAN update last night, so hopefully nothing too problematic...

Sashiko says:

> While this code wasn't introduced by this patch, the restructuring makes it
> more visible: can this cause a use-after-free if the nested hardware
> initialization fails?
> If ice_cfg_tx_topo() triggers a topology change, it performs a CORER reset
> followed by an unroll (ice_deinit_hw) and re-initialization (ice_init_hw). If
> that nested ice_init_hw() fails, its unroll path frees hw->port_info and
> destroys control queues and mutexes.
> Because ice_init_dev_hw() returns void, it swallows the -ENODEV error and
> falls back to safe mode. This allows ice_probe() to proceed with an unrolled
> and freed hardware struct, which would result in a use-after-free when memory
> like hw->port_info is accessed later.
> Should ice_init_dev_hw() be updated to return an error code so the caller can
> abort the probe when base hardware unrolls occur?

I think this suggestion might be good. We fail in probe if ice_init_hw()
fails regardless of "safe" mode, so having init_dev_hw() also fail if
the reinit fails makes some sense to me...

Thanks,
Jake

^ permalink raw reply

* Re: [PATCH iwl-net] ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw
From: Jacob Keller @ 2026-04-15 21:23 UTC (permalink / raw)
  To: Simon Horman, Petr Oros
  Cc: netdev, Tony Nguyen, Przemek Kitszel, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Aleksandr Loktionov, Nikolay Aleksandrov, Daniel Zahka,
	Paul Greenwalt, Dave Ertman, Michal Swiatkowski, intel-wired-lan,
	linux-kernel
In-Reply-To: <f30ad78e-1eb9-4c9d-9034-c8873966de66@intel.com>

On 4/15/2026 2:22 PM, Jacob Keller wrote:
> I'll take a look as well. I recently included this fix in Intel Wired
> LAN update last night, so hopefully nothing too problematic...
> 

Correction, and I need more caffeine: I think I had considered including
this fix but didn't quite make the cut last night when sending.

^ permalink raw reply

* [PATCH net] slip: fix slab-out-of-bounds write in slhc_uncompress()
From: Weiming Shi @ 2026-04-15 21:34 UTC (permalink / raw)
  To: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Andrew Morton, Hans Verkuil, Alex Deucher, Ian Rogers,
	Jonathan Cameron, Kees Cook, Ingo Molnar, Alan Cox, netdev,
	Weiming Shi, Simon Horman

sl_bump() reserves only 80 bytes of expansion headroom before calling
slhc_uncompress(), but the reconstructed IP + TCP header is up to
ip->ihl*4 + thp->doff*4 bytes. IHL and TCP doff are 4-bit fields and
both can legitimately reach 15, so the header can grow to 2*15*4 =
120 bytes. A VJ-uncompressed primer with ihl=15, doff=15 followed by
a compressed frame of size buffsize - 80 therefore writes up to
33 bytes past the kmalloc(buffsize + 4) rbuff allocation, with
attacker-controlled content:

 BUG: KASAN: slab-out-of-bounds in slhc_uncompress
 Write of size 1069 at addr ffff88800ba93078 by task kworker/u8:1/32
 Workqueue: events_unbound flush_to_ldisc
 Call Trace:
  __asan_memmove+0x3f/0x70
  slhc_uncompress (drivers/net/slip/slhc.c:614)
  slip_receive_buf (drivers/net/slip/slip.c:342)
  tty_ldisc_receive_buf
  flush_to_ldisc

Raise the reservation to match the real worst case. The ppp_generic
receive path already enforces skb_tailroom >= 124 and is unaffected.

Fixes: b5451d783ade ("slip: Move the SLIP drivers")
Reported-by: Simon Horman <horms@kernel.org>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
---
 drivers/net/slip/slip.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/slip/slip.c b/drivers/net/slip/slip.c
index 820e1a8fc9560..37af7cbe7f81d 100644
--- a/drivers/net/slip/slip.c
+++ b/drivers/net/slip/slip.c
@@ -333,9 +333,13 @@ static void sl_bump(struct slip *sl)
 				printk(KERN_WARNING "%s: compressed packet ignored\n", dev->name);
 				return;
 			}
-			/* make sure we've reserved enough space for uncompress
-			   to use */
-			if (count + 80 > sl->buffsize) {
+			/* slhc_uncompress() prepends up to
+			 * ip->ihl * 4 + thp->doff * 4 bytes of reconstructed
+			 * IPv4 + TCP header. IHL and doff are 4-bit fields
+			 * (max 15) counting 4-byte units, so the header is
+			 * at most 2 * 15 * 4 = 120 bytes.
+			 */
+			if (count + 2 * 15 * 4 > sl->buffsize) {
 				dev->stats.rx_over_errors++;
 				return;
 			}
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH v5 8/9] driver core: Replace dev->of_node_reused with dev_of_node_reused()
From: Rob Herring (Arm) @ 2026-04-15 22:10 UTC (permalink / raw)
  To: Douglas Anderson
  Cc: astewart, linux-arm-kernel, Mark Brown, bhelgaas, maz, linux,
	kees, Alan Stern, Saravana Kannan, netdev, linux-serial, davem,
	andrew, Greg Kroah-Hartman, brgl, jirislaby, mani, Johan Hovold,
	linux-aspeed, linux-pci, kuba, Alexander Lobakin, Leon Romanovsky,
	andriy.shevchenko, Rafael J . Wysocki, Alexey Kardashevskiy,
	lgirdwood, andrew, hkallweit1, linux-kernel, Danilo Krummrich,
	Eric Dumazet, linux-usb, alexander.stein, Robin Murphy, pabeni,
	devicetree, driver-core, joel, Christoph Hellwig
In-Reply-To: <20260406162231.v5.8.I806b8636cd3724f6cd1f5e199318ab8694472d90@changeid>


On Mon, 06 Apr 2026 16:23:01 -0700, Douglas Anderson wrote:
> In C, bitfields are not necessarily safe to modify from multiple
> threads without locking. Switch "of_node_reused" over to the "flags"
> field so modifications are safe.
> 
> Cc: Johan Hovold <johan@kernel.org>
> Acked-by: Mark Brown <broonie@kernel.org>
> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
> Reviewed-by: Danilo Krummrich <dakr@kernel.org>
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---
> Not fixing any known bugs; problem is theoretical and found by code
> inspection. Change is done somewhat manually and only lightly tested
> (mostly compile-time tested).
> 
> (no changes since v4)
> 
> Changes in v4:
> - Use accessor functions for flags
> 
> Changes in v3:
> - New
> 
>  drivers/base/core.c                      | 2 +-
>  drivers/base/pinctrl.c                   | 2 +-
>  drivers/base/platform.c                  | 2 +-
>  drivers/net/pcs/pcs-xpcs-plat.c          | 2 +-
>  drivers/of/device.c                      | 6 +++---
>  drivers/pci/of.c                         | 2 +-
>  drivers/pci/pwrctrl/core.c               | 2 +-
>  drivers/regulator/bq257xx-regulator.c    | 2 +-
>  drivers/regulator/rk808-regulator.c      | 2 +-
>  drivers/tty/serial/serial_base_bus.c     | 2 +-
>  drivers/usb/gadget/udc/aspeed-vhub/dev.c | 2 +-
>  include/linux/device.h                   | 7 ++++---
>  12 files changed, 17 insertions(+), 16 deletions(-)
> 

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>


^ permalink raw reply

* Re: [PATCH RFC 7/8] clk: sunxi-ng: a733: Add bus clock gates
From: Andre Przywara @ 2026-04-15 22:14 UTC (permalink / raw)
  To: Junhui Liu, Michael Turquette, Stephen Boyd, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Chen-Yu Tsai, Jernej Skrabec,
	Samuel Holland, Philipp Zabel, Paul Walmsley, Palmer Dabbelt,
	Albert Ou, Alexandre Ghiti, Richard Cochran
  Cc: linux-clk, devicetree, linux-arm-kernel, linux-sunxi,
	linux-kernel, linux-riscv, netdev
In-Reply-To: <20260310-a733-clk-v1-7-36b4e9b24457@pigmoral.tech>

Hi,

cheekily jumping in here, for the parts that are easy to verify ;-)

In general this series looks very good, and many thanks for splitting
this up in reviewable chunks, that's much appreciated!

On 3/10/26 09:34, Junhui Liu wrote:
> Add the bus clock gates that control access to the devices' register
> interface on the Allwinner A733 SoC. These clocks are typically
> single-bit controls in the BGR registers, covering UARTs, SPI, I2C, and
> various multimedia engines. It also includes bus gates for system
> components like the IOMMU and MSI-lite interfaces.
> 
> Signed-off-by: Junhui Liu <junhui.liu@pigmoral.tech>
> 
> ---
> The parents of some bus clocks are difficult to determine, as the user
> manual only describes the clock source for a few instances. The current
> configurations are based on references to previous Allwinner SoCs and
> information gathered from the manual. Where documentation is lacking,
> vendor practices are followed by setting the parent to "hosc" for now.
> ---
>   drivers/clk/sunxi-ng/ccu-sun60i-a733.c | 475 ++++++++++++++++++++++++++++++++-
>   1 file changed, 474 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/clk/sunxi-ng/ccu-sun60i-a733.c b/drivers/clk/sunxi-ng/ccu-sun60i-a733.c
> index 36b44568a56f..c0b09f9197d1 100644
> --- a/drivers/clk/sunxi-ng/ccu-sun60i-a733.c
> +++ b/drivers/clk/sunxi-ng/ccu-sun60i-a733.c
> @@ -408,16 +408,19 @@ static SUNXI_CCU_M_DATA_WITH_MUX(ahb_clk, "ahb", ahb_apb_parents, 0x500,
>   				 0, 5,		/* M */
>   				 24, 2,		/* mux */
>   				 0);
> +static const struct clk_hw *ahb_hws[] = { &ahb_clk.common.hw };
>   
>   static SUNXI_CCU_M_DATA_WITH_MUX(apb0_clk, "apb0", ahb_apb_parents, 0x510,
>   				 0, 5,		/* M */
>   				 24, 2,		/* mux */
>   				 0);
> +static const struct clk_hw *apb0_hws[] = { &apb0_clk.common.hw };
>   
>   static SUNXI_CCU_M_DATA_WITH_MUX(apb1_clk, "apb1", ahb_apb_parents, 0x518,
>   				 0, 5,		/* M */
>   				 24, 2,		/* mux */
>   				 0);
> +static const struct clk_hw *apb1_hws[] = { &apb1_clk.common.hw };
>   
>   static const struct clk_parent_data apb_uart_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -430,6 +433,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX(apb_uart_clk, "apb-uart", apb_uart_parents, 0x5
>   				 0, 5,		/* M */
>   				 24, 3,		/* mux */
>   				 0);
> +static const struct clk_hw *apb_uart_hws[] = {
> +	&apb_uart_clk.common.hw
> +};
>   
>   static const struct clk_parent_data trace_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -463,6 +469,8 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(cpu_peri_clk, "cpu-peri", gic_cpu_peri_par
>   				      BIT(31),	/* gate */
>   				      0);
>   
> +static SUNXI_CCU_GATE_DATA(bus_its_pcie_clk, "bus-its-pcie", hosc, 0x574, BIT(1), 0);
> +
>   static const struct clk_parent_data nsi_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
>   	{ .hw = &pll_ddr_clk.common.hw },
> @@ -477,6 +485,7 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT(nsi_clk, "nsi", nsi_parents, 0x580,
>   					    24, 3,	/* mux */
>   					    BIT(31),	/* gate */
>   					    0, CCU_FEATURE_UPDATE_BIT);
> +static SUNXI_CCU_GATE_DATA(bus_nsi_clk, "bus-nsi", hosc, 0x584, BIT(0), 0);
>   
>   static const struct clk_parent_data mbus_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -493,9 +502,117 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT(mbus_clk, "mbus", mbus_parents, 0x58
>   					    BIT(31),	/* gate */
>   					    CLK_IS_CRITICAL,
>   					    CCU_FEATURE_UPDATE_BIT);
> +static const struct clk_hw *mbus_hws[] = { &mbus_clk.common.hw };
> +
> +static SUNXI_CCU_GATE_HWS(mbus_iommu0_sys_clk, "mbus-iommu0-sys", mbus_hws, 0x58c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(apb_iommu0_sys_clk, "apb-iommu0-sys", apb0_hws, 0x58c, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_iommu0_sys_clk, "ahb-iommu0-sys", ahb_hws, 0x58c, BIT(2), 0);
> +
> +static SUNXI_CCU_GATE_DATA(bus_msi_lite0_clk, "bus-msi-lite0", hosc, 0x594, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_msi_lite1_clk, "bus-msi-lite1", hosc, 0x59c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_msi_lite2_clk, "bus-msi-lite2", hosc, 0x5a4, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(mbus_iommu1_sys_clk, "mbus-iommu1-sys", mbus_hws, 0x5b4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(apb_iommu1_sys_clk, "apb_iommu1-sys", apb0_hws, 0x5b4, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_iommu1_sys_clk, "ahb_iommu1-sys", ahb_hws, 0x5b4, BIT(2), 0);
> +
> +static SUNXI_CCU_GATE_HWS(ahb_ve_dec_clk, "ahb-ve-dec", ahb_hws,
> +			  0x5c0, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_ve_enc_clk, "ahb-ve-enc", ahb_hws,
> +			  0x5c0, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_vid_in_clk, "ahb-vid-in", ahb_hws,
> +			  0x5c0, BIT(2), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_vid_cout0_clk, "ahb-vid-cout0", ahb_hws,
> +			  0x5c0, BIT(3), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_vid_cout1_clk, "ahb-vid-cout1", ahb_hws,
> +			  0x5c0, BIT(4), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_de_clk, "ahb-de", ahb_hws,
> +			  0x5c0, BIT(5), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_npu_clk, "ahb-npu", ahb_hws,
> +			  0x5c0, BIT(6), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_gpu0_clk, "ahb-gpu0", ahb_hws,
> +			  0x5c0, BIT(7), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_serdes_clk, "ahb-serdes", ahb_hws,
> +			  0x5c0, BIT(8), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_usb_sys_clk, "ahb-usb-sys", ahb_hws,
> +			  0x5c0, BIT(9), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_msi_lite0_clk, "ahb-msi-lite0", ahb_hws,
> +			  0x5c0, BIT(16), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_store_clk, "ahb-store", ahb_hws,
> +			  0x5c0, BIT(24), 0);
> +static SUNXI_CCU_GATE_HWS(ahb_cpus_clk, "ahb-cpus", ahb_hws,
> +			  0x5c0, BIT(28), 0);
> +
> +static SUNXI_CCU_GATE_HWS(mbus_iommu0_clk, "mbus-iommu0", mbus_hws,
> +			  0x5e0, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_iommu1_clk, "mbus-iommu1", mbus_hws,
> +			  0x5e0, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_desys_clk, "mbus-desys", mbus_hws,
> +			  0x5e0, BIT(11), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_enc_gate_clk, "mbus-ve-enc-gate", mbus_hws,
> +			  0x5e0, BIT(12), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_dec_gate_clk, "mbus-ve-dec-gate", mbus_hws,
> +			  0x5e0, BIT(14), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_gpu0_clk, "mbus-gpu0", mbus_hws,
> +			  0x5e0, BIT(16), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_npu_clk, "mbus-npu", mbus_hws,
> +			  0x5e0, BIT(18), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_vid_in_clk, "mbus-vid-in", mbus_hws,
> +			  0x5e0, BIT(24), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_serdes_clk, "mbus-serdes", mbus_hws,
> +			  0x5e0, BIT(28), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_msi_lite0_clk, "mbus-msi-lite0", mbus_hws,
> +			  0x5e0, BIT(29), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_store_clk, "mbus-store", mbus_hws,
> +			  0x5e0, BIT(30), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_msi_lite2_clk, "mbus-msi-lite2", mbus_hws,
> +			  0x5e0, BIT(31), 0);
> +
> +static SUNXI_CCU_GATE_HWS(mbus_dma0_clk, "mbus-dma0", mbus_hws,
> +			  0x5e4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_enc_clk, "mbus-ve-enc", mbus_hws,
> +			  0x5e4, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ce_clk, "mbus-ce", mbus_hws,
> +			  0x5e4, BIT(2), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_dma1_clk, "mbus-dma1", mbus_hws,
> +			  0x5e4, BIT(3), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_nand_clk, "mbus-nand", mbus_hws,
> +			  0x5e4, BIT(5), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_csi_clk, "mbus-csi", mbus_hws,
> +			  0x5e4, BIT(8), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_isp_clk, "mbus-isp", mbus_hws,
> +			  0x5e4, BIT(9), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_gmac0_clk, "mbus-gmac0", mbus_hws,
> +			  0x5e4, BIT(11), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_gmac1_clk, "mbus-gmac1", mbus_hws,
> +			  0x5e4, BIT(12), 0);
> +static SUNXI_CCU_GATE_HWS(mbus_ve_dec_clk, "mbus-ve-dec", mbus_hws,
> +			  0x5e4, BIT(18), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_dma0_clk, "bus-dma0", ahb_hws,
> +			  0x704, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_dma1_clk, "bus-dma1", ahb_hws,
> +			  0x70c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_spinlock_clk, "bus-spinlock", ahb_hws,
> +			  0x724, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_msgbox_clk, "bus-msgbox", ahb_hws,
> +			  0x744, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_pwm0_clk, "bus-pwm0", apb0_hws,
> +			  0x784, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_pwm1_clk, "bus-pwm1", apb0_hws,
> +			  0x78c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_dbg_clk, "bus-dbg", sys_24M_hws,
> +			  0x7a4, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_sysdap_clk, "bus-sysdap", apb1_hws,
> +			  0x88c, BIT(0), 0);
>   
>   /**************************************************************************
> - *                          mod clocks                                    *
> + *                          mod clocks with gates                         *
>    **************************************************************************/
>   
>   static const struct clk_parent_data timer_parents[] = {
> @@ -565,6 +682,7 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE(timer9_clk, "timer9", timer_parents, 0x82
>   				       24, 3,		/* mux */
>   				       BIT(31),		/* gate */
>   				       0);
> +static SUNXI_CCU_GATE_HWS(bus_timer_clk, "bus-timer", ahb_hws, 0x850, BIT(0), 0);
>   
>   static const struct clk_parent_data avs_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -589,6 +707,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(de_clk, "de", de_parents, 0xa00,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_de_clk, "bus-de", ahb_hws, 0xa04, BIT(0), 0);
>   
>   static const struct clk_hw *di_parents[] = {
>   	&pll_periph0_600M_clk.hw,
> @@ -602,6 +721,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(di_clk, "di", di_parents, 0xa20,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_di_clk, "bus-di", ahb_hws, 0xa24, BIT(0), 0);
>   
>   static const struct clk_hw *g2d_parents[] = {
>   	&pll_periph0_400M_clk.hw,
> @@ -614,6 +734,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(g2d_clk, "g2d", g2d_parents, 0xa40,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_g2d_clk, "bus-g2d", ahb_hws, 0xa44, BIT(0), 0);
>   
>   static const struct clk_hw *eink_parents[] = {
>   	&pll_periph0_480M_clk.common.hw,
> @@ -637,6 +758,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(eink_panel_clk, "eink-panel", eink_panel_par
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
> +static SUNXI_CCU_GATE_HWS(bus_eink_clk, "bus-eink", ahb_hws, 0xa6c, BIT(0), 0);
>   
>   static const struct clk_hw *ve_enc_parents[] = {
>   	&pll_ve0_clk.common.hw,
> @@ -668,6 +790,9 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(ve_dec_clk, "ve-dec", ve_dec_parents, 0xa88,
>   				    BIT(31),	/* gate */
>   				    CLK_SET_RATE_PARENT);
>   
> +static SUNXI_CCU_GATE_HWS(bus_ve_enc_clk, "bus-ve-enc", ahb_hws, 0xa8c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ve_dec_clk, "bus-ve-dec", ahb_hws, 0xa8c, BIT(2), 0);
> +
>   static const struct clk_hw *ce_parents[] = {
>   	&sys_24M_clk.hw,
>   	&pll_periph0_400M_clk.hw,
> @@ -678,6 +803,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(ce_clk, "ce", ce_parents, 0xac0,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_ce_clk, "bus-ce", ahb_hws, 0xac4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ce_sys_clk, "bus-ce-sys", ahb_hws, 0xac4, BIT(1), 0);
>   
>   static const struct clk_hw *npu_parents[] = {
>   	&pll_npu_clk.common.hw,
> @@ -693,6 +820,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(npu_clk, "npu", npu_parents, 0xb00,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_DATA(bus_npu_clk, "bus-npu", hosc, 0xb04, BIT(0), 0);
>   
>   /*
>    * GPU_CLK = ClockSource * ((16 - M) / 16)
> @@ -725,6 +853,7 @@ static struct ccu_div gpu_clk = {
>   							   &ccu_div_ops, 0),
>   	}
>   };
> +static SUNXI_CCU_GATE_HWS(bus_gpu_clk, "bus-gpu", ahb_hws, 0xb24, BIT(0), 0);
>   
>   static const struct clk_parent_data dram_parents[] = {
>   	{ .hw = &pll_ddr_clk.common.hw, },
> @@ -740,6 +869,7 @@ static SUNXI_CCU_MP_DATA_WITH_MUX_GATE_FEAT(dram_clk, "dram", dram_parents, 0xc0
>   					    BIT(31),	/* gate */
>   					    CLK_IS_CRITICAL,
>   					    CCU_FEATURE_UPDATE_BIT);
> +static SUNXI_CCU_GATE_HWS(bus_dram_clk, "bus-dram", ahb_hws, 0xc0c, BIT(0), 0);
>   
>   static const struct clk_parent_data nand_mmc_parents[] = {
>   	{ .hw = &sys_24M_clk.hw, },
> @@ -758,6 +888,7 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(nand1_clk, "nand1", nand_mmc_parents, 0xc8
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_nand_clk, "bus-nand", ahb_hws, 0xc8c, BIT(0), 0);
>   
>   static SUNXI_CCU_MP_MUX_GATE_POSTDIV_DUALDIV(mmc0_clk, "mmc0", nand_mmc_parents, 0xd00,
>   					     0, 5,	/* M */
> @@ -796,6 +927,11 @@ static SUNXI_CCU_MP_MUX_GATE_POSTDIV_DUALDIV(mmc3_clk, "mmc3", mmc2_mmc3_parents
>   					     2,		/* post div */
>   					     0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_mmc0_clk, "bus-mmc0", ahb_hws, 0xd0c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_mmc1_clk, "bus-mmc1", ahb_hws, 0xd1c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_mmc2_clk, "bus-mmc2", ahb_hws, 0xd2c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_mmc3_clk, "bus-mmc3", ahb_hws, 0xd3c, BIT(0), 0);
> +
>   static const struct clk_hw *ufs_axi_parents[] = {
>   	&pll_periph0_300M_clk.hw,
>   	&pll_periph0_200M_clk.hw,
> @@ -815,6 +951,29 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(ufs_cfg_clk, "ufs-cfg", ufs_cfg_parents, 0
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_DATA(bus_ufs_clk, "bus-ufs", hosc, 0xd8c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_uart0_clk, "bus-uart0", apb_uart_hws, 0xe00, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart1_clk, "bus-uart1", apb_uart_hws, 0xe04, BIT(1), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart2_clk, "bus-uart2", apb_uart_hws, 0xe08, BIT(2), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart3_clk, "bus-uart3", apb_uart_hws, 0xe0c, BIT(3), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart4_clk, "bus-uart4", apb_uart_hws, 0xe10, BIT(4), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart5_clk, "bus-uart5", apb_uart_hws, 0xe14, BIT(5), 0);
> +static SUNXI_CCU_GATE_HWS(bus_uart6_clk, "bus-uart6", apb_uart_hws, 0xe18, BIT(6), 0);

According to the manual the gate bits are always BIT(0), since each
UART has its own bus gate register.

> +static SUNXI_CCU_GATE_HWS(bus_i2c0_clk, "bus-i2c0", apb1_hws, 0xe80, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c1_clk, "bus-i2c1", apb1_hws, 0xe84, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c2_clk, "bus-i2c2", apb1_hws, 0xe88, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c3_clk, "bus-i2c3", apb1_hws, 0xe8c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c4_clk, "bus-i2c4", apb1_hws, 0xe90, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c5_clk, "bus-i2c5", apb1_hws, 0xe94, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c6_clk, "bus-i2c6", apb1_hws, 0xe98, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c7_clk, "bus-i2c7", apb1_hws, 0xe9c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c8_clk, "bus-i2c8", apb1_hws, 0xea0, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c9_clk, "bus-i2c9", apb1_hws, 0xea4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c10_clk, "bus-i2c10", apb1_hws, 0xea8, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c11_clk, "bus-i2c11", apb1_hws, 0xeac, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_i2c12_clk, "bus-i2c12", apb1_hws, 0xeb0, BIT(0), 0);
>   
>   static const struct clk_parent_data spi_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -856,6 +1015,11 @@ static SUNXI_CCU_DUALDIV_MUX_GATE(spi4_clk, "spi4", spi_parents, 0xf28,
>   				  24, 3,	/* mux */
>   				  BIT(31),	/* gate */
>   				  0);
> +static SUNXI_CCU_GATE_HWS(bus_spi0_clk, "bus-spi0", ahb_hws, 0xf04, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi1_clk, "bus-spi1", ahb_hws, 0xf0c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi2_clk, "bus-spi2", ahb_hws, 0xf14, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi3_clk, "bus-spi3", ahb_hws, 0xf24, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_spi4_clk, "bus-spi4", ahb_hws, 0xf2c, BIT(0), 0);
>   
>   static const struct clk_parent_data spif_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -873,6 +1037,7 @@ static SUNXI_CCU_DUALDIV_MUX_GATE(spif_clk, "spif", spif_parents, 0xf18,
>   				  24, 3,	/* mux */
>   				  BIT(31),	/* gate */
>   				  0);
> +static SUNXI_CCU_GATE_HWS(bus_spif_clk, "bus-spif", ahb_hws, 0xf1c, BIT(0), 0);

Can you please move that line into the other SPI gates above, so that
it is ordered by address?

>   
>   static const struct clk_parent_data gpadc_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -883,6 +1048,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(gpadc_clk, "gpadc", gpadc_parents, 0xfc0,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_gpadc_clk, "bus-gpadc", ahb_hws, 0xfc4, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_ths_clk, "bus-ths", apb0_hws, 0xfe4, BIT(0), 0);
>   
>   static const struct clk_parent_data irrx_parents[] = {
>   	{ .fw_name = "losc"},
> @@ -894,6 +1062,7 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(irrx_clk, "irrx", irrx_parents, 0x1000,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_irrx_clk, "bus-irrx", apb0_hws, 0x1004, BIT(0), 0);
>   
>   static const struct clk_parent_data irtx_parents[] = {
>   	{ .fw_name = "losc"},
> @@ -905,6 +1074,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(irtx_clk, "irtx", irtx_parents, 0x1008,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_irtx_clk, "bus-irtx", apb0_hws, 0x100c, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_lradc_clk, "bus-lradc", apb0_hws, 0x1024, BIT(0), 0);
>   
>   static const struct clk_parent_data sgpio_parents[] = {
>   	{ .fw_name = "losc"},
> @@ -915,6 +1087,7 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(sgpio_clk, "sgpio", sgpio_parents, 0x1060,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_DATA(bus_sgpio_clk, "bus-sgpio", hosc, 0x1064, BIT(0), 0);
>   
>   static const struct clk_hw *lpc_parents[] = {
>   	&pll_video0_3x_clk.common.hw,
> @@ -927,6 +1100,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(lpc_clk, "lpc", lpc_parents, 0x1080,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_DATA(bus_lpc_clk, "bus-lpc", hosc, 0x1084, BIT(0), 0);

where do these two clocks come from? They are not mentioned in the 
version of the manual I am looking at. If they come from BSP sources, 
please add a comment about that.

>   
>   static const struct clk_hw *i2spcm_parents[] = {
>   	&pll_audio0_4x_clk.common.hw,
> @@ -959,6 +1133,11 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(i2spcm4_clk, "i2spcm4", i2spcm_parents, 0x12
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm0_clk, "bus-i2spcm0", hosc, 0x120c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm1_clk, "bus-i2spcm1", hosc, 0x121c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm2_clk, "bus-i2spcm2", hosc, 0x122c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm3_clk, "bus-i2spcm3", hosc, 0x123c, BIT(0), 0);
> +static SUNXI_CCU_GATE_DATA(bus_i2spcm4_clk, "bus-i2spcm4", hosc, 0x124c, BIT(0), 0);
>   
>   static const struct clk_hw *i2spcm2_asrc_parents[] = {
>   	&pll_audio0_4x_clk.common.hw,
> @@ -995,6 +1174,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(owa_rx_clk, "owa_rx", owa_rx_parents, 0x1284
>   				    BIT(31),	/* gate */
>   				    0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_owa_clk, "bus-owa", apb1_hws, 0x128c, BIT(0), 0);

In mainline we use "spdif" instead of "owa", compare the other drivers.

> +
>   static const struct clk_hw *dmic_parents[] = {
>   	&pll_audio0_4x_clk.common.hw,
>   	&pll_audio1_div2_clk.common.hw,
> @@ -1006,6 +1187,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(dmic_clk, "dmic", dmic_parents, 0x12c0,
>   				    BIT(31),	/* gate */
>   				    0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_dmic_clk, "bus-dmic", apb1_hws, 0x12cc, BIT(0), 0);
> +
>   /*
>    * The first parent is a 48 MHz input clock divided by 4. That 48 MHz clock is
>    * a 2x multiplier from pll-ref synchronized by pll-periph0, and is also used by
> @@ -1037,6 +1220,9 @@ static struct ccu_mux usb_ohci0_clk = {
>   							   &ccu_mux_ops, 0),
>   	},
>   };
> +static SUNXI_CCU_GATE_HWS(bus_ohci0_clk, "bus-ohci0", ahb_hws, 0x1304, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ehci0_clk, "bus-ehci0", ahb_hws, 0x1304, BIT(4), 0);
> +static SUNXI_CCU_GATE_HWS(bus_otg_clk, "bus-otg", ahb_hws, 0x1304, BIT(8), 0);
>   
>   static struct ccu_mux usb_ohci1_clk = {
>   	.enable		= BIT(31),
> @@ -1053,6 +1239,8 @@ static struct ccu_mux usb_ohci1_clk = {
>   							   &ccu_mux_ops, 0),
>   	},
>   };
> +static SUNXI_CCU_GATE_HWS(bus_ohci1_clk, "bus-ohci1", ahb_hws, 0x130c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_ehci1_clk, "bus-ehci1", ahb_hws, 0x130c, BIT(4), 0);
>   
>   static const struct clk_parent_data usb_ref_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -1159,6 +1347,8 @@ static SUNXI_CCU_M_HWS_WITH_GATE(gmac1_phy_clk, "gmac1-phy", pll_periph0_150M_hw
>   				 0, 5,		/* M */
>   				 BIT(31),	/* gate */
>   				 0);
> +static SUNXI_CCU_GATE_HWS(bus_gmac0_clk, "bus-gmac0", ahb_hws, 0x141c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_gmac1_clk, "bus-gmac1", ahb_hws, 0x142c, BIT(0), 0);

That GMAC1 clock is not in the manual, where does it come from?

>   
>   static const struct clk_hw *tcon_lcd_parents[] = {
>   	&pll_video0_4x_clk.common.hw,
> @@ -1181,6 +1371,9 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(tcon_lcd2_clk, "tcon-lcd2", tcon_lcd_parents
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_lcd0_clk, "bus-tcon-lcd0", ahb_hws, 0x1504, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_lcd1_clk, "bus-tcon-lcd1", ahb_hws, 0x150c, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_lcd2_clk, "bus-tcon-lcd2", ahb_hws, 0x1514, BIT(0), 0);

Same here, LCD2 is not listed.

The rest looks alright when comparing to the manual, also the whole
boilerplate with the SUNXI_CC_GATE_HWS macro, the list of hw clocks
below and the assignment of the clock IDs to the clocks.

Cheers,
Andre

>   
>   static const struct clk_hw *dsi_parents[] = {
>   	&sys_24M_clk.hw,
> @@ -1197,6 +1390,8 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(dsi1_clk, "dsi1", dsi_parents, 0x1588,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_dsi0_clk, "bus-dsi0", ahb_hws, 0x1584, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_dsi1_clk, "bus-dsi1", ahb_hws, 0x158c, BIT(0), 0);
>   
>   static const struct clk_hw *combphy_parents[] = {
>   	&pll_video0_4x_clk.common.hw,
> @@ -1216,6 +1411,9 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(combphy1_clk, "combphy1", combphy_parents, 0
>   				    BIT(31),	/* gate */
>   				    0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_tcon_tv0_clk, "bus-tcon-tv0", ahb_hws, 0x1604, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_tcon_tv1_clk, "bus-tcon-tv1", ahb_hws, 0x160c, BIT(0), 0);
> +
>   static const struct clk_hw *edp_tv_parents[] = {
>   	&pll_video0_4x_clk.common.hw,
>   	&pll_video1_4x_clk.common.hw,
> @@ -1227,6 +1425,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(edp_tv_clk, "edp-tv", edp_tv_parents, 0x1640
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_edp_tv_clk, "bus-edp-tv", ahb_hws, 0x164c, BIT(0), 0);
>   
>   static SUNXI_CCU_GATE_HWS_WITH_PREDIV(hdmi_cec_32k_clk, "hdmi-cec-32k", pll_periph0_2x_hws, 0x1680,
>   				      BIT(30),	/* gate */
> @@ -1254,6 +1453,7 @@ static SUNXI_CCU_DUALDIV_MUX_GATE(hdmi_tv_clk, "hdmi-tv", hdmi_tv_parents, 0x168
>   				  24, 3,	/* mux */
>   				  BIT(31),	/* gate */
>   				  0);
> +static SUNXI_CCU_GATE_HWS(bus_hdmi_tv_clk, "bus-hdmi-tv", ahb_hws, 0x168c, BIT(0), 0);
>   
>   static const struct clk_parent_data hdmi_sfr_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -1266,6 +1466,9 @@ static SUNXI_CCU_MUX_DATA_WITH_GATE(hdmi_sfr_clk, "hdmi-sfr", hdmi_sfr_parents,
>   
>   static SUNXI_CCU_GATE_HWS(hdmi_esm_clk, "hdmi-esm", pll_periph0_300M_hws, 0x1694, BIT(31), 0);
>   
> +static SUNXI_CCU_GATE_HWS(bus_dpss_top0_clk, "bus-dpss-top0", ahb_hws, 0x16c4, BIT(0), 0);
> +static SUNXI_CCU_GATE_HWS(bus_dpss_top1_clk, "bus-dpss-top1", ahb_hws, 0x16cc, BIT(0), 0);
> +
>   static const struct clk_parent_data ledc_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
>   	{ .hw = &pll_periph0_600M_clk.hw },
> @@ -1276,6 +1479,9 @@ static SUNXI_CCU_M_DATA_WITH_MUX_GATE(ledc_clk, "ledc", ledc_parents, 0x1700,
>   				      24, 3,	/* mux */
>   				      BIT(31),	/* gate */
>   				      0);
> +static SUNXI_CCU_GATE_HWS(bus_ledc_clk, "bus-ledc", apb0_hws, 0x1704, BIT(0), 0);
> +
> +static SUNXI_CCU_GATE_HWS(bus_dsc_clk, "bus-dsc", ahb_hws, 0x1744, BIT(0), 0);
>   
>   static const struct clk_parent_data csi_master_parents[] = {
>   	{ .hw = &sys_24M_clk.hw },
> @@ -1317,6 +1523,7 @@ static SUNXI_CCU_M_HW_WITH_MUX_GATE(csi_clk, "csi", csi_parents, 0x1840,
>   				    24, 3,	/* mux */
>   				    BIT(31),	/* gate */
>   				    0);
> +static SUNXI_CCU_GATE_HWS(bus_csi_clk, "bus-csi", ahb_hws, 0x1844, BIT(0), 0);
>   
>   static const struct clk_hw *isp_parents[] = {
>   	&pll_video2_4x_clk.common.hw,
> @@ -1446,8 +1653,62 @@ static struct ccu_common *sun60i_a733_ccu_clks[] = {
>   	&trace_clk.common,
>   	&gic_clk.common,
>   	&cpu_peri_clk.common,
> +	&bus_its_pcie_clk.common,
>   	&nsi_clk.common,
> +	&bus_nsi_clk.common,
>   	&mbus_clk.common,
> +	&mbus_iommu0_sys_clk.common,
> +	&apb_iommu0_sys_clk.common,
> +	&ahb_iommu0_sys_clk.common,
> +	&bus_msi_lite0_clk.common,
> +	&bus_msi_lite1_clk.common,
> +	&bus_msi_lite2_clk.common,
> +	&mbus_iommu1_sys_clk.common,
> +	&apb_iommu1_sys_clk.common,
> +	&ahb_iommu1_sys_clk.common,
> +	&ahb_ve_dec_clk.common,
> +	&ahb_ve_enc_clk.common,
> +	&ahb_vid_in_clk.common,
> +	&ahb_vid_cout0_clk.common,
> +	&ahb_vid_cout1_clk.common,
> +	&ahb_de_clk.common,
> +	&ahb_npu_clk.common,
> +	&ahb_gpu0_clk.common,
> +	&ahb_serdes_clk.common,
> +	&ahb_usb_sys_clk.common,
> +	&ahb_msi_lite0_clk.common,
> +	&ahb_store_clk.common,
> +	&ahb_cpus_clk.common,
> +	&mbus_iommu0_clk.common,
> +	&mbus_iommu1_clk.common,
> +	&mbus_desys_clk.common,
> +	&mbus_ve_enc_gate_clk.common,
> +	&mbus_ve_dec_gate_clk.common,
> +	&mbus_gpu0_clk.common,
> +	&mbus_npu_clk.common,
> +	&mbus_vid_in_clk.common,
> +	&mbus_serdes_clk.common,
> +	&mbus_msi_lite0_clk.common,
> +	&mbus_store_clk.common,
> +	&mbus_msi_lite2_clk.common,
> +	&mbus_dma0_clk.common,
> +	&mbus_ve_enc_clk.common,
> +	&mbus_ce_clk.common,
> +	&mbus_dma1_clk.common,
> +	&mbus_nand_clk.common,
> +	&mbus_csi_clk.common,
> +	&mbus_isp_clk.common,
> +	&mbus_gmac0_clk.common,
> +	&mbus_gmac1_clk.common,
> +	&mbus_ve_dec_clk.common,
> +	&bus_dma0_clk.common,
> +	&bus_dma1_clk.common,
> +	&bus_spinlock_clk.common,
> +	&bus_msgbox_clk.common,
> +	&bus_pwm0_clk.common,
> +	&bus_pwm1_clk.common,
> +	&bus_dbg_clk.common,
> +	&bus_sysdap_clk.common,
>   	&timer0_clk.common,
>   	&timer1_clk.common,
>   	&timer2_clk.common,
> @@ -1458,48 +1719,111 @@ static struct ccu_common *sun60i_a733_ccu_clks[] = {
>   	&timer7_clk.common,
>   	&timer8_clk.common,
>   	&timer9_clk.common,
> +	&bus_timer_clk.common,
>   	&avs_clk.common,
>   	&de_clk.common,
> +	&bus_de_clk.common,
>   	&di_clk.common,
> +	&bus_di_clk.common,
>   	&g2d_clk.common,
> +	&bus_g2d_clk.common,
>   	&eink_clk.common,
>   	&eink_panel_clk.common,
> +	&bus_eink_clk.common,
>   	&ve_enc_clk.common,
>   	&ve_dec_clk.common,
> +	&bus_ve_enc_clk.common,
> +	&bus_ve_dec_clk.common,
>   	&ce_clk.common,
> +	&bus_ce_clk.common,
> +	&bus_ce_sys_clk.common,
>   	&npu_clk.common,
> +	&bus_npu_clk.common,
>   	&gpu_clk.common,
> +	&bus_gpu_clk.common,
>   	&dram_clk.common,
> +	&bus_dram_clk.common,
>   	&nand0_clk.common,
>   	&nand1_clk.common,
> +	&bus_nand_clk.common,
>   	&mmc0_clk.common,
>   	&mmc1_clk.common,
>   	&mmc2_clk.common,
>   	&mmc3_clk.common,
> +	&bus_mmc0_clk.common,
> +	&bus_mmc1_clk.common,
> +	&bus_mmc2_clk.common,
> +	&bus_mmc3_clk.common,
>   	&ufs_axi_clk.common,
>   	&ufs_cfg_clk.common,
> +	&bus_ufs_clk.common,
> +	&bus_uart0_clk.common,
> +	&bus_uart1_clk.common,
> +	&bus_uart2_clk.common,
> +	&bus_uart3_clk.common,
> +	&bus_uart4_clk.common,
> +	&bus_uart5_clk.common,
> +	&bus_uart6_clk.common,
> +	&bus_i2c0_clk.common,
> +	&bus_i2c1_clk.common,
> +	&bus_i2c2_clk.common,
> +	&bus_i2c3_clk.common,
> +	&bus_i2c4_clk.common,
> +	&bus_i2c5_clk.common,
> +	&bus_i2c6_clk.common,
> +	&bus_i2c7_clk.common,
> +	&bus_i2c8_clk.common,
> +	&bus_i2c9_clk.common,
> +	&bus_i2c10_clk.common,
> +	&bus_i2c11_clk.common,
> +	&bus_i2c12_clk.common,
>   	&spi0_clk.common,
>   	&spi1_clk.common,
>   	&spi2_clk.common,
>   	&spi3_clk.common,
>   	&spi4_clk.common,
> +	&bus_spi0_clk.common,
> +	&bus_spi1_clk.common,
> +	&bus_spi2_clk.common,
> +	&bus_spi3_clk.common,
> +	&bus_spi4_clk.common,
>   	&spif_clk.common,
> +	&bus_spif_clk.common,
>   	&gpadc_clk.common,
> +	&bus_gpadc_clk.common,
> +	&bus_ths_clk.common,
>   	&irrx_clk.common,
> +	&bus_irrx_clk.common,
>   	&irtx_clk.common,
> +	&bus_irtx_clk.common,
> +	&bus_lradc_clk.common,
>   	&sgpio_clk.common,
> +	&bus_sgpio_clk.common,
>   	&lpc_clk.common,
> +	&bus_lpc_clk.common,
>   	&i2spcm0_clk.common,
>   	&i2spcm1_clk.common,
>   	&i2spcm2_clk.common,
>   	&i2spcm3_clk.common,
>   	&i2spcm4_clk.common,
> +	&bus_i2spcm0_clk.common,
> +	&bus_i2spcm1_clk.common,
> +	&bus_i2spcm2_clk.common,
> +	&bus_i2spcm3_clk.common,
> +	&bus_i2spcm4_clk.common,
>   	&i2spcm2_asrc_clk.common,
>   	&owa_tx_clk.common,
>   	&owa_rx_clk.common,
> +	&bus_owa_clk.common,
>   	&dmic_clk.common,
> +	&bus_dmic_clk.common,
>   	&usb_ohci0_clk.common,
> +	&bus_otg_clk.common,
> +	&bus_ehci0_clk.common,
> +	&bus_ohci0_clk.common,
>   	&usb_ohci1_clk.common,
> +	&bus_ehci1_clk.common,
> +	&bus_ohci1_clk.common,
>   	&usb_ref_clk.common,
>   	&usb2_u2_ref_clk.common,
>   	&usb2_suspend_clk.common,
> @@ -1512,24 +1836,40 @@ static struct ccu_common *sun60i_a733_ccu_clks[] = {
>   	&gmac_ptp_clk.common,
>   	&gmac0_phy_clk.common,
>   	&gmac1_phy_clk.common,
> +	&bus_gmac0_clk.common,
> +	&bus_gmac1_clk.common,
>   	&tcon_lcd0_clk.common,
>   	&tcon_lcd1_clk.common,
>   	&tcon_lcd2_clk.common,
> +	&bus_tcon_lcd0_clk.common,
> +	&bus_tcon_lcd1_clk.common,
> +	&bus_tcon_lcd2_clk.common,
>   	&dsi0_clk.common,
>   	&dsi1_clk.common,
> +	&bus_dsi0_clk.common,
> +	&bus_dsi1_clk.common,
>   	&combphy0_clk.common,
>   	&combphy1_clk.common,
> +	&bus_tcon_tv0_clk.common,
> +	&bus_tcon_tv1_clk.common,
>   	&edp_tv_clk.common,
> +	&bus_edp_tv_clk.common,
>   	&hdmi_cec_32k_clk.common,
>   	&hdmi_cec_clk.common,
>   	&hdmi_tv_clk.common,
> +	&bus_hdmi_tv_clk.common,
>   	&hdmi_sfr_clk.common,
>   	&hdmi_esm_clk.common,
> +	&bus_dpss_top0_clk.common,
> +	&bus_dpss_top1_clk.common,
>   	&ledc_clk.common,
> +	&bus_ledc_clk.common,
> +	&bus_dsc_clk.common,
>   	&csi_master0_clk.common,
>   	&csi_master1_clk.common,
>   	&csi_master2_clk.common,
>   	&csi_clk.common,
> +	&bus_csi_clk.common,
>   	&isp_clk.common,
>   	&apb2jtag_clk.common,
>   	&fanout_24M_clk.common,
> @@ -1596,8 +1936,62 @@ static struct clk_hw_onecell_data sun60i_a733_hw_clks = {
>   		[CLK_TRACE]		= &trace_clk.common.hw,
>   		[CLK_GIC]		= &gic_clk.common.hw,
>   		[CLK_CPU_PERI]		= &cpu_peri_clk.common.hw,
> +		[CLK_BUS_ITS_PCIE]	= &bus_its_pcie_clk.common.hw,
>   		[CLK_NSI]		= &nsi_clk.common.hw,
> +		[CLK_BUS_NSI]		= &bus_nsi_clk.common.hw,
>   		[CLK_MBUS]		= &mbus_clk.common.hw,
> +		[CLK_MBUS_IOMMU0_SYS]	= &mbus_iommu0_sys_clk.common.hw,
> +		[CLK_APB_IOMMU0_SYS]	= &apb_iommu0_sys_clk.common.hw,
> +		[CLK_AHB_IOMMU0_SYS]	= &ahb_iommu0_sys_clk.common.hw,
> +		[CLK_BUS_MSI_LITE0]	= &bus_msi_lite0_clk.common.hw,
> +		[CLK_BUS_MSI_LITE1]	= &bus_msi_lite1_clk.common.hw,
> +		[CLK_BUS_MSI_LITE2]	= &bus_msi_lite2_clk.common.hw,
> +		[CLK_MBUS_IOMMU1_SYS]	= &mbus_iommu1_sys_clk.common.hw,
> +		[CLK_APB_IOMMU1_SYS]	= &apb_iommu1_sys_clk.common.hw,
> +		[CLK_AHB_IOMMU1_SYS]	= &ahb_iommu1_sys_clk.common.hw,
> +		[CLK_AHB_VE_DEC]	= &ahb_ve_dec_clk.common.hw,
> +		[CLK_AHB_VE_ENC]	= &ahb_ve_enc_clk.common.hw,
> +		[CLK_AHB_VID_IN]	= &ahb_vid_in_clk.common.hw,
> +		[CLK_AHB_VID_COUT0]	= &ahb_vid_cout0_clk.common.hw,
> +		[CLK_AHB_VID_COUT1]	= &ahb_vid_cout1_clk.common.hw,
> +		[CLK_AHB_DE]		= &ahb_de_clk.common.hw,
> +		[CLK_AHB_NPU]		= &ahb_npu_clk.common.hw,
> +		[CLK_AHB_GPU0]		= &ahb_gpu0_clk.common.hw,
> +		[CLK_AHB_SERDES]	= &ahb_serdes_clk.common.hw,
> +		[CLK_AHB_USB_SYS]	= &ahb_usb_sys_clk.common.hw,
> +		[CLK_AHB_MSI_LITE0]	= &ahb_msi_lite0_clk.common.hw,
> +		[CLK_AHB_STORE]		= &ahb_store_clk.common.hw,
> +		[CLK_AHB_CPUS]		= &ahb_cpus_clk.common.hw,
> +		[CLK_MBUS_IOMMU0]	= &mbus_iommu0_clk.common.hw,
> +		[CLK_MBUS_IOMMU1]	= &mbus_iommu1_clk.common.hw,
> +		[CLK_MBUS_DESYS]	= &mbus_desys_clk.common.hw,
> +		[CLK_MBUS_VE_ENC_GATE]	= &mbus_ve_enc_gate_clk.common.hw,
> +		[CLK_MBUS_VE_DEC_GATE]	= &mbus_ve_dec_gate_clk.common.hw,
> +		[CLK_MBUS_GPU0]		= &mbus_gpu0_clk.common.hw,
> +		[CLK_MBUS_NPU]		= &mbus_npu_clk.common.hw,
> +		[CLK_MBUS_VID_IN]	= &mbus_vid_in_clk.common.hw,
> +		[CLK_MBUS_SERDES]	= &mbus_serdes_clk.common.hw,
> +		[CLK_MBUS_MSI_LITE0]	= &mbus_msi_lite0_clk.common.hw,
> +		[CLK_MBUS_STORE]	= &mbus_store_clk.common.hw,
> +		[CLK_MBUS_MSI_LITE2]	= &mbus_msi_lite2_clk.common.hw,
> +		[CLK_MBUS_DMA0]		= &mbus_dma0_clk.common.hw,
> +		[CLK_MBUS_VE_ENC]	= &mbus_ve_enc_clk.common.hw,
> +		[CLK_MBUS_CE]		= &mbus_ce_clk.common.hw,
> +		[CLK_MBUS_DMA1]		= &mbus_dma1_clk.common.hw,
> +		[CLK_MBUS_NAND]		= &mbus_nand_clk.common.hw,
> +		[CLK_MBUS_CSI]		= &mbus_csi_clk.common.hw,
> +		[CLK_MBUS_ISP]		= &mbus_isp_clk.common.hw,
> +		[CLK_MBUS_GMAC0]	= &mbus_gmac0_clk.common.hw,
> +		[CLK_MBUS_GMAC1]	= &mbus_gmac1_clk.common.hw,
> +		[CLK_MBUS_VE_DEC]	= &mbus_ve_dec_clk.common.hw,
> +		[CLK_BUS_DMA0]		= &bus_dma0_clk.common.hw,
> +		[CLK_BUS_DMA1]		= &bus_dma1_clk.common.hw,
> +		[CLK_BUS_SPINLOCK]	= &bus_spinlock_clk.common.hw,
> +		[CLK_BUS_MSGBOX]	= &bus_msgbox_clk.common.hw,
> +		[CLK_BUS_PWM0]		= &bus_pwm0_clk.common.hw,
> +		[CLK_BUS_PWM1]		= &bus_pwm1_clk.common.hw,
> +		[CLK_BUS_DBG]		= &bus_dbg_clk.common.hw,
> +		[CLK_BUS_SYSDAP]	= &bus_sysdap_clk.common.hw,
>   		[CLK_TIMER0]		= &timer0_clk.common.hw,
>   		[CLK_TIMER1]		= &timer1_clk.common.hw,
>   		[CLK_TIMER2]		= &timer2_clk.common.hw,
> @@ -1608,48 +2002,111 @@ static struct clk_hw_onecell_data sun60i_a733_hw_clks = {
>   		[CLK_TIMER7]		= &timer7_clk.common.hw,
>   		[CLK_TIMER8]		= &timer8_clk.common.hw,
>   		[CLK_TIMER9]		= &timer9_clk.common.hw,
> +		[CLK_BUS_TIMER]		= &bus_timer_clk.common.hw,
>   		[CLK_AVS]		= &avs_clk.common.hw,
>   		[CLK_DE]		= &de_clk.common.hw,
> +		[CLK_BUS_DE]		= &bus_de_clk.common.hw,
>   		[CLK_DI]		= &di_clk.common.hw,
> +		[CLK_BUS_DI]		= &bus_di_clk.common.hw,
>   		[CLK_G2D]		= &g2d_clk.common.hw,
> +		[CLK_BUS_G2D]		= &bus_g2d_clk.common.hw,
>   		[CLK_EINK]		= &eink_clk.common.hw,
>   		[CLK_EINK_PANEL]	= &eink_panel_clk.common.hw,
> +		[CLK_BUS_EINK]		= &bus_eink_clk.common.hw,
>   		[CLK_VE_ENC]		= &ve_enc_clk.common.hw,
>   		[CLK_VE_DEC]		= &ve_dec_clk.common.hw,
> +		[CLK_BUS_VE_ENC]	= &bus_ve_enc_clk.common.hw,
> +		[CLK_BUS_VE_DEC]	= &bus_ve_dec_clk.common.hw,
>   		[CLK_CE]		= &ce_clk.common.hw,
> +		[CLK_BUS_CE]		= &bus_ce_clk.common.hw,
> +		[CLK_BUS_CE_SYS]	= &bus_ce_sys_clk.common.hw,
>   		[CLK_NPU]		= &npu_clk.common.hw,
> +		[CLK_BUS_NPU]		= &bus_npu_clk.common.hw,
>   		[CLK_GPU]		= &gpu_clk.common.hw,
> +		[CLK_BUS_GPU]		= &bus_gpu_clk.common.hw,
>   		[CLK_DRAM]		= &dram_clk.common.hw,
> +		[CLK_BUS_DRAM]		= &bus_dram_clk.common.hw,
>   		[CLK_NAND0]		= &nand0_clk.common.hw,
>   		[CLK_NAND1]		= &nand1_clk.common.hw,
> +		[CLK_BUS_NAND]		= &bus_nand_clk.common.hw,
>   		[CLK_MMC0]		= &mmc0_clk.common.hw,
>   		[CLK_MMC1]		= &mmc1_clk.common.hw,
>   		[CLK_MMC2]		= &mmc2_clk.common.hw,
>   		[CLK_MMC3]		= &mmc3_clk.common.hw,
> +		[CLK_BUS_MMC0]		= &bus_mmc0_clk.common.hw,
> +		[CLK_BUS_MMC1]		= &bus_mmc1_clk.common.hw,
> +		[CLK_BUS_MMC2]		= &bus_mmc2_clk.common.hw,
> +		[CLK_BUS_MMC3]		= &bus_mmc3_clk.common.hw,
>   		[CLK_UFS_AXI]		= &ufs_axi_clk.common.hw,
>   		[CLK_UFS_CFG]		= &ufs_cfg_clk.common.hw,
> +		[CLK_BUS_UFS]		= &bus_ufs_clk.common.hw,
> +		[CLK_BUS_UART0]		= &bus_uart0_clk.common.hw,
> +		[CLK_BUS_UART1]		= &bus_uart1_clk.common.hw,
> +		[CLK_BUS_UART2]		= &bus_uart2_clk.common.hw,
> +		[CLK_BUS_UART3]		= &bus_uart3_clk.common.hw,
> +		[CLK_BUS_UART4]		= &bus_uart4_clk.common.hw,
> +		[CLK_BUS_UART5]		= &bus_uart5_clk.common.hw,
> +		[CLK_BUS_UART6]		= &bus_uart6_clk.common.hw,
> +		[CLK_BUS_I2C0]		= &bus_i2c0_clk.common.hw,
> +		[CLK_BUS_I2C1]		= &bus_i2c1_clk.common.hw,
> +		[CLK_BUS_I2C2]		= &bus_i2c2_clk.common.hw,
> +		[CLK_BUS_I2C3]		= &bus_i2c3_clk.common.hw,
> +		[CLK_BUS_I2C4]		= &bus_i2c4_clk.common.hw,
> +		[CLK_BUS_I2C5]		= &bus_i2c5_clk.common.hw,
> +		[CLK_BUS_I2C6]		= &bus_i2c6_clk.common.hw,
> +		[CLK_BUS_I2C7]		= &bus_i2c7_clk.common.hw,
> +		[CLK_BUS_I2C8]		= &bus_i2c8_clk.common.hw,
> +		[CLK_BUS_I2C9]		= &bus_i2c9_clk.common.hw,
> +		[CLK_BUS_I2C10]		= &bus_i2c10_clk.common.hw,
> +		[CLK_BUS_I2C11]		= &bus_i2c11_clk.common.hw,
> +		[CLK_BUS_I2C12]		= &bus_i2c12_clk.common.hw,
>   		[CLK_SPI0]		= &spi0_clk.common.hw,
>   		[CLK_SPI1]		= &spi1_clk.common.hw,
>   		[CLK_SPI2]		= &spi2_clk.common.hw,
>   		[CLK_SPI3]		= &spi3_clk.common.hw,
>   		[CLK_SPI4]		= &spi4_clk.common.hw,
> +		[CLK_BUS_SPI0]		= &bus_spi0_clk.common.hw,
> +		[CLK_BUS_SPI1]		= &bus_spi1_clk.common.hw,
> +		[CLK_BUS_SPI2]		= &bus_spi2_clk.common.hw,
> +		[CLK_BUS_SPI3]		= &bus_spi3_clk.common.hw,
> +		[CLK_BUS_SPI4]		= &bus_spi4_clk.common.hw,
>   		[CLK_SPIF]		= &spif_clk.common.hw,
> +		[CLK_BUS_SPIF]		= &bus_spif_clk.common.hw,
>   		[CLK_GPADC]		= &gpadc_clk.common.hw,
> +		[CLK_BUS_GPADC]		= &bus_gpadc_clk.common.hw,
> +		[CLK_BUS_THS]		= &bus_ths_clk.common.hw,
>   		[CLK_IRRX]		= &irrx_clk.common.hw,
> +		[CLK_BUS_IRRX]		= &bus_irrx_clk.common.hw,
>   		[CLK_IRTX]		= &irtx_clk.common.hw,
> +		[CLK_BUS_IRTX]		= &bus_irtx_clk.common.hw,
> +		[CLK_BUS_LRADC]		= &bus_lradc_clk.common.hw,
>   		[CLK_SGPIO]		= &sgpio_clk.common.hw,
> +		[CLK_BUS_SGPIO]		= &bus_sgpio_clk.common.hw,
>   		[CLK_LPC]		= &lpc_clk.common.hw,
> +		[CLK_BUS_LPC]		= &bus_lpc_clk.common.hw,
>   		[CLK_I2SPCM0]		= &i2spcm0_clk.common.hw,
>   		[CLK_I2SPCM1]		= &i2spcm1_clk.common.hw,
>   		[CLK_I2SPCM2]		= &i2spcm2_clk.common.hw,
>   		[CLK_I2SPCM3]		= &i2spcm3_clk.common.hw,
>   		[CLK_I2SPCM4]		= &i2spcm4_clk.common.hw,
> +		[CLK_BUS_I2SPCM0]	= &bus_i2spcm0_clk.common.hw,
> +		[CLK_BUS_I2SPCM1]	= &bus_i2spcm1_clk.common.hw,
> +		[CLK_BUS_I2SPCM2]	= &bus_i2spcm2_clk.common.hw,
> +		[CLK_BUS_I2SPCM3]	= &bus_i2spcm3_clk.common.hw,
> +		[CLK_BUS_I2SPCM4]	= &bus_i2spcm4_clk.common.hw,
>   		[CLK_I2SPCM2_ASRC]	= &i2spcm2_asrc_clk.common.hw,
>   		[CLK_OWA_TX]		= &owa_tx_clk.common.hw,
>   		[CLK_OWA_RX]		= &owa_rx_clk.common.hw,
> +		[CLK_BUS_OWA]		= &bus_owa_clk.common.hw,
>   		[CLK_DMIC]		= &dmic_clk.common.hw,
> +		[CLK_BUS_DMIC]		= &bus_dmic_clk.common.hw,
>   		[CLK_USB_OHCI0]		= &usb_ohci0_clk.common.hw,
> +		[CLK_BUS_OTG]		= &bus_otg_clk.common.hw,
> +		[CLK_BUS_EHCI0]		= &bus_ehci0_clk.common.hw,
> +		[CLK_BUS_OHCI0]		= &bus_ohci0_clk.common.hw,
>   		[CLK_USB_OHCI1]		= &usb_ohci1_clk.common.hw,
> +		[CLK_BUS_EHCI1]		= &bus_ehci1_clk.common.hw,
> +		[CLK_BUS_OHCI1]		= &bus_ohci1_clk.common.hw,
>   		[CLK_USB_REF]		= &usb_ref_clk.common.hw,
>   		[CLK_USB2_U2_REF]	= &usb2_u2_ref_clk.common.hw,
>   		[CLK_USB2_SUSPEND]	= &usb2_suspend_clk.common.hw,
> @@ -1662,24 +2119,40 @@ static struct clk_hw_onecell_data sun60i_a733_hw_clks = {
>   		[CLK_GMAC_PTP]		= &gmac_ptp_clk.common.hw,
>   		[CLK_GMAC0_PHY]		= &gmac0_phy_clk.common.hw,
>   		[CLK_GMAC1_PHY]		= &gmac1_phy_clk.common.hw,
> +		[CLK_BUS_GMAC0]		= &bus_gmac0_clk.common.hw,
> +		[CLK_BUS_GMAC1]		= &bus_gmac1_clk.common.hw,
>   		[CLK_TCON_LCD0]		= &tcon_lcd0_clk.common.hw,
>   		[CLK_TCON_LCD1]		= &tcon_lcd1_clk.common.hw,
>   		[CLK_TCON_LCD2]		= &tcon_lcd2_clk.common.hw,
> +		[CLK_BUS_TCON_LCD0]	= &bus_tcon_lcd0_clk.common.hw,
> +		[CLK_BUS_TCON_LCD1]	= &bus_tcon_lcd1_clk.common.hw,
> +		[CLK_BUS_TCON_LCD2]	= &bus_tcon_lcd2_clk.common.hw,
>   		[CLK_DSI0]		= &dsi0_clk.common.hw,
>   		[CLK_DSI1]		= &dsi1_clk.common.hw,
> +		[CLK_BUS_DSI0]		= &bus_dsi0_clk.common.hw,
> +		[CLK_BUS_DSI1]		= &bus_dsi1_clk.common.hw,
>   		[CLK_COMBPHY0]		= &combphy0_clk.common.hw,
>   		[CLK_COMBPHY1]		= &combphy1_clk.common.hw,
> +		[CLK_BUS_TCON_TV0]	= &bus_tcon_tv0_clk.common.hw,
> +		[CLK_BUS_TCON_TV1]	= &bus_tcon_tv1_clk.common.hw,
>   		[CLK_EDP_TV]		= &edp_tv_clk.common.hw,
> +		[CLK_BUS_EDP_TV]	= &bus_edp_tv_clk.common.hw,
>   		[CLK_HDMI_CEC_32K]	= &hdmi_cec_32k_clk.common.hw,
>   		[CLK_HDMI_CEC]		= &hdmi_cec_clk.common.hw,
>   		[CLK_HDMI_TV]		= &hdmi_tv_clk.common.hw,
> +		[CLK_BUS_HDMI_TV]	= &bus_hdmi_tv_clk.common.hw,
>   		[CLK_HDMI_SFR]		= &hdmi_sfr_clk.common.hw,
>   		[CLK_HDMI_ESM]		= &hdmi_esm_clk.common.hw,
> +		[CLK_BUS_DPSS_TOP0]	= &bus_dpss_top0_clk.common.hw,
> +		[CLK_BUS_DPSS_TOP1]	= &bus_dpss_top1_clk.common.hw,
>   		[CLK_LEDC]		= &ledc_clk.common.hw,
> +		[CLK_BUS_LEDC]		= &bus_ledc_clk.common.hw,
> +		[CLK_BUS_DSC]		= &bus_dsc_clk.common.hw,
>   		[CLK_CSI_MASTER0]	= &csi_master0_clk.common.hw,
>   		[CLK_CSI_MASTER1]	= &csi_master1_clk.common.hw,
>   		[CLK_CSI_MASTER2]	= &csi_master2_clk.common.hw,
>   		[CLK_CSI]		= &csi_clk.common.hw,
> +		[CLK_BUS_CSI]		= &bus_csi_clk.common.hw,
>   		[CLK_ISP]		= &isp_clk.common.hw,
>   		[CLK_APB2JTAG]		= &apb2jtag_clk.common.hw,
>   		[CLK_FANOUT_24M]	= &fanout_24M_clk.common.hw,
> 


^ permalink raw reply

* [PATCH 6.6.y] net: sched: fix TCF_LAYER_TRANSPORT handling in tcf_get_base_ptr()
From: Chelsy Ratnawat @ 2026-04-15 22:19 UTC (permalink / raw)
  To: stable
  Cc: jhs, jiri, davem, edumazet, kuba, netdev,
	syzbot+f3a497f02c389d86ef16, Chelsy Ratnawat

From: Eric Dumazet <edumazet@google.com>

[Upstream commit 4fe5a00ec70717a7f1002d8913ec6143582b3c8e]

syzbot reported that tcf_get_base_ptr() can be called while transport
header is not set [1].

Instead of returning a dangling pointer, return NULL.

Fix tcf_get_base_ptr() callers to handle this NULL value.

[1]
 WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 skb_transport_header include/linux/skbuff.h:3071 [inline]
 WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 tcf_get_base_ptr include/net/pkt_cls.h:539 [inline]
 WARNING: CPU: 1 PID: 6019 at ./include/linux/skbuff.h:3071 em_nbyte_match+0x2d8/0x3f0 net/sched/em_nbyte.c:43
Modules linked in:
CPU: 1 UID: 0 PID: 6019 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full)
Call Trace:
 <TASK>
  tcf_em_match net/sched/ematch.c:494 [inline]
  __tcf_em_tree_match+0x1ac/0x770 net/sched/ematch.c:520
  tcf_em_tree_match include/net/pkt_cls.h:512 [inline]
  basic_classify+0x115/0x2d0 net/sched/cls_basic.c:50
  tc_classify include/net/tc_wrapper.h:197 [inline]
  __tcf_classify net/sched/cls_api.c:1764 [inline]
  tcf_classify+0x4cf/0x1140 net/sched/cls_api.c:1860
  multiq_classify net/sched/sch_multiq.c:39 [inline]
  multiq_enqueue+0xfd/0x4c0 net/sched/sch_multiq.c:66
  dev_qdisc_enqueue+0x4e/0x260 net/core/dev.c:4118
  __dev_xmit_skb net/core/dev.c:4214 [inline]
  __dev_queue_xmit+0xe83/0x3b50 net/core/dev.c:4729
  packet_snd net/packet/af_packet.c:3076 [inline]
  packet_sendmsg+0x3e33/0x5080 net/packet/af_packet.c:3108
  sock_sendmsg_nosec net/socket.c:727 [inline]
  __sock_sendmsg+0x21c/0x270 net/socket.c:742
  ____sys_sendmsg+0x505/0x830 net/socket.c:2630

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot+f3a497f02c389d86ef16@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/6920855a.a70a0220.2ea503.0058.GAE@google.com/T/#u
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/20251121154100.1616228-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Chelsy Ratnawat <chelsyratnawat2001@gmail.com>
---
 include/net/pkt_cls.h |  2 ++
 net/sched/em_cmp.c    |  5 ++++-
 net/sched/em_nbyte.c  |  2 ++
 net/sched/em_text.c   | 11 +++++++++--
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index f308e8268651..ccc1c698ed00 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -525,6 +525,8 @@ static inline unsigned char * tcf_get_base_ptr(struct sk_buff *skb, int layer)
 		case TCF_LAYER_NETWORK:
 			return skb_network_header(skb);
 		case TCF_LAYER_TRANSPORT:
+			if (!skb_transport_header_was_set(skb))
+				break;
 			return skb_transport_header(skb);
 	}
 
diff --git a/net/sched/em_cmp.c b/net/sched/em_cmp.c
index f17b049ea530..71ce113f2d08 100644
--- a/net/sched/em_cmp.c
+++ b/net/sched/em_cmp.c
@@ -22,9 +22,12 @@ static int em_cmp_match(struct sk_buff *skb, struct tcf_ematch *em,
 			struct tcf_pkt_info *info)
 {
 	struct tcf_em_cmp *cmp = (struct tcf_em_cmp *) em->data;
-	unsigned char *ptr = tcf_get_base_ptr(skb, cmp->layer) + cmp->off;
+	unsigned char *ptr = tcf_get_base_ptr(skb, cmp->layer);
 	u32 val = 0;
 
+	if (!ptr)
+		return 0;
+	ptr += cmp->off;
 	if (!tcf_valid_offset(skb, ptr, cmp->align))
 		return 0;
 
diff --git a/net/sched/em_nbyte.c b/net/sched/em_nbyte.c
index a83b237cbeb0..2e3c1d58d456 100644
--- a/net/sched/em_nbyte.c
+++ b/net/sched/em_nbyte.c
@@ -42,6 +42,8 @@ static int em_nbyte_match(struct sk_buff *skb, struct tcf_ematch *em,
 	struct nbyte_data *nbyte = (struct nbyte_data *) em->data;
 	unsigned char *ptr = tcf_get_base_ptr(skb, nbyte->hdr.layer);
 
+	if (!ptr)
+		return 0;
 	ptr += nbyte->hdr.off;
 
 	if (!tcf_valid_offset(skb, ptr, nbyte->hdr.len))
diff --git a/net/sched/em_text.c b/net/sched/em_text.c
index f176afb70559..32aae8a9deda 100644
--- a/net/sched/em_text.c
+++ b/net/sched/em_text.c
@@ -29,12 +29,19 @@ static int em_text_match(struct sk_buff *skb, struct tcf_ematch *m,
 			 struct tcf_pkt_info *info)
 {
 	struct text_match *tm = EM_TEXT_PRIV(m);
+	unsigned char *ptr;
 	int from, to;
 
-	from = tcf_get_base_ptr(skb, tm->from_layer) - skb->data;
+	ptr = tcf_get_base_ptr(skb, tm->from_layer);
+	if (!ptr)
+		return 0;
+	from = ptr - skb->data;
 	from += tm->from_offset;
 
-	to = tcf_get_base_ptr(skb, tm->to_layer) - skb->data;
+	ptr = tcf_get_base_ptr(skb, tm->to_layer);
+	if (!ptr)
+		return 0;
+	to = ptr - skb->data;
 	to += tm->to_offset;
 
 	return skb_find_text(skb, from, to, tm->config) != UINT_MAX;
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net v4] openvswitch: cap upcall PID array size and pre-size vport replies
From: Ilya Maximets @ 2026-04-15 22:41 UTC (permalink / raw)
  To: Weiming Shi, Aaron Conole, Eelco Chaudron, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: i.maximets, Simon Horman, netdev, dev, Xiang Mei
In-Reply-To: <20260415125121.110874-2-bestswngs@gmail.com>

On 4/15/26 2:51 PM, Weiming Shi wrote:
> The vport netlink reply helpers allocate a fixed-size skb with
> nlmsg_new(NLMSG_DEFAULT_SIZE, ...) but serialize the full upcall PID
> array via ovs_vport_get_upcall_portids().  Since
> ovs_vport_set_upcall_portids() accepts any non-zero multiple of
> sizeof(u32) with no upper bound, a CAP_NET_ADMIN user can install a PID
> array large enough to overflow the reply buffer, causing nla_put() to
> fail with -EMSGSIZE and hitting BUG_ON(err < 0).  On systems with
> unprivileged user namespaces enabled (e.g., Ubuntu default), this is
> reachable via unshare -Urn since OVS vport mutation operations use
> GENL_UNS_ADMIN_PERM.
> 
>  kernel BUG at net/openvswitch/datapath.c:2414!
>  Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
>  CPU: 1 UID: 0 PID: 65 Comm: poc Not tainted 7.0.0-rc7-00195-geb216e422044 #1
>  RIP: 0010:ovs_vport_cmd_set+0x34c/0x400
>  Call Trace:
>   <TASK>
>   genl_family_rcv_msg_doit (net/netlink/genetlink.c:1116)
>   genl_rcv_msg (net/netlink/genetlink.c:1194)
>   netlink_rcv_skb (net/netlink/af_netlink.c:2550)
>   genl_rcv (net/netlink/genetlink.c:1219)
>   netlink_unicast (net/netlink/af_netlink.c:1344)
>   netlink_sendmsg (net/netlink/af_netlink.c:1894)
>   __sys_sendto (net/socket.c:2206)
>   __x64_sys_sendto (net/socket.c:2209)
>   do_syscall_64 (arch/x86/entry/syscall_64.c:63)
>   entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
>   </TASK>
>  Kernel panic - not syncing: Fatal exception
> 
> Reject attempts to set more PIDs than nr_cpu_ids in
> ovs_vport_set_upcall_portids(), and pre-compute the worst-case reply
> size in ovs_vport_cmd_msg_size() based on that bound, similar to the
> existing ovs_dp_cmd_msg_size().  nr_cpu_ids matches the cap already
> used by the per-CPU dispatch configuration on the datapath side
> (ovs_dp_cmd_fill_info() serialises at most nr_cpu_ids PIDs), so the
> two sides stay consistent.
> 
> Fixes: 5cd667b0a456 ("openvswitch: Allow each vport to have an array of 'port_id's.")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---
> v4 (per Ilya):
> - Use nr_cpu_ids instead of num_possible_cpus() for consistency with
>   the per-CPU dispatch on the datapath side.
> - Annotate ovs_vport_cmd_msg_size() per-attribute; split nested sums.
> v3: Cap at num_possible_cpus(); add ovs_vport_cmd_msg_size(); keep
>     BUG_ON(); fix Fixes tag.
> v2: Dynamically size reply skb; drop WARN_ON_ONCE, return plain errors.

Please, don't re-name the patch for every version if there are no changes
that actually invalidate the name.  It was definitely not necessary in the
past few versions of this patch.  Could've even kept the original name
from v1, it was fine.  But please, keep the current v4 name in v5.

These renames are messing up version tracking.  Also, please, add lore links
into the changelog for previous versions, especially if you're renaming the
patch, so reviewers can find the older versions.

In case you're using AI to help with these patches (which would explain the
constant renaming), you should disclose that by adding an Assisted-by tag:
  https://docs.kernel.org/process/coding-assistants.html#attribution

> ---
>  net/openvswitch/datapath.c | 33 +++++++++++++++++++++++++++++++--
>  net/openvswitch/vport.c    |  3 +++
>  2 files changed, 34 insertions(+), 2 deletions(-)
> 
> diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c
> index e209099218b4..35e67e51b0d2 100644
> --- a/net/openvswitch/datapath.c
> +++ b/net/openvswitch/datapath.c
> @@ -2184,9 +2184,38 @@ static int ovs_vport_cmd_fill_info(struct vport *vport, struct sk_buff *skb,
>  	return err;
>  }
>  
> +static size_t ovs_vport_cmd_msg_size(void)
> +{
> +	size_t msgsize = NLMSG_ALIGN(sizeof(struct ovs_header));
> +
> +	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_PORT_NO */
> +	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_TYPE */
> +	msgsize += nla_total_size(IFNAMSIZ);    /* OVS_VPORT_ATTR_NAME */
> +	msgsize += nla_total_size(sizeof(u32)); /* OVS_VPORT_ATTR_IFINDEX */
> +	msgsize += nla_total_size(sizeof(s32)); /* OVS_VPORT_ATTR_NETNSID */

Add an empty line here, it's hard to read when comments are sandwiched
between the code.  Same for all the blocks below (empty line before the
comment line).

> +	/* OVS_VPORT_ATTR_STATS */
> +	msgsize += nla_total_size_64bit(sizeof(struct ovs_vport_stats));
> +	/* OVS_VPORT_ATTR_UPCALL_STATS(OVS_VPORT_UPCALL_ATTR_SUCCESS +
> +	 *                             OVS_VPORT_UPCALL_ATTR_FAIL)
> +	 */
> +	msgsize += nla_total_size(nla_total_size_64bit(sizeof(u64)) +
> +				  nla_total_size_64bit(sizeof(u64)));
> +	/* OVS_VPORT_ATTR_UPCALL_PID (capped at nr_cpu_ids by
> +	 * ovs_vport_set_upcall_portids())

The explanation inside the parentheses is not needed, IMO.

The rest seems fine to me.

Best regards, Ilya Maximets.

^ permalink raw reply

* [net,PATCH v4 1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Marek Vasut @ 2026-04-15 23:09 UTC (permalink / raw)
  To: netdev
  Cc: Marek Vasut, Sebastian Andrzej Siewior, stable, David S. Miller,
	Andrew Lunn, Eric Dumazet, Jakub Kicinski, Nicolai Buchwitz,
	Paolo Abeni, Ronald Wahl, Yicong Hui, linux-kernel

If the driver executes ks8851_irq() AND a TX packet has been sent, then
the driver enables TX queue via netif_wake_queue() which schedules TX
softirq to queue packets for this device.

If CONFIG_PREEMPT_RT=y is set AND a packet has also been received by
the MAC, then ks8851_rx_pkts() calls netdev_alloc_skb_ip_align() to
allocate SKBs for the received packets. If netdev_alloc_skb_ip_align()
is called with BH enabled, then local_bh_enable() at the end of
netdev_alloc_skb_ip_align() will trigger the pending softirq processing,
which may ultimately call the .xmit callback ks8851_start_xmit_par().
The ks8851_start_xmit_par() will try to lock struct ks8851_net_par
.lock spinlock, which is already locked by ks8851_irq() from which
ks8851_start_xmit_par() was called. This leads to a deadlock, which
is reported by the kernel, including a trace listed below.

If CONFIG_PREEMPT_RT is not set, then since commit 0913ec336a6c0
("net: ks8851: Fix deadlock with the SPI chip variant") the deadlock
can also be triggered without received packet in the RX FIFO. The
pending softirqs will be processed on return from
spin_unlock_bh(&ks->statelock) in ks8851_irq(), which triggers the
deadlock as well.

Fix the problem by disabling BH around critical sections, including the
IRQ handler, thus preventing the net_tx_action() softirq from triggering
during these critical sections. The net_tx_action() softirq is triggered
once BH are re-enabled and at the end of the IRQ handler, once all the
other IRQ handler actions have been completed.

 __schedule from schedule_rtlock+0x1c/0x34
 schedule_rtlock from rtlock_slowlock_locked+0x548/0x904
 rtlock_slowlock_locked from rt_spin_lock+0x60/0x9c
 rt_spin_lock from ks8851_start_xmit_par+0x74/0x1a8
 ks8851_start_xmit_par from netdev_start_xmit+0x20/0x44
 netdev_start_xmit from dev_hard_start_xmit+0xd0/0x188
 dev_hard_start_xmit from sch_direct_xmit+0xb8/0x25c
 sch_direct_xmit from __qdisc_run+0x1f8/0x4ec
 __qdisc_run from qdisc_run+0x1c/0x28
 qdisc_run from net_tx_action+0x1f0/0x268
 net_tx_action from handle_softirqs+0x1a4/0x270
 handle_softirqs from __local_bh_enable_ip+0xcc/0xe0
 __local_bh_enable_ip from __alloc_skb+0xd8/0x128
 __alloc_skb from __netdev_alloc_skb+0x3c/0x19c
 __netdev_alloc_skb from ks8851_irq+0x388/0x4d4
 ks8851_irq from irq_thread_fn+0x24/0x64
 irq_thread_fn from irq_thread+0x178/0x28c
 irq_thread from kthread+0x12c/0x138
 kthread from ret_from_fork+0x14/0x28

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: e0863634bf9f ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Vasut <marex@nabladev.com>
---
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Nicolai Buchwitz <nb@tipi-net.de>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yicong Hui <yiconghui@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
---
V2: Register dedicated IRQ handler wrapper which disables BH for the
    parallel variant of the MAC, the variant which uses spinlocks as
    the locking primitive. Use stock IRQ handler with BH unchanged
    for the SPI variant, which uses mutexes as locking primitive.
V3: Switch to spin_lock_bh(), update commit message
V4: Extend the commit message
    Add RB from Sebastian
---
 drivers/net/ethernet/micrel/ks8851.h        |  6 +-
 drivers/net/ethernet/micrel/ks8851_common.c | 64 +++++++++------------
 drivers/net/ethernet/micrel/ks8851_par.c    | 15 ++---
 drivers/net/ethernet/micrel/ks8851_spi.c    | 11 ++--
 4 files changed, 38 insertions(+), 58 deletions(-)

diff --git a/drivers/net/ethernet/micrel/ks8851.h b/drivers/net/ethernet/micrel/ks8851.h
index 31f75b4a67fd7..b795a3a605711 100644
--- a/drivers/net/ethernet/micrel/ks8851.h
+++ b/drivers/net/ethernet/micrel/ks8851.h
@@ -408,10 +408,8 @@ struct ks8851_net {
 	struct gpio_desc	*gpio;
 	struct mii_bus		*mii_bus;
 
-	void			(*lock)(struct ks8851_net *ks,
-					unsigned long *flags);
-	void			(*unlock)(struct ks8851_net *ks,
-					  unsigned long *flags);
+	void			(*lock)(struct ks8851_net *ks);
+	void			(*unlock)(struct ks8851_net *ks);
 	unsigned int		(*rdreg16)(struct ks8851_net *ks,
 					   unsigned int reg);
 	void			(*wrreg16)(struct ks8851_net *ks,
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 8048770958d60..6c375647b24de 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -28,25 +28,23 @@
 /**
  * ks8851_lock - register access lock
  * @ks: The chip state
- * @flags: Spinlock flags
  *
  * Claim chip register access lock
  */
-static void ks8851_lock(struct ks8851_net *ks, unsigned long *flags)
+static void ks8851_lock(struct ks8851_net *ks)
 {
-	ks->lock(ks, flags);
+	ks->lock(ks);
 }
 
 /**
  * ks8851_unlock - register access unlock
  * @ks: The chip state
- * @flags: Spinlock flags
  *
  * Release chip register access lock
  */
-static void ks8851_unlock(struct ks8851_net *ks, unsigned long *flags)
+static void ks8851_unlock(struct ks8851_net *ks)
 {
-	ks->unlock(ks, flags);
+	ks->unlock(ks);
 }
 
 /**
@@ -129,11 +127,10 @@ static void ks8851_set_powermode(struct ks8851_net *ks, unsigned pwrmode)
 static int ks8851_write_mac_addr(struct net_device *dev)
 {
 	struct ks8851_net *ks = netdev_priv(dev);
-	unsigned long flags;
 	u16 val;
 	int i;
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 
 	/*
 	 * Wake up chip in case it was powered off when stopped; otherwise,
@@ -149,7 +146,7 @@ static int ks8851_write_mac_addr(struct net_device *dev)
 	if (!netif_running(dev))
 		ks8851_set_powermode(ks, PMECR_PM_SOFTDOWN);
 
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 
 	return 0;
 }
@@ -163,12 +160,11 @@ static int ks8851_write_mac_addr(struct net_device *dev)
 static void ks8851_read_mac_addr(struct net_device *dev)
 {
 	struct ks8851_net *ks = netdev_priv(dev);
-	unsigned long flags;
 	u8 addr[ETH_ALEN];
 	u16 reg;
 	int i;
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 
 	for (i = 0; i < ETH_ALEN; i += 2) {
 		reg = ks8851_rdreg16(ks, KS_MAR(i));
@@ -177,7 +173,7 @@ static void ks8851_read_mac_addr(struct net_device *dev)
 	}
 	eth_hw_addr_set(dev, addr);
 
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 }
 
 /**
@@ -312,11 +308,10 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
 {
 	struct ks8851_net *ks = _ks;
 	struct sk_buff_head rxq;
-	unsigned long flags;
 	unsigned int status;
 	struct sk_buff *skb;
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 
 	status = ks8851_rdreg16(ks, KS_ISR);
 	ks8851_wrreg16(ks, KS_ISR, status);
@@ -373,7 +368,7 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
 		ks8851_wrreg16(ks, KS_RXCR1, rxc->rxcr1);
 	}
 
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 
 	if (status & IRQ_LCI)
 		mii_check_link(&ks->mii);
@@ -405,7 +400,6 @@ static void ks8851_flush_tx_work(struct ks8851_net *ks)
 static int ks8851_net_open(struct net_device *dev)
 {
 	struct ks8851_net *ks = netdev_priv(dev);
-	unsigned long flags;
 	int ret;
 
 	ret = request_threaded_irq(dev->irq, NULL, ks8851_irq,
@@ -418,7 +412,7 @@ static int ks8851_net_open(struct net_device *dev)
 
 	/* lock the card, even if we may not actually be doing anything
 	 * else at the moment */
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 
 	netif_dbg(ks, ifup, ks->netdev, "opening\n");
 
@@ -471,7 +465,7 @@ static int ks8851_net_open(struct net_device *dev)
 
 	netif_dbg(ks, ifup, ks->netdev, "network device up\n");
 
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 	mii_check_link(&ks->mii);
 	return 0;
 }
@@ -487,23 +481,22 @@ static int ks8851_net_open(struct net_device *dev)
 static int ks8851_net_stop(struct net_device *dev)
 {
 	struct ks8851_net *ks = netdev_priv(dev);
-	unsigned long flags;
 
 	netif_info(ks, ifdown, dev, "shutting down\n");
 
 	netif_stop_queue(dev);
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 	/* turn off the IRQs and ack any outstanding */
 	ks8851_wrreg16(ks, KS_IER, 0x0000);
 	ks8851_wrreg16(ks, KS_ISR, 0xffff);
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 
 	/* stop any outstanding work */
 	ks8851_flush_tx_work(ks);
 	flush_work(&ks->rxctrl_work);
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 	/* shutdown RX process */
 	ks8851_wrreg16(ks, KS_RXCR1, 0x0000);
 
@@ -512,7 +505,7 @@ static int ks8851_net_stop(struct net_device *dev)
 
 	/* set powermode to soft power down to save power */
 	ks8851_set_powermode(ks, PMECR_PM_SOFTDOWN);
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 
 	/* ensure any queued tx buffers are dumped */
 	while (!skb_queue_empty(&ks->txq)) {
@@ -566,14 +559,13 @@ static netdev_tx_t ks8851_start_xmit(struct sk_buff *skb,
 static void ks8851_rxctrl_work(struct work_struct *work)
 {
 	struct ks8851_net *ks = container_of(work, struct ks8851_net, rxctrl_work);
-	unsigned long flags;
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 
 	/* need to shutdown RXQ before modifying filter parameters */
 	ks8851_wrreg16(ks, KS_RXCR1, 0x00);
 
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 }
 
 static void ks8851_set_rx_mode(struct net_device *dev)
@@ -780,7 +772,6 @@ static int ks8851_set_eeprom(struct net_device *dev,
 {
 	struct ks8851_net *ks = netdev_priv(dev);
 	int offset = ee->offset;
-	unsigned long flags;
 	int len = ee->len;
 	u16 tmp;
 
@@ -794,7 +785,7 @@ static int ks8851_set_eeprom(struct net_device *dev,
 	if (!(ks->rc_ccr & CCR_EEPROM))
 		return -ENOENT;
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 
 	ks8851_eeprom_claim(ks);
 
@@ -817,7 +808,7 @@ static int ks8851_set_eeprom(struct net_device *dev,
 	eeprom_93cx6_wren(&ks->eeprom, false);
 
 	ks8851_eeprom_release(ks);
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 
 	return 0;
 }
@@ -827,7 +818,6 @@ static int ks8851_get_eeprom(struct net_device *dev,
 {
 	struct ks8851_net *ks = netdev_priv(dev);
 	int offset = ee->offset;
-	unsigned long flags;
 	int len = ee->len;
 
 	/* must be 2 byte aligned */
@@ -837,7 +827,7 @@ static int ks8851_get_eeprom(struct net_device *dev,
 	if (!(ks->rc_ccr & CCR_EEPROM))
 		return -ENOENT;
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 
 	ks8851_eeprom_claim(ks);
 
@@ -845,7 +835,7 @@ static int ks8851_get_eeprom(struct net_device *dev,
 
 	eeprom_93cx6_multiread(&ks->eeprom, offset/2, (__le16 *)data, len/2);
 	ks8851_eeprom_release(ks);
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 
 	return 0;
 }
@@ -904,7 +894,6 @@ static int ks8851_phy_reg(int reg)
 static int ks8851_phy_read_common(struct net_device *dev, int phy_addr, int reg)
 {
 	struct ks8851_net *ks = netdev_priv(dev);
-	unsigned long flags;
 	int result;
 	int ksreg;
 
@@ -912,9 +901,9 @@ static int ks8851_phy_read_common(struct net_device *dev, int phy_addr, int reg)
 	if (ksreg < 0)
 		return ksreg;
 
-	ks8851_lock(ks, &flags);
+	ks8851_lock(ks);
 	result = ks8851_rdreg16(ks, ksreg);
-	ks8851_unlock(ks, &flags);
+	ks8851_unlock(ks);
 
 	return result;
 }
@@ -949,14 +938,13 @@ static void ks8851_phy_write(struct net_device *dev,
 			     int phy, int reg, int value)
 {
 	struct ks8851_net *ks = netdev_priv(dev);
-	unsigned long flags;
 	int ksreg;
 
 	ksreg = ks8851_phy_reg(reg);
 	if (ksreg >= 0) {
-		ks8851_lock(ks, &flags);
+		ks8851_lock(ks);
 		ks8851_wrreg16(ks, ksreg, value);
-		ks8851_unlock(ks, &flags);
+		ks8851_unlock(ks);
 	}
 }
 
diff --git a/drivers/net/ethernet/micrel/ks8851_par.c b/drivers/net/ethernet/micrel/ks8851_par.c
index 78695be2570bf..9f1c33f6ddec0 100644
--- a/drivers/net/ethernet/micrel/ks8851_par.c
+++ b/drivers/net/ethernet/micrel/ks8851_par.c
@@ -55,29 +55,27 @@ struct ks8851_net_par {
 /**
  * ks8851_lock_par - register access lock
  * @ks: The chip state
- * @flags: Spinlock flags
  *
  * Claim chip register access lock
  */
-static void ks8851_lock_par(struct ks8851_net *ks, unsigned long *flags)
+static void ks8851_lock_par(struct ks8851_net *ks)
 {
 	struct ks8851_net_par *ksp = to_ks8851_par(ks);
 
-	spin_lock_irqsave(&ksp->lock, *flags);
+	spin_lock_bh(&ksp->lock);
 }
 
 /**
  * ks8851_unlock_par - register access unlock
  * @ks: The chip state
- * @flags: Spinlock flags
  *
  * Release chip register access lock
  */
-static void ks8851_unlock_par(struct ks8851_net *ks, unsigned long *flags)
+static void ks8851_unlock_par(struct ks8851_net *ks)
 {
 	struct ks8851_net_par *ksp = to_ks8851_par(ks);
 
-	spin_unlock_irqrestore(&ksp->lock, *flags);
+	spin_unlock_bh(&ksp->lock);
 }
 
 /**
@@ -233,7 +231,6 @@ static netdev_tx_t ks8851_start_xmit_par(struct sk_buff *skb,
 {
 	struct ks8851_net *ks = netdev_priv(dev);
 	netdev_tx_t ret = NETDEV_TX_OK;
-	unsigned long flags;
 	unsigned int txqcr;
 	u16 txmir;
 	int err;
@@ -241,7 +238,7 @@ static netdev_tx_t ks8851_start_xmit_par(struct sk_buff *skb,
 	netif_dbg(ks, tx_queued, ks->netdev,
 		  "%s: skb %p, %d@%p\n", __func__, skb, skb->len, skb->data);
 
-	ks8851_lock_par(ks, &flags);
+	ks8851_lock_par(ks);
 
 	txmir = ks8851_rdreg16_par(ks, KS_TXMIR) & 0x1fff;
 
@@ -262,7 +259,7 @@ static netdev_tx_t ks8851_start_xmit_par(struct sk_buff *skb,
 		ret = NETDEV_TX_BUSY;
 	}
 
-	ks8851_unlock_par(ks, &flags);
+	ks8851_unlock_par(ks);
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/micrel/ks8851_spi.c b/drivers/net/ethernet/micrel/ks8851_spi.c
index a161ae45743ab..b9e68520278d0 100644
--- a/drivers/net/ethernet/micrel/ks8851_spi.c
+++ b/drivers/net/ethernet/micrel/ks8851_spi.c
@@ -71,11 +71,10 @@ struct ks8851_net_spi {
 /**
  * ks8851_lock_spi - register access lock
  * @ks: The chip state
- * @flags: Spinlock flags
  *
  * Claim chip register access lock
  */
-static void ks8851_lock_spi(struct ks8851_net *ks, unsigned long *flags)
+static void ks8851_lock_spi(struct ks8851_net *ks)
 {
 	struct ks8851_net_spi *kss = to_ks8851_spi(ks);
 
@@ -85,11 +84,10 @@ static void ks8851_lock_spi(struct ks8851_net *ks, unsigned long *flags)
 /**
  * ks8851_unlock_spi - register access unlock
  * @ks: The chip state
- * @flags: Spinlock flags
  *
  * Release chip register access lock
  */
-static void ks8851_unlock_spi(struct ks8851_net *ks, unsigned long *flags)
+static void ks8851_unlock_spi(struct ks8851_net *ks)
 {
 	struct ks8851_net_spi *kss = to_ks8851_spi(ks);
 
@@ -309,7 +307,6 @@ static void ks8851_tx_work(struct work_struct *work)
 	struct ks8851_net_spi *kss;
 	unsigned short tx_space;
 	struct ks8851_net *ks;
-	unsigned long flags;
 	struct sk_buff *txb;
 	bool last;
 
@@ -317,7 +314,7 @@ static void ks8851_tx_work(struct work_struct *work)
 	ks = &kss->ks8851;
 	last = skb_queue_empty(&ks->txq);
 
-	ks8851_lock_spi(ks, &flags);
+	ks8851_lock_spi(ks);
 
 	while (!last) {
 		txb = skb_dequeue(&ks->txq);
@@ -343,7 +340,7 @@ static void ks8851_tx_work(struct work_struct *work)
 	ks->tx_space = tx_space;
 	spin_unlock_bh(&ks->statelock);
 
-	ks8851_unlock_spi(ks, &flags);
+	ks8851_unlock_spi(ks);
 }
 
 /**
-- 
2.53.0


^ permalink raw reply related

* [net,PATCH v4 2/2] net: ks8851: Avoid excess softirq scheduling
From: Marek Vasut @ 2026-04-15 23:09 UTC (permalink / raw)
  To: netdev
  Cc: Marek Vasut, Sebastian Andrzej Siewior, stable, David S. Miller,
	Andrew Lunn, Eric Dumazet, Jakub Kicinski, Nicolai Buchwitz,
	Paolo Abeni, Ronald Wahl, Yicong Hui, linux-kernel
In-Reply-To: <20260415231020.455298-1-marex@nabladev.com>

The code injects a packet into netif_rx() repeatedly, which will add
it to its internal NAPI and schedule a softirq, and process it. It is
more efficient to queue multiple packets and process them all at the
local_bh_enable() time.

Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Fixes: e0863634bf9f ("net: ks8851: Queue RX packets in IRQ handler instead of disabling BHs")
Cc: stable@vger.kernel.org
Signed-off-by: Marek Vasut <marex@nabladev.com>
---
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Nicolai Buchwitz <nb@tipi-net.de>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Ronald Wahl <ronald.wahl@raritan.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Yicong Hui <yiconghui@gmail.com>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
---
V3: New patch
V4: Add RB from Sebastian
---
 drivers/net/ethernet/micrel/ks8851_common.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 6c375647b24de..4afbb40bc0e4a 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -373,9 +373,12 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
 	if (status & IRQ_LCI)
 		mii_check_link(&ks->mii);
 
-	if (status & IRQ_RXI)
+	if (status & IRQ_RXI) {
+		local_bh_disable();
 		while ((skb = __skb_dequeue(&rxq)))
 			netif_rx(skb);
+		local_bh_enable();
+	}
 
 	return IRQ_HANDLED;
 }
-- 
2.53.0


^ permalink raw reply related

* Re: [net,PATCH v3 1/2] net: ks8851: Reinstate disabling of BHs around IRQ handler
From: Marek Vasut @ 2026-04-15 23:14 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: netdev, stable, David S. Miller, Andrew Lunn, Eric Dumazet,
	Jakub Kicinski, Nicolai Buchwitz, Paolo Abeni, Ronald Wahl,
	Yicong Hui, linux-kernel
In-Reply-To: <20260414145218.lsNpdAJI@linutronix.de>

On 4/14/26 4:52 PM, Sebastian Andrzej Siewior wrote:
> On 2026-04-14 16:20:46 [+0200], Marek Vasut wrote:
>>> This is what happens since commit 0913ec336a6c0 ("net: ks8851: Fix
>>> deadlock with the SPI chip variant"). Before that commit the softirq
>>> execution will be picked up by netdev_alloc_skb_ip_align() and requires
>>> PREEMPT_RT and a RX packet in #1 to trigger the deadlock.
>>
>> Do you want me to add this into the V4 commit message ?
> 
> The description does not match the code since the commit mentioned
> above.

I hope the V4 commit message is a bit better.

>>> The backtrace here and the description is based on an older kernel.
>>> However
>> I actually did update the backtrace in V3 with the one from current next
>> 20260413 .
> 
> That would be from yesterday and the change is merged since v6.10. But
> why is the softirq starting from __netdev_alloc_skb() instead of
> spin_unlock_bh(&ks->statelock)? After that unlock, the softirq must be
> processed and __netdev_alloc_skb() _could_ observe pending softirqs but
> not from ks8851.
Because __netdev_alloc_skb() also enables/disables BH , see the "else" 
branch:

  759 struct sk_buff *__netdev_alloc_skb(struct net_device *dev, 
unsigned int len,
  760                                    gfp_t gfp_mask)
  761 {
...
  786         if (in_hardirq() || irqs_disabled()) {
  787                 nc = this_cpu_ptr(&netdev_alloc_cache);
  788                 data = page_frag_alloc(nc, len, gfp_mask);
  789                 pfmemalloc = page_frag_cache_is_pfmemalloc(nc);
  790         } else {
  791                 local_bh_disable();
  792                 local_lock_nested_bh(&napi_alloc_cache.bh_lock);
...
  798                 local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
  799                 local_bh_enable();
...

^ permalink raw reply

* Re: [PATCH net-next v18 00/15] Begin upstreaming Homa transport protocol
From: John Ousterhout @ 2026-04-15 23:21 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: netdev, pabeni, edumazet, horms
In-Reply-To: <CAGXJAmxCsBPJttXdvK=s2SLxFUBzTDvf6PewPo+6QkxZbuES2Q@mail.gmail.com>

For the record, I have fixed all of the issues reported by the AI
review of this patch series, except for the following comments:

(the following comment was on patch 8/15)

> In v17, Paolo followed up asking about kmem_cache performance and whether
> optimization makes sense. The author responded that kmem_cache was actually
> slower than kzalloc (400 vs 380 cycles). While the author provided
> justification, Paolo's v17 response suggests this remains a point of
> discussion about understanding the performance characteristics.

I have not attempted to track down why this optimization didn't help
for Homa. Let me know if that needs to be done as part of the
acceptance process for this patch series.

(the following comment was on patch 11/15)

> Eric Dumazet raised a performance concern during the September 2025 review
> (https://lore.kernel.org/netdev/CANn89i+r6ogvutW1+C7ZDhEYKZN7WU=OVONpgnoX4EgYX=BF4A@mail.gmail.com/):

> "Also if homa does not have yet GRO support, I doubt
> skb_attempt_defer_free() is a big win."

> Is there evidence that skb_attempt_defer_free provides meaningful
> performance benefits for Homa without GRO support? The concern appears
> unaddressed in subsequent patch versions.

I implemented skb_attempt_defer_free at Paolo's suggestion. My
performance measurements indicated that it does not reduce overall CPU
usage (it actually increases it slightly), but it moves CPU time off
the critical path for receiving long messages. Thus I'm inclined to
retain it. Maybe the performance benefits will increase when I
upstream Homa's GRO support?

-John-

^ permalink raw reply

* [BUG] net: caif: potential UAF in cfusbl_device_notify()
From: Wxm-233 @ 2026-04-15 23:35 UTC (permalink / raw)
  To: kuba, marcel, netdev, linux-bluetooth; +Cc: horms, linux-kernel

Hello,

we hit a KASAN slab-use-after-free read in
cfusbl_device_notify() under syzkaller-style workloads.

We reproduced this on 7.0.0-rc2-g0031c06807cf.

The crash happens in the NETDEV_REGISTER notifier path:

  bnep_add_connection()
    -> register_netdev()
      -> register_netdevice()
        -> call_netdevice_notifiers(NETDEV_REGISTER, dev)
          -> cfusbl_device_notify()

The faulting access is the parent/driver-name check in
net/caif/caif_usb.c:

  if (!(dev->dev.parent && dev->dev.parent->driver &&
        strcmp(dev->dev.parent->driver->name, "cdc_ncm") == 0))
          return 0;

From the report, the freed object appears to be the device behind
dev->dev.parent, not the net_device itself.

My current reading is that this is a cross-subsystem lifetime issue:

- the CAIF USB code registers a global netdevice notifier and
  inspects every new netdev
- BNEP sets dev->dev.parent via
  SET_NETDEV_DEV(dev, bnep_get_device(s))
- SET_NETDEV_DEV() only stores a raw pointer
- under concurrent Bluetooth teardown / suspend-related activity,
  that parent device can be released before cfusbl_device_notify()
  dereferences dev->dev.parent->driver

So this does not look like a CAIF modem functional regression in
ordinary use, but it does look like a real UAF in the notifier-side
filtering logic.

The trigger is clearly fuzzing-oriented and cross-subsystem, so I
would treat it as low urgency, but it still seems worth fixing in
net/caif/caif_usb.c, likely by avoiding naked parent dereferences in
the notifier path or by using a safer or lower-scope identification
path for relevant devices.

If useful, I can also send the full KASAN report separately.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] net: stmmac: enable RPS and RBU interrupts
From: Russell King (Oracle) @ 2026-04-16  0:02 UTC (permalink / raw)
  To: Sam Edwards
  Cc: Jakub Kicinski, Andrew Lunn, Alexandre Torgue, Andrew Lunn,
	David S. Miller, Eric Dumazet,
	moderated list:BROADCOM BCM2711/BCM2835 ARM ARCHITECTURE,
	linux-stm32, Linux Network Development Mailing List, Paolo Abeni
In-Reply-To: <CAH5Ym4j3GePEMEMmg1Z27gYfQ0N8Sc1BMW1rnvNZ4aLQ+cfFyQ@mail.gmail.com>

On Wed, Apr 15, 2026 at 01:50:53PM -0700, Sam Edwards wrote:
> On Wed, Apr 15, 2026 at 12:37 PM Russell King (Oracle)
> <linux@armlinux.org.uk> wrote:
> >
> > It's not a question about how I define RBU - this is defined by Synopsys
> > and I'm using it *exactly* that way as stated in the documentation.
> >
> > "This bit indicates that the host owns the Next Descriptor in the
> > Receive List and the DMA cannot acquire it. The Receive Process is
> > suspended. ... This bit is set only when the previous Receive
> > Descriptor is owned by the DMA."
> >
> > In other words, DMA has processed the previous receive descriptor which
> > _was_ owned by the hardware, written back to clear the OWN bit, and
> > then fetches the next descriptor and finds that the OWN bit is also
> > clear.
> 
> I'm only trying to leave open the possibility that the Synopsys
> technical writer and the hardware implementation team weren't
> communicating clearly. We already have a situation where RPS isn't
> behaving as documented (even if that's likely just hardware
> misconfiguration), so while I'm currently pretty sure RBU carries no
> other (actual) meaning than "DMA caught up to OWN=0," I'm only about
> 75% confident.

It doesn't make sense for RPS to be set though. RPS is "Receive Process
Stopped" and it's documented as being raised when the receive process
enters the stopped state.

If we look at the DMA Debug Status 0 register at 0x100c, then this
gives us a four bit bitfield for channels 0, 1 and 2. Further channels
are in 0x1010. I've added code to dump these when RBU occurs:

dwc-eth-dwmac 2490000.ethernet eth0: debug status: 0x00006400 0x00000000

bits 11:8 are RPS0, which indiciates that the DMA channel 0 receive
process state is "Suspended (Rx Descriptor Unavailable)". If this were
0, then it would be "Stopped (Reset or Stop Receive Command issued)".

So, RPS isn't being raised because the process state isn't entering
the stopped state, which makes sense - because we haven't issued a
stop command, nor have we caused a reset, and the documented recovery
from this condition is to merely advance the tail pointer, rather than
issuing a command to re-start the receive process.

When this is done (because stmmac_rx() continues to periodically run
because of NAPI) RPS0 does change back to 3 "Running (Waiting for Rx
packet)" but it seems that although there are packets waiting to be
written out, that never happens (the Queue 0 Receive Debug register
indicates that there are packets in the receive queue, the receive
queue fill level is above the flow control activate threshold, and
the MAC itself hammers the network with pause frames as a result.)

Thus, I think that the fact that RPS isn't being signalled is entirely
reasonable and consistent with the available documentation.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

^ permalink raw reply

* Re: [PATCH bpf v4 0/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
From: patchwork-bot+netdevbpf @ 2026-04-16  0:30 UTC (permalink / raw)
  To: Michal Luczaj
  Cc: john.fastabend, jakub, edumazet, kuniyu, pabeni, willemb, davem,
	kuba, horms, yhs, andrii, ast, daniel, martin.lau, eddyz87, song,
	yonghong.song, kpsingh, sdf, haoluo, jolsa, shuah, cong.wang,
	netdev, bpf, linux-kernel, linux-kselftest, jiayuan.chen,
	yimingqian591
In-Reply-To: <20260414-unix-proto-update-null-ptr-deref-v4-0-2af6fe97918e@rbox.co>

Hello:

This series was applied to bpf/bpf.git (master)
by Martin KaFai Lau <martin.lau@kernel.org>:

On Tue, 14 Apr 2026 16:13:14 +0200 you wrote:
> Updating sockmap/sockhash using a unix sock races unix_stream_connect():
> when sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED),
> unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL.
> 
> Signed-off-by: Michal Luczaj <mhal@rbox.co>
> ---
> Changes in v4:
> - Circle back to v1 approach
> - More details in commit messages [Martin]
> - Make unix iter take the state lock [Kaniyuki]
> - Link to v3: https://lore.kernel.org/r/20260306-unix-proto-update-null-ptr-deref-v3-0-2f0c7410c523@rbox.co
> 
> [...]

Here is the summary with links:
  - [bpf,v4,1/5] bpf, sockmap: Annotate af_unix sock::sk_state data-races
    https://git.kernel.org/bpf/bpf/c/a25566084e39
  - [bpf,v4,2/5] bpf, sockmap: Fix af_unix iter deadlock
    https://git.kernel.org/bpf/bpf/c/4d328dd69538
  - [bpf,v4,3/5] selftests/bpf: Extend bpf_iter_unix to attempt deadlocking
    https://git.kernel.org/bpf/bpf/c/997b8483d44c
  - [bpf,v4,4/5] bpf, sockmap: Fix af_unix null-ptr-deref in proto update
    https://git.kernel.org/bpf/bpf/c/dca38b7734d2
  - [bpf,v4,5/5] bpf, sockmap: Take state lock for af_unix iter
    https://git.kernel.org/bpf/bpf/c/64c2f93fc325

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox