Netdev List


* Re: [PATCH bpf v3 1/2] bpf, sockmap: fix use-after-free when the stream parser resizes the skb
From: John Fastabend @ 2026-06-18 18:01 UTC
  To: Jiayuan Chen; +Cc: netdev, bpf, linux-kernel, Jakub Kicinski, Sechang Lim
In-Reply-To: <34f330b8-60d2-4647-a6b4-a5b001c3715d@linux.dev>

On Thu, Jun 18, 2026 at 07:56:34PM +0800, Jiayuan Chen wrote:
>
>On 6/18/26 6:27 PM, Sechang Lim wrote:
>>sk_psock_strp_parse() runs the BPF_PROG_TYPE_SK_SKB stream-parser program
>>to find the length of the next message. strparser assembles a message out
>>of several received skbs by chaining them onto the head's frag_list and
>>recording where to append the next one in strp->skb_nextp:
>>
>>	*strp->skb_nextp = skb;
>>	strp->skb_nextp = &skb->next;
>>
>>and then calls the parser on the head:
>>
>>	len = (*strp->cb.parse_msg)(strp, head);
>
>[...]
>
>>unaffected and may still modify the skb.
>>
>>Fixes: 8a31db561566 ("bpf: add access to sock fields and pkt data from sk_skb programs")
>
>Is the Fixes tag correct ?
>
>Anyway, I don't think this patch is a fix; it's more of a hardening. 
>So no Fixes tag needed, IMO.
>
>
>>Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
>>---

[...]

>
>
>CI failed:
>https://github.com/kernel-patches/bpf/actions/runs/27754218839/job/82113319982
>   Failed stream parser bpf prog attach
>
>Hi John
>I noticed that bpf_skb_pull_data was added to the skmsg test:
>https://github.com/torvalds/linux/commit/82a8616889d506cb690cfc0afb2ccadda120461d
>
>Can we drop bpf_skb_pull_data from the parser prog (sockmap_parse_prog.c)?
>And are there any scenarios where we need to modify the skb len when
>using strparser?

We should never modify the skb from strparser. Just remove any tests
that do this and state that it's not safe. We haven't used strparser
progs for a long time anyway.
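
For readers following the thread, here is a minimal userspace sketch of
the append idiom quoted above (*strp->skb_nextp = skb; strp->skb_nextp =
&skb->next). The names struct node, struct chain and the chain_*()
helpers are hypothetical stand-ins, not kernel API, and the sketch uses
a flat list rather than the head's frag_list. It only illustrates that
the saved pointer lives inside the most recently appended buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: just a link and a length. */
struct node {
	struct node *next;
	int len;
};

/* Hypothetical stand-in for the strparser state. */
struct chain {
	struct node *head;	/* first buffer of the message */
	struct node **nextp;	/* slot where the next buffer gets linked */
};

static void chain_init(struct chain *c)
{
	c->head = NULL;
	c->nextp = &c->head;
}

/* Same two-step append as the quoted snippet. */
static void chain_append(struct chain *c, struct node *n)
{
	assert(n != NULL);
	n->next = NULL;
	*c->nextp = n;		/* link n into the saved slot */
	c->nextp = &n->next;	/* saved slot now lives inside n */
}

/* Walk the chain and sum the buffer lengths. */
static int chain_total_len(const struct chain *c)
{
	const struct node *n;
	int total = 0;

	for (n = c->head; n; n = n->next)
		total += n->len;
	return total;
}
```

After two appends, c.nextp points at the next field of the second node.
If anything frees or replaces that node behind the chain's back, the
saved slot dangles, which is why it seems reasonable to treat the
buffers as read-only while they are being assembled.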


* [PATCH net] eth: bnxt: improve the timing of stats
From: Jakub Kicinski @ 2026-06-18 18:13 UTC
  To: davem
  Cc: netdev, edumazet, pabeni, andrew+netdev, horms, Jakub Kicinski,
	michael.chan, pavan.chebbi

Kernel selftests wait 1.25x the promised stats refresh time
(as read from ethtool -c). bnxt reports 1 sec by default, but
the stats update process has two steps. First the device DMAs
the new values, then the service task folds them into the
full-width SW counters. So the worst-case delay is actually 2x.

Note that there is bnxt_hwrm_port_qstats() but the qstats here
probably stands for "query stats", and the command itself
updates detailed MAC-level stats (MAC errors, RMON histogram etc.)
It must not be updating the stats we care about, otherwise
update would be synchronous, and this patch would make no
difference (and it does help).

The problem of stale stats impacts not only tests but real workloads
which monitor egress bandwidth of a NIC. The inaccuracy causes double
counting in the next cycle and spurious overload alarms.

Try to read from the DMA buffer more aggressively, to mitigate
timing issues between DMA and service task. The SW update should
be cheap.

Fixes: 51f307856b60 ("bnxt_en: Allow statistics DMA to be configurable using ethtool -C.")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
---
CC: michael.chan@broadcom.com
CC: pavan.chebbi@broadcom.com

With this patch I had 50 clean runs of ntuple.py in a row.
Previously it'd fail within 5 runs at most.

Hopefully this is good enough. In the past I sent an RFC to
convert the driver to use SW stats for everything, but that
felt a little drastic.
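
To make the timing argument concrete, here is a small userspace sketch
of the worst-case arithmetic. It is a simplified model, not driver code:
it assumes the device DMAs exactly once per stats_coal_ticks, ignores
jiffies rounding and DMA timer granularity, and all function names are
made up for illustration.

```c
#include <assert.h>
#include <limits.h>

/* Before the patch: up to one period waiting for the DMA, plus up to
 * one more period waiting for the service task to accumulate.
 */
static unsigned long worst_case_before_usecs(unsigned long coal_ticks)
{
	assert(coal_ticks <= ULONG_MAX / 2);
	return 2 * coal_ticks;
}

/* Staleness threshold used by the read path: 20% of the period. */
static unsigned long stale_threshold_usecs(unsigned long coal_ticks)
{
	return coal_ticks / 5;
}

/* After the patch (under the model above): the DMA buffer is at most
 * one period old, and the SW copy lags it by at most the threshold.
 */
static unsigned long worst_case_after_usecs(unsigned long coal_ticks)
{
	assert(coal_ticks <= ULONG_MAX / 2);
	return coal_ticks + stale_threshold_usecs(coal_ticks);
}

/* The selftests wait 1.25x the advertised refresh time. */
static unsigned long selftest_wait_usecs(unsigned long coal_ticks)
{
	assert(coal_ticks <= ULONG_MAX / 2);
	return coal_ticks + coal_ticks / 4;
}
```

With the default of 1000000 usec this gives 2000000 usec before the
patch and 1200000 usec after, against a selftest wait of 1250000 usec.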
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  4 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 36 +++++++++++++++++++
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |  8 +++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 6d312259f852..aab6e88c3ca1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2620,6 +2620,9 @@ struct bnxt {
 #define BNXT_MIN_STATS_COAL_TICKS	  250000
 #define BNXT_MAX_STATS_COAL_TICKS	 1000000
 
+	spinlock_t		stats_lock;
+	unsigned long		stats_updated_jiffies;
+
 	struct work_struct	sp_task;
 	unsigned long		sp_event;
 #define BNXT_RX_NTP_FLTR_SP_EVENT	1
@@ -3027,6 +3030,7 @@ void bnxt_reenable_sriov(struct bnxt *bp);
 void bnxt_close_nic(struct bnxt *, bool, bool);
 void bnxt_get_ring_drv_stats(struct bnxt *bp,
 			     struct bnxt_total_ring_drv_stats *stats);
+void bnxt_sync_stats(struct bnxt *bp);
 bool bnxt_rfs_capable(struct bnxt *bp, bool new_rss_ctx);
 int bnxt_dbg_hwrm_rd_reg(struct bnxt *bp, u32 reg_off, u16 num_words,
 			 u32 *reg_buf);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 055e93a417b6..25462f854478 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -10575,6 +10575,35 @@ static void bnxt_accumulate_all_stats(struct bnxt *bp)
 	}
 }
 
+/* Re-accumulate stats from DMA buffers if stale.
+ * uAPIs for reading sw_stats should call this first.
+ *
+ * We promise user space update frequency of bp->stats_coal_ticks but
+ * the update is a two step process - first device updates the DMA buffer,
+ * then we have to update from that buffer to driver stats in the service work.
+ * Worst case we would be 2x off from the desired frequency.
+ * Sync the stats sooner, if stale. The 20% threshold was chosen arbitrarily.
+ *
+ * Ideally we would split the user-configured time into two portions,
+ * i.e. also lower the DMA period by the 20%. But the DMA timer seems to have
+ * too coarse granularity to play such tricks.
+ */
+void bnxt_sync_stats(struct bnxt *bp)
+{
+	unsigned long stale;
+
+	if (!netif_running(bp->dev) || !bp->stats_coal_ticks)
+		return;
+
+	spin_lock(&bp->stats_lock);
+	stale = usecs_to_jiffies(bp->stats_coal_ticks / 5);
+	if (time_after_eq(jiffies, bp->stats_updated_jiffies + stale)) {
+		bnxt_accumulate_all_stats(bp);
+		bp->stats_updated_jiffies = jiffies;
+	}
+	spin_unlock(&bp->stats_lock);
+}
+
 static int bnxt_hwrm_port_qstats(struct bnxt *bp, u8 flags)
 {
 	struct hwrm_port_qstats_input *req;
@@ -13577,6 +13606,7 @@ bnxt_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 		return;
 	}
 
+	bnxt_sync_stats(bp);
 	bnxt_get_ring_stats(bp, stats);
 	bnxt_add_prev_stats(bp, stats);
 
@@ -14753,7 +14783,10 @@ static void bnxt_sp_task(struct work_struct *work)
 	if (test_and_clear_bit(BNXT_PERIODIC_STATS_SP_EVENT, &bp->sp_event)) {
 		bnxt_hwrm_port_qstats(bp, 0);
 		bnxt_hwrm_port_qstats_ext(bp, 0);
+		spin_lock(&bp->stats_lock);
 		bnxt_accumulate_all_stats(bp);
+		bp->stats_updated_jiffies = jiffies;
+		spin_unlock(&bp->stats_lock);
 	}
 
 	if (test_and_clear_bit(BNXT_LINK_CHNG_SP_EVENT, &bp->sp_event)) {
@@ -15488,6 +15521,7 @@ static int bnxt_init_board(struct pci_dev *pdev, struct net_device *dev)
 	INIT_DELAYED_WORK(&bp->fw_reset_task, bnxt_fw_reset_task);
 
 	spin_lock_init(&bp->ntp_fltr_lock);
+	spin_lock_init(&bp->stats_lock);
 #if BITS_PER_LONG == 32
 	spin_lock_init(&bp->db_lock);
 #endif
@@ -16056,6 +16090,7 @@ static void bnxt_get_queue_stats_rx(struct net_device *dev, int i,
 	if (!bp->bnapi)
 		return;
 
+	bnxt_sync_stats(bp);
 	cpr = &bp->bnapi[i]->cp_ring;
 	sw = cpr->stats.sw_stats;
 
@@ -16084,6 +16119,7 @@ static void bnxt_get_queue_stats_tx(struct net_device *dev, int i,
 	if (!bp->tx_ring)
 		return;
 
+	bnxt_sync_stats(bp);
 	bnapi = bp->tx_ring[bp->tx_ring_map[i]].bnapi;
 	sw = bnapi->cp_ring.stats.sw_stats;
 
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 56d74a3c24b7..835b54287579 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -606,6 +606,7 @@ static void bnxt_get_ethtool_stats(struct net_device *dev,
 		goto skip_ring_stats;
 	}
 
+	bnxt_sync_stats(bp);
 	tpa_stats = bnxt_get_num_tpa_ring_stats(bp);
 	for (i = 0; i < bp->cp_nr_rings; i++) {
 		struct bnxt_napi *bnapi = bp->bnapi[i];
@@ -3310,6 +3311,7 @@ static void bnxt_get_fec_stats(struct net_device *dev,
 	if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS_EXT))
 		return;
 
+	bnxt_sync_stats(bp);
 	rx = bp->rx_port_stats_ext.sw_stats;
 	fec_stats->corrected_bits.total =
 		*(rx + BNXT_RX_STATS_EXT_OFFSET(rx_corrected_bits));
@@ -3409,6 +3411,7 @@ static void bnxt_get_pause_stats(struct net_device *dev,
 	if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS))
 		return;
 
+	bnxt_sync_stats(bp);
 	rx = bp->port_stats.sw_stats;
 	tx = bp->port_stats.sw_stats + BNXT_TX_PORT_STATS_BYTE_OFFSET / 8;
 
@@ -5572,6 +5575,7 @@ static void bnxt_get_eth_phy_stats(struct net_device *dev,
 	if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS_EXT))
 		return;
 
+	bnxt_sync_stats(bp);
 	rx = bp->rx_port_stats_ext.sw_stats;
 	phy_stats->SymbolErrorDuringCarrier =
 		*(rx + BNXT_RX_STATS_EXT_OFFSET(rx_pcs_symbol_err));
@@ -5586,6 +5590,7 @@ static void bnxt_get_eth_mac_stats(struct net_device *dev,
 	if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS))
 		return;
 
+	bnxt_sync_stats(bp);
 	rx = bp->port_stats.sw_stats;
 	tx = bp->port_stats.sw_stats + BNXT_TX_PORT_STATS_BYTE_OFFSET / 8;
 
@@ -5610,6 +5615,7 @@ static void bnxt_get_eth_ctrl_stats(struct net_device *dev,
 	if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS))
 		return;
 
+	bnxt_sync_stats(bp);
 	rx = bp->port_stats.sw_stats;
 	ctrl_stats->MACControlFramesReceived =
 		BNXT_GET_RX_PORT_STATS64(rx, rx_ctrl_frames);
@@ -5639,6 +5645,7 @@ static void bnxt_get_rmon_stats(struct net_device *dev,
 	if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS))
 		return;
 
+	bnxt_sync_stats(bp);
 	rx = bp->port_stats.sw_stats;
 	tx = bp->port_stats.sw_stats + BNXT_TX_PORT_STATS_BYTE_OFFSET / 8;
 
@@ -5712,6 +5719,7 @@ static void bnxt_get_link_ext_stats(struct net_device *dev,
 	if (BNXT_VF(bp) || !(bp->flags & BNXT_FLAG_PORT_STATS_EXT))
 		return;
 
+	bnxt_sync_stats(bp);
 	rx = bp->rx_port_stats_ext.sw_stats;
 	stats->link_down_events =
 		*(rx + BNXT_RX_STATS_EXT_OFFSET(link_down_events));
-- 
2.54.0



* Re: [PATCH 1/2] fs: Add bpf_sock_read_xattr() kfunc to read socket xattrs
From: John Fastabend @ 2026-06-18 18:20 UTC
  To: Christian Brauner
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Alexander Viro, Jan Kara,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, linux-fsdevel,
	netdev, bpf, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa
In-Reply-To: <20260617-work-bpf-sock-xattr-v1-1-a1276f7c9da3@kernel.org>

On Wed, Jun 17, 2026 at 01:18:27PM +0200, Christian Brauner wrote:
>In c8db08110cbe ("Merge tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs")
>we added support for extended attributes for sockets. This comes in two
>flavors: sockfs and non-sockfs/filesystem sockets. Filesystem sockets
>are actual filesystem objects so reading xattrs must use dedicated fs
>helpers such as bpf_get_dentry_xattr() and bpf_get_file_xattr(). Those
>are inherently sleeping operations. Sockfs sockets on the other hand
>don't need to use sleeping operations as the underlying data structure
>is lockless. In addition, retrieval of sockfs extended attributes often
>happens from LSM hooks that only provide struct socket and it's
>completely nonsensical to grab a reference to a file, then force a
>sleeping operation to retrieve the xattr and drop the reference. We know
>that the sockfs file cannot go away while the LSM hook runs.

[...]

>
>Link: https://github.com/systemd/systemd/pull/40559 [1]
>Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
>---

Nice, this will simplify some of our socket tracking.

Reviewed-by: John Fastabend <john.fastabend@gmail.com>


* [PATCH v2 bpf-next 0/2] bpf: bpf_redirect_peer egress redirection
From: Jordan Rife @ 2026-06-18 18:20 UTC
  To: bpf
  Cc: Jordan Rife, netdev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Stanislav Fomichev,
	Jiayuan Chen, Paul Chaignon

We have several use cases where a pod injects traffic into the datapath
of another so that the traffic appears to have originated from that
pod. One such use case is a synthetic flow generator which injects
synthetic traffic into a pod's datapath to enable dynamic probing and
debugging. Another is a transparent proxy where connections originating
from one pod are redirected towards another which proxies that
connection. The new connection is bound to the IP of the original pod
using IP_TRANSPARENT and its traffic is injected into that pod's
datapath and handled as if it had originated there. This can be used for
mTLS, etc.

We use bpf_redirect(BPF_F_INGRESS) to direct traffic leaving the proxy,
flow generator, etc. towards the target pod, ensuring that eBPF programs
that are meant to intercept traffic leaving that pod are executed.
However, this doesn't work with netkit.

With netkit, an ingress redirection from proxy to workload skips eBPF
programs that are meant to intercept traffic leaving the pod, since they
reside on the netkit peer device. One workaround is to attach the
same program to both the netkit peer device and the TCX ingress hook for
the netkit pair's primary interface, but

a) This seems hacky and we need to be careful not to run the same
   program twice for the same skb in cases where we want to pass that
   traffic to the host stack.
b) We're trying to keep the proxy redirection / traffic injection
   systems as modular and as separate as possible from Cilium, the
   system that manages netkit setup and core eBPF programming.

It would be handy if instead we could redirect traffic directly from
one netkit peer device to another. This patch proposes an extension
to bpf_redirect_peer to allow us to do just that.

With this patch, the BPF_F_EGRESS flag tells bpf_redirect_peer to emit
the skb in the egress direction of the target interface's peer device.
While the main use case is netkit, I suppose you could also use this
mode with veth if, e.g., there were some eBPF programs attached to that
side of the veth pair that needed to intercept traffic.

 +---------------------------------------------------------------------+
 | +-------------------------+         6. bpf_redirect_neigh(eth0)     |
 | | pod (10.244.0.10)       |           ------------------------      |
 | |                         |          |                        |     |
 | |              +--------+ |          |      +---------+       |     |
 | | 1. packet -->|        | |          |      |         |       |     |
 | |    leaves ^  | netkit |<===========|======| netkit  |       |     |
 | |           |  | peer   |=======(eBPF)=====>| primary |       |     |
 | |           |  |        | |          |      |         |       |     |
 | |           |  +--------+ |          |      +---------+       |     |
 | |           |             |          | 2. bpf_redirect        v     |
 | +-----------|-------------+          |___________________   +-------|
 |             |                                            |  | eth0  |
 |             | 5. bpf_redirect_peer(BPF_F_EGRESS)         |  +-------|
 |             |________________________                    |          |
 | +-------------------------+          |                   |          |
 | | proxy (10.244.0.11)     |          |                   |          |
 | | IP_TRANSPARENT          |          |                   |          |
 | |              +--------+ |          |      +---------+  |          |
 | | 3. packet <--|        | |          |      |         |<--          |
 | |    enters    | netkit |<===========|======| netkit  |             |
 | |    [proxy]   | peer   |=======(eBPF)=====>| primary |             |
 | | 4. packet -->|        | |                 |         |             |
 | |    leaves    +--------+ |                 +---------+             |
 | |    sip=10.244.0.10      |                                         |
 | +-------------------------+                                         |
 +---------------------------------------------------------------------+

Using the proxy use case as an example, in step 5 we would redirect
traffic leaving the proxy towards the pod's peer device using
bpf_redirect_peer(BPF_F_EGRESS).
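
As a rough illustration of the helper-side flag handling proposed in
patch 1/2, here is a userspace model. It is only a sketch:
struct model_redirect_info and model_redirect_peer() are hypothetical
stand-ins for the kernel's redirect state and helper, MODEL_F_PEER is an
assumed placeholder for the kernel-internal BPF_F_PEER bit, and
BPF_F_EGRESS uses the value proposed in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* tc action codes as defined in the uapi headers. */
#define TC_ACT_SHOT		2
#define TC_ACT_REDIRECT		7

/* Value proposed by this series. */
#define BPF_F_EGRESS		(1ULL << 1)

/* Assumption: placeholder for the kernel-internal BPF_F_PEER bit. */
#define MODEL_F_PEER		(1ULL << 17)

/* Hypothetical, trimmed-down stand-in for the redirect state. */
struct model_redirect_info {
	uint64_t flags;
	uint32_t tgt_index;
};

/* Model of the helper: only 0 or BPF_F_EGRESS is accepted. */
static int model_redirect_peer(struct model_redirect_info *ri,
			       uint32_t ifindex, uint64_t flags)
{
	assert(ri != NULL);

	/* Any bit other than BPF_F_EGRESS is rejected. */
	if (flags & ~BPF_F_EGRESS)
		return TC_ACT_SHOT;

	/* Remember the direction alongside the peer marker. */
	ri->flags = MODEL_F_PEER | flags;
	ri->tgt_index = ifindex;
	return TC_ACT_REDIRECT;
}
```

In this model a call with flags == 0 keeps today's ingress-to-ingress
behaviour, BPF_F_EGRESS selects the new direction, and any other bit
(BPF_F_INGRESS included) is rejected without touching the saved state.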

As a bonus, since the skb doesn't have to go through the backlog queue
it can take full advantage of netkit's performance benefits. I set up a
test where outgoing iperf3 traffic is injected into the datapath of
another pod using either bpf_redirect_peer(BPF_F_EGRESS) or
bpf_redirect(BPF_F_INGRESS). I used Cilium's eBPF host routing mode
which skips the host stack and uses BPF redirect helpers to do all the
routing.

  (net.ipv4.tcp_congestion_control=cubic,mtu=1500,100GiB link,Cilium
   eBPF host routing mode)

BASELINE [bpf_redirect(BPF_F_INGRESS)]
  1. [iperf pod] ==bpf_redirect([pod b], BPF_F_INGRESS)==> [pod b]
  2. [pod b]     ==bpf_redirect_neigh([eth0])==>           eth0
  3. eth0        ==over network==>                         [host b]

  [ ID] Interval           Transfer     Bitrate         Retr
  [  5]   0.00-60.00  sec   231 GBytes  33.0 Gbits/sec  12060     sender
  [  5]   0.00-60.00  sec   230 GBytes  33.0 Gbits/sec            receiver

TEST [bpf_redirect_peer(BPF_F_EGRESS)]
  1. [iperf pod] ==bpf_redirect_peer([pod b], BPF_F_EGRESS)==> [pod b]
  2. [pod b]     ==bpf_redirect_neigh([eth0])==>               eth0
  3. eth0        ==over network==>                             [host b]

  [ ID] Interval           Transfer     Bitrate         Retr
  [  5]   0.00-60.00  sec   272 GBytes  38.9 Gbits/sec    0       sender
  [  5]   0.00-60.00  sec   272 GBytes  38.9 Gbits/sec            receiver

In this test, using bpf_redirect_peer(BPF_F_EGRESS) for the hop from
[iperf pod] to [pod b] led to ~18% more throughput compared to
bpf_redirect(BPF_F_INGRESS).
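
For reference, the ~18% figure follows directly from the two sender
bitrates above. A tiny sketch of the arithmetic, where percent_gain() is
just an illustrative helper name:

```c
#include <assert.h>

/* Relative improvement of test over baseline, in percent. */
static double percent_gain(double baseline, double test)
{
	assert(baseline > 0.0);
	return (test / baseline - 1.0) * 100.0;
}
```

Calling percent_gain(33.0, 38.9) with the bitrates in Gbits/sec yields
a value just under 18.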

CHANGES
=======
v1->v2: https://lore.kernel.org/bpf/20260613183424.1198073-1-jordan@jrife.io/
* Introduce and use BPF_F_EGRESS instead of BPF_F_INGRESS (Paul,
  Jiayuan).
    Overall opinion was that BPF_F_EGRESS was clearer, but it was
    acknowledged that this creates some inconsistencies with
    bpf_redirect where 0 means egress implicitly.
* Invert `skb->dev = dev;` and `dev_sw_netstats_rx_add` to make the
  diff cleaner.

Jordan Rife (2):
  bpf: Support BPF_F_EGRESS with bpf_redirect_peer
  selftests/bpf: Add tests for bpf_redirect_peer with BPF_F_EGRESS

 include/uapi/linux/bpf.h                      | 19 +++---
 net/core/filter.c                             | 12 ++--
 tools/include/uapi/linux/bpf.h                | 19 +++---
 .../selftests/bpf/prog_tests/tc_redirect.c    | 68 +++++++++++++++++++
 .../selftests/bpf/progs/test_tc_peer.c        | 22 ++++++
 5 files changed, 119 insertions(+), 21 deletions(-)

-- 
2.43.0


* [PATCH v2 bpf-next 1/2] bpf: Support BPF_F_EGRESS with bpf_redirect_peer
From: Jordan Rife @ 2026-06-18 18:20 UTC
  To: bpf
  Cc: Jordan Rife, netdev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Stanislav Fomichev,
	Jiayuan Chen, Paul Chaignon
In-Reply-To: <20260618182035.43811-1-jordan@jrife.io>

We have several use cases where a pod injects traffic into the datapath
of another so that the traffic appears to have originated from that
pod. One such use case is a synthetic flow generator which injects
synthetic traffic into a pod's datapath to enable dynamic probing and
debugging. Another is a transparent proxy where connections originating
from one pod are redirected towards another which proxies that
connection. The new connection is bound to the IP of the original pod
using IP_TRANSPARENT and its traffic is injected into that pod's
datapath and handled as if it had originated there. This can be used for
mTLS, etc.

We use bpf_redirect(BPF_F_INGRESS) to direct traffic leaving the proxy,
flow generator, etc. towards the target pod, ensuring that eBPF programs
that are meant to intercept traffic leaving that pod are executed.
However, this doesn't work with netkit.

With netkit, an ingress redirection from proxy to workload skips eBPF
programs that are meant to intercept traffic leaving the pod, since they
reside on the netkit peer device. One workaround is to attach the
same program to both the netkit peer device and the TCX ingress hook for
the netkit pair's primary interface, but

a) This seems hacky and we need to be careful not to run the same
   program twice for the same skb in cases where we want to pass that
   traffic to the host stack.
b) We're trying to keep the proxy redirection / traffic injection
   systems as modular and as separate as possible from Cilium, the
   system that manages netkit setup and core eBPF programming.

It would be handy if instead we could redirect traffic directly from
one netkit peer device to another. This patch proposes an extension
to bpf_redirect_peer to allow us to do just that.

With this patch, the BPF_F_EGRESS flag tells bpf_redirect_peer to emit
the skb in the egress direction of the target interface's peer device.
While the main use case is netkit, I suppose you could also use this
mode with veth if, e.g., there were some eBPF programs attached to that
side of the veth pair that needed to intercept traffic.

 +---------------------------------------------------------------------+
 | +-------------------------+         6. bpf_redirect_neigh(eth0)     |
 | | pod (10.244.0.10)       |           ------------------------      |
 | |                         |          |                        |     |
 | |              +--------+ |          |      +---------+       |     |
 | | 1. packet -->|        | |          |      |         |       |     |
 | |    leaves ^  | netkit |<===========|======| netkit  |       |     |
 | |           |  | peer   |=======(eBPF)=====>| primary |       |     |
 | |           |  |        | |          |      |         |       |     |
 | |           |  +--------+ |          |      +---------+       |     |
 | |           |             |          | 2. bpf_redirect        v     |
 | +-----------|-------------+          |___________________   +-------|
 |             |                                            |  | eth0  |
 |             | 5. bpf_redirect_peer(BPF_F_EGRESS)         |  +-------|
 |             |________________________                    |          |
 | +-------------------------+          |                   |          |
 | | proxy (10.244.0.11)     |          |                   |          |
 | | IP_TRANSPARENT          |          |                   |          |
 | |              +--------+ |          |      +---------+  |          |
 | | 3. packet <--|        | |          |      |         |<--          |
 | |    enters    | netkit |<===========|======| netkit  |             |
 | |    [proxy]   | peer   |=======(eBPF)=====>| primary |             |
 | | 4. packet -->|        | |                 |         |             |
 | |    leaves    +--------+ |                 +---------+             |
 | |    sip=10.244.0.10      |                                         |
 | +-------------------------+                                         |
 +---------------------------------------------------------------------+

Using the proxy use case as an example, in step 5 we would redirect
traffic leaving the proxy towards the pod's peer device using
bpf_redirect_peer(BPF_F_EGRESS).

As a bonus, since the skb doesn't have to go through the backlog queue
it can take full advantage of netkit's performance benefits. I set up a
test where outgoing iperf3 traffic is injected into the datapath of
another pod using either bpf_redirect_peer(BPF_F_EGRESS) or
bpf_redirect(BPF_F_INGRESS). I used Cilium's eBPF host routing mode
which skips the host stack and uses BPF redirect helpers to do all the
routing.

  (net.ipv4.tcp_congestion_control=cubic,mtu=1500,100GiB link,Cilium
   eBPF host routing mode)

BASELINE [bpf_redirect(BPF_F_INGRESS)]
  1. [iperf pod] ==bpf_redirect([pod b], BPF_F_INGRESS)==> [pod b]
  2. [pod b]     ==bpf_redirect_neigh([eth0])==>           eth0
  3. eth0        ==over network==>                         [host b]

  [ ID] Interval           Transfer     Bitrate         Retr
  [  5]   0.00-60.00  sec   231 GBytes  33.0 Gbits/sec  12060     sender
  [  5]   0.00-60.00  sec   230 GBytes  33.0 Gbits/sec            receiver

TEST [bpf_redirect_peer(BPF_F_EGRESS)]
  1. [iperf pod] ==bpf_redirect_peer([pod b], BPF_F_EGRESS)==> [pod b]
  2. [pod b]     ==bpf_redirect_neigh([eth0])==>               eth0
  3. eth0        ==over network==>                             [host b]

  [ ID] Interval           Transfer     Bitrate         Retr
  [  5]   0.00-60.00  sec   272 GBytes  38.9 Gbits/sec    0       sender
  [  5]   0.00-60.00  sec   272 GBytes  38.9 Gbits/sec            receiver

In this test, using bpf_redirect_peer(BPF_F_EGRESS) for the hop from
[iperf pod] to [pod b] led to ~18% more throughput compared to
bpf_redirect(BPF_F_INGRESS).

Signed-off-by: Jordan Rife <jordan@jrife.io>
---
 include/uapi/linux/bpf.h       | 19 +++++++++++--------
 net/core/filter.c              | 12 +++++++-----
 tools/include/uapi/linux/bpf.h | 19 +++++++++++--------
 3 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 89b36de5fdbb..c91b5a4bda03 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5079,17 +5079,19 @@ union bpf_attr {
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*.
  * 		This helper is somewhat similar to **bpf_redirect**\ (), except
- * 		that the redirection happens to the *ifindex*' peer device and
- * 		the netns switch takes place from ingress to ingress without
- * 		going through the CPU's backlog queue.
+ * 		that the redirection happens to the *ifindex*' peer device. If
+ * 		*flags* is 0, the netns switch takes place from ingress to
+ * 		ingress without going through the CPU's backlog queue. If the
+ * 		**BPF_F_EGRESS** flag is provided then redirection happens in
+ * 		the egress direction of the peer device.
  *
  * 		*skb*\ **->mark** and *skb*\ **->tstamp** are not cleared during
  * 		the netns switch.
  *
- * 		The *flags* argument is reserved and must be 0. The helper is
- * 		currently only supported for tc BPF program types at the
- * 		ingress hook and for veth and netkit target device types. The
- * 		peer device must reside in a different network namespace.
+ * 		If the *flags* argument is 0, the helper is currently only
+ * 		supported for tc BPF program types at the ingress hook and for
+ * 		veth and netkit target device types. The peer device must reside
+ * 		in a different network namespace.
  * 	Return
  * 		The helper returns **TC_ACT_REDIRECT** on success or
  * 		**TC_ACT_SHOT** on error.
@@ -6336,9 +6338,10 @@ enum {
 /* Flags for bpf_redirect and bpf_redirect_map helpers */
 enum {
 	BPF_F_INGRESS		= (1ULL << 0), /* used for skb path */
+	BPF_F_EGRESS		= (1ULL << 1), /* used for skb path */
 	BPF_F_BROADCAST		= (1ULL << 3), /* used for XDP path */
 	BPF_F_EXCLUDE_INGRESS	= (1ULL << 4), /* used for XDP path */
-#define BPF_F_REDIRECT_FLAGS (BPF_F_INGRESS | BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS)
+#define BPF_F_REDIRECT_FLAGS (BPF_F_INGRESS | BPF_F_EGRESS | BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS)
 };
 
 #define __bpf_md_ptr(type, name)	\
diff --git a/net/core/filter.c b/net/core/filter.c
index 2e96b4b847ce..ce2ef5d8ae44 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2529,16 +2529,18 @@ int skb_do_redirect(struct sk_buff *skb)
 	if (unlikely(!dev))
 		goto out_drop;
 	if (flags & BPF_F_PEER) {
-		if (unlikely(!skb_at_tc_ingress(skb)))
-			goto out_drop;
 		dev = skb_get_peer_dev(dev);
 		if (unlikely(!dev ||
 			     !(dev->flags & IFF_UP) ||
 			     net_eq(net, dev_net(dev))))
 			goto out_drop;
+		skb_scrub_packet(skb, false);
+		if (flags & BPF_F_EGRESS)
+			return __bpf_redirect(skb, dev, 0);
+		if (unlikely(!skb_at_tc_ingress(skb)))
+			goto out_drop;
 		skb->dev = dev;
 		dev_sw_netstats_rx_add(dev, skb->len);
-		skb_scrub_packet(skb, false);
 		return -EAGAIN;
 	}
 	return flags & BPF_F_NEIGH ?
@@ -2575,10 +2577,10 @@ BPF_CALL_2(bpf_redirect_peer, u32, ifindex, u64, flags)
 {
 	struct bpf_redirect_info *ri = bpf_net_ctx_get_ri();
 
-	if (unlikely(flags))
+	if (unlikely(flags & ~BPF_F_EGRESS))
 		return TC_ACT_SHOT;
 
-	ri->flags = BPF_F_PEER;
+	ri->flags = BPF_F_PEER | flags;
 	ri->tgt_index = ifindex;
 
 	return TC_ACT_REDIRECT;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 89b36de5fdbb..c91b5a4bda03 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5079,17 +5079,19 @@ union bpf_attr {
  * 	Description
  * 		Redirect the packet to another net device of index *ifindex*.
  * 		This helper is somewhat similar to **bpf_redirect**\ (), except
- * 		that the redirection happens to the *ifindex*' peer device and
- * 		the netns switch takes place from ingress to ingress without
- * 		going through the CPU's backlog queue.
+ * 		that the redirection happens to the *ifindex*' peer device. If
+ * 		*flags* is 0, the netns switch takes place from ingress to
+ * 		ingress without going through the CPU's backlog queue. If the
+ * 		**BPF_F_EGRESS** flag is provided then redirection happens in
+ * 		the egress direction of the peer device.
  *
  * 		*skb*\ **->mark** and *skb*\ **->tstamp** are not cleared during
  * 		the netns switch.
  *
- * 		The *flags* argument is reserved and must be 0. The helper is
- * 		currently only supported for tc BPF program types at the
- * 		ingress hook and for veth and netkit target device types. The
- * 		peer device must reside in a different network namespace.
+ * 		If the *flags* argument is 0, the helper is currently only
+ * 		supported for tc BPF program types at the ingress hook and for
+ * 		veth and netkit target device types. The peer device must reside
+ * 		in a different network namespace.
  * 	Return
  * 		The helper returns **TC_ACT_REDIRECT** on success or
  * 		**TC_ACT_SHOT** on error.
@@ -6336,9 +6338,10 @@ enum {
 /* Flags for bpf_redirect and bpf_redirect_map helpers */
 enum {
 	BPF_F_INGRESS		= (1ULL << 0), /* used for skb path */
+	BPF_F_EGRESS		= (1ULL << 1), /* used for skb path */
 	BPF_F_BROADCAST		= (1ULL << 3), /* used for XDP path */
 	BPF_F_EXCLUDE_INGRESS	= (1ULL << 4), /* used for XDP path */
-#define BPF_F_REDIRECT_FLAGS (BPF_F_INGRESS | BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS)
+#define BPF_F_REDIRECT_FLAGS (BPF_F_INGRESS | BPF_F_EGRESS | BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS)
 };
 
 #define __bpf_md_ptr(type, name)	\
-- 
2.43.0



* [PATCH v2 bpf-next 2/2] selftests/bpf: Add tests for bpf_redirect_peer with BPF_F_EGRESS
From: Jordan Rife @ 2026-06-18 18:20 UTC
  To: bpf
  Cc: Jordan Rife, netdev, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Stanislav Fomichev,
	Jiayuan Chen, Paul Chaignon
In-Reply-To: <20260618182035.43811-1-jordan@jrife.io>

Extend redirect tests to cover bpf_redirect_peer(BPF_F_EGRESS). SRC
redirects to DST using bpf_redirect_peer(BPF_F_EGRESS), then traffic is
hairpinned into DST using bpf_redirect.

Signed-off-by: Jordan Rife <jordan@jrife.io>
---
 .../selftests/bpf/prog_tests/tc_redirect.c    | 68 +++++++++++++++++++
 .../selftests/bpf/progs/test_tc_peer.c        | 22 ++++++
 2 files changed, 90 insertions(+)

diff --git a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
index 64fbda082309..af8968b89ad7 100644
--- a/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
+++ b/tools/testing/selftests/bpf/prog_tests/tc_redirect.c
@@ -192,6 +192,8 @@ static int create_netkit(int mode, char *prim, char *peer)
 	req.n.nlmsg_len += sizeof(struct ifinfomsg);
 	addattr_l(&req.n, sizeof(req), IFLA_IFNAME, peer, strlen(peer));
 	addattr_nest_end(&req.n, peer_info);
+	addattr32(&req.n, sizeof(req), IFLA_NETKIT_SCRUB,
+		  NETKIT_SCRUB_NONE);
 	addattr_nest_end(&req.n, data);
 	addattr_nest_end(&req.n, linkinfo);
 
@@ -405,6 +407,24 @@ static int netns_load_bpf(const struct bpf_program *src_prog,
 	return -1;
 }
 
+static struct bpf_link *netns_attach_nk(const char *ns, int ifindex,
+					struct bpf_program *prog)
+{
+	LIBBPF_OPTS(bpf_netkit_opts, optl);
+	struct nstoken *nstoken = NULL;
+	struct bpf_link *link = NULL;
+
+	nstoken = open_netns(ns);
+	if (!ASSERT_OK_PTR(nstoken, "setns"))
+		goto cleanup;
+
+	link = bpf_program__attach_netkit(prog, ifindex, &optl);
+cleanup:
+	if (nstoken)
+		close_netns(nstoken);
+	return link;
+}
+
 static void test_tcp(int family, const char *addr, __u16 port)
 {
 	int listen_fd = -1, accept_fd = -1, client_fd = -1;
@@ -1082,6 +1102,53 @@ static void test_tc_redirect_peer(struct netns_setup_result *setup_result)
 	close_netns(nstoken);
 }
 
+static void test_tc_redirect_peer_ing(struct netns_setup_result *setup_result)
+{
+	struct test_tc_peer *skel;
+	struct nstoken *nstoken;
+	int err;
+
+	nstoken = open_netns(NS_FWD);
+	if (!ASSERT_OK_PTR(nstoken, "setns fwd"))
+		return;
+
+	skel = test_tc_peer__open();
+	if (!ASSERT_OK_PTR(skel, "test_tc_peer__open"))
+		goto done;
+
+	skel->rodata->IFINDEX_SRC = setup_result->ifindex_src_fwd;
+	skel->rodata->IFINDEX_DST = setup_result->ifindex_dst_fwd;
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc_src_ing,
+		  BPF_NETKIT_PRIMARY), 0, "src_prog_attach_type");
+	ASSERT_EQ(bpf_program__set_expected_attach_type(skel->progs.tc_dst_ing,
+		  BPF_NETKIT_PRIMARY), 0, "dst_prog_attach_type");
+
+	err = test_tc_peer__load(skel);
+	if (!ASSERT_OK(err, "test_tc_peer__load"))
+		goto done;
+
+	skel->links.tc_src_ing = netns_attach_nk(NS_SRC,
+						 setup_result->ifindex_src,
+						 skel->progs.tc_src_ing);
+	if (!ASSERT_OK_PTR(skel->links.tc_src_ing, "attach_src"))
+		goto done;
+	skel->links.tc_dst_ing = netns_attach_nk(NS_DST,
+						 setup_result->ifindex_dst,
+						 skel->progs.tc_dst_ing);
+	if (!ASSERT_OK_PTR(skel->links.tc_dst_ing, "attach_dst"))
+		goto done;
+
+	if (!ASSERT_OK(set_forwarding(false), "disable forwarding"))
+		goto done;
+
+	test_connectivity();
+
+done:
+	if (skel)
+		test_tc_peer__destroy(skel);
+	close_netns(nstoken);
+}
+
 static int tun_open(char *name)
 {
 	struct ifreq ifr;
@@ -1280,6 +1347,7 @@ static void *test_tc_redirect_run_tests(void *arg)
 
 	RUN_TEST(tc_redirect_peer, MODE_VETH);
 	RUN_TEST(tc_redirect_peer, MODE_NETKIT);
+	RUN_TEST(tc_redirect_peer_ing, MODE_NETKIT);
 	RUN_TEST(tc_redirect_peer_l3, MODE_VETH);
 	RUN_TEST(tc_redirect_peer_l3, MODE_NETKIT);
 	RUN_TEST(tc_redirect_neigh, MODE_VETH);
diff --git a/tools/testing/selftests/bpf/progs/test_tc_peer.c b/tools/testing/selftests/bpf/progs/test_tc_peer.c
index 365eacb5dc34..cfb9ef7f467c 100644
--- a/tools/testing/selftests/bpf/progs/test_tc_peer.c
+++ b/tools/testing/selftests/bpf/progs/test_tc_peer.c
@@ -34,6 +34,28 @@ int tc_src(struct __sk_buff *skb)
 	return bpf_redirect_peer(IFINDEX_DST, 0);
 }
 
+SEC("tc")
+int tc_dst_ing(struct __sk_buff *skb)
+{
+	if (!skb->mark) {
+		skb->mark = 0x1;
+		return bpf_redirect_peer(IFINDEX_SRC, BPF_F_EGRESS);
+	}
+
+	return bpf_redirect(IFINDEX_DST, 0);
+}
+
+SEC("tc")
+int tc_src_ing(struct __sk_buff *skb)
+{
+	if (!skb->mark) {
+		skb->mark = 0x1;
+		return bpf_redirect_peer(IFINDEX_DST, BPF_F_EGRESS);
+	}
+
+	return bpf_redirect(IFINDEX_SRC, 0);
+}
+
 SEC("tc")
 int tc_dst_l3(struct __sk_buff *skb)
 {
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH 2/2] selftests/bpf: Add test for bpf_sock_read_xattr() kfunc
From: John Fastabend @ 2026-06-18 18:24 UTC (permalink / raw)
  To: Christian Brauner
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Alexei Starovoitov, Daniel Borkmann, Alexander Viro, Jan Kara,
	Simon Horman, Kuniyuki Iwashima, Willem de Bruijn, linux-fsdevel,
	netdev, bpf, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
	Kumar Kartikeya Dwivedi, Song Liu, Yonghong Song, Jiri Olsa
In-Reply-To: <20260617-work-bpf-sock-xattr-v1-2-a1276f7c9da3@kernel.org>

On Wed, Jun 17, 2026 at 01:18:28PM +0200, Christian Brauner wrote:
>Add a selftest that loads the kfunc in sleepable and non-sleepable
>lsm/socket_connect programs and checks that a value set via fsetxattr()
>on a socket is read back.
>
>Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
>---

Reviewed-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* RE: [PATCH net] net: mana: Sync page pool RX frags for CPU
From: Haiyang Zhang @ 2026-06-18 18:38 UTC (permalink / raw)
  To: Dexuan Cui, KY Srinivasan, wei.liu@kernel.org, Dexuan Cui,
	Long Li, andrew+netdev@lunn.ch, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	Konstantin Taranov, horms@kernel.org, ernis@linux.microsoft.com,
	dipayanroy@linux.microsoft.com, kees@kernel.org,
	jacob.e.keller@intel.com, ssengar@linux.microsoft.com,
	linux-hyperv@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-rdma@vger.kernel.org
  Cc: stable@vger.kernel.org
In-Reply-To: <20260618035029.249361-1-decui@microsoft.com>



> -----Original Message-----
> From: Dexuan Cui <decui@microsoft.com>
> Sent: Wednesday, June 17, 2026 11:50 PM
> To: KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <DECUI@microsoft.com>; Long Li <longli@microsoft.com>;
> andrew+netdev@lunn.ch; davem@davemloft.net; edumazet@google.com;
> kuba@kernel.org; pabeni@redhat.com; Konstantin Taranov
> <kotaranov@microsoft.com>; horms@kernel.org; ernis@linux.microsoft.com;
> dipayanroy@linux.microsoft.com; kees@kernel.org; jacob.e.keller@intel.com;
> ssengar@linux.microsoft.com; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; linux-
> rdma@vger.kernel.org
> Cc: stable@vger.kernel.org
> Subject: [PATCH net] net: mana: Sync page pool RX frags for CPU
> 
> MANA allocates RX buffers from page pool fragments when frag_count is
> greater than 1. In that case the buffers remain DMA mapped by page pool
> and the RX completion path does not call dma_unmap_single(). As a result,
> the implicit sync-for-CPU normally performed by dma_unmap_single() is
> missing before the packet data is passed to the networking stack.
> 
> This breaks RX on configurations which require explicit DMA syncing, for
> example when booted with swiotlb=force.
> 
> Fix this by recording the page pool page and DMA sync offset when the RX
> buffer is allocated, and syncing the received packet range for CPU access
> before handing the RX buffer to the stack.
> 
> Also validate the packet length reported in the RX CQE before using it as
> a DMA sync length or passing it to skb processing. The CQE is supplied
> by the device and should not be blindly trusted by Confidential VMs.
> 
> Fixes: 730ff06d3f5c ("net: mana: Use page pool fragments for RX buffers
> instead of full pages to improve memory efficiency.")
> Cc: stable@vger.kernel.org
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> ---
>  drivers/net/ethernet/microsoft/mana/mana_en.c | 61 +++++++++++++++----
>  include/net/mana/mana.h                       |  8 +++
>  2 files changed, 57 insertions(+), 12 deletions(-)

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
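
For readers skimming the thread, the length validation described in the
quoted commit message can be illustrated outside the kernel. This is a
hypothetical userspace sketch, not the code from the patch: struct
fake_rx_buf, its fields and rx_sync_range() are invented for
illustration, and the only assumption taken from the message is that a
device-reported length must be checked before it is used as a sync
length.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical RX buffer descriptor: where the fragment sits inside the
 * page and how many bytes were reserved for packet data.
 */
struct fake_rx_buf {
	uint32_t frag_offset; /* offset of this fragment within the page */
	uint32_t headroom;    /* bytes skipped before packet data */
	uint32_t datasize;    /* bytes available for packet data */
};

/* Validate a device-reported length and compute the range that would be
 * synced for CPU access. Returns false if the length exceeds the buffer.
 */
static bool rx_sync_range(const struct fake_rx_buf *buf, uint32_t pktlen,
			  uint32_t *sync_off, uint32_t *sync_len)
{
	if (pktlen > buf->datasize)
		return false;

	*sync_off = buf->frag_offset + buf->headroom;
	*sync_len = pktlen;
	return true;
}
```

In a real driver the resulting offset and length would presumably feed
a sync-for-CPU call before the buffer is handed to the stack; the
sketch stops at computing the range.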



^ permalink raw reply

* Re: [PATCH bpf] bpf, sockmap: fix BUG_ON in skb_to_sgvec() on a resized ingress skb
From: John Fastabend @ 2026-06-18 19:00 UTC (permalink / raw)
  To: Sechang Lim
  Cc: Jakub Sitnicki, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Liu Jian, Daniel Borkmann, Cong Wang,
	netdev, bpf, linux-kernel
In-Reply-To: <20260613082442.3252576-1-rhkrqnwk98@gmail.com>

On Sat, Jun 13, 2026 at 08:24:31AM +0000, Sechang Lim wrote:
>sk_psock_skb_ingress_enqueue() maps a received message into a scatterlist
>with skb_to_sgvec(skb, sg, off, len). On the SK_SKB strparser path off and
>len come from the message's strp_msg (stm->offset and stm->full_len), set
>by the stream parser. strparser does not trim the skb, so normally
>skb->len - off >= full_len and len is within the skb.
>
>An SK_SKB verdict (or parser) program may call bpf_skb_change_tail() and
>shrink the skb after full_len was recorded. len then covers more bytes than
>the skb holds, __skb_to_sgvec() walks past the data and trips BUG_ON(len):

FWIW this only happens if the strparser program is also attached. If
there is no strparser program, stm->offset = 0 and stm->full_len will be
whatever the verdict program set. So there we would get

   len = skb->len; // then if it shrinks to skb->len - X it's ok.
   off = 0;


>
>  kernel BUG at net/core/skbuff.c:5286!
>  RIP: 0010:__skb_to_sgvec+0x78c/0x790
>  Call Trace:
>   <IRQ>
>   skb_to_sgvec+0x32/0x90
>   sk_psock_skb_ingress_enqueue+0x42/0x370
>   sk_psock_skb_ingress_self+0x1a8/0x200
>   sk_psock_verdict_apply+0x33c/0x360
>   sk_psock_strp_read+0x24a/0x370
>   __strp_recv+0x66d/0xda0
>   __tcp_read_sock+0x13d/0x590
>   tcp_bpf_strp_read_sock+0x195/0x320
>   strp_data_ready+0x267/0x340
>   sk_psock_strp_data_ready+0x1ce/0x350
>   tcp_data_queue+0x1364/0x2fd0
>   </IRQ>
>
>Clamp len to skb->len - off, and drop the message if off is already past
>the skb. sk_psock_skb_ingress_enqueue() is the only skb_to_sgvec() caller
>and both ingress paths (verdict SK_PASS and the backlog worker) reach it.
>The clamp is a no-op unless the skb was shrunk.
>
>Fixes: 7303524e04af ("skmsg: Lose offset info in sk_psock_skb_ingress")
>Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
>---
> net/core/skmsg.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
>diff --git a/net/core/skmsg.c b/net/core/skmsg.c
>index e1850caf1a71..2961178ebd1e 100644
>--- a/net/core/skmsg.c
>+++ b/net/core/skmsg.c
>@@ -550,6 +550,10 @@ static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
> {
> 	int num_sge, copied;
>
>+	if (off >= skb->len)
>+		return -EINVAL;
>+	len = min_t(u32, len, skb->len - off);
>+

This blocks the BUG but will break the socket. We should fix it at the
cause instead. Something like this (untested), although I've never
used the strparser program in any of our cases:

diff --git a/net/core/skmsg.c b/net/core/skmsg.c
index 2521b643fa05..95347f9d140c 100644
--- a/net/core/skmsg.c
+++ b/net/core/skmsg.c
@@ -542,6 +542,20 @@ static struct sk_msg *sk_psock_create_ingress_msg(struct sock *sk,
         return alloc_sk_msg(GFP_KERNEL);
  }

+static bool sk_psock_skb_strp_range(struct sk_buff *skb, u32 *off, u32 *len)
+{
+       struct strp_msg *stm = strp_msg(skb);
+
+       *off = stm->offset;
+       if (unlikely(*off >= skb->len)) {
+               *len = 0;
+               return false;
+       }
+
+       *len = min_t(u32, stm->full_len, skb->len - *off);
+       return *len;
+}
+
  static int sk_psock_skb_ingress_enqueue(struct sk_buff *skb,
                                         u32 off, u32 len,
                                         struct sk_psock *psock,
@@ -696,12 +710,8 @@ static void sk_psock_backlog(struct work_struct *work)
         while ((skb = skb_peek(&psock->ingress_skb))) {
                 len = skb->len;
                 off = 0;
-               if (skb_bpf_strparser(skb)) {
-                       struct strp_msg *stm = strp_msg(skb);
-
-                       off = stm->offset;
-                       len = stm->full_len;
-               }
+               if (skb_bpf_strparser(skb))
+                       sk_psock_skb_strp_range(skb, &off, &len);

                 /* Resume processing from previous partial state */
                 if (unlikely(state->len)) {
@@ -709,6 +719,9 @@ static void sk_psock_backlog(struct work_struct *work)
                         off = state->off;
                 }

+               if (unlikely(!len))
+                       goto out_free_skb;
+
                 ingress = skb_bpf_ingress(skb);
                 skb_bpf_redirect_clear(skb);
                 do {
@@ -737,7 +750,8 @@ static void sk_psock_backlog(struct work_struct *work)
                         len -= ret;
                 } while (len);

-               /* The entire skb sent, clear state */
+out_free_skb:
+               /* The skb has been handled, clear state. */
                 sk_psock_skb_state(psock, state, 0, 0);
                 skb = skb_dequeue(&psock->ingress_skb);
                 kfree_skb(skb);
@@ -1020,10 +1034,10 @@ static int sk_psock_verdict_apply(struct sk_psock *psock, struct sk_buff *skb,
                         len = skb->len;
                         off = 0;
                         if (skb_bpf_strparser(skb)) {
-                               struct strp_msg *stm = strp_msg(skb);
-
-                               off = stm->offset;
-                               len = stm->full_len;
+                               if (unlikely(!sk_psock_skb_strp_range(skb, &off, &len))) {
+                                       err = 0;
+                                       goto out_free;
+                               }
                         }
                         err = sk_psock_skb_ingress_self(psock, skb, off, len, false);
		}
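
The range computation in the untested diff above can be exercised
outside the kernel. The following is a minimal userspace sketch, not
kernel code: struct fake_skb and struct fake_strp_msg are hypothetical
stand-ins that carry only the fields the helper reads, and strp_range()
mirrors the proposed sk_psock_skb_strp_range() under the assumption
that the skb length, offset and full_len are plain unsigned 32-bit
values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures; only the fields
 * the helper reads are modelled.
 */
struct fake_strp_msg {
	uint32_t offset;   /* start of the message inside the skb */
	uint32_t full_len; /* length recorded by the stream parser */
};

struct fake_skb {
	uint32_t len;             /* current skb length, possibly shrunk */
	struct fake_strp_msg stm;
};

/* Clamp the recorded message range to what the skb still holds.
 * Returns false (with *len == 0) when nothing is left to enqueue.
 */
static bool strp_range(const struct fake_skb *skb, uint32_t *off, uint32_t *len)
{
	uint32_t avail;

	*off = skb->stm.offset;
	if (*off >= skb->len) {
		*len = 0;
		return false;
	}

	avail = skb->len - *off;
	*len = skb->stm.full_len < avail ? skb->stm.full_len : avail;
	return *len != 0;
}
```

With an unmodified skb the helper returns the parser's range unchanged.
After a shrink it returns only the bytes that remain, and once the
offset is past the end it reports that there is nothing to enqueue, so
the caller can free the skb instead of returning an error.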

^ permalink raw reply related

* Re: lan7801 looses VLAN Filter Table
From: Nicolai Buchwitz @ 2026-06-18 19:00 UTC (permalink / raw)
  To: Sven Schuchmann; +Cc: netdev
In-Reply-To: <BEZP281MB224501E38B30BFDC4BD3D364D9E32@BEZP281MB2245.DEUP281.PROD.OUTLOOK.COM>

Hi Sven

On 18.6.2026 17:18, Sven Schuchmann wrote:
> Hi,
> I have a problem with a lan7801 chip in Kernel 6.18. I configure 
> VLAN-ID (2) and an IP address.
> But if I disconnect and connect the network-cable several times at some 
> point no packets are
> received anymore. Without using VLAN this does not happen.
> 
> I tracked this down that sometimes the VLAN Filter table seems
> to get cleared. I hooked into the lan78xx.c driver to dump the vlan 
> table:
> 
> static void lan78xx_get_stats(struct net_device *netdev,
> 			      struct ethtool_stats *stats, u64 *data)
> {
> 	struct lan78xx_net *dev = netdev_priv(netdev);
> 	struct lan78xx_priv *pdata = (struct lan78xx_priv *)(dev->data[0]);
> 
> 	lan78xx_update_stats(dev);
> 
> 	for (int i = 0; i < 3; i++) {
> 		u32 buf;
> 		lan78xx_dataport_read(dev, DP_SEL_RSEL_VLAN_DA_, i, 1, &buf);
> 		if (pdata->vlan_table[i] != buf)
> 			netdev_err(dev->net, "VLAN TABLE %d: 0x%08x 0x%08x", i,
> 				   pdata->vlan_table[i], buf);
> 		else
> 			netdev_info(dev->net, "VLAN TABLE %d: 0x%08x 0x%08x", i,
> 				    pdata->vlan_table[i], buf);
> 	}
> 
> So I can "read out" the table if I do "ethtool -S" and see it in the 
> kernel log.
> Normally the output looks like this:
> VLAN TABLE 0: 0x00000005 0x00000005
> So the table looks as expected. The Local Filter table from pdata is 
> the same as in the chip itself.
> 
> But after some cable disconnects and connects I see this:
> VLAN TABLE 0: 0x00000005 0x00000000
> So the table got cleared or deleted and no packets on VLAN-ID 2 go
> through.
> I even can do this after I read out the table in lan78xx_get_stats():
> 
> 	lan78xx_dataport_write(dev, DP_SEL_RSEL_VLAN_DA_, 0,
> 				DP_SEL_VHF_VLAN_LEN, pdata->vlan_table);
> 
> ...and with this I can "fix" the table again from the ethtool and it 
> starts working again.
> 
> Has someone seen something like this or can point me to a direction 
> where
> I could reinit this table (I already tried at the end of 
> lan78xx_mac_link_up() without success...)

I was able to reproduce your issue on my hardware. AFAIU the vlan table 
is not restored after USB suspend.
I will send a patch shortly. Would be great if you can test it too.

> 
> Thanks!
> 
> Regards, Sven

Thanks
Nicolai
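
To make the symptom concrete, here is a small userspace model of a
shadow VLAN table and a hardware table that a reset wipes. It is a
sketch under assumptions, not driver code: struct fake_dev and all
function names are hypothetical, and the indexing (one bit per VLAN ID,
32 IDs per word) is assumed because it matches the dump quoted above.
Under that assumption 0x00000005 has bits 0 and 2 set, i.e. VLAN ID 2
plus VLAN ID 0, which the 8021q layer typically registers on its own.

```c
#include <stdint.h>
#include <string.h>

#define VLAN_TABLE_WORDS 128 /* assumed: 4096 VLAN IDs / 32 bits per word */

/* Hypothetical model of the device: hw_table is what the chip filters
 * on, shadow_table is the driver's copy.
 */
struct fake_dev {
	uint32_t shadow_table[VLAN_TABLE_WORDS];
	uint32_t hw_table[VLAN_TABLE_WORDS];
};

/* Copy the shadow table to the (simulated) hardware. */
static void write_vlan_table(struct fake_dev *dev)
{
	memcpy(dev->hw_table, dev->shadow_table, sizeof(dev->hw_table));
}

/* Mark a VLAN ID as allowed in the shadow copy, then program hardware. */
static void vlan_add_vid(struct fake_dev *dev, uint16_t vid)
{
	dev->shadow_table[(vid >> 5) & 0x7F] |= UINT32_C(1) << (vid & 0x1F);
	write_vlan_table(dev);
}

/* A reset wipes the hardware table. restore == 0 models a reset that
 * leaves it empty; restore != 0 reprograms it from the shadow copy.
 */
static void device_reset(struct fake_dev *dev, int restore)
{
	memset(dev->hw_table, 0, sizeof(dev->hw_table));
	if (restore)
		write_vlan_table(dev);
}
```

Adding VLAN IDs 0 and 2 and then calling device_reset(dev, 0) leaves
the shadow word at 0x5 and the hardware word at 0x0, which is the
mismatch shown in the second dump. Calling device_reset(dev, 1) brings
both back in sync.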

^ permalink raw reply

* [PATCH net] net: usb: lan78xx: restore VLAN filter table after device reset
From: Nicolai Buchwitz @ 2026-06-18 19:11 UTC (permalink / raw)
  To: Thangaraj Samynathan, Rengarajan Sundararajan, UNGLinuxDriver,
	Woojung.Huh
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Sven Schuchmann, netdev, linux-usb, linux-kernel,
	Nicolai Buchwitz

Configured VLANs stop receiving traffic after a USB autosuspend/resume
cycle, e.g. when a cable is unplugged long enough for the device to
suspend and then plugged back in. VLAN filtering stays enabled but all
VLAN-tagged frames are dropped until a VLAN is added or removed again.

The reset on resume clears the hardware VLAN filter table, but unlike
the multicast and address filters it is never reprogrammed from the
driver's shadow copy, so it stays empty.

Restore the VLAN filter table as part of the reset sequence.

Reported-by: Sven Schuchmann <schuchmann@schleissheimer.de>
Closes: https://lore.kernel.org/netdev/BEZP281MB224501E38B30BFDC4BD3D364D9E32@BEZP281MB2245.DEUP281.PROD.OUTLOOK.COM/T/#u
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver")
Signed-off-by: Nicolai Buchwitz <nb@tipi-net.de>
---
 drivers/net/usb/lan78xx.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index bcf293ea1bd3..52c76de64eb9 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3065,14 +3065,20 @@ static int lan78xx_set_features(struct net_device *netdev,
 	return lan78xx_write_reg(dev, RFE_CTL, pdata->rfe_ctl);
 }
 
+static int lan78xx_write_vlan_table(struct lan78xx_net *dev)
+{
+	struct lan78xx_priv *pdata = (struct lan78xx_priv *)(dev->data[0]);
+
+	return lan78xx_dataport_write(dev, DP_SEL_RSEL_VLAN_DA_, 0,
+				      DP_SEL_VHF_VLAN_LEN, pdata->vlan_table);
+}
+
 static void lan78xx_deferred_vlan_write(struct work_struct *param)
 {
 	struct lan78xx_priv *pdata =
 			container_of(param, struct lan78xx_priv, set_vlan);
-	struct lan78xx_net *dev = pdata->dev;
 
-	lan78xx_dataport_write(dev, DP_SEL_RSEL_VLAN_DA_, 0,
-			       DP_SEL_VHF_VLAN_LEN, pdata->vlan_table);
+	lan78xx_write_vlan_table(pdata->dev);
 }
 
 static int lan78xx_vlan_rx_add_vid(struct net_device *netdev,
@@ -3353,6 +3359,15 @@ static int lan78xx_reset(struct lan78xx_net *dev)
 
 	lan78xx_set_multicast(dev->net);
 
+	/* The chip reset above also clears the VLAN filter table held in the
+	 * shared VLAN/DA hash RAM. The network stack does not re-add VLANs
+	 * after a silent device reset (e.g. on reset_resume after USB
+	 * autosuspend), so restore the table from our shadow copy here.
+	 */
+	ret = lan78xx_write_vlan_table(dev);
+	if (ret < 0)
+		return ret;
+
 	/* reset PHY */
 	ret = lan78xx_read_reg(dev, PMT_CTL, &buf);
 	if (ret < 0)

base-commit: 7d8297e26b4e20b5d1c3c3fe51fe81a1c7fbc823
-- 
2.53.0


^ permalink raw reply related

* [PATCH iwl-net] idpf: fix max_vport related crash on allocation error during init
From: Emil Tantilov @ 2026-06-18 19:23 UTC (permalink / raw)
  To: intel-wired-lan
  Cc: netdev, anthony.l.nguyen, przemyslaw.kitszel, andrew+netdev,
	davem, edumazet, kuba, pabeni, madhu.chittim

Set adapter->max_vports only after successful allocation of the vports,
netdevs and vport_config buffers. This fixes possible crashes on reset or
rmmod following a failed allocation on init:

[  305.981402] idpf 0000:83:00.0: enabling device (0100 -> 0102)
[  305.994464] idpf 0000:83:00.0: Device HW Reset initiated
[  320.416872] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  320.416918] #PF: supervisor read access in kernel mode
[  320.416942] #PF: error_code(0x0000) - not-present page
[  320.416963] PGD 2099657067 P4D 0
[  320.416983] Oops: Oops: 0000 [#1] SMP NOPTI
...
[  320.417093] RIP: 0010:idpf_remove+0x118/0x200 [idpf]
[  320.417130] Code: 8b bb 98 09 00 00 e8 17 0f 5b e5 48 8b bb e8 08 00 00 e8 0b 0f 5b e5 66 83 bb 28 06 00 00 00 48 8b bb 20 06 00 00 74 49 31 ed <48> 8b 04 ef 48 85 c0 74 2f 48 8b 78 20 e8 66 58 91 e5 48 8b 83 20
[  320.417183] RSP: 0018:ff7322212903fdb8 EFLAGS: 00010246
[  320.417205] RAX: 0000000000000000 RBX: ff4463de40300000 RCX: ff7322212903fd4c
[  320.417228] RDX: 0000000000000001 RSI: ffffffffa7f7d100 RDI: 0000000000000000
[  320.417250] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
[  320.417272] R10: 0000000000000001 R11: ff4463de3a638f58 R12: ff4463be89ac7000
[  320.417294] R13: ff4463be89ac7198 R14: ff4463be94fc7198 R15: ffffffffc0f10f20
[  320.417317] FS:  00007f963c0e6740(0000) GS:ff4463fdd65d8000(0000) knlGS:0000000000000000
[  320.417342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  320.417362] CR2: 0000000000000000 CR3: 00000020ba674002 CR4: 0000000000773ef0
[  320.417385] PKRU: 55555554
[  320.417398] Call Trace:
[  320.417412]  <TASK>
[  320.417429]  pci_device_remove+0x42/0xb0
[  320.417459]  device_release_driver_internal+0x1a9/0x210
[  320.417492]  driver_detach+0x4b/0x90
[  320.417516]  bus_remove_driver+0x70/0x100
[  320.417539]  pci_unregister_driver+0x2e/0xb0
[  320.417564]  __do_sys_delete_module.constprop.0+0x190/0x2f0
[  320.417592]  ? kmem_cache_free+0x31e/0x550
[  320.417619]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[  320.417644]  ? do_syscall_64+0x38/0x6b0
[  320.417665]  do_syscall_64+0xc8/0x6b0
[  320.417683]  ? clear_bhb_loop+0x30/0x80
[  320.417706]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  320.417727] RIP: 0033:0x7f963bb30beb

Fixes: 0fe45467a104 ("idpf: add create vport and netdev configuration")
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
index be66f9b2e101..dc5ad784f456 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
@@ -3555,7 +3555,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
 
 	pci_sriov_set_totalvfs(adapter->pdev, idpf_get_max_vfs(adapter));
 	num_max_vports = idpf_get_max_vports(adapter);
-	adapter->max_vports = num_max_vports;
 	adapter->vports = kzalloc_objs(*adapter->vports, num_max_vports);
 	if (!adapter->vports)
 		return -ENOMEM;
@@ -3576,6 +3575,12 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
 		goto err_netdev_alloc;
 	}
 
+	/* Set max_vports only after vports, netdevs and vport_config buffers
+	 * are allocated to make sure max_vport bound loops don't end up
+	 * crashing, following allocation errors on init.
+	 */
+	adapter->max_vports = num_max_vports;
+
 	/* Start the mailbox task before requesting vectors. This will ensure
 	 * vector information response from mailbox is handled
 	 */
-- 
2.37.3
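
The ordering argument in this patch can be tried in userspace. The
sketch below is illustrative only: struct fake_adapter, adapter_init()
and adapter_remove() are hypothetical names, and the fail_second
parameter exists purely to force the error path. It assumes, as the
commit message states, that teardown loops are bounded by max_vports.

```c
#include <stdlib.h>

/* Hypothetical adapter: the count bounds every loop over the arrays. */
struct fake_adapter {
	unsigned int max_vports;
	void **vports;
	void **netdevs;
};

/* Publish max_vports only once both arrays exist. fail_second forces
 * the second allocation to fail so the error path can be exercised.
 */
static int adapter_init(struct fake_adapter *a, unsigned int n, int fail_second)
{
	a->vports = calloc(n, sizeof(*a->vports));
	if (!a->vports)
		return -1;

	a->netdevs = fail_second ? NULL : calloc(n, sizeof(*a->netdevs));
	if (!a->netdevs) {
		free(a->vports);
		a->vports = NULL;
		return -1;
	}

	a->max_vports = n; /* safe: both arrays are valid from here on */
	return 0;
}

/* Teardown walks max_vports entries; with a zero count it never
 * dereferences the (NULL) arrays. Returns the number of slots visited.
 */
static unsigned int adapter_remove(struct fake_adapter *a)
{
	unsigned int i, visited = 0;

	for (i = 0; i < a->max_vports; i++) {
		free(a->netdevs[i]);
		free(a->vports[i]);
		visited++;
	}
	free(a->netdevs);
	free(a->vports);
	a->netdevs = NULL;
	a->vports = NULL;
	a->max_vports = 0;
	return visited;
}
```

If the count were assigned before the allocations, a failed init would
leave max_vports non-zero with NULL arrays, and the loop in
adapter_remove() would dereference a NULL pointer on its first
iteration.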


^ permalink raw reply related

* RE: [Intel-wired-lan] [PATCH iwl-net] idpf: fix max_vport related crash on allocation error during init
From: Loktionov, Aleksandr @ 2026-06-18 19:37 UTC (permalink / raw)
  To: Tantilov, Emil S, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Nguyen, Anthony L, Kitszel, Przemyslaw,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Chittim, Madhu
In-Reply-To: <20260618192325.8694-1-emil.s.tantilov@intel.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Emil Tantilov
> Sent: Thursday, June 18, 2026 9:23 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; Chittim, Madhu <madhu.chittim@intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net] idpf: fix max_vport related
> crash on allocation error during init
> 
> Set adapter->max_vports only after successful allocation of vports,
> netdevs and  vport_config buffers. This fixes possible crashes on
> reset or rmmod, following failed allocation on init
> 
> [  305.981402] idpf 0000:83:00.0: enabling device (0100 -> 0102)
> [  305.994464] idpf 0000:83:00.0: Device HW Reset initiated
> [  320.416872] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [  320.416918] #PF: supervisor read access in kernel mode
> [  320.416942] #PF: error_code(0x0000) - not-present page
> [  320.416963] PGD 2099657067 P4D 0
> [  320.416983] Oops: Oops: 0000 [#1] SMP NOPTI
> ...
> [  320.417093] RIP: 0010:idpf_remove+0x118/0x200 [idpf]
> [  320.417130] Code: 8b bb 98 09 00 00 e8 17 0f 5b e5 48 8b bb e8 08 00 00 e8 0b 0f 5b e5 66 83 bb 28 06 00 00 00 48 8b bb 20 06 00 00 74 49 31 ed <48> 8b 04 ef 48 85 c0 74 2f 48 8b 78 20 e8 66 58 91 e5 48 8b 83 20
> [  320.417183] RSP: 0018:ff7322212903fdb8 EFLAGS: 00010246
> [  320.417205] RAX: 0000000000000000 RBX: ff4463de40300000 RCX: ff7322212903fd4c
> [  320.417228] RDX: 0000000000000001 RSI: ffffffffa7f7d100 RDI: 0000000000000000
> [  320.417250] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> [  320.417272] R10: 0000000000000001 R11: ff4463de3a638f58 R12: ff4463be89ac7000
> [  320.417294] R13: ff4463be89ac7198 R14: ff4463be94fc7198 R15: ffffffffc0f10f20
> [  320.417317] FS:  00007f963c0e6740(0000) GS:ff4463fdd65d8000(0000) knlGS:0000000000000000
> [  320.417342] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  320.417362] CR2: 0000000000000000 CR3: 00000020ba674002 CR4: 0000000000773ef0
> [  320.417385] PKRU: 55555554
> [  320.417398] Call Trace:
> [  320.417412]  <TASK>
> [  320.417429]  pci_device_remove+0x42/0xb0
> [  320.417459]  device_release_driver_internal+0x1a9/0x210
> [  320.417492]  driver_detach+0x4b/0x90
> [  320.417516]  bus_remove_driver+0x70/0x100
> [  320.417539]  pci_unregister_driver+0x2e/0xb0
> [  320.417564]  __do_sys_delete_module.constprop.0+0x190/0x2f0
> [  320.417592]  ? kmem_cache_free+0x31e/0x550
> [  320.417619]  ? lockdep_hardirqs_on_prepare+0xde/0x190
> [  320.417644]  ? do_syscall_64+0x38/0x6b0
> [  320.417665]  do_syscall_64+0xc8/0x6b0
> [  320.417683]  ? clear_bhb_loop+0x30/0x80
> [  320.417706]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [  320.417727] RIP: 0033:0x7f963bb30beb
> 
> Fixes: 0fe45467a104 ("idpf: add create vport and netdev
> configuration")
> Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
> Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
> ---
>  drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> index be66f9b2e101..dc5ad784f456 100644
> --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c
> @@ -3555,7 +3555,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>
>  	pci_sriov_set_totalvfs(adapter->pdev, idpf_get_max_vfs(adapter));
>  	num_max_vports = idpf_get_max_vports(adapter);
> -	adapter->max_vports = num_max_vports;
>  	adapter->vports = kzalloc_objs(*adapter->vports, num_max_vports);
>  	if (!adapter->vports)
>  		return -ENOMEM;
> @@ -3576,6 +3575,12 @@ int idpf_vc_core_init(struct idpf_adapter *adapter)
>  		goto err_netdev_alloc;
>  	}
>
> +	/* Set max_vports only after vports, netdevs and vport_config buffers
> +	 * are allocated to make sure max_vport bound loops don't end up
> +	 * crashing, following allocation errors on init.
> +	 */
> +	adapter->max_vports = num_max_vports;
> +
>  	/* Start the mailbox task before requesting vectors. This will ensure
>  	 * vector information response from mailbox is handled
>  	 */
> --
> 2.37.3

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>


^ permalink raw reply

* Re: [PATCH net] net: dsa: realtek: fix memory leak in rtl8366rb_setup_led()
From: Luiz Angelo Daros de Luca @ 2026-06-18 20:12 UTC (permalink / raw)
  To: David Yang
  Cc: netdev, Linus Walleij, Alvin Šipraga, Andrew Lunn,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-kernel
In-Reply-To: <20260618140200.1888707-1-mmyangfl@gmail.com>

Thanks David,


> led_classdev_register_ext() only reads init_data.devicename - it never
> stores the pointer. However, the caller allocated devicename with
> kasprintf() but never freed it, leaking the string memory.
>
> Fix it with a stack buffer to avoid dynamic buffers completely.
>
> Fixes: 32d617005475 ("net: dsa: realtek: add LED drivers for rtl8366rb")
> Signed-off-by: David Yang <mmyangfl@gmail.com>
> ---
>  drivers/net/dsa/realtek/rtl8366rb-leds.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/dsa/realtek/rtl8366rb-leds.c b/drivers/net/dsa/realtek/rtl8366rb-leds.c
> index 509ffd3f8db5..ba50d311cb15 100644
> --- a/drivers/net/dsa/realtek/rtl8366rb-leds.c
> +++ b/drivers/net/dsa/realtek/rtl8366rb-leds.c
> @@ -89,6 +89,7 @@ static int rtl8366rb_setup_led(struct realtek_priv *priv, struct dsa_port *dp,
>         struct led_init_data init_data = { };
>         enum led_default_state state;
>         struct rtl8366rb_led *led;
> +       char name[64];
>         u32 led_group;
>         int ret;
>
> @@ -129,10 +130,9 @@ static int rtl8366rb_setup_led(struct realtek_priv *priv, struct dsa_port *dp,
>         init_data.fwnode = led_fwnode;
>         init_data.devname_mandatory = true;
>
> -       init_data.devicename = kasprintf(GFP_KERNEL, "Realtek-%d:0%d:%d",
> -                                        dp->ds->index, dp->index, led_group);

Indeed, it will leak. init_data is local and init_data.devicename is
only read by led_compose_name(), not stored. However, the stack is a
limited space for allocation.
Alternatively, you can fix the leak with devm_kasprintf() (my
preference) or by adding a kfree() before leaving the function.

> -       if (!init_data.devicename)
> -               return -ENOMEM;
> +       snprintf(name, sizeof(name), "Realtek-%d:0%d:%d",
> +                dp->ds->index, dp->index, led_group);
> +       init_data.devicename = name;
>
>         ret = devm_led_classdev_register_ext(priv->dev, &led->cdev, &init_data);
>         if (ret) {
> --
> 2.53.0
>
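
The two self-contained options discussed above (stack buffer, or heap
buffer plus an explicit free) can be compared in a userspace sketch.
Everything here is hypothetical and for illustration only:
register_led() stands in for a registration call that copies the name
without keeping the pointer, and the setup_led_*() helpers are invented
names. The devm_kasprintf() variant is not modelled because it depends
on the kernel's device-managed allocation.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical consumer: like the registration call discussed above, it
 * copies the name it is given and keeps no pointer to the caller's buffer.
 */
static void register_led(char *dst, size_t dst_len, const char *devicename)
{
	snprintf(dst, dst_len, "%s", devicename);
}

/* Option 1: format into a stack buffer; nothing to free. */
static void setup_led_stack(char *dst, size_t dst_len, int sw, int port, int group)
{
	char name[64];

	snprintf(name, sizeof(name), "Realtek-%d:0%d:%d", sw, port, group);
	register_led(dst, dst_len, name);
}

/* Option 2: format into a heap buffer and free it after registration. */
static int setup_led_heap(char *dst, size_t dst_len, int sw, int port, int group)
{
	int n = snprintf(NULL, 0, "Realtek-%d:0%d:%d", sw, port, group);
	char *name;

	if (n < 0)
		return -1;
	name = malloc((size_t)n + 1);
	if (!name)
		return -1;

	snprintf(name, (size_t)n + 1, "Realtek-%d:0%d:%d", sw, port, group);
	register_led(dst, dst_len, name);
	free(name); /* the step the original kasprintf() caller was missing */
	return 0;
}
```

Both helpers produce the same registered name. The trade-off is the one
raised in the review: the stack variant spends 64 bytes of stack but
has no failure path, while the heap variant needs an allocation check
and a matching free.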

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH net] igb: only strip Rx timestamp header on the first buffer of a frame
From: Jacob Keller @ 2026-06-18 20:25 UTC (permalink / raw)
  To: Tony Nguyen, Kurt Kanzenbach, Tjerk Kusters,
	netdev@vger.kernel.org
  Cc: intel-wired-lan@lists.osuosl.org, przemyslaw.kitszel@intel.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, richardcochran@gmail.com,
	hawk@kernel.org, stable@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <55ab9b13-ee51-4ac6-af7b-b3feb159eb51@intel.com>

On 6/18/2026 10:38 AM, Tony Nguyen wrote:
> On 6/15/2026 12:43 AM, Kurt Kanzenbach wrote:
>> On Fri Jun 12 2026, Tjerk Kusters wrote:
>>> Fixes: 5379260852b0 ("igb: Fix XDP with PTP enabled")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: T Kusters <tkusters@aweta.nl>
> 
> Sign off should be your full name.
> 
Ideally it should also match whatever you use as your email in the From.

^ permalink raw reply

* [RFC net-next 0/4] net: dsa: motorcomm: Add LED support
From: David Yang @ 2026-06-18 20:26 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

Sent as RFC while net-next is closed.

David Yang (4):
  net: dsa: motorcomm: Move to subdirectory
  net: dsa: motorcomm: Split SMI module
  net: dsa: motorcomm: Dynamically allocate port structures
  net: dsa: motorcomm: Add LED support

 MAINTAINERS                                   |   2 +-
 drivers/net/dsa/Kconfig                       |  10 +-
 drivers/net/dsa/Makefile                      |   2 +-
 drivers/net/dsa/motorcomm/Kconfig             |  17 +
 drivers/net/dsa/motorcomm/Makefile            |   5 +
 .../net/dsa/{yt921x.c => motorcomm/chip.c}    | 311 +++-------
 .../net/dsa/{yt921x.h => motorcomm/chip.h}    |  21 +-
 drivers/net/dsa/motorcomm/leds.c              | 530 ++++++++++++++++++
 drivers/net/dsa/motorcomm/leds.h              | 104 ++++
 drivers/net/dsa/motorcomm/smi.c               | 155 +++++
 drivers/net/dsa/motorcomm/smi.h               |  88 +++
 11 files changed, 1003 insertions(+), 242 deletions(-)
 create mode 100644 drivers/net/dsa/motorcomm/Kconfig
 create mode 100644 drivers/net/dsa/motorcomm/Makefile
 rename drivers/net/dsa/{yt921x.c => motorcomm/chip.c} (95%)
 rename drivers/net/dsa/{yt921x.h => motorcomm/chip.h} (99%)
 create mode 100644 drivers/net/dsa/motorcomm/leds.c
 create mode 100644 drivers/net/dsa/motorcomm/leds.h
 create mode 100644 drivers/net/dsa/motorcomm/smi.c
 create mode 100644 drivers/net/dsa/motorcomm/smi.h

-- 
2.53.0


^ permalink raw reply

* [RFC net-next 1/4] net: dsa: motorcomm: Move to subdirectory
From: David Yang @ 2026-06-18 20:26 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <20260618202716.2166450-1-mmyangfl@gmail.com>

yt921x is already the longest single-file DSA driver, so it's time to
split it into parts.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
 MAINTAINERS                                    |  2 +-
 drivers/net/dsa/Kconfig                        | 10 ++--------
 drivers/net/dsa/Makefile                       |  2 +-
 drivers/net/dsa/motorcomm/Kconfig              |  8 ++++++++
 drivers/net/dsa/motorcomm/Makefile             |  3 +++
 drivers/net/dsa/{yt921x.c => motorcomm/chip.c} |  2 +-
 drivers/net/dsa/{yt921x.h => motorcomm/chip.h} |  0
 7 files changed, 16 insertions(+), 11 deletions(-)
 create mode 100644 drivers/net/dsa/motorcomm/Kconfig
 create mode 100644 drivers/net/dsa/motorcomm/Makefile
 rename drivers/net/dsa/{yt921x.c => motorcomm/chip.c} (99%)
 rename drivers/net/dsa/{yt921x.h => motorcomm/chip.h} (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 06df1171f4cf..b007f20b2763 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -18039,7 +18039,7 @@ M:	David Yang <mmyangfl@gmail.com>
 L:	netdev@vger.kernel.org
 S:	Maintained
 F:	Documentation/devicetree/bindings/net/dsa/motorcomm,yt921x.yaml
-F:	drivers/net/dsa/yt921x.*
+F:	drivers/net/dsa/motorcomm/
 F:	net/dsa/tag_yt921x.c
 
 MOXA SMARTIO/INDUSTIO/INTELLIO SERIAL CARD
diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index 4ab567c5bbaf..98e9bbe47de7 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -72,6 +72,8 @@ config NET_DSA_MV88E6060
 
 source "drivers/net/dsa/microchip/Kconfig"
 
+source "drivers/net/dsa/motorcomm/Kconfig"
+
 source "drivers/net/dsa/mv88e6xxx/Kconfig"
 
 source "drivers/net/dsa/mxl862xx/Kconfig"
@@ -158,12 +160,4 @@ config NET_DSA_VITESSE_VSC73XX_PLATFORM
 	  This enables support for the Vitesse VSC7385, VSC7388, VSC7395
 	  and VSC7398 SparX integrated ethernet switches, connected over
 	  a CPU-attached address bus and work in memory-mapped I/O mode.
-
-config NET_DSA_YT921X
-	tristate "Motorcomm YT9215 ethernet switch chip support"
-	select NET_DSA_TAG_YT921X
-	select NET_IEEE8021Q_HELPERS if DCB
-	help
-	  This enables support for the Motorcomm YT9215 ethernet switch
-	  chip.
 endmenu
diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile
index d2975badffc0..138225baa4d5 100644
--- a/drivers/net/dsa/Makefile
+++ b/drivers/net/dsa/Makefile
@@ -14,11 +14,11 @@ obj-$(CONFIG_NET_DSA_SMSC_LAN9303_MDIO) += lan9303_mdio.o
 obj-$(CONFIG_NET_DSA_VITESSE_VSC73XX) += vitesse-vsc73xx-core.o
 obj-$(CONFIG_NET_DSA_VITESSE_VSC73XX_PLATFORM) += vitesse-vsc73xx-platform.o
 obj-$(CONFIG_NET_DSA_VITESSE_VSC73XX_SPI) += vitesse-vsc73xx-spi.o
-obj-$(CONFIG_NET_DSA_YT921X) += yt921x.o
 obj-y				+= b53/
 obj-y				+= hirschmann/
 obj-y				+= lantiq/
 obj-y				+= microchip/
+obj-y				+= motorcomm/
 obj-y				+= mv88e6xxx/
 obj-y				+= mxl862xx/
 obj-y				+= netc/
diff --git a/drivers/net/dsa/motorcomm/Kconfig b/drivers/net/dsa/motorcomm/Kconfig
new file mode 100644
index 000000000000..64ff7d07a91b
--- /dev/null
+++ b/drivers/net/dsa/motorcomm/Kconfig
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+config NET_DSA_YT921X
+	tristate "Motorcomm YT9215 ethernet switch chip support"
+	select NET_DSA_TAG_YT921X
+	select NET_IEEE8021Q_HELPERS if DCB
+	help
+	  This enables support for the Motorcomm YT9215 ethernet switch
+	  chip.
diff --git a/drivers/net/dsa/motorcomm/Makefile b/drivers/net/dsa/motorcomm/Makefile
new file mode 100644
index 000000000000..bf99feb4c454
--- /dev/null
+++ b/drivers/net/dsa/motorcomm/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_NET_DSA_YT921X) += yt921x.o
+yt921x-objs := chip.o
diff --git a/drivers/net/dsa/yt921x.c b/drivers/net/dsa/motorcomm/chip.c
similarity index 99%
rename from drivers/net/dsa/yt921x.c
rename to drivers/net/dsa/motorcomm/chip.c
index 159b16606f6c..f070732845eb 100644
--- a/drivers/net/dsa/yt921x.c
+++ b/drivers/net/dsa/motorcomm/chip.c
@@ -26,7 +26,7 @@
 #include <net/ieee8021q.h>
 #include <net/pkt_cls.h>
 
-#include "yt921x.h"
+#include "chip.h"
 
 struct yt921x_mib_desc {
 	unsigned int size;
diff --git a/drivers/net/dsa/yt921x.h b/drivers/net/dsa/motorcomm/chip.h
similarity index 100%
rename from drivers/net/dsa/yt921x.h
rename to drivers/net/dsa/motorcomm/chip.h
-- 
2.53.0


^ permalink raw reply related

* [RFC net-next 2/4] net: dsa: motorcomm: Split SMI module
From: David Yang @ 2026-06-18 20:26 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <20260618202716.2166450-1-mmyangfl@gmail.com>

SMI operations are going to be used across different modules.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
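Reviewer note (not part of the commit): the helpers moved into smi.c are
all built on one read-modify-write primitive that skips the bus write
when nothing changes. Below is a minimal userspace sketch of that
pattern, assuming a mock register file in place of priv->reg_ops. The
names mock_regs, mock_read, mock_write and mock_update_bits are
hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_NUM_REGS 8

/* Hypothetical stand-in for the switch register file. */
struct mock_regs {
	uint32_t words[MOCK_NUM_REGS];
	unsigned int writes;	/* counts bus writes, to show skipped ones */
};

static int mock_read(struct mock_regs *r, uint32_t reg, uint32_t *valp)
{
	/* Registers are 32 bits wide and addressed in steps of 4. */
	assert(reg % 4 == 0 && reg / 4 < MOCK_NUM_REGS);
	*valp = r->words[reg / 4];
	return 0;
}

static int mock_write(struct mock_regs *r, uint32_t reg, uint32_t val)
{
	assert(reg % 4 == 0 && reg / 4 < MOCK_NUM_REGS);
	r->words[reg / 4] = val;
	r->writes++;
	return 0;
}

/* Clear the bits in mask, OR in val, and write back only on change. */
static int mock_update_bits(struct mock_regs *r, uint32_t reg,
			    uint32_t mask, uint32_t val)
{
	uint32_t old;
	uint32_t cur;
	int res;

	res = mock_read(r, reg, &old);
	if (res)
		return res;

	cur = (old & ~mask) | val;
	if (cur == old)
		return 0;

	return mock_write(r, reg, cur);
}
```

With this shape, setting bits is an update with mask 0 and clearing bits
is an update with val 0, which is how the inline wrappers in smi.h are
written.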
 drivers/net/dsa/motorcomm/Makefile |   1 +
 drivers/net/dsa/motorcomm/chip.c   | 207 +----------------------------
 drivers/net/dsa/motorcomm/smi.c    | 155 +++++++++++++++++++++
 drivers/net/dsa/motorcomm/smi.h    |  88 ++++++++++++
 4 files changed, 245 insertions(+), 206 deletions(-)
 create mode 100644 drivers/net/dsa/motorcomm/smi.c
 create mode 100644 drivers/net/dsa/motorcomm/smi.h

diff --git a/drivers/net/dsa/motorcomm/Makefile b/drivers/net/dsa/motorcomm/Makefile
index bf99feb4c454..9fa24929007c 100644
--- a/drivers/net/dsa/motorcomm/Makefile
+++ b/drivers/net/dsa/motorcomm/Makefile
@@ -1,3 +1,4 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_NET_DSA_YT921X) += yt921x.o
 yt921x-objs := chip.o
+yt921x-objs += smi.o
diff --git a/drivers/net/dsa/motorcomm/chip.c b/drivers/net/dsa/motorcomm/chip.c
index f070732845eb..6dee25b6754a 100644
--- a/drivers/net/dsa/motorcomm/chip.c
+++ b/drivers/net/dsa/motorcomm/chip.c
@@ -13,7 +13,6 @@
 #include <linux/if_bridge.h>
 #include <linux/if_hsr.h>
 #include <linux/if_vlan.h>
-#include <linux/iopoll.h>
 #include <linux/mdio.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -27,6 +26,7 @@
 #include <net/pkt_cls.h>
 
 #include "chip.h"
+#include "smi.h"
 
 struct yt921x_mib_desc {
 	unsigned int size;
@@ -155,9 +155,6 @@ static const struct yt921x_info yt921x_infos[] = {
 
 #define YT921X_VID_UNWARE	4095
 
-#define YT921X_POLL_SLEEP_US	10000
-#define YT921X_POLL_TIMEOUT_US	100000
-
 /* The interval should be small enough to avoid overflow of 32bit MIBs.
  *
  * Until we can read MIBs from stats64 call directly (i.e. sleep
@@ -196,208 +193,6 @@ static u32 ethaddr_lo2_to_u32(const unsigned char *addr)
 	return (addr[4] << 8) | addr[5];
 }
 
-static int yt921x_reg_read(struct yt921x_priv *priv, u32 reg, u32 *valp)
-{
-	WARN_ON(!mutex_is_locked(&priv->reg_lock));
-
-	return priv->reg_ops->read(priv->reg_ctx, reg, valp);
-}
-
-static int yt921x_reg_write(struct yt921x_priv *priv, u32 reg, u32 val)
-{
-	WARN_ON(!mutex_is_locked(&priv->reg_lock));
-
-	return priv->reg_ops->write(priv->reg_ctx, reg, val);
-}
-
-static int
-yt921x_reg_wait(struct yt921x_priv *priv, u32 reg, u32 mask, u32 *valp)
-{
-	u32 val;
-	int res;
-	int ret;
-
-	ret = read_poll_timeout(yt921x_reg_read, res,
-				res || (val & mask) == *valp,
-				YT921X_POLL_SLEEP_US, YT921X_POLL_TIMEOUT_US,
-				false, priv, reg, &val);
-	if (ret)
-		return ret;
-	if (res)
-		return res;
-
-	*valp = val;
-	return 0;
-}
-
-static int
-yt921x_reg_update_bits(struct yt921x_priv *priv, u32 reg, u32 mask, u32 val)
-{
-	int res;
-	u32 v;
-	u32 u;
-
-	res = yt921x_reg_read(priv, reg, &v);
-	if (res)
-		return res;
-
-	u = v;
-	u &= ~mask;
-	u |= val;
-	if (u == v)
-		return 0;
-
-	return yt921x_reg_write(priv, reg, u);
-}
-
-static int yt921x_reg_set_bits(struct yt921x_priv *priv, u32 reg, u32 mask)
-{
-	return yt921x_reg_update_bits(priv, reg, 0, mask);
-}
-
-static int yt921x_reg_clear_bits(struct yt921x_priv *priv, u32 reg, u32 mask)
-{
-	return yt921x_reg_update_bits(priv, reg, mask, 0);
-}
-
-static int
-yt921x_reg_toggle_bits(struct yt921x_priv *priv, u32 reg, u32 mask, bool set)
-{
-	return yt921x_reg_update_bits(priv, reg, mask, !set ? 0 : mask);
-}
-
-/* Some multi-word registers, like VLANn_CTRL, should be treated as a single
- * long register. More specifically, writes to parts of its words won't become
- * visible, until the last word is written.
- *
- * Here we require full read and write operations over these registers to
- * eliminate potential issues, although partial reads/writes are also possible.
- */
-
-static void update_ctrls_unaligned(u32 *lo, u32 *hi, u64 mask, u64 val)
-{
-	*lo &= ~lower_32_bits(mask);
-	*hi &= ~upper_32_bits(mask);
-	*lo |= lower_32_bits(val);
-	*hi |= upper_32_bits(val);
-}
-
-static int
-yt921x_regs_read(struct yt921x_priv *priv, u32 reg, u32 *vals,
-		 unsigned int num_regs)
-{
-	int res;
-
-	for (unsigned int i = 0; i < num_regs; i++) {
-		res = yt921x_reg_read(priv, reg + 4 * i, &vals[i]);
-		if (res)
-			return res;
-	}
-
-	return 0;
-}
-
-static int
-yt921x_regs_write(struct yt921x_priv *priv, u32 reg, const u32 *vals,
-		  unsigned int num_regs)
-{
-	int res;
-
-	for (unsigned int i = 0; i < num_regs; i++) {
-		res = yt921x_reg_write(priv, reg + 4 * i, vals[i]);
-		if (res)
-			return res;
-	}
-
-	return 0;
-}
-
-static int
-yt921x_regs_update_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
-			const u32 *vals, unsigned int num_regs)
-{
-	bool changed = false;
-	u32 vs[4];
-	int res;
-
-	BUILD_BUG_ON(num_regs > ARRAY_SIZE(vs));
-
-	res = yt921x_regs_read(priv, reg, vs, num_regs);
-	if (res)
-		return res;
-
-	for (unsigned int i = 0; i < num_regs; i++) {
-		u32 u = vs[i];
-
-		u &= ~masks[i];
-		u |= vals[i];
-		if (u != vs[i])
-			changed = true;
-
-		vs[i] = u;
-	}
-
-	if (!changed)
-		return 0;
-
-	return yt921x_regs_write(priv, reg, vs, num_regs);
-}
-
-static int
-yt921x_regs_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
-		       unsigned int num_regs)
-{
-	bool changed = false;
-	u32 vs[4];
-	int res;
-
-	BUILD_BUG_ON(num_regs > ARRAY_SIZE(vs));
-
-	res = yt921x_regs_read(priv, reg, vs, num_regs);
-	if (res)
-		return res;
-
-	for (unsigned int i = 0; i < num_regs; i++) {
-		u32 u = vs[i];
-
-		u &= ~masks[i];
-		if (u != vs[i])
-			changed = true;
-
-		vs[i] = u;
-	}
-
-	if (!changed)
-		return 0;
-
-	return yt921x_regs_write(priv, reg, vs, num_regs);
-}
-
-static int
-yt921x_reg64_write(struct yt921x_priv *priv, u32 reg, const u32 *vals)
-{
-	return yt921x_regs_write(priv, reg, vals, 2);
-}
-
-static int
-yt921x_reg64_update_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
-			 const u32 *vals)
-{
-	return yt921x_regs_update_bits(priv, reg, masks, vals, 2);
-}
-
-static int
-yt921x_reg64_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks)
-{
-	return yt921x_regs_clear_bits(priv, reg, masks, 2);
-}
-
-static int
-yt921x_reg96_write(struct yt921x_priv *priv, u32 reg, const u32 *vals)
-{
-	return yt921x_regs_write(priv, reg, vals, 3);
-}
-
 static int yt921x_reg_mdio_read(void *context, u32 reg, u32 *valp)
 {
 	struct yt921x_reg_mdio *mdio = context;
diff --git a/drivers/net/dsa/motorcomm/smi.c b/drivers/net/dsa/motorcomm/smi.c
new file mode 100644
index 000000000000..93e6c0f7e562
--- /dev/null
+++ b/drivers/net/dsa/motorcomm/smi.c
@@ -0,0 +1,155 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2026 David Yang
+ */
+
+#include <linux/iopoll.h>
+
+#include "chip.h"
+#include "smi.h"
+
+#define YT921X_POLL_SLEEP_US	10000
+#define YT921X_POLL_TIMEOUT_US	100000
+
+int yt921x_reg_read(struct yt921x_priv *priv, u32 reg, u32 *valp)
+{
+	WARN_ON(!mutex_is_locked(&priv->reg_lock));
+
+	return priv->reg_ops->read(priv->reg_ctx, reg, valp);
+}
+
+int yt921x_reg_write(struct yt921x_priv *priv, u32 reg, u32 val)
+{
+	WARN_ON(!mutex_is_locked(&priv->reg_lock));
+
+	return priv->reg_ops->write(priv->reg_ctx, reg, val);
+}
+
+int yt921x_reg_wait(struct yt921x_priv *priv, u32 reg, u32 mask, u32 *valp)
+{
+	u32 val;
+	int res;
+	int ret;
+
+	ret = read_poll_timeout(yt921x_reg_read, res,
+				res || (val & mask) == *valp,
+				YT921X_POLL_SLEEP_US, YT921X_POLL_TIMEOUT_US,
+				false, priv, reg, &val);
+	if (ret)
+		return ret;
+	if (res)
+		return res;
+
+	*valp = val;
+	return 0;
+}
+
+int yt921x_reg_update_bits(struct yt921x_priv *priv, u32 reg, u32 mask, u32 val)
+{
+	int res;
+	u32 v;
+	u32 u;
+
+	res = yt921x_reg_read(priv, reg, &v);
+	if (res)
+		return res;
+
+	u = v;
+	u &= ~mask;
+	u |= val;
+	if (u == v)
+		return 0;
+
+	return yt921x_reg_write(priv, reg, u);
+}
+
+int
+yt921x_regs_read(struct yt921x_priv *priv, u32 reg, u32 *vals,
+		 unsigned int num_regs)
+{
+	int res;
+
+	for (unsigned int i = 0; i < num_regs; i++) {
+		res = yt921x_reg_read(priv, reg + 4 * i, &vals[i]);
+		if (res)
+			return res;
+	}
+
+	return 0;
+}
+
+int
+yt921x_regs_write(struct yt921x_priv *priv, u32 reg, const u32 *vals,
+		  unsigned int num_regs)
+{
+	int res;
+
+	for (unsigned int i = 0; i < num_regs; i++) {
+		res = yt921x_reg_write(priv, reg + 4 * i, vals[i]);
+		if (res)
+			return res;
+	}
+
+	return 0;
+}
+
+int
+yt921x_regs_update_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+			const u32 *vals, unsigned int num_regs)
+{
+	bool changed = false;
+	u32 vs[4];
+	int res;
+
+	WARN_ON_ONCE(num_regs > ARRAY_SIZE(vs));
+
+	res = yt921x_regs_read(priv, reg, vs, num_regs);
+	if (res)
+		return res;
+
+	for (unsigned int i = 0; i < num_regs; i++) {
+		u32 u = vs[i];
+
+		u &= ~masks[i];
+		u |= vals[i];
+		if (u != vs[i])
+			changed = true;
+
+		vs[i] = u;
+	}
+
+	if (!changed)
+		return 0;
+
+	return yt921x_regs_write(priv, reg, vs, num_regs);
+}
+
+int
+yt921x_regs_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+		       unsigned int num_regs)
+{
+	bool changed = false;
+	u32 vs[4];
+	int res;
+
+	WARN_ON_ONCE(num_regs > ARRAY_SIZE(vs));
+
+	res = yt921x_regs_read(priv, reg, vs, num_regs);
+	if (res)
+		return res;
+
+	for (unsigned int i = 0; i < num_regs; i++) {
+		u32 u = vs[i];
+
+		u &= ~masks[i];
+		if (u != vs[i])
+			changed = true;
+
+		vs[i] = u;
+	}
+
+	if (!changed)
+		return 0;
+
+	return yt921x_regs_write(priv, reg, vs, num_regs);
+}
diff --git a/drivers/net/dsa/motorcomm/smi.h b/drivers/net/dsa/motorcomm/smi.h
new file mode 100644
index 000000000000..2e956065eb90
--- /dev/null
+++ b/drivers/net/dsa/motorcomm/smi.h
@@ -0,0 +1,88 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2026 David Yang
+ */
+
+#ifndef _YT_SMI_H
+#define _YT_SMI_H
+
+#include <linux/types.h>
+#include <linux/wordpart.h>
+
+struct yt921x_priv;
+
+int yt921x_reg_read(struct yt921x_priv *priv, u32 reg, u32 *valp);
+int yt921x_reg_write(struct yt921x_priv *priv, u32 reg, u32 val);
+int yt921x_reg_wait(struct yt921x_priv *priv, u32 reg, u32 mask, u32 *valp);
+int yt921x_reg_update_bits(struct yt921x_priv *priv, u32 reg, u32 mask,
+			   u32 val);
+
+static inline int
+yt921x_reg_set_bits(struct yt921x_priv *priv, u32 reg, u32 mask)
+{
+	return yt921x_reg_update_bits(priv, reg, 0, mask);
+}
+
+static inline int
+yt921x_reg_clear_bits(struct yt921x_priv *priv, u32 reg, u32 mask)
+{
+	return yt921x_reg_update_bits(priv, reg, mask, 0);
+}
+
+static inline int
+yt921x_reg_toggle_bits(struct yt921x_priv *priv, u32 reg, u32 mask, bool set)
+{
+	return yt921x_reg_update_bits(priv, reg, mask, !set ? 0 : mask);
+}
+
+/* Some multi-word registers, like VLANn_CTRL, should be treated as a single
+ * long register. More specifically, writes to parts of its words won't become
+ * visible, until the last word is written.
+ *
+ * Here we require full read and write operations over these registers to
+ * eliminate potential issues, although partial reads/writes are also possible.
+ */
+
+static inline void update_ctrls_unaligned(u32 *lo, u32 *hi, u64 mask, u64 val)
+{
+	*lo &= ~lower_32_bits(mask);
+	*hi &= ~upper_32_bits(mask);
+	*lo |= lower_32_bits(val);
+	*hi |= upper_32_bits(val);
+}
+
+int yt921x_regs_read(struct yt921x_priv *priv, u32 reg, u32 *vals,
+		     unsigned int num_regs);
+int yt921x_regs_write(struct yt921x_priv *priv, u32 reg, const u32 *vals,
+		      unsigned int num_regs);
+int yt921x_regs_update_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+			    const u32 *vals, unsigned int num_regs);
+int yt921x_regs_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+			   unsigned int num_regs);
+
+static inline int
+yt921x_reg64_write(struct yt921x_priv *priv, u32 reg, const u32 *vals)
+{
+	return yt921x_regs_write(priv, reg, vals, 2);
+}
+
+static inline int
+yt921x_reg64_update_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks,
+			 const u32 *vals)
+{
+	return yt921x_regs_update_bits(priv, reg, masks, vals, 2);
+}
+
+static inline int
+yt921x_reg64_clear_bits(struct yt921x_priv *priv, u32 reg, const u32 *masks)
+{
+	return yt921x_regs_clear_bits(priv, reg, masks, 2);
+}
+
+static inline int
+yt921x_reg96_write(struct yt921x_priv *priv, u32 reg, const u32 *vals)
+{
+	return yt921x_regs_write(priv, reg, vals, 3);
+}
+
+#endif
-- 
2.53.0


^ permalink raw reply related

* [RFC net-next 3/4] net: dsa: motorcomm: Dynamically allocate port structures
From: David Yang @ 2026-06-18 20:26 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <20260618202716.2166450-1-mmyangfl@gmail.com>

With LED support introduced later in this series, struct yt921x_priv
will be 17k, which is not very good for a single kmalloc(). Convert the
ports array to an array of pointers to stop bloating the priv struct.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
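Reviewer note (not part of the commit): once ports are allocated
individually, the priv pointer can no longer be recovered by pointer
arithmetic over an embedded array, so each port carries a back-pointer.
Below is a minimal userspace sketch of that layout, assuming calloc() in
place of devm_kzalloc(). The names demo_priv, demo_port, demo_work and
the helper functions are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PORT_NUM 11

struct demo_work {
	int pending;
};

struct demo_priv;

struct demo_port {
	struct demo_priv *priv;		/* back-pointer set at allocation */
	unsigned char index;
	struct demo_work mib_read;
};

struct demo_priv {
	struct demo_port *ports[DEMO_PORT_NUM];	/* NULL if port is absent */
};

/* Recover the port from its embedded work item, like container_of(). */
#define demo_port_from_work(w) \
	((struct demo_port *)((char *)(w) - offsetof(struct demo_port, mib_read)))

/* Allocate only the ports whose bit is set in present_mask. */
static int demo_ports_alloc(struct demo_priv *priv, unsigned int present_mask)
{
	for (int i = 0; i < DEMO_PORT_NUM; i++) {
		struct demo_port *pp;

		if (!(present_mask & (1u << i)))
			continue;

		pp = calloc(1, sizeof(*pp));
		if (!pp)
			return -ENOMEM;

		pp->priv = priv;
		pp->index = i;
		priv->ports[i] = pp;
	}

	return 0;
}

static void demo_ports_free(struct demo_priv *priv)
{
	for (int i = 0; i < DEMO_PORT_NUM; i++) {
		free(priv->ports[i]);
		priv->ports[i] = NULL;
	}
}

/* What a work callback would do: work item -> port -> priv. */
static struct demo_priv *demo_priv_from_work(struct demo_work *w)
{
	return demo_port_from_work(w)->priv;
}
```

Callers that index priv->ports[] must check for NULL before using any
member of the port, since absent ports are never allocated.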
 drivers/net/dsa/motorcomm/chip.c | 95 ++++++++++++++++++++++++--------
 drivers/net/dsa/motorcomm/chip.h |  3 +-
 2 files changed, 75 insertions(+), 23 deletions(-)

diff --git a/drivers/net/dsa/motorcomm/chip.c b/drivers/net/dsa/motorcomm/chip.c
index 6dee25b6754a..d44f7749de02 100644
--- a/drivers/net/dsa/motorcomm/chip.c
+++ b/drivers/net/dsa/motorcomm/chip.c
@@ -548,11 +548,14 @@ yt921x_mbus_ext_init(struct yt921x_priv *priv, struct device_node *mnp)
 /* Read and handle overflow of 32bit MIBs. MIB buffer must be zeroed before. */
 static int yt921x_read_mib(struct yt921x_priv *priv, int port)
 {
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	struct device *dev = to_device(priv);
 	struct yt921x_mib *mib = &pp->mib;
 	int res = 0;
 
+	if (!pp)
+		return -ENODEV;
+
 	/* Reading of yt921x_port::mib is not protected by a lock and it's vain
 	 * to keep its consistency, since we have to read registers one by one
 	 * and there is no way to make a snapshot of MIB stats.
@@ -609,9 +612,8 @@ static void yt921x_poll_mib(struct work_struct *work)
 {
 	struct yt921x_port *pp = container_of_const(work, struct yt921x_port,
 						    mib_read.work);
-	struct yt921x_priv *priv = (void *)(pp - pp->index) -
-				   offsetof(struct yt921x_priv, ports);
 	unsigned long delay = YT921X_STATS_INTERVAL_JIFFIES;
+	struct yt921x_priv *priv = pp->priv;
 	int port = pp->index;
 	int res;
 
@@ -643,10 +645,13 @@ static void
 yt921x_dsa_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *data)
 {
 	struct yt921x_priv *priv = to_yt921x_priv(ds);
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	struct yt921x_mib *mib = &pp->mib;
 	size_t j;
 
+	if (!pp)
+		return;
+
 	mutex_lock(&priv->reg_lock);
 	yt921x_read_mib(priv, port);
 	mutex_unlock(&priv->reg_lock);
@@ -685,9 +690,12 @@ yt921x_dsa_get_eth_mac_stats(struct dsa_switch *ds, int port,
 			     struct ethtool_eth_mac_stats *mac_stats)
 {
 	struct yt921x_priv *priv = to_yt921x_priv(ds);
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	struct yt921x_mib *mib = &pp->mib;
 
+	if (!pp)
+		return;
+
 	mutex_lock(&priv->reg_lock);
 	yt921x_read_mib(priv, port);
 	mutex_unlock(&priv->reg_lock);
@@ -721,9 +729,12 @@ yt921x_dsa_get_eth_ctrl_stats(struct dsa_switch *ds, int port,
 			      struct ethtool_eth_ctrl_stats *ctrl_stats)
 {
 	struct yt921x_priv *priv = to_yt921x_priv(ds);
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	struct yt921x_mib *mib = &pp->mib;
 
+	if (!pp)
+		return;
+
 	mutex_lock(&priv->reg_lock);
 	yt921x_read_mib(priv, port);
 	mutex_unlock(&priv->reg_lock);
@@ -750,9 +761,12 @@ yt921x_dsa_get_rmon_stats(struct dsa_switch *ds, int port,
 			  const struct ethtool_rmon_hist_range **ranges)
 {
 	struct yt921x_priv *priv = to_yt921x_priv(ds);
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	struct yt921x_mib *mib = &pp->mib;
 
+	if (!pp)
+		return;
+
 	mutex_lock(&priv->reg_lock);
 	yt921x_read_mib(priv, port);
 	mutex_unlock(&priv->reg_lock);
@@ -786,9 +800,12 @@ yt921x_dsa_get_stats64(struct dsa_switch *ds, int port,
 		       struct rtnl_link_stats64 *stats)
 {
 	struct yt921x_priv *priv = to_yt921x_priv(ds);
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	struct yt921x_mib *mib = &pp->mib;
 
+	if (!pp)
+		return;
+
 	stats->rx_length_errors = mib->rx_undersize_errors +
 				  mib->rx_fragment_errors;
 	stats->rx_over_errors = mib->rx_oversize_errors;
@@ -822,9 +839,12 @@ yt921x_dsa_get_pause_stats(struct dsa_switch *ds, int port,
 			   struct ethtool_pause_stats *pause_stats)
 {
 	struct yt921x_priv *priv = to_yt921x_priv(ds);
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	struct yt921x_mib *mib = &pp->mib;
 
+	if (!pp)
+		return;
+
 	mutex_lock(&priv->reg_lock);
 	yt921x_read_mib(priv, port);
 	mutex_unlock(&priv->reg_lock);
@@ -3332,15 +3352,20 @@ static int yt921x_bridge(struct yt921x_priv *priv, u16 ports_mask)
 
 	isolated_mask = 0;
 	for_each_set_bit(port, &targets_mask, YT921X_PORT_NUM) {
-		struct yt921x_port *pp = &priv->ports[port];
+		struct yt921x_port *pp = priv->ports[port];
 
+		if (!pp)
+			continue;
 		if (pp->isolated)
 			isolated_mask |= BIT(port);
 	}
 
 	/* Block from non-cpu bridge ports ... */
 	for_each_set_bit(port, &targets_mask, YT921X_PORT_NUM) {
-		struct yt921x_port *pp = &priv->ports[port];
+		struct yt921x_port *pp = priv->ports[port];
+
+		if (!pp)
+			continue;
 
 		/* to non-bridge ports */
 		ctrl = ~ports_mask;
@@ -3397,11 +3422,14 @@ static int
 yt921x_bridge_flags(struct yt921x_priv *priv, int port,
 		    struct switchdev_brport_flags flags)
 {
-	struct yt921x_port *pp = &priv->ports[port];
+	struct yt921x_port *pp = priv->ports[port];
 	bool do_flush;
 	u32 mask;
 	int res;
 
+	if (!pp)
+		return -ENODEV;
+
 	if (flags.mask & BR_LEARNING) {
 		bool learning = flags.val & BR_LEARNING;
 
@@ -3954,11 +3982,16 @@ yt921x_phylink_mac_link_down(struct phylink_config *config, unsigned int mode,
 {
 	struct dsa_port *dp = dsa_phylink_to_port(config);
 	struct yt921x_priv *priv = to_yt921x_priv(dp->ds);
+	struct yt921x_port *pp;
 	int port = dp->index;
 	int res;
 
+	pp = priv->ports[port];
+	if (!pp)
+		return;
+
 	/* No need to sync; port control block is hold until device remove */
-	cancel_delayed_work(&priv->ports[port].mib_read);
+	cancel_delayed_work(&pp->mib_read);
 
 	mutex_lock(&priv->reg_lock);
 	res = yt921x_port_down(priv, port);
@@ -3977,9 +4010,14 @@ yt921x_phylink_mac_link_up(struct phylink_config *config,
 {
 	struct dsa_port *dp = dsa_phylink_to_port(config);
 	struct yt921x_priv *priv = to_yt921x_priv(dp->ds);
+	struct yt921x_port *pp;
 	int port = dp->index;
 	int res;
 
+	pp = priv->ports[port];
+	if (!pp)
+		return;
+
 	mutex_lock(&priv->reg_lock);
 	res = yt921x_port_up(priv, port, mode, interface, speed, duplex,
 			     tx_pause, rx_pause);
@@ -3989,7 +4027,7 @@ yt921x_phylink_mac_link_up(struct phylink_config *config,
 		dev_err(dp->ds->dev, "Failed to %s port %d: %i\n", "bring up",
 			port, res);
 
-	schedule_delayed_work(&priv->ports[port].mib_read, 0);
+	schedule_delayed_work(&pp->mib_read, 0);
 }
 
 static void
@@ -4574,6 +4612,23 @@ static int yt921x_dsa_setup(struct dsa_switch *ds)
 		return -ENODEV;
 	}
 
+	for (int port = 0; port < YT921X_PORT_NUM; port++) {
+		struct yt921x_port *pp;
+
+		if (!(BIT(port) & (priv->info->internal_mask |
+				   priv->info->external_mask)))
+			continue;
+
+		pp = devm_kzalloc(dev, sizeof(*pp), GFP_KERNEL);
+		if (!pp)
+			return -ENOMEM;
+		priv->ports[port] = pp;
+
+		pp->priv = priv;
+		pp->index = port;
+		INIT_DELAYED_WORK(&pp->mib_read, yt921x_poll_mib);
+	}
+
 	mutex_lock(&priv->reg_lock);
 	res = yt921x_chip_setup(priv);
 	mutex_unlock(&priv->reg_lock);
@@ -4682,7 +4737,10 @@ static void yt921x_mdio_remove(struct mdio_device *mdiodev)
 		return;
 
 	for (size_t i = ARRAY_SIZE(priv->ports); i-- > 0; ) {
-		struct yt921x_port *pp = &priv->ports[i];
+		struct yt921x_port *pp = priv->ports[i];
+
+		if (!pp)
+			continue;
 
 		disable_delayed_work_sync(&pp->mib_read);
 	}
@@ -4730,13 +4788,6 @@ static int yt921x_mdio_probe(struct mdio_device *mdiodev)
 	priv->reg_ops = &yt921x_reg_ops_mdio;
 	priv->reg_ctx = mdio;
 
-	for (size_t i = 0; i < ARRAY_SIZE(priv->ports); i++) {
-		struct yt921x_port *pp = &priv->ports[i];
-
-		pp->index = i;
-		INIT_DELAYED_WORK(&pp->mib_read, yt921x_poll_mib);
-	}
-
 	ds = &priv->ds;
 	ds->dev = dev;
 	ds->assisted_learning_on_cpu_port = true;
diff --git a/drivers/net/dsa/motorcomm/chip.h b/drivers/net/dsa/motorcomm/chip.h
index 555046526669..950a5799f8b6 100644
--- a/drivers/net/dsa/motorcomm/chip.h
+++ b/drivers/net/dsa/motorcomm/chip.h
@@ -929,6 +929,7 @@ struct yt921x_acl_blk {
 };
 
 struct yt921x_port {
+	struct yt921x_priv *priv;
 	unsigned char index;
 
 	bool hairpin;
@@ -964,7 +965,7 @@ struct yt921x_priv {
 	struct mii_bus *mbus_int;
 	struct mii_bus *mbus_ext;
 
-	struct yt921x_port ports[YT921X_PORT_NUM];
+	struct yt921x_port *ports[YT921X_PORT_NUM];
 
 	u16 eee_ports_mask;
 
-- 
2.53.0


^ permalink raw reply related

* [RFC net-next 4/4] net: dsa: motorcomm: Add LED support
From: David Yang @ 2026-06-18 20:26 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel
In-Reply-To: <20260618202716.2166450-1-mmyangfl@gmail.com>

LEDs can be described in the device tree using the same format as qca8k.
Each port can configure up to 3 LEDs.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
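Reviewer note (not part of the commit): the duty selection in
yt921x_led_blink_select() rounds the requested on/off ratio to one of
1/6, 1/4, 1/3 or 1/2, expressed in 24ths, and mirrors it when the on
time is the longer half. Below is a userspace sketch of only that
rounding step, with the thresholds copied from the patch. The names
DEMO_DUTY_DENOM, DEMO_DUTY and demo_duty_quantize are hypothetical, and
the cycle selection is left out because it depends on constants from
leds.h.

```c
#include <assert.h>

/* 2 * lcm(2, 3, 4, 6), so every supported ratio is a whole number. */
#define DEMO_DUTY_DENOM 24
#define DEMO_DUTY(nom, denom) (DEMO_DUTY_DENOM * (nom) / (denom))

/*
 * Return the on-time share of one blink period in 24ths, rounded to a
 * supported ratio. on and off are delays in the same unit.
 */
static unsigned int demo_duty_quantize(unsigned long on, unsigned long off)
{
	unsigned long shorter = on > off ? off : on;
	unsigned int duty;

	assert(on + off > 0);

	duty = DEMO_DUTY(shorter, on + off);
	if (duty < DEMO_DUTY(5, 24))
		duty = DEMO_DUTY(1, 6);
	else if (duty < DEMO_DUTY(7, 24))
		duty = DEMO_DUTY(1, 4);
	else if (duty < DEMO_DUTY(5, 12))
		duty = DEMO_DUTY(1, 3);
	else
		duty = DEMO_DUTY(1, 2);

	/* The ratio was computed for the shorter half; mirror if needed. */
	if (on > off)
		duty = DEMO_DUTY_DENOM - duty;

	return duty;
}
```

For example, a request of 100 on and 900 off rounds to 4/24 (1/6), and
the reverse request of 900 on and 100 off rounds to 20/24 (5/6).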
 drivers/net/dsa/motorcomm/Kconfig  |   9 +
 drivers/net/dsa/motorcomm/Makefile |   1 +
 drivers/net/dsa/motorcomm/chip.c   |   7 +-
 drivers/net/dsa/motorcomm/chip.h   |  18 +
 drivers/net/dsa/motorcomm/leds.c   | 530 +++++++++++++++++++++++++++++
 drivers/net/dsa/motorcomm/leds.h   | 104 ++++++
 6 files changed, 667 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/dsa/motorcomm/leds.c
 create mode 100644 drivers/net/dsa/motorcomm/leds.h

diff --git a/drivers/net/dsa/motorcomm/Kconfig b/drivers/net/dsa/motorcomm/Kconfig
index 64ff7d07a91b..7c4d1eaa16c2 100644
--- a/drivers/net/dsa/motorcomm/Kconfig
+++ b/drivers/net/dsa/motorcomm/Kconfig
@@ -6,3 +6,12 @@ config NET_DSA_YT921X
 	help
 	  This enables support for the Motorcomm YT9215 ethernet switch
 	  chip.
+
+config NET_DSA_YT921X_LEDS
+	bool "LED support for Motorcomm YT9215"
+	default y
+	depends on NET_DSA_YT921X
+	depends on LEDS_CLASS=y || LEDS_CLASS=NET_DSA_YT921X
+	help
+	  This enables support for controlling the LEDs attached to the
+	  Motorcomm YT9215 switch chips.
diff --git a/drivers/net/dsa/motorcomm/Makefile b/drivers/net/dsa/motorcomm/Makefile
index 9fa24929007c..6bb3adfbcc2d 100644
--- a/drivers/net/dsa/motorcomm/Makefile
+++ b/drivers/net/dsa/motorcomm/Makefile
@@ -2,3 +2,4 @@
 obj-$(CONFIG_NET_DSA_YT921X) += yt921x.o
 yt921x-objs := chip.o
 yt921x-objs += smi.o
+yt921x-$(CONFIG_NET_DSA_YT921X_LEDS) += leds.o
diff --git a/drivers/net/dsa/motorcomm/chip.c b/drivers/net/dsa/motorcomm/chip.c
index d44f7749de02..4856db69e2ea 100644
--- a/drivers/net/dsa/motorcomm/chip.c
+++ b/drivers/net/dsa/motorcomm/chip.c
@@ -26,6 +26,7 @@
 #include <net/pkt_cls.h>
 
 #include "chip.h"
+#include "leds.h"
 #include "smi.h"
 
 struct yt921x_mib_desc {
@@ -151,8 +152,6 @@ static const struct yt921x_info yt921x_infos[] = {
 	{}
 };
 
-#define YT921X_NAME	"yt921x"
-
 #define YT921X_VID_UNWARE	4095
 
 /* The interval should be small enough to avoid overflow of 32bit MIBs.
@@ -4559,6 +4558,10 @@ static int yt921x_chip_setup(struct yt921x_priv *priv)
 		return res;
 #endif
 
+	res = yt921x_led_setup(priv);
+	if (res)
+		return res;
+
 	/* Clear MIB */
 	ctrl = YT921X_MIB_CTRL_CLEAN | YT921X_MIB_CTRL_ALL_PORT;
 	res = yt921x_reg_write(priv, YT921X_MIB_CTRL, ctrl);
diff --git a/drivers/net/dsa/motorcomm/chip.h b/drivers/net/dsa/motorcomm/chip.h
index 950a5799f8b6..ea889319d996 100644
--- a/drivers/net/dsa/motorcomm/chip.h
+++ b/drivers/net/dsa/motorcomm/chip.h
@@ -850,9 +850,13 @@ enum yt921x_fdb_entry_status {
 #define YT921X_ACL_NUM		(YT921X_ACL_BLK_NUM * YT921X_ACL_ENT_PER_BLK)
 #define YT921X_UDF_NUM		8
 
+#define YT921X_LED_GROUP_NUM	3
+
 /* 8 internal + 2 external + 1 mcu */
 #define YT921X_PORT_NUM			11
 
+#define YT921X_NAME	"yt921x"
+
 #define yt921x_port_is_internal(port) ((port) < 8)
 #define yt921x_port_is_external(port) (8 <= (port) && (port) < 9)
 
@@ -928,6 +932,14 @@ struct yt921x_acl_blk {
 	struct yt921x_acl_rule *rules[YT921X_ACL_ENT_PER_BLK];
 };
 
+struct yt921x_led {
+	struct led_classdev cdev;
+	unsigned char group;
+
+	bool use_cycle;
+	bool use_duty;
+};
+
 struct yt921x_port {
 	struct yt921x_priv *priv;
 	unsigned char index;
@@ -939,6 +951,12 @@ struct yt921x_port {
 	struct yt921x_mib mib;
 	u64 rx_frames;
 	u64 tx_frames;
+
+#if IS_ENABLED(CONFIG_NET_DSA_YT921X_LEDS)
+	struct yt921x_led leds[YT921X_LED_GROUP_NUM];
+	unsigned int blink_cycle;
+	unsigned int blink_duty;
+#endif
 };
 
 struct yt921x_reg_ops {
diff --git a/drivers/net/dsa/motorcomm/leds.c b/drivers/net/dsa/motorcomm/leds.c
new file mode 100644
index 000000000000..49d657b38822
--- /dev/null
+++ b/drivers/net/dsa/motorcomm/leds.c
@@ -0,0 +1,530 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2026 David Yang
+ */
+
+#include "chip.h"
+#include "leds.h"
+#include "smi.h"
+
+#define to_yt921x_led(led_cdev) \
+	container_of_const((led_cdev), struct yt921x_led, cdev)
+#define to_yt921x_port(led) \
+	((void *)((led) - (led)->group) - offsetof(struct yt921x_port, leds))
+#define to_yt921x_priv(pp) ((pp)->priv)
+#define to_device(priv) ((priv)->ds.dev)
+
+static u32 yt921x_led_regaddr(struct yt921x_priv *priv, int port, int group)
+{
+	switch (group) {
+	case 0:
+	default:
+		return YT921X_LED0_PORTn(port);
+	case 1:
+		return YT921X_LED1_PORTn(port);
+	case 2:
+		return YT921X_LED2_PORTn(port);
+	}
+}
+
+static int
+yt921x_led_force_get(struct yt921x_priv *priv, int port, int group, bool *onp)
+{
+	u32 val;
+	int res;
+
+	res = yt921x_reg_read(priv, YT921X_LED2_PORTn(port), &val);
+	if (res)
+		return res;
+
+	*onp = (val & YT921X_LED2_PORT_FORCEn_M(group)) ==
+	       YT921X_LED2_PORT_FORCEn_ON(group);
+	return 0;
+}
+
+static int
+yt921x_led_force_set(struct yt921x_priv *priv, int port, int group, bool on)
+{
+	struct yt921x_port *pp = priv->ports[port];
+	struct yt921x_led *led = &pp->leds[group];
+	u32 ctrl;
+	u32 mask;
+
+	if (!pp)
+		return -ENODEV;
+
+	led->use_cycle = false;
+	led->use_duty = false;
+
+	mask = YT921X_LED2_PORT_FORCEn_M(group);
+	ctrl = on ? YT921X_LED2_PORT_FORCEn_ON(group) :
+	       YT921X_LED2_PORT_FORCEn_OFF(group);
+	return yt921x_reg_update_bits(priv, YT921X_LED2_PORTn(port), mask,
+				      ctrl);
+}
+
+/* 2*lcm(2,3,4,6) */
+#define YT921X_LED_DUTY_DENOM 24
+#define YT921X_LED_DUTY(nom, denom) (YT921X_LED_DUTY_DENOM * (nom) / (denom))
+
+#define M_SQRT2 1.41421356237309504880
+
+static int
+yt921x_led_blink_select(const struct yt921x_priv *priv, unsigned long on,
+			unsigned long off, unsigned int *cyclep,
+			unsigned int *dutyp)
+{
+	unsigned int cycle_upper;
+	unsigned int cycle_req;
+	unsigned int cycle;
+	unsigned int duty;
+
+	if (!on && !off) {
+		*cyclep = YT921X_LED_BLINK_DEF;
+		*dutyp = YT921X_LED_DUTY(1, 2);
+		return 0;
+	}
+
+	cycle = YT921X_LED_BLINK_MAX;
+	cycle_upper = M_SQRT2 * YT921X_LED_BLINK_MAX + 1;
+	if (cycle_upper <= on + off)
+		return -EOPNOTSUPP;
+
+	cycle_req = on + off;
+	for (; cycle > YT921X_LED_BLINK_MIN; cycle_upper >>= 1, cycle >>= 1)
+		if (cycle_upper >> 1 <= cycle_req)
+			break;
+
+	duty = YT921X_LED_DUTY(on > off ? off : on, cycle_req);
+	if (duty < YT921X_LED_DUTY(5, 24))
+		duty = YT921X_LED_DUTY(1, 6);
+	else if (duty < YT921X_LED_DUTY(7, 24))
+		duty = YT921X_LED_DUTY(1, 4);
+	else if (duty < YT921X_LED_DUTY(5, 12))
+		duty = YT921X_LED_DUTY(1, 3);
+	else
+		duty = YT921X_LED_DUTY(1, 2);
+	if (on > off)
+		duty = YT921X_LED_DUTY_DENOM - duty;
+
+	*cyclep = cycle;
+	*dutyp = duty;
+	return 0;
+}
+
+static int
+yt921x_led_blink_set(struct yt921x_priv *priv, int port, int group,
+		     unsigned long *onp, unsigned long *offp)
+{
+	struct yt921x_port *pp = priv->ports[port];
+	struct yt921x_led *led = &pp->leds[group];
+	unsigned int cycle;
+	unsigned int duty;
+	bool change_cycle;
+	bool change_duty;
+	bool use_cycle;
+	u32 ctrl;
+	u32 mask;
+	u32 val;
+	int res;
+
+	if (!pp)
+		return -ENODEV;
+
+	res = yt921x_led_blink_select(priv, *onp, *offp, &cycle, &duty);
+	if (res)
+		return res;
+
+	use_cycle = cycle < YT921X_LED_BLINK_DEF;
+	change_cycle = use_cycle && cycle != pp->blink_cycle;
+	change_duty = duty != pp->blink_duty;
+	if (change_cycle || change_duty)
+		for (unsigned int i = 0; i < YT921X_LED_GROUP_NUM; i++) {
+			if (i == group)
+				continue;
+			if ((change_cycle && pp->leds[i].use_cycle) ||
+			    (change_duty && pp->leds[i].use_duty))
+				return -EOPNOTSUPP;
+		}
+
+	mask = YT921X_LED1_PORT_BLINK_DUTY_M | YT921X_LED1_PORT_BLINK_DUTY_COMP;
+	switch (duty >= YT921X_LED_DUTY(1, 2) ? duty :
+		YT921X_LED_DUTY_DENOM - duty) {
+	default:
+		duty = YT921X_LED_DUTY(1, 2);
+		fallthrough;
+	case YT921X_LED_DUTY(1, 2):
+		ctrl = YT921X_LED1_PORT_BLINK_DUTY_1_2;
+		break;
+	case YT921X_LED_DUTY(2, 3):
+		ctrl = YT921X_LED1_PORT_BLINK_DUTY_2_3;
+		break;
+	case YT921X_LED_DUTY(3, 4):
+		ctrl = YT921X_LED1_PORT_BLINK_DUTY_3_4;
+		break;
+	case YT921X_LED_DUTY(5, 6):
+		ctrl = YT921X_LED1_PORT_BLINK_DUTY_5_6;
+		break;
+	}
+	if (duty < YT921X_LED_DUTY(1, 2))
+		ctrl |= YT921X_LED1_PORT_BLINK_DUTY_COMP;
+	if (use_cycle) {
+		mask |= YT921X_LED1_PORT_OTHER_BLINK_M;
+		ctrl |= YT921X_LED1_PORT_OTHER_BLINK(9 - __fls(cycle));
+	}
+	res = yt921x_reg_update_bits(priv, YT921X_LED1_PORTn(port), mask, ctrl);
+	if (res)
+		return res;
+
+	res = yt921x_reg_read(priv, YT921X_LED2_PORTn(port), &val);
+	if (res)
+		return res;
+
+	/* The chip seems to jam for a while if only the duty is changed */
+	ctrl = val & ~YT921X_LED2_PORT_FORCEn_M(group);
+	ctrl |= YT921X_LED2_PORT_FORCEn_OFF(group);
+	if (ctrl != val) {
+		res = yt921x_reg_write(priv, YT921X_LED2_PORTn(port), ctrl);
+		if (res)
+			return res;
+	}
+
+	ctrl = val & ~(YT921X_LED2_PORT_FORCEn_M(group) |
+		       YT921X_LED2_PORT_FORCE_BLINKn_M(group));
+	ctrl |= YT921X_LED2_PORT_FORCEn_BLINK(group);
+	if (use_cycle)
+		ctrl |= YT921X_LED2_PORT_FORCE_BLINKn_OTHER(group);
+	else
+		ctrl |= YT921X_LED2_PORT_FORCE_BLINKn(group, __fls(cycle) - 9);
+	res = yt921x_reg_write(priv, YT921X_LED2_PORTn(port), ctrl);
+	if (res)
+		return res;
+
+	if (use_cycle) {
+		led->use_cycle = true;
+		pp->blink_cycle = cycle;
+	}
+	led->use_duty = true;
+	pp->blink_duty = duty;
+
+	*onp = duty * cycle / YT921X_LED_DUTY_DENOM;
+	*offp = cycle - *onp;
+	return 0;
+}
+
+static u32 yt921x_led_trigger_maps[__TRIGGER_NETDEV_MAX] = {
+	[TRIGGER_NETDEV_LINK]		= YT921X_LEDx_PORT_ACT_ACTIVE,
+	[TRIGGER_NETDEV_LINK_10]	= YT921X_LEDx_PORT_ACT_10M,
+	[TRIGGER_NETDEV_LINK_100]	= YT921X_LEDx_PORT_ACT_100M,
+	[TRIGGER_NETDEV_LINK_1000]	= YT921X_LEDx_PORT_ACT_1000M,
+	[TRIGGER_NETDEV_HALF_DUPLEX]	= YT921X_LEDx_PORT_ACT_DUPLEX_HALF,
+	[TRIGGER_NETDEV_FULL_DUPLEX]	= YT921X_LEDx_PORT_ACT_DUPLEX_FULL,
+	[TRIGGER_NETDEV_TX]		= YT921X_LEDx_PORT_ACT_TX,
+	[TRIGGER_NETDEV_RX]		= YT921X_LEDx_PORT_ACT_RX,
+};
+
+static bool yt921x_led_trigger_is_supported(int group, unsigned long flags)
+{
+	unsigned int i;
+
+	for_each_set_bit(i, &flags, __TRIGGER_NETDEV_MAX)
+		if (!yt921x_led_trigger_maps[i])
+			return false;
+
+	return true;
+}
+
+static int
+yt921x_led_trigger_get(struct yt921x_priv *priv, int port, int group,
+		       unsigned long *flagsp)
+{
+	u32 addr = yt921x_led_regaddr(priv, port, group);
+	u32 val;
+	int res;
+
+	res = yt921x_reg_read(priv, addr, &val);
+	if (res)
+		return res;
+
+	*flagsp = 0;
+	for (unsigned int i = 0; i < __TRIGGER_NETDEV_MAX; i++)
+		if (val & yt921x_led_trigger_maps[i])
+			*flagsp |= BIT(i);
+
+	return 0;
+}
+
+static int
+yt921x_led_trigger_set(struct yt921x_priv *priv, int port, int group,
+		       unsigned long flags)
+{
+	struct yt921x_port *pp = priv->ports[port];
+	struct yt921x_led *led = &pp->leds[group];
+	unsigned int i;
+	u32 addr;
+	u32 ctrl;
+	u32 mask;
+	int res;
+
+	if (!pp)
+		return -ENODEV;
+
+	ctrl = 0;
+	for_each_set_bit(i, &flags, __TRIGGER_NETDEV_MAX) {
+		if (!yt921x_led_trigger_maps[i])
+			return -EOPNOTSUPP;
+
+		ctrl |= yt921x_led_trigger_maps[i];
+	}
+
+	led->use_cycle = false;
+	led->use_duty = false;
+
+	mask = !group ? YT921X_LED0_PORT_ACT_M : YT921X_LEDx_PORT_ACT_M;
+	if (group == 2) {
+		mask |= YT921X_LED2_PORT_FORCEn_M(group);
+		ctrl |= YT921X_LED2_PORT_FORCEn_DONTCARE(group);
+	}
+	addr = yt921x_led_regaddr(priv, port, group);
+	res = yt921x_reg_update_bits(priv, addr, mask, ctrl);
+	if (res)
+		return res;
+
+	if (group != 2) {
+		mask = YT921X_LED2_PORT_FORCEn_M(group);
+		ctrl = YT921X_LED2_PORT_FORCEn_DONTCARE(group);
+		res = yt921x_reg_update_bits(priv, YT921X_LED2_PORTn(port),
+					     mask, ctrl);
+		if (res)
+			return res;
+	}
+
+	return 0;
+}
+
+static enum led_brightness
+yt921x_cled_brightness_get(struct led_classdev *led_cdev)
+{
+	struct yt921x_led *led = to_yt921x_led(led_cdev);
+	struct yt921x_port *pp = to_yt921x_port(led);
+	struct yt921x_priv *priv = to_yt921x_priv(pp);
+	bool on = false;
+
+	mutex_lock(&priv->reg_lock);
+	yt921x_led_force_get(priv, pp->index, led->group, &on);
+	mutex_unlock(&priv->reg_lock);
+
+	return on ? LED_ON : LED_OFF;
+}
+
+static int
+yt921x_cled_brightness_set_blocking(struct led_classdev *led_cdev,
+				    enum led_brightness brightness)
+{
+	struct yt921x_led *led = to_yt921x_led(led_cdev);
+	struct yt921x_port *pp = to_yt921x_port(led);
+	struct yt921x_priv *priv = to_yt921x_priv(pp);
+	int res;
+
+	mutex_lock(&priv->reg_lock);
+	res = yt921x_led_force_set(priv, pp->index, led->group, brightness);
+	mutex_unlock(&priv->reg_lock);
+
+	return res;
+}
+
+static int
+yt921x_cled_blink_set(struct led_classdev *led_cdev, unsigned long *delay_on,
+		      unsigned long *delay_off)
+{
+	struct yt921x_led *led = to_yt921x_led(led_cdev);
+	struct yt921x_port *pp = to_yt921x_port(led);
+	struct yt921x_priv *priv = to_yt921x_priv(pp);
+	int res;
+
+	mutex_lock(&priv->reg_lock);
+	res = yt921x_led_blink_set(priv, pp->index, led->group, delay_on,
+				   delay_off);
+	mutex_unlock(&priv->reg_lock);
+
+	return res;
+}
+
+static struct device * __maybe_unused
+yt921x_cled_hw_control_get_device(struct led_classdev *led_cdev)
+{
+	struct yt921x_led *led = to_yt921x_led(led_cdev);
+	struct yt921x_port *pp = to_yt921x_port(led);
+	struct yt921x_priv *priv = to_yt921x_priv(pp);
+	struct dsa_port *dp;
+
+	dp = dsa_to_port(&priv->ds, pp->index);
+	if (!dp || !dp->user)
+		return NULL;
+	return &dp->user->dev;
+}
+
+static int __maybe_unused
+yt921x_cled_hw_control_is_supported(struct led_classdev *led_cdev,
+				    unsigned long flags)
+{
+	struct yt921x_led *led = to_yt921x_led(led_cdev);
+
+	return yt921x_led_trigger_is_supported(led->group, flags) ? 0 :
+	       -EOPNOTSUPP;
+}
+
+static int __maybe_unused
+yt921x_cled_hw_control_get(struct led_classdev *led_cdev, unsigned long *flagsp)
+{
+	struct yt921x_led *led = to_yt921x_led(led_cdev);
+	struct yt921x_port *pp = to_yt921x_port(led);
+	struct yt921x_priv *priv = to_yt921x_priv(pp);
+	int res;
+
+	mutex_lock(&priv->reg_lock);
+	res = yt921x_led_trigger_get(priv, pp->index, led->group, flagsp);
+	mutex_unlock(&priv->reg_lock);
+
+	return res;
+}
+
+static int __maybe_unused
+yt921x_cled_hw_control_set(struct led_classdev *led_cdev, unsigned long flags)
+{
+	struct yt921x_led *led = to_yt921x_led(led_cdev);
+	struct yt921x_port *pp = to_yt921x_port(led);
+	struct yt921x_priv *priv = to_yt921x_priv(pp);
+	int res;
+
+	mutex_lock(&priv->reg_lock);
+	res = yt921x_led_trigger_set(priv, pp->index, led->group, flags);
+	mutex_unlock(&priv->reg_lock);
+
+	return res;
+}
+
+static int
+yt921x_led_setup_port(struct yt921x_priv *priv, int port,
+		      struct fwnode_handle *fwnode, u32 *invp)
+{
+	struct yt921x_port *pp = priv->ports[port];
+	struct device *dev = to_device(priv);
+	struct led_init_data init_data = {};
+	struct led_classdev *led_cdev;
+	enum led_default_state state;
+	struct yt921x_led *led;
+	char name[64];
+	u32 group;
+	int res;
+
+	if (!pp)
+		return -ENODEV;
+
+	res = fwnode_property_read_u32(fwnode, "reg", &group);
+	if (res)
+		return res;
+
+	if (group >= YT921X_LED_GROUP_NUM) {
+		dev_warn(dev, "Invalid LED reg %u defined for port %d\n",
+			 group, port);
+		return -EINVAL;
+	}
+
+	led = &pp->leds[group];
+	led->group = group;
+
+	led_cdev = &led->cdev;
+	state = led_init_default_state_get(fwnode);
+	switch (state) {
+	case LEDS_DEFSTATE_OFF:
+	case LEDS_DEFSTATE_ON:
+		res = yt921x_led_force_set(priv, port, group, state);
+		if (res)
+			return res;
+		led_cdev->brightness = state;
+		break;
+	case LEDS_DEFSTATE_KEEP: {
+		bool on;
+
+		res = yt921x_led_force_get(priv, port, group, &on);
+		if (res)
+			return res;
+		led_cdev->brightness = on ? LED_ON : LED_OFF;
+		break;
+	}
+	}
+	led_cdev->max_brightness = 1;
+	led_cdev->flags = LED_RETAIN_AT_SHUTDOWN;
+	led_cdev->brightness_get = yt921x_cled_brightness_get;
+	led_cdev->brightness_set_blocking = yt921x_cled_brightness_set_blocking;
+	led_cdev->blink_set = yt921x_cled_blink_set;
+#ifdef CONFIG_LEDS_TRIGGERS
+	led_cdev->hw_control_trigger = "netdev";
+	led_cdev->hw_control_get_device = yt921x_cled_hw_control_get_device;
+	led_cdev->hw_control_is_supported = yt921x_cled_hw_control_is_supported;
+	led_cdev->hw_control_get = yt921x_cled_hw_control_get;
+	led_cdev->hw_control_set = yt921x_cled_hw_control_set;
+#endif
+
+	init_data.fwnode = fwnode;
+	snprintf(name, sizeof(name), YT921X_NAME "-%d:%02d:%d", priv->ds.index,
+		 port, group);
+	init_data.devicename = name;
+	init_data.devname_mandatory = true;
+
+	res = devm_led_classdev_register_ext(dev, led_cdev, &init_data);
+	if (res) {
+		dev_warn(dev, "Failed to init LED %u for port %d\n", group, port);
+		return res;
+	}
+
+	return 0;
+}
+
+int yt921x_led_setup(struct yt921x_priv *priv)
+{
+	struct dsa_switch *ds = &priv->ds;
+	struct dsa_port *dp;
+	u32 mask;
+	u32 ctrl;
+	int res;
+
+	mask = YT921X_LED_CTRL_MODE_M | YT921X_LED_CTRL_PORT_NUM_M |
+	       YT921X_LED_CTRL_EN;
+	ctrl = YT921X_LED_CTRL_MODE_PARALLEL | YT921X_LED_CTRL_PORT_NUM_M |
+	       YT921X_LED_CTRL_EN;
+	res = yt921x_reg_update_bits(priv, YT921X_LED_CTRL, mask, ctrl);
+	if (res)
+		return res;
+
+	ctrl = 0;
+	dsa_switch_for_each_port(dp, ds) {
+		struct device_node *leds_np;
+
+		if (!dp->dn)
+			continue;
+
+		leds_np = of_get_child_by_name(dp->dn, "leds");
+		if (!leds_np)
+			continue;
+
+		for_each_child_of_node_scoped(leds_np, led_np) {
+			res = yt921x_led_setup_port(priv, dp->index,
+						    of_fwnode_handle(led_np),
+						    &ctrl);
+			if (res)
+				break;
+		}
+
+		of_node_put(leds_np);
+		if (res)
+			return res;
+	}
+
+	res = yt921x_reg_write(priv, YT921X_LED_PAR_INV, ctrl);
+	if (res)
+		return res;
+
+	return 0;
+}
diff --git a/drivers/net/dsa/motorcomm/leds.h b/drivers/net/dsa/motorcomm/leds.h
new file mode 100644
index 000000000000..265d5ea5f04e
--- /dev/null
+++ b/drivers/net/dsa/motorcomm/leds.h
@@ -0,0 +1,104 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2026 David Yang
+ */
+
+#ifndef _YT_LEDS_H
+#define _YT_LEDS_H
+
+#include <linux/bitfield.h>
+#include <linux/bits.h>
+#include <linux/kconfig.h>
+
+#define YT921X_LED_CTRL			0xd0000
+#define  YT921X_LED_CTRL_EN			BIT(21)
+#define  YT921X_LED_CTRL_LOOPDETECT_BLINK_M	GENMASK(20, 19)	/* cycle = 512 * x ms */
+#define   YT921X_LED_CTRL_LOOPDETECT_BLINK(x)		FIELD_PREP(YT921X_LED_CTRL_LOOPDETECT_BLINK_M, (x))
+#define  YT921X_LED_CTRL_PORT_NUM_M		GENMASK(16, 13)
+#define   YT921X_LED_CTRL_PORT_NUM(x)			FIELD_PREP(YT921X_LED_CTRL_PORT_NUM_M, (x))
+#define  YT921X_LED_CTRL_MODE_M			GENMASK(1, 0)
+#define   YT921X_LED_CTRL_MODE(x)			FIELD_PREP(YT921X_LED_CTRL_MODE_M, (x))
+#define   YT921X_LED_CTRL_MODE_PARALLEL			YT921X_LED_CTRL_MODE(0)
+#define   YT921X_LED_CTRL_MODE_SERIAL			YT921X_LED_CTRL_MODE(2)
+#define YT921X_LED0_PORTn(port)		(0xd0004 + 4 * (port))
+#define  YT921X_LED0_PORT_ACT_M			GENMASK(17, 0)
+#define  YT921X_LED0_PORT_ACT_LINK_TRY_DIS	BIT(17)
+#define  YT921X_LED0_PORT_ACT_COLLISION_BLINK	BIT(16)
+#define YT921X_LED1_PORTn(port)		(0xd0040 + 4 * (port))
+#define  YT921X_LED1_PORT_OTHER_BLINK_M		GENMASK(31, 30)	/* cycle = 512 >> x ms */
+#define   YT921X_LED1_PORT_OTHER_BLINK(x)		FIELD_PREP(YT921X_LED1_PORT_OTHER_BLINK_M, (x))
+#define  YT921X_LED1_PORT_EEE_BLINK_M		GENMASK(29, 28)	/* cycle = 512 >> x ms */
+#define   YT921X_LED1_PORT_EEE_BLINK(x)			FIELD_PREP(YT921X_LED1_PORT_EEE_BLINK_M, (x))
+#define  YT921X_LED1_PORT_BLINK_DUTY_COMP	BIT(27)
+#define  YT921X_LED1_PORT_BLINK_DUTY_M		GENMASK(26, 25)
+#define   YT921X_LED1_PORT_BLINK_DUTY(x)		FIELD_PREP(YT921X_LED1_PORT_BLINK_DUTY_M, (x))
+#define   YT921X_LED1_PORT_BLINK_DUTY_1_2		YT921X_LED1_PORT_BLINK_DUTY(0)
+#define   YT921X_LED1_PORT_BLINK_DUTY_2_3		YT921X_LED1_PORT_BLINK_DUTY(1)
+#define   YT921X_LED1_PORT_BLINK_DUTY_3_4		YT921X_LED1_PORT_BLINK_DUTY(2)
+#define   YT921X_LED1_PORT_BLINK_DUTY_5_6		YT921X_LED1_PORT_BLINK_DUTY(3)
+#define YT921X_LED2_PORTn(port)		(0xd0080 + 4 * (port))
+#define  YT921X_LED2_PORT_FORCEn_M(grp)		GENMASK(4 * (grp) + 19, 4 * (grp) + 18)
+#define   YT921X_LED2_PORT_FORCEn(grp, x)		((x) << (4 * (grp) + 18))
+#define   YT921X_LED2_PORT_FORCEn_DONTCARE(grp)		YT921X_LED2_PORT_FORCEn(grp, 0)
+#define   YT921X_LED2_PORT_FORCEn_BLINK(grp)		YT921X_LED2_PORT_FORCEn(grp, 1)
+#define   YT921X_LED2_PORT_FORCEn_ON(grp)		YT921X_LED2_PORT_FORCEn(grp, 2)
+#define   YT921X_LED2_PORT_FORCEn_OFF(grp)		YT921X_LED2_PORT_FORCEn(grp, 3)
+#define  YT921X_LED2_PORT_FORCE_BLINKn_M(grp)	GENMASK(4 * (grp) + 17, 4 * (grp) + 16)	/* cycle = 512 << x ms */
+#define   YT921X_LED2_PORT_FORCE_BLINKn(grp, x)		((x) << (4 * (grp) + 16))
+#define   YT921X_LED2_PORT_FORCE_BLINKn_OTHER(grp)	YT921X_LED2_PORT_FORCE_BLINKn(grp, 3)
+#define  YT921X_LEDx_PORT_ACT_M			GENMASK(16, 0)
+#define  YT921X_LEDx_PORT_ACT_EEE		BIT(15)
+#define  YT921X_LEDx_PORT_ACT_LOOPDETECT	BIT(14)
+#define  YT921X_LEDx_PORT_ACT_ACTIVE		BIT(13)
+#define  YT921X_LEDx_PORT_ACT_DUPLEX_FULL	BIT(12)
+#define  YT921X_LEDx_PORT_ACT_DUPLEX_HALF	BIT(11)
+#define  YT921X_LEDx_PORT_ACT_TX_BLINK		BIT(10)
+#define  YT921X_LEDx_PORT_ACT_RX_BLINK		BIT(9)
+#define  YT921X_LEDx_PORT_ACT_TX		BIT(8)
+#define  YT921X_LEDx_PORT_ACT_RX		BIT(7)
+#define  YT921X_LEDx_PORT_ACT_1000M		BIT(6)
+#define  YT921X_LEDx_PORT_ACT_100M		BIT(5)
+#define  YT921X_LEDx_PORT_ACT_10M		BIT(4)
+#define  YT921X_LEDx_PORT_ACT_COLLISION_BLINK_EN	BIT(3)
+#define  YT921X_LEDx_PORT_ACT_1000M_BLINK	BIT(2)
+#define  YT921X_LEDx_PORT_ACT_100M_BLINK	BIT(1)
+#define  YT921X_LEDx_PORT_ACT_10M_BLINK		BIT(0)
+#define YT921X_LED_SER_CTRL		0xd0100
+#define  YT921X_LED_SER_CTRL_EN			GENMASK(25, 24)
+#define  YT921X_LED_SER_CTRL_ACTIVE_LOW		BIT(4)
+#define  YT921X_LED_SER_CTRL_LED_NUM_M		GENMASK(1, 0)	/* #led - 1 */
+#define   YT921X_LED_SER_CTRL_LED_NUM(x)		FIELD_PREP(YT921X_LED_SER_CTRL_LED_NUM_M, (x))
+#define YT921X_LED_SER_MAPnm(grp, port)	(0xd0104 + 8 * (2 - (grp)) + 4 * ((port) / 5))
+#define  YT921X_LED_SER_MAP_DSTn_PORT_M(port)	GENMASK(6 * ((port) % 5) + 5, 6 * ((port) % 5) + 2)
+#define   YT921X_LED_SER_MAP_DSTn_PORT(port, x)		((x) << (6 * ((port) % 5) + 2))
+#define  YT921X_LED_SER_MAP_DSTn_LED_M(port)	GENMASK(6 * ((port) % 5) + 1, 6 * ((port) % 5))
+#define   YT921X_LED_SER_MAP_DSTn_LED(port, x)		((x) << (6 * ((port) % 5)))
+#define YT921X_LED_PAR_PORTS		0xd01c4
+#define YT921X_LED_PAR_INV		0xd01c8
+#define  YT921X_LED_PAR_INV_INVnm(grp, port)	BIT(10 * (grp) + (port))
+#define YT921X_LED_PAR_MAPn(port)	(0xd01d0 + 4 * (port))
+#define  YT921X_LED_PAR_MAP_DSTn_PORT_M(grp)	GENMASK(6 * (grp) + 5, 6 * (grp) + 2)
+#define   YT921X_LED_PAR_MAP_DSTn_PORT(grp, x)		((x) << (6 * (grp) + 2))
+#define  YT921X_LED_PAR_MAP_DSTn_LED_M(grp)	GENMASK(6 * (grp) + 1, 6 * (grp))
+#define   YT921X_LED_PAR_MAP_DSTn_LED(grp, x)		((x) << (6 * (grp)))
+
+#define YT921X_LED_BLINK_MIN	64
+#define YT921X_LED_BLINK_DEF	512
+#define YT921X_LED_BLINK_MAX	2048
+
+struct yt921x_priv;
+
+#if IS_ENABLED(CONFIG_NET_DSA_YT921X_LEDS)
+
+int yt921x_led_setup(struct yt921x_priv *priv);
+
+#else
+
+static inline int yt921x_led_setup(struct yt921x_priv *priv)
+{
+	return 0;
+}
+
+#endif
+
+#endif
-- 
2.53.0


* Re: [PATCH net] eth: bnxt: improve the timing of stats
From: Michael Chan @ 2026-06-18 20:35 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms,
	pavan.chebbi
In-Reply-To: <20260618181358.3037661-1-kuba@kernel.org>

On Thu, Jun 18, 2026 at 11:14 AM Jakub Kicinski <kuba@kernel.org> wrote:

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 055e93a417b6..25462f854478 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -10575,6 +10575,35 @@ static void bnxt_accumulate_all_stats(struct bnxt *bp)
>         }
>  }
>
> +/* Re-accumulate stats from DMA buffers if stale.
> + * uAPIs for reading sw_stats should call this first.
> + *
> + * We promise user space update frequency of bp->stats_coal_ticks but
> + * the update is a two step process - first device updates the DMA buffer,
> + * then we have to update from that buffer to driver stats in the service work.
> + * Worst case we would be 2x off from the desired frequency.
> + * Sync the stats sooner, if stale. The 20% threshold was chosen arbitrarily.
> + *
> + * Ideally we would split the user-configured time into two portions,
> + * i.e. also lower the DMA period by the 20%. But the DMA timer seems to have
> + * too coarse granularity to play such tricks.
> + */
> +void bnxt_sync_stats(struct bnxt *bp)
> +{
> +       unsigned long stale;
> +
> +       if (!netif_running(bp->dev) || !bp->stats_coal_ticks)
> +               return;
> +
> +       spin_lock(&bp->stats_lock);
> +       stale = usecs_to_jiffies(bp->stats_coal_ticks / 5);
> +       if (time_after_eq(jiffies, bp->stats_updated_jiffies + stale)) {
> +               bnxt_accumulate_all_stats(bp);

This call will accumulate all stats including ring stats and port
stats.  I think only the ring stats are worth accumulating because
they may have been updated by DMA.  The port stats should not have
changed.  They only change after calling bnxt_hwrm_port_qstats(), etc.

So ideally, we should factor out the ring stats part from
bnxt_accumulate_all_stats() and only accumulate the ring stats here.
Thanks.
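
For illustration only, here is a minimal userspace sketch of the staleness
test in the quoted hunk. It assumes a free-running unsigned tick counter in
place of jiffies; ticks_after_eq(), stats_stale() and the tick units are
hypothetical stand-ins, not bnxt code:

```c
#include <assert.h>
#include <stdbool.h>

/* Wraparound-safe "a is at or after b", in the spirit of the kernel's
 * time_after_eq(). Hypothetical userspace stand-in.
 */
static bool ticks_after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

/* Stats count as stale once 1/5 (20%) of the coalescing period has
 * elapsed since the last accumulation. A period of 0 means the periodic
 * stats update is disabled, so nothing is ever considered stale.
 */
static bool stats_stale(unsigned long now, unsigned long updated,
			unsigned long period)
{
	if (!period)
		return false;
	return ticks_after_eq(now, updated + period / 5);
}
```

The signed-difference comparison keeps the check correct when the counter
wraps, which is the reason the quoted code uses time_after_eq() rather than
a plain ">=".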

* Re: [PATCH net] net: dsa: realtek: fix memory leak in rtl8366rb_setup_led()
From: David Yang @ 2026-06-18 20:52 UTC (permalink / raw)
  To: Luiz Angelo Daros de Luca
  Cc: netdev, Linus Walleij, Alvin Šipraga, Andrew Lunn,
	Vladimir Oltean, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, linux-kernel
In-Reply-To: <CAJq09z7fBSdknaNW0ufBYm4wO2vL1tuBjBW5FV3NeGRg8749SA@mail.gmail.com>

On Fri, Jun 19, 2026 at 4:12 AM Luiz Angelo Daros de Luca
<luizluca@gmail.com> wrote:
> Indeed, it will leak. init_data is local and init_data.devicename is
> read by led_compose_name, not stored. However, stack is a limited
> space for allocation.

I've checked that the buffer is long enough to hold the name string while
still being relatively small (only 64 bytes), so it should be safe on the stack.

> You can alternatively solve the leak using devm_kasprintf() (my
> choice) or adding a kfree() before leaving the function.

devm_kasprintf() still makes the memory unused (later in the driver)
and unusable (since normally you won't unload the switch driver until
shutdown), IMO.
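
As a rough illustration of the stack-buffer approach (a userspace sketch,
not the driver code): format_led_name() is a hypothetical helper and the
"-%d:%02d:%d" layout is only an example format. The point is that
snprintf() returns the untruncated length, so the caller can verify that
64 bytes are enough:

```c
#include <assert.h>
#include <stdio.h>

#define LED_NAME_LEN 64	/* assumed buffer size, as discussed above */

/* Format "<prefix>-<sw>:<port>:<group>" into a caller-provided buffer.
 * Returns 0 on success, -1 if formatting failed or the name would have
 * been truncated.
 */
static int format_led_name(char *buf, size_t size, const char *prefix,
			   int sw, int port, int group)
{
	int len;

	assert(buf && size > 0);
	len = snprintf(buf, size, "%s-%d:%02d:%d", prefix, sw, port, group);
	if (len < 0 || (size_t)len >= size)
		return -1;
	return 0;
}
```

A caller would declare "char name[LED_NAME_LEN];" on the stack, call the
helper, and hand the buffer to the registration function before returning,
so nothing needs to be freed.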

* Re: [PATCH] net: add sock_open() for unified socket creation
From: Al Viro @ 2026-06-18 21:12 UTC (permalink / raw)
  To: Alex Goltsev; +Cc: davem, netdev, linux-kernel
In-Reply-To: <CAEKmD4KSvAGWEod3h8mPKQ-UYhKqakxfakt4gXrsU8sWuAO77g@mail.gmail.com>

On Thu, Jun 18, 2026 at 04:06:31PM +0300, Alex Goltsev wrote:

> +/**
> + * sock_open - creates a socket (unified interface)
> + * @family: protocol family (AF_INET, ...)
> + * @type: communication type (SOCK_STREAM, ...)
> + * @protocol: protocol (0, ...)
> + * @res: new socket
> + * @sock_type: one of SOCK_USER, SOCK_KERN, or SOCK_LITE
> + *
> + * Unified entry point for all socket creation variants.
> + * SOCK_USER creates a userspace socket (via sock_create).
> + * SOCK_KERN creates a kernel socket (via sock_create_kern).
> + * SOCK_LITE creates a lightweight uninitialized socket (via sock_create_lite).
> + *
> + * Return: 0 on success, negative errno on failure. On failure @res is NULL.
> + */

What's the point (and why not make it inline, while we are at it)?

Are there really callers that would pass a non-constant value as the last argument,
and if so, what are they doing next?

* Re: [PATCH net 1/2] sctp: factor out INIT verification failure handling
From: Xin Long @ 2026-06-18 21:53 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni,
	marcelo.leitner
In-Reply-To: <20260617103954.852101-1-horms@kernel.org>

On Wed, Jun 17, 2026 at 6:40 AM Simon Horman <horms@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
> sctp: factor out INIT verification failure handling
>
> This refactor moves the duplicated INIT/INIT-ACK error handling out of
> sctp_sf_do_5_1B_init(), sctp_sf_do_5_1C_ack(), and
> sctp_sf_do_unexpected_init() into a new helper sctp_abort_on_init_err().
> The commit message states "No functional change intended."
>
> > diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> > index 9b23c11cbb9e..544f308ee527 100644
> > --- a/net/sctp/sm_statefuns.c
> > +++ b/net/sctp/sm_statefuns.c
>
> [ ... ]
>
> > @@ -548,50 +531,9 @@ enum sctp_disposition sctp_sf_do_5_1C_ack(struct net *net,
> >       err_chunk = NULL;
> >       if (!sctp_verify_init(net, ep, asoc, chunk->chunk_hdr->type,
> >                             (struct sctp_init_chunk *)chunk->chunk_hdr, chunk,
> > -                           &err_chunk)) {
> > -
> > -             enum sctp_error error = SCTP_ERROR_NO_RESOURCE;
> > -
> > -             /* This chunk contains fatal error. It is to be discarded.
> > -              * Send an ABORT, with causes.  If there are no causes,
> > -              * then there wasn't enough memory.  Just terminate
> > -              * the association.
> > -              */
> > -             if (err_chunk) {
> > -                     packet = sctp_abort_pkt_new(net, ep, asoc, arg,
> > -                                     (__u8 *)(err_chunk->chunk_hdr) +
> > -                                     sizeof(struct sctp_chunkhdr),
> > -                                     ntohs(err_chunk->chunk_hdr->length) -
> > -                                     sizeof(struct sctp_chunkhdr));
> > -
> > -                     sctp_chunk_free(err_chunk);
> > -
> > -                     if (packet) {
> > -                             sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT,
> > -                                             SCTP_PACKET(packet));
> > -                             SCTP_INC_STATS(net, SCTP_MIB_OUTCTRLCHUNKS);
> > -                             error = SCTP_ERROR_INV_PARAM;
> > -                     }
> > -             }
> > -
> > -             /* SCTP-AUTH, Section 6.3:
> > -              *    It should be noted that if the receiver wants to tear
> > -              *    down an association in an authenticated way only, the
> > -              *    handling of malformed packets should not result in
> > -              *    tearing down the association.
> > -              *
> > -              * This means that if we only want to abort associations
> > -              * in an authenticated way (i.e AUTH+ABORT), then we
> > -              * can't destroy this association just because the packet
> > -              * was malformed.
> > -              */
> > -             if (sctp_auth_recv_cid(SCTP_CID_ABORT, asoc))
> > -                     return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
> > -
> > -             SCTP_INC_STATS(net, SCTP_MIB_ABORTEDS);
> > -             return sctp_stop_t1_and_abort(net, commands, error, ECONNREFUSED,
> > -                                             asoc, chunk->transport);
> > -     }
> > +                           &err_chunk))
> > +             return sctp_abort_on_init_err(net, ep, asoc, chunk, arg,
> > +                                           commands, err_chunk);
>
> [High]
> In the original sctp_sf_do_5_1C_ack() the "if (err_chunk) { ... }"
> block is non-controlling: when err_chunk is NULL the code skips the
> block, falls through with error == SCTP_ERROR_NO_RESOURCE, consults
> sctp_auth_recv_cid(SCTP_CID_ABORT, asoc) and either pdiscards or runs
> sctp_stop_t1_and_abort() to tear the half-open association down.
>
> [High]
> In the new helper, can the INIT-ACK + NULL err_chunk case still reach
> that path?  Looking at sctp_abort_on_init_err():
>
>   if (!err_chunk)
>           return sctp_sf_tabort_8_4_8(net, ep, asoc, SCTP_ST_CHUNK(0),
>                                       arg, commands);
>
>   ...
>   if (chunk->chunk_hdr->type != SCTP_CID_INIT_ACK) {
>           ...
>   }
>   /* SCTP-AUTH, Section 6.3 ... */
>   if (sctp_auth_recv_cid(SCTP_CID_ABORT, asoc))
>           return sctp_sf_pdiscard(...);
>   SCTP_INC_STATS(net, SCTP_MIB_ABORTEDS);
>   return sctp_stop_t1_and_abort(net, commands, error, ECONNREFUSED,
>                                 asoc, chunk->transport);
>
> [High]
> The !err_chunk early return runs before the INIT-ACK type check, so
> when sctp_verify_init() returns 0 with *errp left as NULL on an
> INIT-ACK, control now goes to sctp_sf_tabort_8_4_8() rather than the
> auth-check + sctp_stop_t1_and_abort() path.
>
> [High]
> Is the NULL-err_chunk case actually reachable on an INIT-ACK?  In
> sctp_process_inv_mandatory() and sctp_process_inv_paramlength() in
> net/sctp/sm_make_chunk.c, *errp is allocated via sctp_make_op_error_space()
> or sctp_make_violation_paramlen() and the function returns 0
> unconditionally:
>
>   if (!*errp)
>           *errp = sctp_make_op_error_space(asoc, chunk, 0);
>   if (*errp)
>           sctp_init_cause(...);
>   return 0;
>
> So an allocation failure on a malformed INIT-ACK reaches the helper
> with err_chunk == NULL.
>
> [High]
> On that path, does the new code skip several actions that the original
> performed?  sctp_sf_tabort_8_4_8() emits an OOTB ABORT and pdiscards;
> it does not stop T1-init, does not call SCTP_CMD_SET_SK_ERR(ECONNREFUSED),
> does not issue SCTP_CMD_INIT_FAILED, does not transition the asoc to
> SCTP_STATE_CLOSED, and does not increment SCTP_MIB_ABORTEDS.
>
> [High]
> As a result, would connect(2) wait for T1-init to exhaust its
> retransmissions instead of receiving ECONNREFUSED promptly, and would
> SCTP_MIB_ABORTEDS no longer be incremented (with SCTP_MIB_IN_PKT_DISCARDS
> being incremented in its place)?
>
> [High]
> Does this also bypass the SCTP-AUTH section 6.3 check quoted in the
> comment?  Previously, when AUTH-ABORT was negotiated and err_chunk
> was NULL, sctp_auth_recv_cid() caused sctp_sf_pdiscard() to be returned
> and no ABORT was emitted on the wire.  In the new path, sctp_sf_tabort_8_4_8()
> sends an unauthenticated OOTB ABORT regardless of AUTH negotiation.
>
> [High]
> Finally, does the disposition return value also change from
> SCTP_DISPOSITION_ABORT to SCTP_DISPOSITION_CONSUME on this path?
>
> [High]
> Would moving the chunk->chunk_hdr->type check ahead of the !err_chunk
> early return restore the original semantics for INIT-ACK while keeping
> the sctp_sf_tabort_8_4_8() fallback for INIT?
>
All of the above describes a single regression: the (!err_chunk) check should
come after the (chunk->chunk_hdr->type != SCTP_CID_INIT_ACK) check.

This is a good catch, and I will fix it on v2.
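
Roughly, the ordering being discussed could be modeled as in the sketch
below. It is a standalone model and not the actual v2 patch: the enum
values and init_err_path() are hypothetical and only encode which branch
is taken for each combination of inputs.

```c
#include <assert.h>
#include <stdbool.h>

enum path {
	PATH_OOTB_ABORT,	/* INIT, no err_chunk: sctp_sf_tabort_8_4_8() */
	PATH_INIT_DONE,		/* INIT: ABORT with causes attempted */
	PATH_PDISCARD,		/* INIT-ACK: ABORT must be authenticated */
	PATH_T1_ABORT,		/* INIT-ACK: stop T1-init, abort the asoc */
};

static enum path init_err_path(bool is_init_ack, bool have_err_chunk,
			       bool abort_needs_auth)
{
	/* Check the chunk type first, so that an INIT-ACK never takes the
	 * out-of-the-blue ABORT fallback when err_chunk is NULL.
	 */
	if (!is_init_ack) {
		if (!have_err_chunk)
			return PATH_OOTB_ABORT;
		return PATH_INIT_DONE;
	}

	/* INIT-ACK, with or without an err_chunk: SCTP-AUTH check, then
	 * tear down the half-open association.
	 */
	if (abort_needs_auth)
		return PATH_PDISCARD;
	return PATH_T1_ABORT;
}
```

With this ordering, a malformed INIT-ACK whose error chunk could not be
allocated still ends in the T1-init stop and abort path (or in a discard
when AUTH covers ABORT), as in the code before the refactor.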

Thanks.

> [ ... ]
>
> > @@ -6485,6 +6403,61 @@ static void sctp_send_stale_cookie_err(struct net *net,
> >       }
> >  }
> >
> > +static enum sctp_disposition sctp_abort_on_init_err(
> > +                                     struct net *net,
> > +                                     const struct sctp_endpoint *ep,
> > +                                     const struct sctp_association *asoc,
> > +                                     const struct sctp_chunk *chunk,
> > +                                     void *arg,
> > +                                     struct sctp_cmd_seq *commands,
> > +                                     struct sctp_chunk *err_chunk)
> > +{
> > +     enum sctp_error error = SCTP_ERROR_NO_RESOURCE;
> > +     struct sctp_packet *packet;
> > +     struct sctp_chunkhdr *ch;
> > +
> > +     if (!err_chunk)
> > +             return sctp_sf_tabort_8_4_8(net, ep, asoc, SCTP_ST_CHUNK(0),
> > +                                         arg, commands);
> > +
> > +     ch = err_chunk->chunk_hdr;
> > +     packet = sctp_abort_pkt_new(net, ep, asoc, arg,
> > +                                 (__u8 *)ch + sizeof(*ch),
> > +                                 ntohs(ch->length) - sizeof(*ch));
> > +
> > +     sctp_chunk_free(err_chunk);
> > +
> > +     if (packet) {
> > +             sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT,
> > +                             SCTP_PACKET(packet));
> > +             SCTP_INC_STATS(net, SCTP_MIB_OUTCTRLCHUNKS);
> > +             error = SCTP_ERROR_INV_PARAM;
> > +     }
> > +
> > +     if (chunk->chunk_hdr->type != SCTP_CID_INIT_ACK) {
> > +             if (!packet)
> > +                     return SCTP_DISPOSITION_NOMEM;
> > +             return SCTP_DISPOSITION_CONSUME;
> > +     }
> > +     /* SCTP-AUTH, Section 6.3:
> > +      *    It should be noted that if the receiver wants to tear
> > +      *    down an association in an authenticated way only, the
> > +      *    handling of malformed packets should not result in
> > +      *    tearing down the association.
> > +      *
> > +      * This means that if we only want to abort associations
> > +      * in an authenticated way (i.e AUTH+ABORT), then we
> > +      * can't destroy this association just because the packet
> > +      * was malformed.
> > +      */
> > +     if (sctp_auth_recv_cid(SCTP_CID_ABORT, asoc))
> > +             return sctp_sf_pdiscard(net, ep, asoc, SCTP_ST_CHUNK(0), arg,
> > +                                     commands);
> > +
> > +     SCTP_INC_STATS(net, SCTP_MIB_ABORTEDS);
> > +     return sctp_stop_t1_and_abort(net, commands, error, ECONNREFUSED,
> > +                                   asoc, chunk->transport);
> > +}

^ permalink raw reply

* Re: [PATCH net 2/2] sctp: add INIT verification after cookie unpacking
From: Xin Long @ 2026-06-18 22:08 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, linux-sctp, davem, kuba, edumazet, pabeni,
	marcelo.leitner
In-Reply-To: <20260617104013.852168-1-horms@kernel.org>

On Wed, Jun 17, 2026 at 6:40 AM Simon Horman <horms@kernel.org> wrote:
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://netdev-ai.bots.linux.dev/sashiko/
> ---
>
> commit 9d74027bfd0cb22a103cca034883f0fcb4e2535a
> Author: Xin Long <lucien.xin@gmail.com>
>
> sctp: add INIT verification after cookie unpacking
>
> This commit adds sctp_verify_init() checks after cookie unpacking in
> the COOKIE-ECHO processing paths when cookie_auth_enable is disabled.
> It also changes the bounds comparison inside sctp_verify_init() from
> chunk->chunk_end to peer_init + ntohs(peer_init->chunk_hdr.length).
>
> > diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
> > index 41958b8e59fd..21b9eb1c02e9 100644
> > --- a/net/sctp/sm_make_chunk.c
> > +++ b/net/sctp/sm_make_chunk.c
> > @@ -2298,7 +2298,7 @@ int sctp_verify_init(struct net *net, const struct sctp_endpoint *ep,
> >        * VIOLATION error.  We build the ERROR chunk here and let the normal
> >        * error handling code build and send the packet.
> >        */
> > -     if (param.v != (void *)chunk->chunk_end)
> > +     if (param.v != (void *)peer_init + ntohs(peer_init->chunk_hdr.length))
> >               return sctp_process_inv_paramlength(asoc, param.p, chunk, errp);
>
> [High]
> Can this new comparison reject INIT chunks whose final parameter has a
> length that is not a multiple of 4?
>
> sctp_walk_params() advances pos.v by SCTP_PAD4(ntohs(pos.p->length)) on
> every iteration:
>
>   include/net/sctp/sctp.h:
>   #define _sctp_walk_params(pos, chunk, end)\
>   for (pos.v = (u8 *)(chunk + 1);\
>        ...
>        pos.v += SCTP_PAD4(ntohs(pos.p->length)))
>
> so when the final parameter has an unpadded length L that is not 4-byte
> aligned, pos.v ends at peer_init + offset_of_last_param + SCTP_PAD4(L),
> which exceeds peer_init + ntohs(peer_init->chunk_hdr.length) by 1..3.
>
> The previous comparison used chunk->chunk_end, which inqueue.c computes
> with the padding included:
>
>   net/sctp/inqueue.c:
>   chunk->chunk_end = ((__u8 *)ch) + SCTP_PAD4(ntohs(ch->length));
>
> so the old check accepted unaligned-length final parameters; the new
> check returns 0 from sctp_verify_init(), and the callers then invoke
> sctp_abort_on_init_err() and emit an ABORT.
>
> Note that this change is in sctp_verify_init() itself, so it affects
> every caller, not just the new COOKIE-ECHO sites: sctp_sf_do_5_1B_init,
> sctp_sf_do_5_1C_ack, sctp_sf_do_unexpected_init, sctp_sf_do_5_1D_ce
> and sctp_sf_do_5_2_4_dupcook all reach this comparison.
>
> Realistic triggers include:
>
>   - Locally-generated INITs constructed by sctp_make_init() when both
>     auth_enable and asconf_enable are set: with the default chunk IDs
>     ASCONF + ASCONF_ACK the AUTH-CHUNKS parameter has length 6 and is
>     typically the last parameter, leaving chunk_hdr.length unaligned.
>
>   - Peer INITs containing any final parameter with non-4-aligned
>     length, e.g. SCTP_PARAM_SUPPORTED_ADDRESS_TYPES with 1 or 3 types
>     last, AUTH-CHUNKS with N=1,2,3 entries last, or any unknown TLV.
>
> Even in the COOKIE-ECHO context the new bound seems too strict, since
> sctp_pack_cookie() copies the INIT verbatim with memcpy(cookie + 1,
> init_chunk->chunk_hdr, ntohs(...->length)) followed immediately by
> raw_addrs, so unpacked cookies recovered from peers that sent an INIT
> with an unaligned-length final parameter would also be spuriously
> rejected.
>
> Would comparing against peer_init + SCTP_PAD4(ntohs(peer_init->
> chunk_hdr.length)) preserve the long-standing semantics of chunk_end
> while still validating the cookie-derived peer_init bounds?
>
There's no problem here: the length of an INIT or INIT_ACK chunk is
always a multiple of 4 unless the chunk is abnormal, so we don't need
SCTP_PAD4() for this check.

Thanks.

^ permalink raw reply
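
The arithmetic at issue in the review above can be sketched in isolation.
This is a standalone illustration, not kernel code: PAD4() stands in for
SCTP_PAD4(), and walk_end_offset() and unpadded_chunk_len() are
hypothetical helpers introduced only for this example. It assumes a chunk
whose length field does not count the padding of its final parameter,
which is the case the review asks about; whether such INIT chunks occur
in practice is the point the thread is discussing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a length up to the next multiple of 4, as SCTP_PAD4() does. */
#define PAD4(len) (((size_t)(len) + 3u) & ~(size_t)3u)

/* Hypothetical helper: offset at which a parameter walk stops, given the
 * unpadded lengths of the parameters in order. Every step advances by the
 * padded length, mirroring the pos.v += SCTP_PAD4(...) step quoted above.
 */
static size_t walk_end_offset(size_t hdr_len, const uint16_t *param_lens,
			      size_t n)
{
	size_t off = hdr_len;
	size_t i;

	for (i = 0; i < n; i++)
		off += PAD4(param_lens[i]);
	return off;
}

/* Hypothetical helper: chunk length when the padding of the final
 * parameter is not counted (all earlier parameters are still padded).
 */
static size_t unpadded_chunk_len(size_t hdr_len, const uint16_t *param_lens,
				 size_t n)
{
	size_t off = hdr_len;
	size_t i;

	for (i = 0; i < n; i++)
		off += (i + 1 < n) ? PAD4(param_lens[i]) : param_lens[i];
	return off;
}
```

With a 20-byte header and parameters of length 8 and 6, the walk stops at
offset 36 while the unpadded chunk length is 34. A comparison against the
unpadded length then fails, and one against PAD4() of that length
succeeds. If the chunk length is already a multiple of 4, both
comparisons agree.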
