Netdev List

* Re: [PATCH v3 1/2] net/sched: dualpi2: fix GSO backlog accounting
From: Jamal Hadi Salim @ 2026-06-20 13:35 UTC (permalink / raw)
  To: Xingquan Liu; +Cc: netdev, Jiri Pirko, Victor Nogueira, Chia-Yu Chang, stable
In-Reply-To: <20260619151447.223640-1-b1n@b1n.io>

On Fri, Jun 19, 2026 at 11:15 AM Xingquan Liu <b1n@b1n.io> wrote:
>
> When DualPI2 splits a GSO skb into N segments, it propagates N
> additional packets to its parent before returning NET_XMIT_SUCCESS.
> The parent then accounts for the original skb once more, leaving its
> qlen one larger than the number of packets actually queued.
>
> With QFQ as the parent, after all real packets are dequeued, QFQ still
> has a non-zero qlen while its in-service aggregate has no active
> classes. qfq_choose_next_agg() returns NULL and qfq_dequeue() passes
> the result to qfq_peek_skb(), causing a NULL pointer dereference.
>
> Follow the same pattern used by tbf_segment() and taprio: count only
> successfully queued segments, propagate the difference between the
> original skb and those segments, and return NET_XMIT_SUCCESS whenever
> at least one segment was queued.
>
> Fixes: 8f9516daedd6 ("sched: Add enqueue/dequeue of dualpi2 qdisc")
> Cc: stable@vger.kernel.org
> Signed-off-by: Xingquan Liu <b1n@b1n.io>

Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>

cheers,
jamal

> ---
> v3:
> - Move the UDP GSO sender into tdc_gso.py.
>
> v2:
> - Change patch commit message.
> - Add tdc test.
>
>  net/sched/sch_dualpi2.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
>
> diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
> index d7c3254ef800..5434df6ca8ef 100644
> --- a/net/sched/sch_dualpi2.c
> +++ b/net/sched/sch_dualpi2.c
> @@ -461,7 +461,7 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>                 if (IS_ERR_OR_NULL(nskb))
>                         return qdisc_drop(skb, sch, to_free);
>
> -               cnt = 1;
> +               cnt = 0;
>                 byte_len = 0;
>                 orig_len = qdisc_pkt_len(skb);
>                 skb_list_walk_safe(nskb, nskb, next) {
> @@ -488,16 +488,15 @@ static int dualpi2_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>                                 byte_len += nskb->len;
>                         }
>                 }
> -               if (cnt > 1) {
> +               if (cnt > 0) {
>                         /* The caller will add the original skb stats to its
>                          * backlog, compensate this if any nskb is enqueued.
>                          */
> -                       --cnt;
> -                       byte_len -= orig_len;
> +                       qdisc_tree_reduce_backlog(sch, 1 - cnt,
> +                                                 orig_len - byte_len);
>                 }
> -               qdisc_tree_reduce_backlog(sch, -cnt, -byte_len);
>                 consume_skb(skb);
> -               return err;
> +               return cnt > 0 ? NET_XMIT_SUCCESS : err;
>         }
>         return dualpi2_enqueue_skb(skb, sch, to_free);
>  }
>
> base-commit: 96e7f9122aae0ed000ee321f324b812a447906d9
> --
> Xingquan Liu
>

^ permalink raw reply
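
The accounting described in the quoted commit message can be checked with a
small user-space model. This is only a sketch: `struct toy_backlog`,
`toy_reduce_backlog()` and `toy_enqueue_gso()` are hypothetical stand-ins,
not kernel APIs, and the model assumes the parent adds the original skb to
its counters exactly once when the child reports success.

```c
#include <assert.h>

/* Hypothetical stand-in for a parent qdisc's counters. */
struct toy_backlog {
	int qlen;	/* packets the parent believes are queued */
	int bytes;	/* bytes the parent believes are queued */
};

/* Models the effect of qdisc_tree_reduce_backlog() on one parent:
 * subtract n packets and len bytes (negative values add).
 */
static void toy_reduce_backlog(struct toy_backlog *parent, int n, int len)
{
	parent->qlen -= n;
	parent->bytes -= len;
}

/* Models the patched GSO path: cnt segments totalling byte_len bytes were
 * queued out of an original skb of orig_len bytes.
 * Returns 1 on success, 0 when no segment was queued.
 */
static int toy_enqueue_gso(struct toy_backlog *parent, int cnt,
			   int orig_len, int byte_len)
{
	assert(cnt >= 0);
	if (cnt == 0)
		return 0;	/* failure: the parent accounts for nothing */

	/* Propagate the difference between the original skb and its segments. */
	toy_reduce_backlog(parent, 1 - cnt, orig_len - byte_len);

	/* Assumed parent behaviour: on success it adds the original skb once. */
	parent->qlen += 1;
	parent->bytes += orig_len;
	return 1;
}
```

With three queued segments the two adjustments combine to exactly three
packets in the parent, which is the invariant the patch restores.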

* Re: [PATCH bpf v2] bpf, sockmap: disallow update and delete from tc, xdp and flow_dissector
From: Jiayuan Chen @ 2026-06-20 13:24 UTC (permalink / raw)
  To: Sechang Lim, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	John Fastabend, Eduard Zingerman, Kumar Kartikeya Dwivedi,
	David S . Miller, Jakub Kicinski, Jesper Dangaard Brouer
  Cc: Martin KaFai Lau, Song Liu, Yonghong Song, Jiri Olsa,
	Stanislav Fomichev, Lorenz Bauer, bpf, linux-kernel, netdev
In-Reply-To: <20260620034632.2308-1-rhkrqnwk98@gmail.com>


On 6/20/26 11:46 AM, Sechang Lim wrote:
> sock_map_update_common() and __sock_map_delete() hold stab->lock and call
> sock_map_unref() -> sock_map_del_link(), which takes sk_callback_lock for
> write. That gives the order stab->lock -> sk_callback_lock.
>
> The reverse order comes from the SK_SKB stream parser.
> sk_psock_strp_data_ready() holds sk_callback_lock for read, and after the
> verdict tcp_bpf_strp_read_sock() acks the consumed data inline via
> __tcp_cleanup_rbuf(). The ACK goes out egress, where a sched_cls program
> deletes from the sockmap and takes stab->lock:
>
>    WARNING: possible circular locking dependency detected
>    7.1.0-rc6 Not tainted
>    ------------------------------------------------------
>    syz.9.8824 is trying to acquire lock:
>    (&stab->lock){+.-.}-{3:3}, at: __sock_map_delete net/core/sock_map.c:421
>    but task is already holding lock:
>    (clock-AF_INET){++.-}-{3:3}, at: sk_psock_strp_data_ready net/core/skmsg.c:1173
>
>    -> #1 (clock-AF_INET){++.-}-{3:3}:
>           _raw_write_lock_bh
>           sock_map_del_link net/core/sock_map.c:167
>           sock_map_unref net/core/sock_map.c:184
>           sock_map_update_common net/core/sock_map.c:509
>           sock_map_update_elem_sys net/core/sock_map.c:588
>           map_update_elem kernel/bpf/syscall.c:1805
>
>    -> #0 (&stab->lock){+.-.}-{3:3}:
>           _raw_spin_lock_bh
>           __sock_map_delete net/core/sock_map.c:421
>           sock_map_delete_elem net/core/sock_map.c:452
>           bpf_prog_06044d24140080b6
>           tcx_run net/core/dev.c:4451
>           sch_handle_egress net/core/dev.c:4541
>           __dev_queue_xmit net/core/dev.c:4808
>           ...
>           tcp_bpf_strp_read_sock net/ipv4/tcp_bpf.c:701
>           strp_data_ready net/strparser/strparser.c:402
>           sk_psock_strp_data_ready net/core/skmsg.c:1174
>           tcp_data_queue net/ipv4/tcp_input.c:5661
>
>    Possible unsafe locking scenario:
>
>           CPU0                    CPU1
>           ----                    ----
>      rlock(clock-AF_INET);
>                                   lock(&stab->lock);
>                                   lock(clock-AF_INET);
>      lock(&stab->lock);
>
>     *** DEADLOCK ***
>
> A tc, xdp or flow_dissector program has no reason to update or delete a
> sockmap, and redirect does not go through here. Drop them from
> may_update_sockmap() so the verifier rejects it. It also closes the
> matching sockhash inversion.
>
> Fixes: 0126240f448d ("bpf: sockmap: Allow update from BPF")
> Suggested-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Sechang Lim <rhkrqnwk98@gmail.com>
> ---
> v2:
>   - reject sockmap update/delete from tc, xdp and flow_dissector (John
>     Fastabend)
>   - fix the changelog (Jiayuan Chen)
>
> v1:
>   - https://lore.kernel.org/all/20260616091153.2966617-1-rhkrqnwk98@gmail.com/
>
>   kernel/bpf/verifier.c | 4 ----
>   1 file changed, 4 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 7fb88e1cd7c4..94d225521b5a 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -8766,11 +8766,7 @@ static bool may_update_sockmap(struct bpf_verifier_env *env, int func_id)
>   			return true;
>   		break;
>   	case BPF_PROG_TYPE_SOCKET_FILTER:
> -	case BPF_PROG_TYPE_SCHED_CLS:
> -	case BPF_PROG_TYPE_SCHED_ACT:
> -	case BPF_PROG_TYPE_XDP:
>   	case BPF_PROG_TYPE_SK_REUSEPORT:
> -	case BPF_PROG_TYPE_FLOW_DISSECTOR:
>   	case BPF_PROG_TYPE_SK_LOOKUP:
>   		return true;
>   	default:

CI failed.

https://github.com/kernel-patches/bpf/actions/runs/27859622337/job/82454035306

Please drop or change such trigger.


Also, please drop the Fixes tag and target bpf-next, for the same reason
given in your other thread.


Nit:

You can also fork https://github.com/kernel-patches/bpf/ and open a pull
request against it to run the full test suite and make sure all tests
pass before you send the patch.


^ permalink raw reply
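
The effect of the quoted hunk can be sketched as a stand-alone switch. The
enum below is a hypothetical stand-in for the BPF_PROG_TYPE_* values named
in the diff, and the function models only the case group visible in that
hunk; program types handled earlier in the real may_update_sockmap() are
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the program types named in the hunk. */
enum toy_prog_type {
	TOY_SOCKET_FILTER,
	TOY_SCHED_CLS,
	TOY_SCHED_ACT,
	TOY_XDP,
	TOY_SK_REUSEPORT,
	TOY_FLOW_DISSECTOR,
	TOY_SK_LOOKUP,
};

/* Models the case group after the patch: tc (cls/act), xdp and
 * flow_dissector fall through to the default and are rejected.
 */
static bool toy_may_update_sockmap(enum toy_prog_type type)
{
	switch (type) {
	case TOY_SOCKET_FILTER:
	case TOY_SK_REUSEPORT:
	case TOY_SK_LOOKUP:
		return true;
	default:
		return false;
	}
}
```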

* [PATCH net v2] net: wwan: iosm: bound device offsets in the MUX downlink decoder
From: Maoyi Xie @ 2026-06-20 13:13 UTC (permalink / raw)
  To: Loic Poulain, Sergey Ryazanov, Johannes Berg
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, netdev, linux-kernel, stable

mux_dl_adb_decode() walks a chain of aggregated datagram tables using
offsets and lengths taken from the modem. first_table_index,
next_table_index, table_length, datagram_index and datagram_length are
all device supplied le values. Only first_table_index was checked, and
only for being non zero. The decoder then formed adth = block +
adth_index and read the table header and the datagram entries with no
bound against the received skb. A modem that reports an index or a
length past the downlink buffer makes the decoder read out of bounds.

The buffer is IPC_MEM_MAX_DL_MUX_LITE_BUF_SIZE and skb->len is at most
that, so skb->len is the real limit, but none of these in band offsets
were checked against it.

Validate every device offset and length against skb->len before use.
The block header must fit. Each table header, on entry and after every
next_table_index, must lie inside the skb. The datagram table must fit.
Each datagram index and length must stay inside the skb. The header
padding must not exceed the datagram length so the receive length does
not wrap.

This was reproduced under KASAN as a slab out of bounds read on a normal
downlink receive once the iosm net device is up.

Fixes: 1f52d7b62285 ("net: wwan: iosm: Enable M.2 7360 WWAN card support")
Suggested-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
---
Changes in v2:
- mux_dl_process_dg now uses intermediate native endian locals dg_index
  and dg_len so the bound checks read cleaner and avoid the repeated
  le32_to_cpu conversions, per Loic Poulain's review. No functional
  change.

Link to v1: https://lore.kernel.org/all/178185979029.4044562.9993615975949055530@maoyixie.com/

 drivers/net/wwan/iosm/iosm_ipc_mux_codec.c | 33 ++++++++++++++++------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c b/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
index bff46f7ca59f..ff9a4bc52f29 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_mux_codec.c
@@ -553,19 +553,21 @@ static int mux_dl_process_dg(struct iosm_mux *ipc_mux, struct mux_adbh *adbh,
 	u32 packet_offset, i, rc, dg_len;
 
 	for (i = 0; i < nr_of_dg; i++, dg++) {
-		if (le32_to_cpu(dg->datagram_index)
-				< sizeof(struct mux_adbh))
+		u32 dg_index = le32_to_cpu(dg->datagram_index);
+
+		dg_len = le16_to_cpu(dg->datagram_length);
+
+		if (dg_index < sizeof(struct mux_adbh))
 			goto dg_error;
 
-		/* Is the packet inside of the ADB */
-		if (le32_to_cpu(dg->datagram_index) >=
-					le32_to_cpu(adbh->block_length)) {
+		/* Is the packet inside of the ADB and the received skb ? */
+		if (dg_index >= le32_to_cpu(adbh->block_length) ||
+		    dg_index >= skb->len ||
+		    dg_len > skb->len - dg_index ||
+		    dl_head_pad_len >= dg_len) {
 			goto dg_error;
 		} else {
-			packet_offset =
-				le32_to_cpu(dg->datagram_index) +
-				dl_head_pad_len;
-			dg_len = le16_to_cpu(dg->datagram_length);
+			packet_offset = dg_index + dl_head_pad_len;
 			/* Pass the packet to the netif layer. */
 			rc = ipc_mux_net_receive(ipc_mux, if_id, ipc_mux->wwan,
 						 packet_offset,
@@ -595,6 +597,10 @@ static void mux_dl_adb_decode(struct iosm_mux *ipc_mux,
 	block = skb->data;
 	adbh = (struct mux_adbh *)block;
 
+	/* The block header itself must fit in the received skb. */
+	if (skb->len < sizeof(struct mux_adbh))
+		goto adb_decode_err;
+
 	/* Process the aggregated datagram tables. */
 	adth_index = le32_to_cpu(adbh->first_table_index);
 
@@ -606,6 +612,11 @@ static void mux_dl_adb_decode(struct iosm_mux *ipc_mux,
 
 	/* Loop through mixed session tables. */
 	while (adth_index) {
+		/* The table header must lie within the received skb. */
+		if (adth_index < sizeof(struct mux_adbh) ||
+		    adth_index > skb->len - sizeof(struct mux_adth))
+			goto adb_decode_err;
+
 		/* Get the reference to the table header. */
 		adth = (struct mux_adth *)(block + adth_index);
 
@@ -629,6 +640,10 @@ static void mux_dl_adb_decode(struct iosm_mux *ipc_mux,
 		if (le16_to_cpu(adth->table_length) < sizeof(struct mux_adth))
 			goto adb_decode_err;
 
+		/* The whole datagram table must fit in the received skb. */
+		if (le16_to_cpu(adth->table_length) > skb->len - adth_index)
+			goto adb_decode_err;
+
 		/* Calculate the number of datagrams. */
 		nr_of_dg = (le16_to_cpu(adth->table_length) -
 					sizeof(struct mux_adth)) /
-- 
2.34.1


^ permalink raw reply related
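
The bound checks in the patch above follow a common overflow-safe pattern:
compare the length against `buf_len - index` instead of computing
`index + length`, which could wrap. A minimal user-space sketch, with
hypothetical helper names and without the additional block_length check
the driver performs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when [index, index + length) lies inside a buffer of buf_len bytes.
 * The subtraction is safe because index < buf_len has been checked first.
 */
static bool toy_range_in_buf(uint32_t index, uint32_t length, uint32_t buf_len)
{
	if (index >= buf_len)
		return false;
	return length <= buf_len - index;
}

/* Mirrors the datagram checks: the datagram must start at or after the
 * block header, stay inside the buffer, and be longer than the head
 * padding so that the receive length cannot wrap.
 */
static bool toy_datagram_ok(uint32_t dg_index, uint32_t dg_len,
			    uint32_t hdr_size, uint32_t head_pad,
			    uint32_t buf_len)
{
	if (dg_index < hdr_size)
		return false;
	if (!toy_range_in_buf(dg_index, dg_len, buf_len))
		return false;
	return head_pad < dg_len;
}
```

A device-supplied index close to UINT32_MAX is rejected by the first
comparison, so the subtraction never sees it.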

* [PATCH net] net: phylink: print correct c45 phy id when missing PHY driver
From: Aleksander Jan Bajkowski @ 2026-06-20 13:11 UTC (permalink / raw)
  To: linux, andrew, hkallweit1, davem, edumazet, kuba, pabeni,
	rmk+kernel, vladimir.oltean, netdev, linux-kernel
  Cc: Aleksander Jan Bajkowski

If no PHY driver is found, `phy_id` is returned. `phy_id` holds the c22 ID.
Modules with a rollball bridge support only c45 transfers. The c45 IDs are
stored in the `c45_ids` structure. In the current code these modules report
an ID 0x00000000. This may lead users to mistakenly conclude that the
rollball bridge isn't properly implemented in their SFP module. This patch
fixes the wrong IDs for c45 modules when a driver cannot be found.

Tested on Fiberstore SFP-GB-BE-T (C22) and ONTi ONT-C1TE-R05 (Rollball).

Before:
[ 2440.373985] mtk_soc_eth 15100000.ethernet sfp-lan: PHY i2c:sfp2:11 (id 0x00000000) has no driver loaded
[ 2440.383385] mtk_soc_eth 15100000.ethernet sfp-lan: Drivers which handle known common cases: CONFIG_BCM84881_PHY, CONFIG_MARVELL_PHY
[ 2440.395274] sfp sfp2: sfp_add_phy failed: -EINVAL

After:
[   82.573700] mtk_soc_eth 15100000.ethernet sfp-lan: PHY i2c:sfp2:11 (id 0x001cc898) has no driver loaded
[   82.583098] mtk_soc_eth 15100000.ethernet sfp-lan: Drivers which handle known common cases: CONFIG_BCM84881_PHY, CONFIG_MARVELL_PHY
[   82.594996] sfp sfp2: sfp_add_phy failed: -EINVAL

Fixes: ffcbfb5f9779 ("net: phylink: improve phylink_sfp_config_phy() error message with missing PHY driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
---
 drivers/net/phy/phylink.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 087ac63f9193..7d7595158bf9 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -3917,13 +3917,30 @@ static void phylink_sfp_link_up(void *upstream)
 	phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_LINK);
 }
 
+static u32 phylink_get_phy_id(struct phy_device *phy)
+{
+	if (phy->is_c45) {
+		const int num_ids = ARRAY_SIZE(phy->c45_ids.device_ids);
+		int i;
+
+		for (i = 1; i < num_ids; i++) {
+			if (phy->c45_ids.mmds_present & BIT(i))
+				return (phy->c45_ids.device_ids[i]);
+		}
+
+		return 0;
+	} else {
+		return phy->phy_id;
+	}
+}
+
 static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
 {
 	struct phylink *pl = upstream;
 
 	if (!phy->drv) {
-		phylink_err(pl, "PHY %s (id 0x%.8lx) has no driver loaded\n",
-			    phydev_name(phy), (unsigned long)phy->phy_id);
+		phylink_err(pl, "PHY %s (id 0x%.8x) has no driver loaded\n",
+			    phydev_name(phy), phylink_get_phy_id(phy));
 		phylink_err(pl, "Drivers which handle known common cases: CONFIG_BCM84881_PHY, CONFIG_MARVELL_PHY\n");
 		return -EINVAL;
 	}
-- 
2.53.0


^ permalink raw reply related
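
The ID selection added by the patch above can be modelled outside the
kernel. `struct toy_phy` and `TOY_NUM_MMDS` are hypothetical stand-ins
(the array size is an assumption of this sketch, not taken from the
kernel headers); only the selection logic follows the diff.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NUM_MMDS 32	/* assumed array size for this model */

/* Hypothetical stand-in for the fields the helper reads. */
struct toy_phy {
	int is_c45;
	uint32_t phy_id;			/* c22 ID */
	uint32_t mmds_present;			/* bitmask of present MMDs */
	uint32_t device_ids[TOY_NUM_MMDS];	/* per-MMD c45 IDs */
};

/* Returns the c22 ID for c22 PHYs; for c45 PHYs, the ID of the first
 * present MMD, scanning from index 1, or 0 when none is present.
 */
static uint32_t toy_get_phy_id(const struct toy_phy *phy)
{
	int i;

	if (!phy->is_c45)
		return phy->phy_id;

	for (i = 1; i < TOY_NUM_MMDS; i++) {
		if (phy->mmds_present & (UINT32_C(1) << i))
			return phy->device_ids[i];
	}

	return 0;
}
```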

* [PATCH net 1/1] net/sched: cls_api: Handle TC_ACT_CONSUMED in tcf_qevent_handle
From: Jamal Hadi Salim @ 2026-06-20 13:07 UTC (permalink / raw)
  To: netdev
  Cc: davem, edumazet, kuba, pabeni, horms, jiri, victor,
	zdi-disclosures, security, Jamal Hadi Salim, Zero Day Initiative

tcf_classify() can return TC_ACT_CONSUMED while the skb is held by the
defragmentation engine (e.g. act_ct on out-of-order fragments). When
that happens the skb is no longer owned by the caller and must not be
touched again.

tcf_qevent_handle() did not handle TC_ACT_CONSUMED: it fell through the
switch and returned the skb to the caller as if classification had
passed. The only qdisc that wires up qevents today is RED, via three call
sites (qe_mark on RED_PROB_MARK/HARD_MARK, qe_early_drop on
congestion_drop). In that case red_enqueue() kept operating on an skb it
no longer owned (enqueueing it, dropping it, or updating statistics),
resulting in a UAF.

  tc qdisc add dev eth0 root handle 1: red ... qevent early_drop block 10
  tc filter add block 10 ... action ct

  (with ct defrag enabled and traffic that produces out-of-order
  fragments, e.g. a fragmented UDP stream)

Handle TC_ACT_CONSUMED in tcf_qevent_handle() the same way the ingress
and egress fast paths do: treat it as stolen and return NULL without
touching the skb. Unlike the TC_ACT_STOLEN case, the skb must not be
dropped/freed here, as it is no longer owned by us.

Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags")
Reported-by: Zero Day Initiative <zdi-disclosures@trendmicro.com>
Tested-by: Victor Nogueira <victor@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/cls_api.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 20f7f9ee0b353..3e67600a4a1a1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -4049,6 +4049,9 @@ struct sk_buff *tcf_qevent_handle(struct tcf_qevent *qe, struct Qdisc *sch, stru
 		skb_do_redirect(skb);
 		*ret = __NET_XMIT_STOLEN;
 		return NULL;
+	case TC_ACT_CONSUMED:
+		*ret = __NET_XMIT_STOLEN;
+		return NULL;
 	}

 	return skb;
-- 
2.34.1

^ permalink raw reply related
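
The ownership rule from the commit message above can be illustrated with a
toy handler. Every name here is a hypothetical stand-in and the enum
values are not the kernel's; the sketch only shows that the stolen case
drops the skb while the consumed case reports the same result without
touching it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are not the kernel's. */
enum toy_verdict { TOY_ACT_OK, TOY_ACT_STOLEN, TOY_ACT_CONSUMED };
enum toy_xmit { TOY_XMIT_NONE, TOY_XMIT_STOLEN };

struct toy_skb {
	int touched;	/* incremented whenever the handler drops the skb */
};

/* Returns the skb when the caller may keep using it, NULL otherwise. */
static struct toy_skb *toy_qevent_handle(struct toy_skb *skb,
					 enum toy_verdict verdict,
					 enum toy_xmit *ret)
{
	switch (verdict) {
	case TOY_ACT_STOLEN:
		skb->touched++;	/* the handler still owns the skb and drops it */
		*ret = TOY_XMIT_STOLEN;
		return NULL;
	case TOY_ACT_CONSUMED:
		/* The skb belongs to someone else: report it, never touch it. */
		*ret = TOY_XMIT_STOLEN;
		return NULL;
	default:
		return skb;
	}
}
```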

* [PATCH next] net: phylink: print correct c45 phy id when missing PHY driver
From: Aleksander Jan Bajkowski @ 2026-06-20 13:00 UTC (permalink / raw)
  To: linux, andrew, hkallweit1, davem, edumazet, kuba, pabeni,
	rmk+kernel, vladimir.oltean, netdev, linux-kernel
  Cc: Aleksander Jan Bajkowski

If no PHY driver is found, `phy_id` is returned. `phy_id` holds the c22 ID.
Modules with a rollball bridge support only c45 transfers. The c45 IDs are
stored in the `c45_ids` structure. In the current code these modules report
an ID 0x00000000. This may lead users to mistakenly conclude that the
rollball bridge isn't properly implemented in their SFP module. This patch
fixes the wrong IDs for c45 modules when a driver cannot be found.

Tested on Fiberstore SFP-GB-BE-T (C22) and ONTi ONT-C1TE-R05 (Rollball).

Before:
[ 2440.373985] mtk_soc_eth 15100000.ethernet sfp-lan: PHY i2c:sfp2:11 (id 0x00000000) has no driver loaded
[ 2440.383385] mtk_soc_eth 15100000.ethernet sfp-lan: Drivers which handle known common cases: CONFIG_BCM84881_PHY, CONFIG_MARVELL_PHY
[ 2440.395274] sfp sfp2: sfp_add_phy failed: -EINVAL

After:
[   82.573700] mtk_soc_eth 15100000.ethernet sfp-lan: PHY i2c:sfp2:11 (id 0x001cc898) has no driver loaded
[   82.583098] mtk_soc_eth 15100000.ethernet sfp-lan: Drivers which handle known common cases: CONFIG_BCM84881_PHY, CONFIG_MARVELL_PHY
[   82.594996] sfp sfp2: sfp_add_phy failed: -EINVAL

Fixes: ffcbfb5f9779 ("net: phylink: improve phylink_sfp_config_phy() error message with missing PHY driver")
Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl>
---
 drivers/net/phy/phylink.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 087ac63f9193..7d7595158bf9 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -3917,13 +3917,30 @@ static void phylink_sfp_link_up(void *upstream)
 	phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_LINK);
 }
 
+static u32 phylink_get_phy_id(struct phy_device *phy)
+{
+	if (phy->is_c45) {
+		const int num_ids = ARRAY_SIZE(phy->c45_ids.device_ids);
+		int i;
+
+		for (i = 1; i < num_ids; i++) {
+			if (phy->c45_ids.mmds_present & BIT(i))
+				return (phy->c45_ids.device_ids[i]);
+		}
+
+		return 0;
+	} else {
+		return phy->phy_id;
+	}
+}
+
 static int phylink_sfp_connect_phy(void *upstream, struct phy_device *phy)
 {
 	struct phylink *pl = upstream;
 
 	if (!phy->drv) {
-		phylink_err(pl, "PHY %s (id 0x%.8lx) has no driver loaded\n",
-			    phydev_name(phy), (unsigned long)phy->phy_id);
+		phylink_err(pl, "PHY %s (id 0x%.8x) has no driver loaded\n",
+			    phydev_name(phy), phylink_get_phy_id(phy));
 		phylink_err(pl, "Drivers which handle known common cases: CONFIG_BCM84881_PHY, CONFIG_MARVELL_PHY\n");
 		return -EINVAL;
 	}
-- 
2.53.0


^ permalink raw reply related

* [PATCH 2/2] selftests/bpf: validate rx_queue_index in xdp_metadata
From: Siddharth C @ 2026-06-20 12:13 UTC (permalink / raw)
  To: ast, kuba, hawk, andrii, netdev, bpf, linux-kernel,
	linux-kselftest
  Cc: Siddharth_Cibi
In-Reply-To: <20260620121321.45227-1-siddharthcibi@icloud.com>

From: Siddharth_Cibi <siddharthcibi@icloud.com>

Extend xdp_metadata selftest coverage to validate that
ctx->rx_queue_index is preserved and observable after XDP redirect
execution.

Capture rx_queue_index in metadata and assert that it matches the
expected queue during packet verification.

Signed-off-by: Siddharth_Cibi <siddharthcibi@icloud.com>
---
 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c | 3 ++-
 tools/testing/selftests/bpf/progs/xdp_metadata.c      | 2 +-
 tools/testing/selftests/bpf/xdp_metadata.h            | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
index 5c31054ad4a4..f8cabbbe7bb7 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_metadata.c
@@ -309,7 +309,8 @@ static int verify_xsk_metadata(struct xsk *xsk, bool sent_from_af_xdp)
 
 	if (!ASSERT_NEQ(meta->rx_hash, 0, "rx_hash"))
 		return -1;
-
+	if (!ASSERT_EQ(meta->rx_queue_index, QUEUE_ID, "rx_queue_index"))
+		return -1;
 	if (!sent_from_af_xdp) {
 		if (!ASSERT_NEQ(meta->rx_hash_type & XDP_RSS_TYPE_L4, 0, "rx_hash_type"))
 			return -1;
diff --git a/tools/testing/selftests/bpf/progs/xdp_metadata.c b/tools/testing/selftests/bpf/progs/xdp_metadata.c
index 09bb8a038d52..62ae83860d7f 100644
--- a/tools/testing/selftests/bpf/progs/xdp_metadata.c
+++ b/tools/testing/selftests/bpf/progs/xdp_metadata.c
@@ -98,7 +98,7 @@ int rx(struct xdp_md *ctx)
 	bpf_xdp_metadata_rx_hash(ctx, &meta->rx_hash, &meta->rx_hash_type);
 	bpf_xdp_metadata_rx_vlan_tag(ctx, &meta->rx_vlan_proto,
 				     &meta->rx_vlan_tci);
-
+	meta->rx_queue_index = ctx->rx_queue_index;
 	return bpf_redirect_map(&xsk, ctx->rx_queue_index, XDP_PASS);
 }
 
diff --git a/tools/testing/selftests/bpf/xdp_metadata.h b/tools/testing/selftests/bpf/xdp_metadata.h
index 87318ad1117a..1f0ae4c00091 100644
--- a/tools/testing/selftests/bpf/xdp_metadata.h
+++ b/tools/testing/selftests/bpf/xdp_metadata.h
@@ -49,4 +49,5 @@ struct xdp_meta {
 		__s32 rx_vlan_tag_err;
 	};
 	enum xdp_meta_field hint_valid;
+	__u32 rx_queue_index;
 };
-- 
2.53.0


^ permalink raw reply related

* [PATCH 1/2] bpf: preserve rx_queue_index across XDP redirects
From: Siddharth C @ 2026-06-20 12:13 UTC (permalink / raw)
  To: ast, kuba, hawk, andrii, netdev, bpf, linux-kernel,
	linux-kselftest
  Cc: Siddharth C
In-Reply-To: <20260620121321.45227-1-siddharthcibi@icloud.com>

Store rx_queue_index in struct xdp_frame during xdp_buff to
xdp_frame conversion and restore it when rebuilding xdp_rxq_info
for cpumap and devmap execution paths.

This preserves ingress RX queue information for XDP programs
executed after redirect, allowing access to the original
rx_queue_index instead of losing queue context.

Also propagate rx_queue_index for zero-copy XDP frame conversion.

Signed-off-by: Siddharth_Cibi <siddharthcibi@icloud.com>
---
 include/net/xdp.h   | 2 ++
 kernel/bpf/cpumap.c | 2 +-
 kernel/bpf/devmap.c | 5 ++++-
 net/core/xdp.c      | 1 +
 4 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index aa742f413c35..90318b2b76dc 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -301,6 +301,7 @@ struct xdp_frame {
 	 */
 	enum xdp_mem_type mem_type:32;
 	struct net_device *dev_rx; /* used by cpumap */
+	u32 rx_queue_index;
 	u32 frame_sz;
 	u32 flags; /* supported values defined in xdp_buff_flags */
 };
@@ -441,6 +442,7 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
 
 	/* rxq only valid until napi_schedule ends, convert to xdp_mem_type */
 	xdp_frame->mem_type = xdp->rxq->mem.type;
+	xdp_frame->rx_queue_index = xdp->rxq->queue_index;
 
 	return xdp_frame;
 }
diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 5e59ab896f05..8f2d7013620f 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -197,7 +197,7 @@ static int cpu_map_bpf_prog_run_xdp(struct bpf_cpu_map_entry *rcpu,
 
 		rxq.dev = xdpf->dev_rx;
 		rxq.mem.type = xdpf->mem_type;
-		/* TODO: report queue_index to xdp_rxq_info */
+		rxq.queue_index = xdpf->rx_queue_index;
 
 		xdp_convert_frame_to_buff(xdpf, &xdp);
 
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index dc7b859e8bbf..f419fa0e53e5 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -339,7 +339,7 @@ static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
 				struct net_device *rx_dev)
 {
 	struct xdp_txq_info txq = { .dev = tx_dev };
-	struct xdp_rxq_info rxq = { .dev = rx_dev };
+	struct xdp_rxq_info rxq = { };
 	struct xdp_buff xdp;
 	int i, nframes = 0;
 
@@ -349,6 +349,9 @@ static int dev_map_bpf_prog_run(struct bpf_prog *xdp_prog,
 		int err;
 
 		xdp_convert_frame_to_buff(xdpf, &xdp);
+		rxq.dev = rx_dev;
+		rxq.mem.type = xdpf->mem_type;
+		rxq.queue_index = xdpf->rx_queue_index;
 		xdp.txq = &txq;
 		xdp.rxq = &rxq;
 
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 9890a30584ba..9691d8dfadf3 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -606,6 +606,7 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
 	xdpf->metasize = metasize;
 	xdpf->frame_sz = PAGE_SIZE;
 	xdpf->mem_type = MEM_TYPE_PAGE_ORDER0;
+	xdpf->rx_queue_index = xdp->rxq->queue_index;
 
 	xsk_buff_free(xdp);
 	return xdpf;
-- 
2.53.0


^ permalink raw reply related
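
The round trip the patch above relies on can be sketched with toy
structures. All types and helpers below are hypothetical stand-ins for
xdp_buff, xdp_frame and xdp_rxq_info; the point is that the queue index is
copied by value into the frame, because the rxq pointer is not valid once
the conversion is done.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for xdp_rxq_info, xdp_buff and xdp_frame. */
struct toy_rxq {
	uint32_t queue_index;
};

struct toy_buff {
	const struct toy_rxq *rxq;	/* only valid during the RX poll */
};

struct toy_frame {
	uint32_t rx_queue_index;	/* copy that survives the redirect */
};

/* buff -> frame: store the index by value, not the rxq pointer. */
static void toy_buff_to_frame(const struct toy_buff *xdp,
			      struct toy_frame *frame)
{
	frame->rx_queue_index = xdp->rxq->queue_index;
}

/* frame -> rebuilt rxq, as done before running a program after redirect. */
static void toy_frame_to_rxq(const struct toy_frame *frame,
			     struct toy_rxq *rxq)
{
	rxq->queue_index = frame->rx_queue_index;
}
```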

* (no subject)
From: Siddharth C @ 2026-06-20 12:13 UTC (permalink / raw)
  To: ast, kuba, hawk, andrii, netdev, bpf, linux-kernel,
	linux-kselftest
  Cc: Siddharth C

Subject: [PATCH 0/2] bpf: preserve rx_queue_index across XDP redirects

XDP programs executed after redirect through cpumap and devmap
currently lose ingress RX queue information because rx_queue_index
is not preserved across xdp_buff to xdp_frame conversion.

Preserve rx_queue_index in struct xdp_frame and restore it when
rebuilding xdp_rxq_info for redirected execution paths.

Add a selftest validating that ctx->rx_queue_index remains available
through redirected execution.

Testing:
* Built modified kernel objects
* Ran tools/testing/selftests/bpf/test_progs -t xdp_metadata -v
* Verified xdp_metadata passes
* Added explicit rx_queue_index assertion


Siddharth C (1):
  bpf: preserve rx_queue_index across XDP redirects

Siddharth_Cibi (1):
  selftests/bpf: validate rx_queue_index in xdp_metadata

 include/net/xdp.h                                     | 2 ++
 kernel/bpf/cpumap.c                                   | 2 +-
 kernel/bpf/devmap.c                                   | 5 ++++-
 net/core/xdp.c                                        | 1 +
 tools/testing/selftests/bpf/prog_tests/xdp_metadata.c | 3 ++-
 tools/testing/selftests/bpf/progs/xdp_metadata.c      | 2 +-
 tools/testing/selftests/bpf/xdp_metadata.h            | 1 +
 7 files changed, 12 insertions(+), 4 deletions(-)

-- 
2.53.0


^ permalink raw reply

* Re: [PATCH 5.10] netdevsim: Fix memory leak of nsim_dev->fa_cookie
From: Sasha Levin @ 2026-06-20 11:55 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Sasha Levin, Mikhail Dmitrichenko, Jakub Kicinski,
	David S. Miller, Jiri Pirko, Ido Schimmel, netdev, linux-kernel,
	Andrew Lunn, Eric Dumazet, Paolo Abeni, Jiri Pirko, lvc-project,
	Wang Yufen
In-Reply-To: <20260619091507.95142-1-mdmitrichenko@astralinux.ru>

> [PATCH 5.10] netdevsim: Fix memory leak of nsim_dev->fa_cookie

Queued for 5.10, thanks.

-- 
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH 5.10] net: 9p: fix refcount leak in p9_read_work() error handling
From: Sasha Levin @ 2026-06-20 11:54 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman
  Cc: Sasha Levin, Alexander Martyniuk, Eric Van Hensbergen,
	Latchesar Ionkov, Dominique Martinet, David S. Miller,
	Jakub Kicinski, Tomas Bortoli, v9fs-developer, netdev,
	linux-kernel, Eric Van Hensbergen, Christian Schoenebeck, v9fs,
	lvc-project, Hangyu Hua
In-Reply-To: <20260618151940.76321-1-alexevgmart@gmail.com>

> [PATCH 5.10] net: 9p: fix refcount leak in p9_read_work() error handling

Queued for 5.10, thanks.

-- 
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v6.1 0/3] Fix CVE-2026-23272
From: Sasha Levin @ 2026-06-20 11:54 UTC (permalink / raw)
  To: stable, gregkh
  Cc: Sasha Levin, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, netfilter-devel, coreteam, netdev, linux-kernel,
	ajay.kaher, alexey.makhalov, vamsi-krishna.brahmajosyula,
	yin.ding, tapas.kundu, Shivani Agarwal
In-Reply-To: <20260619092850.1274076-1-shivani.agarwal@broadcom.com>

> [PATCH v6.1 0/3] Fix CVE-2026-23272

Queued the series for 6.1, thanks.

-- 
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v6.6-v6.1] netfilter: nf_tables: always walk all pending catchall elements
From: Sasha Levin @ 2026-06-20 11:54 UTC (permalink / raw)
  To: stable, gregkh
  Cc: Sasha Levin, pablo, fw, phil, davem, edumazet, kuba, pabeni,
	horms, netfilter-devel, coreteam, netdev, linux-kernel,
	ajay.kaher, alexey.makhalov, vamsi-krishna.brahmajosyula,
	yin.ding, tapas.kundu, Yiming Qian, Shivani Agarwal
In-Reply-To: <20260618083438.1269242-1-shivani.agarwal@broadcom.com>

> [PATCH v6.6-v6.1] netfilter: nf_tables: always walk all pending catchall
> elements

This one didn't apply to either 6.6.y or 6.1.y.

-- 
Thanks,
Sasha

^ permalink raw reply

* Re: [PATCH v5.10 0/2] Fix CVE-2026-23204
From: Sasha Levin @ 2026-06-20 11:54 UTC (permalink / raw)
  To: stable, gregkh
  Cc: Sasha Levin, davem, edumazet, kuba, pabeni, horms, netdev,
	linux-kernel, xiaosuo, iri, jhs, ajay.kaher, alexey.makhalov,
	vamsi-krishna.brahmajosyula, yin.ding, tapas.kundu,
	Shivani Agarwal
In-Reply-To: <20260618080807.1269070-1-shivani.agarwal@broadcom.com>

> [PATCH v5.10 0/2] Fix CVE-2026-23204

Queued the series for 5.10, thanks.

-- 
Thanks,
Sasha

^ permalink raw reply

* [PATCH net-next v2 4/4] net: phy: own phydev->psec via PSE notifier and remove fwnode_mdio hook
From: Carlo Szelinsky @ 2026-06-20 11:24 UTC (permalink / raw)
  To: Oleksij Rempel, Kory Maincent, Andrew Lunn, Heiner Kallweit,
	Russell King, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Corey Leavitt, Jonas Jelonek, netdev, linux-kernel,
	Carlo Szelinsky
In-Reply-To: <20260620112440.1734404-1-github@szelinsky.de>

From: Corey Leavitt <corey@leavitt.info>

Transfer ownership of phydev->psec from fwnode_mdio to the phy
subsystem itself. The phy subsystem now subscribes to the pse-pd
notifier chain and manages psec attach/detach in response to PSE
controller lifecycle events, while fwnode_mdio loses its PSE awareness
entirely.

phydev->psec is attached after device_add() has made the phy visible
on mdio_bus_type, under a narrow rtnl_lock() that covers only
phy_try_attach_pse(). Ordering the attach after registration closes
the race that would otherwise leave a phy unattached: a PSE_REGISTERED
event firing during registration walks mdio_bus_type and either finds
the phy already added (and attaches it) or runs before device_add(),
in which case the post-add attach resolves it. The phydev->psec check
in phy_try_attach_pse() makes the two paths idempotent. Holding rtnl
across of_pse_control_get() is safe because pse_list_mutex is never
taken in the opposite order.

device_add() is deliberately left outside rtnl. Binding a phy that
itself provides an SFP cage reaches sfp_bus_add_upstream() through
phy_probe() -> phy_setup_ports() -> phy_sfp_probe(), and
sfp_bus_add_upstream() takes rtnl_lock(); holding rtnl across
device_add() would deadlock such phys (reported on RTL8214FC).

phy_device_register() is split into the public form, which takes the
narrow rtnl_lock() around the attach, and a phy_device_register_locked()
form for callers that already hold rtnl (the SFP module state machine
via __sfp_sm_event). This pair mirrors the register_netdevice() /
register_netdev() split convention already established in the core
networking stack. The _locked form runs device_add() under the
caller's rtnl, which is safe because a phy resident on an SFP module
does not itself provide a downstream cage, so phy_sfp_probe() is a
no-op there.

  - On PSE_REGISTERED: an rtnl-guarded bus walk retries the attach for
    every registered phy whose psec is still NULL. This is the "phy
    was enumerated before the PSE controller loaded" case, the root
    cause of the boot-time probe-retry storm on systems with a modular
    PSE controller driver.

  - On PSE_UNREGISTERED: an rtnl-guarded bus walk releases every
    phydev->psec that targets the departing controller before
    pse_release_pis() frees pcdev->pi. Without this, a phy still
    holding a pse_control reference would cause a use-after-free in
    __pse_control_release()'s pcdev->pi[psec->id] access, and the PSE
    driver module could not finish unloading while any phy still held a
    reference.
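
The PSE_UNREGISTERED walk described above can be sketched in userspace
as follows. The model_* types are hypothetical stand-ins, the reference
count is a plain int rather than a kref, and no locking is modelled; the
point is only that handles targeting the departing controller are
cleared and put, while handles on other controllers are left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the controller, the handle and the phy. */
struct model_pcdev {
	int refs;			/* outstanding handles */
};

struct model_psec {
	struct model_pcdev *pcdev;	/* controller this handle targets */
};

struct model_phy {
	struct model_psec *psec;	/* NULL when no handle is attached */
};

static void model_psec_put(struct model_psec *psec)
{
	psec->pcdev->refs--;
}

/* Walk every phy and release handles that target @leaving.
 * Returns the number of phys that were detached.
 */
static size_t model_detach_walk(struct model_phy *phys, size_t n,
				struct model_pcdev *leaving)
{
	size_t detached = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_psec *psec = phys[i].psec;

		if (!psec || psec->pcdev != leaving)
			continue;

		/* Clear the pointer before the put, as the patch does. */
		phys[i].psec = NULL;
		model_psec_put(psec);
		detached++;
	}

	return detached;
}
```

Once the walk returns, the departing controller has no outstanding
references in this model, which corresponds to the state the commit
message requires before pse_release_pis() runs.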

A bad `pses` binding -- an error from of_pse_control_get() other than
-ENOENT (no phandle) or -EPROBE_DEFER (controller not yet registered)
-- is reported with phydev_warn() rather than silently dropped,
preserving the diagnostic that the removed fwnode_mdio lookup used to
provide.

The final pse_control_put() of phydev->psec moves from
phy_device_remove() to phy_device_release(), so it runs only after
every reference on the device -- including the bus-iterator references
taken by bus_for_each_dev() in the notifier walk -- has been dropped.

Finally, delete fwnode_find_pse_control() and its call site in
fwnode_mdiobus_register_phy(), and drop the PSE header from
fwnode_mdio.c. The MDIO/DSA probe no longer sees any PSE-originated
-EPROBE_DEFER, so the probe-retry storm is gone and fwnode_mdio is
now PSE-agnostic.

Reported-by: Jonas Jelonek <jelonek.jonas@gmail.com>
Closes: https://lore.kernel.org/netdev/e00048dd-1ed3-40c3-9912-59bccf015ad5@gmail.com/
Signed-off-by: Corey Leavitt <corey@leavitt.info>
Co-developed-by: Carlo Szelinsky <github@szelinsky.de>
Signed-off-by: Carlo Szelinsky <github@szelinsky.de>
---
 drivers/net/mdio/fwnode_mdio.c |  34 -------
 drivers/net/phy/phy_device.c   | 168 +++++++++++++++++++++++++++++++--
 drivers/net/phy/sfp.c          |   2 +-
 drivers/net/pse-pd/pse_core.c  |  14 +++
 include/linux/phy.h            |   2 +
 include/linux/pse-pd/pse.h     |   9 ++
 6 files changed, 186 insertions(+), 43 deletions(-)

diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index ba7091518265..7bd979b59f49 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -11,33 +11,11 @@
 #include <linux/fwnode_mdio.h>
 #include <linux/of.h>
 #include <linux/phy.h>
-#include <linux/pse-pd/pse.h>
 
 MODULE_AUTHOR("Calvin Johnson <calvin.johnson@oss.nxp.com>");
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("FWNODE MDIO bus (Ethernet PHY) accessors");
 
-static struct pse_control *
-fwnode_find_pse_control(struct fwnode_handle *fwnode,
-			struct phy_device *phydev)
-{
-	struct pse_control *psec;
-	struct device_node *np;
-
-	if (!IS_ENABLED(CONFIG_PSE_CONTROLLER))
-		return NULL;
-
-	np = to_of_node(fwnode);
-	if (!np)
-		return NULL;
-
-	psec = of_pse_control_get(np, phydev);
-	if (PTR_ERR(psec) == -ENOENT)
-		return NULL;
-
-	return psec;
-}
-
 static struct mii_timestamper *
 fwnode_find_mii_timestamper(struct fwnode_handle *fwnode)
 {
@@ -118,7 +96,6 @@ int fwnode_mdiobus_register_phy(struct mii_bus *bus,
 				struct fwnode_handle *child, u32 addr)
 {
 	struct mii_timestamper *mii_ts = NULL;
-	struct pse_control *psec = NULL;
 	struct phy_device *phy;
 	bool is_c45;
 	u32 phy_id;
@@ -159,14 +136,6 @@ int fwnode_mdiobus_register_phy(struct mii_bus *bus,
 			goto clean_phy;
 	}
 
-	psec = fwnode_find_pse_control(child, phy);
-	if (IS_ERR(psec)) {
-		rc = PTR_ERR(psec);
-		goto unregister_phy;
-	}
-
-	phy->psec = psec;
-
 	/* phy->mii_ts may already be defined by the PHY driver. A
 	 * mii_timestamper probed via the device tree will still have
 	 * precedence.
@@ -176,9 +145,6 @@ int fwnode_mdiobus_register_phy(struct mii_bus *bus,
 
 	return 0;
 
-unregister_phy:
-	if (is_acpi_node(child) || is_of_node(child))
-		phy_device_remove(phy);
 clean_phy:
 	phy_device_free(phy);
 clean_mii_ts:
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 0615228459ef..f5febff4b00b 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -223,8 +223,19 @@ static void phy_mdio_device_free(struct mdio_device *mdiodev)
 
 static void phy_device_release(struct device *dev)
 {
+	struct phy_device *phydev = to_phy_device(dev);
+
+	/* bus_for_each_dev() holds get_device() across each iteration
+	 * step, deferring this release callback until any in-flight PSE
+	 * notifier walk has advanced past this phy. pse_control_put()
+	 * takes pse_list_mutex, so this path must run in sleepable
+	 * context.
+	 */
+	might_sleep();
+	pse_control_put(phydev->psec);
+
 	fwnode_handle_put(dev->fwnode);
-	kfree(to_phy_device(dev));
+	kfree(phydev);
 }
 
 static void phy_mdio_device_remove(struct mdio_device *mdiodev)
@@ -1102,11 +1113,103 @@ struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45)
 }
 EXPORT_SYMBOL(get_phy_device);
 
-/**
- * phy_device_register - Register the phy device on the MDIO bus
- * @phydev: phy_device structure to be added to the MDIO bus
+/* Best-effort attach of phydev->psec from a DT `pses = <&...>` phandle.
+ * Caller must hold rtnl. A missing phandle (-ENOENT) or a not-yet-registered
+ * controller (-EPROBE_DEFER) is silent; the notifier retries the latter at
+ * PSE_REGISTERED time. Any other error means a broken binding and is warned
+ * about, but left non-fatal so the phy still registers.
  */
-int phy_device_register(struct phy_device *phydev)
+static void phy_try_attach_pse(struct phy_device *phydev)
+{
+	struct pse_control *psec;
+	struct device_node *np;
+
+	ASSERT_RTNL();
+
+	np = phydev->mdio.dev.of_node;
+	if (!np)
+		return;
+
+	if (phydev->psec)
+		return;
+
+	psec = of_pse_control_get(np, phydev);
+	if (IS_ERR(psec)) {
+		if (PTR_ERR(psec) != -EPROBE_DEFER && PTR_ERR(psec) != -ENOENT)
+			phydev_warn(phydev, "failed to get PSE control: %pe\n",
+				    psec);
+		return;
+	}
+
+	phydev->psec = psec;
+}
+
+static int phy_pse_attach_one(struct device *dev, void *data __maybe_unused)
+{
+	ASSERT_RTNL();
+
+	if (dev->type != &mdio_bus_phy_type)
+		return 0;
+
+	phy_try_attach_pse(to_phy_device(dev));
+	return 0;
+}
+
+static int phy_pse_detach_one(struct device *dev, void *data)
+{
+	struct pse_controller_dev *pcdev = data;
+	struct phy_device *phydev;
+	struct pse_control *psec;
+
+	ASSERT_RTNL();
+
+	if (dev->type != &mdio_bus_phy_type)
+		return 0;
+
+	phydev = to_phy_device(dev);
+	psec = phydev->psec;
+	if (!psec || !pse_control_matches_pcdev(psec, pcdev))
+		return 0;
+
+	phydev->psec = NULL;
+	pse_control_put(psec);
+	return 0;
+}
+
+static int phy_pse_notifier_event(struct notifier_block *nb,
+				  unsigned long event, void *data)
+{
+	switch (event) {
+	case PSE_REGISTERED:
+		rtnl_lock();
+		bus_for_each_dev(&mdio_bus_type, NULL, NULL,
+				 phy_pse_attach_one);
+		rtnl_unlock();
+		return NOTIFY_OK;
+	case PSE_UNREGISTERED:
+		rtnl_lock();
+		bus_for_each_dev(&mdio_bus_type, NULL, data,
+				 phy_pse_detach_one);
+		rtnl_unlock();
+		return NOTIFY_OK;
+	default:
+		return NOTIFY_DONE;
+	}
+}
+
+static struct notifier_block phy_pse_notifier __read_mostly = {
+	.notifier_call = phy_pse_notifier_event,
+};
+
+/* Core registration: add the phy to the MDIO bus. Does not touch rtnl or
+ * PSE. phydev->psec is attached by the callers below, after device_add()
+ * has made the phy visible on mdio_bus_type, so that a concurrent PSE
+ * notifier walk and the attach can never leave the phy unattached. Keeping
+ * device_add() out of rtnl also avoids deadlocking when binding a phy that
+ * itself provides an SFP cage (phy_probe() -> phy_sfp_probe() ->
+ * sfp_bus_add_upstream() takes rtnl).
+ */
+static int __phy_device_register(struct phy_device *phydev)
 {
 	int err;
 
@@ -1135,10 +1238,54 @@ int phy_device_register(struct phy_device *phydev)
  out:
 	/* Assert the reset signal */
 	phy_device_reset(phydev, 1);
-
 	mdiobus_unregister_device(&phydev->mdio);
 	return err;
 }
+
+/**
+ * phy_device_register_locked - Register the phy device on the MDIO bus
+ * @phydev: phy_device structure to be added to the MDIO bus
+ *
+ * Same as phy_device_register() but caller must already hold rtnl_lock().
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int phy_device_register_locked(struct phy_device *phydev)
+{
+	int err;
+
+	ASSERT_RTNL();
+
+	err = __phy_device_register(phydev);
+	if (err)
+		return err;
+
+	phy_try_attach_pse(phydev);
+
+	return 0;
+}
+EXPORT_SYMBOL(phy_device_register_locked);
+
+/**
+ * phy_device_register - Register the phy device on the MDIO bus
+ * @phydev: phy_device structure to be added to the MDIO bus
+ *
+ * Return: 0 on success, negative error code on failure.
+ */
+int phy_device_register(struct phy_device *phydev)
+{
+	int err;
+
+	err = __phy_device_register(phydev);
+	if (err)
+		return err;
+
+	rtnl_lock();
+	phy_try_attach_pse(phydev);
+	rtnl_unlock();
+
+	return 0;
+}
 EXPORT_SYMBOL(phy_device_register);
 
 /**
@@ -1152,8 +1299,6 @@ EXPORT_SYMBOL(phy_device_register);
 void phy_device_remove(struct phy_device *phydev)
 {
 	unregister_mii_timestamper(phydev->mii_ts);
-	pse_control_put(phydev->psec);
-
 	device_del(&phydev->mdio.dev);
 
 	/* Assert the reset signal */
@@ -3981,8 +4126,14 @@ static int __init phy_init(void)
 	if (rc)
 		goto err_c45;
 
+	rc = pse_register_notifier(&phy_pse_notifier);
+	if (rc)
+		goto err_genphy;
+
 	return 0;
 
+err_genphy:
+	phy_driver_unregister(&genphy_driver);
 err_c45:
 	phy_driver_unregister(&genphy_c45_driver);
 err_ethtool_phy_ops:
@@ -3999,6 +4150,7 @@ static int __init phy_init(void)
 
 static void __exit phy_exit(void)
 {
+	pse_unregister_notifier(&phy_pse_notifier);
 	phy_driver_unregister(&genphy_c45_driver);
 	phy_driver_unregister(&genphy_driver);
 	rtnl_lock();
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c
index 03bfd8640db9..18868bdd6485 100644
--- a/drivers/net/phy/sfp.c
+++ b/drivers/net/phy/sfp.c
@@ -2083,7 +2083,7 @@ static int sfp_sm_probe_phy(struct sfp *sfp, int addr, bool is_c45)
 	/* Mark this PHY as being on a SFP module */
 	phy->is_on_sfp_module = true;
 
-	err = phy_device_register(phy);
+	err = phy_device_register_locked(phy);
 	if (err) {
 		phy_device_free(phy);
 		dev_err(sfp->dev, "phy_device_register failed: %pe\n",
diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index 37ba4ab778af..432ca2ee5402 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -2021,3 +2021,17 @@ bool pse_has_c33(struct pse_control *psec)
 	return psec->pcdev->types & ETHTOOL_PSE_C33;
 }
 EXPORT_SYMBOL_GPL(pse_has_c33);
+
+/**
+ * pse_control_matches_pcdev - Test whether a pse_control targets a controller
+ * @psec: pse_control obtained from of_pse_control_get()
+ * @pcdev: PSE controller to compare against
+ *
+ * Return: %true if @psec was obtained from @pcdev, %false otherwise.
+ */
+bool pse_control_matches_pcdev(struct pse_control *psec,
+			       struct pse_controller_dev *pcdev)
+{
+	return psec->pcdev == pcdev;
+}
+EXPORT_SYMBOL_GPL(pse_control_matches_pcdev);
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 199a7aaa341b..865b9baddb85 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -2158,6 +2158,8 @@ struct phy_device *fwnode_phy_find_device(struct fwnode_handle *phy_fwnode);
 struct fwnode_handle *fwnode_get_phy_node(const struct fwnode_handle *fwnode);
 struct phy_device *get_phy_device(struct mii_bus *bus, int addr, bool is_c45);
 int phy_device_register(struct phy_device *phy);
+/* Caller must hold rtnl_lock(); see phy_device_register() for the public form. */
+int phy_device_register_locked(struct phy_device *phy);
 void phy_device_free(struct phy_device *phydev);
 void phy_device_remove(struct phy_device *phydev);
 int phy_get_c45_ids(struct phy_device *phydev);
diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h
index 78fe3a2b1ea8..d4310ca71a3e 100644
--- a/include/linux/pse-pd/pse.h
+++ b/include/linux/pse-pd/pse.h
@@ -385,6 +385,9 @@ int pse_ethtool_set_prio(struct pse_control *psec,
 bool pse_has_podl(struct pse_control *psec);
 bool pse_has_c33(struct pse_control *psec);
 
+bool pse_control_matches_pcdev(struct pse_control *psec,
+			       struct pse_controller_dev *pcdev);
+
 int pse_register_notifier(struct notifier_block *nb);
 int pse_unregister_notifier(struct notifier_block *nb);
 
@@ -438,6 +441,12 @@ static inline bool pse_has_c33(struct pse_control *psec)
 	return false;
 }
 
+static inline bool pse_control_matches_pcdev(struct pse_control *psec,
+					     struct pse_controller_dev *pcdev)
+{
+	return false;
+}
+
 static inline int pse_register_notifier(struct notifier_block *nb)
 {
 	return 0;
-- 
2.43.0


^ permalink raw reply related

* [PATCH net-next v2 3/4] net: pse-pd: fire lifecycle events on controller register/unregister
From: Carlo Szelinsky @ 2026-06-20 11:24 UTC (permalink / raw)
  To: Oleksij Rempel, Kory Maincent, Andrew Lunn, Heiner Kallweit,
	Russell King, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Corey Leavitt, Jonas Jelonek, netdev, linux-kernel,
	Carlo Szelinsky
In-Reply-To: <20260620112440.1734404-1-github@szelinsky.de>

From: Corey Leavitt <corey@leavitt.info>

Hook the newly-introduced pse_controller_notifier chain so that
pse_controller_register() fires PSE_REGISTERED after the controller
has been added to pse_controller_list (i.e. is now resolvable by
of_pse_control_get()), and pse_controller_unregister() fires
PSE_UNREGISTERED before the controller is removed from the list
(while it is still valid to dereference from a subscriber's
pse_control pointer targeting it).

With no subscribers yet, this is observably a no-op. A later change
wires the phy subsystem in as the first subscriber.
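
The ordering contract above (notify after the list add, notify before
the list removal) can be illustrated with a userspace sketch. Everything
named model_* is hypothetical; a fixed array replaces
pse_controller_list, and the "subscriber" is reduced to recording
whether a lookup succeeded at the moment each event fired.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX 4

/* Hypothetical registry standing in for the controller list. */
struct model_registry {
	const void *ctrl[MODEL_MAX];
	bool seen_at_registered;	/* lookup result when REGISTERED fired */
	bool seen_at_unregistered;	/* lookup result when UNREGISTERED fired */
};

static bool model_lookup(const struct model_registry *r, const void *c)
{
	size_t i;

	for (i = 0; i < MODEL_MAX; i++)
		if (r->ctrl[i] == c)
			return true;
	return false;
}

/* @c must be non-NULL. Returns 0 on success, -1 when the registry is full. */
static int model_register(struct model_registry *r, const void *c)
{
	size_t i;

	for (i = 0; i < MODEL_MAX; i++) {
		if (r->ctrl[i])
			continue;
		r->ctrl[i] = c;					/* add first */
		r->seen_at_registered = model_lookup(r, c);	/* then notify */
		return 0;
	}
	return -1;
}

static void model_unregister(struct model_registry *r, const void *c)
{
	size_t i;

	r->seen_at_unregistered = model_lookup(r, c);	/* notify first */
	for (i = 0; i < MODEL_MAX; i++)			/* then remove */
		if (r->ctrl[i] == c)
			r->ctrl[i] = NULL;
}
```

In both events the subscriber observes the controller as present, which
is the property a later subscriber relies on.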

Signed-off-by: Corey Leavitt <corey@leavitt.info>
Signed-off-by: Carlo Szelinsky <github@szelinsky.de>
---
 drivers/net/pse-pd/pse_core.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index 84c734ed4553..37ba4ab778af 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -1138,6 +1138,9 @@ int pse_controller_register(struct pse_controller_dev *pcdev)
 	list_add(&pcdev->list, &pse_controller_list);
 	mutex_unlock(&pse_list_mutex);
 
+	blocking_notifier_call_chain(&pse_controller_notifier,
+				     PSE_REGISTERED, pcdev);
+
 	return 0;
 }
 EXPORT_SYMBOL_GPL(pse_controller_register);
@@ -1148,6 +1151,9 @@ EXPORT_SYMBOL_GPL(pse_controller_register);
  */
 void pse_controller_unregister(struct pse_controller_dev *pcdev)
 {
+	blocking_notifier_call_chain(&pse_controller_notifier,
+				     PSE_UNREGISTERED, pcdev);
+
 	pse_flush_pw_ds(pcdev);
 	pse_release_pis(pcdev);
 	if (pcdev->irq)
-- 
2.43.0


^ permalink raw reply related

* [PATCH net-next v2 2/4] net: pse-pd: add notifier chain for controller lifecycle events
From: Carlo Szelinsky @ 2026-06-20 11:24 UTC (permalink / raw)
  To: Oleksij Rempel, Kory Maincent, Andrew Lunn, Heiner Kallweit,
	Russell King, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Corey Leavitt, Jonas Jelonek, netdev, linux-kernel,
	Carlo Szelinsky
In-Reply-To: <20260620112440.1734404-1-github@szelinsky.de>

From: Corey Leavitt <corey@leavitt.info>

Introduce a blocking notifier chain that allows other subsystems to be
informed when a PSE controller is registered or unregistered, and
provide pse_register_notifier() / pse_unregister_notifier() as the
subscriber interface.

Subsequent patches will use this to let the phy subsystem own the
phydev->psec lifecycle directly, decoupling PSE lookup from
fwnode_mdiobus_register_phy() and removing the probe-time
-EPROBE_DEFER coupling that currently exists between mdio, phy and
pse-pd when the PSE controller driver is modular.

A blocking chain (rather than atomic) is used because callbacks will
take rtnl_lock and call back into pse_core via of_pse_control_get().
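
For readers unfamiliar with notifier chains, a minimal userspace sketch
of the register / unregister / call pattern follows. It is not the
kernel implementation: the model_* names are hypothetical, there is no
priority ordering, and the rwsem that lets blocking-chain callbacks
sleep is omitted.

```c
#include <assert.h>
#include <stddef.h>

enum model_event {
	MODEL_REGISTERED,
	MODEL_UNREGISTERED,
};

/* Hypothetical notifier block: a callback plus a list link. */
struct model_notifier {
	int (*call)(struct model_notifier *nb, unsigned long event,
		    void *data);
	struct model_notifier *next;
};

struct model_chain {
	struct model_notifier *head;
};

/* Append @nb to the end of the chain. */
static void model_chain_register(struct model_chain *c,
				 struct model_notifier *nb)
{
	struct model_notifier **p = &c->head;

	while (*p)
		p = &(*p)->next;
	nb->next = NULL;
	*p = nb;
}

/* Returns 0 on success, -1 if @nb was not on the chain. */
static int model_chain_unregister(struct model_chain *c,
				  struct model_notifier *nb)
{
	struct model_notifier **p;

	for (p = &c->head; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			return 0;
		}
	}
	return -1;
}

/* Invoke every subscriber synchronously; returns how many were called. */
static int model_chain_call(struct model_chain *c, unsigned long event,
			    void *data)
{
	struct model_notifier *nb;
	int calls = 0;

	for (nb = c->head; nb; nb = nb->next) {
		nb->call(nb, event, data);
		calls++;
	}
	return calls;
}

/* Example subscriber that counts the events it receives. */
struct model_counter {
	struct model_notifier nb;	/* must stay the first member */
	int registered;
	int unregistered;
};

static int model_count_cb(struct model_notifier *nb, unsigned long event,
			  void *data)
{
	struct model_counter *ctr = (struct model_counter *)nb;

	(void)data;
	if (event == MODEL_REGISTERED)
		ctr->registered++;
	else if (event == MODEL_UNREGISTERED)
		ctr->unregistered++;
	return 0;
}
```

Because the call walks subscribers synchronously in the caller's
context, anything a callback blocks on is also blocked on by whoever
fired the event.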

The enum pse_controller_event is placed outside the
IS_ENABLED(CONFIG_PSE_CONTROLLER) guard so that subscribers compiled
into a kernel without PSE support can still reference the event
values in dead-code paths without breaking the build.

This patch is pure infrastructure: nothing fires events yet, and
nothing subscribes. No observable behavior change.

Signed-off-by: Corey Leavitt <corey@leavitt.info>
Signed-off-by: Carlo Szelinsky <github@szelinsky.de>
---
 drivers/net/pse-pd/pse_core.c | 34 ++++++++++++++++++++++++++++++++++
 include/linux/pse-pd/pse.h    | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 66 insertions(+)

diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index a5e6d7b26b9f..84c734ed4553 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -8,6 +8,7 @@
 #include <linux/device.h>
 #include <linux/ethtool.h>
 #include <linux/ethtool_netlink.h>
+#include <linux/notifier.h>
 #include <linux/of.h>
 #include <linux/phy.h>
 #include <linux/pse-pd/pse.h>
@@ -23,6 +24,39 @@ static LIST_HEAD(pse_controller_list);
 static DEFINE_XARRAY_ALLOC(pse_pw_d_map);
 static DEFINE_MUTEX(pse_pw_d_mutex);
 
+static BLOCKING_NOTIFIER_HEAD(pse_controller_notifier);
+
+/**
+ * pse_register_notifier - register a callback for PSE controller events
+ * @nb: notifier block to register
+ *
+ * See enum pse_controller_event for events fired and their subscriber
+ * contract. Callbacks run in process context; they may sleep, take
+ * rtnl, and call of_pse_control_get(). The chain fires synchronously,
+ * so a PSE controller driver's probe/unbind path must not hold any
+ * such lock when calling pse_controller_register() or
+ * pse_controller_unregister().
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int pse_register_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&pse_controller_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(pse_register_notifier);
+
+/**
+ * pse_unregister_notifier - unregister a previously registered callback
+ * @nb: notifier block previously passed to pse_register_notifier()
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int pse_unregister_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&pse_controller_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(pse_unregister_notifier);
+
 /**
  * struct pse_control - a PSE control
  * @pcdev: a pointer to the PSE controller device
diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h
index 4e5696cfade7..78fe3a2b1ea8 100644
--- a/include/linux/pse-pd/pse.h
+++ b/include/linux/pse-pd/pse.h
@@ -21,6 +21,7 @@ struct net_device;
 struct phy_device;
 struct pse_controller_dev;
 struct netlink_ext_ack;
+struct notifier_block;
 
 /* C33 PSE extended state and substate. */
 struct ethtool_c33_pse_ext_state_info {
@@ -337,6 +338,24 @@ enum pse_budget_eval_strategies {
 	PSE_BUDGET_EVAL_STRAT_DYNAMIC	= 1 << 2,
 };
 
+/**
+ * enum pse_controller_event - PSE controller lifecycle events
+ *
+ * Event data in callbacks is always a pointer to the struct
+ * pse_controller_dev firing the event.
+ *
+ * @PSE_REGISTERED: controller added to pse_controller_list and
+ *	resolvable by of_pse_control_get().
+ * @PSE_UNREGISTERED: controller about to be removed from
+ *	pse_controller_list. Subscribers holding pse_control references
+ *	targeting it must drop them before returning and must not
+ *	acquire new references for it.
+ */
+enum pse_controller_event {
+	PSE_REGISTERED,
+	PSE_UNREGISTERED,
+};
+
 #if IS_ENABLED(CONFIG_PSE_CONTROLLER)
 int pse_controller_register(struct pse_controller_dev *pcdev);
 void pse_controller_unregister(struct pse_controller_dev *pcdev);
@@ -366,6 +385,9 @@ int pse_ethtool_set_prio(struct pse_control *psec,
 bool pse_has_podl(struct pse_control *psec);
 bool pse_has_c33(struct pse_control *psec);
 
+int pse_register_notifier(struct notifier_block *nb);
+int pse_unregister_notifier(struct notifier_block *nb);
+
 #else
 
 static inline struct pse_control *of_pse_control_get(struct device_node *node,
@@ -416,6 +438,16 @@ static inline bool pse_has_c33(struct pse_control *psec)
 	return false;
 }
 
+static inline int pse_register_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
+static inline int pse_unregister_notifier(struct notifier_block *nb)
+{
+	return 0;
+}
+
 #endif
 
 #endif
-- 
2.43.0


^ permalink raw reply related

* [PATCH net-next v2 1/4] net: pse-pd: scope pse_control regulator handle to kref lifetime
From: Carlo Szelinsky @ 2026-06-20 11:24 UTC (permalink / raw)
  To: Oleksij Rempel, Kory Maincent, Andrew Lunn, Heiner Kallweit,
	Russell King, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Corey Leavitt, Jonas Jelonek, netdev, linux-kernel,
	Carlo Szelinsky
In-Reply-To: <20260620112440.1734404-1-github@szelinsky.de>

From: Corey Leavitt <corey@leavitt.info>

__pse_control_release() drops psec->ps via devm_regulator_put(), which
only succeeds if the devres entry added by the matching
devm_regulator_get_exclusive() is still present on pcdev->dev at the
time the pse_control's kref hits zero.

In practice that assumption does not hold when the controller is
unbound while any pse_control still has consumers: pcdev->dev's
devres list is released LIFO, so every per-attach regulator-GET
devres runs (and regulator_put()s the underlying regulator) before
pse_controller_unregister() itself is invoked. Any later
pse_control_put() from that unbind path then reads psec->ps as a
dangling pointer inside devm_regulator_put() and WARNs at
drivers/regulator/devres.c:232 (devres_release() fails to find the
already-released match).

The pse_control's consumer handle is logically scoped to the
pse_control's refcount, not to pcdev->dev's devres lifetime. Switch
to the plain regulator_get_exclusive() / regulator_put() pair so
__pse_control_release() does the right put regardless of whether
the controller's devres has already been unwound.
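
The lifetime mismatch can be shown with a simplified userspace model.
This is a sketch under loud assumptions: the model_* names are
hypothetical, the devres list is reduced to one flag per handle, and the
dangling-pointer read is represented by a failed release rather than
reproduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical regulator with get/put counters. */
struct model_regulator {
	int gets;
	int puts;
};

struct model_handle {
	struct model_regulator *reg;
	bool devres_tracked;	/* obtained through a managed (devm-style) get */
	bool devres_live;	/* managed entry still present on the device */
};

static struct model_handle model_get(struct model_regulator *reg,
				     bool managed)
{
	struct model_handle h = { reg, managed, managed };

	reg->gets++;
	return h;
}

/* Device unbind: every managed entry is released, which puts the
 * regulator behind the consumer's back.
 */
static void model_unbind(struct model_handle *h)
{
	if (h->devres_tracked && h->devres_live) {
		h->devres_live = false;
		h->reg->puts++;
	}
}

/* Final release when the refcount reaches zero. Returns false when a
 * managed put cannot find its entry any more (the WARN case).
 */
static bool model_release(struct model_handle *h)
{
	if (h->devres_tracked) {
		if (!h->devres_live)
			return false;
		h->devres_live = false;
	}
	h->reg->puts++;
	return true;
}
```

With a managed get, unbind followed by release fails in the release
step; with a plain get, the same sequence performs exactly one put at
release time.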

No change to the regulator-framework-visible refcount or lifetime of
the underlying regulator: a single get paired with a single put. The
existing devm_regulator_register() for the per-PI rails is unchanged
(those ARE correctly scoped to the controller's lifetime).

Fixes: d83e13761d5b ("net: pse-pd: Use regulator framework within PSE framework")
Signed-off-by: Corey Leavitt <corey@leavitt.info>
Acked-by: Kory Maincent <kory.maincent@bootlin.com>
Signed-off-by: Carlo Szelinsky <github@szelinsky.de>
---
 drivers/net/pse-pd/pse_core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/pse-pd/pse_core.c b/drivers/net/pse-pd/pse_core.c
index 69dbdbde9d71..a5e6d7b26b9f 100644
--- a/drivers/net/pse-pd/pse_core.c
+++ b/drivers/net/pse-pd/pse_core.c
@@ -1367,7 +1367,7 @@ static void __pse_control_release(struct kref *kref)

 	if (psec->pcdev->pi[psec->id].admin_state_enabled)
 		regulator_disable(psec->ps);
-	devm_regulator_put(psec->ps);
+	regulator_put(psec->ps);

 	module_put(psec->pcdev->owner);

@@ -1436,8 +1436,8 @@ pse_control_get_internal(struct pse_controller_dev *pcdev, unsigned int index,
 		goto free_psec;

 	pcdev->pi[index].admin_state_enabled = ret;
-	psec->ps = devm_regulator_get_exclusive(pcdev->dev,
-						rdev_get_name(pcdev->pi[index].rdev));
+	psec->ps = regulator_get_exclusive(pcdev->dev,
+					   rdev_get_name(pcdev->pi[index].rdev));
 	if (IS_ERR(psec->ps)) {
 		ret = PTR_ERR(psec->ps);
 		goto put_module;
-- 
2.43.0

^ permalink raw reply related

* [PATCH net-next v2 0/4] net: pse-pd: decouple controller lookup from MDIO probe
From: Carlo Szelinsky @ 2026-06-20 11:24 UTC (permalink / raw)
  To: Oleksij Rempel, Kory Maincent, Andrew Lunn, Heiner Kallweit,
	Russell King, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Corey Leavitt, Jonas Jelonek, netdev, linux-kernel,
	Carlo Szelinsky
In-Reply-To: <20260423-pse-notifier-decouple-v1-0-86ed750a9d62@leavitt.info>

This is v2 of Corey's RFC [1]. Corey is busy at the moment, so I'm picking
it up to unblock everyone. The design is unchanged. The main thing v2
fixes is the SFP deadlock Jonas reported, plus a couple of smaller points
from the review.

The problem:

When a PSE controller driver is built as a module and a DT PHY node has a
"pses = <&...>" phandle, fwnode_mdiobus_register_phy() tries to resolve the
PSE handle before the controller has probed. It gets -EPROBE_DEFER, the
MDIO/DSA probe fails, and driver-core keeps retrying until the PSE module
loads. Since fa2f0454174c each retry does a full phy_device_register() /
phy_device_remove() cycle, so on a board with a tight watchdog the retry
loop can reset the box before userspace is up.

Rather than make the retry cheaper, this takes the PSE lookup out of the
MDIO probe path completely. pse_core gets a notifier chain (REGISTERED /
UNREGISTERED), the phy layer subscribes, owns phydev->psec, and attaches the
PSE handle when the controller actually shows up instead of during probe.
fwnode_mdio no longer knows about PSE, so no -EPROBE_DEFER crosses that
boundary and the retry loop is gone.

What changed since v1:

 - v1 made phy_device_register() hold rtnl across the whole registration,
   including device_add(). That deadlocks a PHY that drives its own SFP cage:
   device_add() -> phy_probe() -> phy_sfp_probe() -> sfp_bus_add_upstream(),
   and sfp_bus_add_upstream() takes rtnl again. Jonas hit this with
   RTL8214FC. v2 keeps device_add() out of rtnl and only takes rtnl around
   the psec attach, which now runs after device_add(). Doing the attach
   after the phy is on the bus keeps the PSE_REGISTERED race closed: either
   the notifier walk finds the phy and attaches it, or our own attach does,
   and the phydev->psec check makes that idempotent.

 - A broken "pses" binding now gets a phydev_warn() instead of being
   swallowed. -ENOENT (no phandle) and -EPROBE_DEFER stay quiet.

Tested on a Realtek rtl93xx PoE switch with two HS104 PSE controllers on
i2c:

 - clean boot, no probe-retry loop, no watchdog reset
 - 10G SFP+ port: module hotplug works, no deadlock (this is the path that
   hung with v1)
 - ethtool --set-pse enable/disable cuts and restores power to a connected PD
 - full i2c unbind -> rmmod -> modprobe cycle: PSE detaches on unbind (module
   refcount drops to 0 so rmmod works), and re-attaches on reload with power
   restored, no reboot. No lockdep splats.

Tested-by: Carlo Szelinsky <github@szelinsky.de>

One thing I'd like input on: the Fixes: tags. Patch 1 is a standalone
regulator lifetime fix and carries its own Fixes:. The boot-hang itself is
fixed by patches 2-4 together. Should those three carry
Fixes: fa2f0454174c so the fix can be backported, or should the series stay
net-next only? I'm fine either way.

[1] https://lore.kernel.org/netdev/20260423-pse-notifier-decouple-v1-0-86ed750a9d62@leavitt.info/

Corey Leavitt (4):
  net: pse-pd: scope pse_control regulator handle to kref lifetime
  net: pse-pd: add notifier chain for controller lifecycle events
  net: pse-pd: fire lifecycle events on controller register/unregister
  net: phy: own phydev->psec via PSE notifier and remove fwnode_mdio
    hook

 drivers/net/mdio/fwnode_mdio.c |  34 -------
 drivers/net/phy/phy_device.c   | 168 +++++++++++++++++++++++++++++++--
 drivers/net/phy/sfp.c          |   2 +-
 drivers/net/pse-pd/pse_core.c  |  60 +++++++++++-
 include/linux/phy.h            |   2 +
 include/linux/pse-pd/pse.h     |  41 ++++++++
 6 files changed, 261 insertions(+), 46 deletions(-)

base-commit: b85966adbf5de0668a815c6e3527f87e0c387fb4
-- 
2.43.0

^ permalink raw reply

* Re: [PATCH net] net: marvell: prestera: use unaligned accessors for DSA tag
From: Runyu Xiao @ 2026-06-20 10:01 UTC (permalink / raw)
  To: David Laight
  Cc: Taras Chornyi, netdev, andrew+netdev, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Oleksandr Mazur,
	Andrii Savka, Vadym Kochan, Volodymyr Mytnyk, linux-kernel,
	Jianhao Xu, stable
In-Reply-To: <20260620104750.5270a11c@pumpkin>

On Sat, 20 Jun 2026 17:47:50 +0800, David Laight wrote:
> Stop sending these 'fixes' unless you can do proper analysis.
> skb data is guaranteed to be aligned so that these reads (and ones of
> the IP/TCP/UDP headers) are aligned.

You are right. I treated the DSA tag buffer as a generic byte buffer and
did not account for the skb data alignment guarantees in this path.

Please drop this patch. I will re-check the remaining reports against
the relevant subsystem alignment contracts before sending anything else.

pw-bot: changes-requested

Thanks,
Runyu

^ permalink raw reply

* Re: [PATCH net] net: marvell: prestera: use unaligned accessors for DSA tag
From: David Laight @ 2026-06-20  9:47 UTC (permalink / raw)
  To: Runyu Xiao
  Cc: Taras Chornyi, netdev, Andrew Lunn, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Oleksandr Mazur,
	Andrii Savka, Vadym Kochan, Volodymyr Mytnyk, linux-kernel,
	Jianhao Xu, stable
In-Reply-To: <20260620093739.2164921-1-runyu.xiao@seu.edu.cn>

On Sat, 20 Jun 2026 17:37:39 +0800
Runyu Xiao <runyu.xiao@seu.edu.cn> wrote:

> Prestera parses and builds its 16-byte DSA tag from an skb byte buffer.
> The current code casts the tag pointer to __be32 * and then reads or
> writes the four tag words through that typed pointer.
> 
> The tag pointer is derived from skb data, but that only identifies the
> protocol tag location inside the packet buffer. It does not make the tag
> a naturally aligned __be32 array. Use the unaligned big-endian helpers
> for both parsing and building the tag.
> 
> This issue was detected by our static analysis tool and confirmed by
> manual audit. The same access pattern was validated with UBSAN alignment
> instrumentation by keeping the original cast from a u8 DSA tag buffer to
> __be32 * and reading dsa_words[i] from a deliberately misaligned tag
> buffer. UBSAN reported misaligned-access loads of type '__be32' in
> prestera_dsa_parse().
> 
> The driver has the same source-level issue: the RX path parses bytes at
> skb->data - ETH_TLEN, and the TX path writes the tag at skb->data +
> 2 * ETH_ALEN. Those offsets identify the DSA tag bytes, but they do not
> establish a __be32 object or a 4-byte alignment guarantee for typed loads
> and stores.

Stop sending these 'fixes' unless you can do proper analysis.
skb data is guaranteed to be aligned so that these reads (and ones of
the IP/TCP/UDP headers) are aligned.

	David


> 
> Fixes: 501ef3066c89 ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
> Cc: stable@vger.kernel.org
> Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
> ---
>  .../ethernet/marvell/prestera/prestera_dsa.c  | 19 +++++++++----------
>  1 file changed, 9 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/prestera/prestera_dsa.c b/drivers/net/ethernet/marvell/prestera/prestera_dsa.c
> index b7e89c0ca5c0..276f98cbd50e 100644
> --- a/drivers/net/ethernet/marvell/prestera/prestera_dsa.c
> +++ b/drivers/net/ethernet/marvell/prestera/prestera_dsa.c
> @@ -4,6 +4,7 @@
>  #include <linux/bitfield.h>
>  #include <linux/bitops.h>
>  #include <linux/errno.h>
> +#include <linux/unaligned.h>
>  #include <linux/string.h>
>  
>  #include "prestera_dsa.h"
> @@ -33,15 +34,14 @@
>  
>  int prestera_dsa_parse(struct prestera_dsa *dsa, const u8 *dsa_buf)
>  {
> -	__be32 *dsa_words = (__be32 *)dsa_buf;
>  	enum prestera_dsa_cmd cmd;
>  	u32 words[4];
>  	u32 field;
>  
> -	words[0] = ntohl(dsa_words[0]);
> -	words[1] = ntohl(dsa_words[1]);
> -	words[2] = ntohl(dsa_words[2]);
> -	words[3] = ntohl(dsa_words[3]);
> +	words[0] = get_unaligned_be32(dsa_buf);
> +	words[1] = get_unaligned_be32(dsa_buf + 4);
> +	words[2] = get_unaligned_be32(dsa_buf + 8);
> +	words[3] = get_unaligned_be32(dsa_buf + 12);
>  
>  	/* set the common parameters */
>  	cmd = (enum prestera_dsa_cmd)FIELD_GET(PRESTERA_DSA_W0_CMD, words[0]);
> @@ -82,7 +82,6 @@ int prestera_dsa_parse(struct prestera_dsa *dsa, const u8 *dsa_buf)
>  
>  int prestera_dsa_build(const struct prestera_dsa *dsa, u8 *dsa_buf)
>  {
> -	__be32 *dsa_words = (__be32 *)dsa_buf;
>  	u32 dev_num = dsa->hw_dev_num;
>  	u32 words[4] = { 0 };
>  
> @@ -98,10 +97,10 @@ int prestera_dsa_build(const struct prestera_dsa *dsa, u8 *dsa_buf)
>  	words[1] |= FIELD_PREP(PRESTERA_DSA_W1_EXT_BIT, 1);
>  	words[2] |= FIELD_PREP(PRESTERA_DSA_W2_EXT_BIT, 1);
>  
> -	dsa_words[0] = htonl(words[0]);
> -	dsa_words[1] = htonl(words[1]);
> -	dsa_words[2] = htonl(words[2]);
> -	dsa_words[3] = htonl(words[3]);
> +	put_unaligned_be32(words[0], dsa_buf);
> +	put_unaligned_be32(words[1], dsa_buf + 4);
> +	put_unaligned_be32(words[2], dsa_buf + 8);
> +	put_unaligned_be32(words[3], dsa_buf + 12);
>  
>  	return 0;
>  }


^ permalink raw reply

* Re: [PATCH net v2] net: phy: realtek: Clear MDIO_AN_10GBT_CTRL_ADV10G bit
From: Jan Klos @ 2026-06-20  9:43 UTC (permalink / raw)
  To: Markus Stockhausen
  Cc: Heiner Kallweit, Andrew Lunn, Russell King, netdev,
	Maxime Chevallier, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Daniel Golle, Vladimir Oltean,
	Aleksander Jan Bajkowski, Jan Hoffmann, Issam Hamdi, Chukun Pan,
	Russell King (Oracle), ChunHao Lin, linux-kernel
In-Reply-To: <008101dd0080$5feb6860$1fc23920$@gmx.de>

On Sat, 20 Jun 2026 at 08:45, Markus Stockhausen
<markus.stockhausen@gmx.de> wrote:
>
> > From: Jan Klos <honza.klos@gmail.com>
> > Sent: Saturday, 20 June 2026 03:20
> > Subject: [PATCH net v2] net: phy: realtek: Clear MDIO_AN_10GBT_CTRL_ADV10G bit
> >
> > On an RTL8127A connected to a link partner that advertises
> > 10000baseT, the speed cannot be changed to anything other than
> > 10000baseT, as 10GbE is always advertised regardless of any setting.
> > Fix this by clearing the MDIO_AN_10GBT_CTRL_ADV10G bit in
> > rtl822x_config_aneg()'s call to phy_modify_mmd_changed().
>
> As you are enhancing the mask, shouldn't this be "... by respecting ..."?
>
> Markus
>

I don't think so. In (__)phy_modify_mmd_changed() the mask really is used to
clear MMD register bits from the old register value before the new bits in
@set are applied:
* @mask: bit mask of bits to clear
* @set: new value of bits set in mask to write to @regnum
*
* Unlocked helper function which allows a MMD register to be modified as
* new register value = (old register value & ~mask) | set

^ permalink raw reply

* [PATCH net] net: marvell: prestera: use unaligned accessors for DSA tag
From: Runyu Xiao @ 2026-06-20  9:37 UTC (permalink / raw)
  To: Taras Chornyi, netdev
  Cc: Andrew Lunn, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Oleksandr Mazur, Andrii Savka, Vadym Kochan,
	Volodymyr Mytnyk, linux-kernel, Runyu Xiao, Jianhao Xu, stable

Prestera parses and builds its 16-byte DSA tag from an skb byte buffer.
The current code casts the tag pointer to __be32 * and then reads or
writes the four tag words through that typed pointer.

The tag pointer is derived from skb data, but that only identifies the
protocol tag location inside the packet buffer. It does not make the tag
a naturally aligned __be32 array. Use the unaligned big-endian helpers
for both parsing and building the tag.

This issue was detected by our static analysis tool and confirmed by
manual audit. The same access pattern was validated with UBSAN alignment
instrumentation by keeping the original cast from a u8 DSA tag buffer to
__be32 * and reading dsa_words[i] from a deliberately misaligned tag
buffer. UBSAN reported misaligned-access loads of type '__be32' in
prestera_dsa_parse().

The driver has the same source-level issue: the RX path parses bytes at
skb->data - ETH_TLEN, and the TX path writes the tag at skb->data +
2 * ETH_ALEN. Those offsets identify the DSA tag bytes, but they do not
establish a __be32 object or a 4-byte alignment guarantee for typed loads
and stores.

Fixes: 501ef3066c89 ("net: marvell: prestera: Add driver for Prestera family ASIC devices")
Cc: stable@vger.kernel.org
Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
---
 .../ethernet/marvell/prestera/prestera_dsa.c  | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_dsa.c b/drivers/net/ethernet/marvell/prestera/prestera_dsa.c
index b7e89c0ca5c0..276f98cbd50e 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_dsa.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_dsa.c
@@ -4,6 +4,7 @@
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
 #include <linux/errno.h>
+#include <linux/unaligned.h>
 #include <linux/string.h>

 #include "prestera_dsa.h"
@@ -33,15 +34,14 @@

 int prestera_dsa_parse(struct prestera_dsa *dsa, const u8 *dsa_buf)
 {
-	__be32 *dsa_words = (__be32 *)dsa_buf;
 	enum prestera_dsa_cmd cmd;
 	u32 words[4];
 	u32 field;

-	words[0] = ntohl(dsa_words[0]);
-	words[1] = ntohl(dsa_words[1]);
-	words[2] = ntohl(dsa_words[2]);
-	words[3] = ntohl(dsa_words[3]);
+	words[0] = get_unaligned_be32(dsa_buf);
+	words[1] = get_unaligned_be32(dsa_buf + 4);
+	words[2] = get_unaligned_be32(dsa_buf + 8);
+	words[3] = get_unaligned_be32(dsa_buf + 12);

 	/* set the common parameters */
 	cmd = (enum prestera_dsa_cmd)FIELD_GET(PRESTERA_DSA_W0_CMD, words[0]);
@@ -82,7 +82,6 @@ int prestera_dsa_parse(struct prestera_dsa *dsa, const u8 *dsa_buf)

 int prestera_dsa_build(const struct prestera_dsa *dsa, u8 *dsa_buf)
 {
-	__be32 *dsa_words = (__be32 *)dsa_buf;
 	u32 dev_num = dsa->hw_dev_num;
 	u32 words[4] = { 0 };

@@ -98,10 +97,10 @@ int prestera_dsa_build(const struct prestera_dsa *dsa, u8 *dsa_buf)
 	words[1] |= FIELD_PREP(PRESTERA_DSA_W1_EXT_BIT, 1);
 	words[2] |= FIELD_PREP(PRESTERA_DSA_W2_EXT_BIT, 1);

-	dsa_words[0] = htonl(words[0]);
-	dsa_words[1] = htonl(words[1]);
-	dsa_words[2] = htonl(words[2]);
-	dsa_words[3] = htonl(words[3]);
+	put_unaligned_be32(words[0], dsa_buf);
+	put_unaligned_be32(words[1], dsa_buf + 4);
+	put_unaligned_be32(words[2], dsa_buf + 8);
+	put_unaligned_be32(words[3], dsa_buf + 12);

 	return 0;
 }
-- 
2.34.1

^ permalink raw reply related

* "ip help" output is an error
From: Dmitri Seletski @ 2026-06-20  9:36 UTC (permalink / raw)
  To: netdev

Hello iproute2 maintainers,

I am reporting an inconsistency regarding the exit status of the ip help 
command.

Current Behavior:
When running ip help, the command prints the help documentation to 
stdout, but exits with a non-zero status (error). This causes issues in 
shell scripts that rely on exit codes for control flow.

Steps to reproduce:

# This returns "FAIL" because the exit code is non-zero
if ip help > /dev/null; then
     echo "SUCCESS"
else
     echo "FAIL"
fi

Expected Behavior:
Since the command successfully performs the requested task (displaying 
help information) and does not encounter a system error, it should 
return an exit code of 0.

Context:
This behavior breaks standard Bash logic for automation. For example:
ip help && echo "This will not execute"

In addition, "ip help | grep br" produces no output.

Current version tested: iproute2-6.19.0

Thank you for your time and for maintaining this tool.

Regards,
Dmitri Seletski

^ permalink raw reply

* Re: [PATCH net] nfc: st-nci: use unaligned accessors for frame length
From: David Laight @ 2026-06-20  9:29 UTC (permalink / raw)
  To: Runyu Xiao
  Cc: Krzysztof Kozlowski, netdev, Samuel Ortiz, Christophe Ricard,
	linux-kernel, Jianhao Xu, stable
In-Reply-To: <20260620090536.1701282-1-runyu.xiao@seu.edu.cn>

On Sat, 20 Jun 2026 17:05:36 +0800
Runyu Xiao <runyu.xiao@seu.edu.cn> wrote:

> The ST NCI I2C and SPI transports parse a frame length from bytes
> received from the controller. Both paths first read the frame header into
> a local u8 buffer and then cast buf + 2 to __be16 * before converting it
> from big endian.

Then align the local buffer.

	David

> 
> These are transport byte buffers, not __be16 objects. Use
> get_unaligned_be16() for the NCI frame length field in both the I2C and
> SPI transports.
> 
> This issue was detected by our static analysis tool and confirmed by
> manual audit. A focused UBSAN alignment validation kept the original
> access shape, be16_to_cpu(*(__be16 *)(buf + 2)), and ran it on an NCI
> frame byte buffer with buf + 2 at an odd address. UBSAN reported a
> misaligned-access load of type '__be16', and the trace contained
> st_nci_i2c_read().
> 
> The driver has the same source-level issue: the transport helpers fill
> u8 buffers, and the length checks only prove that the bytes are present.
> They do not establish a __be16 object at buf + 2 or a 2-byte alignment
> guarantee before the typed load.
> 
> Fixes: ed06aeefdac3 ("nfc: st-nci: Rename st21nfcb to st-nci")
> Fixes: 2bc4d4f8c8f3 ("nfc: st-nci: Add spi phy support for st21nfcb")
> Cc: stable@vger.kernel.org
> Signed-off-by: Runyu Xiao <runyu.xiao@seu.edu.cn>
> ---
>  drivers/nfc/st-nci/i2c.c | 3 ++-
>  drivers/nfc/st-nci/spi.c | 3 ++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/nfc/st-nci/i2c.c b/drivers/nfc/st-nci/i2c.c
> index 9ae839a6f5cc..29fdb4ae56e0 100644
> --- a/drivers/nfc/st-nci/i2c.c
> +++ b/drivers/nfc/st-nci/i2c.c
> @@ -14,6 +14,7 @@
>  #include <linux/delay.h>
>  #include <linux/nfc.h>
>  #include <linux/of.h>
> +#include <linux/unaligned.h>
>  
>  #include "st-nci.h"
>  
> @@ -120,7 +121,7 @@ static int st_nci_i2c_read(struct st_nci_i2c_phy *phy,
>  	if (r != ST_NCI_I2C_MIN_SIZE)
>  		return -EREMOTEIO;
>  
> -	len = be16_to_cpu(*(__be16 *) (buf + 2));
> +	len = get_unaligned_be16(buf + 2);
>  	if (len > ST_NCI_I2C_MAX_SIZE) {
>  		nfc_err(&client->dev, "invalid frame len\n");
>  		return -EBADMSG;
> diff --git a/drivers/nfc/st-nci/spi.c b/drivers/nfc/st-nci/spi.c
> index 169eacc0a32a..1326c20e43fc 100644
> --- a/drivers/nfc/st-nci/spi.c
> +++ b/drivers/nfc/st-nci/spi.c
> @@ -14,6 +14,7 @@
>  #include <linux/delay.h>
>  #include <linux/nfc.h>
>  #include <linux/of.h>
> +#include <linux/unaligned.h>
>  #include <net/nfc/nci.h>
>  
>  #include "st-nci.h"
> @@ -130,7 +131,7 @@ static int st_nci_spi_read(struct st_nci_spi_phy *phy,
>  	if (r < 0)
>  		return -EREMOTEIO;
>  
> -	len = be16_to_cpu(*(__be16 *) (buf + 2));
> +	len = get_unaligned_be16(buf + 2);
>  	if (len > ST_NCI_SPI_MAX_SIZE) {
>  		nfc_err(&dev->dev, "invalid frame len\n");
>  		phy->ndlc->hard_fault = 1;


^ permalink raw reply
