Netdev List


* Re: [PATCH net v3 0/3] net: airoha: Fix airoha_qdma_cleanup_tx_queue() processing
From: Lorenzo Bianconi @ 2026-04-17  6:26 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman
  Cc: linux-arm-kernel, linux-mediatek, netdev
In-Reply-To: <20260416-airoha_qdma_cleanup_tx_queue-fix-net-v3-0-2b69f5788580@kernel.org>


> Add missing bits in airoha_qdma_cleanup_tx_queue routine.
> Fix airoha_qdma_cleanup_tx_queue processing errors introduced in commit
> '3f47e67dff1f7 ("net: airoha: Add the capability to consume out-of-order
> DMA tx descriptors")'.
> 
> ---
> Changes in v3:
> - Move ndesc initialization fix in a dedicated patch.
> - Add patch 2/3 to move entries to queue head in case of DMA mapping
>   failure in airoha_dev_xmit().
> - Cosmetics.
> - Link to v2: https://lore.kernel.org/r/20260414-airoha_qdma_cleanup_tx_queue-fix-net-v2-1-875de57cc022@kernel.org
> 
> Changes in v2:
> - Move q->ndesc initialization at end of airoha_qdma_init_tx routine in
>   order to avoid any possible NULL pointer dereference in
>   airoha_qdma_cleanup_tx_queue()
> - Check if q->tx_list is empty in airoha_qdma_cleanup_tx_queue()
> - Link to v1: https://lore.kernel.org/r/20260410-airoha_qdma_cleanup_tx_queue-fix-net-v1-1-b7171c8f1e78@kernel.org
> 
> ---
> Lorenzo Bianconi (3):
>       net: airoha: Move ndesc initialization at end of airoha_qdma_init_tx()
>       net: airoha: Move entries to queue head in case of DMA mapping failure in airoha_dev_xmit()
>       net: airoha: Add missing bits in airoha_qdma_cleanup_tx_queue()

Please drop this version, I will send a new one dropping patch 2/3.

Regards,
Lorenzo

> 
>  drivers/net/ethernet/airoha/airoha_eth.c | 42 ++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 7 deletions(-)
> ---
> base-commit: 3f20012a3964f487ae1e9ff942e2f35d4e9595bf
> change-id: 20260410-airoha_qdma_cleanup_tx_queue-fix-net-93375f5ee80f
> 
> Best regards,
> -- 
> Lorenzo Bianconi <lorenzo@kernel.org>
> 


* Re: [PATCH net v2 1/2] bnge: fix initial HWRM sequence
From: Vikas Gupta @ 2026-04-17  6:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, edumazet, pabeni, andrew+netdev, horms, netdev,
	linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta
In-Reply-To: <20260417022201.1807745-1-kuba@kernel.org>


> > @@ -73,25 +73,35 @@ static int bnge_func_qcaps(struct bnge_dev *bd)
>
> [ ... ]
>
> >       return 0;
> > +
> > +err_free_ctx_mem:
> > +     bnge_free_ctx_mem(bd);
> > +     return rc;
> >  }
>
> Could this error path leave the firmware with active DMA pointers to freed
> memory?
>
> Looking at bnge_alloc_ctx_mem(), it configures the firmware with host DMA
> addresses for the backing store via bnge_backing_store_cfg(). If a
> subsequent step like bnge_hwrm_func_resc_qcaps() fails, this new error path
> directly frees the context memory.

HW does not access the context memory unless rings are active (traffic
is running). Therefore, it is safe to free the memory on the host.

>
> Previously, errors in this sequence triggered bnge_fw_unregister_dev(),
> which safely unregistered the firmware before freeing the memory. There is
> even a comment in bnge_fw_unregister_dev() documenting this requirement:

bnge_fw_unregister_dev() is called in both success and failure scenarios.
Therefore, freeing the context memory with bnge_free_ctx_mem() after
bnge_hwrm_func_drv_unrgtr() ensures that the context memory is invalidated
at the hardware level.

>
> static void bnge_fw_unregister_dev(struct bnge_dev *bd)
> {
>         /* ctx mem free after unrgtr only */
>         bnge_hwrm_func_drv_unrgtr(bd);
>         bnge_free_ctx_mem(bd);
> }
>
> Since the firmware isn't told to stop using the memory before it is freed
> here, is there a risk the hardware might DMA into the freed host pages?

There is no communication mechanism to inform the firmware to free the
context memory.

>
> > @@ -132,32 +142,25 @@ static int bnge_fw_register_dev(struct bnge_dev *bd)
>
> [ ... ]
>
> > -err_func_unrgtr:
> > -     bnge_fw_unregister_dev(bd);
> > +err_free_ctx_mem:
> > +     bnge_free_ctx_mem(bd);
> >       return rc;
> >  }
>
> This error path appears to have the same regression. If
> bnge_hwrm_func_drv_rgtr() fails after bnge_func_qcaps() has already
> configured the backing store, freeing the context memory directly without
> unregistering might allow the hardware to access freed memory.

Even if bnge_hwrm_func_drv_rgtr() fails, it is still safe to free the context
memory at the host because the driver unloads from this point.

The AI review appears to ignore the logic in the patch that handles
context memory. I see no valid comments on the patch.

Thanks,
Vikas

> --
> pw-bot: cr


* [PATCH] net/packet: fix TOCTOU race on mmap'd vnet_hdr in tpacket_snd()
From: Zero Mark @ 2026-04-17  6:07 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: security, David S . Miller, Jakub Kicinski, Eric Dumazet, netdev,
	Zero Mark
In-Reply-To: <CAHOBGNBvZOXGzzMDuHWw1RrRvbg4TZVH34jVDhc1nkHbW_URXA@mail.gmail.com>

In tpacket_snd(), when PACKET_VNET_HDR is enabled, vnet_hdr points
directly into the mmap'd TX ring buffer shared with userspace. The
kernel validates the header via __packet_snd_vnet_parse() but then
re-reads all fields later in virtio_net_hdr_to_skb(). A concurrent
userspace thread can modify the vnet_hdr fields (gso_type, gso_size,
flags, csum_start, csum_offset) between validation and use, bypassing
all safety checks.

This can lead to:
 - Out-of-bounds checksum writes via crafted csum_start/csum_offset
 - Malicious GSO segmentation parameters
 - Kernel memory corruption and potential local privilege escalation

The non-TPACKET path (packet_snd()) already correctly copies vnet_hdr
to a stack-local variable. All other vnet_hdr consumers in the kernel
(tun.c, tap.c, virtio_net.c) also use stack copies. The TPACKET TX
path is the only caller of virtio_net_hdr_to_skb() that reads directly
from user-controlled shared memory.

Fix this by copying vnet_hdr from the mmap'd ring buffer to a
stack-local variable before validation and use, consistent with the
approach used in packet_snd() and all other callers.

Exploitation requires CAP_NET_RAW, which can be obtained without
special privileges via user namespaces.

Confirmed with a PoC on Linux 6.8.0 (Ubuntu): kprobe tracing on
skb_partial_csum_set captured 77 race wins in 500,000 iterations.

Affects all kernels since PACKET_VNET_HDR support was added to the
TPACKET TX path (~v3.14).

Fixes: 9ed988e5 ("packet: add vnet_hdr support for tpacket_snd")
Signed-off-by: Zero Mark <patzilla007@gmail.com>
---
 net/packet/af_packet.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index abcdef012345..fedcba654321 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2725,7 +2725,8 @@ static int tpacket_parse_header(struct packet_sock *po, void *frame,
 static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 {
 	struct sk_buff *skb = NULL;
 	struct net_device *dev;
-	struct virtio_net_hdr *vnet_hdr = NULL;
+	struct virtio_net_hdr vnet_hdr;
+	bool has_vnet_hdr = false;
 	struct sockcm_cookie sockc;
 	__be16 proto;
 	int err, reserve = 0;
@@ -2828,16 +2829,17 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		if (po->has_vnet_hdr) {
-			vnet_hdr = data;
-			data += sizeof(*vnet_hdr);
-			tp_len -= sizeof(*vnet_hdr);
+			memcpy(&vnet_hdr, data, sizeof(vnet_hdr));
+			data += sizeof(vnet_hdr);
+			tp_len -= sizeof(vnet_hdr);
 			if (tp_len < 0 ||
-			    __packet_snd_vnet_parse(vnet_hdr, tp_len)) {
+			    __packet_snd_vnet_parse(&vnet_hdr, tp_len)) {
 				tp_len = -EINVAL;
 				goto tpacket_error;
 			}
 			copylen = __virtio16_to_cpu(vio_le(),
-						    vnet_hdr->hdr_len);
+						    vnet_hdr.hdr_len);
+			has_vnet_hdr = true;
 		}
 		copylen = max_t(int, copylen, dev->hard_header_len);
 		skb = sock_alloc_send_skb(&po->sk,
@@ -2875,11 +2877,11 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		}

-		if (po->has_vnet_hdr) {
-			if (virtio_net_hdr_to_skb(skb, vnet_hdr, vio_le())) {
+		if (has_vnet_hdr) {
+			if (virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le())) {
 				tp_len = -EINVAL;
 				goto tpacket_error;
 			}
-			virtio_net_hdr_set_proto(skb, vnet_hdr);
+			virtio_net_hdr_set_proto(skb, &vnet_hdr);
 		}

 		skb->destructor = tpacket_destruct_skb;
--
2.43.0
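
The check-then-reread pattern described in this patch can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the kernel code: `struct hdr`, `validate()`, `attacker()`, `consume_racy()` and `consume_snapshot()` are hypothetical names, and the concurrent userspace writer is simulated by a plain function call placed between validation and use.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a header that lives in memory shared with an
 * untrusted writer (for example an mmap'd ring). Not the kernel's struct. */
struct hdr {
	uint16_t csum_start;
	uint16_t csum_offset;
};

#define BUF_LEN 64

/* Validation: the 2-byte checksum field must lie inside the buffer. */
static int validate(const struct hdr *h)
{
	return (uint32_t)h->csum_start + h->csum_offset + 2 <= BUF_LEN;
}

/* Simulated concurrent writer: runs between check and use. */
static void attacker(struct hdr *shared)
{
	shared->csum_start = 0xffff;
}

/* Racy: validates the shared header, then reads it again for use.
 * Returns the offset that would be written, or -1 if rejected. */
static long consume_racy(struct hdr *shared)
{
	if (!validate(shared))
		return -1;
	attacker(shared);                 /* window between check and use */
	return (long)shared->csum_start + shared->csum_offset;
}

/* Fixed: snapshot once, then validate and use only the private copy. */
static long consume_snapshot(struct hdr *shared)
{
	struct hdr local;

	memcpy(&local, shared, sizeof(local));
	if (!validate(&local))
		return -1;
	attacker(shared);                 /* no longer affects 'local' */
	return (long)local.csum_start + local.csum_offset;
}
```

In the racy variant the value that passed validation is not the value that gets used; in the snapshot variant both steps see the same bytes, which is the property the patch aims for by copying vnet_hdr to the stack.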


* Re: TCP default settings (bugzilla)
From: plantegg ren @ 2026-04-17  5:58 UTC (permalink / raw)
  To: stephen; +Cc: netdev

Hi Stephen,

  I'm the reporter of those two bugs. I'm a DBA and Linux SRE with over
  10 years at Alibaba Cloud (Aliyun).

  These come from real production pain, not just theory. During my time at
  Alibaba Cloud, I pushed to change the default tcp_retries2 from 15 to 7
  in Alibaba Cloud Linux 3 (ALinux3) — our in-house distro serving millions
  of ECS instances. That change alone eliminated a whole class of prolonged
  outages across the fleet.

  The most memorable case: MySQL crashed and restarted in seconds, but the
  application tier stayed down for ~16 minutes because all existing
  connections were stuck in retransmission. After changing tcp_retries2 from
  15 to 5, recovery time dropped from 957s to about 20s.
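
The order of magnitude of these numbers follows from exponential backoff. The sketch below is a rough model, not kernel code: `hypothetical_timeout_ms()` is a made-up helper, and it assumes the retransmission timeout starts at the Linux minimum of 200 ms, doubles after each timeout, and is capped at 120 s. Real connections start from an RTO derived from the measured RTT, so observed values (such as the 957 s and roughly 20 s above) will be somewhat larger than this lower bound.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): total time in ms before a
 * connection with `retries` retransmissions is declared dead, assuming the
 * RTO starts at rto_min_ms, doubles after every timeout, and is capped at
 * rto_max_ms. The connection times out when the timer that follows the
 * last retransmission expires, so retries + 1 intervals are summed.
 */
static uint64_t hypothetical_timeout_ms(unsigned int retries,
					uint64_t rto_min_ms,
					uint64_t rto_max_ms)
{
	uint64_t rto = rto_min_ms;
	uint64_t total = 0;
	unsigned int i;

	for (i = 0; i <= retries; i++) {
		total += rto;
		rto = rto * 2 > rto_max_ms ? rto_max_ms : rto * 2;
	}
	return total;
}
```

With these assumptions, `hypothetical_timeout_ms(15, 200, 120000)` gives 924600 ms, which matches the 924.6 s figure quoted for the default in the kernel's ip-sysctl documentation; 7 retries give 51000 ms and 5 retries give 12600 ms.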

  The tcp_keepalive_time issue bit us through LVS — connections silently
  dropped after 900s of idle time, but TCP didn't notice until 7200s later.
  We spent days chasing "random" Connection Reset errors across dozens of
  services before tracing it to this mismatch.

  Every ops team I've talked to ends up applying these tweaks independently
  after getting burned. If a major cloud distro already ships tcp_retries2=7,
  maybe it's time for upstream to reconsider the default too.

  I did use AI to help format the bug reports (guilty as charged), but the
  problems and the data are from years of production experience.

  Thanks for forwarding to the list.

  Xijun Ren


* Re: [PATCH net-next v4 5/5] selftests: net: bridge: add MRC and QQIC field encoding tests
From: Ujjal Roy @ 2026-04-17  5:57 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Nikolay Aleksandrov, David Ahern, Shuah Khan,
	Andy Roulin, Yong Wang, Petr Machata, Ujjal Roy, bridge, netdev,
	linux-kernel, linux-kselftest
In-Reply-To: <20260413084752.GD209364@shredder>

On Mon, Apr 13, 2026 at 2:18 PM Ido Schimmel <idosch@nvidia.com> wrote:
>
> See some comments below, but note that net-next is closed:
>
> https://lore.kernel.org/netdev/20260412142250.131bf997@kernel.org/
>
> So you can either wait with v5 until it is open again or post it as RFC
> so that we can at least review (but not merge) it while net-next is
> closed.

Let me address the requested changes inline, so that v5 is ready when
net-next reopens. Ask me to send it as RFC v5 if you have doubts about
my inline answers.

>
> On Sun, Apr 12, 2026 at 11:10:47AM +0000, Ujjal Roy wrote:
> > Enhance vlmc_query_intvl_test and vlmc_query_response_intvl_test in
> > bridge_vlan_mcast.sh to validate IGMPv3/MLDv2 protocol compliance for
> > MRC and QQIC field encoding across both linear and exponential ranges.
> >
> > TEST: Vlan multicast snooping enable                                [ OK ]
> > TEST: Vlan mcast_query_interval global option default value         [ OK ]
> > INFO: Vlan 10 mcast_query_interval (QQIC) test cases:
> > TEST: Number of tagged IGMPv2 general query                         [ OK ]
> > TEST: IGMPv3 QQIC linear value 60                                   [ OK ]
> > TEST: MLDv2 QQIC linear value 60                                    [ OK ]
> > TEST: IGMPv3 QQIC non linear value 160                              [ OK ]
> > TEST: MLDv2 QQIC non linear value 160                               [ OK ]
> > TEST: Vlan mcast_query_response_interval global option default value   [ OK ]
> > INFO: Vlan 10 mcast_query_response_interval (MRC) test cases:
> > TEST: IGMPv3 MRC linear value 60                                    [ OK ]
> > TEST: IGMPv3 MRC non linear value 160                               [ OK ]
> > TEST: MLDv2 MRC linear value 30000                                  [ OK ]
> > TEST: MLDv2 MRC non linear value 60000                              [ OK ]
> >
> > Signed-off-by: Ujjal Roy <royujjal@gmail.com>
> > ---
> >  .../net/forwarding/bridge_vlan_mcast.sh       | 150 +++++++++++++++++-
> >  1 file changed, 142 insertions(+), 8 deletions(-)
> >
> > diff --git a/tools/testing/selftests/net/forwarding/bridge_vlan_mcast.sh b/tools/testing/selftests/net/forwarding/bridge_vlan_mcast.sh
> > index e8031f68200a..9f9f33d58286 100755
> > --- a/tools/testing/selftests/net/forwarding/bridge_vlan_mcast.sh
> > +++ b/tools/testing/selftests/net/forwarding/bridge_vlan_mcast.sh
> > @@ -162,14 +162,27 @@ vlmc_query_cnt_setup()
> >  {
> >       local type=$1
> >       local dev=$2
> > +     local match=$3
> >
> >       if [[ $type == "igmp" ]]; then
> > -             tc filter add dev $dev egress pref 10 prot 802.1Q \
> > +             # This matches: IP Protocol 2 (IGMP)
> > +             tc filter add dev "$dev" egress pref 10 prot 802.1Q \
> >                       flower vlan_id 10 vlan_ethtype ipv4 dst_ip 224.0.0.1 ip_proto 2 \
> > +                     action continue
> > +             # AND Type 0x11 (Query) at offset 24 after IP
> > +             # IP (20 byte IP + 4 bytes Option)
>
> Let's make it clearer: 20 bytes IPv4 header + 4 bytes Router Alert option
             #    20 bytes IPv4 header + 4 bytes Router Alert option + IGMP[offset 0] Query

>
> > +             match=(match u8 0x11 0xff at 24 $match)
> > +             tc filter add dev "$dev" egress pref 20 prot 802.1Q u32 "${match[@]}" \
> >                       action pass
> >       else
> > -             tc filter add dev $dev egress pref 10 prot 802.1Q \
> > +             # This matches: ICMPv6
> > +             tc filter add dev "$dev" egress pref 10 prot 802.1Q \
> >                       flower vlan_id 10 vlan_ethtype ipv6 dst_ip ff02::1 ip_proto icmpv6 \
> > +                     action continue
> > +             # AND Type 0x82 (Query) at offset 48 after IPv6
> > +             # IPv6 (40 bytes IPv6 + 2 bytes next HDR + 4 bytes Option + 2 byte pad)
>
> Same: 40 bytes IPv6 header + 8 bytes Hop-by-hop option
             #    40 bytes IPv6 header + 8 bytes Hop-by-hop option + MLD[offset 0] Query

>
> > +             match=(match u8 0x82 0xff at 48 $match)
> > +             tc filter add dev "$dev" egress pref 20 prot 802.1Q u32 "${match[@]}" \
> >                       action pass
> >       fi
>
> Sashiko has a relevant comment:
>
> "
> Does this configuration evaluate all packets against the pref 20 filter,
> regardless of the pref 10 result?
>
> In tc, if a packet does not match a filter, classification automatically falls
> through to the next priority filter.  By using "action continue" on pref 10,
> matching packets are also instructed to continue evaluation at the next filter.
>
> Because both matching and non-matching packets proceed to pref 20, pref 10
> seems to act as a no-op gate.  Could this cause the u32 rules in pref 20 to
> inadvertently match unrelated background traffic on the interface?
>
> To implement a logical AND across different classifiers, should pref 10 use
> "action goto chain 1" with pref 20 placed inside chain 1?
> "
Answer: No. The intent is for the pref 10 filter to match IGMP only, AND
for the pref 20 filter to match the IGMPv3 Query. The Query filter may
include an additional match for QQIC/MRC.

Here is my new filter:
             tc filter add dev "$dev" egress pref 10 prot 802.1Q \
                     flower vlan_id 10 vlan_ethtype ipv4 dst_ip 224.0.0.1 ip_proto 2 \
                     action goto chain 1

>
> >
> > @@ -181,7 +194,53 @@ vlmc_query_cnt_cleanup()
> >       local dev=$1
> >
> >       ip link set dev br0 type bridge mcast_stats_enabled 0
> > -     tc filter del dev $dev egress pref 10
> > +     tc filter del dev "$dev" egress pref 20
> > +     tc filter del dev "$dev" egress pref 10
> > +}
> > +
> > +vlmc_query_get_intvl_match()
> > +{
> > +     local type=$1
> > +     local version=$2
> > +     local test=$3
> > +     local interval=$4
> > +
> > +     if [ "$test" = "qqic" ]; then
> > +             # QQIC is 8-bit floating point encoding for IGMPv3 and MLDv2
> > +             if [ "${type}v${version}" = "igmpv3" ]; then
> > +                     # IP 20 bytes + 4 bytes Option + IGMPv3[9]
> > +                     if [[ $interval -lt 128 ]]; then
> > +                             echo "match u8 0x3c 0xff at 33"
>
> Please pass the expected value as an argument instead of hard coding
> "0x3c" here. Same in other places in the function.
I will pass the expected code as an argument and also update the comments here:
                     # 20 bytes IPv4 header + 4 bytes Router Alert option + IGMPv3[offset 9] QQIC
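
The u32 match offsets used in this patch are sums of header lengths and field offsets, counted from the start of the IP header. A small sketch of that arithmetic follows; the enum names are hypothetical and exist only for illustration, while the numbers come from the comments and matches quoted in this thread.

```c
/* Byte offsets from the start of the IP header, as used by the u32 matches. */
enum {
	IPV4_HDR_LEN     = 20, /* IPv4 header without options */
	ROUTER_ALERT_LEN = 4,  /* IPv4 Router Alert option */
	IPV6_HDR_LEN     = 40, /* fixed IPv6 header */
	HOP_BY_HOP_LEN   = 8,  /* 2 bytes next HDR + 4 bytes option + 2 bytes pad */

	IGMP_START       = IPV4_HDR_LEN + ROUTER_ALERT_LEN,
	MLD_START        = IPV6_HDR_LEN + HOP_BY_HOP_LEN,

	IGMPV3_TYPE_OFF  = IGMP_START + 0,  /* Type 0x11 (Query) */
	IGMPV3_MRC_OFF   = IGMP_START + 1,
	IGMPV3_QQIC_OFF  = IGMP_START + 9,
	MLDV2_TYPE_OFF   = MLD_START + 0,   /* Type 0x82 (Query) */
	MLDV2_MRC_OFF    = MLD_START + 4,
	MLDV2_QQIC_OFF   = MLD_START + 25,
};
```

These evaluate to 24, 25 and 33 for IGMPv3 and to 48, 52 and 73 for MLDv2, the same offsets that appear after "at" in the match strings.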

>
> > +                     else
> > +                             echo "match u8 0x84 0xff at 33"
> > +                     fi
> > +             elif [ "${type}v${version}" = "mldv2" ]; then
> > +                     # IPv6 40 + 2 next HDR + 4 Option + 2 pad + MLDv2[25]
> > +                     if [[ $interval -lt 128 ]]; then
> > +                             echo "match u8 0x3c 0xff at 73"
> > +                     else
> > +                             echo "match u8 0x84 0xff at 73"
> > +                     fi
> > +             fi
> > +     elif [ "$test" = "mrc" ]; then
> > +             if [ "${type}v${version}" = "igmpv3" ]; then
> > +                     # MRC is 8-bit floating point encoding for IGMPv3
> > +                     # IP 20 bytes + 4 bytes Option + IGMPv3[1]
> > +                     if [[ $interval -lt 128 ]]; then
> > +                             echo "match u8 0x3c 0xff at 25"
> > +                     else
> > +                             echo "match u8 0x84 0xff at 25"
> > +                     fi
> > +             elif [ "${type}v${version}" = "mldv2" ]; then
> > +                     # MRC is 16-bit floating point encoding for MLDv2
> > +                     # IPv6 40 + 2 next HDR + 4 Option + 2 pad + MLDv2[4]
> > +                     if [[ $interval -lt 32768 ]]; then
> > +                             echo "match u16 0x7530 0xffff at 52"
> > +                     else
> > +                             echo "match u16 0x8d4c 0xffff at 52"
> > +                     fi
> > +             fi
> > +     fi
> >  }
> >
> >  vlmc_check_query()
> > @@ -191,9 +250,13 @@ vlmc_check_query()
> >       local dev=$3
> >       local expect=$4
> >       local time=$5
> > +     local test=$6
> > +     local interval=$7
> > +     local intvl_match=""
> >       local ret=0
> >
> > -     vlmc_query_cnt_setup $type $dev
> > +     intvl_match="$(vlmc_query_get_intvl_match "$type" "$version" "$test" "$interval")"
> > +     vlmc_query_cnt_setup "$type" "$dev" "$intvl_match"
> >
> >       local pre_tx_xstats=$(vlmc_query_cnt_xstats $type $version $dev)
> >       bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_querier 1
> > @@ -201,7 +264,7 @@ vlmc_check_query()
> >       if [[ $ret -eq 0 ]]; then
> >               sleep $time
> >
> > -             local tcstats=$(tc_rule_stats_get $dev 10 egress)
> > +             local tcstats=$(tc_rule_stats_get "$dev" 20 egress)
> >               local post_tx_xstats=$(vlmc_query_cnt_xstats $type $version $dev)
> >
> >               if [[ $tcstats != $expect || \
> > @@ -441,6 +504,7 @@ vlmc_query_intvl_test()
> >       check_err $? "Wrong default mcast_query_interval global vlan option value"
> >       log_test "Vlan mcast_query_interval global option default value"
> >
> > +     log_info "Vlan 10 mcast_query_interval (QQIC) test cases:"
>
> Let's remove this as it makes the output confusing:
Sure, I will remove this line.

>
> INFO: Vlan 10 mcast_query_response_interval (MRC) test cases:
> TEST: IGMPv3 MRC linear value 60                                    [ OK ]
> [...]
> TEST: Flood unknown vlan multicast packets to router port only      [ OK ]
> TEST: Disable multicast vlan snooping when vlan filtering is disabled   [ OK ]
>
> >       RET=0
> >       bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_startup_query_count 0
> >       bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_query_interval 200
> > @@ -448,8 +512,42 @@ vlmc_query_intvl_test()
> >       # 1 is sent immediately, then 2 more in the next 5 seconds
> >       vlmc_check_query igmp 2 $swp1 3 5
> >       check_err $? "Wrong number of tagged IGMPv2 general queries sent"
> > -     log_test "Vlan 10 mcast_query_interval option changed to 200"
> > +     log_test "Number of tagged IGMPv2 general query"
> >
> > +     RET=0
> > +     bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_igmp_version 3
> > +     check_err $? "Could not set mcast_igmp_version in vlan 10"
> > +     bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_mld_version 2
> > +     check_err $? "Could not set mcast_mld_version in vlan 10"
> > +     bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_query_interval 6000
> > +     check_err $? "Could not set mcast_query_interval in vlan 10"
> > +     # 1 is sent immediately, IGMPv3 QQIC should match with linear value 60s
> > +     vlmc_check_query igmp 3 $swp1 1 1 qqic 60
> > +     check_err $? "Wrong QQIC in generated IGMPv3 general queries"
> > +     log_test "IGMPv3 QQIC linear value 60"
> > +
> > +     RET=0
> > +     # 1 is sent immediately, MLDv2 QQIC should match with linear value 60s
> > +     vlmc_check_query mld 2 $swp1 1 1 qqic 60
> > +     check_err $? "Wrong QQIC in generated MLDv2 general queries"
> > +     log_test "MLDv2 QQIC linear value 60"
> > +
> > +     RET=0
> > +     bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_query_interval 16000
> > +     check_err $? "Could not set mcast_query_interval in vlan 10"
> > +     # 1 is sent immediately, IGMPv3 QQIC should match with non linear value 160s
> > +     vlmc_check_query igmp 3 $swp1 1 1 qqic 160
> > +     check_err $? "Wrong QQIC in generated IGMPv3 general queries"
> > +     log_test "IGMPv3 QQIC non linear value 160"
> > +
> > +     RET=0
> > +     # 1 is sent immediately, MLDv2 QQIC should match with non linear value 160s
> > +     vlmc_check_query mld 2 $swp1 1 1 qqic 160
> > +     check_err $? "Wrong QQIC in generated MLDv2 general queries"
> > +     log_test "MLDv2 QQIC non linear value 160"
> > +
> > +     bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_igmp_version 2
> > +     bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_mld_version 1
> >       bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_startup_query_count 2
> >       bridge vlan global set vid 10 dev br0 mcast_snooping 1 mcast_query_interval 12500
> >  }
> > @@ -468,11 +566,47 @@ vlmc_query_response_intvl_test()
> >       check_err $? "Wrong default mcast_query_response_interval global vlan option value"
> >       log_test "Vlan mcast_query_response_interval global option default value"
> >
> > +     log_info "Vlan 10 mcast_query_response_interval (MRC) test cases:"
>
> Same
I will remove this line also.
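
For reference, the expected codes in the matches above (0x3c, 0x84, 0x7530, 0x8d4c) follow from the floating-point field encodings defined for IGMPv3 (RFC 3376) and MLDv2 (RFC 3810). The sketch below is a standalone illustration, not the bridge implementation; the function names `encode8()`, `decode8()`, `encode16()` and `decode16()` are hypothetical, and clamping out-of-range values to the largest code is an assumption of this sketch.

```c
#include <stdint.h>

/* 8-bit code (IGMPv3 MRC and QQIC, MLDv2 QQIC): values below 128 are stored
 * as-is; otherwise the code is 1 | exp (3 bits) | mant (4 bits) and
 * represents (mant | 0x10) << (exp + 3). */
static uint32_t decode8(uint8_t code)
{
	if (code < 128)
		return code;
	return (uint32_t)((code & 0x0f) | 0x10) << (((code >> 4) & 0x7) + 3);
}

static uint8_t encode8(uint32_t value)
{
	unsigned int exp = 0;

	if (value < 128)
		return (uint8_t)value;
	/* Find the smallest exponent that leaves a 5-bit mantissa. */
	while (exp < 7 && (value >> (exp + 3)) > 0x1f)
		exp++;
	if ((value >> (exp + 3)) > 0x1f)
		return 0xff; /* assumption: clamp to the largest code */
	return (uint8_t)(0x80 | (exp << 4) | ((value >> (exp + 3)) & 0x0f));
}

/* 16-bit code (MLDv2 MRC): values below 32768 are stored as-is; otherwise
 * the code is 1 | exp (3 bits) | mant (12 bits) and represents
 * (mant | 0x1000) << (exp + 3). */
static uint32_t decode16(uint16_t code)
{
	if (code < 32768)
		return code;
	return (uint32_t)((code & 0x0fff) | 0x1000) << (((code >> 12) & 0x7) + 3);
}

static uint16_t encode16(uint32_t value)
{
	unsigned int exp = 0;

	if (value < 32768)
		return (uint16_t)value;
	while (exp < 7 && (value >> (exp + 3)) > 0x1fff)
		exp++;
	if ((value >> (exp + 3)) > 0x1fff)
		return 0xffff; /* assumption: clamp to the largest code */
	return (uint16_t)(0x8000 | (exp << 12) | ((value >> (exp + 3)) & 0x0fff));
}
```

Here `encode8(60)` is 0x3c (linear range) and `encode8(160)` is 0x84 (exp 0, mant 4), while `encode16(30000)` is 0x7530 and `encode16(60000)` is 0x8d4c, matching the values hard-coded in the test. The encoding is lossy above the linear range, since low-order bits below the mantissa are dropped.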

[...]


* [PATCH] gtp: disable BH before calling udp_tunnel_xmit_skb()
From: David Carlier @ 2026-04-17  5:54 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Harald Welte, Andrew Lunn, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni
  Cc: Weiming Shi, osmocom-net-gprs, netdev, linux-kernel,
	David Carlier, stable

gtp_genl_send_echo_req() runs as a generic netlink doit handler in
process context with BH not disabled. It calls udp_tunnel_xmit_skb(),
which eventually invokes iptunnel_xmit() — that uses __this_cpu_inc/dec
on softnet_data.xmit.recursion to track the tunnel xmit recursion level.

Without local_bh_disable(), the task may migrate between
dev_xmit_recursion_inc() and dev_xmit_recursion_dec(), breaking the
per-CPU counter pairing. The result is stale or negative recursion
levels that can later produce false-positive
SKB_DROP_REASON_RECURSION_LIMIT drops on either CPU.

The other udp_tunnel_xmit_skb() call sites in gtp.c are unaffected:
the data path runs under ndo_start_xmit and the echo response handlers
run from the UDP encap rx softirq, both with BH already disabled.

Fix it by disabling BH around the udp_tunnel_xmit_skb() call, mirroring
commit 2cd7e6971fc2 ("sctp: disable BH before calling
udp_tunnel_xmit_skb()").

Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
Cc: stable@vger.kernel.org
Signed-off-by: David Carlier <devnexen@gmail.com>
---
 drivers/net/gtp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 70b9e58b9b78..5150f2e4f66b 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -2400,6 +2400,7 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info)
 		return -ENODEV;
 	}

+	local_bh_disable();
 	udp_tunnel_xmit_skb(rt, sk, skb_to_send,
 			    fl4.saddr, fl4.daddr,
 			    inet_dscp_to_dsfield(fl4.flowi4_dscp),
@@ -2409,6 +2410,7 @@ static int gtp_genl_send_echo_req(struct sk_buff *skb, struct genl_info *info)
 			    !net_eq(sock_net(sk),
 				    dev_net(gtp->dev)),
 			    false, 0);
+	local_bh_enable();
 	return 0;
 }

-- 
2.53.0
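
The counter mismatch described in the commit message can be modelled in userspace. This is a simulation under stated assumptions, not kernel code: the per-CPU counters are a plain array, `migrate_to()` stands in for the scheduler moving a preemptible task, and all names are hypothetical.

```c
#define NR_CPUS 2

/* Simulated per-CPU recursion counters; the kernel keeps the real one in
 * softnet_data. Everything here is a userspace stand-in. */
static int xmit_recursion[NR_CPUS];
static int current_cpu;

static void recursion_inc(void) { xmit_recursion[current_cpu]++; }
static void recursion_dec(void) { xmit_recursion[current_cpu]--; }

/* Stand-in for the scheduler moving a preemptible task to another CPU. */
static void migrate_to(int cpu) { current_cpu = cpu; }

/* Transmit with an optional migration between inc and dec. Passing 0 models
 * the section being protected (as the patch does with local_bh_disable()),
 * so the pair always hits the same counter. */
static void tunnel_xmit(int migrate_midway)
{
	recursion_inc();
	if (migrate_midway)
		migrate_to((current_cpu + 1) % NR_CPUS);
	recursion_dec();
}
```

After one call with `migrate_midway` set, starting on CPU 0, the counters read 1 on CPU 0 and -1 on CPU 1: neither returns to zero, which is the stale or negative recursion level the commit message refers to.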


* [net-next v2 5/5] net: stmmac: starfive: Add STMMAC_FLAG_SPH_DISABLE flag
From: Minda Chen @ 2026-04-17  2:45 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen
In-Reply-To: <20260417024523.107786-1-minda.chen@starfivetech.com>

Set STMMAC_FLAG_SPH_DISABLE so that split header is disabled by default
on all StarFive SoCs.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
index 91698c763dac..9146b498658d 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
@@ -145,7 +145,7 @@ static int starfive_dwmac_probe(struct platform_device *pdev)
 	}
 
 	dwmac->dev = &pdev->dev;
-	plat_dat->flags |= STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP;
+	plat_dat->flags |= (STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP | STMMAC_FLAG_SPH_DISABLE);
 	plat_dat->bsp_priv = dwmac;
 	plat_dat->dma_cfg->dche = true;
 
-- 
2.17.1



* [PATCH net 1/2] net/mlx5e: psp: Fix invalid access on PSP dev registration fail
From: Tariq Toukan @ 2026-04-17  5:02 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Daniel Zahka, Willem de Bruijn, Cosmin Ratiu,
	Raed Salem, Rahul Rameshbabu, Dragos Tatulea, Kees Cook, netdev,
	linux-rdma, linux-kernel, Gal Pressman
In-Reply-To: <20260417050201.192070-1-tariqt@nvidia.com>

From: Cosmin Ratiu <cratiu@nvidia.com>

priv->psp->psp is initialized with the PSP device as returned by
psp_dev_create(). This could also return an error pointer, in which case
a later psp_dev_unregister() would operate on that error pointer and
trigger an invalid access.

Avoid that by using a local variable and only saving the PSP device when
registration succeeds.
Also apply some light refactoring of the functions managing the PSP
device in order to make them more readable/safe.

Fixes: 89ee2d92f66c ("net/mlx5e: Support PSP offload functionality")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 .../mellanox/mlx5/core/en_accel/psp.c         | 36 ++++++++++---------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/psp.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/psp.c
index 6a50b6dec0fa..d9adb993e64d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/psp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/psp.c
@@ -1070,29 +1070,37 @@ static struct psp_dev_ops mlx5_psp_ops = {
 
 void mlx5e_psp_unregister(struct mlx5e_priv *priv)
 {
-	if (!priv->psp || !priv->psp->psp)
+	struct mlx5e_psp *psp = priv->psp;
+
+	if (!psp || !psp->psp)
 		return;
 
-	psp_dev_unregister(priv->psp->psp);
+	psp_dev_unregister(psp->psp);
+	psp->psp = NULL;
 }
 
 void mlx5e_psp_register(struct mlx5e_priv *priv)
 {
+	struct mlx5e_psp *psp = priv->psp;
+	struct psp_dev *psd;
+
 	/* FW Caps missing */
 	if (!priv->psp)
 		return;
 
-	priv->psp->caps.assoc_drv_spc = sizeof(u32);
-	priv->psp->caps.versions = 1 << PSP_VERSION_HDR0_AES_GCM_128;
+	psp->caps.assoc_drv_spc = sizeof(u32);
+	psp->caps.versions = 1 << PSP_VERSION_HDR0_AES_GCM_128;
 	if (MLX5_CAP_PSP(priv->mdev, psp_crypto_esp_aes_gcm_256_encrypt) &&
 	    MLX5_CAP_PSP(priv->mdev, psp_crypto_esp_aes_gcm_256_decrypt))
-		priv->psp->caps.versions |= 1 << PSP_VERSION_HDR0_AES_GCM_256;
+		psp->caps.versions |= 1 << PSP_VERSION_HDR0_AES_GCM_256;
 
-	priv->psp->psp = psp_dev_create(priv->netdev, &mlx5_psp_ops,
-					&priv->psp->caps, NULL);
-	if (IS_ERR(priv->psp->psp))
+	psd = psp_dev_create(priv->netdev, &mlx5_psp_ops, &psp->caps, NULL);
+	if (IS_ERR(psd)) {
 		mlx5_core_err(priv->mdev, "PSP failed to register due to %pe\n",
-			      priv->psp->psp);
+			      psd);
+		return;
+	}
+	psp->psp = psd;
 }
 
 int mlx5e_psp_init(struct mlx5e_priv *priv)
@@ -1131,22 +1139,18 @@ int mlx5e_psp_init(struct mlx5e_priv *priv)
 	if (!psp)
 		return -ENOMEM;
 
-	priv->psp = psp;
 	fs = mlx5e_accel_psp_fs_init(priv);
 	if (IS_ERR(fs)) {
 		err = PTR_ERR(fs);
-		goto out_err;
+		kfree(psp);
+		return err;
 	}
 
 	psp->fs = fs;
+	priv->psp = psp;
 
 	mlx5_core_dbg(priv->mdev, "PSP attached to netdevice\n");
 	return 0;
-
-out_err:
-	priv->psp = NULL;
-	kfree(psp);
-	return err;
 }
 
 void mlx5e_psp_cleanup(struct mlx5e_priv *priv)
-- 
2.44.0
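
The idea behind this fix, keeping a possibly failing result in a local variable and publishing it only on success, can be shown in isolation. The sketch below is userspace C with simplified stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers; `struct device`, `struct owner`, `device_create()`, `owner_register()` and `owner_unregister()` are hypothetical names, not mlx5 or PSP APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's ERR_PTR()/IS_ERR(). */
#define MAX_ERRNO 4095
static void *err_ptr(long err) { return (void *)err; }
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct device { int registered; };       /* hypothetical */
struct owner  { struct device *dev; };   /* hypothetical */

static struct device the_dev;

/* Hypothetical constructor: returns an error pointer on failure. */
static struct device *device_create(int fail)
{
	if (fail)
		return err_ptr(-12); /* -ENOMEM */
	the_dev.registered = 1;
	return &the_dev;
}

/* Store the result only on success, so owner->dev is NULL or valid. */
static void owner_register(struct owner *o, int fail)
{
	struct device *d = device_create(fail);

	if (is_err(d))
		return;
	o->dev = d;
}

/* Safe to call unconditionally: a plain NULL check is sufficient. */
static void owner_unregister(struct owner *o)
{
	if (!o->dev)
		return;
	o->dev->registered = 0;
	o->dev = NULL;
}
```

Because an error pointer is non-NULL, storing it directly in `o->dev` would let the NULL check in the unregister path pass and the error value would be dereferenced; with the local variable, that state cannot be reached.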



* [PATCH net 2/2] net/mlx5e: psp: Hook PSP dev reg/unreg to profile enable/disable
From: Tariq Toukan @ 2026-04-17  5:02 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Daniel Zahka, Willem de Bruijn, Cosmin Ratiu,
	Raed Salem, Rahul Rameshbabu, Dragos Tatulea, Kees Cook, netdev,
	linux-rdma, linux-kernel, Gal Pressman
In-Reply-To: <20260417050201.192070-1-tariqt@nvidia.com>

From: Cosmin Ratiu <cratiu@nvidia.com>

devlink reload while PSP connections are active does:

mlx5_unload_one_devl_locked() -> mlx5_detach_device()
-> _mlx5e_suspend()
  -> mlx5e_detach_netdev()
    -> profile->cleanup_rx
    -> profile->cleanup_tx
  -> mlx5e_destroy_mdev_resources() -> mlx5_core_dealloc_pd() fails:
...
mlx5_core 0000:08:00.0: mlx5_cmd_out_err:821:(pid 19722):
DEALLOC_PD(0x801) op_mod(0x0) failed, status bad resource state(0x9),
syndrome (0xef0c8a), err(-22)
...

The reason for failure is the existence of TX keys, which are removed by
the PSP dev unregistration happening in:
profile->cleanup() -> mlx5e_psp_unregister() -> mlx5e_psp_cleanup()
  -> psp_dev_unregister()
...but this isn't invoked in the devlink reload flow, only when changing
the NIC profile (e.g. when transitioning to switchdev mode) or on dev
teardown.

Move PSP device registration into mlx5e_nic_enable(), and unregistration
into the corresponding mlx5e_nic_disable(). These functions are called
during netdev attach/detach after RX & TX are set up.
This ensures that the keys will be gone by the time the PD is destroyed.

Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Fixes: 89ee2d92f66c ("net/mlx5e: Support PSP offload functionality")
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
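
For reviewers who want the ordering argument in isolation, here is a
minimal userspace sketch of it. It is only a model under stated
assumptions: every model_* name is hypothetical, the key count stands in
for the TX keys held by the device, and -EINVAL mirrors the err(-22)
seen in the DEALLOC_PD log above. It is not the driver code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the device state relevant to the failure. */
struct model_dev {
	int tx_keys;		/* keys still installed in "hardware" */
	bool psp_registered;
	bool pd_allocated;
};

/* Unregistering the PSP device is what removes the TX keys. */
static void model_psp_unregister(struct model_dev *d)
{
	assert(d);
	d->tx_keys = 0;
	d->psp_registered = false;
}

/* The PD cannot be released while keys still reference it. */
static int model_dealloc_pd(struct model_dev *d)
{
	if (d->tx_keys > 0)
		return -EINVAL;
	d->pd_allocated = false;
	return 0;
}

/* Detach path: "disable" runs first, then the PD is destroyed.
 * unregister_in_disable selects the new (true) or old (false) ordering.
 */
static int model_detach(struct model_dev *d, bool unregister_in_disable)
{
	if (unregister_in_disable)
		model_psp_unregister(d);
	return model_dealloc_pd(d);
}
```

With three keys installed, the old ordering makes model_detach() fail
with -EINVAL, while the new ordering releases the PD cleanly.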

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 6c4eeb88588c..c3938a2dbbfe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -6021,7 +6021,6 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 	if (take_rtnl)
 		rtnl_lock();
 
-	mlx5e_psp_register(priv);
 	/* update XDP supported features */
 	mlx5e_set_xdp_feature(priv);
 
@@ -6034,7 +6033,6 @@ static int mlx5e_nic_init(struct mlx5_core_dev *mdev,
 static void mlx5e_nic_cleanup(struct mlx5e_priv *priv)
 {
 	mlx5e_health_destroy_reporters(priv);
-	mlx5e_psp_unregister(priv);
 	mlx5e_ktls_cleanup(priv);
 	mlx5e_psp_cleanup(priv);
 	mlx5e_fs_cleanup(priv->fs);
@@ -6158,6 +6156,7 @@ static void mlx5e_nic_enable(struct mlx5e_priv *priv)
 
 	mlx5e_fs_init_l2_addr(priv->fs, netdev);
 	mlx5e_ipsec_init(priv);
+	mlx5e_psp_register(priv);
 
 	err = mlx5e_macsec_init(priv);
 	if (err)
@@ -6228,6 +6227,7 @@ static void mlx5e_nic_disable(struct mlx5e_priv *priv)
 	mlx5_lag_remove_netdev(mdev, priv->netdev);
 	mlx5_vxlan_reset_to_default(mdev->vxlan);
 	mlx5e_macsec_cleanup(priv);
+	mlx5e_psp_unregister(priv);
 	mlx5e_ipsec_cleanup(priv);
 }
 
-- 
2.44.0



* [PATCH net 0/2] mlx5e PSP fixes
From: Tariq Toukan @ 2026-04-17  5:01 UTC (permalink / raw)
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Boris Pismenny, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Daniel Zahka, Willem de Bruijn, Cosmin Ratiu,
	Raed Salem, Rahul Rameshbabu, Dragos Tatulea, Kees Cook, netdev,
	linux-rdma, linux-kernel, Gal Pressman

Hi,

This patchset provides bug fixes from Cosmin to the mlx5e PSP feature.

Thanks,
Tariq.

Cosmin Ratiu (2):
  net/mlx5e: psp: Fix invalid access on PSP dev registration fail
  net/mlx5e: psp: Hook PSP dev reg/unreg to profile enable/disable

 .../mellanox/mlx5/core/en_accel/psp.c         | 36 ++++++++++---------
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  4 +--
 2 files changed, 22 insertions(+), 18 deletions(-)


base-commit: 82c21069028c5db3463f851ae8ac9cc2e38a3827
-- 
2.44.0



* Re: [net-next v2 3/5] dt-bindings: net: starfive,jh7110-dwmac: Add JHB100 sgmii rx clk
From: Rob Herring (Arm) @ 2026-04-17  4:36 UTC (permalink / raw)
  To: Minda Chen
  Cc: Paolo Abeni, Jakub Kicinski, Alexandre Torgue, Maxime Coquelin,
	Emil Renner Berthing, Andrew Lunn, David S . Miller, linux-stm32,
	devicetree, netdev, Krzysztof Kozlowski, Rob Herring,
	linux-kernel, Eric Dumazet, Conor Dooley
In-Reply-To: <20260417024523.107786-4-minda.chen@starfivetech.com>


On Fri, 17 Apr 2026 10:45:21 +0800, Minda Chen wrote:
> JHB100 SGMII interface tx/rx mac clock is split and require to
> set clock rate in 10M/100M/1000M speed. So dts need to add a
> new rx clock in code, dts and dt binding doc.
> 
> Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
> ---
>  .../bindings/net/starfive,jh7110-dwmac.yaml   | 42 ++++++++++++++++---
>  1 file changed, 36 insertions(+), 6 deletions(-)
> 

My bot found errors running 'make dt_binding_check' on your patch:

yamllint warnings/errors:
./Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml:56:8: [warning] wrong indentation: expected 8 but found 7 (indentation)

dtschema/dtc warnings/errors:

doc reference errors (make refcheckdocs):

See https://patchwork.kernel.org/project/devicetree/patch/20260417024523.107786-4-minda.chen@starfivetech.com

The base for the series is generally the latest rc1. A different dependency
should be noted in *this* patch.

If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:

pip3 install dtschema --upgrade

Please check and re-submit after running the above command yourself. Note
that DT_SCHEMA_FILES can be set to your schema file to speed up checking
your schema. However, it must be unset to test all examples with your schema.



* Re: [PATCH net] net/sched: act_ct: fix skb leak on fragment check failure
From: phx @ 2026-04-17  3:56 UTC (permalink / raw)
  To: Jamal Hadi Salim; +Cc: netdev, jiri, horms
In-Reply-To: <CAM0EoMmi0tdhB9ECmcpPea7iSFm5AiLme71cw5zXK+WVUZGEMw@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 2826 bytes --]

Found it through code review. Reproduced it on a 7.0-rc6 kernel
using a veth pair with act_ct on ingress:

  ip netns add ns_ct
  ip link add veth0 type veth peer name veth1
  ip link set veth1 netns ns_ct
  ip link set veth0 up
  ip netns exec ns_ct ip link set veth1 up
  tc qdisc add dev veth0 clsact
  tc filter add dev veth0 ingress protocol ip flower action ct zone 1

Then send truncated IP packets (10 bytes IP, need 20 minimum) from
the namespace via raw AF_PACKET socket on veth1. This hits
pskb_may_pull failure in tcf_ct_ipv4_is_fragment -> -EINVAL ->
out_frag -> TC_ACT_CONSUMED. net/core/dev.c handles TC_ACT_CONSUMED
by returning NULL without freeing the skb.

Result on unpatched kernel:
  Sent: 222 packets
  skbuff_head_cache: before=6439 after=6663 growth=224
  FAIL: skb leak detected (224 objects leaked)

Attached a test script that automates this. With the fix applied
(TC_ACT_SHOT for non-EINPROGRESS errors), the skbs get freed and
the test passes. The test script is generated by AI.
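
To make the ownership rule explicit, here is a small userspace sketch of
the verdict logic. It is a model only: the model_* names and the enum
are hypothetical stand-ins, not the kernel's TC_ACT_* definitions, and
the "caller" simply frees on a SHOT verdict and leaves the skb alone on
CONSUMED.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for TC_ACT_SHOT / TC_ACT_CONSUMED. */
enum model_verdict { MODEL_SHOT, MODEL_CONSUMED };

struct model_skb {
	bool freed;
};

/* Patched out_frag logic: only -EINPROGRESS means the skb was queued
 * for defragmentation and is no longer owned by the caller.
 */
static enum model_verdict model_out_frag(int err)
{
	return err == -EINPROGRESS ? MODEL_CONSUMED : MODEL_SHOT;
}

/* Caller side: frees the skb on SHOT, does nothing on CONSUMED. */
static void model_caller(struct model_skb *skb, enum model_verdict v)
{
	assert(skb);
	if (v == MODEL_SHOT)
		skb->freed = true;
}

/* Returns true if the skb ended up freed by the caller. */
static bool model_run(int err)
{
	struct model_skb skb = { .freed = false };

	model_caller(&skb, model_out_frag(err));
	return skb.freed;
}
```

In this model, model_run(-EINVAL) reports the skb as freed, while
model_run(-EINPROGRESS) leaves it to whoever queued it.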

On Thu, Apr 16, 2026 at 11:32 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:

> On Thu, Apr 16, 2026 at 9:01 AM Dudu Lu <phx0fer@gmail.com> wrote:
> >
> > When tcf_ct_handle_fragments() returns an error other than -EINPROGRESS
> > (e.g. -EINVAL from malformed fragments), tcf_ct_act() jumps to out_frag
> > which unconditionally returns TC_ACT_CONSUMED. This tells the caller the
> > skb was consumed, but it was not freed, leaking one skb per malformed
> > fragment.
> >
> > TC_ACT_CONSUMED is only correct for -EINPROGRESS, where defragmentation
> > is genuinely in progress and the skb has been queued. For all other
> > errors the skb is still owned by the caller and must be freed via
> > TC_ACT_SHOT.
> >
> > Fixes: 3f14b377d01d ("net/sched: act_ct: fix skb leak and crash on ooo frags")
> > Signed-off-by: Dudu Lu <phx0fer@gmail.com>
>
> Do you have a reproducer? Always helps adding at least a tdc test.
> Also: How did you find this issue? was it AI? If yes, please add the
> tag "Assisted-by:<AI name here>"
>
> cheers,
> jamal
>
> > ---
> >  net/sched/act_ct.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
> > index 7d5e50c921a0..870655f682bd 100644
> > --- a/net/sched/act_ct.c
> > +++ b/net/sched/act_ct.c
> > @@ -1107,8 +1107,10 @@ TC_INDIRECT_SCOPE int tcf_ct_act(struct sk_buff *skb, const struct tc_action *a,
> >         return retval;
> >
> >  out_frag:
> > -       if (err != -EINPROGRESS)
> > +       if (err != -EINPROGRESS) {
> >                 tcf_action_inc_drop_qstats(&c->common);
> > +               return TC_ACT_SHOT;
> > +       }
> >         return TC_ACT_CONSUMED;
> >
> >  drop:
> > --
> > 2.39.3 (Apple Git-145)
> >
>


[-- Attachment #2: act_ct_skb_leak.sh --]
[-- Type: text/x-sh, Size: 3349 bytes --]

#!/bin/bash
# SPDX-License-Identifier: GPL-2.0
#
# Test for skb leak in act_ct when tcf_ct_handle_fragments() fails.
#
# When a truncated IP packet hits act_ct, tcf_ct_ipv4_is_fragment()
# returns -EINVAL. The out_frag path returns TC_ACT_CONSUMED, but the
# skb was never actually consumed — it leaks.
#
# This test sends truncated IP packets through a veth pair with act_ct
# on ingress and checks skbuff_head_cache slab growth.

set -e

readonly NS="ns-act-ct-leak-$(mktemp -u XXXXXX)"
readonly DEV="veth-ct0"
readonly DEV_PEER="veth-ct1"
readonly NUM_PKTS=500
# Minimum slab growth to consider a leak (allow some noise)
readonly LEAK_THRESHOLD=200

cleanup() {
	ip link del $DEV 2>/dev/null || true
	ip netns del $NS 2>/dev/null || true
}

trap cleanup EXIT

get_skb_active() {
	awk '/skbuff_head_cache/ {print $2}' /proc/slabinfo
}

# Build the packet sender
build_sender() {
	local prog="$1"
	local src="$prog.c"

	cat > "$src" << 'EOF'
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <net/ethernet.h>
#include <arpa/inet.h>
#include <fcntl.h>

int main(int argc, char **argv) {
	const char *ifname = argv[1];
	int count = atoi(argv[2]);
	int fd, i, sent = 0;

	fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
	if (fd < 0) { perror("socket"); return 1; }
	fcntl(fd, F_SETFL, O_NONBLOCK);

	struct sockaddr_ll sa = {};
	sa.sll_family = AF_PACKET;
	sa.sll_protocol = htons(ETH_P_IP);
	sa.sll_ifindex = if_nametoindex(ifname);
	sa.sll_halen = 6;
	memset(sa.sll_addr, 0xff, 6);

	/* Ethernet(14) + truncated IP(10) = 24 bytes.
	 * IP header needs 20 bytes minimum, so pskb_may_pull fails
	 * in tcf_ct_ipv4_is_fragment() -> -EINVAL.
	 */
	unsigned char pkt[24] = {};
	memset(pkt, 0xff, 6);		/* dst = broadcast */
	pkt[12] = 0x08; pkt[13] = 0x00;/* ethertype = IPv4 */
	pkt[14] = 0x45;			/* ver=4, ihl=5 */
	pkt[16] = 0x00; pkt[17] = 0x0a;/* total_len=10 */
	pkt[23] = 0x06;			/* proto=TCP */

	for (i = 0; i < count; i++) {
		if (sendto(fd, pkt, sizeof(pkt), 0,
			   (struct sockaddr *)&sa, sizeof(sa)) >= 0)
			sent++;
	}
	printf("%d\n", sent);
	close(fd);
	return 0;
}
EOF
	gcc -Wall -o "$prog" "$src"
}

echo "=== act_ct skb leak test ==="

# Setup veth pair with namespace
ip netns add $NS
ip link add $DEV type veth peer name $DEV_PEER
ip link set $DEV_PEER netns $NS
ip link set $DEV up
ip netns exec $NS ip link set $DEV_PEER up

# Add act_ct filter on ingress
tc qdisc add dev $DEV clsact
tc filter add dev $DEV ingress protocol ip flower action ct zone 1

# Build sender
SENDER=$(mktemp)
build_sender "$SENDER"

# Record slab state
BEFORE=$(get_skb_active)

# Send truncated packets from namespace
SENT=$(ip netns exec $NS "$SENDER" $DEV_PEER $NUM_PKTS)
sleep 1

# Record slab state again
AFTER=$(get_skb_active)

# Check tc stats
DROPPED=$(tc -s filter show dev $DEV ingress | \
	  awk '/dropped/ {for(i=1;i<=NF;i++) if($i=="dropped") print $(i+1)}' | \
	  tr -d ',')

GROWTH=$((AFTER - BEFORE))

echo "Sent: $SENT packets"
echo "TC dropped: $DROPPED"
echo "skbuff_head_cache: before=$BEFORE after=$AFTER growth=$GROWTH"

rm -f "$SENDER" "${SENDER}.c"

if [ "$GROWTH" -ge "$LEAK_THRESHOLD" ]; then
	echo "FAIL: skb leak detected ($GROWTH objects leaked)"
	exit 1
else
	echo "PASS: no significant skb leak"
	exit 0
fi


* [PATCH net] net: dsa: mt7530: fix .get_stats64 sleeping in atomic context
From: Daniel Golle @ 2026-04-17  3:55 UTC (permalink / raw)
  To: Chester A. Unal, Daniel Golle, Andrew Lunn, Vladimir Oltean,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Matthias Brugger, AngeloGioacchino Del Regno, Russell King,
	Christian Marangi, netdev, linux-kernel, linux-arm-kernel,
	linux-mediatek
  Cc: Frank Wunderlich, John Crispin

The .get_stats64 callback runs in atomic context, but on
MDIO-connected switches every register read acquires the MDIO bus
mutex, which can sleep:
[   12.645973] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:609
[   12.654442] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 759, name: grep
[   12.663377] preempt_count: 0, expected: 0
[   12.667410] RCU nest depth: 1, expected: 0
[   12.671511] INFO: lockdep is turned off.
[   12.675441] CPU: 0 UID: 0 PID: 759 Comm: grep Tainted: G S      W           7.0.0+ #0 PREEMPT
[   12.675453] Tainted: [S]=CPU_OUT_OF_SPEC, [W]=WARN
[   12.675456] Hardware name: Bananapi BPI-R64 (DT)
[   12.675459] Call trace:
[   12.675462]  show_stack+0x14/0x1c (C)
[   12.675477]  dump_stack_lvl+0x68/0x8c
[   12.675487]  dump_stack+0x14/0x1c
[   12.675495]  __might_resched+0x14c/0x220
[   12.675504]  __might_sleep+0x44/0x80
[   12.675511]  __mutex_lock+0x50/0xb10
[   12.675523]  mutex_lock_nested+0x20/0x30
[   12.675532]  mt7530_get_stats64+0x40/0x2ac
[   12.675542]  dsa_user_get_stats64+0x2c/0x40
[   12.675553]  dev_get_stats+0x44/0x1e0
[   12.675564]  dev_seq_printf_stats+0x24/0xe0
[   12.675575]  dev_seq_show+0x14/0x3c
[   12.675583]  seq_read_iter+0x37c/0x480
[   12.675595]  seq_read+0xd0/0xec
[   12.675605]  proc_reg_read+0x94/0xe4
[   12.675615]  vfs_read+0x98/0x29c
[   12.675625]  ksys_read+0x54/0xdc
[   12.675633]  __arm64_sys_read+0x18/0x20
[   12.675642]  invoke_syscall.constprop.0+0x54/0xec
[   12.675653]  do_el0_svc+0x3c/0xb4
[   12.675662]  el0_svc+0x38/0x200
[   12.675670]  el0t_64_sync_handler+0x98/0xdc
[   12.675679]  el0t_64_sync+0x158/0x15c

For MDIO-connected switches, poll MIB counters asynchronously using a
delayed workqueue every second and let .get_stats64 return the cached
values under a per-port spinlock. A mod_delayed_work() call on each
read triggers an immediate refresh so counters stay responsive when
queried more frequently.

MMIO-connected switches (MT7988, EN7581, AN7583) are not affected
because their regmap does not sleep, so they continue to read MIB
counters directly in .get_stats64.

Fixes: 88c810f35ed5 ("net: dsa: mt7530: implement .get_stats64")
Signed-off-by: Daniel Golle <daniel@makrotopia.org>
---
This bug highlights a bigger problem, which is also its actual cause:
locking in the mt7530 driver deserves a cleanup and a refactoring
towards using the regmap API cleanly and directly.
I've already prepared this and am going to submit a series doing
most of that using Coccinelle semantic patches once net-next opens
again.
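
For anyone who wants the caching scheme in isolation, here is a rough
userspace sketch of it. It is a model under stated assumptions: all
model_* names are hypothetical, a C11 atomic_flag stands in for the
kernel spinlock, and copying from a "hw" struct stands in for the slow
MDIO register reads. The point is only that the slow read happens
outside the lock and the reader never does more than a struct copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct model_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

struct model_port {
	atomic_flag lock;		/* stand-in for the stats spinlock */
	struct model_stats cached;	/* last values seen by the poller */
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void model_port_init(struct model_port *p)
{
	assert(p);
	atomic_flag_clear(&p->lock);
	p->cached = (struct model_stats){ 0 };
}

/* Poller (may sleep): read "hardware" first, publish under the lock. */
static void model_poll(struct model_port *p, const struct model_stats *hw)
{
	struct model_stats fresh = *hw;	/* slow read, done unlocked */

	model_lock(&p->lock);
	p->cached = fresh;
	model_unlock(&p->lock);
}

/* Reader (atomic context): only copies the cached snapshot. */
static struct model_stats model_get_stats(struct model_port *p)
{
	struct model_stats out;

	model_lock(&p->lock);
	out = p->cached;
	model_unlock(&p->lock);
	return out;
}
```

The trade-off is that readers can see values that are up to one poll
interval old, which is why the patch also kicks the work on each read.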

 drivers/net/dsa/mt7530.c | 54 +++++++++++++++++++++++++++++++++++++---
 drivers/net/dsa/mt7530.h |  6 +++++
 2 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index b9423389c2ef0..786d3a8492bcb 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -25,6 +25,8 @@
 
 #include "mt7530.h"
 
+#define MT7530_STATS_POLL_INTERVAL	(1 * HZ)
+
 static struct mt753x_pcs *pcs_to_mt753x_pcs(struct phylink_pcs *pcs)
 {
 	return container_of(pcs, struct mt753x_pcs, pcs);
@@ -906,10 +908,9 @@ static void mt7530_get_rmon_stats(struct dsa_switch *ds, int port,
 	*ranges = mt7530_rmon_ranges;
 }
 
-static void mt7530_get_stats64(struct dsa_switch *ds, int port,
-			       struct rtnl_link_stats64 *storage)
+static void mt7530_read_port_stats64(struct mt7530_priv *priv, int port,
+				     struct rtnl_link_stats64 *storage)
 {
-	struct mt7530_priv *priv = ds->priv;
 	uint64_t data;
 
 	/* MIB counter doesn't provide a FramesTransmittedOK but instead
@@ -951,6 +952,43 @@ static void mt7530_get_stats64(struct dsa_switch *ds, int port,
 			       &storage->rx_crc_errors);
 }
 
+static void mt7530_stats_poll(struct work_struct *work)
+{
+	struct mt7530_priv *priv = container_of(work, struct mt7530_priv,
+						stats_work.work);
+	struct rtnl_link_stats64 stats = {};
+	struct dsa_port *dp;
+	int port;
+
+	dsa_switch_for_each_user_port(dp, priv->ds) {
+		port = dp->index;
+
+		mt7530_read_port_stats64(priv, port, &stats);
+
+		spin_lock(&priv->stats_lock);
+		priv->ports[port].stats = stats;
+		spin_unlock(&priv->stats_lock);
+	}
+
+	schedule_delayed_work(&priv->stats_work,
+			      MT7530_STATS_POLL_INTERVAL);
+}
+
+static void mt7530_get_stats64(struct dsa_switch *ds, int port,
+			       struct rtnl_link_stats64 *storage)
+{
+	struct mt7530_priv *priv = ds->priv;
+
+	if (priv->bus) {
+		spin_lock(&priv->stats_lock);
+		*storage = priv->ports[port].stats;
+		spin_unlock(&priv->stats_lock);
+		mod_delayed_work(system_wq, &priv->stats_work, 0);
+	} else {
+		mt7530_read_port_stats64(priv, port, storage);
+	}
+}
+
 static void mt7530_get_eth_ctrl_stats(struct dsa_switch *ds, int port,
 				      struct ethtool_eth_ctrl_stats *ctrl_stats)
 {
@@ -3137,6 +3175,13 @@ mt753x_setup(struct dsa_switch *ds)
 	if (ret && priv->irq_domain)
 		mt7530_free_mdio_irq(priv);
 
+	if (!ret && priv->bus) {
+		spin_lock_init(&priv->stats_lock);
+		INIT_DELAYED_WORK(&priv->stats_work, mt7530_stats_poll);
+		schedule_delayed_work(&priv->stats_work,
+				      MT7530_STATS_POLL_INTERVAL);
+	}
+
 	return ret;
 }
 
@@ -3404,6 +3449,9 @@ EXPORT_SYMBOL_GPL(mt7530_probe_common);
 void
 mt7530_remove_common(struct mt7530_priv *priv)
 {
+	if (priv->bus)
+		cancel_delayed_work_sync(&priv->stats_work);
+
 	if (priv->irq_domain)
 		mt7530_free_mdio_irq(priv);
 
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index 3e0090bed298d..44c1dc75baea8 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -796,6 +796,7 @@ struct mt7530_fdb {
  * @pvid:	The VLAN specified is to be considered a PVID at ingress.  Any
  *		untagged frames will be assigned to the related VLAN.
  * @sgmii_pcs:	Pointer to PCS instance for SerDes ports
+ * @stats:	Cached port statistics for MDIO-connected switches
  */
 struct mt7530_port {
 	bool enable;
@@ -803,6 +804,7 @@ struct mt7530_port {
 	u32 pm;
 	u16 pvid;
 	struct phylink_pcs *sgmii_pcs;
+	struct rtnl_link_stats64 stats;
 };
 
 /* Port 5 mode definitions of the MT7530 switch */
@@ -875,6 +877,8 @@ struct mt753x_info {
  * @create_sgmii:	Pointer to function creating SGMII PCS instance(s)
  * @active_cpu_ports:	Holding the active CPU ports
  * @mdiodev:		The pointer to the MDIO device structure
+ * @stats_lock:		Protects cached per-port stats from concurrent access
+ * @stats_work:		Delayed work for polling MIB counters on MDIO switches
  */
 struct mt7530_priv {
 	struct device		*dev;
@@ -900,6 +904,8 @@ struct mt7530_priv {
 	int (*create_sgmii)(struct mt7530_priv *priv);
 	u8 active_cpu_ports;
 	struct mdio_device *mdiodev;
+	spinlock_t stats_lock; /* protects cached stats counters */
+	struct delayed_work stats_work;
 };
 
 struct mt7530_hw_vlan_entry {
-- 
2.53.0


* [net-next v2 1/5] dt-bindings: net: starfive,jh7110-dwmac: Remove JH8100
From: Minda Chen @ 2026-04-17  2:45 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen
In-Reply-To: <20260417024523.107786-1-minda.chen@starfivetech.com>

Remove the JH8100 dt-bindings because the JH8100 is not supported now.
StarFive has stopped JH8100 development and will release it
outside.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 .../bindings/net/starfive,jh7110-dwmac.yaml   | 28 ++++---------------
 1 file changed, 5 insertions(+), 23 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
index 313a15331661..0d1962980f57 100644
--- a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
@@ -30,10 +30,6 @@ properties:
       - items:
           - const: starfive,jh7110-dwmac
           - const: snps,dwmac-5.20
-      - items:
-          - const: starfive,jh8100-dwmac
-          - const: starfive,jh7110-dwmac
-          - const: snps,dwmac-5.20
 
   reg:
     maxItems: 1
@@ -120,25 +116,11 @@ allOf:
           minItems: 3
           maxItems: 3
 
-      if:
-        properties:
-          compatible:
-            contains:
-              const: starfive,jh8100-dwmac
-      then:
-        properties:
-          resets:
-            maxItems: 1
-
-          reset-names:
-            const: stmmaceth
-      else:
-        properties:
-          resets:
-            minItems: 2
-
-          reset-names:
-            minItems: 2
+        resets:
+          minItems: 2
+
+        reset-names:
+          minItems: 2
 
 unevaluatedProperties: false
 
-- 
2.17.1



* Re: [PATCH bpf v2 2/2] selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
From: KaFai Wan @ 2026-04-17  3:07 UTC (permalink / raw)
  To: Martin KaFai Lau
  Cc: daniel, john.fastabend, sdf, ast, andrii, eddyz87, memxor, song,
	yonghong.song, jolsa, davem, edumazet, kuba, pabeni, horms, shuah,
	jiayuan.chen, bpf, netdev, linux-kernel, linux-kselftest
In-Reply-To: <2026416184330.-HAW.martin.lau@linux.dev>

On Thu, 2026-04-16 at 12:06 -0700, Martin KaFai Lau wrote:
> On Thu, Apr 16, 2026 at 07:23:08PM +0800, KaFai Wan wrote:
> > index 56685fc03c7e..2d738c0c4259 100644
> > --- a/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> > +++ b/tools/testing/selftests/bpf/prog_tests/tcp_hdr_options.c
> > @@ -513,6 +513,59 @@ static void misc(void)
> >  	bpf_link__destroy(link);
> >  }
> >  
> > +static void hdr_sockopt(void)
> > +{
> > +	const char send_msg[] = "MISC!!!";
> > +	char recv_msg[sizeof(send_msg)];
> > +	const unsigned int nr_data = 2;
> > +	struct bpf_link *link;
> > +	struct sk_fds sk_fds;
> > +	int i, ret, true_val = 1;
> > +
> > +	lport_linum_map_fd = bpf_map__fd(misc_skel->maps.lport_linum_map);
> > +
> > +	link = bpf_program__attach_cgroup(misc_skel->progs.misc_hdr_sockopt, cg_fd);
> > +	if (!ASSERT_OK_PTR(link, "attach_cgroup(misc_hdr_sockopt)"))
> > +		return;
> > +
> > +	if (sk_fds_connect(&sk_fds, false)) {
> > +		bpf_link__destroy(link);
> > +		return;
> > +	}
> > +
> > +	ret = setsockopt(sk_fds.active_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
> > +	if (!ASSERT_OK(ret, "setsockopt(TCP_NODELAY) active"))
> > +		goto check_linum;
> > +
> > +	ret = setsockopt(sk_fds.passive_fd, SOL_TCP, TCP_NODELAY, &true_val, sizeof(true_val));
> 
> Why are these two setsockopt(TCP_NODELAY) calls needed?
> 
> Instead of creating a new "void hdr_sockopt(void)", can the test be done in the
> existing "void misc(void)" by doing bpf_setsockopt(TCP_NODELAY) in the
> misc_estab() bpf prog?

Oh, I see. I meant to test on both the active and passive sides. We can only test the active side
in the existing "void misc(void)".
> 
> The PASSIVE_ESTABLISHED_CB can do the bpf_setsockopt(TCP_NODELAY, 0)
> if it wants to keep the same expectation on Nagle. The
> BPF_SOCK_OPS_HDR_OPT_LEN_CB and BPF_SOCK_OPS_WRITE_HDR_OPT_CB
> can do bpf_setsockopt(TCP_NODELAY, 1) to test recursion and
> the error return value.
> 
> >  void test_tcp_hdr_options(void)
> > diff --git a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > index d487153a839d..a8cf7c4e7ed2 100644
> > --- a/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > +++ b/tools/testing/selftests/bpf/progs/test_misc_tcp_hdr_options.c
> > @@ -28,6 +28,12 @@ unsigned int nr_data = 0;
> >  unsigned int nr_syn = 0;
> >  unsigned int nr_fin = 0;
> >  unsigned int nr_hwtstamp = 0;
> > +unsigned int nr_hdr_sockopt_estab = 0;
> > +unsigned int nr_hdr_sockopt_estab_err = 0;
> > +unsigned int nr_hdr_sockopt_len = 0;
> > +unsigned int nr_hdr_sockopt_len_err = 0;
> > +unsigned int nr_hdr_sockopt_write = 0;
> > +unsigned int nr_hdr_sockopt_write_err = 0;
> 
> nr_hdr_sockopt_estab, nr_hdr_sockopt_len, and nr_hdr_sockopt_write
> are unnecessary. These tests have already been covered in some ways.

yes, they are unnecessary in existing misc_estab()
> 
> Mostly a nit. The new counters are used in both connections. Note the
> existing nr_xxx is exclusively used in either active or passive,
> so there is no parallel counting in practice.
> 
> Instead of counting, just use a bool nodelay_est_ok,
> nodelay_hdr_len_err, nodelay_write_err and assert them
> to be true in userspace.

indeed. will fix these in next version.

-- 
Thanks,
KaFai


* Re: [PATCH net v2] NFC: digital: bound SENSF response copy into nfc_target
From: Pengpeng Hou @ 2026-04-17  3:06 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: netdev, David S. Miller, Eric Dumazet, Paolo Abeni, Simon Horman,
	Kees Cook, linux-kernel, pengpeng
In-Reply-To: <20260407120004.4-nfc-sensf-v2-pengpeng@iscas.ac.cn>

Hi Jakub,

Thanks, that makes sense.

I won't resend another bounds-only version as-is. I'll first dig into
why the digital path uses a 19-byte `struct digital_sensf_res` while
the core/UAPI path only carries 18 bytes in `sensf_res`, and follow up
once I have a clearer explanation for what the driver/core boundary
should be here. I'll also shorten the `Fixes:` hash in the next
revision.

Thanks,
Pengpeng


* [net-next v2 4/5] net: stmmac: starfive: Add JHB100 SGMII interface
From: Minda Chen @ 2026-04-17  2:45 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen
In-Reply-To: <20260417024523.107786-1-minda.chen@starfivetech.com>

Add the JHB100 compatible and SGMII support. The JHB100 SoC contains
2 SGMII interfaces integrated with a SerDes PHY. SGMII uses split
TX/RX MAC clocks, and both need to be set to a 2.5M/25M/125M clock
rate in 10M/100M/1000M speed mode.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 .../ethernet/stmicro/stmmac/dwmac-starfive.c  | 54 ++++++++++++++-----
 1 file changed, 42 insertions(+), 12 deletions(-)
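
As a quick illustration of the speed-to-rate mapping described in the
commit message, here is a userspace sketch. It is a model only: the
model_* names and the struct are hypothetical, the rates are the
2.5M/25M/125M values quoted above, and the "unsupported speed returns 0
and leaves the clocks untouched" behaviour mirrors what the new callback
in this patch does.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a clock whose rate can be set. */
struct model_clk {
	long rate;
};

/* Speed in Mbit/s to MAC clock rate in Hz, same values as RGMII. */
static long model_rgmii_clock(int speed)
{
	switch (speed) {
	case 10:
		return 2500000;
	case 100:
		return 25000000;
	case 1000:
		return 125000000;
	default:
		return -EINVAL;
	}
}

/* Apply the same rate to the split TX and RX MAC clocks. */
static int model_sgmii_set_clk_rate(struct model_clk *tx,
				    struct model_clk *rx, int speed)
{
	long rate = model_rgmii_clock(speed);

	assert(tx && rx);
	if (rate < 0)
		return 0;	/* unknown speed: keep the current rates */

	tx->rate = rate;
	rx->rate = rate;
	return 0;
}
```

Setting speed 100 on a pair of clocks moves both to 25 MHz, while an
unsupported speed such as 2500 leaves them as they were.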

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
index 16b955a6d77b..91698c763dac 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-starfive.c
@@ -26,6 +26,7 @@ struct starfive_dwmac_data {
 struct starfive_dwmac {
 	struct device *dev;
 	const struct starfive_dwmac_data *data;
+	struct clk *sgmii_rx;
 };
 
 static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
@@ -68,6 +69,24 @@ static int starfive_dwmac_set_mode(struct plat_stmmacenet_data *plat_dat)
 	return 0;
 }
 
+static int stmmac_starfive_sgmii_set_clk_rate(void *bsp_priv, struct clk *clk_tx_i,
+					      phy_interface_t interface, int speed)
+{
+	struct starfive_dwmac *dwmac = (void *)bsp_priv;
+	long rate = rgmii_clock(speed);
+	int ret;
+
+	/* MAC clock rate the same as RGMII */
+	if (rate < 0)
+		return 0;
+
+	ret = clk_set_rate(clk_tx_i, rate);
+	if (ret)
+		return ret;
+
+	return clk_set_rate(dwmac->sgmii_rx, rate);
+}
+
 static int starfive_dwmac_probe(struct platform_device *pdev)
 {
 	struct plat_stmmacenet_data *plat_dat;
@@ -102,24 +121,34 @@ static int starfive_dwmac_probe(struct platform_device *pdev)
 		return dev_err_probe(&pdev->dev, PTR_ERR(clk_gtx),
 				     "error getting gtx clock\n");
 
-	/* Generally, the rgmii_tx clock is provided by the internal clock,
-	 * which needs to match the corresponding clock frequency according
-	 * to different speeds. If the rgmii_tx clock is provided by the
-	 * external rgmii_rxin, there is no need to configure the clock
-	 * internally, because rgmii_rxin will be adaptively adjusted.
-	 */
-	if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
-		plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
+	if (plat_dat->phy_interface == PHY_INTERFACE_MODE_SGMII) {
+		dwmac->sgmii_rx = devm_clk_get_enabled(&pdev->dev, "sgmii_rx");
+		if (IS_ERR(dwmac->sgmii_rx))
+			return dev_err_probe(&pdev->dev,
+					     PTR_ERR(dwmac->sgmii_rx),
+					     "error getting sgmii rx clock\n");
+		plat_dat->set_clk_tx_rate = stmmac_starfive_sgmii_set_clk_rate;
+	} else {
+		/*
+		 * Generally, the rgmii_tx clock is provided by the internal clock,
+		 * which needs to match the corresponding clock frequency according
+		 * to different speeds. If the rgmii_tx clock is provided by the
+		 * external rgmii_rxin, there is no need to configure the clock
+		 * internally, because rgmii_rxin will be adaptively adjusted.
+		 */
+		if (!device_property_read_bool(&pdev->dev, "starfive,tx-use-rgmii-clk"))
+			plat_dat->set_clk_tx_rate = stmmac_set_clk_tx_rate;
+
+		err = starfive_dwmac_set_mode(plat_dat);
+		if (err)
+			return err;
+	}
 
 	dwmac->dev = &pdev->dev;
 	plat_dat->flags |= STMMAC_FLAG_EN_TX_LPI_CLK_PHY_CAP;
 	plat_dat->bsp_priv = dwmac;
 	plat_dat->dma_cfg->dche = true;
 
-	err = starfive_dwmac_set_mode(plat_dat);
-	if (err)
-		return err;
-
 	return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 }
 
@@ -130,6 +159,7 @@ static const struct starfive_dwmac_data jh7100_data = {
 static const struct of_device_id starfive_dwmac_match[] = {
 	{ .compatible = "starfive,jh7100-dwmac", .data = &jh7100_data },
 	{ .compatible = "starfive,jh7110-dwmac" },
+	{ .compatible = "starfive,jhb100-dwmac" },
 	{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, starfive_dwmac_match);
-- 
2.17.1



* [net-next v2 3/5] dt-bindings: net: starfive,jh7110-dwmac: Add JHB100 sgmii rx clk
From: Minda Chen @ 2026-04-17  2:45 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen
In-Reply-To: <20260417024523.107786-1-minda.chen@starfivetech.com>

The JHB100 SGMII interface has split tx/rx MAC clocks, and their
clock rate must be set for 10M/100M/1000M speed. So a new rx clock
needs to be added in the code, dts and dt binding doc.

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 .../bindings/net/starfive,jh7110-dwmac.yaml   | 42 ++++++++++++++++---
 1 file changed, 36 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
index edc246a71ce3..3802cdbf1848 100644
--- a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
@@ -39,20 +39,26 @@ properties:
     maxItems: 1
 
   clocks:
+    minItems: 5
     items:
       - description: GMAC main clock
       - description: GMAC AHB clock
       - description: PTP clock
       - description: TX clock
       - description: GTX clock
+      - description: SGMII RX clock
 
   clock-names:
-    items:
-      - const: stmmaceth
-      - const: pclk
-      - const: ptp_ref
-      - const: tx
-      - const: gtx
+    minItems: 5
+    maxItems: 6
+    contains:
+      enum:
+        - stmmaceth
+        - pclk
+        - ptp_ref
+        - tx
+        - gtx
+        - sgmii_rx
 
   starfive,tx-use-rgmii-clk:
     description:
@@ -99,6 +105,14 @@ allOf:
           minItems: 2
           maxItems: 2
 
+        clocks:
+          minItems: 5
+          maxItems: 5
+
+        clock-names:
+          minItems: 5
+          maxItems: 5
+
         resets:
           maxItems: 1
 
@@ -120,6 +134,14 @@ allOf:
           minItems: 3
           maxItems: 3
 
+        clocks:
+          minItems: 5
+          maxItems: 5
+
+        clock-names:
+          minItems: 5
+          maxItems: 5
+
         resets:
           minItems: 2
 
@@ -139,6 +161,14 @@ allOf:
         interrupt-names:
           const: macirq
 
+        clocks:
+          minItems: 5
+          maxItems: 6
+
+        clock-names:
+          minItems: 5
+          maxItems: 6
+
         resets:
           maxItems: 1
 
-- 
2.17.1



* [net-next v2 2/5] dt-bindings: net: starfive,jh7110-dwmac: Add JHB100 support
From: Minda Chen @ 2026-04-17  2:45 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen
In-Reply-To: <20260417024523.107786-1-minda.chen@starfivetech.com>

Add StarFive JHB100 dwmac support and compatible.
The JHB100 dwmac shares the same driver code as the JH7110 dwmac,
which contains 2 SGMII interfaces, 1 RGMII/RMII interface and
1 RMII interface.
JHB100 dwmac has only one reset signal and one main interrupt
line.

The reset name is as follows:

JHB100: reset-names = "stmmaceth";

Example usage of JHB100 in the device tree:

gmac0: ethernet@11b80000 {
        compatible = "starfive,jhb100-dwmac",
                     "snps,dwmac-5.20";
        interrupts = <225>;
        interrupt-names = "macirq";
        ...
};

Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
---
 .../devicetree/bindings/net/snps,dwmac.yaml   |  1 +
 .../bindings/net/starfive,jh7110-dwmac.yaml   | 23 +++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/snps,dwmac.yaml b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
index 38bc34dc4f09..85cd3252e8b1 100644
--- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml
@@ -115,6 +115,7 @@ properties:
         - sophgo,sg2044-dwmac
         - starfive,jh7100-dwmac
         - starfive,jh7110-dwmac
+        - starfive,jhb100-dwmac
         - tesla,fsd-ethqos
         - thead,th1520-gmac
 
diff --git a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
index 0d1962980f57..edc246a71ce3 100644
--- a/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
+++ b/Documentation/devicetree/bindings/net/starfive,jh7110-dwmac.yaml
@@ -18,6 +18,7 @@ select:
         enum:
           - starfive,jh7100-dwmac
           - starfive,jh7110-dwmac
+          - starfive,jhb100-dwmac
   required:
     - compatible
 
@@ -30,6 +31,9 @@ properties:
       - items:
           - const: starfive,jh7110-dwmac
           - const: snps,dwmac-5.20
+      - items:
+          - const: starfive,jhb100-dwmac
+          - const: snps,dwmac-5.20
 
   reg:
     maxItems: 1
@@ -122,6 +126,25 @@ allOf:
         reset-names:
           minItems: 2
 
+  - if:
+      properties:
+        compatible:
+          contains:
+            const: starfive,jhb100-dwmac
+    then:
+      properties:
+        interrupts:
+          maxItems: 1
+
+        interrupt-names:
+          const: macirq
+
+        resets:
+          maxItems: 1
+
+        reset-names:
+          const: stmmaceth
+
 unevaluatedProperties: false
 
 examples:
-- 
2.17.1


^ permalink raw reply related

* [net-next v2 0/5] Add StarFive JHB100 soc SGMII GMAC support
From: Minda Chen @ 2026-04-17  2:45 UTC (permalink / raw)
  To: Alexandre Torgue, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Maxime Coquelin,
	Emil Renner Berthing, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, netdev
  Cc: linux-kernel, linux-stm32, devicetree, Minda Chen

JHB100 is a new StarFive RISC-V SoC for datacenter BMC (Baseboard
Management Controller) use, similar to the Aspeed 27x0.

The JHB100 minimal system upstream is in progress:
https://patchwork.kernel.org/project/linux-riscv/cover/20260403054945.467700-1-changhuang.liang@starfivetech.com/

The JHB100 GMAC still uses the DesignWare GMAC core, like JH7100 and
JH7110, and contains 2 SGMII interfaces, 1 RGMII/RMII interface and 1 RMII
interface. dwmac-starfive.c already supports the RGMII/RMII interfaces for
JH7100/JH7110, so SGMII support needs to be added to it for JHB100.

The SGMII serdes PHY is integrated in JHB100 and has no driver
settings.

On the JHB100 EVB board, SGMII connects to an external Motorcomm YT8531s
PHY and provides an RJ45 Ethernet port.

The patches are based on 7.0-rc5.

Changes in v2:
1. patch1: Add the reason for the removal.
2. patch2: Rename the rx clock to sgmii_rx.
3. patch4: Check that the sgmii rx clock exists, otherwise probe fails.
   SGMII does not call starfive_dwmac_set_mode().

Minda Chen (5):
  dt-bindings: net: starfive,jh7110-dwmac: Remove JH8100
  dt-bindings: net: starfive,jh7110-dwmac: Add JHB100 support
  dt-bindings: net: starfive,jh7110-dwmac: Add JHB100 sgmii rx clk
  net: stmmac: starfive: Add JHB100 SGMII interface
  net: stmmac: starfive: Add STMMAC_FLAG_SPH_DISABLE flag

 .../devicetree/bindings/net/snps,dwmac.yaml   |  1 +
 .../bindings/net/starfive,jh7110-dwmac.yaml   | 89 +++++++++++++------
 .../ethernet/stmicro/stmmac/dwmac-starfive.c  | 56 +++++++++---
 3 files changed, 106 insertions(+), 40 deletions(-)

base-commit: c369299895a591d96745d6492d4888259b004a9e
-- 
2.17.1

^ permalink raw reply

* Re: [PATCH bpf v2 1/2] bpf: Reject TCP_NODELAY in TCP header option callbacks
From: Jiayuan Chen @ 2026-04-17  2:43 UTC (permalink / raw)
  To: KaFai Wan, martin.lau, daniel, john.fastabend, sdf, ast, andrii,
	eddyz87, memxor, song, yonghong.song, jolsa, davem, edumazet,
	kuba, pabeni, horms, shuah, jiayuan.chen, bpf, netdev,
	linux-kernel, linux-kselftest
  Cc: Quan Sun, Yinhao Hu, Kaiyan Mei
In-Reply-To: <20260416112308.1820332-2-kafai.wan@linux.dev>


On 4/16/26 7:23 PM, KaFai Wan wrote:
> A BPF_SOCK_OPS program can enable
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB_FLAG and then call
> bpf_setsockopt(TCP_NODELAY) from BPF_SOCK_OPS_HDR_OPT_LEN_CB or
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
>
> In these callbacks, bpf_setsockopt(TCP_NODELAY) can reach
> __tcp_sock_set_nodelay(), which can call tcp_push_pending_frames().
>
>  From BPF_SOCK_OPS_HDR_OPT_LEN_CB, tcp_push_pending_frames() can call
> tcp_current_mss(), which calls tcp_established_options() and re-enters
> bpf_skops_hdr_opt_len().
>
> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>    -> bpf_setsockopt(TCP_NODELAY)
>      -> tcp_push_pending_frames()
>        -> tcp_current_mss()
>          -> tcp_established_options()
>            -> bpf_skops_hdr_opt_len()
>              -> BPF_SOCK_OPS_HDR_OPT_LEN_CB
>
>  From BPF_SOCK_OPS_WRITE_HDR_OPT_CB, tcp_push_pending_frames() can call
> tcp_write_xmit(), which calls tcp_transmit_skb().  That path recomputes
> header option length through tcp_established_options() and
> bpf_skops_hdr_opt_len() before re-entering bpf_skops_write_hdr_opt().
>
> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
>    -> bpf_setsockopt(TCP_NODELAY)
>      -> tcp_push_pending_frames()
>        -> tcp_write_xmit()
>          -> tcp_transmit_skb()
>            -> tcp_established_options()
>              -> bpf_skops_hdr_opt_len()
>            -> bpf_skops_write_hdr_opt()
>              -> BPF_SOCK_OPS_WRITE_HDR_OPT_CB
>
> This leads to unbounded recursion and can overflow the kernel stack.
>
> Reject TCP_NODELAY with -EOPNOTSUPP in bpf_sock_ops_setsockopt()
> when bpf_setsockopt() is called from
> BPF_SOCK_OPS_HDR_OPT_LEN_CB or BPF_SOCK_OPS_WRITE_HDR_OPT_CB.
>
> Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn>
> Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
> Closes: https://lore.kernel.org/bpf/d1d523c9-6901-4454-a183-94462b8f3e4e@std.uestc.edu.cn/
> Fixes: 7e41df5dbba2 ("bpf: Add a few optnames to bpf_setsockopt")
> Signed-off-by: KaFai Wan <kafai.wan@linux.dev>
> ---
>   net/core/filter.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index fcfcb72663ca..911ff04bca5a 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -5833,6 +5833,11 @@ BPF_CALL_5(bpf_sock_ops_setsockopt, struct bpf_sock_ops_kern *, bpf_sock,
>   	if (!is_locked_tcp_sock_ops(bpf_sock))
>   		return -EOPNOTSUPP;
>   
> +	if ((bpf_sock->op == BPF_SOCK_OPS_HDR_OPT_LEN_CB ||
> +	     bpf_sock->op == BPF_SOCK_OPS_WRITE_HDR_OPT_CB) &&
> +	    IS_ENABLED(CONFIG_INET) && level == SOL_TCP && optname == TCP_NODELAY)
> +		return -EOPNOTSUPP;
> +
>   	return _bpf_setsockopt(bpf_sock->sk, level, optname, optval, optlen);
>   }
>   

A simple comment is recommended:

/* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these
 * callbacks.
 */

Also, as Martin pointed out before, BPF_SOCK_OPS_HDR_OPT_LEN_CB /
BPF_SOCK_OPS_WRITE_HDR_OPT_CB can only be produced under CONFIG_INET, so
IS_ENABLED(CONFIG_INET) is dead code.
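
For illustration only, here is a minimal user-space sketch of the guard
with both review points applied (comment added, CONFIG_INET test dropped).
It is not the kernel hunk: every demo_/DEMO_ name and value below is a
hypothetical stand-in for the real bpf_sock_ops and TCP constants.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's sock_ops callback ids. */
enum demo_sock_op {
	DEMO_OP_OTHER,
	DEMO_OP_HDR_OPT_LEN_CB,
	DEMO_OP_WRITE_HDR_OPT_CB,
};

/* Hypothetical stand-ins for SOL_TCP and two TCP option names. */
enum { DEMO_SOL_TCP = 6, DEMO_TCP_NODELAY = 1, DEMO_TCP_MAXSEG = 2 };

/* True for the two callbacks that run while TCP header options are built. */
static bool demo_in_hdr_opt_cb(enum demo_sock_op op)
{
	return op == DEMO_OP_HDR_OPT_LEN_CB || op == DEMO_OP_WRITE_HDR_OPT_CB;
}

/*
 * Model of the check: returns -EOPNOTSUPP when the option must be refused,
 * or 0 when the call would be passed on to the normal setsockopt path.
 */
static int demo_sock_ops_setsockopt_check(enum demo_sock_op op, int level,
					  int optname)
{
	/* TCP_NODELAY triggers tcp_push_pending_frames() and re-enters these
	 * callbacks.
	 */
	if (demo_in_hdr_opt_cb(op) &&
	    level == DEMO_SOL_TCP && optname == DEMO_TCP_NODELAY)
		return -EOPNOTSUPP;

	return 0;
}
```

In this model only the (header option callback, TCP_NODELAY) pair is
refused; other options and other callbacks still go through.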


^ permalink raw reply

* Re: [PATCH net v2] net: pse-pd: fix out-of-bounds bitmap access in pse_isr() on 32-bit
From: patchwork-bot+netdevbpf @ 2026-04-17  2:40 UTC (permalink / raw)
  To: Kory Maincent
  Cc: kuba, netdev, linux-kernel, github, thomas.petazzoni, o.rempel,
	andrew+netdev, davem, edumazet, pabeni
In-Reply-To: <20260415130300.806152-1-kory.maincent@bootlin.com>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Apr 2026 15:02:59 +0200 you wrote:
> In pse_isr(), notifs_mask was declared as a single unsigned long on the
> stack (32 bits on 32-bit architectures). For PSE controllers with more
> than 32 ports, this causes two problems:
> 
> - map_event callbacks could write bit positions >= 32 via
>   *notifs_mask |= BIT(i), which is undefined behaviour on a 32-bit
>   unsigned long and corrupts adjacent stack memory.
> 
> [...]
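
As a rough user-space illustration of the problem class described above
(not necessarily how the applied patch fixes it), a notification mask can
be sized from the port count instead of being a single unsigned long. All
demo_/DEMO_ names and the port count of 48 are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define DEMO_BITS_TO_LONGS(n) \
	(((n) + DEMO_BITS_PER_LONG - 1) / DEMO_BITS_PER_LONG)
#define DEMO_MAX_PORTS 48 /* hypothetical; deliberately more than 32 */

/* Enough words for every port, whatever the width of unsigned long. */
struct demo_notifs {
	unsigned long words[DEMO_BITS_TO_LONGS(DEMO_MAX_PORTS)];
};

/* Set the bit for one port; refuses out-of-range ports instead of shifting. */
static bool demo_set_port(struct demo_notifs *n, size_t port)
{
	if (port >= DEMO_MAX_PORTS)
		return false;
	n->words[port / DEMO_BITS_PER_LONG] |=
		1UL << (port % DEMO_BITS_PER_LONG);
	return true;
}

/* Report whether the bit for one port is set. */
static bool demo_test_port(const struct demo_notifs *n, size_t port)
{
	if (port >= DEMO_MAX_PORTS)
		return false;
	return (n->words[port / DEMO_BITS_PER_LONG] >>
		(port % DEMO_BITS_PER_LONG)) & 1UL;
}
```

The shift amount is always reduced modulo the word width, so port 40 lands
in the second word on a 32-bit build and in the first word on a 64-bit one.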

Here is the summary with links:
  - [net,v2] net: pse-pd: fix out-of-bounds bitmap access in pse_isr() on 32-bit
    https://git.kernel.org/netdev/net/c/5099807f335c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] net: dsa: remove redundant netdev_lock_ops() from conduit ethtool ops
From: patchwork-bot+netdevbpf @ 2026-04-17  2:40 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: netdev, davem, edumazet, kuba, pabeni, andrew, olteanv, horms,
	sdf, linux-kernel, maxime.chevallier
In-Reply-To: <20260414231035.1917035-1-sdf@fomichev.me>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 14 Apr 2026 16:10:35 -0700 you wrote:
> DSA replaces the conduit (master) device's ethtool_ops with its own
> wrappers that aggregate stats from both the conduit and DSA switch
> ports. Taking the lock again inside the DSA wrappers causes a deadlock.
> 
> Stumbled upon this when booting qemu with fbnic and CONFIG_NET_DSA_LOOP=y
> (which looks like some kind of testing device that auto-populates the ports
> of eth0). `ethtool -i` is enough to deadlock. This means we have basically zero
> coverage for DSA stuff with real ops locked devs.
> 
> [...]

Here is the summary with links:
  - [net] net: dsa: remove redundant netdev_lock_ops() from conduit ethtool ops
    https://git.kernel.org/netdev/net/c/0f99e0c3e19b

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH net] selftests: net: add missing CMAC to tcp_ao config
From: patchwork-bot+netdevbpf @ 2026-04-17  2:40 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, netdev, edumazet, pabeni, andrew+netdev, horms, shuah, fw,
	antonio, matttbe, phil, linux-kselftest
In-Reply-To: <20260416010439.1053587-1-kuba@kernel.org>

Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Apr 2026 18:04:39 -0700 you wrote:
> Recent changes to crypto and wifi made CMAC no longer
> selected by default on x86 and tcp_ao needs it.
> Add the missing config.
> 
> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
> ---
> CC: shuah@kernel.org
> CC: fw@strlen.de
> CC: antonio@openvpn.net
> CC: matttbe@kernel.org
> CC: phil@nwl.cc
> CC: linux-kselftest@vger.kernel.org
> 
> [...]

Here is the summary with links:
  - [net] selftests: net: add missing CMAC to tcp_ao config
    https://git.kernel.org/netdev/net/c/82c21069028c

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [PATCH v2 net 0/2] net: enetc: fix command BD ring issues
From: patchwork-bot+netdevbpf @ 2026-04-17  2:40 UTC (permalink / raw)
  To: Wei Fang
  Cc: claudiu.manoil, vladimir.oltean, xiaoning.wang, andrew+netdev,
	davem, edumazet, kuba, pabeni, chleroy, netdev, linux-kernel, imx,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <20260415060833.2303846-1-wei.fang@nxp.com>

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed, 15 Apr 2026 14:08:31 +0800 you wrote:
> Currently, the implementation of command BD ring has two issues, one is
> that the driver may obtain wrong consumer index of the ring, because the
> driver does not mask out the SBE bit of the CIR value, so a wrong index
> will be obtained when a SBE error occurs. The other one is that the DMA
> buffer may be used after free. If netc_xmit_ntmp_cmd() times out and
> returns an error, the pending command is not explicitly aborted, while
> ntmp_free_data_mem() unconditionally frees the DMA buffer. If the buffer
> has already been reallocated elsewhere, this may lead to silent memory
> corruption, because the hardware eventually processes the pending command
> and performs a DMA write of the response to the physical address of the
> freed buffer. This patch set fixes these two issues.
> 
> [...]

Here is the summary with links:
  - [v2,net,1/2] net: enetc: correct the command BD ring consumer index
    https://git.kernel.org/netdev/net/c/759a32900b6f
  - [v2,net,2/2] net: enetc: fix NTMP DMA use-after-free issue
    https://git.kernel.org/netdev/net/c/3cade698881e

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply
