Netdev List
* [PATCH iwl-net v3 4/6] ixgbe: fix cls_u32 nexthdr path returning success when no entry installed
From: Aleksandr Loktionov @ 2026-04-15 14:28 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Simon Horman, Marcin Szycik
In-Reply-To: <20260415142841.3222399-1-aleksandr.loktionov@intel.com>

ixgbe_configure_clsu32() returns 0 (success) after the nexthdr loop
even when ixgbe_clsu32_build_input() fails for every candidate entry
and no jump-table slot is actually programmed.  Callers that test the
return value would then falsely believe the filter was installed.

The variable 'err' already tracks the last ixgbe_clsu32_build_input()
return value; if the loop completes with a successful break, err is 0.
If all attempts failed, err holds the last failure code.  Change the
unconditional 'return 0' to 'return err' so errors are propagated
correctly.

Fixes: 1cdaaf5405ba ("ixgbe: Match on multiple headers for cls_u32 offloads")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
---
v2 -> v3:
 - Add Reviewed-by: Simon Horman; no code change.

v1 -> v2:
 - Add Fixes: tag; reroute from iwl-next to iwl-net (false-success
   return is a user-visible correctness bug, not a cleanup).

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 210c7b9..6e7f8a9 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -10311,7 +10311,7 @@ static int ixgbe_configure_clsu32(struct ixgbe_adapter *adapter,
 				kfree(jump);
 			}
 		}
-		return 0;
+		return err;
 	}
 
 	input = kzalloc_obj(*input);
-- 
2.52.0

^ permalink raw reply related
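The error-propagation pattern described in the 4/6 commit message can be sketched in userspace as follows. This is only an illustration: build_input() and install_first_match() are hypothetical stand-ins, not driver functions, and the initial value of err is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for ixgbe_clsu32_build_input(): in this sketch
 * only even candidate ids can be programmed.
 */
static int build_input(int id)
{
	return (id % 2 == 0) ? 0 : -EINVAL;
}

/* Try each candidate in order and stop at the first one that succeeds.
 * Returns 0 if an entry was installed, otherwise the last error seen.
 */
static int install_first_match(const int *ids, size_t n)
{
	int err = -EINVAL;	/* assumption: an empty candidate list is a failure */
	size_t i;

	assert(ids != NULL || n == 0);

	for (i = 0; i < n; i++) {
		err = build_input(ids[i]);
		if (!err)
			break;	/* installed, stop searching */
	}

	/* Returning a constant 0 here would hide the all-failed case. */
	return err;
}
```

With this shape, a caller sees 0 only when some candidate was actually installed, and otherwise receives the last failure code.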

* [PATCH iwl-net v3 5/6] ixgbe: fix ITR value overflow in adaptive interrupt throttling
From: Aleksandr Loktionov @ 2026-04-15 14:28 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov; +Cc: netdev
In-Reply-To: <20260415142841.3222399-1-aleksandr.loktionov@intel.com>

ixgbe_update_itr() packs a mode flag (IXGBE_ITR_ADAPTIVE_LATENCY,
bit 7) and a usecs delay (bits [6:0]) into an unsigned int, then
stores the combined value in ring_container->itr which is declared as
u8.  Values above 0xFF wrap on truncation, corrupting both the delay
and the mode flag on the next readback.

Keep the mode bit (IXGBE_ITR_ADAPTIVE_LATENCY) and the usec delay as
separate operands in the final store expression.  Clamp only the usecs
portion to [IXGBE_ITR_ADAPTIVE_MIN_USECS, IXGBE_ITR_ADAPTIVE_MAX_USECS]
using clamp_val() so that:
 - overflow cannot bleed into the mode bit (bit 7),
 - the delay cannot exceed 126 us (IXGBE_ITR_ADAPTIVE_MAX_USECS),
 - the delay cannot drop below 10 us (IXGBE_ITR_ADAPTIVE_MIN_USECS).

Fixes: b4ded8327fea ("ixgbe: Update adaptive ITR algorithm")
Cc: stable@vger.kernel.org
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
v2 -> v3:
 - Use clamp_val() instead of min_t() to also guard the lower bound
   (IXGBE_ITR_ADAPTIVE_MIN_USECS); keep mode and delay as separate
   operands until final store; use IXGBE_ITR_ADAPTIVE_MAX_USECS (126)
   as upper bound instead of IXGBE_ITR_ADAPTIVE_LATENCY - 1 (127)
   (Simon Horman).

v1 -> v2:
 - Add proper [N/M] numbering so patchwork tracks it as part of the set;
   no code change.

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 10 +++++++---
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 210c7b9..9f3ae21 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2886,11 +2886,17 @@ static void ixgbe_update_itr(struct ixgbe_q_vector *q_vector,
 				    IXGBE_ITR_ADAPTIVE_MIN_INC * 64) *
 		       IXGBE_ITR_ADAPTIVE_MIN_INC;
 		break;
 	}
 
 clear_counts:
-	/* write back value */
-	ring_container->itr = itr;
+	/* Separate mode bit (IXGBE_ITR_ADAPTIVE_LATENCY) from usec delay;
+	 * clamp delay to [MIN_USECS, MAX_USECS] before storing to prevent
+	 * u8 truncation from corrupting the mode flag or delay on readback.
+	 */
+	ring_container->itr = (itr & IXGBE_ITR_ADAPTIVE_LATENCY) |
+		clamp_val(itr & ~IXGBE_ITR_ADAPTIVE_LATENCY,
+			  IXGBE_ITR_ADAPTIVE_MIN_USECS,
+			  IXGBE_ITR_ADAPTIVE_MAX_USECS);
 
 	/* next update should occur within next jiffy */
 	ring_container->next_update = next_update + 1;
-- 
2.52.0

^ permalink raw reply related
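A minimal userspace sketch of the store expression in 5/6, assuming the constant values quoted in the commit message (mode flag in bit 7, delay limits of 10 and 126 usecs). The SKETCH_ macros and pack_itr() are illustrative names, not driver symbols, and the clamp is open-coded instead of using the kernel's clamp_val().

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the commit message; names are local to this sketch. */
#define SKETCH_ITR_ADAPTIVE_LATENCY	0x80u	/* mode flag, bit 7 */
#define SKETCH_ITR_ADAPTIVE_MIN_USECS	10u
#define SKETCH_ITR_ADAPTIVE_MAX_USECS	126u

/* Pack mode bit and delay into a u8 without letting the delay overflow. */
static uint8_t pack_itr(unsigned int itr)
{
	unsigned int mode = itr & SKETCH_ITR_ADAPTIVE_LATENCY;
	unsigned int usecs = itr & ~SKETCH_ITR_ADAPTIVE_LATENCY;

	/* Open-coded equivalent of clamping usecs to [MIN, MAX]. */
	if (usecs < SKETCH_ITR_ADAPTIVE_MIN_USECS)
		usecs = SKETCH_ITR_ADAPTIVE_MIN_USECS;
	if (usecs > SKETCH_ITR_ADAPTIVE_MAX_USECS)
		usecs = SKETCH_ITR_ADAPTIVE_MAX_USECS;

	/* After the clamp the result always fits in eight bits. */
	assert((mode | usecs) <= 0xFFu);
	return (uint8_t)(mode | usecs);
}
```

For example, an input of 300 is stored as 126 in this sketch, whereas a plain u8 assignment would truncate it to 44.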

* [PATCH iwl-net v3 6/6] ixgbe: fix integer overflow and wrong bit position in ixgbe_validate_rtr()
From: Aleksandr Loktionov @ 2026-04-15 14:28 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Simon Horman
In-Reply-To: <20260415142841.3222399-1-aleksandr.loktionov@intel.com>

Two bugs in the same loop in ixgbe_validate_rtr():

1. The 3-bit traffic-class field was extracted by shifting a u32 and
   assigning the result directly to a u8.  For user priority 0 this is
   harmless; for UP[5..7] the shift leaves bits [15..21] in the u32
   which are then silently truncated when stored in u8.  Mask with
   IXGBE_RTRUP2TC_UP_MASK before the assignment so only the intended
   3 bits are kept.

2. When clearing an out-of-bounds entry the mask was always shifted by
   the fixed constant IXGBE_RTRUP2TC_UP_SHIFT (== 3), regardless of
   which loop iteration was being processed.  This means only UP1 (bit
   position 3) was ever cleared; UP0,2..7 (positions 0, 6, 9, ..., 21)
   were left unreset, so invalid TC mappings persisted in hardware and
   could mis-steer received packets to the wrong traffic class.
   Use i * IXGBE_RTRUP2TC_UP_SHIFT to target the correct 3-bit field
   for each iteration.

Swap the operand order in the mask expression to place the constant
on the right per kernel coding style (noted by David Laight).

Fixes: 8b1c0b24d9af ("ixgbe: configure minimal packet buffers to support TC")
Cc: stable@vger.kernel.org
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
v2 -> v3:
 - Correct Fixes: tag to 8b1c0b24d9af ("ixgbe: configure minimal packet
   buffers to support TC") -- the previously used e7589eab9291 predates
   the buggy code path (Simon Horman); add Reviewed-by: Simon Horman.

v1 -> v2:
 - Add Fixes: tag; reroute to iwl-net (wrong bit positions cause packet
   mis-steering); swap to (reg >> ...) & MASK operand order per David
   Laight.

 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 210c7b9..c9e4f12 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -9772,11 +9772,12 @@ static void ixgbe_validate_rtr(struct ixgbe_adapter *adapter, u8 tc)
 	rsave = reg;
 
 	for (i = 0; i < MAX_TRAFFIC_CLASS; i++) {
-		u8 up2tc = reg >> (i * IXGBE_RTRUP2TC_UP_SHIFT);
+		u8 up2tc = (reg >> (i * IXGBE_RTRUP2TC_UP_SHIFT)) &
+			   IXGBE_RTRUP2TC_UP_MASK;
 
 		/* If up2tc is out of bounds default to zero */
 		if (up2tc > tc)
-			reg &= ~(0x7 << IXGBE_RTRUP2TC_UP_SHIFT);
+			reg &= ~(IXGBE_RTRUP2TC_UP_MASK << (i * IXGBE_RTRUP2TC_UP_SHIFT));
 	}
 
 	if (reg != rsave)
-- 
2.52.0

^ permalink raw reply related
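The per-iteration field handling fixed in 6/6 can be modelled in userspace as below. This is a sketch under the values stated in the patch (3-bit fields, shift of 3, eight user priorities); sanitize_up2tc() and the SKETCH_ macros are hypothetical names, and the register is passed in and returned rather than read from hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Values as described in the patch; names are local to this sketch. */
#define SKETCH_UP_SHIFT		3u
#define SKETCH_UP_MASK		0x7u
#define SKETCH_MAX_TC		8u

/* Reset every user-priority field whose TC value is above 'tc'. */
static uint32_t sanitize_up2tc(uint32_t reg, uint8_t tc)
{
	unsigned int i;

	assert(tc <= SKETCH_UP_MASK);

	for (i = 0; i < SKETCH_MAX_TC; i++) {
		unsigned int shift = i * SKETCH_UP_SHIFT;
		uint8_t up2tc = (reg >> shift) & SKETCH_UP_MASK;

		/* Clear the field for this iteration, not a fixed one. */
		if (up2tc > tc)
			reg &= ~(SKETCH_UP_MASK << shift);
	}

	return reg;
}
```

With the old fixed shift, only the field at bit position 3 could ever be cleared; here an out-of-range value in UP2 or UP7 is reset as well.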

* Re: [PATCH net-next 3/3] net/ethernet/zte/dinghai: add hardware register access and PCI capability scanning
From: Andrew Lunn @ 2026-04-15 14:31 UTC (permalink / raw)
  To: Junyang Han
  Cc: netdev, davem, andrew+netdev, edumazet, kuba, pabeni, ran.ming,
	han.chengfei, zhang.yanze
In-Reply-To: <20260415015334.2018453-3-han.junyang@zte.com.cn>

> +int32_t zxdh_pf_pci_find_capability(struct pci_dev *pdev, uint8_t cfg_type,
> +                    uint32_t ioresource_types, int32_t *bars)
> +{
> +    int32_t pos = 0;
> +    uint8_t type = 0;
> +    uint8_t bar = 0;
> +
> +    for (pos = pci_find_capability(pdev, PCI_CAP_ID_VNDR); pos > 0;
> +         pos = pci_find_next_capability(pdev, pos, PCI_CAP_ID_VNDR)) {
> +        pci_read_config_byte(pdev, pos + offsetof
> (struct zxdh_pf_pci_cap, cfg_type), &type);
> +        pci_read_config_byte(pdev, pos + offsetof
> (struct zxdh_pf_pci_cap, bar), &bar);

Something odd going on with indentation? Has the mailer corrupted it?

> +
> +        /* ignore structures with reserved BAR values */
> +        if (bar > ZXDH_PF_MAX_BAR_VAL)
> +            continue;
> +
> +        if (type == cfg_type) {
> +            if (pci_resource_len(pdev, bar) &&
> +                pci_resource_flags(pdev, bar) & ioresource_types) {
> +                *bars |= (1 << bar);
> +                return pos;
> +            }
> +        }
> +    }
> +
> +    return 0;
> +}
> +
> +void __iomem *zxdh_pf_map_capability(struct dh_core_dev *dh_dev, int32_t off,
> +                     size_t minlen, uint32_t align,
> +                     uint32_t start, uint32_t size,
> +                     size_t *len, resource_size_t *pa,
> +                     uint32_t *bar_off)
> +    p = pci_iomap_range(pdev, bar, offset, length);
> +    if (unlikely(!p)) {

Is this hot path? Please only use unlikely() when dealing with frames
in the hot path.

> +int32_t zxdh_pf_common_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +    int32_t common = 0;
> +    struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
> +    struct pci_dev *pdev = dh_dev->pdev;
> +
> +    /* check for a common config: if not, use legacy mode (bar 0). */
> +    common = zxdh_pf_pci_find_capability(pdev, ZXDH_PCI_CAP_COMMON_CFG,
> +                         IORESOURCE_IO | IORESOURCE_MEM,
> +                         &pf_dev->modern_bars);
> +    if (common == 0) {
> +        LOG_ERR("missing capabilities %i, leaving for legacy driver\
> n", common);
> +        return -ENODEV;
> +    }
> +
> +    pf_dev->common = zxdh_pf_map_capability(dh_dev, common,
> +                        sizeof(struct zxdh_pf_pci_common_cfg),
> +                        ZXDH_PF_ALIGN4, 0,
> +                        sizeof(struct zxdh_pf_pci_common_cfg),
> +                        NULL, NULL, NULL);
> +    if (unlikely(!pf_dev->common)) {
> +        LOG_ERR("pf_dev->common is null\n");
> +        return -EINVAL;
> +    }
> +
> +    return 0;
> +}

> +int32_t zxdh_pf_notify_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +    /* We don't know how many VQs we'll map, ahead of the time.
> +     * If notify length is small, map it all now. Otherwise, map each VQ individually later.
> +     */
> +    if ((uint64_t)notify_length + (notify_offset % PAGE_SIZE) <= PAGE_SIZE) {

Please try to avoid casts. They suggest the types are wrong. You will
probably have better code if you don't need the cast.

> +int32_t zxdh_pf_modern_cfg_init(struct dh_core_dev *dh_dev)
> +{
> +    int32_t ret = 0;
> +    struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
> +    struct pci_dev *pdev = dh_dev->pdev;
> +
> +    ret = zxdh_pf_common_cfg_init(dh_dev);
> +    if (ret != 0) {

if (ret)

would be more normal.

> +void zxdh_pf_get_vf_mac
> (struct dh_core_dev *dh_dev, uint8_t *mac, int32_t vf_id)
> +{
> +    uint32_t DEV_MAC_L = 0;
> +    uint16_t DEV_MAC_H = 0;
> +    struct zxdh_pf_device *pf_dev = dh_core_priv(dh_dev);
> +
> +    if (pf_dev->pf_sriov_cap_base) {
> +        DEV_MAC_L = ioread32((void __iomem *)(pf_dev->pf_sriov_cap_base +
> +                     (pf_dev->sriov_bar_size) * vf_id +
> +                     pf_dev->dev_cfg_bar_off));

Is the cast needed? pf_dev->pf_sriov_cap_base should already be
void __iomem *.

	Andrew

^ permalink raw reply
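As one possible illustration of the "avoid casts" comment on the notify length check, the comparison can be rearranged so that no widening cast is needed. This is a sketch, not something the review prescribes: the uint32_t parameter types, the 4 KiB page size and the function name are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* True if [notify_offset, notify_offset + notify_length) stays within
 * the page that contains notify_offset.
 */
static bool notify_fits_in_page(uint32_t notify_offset, uint32_t notify_length)
{
	uint32_t in_page = notify_offset % SKETCH_PAGE_SIZE;

	/* in_page is always below the page size, so the subtraction
	 * below cannot wrap and no 64-bit cast is required.
	 */
	assert(in_page < SKETCH_PAGE_SIZE);
	return notify_length <= SKETCH_PAGE_SIZE - in_page;
}
```

The result matches the original "length + offset-in-page <= PAGE_SIZE" test, but the arithmetic stays in the operands' own type.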

* Re: [PATCH] xfrm: iptfs: fix deadlock in iptfs_destroy_state
From: Simon Horman @ 2026-04-15 14:35 UTC (permalink / raw)
  To: phx0fer; +Cc: 'Simon Horman', netdev, steffen.klassert, herbert, davem
In-Reply-To: <20260413085138.72623-1-phx0fer@gmail.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
xfrm: iptfs: fix deadlock in iptfs_destroy_state

This commit resolves an ABBA deadlock in iptfs_destroy_state() by moving
the cancellation of iptfs_timer and drop_timer outside of their
respective spinlocks.

> diff --git a/net/xfrm/xfrm_iptfs.c b/net/xfrm/xfrm_iptfs.c
> index 97bc979e55baf..11291b87158cb 100644
> --- a/net/xfrm/xfrm_iptfs.c
> +++ b/net/xfrm/xfrm_iptfs.c
> @@ -2708,8 +2708,10 @@ static void iptfs_destroy_state(struct xfrm_state *x)
>  	if (!xtfs)
>  		return;
>  
> -	spin_lock_bh(&xtfs->x->lock);
>  	hrtimer_cancel(&xtfs->iptfs_timer);
> +	hrtimer_cancel(&xtfs->drop_timer);
> +
> +	spin_lock_bh(&xtfs->x->lock);
>  	__skb_queue_head_init(&list);
>  	skb_queue_splice_init(&xtfs->queue, &list);
>  	spin_unlock_bh(&xtfs->x->lock);

Is it safe to call hrtimer_cancel() and skb_queue_splice_init() on xtfs
here if it was created via iptfs_clone_state() but failed initialization?

If iptfs_clone_state() uses kmemdup() to shallow-copy the xtfs structure
from the original state, the copy includes xtfs->queue and the hrtimer
structs (iptfs_timer and drop_timer). Reinitialization only happens later
in iptfs_init_state() -> __iptfs_init_state().

If x->type->init_state() fails during __xfrm_init_state(), the state is
destroyed via xfrm_state_put() before iptfs_init_state() is called.

On this destruction path, does calling hrtimer_cancel() result in a
regression where it operates on the copied timers? If the original timer
was queued, the copied timer has the HRTIMER_STATE_ENQUEUED bit set, but
its rb_node is not natively in the rbtree. Calling hrtimer_cancel() might
call rb_erase_cached() using the cloned node's pointers (which point to
the original node's parent/children).

Additionally, since xtfs->queue is a shallow copy, its next/prev pointers
would point to the original state's queue. Could splicing it here lead to
a regression by modifying the original state's skb queue? If the original
queue was empty, it points to itself, and this code might dequeue it and
call kfree_skb() on an address inside orig->mode_data.

^ permalink raw reply
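The shallow-copy concern raised in the review above can be demonstrated with a simplified userspace model. The struct below is a plain circular list head, not the kernel's sk_buff_head, and all names are hypothetical; it only shows why a byte-wise copy of an empty, self-referencing list head still points at the original.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified circular list head; a stand-in for a queue head. */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
	return h->next == h;
}

/* Returns true if a byte-wise copy of an empty head aliases the original. */
static bool copy_aliases_original(void)
{
	struct list_head orig, copy;

	list_init(&orig);
	assert(list_is_empty(&orig));

	/* Same effect on the links as a kmemdup()-style shallow copy. */
	memcpy(&copy, &orig, sizeof(copy));

	/* The copy does not look empty: its links refer to &orig. */
	return copy.next == &orig && copy.prev == &orig &&
	       !list_is_empty(&copy);
}
```

Any list operation on such a copy would follow the links into the original object, which is the situation the review asks about.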

* Re: [PATCH iproute2] ss: force a flush in monitor mode
From: patchwork-bot+netdevbpf @ 2026-04-15 14:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: dsahern, stephen, davem, kuba, pabeni, kuniyu, netdev,
	eric.dumazet
In-Reply-To: <20260415130307.1016393-1-edumazet@google.com>

Hello:

This patch was applied to iproute2/iproute2.git (main)
by Stephen Hemminger <stephen@networkplumber.org>:

On Wed, 15 Apr 2026 13:03:07 +0000 you wrote:
> Call fflush() from generic_show_sock() in order to work
> with pipes and redirects.
> 
> After this patch, "ss -E &>log_file" works as expected.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> 
> [...]

Here is the summary with links:
  - [iproute2] ss: force a flush in monitor mode
    https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=4d82739fda71

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply

* Re: [net-next v1 1/3] net: phy: motorcomm: Add yt8531_set_ds() mdio_locked bool parameter
From: Andrew Lunn @ 2026-04-15 14:40 UTC (permalink / raw)
  To: Minda Chen
  Cc: Frank, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260415092654.64907-2-minda.chen@starfivetech.com>

On Wed, Apr 15, 2026 at 05:26:52PM +0800, Minda Chen wrote:
> yt8531_set_ds() default set register with mdio lock and only called
> with YT8531 PHY. But new type YT8531s support RGMII and has the same
> pin strength setting with YT8531, YT8531s need to call yt8531_set_ds()
> setting pin drive strength. But Its config init function
> yt8521_config_init() already get the mdio lock with phy_select_page().
> 
> Need to add ytphy API without lock in yt8531_set_ds() and a new
> bool parameter for YT8531s RGMII case.

This is ugly.

Please try to modify the code so that both PHYs can call
yt8531_set_ds() in the same locking context. You then don't need the
mdio_locked parameter.

    Andrew

---
pw-bot: cr

^ permalink raw reply

* [PATCH net v4] ipvs: fix MTU check for GSO packets in tunnel mode
From: Yingnan Zhang @ 2026-04-15 14:40 UTC (permalink / raw)
  To: ja, pablo
  Cc: coreteam, davem, edumazet, fw, horms, kuba, linux-kernel,
	lvs-devel, netdev, netfilter-devel, pabeni, phil, Yingnan Zhang

Currently, IPVS skips MTU checks for GSO packets by excluding them with
the !skb_is_gso(skb) condition. This creates problems when IPVS tunnel
mode encapsulates GSO packets with IPIP headers.

The issue manifests in two ways:

1. MTU violation after encapsulation:
   When a GSO packet passes through IPVS tunnel mode, the original MTU
   check is bypassed. After adding the IPIP tunnel header, the packet
   size may exceed the outgoing interface MTU, leading to unexpected
   fragmentation at the IP layer.

2. Fragmentation with problematic IP IDs:
   When net.ipv4.vs.pmtu_disc=1 and a GSO packet with multiple segments
   is fragmented after encapsulation, each segment gets a sequentially
   incremented IP ID (0, 1, 2, ...). This happens because:

   a) The GSO packet bypasses MTU check and gets encapsulated
   b) At __ip_finish_output, the oversized GSO packet is split into
      separate SKBs (one per segment), with IP IDs incrementing
   c) Each SKB is then fragmented again based on the actual MTU

   This sequential IP ID allocation differs from the expected behavior
   and can cause issues with fragment reassembly and packet tracking.

Fix this by properly validating GSO packets using
skb_gso_validate_network_len(). This function correctly validates
whether the GSO segments will fit within the MTU after segmentation. If
validation fails, send an ICMP Fragmentation Needed message to enable
proper PMTU discovery.

Fixes: 4cdd34084d53 ("netfilter: nf_conntrack_ipv6: improve fragmentation handling")
Signed-off-by: Yingnan Zhang <342144303@qq.com>

---
v4:
- Introduce a new helper function ip_vs_exceeds_mtu() to improve readability (reviewer feedback)

v3: https://lore.kernel.org/netdev/tencent_73010FBD5FA1C05C3BC23A07A50B11CEC90A@qq.com/
v2: https://lore.kernel.org/netdev/tencent_CA2C1C219C99D315086BE55E8654AF7E6009@qq.com/
v1: https://lore.kernel.org/netdev/tencent_4A3E1C339C75D359093BE4F08648AFAA6009@qq.com/
---
 net/netfilter/ipvs/ip_vs_xmit.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 0fb5162992e5..64dfdf8b00c4 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -102,6 +102,18 @@ __ip_vs_dst_check(struct ip_vs_dest *dest)
 	return dest_dst;
 }
 
+/* Based on ip_exceeds_mtu(). */
+static bool ip_vs_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
+{
+	if (skb->len <= mtu)
+		return false;
+
+	if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
+		return false;
+
+	return true;
+}
+
 static inline bool
 __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
 {
@@ -112,7 +124,7 @@ __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
 		if (IP6CB(skb)->frag_max_size > mtu)
 			return true; /* largest fragment violate MTU */
 	}
-	else if (skb->len > mtu && !skb_is_gso(skb)) {
+	else if (ip_vs_exceeds_mtu(skb, mtu)) {
 		return true; /* Packet size violate MTU size */
 	}
 	return false;
@@ -232,7 +244,7 @@ static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af,
 			return true;
 
 		if (unlikely(ip_hdr(skb)->frag_off & htons(IP_DF) &&
-			     skb->len > mtu && !skb_is_gso(skb) &&
+			     ip_vs_exceeds_mtu(skb, mtu) &&
 			     !ip_vs_iph_icmp(ipvsh))) {
 			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
 				  htonl(mtu));
-- 
2.51.0.windows.1


^ permalink raw reply related
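The decision made by the new ip_vs_exceeds_mtu() helper can be modelled in userspace as below. This is a sketch under stated assumptions: struct pkt and gso_segments_fit() are hypothetical stand-ins for the sk_buff and for skb_gso_validate_network_len(), and the per-segment length is supplied directly instead of being computed from headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet model used only in this sketch. */
struct pkt {
	unsigned int len;		/* total length of the (super)packet */
	bool is_gso;			/* true for a GSO aggregate */
	unsigned int gso_seg_len;	/* network-layer length of one segment */
};

/* Stand-in for skb_gso_validate_network_len(): do all segments fit? */
static bool gso_segments_fit(const struct pkt *p, unsigned int mtu)
{
	return p->gso_seg_len <= mtu;
}

/* Mirrors the structure of the helper added by the patch. */
static bool exceeds_mtu(const struct pkt *p, unsigned int mtu)
{
	assert(p != NULL);

	if (p->len <= mtu)
		return false;

	/* A GSO packet is fine as long as each resulting segment fits. */
	if (p->is_gso && gso_segments_fit(p, mtu))
		return false;

	return true;
}
```

The difference from the old check is the third case in the usage below: a GSO packet whose segments are too large is now reported instead of being skipped.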

* Re: [net-next v1 2/3] net: motorcomm: phy: set drive strength in 8531s RGMII case
From: Andrew Lunn @ 2026-04-15 14:42 UTC (permalink / raw)
  To: Minda Chen
  Cc: Frank, Andrew Lunn, Heiner Kallweit, David S . Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, netdev, linux-kernel
In-Reply-To: <20260415092654.64907-3-minda.chen@starfivetech.com>

On Wed, Apr 15, 2026 at 05:26:53PM +0800, Minda Chen wrote:
> Set RXD and RX CLK pin drive strength while in 8531s RGMII
> case.
> 
> Signed-off-by: Minda Chen <minda.chen@starfivetech.com>
> ---
>  drivers/net/phy/motorcomm.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/net/phy/motorcomm.c b/drivers/net/phy/motorcomm.c
> index 35aff1519b4b..f3129419f7c9 100644
> --- a/drivers/net/phy/motorcomm.c
> +++ b/drivers/net/phy/motorcomm.c
> @@ -1714,6 +1714,11 @@ static int yt8521_config_init(struct phy_device *phydev)
>  		if (ret < 0)
>  			goto err_restore_page;
>  	}
> +
> +	if (phydev->drv->phy_id == PHY_ID_YT8531S &&
> +	    phydev->interface != PHY_INTERFACE_MODE_SGMII)
> +		ret = yt8531_set_ds(phydev, true);

phy_interface_is_rgmii().


^ permalink raw reply
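To illustrate why a positive RGMII test may be preferable to "not SGMII", here is a simplified userspace model. The enum is a hypothetical, much shortened stand-in for the kernel's interface mode list, and sketch_is_rgmii() only imitates the idea behind phy_interface_is_rgmii(); it is not the kernel helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, shortened list of interface modes for this sketch. */
enum sketch_mode {
	SKETCH_MODE_SGMII,
	SKETCH_MODE_RGMII,
	SKETCH_MODE_RGMII_ID,
	SKETCH_MODE_RGMII_RXID,
	SKETCH_MODE_RGMII_TXID,
	SKETCH_MODE_OTHER,	/* any mode that is neither SGMII nor RGMII */
};

/* Positive test: true only for the four RGMII variants. */
static bool sketch_is_rgmii(enum sketch_mode mode)
{
	assert(mode <= SKETCH_MODE_OTHER);
	return mode >= SKETCH_MODE_RGMII && mode <= SKETCH_MODE_RGMII_TXID;
}

/* Negative test as written in the quoted patch hunk. */
static bool sketch_not_sgmii(enum sketch_mode mode)
{
	assert(mode <= SKETCH_MODE_OTHER);
	return mode != SKETCH_MODE_SGMII;
}
```

The two tests agree for SGMII and the RGMII variants but differ for any other mode, which the negative test would also treat as RGMII.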

* Re: [PATCH net 1/1] 8021q: free cleared egress QoS mappings safely
From: Simon Horman @ 2026-04-15 14:47 UTC (permalink / raw)
  To: n05ec
  Cc: 'Simon Horman', netdev, andrew+netdev, davem, edumazet,
	kuba, pabeni, kees, yifanwucs, tomapufckgml, yuantan098, bird,
	ylong030
In-Reply-To: <b877895cd02d35254b5c05d3c40abbf130cd87eb.1776039122.git.ylong030@ucr.edu>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
8021q: free cleared egress QoS mappings safely

This commit converts the 802.1Q egress priority mapping lists to use
RCU-protected pointers instead of keeping tombstones. It deletes mappings
when vlan_prio is cleared, deferring the freeing of removed nodes until
after an RCU grace period.

> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
> index a000b1ef05206..bbe7cbd97939d 100644
> --- a/net/8021q/vlan_netlink.c
> +++ b/net/8021q/vlan_netlink.c
> @@ -260,13 +260,10 @@ static int vlan_fill_info(struct sk_buff *skb, const struct net_device *dev)
>  			goto nla_put_failure;
>  
>  		for (i = 0; i < ARRAY_SIZE(vlan->egress_priority_map); i++) {
> -			for (pm = vlan->egress_priority_map[i]; pm;
> -			     pm = pm->next) {
> -				if (!pm->vlan_qos)
> -					continue;
> -
> +			for (pm = rtnl_dereference(vlan->egress_priority_map[i]); pm;
> +			     pm = rtnl_dereference(pm->next)) {

Is it safe to use rtnl_dereference() for this traversal?

The vlan_fill_info() function implements the rtnl_link_ops->fill_info
callback, which is invoked by rtnl_dump_ifinfo() to service interface
netlink dumps. Modern netlink dumps can operate locklessly under
rcu_read_lock() without acquiring the RTNL lock.

Using rtnl_dereference() expands to rcu_dereference_protected(), which
might trigger a lockdep warning when the RTNL lock is not held.

Additionally, it omits the memory barriers needed for safe RCU reader
traversal. Without these barriers, concurrent updates from
vlan_dev_set_egress_priority() could cause the lockless reader to observe
stale or uninitialized list nodes.

Would using rcu_dereference_rtnl() or rcu_dereference() here be more
appropriate to ensure safe lockless traversal during netlink dumps?

>  				m.from = pm->priority;
> -				m.to   = (pm->vlan_qos >> 13) & 0x7;
> +				m.to   = (READ_ONCE(pm->vlan_qos) >> 13) & 0x7;
>  				if (nla_put(skb, IFLA_VLAN_QOS_MAPPING,
>  					    sizeof(m), &m))
>  					goto nla_put_failure;

^ permalink raw reply

* Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: Andrew Lunn @ 2026-04-15 14:53 UTC (permalink / raw)
  To: wenzhaoliao
  Cc: hkallweit1, fujita.tomonori, linux, tmgross, ojeda, netdev,
	rust-for-linux
In-Reply-To: <AN6AigCwKNz*0oYAjaS2aKqr.3.1776262694321.Hmail.2023000929@ruc.edu.cn>

> - paged register access is open-coded and does not robustly propagate or
>   restore errors;
> - several vendor sequences use magic page/register values with no documented
>   rationale in the driver;
> - there are unconditional resets and fixed `mdelay`/`msleep` delays without a
>   clear completion check or justification;
> - debugging uses raw `printk()` calls;
> - some helper return values are ignored, and `ret |= ...` is not a good fit
>   for mainline-style error handling;
> - the MMD / EEE handling looks narrowly special-cased and would need to be
>   re-checked against phylib conventions and proper documentation.

Nice, you spotted many of the issues in that code. That gives me a
better feeling, you have some understanding of Ethernet PHYs.

> At the same time, we should also be explicit that we do not currently have
> MAE0621A hardware in hand, nor sufficient public documentation to claim that
> it is already a well-grounded first target. Our current local setup is useful
> for Rust-for-Linux build/tooling validation and limited non-hardware checks,
> but not for real hardware-backed PHY validation.

My personal experience is that anything which is not tested is
broken. For a driver to be merged, it needs to be tested on real
hardware.

Can you get one of the amlogic boards? TrustOnX Player (TOX3)? Radxa
A5E? I've no idea how easy it is to get Mainline running on these
boards.

	Andrew

^ permalink raw reply

* Re:Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: wenzhaoliao @ 2026-04-15 15:01 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: hkallweit1, fujita.tomonori, linux, tmgross, ojeda, netdev,
	rust-for-linux
In-Reply-To: <f57bb151-bc84-4c67-ae43-8901b09de884@lunn.ch>


Hello Andrew,


Thank you, this is very helpful.


We agree with your point that for a driver to be merged, it needs to be tested
on real hardware. So we will not push the driver-RFC direction further without
getting hardware into the loop first.


We will now look into obtaining one of the suggested boards and check what it
takes to get a near-mainline or mainline kernel running on it. Based on that,
we will decide whether this is a realistic first target before investing more
time in the Rust driver work itself.


At the moment, Radxa A5E looks like the more concrete option from our side,
mainly because there seems to be visible public discussion around MAE0621A on
that board. But we are still checking the practical side of board availability
and software support.


If you have a preference between TrustOnX Player (TOX3) and Radxa A5E as a
first board to try, that would be very helpful. Otherwise, we will investigate
the A5E path first and come back once we have a clearer hardware/testing plan.


Thank you again for the guidance.


Best regards,
Liao Wenzhao









From: Andrew Lunn <andrew@lunn.ch>
Sent: 2026-04-15 22:53:17
To: wenzhaoliao <wenzhaoliao@ruc.edu.cn>
Cc: hkallweit1@gmail.com, fujita.tomonori@gmail.com, linux@armlinux.org.uk, tmgross@umich.edu, ojeda@kernel.org, netdev@vger.kernel.org, rust-for-linux@vger.kernel.org
Subject: Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance

>> - paged register access is open-coded and does not robustly propagate or
>>   restore errors;
>> - several vendor sequences use magic page/register values with no documented
>>   rationale in the driver;
>> - there are unconditional resets and fixed `mdelay`/`msleep` delays without a
>>   clear completion check or justification;
>> - debugging uses raw `printk()` calls;
>> - some helper return values are ignored, and `ret |= ...` is not a good fit
>>   for mainline-style error handling;
>> - the MMD / EEE handling looks narrowly special-cased and would need to be
>>   re-checked against phylib conventions and proper documentation.
>
>Nice, you spotted many of the issues in that code. That gives me a
>better feeling, you have some understanding of Ethernet PHYs.
>
>> At the same time, we should also be explicit that we do not currently have
>> MAE0621A hardware in hand, nor sufficient public documentation to claim that
>> it is already a well-grounded first target. Our current local setup is useful
>> for Rust-for-Linux build/tooling validation and limited non-hardware checks,
>> but not for real hardware-backed PHY validation.
>
>My personal experience is that anything which is not tested is
>broken. For a driver to be merged, it needs to be tested on real
>hardware.
>
>Can you get one of the amlogic boards? TrustOnX Player (TOX3)? Radxa
>A5E? I've no idea how easy it is to get Mainline running on these
>boards.
>
>	Andrew
>

^ permalink raw reply

* [PATCH net v4 0/3] vsock/virtio: fix MSG_PEEK calculation on bytes to copy
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi

`virtio_transport_stream_do_peek()`, when calculating the number of bytes to
copy, did not account for the `offset` left by earlier partial reads.
This can cause an out-of-bounds read that leads to an EFAULT.
More details are in the commits.

Commit 1 introduces the fix
Commit 2 introduces some preliminary work for adding a test and fixes a
problem in existing tests.
Commit 3 introduces a test that checks for this bug to avoid future
regressions.

For disclosure: this bug was found initially by claude opus 4.6, I then analyzed
it and worked on the fix and the test.

Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
Changes in v4:
- Picked up RoB
- Increased sleep time from 10 us to 10 ms
- Minor changes to commit messages and comments as suggested by Stefano.
- Link to v3: https://lore.kernel.org/r/20260414-fix_peek-v3-0-e7daead49f83@redhat.com

Changes in v3:
- Addressed reviewers' comments
    - Dropped test client, reusing the one already existing
    - Minor changes: added comment, improved commit messages
    - Rebased to latest net-next
- Link to v2: https://lore.kernel.org/r/20260407-fix_peek-v2-0-2e2581dc8b7c@redhat.com

Changes in v2:
- Addressed reviewers' comments
    - Test now uses the recv_buf utils.
    - Removed unnecessary barrier
    - Checkpatch warnings.
- Added a new commit that allows using recv_buf with MSG_PEEK
- Picked up RoBs
- Link to v1: https://lore.kernel.org/r/20260402-fix_peek-v1-0-ad274fcef77b@redhat.com

---
Luigi Leonardi (3):
      vsock/virtio: fix MSG_PEEK ignoring skb offset when calculating bytes to copy
      vsock/test: fix MSG_PEEK handling in recv_buf()
      vsock/test: add MSG_PEEK after partial recv test

 net/vmw_vsock/virtio_transport_common.c |  5 ++--
 tools/testing/vsock/util.c              | 15 ++++++++++
 tools/testing/vsock/vsock_test.c        | 50 +++++++++++++++++++++++++--------
 3 files changed, 55 insertions(+), 15 deletions(-)
---
base-commit: 35c2c39832e569449b9192fa1afbbc4c66227af7
change-id: 20260401-fix_peek-6837b83469e3

Best regards,
-- 
Luigi Leonardi <leonardi@redhat.com>


^ permalink raw reply

* [PATCH net v4 1/3] vsock/virtio: fix MSG_PEEK ignoring skb offset when calculating bytes to copy
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260415-fix_peek-v4-0-8207e872759e@redhat.com>

`virtio_transport_stream_do_peek()` does not account for the skb offset
when computing the number of bytes to copy.

This means that, after a partial recv() that advances the offset, a peek
requesting more bytes than are available in the sk_buff causes
`skb_copy_datagram_iter()` to go past the valid payload, resulting in
a -EFAULT.

The dequeue path already handles this correctly.
Apply the same logic to the peek path.
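
As a rough illustration of the calculation described above, the sketch
below clamps the peek length to what is still unread in a buffer. It is
a userspace model under stated assumptions, not the kernel code:
`struct peek_buf` and `peek_copy_len()` are hypothetical names, with
`offset` standing in for the per-skb offset that a partial recv()
advances.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-buffer state the peek path reads. */
struct peek_buf {
	size_t len;    /* total payload bytes held by the buffer */
	size_t offset; /* bytes already consumed by earlier recv() calls */
};

/* Number of bytes a peek may copy from this buffer.
 * requested: bytes the caller asked for in total
 * total:     bytes already copied from earlier buffers in the queue
 */
size_t peek_copy_len(const struct peek_buf *b, size_t requested, size_t total)
{
	size_t want, avail;

	assert(total <= requested);
	assert(b->offset <= b->len);

	want = requested - total;
	avail = b->len - b->offset; /* unread bytes only, not b->len */

	return want < avail ? want : avail;
}
```

With a 100-byte buffer whose first byte was already consumed, a peek
asking for 128 bytes is limited to 99 bytes in this model; clamping to
the full length instead would ask the copy to read one byte past the
valid payload.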

Fixes: 0df7cd3c13e4 ("vsock/virtio/vhost: read data from non-linear skb")
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Acked-by: Arseniy Krasnov <avkrasnov@salutedevices.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 net/vmw_vsock/virtio_transport_common.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index a152a9e208d0..b5015ab2ee1e 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -545,9 +545,8 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk,
 	skb_queue_walk(&vvs->rx_queue, skb) {
 		size_t bytes;
 
-		bytes = len - total;
-		if (bytes > skb->len)
-			bytes = skb->len;
+		bytes = min_t(size_t, len - total,
+			      skb->len - VIRTIO_VSOCK_SKB_CB(skb)->offset);
 
 		spin_unlock_bh(&vvs->rx_lock);
 

-- 
2.53.0


^ permalink raw reply related

* [PATCH net v4 2/3] vsock/test: fix MSG_PEEK handling in recv_buf()
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260415-fix_peek-v4-0-8207e872759e@redhat.com>

`recv_buf` does not handle the MSG_PEEK flag correctly: it keeps calling
`recv` until all requested bytes are available or an error occurs.

The problem is how it calculates the number of bytes read: MSG_PEEK
doesn't consume any bytes and will re-read the same bytes from the buffer
head, so summing the return value every time is wrong.

Moreover, MSG_PEEK doesn't consume the bytes in the buffer, so if more
bytes are requested than are available, the loop will never terminate,
because `recv` will never return EOF. For this reason, we need to compare
the number of bytes read with the number of bytes expected.

Add a check: if the MSG_PEEK flag is present, update the byte counter and
break out of the loop only after at least the expected number of bytes
have been received; otherwise, retry after a short delay to avoid
consuming too many CPU cycles.

This allows us to simplify the `test_stream_credit_update_test` by
reusing `recv_buf`, like some other tests already do.
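
The retry logic described above can be sketched in isolation as
follows. This is a minimal sketch, not the `recv_buf()` change itself:
`peek_at_least()`, its `max_tries` bound, and the fixed 10 ms delay are
assumptions made for illustration (the real helper relies on the test
suite's timeout machinery instead of a retry count).

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Peek until at least `expected` bytes are queued, or give up.
 * Returns the byte count of the last peek on success, or -1 on error,
 * EOF, or when max_tries is exhausted.
 */
ssize_t peek_at_least(int fd, void *buf, size_t len, size_t expected,
		      int max_tries)
{
	const struct timespec delay = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

	for (int i = 0; i < max_tries; i++) {
		ssize_t ret = recv(fd, buf, len, MSG_PEEK);

		if (ret <= 0)
			return -1;

		/* MSG_PEEK re-reads from the head of the queue, so the
		 * latest return value is the total, not an increment.
		 */
		if ((size_t)ret >= expected)
			return ret;

		/* Not enough data yet: back off instead of spinning. */
		nanosleep(&delay, NULL);
	}

	return -1;
}
```

Because nothing is consumed, calling the function twice on the same
queued data yields the same count both times, which is why summing the
return values would overcount.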

Suggested-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 tools/testing/vsock/util.c       | 15 +++++++++++++++
 tools/testing/vsock/vsock_test.c | 13 +------------
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c
index 1fe1338c79cd..fe316b02a590 100644
--- a/tools/testing/vsock/util.c
+++ b/tools/testing/vsock/util.c
@@ -381,7 +381,13 @@ void send_buf(int fd, const void *buf, size_t len, int flags,
 	}
 }
 
+#define RECV_PEEK_RETRY_USEC (10 * 1000)
+
 /* Receive bytes in a buffer and check the return value.
+ *
+ * When MSG_PEEK is set, recv() is retried until it returns at least
+ * expected_ret bytes. The function returns on error, EOF, or timeout
+ * as usual.
  *
  * expected_ret:
  *  <0 Negative errno (for testing errors)
@@ -403,6 +409,15 @@ void recv_buf(int fd, void *buf, size_t len, int flags, ssize_t expected_ret)
 		if (ret <= 0)
 			break;
 
+		if (flags & MSG_PEEK) {
+			if (ret >= expected_ret) {
+				nread = ret;
+				break;
+			}
+			timeout_usleep(RECV_PEEK_RETRY_USEC);
+			continue;
+		}
+
 		nread += ret;
 	} while (nread < len);
 	timeout_end();
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index 5bd20ccd9335..bdb0754965df 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -1500,18 +1500,7 @@ static void test_stream_credit_update_test(const struct test_opts *opts,
 	}
 
 	/* Wait until there will be 128KB of data in rx queue. */
-	while (1) {
-		ssize_t res;
-
-		res = recv(fd, buf, buf_size, MSG_PEEK);
-		if (res == buf_size)
-			break;
-
-		if (res <= 0) {
-			fprintf(stderr, "unexpected 'recv()' return: %zi\n", res);
-			exit(EXIT_FAILURE);
-		}
-	}
+	recv_buf(fd, buf, buf_size, MSG_PEEK, buf_size);
 
 	/* There is 128KB of data in the socket's rx queue, dequeue first
 	 * 64KB, credit update is sent if 'low_rx_bytes_test' == true.

-- 
2.53.0


^ permalink raw reply related

* [PATCH net v4 3/3] vsock/test: add MSG_PEEK after partial recv test
From: Luigi Leonardi @ 2026-04-15 15:09 UTC (permalink / raw)
  To: Stefan Hajnoczi, Stefano Garzarella, Michael S. Tsirkin,
	Jason Wang, Xuan Zhuo, Eugenio Pérez, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, Simon Horman,
	Arseniy Krasnov
  Cc: kvm, virtualization, netdev, linux-kernel, Luigi Leonardi
In-Reply-To: <20260415-fix_peek-v4-0-8207e872759e@redhat.com>

Add a test that verifies MSG_PEEK works correctly after a partial
recv().

This exercises a bug that was present in
`virtio_transport_stream_do_peek()` when computing the number of bytes to
copy: after a partial read, the peek function did not take into account
the number of bytes that had already been read. Peeking the whole buffer
therefore caused an out-of-bounds read that resulted in -EFAULT.

This test does exactly this: do a partial recv on a buffer, then try to
peek the whole buffer content. The test re-uses
`test_stream_msg_peek_client()` to also cover this scenario.
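
The sequence the test performs can be sketched against any stream
socket. The version below is an illustration only: the function name and
`DEMO_LEN` are hypothetical, and it assumes all `DEMO_LEN` bytes are
already queued on `fd` (for example on an AF_UNIX socketpair, which
allows trying it without a vsock setup).

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

#define DEMO_LEN 8 /* illustrative payload size, not MSG_PEEK_BUF_LEN */

/* Returns 0 when a peek issued after a 1-byte recv() sees exactly the
 * bytes that a later plain recv() consumes, -1 otherwise.
 */
int peek_after_partial_recv(int fd)
{
	unsigned char first, peeked[DEMO_LEN], rest[DEMO_LEN];

	/* Partial read: consume one byte, leaving DEMO_LEN - 1 queued. */
	if (recv(fd, &first, 1, 0) != 1)
		return -1;

	/* Ask for more than is left; only the remainder may come back. */
	if (recv(fd, peeked, sizeof(peeked), MSG_PEEK) != DEMO_LEN - 1)
		return -1;

	/* The peek must not have consumed anything. */
	if (recv(fd, rest, DEMO_LEN - 1, 0) != DEMO_LEN - 1)
		return -1;

	return memcmp(peeked, rest, DEMO_LEN - 1) ? -1 : 0;
}
```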

Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Luigi Leonardi <leonardi@redhat.com>
---
 tools/testing/vsock/vsock_test.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index bdb0754965df..76be0e4a7f0e 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -346,6 +346,38 @@ static void test_stream_msg_peek_server(const struct test_opts *opts)
 	return test_msg_peek_server(opts, false);
 }
 
+static void test_stream_peek_after_recv_server(const struct test_opts *opts)
+{
+	unsigned char buf_normal[MSG_PEEK_BUF_LEN];
+	unsigned char buf_peek[MSG_PEEK_BUF_LEN];
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, opts->peer_port, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("SRVREADY");
+
+	/* Partial recv to advance offset within the skb */
+	recv_buf(fd, buf_normal, 1, 0, 1);
+
+	/* Peek with a buffer larger than the remaining data */
+	recv_buf(fd, buf_peek, sizeof(buf_peek), MSG_PEEK, sizeof(buf_peek) - 1);
+
+	/* Consume the remaining data */
+	recv_buf(fd, buf_normal, sizeof(buf_normal) - 1, 0, sizeof(buf_normal) - 1);
+
+	/* Compare full peek and normal read. */
+	if (memcmp(buf_peek, buf_normal, sizeof(buf_peek) - 1)) {
+		fprintf(stderr, "Full peek data mismatch\n");
+		exit(EXIT_FAILURE);
+	}
+
+	close(fd);
+}
+
 #define SOCK_BUF_SIZE (2 * 1024 * 1024)
 #define SOCK_BUF_SIZE_SMALL (64 * 1024)
 #define MAX_MSG_PAGES 4
@@ -2509,6 +2541,11 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_tx_credit_bounds_client,
 		.run_server = test_stream_tx_credit_bounds_server,
 	},
+	{
+		.name = "SOCK_STREAM MSG_PEEK after partial recv",
+		.run_client = test_stream_msg_peek_client,
+		.run_server = test_stream_peek_after_recv_server,
+	},
 	{},
 };
 

-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH net 1/1] 8021q: free cleared egress QoS mappings safely
From: Simon Horman @ 2026-04-15 15:15 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, andrew+netdev, davem, edumazet, kuba, pabeni, kees,
	yifanwucs, tomapufckgml, yuantan098, bird, ylong030
In-Reply-To: <b877895cd02d35254b5c05d3c40abbf130cd87eb.1776039122.git.ylong030@ucr.edu>

On Mon, Apr 13, 2026 at 05:07:20PM +0800, Ren Wei wrote:
> From: Longxuan Yu <ylong030@ucr.edu>
> 
> vlan_dev_set_egress_priority() leaves cleared egress priority mapping
> nodes in the hash until device teardown. Repeated set/clear cycles with
> distinct skb priorities therefore allocate an unbounded number of
> vlan_priority_tci_mapping objects and leak memory.
> 
> Delete mappings when vlan_prio is cleared instead of keeping
> tombstones. The TX fast path and reporting paths walk the lists without
> RTNL, so convert the egress mapping lists to RCU-protected pointers and
> defer freeing removed nodes until after a grace period.
> 
> Cc: stable@kernel.org
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: Yifan Wu <yifanwucs@gmail.com>
> Reported-by: Juefei Pu <tomapufckgml@gmail.com>
> Co-developed-by: Yuan Tan <yuantan098@gmail.com>
> Signed-off-by: Yuan Tan <yuantan098@gmail.com>
> Suggested-by: Xin Liu <bird@lzu.edu.cn>
> Signed-off-by: Longxuan Yu <ylong030@ucr.edu>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---
>  include/linux/if_vlan.h  | 23 +++++++++++--------
>  net/8021q/vlan_dev.c     | 48 +++++++++++++++++++++++-----------------
>  net/8021q/vlan_netlink.c |  9 +++-----
>  net/8021q/vlanproc.c     | 12 ++++++----
>  4 files changed, 53 insertions(+), 39 deletions(-)

There is a lot of change here. And I'd suggest splitting the patch up into
(at least) two patches:

1. Convert mappings to use RCU
2. Fix bug

As is, the bug fix itself is difficult to isolate amongst the other changes.

Also, AI generated review suggests that this bug was introduced by commit
b020cb488586 ("[VLAN]: Keep track of number of QoS mappings"). If so,
it would be appropriate to use that commit in the Fixes tag.

-- 
pw-bot: changes-requested

^ permalink raw reply

* [PATCH net v2 0/2] bnge fixes
From: Vikas Gupta @ 2026-04-15 15:16 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta

Hi,
 This series fixes two issues.

 Patch-1:
    Due to a wrong HWRM sequence, the driver does not get the correct
    information regarding resources and capabilities.
    The patch fixes the initial HWRM sequence.
 Patch-2:
    Remove the unsupported backing store type initialization, which is
    not supported on Thor Ultra devices.

Thanks,
Vikas

v1->v2: 
   Include Fixes tags.


Vikas Gupta (2):
  bnge: fix initial HWRM sequence
  bnge: remove unsupported backing store type

 .../net/ethernet/broadcom/bnge/bnge_core.c    | 39 ++++++++++---------
 .../net/ethernet/broadcom/bnge/bnge_rmem.c    | 16 --------
 2 files changed, 21 insertions(+), 34 deletions(-)

-- 
2.47.1


^ permalink raw reply

* [PATCH net v2 1/2] bnge: fix initial HWRM sequence
From: Vikas Gupta @ 2026-04-15 15:16 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta
In-Reply-To: <20260415151621.1104956-1-vikas.gupta@broadcom.com>

Firmware may not advertise the correct resources if the backing store is
not enabled before resource information is queried.
Fix the initial sequence of HWRMs so that the driver gets capabilities
and resource information correctly.

Fixes: 3fa9e977a0cd ("bng_en: Initialize default configuration")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Rahul Gupta <rahul-rg.gupta@broadcom.com>
---
 .../net/ethernet/broadcom/bnge/bnge_core.c    | 39 ++++++++++---------
 1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_core.c b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
index b4090283df0f..9f6a33b912a6 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_core.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_core.c
@@ -73,25 +73,35 @@ static int bnge_func_qcaps(struct bnge_dev *bd)
 		return rc;
 	}
 
+	rc = bnge_alloc_ctx_mem(bd);
+	if (rc) {
+		dev_err(bd->dev, "Failed to allocate ctx mem rc: %d\n", rc);
+		goto err_free_ctx_mem;
+	}
+
 	rc = bnge_hwrm_func_resc_qcaps(bd);
 	if (rc) {
 		dev_err(bd->dev, "query resc caps failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	rc = bnge_hwrm_func_qcfg(bd);
 	if (rc) {
 		dev_err(bd->dev, "query config failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	rc = bnge_hwrm_vnic_qcaps(bd);
 	if (rc) {
 		dev_err(bd->dev, "vnic caps failure rc: %d\n", rc);
-		return rc;
+		goto err_free_ctx_mem;
 	}
 
 	return 0;
+
+err_free_ctx_mem:
+	bnge_free_ctx_mem(bd);
+	return rc;
 }
 
 static void bnge_fw_unregister_dev(struct bnge_dev *bd)
@@ -132,32 +142,25 @@ static int bnge_fw_register_dev(struct bnge_dev *bd)
 
 	bnge_hwrm_fw_set_time(bd);
 
-	rc =  bnge_hwrm_func_drv_rgtr(bd);
+	/* Get the resources and configuration from firmware */
+	rc = bnge_func_qcaps(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		dev_err(bd->dev, "Failed initial configuration rc: %d\n", rc);
 		return rc;
 	}
 
-	rc = bnge_alloc_ctx_mem(bd);
+	rc = bnge_hwrm_func_drv_rgtr(bd);
 	if (rc) {
-		dev_err(bd->dev, "Failed to allocate ctx mem rc: %d\n", rc);
-		goto err_func_unrgtr;
-	}
-
-	/* Get the resources and configuration from firmware */
-	rc = bnge_func_qcaps(bd);
-	if (rc) {
-		dev_err(bd->dev, "Failed initial configuration rc: %d\n", rc);
-		rc = -ENODEV;
-		goto err_func_unrgtr;
+		dev_err(bd->dev, "Failed to rgtr with firmware rc: %d\n", rc);
+		goto err_free_ctx_mem;
 	}
 
 	bnge_set_dflt_rss_hash_type(bd);
 
 	return 0;
 
-err_func_unrgtr:
-	bnge_fw_unregister_dev(bd);
+err_free_ctx_mem:
+	bnge_free_ctx_mem(bd);
 	return rc;
 }
 
-- 
2.47.1


^ permalink raw reply related

* [PATCH net v2 2/2] bnge: remove unsupported backing store type
From: Vikas Gupta @ 2026-04-15 15:16 UTC (permalink / raw)
  To: davem, edumazet, kuba, pabeni, andrew+netdev, horms
  Cc: netdev, linux-kernel, vsrama-krishna.nemani, bhargava.marreddy,
	rajashekar.hudumula, ajit.khaparde, dharmender.garg,
	rahul-rg.gupta, Vikas Gupta
In-Reply-To: <20260415151621.1104956-1-vikas.gupta@broadcom.com>

The backing store type BNGE_CTX_MRAV is not applicable to Thor Ultra
devices. Remove it from the backing store configuration: the firmware
does not populate entities for this backing store type, which causes the
driver load to fail.

Fixes: 29c5b358f385 ("bng_en: Add backing store support")
Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Dharmender Garg <dharmender.garg@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnge/bnge_rmem.c | 16 ----------------
 1 file changed, 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
index 94f15e08a88c..b066ee887a09 100644
--- a/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
+++ b/drivers/net/ethernet/broadcom/bnge/bnge_rmem.c
@@ -324,7 +324,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	u32 l2_qps, qp1_qps, max_qps;
 	u32 ena, entries_sp, entries;
 	u32 srqs, max_srqs, min;
-	u32 num_mr, num_ah;
 	u32 extra_srqs = 0;
 	u32 extra_qps = 0;
 	u32 fast_qpmd_qps;
@@ -390,21 +389,6 @@ int bnge_alloc_ctx_mem(struct bnge_dev *bd)
 	if (!bnge_is_roce_en(bd))
 		goto skip_rdma;
 
-	ctxm = &ctx->ctx_arr[BNGE_CTX_MRAV];
-	/* 128K extra is needed to accommodate static AH context
-	 * allocation by f/w.
-	 */
-	num_mr = min_t(u32, ctxm->max_entries / 2, 1024 * 256);
-	num_ah = min_t(u32, num_mr, 1024 * 128);
-	ctxm->split_entry_cnt = BNGE_CTX_MRAV_AV_SPLIT_ENTRY + 1;
-	if (!ctxm->mrav_av_entries || ctxm->mrav_av_entries > num_ah)
-		ctxm->mrav_av_entries = num_ah;
-
-	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, num_mr + num_ah, 2);
-	if (rc)
-		return rc;
-	ena |= FUNC_BACKING_STORE_CFG_REQ_ENABLES_MRAV;
-
 	ctxm = &ctx->ctx_arr[BNGE_CTX_TIM];
 	rc = bnge_setup_ctxm_pg_tbls(bd, ctxm, l2_qps + qp1_qps + extra_qps, 1);
 	if (rc)
-- 
2.47.1


^ permalink raw reply related

* Re: [PATCH v2] Bluetooth: Add Broadcom channel priority commands
From: Luiz Augusto von Dentz @ 2026-04-15 15:19 UTC (permalink / raw)
  To: Sasha Finkelstein
  Cc: Sven Peter, Janne Grunau, Neal Gompa, Marcel Holtmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-kernel, asahi, linux-arm-kernel,
	linux-bluetooth, netdev
In-Reply-To: <CAMT+MTQ6orj5tpiGL9hz8m2TGiBjA-9D_0e1iLt=_dXBFHcOgg@mail.gmail.com>

Hi Sasha,

On Wed, Apr 15, 2026 at 8:34 AM Sasha Finkelstein <fnkl.kernel@gmail.com> wrote:
>
> On Tue, 14 Apr 2026 at 16:00, Luiz Augusto von Dentz
> <luiz.dentz@gmail.com> wrote:
> > > +       if (sock)
> > > +               set_bit(SOCK_CUSTOM_SOCKOPT, &sock->flags);
> >
> > This is more complicated than it needs to be. I'd just add a new
> > callback, `hdev->set_priority(handle, skb->priority)`, so the driver
> > is called whenever it needs to elevate a connection's priority, that
> > said there could be cases where a connection needs its priority set
> > momentarily to transmit A2DP, followed by OBEX packets that are best
> > effort. Therefore, `hci_conn` will probably need to track the priority
> > so it can detect when it needs changing on a per skb basis.
>
> I have tested per-skb priorities, and unfortunately, this does not work.
> If something tries to send a low-priority packet (for example - a volume
> adjustment), a priority drop causes the same kind of dropout that is
> caused by scans. It appears that the only way to make this hardware work
> is to set the entire hci connection as high priority for as long as it
> is being used to transmit audio.

Ok, then maybe we should decrease the priority, so it can only go up.
That said, in a multiple connection scenario, we cannot really tell
what should be prioritized if we cannot momentarily decrease the
priority.

-- 
Luiz Augusto von Dentz

^ permalink raw reply

* Re: Re: rust: net: phy: intent for MAE0621A (out-of-tree C -> Rust), request for target guidance
From: Andrew Lunn @ 2026-04-15 15:30 UTC (permalink / raw)
  To: wenzhaoliao
  Cc: hkallweit1, fujita.tomonori, linux, tmgross, ojeda, netdev,
	rust-for-linux
In-Reply-To: <AF6AxgB-KF5*XoYukJCGZKoD.3.1776265264920.Hmail.2023000929@ruc.edu.cn>

> If you have a preference between TrustOnX Player (TOX3) and Radxa A5E as a
> first board to try, that would be very helpful. Otherwise, we will investigate
> the A5E path first and come back once we have a clearer hardware/testing plan.

Makes no difference to me. Use whatever is simplest for you.

If you want to do more than just convert the C driver to Rust and also
add new features, maybe see if the LEDs are controlled by the PHY. Is
there an interrupt output from the PHY to the SoC? Does the board support
WoL? These are all features which the Rust binding is missing and which
could be added if you have a device to test them on and the needed
register information.

      Andrew

^ permalink raw reply

* Re: [PATCH v2] Bluetooth: Add Broadcom channel priority commands
From: Sasha Finkelstein @ 2026-04-15 15:31 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Sven Peter, Janne Grunau, Neal Gompa, Marcel Holtmann,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, linux-kernel, asahi, linux-arm-kernel,
	linux-bluetooth, netdev
In-Reply-To: <CABBYNZLNR8hYS9jLLKeB=M9XVvtSFtf1wi4DmcJKBbQVvHTPaw@mail.gmail.com>

On Wed, 15 Apr 2026 at 17:19, Luiz Augusto von Dentz
<luiz.dentz@gmail.com> wrote:
> Ok, then maybe we should decrease the priority, so it can only go up.
> That said, in a multiple connection scenario, we cannot really tell
> what should be prioritized if we cannot momentarily decrease the
> priority.

I believe that the priority is only per-connection and is not designed
to be used per-packet. On Android, they change the priority when an
A2DP stream starts or stops by sending the commands from userspace,
and they accept that other things using the same hci connection will
also have high priority.

^ permalink raw reply

* Re: [PATCH] net: sched: teql: fix use-after-free in teql_master_xmit
From: Simon Horman @ 2026-04-15 15:31 UTC (permalink / raw)
  To: hxzene
  Cc: 'Simon Horman', jhs, jiri, davem, edumazet, kuba, pabeni,
	netdev, linux-kernel
In-Reply-To: <20260413094448.2263828-1-hxzene@gmail.com>

From: 'Simon Horman' <horms@kernel.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev
---
net: sched: teql: fix use-after-free in teql_master_xmit

This patch attempts to fix a use-after-free in teql_master_xmit() by
saving master->slaves to a local head variable during teql_destroy(),
and by adding READ_ONCE()/WRITE_ONCE() to datapath accesses.

> diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
> index ec4039a201a2c..2e86397a5219c 100644
> --- a/net/sched/sch_teql.c
> +++ b/net/sched/sch_teql.c
> @@ -101,7 +101,7 @@ teql_dequeue(struct Qdisc *sch)
>  	if (skb == NULL) {
>  		struct net_device *m = qdisc_dev(q);
>  		if (m) {
> -			dat->m->slaves = sch;
> +			WRITE_ONCE(dat->m->slaves, sch);

Can this lockless publication write a soon-to-be-freed pointer to
master->slaves?

If teql_dequeue() concurrently executes this write while teql_destroy()
is unlinking sch and scheduling it for deletion via call_rcu(), sch could
be published globally just as it is being destroyed.

When the RCU grace period ends, sch would be freed but still globally
referenced. The WRITE_ONCE() prevents store-tearing but does not prevent
this race condition.

>  		}
>  	} else {
[ ... ]
> @@ -136,19 +136,23 @@ teql_destroy(struct Qdisc *sch)
>  	if (!master)
>  		return;
>  
> -	prev = master->slaves;
> +	prev = READ_ONCE(master->slaves);
>  	if (prev) {
> +		struct Qdisc *head = prev;
> +
>  		do {
>  			q = NEXT_SLAVE(prev);
>  			if (q == sch) {
>  				NEXT_SLAVE(prev) = NEXT_SLAVE(q);
> -				if (q == master->slaves) {
> -					master->slaves = NEXT_SLAVE(q);
> -					if (q == master->slaves) {
> +				if (q == head) {

Could changing this condition introduce a use-after-free if master->slaves
is concurrently advanced?

If the datapath (teql_master_xmit() or teql_dequeue()) advances
master->slaves to point to sch after head was snapshot, head will not
equal sch.

When teql_destroy() unlinks sch, the q == head check would evaluate
to false, meaning master->slaves is not updated. This would leave
master->slaves pointing to the unlinked sch, which will then be freed.

> +					WRITE_ONCE(master->slaves,
> +						   NEXT_SLAVE(q));
[ ... ]
> @@ -317,7 +321,7 @@ static netdev_tx_t teql_master_xmit(struct sk_buff *skb, struct net_device *dev)
>  				    netdev_start_xmit(skb, slave, slave_txq, false) ==
>  				    NETDEV_TX_OK) {
>  					__netif_tx_unlock(slave_txq);
> -					master->slaves = NEXT_SLAVE(q);
> +					WRITE_ONCE(master->slaves, NEXT_SLAVE(q));

Similarly to teql_dequeue() above, does this risk publishing a dying qdisc
globally?

If NEXT_SLAVE(q) is read before teql_destroy() unlinks it, but the
WRITE_ONCE() executes immediately after the unlink, it appears the
unlinked qdisc would be written to the global master->slaves pointer.

[ ... ]

While looking at the concurrency issues addressed by this patch, it appears
teql_qdisc_init() might also be affected by concurrent updates to
m->slaves:

net/sched/sch_teql.c:teql_qdisc_init() {
    ...
		q->next = NEXT_SLAVE(m->slaves);
		NEXT_SLAVE(m->slaves) = sch;
    ...
}

Because NEXT_SLAVE(m->slaves) evaluates m->slaves twice, if m->slaves
is concurrently advanced between these two statements, could this break
the list linkage and cause lost elements or infinite loops?
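
To make the double-evaluation concern concrete, here is a small
userspace sketch of a ring insert that reads the cursor exactly once.
All names (`struct node`, `struct ring`, `ring_insert()`, `ring_len()`)
are hypothetical and this is not the sch_teql code. A single snapshot
only guarantees that both pointer updates use the same predecessor; it
does not by itself make the insert safe against concurrent writers,
which would still need locking or similar.

```c
#include <stddef.h>

/* Hypothetical ring node standing in for a slave qdisc. */
struct node {
	struct node *next;
	int id;
};

/* Hypothetical master holding a cursor into the ring. */
struct ring {
	struct node *cursor;
};

/* Link `n` after the cursor, reading the cursor exactly once so that
 * both pointer updates refer to the same predecessor.
 */
void ring_insert(struct ring *r, struct node *n)
{
	struct node *prev = r->cursor; /* single snapshot */

	if (!prev) {
		n->next = n;
		r->cursor = n;
		return;
	}

	n->next = prev->next;
	prev->next = n;
}

/* Count nodes by walking the ring once, starting at the cursor. */
size_t ring_len(const struct ring *r)
{
	const struct node *p = r->cursor;
	size_t count = 0;

	if (!p)
		return 0;

	do {
		count++;
		p = p->next;
	} while (p != r->cursor);

	return count;
}
```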

^ permalink raw reply

* Re: [PATCH net v4] ipvs: fix MTU check for GSO packets in tunnel mode
From: Julian Anastasov @ 2026-04-15 15:35 UTC (permalink / raw)
  To: Yingnan Zhang
  Cc: pablo, coreteam, davem, edumazet, fw, horms, kuba, linux-kernel,
	lvs-devel, netdev, netfilter-devel, pabeni, phil
In-Reply-To: <tencent_7F7B107ECA750C095D05C19C3B723AFFA60A@qq.com>


	Hello,

On Wed, 15 Apr 2026, Yingnan Zhang wrote:

> Currently, IPVS skips MTU checks for GSO packets by excluding them with
> the !skb_is_gso(skb) condition. This creates problems when IPVS tunnel
> mode encapsulates GSO packets with IPIP headers.
> 
> The issue manifests in two ways:
> 
> 1. MTU violation after encapsulation:
>    When a GSO packet passes through IPVS tunnel mode, the original MTU
>    check is bypassed. After adding the IPIP tunnel header, the packet
>    size may exceed the outgoing interface MTU, leading to unexpected
>    fragmentation at the IP layer.
> 
> 2. Fragmentation with problematic IP IDs:
>    When net.ipv4.vs.pmtu_disc=1 and a GSO packet with multiple segments
>    is fragmented after encapsulation, each segment gets a sequentially
>    incremented IP ID (0, 1, 2, ...). This happens because:
> 
>    a) The GSO packet bypasses MTU check and gets encapsulated
>    b) At __ip_finish_output, the oversized GSO packet is split into
>       separate SKBs (one per segment), with IP IDs incrementing
>    c) Each SKB is then fragmented again based on the actual MTU
> 
>    This sequential IP ID allocation differs from the expected behavior
>    and can cause issues with fragment reassembly and packet tracking.
> 
> Fix this by properly validating GSO packets using
> skb_gso_validate_network_len(). This function correctly validates
> whether the GSO segments will fit within the MTU after segmentation. If
> validation fails, send an ICMP Fragmentation Needed message to enable
> proper PMTU discovery.
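
For readers following the description above, the decision can be
modelled in userspace roughly as below. This is a simplified sketch
under stated assumptions, not the kernel helper: `struct demo_pkt` and
`demo_exceeds_mtu()` are hypothetical, and the per-segment check
(headers plus `gso_size` compared against the MTU) only approximates
what skb_gso_validate_network_len() verifies.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified packet descriptor (not struct sk_buff). */
struct demo_pkt {
	size_t len;      /* total length at the network layer */
	size_t hdr_len;  /* network + transport header bytes per segment */
	size_t gso_size; /* payload bytes per segment, 0 if not GSO */
};

/* True when the packet cannot be sent within `mtu` without fragmenting. */
bool demo_exceeds_mtu(const struct demo_pkt *p, size_t mtu)
{
	if (p->len <= mtu)
		return false;

	/* A GSO packet is acceptable if every segment fits after splitting. */
	if (p->gso_size && p->hdr_len + p->gso_size <= mtu)
		return false;

	return true;
}
```

In this model, a GSO packet with 40 header bytes and a 1460-byte
`gso_size` passes against an MTU of 1500 but is rejected against 1480,
an illustrative value for the room left once a 20-byte outer IPv4
header is added.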
> 
> Fixes: 4cdd34084d53 ("netfilter: nf_conntrack_ipv6: improve fragmentation handling")
> Signed-off-by: Yingnan Zhang <342144303@qq.com>

	Looks good to me for the nf tree, thanks!

Acked-by: Julian Anastasov <ja@ssi.bg>

> ---
> v4:
> - Introduce a new helper function ip_vs_exceeds_mtu() to improve readability (reviewer feedback)
> 
> v3: https://lore.kernel.org/netdev/tencent_73010FBD5FA1C05C3BC23A07A50B11CEC90A@qq.com/
> v2: https://lore.kernel.org/netdev/tencent_CA2C1C219C99D315086BE55E8654AF7E6009@qq.com/
> v1: https://lore.kernel.org/netdev/tencent_4A3E1C339C75D359093BE4F08648AFAA6009@qq.com/
> ---
> ---
>  net/netfilter/ipvs/ip_vs_xmit.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
> index 0fb5162992e5..64dfdf8b00c4 100644
> --- a/net/netfilter/ipvs/ip_vs_xmit.c
> +++ b/net/netfilter/ipvs/ip_vs_xmit.c
> @@ -102,6 +102,18 @@ __ip_vs_dst_check(struct ip_vs_dest *dest)
>  	return dest_dst;
>  }
>  
> +/* Based on ip_exceeds_mtu(). */
> +static bool ip_vs_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
> +{
> +	if (skb->len <= mtu)
> +		return false;
> +
> +	if (skb_is_gso(skb) && skb_gso_validate_network_len(skb, mtu))
> +		return false;
> +
> +	return true;
> +}
> +
>  static inline bool
>  __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
>  {
> @@ -112,7 +124,7 @@ __mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
>  		if (IP6CB(skb)->frag_max_size > mtu)
>  			return true; /* largest fragment violate MTU */
>  	}
> -	else if (skb->len > mtu && !skb_is_gso(skb)) {
> +	else if (ip_vs_exceeds_mtu(skb, mtu)) {
>  		return true; /* Packet size violate MTU size */
>  	}
>  	return false;
> @@ -232,7 +244,7 @@ static inline bool ensure_mtu_is_adequate(struct netns_ipvs *ipvs, int skb_af,
>  			return true;
>  
>  		if (unlikely(ip_hdr(skb)->frag_off & htons(IP_DF) &&
> -			     skb->len > mtu && !skb_is_gso(skb) &&
> +			     ip_vs_exceeds_mtu(skb, mtu) &&
>  			     !ip_vs_iph_icmp(ipvsh))) {
>  			icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
>  				  htonl(mtu));
> -- 
> 2.51.0.windows.1

Regards

--
Julian Anastasov <ja@ssi.bg>


^ permalink raw reply

