Netdev List


* Re: [PATCH v1 net-next 4/5] ipv4: fib: Avoid calling fib_trie_table() in fib_new_table() for dying net.
From: Ido Schimmel @ 2026-06-15 12:39 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Ahern, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260612063225.455191-5-kuniyu@google.com>

On Fri, Jun 12, 2026 at 06:32:07AM +0000, Kuniyuki Iwashima wrote:
> We will call ip_fib_net_exit() from ->exit_rtnl().
> 
> All fib_table will be destroyed before devices are unregistered.
> 
> During device unregistration, inetdev_destroy() could call
> fib_del_ifaddr(), which calls fib_magic(RTM_DELROUTE).
> 
> fib_magic() calls fib_new_table(), but we do not want to create
> a new table after ip_fib_net_exit() destroys all tables.
> 
> As a prep, let's add check_net() before fib_trie_table() in
> fib_new_table().
> 
> fib_trie_table() is also called from fib_trie_unmerge(), but
> fib_get_table() fails first in fib_unmerge(), so the same
> problem does not occur there.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

^ permalink raw reply

* Re: [PATCH v1 net-next 5/5] ipv4: fib: Convert fib_net_exit_batch() to ->exit_rtnl().
From: Ido Schimmel @ 2026-06-15 12:39 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Ahern, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260612063225.455191-6-kuniyu@google.com>

On Fri, Jun 12, 2026 at 06:32:08AM +0000, Kuniyuki Iwashima wrote:
> Currently, IPv4 routes are flushed in ->exit_batch() after
> all devices are unregistered.
> 
> Unlike IPv6, IPv4 routes are not added from the fast path,
> so we can flush routes before default_device_exit_batch().
> 
> Let's call ip_fib_net_exit() from ->exit_rtnl() to save
> one RTNL locking dance.
> 
> ip_fib_net_exit() must use list_del_rcu() for fib_table
> for the fast path on dying dev.
> 
> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

^ permalink raw reply

* Re: [PATCH] net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
From: Simon Horman @ 2026-06-15 12:40 UTC (permalink / raw)
  To: Wayen.Yan
  Cc: netdev, lorenzo, pabeni, kuba, edumazet, andrew+netdev,
	angelogioacchino.delregno, matthias.bgg, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <6a2de8c5.2c570c9e.53b1a.0e1b@mx.google.com>

On Sun, Jun 14, 2026 at 07:30:54AM +0800, Wayen.Yan wrote:
> In airoha_dev_select_queue(), the expression:
> 
>   queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;
> 
> implicitly converts to unsigned arithmetic: when skb->priority is 0
> (the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
> and UINT_MAX % 8 = 7, routing default best-effort packets to the
> highest-priority QoS queue. This causes QoS inversion where the
> majority of traffic on a PON gateway starves actual high-priority
> flows (VoIP, gaming, etc.).
> 
> Fix by guarding the subtraction: when priority is 0, map to queue 0
> (lowest priority), otherwise apply the original (priority - 1) % 8
> mapping.
> 
> Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
> Signed-off-by: Wayen <win847@gmail.com>

Our CI guessed incorrectly that this was for the net-next tree,
where it doesn't apply cleanly.

Please post a v2 targeting the net tree like this:

Subject: [PATCH net v2] ...

I suggest including Lorenzo's Acked-by tag.

For more information, please see:
https://docs.kernel.org/process/maintainer-netdev.html

...

-- 
pw-bot: changes-requested

^ permalink raw reply

* Re: [PATCH net v2 1/1] net: ipv4: bound TCP reordering sysctl writes and MTU probe sizes
From: Eric Dumazet @ 2026-06-15 12:43 UTC (permalink / raw)
  To: Ren Wei
  Cc: netdev, kuniyu, david.laight.linux, ncardwell, pabeni,
	chia-yu.chang, ij, yuuchihsu, idosch, fmancera, herbert,
	yuantan098, zcliangcn, bird, bronzed_45_vested
In-Reply-To: <1a5b7e1ef4d70fbad8c8ee0b82d8405f3c964a3d.1781395200.git.bronzed_45_vested@icloud.com>

On Mon, Jun 15, 2026 at 3:31 AM Ren Wei <n05ec@lzu.edu.cn> wrote:
>
> From: Wyatt Feng <bronzed_45_vested@icloud.com>
>
> Reject invalid `net.ipv4.tcp_reordering` values before they reach TCP
> socket state. The sysctl is stored as an `int` but copied into the
> `u32` `tp->reordering` field for new sockets, so negative writes wrap
> to large values.
>
> With `tcp_mtu_probing=2`, the wrapped value can overflow the
> `tcp_mtu_probe()` size calculation and drive the MTU probing path into
> an out-of-bounds read. Route `tcp_reordering` writes through
> `proc_dointvec_minmax()` and require it to be at least 1. Also require
> `tcp_max_reordering` to be at least 1 so the configured maximum cannot
> become negative either.
>
> When registering the table for a non-init network namespace, relocate
> `extra2` pointers that refer into `init_net.ipv4` so the
> `tcp_reordering` upper bound follows that namespace's
> `tcp_max_reordering`.
>
> Harden `tcp_mtu_probe()` itself by computing `size_needed` as `u64`.
> This keeps the send queue and window checks from being bypassed through
> signed integer overflow.
>
> Fixes: 91cc17c0e5e5 ("[TCP]: MTUprobe: receiver window & data available checks fixed")
> Cc: stable@vger.kernel.org
> Reported-by: Yuan Tan <yuantan098@gmail.com>
> Reported-by: Zhengchuan Liang <zcliangcn@gmail.com>
> Reported-by: Xin Liu <bird@lzu.edu.cn>
> Suggested-by: Eric Dumazet <edumazet@google.com>
> Assisted-by: Codex:GPT-5.4
> Signed-off-by: Wyatt Feng <bronzed_45_vested@icloud.com>
> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn>
> ---

Reviewed-by: Eric Dumazet <edumazet@google.com>

^ permalink raw reply
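
The wrap described in the commit message above can be sketched in user space. This is only an illustration under stated assumptions: the helper names, the probe size of 1500 and the MSS of 1460 are hypothetical, and the formula probe_size + (reordering + 1) * mss is an assumed shape of the size calculation, not a quote from the patch. The 32-bit variant uses unsigned arithmetic so that the wrap is well defined in the demo.

```c
#include <assert.h>
#include <stdint.h>

/* A negative sysctl value stored as int, copied into a u32 field. */
static uint32_t copy_to_u32(int sysctl_val)
{
	return (uint32_t)sysctl_val;
}

/* 32-bit size computation (assumed shape); wraps for huge reordering. */
static uint32_t size_needed_32(uint32_t probe_size, uint32_t reordering,
			       uint32_t mss)
{
	return probe_size + (reordering + 1) * mss;
}

/* Same computation widened to 64 bits, so the result cannot wrap. */
static uint64_t size_needed_64(uint32_t probe_size, uint32_t reordering,
			       uint32_t mss)
{
	return (uint64_t)probe_size + ((uint64_t)reordering + 1) * mss;
}

/* Walk through the case of writing -1 to the sysctl. */
static void reordering_demo(void)
{
	uint32_t reordering = copy_to_u32(-1);

	/* -1 becomes the largest u32 value. */
	assert(reordering == UINT32_MAX);
	/* reordering + 1 wraps to 0, so only probe_size remains. */
	assert(size_needed_32(1500, reordering, 1460) == 1500);
	/* The widened result is far larger than any 32-bit value. */
	assert(size_needed_64(1500, reordering, 1460) > UINT32_MAX);
}
```

In this sketch the 32-bit result collapses to the probe size alone, which is how a later "is there enough data or window" comparison could be passed by accident, while the 64-bit result stays large enough to fail such a comparison.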

* RE: [Intel-wired-lan] [PATCH net-next] i40e: add devlink parameter for Flow Director ATR sample rate
From: Loktionov, Aleksandr @ 2026-06-15 12:44 UTC (permalink / raw)
  To: mheib@redhat.com, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, jiri@resnulli.us, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	horms@kernel.org, corbet@lwn.net, Nguyen, Anthony L,
	Kitszel, Przemyslaw, andrew+netdev@lunn.ch
In-Reply-To: <20260614161131.192068-1-mheib@redhat.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of mheib@redhat.com
> Sent: Sunday, June 14, 2026 6:12 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; jiri@resnulli.us; davem@davemloft.net;
> edumazet@google.com; kuba@kernel.org; pabeni@redhat.com;
> horms@kernel.org; corbet@lwn.net; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw
> <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch; Mohammad Heib
> <mheib@redhat.com>
> Subject: [Intel-wired-lan] [PATCH net-next] i40e: add devlink
> parameter for Flow Director ATR sample rate
> 
> From: Mohammad Heib <mheib@redhat.com>
> 
> The i40e driver uses Flow Director ATR to periodically update flow
> steering information for active TCP flows. The update frequency is
> currently controlled by I40E_DEFAULT_ATR_SAMPLE_RATE and is fixed at
> driver build time.
> 
> On systems with a large number of queues and high-rate TCP workloads,
> the default sampling interval can result in frequent Flow Director
> reprogramming for long-lived flows.
> 
> The amount of TCP packet reordering observed on some systems is
> sensitive to the ATR sampling interval. Increasing the interval
> reduces Flow Director programming activity and can significantly
> reduce the associated reordering.
> 
> Since the optimal sampling interval depends on the workload and system
> configuration, a single fixed value is not suitable for all
> deployments.
> 
> Add a devlink parameter to allow administrators to tune the ATR sample
> rate at runtime without rebuilding the driver or disabling ATR
> functionality entirely.
> 
> Signed-off-by: Mohammad Heib <mheib@redhat.com>
> ---
>  Documentation/networking/devlink/i40e.rst     | 19 ++++++
>  drivers/net/ethernet/intel/i40e/i40e.h        |  1 +
>  .../net/ethernet/intel/i40e/i40e_devlink.c    | 65
> +++++++++++++++++++
>  drivers/net/ethernet/intel/i40e/i40e_main.c   |  4 +-
>  drivers/net/ethernet/intel/i40e/i40e_txrx.h   |  4 +-
>  5 files changed, 90 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/networking/devlink/i40e.rst
> b/Documentation/networking/devlink/i40e.rst
> index 51c887f0dc83..704469aa9acf 100644
> --- a/Documentation/networking/devlink/i40e.rst
> +++ b/Documentation/networking/devlink/i40e.rst
> @@ -40,6 +40,25 @@ Parameters
> 
>          The default value is ``0`` (internal calculation is used).
> 

...

> +static int i40e_atr_sample_rate_set(struct devlink *devlink,
> +				    u32 id,
> +				    struct devlink_param_gset_ctx *ctx,
> +				    struct netlink_ext_ack *extack) {
> +	struct i40e_pf *pf = devlink_priv(devlink);
> +	struct i40e_vsi *vsi;
> +	u32 sample_rate = ctx->val.vu32;
> +	int i;
> +
> +	pf->atr_sample_rate = sample_rate;
> +
> +	if (!test_bit(I40E_FLAG_FD_ATR_ENA, pf->flags))
> +		return 0;
> +
> +	vsi = i40e_pf_get_main_vsi(pf);
> +	if (!vsi)
> +		return 0;
> +
> +	for (i = 0; i < vsi->num_queue_pairs; i++) {
> +		if (!vsi->tx_rings[i])
> +			continue;
I'm afraid the devlink runtime callback does not hold rtnl_lock.
So the if (!vsi->tx_rings[i]) check can see a non-NULL pointer while, for example, i40e_down() is running.

> +		vsi->tx_rings[i]->atr_sample_rate = sample_rate;
> +		vsi->tx_rings[i]->atr_count = 0;
So the result is undefined behaviour: a possible NULL dereference or use-after-free.


> +	}
> +
> +	return 0;
> +}
> +

...

> 
>  	bool ring_active;		/* is ring online or not */
>  	bool arm_wb;		/* do something to arm write back */
> --
> 2.53.0


^ permalink raw reply

* Re: [PATCH v1 net-next 0/5] ipv4: fib: Remove RTNL in fib_net_exit_batch().
From: Eric Dumazet @ 2026-06-15 12:46 UTC (permalink / raw)
  To: Kuniyuki Iwashima
  Cc: David Ahern, Ido Schimmel, David S . Miller, Jakub Kicinski,
	Paolo Abeni, Simon Horman, Kuniyuki Iwashima, netdev
In-Reply-To: <20260612063225.455191-1-kuniyu@google.com>

On Thu, Jun 11, 2026 at 11:32 PM Kuniyuki Iwashima <kuniyu@google.com> wrote:
>
> Currently, we flush all IPv4 routes at ->exit_batch() during
> netns dismantle, which requires an extra RTNL.
>
> IPv4 routes are not added from the fast path unlike IPv6, so
> we can flush routes before default_device_exit_batch().
>
> However, there is implicit ordering between ip_fib_net_exit()
> and default_device_exit_batch().
>
> This series detangles it and moves ip_fib_net_exit() to
>  ->exit_rtnl() to save the RTNL dance.
>
> The same change for IPv6 will need more work.
>

For the series:

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks!

^ permalink raw reply

* [PATCH net v2] net: airoha: Fix skb->priority underflow in airoha_dev_select_queue()
From: Wayen Yan @ 2026-06-15 12:48 UTC (permalink / raw)
  To: netdev; +Cc: lorenzo, horms, linux-arm-kernel, linux-mediatek

From b894fc031e307f1b6756ea9fcac98e82e23815e1 Mon Sep 17 00:00:00 2001
From: "Wayen.Yan" <win847@gmail.com>
Date: Sun, 14 Jun 2026 07:30:54 +0800
Subject: [PATCH net v2] net: airoha: Fix skb->priority underflow in
 airoha_dev_select_queue()

In airoha_dev_select_queue(), the expression:

  queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES;

implicitly converts to unsigned arithmetic: when skb->priority is 0
(the default for unclassified traffic), (0u - 1u) wraps to UINT_MAX,
and UINT_MAX % 8 = 7, routing default best-effort packets to the
highest-priority QoS queue. This causes QoS inversion where the
majority of traffic on a PON gateway starves actual high-priority
flows (VoIP, gaming, etc.).

Fix by guarding the subtraction: when priority is 0, map to queue 0
(lowest priority), otherwise apply the original (priority - 1) % 8
mapping.

Fixes: 2b288b81560b ("net: airoha: Introduce ndo_select_queue callback")
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Wayen <win847@gmail.com>
---
 drivers/net/ethernet/airoha/airoha_eth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/airoha/airoha_eth.c b/drivers/net/ethernet/airoha/airoha_eth.c
index 31cdb11cd7..d476ef83c3 100644
--- a/drivers/net/ethernet/airoha/airoha_eth.c
+++ b/drivers/net/ethernet/airoha/airoha_eth.c
@@ -1933,7 +1933,7 @@ static u16 airoha_dev_select_queue(struct net_device *dev, struct sk_buff *skb,
 	 */
 	channel = netdev_uses_dsa(dev) ? skb_get_queue_mapping(skb) : port->id;
 	channel = channel % AIROHA_NUM_QOS_CHANNELS;
-	queue = (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES; /* QoS queue */
+	queue = skb->priority ? (skb->priority - 1) % AIROHA_NUM_QOS_QUEUES : 0;
 	queue = channel * AIROHA_NUM_QOS_QUEUES + queue;
 
 	return queue < dev->num_tx_queues ? queue : 0;
-- 
2.51.0


^ permalink raw reply related
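
The arithmetic from the commit message above can be reproduced in user space. A minimal sketch, assuming a 32-bit unsigned priority and 8 QoS queues as stated in the message; NUM_QOS_QUEUES and the function names are hypothetical stand-ins, not driver symbols.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_QOS_QUEUES 8u	/* stand-in for AIROHA_NUM_QOS_QUEUES */

/* Original mapping: priority 0 wraps to UINT32_MAX before the modulo. */
static uint32_t queue_before_fix(uint32_t priority)
{
	return (priority - 1) % NUM_QOS_QUEUES;
}

/* Patched mapping: priority 0 is pinned to queue 0. */
static uint32_t queue_after_fix(uint32_t priority)
{
	return priority ? (priority - 1) % NUM_QOS_QUEUES : 0;
}

/* Check the values quoted in the commit message. */
static void queue_demo(void)
{
	/* 0 - 1 wraps to UINT32_MAX, and UINT32_MAX % 8 is 7. */
	assert(queue_before_fix(0) == 7);
	/* With the guard, default traffic lands in queue 0. */
	assert(queue_after_fix(0) == 0);
	/* Non-zero priorities map the same way before and after. */
	for (uint32_t p = 1; p <= 16; p++)
		assert(queue_before_fix(p) == queue_after_fix(p));
}
```

Only the priority 0 case changes: it moves from queue 7 to queue 0, and every non-zero priority keeps its original (priority - 1) % 8 mapping.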

* RE: [PATCH net] igb: only strip Rx timestamp header on the first buffer of a frame
From: Tjerk Kusters @ 2026-06-15 12:48 UTC (permalink / raw)
  To: Kurt Kanzenbach, netdev@vger.kernel.org
  Cc: intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
	przemyslaw.kitszel@intel.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, richardcochran@gmail.com, hawk@kernel.org,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <8733yojljf.fsf@jax.kurt.home>

Hello,

> 
> Great explanation! igb_clean_rx_irq_zc() does not need the same treatment,
> correct?
> 
> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
> 

It looks like each frame fits in a single buffer in the ZC path, so igb_ptp_rx_pktstamp() only runs on that one buffer. The bug only hits the regular RX path, which can spread a frame over multiple buffers and calls igb_ptp_rx_pktstamp() on each one; a continuation buffer can then be misread as a timestamp header.

On affected systems I switched to Intel's out-of-tree igb 5.19.10 driver, which is not affected. I also reproduced both the failure and the 5.19.10 success with a simple UDP test client/server (jumbo all-zero payloads), matching what the real GigE camera showed.

Regards
Tjerk

^ permalink raw reply

* [PATCH net v3] net: pch_gbe: handle TX skb allocation failure
From: Ruoyu Wang @ 2026-06-15 12:50 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Simon Horman, Masayuki Ohtake, netdev, linux-kernel, Ruoyu Wang

pch_gbe_alloc_tx_buffers() allocates an skb for each TX descriptor and
then passes the returned pointer to skb_reserve(). If netdev_alloc_skb()
fails, skb_reserve() dereferences NULL.

Make pch_gbe_alloc_tx_buffers() return an error when an skb allocation
fails. On failure, let pch_gbe_alloc_tx_buffers() clean the partially
allocated TX ring before returning the error. While bringing the device
up, release the RX buffer pool through a shared cleanup helper before
unwinding the IRQ setup.

Fixes: 77555ee72282 ("net: Add Gigabit Ethernet driver of Topcliff PCH")
Signed-off-by: Ruoyu Wang <ruoyuw560@gmail.com>
---
Changes in v3:
- Move the partial TX ring cleanup into pch_gbe_alloc_tx_buffers(), as
  suggested by Simon Horman.

Changes in v2:
- Add the kernel-doc return value description for
  pch_gbe_alloc_tx_buffers().

 .../ethernet/oki-semi/pch_gbe/pch_gbe_main.c  | 38 ++++++++++++++-----
 1 file changed, 29 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
index e5a6f59..98091fb 100644
--- a/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
+++ b/drivers/net/ethernet/oki-semi/pch_gbe/pch_gbe_main.c
@@ -1411,13 +1411,25 @@ pch_gbe_alloc_rx_buffers_pool(struct pch_gbe_adapter *adapter,
 	return 0;
 }
 
+static void pch_gbe_free_rx_buffers_pool(struct pch_gbe_adapter *adapter,
+					 struct pch_gbe_rx_ring *rx_ring)
+{
+	dma_free_coherent(&adapter->pdev->dev, rx_ring->rx_buff_pool_size,
+			  rx_ring->rx_buff_pool, rx_ring->rx_buff_pool_logic);
+	rx_ring->rx_buff_pool_logic = 0;
+	rx_ring->rx_buff_pool_size = 0;
+	rx_ring->rx_buff_pool = NULL;
+}
+
 /**
  * pch_gbe_alloc_tx_buffers - Allocate transmit buffers
  * @adapter:   Board private structure
  * @tx_ring:   Tx descriptor ring
+ *
+ * Return: 0 on success, -ENOMEM if a TX skb allocation fails.
  */
-static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
-					struct pch_gbe_tx_ring *tx_ring)
+static int pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
+				    struct pch_gbe_tx_ring *tx_ring)
 {
 	struct pch_gbe_buffer *buffer_info;
 	struct sk_buff *skb;
@@ -1431,12 +1443,17 @@ static void pch_gbe_alloc_tx_buffers(struct pch_gbe_adapter *adapter,
 	for (i = 0; i < tx_ring->count; i++) {
 		buffer_info = &tx_ring->buffer_info[i];
 		skb = netdev_alloc_skb(adapter->netdev, bufsz);
+		if (!skb) {
+			pch_gbe_clean_tx_ring(adapter, tx_ring);
+			return -ENOMEM;
+		}
 		skb_reserve(skb, PCH_GBE_DMA_ALIGN);
 		buffer_info->skb = skb;
 		tx_desc = PCH_GBE_TX_DESC(*tx_ring, i);
 		tx_desc->gbec_status = (DSC_INIT16);
 	}
-	return;
+
+	return 0;
 }
 
 /**
@@ -1878,7 +1895,12 @@ int pch_gbe_up(struct pch_gbe_adapter *adapter)
 			   "Error: can't bring device up - alloc rx buffers pool failed\n");
 		goto freeirq;
 	}
-	pch_gbe_alloc_tx_buffers(adapter, tx_ring);
+	err = pch_gbe_alloc_tx_buffers(adapter, tx_ring);
+	if (err) {
+		netdev_err(netdev,
+			   "Error: can't bring device up - alloc tx buffers failed\n");
+		goto freebuf;
+	}
 	pch_gbe_alloc_rx_buffers(adapter, rx_ring, rx_ring->count);
 	adapter->tx_queue_len = netdev->tx_queue_len;
 	pch_gbe_enable_dma_rx(&adapter->hw);
@@ -1892,6 +1914,8 @@ int pch_gbe_up(struct pch_gbe_adapter *adapter)
 
 	return 0;
 
+freebuf:
+	pch_gbe_free_rx_buffers_pool(adapter, rx_ring);
 freeirq:
 	pch_gbe_free_irq(adapter);
 out:
@@ -1927,11 +1951,7 @@ void pch_gbe_down(struct pch_gbe_adapter *adapter)
 	pch_gbe_clean_tx_ring(adapter, adapter->tx_ring);
 	pch_gbe_clean_rx_ring(adapter, adapter->rx_ring);
 
-	dma_free_coherent(&adapter->pdev->dev, rx_ring->rx_buff_pool_size,
-			  rx_ring->rx_buff_pool, rx_ring->rx_buff_pool_logic);
-	rx_ring->rx_buff_pool_logic = 0;
-	rx_ring->rx_buff_pool_size = 0;
-	rx_ring->rx_buff_pool = NULL;
+	pch_gbe_free_rx_buffers_pool(adapter, rx_ring);
 }
 
 /**
-- 
2.51.0

^ permalink raw reply related
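
The error-handling pattern in the patch above (check every allocation, clean the partially filled ring, return an error) can be sketched as a user-space analogue. This is an illustration only: malloc() stands in for netdev_alloc_skb(), and struct ring, fail_at and all function names are hypothetical, not part of the pch_gbe driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SIZE 4

struct ring {
	void *buf[RING_SIZE];
};

/* Hypothetical hook: index at which the allocator fails, or -1 for never. */
static int fail_at = -1;

static void *ring_alloc_buf(int i)
{
	return i == fail_at ? NULL : malloc(64);
}

/* Free every slot; NULL slots are fine because free(NULL) is a no-op. */
static void ring_clean(struct ring *r)
{
	for (int i = 0; i < RING_SIZE; i++) {
		free(r->buf[i]);
		r->buf[i] = NULL;
	}
}

/* Fill a zero-initialised ring; return 0, or -ENOMEM after cleaning up. */
static int ring_fill(struct ring *r)
{
	for (int i = 0; i < RING_SIZE; i++) {
		r->buf[i] = ring_alloc_buf(i);
		if (!r->buf[i]) {
			ring_clean(r);
			return -ENOMEM;
		}
	}
	return 0;
}

/* Exercise the failure path first, then the success path. */
static void ring_demo(void)
{
	struct ring r = { { NULL } };

	fail_at = 2;
	assert(ring_fill(&r) == -ENOMEM);
	for (int i = 0; i < RING_SIZE; i++)
		assert(r.buf[i] == NULL);

	fail_at = -1;
	assert(ring_fill(&r) == 0);
	ring_clean(&r);
}
```

Doing the cleanup inside the fill function means a caller only needs to check the return value and unwind its own earlier steps.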

* Re: [PATCH net-next v4] net: mana: Add Interrupt Moderation support
From: Simon Horman @ 2026-06-15 12:51 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: linux-hyperv, netdev, K. Y. Srinivasan, Haiyang Zhang, Wei Liu,
	Dexuan Cui, Long Li, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Konstantin Taranov, Shradha Gupta,
	Erni Sri Satya Vennela, Dipayaan Roy, Aditya Garg, Breno Leitao,
	linux-kernel, linux-rdma, paulros
In-Reply-To: <20260613205812.2659945-1-haiyangz@linux.microsoft.com>

On Sat, Jun 13, 2026 at 01:57:54PM -0700, Haiyang Zhang wrote:
> From: Haiyang Zhang <haiyangz@microsoft.com>
> 
> Add Static and Dynamic Interrupt Moderation (DIM) support for
> Rx and Tx.
> Update queue creation procedure with new data struct with the related
> settings.
> Add functions to collect stat for DIM, and workers to update DIM data
> and settings.
> Update ethtool handler to get/set the moderation settings from a user.
> To avoid detach/re-attach ops, ring DIM doorbell to change settings
> at run time.
> By default, adaptive-rx/tx (DIM) are enabled if supported by HW.
> 
> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
> ---
> v4:
>   Fixed tx stat, concurrency, and mb issues from Simon's review.

...

Thanks for your comprehensive reply to the AI-generated review of v3
that I forwarded, and for fixing the issues present in v3.

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply

* Re: [linux-next:master] [selftests]  a3f88d89f6: kernel-selftests-bpf.net.test_bridge_neigh_suppress.sh.arping.fail
From: Oliver Sang @ 2026-06-15 13:02 UTC (permalink / raw)
  To: Danielle Ratson
  Cc: oe-lkp@lists.linux.dev, lkp@intel.com, Jakub Kicinski,
	Nikolay Aleksandrov, netdev@vger.kernel.org, oliver.sang
In-Reply-To: <MN2PR12MB4517EDE21D4DA962A55CF1ECD81B2@MN2PR12MB4517.namprd12.prod.outlook.com>

hi, Danielle,

On Thu, Jun 11, 2026 at 11:44:39AM +0000, Danielle Ratson wrote:
> Hi Oliver,
> 
> Thank you for the report.
> 
> The failures appear to be caused by an arping tool version mismatch.
> The test was written assuming iputils arping semantics, but not all distributions ship that version. Different arping implementations have incompatible behavior for the flags used throughout test_bridge_neigh_suppress.sh.
> 
> Looking at the added log, the 56 failures are not limited to the neigh_suppress_arp_probe section.
> The other arping-based test cases in the file are also affected, which is consistent with a tool version issue rather than a kernel regression.
> 
> To confirm the root cause on your end, please share the results for running the below:
> $ arping -V
> $ ./test_bridge_neigh_suppress.sh -t neigh_suppress_arp -v

Sorry for the late reply.

Our tests run in an automated framework, so I had to add some code to print
the above information. So far it only generates the output below.
Before we try further, we would like your advice: is this information
enough?

KERNEL SELFTESTS: linux_headers_dir is /usr/src/linux-headers-x86_64-rhel-9.4-bpf-a3f88d89f698743a8cd91fb43f997e2d292a168d
### arping -V
arping: option requires an argument -- 'V'
ARPing 2.25, by Thomas Habets <thomas@habets.se>
usage: arping [ -0aAbdDeFpPqrRuUvzZ ] [ -w <sec> ] [ -W <sec> ] [ -S <host/ip> ]
              [ -T <host/ip ] [ -s <MAC> ] [ -t <MAC> ] [ -c <count> ]
              [ -C <count> ] [ -i <interface> ] [ -m <type> ] [ -g <group> ]
              [ -V <vlan> ] [ -Q <priority> ] <host/ip/MAC | -B>
For complete usage info, use --help or check the manpage.
### ./test_bridge_neigh_suppress.sh -t neigh_suppress_arp -v
                                                <-------- there seems to be no output here
Per-port ARP suppression - VLAN 10              <-------- the tests seem to start right away
----------------------------------
COMMAND: tc -n sw1-U1mYwE qdisc replace dev vx0 clsact


> 
> Thanks,
> Danielle
> 
> > -----Original Message-----
> > From: kernel test robot <oliver.sang@intel.com>
> > Sent: Thursday, 11 June 2026 10:23
> > To: Danielle Ratson <danieller@nvidia.com>
> > Cc: oe-lkp@lists.linux.dev; lkp@intel.com; Jakub Kicinski <kuba@kernel.org>;
> > Nikolay Aleksandrov <razor@blackwall.org>; netdev@vger.kernel.org;
> > oliver.sang@intel.com
> > Subject: [linux-next:master] [selftests] a3f88d89f6: kernel-selftests-
> > bpf.net.test_bridge_neigh_suppress.sh.arping.fail
> > 
> > 
> > hi, Danielle Ratson,
> > 
> > for new added tests, we still found some failures in our tests, not sure if any
> > dependencies we missed? thanks
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed "kernel-selftests-
> > bpf.net.test_bridge_neigh_suppress.sh.arping.fail" on:
> > 
> > commit: a3f88d89f698743a8cd91fb43f997e2d292a168d ("selftests: net:
> > Add tests for ARP probe and DAD NS handling")
> > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > 
> > in testcase: kernel-selftests-bpf
> > version:
> > with following parameters:
> > 
> > 	group: net
> > 
> > 
> > config: x86_64-rhel-9.4-bpf
> > compiler: gcc-14
> > test machine: 16 threads Intel(R) Core(TM) i7-13620H (Raptor Lake) with 32G
> > memory
> > 
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > 
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes:
> > | https://lore.kernel.org/oe-lkp/202606110955.8f29025d-lkp@intel.com
> > 
> > 
> > # timeout set to 3600
> > # selftests: net: test_bridge_neigh_suppress.sh #
> > 
> > [...]
> > 
> > #
> > # Per-port ARP probe suppression
> > # ------------------------------
> > # TEST: ARP probe suppression                                         [FAIL]
> > # TEST: "neigh_suppress" is on                                        [ OK ]
> > # TEST: ARP probe suppression                                         [FAIL]
> > # TEST: FDB and neighbor entry installation                           [ OK ]
> > # TEST: arping                                                        [FAIL]
> > # TEST: ARP probe suppression                                         [FAIL]
> > # TEST: neighbor removal                                              [ OK ]
> > # TEST: ARP probe suppression                                         [FAIL]
> > # TEST: "neigh_suppress" is off                                       [ OK ]
> > # TEST: ARP probe suppression                                         [FAIL]
> > #
> > # Per-port DAD NS suppression
> > # ---------------------------
> > # TEST: DAD NS suppression                                            [ OK ]
> > # TEST: "neigh_suppress" is on                                        [ OK ]
> > # TEST: DAD NS suppression                                            [ OK ]
> > # TEST: FDB and neighbor entry installation                           [ OK ]
> > # TEST: DAD NS suppression                                            [ OK ]
> > # TEST: DAD NS proxy NA reply                                         [ OK ]
> > # TEST: neighbor removal                                              [ OK ]
> > # TEST: DAD NS suppression                                            [ OK ]
> > # TEST: "neigh_suppress" is off                                       [ OK ]
> > # TEST: DAD NS suppression                                            [ OK ]
> > #
> > # Tests passed: 124
> > # Tests failed:  56
> > not ok 110 selftests: net: test_bridge_neigh_suppress.sh # exit=1
> > 
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-
> > ci/archive/20260611/202606110955.8f29025d-lkp@intel.com
> > 
> > 
> > 
> > --
> > 0-DAY CI Kernel Test Service
> > https://github.com/intel/lkp-tests/wiki
> 

^ permalink raw reply

* [PATCH net-next v2] net: dsa: yt921x: Add limited ACL flow statistics support
From: David Yang @ 2026-06-15 13:03 UTC (permalink / raw)
  To: netdev
  Cc: David Yang, Andrew Lunn, Vladimir Oltean, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni, linux-kernel

The yt921x supports flow statistics, which can be used to implement
.cls_flower_stats(). However, the number of flow counters is limited,
and one must choose between byte mode and packet mode. As there is
currently no interface for expressing a statistics preference, we pick
a mode ourselves.

Signed-off-by: David Yang <mmyangfl@gmail.com>
---
v1: https://lore.kernel.org/r/20260610202508.845328-1-mmyangfl@gmail.com
  - use a safe counter reader
  - use flow_stats_update()
  - do not use hardware counter if assisted by software
  - set extack on exhausted counter
 drivers/net/dsa/yt921x.c | 135 ++++++++++++++++++++++++++++++++++++++-
 drivers/net/dsa/yt921x.h |  17 +++++
 2 files changed, 151 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/yt921x.c b/drivers/net/dsa/yt921x.c
index 9929676a15e1..3dddedd0776a 100644
--- a/drivers/net/dsa/yt921x.c
+++ b/drivers/net/dsa/yt921x.c
@@ -266,6 +266,36 @@ yt921x_reg_toggle_bits(struct yt921x_priv *priv, u32 reg, u32 mask, bool set)
 	return yt921x_reg_update_bits(priv, reg, mask, !set ? 0 : mask);
 }
 
+/* Reliably read a 64bit counter */
+static int yt921x_counter_read(struct yt921x_priv *priv, u32 reg, u64 *valp)
+{
+	u32 old_lo;
+	int res;
+	u32 hi;
+	u32 lo;
+
+	res = yt921x_reg_read(priv, reg, &old_lo);
+	if (res)
+		return res;
+
+	for (int i = 0; i < 16; i++) {
+		res = yt921x_reg_read(priv, reg + 4, &hi);
+		if (res)
+			return res;
+		res = yt921x_reg_read(priv, reg, &lo);
+		if (res)
+			return res;
+
+		if (lo >= old_lo) {
+			*valp = ((u64)hi << 32) | lo;
+			return 0;
+		}
+		old_lo = lo;
+	}
+
+	return -ETIMEDOUT;
+}
+
 /* Some multi-word registers, like VLANn_CTRL, should be treated as a single
  * long register. More specifically, writes to parts of its words won't become
  * visible, until the last word is written.
@@ -2167,6 +2197,8 @@ yt921x_acl_rule_ext_parse_flow(struct yt921x_acl_rule_ext *ruleext, int port,
 	yt921x_acl_rule_set_ports(&ruleext->r, 0, BIT(port));
 	ruleext->r.tag = cls->cookie;
 	ruleext->r.type = TC_SETUP_CLSFLOWER;
+	/* Align with sja1105 */
+	ruleext->r.stat_pkt_mode = true;
 	return 0;
 }
 
@@ -2224,6 +2256,47 @@ yt921x_acl_reserve(struct yt921x_priv *priv, unsigned int entscnt,
 	return UINT_MAX;
 }
 
+static int
+yt921x_acl_stat(struct yt921x_priv *priv, enum tc_setup_type type,
+		unsigned long tag, struct flow_stats *stats)
+{
+	struct yt921x_acl_rule *aclrule;
+	const struct yt921x_acl_blk *aclblk;
+	unsigned int statid;
+	unsigned int binid;
+	unsigned int blkid;
+	unsigned int entid;
+	u64 diff;
+	u64 stat;
+	int res;
+
+	entid = yt921x_acl_find(priv, type, tag);
+	if (entid == UINT_MAX)
+		return -ENOENT;
+
+	blkid = entid / YT921X_ACL_ENT_PER_BLK;
+	binid = entid % YT921X_ACL_ENT_PER_BLK;
+	aclblk = priv->acl_blks[blkid];
+	aclrule = aclblk->rules[binid];
+
+	if (!(aclrule->action[0] & YT921X_ACL_ACTa_FLOWSTAT_EN))
+		return -EOPNOTSUPP;
+
+	statid = FIELD_GET(YT921X_ACL_ACTa_FLOWSTAT_ID_M, aclrule->action[0]);
+	res = yt921x_counter_read(priv, YT921X_FLOWSTATn_STAT(statid), &stat);
+	if (res)
+		return res;
+
+	diff = stat - aclrule->laststat;
+	if (diff)
+		aclrule->lastused = jiffies;
+	aclrule->laststat = stat;
+	flow_stats_update(stats, aclrule->stat_pkt_mode ? 0 : diff,
+			  !aclrule->stat_pkt_mode ? 0 : diff, 0,
+			  aclrule->lastused, FLOW_ACTION_HW_STATS_IMMEDIATE);
+	return 0;
+}
+
 static int
 yt921x_acl_commit(struct yt921x_priv *priv, unsigned int entid, u8 entsmask)
 {
@@ -2336,6 +2409,10 @@ yt921x_acl_del(struct yt921x_priv *priv, enum tc_setup_type type,
 		clear_bit(FIELD_GET(YT921X_ACL_ACTa_METER_ID_M,
 				    aclrule->action[0]),
 			  priv->meters_map);
+	if (aclrule->action[0] & YT921X_ACL_ACTa_FLOWSTAT_EN)
+		clear_bit(FIELD_GET(YT921X_ACL_ACTa_FLOWSTAT_ID_M,
+				    aclrule->action[0]),
+			  priv->flowstats_map);
 	priv->acl_masks[blkid] &= ~aclrule->mask;
 	kvfree(aclrule);
 	if (!priv->acl_masks[blkid]) {
@@ -2353,13 +2430,15 @@ yt921x_acl_add(struct yt921x_priv *priv,
 	unsigned int entscnt = hweight8(ruleext->r.mask);
 	struct yt921x_acl_rule *aclrule;
 	struct yt921x_acl_blk *aclblk;
-	bool use_trap = false;
 	unsigned int meterid;
+	unsigned int statid;
 	unsigned long mask;
 	unsigned int binid;
 	unsigned int blkid;
 	unsigned int entid;
 	unsigned int o;
+	bool use_trap;
+	u32 ctrl;
 	int res;
 
 	/* Allocate resources */
@@ -2367,6 +2446,10 @@ yt921x_acl_add(struct yt921x_priv *priv,
 	if (entid == UINT_MAX)
 		return -EOPNOTSUPP;
 
+	use_trap = (ruleext->r.action[2] & YT921X_ACL_ACTc_FWD_EN) &&
+		   (FIELD_GET(YT921X_ACL_ACTc_FWD_M,
+			      ruleext->r.action[2]) == YT921X_ACL_ACTc_FWD_TRAP);
+
 	if (!(ruleext->r.action[0] & YT921X_ACL_ACTa_METER_EN)) {
 		meterid = YT921X_METER_NUM;
 	} else {
@@ -2386,6 +2469,35 @@ yt921x_acl_add(struct yt921x_priv *priv,
 		}
 	}
 
+	if (ruleext->r.sw_assisted && use_trap) {
+		statid = YT921X_FLOWSTAT_NUM;
+	} else {
+		statid = find_first_zero_bit(priv->flowstats_map,
+					     YT921X_FLOWSTAT_NUM);
+		if (statid >= YT921X_FLOWSTAT_NUM) {
+			NL_SET_ERR_MSG_MOD(extack,
+					   "No more flowstats available");
+		} else {
+			u32 zeros[2] = {};
+
+			ctrl = YT921X_FLOWSTAT_CTRL_TYPE_FLOW |
+			       YT921X_FLOWSTAT_CTRL_EN;
+			if (ruleext->r.stat_pkt_mode)
+				ctrl |= YT921X_FLOWSTAT_CTRL_PKT_MODE;
+			res = yt921x_reg_write(priv,
+					       YT921X_FLOWSTATn_CTRL(statid),
+					       ctrl);
+			if (res)
+				return res;
+
+			res = yt921x_reg64_write(priv,
+						 YT921X_FLOWSTATn_STAT(statid),
+						 zeros);
+			if (res)
+				return res;
+		}
+	}
+
 	/* Prepare acl block ctrlblk */
 	blkid = entid / YT921X_ACL_ENT_PER_BLK;
 	binid = entid % YT921X_ACL_ENT_PER_BLK;
@@ -2426,6 +2538,9 @@ yt921x_acl_add(struct yt921x_priv *priv,
 		aclrule->action[0] |= YT921X_ACL_ACTa_METER_ID(meterid);
 	else
 		aclrule->action[0] &= ~YT921X_ACL_ACTa_METER_EN;
+	if (statid < YT921X_FLOWSTAT_NUM)
+		aclrule->action[0] |= YT921X_ACL_ACTa_FLOWSTAT_EN |
+				      YT921X_ACL_ACTa_FLOWSTAT_ID(statid);
 
 	/* Write rules */
 	aclblk->rules[binid] = aclrule;
@@ -2438,6 +2553,8 @@ yt921x_acl_add(struct yt921x_priv *priv,
 
 	if (meterid < YT921X_METER_NUM)
 		set_bit(meterid, priv->meters_map);
+	if (statid < YT921X_FLOWSTAT_NUM)
+		set_bit(statid, priv->flowstats_map);
 	priv->acl_masks[blkid] |= aclrule->mask;
 	return 0;
 
@@ -2449,6 +2566,21 @@ yt921x_acl_add(struct yt921x_priv *priv,
 	return res;
 }
 
+static int
+yt921x_dsa_cls_flower_stats(struct dsa_switch *ds, int port,
+			    struct flow_cls_offload *cls, bool ingress)
+{
+	struct yt921x_priv *priv = to_yt921x_priv(ds);
+	int res;
+
+	mutex_lock(&priv->reg_lock);
+	res = yt921x_acl_stat(priv, TC_SETUP_CLSFLOWER, cls->cookie,
+			      &cls->stats);
+	mutex_unlock(&priv->reg_lock);
+
+	return res;
+}
+
 static int
 yt921x_dsa_cls_flower_del(struct dsa_switch *ds, int port,
 			  struct flow_cls_offload *cls, bool ingress)
@@ -4817,6 +4949,7 @@ static const struct dsa_switch_ops yt921x_dsa_switch_ops = {
 	.port_policer_add	= yt921x_dsa_port_policer_add,
 	.port_setup_tc		= yt921x_dsa_port_setup_tc,
 	/* acl */
+	.cls_flower_stats	= yt921x_dsa_cls_flower_stats,
 	.cls_flower_del		= yt921x_dsa_cls_flower_del,
 	.cls_flower_add		= yt921x_dsa_cls_flower_add,
 	/* hsr */
diff --git a/drivers/net/dsa/yt921x.h b/drivers/net/dsa/yt921x.h
index 555046526669..0851102d8b35 100644
--- a/drivers/net/dsa/yt921x.h
+++ b/drivers/net/dsa/yt921x.h
@@ -759,6 +759,16 @@ enum yt921x_l4_type {
 #define  YT921X_METER_CTRLa_EIR_M		GENMASK(17, 0)
 #define   YT921X_METER_CTRLa_EIR(x)			FIELD_PREP(YT921X_METER_CTRLa_EIR_M, (x))
 #define YT921X_METERn_STAT(x)		(0x221000 + 8 * (x))
+#define YT921X_FLOWSTATn_STAT(x)	(0x221400 + 8 * (x))
+#define YT921X_FLOWSTATn_CTRL(x)	(0x221c00 + 4 * (x))
+#define  YT921X_FLOWSTAT_CTRL_EN		BIT(3)
+#define  YT921X_FLOWSTAT_CTRL_PKT_MODE		BIT(2)	/* 0: byte mode */
+#define  YT921X_FLOWSTAT_CTRL_TYPE_M		GENMASK(1, 0)
+#define   YT921X_FLOWSTAT_CTRL_TYPE(x)			FIELD_PREP(YT921X_FLOWSTAT_CTRL_TYPE_M, (x))
+#define   YT921X_FLOWSTAT_CTRL_TYPE_FLOW		YT921X_FLOWSTAT_CTRL_TYPE(0)
+#define   YT921X_FLOWSTAT_CTRL_TYPE_CPU_CODE		YT921X_FLOWSTAT_CTRL_TYPE(1)
+#define   YT921X_FLOWSTAT_CTRL_TYPE_DROP_CODE		YT921X_FLOWSTAT_CTRL_TYPE(2)
+#define   YT921X_FLOWSTAT_CTRL_TYPE_PORT		YT921X_FLOWSTAT_CTRL_TYPE(3)
 
 #define YT921X_PORTn_VLAN_CTRL(port)	(0x230010 + 4 * (port))
 #define  YT921X_PORT_VLAN_CTRL_SVLAN_PRIO_EN	BIT(31)
@@ -830,6 +840,8 @@ enum yt921x_fdb_entry_status {
 #define YT921X_SHAPE_CIR_MAX	((1 << 18) - 1)
 #define YT921X_SHAPE_CBS_MAX	((1 << 14) - 1)
 
+#define YT921X_FLOWSTAT_NUM	64
+
 #define YT921X_LAG_NUM		2
 #define YT921X_LAG_PORT_NUM	4
 
@@ -920,6 +932,10 @@ struct yt921x_acl_rule {
 	u32 action[3];
 	bool sw_assisted;
 
+	bool stat_pkt_mode;
+	u64 laststat;
+	u64 lastused;
+
 	u8 mask;
 	struct yt921x_acl_entry entries[YT921X_ACL_ENT_PER_BLK];
 };
@@ -969,6 +985,7 @@ struct yt921x_priv {
 	u16 eee_ports_mask;
 
 	DECLARE_BITMAP(meters_map, YT921X_METER_NUM);
+	DECLARE_BITMAP(flowstats_map, YT921X_FLOWSTAT_NUM);
 
 	u8 acl_masks[YT921X_ACL_BLK_NUM];
 	struct yt921x_acl_blk *acl_blks[YT921X_ACL_BLK_NUM];
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH v5 5/9] block: implement NVMEM provider
From: Bartosz Golaszewski @ 2026-06-15 13:06 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Bartosz Golaszewski, linux-mmc, devicetree, linux-kernel,
	linux-arm-msm, linux-block, linux-wireless, ath10k,
	linux-bluetooth, netdev, daniel, Ulf Hansson, Rob Herring,
	Krzysztof Kozlowski, Conor Dooley, Bjorn Andersson, Konrad Dybcio,
	Jens Axboe, Johannes Berg, Jeff Johnson, Marcel Holtmann,
	Luiz Augusto von Dentz, Balakrishna Godavarthi, Rocky Liao,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Srinivas Kandagatla, Andrew Lunn, Heiner Kallweit,
	Russell King, Saravana Kannan
In-Reply-To: <CAFEp6-0oxBEdfH-fqhdM18pt4JewLwrMOON9qpQgLFh8KS0hDg@mail.gmail.com>

On Mon, 15 Jun 2026 11:33:22 +0200, Loic Poulain
<loic.poulain@oss.qualcomm.com> said:
> On Mon, Jun 15, 2026 at 11:28 AM Loic Poulain
> <loic.poulain@oss.qualcomm.com> wrote:
>>
>
> Also we cannot safely return -EPROBE_DEFER from add_disk_final()
> either. The NVMEM registration point is late in the sequence, too much
> has already happened to easily unwind. The easiest is that the NVMEM
> simply won't be available if registration fails, which looks
> acceptable?
>

I'd argue that it's a problem with subsystem code then as unwinding should
work fine no matter the point in the sequence when it's initiated but I guess
this isn't really an issue in your patches.

I suppose we shouldn't typically run into probe deferral here so I'm fine just
ignoring the return value.

Bart

^ permalink raw reply

* Re: [PATCH net v2] psample: use nla_reserve() for PSAMPLE_ATTR_DATA
From: Simon Horman @ 2026-06-15 13:09 UTC (permalink / raw)
  To: Xiang Mei; +Cc: kuba, netdev, davem, yotam.gi, edumazet, pabeni, bestswngs
In-Reply-To: <20260614034919.918494-1-xmei5@asu.edu>

On Sat, Jun 13, 2026 at 08:49:19PM -0700, Xiang Mei wrote:
> psample_sample_packet() open-codes the PSAMPLE_ATTR_DATA attribute and
> reserves nla_total_size(data_len) bytes but only writes NLA_HDRLEN +
> data_len of them.  When data_len is not a multiple of 4 the trailing
> alignment padding is left uninitialised, leaking stale slab memory to
> every listener on the PSAMPLE_NL_MCGRP_SAMPLE multicast group.
> 
> Use nla_reserve(), which lays out the header and zeroes the padding, and
> copy the payload into the reserved area with skb_copy_bits().
> 
> Fixes: 6ae0a6286171 ("net: Introduce psample, a new genetlink channel for packet sampling")
> Reported-by: Weiming Shi <bestswngs@gmail.com>
> Assisted-by: Claude:claude-opus-4-8
> Signed-off-by: Xiang Mei <xmei5@asu.edu>
> ---
> v2: use nla_reserve to ensure no info leak

Reviewed-by: Simon Horman <horms@kernel.org>
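
The padding arithmetic described in the quoted commit message can be
sketched in userspace as follows. This is only an illustration, not the
kernel code: the SKETCH_* macros and sketch_* helpers are hypothetical
local names that mirror NLA_ALIGNTO, NLA_HDRLEN and nla_total_size() so
that the example builds outside the kernel.

```c
#include <stddef.h>

/* Local stand-ins for the netlink attribute layout constants
 * (NLA_ALIGNTO, NLA_ALIGN, NLA_HDRLEN); hypothetical names.
 */
#define SKETCH_NLA_ALIGNTO	4u
#define SKETCH_NLA_ALIGN(len) \
	(((len) + SKETCH_NLA_ALIGNTO - 1) & ~(SKETCH_NLA_ALIGNTO - 1))
#define SKETCH_NLA_HDRLEN	SKETCH_NLA_ALIGN(4u)	/* 4-byte struct nlattr */

/* Bytes reserved for an attribute carrying data_len bytes of payload. */
static unsigned int sketch_nla_total_size(unsigned int data_len)
{
	return SKETCH_NLA_ALIGN(SKETCH_NLA_HDRLEN + data_len);
}

/* Trailing bytes left untouched when only header + payload are written. */
static unsigned int sketch_nla_unwritten_padding(unsigned int data_len)
{
	return sketch_nla_total_size(data_len) -
	       (SKETCH_NLA_HDRLEN + data_len);
}
```

For a 5-byte payload the attribute occupies 12 bytes, of which only 9 are
written by an open-coded header plus payload copy, so 3 bytes of padding
would go out uninitialised; for a payload that is a multiple of 4 the
padding is 0.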


^ permalink raw reply

* Reply: [External Mail] Re: [PATCH] net/mlx5: Fix wrong register access in mlx5_query_mtppse()
From: Li,Rongqing @ 2026-06-15 13:09 UTC (permalink / raw)
  To: Gal Pressman, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
	Mark Bloch, Andrew Lunn, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <054e047e-84a3-4dab-ad5d-604d16fc347d@nvidia.com>

> This is clearly completely broken, which pointed me to the fact that
> mlx5_query_mtppse() is not used anywhere.
> 
> Can you please submit a patch to remove it?

Ok, I will remove it  

[Li,Rongqing] 


^ permalink raw reply

* Re: [PATCH 0/3] vmsplice: make vmsplice a trivial wrapper for preadv2/pwritev2
From: Joanne Koong @ 2026-06-15 13:11 UTC (permalink / raw)
  To: Val Packett
  Cc: Al Viro, Linus Torvalds, Christian Brauner, Askar Safin,
	linux-kernel, linux-mm, linux-api, netdev, Matthew Wilcox,
	Jens Axboe, Christoph Hellwig, David Howells, Andrew Morton,
	David Hildenbrand, Pedro Falcato, Miklos Szeredi, patches,
	linux-fsdevel, Jan Kara, Steven Rostedt, fuse-devel,
	Bernd Schubert
In-Reply-To: <83f05c55-efba-4bf5-abfe-d2ab0819e904@packett.cool>

On Mon, Jun 15, 2026 at 2:25 AM Val Packett <val@packett.cool> wrote:
>
> Hi,
>
> On 6/1/26 2:33 PM, Al Viro wrote:
> > On Mon, Jun 01, 2026 at 10:17:23AM -0700, Linus Torvalds wrote:
> >
> >> TLDR: maybe we could get rid of "f_op->splice_read". *That* would be
> >> a big simplification.
> > FUSE might be interesting - fuse_dev_splice_read() and its ilk.
> > Communications between the kernel and fuse server at least used to
> > seriously want that, so that would be one place to look for unhappy
> > userland...
>
> speaking of fuse_dev_splice……_write actually, this series has broken
> xdg-document-portal!
>
> https://github.com/flatpak/xdg-desktop-portal/issues/2026
>
> Specifically what happens is that the EINVAL is returned due to oh.len
> != nbytes:
>
> fuse_dev_do_write: oh.len 16400 != nbytes 15526
>
> (where 16400 == 16384 (read len) + 16, 15526 == 15510 (file len) + 16)
>
> After reverting the series, there is no error because oh.len
> becomes 15526 too.

I think this is because of how libfuse handles eof / short reads. When
it detects a short read, it fixes up the header length after the
header was already vmspliced to the pipe because it assumes vmsplice
mapped the header's page into the pipe by reference. It assumes that
modifying the header length in place is then reflected in what the
pipe later splices out.

The logic for this happens in fuse_send_data_iov() [1]:
a) sets out->len = headerlen (16) + len (16384) = 16400 in the
stack-allocated fuse_out_header
b) vmsplices the header to the pipe
c) splices the backing file to the pipe. if this hits EOF, it'll get
back 15510 instead of 16384
d) detects the short read [2], fixes up the stack out->len = 16 + 15510 = 15526
e) splices the pipe to /dev/fuse

After this patch, step b) is a straight copy which means step d)'s
fixup doesn't modify what's in the pipe. This could be fixed up in
libfuse to not depend on modify-after-vmsplice, but I don't think this
helps for applications using already-released libfuse versions. I
think this patch needs to be reverted.
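
To make the difference concrete, here is a small userspace model of steps
a) to e). It is only a sketch: it does not call vmsplice() or touch
/dev/fuse, and every sketch_* name is hypothetical. The "pipe buffer"
either keeps a pointer to the caller's header (the behaviour libfuse
assumes) or a snapshot of it (a straight copy).

```c
#include <stdint.h>

/* Simplified stand-in for struct fuse_out_header; only len matters here. */
struct sketch_out_header {
	uint32_t len;
};

/* A modelled pipe buffer: references the caller's memory or owns a copy. */
struct sketch_pipe_buf {
	const struct sketch_out_header *ref;	/* by-reference mapping */
	struct sketch_out_header copy;		/* by-copy snapshot */
	int by_ref;
};

static void sketch_vmsplice(struct sketch_pipe_buf *buf,
			    const struct sketch_out_header *hdr, int by_ref)
{
	buf->by_ref = by_ref;
	buf->ref = hdr;
	buf->copy = *hdr;
}

/* Header length as seen when the pipe is finally drained. */
static uint32_t sketch_len_seen(const struct sketch_pipe_buf *buf)
{
	return buf->by_ref ? buf->ref->len : buf->copy.len;
}

/* Steps a) to e): returns the oh.len that the consumer would observe. */
static uint32_t sketch_send_data(uint32_t want, uint32_t got, int by_ref)
{
	const uint32_t headerlen = 16;
	struct sketch_out_header out;
	struct sketch_pipe_buf buf;

	out.len = headerlen + want;		/* a) optimistic length */
	sketch_vmsplice(&buf, &out, by_ref);	/* b) header into the pipe */
	if (got < want)				/* c) + d) short read fixup */
		out.len = headerlen + got;
	return sketch_len_seen(&buf);		/* e) pipe to consumer */
}
```

With want = 16384 and got = 15510, the by-reference model yields 15526
and the by-copy model yields 16400, matching the two values in the
fuse_dev_do_write error above.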

Thanks,
Joanne

[1] https://github.com/libfuse/libfuse/blob/master/lib/fuse_lowlevel.c#L846
[2] https://github.com/libfuse/libfuse/blob/master/lib/fuse_lowlevel.c#L956

>
>
> Thanks,
> ~val
>

^ permalink raw reply

* [PATCH net v3] rtase: Workaround for TX hang caused by short UDP packets entering hardware PTP parsing
From: Justin Lai @ 2026-06-15 13:16 UTC (permalink / raw)
  To: kuba
  Cc: davem, edumazet, pabeni, andrew+netdev, linux-kernel, netdev,
	stable, horms, richardcochran, david.laight.linux,
	aleksander.lobakin, pkshih, larry.chiu, Justin Lai

The hardware performs additional PTP parsing on UDP packets identified
by destination ports 319/320 at the expected UDP destination port
offset.

If such a packet has transport data smaller than RTASE_MIN_PAD_LEN,
the parser may access data beyond the end of the packet and trigger a
TX hang.

For IPv4 fragmented packets, a non-initial fragment does not contain a
UDP header. However, if the payload contains values matching PTP
destination ports (319/320) at the expected UDP destination port
offset, the hardware incorrectly classifies the fragment as a PTP
packet and performs further parsing.

IPv6 fragmented packets are not affected because the hardware only
enters this parsing path when the IPv6 Next Header field directly
indicates UDP. Packets carrying a Fragment Header do not enter this
path.

Pad affected packets so the transport data reaches
RTASE_MIN_PAD_LEN before transmission to avoid triggering the
hardware issue.

Fixes: d6e882b89fdf ("rtase: Implement .ndo_start_xmit function")
Cc: stable@vger.kernel.org
Signed-off-by: Justin Lai <justinlai0215@realtek.com>
---
v2 -> v3:
- Remove dependency on skb_transport_header_was_set().
- Determine UDP header offset from IPv4/IPv6 headers.
- Use skb_header_pointer() for UDP header access.
- Add non-linear skb handling.

v1 -> v2:
- Remove RTASE_SHORT_PKT_THRESH and the packet length check.
- Check transport data length before parsing the UDP header.
- Add Fixes tag.
- Add Cc: stable@vger.kernel.org.
- Target net tree.
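
For reviewers, the padding decision can be sketched in plain userspace C
as below. This is not the driver code: the SKETCH_* constants and
sketch_rtase_pad_len() are hypothetical stand-ins for RTASE_MIN_PAD_LEN,
PTP_EV_PORT, PTP_GEN_PORT and the checks in rtase_skb_pad(), and the
frame_len < udp_offset guard is an extra assumption of this sketch.

```c
#include <stdint.h>

#define SKETCH_MIN_PAD_LEN	47u	/* RTASE_MIN_PAD_LEN in the patch */
#define SKETCH_PTP_EV_PORT	319u
#define SKETCH_PTP_GEN_PORT	320u
#define SKETCH_UDP_DEST_END	4u	/* offsetof(struct udphdr, len) */

/* Number of padding bytes to append to a frame of frame_len bytes whose
 * (apparent) UDP header starts at udp_offset and whose destination port
 * field decodes to dest_port. Returns 0 when no padding is needed.
 */
static uint32_t sketch_rtase_pad_len(uint32_t frame_len, uint32_t udp_offset,
				     uint16_t dest_port)
{
	uint32_t trans_data_len;

	if (frame_len < udp_offset)	/* guard added for this sketch only */
		return 0;

	trans_data_len = frame_len - udp_offset;

	/* Too short to hold a destination port, or already long enough. */
	if (trans_data_len < SKETCH_UDP_DEST_END ||
	    trans_data_len >= SKETCH_MIN_PAD_LEN)
		return 0;

	/* Only the PTP ports send the packet into the hardware parser. */
	if (dest_port != SKETCH_PTP_EV_PORT && dest_port != SKETCH_PTP_GEN_PORT)
		return 0;

	return SKETCH_MIN_PAD_LEN - trans_data_len;
}
```

For example, an untagged IPv4 frame with a 20-byte IP header has
udp_offset = 34; with 18 bytes of transport data and destination port
319 the sketch asks for 29 bytes of padding, so the transport data
reaches 47 bytes.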
---
 drivers/net/ethernet/realtek/rtase/rtase.h    |  2 +
 .../net/ethernet/realtek/rtase/rtase_main.c   | 79 +++++++++++++++++++
 2 files changed, 81 insertions(+)

diff --git a/drivers/net/ethernet/realtek/rtase/rtase.h b/drivers/net/ethernet/realtek/rtase/rtase.h
index b9209eb6ea73..d489d20177ac 100644
--- a/drivers/net/ethernet/realtek/rtase/rtase.h
+++ b/drivers/net/ethernet/realtek/rtase/rtase.h
@@ -359,4 +359,6 @@ struct rtase_private {
 
 #define RTASE_MSS_MASK GENMASK(28, 18)
 
+#define RTASE_MIN_PAD_LEN 47
+
 #endif /* RTASE_H */
diff --git a/drivers/net/ethernet/realtek/rtase/rtase_main.c b/drivers/net/ethernet/realtek/rtase/rtase_main.c
index 55105d34bc79..4c295a39c7a0 100644
--- a/drivers/net/ethernet/realtek/rtase/rtase_main.c
+++ b/drivers/net/ethernet/realtek/rtase/rtase_main.c
@@ -61,6 +61,7 @@
 #include <linux/pci.h>
 #include <linux/pm_runtime.h>
 #include <linux/prefetch.h>
+#include <linux/ptp_classify.h>
 #include <linux/rtnetlink.h>
 #include <linux/tcp.h>
 #include <asm/irq.h>
@@ -1249,6 +1250,81 @@ static u32 rtase_tx_csum(struct sk_buff *skb, const struct net_device *dev)
 	return csum_cmd;
 }
 
+static bool rtase_get_udp_offset(struct sk_buff *skb, u32 *udp_offset)
+{
+	int no = skb_network_offset(skb);
+	struct ipv6hdr *i6h, _i6h;
+	struct iphdr *ih, _ih;
+
+	switch (vlan_get_protocol(skb)) {
+	case htons(ETH_P_IP):
+		ih = skb_header_pointer(skb, no, sizeof(_ih), &_ih);
+		if (!ih)
+			return false;
+
+		if (ih->ihl < 5)
+			return false;
+
+		if (ih->protocol != IPPROTO_UDP)
+			return false;
+
+		*udp_offset = no + ih->ihl * 4;
+
+		return true;
+	case htons(ETH_P_IPV6):
+		i6h = skb_header_pointer(skb, no, sizeof(_i6h), &_i6h);
+		if (!i6h)
+			return false;
+
+		if (i6h->nexthdr != IPPROTO_UDP)
+			return false;
+
+		*udp_offset = no + sizeof(*i6h);
+
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool rtase_skb_pad(struct sk_buff *skb)
+{
+	__be16 *dest, _dest;
+	u32 trans_data_len;
+	u32 udp_offset;
+	u16 dest_port;
+	u32 pad_len;
+
+	if (!rtase_get_udp_offset(skb, &udp_offset))
+		return true;
+
+	trans_data_len = skb->len - udp_offset;
+	if (trans_data_len < offsetof(struct udphdr, len) ||
+	    trans_data_len >= RTASE_MIN_PAD_LEN)
+		return true;
+
+	dest = skb_header_pointer(skb,
+				  udp_offset + offsetof(struct udphdr, dest),
+				  sizeof(_dest), &_dest);
+	if (!dest)
+		return true;
+
+	dest_port = ntohs(*dest);
+	if (dest_port != PTP_EV_PORT && dest_port != PTP_GEN_PORT)
+		return true;
+
+	if (skb_is_nonlinear(skb)) {
+		if (skb_linearize(skb))
+			return false;
+	}
+
+	pad_len = RTASE_MIN_PAD_LEN - trans_data_len;
+	if (__skb_put_padto(skb, skb->len + pad_len, false))
+		return false;
+
+	return true;
+}
+
 static int rtase_xmit_frags(struct rtase_ring *ring, struct sk_buff *skb,
 			    u32 opts1, u32 opts2)
 {
@@ -1362,6 +1438,9 @@ static netdev_tx_t rtase_start_xmit(struct sk_buff *skb,
 		opts2 |= rtase_tx_csum(skb, dev);
 	}
 
+	if (!rtase_skb_pad(skb))
+		goto err_dma_0;
+
 	frags = rtase_xmit_frags(ring, skb, opts1, opts2);
 	if (unlikely(frags < 0))
 		goto err_dma_0;
-- 
2.40.1


^ permalink raw reply related

* Re: [PATCH net-next v7 02/12] net: phylink: introduce internal phylink PCS handling
From: Maxime Chevallier @ 2026-06-15 13:31 UTC (permalink / raw)
  To: Christian Marangi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Simon Horman, Jonathan Corbet, Shuah Khan,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-3-ansuelsmth@gmail.com>

Hi Christian,

On 6/15/26 14:29, Christian Marangi wrote:
> Introduce internal handling of PCS for phylink. This is an alternative
> way to .mac_select_pcs that moves the selection logic of the PCS entirely
> to phylink with the usage of the supported_interface value in the PCS
> struct.
> 
> MAC should now provide a callback to fill the available PCS in
> phylink_config in .fill_available_pcs and fill the .num_possible_pcs with
> the number of elements in the array. MAC should also define a new bitmap,
> pcs_interfaces, in phylink_config to define for what interface mode a
> dedicated PCS is required.
> 
> On phylink_create(), an array of PCS pointer is allocated of size
> .num_possible_pcs from phylink_config and .fill_available_pcs from
> phylink_config is called passing as args the just allocated array and
> the number of possible element in it.
> 
> MAC will fill this passed array with all the available PCS.
> 
> This array is then parsed and a linked list of PCS is created based on
> the allocated PCS array filled by MAC via .fill_available_pcs().
> 
> Every PCS in phylink PCS list gets then linked to the phylink instance
> by setting the phylink value in phylink_pcs struct to the phylink instance.
> Also the supported_interface value in phylink struct is updated with
> the new supported_interface from the provided PCS.
> 
> On phylink_destroy(), every PCS in phylink PCS list is unlinked from the
> phylink instance by setting the phylink value in phylink_pcs struct to NULL
> and removed from the PCS list.
> 
> phylink_validate_mac_and_pcs(), phylink_major_config() and
> phylink_inband_caps() are updated to support this new implementation
> with the PCS list stored in phylink.
> 
> They will make use of phylink_validate_pcs_interface() that will loop
> for every PCS in the phylink PCS available list and find one that supports
> the passed interface.
> 
> phylink_validate_pcs_interface() applies the same logic of .mac_select_pcs
> where if a supported_interface value is not set for the PCS struct, then
> it's assumed every interface is supported.
> 
> A MAC is required to implement either a .mac_select_pcs or make use of
> the PCS list implementation. Implementing both will result in a fail
> on phylink_create().
> 
> A MAC defining .num_possible_pcs in phylink_config MUST also define a
> .fill_available_pcs or phylink_create() will fail with a negative error.
> 
> phylink value in phylink_pcs struct with this implementation is used to
> track from PCS side when it's attached to a phylink instance. PCS driver
> will make use of this information to correctly detach from a phylink
> instance if needed.
> 
> phylink_pcs_change() is also changed to verify that the PCS that triggered
> a link change is the one that is currently used by the phylink instance.
> 
> The .mac_select_pcs implementation is not changed but it's expected that
> every MAC driver migrates to the new implementation to later deprecate
> and remove .mac_select_pcs.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---

[...]

> @@ -1872,10 +1993,28 @@ struct phylink *phylink_create(struct phylink_config *config,
>  	mutex_init(&pl->phydev_mutex);
>  	mutex_init(&pl->state_mutex);
>  	INIT_WORK(&pl->resolve, phylink_resolve);
> +	INIT_LIST_HEAD(&pl->pcs_list);
> +
> +	/* Fill the PCS list with available PCS from phylink config */
> +	ret = phylink_fill_available_pcs(pl, config);
> +	if (ret < 0) {
> +		kfree(pl);
> +		return ERR_PTR(ret);
> +	}
> +
> +	/* Link available PCS to phylink */
> +	list_for_each_entry(pcs, &pl->pcs_list, list)
> +		pcs->phylink = pl;
>  
>  	phy_interface_copy(pl->supported_interfaces,
>  			   config->supported_interfaces);
>  
> +	/* Update supported interfaces */
> +	list_for_each_entry(pcs, &pl->pcs_list, list)
> +		phy_interface_or(pl->supported_interfaces,
> +				 pl->supported_interfaces,
> +				 pcs->supported_interfaces);
> +

I'm not entirely sure about that, we may need to restrict the supported_interfaces
from the MAC.

As an example, take mvpp2. We have 2 PCSs, one for BaseX/SGMII, one for BaseR. But
if we don't have a comphy (generic PHY) device, then we can't use all the
combinations of modes our PCSs can provide:

https://elixir.bootlin.com/linux/v7.1-rc7/source/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c#L7074

These aren't external PCS IPs, but from what I understand you'd like to
handle these the same way as purely external PCSs, right ?

I'd say the MAC driver ultimately has the knowledge of all possible interfaces.

The way I see it, it's probably safer to let the MAC give a wide range of interfaces,
and filter that down with what the PCSs can provide (i.e. turn that or into an and,
while handling the case where the pcs supported interfaces is empty).
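
Roughly what I have in mind, as a userspace sketch rather than phylink
code. Everything here is hypothetical: sketch_if_mask stands in for the
phy interface bitmap, and for simplicity the sketch assumes every
interface mode needs one of the listed PCSs (i.e. it ignores the
pcs_interfaces distinction from the patch).

```c
#include <stddef.h>
#include <stdint.h>

/* One bit per interface mode; stand-in for the kernel's interface bitmap. */
typedef uint64_t sketch_if_mask;

/* Start from what the MAC advertises and filter it by what the PCSs can
 * provide. A PCS with an empty mask is treated as "supports everything",
 * the same convention as for .mac_select_pcs.
 */
static sketch_if_mask sketch_supported(sketch_if_mask mac,
				       const sketch_if_mask *pcs, size_t n)
{
	sketch_if_mask pcs_union = 0;
	size_t i;

	if (n == 0)
		return mac;		/* no PCS list: keep the MAC mask */

	for (i = 0; i < n; i++) {
		if (pcs[i] == 0)
			return mac;	/* wildcard PCS: nothing to filter */
		pcs_union |= pcs[i];
	}

	return mac & pcs_union;
}
```

With this, a PCS advertising a mode the MAC cannot actually drive (the
missing comphy case) never widens the result, since the MAC mask is the
upper bound.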

What do you think ?

Maxime

^ permalink raw reply

* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Toke Høiland-Jørgensen @ 2026-06-15 13:31 UTC (permalink / raw)
  To: Ralf Lici
  Cc: netdev, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	Beniamino Galvani
In-Reply-To: <20260613131720.253936-1-ralf@mandelbit.com>

>> >> I think a better model is to treat the device as basically a loopback
>> >> device that translates packets before looping them back (so when they
>> >> come back they appear to be coming from that device).
>> >>
>> >> Any reason why that wouldn't work?
>> >>
>> >
>> > That's indeed the intended model for the ipxlat netdevice: route packets
>> > to it, translate them, then loop them back into the stack as packets
>> > received from that same device. That seemed like the simplest model and
>> > the one that exposes the translation point most clearly.
>>
>> Right. I think this could be made a bit more explicit in the
>> documentation as well, since it's a bit of an unusual model.
>>
>> And, well, taking a step back: is it really the right model? Regular NAT
>> lives in netfilter, why can't this be a netfilter module as well? Seems
>> to me you could have something like:
>>
>> table ip xlat4 {
>> 	chain postrouting {
>> 		type nat hook postrouting priority srcnat; policy accept;
>> 		ip daddr 0.0.0.0/0 oifname "eth0" xlat to 64:ff9b::/96
>> 	}
>> }
>> table ip6 xlat6 {
>> 	chain prerouting {
>> 		type nat hook prerouting priority dstnat; policy accept;
>> 		ip6 saddr 64:ff9b::/96 iifname "eth0" xlat from 64:ff9b::/96
>> 	}
>> }
>>
>> and that would provide the functionality without having to implement a
>> new interface type and the associated multiple traversals through the
>> stack? Did you consider this as an alternative to the new device type?
>>
>
> We did consider netfilter, and your example is syntactically attractive,
> but I am no longer convinced it is the cleanest model for SIIT.
>
> An nft expression cannot simply rewrite ETH_P_IP <-> ETH_P_IPV6 and
> return ACCEPT as if this were normal NAT because the current hook
> invocation, dst, and conntrack-related state were established for the
> packet as it entered that hook. A cross-family translator would need to
> consume the skb, clear or rebuild route and ct metadata as appropriate,
> do an other-family route lookup, and resume at a well-defined point in
> that family. That seems possible, but it would be a new stateless
> cross-family action, not just a new mode of the existing nft nat
> expression (which is built around nf_nat_setup_info and assumes the
> packet's L3 family does not change AFAICT).

Right, I did not expect it would be possible to actually share code with
the existing NAT functionality, but conceptually they're similar. I.e.,
if I was an admin trying to figure out if my system supported SIIT
translation, my chain of thought would be something along the line of:
"SIIT is a variant of NAT, and I know NAT is a long-standing feature of
netfilter, so I wonder if SIIT exists there as well".

Adding the netfilter folks to Cc to try to get their attention and an
opinion on this :)

> My second concern is that the SIIT boundary would be a property of
> rule and hook placement. That gives flexibility, but it also means the
> translation point has to be constrained and documented very carefully
> to avoid ambiguous TTL/Hop Limit, PMTU/ICMP, and hook-order behavior.
> For this use case I would rather have the route that matches the
> translation prefix also be the object that says: leave this family
> here and continue in the other one.

Yeah, with flexibility comes the ability to shoot yourself in the foot.
But that's not really different from much of the other functionality we
have in the kernel today, is it? For netfilter in particular it's
certainly possible to configure a broken NAT configuration that leads to
packet drops (or just invalid packets being sent out on a network
device).

> After looking at the available kernel mechanisms again, I think the
> better model is probably LWT: routes carry an ipxlat encap referencing a
> named translator domain configured over netlink. That should represent
> the stateless, prefix-based and symmetric nature of ipxlat.
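
To illustrate what "stateless, prefix-based and symmetric" means at the
address level, here is a minimal userspace sketch of the /96 mapping with
the 64:ff9b::/96 prefix used in the examples in this thread (the IPv4
address sits in the low 32 bits, as in RFC 6052). It covers addresses
only, not header or ICMP translation, and the sketch_* names are
hypothetical rather than taken from the ipxlat patches.

```c
#include <stdint.h>
#include <string.h>

/* 64:ff9b::/96 in network byte order. */
static const uint8_t sketch_prefix96[12] = {
	0x00, 0x64, 0xff, 0x9b, 0, 0, 0, 0, 0, 0, 0, 0
};

/* IPv4 -> IPv6: prefix in the top 96 bits, IPv4 address in the low 32. */
static void sketch_xlat_4to6(const uint8_t v4[4], uint8_t v6[16])
{
	memcpy(v6, sketch_prefix96, sizeof(sketch_prefix96));
	memcpy(v6 + 12, v4, 4);
}

/* IPv6 -> IPv4: returns 0 on success, -1 if v6 is outside the prefix. */
static int sketch_xlat_6to4(const uint8_t v6[16], uint8_t v4[4])
{
	if (memcmp(v6, sketch_prefix96, sizeof(sketch_prefix96)) != 0)
		return -1;
	memcpy(v4, v6 + 12, 4);
	return 0;
}
```

No per-flow state is involved: each direction is a pure function of the
address and the configured prefix, and the two functions are inverses of
each other for addresses under the prefix.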

I think this description actually hits the nail on the head: What are we
implementing here? Is it a product feature, or a building block for one?
The properties you mention wrt consistency, symmetry etc are properties
of the high-level feature (which is also generally the level things are
specified in RFCs). Whereas other packet mangling features in the kernel
are more in the "building block" category, where it's possible to
configure things to implement a particular feature set / compliance with
a particular RFC, but it's also possible to do things that are outside
of that.

I think this relates to the "mechanism, not policy" approach that we
take to most things in the kernel: implement the building blocks to do
something in the most general way we can, and then leave it up to
userspace to configure things in a way that results in a consistent
high-level system behaviour.

That being said:

> Very roughly, userspace could look like:
>
>     ip xlat add siit0 prefix6 64:ff9b::/96
>     ip route add ... encap ipxlat id siit0
>     ip -6 route add ... encap ipxlat id siit0
>
> There are some useful precedents for this: ILA is stateless address
> translation as LWT, seg6_local already has cross-family LWT actions, and
> ioam6 has a similar split between separately configured objects and
> route attachments.
>
> The invariant I would like v2 to follow is that the original-family
> route lookup selects translation as its terminal route action. The
> translated skb then gets a fresh lookup in the other family. From that
> point on, TTL/Hop Limit where applicable, PMTU, ICMP errors, and
> netfilter visibility belong to the translated family.
>
> So I think your question addresses the core design issue in this RFC. My
> current preference is to rework the next version around an LWT/domain
> model instead of the virtual netdevice model, unless prototyping shows a
> fundamental problem with that approach.
>
> Does that model make sense to you?

I did consider this as well before suggesting netfilter as the right
place to hook things, and I do think the route object model has some
appeal. I agree it's a better model than the magical loopback interface,
certainly.

I think in the end this comes down to whether flexibility in how to use
this translation mechanism is a bug or a feature, as outlined above. I'm
leaning towards "feature", but could probably be persuaded otherwise :)

> Thanks for pushing on this.

You're welcome! Thanks for working on it - will be cool to have this
land in whatever form we end up agreeing on!

-Toke

^ permalink raw reply

* Re: [PATCH net-next v7 01/12] net: phylink: keep and use MAC supported_interfaces in phylink struct
From: Maxime Chevallier @ 2026-06-15 13:33 UTC (permalink / raw)
  To: Christian Marangi, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley, Simon Horman, Jonathan Corbet, Shuah Khan,
	Lorenzo Bianconi, Heiner Kallweit, Russell King, Saravana Kannan,
	Philipp Zabel, Nathan Chancellor, Nick Desaulniers, Bill Wendling,
	Justin Stitt, netdev, devicetree, linux-kernel, linux-doc,
	linux-arm-kernel, linux-mediatek, llvm
In-Reply-To: <20260615122950.22281-2-ansuelsmth@gmail.com>

Hello Christian,

On 6/15/26 14:29, Christian Marangi wrote:
> Add in phylink struct a copy of supported_interfaces from phylink_config
> and make use of that instead of relying on phylink_config value.
> 
> This in preparation for support of PCS handling internally to phylink
> where a PCS can be removed or added after the phylink is created and we
> need both a reference of the supported_interfaces value from
> phylink_config and an internal value that can be updated with the new
> PCS info.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
>  drivers/net/phy/phylink.c | 22 +++++++++++++++-------
>  1 file changed, 15 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
> index 087ac63f9193..4d59c0dd78db 100644
> --- a/drivers/net/phy/phylink.c
> +++ b/drivers/net/phy/phylink.c
> @@ -60,6 +60,11 @@ struct phylink {
>  	/* The link configuration settings */
>  	struct phylink_link_state link_config;
>  
> +	/* What interfaces are supported by the current link.
> +	 * Can change on removal or addition of new PCS.
> +	 */
> +	DECLARE_PHY_INTERFACE_MASK(supported_interfaces);

Can you clarify a bit what you mean here ? Is that the combination of the
interfaces the MAC supports AND the currently in-use PCS ?

Maxime


^ permalink raw reply

* Re: [PATCH bpf 1/2] bpf: Fix partial copy of non-linear skb test_run output
From: Paul Chaignon @ 2026-06-15 13:39 UTC (permalink / raw)
  To: Sun Jian
  Cc: bpf, netdev, linux-kselftest, linux-kernel, ast, daniel, andrii,
	martin.lau, eddyz87, memxor, song, yonghong.song, jolsa, shuah
In-Reply-To: <20260615073856.152479-2-sun.jian.kdev@gmail.com>

On Mon, Jun 15, 2026 at 03:38:55PM +0800, Sun Jian wrote:
> For non-linear skbs, bpf_test_finish() derives the linear head copy
> length from copy_size - frag_size. This only matches the skb head length
> when copy_size is the full packet size.
> 
> When userspace provides a short data_out buffer, copy_size is clamped to
> that buffer size. If copy_size is smaller than frag_size, the computed
> length becomes negative and bpf_test_finish() returns -ENOSPC before
> copying the packet prefix or updating data_size_out.

Thanks for fixing this!

> 
> Compute the linear head length from the skb layout instead, and clamp the
> head copy length to copy_size. This preserves the expected partial-copy
> semantics: return -ENOSPC, copy the packet prefix that fits in data_out,
> and report the full packet length through data_size_out.
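
A quick userspace sketch of the head-length arithmetic before and after
the change, to check my understanding. The sketch_* helpers are
hypothetical and only model the length computation quoted from the diff
below, not bpf_test_finish() itself; -1 stands in for the early -ENOSPC
exit.

```c
#include <stdint.h>

/* Old computation: derive the head copy length from the clamped
 * copy_size. A negative result meant bailing out with -ENOSPC before
 * anything was copied (modelled here as -1).
 */
static int sketch_old_head_copy_len(uint32_t copy_size, uint32_t frag_size)
{
	int len = (int)(copy_size - frag_size);

	return len < 0 ? -1 : len;
}

/* New computation: take the head length from the packet layout and
 * clamp it to what fits in the user buffer.
 */
static uint32_t sketch_new_head_copy_len(uint32_t size, uint32_t frag_size,
					 uint32_t copy_size)
{
	uint32_t head_len = size - frag_size;

	return copy_size < head_len ? copy_size : head_len;
}
```

Taking a 4096-byte packet with a 1024-byte head and 3072 bytes in frags:
with a 100-byte data_out the old computation goes negative and nothing
is copied, while the new one copies 100 bytes of the head. With a full
4096-byte buffer both give 1024.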
> 
> Fixes: 838baa351cee ("bpf: Craft non-linear skbs in BPF_PROG_TEST_RUN")

Wouldn't this bug actually go back to 7855e0db150ad ("bpf: test_run:
add xdp_shared_info pointer in bpf_test_finish signature") and also
affect the XDP bpf_prog_test_run_xdp()? If so, could you also add a
selftest that covers it for XDP?

> Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com>
> ---
>  net/bpf/test_run.c | 11 ++++-------
>  1 file changed, 4 insertions(+), 7 deletions(-)
> 
> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
> index 2bc04feadfab..976e8fa31bc9 100644
> --- a/net/bpf/test_run.c
> +++ b/net/bpf/test_run.c
> @@ -453,19 +453,16 @@ static int bpf_test_finish(const union bpf_attr *kattr,
>  	}
>  
>  	if (data_out) {
> -		int len = sinfo ? copy_size - frag_size : copy_size;
> -
> -		if (len < 0) {
> -			err = -ENOSPC;
> -			goto out;
> -		}
> +		u32 head_len = size - frag_size;
> +		u32 len = min(copy_size, head_len);
>  
>  		if (copy_to_user(data_out, data, len))
>  			goto out;
>  
>  		if (sinfo) {
> -			int i, offset = len;
> +			u32 offset = len;
>  			u32 data_len;
> +			int i;
>  
>  			for (i = 0; i < sinfo->nr_frags; i++) {
>  				skb_frag_t *frag = &sinfo->frags[i];
> -- 
> 2.43.0
> 

^ permalink raw reply

* [PATCH nf-next v2 1/6] netfilter: nf_nat_ftp: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Use preferred kernel integer type u16 instead of the POSIX u_int16_t
variant.

No functional change.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_ftp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c
index c92a436d9c48..ab714629e2b1 100644
--- a/net/netfilter/nf_nat_ftp.c
+++ b/net/netfilter/nf_nat_ftp.c
@@ -69,7 +69,7 @@ static unsigned int nf_nat_ftp(struct sk_buff *skb,
 			       struct nf_conntrack_expect *exp)
 {
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 	int dir = CTINFO2DIR(ctinfo);
 	struct nf_conn *ct = exp->master;
 	char buffer[sizeof("|1||65535|") + INET6_ADDRSTRLEN];
-- 
2.54.0



* [PATCH nf-next v2 0/6] netfilter: replace u_int*_t with kernel int types
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev

Hi all! This is my first patch series of many, I hope :)
I'd like to start contributing by helping out with janitor work,
standardizing code and cleaning up.

This patch series replaces POSIX u_int8_t/u_int16_t with the preferred
kernel types u8/u16 across several netfilter files.

u_int*_t appears in many other files, 48 more to be precise, but I wanted
to keep this series small, unless advised otherwise.

No functional changes.

Changes in v2:
- addresses sashiko comments https://sashiko.dev/#/patchset/32368
  - nf_sockopt: update function prototypes and struct definitions
  - nf_log: update the corresponding function declarations and the 
    nf_logfn typedef
- link to v1: https://lore.kernel.org/all/20260612125146.75672-1-carlos@carlosgrillet.me

Carlos Grillet (6):
  netfilter: nf_nat_ftp: replace u_int16_t with u16
  netfilter: nf_nat_irc: replace u_int16_t with u16
  netfilter: nf_sockopt: replace u_int8_t with u8
  netfilter: xt_DSCP: replace u_int8_t with u8
  netfilter: xt_TCPOPTSTRIP: replace u_int8_t and u_int16_t with u8 and u16
  netfilter: nf_log: replace u_int8_t with u8

 include/linux/netfilter.h      |  6 +++---
 include/net/netfilter/nf_log.h | 16 ++++++++--------
 net/netfilter/nf_log.c         | 14 +++++++-------
 net/netfilter/nf_nat_ftp.c     |  2 +-
 net/netfilter/nf_nat_irc.c     |  2 +-
 net/netfilter/nf_sockopt.c     |  8 ++++----
 net/netfilter/xt_DSCP.c        |  8 ++++----
 net/netfilter/xt_TCPOPTSTRIP.c |  8 ++++----
 8 files changed, 32 insertions(+), 32 deletions(-)

-- 
2.54.0



* [PATCH nf-next v2 3/6] netfilter: nf_sockopt: replace u_int8_t with u8
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, linux-kernel, netdev
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Replace POSIX u_int8_t with the preferred kernel type u8, and update the
function prototypes and struct definition to match.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 include/linux/netfilter.h  | 6 +++---
 net/netfilter/nf_sockopt.c | 8 ++++----
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h
index efbbfa770d66..91b68bdba3f5 100644
--- a/include/linux/netfilter.h
+++ b/include/linux/netfilter.h
@@ -181,7 +181,7 @@ static inline void nf_hook_state_init(struct nf_hook_state *p,
 struct nf_sockopt_ops {
 	struct list_head list;
 
-	u_int8_t pf;
+	u8 pf;
 
 	/* Non-inclusive ranges: use 0/0/NULL to never get called. */
 	int set_optmin;
@@ -357,9 +357,9 @@ NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
 }
 
 /* Call setsockopt() */
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int optval, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int optval, sockptr_t opt,
 		  unsigned int len);
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int optval, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int optval, char __user *opt,
 		  int *len);
 
 struct flowi;
diff --git a/net/netfilter/nf_sockopt.c b/net/netfilter/nf_sockopt.c
index 34afcd03b6f6..19a1d028158c 100644
--- a/net/netfilter/nf_sockopt.c
+++ b/net/netfilter/nf_sockopt.c
@@ -59,8 +59,8 @@ void nf_unregister_sockopt(struct nf_sockopt_ops *reg)
 }
 EXPORT_SYMBOL(nf_unregister_sockopt);
 
-static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
-		int val, int get)
+static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u8 pf,
+					      int val, int get)
 {
 	struct nf_sockopt_ops *ops;
 
@@ -89,7 +89,7 @@ static struct nf_sockopt_ops *nf_sockopt_find(struct sock *sk, u_int8_t pf,
 	return ops;
 }
 
-int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
+int nf_setsockopt(struct sock *sk, u8 pf, int val, sockptr_t opt,
 		  unsigned int len)
 {
 	struct nf_sockopt_ops *ops;
@@ -104,7 +104,7 @@ int nf_setsockopt(struct sock *sk, u_int8_t pf, int val, sockptr_t opt,
 }
 EXPORT_SYMBOL(nf_setsockopt);
 
-int nf_getsockopt(struct sock *sk, u_int8_t pf, int val, char __user *opt,
+int nf_getsockopt(struct sock *sk, u8 pf, int val, char __user *opt,
 		  int *len)
 {
 	struct nf_sockopt_ops *ops;
-- 
2.54.0



* [PATCH nf-next v2 2/6] netfilter: nf_nat_irc: replace u_int16_t with u16
From: Carlos Grillet @ 2026-06-15 13:38 UTC (permalink / raw)
  To: Pablo Neira Ayuso, Florian Westphal, Phil Sutter
  Cc: netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20260615133835.51273-1-carlos@carlosgrillet.me>

Replace POSIX u_int16_t with the preferred kernel type u16.

No functional changes.

Signed-off-by: Carlos Grillet <carlos@carlosgrillet.me>
---
 net/netfilter/nf_nat_irc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_irc.c b/net/netfilter/nf_nat_irc.c
index 19c4fcc60c50..14b79cb0171b 100644
--- a/net/netfilter/nf_nat_irc.c
+++ b/net/netfilter/nf_nat_irc.c
@@ -39,7 +39,7 @@ static unsigned int help(struct sk_buff *skb,
 	char buffer[sizeof("4294967296 65635")];
 	struct nf_conn *ct = exp->master;
 	union nf_inet_addr newaddr;
-	u_int16_t port;
+	u16 port;
 
 	/* Reply comes from server. */
 	newaddr = ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3;
-- 
2.54.0


