Netdev List
* Re: [PATCH] net: wwan: t7xx: destroy DMA pool on CLDMA late init failure
From: Loic Poulain @ 2026-06-22 15:38 UTC (permalink / raw)
  To: Haoxiang Li
  Cc: chandrashekar.devegowda, haijun.liu, ricardo.martinez,
	ryazanov.s.a, johannes, andrew+netdev, davem, edumazet, kuba,
	pabeni, ilpo.jarvinen, netdev, linux-kernel, stable
In-Reply-To: <20260621031714.3605022-1-haoxiang_li2024@163.com>

On Sun, Jun 21, 2026 at 5:18 AM Haoxiang Li <haoxiang_li2024@163.com> wrote:
>
> t7xx_cldma_late_init() creates md_ctrl->gpd_dmapool before
> initializing the TX and RX rings. If any ring initialization
> fails, the error path frees the already initialized rings but
> leaves the DMA pool allocated.
>
> Destroy md_ctrl->gpd_dmapool on the late-init failure path
> to avoid leaking the DMA pool.
>
> Fixes: 39d439047f1d ("net: wwan: t7xx: Add control DMA interface")
> Cc: stable@vger.kernel.org
> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>

Reviewed-by: Loic Poulain <loic.poulain@oss.qualcomm.com>

> ---
>  drivers/net/wwan/t7xx/t7xx_hif_cldma.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
> index e10cb4f9104e..2917cee9b802 100644
> --- a/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
> +++ b/drivers/net/wwan/t7xx/t7xx_hif_cldma.c
> @@ -1063,6 +1063,9 @@ static int t7xx_cldma_late_init(struct cldma_ctrl *md_ctrl)
>         while (i--)
>                 t7xx_cldma_ring_free(md_ctrl, &md_ctrl->tx_ring[i], DMA_TO_DEVICE);
>
> +       dma_pool_destroy(md_ctrl->gpd_dmapool);
> +       md_ctrl->gpd_dmapool = NULL;
> +
>         return ret;
>  }
>
> --
> 2.25.1
>

^ permalink raw reply
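
The unwind order discussed in the t7xx patch above (free the rings that were already set up, then destroy the pool they were carved from) can be sketched in userspace C. This is only an illustration under stated assumptions: struct ctrl, res_alloc(), res_free() and ctrl_late_init() are hypothetical names, and malloc()/free() stand in for the driver's DMA pool and ring helpers.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_RINGS 4

/* Hypothetical stand-in for struct cldma_ctrl; not the driver's layout. */
struct ctrl {
	void *pool;            /* stands in for md_ctrl->gpd_dmapool */
	void *ring[NUM_RINGS]; /* stands in for the TX/RX rings */
};

static int live_allocs;        /* outstanding allocations, for leak checks */

static void *res_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void res_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/*
 * Create the pool, then each ring. fail_at selects the ring whose setup
 * fails (pass -1 for no injected failure). Returns 0 or -1.
 */
int ctrl_late_init(struct ctrl *c, int fail_at)
{
	int i;

	assert(c);

	c->pool = res_alloc();
	if (!c->pool)
		return -1;

	for (i = 0; i < NUM_RINGS; i++) {
		c->ring[i] = (i == fail_at) ? NULL : res_alloc();
		if (!c->ring[i])
			goto err_free_rings;
	}
	return 0;

err_free_rings:
	/* Free only the rings that were set up, newest first. */
	while (i--) {
		res_free(c->ring[i]);
		c->ring[i] = NULL;
	}
	/* Then release the pool and clear the pointer, as the patch does. */
	res_free(c->pool);
	c->pool = NULL;
	return -1;
}
```

Clearing the pointer after the destroy keeps a later teardown from seeing a stale pool, which is the same reason the patch sets md_ctrl->gpd_dmapool to NULL.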

* Re: [PATCH net v3 0/6] ipv6: fix error handling in disable_ipv6 sysctl
From: Ido Schimmel @ 2026-06-22 15:35 UTC (permalink / raw)
  To: Fernando Fernandez Mancera
  Cc: netdev, nicolas.dichtel, stephen, horms, pabeni, kuba, edumazet,
	davem, dsahern
In-Reply-To: <20260622130857.5115-1-fmancera@suse.de>

On Mon, Jun 22, 2026 at 03:08:51PM +0200, Fernando Fernandez Mancera wrote:
> While working on a different IPv6 patch series I have spotted multiple
> minor bugs around sysctl error handling and notifications. In general,
> they are not serious issues.

For the series:

Reviewed-by: Ido Schimmel <idosch@nvidia.com>

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH iwl-net] ice: clear the default forwarding VSI rule when releasing a VSI
From: Petr Oros @ 2026-06-22 15:30 UTC (permalink / raw)
  To: Marcin Szycik, netdev
  Cc: Przemek Kitszel, Eric Dumazet, linux-kernel, Andrew Lunn,
	Tony Nguyen, Michal Swiatkowski, Jacob Keller, Jakub Kicinski,
	Paolo Abeni, David S. Miller, intel-wired-lan
In-Reply-To: <4dc1eb2d-e69f-4f13-ab08-ed0077305098@linux.intel.com>


On 6/22/26 15:52, Marcin Szycik wrote:
>
> On 22/06/2026 10:10, Petr Oros wrote:
>> When a VSI is configured as the switch's default forwarding VSI
>> (ICE_SW_LKUP_DFLT) and is then torn down, the rule is left behind in
>> the switch. ice_vsi_release() no longer removes it, and the SR-IOV VF
>> free path (ice_free_vfs() -> ice_free_vf_res() -> ice_vf_vsi_release()
>> -> ice_vsi_release()) does not disable promiscuous mode either, which
>> only happens on VF reset in ice_vf_clear_all_promisc_modes().
>>
>> A trusted VF that enters unicast promiscuous mode becomes the default
>> forwarding VSI (this is the default mode, when the PF does not have VF
>> true-promiscuous mode enabled). If the VFs are then destroyed without
>> the VF first leaving promiscuous mode, the ICE_SW_LKUP_DFLT rule for
>> the now-freed VSI is leaked. When VFs are recreated, a VSI reuses the
>> freed hw_vsi_id. If it is assigned a different VSI handle than the
>> leaked rule holds, ice_set_dflt_vsi() does not recognize it as
>> already-default, and ice_add_update_vsi_list() folds the dangling
>> (freed) handle into a VSI list, which the firmware rejects. The VSI
>> handle assigned on re-creation varies, so the failure is intermittent
>> rather than every cycle.
>>
>> Reproduce by repeatedly running the cycle below on the two ports of the
>> same card, where $VF0 and $VF1 are the netdevs of vf 15 once they
>> appear. The VF must be brought up so iavf actually pushes the unicast
>> promiscuous request, and the rule must settle before the VFs are torn
>> down again:
>>
>>    echo 16 > /sys/class/net/$PF0/device/sriov_numvfs
>>    echo 16 > /sys/class/net/$PF1/device/sriov_numvfs
>>    ip link set $PF0 vf 15 trust on
>>    ip link set $PF1 vf 15 trust on
>>    ip link set $VF0 up
>>    ip link set $VF1 up
>>    ip link set $VF0 promisc on
>>    ip link set $VF1 promisc on
>>    sleep 1
>>    echo 0 > /sys/class/net/$PF0/device/sriov_numvfs
>>    echo 0 > /sys/class/net/$PF1/device/sriov_numvfs
>>
>> Within a few cycles the ice PF and iavf VF log:
>>
>>    Failed to set VSI 25 as the default forwarding VSI, error -22
>>    Turning on/off promiscuous mode for VF 63 failed, error: -22
>>    PF returned error -53 (IAVF_ERR_ADMIN_QUEUE_ERROR) to our request 14
>>
>> This cleanup used to live in ice_vsi_release() but was dropped by the
>> referenced refactor. Restore it. Clear the default forwarding VSI rule
>> in ice_vsi_release() when this VSI owns it, which covers every teardown
>> path.
>>
>> Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
>> Signed-off-by: Petr Oros <poros@redhat.com>
>> ---
>>   drivers/net/ethernet/intel/ice/ice_lib.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
>> index 2717cc31bff8fe..408464434506ef 100644
>> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
>> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
>> @@ -2872,6 +2872,9 @@ int ice_vsi_release(struct ice_vsi *vsi)
>>   		return -ENODEV;
>>   	pf = vsi->back;
>>   
>> +	if (ice_is_vsi_dflt_vsi(vsi))
>> +		ice_clear_dflt_vsi(vsi);
> In the referenced commit, the chunk of code that contained these missing 2 lines
> was moved to ice_vsi_decfg(). It also sounds like a good place for them and will
> be called from ice_vsi_release(). Are you sure we should place them directly in
> ice_vsi_release() instead?
No, ice_vsi_decfg() is not a good place for them because it is not
release only. It also runs on the rebuild and reconfig paths
(ice_vsi_rebuild(), ice_vf_reconfig_vsi(), the ice_vsi_cfg() error
path), where the VSI is reconfigured in place and stays alive, so it
can still be the default VSI afterwards.

Before the refactor the release-path clear lived only in
ice_vsi_release() and the old ice_vsi_rebuild() never cleared it.
Putting it in ice_vsi_decfg() would also clear the default VSI whenever
the default VSI itself is reset or reconfigured, which the original
code never did. ice_vsi_release() keeps it to the case where the owning
VSI is actually torn down, and the ice_is_vsi_dflt_vsi() guard makes it
a no-op everywhere else.

So I would prefer to keep it in ice_vsi_release().

Regards,

Petr

> Thanks,
> Marcin
>
>> +
>>   	if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
>>   		ice_rss_clean(vsi);
>>   
>


^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH net v2] igb: only strip Rx timestamp header on the first buffer of a frame
From: Loktionov, Aleksandr @ 2026-06-22 15:27 UTC (permalink / raw)
  To: tkusters@aweta.nl, Nguyen, Anthony L, Kitszel, Przemyslaw,
	Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Richard Cochran, Jesper Dangaard Brouer,
	Kurt Kanzenbach
  Cc: intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
In-Reply-To: <20260619-igb-rx-ts-fix-v2-1-d3b8d605ca62@aweta.nl>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Tjerk Kusters via B4 Relay
> Sent: Friday, June 19, 2026 9:15 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; Andrew Lunn
> <andrew+netdev@lunn.ch>; David S. Miller <davem@davemloft.net>; Eric
> Dumazet <edumazet@google.com>; Jakub Kicinski <kuba@kernel.org>; Paolo
> Abeni <pabeni@redhat.com>; Richard Cochran <richardcochran@gmail.com>;
> Jesper Dangaard Brouer <hawk@kernel.org>; Kurt Kanzenbach
> <kurt@linutronix.de>
> Cc: intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-
> kernel@vger.kernel.org; stable@vger.kernel.org; Tjerk Kusters
> <tkusters@aweta.nl>
> Subject: [Intel-wired-lan] [PATCH net v2] igb: only strip Rx timestamp
> header on the first buffer of a frame
> 
> From: Tjerk Kusters <tkusters@aweta.nl>
> 
> When Rx hardware timestamping is enabled (e.g. ptp4l, which configures
> HWTSTAMP_FILTER_ALL), the NIC prepends a 16-byte timestamp header to
> the first Rx buffer of every received frame. igb_clean_rx_irq() strips
> this header inside its per-buffer loop:
> 
> 	if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
> 		ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring->q_vector,
> 						 pktbuf, &timestamp);
> 		pkt_offset += ts_hdr_len;
> 		size -= ts_hdr_len;
> 	}
> 
> For a frame that spans more than one Rx buffer (e.g. a jumbo frame),
> this block runs once per buffer. The timestamp header only exists at
> the start of the first buffer, but igb_ptp_rx_pktstamp() is called for
> every buffer.
> 
> On a continuation buffer the data is packet payload, not a timestamp
> header. igb_ptp_rx_pktstamp() already has two guards against acting on
> a non-header buffer: it returns 0 if PTP is disabled, and returns 0 if
> the reserved dwords (the first 8 bytes) are non-zero. Neither is
> sufficient
> here: PTP is enabled, and a continuation buffer whose payload happens
> to begin with 8 zero bytes passes the reserved-dword check. In that
> case the payload is mistaken for a valid timestamp header and
> igb_ptp_rx_pktstamp() returns IGB_TS_HDR_LEN, so the caller strips 16
> bytes of real data from that buffer. A frame spanning N buffers whose
> continuation buffers start with zero bytes therefore loses 16 * (N -
> 1) bytes from its tail.
> 
> This is easily triggered by a GigE Vision camera streaming dark frames
> (mostly 0x00 pixel data) over jumbo UDP with PTP active on the
> receiver:
> the all-zero frames arrive truncated while frames with non-zero
> content are fine. There is no error indication.
> 
> No content-based check can reliably tell a continuation buffer that
> begins with zero bytes from a real timestamp header, because both are
> all zero.
> Fix it structurally instead: only attempt the strip on the first
> buffer of a frame, which is the only buffer that can contain a
> timestamp header. In
> igb_clean_rx_irq() skb is NULL until the first buffer has been
> processed, so guarding the strip with !skb restricts it to the first
> buffer regardless of payload content.
> 
> Fixes: 5379260852b0 ("igb: Fix XDP with PTP enabled")
> Cc: stable@vger.kernel.org
> Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
> Signed-off-by: Tjerk Kusters <tkusters@aweta.nl>
> ---
> Changes in v2:
>  - resend via b4 (v1 was sent with a mail client)
>  - use full author name "Tjerk Kusters" (Jacob Keller)
>  - add Reviewed-by from Kurt Kanzenbach
>  - no functional change
> 
> Link to v1:
> https://lore.kernel.org/all/PAWPR05MB1069106D52F4E17F1EDB99C67B9182@PA
> WPR05MB10691.eurprd05.prod.outlook.com/
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/igb/igb_main.c
> b/drivers/net/ethernet/intel/igb/igb_main.c
> index ce91dda00ec0..abb55cd589a9 100644
> --- a/drivers/net/ethernet/intel/igb/igb_main.c
> +++ b/drivers/net/ethernet/intel/igb/igb_main.c
> @@ -9061,7 +9061,8 @@ static int igb_clean_rx_irq(struct igb_q_vector
> *q_vector, const int budget)
>  		pktbuf = page_address(rx_buffer->page) + rx_buffer-
> >page_offset;
> 
>  		/* pull rx packet timestamp if available and valid */
> -		if (igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
> +		if (!skb &&
> +		    igb_test_staterr(rx_desc, E1000_RXDADV_STAT_TSIP)) {
>  			int ts_hdr_len;
> 
>  			ts_hdr_len = igb_ptp_rx_pktstamp(rx_ring-
> >q_vector,
> 
> ---
> base-commit: 2d3090a8aeb596a26935db0955d46c9a5db5c6ce
> change-id: 20260619-igb-rx-ts-fix-cd70585ee316
> 
> Best regards,
> --
> Tjerk Kusters <tkusters@aweta.nl>
> 

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply
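
The truncation arithmetic in the igb patch above can be checked with a small userspace model. It is a sketch, not driver code: struct rx_buf, ts_hdr_len() and frame_payload_len() are hypothetical names, the header check only mirrors the "first 8 bytes are zero" rule from the commit message, and the TSIP status bit is assumed to be set for every buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TS_HDR_LEN 16 /* mirrors the 16-byte header in the patch description */

struct rx_buf {
	const unsigned char *data;
	size_t len;
};

/*
 * Content-based check modeled on the commit message: the buffer looks
 * like a timestamp header if its first 8 bytes (reserved dwords) are zero.
 */
static size_t ts_hdr_len(const struct rx_buf *b)
{
	size_t i;

	if (b->len < TS_HDR_LEN)
		return 0;
	for (i = 0; i < 8; i++)
		if (b->data[i])
			return 0;
	return TS_HDR_LEN;
}

/*
 * Payload bytes of a frame spanning n buffers. With first_only set, the
 * strip is attempted on buffer 0 only (the "!skb" guard); otherwise it is
 * attempted on every buffer (the behavior before the fix).
 */
size_t frame_payload_len(const struct rx_buf *bufs, size_t n, bool first_only)
{
	size_t i, total = 0;

	assert(bufs || n == 0);

	for (i = 0; i < n; i++) {
		size_t len = bufs[i].len;

		if (i == 0 || !first_only)
			len -= ts_hdr_len(&bufs[i]);
		total += len;
	}
	return total;
}
```

For a frame of three all-zero 64-byte buffers, the guarded variant keeps 176 payload bytes while the unguarded one keeps 144, a loss of 16 * (N - 1) = 32 bytes, which matches the figure given in the commit message.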

* RE: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp reporting when NET_RX_BUSY_POLL is disabled
From: Loktionov, Aleksandr @ 2026-06-22 15:26 UTC (permalink / raw)
  To: Ding Meng, Nguyen, Anthony L, Kitszel, Przemyslaw,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, Kiszka, Jan, Bezdeka, Florian
  Cc: intel-wired-lan@lists.osuosl.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, wq.wang@siemens.com
In-Reply-To: <20260622041718.6106-1-meng.ding@siemens.com>



> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf
> Of Ding Meng via Intel-wired-lan
> Sent: Monday, June 22, 2026 6:13 AM
> To: Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>; andrew+netdev@lunn.ch;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; Kiszka, Jan <jan.kiszka@siemens.com>; Bezdeka,
> Florian <florian.bezdeka@siemens.com>
> Cc: intel-wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org;
> netdev@vger.kernel.org; meng.ding@siemens.com; wq.wang@siemens.com
> Subject: [Intel-wired-lan] [PATCH net] igc: Fix RX HW timestamp
> reporting when NET_RX_BUSY_POLL is disabled
> 
> When CONFIG_NET_RX_BUSY_POLL is deactivated, fetching RX HW timestamps
> from the NIC no longer works as expected.
> 
> This occurs because disabling CONFIG_NET_RX_BUSY_POLL disables the SKB
> NAPI mapping in __skb_mark_napi_id(). Consequently, get_timestamp()
> fails to perform its driver lookup, and the igc driver's struct
> net_device_ops::ndo_get_tstamp is never invoked.
> 
> Instead, get_timestamp() falls back to using skb_hwtstamps(skb)->hwtstamp,
> a field that the driver has not populated.
> 
> Fix this by populating the hwtstamp field with the correct timestamp
> in the default timer when CONFIG_NET_RX_BUSY_POLL is disabled.
> 
> Fixes: 069b142f5819 ("igc: Add support for PTP .getcyclesx64()")
I think, because it's a fix, it needs Cc: stable@vger.kernel.org

Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

> Co-developed-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> Signed-off-by: Florian Bezdeka <florian.bezdeka@siemens.com>
> Signed-off-by: Ding Meng <meng.ding@siemens.com>
> ---
>  drivers/net/ethernet/intel/igc/igc_main.c | 38 ++++++++++++++++-------
>  1 file changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c
> b/drivers/net/ethernet/intel/igc/igc_main.c
> index 8ac16808023..1da8d7aa76d 100644
> --- a/drivers/net/ethernet/intel/igc/igc_main.c
> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
> @@ -1992,7 +1992,26 @@ static struct sk_buff *igc_build_skb(struct
> igc_ring *rx_ring,
>  	return skb;
>  }
> 
> -static struct sk_buff *igc_construct_skb(struct igc_ring *rx_ring,
> +static void igc_construct_skb_timestamps(struct igc_adapter *adapter,
> +					 struct sk_buff *skb,
> +					 struct igc_xdp_buff *ctx)
> +{
> +	if (!ctx->rx_ts)
> +		return;
> +#ifdef CONFIG_NET_RX_BUSY_POLL
> +	skb_shinfo(skb)->tx_flags |= SKBTX_HW_TSTAMP_NETDEV;
> +	skb_hwtstamps(skb)->netdev_data = ctx->rx_ts;
> +#else
> +	struct igc_inline_rx_tstamps *tstamps;
> +
> +	tstamps = ctx->rx_ts;
> +	skb_hwtstamps(skb)->hwtstamp = igc_ptp_rx_pktstamp(adapter,
> +							   tstamps->timer0);
> +#endif
> +}
> +
> +static struct sk_buff *igc_construct_skb(struct igc_adapter *adapter,
> +					 struct igc_ring *rx_ring,
>  					 struct igc_rx_buffer *rx_buffer,
>  					 struct igc_xdp_buff *ctx)
>  {
> @@ -2013,10 +2032,7 @@ static struct sk_buff *igc_construct_skb(struct
> igc_ring *rx_ring,
>  	if (unlikely(!skb))
>  		return NULL;
> 
> -	if (ctx->rx_ts) {
> -		skb_shinfo(skb)->tx_flags |= SKBTX_HW_TSTAMP_NETDEV;
> -		skb_hwtstamps(skb)->netdev_data = ctx->rx_ts;
> -	}
> +	igc_construct_skb_timestamps(adapter, skb, ctx);
> 
>  	/* Determine available headroom for copy */
>  	headlen = size;
> @@ -2686,7 +2702,7 @@ static int igc_clean_rx_irq(struct igc_q_vector
> *q_vector, const int budget)
>  		else if (ring_uses_build_skb(rx_ring))
>  			skb = igc_build_skb(rx_ring, rx_buffer,
> &ctx.xdp);
>  		else
> -			skb = igc_construct_skb(rx_ring, rx_buffer,
> &ctx);
> +			skb = igc_construct_skb(adapter, rx_ring,
> rx_buffer, &ctx);
> 
>  		/* exit if we failed to retrieve a buffer */
>  		if (!xdp_res && !skb) {
> @@ -2738,7 +2754,8 @@ static int igc_clean_rx_irq(struct igc_q_vector
> *q_vector, const int budget)
>  	return total_packets;
>  }
> 
> -static struct sk_buff *igc_construct_skb_zc(struct igc_ring *ring,
> +static struct sk_buff *igc_construct_skb_zc(struct igc_adapter
> *adapter,
> +					    struct igc_ring *ring,
>  					    struct igc_xdp_buff *ctx)
>  {
>  	struct xdp_buff *xdp = &ctx->xdp;
> @@ -2760,10 +2777,7 @@ static struct sk_buff
> *igc_construct_skb_zc(struct igc_ring *ring,
>  		__skb_pull(skb, metasize);
>  	}
> 
> -	if (ctx->rx_ts) {
> -		skb_shinfo(skb)->tx_flags |= SKBTX_HW_TSTAMP_NETDEV;
> -		skb_hwtstamps(skb)->netdev_data = ctx->rx_ts;
> -	}
> +	igc_construct_skb_timestamps(adapter, skb, ctx);
> 
>  	return skb;
>  }
> @@ -2775,7 +2789,7 @@ static void igc_dispatch_skb_zc(struct
> igc_q_vector *q_vector,
>  	struct igc_ring *ring = q_vector->rx.ring;
>  	struct sk_buff *skb;
> 
> -	skb = igc_construct_skb_zc(ring, ctx);
> +	skb = igc_construct_skb_zc(q_vector->adapter, ring, ctx);
>  	if (!skb) {
>  		ring->rx_stats.alloc_failed++;
>  		set_bit(IGC_RING_FLAG_RX_ALLOC_FAILED, &ring->flags);
> 
> base-commit: 4549871118cf616eecdd2d939f78e3b9e1dddc48
> --
> 2.47.3


^ permalink raw reply

* Re: [PATCH iproute2] ip: return correct status from help command
From: Stephen Hemminger @ 2026-06-22 15:16 UTC (permalink / raw)
  To: Rose Wright; +Cc: netdev
In-Reply-To: <20260621180311.8374-1-rosesophiewright@gmail.com>

On Sun, 21 Jun 2026 18:03:11 +0000
Rose Wright <rosesophiewright@gmail.com> wrote:

> Currently, "ip help" or "ip -help" always returns an error code because usage() is used as a fall through on "ip" and defaults to stderr with -1.
> 
> This is a minor bug that breaks "ip help | grep" and other scripts that rely on standard exit codes. The fix is to pass the status code as a parameter into usage() and change stderr to stdout when needed.
> 
> Signed-off-by: Rose Wright <rosesophiewright@gmail.com>
> ---

This is the closest of the three submissions, but there are way more commands in iproute2
than just ip. All of the commands need to be addressed. This looks like a perfect trivial
job for AI coding tools. I am looking into it now.


^ permalink raw reply
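
The fix described in this thread (pass the status into usage() and print to stdout when help was asked for explicitly) could look roughly like the sketch below. The names usage_target_for() and print_usage() are hypothetical; usage() in ip/ip.c is declared as usage(void) and exits directly, and the usage text here is shortened to one line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper type: where to print and what status to exit with. */
struct usage_target {
	FILE *stream;
	int status;
};

/*
 * Explicit "help" is a successful request, so it goes to stdout with
 * status 0. Anything else is a usage error: stderr and -1, as before.
 */
struct usage_target usage_target_for(bool help_requested)
{
	struct usage_target t;

	t.stream = help_requested ? stdout : stderr;
	t.status = help_requested ? 0 : -1;
	return t;
}

/* Print the usage text and return the status the caller passes to exit(). */
int print_usage(bool help_requested)
{
	struct usage_target t = usage_target_for(help_requested);

	assert(t.stream);
	fprintf(t.stream, "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n");
	return t.status;
}
```

Keeping the stream and status decision in one helper is one way to apply the same rule to the other iproute2 commands without repeating the condition in each usage() function.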

* RE: Ethtool : PRBS feature
From: Das, Shubham @ 2026-06-22 15:10 UTC (permalink / raw)
  To: Andrew Lunn, Maxime Chevallier
  Cc: Alexander H Duyck, lee@trager.us, netdev@vger.kernel.org,
	mkubecek@suse.cz, D H, Siddaraju, Chintalapalle, Balaji,
	Lindberg, Magnus, niklas.damberg@ericsson.com
In-Reply-To: <b61ee484-f9f5-4504-9e88-aeac701cd4e2@lunn.ch>

> Do you at least have the functionality of the standard C45 registers, even if the addresses and bit fields are messed up?
> If you do, maybe we should actually start with a C45 conforming implementation, and then you can do a translation layer to whatever oddball implementation you have?

The PHY supports the equivalent functionality (PRBS TX, PRBS RX/checker, BER testing, error injection, and symbol/error counters read), 
but these are not exposed through standard Clause 45 PRBS registers. Instead, all operations are implemented by PHY firmware
and accessed through a command/control register interface.

If we want to support both driver-owned NIC architectures and use cases where the PHY driver directly manages the PRBS functionality,
should this be exposed through a common phylib abstraction/API, or through a different approach?

- Shubham D

> -----Original Message-----
> From: Andrew Lunn <andrew@lunn.ch>
> Sent: 21 June 2026 00:51
> To: Maxime Chevallier <maxime.chevallier@bootlin.com>
> Cc: Das, Shubham <shubham.das@intel.com>; Alexander H Duyck
> <alexander.duyck@gmail.com>; lee@trager.us; netdev@vger.kernel.org;
> mkubecek@suse.cz; D H, Siddaraju <siddaraju.dh@intel.com>; Chintalapalle,
> Balaji <balaji.chintalapalle@intel.com>; Lindberg, Magnus
> <magnus.k.lindberg@ericsson.com>; niklas.damberg@ericsson.com
> Subject: Re: Ethtool : PRBS feature
> 
> On Sat, Jun 20, 2026 at 04:39:06PM +0200, Maxime Chevallier wrote:
> > Hi,
> >
> > On 6/20/26 15:48, Das, Shubham wrote:
> > >> Can you change the firmware to expose the 802.3 registers for PRBS?
> > >> You can then write a library which both phylib and your driver can use.
> > >
> > > Andrew,
> > >
> > > No, exposing the PRBS registers to drivers is not possible in our design (the
> registers are buried deep within the Accelerator/NIC/PHY/Analog IP hierarchy).
> > >
> > > Additionally, the PHY PRBS registers are not in accordance with the IEEE
> Clause 45 definitions. For instance, the PRBS registers are paged and 32-bit wide.
> > >
> 
> Hi Shubham
> 
> Do you at least have the functionality of the standard C45 registers, even if the
> addresses and bit fields are messed up?
> 
> If you do, maybe we should actually start with a C45 conforming implementation,
> and then you can do a translation layer to whatever oddball implementation you
> have?
> 
> > > Given these constraints, we think ethtool --phy-test is a reasonable
> > > starting point for exposing the long-established Ethernet PRBS
> > > functionality to Linux userspace, as it aligns well with the
> > > driver-owned NIC architecture model.
> 
> I agree an ethtool --phy-test makes sense, but we need to ensure standards-based
> C45 functionality is covered, not just your oddball vendor functionality.
> 
> 	Andrew

^ permalink raw reply

* RE: [PATCH 1/2] Protect skb pointer used by two different kernel instances
From: Selvamani Rajagopal @ 2026-06-22 15:03 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Parthiban Veerasooran, Andrew Lunn, Piergiorgio Beruto,
	David S. Miller, Jakub Kicinski, Paolo Abeni,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Andrew Lunn
In-Reply-To: <CANn89iLLrOZ4Ep=dBgs0sC0tWsrZHy8DnCPLw_Em+7v2UtE9tA@mail.gmail.com>



> -----Original Message-----
> From: Eric Dumazet <edumazet@google.com>
> Subject: Re: [PATCH 1/2] Protect skb pointer used by two different kernel instances
> 
> 
> >
> > Fixes: b542d13fab0f ("net: ethernet: oa_tc6: Interrupt is active low, level triggered.")
> > Signed-off-by: Selvamani Rajagopal <Selvamani.Rajagopal@onsemi.com>
> > ---
> 
> OK but please use "net: ethernet: oa_tc6:" prefix in the patch title.


Yes. Another miss. Sorry about that.

^ permalink raw reply

* [PATCH net] netpoll: fix a use-after-free on shutdown path
From: Breno Leitao @ 2026-06-22 15:01 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, Amerigo Wang
  Cc: netdev, linux-kernel, vlad.wing, asantostc, kernel-team, stable,
	Breno Leitao

There is a use-after-free in netpoll, which is clearly detected by
KASAN.

      BUG: KASAN: slab-use-after-free in _raw_spin_lock_irqsave+0x3b/0x80
      Read of size 1 at addr ... by task kworker/9:1
      Workqueue: events queue_process
      Call Trace:
       skb_dequeue+0x1e/0xb0
       queue_process+0x2c/0x600
       process_scheduled_works+0x4b6/0x850
       worker_thread+0x414/0x5a0
      Allocated by task 242:
       __netpoll_setup+0x201/0x4a0
       netpoll_setup+0x249/0x550
       enabled_store+0x32f/0x380
      Freed by task 0:
       kfree+0x1b7/0x540
       rcu_core+0x3f8/0x7a0

The problem happens when there is a pending TX worker running in
parallel with the cleanup path.

This is what happens on the netpoll shutdown path:

1) __netpoll_cleanup() is called
2) set dev->npinfo to NULL
3) call_rcu() with rcu_cleanup_netpoll_info()
  3.1) rcu_cleanup_netpoll_info() tries to cancel all workers with
       cancel_delayed_work(), but doesn't wait for the worker to finish
4) and kfree(npinfo);

Because 3.1) doesn't really cancel the work, as the comment says "we
can't call cancel_delayed_work_sync here, as we are in softirq", the TX
worker can run after 4).

TL;DR: queue_process() is not an RCU reader; it reaches npinfo through
the work item via container_of().

In reality, we can improve this cleanup path a lot, but given that
this is targeting net, just take the sane path:

1) set dev->npinfo to NULL
2) synchronize net / RCU
3) cancel_delayed_work_sync() any new worker (that potentially showed up
   after the grace period -- and should exit soon given they will see
   dev->npinfo = NULL)
4) then rcu_cleanup_netpoll_info() -> kfree() npinfo

In the future, we can do the cleanup inline here and drop the
npinfo->rcu rcu_head, but that is net-next material.

Cc: stable@vger.kernel.org
Fixes: 38e6bc185d95 ("netpoll: make __netpoll_cleanup non-block")
Signed-off-by: Breno Leitao <leitao@debian.org>
---
 net/core/netpoll.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 229dde818ab33..5765015b40720 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -634,9 +634,6 @@ static void rcu_cleanup_netpoll_info(struct rcu_head *rcu_head)
 
 	skb_queue_purge(&npinfo->txq);
 
-	/* we can't call cancel_delayed_work_sync here, as we are in softirq */
-	cancel_delayed_work(&npinfo->tx_work);
-
 	/* clean after last, unfinished work */
 	__skb_queue_purge(&npinfo->txq);
 	/* now cancel it again */
@@ -664,6 +661,14 @@ static void __netpoll_cleanup(struct netpoll *np)
 			ops->ndo_netpoll_cleanup(np->dev);
 
 		RCU_INIT_POINTER(np->dev->npinfo, NULL);
+		/*
+		 * synchronize_net() does not protect the worker
+		 * (queue_process() is not an RCU reader). It fences the
+		 * senders -- the real RCU readers -- so they cannot re-arm
+		 * tx_work after the np->dev->npinfo was set to NULL.
+		 */
+		synchronize_net();
+		cancel_delayed_work_sync(&npinfo->tx_work);
 		call_rcu(&npinfo->rcu, rcu_cleanup_netpoll_info);
 	}
 

---
base-commit: d07d80b6a129a44538cda1549b7acf95154fb197
change-id: 20260622-netpoll_rcu_fix-def7bce1207a

Best regards,
-- 
Breno Leitao <leitao@debian.org>


^ permalink raw reply related

* Re: [PATCH iproute2-next] "ip help" wrong output, exit code.
From: Stephen Hemminger @ 2026-06-22 14:57 UTC (permalink / raw)
  To: Dmitri Seletski; +Cc: netdev
In-Reply-To: <069b13e1-f689-410b-bd40-b5e5831b67e7@gmail.com>

On Sun, 21 Jun 2026 22:48:59 +0100
Dmitri Seletski <drjoms@gmail.com> wrote:

> From 0805e07105cd15c5b94271a4706e50e3c65dbde5 Mon Sep 17 00:00:00 2001
> From: Dmitri Seletski <drjoms@gmail.com>
> Date: Sun, 21 Jun 2026 22:12:43 +0100
> Subject: [PATCH iproute2-next]  "ip help" wrong output, exit code.
> 
> Changed output of "ip help" from standard error to standard output. And 
> Exit is now 0 instead of -1. "ip help|grep bridge" - now gives bridge 
> syntax instead of flooding user with everything from "ip help".
> ---
> ip/ip.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/ip/ip.c b/ip/ip.c
> index e4b71bde..4627b61c 100644
> --- a/ip/ip.c
> +++ b/ip/ip.c
> @@ -56,7 +56,7 @@ static void usage(void) __attribute__((noreturn));
> 
> static void usage(void)
> {
> -fprintf(stderr,
> +fprintf(stdout,
> "Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }\n"
> "       ip [ -force ] -batch filename\n"
> "where  OBJECT := { address | addrlabel | fou | help | ila | ioam | l2tp 
> | link |\n"
> @@ -72,7 +72,7 @@ static void usage(void)
> "                    -o[neline] | -t[imestamp] | -ts[hort] | -b[atch] 
> [filename] |\n"
> "                    -rc[vbuf] [size] | -n[etns] name | -N[umeric] | 
> -a[ll] |\n"
> "                    -c[olor]}\n");
> -exit(-1);
> +exit(0);
> }

Your mailer damages white space.

^ permalink raw reply

* [PATCH net] nfc: nci: fix out-of-bounds write in nci_target_auto_activated()
From: Samuel Page @ 2026-06-22 14:52 UTC (permalink / raw)
  To: David Heidelberg
  Cc: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Simon Horman, oe-linux-nfc, netdev, linux-kernel, stable,
	Samuel Page

nci_target_auto_activated() appends a target to the fixed-size array
ndev->targets[NCI_MAX_DISCOVERED_TARGETS] and increments ndev->n_targets
without first checking whether the array is full, unlike its sibling
nci_add_new_target(), which bails out when n_targets already equals
NCI_MAX_DISCOVERED_TARGETS.

ndev->n_targets is only cleared by nci_clear_target_list(), so an NFCC
that repeatedly re-runs discovery (RF_DISCOVER_RSP, which re-enters
NCI_DISCOVERY without clearing the target list) and reports an
auto-activated target (RF_INTF_ACTIVATED_NTF) drives n_targets past the
limit. The append then writes a struct nfc_target past the end of the
array (a slab out-of-bounds write), and nfc_targets_found() goes on to
walk the array with the inflated count:

  BUG: KASAN: slab-out-of-bounds in nci_add_new_protocol+0x94/0x2ac [nci]
  Write of size 2 at addr ffff0000c7299a18 by task kworker/u8:0/12
  Workqueue: nfc0_nci_rx_wq nci_rx_work [nci]
  Call trace:
   nci_add_new_protocol+0x94/0x2ac [nci]
   nci_ntf_packet+0xddc/0x11a0 [nci]
   nci_rx_work+0x15c/0x1e0 [nci]
   process_one_work+0x2dc/0x500
   worker_thread+0x240/0x460
   kthread+0x1c0/0x1d0
   ret_from_fork+0x10/0x20

  The buggy address belongs to the cache kmalloc-2k of size 2048
  The buggy address is located 1024 bytes to the right of
  allocated 1560-byte region [ffff0000c7299000, ffff0000c7299618)

Guard nci_target_auto_activated() with the same check used by
nci_add_new_target().

Fixes: 019c4fbaa790 ("NFC: Add NCI multiple targets support")
Cc: stable@vger.kernel.org
Assisted-by: Bynario AI
Signed-off-by: Samuel Page <sam@bynar.io>
---
 net/nfc/nci/ntf.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/net/nfc/nci/ntf.c b/net/nfc/nci/ntf.c
index c96512bb8653..566ca839fa48 100644
--- a/net/nfc/nci/ntf.c
+++ b/net/nfc/nci/ntf.c
@@ -603,6 +603,12 @@ static void nci_target_auto_activated(struct nci_dev *ndev,
 	struct nfc_target *target;
 	int rc;
 
+	/* This is a new target, check if we've enough room */
+	if (ndev->n_targets == NCI_MAX_DISCOVERED_TARGETS) {
+		pr_debug("not enough room, ignoring new target...\n");
+		return;
+	}
+
 	target = &ndev->targets[ndev->n_targets];
 
 	rc = nci_add_new_protocol(ndev, target, ntf->rf_protocol,

base-commit: 47186409c092cd7dd70350999186c700233e854d
-- 
2.54.0


^ permalink raw reply related
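
The guard added by the nci patch above is the usual "check capacity before appending to a fixed-size array" pattern. A minimal userspace sketch follows; struct dev, struct target, dev_add_target() and the MAX_TARGETS value are illustrative assumptions, not the NCI definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TARGETS 10 /* illustrative; stands in for NCI_MAX_DISCOVERED_TARGETS */

struct target {
	int protocol;
};

struct dev {
	struct target targets[MAX_TARGETS];
	int n_targets;
};

/* Append a target only if a slot is free; returns false when the array is full. */
bool dev_add_target(struct dev *d, int protocol)
{
	assert(d->n_targets >= 0 && d->n_targets <= MAX_TARGETS);

	/* Same check the patch adds: no room, so ignore the new target. */
	if (d->n_targets == MAX_TARGETS)
		return false;

	d->targets[d->n_targets].protocol = protocol;
	d->n_targets++;
	return true;
}
```

Because the count only ever grows inside the guarded branch, n_targets can never exceed the array size, even if the append is attempted repeatedly without the list being cleared.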

* RE: [PATCH net-next v5 1/4] dpll: add DPLL_PIN_TYPE_INT_NCO pin type
From: Kubalewski, Arkadiusz @ 2026-06-22 14:47 UTC (permalink / raw)
  To: Vecera, Ivan, Jiri Pirko, Vadim Fedorenko, Jakub Kicinski
  Cc: netdev@vger.kernel.org, Jiri Pirko, David S. Miller,
	Donald Hunter, Eric Dumazet, Schmidt, Michal, Paolo Abeni,
	Vaananen, Pasi, Oros, Petr, Prathosh Satish, Simon Horman,
	linux-kernel@vger.kernel.org
In-Reply-To: <23e47140-f69f-451d-9154-29071130c11c@redhat.com>

>From: Ivan Vecera <ivecera@redhat.com>
>Sent: Friday, June 19, 2026 7:08 PM
>
>On 6/17/26 1:59 PM, Kubalewski, Arkadiusz wrote:
>>> From: Ivan Vecera <ivecera@redhat.com>
>>> Sent: Monday, June 15, 2026 2:00 PM
>>>
>>> On 6/11/26 2:09 PM, Jiri Pirko wrote:
>>>> Wed, Jun 10, 2026 at 05:45:46PM +0200, ivecera@redhat.com wrote:
>>>>> On 6/10/26 3:04 PM, Kubalewski, Arkadiusz wrote:
>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>> Sent: Tuesday, June 9, 2026 4:59 PM
>>>>>>>
>>>>>>> On 6/9/26 4:00 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>> From: Jiri Pirko <jiri@resnulli.us>
>>>>>>>>> Sent: Tuesday, June 9, 2026 10:51 AM
>>>>>>>>>
>>>>>>>>> Mon, Jun 08, 2026 at 07:03:46PM +0200,
>>>>>>>>> arkadiusz.kubalewski@intel.com
>>>>>>>>> wrote:
>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>> Sent: Monday, June 8, 2026 5:48 PM
>>>>>>>>>>>
>>>>>>>>>>> On 6/8/26 4:43 PM, Kubalewski, Arkadiusz wrote:
>>>>>>>>>>>>> From: Ivan Vecera <ivecera@redhat.com>
>>>>>>>>>>>>> Sent: Sunday, May 31, 2026 9:44 PM ...
>>>>>>>>>>>>>            -
>>>>>>>>>>>>>              name: gnss
>>>>>>>>>>>>>              doc: GNSS recovered clock
>>>>>>>>>>>>> +      -
>>>>>>>>>>>>> +        name: int-nco
>>>>>>>>>>>>> +        doc: |
>>>>>>>>>>>>> +          Device internal numerically controlled oscillator.
>>>>>>>>>>>>> +          When connected as a DPLL input, the DPLL enters NCO mode
>>>>>>>>>>>>> +          where the output frequency is adjusted by the host via
>>>>>>>>>>>>> +          the PTP clock interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Ivan!
>>>>>>>>>>>>
>>>>>>>>>>>> How would you control this in case of automatic mode dpll?
>>>>>>>>>>>> Automatic mode DPLL shall be controlled at the HW level; such a pin
>>>>>>>>>>>> breaks that rule and requires some driver magic to show it is
>>>>>>>>>>>> higher priority than the rest of the pins?
>>>>>>>>>>>
>>>>>>>>>>> The NCO pin can be connected only in manual mode. In other words,
>>>>>>>>>>> a DPLL in automatic mode cannot select the NCO pin (switch to NCO
>>>>>>>>>>> mode) on its own.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Being picky about DPLL_MODE for enabling a feature is not something
>>>>>>>>>> we can allow if it is not related to a HW limitation, is it?
>>>>>>>>>> Could you please elaborate on why it is not possible for AUTOMATIC
>>>>>>>>>> mode?
>>>>>>>>>
>>>>>>>>> In automatic mode, the pin selection logic is defined by prio. I
>>>>>>>>> can imagine that if the NCO pin has the highest prio of the
>>>>>>>>> available ones, it gets picked. It would be aligned 100% with
>>>>>>>>> automatic mode behaviour.
>>>>>>>>> Is there a real usecase for it?
>>>>>>>>>
>>>>>>>>> [..]
>>>>>>>>
>>>>>>>> This is not true. AUTOMATIC mode is a HW solution; the SW driver
>>>>>>>> ONLY configures priorities on the inputs, it does not manage the
>>>>>>>> active inputs.
>>>>>>>> This breaks that behavior: the SW driver would have to manually
>>>>>>>> override the AUTOMATIC mode to be fed from such an NCO pin, as it
>>>>>>>> doesn't exist on its priority list, so HW cannot pick or use it.
>>>>>>>
>>>>>>> Correct, AUTO mode is a hardware feature and it should not be
>>>>>>> emulated by a driver. If the hardware does not support it then the
>>>>>>> switching between input references should be done by userspace (by
>>>>>>> monitoring ffo, phase_offset, operstate).
>>>>>>>
>>>>>>
>>>>>> Yes, exactly, so for AUTOMATIC mode HW it will not be possible to
>>>>>> create such a pin, which means that the NCO pin would serve only a
>>>>>> MANUAL mode implementation.
>>>>>> Basically this is something we shall not allow to happen. The DPLL
>>>>>> API should be designed to cover the case where AUTO mode is able to
>>>>>> implement all features consistently.
>>>>>
>>>>> If you don't like the proposal from Jiri (NCO switch driven by NCO
>>>>> pin priority -> highest==enter_nco else leave_nco) then it could be
>>>>> possible to handle the switching by allowing the state 'connected'
>>>>> in AUTO mode for the NCO pin type. Then the implementation will be
>>>>> the same for both selection modes.
>>>>>
>>>>> The only difference would be that a user does not need to switch
>>>>> the device from AUTO to MANUAL mode.
>>>>>
>>>>>>>> The real use case is that any DPLL can switch the mode to this one
>>>>>>>> instead of implementing MANUAL mode just to use the feature with a
>>>>>>>> 'virtual' pin.
>>>>>>>
>>>>>>> I don't expect this... but it is up to a driver. I don't plan such
>>>>>>> functionality in zl3073x as the NCO pin does not expose prio_get()
>>>>>>> and prio_set() callbacks - so it is clear that this pin cannot be
>>>>>>> part of the automatic selection.
>>>>>>>
>>>>>>> Ivan
>>>>>>
>>>>>> There is a difference between particular HW and API capabilities:
>>>>>> with the proposed API we would disallow the possibility of such an
>>>>>> implementation for existing HW variants.
>>>>>>
>>>>>> DPLL NCO MODE would allow that but, as pointed out here by Ivan and
>>>>>> by Jiri in the other email, it would also require extra
>>>>>> implementation for some configuration - device level phase/ffo
>>>>>> handling.
>>>>>>
>>>>>> To summarize it all, I don't have a simple solution for it.
>>>>>>
>>>>>> The first thing that comes to my mind is to combine both approaches:
>>>>>> make it possible for AUTOMATIC mode to also set the "CONNECTED" state
>>>>>> on certain kinds of "OVERRIDE" pins, where it could be determined by
>>>>>> the type of pin, and embed that logic into the DPLL subsystem.
>>>>>
>>>>> The possible states for particular pins are now handled at the
>>>>> driver level, so the driver decides whether the requested state is
>>>>> valid. So it could be easy to implement this.
>>>>>
>>>>> For auto mode allowed states:
>>>>> - input references: selectable / disconnected
>>>>> - nco pin: connected / disconnected
>>>>>
>>>>>> Basically, if a driver registers such an NCO pin it would always be
>>>>>> selected manually, and in such a case all the other pins go to the
>>>>>> disconnected state while the DPLL mode is also "OVERRIDE" or
>>>>>> something like it.
>>>>>
>>>>> I would leave this decision at the driver level... Imagine
>>>>> potential HW that would allow switching to NCO mode if there is no
>>>>> valid input reference.
>>>>>
>>>>> Example:
>>>>>
>>>>> REF0 (prio 0) -> +------+ -> OUT0
>>>>> REF1 (prio 1) -> | DPLL | -> ...
>>>>> NCO  (prio 2) -> +------+ -> OUTn
>>>>>
>>>>> Such HW would prefer REF0 or REF1 and lock to one of them if they are
>>>>> qualified. But if they are NOT, then it switches to NCO mode.
>>
>> Now you said yourself "NCO mode" ... I agree that it would be a mode in
>> that case, where instead of running on the regular/built-in XO the dpll
>> would run on the NCO and the user could select it, and this would be an
>> addition to the regular behavior.
>>
>> I also agree that the pin approach might be better/easier to use:
>> assuming a frequency offset for all the outputs a given dpll drives, it
>> makes more sense to have it configurable on the input side.
>
>+1
>
>>>>>
>>>>> In this situation the relevant driver would allow configuring the
>>>>> priority and state 'selectable' for this NCO pin.
>>>>>
>>>>>> Perhaps the pin type could include OVERRIDE in its name to make it
>>>>>> less confusing and needs some extra documentation.
>>>>>>
>>>>>> Thoughts?
>>>>> I think _INT_ is ok. In the case of TYPE_INT_OSCILLATOR it is also
>>>>> obvious that it is not a standard input reference.
>>>>>
>>>>> Jiri, Vadim, Arek, thoughts?
>>>>
>>>> I agree with you, the driver should have the flexibility to implement
>>>> this according to its/the HW's needs/capabilities. If it implements
>>>> prio selection in AUTO mode, let it have it. If it implements manual
>>>> NCO pin selection in AUTO mode using connected/disconnected override,
>>>> let it have it.
>>
>> I don't know of any 'current' HW that is capable of using AUTO mode as
>> part of HW-based priority source selection together with such an NCO
>> input.
>> But as already explained above, this is a special mode of the regular
>> XO, which allows configuring the DPLL's output frequency offset.
>
>Let's keep this available for potential future HW. I can imagine a
>situation where a user will prefer an automatic switch to NCO mode
>if there is no qualified input reference - automatic switch means
>that HW will support this (not emulated by the driver).
>
>>>>
>>>> Moreover, I actually like the "override" capability for pins in AUTO
>>>> mode in general. It may be handy for other usecases as well.
>>>>
>>> Arek? Vadim?
>>>
>>> Thanks,
>>> Ivan
>>
>> Agree, the 'override' capability of a pin would be the way to go for
>> this and other similar future cases.
>>
>> I believe a single approach on this would be best. I mean, if AUTO mode
>> needs a capability to switch from regular behavior to 'OVERRIDE', and
>> 'OVERRIDE' is the only pin capability that allows such behavior for AUTO
>> mode, then a similar approach should be used for MANUAL mode, to let
>> userspace know that such a pin is always available to set "CONNECTED"
>> and to make the userspace implementation consistent on enabling it no
>> matter whether the dpll is in AUTO or MANUAL mode.
>
>Proposal:
>1) new pin capability
>    - name: state-connected-override
>    - doc: pin state can be changed to connected in any DPLL mode
>
>2) new NCO pin type to switch the DPLL to NCO mode when connected
>
>3) automatic-only DPLL
>    - should expose NCO pin with state-connected-override capability
>
>4) manual-only DPLL
>   - does not need to expose NCO pin with state-connected-override cap
>

It makes sense only if such an AUTO mode DPLL can have both OVERRIDE and
non-OVERRIDE possibilities. So a future HW design could implement it
without OVERRIDE, and current AUTO mode HW could use it via OVERRIDE.

Having it this way requires a different user space implementation,
but I don't think we will be able to simplify it anyhow.

Thank you!
Arkadiusz

>5) dual-mode DPLL (supporting mode switching)
>   - if it exposes NCO pin with the override cap then it has to support
>     switching to NCO mode directly from AUTO mode
>   - if it does not expose NCO pin with the override cap then a user MUST
>     switch the DPLL mode from AUTO to MANUAL to be able to make NCO
>     pin connected to the DPLL
>
>Vadim, Jiri, Arek - thoughts?
>
>Thanks,
>Ivan
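
The five-point proposal above can be read as a single check on state
requests. The following is only a minimal sketch under that reading; it is
not taken from any posted patch, and every name in it (demo_dpll_mode,
demo_pin, cap_connected_override, demo_can_connect) is hypothetical and
does not exist in the DPLL subsystem.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the DPLL selection modes discussed above. */
enum demo_dpll_mode {
	DEMO_MODE_MANUAL,
	DEMO_MODE_AUTOMATIC,
};

/* Hypothetical pin descriptor carrying only the proposed capability. */
struct demo_pin {
	/* models the proposed "state-connected-override" capability */
	bool cap_connected_override;
};

/*
 * Decide whether user space may set a pin to "connected".
 * Assumption: in MANUAL mode any pin may be connected; in AUTOMATIC mode
 * only pins that advertise the override capability may be.
 */
static bool demo_can_connect(enum demo_dpll_mode mode,
			     const struct demo_pin *pin)
{
	if (mode == DEMO_MODE_MANUAL)
		return true;

	return pin->cap_connected_override;
}
```

Under this reading, a dual-mode DPLL whose NCO pin lacks the capability
would reject "connected" in AUTO mode, so the user has to switch to MANUAL
first, which matches point 5 of the proposal.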


^ permalink raw reply

* [PATCH] net: stmmac: fix missed le32_to_cpu()
From: Ben Dooks @ 2026-06-22 14:37 UTC (permalink / raw)
  To: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Maxime Coquelin, Alexandre Torgue,
	Russell King (Oracle), Ben Dooks, Maxime Chevallier, netdev,
	linux-stm32, linux-arm-kernel, linux-kernel

The print in ndesc_display_ring() passes des2 and des3 to
pr_info() without first converting them from little-endian
to CPU byte order.

Fix the (prototype) sparse warnings by using le32_to_cpu():
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 6 (different base types)
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    expected unsigned int
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    got restricted __le32 [usertype] des2
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17: warning: incorrect type in argument 7 (different base types)
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    expected unsigned int
drivers/net/ethernet/stmicro/stmmac/norm_desc.c:258:17:    got restricted __le32 [usertype] des3

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
---
 drivers/net/ethernet/stmicro/stmmac/norm_desc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
index c4b613564f87..74c9b7b1fe8f 100644
--- a/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/norm_desc.c
@@ -258,7 +258,7 @@ static void ndesc_display_ring(void *head, unsigned int size, bool rx,
 		pr_info("%03d [%pad]: 0x%x 0x%x 0x%x 0x%x",
 			i, &dma_addr,
 			(unsigned int)x, (unsigned int)(x >> 32),
-			p->des2, p->des3);
+			le32_to_cpu(p->des2), le32_to_cpu(p->des3));
 		p++;
 	}
 	pr_info("\n");
-- 
2.37.2.352.g3c44437643
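
For readers unfamiliar with the conversion, here is a minimal user-space
sketch of what a little-endian to CPU conversion does. It is not part of
the patch: demo_le32_to_cpu() is a hypothetical stand-in written for
illustration, not the kernel's le32_to_cpu() implementation, and it works
on a raw byte array rather than a __le32 field.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for le32_to_cpu(): assemble a 32-bit value from
 * four bytes stored in little-endian order. Building the value with
 * shifts gives the same result on little- and big-endian hosts.
 */
static uint32_t demo_le32_to_cpu(const uint8_t raw[4])
{
	return (uint32_t)raw[0] |
	       ((uint32_t)raw[1] << 8) |
	       ((uint32_t)raw[2] << 16) |
	       ((uint32_t)raw[3] << 24);
}
```

Printing the raw descriptor word without such a conversion would show
byte-swapped values on a big-endian host, which is the case the sparse
annotation is designed to catch.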


^ permalink raw reply related

* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Toke Høiland-Jørgensen @ 2026-06-22 14:36 UTC (permalink / raw)
  To: Ralf Lici
  Cc: netdev, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	Beniamino Galvani
In-Reply-To: <20260622133452.432257-1-ralf@mandelbit.com>

[ skipping some of the netfilter-related context until we hear from the
netfilter devs ]

>> > My second concern is that the SIIT boundary would be a property of
>> > rule and hook placement. That gives flexibility, but it also means the
>> > translation point has to be constrained and documented very carefully
>> > to avoid ambiguous TTL/Hop Limit, PMTU/ICMP, and hook-order behavior.
>> > For this use case I would rather have the route that matches the
>> > translation prefix also be the object that says: leave this family
>> > here and continue in the other one.
>>
>> Yeah, with flexibility comes the ability to shoot yourself in the foot.
>> But that's not really different from much of the other functionality we
>> have in the kernel today, is it? For netfilter in particular it's
>> certainly possible to configure a broken NAT configuration that leads to
>> packet drops (or just invalid packets being sent out on a network
>> device).
>>
>
> True, misconfiguration is always possible and that alone is not an
> argument against the netfilter model. But what do we actually gain in
> capability from that flexibility? I agree on the UX argument (an admin
> would look in nft first), but in terms of what the feature can do, I
> can't yet see what the nft model unlocks. More on this just below.
>
>> > After looking at the available kernel mechanisms again, I think the
>> > better model is probably LWT: routes carry an ipxlat encap referencing a
>> > named translator domain configured over netlink. That should represent
>> > the stateless, prefix-based and symmetric nature of ipxlat.
>>
>> I think this description actually hits the nail on the head: What are we
>> implementing here? Is it a product feature, or a building block for one?
>> The properties you mention wrt consistency, symmetry etc are properties
>> of the high-level feature (which is also generally the level things are
>> specified in RFCs). Whereas other packet mangling features in the kernel
>> are more in the "building block" category, where it's possible to
>> configure things to implement a particular feature set / compliance with
>> a particular RFC, but it's also possible to do things that are outside
>> of that.
>>
>> I think this relates to the "mechanism, not policy" approach that we
>> take to most things in the kernel: implement the building blocks to do
>> something in the most general way we can, and then leave it up to
>> userspace to configure things in a way that results in a consistent
>> high-level system behaviour.
>>
>
> That's a good point, and I agree that we should not bake a high-level
> product policy into the kernel if what we need is a reusable mechanism
> (the LWT idea was my attempt at exactly that). What I am still trying to
> understand is whether there is a useful generic trigger for stateless
> cross-family translation beyond the route/prefix/policy-routing cases.
>
> Routes and policy routing already cover the selectors I can make
> coherent for a stateless, per-packet translator: destination/source
> prefix, iif/oif/VRF, mark, TOS/DSCP, and so on. nft can of course match
> much more than that, but the additional selectors that would materially
> change the translation decision seem to be selectors such as L4 fields,
> payload state, or conntrack state. Those are exactly the selectors I am
> struggling to make correct for a stateless translator:
>
> - non-first fragments carry no L4 header at all, yet the translator must
>   rewrite every fragment (an nft ... tcp dport trigger cannot fire on
>   them);
>
> - ICMP errors must be translated too, but the flow identity lives in the
>   quoted inner header (reversed), not in anything an L4/ct match on the
>   error packet can see and there is no conntrack to associate them,
>   since this is stateless.

True in principle, but if (say) you deploy this on a network that is
configured so it will never fragment packets, this won't be an issue in
practice.

I.e., you're quite right that arbitrary matching criteria cannot be
guaranteed to result in coherent translation. But I think that goes into
the "use it wrong, get wrong results" bin. E.g., if you match on
something that results in only a subset of the packets of a flow being
translated, well, only that subset of the packets will make it to the
destination. The SIIT translator itself should not try to fix this, but
neither should it prevent it; that's what I mean by "building block" -
it's up to the builder using the blocks to make sure the building
doesn't collapse, that's out of scope for the block manufacturer to
worry about :)

> So an L4-conditional trigger does not look like a good primitive for
> correct stateless SIIT unless the action also defragments/refragments or
> uses conntrack-like state. Those may be valid mechanisms, but they move
> the design away from the stateless per-packet SIIT boundary this RFC is
> trying to model.
>
> So my first question is: is there a useful nft configuration this should
> enable that is not naturally expressible as route selection, while still
> remaining stateless SIIT rather than a NAT64-like stateful feature?
> Maybe there is a real use case there, but I cannot construct one yet.

So the poster child for "match on arbitrary criteria" is of course BPF.
You can write BPF programs that match on arbitrary parts of the packet
header, custom encapsulation headers, or even on out of band things like
system state, phase of the moon, or what have you. And we should
certainly allow a BPF program to make the decision on whether to perform
the SIIT translation.

Which... maybe is an argument to keep it as a device like you do in this
RFC series? Redirecting to a device is trivially supported from TC-BPF,
which also makes it possible to use the translation mechanism without
going through the routing subsystem at all, saving a bit of overhead.
Whereas making it a route action ties it very closely to the routing
subsystem.

WDYT?

-Toke
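
To make the fragment point quoted above concrete: whether an IPv4 packet
carries its L4 header can be read from the fragment offset field alone.
The following is a minimal user-space sketch, not code from the RFC
series; demo_ipv4_has_l4_header() is a hypothetical helper, it handles
IPv4 only, and it assumes the buffer starts at the IPv4 header.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if the IPv4 packet at hdr carries its L4 header, i.e. it is
 * either unfragmented or the first fragment (fragment offset == 0).
 * Bytes 6-7 of the IPv4 header hold 3 flag bits followed by the 13-bit
 * fragment offset, in network byte order.
 */
static bool demo_ipv4_has_l4_header(const uint8_t *hdr, size_t len)
{
	uint16_t frag;

	/* Need at least a minimal 20-byte header with version 4. */
	if (len < 20 || (hdr[0] >> 4) != 4)
		return false;

	frag = (uint16_t)((hdr[6] << 8) | hdr[7]);

	/* Mask off the flag bits; a non-zero offset means a later fragment. */
	return (frag & 0x1FFF) == 0;
}
```

A rule that matches on TCP or UDP ports can only ever fire for packets
where this returns true, which is why an L4-conditional trigger would see
the first fragment and miss the rest.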

^ permalink raw reply

* Re: [PATCH net 0/2] nfc: llcp: fix OOB reads and integer bugs in TLV parsers
From: Muhammad Bilal @ 2026-06-22 13:58 UTC (permalink / raw)
  To: david; +Cc: netdev, linux-kernel, oe-linux-nfc, horms, stable
In-Reply-To: <78e283f0-1493-4d72-92fe-e6444458fb91@ixit.cz>

On Sun, Jun 21, 2026 at 09:52 PM David Heidelberg wrote:
> could I ask for the patches to be rebased against for-next?

Hi David,

Rebased onto nfc/for-next.

v2 is now sent and available here:
https://lore.kernel.org/netdev/20260622131802.239035-1-meatuni001@gmail.com/

Patch 2/2 was dropped because the equivalent fixes for nfc_llcp_recv_snl()
were already merged (ed85d4cbbfaa), so only llcp_commands.c remains.

Thanks,
Muhammad Bilal

^ permalink raw reply

* Re: [Intel-wired-lan] [PATCH iwl-net] ice: clear the default forwarding VSI rule when releasing a VSI
From: Marcin Szycik @ 2026-06-22 13:52 UTC (permalink / raw)
  To: Petr Oros, netdev
  Cc: Przemek Kitszel, Eric Dumazet, linux-kernel, Andrew Lunn,
	Tony Nguyen, Michal Swiatkowski, Jacob Keller, Jakub Kicinski,
	Paolo Abeni, David S. Miller, intel-wired-lan
In-Reply-To: <20260622081030.2312129-1-poros@redhat.com>



On 22/06/2026 10:10, Petr Oros wrote:
> When a VSI is configured as the switch's default forwarding VSI
> (ICE_SW_LKUP_DFLT) and is then torn down, the rule is left behind in
> the switch. ice_vsi_release() no longer removes it, and the SR-IOV VF
> free path (ice_free_vfs() -> ice_free_vf_res() -> ice_vf_vsi_release()
> -> ice_vsi_release()) does not disable promiscuous mode either, which
> only happens on VF reset in ice_vf_clear_all_promisc_modes().
> 
> A trusted VF that enters unicast promiscuous mode becomes the default
> forwarding VSI (this is the default mode, when the PF does not have VF
> true-promiscuous mode enabled). If the VFs are then destroyed without
> the VF first leaving promiscuous mode, the ICE_SW_LKUP_DFLT rule for
> the now-freed VSI is leaked. When VFs are recreated, a VSI reuses the
> freed hw_vsi_id. If it is assigned a different VSI handle than the
> leaked rule holds, ice_set_dflt_vsi() does not recognize it as
> already-default, and ice_add_update_vsi_list() folds the dangling
> (freed) handle into a VSI list, which the firmware rejects. The VSI
> handle assigned on re-creation varies, so the failure is intermittent
> rather than every cycle.
> 
> Reproduce by repeatedly running the cycle below on the two ports of the
> same card, where $VF0 and $VF1 are the netdevs of vf 15 once they
> appear. The VF must be brought up so iavf actually pushes the unicast
> promiscuous request, and the rule must settle before the VFs are torn
> down again:
> 
>   echo 16 > /sys/class/net/$PF0/device/sriov_numvfs
>   echo 16 > /sys/class/net/$PF1/device/sriov_numvfs
>   ip link set $PF0 vf 15 trust on
>   ip link set $PF1 vf 15 trust on
>   ip link set $VF0 up
>   ip link set $VF1 up
>   ip link set $VF0 promisc on
>   ip link set $VF1 promisc on
>   sleep 1
>   echo 0 > /sys/class/net/$PF0/device/sriov_numvfs
>   echo 0 > /sys/class/net/$PF1/device/sriov_numvfs
> 
> Within a few cycles the ice PF and iavf VF log:
> 
>   Failed to set VSI 25 as the default forwarding VSI, error -22
>   Turning on/off promiscuous mode for VF 63 failed, error: -22
>   PF returned error -53 (IAVF_ERR_ADMIN_QUEUE_ERROR) to our request 14
> 
> This cleanup used to live in ice_vsi_release() but was dropped by the
> referenced refactor. Restore it. Clear the default forwarding VSI rule
> in ice_vsi_release() when this VSI owns it, which covers every teardown
> path.
> 
> Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions")
> Signed-off-by: Petr Oros <poros@redhat.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_lib.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
> index 2717cc31bff8fe..408464434506ef 100644
> --- a/drivers/net/ethernet/intel/ice/ice_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_lib.c
> @@ -2872,6 +2872,9 @@ int ice_vsi_release(struct ice_vsi *vsi)
>  		return -ENODEV;
>  	pf = vsi->back;
>  
> +	if (ice_is_vsi_dflt_vsi(vsi))
> +		ice_clear_dflt_vsi(vsi);

In the referenced commit, the chunk of code that contained these two missing
lines was moved to ice_vsi_decfg(). That also sounds like a good place for
them, and it is called from ice_vsi_release(). Are you sure we should place
them directly in ice_vsi_release() instead?

Thanks,
Marcin

> +
>  	if (test_bit(ICE_FLAG_RSS_ENA, pf->flags))
>  		ice_rss_clean(vsi);
>  


^ permalink raw reply

* Re: [PATCH net 3/3] net/mlx5e: Reject unsupported CB Shaper TSA in ETS validation
From: Pavan Chebbi @ 2026-06-22 13:50 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netdev, Paolo Abeni, Alexei Lazar, Carolina Jubran,
	Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
	Saeed Mahameed, Gal Pressman
In-Reply-To: <20260622112925.624795-4-tariqt@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 622 bytes --]

On Mon, Jun 22, 2026 at 5:02 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Alexei Lazar <alazar@nvidia.com>
>
> Credit Based (CB) TSA is not supported by the mlx5 driver, so reject
> any configurations that specify it.
>
> Fixes: 08fb1dacdd76 ("net/mlx5e: Support DCBNL IEEE ETS")
> Signed-off-by: Alexei Lazar <alazar@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]

^ permalink raw reply

* Re: [PATCH net 2/3] net/mlx5e: Validate bandwidth for non-ETS traffic classes
From: Pavan Chebbi @ 2026-06-22 13:49 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netdev, Paolo Abeni, Alexei Lazar, Carolina Jubran,
	Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
	Saeed Mahameed, Gal Pressman
In-Reply-To: <20260622112925.624795-3-tariqt@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 720 bytes --]

On Mon, Jun 22, 2026 at 5:00 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Alexei Lazar <alazar@nvidia.com>
>
> The IEEE 802.1Qaz standard defines that bandwidth allocation percentages
> only apply to ETS traffic classes.
>
> Reject ETS configurations that specify non-zero bandwidth for non-ETS
> traffic classes.
>
> Fixes: 08fb1dacdd76 ("net/mlx5e: Support DCBNL IEEE ETS")
> Signed-off-by: Alexei Lazar <alazar@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]

^ permalink raw reply

* Re: [PATCH] net: liquidio: fix BAR resource leak on PF number failure
From: Simon Horman @ 2026-06-22 13:48 UTC (permalink / raw)
  To: Haoxiang Li
  Cc: andrew+netdev, davem, kuba, pabeni, felix.manlunas,
	ricardo.farrington, netdev, linux-kernel, stable
In-Reply-To: <20260620083728.2722895-1-haoxiang_li2024@163.com>

On Sat, Jun 20, 2026 at 04:37:28PM +0800, Haoxiang Li wrote:
> If cn23xx_get_pf_num() fails, the function returns without
> unmapping either BAR. Unmap both BARs before returning from
> the error path.

I think it would be worth noting how this problem was found,
and if a publicly available tool was used, naming it.

> 
> Fixes: 0c45d7fe12c7 ("liquidio: fix use of pf in pass-through mode in a virtual machine")
> Cc: stable@vger.kernel.org
> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>

There is an AI-generated review of this patch available on sashiko.dev.
I don't think the issue raised there directly affects this patch.
But you may want to consider looking into that in the context of
a separate follow-up.

> ---
>  drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> index 75f22f74774c..a1548ca81ecd 100644
> --- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> +++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
> @@ -1167,8 +1167,11 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
>  		return 1;
>  	}
>  
> -	if (cn23xx_get_pf_num(oct) != 0)
> +	if (cn23xx_get_pf_num(oct) != 0) {
> +		octeon_unmap_pci_barx(oct, 0);
> +		octeon_unmap_pci_barx(oct, 1);
>  		return 1;
> +	}
>  
>  	if (cn23xx_sriov_config(oct)) {
>  		octeon_unmap_pci_barx(oct, 0);

I think this would be best handled by introducing an idiomatic goto unwind
ladder to this function. Something like this (compile tested only!):

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
index 75f22f74774c..73362b92d0fd 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_pf_device.c
@@ -1163,18 +1163,14 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
 	if (octeon_map_pci_barx(oct, 1, MAX_BAR1_IOREMAP_SIZE)) {
 		dev_err(&oct->pci_dev->dev, "%s CN23XX BAR1 map failed\n",
 			__func__);
-		octeon_unmap_pci_barx(oct, 0);
-		return 1;
+		goto err_free_barx_0;
 	}
 
 	if (cn23xx_get_pf_num(oct) != 0)
-		return 1;
+		goto err_free_barx_1;
 
-	if (cn23xx_sriov_config(oct)) {
-		octeon_unmap_pci_barx(oct, 0);
-		octeon_unmap_pci_barx(oct, 1);
-		return 1;
-	}
+	if (cn23xx_sriov_config(oct))
+		goto err_free_barx_1;
 
 	octeon_write_csr64(oct, CN23XX_SLI_MAC_CREDIT_CNT, 0x3F802080802080ULL);
 
@@ -1205,6 +1201,12 @@ int setup_cn23xx_octeon_pf_device(struct octeon_device *oct)
 	oct->coproc_clock_rate = 1000000ULL * cn23xx_coprocessor_clock(oct);
 
 	return 0;
+
+err_free_barx_1:
+	octeon_unmap_pci_barx(oct, 1);
+err_free_barx_0:
+	octeon_unmap_pci_barx(oct, 0);
+	return 1;
 }
 EXPORT_SYMBOL_GPL(setup_cn23xx_octeon_pf_device);
 
-- 
pw-bot: changes-requested

^ permalink raw reply related

* Re: [PATCH net 1/3] net/mlx5e: Report zero bandwidth for non-ETS traffic classes
From: Pavan Chebbi @ 2026-06-22 13:48 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	netdev, Paolo Abeni, Alexei Lazar, Carolina Jubran,
	Leon Romanovsky, linux-kernel, linux-rdma, Mark Bloch,
	Saeed Mahameed, Gal Pressman
In-Reply-To: <20260622112925.624795-2-tariqt@nvidia.com>

[-- Attachment #1: Type: text/plain, Size: 1023 bytes --]

On Mon, Jun 22, 2026 at 5:00 PM Tariq Toukan <tariqt@nvidia.com> wrote:
>
> From: Alexei Lazar <alazar@nvidia.com>
>
> The IEEE 802.1Qaz standard defines that bandwidth allocation percentages
> only apply to Enhanced Transmission Selection (ETS) traffic classes.
> For STRICT and VENDOR transmission selection algorithms, bandwidth
> percentage values are not applicable.
>
> Currently, due to a hardware limitation, the get operation reports a
> bandwidth of 100 for all traffic classes, regardless of their TSA
> type.
>
> Fix this by reporting 0 for non-ETS traffic classes.
>
> Fixes: 820c2c5e773d ("net/mlx5e: Read ETS settings directly from firmware")
> Signed-off-by: Alexei Lazar <alazar@nvidia.com>
> Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>

LGTM. Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 5469 bytes --]

^ permalink raw reply

* Re: [RFC net-next 08/15] ipxlat: add translation engine and dispatch core
From: Ralf Lici @ 2026-06-22 13:34 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: netdev, Daniel Gröber, Antonio Quartulli, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel, Pablo Neira Ayuso, Florian Westphal, Phil Sutter,
	Beniamino Galvani
In-Reply-To: <87tsr4gcag.fsf@toke.dk>

On Mon, 15 Jun 2026 15:31:51 +0200, Toke Høiland-Jørgensen <toke@kernel.org> wrote:
> >> >> I think a better model is to treat the device as basically a loopback
> >> >> device that translates packets before looping them back (so when they
> >> >> come back they appear to be coming from that device).
> >> >>
> >> >> Any reason why that wouldn't work?
> >> >>
> >> >
> >> > That's indeed the intended model for the ipxlat netdevice: route packets
> >> > to it, translate them, then loop them back into the stack as packets
> >> > received from that same device. That seemed like the simplest model and
> >> > the one that exposes the translation point most clearly.
> >>
> >> Right. I think this could be made a bit more explicit in the
> >> documentation as well, since it's a bit of an unusual model.
> >>
> >> And, well, taking a step back: is it really the right model? Regular NAT
> >> lives in netfilter, why can't this be a netfilter module as well? Seems
> >> to me you could have something like:
> >>
> >> table ip xlat4 {
> >> 	chain postrouting {
> >> 		type nat hook postrouting priority srcnat; policy accept;
> >> 		ip daddr 0.0.0.0/0 oifname "eth0" xlat to 64:ff9b::/96
> >> 	}
> >> }
> >> table ip6 xlat6 {
> >> 	chain prerouting {
> >> 		type nat hook prerouting priority dstnat; policy accept;
> >> 		ip6 saddr 64:ff9b::/96 iifname "eth0" xlat from 64:ff9b::/96
> >> 	}
> >> }
> >>
> >> and that would provide the functionality without having to implement a
> >> new interface type and the associated multiple traversals through the
> >> stack? Did you consider this as an alternative to the new device type?
> >>
> >
> > We did consider netfilter, and your example is syntactically attractive,
> > but I am no longer convinced it is the cleanest model for SIIT.
> >
> > An nft expression cannot simply rewrite ETH_P_IP <-> ETH_P_IPV6 and
> > return ACCEPT as if this were normal NAT because the current hook
> > invocation, dst, and conntrack-related state were established for the
> > packet as it entered that hook. A cross-family translator would need to
> > consume the skb, clear or rebuild route and ct metadata as appropriate,
> > do an other-family route lookup, and resume at a well-defined point in
> > that family. That seems possible, but it would be a new stateless
> > cross-family action, not just a new mode of the existing nft nat
> > expression (which is built around nf_nat_setup_info and assumes the
> > packet's L3 family does not change AFAICT).
>
> Right, I did not expect it would be possible to actually share code with
> the existing NAT functionality, but conceptually they're similar. I.e.,
> if I was an admin trying to figure out if my system supported SIIT
> translation, my chain of thought would be something along the lines of:
> "SIIT is a variant of NAT, and I know NAT is a long-standing feature of
> netfilter, so I wonder if SIIT exists there as well".
>
> Adding the netfilter folks to Cc to try to get their attention and an
> opinion on this :)
>

That's the right move, it would be great to have their opinion on these
architectural questions.

For the netfilter folks just added: the context is a stateless
IPv6<>IPv4 SIIT translator (RFC 7915). The open design question is
whether the stateless cross-family translation should live as an nft
action, or as a route/LWT action (ILA / seg6_local lineage).

> > My second concern is that the SIIT boundary would be a property of
> > rule and hook placement. That gives flexibility, but it also means the
> > translation point has to be constrained and documented very carefully
> > to avoid ambiguous TTL/Hop Limit, PMTU/ICMP, and hook-order behavior.
> > For this use case I would rather have the route that matches the
> > translation prefix also be the object that says: leave this family
> > here and continue in the other one.
>
> Yeah, with flexibility comes the ability to shoot yourself in the foot.
> But that's not really different from much of the other functionality we
> have in the kernel today, is it? For netfilter in particular it's
> certainly possible to configure a broken NAT configuration that leads to
> packet drops (or just invalid packets being sent out on a network
> device).
>

True, misconfiguration is always possible and that alone is not an
argument against the netfilter model. But what do we actually gain in
capability from that flexibility? I agree on the UX argument (an admin
would look in nft first), but in terms of what the feature can do, I
can't yet see what the nft model unlocks. More on this just below.

> > After looking at the available kernel mechanisms again, I think the
> > better model is probably LWT: routes carry an ipxlat encap referencing a
> > named translator domain configured over netlink. That should represent
> > the stateless, prefix-based and symmetric nature of ipxlat.
>
> I think this description actually hits the nail on the head: What are we
> implementing here? Is it a product feature, or a building block for one?
> The properties you mention wrt consistency, symmetry etc are properties
> of the high-level feature (which is also generally the level things are
> specified in RFCs). Whereas other packet mangling features in the kernel
> are more in the "building block" category, where it's possible to
> configure things to implement a particular feature set / compliance with
> a particular RFC, but it's also possible to do things that are outside
> of that.
>
> I think this relates to the "mechanism, not policy" approach that we
> take to most things in the kernel: implement the building blocks to do
> something in the most general way we can, and then leave it up to
> userspace to configure things in a way that results in a consistent
> high-level system behaviour.
>

That's a good point, and I agree that we should not bake a high-level
product policy into the kernel if what we need is a reusable mechanism
(the LWT idea was my attempt at exactly that). What I am still trying to
understand is whether there is a useful generic trigger for stateless
cross-family translation beyond the route/prefix/policy-routing cases.

Routes and policy routing already cover the selectors I can make
coherent for a stateless, per-packet translator: destination/source
prefix, iif/oif/VRF, mark, TOS/DSCP, and so on. nft can of course match
much more than that, but the additional selectors that would materially
change the translation decision seem to be selectors such as L4 fields,
payload state, or conntrack state. Those are exactly the selectors I am
struggling to make correct for a stateless translator:

- non-first fragments carry no L4 header at all, yet the translator must
  rewrite every fragment (an nft ... tcp dport trigger cannot fire on
  them);

- ICMP errors must be translated too, but the flow identity lives in the
  quoted inner header (reversed), not in anything an L4/ct match on the
  error packet can see, and there is no conntrack entry to associate
  them with, since this is stateless.

So an L4-conditional trigger does not look like a good primitive for
correct stateless SIIT unless the action also defragments/refragments or
uses conntrack-like state. Those may be valid mechanisms, but they move
the design away from the stateless per-packet SIIT boundary this RFC is
trying to model.
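
To make the fragment case concrete, here is a minimal userspace sketch
(not code from this series; ipv4_has_l4_header() and the mask name are
hypothetical). It only assumes the standard IPv4 header layout, where
bytes 6-7 hold the flags and the 13-bit fragment offset:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: low 13 bits of the IPv4 flags/fragment-offset word. */
#define IPV4_FRAG_OFFSET_MASK 0x1fffu

/*
 * Return true when the IPv4 packet starting at hdr can carry an L4 header,
 * i.e. it is unfragmented or the first fragment (fragment offset 0).
 * Non-first fragments return false: an L4 match has nothing to look at,
 * yet a stateless translator still has to rewrite their L3 header.
 */
static bool ipv4_has_l4_header(const uint8_t *hdr, size_t len)
{
	uint16_t frag_word;

	/* Need at least a minimal 20-byte header with version 4. */
	if (len < 20 || (hdr[0] >> 4) != 4)
		return false;

	/* Bytes 6-7, big endian: 3 flag bits followed by the 13-bit offset. */
	frag_word = (uint16_t)((hdr[6] << 8) | hdr[7]);

	return (frag_word & IPV4_FRAG_OFFSET_MASK) == 0;
}
```

A first fragment (MF set, offset 0) passes this check, while any later
fragment does not, so a trigger keyed on "tcp dport" would silently miss
the later fragments of a flow it is supposed to translate.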

So my first question is: is there a useful nft configuration this should
enable that is not naturally expressible as route selection, while still
remaining stateless SIIT rather than a NAT64-like stateful feature?
Maybe there is a real use case there, but I cannot construct one yet.

> That being said:
>
> > Very roughly, userspace could look like:
> >
> >     ip xlat add siit0 prefix6 64:ff9b::/96
> >     ip route add ... encap ipxlat id siit0
> >     ip -6 route add ... encap ipxlat id siit0
> >
> > There are some useful precedents for this: ILA is stateless address
> > translation as LWT, seg6_local already has cross-family LWT actions, and
> > ioam6 has a similar split between separately configured objects and
> > route attachments.
> >
> > The invariant I would like v2 to follow is that the original-family
> > route lookup selects translation as its terminal route action. The
> > translated skb then gets a fresh lookup in the other family. From that
> > point on, TTL/Hop Limit where applicable, PMTU, ICMP errors, and
> > netfilter visibility belong to the translated family.
> >
> > So I think your question addresses the core design issue in this RFC. My
> > current preference is to rework the next version around an LWT/domain
> > model instead of the virtual netdevice model, unless prototyping shows a
> > fundamental problem with that approach.
> >
> > Does that model make sense to you?
>
> I did consider this as well before suggesting netfilter as the right
> place to hook things, and I do think the route object model has some
> appeal. I agree it's a better model than the magical loopback interface,
> certainly.
>
> I think in the end this comes down to whether flexibility in how to use
> this translation mechanism is a bug or a feature, as outlined above. I'm
> leaning towards "feature", but could probably be persuaded otherwise :)
>

I agree this is the tradeoff to settle now. The reason I suggested LWT
is that making the family-A route lookup the handoff point gives the
TTL/PMTU/ICMP-and-reroute semantics a natural place to live.

So my other question, where the netfilter maintainers' input would be
very useful, is what semantics would be acceptable for changing address
family inside a netfilter hook. Concretely:

- What verdict should such an expression return? My assumption is it
  has to consume the skb and reinject into the other family (NF_STOLEN
  + reinject), since ACCEPT would resume traversal in a family whose
  dst/chain no longer apply.

- What becomes of the original-family conntrack entry, which is now
  orphaned (and is pure overhead in the stateless case)?

- Where do TTL/Hop-Limit, PMTU/ICMP generation, and netfilter
  visibility end up belonging after the family changes?

All that said, the netfilter model has its own strengths, and once we
settle the architectural questions it could turn into a very clean
design. Curious to hear your thoughts on the points above.

-- 
Ralf Lici
Mandelbit Srl

^ permalink raw reply

* Re: [PATCH] netdevsim: fix use-after-free in __nsim_dev_port_del
From: Simon Horman @ 2026-06-22 13:35 UTC (permalink / raw)
  To: hrushirajg23
  Cc: Simon Horman, kuba, andrew+netdev, davem, edumazet, pabeni, jiri,
	netdev, linux-kernel, syzbot+6c25f4750230faf70be9
In-Reply-To: <20260620064937.16301-1-hrushirajg23@gmail.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
Subject: netdevsim: fix use-after-free in __nsim_dev_port_del

The patch removes the per-port debugfs directory in nsim_destroy()
before free_netdev() so that debugfs files referencing fields inside
the netdevsim struct cannot be reached after the slab is released.

Link: https://syzkaller.appspot.com/bug?extid=6c25f4750230faf70be9

> diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
> index 27e5f109f933..08136e7990cb 100644
> --- a/drivers/net/netdevsim/netdev.c
> +++ b/drivers/net/netdevsim/netdev.c
> @@ -1214,6 +1214,13 @@ void nsim_destroy(struct netdevsim *ns)
>  		ns->page = NULL;
>  	}
>  
> +	/*
> +	 * Remove per-port debugfs files before free_netdev() releases the
> +	 * netdevsim struct to prevent use-after-free in concurrent readers.
> +	 */
> +	debugfs_remove_recursive(ns->nsim_dev_port->ddir);
> +	ns->nsim_dev_port->ddir = NULL;
> +
>  	free_netdev(dev);
>  }

[High]
Does the same use-after-free window remain on the nsim_create() error
path?

nsim_create() unconditionally calls nsim_ethtool_init(ns) before the
init step that can fail:

    SET_NETDEV_DEVLINK_PORT(dev, &nsim_dev_port->devlink_port);
    nsim_ethtool_init(ns);
    if (nsim_dev_port_is_pf(nsim_dev_port))
        err = nsim_init_netdevsim(ns);
    else
        err = nsim_init_netdevsim_vf(ns);
    if (err)
        goto err_free_netdev;
    ...
err_free_netdev:
    free_netdev(dev);
    return ERR_PTR(err);

nsim_ethtool_init() populates files under nsim_dev_port->ddir whose
data pointers reference fields inside ns, for example:

    ethtool = debugfs_create_dir("ethtool", ns->nsim_dev_port->ddir);
    debugfs_create_u32("get_err", 0600, ethtool, &ns->ethtool.get_err);
    debugfs_create_u32("set_err", 0600, ethtool, &ns->ethtool.set_err);
    ...
    dir = debugfs_create_dir("ring", ethtool);
    debugfs_create_u32("rx_max_pending", 0600, dir,
                      &ns->ethtool.ring.rx_max_pending);

If nsim_init_netdevsim() (or nsim_init_netdevsim_vf()) fails, the
err_free_netdev label runs free_netdev(dev) directly and returns
ERR_PTR(err). The caller then calls nsim_dev_port_debugfs_exit() to
tear down ddir, but in the meantime the netdevsim slab has already
been freed while these debugfs files are still live with data pointers
into it.

Should the same debugfs_remove_recursive(nsim_dev_port->ddir) be done
on the err_free_netdev path before free_netdev(dev), or should the
debugfs file creation be moved past the point of no return so the
fuzzer-reachable failure paths (alloc pressure on mock_phc_create,
register_netdevice, nsim_psp_init, etc.) cannot leave dangling
debugfs entries?

The Fixes: tag e05b2d141fef appears to cover this path as well.

^ permalink raw reply

* Re: [PATCH net] veth: fix NAPI leak in XDP enable error path
From: Pavan Chebbi @ 2026-06-22 13:31 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Simon Horman,
	netdev, eric.dumazet, Guenter Roeck, Björn Töpel,
	Daniel Borkmann, Ilias Apalodimas, Michael S. Tsirkin,
	Tariq Toukan
In-Reply-To: <20260622111825.88337-1-edumazet@google.com>

On Mon, Jun 22, 2026 at 4:48 PM Eric Dumazet <edumazet@google.com> wrote:
>
> During XDP enablement in veth, if xdp_rxq_info_reg() or
> xdp_rxq_info_reg_mem_model() fails, the driver rolls back the changes.
>
> However, the rollback loop:
>         for (i--; i >= start; i--) {
>
> decrements the loop index 'i' before the first iteration. This
> correctly skips unregistering the rxq for the failed index 'i' (as
> registration failed or was already cleaned up), but it also
> erroneously skips calling netif_napi_del() for rq[i].xdp_napi.
>
> Since netif_napi_add() was already called for index 'i', this leaves
> a dangling napi_struct in the device's napi_list. When the veth
> device is later destroyed, the freed queue memory (which contains the
> leaked NAPI structure) can be reused.
>
> The subsequent device teardown iterates the NAPI list and
> corrupts the reallocated memory, leading to UAF.
>
> Fix this by explicitly deleting the NAPI association for the failed
> index 'i' before rolling back the successfully configured queues.
>
> Fixes: b02e5a0ebb17 ("xsk: Propagate napi_id to XDP socket Rx path")
> Reported-by: Guenter Roeck <groeck@google.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Björn Töpel <bjorn.topel@intel.com>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Tariq Toukan <tariqt@nvidia.com>
> ---
>  drivers/net/veth.c | 2 ++
>  1 file changed, 2 insertions(+)
>

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
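
For readers following along, the off-by-one described in the commit
message can be modelled in plain userspace C. This is only a sketch under
stated assumptions: struct model_rq, model_enable() and model_leaked()
are hypothetical stand-ins, the two flags stand in for netif_napi_add()
and xdp_rxq_info_reg(), and the "fix" branch reflects the description
above rather than the actual hunk, which is not quoted here.

```c
#include <stdbool.h>

#define NQ 4 /* number of modelled queues (arbitrary) */

struct model_rq {
	bool napi_added;     /* stands in for netif_napi_add() having run */
	bool rxq_registered; /* stands in for xdp_rxq_info_reg() succeeding */
};

/*
 * Enable queues [0, NQ). Registration fails at index fail_at (pass a value
 * outside [0, NQ) for no failure). When fix is true, the NAPI of the failed
 * index is removed before the rollback loop runs.
 * Returns 0 on success, -1 on failure.
 */
static int model_enable(struct model_rq *rq, int fail_at, bool fix)
{
	int i;

	for (i = 0; i < NQ; i++) {
		rq[i].napi_added = true;	/* NAPI is added first */
		if (i == fail_at)
			goto err;		/* registration failed */
		rq[i].rxq_registered = true;
	}
	return 0;

err:
	if (fix)
		rq[i].napi_added = false;	/* undo the add for index i */

	/* i-- runs before the first iteration, so index fail_at is skipped. */
	for (i--; i >= 0; i--) {
		rq[i].rxq_registered = false;
		rq[i].napi_added = false;
	}
	return -1;
}

/* Count queues that still hold a NAPI after a failed enable. */
static int model_leaked(const struct model_rq *rq)
{
	int i, n = 0;

	for (i = 0; i < NQ; i++)
		n += rq[i].napi_added;
	return n;
}
```

With fix set to false, exactly one queue (the failed index) keeps its
NAPI after the rollback; with fix set to true, none do.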


^ permalink raw reply

* Re: [PATCH net 1/5] net: hns3: unify copper port ksettings configuration path
From: Jijie Shao @ 2026-06-22 13:30 UTC (permalink / raw)
  To: Simon Horman
  Cc: shaojijie, davem, edumazet, kuba, pabeni, andrew+netdev,
	shenjian15, liuyonglong, chenhao418, huangdonghua3, yangshuaisong,
	netdev, linux-kernel
In-Reply-To: <20260618153356.909862-1-horms@kernel.org>


on 2026/6/18 23:33, Simon Horman wrote:
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
> ---
> net: hns3: unify copper port ksettings configuration path
>
> Refactor hns3_set_link_ksettings() and hclge_set_phy_link_ksettings()
> to unify the configuration path for copper ports.
>
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c
> [ ... ]
>> @@ -863,6 +859,8 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
>>   	struct hnae3_handle *handle = hns3_get_handle(netdev);
>>   	struct hnae3_ae_dev *ae_dev = hns3_get_ae_dev(handle);
>>   	const struct hnae3_ae_ops *ops = hns3_get_ops(handle);
>> +	u8 module_type = HNAE3_MODULE_TYPE_UNKNOWN;
>> +	u8 media_type = HNAE3_MEDIA_TYPE_UNKNOWN;
>>   	int ret;
>>   
>>   	/* Chip don't support this mode. */
>> @@ -878,22 +876,20 @@ static int hns3_set_link_ksettings(struct net_device *netdev,
>>   		  cmd->base.autoneg, cmd->base.speed, cmd->base.duplex,
>>   		  cmd->lanes);
>>   
>> -	/* Only support ksettings_set for netdev with phy attached for now */
>> -	if (netdev->phydev) {
>> -		if (cmd->base.speed == SPEED_1000 &&
>> -		    cmd->base.autoneg == AUTONEG_DISABLE)
>> -			return -EINVAL;
> [Severity: Medium]
> Does removing this validation allow invalid configurations for native copper
> PHYs?
>
> The previous code explicitly rejected SPEED_1000 with AUTONEG_DISABLE because
> IEEE 802.3 1000BASE-T strictly requires autonegotiation. By delegating
> directly to phy_ethtool_ksettings_set(), which permits this for 1000BASE-X,
> the driver might now silently accept an invalid configuration.

[Accepted] Yes, this is indeed a valid bug.

>
>> -		return phy_ethtool_ksettings_set(netdev->phydev, cmd);
>> -	} else if (test_bit(HNAE3_DEV_SUPPORT_PHY_IMP_B, ae_dev->caps) &&
>> -		   ops->set_phy_link_ksettings) {
>> +	if (!ops->get_media_type)
>> +		return -EOPNOTSUPP;
>> +	ops->get_media_type(handle, &media_type, &module_type);
>>   
>> -		return ops->set_phy_link_ksettings(handle, cmd);
>> -	}
>> +	if (media_type == HNAE3_MEDIA_TYPE_COPPER) {
>> +		if (!ops->set_phy_link_ksettings)
>> +			return -EOPNOTSUPP;
>> +		return ops->set_phy_link_ksettings(handle, cmd);
>> +	}
> [ ... ]
>> diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
>> +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
> [ ... ]
>> @@ -3400,10 +3400,28 @@ hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
>>   		return ret;
>>   	}
>>   
>> +	linkmode_copy(hdev->hw.mac.advertising, cmd->link_modes.advertising);
>> +	return 0;
>> +}
>> +
>> +static int
>> +hclge_set_phy_link_ksettings(struct hnae3_handle *handle,
>> +			     const struct ethtool_link_ksettings *cmd)
>> +{
>> +	struct hclge_vport *vport = hclge_get_vport(handle);
>> +	struct hclge_dev *hdev = vport->back;
>> +	int ret;
>> +
>> +	if (hnae3_dev_phy_imp_supported(hdev))
>> +		ret = hclge_ethtool_ksettings_set(handle, cmd);
>> +	else
>> +		ret = phy_ethtool_ksettings_set(handle->netdev->phydev, cmd);
> [Severity: Critical]
> Can handle->netdev->phydev be NULL here, leading to a kernel panic when called
> from userspace?
>
> A copper port can legitimately have a NULL phydev if the port's PHY address
> is configured as PHY_INEXISTENT in the firmware.
>
> Because the updated hns3_set_link_ksettings() now routes all
> HNAE3_MEDIA_TYPE_COPPER configurations here without the previous
> if (netdev->phydev) guard, phy_ethtool_ksettings_set() will unconditionally
> dereference phydev->mdio.addr.

[Accepted] Yes, this is indeed a valid bug.
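
As a rough illustration of the two checks discussed above, here is a
userspace sketch of the ordering a fix might take on the native-PHY path.
It is not the actual follow-up patch: every model_* name and MODEL_*
constant is a hypothetical stand-in for the corresponding driver/ethtool
definition, and the choice of -EOPNOTSUPP for the missing-PHY case is an
assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's definitions. */
#define MODEL_SPEED_1000      1000
#define MODEL_AUTONEG_DISABLE 0
#define MODEL_AUTONEG_ENABLE  1

struct model_phy    { int addr; };
struct model_netdev { struct model_phy *phydev; };
struct model_cmd    { int speed; int autoneg; };

/*
 * Model of the copper ksettings path:
 *  - firmware-managed PHY (IMP): handled elsewhere, modelled as success;
 *  - native PHY: refuse a NULL phydev, then refuse forced 1000 Mb/s,
 *    before handing the request to the PHY layer.
 */
static int model_set_copper_ksettings(const struct model_netdev *nd,
				      const struct model_cmd *cmd,
				      int phy_imp_supported)
{
	if (phy_imp_supported)
		return 0;

	/* No PHY attached: fail cleanly instead of dereferencing NULL. */
	if (!nd->phydev)
		return -EOPNOTSUPP;

	/* 1000BASE-T requires autonegotiation on a native copper PHY. */
	if (cmd->speed == MODEL_SPEED_1000 &&
	    cmd->autoneg == MODEL_AUTONEG_DISABLE)
		return -EINVAL;

	return 0; /* the real code would call into the PHY layer here */
}
```

The point of the ordering is that the NULL check comes before anything
touches phydev, and the speed/autoneg check stays confined to the
native-PHY path, as in the code that was removed.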


Thanks,
Jijie Shao


^ permalink raw reply

* Re: [PATCH] fsl/fman: Free init resources on KeyGen failure in fman_init()
From: Pavan Chebbi @ 2026-06-22 13:27 UTC (permalink / raw)
  To: Haoxiang Li
  Cc: madalin.bucur, sean.anderson, andrew+netdev, davem, edumazet,
	kuba, pabeni, florinel.iordache, netdev, linux-kernel, stable
In-Reply-To: <20260622111638.997251-1-haoxiang_li2024@163.com>

On Mon, Jun 22, 2026 at 4:47 PM Haoxiang Li <haoxiang_li2024@163.com> wrote:
>
> fman_muram_alloc() allocates initialization resources before
> initializing the KeyGen block. If keygen_init() fails, the
> function returns -EINVAL directly and leaves those resources
> allocated. Free the initialization resources before returning
> from the KeyGen failure path.
>
> While at it, drop the unused error check around enable(), which
> always returns 0.
>
> Fixes: 7472f4f281d0 ("fsl/fman: enable FMan Keygen")
> Cc: stable@kernel.org
> Signed-off-by: Haoxiang Li <haoxiang_li2024@163.com>
> ---
>  drivers/net/ethernet/freescale/fman/fman.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>

LGTM. Note that fix patches should carry "net" in the subject prefix;
otherwise, as you can see, patch automation has guessed net-next for this one.
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>

> diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
> index 013273a2de32..3a2a57207e55 100644
> --- a/drivers/net/ethernet/freescale/fman/fman.c
> +++ b/drivers/net/ethernet/freescale/fman/fman.c
> @@ -1995,12 +1995,12 @@ static int fman_init(struct fman *fman)
>
>         /* Init KeyGen */
>         fman->keygen = keygen_init(fman->kg_regs);
> -       if (!fman->keygen)
> +       if (!fman->keygen) {
> +               free_init_resources(fman);
>                 return -EINVAL;
> +       }
>
> -       err = enable(fman, cfg);
> -       if (err != 0)
> -               return err;
> +       enable(fman, cfg);
>
>         enable_time_stamp(fman);
>
> --
> 2.25.1
>
>


^ permalink raw reply

