Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH net 10/13] ice: fix locking in ice_dcb_rebuild()
From: Jacob Keller @ 2026-05-06 21:13 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Michal Kubiak, Joshua Hay, Madhu Chittim, Willem de Bruijn,
	Dave Ertman, Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Bart Van Assche, intel-wired-lan, Arpana Arland
In-Reply-To: <20260504-jk-iwl-net-2026-05-04-v1-10-a222a88bd962@intel.com>

On 5/4/2026 10:14 PM, Jacob Keller wrote:
> From: Bart Van Assche <bvanassche@acm.org>
> 
> Move the mutex_lock() call up to prevent that DCB settings change after
> the first ice_query_port_ets() call. The second ice_query_port_ets()
> call in ice_dcb_rebuild() is already protected by pf->tc_mutex.
> 
> This also fixes a bug in an error path, as before taking the first
> "goto dcb_error" in the function jumped over mutex_lock() to
> mutex_unlock().
> 
> This bug has been detected by the clang thread-safety analyzer.
> 
> Cc: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Cc: intel-wired-lan@lists.osuosl.org
> Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset")
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
> Tested-by: Arpana Arland <arpanax.arland@intel.com>
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---
>  drivers/net/ethernet/intel/ice/ice_dcb_lib.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> index 16aa25535152..0bc6dd375687 100644
> --- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> +++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
> @@ -537,14 +537,14 @@ void ice_dcb_rebuild(struct ice_pf *pf)
>  	struct ice_dcbx_cfg *err_cfg;
>  	int ret;
>  
> +	mutex_lock(&pf->tc_mutex);
> +
>  	ret = ice_query_port_ets(pf->hw.port_info, &buf, sizeof(buf), NULL);
>  	if (ret) {
>  		dev_err(dev, "Query Port ETS failed\n");
>  		goto dcb_error;
>  	}
>  
> -	mutex_lock(&pf->tc_mutex);
> -
>  	if (!pf->hw.port_info->qos_cfg.is_sw_lldp)
>  		ice_cfg_etsrec_defaults(pf->hw.port_info);
>  
> 

Sashiko thinks there is a problem here:

> Does moving mutex_lock(&pf->tc_mutex) to the top of ice_dcb_rebuild()
> introduce an AB-BA deadlock with rtnl_lock?
> In standard DCB netlink operations, the netlink layer calls into the driver
> with rtnl_lock already held, and the driver subsequently acquires
> pf->tc_mutex. This establishes a required lock ordering of rtnl_lock
> followed by pf->tc_mutex.
> With this change, if the first ice_query_port_ets() fails, we take the goto
> dcb_error path while already holding pf->tc_mutex.
> Looking at the error cleanup path lower down in ice_dcb_rebuild():
> dcb_error:
>     dev_err(dev, "Disabling DCB until new settings occur\n");
>     [ ... ]
>     ice_pf_dcb_cfg(pf, err_cfg, false);
> Because the locked parameter is passed as false, ice_pf_dcb_cfg() will
> unconditionally acquire rtnl_lock().
> Does this create a lock inversion (pf->tc_mutex followed by rtnl_lock)
> against concurrent DCB netlink operations?
This seems like a fully pre-existing error. We already jump to dcb_error
else where in the function.

I don't know if this locking order really is an ABBA violation (I did
not review any of the other flows that take tc_mutex to confirm), but I
don't think it should hold this fix.

Someone from the ice team will need to investigate and see what the best
solution is. I suspect we'll have to take RTNL lock then the tc_mutex
and pass true to the ice_pf_dcb_cfg function. Or, better yet, see if
this converts to the netdev per-instance lock and we could drop the
tc_mutex entirely, relying on netdev_lock?

Thanks,
Jake

^ permalink raw reply

* [PATCH] devlink/param: replace deprecated strcpy() with strscpy()
From: Álvaro Costa @ 2026-05-06 21:14 UTC (permalink / raw)
  To: jiri
  Cc: Álvaro Costa, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Simon Horman, open list:DEVLINK, open list

Replace strcpy() call used to extract a string parameter from param_data
with strscpy(). Since strscpy() already performs bounds checking and
ensures the destination string is NUL-terminated, remove the string
length check as well.

Signed-off-by: Álvaro Costa <alvaroc.dev@gmail.com>
---
 net/devlink/param.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/devlink/param.c b/net/devlink/param.c
index cf95268da5b0..26695b7e2861 100644
--- a/net/devlink/param.c
+++ b/net/devlink/param.c
@@ -536,11 +536,9 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
 		value->vu64 = nla_get_u64(param_data);
 		break;
 	case DEVLINK_PARAM_TYPE_STRING:
-		len = strnlen(nla_data(param_data), nla_len(param_data));
-		if (len == nla_len(param_data) ||
-		    len >= __DEVLINK_PARAM_MAX_STRING_VALUE)
+		len = strscpy(value->vstr, nla_data(param_data));
+		if (len < 0)
 			return -EINVAL;
-		strcpy(value->vstr, nla_data(param_data));
 		break;
 	case DEVLINK_PARAM_TYPE_BOOL:
 		if (param_data && nla_len(param_data))
-- 
2.53.0


^ permalink raw reply related

* Re: [PATCH net 11/13] ice: fix PTP hang for E825C devices
From: Jacob Keller @ 2026-05-06 21:16 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Michal Kubiak, Joshua Hay, Madhu Chittim, Willem de Bruijn,
	Dave Ertman, Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Simon Horman, Rinitha S
In-Reply-To: <20260504-jk-iwl-net-2026-05-04-v1-11-a222a88bd962@intel.com>

On 5/4/2026 10:14 PM, Jacob Keller wrote:
> From: Grzegorz Nitka <grzegorz.nitka@intel.com>
> 
> Change the order of PTP reconfiguration when port goes down or up
> (ice_down and ice_up calls) to be more graceful and consistent from
> timestamp interrupts processing perspective.
> 
> For both calls (ice_up and ice_down), accompanying ice_ptp_link_change
> is called which starts/stops PTP timer. This patch changes the order:
> - while link goes down: disable net device Tx first (netif_carrier_off,
>   netif_tx_disable), then call ice_ptp_link_change
> - while link goes up: ice_ptp_link_change called first, then re-enable
>   net device Tx (netif_tx_start_all_queues)
> 
> Otherwise, there is a narrow window in which PTP timestamp request has
> been triggered and timestamp processing occurs when PTP timer is not
> enabled yet (up case) or already disabled (down case). This may lead to
> undefined behavior and receiving invalid timestamps. This case was
> observed on E825C devices only.
> 

Sashiko has some concerns:

> Does this sequence still leave a window for early packets to trigger PTP
> processing before the timer is ready?
> Looking earlier in ice_up_complete(), the hardware Rx rings are started
> before we reach this point:
> 	err = ice_vsi_start_all_rx_rings(vsi);
> 	if (err)
> 		return err;
> Since ice_vsi_start_all_rx_rings() is called before ice_ptp_link_change(),
> could packets be received and timestamped before the PTP timer is actually
> enabled?

This doesn't matter. The issue is for Tx timestamps and the Rx path is
unrelated here.

Sashiko also says:

> Is the race condition fully closed by moving ice_ptp_link_change() here?
> While netif_tx_disable() stops the OS networking stack from enqueuing new
> packets, it does not flush or wait for the hardware DMA rings to drain. The
> hardware Tx rings are actually stopped later in the teardown process:
> 	ice_vsi_dis_irq(vsi);
> 	tx_err = ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, 0);
> If an in-flight packet from the hardware Tx ring triggers a PTP timestamp
> request after ice_ptp_link_change() has set ptp_port->link_up = false,
> wouldn't the software worker ignore the timestamp?
> Leaving a valid timestamp unread in the hardware FIFO is noted as the cause
> of the PHY hang. Should ice_ptp_link_change() be called after the hardware
> rings are fully stopped to completely avoid this regression?
> Additionally, is there a missing VSI type check here in ice_down()?
> Unlike ice_up_complete(), which checks if the VSI is a PF or SF, this block
> executes for any VSI that has a netdev.
> Because the PTP timer is shared at the physical port level, could bringing
> down a secondary interface (like a sub-function) invoke ice_down() and
> unconditionally disable the PTP timer for vsi->back?
> This seems like it would break PTP functionality for the main PF interface
> and any other active sub-functions on that port.

This I am not sure. Gregorz, do you have any thoughts? I thought we had
spoken about this and were maybe investigating separating the PTP link
state from the admin up/down state entirely.

^ permalink raw reply

* Re: [PATCH net 00/13] Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf)
From: Jacob Keller @ 2026-05-06 21:21 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Matt Vollrath, Sunitha Mekala, Kohei Enju,
	Paul Menzel, Simon Horman, Emil Tantilov, Samuel Salin,
	Patryk Holda, Greg Kroah-Hartman, Tony Nguyen, stable,
	Marcin Szycik, Bart Van Assche, intel-wired-lan, Arpana Arland,
	Rinitha S
In-Reply-To: <20260504-jk-iwl-net-2026-05-04-v1-0-a222a88bd962@intel.com>

On 5/4/2026 10:14 PM, Jacob Keller wrote:
> Matt Volrath fixes two issues with the i40e driver probe routine, ensuring
> that PTP is properly cleaned up if the probe fails.
> 
> Maciej fixes the i40e driver logic to keep the q_vectors array in sync with
> changes to the channel count via ethtool.
> 
> Emil corrects the initialization of the read_dev_clk_lock spinlock in
> idpf_ptp_init, ensuring it is initialized prior to when the
> ptp_schedule_worker() is called.
> 
> Josh fixes the idpf driver to prevent enabling XDP if the queue based
> scheduling is not supported by the firmware.
> 
> Josh fixes the idpf skb data path for handling queue based scheduling.
> 
> Josh fixes an XDP crash in the soft reset error path, restoring the
> original configuration if idpf_xdp_setup_prog() fails.
> 
> Greg KH fixes a double free and use-after free in the idpf auxiliary device
> error paths.
> 
> Marcin fixes ice_set_rss_hfunc() to use the correct q_opt_flags field,
> correcting the assignment and preventing submission of invalid data to the
> firmware.
> 
> Bart corrects the locking in ice_dcb_rebuild(), ensuring that the tc_mutex
> is held over the entire operation.
> 
> Grzegorz fixes the ordering of ice_ptp_link_change() in ice_up_complete()
> ensuring that the PTP timestamps will not be enabled before the PTP timer
> is actually re-initialized.
> 
> Ivan fixes the rclk pin state get for E810 devices, ensuring the index is
> properly offset by the base_rclk_idx value. This ensures that the correct
> pin index is used to look up recovered clock state. He additionally adds
> bounds checking to prevent attempting to access pins outside of the pin
> state array.
> 
> Ivan also moves the CGU register macros to the top of ice_dpll.h, inside
> the header guard to avoid duplicate macro definitions should the ice_dpll.h
> header is included multiple times.
> 
> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
> ---

Sashiko pointed out a few issues with some of the patches. I replied to
all the patches with possible issues, and I think some really do need
more work.

In particular, patch 3 needs to resolve a definite use-after-free issue,
patch 5 needs to address an issue with the extack pointer use, patch 6
and 7 need some investigation from the author to confirm, and patch 11
needs some confirmation from Grzegorz on whether there is still any gap.

Sashiko did have some concerns on patch 1, 2, 8, and 10. I replied to
the patches, and I think those are issues which need separate follow up
work and shouldn't block these fixes.

I'm going to submit a v2 which drops the patches that need rework.

Thanks,
Jake
> Bart Van Assche (1):
>       ice: fix locking in ice_dcb_rebuild()
> 
> Emil Tantilov (2):
>       idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()
>       idpf: fix xdp crash in soft reset error path
> 
> Greg Kroah-Hartman (1):
>       idpf: fix double free and use-after-free in aux device error paths
> 
> Grzegorz Nitka (1):
>       ice: fix PTP hang for E825C devices
> 
> Ivan Vecera (2):
>       ice: dpll: fix rclk pin state get for E810
>       ice: dpll: fix misplaced header macros
> 
> Joshua Hay (2):
>       idpf: do not enable XDP if queue based scheduling is not supported
>       idpf: fix skb datapath queue based scheduling crashes and timeouts
> 
> Maciej Fijalkowski (1):
>       i40e: keep q_vectors array in sync with channel count changes
> 
> Marcin Szycik (1):
>       ice: fix setting RSS VSI hash for E830
> 
> Matt Vollrath (2):
>       i40e: Cleanup PTP registration on probe failure
>       i40e: Cleanup PTP pins on probe failure
> 
>  drivers/net/ethernet/intel/i40e/i40e.h          |  1 +
>  drivers/net/ethernet/intel/ice/ice_dpll.h       | 32 ++++++-------
>  drivers/net/ethernet/intel/idpf/idpf_txrx.h     | 12 +++--
>  drivers/net/ethernet/intel/idpf/idpf_virtchnl.h |  4 +-
>  drivers/net/ethernet/intel/i40e/i40e_main.c     | 36 ++++++++++++---
>  drivers/net/ethernet/intel/i40e/i40e_ptp.c      |  3 +-
>  drivers/net/ethernet/intel/ice/ice_dcb_lib.c    |  4 +-
>  drivers/net/ethernet/intel/ice/ice_dpll.c       |  5 ++
>  drivers/net/ethernet/intel/ice/ice_main.c       |  6 +--
>  drivers/net/ethernet/intel/idpf/idpf_idc.c      |  6 +++
>  drivers/net/ethernet/intel/idpf/idpf_lib.c      |  4 +-
>  drivers/net/ethernet/intel/idpf/idpf_ptp.c      |  4 +-
>  drivers/net/ethernet/intel/idpf/idpf_txrx.c     | 61 ++++++++++++++-----------
>  drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 19 ++------
>  drivers/net/ethernet/intel/idpf/xdp.c           | 15 ++++--
>  drivers/net/ethernet/intel/idpf/xsk.c           |  4 +-
>  16 files changed, 132 insertions(+), 84 deletions(-)
> ---
> base-commit: bd3a4795d5744f59a1f485379f1303e5e606f377
> change-id: 20260504-jk-iwl-net-2026-05-04-f9526823577f
> 
> Best regards,
> --  
> Jacob Keller <jacob.e.keller@intel.com>
> 


^ permalink raw reply

* Re: [Patch net-next v1 1/7] r8169: add support for multi irqs
From: Heiner Kallweit @ 2026-05-06 21:28 UTC (permalink / raw)
  To: javen, nic_swsd, andrew+netdev, davem, edumazet, kuba, pabeni,
	horms
  Cc: netdev, linux-kernel
In-Reply-To: <20260506081326.767-2-javen_xu@realsil.com.cn>

On 06.05.2026 10:13, javen wrote:
> From: Javen Xu <javen_xu@realsil.com.cn>
> 
> RSS uses multi rx queues to receive packets, and each rx queue needs one
> irq and napi. So this patch adds support for multi irqs and napi here.
> 
> Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 199 ++++++++++++++++++++--
>  1 file changed, 184 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index 791277e750ba..ef74ee02c117 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -77,6 +77,7 @@
>  #define R8169_RX_RING_BYTES	(NUM_RX_DESC * sizeof(struct RxDesc))
>  #define R8169_TX_STOP_THRS	(MAX_SKB_FRAGS + 1)
>  #define R8169_TX_START_THRS	(2 * R8169_TX_STOP_THRS)
> +#define R8169_MAX_MSIX_VEC	32
>  
>  #define OCP_STD_PHY_BASE	0xa400
>  
> @@ -435,6 +436,8 @@ enum rtl8125_registers {
>  #define INT_CFG0_CLKREQEN		BIT(3)
>  	IntrMask_8125		= 0x38,
>  	IntrStatus_8125		= 0x3c,
> +	INTR_VEC_MAP_MASK	= 0x800,
> +	INTR_VEC_MAP_STATUS	= 0x802,

These register names don't have a chip version reference.
Does this mean they can be used on other chip versions
with RSS as well?

>  	INT_CFG1_8125		= 0x7a,
>  	LEDSEL2			= 0x84,
>  	LEDSEL1			= 0x86,
> @@ -728,6 +731,19 @@ enum rtl_dash_type {
>  	RTL_DASH_25_BP,
>  };
>  
> +struct rtl8169_napi {
> +	struct napi_struct napi;
> +	void *priv;
> +	int index;

It seems the index is never used in this patch.

> +};
> +
> +struct rtl8169_irq {
> +	irq_handler_t	handler;
> +	unsigned int	vector;
> +	u8		requested;
> +	char		name[IFNAMSIZ + 10];
> +};
> +
>  struct rtl8169_private {
>  	void __iomem *mmio_addr;	/* memory map physical address */
>  	struct pci_dev *pci_dev;
> @@ -745,9 +761,19 @@ struct rtl8169_private {
>  	dma_addr_t RxPhyAddr;
>  	struct page *Rx_databuff[NUM_RX_DESC];	/* Rx data buffers */
>  	struct ring_info tx_skb[NUM_TX_DESC];	/* Tx data buffers */
> +	struct rtl8169_irq irq_tbl[R8169_MAX_MSIX_VEC];
> +	struct rtl8169_napi r8169napi[R8169_MAX_MSIX_VEC];
> +	u16 isr_reg[R8169_MAX_MSIX_VEC];
> +	u16 imr_reg[R8169_MAX_MSIX_VEC];

These arrays result in unecessarily high memory consumption on all other
chip versions. Can't they be dynamically allocated, only in case driver
supports RSS for the respective chip version?

> +	unsigned int num_rx_rings;
>  	u16 cp_cmd;
>  	u16 tx_lpi_timer;
>  	u32 irq_mask;
> +	u8 min_irq_nvecs;
> +	u8 max_irq_nvecs;

It seems these values are actually constants.
Can't we avoid these members?

> +	u8 hw_supp_isr_ver;
> +	u8 hw_curr_isr_ver;
> +	u8 irq_nvecs;
>  	int irq;
>  	struct clk *clk;
>  
> @@ -763,6 +789,8 @@ struct rtl8169_private {
>  	unsigned aspm_manageable:1;
>  	unsigned dash_enabled:1;
>  	bool sfp_mode:1;
> +	bool rss_support:1;
> +	bool rss_enable:1;
>  	dma_addr_t counters_phys_addr;
>  	struct rtl8169_counters *counters;
>  	struct rtl8169_tc_offsets tc_offset;
> @@ -2680,6 +2708,44 @@ static void rtl_hw_reset(struct rtl8169_private *tp)
>  	rtl_loop_wait_low(tp, &rtl_chipcmd_cond, 100, 100);
>  }
>  
> +static void rtl_setup_mqs_reg(struct rtl8169_private *tp)
> +{
> +	if (tp->mac_version <= RTL_GIGA_MAC_VER_52) {
> +		tp->isr_reg[0] = IntrStatus;
> +		tp->imr_reg[0] = IntrMask;
> +	} else {
> +		tp->isr_reg[0] = IntrStatus_8125;
> +		tp->imr_reg[0] = IntrMask_8125;
> +	}
> +
> +	for (int i = 1; i < tp->max_irq_nvecs; i++)
> +		tp->isr_reg[i] = (u16)(INTR_VEC_MAP_STATUS + (i - 1) * 4);
> +
> +	for (int i = 1; i < tp->max_irq_nvecs; i++)
> +		tp->imr_reg[i] = (u16)(INTR_VEC_MAP_MASK + (i - 1) * 4);

This populates the array with constant values. Therefore, can't you avoid
using this array?

> +}
> +
> +static void rtl_software_parameter_initialize(struct rtl8169_private *tp)
> +{
> +	tp->num_rx_rings = 1;
> +
> +	switch (tp->mac_version) {
> +	case RTL_GIGA_MAC_VER_80:
> +		tp->min_irq_nvecs = 1;
> +		tp->max_irq_nvecs = 1;
> +		tp->hw_supp_isr_ver = 6;

Magic value 6 requires at least an explanation and a constant.

> +		break;
> +	default:
> +		tp->min_irq_nvecs = 1;
> +		tp->max_irq_nvecs = 1;
> +		tp->hw_supp_isr_ver = 1;
> +		break;
> +	}
> +	tp->hw_curr_isr_ver = tp->hw_supp_isr_ver;

This indicates that the current version can be set to a version
which is not the supported one. This is misleading.
- Is supp_isr_ver the highest supported isr version?
- And does this mean that each chip is backwards-compatible and
  supports also all lower isr versions?

> +
> +	rtl_setup_mqs_reg(tp);
> +}
> +
>  static void rtl_request_firmware(struct rtl8169_private *tp)
>  {
>  	struct rtl_fw *rtl_fw;
> @@ -4266,9 +4332,21 @@ static void rtl8169_tx_clear(struct rtl8169_private *tp)
>  	netdev_reset_queue(tp->dev);
>  }
>  
> +static void rtl8169_napi_disable(struct rtl8169_private *tp)
> +{
> +	for (int i = 0; i < tp->irq_nvecs; i++)
> +		napi_disable(&tp->r8169napi[i].napi);
> +}
> +
> +static void rtl8169_napi_enable(struct rtl8169_private *tp)
> +{
> +	for (int i = 0; i < tp->irq_nvecs; i++)
> +		napi_enable(&tp->r8169napi[i].napi);
> +}
> +
>  static void rtl8169_cleanup(struct rtl8169_private *tp)
>  {
> -	napi_disable(&tp->napi);
> +	rtl8169_napi_disable(tp);
>  
>  	/* Give a racing hard_start_xmit a few cycles to complete. */
>  	synchronize_net();
> @@ -4313,8 +4391,8 @@ static void rtl_reset_work(struct rtl8169_private *tp)
>  
>  	for (i = 0; i < NUM_RX_DESC; i++)
>  		rtl8169_mark_to_asic(tp->RxDescArray + i);
> +	rtl8169_napi_enable(tp);
>  
> -	napi_enable(&tp->napi);

This moves the empty line. It should remain where it is.

>  	rtl_hw_start(tp);
>  }
>  
> @@ -4820,7 +4898,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
>  			goto release_descriptor;
>  		}
>  
> -		skb = napi_alloc_skb(&tp->napi, pkt_size);
> +		skb = napi_alloc_skb(&tp->r8169napi[0].napi, pkt_size);
>  		if (unlikely(!skb)) {
>  			dev->stats.rx_dropped++;
>  			goto release_descriptor;
> @@ -4844,7 +4922,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
>  		if (skb->pkt_type == PACKET_MULTICAST)
>  			dev->stats.multicast++;
>  
> -		napi_gro_receive(&tp->napi, skb);
> +		napi_gro_receive(&tp->r8169napi[0].napi, skb);
>  
>  		dev_sw_netstats_rx_add(dev, pkt_size);
>  release_descriptor:
> @@ -4856,7 +4934,8 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
>  
>  static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  {
> -	struct rtl8169_private *tp = dev_instance;
> +	struct rtl8169_napi *napi = dev_instance;
> +	struct rtl8169_private *tp = napi->priv;
>  	u32 status = rtl_get_events(tp);
>  
>  	if ((status & 0xffff) == 0xffff || !(status & tp->irq_mask))
> @@ -4873,13 +4952,53 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  		phy_mac_interrupt(tp->phydev);
>  
>  	rtl_irq_disable(tp);
> -	napi_schedule(&tp->napi);
> +	napi_schedule(&napi->napi);
>  out:
>  	rtl_ack_events(tp, status);
>  
>  	return IRQ_HANDLED;
>  }
>  
> +static void rtl8169_free_irq(struct rtl8169_private *tp)
> +{
> +	for (int i = 0; i < tp->irq_nvecs; i++) {
> +		struct rtl8169_irq *irq = &tp->irq_tbl[i];
> +		struct rtl8169_napi *napi = &tp->r8169napi[i];
> +
> +		if (irq->requested) {

Is this check actually needed? Wouldn't pci_free_irq()
also be fine with irqs not having been requested?

> +			irq->requested = 0;
> +			pci_free_irq(tp->pci_dev, i, napi);
> +		}
> +	}
> +}
> +
> +static int rtl8169_request_irq(struct rtl8169_private *tp)
> +{
> +	const int len = sizeof(tp->irq_tbl[0].name);
> +	struct net_device *dev = tp->dev;
> +	struct rtl8169_napi *napi;
> +	struct rtl8169_irq *irq;
> +	int rc = 0;
> +
> +	for (int i = 0; i < tp->irq_nvecs; i++) {
> +		irq = &tp->irq_tbl[i];
> +
> +		napi = &tp->r8169napi[i];
> +		snprintf(irq->name, len, "%s-%d", dev->name, i);

I don't think this is needed. pci_request_irq() supports dynamic
irq name generation.

> +		irq->handler = rtl8169_interrupt;
> +		rc = pci_request_irq(tp->pci_dev, i, irq->handler, NULL, napi, irq->name);
> +		if (rc)
> +			break;
> +
> +		irq->vector = pci_irq_vector(tp->pci_dev, i);
> +		irq->requested = 1;
> +	}
> +
> +	if (rc)
> +		rtl8169_free_irq(tp);
> +	return rc;
> +}
> +
>  static void rtl_task(struct work_struct *work)
>  {
>  	struct rtl8169_private *tp =
> @@ -4914,9 +5033,10 @@ static void rtl_task(struct work_struct *work)
>  
>  static int rtl8169_poll(struct napi_struct *napi, int budget)
>  {
> -	struct rtl8169_private *tp = container_of(napi, struct rtl8169_private, napi);
> +	struct rtl8169_napi *r8169_napi = container_of(napi, struct rtl8169_napi, napi);
> +	struct rtl8169_private *tp = r8169_napi->priv;
>  	struct net_device *dev = tp->dev;
> -	int work_done;
> +	int work_done = 0;
>  
>  	rtl_tx(dev, tp, budget);
>  
> @@ -5035,7 +5155,7 @@ static void rtl8169_up(struct rtl8169_private *tp)
>  	phy_init_hw(tp->phydev);
>  	phy_resume(tp->phydev);
>  	rtl8169_init_phy(tp);
> -	napi_enable(&tp->napi);
> +	rtl8169_napi_enable(tp);
>  	enable_work(&tp->wk.work);
>  	rtl_reset_work(tp);
>  
> @@ -5053,7 +5173,7 @@ static int rtl8169_close(struct net_device *dev)
>  	rtl8169_down(tp);
>  	rtl8169_rx_clear(tp);
>  
> -	free_irq(tp->irq, tp);
> +	rtl8169_free_irq(tp);
>  
>  	phy_disconnect(tp->phydev);
>  
> @@ -5108,7 +5228,8 @@ static int rtl_open(struct net_device *dev)
>  	rtl_request_firmware(tp);
>  
>  	irqflags = pci_dev_msi_enabled(pdev) ? IRQF_NO_THREAD : IRQF_SHARED;
> -	retval = request_irq(tp->irq, rtl8169_interrupt, irqflags, dev->name, tp);
> +
> +	retval = rtl8169_request_irq(tp);
>  	if (retval < 0)
>  		goto err_release_fw_2;
>  
> @@ -5125,7 +5246,7 @@ static int rtl_open(struct net_device *dev)
>  	return retval;
>  
>  err_free_irq:
> -	free_irq(tp->irq, tp);
> +	rtl8169_free_irq(tp);
>  err_release_fw_2:
>  	rtl_release_firmware(tp);
>  	rtl8169_rx_clear(tp);
> @@ -5328,7 +5449,9 @@ static void rtl_set_irq_mask(struct rtl8169_private *tp)
>  
>  static int rtl_alloc_irq(struct rtl8169_private *tp)
>  {
> +	struct pci_dev *pdev = tp->pci_dev;
>  	unsigned int flags;
> +	int nvecs;
>  
>  	switch (tp->mac_version) {
>  	case RTL_GIGA_MAC_VER_02 ... RTL_GIGA_MAC_VER_06:
> @@ -5344,7 +5467,18 @@ static int rtl_alloc_irq(struct rtl8169_private *tp)
>  		break;
>  	}
>  
> -	return pci_alloc_irq_vectors(tp->pci_dev, 1, 1, flags);
> +	nvecs = pci_alloc_irq_vectors(pdev, tp->min_irq_nvecs, tp->max_irq_nvecs, flags);
> +
> +	if (nvecs < 0)
> +		nvecs = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_ALL_TYPES);

This may be dangerous. If the first allocation fails, you may here allocate an
interrupt of a type not supported by the chip.

> +
> +	if (nvecs < 0)
> +		return nvecs;
> +
> +	tp->irq = pci_irq_vector(pdev, 0);
> +	tp->irq_nvecs = nvecs;
> +
> +	return 0;
>  }
>  
>  static void rtl_read_mac_address(struct rtl8169_private *tp,
> @@ -5539,6 +5673,17 @@ static void rtl_hw_initialize(struct rtl8169_private *tp)
>  	}
>  }
>  
> +static int rtl8169_set_real_num_queue(struct rtl8169_private *tp)
> +{
> +	int retval;
> +
> +	retval = netif_set_real_num_tx_queues(tp->dev, 1);
> +	if (retval < 0)
> +		return retval;
> +
> +	return netif_set_real_num_rx_queues(tp->dev, tp->num_rx_rings);
> +}
> +
>  static int rtl_jumbo_max(struct rtl8169_private *tp)
>  {
>  	/* Non-GBit versions don't support jumbo frames */
> @@ -5599,6 +5744,19 @@ static bool rtl_aspm_is_safe(struct rtl8169_private *tp)
>  	return false;
>  }
>  
> +static void r8169_init_napi(struct rtl8169_private *tp)
> +{
> +	for (int i = 0; i < tp->irq_nvecs; i++) {
> +		struct rtl8169_napi *r8169napi = &tp->r8169napi[i];
> +		int (*poll)(struct napi_struct *napi, int budget);
> +
> +		poll = rtl8169_poll;
> +		netif_napi_add(tp->dev, &r8169napi->napi, poll);
> +		r8169napi->priv = tp;
> +		r8169napi->index = i;
> +	}
> +}
> +
>  static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  {
>  	const struct rtl_chip_info *chip;
> @@ -5703,11 +5861,12 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>  	rtl_hw_reset(tp);
>  
> +	rtl_software_parameter_initialize(tp);
> +
>  	rc = rtl_alloc_irq(tp);
>  	if (rc < 0)
>  		return dev_err_probe(&pdev->dev, rc, "Can't allocate interrupt\n");
>  
> -	tp->irq = pci_irq_vector(pdev, 0);
>  
>  	INIT_WORK(&tp->wk.work, rtl_task);
>  	disable_work(&tp->wk.work);
> @@ -5716,7 +5875,13 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  
>  	dev->ethtool_ops = &rtl8169_ethtool_ops;
>  
> -	netif_napi_add(dev, &tp->napi, rtl8169_poll);
> +	if (!tp->rss_support) {
> +		netif_napi_add(dev, &tp->r8169napi[0].napi, rtl8169_poll);
> +		tp->r8169napi[0].priv = tp;
> +		tp->r8169napi[0].index = 0;
> +	} else {
> +		r8169_init_napi(tp);
> +	}
>  
>  	dev->hw_features = NETIF_F_IP_CSUM | NETIF_F_RXCSUM |
>  			   NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX;
> @@ -5778,6 +5943,10 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	if (jumbo_max)
>  		dev->max_mtu = jumbo_max;
>  
> +	rc = rtl8169_set_real_num_queue(tp);
> +	if (rc < 0)
> +		return dev_err_probe(&pdev->dev, rc, "set tx/rx num failure\n");
> +
>  	rtl_set_irq_mask(tp);
>  
>  	tp->counters = dmam_alloc_coherent (&pdev->dev, sizeof(*tp->counters),


^ permalink raw reply

* [PATCH net-next v3 0/2] Keep PHY link during WoL sleep cycle
From: Justin Chen @ 2026-05-06 21:31 UTC (permalink / raw)
  To: netdev
  Cc: bcm-kernel-feedback-list, pabeni, kuba, edumazet, davem,
	andrew+netdev, florian.fainelli, Justin Chen

First we divide the init/deinit path to allow for a partial init/deinit
during a sleep cycle. We also remove some unnecessary small functions at
the same time.

Then we modify the suspend and resume path to allow for a partial bring
down and bring up. This allow us to keep the PHY link up and to resume
network traffic much quicker. Note we only do this when WoL is enabled
since the PHY is already powered. In the non-WoL case we want to follow
the same flow.

Justin Chen (2):
  net: bcmasp: Divide init to allow partial bring up
  net: bcmasp: Keep phy link during WoL sleep cycle

 .../net/ethernet/broadcom/asp2/bcmasp_intf.c  | 262 ++++++++++--------
 1 file changed, 141 insertions(+), 121 deletions(-)

-- 
2.34.1


^ permalink raw reply

* [PATCH net-next v3 2/2] net: bcmasp: Keep phy link during WoL sleep cycle
From: Justin Chen @ 2026-05-06 21:31 UTC (permalink / raw)
  To: netdev
  Cc: bcm-kernel-feedback-list, pabeni, kuba, edumazet, davem,
	andrew+netdev, florian.fainelli, Justin Chen
In-Reply-To: <20260506213114.2002886-1-justin.chen@broadcom.com>

We currently more or less restart all the HW on resume. Since we also
stop the PHY, it takes a while for the PHY link to be re-negotiated on
resume. Instead of doing a full restart, we keep the HW state and the
PHY link, that way we can resume network traffic with a much smaller
delay.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
---
v3
- Reworked how we managed the phy state machine while suspended. We make sure
  the state machine is not running to avoid races and restore the link
  appropriately when the HW is reset.

v2
- In the resume case with a HW reset. We trigger a relink by setting the link
  to PHY_UP instead of calling phy_restart_aneg().

 .../net/ethernet/broadcom/asp2/bcmasp_intf.c  | 62 ++++++++++++++-----
 1 file changed, 48 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
index e2b51ec903af..ed0977832ce4 100644
--- a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
+++ b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
@@ -923,7 +923,7 @@ static void bcmasp_phy_hw_unprepare(struct bcmasp_intf *intf)
 		bcmasp_rgmii_mode_en_set(intf, false);
 }
 
-static void bcmasp_netif_deinit(struct net_device *dev)
+static void bcmasp_netif_deinit(struct net_device *dev, bool stop_phy)
 {
 	struct bcmasp_intf *intf = netdev_priv(dev);
 	u32 reg, timeout = 1000;
@@ -946,7 +946,8 @@ static void bcmasp_netif_deinit(struct net_device *dev)
 
 	umac_enable_set(intf, UMC_CMD_TX_EN, 0);
 
-	phy_stop(dev->phydev);
+	if (stop_phy)
+		phy_stop(dev->phydev);
 
 	umac_enable_set(intf, UMC_CMD_RX_EN, 0);
 
@@ -974,7 +975,7 @@ static int bcmasp_stop(struct net_device *dev)
 	/* Stop tx from updating HW */
 	netif_tx_disable(dev);
 
-	bcmasp_netif_deinit(dev);
+	bcmasp_netif_deinit(dev, true);
 
 	bcmasp_reclaim_free_buffers(intf);
 
@@ -1385,15 +1386,26 @@ int bcmasp_interface_suspend(struct bcmasp_intf *intf)
 {
 	struct device *kdev = &intf->parent->pdev->dev;
 	struct net_device *dev = intf->ndev;
+	bool wake;
 
 	if (!netif_running(dev))
 		return 0;
 
 	netif_device_detach(dev);
 
-	bcmasp_netif_deinit(dev);
+	wake = device_may_wakeup(kdev) && intf->wolopts;
 
-	if (!intf->wolopts) {
+	bcmasp_netif_deinit(dev, !wake);
+
+	if (wake) {
+		/* Disable phy status updates while suspending */
+		mutex_lock(&dev->phydev->lock);
+		dev->phydev->state = PHY_READY;
+		mutex_unlock(&dev->phydev->lock);
+		cancel_delayed_work_sync(&dev->phydev->state_queue);
+
+		bcmasp_suspend_to_wol(intf);
+	} else {
 		bcmasp_phy_hw_unprepare(intf);
 
 		/* If Wake-on-LAN is disabled, we can safely
@@ -1402,9 +1414,6 @@ int bcmasp_interface_suspend(struct bcmasp_intf *intf)
 		bcmasp_core_clock_set_intf(intf, false);
 	}
 
-	if (device_may_wakeup(kdev) && intf->wolopts)
-		bcmasp_suspend_to_wol(intf);
-
 	clk_disable_unprepare(intf->parent->clk);
 
 	return 0;
@@ -1428,8 +1437,11 @@ static void bcmasp_resume_from_wol(struct bcmasp_intf *intf)
 
 int bcmasp_interface_resume(struct bcmasp_intf *intf)
 {
+	struct device *kdev = &intf->parent->pdev->dev;
 	struct net_device *dev = intf->ndev;
+	bool wake;
 	int ret;
+	u32 reg;
 
 	if (!netif_running(dev))
 		return 0;
@@ -1438,17 +1450,39 @@ int bcmasp_interface_resume(struct bcmasp_intf *intf)
 	if (ret)
 		return ret;
 
-	bcmasp_core_clock_set_intf(intf, true);
+	wake = device_may_wakeup(kdev) && intf->wolopts;
 
-	bcmasp_resume_from_wol(intf);
-
-	bcmasp_phy_hw_prepare(intf);
+	bcmasp_core_clock_set_intf(intf, true);
 
-	umac_reset_and_init(intf, dev->dev_addr);
+	/* The interface might be HW reset in some suspend modes, so we may
+	 * need to restore the UNIMAC/PHY if that is the case.
+	 */
+	reg = umac_rl(intf, UMC_CMD);
+	if (wake && (reg & UMC_CMD_RX_EN)) {
+		umac_enable_set(intf, UMC_CMD_TX_EN, 1);
+		bcmasp_resume_from_wol(intf);
+	} else {
+		bcmasp_phy_hw_prepare(intf);
+		umac_reset_and_init(intf, dev->dev_addr);
+	}
 
 	bcmasp_netif_init(dev);
 
-	phy_start(dev->phydev);
+	if (wake) {
+		/* If HW was reset, reprogram the unimac/PHY before resuming
+		 * link status tracking to avoid racing the state machine.
+		 */
+		if (!(reg & UMC_CMD_RX_EN))
+			bcmasp_adj_link(dev);
+
+		/* Resume link status tracking */
+		mutex_lock(&dev->phydev->lock);
+		dev->phydev->state = dev->phydev->link ? PHY_RUNNING : PHY_NOLINK;
+		mutex_unlock(&dev->phydev->lock);
+		phy_trigger_machine(dev->phydev);
+	} else {
+		phy_start(dev->phydev);
+	}
 
 	netif_device_attach(dev);
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH net-next v3 1/2] net: bcmasp: Divide init to allow partial bring up
From: Justin Chen @ 2026-05-06 21:31 UTC (permalink / raw)
  To: netdev
  Cc: bcm-kernel-feedback-list, pabeni, kuba, edumazet, davem,
	andrew+netdev, florian.fainelli, Justin Chen
In-Reply-To: <20260506213114.2002886-1-justin.chen@broadcom.com>

To prepare for a partial bring up of the interface during resume,
we break apart the bcmasp_netif_init() function into smaller chunks
that can be called as necessary. Also consolidate some functions that
do not need to be standalone.

Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
---
v2
- Moved tx_lpi_timer read to only bcmasp_open(). We don't want to overwrite
  the SW ctx with the HW default in the resume/suspend case.

 .../net/ethernet/broadcom/asp2/bcmasp_intf.c  | 210 ++++++++----------
 1 file changed, 98 insertions(+), 112 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
index ec63f50a849e..e2b51ec903af 100644
--- a/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
+++ b/drivers/net/ethernet/broadcom/asp2/bcmasp_intf.c
@@ -344,40 +344,35 @@ static netdev_tx_t bcmasp_xmit(struct sk_buff *skb, struct net_device *dev)
 	return NETDEV_TX_OK;
 }
 
-static void bcmasp_netif_start(struct net_device *dev)
+static void umac_reset_and_init(struct bcmasp_intf *intf,
+				const unsigned char *addr)
 {
-	struct bcmasp_intf *intf = netdev_priv(dev);
-
-	bcmasp_set_rx_mode(dev);
-	napi_enable(&intf->tx_napi);
-	napi_enable(&intf->rx_napi);
-
-	bcmasp_enable_rx_irq(intf, 1);
-	bcmasp_enable_tx_irq(intf, 1);
-	bcmasp_enable_phy_irq(intf, 1);
-
-	phy_start(dev->phydev);
-}
+	struct phy_device *phydev = intf->ndev->phydev;
+	u32 mac0, mac1;
 
-static void umac_reset(struct bcmasp_intf *intf)
-{
 	umac_wl(intf, 0x0, UMC_CMD);
 	umac_wl(intf, UMC_CMD_SW_RESET, UMC_CMD);
 	usleep_range(10, 100);
 	/* We hold the umac in reset and bring it out of
 	 * reset when phy link is up.
 	 */
-}
 
-static void umac_set_hw_addr(struct bcmasp_intf *intf,
-			     const unsigned char *addr)
-{
-	u32 mac0 = (addr[0] << 24) | (addr[1] << 16) | (addr[2] << 8) |
-		    addr[3];
-	u32 mac1 = (addr[4] << 8) | addr[5];
+	umac_wl(intf, 0x800, UMC_FRM_LEN);
+	umac_wl(intf, 0xffff, UMC_PAUSE_CNTRL);
+	umac_wl(intf, 0x800, UMC_RX_MAX_PKT_SZ);
+
+	mac0 = (addr[0] << 24) | (addr[1] << 16) | (addr[2] << 8) |
+		addr[3];
+	mac1 = (addr[4] << 8) | addr[5];
 
 	umac_wl(intf, mac0, UMC_MAC0);
 	umac_wl(intf, mac1, UMC_MAC1);
+
+	/* Reset shadow values since we reset the umac */
+	intf->old_duplex = -1;
+	intf->old_link = -1;
+	intf->old_pause = -1;
+	phydev->eee_cfg.tx_lpi_timer = umac_rl(intf, UMC_EEE_LPI_TIMER);
 }
 
 static void umac_enable_set(struct bcmasp_intf *intf, u32 mask,
@@ -401,13 +396,6 @@ static void umac_enable_set(struct bcmasp_intf *intf, u32 mask,
 		usleep_range(1000, 2000);
 }
 
-static void umac_init(struct bcmasp_intf *intf)
-{
-	umac_wl(intf, 0x800, UMC_FRM_LEN);
-	umac_wl(intf, 0xffff, UMC_PAUSE_CNTRL);
-	umac_wl(intf, 0x800, UMC_RX_MAX_PKT_SZ);
-}
-
 static int bcmasp_tx_reclaim(struct bcmasp_intf *intf)
 {
 	struct bcmasp_intf_stats64 *stats = &intf->stats64;
@@ -927,6 +915,14 @@ static void bcmasp_rgmii_mode_en_set(struct bcmasp_intf *intf, bool enable)
 	rgmii_wl(intf, reg, RGMII_OOB_CNTRL);
 }
 
+static void bcmasp_phy_hw_unprepare(struct bcmasp_intf *intf)
+{
+	if (intf->internal_phy)
+		bcmasp_ephy_enable_set(intf, false);
+	else
+		bcmasp_rgmii_mode_en_set(intf, false);
+}
+
 static void bcmasp_netif_deinit(struct net_device *dev)
 {
 	struct bcmasp_intf *intf = netdev_priv(dev);
@@ -984,11 +980,7 @@ static int bcmasp_stop(struct net_device *dev)
 
 	phy_disconnect(dev->phydev);
 
-	/* Disable internal EPHY or external PHY */
-	if (intf->internal_phy)
-		bcmasp_ephy_enable_set(intf, false);
-	else
-		bcmasp_rgmii_mode_en_set(intf, false);
+	bcmasp_phy_hw_unprepare(intf);
 
 	/* Disable the interface clocks */
 	bcmasp_core_clock_set_intf(intf, false);
@@ -998,10 +990,15 @@ static int bcmasp_stop(struct net_device *dev)
 	return 0;
 }
 
-static void bcmasp_configure_port(struct bcmasp_intf *intf)
+static void bcmasp_phy_hw_prepare(struct bcmasp_intf *intf)
 {
 	u32 reg, id_mode_dis = 0;
 
+	if (intf->internal_phy)
+		bcmasp_ephy_enable_set(intf, true);
+	else
+		bcmasp_rgmii_mode_en_set(intf, true);
+
 	reg = rgmii_rl(intf, RGMII_PORT_CNTRL);
 	reg &= ~RGMII_PORT_MODE_MASK;
 
@@ -1036,26 +1033,8 @@ static void bcmasp_configure_port(struct bcmasp_intf *intf)
 	rgmii_wl(intf, reg, RGMII_OOB_CNTRL);
 }
 
-static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
+static phy_interface_t bcmasp_phy_iface_for_connect(phy_interface_t mode)
 {
-	struct bcmasp_intf *intf = netdev_priv(dev);
-	phy_interface_t phy_iface = intf->phy_interface;
-	u32 phy_flags = PHY_BRCM_AUTO_PWRDWN_ENABLE |
-			PHY_BRCM_DIS_TXCRXC_NOENRGY |
-			PHY_BRCM_IDDQ_SUSPEND;
-	struct phy_device *phydev = NULL;
-	int ret;
-
-	/* Always enable interface clocks */
-	bcmasp_core_clock_set_intf(intf, true);
-
-	/* Enable internal PHY or external PHY before any MAC activity */
-	if (intf->internal_phy)
-		bcmasp_ephy_enable_set(intf, true);
-	else
-		bcmasp_rgmii_mode_en_set(intf, true);
-	bcmasp_configure_port(intf);
-
 	/* This is an ugly quirk but we have not been correctly
 	 * interpreting the phy_interface values and we have done that
 	 * across different drivers, so at least we are consistent in
@@ -1081,46 +1060,43 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
 	 * affected because they use different phy_interface_t values
 	 * or the Generic PHY driver.
 	 */
-	switch (phy_iface) {
+	switch (mode) {
 	case PHY_INTERFACE_MODE_RGMII:
-		phy_iface = PHY_INTERFACE_MODE_RGMII_ID;
-		break;
+		return PHY_INTERFACE_MODE_RGMII_ID;
 	case PHY_INTERFACE_MODE_RGMII_TXID:
-		phy_iface = PHY_INTERFACE_MODE_RGMII_RXID;
-		break;
+		return PHY_INTERFACE_MODE_RGMII_RXID;
 	default:
-		break;
+		return mode;
 	}
+}
 
-	if (phy_connect) {
-		phydev = of_phy_connect(dev, intf->phy_dn,
-					bcmasp_adj_link, phy_flags,
-					phy_iface);
-		if (!phydev) {
-			ret = -ENODEV;
-			netdev_err(dev, "could not attach to PHY\n");
-			goto err_phy_disable;
-		}
-
-		if (intf->internal_phy)
-			dev->phydev->irq = PHY_MAC_INTERRUPT;
-
-		/* Indicate that the MAC is responsible for PHY PM */
-		phydev->mac_managed_pm = true;
-
-		/* Set phylib's copy of the LPI timer */
-		phydev->eee_cfg.tx_lpi_timer = umac_rl(intf, UMC_EEE_LPI_TIMER);
+static int bcmasp_phy_attach(struct bcmasp_intf *intf)
+{
+	u32 phy_flags = PHY_BRCM_AUTO_PWRDWN_ENABLE |
+			PHY_BRCM_DIS_TXCRXC_NOENRGY |
+			PHY_BRCM_IDDQ_SUSPEND;
+	struct phy_device *phydev;
+	phy_interface_t phy_iface;
+
+	phy_iface = bcmasp_phy_iface_for_connect(intf->phy_interface);
+	phydev = of_phy_connect(intf->ndev, intf->phy_dn,
+				bcmasp_adj_link, phy_flags,
+				phy_iface);
+	if (!phydev) {
+		netdev_err(intf->ndev, "could not attach to PHY\n");
+		return -ENODEV;
 	}
+	if (intf->internal_phy)
+		intf->ndev->phydev->irq = PHY_MAC_INTERRUPT;
 
-	umac_reset(intf);
-
-	umac_init(intf);
+	phydev->mac_managed_pm = true;
 
-	umac_set_hw_addr(intf, dev->dev_addr);
+	return 0;
+}
 
-	intf->old_duplex = -1;
-	intf->old_link = -1;
-	intf->old_pause = -1;
+static void bcmasp_netif_init(struct net_device *dev)
+{
+	struct bcmasp_intf *intf = netdev_priv(dev);
 
 	bcmasp_init_tx(intf);
 	netif_napi_add_tx(intf->ndev, &intf->tx_napi, bcmasp_tx_poll);
@@ -1132,18 +1108,13 @@ static int bcmasp_netif_init(struct net_device *dev, bool phy_connect)
 
 	intf->crc_fwd = !!(umac_rl(intf, UMC_CMD) & UMC_CMD_CRC_FWD);
 
-	bcmasp_netif_start(dev);
-
-	netif_start_queue(dev);
-
-	return 0;
+	bcmasp_set_rx_mode(dev);
+	napi_enable(&intf->tx_napi);
+	napi_enable(&intf->rx_napi);
 
-err_phy_disable:
-	if (intf->internal_phy)
-		bcmasp_ephy_enable_set(intf, false);
-	else
-		bcmasp_rgmii_mode_en_set(intf, false);
-	return ret;
+	bcmasp_enable_rx_irq(intf, 1);
+	bcmasp_enable_tx_irq(intf, 1);
+	bcmasp_enable_phy_irq(intf, 1);
 }
 
 static int bcmasp_open(struct net_device *dev)
@@ -1161,14 +1132,30 @@ static int bcmasp_open(struct net_device *dev)
 	if (ret)
 		goto err_free_mem;
 
-	ret = bcmasp_netif_init(dev, true);
-	if (ret) {
-		clk_disable_unprepare(intf->parent->clk);
-		goto err_free_mem;
-	}
+	bcmasp_core_clock_set_intf(intf, true);
+
+	bcmasp_phy_hw_prepare(intf);
+
+	ret = bcmasp_phy_attach(intf);
+	if (ret)
+		goto err_phy_attach;
+
+	umac_reset_and_init(intf, dev->dev_addr);
+
+	dev->phydev->eee_cfg.tx_lpi_timer = umac_rl(intf, UMC_EEE_LPI_TIMER);
+
+	bcmasp_netif_init(dev);
+
+	phy_start(dev->phydev);
+
+	netif_start_queue(dev);
 
 	return ret;
 
+err_phy_attach:
+	bcmasp_phy_hw_unprepare(intf);
+	bcmasp_core_clock_set_intf(intf, false);
+	clk_disable_unprepare(intf->parent->clk);
 err_free_mem:
 	bcmasp_reclaim_free_buffers(intf);
 
@@ -1407,10 +1394,7 @@ int bcmasp_interface_suspend(struct bcmasp_intf *intf)
 	bcmasp_netif_deinit(dev);
 
 	if (!intf->wolopts) {
-		if (intf->internal_phy)
-			bcmasp_ephy_enable_set(intf, false);
-		else
-			bcmasp_rgmii_mode_en_set(intf, false);
+		bcmasp_phy_hw_unprepare(intf);
 
 		/* If Wake-on-LAN is disabled, we can safely
 		 * disable the network interface clocks.
@@ -1454,17 +1438,19 @@ int bcmasp_interface_resume(struct bcmasp_intf *intf)
 	if (ret)
 		return ret;
 
-	ret = bcmasp_netif_init(dev, false);
-	if (ret)
-		goto out;
+	bcmasp_core_clock_set_intf(intf, true);
 
 	bcmasp_resume_from_wol(intf);
 
+	bcmasp_phy_hw_prepare(intf);
+
+	umac_reset_and_init(intf, dev->dev_addr);
+
+	bcmasp_netif_init(dev);
+
+	phy_start(dev->phydev);
+
 	netif_device_attach(dev);
 
 	return 0;
-
-out:
-	clk_disable_unprepare(intf->parent->clk);
-	return ret;
 }
-- 
2.34.1


^ permalink raw reply related

* [net-next v39] mctp pcc: Implement MCTP over PCC Transport
From: Adam Young @ 2026-05-06 21:43 UTC (permalink / raw)
  To: Jeremy Kerr, Matt Johnston, Andrew Lunn, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: netdev, linux-kernel, Sudeep Holla, Jonathan Cameron, Huisong Li

Implementation of network driver for
Management Component Transport Protocol(MCTP)
over Platform Communication Channel(PCC)

DMTF DSP:0292
Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf

The transport mechanism is called Platform Communication Channels (PCC)
is part of the ACPI spec:

Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html

The PCC mechanism is managed via a mailbox implemented at
drivers/mailbox/pcc.c

MCTP devices are specified via ACPI by entries in DSDT/SSDT and
reference channels specified in the PCCT. Messages are sent on a type
3 and received on a type 4 channel.  Communication with other devices
use the PCC based doorbell mechanism; a shared memory segment with a
corresponding interrupt and a memory register used to trigger remote
interrupts.

The shared buffer must be at least 68 bytes long as that is the minimum
MTU as defined by the MCTP specification.

Unlike the existing PCC Type 2 based drivers, the mssg parameter to
mbox_send_msg is actively used. The data section of the struct sk_buff
that contains the outgoing packet is sent to the mailbox, already
properly formatted as a PCC exctended message.

If the mailbox ring buffer is full, the driver stops the incoming
packet queues until a message has been sent, freeing space in the
ring buffer.

When the Type 3 channel outbox receives a txdone response interrupt,
it consumes the outgoing sk_buff, allowing it to be freed.

Bringing up an interface creates the channel between the network driver
and the mailbox driver. This enables communication with the remote
endpoint, to include the receipt of new messages. Bringing down an
interface removes the channel, and no new messages can be delivered.
Stopping the interface will leave any packets that are cached in the
mailbox ringbuffer. They cannot safely be freed until the PCC mailbox
attempts to deliver them and has removed them from the ring buffer.

PCC is based on a shared buffer and a set of I/O mapped memory locations
that the Spec calls registers.  This mechanism exists regardless of the
existence of the driver. If the user has the ability to map these
physical location to virtual locations, they have the ability to drive the
hardware.  Thus, there is a security aspect to this mechanism that extends
beyond the responsibilities of the operating system.

If the hardware does not expose the PCC in the ACPI table, this device
will never be enabled. Thus it is only an issue on hardware that does
support PCC. In that case, it is up to the remote controller to sanitize
communication; MCTP will be exposed as a socket interface, and userland
can send any crafted packet it wants. It would also be incumbent on
the hardware manufacturer to allow the end user to disable MCTP over PCC
communication if they did not want to expose it.

Link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/14_Platform_Communications_Channel/Platform_Comm_Channel.html
Signed-off-by: Adam Young <admiyo@os.amperecomputing.com>

---

Several rounds of code review have gone into this update.  The changes
are listed roughly in the order they were made.  Additional issues have
been found but require changes in the mailbox/pcc.c and the
corresponding header file.  These will be submitted in a separate series
of reviews.  The driver works without those changes, and any issues are
pre-existing in all PCC extended memory drivers.

Previous Version: https://lore.kernel.org/lkml/20260405180741.1496198-1-admiyo@os.amperecomputing.com/

Changes from Previous version

check that returned message length is smaller than the size of the shared buffer.
remove space after // in mctp_pcc_tx
don't say ndo in function names
don't use capital letter in MTU function name

move dev_dstats_tx_add outside of irq processing
Note AI Generated Code Review comment which seems legit:
        mctp_pcc_tx() runs in process or softirq context and calls
        dev_dstats_tx_dropped(). mctp_pcc_tx_done() runs in the controller's hardirq
        context and calls dev_dstats_tx_add().
        The generic dev_dstats_* macros use u64_stats_update_begin(), which does not
        disable interrupts. If a hardware interrupt fires while the softirq path is
        inside u64_stats_update_begin(), the hardirq path will re-enter the lock.

while technically there are still possible ways that the SKB send might fail
	we cannot report them any later than prep_tx.  In tx complete,
	we are in interupt context and it is not safe to update the stats then.

mctp-pcc use little-endian conversion for shared buffer header values
explain why we test for a null SKB in mctp_pcc_tx_prepare and warn once
Deal with fragmented outbound packets
Cleanup the active message when tearing down the driver
        the active message may have been sent to the backend just prior to clean up
        we want to free it during cleanup, and thus make sure that the null pointer
        set as the active message is not passed to the callback if the message
        response interrupt gets triggered.

return PTR_ERR and EINVAL for initialization error cases
Fail initialization if there are not exactly 2 channels: an outbox and an inbox
don't advance msg_free when freeing skbs
use PCC_EXTRA_LEN  to clarify and standardize length comparisons between value from buffer and size of shmem.
ensure the shared buffer is large enough to hold MCTP-PCC messages
ensure there is an active message when clearing the ring buffer
start teardown process by disabling rx
        Avoid a race condition where the teardown
        process frees the active req, but an interupt
        resets it, leaking memory.  Ensure active_req
        stays clear By closing the channel first.

CONFIG option to ensure mctp is 64bit only
compare inbox MTU with Outbox MTU and take the smaller value
add mctp_pcc_change_mtu
use ACPI constant for u32 data type
default case for switch in testing PCC channels
---
 MAINTAINERS                 |   5 +
 drivers/net/mctp/Kconfig    |  15 ++
 drivers/net/mctp/Makefile   |   1 +
 drivers/net/mctp/mctp-pcc.c | 427 ++++++++++++++++++++++++++++++++++++
 4 files changed, 448 insertions(+)
 create mode 100644 drivers/net/mctp/mctp-pcc.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cb0f2bf6fe2e..7a064e326f0a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15398,6 +15398,11 @@ F:	include/net/mctpdevice.h
 F:	include/net/netns/mctp.h
 F:	net/mctp/
 
+MANAGEMENT COMPONENT TRANSPORT PROTOCOL (MCTP) over PCC (MCTP-PCC) Driver
+M:	Adam Young <admiyo@os.amperecomputing.com>
+S:	Maintained
+F:	drivers/net/mctp/mctp-pcc.c
+
 MAPLE TREE
 M:	Liam R. Howlett <Liam.Howlett@oracle.com>
 R:	Alice Ryhl <aliceryhl@google.com>
diff --git a/drivers/net/mctp/Kconfig b/drivers/net/mctp/Kconfig
index cf325ab0b1ef..9c1d228e5ad7 100644
--- a/drivers/net/mctp/Kconfig
+++ b/drivers/net/mctp/Kconfig
@@ -47,6 +47,21 @@ config MCTP_TRANSPORT_I3C
 	  A MCTP protocol network device is created for each I3C bus
 	  having a "mctp-controller" devicetree property.
 
+config MCTP_TRANSPORT_PCC
+	tristate "MCTP PCC transport"
+	depends on ACPI
+	depends on PCC
+	depends on 64BIT
+	help
+	  Provides a driver to access MCTP devices over PCC transport,
+	  A MCTP protocol network device is created via ACPI for each
+	  entry in the DSDT/SSDT that matches the identifier. The Platform
+	  communication channels are selected from the corresponding
+	  entries in the PCCT.
+
+	  Say y here if you need to connect to MCTP endpoints over PCC. To
+	  compile as a module, use m; the module will be called mctp-pcc.
+
 config MCTP_TRANSPORT_USB
 	tristate "MCTP USB transport"
 	depends on USB
diff --git a/drivers/net/mctp/Makefile b/drivers/net/mctp/Makefile
index c36006849a1e..0a591299ffa9 100644
--- a/drivers/net/mctp/Makefile
+++ b/drivers/net/mctp/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_MCTP_SERIAL) += mctp-serial.o
 obj-$(CONFIG_MCTP_TRANSPORT_I2C) += mctp-i2c.o
 obj-$(CONFIG_MCTP_TRANSPORT_I3C) += mctp-i3c.o
+obj-$(CONFIG_MCTP_TRANSPORT_PCC) += mctp-pcc.o
 obj-$(CONFIG_MCTP_TRANSPORT_USB) += mctp-usb.o
diff --git a/drivers/net/mctp/mctp-pcc.c b/drivers/net/mctp/mctp-pcc.c
new file mode 100644
index 000000000000..de644c188f3e
--- /dev/null
+++ b/drivers/net/mctp/mctp-pcc.c
@@ -0,0 +1,427 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * mctp-pcc.c - Driver for MCTP over PCC.
+ * Copyright (c) 2024-2026, Ampere Computing LLC
+ *
+ */
+
+/* Implementation of MCTP over PCC DMTF Specification DSP0256
+ * https://www.dmtf.org/sites/default/files/standards/documents/DSP0292_1.0.0WIP50.pdf
+ */
+
+#include <linux/acpi.h>
+#include <linux/hrtimer.h>
+#include <linux/if_arp.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/mailbox_client.h>
+#include <linux/module.h>
+#include <linux/netdevice.h>
+#include <linux/platform_device.h>
+#include <linux/skbuff.h>
+#include <linux/string.h>
+
+#include <acpi/acpi_bus.h>
+#include <acpi/acpi_drivers.h>
+#include <acpi/acrestyp.h>
+#include <acpi/actbl.h>
+#include <acpi/pcc.h>
+#include <net/mctp.h>
+#include <net/mctpdevice.h>
+
+#define MCTP_SIGNATURE          "MCTP"
+#define MCTP_SIGNATURE_LENGTH   (sizeof(MCTP_SIGNATURE) - 1)
+#define MCTP_MIN_MTU            68
+#define PCC_HEADER_SIZE         sizeof(struct acpi_pcct_ext_pcc_shared_memory)
+#define MCTP_PCC_MIN_SIZE       (PCC_HEADER_SIZE + MCTP_MIN_MTU)
+#define PCC_EXTRA_LEN           (PCC_HEADER_SIZE - sizeof(pcc_header.command))
+struct mctp_pcc_mailbox {
+	u32 index;
+	struct pcc_mbox_chan *chan;
+	struct mbox_client client;
+};
+
+/* The netdev structure. One of these per PCC adapter. */
+struct mctp_pcc_ndev {
+	struct net_device *ndev;
+	struct acpi_device *acpi_device;
+	struct mctp_pcc_mailbox inbox;
+	struct mctp_pcc_mailbox outbox;
+};
+
+static void mctp_pcc_client_rx_callback(struct mbox_client *cl, void *mssg)
+{
+	struct acpi_pcct_ext_pcc_shared_memory pcc_header;
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct mctp_pcc_mailbox *inbox;
+	struct mctp_skb_cb *cb;
+	struct sk_buff *skb;
+	u32 header_length;
+	int size;
+
+	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, inbox.client);
+	inbox = &mctp_pcc_ndev->inbox;
+	memcpy_fromio(&pcc_header, inbox->chan->shmem, sizeof(pcc_header));
+
+	// The message must at least have the PCC command indicating it is an MCTP
+	// message followed by the MCTP header, or we have a malformed message.
+	// This may be run on big endian system, but the data in the buffer is
+	// explicitly little endian.
+	header_length = le32_to_cpu(pcc_header.length);
+	if (header_length < sizeof(pcc_header.command) + sizeof(struct mctp_hdr)) {
+		dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+		return;
+	}
+	// If the reported size is larger than the shared memory minus headers,
+	// something is wrong and treat the buffer as corrupted data.
+	if (header_length > inbox->chan->shmem_size - PCC_EXTRA_LEN) {
+		dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+		return;
+	}
+	if (memcmp(&pcc_header.command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH) != 0) {
+		dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+		return;
+	}
+
+	size = header_length + PCC_EXTRA_LEN;
+	skb = netdev_alloc_skb(mctp_pcc_ndev->ndev, size);
+	if (!skb) {
+		dev_dstats_rx_dropped(mctp_pcc_ndev->ndev);
+		return;
+	}
+	skb_put(skb, size);
+	skb->protocol = htons(ETH_P_MCTP);
+	memcpy_fromio(skb->data, inbox->chan->shmem, size);
+	dev_dstats_rx_add(mctp_pcc_ndev->ndev, size);
+	skb_pull(skb, sizeof(pcc_header));
+	skb_reset_mac_header(skb);
+	skb_reset_network_header(skb);
+	cb = __mctp_cb(skb);
+	cb->halen = 0;
+	netif_rx(skb);
+}
+
+static netdev_tx_t mctp_pcc_tx(struct sk_buff *skb, struct net_device *ndev)
+{
+	struct acpi_pcct_ext_pcc_shared_memory *pcc_header;
+	struct mctp_pcc_ndev *mpnd = netdev_priv(ndev);
+	int len = skb->len;
+
+	/* Consolidated a fragmented packet into contiguous memory */
+	if (skb_is_nonlinear(skb)) {
+		if (skb_linearize(skb))
+			goto error;
+	}
+
+	if (skb_cow_head(skb, sizeof(*pcc_header)))
+		goto error;
+
+	/**
+	 * This code could potentially be run on A Big Endian
+	 * System.  The ACPI specification requires that values
+	 * in the shared buffer be little endian.
+	 */
+	pcc_header = skb_push(skb, sizeof(*pcc_header));
+	pcc_header->signature = cpu_to_le32(PCC_SIGNATURE | mpnd->outbox.index);
+	pcc_header->flags = cpu_to_le32(PCC_CMD_COMPLETION_NOTIFY);
+	memcpy(&pcc_header->command, MCTP_SIGNATURE, MCTP_SIGNATURE_LENGTH);
+	pcc_header->length =  cpu_to_le32(len + MCTP_SIGNATURE_LENGTH);
+
+	if (mbox_send_message(mpnd->outbox.chan->mchan, skb) < 0) {
+		// Remove the header in case it gets sent again
+		skb_pull(skb, sizeof(*pcc_header));
+		netif_stop_queue(ndev);
+		return NETDEV_TX_BUSY;
+	}
+
+	return NETDEV_TX_OK;
+error:
+	dev_dstats_tx_dropped(ndev);
+	kfree_skb(skb);
+	return NETDEV_TX_OK;
+}
+
+static void mctp_pcc_tx_prepare(struct mbox_client *cl, void *mssg)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct mctp_pcc_mailbox *outbox;
+	struct sk_buff *skb = mssg;
+
+	mctp_pcc_ndev = container_of(cl, struct mctp_pcc_ndev, outbox.client);
+	outbox = &mctp_pcc_ndev->outbox;
+
+	/* The PCC Mailbox typically does not make use of the mssg pointer
+	 * The mctp-over pcc driver is the only client that uses it.
+	 * This value should always be non-null; it is possible
+	 * that a change in the Mailbox level will break that assumption.
+	 */
+	if (!skb) {
+		WARN_ONCE(!skb, "%s called with null message.\n", __func__);
+		return;
+	}
+
+	if (skb->len > outbox->chan->shmem_size) {
+		dev_dstats_tx_dropped(mctp_pcc_ndev->ndev);
+		return;
+	}
+	memcpy_toio(outbox->chan->shmem,  skb->data, skb->len);
+	dev_dstats_tx_add(mctp_pcc_ndev->ndev, skb->len);
+}
+
+static void mctp_pcc_tx_done(struct mbox_client *c, void *mssg, int r)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct sk_buff *skb = mssg;
+
+	/*
+	 * If there is a packet in flight during driver cleanup
+	 * It may have been freed already.
+	 */
+	if (!mssg)
+		return;
+	/*
+	 * If the return code is non-zero, we should not report the packet
+	 * as transmitted.  However, we are in IRQ context right now, and we
+	 * cannot safely write transmission statistics.
+	 */
+	mctp_pcc_ndev = container_of(c, struct mctp_pcc_ndev, outbox.client);
+	dev_consume_skb_any(skb);
+	netif_wake_queue(mctp_pcc_ndev->ndev);
+}
+
+static int mctp_pcc_open(struct net_device *ndev)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev = netdev_priv(ndev);
+	struct mctp_pcc_mailbox *outbox, *inbox;
+
+	outbox = &mctp_pcc_ndev->outbox;
+	inbox = &mctp_pcc_ndev->inbox;
+
+	outbox->chan = pcc_mbox_request_channel(&outbox->client, outbox->index);
+	if (IS_ERR(outbox->chan))
+		return PTR_ERR(outbox->chan);
+
+	if (outbox->chan->shmem_size  <  MCTP_PCC_MIN_SIZE) {
+		pcc_mbox_free_channel(outbox->chan);
+		return -EINVAL;
+	}
+
+	inbox->client.rx_callback = mctp_pcc_client_rx_callback;
+	inbox->chan = pcc_mbox_request_channel(&inbox->client, inbox->index);
+	if (IS_ERR(inbox->chan)) {
+		pcc_mbox_free_channel(outbox->chan);
+		return PTR_ERR(inbox->chan);
+	}
+	if (inbox->chan->shmem_size < MCTP_PCC_MIN_SIZE) {
+		pcc_mbox_free_channel(outbox->chan);
+		pcc_mbox_free_channel(inbox->chan);
+		return -EINVAL;
+	}
+	return 0;
+}
+
+static int mctp_pcc_stop(struct net_device *ndev)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	unsigned int count, idx;
+	struct mbox_chan *chan;
+	struct sk_buff *skb;
+
+	mctp_pcc_ndev = netdev_priv(ndev);
+	chan = mctp_pcc_ndev->outbox.chan->mchan;
+	pcc_mbox_free_channel(mctp_pcc_ndev->inbox.chan);
+	scoped_guard(spinlock_irqsave, &chan->lock) {
+		skb = chan->active_req;
+		chan->active_req = NULL;
+		if (skb) {
+			dev_dstats_tx_dropped(ndev);
+			dev_consume_skb_any(skb);
+		}
+		while (chan->msg_count > 0) {
+			count = chan->msg_count;
+			idx = chan->msg_free;
+			if (idx >= count)
+				idx -= count;
+			else
+				idx += MBOX_TX_QUEUE_LEN - count;
+			skb = chan->msg_data[idx];
+			dev_dstats_tx_dropped(ndev);
+			dev_consume_skb_any(skb);
+			chan->msg_count--;
+		}
+	}
+	pcc_mbox_free_channel(mctp_pcc_ndev->outbox.chan);
+	return 0;
+}
+
+static int mctp_pcc_change_mtu(struct net_device *dev, int new_mtu)
+{
+	if (new_mtu > dev->max_mtu)
+		return -EINVAL;
+	if (new_mtu < MCTP_MIN_MTU)
+		return -EINVAL;
+	return 0;
+}
+
+static const struct net_device_ops mctp_pcc_netdev_ops = {
+	.ndo_open = mctp_pcc_open,
+	.ndo_stop = mctp_pcc_stop,
+	.ndo_start_xmit = mctp_pcc_tx,
+	.ndo_change_mtu = mctp_pcc_change_mtu,
+};
+
+static void mctp_pcc_setup(struct net_device *ndev)
+{
+	ndev->type = ARPHRD_MCTP;
+	ndev->hard_header_len = sizeof(struct acpi_pcct_ext_pcc_shared_memory);
+	ndev->tx_queue_len = 0;
+	ndev->flags = IFF_NOARP;
+	ndev->netdev_ops = &mctp_pcc_netdev_ops;
+	ndev->needs_free_netdev = true;
+	ndev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
+}
+
+struct mctp_pcc_lookup_context {
+	int index;
+	u32 inbox_index;
+	u32 outbox_index;
+};
+
+static acpi_status lookup_pcct_indices(struct acpi_resource *ares,
+				       void *context)
+{
+	struct mctp_pcc_lookup_context *luc = context;
+	struct acpi_resource_address32 *addr;
+
+	if (ares->type != ACPI_RESOURCE_TYPE_ADDRESS32)
+		return AE_OK;
+
+	addr = ACPI_CAST_PTR(struct acpi_resource_address32, &ares->data);
+	switch (luc->index) {
+	case 0:
+		luc->outbox_index = addr[0].address.minimum;
+		break;
+	case 1:
+		luc->inbox_index = addr[0].address.minimum;
+		break;
+	default:
+		return AE_ERROR;
+	}
+	luc->index++;
+	return AE_OK;
+}
+
+static void mctp_cleanup_netdev(void *data)
+{
+	struct net_device *ndev = data;
+
+	mctp_unregister_netdev(ndev);
+}
+
+static int initialize_mtu(struct net_device *ndev)
+{
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct mctp_pcc_mailbox *outbox;
+	struct pcc_mbox_chan *pchan;
+	int mctp_pcc_max_mtu;
+
+	mctp_pcc_ndev = netdev_priv(ndev);
+	outbox = &mctp_pcc_ndev->outbox;
+	pchan = pcc_mbox_request_channel(&outbox->client, outbox->index);
+	if (IS_ERR(pchan))
+		return PTR_ERR(pchan);
+	if (pchan->shmem_size < MCTP_MIN_MTU + sizeof(struct acpi_pcct_ext_pcc_shared_memory)) {
+		pcc_mbox_free_channel(pchan);
+		return -EINVAL;
+	}
+	mctp_pcc_max_mtu = pchan->shmem_size - sizeof(struct acpi_pcct_ext_pcc_shared_memory);
+	pcc_mbox_free_channel(pchan);
+
+	ndev->mtu = MCTP_MIN_MTU;
+	ndev->max_mtu = mctp_pcc_max_mtu;
+	ndev->min_mtu = MCTP_MIN_MTU;
+
+	return 0;
+}
+
+static int mctp_pcc_driver_add(struct acpi_device *acpi_dev)
+{
+	struct mctp_pcc_lookup_context context = {0};
+	struct mctp_pcc_ndev *mctp_pcc_ndev;
+	struct device *dev = &acpi_dev->dev;
+	struct net_device *ndev;
+	acpi_handle dev_handle;
+	acpi_status status;
+	char name[32];
+	int rc;
+
+	dev_dbg(dev, "Adding mctp_pcc device for HID %s\n",
+		acpi_device_hid(acpi_dev));
+	dev_handle = acpi_device_handle(acpi_dev);
+	status = acpi_walk_resources(dev_handle, "_CRS", lookup_pcct_indices,
+				     &context);
+	if (!ACPI_SUCCESS(status)) {
+		dev_err(dev, "FAILED to lookup PCC indexes from CRS\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Ensure we have exactly 2 channels: an outbox and an inbox.
+	 */
+	if (context.index != 2)
+		return -EINVAL;
+
+	snprintf(name, sizeof(name), "mctppcc%d", context.inbox_index);
+	ndev = alloc_netdev(sizeof(*mctp_pcc_ndev), name, NET_NAME_PREDICTABLE,
+			    mctp_pcc_setup);
+	if (!ndev)
+		return -ENOMEM;
+
+	mctp_pcc_ndev = netdev_priv(ndev);
+
+	mctp_pcc_ndev->inbox.index = context.inbox_index;
+	mctp_pcc_ndev->inbox.client.dev = dev;
+	mctp_pcc_ndev->outbox.index = context.outbox_index;
+	mctp_pcc_ndev->outbox.client.dev = dev;
+
+	mctp_pcc_ndev->outbox.client.tx_prepare = mctp_pcc_tx_prepare;
+	mctp_pcc_ndev->outbox.client.tx_done = mctp_pcc_tx_done;
+	mctp_pcc_ndev->acpi_device = acpi_dev;
+	mctp_pcc_ndev->ndev = ndev;
+	acpi_dev->driver_data = mctp_pcc_ndev;
+
+	rc = initialize_mtu(ndev);
+	if (rc)
+		goto free_netdev;
+
+	rc = mctp_register_netdev(ndev, NULL, MCTP_PHYS_BINDING_PCC);
+	if (rc)
+		goto free_netdev;
+
+	return devm_add_action_or_reset(dev, mctp_cleanup_netdev, ndev);
+free_netdev:
+	free_netdev(ndev);
+	return rc;
+}
+
+static const struct acpi_device_id mctp_pcc_device_ids[] = {
+	{ "DMT0001" },
+	{}
+};
+
+static struct acpi_driver mctp_pcc_driver = {
+	.name = "mctp_pcc",
+	.class = "Unknown",
+	.ids = mctp_pcc_device_ids,
+	.ops = {
+		.add = mctp_pcc_driver_add,
+	},
+};
+
+module_acpi_driver(mctp_pcc_driver);
+
+MODULE_DEVICE_TABLE(acpi, mctp_pcc_device_ids);
+
+MODULE_DESCRIPTION("MCTP PCC ACPI device");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Adam Young <admiyo@os.amperecomputing.com>");
-- 
2.43.0


^ permalink raw reply related

* Re: [PATCH net-next 09/12] gpio: tc956x: add TC956x/QPS615 support
From: Andrew Lunn @ 2026-05-06 21:43 UTC (permalink / raw)
  To: Alex Elder
  Cc: andrew+netdev, davem, edumazet, kuba, pabeni, maxime.chevallier,
	rmk+kernel, andersson, konradybcio, robh, krzk+dt, conor+dt,
	linusw, brgl, arnd, gregkh, daniel, mohd.anwar, a0987203069,
	alexandre.torgue, ast, boon.khai.ng, chenchuangyu, chenhuacai,
	daniel, hawk, hkallweit1, inochiama, john.fastabend, julianbraha,
	livelycarpet87, matthew.gerlach, mcoquelin.stm32, me,
	prabhakar.mahadev-lad.rj, richardcochran, rohan.g.thomas, sdf,
	siyanteng, weishangjuan, wens, netdev, bpf, linux-arm-msm,
	devicetree, linux-gpio, linux-stm32, linux-arm-kernel,
	linux-kernel
In-Reply-To: <79684efa-4ba9-4144-a99b-dab935007a2f@riscstar.com>

> To be clear, the reason you're asking is that you're suggesting
> we might want to model the GPIO controller differently, correct?

Correct.

> I.e., model it as *not* associated with the embedded PCIe
> functions.  Then we need to think about what its parent device
> would be (the power control device, which I think somehow
> duplicates the switch device?).

Logically, the GPIO controller cannot be part of a downstream
function, if you need to use the GPIO controller to turn the
downstream function on.

Logically, the GPIO controller needs to be above the switch downstream
end points. Where above, i don't know. Which is why i was asking about
where it appears in the address spaces.

But i also don't know too much about PCI, i'm used to SoCs with simple
linear MMIO.

From the little i know, it is more than what address space does the
GPIO appear in. Its also, what enumerable entity does it appear in
within the PCI bus. Because its the enumeration which is going to
trigger a driver load, which can then drive the GPIO controller.

Or, something more radical, you make the PCIe power controller an MFD,
instantiating both the power controller and a GPIO controller over the
I2C bus. GPIO access will not be as fast, but is there anything here
which needs to be fast?

      Andrew

^ permalink raw reply

* Re: [Patch net-next v1 2/7] r8169: add support for multi rx queues
From: Heiner Kallweit @ 2026-05-06 21:45 UTC (permalink / raw)
  To: javen, nic_swsd, andrew+netdev, davem, edumazet, kuba, pabeni,
	horms
  Cc: netdev, linux-kernel
In-Reply-To: <20260506081326.767-3-javen_xu@realsil.com.cn>

On 06.05.2026 10:13, javen wrote:
> From: Javen Xu <javen_xu@realsil.com.cn>
> 
> This patch adds support for multi rx queues. RSS requires multi rx
> queues to receive packets. So we need struct rtl8169_rx_ring for each
> queue.
> 
> Signed-off-by: Javen Xu <javen_xu@realsil.com.cn>
> ---
>  drivers/net/ethernet/realtek/r8169_main.c | 318 +++++++++++++++++-----
>  1 file changed, 251 insertions(+), 67 deletions(-)
> 
> diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
> index ef74ee02c117..bc75dbb9901d 100644
> --- a/drivers/net/ethernet/realtek/r8169_main.c
> +++ b/drivers/net/ethernet/realtek/r8169_main.c
> @@ -74,10 +74,11 @@
>  #define NUM_TX_DESC	256	/* Number of Tx descriptor registers */
>  #define NUM_RX_DESC	256	/* Number of Rx descriptor registers */
>  #define R8169_TX_RING_BYTES	(NUM_TX_DESC * sizeof(struct TxDesc))
> -#define R8169_RX_RING_BYTES	(NUM_RX_DESC * sizeof(struct RxDesc))
>  #define R8169_TX_STOP_THRS	(MAX_SKB_FRAGS + 1)
>  #define R8169_TX_START_THRS	(2 * R8169_TX_STOP_THRS)
> +#define R8169_MAX_RX_QUEUES	8
>  #define R8169_MAX_MSIX_VEC	32
> +#define R8127_MAX_RX_QUEUES	8

Why two MAX_RX_QUEUES constants with the same value?

>  
>  #define OCP_STD_PHY_BASE	0xa400
>  
> @@ -447,6 +448,7 @@ enum rtl8125_registers {
>  	RSS_CTRL_8125		= 0x4500,
>  	Q_NUM_CTRL_8125		= 0x4800,
>  	EEE_TXIDLE_TIMER_8125	= 0x6048,
> +	RDSAR_Q1_LOW		= 0x4000,

Better sort register ids by register number?

>  };
>  
>  #define LEDSEL_MASK_8125	0x23f
> @@ -731,6 +733,19 @@ enum rtl_dash_type {
>  	RTL_DASH_25_BP,
>  };
>  
> +struct rtl8169_rx_ring {
> +	u32 index;					/* Rx queue index */
> +	u32 cur_rx;					/* Index of next Rx pkt. */
> +	u32 dirty_rx;					/* Index for recycling. */
> +	u32 num_rx_desc;				/* num of Rx desc */
> +	struct RxDesc *rx_desc_array;			/* array of Rx Desc*/
> +	u32 rx_desc_alloc_size;				/* memory size per descs of ring */
> +	dma_addr_t rx_desc_phy_addr[NUM_RX_DESC];	/* Rx data buffer physical dma address */
> +	dma_addr_t rx_phy_addr;				/* Rx desc physical address */
> +	struct page *rx_databuff[NUM_RX_DESC];		/* Rx data buffers */
> +	u16 rdsar_reg;					/* Receive Descriptor Start Address */
> +};
> +
>  struct rtl8169_napi {
>  	struct napi_struct napi;
>  	void *priv;
> @@ -744,6 +759,13 @@ struct rtl8169_irq {
>  	char		name[IFNAMSIZ + 10];
>  };
>  
> +enum rx_desc_ring_type {
> +	RX_DESC_RING_TYPE_UNKNOWN = 0,
> +	RX_DESC_RING_TYPE_DEFAULT,
> +	RX_DESC_RING_TYPE_RSS,
> +	RX_DESC_RING_TYPE_MAX

UNKNOWN and MAX seem to never be used in this series.
So do we need them?

> +};
> +
>  struct rtl8169_private {
>  	void __iomem *mmio_addr;	/* memory map physical address */
>  	struct pci_dev *pci_dev;
> @@ -752,28 +774,28 @@ struct rtl8169_private {
>  	struct napi_struct napi;
>  	enum mac_version mac_version;
>  	enum rtl_dash_type dash_type;
> -	u32 cur_rx; /* Index into the Rx descriptor buffer of next Rx pkt. */
>  	u32 cur_tx; /* Index into the Tx descriptor buffer of next Rx pkt. */
>  	u32 dirty_tx;
>  	struct TxDesc *TxDescArray;	/* 256-aligned Tx descriptor ring */
> -	struct RxDesc *RxDescArray;	/* 256-aligned Rx descriptor ring */
>  	dma_addr_t TxPhyAddr;
> -	dma_addr_t RxPhyAddr;
> -	struct page *Rx_databuff[NUM_RX_DESC];	/* Rx data buffers */
>  	struct ring_info tx_skb[NUM_TX_DESC];	/* Tx data buffers */
>  	struct rtl8169_irq irq_tbl[R8169_MAX_MSIX_VEC];
>  	struct rtl8169_napi r8169napi[R8169_MAX_MSIX_VEC];
> +	struct rtl8169_rx_ring rx_ring[R8169_MAX_RX_QUEUES];
>  	u16 isr_reg[R8169_MAX_MSIX_VEC];
>  	u16 imr_reg[R8169_MAX_MSIX_VEC];
>  	unsigned int num_rx_rings;
>  	u16 cp_cmd;
>  	u16 tx_lpi_timer;
>  	u32 irq_mask;
> +	u16 hw_supp_num_rx_queues;
>  	u8 min_irq_nvecs;
>  	u8 max_irq_nvecs;
>  	u8 hw_supp_isr_ver;
>  	u8 hw_curr_isr_ver;
>  	u8 irq_nvecs;
> +	u8 init_rx_desc_type;
> +	u8 recheck_desc_ownbit;

This seems to be a flag. Then why type u8?

>  	int irq;
>  	struct clk *clk;
>  
> @@ -2647,9 +2669,27 @@ static void rtl_init_rxcfg(struct rtl8169_private *tp)
>  	}
>  }
>  
> +static void rtl8169_rx_desc_init(struct rtl8169_private *tp)
> +{
> +	for (int i = 0; i < tp->num_rx_rings; i++) {
> +		struct rtl8169_rx_ring *ring = &tp->rx_ring[i];
> +
> +		memset(ring->rx_desc_array, 0x0, ring->rx_desc_alloc_size);
> +	}
> +}
> +
>  static void rtl8169_init_ring_indexes(struct rtl8169_private *tp)
>  {
> -	tp->dirty_tx = tp->cur_tx = tp->cur_rx = 0;
> +	tp->dirty_tx = 0;
> +	tp->cur_tx = 0;
> +
> +	for (int i = 0; i < tp->hw_supp_num_rx_queues; i++) {
> +		struct rtl8169_rx_ring *ring = &tp->rx_ring[i];
> +
> +		ring->dirty_rx = 0;
> +		ring->cur_rx = 0;
> +		ring->index = i;
> +	}
>  }
>  
>  static void rtl_jumbo_config(struct rtl8169_private *tp)
> @@ -2708,8 +2748,18 @@ static void rtl_hw_reset(struct rtl8169_private *tp)
>  	rtl_loop_wait_low(tp, &rtl_chipcmd_cond, 100, 100);
>  }
>  
> +static void rtl_set_ring_size(struct rtl8169_private *tp, u32 rx_num)
> +{
> +	for (int i = 0; i < tp->hw_supp_num_rx_queues; i++)
> +		tp->rx_ring[i].num_rx_desc = rx_num;
> +}
> +
>  static void rtl_setup_mqs_reg(struct rtl8169_private *tp)
>  {
> +	tp->rx_ring[0].rdsar_reg = RxDescAddrLow;
> +	for (int i = 1; i < tp->hw_supp_num_rx_queues; i++)
> +		tp->rx_ring[i].rdsar_reg = (u16)(RDSAR_Q1_LOW + (i - 1) * 8);

This looks like array rx_ring[] isn't actually needed.

> +
>  	if (tp->mac_version <= RTL_GIGA_MAC_VER_52) {
>  		tp->isr_reg[0] = IntrStatus;
>  		tp->imr_reg[0] = IntrMask;
> @@ -2733,17 +2783,21 @@ static void rtl_software_parameter_initialize(struct rtl8169_private *tp)
>  	case RTL_GIGA_MAC_VER_80:
>  		tp->min_irq_nvecs = 1;
>  		tp->max_irq_nvecs = 1;
> +		tp->hw_supp_num_rx_queues = R8127_MAX_RX_QUEUES;
>  		tp->hw_supp_isr_ver = 6;
>  		break;
>  	default:
>  		tp->min_irq_nvecs = 1;
>  		tp->max_irq_nvecs = 1;
> +		tp->hw_supp_num_rx_queues = 1;
>  		tp->hw_supp_isr_ver = 1;
>  		break;
>  	}
> +	tp->init_rx_desc_type = RX_DESC_RING_TYPE_DEFAULT;
>  	tp->hw_curr_isr_ver = tp->hw_supp_isr_ver;
>  
>  	rtl_setup_mqs_reg(tp);
> +	rtl_set_ring_size(tp, NUM_RX_DESC);
>  }
>  
>  static void rtl_request_firmware(struct rtl8169_private *tp)
> @@ -2877,8 +2931,13 @@ static void rtl_set_rx_tx_desc_registers(struct rtl8169_private *tp)
>  	 */
>  	RTL_W32(tp, TxDescStartAddrHigh, ((u64) tp->TxPhyAddr) >> 32);
>  	RTL_W32(tp, TxDescStartAddrLow, ((u64) tp->TxPhyAddr) & DMA_BIT_MASK(32));
> -	RTL_W32(tp, RxDescAddrHigh, ((u64) tp->RxPhyAddr) >> 32);
> -	RTL_W32(tp, RxDescAddrLow, ((u64) tp->RxPhyAddr) & DMA_BIT_MASK(32));
> +
> +	for (int i = 0; i < tp->num_rx_rings; i++) {
> +		struct rtl8169_rx_ring *ring = &tp->rx_ring[i];
> +
> +		RTL_W32(tp, ring->rdsar_reg, ((u64)ring->rx_phy_addr) & DMA_BIT_MASK(32));
> +		RTL_W32(tp, ring->rdsar_reg + 4, ((u64)ring->rx_phy_addr >> 32));
> +	}
>  }
>  
>  static void rtl8169_set_magic_reg(struct rtl8169_private *tp)
> @@ -4214,7 +4273,7 @@ static int rtl8169_change_mtu(struct net_device *dev, int new_mtu)
>  	return 0;
>  }
>  
> -static void rtl8169_mark_to_asic(struct RxDesc *desc)
> +static void rtl8169_mark_to_asic_default(struct RxDesc *desc)
>  {
>  	u32 eor = le32_to_cpu(desc->opts1) & RingEnd;
>  
> @@ -4224,13 +4283,19 @@ static void rtl8169_mark_to_asic(struct RxDesc *desc)
>  	WRITE_ONCE(desc->opts1, cpu_to_le32(DescOwn | eor | R8169_RX_BUF_SIZE));
>  }
>  
> +static void rtl8169_mark_to_asic(struct rtl8169_private *tp, struct RxDesc *desc)
> +{
> +	rtl8169_mark_to_asic_default(desc);
> +}
> +
>  static struct page *rtl8169_alloc_rx_data(struct rtl8169_private *tp,
> -					  struct RxDesc *desc)
> +					  struct rtl8169_rx_ring *ring, unsigned int index)
>  {
>  	struct device *d = tp_to_dev(tp);
>  	int node = dev_to_node(d);
>  	dma_addr_t mapping;
>  	struct page *data;
> +	struct RxDesc *desc = ring->rx_desc_array + index;
>  
>  	data = alloc_pages_node(node, GFP_KERNEL, get_order(R8169_RX_BUF_SIZE));
>  	if (!data)
> @@ -4244,55 +4309,111 @@ static struct page *rtl8169_alloc_rx_data(struct rtl8169_private *tp,
>  	}
>  
>  	desc->addr = cpu_to_le64(mapping);
> -	rtl8169_mark_to_asic(desc);
> +	ring->rx_desc_phy_addr[index] = mapping;
> +	rtl8169_mark_to_asic(tp, desc);
>  
>  	return data;
>  }
>  
> -static void rtl8169_rx_clear(struct rtl8169_private *tp)
> +static void rtl8169_rx_clear(struct rtl8169_private *tp, struct rtl8169_rx_ring *ring)
>  {
>  	int i;
>  
> -	for (i = 0; i < NUM_RX_DESC && tp->Rx_databuff[i]; i++) {
> +	for (i = 0; i < NUM_RX_DESC && ring->rx_databuff[i]; i++) {
>  		dma_unmap_page(tp_to_dev(tp),
> -			       le64_to_cpu(tp->RxDescArray[i].addr),
> +			       ring->rx_desc_phy_addr[i],
>  			       R8169_RX_BUF_SIZE, DMA_FROM_DEVICE);
> -		__free_pages(tp->Rx_databuff[i], get_order(R8169_RX_BUF_SIZE));
> -		tp->Rx_databuff[i] = NULL;
> -		tp->RxDescArray[i].addr = 0;
> -		tp->RxDescArray[i].opts1 = 0;
> +		__free_pages(ring->rx_databuff[i], get_order(R8169_RX_BUF_SIZE));
> +		ring->rx_databuff[i] = NULL;
> +		ring->rx_desc_phy_addr[i] = 0;
> +		ring->rx_desc_array[i].addr = 0;
> +		ring->rx_desc_array[i].opts1 = 0;
>  	}
>  }
>  
> -static int rtl8169_rx_fill(struct rtl8169_private *tp)
> +static void rtl8169_mark_as_last_descriptor_default(struct RxDesc *desc)
> +{
> +	desc->opts1 |= cpu_to_le32(RingEnd);
> +}
> +
> +static void rtl8169_mark_as_last_descriptor(struct rtl8169_private *tp, struct RxDesc *desc)
> +{
> +	rtl8169_mark_as_last_descriptor_default(desc);
> +}
> +

Do we actually need this in this patch?

> +static int rtl8169_rx_fill(struct rtl8169_private *tp, struct rtl8169_rx_ring *ring)
>  {
>  	int i;
>  
>  	for (i = 0; i < NUM_RX_DESC; i++) {
>  		struct page *data;
>  
> -		data = rtl8169_alloc_rx_data(tp, tp->RxDescArray + i);
> +		data = rtl8169_alloc_rx_data(tp, ring, i);
>  		if (!data) {
> -			rtl8169_rx_clear(tp);
> +			rtl8169_rx_clear(tp, ring);
>  			return -ENOMEM;
>  		}
> -		tp->Rx_databuff[i] = data;
> +		ring->rx_databuff[i] = data;
>  	}
>  
>  	/* mark as last descriptor in the ring */
> -	tp->RxDescArray[NUM_RX_DESC - 1].opts1 |= cpu_to_le32(RingEnd);
> +	rtl8169_mark_as_last_descriptor(tp, &ring->rx_desc_array[NUM_RX_DESC - 1]);
> +
> +	return 0;
> +}
> +
> +static int rtl8169_alloc_rx_desc(struct rtl8169_private *tp)
> +{
> +	struct rtl8169_rx_ring *ring;
> +	struct pci_dev *pdev = tp->pci_dev;
>  
> +	for (int i = 0; i < tp->num_rx_rings; i++) {
> +		ring = &tp->rx_ring[i];
> +		ring->rx_desc_alloc_size = (ring->num_rx_desc + 1) * sizeof(struct RxDesc);
> +		ring->rx_desc_array = dma_alloc_coherent(&pdev->dev,
> +							 ring->rx_desc_alloc_size,
> +							 &ring->rx_phy_addr,
> +							 GFP_KERNEL);
> +		if (!ring->rx_desc_array)
> +			return -1;
> +	}
>  	return 0;
>  }
>  
> +static void rtl8169_free_rx_desc(struct rtl8169_private *tp)
> +{
> +	struct rtl8169_rx_ring *ring;
> +	struct pci_dev *pdev = tp->pci_dev;
> +
> +	for (int i = 0; i < tp->num_rx_rings; i++) {
> +		ring = &tp->rx_ring[i];
> +		if (ring->rx_desc_array) {
> +			dma_free_coherent(&pdev->dev,
> +					  ring->rx_desc_alloc_size,
> +					  ring->rx_desc_array,
> +					  ring->rx_phy_addr);
> +			ring->rx_desc_array = NULL;
> +		}
> +	}
> +}
> +
>  static int rtl8169_init_ring(struct rtl8169_private *tp)
>  {
> +	int retval = 0;
> +
>  	rtl8169_init_ring_indexes(tp);
> +	rtl8169_rx_desc_init(tp);
>  
>  	memset(tp->tx_skb, 0, sizeof(tp->tx_skb));
> -	memset(tp->Rx_databuff, 0, sizeof(tp->Rx_databuff));
>  
> -	return rtl8169_rx_fill(tp);
> +	for (int i = 0; i < tp->num_rx_rings; i++) {
> +		struct rtl8169_rx_ring *ring = &tp->rx_ring[i];
> +
> +		memset(ring->rx_databuff, 0, sizeof(ring->rx_databuff));
> +		retval = rtl8169_rx_fill(tp, ring);
> +	}
> +
> +	return retval;
>  }
>  
>  static void rtl8169_unmap_tx_skb(struct rtl8169_private *tp, unsigned int entry)
> @@ -4381,16 +4502,24 @@ static void rtl8169_cleanup(struct rtl8169_private *tp)
>  	rtl8169_init_ring_indexes(tp);
>  }
>  
> -static void rtl_reset_work(struct rtl8169_private *tp)
> +static void rtl8169_rx_desc_reset(struct rtl8169_private *tp)
>  {
> -	int i;
> +	for (int i = 0; i < tp->num_rx_rings; i++) {
> +		struct rtl8169_rx_ring *ring = &tp->rx_ring[i];
>  
> +		for (int j = 0; j < ring->num_rx_desc; j++)
> +			rtl8169_mark_to_asic(tp, ring->rx_desc_array + j);
> +	}
> +}
> +
> +static void rtl_reset_work(struct rtl8169_private *tp)
> +{
>  	netif_stop_queue(tp->dev);
>  
>  	rtl8169_cleanup(tp);
>  
> -	for (i = 0; i < NUM_RX_DESC; i++)
> -		rtl8169_mark_to_asic(tp->RxDescArray + i);
> +	rtl8169_rx_desc_reset(tp);
> +
>  	rtl8169_napi_enable(tp);
>  
>  	rtl_hw_start(tp);
> @@ -4784,6 +4913,11 @@ static void rtl8169_pcierr_interrupt(struct net_device *dev)
>  	rtl_schedule_task(tp, RTL_FLAG_TASK_RESET_PENDING);
>  }
>  
> +static void rtl8169_desc_quirk(struct rtl8169_private *tp)
> +{
> +	RTL_R8(tp, tp->imr_reg[0]);
> +}
> +
>  static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp,
>  		   int budget)
>  {
> @@ -4836,9 +4970,11 @@ static inline int rtl8169_fragmented_frame(u32 status)
>  	return (status & (FirstFrag | LastFrag)) != (FirstFrag | LastFrag);
>  }
>  
> -static inline void rtl8169_rx_csum(struct sk_buff *skb, u32 opts1)
> +static inline void rtl8169_rx_csum_default(struct rtl8169_private *tp,
> +					   struct sk_buff *skb,
> +					   struct RxDesc *desc)
>  {
> -	u32 status = opts1 & (RxProtoMask | RxCSFailMask);
> +	u32 status = le32_to_cpu(desc->opts1) & (RxProtoMask | RxCSFailMask);
>  
>  	if (status == RxProtoTCP || status == RxProtoUDP)
>  		skb->ip_summed = CHECKSUM_UNNECESSARY;
> @@ -4846,22 +4982,71 @@ static inline void rtl8169_rx_csum(struct sk_buff *skb, u32 opts1)
>  		skb_checksum_none_assert(skb);
>  }
>  
> -static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget)
> +static inline void rtl8169_rx_csum(struct rtl8169_private *tp,
> +				   struct sk_buff *skb,
> +				   struct RxDesc *desc)
> +{
> +	rtl8169_rx_csum_default(tp, skb, desc);
> +}
> +
> +static u32 rtl8169_rx_desc_opts1(struct rtl8169_private *tp, struct RxDesc *desc)
> +{
> +	return READ_ONCE(desc->opts1);
> +}

I don't see which benefit the helper provides. Instead it may cause side effects
due to the READ_ONCE().

> +
> +static bool rtl8169_check_rx_desc_error(struct net_device *dev,
> +					struct rtl8169_private *tp,
> +					u32 status)
> +{
> +	if (unlikely(status & RxRES)) {
> +		if (status & (RxRWT | RxRUNT))
> +			dev->stats.rx_length_errors++;
> +		if (status & RxCRC)
> +			dev->stats.rx_crc_errors++;
> +		return true;
> +	}
> +	return false;
> +}
> +
> +static inline void rtl8169_set_desc_dma_addr(struct rtl8169_private *tp,
> +					     struct RxDesc *desc,
> +					     dma_addr_t mapping)
> +{
> +	desc->addr = cpu_to_le64(mapping);
> +}

Argument tp isn't used. Why is it there?

> +
> +static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp,
> +		  struct rtl8169_rx_ring *ring, int budget)
>  {
>  	struct device *d = tp_to_dev(tp);
>  	int count;
>  
> -	for (count = 0; count < budget; count++, tp->cur_rx++) {
> -		unsigned int pkt_size, entry = tp->cur_rx % NUM_RX_DESC;
> -		struct RxDesc *desc = tp->RxDescArray + entry;
> +	for (count = 0; count < budget; count++, ring->cur_rx++) {
> +		unsigned int pkt_size, entry = ring->cur_rx % ring->num_rx_desc;
> +		struct RxDesc *desc = ring->rx_desc_array + entry;
>  		struct sk_buff *skb;
>  		const void *rx_buf;
>  		dma_addr_t addr;
>  		u32 status;
>  
> -		status = le32_to_cpu(READ_ONCE(desc->opts1));
> -		if (status & DescOwn)
> -			break;
> +		status = le32_to_cpu(rtl8169_rx_desc_opts1(tp, desc));
> +
> +		if (status & DescOwn) {
> +			if (!tp->recheck_desc_ownbit)
> +				break;
> +
> +			/* Workaround for a hardware issue:

Hardware issue on which chip version(s)?

> +			 * Hardware might trigger RX interrupt before the DMA
> +			 * engine fully updates RX desc ownbit in host memory.
> +			 * So we do a quirk and re-read to avoid missing RX
> +			 * packets.
> +			 */
> +			tp->recheck_desc_ownbit = false;
> +			rtl8169_desc_quirk(tp);
> +			status = le32_to_cpu(rtl8169_rx_desc_opts1(tp, desc));
> +			if (status & DescOwn)
> +				break;
> +		}
>  
>  		/* This barrier is needed to keep us from reading
>  		 * any other fields out of the Rx descriptor until
> @@ -4869,20 +5054,15 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
>  		 */
>  		dma_rmb();
>  
> -		if (unlikely(status & RxRES)) {
> +		if (rtl8169_check_rx_desc_error(dev, tp, status)) {
>  			if (net_ratelimit())
>  				netdev_warn(dev, "Rx ERROR. status = %08x\n",
>  					    status);
> +
>  			dev->stats.rx_errors++;
> -			if (status & (RxRWT | RxRUNT))
> -				dev->stats.rx_length_errors++;
> -			if (status & RxCRC)
> -				dev->stats.rx_crc_errors++;
>  
>  			if (!(dev->features & NETIF_F_RXALL))
>  				goto release_descriptor;
> -			else if (status & RxRWT || !(status & (RxRUNT | RxCRC)))
> -				goto release_descriptor;
>  		}
>  
>  		pkt_size = status & GENMASK(13, 0);
> @@ -4898,14 +5078,14 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
>  			goto release_descriptor;
>  		}
>  
> -		skb = napi_alloc_skb(&tp->r8169napi[0].napi, pkt_size);
> +		skb = napi_alloc_skb(&tp->r8169napi[ring->index].napi, pkt_size);
>  		if (unlikely(!skb)) {
>  			dev->stats.rx_dropped++;
>  			goto release_descriptor;
>  		}
>  
> -		addr = le64_to_cpu(desc->addr);
> -		rx_buf = page_address(tp->Rx_databuff[entry]);
> +		addr = ring->rx_desc_phy_addr[entry];
> +		rx_buf = page_address(ring->rx_databuff[entry]);
>  
>  		dma_sync_single_for_cpu(d, addr, pkt_size, DMA_FROM_DEVICE);
>  		prefetch(rx_buf);
> @@ -4914,7 +5094,7 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
>  		skb->len = pkt_size;
>  		dma_sync_single_for_device(d, addr, pkt_size, DMA_FROM_DEVICE);
>  
> -		rtl8169_rx_csum(skb, status);
> +		rtl8169_rx_csum(tp, skb, desc);
>  		skb->protocol = eth_type_trans(skb, dev);
>  
>  		rtl8169_rx_vlan_tag(desc, skb);
> @@ -4922,11 +5102,12 @@ static int rtl_rx(struct net_device *dev, struct rtl8169_private *tp, int budget
>  		if (skb->pkt_type == PACKET_MULTICAST)
>  			dev->stats.multicast++;
>  
> -		napi_gro_receive(&tp->r8169napi[0].napi, skb);
> +		napi_gro_receive(&tp->r8169napi[ring->index].napi, skb);
>  
>  		dev_sw_netstats_rx_add(dev, pkt_size);
>  release_descriptor:
> -		rtl8169_mark_to_asic(desc);
> +		rtl8169_set_desc_dma_addr(tp, desc, ring->rx_desc_phy_addr[entry]);
> +		rtl8169_mark_to_asic(tp, desc);
>  	}
>  
>  	return count;
> @@ -4952,6 +5133,7 @@ static irqreturn_t rtl8169_interrupt(int irq, void *dev_instance)
>  		phy_mac_interrupt(tp->phydev);
>  
>  	rtl_irq_disable(tp);
> +	tp->recheck_desc_ownbit = true;
>  	napi_schedule(&napi->napi);
>  out:
>  	rtl_ack_events(tp, status);
> @@ -5040,7 +5222,8 @@ static int rtl8169_poll(struct napi_struct *napi, int budget)
>  
>  	rtl_tx(dev, tp, budget);
>  
> -	work_done = rtl_rx(dev, tp, budget);
> +	for (int i = 0; i < tp->num_rx_rings; i++)
> +		work_done += rtl_rx(dev, tp, &tp->rx_ring[i], budget);
>  
>  	if (work_done < budget && napi_complete_done(napi, work_done))
>  		rtl_irq_enable(tp);
> @@ -5168,21 +5351,21 @@ static int rtl8169_close(struct net_device *dev)
>  	struct pci_dev *pdev = tp->pci_dev;
>  
>  	pm_runtime_get_sync(&pdev->dev);
> -
>  	netif_stop_queue(dev);
> +
>  	rtl8169_down(tp);
> -	rtl8169_rx_clear(tp);
> +	for (int i = 0; i < tp->num_rx_rings; i++)
> +		rtl8169_rx_clear(tp, &tp->rx_ring[i]);
>  
>  	rtl8169_free_irq(tp);
>  
>  	phy_disconnect(tp->phydev);
>  
> -	dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
> -			  tp->RxPhyAddr);
>  	dma_free_coherent(&pdev->dev, R8169_TX_RING_BYTES, tp->TxDescArray,
>  			  tp->TxPhyAddr);
>  	tp->TxDescArray = NULL;
> -	tp->RxDescArray = NULL;
> +
> +	rtl8169_free_rx_desc(tp);
>  
>  	pm_runtime_put_sync(&pdev->dev);
>  
> @@ -5211,16 +5394,15 @@ static int rtl_open(struct net_device *dev)
>  	 * Rx and Tx descriptors needs 256 bytes alignment.
>  	 * dma_alloc_coherent provides more.
>  	 */
> +

It's not the only place with unmotivated changes. Please remove these changes.

>  	tp->TxDescArray = dma_alloc_coherent(&pdev->dev, R8169_TX_RING_BYTES,
>  					     &tp->TxPhyAddr, GFP_KERNEL);
>  	if (!tp->TxDescArray)
> -		goto out;
> -
> -	tp->RxDescArray = dma_alloc_coherent(&pdev->dev, R8169_RX_RING_BYTES,
> -					     &tp->RxPhyAddr, GFP_KERNEL);
> -	if (!tp->RxDescArray)
>  		goto err_free_tx_0;
>  
> +	if (rtl8169_alloc_rx_desc(tp) < 0)
> +		goto err_free_rx_1;
> +
>  	retval = rtl8169_init_ring(tp);
>  	if (retval < 0)
>  		goto err_free_rx_1;
> @@ -5249,11 +5431,10 @@ static int rtl_open(struct net_device *dev)
>  	rtl8169_free_irq(tp);
>  err_release_fw_2:
>  	rtl_release_firmware(tp);
> -	rtl8169_rx_clear(tp);
> +	for (int i = 0; i < tp->num_rx_rings; i++)
> +		rtl8169_rx_clear(tp, &tp->rx_ring[i]);
>  err_free_rx_1:
> -	dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
> -			  tp->RxPhyAddr);
> -	tp->RxDescArray = NULL;
> +	rtl8169_free_rx_desc(tp);
>  err_free_tx_0:
>  	dma_free_coherent(&pdev->dev, R8169_TX_RING_BYTES, tp->TxDescArray,
>  			  tp->TxPhyAddr);
> @@ -5767,7 +5948,10 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
>  	u32 txconfig;
>  	u32 xid;
>  
> -	dev = devm_alloc_etherdev(&pdev->dev, sizeof (*tp));
> +	dev = devm_alloc_etherdev_mqs(&pdev->dev, sizeof(*tp),
> +				      1,
> +				      R8169_MAX_RX_QUEUES);
> +
>  	if (!dev)
>  		return -ENOMEM;
>  


^ permalink raw reply

* [PATCH net v2 0/8] Intel Wired LAN Driver Updates 2026-05-04 (i40e, ice, idpf)
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller, Matt Vollrath, Sunitha Mekala,
	Kohei Enju, Paul Menzel, Emil Tantilov, Simon Horman,
	Samuel Salin, Greg Kroah-Hartman, Tony Nguyen, stable,
	Marcin Szycik, Bart Van Assche, intel-wired-lan, Arpana Arland

Matt Volrath fixes two issues with the i40e driver probe routine, ensuring
that PTP is properly cleaned up if the probe fails.

Emil corrects the initialization of the read_dev_clk_lock spinlock in
idpf_ptp_init, ensuring it is initialized prior to when the
ptp_schedule_worker() is called.

Greg KH fixes a double free and use-after free in the idpf auxiliary device
error paths.

Marcin fixes ice_set_rss_hfunc() to use the correct q_opt_flags field,
correcting the assignment and preventing submission of invalid data to the
firmware.

Bart corrects the locking in ice_dcb_rebuild(), ensuring that the tc_mutex
is held over the entire operation.

Ivan fixes the rclk pin state get for E810 devices, ensuring the index is
properly offset by the base_rclk_idx value. This ensures that the correct
pin index is used to look up recovered clock state. He additionally adds
bounds checking to prevent attempting to access pins outside of the pin
state array.

Ivan also moves the CGU register macros to the top of ice_dpll.h, inside
the header guard to avoid duplicate macro definitions should the ice_dpll.h
header is included multiple times.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
Changes in v2:
- Dropped patches which had comments from Sashiko pointing out issues that
  need to be addressed.
- Link to v1: https://patch.msgid.link/20260504-jk-iwl-net-2026-05-04-v1-0-a222a88bd962@intel.com

---
Bart Van Assche (1):
      ice: fix locking in ice_dcb_rebuild()

Emil Tantilov (1):
      idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()

Greg Kroah-Hartman (1):
      idpf: fix double free and use-after-free in aux device error paths

Ivan Vecera (2):
      ice: dpll: fix rclk pin state get for E810
      ice: dpll: fix misplaced header macros

Marcin Szycik (1):
      ice: fix setting RSS VSI hash for E830

Matt Vollrath (2):
      i40e: Cleanup PTP registration on probe failure
      i40e: Cleanup PTP pins on probe failure

 drivers/net/ethernet/intel/i40e/i40e.h       |  1 +
 drivers/net/ethernet/intel/ice/ice_dpll.h    | 32 ++++++++++++++--------------
 drivers/net/ethernet/intel/i40e/i40e_main.c  |  2 ++
 drivers/net/ethernet/intel/i40e/i40e_ptp.c   |  3 ++-
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c |  4 ++--
 drivers/net/ethernet/intel/ice/ice_dpll.c    |  5 +++++
 drivers/net/ethernet/intel/ice/ice_main.c    |  2 +-
 drivers/net/ethernet/intel/idpf/idpf_idc.c   |  6 ++++++
 drivers/net/ethernet/intel/idpf/idpf_ptp.c   |  4 ++--
 9 files changed, 37 insertions(+), 22 deletions(-)
---
base-commit: bd3a4795d5744f59a1f485379f1303e5e606f377
change-id: 20260504-jk-iwl-net-2026-05-04-f9526823577f

Best regards,
--  
Jacob Keller <jacob.e.keller@intel.com>


^ permalink raw reply

* [PATCH net v2 1/8] i40e: Cleanup PTP registration on probe failure
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller, Matt Vollrath, Sunitha Mekala
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Matt Vollrath <tactii@gmail.com>

Fix two conditions which would leak PTP registration on probe failure:

1. i40e_setup_pf_switch can encounter an error in
   i40e_setup_pf_filter_control, call i40e_ptp_init, then return
   non-zero, sending i40e_probe to err_vsis.

2. i40e_setup_misc_vector can return non-zero, sending i40e_probe to
   err_vsis.

Both of these conditions have been present since PTP was introduced in
this driver.

Found with coccinelle.

Fixes: beb0dff1251db ("i40e: enable PTP")
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 028bd500603a..f06fcef644e5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -16108,6 +16108,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	/* Unwind what we've done if something failed in the setup */
 err_vsis:
 	set_bit(__I40E_DOWN, pf->state);
+	i40e_ptp_stop(pf);
 	i40e_clear_interrupt_scheme(pf);
 	kfree(pf->vsi);
 err_switch_setup:

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* [PATCH net v2 3/8] idpf: fix read_dev_clk_lock spinlock init in idpf_ptp_init()
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller, Emil Tantilov, Simon Horman,
	Samuel Salin
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Emil Tantilov <emil.s.tantilov@intel.com>

In idpf_ptp_init(), read_dev_clk_lock is initialized after
ptp_schedule_worker() had already been called (and after
idpf_ptp_settime64() could reach the lock). The PTP aux worker
fires immediately upon scheduling and can call into
idpf_ptp_read_src_clk_reg_direct(), which takes
spin_lock(&ptp->read_dev_clk_lock) on an uninitialized lock, triggering
the lockdep "non-static key" warning:

[12973.796587] idpf 0000:83:00.0: Device HW Reset initiated
[12974.094507] INFO: trying to register non-static key.
...
[12974.097208] Call Trace:
[12974.097213]  <TASK>
[12974.097218]  dump_stack_lvl+0x93/0xe0
[12974.097234]  register_lock_class+0x4c4/0x4e0
[12974.097249]  ? __lock_acquire+0x427/0x2290
[12974.097259]  __lock_acquire+0x98/0x2290
[12974.097272]  lock_acquire+0xc6/0x310
[12974.097281]  ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097311]  ? lockdep_hardirqs_on_prepare+0xde/0x190
[12974.097318]  ? finish_task_switch.isra.0+0xd2/0x350
[12974.097330]  ? __pfx_ptp_aux_kworker+0x10/0x10 [ptp]
[12974.097343]  _raw_spin_lock+0x30/0x40
[12974.097353]  ? idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097373]  idpf_ptp_read_src_clk_reg+0xb7/0x150 [idpf]
[12974.097391]  ? kthread_worker_fn+0x88/0x3d0
[12974.097404]  ? kthread_worker_fn+0x4e/0x3d0
[12974.097411]  idpf_ptp_update_cached_phctime+0x26/0x120 [idpf]
[12974.097428]  ? _raw_spin_unlock_irq+0x28/0x50
[12974.097436]  idpf_ptp_do_aux_work+0x15/0x20 [idpf]
[12974.097454]  ptp_aux_kworker+0x20/0x40 [ptp]
[12974.097464]  kthread_worker_fn+0xd5/0x3d0
[12974.097474]  ? __pfx_kthread_worker_fn+0x10/0x10
[12974.097482]  kthread+0xf4/0x130
[12974.097489]  ? __pfx_kthread+0x10/0x10
[12974.097498]  ret_from_fork+0x32c/0x410
[12974.097512]  ? __pfx_kthread+0x10/0x10
[12974.097519]  ret_from_fork_asm+0x1a/0x30
[12974.097540]  </TASK>

Move the call to spin_lock_init() up a bit to make sure read_dev_clk_lock
is not touched before it's been initialized.

Fixes: 5cb8805d2366 ("idpf: negotiate PTP capabilities and get PTP clock")
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Reviewed-by: Madhu Chittim <madhu.chittim@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Samuel Salin <Samuel.salin@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_ptp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_ptp.c b/drivers/net/ethernet/intel/idpf/idpf_ptp.c
index eec91c4f0a75..4a51d2727547 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_ptp.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_ptp.c
@@ -952,6 +952,8 @@ int idpf_ptp_init(struct idpf_adapter *adapter)
 		goto free_ptp;
 	}
 
+	spin_lock_init(&adapter->ptp->read_dev_clk_lock);
+
 	err = idpf_ptp_create_clock(adapter);
 	if (err)
 		goto free_ptp;
@@ -977,8 +979,6 @@ int idpf_ptp_init(struct idpf_adapter *adapter)
 			goto remove_clock;
 	}
 
-	spin_lock_init(&adapter->ptp->read_dev_clk_lock);
-
 	pci_dbg(adapter->pdev, "PTP init successful\n");
 
 	return 0;

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* [PATCH net v2 2/8] i40e: Cleanup PTP pins on probe failure
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller, Matt Vollrath, Kohei Enju,
	Paul Menzel, Sunitha Mekala
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Matt Vollrath <tactii@gmail.com>

PTP pin structs are allocated early in probe, but never cleaned up.

Fix this by calling i40e_ptp_free_pins in the error path.

To support this, i40e_ptp_free_pins is added to the header and
pin_config is correctly nullified after being freed.

This has been an issue since i40e_ptp_alloc_pins was introduced.

Fixes: 1050713026a08 ("i40e: add support for PTP external synchronization clock")
Reported-by: Kohei Enju <kohei@enjuk.jp>
Cc: stable@vger.kernel.org
Signed-off-by: Matt Vollrath <tactii@gmail.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Kohei Enju <kohei@enjuk.jp>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e.h      | 1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
 drivers/net/ethernet/intel/i40e/i40e_ptp.c  | 3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index dcb50c2e1aa2..83e780919ac9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -1318,6 +1318,7 @@ void i40e_ptp_restore_hw_time(struct i40e_pf *pf);
 void i40e_ptp_init(struct i40e_pf *pf);
 void i40e_ptp_stop(struct i40e_pf *pf);
 int i40e_ptp_alloc_pins(struct i40e_pf *pf);
+void i40e_ptp_free_pins(struct i40e_pf *pf);
 int i40e_update_adq_vsi_queues(struct i40e_vsi *vsi, int vsi_offset);
 int i40e_is_vsi_uplink_mode_veb(struct i40e_vsi *vsi);
 int i40e_get_partition_bw_setting(struct i40e_pf *pf);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f06fcef644e5..6d4f9218dc68 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -16112,6 +16112,7 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 	i40e_clear_interrupt_scheme(pf);
 	kfree(pf->vsi);
 err_switch_setup:
+	i40e_ptp_free_pins(pf);
 	i40e_reset_interrupt_capability(pf);
 	timer_shutdown_sync(&pf->service_timer);
 err_mac_addr:
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index 404a716db8da..7d07c389bb23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -940,12 +940,13 @@ int i40e_ptp_hwtstamp_get(struct net_device *netdev,
  *
  * Release memory allocated for PTP pins.
  **/
-static void i40e_ptp_free_pins(struct i40e_pf *pf)
+void i40e_ptp_free_pins(struct i40e_pf *pf)
 {
 	if (i40e_is_ptp_pin_dev(&pf->hw)) {
 		kfree(pf->ptp_pins);
 		kfree(pf->ptp_caps.pin_config);
 		pf->ptp_pins = NULL;
+		pf->ptp_caps.pin_config = NULL;
 	}
 }
 

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* [PATCH net v2 4/8] idpf: fix double free and use-after-free in aux device error paths
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller, Greg Kroah-Hartman, Tony Nguyen,
	stable, Paul Menzel
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

When auxiliary_device_add() fails in idpf_plug_vport_aux_dev() or
idpf_plug_core_aux_dev(), the err_aux_dev_add label calls
auxiliary_device_uninit() and falls through to err_aux_dev_init.  The
uninit call will trigger put_device(), which invokes the release
callback (idpf_vport_adev_release / idpf_core_adev_release) that frees
iadev.  The fall-through then reads adev->id from the freed iadev for
ida_free() and double-frees iadev with kfree().

Free the IDA slot and clear the back-pointer before uninit, while adev
is still valid, then return immediately.

Commit 65637c3a1811 ("idpf: fix UAF in RDMA core aux dev deinitialization")
fixed the same use-after-free in the matching unplug path in this file but
missed both probe error paths.

Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: stable@kernel.org
Fixes: be91128c579c ("idpf: implement RDMA vport auxiliary dev create, init, and destroy")
Fixes: f4312e6bfa2a ("idpf: implement core RDMA auxiliary dev create, init, and destroy")
Assisted-by: gregkh_clanker_t1000
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/idpf/idpf_idc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/intel/idpf/idpf_idc.c b/drivers/net/ethernet/intel/idpf/idpf_idc.c
index 7e4f4ac92653..b7d6b08fc89e 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_idc.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_idc.c
@@ -90,7 +90,10 @@ static int idpf_plug_vport_aux_dev(struct iidc_rdma_core_dev_info *cdev_info,
 	return 0;
 
 err_aux_dev_add:
+	ida_free(&idpf_idc_ida, adev->id);
+	vdev_info->adev = NULL;
 	auxiliary_device_uninit(adev);
+	return ret;
 err_aux_dev_init:
 	ida_free(&idpf_idc_ida, adev->id);
 err_ida_alloc:
@@ -228,7 +231,10 @@ static int idpf_plug_core_aux_dev(struct iidc_rdma_core_dev_info *cdev_info)
 	return 0;
 
 err_aux_dev_add:
+	ida_free(&idpf_idc_ida, adev->id);
+	cdev_info->adev = NULL;
 	auxiliary_device_uninit(adev);
+	return ret;
 err_aux_dev_init:
 	ida_free(&idpf_idc_ida, adev->id);
 err_ida_alloc:

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* [PATCH net v2 5/8] ice: fix setting RSS VSI hash for E830
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller, Marcin Szycik
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Marcin Szycik <marcin.szycik@linux.intel.com>

ice_set_rss_hfunc() performs a VSI update, in which it sets hashing
function, leaving other VSI options unchanged. However, ::q_opt_flags is
mistakenly set to the value of another field, instead of its original
value, probably due to a typo. What happens next is hardware-dependent:

On E810, only the first bit is meaningful (see
ICE_AQ_VSI_Q_OPT_PE_FLTR_EN) and can potentially end up in a different
state than before VSI update.

On E830, some of the remaining bits are not reserved. Setting them
to some unrelated values can cause the firmware to reject the update
because of invalid settings, or worse - succeed.

Reproducer:
  sudo ethtool -X $PF1 equal 8

Output in dmesg:
  Failed to configure RSS hash for VSI 6, error -5

Fixes: 352e9bf23813 ("ice: enable symmetric-xor RSS for Toeplitz hash function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 1d1947a7fe11..c52c465280f7 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -8046,7 +8046,7 @@ int ice_set_rss_hfunc(struct ice_vsi *vsi, u8 hfunc)
 	ctx->info.q_opt_rss |=
 		FIELD_PREP(ICE_AQ_VSI_Q_OPT_RSS_HASH_M, hfunc);
 	ctx->info.q_opt_tc = vsi->info.q_opt_tc;
-	ctx->info.q_opt_flags = vsi->info.q_opt_rss;
+	ctx->info.q_opt_flags = vsi->info.q_opt_flags;
 
 	err = ice_update_vsi(hw, vsi->idx, ctx, NULL);
 	if (err) {

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* [PATCH net v2 6/8] ice: fix locking in ice_dcb_rebuild()
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller, Bart Van Assche, intel-wired-lan,
	Arpana Arland
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Bart Van Assche <bvanassche@acm.org>

Move the mutex_lock() call up to prevent that DCB settings change after
the first ice_query_port_ets() call. The second ice_query_port_ets()
call in ice_dcb_rebuild() is already protected by pf->tc_mutex.

This also fixes a bug in an error path, as before taking the first
"goto dcb_error" in the function jumped over mutex_lock() to
mutex_unlock().

This bug has been detected by the clang thread-safety analyzer.

Cc: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Fixes: 242b5e068b25 ("ice: Fix DCB rebuild after reset")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
index 16aa25535152..0bc6dd375687 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
@@ -537,14 +537,14 @@ void ice_dcb_rebuild(struct ice_pf *pf)
 	struct ice_dcbx_cfg *err_cfg;
 	int ret;
 
+	mutex_lock(&pf->tc_mutex);
+
 	ret = ice_query_port_ets(pf->hw.port_info, &buf, sizeof(buf), NULL);
 	if (ret) {
 		dev_err(dev, "Query Port ETS failed\n");
 		goto dcb_error;
 	}
 
-	mutex_lock(&pf->tc_mutex);
-
 	if (!pf->hw.port_info->qos_cfg.is_sw_lldp)
 		ice_cfg_etsrec_defaults(pf->hw.port_info);
 

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* [PATCH net v2 7/8] ice: dpll: fix rclk pin state get for E810
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Ivan Vecera <ivecera@redhat.com>

The refactoring of ice_dpll_rclk_state_on_pin_get() to use
ice_dpll_pin_get_parent_idx() omitted the base_rclk_idx adjustment that was
correctly added in the ice_dpll_rclk_state_on_pin_set() path. This breaks
E810 devices where base_rclk_idx is non-zero, causing the wrong hardware
index to be used for pin state lookup and incorrect recovered clock state
to be reported via the DPLL subsystem. E825C is unaffected as its
base_rclk_idx is 0.

While at it, add bounds check against ICE_DPLL_RCLK_NUM_MAX on hw_idx after
the base_rclk_idx subtraction in both ice_dpll_rclk_state_on_pin_{get,set}()
to prevent out-of-bounds access on the pin state array.

Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dpll.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.c b/drivers/net/ethernet/intel/ice/ice_dpll.c
index 27b460926bac..892bc7c2e28b 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.c
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.c
@@ -2523,6 +2523,8 @@ ice_dpll_rclk_state_on_pin_set(const struct dpll_pin *pin, void *pin_priv,
 	if (hw_idx < 0)
 		goto unlock;
 	hw_idx -= pf->dplls.base_rclk_idx;
+	if (hw_idx >= ICE_DPLL_RCLK_NUM_MAX)
+		goto unlock;
 
 	if ((enable && p->state[hw_idx] == DPLL_PIN_STATE_CONNECTED) ||
 	    (!enable && p->state[hw_idx] == DPLL_PIN_STATE_DISCONNECTED)) {
@@ -2586,6 +2588,9 @@ ice_dpll_rclk_state_on_pin_get(const struct dpll_pin *pin, void *pin_priv,
 	hw_idx = ice_dpll_pin_get_parent_idx(p, parent_pin);
 	if (hw_idx < 0)
 		goto unlock;
+	hw_idx -= pf->dplls.base_rclk_idx;
+	if (hw_idx >= ICE_DPLL_RCLK_NUM_MAX)
+		goto unlock;
 
 	ret = ice_dpll_pin_state_update(pf, p, ICE_DPLL_PIN_TYPE_RCLK_INPUT,
 					extack);

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* [PATCH net v2 8/8] ice: dpll: fix misplaced header macros
From: Jacob Keller @ 2026-05-06 21:48 UTC (permalink / raw)
  To: Przemek Kitszel, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Piotr Kwapulinski,
	Aleksandr Loktionov, Arkadiusz Kubalewski, Maciej Fijalkowski,
	Joshua Hay, Madhu Chittim, Willem de Bruijn, Dave Ertman,
	Ivan Vecera, Grzegorz Nitka
  Cc: netdev, stable, Jacob Keller
In-Reply-To: <20260506-jk-iwl-net-2026-05-04-v2-0-a5ea4dc837a9@intel.com>

From: Ivan Vecera <ivecera@redhat.com>

The CGU register definitions (ICE_CGU_R10, ICE_CGU_R11 and related field
masks) were placed after the #endif of the _ICE_DPLL_H_ include guard,
leaving them unprotected. Move them inside the guard.

Fixes: ad1df4f2d591 ("ice: dpll: Support E825-C SyncE and dynamic pin discovery")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dpll.h | 32 +++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_dpll.h b/drivers/net/ethernet/intel/ice/ice_dpll.h
index ae42cdea0ee1..8678575359b9 100644
--- a/drivers/net/ethernet/intel/ice/ice_dpll.h
+++ b/drivers/net/ethernet/intel/ice/ice_dpll.h
@@ -8,6 +8,22 @@
 
 #define ICE_DPLL_RCLK_NUM_MAX	4
 
+#define ICE_CGU_R10			0x28
+#define ICE_CGU_R10_SYNCE_CLKO_SEL	GENMASK(8, 5)
+#define ICE_CGU_R10_SYNCE_CLKODIV_M1	GENMASK(13, 9)
+#define ICE_CGU_R10_SYNCE_CLKODIV_LOAD	BIT(14)
+#define ICE_CGU_R10_SYNCE_DCK_RST	BIT(15)
+#define ICE_CGU_R10_SYNCE_ETHCLKO_SEL	GENMASK(18, 16)
+#define ICE_CGU_R10_SYNCE_ETHDIV_M1	GENMASK(23, 19)
+#define ICE_CGU_R10_SYNCE_ETHDIV_LOAD	BIT(24)
+#define ICE_CGU_R10_SYNCE_DCK2_RST	BIT(25)
+#define ICE_CGU_R10_SYNCE_S_REF_CLK	GENMASK(31, 27)
+
+#define ICE_CGU_R11			0x2C
+#define ICE_CGU_R11_SYNCE_S_BYP_CLK	GENMASK(6, 1)
+
+#define ICE_CGU_BYPASS_MUX_OFFSET_E825C	3
+
 /**
  * enum ice_dpll_pin_sw - enumerate ice software pin indices:
  * @ICE_DPLL_PIN_SW_1_IDX: index of first SW pin
@@ -157,19 +173,3 @@ static inline void ice_dpll_deinit(struct ice_pf *pf) { }
 #endif
 
 #endif
-
-#define ICE_CGU_R10				0x28
-#define ICE_CGU_R10_SYNCE_CLKO_SEL		GENMASK(8, 5)
-#define ICE_CGU_R10_SYNCE_CLKODIV_M1		GENMASK(13, 9)
-#define ICE_CGU_R10_SYNCE_CLKODIV_LOAD		BIT(14)
-#define ICE_CGU_R10_SYNCE_DCK_RST		BIT(15)
-#define ICE_CGU_R10_SYNCE_ETHCLKO_SEL		GENMASK(18, 16)
-#define ICE_CGU_R10_SYNCE_ETHDIV_M1		GENMASK(23, 19)
-#define ICE_CGU_R10_SYNCE_ETHDIV_LOAD		BIT(24)
-#define ICE_CGU_R10_SYNCE_DCK2_RST		BIT(25)
-#define ICE_CGU_R10_SYNCE_S_REF_CLK		GENMASK(31, 27)
-
-#define ICE_CGU_R11				0x2C
-#define ICE_CGU_R11_SYNCE_S_BYP_CLK		GENMASK(6, 1)
-
-#define ICE_CGU_BYPASS_MUX_OFFSET_E825C		3

-- 
2.54.0.rc2.531.gaf818d63126a


^ permalink raw reply related

* Re: [Intel-wired-lan] [PATCH] i40e: Fix i40e_debug() to use struct i40e_hw argument
From: Jacob Keller @ 2026-05-06 21:57 UTC (permalink / raw)
  To: Mohamed Khalfella, Paul Menzel
  Cc: Tony Nguyen, Przemek Kitszel, David S . Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, intel-wired-lan, netdev,
	linux-kernel
In-Reply-To: <20260429165229.GF2686-mkhalfella@purestorage.com>

On 4/29/2026 9:52 AM, Mohamed Khalfella wrote:
> On Wed 2026-04-29 13:02:00 +0200, Paul Menzel wrote:
>> Dear Mohamed,
>>
>>
>> Thank you for your patch.
>>
>> Am 28.04.26 um 20:14 schrieb Mohamed Khalfella:
>>> i40e_debug() macro takes struct i40e_hw *h as first argument. But the
>>> macro body uses hw instead of h. This has been working so far because hw
>>> happen to be the name of the variable in the context where the marco is
>>
>> marco → ma*cr*o
> 
> Good catch. Also 'happen' should be 'happens'
> 
>>
>>> expanded. Fix the macro to use the passed argument.
>>
>> I’d add a Fixes: tag, but the maintainers might have more input.
> 
> Yes, I should have added Fixes: tag. I will leave it to the maintainer
> to decide if v2 is needed to fix the spelling mistakes and add Fixes
> tag.
> 
> Fixes: 5dfd37c37a44 ("i40e: Split i40e_osdep.h")
> 
Please send a v2 with the fixes tag and typo. It will make it easier to
avoid losing this data.

^ permalink raw reply

* Re: [PATCH net-next 1/6] bridge: uapi: Add neigh_forward_grat netlink attributes
From: Jakub Kicinski @ 2026-05-06 22:05 UTC (permalink / raw)
  To: Danielle Ratson
  Cc: Ido Schimmel, netdev@vger.kernel.org, donald.hunter@gmail.com,
	davem@davemloft.net, edumazet@google.com, pabeni@redhat.com,
	horms@kernel.org, razor@blackwall.org, andrew+netdev@lunn.ch,
	shuah@kernel.org, ast@fiberby.net, liuhangbin@gmail.com,
	daniel@iogearbox.net, Andy Roulin, fmaurer@redhat.com,
	sdf.kernel@gmail.com, sd@queasysnail.net, kees@kernel.org,
	nickgarlis@gmail.com, amorenoz@redhat.com, alasdair@mcwilliam.dev,
	johannes.wiesboeck@aisec.fraunhofer.de, Petr Machata,
	linux-kernel@vger.kernel.org, bridge@lists.linux.dev,
	linux-kselftest@vger.kernel.org
In-Reply-To: <DM6PR12MB45167562767CFC70DA9AE8D0D83F2@DM6PR12MB4516.namprd12.prod.outlook.com>

On Wed, 6 May 2026 08:31:01 +0000 Danielle Ratson wrote:
> > > I think this should be u8 ? neigh-vlan-suppress looks buggy too  
> > 
> > I pointed this out during internal review, but assumed I am missing something
> > since almost all the attributes use flag when they are in fact u8. We can fix  
> 
> This is in fact the reason why I also changed it myself to use flag before sending.
> 
> > neigh-forward-grat to use u8 in v2 and change the rest in net. To be clear, I
> > believe the following should be converted from flag to u8:
> > 
> > mode, guard, protect, fast-leave, learning, unicast-flood, proxyarp, learning-
> > sync, proxyarp-wifi, mcast-flood, mcast-to-ucast, vlan-tunnel, bcast-flood,
> > neigh-suppress, isolated, mrp-ring-open, mrp-in-open, locked, mab, neigh-
> > vlan-suppress
> 
> So should we proceed as Ido suggested?

I think so. Tweak this patch and send a separate fix to net for the
existing attrs.

> > > flag is a type without a payload, the presence of the attr is the
> > > entire information
> > >
> > > None of the AIs seem to catch this, I think you may have over-split
> > > this submission a little bit. This patch may have been better off
> > > squashed into patch 4 ?  
> 
> It seems like the patch has enough content, but I can squash. I guess
> ill split the commit between patches 4 and 5 accordingly.

Could be my lack of familiarity with bridge but FWIW reading the
submission it felt like it was split by "area of code" rather than
logical steps. Splitting work into patches is a bit of an art..

> > Related: The AI also did not catch that the spec was missing (easy
> > to forget for rtnetlink). Do you think it's worth adding to
> > review-prompts?  

I assumed Sashiko missed this because it doesn't spent too much time on
the series as a whole right now (I was going to tweak that but looks
like Claude is having another meltdown, crazy inference latency,
patches backlogging for review..)

Are you saying that the review prompts-based test is also not catching
this even if you give it the range of commits that form the series?

^ permalink raw reply

* Re: [PATCH v5 net-next 3/3] selftests:net: Implement ptp4l sync test using netdevsim
From: Jakub Kicinski @ 2026-05-06 22:07 UTC (permalink / raw)
  To: Maciek Machnikowski
  Cc: netdev, richardcochran, milena.olech, willemdebruijn.kernel,
	andrew, vadim.fedorenko, horms
In-Reply-To: <7fc0a457-7153-4b22-b3c9-2c6298b75d73@machnikowski.net>

On Wed, 6 May 2026 13:18:01 +0200 Maciek Machnikowski wrote:
> On 06/05/2026 02:23, Jakub Kicinski wrote:
> > On Tue, 5 May 2026 17:22:34 -0700 Jakub Kicinski wrote:  
> >> You have to add it to the relevant config.
> >>
> >> Please read: github.com/linux-netdev/nipa/wiki/Netdev-CI-system  
> > 
> > Sorry, wrong link, this:
> > 
> > https://github.com/linux-netdev/nipa/wiki/How-to-run-netdev-selftests-CI-style  
> I did follow the guid, ran the test on the vng environment and it still
> passed the test. Is there more debuggability on the CI system? Can you
> share the .config file it generated while running the test? Or the
> commands it tries to run so I can verify them locally?
> 
> 
> The commands I ran were:
> 
> make mrproper
> vng --build  --config tools/testing/selftests/drivers/net/netdevsim/config
> vng -v --run . --user root --cpus 4 -- tools/testing/selftests/net/ptp.py
> 
> The tests were done on the ARM64 Fedora 44 with the default kernel
> 6.19.14-300.fc44.aarch64 running inside the Parallels Desktop VM.

https://netdev-ctrl.bots.linux.dev/logs/vmksft/net/results/631241/config

^ permalink raw reply

* Re: [PATCH net-next v10 1/4] tun/tap: add ptr_ring consume helper with netdev queue wakeup
From: Michael S. Tsirkin @ 2026-05-06 22:18 UTC (permalink / raw)
  To: Simon Schippers
  Cc: willemdebruijn.kernel, jasowang, andrew+netdev, davem, edumazet,
	kuba, pabeni, eperezma, leiyang, stephen, jon, tim.gebauer,
	netdev, linux-kernel, kvm, virtualization
In-Reply-To: <20260506141033.180450-2-simon.schippers@tu-dortmund.de>

On Wed, May 06, 2026 at 04:10:30PM +0200, Simon Schippers wrote:
> Introduce tun_ring_consume() that wraps ptr_ring_consume() and calls
> __tun_wake_queue(). The latter wakes the stopped netdev subqueue once
> half of the ring capacity has been consumed, tracked via the new
> cons_cnt field in tun_file. cons_cnt is updated while holding the ring
> consumer lock, avoiding races. As a safety net, the queue is also woken
> when the ring becomes empty. The point is to allow the queue to be
> stopped when it gets full, which is required for traffic shaping -
> implemented by the following "avoid ptr_ring tail-drop when a qdisc
> is present". That patch also explains the pairing of the smp_mb()
> of __tun_wake_queue().
> 
> Without the corresponding queue stopping, this patch alone causes no
> regression for a tap setup sending to a qemu VM: 1.132 Mpps
> to 1.144 Mpps.
> 
> Details: AMD Ryzen 5 5600X at 4.3 GHz, 3200 MHz RAM, isolated QEMU
> threads, pktgen sender; Avg over 50 runs @ 100,000,000 packets;
> SRSO and spectre v2 mitigations disabled.
> 
> Co-developed-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Tim Gebauer <tim.gebauer@tu-dortmund.de>
> Signed-off-by: Simon Schippers <simon.schippers@tu-dortmund.de>
> ---
>  drivers/net/tun.c | 54 +++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 50 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index b183189f1853..00ecf128fe8e 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -145,6 +145,7 @@ struct tun_file {
>  	struct list_head next;
>  	struct tun_struct *detached;
>  	struct ptr_ring tx_ring;
> +	int cons_cnt;
>  	struct xdp_rxq_info xdp_rxq;
>  };
>  
> @@ -557,6 +558,13 @@ void tun_ptr_free(void *ptr)
>  }
>  EXPORT_SYMBOL_GPL(tun_ptr_free);
>  
> +static void tun_reset_cons_cnt(struct tun_file *tfile)
> +{
> +	spin_lock(&tfile->tx_ring.consumer_lock);
> +	tfile->cons_cnt = 0;
> +	spin_unlock(&tfile->tx_ring.consumer_lock);
> +}
> +
>  static void tun_queue_purge(struct tun_file *tfile)
>  {
>  	void *ptr;
> @@ -564,6 +572,7 @@ static void tun_queue_purge(struct tun_file *tfile)
>  	while ((ptr = ptr_ring_consume(&tfile->tx_ring)) != NULL)
>  		tun_ptr_free(ptr);
>  
> +	tun_reset_cons_cnt(tfile);
>  	skb_queue_purge(&tfile->sk.sk_write_queue);
>  	skb_queue_purge(&tfile->sk.sk_error_queue);
>  }
> @@ -730,6 +739,7 @@ static int tun_attach(struct tun_struct *tun, struct file *file,
>  		goto out;
>  	}
>  
> +	tun_reset_cons_cnt(tfile);
>  	tfile->queue_index = tun->numqueues;
>  	tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
>  
> @@ -2115,13 +2125,46 @@ static ssize_t tun_put_user(struct tun_struct *tun,
>  	return total;
>  }
>  
> -static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
> +/* Callers must hold ring.consumer_lock */
> +static void __tun_wake_queue(struct tun_struct *tun,
> +			     struct tun_file *tfile, int consumed)
> +{
> +	struct netdev_queue *txq = netdev_get_tx_queue(tun->dev,
> +						tfile->queue_index);
> +
> +	/* Paired with smp_mb__after_atomic() in tun_net_xmit() */
> +	smp_mb();
> +	if (netif_tx_queue_stopped(txq)) {
> +		tfile->cons_cnt += consumed;
> +		if (tfile->cons_cnt >= tfile->tx_ring.size / 2 ||
> +		    __ptr_ring_empty(&tfile->tx_ring)) {
> +			netif_tx_wake_queue(txq);
> +			tfile->cons_cnt = 0;
> +		}
> +	}
> +}
> +
> +static void *tun_ring_consume(struct tun_struct *tun, struct tun_file *tfile)
> +{
> +	void *ptr;
> +
> +	spin_lock(&tfile->tx_ring.consumer_lock);
> +	ptr = __ptr_ring_consume(&tfile->tx_ring);
> +	if (ptr)
> +		__tun_wake_queue(tun, tfile, 1);
> +
> +	spin_unlock(&tfile->tx_ring.consumer_lock);
> +	return ptr;
> +}
> +
> +static void *tun_ring_recv(struct tun_struct *tun, struct tun_file *tfile,
> +			   int noblock, int *err)
>  {
>  	DECLARE_WAITQUEUE(wait, current);
>  	void *ptr = NULL;
>  	int error = 0;
>  
> -	ptr = ptr_ring_consume(&tfile->tx_ring);
> +	ptr = tun_ring_consume(tun, tfile);
>  	if (ptr)
>  		goto out;
>  	if (noblock) {
> @@ -2133,7 +2176,7 @@ static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
>  
>  	while (1) {
>  		set_current_state(TASK_INTERRUPTIBLE);
> -		ptr = ptr_ring_consume(&tfile->tx_ring);
> +		ptr = tun_ring_consume(tun, tfile);
>  		if (ptr)
>  			break;
>  		if (signal_pending(current)) {


So based on commit log I expected all calls to ptr_ring_consume to
be replaced with tun_ring_consume, but it looks like tun_queue_purge
still calls ptr_ring_consume.
I suspect that together with patch 4 it can sometimes leave us stuck
with a stopped queue and an empty ring, forever.





> @@ -2170,7 +2213,7 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
>  
>  	if (!ptr) {
>  		/* Read frames from ring */
> -		ptr = tun_ring_recv(tfile, noblock, &err);
> +		ptr = tun_ring_recv(tun, tfile, noblock, &err);
>  		if (!ptr)
>  			return err;
>  	}
> @@ -3406,6 +3449,8 @@ static int tun_chr_open(struct inode *inode, struct file * file)
>  		return -ENOMEM;
>  	}
>  
> +	tun_reset_cons_cnt(tfile);
> +
>  	mutex_init(&tfile->napi_mutex);
>  	RCU_INIT_POINTER(tfile->tun, NULL);
>  	tfile->flags = 0;
> @@ -3614,6 +3659,7 @@ static int tun_queue_resize(struct tun_struct *tun)
>  	for (i = 0; i < tun->numqueues; i++) {
>  		tfile = rtnl_dereference(tun->tfiles[i]);
>  		rings[i] = &tfile->tx_ring;
> +		tun_reset_cons_cnt(tfile);
>  	}
>  	list_for_each_entry(tfile, &tun->disabled, next)
>  		rings[i++] = &tfile->tx_ring;
> -- 
> 2.43.0


^ permalink raw reply

* Re: [PATCH v5 net-next 02/15] dt-bindings: net: dsa: add NETC switch
From: Rob Herring (Arm) @ 2026-05-06 22:22 UTC (permalink / raw)
  To: Wei Fang
  Cc: linux-kernel, chleroy, edumazet, frank.li, kuba, netdev,
	vladimir.oltean, devicetree, linux, conor+dt, horms, davem,
	andrew+netdev, xiaoning.wang, pabeni, krzk+dt, linuxppc-dev,
	claudiu.manoil, f.fainelli, linux-arm-kernel, imx
In-Reply-To: <20260430024945.3413973-3-wei.fang@nxp.com>


On Thu, 30 Apr 2026 10:49:32 +0800, Wei Fang wrote:
> Add bindings for NETC switch. This switch is a PCIe function of NETC IP,
> it supports advanced QoS with 8 traffic classes and 4 drop resilience
> levels, and a full range of TSN standards capabilities. The switch CPU
> port connects to an internal ENETC port, which is also a PCIe function
> of NETC IP. So these two ports use a light-weight "pseudo MAC" instead
> of a back-to-back MAC, because the "pseudo MAC" provides the delineation
> between switch and ENETC, this translates to lower power (less logic and
> memory) and lower delay (as there is no serialization delay across this
> link).
> 
> Signed-off-by: Wei Fang <wei.fang@nxp.com>
> Reviewed-by: Frank Li <Frank.Li@nxp.com>
> ---
>  .../bindings/net/dsa/nxp,netc-switch.yaml     | 127 ++++++++++++++++++
>  1 file changed, 127 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/dsa/nxp,netc-switch.yaml
> 

Reviewed-by: Rob Herring (Arm) <robh@kernel.org>


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox