Netdev List
* [PATCH net v2 2/2] net: phy: micrel: remove ksz9131_resume()
From: Ovidiu Panait @ 2026-04-09  9:56 UTC
  To: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni,
	biju.das.jz
  Cc: netdev, linux-kernel, linux-renesas-soc, Ovidiu Panait
In-Reply-To: <20260409095633.70973-1-ovidiu.panait.rb@renesas.com>

ksz9131_resume() was added to restore RGMII delays on resume for platforms
where the PHY loses power during suspend to RAM. However, for s2idle, the
PHY stays in Software Power-Down (SPD) during resume. In that case,
ksz9131_config_rgmii_delay() accesses MMD registers before kszphy_resume()
clears BMCR_PDOWN. The KSZ9131 datasheet states that during SPD, access to
the MMD registers is restricted:

  - Only access to the standard registers (0 through 31) is supported.
  - Access to MMD address spaces other than MMD address space 1 is
    possible if the spd_clock_gate_override bit is set.
  - Access to MMD address space 1 is not possible.

Additionally, only RGMII delays were restored, while other settings
from ksz9131_config_init() were not.

Now that the preceding commit ("net: phylink: call phy_init_hw() in
phylink resume path") performs a phy_init_hw() during phylink resume,
ksz9131_resume() is no longer needed.

Remove it and use kszphy_resume() directly.

Fixes: f25a7eaa897f ("net: phy: micrel: Add ksz9131_resume()")
Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
---
 drivers/net/phy/micrel.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 2aa1dedd21b8..f2513109865a 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -6014,14 +6014,6 @@ static int lan8841_suspend(struct phy_device *phydev)
 	return kszphy_generic_suspend(phydev);
 }
 
-static int ksz9131_resume(struct phy_device *phydev)
-{
-	if (phydev->suspended && phy_interface_is_rgmii(phydev))
-		ksz9131_config_rgmii_delay(phydev);
-
-	return kszphy_resume(phydev);
-}
-
 #define LAN8842_PTP_GPIO_NUM 16
 
 static int lan8842_ptp_probe_once(struct phy_device *phydev)
@@ -6929,7 +6921,7 @@ static struct phy_driver ksphy_driver[] = {
 	.get_strings	= kszphy_get_strings,
 	.get_stats	= kszphy_get_stats,
 	.suspend	= kszphy_suspend,
-	.resume		= ksz9131_resume,
+	.resume		= kszphy_resume,
 	.cable_test_start	= ksz9x31_cable_test_start,
 	.cable_test_get_status	= ksz9x31_cable_test_get_status,
 	.get_features	= ksz9477_get_features,
-- 
2.34.1
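
For illustration only, not part of the patch: the datasheet rules quoted in
the commit message can be read as a small access predicate. The sketch below
is a hypothetical model, not driver code. The struct, its field names, and
the helper names are assumptions made for this example, as is the premise
that all MMD address spaces are reachable once the PHY has left SPD.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the PHY power state; not a kernel structure. */
struct spd_state {
	bool powered_down;        /* PHY is in Software Power-Down (SPD) */
	bool clock_gate_override; /* models the spd_clock_gate_override bit */
};

/* Standard registers 0 through 31 remain accessible, even during SPD. */
static bool std_reg_accessible(int regnum)
{
	return regnum >= 0 && regnum <= 31;
}

/* MMD access as described by the rules quoted in the commit message. */
static bool mmd_accessible(const struct spd_state *s, int devad)
{
	if (!s->powered_down)
		return true;   /* assumption: no restriction outside SPD */
	if (devad == 1)
		return false;  /* MMD address space 1: never during SPD */
	return s->clock_gate_override; /* other spaces need the override */
}
```

Under this model, any MMD write issued while the PHY is still powered down
and the override bit is clear is rejected, which is the situation the
removed ksz9131_resume() could run into on s2idle.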



* [PATCH net v2 1/2] net: phylink: call phy_init_hw() in phylink resume path
From: Ovidiu Panait @ 2026-04-09  9:56 UTC
  To: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni,
	biju.das.jz
  Cc: netdev, linux-kernel, linux-renesas-soc, Ovidiu Panait
In-Reply-To: <20260409095633.70973-1-ovidiu.panait.rb@renesas.com>

When mac_managed_pm flag is set, mdio_bus_phy_resume() is skipped,
so phy_init_hw(), which performs soft_reset and config_init, is not
called during resume.

This is inconsistent with the non-mac_managed_pm path, where
mdio_bus_phy_resume() calls phy_init_hw() before phy_resume()
on every resume.

Add phy_init_hw() calls in both phylink_prepare_resume() and
phylink_resume(), to ensure that the PHY state is the same as
when the PHY is resumed via the MDIO bus.

Signed-off-by: Ovidiu Panait <ovidiu.panait.rb@renesas.com>
---
 drivers/net/phy/phylink.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 087ac63f9193..c302126009f6 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -2669,8 +2669,10 @@ void phylink_prepare_resume(struct phylink *pl)
 	 * then resume the PHY. Note that 802.3 allows PHYs 500ms before
 	 * the clock meets requirements. We do not implement this delay.
 	 */
-	if (pl->config->mac_requires_rxc && phydev && phydev->suspended)
+	if (pl->config->mac_requires_rxc && phydev && phydev->suspended) {
+		phy_init_hw(phydev);
 		phy_resume(phydev);
+	}
 }
 EXPORT_SYMBOL_GPL(phylink_prepare_resume);
 
@@ -2683,6 +2685,8 @@ EXPORT_SYMBOL_GPL(phylink_prepare_resume);
  */
 void phylink_resume(struct phylink *pl)
 {
+	struct phy_device *phydev = pl->phydev;
+
 	ASSERT_RTNL();
 
 	if (phylink_phy_pm_speed_ctrl(pl))
@@ -2712,6 +2716,9 @@ void phylink_resume(struct phylink *pl)
 		/* Re-enable and re-resolve the link parameters */
 		phylink_enable_and_run_resolve(pl, PHYLINK_DISABLE_MAC_WOL);
 	} else {
+		if (phydev && phydev->suspended)
+			phy_init_hw(phydev);
+
 		phylink_start(pl);
 	}
 }
-- 
2.34.1



* [PATCH net v2 0/2] net: phylink: fix PHY reinitialization on resume
From: Ovidiu Panait @ 2026-04-09  9:56 UTC
  To: andrew, hkallweit1, linux, davem, edumazet, kuba, pabeni,
	biju.das.jz
  Cc: netdev, linux-kernel, linux-renesas-soc, Ovidiu Panait

When mac_managed_pm flag is set, mdio_bus_phy_resume() is skipped,
so phy_init_hw(), which performs soft_reset and config_init, is not
called during resume.

This is inconsistent with the non-mac_managed_pm path, where
mdio_bus_phy_resume() calls phy_init_hw() before phy_resume()
on every resume.

This series adds phy_init_hw() to the phylink resume path to ensure
consistent PHY state regardless of whether mac_managed_pm is set,
and removes a now-redundant workaround in the KSZ9131 PHY driver.

v2:
- Moved phy_init_hw() from ksz9131_resume() to phylink resume path, as
  suggested by Russell.

v1: https://lore.kernel.org/all/20260403111738.37749-1-ovidiu.panait.rb@renesas.com/#t

Ovidiu Panait (2):
  net: phylink: call phy_init_hw() in phylink resume path
  net: phy: micrel: remove ksz9131_resume()

 drivers/net/phy/micrel.c  | 10 +---------
 drivers/net/phy/phylink.c |  9 ++++++++-
 2 files changed, 9 insertions(+), 10 deletions(-)

-- 
2.34.1



* RE: [net-next,v3,01/13] net: renesas: rswitch: improve port change mode functions
From: Michael Dege @ 2026-04-09  9:55 UTC
  To: Paolo Abeni
  Cc: paul@pbarker.dev, niklas.soderlund, kuba@kernel.org,
	Yoshihiro Shimoda, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, edumazet@google.com,
	linux-renesas-soc@vger.kernel.org, andrew+netdev@lunn.ch,
	davem@davemloft.net
In-Reply-To: <20260402125912.234434-1-pabeni@redhat.com>

Hello Paolo,

Thank you for your review.

> -----Original Message-----
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, April 2, 2026 2:59 PM
> To: Michael Dege <michael.dege@renesas.com>
> Cc: paul@pbarker.dev; niklas.soderlund <niklas.soderlund@ragnatech.se>; kuba@kernel.org; Yoshihiro
> Shimoda <yoshihiro.shimoda.uh@renesas.com>; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> edumazet@google.com; pabeni@redhat.com; linux-renesas-soc@vger.kernel.org; andrew+netdev@lunn.ch;
> davem@davemloft.net
> Subject: Re: [net-next,v3,01/13] net: renesas: rswitch: improve port change mode functions
>
> This is an AI-generated review of your patch. The human sending this email has considered the AI
> review valid, or at least plausible.
>
> For local reproduction steps see
> https://netdev-ai.bots.linux.dev/ai-local.html
> ---
> net: renesas: rswitch: improve port change mode functions
>
> For ETHA and GWCA ports every mode change from operational to configuration and vice-versa requires to
> change the mode to "disable" first. This was done by calling the function twice. Let the change_mode
> function take care of this required step in a single call.
>
> > diff --git a/drivers/net/ethernet/renesas/rswitch_main.c b/drivers/net/ethernet/renesas/rswitch_main.c
> > index 6fe964816322..f6d1e610e7fa 100644
> > --- a/drivers/net/ethernet/renesas/rswitch_main.c
> > +++ b/drivers/net/ethernet/renesas/rswitch_main.c
> > @@ -176,14 +176,19 @@ static int rswitch_gwca_change_mode(struct rswitch_private *priv,
> >     if (!rswitch_agent_clock_is_enabled(priv->addr, priv->gwca.index))
> >             rswitch_agent_clock_ctrl(priv->addr, priv->gwca.index, 1);
> >
> > -   iowrite32(mode, priv->addr + GWMC);
> > +   iowrite32(GWMC_OPC_DISABLE, priv->addr + GWMC);
> >
> > -   ret = rswitch_reg_wait(priv->addr, GWMS, GWMS_OPS_MASK, mode);
> > +   ret = rswitch_reg_wait(priv->addr, GWMS, GWMS_OPS_MASK,
> > +                          GWMC_OPC_DISABLE);
> >
> > -   if (mode == GWMC_OPC_DISABLE)
> > +   if (mode == GWMC_OPC_DISABLE) {
> >             rswitch_agent_clock_ctrl(priv->addr, priv->gwca.index, 0);
> >
> > -   return ret;
> > +           return ret;
> > +   }
> > +
> > +   iowrite32(mode, priv->addr + GWMC);
> > +
> > +   return rswitch_reg_wait(priv->addr, GWMS, GWMS_OPS_MASK, mode);
>
> Should the error from the initial DISABLE transition be checked before proceeding? When mode !=
> GWMC_OPC_DISABLE, this code stores the result of
> rswitch_reg_wait() for the DISABLE transition in 'ret' but never checks it.
> If the hardware fails to reach DISABLE state, the function silently proceeds to write the target mode.
>
> Before this patch, callers explicitly checked the error after calling rswitch_gwca_change_mode(priv,
> GWMC_OPC_DISABLE):
>
> rswitch_gwca_hw_init() and rswitch_gwca_hw_deinit():
>     err = rswitch_gwca_change_mode(priv, GWMC_OPC_DISABLE);
>     if (err < 0)
>         return err;
>
> This patch removes those caller-side checks but doesn't replicate the error checking inside
> rswitch_gwca_change_mode(). If the first rswitch_reg_wait() returns -ETIMEDOUT, should the function
> return that error before attempting to write the target mode?

I have fixed this and will provide a new version of the patches.

Best regards,

Michael
>
> >  }
> >
> >  static int rswitch_gwca_mcast_table_reset(struct rswitch_private *priv)
> > @@ -682,9 +687,6 @@ static int rswitch_gwca_hw_init(struct rswitch_private *priv)
> >     unsigned int i;
> >     int err;
> >
> > -   err = rswitch_gwca_change_mode(priv, GWMC_OPC_DISABLE);
> > -   if (err < 0)
> > -           return err;
> >     err = rswitch_gwca_change_mode(priv, GWMC_OPC_CONFIG);
> >     if (err < 0)
> >             return err;
>
> [ ... ]
>
> > @@ -717,9 +719,6 @@ static int rswitch_gwca_hw_init(struct rswitch_private *priv)
> >                     return err;
> >     }
> >
> > -   err = rswitch_gwca_change_mode(priv, GWMC_OPC_DISABLE);
> > -   if (err < 0)
> > -           return err;
> >     return rswitch_gwca_change_mode(priv, GWMC_OPC_OPERATION);
> >  }
> >
> > @@ -727,9 +726,6 @@ static int rswitch_gwca_hw_deinit(struct rswitch_private *priv)
> >  {
> >     int err;
> >
> > -   err = rswitch_gwca_change_mode(priv, GWMC_OPC_DISABLE);
> > -   if (err < 0)
> > -           return err;
> >     err = rswitch_gwca_change_mode(priv, GWMC_OPC_RESET);
> >     if (err < 0)
> >             return err;
>
> [ ... ]
>
> > @@ -1116,14 +1112,18 @@ static int rswitch_etha_change_mode(struct rswitch_etha *etha,
> >     if (!rswitch_agent_clock_is_enabled(etha->coma_addr, etha->index))
> >             rswitch_agent_clock_ctrl(etha->coma_addr, etha->index, 1);
> >
> > -   iowrite32(mode, etha->addr + EAMC);
> > -
> > -   ret = rswitch_reg_wait(etha->addr, EAMS, EAMS_OPS_MASK, mode);
> > +   iowrite32(EAMC_OPC_DISABLE, etha->addr + EAMC);
> > +   ret = rswitch_reg_wait(etha->addr, EAMS, EAMS_OPS_MASK,
> > +                          EAMC_OPC_DISABLE);
> >
> > -   if (mode == EAMC_OPC_DISABLE)
> > +   if (mode == EAMC_OPC_DISABLE) {
> >             rswitch_agent_clock_ctrl(etha->coma_addr, etha->index, 0);
> >
> > -   return ret;
> > +           return ret;
> > +   }
> > +
> > +   iowrite32(mode, etha->addr + EAMC);
> > +
> > +   return rswitch_reg_wait(etha->addr, EAMS, EAMS_OPS_MASK, mode);
>
> Same pattern here in rswitch_etha_change_mode(). When mode != EAMC_OPC_DISABLE, the return value from
> rswitch_reg_wait() for the intermediate DISABLE transition is stored in 'ret' but never checked.
>
> Before this patch, rswitch_etha_hw_init() explicitly called rswitch_etha_change_mode(etha,
> EAMC_OPC_DISABLE) with error checking before calling it with EAMC_OPC_CONFIG. This patch removes that
> caller-side check but doesn't replicate it inside the function. Should there be a check like 'if (ret
> < 0) return ret;' after the first rswitch_reg_wait()?
>
> >  }
> >
> >  static void rswitch_etha_read_mac_address(struct rswitch_etha *etha)
>
> [ ... ]



* [net-next PATCH v5 4/4] octeontx2-pf: cn20k: Use unified Halo context
From: Subbaraya Sundeep @ 2026-04-09  9:53 UTC
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2
  Cc: netdev, linux-kernel, Subbaraya Sundeep
In-Reply-To: <1775728404-28451-1-git-send-email-sbhatta@marvell.com>

Use unified Halo context present in CN20K hardware for
octeontx2 netdevs instead of aura and pool contexts.
Note that with this halo context in place RQ backpressure
is not being configured and the same will be supported
later.

Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
 .../ethernet/marvell/octeontx2/nic/cn20k.c    | 215 +++++++++---------
 .../ethernet/marvell/octeontx2/nic/cn20k.h    |   3 +
 .../marvell/octeontx2/nic/otx2_common.h       |   3 +
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |   8 +
 4 files changed, 126 insertions(+), 103 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.c b/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.c
index a5a8f4558717..f513e9ffc2dd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.c
@@ -242,15 +242,6 @@ int cn20k_register_pfvf_mbox_intr(struct otx2_nic *pf, int numvfs)
 
 #define RQ_BP_LVL_AURA   (255 - ((85 * 256) / 100)) /* BP when 85% is full */
 
-static u8 cn20k_aura_bpid_idx(struct otx2_nic *pfvf, int aura_id)
-{
-#ifdef CONFIG_DCB
-	return pfvf->queue_to_pfc_map[aura_id];
-#else
-	return 0;
-#endif
-}
-
 static int cn20k_tc_get_entry_index(struct otx2_flow_config *flow_cfg,
 				    struct otx2_tc_flow *node)
 {
@@ -517,84 +508,7 @@ int cn20k_tc_alloc_entry(struct otx2_nic *nic,
 	return 0;
 }
 
-static int cn20k_aura_aq_init(struct otx2_nic *pfvf, int aura_id,
-			      int pool_id, int numptrs)
-{
-	struct npa_cn20k_aq_enq_req *aq;
-	struct otx2_pool *pool;
-	u8 bpid_idx;
-	int err;
-
-	pool = &pfvf->qset.pool[pool_id];
-
-	/* Allocate memory for HW to update Aura count.
-	 * Alloc one cache line, so that it fits all FC_STYPE modes.
-	 */
-	if (!pool->fc_addr) {
-		err = qmem_alloc(pfvf->dev, &pool->fc_addr, 1, OTX2_ALIGN);
-		if (err)
-			return err;
-	}
-
-	/* Initialize this aura's context via AF */
-	aq = otx2_mbox_alloc_msg_npa_cn20k_aq_enq(&pfvf->mbox);
-	if (!aq) {
-		/* Shared mbox memory buffer is full, flush it and retry */
-		err = otx2_sync_mbox_msg(&pfvf->mbox);
-		if (err)
-			return err;
-		aq = otx2_mbox_alloc_msg_npa_cn20k_aq_enq(&pfvf->mbox);
-		if (!aq)
-			return -ENOMEM;
-	}
-
-	aq->aura_id = aura_id;
-
-	/* Will be filled by AF with correct pool context address */
-	aq->aura.pool_addr = pool_id;
-	aq->aura.pool_caching = 1;
-	aq->aura.shift = ilog2(numptrs) - 8;
-	aq->aura.count = numptrs;
-	aq->aura.limit = numptrs;
-	aq->aura.avg_level = 255;
-	aq->aura.ena = 1;
-	aq->aura.fc_ena = 1;
-	aq->aura.fc_addr = pool->fc_addr->iova;
-	aq->aura.fc_hyst_bits = 0; /* Store count on all updates */
-
-	/* Enable backpressure for RQ aura */
-	if (aura_id < pfvf->hw.rqpool_cnt && !is_otx2_lbkvf(pfvf->pdev)) {
-		aq->aura.bp_ena = 0;
-		/* If NIX1 LF is attached then specify NIX1_RX.
-		 *
-		 * Below NPA_AURA_S[BP_ENA] is set according to the
-		 * NPA_BPINTF_E enumeration given as:
-		 * 0x0 + a*0x1 where 'a' is 0 for NIX0_RX and 1 for NIX1_RX so
-		 * NIX0_RX is 0x0 + 0*0x1 = 0
-		 * NIX1_RX is 0x0 + 1*0x1 = 1
-		 * But in HRM it is given that
-		 * "NPA_AURA_S[BP_ENA](w1[33:32]) - Enable aura backpressure to
-		 * NIX-RX based on [BP] level. One bit per NIX-RX; index
-		 * enumerated by NPA_BPINTF_E."
-		 */
-		if (pfvf->nix_blkaddr == BLKADDR_NIX1)
-			aq->aura.bp_ena = 1;
-
-		bpid_idx = cn20k_aura_bpid_idx(pfvf, aura_id);
-		aq->aura.bpid = pfvf->bpid[bpid_idx];
-
-		/* Set backpressure level for RQ's Aura */
-		aq->aura.bp = RQ_BP_LVL_AURA;
-	}
-
-	/* Fill AQ info */
-	aq->ctype = NPA_AQ_CTYPE_AURA;
-	aq->op = NPA_AQ_INSTOP_INIT;
-
-	return 0;
-}
-
-static int cn20k_pool_aq_init(struct otx2_nic *pfvf, u16 pool_id,
+static int cn20k_halo_aq_init(struct otx2_nic *pfvf, u16 pool_id,
 			      int stack_pages, int numptrs, int buf_size,
 			      int type)
 {
@@ -610,36 +524,57 @@ static int cn20k_pool_aq_init(struct otx2_nic *pfvf, u16 pool_id,
 	if (err)
 		return err;
 
+	/* Allocate memory for HW to update Aura count.
+	 * Alloc one cache line, so that it fits all FC_STYPE modes.
+	 */
+	if (!pool->fc_addr) {
+		err = qmem_alloc(pfvf->dev, &pool->fc_addr, 1, OTX2_ALIGN);
+		if (err) {
+			qmem_free(pfvf->dev, pool->stack);
+			return err;
+		}
+	}
+
 	pool->rbsize = buf_size;
 
-	/* Initialize this pool's context via AF */
+	/* Initialize this aura's context via AF */
 	aq = otx2_mbox_alloc_msg_npa_cn20k_aq_enq(&pfvf->mbox);
 	if (!aq) {
 		/* Shared mbox memory buffer is full, flush it and retry */
 		err = otx2_sync_mbox_msg(&pfvf->mbox);
-		if (err) {
-			qmem_free(pfvf->dev, pool->stack);
-			return err;
-		}
+		if (err)
+			goto free_mem;
 		aq = otx2_mbox_alloc_msg_npa_cn20k_aq_enq(&pfvf->mbox);
 		if (!aq) {
-			qmem_free(pfvf->dev, pool->stack);
-			return -ENOMEM;
+			err = -ENOMEM;
+			goto free_mem;
 		}
 	}
 
 	aq->aura_id = pool_id;
-	aq->pool.stack_base = pool->stack->iova;
-	aq->pool.stack_caching = 1;
-	aq->pool.ena = 1;
-	aq->pool.buf_size = buf_size / 128;
-	aq->pool.stack_max_pages = stack_pages;
-	aq->pool.shift = ilog2(numptrs) - 8;
-	aq->pool.ptr_start = 0;
-	aq->pool.ptr_end = ~0ULL;
+
+	aq->halo.stack_base = pool->stack->iova;
+	aq->halo.stack_caching = 1;
+	aq->halo.ena = 1;
+	aq->halo.buf_size = buf_size / 128;
+	aq->halo.stack_max_pages = stack_pages;
+	aq->halo.shift = ilog2(numptrs) - 8;
+	aq->halo.ptr_start = 0;
+	aq->halo.ptr_end = ~0ULL;
+
+	aq->halo.avg_level = 255;
+	aq->halo.fc_ena = 1;
+	aq->halo.fc_addr = pool->fc_addr->iova;
+	aq->halo.fc_hyst_bits = 0; /* Store count on all updates */
+
+	if (pfvf->npa_dpc_valid) {
+		aq->halo.op_dpc_ena = 1;
+		aq->halo.op_dpc_set = pfvf->npa_dpc;
+	}
+	aq->halo.unified_ctx = 1;
 
 	/* Fill AQ info */
-	aq->ctype = NPA_AQ_CTYPE_POOL;
+	aq->ctype = NPA_AQ_CTYPE_HALO;
 	aq->op = NPA_AQ_INSTOP_INIT;
 
 	if (type != AURA_NIX_RQ) {
@@ -661,6 +596,80 @@ static int cn20k_pool_aq_init(struct otx2_nic *pfvf, u16 pool_id,
 	}
 
 	return 0;
+
+free_mem:
+	qmem_free(pfvf->dev, pool->stack);
+	qmem_free(pfvf->dev, pool->fc_addr);
+	return err;
+}
+
+static int cn20k_aura_aq_init(struct otx2_nic *pfvf, int aura_id,
+			      int pool_id, int numptrs)
+{
+	return 0;
+}
+
+static int cn20k_pool_aq_init(struct otx2_nic *pfvf, u16 pool_id,
+			      int stack_pages, int numptrs, int buf_size,
+			      int type)
+{
+	return cn20k_halo_aq_init(pfvf, pool_id, stack_pages,
+				  numptrs, buf_size, type);
+}
+
+int cn20k_npa_alloc_dpc(struct otx2_nic *nic)
+{
+	struct npa_cn20k_dpc_alloc_req *req;
+	struct npa_cn20k_dpc_alloc_rsp *rsp;
+	int err;
+
+	req = otx2_mbox_alloc_msg_npa_cn20k_dpc_alloc(&nic->mbox);
+	if (!req)
+		return -ENOMEM;
+
+	/* Count successful ALLOC requests only */
+	req->dpc_conf = 1ULL << 4;
+
+	err = otx2_sync_mbox_msg(&nic->mbox);
+	if (err)
+		return err;
+
+	rsp = (struct npa_cn20k_dpc_alloc_rsp *)otx2_mbox_get_rsp(&nic->mbox.mbox,
+								  0, &req->hdr);
+	if (IS_ERR(rsp))
+		return PTR_ERR(rsp);
+
+	nic->npa_dpc = rsp->cntr_id;
+	nic->npa_dpc_valid = true;
+
+	return 0;
+}
+
+int cn20k_npa_free_dpc(struct otx2_nic *nic)
+{
+	struct npa_cn20k_dpc_free_req *req;
+	int err;
+
+	if (!nic->npa_dpc_valid)
+		return 0;
+
+	mutex_lock(&nic->mbox.lock);
+
+	req = otx2_mbox_alloc_msg_npa_cn20k_dpc_free(&nic->mbox);
+	if (!req) {
+		mutex_unlock(&nic->mbox.lock);
+		return -ENOMEM;
+	}
+
+	req->cntr_id = nic->npa_dpc;
+
+	err = otx2_sync_mbox_msg(&nic->mbox);
+
+	nic->npa_dpc_valid = false;
+
+	mutex_unlock(&nic->mbox.lock);
+
+	return err;
 }
 
 static int cn20k_sq_aq_init(void *dev, u16 qidx, u8 chan_offset, u16 sqb_aura)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.h b/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.h
index b5e527f6d7eb..16a69d84ea79 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/cn20k.h
@@ -28,4 +28,7 @@ int cn20k_tc_alloc_entry(struct otx2_nic *nic,
 			 struct otx2_tc_flow *new_node,
 			 struct npc_install_flow_req *dummy);
 int cn20k_tc_free_mcam_entry(struct otx2_nic *nic, u16 entry);
+int cn20k_npa_alloc_dpc(struct otx2_nic *nic);
+int cn20k_npa_free_dpc(struct otx2_nic *nic);
+
 #endif /* CN20K_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
index eecee612b7b2..f997dfc0fedd 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.h
@@ -592,6 +592,9 @@ struct otx2_nic {
 	struct cn10k_ipsec	ipsec;
 	/* af_xdp zero-copy */
 	unsigned long		*af_xdp_zc_qidx;
+
+	bool			npa_dpc_valid;
+	u8			npa_dpc; /* NPA DPC counter id */
 };
 
 static inline bool is_otx2_lbkvf(struct pci_dev *pdev)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index ee623476e5ff..2b5fe67d297c 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1651,6 +1651,9 @@ int otx2_init_hw_resources(struct otx2_nic *pf)
 	if (!is_otx2_lbkvf(pf->pdev))
 		otx2_nix_config_bp(pf, true);
 
+	if (is_cn20k(pf->pdev))
+		cn20k_npa_alloc_dpc(pf);
+
 	/* Init Auras and pools used by NIX RQ, for free buffer ptrs */
 	err = otx2_rq_aura_pool_init(pf);
 	if (err) {
@@ -1726,6 +1729,8 @@ int otx2_init_hw_resources(struct otx2_nic *pf)
 	otx2_ctx_disable(mbox, NPA_AQ_CTYPE_AURA, true);
 	otx2_aura_pool_free(pf);
 err_free_nix_lf:
+	if (pf->npa_dpc_valid)
+		cn20k_npa_free_dpc(pf);
 	mutex_lock(&mbox->lock);
 	free_req = otx2_mbox_alloc_msg_nix_lf_free(mbox);
 	if (free_req) {
@@ -1790,6 +1795,9 @@ void otx2_free_hw_resources(struct otx2_nic *pf)
 
 	otx2_free_sq_res(pf);
 
+	if (is_cn20k(pf->pdev))
+		cn20k_npa_free_dpc(pf);
+
 	/* Free RQ buffer pointers*/
 	otx2_free_aura_ptr(pf, AURA_NIX_RQ);
 
-- 
2.48.1
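
For illustration only, not part of the patch: two of the halo context fields
programmed in cn20k_halo_aq_init() are derived values rather than raw sizes.
The diff writes buf_size as the byte size divided by 128 and shift as
ilog2(numptrs) - 8. The sketch below reproduces that arithmetic in plain C.
The helper names are hypothetical, toy_ilog2() is a stand-in for the
kernel's ilog2(), and the shift helper assumes numptrs is at least 256 so
the subtraction cannot go negative.

```c
#include <assert.h>

/* Integer log2 for positive values; stand-in for the kernel's ilog2(). */
static unsigned int toy_ilog2(unsigned int v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* buf_size is stored in units of 128 bytes (buf_size / 128 in the diff). */
static unsigned int halo_buf_size_field(unsigned int bytes)
{
	return bytes / 128;
}

/* shift is ilog2(numptrs) - 8; assumes numptrs >= 256. */
static unsigned int halo_shift_field(unsigned int numptrs)
{
	return toy_ilog2(numptrs) - 8;
}
```

For example, a 4096-byte buffer is encoded as 32, and 1024 pointers give a
shift of 2.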



* [net-next PATCH v5 3/4] octeontx2-af: npa: cn20k: Add debugfs for Halo
From: Subbaraya Sundeep @ 2026-04-09  9:53 UTC
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2
  Cc: netdev, linux-kernel, Linu Cherian, Subbaraya Sundeep
In-Reply-To: <1775728404-28451-1-git-send-email-sbhatta@marvell.com>

From: Linu Cherian <lcherian@marvell.com>

Similar to other hardware contexts add debugfs support for
unified Halo context.

Sample output on cn20k::
/sys/kernel/debug/cn20k/npa # cat halo_ctx
======halo : 2=======
W0: Stack base          ffffff790000
W1: ena                 1
W1: nat_align           0
W1: stack_caching       1
W1: aura drop ena       0
W1: aura drop           0
W1: buf_offset          0
W1: buf_size            32
W1: ref_cnt_prof                0
W2: stack_max_pages     13
W2: stack_pages         11
W3: bp_0                0
W3: bp_1                0
W3: bp_2                0

snip ..

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
 .../marvell/octeontx2/af/cn20k/debugfs.c      | 60 ++++++++++++++++
 .../marvell/octeontx2/af/cn20k/debugfs.h      |  2 +
 .../marvell/octeontx2/af/rvu_debugfs.c        | 71 ++++++++++++++++---
 3 files changed, 125 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
index 3debf2fae1a4..c0cfd3a39c23 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
@@ -489,3 +489,63 @@ void print_npa_cn20k_pool_ctx(struct seq_file *m,
 		   pool->thresh_qint_idx, pool->err_qint_idx);
 	seq_printf(m, "W8: fc_msh_dst\t\t%d\n", pool->fc_msh_dst);
 }
+
+void print_npa_cn20k_halo_ctx(struct seq_file *m, struct npa_aq_enq_rsp *rsp)
+{
+	struct npa_cn20k_aq_enq_rsp *cn20k_rsp;
+	struct npa_cn20k_halo_s *halo;
+
+	cn20k_rsp = (struct npa_cn20k_aq_enq_rsp *)rsp;
+	halo = &cn20k_rsp->halo;
+
+	seq_printf(m, "W0: Stack base\t\t%llx\n", halo->stack_base);
+
+	seq_printf(m, "W1: ena \t\t%d\nW1: nat_align \t\t%d\n",
+		   halo->ena, halo->nat_align);
+	seq_printf(m, "W1: stack_caching\t%d\n",
+		   halo->stack_caching);
+	seq_printf(m, "W1: aura drop ena\t%d\n", halo->aura_drop_ena);
+	seq_printf(m, "W1: aura drop\t\t%d\n", halo->aura_drop);
+	seq_printf(m, "W1: buf_offset\t\t%d\nW1: buf_size\t\t%d\n",
+		   halo->buf_offset, halo->buf_size);
+	seq_printf(m, "W1: ref_cnt_prof\t\t%d\n", halo->ref_cnt_prof);
+	seq_printf(m, "W2: stack_max_pages \t%d\nW2: stack_pages\t\t%d\n",
+		   halo->stack_max_pages, halo->stack_pages);
+	seq_printf(m, "W3: bp_0\t\t%d\nW3: bp_1\t\t%d\nW3: bp_2\t\t%d\n",
+		   halo->bp_0, halo->bp_1, halo->bp_2);
+	seq_printf(m, "W3: bp_3\t\t%d\nW3: bp_4\t\t%d\nW3: bp_5\t\t%d\n",
+		   halo->bp_3, halo->bp_4, halo->bp_5);
+	seq_printf(m, "W3: bp_6\t\t%d\nW3: bp_7\t\t%d\nW3: bp_ena_0\t\t%d\n",
+		   halo->bp_6, halo->bp_7, halo->bp_ena_0);
+	seq_printf(m, "W3: bp_ena_1\t\t%d\nW3: bp_ena_2\t\t%d\n",
+		   halo->bp_ena_1, halo->bp_ena_2);
+	seq_printf(m, "W3: bp_ena_3\t\t%d\nW3: bp_ena_4\t\t%d\n",
+		   halo->bp_ena_3, halo->bp_ena_4);
+	seq_printf(m, "W3: bp_ena_5\t\t%d\nW3: bp_ena_6\t\t%d\n",
+		   halo->bp_ena_5, halo->bp_ena_6);
+	seq_printf(m, "W3: bp_ena_7\t\t%d\n", halo->bp_ena_7);
+	seq_printf(m, "W4: stack_offset\t%d\nW4: shift\t\t%d\nW4: avg_level\t\t%d\n",
+		   halo->stack_offset, halo->shift, halo->avg_level);
+	seq_printf(m, "W4: avg_con \t\t%d\nW4: fc_ena\t\t%d\nW4: fc_stype\t\t%d\n",
+		   halo->avg_con, halo->fc_ena, halo->fc_stype);
+	seq_printf(m, "W4: fc_hyst_bits\t%d\nW4: fc_up_crossing\t%d\n",
+		   halo->fc_hyst_bits, halo->fc_up_crossing);
+	seq_printf(m, "W4: update_time\t\t%d\n", halo->update_time);
+	seq_printf(m, "W5: fc_addr\t\t%llx\n", halo->fc_addr);
+	seq_printf(m, "W6: ptr_start\t\t%llx\n", halo->ptr_start);
+	seq_printf(m, "W7: ptr_end\t\t%llx\n", halo->ptr_end);
+	seq_printf(m, "W8: bpid_0\t\t%d\n", halo->bpid_0);
+	seq_printf(m, "W8: err_int \t\t%d\nW8: err_int_ena\t\t%d\n",
+		   halo->err_int, halo->err_int_ena);
+	seq_printf(m, "W8: thresh_int\t\t%d\nW8: thresh_int_ena \t%d\n",
+		   halo->thresh_int, halo->thresh_int_ena);
+	seq_printf(m, "W8: thresh_up\t\t%d\nW8: thresh_qint_idx\t%d\n",
+		   halo->thresh_up, halo->thresh_qint_idx);
+	seq_printf(m, "W8: err_qint_idx \t%d\n", halo->err_qint_idx);
+	seq_printf(m, "W9: thresh\t\t%llu\n", (u64)halo->thresh);
+	seq_printf(m, "W9: fc_msh_dst\t\t%d\n", halo->fc_msh_dst);
+	seq_printf(m, "W9: op_dpc_ena\t\t%d\nW9: op_dpc_set\t\t%d\n",
+		   halo->op_dpc_ena, halo->op_dpc_set);
+	seq_printf(m, "W9: stream_ctx\t\t%d\nW9: unified_ctx\t\t%d\n",
+		   halo->stream_ctx, halo->unified_ctx);
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.h
index 0c5f05883666..7e00c7499e35 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.h
@@ -27,5 +27,7 @@ void print_npa_cn20k_aura_ctx(struct seq_file *m,
 			      struct npa_cn20k_aq_enq_rsp *rsp);
 void print_npa_cn20k_pool_ctx(struct seq_file *m,
 			      struct npa_cn20k_aq_enq_rsp *rsp);
+void print_npa_cn20k_halo_ctx(struct seq_file *m,
+			      struct npa_aq_enq_rsp *rsp);
 
 #endif
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
index fa461489acdd..0ac59103b4a4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
@@ -968,6 +968,9 @@ static void print_npa_qsize(struct seq_file *m, struct rvu_pfvf *pfvf)
 		seq_printf(m, "Aura count : %d\n", pfvf->aura_ctx->qsize);
 		seq_printf(m, "Aura context ena/dis bitmap : %*pb\n",
 			   pfvf->aura_ctx->qsize, pfvf->aura_bmap);
+		if (pfvf->halo_bmap)
+			seq_printf(m, "Halo context ena/dis bitmap : %*pb\n",
+				   pfvf->aura_ctx->qsize, pfvf->halo_bmap);
 	}
 
 	if (!pfvf->pool_ctx) {
@@ -1195,6 +1198,20 @@ static void print_npa_pool_ctx(struct seq_file *m, struct npa_aq_enq_rsp *rsp)
 		seq_printf(m, "W8: fc_msh_dst\t\t%d\n", pool->fc_msh_dst);
 }
 
+static const char *npa_ctype_str(int ctype)
+{
+	switch (ctype) {
+	case NPA_AQ_CTYPE_AURA:
+		return "aura";
+	case NPA_AQ_CTYPE_HALO:
+		return "halo";
+	case NPA_AQ_CTYPE_POOL:
+		return "pool";
+	default:
+		return "unknown";
+	}
+}
+
 /* Reads aura/pool's ctx from admin queue */
 static int rvu_dbg_npa_ctx_display(struct seq_file *m, void *unused, int ctype)
 {
@@ -1211,6 +1228,7 @@ static int rvu_dbg_npa_ctx_display(struct seq_file *m, void *unused, int ctype)
 
 	switch (ctype) {
 	case NPA_AQ_CTYPE_AURA:
+	case NPA_AQ_CTYPE_HALO:
 		npalf = rvu->rvu_dbg.npa_aura_ctx.lf;
 		id = rvu->rvu_dbg.npa_aura_ctx.id;
 		all = rvu->rvu_dbg.npa_aura_ctx.all;
@@ -1235,6 +1253,9 @@ static int rvu_dbg_npa_ctx_display(struct seq_file *m, void *unused, int ctype)
 	} else if (ctype == NPA_AQ_CTYPE_POOL && !pfvf->pool_ctx) {
 		seq_puts(m, "Pool context is not initialized\n");
 		return -EINVAL;
+	} else if (ctype == NPA_AQ_CTYPE_HALO && !pfvf->aura_ctx) {
+		seq_puts(m, "Halo context is not initialized\n");
+		return -EINVAL;
 	}
 
 	memset(&aq_req, 0, sizeof(struct npa_aq_enq_req));
@@ -1244,6 +1265,9 @@ static int rvu_dbg_npa_ctx_display(struct seq_file *m, void *unused, int ctype)
 	if (ctype == NPA_AQ_CTYPE_AURA) {
 		max_id = pfvf->aura_ctx->qsize;
 		print_npa_ctx = print_npa_aura_ctx;
+	} else if (ctype == NPA_AQ_CTYPE_HALO) {
+		max_id = pfvf->aura_ctx->qsize;
+		print_npa_ctx = print_npa_cn20k_halo_ctx;
 	} else {
 		max_id = pfvf->pool_ctx->qsize;
 		print_npa_ctx = print_npa_pool_ctx;
@@ -1251,8 +1275,7 @@ static int rvu_dbg_npa_ctx_display(struct seq_file *m, void *unused, int ctype)
 
 	if (id < 0 || id >= max_id) {
 		seq_printf(m, "Invalid %s, valid range is 0-%d\n",
-			   (ctype == NPA_AQ_CTYPE_AURA) ? "aura" : "pool",
-			max_id - 1);
+			   npa_ctype_str(ctype), max_id - 1);
 		return -EINVAL;
 	}
 
@@ -1265,12 +1288,19 @@ static int rvu_dbg_npa_ctx_display(struct seq_file *m, void *unused, int ctype)
 		aq_req.aura_id = aura;
 
 		/* Skip if queue is uninitialized */
+		if (ctype == NPA_AQ_CTYPE_AURA &&
+		    !test_bit(aura, pfvf->aura_bmap))
+			continue;
+
+		if (ctype == NPA_AQ_CTYPE_HALO &&
+		    !test_bit(aura, pfvf->halo_bmap))
+			continue;
+
 		if (ctype == NPA_AQ_CTYPE_POOL && !test_bit(aura, pfvf->pool_bmap))
 			continue;
 
-		seq_printf(m, "======%s : %d=======\n",
-			   (ctype == NPA_AQ_CTYPE_AURA) ? "AURA" : "POOL",
-			aq_req.aura_id);
+		seq_printf(m, "======%s : %d=======\n", npa_ctype_str(ctype),
+			   aq_req.aura_id);
 		rc = rvu_npa_aq_enq_inst(rvu, &aq_req, &rsp);
 		if (rc) {
 			seq_puts(m, "Failed to read context\n");
@@ -1299,6 +1329,12 @@ static int write_npa_ctx(struct rvu *rvu, bool all,
 			return -EINVAL;
 		}
 		max_id = pfvf->aura_ctx->qsize;
+	} else if (ctype == NPA_AQ_CTYPE_HALO) {
+		if (!pfvf->aura_ctx) {
+			dev_warn(rvu->dev, "Halo context is not initialized\n");
+			return -EINVAL;
+		}
+		max_id = pfvf->aura_ctx->qsize;
 	} else if (ctype == NPA_AQ_CTYPE_POOL) {
 		if (!pfvf->pool_ctx) {
 			dev_warn(rvu->dev, "Pool context is not initialized\n");
@@ -1309,13 +1345,14 @@ static int write_npa_ctx(struct rvu *rvu, bool all,
 
 	if (id < 0 || id >= max_id) {
 		dev_warn(rvu->dev, "Invalid %s, valid range is 0-%d\n",
-			 (ctype == NPA_AQ_CTYPE_AURA) ? "aura" : "pool",
+			 npa_ctype_str(ctype),
 			max_id - 1);
 		return -EINVAL;
 	}
 
 	switch (ctype) {
 	case NPA_AQ_CTYPE_AURA:
+	case NPA_AQ_CTYPE_HALO:
 		rvu->rvu_dbg.npa_aura_ctx.lf = npalf;
 		rvu->rvu_dbg.npa_aura_ctx.id = id;
 		rvu->rvu_dbg.npa_aura_ctx.all = all;
@@ -1374,12 +1411,12 @@ static ssize_t rvu_dbg_npa_ctx_write(struct file *filp,
 				     const char __user *buffer,
 				     size_t count, loff_t *ppos, int ctype)
 {
-	char *cmd_buf, *ctype_string = (ctype == NPA_AQ_CTYPE_AURA) ?
-					"aura" : "pool";
+	const char *ctype_string = npa_ctype_str(ctype);
 	struct seq_file *seqfp = filp->private_data;
 	struct rvu *rvu = seqfp->private;
 	int npalf, id = 0, ret;
 	bool all = false;
+	char *cmd_buf;
 
 	if ((*ppos != 0) || !count)
 		return -EINVAL;
@@ -1417,6 +1454,21 @@ static int rvu_dbg_npa_aura_ctx_display(struct seq_file *filp, void *unused)
 
 RVU_DEBUG_SEQ_FOPS(npa_aura_ctx, npa_aura_ctx_display, npa_aura_ctx_write);
 
+static ssize_t rvu_dbg_npa_halo_ctx_write(struct file *filp,
+					  const char __user *buffer,
+					  size_t count, loff_t *ppos)
+{
+	return rvu_dbg_npa_ctx_write(filp, buffer, count, ppos,
+				     NPA_AQ_CTYPE_HALO);
+}
+
+static int rvu_dbg_npa_halo_ctx_display(struct seq_file *filp, void *unused)
+{
+	return rvu_dbg_npa_ctx_display(filp, unused, NPA_AQ_CTYPE_HALO);
+}
+
+RVU_DEBUG_SEQ_FOPS(npa_halo_ctx, npa_halo_ctx_display, npa_halo_ctx_write);
+
 static ssize_t rvu_dbg_npa_pool_ctx_write(struct file *filp,
 					  const char __user *buffer,
 					  size_t count, loff_t *ppos)
@@ -2798,6 +2850,9 @@ static void rvu_dbg_npa_init(struct rvu *rvu)
 			    &rvu_dbg_npa_qsize_fops);
 	debugfs_create_file("aura_ctx", 0600, rvu->rvu_dbg.npa, rvu,
 			    &rvu_dbg_npa_aura_ctx_fops);
+	if (is_cn20k(rvu->pdev))
+		debugfs_create_file("halo_ctx", 0600, rvu->rvu_dbg.npa, rvu,
+				    &rvu_dbg_npa_halo_ctx_fops);
 	debugfs_create_file("pool_ctx", 0600, rvu->rvu_dbg.npa, rvu,
 			    &rvu_dbg_npa_pool_ctx_fops);
 
-- 
2.48.1


^ permalink raw reply related

* [net-next PATCH v5 2/4] octeontx2-af: npa: cn20k: Add DPC support
From: Subbaraya Sundeep @ 2026-04-09  9:53 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2
  Cc: netdev, linux-kernel, Linu Cherian, Subbaraya Sundeep
In-Reply-To: <1775728404-28451-1-git-send-email-sbhatta@marvell.com>

From: Linu Cherian <lcherian@marvell.com>

CN20K introduces 32 diagnostic and performance counters (DPC)
that are shared across all NPA LFs.

Because the counters are shared, each PF driver needs to request
a counter from the AF with the required configuration, so that
the AF can allocate a counter and map it to the respective LF
with that configuration.

Add new mbox messages, npa_cn20k_dpc_alloc/free, to handle this.

Also ensure that all LF to DPC counter mappings are cleared
at the time of LF free/teardown.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
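Illustration only, not part of the patch: a minimal userspace sketch of
the permit-register indexing that npa_cn20k_dpc_alloc() and
npa_cn20k_dpc_free() perform below. Each counter owns two 64-bit permit
registers, one for LFs 0-63 and one for LFs 64-127. The helper names
dpc_permit_ridx() and dpc_permit_mask() are hypothetical; only the
constant mirrors NPA_DPC_LFS_PER_REG from this patch.

```c
#include <stdint.h>

#define NPA_DPC_LFS_PER_REG 64 /* mirrors the value added to cn20k/reg.h */

/* Hypothetical helper: index of the NPA_AF_DPC_PERMITX register that holds
 * the permission bit of logical function 'lf' for counter 'cntr'.
 * Each counter has two registers: LFs 0-63 and LFs 64-127.
 */
static inline int dpc_permit_ridx(int cntr, int lf)
{
	return 2 * cntr + (lf >> 6);
}

/* Hypothetical helper: bit mask of 'lf' within its permit register.
 * For lf in 0..127 this equals the patch's
 * BIT_ULL(ridx ? lf - NPA_DPC_LFS_PER_REG : lf).
 */
static inline uint64_t dpc_permit_mask(int lf)
{
	return UINT64_C(1) << (lf % NPA_DPC_LFS_PER_REG);
}
```

For example, LF 70 on counter 3 maps to permit register index 7, bit 6.
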
 .../ethernet/marvell/octeontx2/af/cn20k/api.h |   6 +
 .../ethernet/marvell/octeontx2/af/cn20k/npa.c | 129 ++++++++++++++++++
 .../ethernet/marvell/octeontx2/af/cn20k/reg.h |   7 +
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |  19 +++
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   3 +
 .../ethernet/marvell/octeontx2/af/rvu_npa.c   |  14 +-
 6 files changed, 177 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/api.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/api.h
index 4285b5d6a6a2..b13e7628f767 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/api.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/api.h
@@ -29,4 +29,10 @@ int cn20k_mbox_setup(struct otx2_mbox *mbox, struct pci_dev *pdev,
 		     void *reg_base, int direction, int ndevs);
 void cn20k_rvu_enable_afvf_intr(struct rvu *rvu, int vfs);
 void cn20k_rvu_disable_afvf_intr(struct rvu *rvu, int vfs);
+
+int npa_cn20k_dpc_alloc(struct rvu *rvu, struct npa_cn20k_dpc_alloc_req *req,
+			struct npa_cn20k_dpc_alloc_rsp *rsp);
+int npa_cn20k_dpc_free(struct rvu *rvu, struct npa_cn20k_dpc_free_req *req);
+void npa_cn20k_dpc_free_all(struct rvu *rvu, u16 pcifunc);
+
 #endif /* CN20K_API_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c
index c963f43dc7b0..24a710f4f5fc 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c
@@ -8,6 +8,8 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 
+#include "cn20k/api.h"
+#include "cn20k/reg.h"
 #include "struct.h"
 #include "../rvu.h"
 
@@ -46,3 +48,130 @@ int rvu_npa_halo_hwctx_disable(struct npa_aq_enq_req *req)
 
 	return 0;
 }
+
+int npa_cn20k_dpc_alloc(struct rvu *rvu, struct npa_cn20k_dpc_alloc_req *req,
+			struct npa_cn20k_dpc_alloc_rsp *rsp)
+{
+	struct rvu_hwinfo *hw = rvu->hw;
+	u16 pcifunc = req->hdr.pcifunc;
+	int cntr, lf, blkaddr, ridx;
+	struct rvu_block *block;
+	struct rvu_pfvf *pfvf;
+	u64 val, lfmask;
+
+	pfvf = rvu_get_pfvf(rvu, pcifunc);
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+	if (!pfvf->npalf || blkaddr < 0)
+		return NPA_AF_ERR_AF_LF_INVALID;
+
+	block = &hw->block[blkaddr];
+	lf = rvu_get_lf(rvu, block, pcifunc, 0);
+	if (lf < 0)
+		return NPA_AF_ERR_AF_LF_INVALID;
+
+	mutex_lock(&rvu->rsrc_lock);
+
+	/* allocate a new counter */
+	cntr = rvu_alloc_rsrc(&rvu->npa_dpc);
+	if (cntr < 0) {
+		mutex_unlock(&rvu->rsrc_lock);
+		return cntr;
+	}
+
+	rsp->cntr_id = cntr;
+
+	/* DPC counter config */
+	rvu_write64(rvu, blkaddr, NPA_AF_DPCX_CFG(cntr), req->dpc_conf);
+
+	/* 0 to 63 lfs -> idx 0, 64 - 127 lfs -> idx 1 */
+	ridx = lf >> 6;
+	lfmask = BIT_ULL(ridx ? lf - NPA_DPC_LFS_PER_REG : lf);
+
+	ridx = 2 * cntr + ridx;
+	/* Give permission for LF access */
+	val = rvu_read64(rvu, blkaddr, NPA_AF_DPC_PERMITX(ridx));
+	val |= lfmask;
+	rvu_write64(rvu, blkaddr, NPA_AF_DPC_PERMITX(ridx), val);
+
+	mutex_unlock(&rvu->rsrc_lock);
+
+	return 0;
+}
+
+int rvu_mbox_handler_npa_cn20k_dpc_alloc(struct rvu *rvu,
+					 struct npa_cn20k_dpc_alloc_req *req,
+					 struct npa_cn20k_dpc_alloc_rsp *rsp)
+{
+	return npa_cn20k_dpc_alloc(rvu, req, rsp);
+}
+
+int npa_cn20k_dpc_free(struct rvu *rvu, struct npa_cn20k_dpc_free_req *req)
+{
+	struct rvu_hwinfo *hw = rvu->hw;
+	u16 pcifunc = req->hdr.pcifunc;
+	int cntr, lf, blkaddr, ridx;
+	struct rvu_block *block;
+	struct rvu_pfvf *pfvf;
+	u64 val, lfmask;
+
+	pfvf = rvu_get_pfvf(rvu, pcifunc);
+
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
+	if (!pfvf->npalf || blkaddr < 0)
+		return NPA_AF_ERR_AF_LF_INVALID;
+
+	block = &hw->block[blkaddr];
+	lf = rvu_get_lf(rvu, block, pcifunc, 0);
+	if (lf < 0)
+		return NPA_AF_ERR_AF_LF_INVALID;
+
+	if (req->cntr_id >= NPA_DPC_MAX)
+		return NPA_AF_ERR_PARAM;
+
+	mutex_lock(&rvu->rsrc_lock);
+
+	/* 0 to 63 lfs -> idx 0, 64 - 127 lfs -> idx 1 */
+	ridx = lf >> 6;
+	lfmask = BIT_ULL(ridx ? lf - NPA_DPC_LFS_PER_REG : lf);
+	cntr = req->cntr_id;
+
+	ridx = 2 * cntr + ridx;
+
+	val = rvu_read64(rvu, blkaddr, NPA_AF_DPC_PERMITX(ridx));
+	/* Check if the counter is allotted to this LF */
+	if (!(val & lfmask)) {
+		mutex_unlock(&rvu->rsrc_lock);
+		return 0;
+	}
+
+	/* Revert permission */
+	val &= ~lfmask;
+	rvu_write64(rvu, blkaddr, NPA_AF_DPC_PERMITX(ridx), val);
+
+	/* Free this counter */
+	rvu_free_rsrc(&rvu->npa_dpc, req->cntr_id);
+
+	mutex_unlock(&rvu->rsrc_lock);
+
+	return 0;
+}
+
+void npa_cn20k_dpc_free_all(struct rvu *rvu, u16 pcifunc)
+{
+	struct npa_cn20k_dpc_free_req req;
+	int i;
+
+	req.hdr.pcifunc = pcifunc;
+	for (i = 0; i < NPA_DPC_MAX; i++) {
+		req.cntr_id = i;
+		npa_cn20k_dpc_free(rvu, &req);
+	}
+}
+
+int rvu_mbox_handler_npa_cn20k_dpc_free(struct rvu *rvu,
+					struct npa_cn20k_dpc_free_req *req,
+					struct msg_rsp *rsp)
+{
+	return npa_cn20k_dpc_free(rvu, req);
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/reg.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/reg.h
index 8bfaa507ee50..9b49e376878e 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/reg.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/reg.h
@@ -143,4 +143,11 @@
 	offset = (0xb000000ull | (a) << 4 | (b) << 20);		\
 	offset; })
 
+/* NPA Registers */
+#define NPA_AF_DPCX_CFG(a)		(0x800 | (a) << 6)
+#define NPA_AF_DPC_PERMITX(a)		(0x1000 | (a) << 3)
+
+#define NPA_DPC_MAX			32
+#define NPA_DPC_LFS_PER_REG		64
+
 #endif /* RVU_MBOX_REG_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index 4a97bd93d882..b29ec26b66b7 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -213,6 +213,10 @@ M(NPA_AQ_ENQ,		0x402, npa_aq_enq, npa_aq_enq_req, npa_aq_enq_rsp)   \
 M(NPA_HWCTX_DISABLE,	0x403, npa_hwctx_disable, hwctx_disable_req, msg_rsp)\
 M(NPA_CN20K_AQ_ENQ,	0x404, npa_cn20k_aq_enq, npa_cn20k_aq_enq_req,	\
 				npa_cn20k_aq_enq_rsp)			\
+M(NPA_CN20K_DPC_ALLOC,	0x405, npa_cn20k_dpc_alloc, npa_cn20k_dpc_alloc_req, \
+				npa_cn20k_dpc_alloc_rsp)		\
+M(NPA_CN20K_DPC_FREE,	0x406, npa_cn20k_dpc_free, npa_cn20k_dpc_free_req, \
+				msg_rsp)				\
 /* SSO/SSOW mbox IDs (range 0x600 - 0x7FF) */				\
 /* TIM mbox IDs (range 0x800 - 0x9FF) */				\
 /* CPT mbox IDs (range 0xA00 - 0xBFF) */				\
@@ -910,6 +914,21 @@ struct npa_cn20k_aq_enq_rsp {
 	};
 };
 
+struct npa_cn20k_dpc_alloc_req {
+	struct mbox_msghdr hdr;
+	u16 dpc_conf;
+};
+
+struct npa_cn20k_dpc_alloc_rsp {
+	struct mbox_msghdr hdr;
+	u8 cntr_id;
+};
+
+struct npa_cn20k_dpc_free_req {
+	struct mbox_msghdr hdr;
+	u8 cntr_id;
+};
+
 /* Disable all contexts of type 'ctype' */
 struct hwctx_disable_req {
 	struct mbox_msghdr hdr;
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 36a71d32b894..0299fa1bd3bc 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -663,6 +663,9 @@ struct rvu {
 	/* CPT interrupt lock */
 	spinlock_t		cpt_intr_lock;
 
+	/* NPA */
+	struct rsrc_bmap	npa_dpc;
+
 	struct mutex		mbox_lock; /* Serialize mbox up and down msgs */
 	u16			rep_pcifunc;
 	bool			altaf_ready;
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c
index 809386c6bcba..f7916ac79c69 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c
@@ -8,6 +8,8 @@
 #include <linux/module.h>
 #include <linux/pci.h>
 
+#include "cn20k/api.h"
+#include "cn20k/reg.h"
 #include "rvu_struct.h"
 #include "rvu_reg.h"
 #include "rvu.h"
@@ -504,6 +506,8 @@ int rvu_mbox_handler_npa_lf_free(struct rvu *rvu, struct msg_req *req,
 		return NPA_AF_ERR_LF_RESET;
 	}
 
+	if (is_cn20k(rvu->pdev))
+		npa_cn20k_dpc_free_all(rvu, pcifunc);
 	npa_ctx_free(rvu, pfvf);
 
 	return 0;
@@ -569,12 +573,17 @@ static int npa_aq_init(struct rvu *rvu, struct rvu_block *block)
 int rvu_npa_init(struct rvu *rvu)
 {
 	struct rvu_hwinfo *hw = rvu->hw;
-	int blkaddr;
+	int err, blkaddr;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, 0);
 	if (blkaddr < 0)
 		return 0;
 
+	rvu->npa_dpc.max = NPA_DPC_MAX;
+	err = rvu_alloc_bitmap(&rvu->npa_dpc);
+	if (err)
+		return err;
+
 	/* Initialize admin queue */
 	return npa_aq_init(rvu, &hw->block[blkaddr]);
 }
@@ -591,6 +600,7 @@ void rvu_npa_freemem(struct rvu *rvu)
 
 	block = &hw->block[blkaddr];
 	rvu_aq_free(rvu, block->aq);
+	kfree(rvu->npa_dpc.bmap);
 }
 
 void rvu_npa_lf_teardown(struct rvu *rvu, u16 pcifunc, int npalf)
@@ -611,6 +621,8 @@ void rvu_npa_lf_teardown(struct rvu *rvu, u16 pcifunc, int npalf)
 	ctx_req.ctype = NPA_AQ_CTYPE_HALO;
 	npa_lf_hwctx_disable(rvu, &ctx_req);
 
+	if (is_cn20k(rvu->pdev))
+		npa_cn20k_dpc_free_all(rvu, pcifunc);
 	npa_ctx_free(rvu, pfvf);
 }
 
-- 
2.48.1


^ permalink raw reply related

* [net-next PATCH v5 1/4] octeontx2-af: npa: cn20k: Add NPA Halo support
From: Subbaraya Sundeep @ 2026-04-09  9:53 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2
  Cc: netdev, linux-kernel, Linu Cherian, Subbaraya Sundeep
In-Reply-To: <1775728404-28451-1-git-send-email-sbhatta@marvell.com>

From: Linu Cherian <lcherian@marvell.com>

CN20K silicon implements a unified aura and pool context
type called Halo for better resource usage. Add support for
handling Halo context type operations.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Subbaraya Sundeep <sbhatta@marvell.com>
---
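Illustration only, not part of the patch: a minimal userspace sketch of
how the enable state tracked in halo_bmap changes on a masked WRITE, as
done in rvu_npa_aq_enq_inst() below. The helper name
halo_ena_after_write() is hypothetical. For a single-bit field it is
equivalent to (new & mask) | (cur & ~mask).

```c
#include <stdbool.h>

/* Hypothetical helper: enable state to record after a masked WRITE.
 * 'new_ena'  - value of the ena field in the request
 * 'mask_ena' - whether the request's mask selects the ena field
 * 'cur_ena'  - state currently tracked in the bitmap
 */
static inline bool halo_ena_after_write(bool new_ena, bool mask_ena,
					bool cur_ena)
{
	/* A masked-out field keeps its old value; otherwise take the new one */
	return mask_ena ? new_ena : cur_ena;
}
```

A WRITE that does not select the ena field therefore leaves the bitmap
bit untouched, while a WRITE with the mask bit set and ena == 0 clears it.
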
 .../ethernet/marvell/octeontx2/af/cn20k/npa.c | 27 +++++++
 .../marvell/octeontx2/af/cn20k/struct.h       | 81 +++++++++++++++++++
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |  6 ++
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  2 +
 .../ethernet/marvell/octeontx2/af/rvu_npa.c   | 63 +++++++++++++--
 .../marvell/octeontx2/af/rvu_struct.h         |  1 +
 6 files changed, 173 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c
index fe8f926c8b75..c963f43dc7b0 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npa.c
@@ -19,3 +19,30 @@ int rvu_mbox_handler_npa_cn20k_aq_enq(struct rvu *rvu,
 				   (struct npa_aq_enq_rsp *)rsp);
 }
 EXPORT_SYMBOL(rvu_mbox_handler_npa_cn20k_aq_enq);
+
+int rvu_npa_halo_hwctx_disable(struct npa_aq_enq_req *req)
+{
+	struct npa_cn20k_aq_enq_req *hreq;
+
+	hreq = (struct npa_cn20k_aq_enq_req *)req;
+
+	hreq->halo.bp_ena_0 = 0;
+	hreq->halo.bp_ena_1 = 0;
+	hreq->halo.bp_ena_2 = 0;
+	hreq->halo.bp_ena_3 = 0;
+	hreq->halo.bp_ena_4 = 0;
+	hreq->halo.bp_ena_5 = 0;
+	hreq->halo.bp_ena_6 = 0;
+	hreq->halo.bp_ena_7 = 0;
+
+	hreq->halo_mask.bp_ena_0 = 1;
+	hreq->halo_mask.bp_ena_1 = 1;
+	hreq->halo_mask.bp_ena_2 = 1;
+	hreq->halo_mask.bp_ena_3 = 1;
+	hreq->halo_mask.bp_ena_4 = 1;
+	hreq->halo_mask.bp_ena_5 = 1;
+	hreq->halo_mask.bp_ena_6 = 1;
+	hreq->halo_mask.bp_ena_7 = 1;
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
index 763f6cabd7c2..2364bafd329d 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/struct.h
@@ -377,4 +377,85 @@ struct npa_cn20k_pool_s {
 
 static_assert(sizeof(struct npa_cn20k_pool_s) == NIX_MAX_CTX_SIZE);
 
+struct npa_cn20k_halo_s {
+	u64 stack_base                  : 64;
+	u64 ena                         :  1;
+	u64 nat_align                   :  1;
+	u64 reserved_66_67              :  2;
+	u64 stack_caching               :  1;
+	u64 reserved_69_71              :  3;
+	u64 aura_drop_ena               :  1;
+	u64 reserved_73_79              :  7;
+	u64 aura_drop                   :  8;
+	u64 buf_offset                  : 12;
+	u64 reserved_100_103            :  4;
+	u64 buf_size                    : 12;
+	u64 reserved_116_119            :  4;
+	u64 ref_cnt_prof                :  3;
+	u64 reserved_123_127            :  5;
+	u64 stack_max_pages             : 32;
+	u64 stack_pages                 : 32;
+	u64 bp_0                        :  7;
+	u64 bp_1                        :  7;
+	u64 bp_2                        :  7;
+	u64 bp_3                        :  7;
+	u64 bp_4                        :  7;
+	u64 bp_5                        :  7;
+	u64 bp_6                        :  7;
+	u64 bp_7                        :  7;
+	u64 bp_ena_0                    :  1;
+	u64 bp_ena_1                    :  1;
+	u64 bp_ena_2                    :  1;
+	u64 bp_ena_3                    :  1;
+	u64 bp_ena_4                    :  1;
+	u64 bp_ena_5                    :  1;
+	u64 bp_ena_6                    :  1;
+	u64 bp_ena_7                    :  1;
+	u64 stack_offset                :  4;
+	u64 reserved_260_263            :  4;
+	u64 shift                       :  6;
+	u64 reserved_270_271            :  2;
+	u64 avg_level                   :  8;
+	u64 avg_con                     :  9;
+	u64 fc_ena                      :  1;
+	u64 fc_stype                    :  2;
+	u64 fc_hyst_bits                :  4;
+	u64 fc_up_crossing              :  1;
+	u64 reserved_297_299            :  3;
+	u64 update_time                 : 16;
+	u64 reserved_316_319            :  4;
+	u64 fc_addr                     : 64;
+	u64 ptr_start                   : 64;
+	u64 ptr_end                     : 64;
+	u64 bpid_0                      : 12;
+	u64 reserved_524_535            : 12;
+	u64 err_int                     :  8;
+	u64 err_int_ena                 :  8;
+	u64 thresh_int                  :  1;
+	u64 thresh_int_ena              :  1;
+	u64 thresh_up                   :  1;
+	u64 reserved_555                :  1;
+	u64 thresh_qint_idx             :  7;
+	u64 reserved_563                :  1;
+	u64 err_qint_idx                :  7;
+	u64 reserved_571_575            :  5;
+	u64 thresh                      : 36;
+	u64 reserved_612_615            :  4;
+	u64 fc_msh_dst                  : 11;
+	u64 reserved_627_630            :  4;
+	u64 op_dpc_ena                  :  1;
+	u64 op_dpc_set                  :  5;
+	u64 reserved_637_637            :  1;
+	u64 stream_ctx                  :  1;
+	u64 unified_ctx                 :  1;
+	u64 reserved_640_703            : 64;
+	u64 reserved_704_767            : 64;
+	u64 reserved_768_831            : 64;
+	u64 reserved_832_895            : 64;
+	u64 reserved_896_959            : 64;
+	u64 reserved_960_1023           : 64;
+};
+
+static_assert(sizeof(struct npa_cn20k_halo_s) == NIX_MAX_CTX_SIZE);
+
 #endif
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index dc42c81c0942..4a97bd93d882 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -884,6 +884,8 @@ struct npa_cn20k_aq_enq_req {
 		struct npa_cn20k_aura_s aura;
 		/* Valid when op == WRITE/INIT and ctype == POOL */
 		struct npa_cn20k_pool_s pool;
+		/* Valid when op == WRITE/INIT and ctype == HALO */
+		struct npa_cn20k_halo_s halo;
 	};
 	/* Mask data when op == WRITE (1=write, 0=don't write) */
 	union {
@@ -891,6 +893,8 @@ struct npa_cn20k_aq_enq_req {
 		struct npa_cn20k_aura_s aura_mask;
 		/* Valid when op == WRITE and ctype == POOL */
 		struct npa_cn20k_pool_s pool_mask;
+		/* Valid when op == WRITE/INIT and ctype == HALO */
+		struct npa_cn20k_halo_s halo_mask;
 	};
 };
 
@@ -901,6 +905,8 @@ struct npa_cn20k_aq_enq_rsp {
 		struct npa_cn20k_aura_s aura;
 		/* Valid when op == READ and ctype == POOL */
 		struct npa_cn20k_pool_s pool;
+		/* Valid when op == READ and ctype == HALO */
+		struct npa_cn20k_halo_s halo;
 	};
 };
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index a466181cf908..36a71d32b894 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -261,6 +261,7 @@ struct rvu_pfvf {
 	struct qmem	*pool_ctx;
 	struct qmem	*npa_qints_ctx;
 	unsigned long	*aura_bmap;
+	unsigned long	*halo_bmap; /* Aura and Halo are mutually exclusive */
 	unsigned long	*pool_bmap;
 
 	/* NIX contexts */
@@ -1008,6 +1009,7 @@ void rvu_npa_freemem(struct rvu *rvu);
 void rvu_npa_lf_teardown(struct rvu *rvu, u16 pcifunc, int npalf);
 int rvu_npa_aq_enq_inst(struct rvu *rvu, struct npa_aq_enq_req *req,
 			struct npa_aq_enq_rsp *rsp);
+int rvu_npa_halo_hwctx_disable(struct npa_aq_enq_req *req);
 
 /* NIX APIs */
 bool is_nixlf_attached(struct rvu *rvu, u16 pcifunc);
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c
index e2a33e46b48a..809386c6bcba 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npa.c
@@ -12,6 +12,11 @@
 #include "rvu_reg.h"
 #include "rvu.h"
 
+static bool npa_ctype_invalid(struct rvu *rvu, int ctype)
+{
+	return !is_cn20k(rvu->pdev) && ctype == NPA_AQ_CTYPE_HALO;
+}
+
 static int npa_aq_enqueue_wait(struct rvu *rvu, struct rvu_block *block,
 			       struct npa_aq_inst_s *inst)
 {
@@ -72,13 +77,19 @@ int rvu_npa_aq_enq_inst(struct rvu *rvu, struct npa_aq_enq_req *req,
 	bool ena;
 
 	pfvf = rvu_get_pfvf(rvu, pcifunc);
-	if (!pfvf->aura_ctx || req->aura_id >= pfvf->aura_ctx->qsize)
+	if (!pfvf->aura_ctx || req->aura_id >= pfvf->aura_ctx->qsize ||
+	    npa_ctype_invalid(rvu, req->ctype))
 		return NPA_AF_ERR_AQ_ENQUEUE;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPA, pcifunc);
 	if (!pfvf->npalf || blkaddr < 0)
 		return NPA_AF_ERR_AF_LF_INVALID;
 
+	/* Ensure halo bitmap is exclusive to halo ctype */
+	if (is_cn20k(rvu->pdev) && req->ctype != NPA_AQ_CTYPE_HALO &&
+	    test_bit(req->aura_id, pfvf->halo_bmap))
+		return NPA_AF_ERR_AQ_ENQUEUE;
+
 	block = &hw->block[blkaddr];
 	aq = block->aq;
 	if (!aq) {
@@ -119,7 +130,7 @@ int rvu_npa_aq_enq_inst(struct rvu *rvu, struct npa_aq_enq_req *req,
 			memcpy(mask, &req->aura_mask,
 			       sizeof(struct npa_aura_s));
 			memcpy(ctx, &req->aura, sizeof(struct npa_aura_s));
-		} else {
+		} else { /* Applies to pool and halo since size is same */
 			memcpy(mask, &req->pool_mask,
 			       sizeof(struct npa_pool_s));
 			memcpy(ctx, &req->pool, sizeof(struct npa_pool_s));
@@ -135,7 +146,7 @@ int rvu_npa_aq_enq_inst(struct rvu *rvu, struct npa_aq_enq_req *req,
 			req->aura.pool_addr = pfvf->pool_ctx->iova +
 			(req->aura.pool_addr * pfvf->pool_ctx->entry_sz);
 			memcpy(ctx, &req->aura, sizeof(struct npa_aura_s));
-		} else { /* POOL's context */
+		} else { /* Applies to pool and halo since size is same */
 			memcpy(ctx, &req->pool, sizeof(struct npa_pool_s));
 		}
 		break;
@@ -176,6 +187,20 @@ int rvu_npa_aq_enq_inst(struct rvu *rvu, struct npa_aq_enq_req *req,
 		}
 	}
 
+	if (req->ctype == NPA_AQ_CTYPE_HALO) {
+		if (req->op == NPA_AQ_INSTOP_INIT && req->aura.ena)
+			__set_bit(req->aura_id, pfvf->halo_bmap);
+		if (req->op == NPA_AQ_INSTOP_WRITE) {
+			ena = (req->aura.ena & req->aura_mask.ena) |
+				(test_bit(req->aura_id, pfvf->halo_bmap) &
+				~req->aura_mask.ena);
+			if (ena)
+				__set_bit(req->aura_id, pfvf->halo_bmap);
+			else
+				__clear_bit(req->aura_id, pfvf->halo_bmap);
+		}
+	}
+
 	/* Set pool bitmap if pool hw context is enabled */
 	if (req->ctype == NPA_AQ_CTYPE_POOL) {
 		if (req->op == NPA_AQ_INSTOP_INIT && req->pool.ena)
@@ -198,7 +223,7 @@ int rvu_npa_aq_enq_inst(struct rvu *rvu, struct npa_aq_enq_req *req,
 			if (req->ctype == NPA_AQ_CTYPE_AURA)
 				memcpy(&rsp->aura, ctx,
 				       sizeof(struct npa_aura_s));
-			else
+			else /* Applies to pool and halo since size is same */
 				memcpy(&rsp->pool, ctx,
 				       sizeof(struct npa_pool_s));
 		}
@@ -210,12 +235,14 @@ int rvu_npa_aq_enq_inst(struct rvu *rvu, struct npa_aq_enq_req *req,
 static int npa_lf_hwctx_disable(struct rvu *rvu, struct hwctx_disable_req *req)
 {
 	struct rvu_pfvf *pfvf = rvu_get_pfvf(rvu, req->hdr.pcifunc);
+	const char *context = "Unknown";
 	struct npa_aq_enq_req aq_req;
 	unsigned long *bmap;
 	int id, cnt = 0;
 	int err = 0, rc;
 
-	if (!pfvf->pool_ctx || !pfvf->aura_ctx)
+	if (!pfvf->pool_ctx || !pfvf->aura_ctx ||
+	    npa_ctype_invalid(rvu, req->ctype))
 		return NPA_AF_ERR_AQ_ENQUEUE;
 
 	memset(&aq_req, 0, sizeof(struct npa_aq_enq_req));
@@ -226,6 +253,7 @@ static int npa_lf_hwctx_disable(struct rvu *rvu, struct hwctx_disable_req *req)
 		aq_req.pool_mask.ena = 1;
 		cnt = pfvf->pool_ctx->qsize;
 		bmap = pfvf->pool_bmap;
+		context = "Pool";
 	} else if (req->ctype == NPA_AQ_CTYPE_AURA) {
 		aq_req.aura.ena = 0;
 		aq_req.aura_mask.ena = 1;
@@ -233,6 +261,14 @@ static int npa_lf_hwctx_disable(struct rvu *rvu, struct hwctx_disable_req *req)
 		aq_req.aura_mask.bp_ena = 1;
 		cnt = pfvf->aura_ctx->qsize;
 		bmap = pfvf->aura_bmap;
+		context = "Aura";
+	} else if (req->ctype == NPA_AQ_CTYPE_HALO) {
+		aq_req.aura.ena = 0;
+		aq_req.aura_mask.ena = 1;
+		rvu_npa_halo_hwctx_disable(&aq_req);
+		cnt = pfvf->aura_ctx->qsize;
+		bmap = pfvf->halo_bmap;
+		context = "Halo";
 	}
 
 	aq_req.ctype = req->ctype;
@@ -246,8 +282,7 @@ static int npa_lf_hwctx_disable(struct rvu *rvu, struct hwctx_disable_req *req)
 		if (rc) {
 			err = rc;
 			dev_err(rvu->dev, "Failed to disable %s:%d context\n",
-				(req->ctype == NPA_AQ_CTYPE_AURA) ?
-				"Aura" : "Pool", id);
+				context, id);
 		}
 	}
 
@@ -311,6 +346,9 @@ static void npa_ctx_free(struct rvu *rvu, struct rvu_pfvf *pfvf)
 	kfree(pfvf->aura_bmap);
 	pfvf->aura_bmap = NULL;
 
+	kfree(pfvf->halo_bmap);
+	pfvf->halo_bmap = NULL;
+
 	qmem_free(rvu->dev, pfvf->aura_ctx);
 	pfvf->aura_ctx = NULL;
 
@@ -374,6 +412,13 @@ int rvu_mbox_handler_npa_lf_alloc(struct rvu *rvu,
 	if (!pfvf->aura_bmap)
 		goto free_mem;
 
+	if (is_cn20k(rvu->pdev)) {
+		pfvf->halo_bmap = kcalloc(NPA_AURA_COUNT(req->aura_sz),
+					  sizeof(long), GFP_KERNEL);
+		if (!pfvf->halo_bmap)
+			goto free_mem;
+	}
+
 	/* Alloc memory for pool HW contexts */
 	hwctx_size = 1UL << ((ctx_cfg >> 4) & 0xF);
 	err = qmem_alloc(rvu->dev, &pfvf->pool_ctx, req->nr_pools, hwctx_size);
@@ -562,6 +607,10 @@ void rvu_npa_lf_teardown(struct rvu *rvu, u16 pcifunc, int npalf)
 	ctx_req.ctype = NPA_AQ_CTYPE_AURA;
 	npa_lf_hwctx_disable(rvu, &ctx_req);
 
+	/* Disable all Halos */
+	ctx_req.ctype = NPA_AQ_CTYPE_HALO;
+	npa_lf_hwctx_disable(rvu, &ctx_req);
+
 	npa_ctx_free(rvu, pfvf);
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_struct.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu_struct.h
index 8e868f815de1..d37cf2cf0fee 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_struct.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_struct.h
@@ -130,6 +130,7 @@ enum npa_aq_comp {
 enum npa_aq_ctype {
 	NPA_AQ_CTYPE_AURA = 0x0,
 	NPA_AQ_CTYPE_POOL = 0x1,
+	NPA_AQ_CTYPE_HALO = 0x2,
 };
 
 /* NPA admin queue instruction opcodes */
-- 
2.48.1


^ permalink raw reply related

* [net-next PATCH v5 0/4] octeontx2: CN20K NPA Halo context support
From: Subbaraya Sundeep @ 2026-04-09  9:53 UTC (permalink / raw)
  To: andrew+netdev, davem, edumazet, kuba, pabeni, sgoutham, gakula,
	bbhushan2
  Cc: netdev, linux-kernel, Subbaraya Sundeep

This series adds NPA Halo support for CN20K in the octeontx2 AF and
PF drivers. On CN20K, NPA supports a unified "Halo" context that combines
the aura and pool contexts in a single structure. This is a hardware
simplification: there is no need to initialize both an Aura and a Pool
context for each queue. Separate Aura and Pool contexts are only needed
when, for example, many Auras have to point to a single Pool, but the
octeontx2 netdev driver always uses a 1:1 Aura to Pool mapping. Hence,
use the Halo context for netdevs on CN20K.

The series:

  1) Adds Halo context type, mbox handling, and halo_bmap tracking in AF.
  2) Adds NPA DPC support: 32 diagnostic/performance counters with
     per-LF permit registers, mbox alloc/free, and teardown handling.
  3) Adds debugfs for Halo (halo_ctx file and NPA context display/write
     for HALO ctype).
  4) Switches the CN20K PF driver to use the unified Halo context and
     allocates a DPC counter for the NPA LF.

Changes for v5:
 Fixed double free of DPC counter in error path as per AI review
 Modified commit message to state that backpressure
 is not supported currently

Changes for v4:
 Fixed DPC counter leak as per AI review

Changes for v3:
 Fixed all AI reviews
 Removed inline for npa_ctype_invalid (as per Simon)

Changes for v2:
 Fixed all AI reviews
 Removed inline and added const for npa_ctype_str (as per Simon)
 Fixed build warning flagged with W=1


Linu Cherian (3):
  octeontx2-af: npa: cn20k: Add NPA Halo support
  octeontx2-af: npa: cn20k: Add DPC support
  octeontx2-af: npa: cn20k: Add debugfs for Halo

Subbaraya Sundeep (1):
  octeontx2-pf: cn20k: Use unified Halo context

 .../ethernet/marvell/octeontx2/af/cn20k/api.h |   6 +
 .../marvell/octeontx2/af/cn20k/debugfs.c      |  60 +++++
 .../marvell/octeontx2/af/cn20k/debugfs.h      |   2 +
 .../ethernet/marvell/octeontx2/af/cn20k/npa.c | 156 +++++++++++++
 .../ethernet/marvell/octeontx2/af/cn20k/reg.h |   7 +
 .../marvell/octeontx2/af/cn20k/struct.h       |  81 +++++++
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |  25 ++
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |   5 +
 .../marvell/octeontx2/af/rvu_debugfs.c        |  71 +++++-
 .../ethernet/marvell/octeontx2/af/rvu_npa.c   |  77 ++++++-
 .../marvell/octeontx2/af/rvu_struct.h         |   1 +
 .../ethernet/marvell/octeontx2/nic/cn20k.c    | 215 +++++++++---------
 .../ethernet/marvell/octeontx2/nic/cn20k.h    |   3 +
 .../marvell/octeontx2/nic/otx2_common.h       |   3 +
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |   8 +
 15 files changed, 601 insertions(+), 119 deletions(-)

-- 
2.48.1


^ permalink raw reply

* Re: [PATCH net v3] ppp: require CAP_NET_ADMIN in target netns for unattached ioctls
From: Qingfang Deng @ 2026-04-09  9:50 UTC (permalink / raw)
  To: Taegu Ha, Andrew Lunn, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Paolo Abeni, Kees Cook, Kuniyuki Iwashima,
	Sebastian Andrzej Siewior, Cyrill Gorcunov, linux-ppp, netdev,
	linux-kernel
  Cc: gnault, jaco, richardbgobert, ericwouds, teknoraver,
	Christian Brauner, Jan Kara
In-Reply-To: <20260409071117.4354-1-hataegu0826@gmail.com>

On 2026/4/9 15:11, Taegu Ha wrote:
> /dev/ppp open is currently authorized against file->f_cred->user_ns,
> while unattached administrative ioctls operate on current->nsproxy->net_ns.
>
> As a result, a local unprivileged user can create a new user namespace
> with CLONE_NEWUSER, gain CAP_NET_ADMIN only in that new user namespace,
> and still issue PPPIOCNEWUNIT, PPPIOCATTACH, or PPPIOCATTCHAN against
> an inherited network namespace.
>
> Require CAP_NET_ADMIN in the user namespace that owns the target network
> namespace before handling unattached PPP administrative ioctls.
>
> This preserves normal pppd operation in the network namespace it is
> actually privileged in, while rejecting the userns-only inherited-netns
> case.
>
> Fixes: 273ec51dd7ce ("net: ppp_generic - introduce net-namespace functionality v2")
> Signed-off-by: Taegu Ha <hataegu0826@gmail.com>

LGTM.

Netns devs, could you please take a look?

> ---
>   drivers/net/ppp/ppp_generic.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index e9b41777be80..c2024684b10d 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -1057,6 +1057,9 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
>   	struct ppp_net *pn;
>   	int __user *p = (int __user *)arg;
>   
> +	if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> +		return -EPERM;
> +
>   	switch (cmd) {
>   	case PPPIOCNEWUNIT:
>   		/* Create a new ppp unit */

^ permalink raw reply

* Re: [PATCH nf] netfilter: nft_fwd_netdev: use recursion counter in neigh egress path
From: Florian Westphal @ 2026-04-09  9:50 UTC (permalink / raw)
  To: Weiming Shi
  Cc: Pablo Neira Ayuso, David S . Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, Phil Sutter, Simon Horman, netfilter-devel, coreteam,
	netdev, Xiang Mei
In-Reply-To: <20260409053629.698822-2-bestswngs@gmail.com>

Weiming Shi <bestswngs@gmail.com> wrote:
> Fixes: f87b9464d152 ("netfilter: nft_fwd_netdev: Support egress hook")
> Reported-by: Xiang Mei <xmei5@asu.edu>
> Signed-off-by: Weiming Shi <bestswngs@gmail.com>
> ---
>  include/net/netfilter/nf_dup_netdev.h |  4 ++++
>  net/netfilter/nf_dup_netdev.c         | 18 ++++++++++++++++++
>  net/netfilter/nft_fwd_netdev.c        |  7 +++++++
>  3 files changed, 29 insertions(+)
> 
> diff --git a/include/net/netfilter/nf_dup_netdev.h b/include/net/netfilter/nf_dup_netdev.h
> index b175d271aec9..17362f76d1d1 100644
> --- a/include/net/netfilter/nf_dup_netdev.h
> +++ b/include/net/netfilter/nf_dup_netdev.h
> @@ -7,6 +7,10 @@
>  void nf_dup_netdev_egress(const struct nft_pktinfo *pkt, int oif);
>  void nf_fwd_netdev_egress(const struct nft_pktinfo *pkt, int oif);
>  
> +bool nf_dup_netdev_has_recursed(void);
> +void nf_dup_netdev_recursion_inc(void);
> +void nf_dup_netdev_recursion_dec(void);
> +
>  struct nft_offload_ctx;
>  struct nft_flow_rule;
>  
> diff --git a/net/netfilter/nf_dup_netdev.c b/net/netfilter/nf_dup_netdev.c
> index fab8b9011098..e2fe8bb6fe0d 100644
> --- a/net/netfilter/nf_dup_netdev.c
> +++ b/net/netfilter/nf_dup_netdev.c
> @@ -29,6 +29,24 @@ static u8 *nf_get_nf_dup_skb_recursion(void)
>  
>  #endif
>  
> +bool nf_dup_netdev_has_recursed(void)
> +{
> +	return *nf_get_nf_dup_skb_recursion() > NF_RECURSION_LIMIT;
> +}
> +EXPORT_SYMBOL_GPL(nf_dup_netdev_has_recursed);

I think that's a bit too heavy-handed.
nf_get_nf_dup_skb_recursion() fetches from pcpu counter or current->.

Can you move nf_get_nf_dup_skb_recursion to a shared header file
and make it inline instead of having a function call?
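
A rough userspace sketch of what such inline helpers could look like,
not the actual patch. A thread-local counter stands in for the per-CPU
or per-task storage, and the NF_RECURSION_LIMIT value is assumed for
illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define NF_RECURSION_LIMIT 2	/* assumed value, for illustration only */

/* Stand-in for the per-CPU / per-task counter used in the kernel. */
static _Thread_local uint8_t nf_dup_skb_recursion;

/* Defined static inline in a shared header, so callers pay no call. */
static inline uint8_t *nf_get_nf_dup_skb_recursion(void)
{
	return &nf_dup_skb_recursion;
}

static inline bool nf_dup_netdev_has_recursed(void)
{
	return *nf_get_nf_dup_skb_recursion() > NF_RECURSION_LIMIT;
}

static inline void nf_dup_netdev_recursion_inc(void)
{
	(*nf_get_nf_dup_skb_recursion())++;
}

static inline void nf_dup_netdev_recursion_dec(void)
{
	(*nf_get_nf_dup_skb_recursion())--;
}
```

With everything inline in the header, the exported symbols would no
longer be needed.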


^ permalink raw reply

* Re: [RFC net-next 05/15] ipxlat: add IPv6 packet validation path
From: Ralf Lici @ 2026-04-09  9:44 UTC (permalink / raw)
  To: Xavier HSINYUAN
  Cc: andrew+netdev, antonio, davem, dxld, edumazet, kuba, linux-kernel,
	netdev, pabeni
In-Reply-To: <TYRPR01MB12666C2C108C1D23202C5B79ACA582@TYRPR01MB12666.jpnprd01.prod.outlook.com>

On 4/9/26 04:18, Xavier HSINYUAN wrote:
> Hi Ralf,
> 
>> +static int ipxlat_v6_validate_icmp_csum(const struct sk_buff *skb)
>> +{
>> +	struct ipv6hdr *iph6;
>> +	unsigned int len;
>> +	__sum16 csum;
>> +
>> +	if (skb->ip_summed != CHECKSUM_NONE)
>> +		return 0;
>> +
>> +	iph6 = ipv6_hdr(skb);
>> +	len = ipxlat_skb_datagram_len(skb);
>> +	csum = csum_ipv6_magic(&iph6->saddr, &iph6->daddr, len, NEXTHDR_ICMP,
>> +			       skb_checksum(skb, skb_transport_offset(skb), len,
>> +					    0));
>> +
>> +	return unlikely(csum) ? -EINVAL : 0;
>> +}
> We should include net/ip6_checksum.h to make x86_64 with KMSAN/KASAN and
> other architectures with optional _HAVE_ARCH_IPV6_CSUM happy.

Hi Xavier,

Yep, this showed up in patchwork build failures as well.
I'll add the required header where needed (packet.c and icmp_{46,64}.c)
in the next revision.

Thanks!

> 
> Best regards,
> Xavier
> 

-- 
Ralf Lici
Mandelbit Srl


^ permalink raw reply

* [PATCH net-next v2] iavf: fix kernel-doc comment style in iavf_ethtool.c
From: Aleksandr Loktionov @ 2026-04-09  9:30 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Leszek Pepiak

iavf_ethtool.c contains 31 kernel-doc comment blocks using the legacy
`**/` terminator instead of the correct single `*/`. Two function
headers also use a colon separator (`iavf_get_channels:`,
`iavf_set_channels:`) instead of the ` - ` dash required by kernel-doc.

Additionally, several comments embed their return-value descriptions in
the body paragraph, producing `scripts/kernel-doc -Wreturn` warnings.
Void functions that incorrectly say "Returns ..." are also rephrased.

Fix all issues across the full file:
 - Replace every `**/` terminator with `*/`.
 - Change `function_name:` doc headers to `function_name -`.
 - Move inline "Returns ..." sentences into dedicated `Return:` sections
   for non-void functions (iavf_get_msglevel, iavf_get_rxnfc,
   iavf_set_channels, iavf_get_rxfh_key_size, iavf_get_rxfh_indir_size,
   iavf_get_rxfh, iavf_set_rxfh).
 - Rephrase body descriptions in void functions that incorrectly said
   "Returns ..." (iavf_get_drvinfo, iavf_get_ringparam, iavf_get_coalesce).
 - Remove boilerplate body text for iavf_get_rxfh_key_size and
   iavf_get_rxfh_indir_size; the `Return:` line now conveys the same
   information without the vague "Returns the table size." sentence.

Suggested-by: Anthony L. Nguyen <anthony.l.nguyen@intel.com>
Suggested-by: Leszek Pepiak <leszek.pepiak@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
v1 -> v2: extend the scope of the changes to the whole iavf_ethtool.c file
---
 drivers/net/ethernet/intel/iavf/iavf_ethtool.c | 103 ++++++++++++------------
 1 file changed, 53 insertions(+), 50 deletions(-)

diff --git a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
index 1cd1f3f..a615d59 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_ethtool.c
@@ -32,7 +32,7 @@
  * statistics array. Thus, every statistic string in an array should have the
  * same type and number of format specifiers, to be formatted by variadic
  * arguments to the iavf_add_stat_string() helper function.
- **/
+ */
 struct iavf_stats {
 	char stat_string[ETH_GSTRING_LEN];
 	int sizeof_stat;
@@ -116,7 +116,7 @@ iavf_add_one_ethtool_stat(u64 *data, void *pointer,
  * the next empty location for successive calls to __iavf_add_ethtool_stats.
  * If pointer is null, set the data values to zero and update the pointer to
  * skip these stats.
- **/
+ */
 static void
 __iavf_add_ethtool_stats(u64 **data, void *pointer,
 			 const struct iavf_stats stats[],
@@ -140,7 +140,7 @@ __iavf_add_ethtool_stats(u64 **data, void *pointer,
  *
  * The parameter @stats is evaluated twice, so parameters with side effects
  * should be avoided.
- **/
+ */
 #define iavf_add_ethtool_stats(data, pointer, stats) \
 	__iavf_add_ethtool_stats(data, pointer, stats, ARRAY_SIZE(stats))
 
@@ -157,7 +157,7 @@ __iavf_add_ethtool_stats(u64 **data, void *pointer,
  * buffer and update the data pointer when finished.
  *
  * This function expects to be called while under rcu_read_lock().
- **/
+ */
 static void
 iavf_add_queue_stats(u64 **data, struct iavf_ring *ring)
 {
@@ -189,7 +189,7 @@ iavf_add_queue_stats(u64 **data, struct iavf_ring *ring)
  *
  * Format and copy the strings described by stats into the buffer pointed at
  * by p.
- **/
+ */
 static void __iavf_add_stat_strings(u8 **p, const struct iavf_stats stats[],
 				    const unsigned int size, ...)
 {
@@ -216,7 +216,7 @@ static void __iavf_add_stat_strings(u8 **p, const struct iavf_stats stats[],
  * The parameter @stats is evaluated twice, so parameters with side effects
  * should be avoided. Additionally, stats must be an array such that
  * ARRAY_SIZE can be called on it.
- **/
+ */
 #define iavf_add_stat_strings(p, stats, ...) \
 	__iavf_add_stat_strings(p, stats, ARRAY_SIZE(stats), ## __VA_ARGS__)
 
@@ -249,7 +249,7 @@ static const struct iavf_stats iavf_gstrings_stats[] = {
  *
  * Reports speed/duplex settings. Because this is a VF, we don't know what
  * kind of link we really have, so we fake it.
- **/
+ */
 static int iavf_get_link_ksettings(struct net_device *netdev,
 				   struct ethtool_link_ksettings *cmd)
 {
@@ -308,7 +308,7 @@ static int iavf_get_link_ksettings(struct net_device *netdev,
  * @sset: id of string set
  *
  * Reports size of various string tables.
- **/
+ */
 static int iavf_get_sset_count(struct net_device *netdev, int sset)
 {
 	/* Report the maximum number queues, even if not every queue is
@@ -331,7 +331,7 @@ static int iavf_get_sset_count(struct net_device *netdev, int sset)
  * @data: pointer to data buffer
  *
  * All statistics are added to the data buffer as an array of u64.
- **/
+ */
 static void iavf_get_ethtool_stats(struct net_device *netdev,
 				   struct ethtool_stats *stats, u64 *data)
 {
@@ -367,7 +367,7 @@ static void iavf_get_ethtool_stats(struct net_device *netdev,
  * @data: buffer for string data
  *
  * Builds the statistics string table
- **/
+ */
 static void iavf_get_stat_strings(struct net_device *netdev, u8 *data)
 {
 	unsigned int i;
@@ -392,7 +392,7 @@ static void iavf_get_stat_strings(struct net_device *netdev, u8 *data)
  * @data: buffer for string data
  *
  * Builds string tables for various string sets
- **/
+ */
 static void iavf_get_strings(struct net_device *netdev, u32 sset, u8 *data)
 {
 	switch (sset) {
@@ -408,8 +408,8 @@ static void iavf_get_strings(struct net_device *netdev, u32 sset, u8 *data)
  * iavf_get_msglevel - Get debug message level
  * @netdev: network interface device structure
  *
- * Returns current debug message level.
- **/
+ * Return: current debug message level.
+ */
 static u32 iavf_get_msglevel(struct net_device *netdev)
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
@@ -424,7 +424,7 @@ static u32 iavf_get_msglevel(struct net_device *netdev)
  *
  * Set current debug message level. Higher values cause the driver to
  * be noisier.
- **/
+ */
 static void iavf_set_msglevel(struct net_device *netdev, u32 data)
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
@@ -439,8 +439,8 @@ static void iavf_set_msglevel(struct net_device *netdev, u32 data)
  * @netdev: network interface device structure
  * @drvinfo: ethool driver info structure
  *
- * Returns information about the driver and device for display to the user.
- **/
+ * Fills @drvinfo with information about the driver and device.
+ */
 static void iavf_get_drvinfo(struct net_device *netdev,
 			     struct ethtool_drvinfo *drvinfo)
 {
@@ -458,9 +458,9 @@ static void iavf_get_drvinfo(struct net_device *netdev,
  * @kernel_ring: ethtool extenal ringparam structure
  * @extack: netlink extended ACK report struct
  *
- * Returns current ring parameters. TX and RX rings are reported separately,
- * but the number of rings is not reported.
- **/
+ * Fills @ring with current ring parameters. TX and RX rings are reported
+ * separately, but the number of rings is not reported.
+ */
 static void iavf_get_ringparam(struct net_device *netdev,
 			       struct ethtool_ringparam *ring,
 			       struct kernel_ethtool_ringparam *kernel_ring,
@@ -483,7 +483,7 @@ static void iavf_get_ringparam(struct net_device *netdev,
  *
  * Sets ring parameters. TX and RX rings are controlled separately, but the
  * number of rings is not specified, so all rings get the same settings.
- **/
+ */
 static int iavf_set_ringparam(struct net_device *netdev,
 			      struct ethtool_ringparam *ring,
 			      struct kernel_ethtool_ringparam *kernel_ring,
@@ -551,7 +551,7 @@ static int iavf_set_ringparam(struct net_device *netdev,
  * Gets the per-queue settings for coalescence. Specifically Rx and Tx usecs
  * are per queue. If queue is <0 then we default to queue 0 as the
  * representative value.
- **/
+ */
 static int __iavf_get_coalesce(struct net_device *netdev,
 			       struct ethtool_coalesce *ec, int queue)
 {
@@ -588,11 +588,11 @@ static int __iavf_get_coalesce(struct net_device *netdev,
  * @kernel_coal: ethtool CQE mode setting structure
  * @extack: extack for reporting error messages
  *
- * Returns current coalescing settings. This is referred to elsewhere in the
- * driver as Interrupt Throttle Rate, as this is how the hardware describes
- * this functionality. Note that if per-queue settings have been modified this
- * only represents the settings of queue 0.
- **/
+ * Fills @ec with current coalescing settings. This is referred to elsewhere
+ * in the driver as Interrupt Throttle Rate, as this is how the hardware
+ * describes this functionality. Note that if per-queue settings have been
+ * modified this only represents the settings of queue 0.
+ */
 static int iavf_get_coalesce(struct net_device *netdev,
 			     struct ethtool_coalesce *ec,
 			     struct kernel_ethtool_coalesce *kernel_coal,
@@ -608,7 +608,7 @@ static int iavf_get_coalesce(struct net_device *netdev,
  * @queue: the queue to read
  *
  * Read specific queue's coalesce settings.
- **/
+ */
 static int iavf_get_per_queue_coalesce(struct net_device *netdev, u32 queue,
 				       struct ethtool_coalesce *ec)
 {
@@ -622,7 +622,7 @@ static int iavf_get_per_queue_coalesce(struct net_device *netdev, u32 queue,
  * @queue: the queue to modify
  *
  * Change the ITR settings for a specific queue.
- **/
+ */
 static int iavf_set_itr_per_queue(struct iavf_adapter *adapter,
 				  struct ethtool_coalesce *ec, int queue)
 {
@@ -680,7 +680,7 @@ static int iavf_set_itr_per_queue(struct iavf_adapter *adapter,
  * @queue: the queue to change
  *
  * Sets the coalesce settings for a particular queue.
- **/
+ */
 static int __iavf_set_coalesce(struct net_device *netdev,
 			       struct ethtool_coalesce *ec, int queue)
 {
@@ -722,7 +722,7 @@ static int __iavf_set_coalesce(struct net_device *netdev,
  * @extack: extack for reporting error messages
  *
  * Change current coalescing settings for every queue.
- **/
+ */
 static int iavf_set_coalesce(struct net_device *netdev,
 			     struct ethtool_coalesce *ec,
 			     struct kernel_ethtool_coalesce *kernel_coal,
@@ -1639,7 +1639,7 @@ static int iavf_set_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd)
  * @netdev: network interface device structure
  *
  * Return: number of RX rings.
- **/
+ */
 static u32 iavf_get_rx_ring_count(struct net_device *netdev)
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
@@ -1653,8 +1653,8 @@ static u32 iavf_get_rx_ring_count(struct net_device *netdev)
  * @cmd: ethtool rxnfc command
  * @rule_locs: pointer to store rule locations
  *
- * Returns Success if the command is supported.
- **/
+ * Return: 0 on success, -EOPNOTSUPP if the command is not supported.
+ */
 static int iavf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 			  u32 *rule_locs)
 {
@@ -1684,13 +1684,13 @@ static int iavf_get_rxnfc(struct net_device *netdev, struct ethtool_rxnfc *cmd,
 	return ret;
 }
 /**
- * iavf_get_channels: get the number of channels supported by the device
+ * iavf_get_channels - get the number of channels supported by the device
  * @netdev: network interface device structure
  * @ch: channel information structure
  *
  * For the purposes of our device, we only use combined channels, i.e. a tx/rx
  * queue pair. Report one extra channel to match our "other" MSI-X vector.
- **/
+ */
 static void iavf_get_channels(struct net_device *netdev,
 			      struct ethtool_channels *ch)
 {
@@ -1706,14 +1706,15 @@ static void iavf_get_channels(struct net_device *netdev,
 }
 
 /**
- * iavf_set_channels: set the new channel count
+ * iavf_set_channels - set the new channel count
  * @netdev: network interface device structure
  * @ch: channel information structure
  *
- * Negotiate a new number of channels with the PF then do a reset.  During
- * reset we'll realloc queues and fix the RSS table.  Returns 0 on success,
- * negative on failure.
- **/
+ * Negotiate a new number of channels with the PF then do a reset. During
+ * reset we'll realloc queues and fix the RSS table.
+ *
+ * Return: 0 on success, negative on failure.
+ */
 static int iavf_set_channels(struct net_device *netdev,
 			     struct ethtool_channels *ch)
 {
@@ -1750,8 +1751,8 @@ static int iavf_set_channels(struct net_device *netdev,
  * iavf_get_rxfh_key_size - get the RSS hash key size
  * @netdev: network interface device structure
  *
- * Returns the table size.
- **/
+ * Return: the RSS hash key size.
+ */
 static u32 iavf_get_rxfh_key_size(struct net_device *netdev)
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
@@ -1763,8 +1764,8 @@ static u32 iavf_get_rxfh_key_size(struct net_device *netdev)
  * iavf_get_rxfh_indir_size - get the rx flow hash indirection table size
  * @netdev: network interface device structure
  *
- * Returns the table size.
- **/
+ * Return: the indirection table size.
+ */
 static u32 iavf_get_rxfh_indir_size(struct net_device *netdev)
 {
 	struct iavf_adapter *adapter = netdev_priv(netdev);
@@ -1777,8 +1778,10 @@ static u32 iavf_get_rxfh_indir_size(struct net_device *netdev)
  * @netdev: network interface device structure
  * @rxfh: pointer to param struct (indir, key, hfunc)
  *
- * Reads the indirection table directly from the hardware. Always returns 0.
- **/
+ * Reads the indirection table directly from the hardware.
+ *
+ * Return: 0 always.
+ */
 static int iavf_get_rxfh(struct net_device *netdev,
 			 struct ethtool_rxfh_param *rxfh)
 {
@@ -1806,9 +1809,9 @@ static int iavf_get_rxfh(struct net_device *netdev,
  * @rxfh: pointer to param struct (indir, key, hfunc)
  * @extack: extended ACK from the Netlink message
  *
- * Returns -EINVAL if the table specifies an invalid queue id, otherwise
- * returns 0 after programming the table.
- **/
+ * Return: 0 on success, -EOPNOTSUPP if the hash function is not supported,
+ * -EINVAL if the table specifies an invalid queue id.
+ */
 static int iavf_set_rxfh(struct net_device *netdev,
 			 struct ethtool_rxfh_param *rxfh,
 			 struct netlink_ext_ack *extack)
@@ -1885,7 +1888,7 @@ static const struct ethtool_ops iavf_ethtool_ops = {
  *
  * Sets ethtool ops struct in our netdev so that ethtool can call
  * our functions.
- **/
+ */
 void iavf_set_ethtool_ops(struct net_device *netdev)
 {
 	netdev->ethtool_ops = &iavf_ethtool_ops;
-- 
2.52.0

^ permalink raw reply related

* Re: [PATCH net-next 4/5] net/sched: netem: add per-impairment extended statistics
From: Paolo Abeni @ 2026-04-09  9:30 UTC (permalink / raw)
  To: Stephen Hemminger, netdev
  Cc: Jamal Hadi Salim, Jiri Pirko, David S. Miller, Eric Dumazet,
	Jakub Kicinski, Simon Horman, open list
In-Reply-To: <20260403225324.476787-5-stephen@networkplumber.org>

On 4/4/26 12:52 AM, Stephen Hemminger wrote:
> diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
> index 66e8072f44df..fada10cb9b7b 100644
> --- a/include/uapi/linux/pkt_sched.h
> +++ b/include/uapi/linux/pkt_sched.h
> @@ -569,6 +569,15 @@ struct tc_netem_gemodel {
>  #define NETEM_DIST_SCALE	8192
>  #define NETEM_DIST_MAX		16384
>  
> +struct tc_netem_xstats {
> +	__u32	delayed;	/* packets delayed */
> +	__u32	dropped;	/* packets dropped by loss model      */
> +	__u32	corrupted;	/* packets with bit errors injected   */
> +	__u32	duplicated;	/* duplicate packets generated        */
> +	__u32	reordered;	/* packets sent out of order          */
> +	__u32	ecn_marked;	/* packets ECN CE-marked (not dropped)*/
> +};

Sashiko notes that the counter size will be set in stone by the uAPI,
and u32 can wrap around very quickly (especially for unconditional delay).

I see other qdiscs generally use __u32, but some have __u64 too, so I
assume there is no architectural blocker to larger counters.

Could you please use __u64 above?
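
To put a rough number on the wraparound concern, here is a small
back-of-the-envelope helper. The packet rate is an assumed, illustrative
figure, not a measurement:

```c
#include <stdint.h>

/* Seconds until an unsigned counter of the given bit width wraps when
 * it is incremented once per packet at a constant rate.
 */
static double secs_until_wrap(unsigned int bits, double pkts_per_sec)
{
	double range;

	if (bits >= 64)
		range = 18446744073709551616.0;	/* 2^64 */
	else
		range = (double)(UINT64_C(1) << bits);

	return range / pkts_per_sec;
}
```

At an assumed 1 Mpps of delayed packets, secs_until_wrap(32, 1e6) is
about 4295 seconds (roughly 72 minutes), while the 64-bit case is on the
order of 10^13 seconds.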

Thanks,

Paolo


^ permalink raw reply

* [PATCH net-next] mlx4: correct error reporting in mlx4_master_process_vhcr()
From: Alok Tiwari @ 2026-04-09  9:27 UTC (permalink / raw)
  To: tariqt, andrew+netdev, kuba, davem, edumazet, pabeni, horms,
	netdev
  Cc: alok.a.tiwarilinux, alok.a.tiwari

mlx4_master_process_vhcr() logs vhcr->errno on failures, but this field
is never populated by the PF path. As a result, all failures are reported
with errno 0, and err is printed in the status field, which is misleading.

Use the actual return value (err) instead, translate it to FW status
before logging, and report both values.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
---
 drivers/net/ethernet/mellanox/mlx4/cmd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index de0193d82ec1..bdaf152e6712 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -1782,6 +1782,7 @@ static int mlx4_master_process_vhcr(struct mlx4_dev *dev, int slave,
 	}
 
 	if (err) {
+		vhcr_cmd->status = mlx4_errno_to_status(err);
 		if (!(dev->persist->state & MLX4_DEVICE_STATE_INTERNAL_ERROR)) {
 			if (vhcr->op == MLX4_CMD_ALLOC_RES &&
 			    (vhcr->in_modifier & 0xff) == RES_COUNTER &&
@@ -1791,9 +1792,8 @@ static int mlx4_master_process_vhcr(struct mlx4_dev *dev, int slave,
 					 slave, err);
 			else
 				mlx4_warn(dev, "vhcr command:0x%x slave:%d failed with error:%d, status %d\n",
-					  vhcr->op, slave, vhcr->errno, err);
+					  vhcr->op, slave, err, vhcr_cmd->status);
 		}
-		vhcr_cmd->status = mlx4_errno_to_status(err);
 		goto out_status;
 	}
 
-- 
2.50.1


^ permalink raw reply related

* Re: [PATCH mlx5-next 0/2] mlx5-next updates 2026-04-03
From: Leon Romanovsky @ 2026-04-09  9:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Saeed Mahameed, Tariq Toukan
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller, Mark Bloch, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Dragos Tatulea, Moshe Shemesh
In-Reply-To: <20260403090028.137783-1-tariqt@nvidia.com>


On Fri, 03 Apr 2026 12:00:26 +0300, Tariq Toukan wrote:
> This series contains mlx5 shared updates as preparation for upcoming
> features.
> 
> Regards,
> Tariq
> 
> Moshe Shemesh (2):
>   net/mlx5: Rename MLX5_PF page counter type to MLX5_SELF
>   net/mlx5: Add icm_mng_function_id_mode cap bit
> 
> [...]

Applied, thanks!

[1/2] net/mlx5: Rename MLX5_PF page counter type to MLX5_SELF
      https://git.kernel.org/rdma/rdma/c/f9e3bd43d55f24
[2/2] net/mlx5: Add icm_mng_function_id_mode cap bit
      https://git.kernel.org/rdma/rdma/c/a1bac8b70ede33

Best regards,
-- 
Leon Romanovsky <leon@kernel.org>


^ permalink raw reply

* [PATCH net-next] gre: Count GRE packet drops
From: Gal Pressman @ 2026-04-09  9:09 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Andrew Lunn, netdev
  Cc: David Ahern, Simon Horman, Gal Pressman, Dragos Tatulea,
	Nimrod Oren

GRE is silently dropping packets without updating statistics.

In case of drop, increment rx_dropped counter to provide visibility into
packet loss. For the case where no GRE protocol handler is registered,
use rx_nohandler.

Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
---
 net/ipv4/gre_demux.c | 8 ++++++--
 net/ipv4/ip_gre.c    | 1 +
 net/ipv6/ip6_gre.c   | 1 +
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index dafd68f3436a..96fd7dc6d82d 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -159,14 +159,18 @@ static int gre_rcv(struct sk_buff *skb)
 	rcu_read_lock();
 	proto = rcu_dereference(gre_proto[ver]);
 	if (!proto || !proto->handler)
-		goto drop_unlock;
+		goto drop_nohandler;
 	ret = proto->handler(skb);
 	rcu_read_unlock();
 	return ret;
 
-drop_unlock:
+drop_nohandler:
 	rcu_read_unlock();
+	dev_core_stats_rx_nohandler_inc(skb->dev);
+	kfree_skb(skb);
+	return NET_RX_DROP;
 drop:
+	dev_core_stats_rx_dropped_inc(skb->dev);
 	kfree_skb(skb);
 	return NET_RX_DROP;
 }
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 35f0baa99d40..169e2921a851 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -468,6 +468,7 @@ static int gre_rcv(struct sk_buff *skb)
 out:
 	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);
 drop:
+	dev_core_stats_rx_dropped_inc(skb->dev);
 	kfree_skb(skb);
 	return 0;
 }
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index dafcc0dcd77a..63fc8556b475 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -593,6 +593,7 @@ static int gre_rcv(struct sk_buff *skb)
 out:
 	icmpv6_send(skb, ICMPV6_DEST_UNREACH, ICMPV6_PORT_UNREACH, 0);
 drop:
+	dev_core_stats_rx_dropped_inc(skb->dev);
 	kfree_skb(skb);
 	return 0;
 }
-- 
2.52.0


^ permalink raw reply related

* Re: [PATCH net-next v2 05/14] libie: add bookkeeping support for control queue messages
From: Paolo Abeni @ 2026-04-09  9:07 UTC (permalink / raw)
  To: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev
  Cc: Phani R Burra, larysa.zaremba, przemyslaw.kitszel,
	aleksander.lobakin, sridhar.samudrala, anjali.singhai,
	michal.swiatkowski, maciej.fijalkowski, emil.s.tantilov,
	madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, Bharath R, Samuel Salin, Aleksandr Loktionov
In-Reply-To: <20260403194938.3577011-6-anthony.l.nguyen@intel.com>

On 4/3/26 9:49 PM, Tony Nguyen wrote:
> +static bool
> +libie_ctlq_xn_process_recv(struct libie_ctlq_xn_recv_params *params,
> +			   struct libie_ctlq_msg *ctlq_msg)
> +{
> +	struct libie_ctlq_xn_manager *xnm = params->xnm;
> +	struct libie_ctlq_xn *xn;
> +	u16 msg_cookie, xn_index;
> +	struct kvec *response;
> +	int status;
> +	u16 data;
> +
> +	data = ctlq_msg->sw_cookie;
> +	xn_index = FIELD_GET(LIBIE_CTLQ_XN_INDEX_M, data);
> +	msg_cookie = FIELD_GET(LIBIE_CTLQ_XN_COOKIE_M, data);
> +	status = ctlq_msg->chnl_retval ? -EFAULT : 0;
> +
> +	xn = &xnm->ring[xn_index];
> +	if (ctlq_msg->chnl_opcode != xn->virtchnl_opcode ||
> +	    msg_cookie != xn->cookie)
> +		return false;
> +
> +	spin_lock(&xn->xn_lock);

Sashiko says:

---
Because the cookie and opcode are checked before acquiring the lock, is
it possible for the transaction to time out, be returned to the free
list, and get reallocated for a new message before the lock is acquired?
If that happens, could the old delayed response falsely complete the
newly allocated transaction since the identifiers are not re-verified
inside the lock?
---
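
One way to close that kind of window, sketched in userspace with a
pthread mutex standing in for the spinlock. The struct and function
names here are hypothetical and are not the libie ones:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a transaction slot. */
struct xn_slot {
	pthread_mutex_t lock;
	uint16_t cookie;
	uint32_t opcode;
	bool completed;
};

/* Complete the slot only if the identifiers still match once the lock
 * is held, so a stale response cannot complete a reused slot.
 */
static bool xn_complete(struct xn_slot *xn, uint32_t opcode, uint16_t cookie)
{
	bool matched = false;

	pthread_mutex_lock(&xn->lock);
	if (xn->opcode == opcode && xn->cookie == cookie) {
		xn->completed = true;
		matched = true;
	}
	pthread_mutex_unlock(&xn->lock);

	return matched;
}
```

The point is only that the comparison and the completion happen under
the same lock; whether that fits the real driver is for the authors to
judge.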

> +/**
> + * libie_xn_check_async_timeout - Check for asynchronous message timeouts
> + * @xnm: Xn transaction manager
> + *
> + * Call the corresponding callback to notify the caller about the timeout.
> + */
> +static void libie_xn_check_async_timeout(struct libie_ctlq_xn_manager *xnm)
> +{
> +	u32 idx;
> +
> +	for_each_clear_bit(idx, xnm->free_xns_bm, LIBIE_CTLQ_MAX_XN_ENTRIES) {

Sashiko says:

---
This iterates over the bitmap without holding the lock. Concurrently,
other paths modify this bitmap using non-atomic bitwise operations like
__clear_bit() and __set_bit() under the lock. Will this cause torn reads
or data races that might lead the timeout handler to skip valid
transactions or examine invalid ones?
---


> +		params->ctlq_msg->sw_cookie = cookie;
> +		params->ctlq_msg->send_mem = *dma_mem;
> +		params->ctlq_msg->data_len = buf_len;
> +		params->ctlq_msg->chnl_opcode = params->chnl_opcode;
> +		ret = libie_ctlq_send(params->ctlq, params->ctlq_msg, 1);
> +	}
> +
> +	if (ret && !libie_cp_can_send_onstack(buf_len))
> +		libie_cp_unmap_dma_mem(dev, dma_mem);

Sashiko says:

---
When libie_ctlq_send() fails here, the DMA memory is unmapped and the
buffer is freed by the caller. However, the software tracking ring at
tx_msg[next_to_use] still contains the populated send_mem details and a
non-zero data_len.

During driver teardown, libie_ctlq_xn_send_clean() is invoked with
params->force = true, which processes the ring without checking the
hardware completion bit. Could this cause the cleanup routine to process
the failed slot again, resulting in a double-free and double-unmap?
---
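
A userspace sketch of the general pattern that avoids this class of
bug: release through a single helper that also clears the tracking
slot, so a later forced cleanup finds nothing to release. All names are
hypothetical and malloc()/free() stand in for the DMA map/unmap:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical software-ring slot; not the libie structure. */
struct tx_slot {
	void *buf;
	size_t data_len;
};

static unsigned int nr_released;

/* Release a slot's buffer exactly once; an empty slot is a no-op. */
static void tx_slot_release(struct tx_slot *slot)
{
	if (!slot->data_len)
		return;
	free(slot->buf);
	slot->buf = NULL;
	slot->data_len = 0;
	nr_released++;
}

/* Populate the slot, then undo it on a (simulated) send failure so a
 * later forced cleanup pass has nothing left to release.
 */
static int tx_slot_send(struct tx_slot *slot, size_t len, int send_ret)
{
	if (!len)
		return -1;
	slot->buf = malloc(len);
	if (!slot->buf)
		return -1;
	slot->data_len = len;

	if (send_ret)
		tx_slot_release(slot);

	return send_ret;
}
```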

There are more remarks on the following patch, please have a look.

Also, it would be very helpful if you could help triage the
(overwhelming amount of) feedback on future submissions by explicitly
commenting on the ML. Sashiko tends to be quite noisy on device driver code.

Thanks,

Paolo


^ permalink raw reply

* Re: [PATCH bpf-next v3 5/6] bpf: clear decap tunnel GSO state in skb_adjust_room
From: Hudson, Nick @ 2026-04-09  9:03 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: bpf@vger.kernel.org, netdev@vger.kernel.org, Willem de Bruijn,
	Martin KaFai Lau, Tottenham, Max, Glasgall, Anna,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Eduard Zingerman, Kumar Kartikeya Dwivedi, David S. Miller,
	Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	linux-kernel@vger.kernel.org
In-Reply-To: <willemdebruijn.kernel.245c592e6d270@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3944 bytes --]



> On Apr 8, 2026, at 4:10 PM, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> 
> Nick Hudson wrote:
>> On shrink in bpf_skb_adjust_room(), clear tunnel-specific GSO flags
>> according to the decapsulation flags:
>> 
>> - BPF_F_ADJ_ROOM_DECAP_L4_UDP clears SKB_GSO_UDP_TUNNEL{,_CSUM}, and
>>                                     SKB_GSO_TUNNEL_REMCSUM
>> - BPF_F_ADJ_ROOM_DECAP_L4_GRE clears SKB_GSO_GRE{,_CSUM}
>> - BPF_F_ADJ_ROOM_DECAP_IPXIP4 clears SKB_GSO_IPXIP4
>> - BPF_F_ADJ_ROOM_DECAP_IPXIP6 clears SKB_GSO_IPXIP6
>> 
>> When all tunnel-related GSO bits are cleared, also clear
>> skb->encapsulation.
>> 
>> Handle the ESP inside a UDP tunnel case where encapsulation should remain
>> set.
>> 
>> If UDP decap is performed and GSO state removed then reset encap_hdr_csum, and
>> remcsum_offload.
>> 
>> Co-developed-by: Max Tottenham <mtottenh@akamai.com>
>> Signed-off-by: Max Tottenham <mtottenh@akamai.com>
>> Co-developed-by: Anna Glasgall <aglasgal@akamai.com>
>> Signed-off-by: Anna Glasgall <aglasgal@akamai.com>
>> Signed-off-by: Nick Hudson <nhudson@akamai.com>
>> ---
>> net/core/filter.c | 40 ++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 40 insertions(+)
>> 
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 7f8d43420afb..04059d07d368 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3667,6 +3667,46 @@ static int bpf_skb_net_shrink(struct sk_buff *skb, u32 off, u32 len_diff,
>> if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
>> skb_increase_gso_size(shinfo, len_diff);
>> 
>> + /* Selective GSO flag clearing based on decap type.
>> + * Only clear the flags for the tunnel layer being removed.
>> + */
>> + if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) &&
>> +    (shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
>> + SKB_GSO_UDP_TUNNEL_CSUM |
>> + SKB_GSO_TUNNEL_REMCSUM)))
>> + shinfo->gso_type &= ~(SKB_GSO_UDP_TUNNEL |
>> +      SKB_GSO_UDP_TUNNEL_CSUM |
>> +      SKB_GSO_TUNNEL_REMCSUM);
> 
> REMCSUM was previously not included in the series.

Yeah, I was trying to address

https://sashiko.dev/#/patchset/20260318134242.2725749-1-nhudson%40akamai.com?part=5


> 
> It is a non-obvious and rare enough feature that I would exclude it,
> or move it to a separate patch.

I’m happy to drop it.


> 
>> + if ((flags & BPF_F_ADJ_ROOM_DECAP_L4_GRE) &&
>> +    (shinfo->gso_type & (SKB_GSO_GRE | SKB_GSO_GRE_CSUM)))
>> + shinfo->gso_type &= ~(SKB_GSO_GRE |
>> +      SKB_GSO_GRE_CSUM);
>> + if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP4) &&
>> +    (shinfo->gso_type & SKB_GSO_IPXIP4))
>> + shinfo->gso_type &= ~SKB_GSO_IPXIP4;
>> + if ((flags & BPF_F_ADJ_ROOM_DECAP_IPXIP6) &&
>> +    (shinfo->gso_type & SKB_GSO_IPXIP6))
>> + shinfo->gso_type &= ~SKB_GSO_IPXIP6;
>> +
>> + /* Clear encapsulation flag only when no tunnel GSO flags remain */
>> + if (flags & (BPF_F_ADJ_ROOM_DECAP_L4_MASK |
>> +     BPF_F_ADJ_ROOM_DECAP_IPXIP_MASK)) {
>> + if (!(shinfo->gso_type & (SKB_GSO_UDP_TUNNEL |
>> +  SKB_GSO_UDP_TUNNEL_CSUM |
>> +  SKB_GSO_GRE |
>> +  SKB_GSO_GRE_CSUM |
>> +  SKB_GSO_IPXIP4 |
>> +  SKB_GSO_IPXIP6 |
>> +  SKB_GSO_ESP)))
>> + if (skb->encapsulation)
>> + skb->encapsulation = 0;
>> +
>> + if (flags & BPF_F_ADJ_ROOM_DECAP_L4_UDP) {
>> + skb->encap_hdr_csum = !!(shinfo->gso_type & SKB_GSO_UDP_TUNNEL_CSUM);
> 
> Since the flag is never set, only possibly cleared: just clear this field when clearing the flag?
> 
> It appears that this is only used for deprecated UFO anyway.
> 
>> + skb->remcsum_offload = !!(shinfo->gso_type & SKB_GSO_TUNNEL_REMCSUM);
> 
> Always zero?

Yeah, I’ll fix these up in v4
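
For illustration, the shape of that simplification as a userspace
sketch. The bit values and struct are made up for the example and are
not the kernel's SKB_GSO_* constants or struct sk_buff:

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not the kernel's SKB_GSO_* constants. */
#define GSO_UDP_TUNNEL		(1u << 0)
#define GSO_UDP_TUNNEL_CSUM	(1u << 1)
#define GSO_GRE			(1u << 2)

/* Hypothetical packet model holding just the two fields of interest. */
struct pkt_model {
	uint32_t gso_type;
	bool encap_hdr_csum;
};

/* On UDP decap, drop the UDP tunnel bits and clear the dependent field
 * in the same place, since decap never sets it.
 */
static void decap_udp(struct pkt_model *p)
{
	if (!(p->gso_type & (GSO_UDP_TUNNEL | GSO_UDP_TUNNEL_CSUM)))
		return;

	p->gso_type &= ~(GSO_UDP_TUNNEL | GSO_UDP_TUNNEL_CSUM);
	p->encap_hdr_csum = false;
}
```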

Thanks.


[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3066 bytes --]

^ permalink raw reply

* Re: [PATCH 1/5] uaccess: fix ignored_trailing logic in copy_struct_to_user()
From: Stefan Metzmacher @ 2026-04-09  9:01 UTC (permalink / raw)
  To: Aleksa Sarai
  Cc: linux-kernel, Dmitry Safonov, Dmitry Safonov, Salam Noureddine,
	David Ahern, David S . Miller, Michal Luczaj, David Wei,
	Luiz Augusto von Dentz, Luiz Augusto von Dentz, Marcel Holtmann,
	Xin Long, Eric Dumazet, Kuniyuki Iwashima, Paolo Abeni,
	Willem de Bruijn, Neal Cardwell, Jakub Kicinski, Simon Horman,
	Christian Brauner, Kees Cook, netdev, linux-bluetooth
In-Reply-To: <2026-04-08-ditzy-organic-yowl-croc-yWsgIE@cyphar.com>

Hi Aleksa,

> On 2026-04-07, Stefan Metzmacher <metze@samba.org> wrote:
>> Currently all callers pass ignored_trailing=NULL, but I have
>> code that will make use of it.
>>
>> Now it actually behaves like documented:
>>
>> * If @usize < @ksize, then the kernel is trying to pass userspace a newer
>>    struct than it supports. Thus we only copy the interoperable portions
>>    (@usize) and ignore the rest (but @ignored_trailing is set to %true if
>>    any of the trailing (@ksize - @usize) bytes are non-zero).
> 
> Good catch, though I want to mention that the current API design for
> copy_struct_to_user() is a bit of a compromise -- I was trying to think
> of a way of making it generic but what information you need really
> depends on your API.
> 
> For request-flag APIs (like statx) then you can just unset the bits in
> the response mask for fields past usize and so it is a non-fatal error,
> but it requires knowing which field offsets map to which flags.
> 
> My initial idea for ignored_trailing was for it to return the offset
> memchr_inv() gives you -- but unfortunately, this doesn't help in the
> more generic case where you have multiple non-zero bits that need to
> unset multiple flags.

I guess the caller could use if (ignored_trailing) { ... }
to check more complex stuff and then decide whether to ignore it
or return an error.

> Out of interest, how did you plan on using it? It might be a good idea
> to rethink this API before it starts getting used "in anger" in a way
> that leaks to uAPIs we can't change.

Currently I only use it with WARN_ON_ONCE(ignored_trailing);
in order to catch internal errors. See
https://git.samba.org/?p=metze/linux/wip.git;a=blob;f=fs/smb/common/smbdirect/smbdirect_proto.c;h=ce7c78eb6795041ba672da434ffb01db73269cb7;hb=37c61ef9758f3e113d4078220d8fc2aee366c955#l1625
But I guess I will at least change it to
if (WARN_ON_ONCE(ignored_trailing))
     return...

And in general I thought it would be good practice to
check that case in new code in order to avoid unexpected
behavior.
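
For illustration only, here is a minimal userspace sketch of the documented
@usize < @ksize behaviour and of the caller-side check discussed above. It is
a model under stated assumptions, not the kernel implementation: the names
model_copy_struct(), model_caller() and first_nonzero() are hypothetical, and
first_nonzero() merely stands in for memchr_inv(p, 0, n).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's memchr_inv(p, 0, n): returns a
 * pointer to the first non-zero byte, or NULL if all n bytes are zero.
 */
static const void *first_nonzero(const void *p, size_t n)
{
	const unsigned char *c = p;
	size_t i;

	for (i = 0; i < n; i++)
		if (c[i])
			return c + i;
	return NULL;
}

/* Userspace model of the documented behaviour (not the kernel code):
 * copy the common prefix, zero-fill a larger destination, and report
 * whether non-zero trailing source bytes had to be dropped.
 */
static void model_copy_struct(void *dst, size_t usize,
			      const void *src, size_t ksize,
			      bool *ignored_trailing)
{
	size_t size = usize < ksize ? usize : ksize;
	size_t rest = (usize > ksize ? usize : ksize) - size;

	if (usize > ksize)
		memset((char *)dst + size, 0, rest);
	if (ignored_trailing)
		*ignored_trailing = usize < ksize &&
			first_nonzero((const char *)src + size, rest) != NULL;
	memcpy(dst, src, size);
}

/* Caller-side policy (an assumption): dropped non-zero data is an error. */
static int model_caller(void *dst, size_t usize,
			const void *src, size_t ksize)
{
	bool ignored_trailing = false;

	model_copy_struct(dst, usize, src, ksize, &ignored_trailing);
	return ignored_trailing ? -1 : 0;
}
```

In this model a caller that truncates a struct whose tail is all zero gets 0,
while one that would silently lose non-zero fields gets -1 and can decide how
to react.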

> In any case, for this patch feel free to take my
> 
> Reviewed-by: Aleksa Sarai <aleksa@amutable.com>

Thanks!
metze


^ permalink raw reply

* Re: [PATCH net-next v2 02/14] libie: add PCI device initialization helpers to libie
From: Paolo Abeni @ 2026-04-09  8:56 UTC (permalink / raw)
  To: Tony Nguyen, davem, kuba, edumazet, andrew+netdev, netdev
  Cc: Phani R Burra, larysa.zaremba, przemyslaw.kitszel,
	aleksander.lobakin, sridhar.samudrala, anjali.singhai,
	michal.swiatkowski, maciej.fijalkowski, emil.s.tantilov,
	madhu.chittim, joshua.a.hay, jacob.e.keller,
	jayaprakash.shanmugam, jiri, horms, corbet, richardcochran,
	linux-doc, bhelgaas, linux-pci, Bharath R, Samuel Salin,
	Aleksandr Loktionov
In-Reply-To: <20260403194938.3577011-3-anthony.l.nguyen@intel.com>

On 4/3/26 9:49 PM, Tony Nguyen wrote:
> +	mr = libie_find_mmio_region(&mmio_info->mmio_list, offset, size,
> +				    bar_idx);
> +	if (mr) {
> +		pci_warn(pdev,
> +			 "Mapping of BAR%u (offset=%llu, size=%llu) intersecting region (offset=%llu, size=%llu) already exists\n",
> +			 bar_idx, (unsigned long long)mr->offset,
> +			 (unsigned long long)mr->size,
> +			 (unsigned long long)offset, (unsigned long long)size);
> +		return mr->offset <= offset &&
> +		       mr->offset + mr->size >= offset + size;

Sashiko says:

---
Does returning true here without creating a new tracking object leave
the new mapping tied to the original mapping's lifetime?
If the driver unmaps the original region, iounmap() is called and the
tracking object is freed. Any cached virtual address pointers to the
sub-region would then become a use-after-free, and subsequent queries
for the sub-region would fail.
---
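
To make the distinction in the quoted return statement concrete, the sketch
below separates "intersects" from "fully contained". It is a hedged
illustration: struct region and both helpers are hypothetical names, and the
intersection test is only an assumption about what libie_find_mmio_region()
matches on, since its body is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a tracked mapping: [offset, offset + size). */
struct region {
	uint64_t offset;
	uint64_t size;
};

/* True if the requested range overlaps the tracked region at all. */
static bool region_intersects(const struct region *r,
			      uint64_t offset, uint64_t size)
{
	return r->offset < offset + size && offset < r->offset + r->size;
}

/* True only if the requested range lies fully inside the tracked region;
 * this mirrors the expression returned in the quoted hunk.
 */
static bool region_contains(const struct region *r,
			    uint64_t offset, uint64_t size)
{
	return r->offset <= offset &&
	       r->offset + r->size >= offset + size;
}
```

A request that is contained reuses the existing mapping without a tracking
object of its own, which is the lifetime coupling the review comment asks
about.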

/P


^ permalink raw reply

* [PATCH net-next 2/2] ipvlan: avoid spinlock contention in ipvlan_multicast_enqueue()
From: Eric Dumazet @ 2026-04-09  8:52 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409085238.1122947-1-edumazet@google.com>

Under high stress, we spend a lot of time cloning skbs,
then acquiring a spinlock, then freeing the clone because
the queue is full.

Add a shortcut to avoid these costs under pressure, as we did
in macvlan with commit 0d5dc1d7aad1 ("macvlan: avoid spinlock
contention in macvlan_broadcast_enqueue()")

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ipvlan/ipvlan_core.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 214fd299a5aa6e40579aa2dbcb178b5474b561a4..1be8620ad3971d281fb36fd0770efd67b566ae60 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -763,10 +763,16 @@ static rx_handler_result_t ipvlan_handle_mode_l2(struct sk_buff **pskb,
 	if (!ipvlan_external_frame(skb, port))
 		return RX_HANDLER_PASS;
 
-	nskb = skb_clone(skb, GFP_ATOMIC);
+	if (skb_queue_len_lockless(&port->backlog) >= IPVLAN_QBACKLOG_LIMIT)
+		nskb = NULL;
+	else
+		nskb = skb_clone(skb, GFP_ATOMIC);
+
 	if (nskb) {
 		ipvlan_skb_crossing_ns(nskb, NULL);
 		ipvlan_multicast_enqueue(port, nskb, false);
+	} else {
+		dev_core_stats_rx_dropped_inc(skb->dev);
 	}
 
 	return RX_HANDLER_PASS;
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH net-next 1/2] ipvlan: ipvlan_handle_mode_l2() refactoring
From: Eric Dumazet @ 2026-04-09  8:52 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet
In-Reply-To: <20260409085238.1122947-1-edumazet@google.com>

Reduce indentation level, and add a likely() clause
as we expect to process more unicast packets than multicast ones.

No functional change, this eases the following patch review.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ipvlan/ipvlan_core.c | 38 +++++++++++++++-----------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 0b493a8aa33857d531329e8eaef6b25a5c6f572d..214fd299a5aa6e40579aa2dbcb178b5474b561a4 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -744,34 +744,32 @@ static rx_handler_result_t ipvlan_handle_mode_l3(struct sk_buff **pskb,
 static rx_handler_result_t ipvlan_handle_mode_l2(struct sk_buff **pskb,
 						 struct ipvl_port *port)
 {
-	struct sk_buff *skb = *pskb;
+	struct sk_buff *nskb, *skb = *pskb;
 	struct ethhdr *eth = eth_hdr(skb);
-	rx_handler_result_t ret = RX_HANDLER_PASS;
 
 	if (unlikely(skb->pkt_type == PACKET_LOOPBACK))
 		return RX_HANDLER_PASS;
 
-	if (is_multicast_ether_addr(eth->h_dest)) {
-		if (ipvlan_external_frame(skb, port)) {
-			struct sk_buff *nskb = skb_clone(skb, GFP_ATOMIC);
+	/* Perform like l3 mode for non-multicast packet */
+	if (likely(!is_multicast_ether_addr(eth->h_dest)))
+		return ipvlan_handle_mode_l3(pskb, port);
 
-			/* External frames are queued for device local
-			 * distribution, but a copy is given to master
-			 * straight away to avoid sending duplicates later
-			 * when work-queue processes this frame. This is
-			 * achieved by returning RX_HANDLER_PASS.
-			 */
-			if (nskb) {
-				ipvlan_skb_crossing_ns(nskb, NULL);
-				ipvlan_multicast_enqueue(port, nskb, false);
-			}
-		}
-	} else {
-		/* Perform like l3 mode for non-multicast packet */
-		ret = ipvlan_handle_mode_l3(pskb, port);
+	/* External frames are queued for device local
+	 * distribution, but a copy is given to master
+	 * straight away to avoid sending duplicates later
+	 * when work-queue processes this frame.
+	 * This is achieved by returning RX_HANDLER_PASS.
+	 */
+	if (!ipvlan_external_frame(skb, port))
+		return RX_HANDLER_PASS;
+
+	nskb = skb_clone(skb, GFP_ATOMIC);
+	if (nskb) {
+		ipvlan_skb_crossing_ns(nskb, NULL);
+		ipvlan_multicast_enqueue(port, nskb, false);
 	}
 
-	return ret;
+	return RX_HANDLER_PASS;
 }
 
 rx_handler_result_t ipvlan_handle_frame(struct sk_buff **pskb)
-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply related

* [PATCH net-next 0/2] ipvlan: multicast delivery changes
From: Eric Dumazet @ 2026-04-09  8:52 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski, Paolo Abeni
  Cc: Simon Horman, netdev, eric.dumazet, Eric Dumazet

As we did recently for macvlan, this series adds some relief
when ipvlan is under multicast storms.

Eric Dumazet (2):
  ipvlan: ipvlan_handle_mode_l2() refactoring
  ipvlan: avoid spinlock contention in ipvlan_multicast_enqueue()

 drivers/net/ipvlan/ipvlan_core.c | 42 +++++++++++++++++---------------
 1 file changed, 23 insertions(+), 19 deletions(-)

-- 
2.53.0.1213.gd9a14994de-goog


^ permalink raw reply

* Re: [PATCH 3/5] uaccess: add copy_struct_{from,to}_bounce_buffer() helpers
From: Stefan Metzmacher @ 2026-04-09  8:47 UTC (permalink / raw)
  To: David Laight
  Cc: linux-kernel, Dmitry Safonov, Dmitry Safonov, Francesco Ruggeri,
	Salam Noureddine, David Ahern, David S . Miller, Michal Luczaj,
	David Wei, Luiz Augusto von Dentz, Luiz Augusto von Dentz,
	Marcel Holtmann, Xin Long, Eric Dumazet, Kuniyuki Iwashima,
	Paolo Abeni, Willem de Bruijn, Neal Cardwell, Jakub Kicinski,
	Simon Horman, Aleksa Sarai, Christian Brauner, Kees Cook, netdev,
	linux-bluetooth
In-Reply-To: <20260407192540.321f3879@pumpkin>

Hi David,

> On Tue,  7 Apr 2026 18:03:15 +0200
> Stefan Metzmacher <metze@samba.org> wrote:
> 
>> These are similar to copy_struct_{from,to}_user() but operate
>> on kernel buffers instead of user buffers.
>>
>> They can be used when there is a temporary bounce buffer used,
>> e.g. in msg_control or similar places.
>>
>> It allows us to have the same logic to handle old vs. current
>> and current vs. new structures in the same compatible way.
>>
>> copy_struct_from_sockptr() will also be able to
>> use copy_struct_from_bounce_buffer() for the kernel
>> case in a follow-up patch.
>>
>> I'll use this in my IPPROTO_SMBDIRECT work,
>> but maybe it will also be useful for others...
>> IPPROTO_QUIC will likely also use it.
>>
>> Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>> Cc: Dmitry Safonov <dima@arista.com>
>> Cc: Francesco Ruggeri <fruggeri@arista.com>
>> Cc: Salam Noureddine <noureddine@arista.com>
>> Cc: David Ahern <dsahern@kernel.org>
>> Cc: David S. Miller <davem@davemloft.net>
>> Cc: Michal Luczaj <mhal@rbox.co>
>> Cc: David Wei <dw@davidwei.uk>
>> Cc: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
>> Cc: Luiz Augusto von Dentz <luiz.dentz@gmail.com>
>> Cc: Marcel Holtmann <marcel@holtmann.org>
>> Cc: Xin Long <lucien.xin@gmail.com>
>> Cc: Eric Dumazet <edumazet@google.com>
>> Cc: Kuniyuki Iwashima <kuniyu@google.com>
>> Cc: Paolo Abeni <pabeni@redhat.com>
>> Cc: Willem de Bruijn <willemb@google.com>
>> Cc: Neal Cardwell <ncardwell@google.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Simon Horman <horms@kernel.org>
>> Cc: Aleksa Sarai <cyphar@cyphar.com>
>> Cc: Christian Brauner <brauner@kernel.org>
>> CC: Kees Cook <keescook@chromium.org>
>> Cc: netdev@vger.kernel.org
>> Cc: linux-bluetooth@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> Signed-off-by: Stefan Metzmacher <metze@samba.org>
>> ---
>>   include/linux/uaccess.h | 63 +++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 63 insertions(+)
>>
>> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
>> index 1234b5fa4761..a6cd4f48bb99 100644
>> --- a/include/linux/uaccess.h
>> +++ b/include/linux/uaccess.h
>> @@ -513,6 +513,69 @@ copy_struct_to_user(void __user *dst, size_t usize, const void *src,
>>   	return 0;
>>   }
>>   
>> +static __always_inline void
>> +__copy_struct_generic_bounce_buffer(void *dst, size_t dstsize,
>> +				    const void *src, size_t srcsize,
>> +				    bool *ignored_trailing)
>> +{
>> +	size_t size = min(dstsize, srcsize);
>> +	size_t rest = max(dstsize, srcsize) - size;
>> +
>> +	/* Deal with trailing bytes. */
>> +	if (dstsize > srcsize)
>> +		memset(dst + size, 0, rest);
>> +	if (ignored_trailing)
>> +		*ignored_trailing = dstsize < srcsize &&
>> +			memchr_inv(src + size, 0, rest) != NULL;
>> +	/* Copy the interoperable parts of the struct. */
>> +	memcpy(dst, src, size);
>> +}
> 
> Return 'ignored_trailing' rather than pass by reference.

I also thought about that but it makes
the copy_struct_to_ case more complex.

I'm not sure, but my guess would be that
the compiler has the chance to skip the
ignored_trailing logic if (as all current callers do)
NULL is passed.

> And this is probably too big to inline.

In the next patch this replaces the open-coded logic in
copy_struct_from_sockptr(). Since all of copy_struct_*
consists of inline functions, I thought it would be good to
keep it that way.

So unless there is a really good reason to change it,
I'd like to keep it as is.

Thanks!
metze

^ permalink raw reply

