Netdev List
* [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe
  2026-06-05  3:50 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
@ 2026-06-05  3:50 ` Ratheesh Kannoth
  2026-06-05  7:47   ` Ratheesh Kannoth
  0 siblings, 1 reply; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  3:50 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

There is only one admin-function (AF) PCI device per system.
Reject any additional AF probe with -EBUSY so that the driver model
matches the hardware and automated reviewers can rely on a single
bound instance.

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 3cf131508ecf..1f0c962e10f4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -3542,12 +3542,19 @@ static void rvu_update_module_params(struct rvu *rvu)
 		kpu_profile ? kpu_profile : default_pfl_name, KPU_NAME_LEN);
 }
 
+static atomic_t device_bound = ATOMIC_INIT(0);
+
 static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct device *dev = &pdev->dev;
 	struct rvu *rvu;
 	int    err;
 
+	if (atomic_cmpxchg(&device_bound, 0, 1) != 0) {
+		dev_warn(dev, "Only one af device is supported.\n");
+		return -EBUSY;
+	}
+
 	rvu = devm_kzalloc(dev, sizeof(*rvu), GFP_KERNEL);
 	if (!rvu)
 		return -ENOMEM;
-- 
2.43.0



* [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements.
@ 2026-06-05  6:32 Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

This series extends Marvell octeontx2-af support for CN20K NPC (MCAM
debuggability, allocation policy, default-rule lifetime, optional KPU
profiles from firmware files, X2/X4 MCAM keyword handling in flows and
defaults, and dynamic CN20K NPC private state), adds a devlink mechanism
for multi-value parameters, and moves devlink_nl_param_fill() temporaries
to the heap so stack usage stays reasonable once union devlink_param_value
grows (patch 3).

Patch 1 enforces a single RVU admin-function (AF) PCI device per system.
Since the hardware has only one AF, rvu_probe() logs a warning and rejects
any additional bind with -EBUSY, so the driver model matches the hardware
and automated reviewers and tooling can rely on exactly one bound AF
instance.

Patch 2 improves CN20K MCAM visibility in debugfs: mcam_layout marks
enabled entries, dstats reports per-entry hit deltas (baseline updated in
software after each read; hardware counters are not cleared), and mismatch
lists enabled entries without a PF mapping.

Patch 3 allocates the per-configuration-mode union devlink_param_value
buffers and struct devlink_param_gset_ctx used by devlink_nl_param_fill()
with kcalloc()/kzalloc_obj() and funnels failures through a single cleanup
path so the netlink reply path stays safe as the union grows.

Patch 4 (Saeed) introduces DEVLINK_PARAM_TYPE_U64_ARRAY and nested
DEVLINK_ATTR_PARAM_VALUE_DATA attributes so drivers and user space can
exchange bounded u64 arrays; YAML, uapi, and netlink validation are
updated.

Patch 5 adds a runtime devlink parameter srch_order to reorder CN20K
subbank search during MCAM allocation (the param uses the u64 array type
from patch 4).
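
The cover letter does not say how srch_order is validated or consumed.
A sketch assuming the u64 array must be a permutation of the subbank
indices and that allocation takes the first subbank, in that order, with
a free entry; srch_order_valid(), pick_subbank() and free_cnt[] are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NUM_SUB_BANKS 32	/* same limit as cn20k/npc.h */

/* Hypothetical check: order[] must name every subbank exactly once. */
static bool srch_order_valid(const uint64_t *order, size_t n,
			     size_t num_subbanks)
{
	bool seen[MAX_NUM_SUB_BANKS] = { false };

	if (num_subbanks > MAX_NUM_SUB_BANKS || n != num_subbanks)
		return false;
	for (size_t i = 0; i < n; i++) {
		if (order[i] >= num_subbanks || seen[order[i]])
			return false;
		seen[order[i]] = true;
	}
	return true;
}

/* Hypothetical allocation step: first subbank in search order that
 * still has a free entry, or -1 if every subbank is full. */
static int pick_subbank(const uint64_t *order, size_t n,
			const unsigned int *free_cnt)
{
	assert(n <= MAX_NUM_SUB_BANKS);
	for (size_t i = 0; i < n; i++) {
		if (free_cnt[order[i]])
			return (int)order[i];
	}
	return -1;
}
```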

Patch 6 ties default MCAM entries to NIX LF alloc/free on CN20K, adds
NIX_LF_DONT_FREE_DFT_IDXS for PF teardown paths that must not drop default
NPC indexes while the driver still owns state, and tightens nix_lf_alloc
error propagation.

Patch 7 allows loading a custom KPU profile from /lib/firmware/kpu via the
kpu_profile module parameter, wires up cam2 and ptype_mask, and adds
helpers shared by the firmware-sourced and filesystem-sourced profile
layouts.

Patch 8 makes default-rule allocation, AF flow install, and PF-side RSS,
defaults, and ethtool flows respect the active CN20K MCAM keyword width
(X2 vs X4), including X4 reference-index masking and -EOPNOTSUPP when a
flow needs X4 keys on an X2-only profile.

Patch 9 replaces the file-scope npc_priv and static dstats with
allocations sized from the discovered bank/subbank geometry, threads
npc_priv_get() through the CN20K NPC paths, and allocates dstats with
devm_kzalloc() for the debugfs helper.
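
Geometry-driven sizing can be sketched as validate-then-allocate. This
assumes the bounds npc_priv_init() enforces in patch 2 (MAX_NUM_BANKS,
MAX_NUM_SUB_BANKS, MAX_SUBBANK_DEPTH) plus an even split of each bank
into subbanks; struct geom, geom_validate(), geom_mcam_idx() and
alloc_dstats() are hypothetical, calloc() stands in for devm_kzalloc(),
and every violation returns -EINVAL here, whereas patch 2 returns
-EFAULT for a bad subbank count:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_NUM_BANKS 2		/* limits from cn20k/npc.h (patch 2) */
#define MAX_NUM_SUB_BANKS 32
#define MAX_SUBBANK_DEPTH 256

/* Hypothetical geometry holder; the driver keeps these in npc_priv. */
struct geom {
	unsigned int num_banks;
	unsigned int bank_depth;
	unsigned int num_subbanks;
	unsigned int subbank_depth;	/* derived by geom_validate() */
};

static int geom_validate(struct geom *g)
{
	if (!g->num_banks || g->num_banks > MAX_NUM_BANKS)
		return -EINVAL;
	if (!g->num_subbanks || g->num_subbanks > MAX_NUM_SUB_BANKS)
		return -EINVAL;
	/* Assumption: a bank splits evenly into subbanks. */
	if (g->bank_depth % g->num_subbanks)
		return -EINVAL;
	g->subbank_depth = g->bank_depth / g->num_subbanks;
	if (!g->subbank_depth || g->subbank_depth > MAX_SUBBANK_DEPTH)
		return -EINVAL;
	return 0;
}

/* Flat MCAM index, as in the debugfs walkers: bank * bank_depth + idx. */
static unsigned int geom_mcam_idx(const struct geom *g, unsigned int bank,
				  unsigned int idx)
{
	assert(bank < g->num_banks && idx < g->bank_depth);
	return bank * g->bank_depth + idx;
}

/* One zeroed counter per MCAM entry, sized from the discovered geometry
 * instead of the compile-time maximum. Caller frees. */
static unsigned long long *alloc_dstats(const struct geom *g)
{
	return calloc((size_t)g->num_banks * g->bank_depth,
		      sizeof(unsigned long long));
}
```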

The single-AF guard is ordered first so that later patches can assume a
single bound AF. The heap-backed devlink_nl_param_fill() change sits
immediately before the U64 array param work so that every intermediate
build stays stack-safe as the union grows. Among the CN20K patches,
srch_order comes before NIX LF coordination, optional KPU profile loading
from firmware files, X2/X4 handling, and the npc_priv refactor, which
touches the same files heavily.

Ratheesh Kannoth (8):
  octeontx2-af: Enforce single RVU AF probe
  octeontx2-af: npc: cn20k: debugfs enhancements
  devlink: heap-allocate param fill buffers in devlink_nl_param_fill
  octeontx2-af: npc: cn20k: add subbank search order control
  octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle
  octeontx2-af: npc: Support for custom KPU profile from filesystem
  octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT
    alloc
  octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.

Saeed Mahameed (1):
  devlink: Implement devlink param multi attribute nested data values

 Documentation/netlink/specs/devlink.yaml      |   4 +
 .../marvell/octeontx2/af/cn20k/debugfs.c      | 163 ++++-
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c | 633 ++++++++++++------
 .../ethernet/marvell/octeontx2/af/cn20k/npc.h |  18 +-
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |   1 +
 .../net/ethernet/marvell/octeontx2/af/npc.h   |  17 +
 .../net/ethernet/marvell/octeontx2/af/rvu.c   |   7 +
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  12 +-
 .../marvell/octeontx2/af/rvu_devlink.c        |  92 ++-
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   |  77 ++-
 .../ethernet/marvell/octeontx2/af/rvu_npc.c   | 486 +++++++++++---
 .../ethernet/marvell/octeontx2/af/rvu_npc.h   |  17 +
 .../marvell/octeontx2/af/rvu_npc_fs.c         |  12 +-
 .../ethernet/marvell/octeontx2/af/rvu_reg.h   |   1 +
 .../marvell/octeontx2/nic/otx2_flows.c        |  48 +-
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |   6 +-
 include/net/devlink.h                         |   8 +
 include/uapi/linux/devlink.h                  |   1 +
 net/devlink/netlink_gen.c                     |   2 +
 net/devlink/param.c                           |  95 ++-
 20 files changed, 1295 insertions(+), 405 deletions(-)

--

v18 -> v19: Addressed Jakub's comments.
	https://lore.kernel.org/netdev/20260602060359.1894952-1-rkannoth@marvell.com/
	Added one more patch.

v17 -> v18: Addressed Sashiko's comments.
	https://lore.kernel.org/netdev/20260601025844.865865-1-rkannoth@marvell.com/

v16 -> v17: Addressed Jakub's comments.
	https://lore.kernel.org/netdev/20260521095303.2395584-1-rkannoth@marvell.com/

v15 -> v16: Addressed Sashiko's comments.
	https://lore.kernel.org/netdev/20260520020939.1457231-1-rkannoth@marvell.com/

v14 -> v15: Addressed Paolo's comments.
	https://lore.kernel.org/netdev/20260514062537.3813802-1-rkannoth@marvell.com/

v13 -> v14: Addressed Sashiko's comments.
	Reverted the change made for Jiri's v11 comment, because Sashiko
	flagged it as leaking kernel memory to userspace.
	https://lore.kernel.org/netdev/20260511033923.1301976-1-rkannoth@marvell.com/

v12 -> v13: Addressed David Laight's comments.
	https://lore.kernel.org/netdev/20260508034912.4082520-1-rkannoth@marvell.com/

v11 -> v12: Addressed Paolo's and Jiri's comments.
	https://lore.kernel.org/netdev/20260409025055.1664053-1-rkannoth@marvell.com/
	Added a patch that Simon had rejected for net, as it is an
	enhancement rather than a bug fix.
	Added one more patch, which allocates two variables from the heap.

v10 -> v11: Addressed Paolo's comments.
	https://lore.kernel.org/netdev/20260403025533.6250-1-rkannoth@marvell.com/

v9 -> v10: Addressed Paolo's comments.
	https://lore.kernel.org/netdev/20260330053105.2722453-1-rkannoth@marvell.com/

v8 -> v9: Addressed Simon's comments.
	https://lore.kernel.org/netdev/20260325072159.1126964-1-rkannoth@marvell.com/

v7 -> v8: Addressed Simon's comments.
	https://lore.kernel.org/netdev/20260323035110.3908741-1-rkannoth@marvell.com/T/#t

v6 -> v7: Addressed Simon's comments.
	https://lore.kernel.org/netdev/20260320165432.98832-1-horms@kernel.org/

v5 -> v6: Addressed Jakub's and Jiri's comments.
	https://lore.kernel.org/netdev/20260317045623.250187-1-rkannoth@marvell.com/

v4 -> v5: Addressed Jakub's comments.
	https://lore.kernel.org/netdev/20260312022754.2029595-6-rkannoth@marvell.com/

v3 -> v4: Addressed Simon's comments.
	https://lore.kernel.org/netdev/abDeXLpMMxp7G1v3@rkannoth-OptiPlex-7090/#t

v2 -> v3: Addressed Simon's comments.
	https://lore.kernel.org/netdev/20260304043032.3661647-1-rkannoth@marvell.com/

v1 -> v2: Addressed Jakub's comments.
	https://lore.kernel.org/netdev/20260302085803.2449828-1-rkannoth@marvell.com/#t

2.43.0


* [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-08  2:17   ` Ratheesh Kannoth
                     ` (2 more replies)
  2026-06-05  6:32 ` [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
                   ` (7 subsequent siblings)
  8 siblings, 3 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

There is only one admin-function (AF) PCI device per system.
Reject any additional AF probe with -EBUSY so that the driver model
matches the hardware and automated reviewers can rely on a single
bound instance.
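
The rvu.c change sets device_bound but adds nothing that clears it. Unless
another path resets the flag, an unbind/rebind, or a retried probe after
rvu_probe() fails past the check, would then be rejected with -EBUSY. A
minimal user-space sketch of a claim/release pairing, with C11 atomics
standing in for the kernel's atomic_t; probe_claim(), probe_release() and
model_probe() are hypothetical names, not driver functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for "static atomic_t device_bound = ATOMIC_INIT(0);". */
static atomic_int device_bound = 0;

/* Claim the single AF slot: 0 on success, -EBUSY if already bound.
 * Mirrors atomic_cmpxchg(&device_bound, 0, 1) != 0. */
static int probe_claim(void)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&device_bound, &expected, 1))
		return -EBUSY;
	return 0;
}

/* Give the slot back; only valid after a successful claim. */
static void probe_release(void)
{
	int prev = atomic_exchange(&device_bound, 0);

	assert(prev == 1);
	(void)prev;	/* avoid an unused warning when NDEBUG is set */
}

/* Hypothetical probe whose later setup may fail. */
static int model_probe(bool setup_fails)
{
	int err = probe_claim();

	if (err)
		return err;
	if (setup_fails) {
		probe_release();	/* undo the claim on the error path */
		return -ENOMEM;
	}
	return 0;
}
```

In the driver, the equivalent release would presumably go in rvu_remove()
and on the rvu_probe() error paths that follow the claim.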

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
index 3cf131508ecf..1f0c962e10f4 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
@@ -3542,12 +3542,19 @@ static void rvu_update_module_params(struct rvu *rvu)
 		kpu_profile ? kpu_profile : default_pfl_name, KPU_NAME_LEN);
 }
 
+static atomic_t device_bound = ATOMIC_INIT(0);
+
 static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	struct device *dev = &pdev->dev;
 	struct rvu *rvu;
 	int    err;
 
+	if (atomic_cmpxchg(&device_bound, 0, 1) != 0) {
+		dev_warn(dev, "Only one af device is supported.\n");
+		return -EBUSY;
+	}
+
 	rvu = devm_kzalloc(dev, sizeof(*rvu), GFP_KERNEL);
 	if (!rvu)
 		return -ENOMEM;
-- 
2.43.0



* [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-08  2:20   ` Ratheesh Kannoth
  2026-06-08  2:26   ` Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 3/9] devlink: heap-allocate param fill buffers in devlink_nl_param_fill Ratheesh Kannoth
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

Improve MCAM visibility and field debugging for CN20K NPC.

- Extend "mcam_layout" to mark each entry as enabled (+) or disabled
  (blank), so status can be verified without parsing the full
  "mcam_entry" dump.
- Add a "dstats" debugfs entry: for enabled MCAM indices, print hit
  deltas since the previous read by comparing the hardware counters with
  a per-entry software baseline and advancing that baseline after each
  read (hardware counters are not cleared). Only entries whose counter
  changed since the previous read are printed.
- Add a "mismatch" debugfs entry: list MCAM entries that are enabled
  but have no PF mapping (that is, were not explicitly allocated), to
  help diagnose allocation issues in the field.
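
A user-space sketch of the enable bitmap behind the '+' marker and the
"mismatch" rule, assuming a hypothetical 64-entry MCAM; mapped[] stands
in for the xa_idx2pf_map lookup, and all names are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_ENTRIES 64	/* hypothetical MCAM size for illustration */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NUM_WORDS ((NUM_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Bit set => entry enabled, like npc_priv->en_map. */
static unsigned long en_map[NUM_WORDS];

/* Model of the set_bit()/clear_bit() done after enabling or
 * disabling an entry in hardware. */
static void set_entry_enabled(size_t idx, bool enable)
{
	unsigned long mask = 1UL << (idx % BITS_PER_WORD);

	assert(idx < NUM_ENTRIES);
	if (enable)
		en_map[idx / BITS_PER_WORD] |= mask;
	else
		en_map[idx / BITS_PER_WORD] &= ~mask;
}

static bool entry_enabled(size_t idx)
{
	return en_map[idx / BITS_PER_WORD] & (1UL << (idx % BITS_PER_WORD));
}

/* "mcam_layout" marker: '+' if enabled, ' ' otherwise. */
static char layout_marker(size_t idx)
{
	return entry_enabled(idx) ? '+' : ' ';
}

/* "mismatch" rule: enabled but with no owner mapping.
 * mapped[i] stands in for xa_load(&xa_idx2pf_map, i) != NULL. */
static size_t count_mismatches(const bool *mapped)
{
	size_t n = 0;

	for (size_t i = 0; i < NUM_ENTRIES; i++) {
		if (entry_enabled(i) && !mapped[i])
			n++;
	}
	return n;
}
```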

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 .../marvell/octeontx2/af/cn20k/debugfs.c      | 158 +++++++++++++++++-
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c |  37 +++-
 .../ethernet/marvell/octeontx2/af/cn20k/npc.h |  11 ++
 3 files changed, 191 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
index 6f13296303cb..730ef97a57e6 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
@@ -13,6 +13,7 @@
 #include "struct.h"
 #include "rvu.h"
 #include "debugfs.h"
+#include "cn20k/reg.h"
 #include "cn20k/npc.h"
 
 static int npc_mcam_layout_show(struct seq_file *s, void *unused)
@@ -58,7 +59,8 @@ static int npc_mcam_layout_show(struct seq_file *s, void *unused)
 						 "v:%u", vidx0);
 				}
 
-				seq_printf(s, "\t%u(%#x) %s\n", idx0, pf1,
+				seq_printf(s, "\t%u(%#x)%c %s\n", idx0, pf1,
+					   test_bit(idx0, npc_priv->en_map) ? '+' : ' ',
 					   map ? buf0 : " ");
 			}
 			goto next;
@@ -101,9 +103,13 @@ static int npc_mcam_layout_show(struct seq_file *s, void *unused)
 						 vidx1);
 				}
 
-				seq_printf(s, "%05u(%#x) %s\t\t%05u(%#x) %s\n",
-					   idx1, pf2, v1 ? buf1 : "       ",
-					   idx0, pf1, v0 ? buf0 : "       ");
+				seq_printf(s, "%05u(%#x)%c %s\t\t%05u(%#x)%c %s\n",
+					   idx1, pf2,
+					   test_bit(idx1, npc_priv->en_map) ? '+' : ' ',
+					   v1 ? buf1 : "       ",
+					   idx0, pf1,
+					   test_bit(idx0, npc_priv->en_map) ? '+' : ' ',
+					   v0 ? buf0 : "       ");
 
 				continue;
 			}
@@ -120,8 +126,9 @@ static int npc_mcam_layout_show(struct seq_file *s, void *unused)
 						 vidx0);
 				}
 
-				seq_printf(s, "\t\t   \t\t%05u(%#x) %s\n", idx0,
-					   pf1, map ? buf0 : " ");
+				seq_printf(s, "\t\t   \t\t%05u(%#x)%c %s\n", idx0, pf1,
+					   test_bit(idx0, npc_priv->en_map) ? '+' : ' ',
+					   map ? buf0 : " ");
 				continue;
 			}
 
@@ -134,7 +141,8 @@ static int npc_mcam_layout_show(struct seq_file *s, void *unused)
 				snprintf(buf1, sizeof(buf1), "v:%05u", vidx1);
 			}
 
-			seq_printf(s, "%05u(%#x) %s\n", idx1, pf1,
+			seq_printf(s, "%05u(%#x)%c %s\n", idx1, pf1,
+				   test_bit(idx1, npc_priv->en_map) ? '+' : ' ',
 				   map ? buf1 : " ");
 		}
 next:
@@ -145,6 +153,136 @@ static int npc_mcam_layout_show(struct seq_file *s, void *unused)
 
 DEFINE_SHOW_ATTRIBUTE(npc_mcam_layout);
 
+#define __OCTEONTX2_DEBUGFS_ATTRIBUTE_FOPS(__name)			\
+static const struct file_operations __name ## _fops = {			\
+	.owner = THIS_MODULE,						\
+	.open = __name ## _open,					\
+	.read = seq_read,						\
+	.llseek = seq_lseek,						\
+	.release = single_release,					\
+}
+
+#define DEFINE_OCTEONTX2_DEBUGFS_ATTRIBUTE_WITH_SIZE(__name, __size)		\
+static int __name ## _open(struct inode *inode, struct file *file)		\
+{										\
+	return single_open_size(file, __name ## _show, inode->i_private,	\
+				__size);					\
+}										\
+__OCTEONTX2_DEBUGFS_ATTRIBUTE_FOPS(__name)
+
+static DEFINE_MUTEX(stats_lock);
+
+/* MAX_NUM_BANKS, MAX_SUBBANK_DEPTH and MAX_NUM_SUB_BANKS are hard limits
+ * across all silicon variants, which rules out out-of-bounds access to
+ * this matrix.
+ */
+static u64 dstats[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS] = {};
+static int npc_mcam_dstats_show(struct seq_file *s, void *unused)
+{
+	struct npc_priv_t *npc_priv;
+	int blkaddr, pf, mcam_idx;
+	u64 stats, delta;
+	struct rvu *rvu;
+	char buff[64];
+	u8 key_type;
+	void *map;
+
+	npc_priv = npc_priv_get();
+	rvu = s->private;
+	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
+	if (blkaddr < 0)
+		return 0;
+
+	mutex_lock(&stats_lock);
+	seq_puts(s, "idx\tpfunc\tstats\n");
+	for (int bank = npc_priv->num_banks - 1; bank >= 0; bank--) {
+		for (int idx = npc_priv->bank_depth - 1; idx >= 0; idx--) {
+			mcam_idx = bank * npc_priv->bank_depth + idx;
+
+			if (npc_mcam_idx_2_key_type(rvu, mcam_idx, &key_type))
+				continue;
+
+			if (key_type == NPC_MCAM_KEY_X4 && bank != 0)
+				continue;
+
+			if (!test_bit(mcam_idx, npc_priv->en_map))
+				continue;
+
+			stats = rvu_read64(rvu, blkaddr,
+					   NPC_AF_CN20K_MCAMEX_BANKX_STAT_EXT(idx, bank));
+			if (!stats)
+				continue;
+			if (stats == dstats[bank][idx])
+				continue;
+
+			if (stats < dstats[bank][idx])
+				dstats[bank][idx] = 0;
+
+			pf = 0xFFFF;
+			map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
+			if (map)
+				pf = xa_to_value(map);
+
+			delta = stats - dstats[bank][idx];
+
+			snprintf(buff, sizeof(buff), "%u\t%#04x\t%llu\n",
+				 mcam_idx, pf, delta);
+			seq_puts(s, buff);
+
+			dstats[bank][idx] = stats;
+		}
+	}
+
+	mutex_unlock(&stats_lock);
+	return 0;
+}
+
+/* "%u\t%#04x\t%llu\n" needs fewer than 64 characters to print. */
+#define TOTAL_SZ (MAX_NUM_BANKS * MAX_NUM_SUB_BANKS * MAX_SUBBANK_DEPTH * 64)
+DEFINE_OCTEONTX2_DEBUGFS_ATTRIBUTE_WITH_SIZE(npc_mcam_dstats, TOTAL_SZ);
+
+static int npc_mcam_mismatch_show(struct seq_file *s, void *unused)
+{
+	struct npc_priv_t *npc_priv;
+	struct npc_subbank *sb;
+	int mcam_idx, sb_off;
+	struct rvu *rvu;
+	char buff[64];
+	void *map;
+	int rc;
+
+	npc_priv = npc_priv_get();
+	rvu = s->private;
+
+	seq_puts(s, "index\tsb idx\tkw type\n");
+	for (int bank = npc_priv->num_banks - 1; bank >= 0; bank--) {
+		for (int idx = npc_priv->bank_depth - 1; idx >= 0; idx--) {
+			mcam_idx = bank * npc_priv->bank_depth + idx;
+
+			if (!test_bit(mcam_idx, npc_priv->en_map))
+				continue;
+
+			map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
+			if (map)
+				continue;
+
+			rc = npc_mcam_idx_2_subbank_idx(rvu, mcam_idx,
+							&sb, &sb_off);
+			if (rc)
+				continue;
+
+			snprintf(buff, sizeof(buff), "%u\t%d\t%u\n",
+				 mcam_idx, sb->idx, sb->key_type);
+
+			seq_puts(s, buff);
+		}
+	}
+	return 0;
+}
+
+/* "%u\t%d\t%u\n" needs fewer than 64 characters to print. */
+DEFINE_OCTEONTX2_DEBUGFS_ATTRIBUTE_WITH_SIZE(npc_mcam_mismatch, TOTAL_SZ);
+
 static int npc_mcam_default_show(struct seq_file *s, void *unused)
 {
 	struct npc_priv_t *npc_priv;
@@ -259,6 +397,12 @@ int npc_cn20k_debugfs_init(struct rvu *rvu)
 	debugfs_create_file("vidx2idx", 0444, rvu->rvu_dbg.npc,
 			    npc_priv, &npc_vidx2idx_map_fops);
 
+	debugfs_create_file("dstats", 0444, rvu->rvu_dbg.npc, rvu,
+			    &npc_mcam_dstats_fops);
+
+	debugfs_create_file("mismatch", 0444, rvu->rvu_dbg.npc, rvu,
+			    &npc_mcam_mismatch_fops);
+
 	debugfs_create_file("idx2vidx", 0444, rvu->rvu_dbg.npc,
 			    npc_priv, &npc_idx2vidx_map_fops);
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index 003487d7c3cf..31eaaceb8766 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -824,7 +824,7 @@ npc_cn20k_enable_mcam_entry(struct rvu *rvu, int blkaddr,
 		rvu_write64(rvu, blkaddr,
 			    NPC_AF_CN20K_MCAMEX_BANKX_CFG_EXT(mcam_idx, bank),
 			    cfg);
-		return 0;
+		goto update_en_map;
 	}
 
 	/* For NPC_CN20K_MCAM_KEY_X4 keys, both the banks
@@ -842,6 +842,12 @@ npc_cn20k_enable_mcam_entry(struct rvu *rvu, int blkaddr,
 			    cfg);
 	}
 
+update_en_map:
+	if (enable)
+		set_bit(index, npc_priv.en_map);
+	else
+		clear_bit(index, npc_priv.en_map);
+
 	return 0;
 }
 
@@ -1789,9 +1795,9 @@ static int npc_subbank_idx_2_mcam_idx(struct rvu *rvu, struct npc_subbank *sb,
 	return 0;
 }
 
-static int npc_mcam_idx_2_subbank_idx(struct rvu *rvu, u16 mcam_idx,
-				      struct npc_subbank **sb,
-				      int *sb_off)
+int npc_mcam_idx_2_subbank_idx(struct rvu *rvu, u16 mcam_idx,
+			       struct npc_subbank **sb,
+			       int *sb_off)
 {
 	int bank_off, sb_id;
 
@@ -4498,11 +4504,19 @@ static int npc_priv_init(struct rvu *rvu)
 		npc_const2 = rvu_read64(rvu, blkaddr, NPC_AF_CONST2);
 
 	num_banks = mcam->banks;
+	if (num_banks > MAX_NUM_BANKS) {
+		dev_err(rvu->dev,
+			"Number of banks(%u) is invalid\n", num_banks);
+		return -EINVAL;
+	}
+
 	bank_depth = mcam->banksize;
 
 	num_subbanks = FIELD_GET(GENMASK_ULL(39, 32), npc_const2);
-	if (!num_subbanks) {
-		dev_err(rvu->dev, "Number of subbanks is zero\n");
+	if (!num_subbanks || num_subbanks > MAX_NUM_SUB_BANKS) {
+		dev_err(rvu->dev,
+			"Number of subbanks is invalid %u\n",
+			num_subbanks);
 		return -EFAULT;
 	}
 
@@ -4513,10 +4527,15 @@ static int npc_priv_init(struct rvu *rvu)
 		return -EINVAL;
 	}
 
-	npc_priv.num_subbanks = num_subbanks;
-
 	subbank_depth =	bank_depth / num_subbanks;
+	if (subbank_depth > MAX_SUBBANK_DEPTH) {
+		dev_err(rvu->dev,
+			"Invalid subbank depth %u\n",
+			subbank_depth);
+		return -EINVAL;
+	}
 
+	npc_priv.num_subbanks = num_subbanks;
 	npc_priv.bank_depth = bank_depth;
 	npc_priv.subbank_depth = subbank_depth;
 
@@ -4605,6 +4624,8 @@ void npc_cn20k_deinit(struct rvu *rvu)
 	 */
 	kfree(npc_priv.sb);
 	kfree(subbank_srch_order);
+	bitmap_clear(npc_priv.en_map, 0, MAX_NUM_BANKS * MAX_NUM_SUB_BANKS *
+		     MAX_SUBBANK_DEPTH);
 }
 
 static int npc_setup_mcam_section(struct rvu *rvu, int key_type)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
index 3d5eb952cc07..3e851950be64 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
@@ -10,6 +10,10 @@
 
 #define MKEX_CN20K_SIGN	0x19bbfdbd160
 
+/* MAX_NUM_BANKS, MAX_SUBBANK_DEPTH and MAX_NUM_SUB_BANKS are hard limits
+ * across all silicon variants, which rules out out-of-bounds access to
+ * arrays dimensioned with these values.
+ */
 #define MAX_NUM_BANKS 2
 #define MAX_NUM_SUB_BANKS 32
 #define MAX_SUBBANK_DEPTH 256
@@ -170,6 +174,7 @@ struct npc_defrag_show_node {
  * @num_banks:		Number of banks.
  * @num_subbanks:	Number of subbanks.
  * @subbank_depth:	Depth of subbank.
+ * @en_map:		Enable/disable status.
  * @kw:			Kex configured key type.
  * @sb:			Subbank array.
  * @xa_sb_used:		Array of used subbanks.
@@ -193,6 +198,9 @@ struct npc_priv_t {
 	const int num_banks;
 	int num_subbanks;
 	int subbank_depth;
+	DECLARE_BITMAP(en_map, MAX_NUM_BANKS *
+		       MAX_NUM_SUB_BANKS *
+		       MAX_SUBBANK_DEPTH);
 	u8 kw;
 	struct npc_subbank *sb;
 	struct xarray xa_sb_used;
@@ -336,5 +344,8 @@ u16 npc_cn20k_vidx2idx(u16 index);
 u16 npc_cn20k_idx2vidx(u16 idx);
 int npc_cn20k_defrag(struct rvu *rvu);
 bool npc_is_cgx_or_lbk(struct rvu *rvu, u16 pcifunc);
+int npc_mcam_idx_2_subbank_idx(struct rvu *rvu, u16 mcam_idx,
+			       struct npc_subbank **sb,
+			       int *sb_off);
 
 #endif /* NPC_CN20K_H */
-- 
2.43.0



* [PATCH v19 net-next 3/9] devlink: heap-allocate param fill buffers in devlink_nl_param_fill
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 4/9] devlink: Implement devlink param multi attribute nested data values Ratheesh Kannoth
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

devlink_nl_param_fill() kept two arrays of union devlink_param_value
(one entry per configuration mode) plus a struct devlink_param_gset_ctx
on the stack while building the Netlink reply. Once the next patch adds
a 32-entry u64 array to the union, these temporaries would take well
over a kilobyte of stack. Allocate them with kcalloc() and kzalloc_obj()
instead, and route failures after allocation through a single cleanup
path so the temporary buffers are always freed.
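
The patch frees earlier buffers by hand on the allocation-failure returns
and uses the shared label only afterwards. An alternative shape
initializes every pointer to NULL and sends all failures, allocation
included, through one label, since kfree(NULL), like free(NULL), is a
no-op. A user-space sketch under that assumption, with calloc()/free()
standing in for kcalloc()/kfree(); big_value, fill_reply() and
CMODE_COUNT are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CMODE_COUNT 3	/* hypothetical: one slot per configuration mode */

/* Hypothetical stand-in for a union that grew by a 32-entry u64 array. */
struct big_value {
	uint64_t size;
	uint64_t val[32];
};

/* Copies in[] to out[] through heap temporaries; fail_at simulates a
 * driver "get" error for one mode (-1 for none). */
static int fill_reply(const struct big_value *in, struct big_value *out,
		      int fail_at)
{
	struct big_value *vals = NULL, *ctx = NULL;
	int err = 0;

	assert(in && out);
	vals = calloc(CMODE_COUNT, sizeof(*vals));
	ctx = calloc(1, sizeof(*ctx));
	if (!vals || !ctx) {
		err = -ENOMEM;
		goto out;	/* free(NULL) is a no-op */
	}

	for (int i = 0; i < CMODE_COUNT; i++) {
		if (i == fail_at) {
			err = -EOPNOTSUPP;
			goto out;
		}
		*ctx = in[i];		/* "get" into the scratch context */
		vals[i] = *ctx;
	}
	memcpy(out, vals, CMODE_COUNT * sizeof(*vals));
out:
	free(ctx);
	free(vals);
	return err;
}
```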

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 net/devlink/param.c | 62 +++++++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/net/devlink/param.c b/net/devlink/param.c
index 1a196d3a843d..bd3881349c60 100644
--- a/net/devlink/param.c
+++ b/net/devlink/param.c
@@ -304,56 +304,79 @@ static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink,
 				 u32 portid, u32 seq, int flags,
 				 struct netlink_ext_ack *extack)
 {
-	union devlink_param_value default_value[DEVLINK_PARAM_CMODE_MAX + 1];
-	union devlink_param_value param_value[DEVLINK_PARAM_CMODE_MAX + 1];
 	bool default_value_set[DEVLINK_PARAM_CMODE_MAX + 1] = {};
 	bool param_value_set[DEVLINK_PARAM_CMODE_MAX + 1] = {};
 	const struct devlink_param *param = param_item->param;
-	struct devlink_param_gset_ctx ctx;
+	union devlink_param_value *default_value;
+	union devlink_param_value *param_value;
+	struct devlink_param_gset_ctx *ctx;
 	struct nlattr *param_values_list;
 	struct nlattr *param_attr;
 	void *hdr;
 	int err;
 	int i;
 
+	default_value = kcalloc(DEVLINK_PARAM_CMODE_MAX + 1,
+				sizeof(*default_value), GFP_KERNEL);
+	if (!default_value)
+		return -ENOMEM;
+
+	param_value = kcalloc(DEVLINK_PARAM_CMODE_MAX + 1,
+			      sizeof(*param_value), GFP_KERNEL);
+	if (!param_value) {
+		kfree(default_value);
+		return -ENOMEM;
+	}
+
+	ctx = kzalloc_obj(*ctx);
+	if (!ctx) {
+		kfree(param_value);
+		kfree(default_value);
+		return -ENOMEM;
+	}
+
 	/* Get value from driver part to driverinit configuration mode */
 	for (i = 0; i <= DEVLINK_PARAM_CMODE_MAX; i++) {
 		if (!devlink_param_cmode_is_supported(param, i))
 			continue;
 		if (i == DEVLINK_PARAM_CMODE_DRIVERINIT) {
-			if (param_item->driverinit_value_new_valid)
+			if (param_item->driverinit_value_new_valid) {
 				param_value[i] = param_item->driverinit_value_new;
-			else if (param_item->driverinit_value_valid)
+			} else if (param_item->driverinit_value_valid) {
 				param_value[i] = param_item->driverinit_value;
-			else
-				return -EOPNOTSUPP;
+			} else {
+				err = -EOPNOTSUPP;
+				goto get_put_fail;
+			}
 
 			if (param_item->driverinit_value_valid) {
 				default_value[i] = param_item->driverinit_default;
 				default_value_set[i] = true;
 			}
 		} else {
-			ctx.cmode = i;
-			err = devlink_param_get(devlink, param, &ctx, extack);
+			ctx->cmode = i;
+			err = devlink_param_get(devlink, param, ctx, extack);
 			if (err)
-				return err;
-			param_value[i] = ctx.val;
+				goto get_put_fail;
 
-			err = devlink_param_get_default(devlink, param, &ctx,
+			param_value[i] = ctx->val;
+
+			err = devlink_param_get_default(devlink, param, ctx,
 							extack);
 			if (!err) {
-				default_value[i] = ctx.val;
+				default_value[i] = ctx->val;
 				default_value_set[i] = true;
 			} else if (err != -EOPNOTSUPP) {
-				return err;
+				goto get_put_fail;
 			}
 		}
 		param_value_set[i] = true;
 	}
 
+	err = -EMSGSIZE;
 	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
 	if (!hdr)
-		return -EMSGSIZE;
+		goto get_put_fail;
 
 	if (devlink_nl_put_handle(msg, devlink))
 		goto genlmsg_cancel;
@@ -393,6 +416,9 @@ static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink,
 	nla_nest_end(msg, param_values_list);
 	nla_nest_end(msg, param_attr);
 	genlmsg_end(msg, hdr);
+	kfree(default_value);
+	kfree(param_value);
+	kfree(ctx);
 	return 0;
 
 values_list_nest_cancel:
@@ -401,7 +427,11 @@ static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink,
 	nla_nest_cancel(msg, param_attr);
 genlmsg_cancel:
 	genlmsg_cancel(msg, hdr);
-	return -EMSGSIZE;
+get_put_fail:
+	kfree(default_value);
+	kfree(param_value);
+	kfree(ctx);
+	return err;
 }
 
 static void devlink_param_notify(struct devlink *devlink,
-- 
2.43.0



* [PATCH v19 net-next 4/9] devlink: Implement devlink param multi attribute nested data values
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
                   ` (2 preceding siblings ...)
  2026-06-05  6:32 ` [PATCH v19 net-next 3/9] devlink: heap-allocate param fill buffers in devlink_nl_param_fill Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Saeed Mahameed, Ratheesh Kannoth

From: Saeed Mahameed <saeedm@nvidia.com>

The devlink param value attribute type is not defined in the policy,
because devlink validates and parses the value internally. This allows
multi-attribute values to be implemented without breaking any policies.

Multi-attribute devlink param values are treated as dynamically sized
arrays of u64 values. With the new param type
DEVLINK_PARAM_TYPE_U64_ARRAY, drivers and user space can exchange a
variable number of u64 values (up to __DEVLINK_PARAM_MAX_ARRAY_SIZE,
i.e. 32) as multiple DEVLINK_ATTR_PARAM_VALUE_DATA attributes.

Implement get/set parsing and add the array to the value union passed
to drivers.

This is useful for devices that need to configure a list of values for
a specific configuration.

Example:
$ devlink dev param show pci/... name multi-value-param
name multi-value-param type driver-specific
values:
cmode permanent value: 0,1,2,3,4,5,6,7

$ devlink dev param set pci/... name multi-value-param \
		value 4,5,6,7,0,1,2,3 cmode permanent
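
A user-space sketch of turning a value list like the one above into a
bounded array with the same layout as struct devlink_param_u64_array.
parse_u64_list() is hypothetical and does not reflect how the devlink CLI
or the kernel actually parse values; it only illustrates the size bound:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ARRAY_SIZE 32	/* mirrors __DEVLINK_PARAM_MAX_ARRAY_SIZE */

/* Same layout as struct devlink_param_u64_array in this patch. */
struct u64_array {
	uint64_t size;
	uint64_t val[MAX_ARRAY_SIZE];
};

/* Parse "4,5,6,7" into arr. Rejects empty elements, signs, trailing
 * garbage, out-of-range numbers and more than MAX_ARRAY_SIZE values. */
static int parse_u64_list(const char *s, struct u64_array *arr)
{
	assert(s && arr);
	arr->size = 0;
	for (;;) {
		unsigned long long v;
		char *end;

		if (*s < '0' || *s > '9')	/* empty element or sign */
			return -EINVAL;
		errno = 0;
		v = strtoull(s, &end, 10);
		if (errno == ERANGE)
			return -EINVAL;
		if (arr->size == MAX_ARRAY_SIZE)	/* bound reached */
			return -EINVAL;
		arr->val[arr->size++] = v;
		if (*end == '\0')
			return 0;
		if (*end != ',')
			return -EINVAL;
		s = end + 1;
	}
}
```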

Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 Documentation/netlink/specs/devlink.yaml |  4 +++
 include/net/devlink.h                    |  8 ++++++
 include/uapi/linux/devlink.h             |  1 +
 net/devlink/netlink_gen.c                |  2 ++
 net/devlink/param.c                      | 33 +++++++++++++++++++++++-
 5 files changed, 47 insertions(+), 1 deletion(-)

diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
index 247b147d689f..52ad1e7805d1 100644
--- a/Documentation/netlink/specs/devlink.yaml
+++ b/Documentation/netlink/specs/devlink.yaml
@@ -234,6 +234,10 @@ definitions:
         value: 10
       -
         name: binary
+      -
+        name: u64-array
+        value: 129
+
   -
     name: rate-tc-index-max
     type: const
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 5f4083dc4345..dd546dbd57cf 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -433,6 +433,13 @@ enum devlink_param_type {
 	DEVLINK_PARAM_TYPE_U64 = DEVLINK_VAR_ATTR_TYPE_U64,
 	DEVLINK_PARAM_TYPE_STRING = DEVLINK_VAR_ATTR_TYPE_STRING,
 	DEVLINK_PARAM_TYPE_BOOL = DEVLINK_VAR_ATTR_TYPE_FLAG,
+	DEVLINK_PARAM_TYPE_U64_ARRAY = DEVLINK_VAR_ATTR_TYPE_U64_ARRAY,
+};
+
+#define __DEVLINK_PARAM_MAX_ARRAY_SIZE 32
+struct devlink_param_u64_array {
+	u64 size;
+	u64 val[__DEVLINK_PARAM_MAX_ARRAY_SIZE];
 };
 
 union devlink_param_value {
@@ -442,6 +449,7 @@ union devlink_param_value {
 	u64 vu64;
 	char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE];
 	bool vbool;
+	struct devlink_param_u64_array u64arr;
 };
 
 struct devlink_param_gset_ctx {
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 0b165eac7619..ca713bcc47b9 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -406,6 +406,7 @@ enum devlink_var_attr_type {
 	DEVLINK_VAR_ATTR_TYPE_BINARY,
 	__DEVLINK_VAR_ATTR_TYPE_CUSTOM_BASE = 0x80,
 	/* Any possible custom types, unrelated to NLA_* values go below */
+	DEVLINK_VAR_ATTR_TYPE_U64_ARRAY,
 };
 
 enum devlink_attr {
diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
index 81899786fd98..f52b0c2b19ed 100644
--- a/net/devlink/netlink_gen.c
+++ b/net/devlink/netlink_gen.c
@@ -37,6 +37,8 @@ devlink_attr_param_type_validate(const struct nlattr *attr,
 	case DEVLINK_VAR_ATTR_TYPE_NUL_STRING:
 		fallthrough;
 	case DEVLINK_VAR_ATTR_TYPE_BINARY:
+		fallthrough;
+	case DEVLINK_VAR_ATTR_TYPE_U64_ARRAY:
 		return 0;
 	}
 	NL_SET_ERR_MSG_ATTR(extack, attr, "invalid enum value");
diff --git a/net/devlink/param.c b/net/devlink/param.c
index bd3881349c60..3e9d2e5750c2 100644
--- a/net/devlink/param.c
+++ b/net/devlink/param.c
@@ -252,6 +252,15 @@ devlink_nl_param_value_put(struct sk_buff *msg, enum devlink_param_type type,
 				return -EMSGSIZE;
 		}
 		break;
+	case DEVLINK_PARAM_TYPE_U64_ARRAY:
+		if (val->u64arr.size > __DEVLINK_PARAM_MAX_ARRAY_SIZE)
+			return -EMSGSIZE;
+
+		for (int i = 0; i < val->u64arr.size; i++) {
+			if (nla_put_uint(msg, nla_type, val->u64arr.val[i]))
+				return -EMSGSIZE;
+		}
+		break;
 	}
 	return 0;
 }
@@ -537,7 +546,7 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
 				  union devlink_param_value *value)
 {
 	struct nlattr *param_data;
-	int len;
+	int len, cnt, rem;
 
 	param_data = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA];
 
@@ -577,6 +586,28 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
 			return -EINVAL;
 		value->vbool = nla_get_flag(param_data);
 		break;
+
+	case DEVLINK_PARAM_TYPE_U64_ARRAY:
+		cnt = 0;
+		nla_for_each_attr_type(param_data,
+				       DEVLINK_ATTR_PARAM_VALUE_DATA,
+				       genlmsg_data(info->genlhdr),
+				       genlmsg_len(info->genlhdr), rem) {
+			if (cnt >= __DEVLINK_PARAM_MAX_ARRAY_SIZE)
+				return -EMSGSIZE;
+
+			if ((nla_len(param_data) != sizeof(u64)) &&
+			    (nla_len(param_data) != sizeof(u32))) {
+				NL_SET_BAD_ATTR(info->extack, param_data);
+				return -EINVAL;
+			}
+
+			value->u64arr.val[cnt] = nla_get_uint(param_data);
+			cnt++;
+		}
+
+		value->u64arr.size = cnt;
+		break;
 	}
 	return 0;
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
                   ` (3 preceding siblings ...)
  2026-06-05  6:32 ` [PATCH v19 net-next 4/9] devlink: Implement devlink param multi attribute nested data values Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-08  2:22   ` Ratheesh Kannoth
  2026-06-08  2:28   ` Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle Ratheesh Kannoth
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

The CN20K NPC MCAM is split into 32 subbanks that are searched in a
predefined order during allocation; lower-numbered subbanks have
higher priority than higher-numbered ones.

Add a runtime devlink parameter, npc_srch_order, to control the order
in which subbanks are searched during MCAM allocation. The value must
list every subbank index exactly once; duplicate or out-of-range
indexes are rejected.
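
The following minimal sketch shows the two rules this parameter relies on.
First, a new order must be a permutation of all subbank indexes, which is
the same check rvu_af_dl_npc_srch_order_validate() performs with a bitmap.
Second, allocation visits subbanks by priority, so order[0] is tried first.
srch_order_valid(), pick_subbank() and the free_cnt[] array are
hypothetical. The driver itself tracks subbanks in xarrays keyed by
priority rather than walking an array like this.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SUBBANKS 32	/* CN20K NPC MCAM subbank count */

/* Return true if order[0..n-1] is a permutation of 0..n-1 */
static bool srch_order_valid(const uint64_t *order, unsigned int n)
{
	uint64_t seen = 0;

	if (n == 0 || n > NUM_SUBBANKS)
		return false;

	for (unsigned int i = 0; i < n; i++) {
		if (order[i] >= n)			/* out of range */
			return false;
		if (seen & (1ULL << order[i]))	/* duplicate */
			return false;
		seen |= 1ULL << order[i];
	}

	return true;
}

/*
 * Hypothetical allocation walk: visit subbanks in priority order
 * (order[0] first) and return the first one that still has free
 * entries, or -1 if every subbank is full.
 */
static int pick_subbank(const uint64_t *order, unsigned int n,
			const unsigned int *free_cnt)
{
	for (unsigned int prio = 0; prio < n; prio++) {
		unsigned int sb = (unsigned int)order[prio];

		if (free_cnt[sb])
			return (int)sb;
	}

	return -1;
}
```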

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c | 120 +++++++++++++++++-
 .../ethernet/marvell/octeontx2/af/cn20k/npc.h |   3 +
 .../marvell/octeontx2/af/rvu_devlink.c        |  92 ++++++++++++--
 3 files changed, 203 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index 31eaaceb8766..2705753c1878 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -3376,7 +3376,7 @@ rvu_mbox_handler_npc_cn20k_get_kex_cfg(struct rvu *rvu,
 	return 0;
 }
 
-static int *subbank_srch_order;
+static u32 *subbank_srch_order;
 
 static void npc_populate_restricted_idxs(int num_subbanks)
 {
@@ -3388,7 +3388,7 @@ static int npc_create_srch_order(int cnt)
 {
 	int val = 0;
 
-	subbank_srch_order = kcalloc(cnt, sizeof(int),
+	subbank_srch_order = kcalloc(cnt, sizeof(u32),
 				     GFP_KERNEL);
 	if (!subbank_srch_order)
 		return -ENOMEM;
@@ -3906,6 +3906,122 @@ static void npc_unlock_all_subbank(void)
 		mutex_unlock(&npc_priv.sb[i].lock);
 }
 
+int npc_cn20k_search_order_set(struct rvu *rvu,
+			       u64 narr[MAX_NUM_SUB_BANKS], int cnt)
+{
+	struct npc_mcam *mcam = &rvu->hw->mcam;
+	int rsrc[2][MAX_NUM_SUB_BANKS] = { };
+	u8 save[MAX_NUM_SUB_BANKS] = { };
+	struct npc_subbank *sb;
+	struct xarray *xa;
+	int prio, rc, err;
+	int sb_idx;
+	enum {
+		FREE = 0,
+		USED = 1,
+	};
+
+	if (cnt != npc_priv.num_subbanks) {
+		dev_err(rvu->dev, "Number of entries (%d) != %u\n",
+			cnt, npc_priv.num_subbanks);
+		return -EINVAL;
+	}
+
+	mutex_lock(&mcam->lock);
+	npc_lock_all_subbank();
+
+	for (sb_idx = 0; sb_idx < cnt; sb_idx++) {
+		sb = &npc_priv.sb[sb_idx];
+		save[sb->idx] = sb->arr_idx;
+	}
+
+	for (prio = 0; prio < cnt; prio++) {
+		sb_idx = narr[prio];
+		sb = &npc_priv.sb[sb_idx];
+
+		if (sb->flags & NPC_SUBBANK_FLAG_USED)
+			xa = &npc_priv.xa_sb_used;
+		else
+			xa = &npc_priv.xa_sb_free;
+
+		rc = xa_err(xa_store(xa, prio,
+				     xa_mk_value(sb_idx), GFP_KERNEL));
+		if (rc) {
+			dev_err(rvu->dev,
+				"Setting arr_idx=%d for sb=%d failed\n",
+				sb->arr_idx, sb_idx);
+			goto fail;
+		}
+
+		if (sb->flags & NPC_SUBBANK_FLAG_USED) {
+			rsrc[USED][sb->arr_idx] -= 1;
+			rsrc[USED][prio] += 1;
+		} else {
+			rsrc[FREE][sb->arr_idx] -= 1;
+			rsrc[FREE][prio] += 1;
+		}
+
+		sb->arr_idx = prio;
+	}
+
+	for (prio = 0; prio < cnt; prio++) {
+		if (rsrc[FREE][prio] == -1)
+			xa_erase(&npc_priv.xa_sb_free, prio);
+
+		if (rsrc[USED][prio] == -1)
+			xa_erase(&npc_priv.xa_sb_used, prio);
+	}
+
+	for (int i = 0; i < cnt; i++)
+		subbank_srch_order[i] = (u32)narr[i];
+
+	restrict_valid = false;
+
+	npc_unlock_all_subbank();
+	mutex_unlock(&mcam->lock);
+
+	return 0;
+
+fail:
+	for (prio = 0; prio < cnt; prio++) {
+		if (rsrc[FREE][prio] == 1)
+			xa_erase(&npc_priv.xa_sb_free, prio);
+
+		if (rsrc[USED][prio] == 1)
+			xa_erase(&npc_priv.xa_sb_used, prio);
+	}
+
+	for (sb_idx = 0; sb_idx < cnt; sb_idx++) {
+		sb = &npc_priv.sb[sb_idx];
+		sb->arr_idx = save[sb_idx];
+
+		if (sb->flags & NPC_SUBBANK_FLAG_USED)
+			xa = &npc_priv.xa_sb_used;
+		else
+			xa = &npc_priv.xa_sb_free;
+
+		/* Since the entry already exists, xa_store() replaces
+		 * the value without a kmalloc(), making failure highly unlikely.
+		 */
+		err = xa_err(xa_store(xa, sb->arr_idx,
+				      xa_mk_value(sb->idx), GFP_KERNEL));
+		WARN(!!err, "Failed to rollback sb=%u idx=%u\n",
+		     sb->idx, sb->arr_idx);
+	}
+
+	npc_unlock_all_subbank();
+	mutex_unlock(&mcam->lock);
+
+	return rc;
+}
+
+const u32 *npc_cn20k_search_order_get(bool *restricted_order, u32 *sz)
+{
+	*restricted_order = restrict_valid;
+	*sz = npc_priv.num_subbanks;
+	return subbank_srch_order;
+}
+
 /* Only non-ref non-contigous mcam indexes
  * are picked for defrag process
  */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
index 3e851950be64..8bf857317e49 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
@@ -347,5 +347,8 @@ bool npc_is_cgx_or_lbk(struct rvu *rvu, u16 pcifunc);
 int npc_mcam_idx_2_subbank_idx(struct rvu *rvu, u16 mcam_idx,
 			       struct npc_subbank **sb,
 			       int *sb_off);
+const u32 *npc_cn20k_search_order_get(bool *restricted_order, u32 *sz);
+int npc_cn20k_search_order_set(struct rvu *rvu, u64 narr[MAX_NUM_SUB_BANKS],
+			       int cnt);
 
 #endif /* NPC_CN20K_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
index a42404e6db7c..aa3ecab5ebd8 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_devlink.c
@@ -1258,6 +1258,7 @@ enum rvu_af_dl_param_id {
 	RVU_AF_DEVLINK_PARAM_ID_NPC_EXACT_FEATURE_DISABLE,
 	RVU_AF_DEVLINK_PARAM_ID_NPC_DEF_RULE_CNTR_ENABLE,
 	RVU_AF_DEVLINK_PARAM_ID_NPC_DEFRAG,
+	RVU_AF_DEVLINK_PARAM_ID_NPC_SRCH_ORDER,
 	RVU_AF_DEVLINK_PARAM_ID_NIX_MAXLF,
 };
 
@@ -1619,12 +1620,83 @@ static int rvu_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
 	return 0;
 }
 
+static int rvu_af_dl_npc_srch_order_set(struct devlink *devlink, u32 id,
+					struct devlink_param_gset_ctx *ctx,
+					struct netlink_ext_ack *extack)
+{
+	struct rvu_devlink *rvu_dl = devlink_priv(devlink);
+	struct rvu *rvu = rvu_dl->rvu;
+
+	return npc_cn20k_search_order_set(rvu,
+					  ctx->val.u64arr.val,
+					  ctx->val.u64arr.size);
+}
+
+static int rvu_af_dl_npc_srch_order_get(struct devlink *devlink, u32 id,
+					struct devlink_param_gset_ctx *ctx,
+					struct netlink_ext_ack *extack)
+{
+	bool restricted_order;
+	const u32 *order;
+	u32 sz;
+
+	order = npc_cn20k_search_order_get(&restricted_order, &sz);
+	ctx->val.u64arr.size = sz;
+	for (int i = 0; i < sz; i++)
+		ctx->val.u64arr.val[i] = order[i];
+
+	return 0;
+}
+
+static int rvu_af_dl_npc_srch_order_validate(struct devlink *devlink, u32 id,
+					     union devlink_param_value *val,
+					     struct netlink_ext_ack *extack)
+{
+	struct rvu_devlink *rvu_dl = devlink_priv(devlink);
+	struct rvu *rvu = rvu_dl->rvu;
+	bool restricted_order;
+	unsigned long w = 0;
+	u64 *arr;
+	u32 sz;
+
+	npc_cn20k_search_order_get(&restricted_order, &sz);
+	if (sz != val->u64arr.size) {
+		dev_err(rvu->dev,
+			"Wrong size %llu, should be %u\n",
+			val->u64arr.size, sz);
+		return -EINVAL;
+	}
+
+	arr = val->u64arr.val;
+	for (int i = 0; i < sz; i++) {
+		if (arr[i] >= sz)
+			return -EINVAL;
+
+		w |= BIT_ULL(arr[i]);
+	}
+
+	if (bitmap_weight(&w, sz) != sz) {
+		dev_err(rvu->dev,
+			"Duplicate or out-of-range subbank index. %lu\n",
+			find_first_zero_bit(&w, sz));
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static const struct devlink_ops rvu_devlink_ops = {
 	.eswitch_mode_get = rvu_devlink_eswitch_mode_get,
 	.eswitch_mode_set = rvu_devlink_eswitch_mode_set,
 };
 
-static const struct devlink_param rvu_af_dl_param_defrag[] = {
+static const struct devlink_param rvu_af_dl_cn20k_params[] = {
+	DEVLINK_PARAM_DRIVER(RVU_AF_DEVLINK_PARAM_ID_NPC_SRCH_ORDER,
+			     "npc_srch_order", DEVLINK_PARAM_TYPE_U64_ARRAY,
+			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+			     rvu_af_dl_npc_srch_order_get,
+			     rvu_af_dl_npc_srch_order_set,
+			     rvu_af_dl_npc_srch_order_validate),
 	DEVLINK_PARAM_DRIVER(RVU_AF_DEVLINK_PARAM_ID_NPC_DEFRAG,
 			     "npc_defrag", DEVLINK_PARAM_TYPE_STRING,
 			     BIT(DEVLINK_PARAM_CMODE_RUNTIME),
@@ -1666,13 +1738,13 @@ int rvu_register_dl(struct rvu *rvu)
 	}
 
 	if (is_cn20k(rvu->pdev)) {
-		err = devlink_params_register(dl, rvu_af_dl_param_defrag,
-					      ARRAY_SIZE(rvu_af_dl_param_defrag));
+		err = devlink_params_register(dl, rvu_af_dl_cn20k_params,
+					      ARRAY_SIZE(rvu_af_dl_cn20k_params));
 		if (err) {
 			dev_err(rvu->dev,
-				"devlink defrag params register failed with error %d",
+				"devlink cn20k params register failed with error %d",
 				err);
-			goto err_dl_defrag;
+			goto err_dl_cn20k_params;
 		}
 	}
 
@@ -1695,10 +1767,10 @@ int rvu_register_dl(struct rvu *rvu)
 
 err_dl_exact_match:
 	if (is_cn20k(rvu->pdev))
-		devlink_params_unregister(dl, rvu_af_dl_param_defrag,
-					  ARRAY_SIZE(rvu_af_dl_param_defrag));
+		devlink_params_unregister(dl, rvu_af_dl_cn20k_params,
+					  ARRAY_SIZE(rvu_af_dl_cn20k_params));
 
-err_dl_defrag:
+err_dl_cn20k_params:
 	devlink_params_unregister(dl, rvu_af_dl_params, ARRAY_SIZE(rvu_af_dl_params));
 
 err_dl_health:
@@ -1717,8 +1789,8 @@ void rvu_unregister_dl(struct rvu *rvu)
 	devlink_params_unregister(dl, rvu_af_dl_params, ARRAY_SIZE(rvu_af_dl_params));
 
 	if (is_cn20k(rvu->pdev))
-		devlink_params_unregister(dl, rvu_af_dl_param_defrag,
-					  ARRAY_SIZE(rvu_af_dl_param_defrag));
+		devlink_params_unregister(dl, rvu_af_dl_cn20k_params,
+					  ARRAY_SIZE(rvu_af_dl_cn20k_params));
 
 	/* Unregister exact match devlink only for CN10K-B */
 	if (rvu_npc_exact_has_match_table(rvu))
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v19 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
                   ` (4 preceding siblings ...)
  2026-06-05  6:32 ` [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-08  2:29   ` Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

Add NIX_LF_DONT_FREE_DFT_IDXS so that the PF can send a NIX LF free
request during hardware reinit or teardown (otx2_init_hw_resources()
and otx2_free_hw_resources()) without the AF freeing the CN20K default
NPC rule indexes that the driver still owns.
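
Below is a minimal sketch of the AF-side decision added to
rvu_mbox_handler_nix_lf_free(). On CN20K, the default rules are released
unless the PF sets NIX_LF_DONT_FREE_DFT_IDXS. The flag values are copied
from mbox.h. af_frees_dft_rules() is a hypothetical helper, and its
is_cn20k argument stands in for is_cn20k(rvu->pdev).

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr)	(1ULL << (nr))

/* Flag values from struct nix_lf_free_req in mbox.h */
#define NIX_LF_DISABLE_FLOWS		BIT_ULL(0)
#define NIX_LF_DONT_FREE_TX_VTAG	BIT_ULL(1)
#define NIX_LF_DONT_FREE_DFT_IDXS	BIT_ULL(2)

/*
 * Hypothetical helper mirroring the check in rvu_mbox_handler_nix_lf_free():
 * returns true when the AF should release the CN20K default NPC rules.
 */
static bool af_frees_dft_rules(bool is_cn20k, uint64_t flags)
{
	return is_cn20k && !(flags & NIX_LF_DONT_FREE_DFT_IDXS);
}
```

With this patch, both otx2_init_hw_resources() and otx2_free_hw_resources()
send NIX_LF_DISABLE_FLOWS | NIX_LF_DONT_FREE_DFT_IDXS. The check therefore
evaluates to false on both paths.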

On CN20K, allocate default NPC rules from NIX LF alloc, unless they
already exist, before nix_interface_init(). If that call fails, free
only the rules this request allocated, using npc_cn20k_dft_rules_free().
Free the rules from NIX LF free when the new flag is not set. Tighten
rvu_mbox_handler_nix_lf_alloc() error handling: use a single rc,
propagate qmem_alloc() and other errors, and set -ENOMEM only when
kcalloc() fails (dropping the blanket -ENOMEM on the free_mem path).
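
The rollback rule can be modelled on its own. Default rules that survived
an earlier NIX LF free (because the PF set NIX_LF_DONT_FREE_DFT_IDXS) are
reused, and on failure only the rules created by this call are freed. This
is a toy model, not the AF code. struct toy_pf and toy_lf_alloc() are
hypothetical, and the ctx_err and init_err arguments stand in for the
qmem_alloc() and nix_interface_init() results.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-PF/VF state tracked by the AF */
struct toy_pf {
	bool has_dft_rules;	/* default NPC rule indexes allocated */
	bool ctx_allocated;	/* NIX HW contexts allocated */
};

/*
 * Toy version of the nix_lf_alloc unwind: @ctx_err and @init_err stand
 * in for the qmem_alloc() and nix_interface_init() return codes and are
 * propagated unchanged instead of being overwritten with -ENOMEM.
 */
static int toy_lf_alloc(struct toy_pf *pf, int ctx_err, int init_err)
{
	bool rules_created = false;
	int rc;

	rc = ctx_err;
	if (rc)
		goto free_mem;
	pf->ctx_allocated = true;

	/* Reuse rules kept across a free with NIX_LF_DONT_FREE_DFT_IDXS */
	if (!pf->has_dft_rules) {
		pf->has_dft_rules = true;
		rules_created = true;
	}

	rc = init_err;
	if (rc)
		goto free_dft;

	return 0;

free_dft:
	/* Only undo what this call allocated */
	if (rules_created)
		pf->has_dft_rules = false;
free_mem:
	pf->ctx_allocated = false;
	return rc;
}
```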

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 .../net/ethernet/marvell/octeontx2/af/mbox.h  |  1 +
 .../ethernet/marvell/octeontx2/af/rvu_nix.c   | 77 +++++++++++++------
 .../ethernet/marvell/octeontx2/af/rvu_npc.c   | 20 +++--
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  6 +-
 4 files changed, 69 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
index dc42c81c0942..e07fbf842b94 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/mbox.h
@@ -1009,6 +1009,7 @@ struct nix_lf_free_req {
 	struct mbox_msghdr hdr;
 #define NIX_LF_DISABLE_FLOWS		BIT_ULL(0)
 #define NIX_LF_DONT_FREE_TX_VTAG	BIT_ULL(1)
+#define NIX_LF_DONT_FREE_DFT_IDXS	BIT_ULL(2)
 	u64 flags;
 };
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
index f977734ae712..d8989395e875 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_nix.c
@@ -16,6 +16,7 @@
 #include "cgx.h"
 #include "lmac_common.h"
 #include "rvu_npc_hash.h"
+#include "cn20k/npc.h"
 
 static void nix_free_tx_vtag_entries(struct rvu *rvu, u16 pcifunc);
 static int rvu_nix_get_bpid(struct rvu *rvu, struct nix_bp_cfg_req *req,
@@ -1499,9 +1500,11 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 				  struct nix_lf_alloc_req *req,
 				  struct nix_lf_alloc_rsp *rsp)
 {
-	int nixlf, qints, hwctx_size, intf, err, rc = 0;
+	int nixlf, qints, hwctx_size, intf, rc = 0;
+	u16 bcast, mcast, promisc, ucast;
 	struct rvu_hwinfo *hw = rvu->hw;
 	u16 pcifunc = req->hdr.pcifunc;
+	bool rules_created = false;
 	struct rvu_block *block;
 	struct rvu_pfvf *pfvf;
 	u64 cfg, ctx_cfg;
@@ -1555,8 +1558,8 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 		return NIX_AF_ERR_RSS_GRPS_INVALID;
 
 	/* Reset this NIX LF */
-	err = rvu_lf_reset(rvu, block, nixlf);
-	if (err) {
+	rc = rvu_lf_reset(rvu, block, nixlf);
+	if (rc) {
 		dev_err(rvu->dev, "Failed to reset NIX%d LF%d\n",
 			block->addr - BLKADDR_NIX0, nixlf);
 		return NIX_AF_ERR_LF_RESET;
@@ -1566,13 +1569,15 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 
 	/* Alloc NIX RQ HW context memory and config the base */
 	hwctx_size = 1UL << ((ctx_cfg >> 4) & 0xF);
-	err = qmem_alloc(rvu->dev, &pfvf->rq_ctx, req->rq_cnt, hwctx_size);
-	if (err)
+	rc = qmem_alloc(rvu->dev, &pfvf->rq_ctx, req->rq_cnt, hwctx_size);
+	if (rc)
 		goto free_mem;
 
 	pfvf->rq_bmap = kcalloc(req->rq_cnt, sizeof(long), GFP_KERNEL);
-	if (!pfvf->rq_bmap)
+	if (!pfvf->rq_bmap) {
+		rc = -ENOMEM;
 		goto free_mem;
+	}
 
 	rvu_write64(rvu, blkaddr, NIX_AF_LFX_RQS_BASE(nixlf),
 		    (u64)pfvf->rq_ctx->iova);
@@ -1583,13 +1588,15 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 
 	/* Alloc NIX SQ HW context memory and config the base */
 	hwctx_size = 1UL << (ctx_cfg & 0xF);
-	err = qmem_alloc(rvu->dev, &pfvf->sq_ctx, req->sq_cnt, hwctx_size);
-	if (err)
+	rc = qmem_alloc(rvu->dev, &pfvf->sq_ctx, req->sq_cnt, hwctx_size);
+	if (rc)
 		goto free_mem;
 
 	pfvf->sq_bmap = kcalloc(req->sq_cnt, sizeof(long), GFP_KERNEL);
-	if (!pfvf->sq_bmap)
+	if (!pfvf->sq_bmap) {
+		rc = -ENOMEM;
 		goto free_mem;
+	}
 
 	rvu_write64(rvu, blkaddr, NIX_AF_LFX_SQS_BASE(nixlf),
 		    (u64)pfvf->sq_ctx->iova);
@@ -1599,13 +1606,15 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 
 	/* Alloc NIX CQ HW context memory and config the base */
 	hwctx_size = 1UL << ((ctx_cfg >> 8) & 0xF);
-	err = qmem_alloc(rvu->dev, &pfvf->cq_ctx, req->cq_cnt, hwctx_size);
-	if (err)
+	rc = qmem_alloc(rvu->dev, &pfvf->cq_ctx, req->cq_cnt, hwctx_size);
+	if (rc)
 		goto free_mem;
 
 	pfvf->cq_bmap = kcalloc(req->cq_cnt, sizeof(long), GFP_KERNEL);
-	if (!pfvf->cq_bmap)
+	if (!pfvf->cq_bmap) {
+		rc = -ENOMEM;
 		goto free_mem;
+	}
 
 	rvu_write64(rvu, blkaddr, NIX_AF_LFX_CQS_BASE(nixlf),
 		    (u64)pfvf->cq_ctx->iova);
@@ -1615,18 +1624,18 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 
 	/* Initialize receive side scaling (RSS) */
 	hwctx_size = 1UL << ((ctx_cfg >> 12) & 0xF);
-	err = nixlf_rss_ctx_init(rvu, blkaddr, pfvf, nixlf, req->rss_sz,
-				 req->rss_grps, hwctx_size, req->way_mask,
-				 !!(req->flags & NIX_LF_RSS_TAG_LSB_AS_ADDER));
-	if (err)
+	rc = nixlf_rss_ctx_init(rvu, blkaddr, pfvf, nixlf, req->rss_sz,
+				req->rss_grps, hwctx_size, req->way_mask,
+				!!(req->flags & NIX_LF_RSS_TAG_LSB_AS_ADDER));
+	if (rc)
 		goto free_mem;
 
 	/* Alloc memory for CQINT's HW contexts */
 	cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
 	qints = (cfg >> 24) & 0xFFF;
 	hwctx_size = 1UL << ((ctx_cfg >> 24) & 0xF);
-	err = qmem_alloc(rvu->dev, &pfvf->cq_ints_ctx, qints, hwctx_size);
-	if (err)
+	rc = qmem_alloc(rvu->dev, &pfvf->cq_ints_ctx, qints, hwctx_size);
+	if (rc)
 		goto free_mem;
 
 	rvu_write64(rvu, blkaddr, NIX_AF_LFX_CINTS_BASE(nixlf),
@@ -1639,8 +1648,8 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 	cfg = rvu_read64(rvu, blkaddr, NIX_AF_CONST2);
 	qints = (cfg >> 12) & 0xFFF;
 	hwctx_size = 1UL << ((ctx_cfg >> 20) & 0xF);
-	err = qmem_alloc(rvu->dev, &pfvf->nix_qints_ctx, qints, hwctx_size);
-	if (err)
+	rc = qmem_alloc(rvu->dev, &pfvf->nix_qints_ctx, qints, hwctx_size);
+	if (rc)
 		goto free_mem;
 
 	rvu_write64(rvu, blkaddr, NIX_AF_LFX_QINTS_BASE(nixlf),
@@ -1684,10 +1693,22 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 	if (is_sdp_pfvf(rvu, pcifunc))
 		intf = NIX_INTF_TYPE_SDP;
 
-	err = nix_interface_init(rvu, pcifunc, intf, nixlf, rsp,
-				 !!(req->flags & NIX_LF_LBK_BLK_SEL));
-	if (err)
-		goto free_mem;
+	if (is_cn20k(rvu->pdev)) {
+		rc = npc_cn20k_dft_rules_idx_get(rvu, pcifunc, &bcast, &mcast,
+						 &promisc, &ucast);
+		if (rc) {
+			rc = npc_cn20k_dft_rules_alloc(rvu, pcifunc);
+			if (rc)
+				goto free_mem;
+
+			rules_created = true;
+		}
+	}
+
+	rc = nix_interface_init(rvu, pcifunc, intf, nixlf, rsp,
+				!!(req->flags & NIX_LF_LBK_BLK_SEL));
+	if (rc)
+		goto free_dft;
 
 	/* Disable NPC entries as NIXLF's contexts are not initialized yet */
 	rvu_npc_disable_default_entries(rvu, pcifunc, nixlf);
@@ -1699,9 +1720,12 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
 
 	goto exit;
 
+free_dft:
+	if (is_cn20k(rvu->pdev) && rules_created)
+		npc_cn20k_dft_rules_free(rvu, pcifunc);
+
 free_mem:
 	nix_ctx_free(rvu, pfvf);
-	rc = -ENOMEM;
 
 exit:
 	/* Set macaddr of this PF/VF */
@@ -1775,6 +1799,9 @@ int rvu_mbox_handler_nix_lf_free(struct rvu *rvu, struct nix_lf_free_req *req,
 
 	nix_ctx_free(rvu, pfvf);
 
+	if (is_cn20k(rvu->pdev) && !(req->flags & NIX_LF_DONT_FREE_DFT_IDXS))
+		npc_cn20k_dft_rules_free(rvu, pcifunc);
+
 	return 0;
 }
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index d301a3f0f87a..150d50b72c48 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -1285,11 +1285,18 @@ void npc_enadis_default_mce_entry(struct rvu *rvu, u16 pcifunc,
 	struct nix_mce_list *mce_list;
 	int index, blkaddr, mce_idx;
 	struct rvu_pfvf *pfvf;
+	u16 ptr[4];
 
 	/* multicast pkt replication is not enabled for AF's VFs & SDP links */
 	if (is_lbk_vf(rvu, pcifunc) || is_sdp_pfvf(rvu, pcifunc))
 		return;
 
+	/* On CN20K, only CGX-mapped devices have a default MCAST entry */
+	if (is_cn20k(rvu->pdev) &&
+	    npc_cn20k_dft_rules_idx_get(rvu, pcifunc, &ptr[0], &ptr[1],
+					&ptr[2], &ptr[3]))
+		return;
+
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
 	if (blkaddr < 0)
 		return;
@@ -1329,9 +1336,12 @@ static void npc_enadis_default_entries(struct rvu *rvu, u16 pcifunc,
 	struct rvu_pfvf *pfvf = rvu_get_pfvf(rvu, pcifunc);
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 	int index, blkaddr;
+	u16 ptr[4];
 
 	/* only CGX or LBK interfaces have default entries */
-	if (is_cn20k(rvu->pdev) && !npc_is_cgx_or_lbk(rvu, pcifunc))
+	if (is_cn20k(rvu->pdev) &&
+	    npc_cn20k_dft_rules_idx_get(rvu, pcifunc, &ptr[0], &ptr[1],
+					&ptr[2], &ptr[3]))
 		return;
 
 	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
@@ -4085,12 +4095,10 @@ void rvu_npc_clear_ucast_entry(struct rvu *rvu, int pcifunc, int nixlf)
 
 	ucast_idx = npc_get_nixlf_mcam_index(mcam, pcifunc,
 					     nixlf, NIXLF_UCAST_ENTRY);
-	if (ucast_idx < 0) {
-		dev_err(rvu->dev,
-			"%s: Error to get ucast entry for pcifunc=%#x\n",
-			__func__, pcifunc);
+
+	/* On CN20K, default rules are freed before resources are detached */
+	if (ucast_idx < 0)
 		return;
-	}
 
 	npc_enable_mcam_entry(rvu, mcam, blkaddr, ucast_idx, false);
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index f9fbf0c17648..b4538edb13f8 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -1053,7 +1053,6 @@ irqreturn_t otx2_pfaf_mbox_intr_handler(int irq, void *pf_irq)
 	/* Clear the IRQ */
 	otx2_write64(pf, RVU_PF_INT, BIT_ULL(0));
 
-
 	mbox_data = otx2_read64(pf, RVU_PF_PFAF_MBOX0);
 
 	if (mbox_data & MBOX_UP_MSG) {
@@ -1729,7 +1728,7 @@ int otx2_init_hw_resources(struct otx2_nic *pf)
 	mutex_lock(&mbox->lock);
 	free_req = otx2_mbox_alloc_msg_nix_lf_free(mbox);
 	if (free_req) {
-		free_req->flags = NIX_LF_DISABLE_FLOWS;
+		free_req->flags = NIX_LF_DISABLE_FLOWS | NIX_LF_DONT_FREE_DFT_IDXS;
 		if (otx2_sync_mbox_msg(mbox))
 			dev_err(pf->dev, "%s failed to free nixlf\n", __func__);
 	}
@@ -1803,7 +1802,7 @@ void otx2_free_hw_resources(struct otx2_nic *pf)
 	/* Reset NIX LF */
 	free_req = otx2_mbox_alloc_msg_nix_lf_free(mbox);
 	if (free_req) {
-		free_req->flags = NIX_LF_DISABLE_FLOWS;
+		free_req->flags = NIX_LF_DISABLE_FLOWS | NIX_LF_DONT_FREE_DFT_IDXS;
 		if (!(pf->flags & OTX2_FLAG_PF_SHUTDOWN))
 			free_req->flags |= NIX_LF_DONT_FREE_TX_VTAG;
 		if (otx2_sync_mbox_msg(mbox))
@@ -1926,7 +1925,6 @@ int otx2_alloc_queue_mem(struct otx2_nic *pf)
 	struct otx2_qset *qset = &pf->qset;
 	struct otx2_cq_poll *cq_poll;
 
-
 	/* RQ and SQs are mapped to different CQs,
 	 * so find out max CQ IRQs (i.e CINTs) needed.
 	 */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
                   ` (5 preceding siblings ...)
  2026-06-05  6:32 ` [PATCH v19 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-08  2:23   ` Ratheesh Kannoth
  2026-06-08  2:30   ` Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically Ratheesh Kannoth
  8 siblings, 2 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

Flashing updated firmware on deployed devices is cumbersome. Provide a
mechanism to load a custom KPU (Key Parse Unit) profile directly from
the filesystem at module load time.

When the rvu_af module is loaded with the kpu_profile parameter, the
specified profile is read from /lib/firmware/kpu and programmed into
the KPU registers. Add struct npc_kpu_profile_cam2 for the extended CAM
format used by filesystem-loaded profiles, and support ptype/ptype_mask
in npc_config_kpucam() when profile->from_fs is set.
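
The two CAM layouts differ only by the trailing ptype/ptype_mask bytes,
so code that walks a profile has to pick the matching entry count. The
struct definitions below repeat npc_kpu_profile_cam and the new
npc_kpu_profile_cam2 using userspace typedefs. struct kpu_profile_view and
num_cam_entries() are hypothetical simplifications of the
npc_get_num_kpu_cam_entries() helper used in this patch. They assume the
helper selects cam_entries2 when the profile was loaded from the
filesystem.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t u8;
typedef uint16_t u16;

/* Built-in profile CAM entry (npc.h) */
struct npc_kpu_profile_cam {
	u8 state;
	u8 state_mask;
	u16 dp0;
	u16 dp0_mask;
	u16 dp1;
	u16 dp1_mask;
	u16 dp2;
	u16 dp2_mask;
} __attribute__((packed));

/* Extended entry used by filesystem-loaded profiles: adds ptype */
struct npc_kpu_profile_cam2 {
	u8 state;
	u8 state_mask;
	u16 dp0;
	u16 dp0_mask;
	u16 dp1;
	u16 dp1_mask;
	u16 dp2;
	u16 dp2_mask;
	u8 ptype;
	u8 ptype_mask;
} __attribute__((packed));

/* Packed sizes: 2 + 6 * 2 = 14 bytes, plus 2 ptype bytes = 16 */
_Static_assert(sizeof(struct npc_kpu_profile_cam) == 14, "cam size");
_Static_assert(sizeof(struct npc_kpu_profile_cam2) == 16, "cam2 size");

/* Hypothetical, trimmed view of struct npc_kpu_profile */
struct kpu_profile_view {
	bool from_fs;
	int cam_entries;	/* entries in cam[] */
	int cam_entries2;	/* entries in cam2[] */
};

/* Assumed selection rule: extended table for filesystem profiles */
static int num_cam_entries(const struct kpu_profile_view *p)
{
	return p->from_fs ? p->cam_entries2 : p->cam_entries;
}
```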

Usage:
  1. Build OCTEONTX2_AF as a module (CONFIG_OCTEONTX2_AF=m).
  2. Copy the KPU profile file to /lib/firmware/kpu.
  3. Load the module: insmod rvu_af.ko kpu_profile=<profile_name>

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c |  57 ++-
 .../net/ethernet/marvell/octeontx2/af/npc.h   |  17 +
 .../net/ethernet/marvell/octeontx2/af/rvu.h   |  12 +-
 .../ethernet/marvell/octeontx2/af/rvu_npc.c   | 466 ++++++++++++++----
 .../ethernet/marvell/octeontx2/af/rvu_npc.h   |  17 +
 .../ethernet/marvell/octeontx2/af/rvu_reg.h   |   1 +
 6 files changed, 459 insertions(+), 111 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index 2705753c1878..32b53b5bc57a 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -521,13 +521,17 @@ npc_program_single_kpm_profile(struct rvu *rvu, int blkaddr,
 			       int kpm, int start_entry,
 			       const struct npc_kpu_profile *profile)
 {
+	int num_cam_entries, num_action_entries;
 	int entry, num_entries, max_entries;
 	u64 idx;
 
-	if (profile->cam_entries != profile->action_entries) {
+	num_cam_entries = npc_get_num_kpu_cam_entries(rvu, profile);
+	num_action_entries = npc_get_num_kpu_action_entries(rvu, profile);
+
+	if (num_cam_entries != num_action_entries) {
 		dev_err(rvu->dev,
 			"kpm%d: CAM and action entries [%d != %d] not equal\n",
-			kpm, profile->cam_entries, profile->action_entries);
+			kpm, num_cam_entries, num_action_entries);
 
 		WARN(1, "Fatal error\n");
 		return;
@@ -536,16 +540,18 @@ npc_program_single_kpm_profile(struct rvu *rvu, int blkaddr,
 	max_entries = rvu->hw->npc_kpu_entries / 2;
 	entry = start_entry;
 	/* Program CAM match entries for previous kpm extracted data */
-	num_entries = min_t(int, profile->cam_entries, max_entries);
+	num_entries = min_t(int, num_cam_entries, max_entries);
 	for (idx = 0; entry < num_entries + start_entry; entry++, idx++)
-		npc_config_kpmcam(rvu, blkaddr, &profile->cam[idx],
+		npc_config_kpmcam(rvu, blkaddr,
+				  npc_get_kpu_cam_nth_entry(rvu, profile, idx),
 				  kpm, entry);
 
 	entry = start_entry;
 	/* Program this kpm's actions */
-	num_entries = min_t(int, profile->action_entries, max_entries);
+	num_entries = min_t(int, num_action_entries, max_entries);
 	for (idx = 0; entry < num_entries + start_entry; entry++, idx++)
-		npc_config_kpmaction(rvu, blkaddr, &profile->action[idx],
+		npc_config_kpmaction(rvu, blkaddr,
+				     npc_get_kpu_action_nth_entry(rvu, profile, idx),
 				     kpm, entry, false);
 }
 
@@ -611,20 +617,23 @@ npc_enable_kpm_entry(struct rvu *rvu, int blkaddr, int kpm, int num_entries)
 static void npc_program_kpm_profile(struct rvu *rvu, int blkaddr, int num_kpms)
 {
 	const struct npc_kpu_profile *profile1, *profile2;
+	int pfl1_num_cam_entries, pfl2_num_cam_entries;
 	int idx, total_cam_entries;
 
 	for (idx = 0; idx < num_kpms; idx++) {
 		profile1 = &rvu->kpu.kpu[idx];
+		pfl1_num_cam_entries = npc_get_num_kpu_cam_entries(rvu, profile1);
 		npc_program_single_kpm_profile(rvu, blkaddr, idx, 0, profile1);
 		profile2 = &rvu->kpu.kpu[idx + KPU_OFFSET];
+		pfl2_num_cam_entries = npc_get_num_kpu_cam_entries(rvu, profile2);
+
 		npc_program_single_kpm_profile(rvu, blkaddr, idx,
-					       profile1->cam_entries,
+					       pfl1_num_cam_entries,
 					       profile2);
-		total_cam_entries = profile1->cam_entries +
-			profile2->cam_entries;
+		total_cam_entries = pfl1_num_cam_entries + pfl2_num_cam_entries;
 		npc_enable_kpm_entry(rvu, blkaddr, idx, total_cam_entries);
 		rvu_write64(rvu, blkaddr, NPC_AF_KPMX_PASS2_OFFSET(idx),
-			    profile1->cam_entries);
+			    pfl1_num_cam_entries);
 		/* Enable the KPUs associated with this KPM */
 		rvu_write64(rvu, blkaddr, NPC_AF_KPUX_CFG(idx), 0x01);
 		rvu_write64(rvu, blkaddr, NPC_AF_KPUX_CFG(idx + KPU_OFFSET),
@@ -634,6 +643,7 @@ static void npc_program_kpm_profile(struct rvu *rvu, int blkaddr, int num_kpms)
 
 void npc_cn20k_parser_profile_init(struct rvu *rvu, int blkaddr)
 {
+	struct npc_kpu_profile_action *act;
 	struct rvu_hwinfo *hw = rvu->hw;
 	int num_pkinds, idx;
 
@@ -665,9 +675,15 @@ void npc_cn20k_parser_profile_init(struct rvu *rvu, int blkaddr)
 	num_pkinds = rvu->kpu.pkinds;
 	num_pkinds = min_t(int, hw->npc_pkinds, num_pkinds);
 
-	for (idx = 0; idx < num_pkinds; idx++)
-		npc_config_kpmaction(rvu, blkaddr, &rvu->kpu.ikpu[idx],
+	/* CN20K does not support custom profiles from the filesystem */
+	for (idx = 0; idx < num_pkinds; idx++) {
+		act = npc_get_ikpu_nth_entry(rvu, idx);
+		if (!act)
+			continue;
+
+		npc_config_kpmaction(rvu, blkaddr, act,
 				     0, idx, true);
+	}
 
 	/* Program KPM CAM and Action profiles */
 	npc_program_kpm_profile(rvu, blkaddr, hw->npc_kpms);
@@ -679,7 +695,7 @@ struct npc_priv_t *npc_priv_get(void)
 }
 
 static void npc_program_mkex_rx(struct rvu *rvu, int blkaddr,
-				struct npc_mcam_kex_extr *mkex_extr,
+				const struct npc_mcam_kex_extr *mkex_extr,
 				u8 intf)
 {
 	u8 num_extr = rvu->hw->npc_kex_extr;
@@ -708,7 +724,7 @@ static void npc_program_mkex_rx(struct rvu *rvu, int blkaddr,
 }
 
 static void npc_program_mkex_tx(struct rvu *rvu, int blkaddr,
-				struct npc_mcam_kex_extr *mkex_extr,
+				const struct npc_mcam_kex_extr *mkex_extr,
 				u8 intf)
 {
 	u8 num_extr = rvu->hw->npc_kex_extr;
@@ -737,7 +753,7 @@ static void npc_program_mkex_tx(struct rvu *rvu, int blkaddr,
 }
 
 static void npc_program_mkex_profile(struct rvu *rvu, int blkaddr,
-				     struct npc_mcam_kex_extr *mkex_extr)
+				     const struct npc_mcam_kex_extr *mkex_extr)
 {
 	struct rvu_hwinfo *hw = rvu->hw;
 	u8 intf;
@@ -1630,8 +1646,8 @@ npc_cn20k_update_action_entries_n_flags(struct rvu *rvu,
 int npc_cn20k_apply_custom_kpu(struct rvu *rvu,
 			       struct npc_kpu_profile_adapter *profile)
 {
+	const struct npc_cn20k_kpu_profile_fwdata *fw = rvu->kpu_fwdata;
 	size_t hdr_sz = sizeof(struct npc_cn20k_kpu_profile_fwdata);
-	struct npc_cn20k_kpu_profile_fwdata *fw = rvu->kpu_fwdata;
 	struct npc_kpu_profile_action *action;
 	struct npc_kpu_profile_cam *cam;
 	struct npc_kpu_fwdata *fw_kpu;
@@ -1676,8 +1692,15 @@ int npc_cn20k_apply_custom_kpu(struct rvu *rvu,
 	}
 
 	/* Verify if profile fits the HW */
+	if (fw->kpus > rvu->hw->npc_kpus) {
+		dev_warn(rvu->dev, "Not enough KPUs: %d > %d\n", fw->kpus,
+			 rvu->hw->npc_kpus);
+		return -EINVAL;
+	}
+
+	/* Check if there is enough memory */
 	if (fw->kpus > profile->kpus) {
-		dev_warn(rvu->dev, "Not enough KPUs: %d > %ld\n", fw->kpus,
+		dev_warn(rvu->dev, "Not enough KPUs: %d > %zu\n", fw->kpus,
 			 profile->kpus);
 		return -EINVAL;
 	}
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
index 2138c044fe41..eaed172f1606 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/npc.h
@@ -267,6 +267,19 @@ struct npc_kpu_profile_cam {
 	u16 dp2_mask;
 } __packed;
 
+struct npc_kpu_profile_cam2 {
+	u8 state;
+	u8 state_mask;
+	u16 dp0;
+	u16 dp0_mask;
+	u16 dp1;
+	u16 dp1_mask;
+	u16 dp2;
+	u16 dp2_mask;
+	u8 ptype;
+	u8 ptype_mask;
+} __packed;
+
 struct npc_kpu_profile_action {
 	u8 errlev;
 	u8 errcode;
@@ -292,6 +305,10 @@ struct npc_kpu_profile {
 	int action_entries;
 	struct npc_kpu_profile_cam *cam;
 	struct npc_kpu_profile_action *action;
+	int cam_entries2;
+	int action_entries2;
+	struct npc_kpu_profile_action *action2;
+	struct npc_kpu_profile_cam2 *cam2;
 };
 
 /* NPC KPU register formats */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
index 65397daae4c2..7f3505ae6860 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.h
@@ -553,17 +553,19 @@ struct npc_kpu_profile_adapter {
 	const char			*name;
 	u64				version;
 	const struct npc_lt_def_cfg	*lt_def;
-	const struct npc_kpu_profile_action	*ikpu; /* array[pkinds] */
-	const struct npc_kpu_profile	*kpu; /* array[kpus] */
+	struct npc_kpu_profile_action	*ikpu; /* array[pkinds] */
+	struct npc_kpu_profile_action	*ikpu2; /* array[pkinds] */
+	struct npc_kpu_profile	*kpu; /* array[kpus] */
 	union npc_mcam_key_prfl {
-		struct npc_mcam_kex		*mkex;
+		const struct npc_mcam_kex		*mkex;
 					/* used for cn9k and cn10k */
-		struct npc_mcam_kex_extr	*mkex_extr; /* used for cn20k */
+		const struct npc_mcam_kex_extr	*mkex_extr; /* used for cn20k */
 	} mcam_kex_prfl;
 	struct npc_mcam_kex_hash	*mkex_hash;
 	bool				custom;
 	size_t				pkinds;
 	size_t				kpus;
+	bool				from_fs;
 };
 
 #define RVU_SWITCH_LBK_CHAN	63
@@ -634,7 +636,7 @@ struct rvu {
 
 	/* Firmware data */
 	struct rvu_fwdata	*fwdata;
-	void			*kpu_fwdata;
+	const void		*kpu_fwdata;
 	size_t			kpu_fwdata_sz;
 	void __iomem		*kpu_prfl_addr;
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
index 150d50b72c48..b4635d78f9d5 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
@@ -1495,7 +1495,8 @@ void rvu_npc_free_mcam_entries(struct rvu *rvu, u16 pcifunc, int nixlf)
 }
 
 static void npc_program_mkex_rx(struct rvu *rvu, int blkaddr,
-				struct npc_mcam_kex *mkex, u8 intf)
+				const struct npc_mcam_kex *mkex,
+				u8 intf)
 {
 	int lid, lt, ld, fl;
 
@@ -1524,7 +1525,8 @@ static void npc_program_mkex_rx(struct rvu *rvu, int blkaddr,
 }
 
 static void npc_program_mkex_tx(struct rvu *rvu, int blkaddr,
-				struct npc_mcam_kex *mkex, u8 intf)
+				const struct npc_mcam_kex *mkex,
+				u8 intf)
 {
 	int lid, lt, ld, fl;
 
@@ -1553,7 +1555,7 @@ static void npc_program_mkex_tx(struct rvu *rvu, int blkaddr,
 }
 
 static void npc_program_mkex_profile(struct rvu *rvu, int blkaddr,
-				     struct npc_mcam_kex *mkex)
+				     const struct npc_mcam_kex *mkex)
 {
 	struct rvu_hwinfo *hw = rvu->hw;
 	u8 intf;
@@ -1693,8 +1695,12 @@ static void npc_config_kpucam(struct rvu *rvu, int blkaddr,
 			      const struct npc_kpu_profile_cam *kpucam,
 			      int kpu, int entry)
 {
+	const struct npc_kpu_profile_cam2 *kpucam2 = (void *)kpucam;
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
 	struct npc_kpu_cam cam0 = {0};
 	struct npc_kpu_cam cam1 = {0};
+	u64 *val = (u64 *)&cam1;
+	u64 *mask = (u64 *)&cam0;
 
 	cam1.state = kpucam->state & kpucam->state_mask;
 	cam1.dp0_data = kpucam->dp0 & kpucam->dp0_mask;
@@ -1706,6 +1712,14 @@ static void npc_config_kpucam(struct rvu *rvu, int blkaddr,
 	cam0.dp1_data = ~kpucam->dp1 & kpucam->dp1_mask;
 	cam0.dp2_data = ~kpucam->dp2 & kpucam->dp2_mask;
 
+	if (profile->from_fs) {
+		u8 ptype = kpucam2->ptype;
+		u8 pmask = kpucam2->ptype_mask;
+
+		*val |= FIELD_PREP(GENMASK_ULL(57, 56), ptype & pmask);
+		*mask |= FIELD_PREP(GENMASK_ULL(57, 56), ~ptype & pmask);
+	}
+
 	rvu_write64(rvu, blkaddr,
 		    NPC_AF_KPUX_ENTRYX_CAMX(kpu, entry, 0), *(u64 *)&cam0);
 	rvu_write64(rvu, blkaddr,
@@ -1717,34 +1731,104 @@ u64 npc_enable_mask(int count)
 	return (((count) < 64) ? ~(BIT_ULL(count) - 1) : (0x00ULL));
 }
 
+struct npc_kpu_profile_action *
+npc_get_ikpu_nth_entry(struct rvu *rvu, int n)
+{
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
+
+	if (profile->from_fs)
+		return &profile->ikpu2[n];
+
+	return &profile->ikpu[n];
+}
+
+int
+npc_get_num_kpu_cam_entries(struct rvu *rvu,
+			    const struct npc_kpu_profile *kpu_pfl)
+{
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
+
+	if (profile->from_fs)
+		return kpu_pfl->cam_entries2;
+
+	return kpu_pfl->cam_entries;
+}
+
+struct npc_kpu_profile_cam *
+npc_get_kpu_cam_nth_entry(struct rvu *rvu,
+			  const struct npc_kpu_profile *kpu_pfl, int n)
+{
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
+
+	if (profile->from_fs)
+		return (void *)&kpu_pfl->cam2[n];
+
+	return (void *)&kpu_pfl->cam[n];
+}
+
+int
+npc_get_num_kpu_action_entries(struct rvu *rvu,
+			       const struct npc_kpu_profile *kpu_pfl)
+{
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
+
+	if (profile->from_fs)
+		return kpu_pfl->action_entries2;
+
+	return kpu_pfl->action_entries;
+}
+
+struct npc_kpu_profile_action *
+npc_get_kpu_action_nth_entry(struct rvu *rvu,
+			     const struct npc_kpu_profile *kpu_pfl,
+			     int n)
+{
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
+
+	if (profile->from_fs)
+		return (void *)&kpu_pfl->action2[n];
+
+	return (void *)&kpu_pfl->action[n];
+}
+
 static void npc_program_kpu_profile(struct rvu *rvu, int blkaddr, int kpu,
 				    const struct npc_kpu_profile *profile)
 {
+	int num_cam_entries, num_action_entries;
 	int entry, num_entries, max_entries;
 	u64 entry_mask;
 
-	if (profile->cam_entries != profile->action_entries) {
+	num_cam_entries = npc_get_num_kpu_cam_entries(rvu, profile);
+	num_action_entries = npc_get_num_kpu_action_entries(rvu, profile);
+
+	if (num_cam_entries != num_action_entries) {
 		dev_err(rvu->dev,
 			"KPU%d: CAM and action entries [%d != %d] not equal\n",
-			kpu, profile->cam_entries, profile->action_entries);
+			kpu, num_cam_entries, num_action_entries);
 	}
 
 	max_entries = rvu->hw->npc_kpu_entries;
 
+	WARN(num_cam_entries > max_entries,
+	     "KPU%d: err: hw max entries=%d, input entries=%d\n",
+	     kpu, max_entries, num_cam_entries);
+
 	/* Program CAM match entries for previous KPU extracted data */
-	num_entries = min_t(int, profile->cam_entries, max_entries);
+	num_entries = min_t(int, num_cam_entries, max_entries);
 	for (entry = 0; entry < num_entries; entry++)
 		npc_config_kpucam(rvu, blkaddr,
-				  &profile->cam[entry], kpu, entry);
+				  (void *)npc_get_kpu_cam_nth_entry(rvu, profile, entry),
+				  kpu, entry);
 
 	/* Program this KPU's actions */
-	num_entries = min_t(int, profile->action_entries, max_entries);
+	num_entries = min_t(int, num_action_entries, max_entries);
 	for (entry = 0; entry < num_entries; entry++)
-		npc_config_kpuaction(rvu, blkaddr, &profile->action[entry],
+		npc_config_kpuaction(rvu, blkaddr,
+				     (void *)npc_get_kpu_action_nth_entry(rvu, profile, entry),
 				     kpu, entry, false);
 
 	/* Enable all programmed entries */
-	num_entries = min_t(int, profile->action_entries, profile->cam_entries);
+	num_entries = min_t(int, num_action_entries, num_cam_entries);
 	entry_mask = npc_enable_mask(num_entries);
 	/* Disable first KPU_MAX_CST_ENT entries for built-in profile */
 	if (!rvu->kpu.custom)
@@ -1788,26 +1872,175 @@ static void npc_prepare_default_kpu(struct rvu *rvu,
 	npc_cn20k_update_action_entries_n_flags(rvu, profile);
 }
 
-static int npc_apply_custom_kpu(struct rvu *rvu,
-				struct npc_kpu_profile_adapter *profile)
+static int npc_alloc_kpu_cam2_n_action2(struct rvu *rvu, int kpu_num,
+					int num_entries)
+{
+	struct npc_kpu_profile_adapter *adapter = &rvu->kpu;
+	struct npc_kpu_profile *kpu;
+
+	kpu = &adapter->kpu[kpu_num];
+
+	kpu->cam2 = devm_kcalloc(rvu->dev, num_entries,
+				 sizeof(*kpu->cam2), GFP_KERNEL);
+	if (!kpu->cam2)
+		return -ENOMEM;
+
+	kpu->action2 = devm_kcalloc(rvu->dev, num_entries,
+				    sizeof(*kpu->action2), GFP_KERNEL);
+	if (!kpu->action2)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int npc_apply_custom_kpu_from_fw(struct rvu *rvu,
+					struct npc_kpu_profile_adapter *profile)
 {
 	size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata), offset = 0;
+	const struct npc_kpu_profile_fwdata *fw;
 	struct npc_kpu_profile_action *action;
-	struct npc_kpu_profile_fwdata *fw;
 	struct npc_kpu_profile_cam *cam;
 	struct npc_kpu_fwdata *fw_kpu;
-	int entries;
-	u16 kpu, entry;
+	int entries, entry, kpu;
 
-	if (is_cn20k(rvu->pdev))
-		return npc_cn20k_apply_custom_kpu(rvu, profile);
+	fw = rvu->kpu_fwdata;
+
+	for (kpu = 0; kpu < fw->kpus; kpu++) {
+		if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
+			dev_warn(rvu->dev,
+				 "Profile size mismatch on KPU%i parsing\n",
+				 kpu + 1);
+			return -EINVAL;
+		}
+
+		fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
+		if (fw_kpu->entries < 0) {
+			dev_warn(rvu->dev,
+				 "Profile entry count is negative on KPU%i parsing\n",
+				 kpu + 1);
+			return -EINVAL;
+		}
+
+		if (fw_kpu->entries > KPU_MAX_CST_ENT)
+			dev_warn(rvu->dev,
+				 "Too many custom entries on KPU%d: %d > %d\n",
+				 kpu, fw_kpu->entries, KPU_MAX_CST_ENT);
+		entries = min_t(int, fw_kpu->entries, KPU_MAX_CST_ENT);
+		cam = (struct npc_kpu_profile_cam *)fw_kpu->data;
+		offset += sizeof(*fw_kpu) + fw_kpu->entries * sizeof(*cam);
+		action = (struct npc_kpu_profile_action *)(fw->data + offset);
+		offset += fw_kpu->entries * sizeof(*action);
+		if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
+			dev_warn(rvu->dev,
+				 "Profile size mismatch on KPU%i parsing.\n",
+				 kpu + 1);
+			return -EINVAL;
+		}
+		for (entry = 0; entry < entries; entry++) {
+			profile->kpu[kpu].cam[entry] = cam[entry];
+			profile->kpu[kpu].action[entry] = action[entry];
+		}
+	}
+
+	return 0;
+}
+
+static int npc_apply_custom_kpu_from_fs(struct rvu *rvu,
+					struct npc_kpu_profile_adapter *profile)
+{
+	size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata), offset = 0;
+	const struct npc_kpu_profile_fwdata *fw;
+	struct npc_kpu_profile_action *action;
+	struct npc_kpu_profile_cam2 *cam2;
+	struct npc_kpu_fwdata *fw_kpu;
+	int entries, ret, entry, kpu;
 
 	fw = rvu->kpu_fwdata;
 
+	/* The binary blob starts with the ikpu action entries at data[0] */
+	profile->ikpu2 = devm_kcalloc(rvu->dev, 1,
+				      sizeof(ikpu_action_entries),
+				      GFP_KERNEL);
+	if (!profile->ikpu2)
+		return -ENOMEM;
+
+	action = (struct npc_kpu_profile_action *)(fw->data + offset);
+
+	if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))
+		return -EINVAL;
+
+	/* The firmware layout depends on the in-kernel size of
+	 * ikpu_action_entries.
+	 */
+	memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
+	offset += sizeof(ikpu_action_entries);
+
+	for (kpu = 0; kpu < fw->kpus; kpu++) {
+		if (rvu->kpu_fwdata_sz < hdr_sz + offset + sizeof(*fw_kpu)) {
+			dev_warn(rvu->dev,
+				 "profile size mismatch on kpu%i parsing\n",
+				 kpu + 1);
+			return -EINVAL;
+		}
+
+		fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
+		if (fw_kpu->entries <= 0) {
+			dev_warn(rvu->dev,
+				 "Invalid kpu entries on KPU%d\n", kpu);
+			return -EINVAL;
+		}
+
+		entries = min_t(int, fw_kpu->entries, rvu->hw->npc_kpu_entries);
+		dev_info(rvu->dev,
+			 "Loading %d entries on KPU%d\n", entries, kpu);
+
+		cam2 = (struct npc_kpu_profile_cam2 *)fw_kpu->data;
+		offset += sizeof(*fw_kpu) + fw_kpu->entries * sizeof(*cam2);
+		action = (struct npc_kpu_profile_action *)(fw->data + offset);
+		offset += fw_kpu->entries * sizeof(*action);
+		if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
+			dev_warn(rvu->dev,
+				 "profile size mismatch on kpu%i parsing.\n",
+				 kpu + 1);
+			return -EINVAL;
+		}
+
+		profile->kpu[kpu].cam_entries2 = entries;
+		profile->kpu[kpu].action_entries2 = entries;
+		ret = npc_alloc_kpu_cam2_n_action2(rvu, kpu, entries);
+		if (ret) {
+			dev_warn(rvu->dev,
+				 "profile entry allocation failed for kpu=%d for %d entries\n",
+				 kpu, entries);
+			return ret;
+		}
+
+		for (entry = 0; entry < entries; entry++) {
+			profile->kpu[kpu].cam2[entry] = cam2[entry];
+			profile->kpu[kpu].action2[entry] = action[entry];
+		}
+	}
+
+	return 0;
+}
+
+static int npc_apply_custom_kpu(struct rvu *rvu,
+				struct npc_kpu_profile_adapter *profile,
+				bool from_fs, int *fw_kpus)
+{
+	size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata);
+	const struct npc_kpu_profile_fwdata *fw;
+	struct npc_kpu_profile_fwdata *sfw;
+
+	if (is_cn20k(rvu->pdev))
+		return npc_cn20k_apply_custom_kpu(rvu, profile);
+
 	if (rvu->kpu_fwdata_sz < hdr_sz) {
 		dev_warn(rvu->dev, "Invalid KPU profile size\n");
 		return -EINVAL;
 	}
+
+	fw = rvu->kpu_fwdata;
 	if (le64_to_cpu(fw->signature) != KPU_SIGN) {
 		dev_warn(rvu->dev, "Invalid KPU profile signature %llx\n",
 			 fw->signature);
@@ -1835,42 +2068,38 @@ static int npc_apply_custom_kpu(struct rvu *rvu,
 		return -EINVAL;
 	}
 	/* Verify if profile fits the HW */
+	if (fw->kpus > rvu->hw->npc_kpus) {
+		dev_warn(rvu->dev, "Not enough KPUs: %d > %d\n", fw->kpus,
+			 rvu->hw->npc_kpus);
+		return -EINVAL;
+	}
+
+	/* Check that profile->kpu[] has enough entries to hold every KPU in
+	 * the firmware image, since cam_entries2 and action_entries2 are
+	 * set per KPU.
+	 */
 	if (fw->kpus > profile->kpus) {
-		dev_warn(rvu->dev, "Not enough KPUs: %d > %ld\n", fw->kpus,
+		dev_warn(rvu->dev, "Not enough KPUs: %d > %zu\n", fw->kpus,
 			 profile->kpus);
 		return -EINVAL;
 	}
 
+	*fw_kpus = fw->kpus;
+
+	sfw = devm_kcalloc(rvu->dev, 1, sizeof(*sfw), GFP_KERNEL);
+	if (!sfw)
+		return -ENOMEM;
+
+	memcpy(sfw, fw, sizeof(*sfw));
+
 	profile->custom = 1;
-	profile->name = fw->name;
+	profile->name = sfw->name;
 	profile->version = le64_to_cpu(fw->version);
-	profile->mcam_kex_prfl.mkex = &fw->mkex;
-	profile->lt_def = &fw->lt_def;
-
-	for (kpu = 0; kpu < fw->kpus; kpu++) {
-		fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
-		if (fw_kpu->entries > KPU_MAX_CST_ENT)
-			dev_warn(rvu->dev,
-				 "Too many custom entries on KPU%d: %d > %d\n",
-				 kpu, fw_kpu->entries, KPU_MAX_CST_ENT);
-		entries = min(fw_kpu->entries, KPU_MAX_CST_ENT);
-		cam = (struct npc_kpu_profile_cam *)fw_kpu->data;
-		offset += sizeof(*fw_kpu) + fw_kpu->entries * sizeof(*cam);
-		action = (struct npc_kpu_profile_action *)(fw->data + offset);
-		offset += fw_kpu->entries * sizeof(*action);
-		if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
-			dev_warn(rvu->dev,
-				 "Profile size mismatch on KPU%i parsing.\n",
-				 kpu + 1);
-			return -EINVAL;
-		}
-		for (entry = 0; entry < entries; entry++) {
-			profile->kpu[kpu].cam[entry] = cam[entry];
-			profile->kpu[kpu].action[entry] = action[entry];
-		}
-	}
+	profile->mcam_kex_prfl.mkex = &sfw->mkex;
+	profile->lt_def = &sfw->lt_def;
 
-	return 0;
+	return from_fs ? npc_apply_custom_kpu_from_fs(rvu, profile) :
+		npc_apply_custom_kpu_from_fw(rvu, profile);
 }
 
 static int npc_load_kpu_prfl_img(struct rvu *rvu, void __iomem *prfl_addr,
@@ -1958,45 +2187,19 @@ static int npc_load_kpu_profile_fwdb(struct rvu *rvu, const char *kpu_profile)
 	return ret;
 }
 
-void npc_load_kpu_profile(struct rvu *rvu)
+static int npc_load_kpu_profile_from_fw(struct rvu *rvu)
 {
 	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
 	const char *kpu_profile = rvu->kpu_pfl_name;
-	const struct firmware *fw = NULL;
-	bool retry_fwdb = false;
-
-	/* If user not specified profile customization */
-	if (!strncmp(kpu_profile, def_pfl_name, KPU_NAME_LEN))
-		goto revert_to_default;
-	/* First prepare default KPU, then we'll customize top entries. */
-	npc_prepare_default_kpu(rvu, profile);
-
-	/* Order of preceedence for load loading NPC profile (high to low)
-	 * Firmware binary in filesystem.
-	 * Firmware database method.
-	 * Default KPU profile.
-	 */
-	if (!request_firmware_direct(&fw, kpu_profile, rvu->dev)) {
-		dev_info(rvu->dev, "Loading KPU profile from firmware: %s\n",
-			 kpu_profile);
-		rvu->kpu_fwdata = kzalloc(fw->size, GFP_KERNEL);
-		if (rvu->kpu_fwdata) {
-			memcpy(rvu->kpu_fwdata, fw->data, fw->size);
-			rvu->kpu_fwdata_sz = fw->size;
-		}
-		release_firmware(fw);
-		retry_fwdb = true;
-		goto program_kpu;
-	}
+	int fw_kpus = 0;
 
-load_image_fwdb:
 	/* Loading the KPU profile using firmware database */
 	if (npc_load_kpu_profile_fwdb(rvu, kpu_profile))
-		goto revert_to_default;
+		return -EFAULT;
 
-program_kpu:
 	/* Apply profile customization if firmware was loaded. */
-	if (!rvu->kpu_fwdata_sz || npc_apply_custom_kpu(rvu, profile)) {
+	if (!rvu->kpu_fwdata_sz ||
+	    npc_apply_custom_kpu(rvu, profile, false, &fw_kpus)) {
 		/* If image from firmware filesystem fails to load or invalid
 		 * retry with firmware database method.
 		 */
@@ -2010,10 +2213,6 @@ void npc_load_kpu_profile(struct rvu *rvu)
 			}
 			rvu->kpu_fwdata = NULL;
 			rvu->kpu_fwdata_sz = 0;
-			if (retry_fwdb) {
-				retry_fwdb = false;
-				goto load_image_fwdb;
-			}
 		}
 
 		dev_warn(rvu->dev,
@@ -2021,22 +2220,101 @@ void npc_load_kpu_profile(struct rvu *rvu)
 			 kpu_profile);
 		kfree(rvu->kpu_fwdata);
 		rvu->kpu_fwdata = NULL;
-		goto revert_to_default;
+		return -EFAULT;
 	}
 
-	dev_info(rvu->dev, "Using custom profile '%s', version %d.%d.%d\n",
+	dev_info(rvu->dev, "Using custom profile '%.32s', version %d.%d.%d\n",
 		 profile->name, NPC_KPU_VER_MAJ(profile->version),
 		 NPC_KPU_VER_MIN(profile->version),
 		 NPC_KPU_VER_PATCH(profile->version));
 
-	return;
+	return 0;
+}
+
+static int npc_load_kpu_profile_from_fs(struct rvu *rvu)
+{
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
+	const char *kpu_profile = rvu->kpu_pfl_name;
+	const struct firmware *fw = NULL;
+	int ret, fw_kpus = 0;
+	char path[512] = "kpu/";
+
+	if (strlen(kpu_profile) > sizeof(path) - strlen("kpu/") - 1) {
+		dev_err(rvu->dev, "KPU profile name is too long\n");
+		return -ENOSPC;
+	}
+
+	strcat(path, kpu_profile);
+
+	if (request_firmware_direct(&fw, path, rvu->dev))
+		return -ENOENT;
+
+	dev_info(rvu->dev, "Loading KPU profile from filesystem: %s\n",
+		 path);
+
+	rvu->kpu_fwdata = fw->data;
+	rvu->kpu_fwdata_sz = fw->size;
+
+	ret = npc_apply_custom_kpu(rvu, profile, true, &fw_kpus);
+	release_firmware(fw);
+	rvu->kpu_fwdata = NULL;
+
+	if (ret) {
+		rvu->kpu_fwdata_sz = 0;
+		dev_err(rvu->dev,
+			"Loading KPU profile from filesystem failed\n");
+		return ret;
+	}
+
+	/* When loading from the filesystem, all entries come from the
+	 * same binary blob.
+	 */
+	rvu->kpu.kpus = fw_kpus;
+	profile->kpus = fw_kpus;
+	profile->from_fs = true;
+	return 0;
+}
+
+void npc_load_kpu_profile(struct rvu *rvu)
+{
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
+	const char *kpu_profile = rvu->kpu_pfl_name;
+
+	profile->from_fs = false;
+
+	npc_prepare_default_kpu(rvu, profile);
+
+	/* If user not specified profile customization */
+	if (!strncmp(kpu_profile, def_pfl_name, KPU_NAME_LEN))
+		return;
+
+	/* Order of precedence for loading the NPC profile (high to low):
+	 * Firmware binary in filesystem.
+	 * Firmware database method.
+	 * Default KPU profile.
+	 */
+
+	/* Filesystem-based KPU loading is not supported on cn20k.
+	 * npc_prepare_default_kpu() has already run above, and we only get
+	 * here when a non-default profile was requested, so there is no
+	 * need to prepare the default KPU again before trying the filesystem.
+	 */
+	if (!is_cn20k(rvu->pdev)) {
+		if (!npc_load_kpu_profile_from_fs(rvu))
+			return;
+	}
+
+	/* First prepare default KPU, then we'll customize top entries. */
+	npc_prepare_default_kpu(rvu, profile);
+	if (!npc_load_kpu_profile_from_fw(rvu))
+		return;
 
-revert_to_default:
 	npc_prepare_default_kpu(rvu, profile);
 }
 
 static void npc_parser_profile_init(struct rvu *rvu, int blkaddr)
 {
+	struct npc_kpu_profile_adapter *profile = &rvu->kpu;
 	struct rvu_hwinfo *hw = rvu->hw;
 	int num_pkinds, num_kpus, idx;
 
@@ -2060,7 +2338,9 @@ static void npc_parser_profile_init(struct rvu *rvu, int blkaddr)
 	num_pkinds = min_t(int, hw->npc_pkinds, num_pkinds);
 
 	for (idx = 0; idx < num_pkinds; idx++)
-		npc_config_kpuaction(rvu, blkaddr, &rvu->kpu.ikpu[idx], 0, idx, true);
+		npc_config_kpuaction(rvu, blkaddr,
+				     npc_get_ikpu_nth_entry(rvu, idx),
+				     0, idx, true);
 
 	/* Program KPU CAM and Action profiles */
 	num_kpus = rvu->kpu.kpus;
@@ -2068,6 +2348,11 @@ static void npc_parser_profile_init(struct rvu *rvu, int blkaddr)
 
 	for (idx = 0; idx < num_kpus; idx++)
 		npc_program_kpu_profile(rvu, blkaddr, idx, &rvu->kpu.kpu[idx]);
+
+	if (profile->from_fs) {
+		rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(54), 0x03);
+		rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(58), 0x03);
+	}
 }
 
 void npc_mcam_rsrcs_deinit(struct rvu *rvu)
@@ -2297,18 +2582,21 @@ static void rvu_npc_hw_init(struct rvu *rvu, int blkaddr)
 
 static void rvu_npc_setup_interfaces(struct rvu *rvu, int blkaddr)
 {
-	struct npc_mcam_kex_extr *mkex_extr = rvu->kpu.mcam_kex_prfl.mkex_extr;
-	struct npc_mcam_kex *mkex = rvu->kpu.mcam_kex_prfl.mkex;
+	const struct npc_mcam_kex_extr *mkex_extr;
 	struct npc_mcam *mcam = &rvu->hw->mcam;
 	struct rvu_hwinfo *hw = rvu->hw;
+	const struct npc_mcam_kex *mkex;
 	u64 nibble_ena, rx_kex, tx_kex;
 	u64 *keyx_cfg, reg;
 	u8 intf;
 
+	mkex_extr = rvu->kpu.mcam_kex_prfl.mkex_extr;
+	mkex = rvu->kpu.mcam_kex_prfl.mkex;
+
 	if (is_cn20k(rvu->pdev)) {
-		keyx_cfg = mkex_extr->keyx_cfg;
+		keyx_cfg = (u64 *)mkex_extr->keyx_cfg;
 	} else {
-		keyx_cfg = mkex->keyx_cfg;
+		keyx_cfg = (u64 *)mkex->keyx_cfg;
 		/* Reserve last counter for MCAM RX miss action which is set to
 		 * drop packet. This way we will know how many pkts didn't
 		 * match any MCAM entry.
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.h
index 83c5e32e2afc..662f6693cfe9 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.h
@@ -18,4 +18,21 @@ int npc_fwdb_prfl_img_map(struct rvu *rvu, void __iomem **prfl_img_addr,
 
 void npc_mcam_clear_bit(struct npc_mcam *mcam, u16 index);
 void npc_mcam_set_bit(struct npc_mcam *mcam, u16 index);
+
+struct npc_kpu_profile_action *
+npc_get_ikpu_nth_entry(struct rvu *rvu, int n);
+
+int
+npc_get_num_kpu_cam_entries(struct rvu *rvu,
+			    const struct npc_kpu_profile *kpu_pfl);
+struct npc_kpu_profile_cam *
+npc_get_kpu_cam_nth_entry(struct rvu *rvu,
+			  const struct npc_kpu_profile *kpu_pfl, int n);
+
+int
+npc_get_num_kpu_action_entries(struct rvu *rvu,
+			       const struct npc_kpu_profile *kpu_pfl);
+struct npc_kpu_profile_action *
+npc_get_kpu_action_nth_entry(struct rvu *rvu,
+			     const struct npc_kpu_profile *kpu_pfl, int n);
 #endif /* RVU_NPC_H */
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_reg.h b/drivers/net/ethernet/marvell/octeontx2/af/rvu_reg.h
index 62cdc714ba57..ab89b8c6e490 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_reg.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_reg.h
@@ -596,6 +596,7 @@
 #define NPC_AF_INTFX_KEX_CFG(a)		(0x01010 | (a) << 8)
 #define NPC_AF_PKINDX_ACTION0(a)	(0x80000ull | (a) << 6)
 #define NPC_AF_PKINDX_ACTION1(a)	(0x80008ull | (a) << 6)
+#define NPC_AF_PKINDX_TYPE(a)		(0x80010ull | (a) << 6)
 #define NPC_AF_PKINDX_CPI_DEFX(a, b)	(0x80020ull | (a) << 6 | (b) << 3)
 #define NPC_AF_KPUX_ENTRYX_CAMX(a, b, c) \
 		(0x100000 | (a) << 14 | (b) << 6 | (c) << 3)
-- 
2.43.0


* [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
                   ` (6 preceding siblings ...)
  2026-06-05  6:32 ` [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-08  2:24   ` Ratheesh Kannoth
  2026-06-08  2:31   ` Ratheesh Kannoth
  2026-06-05  6:32 ` [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically Ratheesh Kannoth
  8 siblings, 2 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

Default CN20K NPC rule allocation now keys off the active MCAM keyword
width: use X4 with a bank-masked reference index when the silicon uses
X4 keys, and X2 with the raw index otherwise (replacing the previous
always-X2 / eidx + 1 behaviour).
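
A minimal userspace sketch of the default-rule selection described
above, assuming bank_depth is a power of two. MCAM_KEY_X2/MCAM_KEY_X4
and pick_dft_alloc() are hypothetical stand-ins for NPC_MCAM_KEY_X2/X4
and the inline logic in npc_cn20k_dft_rules_alloc(); their numeric
values are illustrative, not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for NPC_MCAM_KEY_X2/X4; values are illustrative. */
enum mcam_kw { MCAM_KEY_X2 = 1, MCAM_KEY_X4 = 2 };

struct dft_alloc_choice {
	enum mcam_kw kw_type;
	uint16_t ref_entry;
};

/* Pick the key width and reference index for a default-rule allocation.
 * With X4 keys the reference index is reduced to a slot within one bank;
 * otherwise the raw index is used unchanged.
 */
static struct dft_alloc_choice pick_dft_alloc(enum mcam_kw active_kw,
					      uint16_t eidx,
					      uint16_t bank_depth)
{
	struct dft_alloc_choice c;

	/* The mask only acts as a modulo when bank_depth is a power of two. */
	assert(bank_depth && !(bank_depth & (bank_depth - 1)));

	if (active_kw == MCAM_KEY_X4) {
		c.kw_type = MCAM_KEY_X4;
		c.ref_entry = eidx & (bank_depth - 1);
	} else {
		c.kw_type = MCAM_KEY_X2;
		c.ref_entry = eidx;
	}
	return c;
}
```

With a bank depth of 8192, index 8195 maps to slot 3 under X4 and
stays 8195 under X2. A non-power-of-two depth would need
eidx % bank_depth instead of the mask.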

In the AF flow-install path, flows that need more than 256 key bits
query the NPC profile; if the platform is fixed to X2 entries, fail
with -EOPNOTSUPP instead of requesting X4. Otherwise select X4 for the
MCAM alloc.
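
The AF-side decision can be sketched as follows, assuming num_kws is
the 64-bit key word count returned by npc_get_num_kws and profile_kw is
the kw_type reported by npc_get_pfl_info. pick_flow_kw_type() and the
enum values are hypothetical; the driver does this inline in
rvu_npc_alloc_entry_for_flow_install().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for NPC_MCAM_KEY_X2/X4; values are illustrative. */
enum mcam_kw { MCAM_KEY_X2 = 1, MCAM_KEY_X4 = 2 };

/* Choose the MCAM key width for a flow that needs num_kws 64-bit key words.
 * Keys up to 256 bits fit an X2 entry; wider keys need X4, which an
 * X2-only profile cannot provide.
 */
static int pick_flow_kw_type(unsigned int num_kws, enum mcam_kw profile_kw,
			     enum mcam_kw *kw_type)
{
	unsigned int kw_bits = num_kws * 64;

	assert(kw_type != NULL);

	*kw_type = MCAM_KEY_X2;
	if (kw_bits <= 256)
		return 0;

	/* Wide key on an X2-only profile: refuse rather than request X4. */
	if (profile_kw == MCAM_KEY_X2)
		return -EOPNOTSUPP;

	*kw_type = MCAM_KEY_X4;
	return 0;
}
```

Returning -EOPNOTSUPP up front reports the profile limitation to the
caller instead of requesting an X4 entry the profile cannot provide.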

On the PF, cache and pass the profile kw_type from npc_get_pfl_info
through otx2_mcam_pfl_info_get(), and use it when allocating MCAM
entries for RSS/defaults and when installing ethtool flows on CN20K,
including masking the reference index for X4 slot layout.
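
A sketch of the PF-side caching and reference masking, assuming
x4_slots is a power of two whenever the profile uses X4 keys.
query_pfl_fn stands in for the NPC_GET_PFL_INFO mailbox exchange, and
demo_query_x4() returns made-up values; both, like the enum values, are
hypothetical. Unlike otx2_mcam_pfl_info_get(), the sketch omits the
mbox.lock that serializes access to the cached copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for NPC_MCAM_KEY_X2/X4; values are illustrative. */
enum mcam_kw { MCAM_KEY_X2 = 1, MCAM_KEY_X4 = 2 };

/* Hypothetical mailbox query: reports the profile key width and X4 slots. */
typedef void (*query_pfl_fn)(uint8_t *kw_type, uint16_t *x4_slots);

/* Profile info cached after the first query (locking omitted). */
static struct {
	bool is_set;
	uint8_t kw_type;
	uint16_t x4_slots;
} pfl_info;

static void pfl_info_get(query_pfl_fn query_pfl, uint16_t *x4_slots,
			 uint8_t *kw_type)
{
	if (!pfl_info.is_set) {
		uint8_t kw;
		uint16_t slots;

		query_pfl(&kw, &slots);
		pfl_info.kw_type = kw;
		/* An X2 profile has no X4 slot layout. */
		pfl_info.x4_slots = (kw == MCAM_KEY_X2) ? 0 : slots;
		pfl_info.is_set = true;
	}
	*x4_slots = pfl_info.x4_slots;
	*kw_type = pfl_info.kw_type;
}

/* Reduce a reference index to an X4 slot; X2 indices pass through. */
static uint16_t mask_ref_for_kw(uint8_t kw_type, uint16_t ref,
				uint16_t x4_slots)
{
	if (kw_type != MCAM_KEY_X4)
		return ref;

	assert(x4_slots && !(x4_slots & (x4_slots - 1)));
	return ref & (x4_slots - 1);
}

/* Example query returning an X4 profile with 2048 slots (made-up values). */
static int demo_queries;

static void demo_query_x4(uint8_t *kw_type, uint16_t *x4_slots)
{
	demo_queries++;
	*kw_type = MCAM_KEY_X4;
	*x4_slots = 2048;
}
```

Only the first call reaches the (hypothetical) mailbox; later calls are
served from the cache, and a reference index of 5000 becomes slot 904
under a 2048-slot X4 layout.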

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c | 21 ++++++--
 .../marvell/octeontx2/af/rvu_npc_fs.c         | 12 ++++-
 .../marvell/octeontx2/nic/otx2_flows.c        | 48 +++++++++++++------
 3 files changed, 61 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index 32b53b5bc57a..d76aad867934 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -4500,10 +4500,16 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	pfvf = rvu_get_pfvf(rvu, pcifunc);
 	pfvf->hw_prio = NPC_DFT_RULE_PRIO;
 
+	if (npc_priv.kw == NPC_MCAM_KEY_X4) {
+		req.kw_type = NPC_MCAM_KEY_X4;
+		req.ref_entry = eidx & (npc_priv.bank_depth - 1);
+	} else {
+		req.kw_type = NPC_MCAM_KEY_X2;
+		req.ref_entry = eidx;
+	}
+
 	req.contig = false;
 	req.ref_prio = NPC_MCAM_HIGHER_PRIO;
-	req.ref_entry = eidx;
-	req.kw_type = NPC_MCAM_KEY_X2;
 	req.count = cnt;
 	req.hdr.pcifunc = pcifunc;
 
@@ -4533,11 +4539,18 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	 * as NPC_DFT_RULE_PRIO - 1 (higher hw priority)
 	 */
 	req.contig = false;
-	req.kw_type = NPC_MCAM_KEY_X2;
 	req.count = cnt;
 	req.hdr.pcifunc = pcifunc;
 	req.ref_prio = NPC_MCAM_LOWER_PRIO;
-	req.ref_entry = eidx + 1;
+
+	if (npc_priv.kw == NPC_MCAM_KEY_X4) {
+		req.kw_type = NPC_MCAM_KEY_X4;
+		req.ref_entry = eidx & (npc_priv.bank_depth - 1);
+	} else {
+		req.kw_type = NPC_MCAM_KEY_X2;
+		req.ref_entry = eidx;
+	}
+
 	ret = rvu_mbox_handler_npc_mcam_alloc_entry(rvu, &req, &rsp);
 	if (ret) {
 		dev_err(rvu->dev,
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c
index 34f1e066707b..a22decbe3449 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc_fs.c
@@ -1671,9 +1671,11 @@ rvu_npc_alloc_entry_for_flow_install(struct rvu *rvu,
 {
 	struct npc_mcam_alloc_entry_req entry_req;
 	struct npc_mcam_alloc_entry_rsp entry_rsp;
+	struct npc_get_pfl_info_rsp rsp = { 0 };
 	struct npc_get_num_kws_req kws_req;
 	struct npc_get_num_kws_rsp kws_rsp;
 	int off, kw_bits, rc;
+	struct msg_req req;
 	u8 *src, *dst;
 
 	if (!is_cn20k(rvu->pdev)) {
@@ -1697,8 +1699,16 @@ rvu_npc_alloc_entry_for_flow_install(struct rvu *rvu,
 	kw_bits = kws_rsp.kws * 64;
 
 	*kw_type = NPC_MCAM_KEY_X2;
-	if (kw_bits > 256)
+	if (kw_bits > 256) {
+		rvu_mbox_handler_npc_get_pfl_info(rvu, &req, &rsp);
+		if (rsp.kw_type == NPC_MCAM_KEY_X2) {
+			dev_err(rvu->dev,
+				"Only X2 entries are supported in X2 profile\n");
+			return -EOPNOTSUPP;
+		}
+
 		*kw_type = NPC_MCAM_KEY_X4;
+	}
 
 	memset(&entry_req, 0, sizeof(entry_req));
 	memset(&entry_rsp, 0, sizeof(entry_rsp));
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
index 38cc539d724d..5dd0591fed99 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_flows.c
@@ -37,14 +37,13 @@ static void otx2_clear_ntuple_flow_info(struct otx2_nic *pfvf, struct otx2_flow_
 	flow_cfg->max_flows = 0;
 }
 
-static int otx2_mcam_pfl_info_get(struct otx2_nic *pfvf, bool *is_x2,
-				  u16 *x4_slots)
+static int otx2_mcam_pfl_info_get(struct otx2_nic *pfvf, u16 *x4_slots, u8 *kw_type)
 {
 	struct npc_get_pfl_info_rsp *rsp;
 	struct msg_req *req;
 	static struct {
 		bool is_set;
-		bool is_x2;
+		u8 kw_type;
 		u16 x4_slots;
 	} pfl_info;
 
@@ -53,8 +52,8 @@ static int otx2_mcam_pfl_info_get(struct otx2_nic *pfvf, bool *is_x2,
 	 */
 	mutex_lock(&pfvf->mbox.lock);
 	if (pfl_info.is_set) {
-		*is_x2 = pfl_info.is_x2;
 		*x4_slots = pfl_info.x4_slots;
+		*kw_type = pfl_info.kw_type;
 		mutex_unlock(&pfvf->mbox.lock);
 		return 0;
 	}
@@ -79,16 +78,16 @@ static int otx2_mcam_pfl_info_get(struct otx2_nic *pfvf, bool *is_x2,
 		return -EFAULT;
 	}
 
-	*is_x2 = (rsp->kw_type == NPC_MCAM_KEY_X2);
-	if (*is_x2)
-		*x4_slots = 0;
+	pfl_info.kw_type = rsp->kw_type;
+	if (rsp->kw_type == NPC_MCAM_KEY_X2)
+		pfl_info.x4_slots = 0;
 	else
-		*x4_slots = rsp->x4_slots;
-
-	pfl_info.is_x2 = *is_x2;
-	pfl_info.x4_slots = *x4_slots;
+		pfl_info.x4_slots = rsp->x4_slots;
 	pfl_info.is_set = true;
 
+	*x4_slots = pfl_info.x4_slots;
+	*kw_type = pfl_info.kw_type;
+
 	mutex_unlock(&pfvf->mbox.lock);
 	return 0;
 }
@@ -164,6 +163,7 @@ int otx2_alloc_mcam_entries(struct otx2_nic *pfvf, u16 count)
 	u16 dft_idx = 0, x4_slots = 0;
 	int ent, allocated = 0, ref;
 	bool is_x2 = false;
+	u8 kw_type = 0;
 	int rc;
 
 	/* Free current ones and allocate new ones with requested count */
@@ -182,12 +182,14 @@ int otx2_alloc_mcam_entries(struct otx2_nic *pfvf, u16 count)
 	}
 
 	if (is_cn20k(pfvf->pdev)) {
-		rc = otx2_mcam_pfl_info_get(pfvf, &is_x2, &x4_slots);
+		rc = otx2_mcam_pfl_info_get(pfvf, &x4_slots, &kw_type);
 		if (rc) {
 			netdev_err(pfvf->netdev, "Error to retrieve profile info\n");
 			return rc;
 		}
 
+		is_x2 = kw_type == NPC_MCAM_KEY_X2;
+
 		rc = otx2_get_dft_rl_idx(pfvf, &dft_idx);
 		if (rc) {
 			netdev_err(pfvf->netdev,
@@ -289,6 +291,8 @@ int otx2_mcam_entry_init(struct otx2_nic *pfvf)
 	struct npc_mcam_alloc_entry_rsp *rsp;
 	int vf_vlan_max_flows, count;
 	int rc, ref, prio, ent;
+	u8 kw_type = 0;
+	u16 x4_slots;
 	u16 dft_idx;
 
 	ref = 0;
@@ -315,6 +319,16 @@ int otx2_mcam_entry_init(struct otx2_nic *pfvf)
 	if (!flow_cfg->def_ent)
 		return -ENOMEM;
 
+	kw_type = NPC_MCAM_KEY_X2;
+	if (is_cn20k(pfvf->pdev)) {
+		rc = otx2_mcam_pfl_info_get(pfvf, &x4_slots, &kw_type);
+		if (rc) {
+			netdev_err(pfvf->netdev,
+				   "Error to get pfl info\n");
+			return rc;
+		}
+	}
+
 	mutex_lock(&pfvf->mbox.lock);
 
 	req = otx2_mbox_alloc_msg_npc_mcam_alloc_entry(&pfvf->mbox);
@@ -324,6 +338,10 @@ int otx2_mcam_entry_init(struct otx2_nic *pfvf)
 	}
 
 	req->kw_type = NPC_MCAM_KEY_X2;
+	if (is_cn20k(pfvf->pdev) && kw_type == NPC_MCAM_KEY_X4) {
+		req->kw_type = NPC_MCAM_KEY_X4;
+		ref &= (x4_slots - 1);
+	}
 	req->contig = false;
 	req->count = count;
 	req->ref_prio = prio;
@@ -1174,15 +1192,14 @@ static int otx2_add_flow_msg(struct otx2_nic *pfvf, struct otx2_flow *flow)
 #ifdef CONFIG_DCB
 	int vlan_prio, qidx, pfc_rule = 0;
 #endif
+	bool modify = false, is_x2;
 	int err, vf = 0, off, sz;
-	bool modify = false;
 	u8 kw_type = 0;
 	u8 *src, *dst;
 	u16 x4_slots;
-	bool is_x2;
 
 	if (is_cn20k(pfvf->pdev)) {
-		err = otx2_mcam_pfl_info_get(pfvf, &is_x2, &x4_slots);
+		err = otx2_mcam_pfl_info_get(pfvf, &x4_slots, &kw_type);
 		if (err) {
 			netdev_err(pfvf->netdev,
 				   "Error to retrieve NPC profile info, pcifunc=%#x\n",
@@ -1190,6 +1207,7 @@ static int otx2_add_flow_msg(struct otx2_nic *pfvf, struct otx2_flow *flow)
 			return -EFAULT;
 		}
 
+		is_x2 = kw_type == NPC_MCAM_KEY_X2;
 		if (!is_x2) {
 			err = otx2_prepare_flow_request(&flow->flow_spec,
 							&treq);
-- 
2.43.0


* [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.
  2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
                   ` (7 preceding siblings ...)
  2026-06-05  6:32 ` [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc Ratheesh Kannoth
@ 2026-06-05  6:32 ` Ratheesh Kannoth
  2026-06-08  2:25   ` Ratheesh Kannoth
  2026-06-08  2:32   ` Ratheesh Kannoth
  8 siblings, 2 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  6:32 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham, Ratheesh Kannoth

Replace the file-scope static npc_priv with a kcalloc'd struct whose
bank/subbank geometry is filled in from hardware at init time, so
num_banks is no longer a compile-time constant. Drop init_done and
treat a non-NULL npc_priv pointer as the liveness indicator. Convert
the CN20K NPC code paths to pointer access and have npc_priv_get()
return the allocated pointer. Extend teardown to kfree the root struct
on init failure and in npc_cn20k_deinit(), and size the MCAM section
setup from the discovered subbank count.
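
The lifecycle described above (allocate, fill geometry, publish the
pointer, use NULL as "not initialised") can be sketched in userspace C
as follows. This is a simplified illustration, not the driver code:
struct npc_priv_sketch, npc_priv_init(), npc_priv_deinit() and the
separately allocated sb array are hypothetical stand-ins, and deriving
num_subbanks as bank_depth / subbank_depth is an assumption based on
the sb_id = bank_off / subbank_depth lookup in the hunks below. The
npc_is_vidx_sketch() check mirrors the patched npc_is_vidx() test
(vidx >= bank_depth * 2), guarded by the NULL check that the patch uses
in npc_cn20k_vidx2idx().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct npc_priv_t. */
struct npc_priv_sketch {
	int num_banks;
	int bank_depth;
	int subbank_depth;
	int num_subbanks;
	int *sb;	/* per-subbank state, sized at runtime (assumption) */
};

/* NULL means "not initialised"; replaces a separate init_done flag. */
static struct npc_priv_sketch *npc_priv;

static int npc_priv_init(int num_banks, int bank_depth, int subbank_depth)
{
	struct npc_priv_sketch *p;

	if (subbank_depth <= 0 || bank_depth % subbank_depth)
		return -1;

	p = calloc(1, sizeof(*p));
	if (!p)
		return -1;

	/* Geometry comes from "hardware" (here: the caller). */
	p->num_banks = num_banks;
	p->bank_depth = bank_depth;
	p->subbank_depth = subbank_depth;
	p->num_subbanks = bank_depth / subbank_depth;

	p->sb = calloc(p->num_subbanks, sizeof(*p->sb));
	if (!p->sb) {
		free(p);	/* free the root struct on failure */
		return -1;
	}

	npc_priv = p;	/* publish only once fully set up */
	return 0;
}

static void npc_priv_deinit(void)
{
	if (!npc_priv)
		return;
	free(npc_priv->sb);
	free(npc_priv);
	npc_priv = NULL;
}

/* Virtual indexes start at 2 * bank_depth; none exist before init. */
static bool npc_is_vidx_sketch(uint16_t vidx)
{
	if (!npc_priv)
		return false;
	return vidx >= npc_priv->bank_depth * 2;
}
```

For example, with hypothetical geometry of bank_depth 8192 and
subbank_depth 256, init yields 32 subbanks and index 16384 is the first
virtual index; after deinit the pointer is NULL again and callers fall
back to the pass-through path.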

Allocate the MCAM debugfs dstats array with devm_kzalloc() instead of
a static matrix, and use the allocated backing store consistently when
computing deltas (including the counter-rollover comparison).
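
A minimal userspace sketch of the dstats change follows. It keeps the
old matrix shape but reaches it through a pointer to a whole 2-D array,
so a single allocation suffices; note that dstats[0][bank][idx], as
written in the hunk below, addresses the same element as
(*dstats)[bank][idx]. SK_NUM_BANKS, SK_ENTRIES, dstats_alloc() and
dstats_delta() are hypothetical names, calloc() stands in for
devm_kzalloc(), and returning 0 stands in for the "continue" (skip
printing) paths in npc_mcam_dstats_show().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for MAX_NUM_BANKS and
 * MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS.
 */
#define SK_NUM_BANKS	2
#define SK_ENTRIES	8

/* Pointer to an entire 2-D array: same indexing, heap-backed. */
static uint64_t (*dstats)[SK_NUM_BANKS][SK_ENTRIES];

static int dstats_alloc(void)
{
	dstats = calloc(1, sizeof(*dstats));	/* zeroed, like devm_kzalloc() */
	return dstats ? 0 : -1;
}

/* Delta since the previous read, following the hunk's rules:
 * a zero or unchanged counter yields nothing (0 here), and a counter
 * that went backwards (rollover/reset) is measured from zero.
 */
static uint64_t dstats_delta(int bank, int idx, uint64_t stats)
{
	uint64_t delta;

	if (!stats || stats == (*dstats)[bank][idx])
		return 0;

	if (stats < (*dstats)[bank][idx])
		(*dstats)[bank][idx] = 0;

	delta = stats - (*dstats)[bank][idx];
	(*dstats)[bank][idx] = stats;	/* remember for the next read */
	return delta;
}
```

Successive reads of 100, 100, 150 and then 20 for one entry produce
deltas of 100, 0 (skipped), 50 and 20; the last read is treated as a
rollover and measured from zero.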

Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
---
 .../marvell/octeontx2/af/cn20k/debugfs.c      |  17 +-
 .../ethernet/marvell/octeontx2/af/cn20k/npc.c | 442 +++++++++---------
 .../ethernet/marvell/octeontx2/af/cn20k/npc.h |   4 +-
 3 files changed, 240 insertions(+), 223 deletions(-)

diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
index 730ef97a57e6..b6fda42e44c7 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/debugfs.c
@@ -176,7 +176,8 @@ static DEFINE_MUTEX(stats_lock);
  * hard limit on all silicon variants, preventing any possibility of
  * out-of-bounds access.
  */
-static u64 dstats[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS] = {};
+static u64 (*dstats)[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS];
+
 static int npc_mcam_dstats_show(struct seq_file *s, void *unused)
 {
 	struct npc_priv_t *npc_priv;
@@ -212,24 +213,24 @@ static int npc_mcam_dstats_show(struct seq_file *s, void *unused)
 					   NPC_AF_CN20K_MCAMEX_BANKX_STAT_EXT(idx, bank));
 			if (!stats)
 				continue;
-			if (stats == dstats[bank][idx])
+			if (stats == dstats[0][bank][idx])
 				continue;
 
-			if (stats < dstats[bank][idx])
-				dstats[bank][idx] = 0;
+			if (stats < dstats[0][bank][idx])
+				dstats[0][bank][idx] = 0;
 
 			pf = 0xFFFF;
 			map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
 			if (map)
 				pf = xa_to_value(map);
 
-			delta = stats - dstats[bank][idx];
+			delta = stats - dstats[0][bank][idx];
 
 			snprintf(buff, sizeof(buff), "%u\t%#04x\t%llu\n",
 				 mcam_idx, pf, delta);
 			seq_puts(s, buff);
 
-			dstats[bank][idx] = stats;
+			dstats[0][bank][idx] = stats;
 		}
 	}
 
@@ -397,6 +398,10 @@ int npc_cn20k_debugfs_init(struct rvu *rvu)
 	debugfs_create_file("vidx2idx", 0444, rvu->rvu_dbg.npc,
 			    npc_priv, &npc_vidx2idx_map_fops);
 
+	dstats = devm_kzalloc(rvu->dev, sizeof(*dstats), GFP_KERNEL);
+	if (!dstats)
+		return -ENOMEM;
+
 	debugfs_create_file("dstats", 0444, rvu->rvu_dbg.npc, rvu,
 			    &npc_mcam_dstats_fops);
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
index d76aad867934..26618c652483 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.c
@@ -16,9 +16,7 @@
 #include "cn20k/reg.h"
 #include "rvu_npc_fs.h"
 
-static struct npc_priv_t npc_priv = {
-	.num_banks = MAX_NUM_BANKS,
-};
+static struct npc_priv_t *npc_priv;
 
 static const char *npc_kw_name[NPC_MCAM_KEY_MAX] = {
 	[NPC_MCAM_KEY_DYN] = "DYNAMIC",
@@ -226,7 +224,7 @@ static u16 npc_idx2vidx(u16 idx)
 	vidx = idx;
 	index = idx;
 
-	map = xa_load(&npc_priv.xa_idx2vidx_map, index);
+	map = xa_load(&npc_priv->xa_idx2vidx_map, index);
 	if (!map)
 		goto done;
 
@@ -242,7 +240,7 @@ static u16 npc_idx2vidx(u16 idx)
 
 static bool npc_is_vidx(u16 vidx)
 {
-	return vidx >= npc_priv.bank_depth * 2;
+	return vidx >= npc_priv->bank_depth * 2;
 }
 
 static u16 npc_vidx2idx(u16 vidx)
@@ -256,7 +254,7 @@ static u16 npc_vidx2idx(u16 vidx)
 	idx = vidx;
 	index = vidx;
 
-	map = xa_load(&npc_priv.xa_vidx2idx_map, index);
+	map = xa_load(&npc_priv->xa_vidx2idx_map, index);
 	if (!map)
 		goto done;
 
@@ -272,7 +270,7 @@ static u16 npc_vidx2idx(u16 vidx)
 
 u16 npc_cn20k_vidx2idx(u16 idx)
 {
-	if (!npc_priv.init_done)
+	if (!npc_priv)
 		return idx;
 
 	if (!npc_is_vidx(idx))
@@ -283,7 +281,7 @@ u16 npc_cn20k_vidx2idx(u16 idx)
 
 u16 npc_cn20k_idx2vidx(u16 idx)
 {
-	if (!npc_priv.init_done)
+	if (!npc_priv)
 		return idx;
 
 	if (npc_is_vidx(idx))
@@ -306,7 +304,7 @@ static int npc_vidx_maps_del_entry(struct rvu *rvu, u16 vidx, u16 *old_midx)
 
 	mcam_idx = npc_vidx2idx(vidx);
 
-	map = xa_erase(&npc_priv.xa_vidx2idx_map, vidx);
+	map = xa_erase(&npc_priv->xa_vidx2idx_map, vidx);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: vidx(%u) does not map to proper mcam idx\n",
@@ -314,7 +312,7 @@ static int npc_vidx_maps_del_entry(struct rvu *rvu, u16 vidx, u16 *old_midx)
 		return -ESRCH;
 	}
 
-	map = xa_erase(&npc_priv.xa_idx2vidx_map, mcam_idx);
+	map = xa_erase(&npc_priv->xa_idx2vidx_map, mcam_idx);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: vidx(%u) is not valid\n",
@@ -341,7 +339,7 @@ static int npc_vidx_maps_modify(struct rvu *rvu, u16 vidx, u16 new_midx)
 		return -ESRCH;
 	}
 
-	map = xa_erase(&npc_priv.xa_vidx2idx_map, vidx);
+	map = xa_erase(&npc_priv->xa_vidx2idx_map, vidx);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: vidx(%u) could not be deleted from vidx2idx map\n",
@@ -351,7 +349,7 @@ static int npc_vidx_maps_modify(struct rvu *rvu, u16 vidx, u16 new_midx)
 
 	old_midx = xa_to_value(map);
 
-	rc = xa_insert(&npc_priv.xa_vidx2idx_map, vidx,
+	rc = xa_insert(&npc_priv->xa_vidx2idx_map, vidx,
 		       xa_mk_value(new_midx), GFP_KERNEL);
 	if (rc) {
 		dev_err(rvu->dev,
@@ -360,7 +358,7 @@ static int npc_vidx_maps_modify(struct rvu *rvu, u16 vidx, u16 new_midx)
 		goto fail1;
 	}
 
-	map = xa_erase(&npc_priv.xa_idx2vidx_map, old_midx);
+	map = xa_erase(&npc_priv->xa_idx2vidx_map, old_midx);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: old_midx(%u, vidx(%u)) cannot be added to idx2vidx map\n",
@@ -369,7 +367,7 @@ static int npc_vidx_maps_modify(struct rvu *rvu, u16 vidx, u16 new_midx)
 		goto fail2;
 	}
 
-	rc = xa_insert(&npc_priv.xa_idx2vidx_map, new_midx,
+	rc = xa_insert(&npc_priv->xa_idx2vidx_map, new_midx,
 		       xa_mk_value(vidx), GFP_KERNEL);
 	if (rc) {
 		dev_err(rvu->dev,
@@ -382,21 +380,21 @@ static int npc_vidx_maps_modify(struct rvu *rvu, u16 vidx, u16 new_midx)
 
 fail3:
 	/* Restore vidx at old_midx location */
-	if (xa_insert(&npc_priv.xa_idx2vidx_map, old_midx,
+	if (xa_insert(&npc_priv->xa_idx2vidx_map, old_midx,
 		      xa_mk_value(vidx), GFP_KERNEL))
 		dev_err(rvu->dev,
 			"%s: Error to roll back idx2vidx old_midx=%u vidx=%u\n",
 			__func__, old_midx, vidx);
 fail2:
 	/* Erase new_midx inserted at vidx */
-	if (!xa_erase(&npc_priv.xa_vidx2idx_map, vidx))
+	if (!xa_erase(&npc_priv->xa_vidx2idx_map, vidx))
 		dev_err(rvu->dev,
 			"%s: Failed to roll back vidx2idx vidx=%u\n",
 			__func__, vidx);
 
 fail1:
 	/* Restore old_midx at vidx location */
-	if (xa_insert(&npc_priv.xa_vidx2idx_map, vidx,
+	if (xa_insert(&npc_priv->xa_vidx2idx_map, vidx,
 		      xa_mk_value(old_midx), GFP_KERNEL))
 		dev_err(rvu->dev,
 			"%s: Failed to roll back vidx2idx to old_midx=%u, vidx=%u\n",
@@ -412,10 +410,10 @@ static int npc_vidx_maps_add_entry(struct rvu *rvu, u16 mcam_idx, int pcifunc,
 	u32 id;
 
 	/* Virtual index start from maximum mcam index + 1 */
-	max = npc_priv.bank_depth * 2 * 2 - 1;
-	min = npc_priv.bank_depth * 2;
+	max = npc_priv->bank_depth * 2 * 2 - 1;
+	min = npc_priv->bank_depth * 2;
 
-	rc = xa_alloc(&npc_priv.xa_vidx2idx_map, &id,
+	rc = xa_alloc(&npc_priv->xa_vidx2idx_map, &id,
 		      xa_mk_value(mcam_idx),
 		      XA_LIMIT(min, max), GFP_KERNEL);
 	if (rc) {
@@ -425,7 +423,7 @@ static int npc_vidx_maps_add_entry(struct rvu *rvu, u16 mcam_idx, int pcifunc,
 		goto fail1;
 	}
 
-	rc = xa_insert(&npc_priv.xa_idx2vidx_map, mcam_idx,
+	rc = xa_insert(&npc_priv->xa_idx2vidx_map, mcam_idx,
 		       xa_mk_value(id), GFP_KERNEL);
 	if (rc) {
 		dev_err(rvu->dev,
@@ -440,7 +438,7 @@ static int npc_vidx_maps_add_entry(struct rvu *rvu, u16 mcam_idx, int pcifunc,
 	return 0;
 
 fail2:
-	xa_erase(&npc_priv.xa_vidx2idx_map, id);
+	xa_erase(&npc_priv->xa_vidx2idx_map, id);
 fail1:
 	return rc;
 }
@@ -691,7 +689,7 @@ void npc_cn20k_parser_profile_init(struct rvu *rvu, int blkaddr)
 
 struct npc_priv_t *npc_priv_get(void)
 {
-	return &npc_priv;
+	return npc_priv;
 }
 
 static void npc_program_mkex_rx(struct rvu *rvu, int blkaddr,
@@ -860,9 +858,9 @@ npc_cn20k_enable_mcam_entry(struct rvu *rvu, int blkaddr,
 
 update_en_map:
 	if (enable)
-		set_bit(index, npc_priv.en_map);
+		set_bit(index, npc_priv->en_map);
 	else
-		clear_bit(index, npc_priv.en_map);
+		clear_bit(index, npc_priv->en_map);
 
 	return 0;
 }
@@ -1751,28 +1749,28 @@ int npc_mcam_idx_2_key_type(struct rvu *rvu, u16 mcam_idx, u8 *key_type)
 	int bank_off, sb_id;
 
 	/* mcam_idx should be less than (2 * bank depth) */
-	if (mcam_idx >= npc_priv.bank_depth * 2) {
+	if (mcam_idx >= npc_priv->bank_depth * 2) {
 		dev_err(rvu->dev, "%s: bad params\n",
 			__func__);
 		return -EINVAL;
 	}
 
 	/* find mcam offset per bank */
-	bank_off = mcam_idx & (npc_priv.bank_depth - 1);
+	bank_off = mcam_idx & (npc_priv->bank_depth - 1);
 
 	/* Find subbank id */
-	sb_id = bank_off / npc_priv.subbank_depth;
+	sb_id = bank_off / npc_priv->subbank_depth;
 
 	/* Check if subbank id is more than maximum
 	 * number of subbanks available
 	 */
-	if (sb_id >= npc_priv.num_subbanks) {
+	if (sb_id >= npc_priv->num_subbanks) {
 		dev_err(rvu->dev, "%s: invalid subbank %d\n",
 			__func__, sb_id);
 		return -EINVAL;
 	}
 
-	sb = &npc_priv.sb[sb_id];
+	sb = &npc_priv->sb[sb_id];
 
 	*key_type = sb->key_type;
 
@@ -1788,7 +1786,7 @@ static int npc_subbank_idx_2_mcam_idx(struct rvu *rvu, struct npc_subbank *sb,
 	 * subsection depth - 1
 	 */
 	if (sb->key_type == NPC_MCAM_KEY_X4 &&
-	    sub_off >= npc_priv.subbank_depth) {
+	    sub_off >= npc_priv->subbank_depth) {
 		dev_err(rvu->dev,
 			"%s: Failed to get mcam idx (x4) sb->idx=%u sub_off=%u",
 			__func__, sb->idx, sub_off);
@@ -1799,7 +1797,7 @@ static int npc_subbank_idx_2_mcam_idx(struct rvu *rvu, struct npc_subbank *sb,
 	 * 2 * subsection depth - 1
 	 */
 	if (sb->key_type == NPC_MCAM_KEY_X2 &&
-	    sub_off >= npc_priv.subbank_depth * 2) {
+	    sub_off >= npc_priv->subbank_depth * 2) {
 		dev_err(rvu->dev,
 			"%s: Failed to get mcam idx (x2) sb->idx=%u sub_off=%u",
 			__func__, sb->idx, sub_off);
@@ -1807,12 +1805,12 @@ static int npc_subbank_idx_2_mcam_idx(struct rvu *rvu, struct npc_subbank *sb,
 	}
 
 	/* Find subbank offset from respective subbank (w.r.t bank) */
-	off = sub_off & (npc_priv.subbank_depth - 1);
+	off = sub_off & (npc_priv->subbank_depth - 1);
 
 	/* if subsection idx is in bank1, add bank depth,
 	 * which is part of sb->b1b
 	 */
-	bot = sub_off >= npc_priv.subbank_depth ? sb->b1b : sb->b0b;
+	bot = sub_off >= npc_priv->subbank_depth ? sb->b1b : sb->b0b;
 
 	*mcam_idx = bot + off;
 	return 0;
@@ -1825,37 +1823,37 @@ int npc_mcam_idx_2_subbank_idx(struct rvu *rvu, u16 mcam_idx,
 	int bank_off, sb_id;
 
 	/* mcam_idx should be less than (2 * bank depth) */
-	if (mcam_idx >= npc_priv.bank_depth * 2) {
+	if (mcam_idx >= npc_priv->bank_depth * 2) {
 		dev_err(rvu->dev, "%s: Invalid mcam idx %u\n",
 			__func__, mcam_idx);
 		return -EINVAL;
 	}
 
 	/* find mcam offset per bank */
-	bank_off = mcam_idx & (npc_priv.bank_depth - 1);
+	bank_off = mcam_idx & (npc_priv->bank_depth - 1);
 
 	/* Find subbank id */
-	sb_id = bank_off / npc_priv.subbank_depth;
+	sb_id = bank_off / npc_priv->subbank_depth;
 
 	/* Check if subbank id is more than maximum
 	 * number of subbanks available
 	 */
-	if (sb_id >= npc_priv.num_subbanks) {
+	if (sb_id >= npc_priv->num_subbanks) {
 		dev_err(rvu->dev, "%s: invalid subbank %d\n",
 			__func__, sb_id);
 		return -EINVAL;
 	}
 
-	*sb = &npc_priv.sb[sb_id];
+	*sb = &npc_priv->sb[sb_id];
 
 	/* Subbank offset per bank */
-	*sb_off = bank_off % npc_priv.subbank_depth;
+	*sb_off = bank_off % npc_priv->subbank_depth;
 
 	/* Index in a subbank should add subbank depth
 	 * if it is in bank1
 	 */
-	if (mcam_idx >= npc_priv.bank_depth)
-		*sb_off += npc_priv.subbank_depth;
+	if (mcam_idx >= npc_priv->bank_depth)
+		*sb_off += npc_priv->subbank_depth;
 
 	return 0;
 }
@@ -1871,9 +1869,9 @@ static int __npc_subbank_contig_alloc(struct rvu *rvu,
 	int k, offset, delta = 0;
 	int cnt = 0, sbd;
 
-	sbd = npc_priv.subbank_depth;
+	sbd = npc_priv->subbank_depth;
 
-	if (sidx >= npc_priv.bank_depth)
+	if (sidx >= npc_priv->bank_depth)
 		delta = sbd;
 
 	switch (prio) {
@@ -1940,8 +1938,8 @@ static int __npc_subbank_non_contig_alloc(struct rvu *rvu,
 	int cnt = 0, delta;
 	int k, sbd;
 
-	sbd = npc_priv.subbank_depth;
-	delta = sidx >= npc_priv.bank_depth ? sbd : 0;
+	sbd = npc_priv->subbank_depth;
+	delta = sidx >= npc_priv->bank_depth ? sbd : 0;
 
 	switch (prio) {
 		/* Find an area of size 'count' from sidx to eidx */
@@ -2002,7 +2000,7 @@ static void __npc_subbank_sboff_2_off(struct rvu *rvu, struct npc_subbank *sb,
 {
 	int sbd;
 
-	sbd = npc_priv.subbank_depth;
+	sbd = npc_priv->subbank_depth;
 
 	*off = sb_off & (sbd - 1);
 	*bmap = (sb_off >= sbd) ? sb->b1map : sb->b0map;
@@ -2051,20 +2049,20 @@ static int __npc_subbank_mark_free(struct rvu *rvu, struct npc_subbank *sb)
 	sb->flags = NPC_SUBBANK_FLAG_FREE;
 	sb->key_type = 0;
 
-	bitmap_clear(sb->b0map, 0, npc_priv.subbank_depth);
-	bitmap_clear(sb->b1map, 0, npc_priv.subbank_depth);
+	bitmap_clear(sb->b0map, 0, npc_priv->subbank_depth);
+	bitmap_clear(sb->b1map, 0, npc_priv->subbank_depth);
 
-	if (!xa_erase(&npc_priv.xa_sb_used, sb->arr_idx)) {
+	if (!xa_erase(&npc_priv->xa_sb_used, sb->arr_idx)) {
 		dev_err(rvu->dev,
 			"%s: Error to delete from xa_sb_used array\n",
 			__func__);
 		return -EFAULT;
 	}
 
-	rc = xa_insert(&npc_priv.xa_sb_free, sb->arr_idx,
+	rc = xa_insert(&npc_priv->xa_sb_free, sb->arr_idx,
 		       xa_mk_value(sb->idx), GFP_KERNEL);
 	if (rc) {
-		rc = xa_insert(&npc_priv.xa_sb_used, sb->arr_idx,
+		rc = xa_insert(&npc_priv->xa_sb_used, sb->arr_idx,
 			       xa_mk_value(sb->idx), GFP_KERNEL);
 		if (rc)
 			dev_err(rvu->dev,
@@ -2093,21 +2091,21 @@ static int __npc_subbank_mark_used(struct rvu *rvu, struct npc_subbank *sb,
 	sb->flags = NPC_SUBBANK_FLAG_USED;
 	sb->key_type = key_type;
 	if (key_type == NPC_MCAM_KEY_X4)
-		sb->free_cnt = npc_priv.subbank_depth;
+		sb->free_cnt = npc_priv->subbank_depth;
 	else
-		sb->free_cnt = 2 * npc_priv.subbank_depth;
+		sb->free_cnt = 2 * npc_priv->subbank_depth;
 
-	bitmap_clear(sb->b0map, 0, npc_priv.subbank_depth);
-	bitmap_clear(sb->b1map, 0, npc_priv.subbank_depth);
+	bitmap_clear(sb->b0map, 0, npc_priv->subbank_depth);
+	bitmap_clear(sb->b1map, 0, npc_priv->subbank_depth);
 
-	if (!xa_erase(&npc_priv.xa_sb_free, sb->arr_idx)) {
+	if (!xa_erase(&npc_priv->xa_sb_free, sb->arr_idx)) {
 		dev_err(rvu->dev,
 			"%s: Error to delete from xa_sb_free array\n",
 			__func__);
 		return -EFAULT;
 	}
 
-	rc = xa_insert(&npc_priv.xa_sb_used, sb->arr_idx,
+	rc = xa_insert(&npc_priv->xa_sb_used, sb->arr_idx,
 		       xa_mk_value(sb->idx), GFP_KERNEL);
 	if (rc)
 		dev_err(rvu->dev,
@@ -2131,10 +2129,10 @@ static bool __npc_subbank_free(struct rvu *rvu, struct npc_subbank *sb,
 
 	/* Check whether we can mark whole subbank as free */
 	if (sb->key_type == NPC_MCAM_KEY_X4) {
-		if (sb->free_cnt < npc_priv.subbank_depth)
+		if (sb->free_cnt < npc_priv->subbank_depth)
 			goto done;
 	} else {
-		if (sb->free_cnt < 2 * npc_priv.subbank_depth)
+		if (sb->free_cnt < 2 * npc_priv->subbank_depth)
 			goto done;
 	}
 
@@ -2213,7 +2211,7 @@ static int __npc_subbank_alloc(struct rvu *rvu, struct npc_subbank *sb,
 
 	/* x4 indexes are from 0 to bank size as it combines two x2 banks */
 	if (key_type == NPC_MCAM_KEY_X4 &&
-	    (ref >= npc_priv.bank_depth || limit >= npc_priv.bank_depth)) {
+	    (ref >= npc_priv->bank_depth || limit >= npc_priv->bank_depth)) {
 		dev_err(rvu->dev,
 			"%s: Wrong ref_enty(%d) or limit(%d) for x4\n",
 			__func__, ref, limit);
@@ -2223,8 +2221,8 @@ static int __npc_subbank_alloc(struct rvu *rvu, struct npc_subbank *sb,
 	/* This function is called either bank0 or bank1 portion of a subbank.
 	 * so ref and limit should be on same bank.
 	 */
-	diffbank = !!((ref & npc_priv.bank_depth) ^
-		      (limit & npc_priv.bank_depth));
+	diffbank = !!((ref & npc_priv->bank_depth) ^
+		      (limit & npc_priv->bank_depth));
 	if (diffbank) {
 		dev_err(rvu->dev,
 			"%s: request ref and limit should be from same bank\n",
@@ -2248,7 +2246,7 @@ static int __npc_subbank_alloc(struct rvu *rvu, struct npc_subbank *sb,
 	 * or equal to mcam entries available in the subbank if contig.
 	 */
 	if (sb->flags & NPC_SUBBANK_FLAG_FREE) {
-		if (contig && count > npc_priv.subbank_depth) {
+		if (contig && count > npc_priv->subbank_depth) {
 			dev_err(rvu->dev, "%s: Less number of entries\n",
 				__func__);
 			return -ENOSPC;
@@ -2271,10 +2269,10 @@ static int __npc_subbank_alloc(struct rvu *rvu, struct npc_subbank *sb,
 	}
 
 process:
-	/* if ref or limit >= npc_priv.bank_depth, index are in bank1.
+	/* if ref or limit >= npc_priv->bank_depth, index are in bank1.
 	 * else bank0.
 	 */
-	if (ref >= npc_priv.bank_depth) {
+	if (ref >= npc_priv->bank_depth) {
 		bmap = sb->b1map;
 		t = sb->b1t;
 		b = sb->b1b;
@@ -2285,8 +2283,8 @@ static int __npc_subbank_alloc(struct rvu *rvu, struct npc_subbank *sb,
 	}
 
 	/* Calculate free slots */
-	bw = bitmap_weight(bmap, npc_priv.subbank_depth);
-	bfree = npc_priv.subbank_depth - bw;
+	bw = bitmap_weight(bmap, npc_priv->subbank_depth);
+	bfree = npc_priv->subbank_depth - bw;
 
 	if (!bfree) {
 		dev_dbg(rvu->dev, "%s: subbank is full\n", __func__);
@@ -2415,7 +2413,7 @@ npc_del_from_pf_maps(struct rvu *rvu, u16 mcam_idx)
 	int pcifunc, idx;
 	void *map;
 
-	map = xa_erase(&npc_priv.xa_idx2pf_map, mcam_idx);
+	map = xa_erase(&npc_priv->xa_idx2pf_map, mcam_idx);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: failed to erase mcam_idx(%u) from xa_idx2pf map\n",
@@ -2424,7 +2422,7 @@ npc_del_from_pf_maps(struct rvu *rvu, u16 mcam_idx)
 	}
 
 	pcifunc = xa_to_value(map);
-	map = xa_load(&npc_priv.xa_pf_map, pcifunc);
+	map = xa_load(&npc_priv->xa_pf_map, pcifunc);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: failed to find entry for (%u) from xa_pf_map, mcam=%u\n",
@@ -2434,7 +2432,7 @@ npc_del_from_pf_maps(struct rvu *rvu, u16 mcam_idx)
 
 	idx = xa_to_value(map);
 
-	map = xa_erase(&npc_priv.xa_pf2idx_map[idx], mcam_idx);
+	map = xa_erase(&npc_priv->xa_pf2idx_map[idx], mcam_idx);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: failed to erase mcam_idx(%u) from xa_pf2idx_map map\n",
@@ -2454,18 +2452,18 @@ npc_add_to_pf_maps(struct rvu *rvu, u16 mcam_idx, int pcifunc)
 		"%s: add2maps mcam_idx(%u) to xa_idx2pf map pcifunc=%#x\n",
 		__func__, mcam_idx, pcifunc);
 
-	rc = xa_insert(&npc_priv.xa_idx2pf_map, mcam_idx,
+	rc = xa_insert(&npc_priv->xa_idx2pf_map, mcam_idx,
 		       xa_mk_value(pcifunc), GFP_KERNEL);
 
 	if (rc) {
-		map = xa_load(&npc_priv.xa_idx2pf_map, mcam_idx);
+		map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
 		dev_err(rvu->dev,
 			"%s: failed to insert mcam_idx(%u) to xa_idx2pf map, existing value=%lu\n",
 			__func__, mcam_idx, xa_to_value(map));
 		return -EFAULT;
 	}
 
-	map = xa_load(&npc_priv.xa_pf_map, pcifunc);
+	map = xa_load(&npc_priv->xa_pf_map, pcifunc);
 	if (!map) {
 		dev_err(rvu->dev,
 			"%s: failed to find pf map entry for pcifunc=%#x, mcam=%u\n",
@@ -2475,12 +2473,12 @@ npc_add_to_pf_maps(struct rvu *rvu, u16 mcam_idx, int pcifunc)
 
 	idx = xa_to_value(map);
 
-	rc = xa_insert(&npc_priv.xa_pf2idx_map[idx], mcam_idx,
+	rc = xa_insert(&npc_priv->xa_pf2idx_map[idx], mcam_idx,
 		       xa_mk_value(pcifunc), GFP_KERNEL);
 
 	if (rc) {
-		map = xa_load(&npc_priv.xa_pf2idx_map[idx], mcam_idx);
-		xa_erase(&npc_priv.xa_idx2pf_map, mcam_idx);
+		map = xa_load(&npc_priv->xa_pf2idx_map[idx], mcam_idx);
+		xa_erase(&npc_priv->xa_idx2pf_map, mcam_idx);
 		dev_err(rvu->dev,
 			"%s: failed to insert mcam_idx(%u) to xa_pf2idx_map map, earlier value=%lu idx=%u\n",
 			__func__, mcam_idx, xa_to_value(map), idx);
@@ -2510,9 +2508,9 @@ npc_subbank_suits(struct npc_subbank *sb, int key_type)
 	return false;
 }
 
-#define SB_ALIGN_UP(val)   (((val) + npc_priv.subbank_depth) & \
-			    ~((npc_priv.subbank_depth) - 1))
-#define SB_ALIGN_DOWN(val) ALIGN_DOWN((val), npc_priv.subbank_depth)
+#define SB_ALIGN_UP(val)   (((val) + npc_priv->subbank_depth) & \
+			    ~((npc_priv->subbank_depth) - 1))
+#define SB_ALIGN_DOWN(val) ALIGN_DOWN((val), npc_priv->subbank_depth)
 
 static void npc_subbank_iter_down(struct rvu *rvu,
 				  int ref, int limit,
@@ -2538,7 +2536,7 @@ static void npc_subbank_iter_down(struct rvu *rvu,
 	}
 
 	*cur_ref = *cur_limit - 1;
-	align = *cur_ref - npc_priv.subbank_depth + 1;
+	align = *cur_ref - npc_priv->subbank_depth + 1;
 	if (align <= limit) {
 		*stop = true;
 		*cur_limit = limit;
@@ -2578,7 +2576,7 @@ static void npc_subbank_iter_up(struct rvu *rvu,
 	}
 
 	*cur_ref = *cur_limit + 1;
-	align = *cur_ref + npc_priv.subbank_depth - 1;
+	align = *cur_ref + npc_priv->subbank_depth - 1;
 
 	if (align >= limit) {
 		*stop = true;
@@ -2606,17 +2604,17 @@ npc_subbank_iter(struct rvu *rvu, int key_type,
 
 	/* limit and ref should < bank_depth for x4 */
 	if (key_type == NPC_MCAM_KEY_X4) {
-		if (*cur_ref >= npc_priv.bank_depth)
+		if (*cur_ref >= npc_priv->bank_depth)
 			return -EINVAL;
 
-		if (*cur_limit >= npc_priv.bank_depth)
+		if (*cur_limit >= npc_priv->bank_depth)
 			return -EINVAL;
 	}
 	/* limit and ref should < 2 * bank_depth, for x2 */
-	if (*cur_ref >= 2 * npc_priv.bank_depth)
+	if (*cur_ref >= 2 * npc_priv->bank_depth)
 		return -EINVAL;
 
-	if (*cur_limit >= 2 * npc_priv.bank_depth)
+	if (*cur_limit >= 2 * npc_priv->bank_depth)
 		return -EINVAL;
 
 	return 0;
@@ -2651,7 +2649,7 @@ static int npc_idx_free(struct rvu *rvu, u16 *mcam_idx, int count,
 			vidx = npc_idx2vidx(midx);
 		}
 
-		if (midx >= npc_priv.bank_depth * npc_priv.num_banks) {
+		if (midx >= npc_priv->bank_depth * npc_priv->num_banks) {
 			dev_err(rvu->dev,
 				"%s: Invalid mcam_idx=%u cannot be deleted\n",
 				__func__, mcam_idx[i]);
@@ -2846,7 +2844,7 @@ static int npc_subbank_free_cnt(struct rvu *rvu, struct npc_subbank *sb,
 {
 	int cnt, spd;
 
-	spd = npc_priv.subbank_depth;
+	spd = npc_priv->subbank_depth;
 	mutex_lock(&sb->lock);
 
 	if (sb->flags & NPC_SUBBANK_FLAG_FREE)
@@ -3005,7 +3003,7 @@ static int npc_subbank_noref_alloc(struct rvu *rvu, int key_type, bool contig,
 	max_alloc = !contig;
 
 	/* Check used subbanks for free slots */
-	xa_for_each(&npc_priv.xa_sb_used, index, val) {
+	xa_for_each(&npc_priv->xa_sb_used, index, val) {
 		idx = xa_to_value(val);
 
 		/* Minimize allocation from restricted subbanks
@@ -3014,7 +3012,7 @@ static int npc_subbank_noref_alloc(struct rvu *rvu, int key_type, bool contig,
 		if (npc_subbank_restrict_usage(rvu, idx))
 			continue;
 
-		sb = &npc_priv.sb[idx];
+		sb = &npc_priv->sb[idx];
 
 		/* Skip if not suitable subbank */
 		if (!npc_subbank_suits(sb, key_type))
@@ -3071,9 +3069,9 @@ static int npc_subbank_noref_alloc(struct rvu *rvu, int key_type, bool contig,
 	}
 
 	/* Allocate in free subbanks */
-	xa_for_each(&npc_priv.xa_sb_free, index, val) {
+	xa_for_each(&npc_priv->xa_sb_free, index, val) {
 		idx = xa_to_value(val);
-		sb = &npc_priv.sb[idx];
+		sb = &npc_priv->sb[idx];
 
 		/* Minimize allocation from restricted subbanks
 		 * in noref allocations.
@@ -3129,7 +3127,7 @@ static int npc_subbank_noref_alloc(struct rvu *rvu, int key_type, bool contig,
 	for (i = 0; restrict_valid &&
 	     (i < ARRAY_SIZE(npc_subbank_restricted_idxs)); i++) {
 		idx = npc_subbank_restricted_idxs[i];
-		sb = &npc_priv.sb[idx];
+		sb = &npc_priv->sb[idx];
 
 		/* Skip if not suitable subbank */
 		if (!npc_subbank_suits(sb, key_type))
@@ -3209,7 +3207,7 @@ int npc_cn20k_ref_idx_alloc(struct rvu *rvu, int pcifunc, int key_type,
 	bool ref_valid;
 	u16 vidx;
 
-	bd = npc_priv.bank_depth;
+	bd = npc_priv->bank_depth;
 
 	/* Special case: ref == 0 && limit= 0 && prio == HIGH && count == 1
 	 * Here user wants to allocate 0th entry
@@ -3227,7 +3225,7 @@ int npc_cn20k_ref_idx_alloc(struct rvu *rvu, int pcifunc, int key_type,
 	ref_valid = !!(limit || ref);
 	defrag_candidate = !ref_valid && !contig && virt;
 	if (!ref_valid) {
-		if (contig && count > npc_priv.subbank_depth)
+		if (contig && count > npc_priv->subbank_depth)
 			goto try_noref_multi_subbank;
 
 		rc = npc_subbank_noref_alloc(rvu, key_type, contig,
@@ -3272,7 +3270,7 @@ int npc_cn20k_ref_idx_alloc(struct rvu *rvu, int pcifunc, int key_type,
 		return -EINVAL;
 	}
 
-	if (contig && count > npc_priv.subbank_depth)
+	if (contig && count > npc_priv->subbank_depth)
 		goto try_ref_multi_subbank;
 
 	rc = npc_subbank_ref_alloc(rvu, key_type, ref, limit,
@@ -3334,8 +3332,8 @@ void npc_cn20k_subbank_calc_free(struct rvu *rvu, int *x2_free,
 	*x4_free = 0;
 	*sb_free = 0;
 
-	for (i = 0; i < npc_priv.num_subbanks; i++) {
-		sb = &npc_priv.sb[i];
+	for (i = 0; i < npc_priv->num_subbanks; i++) {
+		sb = &npc_priv->sb[i];
 		mutex_lock(&sb->lock);
 
 		/* Count number of free subbanks */
@@ -3433,11 +3431,11 @@ static void npc_subbank_init(struct rvu *rvu, struct npc_subbank *sb, int idx)
 {
 	mutex_init(&sb->lock);
 
-	sb->b0b = idx * npc_priv.subbank_depth;
-	sb->b0t = sb->b0b + npc_priv.subbank_depth - 1;
+	sb->b0b = idx * npc_priv->subbank_depth;
+	sb->b0t = sb->b0b + npc_priv->subbank_depth - 1;
 
-	sb->b1b = npc_priv.bank_depth + idx * npc_priv.subbank_depth;
-	sb->b1t = sb->b1b + npc_priv.subbank_depth - 1;
+	sb->b1b = npc_priv->bank_depth + idx * npc_priv->subbank_depth;
+	sb->b1t = sb->b1b + npc_priv->subbank_depth - 1;
 
 	sb->flags = NPC_SUBBANK_FLAG_FREE;
 	sb->idx = idx;
@@ -3449,7 +3447,7 @@ static void npc_subbank_init(struct rvu *rvu, struct npc_subbank *sb, int idx)
 	/* Keep first and last subbank at end of free array; so that
 	 * it will be used at last
 	 */
-	xa_store(&npc_priv.xa_sb_free, sb->arr_idx,
+	xa_store(&npc_priv->xa_sb_free, sb->arr_idx,
 		 xa_mk_value(sb->idx), GFP_KERNEL);
 }
 
@@ -3474,7 +3472,7 @@ static int npc_pcifunc_map_create(struct rvu *rvu)
 
 		pcifunc = pf << 9;
 
-		xa_store(&npc_priv.xa_pf_map, (unsigned long)pcifunc,
+		xa_store(&npc_priv->xa_pf_map, (unsigned long)pcifunc,
 			 xa_mk_value(cnt), GFP_KERNEL);
 
 		cnt++;
@@ -3483,7 +3481,7 @@ static int npc_pcifunc_map_create(struct rvu *rvu)
 		for (vf = 0; vf < numvfs; vf++) {
 			pcifunc = (pf << 9) | (vf + 1);
 
-			xa_store(&npc_priv.xa_pf_map, (unsigned long)pcifunc,
+			xa_store(&npc_priv->xa_pf_map, (unsigned long)pcifunc,
 				 xa_mk_value(cnt), GFP_KERNEL);
 			cnt++;
 		}
@@ -3569,7 +3567,7 @@ static int npc_defrag_alloc_free_slots(struct rvu *rvu,
 	int rc, sb_off, i, err;
 	bool deleted;
 
-	sb = &npc_priv.sb[f->idx];
+	sb = &npc_priv->sb[f->idx];
 
 	alloc_cnt1 = 0;
 	alloc_cnt2 = 0;
@@ -3639,9 +3637,9 @@ static int npc_defrag_add_2_show_list(struct rvu *rvu, u16 old_midx,
 	node->vidx = vidx;
 	INIT_LIST_HEAD(&node->list);
 
-	mutex_lock(&npc_priv.lock);
-	list_add_tail(&node->list, &npc_priv.defrag_lh);
-	mutex_unlock(&npc_priv.lock);
+	mutex_lock(&npc_priv->lock);
+	list_add_tail(&node->list, &npc_priv->defrag_lh);
+	mutex_unlock(&npc_priv->lock);
 
 	return 0;
 }
@@ -3745,7 +3743,7 @@ int npc_defrag_move_vdx_to_free(struct rvu *rvu,
 		}
 
 		/* save pcifunc */
-		map = xa_load(&npc_priv.xa_idx2pf_map, old_midx);
+		map = xa_load(&npc_priv->xa_idx2pf_map, old_midx);
 		pcifunc = xa_to_value(map);
 
 		/* delete from pf maps */
@@ -3904,29 +3902,29 @@ static void npc_defrag_list_clear(void)
 {
 	struct npc_defrag_show_node *node, *next;
 
-	mutex_lock(&npc_priv.lock);
-	list_for_each_entry_safe(node, next, &npc_priv.defrag_lh, list) {
+	mutex_lock(&npc_priv->lock);
+	list_for_each_entry_safe(node, next, &npc_priv->defrag_lh, list) {
 		list_del_init(&node->list);
 		kfree(node);
 	}
 
-	mutex_unlock(&npc_priv.lock);
+	mutex_unlock(&npc_priv->lock);
 }
 
 static void npc_lock_all_subbank(void)
 {
 	int i;
 
-	for (i = 0; i < npc_priv.num_subbanks; i++)
-		mutex_lock(&npc_priv.sb[i].lock);
+	for (i = 0; i < npc_priv->num_subbanks; i++)
+		mutex_lock(&npc_priv->sb[i].lock);
 }
 
 static void npc_unlock_all_subbank(void)
 {
 	int i;
 
-	for (i = npc_priv.num_subbanks - 1; i >= 0; i--)
-		mutex_unlock(&npc_priv.sb[i].lock);
+	for (i = npc_priv->num_subbanks - 1; i >= 0; i--)
+		mutex_unlock(&npc_priv->sb[i].lock);
 }
 
 int npc_cn20k_search_order_set(struct rvu *rvu,
@@ -3944,9 +3942,9 @@ int npc_cn20k_search_order_set(struct rvu *rvu,
 		USED = 1,
 	};
 
-	if (cnt != npc_priv.num_subbanks) {
+	if (cnt != npc_priv->num_subbanks) {
 		dev_err(rvu->dev, "Number of entries(%u) != %u\n",
-			cnt, npc_priv.num_subbanks);
+			cnt, npc_priv->num_subbanks);
 		return -EINVAL;
 	}
 
@@ -3954,18 +3952,18 @@ int npc_cn20k_search_order_set(struct rvu *rvu,
 	npc_lock_all_subbank();
 
 	for (sb_idx = 0; sb_idx < cnt; sb_idx++) {
-		sb = &npc_priv.sb[sb_idx];
+		sb = &npc_priv->sb[sb_idx];
 		save[sb->idx] = sb->arr_idx;
 	}
 
 	for (prio = 0; prio < cnt; prio++) {
 		sb_idx = narr[prio];
-		sb = &npc_priv.sb[sb_idx];
+		sb = &npc_priv->sb[sb_idx];
 
 		if (sb->flags & NPC_SUBBANK_FLAG_USED)
-			xa = &npc_priv.xa_sb_used;
+			xa = &npc_priv->xa_sb_used;
 		else
-			xa = &npc_priv.xa_sb_free;
+			xa = &npc_priv->xa_sb_free;
 
 		rc = xa_err(xa_store(xa, prio,
 				     xa_mk_value(sb_idx), GFP_KERNEL));
@@ -3989,10 +3987,10 @@ int npc_cn20k_search_order_set(struct rvu *rvu,
 
 	for (prio = 0; prio < cnt; prio++) {
 		if (rsrc[FREE][prio] == -1)
-			xa_erase(&npc_priv.xa_sb_free, prio);
+			xa_erase(&npc_priv->xa_sb_free, prio);
 
 		if (rsrc[USED][prio] == -1)
-			xa_erase(&npc_priv.xa_sb_used, prio);
+			xa_erase(&npc_priv->xa_sb_used, prio);
 	}
 
 	for (int i = 0; i < cnt; i++)
@@ -4008,20 +4006,20 @@ int npc_cn20k_search_order_set(struct rvu *rvu,
 fail:
 	for (prio = 0; prio < cnt; prio++) {
 		if (rsrc[FREE][prio] == 1)
-			xa_erase(&npc_priv.xa_sb_free, prio);
+			xa_erase(&npc_priv->xa_sb_free, prio);
 
 		if (rsrc[USED][prio] == 1)
-			xa_erase(&npc_priv.xa_sb_used, prio);
+			xa_erase(&npc_priv->xa_sb_used, prio);
 	}
 
 	for (sb_idx = 0; sb_idx < cnt; sb_idx++) {
-		sb = &npc_priv.sb[sb_idx];
+		sb = &npc_priv->sb[sb_idx];
 		sb->arr_idx = save[sb_idx];
 
 		if (sb->flags & NPC_SUBBANK_FLAG_USED)
-			xa = &npc_priv.xa_sb_used;
+			xa = &npc_priv->xa_sb_used;
 		else
-			xa = &npc_priv.xa_sb_free;
+			xa = &npc_priv->xa_sb_free;
 
 		/* Since the entry already exists, xa_store() replaces
 		 * the value without a kmalloc(), making failure highly unlikely.
@@ -4041,7 +4039,7 @@ int npc_cn20k_search_order_set(struct rvu *rvu,
 const u32 *npc_cn20k_search_order_get(bool *restricted_order, u32 *sz)
 {
 	*restricted_order = restrict_valid;
-	*sz = npc_priv.num_subbanks;
+	*sz = npc_priv->num_subbanks;
 	return subbank_srch_order;
 }
 
@@ -4065,7 +4063,7 @@ int npc_cn20k_defrag(struct rvu *rvu)
 	INIT_LIST_HEAD(&x4lh);
 	INIT_LIST_HEAD(&x2lh);
 
-	node = kcalloc(npc_priv.num_subbanks, sizeof(*node), GFP_KERNEL);
+	node = kcalloc(npc_priv->num_subbanks, sizeof(*node), GFP_KERNEL);
 	if (!node)
 		return -ENOMEM;
 
@@ -4074,13 +4072,13 @@ int npc_cn20k_defrag(struct rvu *rvu)
 	npc_lock_all_subbank();
 
 	/* Fill in node with subbank properties */
-	for (i = 0; i < npc_priv.num_subbanks; i++) {
-		sb = &npc_priv.sb[i];
+	for (i = 0; i < npc_priv->num_subbanks; i++) {
+		sb = &npc_priv->sb[i];
 
 		node[i].idx = i;
 		node[i].key_type = sb->key_type;
 		node[i].free_cnt = sb->free_cnt;
-		node[i].vidx = kcalloc(npc_priv.subbank_depth * 2,
+		node[i].vidx = kcalloc(npc_priv->subbank_depth * 2,
 				       sizeof(*node[i].vidx),
 				       GFP_KERNEL);
 		if (!node[i].vidx) {
@@ -4110,8 +4108,8 @@ int npc_cn20k_defrag(struct rvu *rvu)
 	}
 
 	/* Filling vidx[] array with all vidx in that subbank */
-	xa_for_each_start(&npc_priv.xa_vidx2idx_map, index, map,
-			  npc_priv.bank_depth * 2) {
+	xa_for_each_start(&npc_priv->xa_vidx2idx_map, index, map,
+			  npc_priv->bank_depth * 2) {
 		midx = xa_to_value(map);
 		rc =  npc_mcam_idx_2_subbank_idx(rvu, midx,
 						 &sb, &sb_off);
@@ -4128,14 +4126,14 @@ int npc_cn20k_defrag(struct rvu *rvu)
 	}
 
 	/* Mark all subbank which has ref allocation */
-	for (i = 0; i < npc_priv.num_subbanks; i++) {
+	for (i = 0; i < npc_priv->num_subbanks; i++) {
 		tnode = &node[i];
 
 		if (!tnode->valid)
 			continue;
 
 		tot = (tnode->key_type == NPC_MCAM_KEY_X2) ?
-			npc_priv.subbank_depth * 2 : npc_priv.subbank_depth;
+			npc_priv->subbank_depth * 2 : npc_priv->subbank_depth;
 
 		if (node[i].vidx_cnt != tot - tnode->free_cnt)
 			tnode->refs = true;
@@ -4152,7 +4150,7 @@ int npc_cn20k_defrag(struct rvu *rvu)
 free_vidx:
 	npc_unlock_all_subbank();
 	mutex_unlock(&mcam->lock);
-	for (i = 0; i < npc_priv.num_subbanks; i++)
+	for (i = 0; i < npc_priv->num_subbanks; i++)
 		kfree(node[i].vidx);
 	kfree(node);
 	return rc;
@@ -4180,7 +4178,7 @@ int npc_cn20k_dft_rules_idx_get(struct rvu *rvu, u16 pcifunc, u16 *bcast,
 		*ptr[i] = USHRT_MAX;
 	}
 
-	if (!npc_priv.init_done)
+	if (!npc_priv)
 		return 0;
 
 	if (is_lbk_vf(rvu, pcifunc)) {
@@ -4188,7 +4186,7 @@ int npc_cn20k_dft_rules_idx_get(struct rvu *rvu, u16 pcifunc, u16 *bcast,
 			return -EINVAL;
 
 		idx = NPC_DFT_RULE_ID_MK(pcifunc, NPC_DFT_RULE_PROMISC_ID);
-		val = xa_load(&npc_priv.xa_pf2dfl_rmap, idx);
+		val = xa_load(&npc_priv->xa_pf2dfl_rmap, idx);
 		if (!val) {
 			pr_debug("%s: Failed to find %s index for pcifunc=%#x\n",
 				 __func__,
@@ -4207,7 +4205,7 @@ int npc_cn20k_dft_rules_idx_get(struct rvu *rvu, u16 pcifunc, u16 *bcast,
 			return -EINVAL;
 
 		idx = NPC_DFT_RULE_ID_MK(pcifunc, NPC_DFT_RULE_UCAST_ID);
-		val = xa_load(&npc_priv.xa_pf2dfl_rmap, idx);
+		val = xa_load(&npc_priv->xa_pf2dfl_rmap, idx);
 		if (!val) {
 			pr_debug("%s: Failed to find %s index for pcifunc=%#x\n",
 				 __func__,
@@ -4227,7 +4225,7 @@ int npc_cn20k_dft_rules_idx_get(struct rvu *rvu, u16 pcifunc, u16 *bcast,
 			continue;
 
 		idx = NPC_DFT_RULE_ID_MK(pcifunc, i);
-		val = xa_load(&npc_priv.xa_pf2dfl_rmap, idx);
+		val = xa_load(&npc_priv->xa_pf2dfl_rmap, idx);
 		if (!val) {
 			pr_debug("%s: Failed to find %s index for pcifunc=%#x\n",
 				 __func__,
@@ -4251,8 +4249,8 @@ int rvu_mbox_handler_npc_get_pfl_info(struct rvu *rvu, struct msg_req *req,
 		return -EOPNOTSUPP;
 	}
 
-	rsp->kw_type = npc_priv.kw;
-	rsp->x4_slots = npc_priv.bank_depth;
+	rsp->kw_type = npc_priv->kw;
+	rsp->x4_slots = npc_priv->bank_depth;
 	return 0;
 }
 
@@ -4342,7 +4340,7 @@ void npc_cn20k_dft_rules_free(struct rvu *rvu, u16 pcifunc)
 	int blkaddr, rc, i;
 	void *map;
 
-	if (!npc_priv.init_done)
+	if (!npc_priv)
 		return;
 
 	if (!npc_is_cgx_or_lbk(rvu, pcifunc)) {
@@ -4360,7 +4358,7 @@ void npc_cn20k_dft_rules_free(struct rvu *rvu, u16 pcifunc)
 	/* LBK */
 	if (is_lbk_vf(rvu, pcifunc)) {
 		index = NPC_DFT_RULE_ID_MK(pcifunc, NPC_DFT_RULE_PROMISC_ID);
-		map = xa_erase(&npc_priv.xa_pf2dfl_rmap, index);
+		map = xa_erase(&npc_priv->xa_pf2dfl_rmap, index);
 		if (!map)
 			dev_dbg(rvu->dev,
 				"%s: Err from delete %s mcam idx from xarray (pcifunc=%#x\n",
@@ -4374,7 +4372,7 @@ void npc_cn20k_dft_rules_free(struct rvu *rvu, u16 pcifunc)
 	/* VF */
 	if (is_vf(pcifunc)) {
 		index = NPC_DFT_RULE_ID_MK(pcifunc, NPC_DFT_RULE_UCAST_ID);
-		map = xa_erase(&npc_priv.xa_pf2dfl_rmap, index);
+		map = xa_erase(&npc_priv->xa_pf2dfl_rmap, index);
 		if (!map)
 			dev_dbg(rvu->dev,
 				"%s: Err from delete %s mcam idx from xarray (pcifunc=%#x\n",
@@ -4388,7 +4386,7 @@ void npc_cn20k_dft_rules_free(struct rvu *rvu, u16 pcifunc)
 	/* PF */
 	for (i = NPC_DFT_RULE_START_ID; i < NPC_DFT_RULE_MAX_ID; i++)  {
 		index = NPC_DFT_RULE_ID_MK(pcifunc, i);
-		map = xa_erase(&npc_priv.xa_pf2dfl_rmap, index);
+		map = xa_erase(&npc_priv->xa_pf2dfl_rmap, index);
 		if (!map)
 			dev_dbg(rvu->dev,
 				"%s: Err from delete %s mcam idx from xarray (pcifunc=%#x\n",
@@ -4448,7 +4446,7 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	struct msg_rsp free_rsp;
 	u16 b, m, p, u;
 
-	if (!npc_priv.init_done)
+	if (!npc_priv)
 		return 0;
 
 	if (!npc_is_cgx_or_lbk(rvu, pcifunc)) {
@@ -4471,7 +4469,7 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	}
 
 	/* Set ref index as lowest priority index */
-	eidx = 2 * npc_priv.bank_depth - 1;
+	eidx = 2 * npc_priv->bank_depth - 1;
 
 	/* Install only UCAST for VF */
 	cnt = is_vf(pcifunc) ? 1 : ARRAY_SIZE(mcam_idx);
@@ -4500,9 +4498,9 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	pfvf = rvu_get_pfvf(rvu, pcifunc);
 	pfvf->hw_prio = NPC_DFT_RULE_PRIO;
 
-	if (npc_priv.kw == NPC_MCAM_KEY_X4) {
+	if (npc_priv->kw == NPC_MCAM_KEY_X4) {
 		req.kw_type = NPC_MCAM_KEY_X4;
-		req.ref_entry = eidx & (npc_priv.bank_depth - 1);
+		req.ref_entry = eidx & (npc_priv->bank_depth - 1);
 	} else {
 		req.kw_type = NPC_MCAM_KEY_X2;
 		req.ref_entry = eidx;
@@ -4543,9 +4541,9 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	req.hdr.pcifunc = pcifunc;
 	req.ref_prio = NPC_MCAM_LOWER_PRIO;
 
-	if (npc_priv.kw == NPC_MCAM_KEY_X4) {
+	if (npc_priv->kw == NPC_MCAM_KEY_X4) {
 		req.kw_type = NPC_MCAM_KEY_X4;
-		req.ref_entry = eidx & (npc_priv.bank_depth - 1);
+		req.ref_entry = eidx & (npc_priv->bank_depth - 1);
 	} else {
 		req.kw_type = NPC_MCAM_KEY_X2;
 		req.ref_entry = eidx;
@@ -4569,7 +4567,7 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	/* LBK */
 	if (is_lbk_vf(rvu, pcifunc)) {
 		index = NPC_DFT_RULE_ID_MK(pcifunc, NPC_DFT_RULE_PROMISC_ID);
-		ret = xa_insert(&npc_priv.xa_pf2dfl_rmap, index,
+		ret = xa_insert(&npc_priv->xa_pf2dfl_rmap, index,
 				xa_mk_value(mcam_idx[0]), GFP_KERNEL);
 		if (ret) {
 			dev_err(rvu->dev,
@@ -4586,7 +4584,7 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	/* VF */
 	if (is_vf(pcifunc)) {
 		index = NPC_DFT_RULE_ID_MK(pcifunc, NPC_DFT_RULE_UCAST_ID);
-		ret = xa_insert(&npc_priv.xa_pf2dfl_rmap, index,
+		ret = xa_insert(&npc_priv->xa_pf2dfl_rmap, index,
 				xa_mk_value(mcam_idx[0]), GFP_KERNEL);
 		if (ret) {
 			dev_err(rvu->dev,
@@ -4604,7 +4602,7 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 	for (i = NPC_DFT_RULE_START_ID, k = 0; i < NPC_DFT_RULE_MAX_ID &&
 	     k < cnt; i++, k++) {
 		index = NPC_DFT_RULE_ID_MK(pcifunc, i);
-		ret = xa_insert(&npc_priv.xa_pf2dfl_rmap, index,
+		ret = xa_insert(&npc_priv->xa_pf2dfl_rmap, index,
 				xa_mk_value(mcam_idx[k]), GFP_KERNEL);
 		if (ret) {
 			dev_err(rvu->dev,
@@ -4613,7 +4611,7 @@ int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
 				pcifunc);
 			for (int p = NPC_DFT_RULE_START_ID; p < i; p++) {
 				index = NPC_DFT_RULE_ID_MK(pcifunc, p);
-				xa_erase(&npc_priv.xa_pf2dfl_rmap, index);
+				xa_erase(&npc_priv->xa_pf2dfl_rmap, index);
 			}
 			goto err;
 		}
@@ -4687,71 +4685,79 @@ static int npc_priv_init(struct rvu *rvu)
 		return -EINVAL;
 	}
 
-	npc_priv.num_subbanks = num_subbanks;
-	npc_priv.bank_depth = bank_depth;
-	npc_priv.subbank_depth = subbank_depth;
+	npc_priv = kcalloc(1, sizeof(*npc_priv), GFP_KERNEL);
+	if (!npc_priv)
+		return -ENOMEM;
+
+	npc_priv->num_banks = num_banks;
+	npc_priv->num_subbanks = num_subbanks;
+	npc_priv->bank_depth = bank_depth;
+	npc_priv->subbank_depth = subbank_depth;
 
 	/* Get kex configured key size */
 	cfg = rvu_read64(rvu, blkaddr, NPC_AF_INTFX_KEX_CFG(0));
-	npc_priv.kw = FIELD_GET(GENMASK_ULL(34, 32), cfg);
+	npc_priv->kw = FIELD_GET(GENMASK_ULL(34, 32), cfg);
 
 	dev_info(rvu->dev,
 		 "banks=%u depth=%u, subbanks=%u depth=%u, key type=%s\n",
 		 num_banks, bank_depth, num_subbanks, subbank_depth,
-		 npc_kw_name[npc_priv.kw]);
+		 npc_kw_name[npc_priv->kw]);
 
-	npc_priv.sb = kcalloc(num_subbanks, sizeof(struct npc_subbank),
-			      GFP_KERNEL);
-	if (!npc_priv.sb)
-		return -ENOMEM;
+	npc_priv->sb = kcalloc(num_subbanks, sizeof(struct npc_subbank),
+			       GFP_KERNEL);
+	if (!npc_priv->sb)
+		goto fail1;
 
-	xa_init_flags(&npc_priv.xa_sb_used, XA_FLAGS_ALLOC);
-	xa_init_flags(&npc_priv.xa_sb_free, XA_FLAGS_ALLOC);
-	xa_init_flags(&npc_priv.xa_idx2pf_map, XA_FLAGS_ALLOC);
-	xa_init_flags(&npc_priv.xa_pf_map, XA_FLAGS_ALLOC);
-	xa_init_flags(&npc_priv.xa_pf2dfl_rmap, XA_FLAGS_ALLOC);
-	xa_init_flags(&npc_priv.xa_idx2vidx_map, XA_FLAGS_ALLOC);
-	xa_init_flags(&npc_priv.xa_vidx2idx_map, XA_FLAGS_ALLOC);
+	xa_init_flags(&npc_priv->xa_sb_used, XA_FLAGS_ALLOC);
+	xa_init_flags(&npc_priv->xa_sb_free, XA_FLAGS_ALLOC);
+	xa_init_flags(&npc_priv->xa_idx2pf_map, XA_FLAGS_ALLOC);
+	xa_init_flags(&npc_priv->xa_pf_map, XA_FLAGS_ALLOC);
+	xa_init_flags(&npc_priv->xa_pf2dfl_rmap, XA_FLAGS_ALLOC);
+	xa_init_flags(&npc_priv->xa_idx2vidx_map, XA_FLAGS_ALLOC);
+	xa_init_flags(&npc_priv->xa_vidx2idx_map, XA_FLAGS_ALLOC);
 
 	if (npc_create_srch_order(num_subbanks))
-		goto fail1;
+		goto fail2;
 
 	npc_populate_restricted_idxs(num_subbanks);
 
 	/* Initialize subbanks */
-	for (i = 0, sb = npc_priv.sb; i < num_subbanks; i++, sb++)
+	for (i = 0, sb = npc_priv->sb; i < num_subbanks; i++, sb++)
 		npc_subbank_init(rvu, sb, i);
 
 	/* Get number of pcifuncs in the system */
-	npc_priv.pf_cnt = npc_pcifunc_map_create(rvu);
-	npc_priv.xa_pf2idx_map = kcalloc(npc_priv.pf_cnt,
-					 sizeof(struct xarray),
-					 GFP_KERNEL);
-	if (!npc_priv.xa_pf2idx_map)
-		goto fail2;
+	npc_priv->pf_cnt = npc_pcifunc_map_create(rvu);
+	npc_priv->xa_pf2idx_map = kcalloc(npc_priv->pf_cnt,
+					  sizeof(struct xarray),
+					  GFP_KERNEL);
+	if (!npc_priv->xa_pf2idx_map)
+		goto fail3;
 
-	for (i = 0; i < npc_priv.pf_cnt; i++)
-		xa_init_flags(&npc_priv.xa_pf2idx_map[i], XA_FLAGS_ALLOC);
+	for (i = 0; i < npc_priv->pf_cnt; i++)
+		xa_init_flags(&npc_priv->xa_pf2idx_map[i], XA_FLAGS_ALLOC);
 
-	INIT_LIST_HEAD(&npc_priv.defrag_lh);
-	mutex_init(&npc_priv.lock);
+	INIT_LIST_HEAD(&npc_priv->defrag_lh);
+	mutex_init(&npc_priv->lock);
 
 	return 0;
 
-fail2:
+fail3:
 	kfree(subbank_srch_order);
 	subbank_srch_order = NULL;
 
+fail2:
+	xa_destroy(&npc_priv->xa_sb_used);
+	xa_destroy(&npc_priv->xa_sb_free);
+	xa_destroy(&npc_priv->xa_idx2pf_map);
+	xa_destroy(&npc_priv->xa_pf_map);
+	xa_destroy(&npc_priv->xa_pf2dfl_rmap);
+	xa_destroy(&npc_priv->xa_idx2vidx_map);
+	xa_destroy(&npc_priv->xa_vidx2idx_map);
+	kfree(npc_priv->sb);
+	npc_priv->sb = NULL;
 fail1:
-	xa_destroy(&npc_priv.xa_sb_used);
-	xa_destroy(&npc_priv.xa_sb_free);
-	xa_destroy(&npc_priv.xa_idx2pf_map);
-	xa_destroy(&npc_priv.xa_pf_map);
-	xa_destroy(&npc_priv.xa_pf2dfl_rmap);
-	xa_destroy(&npc_priv.xa_idx2vidx_map);
-	xa_destroy(&npc_priv.xa_vidx2idx_map);
-	kfree(npc_priv.sb);
-	npc_priv.sb = NULL;
+	kfree(npc_priv);
+	npc_priv = NULL;
 	return -ENOMEM;
 }
 
@@ -4759,25 +4765,31 @@ void npc_cn20k_deinit(struct rvu *rvu)
 {
 	int i;
 
-	xa_destroy(&npc_priv.xa_sb_used);
-	xa_destroy(&npc_priv.xa_sb_free);
-	xa_destroy(&npc_priv.xa_idx2pf_map);
-	xa_destroy(&npc_priv.xa_pf_map);
-	xa_destroy(&npc_priv.xa_pf2dfl_rmap);
-	xa_destroy(&npc_priv.xa_idx2vidx_map);
-	xa_destroy(&npc_priv.xa_vidx2idx_map);
+	if (!npc_priv)
+		return;
 
-	for (i = 0; i < npc_priv.pf_cnt; i++)
-		xa_destroy(&npc_priv.xa_pf2idx_map[i]);
+	xa_destroy(&npc_priv->xa_sb_used);
+	xa_destroy(&npc_priv->xa_sb_free);
+	xa_destroy(&npc_priv->xa_idx2pf_map);
+	xa_destroy(&npc_priv->xa_pf_map);
+	xa_destroy(&npc_priv->xa_pf2dfl_rmap);
+	xa_destroy(&npc_priv->xa_idx2vidx_map);
+	xa_destroy(&npc_priv->xa_vidx2idx_map);
 
-	kfree(npc_priv.xa_pf2idx_map);
+	for (i = 0; i < npc_priv->pf_cnt; i++)
+		xa_destroy(&npc_priv->xa_pf2idx_map[i]);
+
+	kfree(npc_priv->xa_pf2idx_map);
 	/* No need to destroy mutex lock as it is
 	 * part of subbank structure
 	 */
-	kfree(npc_priv.sb);
+	kfree(npc_priv->sb);
 	kfree(subbank_srch_order);
-	bitmap_clear(npc_priv.en_map, 0, MAX_NUM_BANKS * MAX_NUM_SUB_BANKS *
+	bitmap_clear(npc_priv->en_map, 0, MAX_NUM_BANKS * MAX_NUM_SUB_BANKS *
 		     MAX_SUBBANK_DEPTH);
+	npc_defrag_list_clear();
+	kfree(npc_priv);
+	npc_priv = NULL;
 }
 
 static int npc_setup_mcam_section(struct rvu *rvu, int key_type)
@@ -4790,7 +4802,7 @@ static int npc_setup_mcam_section(struct rvu *rvu, int key_type)
 		return -ENODEV;
 	}
 
-	for (sec = 0; sec < npc_priv.num_subbanks; sec++)
+	for (sec = 0; sec < npc_priv->num_subbanks; sec++)
 		rvu_write64(rvu, blkaddr,
 			    NPC_AF_MCAM_SECTIONX_CFG_EXT(sec), key_type);
 
@@ -4812,10 +4824,12 @@ int npc_cn20k_init(struct rvu *rvu)
 	if (err) {
 		dev_err(rvu->dev, "%s: mcam section configuration failure\n",
 			__func__);
-		return err;
+		goto fail;
 	}
 
-	npc_priv.init_done = true;
-
 	return 0;
+
+fail:
+	npc_cn20k_deinit(rvu);
+	return err;
 }
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
index 8bf857317e49..10e5bab50f62 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
+++ b/drivers/net/ethernet/marvell/octeontx2/af/cn20k/npc.h
@@ -183,7 +183,6 @@ struct npc_defrag_show_node {
  * @xa_idx2pf_map:	Mcam index to PF map.
  * @xa_pf_map:		Pcifunc to index map.
  * @pf_cnt:		Number of PFs.
- * @init_done:		Indicates MCAM initialization is done.
  * @xa_pf2dfl_rmap:	PF to default rule index map.
  * @xa_idx2vidx_map:	Mcam index to virtual index map.
  * @xa_vidx2idx_map:	virtual index to mcam index map.
@@ -195,7 +194,7 @@ struct npc_defrag_show_node {
  */
 struct npc_priv_t {
 	int bank_depth;
-	const int num_banks;
+	int num_banks;
 	int num_subbanks;
 	int subbank_depth;
 	DECLARE_BITMAP(en_map, MAX_NUM_BANKS *
@@ -214,7 +213,6 @@ struct npc_priv_t {
 	struct list_head defrag_lh;
 	struct mutex lock; /* protect defrag nodes */
 	int pf_cnt;
-	bool init_done;
 };
 
 struct npc_kpm_action0 {
-- 
2.43.0
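For readers following the npc_priv conversion in the diff above: the patch drops the init_done flag and treats a NULL npc_priv pointer as "not initialized". It allocates the structure in npc_priv_init(), unwinds partial allocations through the fail3/fail2/fail1 labels, and makes npc_cn20k_deinit() return early on NULL. Below is a minimal userspace sketch of that lifecycle pattern, not the driver code. struct ctx, g_ctx, ctx_init(), ctx_deinit() and ctx_num_subbanks() are hypothetical names, and calloc()/free() stand in for kcalloc()/kfree().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for struct npc_priv_t. */
struct ctx {
	int num_subbanks;
	int *sb;	/* per-subbank state */
	int *pf_map;	/* per-PF state */
};

/* NULL doubles as "not initialized", replacing a separate init_done flag. */
static struct ctx *g_ctx;

static int ctx_init(int num_subbanks, int pf_cnt)
{
	assert(!g_ctx);	/* no double init */

	g_ctx = calloc(1, sizeof(*g_ctx));
	if (!g_ctx)
		return -ENOMEM;
	g_ctx->num_subbanks = num_subbanks;

	g_ctx->sb = calloc(num_subbanks, sizeof(*g_ctx->sb));
	if (!g_ctx->sb)
		goto free_ctx;

	g_ctx->pf_map = calloc(pf_cnt, sizeof(*g_ctx->pf_map));
	if (!g_ctx->pf_map)
		goto free_sb;

	return 0;

	/* Unwind in reverse order of allocation. */
free_sb:
	free(g_ctx->sb);
free_ctx:
	free(g_ctx);
	g_ctx = NULL;
	return -ENOMEM;
}

/* Safe to call twice, or after a failed init. */
static void ctx_deinit(void)
{
	if (!g_ctx)
		return;
	free(g_ctx->pf_map);
	free(g_ctx->sb);
	free(g_ctx);
	g_ctx = NULL;
}

/* Callers test the pointer instead of a flag. */
static int ctx_num_subbanks(void)
{
	return g_ctx ? g_ctx->num_subbanks : 0;
}
```

Because ctx_deinit() tolerates a NULL pointer, an init path can call it on a late failure (as npc_cn20k_init() now does) without tracking how far initialization got.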



* Re: [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe
  2026-06-05  3:50 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
@ 2026-06-05  7:47   ` Ratheesh Kannoth
  0 siblings, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-05  7:47 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 09:20:21, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> There is only one admin-function PCI device per system.
> Reject any additional AF probe with -EBUSY so the driver model matches
> hardware and automated reviewers can rely on a single bound instance.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

Apologies. The Marvell SMTP server had many issues, which resulted in delayed, requeued and
resent emails. Sorry for the noise. Please abandon this set, as another posting of the same
patch is active.
pw-bot: changes-requested

> ---
>  drivers/net/ethernet/marvell/octeontx2/af/rvu.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
> index 3cf131508ecf..1f0c962e10f4 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
> @@ -3542,12 +3542,19 @@ static void rvu_update_module_params(struct rvu *rvu)
>  		kpu_profile ? kpu_profile : default_pfl_name, KPU_NAME_LEN);
>  }
>
> +static atomic_t device_bound = ATOMIC_INIT(0);
> +
>  static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  {
>  	struct device *dev = &pdev->dev;
>  	struct rvu *rvu;
>  	int    err;
>
> +	if (atomic_cmpxchg(&device_bound, 0, 1) != 0) {
> +		dev_warn(dev, "Only one af device is supported.\n");
> +		return -EBUSY;
> +	}
> +
>  	rvu = devm_kzalloc(dev, sizeof(*rvu), GFP_KERNEL);
>  	if (!rvu)
>  		return -ENOMEM;
> --
> 2.43.0
>


* Re: [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe
  2026-06-05  6:32 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
@ 2026-06-08  2:17   ` Ratheesh Kannoth
  2026-06-08  2:25   ` Ratheesh Kannoth
  2026-06-08 22:40   ` Jakub Kicinski
  2 siblings, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:17 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:37, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> There is only one admin-function PCI device per system.
> Reject any additional AF probe with -EBUSY so the driver model matches
> hardware and automated reviewers can rely on a single bound instance.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://sashiko.dev/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>>  static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>  {
>>  	struct device *dev = &pdev->dev;
>>  	struct rvu *rvu;
>>  	int    err;
>>
>> +	if (atomic_cmpxchg(&device_bound, 0, 1) != 0) {
>> +		dev_warn(dev, "Only one af device is supported.\n");
>> +		return -EBUSY;
>> +	}
>> +
>Since this atomic_cmpxchg() sets device_bound to 1, is there a path that sets
>it back to 0?
>Looking at rvu_probe(), if a failure occurs later in the probe process
>(such as -ENOMEM or resource allocation failures), it jumps to the cleanup
>labels but does not clear the device_bound flag. Does this permanently
>prevent driver rebinding or retry on probe deferral?
>Additionally, when the driver is successfully unbound via rvu_remove(),
>the flag doesn't appear to be cleared there either, which would cause
>subsequent manual binds to fail with -EBUSY.
>Is there a missing atomic_set(&device_bound, 0) in the error paths and in
>rvu_remove()?

You are right. The current implementation lacks
transactional rollback for device_bound in the
rvu_probe() error paths, as well as the corresponding reset
in rvu_remove().

The atomic_cmpxchg() here is a proactive sanity
check that enforces the hardware model: firmware instantiates
only a single Admin Function (AF) PCI device. Sashiko had raised many race
issues in the previous version of the patch, assuming that multiple AF devices
could be probed.

However, full error-path hardening and proper resource cleanup for
the AF driver are currently incomplete.

To prevent scope creep in this fundamental enablement series, we
plan to address the comprehensive error-path rollback, including
proper atomic_set(&device_bound, 0) calls on probe failure
and driver detachment, in a dedicated follow-up hardening
patchset targeted for net-next.
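A minimal sketch of the rollback described above, assuming userspace C11 atomics in place of the kernel's atomic_t. The guard is claimed with a compare-and-exchange (the equivalent of atomic_cmpxchg(&device_bound, 0, 1)), released if a later probe step fails, and released again on remove. af_probe(), af_remove() and af_init_rest() are hypothetical stand-ins for rvu_probe(), rvu_remove() and the rest of the probe sequence. In the driver itself the release would be atomic_set(&device_bound, 0).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Models the patch's device_bound flag: 0 = no AF bound, 1 = bound. */
static atomic_int device_bound = 0;

/* Hypothetical stand-in for the rest of rvu_probe(); returns 0 or -errno. */
static int af_init_rest(bool fail)
{
	return fail ? -ENOMEM : 0;
}

/* Claim the single AF slot, then give it back if a later step fails. */
static int af_probe(bool init_fails)
{
	int expected = 0;
	int err;

	/* Same semantics as atomic_cmpxchg(&device_bound, 0, 1) != 0 */
	if (!atomic_compare_exchange_strong(&device_bound, &expected, 1))
		return -EBUSY;

	err = af_init_rest(init_fails);
	if (err) {
		/* Roll back so a retry or deferred probe can bind later. */
		atomic_store(&device_bound, 0);
		return err;
	}
	return 0;
}

/* Mirror of rvu_remove(): release the slot on unbind. */
static void af_remove(void)
{
	int prev = atomic_exchange(&device_bound, 0);

	/* Removing an instance that was never bound indicates a bug. */
	assert(prev == 1);
	(void)prev;
}
```

With this shape, a failed probe, a deferred probe and an unbind/rebind cycle all leave the flag at 0, while a genuine second AF device still gets -EBUSY.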


* Re: [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements
  2026-06-05  6:32 ` [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
@ 2026-06-08  2:20   ` Ratheesh Kannoth
  2026-06-08  2:26   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:20 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:38, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Improve MCAM visibility and field debugging for CN20K NPC.
>
> - Extend "mcam_layout" to show enabled (+) or disabled state per entry
>   so status can be verified without parsing the full "mcam_entry" dump.
> - Add "dstats" debugfs entry: for enabled MCAM indices, print hit deltas
>   since the prior read by comparing hardware counters to a per-entry
>   software baseline and advancing that baseline after each read (hardware
>   counters are not cleared).
> - Add "mismatch" debugfs entry: lists MCAM entries that are enabled
>   but not explicitly allocated, helping diagnose allocation/field issues.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://sashiko.dev/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> +
>> +			snprintf(buff, sizeof(buff), "%u\t%#04x\t%llu\n",
>> +				 mcam_idx, pf, delta);
>> +			seq_puts(s, buff);
>> +
>> +			dstats[bank][idx] = stats;
>Does this loop read and write out-of-bounds of the dstats array if bank_depth
>is larger than expected? As discussed below in npc_priv_init(), bank_depth
>can bypass validation. If that happens, iterating up to bank_depth - 1 will
>exceed the fixed MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS array dimensions.

npc_priv_init() does check, during probe(), that these values are within limits.

>Can stale baselines in dstats cause inaccurate delta calculations when MCAM
>entries are freed and reallocated?

Yes, but it is OK, as dstats is only used to debug field issues. The first read may
show wrong data, but from the second read onwards the data will be correct, as the s/w
baseline array is updated to the hw stats.

>While the reset logic attempts to handle this with:
>if (stats < dstats[bank][idx]) dstats[bank][idx] = 0;
>This reset will fail to trigger if a newly allocated entry accumulates hits
>greater than or equal to the stale baseline before the very first debugfs
>read. In this scenario, the stale baseline would be silently subtracted,
>drastically undercounting the statistics.

No. The dstats[0][bank][idx] = stats line sets the s/w baseline array to match the hw
stats.

> @@ -842,6 +842,12 @@ npc_cn20k_enable_mcam_entry(struct rvu *rvu, int blkaddr,
>>  			    cfg);
>>  	}
>>
>> +update_en_map:
>> +	if (enable)
>> +		set_bit(index, npc_priv.en_map);
>> +	else
>> +		clear_bit(index, npc_priv.en_map);
>> +
>Can index exceed the bounds of the en_map bitmap?
>Because en_map is allocated for exactly MAX_NUM_BANKS * MAX_NUM_SUB_BANKS *
>MAX_SUBBANK_DEPTH (16384) bits, an index corresponding to an oversized
>bank_depth (as described in npc_priv_init() below) could cause set_bit() to
>silently corrupt adjacent fields in struct npc_priv_t.

No. We do check against these macros (MAX_NUM_BANKS, MAX_NUM_SUB_BANKS, MAX_SUBBANK_DEPTH)
in npc_priv_init() during probe(). If the values read from hw exceed these, probe() itself
will fail.

>>  	}
>>
>> -	npc_priv.num_subbanks = num_subbanks;
>> -
>>  	subbank_depth =	bank_depth / num_subbanks;
>> +	if (subbank_depth > MAX_SUBBANK_DEPTH) {
>> +		dev_err(rvu->dev,
>> +			"Invalid subbank depth %u\n",
>> +			subbank_depth);
>> +		return -EINVAL;
>> +	}
>>
>> +	npc_priv.num_subbanks = num_subbanks;
>>  	npc_priv.bank_depth = bank_depth;
>>  	npc_priv.subbank_depth = subbank_depth;
>Does integer division truncation allow an invalid bank_depth to bypass this
>check?
This is the case when the hw read value is 0. Even though this case won't happen in hw,
we will post a fix patch after this series to check against 0 and return an error (during
probe()).

>If the hardware-provided bank_depth is not an exact multiple of num_subbanks
>(for example, if bank_depth is 8223 and num_subbanks is 32), subbank_depth
>truncates to 256. This passes the MAX_SUBBANK_DEPTH check, allowing bank_depth
>to remain 8223. This oversized bank_depth then drives loops and calculations
>in other functions, leading to the out-of-bounds accesses in debugfs and the
>en_map bitmap operations highlighted above.
In all SoCs, bank_depth is an exact multiple of num_subbanks. We can add a check
in npc_priv_init() during probe() (as a hardening series to net-next after this
patch series is merged).
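A sketch of the probe-time hardening mentioned in the two replies above. The SKETCH_* limits are assumed values (32 subbanks, 256 entries per subbank, consistent with the reviewer's example) and may not match the driver's MAX_* macros. npc_geometry_check() is a hypothetical helper. It rejects a zero depth or subbank count, a depth that is not an exact multiple of the subbank count (the 8223 / 32 case), and an oversized subbank depth.

```c
#include <assert.h>
#include <errno.h>

/* Assumed limits for illustration only; not the driver's real macros. */
#define SKETCH_MAX_NUM_SUB_BANKS	32u
#define SKETCH_MAX_SUBBANK_DEPTH	256u

static int npc_geometry_check(unsigned int bank_depth,
			      unsigned int num_subbanks)
{
	/* Zero values would make later divisions and loops meaningless. */
	if (!bank_depth || !num_subbanks)
		return -EINVAL;
	if (num_subbanks > SKETCH_MAX_NUM_SUB_BANKS)
		return -EINVAL;
	/* Reject truncation: 8223 / 32 would silently become 256. */
	if (bank_depth % num_subbanks)
		return -EINVAL;
	if (bank_depth / num_subbanks > SKETCH_MAX_SUBBANK_DEPTH)
		return -EINVAL;
	return 0;
}
```

Checking the remainder before the quotient means bank_depth itself is validated, not just the truncated subbank depth.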


* Re: [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control
  2026-06-05  6:32 ` [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
@ 2026-06-08  2:22   ` Ratheesh Kannoth
  2026-06-08  2:28   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:22 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:41, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> CN20K NPC MCAM is split into 32 subbanks that are searched in a
> predefined order during allocation. Lower-numbered subbanks have
> higher priority than higher-numbered ones.
>
> Add a runtime "srch_order" to control the order in which
> subbanks are searched during MCAM allocation.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://sashiko.dev/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

> @@ -1566,13 +1569,15 @@ int rvu_mbox_handler_nix_lf_alloc(struct rvu *rvu,
>>
>>  	/* Alloc NIX RQ HW context memory and config the base */
>>  	hwctx_size = 1UL << ((ctx_cfg >> 4) & 0xF);
>> -	err = qmem_alloc(rvu->dev, &pfvf->rq_ctx, req->rq_cnt, hwctx_size);
>> -	if (err)
>> +	rc = qmem_alloc(rvu->dev, &pfvf->rq_ctx, req->rq_cnt, hwctx_size);
>> +	if (rc)
>>  		goto free_mem;
>>
>>  	pfvf->rq_bmap = kcalloc(req->rq_cnt, sizeof(long), GFP_KERNEL);
>> -	if (!pfvf->rq_bmap)
>> +	if (!pfvf->rq_bmap) {
>> +		rc = -ENOMEM;
>>  		goto free_mem;
>> +	}
>This is a pre-existing issue, but I wanted to ask about what happens if a
>VF or PF repeatedly sends the NIX_LF_ALLOC mailbox message without an
>intervening NIX_LF_FREE message.
Pre-existing issue; we will address it in a separate patch after this series
is merged.


* Re: [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem
  2026-06-05  6:32 ` [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth
@ 2026-06-08  2:23   ` Ratheesh Kannoth
  2026-06-08  2:30   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:23 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:43, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Flashing updated firmware on deployed devices is cumbersome. Provide a
> mechanism to load a custom KPU (Key Parse Unit) profile directly from
> the filesystem at module load time.
>
> When the rvu_af module is loaded with the kpu_profile parameter, the
> specified profile is read from /lib/firmware/kpu and programmed into
> the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
> used by filesystem-loaded profiles and support ptype/ptype_mask in
> npc_config_kpucam when profile->from_fs is set.
>
> Usage:
>   1. Copy the KPU profile file to /lib/firmware/kpu.
>   2. Build OCTEONTX2_AF as a module.
>   3. Load: insmod rvu_af.ko kpu_profile=<profile_name>
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://sashiko.dev/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>>
>>  	/* Verify if profile fits the HW */
>> +	if (fw->kpus > rvu->hw->npc_kpus) {
>> +		dev_warn(rvu->dev, "Not enough KPUs: %d > %d\n", fw->kpus,
>> +			 rvu->hw->npc_kpus);
>> +		return -EINVAL;
>> +	}
>> +
>> +	/* Check if there is enough memory */
>>  	if (fw->kpus > profile->kpus) {
>This is a pre-existing issue, but does this code read out-of-bounds when
>parsing firmware KPU profiles?
Pre-existing issue; we will address it in a separate patch after this series
is merged.

>>
>> +	for (kpu = 0; kpu < fw->kpus; kpu++) {
>> +		if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
>> +			dev_warn(rvu->dev,
>> +				 "Profile size mismatch on KPU%i parsing\n",
>> +				 kpu + 1);
>> +			return -EINVAL;
>> +		}
>> +
>> +		fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
>> +		if (fw_kpu->entries < 0) {
>This is also a pre-existing issue, but does this check properly prevent an
>out-of-bounds read?
Pre-existing issue; we will address it in a separate patch after this series
is merged.
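Both pre-existing questions above concern reading past the end of a firmware blob. One common way to harden such a parser is to route every read through a cursor that checks the remaining length first. The sketch below illustrates that technique in userspace only. struct blob_cursor and blob_take() are hypothetical and do not exist in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cursor over a firmware blob already loaded into memory. */
struct blob_cursor {
	const uint8_t *data;
	size_t size;	/* total bytes in the blob */
	size_t off;	/* bytes consumed so far */
};

/*
 * Copy len bytes into dst and advance, or fail without moving the cursor
 * if the blob does not have that many bytes left. Comparing against the
 * remaining length avoids overflow in off + len.
 */
static int blob_take(struct blob_cursor *c, void *dst, size_t len)
{
	assert(c->off <= c->size);
	if (len > c->size - c->off)
		return -EINVAL;
	memcpy(dst, c->data + c->off, len);
	c->off += len;
	return 0;
}
```

A failed read leaves the offset unchanged, so the caller can report exactly where the blob was truncated.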

>> +	/* Binary blob contains ikpu actions entries at start of data[0] */
>> +	profile->ikpu2 = devm_kcalloc(rvu->dev, 1,
>> +				      sizeof(ikpu_action_entries),
>> +				      GFP_KERNEL);
>Will this leak devm-managed memory if filesystem firmware loading fails?
Yes, but it is not an issue, as there is only one AF device per system. The 1st patch in the
series enforces that.

> +	/* The firmware layout does dependent on the internal size of
>> +	 * ikpu_action_entries.
>> +	 */
>> +	memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
>> +	offset += sizeof(ikpu_action_entries);
>Does this tightly couple the firmware ABI to the kernel's compile-time
>array size?
Yes. There is no field in the structure to indicate the length. We can't add one now,
as it would break backward compatibility, so we keep the same number of entries at all
times.

>The filesystem-based KPU profile parser copies data and advances its buffer
>offset using the compile-time sizeof(ikpu_action_entries). If future kernels
>add new packet kinds (PKINDs), the array size will increase.
>Because the firmware binary does not encode the length of this section, older
>firmware loaded onto a newer kernel will be parsed incorrectly, shifting the
>offset and causing the driver to read garbage for the remaining KPU headers,
>which breaks firmware ABI compatibility.
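Given the constraint stated in the reply (no length field, so the entry count must never change), one option is to freeze the on-disk section size as a constant and let the compiler check that the in-kernel table still matches it. The sketch below assumes a hypothetical 16-byte entry type and made-up sizes (64 entries, 1024 bytes). None of the SKETCH_* names or skip_ikpu_section() come from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one ikpu action entry (two 64-bit words). */
struct sketch_ikpu_action {
	uint64_t w0;
	uint64_t w1;
};

/* Hypothetical frozen ABI values: entry count and on-disk section size. */
#define SKETCH_IKPU_ENTRIES		64
#define SKETCH_IKPU_SECTION_BYTES	1024

/*
 * Break the build, rather than the firmware parser, if the in-kernel
 * table ever stops matching the frozen on-disk section size.
 */
_Static_assert(SKETCH_IKPU_ENTRIES * sizeof(struct sketch_ikpu_action) ==
	       SKETCH_IKPU_SECTION_BYTES,
	       "ikpu action table no longer matches the firmware blob layout");

/* Advance the parse offset by the frozen section size, not sizeof(table). */
static size_t skip_ikpu_section(size_t offset)
{
	assert(offset <= SIZE_MAX - SKETCH_IKPU_SECTION_BYTES);
	return offset + SKETCH_IKPU_SECTION_BYTES;
}
```

If a future change grows the table, the build fails immediately instead of older firmware being parsed with a shifted offset.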


* Re: [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc
  2026-06-05  6:32 ` [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc Ratheesh Kannoth
@ 2026-06-08  2:24   ` Ratheesh Kannoth
  2026-06-08  2:31   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:24 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:44, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Default CN20K NPC rule allocation now keys off the active MCAM keyword
> width: use X4 with a bank-masked reference index when the silicon uses
> X4 keys, and X2 with the raw index otherwise (replacing the previous
> always-X2 / eidx + 1 behaviour).
>
> In the AF flow-install path, flows that need more than 256 key bits
> query the NPC profile; if the platform is fixed to X2 entries, fail
> with -EOPNOTSUPP instead of requesting X4. Otherwise select X4 for the
> MCAM alloc.
>
> On the PF, cache and pass the profile kw_type from npc_get_pfl_info
> through otx2_mcam_pfl_info_get(), and use it when allocating MCAM
> entries for RSS/defaults and when installing ethtool flows on CN20K,
> including masking the reference index for X4 slot layout.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://sashiko.dev/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> +static int otx2_mcam_pfl_info_get(struct otx2_nic *pfvf, u16 *x4_slots, u8 *kw_type)
>>  {
>>  	struct npc_get_pfl_info_rsp *rsp;
>>  	struct msg_req *req;
>>  	static struct {
>>  		bool is_set;
>> -		bool is_x2;
>> +		u8 kw_type;
>>  		u16 x4_slots;
>>  	} pfl_info;
>This is a pre-existing issue, but does caching the hardware profile information
>in this static structure lead to data races when multiple devices probe
>concurrently?
The HW profile information is unique. The AF processes profile-read mbox
messages from PFs serially. Once the first PF updates this information, the
is_set flag is set to true, and subsequent PFs read the cached value directly
instead of sending an mbox request to the AF. There won't be any race, as the
values are identical and are updated serially.
Secondly, this is a pre-existing issue, so we can add some hardening logic
(locking) in a new series after this patch series is merged.
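As an illustration of the hardening mentioned above, the cache could be filled once under a lock. The sketch below is a userspace model only: it assumes a pthread mutex in place of a kernel mutex, and pfl_cache_get() plus the fetch callback (standing in for the npc_get_pfl_info mbox request) are hypothetical names. Only kw_type and x4_slots come from the patch.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the cached profile info. kw_type and x4_slots follow
 * the patch; the mutex and the fetch callback are assumptions. */
struct pfl_cache {
	pthread_mutex_t lock;
	bool is_set;
	uint8_t kw_type;
	uint16_t x4_slots;
};

static struct pfl_cache pfl_info = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Hypothetical stand-in for the npc_get_pfl_info mbox request. */
typedef int (*pfl_fetch_fn)(uint8_t *kw_type, uint16_t *x4_slots);

static int pfl_cache_get(struct pfl_cache *c, pfl_fetch_fn fetch,
			 uint8_t *kw_type, uint16_t *x4_slots)
{
	int rc = 0;

	pthread_mutex_lock(&c->lock);
	if (!c->is_set) {
		/* Only the first caller issues the request. */
		rc = fetch(&c->kw_type, &c->x4_slots);
		if (!rc)
			c->is_set = true;
	}
	if (!rc) {
		*kw_type = c->kw_type;
		*x4_slots = c->x4_slots;
	}
	pthread_mutex_unlock(&c->lock);
	return rc;
}

/* Test double that counts how often the "mbox" is used. */
static int fetch_calls;

static int fake_fetch(uint8_t *kw_type, uint16_t *x4_slots)
{
	fetch_calls++;
	*kw_type = 2;		/* arbitrary value */
	*x4_slots = 8192;	/* arbitrary value */
	return 0;
}
```

Concurrent callers serialize on the lock, so the request is issued at most once even if several PFs probe in parallel, and is_set is never observed without the cached values.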

>+	u8 kw_type = 0;
>> +	u16 x4_slots;
>>  	u16 dft_idx;
>>
>>  	ref = 0;
>> @@ -315,6 +319,16 @@ int otx2_mcam_entry_init(struct otx2_nic *pfvf)
>>  	if (!flow_cfg->def_ent)
>>  		return -ENOMEM;
>This isn't a bug introduced by this patch, but does otx2_mcam_entry_init()
>leak the def_ent array during repeated flow parameter updates?
>
Pre-existing issue; will address it in a separate patch after this series
is merged.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.
  2026-06-05  6:32 ` [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically Ratheesh Kannoth
@ 2026-06-08  2:25   ` Ratheesh Kannoth
  2026-06-08  2:32   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:25 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:45, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Replace the file-scope static npc_priv with a kcalloc'd struct filled
> from hardware bank/subbank geometry at init (num_banks is no longer a
> const compile-time constant; drop init_done and use a non-NULL
> npc_priv pointer for liveness). Thread npc_priv_get() / pointer access
> through the CN20K NPC code paths, extend teardown to kfree the root
> struct on failure and in npc_cn20k_deinit, and adjust MCAM section
> setup to use the discovered subbank count.
>
> Allocate MCAM debugfs dstats via devm_kzalloc instead of a static matrix,
> and use the allocated backing store consistently when computing deltas
> (including the counter rollover compare).
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://sashiko.dev/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>>   */
>> -static u64 dstats[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS] = {};
>> +static u64 (*dstats)[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS];
>> +
>>  static int npc_mcam_dstats_show(struct seq_file *s, void *unused)
>>  {
>What happens if npc_priv is NULL when this function is called?
npc_priv is allocated during probe(), well before the debugfs entries are created.

>>  	debugfs_create_file("vidx2idx", 0444, rvu->rvu_dbg.npc,
>>  			    npc_priv, &npc_vidx2idx_map_fops);
>>
>> +	dstats = devm_kzalloc(rvu->dev, sizeof(*dstats), GFP_KERNEL);
>> +	if (!dstats)
>> +		return -ENOMEM;
>> +
>Does allocating dstats per-device using devm_kzalloc overwrite the global
>dstats pointer in multi-device systems?
There is only one AF device per system; the first patch in the series enforces this.

>>  	int bank_off, sb_id;
>>
>>  	/* mcam_idx should be less than (2 * bank depth) */
>> -	if (mcam_idx >= npc_priv.bank_depth * 2) {
>> +	if (mcam_idx >= npc_priv->bank_depth * 2) {
>Is it possible for unprivileged Virtual Functions to trigger a denial of
>service here?
>Mailbox handlers invoke npc_mcam_idx_2_key_type(). If a VF sends a mailbox
>message early during host probe, or after an NPC initialization failure,
>npc_priv might be NULL. Would this crash the host kernel without a
>NULL pointer check?
Not possible, as a VF/PF can be probed only after the AF is initialized properly.

>> -	rsp->kw_type = npc_priv.kw;
>> -	rsp->x4_slots = npc_priv.bank_depth;
>> +	rsp->kw_type = npc_priv->kw;
>> +	rsp->x4_slots = npc_priv->bank_depth;
>>  	return 0;
>>  }
>Could this also crash if npc_priv is NULL?
>Since mailbox handlers process messages from Virtual Functions, checking
>npc_priv for NULL before dereferencing seems necessary to prevent a crash.
Not possible, as a VF/PF can be probed only after the AF is initialized properly.

> -	npc_priv.num_subbanks = num_subbanks;
>> -	npc_priv.bank_depth = bank_depth;
>> -	npc_priv.subbank_depth = subbank_depth;
>> +	npc_priv = kcalloc(1, sizeof(*npc_priv), GFP_KERNEL);
>Will probing a second RVU device overwrite the npc_priv pointer allocated
>by the first device?
>Since npc_priv remains a global static pointer, it appears this should be
>stored in a per-device structure like struct rvu rather than a global
>variable to support multi-device setups safely.
There is only one AF device per system; the first patch in the series enforces this.

>
>> +	kfree(npc_priv);
>> +	npc_priv = NULL;
>>  }
>If any single RVU device unbinds or fails probing, will this free the
>global npc_priv and set it to NULL?
>If so, this seems like it would corrupt the shared state and cause
>use-after-free or NULL pointer crashes for other active RVU devices on
>the system.
Unbinding and failure handling in the AF driver are not complete, and are not
in the scope of this patch. Will work on proper error handling (hardening
patch series) once this series is merged.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe
  2026-06-05  6:32 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
  2026-06-08  2:17   ` Ratheesh Kannoth
@ 2026-06-08  2:25   ` Ratheesh Kannoth
  2026-06-08 22:40   ` Jakub Kicinski
  2 siblings, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:25 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:37, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> There is only one admin-function PCI device per system.
> Reject any additional AF probe with -EBUSY so the driver model matches
> hardware and automated reviewers can rely on a single bound instance.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

>> --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
>> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu.c
>> @@ -3542,12 +3542,19 @@ static void rvu_update_module_params(struct rvu *rvu)
>>  		kpu_profile ? kpu_profile : default_pfl_name, KPU_NAME_LEN);
>>  }
>>
>> +static atomic_t device_bound = ATOMIC_INIT(0);
>> +
>>  static int rvu_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>>  {
>>  	struct device *dev = &pdev->dev;
>>  	struct rvu *rvu;
>>  	int    err;
>>
>> +	if (atomic_cmpxchg(&device_bound, 0, 1) != 0) {
>> +		dev_warn(dev, "Only one af device is supported.\n");
>> +		return -EBUSY;
>> +	}
>> +
>Once this atomic_cmpxchg() succeeds, where is device_bound ever reset
>back to 0?
>The flag is set on entry but the function does not clear it on any of
>the err_* failure labels, and rvu_remove() also does not touch it.
>Two scenarios appear to leave the driver permanently unusable until
>module unload:
>1) Any transient probe failure after the cmpxchg (devm_kzalloc returning
>   NULL, pci_enable_device(), dma_set_mask_and_coherent(), pcim_iomap(),
>   rvu_setup_hw_resources(), rvu_mbox_init(), rvu_register_interrupts(),
>   rvu_register_dl(), rvu_enable_sriov(), etc.) returns via an err_*
>   label with device_bound still set to 1. Wouldn't every subsequent
>   probe attempt — including deferred-probe retries on the same device
>   — then fail with -EBUSY even though nothing is bound?
>2) A successful probe followed by sysfs unbind, PCI hot-remove, or a
>   driver rebind goes through rvu_remove(), which tears down the rvu
>   instance but does not clear device_bound. Doesn't this break
>   rebinding the same device (echo <bdf> > /sys/bus/pci/drivers/rvu_af/bind)
>   and PCI error-recovery flows for the lifetime of the module load?
>The commit message describes the intent as rejecting "any additional AF
>probe" so there is "a single bound instance", which reads as one bound
>at a time. Should device_bound be cleared (atomic_set(&device_bound, 0))
>on every err_* path in rvu_probe() and at the end of rvu_remove() to
>match that intent?

You are completely correct. The current implementation lacks
transactional rollback for device_bound in the downstream
rvu_probe() error paths, as well as the corresponding reset
in rvu_remove().

The inclusion of atomic_cmpxchg() here is a proactive sanity
check to enforce the hardware paradigm, as firmware instantiates
only a single Admin Function (AF) PCI device. However, full
error-handling path hardening and proper resource cleanup for
the AF driver are currently incomplete across the broader
initialization sequence.

To prevent scope creep in this series, we plan to address the comprehensive
error-path rollback, including proper atomic_set(&device_bound, 0)
invocations on probe failure and driver detachment, in a dedicated,
subsequent hardening patchset targeted for net-next.
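A minimal userspace sketch of that rollback, assuming C11 atomics in place of atomic_cmpxchg()/atomic_set(). model_probe(), model_remove() and the probe_body_fn callback are hypothetical names used for illustration; this is not the planned kernel patch.

```c
#include <errno.h>
#include <stdatomic.h>

/* Userspace model of the single-instance guard: 0 = free, 1 = bound. */
static atomic_int device_bound = 0;

/* Hypothetical stand-in for the rest of rvu_probe(); 0 or -errno. */
typedef int (*probe_body_fn)(void);

static int model_probe(probe_body_fn body)
{
	int expected = 0;
	int err;

	/* Same semantics as atomic_cmpxchg(&device_bound, 0, 1) != 0. */
	if (!atomic_compare_exchange_strong(&device_bound, &expected, 1))
		return -EBUSY;

	err = body();
	if (err) {
		/* Roll back so a later (e.g. deferred) probe can retry. */
		atomic_store(&device_bound, 0);
		return err;
	}
	return 0;
}

static void model_remove(void)
{
	/* Release the slot at the end of teardown so rebind works. */
	atomic_store(&device_bound, 0);
}

static int body_ok(void) { return 0; }
static int body_fail(void) { return -ENOMEM; }
```

With the rollback in place, a failed probe or an unbind leaves the guard free, while a second concurrent probe still sees -EBUSY.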

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements
  2026-06-05  6:32 ` [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
  2026-06-08  2:20   ` Ratheesh Kannoth
@ 2026-06-08  2:26   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:26 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:38, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Improve MCAM visibility and field debugging for CN20K NPC.
>
> - Extend "mcam_layout" to show enabled (+) or disabled state per entry
>   so status can be verified without parsing the full "mcam_entry" dump.
> - Add "dstats" debugfs entry: for enabled MCAM indices, print hit deltas
>   since the prior read by comparing hardware counters to a per-entry
>   software baseline and advancing that baseline after each read (hardware
>   counters are not cleared).
> - Add "mismatch" debugfs entry: lists MCAM entries that are enabled
>   but not explicitly allocated, helping diagnose allocation/field issues.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> +			stats = rvu_read64(rvu, blkaddr,
>> +					   NPC_AF_CN20K_MCAMEX_BANKX_STAT_EXT(idx, bank));
>> +			if (!stats)
>> +				continue;
>> +			if (stats == dstats[bank][idx])
>> +				continue;
>> +
>> +			if (stats < dstats[bank][idx])
>> +				dstats[bank][idx] = 0;
>Can dstats[bank][idx] overflow this static array?
No. This array is sized with the maximum values:
MAX_NUM_BANKS, MAX_SUBBANK_DEPTH and MAX_NUM_SUB_BANKS.

>The header comment asserts that MAX_NUM_BANKS, MAX_NUM_SUB_BANKS and
>MAX_SUBBANK_DEPTH are hard limits "preventing any possibility of
>out-of-bounds access", but npc_priv_init() does not enforce that
>contract on bank_depth itself. It only checks num_banks, num_subbanks,
>and subbank_depth = bank_depth / num_subbanks.
The maximum bank_depth across SoCs is 8192. We will add a check on bank_depth
in npc_priv_init() as part of a hardening series to net-next after this
patch series is merged.

>Is the dstats baseline correct on the first read, and after an MCAM slot
>is freed and reallocated?
>The commit message describes dstats as printing "hit deltas since the
>prior read by comparing hardware counters to a per-entry software
>baseline and advancing that baseline after each read". The baseline
>dstats[bank][idx] is keyed only by physical hardware slot.
>On the first read after module load, dstats[bank][idx] is 0, so:
>	delta = stats - dstats[bank][idx];
>prints the cumulative counter rather than a delta.
>When an MCAM slot is freed and reallocated to a different rule or PF
>(and the hardware counter is preserved across moves, as
>npc_defrag_move_vdx_to_free does via rvu_write64(... STAT_EXT ...,
>stats)), the retained baseline mixes counts that belonged to the
>previous owner into the delta attributed to the new PF.
>Should the baseline be cleared inside npc_cn20k_enable_mcam_entry() on
>each enable/disable transition, in addition to whatever happens on read?
>Is reading dstats meant to be destructive?
>Each read advances dstats[bank][idx] = stats, so two concurrent readers
>(for example a monitoring daemon and a human admin) race for events:
>whichever reads first advances the baseline, and the other sees zero
>deltas for entries it would otherwise have observed. Most observability
>files under debugfs/procfs/sysfs are passive snapshots so multiple tools
>can scrape them concurrently. Would a snapshot file plus userspace delta
>computation, or a separate write-to-clear sibling, fit better here?

"dstats" is for live debugging in field, where user wants to know which NPC
mcam entry is being hit; this debugfs entry helps identify that entry.
If an MCAM entry is allocated, freed and allocated again, the first 'cat' of
the debugfs entry may show a wrong delta. The next read shows the correct
value, as the s/w baseline array will by then have been updated to the hw
stats value.
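The per-entry baseline logic quoted above from npc_mcam_dstats_show() can be modelled as follows. This is a simplified sketch; dstats_delta() is a hypothetical helper. It shows why the first read after module load or slot reuse reports the cumulative counter, while later reads report true deltas.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of one dstats slot: *baseline is the software baseline for one
 * MCAM entry. Returns true and sets *delta when a line would be printed. */
static bool dstats_delta(uint64_t hw_stats, uint64_t *baseline,
			 uint64_t *delta)
{
	if (!hw_stats || hw_stats == *baseline)
		return false;		/* no hits, or unchanged since last read */

	if (hw_stats < *baseline)
		*baseline = 0;		/* counter went backwards: restart */

	*delta = hw_stats - *baseline;
	*baseline = hw_stats;		/* each read advances the baseline */
	return true;
}
```

For example, reads of 100, 100, 130 and then 5 report 100 (cumulative), nothing, 30, and 5.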

>> +
>> +			pf = 0xFFFF;
>> +			map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
>> +			if (map)
>> +				pf = xa_to_value(map);
>> +
>> +			delta = stats - dstats[bank][idx];
>> +
>> +			snprintf(buff, sizeof(buff), "%u\t%#04x\t%llu\n",
>> +				 mcam_idx, pf, delta);
>> +			seq_puts(s, buff);
>> +
>> +			dstats[bank][idx] = stats;
>> +		}
>> +	}
>> +
>> +	mutex_unlock(&stats_lock);
>> +	return 0;
>> +}
>> +
>> +/*  "%u\t%#04x\t%llu\n" needs less than 64 characters to print */
>> +#define TOTAL_SZ (MAX_NUM_BANKS * MAX_NUM_SUB_BANKS * MAX_SUBBANK_DEPTH * 64)
>> +DEFINE_OCTEONTX2_DEBUGFS_ATTRIBUTE_WITH_SIZE(npc_mcam_dstats, TOTAL_SZ);
>Is the up-front 1 MiB allocation per open() necessary here?
>TOTAL_SZ evaluates to MAX_NUM_BANKS * MAX_NUM_SUB_BANKS *
>MAX_SUBBANK_DEPTH * 64 = 1,048,576 bytes, and single_open_size() asks
>seq_file to preallocate that buffer regardless of how much the file
>actually emits. dstats only emits lines for entries with changed
>counters, and mismatch is typically a few lines.
>The size also does not scale to the actual hardware geometry; it uses
>the MAX constants instead of npc_priv->num_banks * npc_priv->bank_depth.
>Could the standard seq_file iterator pattern (start/next/stop/show via
>DEFINE_SHOW_ATTRIBUTE) be used instead, so the buffer grows on demand.

The buffer is preallocated to the maximum size required.

>>  	}
>>
>> @@ -4513,10 +4527,15 @@ static int npc_priv_init(struct rvu *rvu)
>>  		return -EINVAL;
>>  	}
>>
>> -	npc_priv.num_subbanks = num_subbanks;
>> -
>>  	subbank_depth =	bank_depth / num_subbanks;
>> +	if (subbank_depth > MAX_SUBBANK_DEPTH) {
>> +		dev_err(rvu->dev,
>> +			"Invalid subbank depth %u\n",
>> +			subbank_depth);
>> +		return -EINVAL;
>> +	}
>>
>> +	npc_priv.num_subbanks = num_subbanks;
>>  	npc_priv.bank_depth = bank_depth;
>>  	npc_priv.subbank_depth = subbank_depth;
>Tying back to the dstats overflow question above: bank_depth itself is
>not bounded here. With hardware reporting bank_depth = 8193 and
>num_subbanks = 32, integer division gives subbank_depth = 256, which
>passes this check, but bank_depth still exceeds MAX_NUM_SUB_BANKS *
>MAX_SUBBANK_DEPTH = 8192. Should there be an explicit check on
>bank_depth, or a check that bank_depth == subbank_depth * num_subbanks,
>to keep the en_map bitmap and dstats[][] within their declared sizes?
bank_depth does not exceed MAX_BANK_DEPTH on any SoC. Will add a check
(bank_depth <= MAX_BANK_DEPTH) in npc_priv_init(), during probe(), in the
next patch series (the hardening series).
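A sketch of the check discussed above, written as a standalone helper. The MAX_* values are assumptions inferred from the review arithmetic (32 * 256 = 8192 entries per bank, and 2 * 32 * 256 * 64 = 1 MiB for TOTAL_SZ), and npc_check_geometry() is a hypothetical name. The exact-division test rejects geometries such as bank_depth = 8100 with 32 subbanks, which the existing subbank_depth check alone lets through.

```c
#include <errno.h>

/* Assumed limits, inferred from the review's arithmetic. */
#define MAX_NUM_BANKS		2
#define MAX_NUM_SUB_BANKS	32
#define MAX_SUBBANK_DEPTH	256
#define MAX_BANK_DEPTH		(MAX_NUM_SUB_BANKS * MAX_SUBBANK_DEPTH)

/* Hypothetical helper: validate discovered geometry before en_map and
 * dstats[][] are indexed with it. */
static int npc_check_geometry(unsigned int num_banks,
			      unsigned int num_subbanks,
			      unsigned int bank_depth)
{
	unsigned int subbank_depth;

	if (!num_banks || num_banks > MAX_NUM_BANKS)
		return -EINVAL;
	if (!num_subbanks || num_subbanks > MAX_NUM_SUB_BANKS)
		return -EINVAL;
	if (bank_depth > MAX_BANK_DEPTH)
		return -EINVAL;

	subbank_depth = bank_depth / num_subbanks;
	if (subbank_depth > MAX_SUBBANK_DEPTH)
		return -EINVAL;

	/* Integer division hides a remainder; reject uneven splits. */
	if (subbank_depth * num_subbanks != bank_depth)
		return -EINVAL;

	return 0;
}
```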

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control
  2026-06-05  6:32 ` [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
  2026-06-08  2:22   ` Ratheesh Kannoth
@ 2026-06-08  2:28   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:28 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:41, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> CN20K NPC MCAM is split into 32 subbanks that are searched in a
> predefined order during allocation. Lower-numbered subbanks have
> higher priority than higher-numbered ones.
>
> Add a runtime "srch_order" to control the order in which
> subbanks are searched during MCAM allocation.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> +		USED = 1,
>> +	};
>> +
>> +	if (cnt != npc_priv.num_subbanks) {
>> +		dev_err(rvu->dev, "Number of entries(%u) != %u\n",
>> +			cnt, npc_priv.num_subbanks);
>> +		return -EINVAL;
>> +	}
>> +
>> +	mutex_lock(&mcam->lock);
>> +	npc_lock_all_subbank();
>Can the encoding of subbank_srch_order[] be reconciled between
>initialization and this setter?
>The boot-time fill done by npc_create_srch_order() and npc_subbank_init()
>populates the array as subbank_srch_order[subbank_idx] = priority (i.e.
>indexed by subbank, value is priority). For cnt = 8 it produces
>[3, 5, 2, 6, 1, 7, 0, 4], whose actual hardware search order is
>sb6, sb4, sb2, sb0, sb7, sb1, sb3, sb5.
>The new setter writes the same array back as
>subbank_srch_order[priority] = subbank_idx (indexed by priority, value is
>subbank):
>    for (int i = 0; i < cnt; i++)
>        subbank_srch_order[i] = (u32)narr[i];
>If a user runs devlink dev param show (which calls the GET path returning
>the boot-time encoding) and then re-applies the same array via
>devlink dev param set, the SET path treats it as a permutation of subbank
>indices and produces search order sb3, sb5, sb2, sb6, sb1, sb7, sb0, sb4
>instead of the previous order.
>The validator also checks that arr[i] in 0..sz-1 occurs exactly once each,
>which matches the SET form but is also true of the boot-time content
>(subbank_srch_order[7] == 4 in the example), so the validator does not
>catch the round-trip mismatch.
>Should both writers agree on a single encoding so that show + set
>round-trips are stable?

Agreed, but npc_create_srch_order() and npc_subbank_init() are not added by
this patch; this is a pre-existing issue. Will send a patch to the "net" tree
soon after this series is merged.
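The two encodings described in the review are inverse permutations of each other. A sketch of the conversion, assuming the boot-time array is indexed by subbank with the priority as value; srch_order_invert() is a hypothetical helper, not part of the driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Convert "indexed by subbank, value = priority" into
 * "indexed by priority, value = subbank". Returns false if 'in' is not a
 * permutation of 0..n-1. Assumes n <= 64 (32 subbanks on CN20K). */
static bool srch_order_invert(const unsigned int *in, unsigned int *out,
			      size_t n)
{
	bool seen[64] = { false };

	if (n > 64)
		return false;

	for (size_t sb = 0; sb < n; sb++) {
		unsigned int prio = in[sb];

		if (prio >= n || seen[prio])
			return false;	/* out of range or duplicate */
		seen[prio] = true;
		out[prio] = (unsigned int)sb;
	}
	return true;
}
```

Applied to the review's example [3, 5, 2, 6, 1, 7, 0, 4], this yields [6, 4, 2, 0, 7, 1, 3, 5], i.e. the search order sb6, sb4, sb2, sb0, sb7, sb1, sb3, sb5.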

> +			xa_erase(&npc_priv.xa_sb_used, prio);
>> +	}
>> +
>> +	for (int i = 0; i < cnt; i++)
>> +		subbank_srch_order[i] = (u32)narr[i];
>> +
>> +	restrict_valid = false;
>Is the unconditional restrict_valid = false here intentional, and is
>there a way to undo it?

No.
This is intentional.

>The cleanup loop directly above does not only erase newly-added entries.
>The check rsrc[FREE][prio] == 1 (and the USED counterpart) also fires for
>slots whose successful iteration replaced a pre-existing entry whose
>paired decrementing iteration never ran due to the failure.
>For example, with sb 0 FREE arr_idx = 0 and sb 1 FREE arr_idx = 1 and
>narr = [1, 0]:
>  - iter prio = 0 succeeds: xa_sb_free[0] is replaced from 0 to 1,
>    rsrc[FREE][0] becomes +1, rsrc[FREE][1] becomes -1.
>  - iter prio = 1 fails.
>  - cleanup loop sees rsrc[FREE][0] == 1 and erases xa_sb_free[0].
>The rollback loop then calls xa_store(xa_sb_free, save[0] = 0, sb 0)
>against an empty slot, which does need a fresh xarray node under
>GFP_KERNEL. If that allocation fails, the only response is WARN(!!err, ...)
>and the function returns the original error.
>After such a failure, sb->arr_idx is restored but the corresponding
>xa_sb_used / xa_sb_free entry is gone. Would future MCAM allocations via
>npc_subbank_noref_alloc() then silently skip those subbanks? Could
>xa_reserve() before the destructive phase, or not pre-erasing entries in
>the cleanup loop, make rollback unconditionally safe?

Suppose there are only two subbanks and both are free.

When the first loop runs:

rsrc[FREE][0] = 1 - 1 = 0
rsrc[FREE][1] = 1 - 1 = 0

The xa_erase() cleanup only acts on positive values; in this case nothing is
positive, so nothing will be removed, and rollback is safe.
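The counter arithmetic from both messages can be checked with a small model. This is a simplification written for this thread, not the driver code: each completed iteration adds 1 to the target slot and subtracts 1 from the subbank's previous slot, and srch_order_net_counts() is a hypothetical name. With all iterations complete the counters cancel, as in the reply above; with only the first iteration applied, slot 0 stays at +1, which is the partial-failure case the review raises.

```c
#include <stddef.h>

/* old_idx[sb]: slot a subbank occupies before the update.
 * narr[prio]:  subbank requested for slot 'prio'.
 * Only the first 'done' iterations are applied, modelling a failure
 * part-way through the update. */
static void srch_order_net_counts(const int *old_idx, const int *narr,
				  size_t done, int *rsrc, size_t n)
{
	for (size_t i = 0; i < n; i++)
		rsrc[i] = 0;

	for (size_t prio = 0; prio < done; prio++) {
		rsrc[prio] += 1;		/* slot gains narr[prio] */
		rsrc[old_idx[narr[prio]]] -= 1;	/* its old slot loses it */
	}
}
```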

>> +
>> +	npc_unlock_all_subbank();
>> +	mutex_unlock(&mcam->lock);
>> +
>> +	return rc;
>> +}
>Is the locking on the allocator side sufficient to make this setter
>atomic with respect to MCAM allocation?

Yes. Allocation is done in the mbox handler, which is protected by mcam->lock.

>Is the locking on the allocator side sufficient to make this setter
>atomic with respect to MCAM allocation?
>The setter takes mcam->lock plus all per-subbank locks and rewrites
>xa_sb_used / xa_sb_free, modifies every sb->arr_idx, and toggles
>restrict_valid.
>The allocation hot path
>rvu_mbox_handler_npc_mcam_alloc_entry()
>  -> npc_mcam_alloc_entries()
>    -> npc_cn20k_ref_idx_alloc()
>      -> npc_subbank_noref_alloc()
>iterates xa_for_each(&npc_priv.xa_sb_used, ...) and
>xa_for_each(&npc_priv.xa_sb_free, ...) without mcam->lock held
>(npc_mcam_alloc_entries() takes mcam->lock only after
>npc_cn20k_ref_idx_alloc() returns) and without holding any per-subbank
>lock between iterations (the per-subbank lock is taken inside
>npc_subbank_alloc() / npc_subbank_suits() and released before the next
>iteration step).
>xa_for_each is RCU-safe so this is not a use-after-free, but a concurrent
>SET can re-key xarray entries while the allocator is mid-walk, causing
>the allocator to visit some subbanks twice and skip others entirely.
>restrict_valid is also a plain bool read by npc_subbank_restrict_usage()
>and npc_defrag_skip_restricted_sb() and a plain bool write by the setter,
>with no READ_ONCE / WRITE_ONCE.
>Should the allocator iteration also hold mcam->lock (or the relevant
>subbank locks) for the duration of the walk, and should restrict_valid
>use READ_ONCE / WRITE_ONCE?
mcam_allocation() is done under mcam->lock in the mbox handler, and mbox
messages are handled serially by the AF, so this case won't happen.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v19 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle
  2026-06-05  6:32 ` [PATCH v19 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle Ratheesh Kannoth
@ 2026-06-08  2:29   ` Ratheesh Kannoth
  0 siblings, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:29 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:42, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Add NIX_LF_DONT_FREE_DFT_IDXS so the PF can send NIX LF free during hw
> reinit or teardown without the AF freeing CN20K default NPC rule indexes
> while the driver still owns that state (otx2_init_hw_resources and
> otx2_free_hw_resources).
>
> On CN20K, allocate default NPC rules from NIX LF alloc before
> nix_interface_init, roll back with npc_cn20k_dft_rules_free on failure,
> and free from NIX LF free when the new flag is not set. Tighten
> rvu_mbox_handler_nix_lf_alloc error handling: use a single rc, propagate
> qmem_alloc and other errors, and set -ENOMEM only when kcalloc fails
> (remove the blanket -ENOMEM at the free_mem path).
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>>  		intf = NIX_INTF_TYPE_SDP;
>>
>> -	err = nix_interface_init(rvu, pcifunc, intf, nixlf, rsp,
>> -				 !!(req->flags & NIX_LF_LBK_BLK_SEL));
>> -	if (err)
>> -		goto free_mem;
>> +	if (is_cn20k(rvu->pdev)) {
>> +		rc = npc_cn20k_dft_rules_idx_get(rvu, pcifunc, &bcast, &mcast,
>> +						 &promisc, &ucast);
>> +		if (rc) {
>> +			rc = npc_cn20k_dft_rules_alloc(rvu, pcifunc);
>> +			if (rc)
>> +				goto free_mem;
>> +
>> +			rules_created = true;
>> +		}
>> +	}
>Does the rules_created flag faithfully reflect whether allocation
>actually happened?
>npc_cn20k_dft_rules_alloc() returns 0 without doing any work when the
>pcifunc is not CGX or LBK mapped (for example SDP):
>	if (!npc_is_cgx_or_lbk(rvu, pcifunc)) {
>		dev_dbg(rvu->dev,
>			"%s: dft rule allocation is only for cgx mapped device, pcifunc=%#x\n",
>			__func__, pcifunc);
>		return 0;
>	}
>For such a pcifunc, npc_cn20k_dft_rules_idx_get() returns -ESRCH, the
>if branch is taken, dft_rules_alloc() is a no-op returning 0, and
>rules_created becomes true even though no rule was created.
>The free_dft path then calls npc_cn20k_dft_rules_free(), which itself
>early-returns on the same !npc_is_cgx_or_lbk predicate, so the safety
>relies on that hidden coupling rather than an explicit signal.
>Would it be clearer to have npc_cn20k_dft_rules_alloc() return a
>distinct created/not-created indication, or to gate rules_created on
>npc_is_cgx_or_lbk(rvu, pcifunc) at the call site?
>> +
Default rules exist only for CGX/LBK interfaces. So for SDP, even though
npc_cn20k_dft_rules_alloc() returns 0 and rules_created is set to true, the
failure path calls dft_rules_free(), which has nothing to free. So this works.

>> +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c
>> @@ -4085,12 +4095,10 @@ void rvu_npc_clear_ucast_entry(struct rvu *rvu, int pcifunc, int nixlf)
>>
>>  	ucast_idx = npc_get_nixlf_mcam_index(mcam, pcifunc,
>>  					     nixlf, NIXLF_UCAST_ENTRY);
>> -	if (ucast_idx < 0) {
>> -		dev_err(rvu->dev,
>> -			"%s: Error to get ucast entry for pcifunc=%#x\n",
>> -			__func__, pcifunc);
>> +
>> +	/* In cn20k, default rules are freed before detach rsrc */
>> +	if (ucast_idx < 0)
>>  		return;
>> -	}
>The comment says this silent return is for the CN20K case where default
>rules are freed before detach rsrc, but the change drops the dev_err
>unconditionally for every silicon that reaches rvu_npc_clear_ucast_entry()
>via the generic rvu_detach_block() path, including CN10K and OcteonTX2.
>On legacy silicon a negative ucast_idx was previously treated as
>exceptional and worth logging; should this silent return be gated on
>is_cn20k(rvu->pdev) so the dev_err is preserved for the other variants?
No. In cn10k, the ucast entry is not dynamic, so it is treated as an error.
This is a teardown path, so we want to keep the same code for cn10k and cn20k.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem
  2026-06-05  6:32 ` [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth
  2026-06-08  2:23   ` Ratheesh Kannoth
@ 2026-06-08  2:30   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:30 UTC (permalink / raw)
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:43, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Flashing updated firmware on deployed devices is cumbersome. Provide a
> mechanism to load a custom KPU (Key Parse Unit) profile directly from
> the filesystem at module load time.
>
> When the rvu_af module is loaded with the kpu_profile parameter, the
> specified profile is read from /lib/firmware/kpu and programmed into
> the KPU registers. Add npc_kpu_profile_cam2 for the extended cam format
> used by filesystem-loaded profiles and support ptype/ptype_mask in
> npc_config_kpucam when profile->from_fs is set.
>
> Usage:
>   1. Copy the KPU profile file to /lib/firmware/kpu.
>   2. Build OCTEONTX2_AF as a module.
>   3. Load: insmod rvu_af.ko kpu_profile=<profile_name>
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> +	struct npc_kpu_profile_action *action2;
>> +	struct npc_kpu_profile_cam2 *cam2;
>>  };
>The on-disk difference between npc_kpu_profile_cam and
>npc_kpu_profile_cam2 is two extra trailing bytes (ptype, ptype_mask).
>Could a single normalized in-memory representation work, with the FS
>loader filling ptype/ptype_mask to zero when reading the legacy layout?
>Every consumer now has to dispatch through accessor helpers
>(npc_get_kpu_cam_nth_entry(), npc_get_num_kpu_cam_entries(),
>npc_get_ikpu_nth_entry(), ...) gated on profile->from_fs, and any new
>consumer that forgets the accessor reads the wrong table silently.
>The names cam2 / action2 / ikpu2 also do not carry semantic meaning on
>their own — would something tied to the format (for example
>cam_with_ptype) read more clearly?
>The from_fs flag also conflates two orthogonal concerns: where the bytes
>came from, and which on-disk schema they use. Is it intentional that the
>new extended format cannot be shipped through the FW-DB path, and that a
>legacy-format profile cannot be loaded from /lib/firmware/kpu/?
This is a new feature requested by a customer; prior to this, we did not
support loading profiles from the filesystem.

> +		u8 ptype = kpucam2->ptype;
>> +		u8 pmask = kpucam2->ptype_mask;
>> +
>> +		*val |= FIELD_PREP(GENMASK_ULL(57, 56), ptype & pmask);
>> +		*mask |= FIELD_PREP(GENMASK_ULL(57, 56), ~ptype & pmask);
>> +	}
>The encoding writes ptype into bits [57:56] of the 64-bit CAM word, but
>struct npc_kpu_cam declares those bits as a reserved 8-bit field:
>    struct npc_kpu_cam {
>    #if defined(__BIG_ENDIAN_BITFIELD)
>        u64 rsvd_63_56     : 8;
>        u64 state          : 8;
>        ...
>Should the bitfield in npc_kpu_cam be extended with a named ptype member
>so the encoding is visible in the struct layout (and checkable by sparse)?
>ptype and ptype_mask in npc_kpu_profile_cam2 are u8 (8-bit), but only the
>low 2 bits land in the CAM word via GENMASK_ULL(57, 56); any user passing
>ptype with bits above bit 1 silently has those bits dropped. Is that
>intentional, and if so could the field width be enforced (range-checked
>on load) to surface bad firmware data?
>There is also no #define for the bit positions or for the constant 0x03
>that npc_parser_profile_init() pairs with NPC_AF_PKINDX_TYPE later in
>this patch — a symbolic name would make the relationship between the
>two writes clear.
This is customer-specific, and all of these choices are intentional.
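For reference, the reviewer's range-check and named-constant suggestions could look roughly like the user-space sketch below. It keeps the same val/mask encoding as the quoted hunk (val gets ptype & pmask, mask gets ~ptype & pmask, both placed in bits [57:56]). KPU_CAM_PTYPE_SHIFT, KPU_CAM_PTYPE_MAX, kpu_ptype_valid() and kpu_encode_ptype() are hypothetical names, and plain shifts stand in for the kernel's GENMASK_ULL()/FIELD_PREP().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the patch uses GENMASK_ULL(57, 56) inline. */
#define KPU_CAM_PTYPE_SHIFT 56
#define KPU_CAM_PTYPE_WIDTH 2
#define KPU_CAM_PTYPE_MAX   ((1u << KPU_CAM_PTYPE_WIDTH) - 1) /* 0x3 */

/* Load-time check: reject firmware values wider than bits [57:56]. */
static bool kpu_ptype_valid(uint8_t ptype, uint8_t pmask)
{
	return ptype <= KPU_CAM_PTYPE_MAX && pmask <= KPU_CAM_PTYPE_MAX;
}

/* Same encoding as the quoted hunk; bits above bit 1 are dropped,
 * which is exactly what the range check above would surface. */
static void kpu_encode_ptype(uint8_t ptype, uint8_t pmask,
			     uint64_t *val, uint64_t *mask)
{
	uint64_t field = KPU_CAM_PTYPE_MAX;

	*val  |= ((uint64_t)(ptype & pmask) & field) << KPU_CAM_PTYPE_SHIFT;
	*mask |= ((uint64_t)(~ptype & pmask) & field) << KPU_CAM_PTYPE_SHIFT;
}
```

For example, ptype 6 with pmask 0xff encodes the same as ptype 2, because bit 2 does not fit; kpu_ptype_valid(6, 0xff) returns false.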

>[ ... ]
>> +static int npc_apply_custom_kpu_from_fs(struct rvu *rvu,
>> +					struct npc_kpu_profile_adapter *profile)
>> +{
>> +	size_t hdr_sz = sizeof(struct npc_kpu_profile_fwdata), offset = 0;
>> +	const struct npc_kpu_profile_fwdata *fw;
>> +	struct npc_kpu_profile_action *action;
>> +	struct npc_kpu_profile_cam2 *cam2;
>> +	struct npc_kpu_fwdata *fw_kpu;
>> +	int entries, ret, entry, kpu;
>>
>>  	fw = rvu->kpu_fwdata;
>>
>> +	/* Binary blob contains ikpu actions entries at start of data[0] */
>> +	profile->ikpu2 = devm_kcalloc(rvu->dev, 1,
>> +				      sizeof(ikpu_action_entries),
>> +				      GFP_KERNEL);
>> +	if (!profile->ikpu2)
>> +		return -ENOMEM;
>> +
>> +	action = (struct npc_kpu_profile_action *)(fw->data + offset);
>> +
>> +	if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))
>> +		return -EINVAL;
>Does the order here leave profile->ikpu2 charged to the device with
>uninitialized contents on the -EINVAL path?
>The devm_kcalloc() runs before the size check, and on error
>profile->from_fs is not set, so npc_get_ikpu_nth_entry() will not
>dereference ikpu2 today — but any future caller that consults
>profile->ikpu2 directly would see uninitialized memory at the right size.
>Could the validation move ahead of the allocation, with memcpy following
>both?

>
>> +	/* The firmware layout does dependent on the internal size of
>> +	 * ikpu_action_entries.
>> +	 */
>> +	memcpy((void *)profile->ikpu2, action, sizeof(ikpu_action_entries));
>> +	offset += sizeof(ikpu_action_entries);
>The fwdata header declares the KPU count via fw->kpus but does not
>declare an ikpu/pkind count for this leading region — its size is
>implicitly hardcoded to ARRAY_SIZE(ikpu_action_entries), an in-tree
>kernel array. The comment acknowledges this hidden ABI dependency.
>If ikpu_action_entries ever grows or shrinks in-tree, every previously
>distributed FS blob silently misaligns: the kernel reads the wrong
>number of bytes for ikpu actions, and all subsequent KPU CAM/action
>offsets are off. The size check
>"if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))" only
>ensures there are enough bytes — it does not ensure the bytes actually
>represent ikpu actions of that exact count.
>Could the on-disk format carry an explicit length / pkind count for the
>ikpu region so the kernel can validate and reject mismatched blobs?
>
Regarding the allocation order: if npc_apply_custom_kpu_from_fs() fails
with the -EINVAL mentioned above, npc_prepare_default_kpu(rvu, profile) in
npc_load_kpu_profile() reinitializes profile->ikpu2 to point to static memory.
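For comparison, the ordering the reviewer proposes (validate the size first, then allocate, then copy) might look like the user-space sketch below. struct ikpu_action, IKPU_ACTIONS, IKPU_REGION_SZ and load_ikpu_region() are hypothetical stand-ins with made-up sizes, and calloc() stands in for devm_kcalloc().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the element type/count of ikpu_action_entries. */
struct ikpu_action { unsigned char bytes[16]; };
#define IKPU_ACTIONS   64
#define IKPU_REGION_SZ (IKPU_ACTIONS * sizeof(struct ikpu_action))

/* Validate, then allocate, then copy: on -EINVAL nothing is allocated,
 * so no buffer with uninitialized contents is left behind. */
static int load_ikpu_region(const unsigned char *blob, size_t blob_sz,
			    size_t hdr_sz, struct ikpu_action **out)
{
	struct ikpu_action *ikpu;

	if (blob_sz < hdr_sz || blob_sz - hdr_sz < IKPU_REGION_SZ)
		return -EINVAL;

	ikpu = calloc(IKPU_ACTIONS, sizeof(*ikpu));
	if (!ikpu)
		return -ENOMEM;

	memcpy(ikpu, blob + hdr_sz, IKPU_REGION_SZ);
	*out = ikpu;	/* only written on success */
	return 0;
}
```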

Regarding the ikpu entry count: there is currently no field in the structure
that indicates the number of ikpu_action_entries in the binary blob, so we keep
that number the same in every binary blob. We can't add a size field, as it
would break backward compatibility.
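For illustration only, the reviewer's explicit-count idea could be checked as in the sketch below. struct fs_blob_hdr and its ikpu_count field are hypothetical (the existing npc_kpu_profile_fwdata has no such field, which is the compatibility problem described above), and endianness handling is omitted for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk header, host-endian for brevity. */
struct fs_blob_hdr {
	uint64_t signature;
	uint16_t ikpu_count;	/* ikpu action entries that follow */
	uint16_t kpus;
};

/* Reject a blob whose declared ikpu region does not match what the
 * kernel expects, or does not fit in the buffer. */
static int check_ikpu_region(const struct fs_blob_hdr *hdr, size_t blob_sz,
			     size_t expected_count, size_t entry_sz)
{
	size_t need;

	if (hdr->ikpu_count != expected_count)
		return -EINVAL;
	need = sizeof(*hdr) + (size_t)hdr->ikpu_count * entry_sz;
	if (blob_sz < need)
		return -EINVAL;
	return 0;
}
```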

>> +	for (kpu = 0; kpu < fw->kpus; kpu++) {
>[ ... ]
>> +		fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
>> +		if (fw_kpu->entries <= 0) {
>> +			dev_warn(rvu->dev,
>> +				 "Invalid kpu entries on KPU%d\n", kpu);
>> +			return -EINVAL;
>> +		}
>> +
>> +		entries = min_t(int, fw_kpu->entries, rvu->hw->npc_kpu_entries);
>[ ... ]
>> +		cam2 = (struct npc_kpu_profile_cam2 *)fw_kpu->data;
>> +		offset += sizeof(*fw_kpu) + fw_kpu->entries * sizeof(*cam2);
>> +		action = (struct npc_kpu_profile_action *)(fw->data + offset);
>> +		offset += fw_kpu->entries * sizeof(*action);
>> +		if (rvu->kpu_fwdata_sz < hdr_sz + offset) {
>Could fw_kpu->entries get an upper bound check before being used in the
>offset arithmetic? The FW-DB sibling npc_apply_custom_kpu_from_fw()
>clamps via min_t(int, fw_kpu->entries, KPU_MAX_CST_ENT) before doing the
>arithmetic, while this path only clamps after. On a 64-bit size_t the
>post-arithmetic check correctly catches malformed blobs, but defensive
>symmetry with the FW-DB path would be nice.
OK. I will add this to net-next after this series is merged.
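A bound on the entry count before any offset arithmetic, as the reviewer asks, might look like the sketch below. KPU_MAX_ENT and advance_kpu() are hypothetical; this version rejects (rather than clamps) an out-of-range count and compares against the remaining bytes so the running offset cannot wrap.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define KPU_MAX_ENT 128	/* hypothetical ceiling on entries per KPU */

/* Bound 'entries' before it feeds the offset arithmetic, then check the
 * step against the bytes that remain instead of a post-hoc sum. */
static int advance_kpu(size_t *offset, size_t data_sz, int entries,
		       size_t kpu_hdr_sz, size_t cam_sz, size_t action_sz)
{
	size_t step;

	if (entries <= 0 || entries > KPU_MAX_ENT)
		return -EINVAL;

	step = kpu_hdr_sz + (size_t)entries * (cam_sz + action_sz);
	if (*offset > data_sz || data_sz - *offset < step)
		return -EINVAL;

	*offset += step;
	return 0;
}
```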

>[ ... ]
>> +	fw = rvu->kpu_fwdata;
>>  	if (le64_to_cpu(fw->signature) != KPU_SIGN) {
>>  		dev_warn(rvu->dev, "Invalid KPU profile signature %llx\n",
>>  			 fw->signature);
>Two distinct on-disk binary formats now share the same KPU_SIGN
>("kpuprof\0") signature and the same npc_kpu_profile_fwdata header.
>The FW-DB layout expects fwdata->data[0] to start directly with KPU
>entries; the new FS layout expects data[0] to start with
>sizeof(ikpu_action_entries) bytes of ikpu actions, then KPU entries
>encoded as npc_kpu_profile_cam2 (with ptype/ptype_mask) instead of
>npc_kpu_profile_cam.
>Nothing in npc_apply_custom_kpu() distinguishes the two; the choice is
>made by which load path is invoked (the from_fs argument). Is that a
>deliberate decision?
Yes. A from_fs binary blob uses the npc_kpu_profile_cam2 format.

>If a user accidentally drops a FW-DB-format blob into /lib/firmware/kpu,
>the signature passes, npc_apply_custom_kpu_from_fs() memcpys the first
>sizeof(ikpu_action_entries) bytes (which are KPU CAM/action data in that
>layout) into profile->ikpu2, and npc_parser_profile_init() then programs
>those bytes into NPC pkind action registers via
>npc_config_kpuaction(rvu, blkaddr, npc_get_ikpu_nth_entry(rvu, idx),
>0, idx, true). The remaining KPU entries are read with the wrong layout
>(npc_kpu_profile_cam2 vs npc_kpu_profile_cam) on top.
>Could the blob carry a layout magic or version field independent of
>load source, and the loader reject mismatches?
The user has to load the correct firmware to get the requested feature. We
will pass your suggestion of using different signatures on to the customer.
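The separate-signature idea could be sketched as below. "kpuprof\0" is the existing KPU_SIGN string quoted in the review; "kpufs01\0", enum kpu_layout, kpu_blob_layout() and kpu_layout_matches() are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum kpu_layout { KPU_LAYOUT_UNKNOWN, KPU_LAYOUT_FWDB, KPU_LAYOUT_FS };

/* Classify the 8-byte signature. "kpuprof" and "kpufs01" are each
 * 7 characters plus the terminating NUL, i.e. 8 bytes. */
static enum kpu_layout kpu_blob_layout(const unsigned char sig[8])
{
	if (!memcmp(sig, "kpuprof", 8))
		return KPU_LAYOUT_FWDB;
	if (!memcmp(sig, "kpufs01", 8))	/* hypothetical FS (cam2) magic */
		return KPU_LAYOUT_FS;
	return KPU_LAYOUT_UNKNOWN;
}

/* The loader would refuse a blob whose layout does not match the
 * path it arrived through. */
static int kpu_layout_matches(const unsigned char sig[8], int from_fs)
{
	enum kpu_layout l = kpu_blob_layout(sig);

	return from_fs ? l == KPU_LAYOUT_FS : l == KPU_LAYOUT_FWDB;
}
```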

>> +	*fw_kpus = fw->kpus;
>The cn20k early return at the top of this function returns without
>writing *fw_kpus. Should that path also set *fw_kpus, or should the
>parameter be removed from cn20k call sites?
>Today npc_load_kpu_profile_from_fw() declares int fw_kpus = 0 and never
>reads it back, so the dead local hides the asymmetry. If cn20k is ever
>wired into the FS path the trailing
>"rvu->kpu.kpus = fw_kpus; profile->kpus = fw_kpus;" in
>npc_load_kpu_profile_from_fs() will silently zero kpus and disable KPU
>programming because *fw_kpus was never written.

Currently, only cn10k supports loading a custom binary from the filesystem.

>> +	int ret, fw_kpus = 0;
>> +	char path[512] = "kpu/";
>> +
>> +	if (strlen(kpu_profile) > sizeof(path) - strlen("kpu/") - 1) {
>> +		dev_err(rvu->dev, "kpu profile name is too big\n");
>> +		return -ENOSPC;
>> +	}
>> +
>> +	strcat(path, kpu_profile);
>rvu->kpu_pfl_name is char[KPU_NAME_LEN] (32 bytes), so strlen() of that
>buffer can never exceed sizeof(path) - 5 = 507. Is the strlen() check
>intended as defense-in-depth, or is a smaller path[KPU_NAME_LEN +
>sizeof("kpu/")] buffer (or snprintf(path, sizeof(path), "kpu/%.32s",
>kpu_profile)) preferable so the bound and intent line up?
We will add this in a defensive patchset to net-next after this patch series
is merged.
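A buffer sized from KPU_NAME_LEN, with snprintf() reporting truncation, could replace the 512-byte buffer plus strlen()/strcat(). This is a sketch; kpu_fw_path() is a hypothetical helper, and unlike the reviewer's "kpu/%.32s" variant it fails instead of silently truncating.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define KPU_NAME_LEN 32	/* size of rvu->kpu_pfl_name, per the review */

/* Build "kpu/<profile>"; a non-negative return >= len means the
 * output was truncated. */
static int kpu_fw_path(char *path, size_t len, const char *profile)
{
	int n = snprintf(path, len, "kpu/%s", profile);

	if (n < 0 || (size_t)n >= len)
		return -ENOSPC;
	return 0;
}
```

A caller would declare char path[sizeof("kpu/") + KPU_NAME_LEN], which is large enough for any NUL-terminated name stored in a char[KPU_NAME_LEN].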

>> +	rvu->kpu.kpus = fw_kpus;
>> +	profile->kpus = fw_kpus;
>> +	profile->from_fs = true;
>> +	return 0;
>> +}
>The success path leaves rvu->kpu_fwdata_sz non-zero while
>rvu->kpu_fwdata has been set to NULL. The error path correctly zeroes
>both.
>npc_load_mkex_profile() and npc_cn20k_load_mkex_profile() consult
>kpu_fwdata_sz to decide whether to skip the FW-DB mkex lookup, so the
>asymmetry is harmless today, but any future consumer that uses
>kpu_fwdata_sz to gate a kpu_fwdata dereference will hit a NULL deref.
>Could kpu_fwdata_sz be cleared symmetrically on success?
This is intentional. We don't need to keep this symmetry.

>> +void npc_load_kpu_profile(struct rvu *rvu)
>> +{
>[ ... ]
>> +	/* Filesystem-based KPU loading is not supported on cn20k.
>> +	 * npc_prepare_default_kpu() was invoked earlier, but control
>> +	 * reached this point because the default profile was not selected.
>> +	 * No need to call it again.
>> +	 */
>> +	if (!is_cn20k(rvu->pdev)) {
>> +		if (!npc_load_kpu_profile_from_fs(rvu))
>> +			return;
>> +	}
>> +
>> +	/* First prepare default KPU, then we'll customize top entries. */
>> +	npc_prepare_default_kpu(rvu, profile);
>> +	if (!npc_load_kpu_profile_from_fw(rvu))
>> +		return;
>>
>>  	npc_prepare_default_kpu(rvu, profile);
>>  }
>The comment says "No need to call it again" but the very next executable
>line after the if (!is_cn20k(...)) block is exactly
>npc_prepare_default_kpu(rvu, profile). The second call is in fact needed
>on non-cn20k to reset state corrupted by a partial FS attempt before the
>FW path runs. Could the comment be reworded to reflect that?
>
The comment indicates that npc_prepare_default_kpu() was already called
earlier, so we don't have to call it again before calling
npc_load_kpu_profile_from_fs() on cn10k.

>[ ... ]
>>  	for (idx = 0; idx < num_kpus; idx++)
>>  		npc_program_kpu_profile(rvu, blkaddr, idx, &rvu->kpu.kpu[idx]);
>> +
>> +	if (profile->from_fs) {
>> +		rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(54), 0x03);
>> +		rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(58), 0x03);
>> +	}
>>  }
>A few questions about these two writes:
>The pkind indices 54 and 58 match npc.h enum values
>NPC_RX_CPT_HDR_PTP_PKIND = 54 and NPC_RX_CPT_HDR_PKIND = 58 — could
>those named constants be used here so the relationship is explicit?
>The constant 0x03 has no symbolic name in rvu_reg.h, and there is no
>#define for the bits in NPC_AF_PKINDX_TYPE. Could a named macro be added
>that documents what value is being programmed?
>The same function carefully bounds pkind iteration earlier with
>"num_pkinds = min_t(int, hw->npc_pkinds, num_pkinds);" but these two
>writes target indices 54 and 58 unconditionally. hw->npc_pkinds comes
>from NPC_AF_CONST1[19:12] and is silicon-dependent — should these be
>gated on hw->npc_pkinds >= 59 (or on the corresponding pkind being
>defined) to avoid writing undefined CSRs on a future variant?
>Finally, is it intentional that this hardware register configuration is
>chosen based on the load mechanism (from_fs) rather than profile
>content? Any future FS profile that does not expect pkinds 54 and 58 to
>be configured this way would silently misbehave. Could the writes be
>driven from the profile data itself?

This is very customer-specific and intentional. I did not find any specific
names for these values, as npc_profile.h also contains many such values.
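The reviewer's named-constant and bounds-gating suggestions could be sketched as below. The pkind values 54 and 58 are taken from the review's quote of npc.h; NPC_PKIND_TYPE_CUSTOM is a hypothetical name for 0x03, and the array and write_pkind_type() are user-space stand-ins for rvu_write64(rvu, blkaddr, NPC_AF_PKINDX_TYPE(pkind), val).

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the npc.h enum values quoted in the review. */
#define NPC_RX_CPT_HDR_PTP_PKIND 54
#define NPC_RX_CPT_HDR_PKIND     58
/* Hypothetical name for the 0x03 written to NPC_AF_PKINDX_TYPE. */
#define NPC_PKIND_TYPE_CUSTOM    0x03

/* Stand-in for the NPC_AF_PKINDX_TYPE CSRs. */
static uint64_t pkind_type_regs[64];

static void write_pkind_type(int npc_pkinds, int pkind, uint64_t val)
{
	if (pkind >= npc_pkinds)	/* not implemented on this silicon */
		return;
	pkind_type_regs[pkind] = val;
}

static void program_fs_pkind_types(int npc_pkinds)
{
	write_pkind_type(npc_pkinds, NPC_RX_CPT_HDR_PTP_PKIND,
			 NPC_PKIND_TYPE_CUSTOM);
	write_pkind_type(npc_pkinds, NPC_RX_CPT_HDR_PKIND,
			 NPC_PKIND_TYPE_CUSTOM);
}
```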

* Re: [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc
  2026-06-05  6:32 ` [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc Ratheesh Kannoth
  2026-06-08  2:24   ` Ratheesh Kannoth
@ 2026-06-08  2:31   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:31 UTC
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:44, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Default CN20K NPC rule allocation now keys off the active MCAM keyword
> width: use X4 with a bank-masked reference index when the silicon uses
> X4 keys, and X2 with the raw index otherwise (replacing the previous
> always-X2 / eidx + 1 behaviour).
>
> In the AF flow-install path, flows that need more than 256 key bits
> query the NPC profile; if the platform is fixed to X2 entries, fail
> with -EOPNOTSUPP instead of requesting X4. Otherwise select X4 for the
> MCAM alloc.
>
> On the PF, cache and pass the profile kw_type from npc_get_pfl_info
> through otx2_mcam_pfl_info_get(), and use it when allocating MCAM
> entries for RSS/defaults and when installing ethtool flows on CN20K,
> including masking the reference index for X4 slot layout.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>The commit message says this replaces the "previous always-X2 /
>eidx + 1 behaviour", but the pre-patch code used ref_entry = eidx
>in the higher-priority allocation and ref_entry = eidx + 1 only in
>the VF lower-priority fallback path.
>Is the commit message description of the prior behavior accurate?
>For the lower-priority fallback in the X2 branch, ref_entry now becomes
>eidx instead of eidx + 1, so the value passed to
>rvu_mbox_handler_npc_mcam_alloc_entry() changes for already-shipping X2
>CN20K silicon, separate from the X4 enablement.
>Was the prior eidx + 1 a deliberate offset (for example to step past
>the higher-priority slot reserved by the first allocation) or an
>off-by-one that is being corrected here?
>If it is a fix, would it make sense to split this into a standalone
>patch with a Fixes: tag so it can be backported and bisected
>independently of the X4 enablement, and could the rationale be added
>to the changelog?
eidx is the correct value, as eidx + 1 may overflow. I initially submitted this
patch as a bug fix, but Simon said it is more than a bug fix, so it is part of
net-next.

* Re: [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically.
  2026-06-05  6:32 ` [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically Ratheesh Kannoth
  2026-06-08  2:25   ` Ratheesh Kannoth
@ 2026-06-08  2:32   ` Ratheesh Kannoth
  1 sibling, 0 replies; 26+ messages in thread
From: Ratheesh Kannoth @ 2026-06-08  2:32 UTC
  To: linux-kernel, netdev
  Cc: andrew+netdev, davem, donald.hunter, edumazet, horms, jiri, kuba,
	pabeni, sgoutham

On 2026-06-05 at 12:02:45, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Replace the file-scope static npc_priv with a kcalloc'd struct filled
> from hardware bank/subbank geometry at init (num_banks is no longer a
> const compile-time constant; drop init_done and use a non-NULL
> npc_priv pointer for liveness). Thread npc_priv_get() / pointer access
> through the CN20K NPC code paths, extend teardown to kfree the root
> struct on failure and in npc_cn20k_deinit, and adjust MCAM section
> setup to use the discovered subbank count.
>
> Allocate MCAM debugfs dstats via devm_kzalloc instead of a static matrix,
> and use the allocated backing store consistently when computing deltas
> (including the counter rollover compare).
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260605063245.3553861-1-rkannoth%40marvell.com says

>> @@ -397,6 +398,10 @@ int npc_cn20k_debugfs_init(struct rvu *rvu)
>>  	debugfs_create_file("vidx2idx", 0444, rvu->rvu_dbg.npc,
>>  			    npc_priv, &npc_vidx2idx_map_fops);
>>
>> +	dstats = devm_kzalloc(rvu->dev, sizeof(*dstats), GFP_KERNEL);
>> +	if (!dstats)
>> +		return -ENOMEM;
>> +
>>  	debugfs_create_file("dstats", 0444, rvu->rvu_dbg.npc, rvu,
>>  			    &npc_mcam_dstats_fops);
>What happens when this devm_kzalloc() fails?
>The only caller is rvu_dbg_npc_init() in
>drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c, which invokes
>npc_cn20k_debugfs_init() as a void call and discards the return value.
>By the time the new -ENOMEM path is taken, mcam_layout, mcam_default and
>vidx2idx have already been registered via debugfs_create_file(), and
>dstats, mismatch, idx2vidx and defrag are then skipped. Three of the
>four skipped files (mismatch, idx2vidx, defrag) have no relationship to
>dstats and seem to be dropped only as a side effect of where the
>allocation was placed.
>Would it be better to either move the dstats allocation up to
>npc_priv_init() (or to the very start of npc_cn20k_debugfs_init() before
>any debugfs_create_file()), or alternatively gate just the dstats
>debugfs_create_file() on the allocation success while still creating the
>unrelated files? In either case, should the caller also be updated to
>honor the return value?

If the "dstats" debugfs entry is missing, that is still fine, as this entry is
used only for live debugging.
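The reviewer's alternative of gating only the dstats file on the allocation, while still creating the unrelated entries, might look like the user-space sketch below. create_file() is a hypothetical stand-in for debugfs_create_file(), calloc() stands in for devm_kzalloc(), and the alloc_fails flag exists only to exercise the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

static int files_created;

/* Hypothetical stand-in for debugfs_create_file(). */
static void create_file(const char *name)
{
	(void)name;
	files_created++;
}

static void *dstats;

/* Only "dstats" depends on the allocation; the other entries are
 * created regardless of whether it succeeds. */
static int npc_debugfs_init_sketch(bool alloc_fails)
{
	create_file("mcam_layout");
	create_file("mcam_default");
	create_file("vidx2idx");

	dstats = alloc_fails ? NULL : calloc(1, 64);
	if (dstats)
		create_file("dstats");

	create_file("mismatch");
	create_file("idx2vidx");
	create_file("defrag");
	return dstats ? 0 : -ENOMEM;
}
```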

>>  #include "cn20k/reg.h"
>>  #include "rvu_npc_fs.h"
>>
>> -static struct npc_priv_t npc_priv = {
>> -	.num_banks = MAX_NUM_BANKS,
>> -};
>> +static struct npc_priv_t *npc_priv;
>With the previous code, init_done was set to true only at the very end
>of npc_cn20k_init(), after npc_setup_mcam_section() succeeded, so
>'init_done == true' implied both the struct fields were populated and
>the hardware had been programmed.
>The new gate is 'npc_priv != NULL', but npc_priv is assigned by
>kcalloc() near the start of npc_priv_init(), well before bank_depth, kw,
>sb[], the xarrays, the mutex and the list head are populated. Does
>'npc_priv != NULL' still carry the same 'fully initialized' meaning that
>init_done used to?

npc_priv is initialized in npc_priv_init() during probe(). If any part of the
initialization fails, npc_priv is freed and set to NULL, and probe() returns an
error.
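One way to make "npc_priv != NULL" mean "fully initialized", as init_done did, is to build the struct in a local and publish the global pointer last. The sketch below assumes single-threaded probe (concurrent readers would also need release/acquire ordering); struct npc_priv_sketch and npc_priv_init_sketch() are hypothetical, trimmed-down stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct npc_priv_t. */
struct npc_priv_sketch {
	int num_banks;
	int bank_depth;
	int *sb;	/* per-subbank state */
};

static struct npc_priv_sketch *npc_priv;

/* Fill every field in a local; the global is assigned only on success,
 * so a failed init never exposes a half-built struct. */
static int npc_priv_init_sketch(int banks, int depth, int subbanks)
{
	struct npc_priv_sketch *p = calloc(1, sizeof(*p));

	if (!p)
		return -ENOMEM;
	p->sb = calloc(subbanks, sizeof(*p->sb));
	if (!p->sb) {
		free(p);
		return -ENOMEM;
	}
	p->num_banks = banks;
	p->bank_depth = depth;

	npc_priv = p;	/* publish last */
	return 0;
}
```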

>> @@ -4251,8 +4249,8 @@ int rvu_mbox_handler_npc_get_pfl_info(struct rvu *rvu, struct msg_req *req,
>>  		return -EOPNOTSUPP;
>>  	}
>>
>> -	rsp->kw_type = npc_priv.kw;
>> -	rsp->x4_slots = npc_priv.bank_depth;
>> +	rsp->kw_type = npc_priv->kw;
>> +	rsp->x4_slots = npc_priv->bank_depth;
>>  	return 0;
>>  }
>Should this handler also gain an 'if (!npc_priv)' guard?
>The other public entry points were updated to check for NULL:
>  npc_cn20k_vidx2idx()
>  npc_cn20k_idx2vidx()
>  npc_cn20k_dft_rules_idx_get()
>  npc_cn20k_dft_rules_alloc()
>  npc_cn20k_dft_rules_free()
>but rvu_mbox_handler_npc_get_pfl_info() now dereferences npc_priv->kw
>and npc_priv->bank_depth without a NULL check. Is that intentional, or
>is the asymmetry just an oversight?

If npc_priv initialization fails, probe itself fails. These handlers are invoked
much later by the PF, which becomes active only after the AF is fully up.

>>
>> -	xa_destroy(&npc_priv.xa_sb_used);
>[ ... ]
>> +	if (!npc_priv)
>> +		return;
>>
>> -	for (i = 0; i < npc_priv.pf_cnt; i++)
>> -		xa_destroy(&npc_priv.xa_pf2idx_map[i]);
>> +	xa_destroy(&npc_priv->xa_sb_used);
>[ ... ]
>> +	npc_defrag_list_clear();
>> +	kfree(npc_priv);
>> +	npc_priv = NULL;
>>  }
>Previously the storage was a file-scope static, so teardown only flipped
>init_done. Now the storage itself is freed and the pointer cleared
>without any synchronization (no synchronize_rcu(), no flush of the mbox
>workqueue, no lock).
>If a racing mbox handler such as rvu_mbox_handler_npc_get_pfl_info() (or
>any of the npc_cn20k_dft_rules_* helpers) has already passed its
>'if (!npc_priv) return;' check but has not yet read a field, can it now
>dereference freed memory?
>The probe ordering may make this hard to reach today, but does the new
>free-and-NULL pattern introduce a use-after-free window that the old
>init_done flag did not have?

rvu_remove() disables the mbox first, so this case won't happen.

* Re: [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe
  2026-06-05  6:32 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
  2026-06-08  2:17   ` Ratheesh Kannoth
  2026-06-08  2:25   ` Ratheesh Kannoth
@ 2026-06-08 22:40   ` Jakub Kicinski
  2 siblings, 0 replies; 26+ messages in thread
From: Jakub Kicinski @ 2026-06-08 22:40 UTC
  To: Ratheesh Kannoth
  Cc: linux-kernel, netdev, andrew+netdev, davem, donald.hunter,
	edumazet, horms, jiri, pabeni, sgoutham

On Fri, 5 Jun 2026 12:02:37 +0530 Ratheesh Kannoth wrote:
> There is only one admin-function PCI device per system.
> Reject any additional AF probe with -EBUSY so the driver model matches
> hardware and automated reviewers can rely on a single bound instance.

Could you point me to a PCI networking driver written in the last two
decades which would have this sort of limitation?

At the very least you need to explain in the commit message **why**
correctly handling multiple devices in a system is beyond your
abilities.
-- 
pw-bot: cr

Thread overview: 26+ messages
2026-06-05  6:32 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
2026-06-08  2:17   ` Ratheesh Kannoth
2026-06-08  2:25   ` Ratheesh Kannoth
2026-06-08 22:40   ` Jakub Kicinski
2026-06-05  6:32 ` [PATCH v19 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
2026-06-08  2:20   ` Ratheesh Kannoth
2026-06-08  2:26   ` Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 3/9] devlink: heap-allocate param fill buffers in devlink_nl_param_fill Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 4/9] devlink: Implement devlink param multi attribute nested data values Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
2026-06-08  2:22   ` Ratheesh Kannoth
2026-06-08  2:28   ` Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle Ratheesh Kannoth
2026-06-08  2:29   ` Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth
2026-06-08  2:23   ` Ratheesh Kannoth
2026-06-08  2:30   ` Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc Ratheesh Kannoth
2026-06-08  2:24   ` Ratheesh Kannoth
2026-06-08  2:31   ` Ratheesh Kannoth
2026-06-05  6:32 ` [PATCH v19 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically Ratheesh Kannoth
2026-06-08  2:25   ` Ratheesh Kannoth
2026-06-08  2:32   ` Ratheesh Kannoth
  -- strict thread matches above, loose matches on Subject: below --
2026-06-05  3:50 [PATCH v19 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
2026-06-05  3:50 ` [PATCH v19 net-next 1/9] octeontx2-af: Enforce single RVU AF probe Ratheesh Kannoth
2026-06-05  7:47   ` Ratheesh Kannoth