Re: [PATCH v10 net-next 3/6] devlink: Implement devlink param multi attribute nested data values

public inbox for linux-rdma@vger.kernel.org
 help / color / mirror / Atom feed

From: Ratheesh Kannoth <rkannoth@marvell.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, <sgoutham@marvell.com>,
	<andrew+netdev@lunn.ch>, <davem@davemloft.net>,
	<edumazet@google.com>, <kuba@kernel.org>,
	<donald.hunter@gmail.com>, <horms@kernel.org>, <jiri@resnulli.us>,
	<chuck.lever@oracle.com>, <matttbe@kernel.org>,
	<cjubran@nvidia.com>, <saeedm@nvidia.com>, <leon@kernel.org>,
	<tariqt@nvidia.com>, <mbloch@nvidia.com>, <dtatulea@nvidia.com>
Subject: Re: [PATCH v10 net-next 3/6] devlink: Implement devlink param multi attribute nested data values
Date: Wed, 8 Apr 2026 16:11:33 +0530	[thread overview]
Message-ID: <adYw3SeHoj43Xs2S@rkannoth-OptiPlex-7090> (raw)
In-Reply-To: <c14a0783-a69f-448d-a464-2d802e6d0ec7@redhat.com>

On 2026-04-07 at 15:28:09, Paolo Abeni (pabeni@redhat.com) wrote:
> On 4/3/26 4:55 AM, Ratheesh Kannoth wrote:
> > From: Saeed Mahameed <saeedm@nvidia.com>
> >
> > Devlink param value attribute is not defined since devlink is handling
> > the value validating and parsing internally, this allows us to implement
> > multi attribute values without breaking any policies.
> >
> > Devlink param multi-attribute values are considered to be dynamically
> > sized arrays of u64 values, by introducing a new devlink param type
> > DEVLINK_PARAM_TYPE_U64_ARRAY, driver and user space can set a variable
> > count of u32 values into the DEVLINK_ATTR_PARAM_VALUE_DATA attribute.
> >
> > Implement get/set parsing and add to the internal value structure passed
> > to drivers.
> >
> > This is useful for devices that need to configure a list of values for
> > a specific configuration.
> >
> > example:
> > $ devlink dev param show pci/... name multi-value-param
> > name multi-value-param type driver-specific
> > values:
> > cmode permanent value: 0,1,2,3,4,5,6,7
> >
> > $ devlink dev param set pci/... name multi-value-param \
> > 		value 4,5,6,7,0,1,2,3 cmode permanent
> >
> > Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
> > Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
> > ---
> >  Documentation/netlink/specs/devlink.yaml |  4 ++
> >  include/net/devlink.h                    |  8 +++
> >  include/uapi/linux/devlink.h             |  1 +
> >  net/devlink/netlink_gen.c                |  2 +
> >  net/devlink/param.c                      | 91 +++++++++++++++++++-----
> >  5 files changed, 89 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml
> > index b495d56b9137..b619de4fe08a 100644
> > --- a/Documentation/netlink/specs/devlink.yaml
> > +++ b/Documentation/netlink/specs/devlink.yaml
> > @@ -226,6 +226,10 @@ definitions:
> >          value: 10
> >        -
> >          name: binary
> > +      -
> > +        name: u64-array
> > +        value: 129
> > +
> >    -
> >      name: rate-tc-index-max
> >      type: const
> > diff --git a/include/net/devlink.h b/include/net/devlink.h
> > index 3038af6ec017..3a355fea8189 100644
> > --- a/include/net/devlink.h
> > +++ b/include/net/devlink.h
> > @@ -432,6 +432,13 @@ enum devlink_param_type {
> >  	DEVLINK_PARAM_TYPE_U64 = DEVLINK_VAR_ATTR_TYPE_U64,
> >  	DEVLINK_PARAM_TYPE_STRING = DEVLINK_VAR_ATTR_TYPE_STRING,
> >  	DEVLINK_PARAM_TYPE_BOOL = DEVLINK_VAR_ATTR_TYPE_FLAG,
> > +	DEVLINK_PARAM_TYPE_U64_ARRAY = DEVLINK_VAR_ATTR_TYPE_U64_ARRAY,
> > +};
> > +
> > +#define __DEVLINK_PARAM_MAX_ARRAY_SIZE 32
> > +struct devlink_param_u64_array {
> > +	u64 size;
> > +	u64 val[__DEVLINK_PARAM_MAX_ARRAY_SIZE];
> >  };
> >
> >  union devlink_param_value {
> > @@ -441,6 +448,7 @@ union devlink_param_value {
> >  	u64 vu64;
> >  	char vstr[__DEVLINK_PARAM_MAX_STRING_VALUE];
> >  	bool vbool;
> > +	struct devlink_param_u64_array u64arr;
>
> Sashiko as a couple of relevant remarks here, specifically:
>
> ---
> Does this increase the size of union devlink_param_value from 32 bytes
> to over 264 bytes?
> Looking at existing functions like devlink_nl_param_value_fill_one() and
> devlink_nl_param_value_put(), they take multiple copies of this union by
> value. Passing two of these unions by value consumes over 528 bytes of
> stack space, and combined in a call chain this pushes nearly 800 bytes
> of arguments onto the stack.
> Could this create a risk of hitting CONFIG_FRAME_WARN limits deep in
> driver notification contexts? Should the signatures of the internal
> functions and exported APIs be updated to pass the unions by pointer
> instead?
ACK. will make as seperate patch.

> ---
>
> >  };
> >
> >  struct devlink_param_gset_ctx {
> > diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> > index 7de2d8cc862f..5332223dd6d0 100644
> > --- a/include/uapi/linux/devlink.h
> > +++ b/include/uapi/linux/devlink.h
> > @@ -406,6 +406,7 @@ enum devlink_var_attr_type {
> >  	DEVLINK_VAR_ATTR_TYPE_BINARY,
> >  	__DEVLINK_VAR_ATTR_TYPE_CUSTOM_BASE = 0x80,
> >  	/* Any possible custom types, unrelated to NLA_* values go below */
> > +	DEVLINK_VAR_ATTR_TYPE_U64_ARRAY,
> >  };
> >
> >  enum devlink_attr {
> > diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c
> > index eb35e80e01d1..7aaf462f27ee 100644
> > --- a/net/devlink/netlink_gen.c
> > +++ b/net/devlink/netlink_gen.c
> > @@ -37,6 +37,8 @@ devlink_attr_param_type_validate(const struct nlattr *attr,
> >  	case DEVLINK_VAR_ATTR_TYPE_NUL_STRING:
> >  		fallthrough;
> >  	case DEVLINK_VAR_ATTR_TYPE_BINARY:
> > +		fallthrough;
> > +	case DEVLINK_VAR_ATTR_TYPE_U64_ARRAY:
> >  		return 0;
> >  	}
> >  	NL_SET_ERR_MSG_ATTR(extack, attr, "invalid enum value");
> > diff --git a/net/devlink/param.c b/net/devlink/param.c
> > index cf95268da5b0..2ec85dffd8ac 100644
> > --- a/net/devlink/param.c
> > +++ b/net/devlink/param.c
> > @@ -252,6 +252,14 @@ devlink_nl_param_value_put(struct sk_buff *msg, enum devlink_param_type type,
> >  				return -EMSGSIZE;
> >  		}
> >  		break;
> > +	case DEVLINK_PARAM_TYPE_U64_ARRAY:
> > +		if (val.u64arr.size > __DEVLINK_PARAM_MAX_ARRAY_SIZE)
> > +			return -EMSGSIZE;
> > +
> > +		for (int i = 0; i < val.u64arr.size; i++)
> > +			if (nla_put_uint(msg, nla_type, val.u64arr.val[i]))
> > +				return -EMSGSIZE;
> > +		break;
> >  	}
> >  	return 0;
> >  }
> > @@ -304,56 +312,78 @@ static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink,
> >  				 u32 portid, u32 seq, int flags,
> >  				 struct netlink_ext_ack *extack)
> >  {
> > -	union devlink_param_value default_value[DEVLINK_PARAM_CMODE_MAX + 1];
> > -	union devlink_param_value param_value[DEVLINK_PARAM_CMODE_MAX + 1];
> >  	bool default_value_set[DEVLINK_PARAM_CMODE_MAX + 1] = {};
> >  	bool param_value_set[DEVLINK_PARAM_CMODE_MAX + 1] = {};
> >  	const struct devlink_param *param = param_item->param;
> > -	struct devlink_param_gset_ctx ctx;
> > +	union devlink_param_value *default_value;
> > +	union devlink_param_value *param_value;
> > +	struct devlink_param_gset_ctx *ctx;
> >  	struct nlattr *param_values_list;
> >  	struct nlattr *param_attr;
> >  	void *hdr;
> >  	int err;
> >  	int i;
> >
> > +	default_value = kcalloc(DEVLINK_PARAM_CMODE_MAX + 1,
> > +				sizeof(*default_value), GFP_KERNEL);
> > +	if (!default_value)
> > +		return -ENOMEM;
> > +
> > +	param_value = kcalloc(DEVLINK_PARAM_CMODE_MAX + 1,
> > +			      sizeof(*param_value), GFP_KERNEL);
> > +	if (!param_value) {
> > +		kfree(default_value);
> > +		return -ENOMEM;
> > +	}
> > +
> > +	ctx = kmalloc_obj(*ctx);
> > +	if (!ctx) {
> > +		kfree(param_value);
> > +		kfree(default_value);
> > +		return -ENOMEM;
> > +	}
> > +
> >  	/* Get value from driver part to driverinit configuration mode */
> >  	for (i = 0; i <= DEVLINK_PARAM_CMODE_MAX; i++) {
> >  		if (!devlink_param_cmode_is_supported(param, i))
> >  			continue;
> >  		if (i == DEVLINK_PARAM_CMODE_DRIVERINIT) {
> > -			if (param_item->driverinit_value_new_valid)
> > +			if (param_item->driverinit_value_new_valid) {
> >  				param_value[i] = param_item->driverinit_value_new;
> > -			else if (param_item->driverinit_value_valid)
> > +			} else if (param_item->driverinit_value_valid) {
> >  				param_value[i] = param_item->driverinit_value;
> > -			else
> > -				return -EOPNOTSUPP;
> > +			} else {
> > +				err = -EOPNOTSUPP;
> > +				goto get_put_fail;
> > +			}
> >
> >  			if (param_item->driverinit_value_valid) {
> >  				default_value[i] = param_item->driverinit_default;
> >  				default_value_set[i] = true;
> >  			}
> >  		} else {
> > -			ctx.cmode = i;
> > -			err = devlink_param_get(devlink, param, &ctx, extack);
> > +			ctx->cmode = i;
> > +			err = devlink_param_get(devlink, param, ctx, extack);
> >  			if (err)
> > -				return err;
> > -			param_value[i] = ctx.val;
> > +				goto get_put_fail;
> > +			param_value[i] = ctx->val;
> >
> > -			err = devlink_param_get_default(devlink, param, &ctx,
> > +			err = devlink_param_get_default(devlink, param, ctx,
> >  							extack);
> >  			if (!err) {
> > -				default_value[i] = ctx.val;
> > +				default_value[i] = ctx->val;
> >  				default_value_set[i] = true;
> >  			} else if (err != -EOPNOTSUPP) {
> > -				return err;
> > +				goto get_put_fail;
> >  			}
> >  		}
> >  		param_value_set[i] = true;
> >  	}
> >
> > +	err = -EMSGSIZE;
> >  	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
> >  	if (!hdr)
> > -		return -EMSGSIZE;
> > +		goto get_put_fail;
> >
> >  	if (devlink_nl_put_handle(msg, devlink))
> >  		goto genlmsg_cancel;
> > @@ -393,6 +423,9 @@ static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink,
> >  	nla_nest_end(msg, param_values_list);
> >  	nla_nest_end(msg, param_attr);
> >  	genlmsg_end(msg, hdr);
> > +	kfree(default_value);
> > +	kfree(param_value);
> > +	kfree(ctx);
> >  	return 0;
> >
> >  values_list_nest_cancel:
> > @@ -401,7 +434,11 @@ static int devlink_nl_param_fill(struct sk_buff *msg, struct devlink *devlink,
> >  	nla_nest_cancel(msg, param_attr);
> >  genlmsg_cancel:
> >  	genlmsg_cancel(msg, hdr);
> > -	return -EMSGSIZE;
> > +get_put_fail:
> > +	kfree(default_value);
> > +	kfree(param_value);
> > +	kfree(ctx);
> > +	return err;
> >  }
> >
> >  static void devlink_param_notify(struct devlink *devlink,
> > @@ -507,7 +544,7 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
> >  				  union devlink_param_value *value)
> >  {
> >  	struct nlattr *param_data;
> > -	int len;
> > +	int len, cnt, rem;
> >
> >  	param_data = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA];
> >
> > @@ -547,6 +584,26 @@ devlink_param_value_get_from_info(const struct devlink_param *param,
> >  			return -EINVAL;
> >  		value->vbool = nla_get_flag(param_data);
> >  		break;
> > +
> > +	case DEVLINK_PARAM_TYPE_U64_ARRAY:
> > +		cnt = 0;
> > +		nla_for_each_attr_type(param_data,
> > +				       DEVLINK_ATTR_PARAM_VALUE_DATA,
> > +				       genlmsg_data(info->genlhdr),
> > +				       genlmsg_len(info->genlhdr), rem) {
> > +			if (cnt >= __DEVLINK_PARAM_MAX_ARRAY_SIZE)
> > +				return -EMSGSIZE;
> > +
> > +			if ((nla_len(param_data) != sizeof(u64)) &&
> > +			    (nla_len(param_data) != sizeof(u32)))
> > +				return -EINVAL;
> > +
> > +			value->u64arr.val[cnt] = (u64)nla_get_uint(param_data);
> > +			cnt++;
> > +		}
> > +
> > +		value->u64arr.size = cnt;
> > +		break;
>
> Sashiko says:
>
> ---
> Does this make it impossible to set an empty array to clear a
> multi-value parameter?
> If userspace provides 0 elements, param_data will be NULL. Earlier in
> devlink_param_value_get_from_info(), there is a check:
> 	param_data = info->attrs[DEVLINK_ATTR_PARAM_VALUE_DATA];
> 	if (param->type != DEVLINK_PARAM_TYPE_BOOL && !param_data)
> 		return -EINVAL;
> If the parameter is a U64_ARRAY and no data is provided, this check will
> immediately return -EINVAL.
> The kernel can successfully emit an empty array on a GET request if the
> size is 0. Should the SET path similarly support receiving 0 elements to
> allow userspace to clear a multi-value parameter?
multi-value parameter is to set values. if some one wants to set to zero
they can set all values as zero and pass a "size" array. we can ignore this comment, right ?

> ---
>
> There are several others NIC-specific remarks, which IMHO are mostly
> pre-existing issues, but please have a look:
>
> https://sashiko.dev/#/patchset/20260403025533.6250-1-rkannoth%40marvell.com
>
> /P
>

A. Patch 1: [PATCH v10 net-next 1/6] octeontx2-af: npc: cn20k: debugfs enhancements

1.

"Is it safe to use a global static array here?
Concurrent reads of the dstats debugfs file could race to read and write to
this array, leading to torn updates. Additionally, on systems with multiple
RVU devices, wouldn't they share the same array and cross-contaminate their
statistics?"

The purpose of using "dstats" is to identify which rules current traffic is hitting.
We are not particularly concerned with overall packet statistics, as those can
be obtained through rule's stats register. Each rule hit will display the corresponding
rule during 'cating', so this comment can be disregarded.

2.

"Are the hardware parameters read into npc_priv->bank_depth and
npc_priv->num_banks guaranteed to fit within the static bounds of dstats?
If the hardware reports a bank_depth that makes mcam_idx exceed the
pre-allocated array dimensions, could this result in an out-of-bounds
memory access here?"

static u64 dstats[MAX_NUM_BANKS][MAX_SUBBANK_DEPTH * MAX_NUM_SUB_BANKS] = {};
MAX_SUBBANK_DEPTH and MAX_NUM_SUB_BANKS are maximum possible. So above case
wont happen

3.

"Could this cause data loss if the output exceeds the initial seq_file
buffer size?
DEFINE_SHOW_ATTRIBUTE() uses single_open(), which will discard the buffer,
double its size, and re-invoke this show callback from the beginning if it
overflows.
Since dstats[bank][idx] is updated during the first execution, a
re-invocation would see stats == dstats[bank][idx] and silently drop all
previously processed entries."

The purpose of "dstats" entry to find live traffic hitting rules. It may be showing "wrong"
value for the first time, but second time onwards, it will be okay. Usually user runs 'cat'ing mutliple
times to find "htting" rule.

5.

"Similar to the dstats array, does this code overflow en_map?
If the index provided here can exceed the compile-time bounds of
MAX_NUM_BANKS * MAX_NUM_SUB_BANKS * MAX_SUBBANK_DEPTH due to dynamically
read hardware sizes, it could corrupt memory.
Additionally, is en_map explicitly cleared during device initialization
or teardown?
Since npc_priv is global, if the driver is unbound and rebound without
unloading the module, could en_map retain stale state from the previous
session?"
MAX_NUM_BANKS * MAX_NUM_SUB_BANKS * MAX_SUBBANK_DEPTH is big enough to hold
for all SoC variants.

B. Patch 2: [PATCH v10 net-next 2/6] net/mlx5e: heap-allocate devlink param values

1.

"Does returning -ENOMEM here trigger a redundant WARN_ON() in the caller?
Looking at mlx5e_pcie_cong_event_init(), it checks the return value like this:
drivers/net/ethernet/mellanox/mlx5/core/en/pcie_cong_event.c:mlx5e_pcie_cong_event_init() {
    ...
	err = mlx5e_pcie_cong_get_thresh_config(mdev, &thresh_config);
	if (WARN_ON(err))
		return err;
    ...
}
Since kcalloc() already prints a warning and stack trace on allocation failure
(unless __GFP_NOWARN is used), wrapping the -ENOMEM return in WARN_ON() will
result in a double stack trace in the kernel log.
Is it possible to drop the WARN_ON() in the caller to avoid this redundant
splat?"

Existing code issue.  i believe, i dont have to fix this. correct me pls.

C. Patch 3: [PATCH v10 net-next 3/6] devlink: Implement devlink param multi attribute nested data values

Replied above as part of your comments.

D. Patch 4: [PATCH v10 net-next 4/6] octeontx2-af: npc: cn20k: add subbank search order control

1.

"If xa_store() fails and we jump to fail_used, doesn't this leave the device
in an unrecoverable state?
The code previously erased all subbank entries from npc_priv.xa_sb_used and
npc_priv.xa_sb_free. By aborting without restoring the remaining subbanks,
the erased subbanks appear to be permanently leaked, which would permanently
break subsequent hardware MCAM allocations. Global state is also modified
before the insertion loop, leaving the driver's internal state inconsistent.
Could this be avoided by pre-allocating the required XArray slots with
xa_reserve() or by building new XArrays and swapping them upon success?"

We are doing this operation underlocks and there wont be repetitions of slots in
the array. This above condiition is very unlikely, if there is some issue, we
flag the error thru print. This means that there is some other deep issue which
caused this error. So we can ignore this.

2.
" +	if (bitmap_weight(&w, sz) != sz) {
> +		dev_err(rvu->dev,
> +			"Duplicate or out-of-range subbank index. %lu\n",
> +			find_first_zero_bit(&w, sz));
> +		return -EINVAL;
> +	}
Does this error message print the missing index instead of the duplicate
one?
Since find_first_zero_bit() finds the first unset bit, it will return the
index that the user failed to provide, not the one they duplicated or
provided out of range. Could this lead to a confusing error message for
users?"
I will change print error message to   "Duplicate or missing or out-of-range subbank index. %lu\n","

E. Patch 5: [PATCH v10 net-next 5/6] octeontx2-af: npc: cn20k: dynamically allocate and free default MCAM entries

1.

>  	u64 cfg, hw_prio;
>  	u8 kw_type;
>
> +	if (index < 0 || index >= mcam->total_entries) {
> +		WARN(1, "Wrong mcam index %d\n", index);
> +		return;
> +	}
Could this cause kernel log spam due to unchecked negative MCAM indices?
In npc_enadis_default_entries() and npc_enadis_default_mce_entry(), the code
queries the MCAM index using npc_get_nixlf_mcam_index(), which correctly
returns a negative error code if the entry is not allocated.
This unchecked negative index is then passed directly to
npc_cn20k_enable_mcam_entry(), which checks if (index < 0) and triggers a
stack trace via WARN(1, ...). This means that a routine NIX LF teardown of
an uninitialized interface (or a VF without multicast support querying the
PF's missing BCAST entry) will repeatedly trigger stack traces.

We have checks in npc_enadis_default_entries() to check for default entries and bailout.

       /* only CGX or LBK interfaces have default entries */
        if (is_cn20k(rvu->pdev) &&
            !npc_is_cgx_or_lbk(rvu, pcifunc & ~RVU_PFVF_FUNC_MASK))
                return;

will add the same for npc_enadis_default_mce_entry() as well.

2.

> @@ -4024,6 +4041,13 @@ int npc_cn20k_dft_rules_idx_get(struct rvu *rvu, u16 pcifunc, u16 *bcast,
>  	void *val;
>  	int i, j;
>
> +	for (int i = 0; i < ARRAY_SIZE(ptr); i++) {
> +		if (!ptr[i])
> +			continue;
There is variable shadowing here. The variable 'i' is declared at the
function scope, but the loop re-declares 'i', shadowing the outer scope
variable. This triggers -Wshadow compiler warnings.

ACK.

3.

Does this code use virtual MCAM indices as physical indices, potentially
causing massive out-of-bounds memory corruption?
In npc_cn20k_dft_rules_alloc(), default MCAM entries are allocated via the
mailbox handler (rvu_mbox_handler_npc_mcam_alloc_entry), which returns
virtual indices (vidx) in rsp.entry_list for CN20K hardware. These vidx
values are inserted directly into the xa_pf2dfl_rmap xarray.
Later, this code in npc_cn20k_dft_rules_free() retrieves these vidx values
and passes them directly to physical index APIs: npc_mcam_clear_bit() and
mcam->entry2pfvf_map[]. Because vidx indices are structurally shifted
(starting at npc_priv.bank_depth * 2), they are guaranteed to be larger
than mcam->bmap_entries.
This causes rentry = mcam->bmap_entries - index - 1 in npc_mcam_clear_bit()
to underflow, leading __clear_bit() to perform out-of-bounds writes on
kernel heap memory. The vidx values should probably be mapped back to
physical indices using npc_cn20k_vidx2idx() before accessing software state
arrays.

virt field in npc_mcam_alloc_entry_req structure is not set for while calling for
default mcam index allocation using rvu_mbox_handler_npc_mcam_alloc_entry(). So
this comment can be ignored.

4.

Is there a race condition here that could lead to MCAM index corruption due
to premature lock release?
The mcam->lock is held while calling npc_mcam_clear_bit(), which
immediately marks the MCAM index as free in the software bitmap
(mcam->bmap). The code then explicitly unlocks mcam->lock before calling
npc_cn20k_idx_free() to release the index in the firmware/hardware.
Releasing the lock creates a critical race window: another thread can
concurrently allocate this exact MCAM index (since it appears free in the
bitmap), assign it to a new interface, and configure it.
When the original thread resumes and calls npc_cn20k_idx_free(), it
instructs the hardware to free the index, silently destroying the newly
allocated flow rules of the concurrent thread. Should the hardware cleanup
be performed before the software bitmap is marked free, or should the lock
remain held?

These alloc free request are thru mbox messages and are processed one by one (serially)
So ignore this case.

5.

int npc_cn20k_dft_rules_alloc(struct rvu *rvu, u16 pcifunc)
Is there a software bitmap leak and multi-free bug in the error path of
npc_cn20k_dft_rules_alloc()?
If xa_insert() fails, the code jumps to the err label to free the allocated
entries. The error path executes:
	for (i = 0; i < cnt; i++)
		npc_idx_free(rvu, mcam_idx, cnt, false);
Since npc_idx_free() takes an array and a count to free 'cnt' entries,
looping over it causes the exact same array of indices to be freed 'cnt'
times.
Furthermore, the error path fails to clear the software bits in the core
hardware bitmap (mcam->bmap) via npc_mcam_clear_bit() and does not
invalidate mcam->entry2pfvf_map. Since the allocation handler successfully
marked these entries as used, omitting these steps permanently leaks the
software state for these MCAM indices.

dft rules fail means a critical issue as this will affect system operation. Theere is
no point in recovering cleanly from it. But anyway, i dont find a clear 1:1 with code, am
i missing anything here.

6.

Does this early return skip the cleanup of user-defined flow rules for
non-CGX/LBK interfaces?
While the intent seems to be skipping the disablement of default entries
for interfaces that don't have them (like SDP or CPT), this function is
also responsible for disabling ALL user-defined MCAM flow rules associated
with the pcifunc by iterating over mcam->mcam_rules.
By returning early here, all user-defined rules are left fully active in
the hardware parser, pointing to a NIX LF that is being torn down and freed.
This could lead to hardware DMA faults and context errors when traffic
matches these orphaned rules. Should this check be exclusively inside
rvu_npc_disable_default_entries() instead?

ACK.

7.

Can this cause a permanent resource leak of default MCAM entries on PF driver
teardown?
otx2_free_hw_resources() unconditionally sets the NIX_LF_DONT_FREE_DFT_IDXS
flag during NIX LF free. While intended for cases like suspend/resume, this
function is called across all driver unbind/remove paths.
Consequently, when the otx2_pf driver is unloaded, the AF driver skips
freeing the default MCAM rules, leaving them permanently orphaned in the
AF's xa_pf2dfl_rmap and hardware allocation bitmap. They can never be
recovered or reused by other physical or virtual functions, causing a
severe leak of globally shared MCAM resources on every module detach.

These default mcam indexes are freed during rvu_detach_rsrcs mbox call.
rvu_detach_rsrcs() ->npc_cn20k_dft_rules_free().

F: Patch 6: [PATCH v10 net-next 6/6] octeontx2-af: npc: Support for custom KPU profile from filesystem

1.

Can this corrupt global state? In npc_prepare_default_kpu(), adapter->kpu
is initialized to the global static array npc_kpu_profiles. By assigning
kpu->cam2 and kpu->action2 here, this modifies the shared static array with
device-managed memory (devm_kcalloc).
If the device is unbound, the memory will be freed, leaving dangling pointers
in the global array which could lead to a use-after-free for subsequent device
instances.

This memory is part of "AF" and is initlaized during boot. AF memory will be freed when
device is shutdown. There wont be any case of af getting freed and come again on a live system.

2.

> +
> +	action = (struct npc_kpu_profile_action *)(fw->data + offset);
> +
> +	if (rvu->kpu_fwdata_sz < hdr_sz + sizeof(ikpu_action_entries))
> +		return -ENOMEM;
Is -ENOMEM the correct return value here? A malformed or truncated firmware
file would cause this condition, so -EINVAL might be more appropriate than
indicating a memory allocation failure.

ACK.

3.

Could this cause firmware compatibility issues in the future?
ikpu_action_entries is an internal kernel array. If new entries are added
to it in future kernel versions, sizeof(ikpu_action_entries) will grow. This
would cause older firmware files to fail the size check above, or miscalculate
the offset for subsequent entries.

There could be a chance, but there is not field in "struct npc_kpu_profile_fwdata"
to indicate sizeof ikpu_action_entries. Adding a new field will break backward compatability etc.
Anyway In future if more entries are planned, we need to come up with a new design to address
this issue.

4.

 +
> +	for (kpu = 0; kpu < fw->kpus; kpu++) {
Does this loop have a bounds checking issue? Unlike the firmware database
path which checks if fw->kpus is greater than profile->kpus, this path only
checked if fw->kpus is greater than rvu->hw->npc_kpus earlier.
If the hardware limit exceeds the size of the statically allocated
profile->kpu array, this could overflow profile->kpu.

ACK.

5.

 +				 "profile size mismatch on kpu%i parsing\n",
> +				 kpu + 1);
> +			return -EINVAL;
> +		}
> +
> +		fw_kpu = (struct npc_kpu_fwdata *)(fw->data + offset);
> +		entries = min(fw_kpu->entries, rvu->hw->npc_kpu_entries);
Can this result in an out-of-bounds read? The size check above only ensures
that the start of fw_kpu is within the firmware data boundaries. It does not
verify that the entire fw_kpu structure fits within the remaining space before
dereferencing fw_kpu->entries.
> +		dev_info(rvu->dev,
> +			 "Loading %u entries on KPU%d\n", entries, kpu);
> +

ACK.

next prev parent reply	other threads:[~2026-04-08 10:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-04-03  2:55 [PATCH v10 net-next 0/6] octeontx2-af: npc: Enhancements Ratheesh Kannoth
2026-04-03  2:55 ` [PATCH v10 net-next 1/6] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
2026-04-03  2:55 ` [PATCH v10 net-next 2/6] net/mlx5e: heap-allocate devlink param values Ratheesh Kannoth
2026-04-03  2:55 ` [PATCH v10 net-next 3/6] devlink: Implement devlink param multi attribute nested data values Ratheesh Kannoth
2026-04-07  9:58   ` Paolo Abeni
2026-04-08 10:41     ` Ratheesh Kannoth [this message]
2026-04-03  2:55 ` [PATCH v10 net-next 4/6] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
2026-04-03  2:55 ` [PATCH v10 net-next 5/6] octeontx2-af: npc: cn20k: dynamically allocate and free default MCAM entries Ratheesh Kannoth
2026-04-03  2:55 ` [PATCH v10 net-next 6/6] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=adYw3SeHoj43Xs2S@rkannoth-OptiPlex-7090 \
    --to=rkannoth@marvell.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=chuck.lever@oracle.com \
    --cc=cjubran@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=donald.hunter@gmail.com \
    --cc=dtatulea@nvidia.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=matttbe@kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=saeedm@nvidia.com \
    --cc=sgoutham@marvell.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox