Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Ratheesh Kannoth <rkannoth@marvell.com>
To: <linux-kernel@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
	<netdev@vger.kernel.org>, <oss-drivers@corigine.com>
Cc: <akiyano@amazon.com>, <andrew+netdev@lunn.ch>,
	<anthony.l.nguyen@intel.com>, <arkadiusz.kubalewski@intel.com>,
	<brett.creeley@amd.com>, <darinzon@amazon.com>,
	<davem@davemloft.net>, <donald.hunter@gmail.com>,
	<edumazet@google.com>, <horms@kernel.org>, <idosch@nvidia.com>,
	<ivecera@redhat.com>, <jiri@resnulli.us>, <kuba@kernel.org>,
	<leon@kernel.org>, <mbloch@nvidia.com>,
	<michael.chan@broadcom.com>, <pabeni@redhat.com>,
	<pavan.chebbi@broadcom.com>, <petrm@nvidia.com>,
	<Prathosh.Satish@microchip.com>, <przemyslaw.kitszel@intel.com>,
	<saeedm@nvidia.com>, <sgoutham@marvell.com>, <tariqt@nvidia.com>,
	<vadim.fedorenko@linux.dev>
Subject: Re: [PATCH v14 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control
Date: Fri, 15 May 2026 12:56:49 +0530	[thread overview]
Message-ID: <agbKuUI4m1Wtnti_@rkannoth-OptiPlex-7090> (raw)
In-Reply-To: <20260514062537.3813802-6-rkannoth@marvell.com>

On 2026-05-14 at 11:55:33, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> CN20K NPC MCAM is split into 32 subbanks that are searched in a
> predefined order during allocation. Lower-numbered subbanks have
> higher priority than higher-numbered ones.
>
> Add a runtime "srch_order" to control the order in which
> subbanks are searched during MCAM allocation.
>

>
>> +int npc_cn20k_search_order_set(struct rvu *rvu,
>> +			       u64 narr[MAX_NUM_SUB_BANKS], int cnt)
>> +{
>> +	struct npc_mcam *mcam = &rvu->hw->mcam;
>> +	struct npc_subbank *sb;
>> +	struct xarray *xa;
>> +	int prio, rc;
>> +	int sb_idx;
>> +
>> +	if (cnt != npc_priv.num_subbanks) {
>> +		dev_err(rvu->dev, "Number of entries(%u) != %u\n",
>> +			cnt, npc_priv.num_subbanks);
>> +		return -EINVAL;
>> +	}
>> +
>> +	mutex_lock(&mcam->lock);
>> +	npc_lock_all_subbank();
>Does exposing npc_lock_all_subbank() to a user-triggerable devlink callback
>cause a lockdep violation?
>npc_lock_all_subbank() acquires mutexes for all subbanks sequentially.
>Since these mutexes share the same lock class, lockdep might detect this
>as a possible recursive lock. With up to 32 subbanks, this also exceeds
>the maximum lockdep nesting limit of 8 subclasses.

The 32 subbank mutexes are physically distinct locks protecting independent
hardware queues, meaning no actual runtime deadlock path exists. To satisfy
lockdep's verification engine and prevent tracking limit exhaustion during
sequential allocation, the driver uses a singular, overarching top-level
domain lock (mcam->lock) to serialize the entire configuration sequence.

Because mcam->lock acts as the primary synchronization barrier for the
devlink context, concurrent cross-subbank inversion paths are structurally
impossible. To permanently silence the tracking constraint warning, i can post a patch
(post merging this series) to either mutex_lock_nested() with an explicit index state
or disables lockdep tracking entirely for this specialized hardware control
loop using the lockdep_set_novalidate_class() macro

>> +	restrict_valid = false;
>> +
>> +	for (prio = 0; prio < cnt; prio++) {
>> +		sb_idx = narr[prio];
>> +		sb = &npc_priv.sb[sb_idx];
>> +		sb->arr_idx = prio;
>> +
>> +		/* erase entry if exist */
>> +		xa_erase(&npc_priv.xa_sb_free, sb->arr_idx);
>> +		xa_erase(&npc_priv.xa_sb_used, sb->arr_idx);
>> +
>> +		xa = &npc_priv.xa_sb_free;
>> +		if (sb->flags & NPC_SUBBANK_FLAG_USED)
>> +			xa = &npc_priv.xa_sb_used;
>> +
>> +		/* if xa_store() fails due to memory pressure, there is
>> +		 * no point in rolling back, as roll back may also fail
>> +		 */
>> +		rc = xa_err(xa_store(xa, sb->arr_idx,
>> +				     xa_mk_value(sb_idx), GFP_KERNEL));
>> +		if (rc) {
>> +			dev_err(rvu->dev,
>> +				"Setting arr_idx=%d for sb=%d failed\n",
>> +				sb->arr_idx, sb_idx);
>> +			goto fail;
>> +		}
>> +	}
>Does this error path leak subbank resources and leave the xarrays in a
>corrupted state?
>If xa_store() fails, the function jumps to the fail label and aborts
>without restoring the original state. Because the entry at the current
>priority index was already erased via xa_erase(), but the new subbank
>failed to store, that subbank is permanently lost.
>Additionally, the subbank that failed to store is left with a modified
>arr_idx pointing to an empty slot, potentially breaking future allocations.
>Could this function preallocate resources (such as using xa_reserve()) or
>build temporary xarrays to ensure the changes can be applied atomically
>without a destructive intermediate state?
>> +    restrict_valid = false;
>> +
>> +    for (prio = 0; prio < cnt; prio++) {
>> +            sb_idx = narr[prio];
>> +            sb = &npc_priv.sb[sb_idx];
>> +            sb->arr_idx = prio;
>> +
>> +            /* erase entry if exist */
>> +            xa_erase(&npc_priv.xa_sb_free, sb->arr_idx);
>> +            xa_erase(&npc_priv.xa_sb_used, sb->arr_idx);
>> +
>> +            xa = &npc_priv.xa_sb_free;
>> +            if (sb->flags & NPC_SUBBANK_FLAG_USED)
>> +                    xa = &npc_priv.xa_sb_used;
>> +
>> +            /* if xa_store() fails due to memory pressure, there is
>> +             * no point in rolling back, as roll back may also fail
>> +             */
>> +            rc = xa_err(xa_store(xa, sb->arr_idx,
>> +                                 xa_mk_value(sb_idx), GFP_KERNEL));
>> +            if (rc) {
>> +                    dev_err(rvu->dev,
>> +                            "Setting arr_idx=%d for sb=%d failed\n",
>> +                            sb->arr_idx, sb_idx);
>> +                    goto fail;
>> +            }
>> +    }
>Does this error path leak subbank resources and leave the xarrays in a
>corrupted state?
>If xa_store() fails, the function jumps to the fail label and aborts
>without restoring the original state. Because the entry at the current
>priority index was already erased via xa_erase(), but the new subbank
>failed to store, that subbank is permanently lost.
>Additionally, the subbank that failed to store is left with a modified
>arr_idx pointing to an empty slot, potentially breaking future allocations.
>Could this function preallocate resources (such as using xa_reserve()) or
>build temporary xarrays to ensure the changes can be applied atomically
>without a destructive intermediate state?

The omission of a transactional rollback path is intentional due to
the systemic implications of a GFP_KERNEL allocation failure inside
the xarray core. If xa_store() returns an ENOMEM error, the underlying
slab allocator is already exhausted, meaning any programmatic attempt
to erase or restore original indices would also fail, compounding the
unpredictable state.

Because this devlink callback operates under the top-level mcam->lock
mutex, no concurrent allocation or translation worker can read the
partially modified xarrays while the driver is in this degraded state.
An xa_store() failure indicates a fatal, system-wide memory leak or
exhaustion event. At this threshold, the driver prints a critical
error payload to dmesg to allow deterministic post-mortem debugging of
the host's global memory subsystem, rather than attempting an unreliable
in-kernel rollback.

  reply	other threads:[~2026-05-15  7:27 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-14  6:25 [PATCH v14 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 1/9] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
2026-05-15  7:25   ` Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 2/9] net/mlx5e: Reduce stack use reading PCIe congestion thresholds Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 3/9] devlink: pass param values by pointer Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 4/9] devlink: Implement devlink param multi attribute nested data values Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
2026-05-15  7:26   ` Ratheesh Kannoth [this message]
2026-05-14  6:25 ` [PATCH v14 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle Ratheesh Kannoth
2026-05-15  7:28   ` Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth
2026-05-15  7:29   ` Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc Ratheesh Kannoth
2026-05-14  6:25 ` [PATCH v14 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically Ratheesh Kannoth
2026-05-15  7:30   ` Ratheesh Kannoth

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=agbKuUI4m1Wtnti_@rkannoth-OptiPlex-7090 \
    --to=rkannoth@marvell.com \
    --cc=Prathosh.Satish@microchip.com \
    --cc=akiyano@amazon.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=anthony.l.nguyen@intel.com \
    --cc=arkadiusz.kubalewski@intel.com \
    --cc=brett.creeley@amd.com \
    --cc=darinzon@amazon.com \
    --cc=davem@davemloft.net \
    --cc=donald.hunter@gmail.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=idosch@nvidia.com \
    --cc=ivecera@redhat.com \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=oss-drivers@corigine.com \
    --cc=pabeni@redhat.com \
    --cc=pavan.chebbi@broadcom.com \
    --cc=petrm@nvidia.com \
    --cc=przemyslaw.kitszel@intel.com \
    --cc=saeedm@nvidia.com \
    --cc=sgoutham@marvell.com \
    --cc=tariqt@nvidia.com \
    --cc=vadim.fedorenko@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox