From: Ratheesh Kannoth <rkannoth@marvell.com>
To: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Cc: <sgoutham@marvell.com>, <davem@davemloft.net>,
<edumazet@google.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
<andrew+netdev@lunn.ch>
Subject: Re: [PATCH v4 net 09/10] octeontx2-af: npc: cn20k: Tear down default MCAM rules explicitly on free
Date: Mon, 27 Apr 2026 15:39:09 +0530
Message-ID: <ae81xaD0ZQuwmtvs@rkannoth-OptiPlex-7090>
In-Reply-To: <20260427063213.3937451-10-rkannoth@marvell.com>
On 2026-04-27 at 12:02:12, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> npc_cn20k_dft_rules_free() used the NPC MCAM mbox "free all" path, which
> does not match how cn20k tracks default-rule MCAM slot indexes.
>
> Resolve the default-rule indices, then for each valid slot clear the
> bitmap entry, drop the PF/VF map, disable the MCAM line, clear the
> target function, and npc_cn20k_idx_free(). Remove any
> matching software mcam_rules nodes. On hard failure from idx_free, WARN
> and stop so the box stays up for analysis.
>
> In npc_mcam_free_all_entries(), prefetch the same default-rule indices
> and, on cn20k, skip bitmap clear and idx_free when the scanned entry is
> one of those reserved defaults (they are released by
> npc_cn20k_dft_rules_free). Still disable the entry and tear down counter
> mapping for every matching index.
>
> Fixes: 09d3b7a1403f ("octeontx2-af: npc: cn20k: Allocate default MCAM indexes")
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
>> free_rules:
>> +	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
>> +	if (blkaddr < 0)
>> +		return;
>> +	for (int i = 0; i < 4; i++) {
>> +		if (ptr[i] == USHRT_MAX)
>> +			continue;
>>
>> -	free_req.hdr.pcifunc = pcifunc;
>> -	free_req.all = 1;
>> -	rc = rvu_mbox_handler_npc_mcam_free_entry(rvu, &free_req, &rsp);
>> -	if (rc)
>> -		dev_err(rvu->dev,
>> -			"%s: Error deleting default entries (pcifunc=%#x\n",
>> -			__func__, pcifunc);
>> +		mutex_lock(&mcam->lock);
>> +		npc_mcam_clear_bit(mcam, ptr[i]);
>> +		mcam->entry2pfvf_map[ptr[i]] = NPC_MCAM_INVALID_MAP;
>> +		npc_cn20k_enable_mcam_entry(rvu, blkaddr, ptr[i], false);
>> +		mcam->entry2target_pffunc[ptr[i]] = 0x0;
>> +		mutex_unlock(&mcam->lock);
>> +
>> +		rc = npc_cn20k_idx_free(rvu, &ptr[i], 1);
>> +		if (rc) {
>> +			/* Non recoverable error. Let us WARN and return. Keep system alive to
>> +			 * enable debugging
>> +			 */
>> +			WARN(1, "%s Error deleting default entries (pcifunc=%#x) mcam_idx=%u\n",
>> +			     __func__, pcifunc, ptr[i]);
>> +			return;
> When npc_cn20k_idx_free() fails on some ptr[i], this path WARNs and
> returns from inside the per-entry loop. At that point, the earlier
> block in npc_cn20k_dft_rules_free() has already xa_erase()d the
> xa_pf2dfl_rmap entries for all four default rule IDs for this pcifunc,
> so npc_cn20k_dft_rules_idx_get() on a retry would return -ESRCH and
> could not reclaim anything.
>
> Given that, could the remaining ptr[i+1..3] entries leak on this path?
> Their bitmap bits stay cleared only if they were processed before the
> failure, but any entries after i still have their bitmap slot allocated,
> entry2pfvf_map still referencing pcifunc, MCAM line still enabled,
> entry2target_pffunc still set, and the cn20k idx allocator still holding
> the slots.
No. This is a non-recoverable error; there is no way to recover and
proceed. Warning and returning here lets the user debug the system.
>> +		}
>> +	}
>> +
>> +	mutex_lock(&mcam->lock);
>> +	list_for_each_entry_safe(rule, tmp, &mcam->mcam_rules, list) {
>> +		for (int i = 0; i < 4; i++) {
>> +			if (ptr[i] != rule->entry)
>> +				continue;
>> +
>> +			list_del(&rule->list);
>> +			kfree(rule);
>> +			break;
>> +		}
>> +	}
>> +	mutex_unlock(&mcam->lock);
>> }
> On the same error path, the list_for_each_entry_safe() walk over
> mcam->mcam_rules below is skipped entirely because of the early return,
> so rvu_npc_mcam_rule nodes for all four default indices (including the
> ones that were successfully torn down before the failure) stay on the
> list.
>
> The commit message says:
>
>     Resolve the default-rule indices, then for each valid slot clear the
>     bitmap entry, drop the PF/VF map, disable the MCAM line, clear the
>     target function, and npc_cn20k_idx_free(). Remove any
>     matching software mcam_rules nodes. On hard failure from idx_free, WARN
>     and stop so the box stays up for analysis.
>
> Is the "Remove any matching software mcam_rules nodes" step intended to
> also be skipped whenever idx_free() fails? If not, would it make sense
> to move the mcam_rules cleanup ahead of the per-entry loop, or continue
> the loop best-effort and WARN once at the end, so the software list and
> the already-torn-down slots do not silently diverge on the error path?
No. Leaving the entries on the mcam list shows more information in
debugfs. I don't think freeing the 4 default entries would save us
anything, so it is better to keep it this way.
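
For reference, the two error-handling strategies discussed above can be
compared in a small userspace model. This is only a sketch under stated
assumptions: struct model_mcam, model_idx_free() and the other model_*
helpers are hypothetical stand-ins for illustration, not the driver's
data structures or API, and locking and the mcam_rules list are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define NR_DFT_RULES	4	/* four default-rule slots, as in the loop above */
#define MODEL_MCAM_SZ	16	/* hypothetical table size, model only */

/* Hypothetical stand-in for the per-slot state torn down by the patch. */
struct model_mcam {
	bool in_use[MODEL_MCAM_SZ];	/* models the allocation bitmap */
	bool enabled[MODEL_MCAM_SZ];	/* models the MCAM line enable */
	int fail_idx;			/* slot whose idx free fails, -1 for none */
};

/* Mark every valid default slot as allocated and enabled. */
static void model_init(struct model_mcam *m, const unsigned short *ptr,
		       int fail_idx)
{
	memset(m, 0, sizeof(*m));
	m->fail_idx = fail_idx;
	for (int i = 0; i < NR_DFT_RULES; i++) {
		if (ptr[i] == USHRT_MAX)
			continue;
		assert(ptr[i] < MODEL_MCAM_SZ);
		m->in_use[ptr[i]] = true;
		m->enabled[ptr[i]] = true;
	}
}

/* Models the locked section: clear the bitmap bit and disable the line. */
static void model_teardown_slot(struct model_mcam *m, unsigned short idx)
{
	m->in_use[idx] = false;
	m->enabled[idx] = false;
}

/* Hypothetical stand-in for npc_cn20k_idx_free(); returns 0 on success. */
static int model_idx_free(const struct model_mcam *m, unsigned short idx)
{
	return idx == m->fail_idx ? -1 : 0;
}

/* Variant A: stop at the first idx free failure, as the patch does. */
static int free_stop_on_error(struct model_mcam *m, const unsigned short *ptr)
{
	for (int i = 0; i < NR_DFT_RULES; i++) {
		if (ptr[i] == USHRT_MAX)
			continue;
		model_teardown_slot(m, ptr[i]);
		if (model_idx_free(m, ptr[i]))
			return -1;	/* later slots are left untouched */
	}
	return 0;
}

/* Variant B: keep going and return the number of failed frees. */
static int free_best_effort(struct model_mcam *m, const unsigned short *ptr)
{
	int failed = 0;

	for (int i = 0; i < NR_DFT_RULES; i++) {
		if (ptr[i] == USHRT_MAX)
			continue;
		model_teardown_slot(m, ptr[i]);
		if (model_idx_free(m, ptr[i]))
			failed++;
	}
	return failed;
}

/* Count slots still marked allocated after a teardown attempt. */
static int count_in_use(const struct model_mcam *m)
{
	int n = 0;

	for (int i = 0; i < MODEL_MCAM_SZ; i++)
		n += m->in_use[i];
	return n;
}
```

With ptr = {4, 5, USHRT_MAX, 7} and the free failing for slot 5, variant A
returns early and leaves slot 7 allocated, while variant B clears all three
valid slots and reports one failure. The model says nothing about the
debugfs visibility argument, which is the reason given for keeping the
early return.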