From: Ratheesh Kannoth <rkannoth@marvell.com>
To: <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Cc: <sgoutham@marvell.com>, <davem@davemloft.net>,
	<edumazet@google.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
	<andrew+netdev@lunn.ch>
Subject: Re: [PATCH v4 net 09/10] octeontx2-af: npc: cn20k: Tear down default MCAM rules explicitly on free
Date: Mon, 27 Apr 2026 15:39:09 +0530
Message-ID: <ae81xaD0ZQuwmtvs@rkannoth-OptiPlex-7090>
In-Reply-To: <20260427063213.3937451-10-rkannoth@marvell.com>

On 2026-04-27 at 12:02:12, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> npc_cn20k_dft_rules_free() used the NPC MCAM mbox "free all" path, which
> does not match how cn20k tracks default-rule MCAM slot indexes.
>
> Resolve the default-rule indices, then for each valid slot clear the
> bitmap entry, drop the PF/VF map, disable the MCAM line, clear the
> target function, and npc_cn20k_idx_free(). Remove any
> matching software mcam_rules nodes. On hard failure from idx_free, WARN
> and stop so the box stays up for analysis.
>
> In npc_mcam_free_all_entries(), prefetch the same default-rule indices
> and, on cn20k, skip bitmap clear and idx_free when the scanned entry is
> one of those reserved defaults (they are released by
> npc_cn20k_dft_rules_free). Still disable the entry and tear down counter
> mapping for every matching index.
>
> Fixes: 09d3b7a1403f ("octeontx2-af: npc: cn20k: Allocate default MCAM indexes")
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>

>>  free_rules:
>> +	blkaddr = rvu_get_blkaddr(rvu, BLKTYPE_NPC, 0);
>> +	if (blkaddr < 0)
>> +		return;
>> +	for (int i = 0; i < 4; i++) {
>> +		if (ptr[i] == USHRT_MAX)
>> +			continue;
>>
>> -	free_req.hdr.pcifunc = pcifunc;
>> -	free_req.all = 1;
>> -	rc = rvu_mbox_handler_npc_mcam_free_entry(rvu, &free_req, &rsp);
>> -	if (rc)
>> -		dev_err(rvu->dev,
>> -			"%s: Error deleting default entries (pcifunc=%#x\n",
>> -			__func__, pcifunc);
>> +		mutex_lock(&mcam->lock);
>> +		npc_mcam_clear_bit(mcam, ptr[i]);
>> +		mcam->entry2pfvf_map[ptr[i]] = NPC_MCAM_INVALID_MAP;
>> +		npc_cn20k_enable_mcam_entry(rvu, blkaddr, ptr[i], false);
>> +		mcam->entry2target_pffunc[ptr[i]] = 0x0;
>> +		mutex_unlock(&mcam->lock);
>> +
>> +		rc = npc_cn20k_idx_free(rvu, &ptr[i], 1);
>> +		if (rc) {
>> +			/* Non recoverable error. Let us WARN and return. Keep system alive to
>> +			 * enable debugging
>> +			 */
>> +			WARN(1, "%s Error deleting default entries (pcifunc=%#x) mcam_idx=%u\n",
>> +			     __func__, pcifunc, ptr[i]);
>> +			return;
>When npc_cn20k_idx_free() fails on some ptr[i], this path WARNs and
>returns from inside the per-entry loop.  At that point, the earlier
>block in npc_cn20k_dft_rules_free() has already xa_erase()d the
>xa_pf2dfl_rmap entries for all four default rule IDs for this pcifunc,
>so npc_cn20k_dft_rules_idx_get() on a retry would return -ESRCH and
>could not reclaim anything.
>Given that, could the remaining ptr[i+1..3] entries leak on this path?
>Their bitmap bits stay cleared only if they were processed before the
>failure, but any entries after i still have their bitmap slot allocated,
>entry2pfvf_map still referencing pcifunc, MCAM line still enabled,
>entry2target_pffunc still set, and the cn20k idx allocator still holding
>the slots.
No. This is a non-recoverable error, and there is no way to recover and proceed.
WARNing and returning here lets the user debug the system.

>> +		}
>> +	}
>> +
>> +	mutex_lock(&mcam->lock);
>> +	list_for_each_entry_safe(rule, tmp, &mcam->mcam_rules, list) {
>> +		for (int i = 0; i < 4; i++) {
>> +			if (ptr[i] != rule->entry)
>> +				continue;
>> +
>> +			list_del(&rule->list);
>> +			kfree(rule);
>> +			break;
>> +		}
>> +	}
>> +	mutex_unlock(&mcam->lock);
>>  }
>On the same error path, the list_for_each_entry_safe() walk over
>mcam->mcam_rules below is skipped entirely because of the early return,
>so rvu_npc_mcam_rule nodes for all four default indices (including the
>ones that were successfully torn down before the failure) stay on the
>list.
>The commit message says:
>    Resolve the default-rule indices, then for each valid slot clear the
>    bitmap entry, drop the PF/VF map, disable the MCAM line, clear the
>    target function, and npc_cn20k_idx_free(). Remove any
>    matching software mcam_rules nodes. On hard failure from idx_free, WARN
>    and stop so the box stays up for analysis.
>Is the "Remove any matching software mcam_rules nodes" step intended to
>also be skipped whenever idx_free() fails?  If not, would it make sense
>to move the mcam_rules cleanup ahead of the per-entry loop, or continue
>the loop best-effort and WARN once at the end, so the software list and
>the already-torn-down slots do not silently diverge on the error path?
No. Keeping the nodes on the mcam list exposes more information in debugfs. I don't
think freeing the four default entries would save us anything, so it is better to
keep it this way.
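
For reference, the "continue the loop best-effort and WARN once at the end"
alternative raised in the review can be modeled in userspace roughly as below.
This is only a sketch under stated assumptions, not driver code: struct model,
model_idx_free(), model_dft_rules_free(), NUM_SLOTS and the fail_free
injection array are all hypothetical stand-ins for the MCAM bitmap, the line
enable and npc_cn20k_idx_free(). Locking, the PF/VF map and the mcam_rules
list are left out.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define NUM_DFT   4	/* four default rules per pcifunc, as in the patch */
#define NUM_SLOTS 16	/* hypothetical MCAM size for this model */

/* Hypothetical stand-in for the per-slot state the patch touches. */
struct model {
	bool allocated[NUM_SLOTS];	/* models the MCAM bitmap */
	bool enabled[NUM_SLOTS];	/* models the MCAM line enable */
	bool fail_free[NUM_SLOTS];	/* test hook: make idx_free fail */
};

/* Hypothetical stand-in for npc_cn20k_idx_free(): 0 on success, -1 on error. */
static int model_idx_free(const struct model *m, unsigned short idx)
{
	return m->fail_free[idx] ? -1 : 0;
}

/*
 * Best-effort teardown: visit every valid slot even after a failure,
 * count the failures and report once after the loop.
 * Returns the number of slots that could not be freed.
 */
static int model_dft_rules_free(struct model *m,
				const unsigned short ptr[NUM_DFT])
{
	int failed = 0;

	assert(m && ptr);

	for (int i = 0; i < NUM_DFT; i++) {
		if (ptr[i] == USHRT_MAX)	/* unused default slot */
			continue;
		if (ptr[i] >= NUM_SLOTS) {	/* out of range for the model */
			failed++;
			continue;
		}

		m->allocated[ptr[i]] = false;
		m->enabled[ptr[i]] = false;

		if (model_idx_free(m, ptr[i]))
			failed++;		/* remember, but keep going */
	}

	if (failed)
		fprintf(stderr, "%d default entries failed to free\n", failed);

	return failed;
}
```

In this model a failure on one index does not stop the slots after it from
being cleared and disabled. The trade-off is the one discussed above: the
state at the moment of the first failure is no longer left untouched for
debugging.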
