From: Ratheesh Kannoth <rkannoth@marvell.com>
To: <linux-kernel@vger.kernel.org>, <netdev@vger.kernel.org>
Cc: <andrew+netdev@lunn.ch>, <davem@davemloft.net>,
<donald.hunter@gmail.com>, <edumazet@google.com>,
<horms@kernel.org>, <jiri@resnulli.us>, <kuba@kernel.org>,
<pabeni@redhat.com>, <sgoutham@marvell.com>
Subject: Re: [PATCH v20 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements
Date: Wed, 10 Jun 2026 10:38:05 +0530 [thread overview]
Message-ID: <aijxNS4BooInl2Si@rkannoth-OptiPlex-7090> (raw)
In-Reply-To: <20260609040453.711932-3-rkannoth@marvell.com>
On 2026-06-09 at 09:34:46, Ratheesh Kannoth (rkannoth@marvell.com) wrote:
> Improve MCAM visibility and field debugging for CN20K NPC.
>
> - Extend "mcam_layout" to show enabled (+) or disabled state per entry
> so status can be verified without parsing the full "mcam_entry" dump.
> - Add "dstats" debugfs entry: for enabled MCAM indices, print hit deltas
> since the prior read by comparing hardware counters to a per-entry
> software baseline and advancing that baseline after each read (hardware
> counters are not cleared).
> - Add "mismatch" debugfs entry: lists MCAM entries that are enabled
> but not explicitly allocated, helping diagnose allocation/field issues.
>
> Signed-off-by: Ratheesh Kannoth <rkannoth@marvell.com>
https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260609040453.711932-1-rkannoth%40marvell.com says:
> + continue;
>> + if (stats == dstats[bank][idx])
>> + continue;
>> +
>> + if (stats < dstats[bank][idx])
>> + dstats[bank][idx] = 0;
>The dstats[][] array is module-global and the show handler both reads
>and writes it on every read. A few observations about that contract,
>which the commit message describes as "hit deltas since the prior
>read":
>The baseline survives across npc_cn20k_enable_mcam_entry(..., false)
>followed by a re-enable with a different tenant. If the new tenant's
>hardware counter happens to equal the cached baseline, this branch
>silently skips the entry:
> if (stats == dstats[bank][idx])
> continue;
>On counter rollback the code zeroes the baseline first and then
>computes the delta against zero:
> if (stats < dstats[bank][idx])
> dstats[bank][idx] = 0;
> ...
> delta = stats - dstats[bank][idx];
>so the very next read after a rollback reports the full absolute
>counter value rather than a delta. Is that the intended behaviour?
>Also, since dstats[][] is a single shared baseline, two concurrent
>readers (for example an operator session and a monitoring script)
>serialise on stats_lock but each read advances the baseline for the
>other, so each one sees only a fraction of the deltas. Would
>per-open state (allocated in the open() callback rather than file
>scope) be more in line with what the commit message promises?
Thanks for pointing this out. The dstats matrix is intended purely
as a lightweight telemetry tool to identify which MCAM entries are
actively being hit by current traffic via debugfs, rather than
providing strict, transactionally guaranteed billing or accounting
metrics.
You are correct that a stale baseline could cause a transient
under-reporting on the very first read after a reallocation if the
hardware counter quickly surpasses the old baseline. However,
during that first read, the fallback logic will immediately catch
up and synchronize dstats[bank][idx] to the current hardware state
(stats). All subsequent reads will reflect completely accurate hit
increments.
Given that this is non-critical debugfs telemetry, i would like to
keep this as it is. or fix it as part of "net" tree.
>> +
>> + pf = 0xFFFF;
>> + map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
>> + if (map)
>> + pf = xa_to_value(map);
>> +
>> + delta = stats - dstats[bank][idx];
>> +
>> + snprintf(buff, sizeof(buff), "%u\t%#04x\t%llu\n",
>> + mcam_idx, pf, delta);
>> + seq_puts(s, buff);
>> +
>> + dstats[bank][idx] = stats;
>> + }
>> + }
>> +
>> + mutex_unlock(&stats_lock);
>> + return 0;
>> +}
>npc_mcam_dstats_show() calls npc_mcam_idx_2_key_type() and then uses
>key_type to decide whether to skip bank 1 for X4 entries, but it does
>not take sb->lock while doing so. The existing npc_mcam_layout_show()
>in the same file holds sb->lock around exactly these field reads:
> mutex_lock(&sb->lock);
> ...
> if (sb->key_type == NPC_MCAM_KEY_X4) {
> ...
>The mutators __npc_subbank_mark_used() and __npc_subbank_free() update
>sb->key_type under sb->lock via their npc_subbank_alloc /
>npc_subbank_free wrappers.
>If a subbank is concurrently being retyped, can this reader observe a
>transient sb->key_type value (for example 0 mid-free, or X4 for a
>subbank being retyped to X2) and as a result drop or duplicate
>entries due to the "key_type == NPC_MCAM_KEY_X4 && bank != 0" filter?
The dstats matrix is intended purely
as a lightweight telemetry tool to identify which MCAM entries are
actively being hit by current traffic via debugfs, rather than
providing strict, transactionally guaranteed billing or accounting
metrics. So these concurrency issue wont defeat the purpose of this
mcam entry.
>> +static int npc_mcam_mismatch_show(struct seq_file *s, void *unused)
>> +{
>> + struct npc_priv_t *npc_priv;
>> + struct npc_subbank *sb;
>> + int mcam_idx, sb_off;
>> + struct rvu *rvu;
>> + char buff[64];
>> + void *map;
>> + int rc;
>> +
>> + npc_priv = npc_priv_get();
>> + rvu = s->private;
>> +
>> + seq_puts(s, "index\tsb idx\tkw type\n");
>> + for (int bank = npc_priv->num_banks - 1; bank >= 0; bank--) {
>> + for (int idx = npc_priv->bank_depth - 1; idx >= 0; idx--) {
>> + mcam_idx = bank * npc_priv->bank_depth + idx;
>> +
>> + if (!test_bit(mcam_idx, npc_priv->en_map))
>> + continue;
>> +
>> + map = xa_load(&npc_priv->xa_idx2pf_map, mcam_idx);
>> + if (map)
>> + continue;
>> +
>> + rc = npc_mcam_idx_2_subbank_idx(rvu, mcam_idx,
>> + &sb, &sb_off);
>> + if (rc)
>> + continue;
>> +
>> + snprintf(buff, sizeof(buff), "%u\t%d\t%u\n",
>> + mcam_idx, sb->idx, sb->key_type);
>> +
>> + seq_puts(s, buff);
>> + }
>> + }
>> + return 0;
>> +}
>The same observation applies to npc_mcam_mismatch_show(): sb->idx and
>sb->key_type are read here without holding sb->lock, while the rest
>of npc.c serialises mutations of these fields under that lock. Was
>sb->lock omitted intentionally for these debug paths?
This was omitted intentionally. npc_mcam_mismatch_show() is a debugfs dump
intended to provide a point-in-time snapshot. Acquiring sb->lock per subbank would
only protect that specific iteration and wouldn't guarantee full consistency
across the entire loop snapshot anyway. Therefore, we avoided the locks here.
next prev parent reply other threads:[~2026-06-10 5:08 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-09 4:04 [PATCH v20 net-next 0/9] octeontx2-af: npc: Enhancements Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 1/9] octeontx2-af: enforce single RVU AF probe Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 2/9] octeontx2-af: npc: cn20k: debugfs enhancements Ratheesh Kannoth
2026-06-10 4:19 ` Ratheesh Kannoth
2026-06-10 5:08 ` Ratheesh Kannoth [this message]
2026-06-09 4:04 ` [PATCH v20 net-next 3/9] devlink: heap-allocate param fill buffers in devlink_nl_param_fill Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 4/9] devlink: Implement devlink param multi attribute nested data values Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 5/9] octeontx2-af: npc: cn20k: add subbank search order control Ratheesh Kannoth
2026-06-10 4:28 ` Ratheesh Kannoth
2026-06-10 5:10 ` Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 6/9] octeontx2: cn20k: Coordinate default rules with NIX LF lifecycle Ratheesh Kannoth
2026-06-10 4:30 ` Ratheesh Kannoth
2026-06-10 4:31 ` Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 7/9] octeontx2-af: npc: Support for custom KPU profile from filesystem Ratheesh Kannoth
2026-06-10 5:24 ` Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 8/9] octeontx2: cn20k: Respect NPC MCAM X2/X4 profile in flows and DFT alloc Ratheesh Kannoth
2026-06-10 5:31 ` Ratheesh Kannoth
2026-06-09 4:04 ` [PATCH v20 net-next 9/9] octeontx2-af: npc: cn20k: Allocate npc_priv and dstats dynamically Ratheesh Kannoth
2026-06-10 4:40 ` Ratheesh Kannoth
2026-06-10 5:33 ` Ratheesh Kannoth
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aijxNS4BooInl2Si@rkannoth-OptiPlex-7090 \
--to=rkannoth@marvell.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=donald.hunter@gmail.com \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=sgoutham@marvell.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox