From: Drew Fustini <fustini@kernel.org>
To: Fenghua Yu <fenghuay@nvidia.com>
Cc: "Adrien Ricciardi" <aricciardi@baylibre.com>,
"Alexandre Ghiti" <alex@ghiti.fr>,
"Atish Kumar Patra" <atishp@rivosinc.com>,
"Atish Patra" <atish.patra@linux.dev>,
"Babu Moger" <babu.moger@amd.com>,
"Ben Horgan" <ben.horgan@arm.com>,
"Borislav Petkov" <bp@alien8.de>,
"Chen Pei" <cp0613@linux.alibaba.com>,
"Conor Dooley" <conor.dooley@microchip.com>,
"Conor Dooley" <conor+dt@kernel.org>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
"Dave Martin" <Dave.Martin@arm.com>,
"Gong Shuai" <gong.shuai@sanechips.com.cn>,
"Gong Shuai" <gsh517@gmail.com>,
guo.wenjia23@zte.com.cn, "James Morse" <james.morse@arm.com>,
"Kornel Dulęba" <mindal@semihalf.com>,
"Krzysztof Kozlowski" <krzk+dt@kernel.org>,
liu.qingtao2@zte.com.cn,
"Liu Zhiwei" <zhiwei_liu@linux.alibaba.com>,
"Palmer Dabbelt" <palmer@dabbelt.com>,
"Paul Walmsley" <pjw@kernel.org>,
"Peter Newman" <peternewman@google.com>,
"Radim Krčmář" <rkrcmar@ventanamicro.com>,
"Reinette Chatre" <reinette.chatre@intel.com>,
"Rob Herring" <robh@kernel.org>,
"Samuel Holland" <samuel.holland@sifive.com>,
"Sebastian Andrzej Siewior" <bigeasy@linutronix.de>,
"Tony Luck" <tony.luck@intel.com>,
"Vasudevan Srinivasan" <vasu@rivosinc.com>,
"Ved Shanbhogue" <ved@rivosinc.com>,
"Weiwei Li" <liwei1518@gmail.com>,
"yunhui cui" <cuiyunhui@bytedance.com>,
linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
x86@kernel.org, devicetree@vger.kernel.org,
linux-rt-devel@lists.linux.dev, linux-doc@vger.kernel.org
Subject: Re: [PATCH v3 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
Date: Fri, 3 Jul 2026 01:00:25 -0700 [thread overview]
Message-ID: <akdsGZm8e9w5idhn@x1> (raw)
In-Reply-To: <473da771-b711-457b-b9ad-491fee111b16@nvidia.com>
On Wed, Jul 01, 2026 at 01:18:41PM -0700, Fenghua Yu wrote:
> Hi, Drew,
Hi, thanks for reviewing.
> Could you please change my email address to my NVIDIA email
> fenghuay@nvidia.com?
No problem.
> On 6/28/26 14:18, Drew Fustini wrote:
> > Wire CBQRI capacity controllers into resctrl as RDT_RESOURCE_L2 and
> > RDT_RESOURCE_L3 schemata.
> >
> > Mismatched CC caps at the same cache level are treated as a fatal
> > configuration error since fs/resctrl exposes a single per-rid cap
> > set. Domains are created lazily in the cpuhp online callback so
> > cpu_mask reflects only currently online CPUs.
> >
> > Assisted-by: Claude:claude-opus-4-7
> > Co-developed-by: Adrien Ricciardi <aricciardi@baylibre.com>
> > Signed-off-by: Adrien Ricciardi <aricciardi@baylibre.com>
> > Signed-off-by: Drew Fustini <fustini@kernel.org>
> > ---
> > MAINTAINERS | 2 +
> > arch/riscv/include/asm/resctrl.h | 147 ++++++++
> > drivers/resctrl/Kconfig | 4 +
> > drivers/resctrl/Makefile | 1 +
> > drivers/resctrl/cbqri_resctrl.c | 787 +++++++++++++++++++++++++++++++++++++++
> > 5 files changed, 941 insertions(+)
> [SNIP]
>
> > +/*
> > + * Walk cbqri_controllers and pick one capacity controller (CC) per cache
> > + * level (L2/L3) to back the corresponding RDT_RESOURCE_L*. When more than
> > + * one CC sits at the same level (e.g. one per socket), they must agree on
> > + * rcid_count / ncblks / alloc_capable. A mismatch is fatal because resctrl
> > + * exposes a single set of caps per rid. The first matching controller wins.
> > + */
> > +static int cbqri_resctrl_pick_caches(void)
> > +{
> > + struct cbqri_controller *ctrl;
> > + int ret = 0;
> > +
> > + mutex_lock(&cbqri_controllers_lock);
>
> Is it better to change mutex_lock()/mutex_unlock() to
> guard(mutex)(&cbqri_controllers_lock)?
>
> 1. This code is simpler and can avoid potential missing unlock issue.
> 2. This matches mpam code.
Good point, I will switch cbqri_controllers_lock lock/unlock to
guard(mutex).
>
> > +
> > + list_for_each_entry(ctrl, &cbqri_controllers, list) {
> > + struct cbqri_resctrl_res *cbqri_res;
> > + int rid;
> > +
> > + if (ctrl->type != CBQRI_CONTROLLER_TYPE_CAPACITY)
> > + continue;
> > + if (!ctrl->alloc_capable)
> > + continue;
> > +
> > + rid = cbqri_cache_level_to_rid(ctrl->cache.cache_level);
> > + if (rid < 0) {
> > + pr_info("skipping controller at unsupported cache level %u\n",
> > + ctrl->cache.cache_level);
> > + continue;
> > + }
> > +
> > + cbqri_res = &cbqri_resctrl_resources[rid];
> > + if (cbqri_res->ctrl) {
> > + /*
> > + * CCs at the same cache level must agree on every cap
> > + * resctrl exposes globally. Reject mismatches at pick
> > + * time so the inconsistency is visible at boot.
> > + */
> > + if (cbqri_res->ctrl->rcid_count != ctrl->rcid_count ||
> > + cbqri_res->ctrl->cc.ncblks != ctrl->cc.ncblks ||
> > + cbqri_res->ctrl->cc.supports_alloc_at_code !=
> > + ctrl->cc.supports_alloc_at_code ||
> > + cbqri_res->ctrl->alloc_capable != ctrl->alloc_capable) {
> > + pr_err("L%d controllers have mismatched capabilities\n",
> > + ctrl->cache.cache_level);
> > + ret = -EINVAL;
> > + break;
>
> Is it possible to support cbqri on both L2 and L3 on the same machine?
> Failure on one controller will stop picking another other controller here.
>
> If both L2 and L3 can be supported on the same machine, does it make sense
> to pr_err() (fatal for this controller) and continue to go to the next
> controller? So failure on L2 won't impact L3?
>
> If that's the case, does it make sense not to return error for
> pick_caches()? So pick_caches() failure is not fatal?
Yes, both L2 and L3 can be present at once, and you're right that this
approach was too strict. I will change it so that it drops only the
offending cache level. I will make pick_caches() clear that rid so it is
not exposed and continue to the next controller. It no longer returns an
error, so an L2 mismatch leaves L3, and any future QoS feature untouched.
[..]
> > +static int cbqri_attach_cpu_to_all_ctrls(unsigned int cpu)
> > +{
> > + struct cbqri_controller *ctrl;
> > + int err = 0;
> > +
> > + lockdep_assert_held(&cbqri_domain_list_lock);
> > +
> > + /*
> > + * Hold cbqri_controllers_lock across the walk so a controller
> > + * registered after boot cannot corrupt it. The register path takes
> > + * it as a leaf and never cbqri_domain_list_lock, so this nesting
> > + * cannot invert.
> > + */
> > + mutex_lock(&cbqri_controllers_lock);
>
> guard(mutex)(&cbqri_controllers_lock)?
Ack.
> > +static int cbqri_resctrl_setup(void)
> > +{
> > + int rid;
> > + int err;
> > +
> > + for (rid = 0; rid < RDT_NUM_RESOURCES; rid++)
> > + cbqri_resctrl_resources[rid].resctrl_res.rid = rid;
> > +
> > + err = cbqri_resctrl_pick_caches();
> > + if (err)
> > + return err;
>
> Failure in pick_caches() will abort any future cbqri features e.g. memory bw
> allocation/monitoring. Is it possible to ignore the pick_caches() failure
> and continue to setup other cbqri features? Failed caches won't impact other
> QoS features, right?
You're right, and this is the same change as the L2/L3 mismatch above. I
will drop the error return from pick_caches() so that a cache problem can
no longer abort setup or block other QoS features. Cache capacity and
memory bandwidth are separate controller types that will be picked
independently, so a failed cache will not gate bandwidth alloc/mon when
those land. The only early return I will keep is the -ENODEV for when
nothing at all is exposable.
Thanks,
Drew
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2026-07-03 8:00 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-28 21:18 [PATCH v3 0/8] riscv: Add Ssqosid and initial CBQRI resctrl support Drew Fustini
2026-06-28 21:18 ` [PATCH v3 1/8] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-06-28 21:18 ` [PATCH v3 2/8] riscv: Detect the Ssqosid extension Drew Fustini
2026-06-28 21:18 ` [PATCH v3 3/8] riscv: Add support for srmcfg CSR from " Drew Fustini
[not found] ` <20260628212820.D0DD51F000E9@smtp.kernel.org>
2026-06-29 22:46 ` Drew Fustini
2026-06-28 21:18 ` [PATCH v3 4/8] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
[not found] ` <20260628212810.05F6D1F00A3A@smtp.kernel.org>
2026-07-01 0:24 ` Drew Fustini
2026-06-28 21:18 ` [PATCH v3 5/8] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-07-01 20:18 ` Fenghua Yu
2026-07-03 8:00 ` Drew Fustini [this message]
2026-06-28 21:18 ` [PATCH v3 6/8] riscv: Enable resctrl filesystem for Ssqosid Drew Fustini
2026-06-28 21:18 ` [PATCH v3 7/8] dt-bindings: riscv: Add binding for CBQRI controllers Drew Fustini
2026-06-29 15:32 ` Conor Dooley
2026-06-28 21:18 ` [PATCH v3 8/8] riscv_cbqri: Add CBQRI capacity allocation platform driver Drew Fustini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=akdsGZm8e9w5idhn@x1 \
--to=fustini@kernel.org \
--cc=Dave.Martin@arm.com \
--cc=alex@ghiti.fr \
--cc=aricciardi@baylibre.com \
--cc=atish.patra@linux.dev \
--cc=atishp@rivosinc.com \
--cc=babu.moger@amd.com \
--cc=ben.horgan@arm.com \
--cc=bigeasy@linutronix.de \
--cc=bp@alien8.de \
--cc=conor+dt@kernel.org \
--cc=conor.dooley@microchip.com \
--cc=cp0613@linux.alibaba.com \
--cc=cuiyunhui@bytedance.com \
--cc=dave.hansen@linux.intel.com \
--cc=devicetree@vger.kernel.org \
--cc=fenghuay@nvidia.com \
--cc=gong.shuai@sanechips.com.cn \
--cc=gsh517@gmail.com \
--cc=guo.wenjia23@zte.com.cn \
--cc=james.morse@arm.com \
--cc=krzk+dt@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=linux-rt-devel@lists.linux.dev \
--cc=liu.qingtao2@zte.com.cn \
--cc=liwei1518@gmail.com \
--cc=mindal@semihalf.com \
--cc=palmer@dabbelt.com \
--cc=peternewman@google.com \
--cc=pjw@kernel.org \
--cc=reinette.chatre@intel.com \
--cc=rkrcmar@ventanamicro.com \
--cc=robh@kernel.org \
--cc=samuel.holland@sifive.com \
--cc=tony.luck@intel.com \
--cc=vasu@rivosinc.com \
--cc=ved@rivosinc.com \
--cc=x86@kernel.org \
--cc=zhiwei_liu@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox