From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: conor+dt@kernel.org, devicetree@vger.kernel.org,
krzk+dt@kernel.org, robh@kernel.org,
linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
x86@kernel.org
Subject: Re: [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from Ssqosid extension
Date: Thu, 14 May 2026 15:24:56 -0700 [thread overview]
Message-ID: <agZLuHhsN5ezSmqg@tt-p14-dfustini> (raw)
In-Reply-To: <20260511235300.781C6C2BCB0@smtp.kernel.org>
On Mon, May 11, 2026 at 11:52:59PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
> - [High] Uninitialized Hardware State / Early Boot Bypass for CSR_SRMCFG
> - [High] Resource Control Bypass via Monitoring Group (MCID) evaluation flaw
> - [High] Hardware/Software State Desync on Boot CPU Resume from Suspend
> - [Medium] Redundant loop and Data Race in hotplug initialization
> - [Medium] Initcall returns positive hotplug state ID on success
> --
>
> commit 0e2697a6f259670c211c47e45e2448410a9ac7b9
> Author: Drew Fustini <fustini@kernel.org>
>
> riscv: add support for srmcfg CSR from Ssqosid extension
>
> This commit adds support for the srmcfg CSR defined in the Ssqosid ISA
> extension. It introduces a per-task srmcfg field mirrored to a per-CPU
> cache to optimize context switching and configure resource management IDs.
>
> Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
>
> > diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
> > new file mode 100644
> > index 0000000000000..6988fe37551ee
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/qos.h
> > @@ -0,0 +1,64 @@
> [ ... ]
> > +static inline void __switch_to_srmcfg(struct task_struct *next)
> > +{
> > + u32 thread_srmcfg;
> > +
> > + thread_srmcfg = READ_ONCE(next->thread.srmcfg);
> > +
> > + /* Default-group tasks (thread.srmcfg == 0) follow this CPU's default. */
> > + if (thread_srmcfg == 0)
> > + thread_srmcfg = __this_cpu_read(cpu_srmcfg_default);
>
> Does this logic correctly handle tasks assigned to a specific monitoring
> group but the default resource group?
>
> The srmcfg CSR encodes both the Resource Control ID (RCID) and Monitoring
> Counter ID (MCID).
>
> If a task is placed in the default resource allocation group (RCID=0) but
> is assigned a specific monitoring group (MCID!=0), its thread.srmcfg will
> be non-zero.
>
> The thread_srmcfg == 0 check will fail, bypassing the cpu_srmcfg_default
> override. This would allow the task to escape the CPU's restricted RCID
> allocation.
>
> Could the RCID and MCID fields be extracted and evaluated independently?
Yes, good catch. v5 will rewrite __switch_to_srmcfg() to evaluate RCID
and MCID against cpu_srmcfg_default independently, matching x86 RDT's
__resctrl_sched_in() per-field logic. A task in the default RCID group
with a specific MCID (or vice versa) now correctly inherits the CPU
default for the unassigned dimension.
>
> > diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
> > new file mode 100644
> > index 0000000000000..75bda2ed89e1f
> > --- /dev/null
> > +++ b/arch/riscv/kernel/qos.c
> > @@ -0,0 +1,49 @@
> [ ... ]
> > +/* cached value of srmcfg csr for each cpu */
> > +DEFINE_PER_CPU(u32, cpu_srmcfg);
>
> Will this cause the CSR write to be bypassed during early boot?
>
> DEFINE_PER_CPU places the cache in .bss or implicitly zero-initializes it.
>
> During early boot, context switches occur to threads like idle or init
> before the arch_initcall executes, and these early threads have
> thread.srmcfg == 0.
>
> When __switch_to_srmcfg() checks if thread_srmcfg != __this_cpu_read(cpu_srmcfg),
> it will evaluate to 0 != 0 and skip writing the CSR. This means the CPU
> would execute all early boot code with an implementation-defined,
> potentially random hardware CSR state.
>
> Could we statically initialize the per-CPU variable, for example:
>
> DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX;
I will fix in v5
> > +
> > +/* default srmcfg value for each cpu, set via resctrl cpu assignment */
> > +DEFINE_PER_CPU(u32, cpu_srmcfg_default);
> > +
> > +/*
> > + * Seed the per-CPU srmcfg cache to a sentinel that no real srmcfg encoding
> > + * can produce (MCID << 16 | RCID, both fields well under 16 bits) so the
> > + * next __switch_to_srmcfg() unconditionally writes the CSR. Ssqosid v1.0
> > + * leaves CSR state across hart stop/start implementation-defined, so the
> > + * cached value cannot be trusted after online.
> > + */
> > +static int riscv_srmcfg_online(unsigned int cpu)
> > +{
> > + per_cpu(cpu_srmcfg, cpu) = U32_MAX;
> > + return 0;
> > +}
>
> How is the boot CPU's cache handled during system resume from suspend?
I will add a CPU PM notifier that invalidates cpu_srmcfg on CPU_PM_EXIT
and CPU_PM_ENTER_FAILED, so the boot CPU also forces a CSR write on
resume.
> > +static int __init riscv_srmcfg_init(void)
> > +{
> > + unsigned int cpu;
> > + int err;
> > +
> > + if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
> > + return 0;
> > +
> > + /* Seed already-online CPUs. The cpuhp callback covers later onlines. */
> > + for_each_online_cpu(cpu)
> > + per_cpu(cpu_srmcfg, cpu) = U32_MAX;
>
> Is this manual iteration over online CPUs necessary?
No, I will drop it. cpuhp_setup_state() already invokes the startup
callback on every already-online CPU.
> > + err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
> > + riscv_srmcfg_online, NULL);
> > + if (err < 0)
> > + pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
> > + err);
> > + return err;
> > +}
> > +arch_initcall(riscv_srmcfg_init);
>
> Does this return a false failure during boot?
do_one_initcall() in init/main.c ignores positive return values from
initcalls. I will leave this as-is.
thanks,
drew
WARNING: multiple messages have this Message-ID (diff)
From: Drew Fustini <fustini@kernel.org>
To: sashiko@lists.linux.dev
Cc: conor+dt@kernel.org, devicetree@vger.kernel.org,
krzk+dt@kernel.org, robh@kernel.org,
linux-kernel@vger.kernel.org, linux-riscv@lists.infradead.org,
x86@kernel.org
Subject: Re: [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from Ssqosid extension
Date: Thu, 14 May 2026 15:24:56 -0700 [thread overview]
Message-ID: <agZLuHhsN5ezSmqg@tt-p14-dfustini> (raw)
In-Reply-To: <20260511235300.781C6C2BCB0@smtp.kernel.org>
On Mon, May 11, 2026 at 11:52:59PM +0000, sashiko-bot@kernel.org wrote:
> Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
> - [High] Uninitialized Hardware State / Early Boot Bypass for CSR_SRMCFG
> - [High] Resource Control Bypass via Monitoring Group (MCID) evaluation flaw
> - [High] Hardware/Software State Desync on Boot CPU Resume from Suspend
> - [Medium] Redundant loop and Data Race in hotplug initialization
> - [Medium] Initcall returns positive hotplug state ID on success
> --
>
> commit 0e2697a6f259670c211c47e45e2448410a9ac7b9
> Author: Drew Fustini <fustini@kernel.org>
>
> riscv: add support for srmcfg CSR from Ssqosid extension
>
> This commit adds support for the srmcfg CSR defined in the Ssqosid ISA
> extension. It introduces a per-task srmcfg field mirrored to a per-CPU
> cache to optimize context switching and configure resource management IDs.
>
> Link: https://github.com/riscv/riscv-ssqosid/releases/tag/v1.0
>
> > diff --git a/arch/riscv/include/asm/qos.h b/arch/riscv/include/asm/qos.h
> > new file mode 100644
> > index 0000000000000..6988fe37551ee
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/qos.h
> > @@ -0,0 +1,64 @@
> [ ... ]
> > +static inline void __switch_to_srmcfg(struct task_struct *next)
> > +{
> > + u32 thread_srmcfg;
> > +
> > + thread_srmcfg = READ_ONCE(next->thread.srmcfg);
> > +
> > + /* Default-group tasks (thread.srmcfg == 0) follow this CPU's default. */
> > + if (thread_srmcfg == 0)
> > + thread_srmcfg = __this_cpu_read(cpu_srmcfg_default);
>
> Does this logic correctly handle tasks assigned to a specific monitoring
> group but the default resource group?
>
> The srmcfg CSR encodes both the Resource Control ID (RCID) and Monitoring
> Counter ID (MCID).
>
> If a task is placed in the default resource allocation group (RCID=0) but
> is assigned a specific monitoring group (MCID!=0), its thread.srmcfg will
> be non-zero.
>
> The thread_srmcfg == 0 check will fail, bypassing the cpu_srmcfg_default
> override. This would allow the task to escape the CPU's restricted RCID
> allocation.
>
> Could the RCID and MCID fields be extracted and evaluated independently?
Yes, good catch. v5 will rewrite __switch_to_srmcfg() to evaluate RCID
and MCID against cpu_srmcfg_default independently, matching x86 RDT's
__resctrl_sched_in() per-field logic. A task in the default RCID group
with a specific MCID (or vice versa) now correctly inherits the CPU
default for the unassigned dimension.
>
> > diff --git a/arch/riscv/kernel/qos.c b/arch/riscv/kernel/qos.c
> > new file mode 100644
> > index 0000000000000..75bda2ed89e1f
> > --- /dev/null
> > +++ b/arch/riscv/kernel/qos.c
> > @@ -0,0 +1,49 @@
> [ ... ]
> > +/* cached value of srmcfg csr for each cpu */
> > +DEFINE_PER_CPU(u32, cpu_srmcfg);
>
> Will this cause the CSR write to be bypassed during early boot?
>
> DEFINE_PER_CPU places the cache in .bss or implicitly zero-initializes it.
>
> During early boot, context switches occur to threads like idle or init
> before the arch_initcall executes, and these early threads have
> thread.srmcfg == 0.
>
> When __switch_to_srmcfg() checks if thread_srmcfg != __this_cpu_read(cpu_srmcfg),
> it will evaluate to 0 != 0 and skip writing the CSR. This means the CPU
> would execute all early boot code with an implementation-defined,
> potentially random hardware CSR state.
>
> Could we statically initialize the per-CPU variable, for example:
>
> DEFINE_PER_CPU(u32, cpu_srmcfg) = U32_MAX;
I will fix in v5
> > +
> > +/* default srmcfg value for each cpu, set via resctrl cpu assignment */
> > +DEFINE_PER_CPU(u32, cpu_srmcfg_default);
> > +
> > +/*
> > + * Seed the per-CPU srmcfg cache to a sentinel that no real srmcfg encoding
> > + * can produce (MCID << 16 | RCID, both fields well under 16 bits) so the
> > + * next __switch_to_srmcfg() unconditionally writes the CSR. Ssqosid v1.0
> > + * leaves CSR state across hart stop/start implementation-defined, so the
> > + * cached value cannot be trusted after online.
> > + */
> > +static int riscv_srmcfg_online(unsigned int cpu)
> > +{
> > + per_cpu(cpu_srmcfg, cpu) = U32_MAX;
> > + return 0;
> > +}
>
> How is the boot CPU's cache handled during system resume from suspend?
I will add a CPU PM notifier that invalidates cpu_srmcfg on CPU_PM_EXIT
and CPU_PM_ENTER_FAILED, so the boot CPU also forces a CSR write on
resume.
> > +static int __init riscv_srmcfg_init(void)
> > +{
> > + unsigned int cpu;
> > + int err;
> > +
> > + if (!riscv_has_extension_unlikely(RISCV_ISA_EXT_SSQOSID))
> > + return 0;
> > +
> > + /* Seed already-online CPUs. The cpuhp callback covers later onlines. */
> > + for_each_online_cpu(cpu)
> > + per_cpu(cpu_srmcfg, cpu) = U32_MAX;
>
> Is this manual iteration over online CPUs necessary?
No, I will drop it. cpuhp_setup_state() already invokes the startup
callback on every already-online CPU.
> > + err = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "riscv/srmcfg:online",
> > + riscv_srmcfg_online, NULL);
> > + if (err < 0)
> > + pr_warn("srmcfg cpuhp registration failed (%d), cpus brought online after boot will not invalidate the CSR_SRMCFG cache\n",
> > + err);
> > + return err;
> > +}
> > +arch_initcall(riscv_srmcfg_init);
>
> Does this return a false failure during boot?
do_one_initcall() in init/main.c ignores positive return values from
initcalls. I will leave this as-is.
thanks,
drew
_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv
next prev parent reply other threads:[~2026-05-14 22:24 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-11 5:10 [PATCH RFC v4 00/18] riscv: add Ssqosid and CBQRI resctrl support Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 01/18] dt-bindings: riscv: Add Ssqosid extension description Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 02/18] riscv: detect the Ssqosid extension Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 5:10 ` [PATCH RFC v4 03/18] riscv: add support for srmcfg CSR from " Drew Fustini
2026-05-11 5:10 ` Drew Fustini
2026-05-11 23:52 ` sashiko-bot
2026-05-14 22:24 ` Drew Fustini [this message]
2026-05-14 22:24 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 04/18] fs/resctrl: Add resctrl_is_membw() helper Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 05/18] fs/resctrl: Add RDT_RESOURCE_MB_MIN and RDT_RESOURCE_MB_WGHT Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 06/18] fs/resctrl: Let bandwidth resources default to min_bw at reset Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 1:26 ` sashiko-bot
2026-05-15 13:46 ` Drew Fustini
2026-05-15 13:46 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 08/18] riscv_cbqri: Add capacity controller monitoring " Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 1:58 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 09/18] riscv_cbqri: Add bandwidth controller probe and allocation " Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 2:29 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 10/18] riscv_cbqri: Add bandwidth controller monitoring " Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 4:01 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 12/18] riscv_cbqri: resctrl: Add L3 cache occupancy monitoring Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 5:00 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 13/18] riscv_cbqri: resctrl: Add MB_MIN bandwidth allocation via Rbwb Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 5:42 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 15/18] riscv_cbqri: resctrl: Add mbm_total_bytes bandwidth monitoring Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 21:26 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 16/18] ACPI: RISC-V: Parse RISC-V Quality of Service Controller (RQSC) table Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 21:46 ` sashiko-bot
2026-05-11 5:11 ` [PATCH RFC v4 17/18] ACPI: RISC-V: Add support for RISC-V Quality of Service Controller (RQSC) Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-11 5:11 ` [PATCH RFC v4 18/18] riscv: enable resctrl filesystem for Ssqosid Drew Fustini
2026-05-11 5:11 ` Drew Fustini
2026-05-12 22:56 ` sashiko-bot
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=agZLuHhsN5ezSmqg@tt-p14-dfustini \
--to=fustini@kernel.org \
--cc=conor+dt@kernel.org \
--cc=devicetree@vger.kernel.org \
--cc=krzk+dt@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=robh@kernel.org \
--cc=sashiko@lists.linux.dev \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.