From mboxrd@z Thu Jan 1 00:00:00 1970
From: sashiko-bot@kernel.org
Subject: Re: [PATCH RFC v4 11/18] riscv_cbqri: resctrl: Add cache allocation via capacity block mask
Reply-To: sashiko@lists.linux.dev
To: "Drew Fustini"
Cc: robh@kernel.org, krzk+dt@kernel.org, devicetree@vger.kernel.org, conor+dt@kernel.org
In-Reply-To: <20260510-ssqosid-cbqri-rqsc-v7-0-v4-11-eb53831ef683@kernel.org>
References: <20260510-ssqosid-cbqri-rqsc-v7-0-v4-11-eb53831ef683@kernel.org>
Content-Type: text/plain; charset=utf-8
Date: Tue, 12 May 2026 04:01:34 +0000
Message-Id: <20260512040135.51715C2BCB0@smtp.kernel.org>
X-Mailing-List: devicetree@vger.kernel.org

Thank you for your contribution! Sashiko AI review found 8 potential issues to consider:

- [High] Initialization and reset loops unconditionally iterate over `CDP_NUM_TYPES`, sending invalid access types to non-CDP hardware.
- [High] Missing upper bounds check on `rcid_count` leads to silent resource isolation bypass via integer truncation.
- [Medium] `cbqri_resctrl_setup()` fails to clean up globally visible state on initialization failure.
- [Medium] Using `wait_event()` to wait for a previous initcall creates a dormant deadlock trap.
- [Medium] Unnecessary use of `synchronize_rcu()` inside CPU hotplug callbacks causes severe latency spikes.
- [Medium] The `cbqri_domain_list_lock` mutex is completely redundant.
- [Low] Enabling CONFIG_RISCV_ISA_SSQOSID and CONFIG_RESCTRL_FS without CONFIG_RISCV_CBQRI_DRIVER causes a linker error.
- [Low] Use of non-existent macro `kzalloc_obj` in `cbqri_new_domain()`.
commit 8f17399147965ff9d1d7a5a7237300d91e824dd5
Author: Drew Fustini

    riscv_cbqri: resctrl: Add cache allocation via capacity block mask

    This commit wires the CBQRI capacity controllers into the resctrl
    subsystem. It exposes L2 and L3 resources and handles domains lazily
    during CPU hotplug.

> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -595,6 +595,7 @@ config RISCV_ISA_SSQOSID
> 	bool "Ssqosid extension support for supervisor mode Quality of Service ID"
> 	depends on 64BIT
> 	default n
> +	select ARCH_HAS_CPU_RESCTRL

If CONFIG_RISCV_CBQRI_DRIVER is disabled, does this cause a linker error
when CONFIG_RESCTRL_FS is enabled? The riscv implementation of the
resctrl_arch_* functions is in cbqri_resctrl.c, which seems to only be
compiled when the driver is enabled.

> diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resctrl.c
> --- /dev/null
> +++ b/drivers/resctrl/cbqri_resctrl.c

[ ... ]

> +u32 resctrl_arch_get_num_closid(struct rdt_resource *res)
> +{
> +	struct cbqri_resctrl_res *hw_res;
> +
> +	hw_res = container_of(res, struct cbqri_resctrl_res, resctrl_res);
> +
> +	/*
> +	 * fs/resctrl calls this for resctrl-defined rids that CBQRI may not
> +	 * back (e.g. RDT_RESOURCE_MBA from set_mba_sc() during unmount).
> +	 * Unpicked rids have ctrl == NULL. Report no closids.
> +	 */
> +	if (!hw_res->ctrl)
> +		return 0;
> +
> +	return hw_res->ctrl->rcid_count;
> +}

Is there a bounds check needed for rcid_count against the 12-bit maximum?

If the firmware reports an rcid_count greater than 4096, fs/resctrl would
permit creating groups with closid >= 4096. This could lead to FIELD_PREP()
silently truncating the upper bits in resctrl_arch_set_closid_rmid(),
causing the task to run in a different group and bypassing the intended
hardware isolation.

[ ... ]
> +static struct rdt_ctrl_domain *cbqri_new_domain(struct cbqri_controller *ctrl)
> +{
> +	struct cbqri_resctrl_dom *hw_dom;
> +	struct rdt_ctrl_domain *domain;
> +
> +	hw_dom = kzalloc_obj(*hw_dom, GFP_KERNEL);

Does kzalloc_obj exist in the kernel? It looks like this might cause a
build failure due to an implicit function declaration.

[ ... ]

> +static int cbqri_init_domain_ctrlval(struct rdt_resource *r, struct rdt_ctrl_domain *d)
> +{
> +	struct cbqri_resctrl_res *hw_res;
> +	enum resctrl_conf_type t;
> +	int err = 0;
> +	int i;
> +
> +	hw_res = container_of(r, struct cbqri_resctrl_res, resctrl_res);
> +
> +	for (i = 0; i < hw_res->ctrl->rcid_count; i++) {
> +		/*
> +		 * Seed both DATA and CODE staged slots so a later mount
> +		 * with -o cdp does not see stale CODE values.
> +		 * CDP_NUM_TYPES is 1 on non-CDP controllers.
> +		 */
> +		for (t = 0; t < CDP_NUM_TYPES; t++) {
> +			err = resctrl_arch_update_one(r, d, i, t,
> +						      resctrl_get_default_ctrl(r));

Does this loop correctly handle non-CDP hardware? A comment mentions that
CDP_NUM_TYPES is 1 on non-CDP controllers, but since it is a compile-time
enum constant, the loop will always execute for t = CDP_CODE. This could
cause resctrl_arch_update_one() to send CBQRI_AT_CODE to controllers that
do not support it, potentially triggering a hardware fault and aborting
initialization.

[ ... ]

> +static void cbqri_detach_cpu_from_ctrl_domains(struct rdt_resource *res,
> +					       unsigned int cpu)
> +{
> +	struct rdt_ctrl_domain *domain, *tmp;
> +
> +	list_for_each_entry_safe(domain, tmp, &res->ctrl_domains, hdr.list) {
> +		if (!cpumask_test_cpu(cpu, &domain->hdr.cpu_mask))
> +			continue;
> +		cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask);
> +		if (cpumask_empty(&domain->hdr.cpu_mask)) {
> +			resctrl_offline_ctrl_domain(res, domain);
> +			list_del_rcu(&domain->hdr.list);
> +			synchronize_rcu();

Could this synchronize_rcu() call cause severe latency spikes during CPU
hotplug?
This function is called synchronously from cbqri_resctrl_offline_cpu() while
the core kernel holds the global cpu_hotplug_lock. It seems fs/resctrl
manages these lists using standard locks, so the RCU primitives might not be
necessary here.

[ ... ]

> +static int cbqri_resctrl_setup(void)
> +{
> +	int rid;
> +	int err;
> +
> +	/* Wait for cacheinfo so cbqri_probe_cc()'s lazy fill has data. */
> +	wait_event(wait_cacheinfo_ready, cacheinfo_ready);

Could this wait_event() potentially put the init thread to sleep
indefinitely? Because cbqri_resctrl_setup() and __cacheinfo_ready() both
run as initcalls on the same kernel_init thread, if the previous initcall
failed or the order changed, wait_event() would sleep forever since no
other thread exists to wake it up.

[ ... ]

> +	cbqri_resctrl_accumulate_caps();
> +
> +	if (!exposed_alloc_capable) {
> +		pr_debug("no resctrl-capable CBQRI controllers found\n");
> +		return -ENODEV;
> +	}
> +
> +	err = resctrl_init();
> +	if (err)
> +		return err;

If resctrl_init() fails, should this function call cbqri_resctrl_teardown()
or otherwise revert the global state? Since cbqri_resctrl_accumulate_caps()
sets exposed_alloc_capable to true, leaving it set on failure would cause
resctrl_arch_alloc_capable() to incorrectly return true.

[ ... ]

> +static int cbqri_resctrl_online_cpu(unsigned int cpu)
> +{
> +	struct cbqri_controller *ctrl;
> +	int err = 0;
> +
> +	mutex_lock(&cbqri_domain_list_lock);

Is the cbqri_domain_list_lock mutex necessary? Since
cbqri_resctrl_online_cpu() and cbqri_resctrl_offline_cpu() are invoked
exclusively by the CPU hotplug state machine, the core kernel already
serializes them globally using the cpu_hotplug_lock.

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260510-ssqosid-cbqri-rqsc-v7-0-v4-0-eb53831ef683@kernel.org?part=11