From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0D4AF3446C7 for ; Mon, 1 Jun 2026 20:57:34 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780347456; cv=none; b=B/q09lXI5UzQx3BxuWGLs2ot3ojFc/UOW1vkjpuHLMDRCdpNi1fBMCivF2wnk2iJCt208KGOUBTqiVMcbq/UnukAqYg7huLDvQ0uAiM42zsLG4eRU/Nz2M8npj5XZPs0U2dy4nSDE3fSjzn15JKQZigboDCDGcZYgV1vcH4a2nE= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1780347456; c=relaxed/simple; bh=YDGQ1cnEeL7m4CxYzmmKj+7V6GIkBChOW9UH2LgxERM=; h=From:Subject:To:Cc:In-Reply-To:References:Content-Type:Date: Message-Id; b=n4y+ukafnakwyoFH6znJpNrPdVS5U6j1nj1pgzT+HLImyhOX+KzlX8yIwKFm1h+T5A/E5StSkOJB6MZynxDMJROwXtAy/ZK92Z3xwoZcCRhbbUjqEvdm8+CSpOpit1VY7ylEyVGr2vkONWvhm0C/m/rOtDmw93eJg/IXhJkq1DE= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=AnvN17V6; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="AnvN17V6" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 76D021F00893; Mon, 1 Jun 2026 20:57:34 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1780347454; bh=PItIEssMRJ+JJVA1vQbWkR5M3rzVYY6eW7QpHjAkMho=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=AnvN17V6P8NIcro+nXbMBOtf/LI4iIS2JJ2olwYGxC6E/YduGlM6Z5c/xgbuKldqr 4a08Z6ohsuHijEXfwMbidI+PcicAdOIgD+Yx3QOt0/wiuAlY8h1XGRQW6cXdCc5GDo leeOjSkad6CKHJ4Z0ZXUVMruYBIs7X8H6KBDfCNZN0440i31zLOYl5fqmZ08dk9tJG 2VvJFIsVFfKB4oCFkAtIgd4/le2MXCbb3RUKYzMbV3IeZdGDQABCI8H0HeSvkc18RH vgIWXkiG8igMT9a+9G3pImLdNYCnQgOzyEUqcnEuwgCNI03gzDA9Q0u4Cx4UeXXF7u H9yjZDa4uyalA== From: sashiko-bot@kernel.org Subject: Re: [PATCH RFC v6 14/18] riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Reply-To: sashiko-reviews@lists.linux.dev To: "Drew Fustini" Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org In-Reply-To: <20260601-ssqosid-cbqri-rqsc-v7-0-v6-14-baf00f50028a@kernel.org> References: <20260601-ssqosid-cbqri-rqsc-v7-0-v6-14-baf00f50028a@kernel.org> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Mon, 01 Jun 2026 20:57:34 +0000 Message-Id: <20260601205734.76D021F00893@smtp.kernel.org> Precedence: bulk X-Mailing-List: devicetree@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Thank you for your contribution! Sashiko AI review found 3 potential issue(= s) to consider: Pre-existing issues: - [High] AB-BA deadlock between rdtgroup_mutex and cbqri_domain_list_lock i= nvolving CPU hotplug and resctrl filesystem reads. - [High] Unprotected list_del() and kfree() causes Use-After-Free on res->c= trl_domains. - [High] Staging registers are read and modified before ensuring the hardwa= re is not busy, leading to data corruption. -- commit 77c2c065054c1b30b97ae2ddd37b439f110e553b Author: Drew Fustini riscv_cbqri: resctrl: Add MB_WGHT bandwidth allocation via Mweight Add bandwidth allocation through Mweight (shared weight for unreserved bandwidth) exposed as the MB_WGHT resource. Mweight is an integer in [0, 255]. A value of 0 disables work-conserving sharing for the group, capping its bandwidth at the MB_MIN reservation. > diff --git a/drivers/resctrl/cbqri_resctrl.c b/drivers/resctrl/cbqri_resc= trl.c > index 1d312004b07d9..14b955eb79491 100644 > --- a/drivers/resctrl/cbqri_resctrl.c > +++ b/drivers/resctrl/cbqri_resctrl.c [ ... ] > @@ -426,6 +428,14 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, = struct rdt_ctrl_domain *d, > val =3D (u32)rbwb; > break; > } > + case RDT_RESOURCE_MB_WGHT: { > + u64 mweight; > + > + err =3D cbqri_read_mweight(ctrl, closid, &mweight); [Severity: High] This isn't a bug introduced by this patch, but the newly added cbqri_read_mweight() called here duplicates a pre-existing flaw. Can this corrupt data by writing to staging registers before ensuring the hardware is not busy? In cbqri_read_mweight(), the driver interacts with the hardware by writing = to a staging register, writing a control register, and waiting for the BUSY flag= to clear. If a previous operation times out (such as the 1ms limit in cbqri_wait_busy_flag()), the driver returns -EIO and releases ctrl->lock, b= ut the hardware might still be processing. The next caller acquires the lock and immediately calls cbqri_set_bc_bw_all= oc() to read-modify-write the staging register before cbqri_bc_alloc_op() verifi= es the BUSY flag is 0. Would modifying the staging register of an in-flight hardware operation cau= se undefined behavior, misconfiguration, or WPRI field corruption? > + if (err =3D=3D 0) > + val =3D (u32)mweight; > + break; > + } [ ... ] > @@ -961,10 +1007,46 @@ static int cbqri_attach_cpu_to_one_bw_res(struct c= bqri_controller *ctrl, > return 0; > } > =20 > +static void cbqri_detach_cpu_from_one_bw_res(struct cbqri_controller *ct= rl, > + enum resctrl_res_level rid, > + unsigned int cpu) > +{ > + struct cbqri_resctrl_res *hw_res =3D &cbqri_resctrl_resources[rid]; > + struct rdt_resource *res =3D &hw_res->resctrl_res; > + struct rdt_ctrl_domain *domain; > + int dom_id =3D ctrl->mem.prox_dom; > + > + lockdep_assert_held(&cbqri_domain_list_lock); > + > + if (!hw_res->ctrl) > + return; > + > + domain =3D cbqri_find_ctrl_domain(&res->ctrl_domains, dom_id); > + if (!domain || !cpumask_test_cpu(cpu, &domain->hdr.cpu_mask)) > + return; > + > + cpumask_clear_cpu(cpu, &domain->hdr.cpu_mask); > + if (cpumask_empty(&domain->hdr.cpu_mask)) { > + resctrl_offline_ctrl_domain(res, domain); [Severity: High] This isn't a bug introduced by this patch, but the newly added cbqri_detach_cpu_from_one_bw_res() replicates a broken lock ordering. Could this lead to an AB-BA deadlock between rdtgroup_mutex and cbqri_domain_list_lock? If a user reading a monitor data file triggers rdtgroup_mondata_show(), it acquires rdtgroup_mutex, then calls mon_event_read() -> resctrl_arch_rmid_read(), which acquires cbqri_domain_list_lock. Concurrently, when a CPU is taken offline, cbqri_resctrl_offline_cpu() acquires cbqri_domain_list_lock, then calls cbqri_detach_cpu_from_all_ctrls= () -> cbqri_detach_cpu_from_one_bw_res(). If the domain becomes empty, it calls resctrl_offline_ctrl_domain(), which attempts to acquire rdtgroup_mutex. Would these two paths race and cause a permanent AB-BA deadlock? > + list_del(&domain->hdr.list); > + kfree(container_of(domain, struct cbqri_resctrl_dom, > + resctrl_ctrl_dom)); [Severity: High] This is a pre-existing issue, but the new cbqri_detach_cpu_from_one_bw_res() replicates it here. Can this result in a Use-After-Free on the res->ctrl_domains list? VFS read paths in fs/resctrl (like rdtgroup_schemata_show()) traverse the res->ctrl_domains list using list_for_each_entry while holding only rdtgroup_mutex. When the CPU mask becomes empty, resctrl_offline_ctrl_domain() is called ab= ove, which takes and releases rdtgroup_mutex. After it returns, list_del() and kfree() are executed without holding rdtgroup_mutex or using RCU primitives. Could a concurrent list traversal under rdtgroup_mutex dereference the freed domain pointer and crash? > + } > +} --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260601-ssqosid-cb= qri-rqsc-v7-0-v6-0-baf00f50028a@kernel.org?part=3D14