Date: Fri, 15 May 2026 06:46:58 -0700
From: Drew Fustini
To: sashiko@lists.linux.dev
Cc: robh@kernel.org, conor+dt@kernel.org, devicetree@vger.kernel.org,
	krzk+dt@kernel.org, linux-kernel@vger.kernel.org,
	linux-riscv@lists.infradead.org
Subject: Re: [PATCH RFC v4 07/18] riscv_cbqri: Add capacity controller probe and allocation device ops
References: <20260510-ssqosid-cbqri-rqsc-v7-0-v4-7-eb53831ef683@kernel.org>
	<20260512012635.84FCCC2BCB0@smtp.kernel.org>
In-Reply-To: <20260512012635.84FCCC2BCB0@smtp.kernel.org>

On Tue, May 12, 2026 at 01:26:34AM +0000, sashiko-bot@kernel.org wrote:
[..]
> > +int cbqri_apply_cache_config(struct cbqri_controller *ctrl, u32 closid,
> > +			     const struct cbqri_cc_config *cfg)
> > +{
> > +	bool need_at_mirror;
> > +	u64 saved_cbm = 0;
> > +	int err = 0;
> > +	u64 reg;
> > +
> > +	mutex_lock(&ctrl->lock);
> > +
> > +	need_at_mirror = ctrl->cc.supports_alloc_at_code && !cfg->cdp_enabled;
> > +
> > +	/*
> > +	 * Capture the cfg->at half CBM before any write so a partial
> > +	 * AT-mirror failure can revert and keep the two halves consistent.
> > +	 */
> > +	if (need_at_mirror) {
> > +		err = cbqri_cc_alloc_op(ctrl, CBQRI_CC_ALLOC_CTL_OP_READ_LIMIT,
> > +					closid, cfg->at);
> > +		if (err < 0)
> > +			goto out;
> > +		saved_cbm = ioread64(ctrl->base + CBQRI_CC_BLOCK_MASK_OFF);
> > +	}
>
> To mitigate silent firmware no-ops where the hardware returns success
> without updating the staging register, cbqri_read_cache_config()
> pre-clears the register.
>
> If a silent no-op occurs here during the READ_LIMIT operation, could
> saved_cbm capture stale MMIO data? If a revert is later triggered, does
> this risk writing garbage data back and corrupting the controller's
> capacity allocation state?

I will change it to pre-clear cc_block_mask before the READ_LIMIT so a
silent firmware no-op surfaces as zero rather than stale data.

> > +	/* Get the operation status */
> > +	*status = FIELD_GET(CBQRI_CONTROL_REGISTERS_STATUS_MASK, reg);
> > +
> > +	/*
> > +	 * Check for the AT support if the register is implemented
> > +	 * (if not, the status value will remain 0)
> > +	 */
> > +	if (*status != 0) {
> > +		/*
> > +		 * Re-issue operation with AT=CODE so the controller
> > +		 * latches AT=CODE on supported hardware (or resets it to 0
> > +		 * on hardware that doesn't). OP must be a defined CBQRI op
> > +		 * here. OP=0 is a no-op and would silently disable CDP.
> > +		 */
> > +		reg = saved_reg;
> > +		FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_OP_MASK, &reg, operation);
> > +		FIELD_MODIFY(CBQRI_CONTROL_REGISTERS_AT_MASK, &reg,
> > +			     CBQRI_CONTROL_REGISTERS_AT_CODE);
> > +		iowrite64(reg, ctrl->base + reg_offset);
>
> When issuing the second probe to verify AT support, the code restores
> the raw boot-time register state via reg = saved_reg.
>
> Does this execute the command against whatever uninitialized RCID value
> the firmware originally left in the register, since the RCID mask is
> not cleared again? If the RCID is out of bounds, could the hardware
> reject the operation, causing the driver to falsely report that
> supports_alloc_at_code is false?

I will change cbqri_probe_feature() to build each write from saved_reg
masked clean of OP, AT, RCID, and EVT_ID before ORing in the intended
fields.

> > +static int cbqri_probe_cc(struct cbqri_controller *ctrl)
> > +{

[ ... ]

> > +	/*
> > +	 * Resolve cache_size via cacheinfo. cpus_read_lock satisfies
> > +	 * lockdep_assert_cpus_held() inside get_cpu_cacheinfo_level(). If
> > +	 * every cpu_mask member is offline, cache_size stays 0 and the
> > +	 * controller cannot back occupancy monitoring.
> > +	 */
> > +	cpus_read_lock();
> > +	if (!ctrl->cache.cache_size) {
> > +		int cpu = cpumask_first_and(&ctrl->cache.cpu_mask,
> > +					    cpu_online_mask);
> > +
> > +		if (cpu < nr_cpu_ids) {
> > +			struct cacheinfo *ci;
> > +
> > +			ci = get_cpu_cacheinfo_level(cpu, ctrl->cache.cache_level);
> > +			if (ci)
> > +				ctrl->cache.cache_size = ci->size;
> > +		}
> > +	}
> > +	cpus_read_unlock();
>
> A comment in riscv_cbqri_register_controller() states that cache_size
> is filled lazily during probe.
>
> Since cbqri_probe_cc() is invoked synchronously during early boot
> registration, wouldn't CPUs in other clusters still be offline?
> Because there is no lazy re-evaluation mechanism or CPU hotplug hook
> later on, does this permanently leave ctrl->cache.cache_size at 0 and
> break occupancy monitoring calculations for those controllers?

ACPI parsing runs at subsys_initcall which is after smp_init() brings up
the secondary CPUs, so every cpu in cpu_mask is online by the time
cbqri_probe_cc() runs.

> > +void cbqri_controller_destroy(struct cbqri_controller *ctrl)
> > +{
> > +	kfree(ctrl);
> > +}
>
> When riscv_cbqri_unregister_last() is used to roll back a successful
> registration, it calls this function to free the controller.
>
> Does this leak the I/O mappings and memory regions, since it fails to
> call iounmap(ctrl->base) and release_mem_region() for the resources
> successfully allocated during cbqri_probe_controller()?

I will fix this to mirror the probe error paths. They already clean up
and clear ctrl->base before returning, so destroying a probe-failed
controller does not double free or unmap an already-released region.

> > +int riscv_cbqri_register_controller(const struct cbqri_controller_info *info)
> > +{
> > +	struct cbqri_controller *ctrl;
> > +	int err;
> > +
> > +	if (!info->addr) {
> > +		pr_warn("skipping controller with invalid addr=0x0\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	ctrl = kzalloc(sizeof(*ctrl), GFP_KERNEL);
> > +	if (!ctrl)
> > +		return -ENOMEM;
> > +
> > +	mutex_init(&ctrl->lock);
> > +
> > +	ctrl->addr = info->addr;
> > +	ctrl->size = info->size;
> > +	ctrl->type = info->type;
> > +	ctrl->rcid_count = info->rcid_count;
> > +	ctrl->mcid_count = info->mcid_count;
>
> The cbqri header explicitly defines CBQRI_MAX_RCID and CBQRI_MAX_MCID
> as 1024, alongside documentation warning that a malformed firmware
> table claiming a large number of IDs could trip the soft-lockup
> watchdog during per-id MMIO init loops.
>
> Is it safe to blindly assign these counts without validating them
> against the safety caps?
> Could a malformed ACPI or DT table bypass this protection and trigger
> a soft lockup during boot?

CBQRI_MAX_RCID and CBQRI_MAX_MCID are enforced in acpi_parse_rqsc()
which is the only firmware discovery path that exists today. Any value
that reaches riscv_cbqri_register_controller() has already passed that.

-Drew