From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <6ea250dd-ab9f-4abd-8fa8-77560e5dc6ab@arm.com>
Date: Tue, 17 Mar 2026 13:38:09 +0000
Subject: Re: [PATCH] iommu/arm-smmu-v3: Allocate cmdq_batch on the heap
To: Pranjal Shrivastava, Cheng-Yang Chou
Cc: will@kernel.org, linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev, jserv@ccns.ncku.edu.tw
References: <20260311094444.3714302-1-yphbchou0911@gmail.com>
From: Robin Murphy
In-Reply-To: <unspecified>

On 2026-03-11 2:22 pm, Pranjal Shrivastava wrote:
> On Wed, Mar 11, 2026 at 05:44:44PM +0800, Cheng-Yang Chou wrote:
>> The arm_smmu_cmdq_batch structure is large and was being allocated on
>> the stack in four call sites, causing stack frame sizes to exceed the
>> 1024-byte limit:
>>
>> - arm_smmu_atc_inv_domain: 1120 bytes
>> - arm_smmu_atc_inv_master: 1088 bytes
>> - arm_smmu_sync_cd: 1088 bytes
>> - __arm_smmu_tlb_inv_range: 1072 bytes
>>
>> Move these allocations to the heap using kmalloc_obj() and kfree() to
>> eliminate the -Wframe-larger-than=1024 warnings and prevent potential
>> stack overflows.

Pro tip: you can also eliminate the warning by setting CONFIG_FRAME_WARN
to a larger number, or to 0. The default is 2048, so you're already only
getting a warning because you've gone out of your way to ask for a warning.
The smaller the number you choose, the more warnings you'll get, but does
that alone justify "fixing" them? It's certainly plausible that we could
get to issuing invalidation commands at the bottom of a relatively long
callchain through subsystem/client driver/DMA API code, but have you
observed a stack overflow in practice? It's not like these functions are
re-entrant, or calling out to unknown external code, so while ~1KB is
admittedly reasonably big, we can at least reason that there's never
going to be much more *beyond* that (basically just whatever
arm_smmu_cmdq_issue_cmdlist() uses).

> Thanks for the patch. I agree that we should address these warnings, but
> moving these allocations to the heap via kmalloc_obj() in the fast path
> is problematic. Introducing heap allocation adds unnecessary latency and
> potential for allocation failure in hot paths.
>
> So, yes, we are using a lot of stack but we're using it to do good
> things..
>
> IMO, if we really want to address these, instead of kmalloc, we could
> potentially consider some pre-allocated per-CPU buffers (that's a lot of
> additional book-keeping though) to keep the data off the stack, or
> something similar following a simple rule: the fast path must be
> deterministic - no SLAB allocations and no introducing new failure
> points.
>
> The last thing we'd want is a graphics driver's shrinker calling
> dma-unmaps when the system is already under heavy memory pressure, and
> calling kmalloc leading to a circular dependency or allocation failure
> exactly when the system needs to perform the unmap the most.

ISTR it's worse than that, and in fact we must not even attempt to
allocate in a reclaim path at all, or it risks deadlock. So since the
SMMU driver cannot realistically know the context of *why* it's being
asked to unmap/invalidate something, I'm not sure it can ever be assumed
to be safe.

Thanks,
Robin.
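For what it's worth, a minimal sketch of the per-CPU idea above (all
names here are hypothetical, and it deliberately ignores the
book-keeping Praan mentions - nesting, or callers that can't tolerate
preemption being disabled across the whole batch):

```c
/* Hypothetical per-CPU pre-allocation: one arm_smmu_cmdq_batch per CPU,
 * set up once at module init, so the invalidation fast path never calls
 * into SLAB and never gains a new failure point. */
static DEFINE_PER_CPU(struct arm_smmu_cmdq_batch, smmu_cmdq_batch);

static struct arm_smmu_cmdq_batch *arm_smmu_get_batch(void)
{
	/* get_cpu_ptr() disables preemption, so this CPU's buffer stays
	 * ours until arm_smmu_put_batch(); re-entrant use (e.g. an IRQ
	 * handler issuing invalidations) would still need its own
	 * handling - that's the extra book-keeping cost. */
	return get_cpu_ptr(&smmu_cmdq_batch);
}

static void arm_smmu_put_batch(void)
{
	put_cpu_ptr(&smmu_cmdq_batch);
}
```

Callers would then swap `struct arm_smmu_cmdq_batch cmds;` for a
get/put pair around batch_init/add/submit. Since nothing is allocated
at invalidation time, the reclaim-path concern does not arise for this
scheme, at the price of per-CPU memory that sits idle when unused.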
>
> Thanks,
> Praan
>
>> Signed-off-by: Cheng-Yang Chou
>> ---
>>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 66 +++++++++++++++------
>>  1 file changed, 48 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 4d00d796f078..734546dc6a78 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -1281,7 +1281,7 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
>>  			     int ssid, bool leaf)
>>  {
>>  	size_t i;
>> -	struct arm_smmu_cmdq_batch cmds;
>> +	struct arm_smmu_cmdq_batch *cmds;
>>  	struct arm_smmu_device *smmu = master->smmu;
>>  	struct arm_smmu_cmdq_ent cmd = {
>>  		.opcode	= CMDQ_OP_CFGI_CD,
>> @@ -1291,13 +1291,23 @@ static void arm_smmu_sync_cd(struct arm_smmu_master *master,
>>  		},
>>  	};
>>
>> -	arm_smmu_cmdq_batch_init(smmu, &cmds, &cmd);
>> +	cmds = kmalloc_obj(*cmds);
>> +	if (!cmds) {
>> +		struct arm_smmu_cmdq_ent cmd_all = { .opcode = CMDQ_OP_CFGI_ALL };
>> +
>> +		WARN_ONCE(1, "arm-smmu-v3: failed to allocate cmdq_batch, falling back to full CD invalidation\n");
>> +		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd_all);
>> +		return;
>> +	}
>> +
>> +	arm_smmu_cmdq_batch_init(smmu, cmds, &cmd);
>>  	for (i = 0; i < master->num_streams; i++) {
>>  		cmd.cfgi.sid = master->streams[i].id;
>> -		arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd);
>> +		arm_smmu_cmdq_batch_add(smmu, cmds, &cmd);
>>  	}
>>
>> -	arm_smmu_cmdq_batch_submit(smmu, &cmds);
>> +	arm_smmu_cmdq_batch_submit(smmu, cmds);
>> +	kfree(cmds);
>>  }
>>
>>  static void arm_smmu_write_cd_l1_desc(struct arm_smmu_cdtab_l1 *dst,
>> @@ -2225,31 +2235,37 @@ arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
>>  static int arm_smmu_atc_inv_master(struct arm_smmu_master *master,
>>  				   ioasid_t ssid)
>>  {
>> -	int i;
>> +	int i, ret;
>>  	struct arm_smmu_cmdq_ent cmd;
>> -	struct arm_smmu_cmdq_batch cmds;
>> +	struct arm_smmu_cmdq_batch *cmds;
>>
>>  	arm_smmu_atc_inv_to_cmd(ssid, 0, 0, &cmd);
>>
>> -	arm_smmu_cmdq_batch_init(master->smmu, &cmds, &cmd);
>> +	cmds = kmalloc_obj(*cmds);
>> +	if (!cmds)
>> +		return -ENOMEM;
>> +
>> +	arm_smmu_cmdq_batch_init(master->smmu, cmds, &cmd);
>>  	for (i = 0; i < master->num_streams; i++) {
>>  		cmd.atc.sid = master->streams[i].id;
>> -		arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd);
>> +		arm_smmu_cmdq_batch_add(master->smmu, cmds, &cmd);
>>  	}
>>
>> -	return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
>> +	ret = arm_smmu_cmdq_batch_submit(master->smmu, cmds);
>> +	kfree(cmds);
>> +	return ret;
>>  }
>>
>>  int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
>>  			    unsigned long iova, size_t size)
>>  {
>>  	struct arm_smmu_master_domain *master_domain;
>> -	int i;
>> +	int i, ret;
>>  	unsigned long flags;
>>  	struct arm_smmu_cmdq_ent cmd = {
>>  		.opcode = CMDQ_OP_ATC_INV,
>>  	};
>> -	struct arm_smmu_cmdq_batch cmds;
>> +	struct arm_smmu_cmdq_batch *cmds;
>>
>>  	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
>>  		return 0;
>> @@ -2271,7 +2287,11 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
>>  	if (!atomic_read(&smmu_domain->nr_ats_masters))
>>  		return 0;
>>
>> -	arm_smmu_cmdq_batch_init(smmu_domain->smmu, &cmds, &cmd);
>> +	cmds = kmalloc_obj(*cmds);
>> +	if (!cmds)
>> +		return -ENOMEM;
>> +
>> +	arm_smmu_cmdq_batch_init(smmu_domain->smmu, cmds, &cmd);
>>
>>  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>>  	list_for_each_entry(master_domain, &smmu_domain->devices,
>> @@ -2294,12 +2314,14 @@ int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
>>
>>  		for (i = 0; i < master->num_streams; i++) {
>>  			cmd.atc.sid = master->streams[i].id;
>> -			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
>> +			arm_smmu_cmdq_batch_add(smmu_domain->smmu, cmds, &cmd);
>>  		}
>>  	}
>>  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>>
>> -	return arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds);
>> +	ret = arm_smmu_cmdq_batch_submit(smmu_domain->smmu, cmds);
>> +	kfree(cmds);
>> +	return ret;
>>  }
>>
>>  /* IO_PGTABLE API */
>> @@ -2334,7 +2356,7 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>>  	unsigned long end = iova + size, num_pages = 0, tg = 0;
>>  	size_t inv_range = granule;
>> -	struct arm_smmu_cmdq_batch cmds;
>> +	struct arm_smmu_cmdq_batch *cmds;
>>
>>  	if (!size)
>>  		return;
>> @@ -2362,7 +2384,14 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>  		num_pages++;
>>  	}
>>
>> -	arm_smmu_cmdq_batch_init(smmu, &cmds, cmd);
>> +	cmds = kmalloc_obj(*cmds);
>> +	if (!cmds) {
>> +		WARN_ONCE(1, "arm-smmu-v3: failed to allocate cmdq_batch, falling back to full TLB invalidation\n");
>> +		arm_smmu_tlb_inv_context(smmu_domain);
>> +		return;
>> +	}
>> +
>> +	arm_smmu_cmdq_batch_init(smmu, cmds, cmd);
>>
>>  	while (iova < end) {
>>  		if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) {
>> @@ -2391,10 +2420,11 @@ static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
>>  		}
>>
>>  		cmd->tlbi.addr = iova;
>> -		arm_smmu_cmdq_batch_add(smmu, &cmds, cmd);
>> +		arm_smmu_cmdq_batch_add(smmu, cmds, cmd);
>>  		iova += inv_range;
>>  	}
>> -	arm_smmu_cmdq_batch_submit(smmu, &cmds);
>> +	arm_smmu_cmdq_batch_submit(smmu, cmds);
>> +	kfree(cmds);
>>  }
>>
>>  static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
>> --
>> 2.48.1
>>
>>