Date: Mon, 9 Mar 2026 21:13:26 +0000
From: Pranjal Shrivastava
To: Daniel Mentz
Cc: iommu@lists.linux.dev, Will Deacon, Joerg Roedel, Robin Murphy,
	Jason Gunthorpe, Mostafa Saleh, Nicolin Chen, Ashish Mhetre,
	Sairaj Kodilkar
Subject: Re: [PATCH v5 06/10] iommu/arm-smmu-v3: Add a usage counter for cmdq
Message-ID:
References: <20260126151157.3418145-1-praan@google.com> <20260126151157.3418145-7-praan@google.com>
X-Mailing-List: iommu@lists.linux.dev
In-Reply-To:

On Mon, Mar 09, 2026 at 07:46:24PM +0000, Pranjal Shrivastava wrote:
> On Sun, Mar 08, 2026 at 02:23:03PM -0700, Daniel Mentz wrote:
> > On Mon, Jan 26, 2026 at 7:12 AM Pranjal Shrivastava wrote:
> > >
> > > Introduce a biased counter to track the number of active cmdq owners as
> > > a preparatory step for the runtime PM implementation.
> >
> > Non-owners waiting for their CMD_SYNC to be consumed also appear to
> > increase this counter.
> >
> 
> Ack, I'll clarify that in the commit message too: any owners or
> non-owners waiting on a sync would increment this counter.
> 
> > >
> > > The counter will be used to gate command submission, preventing the
> > > submission of new commands while the device is suspended and deferring
> > > suspend while command submissions are in-flight.
> >
> > What does command submission mean in this context? I understand that
> > commands are still added to the queue (until the queue is full). It's
> > only the producer index that is not advanced.
> >
> 
> Exactly, incrementing the "producer index" is how the command is
> actually submitted to the HW. In case the SMMU is suspended, we'd simply
> add it to the queue but not touch the PROD register in the HW. I can
> clarify this further by saying "preventing the flushing of new commands".
> 
> > >
> > > The counter is biased to a value of 1 during device reset. A cmdq owner
> > > or a thread issuing cmds with a sync increments it before accessing HW
> > > registers and decrements it with release semantics afterwards.
> > >
> > > A value of 1 represents an idle (but active) state. A suspend operation
> > > will set it from 1 -> 0, representing the suspended state.
> > >
> > > Signed-off-by: Pranjal Shrivastava
> > > ---
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 65 +++++++++++++++++----
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 +
> > >  2 files changed, 57 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index 91edcd8922a7..3a8d3c2a8d69 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -805,7 +805,7 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >  	u64 cmd_sync[CMDQ_ENT_DWORDS];
> > >  	u32 prod;
> > >  	unsigned long flags;
> > > -	bool owner;
> > > +	bool owner, has_ref = false;
> > >  	struct arm_smmu_ll_queue llq, head;
> > >  	int ret = 0;
> > >
> > > @@ -819,8 +819,15 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > >  	while (!queue_has_space(&llq, n + sync)) {
> > >  		local_irq_restore(flags);
> > > +
> > > +		if (!atomic_inc_not_zero(&smmu->nr_cmdq_users))
> > > +			/* Device is suspended, don't wait for space */
> > > +			return 0;
> > > +
> >
> > Does this mean that we still accumulate commands until the queue is
> > full while in the suspended state and then have the SMMU execute all of
> > these (now irrelevant) commands when the command queue is re-enabled?
> > One might argue that this is wasteful, but I guess it might be
> > technically correct, because it doesn't hurt either.
> >
> 
> No, we drop any commands that are issued while the SMMU is suspended and
> the command queue doesn't have space. There are two reasons for this:
> 
> 1. The arm_smmu_cmdq_poll_until_not_full() below would try to perform an
> MMIO access (readl_relaxed(cmdq->q.cons_reg)) which shouldn't be allowed
> when the SMMU is suspended.
> 
> 2. We usually have 3 types of commands in the SMMUv3:
> 
> a) Invalidation commands: We clear the entire TLB during resume, so we
> don't really need to batch these.
> 
> b) Prefetch commands: Any prefetches are effectively no-ops while the
> SMMU is suspended, as the resume would effectively read the latest
> config.
> 
> c) Page responses: These are NOT expected to occur while the SMMU is
> suspended and we shouldn't be batching them anyway. (The page resp
> handlers handle such situations and avoid queueing commands anyway
> with this series.)
> 
> Thus, any dropped invalidation / prefetch commands don't harm us if the
> SMMU was suspended.
> 
> > >  		if (arm_smmu_cmdq_poll_until_not_full(smmu, cmdq, &llq))
> > >  			dev_err_ratelimited(smmu->dev, "CMDQ timeout\n");
> > > +
> > > +		atomic_dec_return_release(&smmu->nr_cmdq_users);
> > >  		local_irq_save(flags);
> > >  	}
> > >
> > > @@ -879,10 +886,35 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >  	arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod);
> > >
> > >  	/*
> > > -	 * d. Advance the hardware prod pointer
> > > +	 * d. Advance the hardware prod pointer (if smmu is still active)
> >
> > Consider "(if command queue is still enabled)". Otherwise, what does
> > "active" mean? SMMU and/or command queue enabled?
> >
> 
> The suspend callback is implemented in a way where we can't have a
> "partial" suspend, i.e. there is no code path where the CMDQ is disabled
> but the SMMU is not. Thus, "active" was supposed to mean the SMMU is
> enabled entirely, since there isn't a partial suspend/resume implemented.
> 
> Although, I can update the comment to say (if smmu is still enabled) OR
> (if smmu is NOT suspended).
> 
> > >  	 * Control dependency ordering from the entries becoming valid.
> > >  	 */
> > > -	writel_relaxed(prod, cmdq->q.prod_reg);
> > > +	if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> > > +		writel_relaxed(prod, cmdq->q.prod_reg);
> > > +
> > > +		if (sync) {
> > > +			has_ref = true;
> > > +		} else {
> > > +			/*
> > > +			 * Use release semantics to enforce ordering without a full barrier.
> > > +			 * This ensures the prior writel_relaxed() is ordered/visible
> > > +			 * before the refcount decrement, avoiding the heavy pipeline
> > > +			 * stall of a full wmb().
> > > +			 *
> > > +			 * We need the atomic_dec_return_release() below; the
> > > +			 * atomic_set_release() in step (e) below doesn't suffice.
> > > +			 *
> > > +			 * Specifically, without release semantics on the decrement,
> > > +			 * the CPU is free to reorder a relaxed decrement before
> > > +			 * the writel_relaxed().
> > > +			 *
> > > +			 * If this happens, the refcount could drop to zero, allowing the PM
> > > +			 * suspend path (running on another CPU) to disable the SMMU before
> > > +			 * the register write completes, resulting in a bus fault.
> > > +			 */
> > > +			atomic_dec_return_release(&smmu->nr_cmdq_users);
> > > +		}
> > > +	}
> > >
> > >  	/*
> > >  	 * e. Tell the next owner we're done
> > > @@ -894,14 +926,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
> > >
> > >  	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
> > >  	if (sync) {
> > > -		llq.prod = queue_inc_prod_n(&llq, n);
> > > -		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> > > -		if (ret) {
> > > -			dev_err_ratelimited(smmu->dev,
> > > -					    "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n",
> > > -					    llq.prod,
> > > -					    readl_relaxed(cmdq->q.prod_reg),
> > > -					    readl_relaxed(cmdq->q.cons_reg));
> > > +
> > > +		/* If we are not the owner, check if we're suspended */
> > > +		if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> >
> > Can there be a situation, where we execute this branch and the
> > hardware prod pointer has not been updated, in which case
> > arm_smmu_cmdq_poll_until_sync will wait forever (well, actually, until
> > it times out)? Imagine the following series of events:
> > * nr_cmdq_users == 0
> > * hardware prod pointer update is skipped
> > * nr_cmdq_users becomes 1
> > * arm_smmu_cmdq_poll_until_sync waits, but commands are not executed
> >   by the SMMU because the hardware prod pointer wasn't updated
> > * arm_smmu_device_reset() tries to submit CMDQ_OP_CFGI_ALL, but gets
> >   stuck because the queue is full.
> > * cmdq->q.llq.cons is never updated because the shared lock is held.
> >   queue_has_space() returns false, and
> >   arm_smmu_cmdq_poll_until_not_full() can't get the exclusive lock.
> >
> 
> Thanks for catching this! IIUC, you mean a race between the resume and
> this code block can cause a deadlock? Something like the following:
> 
>      CPU 0                              CPU 1
>   nr_cmdq_users == 0
>   skip hw_prod++
>                                      ->resume cb
>                                      nr_cmdq_users == 1
>   nr_cmdq_users == 1
>   in the if (sync) part
>   arm_smmu_cmdq_poll_until_sync()
>                                      CMDQ_OP_CFGI_ALL stuck for space
> 
> I guess we could add a check here: if has_ref == false but
> nr_cmdq_users > 0, then we could write the prod_reg again.. maybe we'd
> need to factor out a helper function here.. I'll think more about this..
> 
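As an aside, the counter lifecycle we keep going back and forth on can be sketched in plain C11 atomics. This is a hypothetical userspace toy, not the driver code: inc_not_zero() stands in for the kernel's atomic_inc_not_zero(), and suspend here simply fails when a submission is in flight, whereas the real suspend path would wait and retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Biased usage counter: 1 == idle but active, 0 == suspended, >1 == in-flight submitters. */
static atomic_int nr_cmdq_users;

/* Userspace stand-in for the kernel's atomic_inc_not_zero(): increment unless zero. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, compare_exchange reloads 'old' and we retry. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;	/* counter was 0: device suspended, submission gated off */
}

/* Submitter side: take a reference before touching "hardware". */
static bool cmdq_get(void)
{
	return inc_not_zero(&nr_cmdq_users);
}

/* Release semantics so the preceding "MMIO write" is visible before the drop. */
static void cmdq_put(void)
{
	atomic_fetch_sub_explicit(&nr_cmdq_users, 1, memory_order_release);
}

/* Suspend only succeeds from the idle bias value: 1 -> 0. */
static bool cmdq_suspend(void)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&nr_cmdq_users, &expected, 0);
}

/* Resume restores the bias. */
static void cmdq_resume(void)
{
	atomic_store(&nr_cmdq_users, 1);
}
```

Usage: after `atomic_store(&nr_cmdq_users, 1)` at "reset", a successful `cmdq_get()` (1 -> 2) makes `cmdq_suspend()` fail until the matching `cmdq_put()`; once suspended (counter 0), `cmdq_get()` fails until `cmdq_resume()`.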
> I guess the simplest thing would be to bail out much before, something
> like: if (nr_cmdq_users == 0 && sync) return;
> 
> > > +			has_ref = true;
> > > +			llq.prod = queue_inc_prod_n(&llq, n);
> > > +			ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> >
> > If you skip arm_smmu_cmdq_poll_until_sync() in the nr_cmdq_users == 0
> > case, I'm thinking that llq.cons is not updated. If this is the case,
> > is the subsequent WRITE_ONCE(cmdq->q.llq.cons, llq.cons); potentially
> > incorrect?
> >
> 
> The goal here was to "simulate" a successful command submission while
> the smmu is suspended (as mentioned by Will earlier); we do everything
> except for touching MMIO here.
> 
> Since the commands can be one of the three types mentioned above, we
> don't really miss out on anything with the faux-cmd submission here.
> 
> Thus, what if we update llq.prod always and set llq.cons == llq.prod
> conditionally if nr_cmdq_users == 0?
> 
> Ideally, this should also prevent the `queue_full` scenario as all
> commands are faux-issued instantly. Maybe something like:
> 
> 	if (sync) {
> 		llq.prod = queue_inc_prod_n(&llq, n);
> 
> 		if (has_ref || atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
> 			if (!has_ref) {
> 				/*
> 				 * Repair: We were suspended earlier but now we're active.
> 				 * Kick the hardware to publish our commands.
> 				 */
> 				has_ref = true;
> 				... owner logic again ...
> 				...
> 				writel_relaxed(llq.prod, cmdq->q.prod_reg);
> 			}
> 			ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
> 			if (ret) {
> 				// ...
> 			}
> 		} else {
> 			/*
> 			 * Device is suspended. Fast-forward the consumer pointer
> 			 * in memory to "drop" these commands and avoid a full
> 			 * queue deadlock. (n + 1 to include the SYNC command)
> 			 */
> 			llq.cons = queue_inc_prod_n(&llq, n + 1);
> 		}
> 
> 		if (!arm_smmu_cmdq_shared_tryunlock(cmdq)) {
> 			WRITE_ONCE(cmdq->q.llq.cons, llq.cons);
> 			arm_smmu_cmdq_shared_unlock(cmdq);
> 		}
> 	}
> 
> This could still be racy.. I'll give it another thought..
> 
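To make the fast-forward idea concrete, here is a toy model of the "faux consumption" step: when suspended, the producer claims the slots and the consumer index is immediately advanced past them, so the queue can never look full. Everything here is hypothetical (QUEUE_SIZE, free-running indices) and deliberately ignores the real SMMUv3 queue's wrap-bit encoding.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_SIZE 8u	/* hypothetical power-of-two queue depth */

/* Free-running producer/consumer indices, loosely modelled on struct arm_smmu_ll_queue. */
struct llq {
	unsigned int prod;
	unsigned int cons;
};

/* Space check for a power-of-two ring (simplification of the kernel's queue_has_space()). */
static bool queue_has_space(const struct llq *q, unsigned int n)
{
	return (q->prod - q->cons) + n <= QUEUE_SIZE;
}

/* Advance the producer index by n slots (cf. queue_inc_prod_n()). */
static unsigned int queue_inc_prod_n(const struct llq *q, unsigned int n)
{
	return q->prod + n;
}

/*
 * Suspended path: instead of kicking the hardware, claim the n commands
 * plus the trailing CMD_SYNC and fast-forward cons over them, simulating
 * instant consumption so later submitters never deadlock on a full queue.
 */
static void issue_while_suspended(struct llq *q, unsigned int n)
{
	q->prod = queue_inc_prod_n(q, n + 1);	/* n cmds + 1 CMD_SYNC */
	q->cons = q->prod;			/* "drop" them in memory */
}
```

With `prod == 7, cons == 0`, `queue_has_space(&q, 4)` is false; after `issue_while_suspended(&q, 3)` both indices sit at 11 and the space check passes again, which is the queue-full deadlock the snippet above is trying to avoid.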
Still thinking out loud, but maybe we can have something like:

	if (owner) {
		....
		if (atomic_inc_not_zero(&smmu->nr_cmdq_users)) {
			writel_relaxed(llq.prod, cmdq->q.prod_reg);
			has_ref = true;
		} else {
			/* SMMU is suspended, fake a consumption */
			WRITE_ONCE(cmdq->q.llq.cons, prod);
			/* Avoid racing with resume */
			sync = false;
		}
		atomic_set_release(&cmdq->owner_prod, prod);
	}

	if (sync) {
		... same stuff ..
	}

LMK what you think?

[ snip ]

Thanks,
Praan