From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 14D58336EC5; Wed, 28 Jan 2026 18:44:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769625895; cv=none; b=k4AUA3f8+J3ITKBHEB0NJohFwzDAaMuEHPS8VgZhTfBhKtAP66pVuBUI0oNSegL3eB8SYoonGSfJ821sVvHnbbbYdFRkVyr6ZB1zgqm1i0tkuV6ZwCKMKF4zrABjFb5RLRvqsMONx0G0CzK3GxXWhzFgEZ+AHmO2XA2HmHGRYZo= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769625895; c=relaxed/simple; bh=YWYdf9VXH4blRVtgKNmi/VpABd2r97IKD4NwZPdQJw8=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=eIhMIj4/kOUrt0TzW5MZhLFJL/ROdwkBrzWtRwg+yhPy3ZiMGLFTag52Eok/sFkz2VKisv7DusGVdsF0l+knsBXmewirTmyp+/StOBN7nOI+V1ecWMT1i6iwXkfEd6pirxKD6xzJz4n72sTbq9M72WSXEexJL5bXpgmAAtsYLAU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id C644B1516; Wed, 28 Jan 2026 10:44:46 -0800 (PST) Received: from [10.1.196.85] (e121345-lin.cambridge.arm.com [10.1.196.85]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 12D3B3F5CA; Wed, 28 Jan 2026 10:44:47 -0800 (PST) Message-ID: Date: Wed, 28 Jan 2026 18:44:35 +0000 Precedence: bulk X-Mailing-List: linux-arm-msm@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] iommu/arm-smmu: Use pm_runtime in fault handlers To: Prakash Gupta , Will Deacon , Joerg Roedel , Rob Clark , Connor Abbott Cc: linux-arm-msm@vger.kernel.org, linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Akhil P Oommen , Pratyush Brahma , Pranjal Shrivastava References: <20260127-smmu-rpm-v1-1-2ef2f4c85305@oss.qualcomm.com> <9fb8f661-a235-4f86-bc10-f80a21a8fa9d@arm.com> <00fcbcf2-3f48-410d-88a3-88dc834c1ed7@oss.qualcomm.com> From: Robin Murphy Content-Language: en-GB In-Reply-To: <00fcbcf2-3f48-410d-88a3-88dc834c1ed7@oss.qualcomm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit [ +Pranjal as this might matter for v3 too... ] On 28/01/2026 5:56 am, Prakash Gupta wrote: > > On 1/27/2026 9:35 PM, Robin Murphy wrote: >> On 2026-01-27 12:11 pm, Prakash Gupta wrote: >>> Commit d4a44f0750bb ("iommu/arm-smmu: Invoke pm_runtime across the >>> driver") >>> enabled pm_runtime for the arm-smmu device. On systems where the SMMU >>> sits in a power domain, all register accesses must be done while the >>> device is runtime-resumed to avoid unclocked register reads and >>> potential NoC errors. >>> >>> So far, this has not been an issue for most SMMU clients because >>> stall-on-fault is enabled by default. While a translation fault is >>> being handled, the SMMU stalls further translations for that context >>> bank, so the fault handler would not race with a powered-down >>> SMMU. >>> >>> Adreno SMMU now disables stall-on-fault in the presence of fault >>> storms to avoid saturating SMMU resources and hanging the GMU. With >>> stall-on-fault disabled, the SMMU can generate faults while its power >>> domain may no longer be enabled, which makes unclocked accesses to >>> fault-status registers in the SMMU fault handlers possible. >> >> At face value, that sounds wrong - how does an SMMU generate a fault, >> or indeed do anything, when it's powered off? In principle it's >> possible that the SMMU might signal an interrupt, and is _then_ >> suspended (with the interrupt line somehow remaining asserted, so >> probably more clock-gated than completely powered down) before the >> interrupt hander runs, but we rather assume that we're not going to >> have an unhandled hardware IRQ hanging around for longer than the >> autosuspend delay. >> >> So, judging by the diff below, I guess what you really mean is that in >> the case of a threaded context IRQ handler, it can take long enough >> between handling the hardware IRQ and the thread actually running that >> the SMMU may have suspended in between? > > You are correct that the SMMU cannot generate a fault while powered > down. A more accurate description of the race condition is as follows: > When stall-on-fault is disabled, the faulting transaction does is > terminated. This allows the master (the GPU) to complete its work, drop > its power vote for the SMMU, and allow the SMMU to suspend. However, the > SMMU fault handler may still be waiting to execute on the CPU. > If the SMMU suspends before the handler reads the fault registers, an > unclocked access occurs. This scenario is significantly more likely when > using threaded IRQs due to the scheduling latency involved. I will > update the next iteration to reflect this. > >> >>> Guard the context and global fault handlers with arm_smmu_rpm_get() / >>> arm_smmu_rpm_put() so that all SMMU fault register accesses are done >>> with the SMMU powered. >>> >>> Fixes: b13044092c1e ("drm/msm: Temporarily disable stall-on-fault >>> after a page fault") >>> Co-developed-by: Pratyush Brahma >>> Signed-off-by: Pratyush Brahma >>> Signed-off-by: Prakash Gupta >>> --- >>>   drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  5 ++- >>>   drivers/iommu/arm/arm-smmu/arm-smmu.c      | 53 >>> ++++++++++++++++++++++-------- >>>   2 files changed, 43 insertions(+), 15 deletions(-) >>> >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> index 573085349df3..2d03df72612d 100644 >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c >>> @@ -317,6 +317,7 @@ static int qcom_adreno_smmu_init_context(struct >>> arm_smmu_domain *smmu_domain, >>>       struct arm_smmu_device *smmu = smmu_domain->smmu; >>>       struct qcom_smmu *qsmmu = to_qcom_smmu(smmu); >>>       const struct of_device_id *client_match; >>> +    const struct arm_smmu_impl *impl = qsmmu->data->impl; >>>       int cbndx = smmu_domain->cfg.cbndx; >>>       struct adreno_smmu_priv *priv; >>>   @@ -350,10 +351,12 @@ static int >>> qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain, >>>       priv->get_ttbr1_cfg = qcom_adreno_smmu_get_ttbr1_cfg; >>>       priv->set_ttbr0_cfg = qcom_adreno_smmu_set_ttbr0_cfg; >>>       priv->get_fault_info = qcom_adreno_smmu_get_fault_info; >>> -    priv->set_stall = qcom_adreno_smmu_set_stall; >>>       priv->set_prr_bit = NULL; >>>       priv->set_prr_addr = NULL; >>>   +    if (impl->context_fault_needs_threaded_irq) >>> +        priv->set_stall = qcom_adreno_smmu_set_stall; >>> + >>>       if (of_device_is_compatible(np, "qcom,smmu-500") && >>>           !of_device_is_compatible(np, "qcom,sm8250-smmu-500") && >>>           of_device_is_compatible(np, "qcom,adreno-smmu")) { >>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> b/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> index 5e690cf85ec9..183f12e45b02 100644 >>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c >>> @@ -462,10 +462,23 @@ static irqreturn_t arm_smmu_context_fault(int >>> irq, void *dev) >>>       int idx = smmu_domain->cfg.cbndx; >>>       int ret; >>>   +    if (smmu->impl && smmu->impl->context_fault_needs_threaded_irq) { >> >> Why is this conditional on being threaded, if the global fault handler >> that can never be threaded at all apparently needs it unconditionally? > Synchronous runtime PM calls can sleep, which would cause issue if > called within a hard IRQ context. This is why I added the conditional > check for threaded IRQs. > Furthermore, this change only allow the driver to override the > stall-on-fault setting when context_fault_needs_threaded_irq is true. > Since the unclocked access issue is tied to disabling stall-on-fault, > the fix is only logically required for the threaded IRQ path. > For the Global Fault handler, which runs in a hard IRQ context, you are > right—we cannot safely vote for power there. I will remove the runtime > PM call from that section. Hmm, but then how *do* we actually guarantee that autosuspend doesn't happen to kick in and power down the SMMU just as a hardirq handler runs, when there's some unexpected event? I fear there's a horrible can of worms here... Thanks, Robin.