public inbox for linux-kernel@vger.kernel.org
From: Samiullah Khawaja <skhawaja@google.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
	bhelgaas@google.com, jgg@nvidia.com, rafael@kernel.org,
	lenb@kernel.org, praan@google.com, baolu.lu@linux.intel.com,
	xueshuai@linux.alibaba.com, kevin.tian@intel.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, vsethi@nvidia.com
Subject: Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
Date: Thu, 19 Mar 2026 00:08:04 +0000
Message-ID: <abs6PudCAh-eVRPA@google.com>
In-Reply-To: <abs0CQCrFlCsV6Ls@Asurada-Nvidia>

On Wed, Mar 18, 2026 at 04:23:53PM -0700, Nicolin Chen wrote:
>Hi Sami,
>
>On Wed, Mar 18, 2026 at 10:02:32PM +0000, Samiullah Khawaja wrote:
>> On Tue, Mar 17, 2026 at 12:15:37PM -0700, Nicolin Chen wrote:
>> > @@ -895,9 +898,19 @@ int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu,
>> >
>> > 	/* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */
>> > 	if (sync) {
>> > +		u32 sync_prod;
>> > +
>> > 		llq.prod = queue_inc_prod_n(&llq, n);
>> > +		sync_prod = llq.prod;
>> > +
>> > 		ret = arm_smmu_cmdq_poll_until_sync(smmu, cmdq, &llq);
>> > -		if (ret) {
>> > +		if (test_and_clear_bit(Q_IDX(&llq, sync_prod),
>> > +				       cmdq->atc_sync_timeouts)) {
>>
>> This will not be set if a software timeout (1 second) occurs. Do you
>> know whether the ATC timeout of the Arm SMMUv3 is shorter than the
>> software timeout in the driver?
>
>You brought up a good point!
>
>I think the ATC timeout follows the PCIe Completion Timeout Value in
>the Device Control 2 register, which typically defaults to 50us to 50ms
>but can be set as high as 17s to 64s according to the PCIe Base spec.

Agreed.
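To make the comparison concrete, here is a small userspace C sketch. It decodes the 4-bit Completion Timeout Value field of Device Control 2 into the upper bound of its range and checks that bound against a 1 s poll budget. The encodings follow the PCIe Base spec table. SW_POLL_TIMEOUT_US and both helper names are hypothetical. The sketch also assumes, as suggested above, that the ATC invalidation timeout tracks the CTO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 1 s poll budget, per the "software timeout (1 second)" above. */
#define SW_POLL_TIMEOUT_US 1000000ULL

/*
 * Upper bound, in microseconds, of the range selected by the Completion
 * Timeout Value field (Device Control 2, bits 3:0). Returns 0 for
 * reserved encodings.
 */
static uint64_t cto_upper_bound_us(uint8_t field)
{
	switch (field & 0xf) {
	case 0x0: return 50000;     /* default: 50us to 50ms */
	case 0x1: return 100;       /* Range A: 50us to 100us */
	case 0x2: return 10000;     /* Range A: 1ms to 10ms */
	case 0x5: return 55000;     /* Range B: 16ms to 55ms */
	case 0x6: return 210000;    /* Range B: 65ms to 210ms */
	case 0x9: return 900000;    /* Range C: 260ms to 900ms */
	case 0xa: return 3500000;   /* Range C: 1s to 3.5s */
	case 0xd: return 13000000;  /* Range D: 4s to 13s */
	case 0xe: return 64000000;  /* Range D: 17s to 64s */
	default:  return 0;         /* reserved encoding */
	}
}

/* True if the device may still owe a completion after the poll gives up. */
static bool cto_may_exceed_sw_timeout(uint8_t field)
{
	return cto_upper_bound_us(field) > SW_POLL_TIMEOUT_US;
}
```

With the default encoding, the hardware should report a timeout well before the driver gives up. From 1010b (1s to 3.5s) upward, the software poll can expire first, which is the case Nicolin describes next.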
>
>> If not, maybe we can handle the software timeout here too, since the
>> cmdlist is already known?
>
>I think it's trickier.
>
>If the software times out first at 1s, it means the CMDQ is still
>waiting for the ATC invalidation to complete. The caller then sees
>-ETIMEDOUT and tries to bisect the ATC batch or update the STE
>directly, both of which go through the CMDQ. But the CMDQ has not
>recovered yet.
>
>Then, in the batch case, all the retries could time out again, so
>it would fail to identify which device is truly broken. This would
>end badly by blindly disabling every device in the batch. Also,
>the disabling calls need the CMDQ too, so they might fail as well.

Yes. Looking at VT-d, its invalidation queue length is 256, so this
spirals out of control quickly.
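Here is a rough worst-case model of why this spirals. It assumes the caller splits a timed-out batch in half and retries each half. It also assumes every retry hits the software timeout again, because the CMDQ is still stuck. Under those assumptions the recursion only stops at single-entry batches, so a batch of n entries costs 2n - 1 syncs. The function names and the all-timeouts assumption are hypothetical. This is a back-of-the-envelope sketch, not driver code.

```c
#include <stdint.h>

/*
 * CMD_SYNCs needed to bisect a batch of n invalidations down to single
 * entries, assuming every sync (including single-entry ones) times out.
 */
static uint64_t worst_case_syncs(uint64_t n)
{
	if (n == 0)
		return 0;  /* nothing to issue */
	if (n == 1)
		return 1;  /* one sync for the lone entry */
	/* One sync for the whole batch, then recurse into both halves. */
	return 1 + worst_case_syncs(n / 2) + worst_case_syncs(n - n / 2);
}

/* Total wall-clock cost if each timed-out sync burns timeout_ms. */
static uint64_t worst_case_ms(uint64_t n, uint64_t timeout_ms)
{
	return worst_case_syncs(n) * timeout_ms;
}
```

For a 256-entry batch this comes to 511 syncs, or roughly 511 s at 1 s per timeout. That cost accrues before any of the disabling calls, which also need the queue, are attempted.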
>
>Thus, to partially answer the question: in case of a software
>timeout, I am afraid there is hardly anything we can do. :-/

Agreed.

Do you think we could document this somewhere, maybe in the cover
letter?
>
>This means I need to use a different return code for ATC timeouts
>vs. software timeouts.
>
>Also, there is another problem: when the PCI CTO finally expires, the
>GERROR ISR will set atc_sync_timeouts, but nobody will clear it.
>So, before calling arm_smmu_cmdq_issue_cmdlist(), we also need to
>make sure there is no stale bit left in the bitmap.

Yes. Just to confirm, do you think this needs to be handled regardless
of whether we handle the software timeout for the ATC invalidation?
Basically, to clean up the stale bit in the bitmap.
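Below is a minimal userspace model of the lockless per-slot bitmap and the stale-bit cleanup discussed above. It uses C11 atomics in place of the kernel's set_bit()/test_and_clear_bit(). The following are all assumptions for illustration, not what the patch does:

- the queue depth;
- slot_of(), a simplified stand-in for Q_IDX() that ignores the wrap bit;
- clear_stale();
- the -EIO vs -ETIMEDOUT split.

```c
#include <errno.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical queue depth; must be a power of two for slot_of(). */
#define QUEUE_ENTRIES 256u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((QUEUE_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per queue slot: "the CMD_SYNC in this slot saw an ATC timeout". */
static _Atomic unsigned long sync_timeouts[BITMAP_WORDS];

/* Simplified stand-in for Q_IDX(): drop the wrap bit, keep the slot index. */
static unsigned int slot_of(uint32_t prod)
{
	return prod & (QUEUE_ENTRIES - 1u);
}

/* Models set_bit() from the error interrupt handler. */
static void mark_timeout(unsigned int slot)
{
	atomic_fetch_or(&sync_timeouts[slot / BITS_PER_WORD],
			1UL << (slot % BITS_PER_WORD));
}

/* Models test_and_clear_bit(): clears the bit and returns its old value. */
static bool test_and_clear_timeout(unsigned int slot)
{
	unsigned long mask = 1UL << (slot % BITS_PER_WORD);

	return atomic_fetch_and(&sync_timeouts[slot / BITS_PER_WORD], ~mask) & mask;
}

/* Hypothetical pre-issue step: drop a bit that a late handler left behind. */
static void clear_stale(unsigned int slot)
{
	(void)test_and_clear_timeout(slot);
}

/* Hypothetical completion step that tells the two timeout kinds apart. */
static int finish_sync(unsigned int slot, bool sw_poll_timed_out)
{
	if (test_and_clear_timeout(slot))
		return -EIO;        /* hardware reported an ATC invalidation timeout */
	if (sw_poll_timed_out)
		return -ETIMEDOUT;  /* the driver's poll gave up first */
	return 0;
}
```

In this model, clearing before issue only narrows the window. A handler that fires after clear_stale() would still be attributed to the new command occupying that slot.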
>
>Thanks!
>Nicolin


Thread overview: 47+ messages
2026-03-17 19:15 [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 1/7] iommu: Do not call pci_dev_reset_iommu_done() unless reset succeeds Nicolin Chen
2026-03-18  7:21   ` Tian, Kevin
2026-03-18 20:16     ` Nicolin Chen
2026-03-18  8:02   ` Shuai Xue
2026-03-18 20:27     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 2/7] iommu: Add reset_device_done callback for hardware fault recovery Nicolin Chen
2026-03-18  5:59   ` Baolu Lu
2026-03-18 18:42     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 3/7] iommu: Add iommu_report_device_broken() to quarantine a broken device Nicolin Chen
2026-03-18  6:13   ` Baolu Lu
2026-03-19  1:31     ` Nicolin Chen
2026-03-18  7:31   ` Tian, Kevin
2026-03-19  1:30     ` Nicolin Chen
2026-03-19  2:35       ` Tian, Kevin
2026-03-19  3:13         ` Nicolin Chen
2026-03-18 11:45   ` Shuai Xue
2026-03-18 20:29     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-03-18  7:36   ` Tian, Kevin
2026-03-18 19:26     ` Nicolin Chen
2026-03-18 22:06       ` Samiullah Khawaja
2026-03-19  3:08         ` Tian, Kevin
2026-03-19  3:12           ` Nicolin Chen
2026-03-23 23:51             ` Jason Gunthorpe
2026-03-18 22:02   ` Samiullah Khawaja
2026-03-18 23:23     ` Nicolin Chen
2026-03-19  0:08       ` Samiullah Khawaja [this message]
2026-03-19  1:15         ` Nicolin Chen
2026-03-23 23:57       ` Jason Gunthorpe
2026-03-24  1:21         ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 5/7] iommu/arm-smmu-v3: Replace smmu with master in arm_smmu_inv Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 6/7] iommu/arm-smmu-v3: Introduce master->ats_broken flag Nicolin Chen
2026-03-18  7:39   ` Tian, Kevin
2026-03-18 20:00     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 7/7] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout Nicolin Chen
2026-03-19  2:56   ` Shuai Xue
2026-03-19  3:26     ` Nicolin Chen
2026-03-19  7:41       ` Shuai Xue
2026-03-18  7:47 ` [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon " Tian, Kevin
2026-03-18 20:04   ` Nicolin Chen
2026-03-19  2:29     ` Tian, Kevin
2026-03-19  3:10       ` Nicolin Chen
2026-03-24  0:03         ` Jason Gunthorpe
2026-03-24  1:30           ` Nicolin Chen
2026-03-25  6:55           ` Tian, Kevin
2026-03-25 14:12             ` Jason Gunthorpe
