All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolin Chen <nicolinc@nvidia.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: <will@kernel.org>, <robin.murphy@arm.com>, <joro@8bytes.org>,
	<bhelgaas@google.com>, <rafael@kernel.org>, <lenb@kernel.org>,
	<praan@google.com>, <kees@kernel.org>, <baolu.lu@linux.intel.com>,
	<smostafa@google.com>, <Alexander.Grest@microsoft.com>,
	<kevin.tian@intel.com>, <miko.lenczewski@arm.com>,
	<linux-arm-kernel@lists.infradead.org>, <iommu@lists.linux.dev>,
	<linux-kernel@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<linux-pci@vger.kernel.org>, <vsethi@nvidia.com>
Subject: Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
Date: Thu, 5 Mar 2026 13:15:45 -0800	[thread overview]
Message-ID: <aanygWWZLA1htDdQ@Asurada-Nvidia> (raw)
In-Reply-To: <20260305153911.GT972761@nvidia.com>

On Thu, Mar 05, 2026 at 11:39:11AM -0400, Jason Gunthorpe wrote:
> On Wed, Mar 04, 2026 at 09:21:42PM -0800, Nicolin Chen wrote:
> > +	/*
> > +	 * ATC timeout indicates the device has stopped responding to coherence
> > +	 * protocol requests. The only safe recovery is a reset to flush stale
> > +	 * cached translations. Note that pci_reset_function() internally calls
> > +	 * pci_dev_reset_iommu_prepare/done() as well and ensures to block ATS
> > +	 * if PCI-level reset fails.
> > +	 */
> > +	if (!pci_reset_function(pdev)) {
> > +		/*
> > +		 * If reset succeeds, set BME back. Otherwise, fence the system
> > +		 * from a faulty device, in which case user will have to replug
> > +		 * the device to invoke pci_set_master().
> > +		 */
> > +		pci_dev_lock(pdev);
> > +		pci_set_master(pdev);
> > +		pci_dev_unlock(pdev);
> > +	}
> 
> I thought we talked about this, the iommu driver cannot just blindly
> issue a reset like this, the reset has to come from the actual device
> driver through the AERish mechanism. Otherwise the driver RAS is going
> to explode.
> 
> The smmu driver should immediately block the STE (reject translated
> requests) to protect the system before resuming whatever command
> submissio n has encountered the error.
> 
> You could delegate the STE change to the interrupted command
> submission to avoid doing it from a ISR, that makes alot of sense
> because the submission thread is already operating a cmdq so it could
> stick in a STE invalidation command, possibly even in place of the
> failed ATC command.

You mean in arm_smmu_cmdq_issue_cmdlist() that issued the timed
out ATC command?

So my test case was to trigger a device fault followed by an ATC
command. But, I found that the ATC command submission returned 0
while only the ISR received:
    CMDQ error (cons 0x03000003): ATC invalidate timeout
    arm_smmu_debugfs_atc_write: ATC_INV ret=0

It seems difficult to insert a CMDQ_OP_CFGI_STE in the submission
thread?

> I think I'd break this up into smaller steps, just focus on this STE
> mechanism at start and have any future attach callback fix the STE.
> 
> Then we can talk about how to properly trigger the PCI RAS flow and so
> on.

OK.

Thanks
Nicolin

  reply	other threads:[~2026-03-05 21:16 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-05  5:21 [PATCH v1 0/2] iommu/arm-smmu-v3: Reset PCI device upon ATC invalidate timeout Nicolin Chen
2026-03-05  5:21 ` [PATCH v1 1/2] iommu: Do not call pci_dev_reset_iommu_done() unless reset succeeds Nicolin Chen
2026-03-05  5:21 ` [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts Nicolin Chen
2026-03-05 15:15   ` kernel test robot
2026-03-05 15:24   ` Robin Murphy
2026-03-05 21:06     ` Nicolin Chen
2026-03-05 23:30       ` Nicolin Chen
2026-03-05 23:52       ` Jason Gunthorpe
2026-03-06 15:24         ` Robin Murphy
2026-03-06 15:56           ` Jason Gunthorpe
2026-03-10 19:34             ` Pranjal Shrivastava
2026-03-05 15:39   ` Jason Gunthorpe
2026-03-05 21:15     ` Nicolin Chen [this message]
2026-03-05 23:41       ` Jason Gunthorpe
2026-03-06  1:29         ` Nicolin Chen
2026-03-06  1:33           ` Jason Gunthorpe
2026-03-06  5:06             ` Nicolin Chen
2026-03-06 13:02               ` Jason Gunthorpe
2026-03-06 19:20                 ` Nicolin Chen
2026-03-06 19:22                   ` Jason Gunthorpe
2026-03-06 19:39                     ` Nicolin Chen
2026-03-06 19:47                       ` Jason Gunthorpe
2026-03-10 19:40                 ` Pranjal Shrivastava
2026-03-10 19:57                   ` Nicolin Chen
2026-03-10 20:04                     ` Pranjal Shrivastava
2026-03-06 13:22         ` Robin Murphy
2026-03-06 14:01           ` Jason Gunthorpe
2026-03-06 20:18             ` Nicolin Chen
2026-03-06 20:22               ` Jason Gunthorpe
2026-03-06 20:34                 ` Nicolin Chen
2026-03-06  3:22     ` Baolu Lu
2026-03-06 13:00       ` Jason Gunthorpe
2026-03-06 19:35         ` Samiullah Khawaja
2026-03-06 19:43           ` Jason Gunthorpe
2026-03-06 19:59             ` Samiullah Khawaja
2026-03-06 20:03               ` Jason Gunthorpe
2026-03-06 20:22                 ` Samiullah Khawaja
2026-03-06 20:26                   ` Jason Gunthorpe
2026-03-10 20:00                     ` Samiullah Khawaja
2026-03-11 12:12                       ` Jason Gunthorpe
2026-03-06  2:35   ` kernel test robot
2026-03-10 19:16   ` Pranjal Shrivastava
2026-03-10 19:51     ` Nicolin Chen
2026-03-10 20:00       ` Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aanygWWZLA1htDdQ@Asurada-Nvidia \
    --to=nicolinc@nvidia.com \
    --cc=Alexander.Grest@microsoft.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kees@kernel.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=miko.lenczewski@arm.com \
    --cc=praan@google.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.