All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nicolin Chen <nicolinc@nvidia.com>
To: "Tian, Kevin" <kevin.tian@intel.com>
Cc: Samiullah Khawaja <skhawaja@google.com>,
	"will@kernel.org" <will@kernel.org>,
	"robin.murphy@arm.com" <robin.murphy@arm.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"jgg@nvidia.com" <jgg@nvidia.com>,
	"rafael@kernel.org" <rafael@kernel.org>,
	"lenb@kernel.org" <lenb@kernel.org>,
	"praan@google.com" <praan@google.com>,
	"baolu.lu@linux.intel.com" <baolu.lu@linux.intel.com>,
	"xueshuai@linux.alibaba.com" <xueshuai@linux.alibaba.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-acpi@vger.kernel.org" <linux-acpi@vger.kernel.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	Vikram Sethi <vsethi@nvidia.com>
Subject: Re: [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap
Date: Wed, 18 Mar 2026 20:12:04 -0700	[thread overview]
Message-ID: <abtphBuLiXxSxmAy@Asurada-Nvidia> (raw)
In-Reply-To: <BN9PR11MB5276467C1933E19E98277D378C4FA@BN9PR11MB5276.namprd11.prod.outlook.com>

On Thu, Mar 19, 2026 at 03:08:05AM +0000, Tian, Kevin wrote:
> > From: Samiullah Khawaja <skhawaja@google.com>
> > Sent: Thursday, March 19, 2026 6:07 AM
> > 
> > Hi Nicolin,
> > 
> > On Wed, Mar 18, 2026 at 12:26:33PM -0700, Nicolin Chen wrote:
> > >On Wed, Mar 18, 2026 at 07:36:20AM +0000, Tian, Kevin wrote:
> > >> > From: Nicolin Chen <nicolinc@nvidia.com>
> > >> > Sent: Wednesday, March 18, 2026 3:16 AM
> > >> >
> > >> > An ATC invalidation timeout is a fatal error. While the SMMUv3
> > hardware is
> > >> > aware of the timeout via a GERROR interrupt, the driver thread issuing
> > the
> > >> > commands lacks a direct mechanism to verify whether its specific batch
> > was
> > >> > the cause or not, as polling the CMD_SYNC status doesn't natively return
> > a
> > >> > failure code, making it very difficult to coordinate per-device recovery.
> > >> >
> > >> > Introduce an atc_sync_timeouts bitmap in the cmdq structure to bridge
> > this
> > >> > gap. When the ISR detects an ATC timeout, set the bit corresponding to
> > the
> > >> > physical CMDQ index of the faulting CMD_SYNC command.
> > >> >
> > >>
> > >> It's nice to see the ability of allowing sw to identify the faulting sync
> > command
> > >> upon an ATC timeout! On VT-d it's not feasible when multiple wait
> > descriptors
> > >> (similar to CMD_SYNC) are in-fly... :/
> > >
> > >Actually SMMU doesn't know which device is faulting when CMD_SYNC
> > 
> > VT-d is able to find out the SID of the device for which the device TLB
> > invalidation timed-out occured by using the SID reported in the
> > "Invalidation Queue Error Record Register" (VT-d Specs 11.4.9.9).
> 
> yes. but when there are multiple submissions (each with a wait descriptor)
> fetched/handled by the hw and then an invalidation timeout comes, all
> pending wait descriptors will be aborted (not just the one corresponding
> to the timeout). In this case all affected submitters need to re-try.

This sounds similar to SMMU then.

Nicolin

  reply	other threads:[~2026-03-19  3:12 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-17 19:15 [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon ATC invalidation timeout Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 1/7] iommu: Do not call pci_dev_reset_iommu_done() unless reset succeeds Nicolin Chen
2026-03-18  7:21   ` Tian, Kevin
2026-03-18 20:16     ` Nicolin Chen
2026-03-18  8:02   ` Shuai Xue
2026-03-18 20:27     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 2/7] iommu: Add reset_device_done callback for hardware fault recovery Nicolin Chen
2026-03-18  5:59   ` Baolu Lu
2026-03-18 18:42     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 3/7] iommu: Add iommu_report_device_broken() to quarantine a broken device Nicolin Chen
2026-03-18  6:13   ` Baolu Lu
2026-03-19  1:31     ` Nicolin Chen
2026-03-18  7:31   ` Tian, Kevin
2026-03-19  1:30     ` Nicolin Chen
2026-03-19  2:35       ` Tian, Kevin
2026-03-19  3:13         ` Nicolin Chen
2026-03-18 11:45   ` Shuai Xue
2026-03-18 20:29     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 4/7] iommu/arm-smmu-v3: Mark ATC invalidate timeouts via lockless bitmap Nicolin Chen
2026-03-18  7:36   ` Tian, Kevin
2026-03-18 19:26     ` Nicolin Chen
2026-03-18 22:06       ` Samiullah Khawaja
2026-03-19  3:08         ` Tian, Kevin
2026-03-19  3:12           ` Nicolin Chen [this message]
2026-03-23 23:51             ` Jason Gunthorpe
2026-04-10  7:39               ` Tian, Kevin
2026-03-18 22:02   ` Samiullah Khawaja
2026-03-18 23:23     ` Nicolin Chen
2026-03-19  0:08       ` Samiullah Khawaja
2026-03-19  1:15         ` Nicolin Chen
2026-03-23 23:57       ` Jason Gunthorpe
2026-03-24  1:21         ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 5/7] iommu/arm-smmu-v3: Replace smmu with master in arm_smmu_inv Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 6/7] iommu/arm-smmu-v3: Introduce master->ats_broken flag Nicolin Chen
2026-03-18  7:39   ` Tian, Kevin
2026-03-18 20:00     ` Nicolin Chen
2026-03-17 19:15 ` [PATCH v2 7/7] iommu/arm-smmu-v3: Block ATS upon an ATC invalidation timeout Nicolin Chen
2026-03-19  2:56   ` Shuai Xue
2026-03-19  3:26     ` Nicolin Chen
2026-03-19  7:41       ` Shuai Xue
2026-03-18  7:47 ` [PATCH v2 0/7] iommu/arm-smmu-v3: Quarantine device upon " Tian, Kevin
2026-03-18 20:04   ` Nicolin Chen
2026-03-19  2:29     ` Tian, Kevin
2026-03-19  3:10       ` Nicolin Chen
2026-03-24  0:03         ` Jason Gunthorpe
2026-03-24  1:30           ` Nicolin Chen
2026-03-25  6:55           ` Tian, Kevin
2026-03-25 14:12             ` Jason Gunthorpe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abtphBuLiXxSxmAy@Asurada-Nvidia \
    --to=nicolinc@nvidia.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=praan@google.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=skhawaja@google.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    --cc=xueshuai@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.