From: Pranjal Shrivastava <praan@google.com>
To: Nicolin Chen <nicolinc@nvidia.com>
Cc: will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
	bhelgaas@google.com, jgg@nvidia.com, rafael@kernel.org,
	lenb@kernel.org, kees@kernel.org, baolu.lu@linux.intel.com,
	smostafa@google.com, Alexander.Grest@microsoft.com,
	kevin.tian@intel.com, miko.lenczewski@arm.com,
	linux-arm-kernel@lists.infradead.org, iommu@lists.linux.dev,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-pci@vger.kernel.org, vsethi@nvidia.com
Subject: Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
Date: Tue, 10 Mar 2026 20:00:48 +0000
Message-ID: <abB4cAjZjIvIzhkp@google.com>
In-Reply-To: <abB2V8VGfoh5yO45@Asurada-Nvidia>

On Tue, Mar 10, 2026 at 12:51:51PM -0700, Nicolin Chen wrote:
> On Tue, Mar 10, 2026 at 07:16:02PM +0000, Pranjal Shrivastava wrote:
> > On Wed, Mar 04, 2026 at 09:21:42PM -0800, Nicolin Chen wrote:
> > > +	/*
> > > +	 * ATC timeout indicates the device has stopped responding to coherence
> > > +	 * protocol requests. The only safe recovery is a reset to flush stale
> > > +	 * cached translations. Note that pci_reset_function() internally calls
> > > +	 * pci_dev_reset_iommu_prepare/done() as well and ensures to block ATS
> > > +	 * if PCI-level reset fails.
> > > +	 */
> > > +	if (!pci_reset_function(pdev)) {
> > 
> > I'm a little uncomfortable with this, why is an IOMMU driver poking into
> > the PCI mechanics? I agree that a reset might be the right thing to do
> > here but we wouldn't want the IOMMU driver to trigger it.. Ideally, we'd
> > need a mechanism that bubbles up fatal IOMMU faults to the PCI core and
> > let it decide/perform the reset. Maybe this could mean adding another op
> > to struct pci_error_handlers or something like that?
> 
> Robin/Jason already had similar remarks (to most of your other
> comments as well). I have acked their comments, and am already
> reworking on these.
> 

Yeah, I just saw those discussions as well; I replied before seeing them.

> > > +		/*
> > > +		 * If reset succeeds, set BME back. Otherwise, fence the system
> > > +		 * from a faulty device, in which case user will have to replug
> > > +		 * the device to invoke pci_set_master().
> > > +		 */
> > > +		pci_dev_lock(pdev);
> > 
> > Why are we using spinlock_irqsave across the worker? Also, why does
> > atc_recovery.lock have to be a spinlock? The workers run in process
> > context, and I also don't see anyone else take the atc_recovery.lock?
> 
> I guess mutex would be okay here, since there is no other place
> access the linked list. Pairing a linked list with a spinlock is
> just a common practice..
> 

Ack, agreed. I have no problem with the type of the lock; I was just
questioning the choice to use spin_lock_irqsave() et al., since I don't
believe this could run in interrupt context.

> > Why does it need to be irq-safe? If this can somehow run in irq context,
> > we also seem to be using pci_dev_lock and streams_mutex across the
> > worker?
> 
> pci_dev_lock was to fence race on the PCI level. Yet, the entire
> BME call is probably not a good idea. So, dropping that means we
> won't need pci_dev_lock.
> 

Ack.

> > Mixing mutexes with spinlocks is brittle and invites 
> > "sleep-while-atomic" bugs in future refactors..
> 
> Either streams_mutex or atc_recovery.lock was scoped for only a
> few lines each section. Each was released before the other one
> was taken. Where is the "mixing" or "sleep-while-atomic" case?

The case doesn't exist yet. I meant it as a warning against future
refactors: I didn't see the need for a spinlock here, and I didn't
understand why all 3 couldn't be mutexes when the existing 2 already
were.
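
To illustrate the pattern being discussed, here is a minimal userspace
sketch, assuming pthreads as a stand-in for the kernel mutex API. All
names (recovery_queue, recovery_entry, and the helpers) are hypothetical
and not taken from the patch. It shows a list that is only touched from
process context, guarded by a sleeping lock that is held for a few lines
and dropped before any slow work starts:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the recovery list; not the patch's structure. */
struct recovery_entry {
	int dev_id;
	struct recovery_entry *next;
};

struct recovery_queue {
	pthread_mutex_t lock;	/* sleeping lock: no interrupt-context user assumed */
	struct recovery_entry *head;
};

static void recovery_queue_init(struct recovery_queue *q)
{
	int ret = pthread_mutex_init(&q->lock, NULL);

	assert(ret == 0);
	(void)ret;
	q->head = NULL;
}

/* Queue a device for recovery; returns 0 on success, -1 if allocation fails. */
static int recovery_queue_add(struct recovery_queue *q, int dev_id)
{
	struct recovery_entry *e = malloc(sizeof(*e));

	if (!e)
		return -1;
	e->dev_id = dev_id;

	pthread_mutex_lock(&q->lock);
	e->next = q->head;
	q->head = e;
	pthread_mutex_unlock(&q->lock);
	return 0;
}

/*
 * Worker body: detach the whole list under the lock, then handle the
 * entries with no lock held, so a slow step never runs inside the
 * critical section. Returns the number of entries handled.
 */
static int recovery_queue_drain(struct recovery_queue *q)
{
	struct recovery_entry *list, *next;
	int handled = 0;

	pthread_mutex_lock(&q->lock);
	list = q->head;
	q->head = NULL;
	pthread_mutex_unlock(&q->lock);

	for (; list; list = next) {
		next = list->next;
		/* a device reset or similar slow step would go here */
		free(list);
		handled++;
	}
	return handled;
}
```

Because the list is detached before it is walked, the critical section
stays short whichever lock type is used, and nothing that might sleep
ever runs with the lock held.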

Praan
