All of lore.kernel.org
 help / color / mirror / Atom feed
From: Samiullah Khawaja <skhawaja@google.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Baolu Lu <baolu.lu@linux.intel.com>,
	 Nicolin Chen <nicolinc@nvidia.com>,
	will@kernel.org, robin.murphy@arm.com, joro@8bytes.org,
	 bhelgaas@google.com, rafael@kernel.org, lenb@kernel.org,
	praan@google.com,  kees@kernel.org, smostafa@google.com,
	Alexander.Grest@microsoft.com,  kevin.tian@intel.com,
	miko.lenczewski@arm.com, linux-arm-kernel@lists.infradead.org,
	 iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-acpi@vger.kernel.org,  linux-pci@vger.kernel.org,
	vsethi@nvidia.com
Subject: Re: [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts
Date: Tue, 10 Mar 2026 20:00:45 +0000	[thread overview]
Message-ID: <abB1VJ5Op24dMIog@google.com> (raw)
In-Reply-To: <20260306202652.GP1651202@nvidia.com>

On Fri, Mar 06, 2026 at 04:26:52PM -0400, Jason Gunthorpe wrote:
>On Fri, Mar 06, 2026 at 08:22:08PM +0000, Samiullah Khawaja wrote:
>
>> But do you think doing the timeout logic without fencing would be good
>> enough?
>
>It is what ARM and AMD do, so I wouldn't object to it.

I think without any back pressure to the caller, a device will be able
to fill the invalidation queue with device IOTLB invalidations that get
stuck until the HW timeout occurs.
>
>> Currently VT-d blocks itself, until it gets an Invalidation Timeout
>> from HW, and system ends up in a hardlockup since interrupts are
>> disabled.
>>
>> Are you concerned that if fencing is done without an RAS flow, the
>> device might not be able to detect the failure (if it really needs ATS
>> to work)?
>
>Yes, and then the device is badly locked because nothing will fix the
>IOMMU fence.
>
>VFIO might fix it if it is restarted, but other approahces like
>rmmod/insmod won't restore the broken device.
>
>So I'd rather see a more complete solution before we add fencing to
>the iommu drivers. Minimally userspace doing a rmmod, flr, insmod
>should be able to restore the device.
>
>Then auto-FLR through RAS could sit on top of that.
>
>> I am thinking, we can do translated fence and timeout change for VT-d.
>> And the device can use existing RAS mechanism to recover itself. This
>> way we atleast make sure that caller of flush can reuse the memory/IOVAs
>> without UAFs.
>
>Without a larger framework to unfence I think this will get devices
>stuck..
>
>Jason

  reply	other threads:[~2026-03-10 20:00 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-03-05  5:21 [PATCH v1 0/2] iommu/arm-smmu-v3: Reset PCI device upon ATC invalidate timeout Nicolin Chen
2026-03-05  5:21 ` [PATCH v1 1/2] iommu: Do not call pci_dev_reset_iommu_done() unless reset succeeds Nicolin Chen
2026-03-05  5:21 ` [PATCH v1 2/2] iommu/arm-smmu-v3: Recover ATC invalidate timeouts Nicolin Chen
2026-03-05 15:15   ` kernel test robot
2026-03-05 15:24   ` Robin Murphy
2026-03-05 21:06     ` Nicolin Chen
2026-03-05 23:30       ` Nicolin Chen
2026-03-05 23:52       ` Jason Gunthorpe
2026-03-06 15:24         ` Robin Murphy
2026-03-06 15:56           ` Jason Gunthorpe
2026-03-10 19:34             ` Pranjal Shrivastava
2026-03-05 15:39   ` Jason Gunthorpe
2026-03-05 21:15     ` Nicolin Chen
2026-03-05 23:41       ` Jason Gunthorpe
2026-03-06  1:29         ` Nicolin Chen
2026-03-06  1:33           ` Jason Gunthorpe
2026-03-06  5:06             ` Nicolin Chen
2026-03-06 13:02               ` Jason Gunthorpe
2026-03-06 19:20                 ` Nicolin Chen
2026-03-06 19:22                   ` Jason Gunthorpe
2026-03-06 19:39                     ` Nicolin Chen
2026-03-06 19:47                       ` Jason Gunthorpe
2026-03-10 19:40                 ` Pranjal Shrivastava
2026-03-10 19:57                   ` Nicolin Chen
2026-03-10 20:04                     ` Pranjal Shrivastava
2026-03-06 13:22         ` Robin Murphy
2026-03-06 14:01           ` Jason Gunthorpe
2026-03-06 20:18             ` Nicolin Chen
2026-03-06 20:22               ` Jason Gunthorpe
2026-03-06 20:34                 ` Nicolin Chen
2026-03-06  3:22     ` Baolu Lu
2026-03-06 13:00       ` Jason Gunthorpe
2026-03-06 19:35         ` Samiullah Khawaja
2026-03-06 19:43           ` Jason Gunthorpe
2026-03-06 19:59             ` Samiullah Khawaja
2026-03-06 20:03               ` Jason Gunthorpe
2026-03-06 20:22                 ` Samiullah Khawaja
2026-03-06 20:26                   ` Jason Gunthorpe
2026-03-10 20:00                     ` Samiullah Khawaja [this message]
2026-03-11 12:12                       ` Jason Gunthorpe
2026-03-06  2:35   ` kernel test robot
2026-03-10 19:16   ` Pranjal Shrivastava
2026-03-10 19:51     ` Nicolin Chen
2026-03-10 20:00       ` Pranjal Shrivastava

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=abB1VJ5Op24dMIog@google.com \
    --to=skhawaja@google.com \
    --cc=Alexander.Grest@microsoft.com \
    --cc=baolu.lu@linux.intel.com \
    --cc=bhelgaas@google.com \
    --cc=iommu@lists.linux.dev \
    --cc=jgg@nvidia.com \
    --cc=joro@8bytes.org \
    --cc=kees@kernel.org \
    --cc=kevin.tian@intel.com \
    --cc=lenb@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=miko.lenczewski@arm.com \
    --cc=nicolinc@nvidia.com \
    --cc=praan@google.com \
    --cc=rafael@kernel.org \
    --cc=robin.murphy@arm.com \
    --cc=smostafa@google.com \
    --cc=vsethi@nvidia.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.