public inbox for iommu@lists.linux-foundation.org
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>,
	Ashok Raj <ashok.raj@intel.com>,
	linux-kernel@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	iommu@lists.linux-foundation.org,
	Jacob jun Pan <jacob.jun.pan@intel.com>,
	Will Deacon <will@kernel.org>
Subject: Re: [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
Date: Tue, 31 May 2022 22:22:32 +0100	[thread overview]
Message-ID: <56bbbad7-bcba-a440-692b-64e50b4eddf8@arm.com> (raw)
In-Reply-To: <20220531185110.GJ1343366@nvidia.com>

On 2022-05-31 19:51, Jason Gunthorpe wrote:
> On Tue, May 31, 2022 at 07:07:32PM +0100, Robin Murphy wrote:
> 
>>> And we expect the iommu driver to be unable to free page table levels
>>> that have IOVA boundaries in them?
>>
>> I'm not entirely sure what you mean there, but in general an unmap request
>> is expected to match some previous map request
> 
> atomic cmpxchg is OK for inserting new page table levels but it can't
> protect you against concurrent freeing of page table levels. So
> without locks it means that page tables can't usually be freed. Which
> seems to match what the Intel driver does - at least from a cursory
> look.
> 
> This is one of the reasons the mm has the mmap/etc lock and spinlocks
> because we do expect page table levels to get wiped out when VMA's are
> zap'd - all the different locks provide the protection against page
> tables disappearing under from something manipulating them.
> 
> Basically every "lockless" walk in (process) MM land is actually
> protected by some kind of lock that blocks zap_page_range() from
> removing the page table levels themselves.

I'm not an expert in the Intel or AMD code, so I can only speak with 
confidence about what we do in io-pgtable-arm, but the main reason for 
not freeing pagetables is that it's simply not worth the bother of 
trying to work out whether a whole sub-tree is empty. Not to mention 
whether it's *still* empty by the time that we may have figured out that 
it was.

There are only 3 instances where we'll free a table while the domain is 
live. The first is the one legitimate race condition, where two map 
requests targeting relatively nearby PTEs both go to fill in an 
intermediate level of table; whoever loses that race frees the table 
they allocated, but it was never visible to anyone else so that's 
definitely fine. The second is if we're mapping a block entry, and find 
that there's already a table entry there, wherein we assume the table 
must be empty, clear the entry, invalidate any walk caches, install the 
block entry, then free the orphaned table; since we're mapping the 
entire IOVA range covered by that table, there should be no other 
operations on that IOVA range attempting to walk the table at the same 
time, so it's fine. The third is effectively the inverse, if we get a 
block-sized unmap but find a table entry rather than a block at that 
point (on the assumption that it's de-facto allowed for a single unmap 
to cover multiple adjacent mappings as long as it does so exactly); 
similarly we assume that the table must be full, and no other operations 
should be racing because we're unmapping its whole IOVA range, so we 
remove the table entry, invalidate, and free as before.

Again for efficiency reasons we don't attempt to validate those 
assumptions by inspecting the freed tables, so odd behaviour can fall 
out if the caller *does* do something bogus. For example if two calls 
race to map a block and a page in the same (unmapped) region, the block 
mapping will always succeed (and be what ends up in the final pagetable 
state), but the page mapping may or may not report failure depending on 
the exact timing.

Although we don't have debug dumping for io-pgtable-arm, it's good to be 
thinking about this, since it's made me realise that dirty-tracking 
sweeps per that proposal might pose a similar kind of concern, so we 
might still need to harden these corners for the sake of that. Which 
also reminds me that somewhere I have some half-finished patches making 
io-pgtable-arm use the iommu_iotlb_gather freelist, so maybe I'll tackle 
both concerns at once (perhaps we might even be able to RCU-ify the 
freelist generically? I'll see how it goes when I get there).

Cheers,
Robin.
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

  reply	other threads:[~2022-05-31 21:22 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-27  6:30 [PATCH 00/12] iommu/vt-d: Optimize the use of locks Lu Baolu
2022-05-27  6:30 ` [PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs Lu Baolu
2022-05-27 14:59   ` Jason Gunthorpe via iommu
2022-05-29  5:14     ` Baolu Lu
2022-05-30 12:14       ` Jason Gunthorpe via iommu
2022-05-31  3:02         ` Baolu Lu
2022-05-31 13:10           ` Jason Gunthorpe via iommu
2022-05-31 14:11             ` Baolu Lu
2022-05-31 14:53               ` Jason Gunthorpe via iommu
2022-05-31 15:01                 ` Robin Murphy
2022-05-31 15:13                   ` Jason Gunthorpe via iommu
2022-05-31 16:01                     ` Robin Murphy
2022-05-31 16:21                       ` Jason Gunthorpe via iommu
2022-05-31 18:07                         ` Robin Murphy
2022-05-31 18:51                           ` Jason Gunthorpe via iommu
2022-05-31 21:22                             ` Robin Murphy [this message]
2022-05-31 23:10                               ` Jason Gunthorpe via iommu
2022-06-01  8:53                                 ` Tian, Kevin
2022-06-01 12:18                                 ` Joao Martins
2022-06-01 12:33                                   ` Jason Gunthorpe via iommu
2022-06-01 13:52                                     ` Joao Martins
2022-06-01 14:22                                       ` Jason Gunthorpe via iommu
2022-06-01  6:39                             ` Baolu Lu
2022-05-31 13:52           ` Robin Murphy
2022-05-31 15:59             ` Jason Gunthorpe via iommu
2022-05-31 16:42               ` Robin Murphy
2022-06-01  5:47               ` Baolu Lu
2022-06-01  5:33             ` Baolu Lu
2022-05-27  6:30 ` [PATCH 02/12] iommu/vt-d: Remove for_each_device_domain() Lu Baolu
2022-05-27 15:00   ` Jason Gunthorpe via iommu
2022-06-01  8:53   ` Tian, Kevin
2022-05-27  6:30 ` [PATCH 03/12] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu() Lu Baolu
2022-05-27 15:01   ` Jason Gunthorpe via iommu
2022-05-29  5:22     ` Baolu Lu
2022-05-27  6:30 ` [PATCH 04/12] iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk() Lu Baolu
2022-05-27 15:01   ` Jason Gunthorpe via iommu
2022-06-01  8:56   ` Tian, Kevin
2022-05-27  6:30 ` [PATCH 05/12] iommu/vt-d: Unncessary spinlock for root table alloc and free Lu Baolu
2022-06-01  9:05   ` Tian, Kevin
2022-05-27  6:30 ` [PATCH 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers Lu Baolu
2022-06-01  9:09   ` Tian, Kevin
2022-06-01 10:38     ` Baolu Lu
2022-05-27  6:30 ` [PATCH 07/12] iommu/vt-d: Acquiring lock in pasid manipulation helpers Lu Baolu
2022-06-01  9:18   ` Tian, Kevin
2022-06-01 10:48     ` Baolu Lu
2022-05-27  6:30 ` [PATCH 08/12] iommu/vt-d: Replace spin_lock_irqsave() with spin_lock() Lu Baolu
2022-05-27  6:30 ` [PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path Lu Baolu
2022-05-27 15:05   ` Jason Gunthorpe via iommu
2022-06-01  9:28   ` Tian, Kevin
2022-06-01 11:02     ` Baolu Lu
2022-06-02  6:29       ` Tian, Kevin
2022-06-06  1:34         ` Baolu Lu
2022-05-27  6:30 ` [PATCH 10/12] iommu/vt-d: Fold __dmar_remove_one_dev_info() into its caller Lu Baolu
2022-05-27  6:30 ` [PATCH 11/12] iommu/vt-d: Use device_domain_lock accurately Lu Baolu
2022-05-27  6:30 ` [PATCH 12/12] iommu/vt-d: Convert device_domain_lock into per-domain mutex Lu Baolu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=56bbbad7-bcba-a440-692b-64e50b4eddf8@arm.com \
    --to=robin.murphy@arm.com \
    --cc=ashok.raj@intel.com \
    --cc=hch@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=jacob.jun.pan@intel.com \
    --cc=jgg@nvidia.com \
    --cc=kevin.tian@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox