From: Baolu Lu <baolu.lu@linux.intel.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: Joerg Roedel <joro@8bytes.org>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 02/11] iommu/vt-d: Optimize iotlb_sync_map for non-caching/non-RWBF modes
Date: Fri, 18 Jul 2025 10:56:24 +0800	[thread overview]
Message-ID: <3ea54d65-7ccc-479a-8912-bccd79d678d4@linux.intel.com> (raw)
In-Reply-To: <20250717115559.GD2177622@nvidia.com>

On 7/17/25 19:55, Jason Gunthorpe wrote:
> On Thu, Jul 17, 2025 at 10:40:01AM +0800, Baolu Lu wrote:
>> On 7/16/25 22:12, Jason Gunthorpe wrote:
>>> On Mon, Jul 14, 2025 at 12:50:19PM +0800, Lu Baolu wrote:
>>>> @@ -1833,6 +1845,8 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
>>>>    	if (ret)
>>>>    		goto out_block_translation;
>>>> +	domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);
>>>
>>> This has no locking and is in the wrong order anyhow :(
>>>
>>> Any change to how invalidation works has to be done before attaching
>>> the HW so that the required invalidations are already happening before
>>> the HW can walk the page table.
>>>
>>> And you need to serialize somehow with concurrent map/unmap as iommufd
>>> doesn't prevent userspace from racing attach with map/unmap.
>>
>> domain->iotlb_sync_map does not change the driver's behavior. It simply
>> indicates that there's no need to waste time calling
>> cache_tag_flush_range_np(), as it's just a no-op.
> 
> Of course it changes the behavior, it changes what the invalidation
> callback does.
> 
> Without locking you have a race where a PGD that requires extra
> flushing is visible to the HW while the SW is not doing that extra
> flushing.
> 
> Before any PGD is made visible to the HW the software must ensure all
> the required invalidations are happening.

Oh, I understand now. Since there is no synchronization between
attach/detach and map/unmap operations, the cache invalidation behavior
must be determined when a domain is allocated.
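
For reference, the way I think about the hint in the map path is roughly
the following (a simplified sketch, not necessarily the exact code in
this series):

static int intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
				      unsigned long iova, size_t size)
{
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);

	/* Skip the cache tag walk entirely when the domain never needs it. */
	if (dmar_domain->iotlb_sync_map)
		cache_tag_flush_range_np(dmar_domain, iova, iova + size - 1);

	return 0;
}

Once the flag is fixed at domain allocation time, this check stays
stable for the whole lifetime of the domain, regardless of concurrent
attach operations.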

> 
>> I previously discussed this with Kevin, and we agreed on a phase-by-
>> phase approach. As I mentioned, domain->iotlb_sync_map is merely a hint
>> for the driver, preventing it from looping through all cache tags to
>> determine if any cache invalidation work needs to be performed. We
>> already know it's predetermined that no work needs to be done.
> 
> The iteration through the cache tags is done inside a lock so it
> doesn't have this race (it has the issue I mentioned setting up the
> cache tag list though).
> 
>> RWBF is only required on some early implementations where memory
>> coherence was not yet implemented by the VT-d engine. It should be
>> difficult to find such systems in modern environments.
> 
> Then I would set it at domain creation time, check it during attach,
> and remove this race.

How about the following changes (compiled but not tested)?

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 8db8be9b7e7d..bb00dc14275d 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1780,18 +1780,6 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
  					  __pa(pgd), flags, old);
  }

-static bool domain_need_iotlb_sync_map(struct dmar_domain *domain,
-				       struct intel_iommu *iommu)
-{
-	if (cap_caching_mode(iommu->cap) && intel_domain_is_ss_paging(domain))
-		return true;
-
-	if (rwbf_quirk || cap_rwbf(iommu->cap))
-		return true;
-
-	return false;
-}
-
  static int dmar_domain_attach_device(struct dmar_domain *domain,
  				     struct device *dev)
  {
@@ -1831,8 +1819,6 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
  	if (ret)
  		goto out_block_translation;

-	domain->iotlb_sync_map |= domain_need_iotlb_sync_map(domain, iommu);
-
  	return 0;

  out_block_translation:
@@ -3352,6 +3338,14 @@ intel_iommu_domain_alloc_first_stage(struct device *dev,
  		return ERR_CAST(dmar_domain);

  	dmar_domain->domain.ops = &intel_fs_paging_domain_ops;
+	/*
+	 * iotlb sync for map is only needed for legacy implementations that
+	 * explicitly require flushing internal write buffers to ensure memory
+	 * coherence.
+	 */
+	if (rwbf_quirk || cap_rwbf(iommu->cap))
+		dmar_domain->iotlb_sync_map = true;
+
  	return &dmar_domain->domain;
  }

@@ -3386,6 +3380,14 @@ intel_iommu_domain_alloc_second_stage(struct device *dev,
  	if (flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING)
  		dmar_domain->domain.dirty_ops = &intel_dirty_ops;

+	/*
+	 * Besides the internal write buffer flush, the caching mode used for
+	 * legacy nested translation (which relies on shadow page tables)
+	 * also requires iotlb sync on map.
+	 */
+	if (rwbf_quirk || cap_rwbf(iommu->cap) || cap_caching_mode(iommu->cap))
+		dmar_domain->iotlb_sync_map = true;
+
  	return &dmar_domain->domain;
  }

@@ -3446,6 +3448,11 @@ static int paging_domain_compatible_first_stage(struct dmar_domain *dmar_domain,
  	if (!cap_fl1gp_support(iommu->cap) &&
  	    (dmar_domain->domain.pgsize_bitmap & SZ_1G))
  		return -EINVAL;
+
+	/* iotlb sync on map requirement */
+	if ((rwbf_quirk || cap_rwbf(iommu->cap)) && !dmar_domain->iotlb_sync_map)
+		return -EINVAL;
+
  	return 0;
  }

@@ -3469,6 +3476,12 @@ paging_domain_compatible_second_stage(struct dmar_domain *dmar_domain,
  		return -EINVAL;
  	if (!(sslps & BIT(1)) && (dmar_domain->domain.pgsize_bitmap & SZ_1G))
  		return -EINVAL;
+
+	/* iotlb sync on map requirement */
+	if ((rwbf_quirk || cap_rwbf(iommu->cap) || cap_caching_mode(iommu->cap)) &&
+	    !dmar_domain->iotlb_sync_map)
+		return -EINVAL;
+
  	return 0;
  }

-- 
2.43.0
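
The idea is to decide iotlb_sync_map once when the domain is allocated
and then have the paging_domain_compatible_*() helpers reject any later
attach to an IOMMU whose capabilities would require iotlb sync on map
when the domain was allocated without it. That way the invalidation
behavior can never change after a PGD has become visible to the
hardware.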

Thanks,
baolu

