From: Dmytro Maluka <dmaluka@chromium.org>
To: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Samiullah Khawaja <skhawaja@google.com>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	"Vineeth Pillai (Google)" <vineeth@bitbyteword.org>,
	Aashish Sharma <aashish@aashishsharma.net>
Subject: Re: [PATCH 3/3] iommu/vt-d: Rework hitless PASID entry replacement
Date: Tue, 13 Jan 2026 20:39:53 +0100
Message-ID: <aWafiUwBI8togJvP@google.com>
In-Reply-To: <20260113030052.977366-4-baolu.lu@linux.intel.com>

On Tue, Jan 13, 2026 at 11:00:48AM +0800, Lu Baolu wrote:
> The Intel VT-d PASID table entry is 512 bits (64 bytes). Because the
> hardware may fetch this entry in multiple 128-bit chunks, updating the
> entire entry while it is active (P=1) risks a "torn" read where the
> hardware observes an inconsistent state.
> 
> However, certain updates (e.g., changing page table pointers while
> keeping the translation type and domain ID the same) can be performed
> hitlessly. This is possible if the update is limited to a single
> 128-bit chunk while the other chunks remain stable.
> 
> Introduce a hitless replacement mechanism for PASID entries:
> 
> - Update 'struct pasid_entry' with a union to support 128-bit
>   access via the newly added val128[4] array.
> - Add pasid_support_hitless_replace() to determine if a transition
>   between an old and new entry is safe to perform atomically.
>   - For First-level/Nested translations: The first 128 bits (chunk 0)
>     must remain identical; chunk 1 is updated atomically.
>   - For Second-level/Pass-through: The second 128 bits (chunk 1)
>     must remain identical; chunk 0 is updated atomically.
> - If hitless replacement is supported, use intel_iommu_atomic128_set()
>   to commit the change in a single 16-byte burst.
> - If the changes are too extensive to be hitless, fall back to the
>   safe "tear down and re-setup" flow (clear present -> flush -> setup).
> 
> Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> ---
>  drivers/iommu/intel/pasid.h | 26 ++++++++++++++++-
>  drivers/iommu/intel/pasid.c | 57 ++++++++++++++++++++++++++++++++++---
>  2 files changed, 78 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
> index 35de1d77355f..b569e2828a8b 100644
> --- a/drivers/iommu/intel/pasid.h
> +++ b/drivers/iommu/intel/pasid.h
> @@ -37,7 +37,10 @@ struct pasid_dir_entry {
>  };
>  
>  struct pasid_entry {
> -	u64 val[8];
> +	union {
> +		u64 val[8];
> +		u128 val128[4];
> +	};
>  };
>  
>  #define PASID_ENTRY_PGTT_FL_ONLY	(1)
> @@ -297,6 +300,27 @@ static inline void pasid_set_eafe(struct pasid_entry *pe)
>  	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
>  }
>  
> +static inline bool pasid_support_hitless_replace(struct pasid_entry *pte,
> +						 struct pasid_entry *new, int type)
> +{
> +	switch (type) {
> +	case PASID_ENTRY_PGTT_FL_ONLY:
> +	case PASID_ENTRY_PGTT_NESTED:
> +		/* The first 128 bits remain the same. */
> +		return READ_ONCE(pte->val[0]) == READ_ONCE(new->val[0]) &&
> +			READ_ONCE(pte->val[1]) == READ_ONCE(new->val[1]);
> +	case PASID_ENTRY_PGTT_SL_ONLY:
> +	case PASID_ENTRY_PGTT_PT:
> +		/* The second 128 bits remain the same. */
> +		return READ_ONCE(pte->val[2]) == READ_ONCE(new->val[2]) &&
> +			READ_ONCE(pte->val[3]) == READ_ONCE(new->val[3]);
> +	default:
> +		WARN_ON(true);

nit: WARN_ON(false) seems a bit more suitable?

> +	}
> +
> +	return false;
> +}
> +
>  extern unsigned int intel_pasid_max_id;
>  int intel_pasid_alloc_table(struct device *dev);
>  void intel_pasid_free_table(struct device *dev);
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 4f36138448d8..da7ab18d3bfe 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -452,7 +452,20 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_FL_ONLY)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_first_level(iommu, dev, fsptptr,
> +						     pasid, did, flags);
> +	}
> +
> +	/*
> +	 * A first-only hitless replace requires the first 128 bits to remain
> +	 * the same. Only the second 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -563,7 +576,19 @@ int intel_pasid_replace_second_level(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_SL_ONLY)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_second_level(iommu, domain, dev, pasid);
> +	}
> +
> +	/*
> +	 * A second-only hitless replace requires the second 128 bits to remain
> +	 * the same. Only the first 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -707,7 +732,19 @@ int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_PT)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_pass_through(iommu, dev, pasid);
> +	}
> +
> +	/*
> +	 * A passthrough hitless replace requires the second 128 bits to remain
> +	 * the same. Only the first 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> @@ -903,7 +940,19 @@ int intel_pasid_replace_nested(struct intel_iommu *iommu,
>  
>  	WARN_ON(old_did != pasid_get_domain_id(pte));
>  
> -	*pte = new_pte;
> +	if (!pasid_support_hitless_replace(pte, &new_pte,
> +					   PASID_ENTRY_PGTT_NESTED)) {
> +		spin_unlock(&iommu->lock);
> +		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> +		return intel_pasid_setup_nested(iommu, dev, pasid, domain);
> +	}
> +
> +	/*
> +	 * A nested hitless replace requires the first 128 bits to remain
> +	 * the same. Only the second 128-bit chunk needs to be updated.
> +	 */
> +	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
>  	spin_unlock(&iommu->lock);
>  
>  	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
> -- 
> 2.43.0
> 
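
For anyone following along, here is how I read the rule in
pasid_support_hitless_replace(), as a minimal userspace sketch. All names
below (struct entry, enum pgtt, stable_chunk(), can_replace_hitless()) are
made up for illustration and are not the driver's API. It mirrors only the
check quoted above: one of the first two 128-bit chunks must be unchanged so
that the other one can be committed with a single 16-byte store. It says
nothing about chunks 2 and 3.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the 64-byte PASID entry: 8 x 64 bits. */
struct entry {
	uint64_t val[8];
};

/* Hypothetical translation types, named after the four cases in the patch. */
enum pgtt { FL_ONLY = 1, SL_ONLY = 2, NESTED = 3, PT = 4 };

/*
 * Index of the 128-bit chunk that has to stay unchanged for a given type,
 * or -1 if the type is not one of the four known cases.
 */
static int stable_chunk(enum pgtt type)
{
	switch (type) {
	case FL_ONLY:
	case NESTED:
		return 0;	/* chunk 0 stable, chunk 1 gets rewritten */
	case SL_ONLY:
	case PT:
		return 1;	/* chunk 1 stable, chunk 0 gets rewritten */
	}
	return -1;
}

/* Compare one 128-bit chunk (two consecutive u64 words) of two entries. */
static bool chunk_equal(const struct entry *a, const struct entry *b, int chunk)
{
	return memcmp(&a->val[2 * chunk], &b->val[2 * chunk], 16) == 0;
}

/*
 * True if old -> next only needs one of the first two chunks rewritten,
 * which is the condition under which the patch takes the hitless path.
 * Otherwise the caller would fall back to tear down and re-setup.
 */
static bool can_replace_hitless(const struct entry *old,
				const struct entry *next, enum pgtt type)
{
	int c = stable_chunk(type);

	return c >= 0 && chunk_equal(old, next, c);
}
```

With this, an entry that differs from the old one only in val[2] passes for
FL_ONLY/NESTED but not for SL_ONLY/PT, and an unknown type never passes.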
