From: Lu Baolu <baolu.lu@linux.intel.com>
To: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
	Robin Murphy <robin.murphy@arm.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>
Cc: Dmytro Maluka <dmaluka@chromium.org>,
	Samiullah Khawaja <skhawaja@google.com>,
	iommu@lists.linux.dev, linux-kernel@vger.kernel.org,
	Lu Baolu <baolu.lu@linux.intel.com>
Subject: [PATCH 3/3] iommu/vt-d: Rework hitless PASID entry replacement
Date: Tue, 13 Jan 2026 11:00:48 +0800
Message-ID: <20260113030052.977366-4-baolu.lu@linux.intel.com>
In-Reply-To: <20260113030052.977366-1-baolu.lu@linux.intel.com>

The Intel VT-d PASID table entry is 512 bits (64 bytes). Because the
hardware may fetch this entry in multiple 128-bit chunks, updating the
entire entry while it is active (P=1) risks a "torn" read where the
hardware observes an inconsistent state.

However, certain updates (e.g., changing page table pointers while
keeping the translation type and domain ID the same) can be performed
hitlessly. This is possible if the update is limited to a single
128-bit chunk while the other chunks remain unchanged.

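To illustrate the chunk layout described above, here is a minimal userspace
sketch (not kernel code). The names chunk128, pasid_entry_model,
chunk_of_qword() and chunks_changed() are hypothetical and exist only for
this illustration; a 128-bit chunk is modelled as two 64-bit words rather
than a native u128. It counts how many chunks differ between two entries:
an update is a candidate for a single 16-byte store only when exactly one
chunk changes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one 128-bit chunk as two 64-bit words. */
struct chunk128 {
	uint64_t lo;
	uint64_t hi;
};

/* Model of a 512-bit entry, viewable as 8 qwords or as 4 chunks. */
union pasid_entry_model {
	uint64_t val[8];
	struct chunk128 chunk[4];
};

static_assert(sizeof(struct chunk128) == 16, "chunk must be 16 bytes");
static_assert(sizeof(union pasid_entry_model) == 64, "entry must be 64 bytes");

/* Index of the 128-bit chunk that holds qword 'i' (val[0..1] -> chunk 0). */
static inline unsigned int chunk_of_qword(unsigned int i)
{
	return i / 2;
}

/*
 * Count the 128-bit chunks that differ between two entries. If more than
 * one chunk differs, no single 16-byte store can commit the update, so
 * hardware fetching chunk by chunk could observe a mix of old and new.
 */
static unsigned int chunks_changed(const union pasid_entry_model *cur,
				   const union pasid_entry_model *next)
{
	unsigned int i, n = 0;

	for (i = 0; i < 4; i++)
		if (cur->chunk[i].lo != next->chunk[i].lo ||
		    cur->chunk[i].hi != next->chunk[i].hi)
			n++;

	return n;
}
```

In this model, changing val[2] and val[3] together still counts as one
changed chunk, while additionally changing val[0] makes it two.
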
Introduce a hitless replacement mechanism for PASID entries:

- Update 'struct pasid_entry' with a union to support 128-bit
  access via the newly added val128[4] array.
- Add pasid_support_hitless_replace() to determine if a transition
  between an old and new entry is safe to perform atomically.
  - For First-level/Nested translations: The first 128 bits (chunk 0)
    must remain identical; chunk 1 is updated atomically.
  - For Second-level/Pass-through: The second 128 bits (chunk 1)
    must remain identical; chunk 0 is updated atomically.
- If hitless replacement is supported, use intel_iommu_atomic128_set()
  to commit the change in a single 16-byte burst.
- If the changes are too extensive to be hitless, fall back to the
  safe "tear down and re-setup" flow (clear present -> flush -> setup).

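The decision flow in the list above can be sketched in userspace C as
follows. This is an illustration under stated assumptions, not the driver
code: entry_model, pgtt_model, stable_chunk(), can_replace_hitless() and
replace_entry() are hypothetical names, the PGTT values other than
FL_ONLY (1) are assumed, and memcpy() merely stands in for the 16-byte
atomic store (it is not atomic). As in the patch, qwords 4-7 are neither
compared nor written on the hitless path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the PGTT types; values are assumptions. */
enum pgtt_model { PGTT_FL_ONLY = 1, PGTT_SL_ONLY = 2, PGTT_NESTED = 3, PGTT_PT = 4 };

enum replace_path { REPLACE_HITLESS, REPLACE_TEARDOWN };

struct entry_model {
	uint64_t val[8];	/* chunk n occupies val[2n] and val[2n + 1] */
};

static_assert(sizeof(struct entry_model) == 64, "entry must be 64 bytes");

/* Chunk that must stay identical for this type, or -1 if type is unknown. */
static int stable_chunk(enum pgtt_model type)
{
	switch (type) {
	case PGTT_FL_ONLY:
	case PGTT_NESTED:
		return 0;
	case PGTT_SL_ONLY:
	case PGTT_PT:
		return 1;
	}
	return -1;
}

static bool can_replace_hitless(const struct entry_model *cur,
				const struct entry_model *next,
				enum pgtt_model type)
{
	int c = stable_chunk(type);

	if (c < 0)
		return false;

	return cur->val[2 * c] == next->val[2 * c] &&
	       cur->val[2 * c + 1] == next->val[2 * c + 1];
}

static enum replace_path replace_entry(struct entry_model *cur,
				       const struct entry_model *next,
				       enum pgtt_model type)
{
	int upd;

	if (!can_replace_hitless(cur, next, type)) {
		/* Models "clear present -> flush -> setup": wipe, then rewrite. */
		memset(cur, 0, sizeof(*cur));
		*cur = *next;
		return REPLACE_TEARDOWN;
	}

	/* Only chunks 0 and 1 take part: update the one that is not stable. */
	upd = 1 - stable_chunk(type);
	/* Stand-in for the single 16-byte atomic store; NOT atomic here. */
	memcpy(&cur->val[2 * upd], &next->val[2 * upd], 16);

	return REPLACE_HITLESS;
}
```

With this model, a first-level replace that changes only val[2] takes the
hitless path, while one that also changes val[0] takes the tear-down path.
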
Fixes: 7543ee63e811 ("iommu/vt-d: Add pasid replace helpers")
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
drivers/iommu/intel/pasid.h | 26 ++++++++++++++++-
drivers/iommu/intel/pasid.c | 57 ++++++++++++++++++++++++++++++++++---
2 files changed, 78 insertions(+), 5 deletions(-)
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 35de1d77355f..b569e2828a8b 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -37,7 +37,10 @@ struct pasid_dir_entry {
 };
 
 struct pasid_entry {
-	u64 val[8];
+	union {
+		u64 val[8];
+		u128 val128[4];
+	};
 };
 
 #define PASID_ENTRY_PGTT_FL_ONLY	(1)
@@ -297,6 +300,27 @@ static inline void pasid_set_eafe(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
 }
 
+static inline bool pasid_support_hitless_replace(struct pasid_entry *pte,
+						 struct pasid_entry *new, int type)
+{
+	switch (type) {
+	case PASID_ENTRY_PGTT_FL_ONLY:
+	case PASID_ENTRY_PGTT_NESTED:
+		/* The first 128 bits remain the same. */
+		return READ_ONCE(pte->val[0]) == READ_ONCE(new->val[0]) &&
+		       READ_ONCE(pte->val[1]) == READ_ONCE(new->val[1]);
+	case PASID_ENTRY_PGTT_SL_ONLY:
+	case PASID_ENTRY_PGTT_PT:
+		/* The second 128 bits remain the same. */
+		return READ_ONCE(pte->val[2]) == READ_ONCE(new->val[2]) &&
+		       READ_ONCE(pte->val[3]) == READ_ONCE(new->val[3]);
+	default:
+		WARN_ON(true);
+	}
+
+	return false;
+}
+
 extern unsigned int intel_pasid_max_id;
 int intel_pasid_alloc_table(struct device *dev);
 void intel_pasid_free_table(struct device *dev);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 4f36138448d8..da7ab18d3bfe 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -452,7 +452,20 @@ int intel_pasid_replace_first_level(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_FL_ONLY)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_first_level(iommu, dev, fsptptr,
+						     pasid, did, flags);
+	}
+
+	/*
+	 * A first-only hitless replace requires the first 128 bits to remain
+	 * the same. Only the second 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
@@ -563,7 +576,19 @@ int intel_pasid_replace_second_level(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_SL_ONLY)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_second_level(iommu, domain, dev, pasid);
+	}
+
+	/*
+	 * A second-only hitless replace requires the second 128 bits to remain
+	 * the same. Only the first 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
@@ -707,7 +732,19 @@ int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_PT)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_pass_through(iommu, dev, pasid);
+	}
+
+	/*
+	 * A passthrough hitless replace requires the second 128 bits to remain
+	 * the same. Only the first 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[0], new_pte.val128[0]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
@@ -903,7 +940,19 @@ int intel_pasid_replace_nested(struct intel_iommu *iommu,
 
 	WARN_ON(old_did != pasid_get_domain_id(pte));
 
-	*pte = new_pte;
+	if (!pasid_support_hitless_replace(pte, &new_pte,
+					   PASID_ENTRY_PGTT_NESTED)) {
+		spin_unlock(&iommu->lock);
+		intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+		return intel_pasid_setup_nested(iommu, dev, pasid, domain);
+	}
+
+	/*
+	 * A nested hitless replace requires the first 128 bits to remain
+	 * the same. Only the second 128-bit chunk needs to be updated.
+	 */
+	intel_iommu_atomic128_set(&pte->val128[1], new_pte.val128[1]);
 	spin_unlock(&iommu->lock);
 
 	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
--
2.43.0