public inbox for kvm@vger.kernel.org
From: Yi Liu <yi.l.liu@intel.com>
To: Baolu Lu <baolu.lu@linux.intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"joro@8bytes.org" <joro@8bytes.org>,
	"jgg@nvidia.com" <jgg@nvidia.com>
Cc: "alex.williamson@redhat.com" <alex.williamson@redhat.com>,
	"eric.auger@redhat.com" <eric.auger@redhat.com>,
	"nicolinc@nvidia.com" <nicolinc@nvidia.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"chao.p.peng@linux.intel.com" <chao.p.peng@linux.intel.com>,
	"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
	"Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
	"vasant.hegde@amd.com" <vasant.hegde@amd.com>,
	"willy@infradead.org" <willy@infradead.org>
Subject: Re: [PATCH v5 04/13] iommu/vt-d: Add pasid replace helpers
Date: Thu, 7 Nov 2024 16:39:51 +0800
Message-ID: <f73869de-38b1-41cf-bd20-d91523e4fd08@intel.com>
In-Reply-To: <27c2acfb-a428-486a-bd10-7d34a8cae4ed@linux.intel.com>

On 2024/11/7 16:04, Baolu Lu wrote:
> On 2024/11/7 15:57, Yi Liu wrote:
>> On 2024/11/7 14:53, Baolu Lu wrote:
>>> On 2024/11/7 14:46, Yi Liu wrote:
>>>> On 2024/11/7 13:46, Tian, Kevin wrote:
>>>>>> From: Liu, Yi L <yi.l.liu@intel.com>
>>>>>> Sent: Thursday, November 7, 2024 12:21 PM
>>>>>>
>>>>>> On 2024/11/7 10:52, Baolu Lu wrote:
>>>>>>> On 11/6/24 23:45, Yi Liu wrote:
>>>>>>>> +int intel_pasid_replace_first_level(struct intel_iommu *iommu,
>>>>>>>> +                    struct device *dev, pgd_t *pgd,
>>>>>>>> +                    u32 pasid, u16 did, u16 old_did,
>>>>>>>> +                    int flags)
>>>>>>>> +{
>>>>>>>> +    struct pasid_entry *pte;
>>>>>>>> +
>>>>>>>> +    if (!ecap_flts(iommu->ecap)) {
>>>>>>>> +        pr_err("No first level translation support on %s\n",
>>>>>>>> +               iommu->name);
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if ((flags & PASID_FLAG_FL5LP) && !cap_fl5lp_support(iommu->cap)) {
>>>>>>>> +        pr_err("No 5-level paging support for first-level on %s\n",
>>>>>>>> +               iommu->name);
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    spin_lock(&iommu->lock);
>>>>>>>> +    pte = intel_pasid_get_entry(dev, pasid);
>>>>>>>> +    if (!pte) {
>>>>>>>> +        spin_unlock(&iommu->lock);
>>>>>>>> +        return -ENODEV;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    if (!pasid_pte_is_present(pte)) {
>>>>>>>> +        spin_unlock(&iommu->lock);
>>>>>>>> +        return -EINVAL;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    WARN_ON(old_did != pasid_get_domain_id(pte));
>>>>>>>> +
>>>>>>>> +    pasid_pte_config_first_level(iommu, pte, pgd, did, flags);
>>>>>>>> +    spin_unlock(&iommu->lock);
>>>>>>>> +
>>>>>>>> +    intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
>>>>>>>> +    intel_iommu_drain_pasid_prq(dev, pasid);
>>>>>>>> +
>>>>>>>> +    return 0;
>>>>>>>> +}
>>>>>>>
>>>>>>> pasid_pte_config_first_level() causes the pasid entry to transition
>>>>>>> from present to non-present and then to present. In this case, calling
>>>>>>> intel_pasid_flush_present() is not accurate, as it is only intended for
>>>>>>> pasid entries transitioning from present to present, according to the
>>>>>>> specification.
>>>>>>>
>>>>>>> It's recommended to move pasid_clear_entry(pte) and
>>>>>>> pasid_set_present(pte) out to the caller, so ...
>>>>>>>
>>>>>>> For setup case (pasid from non-present to present):
>>>>>>>
>>>>>>> - pasid_clear_entry(pte)
>>>>>>> - pasid_pte_config_first_level(pte)
>>>>>>> - pasid_set_present(pte)
>>>>>>> - cache invalidations
>>>>>>>
>>>>>>> For replace case (pasid from present to present)
>>>>>>>
>>>>>>> - pasid_pte_config_first_level(pte)
>>>>>>> - cache invalidations
>>>>>>>
>>>>>>> The same applies to other types of setup and replace.
>>>>>>
>>>>>> Hmm. Here is why I did it the way this patch does:
>>>>>> 1) pasid_clear_entry() clears all the fields that are not supposed to
>>>>>>      be used by the new domain. For example, when converting a nested
>>>>>>      domain to an SS-only domain, without pasid_clear_entry() the FSPTR
>>>>>>      would be left in place. Although the spec does not seem to enforce
>>>>>>      it, it might be good to clear it.
>>>>>> 2) We don't support atomic replace yet, so the whole pasid entry
>>>>>>      transition is not done in one shot, and a stepwise transition
>>>>>>      looks OK.
>>>>>> 3) It seems even worse to keep the Present bit set during the
>>>>>>      transition. The pasid entry might be broken while the Present bit
>>>>>>      indicates that it is a valid pasid entry. If there is in-flight
>>>>>>      DMA, the result may be unpredictable.
>>>>>>
>>>>>> Based on the above, I chose the current way. But I admit that if we are
>>>>>> going to support atomic replace, we should refactor a bit. I believe at
>>>>>> that time we need to construct the new pasid entry first and then
>>>>>> exchange it into the pasid table. I can see that some transitions can be
>>>>>> done that way, as we can do an atomic exchange with 128 bits. Thoughts? 🙂
>>>>>>
>>>>>
>>>>> yes 128bit cmpxchg is necessary to support atomic replacement.
>>>>>
>>>>> Actually vt-d spec clearly says so e.g. SSPTPTR/DID must be updated
>>>>> together in a present entry to not break in-flight DMA.
>>>>>
>>>>> but... your current way (clear the entry, then update it) also breaks
>>>>> in-flight DMA. So let's admit that the 1st step is not aimed at
>>>>> supporting atomic replacement. With that, Baolu's suggestion makes more
>>>>> sense toward future extension, with less refactoring required (otherwise
>>>>> you should not use intel_pasid_flush_present(), and then the earlier
>>>>> refactoring for that helper is also meaningless).
>>>>
>>>> I see. The pasid entry might have some fields that are not supposed to be
>>>> used after replacement. Should we have a comment about it?
>>>
>>> I guess all fields except SSADE and P of a pasid table entry should be
>>> cleared in pasid_pte_config_first_level()?
>>
>> perhaps we can take one more step forward. We can construct the new pte 
>> in a local variable first and then push it to the pte in the pasid table. :)
>>
> 
> That sounds better! The entry is composed on the stack and then copied
> over to the pasid table as a whole.

That's it. Something like the below.

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 6841b9892d55..0f2a926d3bd5 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -389,6 +389,50 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
  	return 0;
  }

+int intel_pasid_replace_first_level(struct intel_iommu *iommu,
+				    struct device *dev, pgd_t *pgd,
+				    u32 pasid, u16 did, u16 old_did,
+				    int flags)
+{
+	struct pasid_entry *pte, new_pte;
+
+	if (!ecap_flts(iommu->ecap)) {
+		pr_err("No first level translation support on %s\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	if ((flags & PASID_FLAG_FL5LP) && !cap_fl5lp_support(iommu->cap)) {
+		pr_err("No 5-level paging support for first-level on %s\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pasid_pte_config_first_level(iommu, &new_pte, pgd, did, flags);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
  /*
   * Set up the scalable mode pasid entry for second only translation type.
   */
@@ -456,6 +500,57 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
  	return 0;
  }

+int intel_pasid_replace_second_level(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u16 old_did,
+				     u32 pasid)
+{
+	struct pasid_entry *pte, new_pte;
+	struct dma_pte *pgd;
+	u64 pgd_val;
+	u16 did;
+
+	/*
+	 * If hardware advertises no support for second level
+	 * translation, return directly.
+	 */
+	if (!ecap_slts(iommu->ecap)) {
+		pr_err("No second level translation support on %s\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pgd = domain->pgd;
+	pgd_val = virt_to_phys(pgd);
+	did = domain_id_iommu(domain, iommu);
+
+	pasid_pte_config_second_level(iommu, &new_pte, pgd_val,
+				      domain->agaw, did,
+				      domain->dirty_tracking);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
  /*
   * Set up dirty tracking on a second only or nested translation type.
   */
@@ -568,6 +663,38 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
  	return 0;
  }

+int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
+				     struct device *dev, u16 old_did,
+				     u32 pasid)
+{
+	struct pasid_entry *pte, new_pte;
+	u16 did = FLPT_DEFAULT_DID;
+
+	pasid_pte_config_pass_through(iommu, &new_pte, did);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
  /*
   * Set the page snoop control for a pasid entry which has been set up.
   */
@@ -698,6 +825,69 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
  	return 0;
  }

+int intel_pasid_replace_nested(struct intel_iommu *iommu,
+			       struct device *dev, u32 pasid,
+			       u16 old_did, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct pasid_entry *pte, new_pte;
+
+	/* Address width should match the address width supported by hardware */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+	case ADDR_WIDTH_5LEVEL:
+		if (!cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+	default:
+		dev_err_ratelimited(dev, "Invalid stage-1 address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	pasid_pte_config_nestd(iommu, &new_pte, s1_cfg, s2_domain, did);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
  /*
   * Interfaces to setup or teardown a pasid table to the scalable-mode
   * context table entry:
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index dde6d3ba5ae0..082f4fe20216 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -303,6 +296,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
  				   struct device *dev, u32 pasid);
  int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
  			     u32 pasid, struct dmar_domain *domain);
+int intel_pasid_replace_first_level(struct intel_iommu *iommu,
+				    struct device *dev, pgd_t *pgd,
+				    u32 pasid, u16 did, u16 old_did,
+				    int flags);
+int intel_pasid_replace_second_level(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u16 old_did,
+				     u32 pasid);
+int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
+				     struct device *dev, u16 old_did,
+				     u32 pasid);
+int intel_pasid_replace_nested(struct intel_iommu *iommu,
+			       struct device *dev, u32 pasid,
+			       u16 old_did, struct dmar_domain *domain);
+
  void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
  				 struct device *dev, u32 pasid,
  				 bool fault_ignore);
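
All four replace helpers above follow the same shape: compose the new
entry in a local variable, take the lock, check that the existing entry
is present, copy the new entry over it as a whole, then flush. A minimal
user-space sketch of that shape follows. It is not kernel code: struct
model_entry, model_config(), model_did(), model_replace() and the bit
layout are hypothetical names made up for illustration, and a pthread
mutex stands in for iommu->lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for a 512-bit pasid table entry (assumption). */
struct model_entry {
	uint64_t val[8];
};

/* Assumed layout for this sketch only: P is bit 0 of val[0]. */
#define MODEL_PRESENT	(1ULL << 0)

/* Stand-in for iommu->lock. */
static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;

/* Build a complete entry in caller-provided storage; no table access. */
static void model_config(struct model_entry *e, uint64_t pgd, uint16_t did)
{
	/* Start from zero so no field of the previous domain survives. */
	memset(e, 0, sizeof(*e));
	e->val[0] = (pgd & ~0xfffULL) | MODEL_PRESENT;
	e->val[1] = did;
}

static uint16_t model_did(const struct model_entry *e)
{
	return (uint16_t)(e->val[1] & 0xffff);
}

/* Replace path: the slot must already be present. */
static int model_replace(struct model_entry *slot, uint64_t pgd,
			 uint16_t did, uint16_t old_did)
{
	struct model_entry new_entry;

	/* Compose on the stack, outside the lock. */
	model_config(&new_entry, pgd, did);

	pthread_mutex_lock(&model_lock);
	if (!(slot->val[0] & MODEL_PRESENT)) {
		pthread_mutex_unlock(&model_lock);
		return -EINVAL;
	}

	/* Mirrors the WARN_ON() in the patch: report, but carry on. */
	if (model_did(slot) != old_did)
		fprintf(stderr, "unexpected old domain id\n");

	/* Copy the whole entry; this is a plain copy, not an atomic swap. */
	*slot = new_entry;
	pthread_mutex_unlock(&model_lock);

	/* Cache invalidation and PRQ draining would follow here. */
	return 0;
}
```

In this model, calling model_replace() on a zeroed slot returns -EINVAL,
while calling it on a slot previously filled by model_config() overwrites
every word of the slot, including words the new configuration does not
use. As in the patch, the struct assignment is not a single atomic
update, so this sketch does not provide atomic replacement either.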

> With these two issues addressed, do u mind sending a new version? Let's
> try to catch the pull request window. There are other series (iommufd
> and vfio) about user space PASID support depending on this.

yes, for sure.

-- 
Regards,
Yi Liu

Thread overview: 25+ messages
2024-11-06 15:45 [PATCH v5 00/13] Make set_dev_pasid op supporting domain replacement Yi Liu
2024-11-06 15:45 ` [PATCH v5 01/13] iommu: Pass old domain to set_dev_pasid op Yi Liu
2024-11-06 15:45 ` [PATCH v5 02/13] iommu/vt-d: Add a helper to flush cache for updating present pasid entry Yi Liu
2024-11-06 15:45 ` [PATCH v5 03/13] iommu/vt-d: Refactor the pasid setup helpers Yi Liu
2024-11-06 15:45 ` [PATCH v5 04/13] iommu/vt-d: Add pasid replace helpers Yi Liu
2024-11-07  2:52   ` Baolu Lu
2024-11-07  4:21     ` Yi Liu
2024-11-07  5:46       ` Tian, Kevin
2024-11-07  6:46         ` Yi Liu
2024-11-07  6:53           ` Baolu Lu
2024-11-07  7:57             ` Yi Liu
2024-11-07  8:04               ` Baolu Lu
2024-11-07  8:39                 ` Yi Liu [this message]
2024-11-07  8:41                   ` Baolu Lu
2024-11-07  2:57   ` Baolu Lu
2024-11-07  4:05     ` Yi Liu
2024-11-06 15:45 ` [PATCH v5 05/13] iommu/vt-d: Consolidate the struct dev_pasid_info add/remove Yi Liu
2024-11-06 15:45 ` [PATCH v5 06/13] iommu/vt-d: Add iommu_domain_did() to get did Yi Liu
2024-11-06 15:46 ` [PATCH v5 07/13] iommu/vt-d: Make intel_iommu_set_dev_pasid() to handle domain replacement Yi Liu
2024-11-06 15:46 ` [PATCH v5 08/13] iommu/vt-d: Limit intel_iommu_set_dev_pasid() for paging domain Yi Liu
2024-11-06 15:46 ` [PATCH v5 09/13] iommu/vt-d: Make intel_svm_set_dev_pasid() support domain replacement Yi Liu
2024-11-06 15:46 ` [PATCH v5 10/13] iommu/vt-d: Make identity_domain_set_dev_pasid() to handle " Yi Liu
2024-11-06 15:46 ` [PATCH v5 11/13] iommu/vt-d: Add set_dev_pasid callback for nested domain Yi Liu
2024-11-06 15:46 ` [PATCH v5 12/13] iommu/arm-smmu-v3: Make set_dev_pasid() op support replace Yi Liu
2024-11-06 15:46 ` [PATCH v5 13/13] iommu: Make set_dev_pasid op support domain replacement Yi Liu
