public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
@ 2024-08-26 11:40 Klaus Jensen
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM Klaus Jensen
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Klaus Jensen @ 2024-08-26 11:40 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: Minwoo Im, linux-kernel, iommu, Joel Granados, Klaus Jensen

This is a Request for Comments series that we hope will generate
initial feedback on using iommufd_hwpt_replace_device to handle
non-nested, non-SVM user space IOPFs. Our main motivation is to enable
user-space-driver-driven device verification with the default PASID,
without nesting or SVM.

What?
  * Enable IO page fault handling in user space in a non-nested, non-svm
    and non-virtualised use case.
  * Remove the dependency between IOPF and INTEL_IOMMU_SVM by allowing
    the user to (de)select the IOPF code through Kconfig.
  * Create a new file under iommu/intel (prq.c) that contains all the
    page request queue related logic, moved out of intel/svm.c.
  * Add the IOMMU_HWPT_FAULT_ID_VALID to the valid flags used to create
    IOMMU_HWPT_ALLOC allocations.
  * Create a default (zero) pasid handle and insert it to the pasid
    array within the dev->iommu_group when replacing the old HWPT with
    an iopf enabled HWPT.

Why?
  The PCI ATS Extended Capability allows peripheral devices to
  participate in the caching of translations when operating under an
  IOMMU. Further, the ATS Page Request Interface (PRI) Extension allows
  devices to handle missing mappings. Currently, PRI is mainly used in
  the context of Shared Virtual Addressing, requiring support for the
  Process Address Space Identifier (PASID) capability. However, other
  use cases, such as enabling user-space-driver-driven device
  verification and reducing memory pinning, exist. This patchset sets
  out to enable these use cases.

Testing?
  The non-nested/non-SVM IOPF interface is exercised by first
  initializing an IOPF-enabled IOAS and then reading the fault file
  descriptor. Pseudocode for the IOPF initialization and handling is in
  [3] and [4] (using libvfn).

  Supplementary repositories supporting this patchset:
    1. A user space library libvfn [1] which is used for testing and
       verification (see examples/iopf.c), and
    2. Basic emulation of PCIe ATS/PRI and Intel VT-d PRQ in QEMU [2].

Notes
  Patches 5 and 6 were added by Klaus for testing against the QEMU test
  device (which does not support PASID). They are very much RFC.

Comments and feedback are greatly appreciated.

Best

Joel

PS: I'm on PTO, so my answers might be delayed (back September 2nd), but
    I'll prioritize answering any questions or feedback when I see
    them.

[1] https://github.com/SamsungDS/libvfn/tree/iommufd-fault-queue
[2] https://gitlab.com/birkelund/qemu/-/tree/pcie-ats-pri

[3] Initializing
```
  int iopf_init(struct iommu_ioas *ioas, const char *bdf)
  {
      // open the vfio cdev node for the device identified by bdf
      int devfd = open("/dev/vfio/devices/VFIO_DEV", O_RDWR);

      // __iommufd is the iommufd context fd, opened elsewhere
      struct vfio_device_bind_iommufd bind = {
          .argsz = sizeof(bind),
          .flags = 0,
          .iommufd = __iommufd,
      };
      ioctl(devfd, VFIO_DEVICE_BIND_IOMMUFD, &bind);

      struct vfio_device_attach_iommufd_pt attach_data = {
          .argsz = sizeof(attach_data),
          .flags = 0,
          .pt_id = ioas->id,
      };
      ioctl(devfd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);

      struct iommu_fault_alloc fault = {
          .size = sizeof(fault),
          .flags = 0,
      };
      ioctl(__iommufd, IOMMU_FAULT_QUEUE_ALLOC, &fault);

      struct iommu_hwpt_alloc fault_cmd = {
          .size = sizeof(fault_cmd),
          .flags = IOMMU_HWPT_FAULT_ID_VALID,
          .dev_id = bind.out_devid,
          .pt_id = ioas->id,
          .data_len = 0,
          .data_uptr = (uint64_t)NULL,
          .fault_id = fault.out_fault_id,
          .__reserved = 0,
      };
      ioctl(__iommufd, IOMMU_HWPT_ALLOC, &fault_cmd);

      // re-attach the device, replacing the old HWPT with the
      // iopf enabled one
      struct vfio_device_attach_iommufd_pt attach = {
          .argsz = sizeof(attach),
          .flags = 0,
          .pt_id = fault_cmd.out_hwpt_id,
      };
      ioctl(devfd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach);

      return 0;
  }
```

[4] Handling
```
  int handle_iopf(void *vaddr, int len, uint64_t iova) {
    exec_command(CMD);

    // fault fd returned by the fault queue allocation in [3]
    int iopf_fd = fault.out_fault_fd;

    struct iommu_hwpt_pgfault pgfault = {0};
    if (read(iopf_fd, &pgfault, sizeof(pgfault)) == 0)
      return 0; // no page fault

    // iommu_map_vaddr() returns 0 on success
    int ret = iommu_map_vaddr(__iommufd, vaddr, len, &iova);
    struct iommu_hwpt_page_response pgfault_response = {
      .cookie = pgfault.cookie,
      .code = ret ? IOMMUFD_PAGE_RESP_INVALID : IOMMUFD_PAGE_RESP_SUCCESS,
    };

    write(iopf_fd, &pgfault_response, sizeof(pgfault_response));

    return 0;
  }
```

Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
Joel Granados (4):
      iommu/vt-d: Separate page request queue from SVM
      iommu: Make IOMMU_IOPF selectable in Kconfig
      iommufd: Enable PRI when doing the iommufd_hwpt_alloc
      iommu: init pasid array while doing domain_replace and iopf is active

Klaus Jensen (2):
      iommu/vt-d: drop pasid requirement for prq initialization
      iommu/vt-d: do not require a PASID in page requests

 drivers/iommu/Kconfig                |   2 +-
 drivers/iommu/intel/Kconfig          |   1 -
 drivers/iommu/intel/Makefile         |   2 +-
 drivers/iommu/intel/iommu.c          |  29 ++--
 drivers/iommu/intel/iommu.h          |  40 ++++-
 drivers/iommu/intel/prq.c            | 284 ++++++++++++++++++++++++++++++++
 drivers/iommu/intel/svm.c            | 308 -----------------------------------
 drivers/iommu/iommu-priv.h           |   3 +
 drivers/iommu/iommu.c                |  31 ++++
 drivers/iommu/iommufd/fault.c        |  22 +++
 drivers/iommu/iommufd/hw_pagetable.c |   3 +-
 11 files changed, 389 insertions(+), 336 deletions(-)
---
base-commit: 3d5f968a177d468cd13568ef901c5be84d83d32b
change-id: 20240823-iopf-for-all-3b19075efc32

Best regards,
-- 
Klaus Jensen <k.jensen@samsung.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM
  2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
@ 2024-08-26 11:40 ` Klaus Jensen
  2024-08-26 13:09   ` Baolu Lu
  2024-09-01  5:16   ` Baolu Lu
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 2/6] iommu: Make IOMMU_IOPF selectable in Kconfig Klaus Jensen
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Klaus Jensen @ 2024-08-26 11:40 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: Minwoo Im, linux-kernel, iommu, Joel Granados, Klaus Jensen

From: Joel Granados <j.granados@samsung.com>

Prepare for IO page faults to no longer depend on
CONFIG_INTEL_IOMMU_SVM. Move all Page Request Queue (PRQ) handling out
of intel/svm.c and into a new file, drivers/iommu/intel/prq.c. The
page_req_dsc struct is made available in drivers/iommu/intel/iommu.h.

No functional changes are intended. This is a preparation patch to
enable the use of IO page faults outside the SVM and nested use cases.

Signed-off-by: Joel Granados <j.granados@samsung.com>
---
 drivers/iommu/intel/Makefile |   2 +-
 drivers/iommu/intel/iommu.c  |  18 +--
 drivers/iommu/intel/iommu.h  |  40 +++++-
 drivers/iommu/intel/prq.c    | 290 ++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/svm.c    | 308 -------------------------------------------
 5 files changed, 331 insertions(+), 327 deletions(-)

diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index c8beb0281559..d3bb0798092d 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -1,6 +1,6 @@
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o prq.o
 obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
 obj-$(CONFIG_DMAR_PERF) += perf.o
 obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9ff8b83c19a3..4ca284d53a6b 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1487,12 +1487,10 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 	/* free context mapping */
 	free_context_table(iommu);
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
 	if (pasid_supported(iommu)) {
 		if (ecap_prs(iommu->ecap))
-			intel_svm_finish_prq(iommu);
+			intel_finish_prq(iommu);
 	}
-#endif
 }
 
 /*
@@ -2480,19 +2478,18 @@ static int __init init_dmars(void)
 
 		iommu_flush_write_buffer(iommu);
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
 		if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
 			/*
 			 * Call dmar_alloc_hwirq() with dmar_global_lock held,
 			 * could cause possible lock race condition.
 			 */
 			up_write(&dmar_global_lock);
-			ret = intel_svm_enable_prq(iommu);
+			ret = intel_enable_prq(iommu);
 			down_write(&dmar_global_lock);
 			if (ret)
 				goto free_iommu;
 		}
-#endif
+
 		ret = dmar_set_interrupt(iommu);
 		if (ret)
 			goto free_iommu;
@@ -2922,13 +2919,12 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
 	intel_iommu_init_qi(iommu);
 	iommu_flush_write_buffer(iommu);
 
-#ifdef CONFIG_INTEL_IOMMU_SVM
 	if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
-		ret = intel_svm_enable_prq(iommu);
+		ret = intel_enable_prq(iommu);
 		if (ret)
 			goto disable_iommu;
 	}
-#endif
+
 	ret = dmar_set_interrupt(iommu);
 	if (ret)
 		goto disable_iommu;
@@ -4669,9 +4665,7 @@ const struct iommu_ops intel_iommu_ops = {
 	.def_domain_type	= device_def_domain_type,
 	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
 	.pgsize_bitmap		= SZ_4K,
-#ifdef CONFIG_INTEL_IOMMU_SVM
-	.page_response		= intel_svm_page_response,
-#endif
+	.page_response		= intel_page_response,
 	.default_domain_ops = &(const struct iommu_domain_ops) {
 		.attach_dev		= intel_iommu_attach_device,
 		.set_dev_pasid		= intel_iommu_set_dev_pasid,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index b67c14da1240..b3d98e706ed8 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -694,6 +694,35 @@ struct iommu_pmu {
 #define IOMMU_IRQ_ID_OFFSET_PRQ		(DMAR_UNITS_SUPPORTED)
 #define IOMMU_IRQ_ID_OFFSET_PERF	(2 * DMAR_UNITS_SUPPORTED)
 
+/* Page request queue descriptor */
+struct page_req_dsc {
+	union {
+		struct {
+			u64 type:8;
+			u64 pasid_present:1;
+			u64 rsvd:7;
+			u64 rid:16;
+			u64 pasid:20;
+			u64 exe_req:1;
+			u64 pm_req:1;
+			u64 rsvd2:10;
+		};
+		u64 qw_0;
+	};
+	union {
+		struct {
+			u64 rd_req:1;
+			u64 wr_req:1;
+			u64 lpig:1;
+			u64 prg_index:9;
+			u64 addr:52;
+		};
+		u64 qw_1;
+	};
+	u64 qw_2;
+	u64 qw_3;
+};
+
 struct intel_iommu {
 	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
 	u64 		reg_phys; /* physical address of hw register set */
@@ -719,12 +748,10 @@ struct intel_iommu {
 
 	struct iommu_flush flush;
 #endif
-#ifdef CONFIG_INTEL_IOMMU_SVM
 	struct page_req_dsc *prq;
 	unsigned char prq_name[16];    /* Name for PRQ interrupt */
 	unsigned long prq_seq_number;
 	struct completion prq_complete;
-#endif
 	struct iopf_queue *iopf_queue;
 	unsigned char iopfq_name[16];
 	/* Synchronization between fault report and iommu device release. */
@@ -1156,12 +1183,13 @@ void intel_context_flush_present(struct device_domain_info *info,
 				 struct context_entry *context,
 				 bool affect_domains);
 
+int intel_enable_prq(struct intel_iommu *iommu);
+int intel_finish_prq(struct intel_iommu *iommu);
+void intel_page_response(struct device *dev, struct iopf_fault *evt,
+			struct iommu_page_response *msg);
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
-int intel_svm_enable_prq(struct intel_iommu *iommu);
-int intel_svm_finish_prq(struct intel_iommu *iommu);
-void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
-			     struct iommu_page_response *msg);
 struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
 					    struct mm_struct *mm);
 void intel_drain_pasid_prq(struct device *dev, u32 pasid);
diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
new file mode 100644
index 000000000000..2814373e95d8
--- /dev/null
+++ b/drivers/iommu/intel/prq.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright © 2015 Intel Corporation.
+ *
+ * Authors: David Woodhouse <dwmw2@infradead.org>
+ */
+
+#include <linux/pci.h>
+
+#include "iommu.h"
+#include "../iommu-pages.h"
+#include "trace.h"
+
+static bool is_canonical_address(u64 addr)
+{
+	int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
+	long saddr = (long) addr;
+
+	return (((saddr << shift) >> shift) == saddr);
+}
+
+static void handle_bad_prq_event(struct intel_iommu *iommu,
+				 struct page_req_dsc *req, int result)
+{
+	struct qi_desc desc = { };
+
+	pr_err("%s: Invalid page request: %08llx %08llx\n",
+	       iommu->name, ((unsigned long long *)req)[0],
+	       ((unsigned long long *)req)[1]);
+
+	if (!req->lpig)
+		return;
+
+	desc.qw0 = QI_PGRP_PASID(req->pasid) |
+			QI_PGRP_DID(req->rid) |
+			QI_PGRP_PASID_P(req->pasid_present) |
+			QI_PGRP_RESP_CODE(result) |
+			QI_PGRP_RESP_TYPE;
+	desc.qw1 = QI_PGRP_IDX(req->prg_index) |
+			QI_PGRP_LPIG(req->lpig);
+
+	qi_submit_sync(iommu, &desc, 1, 0);
+}
+
+static int prq_to_iommu_prot(struct page_req_dsc *req)
+{
+	int prot = 0;
+
+	if (req->rd_req)
+		prot |= IOMMU_FAULT_PERM_READ;
+	if (req->wr_req)
+		prot |= IOMMU_FAULT_PERM_WRITE;
+	if (req->exe_req)
+		prot |= IOMMU_FAULT_PERM_EXEC;
+	if (req->pm_req)
+		prot |= IOMMU_FAULT_PERM_PRIV;
+
+	return prot;
+}
+
+static void intel_prq_report(struct intel_iommu *iommu, struct device *dev,
+				 struct page_req_dsc *desc)
+{
+	struct iopf_fault event = { };
+
+	/* Fill in event data for device specific processing */
+	event.fault.type = IOMMU_FAULT_PAGE_REQ;
+	event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
+	event.fault.prm.pasid = desc->pasid;
+	event.fault.prm.grpid = desc->prg_index;
+	event.fault.prm.perm = prq_to_iommu_prot(desc);
+
+	if (desc->lpig)
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+	if (desc->pasid_present) {
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+		event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
+	}
+
+	iommu_report_device_fault(dev, &event);
+}
+
+static irqreturn_t prq_event_thread(int irq, void *d)
+{
+	struct intel_iommu *iommu = d;
+	struct page_req_dsc *req;
+	int head, tail, handled;
+	struct device *dev;
+	u64 address;
+
+	/*
+	 * Clear PPR bit before reading head/tail registers, to ensure that
+	 * we get a new interrupt if needed.
+	 */
+	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
+
+	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+	handled = (head != tail);
+	while (head != tail) {
+		req = &iommu->prq[head / sizeof(*req)];
+		address = (u64)req->addr << VTD_PAGE_SHIFT;
+
+		if (unlikely(!req->pasid_present)) {
+			pr_err("IOMMU: %s: Page request without PASID\n",
+			       iommu->name);
+bad_req:
+			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
+			goto prq_advance;
+		}
+
+		if (unlikely(!is_canonical_address(address))) {
+			pr_err("IOMMU: %s: Address is not canonical\n",
+			       iommu->name);
+			goto bad_req;
+		}
+
+		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {
+			pr_err("IOMMU: %s: Page request in Privilege Mode\n",
+			       iommu->name);
+			goto bad_req;
+		}
+
+		if (unlikely(req->exe_req && req->rd_req)) {
+			pr_err("IOMMU: %s: Execution request not supported\n",
+			       iommu->name);
+			goto bad_req;
+		}
+
+		/* Drop Stop Marker message. No need for a response. */
+		if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
+			goto prq_advance;
+
+		/*
+		 * If prq is to be handled outside iommu driver via receiver of
+		 * the fault notifiers, we skip the page response here.
+		 */
+		mutex_lock(&iommu->iopf_lock);
+		dev = device_rbtree_find(iommu, req->rid);
+		if (!dev) {
+			mutex_unlock(&iommu->iopf_lock);
+			goto bad_req;
+		}
+
+		intel_prq_report(iommu, dev, req);
+		trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
+				 req->qw_2, req->qw_3,
+				 iommu->prq_seq_number++);
+		mutex_unlock(&iommu->iopf_lock);
+prq_advance:
+		head = (head + sizeof(*req)) & PRQ_RING_MASK;
+	}
+
+	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
+
+	/*
+	 * Clear the page request overflow bit and wake up all threads that
+	 * are waiting for the completion of this handling.
+	 */
+	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
+		pr_info_ratelimited("IOMMU: %s: PRQ overflow detected\n",
+				    iommu->name);
+		head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
+		tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
+		if (head == tail) {
+			iopf_queue_discard_partial(iommu->iopf_queue);
+			writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
+			pr_info_ratelimited("IOMMU: %s: PRQ overflow cleared",
+					    iommu->name);
+		}
+	}
+
+	if (!completion_done(&iommu->prq_complete))
+		complete(&iommu->prq_complete);
+
+	return IRQ_RETVAL(handled);
+}
+
+int intel_enable_prq(struct intel_iommu *iommu)
+{
+	struct iopf_queue *iopfq;
+	int irq, ret;
+
+	iommu->prq = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, PRQ_ORDER);
+	if (!iommu->prq) {
+		pr_warn("IOMMU: %s: Failed to allocate page request queue\n",
+			iommu->name);
+		return -ENOMEM;
+	}
+
+	irq = dmar_alloc_hwirq(IOMMU_IRQ_ID_OFFSET_PRQ + iommu->seq_id, iommu->node, iommu);
+	if (irq <= 0) {
+		pr_err("IOMMU: %s: Failed to create IRQ vector for page request queue\n",
+		       iommu->name);
+		ret = -EINVAL;
+		goto free_prq;
+	}
+	iommu->pr_irq = irq;
+
+	snprintf(iommu->iopfq_name, sizeof(iommu->iopfq_name),
+		 "dmar%d-iopfq", iommu->seq_id);
+	iopfq = iopf_queue_alloc(iommu->iopfq_name);
+	if (!iopfq) {
+		pr_err("IOMMU: %s: Failed to allocate iopf queue\n", iommu->name);
+		ret = -ENOMEM;
+		goto free_hwirq;
+	}
+	iommu->iopf_queue = iopfq;
+
+	snprintf(iommu->prq_name, sizeof(iommu->prq_name), "dmar%d-prq", iommu->seq_id);
+
+	ret = request_threaded_irq(irq, NULL, prq_event_thread, IRQF_ONESHOT,
+				   iommu->prq_name, iommu);
+	if (ret) {
+		pr_err("IOMMU: %s: Failed to request IRQ for page request queue\n",
+		       iommu->name);
+		goto free_iopfq;
+	}
+	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
+
+	init_completion(&iommu->prq_complete);
+
+	return 0;
+
+free_iopfq:
+	iopf_queue_free(iommu->iopf_queue);
+	iommu->iopf_queue = NULL;
+free_hwirq:
+	dmar_free_hwirq(irq);
+	iommu->pr_irq = 0;
+free_prq:
+	iommu_free_pages(iommu->prq, PRQ_ORDER);
+	iommu->prq = NULL;
+
+	return ret;
+}
+
+int intel_finish_prq(struct intel_iommu *iommu)
+{
+	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
+	dmar_writeq(iommu->reg + DMAR_PQA_REG, 0ULL);
+
+	if (iommu->pr_irq) {
+		free_irq(iommu->pr_irq, iommu);
+		dmar_free_hwirq(iommu->pr_irq);
+		iommu->pr_irq = 0;
+	}
+
+	if (iommu->iopf_queue) {
+		iopf_queue_free(iommu->iopf_queue);
+		iommu->iopf_queue = NULL;
+	}
+
+	iommu_free_pages(iommu->prq, PRQ_ORDER);
+	iommu->prq = NULL;
+
+	return 0;
+}
+
+void intel_page_response(struct device *dev, struct iopf_fault *evt,
+			 struct iommu_page_response *msg)
+{
+	struct device_domain_info *info = dev_iommu_priv_get(dev);
+	struct intel_iommu *iommu = info->iommu;
+	u8 bus = info->bus, devfn = info->devfn;
+	struct iommu_fault_page_request *prm;
+	struct qi_desc desc;
+	bool pasid_present;
+	bool last_page;
+	u16 sid;
+
+	prm = &evt->fault.prm;
+	sid = PCI_DEVID(bus, devfn);
+	pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
+	last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
+
+	desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) |
+			QI_PGRP_PASID_P(pasid_present) |
+			QI_PGRP_RESP_CODE(msg->code) |
+			QI_PGRP_RESP_TYPE;
+	desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
+	desc.qw2 = 0;
+	desc.qw3 = 0;
+
+	qi_submit_sync(iommu, &desc, 1, 0);
+}
+
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 0e3a9b38bef2..c50c68df1fb2 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -25,92 +25,6 @@
 #include "../iommu-pages.h"
 #include "trace.h"
 
-static irqreturn_t prq_event_thread(int irq, void *d);
-
-int intel_svm_enable_prq(struct intel_iommu *iommu)
-{
-	struct iopf_queue *iopfq;
-	int irq, ret;
-
-	iommu->prq = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, PRQ_ORDER);
-	if (!iommu->prq) {
-		pr_warn("IOMMU: %s: Failed to allocate page request queue\n",
-			iommu->name);
-		return -ENOMEM;
-	}
-
-	irq = dmar_alloc_hwirq(IOMMU_IRQ_ID_OFFSET_PRQ + iommu->seq_id, iommu->node, iommu);
-	if (irq <= 0) {
-		pr_err("IOMMU: %s: Failed to create IRQ vector for page request queue\n",
-		       iommu->name);
-		ret = -EINVAL;
-		goto free_prq;
-	}
-	iommu->pr_irq = irq;
-
-	snprintf(iommu->iopfq_name, sizeof(iommu->iopfq_name),
-		 "dmar%d-iopfq", iommu->seq_id);
-	iopfq = iopf_queue_alloc(iommu->iopfq_name);
-	if (!iopfq) {
-		pr_err("IOMMU: %s: Failed to allocate iopf queue\n", iommu->name);
-		ret = -ENOMEM;
-		goto free_hwirq;
-	}
-	iommu->iopf_queue = iopfq;
-
-	snprintf(iommu->prq_name, sizeof(iommu->prq_name), "dmar%d-prq", iommu->seq_id);
-
-	ret = request_threaded_irq(irq, NULL, prq_event_thread, IRQF_ONESHOT,
-				   iommu->prq_name, iommu);
-	if (ret) {
-		pr_err("IOMMU: %s: Failed to request IRQ for page request queue\n",
-		       iommu->name);
-		goto free_iopfq;
-	}
-	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
-
-	init_completion(&iommu->prq_complete);
-
-	return 0;
-
-free_iopfq:
-	iopf_queue_free(iommu->iopf_queue);
-	iommu->iopf_queue = NULL;
-free_hwirq:
-	dmar_free_hwirq(irq);
-	iommu->pr_irq = 0;
-free_prq:
-	iommu_free_pages(iommu->prq, PRQ_ORDER);
-	iommu->prq = NULL;
-
-	return ret;
-}
-
-int intel_svm_finish_prq(struct intel_iommu *iommu)
-{
-	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
-	dmar_writeq(iommu->reg + DMAR_PQA_REG, 0ULL);
-
-	if (iommu->pr_irq) {
-		free_irq(iommu->pr_irq, iommu);
-		dmar_free_hwirq(iommu->pr_irq);
-		iommu->pr_irq = 0;
-	}
-
-	if (iommu->iopf_queue) {
-		iopf_queue_free(iommu->iopf_queue);
-		iommu->iopf_queue = NULL;
-	}
-
-	iommu_free_pages(iommu->prq, PRQ_ORDER);
-	iommu->prq = NULL;
-
-	return 0;
-}
-
 void intel_svm_check(struct intel_iommu *iommu)
 {
 	if (!pasid_supported(iommu))
@@ -237,43 +151,6 @@ static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
 	return ret;
 }
 
-/* Page request queue descriptor */
-struct page_req_dsc {
-	union {
-		struct {
-			u64 type:8;
-			u64 pasid_present:1;
-			u64 rsvd:7;
-			u64 rid:16;
-			u64 pasid:20;
-			u64 exe_req:1;
-			u64 pm_req:1;
-			u64 rsvd2:10;
-		};
-		u64 qw_0;
-	};
-	union {
-		struct {
-			u64 rd_req:1;
-			u64 wr_req:1;
-			u64 lpig:1;
-			u64 prg_index:9;
-			u64 addr:52;
-		};
-		u64 qw_1;
-	};
-	u64 qw_2;
-	u64 qw_3;
-};
-
-static bool is_canonical_address(u64 addr)
-{
-	int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
-	long saddr = (long) addr;
-
-	return (((saddr << shift) >> shift) == saddr);
-}
-
 /**
  * intel_drain_pasid_prq - Drain page requests and responses for a pasid
  * @dev: target device
@@ -363,191 +240,6 @@ void intel_drain_pasid_prq(struct device *dev, u32 pasid)
 	}
 }
 
-static int prq_to_iommu_prot(struct page_req_dsc *req)
-{
-	int prot = 0;
-
-	if (req->rd_req)
-		prot |= IOMMU_FAULT_PERM_READ;
-	if (req->wr_req)
-		prot |= IOMMU_FAULT_PERM_WRITE;
-	if (req->exe_req)
-		prot |= IOMMU_FAULT_PERM_EXEC;
-	if (req->pm_req)
-		prot |= IOMMU_FAULT_PERM_PRIV;
-
-	return prot;
-}
-
-static void intel_svm_prq_report(struct intel_iommu *iommu, struct device *dev,
-				 struct page_req_dsc *desc)
-{
-	struct iopf_fault event = { };
-
-	/* Fill in event data for device specific processing */
-	event.fault.type = IOMMU_FAULT_PAGE_REQ;
-	event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
-	event.fault.prm.pasid = desc->pasid;
-	event.fault.prm.grpid = desc->prg_index;
-	event.fault.prm.perm = prq_to_iommu_prot(desc);
-
-	if (desc->lpig)
-		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
-	if (desc->pasid_present) {
-		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
-		event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
-	}
-
-	iommu_report_device_fault(dev, &event);
-}
-
-static void handle_bad_prq_event(struct intel_iommu *iommu,
-				 struct page_req_dsc *req, int result)
-{
-	struct qi_desc desc = { };
-
-	pr_err("%s: Invalid page request: %08llx %08llx\n",
-	       iommu->name, ((unsigned long long *)req)[0],
-	       ((unsigned long long *)req)[1]);
-
-	if (!req->lpig)
-		return;
-
-	desc.qw0 = QI_PGRP_PASID(req->pasid) |
-			QI_PGRP_DID(req->rid) |
-			QI_PGRP_PASID_P(req->pasid_present) |
-			QI_PGRP_RESP_CODE(result) |
-			QI_PGRP_RESP_TYPE;
-	desc.qw1 = QI_PGRP_IDX(req->prg_index) |
-			QI_PGRP_LPIG(req->lpig);
-
-	qi_submit_sync(iommu, &desc, 1, 0);
-}
-
-static irqreturn_t prq_event_thread(int irq, void *d)
-{
-	struct intel_iommu *iommu = d;
-	struct page_req_dsc *req;
-	int head, tail, handled;
-	struct device *dev;
-	u64 address;
-
-	/*
-	 * Clear PPR bit before reading head/tail registers, to ensure that
-	 * we get a new interrupt if needed.
-	 */
-	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
-
-	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
-	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
-	handled = (head != tail);
-	while (head != tail) {
-		req = &iommu->prq[head / sizeof(*req)];
-		address = (u64)req->addr << VTD_PAGE_SHIFT;
-
-		if (unlikely(!req->pasid_present)) {
-			pr_err("IOMMU: %s: Page request without PASID\n",
-			       iommu->name);
-bad_req:
-			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
-			goto prq_advance;
-		}
-
-		if (unlikely(!is_canonical_address(address))) {
-			pr_err("IOMMU: %s: Address is not canonical\n",
-			       iommu->name);
-			goto bad_req;
-		}
-
-		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {
-			pr_err("IOMMU: %s: Page request in Privilege Mode\n",
-			       iommu->name);
-			goto bad_req;
-		}
-
-		if (unlikely(req->exe_req && req->rd_req)) {
-			pr_err("IOMMU: %s: Execution request not supported\n",
-			       iommu->name);
-			goto bad_req;
-		}
-
-		/* Drop Stop Marker message. No need for a response. */
-		if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
-			goto prq_advance;
-
-		/*
-		 * If prq is to be handled outside iommu driver via receiver of
-		 * the fault notifiers, we skip the page response here.
-		 */
-		mutex_lock(&iommu->iopf_lock);
-		dev = device_rbtree_find(iommu, req->rid);
-		if (!dev) {
-			mutex_unlock(&iommu->iopf_lock);
-			goto bad_req;
-		}
-
-		intel_svm_prq_report(iommu, dev, req);
-		trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
-				 req->qw_2, req->qw_3,
-				 iommu->prq_seq_number++);
-		mutex_unlock(&iommu->iopf_lock);
-prq_advance:
-		head = (head + sizeof(*req)) & PRQ_RING_MASK;
-	}
-
-	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
-
-	/*
-	 * Clear the page request overflow bit and wake up all threads that
-	 * are waiting for the completion of this handling.
-	 */
-	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
-		pr_info_ratelimited("IOMMU: %s: PRQ overflow detected\n",
-				    iommu->name);
-		head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
-		tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
-		if (head == tail) {
-			iopf_queue_discard_partial(iommu->iopf_queue);
-			writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
-			pr_info_ratelimited("IOMMU: %s: PRQ overflow cleared",
-					    iommu->name);
-		}
-	}
-
-	if (!completion_done(&iommu->prq_complete))
-		complete(&iommu->prq_complete);
-
-	return IRQ_RETVAL(handled);
-}
-
-void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
-			     struct iommu_page_response *msg)
-{
-	struct device_domain_info *info = dev_iommu_priv_get(dev);
-	struct intel_iommu *iommu = info->iommu;
-	u8 bus = info->bus, devfn = info->devfn;
-	struct iommu_fault_page_request *prm;
-	struct qi_desc desc;
-	bool pasid_present;
-	bool last_page;
-	u16 sid;
-
-	prm = &evt->fault.prm;
-	sid = PCI_DEVID(bus, devfn);
-	pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
-	last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
-
-	desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) |
-			QI_PGRP_PASID_P(pasid_present) |
-			QI_PGRP_RESP_CODE(msg->code) |
-			QI_PGRP_RESP_TYPE;
-	desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
-	desc.qw2 = 0;
-	desc.qw3 = 0;
-
-	qi_submit_sync(iommu, &desc, 1, 0);
-}
-
 static void intel_svm_domain_free(struct iommu_domain *domain)
 {
 	struct dmar_domain *dmar_domain = to_dmar_domain(domain);

-- 
2.45.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC PREVIEW 2/6] iommu: Make IOMMU_IOPF selectable in Kconfig
  2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM Klaus Jensen
@ 2024-08-26 11:40 ` Klaus Jensen
  2024-08-26 14:05   ` Jason Gunthorpe
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 3/6] iommufd: Enable PRI when doing the iommufd_hwpt_alloc Klaus Jensen
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Klaus Jensen @ 2024-08-26 11:40 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: Minwoo Im, linux-kernel, iommu, Joel Granados, Klaus Jensen

From: Joel Granados <j.granados@samsung.com>

IOMMU_IOPF is no longer selected by INTEL_IOMMU_SVM, effectively
severing their relation and allowing them to be selected independently.

Signed-off-by: Joel Granados <j.granados@samsung.com>
---
 drivers/iommu/Kconfig       | 2 +-
 drivers/iommu/intel/Kconfig | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index a82f10054aec..d3ee8a0ad4a6 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -164,7 +164,7 @@ config IOMMU_SVA
 	bool
 
 config IOMMU_IOPF
-	bool
+	bool "Enable IO page fault in IOMMU"
 
 config FSL_PAMU
 	bool "Freescale IOMMU support"
diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index f52fb39c968e..fb5a4593c197 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -51,7 +51,6 @@ config INTEL_IOMMU_SVM
 	depends on X86_64
 	select MMU_NOTIFIER
 	select IOMMU_SVA
-	select IOMMU_IOPF
 	help
 	  Shared Virtual Memory (SVM) provides a facility for devices
 	  to access DMA resources through process address space by

-- 
2.45.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread
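
With the select dropped, IOMMU_IOPF becomes an independent prompt. A
configuration enabling user space IOPF without SVM might then look like
this (fragment, illustrative):

```
CONFIG_INTEL_IOMMU=y
CONFIG_IOMMU_IOPF=y
CONFIG_IOMMUFD=y
# CONFIG_INTEL_IOMMU_SVM is not set
```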

* [PATCH RFC PREVIEW 3/6] iommufd: Enable PRI when doing the iommufd_hwpt_alloc
  2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM Klaus Jensen
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 2/6] iommu: Make IOMMU_IOPF selectable in Kconfig Klaus Jensen
@ 2024-08-26 11:40 ` Klaus Jensen
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 4/6] iommu: init pasid array while doing domain_replace and iopf is active Klaus Jensen
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Klaus Jensen @ 2024-08-26 11:40 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: Minwoo Im, linux-kernel, iommu, Joel Granados, Klaus Jensen

From: Joel Granados <j.granados@samsung.com>

Add IOMMU_HWPT_FAULT_ID_VALID to the set of valid flags accepted by
iommufd_hwpt_alloc, allowing an iommu fault object (iommu_fault_alloc)
to be used with the IOMMU_HWPT_ALLOC ioctl.

Signed-off-by: Joel Granados <j.granados@samsung.com>
---
 drivers/iommu/intel/iommu.c          | 3 ++-
 drivers/iommu/iommufd/hw_pagetable.c | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 4ca284d53a6b..ada3507d2831 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3716,7 +3716,8 @@ intel_iommu_domain_alloc_user(struct device *dev, u32 flags,
 	}
 
 	if (flags &
-	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
+	    (~(IOMMU_HWPT_ALLOC_NEST_PARENT | IOMMU_HWPT_ALLOC_DIRTY_TRACKING
+	       | IOMMU_HWPT_FAULT_ID_VALID)))
 		return ERR_PTR(-EOPNOTSUPP);
 	if (nested_parent && !nested_supported(iommu))
 		return ERR_PTR(-EOPNOTSUPP);
diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index aefde4443671..88074e459995 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -107,7 +107,8 @@ iommufd_hwpt_paging_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
 			  const struct iommu_user_data *user_data)
 {
 	const u32 valid_flags = IOMMU_HWPT_ALLOC_NEST_PARENT |
-				IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+				IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+				IOMMU_HWPT_FAULT_ID_VALID;
 	const struct iommu_ops *ops = dev_iommu_ops(idev->dev);
 	struct iommufd_hwpt_paging *hwpt_paging;
 	struct iommufd_hw_pagetable *hwpt;

-- 
2.45.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC PREVIEW 4/6] iommu: init pasid array while doing domain_replace and iopf is active
  2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
                   ` (2 preceding siblings ...)
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 3/6] iommufd: Enable PRI when doing the iommufd_hwpt_alloc Klaus Jensen
@ 2024-08-26 11:40 ` Klaus Jensen
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 5/6] iommu/vt-d: drop pasid requirement for prq initialization Klaus Jensen
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 25+ messages in thread
From: Klaus Jensen @ 2024-08-26 11:40 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: Minwoo Im, linux-kernel, iommu, Joel Granados, Klaus Jensen

From: Joel Granados <j.granados@samsung.com>

iommu_report_device_fault expects the group's pasid array to contain an
iommu_attach_handle when a fault is reported. Add this handle when the
replacing hwpt carries a valid iommufd fault object, and remove it when
we release ownership of the group.

Signed-off-by: Joel Granados <j.granados@samsung.com>
---
 drivers/iommu/iommu-priv.h    |  3 +++
 drivers/iommu/iommu.c         | 31 +++++++++++++++++++++++++++++++
 drivers/iommu/iommufd/fault.c | 22 ++++++++++++++++++++++
 3 files changed, 56 insertions(+)

diff --git a/drivers/iommu/iommu-priv.h b/drivers/iommu/iommu-priv.h
index de5b54eaa8bf..493e501badc7 100644
--- a/drivers/iommu/iommu-priv.h
+++ b/drivers/iommu/iommu-priv.h
@@ -38,6 +38,9 @@ void iommu_device_unregister_bus(struct iommu_device *iommu,
 struct iommu_attach_handle *iommu_attach_handle_get(struct iommu_group *group,
 						    ioasid_t pasid,
 						    unsigned int type);
+int iommu_init_pasid_array(struct iommu_domain *domain,
+			   struct iommu_group *group,
+			   struct iommu_attach_handle *handle);
 int iommu_attach_group_handle(struct iommu_domain *domain,
 			      struct iommu_group *group,
 			      struct iommu_attach_handle *handle);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ed6c5cb60c5a..083bf1bbad93 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3262,6 +3262,9 @@ EXPORT_SYMBOL_GPL(iommu_device_claim_dma_owner);
 
 static void __iommu_release_dma_ownership(struct iommu_group *group)
 {
+	if (!xa_empty(&group->pasid_array))
+		xa_erase(&group->pasid_array, IOMMU_NO_PASID);
+
 	if (WARN_ON(!group->owner_cnt || !group->owner ||
 		    !xa_empty(&group->pasid_array)))
 		return;
@@ -3495,6 +3498,33 @@ iommu_attach_handle_get(struct iommu_group *group, ioasid_t pasid, unsigned int
 }
 EXPORT_SYMBOL_NS_GPL(iommu_attach_handle_get, IOMMUFD_INTERNAL);
 
+/**
+ * iommu_init_pasid_array - Initialize pasid array in the domain group
+ *
+ * Returns 0 on success. Error code on failure
+ *
+ * An IOMMU_NO_PASID element is *NOT* replaced if there is one already there.
+ */
+int iommu_init_pasid_array(struct iommu_domain *domain,
+			   struct iommu_group *group,
+			   struct iommu_attach_handle *handle)
+{
+	int ret;
+
+	if (handle)
+		handle->domain = domain;
+
+	mutex_lock(&group->mutex);
+	ret = xa_insert(&group->pasid_array, IOMMU_NO_PASID, handle, GFP_KERNEL);
+	mutex_unlock(&group->mutex);
+
+	if (ret == -EBUSY)
+		ret = 0;
+
+	return ret;
+}
+EXPORT_SYMBOL_NS_GPL(iommu_init_pasid_array, IOMMUFD_INTERNAL);
+
 /**
  * iommu_attach_group_handle - Attach an IOMMU domain to an IOMMU group
  * @domain: IOMMU domain to attach
@@ -3535,6 +3565,7 @@ int iommu_attach_group_handle(struct iommu_domain *domain,
 }
 EXPORT_SYMBOL_NS_GPL(iommu_attach_group_handle, IOMMUFD_INTERNAL);
 
+
 /**
  * iommu_detach_group_handle - Detach an IOMMU domain from an IOMMU group
  * @domain: IOMMU domain to attach
diff --git a/drivers/iommu/iommufd/fault.c b/drivers/iommu/iommufd/fault.c
index a643d5c7c535..ea7f1bf64892 100644
--- a/drivers/iommu/iommufd/fault.c
+++ b/drivers/iommu/iommufd/fault.c
@@ -178,6 +178,25 @@ static int __fault_domain_replace_dev(struct iommufd_device *idev,
 	return ret;
 }
 
+static int iommufd_init_pasid_array(struct iommu_domain *domain,
+				  struct iommufd_device *idev)
+{
+	int ret;
+	struct iommufd_attach_handle *handle;
+	struct iommu_group *group = idev->igroup->group;
+
+	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+	if (!handle)
+		return -ENOMEM;
+	handle->idev = idev;
+
+	ret = iommu_init_pasid_array(domain, group, &handle->handle);
+	if (ret)
+		kfree(handle);
+
+	return ret;
+}
+
 int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 				     struct iommufd_hw_pagetable *hwpt,
 				     struct iommufd_hw_pagetable *old)
@@ -190,6 +209,9 @@ int iommufd_fault_domain_replace_dev(struct iommufd_device *idev,
 		ret = iommufd_fault_iopf_enable(idev);
 		if (ret)
 			return ret;
+
+		if (iommufd_init_pasid_array(hwpt->domain, idev))
+			return -EINVAL;
 	}
 
 	ret = __fault_domain_replace_dev(idev, hwpt, old);

-- 
2.45.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC PREVIEW 5/6] iommu/vt-d: drop pasid requirement for prq initialization
  2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
                   ` (3 preceding siblings ...)
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 4/6] iommu: init pasid array while doing domain_replace and iopf is active Klaus Jensen
@ 2024-08-26 11:40 ` Klaus Jensen
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 6/6] iommu/vt-d: do not require a PASID in page requests Klaus Jensen
  2024-08-26 13:59 ` [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Jason Gunthorpe
  6 siblings, 0 replies; 25+ messages in thread
From: Klaus Jensen @ 2024-08-26 11:40 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: Minwoo Im, linux-kernel, iommu, Joel Granados, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

PASID support within the IOMMU is not required to enable the Page
Request Queue, only the PRS capability.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 drivers/iommu/intel/iommu.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ada3507d2831..bc1a369c2cf4 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1487,10 +1487,8 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 	/* free context mapping */
 	free_context_table(iommu);
 
-	if (pasid_supported(iommu)) {
-		if (ecap_prs(iommu->ecap))
-			intel_finish_prq(iommu);
-	}
+	if (ecap_prs(iommu->ecap))
+		intel_finish_prq(iommu);
 }
 
 /*
@@ -2478,7 +2476,7 @@ static int __init init_dmars(void)
 
 		iommu_flush_write_buffer(iommu);
 
-		if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
+		if (ecap_prs(iommu->ecap)) {
 			/*
 			 * Call dmar_alloc_hwirq() with dmar_global_lock held,
 			 * could cause possible lock race condition.
@@ -2919,7 +2917,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
 	intel_iommu_init_qi(iommu);
 	iommu_flush_write_buffer(iommu);
 
-	if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
+	if (ecap_prs(iommu->ecap)) {
 		ret = intel_enable_prq(iommu);
 		if (ret)
 			goto disable_iommu;

-- 
2.45.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH RFC PREVIEW 6/6] iommu/vt-d: do not require a PASID in page requests
  2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
                   ` (4 preceding siblings ...)
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 5/6] iommu/vt-d: drop pasid requirement for prq initialization Klaus Jensen
@ 2024-08-26 11:40 ` Klaus Jensen
  2024-09-04 10:19   ` Joel Granados
  2024-08-26 13:59 ` [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Jason Gunthorpe
  6 siblings, 1 reply; 25+ messages in thread
From: Klaus Jensen @ 2024-08-26 11:40 UTC (permalink / raw)
  To: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: Minwoo Im, linux-kernel, iommu, Joel Granados, Klaus Jensen

From: Klaus Jensen <k.jensen@samsung.com>

PRQ events can now be handled without a PASID being present. Remove the
restriction.

Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
---
 drivers/iommu/intel/prq.c | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
index 2814373e95d8..cc36198ebf91 100644
--- a/drivers/iommu/intel/prq.c
+++ b/drivers/iommu/intel/prq.c
@@ -101,18 +101,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		req = &iommu->prq[head / sizeof(*req)];
 		address = (u64)req->addr << VTD_PAGE_SHIFT;
 
-		if (unlikely(!req->pasid_present)) {
-			pr_err("IOMMU: %s: Page request without PASID\n",
-			       iommu->name);
-bad_req:
-			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
-			goto prq_advance;
-		}
-
 		if (unlikely(!is_canonical_address(address))) {
 			pr_err("IOMMU: %s: Address is not canonical\n",
 			       iommu->name);
-			goto bad_req;
+bad_req:
+			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
+			goto prq_advance;
 		}
 
 		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {

-- 
2.45.2


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM Klaus Jensen
@ 2024-08-26 13:09   ` Baolu Lu
  2024-09-04  9:12     ` Joel Granados
  2024-09-01  5:16   ` Baolu Lu
  1 sibling, 1 reply; 25+ messages in thread
From: Baolu Lu @ 2024-08-26 13:09 UTC (permalink / raw)
  To: Klaus Jensen, David Woodhouse, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: baolu.lu, Minwoo Im, linux-kernel, iommu, Joel Granados,
	Klaus Jensen

On 2024/8/26 19:40, Klaus Jensen wrote:
> From: Joel Granados <j.granados@samsung.com>
> 
> IO page faults are no longer dependent on CONFIG_INTEL_IOMMU_SVM. Move
> all Page Request Queue (PRQ) functions that handle prq events to a new
> file in drivers/iommu/intel/prq.c. The page_req_des struct is made
> available in drivers/iommu/intel/iommu.h.
> 
> No functional changes are intended. This is a preparation patch to
> enable the use of IO page faults outside the SVM and nested use cases.
> 
> Signed-off-by: Joel Granados <j.granados@samsung.com>
> ---
>   drivers/iommu/intel/Makefile |   2 +-
>   drivers/iommu/intel/iommu.c  |  18 +--
>   drivers/iommu/intel/iommu.h  |  40 +++++-
>   drivers/iommu/intel/prq.c    | 290 ++++++++++++++++++++++++++++++++++++++++
>   drivers/iommu/intel/svm.c    | 308 -------------------------------------------
>   5 files changed, 331 insertions(+), 327 deletions(-)
> 
> diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
> index c8beb0281559..d3bb0798092d 100644
> --- a/drivers/iommu/intel/Makefile
> +++ b/drivers/iommu/intel/Makefile
> @@ -1,6 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0
>   obj-$(CONFIG_DMAR_TABLE) += dmar.o
> -obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o
> +obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o prq.o
>   obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
>   obj-$(CONFIG_DMAR_PERF) += perf.o
>   obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o

Thanks for the patch! Now that IOPF is separate from SVA, the Kconfig
needs to be updated accordingly.

diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
index f52fb39c968e..2888671c9278 100644
--- a/drivers/iommu/intel/Kconfig
+++ b/drivers/iommu/intel/Kconfig
@@ -15,6 +15,7 @@ config INTEL_IOMMU
         select DMA_OPS
         select IOMMU_API
         select IOMMU_IOVA
+       select IOMMU_IOPF
         select IOMMUFD_DRIVER if IOMMUFD
         select NEED_DMA_MAP_STATE
         select DMAR_TABLE
@@ -51,7 +52,6 @@ config INTEL_IOMMU_SVM
         depends on X86_64
         select MMU_NOTIFIER
         select IOMMU_SVA
-       select IOMMU_IOPF
         help
           Shared Virtual Memory (SVM) provides a facility for devices
           to access DMA resources through process address space by

Thanks,
baolu

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
                   ` (5 preceding siblings ...)
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 6/6] iommu/vt-d: do not require a PASID in page requests Klaus Jensen
@ 2024-08-26 13:59 ` Jason Gunthorpe
  2024-09-02 10:48   ` Joel Granados
  6 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2024-08-26 13:59 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel, iommu,
	Joel Granados, Klaus Jensen

On Mon, Aug 26, 2024 at 01:40:26PM +0200, Klaus Jensen wrote:
> This is a Request for Comment series that will hopefully generate
> initial feedback on the use of the iommufd_hwpt_replace_device to
> execute non-nested and non-svm user space IOPFs. Our main motivation is
> to enable user-space driver driven device verification with default
> pasid and without nesting nor SVM.
> 
> What?
>   * Enable IO page fault handling in user space in a non-nested, non-svm
>     and non-virtualised use case.
>   * Removing the relation between IOPF and INTEL_IOMMU_SVM by allowing
>     the user to (de)select the IOPF code through Kconfig.
>   * Create a new file under iommu/intel (prq.c) that contains all the
>     page request queue related logic and is not under intel/svm.c.
>   * Add the IOMMU_HWPT_FAULT_ID_VALID to the valid flags used to create
>     IOMMU_HWPT_ALLOC allocations.
>   * Create a default (zero) pasid handle and insert it to the pasid
>     array within the dev->iommu_group when replacing the old HWPT with
>     an iopf enabled HWPT.
> 
> Why?
>   The PCI ATS Extended Capability allows peripheral devices to
>   participate in the caching of translations when operating under an
>   IOMMU. Further, the ATS Page Request Interface (PRI) Extension allows
>   devices to handle missing mappings. Currently, PRI is mainly used in
>   the context of Shared Virtual Addressing, requiring support for the
>   Process Address Space Identifier (PASID) capability, but other use
>   cases such as enabling user-space driver driven device verification
>   and reducing memory pinning exists. This patchest sets out to enable
>   these use cases.

I definitely expect PRI to work outside PASID and SVA cases, so this
is going in a good direction

>   Supplementary repositories supporting this patchset:
>     1. A user space library libvfn [1] which is used for testing and
>        verification (see examples/iopf.c), and

That's pretty neat, I've been wanting to see some kind of IOMMU test
suite based around a capable widely available device. This is the
closest I've seen.

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 2/6] iommu: Make IOMMU_IOPF selectable in Kconfig
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 2/6] iommu: Make IOMMU_IOPF selectable in Kconfig Klaus Jensen
@ 2024-08-26 14:05   ` Jason Gunthorpe
  2024-09-04  9:44     ` Joel Granados
  0 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2024-08-26 14:05 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel, iommu,
	Joel Granados, Klaus Jensen

On Mon, Aug 26, 2024 at 01:40:28PM +0200, Klaus Jensen wrote:
> From: Joel Granados <j.granados@samsung.com>
> 
> IOMMU_IOPF is no longer selectable through INTEL_IOMMU_SVM effectively
> severing their relation and allowing them to be used independently.
> 
> Signed-off-by: Joel Granados <j.granados@samsung.com>
> ---
>  drivers/iommu/Kconfig       | 2 +-
>  drivers/iommu/intel/Kconfig | 1 -
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index a82f10054aec..d3ee8a0ad4a6 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -164,7 +164,7 @@ config IOMMU_SVA
>  	bool
>  
>  config IOMMU_IOPF
> -	bool
> +	bool "Enable IO page fault in IOMMU"


Currently IOMMU_IOPF indicates that the driver wants to consume the
library functions around IOPF; it is not user selectable because any
driver that links to those functions should have them working. If you
want to make the core driver use them then the select should be moved
from the SVM sub config to the core driver config.

If you want to change IOMMU_IOPF to a user-configurable option then it
should have a full help text and all the kconfig places touching it
should be turned into

  depends on IOMMU_IOPF

Ie you can't enable any driver's SVA kconfig without also picking
IOMMU_IOPF.

This is doing neither fully.. Pick one :)

Jason

^ permalink raw reply	[flat|nested] 25+ messages in thread
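Jason's second option, sketched as a Kconfig fragment (the prompt and help text wording here are illustrative, not from the series; the INTEL_IOMMU_SVM lines are abbreviated):

```kconfig
config IOMMU_IOPF
	bool "IOMMU IO page fault handling"
	help
	  Enable the library functions used to queue and handle IO page
	  faults (IOPF) reported by PRI-capable devices, including
	  delivery of faults to user space via iommufd.

config INTEL_IOMMU_SVM
	bool "Support for Shared Virtual Memory with Intel IOMMU"
	depends on X86_64
	depends on IOMMU_IOPF
	select MMU_NOTIFIER
	select IOMMU_SVA
```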

* Re: [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM Klaus Jensen
  2024-08-26 13:09   ` Baolu Lu
@ 2024-09-01  5:16   ` Baolu Lu
  2024-09-04  8:39     ` Joel Granados
  1 sibling, 1 reply; 25+ messages in thread
From: Baolu Lu @ 2024-09-01  5:16 UTC (permalink / raw)
  To: Klaus Jensen, David Woodhouse, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian
  Cc: baolu.lu, Minwoo Im, linux-kernel, iommu, Joel Granados,
	Klaus Jensen

On 2024/8/26 19:40, Klaus Jensen wrote:
> From: Joel Granados <j.granados@samsung.com>
> 
> IO page faults are no longer dependent on CONFIG_INTEL_IOMMU_SVM. Move
> all Page Request Queue (PRQ) functions that handle prq events to a new
> file in drivers/iommu/intel/prq.c. The page_req_des struct is made
> available in drivers/iommu/intel/iommu.h.
> 
> No functional changes are intended. This is a preparation patch to
> enable the use of IO page faults outside the SVM and nested use cases.
> 
> Signed-off-by: Joel Granados <j.granados@samsung.com>
> ---
>   drivers/iommu/intel/Makefile |   2 +-
>   drivers/iommu/intel/iommu.c  |  18 +--
>   drivers/iommu/intel/iommu.h  |  40 +++++-
>   drivers/iommu/intel/prq.c    | 290 ++++++++++++++++++++++++++++++++++++++++
>   drivers/iommu/intel/svm.c    | 308 -------------------------------------------
>   5 files changed, 331 insertions(+), 327 deletions(-)
> 
> diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
> index c8beb0281559..d3bb0798092d 100644
> --- a/drivers/iommu/intel/Makefile
> +++ b/drivers/iommu/intel/Makefile
> @@ -1,6 +1,6 @@
>   # SPDX-License-Identifier: GPL-2.0
>   obj-$(CONFIG_DMAR_TABLE) += dmar.o
> -obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o
> +obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o prq.o
>   obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
>   obj-$(CONFIG_DMAR_PERF) += perf.o
>   obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 9ff8b83c19a3..4ca284d53a6b 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1487,12 +1487,10 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
>   	/* free context mapping */
>   	free_context_table(iommu);
>   
> -#ifdef CONFIG_INTEL_IOMMU_SVM
>   	if (pasid_supported(iommu)) {
>   		if (ecap_prs(iommu->ecap))
> -			intel_svm_finish_prq(iommu);
> +			intel_finish_prq(iommu);
>   	}
> -#endif
>   }
>   
>   /*
> @@ -2480,19 +2478,18 @@ static int __init init_dmars(void)
>   
>   		iommu_flush_write_buffer(iommu);
>   
> -#ifdef CONFIG_INTEL_IOMMU_SVM
>   		if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
>   			/*
>   			 * Call dmar_alloc_hwirq() with dmar_global_lock held,
>   			 * could cause possible lock race condition.
>   			 */
>   			up_write(&dmar_global_lock);
> -			ret = intel_svm_enable_prq(iommu);
> +			ret = intel_enable_prq(iommu);
>   			down_write(&dmar_global_lock);
>   			if (ret)
>   				goto free_iommu;
>   		}
> -#endif
> +
>   		ret = dmar_set_interrupt(iommu);
>   		if (ret)
>   			goto free_iommu;
> @@ -2922,13 +2919,12 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
>   	intel_iommu_init_qi(iommu);
>   	iommu_flush_write_buffer(iommu);
>   
> -#ifdef CONFIG_INTEL_IOMMU_SVM
>   	if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
> -		ret = intel_svm_enable_prq(iommu);
> +		ret = intel_enable_prq(iommu);
>   		if (ret)
>   			goto disable_iommu;
>   	}
> -#endif
> +
>   	ret = dmar_set_interrupt(iommu);
>   	if (ret)
>   		goto disable_iommu;
> @@ -4669,9 +4665,7 @@ const struct iommu_ops intel_iommu_ops = {
>   	.def_domain_type	= device_def_domain_type,
>   	.remove_dev_pasid	= intel_iommu_remove_dev_pasid,
>   	.pgsize_bitmap		= SZ_4K,
> -#ifdef CONFIG_INTEL_IOMMU_SVM
> -	.page_response		= intel_svm_page_response,
> -#endif
> +	.page_response		= intel_page_response,
>   	.default_domain_ops = &(const struct iommu_domain_ops) {
>   		.attach_dev		= intel_iommu_attach_device,
>   		.set_dev_pasid		= intel_iommu_set_dev_pasid,
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index b67c14da1240..b3d98e706ed8 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -694,6 +694,35 @@ struct iommu_pmu {
>   #define IOMMU_IRQ_ID_OFFSET_PRQ		(DMAR_UNITS_SUPPORTED)
>   #define IOMMU_IRQ_ID_OFFSET_PERF	(2 * DMAR_UNITS_SUPPORTED)
>   
> +/* Page request queue descriptor */
> +struct page_req_dsc {
> +	union {
> +		struct {
> +			u64 type:8;
> +			u64 pasid_present:1;
> +			u64 rsvd:7;
> +			u64 rid:16;
> +			u64 pasid:20;
> +			u64 exe_req:1;
> +			u64 pm_req:1;
> +			u64 rsvd2:10;
> +		};
> +		u64 qw_0;
> +	};
> +	union {
> +		struct {
> +			u64 rd_req:1;
> +			u64 wr_req:1;
> +			u64 lpig:1;
> +			u64 prg_index:9;
> +			u64 addr:52;
> +		};
> +		u64 qw_1;
> +	};
> +	u64 qw_2;
> +	u64 qw_3;
> +};

Why not move this structure to prq.c? It is specific to that file. Or
not?

> +
>   struct intel_iommu {
>   	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
>   	u64 		reg_phys; /* physical address of hw register set */
> @@ -719,12 +748,10 @@ struct intel_iommu {
>   
>   	struct iommu_flush flush;
>   #endif
> -#ifdef CONFIG_INTEL_IOMMU_SVM
>   	struct page_req_dsc *prq;
>   	unsigned char prq_name[16];    /* Name for PRQ interrupt */
>   	unsigned long prq_seq_number;
>   	struct completion prq_complete;
> -#endif
>   	struct iopf_queue *iopf_queue;
>   	unsigned char iopfq_name[16];
>   	/* Synchronization between fault report and iommu device release. */
> @@ -1156,12 +1183,13 @@ void intel_context_flush_present(struct device_domain_info *info,
>   				 struct context_entry *context,
>   				 bool affect_domains);
>   
> +int intel_enable_prq(struct intel_iommu *iommu);
> +int intel_finish_prq(struct intel_iommu *iommu);
> +void intel_page_response(struct device *dev, struct iopf_fault *evt,
> +			struct iommu_page_response *msg);
> +
>   #ifdef CONFIG_INTEL_IOMMU_SVM
>   void intel_svm_check(struct intel_iommu *iommu);
> -int intel_svm_enable_prq(struct intel_iommu *iommu);
> -int intel_svm_finish_prq(struct intel_iommu *iommu);
> -void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
> -			     struct iommu_page_response *msg);
>   struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
>   					    struct mm_struct *mm);
>   void intel_drain_pasid_prq(struct device *dev, u32 pasid);
> diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
> new file mode 100644
> index 000000000000..2814373e95d8
> --- /dev/null
> +++ b/drivers/iommu/intel/prq.c
> @@ -0,0 +1,290 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright © 2015 Intel Corporation.
> + *
> + * Authors: David Woodhouse <dwmw2@infradead.org>

Many contributors have worked on the code moved in this change. The
original authorship is no longer relevant.

Consider adding a comment like 'Split from svm.c' to document the
origin.

> + */
> +
> +#include <linux/pci.h>
> +
> +#include "iommu.h"
> +#include "../iommu-pages.h"
> +#include "trace.h"
> +
> +static bool is_canonical_address(u64 addr)
> +{
> +	int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
> +	long saddr = (long) addr;
> +
> +	return (((saddr << shift) >> shift) == saddr);
> +}
> +
> +static void handle_bad_prq_event(struct intel_iommu *iommu,
> +				 struct page_req_dsc *req, int result)
> +{
> +	struct qi_desc desc = { };
> +
> +	pr_err("%s: Invalid page request: %08llx %08llx\n",
> +	       iommu->name, ((unsigned long long *)req)[0],
> +	       ((unsigned long long *)req)[1]);
> +
> +	if (!req->lpig)
> +		return;
> +
> +	desc.qw0 = QI_PGRP_PASID(req->pasid) |
> +			QI_PGRP_DID(req->rid) |
> +			QI_PGRP_PASID_P(req->pasid_present) |
> +			QI_PGRP_RESP_CODE(result) |
> +			QI_PGRP_RESP_TYPE;
> +	desc.qw1 = QI_PGRP_IDX(req->prg_index) |
> +			QI_PGRP_LPIG(req->lpig);
> +
> +	qi_submit_sync(iommu, &desc, 1, 0);
> +}
> +
> +static int prq_to_iommu_prot(struct page_req_dsc *req)
> +{
> +	int prot = 0;
> +
> +	if (req->rd_req)
> +		prot |= IOMMU_FAULT_PERM_READ;
> +	if (req->wr_req)
> +		prot |= IOMMU_FAULT_PERM_WRITE;
> +	if (req->exe_req)
> +		prot |= IOMMU_FAULT_PERM_EXEC;
> +	if (req->pm_req)
> +		prot |= IOMMU_FAULT_PERM_PRIV;
> +
> +	return prot;
> +}
> +
> +static void intel_prq_report(struct intel_iommu *iommu, struct device *dev,
> +				 struct page_req_dsc *desc)
> +{
> +	struct iopf_fault event = { };
> +
> +	/* Fill in event data for device specific processing */
> +	event.fault.type = IOMMU_FAULT_PAGE_REQ;
> +	event.fault.prm.addr = (u64)desc->addr << VTD_PAGE_SHIFT;
> +	event.fault.prm.pasid = desc->pasid;
> +	event.fault.prm.grpid = desc->prg_index;
> +	event.fault.prm.perm = prq_to_iommu_prot(desc);
> +
> +	if (desc->lpig)
> +		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> +	if (desc->pasid_present) {
> +		event.fault.prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> +		event.fault.prm.flags |= IOMMU_FAULT_PAGE_RESPONSE_NEEDS_PASID;
> +	}
> +
> +	iommu_report_device_fault(dev, &event);
> +}
> +
> +static irqreturn_t prq_event_thread(int irq, void *d)
> +{
> +	struct intel_iommu *iommu = d;
> +	struct page_req_dsc *req;
> +	int head, tail, handled;
> +	struct device *dev;
> +	u64 address;
> +
> +	/*
> +	 * Clear PPR bit before reading head/tail registers, to ensure that
> +	 * we get a new interrupt if needed.
> +	 */
> +	writel(DMA_PRS_PPR, iommu->reg + DMAR_PRS_REG);
> +
> +	tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
> +	head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
> +	handled = (head != tail);
> +	while (head != tail) {
> +		req = &iommu->prq[head / sizeof(*req)];
> +		address = (u64)req->addr << VTD_PAGE_SHIFT;
> +
> +		if (unlikely(!req->pasid_present)) {
> +			pr_err("IOMMU: %s: Page request without PASID\n",
> +			       iommu->name);
> +bad_req:
> +			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
> +			goto prq_advance;
> +		}
> +
> +		if (unlikely(!is_canonical_address(address))) {
> +			pr_err("IOMMU: %s: Address is not canonical\n",
> +			       iommu->name);
> +			goto bad_req;
> +		}
> +
> +		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {
> +			pr_err("IOMMU: %s: Page request in Privilege Mode\n",
> +			       iommu->name);
> +			goto bad_req;
> +		}
> +
> +		if (unlikely(req->exe_req && req->rd_req)) {
> +			pr_err("IOMMU: %s: Execution request not supported\n",
> +			       iommu->name);
> +			goto bad_req;
> +		}
> +
> +		/* Drop Stop Marker message. No need for a response. */
> +		if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
> +			goto prq_advance;
> +
> +		/*
> +		 * If prq is to be handled outside iommu driver via receiver of
> +		 * the fault notifiers, we skip the page response here.
> +		 */
> +		mutex_lock(&iommu->iopf_lock);
> +		dev = device_rbtree_find(iommu, req->rid);
> +		if (!dev) {
> +			mutex_unlock(&iommu->iopf_lock);
> +			goto bad_req;
> +		}
> +
> +		intel_prq_report(iommu, dev, req);
> +		trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
> +				 req->qw_2, req->qw_3,
> +				 iommu->prq_seq_number++);
> +		mutex_unlock(&iommu->iopf_lock);
> +prq_advance:
> +		head = (head + sizeof(*req)) & PRQ_RING_MASK;
> +	}
> +
> +	dmar_writeq(iommu->reg + DMAR_PQH_REG, tail);
> +
> +	/*
> +	 * Clear the page request overflow bit and wake up all threads that
> +	 * are waiting for the completion of this handling.
> +	 */
> +	if (readl(iommu->reg + DMAR_PRS_REG) & DMA_PRS_PRO) {
> +		pr_info_ratelimited("IOMMU: %s: PRQ overflow detected\n",
> +				    iommu->name);
> +		head = dmar_readq(iommu->reg + DMAR_PQH_REG) & PRQ_RING_MASK;
> +		tail = dmar_readq(iommu->reg + DMAR_PQT_REG) & PRQ_RING_MASK;
> +		if (head == tail) {
> +			iopf_queue_discard_partial(iommu->iopf_queue);
> +			writel(DMA_PRS_PRO, iommu->reg + DMAR_PRS_REG);
> +			pr_info_ratelimited("IOMMU: %s: PRQ overflow cleared",
> +					    iommu->name);
> +		}
> +	}
> +
> +	if (!completion_done(&iommu->prq_complete))
> +		complete(&iommu->prq_complete);
> +
> +	return IRQ_RETVAL(handled);
> +}
> +
> +int intel_enable_prq(struct intel_iommu *iommu)
> +{
> +	struct iopf_queue *iopfq;
> +	int irq, ret;
> +
> +	iommu->prq = iommu_alloc_pages_node(iommu->node, GFP_KERNEL, PRQ_ORDER);
> +	if (!iommu->prq) {
> +		pr_warn("IOMMU: %s: Failed to allocate page request queue\n",
> +			iommu->name);
> +		return -ENOMEM;
> +	}
> +
> +	irq = dmar_alloc_hwirq(IOMMU_IRQ_ID_OFFSET_PRQ + iommu->seq_id, iommu->node, iommu);
> +	if (irq <= 0) {
> +		pr_err("IOMMU: %s: Failed to create IRQ vector for page request queue\n",
> +		       iommu->name);
> +		ret = -EINVAL;
> +		goto free_prq;
> +	}
> +	iommu->pr_irq = irq;
> +
> +	snprintf(iommu->iopfq_name, sizeof(iommu->iopfq_name),
> +		 "dmar%d-iopfq", iommu->seq_id);
> +	iopfq = iopf_queue_alloc(iommu->iopfq_name);
> +	if (!iopfq) {
> +		pr_err("IOMMU: %s: Failed to allocate iopf queue\n", iommu->name);
> +		ret = -ENOMEM;
> +		goto free_hwirq;
> +	}
> +	iommu->iopf_queue = iopfq;
> +
> +	snprintf(iommu->prq_name, sizeof(iommu->prq_name), "dmar%d-prq", iommu->seq_id);
> +
> +	ret = request_threaded_irq(irq, NULL, prq_event_thread, IRQF_ONESHOT,
> +				   iommu->prq_name, iommu);
> +	if (ret) {
> +		pr_err("IOMMU: %s: Failed to request IRQ for page request queue\n",
> +		       iommu->name);
> +		goto free_iopfq;
> +	}
> +	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
> +	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
> +	dmar_writeq(iommu->reg + DMAR_PQA_REG, virt_to_phys(iommu->prq) | PRQ_ORDER);
> +
> +	init_completion(&iommu->prq_complete);
> +
> +	return 0;
> +
> +free_iopfq:
> +	iopf_queue_free(iommu->iopf_queue);
> +	iommu->iopf_queue = NULL;
> +free_hwirq:
> +	dmar_free_hwirq(irq);
> +	iommu->pr_irq = 0;
> +free_prq:
> +	iommu_free_pages(iommu->prq, PRQ_ORDER);
> +	iommu->prq = NULL;
> +
> +	return ret;
> +}
> +
> +int intel_finish_prq(struct intel_iommu *iommu)
> +{
> +	dmar_writeq(iommu->reg + DMAR_PQH_REG, 0ULL);
> +	dmar_writeq(iommu->reg + DMAR_PQT_REG, 0ULL);
> +	dmar_writeq(iommu->reg + DMAR_PQA_REG, 0ULL);
> +
> +	if (iommu->pr_irq) {
> +		free_irq(iommu->pr_irq, iommu);
> +		dmar_free_hwirq(iommu->pr_irq);
> +		iommu->pr_irq = 0;
> +	}
> +
> +	if (iommu->iopf_queue) {
> +		iopf_queue_free(iommu->iopf_queue);
> +		iommu->iopf_queue = NULL;
> +	}
> +
> +	iommu_free_pages(iommu->prq, PRQ_ORDER);
> +	iommu->prq = NULL;
> +
> +	return 0;
> +}
> +
> +void intel_page_response(struct device *dev, struct iopf_fault *evt,
> +			 struct iommu_page_response *msg)
> +{
> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> +	struct intel_iommu *iommu = info->iommu;
> +	u8 bus = info->bus, devfn = info->devfn;
> +	struct iommu_fault_page_request *prm;
> +	struct qi_desc desc;
> +	bool pasid_present;
> +	bool last_page;
> +	u16 sid;
> +
> +	prm = &evt->fault.prm;
> +	sid = PCI_DEVID(bus, devfn);
> +	pasid_present = prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID;
> +	last_page = prm->flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE;
> +
> +	desc.qw0 = QI_PGRP_PASID(prm->pasid) | QI_PGRP_DID(sid) |
> +			QI_PGRP_PASID_P(pasid_present) |
> +			QI_PGRP_RESP_CODE(msg->code) |
> +			QI_PGRP_RESP_TYPE;
> +	desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
> +	desc.qw2 = 0;
> +	desc.qw3 = 0;
> +
> +	qi_submit_sync(iommu, &desc, 1, 0);
> +}

The intel_drain_pasid_prq() helper should be moved to prq.c. It's no
longer specific to SVM.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-08-26 13:59 ` [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Jason Gunthorpe
@ 2024-09-02 10:48   ` Joel Granados
  2024-09-02 11:06     ` Joel Granados
                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Joel Granados @ 2024-09-02 10:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Klaus Jensen, David Woodhouse, Lu Baolu, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On Mon, Aug 26, 2024 at 10:59:55AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 26, 2024 at 01:40:26PM +0200, Klaus Jensen wrote:
> > This is a Request for Comment series that will hopefully generate
> > initial feedback on the use of the iommufd_hwpt_replace_device to
> > execute non-nested and non-svm user space IOPFs. Our main motivation is
> > to enable user-space driver driven device verification with default
> > pasid and without nesting nor SVM.
> > 
> > What?
> >   * Enable IO page fault handling in user space in a non-nested, non-svm
> >     and non-virtualised use case.
> >   * Removing the relation between IOPF and INTEL_IOMMU_SVM by allowing
> >     the user to (de)select the IOPF code through Kconfig.
> >   * Create a new file under iommu/intel (prq.c) that contains all the
> >     page request queue related logic and is not under intel/svm.c.
> >   * Add the IOMMU_HWPT_FAULT_ID_VALID to the valid flags used to create
> >     IOMMU_HWPT_ALLOC allocations.
> >   * Create a default (zero) pasid handle and insert it to the pasid
> >     array within the dev->iommu_group when replacing the old HWPT with
> >     an iopf enabled HWPT.
> > 
> > Why?
> >   The PCI ATS Extended Capability allows peripheral devices to
> >   participate in the caching of translations when operating under an
> >   IOMMU. Further, the ATS Page Request Interface (PRI) Extension allows
> >   devices to handle missing mappings. Currently, PRI is mainly used in
> >   the context of Shared Virtual Addressing, requiring support for the
> >   Process Address Space Identifier (PASID) capability, but other use
> >   cases such as enabling user-space driver driven device verification
> >   and reducing memory pinning exist. This patchset sets out to enable
> >   these use cases.
> 
Sorry for the late reply; I'm slowly getting through my backlog after PTO.

> I definitely expect PRI to work outside PASID and SVA cases, so this
> is going in a good direction
This touches on a detail (at least in Intel's vtd-io spec) that is not
100% clear to me. Second paragraph of section "3.4.3 Scalable Mode
Address Translation" reads:
"
  ... Scalable-mode context-entries support both requests-without-PASID
  and requests-with-PASID. However unlike legacy mode, in scalable-mode,
  requests-without-PASID obtain a PASID value from the RID_PASID field of
  the scalable-mode context-entry and are processed similarly to
  requests-with-PASID. Implementations not supporting RID_PASID capability
  (ECAP_REG.RPS is 0b), use a PASID value of 0 to perform address
  translation for requests without PASID.
"
This basically means that a default PASID is used even though the
request is without PASID. Right? Therefore "outside PASID" means with
the default PASID (at least in Intel's case). Right?

> 
> >   Supplementary repositories supporting this patchset:
> >     1. A user space library libvfn [1] which is used for testing and
> >        verification (see examples/iopf.c), and
> 
> That's pretty neat, I've been wanting to see some kind of IOMMU test
> suite based around a capable widely available device. This is the
> closest I've seen..
Yes! This is an obvious application of libvfn. Do you see it as
something that can be included in tools/selftests/iommu?

Best
-- 

Joel Granados

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-09-02 10:48   ` Joel Granados
@ 2024-09-02 11:06     ` Joel Granados
  2024-09-02 12:47     ` Baolu Lu
  2024-09-04 16:13     ` Jason Gunthorpe
  2 siblings, 0 replies; 25+ messages in thread
From: Joel Granados @ 2024-09-02 11:06 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Klaus Jensen, David Woodhouse, Lu Baolu, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On Mon, Sep 02, 2024 at 12:48:19PM +0200, Joel Granados wrote:
> On Mon, Aug 26, 2024 at 10:59:55AM -0300, Jason Gunthorpe wrote:
> > On Mon, Aug 26, 2024 at 01:40:26PM +0200, Klaus Jensen wrote:
> > > This is a Request for Comment series that will hopefully generate
> > > initial feedback on the use of the iommufd_hwpt_replace_device to
> > > execute non-nested and non-svm user space IOPFs. Our main motivation is
> > > to enable user-space driver driven device verification with default
> > > pasid and without nesting nor SVM.
> > > 
> > > What?
> > >   * Enable IO page fault handling in user space in a non-nested, non-svm
> > >     and non-virtualised use case.
> > >   * Removing the relation between IOPF and INTEL_IOMMU_SVM by allowing
> > >     the user to (de)select the IOPF code through Kconfig.
> > >   * Create a new file under iommu/intel (prq.c) that contains all the
> > >     page request queue related logic and is not under intel/svm.c.
> > >   * Add the IOMMU_HWPT_FAULT_ID_VALID to the valid flags used to create
> > >     IOMMU_HWPT_ALLOC allocations.
> > >   * Create a default (zero) pasid handle and insert it to the pasid
> > >     array within the dev->iommu_group when replacing the old HWPT with
> > >     an iopf enabled HWPT.
> > > 
> > > Why?
> > >   The PCI ATS Extended Capability allows peripheral devices to
> > >   participate in the caching of translations when operating under an
> > >   IOMMU. Further, the ATS Page Request Interface (PRI) Extension allows
> > >   devices to handle missing mappings. Currently, PRI is mainly used in
> > >   the context of Shared Virtual Addressing, requiring support for the
> > >   Process Address Space Identifier (PASID) capability, but other use
> > >   cases such as enabling user-space driver driven device verification
> > >   and reducing memory pinning exist. This patchset sets out to enable
> > >   these use cases.
> > 
> Sorry for the late reply; I'm slowly getting through my backlog after PTO.
> 
> > I definitely expect PRI to work outside PASID and SVA cases, so this
> > is going in a good direction
> This touches on a detail (at least in Intel's vtd-io spec) that is not
> 100% clear to me. Second paragraph of section "3.4.3 Scalable Mode
> Address Translation" reads:
> "
>   ... Scalable-mode context-entries support both requests-without-PASID
>   and requests-with-PASID. However unlike legacy mode, in scalable-mode,
>   requests-without-PASID obtain a PASID value from the RID_PASID field of
>   the scalable-mode context-entry and are processed similarly to
>   requests-with-PASID. Implementations not supporting RID_PASID capability
>   (ECAP_REG.RPS is 0b), use a PASID value of 0 to perform address
>   translation for requests without PASID.
> "
> This basically means that a default PASID is used even though the
> request is without PASID. Right? Therefore "outside PASID" means with
> the default PASID (at least in Intel's case). Right?
This is something that relates to patches 5/6 and 6/6 of this set,
and may be more of a question for Lu Baolu.

Best

-- 

Joel Granados

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-09-02 10:48   ` Joel Granados
  2024-09-02 11:06     ` Joel Granados
@ 2024-09-02 12:47     ` Baolu Lu
  2024-09-03 13:20       ` Joel Granados
  2024-09-04 16:13     ` Jason Gunthorpe
  2 siblings, 1 reply; 25+ messages in thread
From: Baolu Lu @ 2024-09-02 12:47 UTC (permalink / raw)
  To: Joel Granados, Jason Gunthorpe
  Cc: baolu.lu, Klaus Jensen, David Woodhouse, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On 2024/9/2 18:48, Joel Granados wrote:
>> I definitely expect PRI to work outside PASID and SVA cases, so this
>> is going in a good direction
> This touches on a detail (at least in Intel's vtd-io spec) that is not
> 100% clear to me. Second paragraph of section "3.4.3 Scalable Mode
> Address Translation" reads:
> "
>    ... Scalable-mode context-entries support both requests-without-PASID
>    and requests-with-PASID. However unlike legacy mode, in scalable-mode,
>    requests-without-PASID obtain a PASID value from the RID_PASID field of
>    the scalable-mode context-entry and are processed similarly to
>    requests-with-PASID. Implementations not supporting RID_PASID capability
>    (ECAP_REG.RPS is 0b), use a PASID value of 0 to perform address
>    translation for requests without PASID.
> "
> This basically means that a default PASID is used even though the
> request is without PASID. Right? Therefore "outside PASID" means with
> the default PASID (at least in Intel's case). Right?

Kind of yes.

The PCI specification defines the concept of PASID and its role in
transaction routing. We refer to PCI transactions with a PASID prefix as
"request-with-PASID" and those without a PASID prefix as "request-
without-PASID." Consequently, I understand 'outside PASID' to mean
transactions that do not have a PASID prefix.

The VT-d specification describes how the IOMMU hardware handles request-
without-PASID. It uses a reserved PASID for its internal routing and
handling purposes. If RID_PASID is supported (ECAP_REG.RPS=1), software
can select its own reserved PASID. Otherwise, the IOMMU hardware will
use a default value of 0.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-09-02 12:47     ` Baolu Lu
@ 2024-09-03 13:20       ` Joel Granados
  2024-09-04  1:37         ` Baolu Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Joel Granados @ 2024-09-03 13:20 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jason Gunthorpe, Klaus Jensen, David Woodhouse, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On Mon, Sep 02, 2024 at 08:47:21PM +0800, Baolu Lu wrote:
> On 2024/9/2 18:48, Joel Granados wrote:
> >> I definitely expect PRI to work outside PASID and SVA cases, so this
> >> is going in a good direction
> > This touches on a detail (at least in Intel's vtd-io spec) that is not
> > 100% clear to me. Second paragraph of section "3.4.3 Scalable Mode
> > Address Translation" reads:
> > "
> >    ... Scalable-mode context-entries support both requests-without-PASID
> >    and requests-with-PASID. However unlike legacy mode, in scalable-mode,
> >    requests-without-PASID obtain a PASID value from the RID_PASID field of
> >    the scalable-mode context-entry and are processed similarly to
> >    requests-with-PASID. Implementations not supporting RID_PASID capability
> >    (ECAP_REG.RPS is 0b), use a PASID value of 0 to perform address
> >    translation for requests without PASID.
> > "
> > This basically means that a default PASID is used even though the
> > request is without PASID. Right? Therefore "outside PASID" means with
> > the default PASID (at least in Intel's case). Right?
> 
> Kind of yes.
> 
> The PCI specification defines the concept of PASID and its role in
> transaction routing. We refer to PCI transactions with a PASID prefix as
> "request-with-PASID" and those without a PASID prefix as "request-
> without-PASID." Consequently, I understand 'outside PASID' to mean
> transactions that do not have a PASID prefix.
> 
> The VT-d specification describes how the IOMMU hardware handles request-
> without-PASID. It uses a reserved PASID for its internal routing and
> handling purposes. If RID_PASID is supported (ECAP_REG.RPS=1), software
> can select its own reserved PASID. Otherwise, the IOMMU hardware will
> use a default value of 0.
> 
Thx for getting back to me. This generates another doubt in my head
regarding the capabilities published by the Intel IOMMU hardware:

So ecap_pasid [1] does not have to be set in scalable-mode. Right? This
allows hardware supporting scalable-mode to reject transactions with
PASID whenever ecap_pasid is *NOT* set; even though internally things
are handled with a PASID. This question is directly related to the two
last patches in the set: 5/6 and 6/6.

Best

[1] I use ecap_pasid like in the kernel `ecap_pasid(e)  (e>>40) & 0x1`

-- 

Joel Granados

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-09-03 13:20       ` Joel Granados
@ 2024-09-04  1:37         ` Baolu Lu
  2024-09-04 10:05           ` Joel Granados
  0 siblings, 1 reply; 25+ messages in thread
From: Baolu Lu @ 2024-09-04  1:37 UTC (permalink / raw)
  To: Joel Granados
  Cc: baolu.lu, Jason Gunthorpe, Klaus Jensen, David Woodhouse,
	Joerg Roedel, Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im,
	linux-kernel, iommu, Klaus Jensen

On 9/3/24 9:20 PM, Joel Granados wrote:
> On Mon, Sep 02, 2024 at 08:47:21PM +0800, Baolu Lu wrote:
>> On 2024/9/2 18:48, Joel Granados wrote:
>>>> I definitely expect PRI to work outside PASID and SVA cases, so this
>>>> is going in a good direction
>>> This touches on a detail (at least in Intel's vtd-io spec) that is not
>>> 100% clear to me. Second paragraph of section "3.4.3 Scalable Mode
>>> Address Translation" reads:
>>> "
>>>     ... Scalable-mode context-entries support both requests-without-PASID
>>>     and requests-with-PASID. However unlike legacy mode, in scalable-mode,
>>>     requests-without-PASID obtain a PASID value from the RID_PASID field of
> >>     the scalable-mode context-entry and are processed similarly to
> >>     requests-with-PASID. Implementations not supporting RID_PASID capability
>>>     (ECAP_REG.RPS is 0b), use a PASID value of 0 to perform address
>>>     translation for requests without PASID.
>>> "
>>> This basically means that a default PASID is used even though the
>>> request is without PASID. Right? Therefore "outside PASID" means with
>>> the default PASID (at least in Intel's case). Right?
>> Kind of yes.
>>
>> The PCI specification defines the concept of PASID and its role in
>> transaction routing. We refer to PCI transactions with a PASID prefix as
>> "request-with-PASID" and those without a PASID prefix as "request-
>> without-PASID." Consequently, I understand 'outside PASID' to mean
>> transactions that do not have a PASID prefix.
>>
>> The VT-d specification describes how the IOMMU hardware handles request-
>> without-PASID. It uses a reserved PASID for its internal routing and
>> handling purposes. If RID_PASID is supported (ECAP_REG.RPS=1), software
>> can select its own reserved PASID. Otherwise, the IOMMU hardware will
>> use a default value of 0.
>>
> Thx for getting back to me. This generates another doubt in my head
> regarding the capabilities published by the Intel IOMMU hardware:
> 
> So ecap_pasid [1] does not have to be set in scalable-mode. Right? This
> allows hardware supporting scalable-mode to reject transactions with
> PASID whenever ecap_pasid is *NOT* set; even though internally things
> are handled with a PASID. This question is directly related to the two
> last patches in the set: 5/6 and 6/6.

Yes. And 5/6, 6/6 make sense to me. We should remove the PASID
restriction from the code once PRI is split from SVA.

Thanks,
baolu

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM
  2024-09-01  5:16   ` Baolu Lu
@ 2024-09-04  8:39     ` Joel Granados
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Granados @ 2024-09-04  8:39 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Klaus Jensen, David Woodhouse, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian, Minwoo Im,
	linux-kernel, iommu, Klaus Jensen

On Sun, Sep 01, 2024 at 01:16:43PM +0800, Baolu Lu wrote:
> On 2024/8/26 19:40, Klaus Jensen wrote:
> > From: Joel Granados<j.granados@samsung.com>
> > 
> > IO page faults are no longer dependent on CONFIG_INTEL_IOMMU_SVM. Move
> > all Page Request Queue (PRQ) functions that handle prq events to a new
> > file in drivers/iommu/intel/prq.c. The page_req_des struct is made
> > available in drivers/iommu/intel/iommu.h.
> > 
> > No functional changes are intended. This is a preparation patch to
> > enable the use of IO page faults outside the SVM and nested use cases.
> > 
> > Signed-off-by: Joel Granados<j.granados@samsung.com>
> > ---
> >   drivers/iommu/intel/Makefile |   2 +-
> >   drivers/iommu/intel/iommu.c  |  18 +--
> >   drivers/iommu/intel/iommu.h  |  40 +++++-
> >   drivers/iommu/intel/prq.c    | 290 ++++++++++++++++++++++++++++++++++++++++
> >   drivers/iommu/intel/svm.c    | 308 -------------------------------------------
> >   5 files changed, 331 insertions(+), 327 deletions(-)
...
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index b67c14da1240..b3d98e706ed8 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -694,6 +694,35 @@ struct iommu_pmu {
> >   #define IOMMU_IRQ_ID_OFFSET_PRQ		(DMAR_UNITS_SUPPORTED)
> >   #define IOMMU_IRQ_ID_OFFSET_PERF	(2 * DMAR_UNITS_SUPPORTED)
> >   
> > +/* Page request queue descriptor */
> > +struct page_req_dsc {
> > +	union {
> > +		struct {
> > +			u64 type:8;
> > +			u64 pasid_present:1;
> > +			u64 rsvd:7;
> > +			u64 rid:16;
> > +			u64 pasid:20;
> > +			u64 exe_req:1;
> > +			u64 pm_req:1;
> > +			u64 rsvd2:10;
> > +		};
> > +		u64 qw_0;
> > +	};
> > +	union {
> > +		struct {
> > +			u64 rd_req:1;
> > +			u64 wr_req:1;
> > +			u64 lpig:1;
> > +			u64 prg_index:9;
> > +			u64 addr:52;
> > +		};
> > +		u64 qw_1;
> > +	};
> > +	u64 qw_2;
> > +	u64 qw_3;
> > +};
> 
> Why not move this structure to prq.c? It is specific to that file. Or
> not?
I had left it here because it is referenced from `struct intel_iommu`,
but since that member is just a pointer, I think it can be moved to
prq.c if that makes more sense.

> 
> > +
> >   struct intel_iommu {
> >   	void __iomem	*reg; /* Pointer to hardware regs, virtual addr */
> >   	u64 		reg_phys; /* physical address of hw register set */
> > @@ -719,12 +748,10 @@ struct intel_iommu {
> >   
> >   	struct iommu_flush flush;
> >   #endif
> > -#ifdef CONFIG_INTEL_IOMMU_SVM
> >   	struct page_req_dsc *prq;
> >   	unsigned char prq_name[16];    /* Name for PRQ interrupt */
> >   	unsigned long prq_seq_number;
> >   	struct completion prq_complete;
> > -#endif
> >   	struct iopf_queue *iopf_queue;
> >   	unsigned char iopfq_name[16];
> >   	/* Synchronization between fault report and iommu device release. */
> > @@ -1156,12 +1183,13 @@ void intel_context_flush_present(struct device_domain_info *info,
> >   				 struct context_entry *context,
> >   				 bool affect_domains);
> >   
> > +int intel_enable_prq(struct intel_iommu *iommu);
> > +int intel_finish_prq(struct intel_iommu *iommu);
> > +void intel_page_response(struct device *dev, struct iopf_fault *evt,
> > +			struct iommu_page_response *msg);
> > +
> >   #ifdef CONFIG_INTEL_IOMMU_SVM
> >   void intel_svm_check(struct intel_iommu *iommu);
> > -int intel_svm_enable_prq(struct intel_iommu *iommu);
> > -int intel_svm_finish_prq(struct intel_iommu *iommu);
> > -void intel_svm_page_response(struct device *dev, struct iopf_fault *evt,
> > -			     struct iommu_page_response *msg);
> >   struct iommu_domain *intel_svm_domain_alloc(struct device *dev,
> >   					    struct mm_struct *mm);
> >   void intel_drain_pasid_prq(struct device *dev, u32 pasid);
> > diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
> > new file mode 100644
> > index 000000000000..2814373e95d8
> > --- /dev/null
> > +++ b/drivers/iommu/intel/prq.c
> > @@ -0,0 +1,290 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright © 2015 Intel Corporation.
> > + *
> > + * Authors: David Woodhouse<dwmw2@infradead.org>
> 
> Many contributors have worked on the code moved in this change. The
> original authorship is no longer relevant.
> 
> Consider adding a comment like 'Split from svm.c' to document the
> origin.
Good point. Will do that.

> > + */
> > +
> > +#include <linux/pci.h>
> > +
> > +#include "iommu.h"
> > +#include "../iommu-pages.h"
> > +#include "trace.h"
> > +
> > +static bool is_canonical_address(u64 addr)
> > +{
> > +	int shift = 64 - (__VIRTUAL_MASK_SHIFT + 1);
> > +	long saddr = (long) addr;
> > +
> > +	return (((saddr << shift) >> shift) == saddr);
> > +}
...
> > +			QI_PGRP_RESP_CODE(msg->code) |
> > +			QI_PGRP_RESP_TYPE;
> > +	desc.qw1 = QI_PGRP_IDX(prm->grpid) | QI_PGRP_LPIG(last_page);
> > +	desc.qw2 = 0;
> > +	desc.qw3 = 0;
> > +
> > +	qi_submit_sync(iommu, &desc, 1, 0);
> > +}
> 
> The intel_drain_pasid_prq() helper should be moved to prq.c. It's no
> longer specific to SVM.
Oops. missed that one. Will do.

Will address these and the rest of the comments in my V1

Thx for the review

Best

-- 

Joel Granados

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM
  2024-08-26 13:09   ` Baolu Lu
@ 2024-09-04  9:12     ` Joel Granados
  2024-09-04 11:07       ` Baolu Lu
  0 siblings, 1 reply; 25+ messages in thread
From: Joel Granados @ 2024-09-04  9:12 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Klaus Jensen, David Woodhouse, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian, Minwoo Im,
	linux-kernel, iommu, Klaus Jensen

On Mon, Aug 26, 2024 at 09:09:14PM +0800, Baolu Lu wrote:
> On 2024/8/26 19:40, Klaus Jensen wrote:
> > From: Joel Granados<j.granados@samsung.com>
> > 
> > IO page faults are no longer dependent on CONFIG_INTEL_IOMMU_SVM. Move
> > all Page Request Queue (PRQ) functions that handle prq events to a new
> > file in drivers/iommu/intel/prq.c. The page_req_des struct is made
> > available in drivers/iommu/intel/iommu.h.
> > 
> > No functional changes are intended. This is a preparation patch to
> > enable the use of IO page faults outside the SVM and nested use cases.
> > 
> > Signed-off-by: Joel Granados<j.granados@samsung.com>
> > ---
> >   drivers/iommu/intel/Makefile |   2 +-
> >   drivers/iommu/intel/iommu.c  |  18 +--
> >   drivers/iommu/intel/iommu.h  |  40 +++++-
> >   drivers/iommu/intel/prq.c    | 290 ++++++++++++++++++++++++++++++++++++++++
> >   drivers/iommu/intel/svm.c    | 308 -------------------------------------------
> >   5 files changed, 331 insertions(+), 327 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
> > index c8beb0281559..d3bb0798092d 100644
> > --- a/drivers/iommu/intel/Makefile
> > +++ b/drivers/iommu/intel/Makefile
> > @@ -1,6 +1,6 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >   obj-$(CONFIG_DMAR_TABLE) += dmar.o
> > -obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o
> > +obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o prq.o
> >   obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
> >   obj-$(CONFIG_DMAR_PERF) += perf.o
> >   obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
> 
> Thanks for the patch! Now that IOPF is separate from SVA, the Kconfig
> needs to be updated accordingly.
> 
> diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
> index f52fb39c968e..2888671c9278 100644
> --- a/drivers/iommu/intel/Kconfig
> +++ b/drivers/iommu/intel/Kconfig
> @@ -15,6 +15,7 @@ config INTEL_IOMMU
>          select DMA_OPS
>          select IOMMU_API
>          select IOMMU_IOVA
> +       select IOMMU_IOPF
This will force IOMMU_IOPF on whenever INTEL_IOMMU is selected. You do
this in order to use IOPF if available, but it won't conflict with Intel
IOMMUs that do not support IOPF, right?

>          select IOMMUFD_DRIVER if IOMMUFD
>          select NEED_DMA_MAP_STATE
>          select DMAR_TABLE
> @@ -51,7 +52,6 @@ config INTEL_IOMMU_SVM
>          depends on X86_64
>          select MMU_NOTIFIER
>          select IOMMU_SVA
> -       select IOMMU_IOPF
I already had this one
>          help
>            Shared Virtual Memory (SVM) provides a facility for devices
>            to access DMA resources through process address space by
> 
> Thanks,
> baolu

Thx for the review

best
-- 

Joel Granados

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 2/6] iommu: Make IOMMU_IOPF selectable in Kconfig
  2024-08-26 14:05   ` Jason Gunthorpe
@ 2024-09-04  9:44     ` Joel Granados
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Granados @ 2024-09-04  9:44 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Klaus Jensen, David Woodhouse, Lu Baolu, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On Mon, Aug 26, 2024 at 11:05:58AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 26, 2024 at 01:40:28PM +0200, Klaus Jensen wrote:
> > From: Joel Granados <j.granados@samsung.com>
> > 
> > IOMMU_IOPF is no longer selectable through INTEL_IOMMU_SVM effectively
> > severing their relation and allowing them to be used independently.
> > 
> > Signed-off-by: Joel Granados <j.granados@samsung.com>
> > ---
> >  drivers/iommu/Kconfig       | 2 +-
> >  drivers/iommu/intel/Kconfig | 1 -
> >  2 files changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index a82f10054aec..d3ee8a0ad4a6 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -164,7 +164,7 @@ config IOMMU_SVA
> >  	bool
> >  
> >  config IOMMU_IOPF
> > -	bool
> > +	bool "Enable IO page fault in IOMMU"
> 
> 
> Currently IOMMU_IOPF indicates that the driver wants to consume the
> library functions around IOPF, it is not a user selectable because any
> driver that links to those functions should have them working. If you
> want to make the core driver use them then the select should be moved
> from the SVM sub config to the core driver config.
This is in line with the feedback that I got from Lu. I'll put
IOMMU_IOPF under INTEL_IOMMU and drop this commit from the set.
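For reference, the end state agreed on here would look roughly like the following Kconfig fragment (a sketch based on Lu's earlier diff; unrelated select/depends lines elided):

```kconfig
config INTEL_IOMMU
	select IOMMU_API
	select IOMMU_IOVA
	select IOMMU_IOPF

config INTEL_IOMMU_SVM
	select MMU_NOTIFIER
	select IOMMU_SVA
	# IOMMU_IOPF is no longer selected here; INTEL_IOMMU pulls it in
```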

Thx.

> 
> If you want to change IOMMU_IOPF to a user configurable then it should
> have a full help text and all the kconfig places touching it should be
> turned into
> 
>   depends IOMMU_IOPF
> 
> Ie you can't enable any drivers SVA kconfig without also picking
> IOMMU_IOPF.
> 
> This is doing neither fully.. Pick one :)
> 
> Jason

-- 

Joel Granados

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-09-04  1:37         ` Baolu Lu
@ 2024-09-04 10:05           ` Joel Granados
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Granados @ 2024-09-04 10:05 UTC (permalink / raw)
  To: Baolu Lu
  Cc: Jason Gunthorpe, Klaus Jensen, David Woodhouse, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On Wed, Sep 04, 2024 at 09:37:35AM +0800, Baolu Lu wrote:
> On 9/3/24 9:20 PM, Joel Granados wrote:
> > On Mon, Sep 02, 2024 at 08:47:21PM +0800, Baolu Lu wrote:
> >> On 2024/9/2 18:48, Joel Granados wrote:
> >>>> I definitely expect PRI to work outside PASID and SVA cases, so this
> >>>> is going in a good direction
> >>> This touches on a detail (at least in Intel's vtd-io spec) that is not
> >>> 100% clear to me. Second paragraph of section "3.4.3 Scalable Mode
> >>> Address Translation" reads:
> >>> "
> >>>     ... Scalable-mode context-entries support both requests-without-PASID
> >>>     and requests-with-PASID. However unlike legacy mode, in scalable-mode,
> >>>     requests-without-PASID obtain a PASID value from the RID_PASID field of
> >>>     the scalable-mode context-entry and are processed similarly to
> >>>     requests-with-PASID. Implementations not supporting RID_PASID capability
> >>>     (ECAP_REG.RPS is 0b), use a PASID value of 0 to perform address
> >>>     translation for requests without PASID.
> >>> "
> >>> This basically means that a default PASID is used even though the
> >>> request is without PASID. Right? Therefore "outside PASID" means with
> >>> the default PASID (at least in Intel's case). Right?
> >> Kind of yes.
> >>
> >> The PCI specification defines the concept of PASID and its role in
> >> transaction routing. We refer to PCI transactions with a PASID prefix as
> >> "request-with-PASID" and those without a PASID prefix as "request-
> >> without-PASID." Consequently, I understand 'outside PASID' to mean
> >> transactions that do not have a PASID prefix.
> >>
> >> The VT-d specification describes how the IOMMU hardware handles request-
> >> without-PASID. It uses a reserved PASID for its internal routing and
> >> handling purposes. If RID_PASID is supported (ECAP_REG.RPS=1), software
> >> can select its own reserved PASID. Otherwise, the IOMMU hardware will
> >> use a default value of 0.
> >>
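[Editor's note] The PASID-selection rule quoted above can be sketched as follows; this is an illustration only, and the parameter names ("rps_supported" for ECAP_REG.RPS, "rid_pasid" for the RID_PASID field of the scalable-mode context entry) are not taken from the driver:

```c
#include <stdint.h>

/*
 * Illustration of the quoted VT-d behavior: a request-without-PASID is
 * still translated under a PASID. If RID_PASID is supported, software
 * chooses the reserved PASID; otherwise hardware falls back to PASID 0.
 */
uint32_t effective_pasid(int rps_supported, uint32_t rid_pasid)
{
	if (rps_supported)
		return rid_pasid;	/* software-chosen reserved PASID */
	return 0;			/* hardware default when RPS is 0b */
}
```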
> > Thx for getting back to me. This generates another doubt in my head
> > regarding the published capabilities from the Intel IOMMU hardware:
> > 
> > So ecap_pasid [1] does not have to be set in scalable-mode. Right? This
> > allows hardware supporting scalable-mode to reject transactions with
> > PASID whenever ecap_pasid is *NOT* set; even though internally things
> > are handled with a PASID. This question is directly related to the last
> > two patches in the set, 5/6 and 6/6.
> 
> Yes. And 5/6, 6/6 make sense to me. We should remove the PASID
> restriction from the code once PRI is split from SVA.

Thx for the clarification. I'll make sure to include them in my V1.

Best

-- 

Joel Granados


* Re: [PATCH RFC PREVIEW 6/6] iommu/vt-d: do not require a PASID in page requests
  2024-08-26 11:40 ` [PATCH RFC PREVIEW 6/6] iommu/vt-d: do not require a PASID in page requests Klaus Jensen
@ 2024-09-04 10:19   ` Joel Granados
  2024-09-04 11:39     ` Joel Granados
  0 siblings, 1 reply; 25+ messages in thread
From: Joel Granados @ 2024-09-04 10:19 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian, Minwoo Im,
	linux-kernel, iommu, Klaus Jensen

On Mon, Aug 26, 2024 at 01:40:32PM +0200, Klaus Jensen wrote:
> From: Klaus Jensen <k.jensen@samsung.com>
> 
> PRQ events can now be handled without a PASID being present. Remove the
> restriction.
> 
> Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> ---
>  drivers/iommu/intel/prq.c | 12 +++---------
>  1 file changed, 3 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
> index 2814373e95d8..cc36198ebf91 100644
> --- a/drivers/iommu/intel/prq.c
> +++ b/drivers/iommu/intel/prq.c
> @@ -101,18 +101,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>  		req = &iommu->prq[head / sizeof(*req)];
>  		address = (u64)req->addr << VTD_PAGE_SHIFT;
>  
> -		if (unlikely(!req->pasid_present)) {
> -			pr_err("IOMMU: %s: Page request without PASID\n",
> -			       iommu->name);
> -bad_req:
> -			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
> -			goto prq_advance;
> -		}
> -
>  		if (unlikely(!is_canonical_address(address))) {
>  			pr_err("IOMMU: %s: Address is not canonical\n",
>  			       iommu->name);
> -			goto bad_req;
> +bad_req:
> +			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
> +			goto prq_advance;
>  		}
>  
>  		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {
> 
> -- 
> 2.45.2
> 

I'll squash this commit with the one that introduces the prq.c file.
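[Editor's note] For reference, a sketch of the canonical-address rule enforced by the check that remains after this patch. The 48-bit width is an assumption for illustration; the in-tree helper derives the width from the CPU, so do not treat this as the driver's implementation:

```c
#include <stdint.h>

/*
 * Canonical-address rule assuming 48-bit virtual addressing: bits 63:47
 * must all equal bit 47 (sign extension). A page request whose address
 * fails this check takes the bad_req path and is answered with
 * QI_RESP_INVALID.
 */
int is_canonical_48(uint64_t addr)
{
	int64_t sext = (int64_t)(addr << 16) >> 16;	/* sign-extend bit 47 */
	return (uint64_t)sext == addr;
}
```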

Best

-- 

Joel Granados


* Re: [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM
  2024-09-04  9:12     ` Joel Granados
@ 2024-09-04 11:07       ` Baolu Lu
  0 siblings, 0 replies; 25+ messages in thread
From: Baolu Lu @ 2024-09-04 11:07 UTC (permalink / raw)
  To: Joel Granados
  Cc: baolu.lu, Klaus Jensen, David Woodhouse, Joerg Roedel,
	Will Deacon, Robin Murphy, Jason Gunthorpe, Kevin Tian, Minwoo Im,
	linux-kernel, iommu, Klaus Jensen

On 2024/9/4 17:12, Joel Granados wrote:
> On Mon, Aug 26, 2024 at 09:09:14PM +0800, Baolu Lu wrote:
>> On 2024/8/26 19:40, Klaus Jensen wrote:
>>> From: Joel Granados<j.granados@samsung.com>
>>>
>>> IO page faults are no longer dependent on CONFIG_INTEL_IOMMU_SVM. Move
>>> all Page Request Queue (PRQ) functions that handle prq events to a new
>>> file in drivers/iommu/intel/prq.c. The page_req_des struct is made
>>> available in drivers/iommu/intel/iommu.h.
>>>
>>> No functional changes are intended. This is a preparation patch to
>>> enable the use of IO page faults outside the SVM and nested use cases.
>>>
>>> Signed-off-by: Joel Granados<j.granados@samsung.com>
>>> ---
>>>    drivers/iommu/intel/Makefile |   2 +-
>>>    drivers/iommu/intel/iommu.c  |  18 +--
>>>    drivers/iommu/intel/iommu.h  |  40 +++++-
>>>    drivers/iommu/intel/prq.c    | 290 ++++++++++++++++++++++++++++++++++++++++
>>>    drivers/iommu/intel/svm.c    | 308 -------------------------------------------
>>>    5 files changed, 331 insertions(+), 327 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
>>> index c8beb0281559..d3bb0798092d 100644
>>> --- a/drivers/iommu/intel/Makefile
>>> +++ b/drivers/iommu/intel/Makefile
>>> @@ -1,6 +1,6 @@
>>>    # SPDX-License-Identifier: GPL-2.0
>>>    obj-$(CONFIG_DMAR_TABLE) += dmar.o
>>> -obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o
>>> +obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o cache.o prq.o
>>>    obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
>>>    obj-$(CONFIG_DMAR_PERF) += perf.o
>>>    obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
>> Thanks for the patch! Now that IOPF is separate from SVA, the Kconfig
>> needs to be updated accordingly.
>>
>> diff --git a/drivers/iommu/intel/Kconfig b/drivers/iommu/intel/Kconfig
>> index f52fb39c968e..2888671c9278 100644
>> --- a/drivers/iommu/intel/Kconfig
>> +++ b/drivers/iommu/intel/Kconfig
>> @@ -15,6 +15,7 @@ config INTEL_IOMMU
>>           select DMA_OPS
>>           select IOMMU_API
>>           select IOMMU_IOVA
>> +       select IOMMU_IOPF
> This will force IOMMU_IOPF when INTEL_IOMMU is selected. You do this in
> order to use IOPF if available, but it won't conflict with Intel IOMMUs
> that do not support IOPF. Right?

Intel VT-d includes a bit in the capability registers that indicates
whether PRI is supported by the hardware. Therefore, there should be no
conflict.
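[Editor's note] The runtime gating described here can be sketched as below. The bit position is an assumption for illustration only and must be checked against the VT-d specification; the constant name is invented:

```c
#include <stdint.h>

/*
 * Why "select IOMMU_IOPF" does not conflict with non-PRI hardware: VT-d
 * advertises Page Request Support in the extended capability register,
 * so the driver can gate page-request-queue setup at runtime. The bit
 * index below is assumed, not lifted from the spec.
 */
#define ECAP_PRS_SHIFT 29	/* assumed location of the PRS bit */

int iommu_supports_pri(uint64_t ecap)
{
	return (int)((ecap >> ECAP_PRS_SHIFT) & 1);
}
```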

Thanks,
baolu


* Re: [PATCH RFC PREVIEW 6/6] iommu/vt-d: do not require a PASID in page requests
  2024-09-04 10:19   ` Joel Granados
@ 2024-09-04 11:39     ` Joel Granados
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Granados @ 2024-09-04 11:39 UTC (permalink / raw)
  To: Klaus Jensen
  Cc: David Woodhouse, Lu Baolu, Joerg Roedel, Will Deacon,
	Robin Murphy, Jason Gunthorpe, Kevin Tian, Minwoo Im,
	linux-kernel, iommu, Klaus Jensen

On Wed, Sep 04, 2024 at 12:19:16PM +0200, Joel Granados wrote:
> On Mon, Aug 26, 2024 at 01:40:32PM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen <k.jensen@samsung.com>
> > 
> > PRQ events can now be handled without a PASID being present. Remove the
> > restriction.
> > 
> > Signed-off-by: Klaus Jensen <k.jensen@samsung.com>
> > ---
> >  drivers/iommu/intel/prq.c | 12 +++---------
> >  1 file changed, 3 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/prq.c b/drivers/iommu/intel/prq.c
> > index 2814373e95d8..cc36198ebf91 100644
> > --- a/drivers/iommu/intel/prq.c
> > +++ b/drivers/iommu/intel/prq.c
> > @@ -101,18 +101,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> >  		req = &iommu->prq[head / sizeof(*req)];
> >  		address = (u64)req->addr << VTD_PAGE_SHIFT;
> >  
> > -		if (unlikely(!req->pasid_present)) {
> > -			pr_err("IOMMU: %s: Page request without PASID\n",
> > -			       iommu->name);
> > -bad_req:
> > -			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
> > -			goto prq_advance;
> > -		}
> > -
> >  		if (unlikely(!is_canonical_address(address))) {
> >  			pr_err("IOMMU: %s: Address is not canonical\n",
> >  			       iommu->name);
> > -			goto bad_req;
> > +bad_req:
> > +			handle_bad_prq_event(iommu, req, QI_RESP_INVALID);
> > +			goto prq_advance;
> >  		}
> >  
> >  		if (unlikely(req->pm_req && (req->rd_req | req->wr_req))) {
> > 
> > -- 
> > 2.45.2
> > 
> 
> I'll squash this commit with the one that introduces the prq.c file.
On second thought, I'll leave it as a separate commit. The first one
will be the "copy all the code to a new file" commit and the second one
will be "remove the pasid check". I'll change the commit message a bit
so it is clear why they are not together.

Best

-- 

Joel Granados


* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-09-02 10:48   ` Joel Granados
  2024-09-02 11:06     ` Joel Granados
  2024-09-02 12:47     ` Baolu Lu
@ 2024-09-04 16:13     ` Jason Gunthorpe
  2024-09-09 14:46       ` Joel Granados
  2 siblings, 1 reply; 25+ messages in thread
From: Jason Gunthorpe @ 2024-09-04 16:13 UTC (permalink / raw)
  To: Joel Granados
  Cc: Klaus Jensen, David Woodhouse, Lu Baolu, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On Mon, Sep 02, 2024 at 12:48:19PM +0200, Joel Granados wrote:
> > >   Supplementary repositories supporting this patchset:
> > >     1. A user space library libvfn [1] which is used for testing and
> > >        verification (see examples/iopf.c), and
> > 
> > That's pretty neat, I've been wanting to see some kind of IOMMU test
> > suite based around a capable widely available device. This is the
> > closest I've seen..
>
> Yes! This is an obvious application of libvfn. Do you see it as
> something that can be included in tools/selftests/iommu?

Maybe? What would it look like in-kernel?

I've been thinking the same thing with mlx5

Maybe some kind of test runner with a plugin driver that has some kind
of 'do dma', 'generate interrupt', etc sort of operations, IDK.

Jason


* Re: [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases
  2024-09-04 16:13     ` Jason Gunthorpe
@ 2024-09-09 14:46       ` Joel Granados
  0 siblings, 0 replies; 25+ messages in thread
From: Joel Granados @ 2024-09-09 14:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Klaus Jensen, David Woodhouse, Lu Baolu, Joerg Roedel,
	Will Deacon, Robin Murphy, Kevin Tian, Minwoo Im, linux-kernel,
	iommu, Klaus Jensen

On Wed, Sep 04, 2024 at 01:13:50PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 02, 2024 at 12:48:19PM +0200, Joel Granados wrote:
> > > >   Supplementary repositories supporting this patchset:
> > > >     1. A user space library libvfn [1] which is used for testing and
> > > >        verification (see examples/iopf.c), and
> > > 
> > > That's pretty neat, I've been wanting to see some kind of IOMMU test
> > > suite based around a capable widely available device. This is the
> > > closest I've seen..
> >
> > Yes! This is an obvious application of libvfn. Do you see it as
> > something that can be included in tools/selftests/iommu?
> 
> Maybe? What would it look like in-kernel?
Having it in-kernel with libvfn might be a bit too much because we would
have to bring libvfn into the kernel sources. But we can have some sort
of DMA test suite that runs as CI. Similar to what fstests or blktests
do (dmatests?). And we can automate it all with kdevops.

Here is a very rough run-down of the idea:
1. We create (or use if there is one already) a DMA test suite that can
run on its own.
2. Use libvfn to create commands that poke at the iommu{,fd}, intel,
amd, arm drivers.
3. Use the qemu iommu implementation as well as pci dma enabled devices
as the test devices.
4. Can use hardware if it is detected.
5. Can use kdevops to bring up the different environments (e.g. kernel
and qemu branch combinations) needed for the test.
6. And finally put the kdevops test targets into some existing kernel CI
like linux-next or 0day (or whatever makes sense).

> 
> I've been thinking the same thing with mlx5
Not too familiar with this driver, but if it makes sense to reuse it (or
part of it) to make the test happen, I'm all for that.

> 
> Maybe some kind of test runner with a plugin driver that has some kind
> of 'do dma', 'generate interrupt', etc sort of operations, IDK.
Yes. If you are up for it, we can maybe discuss it a bit at LPC and
flesh it out a bit more?
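[Editor's note] A sketch of the kind of plugin interface discussed above — a runner driving devices through "do dma" / "generate interrupt" hooks. Every name here is hypothetical; nothing below exists in the kernel tree:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical operations table for a DMA test plugin: a generic test
 * runner drives any participating device through these hooks.
 */
struct dma_test_ops {
	const char *name;
	int (*do_dma)(void *dev, uint64_t iova, size_t len, int write);
	int (*gen_irq)(void *dev);
};

/* Stub plugin standing in for a real device driver. */
static int stub_do_dma(void *dev, uint64_t iova, size_t len, int write)
{
	(void)dev; (void)iova; (void)len; (void)write;
	return 0;	/* pretend the DMA transfer completed */
}

static int stub_gen_irq(void *dev)
{
	(void)dev;
	return 0;	/* pretend the interrupt fired */
}

const struct dma_test_ops stub_plugin = {
	.name = "stub",
	.do_dma = stub_do_dma,
	.gen_irq = stub_gen_irq,
};

/* Minimal runner: exercise each hook once, fail on the first error. */
int run_plugin(const struct dma_test_ops *ops)
{
	if (ops->do_dma(NULL, 0x1000, 4096, 1))
		return -1;
	if (ops->gen_irq(NULL))
		return -1;
	return 0;
}
```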

Best

-- 

Joel Granados


end of thread, other threads:[~2024-09-10  6:19 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-08-26 11:40 [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Klaus Jensen
2024-08-26 11:40 ` [PATCH RFC PREVIEW 1/6] iommu/vt-d: Separate page request queue from SVM Klaus Jensen
2024-08-26 13:09   ` Baolu Lu
2024-09-04  9:12     ` Joel Granados
2024-09-04 11:07       ` Baolu Lu
2024-09-01  5:16   ` Baolu Lu
2024-09-04  8:39     ` Joel Granados
2024-08-26 11:40 ` [PATCH RFC PREVIEW 2/6] iommu: Make IOMMU_IOPF selectable in Kconfig Klaus Jensen
2024-08-26 14:05   ` Jason Gunthorpe
2024-09-04  9:44     ` Joel Granados
2024-08-26 11:40 ` [PATCH RFC PREVIEW 3/6] iommufd: Enable PRI when doing the iommufd_hwpt_alloc Klaus Jensen
2024-08-26 11:40 ` [PATCH RFC PREVIEW 4/6] iommu: init pasid array while doing domain_replace and iopf is active Klaus Jensen
2024-08-26 11:40 ` [PATCH RFC PREVIEW 5/6] iommu/vt-d: drop pasid requirement for prq initialization Klaus Jensen
2024-08-26 11:40 ` [PATCH RFC PREVIEW 6/6] iommu/vt-d: do not require a PASID in page requests Klaus Jensen
2024-09-04 10:19   ` Joel Granados
2024-09-04 11:39     ` Joel Granados
2024-08-26 13:59 ` [PATCH RFC PREVIEW 0/6] iommu: enable user space iopfs in non-nested and non-svm cases Jason Gunthorpe
2024-09-02 10:48   ` Joel Granados
2024-09-02 11:06     ` Joel Granados
2024-09-02 12:47     ` Baolu Lu
2024-09-03 13:20       ` Joel Granados
2024-09-04  1:37         ` Baolu Lu
2024-09-04 10:05           ` Joel Granados
2024-09-04 16:13     ` Jason Gunthorpe
2024-09-09 14:46       ` Joel Granados
