* [PATCH v3 0/5]  Add support for Verisilicon IOMMU used by media codec blocks
@ 2025-06-19 13:12 Benjamin Gaignard
  2025-06-19 13:12 ` [PATCH v3 1/5] dt-bindings: vendor-prefixes: Add Verisilicon Benjamin Gaignard
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-19 13:12 UTC (permalink / raw)
  To: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel, Benjamin Gaignard

Hi all,

This patch series adds support for the Verisilicon IOMMU, which is found in front
of hardware encoder and decoder blocks in several SoCs using Verisilicon IP. 
A first implementation of this IOMMU is available on the Rockchip RK3588 SoC.

Rockchip provides a driver for this hardware in their 6.1 kernel branch:
https://github.com/rockchip-linux/kernel/blob/develop-6.1/drivers/iommu/rockchip-iommu-av1d.c

This series includes:
- a new binding for the Verisilicon IOMMU
- a basic driver implementation
- DT updates for RK3588

The driver was forward-ported from Rockchip’s 6.1 implementation, 
the prefix was renamed to vsi for generality, and several fixes were applied.

AV1 decoding was tested using the stateless VPU driver and Fluster.
The test results show a score of 205/239, which confirms that no regressions
were introduced by this series.

Feedback and testing welcome.

changes in version 3:
- Change compatible to "rockchip,rk3588-iommu-1.2"
- Fix compatible in .yaml
- Update DT and driver to use "rockchip,rk3588-iommu-1.2" compatible
- Set CONFIG_VSI_IOMMU as module in defconfig
- Create an identity domain for the driver
- Fix double flush issue
- Rework attach/detach logic
- Simplify xlate function
- Discover iommu device like done in ARM driver
- Remove ARM_DMA_USE_IOMMU from Kconfig

changes in version 2:
- Add a compatible "rockchip,rk3588-av1-iommu"
- Fix clock-names in binding 
- Remove "vsi_mmu" label in binding example.
- Rework driver probe function
- Remove double flush
- Rework driver internal structures and avoid allocating
  in the xlate function.
- Do not touch the VPU driver anymore (patch removed)
- Add a patch to enable the driver in arm64 defconfig

Benjamin Gaignard (5):
  dt-bindings: vendor-prefixes: Add Verisilicon
  dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU
  iommu: Add verisilicon IOMMU driver
  arm64: dts: rockchip: Add verisilicon IOMMU node on RK3588
  arm64: defconfig: enable Verisilicon IOMMU

 .../bindings/iommu/verisilicon,iommu.yaml     |  71 ++
 .../devicetree/bindings/vendor-prefixes.yaml  |   2 +
 arch/arm64/boot/dts/rockchip/rk3588-base.dtsi |  11 +
 arch/arm64/configs/defconfig                  |   1 +
 drivers/iommu/Kconfig                         |  11 +
 drivers/iommu/Makefile                        |   1 +
 drivers/iommu/vsi-iommu.c                     | 874 ++++++++++++++++++
 7 files changed, 971 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/verisilicon,iommu.yaml
 create mode 100644 drivers/iommu/vsi-iommu.c

-- 
2.43.0



* [PATCH v3 1/5] dt-bindings: vendor-prefixes: Add Verisilicon
  2025-06-19 13:12 [PATCH v3 0/5] Add support for Verisilicon IOMMU used by media codec blocks Benjamin Gaignard
@ 2025-06-19 13:12 ` Benjamin Gaignard
  2025-06-19 13:12 ` [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU Benjamin Gaignard
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-19 13:12 UTC (permalink / raw)
  To: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel, Benjamin Gaignard, Conor Dooley

Verisilicon Microelectronics is a company based in Shanghai, China,
developing hardware blocks for SoCs.

https://verisilicon.com/

Add their name to the list of vendors.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
---
 Documentation/devicetree/bindings/vendor-prefixes.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/vendor-prefixes.yaml b/Documentation/devicetree/bindings/vendor-prefixes.yaml
index 5d2a7a8d3ac6..1baf8304c9ac 100644
--- a/Documentation/devicetree/bindings/vendor-prefixes.yaml
+++ b/Documentation/devicetree/bindings/vendor-prefixes.yaml
@@ -1635,6 +1635,8 @@ patternProperties:
     description: Variscite Ltd.
   "^vdl,.*":
     description: Van der Laan b.v.
+  "^verisilicon,.*":
+    description: VeriSilicon Microelectronics
   "^vertexcom,.*":
     description: Vertexcom Technologies, Inc.
   "^via,.*":
-- 
2.43.0



* [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU
  2025-06-19 13:12 [PATCH v3 0/5] Add support for Verisilicon IOMMU used by media codec blocks Benjamin Gaignard
  2025-06-19 13:12 ` [PATCH v3 1/5] dt-bindings: vendor-prefixes: Add Verisilicon Benjamin Gaignard
@ 2025-06-19 13:12 ` Benjamin Gaignard
  2025-06-19 14:19   ` Sebastian Reichel
  2025-06-19 13:12 ` [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver Benjamin Gaignard
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-19 13:12 UTC (permalink / raw)
  To: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel, Benjamin Gaignard

Add a device tree binding for the Verisilicon (VSI) IOMMU.
This IOMMU sits in front of hardware encoder and decoder
blocks on SoCs using Verisilicon IP, such as the Rockchip RK3588.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
---
changes in version 3:
- Change compatible to "rockchip,rk3588-iommu-1.2"
- Fix compatible in .yaml

 .../bindings/iommu/verisilicon,iommu.yaml     | 71 +++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/verisilicon,iommu.yaml

diff --git a/Documentation/devicetree/bindings/iommu/verisilicon,iommu.yaml b/Documentation/devicetree/bindings/iommu/verisilicon,iommu.yaml
new file mode 100644
index 000000000000..8d0b0658ed8b
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/verisilicon,iommu.yaml
@@ -0,0 +1,71 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/iommu/verisilicon,iommu.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Verisilicon IOMMU
+
+maintainers:
+  - Benjamin Gaignard <benjamin.gaignard@collabora.com>
+
+description: |+
+  A Verisilicon IOMMU translates I/O virtual addresses to physical addresses for
+  its associated video decoder.
+
+properties:
+  compatible:
+    items:
+      - const: verisilicon,iommu
+      - const: rockchip,rk3588-iommu-1.2
+
+  reg:
+    maxItems: 1
+
+  interrupts:
+    maxItems: 1
+
+  clocks:
+    items:
+      - description: Core clock
+      - description: Interface clock
+
+  clock-names:
+    items:
+      - const: core
+      - const: iface
+
+  "#iommu-cells":
+    const: 0
+
+  power-domains:
+    maxItems: 1
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - "#iommu-cells"
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/clock/rockchip,rk3588-cru.h>
+    #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+    bus {
+      #address-cells = <2>;
+      #size-cells = <2>;
+
+      iommu@fdca0000 {
+        compatible = "verisilicon,iommu","rockchip,rk3588-iommu-1.2";
+        reg = <0x0 0xfdca0000 0x0 0x600>;
+        interrupts = <GIC_SPI 109 IRQ_TYPE_LEVEL_HIGH 0>;
+        clocks = <&cru ACLK_AV1>, <&cru PCLK_AV1>;
+        clock-names = "core", "iface";
+        #iommu-cells = <0>;
+      };
+    };
-- 
2.43.0



* [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-19 13:12 [PATCH v3 0/5] Add support for Verisilicon IOMMU used by media codec blocks Benjamin Gaignard
  2025-06-19 13:12 ` [PATCH v3 1/5] dt-bindings: vendor-prefixes: Add Verisilicon Benjamin Gaignard
  2025-06-19 13:12 ` [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU Benjamin Gaignard
@ 2025-06-19 13:12 ` Benjamin Gaignard
  2025-06-19 13:47   ` Jason Gunthorpe
  2025-06-20 19:37   ` Robin Murphy
  2025-06-19 13:12 ` [PATCH v3 4/5] arm64: dts: rockchip: Add verisilicon IOMMU node on RK3588 Benjamin Gaignard
  2025-06-19 13:12 ` [PATCH v3 5/5] arm64: defconfig: enable Verisilicon IOMMU Benjamin Gaignard
  4 siblings, 2 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-19 13:12 UTC (permalink / raw)
  To: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel, Benjamin Gaignard

The Verisilicon IOMMU hardware block can be found in combination
with Verisilicon hardware video codecs (encoders or decoders) on
different SoCs.
Enabling it allows us to use non-contiguous memory allocators
for Verisilicon video codecs.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
---
changes in version 3:
- Change compatible to "rockchip,rk3588-iommu-1.2"
- Create an identity domain for the driver
- Fix double flush issue
- Rework attach/detach logic
- Simplify xlate function
- Discover iommu device like done in ARM driver
- Remove ARM_DMA_USE_IOMMU from Kconfig

 drivers/iommu/Kconfig     |  11 +
 drivers/iommu/Makefile    |   1 +
 drivers/iommu/vsi-iommu.c | 874 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 886 insertions(+)
 create mode 100644 drivers/iommu/vsi-iommu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0a33d995d15d..3e95d1db737b 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -383,4 +383,15 @@ config SPRD_IOMMU
 
 	  Say Y here if you want to use the multimedia devices listed above.
 
+config VSI_IOMMU
+	tristate "Verisilicon IOMMU Support"
+	depends on ARM64
+	select IOMMU_API
+	help
+	  Support for IOMMUs used by Verisilicon sub-systems like video
+	  decoders or encoder hardware blocks.
+
+	  Say Y here if you want to use this IOMMU in front of these
+	  hardware blocks.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 355294fa9033..68aeff31af8b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -34,3 +34,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
 obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
 obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
 obj-$(CONFIG_APPLE_DART) += apple-dart.o
+obj-$(CONFIG_VSI_IOMMU) += vsi-iommu.o
diff --git a/drivers/iommu/vsi-iommu.c b/drivers/iommu/vsi-iommu.c
new file mode 100644
index 000000000000..89e63a6a60c1
--- /dev/null
+++ b/drivers/iommu/vsi-iommu.c
@@ -0,0 +1,874 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (C) 2025 Collabora Ltd.
+ *
+ * IOMMU API for Verisilicon
+ *
+ * Module Authors:	Yandong Lin <yandong.lin@rock-chips.com>
+ *			Simon Xue <xxm@rock-chips.com>
+ *			Benjamin Gaignard <benjamin.gaignard@collabora.com>
+ */
+
+#include <linux/clk.h>
+#include <linux/compiler.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/dma-mapping.h>
+#include <linux/dma-map-ops.h>
+#include <linux/errno.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iommu.h>
+#include <linux/iopoll.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/of_iommu.h>
+#include <linux/of_platform.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "iommu-pages.h"
+
+struct vsi_iommu {
+	struct device *dev;
+	void __iomem **bases;
+	int num_mmu;
+	int num_irq;
+	struct clk_bulk_data *clocks;
+	int num_clocks;
+	struct iommu_device iommu;
+	struct list_head node; /* entry in vsi_iommu_domain.iommus */
+	struct iommu_domain *domain; /* domain to which iommu is attached */
+};
+
+struct vsi_iommu_domain {
+	struct device *dma_dev;
+	struct list_head iommus;
+	u32 *dt; /* page directory table */
+	dma_addr_t dt_dma;
+	spinlock_t iommus_lock; /* lock for iommus list */
+	spinlock_t dt_lock; /* lock for modifying page directory table */
+	struct iommu_domain domain;
+	/* for vsi iommu */
+	u64 *pta; /* page table array, entry 0 points to the dt */
+	dma_addr_t pta_dma;
+};
+
+static struct iommu_domain vsi_identity_domain;
+
+#define NUM_DT_ENTRIES	1024
+#define NUM_PT_ENTRIES	1024
+#define PT_SIZE		(NUM_PT_ENTRIES * sizeof(u32))
+
+#define SPAGE_SIZE	BIT(12)
+
+/* vsi iommu regs address */
+#define VSI_MMU_CONFIG1_BASE			0x1ac
+#define VSI_MMU_AHB_EXCEPTION_BASE		0x380
+#define VSI_MMU_AHB_CONTROL_BASE		0x388
+#define VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE	0x38C
+
+/* MMU register offsets */
+#define VSI_MMU_FLUSH_BASE		0x184
+#define VSI_MMU_BIT_FLUSH		BIT(4)
+
+#define VSI_MMU_PAGE_FAULT_ADDR		0x380
+#define VSI_MMU_STATUS_BASE		0x384	/* IRQ status */
+
+#define VSI_MMU_BIT_ENABLE		BIT(0)
+
+#define VSI_MMU_OUT_OF_BOUND		BIT(28)
+/* Irq mask */
+#define VSI_MMU_IRQ_MASK		0x7
+
+#define VSI_DTE_PT_ADDRESS_MASK		0xffffffc0
+#define VSI_DTE_PT_VALID		BIT(0)
+
+#define VSI_PAGE_DESC_LO_MASK		0xfffff000
+#define VSI_PAGE_DESC_HI_MASK		GENMASK_ULL(39, 32)
+#define VSI_PAGE_DESC_HI_SHIFT		(32 - 4)
+
+static inline phys_addr_t vsi_dte_pt_address(u32 dte)
+{
+	return (phys_addr_t)dte & VSI_DTE_PT_ADDRESS_MASK;
+}
+
+static inline u32 vsi_mk_dte(u32 dte)
+{
+	return (phys_addr_t)dte | VSI_DTE_PT_VALID;
+}
+
+#define VSI_PTE_PAGE_ADDRESS_MASK	0xfffffff0
+#define VSI_PTE_PAGE_WRITABLE		BIT(2)
+#define VSI_PTE_PAGE_VALID		BIT(0)
+
+static inline phys_addr_t vsi_pte_page_address(u32 pte)
+{
+	u64 pte_vsi = pte;
+
+	pte_vsi = ((pte_vsi & VSI_PAGE_DESC_HI_MASK) << VSI_PAGE_DESC_HI_SHIFT) |
+		  (pte_vsi & VSI_PAGE_DESC_LO_MASK);
+
+	return (phys_addr_t)pte_vsi;
+}
+
+static u32 vsi_mk_pte(phys_addr_t page, int prot)
+{
+	u32 flags = 0;
+
+	flags |= (prot & IOMMU_WRITE) ? VSI_PTE_PAGE_WRITABLE : 0;
+	page = (page & VSI_PAGE_DESC_LO_MASK) |
+	       ((page & VSI_PAGE_DESC_HI_MASK) >> VSI_PAGE_DESC_HI_SHIFT);
+	page &= VSI_PTE_PAGE_ADDRESS_MASK;
+
+	return page | flags | VSI_PTE_PAGE_VALID;
+}
+
+#define VSI_DTE_PT_VALID	BIT(0)
+
+static inline bool vsi_dte_is_pt_valid(u32 dte)
+{
+	return dte & VSI_DTE_PT_VALID;
+}
+
+static inline bool vsi_pte_is_page_valid(u32 pte)
+{
+	return pte & VSI_PTE_PAGE_VALID;
+}
+
+static u32 vsi_mk_pte_invalid(u32 pte)
+{
+	return pte & ~VSI_PTE_PAGE_VALID;
+}
+
+#define VSI_MASTER_TLB_MASK	GENMASK_ULL(31, 10)
+/* mode 0 : 4k */
+#define VSI_PTA_4K_MODE	0
+
+static u64 vsi_mk_pta(dma_addr_t dt_dma)
+{
+	u64 val = (dt_dma & VSI_MASTER_TLB_MASK) | VSI_PTA_4K_MODE;
+
+	return val;
+}
+
+static struct vsi_iommu_domain *to_vsi_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct vsi_iommu_domain, domain);
+}
+
+static void vsi_iommu_disable(struct vsi_iommu *iommu)
+{
+	int i;
+
+	/* Ignore error while disabling, just keep going */
+	WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks));
+	for (i = 0; i < iommu->num_mmu; i++)
+		writel(0, iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
+
+	clk_bulk_disable(iommu->num_clocks, iommu->clocks);
+}
+
+static int vsi_iommu_enable(struct vsi_iommu *iommu)
+{
+	struct iommu_domain *domain = iommu->domain;
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	int ret, i;
+
+	ret = clk_bulk_enable(iommu->num_clocks, iommu->clocks);
+	if (ret)
+		return ret;
+
+	for (i = 0; i < iommu->num_mmu; i++) {
+		u32 val = readl(iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
+
+		if (!(val & VSI_MMU_BIT_ENABLE)) {
+			writel(vsi_domain->pta_dma,
+			       iommu->bases[i] + VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE);
+			writel(VSI_MMU_OUT_OF_BOUND, iommu->bases[i] + VSI_MMU_CONFIG1_BASE);
+			writel(VSI_MMU_BIT_ENABLE, iommu->bases[i] + VSI_MMU_AHB_EXCEPTION_BASE);
+			writel(VSI_MMU_BIT_ENABLE, iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
+		}
+	}
+	clk_bulk_disable(iommu->num_clocks, iommu->clocks);
+
+	return ret;
+}
+
+static inline void vsi_table_flush(struct vsi_iommu_domain *vsi_domain, dma_addr_t dma,
+				   unsigned int count)
+{
+	size_t size = count * sizeof(u32); /* count of u32 entry */
+
+	dma_sync_single_for_device(vsi_domain->dma_dev, dma, size, DMA_TO_DEVICE);
+}
+
+#define VSI_IOVA_DTE_MASK	0xffc00000
+#define VSI_IOVA_DTE_SHIFT	22
+#define VSI_IOVA_PTE_MASK	0x003ff000
+#define VSI_IOVA_PTE_SHIFT	12
+#define VSI_IOVA_PAGE_MASK	0x00000fff
+#define VSI_IOVA_PAGE_SHIFT	0
+
+static u32 vsi_iova_dte_index(dma_addr_t iova)
+{
+	return (u32)(iova & VSI_IOVA_DTE_MASK) >> VSI_IOVA_DTE_SHIFT;
+}
+
+static u32 vsi_iova_pte_index(dma_addr_t iova)
+{
+	return (u32)(iova & VSI_IOVA_PTE_MASK) >> VSI_IOVA_PTE_SHIFT;
+}
+
+static u32 vsi_iova_page_offset(dma_addr_t iova)
+{
+	return (u32)(iova & VSI_IOVA_PAGE_MASK) >> VSI_IOVA_PAGE_SHIFT;
+}
+
+static u32 vsi_iommu_read(void __iomem *base, u32 offset)
+{
+	return readl(base + offset);
+}
+
+static void vsi_iommu_write(void __iomem *base, u32 offset, u32 value)
+{
+	writel(value, base + offset);
+}
+
+static void vsi_iommu_flush_tlb_all(struct iommu_domain *domain)
+{
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	struct list_head *pos;
+	unsigned long flags;
+	int i;
+
+	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
+	list_for_each(pos, &vsi_domain->iommus) {
+		struct vsi_iommu *iommu;
+		int ret;
+
+		iommu = list_entry(pos, struct vsi_iommu, node);
+		ret = pm_runtime_get_if_in_use(iommu->dev);
+		if (WARN_ON_ONCE(ret < 0))
+			continue;
+		if (ret) {
+			WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks));
+			for (i = 0; i < iommu->num_mmu; i++) {
+				writel(VSI_MMU_BIT_FLUSH,
+				       iommu->bases[i] + VSI_MMU_FLUSH_BASE);
+				writel(0, iommu->bases[i] + VSI_MMU_FLUSH_BASE);
+			}
+			clk_bulk_disable(iommu->num_clocks, iommu->clocks);
+			pm_runtime_put(iommu->dev);
+		}
+	}
+	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
+}
+
+static irqreturn_t vsi_iommu_irq(int irq, void *dev_id)
+{
+	struct vsi_iommu *iommu = dev_id;
+	u32 int_status;
+	dma_addr_t iova;
+	irqreturn_t ret = IRQ_NONE;
+	int i, err;
+
+	err = pm_runtime_get_if_in_use(iommu->dev);
+	if (!err || WARN_ON_ONCE(err < 0))
+		return ret;
+
+	if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)))
+		goto out;
+
+	for (i = 0; i < iommu->num_mmu; i++) {
+		int_status = vsi_iommu_read(iommu->bases[i], VSI_MMU_STATUS_BASE);
+		if (int_status & VSI_MMU_IRQ_MASK) {
+			dev_err(iommu->dev, "unexpected int_status=%08x\n", int_status);
+			iova = vsi_iommu_read(iommu->bases[i], VSI_MMU_PAGE_FAULT_ADDR);
+
+			if (iommu->domain)
+				report_iommu_fault(iommu->domain, iommu->dev, iova, int_status);
+			else
+				dev_err(iommu->dev,
+					"Page fault while iommu not attached to domain?\n");
+		}
+		vsi_iommu_write(iommu->bases[i], VSI_MMU_STATUS_BASE, 0);
+		ret = IRQ_HANDLED;
+	}
+
+	clk_bulk_disable(iommu->num_clocks, iommu->clocks);
+
+out:
+	pm_runtime_put(iommu->dev);
+	return ret;
+}
+
+static struct vsi_iommu *vsi_iommu_get_from_dev(struct device *dev)
+{
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct device *iommu_dev = bus_find_device_by_fwnode(&platform_bus_type,
+							     fwspec->iommu_fwnode);
+
+	put_device(iommu_dev);
+
+	return iommu_dev ? dev_get_drvdata(iommu_dev) : NULL;
+}
+
+static struct iommu_domain *vsi_iommu_domain_alloc_paging(struct device *dev)
+{
+	struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
+	struct vsi_iommu_domain *vsi_domain;
+
+	vsi_domain = kzalloc(sizeof(*vsi_domain), GFP_KERNEL);
+	if (!vsi_domain)
+		return NULL;
+
+	vsi_domain->dma_dev = iommu->dev;
+	iommu->domain = &vsi_identity_domain;
+
+	/*
+	 * The IOMMU uses a 2-level pagetable.
+	 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
+	 * Allocate one 4 KiB page for each table.
+	 */
+	vsi_domain->dt = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
+					      SPAGE_SIZE);
+	if (!vsi_domain->dt)
+		goto err_free_domain;
+
+	vsi_domain->dt_dma = dma_map_single(vsi_domain->dma_dev, vsi_domain->dt,
+					    SPAGE_SIZE, DMA_TO_DEVICE);
+	if (dma_mapping_error(vsi_domain->dma_dev, vsi_domain->dt_dma)) {
+		dev_err(vsi_domain->dma_dev, "DMA map error for DT\n");
+		goto err_free_dt;
+	}
+
+	vsi_domain->pta = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
+					       SPAGE_SIZE);
+	if (!vsi_domain->pta)
+		goto err_unmap_dt;
+
+	vsi_domain->pta[0] = vsi_mk_pta(vsi_domain->dt_dma);
+	vsi_domain->pta_dma = dma_map_single(vsi_domain->dma_dev, vsi_domain->pta,
+					     SPAGE_SIZE, DMA_TO_DEVICE);
+	if (dma_mapping_error(vsi_domain->dma_dev, vsi_domain->pta_dma)) {
+		dev_err(vsi_domain->dma_dev, "DMA map error for PTA\n");
+		goto err_free_pta;
+	}
+
+	spin_lock_init(&vsi_domain->iommus_lock);
+	spin_lock_init(&vsi_domain->dt_lock);
+	INIT_LIST_HEAD(&vsi_domain->iommus);
+
+	vsi_domain->domain.geometry.aperture_start = 0;
+	vsi_domain->domain.geometry.aperture_end   = DMA_BIT_MASK(32);
+	vsi_domain->domain.geometry.force_aperture = true;
+	vsi_domain->domain.pgsize_bitmap = SZ_4K;
+
+	return &vsi_domain->domain;
+
+err_free_pta:
+	iommu_free_pages(vsi_domain->pta);
+err_unmap_dt:
+	dma_unmap_single(vsi_domain->dma_dev, vsi_domain->dt_dma,
+			 SPAGE_SIZE, DMA_TO_DEVICE);
+err_free_dt:
+	iommu_free_pages(vsi_domain->dt);
+err_free_domain:
+	kfree(vsi_domain);
+
+	return NULL;
+}
+
+static phys_addr_t vsi_iommu_iova_to_phys(struct iommu_domain *domain,
+					  dma_addr_t iova)
+{
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	unsigned long flags;
+	phys_addr_t pt_phys, phys = 0;
+	u32 dte, pte;
+	u32 *page_table;
+
+	spin_lock_irqsave(&vsi_domain->dt_lock, flags);
+
+	dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
+	if (!vsi_dte_is_pt_valid(dte))
+		goto out;
+
+	pt_phys = vsi_dte_pt_address(dte);
+	page_table = (u32 *)phys_to_virt(pt_phys);
+	pte = page_table[vsi_iova_pte_index(iova)];
+	if (!vsi_pte_is_page_valid(pte))
+		goto out;
+
+	phys = vsi_pte_page_address(pte) + vsi_iova_page_offset(iova);
+out:
+	spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
+
+	return phys;
+}
+
+static u32 *vsi_dte_get_page_table(struct vsi_iommu_domain *vsi_domain, dma_addr_t iova)
+{
+	u32 *page_table, *dte_addr;
+	u32 dte_index, dte;
+	phys_addr_t pt_phys;
+	dma_addr_t pt_dma;
+
+	assert_spin_locked(&vsi_domain->dt_lock);
+
+	dte_index = vsi_iova_dte_index(iova);
+	dte_addr = &vsi_domain->dt[dte_index];
+	dte = *dte_addr;
+	if (vsi_dte_is_pt_valid(dte))
+		goto done;
+
+	page_table = (u32 *)iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
+	if (!page_table)
+		return ERR_PTR(-ENOMEM);
+
+	dte = vsi_mk_dte(virt_to_phys(page_table));
+	*dte_addr = dte;
+
+	pt_dma = dma_map_single(vsi_domain->dma_dev, page_table, SPAGE_SIZE, DMA_TO_DEVICE);
+	if (dma_mapping_error(vsi_domain->dma_dev, pt_dma)) {
+		dev_err(vsi_domain->dma_dev, "DMA mapping error while allocating page table\n");
+		iommu_free_pages(page_table);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	vsi_table_flush(vsi_domain,
+			vsi_domain->dt_dma + dte_index * sizeof(u32), 1);
+done:
+	pt_phys = vsi_dte_pt_address(dte);
+	return (u32 *)phys_to_virt(pt_phys);
+}
+
+static size_t vsi_iommu_unmap_iova(struct vsi_iommu_domain *vsi_domain,
+				   u32 *pte_addr, dma_addr_t pte_dma,
+				   size_t size)
+{
+	unsigned int pte_count;
+	unsigned int pte_total = size / SPAGE_SIZE;
+
+	assert_spin_locked(&vsi_domain->dt_lock);
+
+	for (pte_count = 0; pte_count < pte_total; pte_count++) {
+		u32 pte = pte_addr[pte_count];
+
+		if (!vsi_pte_is_page_valid(pte))
+			break;
+
+		pte_addr[pte_count] = vsi_mk_pte_invalid(pte);
+	}
+
+	vsi_table_flush(vsi_domain, pte_dma, pte_count);
+
+	return pte_count * SPAGE_SIZE;
+}
+
+static int vsi_iommu_map_iova(struct vsi_iommu_domain *vsi_domain, u32 *pte_addr,
+			      dma_addr_t pte_dma, dma_addr_t iova,
+			      phys_addr_t paddr, size_t size, int prot)
+{
+	unsigned int pte_count;
+	unsigned int pte_total = size / SPAGE_SIZE;
+
+	assert_spin_locked(&vsi_domain->dt_lock);
+
+	for (pte_count = 0; pte_count < pte_total; pte_count++) {
+		u32 pte = pte_addr[pte_count];
+
+		if (vsi_pte_is_page_valid(pte))
+			goto unwind;
+
+		pte_addr[pte_count] = vsi_mk_pte(paddr, prot);
+
+		paddr += SPAGE_SIZE;
+	}
+
+	vsi_table_flush(vsi_domain, pte_dma, pte_total);
+
+	return 0;
+unwind:
+	/* Unmap the range of iovas that we just mapped */
+	vsi_iommu_unmap_iova(vsi_domain, pte_addr, pte_dma,
+			     pte_count * SPAGE_SIZE);
+
+	return -EADDRINUSE;
+}
+
+static size_t vsi_iommu_unmap(struct iommu_domain *domain, unsigned long _iova,
+			      size_t size, size_t count, struct iommu_iotlb_gather *gather)
+{
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	unsigned long flags;
+	dma_addr_t pte_dma, iova = (dma_addr_t)_iova;
+	phys_addr_t pt_phys;
+	u32 dte;
+	u32 *pte_addr;
+	size_t unmap_size;
+
+	spin_lock_irqsave(&vsi_domain->dt_lock, flags);
+
+	dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
+	/* Just return 0 if iova is unmapped */
+	if (!vsi_dte_is_pt_valid(dte)) {
+		spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
+		return 0;
+	}
+
+	pt_phys = vsi_dte_pt_address(dte);
+	pte_addr = (u32 *)phys_to_virt(pt_phys) + vsi_iova_pte_index(iova);
+	pte_dma = pt_phys + vsi_iova_pte_index(iova) * sizeof(u32);
+	unmap_size = vsi_iommu_unmap_iova(vsi_domain, pte_addr, pte_dma, size);
+
+	spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
+
+	return unmap_size;
+}
+
+static int vsi_iommu_map(struct iommu_domain *domain, unsigned long _iova,
+			 phys_addr_t paddr, size_t size, size_t count,
+			 int prot, gfp_t gfp, size_t *mapped)
+{
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	unsigned long flags;
+	dma_addr_t pte_dma, iova = (dma_addr_t)_iova;
+	u32 *page_table, *pte_addr;
+	u32 dte, pte_index;
+	int ret;
+
+	/*
+	 * IOMMU drivers are not supposed to lock the page table, however this
+	 * driver does not safely handle the cache flushing or table
+	 * installation across concurrent threads so locking is used as a simple
+	 * solution.
+	 */
+	spin_lock_irqsave(&vsi_domain->dt_lock, flags);
+
+	page_table = vsi_dte_get_page_table(vsi_domain, iova);
+	if (IS_ERR(page_table)) {
+		spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
+		return PTR_ERR(page_table);
+	}
+
+	dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
+	pte_index = vsi_iova_pte_index(iova);
+	pte_addr = &page_table[pte_index];
+	pte_dma = vsi_dte_pt_address(dte) + pte_index * sizeof(u32);
+	ret = vsi_iommu_map_iova(vsi_domain, pte_addr, pte_dma, iova,
+				 paddr, size, prot);
+
+	spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
+	if (!ret)
+		*mapped = size;
+
+	return ret;
+}
+
+static int vsi_iommu_identity_attach(struct iommu_domain *domain,
+				     struct device *dev)
+{
+	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	unsigned long flags;
+	int ret;
+
+	if (WARN_ON(!iommu))
+		return -ENODEV;
+
+	if (iommu->domain == domain)
+		return 0;
+
+	iommu->domain = domain;
+
+	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
+	list_del_init(&iommu->node);
+	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
+
+	ret = pm_runtime_get_if_in_use(iommu->dev);
+	WARN_ON_ONCE(ret < 0);
+	if (ret > 0) {
+		vsi_iommu_disable(iommu);
+		pm_runtime_put(iommu->dev);
+	}
+
+	return 0;
+}
+
+static struct iommu_domain_ops vsi_identity_ops = {
+	.attach_dev = vsi_iommu_identity_attach,
+};
+
+static struct iommu_domain vsi_identity_domain = {
+	.type = IOMMU_DOMAIN_IDENTITY,
+	.ops = &vsi_identity_ops,
+};
+
+static int vsi_iommu_attach_device(struct iommu_domain *domain,
+				   struct device *dev)
+{
+	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	unsigned long flags;
+	int ret;
+
+	if (WARN_ON(!iommu))
+		return -ENODEV;
+
+	/* iommu already attached */
+	if (iommu->domain == domain)
+		return 0;
+
+	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
+	if (ret)
+		return ret;
+
+	iommu->domain = domain;
+
+	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
+	list_add_tail(&iommu->node, &vsi_domain->iommus);
+	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
+
+	ret = pm_runtime_get_if_in_use(iommu->dev);
+	if (!ret || WARN_ON_ONCE(ret < 0))
+		return 0;
+
+	ret = vsi_iommu_enable(iommu);
+	if (ret)
+		WARN_ON(vsi_iommu_identity_attach(&vsi_identity_domain, dev));
+
+	pm_runtime_put(iommu->dev);
+
+	return ret;
+}
+
+static void vsi_iommu_domain_free(struct iommu_domain *domain)
+{
+	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
+	int i;
+
+	WARN_ON(!list_empty(&vsi_domain->iommus));
+
+	for (i = 0; i < NUM_DT_ENTRIES; i++) {
+		u32 dte = vsi_domain->dt[i];
+
+		if (vsi_dte_is_pt_valid(dte)) {
+			phys_addr_t pt_phys = vsi_dte_pt_address(dte);
+			u32 *page_table = phys_to_virt(pt_phys);
+
+			dma_unmap_single(vsi_domain->dma_dev, pt_phys,
+					 SPAGE_SIZE, DMA_TO_DEVICE);
+			iommu_free_pages(page_table);
+		}
+	}
+
+	dma_unmap_single(vsi_domain->dma_dev, vsi_domain->dt_dma,
+			 SPAGE_SIZE, DMA_TO_DEVICE);
+	iommu_free_pages(vsi_domain->dt);
+
+	dma_unmap_single(vsi_domain->dma_dev, vsi_domain->pta_dma,
+			 SPAGE_SIZE, DMA_TO_DEVICE);
+	iommu_free_pages(vsi_domain->pta);
+
+	kfree(vsi_domain);
+}
+
+static struct iommu_device *vsi_iommu_probe_device(struct device *dev)
+{
+	struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
+	struct device_link *link;
+
+	if (WARN_ON(!iommu))
+		return ERR_PTR(-ENODEV);
+
+	link = device_link_add(dev, iommu->dev,
+			       DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
+	if (!link)
+		dev_err(dev, "Unable to link %s\n", dev_name(iommu->dev));
+
+	dev_iommu_priv_set(dev, iommu);
+	return &iommu->iommu;
+}
+
+static void vsi_iommu_release_device(struct device *dev)
+{
+	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
+
+	device_link_remove(dev, iommu->dev);
+}
+
+static int vsi_iommu_of_xlate(struct device *dev,
+			      const struct of_phandle_args *args)
+{
+	return iommu_fwspec_add_ids(dev, args->args, 1);
+}
+
+static struct iommu_ops vsi_iommu_ops = {
+	.identity_domain = &vsi_identity_domain,
+	.domain_alloc_paging = vsi_iommu_domain_alloc_paging,
+	.probe_device = vsi_iommu_probe_device,
+	.release_device = vsi_iommu_release_device,
+	.device_group = generic_single_device_group,
+	.of_xlate = vsi_iommu_of_xlate,
+	.default_domain_ops = &(const struct iommu_domain_ops) {
+		.attach_dev		= vsi_iommu_attach_device,
+		.map_pages		= vsi_iommu_map,
+		.unmap_pages		= vsi_iommu_unmap,
+		.flush_iotlb_all	= vsi_iommu_flush_tlb_all,
+		.iova_to_phys		= vsi_iommu_iova_to_phys,
+		.free			= vsi_iommu_domain_free,
+	}
+};
+
+static const struct of_device_id vsi_iommu_dt_ids[] = {
+	{
+		.compatible = "verisilicon,iommu",
+	},
+	{
+		.compatible = "rockchip,rk3588-iommu-1.2",
+	},
+	{ /* sentinel */ }
+};
+
+static int vsi_iommu_probe(struct platform_device *pdev)
+{
+	struct device *dev = &pdev->dev;
+	struct vsi_iommu *iommu;
+	struct resource *res;
+	int num_res = pdev->num_resources;
+	int err, i;
+
+	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
+	if (!iommu)
+		return -ENOMEM;
+
+	iommu->dev = dev;
+	iommu->num_mmu = 0;
+
+	iommu->bases = devm_kcalloc(dev, num_res, sizeof(*iommu->bases),
+				    GFP_KERNEL);
+	if (!iommu->bases)
+		return -ENOMEM;
+
+	for (i = 0; i < num_res; i++) {
+		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
+		if (!res)
+			continue;
+		iommu->bases[i] = devm_ioremap_resource(&pdev->dev, res);
+		if (IS_ERR(iommu->bases[i]))
+			continue;
+		iommu->num_mmu++;
+	}
+	if (iommu->num_mmu == 0)
+		return PTR_ERR(iommu->bases[0]);
+
+	iommu->num_irq = platform_irq_count(pdev);
+	if (iommu->num_irq < 0)
+		return iommu->num_irq;
+
+	err = devm_clk_bulk_get_all(dev, &iommu->clocks);
+	if (err >= 0)
+		iommu->num_clocks = err;
+	else if (err == -ENOENT)
+		iommu->num_clocks = 0;
+	else
+		return err;
+
+	err = clk_bulk_prepare(iommu->num_clocks, iommu->clocks);
+	if (err)
+		return err;
+
+	for (i = 0; i < iommu->num_irq; i++) {
+		int irq = platform_get_irq(pdev, i);
+
+		if (irq < 0)
+			return irq;
+
+		err = devm_request_irq(iommu->dev, irq, vsi_iommu_irq,
+				       IRQF_SHARED, dev_name(dev), iommu);
+		if (err)
+			goto err_unprepare_clocks;
+	}
+
+	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
+	platform_set_drvdata(pdev, iommu);
+
+	pm_runtime_enable(dev);
+
+	err = iommu_device_sysfs_add(&iommu->iommu, dev, NULL, dev_name(dev));
+	if (err)
+		goto err_runtime_disable;
+
+	err = iommu_device_register(&iommu->iommu, &vsi_iommu_ops, dev);
+	if (err)
+		goto err_remove_sysfs;
+
+	return 0;
+
+err_remove_sysfs:
+	iommu_device_sysfs_remove(&iommu->iommu);
+err_runtime_disable:
+	pm_runtime_disable(dev);
+err_unprepare_clocks:
+	clk_bulk_unprepare(iommu->num_clocks, iommu->clocks);
+	return err;
+}
+
+static void vsi_iommu_shutdown(struct platform_device *pdev)
+{
+	struct vsi_iommu *iommu = platform_get_drvdata(pdev);
+	int i;
+
+	for (i = 0; i < iommu->num_irq; i++) {
+		int irq = platform_get_irq(pdev, i);
+
+		devm_free_irq(iommu->dev, irq, iommu);
+	}
+
+	pm_runtime_force_suspend(&pdev->dev);
+}
+
+static int __maybe_unused vsi_iommu_suspend(struct device *dev)
+{
+	struct vsi_iommu *iommu = dev_get_drvdata(dev);
+
+	if (iommu->domain == &vsi_identity_domain)
+		return 0;
+
+	vsi_iommu_disable(iommu);
+	return 0;
+}
+
+static int __maybe_unused vsi_iommu_resume(struct device *dev)
+{
+	struct vsi_iommu *iommu = dev_get_drvdata(dev);
+
+	if (iommu->domain == &vsi_identity_domain)
+		return 0;
+
+	return vsi_iommu_enable(iommu);
+}
+
+static const struct dev_pm_ops vsi_iommu_pm_ops = {
+	SET_RUNTIME_PM_OPS(vsi_iommu_suspend, vsi_iommu_resume, NULL)
+	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
+				pm_runtime_force_resume)
+};
+
+static struct platform_driver rockchip_vsi_iommu_driver = {
+	.probe = vsi_iommu_probe,
+	.shutdown = vsi_iommu_shutdown,
+	.driver = {
+		   .name = "vsi_iommu",
+		   .of_match_table = vsi_iommu_dt_ids,
+		   .pm = &vsi_iommu_pm_ops,
+		   .suppress_bind_attrs = true,
+	},
+};
+builtin_platform_driver(rockchip_vsi_iommu_driver);
-- 
2.43.0



* [PATCH v3 4/5] arm64: dts: rockchip: Add verisilicon IOMMU node on RK3588
  2025-06-19 13:12 [PATCH v3 0/5] Add support for Verisilicon IOMMU used by media codec blocks Benjamin Gaignard
                   ` (2 preceding siblings ...)
  2025-06-19 13:12 ` [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver Benjamin Gaignard
@ 2025-06-19 13:12 ` Benjamin Gaignard
  2025-06-19 13:12 ` [PATCH v3 5/5] arm64: defconfig: enable Verisilicon IOMMU Benjamin Gaignard
  4 siblings, 0 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-19 13:12 UTC (permalink / raw)
  To: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel, Benjamin Gaignard

Add the device tree node for the Verisilicon IOMMU present
in the RK3588 SoC.
This IOMMU handles address translation for the VPU hardware blocks.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
---
changes in version 3:
- Update DT to use "rockchip,rk3588-iommu-1.2" compatible

 arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
index 70f03e68ba55..ab97f06eea74 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -1263,6 +1263,17 @@ av1d: video-codec@fdc70000 {
 		clock-names = "aclk", "hclk";
 		power-domains = <&power RK3588_PD_AV1>;
 		resets = <&cru SRST_A_AV1>, <&cru SRST_P_AV1>, <&cru SRST_A_AV1_BIU>, <&cru SRST_P_AV1_BIU>;
+		iommus = <&av1d_mmu>;
+	};
+
+	av1d_mmu: iommu@fdca0000 {
+		compatible = "verisilicon,iommu","rockchip,rk3588-iommu-1.2";
+		reg = <0x0 0xfdca0000 0x0 0x600>;
+		interrupts = <GIC_SPI 109 IRQ_TYPE_LEVEL_HIGH 0>;
+		clocks = <&cru ACLK_AV1>, <&cru PCLK_AV1>;
+		clock-names = "core", "iface";
+		#iommu-cells = <0>;
+		power-domains = <&power RK3588_PD_AV1>;
 	};
 
 	vop: vop@fdd90000 {
-- 
2.43.0



* [PATCH v3 5/5] arm64: defconfig: enable Verisilicon IOMMU
  2025-06-19 13:12 [PATCH v3 0/5] Add support for Verisilicon IOMMU used by media codec blocks Benjamin Gaignard
                   ` (3 preceding siblings ...)
  2025-06-19 13:12 ` [PATCH v3 4/5] arm64: dts: rockchip: Add verisilicon IOMMU node on RK3588 Benjamin Gaignard
@ 2025-06-19 13:12 ` Benjamin Gaignard
  4 siblings, 0 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-19 13:12 UTC (permalink / raw)
  To: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel, Benjamin Gaignard

Enable Verisilicon IOMMU used by RK3588 AV1 hardware codec.

Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
---
changes in version 3:
- Set CONFIG_VSI_IOMMU as module in defconfig

 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 897fc686e6a9..a5122990ecfa 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1451,6 +1451,7 @@ CONFIG_ARM_SMMU=y
 CONFIG_ARM_SMMU_V3=y
 CONFIG_MTK_IOMMU=y
 CONFIG_QCOM_IOMMU=y
+CONFIG_VSI_IOMMU=m
 CONFIG_REMOTEPROC=y
 CONFIG_IMX_REMOTEPROC=y
 CONFIG_MTK_SCP=m
-- 
2.43.0



* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-19 13:12 ` [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver Benjamin Gaignard
@ 2025-06-19 13:47   ` Jason Gunthorpe
  2025-06-19 16:27     ` Benjamin Gaignard
  2025-06-20 19:37   ` Robin Murphy
  1 sibling, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2025-06-19 13:47 UTC (permalink / raw)
  To: Benjamin Gaignard
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel

On Thu, Jun 19, 2025 at 03:12:24PM +0200, Benjamin Gaignard wrote:

> +static struct iommu_domain *vsi_iommu_domain_alloc_paging(struct device *dev)
> +{
> +	struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
> +	struct vsi_iommu_domain *vsi_domain;
> +
> +	vsi_domain = kzalloc(sizeof(*vsi_domain), GFP_KERNEL);
> +	if (!vsi_domain)
> +		return NULL;
> +
> +	vsi_domain->dma_dev = iommu->dev;
> +	iommu->domain = &vsi_identity_domain;

?? alloc paging should not change the iommu.

Probably this belongs in vsi_iommu_probe_device if the device starts
up in an identity translation mode.
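
Something along these lines, as an untested sketch reusing the fields the
patch already has:

	static struct iommu_device *vsi_iommu_probe_device(struct device *dev)
	{
		struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
		...
		/* HW comes up in bypass, so start from the identity domain */
		iommu->domain = &vsi_identity_domain;
		dev_iommu_priv_set(dev, iommu);
		return &iommu->iommu;
	}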

> +static u32 *vsi_dte_get_page_table(struct vsi_iommu_domain *vsi_domain, dma_addr_t iova)
> +{
> +	u32 *page_table, *dte_addr;
> +	u32 dte_index, dte;
> +	phys_addr_t pt_phys;
> +	dma_addr_t pt_dma;
> +
> +	assert_spin_locked(&vsi_domain->dt_lock);
> +
> +	dte_index = vsi_iova_dte_index(iova);
> +	dte_addr = &vsi_domain->dt[dte_index];
> +	dte = *dte_addr;
> +	if (vsi_dte_is_pt_valid(dte))
> +		goto done;
> +
> +	page_table = (u32 *)iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);

Unnecessary casts are not the kernel style, I saw a couple others too

Ugh. This ignores the gfp flags that are passed into map because you
have to force atomic due to the spinlock that shouldn't be there :(
This means it does not set GFP_KERNEL_ACCOUNT when required. It would
be better to continue to use the passed in GFP flags but override them
to atomic mode.
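
Something like this (rough sketch, not tested) keeps caller flags such as
__GFP_ACCOUNT while stripping the bits that can sleep:

	page_table = iommu_alloc_pages_sz((gfp & ~(__GFP_DIRECT_RECLAIM |
						   __GFP_IO | __GFP_FS)) |
					  GFP_DMA32, SPAGE_SIZE);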

> +static int vsi_iommu_identity_attach(struct iommu_domain *domain,
> +				     struct device *dev)
> +{
> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	unsigned long flags;
> +	int ret;
> +
> +	if (WARN_ON(!iommu))
> +		return -ENODEV;

These WARN_ON's should be removed. ops are never called by the core
without a probed device.

> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
> +				   struct device *dev)
> +{
> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	unsigned long flags;
> +	int ret;
> +
> +	if (WARN_ON(!iommu))
> +		return -ENODEV;
> +
> +	/* iommu already attached */
> +	if (iommu->domain == domain)
> +		return 0;
> +
> +	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
> +	if (ret)
> +		return ret;

Hurm, this is actually quite bad, now that it is clear the HW is in an
identity mode it is actually a security problem for VFIO to switch the
translation to identity during attach_device. I'd really prefer new
drivers don't make this mistake.

It seems the main thing motivating this is the fact a linked list has
only a single iommu->node so you can't attach the iommu to both the
new/old domain and atomically update the page table base.

Is it possible for the HW to do a blocking behavior? That would be an
easy fix.. You should always be able to force this by allocating a
shared top page table level during probe time and making it entirely
empty while staying always in the paging mode. Maybe there is a less
expensive way.

Otherwise you probably have to work more like the other drivers and
allocate a struct for each attachment so you can have the iommu
attached to two domains during the switch over and never drop to an
identity mode.
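
Untested sketch of the per-attachment idea (names made up):

	struct vsi_iommu_attachment {
		struct list_head node;	/* entry on a vsi_iommu_domain's list */
		struct vsi_iommu *iommu;
	};

so attach_dev allocates one of these, adds it to the new domain's list and
switches the page table base, and only afterwards drops the attachment from
the old domain's list.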

> +	iommu->domain = domain;
> +
> +	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
> +	list_add_tail(&iommu->node, &vsi_domain->iommus);
> +	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
> +
> +	ret = pm_runtime_get_if_in_use(iommu->dev);
> +	if (!ret || WARN_ON_ONCE(ret < 0))
> +		return 0;

This probably should have a comment, is the idea the resume will setup
the domain? How does locking of iommu->domain work in that case?

Maybe the suspend resume paths should be holding the group mutex..

> +	ret = vsi_iommu_enable(iommu);
> +	if (ret)
> +		WARN_ON(vsi_iommu_identity_attach(&vsi_identity_domain, dev));

Is this necessary though? vsi_iommu_enable failure cases don't change
the HW, and a few lines above was an identity_attach. Just delay
setting iommu->domain until it succeeds, and this is a simple error.

> +static struct iommu_ops vsi_iommu_ops = {
> +	.identity_domain = &vsi_identity_domain,

Add:

  .release_domain = &vsi_identity_domain,

Which will cause the core code to automatically run through to
vsi_iommu_disable() prior to calling vsi_iommu_release_device(), which
will avoid UAF problems.

Also, should the probe functions be doing some kind of validation that
there is only one struct device attached?

Jason


* Re: [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU
  2025-06-19 13:12 ` [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU Benjamin Gaignard
@ 2025-06-19 14:19   ` Sebastian Reichel
  2025-06-19 15:02     ` Conor Dooley
  2025-06-20  9:54     ` Benjamin Gaignard
  0 siblings, 2 replies; 22+ messages in thread
From: Sebastian Reichel @ 2025-06-19 14:19 UTC (permalink / raw)
  To: Benjamin Gaignard
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel

Hi,

On Thu, Jun 19, 2025 at 03:12:23PM +0200, Benjamin Gaignard wrote:
> +properties:
> +  compatible:
> +    items:
> +      - const: verisilicon,iommu
> +      - const: rockchip,rk3588-iommu-1.2

The entries should be ordered the other way around, so that the
"generic" compatible is the fallback. Also the 1.2 version is from
Verisilicon. It does not really make sense for Rockchip. So I
think it should look like this:

properties:
  compatible:
    items:
      - const: rockchip,rk3588-av1-iommu
      - const: verisilicon,iommu-1.2

Otherwise LGTM.

-- Sebastian

> +
> +  reg:
> +    maxItems: 1
> +
> +  interrupts:
> +    maxItems: 1
> +
> +  clocks:
> +    items:
> +      - description: Core clock
> +      - description: Interface clock
> +
> +  clock-names:
> +    items:
> +      - const: core
> +      - const: iface
> +
> +  "#iommu-cells":
> +    const: 0
> +
> +  power-domains:
> +    maxItems: 1
> +
> +required:
> +  - compatible
> +  - reg
> +  - interrupts
> +  - clocks
> +  - clock-names
> +  - "#iommu-cells"
> +
> +additionalProperties: false
> +
> +examples:
> +  - |
> +    #include <dt-bindings/clock/rockchip,rk3588-cru.h>
> +    #include <dt-bindings/interrupt-controller/arm-gic.h>
> +
> +    bus {
> +      #address-cells = <2>;
> +      #size-cells = <2>;
> +
> +      iommu@fdca0000 {
> +        compatible = "verisilicon,iommu","rockchip,rk3588-iommu-1.2";
> +        reg = <0x0 0xfdca0000 0x0 0x600>;
> +        interrupts = <GIC_SPI 109 IRQ_TYPE_LEVEL_HIGH 0>;
> +        clocks = <&cru ACLK_AV1>, <&cru PCLK_AV1>;
> +        clock-names = "core", "iface";
> +        #iommu-cells = <0>;
> +      };
> +    };
> -- 
> 2.43.0
> 


* Re: [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU
  2025-06-19 14:19   ` Sebastian Reichel
@ 2025-06-19 15:02     ` Conor Dooley
  2025-06-20  9:54     ` Benjamin Gaignard
  1 sibling, 0 replies; 22+ messages in thread
From: Conor Dooley @ 2025-06-19 15:02 UTC (permalink / raw)
  To: Sebastian Reichel
  Cc: Benjamin Gaignard, joro, will, robin.murphy, robh, krzk+dt,
	conor+dt, heiko, nicolas.dufresne, jgg, iommu, devicetree,
	linux-kernel, linux-arm-kernel, linux-rockchip, kernel

On Thu, Jun 19, 2025 at 04:19:32PM +0200, Sebastian Reichel wrote:
> Hi,
> 
> On Thu, Jun 19, 2025 at 03:12:23PM +0200, Benjamin Gaignard wrote:
> > +properties:
> > +  compatible:
> > +    items:
> > +      - const: verisilicon,iommu
> > +      - const: rockchip,rk3588-iommu-1.2
> 
> The entries should be ordered the other way around, so that the
> "generic" compatible is the fallback. Also the 1.2 version is from
> Verisilicon. It does not really make sense for Rockchip. So I
> think it should look like this:
> 
> properties:
>   compatible:
>     items:
>       - const: rockchip,rk3588-av1-iommu
>       - const: verisilicon,iommu-1.2

If you're not sure Benjamin, please just ask questions. There's been
like 3 versions of this patch, soon to be 4, in like 3 days, two of
which could have been avoided if you just asked "is this what you mean"?


* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-19 13:47   ` Jason Gunthorpe
@ 2025-06-19 16:27     ` Benjamin Gaignard
  2025-06-19 16:59       ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-19 16:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel


Le 19/06/2025 à 15:47, Jason Gunthorpe a écrit :
> On Thu, Jun 19, 2025 at 03:12:24PM +0200, Benjamin Gaignard wrote:
>
>> +static struct iommu_domain *vsi_iommu_domain_alloc_paging(struct device *dev)
>> +{
>> +	struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
>> +	struct vsi_iommu_domain *vsi_domain;
>> +
>> +	vsi_domain = kzalloc(sizeof(*vsi_domain), GFP_KERNEL);
>> +	if (!vsi_domain)
>> +		return NULL;
>> +
>> +	vsi_domain->dma_dev = iommu->dev;
>> +	iommu->domain = &vsi_identity_domain;
> ?? alloc paging should not change the iommu.
>
> Probably this belongs in vsi_iommu_probe_device if the device starts
> up in an identity translation mode.

You are right, it is useless here, I will remove it.

>
>> +static u32 *vsi_dte_get_page_table(struct vsi_iommu_domain *vsi_domain, dma_addr_t iova)
>> +{
>> +	u32 *page_table, *dte_addr;
>> +	u32 dte_index, dte;
>> +	phys_addr_t pt_phys;
>> +	dma_addr_t pt_dma;
>> +
>> +	assert_spin_locked(&vsi_domain->dt_lock);
>> +
>> +	dte_index = vsi_iova_dte_index(iova);
>> +	dte_addr = &vsi_domain->dt[dte_index];
>> +	dte = *dte_addr;
>> +	if (vsi_dte_is_pt_valid(dte))
>> +		goto done;
>> +
>> +	page_table = (u32 *)iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
> Unnecessary casts are not the kernel style, I saw a couple others too
>
> Ugh. This ignores the gfp flags that are passed into map because you
> have to force atomic due to the spinlock that shouldn't be there :(
> This means it does not set GFP_KERNEL_ACCOUNT when required. It would
> be better to continue to use the passed in GFP flags but override them
> to atomic mode.

I will add a gfp_t parameter and use it like that:
page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);

>
>> +static int vsi_iommu_identity_attach(struct iommu_domain *domain,
>> +				     struct device *dev)
>> +{
>> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +	unsigned long flags;
>> +	int ret;
>> +
>> +	if (WARN_ON(!iommu))
>> +		return -ENODEV;
> These WARN_ON's should be removed. ops are never called by the core
> without a probed device.

ok

>
>> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
>> +				   struct device *dev)
>> +{
>> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +	unsigned long flags;
>> +	int ret;
>> +
>> +	if (WARN_ON(!iommu))
>> +		return -ENODEV;
>> +
>> +	/* iommu already attached */
>> +	if (iommu->domain == domain)
>> +		return 0;
>> +
>> +	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
>> +	if (ret)
>> +		return ret;
> Hurm, this is actually quite bad, now that it is clear the HW is in an
> identity mode it is actually a security problem for VFIO to switch the
> translation to identity during attach_device. I'd really prefer new
> drivers don't make this mistake.
>
> It seems the main thing motivating this is the fact a linked list has
> only a single iommu->node so you can't attach the iommu to both the
> new/old domain and atomically update the page table base.
>
> Is it possible for the HW to do a blocking behavior? That would be an
> easy fix.. You should always be able to force this by allocating a
> shared top page table level during probe time and making it entirely
> empty while staying always in the paging mode. Maybe there is a less
> expensive way.
>
> Otherwise you probably have work more like the other drivers and
> allocate a struct for each attachment so you can have the iommu
> attached two domains during the switch over and never drop to an
> identity mode.

I will remove the switch to identity domain and it will work fine.

>
>> +	iommu->domain = domain;
>> +
>> +	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
>> +	list_add_tail(&iommu->node, &vsi_domain->iommus);
>> +	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
>> +
>> +	ret = pm_runtime_get_if_in_use(iommu->dev);
>> +	if (!ret || WARN_ON_ONCE(ret < 0))
>> +		return 0;
> This probably should have a comment, is the idea the resume will setup
> the domain? How does locking of iommu->domain work in that case?
>
> Maybe the suspend resume paths should be holding the group mutex..
>
>> +	ret = vsi_iommu_enable(iommu);
>> +	if (ret)
>> +		WARN_ON(vsi_iommu_identity_attach(&vsi_identity_domain, dev));
> Is this necessary though? vsi_iommu_enable failure cases don't change
> the HW, and a few lines above was an identity_attach. Just delay
> setting iommu->domain until it succeeds, and this is a simple error.

I think I will change vsi_iommu_enable() prototype to:
static int vsi_iommu_enable(struct vsi_iommu *iommu, struct iommu_domain *domain)
and do iommu->domain = domain; at the end of the function if everything goes correctly.


> iommu->domain = domain;
>
>
>> +static struct iommu_ops vsi_iommu_ops = {
>> +	.identity_domain = &vsi_identity_domain,
> Add:
>
>    .release_domain = &vsi_identity_domain,
>
> Which will cause the core code to automatically run through to
> vsi_iommu_disable() prior to calling vsi_iommu_release_device(), which
> will avoid UAF problems.
>
> Also, should the probe functions be doing some kind of validation that
> there is only one struct device attached?

Which kind of validation?

>
> Jason


* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-19 16:27     ` Benjamin Gaignard
@ 2025-06-19 16:59       ` Jason Gunthorpe
  2025-06-20  8:57         ` Benjamin Gaignard
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2025-06-19 16:59 UTC (permalink / raw)
  To: Benjamin Gaignard
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel

On Thu, Jun 19, 2025 at 06:27:52PM +0200, Benjamin Gaignard wrote:
> 
> Le 19/06/2025 à 15:47, Jason Gunthorpe a écrit :

> > Ugh. This ignores the gfp flags that are passed into map because you
> > have to force atomic due to the spinlock that shouldn't be there :(
> > This means it does not set GFP_KERNEL_ACCOUNT when required. It would
> > be better to continue to use the passed in GFP flags but override them
> > to atomic mode.
> 
> I will add a gfp_t parameter and use it like that:
> page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);

This will OR together GFP_ATOMIC and GFP_KERNEL, I don't think that
works.

> > > +static int vsi_iommu_attach_device(struct iommu_domain *domain,
> > > +				   struct device *dev)
> > > +{
> > > +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
> > > +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> > > +	unsigned long flags;
> > > +	int ret;
> > > +
> > > +	if (WARN_ON(!iommu))
> > > +		return -ENODEV;
> > > +
> > > +	/* iommu already attached */
> > > +	if (iommu->domain == domain)
> > > +		return 0;
> > > +
> > > +	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
> > > +	if (ret)
> > > +		return ret;
> > Hurm, this is actually quite bad, now that it is clear the HW is in an
> > identity mode it is actually a security problem for VFIO to switch the
> > translation to identity during attach_device. I'd really prefer new
> > drivers don't make this mistake.
> > 
> > It seems the main thing motivating this is the fact a linked list has
> > only a single iommu->node so you can't attach the iommu to both the
> > new/old domain and atomically update the page table base.
> > 
> > Is it possible for the HW to do a blocking behavior? That would be an
> > easy fix.. You should always be able to force this by allocating a
> > shared top page table level during probe time and making it entirely
> > empty while staying always in the paging mode. Maybe there is a less
> > expensive way.
> > 
> > Otherwise you probably have work more like the other drivers and
> > allocate a struct for each attachment so you can have the iommu
> > attached two domains during the switch over and never drop to an
> > identity mode.
> 
> I will remove the switch to identity domain and it will works fine.

You'll lose invalidations.

Maybe the easiest thing is to squish vsi_iommu_enable() and reorganize
it so that the spinlock is held across the register programming and
then you can atomically under the lock change the registers and change
the linked list. The register write cannot fail so this is fine.

This should probably also flush the iotlb inside the lock.
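
Roughly like this (untested sketch, clock/PM handling omitted):

	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
	list_add_tail(&iommu->node, &vsi_domain->iommus);
	for (i = 0; i < iommu->num_mmu; i++) {
		/* register writes cannot fail, so no unwind needed */
		writel(vsi_domain->pta_dma,
		       iommu->bases[i] + VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE);
		writel(VSI_MMU_BIT_FLUSH, iommu->bases[i] + VSI_MMU_FLUSH_BASE);
		writel(0, iommu->bases[i] + VSI_MMU_FLUSH_BASE);
	}
	iommu->domain = domain;
	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);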

> > Which will cause the core code to automatically run through to
> > vsi_iommu_disable() prior to calling vsi_iommu_release_device(), which
> > will avoid UAF problems.
> > 
> > Also, should the probe functions be doing some kind of validation that
> > there is only one struct device attached?
> 
> which kind of validation ?

That only one device probed to the iommu?

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-19 16:59       ` Jason Gunthorpe
@ 2025-06-20  8:57         ` Benjamin Gaignard
  2025-06-20 12:05           ` Jason Gunthorpe
  0 siblings, 1 reply; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-20  8:57 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel


Le 19/06/2025 à 18:59, Jason Gunthorpe a écrit :
> On Thu, Jun 19, 2025 at 06:27:52PM +0200, Benjamin Gaignard wrote:
>> Le 19/06/2025 à 15:47, Jason Gunthorpe a écrit :
>>> Ugh. This ignores the gfp flags that are passed into map because you
>>> have to force atomic due to the spinlock that shouldn't be there :(
>>> This means it does not set GFP_KERNEL_ACCOUNT when required. It would
>>> be better to continue to use the passed in GFP flags but override them
>>> to atomic mode.
>> I will add a gfp_t parameter and use it like that:
>> page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
> This will or together GFP_ATOMIC and GFP_KERNEL, I don't think that
> works..

I have tested it and I don't see conflicts or errors. What worries you?

>
>>>> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
>>>> +				   struct device *dev)
>>>> +{
>>>> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>>>> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>>>> +	unsigned long flags;
>>>> +	int ret;
>>>> +
>>>> +	if (WARN_ON(!iommu))
>>>> +		return -ENODEV;
>>>> +
>>>> +	/* iommu already attached */
>>>> +	if (iommu->domain == domain)
>>>> +		return 0;
>>>> +
>>>> +	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
>>>> +	if (ret)
>>>> +		return ret;
>>> Hurm, this is actually quite bad, now that it is clear the HW is in an
>>> identity mode it is actually a security problem for VFIO to switch the
>>> translation to identity during attach_device. I'd really prefer new
>>> drivers don't make this mistake.
>>>
>>> It seems the main thing motivating this is the fact a linked list has
>>> only a single iommu->node so you can't attach the iommu to both the
>>> new/old domain and atomically update the page table base.
>>>
>>> Is it possible for the HW to do a blocking behavior? That would be an
>>> easy fix.. You should always be able to force this by allocating a
>>> shared top page table level during probe time and making it entirely
>>> empty while staying always in the paging mode. Maybe there is a less
>>> expensive way.
>>>
>>> Otherwise you probably have work more like the other drivers and
>>> allocate a struct for each attachment so you can have the iommu
>>> attached two domains during the switch over and never drop to an
>>> identity mode.
>> I will remove the switch to identity domain and it will works fine.
> You'll loose invalidations..
>
> Maybe the easiest thing is to squish vsi_iommu_enable() and reorganize
> it so that the spinlock is held across the register programming and
> then you can atomically under the lock change the registers and change
> the linked list. The register write cannot fail so this is fine.
>
> This should probably also flush the iotlb inside the lock.

I will try to summarize:
in vsi_iommu_attach_device() I should:
- take the lock
- do nothing if the domain is the same.
- if the iommu is in use (i.e. powered up):
   - update the registers to enable the iommu
   - flush
   - update the linked list
- update iommu->domain
- release the lock

in vsi_iommu_identity_attach() I should (see the sketch after this list):
- take the lock
- do nothing if the domain is the same.
- if the iommu is in use (i.e. powered up):
   - disable the iommu
   - remove the node from the list
- update iommu->domain
- release the lock
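
A minimal sketch of that detach path, just to illustrate the ordering
(it reuses the names from this patch, assumes iommu->domain always
points at a valid domain, and leaves the clock handling out):

static int vsi_iommu_identity_attach(struct iommu_domain *identity,
				     struct device *dev)
{
	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
	struct vsi_iommu_domain *old;
	unsigned long flags;
	int i;

	if (iommu->domain == identity)
		return 0;

	/* the lock belongs to the paging domain we are currently attached to */
	old = to_vsi_domain(iommu->domain);
	spin_lock_irqsave(&old->iommus_lock, flags);
	if (pm_runtime_get_if_in_use(iommu->dev) > 0) {
		for (i = 0; i < iommu->num_mmu; i++)
			writel(0, iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
		pm_runtime_put(iommu->dev);
	}
	list_del_init(&iommu->node);
	iommu->domain = identity;
	spin_unlock_irqrestore(&old->iommus_lock, flags);

	return 0;
}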

vsi_iommu_suspend() and vsi_iommu_resume() will also have to take the lock
before calling vsi_iommu_disable() and vsi_iommu_enable().

Does that sound good to you?

Do I have to switch to the identity domain in vsi_iommu_attach_device()
before applying the requested domain?

Regards,
Benjamin

>
>>> Which will cause the core code to automatically run through to
>>> vsi_iommu_disable() prior to calling vsi_iommu_release_device(), which
>>> will avoid UAF problems.
>>>
>>> Also, should the probe functions be doing some kind of validation that
>>> there is only one struct device attached?
>> which kind of validation ?
> That only one device probed to the iommu?
>
> Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU
  2025-06-19 14:19   ` Sebastian Reichel
  2025-06-19 15:02     ` Conor Dooley
@ 2025-06-20  9:54     ` Benjamin Gaignard
  1 sibling, 0 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-20  9:54 UTC (permalink / raw)
  To: Sebastian Reichel
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel


Le 19/06/2025 à 16:19, Sebastian Reichel a écrit :
> Hi,
>
> On Thu, Jun 19, 2025 at 03:12:23PM +0200, Benjamin Gaignard wrote:
>> +properties:
>> +  compatible:
>> +    items:
>> +      - const: verisilicon,iommu
>> +      - const: rockchip,rk3588-iommu-1.2
> The entries should be ordered the other way around, so that the
> "generic" compatible is the fallback. Also the 1.2 version is from
> Verisilicon. It does not really make sense for Rockchip. So I
> think it should look like this:
>
> properties:
>    compatible:
>      items:
>        - const: rockchip,rk3588-av1-iommu
>        - const: verisilicon,iommu-1.2
>
> Otherwise LGTM.

Thanks, I will do it like that.

Regards,
Benjamin

>
> -- Sebastian
>
>> +
>> +  reg:
>> +    maxItems: 1
>> +
>> +  interrupts:
>> +    maxItems: 1
>> +
>> +  clocks:
>> +    items:
>> +      - description: Core clock
>> +      - description: Interface clock
>> +
>> +  clock-names:
>> +    items:
>> +      - const: core
>> +      - const: iface
>> +
>> +  "#iommu-cells":
>> +    const: 0
>> +
>> +  power-domains:
>> +    maxItems: 1
>> +
>> +required:
>> +  - compatible
>> +  - reg
>> +  - interrupts
>> +  - clocks
>> +  - clock-names
>> +  - "#iommu-cells"
>> +
>> +additionalProperties: false
>> +
>> +examples:
>> +  - |
>> +    #include <dt-bindings/clock/rockchip,rk3588-cru.h>
>> +    #include <dt-bindings/interrupt-controller/arm-gic.h>
>> +
>> +    bus {
>> +      #address-cells = <2>;
>> +      #size-cells = <2>;
>> +
>> +      iommu@fdca0000 {
>> +        compatible = "verisilicon,iommu","rockchip,rk3588-iommu-1.2";
>> +        reg = <0x0 0xfdca0000 0x0 0x600>;
>> +        interrupts = <GIC_SPI 109 IRQ_TYPE_LEVEL_HIGH 0>;
>> +        clocks = <&cru ACLK_AV1>, <&cru PCLK_AV1>;
>> +        clock-names = "core", "iface";
>> +        #iommu-cells = <0>;
>> +      };
>> +    };
>> -- 
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20  8:57         ` Benjamin Gaignard
@ 2025-06-20 12:05           ` Jason Gunthorpe
  2025-06-20 13:52             ` Benjamin Gaignard
  0 siblings, 1 reply; 22+ messages in thread
From: Jason Gunthorpe @ 2025-06-20 12:05 UTC (permalink / raw)
  To: Benjamin Gaignard
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel

On Fri, Jun 20, 2025 at 10:57:49AM +0200, Benjamin Gaignard wrote:
> 
> Le 19/06/2025 à 18:59, Jason Gunthorpe a écrit :
> > On Thu, Jun 19, 2025 at 06:27:52PM +0200, Benjamin Gaignard wrote:
> > > Le 19/06/2025 à 15:47, Jason Gunthorpe a écrit :
> > > > Ugh. This ignores the gfp flags that are passed into map because you
> > > > have to force atomic due to the spinlock that shouldn't be there :(
> > > > This means it does not set GFP_KERNEL_ACCOUNT when required. It would
> > > > be better to continue to use the passed in GFP flags but override them
> > > > to atomic mode.
> > > I will add a gfp_t parameter and use it like that:
> > > page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
> > This will or together GFP_ATOMIC and GFP_KERNEL, I don't think that
> > works..
> 
> I have test it, I don't see conflicts or errors. What worries you ?

Just looking at the bitmaps I'm not sure? Did you run with lockdep?

> > > > > +static int vsi_iommu_attach_device(struct iommu_domain *domain,
> > > > > +				   struct device *dev)
> > > > > +{
> > > > > +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
> > > > > +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> > > > > +	unsigned long flags;
> > > > > +	int ret;
> > > > > +
> > > > > +	if (WARN_ON(!iommu))
> > > > > +		return -ENODEV;
> > > > > +
> > > > > +	/* iommu already attached */
> > > > > +	if (iommu->domain == domain)
> > > > > +		return 0;
> > > > > +
> > > > > +	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
> > > > > +	if (ret)
> > > > > +		return ret;
> > > > Hurm, this is actually quite bad, now that it is clear the HW is in an
> > > > identity mode it is actually a security problem for VFIO to switch the
> > > > translation to identity during attach_device. I'd really prefer new
> > > > drivers don't make this mistake.
> > > > 
> > > > It seems the main thing motivating this is the fact a linked list has
> > > > only a single iommu->node so you can't attach the iommu to both the
> > > > new/old domain and atomically update the page table base.
> > > > 
> > > > Is it possible for the HW to do a blocking behavior? That would be an
> > > > easy fix.. You should always be able to force this by allocating a
> > > > shared top page table level during probe time and making it entirely
> > > > empty while staying always in the paging mode. Maybe there is a less
> > > > expensive way.
> > > > 
> > > > Otherwise you probably have work more like the other drivers and
> > > > allocate a struct for each attachment so you can have the iommu
> > > > attached two domains during the switch over and never drop to an
> > > > identity mode.
> > > I will remove the switch to identity domain and it will works fine.
> > You'll loose invalidations..
> > 
> > Maybe the easiest thing is to squish vsi_iommu_enable() and reorganize
> > it so that the spinlock is held across the register programming and
> > then you can atomically under the lock change the registers and change
> > the linked list. The register write cannot fail so this is fine.
> > 
> > This should probably also flush the iotlb inside the lock.
> 
> I will try to summarize:
> in vsi_iommu_attach_device() I should:
> - take the lock
> - do nothing if the domain is the same.
> - if iommu is used (ie powered up):
>   - update the registers to enable the iommu
>   - flush
>   - update the link list
> - update iommu->domain
> - release the lock

That sounds believable, yes.. Though can you do the "powered up" checks
inside the spinlock - are they sleeping? Can they be done before the
spinlock?

> in vsi_iommu_identity_attach() I should:
> - take the lock
> - do nothing if the domain is the same.
> - if iommu is used (ie powered up):
>   - disable the iommu
>   - remove the node from the list
> - update iommu->domain
> - release the lock

And maybe flush too? How does the caching hw work at this point? You
can't have stale entries in the cache when you switch back to an
enabled/translating configuration. So either the HW flushes implicitly
or you need to add a flush somewhere..
 
> vsi_iommu_suspend() and vsi_iommu_resume() will also have to take the lock
> before calling vsi_iommu_disable() and vsi_iommu_enable().

Yes, if they use iommu->domain that seems good

If the above locking is a problem then I'd use the group mutex instead
during resume/suspend. The attach functions are already called with
the group mutex held.

> Do I have to switch to identity domain in vsi_iommu_attach_device()
> before applying the requested domain ?

No, that is what creates the security problem. What you want is to
update the HW registers in a way that the HW just changes hitlessly
from one translation to another, then flush the cache.

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20 12:05           ` Jason Gunthorpe
@ 2025-06-20 13:52             ` Benjamin Gaignard
  2025-06-20 14:42               ` Benjamin Gaignard
  2025-06-20 16:36               ` Jason Gunthorpe
  0 siblings, 2 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-20 13:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel


Le 20/06/2025 à 14:05, Jason Gunthorpe a écrit :
> On Fri, Jun 20, 2025 at 10:57:49AM +0200, Benjamin Gaignard wrote:
>> Le 19/06/2025 à 18:59, Jason Gunthorpe a écrit :
>>> On Thu, Jun 19, 2025 at 06:27:52PM +0200, Benjamin Gaignard wrote:
>>>> Le 19/06/2025 à 15:47, Jason Gunthorpe a écrit :
>>>>> Ugh. This ignores the gfp flags that are passed into map because you
>>>>> have to force atomic due to the spinlock that shouldn't be there :(
>>>>> This means it does not set GFP_KERNEL_ACCOUNT when required. It would
>>>>> be better to continue to use the passed in GFP flags but override them
>>>>> to atomic mode.
>>>> I will add a gfp_t parameter and use it like that:
>>>> page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
>>> This will or together GFP_ATOMIC and GFP_KERNEL, I don't think that
>>> works..
>> I have test it, I don't see conflicts or errors. What worries you ?
> Just looking at the bitmaps I'm not sure? Did you run with lockdep?

Yes, and it complains about that.
I see that the sun50i driver has more or less the same structures as my driver
but doesn't use a lock. I will try to see if I can remove the lock.

>>>>>> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
>>>>>> +				   struct device *dev)
>>>>>> +{
>>>>>> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>>>>>> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>>>>>> +	unsigned long flags;
>>>>>> +	int ret;
>>>>>> +
>>>>>> +	if (WARN_ON(!iommu))
>>>>>> +		return -ENODEV;
>>>>>> +
>>>>>> +	/* iommu already attached */
>>>>>> +	if (iommu->domain == domain)
>>>>>> +		return 0;
>>>>>> +
>>>>>> +	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
>>>>>> +	if (ret)
>>>>>> +		return ret;
>>>>> Hurm, this is actually quite bad, now that it is clear the HW is in an
>>>>> identity mode it is actually a security problem for VFIO to switch the
>>>>> translation to identity during attach_device. I'd really prefer new
>>>>> drivers don't make this mistake.
>>>>>
>>>>> It seems the main thing motivating this is the fact a linked list has
>>>>> only a single iommu->node so you can't attach the iommu to both the
>>>>> new/old domain and atomically update the page table base.
>>>>>
>>>>> Is it possible for the HW to do a blocking behavior? That would be an
>>>>> easy fix.. You should always be able to force this by allocating a
>>>>> shared top page table level during probe time and making it entirely
>>>>> empty while staying always in the paging mode. Maybe there is a less
>>>>> expensive way.
>>>>>
>>>>> Otherwise you probably have work more like the other drivers and
>>>>> allocate a struct for each attachment so you can have the iommu
>>>>> attached two domains during the switch over and never drop to an
>>>>> identity mode.
>>>> I will remove the switch to identity domain and it will works fine.
>>> You'll loose invalidations..
>>>
>>> Maybe the easiest thing is to squish vsi_iommu_enable() and reorganize
>>> it so that the spinlock is held across the register programming and
>>> then you can atomically under the lock change the registers and change
>>> the linked list. The register write cannot fail so this is fine.
>>>
>>> This should probably also flush the iotlb inside the lock.
>> I will try to summarize:
>> in vsi_iommu_attach_device() I should:
>> - take the lock
>> - do nothing if the domain is the same.
>> - if iommu is used (ie powered up):
>>    - update the registers to enable the iommu
>>    - flush
>>    - update the link list
>> - update iommu->domain
>> - release the lock
> That sounds believable, yes.. Though can you do the "powered up" checks
> inside the spinlock - are they sleeping? Can they be done before the
> spinlock?
>
>> in vsi_iommu_identity_attach() I should:
>> - take the lock
>> - do nothing if the domain is the same.
>> - if iommu is used (ie powered up):
>>    - disable the iommu
>>    - remove the node from the list
>> - update iommu->domain
>> - release the lock
> And maybe flush too? How does the caching hw work at this point? You
> can't have stale entries in the cache when you switch back to an
> enabled/translating configuration. So either the HW flushes implicitly
> or you need to add a flush somewhere..

I do not have the documentation for the hardware but it seems that
enable/disable are enough to do the job.
  

>   
>> vsi_iommu_suspend() and vsi_iommu_resume() will also have to take the lock
>> before calling vsi_iommu_disable() and vsi_iommu_enable().
> Yes, if they use iommu->domain that seems good
>
> If the above locking is a problem then I'd use the group mutex instead
> during resume/suspend. The attach functions are already called with
> the group mutex held.

Is the group mutex also held when vsi_iommu_map() or vsi_iommu_unmap() is called?

>
>> Do I have to switch to identity domain in vsi_iommu_attach_device()
>> before applying the requested domain ?
> No, that is what creates the security problem. What you want is to
> update the HW registers in a way that the HW just changes hitlessly
> from one translation to another, then flush the cache.
>
> Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20 13:52             ` Benjamin Gaignard
@ 2025-06-20 14:42               ` Benjamin Gaignard
  2025-06-20 16:35                 ` Jason Gunthorpe
  2025-06-20 16:36               ` Jason Gunthorpe
  1 sibling, 1 reply; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-20 14:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel


Le 20/06/2025 à 15:52, Benjamin Gaignard a écrit :
>
> Le 20/06/2025 à 14:05, Jason Gunthorpe a écrit :
>> On Fri, Jun 20, 2025 at 10:57:49AM +0200, Benjamin Gaignard wrote:
>>> Le 19/06/2025 à 18:59, Jason Gunthorpe a écrit :
>>>> On Thu, Jun 19, 2025 at 06:27:52PM +0200, Benjamin Gaignard wrote:
>>>>> Le 19/06/2025 à 15:47, Jason Gunthorpe a écrit :
>>>>>> Ugh. This ignores the gfp flags that are passed into map because you
>>>>>> have to force atomic due to the spinlock that shouldn't be there :(
>>>>>> This means it does not set GFP_KERNEL_ACCOUNT when required. It 
>>>>>> would
>>>>>> be better to continue to use the passed in GFP flags but override 
>>>>>> them
>>>>>> to atomic mode.
>>>>> I will add a gfp_t parameter and use it like that:
>>>>> page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC | GFP_DMA32, 
>>>>> SPAGE_SIZE);
>>>> This will or together GFP_ATOMIC and GFP_KERNEL, I don't think that
>>>> works..
>>> I have test it, I don't see conflicts or errors. What worries you ?
>> Just looking at the bitmaps I'm not sure? Did you run with lockdep?
>
> Yes and it complains about that.
> I see that sun50i driver have more less the same struct than my driver
> but doesn't use lock. I will try see if I can remove the lock.

I have replaced the two spinlocks with a mutex in the vsi_iommu structure.
It seems to work well and lockdep doesn't complain anymore.

>
>>>>>>> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
>>>>>>> +                   struct device *dev)
>>>>>>> +{
>>>>>>> +    struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>>>>>>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>>>>>>> +    unsigned long flags;
>>>>>>> +    int ret;
>>>>>>> +
>>>>>>> +    if (WARN_ON(!iommu))
>>>>>>> +        return -ENODEV;
>>>>>>> +
>>>>>>> +    /* iommu already attached */
>>>>>>> +    if (iommu->domain == domain)
>>>>>>> +        return 0;
>>>>>>> +
>>>>>>> +    ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
>>>>>>> +    if (ret)
>>>>>>> +        return ret;
>>>>>> Hurm, this is actually quite bad, now that it is clear the HW is 
>>>>>> in an
>>>>>> identity mode it is actually a security problem for VFIO to 
>>>>>> switch the
>>>>>> translation to identity during attach_device. I'd really prefer new
>>>>>> drivers don't make this mistake.
>>>>>>
>>>>>> It seems the main thing motivating this is the fact a linked list 
>>>>>> has
>>>>>> only a single iommu->node so you can't attach the iommu to both the
>>>>>> new/old domain and atomically update the page table base.
>>>>>>
>>>>>> Is it possible for the HW to do a blocking behavior? That would 
>>>>>> be an
>>>>>> easy fix.. You should always be able to force this by allocating a
>>>>>> shared top page table level during probe time and making it entirely
>>>>>> empty while staying always in the paging mode. Maybe there is a less
>>>>>> expensive way.
>>>>>>
>>>>>> Otherwise you probably have work more like the other drivers and
>>>>>> allocate a struct for each attachment so you can have the iommu
>>>>>> attached two domains during the switch over and never drop to an
>>>>>> identity mode.
>>>>> I will remove the switch to identity domain and it will works fine.
>>>> You'll loose invalidations..
>>>>
>>>> Maybe the easiest thing is to squish vsi_iommu_enable() and reorganize
>>>> it so that the spinlock is held across the register programming and
>>>> then you can atomically under the lock change the registers and change
>>>> the linked list. The register write cannot fail so this is fine.
>>>>
>>>> This should probably also flush the iotlb inside the lock.
>>> I will try to summarize:
>>> in vsi_iommu_attach_device() I should:
>>> - take the lock
>>> - do nothing if the domain is the same.
>>> - if iommu is used (ie powered up):
>>>    - update the registers to enable the iommu
>>>    - flush
>>>    - update the link list
>>> - update iommu->domain
>>> - release the lock
>> That sounds believable, yes.. Though can you do the "powered up" checks
>> inside the spinlock - are they sleeping? Can they be done before the
>> spinlock?
>>
>>> in vsi_iommu_identity_attach() I should:
>>> - take the lock
>>> - do nothing if the domain is the same.
>>> - if iommu is used (ie powered up):
>>>    - disable the iommu
>>>    - remove the node from the list
>>> - update iommu->domain
>>> - release the lock
>> And maybe flush too? How does the caching hw work at this point? You
>> can't have stale entries in the cache when you switch back to an
>> enabled/translating configuration. So either the HW flushes implicitly
>> or you need to add a flush somewhere..
>
> I do not have the documentation for the hardware but it seems that
> enable/disable are enough to do the job.
>
>
>>> vsi_iommu_suspend() and vsi_iommu_resume() will also have to take 
>>> the lock
>>> before calling vsi_iommu_disable() and vsi_iommu_enable().
>> Yes, if they use iommu->domain that seems good
>>
>> If the above locking is a problem then I'd use the group mutex instead
>> during resume/suspend. The attach functions are already called with
>> the group mutex held.
>
> Does group mutex is also called when using vsi_iommu_map or 
> vsi_iommu_unmap ?
>
>>
>>> Do I have to switch to identity domain in vsi_iommu_attach_device()
>>> before applying the requested domain ?
>> No, that is what creates the security problem. What you want is to
>> update the HW registers in a way that the HW just changes hitlessly
>> from one translation to another, then flush the cache.
>>
>> Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20 14:42               ` Benjamin Gaignard
@ 2025-06-20 16:35                 ` Jason Gunthorpe
  0 siblings, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2025-06-20 16:35 UTC (permalink / raw)
  To: Benjamin Gaignard
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel

On Fri, Jun 20, 2025 at 04:42:02PM +0200, Benjamin Gaignard wrote:
> 
> Le 20/06/2025 à 15:52, Benjamin Gaignard a écrit :
> > 
> > Le 20/06/2025 à 14:05, Jason Gunthorpe a écrit :
> > > On Fri, Jun 20, 2025 at 10:57:49AM +0200, Benjamin Gaignard wrote:
> > > > Le 19/06/2025 à 18:59, Jason Gunthorpe a écrit :
> > > > > On Thu, Jun 19, 2025 at 06:27:52PM +0200, Benjamin Gaignard wrote:
> > > > > > Le 19/06/2025 à 15:47, Jason Gunthorpe a écrit :
> > > > > > > Ugh. This ignores the gfp flags that are passed into map because you
> > > > > > > have to force atomic due to the spinlock that shouldn't be there :(
> > > > > > > This means it does not set GFP_KERNEL_ACCOUNT when
> > > > > > > required. It would
> > > > > > > be better to continue to use the passed in GFP flags
> > > > > > > but override them
> > > > > > > to atomic mode.
> > > > > > I will add a gfp_t parameter and use it like that:
> > > > > > page_table = iommu_alloc_pages_sz(gfp | GFP_ATOMIC |
> > > > > > GFP_DMA32, SPAGE_SIZE);
> > > > > This will or together GFP_ATOMIC and GFP_KERNEL, I don't think that
> > > > > works..
> > > > I have test it, I don't see conflicts or errors. What worries you ?
> > > Just looking at the bitmaps I'm not sure? Did you run with lockdep?
> > 
> > Yes and it complains about that.
> > I see that sun50i driver have more less the same struct than my driver
> > but doesn't use lock. I will try see if I can remove the lock.
> 
> I have replace the two spinlock by a mutex in vsi_iommu structure.
> It seems it works well and lockdep doesn't complain anymore.

You cannot use a sleeping lock within the map/unmap
functions. Removing the lock is hard for your case because you have
the cache flushing problem.

Maybe mask GFP_KERNEL off and then OR GFP_ATOMIC back in.
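
i.e. something along these lines (sketch only), which drops the
sleeping/reclaim behaviour of GFP_KERNEL while keeping caller bits
such as __GFP_ACCOUNT:

	page_table = iommu_alloc_pages_sz((gfp & ~GFP_KERNEL) | GFP_ATOMIC | GFP_DMA32,
					  SPAGE_SIZE);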

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20 13:52             ` Benjamin Gaignard
  2025-06-20 14:42               ` Benjamin Gaignard
@ 2025-06-20 16:36               ` Jason Gunthorpe
  1 sibling, 0 replies; 22+ messages in thread
From: Jason Gunthorpe @ 2025-06-20 16:36 UTC (permalink / raw)
  To: Benjamin Gaignard
  Cc: joro, will, robin.murphy, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, iommu, devicetree, linux-kernel,
	linux-arm-kernel, linux-rockchip, kernel

On Fri, Jun 20, 2025 at 03:52:46PM +0200, Benjamin Gaignard wrote:
> > > vsi_iommu_suspend() and vsi_iommu_resume() will also have to take the lock
> > > before calling vsi_iommu_disable() and vsi_iommu_enable().
> > Yes, if they use iommu->domain that seems good
> > 
> > If the above locking is a problem then I'd use the group mutex instead
> > during resume/suspend. The attach functions are already called with
> > the group mutex held.
> 
> Does group mutex is also called when using vsi_iommu_map or vsi_iommu_unmap ?

No

Jason

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-19 13:12 ` [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver Benjamin Gaignard
  2025-06-19 13:47   ` Jason Gunthorpe
@ 2025-06-20 19:37   ` Robin Murphy
  2025-06-20 20:45     ` Lucas Stach
  2025-06-23 14:03     ` Benjamin Gaignard
  1 sibling, 2 replies; 22+ messages in thread
From: Robin Murphy @ 2025-06-20 19:37 UTC (permalink / raw)
  To: Benjamin Gaignard, joro, will, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel

On 19/06/2025 2:12 pm, Benjamin Gaignard wrote:
> The Verisilicon IOMMU hardware block can be found in combination
> with Verisilicon hardware video codecs (encoders or decoders) on
> different SoCs.
> Enable it will allow us to use non contiguous memory allocators
> for Verisilicon video codecs.
> 
> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> ---
> changes in version 3:
> - Change compatible to "rockchip,rk3588-iommu-1.2"
> - Create an identity domain for the driver
> - Fix double flush issue
> - Rework attach/detach logic
> - Simplify xlate function
> - Discover iommu device like done in ARM driver
> - Remove ARM_DMA_USE_IOMMU from Kconfig
> 
>   drivers/iommu/Kconfig     |  11 +
>   drivers/iommu/Makefile    |   1 +
>   drivers/iommu/vsi-iommu.c | 874 ++++++++++++++++++++++++++++++++++++++
>   3 files changed, 886 insertions(+)
>   create mode 100644 drivers/iommu/vsi-iommu.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 0a33d995d15d..3e95d1db737b 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -383,4 +383,15 @@ config SPRD_IOMMU
>   
>   	  Say Y here if you want to use the multimedia devices listed above.
>   
> +config VSI_IOMMU
> +	tristate "Verisilicon IOMMU Support"
> +	depends on ARM64

"(ARCH_ROCKCHIP && ARM64) || COMPILE_TEST", probably. Otherwise you 
might risk annoying Geert :)

> +	select IOMMU_API
> +	help
> +	  Support for IOMMUs used by Verisilicon sub-systems like video
> +	  decoders or encoder hardware blocks.
> +
> +	  Say Y here if you want to use this IOMMU in front of these
> +	  hardware blocks.
> +
>   endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 355294fa9033..68aeff31af8b 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -34,3 +34,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>   obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
>   obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
>   obj-$(CONFIG_APPLE_DART) += apple-dart.o
> +obj-$(CONFIG_VSI_IOMMU) += vsi-iommu.o
> diff --git a/drivers/iommu/vsi-iommu.c b/drivers/iommu/vsi-iommu.c
> new file mode 100644
> index 000000000000..89e63a6a60c1
> --- /dev/null
> +++ b/drivers/iommu/vsi-iommu.c
> @@ -0,0 +1,874 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (C) 2025 Collabora Ltd.
> + *
> + * IOMMU API for Verisilicon
> + *
> + * Module Authors:	Yandong Lin <yandong.lin@rock-chips.com>
> + *			Simon Xue <xxm@rock-chips.com>
> + *			Benjamin Gaignard <benjamin.gaignard@collabora.com>
> + */
> +
> +#include <linux/clk.h>
> +#include <linux/compiler.h>
> +#include <linux/delay.h>
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include <linux/dma-map-ops.h>

This shouldn't be here, it's a device driver not a DMA API implementation.

> +#include <linux/errno.h>
> +#include <linux/init.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/iommu.h>
> +#include <linux/iopoll.h>
> +#include <linux/list.h>
> +#include <linux/mm.h>
> +#include <linux/module.h>
> +#include <linux/of.h>
> +#include <linux/of_iommu.h>
> +#include <linux/of_platform.h>
> +#include <linux/platform_device.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +#include "iommu-pages.h"
> +
> +struct vsi_iommu {
> +	struct device *dev;
> +	void __iomem **bases;
> +	int num_mmu;

What are these for, given the binding specifies a single "reg"?

If you do anticipate supporting multiple MMUs per client device, I would 
still strongly consider following the exynos-iommu approach of linking 
distinct instances together internally - the "single instance with 
multiple interfaces" model is a bit of a bodge, and has the big weakness 
that you tend to end up forever having to evolve the "clocks", 
"interrupts" etc. bindings in weird and arbitrary ways for different 
integrations. Especially if this is a 3rd-party IP that's liable to be 
used by multiple different SoC vendors.

> +	int num_irq;
> +	struct clk_bulk_data *clocks;
> +	int num_clocks;
> +	struct iommu_device iommu;
> +	struct list_head node; /* entry in vsi_iommu_domain.iommus */
> +	struct iommu_domain *domain; /* domain to which iommu is attached */
> +};
> +
> +struct vsi_iommu_domain {
> +	struct device *dma_dev;
> +	struct list_head iommus;
> +	u32 *dt; /* page directory table */
> +	dma_addr_t dt_dma;
> +	spinlock_t iommus_lock; /* lock for iommus list */
> +	spinlock_t dt_lock; /* lock for modifying page directory table */
> +	struct iommu_domain domain;
> +	/* for vsi iommu */

Wait, so who is the rest of this driver-private structure for, if not 
also for this driver? :/

> +	u64 *pta; /* page directory table */
> +	dma_addr_t pta_dma;
> +};
> +
> +static struct iommu_domain vsi_identity_domain;
> +
> +#define NUM_DT_ENTRIES	1024
> +#define NUM_PT_ENTRIES	1024
> +#define PT_SIZE		(NUM_PT_ENTRIES * sizeof(u32))
> +
> +#define SPAGE_SIZE	BIT(12)
> +
> +/* vsi iommu regs address */
> +#define VSI_MMU_CONFIG1_BASE			0x1ac
> +#define VSI_MMU_AHB_EXCEPTION_BASE		0x380
> +#define VSI_MMU_AHB_CONTROL_BASE		0x388
> +#define VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE	0x38C
> +
> +/* MMU register offsets */
> +#define VSI_MMU_FLUSH_BASE		0x184
> +#define VSI_MMU_BIT_FLUSH		BIT(4)
> +
> +#define VSI_MMU_PAGE_FAULT_ADDR		0x380
> +#define VSI_MMU_STATUS_BASE		0x384	/* IRQ status */
> +
> +#define VSI_MMU_BIT_ENABLE		BIT(0)
> +
> +#define VSI_MMU_OUT_OF_BOUND		BIT(28)
> +/* Irq mask */
> +#define VSI_MMU_IRQ_MASK		0x7
> +
> +#define VSI_DTE_PT_ADDRESS_MASK		0xffffffc0

I'm curious what those lowest 6 bits are for - does it really only 
require 64-byte alignment for pagetables, or can it actually accommodate 
a folded >32-bit address similar to the PTE level?

> +#define VSI_DTE_PT_VALID		BIT(0)
> +
> +#define VSI_PAGE_DESC_LO_MASK		0xfffff000
> +#define VSI_PAGE_DESC_HI_MASK		GENMASK_ULL(39, 32)
> +#define VSI_PAGE_DESC_HI_SHIFT		(32 - 4)
> +
> +static inline phys_addr_t vsi_dte_pt_address(u32 dte)
> +{
> +	return (phys_addr_t)dte & VSI_DTE_PT_ADDRESS_MASK;
> +}
> +
> +static inline u32 vsi_mk_dte(u32 dte)
> +{
> +	return (phys_addr_t)dte | VSI_DTE_PT_VALID;
> +}
> +
> +#define VSI_PTE_PAGE_ADDRESS_MASK	0xfffffff0
> +#define VSI_PTE_PAGE_WRITABLE		BIT(2)

Any idea if there are other useful permission bits?

> +#define VSI_PTE_PAGE_VALID		BIT(0)
> +
> +static inline phys_addr_t vsi_pte_page_address(u32 pte)
> +{
> +	u64 pte_vsi = pte;
> +
> +	pte_vsi = ((pte_vsi & VSI_PAGE_DESC_HI_MASK) << VSI_PAGE_DESC_HI_SHIFT) |

"(pte_vsi & VSI_PAGE_DESC_HI_MASK)" will by definition be 0.

> +		  (pte_vsi & VSI_PAGE_DESC_LO_MASK);
> +
> +	return (phys_addr_t)pte_vsi;
> +}
> +
> +static u32 vsi_mk_pte(phys_addr_t page, int prot)
> +{
> +	u32 flags = 0;
> +
> +	flags |= (prot & IOMMU_WRITE) ? VSI_PTE_PAGE_WRITABLE : 0;
> +	page = (page & VSI_PAGE_DESC_LO_MASK) |
> +	       ((page & VSI_PAGE_DESC_HI_MASK) >> VSI_PAGE_DESC_HI_SHIFT);
> +	page &= VSI_PTE_PAGE_ADDRESS_MASK;

If VSI_PAGE_DESC_LO_MASK and VSI_PAGE_DESC_HI_MASK are correct to start 
with then VSI_PTE_PAGE_ADDRESS_MASK serves no purpose.

> +	return page | flags | VSI_PTE_PAGE_VALID;
> +}
> +
> +#define VSI_DTE_PT_VALID	BIT(0)
> +
> +static inline bool vsi_dte_is_pt_valid(u32 dte)
> +{
> +	return dte & VSI_DTE_PT_VALID;
> +}
> +
> +static inline bool vsi_pte_is_page_valid(u32 pte)
> +{
> +	return pte & VSI_PTE_PAGE_VALID;
> +}
> +
> +static u32 vsi_mk_pte_invalid(u32 pte)
> +{
> +	return pte & ~VSI_PTE_PAGE_VALID;
> +}
> +
> +#define VSI_MASTER_TLB_MASK	GENMASK_ULL(31, 10)

I note that this ends up being associated with the 
VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE register - is the "TLB"/"TBL" 
difference a typo one way or the other, or definitely intentional?

(For all I know maybe it really could be a table of translation 
lookaside buffers?)

> +/* mode 0 : 4k */
> +#define VSI_PTA_4K_MODE	0
> +
> +static u64 vsi_mk_pta(dma_addr_t dt_dma)
> +{
> +	u64 val = (dt_dma & VSI_MASTER_TLB_MASK) | VSI_PTA_4K_MODE;
> +
> +	return val;
> +}
> +
> +static struct vsi_iommu_domain *to_vsi_domain(struct iommu_domain *dom)
> +{
> +	return container_of(dom, struct vsi_iommu_domain, domain);
> +}
> +
> +static void vsi_iommu_disable(struct vsi_iommu *iommu)
> +{
> +	int i;
> +
> +	/* Ignore error while disabling, just keep going */

FWIW I thought that comment was wrong at first, because we're clearly 
not disabling the clocks...

> +	WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks));
> +	for (i = 0; i < iommu->num_mmu; i++)
> +		writel(0, iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);

However, even if the IOMMU itself is going away, is it really safe to 
ignore an error if it means we could risk hanging on an unclocked 
register access here?

> +	clk_bulk_disable(iommu->num_clocks, iommu->clocks);
> +}
> +
> +static int vsi_iommu_enable(struct vsi_iommu *iommu)
> +{
> +	struct iommu_domain *domain = iommu->domain;
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	int ret, i;
> +
> +	ret = clk_bulk_enable(iommu->num_clocks, iommu->clocks);
> +	if (ret)
> +		return ret;
> +
> +	for (i = 0; i < iommu->num_mmu; i++) {
> +		u32 val = readl(iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
> +
> +		if (!(val & VSI_MMU_BIT_ENABLE)) {

If the hardware happens to be enabled already, can you be sure it's 
enabled *with the expected configuration*?

> +			writel(vsi_domain->pta_dma,
> +			       iommu->bases[i] + VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE);
> +			writel(VSI_MMU_OUT_OF_BOUND, iommu->bases[i] + VSI_MMU_CONFIG1_BASE);
> +			writel(VSI_MMU_BIT_ENABLE, iommu->bases[i] + VSI_MMU_AHB_EXCEPTION_BASE);
> +			writel(VSI_MMU_BIT_ENABLE, iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
> +		}
> +	}
> +	clk_bulk_disable(iommu->num_clocks, iommu->clocks);
> +
> +	return ret;
> +}
> +
> +static inline void vsi_table_flush(struct vsi_iommu_domain *vsi_domain, dma_addr_t dma,
> +				   unsigned int count)
> +{
> +	size_t size = count * sizeof(u32); /* count of u32 entry */
> +
> +	dma_sync_single_for_device(vsi_domain->dma_dev, dma, size, DMA_TO_DEVICE);
> +}
> +
> +#define VSI_IOVA_DTE_MASK	0xffc00000
> +#define VSI_IOVA_DTE_SHIFT	22
> +#define VSI_IOVA_PTE_MASK	0x003ff000
> +#define VSI_IOVA_PTE_SHIFT	12
> +#define VSI_IOVA_PAGE_MASK	0x00000fff
> +#define VSI_IOVA_PAGE_SHIFT	0
> +
> +static u32 vsi_iova_dte_index(dma_addr_t iova)
> +{
> +	return (u32)(iova & VSI_IOVA_DTE_MASK) >> VSI_IOVA_DTE_SHIFT;

Are these u32 casts really necessary? At worst, why not just make the 
"iova" argument u32 to begin with?

> +}
> +
> +static u32 vsi_iova_pte_index(dma_addr_t iova)
> +{
> +	return (u32)(iova & VSI_IOVA_PTE_MASK) >> VSI_IOVA_PTE_SHIFT;
> +}
> +
> +static u32 vsi_iova_page_offset(dma_addr_t iova)
> +{
> +	return (u32)(iova & VSI_IOVA_PAGE_MASK) >> VSI_IOVA_PAGE_SHIFT;
> +}
> +
> +static u32 vsi_iommu_read(void __iomem *base, u32 offset)
> +{
> +	return readl(base + offset);
> +}
> +
> +static void vsi_iommu_write(void __iomem *base, u32 offset, u32 value)
> +{
> +	writel(value, base + offset);
> +}

Ditch these read/write wrappers, they're used all of once, and they're 
still longer than just writing out the straightforward readl/writel 
directly. Abstracting a structure member dereference is one thing, but 
abstracting a single addition is entirely unnecessary.

> +static void vsi_iommu_flush_tlb_all(struct iommu_domain *domain)
> +{
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	struct list_head *pos;
> +	unsigned long flags;
> +	int i;
> +
> +	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
> +	list_for_each(pos, &vsi_domain->iommus) {
> +		struct vsi_iommu *iommu;
> +		int ret;
> +
> +		iommu = list_entry(pos, struct vsi_iommu, node);
> +		ret = pm_runtime_get_if_in_use(iommu->dev);
> +		if (WARN_ON_ONCE(ret < 0))
> +			continue;
> +		if (ret) {
> +			WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks));

Again, would it really be OK to go ahead and access a potentially 
unclocked register?

TBH I'd drop all this busywork with the clocks altogether, and just 
enable/disable them in the runtime PM callbacks. I can't imagine the 
IOMMU would actually work very well with its core clock stopped anyway - 
presumably you're only getting away with it in this case because the 
clocks are shared with the codec so that's keeping them enabled while 
there is traffic for the IOMMU to translate. If someone really really 
really wants to gate the interface clock between register accesses on 
some other platform where it would have an effect, they can always come 
back and add that later.
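
For example, something along these lines (a sketch; the driver already
sets drvdata and prepares the clocks at probe, and its existing
suspend/resume callbacks would be the natural place to fold this into):

static int vsi_iommu_runtime_suspend(struct device *dev)
{
	struct vsi_iommu *iommu = dev_get_drvdata(dev);

	clk_bulk_disable(iommu->num_clocks, iommu->clocks);
	return 0;
}

static int vsi_iommu_runtime_resume(struct device *dev)
{
	struct vsi_iommu *iommu = dev_get_drvdata(dev);

	/* clocks were prepared at probe time, so only enable/disable here */
	return clk_bulk_enable(iommu->num_clocks, iommu->clocks);
}

static const struct dev_pm_ops vsi_iommu_pm_ops = {
	SET_RUNTIME_PM_OPS(vsi_iommu_runtime_suspend,
			   vsi_iommu_runtime_resume, NULL)
};

with the platform driver's .driver.pm pointing at vsi_iommu_pm_ops.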

> +			for (i = 0; i < iommu->num_mmu; i++) {
> +				writel(VSI_MMU_BIT_FLUSH,
> +				       iommu->bases[i] + VSI_MMU_FLUSH_BASE);
> +				writel(0, iommu->bases[i] + VSI_MMU_FLUSH_BASE);

That's curious - you set the bit, then explicitly clear it again 
immediately, but don't have to wait for any kind of completion status?

> +			}
> +			clk_bulk_disable(iommu->num_clocks, iommu->clocks);
> +			pm_runtime_put(iommu->dev);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
> +}
> +
> +static irqreturn_t vsi_iommu_irq(int irq, void *dev_id)
> +{
> +	struct vsi_iommu *iommu = dev_id;
> +	u32 int_status;
> +	dma_addr_t iova;
> +	irqreturn_t ret = IRQ_NONE;
> +	int i, err;
> +
> +	err = pm_runtime_get_if_in_use(iommu->dev);
> +	if (!err || WARN_ON_ONCE(err < 0))
> +		return ret;
> +
> +	if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)))
> +		goto out;
> +
> +	for (i = 0; i < iommu->num_mmu; i++) {
> +		int_status = vsi_iommu_read(iommu->bases[i], VSI_MMU_STATUS_BASE);
> +		if (int_status & VSI_MMU_IRQ_MASK) {
> +			dev_err(iommu->dev, "unexpected int_status=%08x\n", int_status);
> +			iova = vsi_iommu_read(iommu->bases[i], VSI_MMU_PAGE_FAULT_ADDR);
> +
> +			if (iommu->domain)

The current domain should never be NULL. You should default to either a 
blocking or bypass domain (depending on the hardware behaviour), and 
initialise that before you ever even request the IRQ.

> +				report_iommu_fault(iommu->domain, iommu->dev, iova, int_status);
> +			else
> +				dev_err(iommu->dev,
> +					"Page fault while iommu not attached to domain?\n");
> +		}
> +		vsi_iommu_write(iommu->bases[i], VSI_MMU_STATUS_BASE, 0);
> +		ret = IRQ_HANDLED;
> +	}
> +
> +	clk_bulk_disable(iommu->num_clocks, iommu->clocks);
> +
> +out:
> +	pm_runtime_put(iommu->dev);
> +	return ret;
> +}
> +
> +static struct vsi_iommu *vsi_iommu_get_from_dev(struct device *dev)
> +{
> +	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
> +	struct device *iommu_dev = bus_find_device_by_fwnode(&platform_bus_type,
> +							     fwspec->iommu_fwnode);
> +
> +	put_device(iommu_dev);
> +
> +	return iommu_dev ? dev_get_drvdata(iommu_dev) : NULL;
> +}
> +
> +static struct iommu_domain *vsi_iommu_domain_alloc_paging(struct device *dev)
> +{
> +	struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);

Use dev_iommu_priv_get() - we get here via a dev_iommu_ops() lookup, so 
dev is already guaranteed to be one of your clients.

> +	struct vsi_iommu_domain *vsi_domain;
> +
> +	vsi_domain = kzalloc(sizeof(*vsi_domain), GFP_KERNEL);
> +	if (!vsi_domain)
> +		return NULL;
> +
> +	vsi_domain->dma_dev = iommu->dev;
> +	iommu->domain = &vsi_identity_domain;

Nope, that should only happen when the domain is actually attached.

> +	/*
> +	 * iommu use a 2 level pagetable.
> +	 * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
> +	 * Allocate one 4 KiB page for each table.
> +	 */
> +	vsi_domain->dt = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
> +					      SPAGE_SIZE);
> +	if (!vsi_domain->dt)
> +		goto err_free_domain;
> +
> +	vsi_domain->dt_dma = dma_map_single(vsi_domain->dma_dev, vsi_domain->dt,
> +					    SPAGE_SIZE, DMA_TO_DEVICE);
> +	if (dma_mapping_error(vsi_domain->dma_dev, vsi_domain->dt_dma)) {
> +		dev_err(vsi_domain->dma_dev, "DMA map error for DT\n");
> +		goto err_free_dt;
> +	}
> +
> +	vsi_domain->pta = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
> +					       SPAGE_SIZE);
> +	if (!vsi_domain->pta)
> +		goto err_unmap_dt;
> +
> +	vsi_domain->pta[0] = vsi_mk_pta(vsi_domain->dt_dma);
> +	vsi_domain->pta_dma = dma_map_single(vsi_domain->dma_dev, vsi_domain->pta,
> +					     SPAGE_SIZE, DMA_TO_DEVICE);
> +	if (dma_mapping_error(vsi_domain->dma_dev, vsi_domain->pta_dma)) {
> +		dev_err(vsi_domain->dma_dev, "DMA map error for PTA\n");
> +		goto err_free_pta;
> +	}

I'm especially curious what this "pta" really is - is the comment above 
just wrong, and you've actually got a 3-level pagetable supporting 
somewhere between 33 and 42 bits of VA? If not, then the additional 
level of indirection would very much seem to imply some kind of 
mechanism for accommodating multiple pagetables at once, and in that 
case, is it like a PASID table where the client device gets to choose 
which entry to use, or a StreamID table to disambiguate multiple client 
devices? (Where in the latter case it should most likely belong to the 
IOMMU rather than the domain, and you probably want nonzero #iommu-cells 
in the DT binding for the client IDs).

> +	spin_lock_init(&vsi_domain->iommus_lock);
> +	spin_lock_init(&vsi_domain->dt_lock);
> +	INIT_LIST_HEAD(&vsi_domain->iommus);
> +
> +	vsi_domain->domain.geometry.aperture_start = 0;
> +	vsi_domain->domain.geometry.aperture_end   = DMA_BIT_MASK(32);
> +	vsi_domain->domain.geometry.force_aperture = true;
> +	vsi_domain->domain.pgsize_bitmap = SZ_4K;
> +
> +	return &vsi_domain->domain;
> +
> +err_free_pta:
> +	iommu_free_pages(vsi_domain->pta);
> +err_unmap_dt:
> +	dma_unmap_single(vsi_domain->dma_dev, vsi_domain->dt_dma,
> +			 SPAGE_SIZE, DMA_TO_DEVICE);
> +err_free_dt:
> +	iommu_free_pages(vsi_domain->dt);
> +err_free_domain:
> +	kfree(vsi_domain);
> +
> +	return NULL;
> +}
> +
> +static phys_addr_t vsi_iommu_iova_to_phys(struct iommu_domain *domain,
> +					  dma_addr_t iova)
> +{
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	unsigned long flags;
> +	phys_addr_t pt_phys, phys = 0;
> +	u32 dte, pte;
> +	u32 *page_table;
> +
> +	spin_lock_irqsave(&vsi_domain->dt_lock, flags);
> +
> +	dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
> +	if (!vsi_dte_is_pt_valid(dte))
> +		goto out;
> +
> +	pt_phys = vsi_dte_pt_address(dte);
> +	page_table = (u32 *)phys_to_virt(pt_phys);
> +	pte = page_table[vsi_iova_pte_index(iova)];
> +	if (!vsi_pte_is_page_valid(pte))
> +		goto out;
> +
> +	phys = vsi_pte_page_address(pte) + vsi_iova_page_offset(iova);
> +out:
> +	spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
> +
> +	return phys;
> +}
> +
> +static u32 *vsi_dte_get_page_table(struct vsi_iommu_domain *vsi_domain, dma_addr_t iova)
> +{
> +	u32 *page_table, *dte_addr;
> +	u32 dte_index, dte;
> +	phys_addr_t pt_phys;
> +	dma_addr_t pt_dma;
> +
> +	assert_spin_locked(&vsi_domain->dt_lock);
> +
> +	dte_index = vsi_iova_dte_index(iova);
> +	dte_addr = &vsi_domain->dt[dte_index];
> +	dte = *dte_addr;
> +	if (vsi_dte_is_pt_valid(dte))
> +		goto done;
> +
> +	page_table = (u32 *)iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
> +	if (!page_table)
> +		return ERR_PTR(-ENOMEM);
> +
> +	dte = vsi_mk_dte(virt_to_phys(page_table));

No no no no no. You're already generating pt_dma the correct way just 
below, use it! By all means *check* what dma_map_single() returns to 
make sure you're not getting any unexpected translation/bounce-buffering 
if you want to be able to rely on using phys_to_virt() as a shortcut 
later, but never blindly assume that virt_to_phys() is appropriate for 
DMA (for instance, what if CONFIG_ZONE_DMA32 isn't enabled so page_table 
ended up at a >32-bit address?)

> +	*dte_addr = dte;
> +
> +	pt_dma = dma_map_single(vsi_domain->dma_dev, page_table, SPAGE_SIZE, DMA_TO_DEVICE);
> +	if (dma_mapping_error(vsi_domain->dma_dev, pt_dma)) {
> +		dev_err(vsi_domain->dma_dev, "DMA mapping error while allocating page table\n");
> +		iommu_free_pages(page_table);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	vsi_table_flush(vsi_domain,
> +			vsi_domain->dt_dma + dte_index * sizeof(u32), 1);

The order seems a bit jumbled up here as well - it would be safest to 
get everything done with page_table itself first, *then* worry about 
updating the DTE to point to it.
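
Roughly (sketch only - the point being that the DMA address is what
goes into the DTE, and the DTE is only published after the new table
has been mapped):

	page_table = iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, SPAGE_SIZE);
	if (!page_table)
		return ERR_PTR(-ENOMEM);

	pt_dma = dma_map_single(vsi_domain->dma_dev, page_table, SPAGE_SIZE,
				DMA_TO_DEVICE);
	if (dma_mapping_error(vsi_domain->dma_dev, pt_dma)) {
		iommu_free_pages(page_table);
		return ERR_PTR(-ENOMEM);
	}

	/* only now point the DTE at the new table, using its DMA address */
	dte = vsi_mk_dte(pt_dma);
	*dte_addr = dte;
	vsi_table_flush(vsi_domain, vsi_domain->dt_dma + dte_index * sizeof(u32), 1);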

> +done:
> +	pt_phys = vsi_dte_pt_address(dte);
> +	return (u32 *)phys_to_virt(pt_phys);
> +}
> +
> +static size_t vsi_iommu_unmap_iova(struct vsi_iommu_domain *vsi_domain,
> +				   u32 *pte_addr, dma_addr_t pte_dma,
> +				   size_t size)
> +{
> +	unsigned int pte_count;
> +	unsigned int pte_total = size / SPAGE_SIZE;
> +
> +	assert_spin_locked(&vsi_domain->dt_lock);
> +
> +	for (pte_count = 0; pte_count < pte_total; pte_count++) {
> +		u32 pte = pte_addr[pte_count];

What prevents this running off the end of the pagetable page? AFAICS 
you're not capping "size" to DTE boundaries in the main callback either.

> +
> +		if (!vsi_pte_is_page_valid(pte))
> +			break;
> +
> +		pte_addr[pte_count] = vsi_mk_pte_invalid(pte);
> +	}
> +
> +	vsi_table_flush(vsi_domain, pte_dma, pte_count);
> +
> +	return pte_count * SPAGE_SIZE;
> +}
> +
> +static int vsi_iommu_map_iova(struct vsi_iommu_domain *vsi_domain, u32 *pte_addr,
> +			      dma_addr_t pte_dma, dma_addr_t iova,
> +			      phys_addr_t paddr, size_t size, int prot)
> +{
> +	unsigned int pte_count;
> +	unsigned int pte_total = size / SPAGE_SIZE;
> +
> +	assert_spin_locked(&vsi_domain->dt_lock);
> +
> +	for (pte_count = 0; pte_count < pte_total; pte_count++) {
> +		u32 pte = pte_addr[pte_count];
> +
> +		if (vsi_pte_is_page_valid(pte))
> +			goto unwind;
> +
> +		pte_addr[pte_count] = vsi_mk_pte(paddr, prot);
> +
> +		paddr += SPAGE_SIZE;
> +	}
> +
> +	vsi_table_flush(vsi_domain, pte_dma, pte_total);
> +
> +	return 0;
> +unwind:
> +	/* Unmap the range of iovas that we just mapped */
> +	vsi_iommu_unmap_iova(vsi_domain, pte_addr, pte_dma,
> +			     pte_count * SPAGE_SIZE);

If you failed to map anything, return an error; otherwise, just return 
however much you did map successfully. The IOMMU core will take care of 
the rest.
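
i.e. roughly (sketch): vsi_iommu_map_iova() reports the bytes it
actually managed to map, and vsi_iommu_map() forwards that through
*mapped:

	/* in vsi_iommu_map_iova(): no unwind, just report progress */
	vsi_table_flush(vsi_domain, pte_dma, pte_count);
	return pte_count ? pte_count * SPAGE_SIZE : -EADDRINUSE;

	/* in vsi_iommu_map(): */
	ret = vsi_iommu_map_iova(vsi_domain, pte_addr, pte_dma, iova, paddr, size, prot);
	spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
	if (ret < 0)
		return ret;
	*mapped = ret;
	return 0;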

> +
> +	return -EADDRINUSE;
> +}
> +
> +static size_t vsi_iommu_unmap(struct iommu_domain *domain, unsigned long _iova,
> +			      size_t size, size_t count, struct iommu_iotlb_gather *gather)
> +{
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	unsigned long flags;
> +	dma_addr_t pte_dma, iova = (dma_addr_t)_iova;
> +	phys_addr_t pt_phys;
> +	u32 dte;
> +	u32 *pte_addr;
> +	size_t unmap_size;
> +
> +	spin_lock_irqsave(&vsi_domain->dt_lock, flags);
> +
> +	dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
> +	/* Just return 0 if iova is unmapped */
> +	if (!vsi_dte_is_pt_valid(dte)) {
> +		spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
> +		return 0;
> +	}
> +
> +	pt_phys = vsi_dte_pt_address(dte);
> +	pte_addr = (u32 *)phys_to_virt(pt_phys) + vsi_iova_pte_index(iova);
> +	pte_dma = pt_phys + vsi_iova_pte_index(iova) * sizeof(u32);
> +	unmap_size = vsi_iommu_unmap_iova(vsi_domain, pte_addr, pte_dma, size);
> +
> +	spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
> +
> +	return unmap_size;
> +}
> +
> +static int vsi_iommu_map(struct iommu_domain *domain, unsigned long _iova,
> +			 phys_addr_t paddr, size_t size, size_t count,
> +			 int prot, gfp_t gfp, size_t *mapped)
> +{
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	unsigned long flags;
> +	dma_addr_t pte_dma, iova = (dma_addr_t)_iova;
> +	u32 *page_table, *pte_addr;
> +	u32 dte, pte_index;
> +	int ret;
> +
> +	/*
> +	 * IOMMU drivers are not supposed to lock the page table, however this
> +	 * driver does not safely handle the cache flushing or table
> +	 * installation across concurrent threads so locking is used as a simple
> +	 * solution.
> +	 */

No need for that comment - it's perfectly fine for IOMMU drivers to 
serialise pagetable accesses if they want to. Of course if they're the 
kind of IOMMU that will find itself in a big server system with hundreds 
of CPUs mapping and unmapping tens of thousands of network packets per 
second in parallel, then for sure it's inadvisable from a performance 
perspective, but for a little embedded IOMMU that's only going to be 
handling relatively long-lived media buffers there is absolutely nothing 
wrong with simple and straightforward. In fact if you had tried to do 
clever lock-free stuff here, I would definitely be asking "do you really 
need this?" :)

> +	spin_lock_irqsave(&vsi_domain->dt_lock, flags);
> +
> +	page_table = vsi_dte_get_page_table(vsi_domain, iova);
> +	if (IS_ERR(page_table)) {
> +		spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
> +		return PTR_ERR(page_table);
> +	}
> +
> +	dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
> +	pte_index = vsi_iova_pte_index(iova);
> +	pte_addr = &page_table[pte_index];
> +	pte_dma = vsi_dte_pt_address(dte) + pte_index * sizeof(u32);
> +	ret = vsi_iommu_map_iova(vsi_domain, pte_addr, pte_dma, iova,
> +				 paddr, size, prot);
> +
> +	spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
> +	if (!ret)
> +		*mapped = size;
> +
> +	return ret;
> +}
> +
> +static int vsi_iommu_identity_attach(struct iommu_domain *domain,
> +				     struct device *dev)
> +{
> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	unsigned long flags;
> +	int ret;
> +
> +	if (WARN_ON(!iommu))
> +		return -ENODEV;

That can never happen. The domain is already validated against the 
device ops in __iommu_attach_group().

> +
> +	if (iommu->domain == domain)
> +		return 0;
> +
> +	iommu->domain = domain;
> +
> +	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);

Erm, what lock? vsi_identity_domain is a plain struct iommu_domain, so 
this vsi_domain pointer has container_of()ed out into other adjacent 
static data... :O

> +	list_del_init(&iommu->node);
> +	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
> +
> +	ret = pm_runtime_get_if_in_use(iommu->dev);
> +	WARN_ON_ONCE(ret < 0);
> +	if (ret > 0) {
> +		vsi_iommu_disable(iommu);
> +		pm_runtime_put(iommu->dev);
> +	}
> +
> +	return 0;
> +}
> +
> +static struct iommu_domain_ops vsi_identity_ops = {

Const.

> +	.attach_dev = vsi_iommu_identity_attach,
> +};
> +
> +static struct iommu_domain vsi_identity_domain = {
> +	.type = IOMMU_DOMAIN_IDENTITY,
> +	.ops = &vsi_identity_ops,
> +};
> +
> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
> +				   struct device *dev)
> +{
> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	unsigned long flags;
> +	int ret;
> +
> +	if (WARN_ON(!iommu))
> +		return -ENODEV;

Similarly impossible.

> +
> +	/* iommu already attached */
> +	if (iommu->domain == domain)
> +		return 0;
> +
> +	ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
> +	if (ret)
> +		return ret;
> +
> +	iommu->domain = domain;
> +
> +	spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
> +	list_add_tail(&iommu->node, &vsi_domain->iommus);
> +	spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
> +
> +	ret = pm_runtime_get_if_in_use(iommu->dev);
> +	if (!ret || WARN_ON_ONCE(ret < 0))
> +		return 0;
> +
> +	ret = vsi_iommu_enable(iommu);
> +	if (ret)
> +		WARN_ON(vsi_iommu_identity_attach(&vsi_identity_domain, dev));
> +
> +	pm_runtime_put(iommu->dev);
> +
> +	return ret;
> +}
> +
> +static void vsi_iommu_domain_free(struct iommu_domain *domain)
> +{
> +	struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
> +	int i;
> +
> +	WARN_ON(!list_empty(&vsi_domain->iommus));
> +
> +	for (i = 0; i < NUM_DT_ENTRIES; i++) {
> +		u32 dte = vsi_domain->dt[i];
> +
> +		if (vsi_dte_is_pt_valid(dte)) {
> +			phys_addr_t pt_phys = vsi_dte_pt_address(dte);
> +			u32 *page_table = phys_to_virt(pt_phys);
> +
> +			dma_unmap_single(vsi_domain->dma_dev, pt_phys,
> +					 SPAGE_SIZE, DMA_TO_DEVICE);
> +			iommu_free_pages(page_table);
> +		}
> +	}
> +
> +	dma_unmap_single(vsi_domain->dma_dev, vsi_domain->dt_dma,
> +			 SPAGE_SIZE, DMA_TO_DEVICE);
> +	iommu_free_pages(vsi_domain->dt);
> +
> +	dma_unmap_single(vsi_domain->dma_dev, vsi_domain->pta_dma,
> +			 SPAGE_SIZE, DMA_TO_DEVICE);
> +	iommu_free_pages(vsi_domain->pta);
> +
> +	kfree(vsi_domain);
> +}
> +
> +static struct iommu_device *vsi_iommu_probe_device(struct device *dev)
> +{
> +	struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
> +	struct device_link *link;
> +
> +	if (WARN_ON(!iommu))
> +		return ERR_PTR(-ENODEV);

Either don't have this check at all (since it's redundant if you're 
using fwspecs and of_xlate), or don't make it a WARN_ON (if you want the 
impression of supporting non-fwspec usage where probe_device is the one 
op where the core *does* give you "is this your client device?" calls).

> +	link = device_link_add(dev, iommu->dev,
> +			       DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
> +	if (!link)
> +		dev_err(dev, "Unable to link %s\n", dev_name(iommu->dev));
> +
> +	dev_iommu_priv_set(dev, iommu);
> +	return &iommu->iommu;
> +}
> +
> +static void vsi_iommu_release_device(struct device *dev)
> +{
> +	struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
> +
> +	device_link_remove(dev, iommu->dev);
> +}
> +
> +static int vsi_iommu_of_xlate(struct device *dev,
> +			      const struct of_phandle_args *args)
> +{
> +	return iommu_fwspec_add_ids(dev, args->args, 1);

What are you adding here? Per the DT binding there are no IDs, so 
args->args_count will be 0 and args->args will most likely be 
uninitialised stack.
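
If there is genuinely nothing to translate, a minimal of_xlate which
just leaves the fwspec ID-less should be enough - untested sketch:

static int vsi_iommu_of_xlate(struct device *dev,
			      const struct of_phandle_args *args)
{
	/* #iommu-cells = <0>, so there are no per-client IDs to record */
	return 0;
}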

> +}
> +
> +static struct iommu_ops vsi_iommu_ops = {

Const.

> +	.identity_domain = &vsi_identity_domain,
> +	.domain_alloc_paging = vsi_iommu_domain_alloc_paging,
> +	.probe_device = vsi_iommu_probe_device,
> +	.release_device = vsi_iommu_release_device,
> +	.device_group = generic_single_device_group,
> +	.of_xlate = vsi_iommu_of_xlate,
> +	.default_domain_ops = &(const struct iommu_domain_ops) {
> +		.attach_dev		= vsi_iommu_attach_device,
> +		.map_pages		= vsi_iommu_map,
> +		.unmap_pages		= vsi_iommu_unmap,
> +		.flush_iotlb_all	= vsi_iommu_flush_tlb_all,
> +		.iova_to_phys		= vsi_iommu_iova_to_phys,
> +		.free			= vsi_iommu_domain_free,
> +	}
> +};
> +
> +static const struct of_device_id vsi_iommu_dt_ids[] = {
> +	{
> +		.compatible = "verisilicon,iommu",
> +	},
> +	{
> +		.compatible = "rockchip,rk3588-iommu-1.2",

You can drop this - if the driver doesn't have any SoC-specific 
behaviour then we only need to match the generic compatible here. As 
long as the SoC-specific compatibles are in the binding, and thus in 
deployed DTBs, we can start making use of them in future as and when we 
have a reason to.
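
FWIW the match table would then simply be:

static const struct of_device_id vsi_iommu_dt_ids[] = {
	{ .compatible = "verisilicon,iommu" },
	{ /* sentinel */ }
};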

> +	},
> +	{ /* sentinel */ }
> +};
> +
> +static int vsi_iommu_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct vsi_iommu *iommu;
> +	struct resource *res;
> +	int num_res = pdev->num_resources;
> +	int err, i;
> +
> +	iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
> +	if (!iommu)
> +		return -ENOMEM;
> +
> +	iommu->dev = dev;
> +	iommu->num_mmu = 0;
> +
> +	iommu->bases = devm_kcalloc(dev, num_res, sizeof(*iommu->bases),
> +				    GFP_KERNEL);
> +	if (!iommu->bases)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < num_res; i++) {
> +		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
> +		if (!res)
> +			continue;
> +		iommu->bases[i] = devm_ioremap_resource(&pdev->dev, res);

Consider devm_platform_ioremap_resource().
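
Given the binding only describes a single "reg" region anyway, the whole
loop could arguably collapse to something like this (sketch, assuming a
single "regs" base in struct vsi_iommu):

	iommu->regs = devm_platform_ioremap_resource(pdev, 0);
	if (IS_ERR(iommu->regs))
		return PTR_ERR(iommu->regs);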

> +		if (IS_ERR(iommu->bases[i]))
> +			continue;
> +		iommu->num_mmu++;
> +	}
> +	if (iommu->num_mmu == 0)
> +		return PTR_ERR(iommu->bases[0]);
> +
> +	iommu->num_irq = platform_irq_count(pdev);
> +	if (iommu->num_irq < 0)
> +		return iommu->num_irq;
> +
> +	err = devm_clk_bulk_get_all(dev, &iommu->clocks);
> +	if (err >= 0)
> +		iommu->num_clocks = err;
> +	else if (err == -ENOENT)
> +		iommu->num_clocks = 0;
> +	else
> +		return err;
> +
> +	err = clk_bulk_prepare(iommu->num_clocks, iommu->clocks);
> +	if (err)
> +		return err;

I wonder if devm_clk_bulk_get_all_enabled() might help here, but if you 
do want to do any subsequent management then quite possibly it just 
shifts the complexity to making sure they're reenabled in all the paths 
where they can be released again :/
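
For the simplest case where the clocks just stay on for as long as the
driver is bound, it might end up as little as this (sketch, assuming
devm_clk_bulk_get_all_enabled() returns the number of clocks acquired
or a negative errno):

	err = devm_clk_bulk_get_all_enabled(dev, &iommu->clocks);
	if (err < 0)
		return err;
	iommu->num_clocks = err;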

> +
> +	for (i = 0; i < iommu->num_irq; i++) {
> +		int irq = platform_get_irq(pdev, i);

As with num_mmu, according to your binding num_irq must be exactly 1. Do 
you really need the pretence of supporting more or fewer?
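
If not, and assuming the clocks end up fully devm-managed as above, the
loop could arguably reduce to just:

	irq = platform_get_irq(pdev, 0);
	if (irq < 0)
		return irq;

	err = devm_request_irq(dev, irq, vsi_iommu_irq, IRQF_SHARED,
			       dev_name(dev), iommu);
	if (err)
		return err;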

> +
> +		if (irq < 0)
> +			return irq;
> +
> +		err = devm_request_irq(iommu->dev, irq, vsi_iommu_irq,
> +				       IRQF_SHARED, dev_name(dev), iommu);
> +		if (err)
> +			goto err_unprepare_clocks;
> +	}
> +
> +	dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
> +	platform_set_drvdata(pdev, iommu);
> +
> +	pm_runtime_enable(dev);
> +
> +	err = iommu_device_sysfs_add(&iommu->iommu, dev, NULL, dev_name(dev));
> +	if (err)
> +		goto err_runtime_disable;
> +
> +	err = iommu_device_register(&iommu->iommu, &vsi_iommu_ops, dev);
> +	if (err)
> +		goto err_remove_sysfs;
> +
> +	return 0;
> +
> +err_remove_sysfs:
> +	iommu_device_sysfs_remove(&iommu->iommu);
> +err_runtime_disable:
> +	pm_runtime_disable(dev);
> +err_unprepare_clocks:
> +	clk_bulk_unprepare(iommu->num_clocks, iommu->clocks);
> +	return err;
> +}
> +
> +static void vsi_iommu_shutdown(struct platform_device *pdev)
> +{
> +	struct vsi_iommu *iommu = platform_get_drvdata(pdev);
> +	int i;
> +
> +	for (i = 0; i < iommu->num_irq; i++) {
> +		int irq = platform_get_irq(pdev, i);
> +
> +		devm_free_irq(iommu->dev, irq, iommu);

Most devm_free calls are suspect in general, and this one is certainly 
no exception. Even if it is justifiable to suppress IRQs during shutdown, 
can you not simply disable interrupt generation at the IOMMU end, or at 
worst just do a disable_irq()? In the shutdown path we really don't want 
to be doing any more work than absolutely necessary.
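
Something along these lines would seem sufficient (untested sketch,
assuming the IRQ number is stashed in the vsi_iommu at probe time):

static void vsi_iommu_shutdown(struct platform_device *pdev)
{
	struct vsi_iommu *iommu = platform_get_drvdata(pdev);

	/* quiesce interrupts without tearing the handler down */
	disable_irq(iommu->irq);

	pm_runtime_force_suspend(&pdev->dev);
}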

> +	}
> +
> +	pm_runtime_force_suspend(&pdev->dev);
> +}
> +
> +static int __maybe_unused vsi_iommu_suspend(struct device *dev)
> +{
> +	struct vsi_iommu *iommu = dev_get_drvdata(dev);
> +
> +	if (iommu->domain == &vsi_identity_domain)
> +		return 0;
> +
> +	vsi_iommu_disable(iommu);

This seems similarly dubious - if suspend doesn't need to explicitly save 
any additional context for a subsequent resume then it shouldn't really 
do anything, certainly not change the state of IOMMU translation. At 
best that's a waste of time, at worst it risks unexpected faults or 
memory corruption (if a nominally-suspended client device is actually 
still running).

> +	return 0;
> +}
> +
> +static int __maybe_unused vsi_iommu_resume(struct device *dev)
> +{
> +	struct vsi_iommu *iommu = dev_get_drvdata(dev);
> +
> +	if (iommu->domain == &vsi_identity_domain)
> +		return 0;
> +
> +	return vsi_iommu_enable(iommu);
> +}
> +
> +static const struct dev_pm_ops vsi_iommu_pm_ops = {
> +	SET_RUNTIME_PM_OPS(vsi_iommu_suspend, vsi_iommu_resume, NULL)
> +	SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
> +				pm_runtime_force_resume)
> +};

Consider DEFINE_RUNTIME_DEV_PM_OPS().
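
i.e. something like:

static DEFINE_RUNTIME_DEV_PM_OPS(vsi_iommu_pm_ops,
				 vsi_iommu_suspend, vsi_iommu_resume, NULL);

with ".pm = pm_ptr(&vsi_iommu_pm_ops)" in the driver struct, which should
also let you drop the __maybe_unused annotations.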

> +static struct platform_driver rockchip_vsi_iommu_driver = {
> +	.probe = vsi_iommu_probe,
> +	.shutdown = vsi_iommu_shutdown,
> +	.driver = {
> +		   .name = "vsi_iommu",
> +		   .of_match_table = vsi_iommu_dt_ids,
> +		   .pm = &vsi_iommu_pm_ops,
> +		   .suppress_bind_attrs = true,
> +	},
> +};
> +builtin_platform_driver(rockchip_vsi_iommu_driver);

I guess that does technically work out when this is built as a module, 
as the device_initcall() gets redefined to module_init() and the lack of 
module_exit() just prevents removal (which in practice would be 
prevented by held references anyway)... But it still looks a bit odd.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20 19:37   ` Robin Murphy
@ 2025-06-20 20:45     ` Lucas Stach
  2025-06-23 12:05       ` Robin Murphy
  2025-06-23 14:03     ` Benjamin Gaignard
  1 sibling, 1 reply; 22+ messages in thread
From: Lucas Stach @ 2025-06-20 20:45 UTC (permalink / raw)
  To: Robin Murphy, Benjamin Gaignard, joro, will, robh, krzk+dt,
	conor+dt, heiko, nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel

Am Freitag, dem 20.06.2025 um 20:37 +0100 schrieb Robin Murphy:
> On 19/06/2025 2:12 pm, Benjamin Gaignard wrote:
> > The Verisilicon IOMMU hardware block can be found in combination
> > with Verisilicon hardware video codecs (encoders or decoders) on
> > different SoCs.
> > Enable it will allow us to use non contiguous memory allocators
> > for Verisilicon video codecs.
> > 
> > Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
> > ---
> 
[...]
> I'm especially curious what this "pta" really is - is the comment above 
> just wrong, and you've actually got a 3-level pagetable supporting 
> somewhere between 33 and 42 bits of VA? If not, then the additional 
> level of indirection would very much seem to imply some kind of 
> mechanism for accommodating multiple pagetables at once, and in that 
> case, is it like a PASID table where the client device gets to choose 
> which entry to use, or a StreamID table to disambiguate multiple client 
> devices? (Where in the latter case it should most likely belong to the 
> IOMMU rather than the domain, and you probably want nonzero #iommu-cells 
> in the DT binding for the client IDs).
> 
PTA is short for page table array and it's another level of indirection
to select the page tables to be used for the specific translation. On
the Vivante GPU, where this MMU IP originated, the GPU can select the
index into this array to be used for a specific command stream to
implement GPU client isolation, so it's much like a PASID table.

I have no idea if and how the integration with the video codecs can
select the PTA index.

Regards,
Lucas


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20 20:45     ` Lucas Stach
@ 2025-06-23 12:05       ` Robin Murphy
  0 siblings, 0 replies; 22+ messages in thread
From: Robin Murphy @ 2025-06-23 12:05 UTC (permalink / raw)
  To: Lucas Stach, Benjamin Gaignard, joro, will, robh, krzk+dt,
	conor+dt, heiko, nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel

On 2025-06-20 9:45 pm, Lucas Stach wrote:
> Am Freitag, dem 20.06.2025 um 20:37 +0100 schrieb Robin Murphy:
>> On 19/06/2025 2:12 pm, Benjamin Gaignard wrote:
>>> The Verisilicon IOMMU hardware block can be found in combination
>>> with Verisilicon hardware video codecs (encoders or decoders) on
>>> different SoCs.
>>> Enable it will allow us to use non contiguous memory allocators
>>> for Verisilicon video codecs.
>>>
>>> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
>>> ---
>>
> [...]
>> I'm especially curious what this "pta" really is - is the comment above
>> just wrong, and you've actually got a 3-level pagetable supporting
>> somewhere between 33 and 42 bits of VA? If not, then the additional
>> level of indirection would very much seem to imply some kind of
>> mechanism for accommodating multiple pagetables at once, and in that
>> case, is it like a PASID table where the client device gets to choose
>> which entry to use, or a StreamID table to disambiguate multiple client
>> devices? (Where in the latter case it should most likely belong to the
>> IOMMU rather than the domain, and you probably want nonzero #iommu-cells
>> in the DT binding for the client IDs).
>>
> PTA is short for page table array and it's another level of indirection
> to select the page tables to be used for the specific translation. On
> the Vivante GPU, where this MMU IP originated, the GPU can select the
> index into this array to be used for a specific command stream to
> implement GPU client isolation, so it's much like a PASID table.

Thanks for the clarification!

(Although, similarly to panfrost, does this mean we should at least 
break out an io-pgtable implementation to share between the two drivers 
rather than duplicate code between here and etnaviv_iommu?)

> I have no idea if and how the integration with the video codecs can
> select the PTA index.

Yeah, that's really the thing - it may smell like a PASID table for the 
GPU use-case, but AFAICS that still wouldn't necessarily rule it out 
from turning up in some codec block with, say, all the decode 
functionality hard-wired to index 0 and encode to index 1, then all of a 
sudden we start needing different driver behaviour and potentially a 
different DT binding. I guess the door is still open to support 
"#iommu-cells = 1" to specify explicit PTA indices without breaking the 
implicit "#iommu-cells = 0" behaviour, so I don't think we're painting 
ourselves into a corner at this point, it's more just something to be 
wary of for the future.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver
  2025-06-20 19:37   ` Robin Murphy
  2025-06-20 20:45     ` Lucas Stach
@ 2025-06-23 14:03     ` Benjamin Gaignard
  1 sibling, 0 replies; 22+ messages in thread
From: Benjamin Gaignard @ 2025-06-23 14:03 UTC (permalink / raw)
  To: Robin Murphy, joro, will, robh, krzk+dt, conor+dt, heiko,
	nicolas.dufresne, jgg
  Cc: iommu, devicetree, linux-kernel, linux-arm-kernel, linux-rockchip,
	kernel


Le 20/06/2025 à 21:37, Robin Murphy a écrit :
> On 19/06/2025 2:12 pm, Benjamin Gaignard wrote:
>> The Verisilicon IOMMU hardware block can be found in combination
>> with Verisilicon hardware video codecs (encoders or decoders) on
>> different SoCs.
>> Enable it will allow us to use non contiguous memory allocators
>> for Verisilicon video codecs.
>>
>> Signed-off-by: Benjamin Gaignard <benjamin.gaignard@collabora.com>
>> ---
>> changes in version 3:
>> - Change compatible to "rockchip,rk3588-iommu-1.2"
>> - Create an identity domain for the driver
>> - Fix double flush issue
>> - Rework attach/detach logic
>> - Simplify xlate function
>> - Discover iommu device like done in ARM driver
>> - Remove ARM_DMA_USE_IOMMU from Kconfig
>>
>>   drivers/iommu/Kconfig     |  11 +
>>   drivers/iommu/Makefile    |   1 +
>>   drivers/iommu/vsi-iommu.c | 874 ++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 886 insertions(+)
>>   create mode 100644 drivers/iommu/vsi-iommu.c
>>
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index 0a33d995d15d..3e95d1db737b 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -383,4 +383,15 @@ config SPRD_IOMMU
>>           Say Y here if you want to use the multimedia devices listed 
>> above.
>>   +config VSI_IOMMU
>> +    tristate "Verisilicon IOMMU Support"
>> +    depends on ARM64
>
> "(ARCH_ROCKCHIP && ARM64) || COMPILE_TEST", probably. Otherwise you 
> might risk annoying Geert :)
>
>> +    select IOMMU_API
>> +    help
>> +      Support for IOMMUs used by Verisilicon sub-systems like video
>> +      decoders or encoder hardware blocks.
>> +
>> +      Say Y here if you want to use this IOMMU in front of these
>> +      hardware blocks.
>> +
>>   endif # IOMMU_SUPPORT
>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>> index 355294fa9033..68aeff31af8b 100644
>> --- a/drivers/iommu/Makefile
>> +++ b/drivers/iommu/Makefile
>> @@ -34,3 +34,4 @@ obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o
>>   obj-$(CONFIG_IOMMU_IOPF) += io-pgfault.o
>>   obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
>>   obj-$(CONFIG_APPLE_DART) += apple-dart.o
>> +obj-$(CONFIG_VSI_IOMMU) += vsi-iommu.o
>> diff --git a/drivers/iommu/vsi-iommu.c b/drivers/iommu/vsi-iommu.c
>> new file mode 100644
>> index 000000000000..89e63a6a60c1
>> --- /dev/null
>> +++ b/drivers/iommu/vsi-iommu.c
>> @@ -0,0 +1,874 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright (C) 2025 Collabora Ltd.
>> + *
>> + * IOMMU API for Verisilicon
>> + *
>> + * Module Authors:    Yandong Lin <yandong.lin@rock-chips.com>
>> + *            Simon Xue <xxm@rock-chips.com>
>> + *            Benjamin Gaignard <benjamin.gaignard@collabora.com>
>> + */
>> +
>> +#include <linux/clk.h>
>> +#include <linux/compiler.h>
>> +#include <linux/delay.h>
>> +#include <linux/device.h>
>> +#include <linux/dma-mapping.h>
>> +#include <linux/dma-map-ops.h>
>
> This shouldn't be here, it's a device driver not a DMA API 
> implementation.
>
>> +#include <linux/errno.h>
>> +#include <linux/init.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/io.h>
>> +#include <linux/iommu.h>
>> +#include <linux/iopoll.h>
>> +#include <linux/list.h>
>> +#include <linux/mm.h>
>> +#include <linux/module.h>
>> +#include <linux/of.h>
>> +#include <linux/of_iommu.h>
>> +#include <linux/of_platform.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/slab.h>
>> +#include <linux/spinlock.h>
>> +
>> +#include "iommu-pages.h"
>> +
>> +struct vsi_iommu {
>> +    struct device *dev;
>> +    void __iomem **bases;
>> +    int num_mmu;
>
> What are these for, given the binding specifies a single "reg"?
>
> If you do anticipate supporting multiple MMUs per client device, I 
> would still strongly consider following the exynos-iommu approach of 
> linking distinct instances together internally - the "single instance 
> with multiple interfaces" model is a bit of a bodge, and has the big 
> weakness that you tend to end up forever having to evolve the 
> "clocks", "interrupts" etc. bindings in weird and arbitrary ways for 
> different integrations. Especially if this is a 3rd-party IP that's 
> liable to be used by multiple different SoC vendors.

No, I do not plan to support multiple MMUs per client device.
I will simplify it and use only void __iomem *regs instead.
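
Roughly, the instance data would then shrink to something like this
(not final):

struct vsi_iommu {
	struct device *dev;
	void __iomem *regs;
	struct clk_bulk_data *clocks;
	int num_clocks;
	int irq;
	struct iommu_device iommu;
	struct list_head node;		/* entry in vsi_iommu_domain.iommus */
	struct iommu_domain *domain;	/* domain to which iommu is attached */
};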

>
>> +    int num_irq;
>> +    struct clk_bulk_data *clocks;
>> +    int num_clocks;
>> +    struct iommu_device iommu;
>> +    struct list_head node; /* entry in vsi_iommu_domain.iommus */
>> +    struct iommu_domain *domain; /* domain to which iommu is 
>> attached */
>> +};
>> +
>> +struct vsi_iommu_domain {
>> +    struct device *dma_dev;
>> +    struct list_head iommus;
>> +    u32 *dt; /* page directory table */
>> +    dma_addr_t dt_dma;
>> +    spinlock_t iommus_lock; /* lock for iommus list */
>> +    spinlock_t dt_lock; /* lock for modifying page directory table */
>> +    struct iommu_domain domain;
>> +    /* for vsi iommu */
>
> Wait, so who is the rest of this driver-private structure for, if not 
> also for this driver? :/

I will remove this useless comment.

>
>> +    u64 *pta; /* page directory table */
>> +    dma_addr_t pta_dma;
>> +};
>> +
>> +static struct iommu_domain vsi_identity_domain;
>> +
>> +#define NUM_DT_ENTRIES    1024
>> +#define NUM_PT_ENTRIES    1024
>> +#define PT_SIZE        (NUM_PT_ENTRIES * sizeof(u32))
>> +
>> +#define SPAGE_SIZE    BIT(12)
>> +
>> +/* vsi iommu regs address */
>> +#define VSI_MMU_CONFIG1_BASE            0x1ac
>> +#define VSI_MMU_AHB_EXCEPTION_BASE        0x380
>> +#define VSI_MMU_AHB_CONTROL_BASE        0x388
>> +#define VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE    0x38C
>> +
>> +/* MMU register offsets */
>> +#define VSI_MMU_FLUSH_BASE        0x184
>> +#define VSI_MMU_BIT_FLUSH        BIT(4)
>> +
>> +#define VSI_MMU_PAGE_FAULT_ADDR        0x380
>> +#define VSI_MMU_STATUS_BASE        0x384    /* IRQ status */
>> +
>> +#define VSI_MMU_BIT_ENABLE        BIT(0)
>> +
>> +#define VSI_MMU_OUT_OF_BOUND        BIT(28)
>> +/* Irq mask */
>> +#define VSI_MMU_IRQ_MASK        0x7
>> +
>> +#define VSI_DTE_PT_ADDRESS_MASK        0xffffffc0
>
> I'm curious what those lowest 6 bits are for - does it really only 
> require 64-byte alignment for pagetables, or can it actually 
> accommodate a folded >32-bit address similar to the PTE level?

I do not have this information because the hardware isn't documented.
The only hints are in the original driver here:
https://github.com/rockchip-linux/kernel/blob/develop-6.1/drivers/iommu/rockchip-iommu-av1d.c
If someone from rockchip or verisilicon read this thread, you are welcome
to comment it and provide us more info :-)

>
>> +#define VSI_DTE_PT_VALID        BIT(0)
>> +
>> +#define VSI_PAGE_DESC_LO_MASK        0xfffff000
>> +#define VSI_PAGE_DESC_HI_MASK        GENMASK_ULL(39, 32)
>> +#define VSI_PAGE_DESC_HI_SHIFT        (32 - 4)
>> +
>> +static inline phys_addr_t vsi_dte_pt_address(u32 dte)
>> +{
>> +    return (phys_addr_t)dte & VSI_DTE_PT_ADDRESS_MASK;
>> +}
>> +
>> +static inline u32 vsi_mk_dte(u32 dte)
>> +{
>> +    return (phys_addr_t)dte | VSI_DTE_PT_VALID;
>> +}
>> +
>> +#define VSI_PTE_PAGE_ADDRESS_MASK    0xfffffff0
>> +#define VSI_PTE_PAGE_WRITABLE        BIT(2)
>
> Any idea if there are other useful permission bits?

No (same reason as above)

>
>> +#define VSI_PTE_PAGE_VALID        BIT(0)
>> +
>> +static inline phys_addr_t vsi_pte_page_address(u32 pte)
>> +{
>> +    u64 pte_vsi = pte;
>> +
>> +    pte_vsi = ((pte_vsi & VSI_PAGE_DESC_HI_MASK) << 
>> VSI_PAGE_DESC_HI_SHIFT) |
>
> "(pte_vsi & VSI_PAGE_DESC_HI_MASK)" will by definition be 0.

I will remove it.

>
>> +          (pte_vsi & VSI_PAGE_DESC_LO_MASK);
>> +
>> +    return (phys_addr_t)pte_vsi;
>> +}
>> +
>> +static u32 vsi_mk_pte(phys_addr_t page, int prot)
>> +{
>> +    u32 flags = 0;
>> +
>> +    flags |= (prot & IOMMU_WRITE) ? VSI_PTE_PAGE_WRITABLE : 0;
>> +    page = (page & VSI_PAGE_DESC_LO_MASK) |
>> +           ((page & VSI_PAGE_DESC_HI_MASK) >> VSI_PAGE_DESC_HI_SHIFT);
>> +    page &= VSI_PTE_PAGE_ADDRESS_MASK;
>
> If VSI_PAGE_DESC_LO_MASK and VSI_PAGE_DESC_HI_MASK are correct to 
> start with then VSI_PTE_PAGE_ADDRESS_MASK serves no purpose.

You are right I will remove it.

>
>> +    return page | flags | VSI_PTE_PAGE_VALID;
>> +}
>> +
>> +#define VSI_DTE_PT_VALID    BIT(0)
>> +
>> +static inline bool vsi_dte_is_pt_valid(u32 dte)
>> +{
>> +    return dte & VSI_DTE_PT_VALID;
>> +}
>> +
>> +static inline bool vsi_pte_is_page_valid(u32 pte)
>> +{
>> +    return pte & VSI_PTE_PAGE_VALID;
>> +}
>> +
>> +static u32 vsi_mk_pte_invalid(u32 pte)
>> +{
>> +    return pte & ~VSI_PTE_PAGE_VALID;
>> +}
>> +
>> +#define VSI_MASTER_TLB_MASK    GENMASK_ULL(31, 10)
>
> I note that this ends up being associated with the 
> VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE register - is the "TLB"/"TBL" 
> difference a typo one way or the other, or definitely intentional?
>
> (For all I know maybe it really could be a table of translation 
> lookaside buffers?)

It is a typo.

>
>> +/* mode 0 : 4k */
>> +#define VSI_PTA_4K_MODE    0
>> +
>> +static u64 vsi_mk_pta(dma_addr_t dt_dma)
>> +{
>> +    u64 val = (dt_dma & VSI_MASTER_TLB_MASK) | VSI_PTA_4K_MODE;
>> +
>> +    return val;
>> +}
>> +
>> +static struct vsi_iommu_domain *to_vsi_domain(struct iommu_domain *dom)
>> +{
>> +    return container_of(dom, struct vsi_iommu_domain, domain);
>> +}
>> +
>> +static void vsi_iommu_disable(struct vsi_iommu *iommu)
>> +{
>> +    int i;
>> +
>> +    /* Ignore error while disabling, just keep going */
>
> FWIW I thought that comment was wrong at first, because we're clearly 
> not disabling the clocks...

The way it has been designed in the original driver is too complex.
I will come back to something simpler (rough sketch below):
- enable/disable the clocks in the pm_runtime callbacks
- call pm_runtime_resume_and_get(), pm_runtime_mark_last_busy() and pm_runtime_put_autosuspend() when needed.
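
i.e. roughly (untested):

static int vsi_iommu_runtime_suspend(struct device *dev)
{
	struct vsi_iommu *iommu = dev_get_drvdata(dev);

	clk_bulk_disable_unprepare(iommu->num_clocks, iommu->clocks);
	return 0;
}

static int vsi_iommu_runtime_resume(struct device *dev)
{
	struct vsi_iommu *iommu = dev_get_drvdata(dev);

	return clk_bulk_prepare_enable(iommu->num_clocks, iommu->clocks);
}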

>
>> + WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks));
>> +    for (i = 0; i < iommu->num_mmu; i++)
>> +        writel(0, iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
>
> However, even if the IOMMU itself is going away, is it really safe to 
> ignore an error if it means we could risk hanging on an unclocked 
> register access here?
>
>> + clk_bulk_disable(iommu->num_clocks, iommu->clocks);
>> +}
>> +
>> +static int vsi_iommu_enable(struct vsi_iommu *iommu)
>> +{
>> +    struct iommu_domain *domain = iommu->domain;
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    int ret, i;
>> +
>> +    ret = clk_bulk_enable(iommu->num_clocks, iommu->clocks);
>> +    if (ret)
>> +        return ret;
>> +
>> +    for (i = 0; i < iommu->num_mmu; i++) {
>> +        u32 val = readl(iommu->bases[i] + VSI_MMU_AHB_CONTROL_BASE);
>> +
>> +        if (!(val & VSI_MMU_BIT_ENABLE)) {
>
> If the hardware happens to be enabled already, can you be sure it's 
> enabled *with the expected configuration*?

I will remove this check.

>
>> + writel(vsi_domain->pta_dma,
>> +                   iommu->bases[i] + 
>> VSI_MMU_AHB_TBL_ARRAY_BASE_L_BASE);
>> +            writel(VSI_MMU_OUT_OF_BOUND, iommu->bases[i] + 
>> VSI_MMU_CONFIG1_BASE);
>> +            writel(VSI_MMU_BIT_ENABLE, iommu->bases[i] + 
>> VSI_MMU_AHB_EXCEPTION_BASE);
>> +            writel(VSI_MMU_BIT_ENABLE, iommu->bases[i] + 
>> VSI_MMU_AHB_CONTROL_BASE);
>> +        }
>> +    }
>> +    clk_bulk_disable(iommu->num_clocks, iommu->clocks);
>> +
>> +    return ret;
>> +}
>> +
>> +static inline void vsi_table_flush(struct vsi_iommu_domain 
>> *vsi_domain, dma_addr_t dma,
>> +                   unsigned int count)
>> +{
>> +    size_t size = count * sizeof(u32); /* count of u32 entry */
>> +
>> +    dma_sync_single_for_device(vsi_domain->dma_dev, dma, size, 
>> DMA_TO_DEVICE);
>> +}
>> +
>> +#define VSI_IOVA_DTE_MASK    0xffc00000
>> +#define VSI_IOVA_DTE_SHIFT    22
>> +#define VSI_IOVA_PTE_MASK    0x003ff000
>> +#define VSI_IOVA_PTE_SHIFT    12
>> +#define VSI_IOVA_PAGE_MASK    0x00000fff
>> +#define VSI_IOVA_PAGE_SHIFT    0
>> +
>> +static u32 vsi_iova_dte_index(dma_addr_t iova)
>> +{
>> +    return (u32)(iova & VSI_IOVA_DTE_MASK) >> VSI_IOVA_DTE_SHIFT;
>
> Are these u32 casts really necessary? At worst, why not just make the 
> "iova" argument u32 to begin with?

I will do that.

>
>> +}
>> +
>> +static u32 vsi_iova_pte_index(dma_addr_t iova)
>> +{
>> +    return (u32)(iova & VSI_IOVA_PTE_MASK) >> VSI_IOVA_PTE_SHIFT;
>> +}
>> +
>> +static u32 vsi_iova_page_offset(dma_addr_t iova)
>> +{
>> +    return (u32)(iova & VSI_IOVA_PAGE_MASK) >> VSI_IOVA_PAGE_SHIFT;
>> +}
>> +
>> +static u32 vsi_iommu_read(void __iomem *base, u32 offset)
>> +{
>> +    return readl(base + offset);
>> +}
>> +
>> +static void vsi_iommu_write(void __iomem *base, u32 offset, u32 value)
>> +{
>> +    writel(value, base + offset);
>> +}
>
> Ditch these read/write wrappers, they're used all of once, and they're 
> still longer than just writing out the straightforward readl/writel 
> directly. Abstracting a structure member dereference is one thing, but 
> abstracting a single addition is entirely unnecessary.

Sure

>
>> +static void vsi_iommu_flush_tlb_all(struct iommu_domain *domain)
>> +{
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    struct list_head *pos;
>> +    unsigned long flags;
>> +    int i;
>> +
>> +    spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
>> +    list_for_each(pos, &vsi_domain->iommus) {
>> +        struct vsi_iommu *iommu;
>> +        int ret;
>> +
>> +        iommu = list_entry(pos, struct vsi_iommu, node);
>> +        ret = pm_runtime_get_if_in_use(iommu->dev);
>> +        if (WARN_ON_ONCE(ret < 0))
>> +            continue;
>> +        if (ret) {
>> +            WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks));
>
> Again, would it really be OK to go ahead and access a potentially 
> unclocked register?
>
> TBH I'd drop all this busywork with the clocks altogether, and just 
> enable/disable them in the runtime PM callbacks. I can't imagine the 
> IOMMU would actually work very well with its core clock stopped anyway 
> - presumably you're only getting away with it in this case because the 
> clocks are shared with the codec so that's keeping them enabled while 
> there is traffic for the IOMMU to translate. If someone really really 
> really wants to gate the interface clock between register accesses on 
> some other platform where it would have an effect, they can always 
> come back and add that later.
>
>> +            for (i = 0; i < iommu->num_mmu; i++) {
>> +                writel(VSI_MMU_BIT_FLUSH,
>> +                       iommu->bases[i] + VSI_MMU_FLUSH_BASE);
>> +                writel(0, iommu->bases[i] + VSI_MMU_FLUSH_BASE);
>
> That's curious - you set the bit, then explicitly clear it again 
> immediately, but don't have to wait for any kind of completion status?
>
>> +            }
>> +            clk_bulk_disable(iommu->num_clocks, iommu->clocks);
>> +            pm_runtime_put(iommu->dev);
>> +        }
>> +    }
>> +    spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
>> +}
>> +
>> +static irqreturn_t vsi_iommu_irq(int irq, void *dev_id)
>> +{
>> +    struct vsi_iommu *iommu = dev_id;
>> +    u32 int_status;
>> +    dma_addr_t iova;
>> +    irqreturn_t ret = IRQ_NONE;
>> +    int i, err;
>> +
>> +    err = pm_runtime_get_if_in_use(iommu->dev);
>> +    if (!err || WARN_ON_ONCE(err < 0))
>> +        return ret;
>> +
>> +    if (WARN_ON(clk_bulk_enable(iommu->num_clocks, iommu->clocks)))
>> +        goto out;
>> +
>> +    for (i = 0; i < iommu->num_mmu; i++) {
>> +        int_status = vsi_iommu_read(iommu->bases[i], 
>> VSI_MMU_STATUS_BASE);
>> +        if (int_status & VSI_MMU_IRQ_MASK) {
>> +            dev_err(iommu->dev, "unexpected int_status=%08x\n", 
>> int_status);
>> +            iova = vsi_iommu_read(iommu->bases[i], 
>> VSI_MMU_PAGE_FAULT_ADDR);
>> +
>> +            if (iommu->domain)
>
> The current domain should never be NULL. You should default to either 
> a blocking or bypass domain (depending on the hardware behaviour), and 
> initialise that before you ever even request the IRQ.

Jason also reported this; I will fix it in v4.

>
>> + report_iommu_fault(iommu->domain, iommu->dev, iova, int_status);
>> +            else
>> +                dev_err(iommu->dev,
>> +                    "Page fault while iommu not attached to 
>> domain?\n");
>> +        }
>> +        vsi_iommu_write(iommu->bases[i], VSI_MMU_STATUS_BASE, 0);
>> +        ret = IRQ_HANDLED;
>> +    }
>> +
>> +    clk_bulk_disable(iommu->num_clocks, iommu->clocks);
>> +
>> +out:
>> +    pm_runtime_put(iommu->dev);
>> +    return ret;
>> +}
>> +
>> +static struct vsi_iommu *vsi_iommu_get_from_dev(struct device *dev)
>> +{
>> +    struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
>> +    struct device *iommu_dev = 
>> bus_find_device_by_fwnode(&platform_bus_type,
>> +                                 fwspec->iommu_fwnode);
>> +
>> +    put_device(iommu_dev);
>> +
>> +    return iommu_dev ? dev_get_drvdata(iommu_dev) : NULL;
>> +}
>> +
>> +static struct iommu_domain *vsi_iommu_domain_alloc_paging(struct 
>> device *dev)
>> +{
>> +    struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
>
> Use dev_iommu_priv_get() - we get here via a dev_iommu_ops() lookup, 
> so dev is already guaranteed to be one of your clients.

ok

>
>> +    struct vsi_iommu_domain *vsi_domain;
>> +
>> +    vsi_domain = kzalloc(sizeof(*vsi_domain), GFP_KERNEL);
>> +    if (!vsi_domain)
>> +        return NULL;
>> +
>> +    vsi_domain->dma_dev = iommu->dev;
>> +    iommu->domain = &vsi_identity_domain;
>
> Nope, that should only happen when the domain is actually attached.

I will remove it in v4.

>
>> +    /*
>> +     * iommu use a 2 level pagetable.
>> +     * Each level1 (dt) and level2 (pt) table has 1024 4-byte entries.
>> +     * Allocate one 4 KiB page for each table.
>> +     */
>> +    vsi_domain->dt = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
>> +                          SPAGE_SIZE);
>> +    if (!vsi_domain->dt)
>> +        goto err_free_domain;
>> +
>> +    vsi_domain->dt_dma = dma_map_single(vsi_domain->dma_dev, 
>> vsi_domain->dt,
>> +                        SPAGE_SIZE, DMA_TO_DEVICE);
>> +    if (dma_mapping_error(vsi_domain->dma_dev, vsi_domain->dt_dma)) {
>> +        dev_err(vsi_domain->dma_dev, "DMA map error for DT\n");
>> +        goto err_free_dt;
>> +    }
>> +
>> +    vsi_domain->pta = iommu_alloc_pages_sz(GFP_KERNEL | GFP_DMA32,
>> +                           SPAGE_SIZE);
>> +    if (!vsi_domain->pta)
>> +        goto err_unmap_dt;
>> +
>> +    vsi_domain->pta[0] = vsi_mk_pta(vsi_domain->dt_dma);
>> +    vsi_domain->pta_dma = dma_map_single(vsi_domain->dma_dev, 
>> vsi_domain->pta,
>> +                         SPAGE_SIZE, DMA_TO_DEVICE);
>> +    if (dma_mapping_error(vsi_domain->dma_dev, vsi_domain->pta_dma)) {
>> +        dev_err(vsi_domain->dma_dev, "DMA map error for PTA\n");
>> +        goto err_free_pta;
>> +    }
>
> I'm especially curious what this "pta" really is - is the comment 
> above just wrong, and you've actually got a 3-level pagetable 
> supporting somewhere between 33 and 42 bits of VA? If not, then the 
> additional level of indirection would very much seem to imply some 
> kind of mechanism for accommodating multiple pagetables at once, and 
> in that case, is it like a PASID table where the client device gets to 
> choose which entry to use, or a StreamID table to disambiguate 
> multiple client devices? (Where in the latter case it should most 
> likely belong to the IOMMU rather than the domain, and you probably 
> want nonzero #iommu-cells in the DT binding for the client IDs).
>
>> + spin_lock_init(&vsi_domain->iommus_lock);
>> +    spin_lock_init(&vsi_domain->dt_lock);
>> +    INIT_LIST_HEAD(&vsi_domain->iommus);
>> +
>> +    vsi_domain->domain.geometry.aperture_start = 0;
>> +    vsi_domain->domain.geometry.aperture_end   = DMA_BIT_MASK(32);
>> +    vsi_domain->domain.geometry.force_aperture = true;
>> +    vsi_domain->domain.pgsize_bitmap = SZ_4K;
>> +
>> +    return &vsi_domain->domain;
>> +
>> +err_free_pta:
>> +    iommu_free_pages(vsi_domain->pta);
>> +err_unmap_dt:
>> +    dma_unmap_single(vsi_domain->dma_dev, vsi_domain->dt_dma,
>> +             SPAGE_SIZE, DMA_TO_DEVICE);
>> +err_free_dt:
>> +    iommu_free_pages(vsi_domain->dt);
>> +err_free_domain:
>> +    kfree(vsi_domain);
>> +
>> +    return NULL;
>> +}
>> +
>> +static phys_addr_t vsi_iommu_iova_to_phys(struct iommu_domain *domain,
>> +                      dma_addr_t iova)
>> +{
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    unsigned long flags;
>> +    phys_addr_t pt_phys, phys = 0;
>> +    u32 dte, pte;
>> +    u32 *page_table;
>> +
>> +    spin_lock_irqsave(&vsi_domain->dt_lock, flags);
>> +
>> +    dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
>> +    if (!vsi_dte_is_pt_valid(dte))
>> +        goto out;
>> +
>> +    pt_phys = vsi_dte_pt_address(dte);
>> +    page_table = (u32 *)phys_to_virt(pt_phys);
>> +    pte = page_table[vsi_iova_pte_index(iova)];
>> +    if (!vsi_pte_is_page_valid(pte))
>> +        goto out;
>> +
>> +    phys = vsi_pte_page_address(pte) + vsi_iova_page_offset(iova);
>> +out:
>> +    spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
>> +
>> +    return phys;
>> +}
>> +
>> +static u32 *vsi_dte_get_page_table(struct vsi_iommu_domain 
>> *vsi_domain, dma_addr_t iova)
>> +{
>> +    u32 *page_table, *dte_addr;
>> +    u32 dte_index, dte;
>> +    phys_addr_t pt_phys;
>> +    dma_addr_t pt_dma;
>> +
>> +    assert_spin_locked(&vsi_domain->dt_lock);
>> +
>> +    dte_index = vsi_iova_dte_index(iova);
>> +    dte_addr = &vsi_domain->dt[dte_index];
>> +    dte = *dte_addr;
>> +    if (vsi_dte_is_pt_valid(dte))
>> +        goto done;
>> +
>> +    page_table = (u32 *)iommu_alloc_pages_sz(GFP_ATOMIC | GFP_DMA32, 
>> SPAGE_SIZE);
>> +    if (!page_table)
>> +        return ERR_PTR(-ENOMEM);
>> +
>> +    dte = vsi_mk_dte(virt_to_phys(page_table));
>
> No no no no no. You're already generating pt_dma the correct way just 
> below, use it! By all means *check* what dma_map_single() returns to 
> make sure you're not getting any unexpected 
> translation/bounce-buffering if you want to be able to rely on using 
> phys_to_virt() as a shortcut later, but never blindly assume that 
> virt_to_phys() is appropriate for DMA (for instance, what if 
> CONFIG_ZONE_DMA32 isn't enabled so page_table ended up at a >32-bit 
> address?)

Thanks, I will change that.
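
i.e. map the new table first and build the DTE from the returned DMA
address, roughly:

	pt_dma = dma_map_single(vsi_domain->dma_dev, page_table,
				SPAGE_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(vsi_domain->dma_dev, pt_dma)) {
		iommu_free_pages(page_table);
		return ERR_PTR(-ENOMEM);
	}

	/* pt_dma fits in 32 bits thanks to the 32-bit DMA mask */
	dte = vsi_mk_dte(pt_dma);
	*dte_addr = dte;
	vsi_table_flush(vsi_domain,
			vsi_domain->dt_dma + dte_index * sizeof(u32), 1);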

>
>> +    *dte_addr = dte;
>> +
>> +    pt_dma = dma_map_single(vsi_domain->dma_dev, page_table, 
>> SPAGE_SIZE, DMA_TO_DEVICE);
>> +    if (dma_mapping_error(vsi_domain->dma_dev, pt_dma)) {
>> +        dev_err(vsi_domain->dma_dev, "DMA mapping error while 
>> allocating page table\n");
>> +        iommu_free_pages(page_table);
>> +        return ERR_PTR(-ENOMEM);
>> +    }
>> +
>> +    vsi_table_flush(vsi_domain,
>> +            vsi_domain->dt_dma + dte_index * sizeof(u32), 1);
>
> The order seems a bit jumbled up here as well - it would be safest to 
> get everything done with page_table itself first, *then* worry about 
> updating the DTE to point to it.
>
>> +done:
>> +    pt_phys = vsi_dte_pt_address(dte);
>> +    return (u32 *)phys_to_virt(pt_phys);
>> +}
>> +
>> +static size_t vsi_iommu_unmap_iova(struct vsi_iommu_domain *vsi_domain,
>> +                   u32 *pte_addr, dma_addr_t pte_dma,
>> +                   size_t size)
>> +{
>> +    unsigned int pte_count;
>> +    unsigned int pte_total = size / SPAGE_SIZE;
>> +
>> +    assert_spin_locked(&vsi_domain->dt_lock);
>> +
>> +    for (pte_count = 0; pte_count < pte_total; pte_count++) {
>> +        u32 pte = pte_addr[pte_count];
>
> What prevents this running off the end of the pagetable page? AFAICS 
> you're not capping "size" to DTE boundaries in the main callback either.

I will limit it by using NUM_PT_ENTRIES.
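
Probably by clamping the size in the map/unmap callbacks before walking
the PTEs, something like:

	/* never cross a DTE (4 MiB) boundary in a single call */
	size = min_t(size_t, size,
		     (NUM_PT_ENTRIES - vsi_iova_pte_index(iova)) * SPAGE_SIZE);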

>
>> +
>> +        if (!vsi_pte_is_page_valid(pte))
>> +            break;
>> +
>> +        pte_addr[pte_count] = vsi_mk_pte_invalid(pte);
>> +    }
>> +
>> +    vsi_table_flush(vsi_domain, pte_dma, pte_count);
>> +
>> +    return pte_count * SPAGE_SIZE;
>> +}
>> +
>> +static int vsi_iommu_map_iova(struct vsi_iommu_domain *vsi_domain, 
>> u32 *pte_addr,
>> +                  dma_addr_t pte_dma, dma_addr_t iova,
>> +                  phys_addr_t paddr, size_t size, int prot)
>> +{
>> +    unsigned int pte_count;
>> +    unsigned int pte_total = size / SPAGE_SIZE;
>> +
>> +    assert_spin_locked(&vsi_domain->dt_lock);
>> +
>> +    for (pte_count = 0; pte_count < pte_total; pte_count++) {
>> +        u32 pte = pte_addr[pte_count];
>> +
>> +        if (vsi_pte_is_page_valid(pte))
>> +            goto unwind;
>> +
>> +        pte_addr[pte_count] = vsi_mk_pte(paddr, prot);
>> +
>> +        paddr += SPAGE_SIZE;
>> +    }
>> +
>> +    vsi_table_flush(vsi_domain, pte_dma, pte_total);
>> +
>> +    return 0;
>> +unwind:
>> +    /* Unmap the range of iovas that we just mapped */
>> +    vsi_iommu_unmap_iova(vsi_domain, pte_addr, pte_dma,
>> +                 pte_count * SPAGE_SIZE);
>
> If you failed to map anything, return an error; otherwise, just return 
> however much you did map successfully. The IOMMU core will take care 
> of the rest.
ok
>
>> +
>> +    return -EADDRINUSE;
>> +}
>> +
>> +static size_t vsi_iommu_unmap(struct iommu_domain *domain, unsigned 
>> long _iova,
>> +                  size_t size, size_t count, struct 
>> iommu_iotlb_gather *gather)
>> +{
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    unsigned long flags;
>> +    dma_addr_t pte_dma, iova = (dma_addr_t)_iova;
>> +    phys_addr_t pt_phys;
>> +    u32 dte;
>> +    u32 *pte_addr;
>> +    size_t unmap_size;
>> +
>> +    spin_lock_irqsave(&vsi_domain->dt_lock, flags);
>> +
>> +    dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
>> +    /* Just return 0 if iova is unmapped */
>> +    if (!vsi_dte_is_pt_valid(dte)) {
>> +        spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
>> +        return 0;
>> +    }
>> +
>> +    pt_phys = vsi_dte_pt_address(dte);
>> +    pte_addr = (u32 *)phys_to_virt(pt_phys) + vsi_iova_pte_index(iova);
>> +    pte_dma = pt_phys + vsi_iova_pte_index(iova) * sizeof(u32);
>> +    unmap_size = vsi_iommu_unmap_iova(vsi_domain, pte_addr, pte_dma, 
>> size);
>> +
>> +    spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
>> +
>> +    return unmap_size;
>> +}
>> +
>> +static int vsi_iommu_map(struct iommu_domain *domain, unsigned long 
>> _iova,
>> +             phys_addr_t paddr, size_t size, size_t count,
>> +             int prot, gfp_t gfp, size_t *mapped)
>> +{
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    unsigned long flags;
>> +    dma_addr_t pte_dma, iova = (dma_addr_t)_iova;
>> +    u32 *page_table, *pte_addr;
>> +    u32 dte, pte_index;
>> +    int ret;
>> +
>> +    /*
>> +     * IOMMU drivers are not supposed to lock the page table, 
>> however this
>> +     * driver does not safely handle the cache flushing or table
>> +     * installation across concurrent threads so locking is used as 
>> a simple
>> +     * solution.
>> +     */
>
> No need for that comment - it's perfectly fine for IOMMU drivers to 
> serialise pagetable accesses if they want to. Of course if they're the 
> kind of IOMMU that will find itself in a big server system with 
> hundreds on CPUs mapping and unmapping tens of thousands of network 
> packets per second in parallel, then for sure it's inadvisable from a 
> performance perspective, but for a little embedded IOMMU that's only 
> going to be handling relatively long-lived media buffers there is 
> absolutely nothing wrong with simple and straightforward. In fact if 
> you had tried to do clever lock-free stuff here, I would definitely be 
> asking "do you really need this?" :)
>
>> + spin_lock_irqsave(&vsi_domain->dt_lock, flags);
>> +
>> +    page_table = vsi_dte_get_page_table(vsi_domain, iova);
>> +    if (IS_ERR(page_table)) {
>> +        spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
>> +        return PTR_ERR(page_table);
>> +    }
>> +
>> +    dte = vsi_domain->dt[vsi_iova_dte_index(iova)];
>> +    pte_index = vsi_iova_pte_index(iova);
>> +    pte_addr = &page_table[pte_index];
>> +    pte_dma = vsi_dte_pt_address(dte) + pte_index * sizeof(u32);
>> +    ret = vsi_iommu_map_iova(vsi_domain, pte_addr, pte_dma, iova,
>> +                 paddr, size, prot);
>> +
>> +    spin_unlock_irqrestore(&vsi_domain->dt_lock, flags);
>> +    if (!ret)
>> +        *mapped = size;
>> +
>> +    return ret;
>> +}
>> +
>> +static int vsi_iommu_identity_attach(struct iommu_domain *domain,
>> +                     struct device *dev)
>> +{
>> +    struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    unsigned long flags;
>> +    int ret;
>> +
>> +    if (WARN_ON(!iommu))
>> +        return -ENODEV;
>
> That can never happen. The domain is already validated against the 
> device ops in __iommu_attach_group().
>
>> +
>> +    if (iommu->domain == domain)
>> +        return 0;
>> +
>> +    iommu->domain = domain;
>> +
>> +    spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
>
> Erm, what lock? vsi_identity_domain is a plain struct iommu_domain, so 
> this vsi_domain pointer has container_of()ed out into other adjacent 
> static data... :O

I will simplify the locking scheme and use a lock declared inside the vsi_iommu structure.

>
>> +    list_del_init(&iommu->node);
>> +    spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
>> +
>> +    ret = pm_runtime_get_if_in_use(iommu->dev);
>> +    WARN_ON_ONCE(ret < 0);
>> +    if (ret > 0) {
>> +        vsi_iommu_disable(iommu);
>> +        pm_runtime_put(iommu->dev);
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static struct iommu_domain_ops vsi_identity_ops = {
>
> Const.
>
>> +    .attach_dev = vsi_iommu_identity_attach,
>> +};
>> +
>> +static struct iommu_domain vsi_identity_domain = {
>> +    .type = IOMMU_DOMAIN_IDENTITY,
>> +    .ops = &vsi_identity_ops,
>> +};
>> +
>> +static int vsi_iommu_attach_device(struct iommu_domain *domain,
>> +                   struct device *dev)
>> +{
>> +    struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    unsigned long flags;
>> +    int ret;
>> +
>> +    if (WARN_ON(!iommu))
>> +        return -ENODEV;
>
> Similarly impossible.
>
>> +
>> +    /* iommu already attached */
>> +    if (iommu->domain == domain)
>> +        return 0;
>> +
>> +    ret = vsi_iommu_identity_attach(&vsi_identity_domain, dev);
>> +    if (ret)
>> +        return ret;
>> +
>> +    iommu->domain = domain;
>> +
>> +    spin_lock_irqsave(&vsi_domain->iommus_lock, flags);
>> +    list_add_tail(&iommu->node, &vsi_domain->iommus);
>> +    spin_unlock_irqrestore(&vsi_domain->iommus_lock, flags);
>> +
>> +    ret = pm_runtime_get_if_in_use(iommu->dev);
>> +    if (!ret || WARN_ON_ONCE(ret < 0))
>> +        return 0;
>> +
>> +    ret = vsi_iommu_enable(iommu);
>> +    if (ret)
>> + WARN_ON(vsi_iommu_identity_attach(&vsi_identity_domain, dev));
>> +
>> +    pm_runtime_put(iommu->dev);
>> +
>> +    return ret;
>> +}
>> +
>> +static void vsi_iommu_domain_free(struct iommu_domain *domain)
>> +{
>> +    struct vsi_iommu_domain *vsi_domain = to_vsi_domain(domain);
>> +    int i;
>> +
>> +    WARN_ON(!list_empty(&vsi_domain->iommus));
>> +
>> +    for (i = 0; i < NUM_DT_ENTRIES; i++) {
>> +        u32 dte = vsi_domain->dt[i];
>> +
>> +        if (vsi_dte_is_pt_valid(dte)) {
>> +            phys_addr_t pt_phys = vsi_dte_pt_address(dte);
>> +            u32 *page_table = phys_to_virt(pt_phys);
>> +
>> +            dma_unmap_single(vsi_domain->dma_dev, pt_phys,
>> +                     SPAGE_SIZE, DMA_TO_DEVICE);
>> +            iommu_free_pages(page_table);
>> +        }
>> +    }
>> +
>> +    dma_unmap_single(vsi_domain->dma_dev, vsi_domain->dt_dma,
>> +             SPAGE_SIZE, DMA_TO_DEVICE);
>> +    iommu_free_pages(vsi_domain->dt);
>> +
>> +    dma_unmap_single(vsi_domain->dma_dev, vsi_domain->pta_dma,
>> +             SPAGE_SIZE, DMA_TO_DEVICE);
>> +    iommu_free_pages(vsi_domain->pta);
>> +
>> +    kfree(vsi_domain);
>> +}
>> +
>> +static struct iommu_device *vsi_iommu_probe_device(struct device *dev)
>> +{
>> +    struct vsi_iommu *iommu = vsi_iommu_get_from_dev(dev);
>> +    struct device_link *link;
>> +
>> +    if (WARN_ON(!iommu))
>> +        return ERR_PTR(-ENODEV);
>
> Either don't have this check at all (since it's redundant if you're 
> using fwspecs and of_xlate), or don't make it a WARN_ON (if you want 
> the impression of supporting non-fwspec usage where probe_device is 
> the one op where the core *does* give you "is this your client 
> device?" calls).
>
>> +    link = device_link_add(dev, iommu->dev,
>> +                   DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
>> +    if (!link)
>> +        dev_err(dev, "Unable to link %s\n", dev_name(iommu->dev));
>> +
>> +    dev_iommu_priv_set(dev, iommu);
>> +    return &iommu->iommu;
>> +}
>> +
>> +static void vsi_iommu_release_device(struct device *dev)
>> +{
>> +    struct vsi_iommu *iommu = dev_iommu_priv_get(dev);
>> +
>> +    device_link_remove(dev, iommu->dev);
>> +}
>> +
>> +static int vsi_iommu_of_xlate(struct device *dev,
>> +                  const struct of_phandle_args *args)
>> +{
>> +    return iommu_fwspec_add_ids(dev, args->args, 1);
>
> What are you adding here? Per the DT binding there are no IDs, so 
> args->args_count will be 0 and args->args will most likely be 
> uninitialised stack.

I will remove this function.

>
>> +}
>> +
>> +static struct iommu_ops vsi_iommu_ops = {
>
> Const.
>
>> +    .identity_domain = &vsi_identity_domain,
>> +    .domain_alloc_paging = vsi_iommu_domain_alloc_paging,
>> +    .probe_device = vsi_iommu_probe_device,
>> +    .release_device = vsi_iommu_release_device,
>> +    .device_group = generic_single_device_group,
>> +    .of_xlate = vsi_iommu_of_xlate,
>> +    .default_domain_ops = &(const struct iommu_domain_ops) {
>> +        .attach_dev        = vsi_iommu_attach_device,
>> +        .map_pages        = vsi_iommu_map,
>> +        .unmap_pages        = vsi_iommu_unmap,
>> +        .flush_iotlb_all    = vsi_iommu_flush_tlb_all,
>> +        .iova_to_phys        = vsi_iommu_iova_to_phys,
>> +        .free            = vsi_iommu_domain_free,
>> +    }
>> +};
>> +
>> +static const struct of_device_id vsi_iommu_dt_ids[] = {
>> +    {
>> +        .compatible = "verisilicon,iommu",
>> +    },
>> +    {
>> +        .compatible = "rockchip,rk3588-iommu-1.2",
>
> You can drop this - if the driver doesn't have any SoC-specific 
> behaviour then we only need to match the generic compatible here. As 
> long as the SoC-specific compatibles are in the binding, and thus in 
> deployed DTBs, we can start making use of them in future as and when 
> we have a reason to.

DT maintainers have asked for it, so I will keep it but rename it to be DT compliant.

>
>> +    },
>> +    { /* sentinel */ }
>> +};
>> +
>> +static int vsi_iommu_probe(struct platform_device *pdev)
>> +{
>> +    struct device *dev = &pdev->dev;
>> +    struct vsi_iommu *iommu;
>> +    struct resource *res;
>> +    int num_res = pdev->num_resources;
>> +    int err, i;
>> +
>> +    iommu = devm_kzalloc(dev, sizeof(*iommu), GFP_KERNEL);
>> +    if (!iommu)
>> +        return -ENOMEM;
>> +
>> +    iommu->dev = dev;
>> +    iommu->num_mmu = 0;
>> +
>> +    iommu->bases = devm_kcalloc(dev, num_res, sizeof(*iommu->bases),
>> +                    GFP_KERNEL);
>> +    if (!iommu->bases)
>> +        return -ENOMEM;
>> +
>> +    for (i = 0; i < num_res; i++) {
>> +        res = platform_get_resource(pdev, IORESOURCE_MEM, i);
>> +        if (!res)
>> +            continue;
>> +        iommu->bases[i] = devm_ioremap_resource(&pdev->dev, res);
>
> Consider devm_platform_ioremap_resource().
>
>> +        if (IS_ERR(iommu->bases[i]))
>> +            continue;
>> +        iommu->num_mmu++;
>> +    }
>> +    if (iommu->num_mmu == 0)
>> +        return PTR_ERR(iommu->bases[0]);
>> +
>> +    iommu->num_irq = platform_irq_count(pdev);
>> +    if (iommu->num_irq < 0)
>> +        return iommu->num_irq;
>> +
>> +    err = devm_clk_bulk_get_all(dev, &iommu->clocks);
>> +    if (err >= 0)
>> +        iommu->num_clocks = err;
>> +    else if (err == -ENOENT)
>> +        iommu->num_clocks = 0;
>> +    else
>> +        return err;
>> +
>> +    err = clk_bulk_prepare(iommu->num_clocks, iommu->clocks);
>> +    if (err)
>> +        return err;
>
> I wonder if devm_clk_bulk_get_all_enabled() might help here, but if 
> you do want to do any subsequent management then quite possibly it 
> just shifts the complexity to making sure they're reenabled in all the 
> paths where they can be released again :/

I will rework the probe function to take care of your remarks.

>
>> +
>> +    for (i = 0; i < iommu->num_irq; i++) {
>> +        int irq = platform_get_irq(pdev, i);
>
> As with num_mmu, according to your binding num_irq must be exactly 1. 
> Do you really need the pretence of supporting more or fewer?
>
>> +
>> +        if (irq < 0)
>> +            return irq;
>> +
>> +        err = devm_request_irq(iommu->dev, irq, vsi_iommu_irq,
>> +                       IRQF_SHARED, dev_name(dev), iommu);
>> +        if (err)
>> +            goto err_unprepare_clocks;
>> +    }
>> +
>> +    dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
>> +    platform_set_drvdata(pdev, iommu);
>> +
>> +    pm_runtime_enable(dev);
>> +
>> +    err = iommu_device_sysfs_add(&iommu->iommu, dev, NULL, 
>> dev_name(dev));
>> +    if (err)
>> +        goto err_runtime_disable;
>> +
>> +    err = iommu_device_register(&iommu->iommu, &vsi_iommu_ops, dev);
>> +    if (err)
>> +        goto err_remove_sysfs;
>> +
>> +    return 0;
>> +
>> +err_remove_sysfs:
>> +    iommu_device_sysfs_remove(&iommu->iommu);
>> +err_runtime_disable:
>> +    pm_runtime_disable(dev);
>> +err_unprepare_clocks:
>> +    clk_bulk_unprepare(iommu->num_clocks, iommu->clocks);
>> +    return err;
>> +}
>> +
>> +static void vsi_iommu_shutdown(struct platform_device *pdev)
>> +{
>> +    struct vsi_iommu *iommu = platform_get_drvdata(pdev);
>> +    int i;
>> +
>> +    for (i = 0; i < iommu->num_irq; i++) {
>> +        int irq = platform_get_irq(pdev, i);
>> +
>> +        devm_free_irq(iommu->dev, irq, iommu);
>
> Most devm_free calls are suspect in general, and this one is certainly 
> no exception. Even if it is justifiable to suppress IRQs during shutdown, 
> can you not simply disable interrupt generation at the IOMMU end, or 
> at worst just do a disable_irq()? In the shutdown path we really don't 
> want to be doing any more work than absolutely necessary.

Sure

>
>> +    }
>> +
>> +    pm_runtime_force_suspend(&pdev->dev);
>> +}
>> +
>> +static int __maybe_unused vsi_iommu_suspend(struct device *dev)
>> +{
>> +    struct vsi_iommu *iommu = dev_get_drvdata(dev);
>> +
>> +    if (iommu->domain == &vsi_identity_domain)
>> +        return 0;
>> +
>> +    vsi_iommu_disable(iommu);
>
> This seems similarly dubious - if suspend doesn't need to explicitly 
> save any additional context for a subsequent resume then it shouldn't 
> really do anything, certainly not change the state of IOMMU 
> translation. At best that's a waste of time, at worst it risks 
> unexpected faults or memory corruption (if a nominally-suspended 
> client device is actually still running).
>
>> +    return 0;
>> +}
>> +
>> +static int __maybe_unused vsi_iommu_resume(struct device *dev)
>> +{
>> +    struct vsi_iommu *iommu = dev_get_drvdata(dev);
>> +
>> +    if (iommu->domain == &vsi_identity_domain)
>> +        return 0;
>> +
>> +    return vsi_iommu_enable(iommu);
>> +}
>> +
>> +static const struct dev_pm_ops vsi_iommu_pm_ops = {
>> +    SET_RUNTIME_PM_OPS(vsi_iommu_suspend, vsi_iommu_resume, NULL)
>> +    SET_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend,
>> +                pm_runtime_force_resume)
>> +};
>
> Consider DEFINE_RUNTIME_DEV_PM_OPS().
>
>> +static struct platform_driver rockchip_vsi_iommu_driver = {
>> +    .probe = vsi_iommu_probe,
>> +    .shutdown = vsi_iommu_shutdown,
>> +    .driver = {
>> +           .name = "vsi_iommu",
>> +           .of_match_table = vsi_iommu_dt_ids,
>> +           .pm = &vsi_iommu_pm_ops,
>> +           .suppress_bind_attrs = true,
>> +    },
>> +};
>> +builtin_platform_driver(rockchip_vsi_iommu_driver);
>
> I guess that does technically work out when this is built as a module, 
> as the device_initcall() gets redefined to module_init() and the lack 
> of module_exit() just prevents removal (which in practice would be 
> prevented by held references anyway)... But it still looks a bit odd.

I will try to fix that in the next version.
Thanks for your comments.

Benjamin

>
> Thanks,
> Robin.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2025-06-23 14:03 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-06-19 13:12 [PATCH v3 0/5] Add support for Verisilicon IOMMU used by media codec blocks Benjamin Gaignard
2025-06-19 13:12 ` [PATCH v3 1/5] dt-bindings: vendor-prefixes: Add Verisilicon Benjamin Gaignard
2025-06-19 13:12 ` [PATCH v3 2/5] dt-bindings: iommu: verisilicon: Add binding for VSI IOMMU Benjamin Gaignard
2025-06-19 14:19   ` Sebastian Reichel
2025-06-19 15:02     ` Conor Dooley
2025-06-20  9:54     ` Benjamin Gaignard
2025-06-19 13:12 ` [PATCH v3 3/5] iommu: Add verisilicon IOMMU driver Benjamin Gaignard
2025-06-19 13:47   ` Jason Gunthorpe
2025-06-19 16:27     ` Benjamin Gaignard
2025-06-19 16:59       ` Jason Gunthorpe
2025-06-20  8:57         ` Benjamin Gaignard
2025-06-20 12:05           ` Jason Gunthorpe
2025-06-20 13:52             ` Benjamin Gaignard
2025-06-20 14:42               ` Benjamin Gaignard
2025-06-20 16:35                 ` Jason Gunthorpe
2025-06-20 16:36               ` Jason Gunthorpe
2025-06-20 19:37   ` Robin Murphy
2025-06-20 20:45     ` Lucas Stach
2025-06-23 12:05       ` Robin Murphy
2025-06-23 14:03     ` Benjamin Gaignard
2025-06-19 13:12 ` [PATCH v3 4/5] arm64: dts: rockchip: Add verisilicon IOMMU node on RK3588 Benjamin Gaignard
2025-06-19 13:12 ` [PATCH v3 5/5] arm64: defconfig: enable Verisilicon IOMMU Benjamin Gaignard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).