* [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU
@ 2025-06-04 7:57 Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
` (8 more replies)
0 siblings, 9 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Krzysztof Kozlowski, Jeff Hugo
This series adds a new driver for the NPU that Rockchip includes in its
newer SoCs, which Rockchip developed on the NVDLA base.
In its current form, it supports the specific NPU in the RK3588 SoC.
The userspace driver is part of Mesa and an initial draft can be found at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
Changes in v6:
- Make all cores depend on pclk and npu clocks
- Fix BO sync direction logic
- Misc. cleanups
- Link to v5: https://lore.kernel.org/r/20250520-6-10-rocket-v5-0-18c9ca0fcb3c@tomeuvizoso.net
Changes in v5:
- Use bulk clk API
- Rename bindings file
- Syntax improvement to bindings
- Link to v4: https://lore.kernel.org/r/20250519-6-10-rocket-v4-0-d6dff6b4c0ae@tomeuvizoso.net
Changes in v4:
- Several fixes to DT bindings.
- Link to v3: https://lore.kernel.org/r/20250516-6-10-rocket-v3-0-7051ac9225db@tomeuvizoso.net
Changes in v3:
- Reference in the device tree only the register blocks that are
actually used.
- Several style and robustness fixes suggested in the mailing list.
- Added patches from Nicolas Frattaroli that add support to the NPU for
the Rock 5B board.
- Link to v2: https://lore.kernel.org/r/20250225-6-10-rocket-v2-0-d4dbcfafc141@tomeuvizoso.net
Changes in v2:
- Drop patch adding the rk3588 compatible to rockchip-iommu (Sebastian Reichel)
- Drop patch adding support for multiple power domains to rockchip-iommu (Sebastian Reichel)
- Link to v1: https://lore.kernel.org/r/20240612-6-10-rocket-v1-0-060e48eea250@tomeuvizoso.net
---
Nicolas Frattaroli (2):
arm64: dts: rockchip: add pd_npu label for RK3588 power domains
arm64: dts: rockchip: enable NPU on ROCK 5B
Tomeu Vizoso (8):
dt-bindings: npu: rockchip,rknn: Add bindings
arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base
arm64: dts: rockchip: Enable the NPU on quartzpro64
accel/rocket: Add registers header
accel/rocket: Add a new driver for Rockchip's NPU
accel/rocket: Add IOCTL for BO creation
accel/rocket: Add job submission IOCTL
accel/rocket: Add IOCTLs for synchronizing memory accesses
Documentation/accel/index.rst | 1 +
Documentation/accel/rocket/index.rst | 19 +
.../bindings/npu/rockchip,rk3588-rknn-core.yaml | 144 +
MAINTAINERS | 10 +
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 89 +-
.../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 30 +
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/rocket/Kconfig | 25 +
drivers/accel/rocket/Makefile | 10 +
drivers/accel/rocket/rocket_core.c | 80 +
drivers/accel/rocket/rocket_core.h | 59 +
drivers/accel/rocket/rocket_device.c | 31 +
drivers/accel/rocket/rocket_device.h | 30 +
drivers/accel/rocket/rocket_drv.c | 300 ++
drivers/accel/rocket/rocket_drv.h | 17 +
drivers/accel/rocket/rocket_gem.c | 208 +
drivers/accel/rocket/rocket_gem.h | 31 +
drivers/accel/rocket/rocket_job.c | 694 +++
drivers/accel/rocket/rocket_job.h | 50 +
drivers/accel/rocket/rocket_registers.h | 4425 ++++++++++++++++++++
include/uapi/drm/rocket_accel.h | 145 +
23 files changed, 6455 insertions(+), 1 deletion(-)
---
base-commit: a6d708809579ed3d902465785666085ff07a1d7c
change-id: 20240612-6-10-rocket-9316defc14c7
Best regards,
--
Tomeu Vizoso <tomeu@tomeuvizoso.net>
* [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 8:25 ` Heiko Stübner
2025-06-04 7:57 ` [PATCH v6 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base Tomeu Vizoso
` (7 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Krzysztof Kozlowski
Add the bindings for the Neural Processing Unit IP from Rockchip.
v2:
- Adapt to new node structure (one node per core, each with its own
IOMMU)
- Several misc. fixes from Sebastian Reichel
v3:
- Split register block in its constituent subblocks, and only require
the ones that the kernel would ever use (Nicolas Frattaroli)
- Group supplies (Rob Herring)
- Explain the way in which the top core is special (Rob Herring)
v4:
- Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
- Remove unneeded items: (Krzysztof Kozlowski)
- Fix use of minItems/maxItems (Krzysztof Kozlowski)
- Add reg-names to list of required properties (Krzysztof Kozlowski)
- Fix example (Krzysztof Kozlowski)
v5:
- Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski)
- Streamline compatible property (Krzysztof Kozlowski)
v6:
- Remove mention of NVDLA, as the hardware is only incidentally related
(Kever Yang)
- Mark pclk and npu clocks as required by all cores (Rob Herring)
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
---
.../bindings/npu/rockchip,rk3588-rknn-core.yaml | 144 +++++++++++++++++++++
1 file changed, 144 insertions(+)
diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..9a5e9e213912d0997da2f6ae26189adf044dcc7b
--- /dev/null
+++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
@@ -0,0 +1,144 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/npu/rockchip,rk3588-rknn-core.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Neural Processing Unit IP from Rockchip
+
+maintainers:
+ - Tomeu Vizoso <tomeu@tomeuvizoso.net>
+
+description:
+ Rockchip IP for accelerating inference of neural networks.
+
+ There is one node for each core in the NPU. In Rockchip's design there
+ will be one core that is special and needs to be powered on before any of the
+ other cores can be used. This special core is called the top core and should
+ have the compatible string that corresponds to top cores.
+
+properties:
+ $nodename:
+ pattern: '^npu@[a-f0-9]+$'
+
+ compatible:
+ enum:
+ - rockchip,rk3588-rknn-core-top
+ - rockchip,rk3588-rknn-core
+
+ reg:
+ maxItems: 3
+
+ reg-names:
+ items:
+ - const: pc
+ - const: cna
+ - const: core
+
+ clocks:
+ maxItems: 4
+
+ clock-names:
+ items:
+ - const: aclk
+ - const: hclk
+ - const: npu
+ - const: pclk
+
+ interrupts:
+ maxItems: 1
+
+ iommus:
+ maxItems: 1
+
+ npu-supply: true
+
+ power-domains:
+ maxItems: 1
+
+ resets:
+ maxItems: 2
+
+ reset-names:
+ items:
+ - const: srst_a
+ - const: srst_h
+
+ sram-supply: true
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - clocks
+ - clock-names
+ - interrupts
+ - iommus
+ - power-domains
+ - resets
+ - reset-names
+ - npu-supply
+ - sram-supply
+
+allOf:
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - rockchip,rk3588-rknn-core-top
+ then:
+ properties:
+ clocks:
+ minItems: 4
+
+ clock-names:
+ minItems: 4
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - rockchip,rk3588-rknn-core
+ then:
+ properties:
+ clocks:
+ maxItems: 2
+ clock-names:
+ maxItems: 2
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/rockchip,rk3588-cru.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/power/rk3588-power.h>
+ #include <dt-bindings/reset/rockchip,rk3588-cru.h>
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ npu@fdab0000 {
+ compatible = "rockchip,rk3588-rknn-core-top";
+ reg = <0x0 0xfdab0000 0x0 0x1000>,
+ <0x0 0xfdab1000 0x0 0x1000>,
+ <0x0 0xfdab3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+ assigned-clock-rates = <200000000>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ iommus = <&rknn_mmu_top>;
+ npu-supply = <&vdd_npu_s0>;
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
+ reset-names = "srst_a", "srst_h";
+ sram-supply = <&vdd_npu_mem_s0>;
+ };
+ };
+...
--
2.49.0
* [PATCH v6 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
` (6 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
See Chapter 36 "RKNN" from the RK3588 TRM (Part 1).
This is a derivative of NVIDIA's NVDLA, but with its own front-end
processor.
The IP is divided into three cores that are programmed independently. The
first core, though, is special: it must be powered on before any of the
others can be used.
The IOMMU of the first core is also special in that it has two subunits
(read/write?) that need to be programmed in sync.
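As a minimal sketch of how a driver can encode that power-on ordering, the
snippet below creates a runtime-PM device link from a secondary core to the
top core; this mirrors what the rocket driver does later in this series, and
the function and variable names here are illustrative only:

    #include <linux/device.h>
    #include <linux/pm_runtime.h>

    /*
     * Make a secondary NPU core depend on the top core for runtime PM.
     * "core_dev" is the secondary core, "top_dev" is the top core.
     */
    static int npu_core_link_to_top(struct device *core_dev,
                                    struct device *top_dev)
    {
            struct device_link *link;

            link = device_link_add(core_dev, top_dev,
                                   DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
            if (!link)
                    return -ENODEV;

            /*
             * From here on, resuming core_dev via pm_runtime_get_sync()
             * powers up top_dev first.
             */
            return 0;
    }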
v2:
- Have one device for each NPU core (Sebastian Reichel)
- Have one device for each IOMMU (Sebastian Reichel)
- Correctly sort nodes (Diederik de Haas)
- Add rockchip,iommu compatible to IOMMU nodes (Sebastian Reichel)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
v4:
- Adapt to changes in bindings
v6:
- pclk and npu clocks are needed by all cores (Rob Herring)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 87 +++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
index 1e18ad93ba0ebdad31642b88ff0f90ef4e8dc76f..ea831bb6e2ccc64c811f885a4964da7617c255d7 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -1136,6 +1136,93 @@ power-domain@RK3588_PD_SDMMC {
};
};
+ rknn_core_top: npu@fdab0000 {
+ compatible = "rockchip,rk3588-rknn-core-top";
+ reg = <0x0 0xfdab0000 0x0 0x1000>,
+ <0x0 0xfdab1000 0x0 0x1000>,
+ <0x0 0xfdab3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+ assigned-clock-rates = <200000000>;
+ resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ iommus = <&rknn_mmu_top>;
+ status = "disabled";
+ };
+
+ rknn_mmu_top: iommu@fdab9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdab9000 0x0 0x100>,
+ <0x0 0xfdaba000 0x0 0x100>;
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ status = "disabled";
+ };
+
+ rknn_core_1: npu@fdac0000 {
+ compatible = "rockchip,rk3588-rknn-core";
+ reg = <0x0 0xfdac0000 0x0 0x1000>,
+ <0x0 0xfdac1000 0x0 0x1000>,
+ <0x0 0xfdac3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU1>, <&cru HCLK_NPU1>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ resets = <&cru SRST_A_RKNN1>, <&cru SRST_H_RKNN1>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPU1>;
+ iommus = <&rknn_mmu_1>;
+ status = "disabled";
+ };
+
+ rknn_mmu_1: iommu@fdac9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdaca000 0x0 0x100>;
+ interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU1>, <&cru HCLK_NPU1>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPU1>;
+ status = "disabled";
+ };
+
+ rknn_core_2: npu@fdad0000 {
+ compatible = "rockchip,rk3588-rknn-core";
+ reg = <0x0 0xfdad0000 0x0 0x1000>,
+ <0x0 0xfdad1000 0x0 0x1000>,
+ <0x0 0xfdad3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU2>, <&cru HCLK_NPU2>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ resets = <&cru SRST_A_RKNN2>, <&cru SRST_H_RKNN2>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPU2>;
+ iommus = <&rknn_mmu_2>;
+ status = "disabled";
+ };
+
+ rknn_mmu_2: iommu@fdad9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdada000 0x0 0x100>;
+ interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU2>, <&cru HCLK_NPU2>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPU2>;
+ status = "disabled";
+ };
+
vpu121: video-codec@fdb50000 {
compatible = "rockchip,rk3588-vpu121", "rockchip,rk3568-vpu";
reg = <0x0 0xfdb50000 0x0 0x800>;
--
2.49.0
* [PATCH v6 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 8:39 ` Heiko Stübner
2025-06-04 7:57 ` [PATCH v6 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
` (5 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Enable the NPU nodes that a previous commit added to the rk3588 base device tree.
v2:
- Split nodes (Sebastian Reichel)
- Sort nodes (Sebastian Reichel)
- Add board regulators (Sebastian Reichel)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
.../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 30 ++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
index 78aaa6635b5d20a650aba8d8c2d0d4f498ff0d33..2e45b213c25b99571dd71ce90bc7970418f60276 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
@@ -415,6 +415,36 @@ &pcie3x4 {
status = "okay";
};
+&rknn_core_top {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_core_1 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_core_2 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_mmu_top {
+ status = "okay";
+};
+
+&rknn_mmu_1 {
+ status = "okay";
+};
+
+&rknn_mmu_2 {
+ status = "okay";
+};
+
&saradc {
vref-supply = <&vcc_1v8_s0>;
status = "okay";
--
2.49.0
* [PATCH v6 05/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (2 preceding siblings ...)
2025-06-04 7:57 ` [PATCH v6 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 18:14 ` Robin Murphy
2025-06-04 7:57 ` [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
` (4 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This initial version supports the NPU as shipped in the RK3588 SoC and
described in the first part of its TRM, in Chapter 36.
This NPU contains 3 independent cores that the driver can submit jobs
to.
This commit adds just hardware initialization and power management.
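For reference, here is a condensed sketch of the runtime-PM pattern this
patch sets up for each core (autosuspend with a 50 ms delay, hardware access
bracketed by get/put). The PM calls are the ones used in the patch; the
surrounding functions are illustrative only:

    #include <linux/pm_runtime.h>

    static void npu_core_pm_setup(struct device *dev)
    {
            pm_runtime_use_autosuspend(dev);
            pm_runtime_set_autosuspend_delay(dev, 50); /* ~3 frames at 60 Hz */
            pm_runtime_enable(dev);
    }

    static int npu_core_touch_hw(struct device *dev)
    {
            int err = pm_runtime_get_sync(dev); /* clocks on, domain up */

            if (err < 0) {
                    pm_runtime_put_noidle(dev);
                    return err;
            }
            /* ... access registers ... */
            pm_runtime_mark_last_busy(dev);
            pm_runtime_put_autosuspend(dev); /* off after 50 ms of idle */
            return 0;
    }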
v2:
- Split cores and IOMMUs as independent devices (Sebastian Reichel)
- Add some documentation (Jeffrey Hugo)
- Be more explicit in the Kconfig documentation (Jeffrey Hugo)
- Remove resets, as these haven't been found useful so far (Zenghui Yu)
- Repack structs (Jeffrey Hugo)
- Use DEFINE_DRM_ACCEL_FOPS (Jeffrey Hugo)
- Use devm_drm_dev_alloc (Jeffrey Hugo)
- Use probe log helper (Jeffrey Hugo)
- Introduce UABI header in a later patch (Jeffrey Hugo)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
- Move registers header to its own commit (Thomas Zimmermann)
- Misc. cleanups (Thomas Zimmermann and Jeff Hugo)
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
- PM improvements (Nicolas Frattaroli)
v4:
- Use bulk clk API (Krzysztof Kozlowski)
v6:
- Remove mention of NVDLA, as the hardware is only incidentally related
(Kever Yang)
- Use calloc instead of GFP_ZERO (Jeff Hugo)
- Explicitly include linux/container_of.h (Jeff Hugo)
- pclk and npu clocks are now needed by all cores (Rob Herring)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
Documentation/accel/index.rst | 1 +
Documentation/accel/rocket/index.rst | 19 +++
MAINTAINERS | 10 ++
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/rocket/Kconfig | 25 ++++
drivers/accel/rocket/Makefile | 8 +
drivers/accel/rocket/rocket_core.c | 70 +++++++++
drivers/accel/rocket/rocket_core.h | 45 ++++++
drivers/accel/rocket/rocket_device.c | 25 ++++
drivers/accel/rocket/rocket_device.h | 26 ++++
drivers/accel/rocket/rocket_drv.c | 278 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_drv.h | 13 ++
13 files changed, 522 insertions(+)
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index bc85f26533d88891dde482f91e26c99991b22869..d8fa332d60a890dbb617454d2a26d9b6f9b196aa 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -10,6 +10,7 @@ Compute Accelerators
introduction
amdxdna/index
qaic/index
+ rocket/index
.. only:: subproject and html
diff --git a/Documentation/accel/rocket/index.rst b/Documentation/accel/rocket/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..300eb3aeab1d8c6514c65af4d216b2d5a1669131
--- /dev/null
+++ b/Documentation/accel/rocket/index.rst
@@ -0,0 +1,19 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+ accel/rocket Rockchip NPU driver
+=====================================
+
+The accel/rocket driver supports the Neural Processing Units (NPUs) inside some
+Rockchip SoCs such as the RK3588. Rockchip calls it RKNN and sometimes RKNPU.
+
+The hardware is described in chapter 36 in the RK3588 TRM.
+
+This driver just powers the hardware on and off, allocates and maps buffers to
+the device and submits jobs to the frontend unit. Everything else is done in
+userspace, as a Gallium driver (also called rocket) that is part of the Mesa3D
+project.
+
+Hardware currently supported:
+
+* RK3588
diff --git a/MAINTAINERS b/MAINTAINERS
index 96b82704950184bd71623ff41fc4df31e4c7fe87..2d8833bf1f2db06ca624d703f19066adab2f9fde 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7263,6 +7263,16 @@ T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
F: drivers/accel/ivpu/
F: include/uapi/drm/ivpu_accel.h
+DRM ACCEL DRIVER FOR ROCKCHIP NPU
+M: Tomeu Vizoso <tomeu@tomeuvizoso.net>
+L: dri-devel@lists.freedesktop.org
+S: Supported
+T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
+F: Documentation/accel/rocket/
F: Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
+F: drivers/accel/rocket/
+F: include/uapi/drm/rocket_accel.h
+
DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
M: Oded Gabbay <ogabbay@kernel.org>
L: dri-devel@lists.freedesktop.org
diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
index 5b9490367a39fd12d35a8d9021768aa186c09308..bb01cebc42bf16ebf02e938040f339ff94869e33 100644
--- a/drivers/accel/Kconfig
+++ b/drivers/accel/Kconfig
@@ -28,5 +28,6 @@ source "drivers/accel/amdxdna/Kconfig"
source "drivers/accel/habanalabs/Kconfig"
source "drivers/accel/ivpu/Kconfig"
source "drivers/accel/qaic/Kconfig"
+source "drivers/accel/rocket/Kconfig"
endif
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
index a301fb6089d4c515430175c5e2ba9190f6dc9158..ffc3fa58866616d933184a7659573cd4d4780a8d 100644
--- a/drivers/accel/Makefile
+++ b/drivers/accel/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/
obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
+obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/
diff --git a/drivers/accel/rocket/Kconfig b/drivers/accel/rocket/Kconfig
new file mode 100644
index 0000000000000000000000000000000000000000..9a59c6c61bf4d6460d8008b16331f001c97de67d
--- /dev/null
+++ b/drivers/accel/rocket/Kconfig
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config DRM_ACCEL_ROCKET
+ tristate "Rocket (support for Rockchip NPUs)"
+ depends on DRM
+ depends on ARM64 || COMPILE_TEST
+ depends on MMU
+ select DRM_SCHED
+ select IOMMU_SUPPORT
+ select IOMMU_IO_PGTABLE_LPAE
+ select DRM_GEM_SHMEM_HELPER
+ help
+ Choose this option if you have a Rockchip SoC that contains a
+ compatible Neural Processing Unit (NPU), such as the RK3588. Rockchip
+ calls it either RKNN or RKNPU; it accelerates inference of neural
+ networks.
+
+ The interface exposed to userspace is described in
+ include/uapi/drm/rocket_accel.h and is used by the Rocket userspace
+ driver in Mesa3D.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called rocket.
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96
--- /dev/null
+++ b/drivers/accel/rocket/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
+
+rocket-y := \
+ rocket_core.o \
+ rocket_device.o \
+ rocket_drv.o
diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
new file mode 100644
index 0000000000000000000000000000000000000000..3a6f25f2b4103075102739588bcdad96510e2a4e
--- /dev/null
+++ b/drivers/accel/rocket/rocket_core.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <linux/clk.h>
+#include <linux/dev_printk.h>
+#include <linux/err.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_core.h"
+
+int rocket_core_init(struct rocket_core *core)
+{
+ struct device *dev = core->dev;
+ struct platform_device *pdev = to_platform_device(dev);
+ u32 version;
+ int err = 0;
+
+ err = devm_clk_bulk_get(dev, ARRAY_SIZE(core->clks), core->clks);
+ if (err)
+ return dev_err_probe(dev, err, "failed to get clocks for core %d\n", core->index);
+
+ core->pc_iomem = devm_platform_ioremap_resource_byname(pdev, "pc");
+ if (IS_ERR(core->pc_iomem)) {
+ dev_err(dev, "couldn't find PC registers %ld\n", PTR_ERR(core->pc_iomem));
+ return PTR_ERR(core->pc_iomem);
+ }
+
+ core->cna_iomem = devm_platform_ioremap_resource_byname(pdev, "cna");
+ if (IS_ERR(core->cna_iomem)) {
+ dev_err(dev, "couldn't find CNA registers %ld\n", PTR_ERR(core->cna_iomem));
+ return PTR_ERR(core->cna_iomem);
+ }
+
+ core->core_iomem = devm_platform_ioremap_resource_byname(pdev, "core");
+ if (IS_ERR(core->core_iomem)) {
+ dev_err(dev, "couldn't find CORE registers %ld\n", PTR_ERR(core->core_iomem));
+ return PTR_ERR(core->core_iomem);
+ }
+
+ pm_runtime_use_autosuspend(dev);
+
+ /*
+ * As this NPU will be most often used as part of a media pipeline that
+ * ends presenting in a display, choose 50 ms (~3 frames at 60Hz) as an
+ * autosuspend delay as that will keep the device powered up while the
+ * pipeline is running.
+ */
+ pm_runtime_set_autosuspend_delay(dev, 50);
+
+ pm_runtime_enable(dev);
+
+ err = pm_runtime_get_sync(dev);
+
+ version = rocket_pc_readl(core, VERSION);
+ version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
+
+ pm_runtime_mark_last_busy(dev);
+ pm_runtime_put_autosuspend(dev);
+
+ dev_info(dev, "Rockchip NPU core %d version: %d\n", core->index, version);
+
+ return 0;
+}
+
+void rocket_core_fini(struct rocket_core *core)
+{
+ pm_runtime_dont_use_autosuspend(core->dev);
+ pm_runtime_disable(core->dev);
+}
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
new file mode 100644
index 0000000000000000000000000000000000000000..1b1beb9798f03ec2ca325496a4d894674d0b798d
--- /dev/null
+++ b/drivers/accel/rocket/rocket_core.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_CORE_H__
+#define __ROCKET_CORE_H__
+
+#include <drm/gpu_scheduler.h>
+#include <linux/clk.h>
+#include <linux/io.h>
+#include <linux/mutex_types.h>
+
+#include "rocket_registers.h"
+
+#define rocket_pc_readl(core, reg) \
+ readl((core)->pc_iomem + (REG_PC_##reg))
+#define rocket_pc_writel(core, reg, value) \
+ writel(value, (core)->pc_iomem + (REG_PC_##reg))
+
+#define rocket_cna_readl(core, reg) \
+ readl((core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
+#define rocket_cna_writel(core, reg, value) \
+ writel(value, (core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
+
+#define rocket_core_readl(core, reg) \
+ readl((core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
+#define rocket_core_writel(core, reg, value) \
+ writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
+
+struct rocket_core {
+ struct device *dev;
+ struct rocket_device *rdev;
+ struct device_link *link;
+ unsigned int index;
+
+ int irq;
+ void __iomem *pc_iomem;
+ void __iomem *cna_iomem;
+ void __iomem *core_iomem;
+ struct clk_bulk_data clks[4];
+};
+
+int rocket_core_init(struct rocket_core *core);
+void rocket_core_fini(struct rocket_core *core);
+
+#endif
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
new file mode 100644
index 0000000000000000000000000000000000000000..a05c103e117e3eaa6439884b7acb6e3483296edb
--- /dev/null
+++ b/drivers/accel/rocket/rocket_device.c
@@ -0,0 +1,25 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <linux/array_size.h>
+#include <linux/clk.h>
+#include <linux/dev_printk.h>
+
+#include "rocket_device.h"
+
+int rocket_device_init(struct rocket_device *rdev)
+{
+ int err;
+
+ /* Initialize core 0 (top) */
+ err = rocket_core_init(&rdev->cores[0]);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+void rocket_device_fini(struct rocket_device *rdev)
+{
+ rocket_core_fini(&rdev->cores[0]);
+}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
new file mode 100644
index 0000000000000000000000000000000000000000..b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d
--- /dev/null
+++ b/drivers/accel/rocket/rocket_device.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_DEVICE_H__
+#define __ROCKET_DEVICE_H__
+
+#include <drm/drm_device.h>
+#include <linux/clk.h>
+#include <linux/container_of.h>
+
+#include "rocket_core.h"
+
+struct rocket_device {
+ struct drm_device ddev;
+
+ struct rocket_core *cores;
+ unsigned int num_cores;
+};
+
+int rocket_device_init(struct rocket_device *rdev);
+void rocket_device_fini(struct rocket_device *rdev);
+
+#define to_rocket_device(drm_dev) \
+ ((struct rocket_device *)container_of(drm_dev, struct rocket_device, ddev))
+
+#endif
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
new file mode 100644
index 0000000000000000000000000000000000000000..f1996ef1ed3e9c99e968add7b4ee983a8139295d
--- /dev/null
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -0,0 +1,278 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include "linux/device/devres.h"
+#include "linux/gfp_types.h"
+#include <drm/drm_accel.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_ioctl.h>
+#include <drm/drm_of.h>
+#include <linux/array_size.h>
+#include <linux/clk.h>
+#include <linux/component.h>
+#include <linux/dma-mapping.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_drv.h"
+
+static int
+rocket_open(struct drm_device *dev, struct drm_file *file)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct rocket_file_priv *rocket_priv;
+
+ rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
+ if (!rocket_priv)
+ return -ENOMEM;
+
+ rocket_priv->rdev = rdev;
+ file->driver_priv = rocket_priv;
+
+ return 0;
+}
+
+static void
+rocket_postclose(struct drm_device *dev, struct drm_file *file)
+{
+ struct rocket_file_priv *rocket_priv = file->driver_priv;
+
+ kfree(rocket_priv);
+}
+
+static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
+#define ROCKET_IOCTL(n, func) \
+ DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
+};
+
+DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
+
+/*
+ * Rocket driver version:
+ * - 1.0 - initial interface
+ */
+static const struct drm_driver rocket_drm_driver = {
+ .driver_features = DRIVER_COMPUTE_ACCEL,
+ .open = rocket_open,
+ .postclose = rocket_postclose,
+ .ioctls = rocket_drm_driver_ioctls,
+ .num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
+ .fops = &rocket_accel_driver_fops,
+ .name = "rocket",
+ .desc = "rocket DRM",
+};
+
+static int rocket_drm_bind(struct device *dev)
+{
+ struct device_node *core_node;
+ struct rocket_device *rdev;
+ struct drm_device *ddev;
+ unsigned int num_cores = 1;
+ int err;
+
+ rdev = devm_drm_dev_alloc(dev, &rocket_drm_driver, struct rocket_device, ddev);
+ if (IS_ERR(rdev))
+ return PTR_ERR(rdev);
+
+ ddev = &rdev->ddev;
+ dev_set_drvdata(dev, rdev);
+
+ for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
+ if (of_device_is_available(core_node))
+ num_cores++;
+
+ rdev->cores = devm_kcalloc(dev, num_cores, sizeof(*rdev->cores), GFP_KERNEL);
+ if (!rdev->cores)
+ return -ENOMEM;
+
+ /* Add core 0, any other cores will be added later when they are bound */
+ rdev->cores[0].rdev = rdev;
+ rdev->cores[0].dev = dev;
+ rdev->cores[0].index = 0;
+ rdev->num_cores = 1;
+
+ err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+ if (err)
+ return err;
+
+ err = rocket_device_init(rdev);
+ if (err)
+ return dev_err_probe(dev, err, "Fatal error during NPU init\n");
+
+ err = component_bind_all(dev, rdev);
+ if (err)
+ goto err_device_fini;
+
+ err = drm_dev_register(ddev, 0);
+ if (err < 0)
+ goto err_unbind;
+
+ return 0;
+
+err_unbind:
+ component_unbind_all(dev, rdev);
+err_device_fini:
+ rocket_device_fini(rdev);
+ return err;
+}
+
+static void rocket_drm_unbind(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ struct drm_device *ddev = &rdev->ddev;
+
+ drm_dev_unregister(ddev);
+
+ component_unbind_all(dev, rdev);
+
+ rocket_device_fini(rdev);
+}
+
+static const struct component_master_ops rocket_drm_ops = {
+ .bind = rocket_drm_bind,
+ .unbind = rocket_drm_unbind,
+};
+
+static int rocket_core_bind(struct device *dev, struct device *master, void *data)
+{
+ struct rocket_device *rdev = data;
+ unsigned int core = rdev->num_cores;
+ int err;
+
+ dev_set_drvdata(dev, rdev);
+
+ rdev->cores[core].rdev = rdev;
+ rdev->cores[core].dev = dev;
+ rdev->cores[core].index = core;
+ rdev->cores[core].link = device_link_add(dev, rdev->cores[0].dev,
+ DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
+
+ rdev->num_cores++;
+
+ err = rocket_core_init(&rdev->cores[core]);
+ if (err) {
+ rocket_device_fini(rdev);
+ return err;
+ }
+
+ return 0;
+}
+
+static void rocket_core_unbind(struct device *dev, struct device *master, void *data)
+{
+ struct rocket_device *rdev = data;
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ if (rdev->cores[core].dev == dev) {
+ rocket_core_fini(&rdev->cores[core]);
+ device_link_del(rdev->cores[core].link);
+ break;
+ }
+ }
+}
+
+static const struct component_ops rocket_core_ops = {
+ .bind = rocket_core_bind,
+ .unbind = rocket_core_unbind,
+};
+
+static int rocket_probe(struct platform_device *pdev)
+{
+ struct component_match *match = NULL;
+ struct device_node *core_node;
+
+ if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
+ return component_add(&pdev->dev, &rocket_core_ops);
+
+ for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core") {
+ if (!of_device_is_available(core_node))
+ continue;
+
+ drm_of_component_match_add(&pdev->dev, &match,
+ component_compare_of, core_node);
+ }
+
+ return component_master_add_with_match(&pdev->dev, &rocket_drm_ops, match);
+}
+
+static void rocket_remove(struct platform_device *pdev)
+{
+ if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core-top"))
+ component_master_del(&pdev->dev, &rocket_drm_ops);
+ else if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
+ component_del(&pdev->dev, &rocket_core_ops);
+}
+
+static const struct of_device_id dt_match[] = {
+ { .compatible = "rockchip,rk3588-rknn-core-top" },
+ { .compatible = "rockchip,rk3588-rknn-core" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, dt_match);
+
+static int find_core_for_dev(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+
+ for (unsigned int core = 0; core < rdev->num_cores; core++) {
+ if (dev == rdev->cores[core].dev)
+ return core;
+ }
+
+ return -1;
+}
+
+static int rocket_device_runtime_resume(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ int core = find_core_for_dev(dev);
+ int err = 0;
+
+ if (core < 0)
+ return -ENODEV;
+
+ err = clk_bulk_prepare_enable(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
+ if (err) {
+ dev_err(dev, "failed to enable (%d) clocks for core %d\n", err, core);
+ return err;
+ }
+
+ return 0;
+}
+
+static int rocket_device_runtime_suspend(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ int core = find_core_for_dev(dev);
+
+ if (core < 0)
+ return -ENODEV;
+
+ clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
+
+ return 0;
+}
+
+EXPORT_GPL_DEV_PM_OPS(rocket_pm_ops) = {
+ RUNTIME_PM_OPS(rocket_device_runtime_suspend, rocket_device_runtime_resume, NULL)
+ SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
+};
+
+static struct platform_driver rocket_driver = {
+ .probe = rocket_probe,
+ .remove = rocket_remove,
+ .driver = {
+ .name = "rocket",
+ .pm = pm_ptr(&rocket_pm_ops),
+ .of_match_table = dt_match,
+ },
+};
+module_platform_driver(rocket_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("DRM driver for the Rockchip NPU IP");
+MODULE_AUTHOR("Tomeu Vizoso");
diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
new file mode 100644
index 0000000000000000000000000000000000000000..bd3a697ab7c8e378967ce638b04d7d86845b53c7
--- /dev/null
+++ b/drivers/accel/rocket/rocket_drv.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_DRV_H__
+#define __ROCKET_DRV_H__
+
+#include "rocket_device.h"
+
+struct rocket_file_priv {
+ struct rocket_device *rdev;
+};
+
+#endif
--
2.49.0
* [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (3 preceding siblings ...)
2025-06-04 7:57 ` [PATCH v6 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 16:18 ` Daniel Stone
2025-06-04 7:57 ` [PATCH v6 07/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
` (3 subsequent siblings)
8 siblings, 1 reply; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This uses the DRM SHMEM helpers, and buffers are mapped right away on both
the CPU and NPU sides, as all buffers are expected to be accessed from both.
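As a hedged usage sketch (not part of this patch), userspace would create
and map a buffer roughly like this, using the uapi added below; the helper
name is illustrative and error handling is minimal:

    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <drm/rocket_accel.h>

    /* "fd" is an open /dev/accel/accelN node. */
    static void *create_and_map_bo(int fd, __u32 size, __u32 *handle)
    {
            struct drm_rocket_create_bo req = { .size = size };

            if (ioctl(fd, DRM_IOCTL_ROCKET_CREATE_BO, &req))
                    return NULL;

            *handle = req.handle;
            /* req.dma_address is the address the NPU cores will use. */
            return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                        fd, req.offset);
    }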
v2:
- Sync the IOMMUs for the other cores when mapping and unmapping.
v3:
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
v6:
- Use mutexes guard (Markus Elfring)
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
drivers/accel/rocket/Makefile | 3 +-
drivers/accel/rocket/rocket_device.c | 4 ++
drivers/accel/rocket/rocket_device.h | 2 +
drivers/accel/rocket/rocket_drv.c | 7 +-
drivers/accel/rocket/rocket_gem.c | 126 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_gem.h | 26 ++++++++
include/uapi/drm/rocket_accel.h | 44 ++++++++++++
7 files changed, 210 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
index abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96..4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f 100644
--- a/drivers/accel/rocket/Makefile
+++ b/drivers/accel/rocket/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
rocket-y := \
rocket_core.o \
rocket_device.o \
- rocket_drv.o
+ rocket_drv.o \
+ rocket_gem.o
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index a05c103e117e3eaa6439884b7acb6e3483296edb..5e559104741af22c528914c96e44558323ab6c89 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -4,6 +4,7 @@
#include <linux/array_size.h>
#include <linux/clk.h>
#include <linux/dev_printk.h>
+#include <linux/mutex.h>
#include "rocket_device.h"
@@ -16,10 +17,13 @@ int rocket_device_init(struct rocket_device *rdev)
if (err)
return err;
+ mutex_init(&rdev->iommu_lock);
+
return 0;
}
void rocket_device_fini(struct rocket_device *rdev)
{
+ mutex_destroy(&rdev->iommu_lock);
rocket_core_fini(&rdev->cores[0]);
}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d..10acfe8534f00a7985d40a93f4b2f7f69d43caee 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -13,6 +13,8 @@
struct rocket_device {
struct drm_device ddev;
+ struct mutex iommu_lock;
+
struct rocket_core *cores;
unsigned int num_cores;
};
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index f1996ef1ed3e9c99e968add7b4ee983a8139295d..209ef342bfa4993db793a2745dcd353b4ef84eb1 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -8,6 +8,7 @@
#include <drm/drm_gem.h>
#include <drm/drm_ioctl.h>
#include <drm/drm_of.h>
+#include <drm/rocket_accel.h>
#include <linux/array_size.h>
#include <linux/clk.h>
#include <linux/component.h>
@@ -17,6 +18,7 @@
#include <linux/pm_runtime.h>
#include "rocket_drv.h"
+#include "rocket_gem.h"
static int
rocket_open(struct drm_device *dev, struct drm_file *file)
@@ -45,6 +47,8 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file)
static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
#define ROCKET_IOCTL(n, func) \
DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
+
+ ROCKET_IOCTL(CREATE_BO, create_bo),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
@@ -54,9 +58,10 @@ DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
* - 1.0 - initial interface
*/
static const struct drm_driver rocket_drm_driver = {
- .driver_features = DRIVER_COMPUTE_ACCEL,
+ .driver_features = DRIVER_COMPUTE_ACCEL | DRIVER_GEM,
.open = rocket_open,
.postclose = rocket_postclose,
+ .gem_create_object = rocket_gem_create_object,
.ioctls = rocket_drm_driver_ioctls,
.num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
.fops = &rocket_accel_driver_fops,
diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
new file mode 100644
index 0000000000000000000000000000000000000000..fa07c494fd845f98ac759407677a649ed2c97727
--- /dev/null
+++ b/drivers/accel/rocket/rocket_gem.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_device.h>
+#include <drm/drm_utils.h>
+#include <drm/rocket_accel.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+
+#include "rocket_device.h"
+#include "rocket_gem.h"
+
+static void rocket_gem_bo_free(struct drm_gem_object *obj)
+{
+ struct rocket_device *rdev = to_rocket_device(obj->dev);
+ struct rocket_gem_object *bo = to_rocket_bo(obj);
+ struct sg_table *sgt;
+
+ drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
+
+ guard(mutex)(&rdev->iommu_lock);
+
+ sgt = drm_gem_shmem_get_pages_sgt(&bo->base);
+
+ /* Unmap this object from the IOMMUs for cores > 0 */
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ struct iommu_domain *domain = iommu_get_domain_for_dev(rdev->cores[core].dev);
+ size_t unmapped = iommu_unmap(domain, sgt->sgl->dma_address, bo->size);
+
+ drm_WARN_ON(obj->dev, unmapped != bo->size);
+ }
+
+ /* This will unmap the pages from the IOMMU linked to core 0 */
+ drm_gem_shmem_free(&bo->base);
+}
+
+static const struct drm_gem_object_funcs rocket_gem_funcs = {
+ .free = rocket_gem_bo_free,
+ .print_info = drm_gem_shmem_object_print_info,
+ .pin = drm_gem_shmem_object_pin,
+ .unpin = drm_gem_shmem_object_unpin,
+ .get_sg_table = drm_gem_shmem_object_get_sg_table,
+ .vmap = drm_gem_shmem_object_vmap,
+ .vunmap = drm_gem_shmem_object_vunmap,
+ .mmap = drm_gem_shmem_object_mmap,
+ .vm_ops = &drm_gem_shmem_vm_ops,
+};
+
+struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size)
+{
+ struct rocket_gem_object *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ obj->base.base.funcs = &rocket_gem_funcs;
+
+ return &obj->base.base;
+}
+
+int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_create_bo *args = data;
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_gem_shmem_object *shmem_obj;
+ struct rocket_gem_object *rkt_obj;
+ struct drm_gem_object *gem_obj;
+ struct sg_table *sgt;
+ int ret;
+
+ shmem_obj = drm_gem_shmem_create(dev, args->size);
+ if (IS_ERR(shmem_obj))
+ return PTR_ERR(shmem_obj);
+
+ gem_obj = &shmem_obj->base;
+ rkt_obj = to_rocket_bo(gem_obj);
+
+ rkt_obj->size = args->size;
+ rkt_obj->offset = 0;
+
+ ret = drm_gem_handle_create(file, gem_obj, &args->handle);
+ drm_gem_object_put(gem_obj);
+
+ guard(mutex)(&rdev->iommu_lock);
+
+ if (ret)
+ goto err;
+
+ /* This will map the pages to the IOMMU linked to core 0 */
+ sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
+ if (IS_ERR(sgt)) {
+ ret = PTR_ERR(sgt);
+ goto err;
+ }
+
+ /* Map the pages to the IOMMUs linked to the other cores, so all cores can access this BO */
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ ret = iommu_map_sgtable(iommu_get_domain_for_dev(rdev->cores[core].dev),
+ sgt->sgl->dma_address,
+ sgt,
+ IOMMU_READ | IOMMU_WRITE);
+ if (ret < 0 || ret < args->size) {
+ drm_err(dev, "failed to map buffer: size=%d request_size=%u\n",
+ ret, args->size);
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ /* iommu_map_sgtable might have aligned the size */
+ rkt_obj->size = ret;
+
+ dma_sync_sgtable_for_device(rdev->cores[core].dev, shmem_obj->sgt,
+ DMA_BIDIRECTIONAL);
+ }
+
+ args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
+ args->dma_address = sg_dma_address(shmem_obj->sgt->sgl);
+
+ return 0;
+
+err:
+ drm_gem_shmem_object_free(gem_obj);
+
+ return ret;
+}
diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
new file mode 100644
index 0000000000000000000000000000000000000000..41497554366961cfe18cf6c7e93ab1e4e5dc1886
--- /dev/null
+++ b/drivers/accel/rocket/rocket_gem.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_GEM_H__
+#define __ROCKET_GEM_H__
+
+#include <drm/drm_gem_shmem_helper.h>
+
+struct rocket_gem_object {
+ struct drm_gem_shmem_object base;
+
+ size_t size;
+ u32 offset;
+};
+
+struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
+
+int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
+static inline
+struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
+{
+ return container_of(to_drm_gem_shmem_obj(obj), struct rocket_gem_object, base);
+}
+
+#endif
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
new file mode 100644
index 0000000000000000000000000000000000000000..95720702b7c4413d72b89c1f0f59abb22dc8c6b3
--- /dev/null
+++ b/include/uapi/drm/rocket_accel.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Tomeu Vizoso
+ */
+#ifndef __DRM_UAPI_ROCKET_ACCEL_H__
+#define __DRM_UAPI_ROCKET_ACCEL_H__
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_ROCKET_CREATE_BO 0x00
+
+#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
+
+/**
+ * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
+ *
+ */
+struct drm_rocket_create_bo {
+ /** Input: Size of the requested BO. */
+ __u32 size;
+
+ /** Output: GEM handle for the BO. */
+ __u32 handle;
+
+ /**
+ * Output: DMA address for the BO in the NPU address space. This address
+ * is private to the DRM fd and is valid for the lifetime of the GEM
+ * handle.
+ */
+ __u64 dma_address;
+
+ /** Output: Offset into the drm node to use for subsequent mmap call. */
+ __u64 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* __DRM_UAPI_ROCKET_ACCEL_H__ */
--
2.49.0
* [PATCH v6 07/10] accel/rocket: Add job submission IOCTL
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (4 preceding siblings ...)
2025-06-04 7:57 ` [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
` (2 subsequent siblings)
8 siblings, 0 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This uses the DRM GPU scheduler infrastructure, with one scheduler per
core.
Userspace can request that a series of tasks be executed sequentially on
the same core, so SRAM locality can be taken advantage of.
The job submission code was initially based on Panfrost.
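As a rough userspace sketch (not part of this patch): each task points at a
buffer of register commands that the front-end unit plays back. Only the
fields visible in this patch are used here; any other fields of the final
uapi (which the full patch adds to rocket_accel.h) are omitted, and the
variable names are illustrative:

    #include <stdint.h>
    #include <drm/rocket_accel.h>

    /*
     * "regcmd_addr" is the NPU-side dma_address of a BO holding the
     * register commands, as returned by the CREATE_BO ioctl.
     */
    struct drm_rocket_task task = {
            .regcmd = regcmd_addr,
            .regcmd_count = num_regcmds, /* must be non-zero */
            .reserved = 0,               /* must be zero */
    };

    struct drm_rocket_job job = {
            .tasks = (__u64)(uintptr_t)&task,
            .task_count = 1,
    };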
v2:
- Remove hardcoded number of cores
- Misc. style fixes (Jeffrey Hugo)
- Repack IOCTL struct (Jeffrey Hugo)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
- Use drm_* logging functions (Thomas Zimmermann)
- Rename reg i/o macros (Thomas Zimmermann)
- Add padding to ioctls and check for zero (Jeff Hugo)
- Improve error handling (Nicolas Frattaroli)
v6:
- Use mutexes guard (Markus Elfring)
- Use u64_to_user_ptr (Jeff Hugo)
- Drop rocket_fence (Rob Herring)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
drivers/accel/rocket/Makefile | 3 +-
drivers/accel/rocket/rocket_core.c | 10 +
drivers/accel/rocket/rocket_core.h | 14 +
drivers/accel/rocket/rocket_device.c | 2 +
drivers/accel/rocket/rocket_device.h | 2 +
drivers/accel/rocket/rocket_drv.c | 15 +
drivers/accel/rocket/rocket_drv.h | 4 +
drivers/accel/rocket/rocket_job.c | 694 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_job.h | 50 +++
include/uapi/drm/rocket_accel.h | 64 ++++
10 files changed, 857 insertions(+), 1 deletion(-)
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
index 4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f..3713dfe223d6ec6293ced3ef9291af2f3d144131 100644
--- a/drivers/accel/rocket/Makefile
+++ b/drivers/accel/rocket/Makefile
@@ -6,4 +6,5 @@ rocket-y := \
rocket_core.o \
rocket_device.o \
rocket_drv.o \
- rocket_gem.o
+ rocket_gem.o \
+ rocket_job.o
diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index 3a6f25f2b4103075102739588bcdad96510e2a4e..b57e10d9938c0f71d0107841244ec969ca9e30e1 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -8,6 +8,7 @@
#include <linux/pm_runtime.h>
#include "rocket_core.h"
+#include "rocket_job.h"
int rocket_core_init(struct rocket_core *core)
{
@@ -38,6 +39,10 @@ int rocket_core_init(struct rocket_core *core)
return PTR_ERR(core->core_iomem);
}
+ err = rocket_job_init(core);
+ if (err)
+ return err;
+
pm_runtime_use_autosuspend(dev);
/*
@@ -51,6 +56,10 @@ int rocket_core_init(struct rocket_core *core)
pm_runtime_enable(dev);
err = pm_runtime_get_sync(dev);
+ if (err) {
+ rocket_job_fini(core);
+ return err;
+ }
version = rocket_pc_readl(core, VERSION);
version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
@@ -67,4 +76,5 @@ void rocket_core_fini(struct rocket_core *core)
{
pm_runtime_dont_use_autosuspend(core->dev);
pm_runtime_disable(core->dev);
+ rocket_job_fini(core);
}
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 1b1beb9798f03ec2ca325496a4d894674d0b798d..de5fb4e26d4542bda8abf6ab8d4bd562755d547e 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -37,6 +37,20 @@ struct rocket_core {
void __iomem *cna_iomem;
void __iomem *core_iomem;
struct clk_bulk_data clks[4];
+
+ struct rocket_job *in_flight_job;
+
+ spinlock_t job_lock;
+
+ struct {
+ struct workqueue_struct *wq;
+ struct work_struct work;
+ atomic_t pending;
+ } reset;
+
+ struct drm_gpu_scheduler sched;
+ u64 fence_context;
+ u64 emit_seqno;
};
int rocket_core_init(struct rocket_core *core);
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index 5e559104741af22c528914c96e44558323ab6c89..8f5c99aeaa6118c406cc570f7d4747cd1cb1c082 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -18,12 +18,14 @@ int rocket_device_init(struct rocket_device *rdev)
return err;
mutex_init(&rdev->iommu_lock);
+ mutex_init(&rdev->sched_lock);
return 0;
}
void rocket_device_fini(struct rocket_device *rdev)
{
+ mutex_destroy(&rdev->sched_lock);
mutex_destroy(&rdev->iommu_lock);
rocket_core_fini(&rdev->cores[0]);
}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -13,6 +13,8 @@
struct rocket_device {
struct drm_device ddev;
+ struct mutex sched_lock;
+
struct mutex iommu_lock;
struct rocket_core *cores;
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 209ef342bfa4993db793a2745dcd353b4ef84eb1..e6e257e01cd83a18abc2817135ffd97abe3104b3 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -19,12 +19,14 @@
#include "rocket_drv.h"
#include "rocket_gem.h"
+#include "rocket_job.h"
static int
rocket_open(struct drm_device *dev, struct drm_file *file)
{
struct rocket_device *rdev = to_rocket_device(dev);
struct rocket_file_priv *rocket_priv;
+ int ret;
rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
if (!rocket_priv)
@@ -33,7 +35,15 @@ rocket_open(struct drm_device *dev, struct drm_file *file)
rocket_priv->rdev = rdev;
file->driver_priv = rocket_priv;
+ ret = rocket_job_open(rocket_priv);
+ if (ret)
+ goto err_free;
+
return 0;
+
+err_free:
+ kfree(rocket_priv);
+ return ret;
}
static void
@@ -41,6 +51,7 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file)
{
struct rocket_file_priv *rocket_priv = file->driver_priv;
+ rocket_job_close(rocket_priv);
kfree(rocket_priv);
}
@@ -49,6 +60,7 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
ROCKET_IOCTL(CREATE_BO, create_bo),
+ ROCKET_IOCTL(SUBMIT, submit),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
@@ -257,6 +269,9 @@ static int rocket_device_runtime_suspend(struct device *dev)
if (core < 0)
return -ENODEV;
+ if (!rocket_job_is_idle(&rdev->cores[core]))
+ return -EBUSY;
+
clk_bulk_disable_unprepare(ARRAY_SIZE(rdev->cores[core].clks), rdev->cores[core].clks);
return 0;
diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
index bd3a697ab7c8e378967ce638b04d7d86845b53c7..b4055cfad6bd431b7c59b0848653748ab945615c 100644
--- a/drivers/accel/rocket/rocket_drv.h
+++ b/drivers/accel/rocket/rocket_drv.h
@@ -4,10 +4,14 @@
#ifndef __ROCKET_DRV_H__
#define __ROCKET_DRV_H__
+#include <drm/gpu_scheduler.h>
+
#include "rocket_device.h"
struct rocket_file_priv {
struct rocket_device *rdev;
+
+ struct drm_sched_entity sched_entity;
};
#endif
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
new file mode 100644
index 0000000000000000000000000000000000000000..e5f04e2eff75cae2ac74a7c34b6591e2167e9a56
--- /dev/null
+++ b/drivers/accel/rocket/rocket_job.c
@@ -0,0 +1,694 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+/* Copyright 2019 Collabora ltd. */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_print.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <drm/rocket_accel.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_core.h"
+#include "rocket_device.h"
+#include "rocket_drv.h"
+#include "rocket_job.h"
+#include "rocket_registers.h"
+
+#define JOB_TIMEOUT_MS 500
+
+static struct rocket_job *
+to_rocket_job(struct drm_sched_job *sched_job)
+{
+ return container_of(sched_job, struct rocket_job, base);
+}
+
+static const char *rocket_fence_get_driver_name(struct dma_fence *fence)
+{
+ return "rocket";
+}
+
+static const char *rocket_fence_get_timeline_name(struct dma_fence *fence)
+{
+ return "rockchip-npu";
+}
+
+static const struct dma_fence_ops rocket_fence_ops = {
+ .get_driver_name = rocket_fence_get_driver_name,
+ .get_timeline_name = rocket_fence_get_timeline_name,
+};
+
+static struct dma_fence *rocket_fence_create(struct rocket_core *core)
+{
+ struct dma_fence *fence;
+
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence)
+ return ERR_PTR(-ENOMEM);
+
+ dma_fence_init(fence, &rocket_fence_ops, &core->job_lock,
+ core->fence_context, ++core->emit_seqno);
+
+ return fence;
+}
+
+static int
+rocket_copy_tasks(struct drm_device *dev,
+ struct drm_file *file_priv,
+ struct drm_rocket_job *job,
+ struct rocket_job *rjob)
+{
+ struct drm_rocket_task *tasks;
+ int ret = 0;
+ int i;
+
+ rjob->task_count = job->task_count;
+
+ if (!rjob->task_count)
+ return 0;
+
+ tasks = kvmalloc_array(rjob->task_count, sizeof(*tasks), GFP_KERNEL);
+ if (!tasks) {
+ ret = -ENOMEM;
+ drm_dbg(dev, "Failed to allocate incoming tasks\n");
+ goto fail;
+ }
+
+ if (copy_from_user(tasks, u64_to_user_ptr(job->tasks), rjob->task_count * sizeof(*tasks))) {
+ ret = -EFAULT;
+ drm_dbg(dev, "Failed to copy incoming tasks\n");
+ goto fail;
+ }
+
+ rjob->tasks = kvmalloc_array(job->task_count, sizeof(*rjob->tasks), GFP_KERNEL);
+ if (!rjob->tasks) {
+ drm_dbg(dev, "Failed to allocate task array\n");
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ for (i = 0; i < rjob->task_count; i++) {
+		if (tasks[i].reserved != 0) {
+			drm_dbg(dev, "Reserved field in drm_rocket_task struct should be 0.\n");
+			ret = -EINVAL;
+			goto fail;
+		}
+
+ if (tasks[i].regcmd_count == 0) {
+ ret = -EINVAL;
+ goto fail;
+ }
+ rjob->tasks[i].regcmd = tasks[i].regcmd;
+ rjob->tasks[i].regcmd_count = tasks[i].regcmd_count;
+ }
+
+fail:
+ kvfree(tasks);
+ return ret;
+}
+
+static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
+{
+ struct rocket_task *task;
+	bool task_pp_en = true;
+	u32 task_count = 1;
+
+ /* GO ! */
+
+ /* Don't queue the job if a reset is in progress */
+ if (!atomic_read(&core->reset.pending)) {
+ task = &job->tasks[job->next_task_idx];
+ job->next_task_idx++; /* TODO: Do this only after a successful run? */
+
+ rocket_pc_writel(core, BASE_ADDRESS, 0x1);
+
+ rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
+ rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
+
+ rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
+ rocket_pc_writel(core, REGISTER_AMOUNTS, (task->regcmd_count + 1) / 2 - 1);
+
+ rocket_pc_writel(core, INTERRUPT_MASK,
+ PC_INTERRUPT_MASK_DPU_0 | PC_INTERRUPT_MASK_DPU_1);
+ rocket_pc_writel(core, INTERRUPT_CLEAR,
+ PC_INTERRUPT_CLEAR_DPU_0 | PC_INTERRUPT_CLEAR_DPU_1);
+
+ rocket_pc_writel(core, TASK_CON, ((0x6 | task_pp_en) << 12) | task_count);
+
+ rocket_pc_writel(core, TASK_DMA_BASE_ADDR, 0x0);
+
+ rocket_pc_writel(core, OPERATION_ENABLE, 0x1);
+
+ dev_dbg(core->dev,
+ "Submitted regcmd at 0x%llx to core %d",
+ task->regcmd, core->index);
+ }
+}
+
+static int rocket_acquire_object_fences(struct drm_gem_object **bos,
+ int bo_count,
+ struct drm_sched_job *job,
+ bool is_write)
+{
+ int i, ret;
+
+ for (i = 0; i < bo_count; i++) {
+ ret = dma_resv_reserve_fences(bos[i]->resv, 1);
+ if (ret)
+ return ret;
+
+ ret = drm_sched_job_add_implicit_dependencies(job, bos[i],
+ is_write);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void rocket_attach_object_fences(struct drm_gem_object **bos,
+ int bo_count,
+ struct dma_fence *fence)
+{
+ int i;
+
+ for (i = 0; i < bo_count; i++)
+ dma_resv_add_fence(bos[i]->resv, fence, DMA_RESV_USAGE_WRITE);
+}
+
+static int rocket_job_push(struct rocket_job *job)
+{
+ struct rocket_device *rdev = job->rdev;
+ struct drm_gem_object **bos;
+ struct ww_acquire_ctx acquire_ctx;
+ int ret = 0;
+
+	bos = kvmalloc_array(job->in_bo_count + job->out_bo_count, sizeof(void *),
+			     GFP_KERNEL);
+	if (!bos)
+		return -ENOMEM;
+
+	memcpy(bos, job->in_bos, job->in_bo_count * sizeof(void *));
+	memcpy(&bos[job->in_bo_count], job->out_bos, job->out_bo_count * sizeof(void *));
+
+ ret = drm_gem_lock_reservations(bos, job->in_bo_count + job->out_bo_count, &acquire_ctx);
+ if (ret)
+ goto err;
+
+ scoped_guard(mutex, &rdev->sched_lock) {
+ drm_sched_job_arm(&job->base);
+
+ job->inference_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+ ret = rocket_acquire_object_fences(job->in_bos, job->in_bo_count, &job->base, false);
+ if (ret)
+ goto err_unlock;
+
+ ret = rocket_acquire_object_fences(job->out_bos, job->out_bo_count, &job->base, true);
+ if (ret)
+ goto err_unlock;
+
+ kref_get(&job->refcount); /* put by scheduler job completion */
+
+ drm_sched_entity_push_job(&job->base);
+ }
+
+ rocket_attach_object_fences(job->out_bos, job->out_bo_count, job->inference_done_fence);
+
+err_unlock:
+ drm_gem_unlock_reservations(bos, job->in_bo_count + job->out_bo_count, &acquire_ctx);
+err:
+	kvfree(bos);
+
+ return ret;
+}
+
+static void rocket_job_cleanup(struct kref *ref)
+{
+ struct rocket_job *job = container_of(ref, struct rocket_job,
+ refcount);
+ unsigned int i;
+
+ dma_fence_put(job->done_fence);
+ dma_fence_put(job->inference_done_fence);
+
+ if (job->in_bos) {
+ for (i = 0; i < job->in_bo_count; i++)
+ drm_gem_object_put(job->in_bos[i]);
+
+ kvfree(job->in_bos);
+ }
+
+ if (job->out_bos) {
+ for (i = 0; i < job->out_bo_count; i++)
+ drm_gem_object_put(job->out_bos[i]);
+
+ kvfree(job->out_bos);
+ }
+
+	kvfree(job->tasks);
+
+ kfree(job);
+}
+
+static void rocket_job_put(struct rocket_job *job)
+{
+ kref_put(&job->refcount, rocket_job_cleanup);
+}
+
+static void rocket_job_free(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+
+ drm_sched_job_cleanup(sched_job);
+
+ rocket_job_put(job);
+}
+
+static struct rocket_core *sched_to_core(struct rocket_device *rdev,
+ struct drm_gpu_scheduler *sched)
+{
+ unsigned int core;
+
+ for (core = 0; core < rdev->num_cores; core++) {
+ if (&rdev->cores[core].sched == sched)
+ return &rdev->cores[core];
+ }
+
+ return NULL;
+}
+
+static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+ struct rocket_device *rdev = job->rdev;
+ struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
+ struct dma_fence *fence = NULL;
+ int ret;
+
+ if (unlikely(job->base.s_fence->finished.error))
+ return NULL;
+
+ /*
+ * Nothing to execute: can happen if the job has finished while
+	 * we were resetting the NPU.
+ */
+ if (job->next_task_idx == job->task_count)
+ return NULL;
+
+ fence = rocket_fence_create(core);
+ if (IS_ERR(fence))
+ return fence;
+
+ if (job->done_fence)
+ dma_fence_put(job->done_fence);
+ job->done_fence = dma_fence_get(fence);
+
+	ret = pm_runtime_resume_and_get(core->dev);
+	if (ret < 0)
+		return fence;
+
+ scoped_guard(spinlock, &core->job_lock) {
+ core->in_flight_job = job;
+ rocket_job_hw_submit(core, job);
+ }
+
+ return fence;
+}
+
+static void rocket_job_handle_done(struct rocket_core *core,
+ struct rocket_job *job)
+{
+ if (job->next_task_idx < job->task_count) {
+ rocket_job_hw_submit(core, job);
+ return;
+ }
+
+ core->in_flight_job = NULL;
+ dma_fence_signal_locked(job->done_fence);
+ pm_runtime_put_autosuspend(core->dev);
+}
+
+static void rocket_job_handle_irq(struct rocket_core *core)
+{
+ u32 status, raw_status;
+
+ pm_runtime_mark_last_busy(core->dev);
+
+ status = rocket_pc_readl(core, INTERRUPT_STATUS);
+ raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
+
+ rocket_pc_writel(core, OPERATION_ENABLE, 0x0);
+ rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff);
+
+ scoped_guard(spinlock, &core->job_lock)
+ if (core->in_flight_job)
+ rocket_job_handle_done(core, core->in_flight_job);
+}
+
+static void
+rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
+{
+ bool cookie;
+
+ if (!atomic_read(&core->reset.pending))
+ return;
+
+ /*
+ * Stop the scheduler.
+ *
+ * FIXME: We temporarily get out of the dma_fence_signalling section
+ * because the cleanup path generate lockdep splats when taking locks
+ * to release job resources. We should rework the code to follow this
+ * pattern:
+ *
+ * try_lock
+ * if (locked)
+ * release
+ * else
+ * schedule_work_to_release_later
+ */
+ drm_sched_stop(&core->sched, bad);
+
+ cookie = dma_fence_begin_signalling();
+
+ if (bad)
+ drm_sched_increase_karma(bad);
+
+ /*
+ * Mask job interrupts and synchronize to make sure we won't be
+ * interrupted during our reset.
+ */
+ rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
+ synchronize_irq(core->irq);
+
+ /* Handle the remaining interrupts before we reset. */
+ rocket_job_handle_irq(core);
+
+	/*
+	 * Remaining interrupts have been handled, but we might still have
+	 * stuck jobs. Let's make sure the PM counters stay balanced by
+	 * manually calling pm_runtime_put_noidle() for each stuck job.
+	 */
+ scoped_guard(spinlock, &core->job_lock) {
+ if (core->in_flight_job)
+ pm_runtime_put_noidle(core->dev);
+
+ core->in_flight_job = NULL;
+ }
+
+ /* Proceed with reset now. */
+ pm_runtime_force_suspend(core->dev);
+ pm_runtime_force_resume(core->dev);
+
+	/* NPU has been reset, we can clear the reset pending bit. */
+ atomic_set(&core->reset.pending, 0);
+
+ /*
+ * Now resubmit jobs that were previously queued but didn't have a
+ * chance to finish.
+ * FIXME: We temporarily get out of the DMA fence signalling section
+ * while resubmitting jobs because the job submission logic will
+ * allocate memory with the GFP_KERNEL flag which can trigger memory
+ * reclaim and exposes a lock ordering issue.
+ */
+ dma_fence_end_signalling(cookie);
+ drm_sched_resubmit_jobs(&core->sched);
+ cookie = dma_fence_begin_signalling();
+
+ /* Restart the scheduler */
+ drm_sched_start(&core->sched, 0);
+
+ dma_fence_end_signalling(cookie);
+}
+
+static enum drm_gpu_sched_stat rocket_job_timedout(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+ struct rocket_device *rdev = job->rdev;
+ struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
+
+ /*
+	 * If the NPU managed to complete this job's fence, the timeout is
+ * spurious. Bail out.
+ */
+ if (dma_fence_is_signaled(job->done_fence))
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+
+ /*
+ * Rocket IRQ handler may take a long time to process an interrupt
+ * if there is another IRQ handler hogging the processing.
+ * For example, the HDMI encoder driver might be stuck in the IRQ
+ * handler for a significant time in a case of bad cable connection.
+ * In order to catch such cases and not report spurious rocket
+ * job timeouts, synchronize the IRQ handler and re-check the fence
+ * status.
+ */
+ synchronize_irq(core->irq);
+
+ if (dma_fence_is_signaled(job->done_fence)) {
+ dev_warn(core->dev, "unexpectedly high interrupt latency\n");
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+ }
+
+ dev_err(core->dev, "gpu sched timeout");
+
+ atomic_set(&core->reset.pending, 1);
+ rocket_reset(core, sched_job);
+
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+}
+
+static void rocket_reset_work(struct work_struct *work)
+{
+ struct rocket_core *core;
+
+ core = container_of(work, struct rocket_core, reset.work);
+ rocket_reset(core, NULL);
+}
+
+static const struct drm_sched_backend_ops rocket_sched_ops = {
+ .run_job = rocket_job_run,
+ .timedout_job = rocket_job_timedout,
+ .free_job = rocket_job_free
+};
+
+static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
+{
+ struct rocket_core *core = data;
+
+ rocket_job_handle_irq(core);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t rocket_job_irq_handler(int irq, void *data)
+{
+ struct rocket_core *core = data;
+ u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
+
+	WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
+	WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);
+
+ if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
+ raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
+ return IRQ_NONE;
+
+ rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
+
+ return IRQ_WAKE_THREAD;
+}
+
+int rocket_job_init(struct rocket_core *core)
+{
+ struct drm_sched_init_args args = {
+ .ops = &rocket_sched_ops,
+ .num_rqs = DRM_SCHED_PRIORITY_COUNT,
+ .credit_limit = 1,
+ .timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
+ .name = dev_name(core->dev),
+ .dev = core->dev,
+ };
+ int ret;
+
+ INIT_WORK(&core->reset.work, rocket_reset_work);
+ spin_lock_init(&core->job_lock);
+
+ core->irq = platform_get_irq(to_platform_device(core->dev), 0);
+ if (core->irq < 0)
+ return core->irq;
+
+ ret = devm_request_threaded_irq(core->dev, core->irq,
+ rocket_job_irq_handler,
+ rocket_job_irq_handler_thread,
+ IRQF_SHARED, KBUILD_MODNAME "-job",
+ core);
+ if (ret) {
+ dev_err(core->dev, "failed to request job irq");
+ return ret;
+ }
+
+ core->reset.wq = alloc_ordered_workqueue("rocket-reset-%d", 0, core->index);
+ if (!core->reset.wq)
+ return -ENOMEM;
+
+ core->fence_context = dma_fence_context_alloc(1);
+
+ args.timeout_wq = core->reset.wq;
+ ret = drm_sched_init(&core->sched, &args);
+ if (ret) {
+ dev_err(core->dev, "Failed to create scheduler: %d.", ret);
+ goto err_sched;
+ }
+
+ return 0;
+
+err_sched:
+	destroy_workqueue(core->reset.wq);
+ return ret;
+}
+
+void rocket_job_fini(struct rocket_core *core)
+{
+ drm_sched_fini(&core->sched);
+
+ cancel_work_sync(&core->reset.work);
+ destroy_workqueue(core->reset.wq);
+}
+
+int rocket_job_open(struct rocket_file_priv *rocket_priv)
+{
+	struct rocket_device *rdev = rocket_priv->rdev;
+	struct drm_gpu_scheduler **scheds = kmalloc_array(rdev->num_cores, sizeof(*scheds),
+							  GFP_KERNEL);
+	unsigned int core;
+	int ret;
+
+	if (!scheds)
+		return -ENOMEM;
+
+	for (core = 0; core < rdev->num_cores; core++)
+		scheds[core] = &rdev->cores[core].sched;
+
+	ret = drm_sched_entity_init(&rocket_priv->sched_entity,
+				    DRM_SCHED_PRIORITY_NORMAL,
+				    scheds,
+				    rdev->num_cores, NULL);
+	if (WARN_ON(ret)) {
+		kfree(scheds);
+		return ret;
+	}
+
+	return 0;
+}
+
+void rocket_job_close(struct rocket_file_priv *rocket_priv)
+{
+ struct drm_sched_entity *entity = &rocket_priv->sched_entity;
+
+	drm_sched_entity_destroy(entity);
+	kfree(entity->sched_list);
+}
+
+int rocket_job_is_idle(struct rocket_core *core)
+{
+ /* If there are any jobs in this HW queue, we're not idle */
+ if (atomic_read(&core->sched.credit_count))
+ return false;
+
+ return true;
+}
+
+static int rocket_ioctl_submit_job(struct drm_device *dev, struct drm_file *file,
+ struct drm_rocket_job *job)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct rocket_file_priv *file_priv = file->driver_priv;
+ struct rocket_job *rjob = NULL;
+ int ret = 0;
+
+ if (job->task_count == 0)
+ return -EINVAL;
+
+ rjob = kzalloc(sizeof(*rjob), GFP_KERNEL);
+ if (!rjob)
+ return -ENOMEM;
+
+ kref_init(&rjob->refcount);
+
+ rjob->rdev = rdev;
+
+ ret = drm_sched_job_init(&rjob->base,
+ &file_priv->sched_entity,
+ 1, NULL);
+ if (ret)
+ goto out_put_job;
+
+ ret = rocket_copy_tasks(dev, file, job, rjob);
+ if (ret)
+ goto out_cleanup_job;
+
+ ret = drm_gem_objects_lookup(file, u64_to_user_ptr(job->in_bo_handles),
+ job->in_bo_handle_count, &rjob->in_bos);
+ if (ret)
+ goto out_cleanup_job;
+
+ rjob->in_bo_count = job->in_bo_handle_count;
+
+ ret = drm_gem_objects_lookup(file, u64_to_user_ptr(job->out_bo_handles),
+ job->out_bo_handle_count, &rjob->out_bos);
+ if (ret)
+ goto out_cleanup_job;
+
+ rjob->out_bo_count = job->out_bo_handle_count;
+
+ ret = rocket_job_push(rjob);
+ if (ret)
+ goto out_cleanup_job;
+
+out_cleanup_job:
+ if (ret)
+ drm_sched_job_cleanup(&rjob->base);
+out_put_job:
+ rocket_job_put(rjob);
+
+ return ret;
+}
+
+int rocket_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_submit *args = data;
+ struct drm_rocket_job *jobs;
+ int ret = 0;
+ unsigned int i = 0;
+
+ if (args->reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_submit struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ jobs = kvmalloc_array(args->job_count, sizeof(*jobs), GFP_KERNEL);
+ if (!jobs) {
+ drm_dbg(dev, "Failed to allocate incoming job array\n");
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(jobs, u64_to_user_ptr(args->jobs),
+ args->job_count * sizeof(*jobs))) {
+ ret = -EFAULT;
+ drm_dbg(dev, "Failed to copy incoming job array\n");
+ goto exit;
+ }
+
+	for (i = 0; i < args->job_count; i++) {
+		if (jobs[i].reserved != 0) {
+			drm_dbg(dev, "Reserved field in drm_rocket_job struct should be 0.\n");
+			ret = -EINVAL;
+			goto exit;
+		}
+
+		ret = rocket_ioctl_submit_job(dev, file, &jobs[i]);
+		if (ret)
+			goto exit;
+	}
+
+exit:
+	kvfree(jobs);
+
+ return ret;
+}
diff --git a/drivers/accel/rocket/rocket_job.h b/drivers/accel/rocket/rocket_job.h
new file mode 100644
index 0000000000000000000000000000000000000000..99e1928fbd89f9b506c63bf9dd591124feeb54b5
--- /dev/null
+++ b/drivers/accel/rocket/rocket_job.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_JOB_H__
+#define __ROCKET_JOB_H__
+
+#include <drm/drm_drv.h>
+#include <drm/gpu_scheduler.h>
+
+#include "rocket_core.h"
+#include "rocket_drv.h"
+
+struct rocket_task {
+ u64 regcmd;
+ u32 regcmd_count;
+};
+
+struct rocket_job {
+ struct drm_sched_job base;
+
+ struct rocket_device *rdev;
+
+ struct drm_gem_object **in_bos;
+ struct drm_gem_object **out_bos;
+
+ u32 in_bo_count;
+ u32 out_bo_count;
+
+ struct rocket_task *tasks;
+ u32 task_count;
+ u32 next_task_idx;
+
+	/* Fence to be signaled by drm-sched once it's done with the job */
+ struct dma_fence *inference_done_fence;
+
+ /* Fence to be signaled by IRQ handler when the job is complete. */
+ struct dma_fence *done_fence;
+
+ struct kref refcount;
+};
+
+int rocket_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file);
+
+int rocket_job_init(struct rocket_core *core);
+void rocket_job_fini(struct rocket_core *core);
+int rocket_job_open(struct rocket_file_priv *rocket_priv);
+void rocket_job_close(struct rocket_file_priv *rocket_priv);
+int rocket_job_is_idle(struct rocket_core *core);
+
+#endif
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
index 95720702b7c4413d72b89c1f0f59abb22dc8c6b3..cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b 100644
--- a/include/uapi/drm/rocket_accel.h
+++ b/include/uapi/drm/rocket_accel.h
@@ -12,8 +12,10 @@ extern "C" {
#endif
#define DRM_ROCKET_CREATE_BO 0x00
+#define DRM_ROCKET_SUBMIT 0x01
#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
+#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
/**
* struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
@@ -37,6 +39,68 @@ struct drm_rocket_create_bo {
__u64 offset;
};
+/**
+ * struct drm_rocket_task - A task to be run on the NPU
+ *
+ * A task is the smallest unit of work that can be run on the NPU.
+ */
+struct drm_rocket_task {
+ /** Input: DMA address to NPU mapping of register command buffer */
+ __u64 regcmd;
+
+ /** Input: Number of commands in the register command buffer */
+ __u32 regcmd_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
+/**
+ * struct drm_rocket_job - A job to be run on the NPU
+ *
+ * The kernel will schedule the execution of this job taking into account its
+ * dependencies with other jobs. All tasks in the same job will be executed
+ * sequentially on the same core, to benefit from memory residency in SRAM.
+ */
+struct drm_rocket_job {
+ /** Input: Pointer to an array of struct drm_rocket_task. */
+ __u64 tasks;
+
+ /** Input: Pointer to a u32 array of the BOs that are read by the job. */
+ __u64 in_bo_handles;
+
+ /** Input: Pointer to a u32 array of the BOs that are written to by the job. */
+ __u64 out_bo_handles;
+
+ /** Input: Number of tasks passed in. */
+ __u32 task_count;
+
+ /** Input: Number of input BO handles passed in (size is that times 4). */
+ __u32 in_bo_handle_count;
+
+ /** Input: Number of output BO handles passed in (size is that times 4). */
+ __u32 out_bo_handle_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
+/**
+ * struct drm_rocket_submit - ioctl argument for submitting commands to the NPU.
+ *
+ * The kernel will schedule the execution of these jobs in dependency order.
+ */
+struct drm_rocket_submit {
+ /** Input: Pointer to an array of struct drm_rocket_job. */
+ __u64 jobs;
+
+ /** Input: Number of jobs passed in. */
+ __u32 job_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
#if defined(__cplusplus)
}
#endif
--
2.49.0
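For reference, a minimal sketch of how userspace is expected to drive the new
SUBMIT ioctl against the UAPI above (illustration only, not part of the patch:
error handling is elided, and "fd", the regcmd DMA address and the BO handles
are placeholders obtained elsewhere, e.g. via DRM_IOCTL_ROCKET_CREATE_BO):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/rocket_accel.h>

static int submit_one_task(int fd, uint64_t regcmd_dma, uint32_t regcmd_count,
			   uint32_t in_handle, uint32_t out_handle)
{
	struct drm_rocket_task task = {
		.regcmd = regcmd_dma,
		.regcmd_count = regcmd_count,
	};
	struct drm_rocket_job job = {
		.tasks = (uintptr_t)&task,
		.task_count = 1,
		.in_bo_handles = (uintptr_t)&in_handle,
		.in_bo_handle_count = 1,
		.out_bo_handles = (uintptr_t)&out_handle,
		.out_bo_handle_count = 1,
	};
	struct drm_rocket_submit args = {
		.jobs = (uintptr_t)&job,
		.job_count = 1,
	};

	/* The kernel schedules the job in dependency order (see struct docs). */
	return ioctl(fd, DRM_IOCTL_ROCKET_SUBMIT, &args);
}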
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v6 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (5 preceding siblings ...)
2025-06-04 7:57 ` [PATCH v6 07/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
8 siblings, 0 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Jeff Hugo
The NPU cores have their own access to the memory bus, and this isn't
cache coherent with the CPUs.
Add IOCTLs so userspace can mark when the caches need to be flushed, and
also when a writer job needs to be waited for before the buffer can be
accessed from the CPU.
Initially based on the same IOCTLs from the Etnaviv driver.
v2:
- Don't break UABI by reordering the IOCTL IDs (Jeff Hugo)
v3:
- Check that padding fields in IOCTLs are zero (Jeff Hugo)
v6:
- Fix conversion logic to make sure we use DMA_BIDIRECTIONAL when needed
(Lucas Stach)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
---
drivers/accel/rocket/rocket_drv.c | 2 +
drivers/accel/rocket/rocket_gem.c | 82 +++++++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_gem.h | 5 +++
include/uapi/drm/rocket_accel.h | 37 ++++++++++++++++++
4 files changed, 126 insertions(+)
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index e6e257e01cd83a18abc2817135ffd97abe3104b3..adc142facdb357c461f29b16aef77e401d87c9f6 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -61,6 +61,8 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
ROCKET_IOCTL(CREATE_BO, create_bo),
ROCKET_IOCTL(SUBMIT, submit),
+ ROCKET_IOCTL(PREP_BO, prep_bo),
+ ROCKET_IOCTL(FINI_BO, fini_bo),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
index fa07c494fd845f98ac759407677a649ed2c97727..4325e6c39d5df35601d4820cd0a5c829ade585d5 100644
--- a/drivers/accel/rocket/rocket_gem.c
+++ b/drivers/accel/rocket/rocket_gem.c
@@ -124,3 +124,85 @@ int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *
return ret;
}
+
+static inline enum dma_data_direction rocket_op_to_dma_dir(u32 op)
+{
+ op &= ROCKET_PREP_READ | ROCKET_PREP_WRITE;
+
+ if (op == ROCKET_PREP_READ)
+ return DMA_FROM_DEVICE;
+ else if (op == ROCKET_PREP_WRITE)
+ return DMA_TO_DEVICE;
+ else
+ return DMA_BIDIRECTIONAL;
+}
+
+int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_prep_bo *args = data;
+ unsigned long timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_gem_object *gem_obj;
+ struct drm_gem_shmem_object *shmem_obj;
+ bool write = !!(args->op & ROCKET_PREP_WRITE);
+ long ret = 0;
+
+ if (args->op & ~(ROCKET_PREP_READ | ROCKET_PREP_WRITE))
+ return -EINVAL;
+
+ gem_obj = drm_gem_object_lookup(file, args->handle);
+ if (!gem_obj)
+ return -ENOENT;
+
+	ret = dma_resv_wait_timeout(gem_obj->resv, dma_resv_usage_rw(write),
+				    true, timeout);
+	if (!ret)
+		ret = timeout ? -ETIMEDOUT : -EBUSY;
+	else if (ret > 0)
+		ret = 0;
+
+ shmem_obj = &to_rocket_bo(gem_obj)->base;
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ dma_sync_sgtable_for_cpu(rdev->cores[core].dev, shmem_obj->sgt,
+ rocket_op_to_dma_dir(args->op));
+ }
+
+ to_rocket_bo(gem_obj)->last_cpu_prep_op = args->op;
+
+ drm_gem_object_put(gem_obj);
+
+ return ret;
+}
+
+int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_rocket_fini_bo *args = data;
+ struct drm_gem_shmem_object *shmem_obj;
+ struct rocket_gem_object *rkt_obj;
+ struct drm_gem_object *gem_obj;
+
+ if (args->reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_fini_bo struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ gem_obj = drm_gem_object_lookup(file, args->handle);
+ if (!gem_obj)
+ return -ENOENT;
+
+ rkt_obj = to_rocket_bo(gem_obj);
+ shmem_obj = &rkt_obj->base;
+
+ WARN_ON(rkt_obj->last_cpu_prep_op == 0);
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ dma_sync_sgtable_for_device(rdev->cores[core].dev, shmem_obj->sgt,
+ rocket_op_to_dma_dir(rkt_obj->last_cpu_prep_op));
+ }
+
+ rkt_obj->last_cpu_prep_op = 0;
+
+ drm_gem_object_put(gem_obj);
+
+ return 0;
+}
diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
index 41497554366961cfe18cf6c7e93ab1e4e5dc1886..2caa268f7f496f782996c6ad2c4eb851a225a86f 100644
--- a/drivers/accel/rocket/rocket_gem.h
+++ b/drivers/accel/rocket/rocket_gem.h
@@ -11,12 +11,17 @@ struct rocket_gem_object {
size_t size;
u32 offset;
+ u32 last_cpu_prep_op;
};
struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
+int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
+int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
static inline
struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
{
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
index cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b..b5c80dd767be56e9720b51e4a82617a425a881a1 100644
--- a/include/uapi/drm/rocket_accel.h
+++ b/include/uapi/drm/rocket_accel.h
@@ -13,9 +13,13 @@ extern "C" {
#define DRM_ROCKET_CREATE_BO 0x00
#define DRM_ROCKET_SUBMIT 0x01
+#define DRM_ROCKET_PREP_BO 0x02
+#define DRM_ROCKET_FINI_BO 0x03
#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
+#define DRM_IOCTL_ROCKET_PREP_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_PREP_BO, struct drm_rocket_prep_bo)
+#define DRM_IOCTL_ROCKET_FINI_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_FINI_BO, struct drm_rocket_fini_bo)
/**
* struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
@@ -39,6 +43,39 @@ struct drm_rocket_create_bo {
__u64 offset;
};
+#define ROCKET_PREP_READ 0x01
+#define ROCKET_PREP_WRITE 0x02
+
+/**
+ * struct drm_rocket_prep_bo - ioctl argument for starting CPU ownership of the BO.
+ *
+ * Takes care of waiting for any NPU jobs that might still be accessing the BO, and performs
+ * cache synchronization.
+ */
+struct drm_rocket_prep_bo {
+ /** Input: GEM handle of the buffer object. */
+ __u32 handle;
+
+ /** Input: mask of ROCKET_PREP_x, direction of the access. */
+ __u32 op;
+
+	/** Input: Amount of time, in nanoseconds, to wait for NPU jobs. */
+ __s64 timeout_ns;
+};
+
+/**
+ * struct drm_rocket_fini_bo - ioctl argument for finishing CPU ownership of the BO.
+ *
+ * Synchronize caches for NPU access.
+ */
+struct drm_rocket_fini_bo {
+ /** Input: GEM handle of the buffer object. */
+ __u32 handle;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
/**
* struct drm_rocket_task - A task to be run on the NPU
*
--
2.49.0
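For reference, a minimal sketch of the expected userspace flow around CPU
access (illustration only, not part of the patch: "fd" and "handle" are
placeholders from DRM_IOCTL_ROCKET_CREATE_BO, and error handling is trimmed):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/rocket_accel.h>

static int read_back_results(int fd, uint32_t handle)
{
	struct drm_rocket_prep_bo prep = {
		.handle = handle,
		.op = ROCKET_PREP_READ,
		.timeout_ns = 1000000000, /* wait up to 1s for pending NPU jobs */
	};
	struct drm_rocket_fini_bo fini = {
		.handle = handle,
	};
	int ret;

	/* Wait for any writer jobs and invalidate the CPU caches. */
	ret = ioctl(fd, DRM_IOCTL_ROCKET_PREP_BO, &prep);
	if (ret)
		return ret;

	/* ... read the output tensor through the BO's mmap here ... */

	/* Flush the CPU caches and return ownership to the NPU. */
	return ioctl(fd, DRM_IOCTL_ROCKET_FINI_BO, &fini);
}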
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v6 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (6 preceding siblings ...)
2025-06-04 7:57 ` [PATCH v6 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
8 siblings, 0 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
The NPU of the RK3588 has an external supply. This supply also affects
the power domain of the NPU, not just the NPU device nodes themselves.
Since correctly modelled boards will want the power domain to be aware
of the regulator so that it doesn't always have to be on, add a label to
the NPU power domain node so board files can reference it.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
index ea831bb6e2ccc64c811f885a4964da7617c255d7..a44dfb376fdf4c29e1bd307d9a7d1621e11d8c59 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -837,7 +837,7 @@ power: power-controller {
status = "okay";
/* These power domains are grouped by VD_NPU */
- power-domain@RK3588_PD_NPU {
+ pd_npu: power-domain@RK3588_PD_NPU {
reg = <RK3588_PD_NPU>;
#power-domain-cells = <0>;
#address-cells = <1>;
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v6 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (7 preceding siblings ...)
2025-06-04 7:57 ` [PATCH v6 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
@ 2025-06-04 7:57 ` Tomeu Vizoso
8 siblings, 0 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 7:57 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
The NPU on the ROCK5B uses the same regulator for both the sram-supply
and the npu's supply. Add this regulator, and enable all the NPU bits.
Also add the regulator as a domain-supply to the pd_npu power domain.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
index d22068475c5dc6cb885f878f3f527a66edf1ba70..49500f7cbcb14af4919a6c1997e9e53a01d84973 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
@@ -316,6 +316,28 @@ regulator-state-mem {
};
};
+&i2c1 {
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c1m2_xfer>;
+ status = "okay";
+
+ vdd_npu_s0: regulator@42 {
+ compatible = "rockchip,rk8602";
+ reg = <0x42>;
+ fcs,suspend-voltage-selector = <1>;
+ regulator-name = "vdd_npu_s0";
+ regulator-boot-on;
+ regulator-min-microvolt = <550000>;
+ regulator-max-microvolt = <950000>;
+ regulator-ramp-delay = <2300>;
+ vin-supply = <&vcc5v0_sys>;
+
+ regulator-state-mem {
+ regulator-off-in-suspend;
+ };
+ };
+};
+
&i2c6 {
status = "okay";
@@ -440,6 +462,10 @@ &pd_gpu {
domain-supply = <&vdd_gpu_s0>;
};
+&pd_npu {
+ domain-supply = <&vdd_npu_s0>;
+};
+
&pinctrl {
hdmirx {
hdmirx_hpd: hdmirx-5v-detection {
@@ -500,6 +526,36 @@ &pwm1 {
status = "okay";
};
+&rknn_core_top {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_1 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_2 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_mmu_top {
+ status = "okay";
+};
+
+&rknn_mmu_1 {
+ status = "okay";
+};
+
+&rknn_mmu_2 {
+ status = "okay";
+};
+
&saradc {
vref-supply = <&avcc_1v8_s0>;
status = "okay";
--
2.49.0
^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-06-04 7:57 ` [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
@ 2025-06-04 8:25 ` Heiko Stübner
2025-06-04 8:36 ` Tomeu Vizoso
0 siblings, 1 reply; 22+ messages in thread
From: Heiko Stübner @ 2025-06-04 8:25 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo, Tomeu Vizoso
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso,
Krzysztof Kozlowski
Am Mittwoch, 4. Juni 2025, 09:57:14 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> Add the bindings for the Neural Processing Unit IP from Rockchip.
>
> v2:
> - Adapt to new node structure (one node per core, each with its own
> IOMMU)
> - Several misc. fixes from Sebastian Reichel
>
> v3:
> - Split register block in its constituent subblocks, and only require
> the ones that the kernel would ever use (Nicolas Frattaroli)
> - Group supplies (Rob Herring)
> - Explain the way in which the top core is special (Rob Herring)
>
> v4:
> - Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
> - Remove unneeded items: (Krzysztof Kozlowski)
> - Fix use of minItems/maxItems (Krzysztof Kozlowski)
> - Add reg-names to list of required properties (Krzysztof Kozlowski)
> - Fix example (Krzysztof Kozlowski)
>
> v5:
> - Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski)
> - Streamline compatible property (Krzysztof Kozlowski)
>
> v6:
> - Remove mention to NVDLA, as the hardware is only incidentally related
> (Kever Yang)
> - Mark pclk and npu clocks as required by all clocks (Rob Herring)
>
> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> ---
> .../bindings/npu/rockchip,rk3588-rknn-core.yaml | 144 +++++++++++++++++++++
> 1 file changed, 144 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> new file mode 100644
> index 0000000000000000000000000000000000000000..9a5e9e213912d0997da2f6ae26189adf044dcc7b
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> @@ -0,0 +1,144 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/npu/rockchip,rk3588-rknn-core.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Neural Processing Unit IP from Rockchip
> +
> +maintainers:
> + - Tomeu Vizoso <tomeu@tomeuvizoso.net>
> +
> +description:
> + Rockchip IP for accelerating inference of neural networks.
> +
> + There is to be a node per each core in the NPU. In Rockchip's design there
> + will be one core that is special and needs to be powered on before any of the
> + other cores can be used. This special core is called the top core and should
> + have the compatible string that corresponds to top cores.
> +
> +properties:
> + $nodename:
> + pattern: '^npu@[a-f0-9]+$'
> +
> + compatible:
> + enum:
> + - rockchip,rk3588-rknn-core-top
> + - rockchip,rk3588-rknn-core
> +
> + reg:
> + maxItems: 3
> +
> + reg-names:
> + items:
> + - const: pc
> + - const: cna
> + - const: core
> +
> + clocks:
> + maxItems: 4
> +
> + clock-names:
> + items:
> + - const: aclk
> + - const: hclk
> + - const: npu
> + - const: pclk
> +
> + interrupts:
> + maxItems: 1
> +
> + iommus:
> + maxItems: 1
> +
> + npu-supply: true
> +
> + power-domains:
> + maxItems: 1
> +
> + resets:
> + maxItems: 2
> +
> + reset-names:
> + items:
> + - const: srst_a
> + - const: srst_h
> +
> + sram-supply: true
> +
> +required:
> + - compatible
> + - reg
> + - reg-names
> + - clocks
> + - clock-names
> + - interrupts
> + - iommus
> + - power-domains
> + - resets
> + - reset-names
> + - npu-supply
> + - sram-supply
> +
> +allOf:
> + - if:
> + properties:
> + compatible:
> + contains:
> + enum:
> + - rockchip,rknn-core-top
should be rockchip,rk3588-rknn-core-top I think
> + then:
> + properties:
> + clocks:
> + minItems: 4
> +
> + clock-names:
> + minItems: 4
> + - if:
> + properties:
> + compatible:
> + contains:
> + enum:
> + - rockchip,rknn-core
should be rockchip,rk3588-rknn-core
> + then:
> + properties:
> + clocks:
> + maxItems: 2
> + clock-names:
> + maxItems: 2
Heiko
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-06-04 8:25 ` Heiko Stübner
@ 2025-06-04 8:36 ` Tomeu Vizoso
0 siblings, 0 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-04 8:36 UTC (permalink / raw)
To: Heiko Stübner
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo, devicetree, linux-arm-kernel,
linux-rockchip, linux-kernel, dri-devel, linux-doc, linux-media,
linaro-mm-sig, Krzysztof Kozlowski
On Wed, Jun 4, 2025 at 10:25 AM Heiko Stübner <heiko@sntech.de> wrote:
>
> Am Mittwoch, 4. Juni 2025, 09:57:14 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> > Add the bindings for the Neural Processing Unit IP from Rockchip.
> >
> > v2:
> > - Adapt to new node structure (one node per core, each with its own
> > IOMMU)
> > - Several misc. fixes from Sebastian Reichel
> >
> > v3:
> > - Split register block in its constituent subblocks, and only require
> > the ones that the kernel would ever use (Nicolas Frattaroli)
> > - Group supplies (Rob Herring)
> > - Explain the way in which the top core is special (Rob Herring)
> >
> > v4:
> > - Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
> > - Remove unneeded items: (Krzysztof Kozlowski)
> > - Fix use of minItems/maxItems (Krzysztof Kozlowski)
> > - Add reg-names to list of required properties (Krzysztof Kozlowski)
> > - Fix example (Krzysztof Kozlowski)
> >
> > v5:
> > - Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski)
> > - Streamline compatible property (Krzysztof Kozlowski)
> >
> > v6:
> > - Remove mention to NVDLA, as the hardware is only incidentally related
> > (Kever Yang)
> > - Mark pclk and npu clocks as required by all clocks (Rob Herring)
> >
> > Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
> > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
> > ---
> > .../bindings/npu/rockchip,rk3588-rknn-core.yaml | 144 +++++++++++++++++++++
> > 1 file changed, 144 insertions(+)
> >
> > diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..9a5e9e213912d0997da2f6ae26189adf044dcc7b
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> > @@ -0,0 +1,144 @@
> > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/npu/rockchip,rk3588-rknn-core.yaml#
> > +$schema: http://devicetree.org/meta-schemas/core.yaml#
> > +
> > +title: Neural Processing Unit IP from Rockchip
> > +
> > +maintainers:
> > + - Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > +
> > +description:
> > + Rockchip IP for accelerating inference of neural networks.
> > +
> > + There is to be a node per each core in the NPU. In Rockchip's design there
> > + will be one core that is special and needs to be powered on before any of the
> > + other cores can be used. This special core is called the top core and should
> > + have the compatible string that corresponds to top cores.
> > +
> > +properties:
> > + $nodename:
> > + pattern: '^npu@[a-f0-9]+$'
> > +
> > + compatible:
> > + enum:
> > + - rockchip,rk3588-rknn-core-top
> > + - rockchip,rk3588-rknn-core
> > +
> > + reg:
> > + maxItems: 3
> > +
> > + reg-names:
> > + items:
> > + - const: pc
> > + - const: cna
> > + - const: core
> > +
> > + clocks:
> > + maxItems: 4
> > +
> > + clock-names:
> > + items:
> > + - const: aclk
> > + - const: hclk
> > + - const: npu
> > + - const: pclk
> > +
> > + interrupts:
> > + maxItems: 1
> > +
> > + iommus:
> > + maxItems: 1
> > +
> > + npu-supply: true
> > +
> > + power-domains:
> > + maxItems: 1
> > +
> > + resets:
> > + maxItems: 2
> > +
> > + reset-names:
> > + items:
> > + - const: srst_a
> > + - const: srst_h
> > +
> > + sram-supply: true
> > +
> > +required:
> > + - compatible
> > + - reg
> > + - reg-names
> > + - clocks
> > + - clock-names
> > + - interrupts
> > + - iommus
> > + - power-domains
> > + - resets
> > + - reset-names
> > + - npu-supply
> > + - sram-supply
> > +
> > +allOf:
> > + - if:
> > + properties:
> > + compatible:
> > + contains:
> > + enum:
> > + - rockchip,rknn-core-top
>
> should be rockchip,rk3588-rknn-core-top I think
>
> > + then:
> > + properties:
> > + clocks:
> > + minItems: 4
> > +
> > + clock-names:
> > + minItems: 4
> > + - if:
> > + properties:
> > + compatible:
> > + contains:
> > + enum:
> > + - rockchip,rknn-core
>
> should be rockchip,rk3588-rknn-core
Thanks. Actually, the whole allOf section is no longer needed, as we
now know that from a resources point of view all cores are the same.
Cheers,
Tomeu
> > + then:
> > + properties:
> > + clocks:
> > + maxItems: 2
> > + clock-names:
> > + maxItems: 2
>
>
> Heiko
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64
2025-06-04 7:57 ` [PATCH v6 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
@ 2025-06-04 8:39 ` Heiko Stübner
0 siblings, 0 replies; 22+ messages in thread
From: Heiko Stübner @ 2025-06-04 8:39 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Oded Gabbay,
Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Kever Yang, Jeff Hugo, Tomeu Vizoso
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Am Mittwoch, 4. Juni 2025, 09:57:16 Mitteleuropäische Sommerzeit schrieb Tomeu Vizoso:
> Enable the nodes added in a previous commit to the rk3588s device tree.
doesn't the quartzpro64 also need a vdd_npu regulator, like the rock-5b
support at the end of the series? If not, please mention that in the commit
message.
Also, it'd make sense to collect all dts patches in one location (probably
at the bottom of the series).
Heiko
> v2:
> - Split nodes (Sebastian Reichel)
> - Sort nodes (Sebastian Reichel)
> - Add board regulators (Sebastian Reichel)
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 30 ++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
> index 78aaa6635b5d20a650aba8d8c2d0d4f498ff0d33..2e45b213c25b99571dd71ce90bc7970418f60276 100644
> --- a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
> +++ b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
> @@ -415,6 +415,36 @@ &pcie3x4 {
> status = "okay";
> };
>
> +&rknn_core_top {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_mem_s0>;
> + status = "okay";
> +};
> +
> +&rknn_core_1 {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_mem_s0>;
> + status = "okay";
> +};
> +
> +&rknn_core_2 {
> + npu-supply = <&vdd_npu_s0>;
> + sram-supply = <&vdd_npu_mem_s0>;
> + status = "okay";
> +};
> +
> +&rknn_mmu_top {
> + status = "okay";
> +};
> +
> +&rknn_mmu_1 {
> + status = "okay";
> +};
> +
> +&rknn_mmu_2 {
> + status = "okay";
> +};
> +
> &saradc {
> vref-supply = <&vcc_1v8_s0>;
> status = "okay";
>
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-04 7:57 ` [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
@ 2025-06-04 16:18 ` Daniel Stone
2025-06-04 17:03 ` Robin Murphy
0 siblings, 1 reply; 22+ messages in thread
From: Daniel Stone @ 2025-06-04 16:18 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Rob Herring, Maxime Ripard, David Airlie, Simona Vetter,
Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel,
Robin Murphy
Hi Tomeu,
I have some bad news ...
On Wed, 4 Jun 2025 at 08:57, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
> +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
> +{
> + [...]
> +
> + /* This will map the pages to the IOMMU linked to core 0 */
> + sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
> + if (IS_ERR(sgt)) {
> + ret = PTR_ERR(sgt);
> + goto err;
> + }
> +
> + /* Map the pages to the IOMMUs linked to the other cores, so all cores can access this BO */
So, uh, this is not great.
We only have a single IOMMU context (well, one per core, but one
effective VMA) for the whole device. Every BO that gets created, gets
mapped into the IOMMU until it's been destroyed. Given that there is
no client isolation and no CS validation, that means that every client
has RW access to every BO created by any other client, for the
lifetime of that BO.
I really don't think that this is tractable, given that anyone with
access to the device can exfiltrate anything that anyone else has
provided to the device.
I also don't think that CS validation is tractable tbh.
So I guess that leaves us with the third option: enforcing context
separation within the kernel driver.
The least preferable option I can think of is that rkt sets up and
tears down MMU mappings for each job, according to the BO list
provided for it. This seems like way too much overhead - especially
with RK IOMMU ops having been slow enough within DRM that we expended
a lot of effort in Weston doing caching of DRM BOs to avoid doing this
unless completely necessary. It also seems risky wrt allocating memory
in drm_sched paths to ensure forward progress.
Slightly more preferable than this would be that rkt kept a
per-context list of BOs and their VA mappings, and when switching
between different contexts, would tear down all MMU mappings from the
old context and set up mappings from the new. But this has the same
issues with drm_sched.
The most preferable option from where I sit is that we could have an
explicit notion of driver-managed IOMMU contexts, such that rkt could
prepare the IOMMU for each context, and then switching contexts at
job-run time would be a matter of changing the root DTE pointer and
issuing a flush. But I don't see that anywhere in the user-facing
IOMMU API, and I'm sure Robin (CCed) will be along shortly to explain
why it's not possible ...
Either way, I wonder if we have fully per-context mappings, userspace
should not manage IOVAs in the VM_BIND style common to newer drivers,
rather than relying on the kernel to do VA allocation and inform
userspace of them?
I'm really sorry this has come so late in the game.
Cheers,
Daniel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-04 16:18 ` Daniel Stone
@ 2025-06-04 17:03 ` Robin Murphy
2025-06-05 7:41 ` Tomeu Vizoso
0 siblings, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2025-06-04 17:03 UTC (permalink / raw)
To: Daniel Stone, Tomeu Vizoso
Cc: Rob Herring, Maxime Ripard, David Airlie, Simona Vetter,
Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel
On 2025-06-04 5:18 pm, Daniel Stone wrote:
> Hi Tomeu,
> I have some bad news ...
>
> On Wed, 4 Jun 2025 at 08:57, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
>> +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
>> +{
>> + [...]
>> +
>> + /* This will map the pages to the IOMMU linked to core 0 */
>> + sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
>> + if (IS_ERR(sgt)) {
>> + ret = PTR_ERR(sgt);
>> + goto err;
>> + }
>> +
>> + /* Map the pages to the IOMMUs linked to the other cores, so all cores can access this BO */
>
> So, uh, this is not great.
>
> We only have a single IOMMU context (well, one per core, but one
> effective VMA) for the whole device. Every BO that gets created, gets
> mapped into the IOMMU until it's been destroyed. Given that there is
> no client isolation and no CS validation, that means that every client
> has RW access to every BO created by any other client, for the
> lifetime of that BO.
>
> I really don't think that this is tractable, given that anyone with
> access to the device can exfiltrate anything that anyone else has
> provided to the device.
>
> I also don't think that CS validation is tractable tbh.
>
> So I guess that leaves us with the third option: enforcing context
> separation within the kernel driver.
>
> The least preferable option I can think of is that rkt sets up and
> tears down MMU mappings for each job, according to the BO list
> provided for it. This seems like way too much overhead - especially
> with RK IOMMU ops having been slow enough within DRM that we expended
> a lot of effort in Weston doing caching of DRM BOs to avoid doing this
> unless completely necessary. It also seems risky wrt allocating memory
> in drm_sched paths to ensure forward progress.
>
> Slightly more preferable than this would be that rkt kept a
> per-context list of BOs and their VA mappings, and when switching
> between different contexts, would tear down all MMU mappings from the
> old context and set up mappings from the new. But this has the same
> issues with drm_sched.
>
> The most preferable option from where I sit is that we could have an
> explicit notion of driver-managed IOMMU contexts, such that rkt could
> prepare the IOMMU for each context, and then switching contexts at
> job-run time would be a matter of changing the root DTE pointer and
> issuing a flush. But I don't see that anywhere in the user-facing
> IOMMU API, and I'm sure Robin (CCed) will be along shortly to explain
> why it's not possible ...
On the contrary, it's called iommu_attach_group() :)
In fact this is precisely the usage model I would suggest for this sort
of thing, and IIRC I had a similar conversation with the Ethos driver
folks a few years back. Running your own IOMMU domain is no biggie, see
several other DRM drivers (including rockchip). As long as you have a
separate struct device per NPU core then indeed it should be perfectly
straightforward to maintain distinct IOMMU domains per job, and
attach/detach them as part of scheduling the jobs on and off the cores.
Looks like rockchip-iommu supports cross-instance attach, so if
necessary you should already be OK to have multiple cores working on the
same job at once, without needing extra work at the IOMMU end.
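To make that concrete, the job-scheduling side could look something like the
below - a completely untested sketch, where rkt_context and core->active_ctx
are made-up names, and locking is left as an exercise:

	struct rkt_context {
		struct iommu_domain *domain;	/* one paging domain per client */
	};

	static int rkt_context_init(struct rocket_core *core, struct rkt_context *ctx)
	{
		/* BOs would then be iommu_map()ed into ctx->domain, not a global one. */
		ctx->domain = iommu_paging_domain_alloc(core->dev);
		return PTR_ERR_OR_ZERO(ctx->domain);
	}

	static int rkt_switch_context(struct rocket_core *core, struct rkt_context *ctx)
	{
		struct iommu_group *group = iommu_group_get(core->dev);
		int ret = 0;

		if (core->active_ctx == ctx)
			goto out;

		if (core->active_ctx)
			iommu_detach_group(core->active_ctx->domain, group);

		/* On rockchip-iommu this amounts to the DTE swap plus TLB flush. */
		ret = iommu_attach_group(ctx->domain, group);
		if (!ret)
			core->active_ctx = ctx;
	out:
		iommu_group_put(group);
		return ret;
	}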
> Either way, I wonder if we have fully per-context mappings, userspace
> should not manage IOVAs in the VM_BIND style common to newer drivers,
> rather than relying on the kernel to do VA allocation and inform
> userspace of them?
Indeed if you're using the IOMMU API directly then you need to do your
own address space management either way, so matching bits of process VA
space is pretty simple to put on the table.
Thanks,
Robin.
>
> I'm really sorry this has come so late in the game.
>
> Cheers,
> Daniel
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v6 05/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-06-04 7:57 ` [PATCH v6 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
@ 2025-06-04 18:14 ` Robin Murphy
0 siblings, 0 replies; 22+ messages in thread
From: Robin Murphy @ 2025-06-04 18:14 UTC (permalink / raw)
To: Tomeu Vizoso, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
Heiko Stuebner, Oded Gabbay, Jonathan Corbet, Maarten Lankhorst,
Maxime Ripard, Thomas Zimmermann, David Airlie, Simona Vetter,
Sumit Semwal, Christian König, Sebastian Reichel,
Nicolas Frattaroli, Kever Yang, Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig
[ Since Daniel made me look... ]
On 2025-06-04 8:57 am, Tomeu Vizoso wrote:
[...]
> diff --git a/drivers/accel/rocket/Kconfig b/drivers/accel/rocket/Kconfig
> new file mode 100644
> index 0000000000000000000000000000000000000000..9a59c6c61bf4d6460d8008b16331f001c97de67d
> --- /dev/null
> +++ b/drivers/accel/rocket/Kconfig
> @@ -0,0 +1,25 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +config DRM_ACCEL_ROCKET
> + tristate "Rocket (support for Rockchip NPUs)"
> + depends on DRM
> + depends on ARM64 || COMPILE_TEST
Best make that "(ARCH_ROCKCHIP && ARM64) || COMPILE_TEST" now before
someone else inevitably does. Or perhaps just a pre-emptive
"ARCH_ROCKCHIP || COMPILE_TEST" if this is the same NPU that's in RV1126
etc.
> + depends on MMU
> + select DRM_SCHED
> + select IOMMU_SUPPORT
Selecting user-visible symbols is often considered bad form, but this
one isn't even functional - all you're doing here is forcing the
top-level availability of all the IOMMU driver/API options.
If you really want to nanny the user and dissuade them from building a
config which is unlikely to be useful in practice, then at best maybe
"depends on ROCKCHIP_IOMMU || COMPILE_TEST", but TBH I wouldn't even
bother with that. Even if you want to rely on using the IOMMU client API
unconditionally, it'll fail decisively enough at runtime if there's no
IOMMU present (or the API is stubbed out entirely).
> + select IOMMU_IO_PGTABLE_LPAE
And I have no idea what this might think it's here for :/
Thanks,
Robin.
> + select DRM_GEM_SHMEM_HELPER
> + help
> + Choose this option if you have a Rockchip SoC that contains a
> + compatible Neural Processing Unit (NPU), such as the RK3588. Called by
> + Rockchip either RKNN or RKNPU, it accelerates inference of neural
> + networks.
> +
> + The interface exposed to userspace is described in
> + include/uapi/drm/rocket_accel.h and is used by the Rocket userspace
> + driver in Mesa3D.
> +
> + If unsure, say N.
> +
> + To compile this driver as a module, choose M here: the
> + module will be called rocket.
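To make Robin's point concrete: if the IOMMU client API is used unconditionally, a probe-time check is enough to fail decisively when no IOMMU is present. A hedged sketch; device_iommu_mapped() is the real helper, while the rocket_core names are assumptions:

#include <linux/iommu.h>

static int rocket_core_probe(struct rocket_core *core)
{
	/* Refuse to bind if no IOMMU translates this core (or if
	 * IOMMU_SUPPORT is stubbed out entirely). */
	if (!device_iommu_mapped(core->dev))
		return -ENODEV;

	return 0;
}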
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-04 17:03 ` Robin Murphy
@ 2025-06-05 7:41 ` Tomeu Vizoso
2025-06-05 12:29 ` Daniel Stone
2025-06-05 13:37 ` Robin Murphy
0 siblings, 2 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-05 7:41 UTC (permalink / raw)
To: Robin Murphy
Cc: Daniel Stone, Rob Herring, Maxime Ripard, David Airlie,
Simona Vetter, Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel
On Wed, Jun 4, 2025 at 7:03 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 2025-06-04 5:18 pm, Daniel Stone wrote:
> > Hi Tomeu,
> > I have some bad news ...
> >
> > On Wed, 4 Jun 2025 at 08:57, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
> >> +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
> >> +{
> >> + [...]
> >> +
> >> + /* This will map the pages to the IOMMU linked to core 0 */
> >> + sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
> >> + if (IS_ERR(sgt)) {
> >> + ret = PTR_ERR(sgt);
> >> + goto err;
> >> + }
> >> +
> >> + /* Map the pages to the IOMMUs linked to the other cores, so all cores can access this BO */
> >
> > So, uh, this is not great.
> >
> > We only have a single IOMMU context (well, one per core, but one
> > effective VMA) for the whole device. Every BO that gets created, gets
> > mapped into the IOMMU until it's been destroyed. Given that there is
> > no client isolation and no CS validation, that means that every client
> > has RW access to every BO created by any other client, for the
> > lifetime of that BO.
> >
> > I really don't think that this is tractable, given that anyone with
> > access to the device can exfiltrate anything that anyone else has
> > provided to the device.
> >
> > I also don't think that CS validation is tractable tbh.
> >
> > So I guess that leaves us with the third option: enforcing context
> > separation within the kernel driver.
> >
> > The least preferable option I can think of is that rkt sets up and
> > tears down MMU mappings for each job, according to the BO list
> > provided for it. This seems like way too much overhead - especially
> > with RK IOMMU ops having been slow enough within DRM that we expended
> > a lot of effort in Weston doing caching of DRM BOs to avoid doing this
> > unless absolutely necessary. It also seems risky wrt allocating memory
> > in drm_sched paths to ensure forward progress.
> >
> > Slightly more preferable than this would be that rkt kept a
> > per-context list of BOs and their VA mappings, and when switching
> > between different contexts, would tear down all MMU mappings from the
> > old context and set up mappings from the new. But this has the same
> > issues with drm_sched.
> >
> > The most preferable option from where I sit is that we could have an
> > explicit notion of driver-managed IOMMU contexts, such that rkt could
> > prepare the IOMMU for each context, and then switching contexts at
> > job-run time would be a matter of changing the root DTE pointer and
> > issuing a flush. But I don't see that anywhere in the user-facing
> > IOMMU API, and I'm sure Robin (CCed) will be along shortly to explain
> > why it's not possible ...
>
> On the contrary, it's called iommu_attach_group() :)
>
> In fact this is precisely the usage model I would suggest for this sort
> of thing, and IIRC I had a similar conversation with the Ethos driver
> folks a few years back. Running your own IOMMU domain is no biggie, see
> several other DRM drivers (including rockchip). As long as you have a
> separate struct device per NPU core then indeed it should be perfectly
> straightforward to maintain distinct IOMMU domains per job, and
> attach/detach them as part of scheduling the jobs on and off the cores.
> Looks like rockchip-iommu supports cross-instance attach, so if
> necessary you should already be OK to have multiple cores working on the
> same job at once, without needing extra work at the IOMMU end.
Ok, so if I understood it correctly, the plan would be for each DRM
client to have one IOMMU domain per core (each core has its own
IOMMU), and map all its buffers in all these domains.
Then when a job is about to be scheduled on a given core, make sure
that the IOMMU for that core uses the domain for the client that
submitted the job.
Did I get that right?
> > Either way, I wonder: if we have fully per-context mappings, shouldn't
> > userspace manage IOVAs in the VM_BIND style common to newer drivers,
> > rather than relying on the kernel to do VA allocation and inform
> > userspace of them?
>
> Indeed if you're using the IOMMU API directly then you need to do your
> own address space management either way, so matching bits of process VA
> space is pretty simple to put on the table.
My impression was that the VM_BIND facility is intended for SVM as in
OpenCL and maybe Vulkan?
Guess my question is, what would such an accelerator driver gain by
letting userspace manage the address space?
Thanks,
Tomeu
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-05 7:41 ` Tomeu Vizoso
@ 2025-06-05 12:29 ` Daniel Stone
2025-06-05 12:57 ` Tomeu Vizoso
2025-06-05 13:37 ` Robin Murphy
1 sibling, 1 reply; 22+ messages in thread
From: Daniel Stone @ 2025-06-05 12:29 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Robin Murphy, Rob Herring, Maxime Ripard, David Airlie,
Simona Vetter, Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel
Hey,
On Thu, 5 Jun 2025 at 08:41, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
> > Indeed if you're using the IOMMU API directly then you need to do your
> > own address space management either way, so matching bits of process VA
> > space is pretty simple to put on the table.
>
> My impression was that the VM_BIND facility is intended for SVM as in
> OpenCL and maybe Vulkan?
>
> Guess my question is, what would such an accelerator driver gain by
> letting userspace manage the address space?
I mean, not a lot gained, but otoh there's also not much to be gained
by using the kernel's range allocator either, and it saves userspace a
roundtrip to discover the VA for a BO it just created/mapped?
Cheers,
Daniel
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-05 12:29 ` Daniel Stone
@ 2025-06-05 12:57 ` Tomeu Vizoso
0 siblings, 0 replies; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-05 12:57 UTC (permalink / raw)
To: Daniel Stone
Cc: Robin Murphy, Rob Herring, Maxime Ripard, David Airlie,
Simona Vetter, Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel
On Thu, Jun 5, 2025 at 2:29 PM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hey,
>
> On Thu, 5 Jun 2025 at 08:41, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
> > > Indeed if you're using the IOMMU API directly then you need to do your
> > > own address space management either way, so matching bits of process VA
> > > space is pretty simple to put on the table.
> >
> > My impression was that the VM_BIND facility is intended for SVM as in
> > OpenCL and maybe Vulkan?
> >
> > Guess my question is, what would such an accelerator driver gain by
> > letting userspace manage the address space?
>
> I mean, not a lot gained, but otoh there's also not much to be gained
> by using the kernel's range allocator either, and it saves userspace a
> roundtrip to discover the VA for a BO it just created/mapped?
Oh, I just map on creation and return the VA as an out arg in the
creation ioctl.
So it just seemed the simplest approach, with the least custom code in
the driver.
Cheers,
Tomeu
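In other words, the create ioctl doubles as the map operation. A sketch of what such a UAPI could look like; the field layout here is illustrative only, not necessarily the actual include/uapi/drm/rocket_accel.h:

struct drm_rocket_create_bo {
	__u32 size;        /* in: size of the BO in bytes */
	__u32 handle;      /* out: GEM handle for the new BO */
	__u64 dma_address; /* out: IOVA at which the BO was mapped for the NPU */
	__u64 offset;      /* out: fake offset for CPU mmap of the BO */
};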
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-05 7:41 ` Tomeu Vizoso
2025-06-05 12:29 ` Daniel Stone
@ 2025-06-05 13:37 ` Robin Murphy
2025-06-05 16:32 ` Tomeu Vizoso
1 sibling, 1 reply; 22+ messages in thread
From: Robin Murphy @ 2025-06-05 13:37 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Daniel Stone, Rob Herring, Maxime Ripard, David Airlie,
Simona Vetter, Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel
On 05/06/2025 8:41 am, Tomeu Vizoso wrote:
[...]
>> In fact this is precisely the usage model I would suggest for this sort
>> of thing, and IIRC I had a similar conversation with the Ethos driver
>> folks a few years back. Running your own IOMMU domain is no biggie, see
>> several other DRM drivers (including rockchip). As long as you have a
>> separate struct device per NPU core then indeed it should be perfectly
>> straightforward to maintain distinct IOMMU domains per job, and
>> attach/detach them as part of scheduling the jobs on and off the cores.
>> Looks like rockchip-iommu supports cross-instance attach, so if
>> necessary you should already be OK to have multiple cores working on the
>> same job at once, without needing extra work at the IOMMU end.
>
> Ok, so if I understood it correctly, the plan would be for each DRM
> client to have one IOMMU domain per core (each core has its own
> IOMMU), and map all its buffers in all these domains.
>
> Then when a job is about to be scheduled on a given core, make sure
> that the IOMMU for that core uses the domain for the client that
> submitted the job.
>
> Did I get that right?
It should only need a single IOMMU domain per DRM client, so no faffing
about replicating mappings. iommu_paging_domain_alloc() does need *an*
appropriate target device so it can identify the right IOMMU driver, but
that in itself doesn't preclude attaching other devices to the resulting
domain as well as (or even instead of) the nominal one. In general, not
all IOMMU drivers support cross-instance attach so it may fail with
-EINVAL, and *that*'s when you might need to fall back to allocating
multiple per-instance domains - but as I say since this is a
Rockchip-specific driver where the IOMMU *is* intended to support this
already, you don't need to bother.
Thanks,
Robin.
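A condensed sketch of the allocation scheme Robin describes, assuming a rocket_device that keeps one struct device per NPU core (names hypothetical):

#include <linux/iommu.h>

/* One paging domain per DRM client. Any core works as the nominal
 * device: it only serves to identify the IOMMU driver behind it, and
 * the other cores' IOMMU instances can attach to the same domain. */
static struct iommu_domain *
rocket_client_domain_alloc(struct rocket_device *rdev)
{
	return iommu_paging_domain_alloc(rdev->cores[0].dev);
}

/* On IOMMU drivers without cross-instance attach, attaching a second
 * instance would fail with -EINVAL and a fallback to per-instance
 * domains would be needed; per Robin, rockchip-iommu doesn't need it. */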
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-05 13:37 ` Robin Murphy
@ 2025-06-05 16:32 ` Tomeu Vizoso
2025-06-05 16:36 ` Daniel Stone
0 siblings, 1 reply; 22+ messages in thread
From: Tomeu Vizoso @ 2025-06-05 16:32 UTC (permalink / raw)
To: Robin Murphy
Cc: Daniel Stone, Rob Herring, Maxime Ripard, David Airlie,
Simona Vetter, Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel
On Thu, Jun 5, 2025 at 3:37 PM Robin Murphy <robin.murphy@arm.com> wrote:
>
> On 05/06/2025 8:41 am, Tomeu Vizoso wrote:
> [...]
> >> In fact this is precisely the usage model I would suggest for this sort
> >> of thing, and IIRC I had a similar conversation with the Ethos driver
> >> folks a few years back. Running your own IOMMU domain is no biggie, see
> >> several other DRM drivers (including rockchip). As long as you have a
> >> separate struct device per NPU core then indeed it should be perfectly
> >> straightforward to maintain distinct IOMMU domains per job, and
> >> attach/detach them as part of scheduling the jobs on and off the cores.
> >> Looks like rockchip-iommu supports cross-instance attach, so if
> >> necessary you should already be OK to have multiple cores working on the
> >> same job at once, without needing extra work at the IOMMU end.
> >
> > Ok, so if I understood it correctly, the plan would be for each DRM
> > client to have one IOMMU domain per core (each core has its own
> > IOMMU), and map all its buffers in all these domains.
> >
> > Then when a job is about to be scheduled on a given core, make sure
> > that the IOMMU for that core uses the domain for the client that
> > submitted the job.
> >
> > Did I get that right?
>
> It should only need a single IOMMU domain per DRM client, so no faffing
> about replicating mappings. iommu_paging_domain_alloc() does need *an*
> appropriate target device so it can identify the right IOMMU driver, but
> that in itself doesn't preclude attaching other devices to the resulting
> domain as well as (or even instead of) the nominal one. In general, not
> all IOMMU drivers support cross-instance attach so it may fail with
> -EINVAL, and *that*'s when you might need to fall back to allocating
> multiple per-instance domains - but as I say since this is a
> Rockchip-specific driver where the IOMMU *is* intended to support this
> already, you don't need to bother.
Ok, I did just that and it's working great in my testing:
I create a domain when the client opens the DRM connection and map all
its BOs to it. Then when a job is about to start, I detach whatever
domain was attached to the core's group and attach that client's
domain.
Will send a v7 with it in a couple of days.
Thanks!
Tomeu
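A sketch of that job-run switch, assuming each core tracks the domain currently attached to its IOMMU group (all rocket_* names hypothetical):

#include <linux/iommu.h>

static int rocket_core_attach_client(struct rocket_core *core,
				     struct rocket_client *client)
{
	struct iommu_group *group = iommu_group_get(core->dev);
	int ret = 0;

	if (!group)
		return -ENODEV;

	if (core->attached_domain != client->domain) {
		/* Detach whatever client context ran here last... */
		if (core->attached_domain)
			iommu_detach_group(core->attached_domain, group);
		/* ...and attach the domain of the job's submitter. */
		ret = iommu_attach_group(client->domain, group);
		if (!ret)
			core->attached_domain = client->domain;
	}

	iommu_group_put(group);
	return ret;
}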
* Re: [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation
2025-06-05 16:32 ` Tomeu Vizoso
@ 2025-06-05 16:36 ` Daniel Stone
0 siblings, 0 replies; 22+ messages in thread
From: Daniel Stone @ 2025-06-05 16:36 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Robin Murphy, Rob Herring, Maxime Ripard, David Airlie,
Simona Vetter, Sebastian Reichel, Nicolas Frattaroli, Kever Yang,
linux-arm-kernel, linux-rockchip, linux-kernel, dri-devel
On Thu, 5 Jun 2025 at 17:32, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
> On Thu, Jun 5, 2025 at 3:37 PM Robin Murphy <robin.murphy@arm.com> wrote:
> > It should only need a single IOMMU domain per DRM client, so no faffing
> > about replicating mappings. iommu_paging_domain_alloc() does need *an*
> > appropriate target device so it can identify the right IOMMU driver, but
> > that in itself doesn't preclude attaching other devices to the resulting
> > domain as well as (or even instead of) the nominal one. In general, not
> > all IOMMU drivers support cross-instance attach so it may fail with
> > -EINVAL, and *that*'s when you might need to fall back to allocating
> > multiple per-instance domains - but as I say since this is a
> > Rockchip-specific driver where the IOMMU *is* intended to support this
> > already, you don't need to bother.
>
> Ok, I did just that and it's working great in my testing:
>
> I create a domain when the client opens the DRM connection and map all
> its BOs to it. Then when a job is about to start, I detach whatever
> domain was attached to the core's group and attach that client's
> domain.
>
> Will send a v7 with it in a couple of days.
Awesome, thanks so much for that Tomeu!
Cheers,
Daniel
Thread overview: 22+ messages
2025-06-04 7:57 [PATCH v6 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
2025-06-04 8:25 ` Heiko Stübner
2025-06-04 8:36 ` Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588-base Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
2025-06-04 8:39 ` Heiko Stübner
2025-06-04 7:57 ` [PATCH v6 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
2025-06-04 18:14 ` Robin Murphy
2025-06-04 7:57 ` [PATCH v6 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
2025-06-04 16:18 ` Daniel Stone
2025-06-04 17:03 ` Robin Murphy
2025-06-05 7:41 ` Tomeu Vizoso
2025-06-05 12:29 ` Daniel Stone
2025-06-05 12:57 ` Tomeu Vizoso
2025-06-05 13:37 ` Robin Murphy
2025-06-05 16:32 ` Tomeu Vizoso
2025-06-05 16:36 ` Daniel Stone
2025-06-04 7:57 ` [PATCH v6 07/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
2025-06-04 7:57 ` [PATCH v6 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso