* [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU
@ 2025-05-19 13:43 Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
` (9 more replies)
0 siblings, 10 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This series adds a new driver for the NPU that Rockchip includes in its
newer SoCs, developed by them on the NVDLA base.
In its current form, it supports the specific NPU in the RK3588 SoC.
The userspace driver is part of Mesa and an initial draft can be found at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
Changes in v4:
- Several fixes to DT bindings.
- Link to v3: https://lore.kernel.org/r/20250516-6-10-rocket-v3-0-7051ac9225db@tomeuvizoso.net
Changes in v3:
- Reference in the device tree only the register blocks that are
actually used.
- Several style and robustness fixes suggested in the mailing list.
- Added patches from Nicolas Frattaroli that add support to the NPU for
the Rock 5B board.
- Link to v2: https://lore.kernel.org/r/20250225-6-10-rocket-v2-0-d4dbcfafc141@tomeuvizoso.net
Changes in v2:
- Drop patch adding the rk3588 compatible to rockchip-iommu (Sebastian Reichel)
- Drop patch adding support for multiple power domains to rockchip-iommu (Sebastian Reichel)
- Link to v1: https://lore.kernel.org/r/20240612-6-10-rocket-v1-0-060e48eea250@tomeuvizoso.net
---
Nicolas Frattaroli (2):
arm64: dts: rockchip: add pd_npu label for RK3588 power domains
arm64: dts: rockchip: enable NPU on ROCK 5B
Tomeu Vizoso (8):
dt-bindings: npu: rockchip,rknn: Add bindings
arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588s
arm64: dts: rockchip: Enable the NPU on quartzpro64
accel/rocket: Add registers header
accel/rocket: Add a new driver for Rockchip's NPU
accel/rocket: Add IOCTL for BO creation
accel/rocket: Add job submission IOCTL
accel/rocket: Add IOCTLs for synchronizing memory accesses
Documentation/accel/index.rst | 1 +
Documentation/accel/rocket/index.rst | 25 +
.../bindings/npu/rockchip,rknn-core.yaml | 149 +
MAINTAINERS | 10 +
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 87 +-
.../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 30 +
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/rocket/Kconfig | 25 +
drivers/accel/rocket/Makefile | 10 +
drivers/accel/rocket/rocket_core.c | 103 +
drivers/accel/rocket/rocket_core.h | 59 +
drivers/accel/rocket/rocket_device.c | 45 +
drivers/accel/rocket/rocket_device.h | 31 +
drivers/accel/rocket/rocket_drv.c | 337 ++
drivers/accel/rocket/rocket_drv.h | 17 +
drivers/accel/rocket/rocket_gem.c | 211 +
drivers/accel/rocket/rocket_gem.h | 31 +
drivers/accel/rocket/rocket_job.c | 723 ++++
drivers/accel/rocket/rocket_job.h | 50 +
drivers/accel/rocket/rocket_registers.h | 4425 ++++++++++++++++++++
include/uapi/drm/rocket_accel.h | 145 +
23 files changed, 6571 insertions(+), 1 deletion(-)
---
base-commit: a6d708809579ed3d902465785666085ff07a1d7c
change-id: 20240612-6-10-rocket-9316defc14c7
Best regards,
--
Tomeu Vizoso <tomeu@tomeuvizoso.net>
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH v4 01/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-20 7:33 ` Krzysztof Kozlowski
2025-05-19 13:43 ` [PATCH v4 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588s Tomeu Vizoso
` (8 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Add the bindings for the Neural Processing Unit IP from Rockchip.
v2:
- Adapt to new node structure (one node per core, each with its own
IOMMU)
- Several misc. fixes from Sebastian Reichel
v3:
- Split register block in its constituent subblocks, and only require
the ones that the kernel would ever use (Nicolas Frattaroli)
- Group supplies (Rob Herring)
- Explain the way in which the top core is special (Rob Herring)
v4:
- Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
- Remove unneeded items: (Krzysztof Kozlowski)
- Fix use of minItems/maxItems (Krzysztof Kozlowski)
- Add reg-names to list of required properties (Krzysztof Kozlowski)
- Fix example (Krzysztof Kozlowski)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
---
.../bindings/npu/rockchip,rknn-core.yaml | 149 +++++++++++++++++++++
1 file changed, 149 insertions(+)
diff --git a/Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
new file mode 100644
index 0000000000000000000000000000000000000000..fafd0b01da215c7396262012988e364ef07ea137
--- /dev/null
+++ b/Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
@@ -0,0 +1,149 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/npu/rockchip,rknn-core.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Neural Processing Unit IP from Rockchip
+
+maintainers:
+ - Tomeu Vizoso <tomeu@tomeuvizoso.net>
+
+description:
+ Rockchip IP for accelerating inference of neural networks, based on NVIDIA's
+ open source NVDLA IP.
+
+ There is to be a node per each core in the NPU. In Rockchip's design there
+ will be one core that is special and needs to be powered on before any of the
+ other cores can be used. This special core is called the top core and should
+ have the compatible string that corresponds to top cores.
+
+properties:
+ $nodename:
+ pattern: '^npu@[a-f0-9]+$'
+
+ compatible:
+ oneOf:
+ - enum:
+ - rockchip,rk3588-rknn-core-top
+ - enum:
+ - rockchip,rk3588-rknn-core
+
+ reg:
+ maxItems: 3
+
+ reg-names:
+ items:
+ - const: pc
+ - const: cna
+ - const: core
+
+ clocks:
+ minItems: 2
+ maxItems: 4
+
+ clock-names:
+ items:
+ - const: aclk
+ - const: hclk
+ - const: npu
+ - const: pclk
+ minItems: 2
+
+ interrupts:
+ maxItems: 1
+
+ iommus:
+ maxItems: 1
+
+ npu-supply: true
+
+ power-domains:
+ maxItems: 1
+
+ resets:
+ maxItems: 2
+
+ reset-names:
+ items:
+ - const: srst_a
+ - const: srst_h
+
+ sram-supply: true
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - clocks
+ - clock-names
+ - interrupts
+ - iommus
+ - power-domains
+ - resets
+ - reset-names
+ - npu-supply
+ - sram-supply
+
+allOf:
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - rockchip,rknn-core-top
+ then:
+ properties:
+ clocks:
+ minItems: 4
+
+ clock-names:
+ minItems: 4
+ - if:
+ properties:
+ compatible:
+ contains:
+ enum:
+ - rockchip,rknn-core
+ then:
+ properties:
+ clocks:
+ maxItems: 2
+ clock-names:
+ maxItems: 2
+
+additionalProperties: false
+
+examples:
+ - |
+ #include <dt-bindings/clock/rockchip,rk3588-cru.h>
+ #include <dt-bindings/interrupt-controller/irq.h>
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+ #include <dt-bindings/power/rk3588-power.h>
+ #include <dt-bindings/reset/rockchip,rk3588-cru.h>
+
+ bus {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ npu@fdab0000 {
+ compatible = "rockchip,rk3588-rknn-core-top";
+ reg = <0x0 0xfdab0000 0x0 0x1000>,
+ <0x0 0xfdab1000 0x0 0x1000>,
+ <0x0 0xfdab3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+ assigned-clock-rates = <200000000>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>,
+ <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ iommus = <&rknn_mmu_top>;
+ npu-supply = <&vdd_npu_s0>;
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
+ reset-names = "srst_a", "srst_h";
+ sram-supply = <&vdd_npu_mem_s0>;
+ };
+ };
+...
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588s
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
` (7 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
See Chapter 36 "RKNN" from the RK3588 TRM (Part 1).
This is a derivative of NVIDIA's NVDLA, but with its own front-end
processor.
The IP is divided in three cores, programmed independently. The first
core though is special, requiring to be powered on before any of the
others can be used.
The IOMMU of the first core is also special in that it has two subunits
(read/write?) that need to be programmed in sync.
v2:
- Have one device for each NPU core (Sebastian Reichel)
- Have one device for each IOMMU (Sebastian Reichel)
- Correctly sort nodes (Diederik de Haas)
- Add rockchip,iommu compatible to IOMMU nodes (Sebastian Reichel)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
v4:
- Adapt to changes in bindings
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 85 +++++++++++++++++++++++++++
1 file changed, 85 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
index 1e18ad93ba0ebdad31642b88ff0f90ef4e8dc76f..f5e58851047e80b23f9ff3244692ad868ddc1ff6 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -1136,6 +1136,91 @@ power-domain@RK3588_PD_SDMMC {
};
};
+ rknn_core_top: npu@fdab0000 {
+ compatible = "rockchip,rk3588-rknn-core-top";
+ reg = <0x0 0xfdab0000 0x0 0x1000>,
+ <0x0 0xfdab1000 0x0 0x1000>,
+ <0x0 0xfdab3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>,
+ <&cru ACLK_NPU0>, <&cru HCLK_NPU0>;
+ clock-names = "aclk", "hclk", "npu", "pclk";
+ assigned-clocks = <&scmi_clk SCMI_CLK_NPU>;
+ assigned-clock-rates = <200000000>;
+ resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ iommus = <&rknn_mmu_top>;
+ status = "disabled";
+ };
+
+ rknn_mmu_top: iommu@fdab9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdab9000 0x0 0x100>,
+ <0x0 0xfdaba000 0x0 0x100>;
+ interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPUTOP>;
+ status = "disabled";
+ };
+
+ rknn_core_1: npu@fdac0000 {
+ compatible = "rockchip,rk3588-rknn-core";
+ reg = <0x0 0xfdac0000 0x0 0x1000>,
+ <0x0 0xfdac1000 0x0 0x1000>,
+ <0x0 0xfdac3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU1>, <&cru HCLK_NPU1>;
+ clock-names = "aclk", "hclk";
+ resets = <&cru SRST_A_RKNN1>, <&cru SRST_H_RKNN1>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPU1>;
+ iommus = <&rknn_mmu_1>;
+ status = "disabled";
+ };
+
+ rknn_mmu_1: iommu@fdac9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdaca000 0x0 0x100>;
+ interrupts = <GIC_SPI 111 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU1>, <&cru HCLK_NPU1>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPU1>;
+ status = "disabled";
+ };
+
+ rknn_core_2: npu@fdad0000 {
+ compatible = "rockchip,rk3588-rknn-core";
+ reg = <0x0 0xfdad0000 0x0 0x1000>,
+ <0x0 0xfdad1000 0x0 0x1000>,
+ <0x0 0xfdad3000 0x0 0x1000>;
+ reg-names = "pc", "cna", "core";
+ interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU2>, <&cru HCLK_NPU2>;
+ clock-names = "aclk", "hclk";
+ resets = <&cru SRST_A_RKNN2>, <&cru SRST_H_RKNN2>;
+ reset-names = "srst_a", "srst_h";
+ power-domains = <&power RK3588_PD_NPU2>;
+ iommus = <&rknn_mmu_2>;
+ status = "disabled";
+ };
+
+ rknn_mmu_2: iommu@fdad9000 {
+ compatible = "rockchip,rk3588-iommu", "rockchip,rk3568-iommu";
+ reg = <0x0 0xfdada000 0x0 0x100>;
+ interrupts = <GIC_SPI 112 IRQ_TYPE_LEVEL_HIGH 0>;
+ clocks = <&cru ACLK_NPU2>, <&cru HCLK_NPU2>;
+ clock-names = "aclk", "iface";
+ #iommu-cells = <0>;
+ power-domains = <&power RK3588_PD_NPU2>;
+ status = "disabled";
+ };
+
vpu121: video-codec@fdb50000 {
compatible = "rockchip,rk3588-vpu121", "rockchip,rk3568-vpu";
reg = <0x0 0xfdb50000 0x0 0x800>;
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588s Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
` (6 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Enable the nodes added in a previous commit to the rk3588s device tree.
v2:
- Split nodes (Sebastian Reichel)
- Sort nodes (Sebastian Reichel)
- Add board regulators (Sebastian Reichel)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
.../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts | 30 ++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
index 78aaa6635b5d20a650aba8d8c2d0d4f498ff0d33..2e45b213c25b99571dd71ce90bc7970418f60276 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-quartzpro64.dts
@@ -415,6 +415,36 @@ &pcie3x4 {
status = "okay";
};
+&rknn_core_top {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_core_1 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_core_2 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_mem_s0>;
+ status = "okay";
+};
+
+&rknn_mmu_top {
+ status = "okay";
+};
+
+&rknn_mmu_1 {
+ status = "okay";
+};
+
+&rknn_mmu_2 {
+ status = "okay";
+};
+
&saradc {
vref-supply = <&vcc_1v8_s0>;
status = "okay";
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 05/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (2 preceding siblings ...)
2025-05-19 13:43 ` [PATCH v4 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-20 7:37 ` Krzysztof Kozlowski
2025-05-19 13:43 ` [PATCH v4 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
` (5 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This initial version supports the NPU as shipped in the RK3588 SoC and
described in the first part of its TRM, in Chapter 36.
This NPU contains 3 independent cores that the driver can submit jobs
to.
This commit adds just hardware initialization and power management.
v2:
- Split cores and IOMMUs as independent devices (Sebastian Reichel)
- Add some documentation (Jeffrey Hugo)
- Be more explicit in the Kconfig documentation (Jeffrey Hugo)
- Remove resets, as these haven't been found useful so far (Zenghui Yu)
- Repack structs (Jeffrey Hugo)
- Use DEFINE_DRM_ACCEL_FOPS (Jeffrey Hugo)
- Use devm_drm_dev_alloc (Jeffrey Hugo)
- Use probe log helper (Jeffrey Hugo)
- Introduce UABI header in a later patch (Jeffrey Hugo)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
- Move registers header to its own commit (Thomas Zimmermann)
- Misc. cleanups (Thomas Zimmermann and Jeff Hugo)
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
- PM improvements (Nicolas Frattaroli)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
Documentation/accel/index.rst | 1 +
Documentation/accel/rocket/index.rst | 25 +++
MAINTAINERS | 10 ++
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 1 +
drivers/accel/rocket/Kconfig | 25 +++
drivers/accel/rocket/Makefile | 8 +
drivers/accel/rocket/rocket_core.c | 93 +++++++++++
drivers/accel/rocket/rocket_core.h | 45 +++++
drivers/accel/rocket/rocket_device.c | 39 +++++
drivers/accel/rocket/rocket_device.h | 27 +++
drivers/accel/rocket/rocket_drv.c | 315 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_drv.h | 13 ++
13 files changed, 603 insertions(+)
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index bc85f26533d88891dde482f91e26c99991b22869..d8fa332d60a890dbb617454d2a26d9b6f9b196aa 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -10,6 +10,7 @@ Compute Accelerators
introduction
amdxdna/index
qaic/index
+ rocket/index
.. only:: subproject and html
diff --git a/Documentation/accel/rocket/index.rst b/Documentation/accel/rocket/index.rst
new file mode 100644
index 0000000000000000000000000000000000000000..a3389f9a284c0975bc201f6e09082c01970e08a3
--- /dev/null
+++ b/Documentation/accel/rocket/index.rst
@@ -0,0 +1,25 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+ accel/rocket Rockchip NPU driver
+=====================================
+
+The accel/rocket driver supports the Neural Processing Units (NPUs) inside some
+Rockchip SoCs such as the RK3588. Rockchip calls it RKNN and sometimes RKNPU.
+
+This NPU is closely based on the NVDLA IP released by NVIDIA as open hardware in
+2018, along with open source kernel and userspace drivers.
+
+The frontend unit in Rockchip's NPU though is completely different from that in
+the open source IP, so this kernel driver is specific to Rockchip's version.
+
+The hardware is described in chapter 36 in the RK3588 TRM.
+
+This driver just powers the hardware on and off, allocates and maps buffers to
+the device and submits jobs to the frontend unit. Everything else is done in
+userspace, as a Gallium driver (also called rocket) that is part of the Mesa3D
+project.
+
+Hardware currently supported:
+
+* RK3588
\ No newline at end of file
diff --git a/MAINTAINERS b/MAINTAINERS
index 96b82704950184bd71623ff41fc4df31e4c7fe87..2d8833bf1f2db06ca624d703f19066adab2f9fde 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7263,6 +7263,16 @@ T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
F: drivers/accel/ivpu/
F: include/uapi/drm/ivpu_accel.h
+DRM ACCEL DRIVER FOR ROCKCHIP NPU
+M: Tomeu Vizoso <tomeu@tomeuvizoso.net>
+L: dri-devel@lists.freedesktop.org
+S: Supported
+T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
+F: Documentation/accel/rocket/
+F: Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
+F: drivers/accel/rocket/
+F: include/uapi/drm/rocket_accel.h
+
DRM COMPUTE ACCELERATORS DRIVERS AND FRAMEWORK
M: Oded Gabbay <ogabbay@kernel.org>
L: dri-devel@lists.freedesktop.org
diff --git a/drivers/accel/Kconfig b/drivers/accel/Kconfig
index 5b9490367a39fd12d35a8d9021768aa186c09308..bb01cebc42bf16ebf02e938040f339ff94869e33 100644
--- a/drivers/accel/Kconfig
+++ b/drivers/accel/Kconfig
@@ -28,5 +28,6 @@ source "drivers/accel/amdxdna/Kconfig"
source "drivers/accel/habanalabs/Kconfig"
source "drivers/accel/ivpu/Kconfig"
source "drivers/accel/qaic/Kconfig"
+source "drivers/accel/rocket/Kconfig"
endif
diff --git a/drivers/accel/Makefile b/drivers/accel/Makefile
index a301fb6089d4c515430175c5e2ba9190f6dc9158..ffc3fa58866616d933184a7659573cd4d4780a8d 100644
--- a/drivers/accel/Makefile
+++ b/drivers/accel/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_DRM_ACCEL_AMDXDNA) += amdxdna/
obj-$(CONFIG_DRM_ACCEL_HABANALABS) += habanalabs/
obj-$(CONFIG_DRM_ACCEL_IVPU) += ivpu/
obj-$(CONFIG_DRM_ACCEL_QAIC) += qaic/
+obj-$(CONFIG_DRM_ACCEL_ROCKET) += rocket/
\ No newline at end of file
diff --git a/drivers/accel/rocket/Kconfig b/drivers/accel/rocket/Kconfig
new file mode 100644
index 0000000000000000000000000000000000000000..9a59c6c61bf4d6460d8008b16331f001c97de67d
--- /dev/null
+++ b/drivers/accel/rocket/Kconfig
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config DRM_ACCEL_ROCKET
+ tristate "Rocket (support for Rockchip NPUs)"
+ depends on DRM
+ depends on ARM64 || COMPILE_TEST
+ depends on MMU
+ select DRM_SCHED
+ select IOMMU_SUPPORT
+ select IOMMU_IO_PGTABLE_LPAE
+ select DRM_GEM_SHMEM_HELPER
+ help
+ Choose this option if you have a Rockchip SoC that contains a
+ compatible Neural Processing Unit (NPU), such as the RK3588. Called by
+ Rockchip either RKNN or RKNPU, it accelerates inference of neural
+ networks.
+
+ The interface exposed to userspace is described in
+ include/uapi/drm/rocket_accel.h and is used by the Rocket userspace
+ driver in Mesa3D.
+
+ If unsure, say N.
+
+ To compile this driver as a module, choose M here: the
+ module will be called rocket.
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
new file mode 100644
index 0000000000000000000000000000000000000000..abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96
--- /dev/null
+++ b/drivers/accel/rocket/Makefile
@@ -0,0 +1,8 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
+
+rocket-y := \
+ rocket_core.o \
+ rocket_device.o \
+ rocket_drv.o
diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
new file mode 100644
index 0000000000000000000000000000000000000000..a947c3120558e8af90bc0730a4d30ac796d5683d
--- /dev/null
+++ b/drivers/accel/rocket/rocket_core.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <linux/clk.h>
+#include <linux/dev_printk.h>
+#include <linux/err.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_core.h"
+
+static int rocket_clk_init(struct rocket_core *core)
+{
+ struct device *dev = core->dev;
+ int err;
+
+ core->a_clk = devm_clk_get(dev, "aclk");
+ if (IS_ERR(core->a_clk)) {
+ err = PTR_ERR(core->a_clk);
+ dev_err(dev, "devm_clk_get failed %d for aclk in core %d\n", err, core->index);
+ return err;
+ }
+
+ core->h_clk = devm_clk_get(dev, "hclk");
+ if (IS_ERR(core->h_clk)) {
+ err = PTR_ERR(core->h_clk);
+ dev_err(dev, "devm_clk_get failed %d for hclk in core %d\n", err, core->index);
+ clk_disable_unprepare(core->a_clk);
+ return err;
+ }
+
+ return 0;
+}
+
+int rocket_core_init(struct rocket_core *core)
+{
+ struct device *dev = core->dev;
+ struct platform_device *pdev = to_platform_device(dev);
+ u32 version;
+ int err = 0;
+
+ err = rocket_clk_init(core);
+ if (err)
+ return err;
+
+ core->pc_iomem = devm_platform_ioremap_resource_byname(pdev, "pc");
+ if (IS_ERR(core->pc_iomem)) {
+ dev_err(dev, "couldn't find PC registers %ld\n", PTR_ERR(core->pc_iomem));
+ return PTR_ERR(core->pc_iomem);
+ }
+
+ core->cna_iomem = devm_platform_ioremap_resource_byname(pdev, "cna");
+ if (IS_ERR(core->cna_iomem)) {
+ dev_err(dev, "couldn't find CNA registers %ld\n", PTR_ERR(core->cna_iomem));
+ return PTR_ERR(core->cna_iomem);
+ }
+
+ core->core_iomem = devm_platform_ioremap_resource_byname(pdev, "core");
+ if (IS_ERR(core->core_iomem)) {
+ dev_err(dev, "couldn't find CORE registers %ld\n", PTR_ERR(core->core_iomem));
+ return PTR_ERR(core->core_iomem);
+ }
+
+ pm_runtime_use_autosuspend(dev);
+
+ /*
+ * As this NPU will be most often used as part of a media pipeline that
+ * ends presenting in a display, choose 50 ms (~3 frames at 60Hz) as an
+ * autosuspend delay as that will keep the device powered up while the
+ * pipeline is running.
+ */
+ pm_runtime_set_autosuspend_delay(dev, 50);
+
+ pm_runtime_enable(dev);
+
+ err = pm_runtime_get_sync(dev);
+
+ version = rocket_pc_read(core, VERSION);
+ version += rocket_pc_read(core, VERSION_NUM) & 0xffff;
+
+ pm_runtime_mark_last_busy(dev);
+ pm_runtime_put_autosuspend(dev);
+
+ dev_info(dev, "Rockchip NPU core %d version: %d\n", core->index, version);
+
+ return 0;
+}
+
+void rocket_core_fini(struct rocket_core *core)
+{
+ pm_runtime_dont_use_autosuspend(core->dev);
+ pm_runtime_disable(core->dev);
+}
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
new file mode 100644
index 0000000000000000000000000000000000000000..3bde8ad8e6e45f9000ee377d7a5ea9ca01f9ac53
--- /dev/null
+++ b/drivers/accel/rocket/rocket_core.h
@@ -0,0 +1,45 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_CORE_H__
+#define __ROCKET_CORE_H__
+
+#include <drm/gpu_scheduler.h>
+#include <linux/io.h>
+#include <linux/mutex_types.h>
+
+#include "rocket_registers.h"
+
+#define rocket_pc_readl(core, reg) \
+ readl((core)->pc_iomem + (REG_PC_##reg))
+#define rocket_pc_writel(core, reg, value) \
+ writel(value, (core)->pc_iomem + (REG_PC_##reg))
+
+#define rocket_cna_readl(core, reg) \
+ readl((core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
+#define rocket_cna_writel(core, reg, value) \
+ writel(value, (core)->cna_iomem + (REG_CNA_##reg) - REG_CNA_S_STATUS)
+
+#define rocket_core_readl(core, reg) \
+ readl((core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
+#define rocket_core_writel(core, reg, value) \
+ writel(value, (core)->core_iomem + (REG_CORE_##reg) - REG_CORE_S_STATUS)
+
+struct rocket_core {
+ struct device *dev;
+ struct rocket_device *rdev;
+ struct device_link *link;
+ unsigned int index;
+
+ int irq;
+ void __iomem *pc_iomem;
+ void __iomem *cna_iomem;
+ void __iomem *core_iomem;
+ struct clk *a_clk;
+ struct clk *h_clk;
+};
+
+int rocket_core_init(struct rocket_core *core);
+void rocket_core_fini(struct rocket_core *core);
+
+#endif
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
new file mode 100644
index 0000000000000000000000000000000000000000..bb469ac87d36249157f4ba9d9f7106ad558309e4
--- /dev/null
+++ b/drivers/accel/rocket/rocket_device.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <linux/clk.h>
+#include <linux/dev_printk.h>
+
+#include "rocket_device.h"
+
+int rocket_device_init(struct rocket_device *rdev)
+{
+ struct device *dev = rdev->cores[0].dev;
+ int err;
+
+ rdev->clk_npu = devm_clk_get(dev, "npu");
+ if (IS_ERR(rdev->clk_npu)) {
+ err = PTR_ERR(rdev->clk_npu);
+ dev_err(dev, "devm_clk_get failed %d for clock npu\n", err);
+ return err;
+ }
+
+ rdev->pclk = devm_clk_get(dev, "pclk");
+ if (IS_ERR(rdev->pclk)) {
+ err = PTR_ERR(rdev->pclk);
+ dev_err(dev, "devm_clk_get failed %d for clock pclk\n", err);
+ return err;
+ }
+
+ /* Initialize core 0 (top) */
+ err = rocket_core_init(&rdev->cores[0]);
+ if (err)
+ return err;
+
+ return 0;
+}
+
+void rocket_device_fini(struct rocket_device *rdev)
+{
+ rocket_core_fini(&rdev->cores[0]);
+}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
new file mode 100644
index 0000000000000000000000000000000000000000..ba2301e9302120ae338c07baa7d12dd99cb925a9
--- /dev/null
+++ b/drivers/accel/rocket/rocket_device.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_DEVICE_H__
+#define __ROCKET_DEVICE_H__
+
+#include <drm/drm_device.h>
+
+#include "rocket_core.h"
+
+struct rocket_device {
+ struct drm_device ddev;
+
+ struct clk *clk_npu;
+ struct clk *pclk;
+
+ struct rocket_core *cores;
+ unsigned int num_cores;
+};
+
+int rocket_device_init(struct rocket_device *rdev);
+void rocket_device_fini(struct rocket_device *rdev);
+
+#define to_rocket_device(drm_dev) \
+ ((struct rocket_device *)container_of(drm_dev, struct rocket_device, ddev))
+
+#endif
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
new file mode 100644
index 0000000000000000000000000000000000000000..82f4cc374bfaa92678da791849537d51bb4c0ba8
--- /dev/null
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -0,0 +1,315 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_accel.h>
+#include <drm/drm_drv.h>
+#include <drm/drm_gem.h>
+#include <drm/drm_ioctl.h>
+#include <drm/drm_of.h>
+#include <linux/clk.h>
+#include <linux/component.h>
+#include <linux/dma-mapping.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_drv.h"
+
+static int
+rocket_open(struct drm_device *dev, struct drm_file *file)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct rocket_file_priv *rocket_priv;
+
+ rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
+ if (!rocket_priv)
+ return -ENOMEM;
+
+ rocket_priv->rdev = rdev;
+ file->driver_priv = rocket_priv;
+
+ return 0;
+}
+
+static void
+rocket_postclose(struct drm_device *dev, struct drm_file *file)
+{
+ struct rocket_file_priv *rocket_priv = file->driver_priv;
+
+ kfree(rocket_priv);
+}
+
+static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
+#define ROCKET_IOCTL(n, func) \
+ DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
+};
+
+DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
+
+/*
+ * Rocket driver version:
+ * - 1.0 - initial interface
+ */
+static const struct drm_driver rocket_drm_driver = {
+ .driver_features = DRIVER_COMPUTE_ACCEL,
+ .open = rocket_open,
+ .postclose = rocket_postclose,
+ .ioctls = rocket_drm_driver_ioctls,
+ .num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
+ .fops = &rocket_accel_driver_fops,
+ .name = "rocket",
+ .desc = "rocket DRM",
+};
+
+static int rocket_drm_bind(struct device *dev)
+{
+ struct device_node *core_node;
+ struct rocket_device *rdev;
+ struct drm_device *ddev;
+ unsigned int num_cores = 1;
+ int err;
+
+ rdev = devm_drm_dev_alloc(dev, &rocket_drm_driver, struct rocket_device, ddev);
+ if (IS_ERR(rdev))
+ return PTR_ERR(rdev);
+
+ ddev = &rdev->ddev;
+ dev_set_drvdata(dev, rdev);
+
+ for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core")
+ if (of_device_is_available(core_node))
+ num_cores++;
+
+ rdev->cores = devm_kmalloc_array(dev, num_cores, sizeof(*rdev->cores),
+ GFP_KERNEL | __GFP_ZERO);
+ if (IS_ERR(rdev->cores))
+ return PTR_ERR(rdev->cores);
+
+ /* Add core 0, any other cores will be added later when they are bound */
+ rdev->cores[0].rdev = rdev;
+ rdev->cores[0].dev = dev;
+ rdev->cores[0].index = 0;
+ rdev->num_cores = 1;
+
+ err = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(40));
+ if (err)
+ return err;
+
+ err = rocket_device_init(rdev);
+ if (err) {
+ dev_err_probe(dev, err, "Fatal error during NPU init\n");
+ goto err_device_fini;
+ }
+
+ err = component_bind_all(dev, rdev);
+ if (err)
+ goto err_device_fini;
+
+ err = drm_dev_register(ddev, 0);
+ if (err < 0)
+ goto err_unbind;
+
+ return 0;
+
+err_unbind:
+ component_unbind_all(dev, rdev);
+err_device_fini:
+ rocket_device_fini(rdev);
+ return err;
+}
+
+static void rocket_drm_unbind(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ struct drm_device *ddev = &rdev->ddev;
+
+ drm_dev_unregister(ddev);
+
+ component_unbind_all(dev, rdev);
+
+ rocket_device_fini(rdev);
+}
+
+const struct component_master_ops rocket_drm_ops = {
+ .bind = rocket_drm_bind,
+ .unbind = rocket_drm_unbind,
+};
+
+static int rocket_core_bind(struct device *dev, struct device *master, void *data)
+{
+ struct rocket_device *rdev = data;
+ unsigned int core = rdev->num_cores;
+ int err;
+
+ dev_set_drvdata(dev, rdev);
+
+ rdev->cores[core].rdev = rdev;
+ rdev->cores[core].dev = dev;
+ rdev->cores[core].index = core;
+ rdev->cores[core].link = device_link_add(dev, rdev->cores[0].dev,
+ DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
+
+ rdev->num_cores++;
+
+ err = rocket_core_init(&rdev->cores[core]);
+ if (err) {
+ rocket_device_fini(rdev);
+ return err;
+ }
+
+ return 0;
+}
+
+static void rocket_core_unbind(struct device *dev, struct device *master, void *data)
+{
+ struct rocket_device *rdev = data;
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ if (rdev->cores[core].dev == dev) {
+ rocket_core_fini(&rdev->cores[core]);
+ device_link_del(rdev->cores[core].link);
+ break;
+ }
+ }
+}
+
+const struct component_ops rocket_core_ops = {
+ .bind = rocket_core_bind,
+ .unbind = rocket_core_unbind,
+};
+
+static int rocket_probe(struct platform_device *pdev)
+{
+ struct component_match *match = NULL;
+ struct device_node *core_node;
+
+ if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
+ return component_add(&pdev->dev, &rocket_core_ops);
+
+ for_each_compatible_node(core_node, NULL, "rockchip,rk3588-rknn-core") {
+ if (!of_device_is_available(core_node))
+ continue;
+
+ drm_of_component_match_add(&pdev->dev, &match,
+ component_compare_of, core_node);
+ }
+
+ return component_master_add_with_match(&pdev->dev, &rocket_drm_ops, match);
+}
+
+static void rocket_remove(struct platform_device *pdev)
+{
+ if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core-top"))
+ component_master_del(&pdev->dev, &rocket_drm_ops);
+ else if (fwnode_device_is_compatible(pdev->dev.fwnode, "rockchip,rk3588-rknn-core"))
+ component_del(&pdev->dev, &rocket_core_ops);
+}
+
+static const struct of_device_id dt_match[] = {
+ { .compatible = "rockchip,rk3588-rknn-core-top" },
+ { .compatible = "rockchip,rk3588-rknn-core" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, dt_match);
+
+static int find_core_for_dev(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+
+ for (unsigned int core = 0; core < rdev->num_cores; core++) {
+ if (dev == rdev->cores[core].dev)
+ return core;
+ }
+
+ return -1;
+}
+
+static int rocket_device_runtime_resume(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ int core = find_core_for_dev(dev);
+ int err = 0;
+
+ if (core < 0)
+ return -ENODEV;
+
+ if (core == 0) {
+ err = clk_prepare_enable(rdev->clk_npu);
+ if (err) {
+ dev_err(dev, "clk_prepare_enable failed %d for clock npu\n", err);
+ return err;
+ }
+
+ err = clk_prepare_enable(rdev->pclk);
+ if (err) {
+ dev_err(dev, "clk_prepare_enable failed %d for clock pclk\n", err);
+ goto error_clk_npu;
+ }
+ }
+
+ err = clk_prepare_enable(rdev->cores[core].a_clk);
+ if (err) {
+ dev_err(dev, "clk_prepare_enable failed %d for a_clk in core %d\n", err, core);
+ goto error_pclk;
+ }
+
+ err = clk_prepare_enable(rdev->cores[core].h_clk);
+ if (err) {
+ dev_err(dev, "clk_prepare_enable failed %d for h_clk in core %d\n", err, core);
+ goto error_a_clk;
+ }
+
+ return 0;
+
+error_a_clk:
+ clk_disable_unprepare(rdev->cores[core].a_clk);
+
+error_pclk:
+ if (core == 0)
+ clk_disable_unprepare(rdev->pclk);
+
+error_clk_npu:
+ if (core == 0)
+ clk_disable_unprepare(rdev->clk_npu);
+
+ return err;
+}
+
+static int rocket_device_runtime_suspend(struct device *dev)
+{
+ struct rocket_device *rdev = dev_get_drvdata(dev);
+ int core = find_core_for_dev(dev);
+
+ if (core < 0)
+ return -ENODEV;
+
+ clk_disable_unprepare(rdev->cores[core].a_clk);
+ clk_disable_unprepare(rdev->cores[core].h_clk);
+
+ if (core == 0) {
+ clk_disable_unprepare(rdev->pclk);
+ clk_disable_unprepare(rdev->clk_npu);
+ }
+
+ return 0;
+}
+
+EXPORT_GPL_DEV_PM_OPS(rocket_pm_ops) = {
+ RUNTIME_PM_OPS(rocket_device_runtime_suspend, rocket_device_runtime_resume, NULL)
+ SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
+};
+
+static struct platform_driver rocket_driver = {
+ .probe = rocket_probe,
+ .remove = rocket_remove,
+ .driver = {
+ .name = "rocket",
+ .pm = pm_ptr(&rocket_pm_ops),
+ .of_match_table = dt_match,
+ },
+};
+module_platform_driver(rocket_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("DRM driver for the Rockchip NPU IP");
+MODULE_AUTHOR("Tomeu Vizoso");
diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
new file mode 100644
index 0000000000000000000000000000000000000000..bd3a697ab7c8e378967ce638b04d7d86845b53c7
--- /dev/null
+++ b/drivers/accel/rocket/rocket_drv.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_DRV_H__
+#define __ROCKET_DRV_H__
+
+#include "rocket_device.h"
+
+struct rocket_file_priv {
+ struct rocket_device *rdev;
+};
+
+#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 06/10] accel/rocket: Add IOCTL for BO creation
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (3 preceding siblings ...)
2025-05-19 13:43 ` [PATCH v4 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-30 19:23 ` Markus Elfring
2025-05-19 13:43 ` [PATCH v4 07/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
` (4 subsequent siblings)
9 siblings, 1 reply; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
This uses the SHMEM DRM helpers and we map right away to the CPU and NPU
sides, as all buffers are expected to be accessed from both.
v2:
- Sync the IOMMUs for the other cores when mapping and unmapping.
v3:
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
drivers/accel/rocket/Makefile | 3 +-
drivers/accel/rocket/rocket_device.c | 4 ++
drivers/accel/rocket/rocket_device.h | 2 +
drivers/accel/rocket/rocket_drv.c | 7 +-
drivers/accel/rocket/rocket_gem.c | 131 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_gem.h | 26 +++++++
include/uapi/drm/rocket_accel.h | 44 ++++++++++++
7 files changed, 215 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
index abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96..4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f 100644
--- a/drivers/accel/rocket/Makefile
+++ b/drivers/accel/rocket/Makefile
@@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o
rocket-y := \
rocket_core.o \
rocket_device.o \
- rocket_drv.o
+ rocket_drv.o \
+ rocket_gem.o
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index bb469ac87d36249157f4ba9d9f7106ad558309e4..eb10bda13e695fb0c89c1e3464145cdc63748de1 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -3,6 +3,7 @@
#include <linux/clk.h>
#include <linux/dev_printk.h>
+#include <linux/mutex.h>
#include "rocket_device.h"
@@ -30,10 +31,13 @@ int rocket_device_init(struct rocket_device *rdev)
if (err)
return err;
+ mutex_init(&rdev->iommu_lock);
+
return 0;
}
void rocket_device_fini(struct rocket_device *rdev)
{
+ mutex_destroy(&rdev->iommu_lock);
rocket_core_fini(&rdev->cores[0]);
}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index ba2301e9302120ae338c07baa7d12dd99cb925a9..7381a5148f291ef217c3b256f03d5ab357291688 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -14,6 +14,8 @@ struct rocket_device {
struct clk *clk_npu;
struct clk *pclk;
+ struct mutex iommu_lock;
+
struct rocket_core *cores;
unsigned int num_cores;
};
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 82f4cc374bfaa92678da791849537d51bb4c0ba8..4f346df06bcde5a24022bdb651c434d0c6e3c468 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -6,6 +6,7 @@
#include <drm/drm_gem.h>
#include <drm/drm_ioctl.h>
#include <drm/drm_of.h>
+#include <drm/rocket_accel.h>
#include <linux/clk.h>
#include <linux/component.h>
#include <linux/dma-mapping.h>
@@ -14,6 +15,7 @@
#include <linux/pm_runtime.h>
#include "rocket_drv.h"
+#include "rocket_gem.h"
static int
rocket_open(struct drm_device *dev, struct drm_file *file)
@@ -42,6 +44,8 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file)
static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
#define ROCKET_IOCTL(n, func) \
DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
+
+ ROCKET_IOCTL(CREATE_BO, create_bo),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
@@ -51,9 +55,10 @@ DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
* - 1.0 - initial interface
*/
static const struct drm_driver rocket_drm_driver = {
- .driver_features = DRIVER_COMPUTE_ACCEL,
+ .driver_features = DRIVER_COMPUTE_ACCEL | DRIVER_GEM,
.open = rocket_open,
.postclose = rocket_postclose,
+ .gem_create_object = rocket_gem_create_object,
.ioctls = rocket_drm_driver_ioctls,
.num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls),
.fops = &rocket_accel_driver_fops,
diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
new file mode 100644
index 0000000000000000000000000000000000000000..8a8a7185daac4740081293aae6945c9b2bbeb2dd
--- /dev/null
+++ b/drivers/accel/rocket/rocket_gem.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_device.h>
+#include <drm/drm_utils.h>
+#include <drm/rocket_accel.h>
+#include <linux/dma-mapping.h>
+#include <linux/iommu.h>
+
+#include "rocket_device.h"
+#include "rocket_gem.h"
+
+static void rocket_gem_bo_free(struct drm_gem_object *obj)
+{
+ struct rocket_device *rdev = to_rocket_device(obj->dev);
+ struct rocket_gem_object *bo = to_rocket_bo(obj);
+ struct sg_table *sgt;
+
+ drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1);
+
+ mutex_lock(&rdev->iommu_lock);
+
+ sgt = drm_gem_shmem_get_pages_sgt(&bo->base);
+
+ /* Unmap this object from the IOMMUs for cores > 0 */
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ struct iommu_domain *domain = iommu_get_domain_for_dev(rdev->cores[core].dev);
+ size_t unmapped = iommu_unmap(domain, sgt->sgl->dma_address, bo->size);
+
+ drm_WARN_ON(obj->dev, unmapped != bo->size);
+ }
+
+ /* This will unmap the pages from the IOMMU linked to core 0 */
+ drm_gem_shmem_free(&bo->base);
+
+ mutex_unlock(&rdev->iommu_lock);
+}
+
+static const struct drm_gem_object_funcs rocket_gem_funcs = {
+ .free = rocket_gem_bo_free,
+ .print_info = drm_gem_shmem_object_print_info,
+ .pin = drm_gem_shmem_object_pin,
+ .unpin = drm_gem_shmem_object_unpin,
+ .get_sg_table = drm_gem_shmem_object_get_sg_table,
+ .vmap = drm_gem_shmem_object_vmap,
+ .vunmap = drm_gem_shmem_object_vunmap,
+ .mmap = drm_gem_shmem_object_mmap,
+ .vm_ops = &drm_gem_shmem_vm_ops,
+};
+
+struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size)
+{
+ struct rocket_gem_object *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return ERR_PTR(-ENOMEM);
+
+ obj->base.base.funcs = &rocket_gem_funcs;
+
+ return &obj->base.base;
+}
+
+int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_create_bo *args = data;
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_gem_shmem_object *shmem_obj;
+ struct rocket_gem_object *rkt_obj;
+ struct drm_gem_object *gem_obj;
+ struct sg_table *sgt;
+ int ret;
+
+ shmem_obj = drm_gem_shmem_create(dev, args->size);
+ if (IS_ERR(shmem_obj))
+ return PTR_ERR(shmem_obj);
+
+ gem_obj = &shmem_obj->base;
+ rkt_obj = to_rocket_bo(gem_obj);
+
+ rkt_obj->size = args->size;
+ rkt_obj->offset = 0;
+
+ ret = drm_gem_handle_create(file, gem_obj, &args->handle);
+ drm_gem_object_put(gem_obj);
+ if (ret)
+ goto err;
+
+ mutex_lock(&rdev->iommu_lock);
+
+ /* This will map the pages to the IOMMU linked to core 0 */
+ sgt = drm_gem_shmem_get_pages_sgt(shmem_obj);
+ if (IS_ERR(sgt)) {
+ ret = PTR_ERR(sgt);
+ goto err_unlock;
+ }
+
+ /* Map the pages to the IOMMUs linked to the other cores, so all cores can access this BO */
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ ret = iommu_map_sgtable(iommu_get_domain_for_dev(rdev->cores[core].dev),
+ sgt->sgl->dma_address,
+ sgt,
+ IOMMU_READ | IOMMU_WRITE);
+ if (ret < 0 || ret < args->size) {
+ drm_err(dev, "failed to map buffer: size=%d request_size=%u\n",
+ ret, args->size);
+ ret = -ENOMEM;
+ goto err_unlock;
+ }
+
+ /* iommu_map_sgtable might have aligned the size */
+ rkt_obj->size = ret;
+
+ dma_sync_sgtable_for_device(rdev->cores[core].dev, shmem_obj->sgt,
+ DMA_BIDIRECTIONAL);
+ }
+
+ mutex_unlock(&rdev->iommu_lock);
+
+ args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node);
+ args->dma_address = sg_dma_address(shmem_obj->sgt->sgl);
+
+ return 0;
+
+err_unlock:
+ mutex_unlock(&rdev->iommu_lock);
+err:
+ drm_gem_shmem_object_free(gem_obj);
+
+ return ret;
+}
diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
new file mode 100644
index 0000000000000000000000000000000000000000..41497554366961cfe18cf6c7e93ab1e4e5dc1886
--- /dev/null
+++ b/drivers/accel/rocket/rocket_gem.h
@@ -0,0 +1,26 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_GEM_H__
+#define __ROCKET_GEM_H__
+
+#include <drm/drm_gem_shmem_helper.h>
+
+struct rocket_gem_object {
+ struct drm_gem_shmem_object base;
+
+ size_t size;
+ u32 offset;
+};
+
+struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
+
+int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
+static inline
+struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
+{
+ return container_of(to_drm_gem_shmem_obj(obj), struct rocket_gem_object, base);
+}
+
+#endif
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
new file mode 100644
index 0000000000000000000000000000000000000000..95720702b7c4413d72b89c1f0f59abb22dc8c6b3
--- /dev/null
+++ b/include/uapi/drm/rocket_accel.h
@@ -0,0 +1,44 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2024 Tomeu Vizoso
+ */
+#ifndef __DRM_UAPI_ROCKET_ACCEL_H__
+#define __DRM_UAPI_ROCKET_ACCEL_H__
+
+#include "drm.h"
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+#define DRM_ROCKET_CREATE_BO 0x00
+
+#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
+
+/**
+ * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
+ *
+ */
+struct drm_rocket_create_bo {
+ /** Input: Size of the requested BO. */
+ __u32 size;
+
+ /** Output: GEM handle for the BO. */
+ __u32 handle;
+
+ /**
+ * Output: DMA address for the BO in the NPU address space. This address
+ * is private to the DRM fd and is valid for the lifetime of the GEM
+ * handle.
+ */
+ __u64 dma_address;
+
+ /** Output: Offset into the drm node to use for subsequent mmap call. */
+ __u64 offset;
+};
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif /* __DRM_UAPI_ROCKET_ACCEL_H__ */
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 07/10] accel/rocket: Add job submission IOCTL
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (4 preceding siblings ...)
2025-05-19 13:43 ` [PATCH v4 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
` (3 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
Using the DRM GPU scheduler infrastructure, with a scheduler for each
core.
Userspace can decide for a series of tasks to be executed sequentially
in the same core, so SRAM locality can be taken advantage of.
The job submission code was initially based on Panfrost.
v2:
- Remove hardcoded number of cores
- Misc. style fixes (Jeffrey Hugo)
- Repack IOCTL struct (Jeffrey Hugo)
v3:
- Adapt to a split of the register block in the DT bindings (Nicolas
Frattaroli)
- Make use of GPL-2.0-only for the copyright notice (Jeff Hugo)
- Use drm_* logging functions (Thomas Zimmermann)
- Rename reg i/o macros (Thomas Zimmermann)
- Add padding to ioctls and check for zero (Jeff Hugo)
- Improve error handling (Nicolas Frattaroli)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
drivers/accel/rocket/Makefile | 3 +-
drivers/accel/rocket/rocket_core.c | 14 +-
drivers/accel/rocket/rocket_core.h | 14 +
drivers/accel/rocket/rocket_device.c | 2 +
drivers/accel/rocket/rocket_device.h | 2 +
drivers/accel/rocket/rocket_drv.c | 15 +
drivers/accel/rocket/rocket_drv.h | 4 +
drivers/accel/rocket/rocket_job.c | 723 +++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_job.h | 50 +++
include/uapi/drm/rocket_accel.h | 64 ++++
10 files changed, 888 insertions(+), 3 deletions(-)
diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile
index 4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f..3713dfe223d6ec6293ced3ef9291af2f3d144131 100644
--- a/drivers/accel/rocket/Makefile
+++ b/drivers/accel/rocket/Makefile
@@ -6,4 +6,5 @@ rocket-y := \
rocket_core.o \
rocket_device.o \
rocket_drv.o \
- rocket_gem.o
+ rocket_gem.o \
+ rocket_job.o
diff --git a/drivers/accel/rocket/rocket_core.c b/drivers/accel/rocket/rocket_core.c
index a947c3120558e8af90bc0730a4d30ac796d5683d..6c28654c5f3122fb09a10c7df22da7650a8c7d32 100644
--- a/drivers/accel/rocket/rocket_core.c
+++ b/drivers/accel/rocket/rocket_core.c
@@ -8,6 +8,7 @@
#include <linux/pm_runtime.h>
#include "rocket_core.h"
+#include "rocket_job.h"
static int rocket_clk_init(struct rocket_core *core)
{
@@ -61,6 +62,10 @@ int rocket_core_init(struct rocket_core *core)
return PTR_ERR(core->core_iomem);
}
+ err = rocket_job_init(core);
+ if (err)
+ return err;
+
pm_runtime_use_autosuspend(dev);
/*
@@ -74,9 +79,13 @@ int rocket_core_init(struct rocket_core *core)
pm_runtime_enable(dev);
err = pm_runtime_get_sync(dev);
+ if (err) {
+ rocket_job_fini(core);
+ return err;
+ }
- version = rocket_pc_read(core, VERSION);
- version += rocket_pc_read(core, VERSION_NUM) & 0xffff;
+ version = rocket_pc_readl(core, VERSION);
+ version += rocket_pc_readl(core, VERSION_NUM) & 0xffff;
pm_runtime_mark_last_busy(dev);
pm_runtime_put_autosuspend(dev);
@@ -90,4 +99,5 @@ void rocket_core_fini(struct rocket_core *core)
{
pm_runtime_dont_use_autosuspend(core->dev);
pm_runtime_disable(core->dev);
+ rocket_job_fini(core);
}
diff --git a/drivers/accel/rocket/rocket_core.h b/drivers/accel/rocket/rocket_core.h
index 3bde8ad8e6e45f9000ee377d7a5ea9ca01f9ac53..1b21f07b6100fa0ea73cd3ef2accbb0b6fc8777a 100644
--- a/drivers/accel/rocket/rocket_core.h
+++ b/drivers/accel/rocket/rocket_core.h
@@ -37,6 +37,20 @@ struct rocket_core {
void __iomem *core_iomem;
struct clk *a_clk;
struct clk *h_clk;
+
+ struct rocket_job *in_flight_job;
+
+ spinlock_t job_lock;
+
+ struct {
+ struct workqueue_struct *wq;
+ struct work_struct work;
+ atomic_t pending;
+ } reset;
+
+ struct drm_gpu_scheduler sched;
+ u64 fence_context;
+ u64 emit_seqno;
};
int rocket_core_init(struct rocket_core *core);
diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
index eb10bda13e695fb0c89c1e3464145cdc63748de1..14b4e95539d67b693c1810c5b5a89b6a16ca2232 100644
--- a/drivers/accel/rocket/rocket_device.c
+++ b/drivers/accel/rocket/rocket_device.c
@@ -32,12 +32,14 @@ int rocket_device_init(struct rocket_device *rdev)
return err;
mutex_init(&rdev->iommu_lock);
+ mutex_init(&rdev->sched_lock);
return 0;
}
void rocket_device_fini(struct rocket_device *rdev)
{
+ mutex_destroy(&rdev->sched_lock);
mutex_destroy(&rdev->iommu_lock);
rocket_core_fini(&rdev->cores[0]);
}
diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h
index 7381a5148f291ef217c3b256f03d5ab357291688..01f6dc0b657f19eb4037557b0bf5a4c6fb653261 100644
--- a/drivers/accel/rocket/rocket_device.h
+++ b/drivers/accel/rocket/rocket_device.h
@@ -11,6 +11,8 @@
struct rocket_device {
struct drm_device ddev;
+ struct mutex sched_lock;
+
struct clk *clk_npu;
struct clk *pclk;
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 4f346df06bcde5a24022bdb651c434d0c6e3c468..18ff051c336a14b7dda72d235faffb7a55a0a8ee 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -16,12 +16,14 @@
#include "rocket_drv.h"
#include "rocket_gem.h"
+#include "rocket_job.h"
static int
rocket_open(struct drm_device *dev, struct drm_file *file)
{
struct rocket_device *rdev = to_rocket_device(dev);
struct rocket_file_priv *rocket_priv;
+ int ret;
rocket_priv = kzalloc(sizeof(*rocket_priv), GFP_KERNEL);
if (!rocket_priv)
@@ -30,7 +32,15 @@ rocket_open(struct drm_device *dev, struct drm_file *file)
rocket_priv->rdev = rdev;
file->driver_priv = rocket_priv;
+ ret = rocket_job_open(rocket_priv);
+ if (ret)
+ goto err_free;
+
return 0;
+
+err_free:
+ kfree(rocket_priv);
+ return ret;
}
static void
@@ -38,6 +48,7 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file)
{
struct rocket_file_priv *rocket_priv = file->driver_priv;
+ rocket_job_close(rocket_priv);
kfree(rocket_priv);
}
@@ -46,6 +57,7 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0)
ROCKET_IOCTL(CREATE_BO, create_bo),
+ ROCKET_IOCTL(SUBMIT, submit),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
@@ -288,6 +300,9 @@ static int rocket_device_runtime_suspend(struct device *dev)
if (core < 0)
return -ENODEV;
+ if (!rocket_job_is_idle(&rdev->cores[core]))
+ return -EBUSY;
+
clk_disable_unprepare(rdev->cores[core].a_clk);
clk_disable_unprepare(rdev->cores[core].h_clk);
diff --git a/drivers/accel/rocket/rocket_drv.h b/drivers/accel/rocket/rocket_drv.h
index bd3a697ab7c8e378967ce638b04d7d86845b53c7..b4055cfad6bd431b7c59b0848653748ab945615c 100644
--- a/drivers/accel/rocket/rocket_drv.h
+++ b/drivers/accel/rocket/rocket_drv.h
@@ -4,10 +4,14 @@
#ifndef __ROCKET_DRV_H__
#define __ROCKET_DRV_H__
+#include <drm/gpu_scheduler.h>
+
#include "rocket_device.h"
struct rocket_file_priv {
struct rocket_device *rdev;
+
+ struct drm_sched_entity sched_entity;
};
#endif
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
new file mode 100644
index 0000000000000000000000000000000000000000..aee6ebdb2bd227439449fdfcab3ce7d1e39cd4c4
--- /dev/null
+++ b/drivers/accel/rocket/rocket_job.c
@@ -0,0 +1,723 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright 2019 Linaro, Ltd, Rob Herring <robh@kernel.org> */
+/* Copyright 2019 Collabora ltd. */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#include <drm/drm_print.h>
+#include <drm/drm_file.h>
+#include <drm/drm_gem.h>
+#include <drm/rocket_accel.h>
+#include <linux/interrupt.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+
+#include "rocket_core.h"
+#include "rocket_device.h"
+#include "rocket_drv.h"
+#include "rocket_job.h"
+#include "rocket_registers.h"
+
+#define JOB_TIMEOUT_MS 500
+
+static struct rocket_job *
+to_rocket_job(struct drm_sched_job *sched_job)
+{
+ return container_of(sched_job, struct rocket_job, base);
+}
+
+struct rocket_fence {
+ struct dma_fence base;
+ struct drm_device *dev;
+ /* rocket seqno for signaled() test */
+ u64 seqno;
+ int queue;
+};
+
+#define to_rocket_fence(dma_fence) \
+ ((struct rocket_fence *)container_of(dma_fence, struct rocket_fence, base))
+
+static const char *rocket_fence_get_driver_name(struct dma_fence *fence)
+{
+ return "rocket";
+}
+
+static const char *rocket_fence_get_timeline_name(struct dma_fence *fence)
+{
+ return "rockchip-npu";
+}
+
+static const struct dma_fence_ops rocket_fence_ops = {
+ .get_driver_name = rocket_fence_get_driver_name,
+ .get_timeline_name = rocket_fence_get_timeline_name,
+};
+
+static struct dma_fence *rocket_fence_create(struct rocket_core *core)
+{
+ struct rocket_device *rdev = core->rdev;
+ struct rocket_fence *fence;
+
+ fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+ if (!fence)
+ return ERR_PTR(-ENOMEM);
+
+ fence->dev = &rdev->ddev;
+ fence->seqno = ++core->emit_seqno;
+ dma_fence_init(&fence->base, &rocket_fence_ops, &core->job_lock,
+ core->fence_context, fence->seqno);
+
+ return &fence->base;
+}
+
+static int
+rocket_copy_tasks(struct drm_device *dev,
+ struct drm_file *file_priv,
+ struct drm_rocket_job *job,
+ struct rocket_job *rjob)
+{
+ struct drm_rocket_task *tasks;
+ int ret = 0;
+ int i;
+
+ rjob->task_count = job->task_count;
+
+ if (!rjob->task_count)
+ return 0;
+
+ tasks = kvmalloc_array(rjob->task_count, sizeof(*tasks), GFP_KERNEL);
+ if (!tasks) {
+ ret = -ENOMEM;
+ drm_dbg(dev, "Failed to allocate incoming tasks\n");
+ goto fail;
+ }
+
+ if (copy_from_user(tasks,
+ (void __user *)(uintptr_t)job->tasks,
+ rjob->task_count * sizeof(*tasks))) {
+ ret = -EFAULT;
+ drm_dbg(dev, "Failed to copy incoming tasks\n");
+ goto fail;
+ }
+
+ rjob->tasks = kvmalloc_array(job->task_count, sizeof(*rjob->tasks), GFP_KERNEL);
+ if (!rjob->tasks) {
+ drm_dbg(dev, "Failed to allocate task array\n");
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ for (i = 0; i < rjob->task_count; i++) {
+ if (tasks[i].reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_task struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ if (tasks[i].regcmd_count == 0) {
+ ret = -EINVAL;
+ goto fail;
+ }
+ rjob->tasks[i].regcmd = tasks[i].regcmd;
+ rjob->tasks[i].regcmd_count = tasks[i].regcmd_count;
+ }
+
+fail:
+ kvfree(tasks);
+ return ret;
+}
+
+static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job)
+{
+ struct rocket_task *task;
+ bool task_pp_en = 1;
+ bool task_count = 1;
+
+ /* GO ! */
+
+ /* Don't queue the job if a reset is in progress */
+ if (!atomic_read(&core->reset.pending)) {
+ task = &job->tasks[job->next_task_idx];
+ job->next_task_idx++; /* TODO: Do this only after a successful run? */
+
+ rocket_pc_writel(core, BASE_ADDRESS, 0x1);
+
+ rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
+ rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index);
+
+ rocket_pc_writel(core, BASE_ADDRESS, task->regcmd);
+ rocket_pc_writel(core, REGISTER_AMOUNTS, (task->regcmd_count + 1) / 2 - 1);
+
+ rocket_pc_writel(core, INTERRUPT_MASK,
+ PC_INTERRUPT_MASK_DPU_0 | PC_INTERRUPT_MASK_DPU_1);
+ rocket_pc_writel(core, INTERRUPT_CLEAR,
+ PC_INTERRUPT_CLEAR_DPU_0 | PC_INTERRUPT_CLEAR_DPU_1);
+
+ rocket_pc_writel(core, TASK_CON, ((0x6 | task_pp_en) << 12) | task_count);
+
+ rocket_pc_writel(core, TASK_DMA_BASE_ADDR, 0x0);
+
+ rocket_pc_writel(core, OPERATION_ENABLE, 0x1);
+
+ dev_dbg(core->dev,
+ "Submitted regcmd at 0x%llx to core %d",
+ task->regcmd, core->index);
+ }
+}
+
+static int rocket_acquire_object_fences(struct drm_gem_object **bos,
+ int bo_count,
+ struct drm_sched_job *job,
+ bool is_write)
+{
+ int i, ret;
+
+ for (i = 0; i < bo_count; i++) {
+ ret = dma_resv_reserve_fences(bos[i]->resv, 1);
+ if (ret)
+ return ret;
+
+ ret = drm_sched_job_add_implicit_dependencies(job, bos[i],
+ is_write);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
+static void rocket_attach_object_fences(struct drm_gem_object **bos,
+ int bo_count,
+ struct dma_fence *fence)
+{
+ int i;
+
+ for (i = 0; i < bo_count; i++)
+ dma_resv_add_fence(bos[i]->resv, fence, DMA_RESV_USAGE_WRITE);
+}
+
+static int rocket_job_push(struct rocket_job *job)
+{
+ struct rocket_device *rdev = job->rdev;
+ struct drm_gem_object **bos;
+ struct ww_acquire_ctx acquire_ctx;
+ int ret = 0;
+
+ bos = kvmalloc_array(job->in_bo_count + job->out_bo_count, sizeof(void *),
+ GFP_KERNEL);
+ memcpy(bos, job->in_bos, job->in_bo_count * sizeof(void *));
+ memcpy(&bos[job->in_bo_count], job->out_bos, job->out_bo_count * sizeof(void *));
+
+ ret = drm_gem_lock_reservations(bos, job->in_bo_count + job->out_bo_count, &acquire_ctx);
+ if (ret)
+ goto err;
+
+ mutex_lock(&rdev->sched_lock);
+ drm_sched_job_arm(&job->base);
+
+ job->inference_done_fence = dma_fence_get(&job->base.s_fence->finished);
+
+ ret = rocket_acquire_object_fences(job->in_bos, job->in_bo_count, &job->base, false);
+ if (ret) {
+ mutex_unlock(&rdev->sched_lock);
+ goto err_unlock;
+ }
+
+ ret = rocket_acquire_object_fences(job->out_bos, job->out_bo_count, &job->base, true);
+ if (ret) {
+ mutex_unlock(&rdev->sched_lock);
+ goto err_unlock;
+ }
+
+ kref_get(&job->refcount); /* put by scheduler job completion */
+
+ drm_sched_entity_push_job(&job->base);
+
+ mutex_unlock(&rdev->sched_lock);
+
+ rocket_attach_object_fences(job->out_bos, job->out_bo_count, job->inference_done_fence);
+
+err_unlock:
+ drm_gem_unlock_reservations(bos, job->in_bo_count + job->out_bo_count, &acquire_ctx);
+err:
+ kfree(bos);
+
+ return ret;
+}
+
+static void rocket_job_cleanup(struct kref *ref)
+{
+ struct rocket_job *job = container_of(ref, struct rocket_job,
+ refcount);
+ unsigned int i;
+
+ dma_fence_put(job->done_fence);
+ dma_fence_put(job->inference_done_fence);
+
+ if (job->in_bos) {
+ for (i = 0; i < job->in_bo_count; i++)
+ drm_gem_object_put(job->in_bos[i]);
+
+ kvfree(job->in_bos);
+ }
+
+ if (job->out_bos) {
+ for (i = 0; i < job->out_bo_count; i++)
+ drm_gem_object_put(job->out_bos[i]);
+
+ kvfree(job->out_bos);
+ }
+
+ kfree(job->tasks);
+
+ kfree(job);
+}
+
+static void rocket_job_put(struct rocket_job *job)
+{
+ kref_put(&job->refcount, rocket_job_cleanup);
+}
+
+static void rocket_job_free(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+
+ drm_sched_job_cleanup(sched_job);
+
+ rocket_job_put(job);
+}
+
+static struct rocket_core *sched_to_core(struct rocket_device *rdev,
+ struct drm_gpu_scheduler *sched)
+{
+ unsigned int core;
+
+ for (core = 0; core < rdev->num_cores; core++) {
+ if (&rdev->cores[core].sched == sched)
+ return &rdev->cores[core];
+ }
+
+ return NULL;
+}
+
+static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+ struct rocket_device *rdev = job->rdev;
+ struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
+ struct dma_fence *fence = NULL;
+ int ret;
+
+ if (unlikely(job->base.s_fence->finished.error))
+ return NULL;
+
+ /*
+ * Nothing to execute: can happen if the job has finished while
+ * we were resetting the GPU.
+ */
+ if (job->next_task_idx == job->task_count)
+ return NULL;
+
+ fence = rocket_fence_create(core);
+ if (IS_ERR(fence))
+ return fence;
+
+ if (job->done_fence)
+ dma_fence_put(job->done_fence);
+ job->done_fence = dma_fence_get(fence);
+
+ ret = pm_runtime_get_sync(core->dev);
+ if (ret < 0)
+ return fence;
+
+ spin_lock(&core->job_lock);
+
+ core->in_flight_job = job;
+ rocket_job_hw_submit(core, job);
+
+ spin_unlock(&core->job_lock);
+
+ return fence;
+}
+
+static void rocket_job_handle_done(struct rocket_core *core,
+ struct rocket_job *job)
+{
+ if (job->next_task_idx < job->task_count) {
+ rocket_job_hw_submit(core, job);
+ return;
+ }
+
+ core->in_flight_job = NULL;
+ dma_fence_signal_locked(job->done_fence);
+ pm_runtime_put_autosuspend(core->dev);
+}
+
+static void rocket_job_handle_irq(struct rocket_core *core)
+{
+ u32 status, raw_status;
+
+ pm_runtime_mark_last_busy(core->dev);
+
+ status = rocket_pc_readl(core, INTERRUPT_STATUS);
+ raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
+
+ rocket_pc_writel(core, OPERATION_ENABLE, 0x0);
+ rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff);
+
+ spin_lock(&core->job_lock);
+
+ if (core->in_flight_job)
+ rocket_job_handle_done(core, core->in_flight_job);
+
+ spin_unlock(&core->job_lock);
+}
+
+static void
+rocket_reset(struct rocket_core *core, struct drm_sched_job *bad)
+{
+ bool cookie;
+
+ if (!atomic_read(&core->reset.pending))
+ return;
+
+ /*
+ * Stop the scheduler.
+ *
+ * FIXME: We temporarily get out of the dma_fence_signalling section
+ * because the cleanup path generate lockdep splats when taking locks
+ * to release job resources. We should rework the code to follow this
+ * pattern:
+ *
+ * try_lock
+ * if (locked)
+ * release
+ * else
+ * schedule_work_to_release_later
+ */
+ drm_sched_stop(&core->sched, bad);
+
+ cookie = dma_fence_begin_signalling();
+
+ if (bad)
+ drm_sched_increase_karma(bad);
+
+ /*
+ * Mask job interrupts and synchronize to make sure we won't be
+ * interrupted during our reset.
+ */
+ rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
+ synchronize_irq(core->irq);
+
+ /* Handle the remaining interrupts before we reset. */
+ rocket_job_handle_irq(core);
+
+ /*
+ * Remaining interrupts have been handled, but we might still have
+ * stuck jobs. Let's make sure the PM counters stay balanced by
+ * manually calling pm_runtime_put_noidle() and
+ * rocket_devfreq_record_idle() for each stuck job.
+ * Let's also make sure the cycle counting register's refcnt is
+ * kept balanced to prevent it from running forever
+ */
+ spin_lock(&core->job_lock);
+ if (core->in_flight_job)
+ pm_runtime_put_noidle(core->dev);
+
+ core->in_flight_job = NULL;
+ spin_unlock(&core->job_lock);
+
+ /* Proceed with reset now. */
+ pm_runtime_force_suspend(core->dev);
+ pm_runtime_force_resume(core->dev);
+
+ /* GPU has been reset, we can clear the reset pending bit. */
+ atomic_set(&core->reset.pending, 0);
+
+ /*
+ * Now resubmit jobs that were previously queued but didn't have a
+ * chance to finish.
+ * FIXME: We temporarily get out of the DMA fence signalling section
+ * while resubmitting jobs because the job submission logic will
+ * allocate memory with the GFP_KERNEL flag which can trigger memory
+ * reclaim and exposes a lock ordering issue.
+ */
+ dma_fence_end_signalling(cookie);
+ drm_sched_resubmit_jobs(&core->sched);
+ cookie = dma_fence_begin_signalling();
+
+ /* Restart the scheduler */
+ drm_sched_start(&core->sched, 0);
+
+ dma_fence_end_signalling(cookie);
+}
+
+static enum drm_gpu_sched_stat rocket_job_timedout(struct drm_sched_job *sched_job)
+{
+ struct rocket_job *job = to_rocket_job(sched_job);
+ struct rocket_device *rdev = job->rdev;
+ struct rocket_core *core = sched_to_core(rdev, sched_job->sched);
+
+ /*
+ * If the GPU managed to complete this jobs fence, the timeout is
+ * spurious. Bail out.
+ */
+ if (dma_fence_is_signaled(job->done_fence))
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+
+ /*
+ * Rocket IRQ handler may take a long time to process an interrupt
+ * if there is another IRQ handler hogging the processing.
+ * For example, the HDMI encoder driver might be stuck in the IRQ
+ * handler for a significant time in a case of bad cable connection.
+ * In order to catch such cases and not report spurious rocket
+ * job timeouts, synchronize the IRQ handler and re-check the fence
+ * status.
+ */
+ synchronize_irq(core->irq);
+
+ if (dma_fence_is_signaled(job->done_fence)) {
+ dev_warn(core->dev, "unexpectedly high interrupt latency\n");
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+ }
+
+ dev_err(core->dev, "gpu sched timeout");
+
+ atomic_set(&core->reset.pending, 1);
+ rocket_reset(core, sched_job);
+
+ return DRM_GPU_SCHED_STAT_NOMINAL;
+}
+
+static void rocket_reset_work(struct work_struct *work)
+{
+ struct rocket_core *core;
+
+ core = container_of(work, struct rocket_core, reset.work);
+ rocket_reset(core, NULL);
+}
+
+static const struct drm_sched_backend_ops rocket_sched_ops = {
+ .run_job = rocket_job_run,
+ .timedout_job = rocket_job_timedout,
+ .free_job = rocket_job_free
+};
+
+static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data)
+{
+ struct rocket_core *core = data;
+
+ rocket_job_handle_irq(core);
+
+ return IRQ_HANDLED;
+}
+
+static irqreturn_t rocket_job_irq_handler(int irq, void *data)
+{
+ struct rocket_core *core = data;
+ u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
+
+ WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
+ WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
+
+ if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||
+ raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))
+ return IRQ_NONE;
+
+ rocket_pc_writel(core, INTERRUPT_MASK, 0x0);
+
+ return IRQ_WAKE_THREAD;
+}
+
+int rocket_job_init(struct rocket_core *core)
+{
+ struct drm_sched_init_args args = {
+ .ops = &rocket_sched_ops,
+ .num_rqs = DRM_SCHED_PRIORITY_COUNT,
+ .credit_limit = 1,
+ .timeout = msecs_to_jiffies(JOB_TIMEOUT_MS),
+ .name = dev_name(core->dev),
+ .dev = core->dev,
+ };
+ int ret;
+
+ INIT_WORK(&core->reset.work, rocket_reset_work);
+ spin_lock_init(&core->job_lock);
+
+ core->irq = platform_get_irq(to_platform_device(core->dev), 0);
+ if (core->irq < 0)
+ return core->irq;
+
+ ret = devm_request_threaded_irq(core->dev, core->irq,
+ rocket_job_irq_handler,
+ rocket_job_irq_handler_thread,
+ IRQF_SHARED, KBUILD_MODNAME "-job",
+ core);
+ if (ret) {
+ dev_err(core->dev, "failed to request job irq");
+ return ret;
+ }
+
+ core->reset.wq = alloc_ordered_workqueue("rocket-reset-%d", 0, core->index);
+ if (!core->reset.wq)
+ return -ENOMEM;
+
+ core->fence_context = dma_fence_context_alloc(1);
+
+ args.timeout_wq = core->reset.wq;
+ ret = drm_sched_init(&core->sched, &args);
+ if (ret) {
+ dev_err(core->dev, "Failed to create scheduler: %d.", ret);
+ goto err_sched;
+ }
+
+ return 0;
+
+err_sched:
+ drm_sched_fini(&core->sched);
+
+ destroy_workqueue(core->reset.wq);
+ return ret;
+}
+
+void rocket_job_fini(struct rocket_core *core)
+{
+ drm_sched_fini(&core->sched);
+
+ cancel_work_sync(&core->reset.work);
+ destroy_workqueue(core->reset.wq);
+}
+
+int rocket_job_open(struct rocket_file_priv *rocket_priv)
+{
+ struct rocket_device *rdev = rocket_priv->rdev;
+ struct drm_gpu_scheduler **scheds = kmalloc_array(rdev->num_cores, sizeof(scheds),
+ GFP_KERNEL);
+ unsigned int core;
+ int ret;
+
+ for (core = 0; core < rdev->num_cores; core++)
+ scheds[core] = &rdev->cores[core].sched;
+
+ ret = drm_sched_entity_init(&rocket_priv->sched_entity,
+ DRM_SCHED_PRIORITY_NORMAL,
+ scheds,
+ rdev->num_cores, NULL);
+ if (WARN_ON(ret))
+ return ret;
+
+ return 0;
+}
+
+void rocket_job_close(struct rocket_file_priv *rocket_priv)
+{
+ struct drm_sched_entity *entity = &rocket_priv->sched_entity;
+
+ kfree(entity->sched_list);
+ drm_sched_entity_destroy(entity);
+}
+
+int rocket_job_is_idle(struct rocket_core *core)
+{
+ /* If there are any jobs in this HW queue, we're not idle */
+ if (atomic_read(&core->sched.credit_count))
+ return false;
+
+ return true;
+}
+
+static int rocket_ioctl_submit_job(struct drm_device *dev, struct drm_file *file,
+ struct drm_rocket_job *job)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct rocket_file_priv *file_priv = file->driver_priv;
+ struct rocket_job *rjob = NULL;
+ int ret = 0;
+
+ if (job->task_count == 0)
+ return -EINVAL;
+
+ rjob = kzalloc(sizeof(*rjob), GFP_KERNEL);
+ if (!rjob)
+ return -ENOMEM;
+
+ kref_init(&rjob->refcount);
+
+ rjob->rdev = rdev;
+
+ ret = drm_sched_job_init(&rjob->base,
+ &file_priv->sched_entity,
+ 1, NULL);
+ if (ret)
+ goto out_put_job;
+
+ ret = rocket_copy_tasks(dev, file, job, rjob);
+ if (ret)
+ goto out_cleanup_job;
+
+ ret = drm_gem_objects_lookup(file,
+ (void __user *)(uintptr_t)job->in_bo_handles,
+ job->in_bo_handle_count, &rjob->in_bos);
+ if (ret)
+ goto out_cleanup_job;
+
+ rjob->in_bo_count = job->in_bo_handle_count;
+
+ ret = drm_gem_objects_lookup(file,
+ (void __user *)(uintptr_t)job->out_bo_handles,
+ job->out_bo_handle_count, &rjob->out_bos);
+ if (ret)
+ goto out_cleanup_job;
+
+ rjob->out_bo_count = job->out_bo_handle_count;
+
+ ret = rocket_job_push(rjob);
+ if (ret)
+ goto out_cleanup_job;
+
+out_cleanup_job:
+ if (ret)
+ drm_sched_job_cleanup(&rjob->base);
+out_put_job:
+ rocket_job_put(rjob);
+
+ return ret;
+}
+
+int rocket_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_submit *args = data;
+ struct drm_rocket_job *jobs;
+ int ret = 0;
+ unsigned int i = 0;
+
+ if (args->reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_submit struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ jobs = kvmalloc_array(args->job_count, sizeof(*jobs), GFP_KERNEL);
+ if (!jobs) {
+ drm_dbg(dev, "Failed to allocate incoming job array\n");
+ return -ENOMEM;
+ }
+
+ if (copy_from_user(jobs,
+ (void __user *)(uintptr_t)args->jobs,
+ args->job_count * sizeof(*jobs))) {
+ ret = -EFAULT;
+ drm_dbg(dev, "Failed to copy incoming job array\n");
+ goto exit;
+ }
+
+ for (i = 0; i < args->job_count; i++) {
+ if (jobs[i].reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_job struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ rocket_ioctl_submit_job(dev, file, &jobs[i]);
+ }
+
+exit:
+ kfree(jobs);
+
+ return ret;
+}
diff --git a/drivers/accel/rocket/rocket_job.h b/drivers/accel/rocket/rocket_job.h
new file mode 100644
index 0000000000000000000000000000000000000000..99e1928fbd89f9b506c63bf9dd591124feeb54b5
--- /dev/null
+++ b/drivers/accel/rocket/rocket_job.h
@@ -0,0 +1,50 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
+
+#ifndef __ROCKET_JOB_H__
+#define __ROCKET_JOB_H__
+
+#include <drm/drm_drv.h>
+#include <drm/gpu_scheduler.h>
+
+#include "rocket_core.h"
+#include "rocket_drv.h"
+
+struct rocket_task {
+ u64 regcmd;
+ u32 regcmd_count;
+};
+
+struct rocket_job {
+ struct drm_sched_job base;
+
+ struct rocket_device *rdev;
+
+ struct drm_gem_object **in_bos;
+ struct drm_gem_object **out_bos;
+
+ u32 in_bo_count;
+ u32 out_bo_count;
+
+ struct rocket_task *tasks;
+ u32 task_count;
+ u32 next_task_idx;
+
+ /* Fence to be signaled by drm-sched once its done with the job */
+ struct dma_fence *inference_done_fence;
+
+ /* Fence to be signaled by IRQ handler when the job is complete. */
+ struct dma_fence *done_fence;
+
+ struct kref refcount;
+};
+
+int rocket_ioctl_submit(struct drm_device *dev, void *data, struct drm_file *file);
+
+int rocket_job_init(struct rocket_core *core);
+void rocket_job_fini(struct rocket_core *core);
+int rocket_job_open(struct rocket_file_priv *rocket_priv);
+void rocket_job_close(struct rocket_file_priv *rocket_priv);
+int rocket_job_is_idle(struct rocket_core *core);
+
+#endif
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
index 95720702b7c4413d72b89c1f0f59abb22dc8c6b3..cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b 100644
--- a/include/uapi/drm/rocket_accel.h
+++ b/include/uapi/drm/rocket_accel.h
@@ -12,8 +12,10 @@ extern "C" {
#endif
#define DRM_ROCKET_CREATE_BO 0x00
+#define DRM_ROCKET_SUBMIT 0x01
#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
+#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
/**
* struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
@@ -37,6 +39,68 @@ struct drm_rocket_create_bo {
__u64 offset;
};
+/**
+ * struct drm_rocket_task - A task to be run on the NPU
+ *
+ * A task is the smallest unit of work that can be run on the NPU.
+ */
+struct drm_rocket_task {
+ /** Input: DMA address to NPU mapping of register command buffer */
+ __u64 regcmd;
+
+ /** Input: Number of commands in the register command buffer */
+ __u32 regcmd_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
+/**
+ * struct drm_rocket_job - A job to be run on the NPU
+ *
+ * The kernel will schedule the execution of this job taking into account its
+ * dependencies with other jobs. All tasks in the same job will be executed
+ * sequentially on the same core, to benefit from memory residency in SRAM.
+ */
+struct drm_rocket_job {
+ /** Input: Pointer to an array of struct drm_rocket_task. */
+ __u64 tasks;
+
+ /** Input: Pointer to a u32 array of the BOs that are read by the job. */
+ __u64 in_bo_handles;
+
+ /** Input: Pointer to a u32 array of the BOs that are written to by the job. */
+ __u64 out_bo_handles;
+
+ /** Input: Number of tasks passed in. */
+ __u32 task_count;
+
+ /** Input: Number of input BO handles passed in (size is that times 4). */
+ __u32 in_bo_handle_count;
+
+ /** Input: Number of output BO handles passed in (size is that times 4). */
+ __u32 out_bo_handle_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
+/**
+ * struct drm_rocket_submit - ioctl argument for submitting commands to the NPU.
+ *
+ * The kernel will schedule the execution of these jobs in dependency order.
+ */
+struct drm_rocket_submit {
+ /** Input: Pointer to an array of struct drm_rocket_job. */
+ __u64 jobs;
+
+ /** Input: Number of jobs passed in. */
+ __u32 job_count;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
#if defined(__cplusplus)
}
#endif
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (5 preceding siblings ...)
2025-05-19 13:43 ` [PATCH v4 07/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
` (2 subsequent siblings)
9 siblings, 0 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
The NPU cores have their own access to the memory bus, and this isn't
cache coherent with the CPUs.
Add IOCTLs so userspace can mark when the caches need to be flushed, and
also when a writer job needs to be waited for before the buffer can be
accessed from the CPU.
Initially based on the same IOCTLs from the Etnaviv driver.
v2:
- Don't break UABI by reordering the IOCTL IDs (Jeff Hugo)
v3:
- Check that padding fields in IOCTLs are zero (Jeff Hugo)
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
drivers/accel/rocket/rocket_drv.c | 2 +
drivers/accel/rocket/rocket_gem.c | 80 +++++++++++++++++++++++++++++++++++++++
drivers/accel/rocket/rocket_gem.h | 5 +++
include/uapi/drm/rocket_accel.h | 37 ++++++++++++++++++
4 files changed, 124 insertions(+)
diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c
index 18ff051c336a14b7dda72d235faffb7a55a0a8ee..75ccf6e14b2bed80005a70b8cc06925b7c3ac405 100644
--- a/drivers/accel/rocket/rocket_drv.c
+++ b/drivers/accel/rocket/rocket_drv.c
@@ -58,6 +58,8 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = {
ROCKET_IOCTL(CREATE_BO, create_bo),
ROCKET_IOCTL(SUBMIT, submit),
+ ROCKET_IOCTL(PREP_BO, prep_bo),
+ ROCKET_IOCTL(FINI_BO, fini_bo),
};
DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops);
diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c
index 8a8a7185daac4740081293aae6945c9b2bbeb2dd..cdc5238a93fa5978129dc1ac8ec8de955160dc18 100644
--- a/drivers/accel/rocket/rocket_gem.c
+++ b/drivers/accel/rocket/rocket_gem.c
@@ -129,3 +129,83 @@ int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *
return ret;
}
+
+static inline enum dma_data_direction rocket_op_to_dma_dir(u32 op)
+{
+ if (op & ROCKET_PREP_READ)
+ return DMA_FROM_DEVICE;
+ else if (op & ROCKET_PREP_WRITE)
+ return DMA_TO_DEVICE;
+ else
+ return DMA_BIDIRECTIONAL;
+}
+
+int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct drm_rocket_prep_bo *args = data;
+ unsigned long timeout = drm_timeout_abs_to_jiffies(args->timeout_ns);
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_gem_object *gem_obj;
+ struct drm_gem_shmem_object *shmem_obj;
+ bool write = !!(args->op & ROCKET_PREP_WRITE);
+ long ret = 0;
+
+ if (args->op & ~(ROCKET_PREP_READ | ROCKET_PREP_WRITE))
+ return -EINVAL;
+
+ gem_obj = drm_gem_object_lookup(file, args->handle);
+ if (!gem_obj)
+ return -ENOENT;
+
+ ret = dma_resv_wait_timeout(gem_obj->resv, dma_resv_usage_rw(write),
+ true, timeout);
+ if (!ret)
+ ret = timeout ? -ETIMEDOUT : -EBUSY;
+
+ shmem_obj = &to_rocket_bo(gem_obj)->base;
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ dma_sync_sgtable_for_cpu(rdev->cores[core].dev, shmem_obj->sgt,
+ rocket_op_to_dma_dir(args->op));
+ }
+
+ to_rocket_bo(gem_obj)->last_cpu_prep_op = args->op;
+
+ drm_gem_object_put(gem_obj);
+
+ return ret;
+}
+
+int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file)
+{
+ struct rocket_device *rdev = to_rocket_device(dev);
+ struct drm_rocket_fini_bo *args = data;
+ struct drm_gem_shmem_object *shmem_obj;
+ struct rocket_gem_object *rkt_obj;
+ struct drm_gem_object *gem_obj;
+
+ if (args->reserved != 0) {
+ drm_dbg(dev, "Reserved field in drm_rocket_fini_bo struct should be 0.\n");
+ return -EINVAL;
+ }
+
+ gem_obj = drm_gem_object_lookup(file, args->handle);
+ if (!gem_obj)
+ return -ENOENT;
+
+ rkt_obj = to_rocket_bo(gem_obj);
+ shmem_obj = &rkt_obj->base;
+
+ WARN_ON(rkt_obj->last_cpu_prep_op == 0);
+
+ for (unsigned int core = 1; core < rdev->num_cores; core++) {
+ dma_sync_sgtable_for_device(rdev->cores[core].dev, shmem_obj->sgt,
+ rocket_op_to_dma_dir(rkt_obj->last_cpu_prep_op));
+ }
+
+ rkt_obj->last_cpu_prep_op = 0;
+
+ drm_gem_object_put(gem_obj);
+
+ return 0;
+}
diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h
index 41497554366961cfe18cf6c7e93ab1e4e5dc1886..2caa268f7f496f782996c6ad2c4eb851a225a86f 100644
--- a/drivers/accel/rocket/rocket_gem.h
+++ b/drivers/accel/rocket/rocket_gem.h
@@ -11,12 +11,17 @@ struct rocket_gem_object {
size_t size;
u32 offset;
+ u32 last_cpu_prep_op;
};
struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size);
int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file);
+int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
+int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file);
+
static inline
struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj)
{
diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h
index cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b..b5c80dd767be56e9720b51e4a82617a425a881a1 100644
--- a/include/uapi/drm/rocket_accel.h
+++ b/include/uapi/drm/rocket_accel.h
@@ -13,9 +13,13 @@ extern "C" {
#define DRM_ROCKET_CREATE_BO 0x00
#define DRM_ROCKET_SUBMIT 0x01
+#define DRM_ROCKET_PREP_BO 0x02
+#define DRM_ROCKET_FINI_BO 0x03
#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo)
#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit)
+#define DRM_IOCTL_ROCKET_PREP_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_PREP_BO, struct drm_rocket_prep_bo)
+#define DRM_IOCTL_ROCKET_FINI_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_FINI_BO, struct drm_rocket_fini_bo)
/**
* struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs.
@@ -39,6 +43,39 @@ struct drm_rocket_create_bo {
__u64 offset;
};
+#define ROCKET_PREP_READ 0x01
+#define ROCKET_PREP_WRITE 0x02
+
+/**
+ * struct drm_rocket_prep_bo - ioctl argument for starting CPU ownership of the BO.
+ *
+ * Takes care of waiting for any NPU jobs that might still use the NPU and performs cache
+ * synchronization.
+ */
+struct drm_rocket_prep_bo {
+ /** Input: GEM handle of the buffer object. */
+ __u32 handle;
+
+ /** Input: mask of ROCKET_PREP_x, direction of the access. */
+ __u32 op;
+
+ /** Input: Amount of time to wait for NPU jobs. */
+ __s64 timeout_ns;
+};
+
+/**
+ * struct drm_rocket_fini_bo - ioctl argument for finishing CPU ownership of the BO.
+ *
+ * Synchronize caches for NPU access.
+ */
+struct drm_rocket_fini_bo {
+ /** Input: GEM handle of the buffer object. */
+ __u32 handle;
+
+ /** Reserved, must be zero. */
+ __u32 reserved;
+};
+
/**
* struct drm_rocket_task - A task to be run on the NPU
*
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (6 preceding siblings ...)
2025-05-19 13:43 ` [PATCH v4 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
2025-05-30 18:50 ` [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Jagan Teki
9 siblings, 0 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
The NPU of the RK3588 has an external supply. This supply also affects
the power domain of the NPU, not just the NPU device nodes themselves.
Since correctly modelled boards will want the power domain to be aware
of the regulator so that it doesn't always have to be on, add a label to
the NPU power domain node so board files can reference it.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-base.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
index f5e58851047e80b23f9ff3244692ad868ddc1ff6..3e5199355a12f5c9ed8b6aea1dd95dcf8cbf1846 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3588-base.dtsi
@@ -837,7 +837,7 @@ power: power-controller {
status = "okay";
/* These power domains are grouped by VD_NPU */
- power-domain@RK3588_PD_NPU {
+ pd_npu: power-domain@RK3588_PD_NPU {
reg = <RK3588_PD_NPU>;
#power-domain-cells = <0>;
#address-cells = <1>;
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v4 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (7 preceding siblings ...)
2025-05-19 13:43 ` [PATCH v4 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
@ 2025-05-19 13:43 ` Tomeu Vizoso
2025-05-30 18:50 ` [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Jagan Teki
9 siblings, 0 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-05-19 13:43 UTC (permalink / raw)
To: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo
Cc: devicetree, linux-arm-kernel, linux-rockchip, linux-kernel,
dri-devel, linux-doc, linux-media, linaro-mm-sig, Tomeu Vizoso
From: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
The NPU on the ROCK5B uses the same regulator for both the sram-supply
and the npu's supply. Add this regulator, and enable all the NPU bits.
Also add the regulator as a domain-supply to the pd_npu power domain.
Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
---
arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts | 56 +++++++++++++++++++++++++
1 file changed, 56 insertions(+)
diff --git a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
index d22068475c5dc6cb885f878f3f527a66edf1ba70..49500f7cbcb14af4919a6c1997e9e53a01d84973 100644
--- a/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
+++ b/arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
@@ -316,6 +316,28 @@ regulator-state-mem {
};
};
+&i2c1 {
+ pinctrl-names = "default";
+ pinctrl-0 = <&i2c1m2_xfer>;
+ status = "okay";
+
+ vdd_npu_s0: regulator@42 {
+ compatible = "rockchip,rk8602";
+ reg = <0x42>;
+ fcs,suspend-voltage-selector = <1>;
+ regulator-name = "vdd_npu_s0";
+ regulator-boot-on;
+ regulator-min-microvolt = <550000>;
+ regulator-max-microvolt = <950000>;
+ regulator-ramp-delay = <2300>;
+ vin-supply = <&vcc5v0_sys>;
+
+ regulator-state-mem {
+ regulator-off-in-suspend;
+ };
+ };
+};
+
&i2c6 {
status = "okay";
@@ -440,6 +462,10 @@ &pd_gpu {
domain-supply = <&vdd_gpu_s0>;
};
+&pd_npu {
+ domain-supply = <&vdd_npu_s0>;
+};
+
&pinctrl {
hdmirx {
hdmirx_hpd: hdmirx-5v-detection {
@@ -500,6 +526,36 @@ &pwm1 {
status = "okay";
};
+&rknn_core_top {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_1 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_core_2 {
+ npu-supply = <&vdd_npu_s0>;
+ sram-supply = <&vdd_npu_s0>;
+ status = "okay";
+};
+
+&rknn_mmu_top {
+ status = "okay";
+};
+
+&rknn_mmu_1 {
+ status = "okay";
+};
+
+&rknn_mmu_2 {
+ status = "okay";
+};
+
&saradc {
vref-supply = <&avcc_1v8_s0>;
status = "okay";
--
2.49.0
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH v4 01/10] dt-bindings: npu: rockchip,rknn: Add bindings
2025-05-19 13:43 ` [PATCH v4 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
@ 2025-05-20 7:33 ` Krzysztof Kozlowski
0 siblings, 0 replies; 15+ messages in thread
From: Krzysztof Kozlowski @ 2025-05-20 7:33 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo, devicetree, linux-arm-kernel, linux-rockchip,
linux-kernel, dri-devel, linux-doc, linux-media, linaro-mm-sig
On Mon, May 19, 2025 at 03:43:33PM GMT, Tomeu Vizoso wrote:
> Add the bindings for the Neural Processing Unit IP from Rockchip.
>
> v2:
> - Adapt to new node structure (one node per core, each with its own
> IOMMU)
> - Several misc. fixes from Sebastian Reichel
>
> v3:
> - Split register block in its constituent subblocks, and only require
> the ones that the kernel would ever use (Nicolas Frattaroli)
> - Group supplies (Rob Herring)
> - Explain the way in which the top core is special (Rob Herring)
>
> v4:
> - Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski)
> - Remove unneeded items: (Krzysztof Kozlowski)
> - Fix use of minItems/maxItems (Krzysztof Kozlowski)
> - Add reg-names to list of required properties (Krzysztof Kozlowski)
> - Fix example (Krzysztof Kozlowski)
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
This order of SoB is still odd. You as person sending it should be the
last signing person. Are you sure you are using b4 for managing
trailers? I would expect it to re-order these on every update and this
is already v4.
> ---
> .../bindings/npu/rockchip,rknn-core.yaml | 149 +++++++++++++++++++++
Filename matching compatible, so rockchip,rk3588-rknn-core.yaml
> 1 file changed, 149 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
> new file mode 100644
> index 0000000000000000000000000000000000000000..fafd0b01da215c7396262012988e364ef07ea137
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/npu/rockchip,rknn-core.yaml
> @@ -0,0 +1,149 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/npu/rockchip,rknn-core.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Neural Processing Unit IP from Rockchip
> +
> +maintainers:
> + - Tomeu Vizoso <tomeu@tomeuvizoso.net>
> +
> +description:
> + Rockchip IP for accelerating inference of neural networks, based on NVIDIA's
> + open source NVDLA IP.
> +
> + There is to be a node per each core in the NPU. In Rockchip's design there
> + will be one core that is special and needs to be powered on before any of the
> + other cores can be used. This special core is called the top core and should
> + have the compatible string that corresponds to top cores.
> +
> +properties:
> + $nodename:
> + pattern: '^npu@[a-f0-9]+$'
> +
> + compatible:
> + oneOf:
Drop... if you followed my advice you would notice it is not necessary.
> + - enum:
> + - rockchip,rk3588-rknn-core-top
> + - enum:
> + - rockchip,rk3588-rknn-core
My comments were only partially implemented. This syntax is really not
readable and not necessary and I asked to make it part of previous enum.
This is just one enum:
compatible:
enum:
- foo
- bar
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 05/10] accel/rocket: Add a new driver for Rockchip's NPU
2025-05-19 13:43 ` [PATCH v4 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
@ 2025-05-20 7:37 ` Krzysztof Kozlowski
0 siblings, 0 replies; 15+ messages in thread
From: Krzysztof Kozlowski @ 2025-05-20 7:37 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo, devicetree, linux-arm-kernel, linux-rockchip,
linux-kernel, dri-devel, linux-doc, linux-media, linaro-mm-sig
On Mon, May 19, 2025 at 03:43:37PM GMT, Tomeu Vizoso wrote:
> +#endif
> diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..bb469ac87d36249157f4ba9d9f7106ad558309e4
> --- /dev/null
> +++ b/drivers/accel/rocket/rocket_device.c
> @@ -0,0 +1,39 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright 2024-2025 Tomeu Vizoso <tomeu@tomeuvizoso.net> */
> +
> +#include <linux/clk.h>
> +#include <linux/dev_printk.h>
> +
> +#include "rocket_device.h"
> +
> +int rocket_device_init(struct rocket_device *rdev)
> +{
> + struct device *dev = rdev->cores[0].dev;
> + int err;
> +
> + rdev->clk_npu = devm_clk_get(dev, "npu");
> + if (IS_ERR(rdev->clk_npu)) {
> + err = PTR_ERR(rdev->clk_npu);
> + dev_err(dev, "devm_clk_get failed %d for clock npu\n", err);
> + return err;
> + }
That's probe path? so use standard syntax:
return dev_err_probe(). One line instead of four.
> +
> + rdev->pclk = devm_clk_get(dev, "pclk");
> + if (IS_ERR(rdev->pclk)) {
> + err = PTR_ERR(rdev->pclk);
> + dev_err(dev, "devm_clk_get failed %d for clock pclk\n", err);
> + return err;
Same here... except that this should be blk API and entire function gets
smaller.
> + }
> +
> + /* Initialize core 0 (top) */
> + err = rocket_core_init(&rdev->cores[0]);
> + if (err)
> + return err;
> +
> + return 0;
> +}
...
> +static int rocket_device_runtime_resume(struct device *dev)
> +{
> + struct rocket_device *rdev = dev_get_drvdata(dev);
> + int core = find_core_for_dev(dev);
> + int err = 0;
> +
> + if (core < 0)
> + return -ENODEV;
> +
> + if (core == 0) {
> + err = clk_prepare_enable(rdev->clk_npu);
> + if (err) {
> + dev_err(dev, "clk_prepare_enable failed %d for clock npu\n", err);
> + return err;
> + }
> +
> + err = clk_prepare_enable(rdev->pclk);
> + if (err) {
> + dev_err(dev, "clk_prepare_enable failed %d for clock pclk\n", err);
> + goto error_clk_npu;
> + }
> + }
> +
> + err = clk_prepare_enable(rdev->cores[core].a_clk);
> + if (err) {
> + dev_err(dev, "clk_prepare_enable failed %d for a_clk in core %d\n", err, core);
> + goto error_pclk;
> + }
> +
> + err = clk_prepare_enable(rdev->cores[core].h_clk);
> + if (err) {
> + dev_err(dev, "clk_prepare_enable failed %d for h_clk in core %d\n", err, core);
> + goto error_a_clk;
> + }
All four above calls could be just one call with bulk API.
> +
> + return 0;
> +
> +error_a_clk:
> + clk_disable_unprepare(rdev->cores[core].a_clk);
> +
> +error_pclk:
> + if (core == 0)
> + clk_disable_unprepare(rdev->pclk);
> +
> +error_clk_npu:
> + if (core == 0)
> + clk_disable_unprepare(rdev->clk_npu);
And all this would be gone...
> +
> + return err;
Best regards,
Krzysztof
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
` (8 preceding siblings ...)
2025-05-19 13:43 ` [PATCH v4 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
@ 2025-05-30 18:50 ` Jagan Teki
2025-06-01 6:41 ` Tomeu Vizoso
9 siblings, 1 reply; 15+ messages in thread
From: Jagan Teki @ 2025-05-30 18:50 UTC (permalink / raw)
To: Tomeu Vizoso
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo, devicetree, linux-arm-kernel, linux-rockchip,
linux-kernel, dri-devel, linux-doc, linux-media, linaro-mm-sig
On Mon, 19 May 2025 at 19:14, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
>
> This series adds a new driver for the NPU that Rockchip includes in its
> newer SoCs, developed by them on the NVDLA base.
>
> In its current form, it supports the specific NPU in the RK3588 SoC.
>
> The userspace driver is part of Mesa and an initial draft can be found at:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
>
> Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> ---
> Changes in v4:
> - Several fixes to DT bindings.
> - Link to v3: https://lore.kernel.org/r/20250516-6-10-rocket-v3-0-7051ac9225db@tomeuvizoso.net
>
> Changes in v3:
> - Reference in the device tree only the register blocks that are
> actually used.
> - Several style and robustness fixes suggested in the mailing list.
> - Added patches from Nicolas Frattaroli that add support to the NPU for
> the Rock 5B board.
> - Link to v2: https://lore.kernel.org/r/20250225-6-10-rocket-v2-0-d4dbcfafc141@tomeuvizoso.net
>
> Changes in v2:
> - Drop patch adding the rk3588 compatible to rockchip-iommu (Sebastian Reichel)
> - Drop patch adding support for multiple power domains to rockchip-iommu (Sebastian Reichel)
> - Link to v1: https://lore.kernel.org/r/20240612-6-10-rocket-v1-0-060e48eea250@tomeuvizoso.net
>
> ---
> Nicolas Frattaroli (2):
> arm64: dts: rockchip: add pd_npu label for RK3588 power domains
> arm64: dts: rockchip: enable NPU on ROCK 5B
>
> Tomeu Vizoso (8):
> dt-bindings: npu: rockchip,rknn: Add bindings
> arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588s
> arm64: dts: rockchip: Enable the NPU on quartzpro64
> accel/rocket: Add registers header
> accel/rocket: Add a new driver for Rockchip's NPU
> accel/rocket: Add IOCTL for BO creation
> accel/rocket: Add job submission IOCTL
> accel/rocket: Add IOCTLs for synchronizing memory accesses
Can this be possible to infer yolov8/10? Do we need to convert PT/ONNX
to any other common format's unlike rknn?
Thanks,
Jagan.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 06/10] accel/rocket: Add IOCTL for BO creation
2025-05-19 13:43 ` [PATCH v4 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
@ 2025-05-30 19:23 ` Markus Elfring
0 siblings, 0 replies; 15+ messages in thread
From: Markus Elfring @ 2025-05-30 19:23 UTC (permalink / raw)
To: Tomeu Vizoso, dri-devel, linux-rockchip, linux-arm-kernel,
linux-media, devicetree, linaro-mm-sig, Christian König,
Conor Dooley, David Airlie, Heiko Stübner, Jeff Hugo,
Jonathan Corbet, Krzysztof Kozlowski, Maarten Lankhorst,
Maxime Ripard, Nicolas Frattaroli, Oded Gabbay, Rob Herring,
Sebastian Reichel, Simona Vetter, Sumit Semwal, Thomas Zimmermann
Cc: LKML, linux-doc
…
> +++ b/drivers/accel/rocket/rocket_gem.c
> @@ -0,0 +1,131 @@
…
> +static void rocket_gem_bo_free(struct drm_gem_object *obj)
> +{
…
> + mutex_lock(&rdev->iommu_lock);
> +
> + sgt = drm_gem_shmem_get_pages_sgt(&bo->base);
…
> + drm_gem_shmem_free(&bo->base);
> +
> + mutex_unlock(&rdev->iommu_lock);
> +}
…
Under which circumstances would you become interested to apply a statement
like “guard(mutex)(&rdev->iommu_lock);”?
https://elixir.bootlin.com/linux/v6.15/source/include/linux/mutex.h#L201
Regards,
Markus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU
2025-05-30 18:50 ` [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Jagan Teki
@ 2025-06-01 6:41 ` Tomeu Vizoso
0 siblings, 0 replies; 15+ messages in thread
From: Tomeu Vizoso @ 2025-06-01 6:41 UTC (permalink / raw)
To: Jagan Teki
Cc: Rob Herring, Krzysztof Kozlowski, Conor Dooley, Heiko Stuebner,
Oded Gabbay, Jonathan Corbet, Maarten Lankhorst, Maxime Ripard,
Thomas Zimmermann, David Airlie, Simona Vetter, Sumit Semwal,
Christian König, Sebastian Reichel, Nicolas Frattaroli,
Jeff Hugo, devicetree, linux-arm-kernel, linux-rockchip,
linux-kernel, dri-devel, linux-doc, linux-media, linaro-mm-sig
On Fri, May 30, 2025 at 8:50 PM Jagan Teki <jagan@edgeble.ai> wrote:
>
> On Mon, 19 May 2025 at 19:14, Tomeu Vizoso <tomeu@tomeuvizoso.net> wrote:
> >
> > This series adds a new driver for the NPU that Rockchip includes in its
> > newer SoCs, developed by them on the NVDLA base.
> >
> > In its current form, it supports the specific NPU in the RK3588 SoC.
> >
> > The userspace driver is part of Mesa and an initial draft can be found at:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
> >
> > Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
> > ---
> > Changes in v4:
> > - Several fixes to DT bindings.
> > - Link to v3: https://lore.kernel.org/r/20250516-6-10-rocket-v3-0-7051ac9225db@tomeuvizoso.net
> >
> > Changes in v3:
> > - Reference in the device tree only the register blocks that are
> > actually used.
> > - Several style and robustness fixes suggested in the mailing list.
> > - Added patches from Nicolas Frattaroli that add support to the NPU for
> > the Rock 5B board.
> > - Link to v2: https://lore.kernel.org/r/20250225-6-10-rocket-v2-0-d4dbcfafc141@tomeuvizoso.net
> >
> > Changes in v2:
> > - Drop patch adding the rk3588 compatible to rockchip-iommu (Sebastian Reichel)
> > - Drop patch adding support for multiple power domains to rockchip-iommu (Sebastian Reichel)
> > - Link to v1: https://lore.kernel.org/r/20240612-6-10-rocket-v1-0-060e48eea250@tomeuvizoso.net
> >
> > ---
> > Nicolas Frattaroli (2):
> > arm64: dts: rockchip: add pd_npu label for RK3588 power domains
> > arm64: dts: rockchip: enable NPU on ROCK 5B
> >
> > Tomeu Vizoso (8):
> > dt-bindings: npu: rockchip,rknn: Add bindings
> > arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588s
> > arm64: dts: rockchip: Enable the NPU on quartzpro64
> > accel/rocket: Add registers header
> > accel/rocket: Add a new driver for Rockchip's NPU
> > accel/rocket: Add IOCTL for BO creation
> > accel/rocket: Add job submission IOCTL
> > accel/rocket: Add IOCTLs for synchronizing memory accesses
>
> Can this be possible to infer yolov8/10? Do we need to convert PT/ONNX
> to any other common format's unlike rknn?
Both considerations are entirely dependent on the userspace driver.
This kernel driver should be able to support a userspace driver that
accelerates any YOLO version. Should also be able to support without
changes a userspace driver that implements execution of ONNX, PyTorch
models, etc. With or without conversion to an intermediate model
format.
Regarding the particular userspace driver that has been submitted for
review to Mesa, you can put questions and comments at:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29698
Thanks,
Tomeu
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2025-06-01 6:43 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-19 13:43 [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 01/10] dt-bindings: npu: rockchip,rknn: Add bindings Tomeu Vizoso
2025-05-20 7:33 ` Krzysztof Kozlowski
2025-05-19 13:43 ` [PATCH v4 02/10] arm64: dts: rockchip: Add nodes for NPU and its MMU to rk3588s Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 03/10] arm64: dts: rockchip: Enable the NPU on quartzpro64 Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 05/10] accel/rocket: Add a new driver for Rockchip's NPU Tomeu Vizoso
2025-05-20 7:37 ` Krzysztof Kozlowski
2025-05-19 13:43 ` [PATCH v4 06/10] accel/rocket: Add IOCTL for BO creation Tomeu Vizoso
2025-05-30 19:23 ` Markus Elfring
2025-05-19 13:43 ` [PATCH v4 07/10] accel/rocket: Add job submission IOCTL Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 08/10] accel/rocket: Add IOCTLs for synchronizing memory accesses Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 09/10] arm64: dts: rockchip: add pd_npu label for RK3588 power domains Tomeu Vizoso
2025-05-19 13:43 ` [PATCH v4 10/10] arm64: dts: rockchip: enable NPU on ROCK 5B Tomeu Vizoso
2025-05-30 18:50 ` [PATCH v4 00/10] New DRM accel driver for Rockchip's RKNN NPU Jagan Teki
2025-06-01 6:41 ` Tomeu Vizoso
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).